The characterization of atypical mutations in disease-associated loci is a powerful tool for discovering novel regulatory elements. We previously identified a dinucleotide deletion in the human ankyrin-1 gene (ANK-1) promoter that underlies ankyrin-deficient hereditary spherocytosis. The deletion was associated with a decrease in promoter function both in vitro and in vivo, establishing it as a causative hereditary spherocytosis mutation. The dinucleotide deletion is located in the 5′ untranslated region of the ANK-1 gene and disrupts the binding of TATA-binding protein and TFIID, components of the preinitiation complex. We hypothesized that the nucleotides surrounding the mutation define an uncharacterized regulatory sequence. To test this hypothesis, we generated a library of more than 16,000 ANK-1 promoters with degenerate sequence around the mutation and cloned the functional promoter sequences after cell-free transcription. We identified the wild-type sequence and three additional sequences, from which we derived a consensus. The sequences were functional in cell-free transcription, transient-transfection, and transgenic mouse assays. One sequence increased ANK-1 promoter function 5-fold, while randomly chosen sequences decreased ANK-1 promoter function. Our results demonstrate a novel functional motif in the ANK-1 promoter.
Hereditary spherocytosis (HS; OMIM 182900) is a dominantly inherited hemolytic anemia that affects approximately 1 in 2,500 people of all races worldwide (1, 17, 18). Typically, HS patients have mild symptoms, which can be exacerbated by viral infections (19). Characteristic findings include elevated reticulocyte counts and smaller, spherical erythrocytes on a blood smear, accompanied by abnormal osmotic fragility (13, 19, 23). The majority of HS mutations have been found in the genes encoding the erythrocyte membrane skeleton proteins ankyrin-1 (ANK-1; ~60%) and Band 3 (SLC4A1; ~20%) (1, 19). Virtually all of the described HS mutations cause a functional deficiency of erythrocyte skeleton proteins through premature termination and/or amino acid substitutions in regions critical for the protein-protein interactions that stabilize the erythrocyte membrane skeleton (17, 18). In the 10 to 20% of patients in whom no mutations have been detected in the coding regions of the membrane skeleton protein genes, the causative mutations are proposed to lie in cis-acting regulatory regions, where they decrease mRNA transcription and cause haploinsufficiency (12, 17, 18).
Support for this hypothesis has come from our previous analysis of a German patient with a severe form of HS (20). The patient was shown to have two mutations in the ANK-1 gene. The first was a 20-bp deletion in exon 6, leading to premature termination, presumably inherited from the father. The second mutation was a deletion of a TG dinucleotide in the 5′ untranslated region of the ANK-1 gene located at position −72/73 relative to the ATG initiation codon (12, 20) or +12/13 from the transcriptional start site (TSS) listed in the database of transcriptional start sites (DBTSS) (37-39). We showed that the transcription initiation complex TFIID and its TATA-binding protein (TBP) subunit bound to a region spanning nucleotides −78 to −70 that included the TG dinucleotide. Deletion of the TG dinucleotide disrupted the binding of these factors in vitro. Finally, we showed that the TG deletion caused ankyrin deficiency by decreasing ANK-1 promoter function both in vitro and in transgenic mice, establishing the TG deletion as a causative HS mutation (20).
In eukaryotes, protein-coding genes are transcribed by RNA polymerase II (Pol II) and are referred to as class II genes. The TSS and the sequences immediately flanking it are referred to as the core promoter (24, 34), which is functionally defined as the minimal DNA region required to direct low levels of accurate RNA Pol II transcription initiation in vitro (9). Core promoters contain one or more DNA sequence elements that direct the recruitment and assembly of the class II basal and/or general transcription factors (TFIID, TFIIA, TFIIB, TFIIF, TFIIE, and TFIIH) and RNA Pol II into a functional preinitiation complex (PIC) at the transcription start site (31, 34). For example, the TATA box (consensus sequence TATAAA) is located 25 to 30 bp upstream of the transcription initiation site (7) and is directly bound by the TATA-binding protein (TBP) subunit of the TFIID complex. The initiator element [Inr; consensus sequence YYAN(T/A)YY] encompasses the TSS and is recognized by the TBP-associated factors TAF1 and TAF2 of the TFIID complex (10). The TFIIB recognition element [BRE; consensus sequence (G/C)(G/C)(G/A)CGCC] immediately flanks the TATA box and is directly bound by TFIIB (27). The downstream promoter element [DPE; consensus sequence (A/G)G(A/T)(C/T)(G/A/C)], originally described in Drosophila but conserved in mammals (8), is located approximately 30 bp downstream of the transcription start site in TATA-less promoters and binds the TAF6 and TAF9 subunits of TFIID (8).
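To illustrate how consensus sequences written in this notation can be searched computationally, the following Python sketch converts each consensus into a regular expression and reports match positions. The element names and consensus strings are taken from the paragraph above; the helper names (`consensus_to_regex`, `scan`) are hypothetical, only the forward strand is scanned, and a sequence match alone says nothing about whether an element is functional or correctly positioned relative to a TSS.

```python
import re

# IUPAC nucleotide codes needed for the consensus sequences quoted above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "N": "[ACGT]",
}

# Core promoter consensus sequences as written in the text.
CORE_ELEMENTS = {
    "TATA": "TATAAA",
    "Inr": "YYAN(T/A)YY",
    "BRE": "(G/C)(G/C)(G/A)CGCC",
    "DPE": "(A/G)G(A/T)(C/T)(G/A/C)",
}

def consensus_to_regex(consensus: str) -> str:
    """Convert a consensus such as 'YYAN(T/A)YY' into a regular expression."""
    out, i = [], 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "(":
            # A bracketed group like (G/A) becomes the character class [GA].
            end = consensus.index(")", i)
            out.append("[" + "".join(consensus[i + 1:end].split("/")) + "]")
            i = end + 1
        else:
            out.append(IUPAC[ch])
            i += 1
    return "".join(out)

def scan(seq: str, consensus: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) forward-strand matches."""
    # A lookahead makes each match zero-width, so overlapping hits are all reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]

print(scan("GGGTATAAAGGG", CORE_ELEMENTS["TATA"]))  # [3]
```

A reverse-complement pass would be needed to find elements on the opposite strand; it is left out to keep the sketch short.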
In mammals, approximately half of the promoters for protein-coding genes are associated with CpG islands (4, 37). These promoters generally lack consensus or near-consensus TATA boxes, DPE elements, or Inr elements (5, 33). Common features of CpG island promoters are multiple, dispersed transcription initiation sites and multiple binding sites for the transcription factor Sp1 (5, 6). Transcription start sites are often located 40 to 80 bp downstream of the Sp1 sites, suggesting that Sp1 may direct the basal machinery to form a PIC (34). However, the variety of TSSs in dispersed promoters makes it difficult to identify the positions of core promoter elements relative to the initiation sites. We have previously demonstrated that the minimal human ANK-1 promoter has a high G+C content (77%) with no consensus promoter motifs (e.g., TATA box or Inr sequence). The ANK-1 promoter also has the multiple transcription initiation sites typical of CpG island promoters (21).
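To make the G+C criterion concrete, the sketch below computes the G+C fraction and the CpG observed/expected ratio, (CpG count × length)/(C count × G count), for a DNA string. The thresholds in the hypothetical helper `looks_like_cpg_island` (length ≥ 200 bp, G+C > 50%, observed/expected > 0.6) follow the commonly cited Gardiner-Garden and Frommer criteria and are an assumption here; the definitions used in the cited studies may differ.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases in seq."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)

def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (CpG count * length) / (C count * G count)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the true dinucleotide count.
    return s.count("CG") * len(s) / (c * g)

def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Assumed Gardiner-Garden/Frommer-style thresholds; adjust as needed."""
    return (len(seq) >= min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_oe)

print(gc_content("ATGC"), cpg_obs_exp("CGCG"))  # 0.5 2.0
```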
We hypothesized that the region surrounding the HS mutation defined a functional component of the ANK-1 promoter and that sequence variation in this region would affect the level of transcription from the ANK-1 promoter. To test this hypothesis, we generated a library of ~16,000 ANK-1 promoters with degenerate sequence in the region occupied by TBP/TFIID. This library was used as a template for cell-free in vitro transcription, and active ANK-1 promoter sequences were identified by rapid amplification of cDNA ends (RACE) analysis of the transcribed RNA. We identified four functional sequences: the wild-type ANK-1 sequence and three additional sequences that differed from the wild-type sequence at one to three positions. Analysis of these individual sequences showed that all were active in cell-free transcription, transient-transfection, and transgenic mouse assays, while randomly selected sequences had reduced activity. Two of the variant sequences directed levels of expression similar to that of the wild-type sequence, while the third variant directed 5-fold higher expression. We conclude that the region from −78 to −70 of the ANK-1 gene is a regulatory region and that variation in the sequence of this region determines the level of transcription from the ANK-1 promoter.
A set of 100-bp oligonucleotides was synthesized containing the ANK-1 wild-type sequence with degenerate sequence at nucleotides −78 to −74 and nucleotides −71 to −70, while the critical TG dinucleotide at positions −72 and −73 was left intact. These oligonucleotides were annealed to complementary oligonucleotide primers and subjected to PCR (94°C for 5 min followed by 40 cycles at 94°C for 0.5 min and 72°C for 1 min). The PCR products were digested with BmgBI and XhoI and ligated into the pGL2B-based ANK-1/luciferase plasmid described previously (pANK-1WT; p296 from reference 21) by using a quick ligation kit (New England Biolabs, Ipswich, MA). Maximum-efficiency DH5α bacteria (Invitrogen, Gaithersburg, MD) were transformed with the ligation products and grown for 16 h in LB medium containing ampicillin, after which plasmid DNA was prepared. Dilutions of the bacteria were plated on LB agar to isolate individual clones for sequence analysis.
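The size of the degenerate library follows directly from the design. Assuming the nine-nucleotide window −78 to −70 is the wild-type TGCGGTGAG (as given in the Results), with positions −78 to −74 and −71 to −70 fully degenerate and the TG at −73/−72 fixed, a short enumeration yields 4^7 = 16,384 distinct sequences. The constant and function names are illustrative, and real degenerate synthesis does not guarantee equal representation of every variant.

```python
from itertools import product

WT_WINDOW = "TGCGGTGAG"  # positions -78..-70 of the wild-type ANK-1 sequence
# 0-based indices within the window: -78..-74 -> 0..4, -71..-70 -> 7..8.
# Indices 5 and 6 hold the TG dinucleotide (-73/-72), which was not varied.
DEGENERATE_IDX = (0, 1, 2, 3, 4, 7, 8)

def library_variants():
    """Yield every window sequence allowed by the degenerate design."""
    for bases in product("ACGT", repeat=len(DEGENERATE_IDX)):
        window = list(WT_WINDOW)
        for idx, base in zip(DEGENERATE_IDX, bases):
            window[idx] = base
        yield "".join(window)

variants = list(library_variants())
print(len(variants))  # 16384
```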
Magnesium agarose electrophoretic mobility shift analyses (EMSAs) were performed as described previously (9). Wild-type (ANK-1WT) or variant ankyrin (ANK-1GC, ANK-1GG, ANK-1GGG, and ANK-1−TG) probes (281 bp) excised with SmaI and HindIII were end labeled with [γ-32P]ATP. Labeled promoter fragments were incubated with recombinant human TFIIA (Austral Biologicals, San Ramon, CA) alone or with purified human TFIID (Protein One, Bethesda, MD). In the binding reactions, the amount of purified TFIID was titrated over a range (0, 500, 1,000, and 1,500 ng per reaction), and 20 ng of TFIIA was added per reaction. The final KCl concentration in each reaction mix was 60 mM. After incubation at 30°C for 60 min, reaction products were electrophoresed at room temperature in 1.4% magnesium agarose gels. The gels were dried and subjected to autoradiography, followed by exposure to a phosphorimager screen and quantitative analysis on a Molecular Dynamics PhosphorImager.
In vitro DNase I footprinting was performed as described previously (20) using 32P-labeled 175-bp BmgBI/XbaI fragments containing the wild-type and variant ankyrin promoter sequences as probes. Footprinting reaction mixes contained purified human TFIID (Protein One).
Cell-free in vitro transcription was performed using a runoff technique as described previously (9) with a HeLaScribe kit (Promega, Madison, WI) and either the HeLa cell nuclear extract provided or K562 extract prepared as described by Andrews and Faller (2). For each reaction, 200 ng of template from the library was linearized with BstBI (which cuts in the luciferase gene) and transcribed in a final MgCl2 concentration of 3 mM. After incubation at 30°C for 60 min, the reactions were stopped and the products were ethanol precipitated.
To measure cell-free transcription from individual promoters, a SmaI/BstBI fragment containing the cytomegalovirus (CMV) promoter (1,194 bp) was inserted in reverse orientation to SmaI/BstBI fragments containing either the pANK-1WT template or the templates with variant sequences. Fragments containing both templates were excised with SmaI and purified. Prior to cell-free transcription, these fragments were digested with BstBI, ensuring equal numbers of control and test templates in each preparation. The fragments (200 ng) were transcribed as described above in the presence of [32P]GTP (Amersham, Piscataway, NJ). Runoff transcription products were analyzed on a denaturing polyacrylamide gel, and the ratio of control transcript to test transcript was calculated by using a Molecular Dynamics PhosphorImager (Amersham).
RNA from in vitro transcription assays was reverse transcribed with Moloney murine leukemia virus reverse transcriptase and an oligo(dT) adapter primer (Clontech, Mountain View, CA) as described previously (14, 16). A total of 20% of the reverse-transcribed cDNA was amplified by PCR using an adapter primer and a gene-specific primer (TGCAGTTGCTCTCCAGCGGTTCCATC). Amplification products were subcloned into the pCR 2.1 vector by using a TA cloning kit (Invitrogen), and their nucleotide sequences were determined.
The K562 (erythroid) cell line was maintained in improved Eagle minimal essential medium (Invitrogen) containing 10% fetal calf serum (HyClone, Logan, UT). The different ankyrin promoter fragments (ANK-1WT, ANK-1GC, ANK-1GG, and ANK-1GGG) were subcloned into the firefly luciferase reporter plasmid pANK-1WT and sequence verified. K562 cells (10⁷) were transfected by electroporation with a single pulse of 300 V at 950 μF with 20 μg of test plasmid and 0.5 μg of pRL-SV40, a mammalian reporter plasmid expressing Renilla luciferase driven by the simian virus 40 (SV40) early gene promoter (Promega). At 48 h after transfection, cells were harvested and lysed, and the ratio of firefly luciferase (test) to Renilla luciferase (control) activity was determined by using a Fluoroskan Ascent FL (Thermo, Gaithersburg, MD). All assays were performed in triplicate.
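A minimal sketch of the dual-reporter normalization described above: each replicate's firefly signal is divided by its Renilla signal, and test ratios are expressed relative to the mean wild-type ratio, matching the fold values reported in the Results. Normalizing to pANK-1WT and summarizing as mean ± SD are assumptions about the analysis, the function name is hypothetical, and the readings in the example are invented for illustration.

```python
from statistics import mean, stdev

def relative_activity(test_firefly, test_renilla, wt_firefly, wt_renilla):
    """Firefly/Renilla ratio per replicate, expressed relative to the mean wild-type ratio.

    Returns (mean, standard deviation) of the relative values for the test construct.
    """
    wt_mean = mean(f / r for f, r in zip(wt_firefly, wt_renilla))
    relative = [f / r / wt_mean for f, r in zip(test_firefly, test_renilla)]
    return mean(relative), stdev(relative)

# Hypothetical triplicate readings (arbitrary luminometer units).
m, sd = relative_activity([200, 220, 180], [10, 10, 10],
                          [100, 100, 100], [10, 10, 10])
print(f"{m:.2f} ± {sd:.2f}")  # 2.00 ± 0.20
```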
SmaI/BglII fragments containing the ANK-1WT or variant ANK-1 promoter sequences were excised from the pGL2B luciferase reporter vectors described above. A triple ligation of the ankyrin promoter fragment, the 1,938-bp BglII/HindIII fragment containing the Aγ-globin coding exons and introns, and SmaI/HindIII-digested pSP72 was used to generate the variant ANK-1/Aγ-globin promoter plasmids. The ANK-1/Aγ-globin gene fragments (2,244 bp) were released from the plasmid with EcoRV and HindIII, separated on an agarose gel, electroeluted, and purified with an Elutip-d minicolumn (Schleicher & Schuell). Transgenic mice were generated as described previously (20, 22, 30, 32). Founder animals were identified by Southern analysis of DNA extracted from tail biopsies by probing with an ANK/Aγ-globin probe (32) and crossed to FVB/N mice for propagation. The copy number was determined by comparing the γ-globin signals from Southern blot analysis of F1 animals and K562 DNA using a PhosphorImager.
Total cellular RNA was extracted from adult reticulocytes, obtained from phlebotomized animals, using TRIzol reagent according to the manufacturer's specifications (Life Technologies, Inc.). Linear DNA templates containing sequences for both exon 2 of the human Aγ-globin gene and exon 2 of the murine α-globin gene were prepared by BglII digestion of cesium chloride-purified plasmid preparations (22). 32P-labeled RNA probes were transcribed by using a MAXIscript in vitro transcription kit (Ambion, Inc.). Hybridization of the probe and RNA (0.1 μg) and RNase A/T1 digestion were carried out according to standard procedures (RPA II; Ambion, Inc.), and the protected fragments were separated on an 8% nondenaturing polyacrylamide gel. The relative amounts of human Aγ-globin exon 2 (223 bp) and mouse α-globin exon 2 (186 bp) were recorded on a PhosphorImager and estimated using the following formula: (Aγ-globin RNA/transgene copy number) × (1/mouse α-globin RNA) (22). Statistical analysis of copy number was performed, and the expression data were analyzed by linear regression using GraphPad Prism version 2.0 software.
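The per-copy expression formula above translates directly into code. In this sketch, the two phosphorimager signals are assumed to be background-subtracted and directly comparable (no correction for the different lengths of the 223-bp and 186-bp protected fragments is applied), the function name is hypothetical, and the values in the example are invented.

```python
def gamma_globin_per_copy(a_gamma_signal: float, alpha_signal: float, copies: int) -> float:
    """(Aγ-globin RNA / transgene copy number) x (1 / mouse α-globin RNA)."""
    if copies <= 0 or alpha_signal <= 0:
        raise ValueError("copy number and α-globin signal must be positive")
    return (a_gamma_signal / copies) * (1.0 / alpha_signal)

# Hypothetical line: 2 transgene copies, Aγ signal 500, mouse α signal 1,000.
print(f"{gamma_globin_per_copy(500, 1000, 2):.1%} of mouse α-globin per copy")
```

With these invented numbers the result is 25.0% of mouse α-globin mRNA per transgene copy, the same units as the per-copy values reported in the Results.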
Detection and measurement of γ-globin protein in red blood cells were performed as described by Thorpe et al. (40). Red blood cells were washed in cold (4°C) phosphate-buffered saline, fixed in ice-cold (4°C) 4% paraformaldehyde solution, washed with 1:1 acetone-water (−20°C), acetone (−20°C), and 1:1 acetone-water (−20°C) before resuspension in phosphate-buffered saline plus 2% fetal bovine serum (4°C). Hemoglobin tetramers containing human γ-globin were identified with a fluorescein isothiocyanate-conjugated human hemoglobin F antibody (Perkin-Elmer Life Sciences). Analysis was performed on a FACStar instrument (Becton Dickinson, Franklin Lakes, NJ). To prevent leaching of hemoglobin, the cells were maintained at ≤4°C throughout the procedure.
Relative to the ATG initiation codon, human erythroid ankyrin-1 (ANK-1) transcripts initiate at positions −106, −100, −84, −71, −63, and −56 (Fig. 1A) (20, 21). These correspond to positions −23, −16, +1, +16, +24, and +30 relative to the TSS designated as +1 in the Database of Transcriptional Start Sites (DBTSS; http://dbtss.hgc.jp/) (37-39). We previously showed that the TSS at position −84 is responsible for ~50% of the ANK-1 mRNAs (20). We also showed that TBP, the DNA-binding component of the multiprotein TFIID complex, binds to a 9-nucleotide region between positions −78 and −70 in the 5′ untranslated region of the erythroid ANK-1 gene (TGCGGTGAG), which contains the TG dinucleotide deleted in the German HS patient (20).
To confirm that the RNA Pol II/TFIID transcription initiation complex is associated with the region from −78 to −70 in living cells, we queried the ChIP-Seq data described by Steiner et al. (36), in which chromatin from erythroid K562 cells was immunoprecipitated with an antibody against RNA Pol II and the enriched fragments were subjected to high-throughput sequencing. In agreement with the in vitro footprinting results, a peak of RNA Pol II-enriched chromatin maps between chr8 coordinates 41774250 and 41774300 (hg18), an interval that includes positions −78 to −70 (chr8 41774282 to 41774291) (Fig. 1B).
To determine whether variation in the sequence of the −78 to −70 region altered ANK-1 promoter function, we generated a library of ANK-1 promoter sequences containing degenerate sequence in the 5 bp upstream and 2 bp downstream of the TG dinucleotide (which was not varied) (Fig. 2). Sequence analysis of 98 randomly selected clones revealed 98 different sequences, indicating that the library captured a large proportion of the 16,384 possible permutations of the −78 to −70 region. The library was transcribed in both K562 and HeLa cell extracts, and the transcribed RNA (~290 bp) was amplified and cloned by RACE. Sequence analysis of more than 100 clones recovered from three independent experiments detected four sequences, each recovered multiple times (Table 1). The wild-type (ANK-1WT) sequence, TGCGGTGAG, comprised 11.4% of the recovered sequences. Three related sequences differed from ANK-1WT within their first three nucleotides. ANK-1GC (GCGGGTGAG), ANK-1GG (GGCGGTGAG), and ANK-1GGG (GGGGGTGAG) comprised 38.1, 41.9, and 8.6% of the sequences, respectively (Table 1). We conclude that of all possible nucleotide combinations, only a limited number of variants of the −78 to −70 region supported cell-free transcription, indicating a targeted interaction of transcription complexes with DNA rather than a general positioning on the underlying sequence.
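Finding 98 distinct sequences among 98 clones is what one would expect from a highly diverse library. A back-of-the-envelope check, assuming all 16,384 variants are present at equal frequency (an idealization), is the birthday-problem probability that n randomly picked clones are all different. The function name is hypothetical.

```python
from math import prod

def prob_all_distinct(n_clones: int, library_size: int) -> float:
    """Probability that n clones drawn uniformly with replacement are all distinct."""
    return prod((library_size - i) / library_size for i in range(n_clones))

p = prob_all_distinct(98, 4 ** 7)
print(round(p, 2))  # 0.75
```

Under this idealization, roughly three out of four samples of 98 clones would contain no repeats, so the observation is consistent with broad coverage of the library without, by itself, measuring that coverage.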
The experimentally recovered sequences share the consensus sequence (G/T)(C/G)(C/G)GGTGAG. This consensus does not appear in the TRANSFAC database of known DNA-binding protein recognition sequences (26, 28). The last five bases of the consensus resemble a splice junction. However, no spliced human mRNAs or expressed sequence tags (ESTs) are joined in this region of the ANK-1 gene (http://genome.ucsc.edu/cgi-bin/hgTracks?org=human), and we conclude that the function of the −78 to −70 region does not involve splicing.
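The bracket notation used for the consensus can be reproduced by taking, at each position, the set of bases observed among the four recovered sequences. This column-wise union is a simple sketch, not a procedure stated by the authors, and unlike a position weight matrix it ignores how often each base was recovered. Multiplying the number of allowed bases per position also shows how few members of the 16,384-sequence library fit the consensus.

```python
from math import prod

# The four sequences recovered after cell-free transcription (positions -78..-70).
RECOVERED = {
    "ANK-1WT": "TGCGGTGAG",
    "ANK-1GC": "GCGGGTGAG",
    "ANK-1GG": "GGCGGTGAG",
    "ANK-1GGG": "GGGGGTGAG",
}

def bracket_consensus(seqs) -> str:
    """Write each position as a single base or as (X/Y) listing all observed bases."""
    out = []
    for column in zip(*seqs):
        bases = sorted(set(column))
        out.append(bases[0] if len(bases) == 1 else "(" + "/".join(bases) + ")")
    return "".join(out)

consensus = bracket_consensus(RECOVERED.values())
# Number of distinct 9-mers the consensus allows (2 x 2 x 2 x 1 x ... x 1).
n_allowed = prod(len(set(column)) for column in zip(*RECOVERED.values()))
print(consensus, n_allowed, f"{n_allowed / 4 ** 7:.3%} of the library")
# (G/T)(C/G)(C/G)GGTGAG 8 0.049% of the library
```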
A database of human promoters assembled by L. Elnitski et al. (unpublished data) was queried for the presence of the experimental (G/T)(C/G)(C/G)GGTGAG consensus in the region between positions −100 and +40 relative to the transcriptional start site. More than 150 human promoters had an exact match to the consensus sequence (χ² = 80.3; P = 1.12 × 10⁻⁹) (data available on request). The promoters with the consensus sequence did not fit into any specific ontology group and represent genes expressed both ubiquitously and in specific tissues. The location of the (G/T)(C/G)(C/G)GGTGAG consensus sequence in these promoters was evenly divided between the 5′-flanking sequence upstream of the TSS and the 5′ untranslated sequence downstream of the TSS. Although most promoters had a single (G/T)(C/G)(C/G)GGTGAG consensus, several promoters, including the core binding factor beta promoter, contained two consensus sequences: one located in the 5′-flanking sequence upstream of the TSS and one in the 5′ untranslated sequence downstream of the TSS.
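A sketch of this kind of promoter query searches a window from 100 bp upstream to 40 bp downstream of an annotated TSS for exact matches to (G/T)(C/G)(C/G)GGTGAG. The function names, the 0-based offsets (0 = TSS base, unlike the no-zero numbering used in the text), and the forward-strand-only search are illustrative assumptions, not a reconstruction of the original query or of the background model behind the reported χ² value. The second function gives the expected number of chance matches per window if all four bases were equally frequent, which is unrealistic for G+C-rich promoters and is shown only to set the scale.

```python
import re

# Lookahead so that overlapping matches are all reported.
CONSENSUS_RE = re.compile(r"(?=([GT][CG][CG]GGTGAG))")

def consensus_offsets(promoter: str, tss_index: int,
                      upstream: int = 100, downstream: int = 40) -> list[int]:
    """Offsets relative to the TSS (0-based) of matches lying fully inside
    [TSS - upstream, TSS + downstream)."""
    start = max(0, tss_index - upstream)
    end = min(len(promoter), tss_index + downstream)
    window = promoter[start:end].upper()
    return [start + m.start() - tss_index for m in CONSENSUS_RE.finditer(window)]

def expected_chance_matches(window_len: int = 140, motif_len: int = 9,
                            n_allowed: int = 8) -> float:
    """Expected exact matches on one strand if every base has probability 1/4."""
    return (window_len - motif_len + 1) * n_allowed / 4 ** motif_len

# Hypothetical promoter: the wild-type 9-mer placed 5 bp upstream of the TSS.
toy = "A" * 95 + "TGCGGTGAG" + "A" * 60
print(consensus_offsets(toy, tss_index=100))  # [-5]
print(round(expected_chance_matches(), 4))    # 0.004
```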
To determine the degree of evolutionary conservation of the −78 to −70 region, we performed a computer-aided, unbiased alignment of mammalian sequences surrounding the erythroid ANK-1 TSS (Fig. 3); for mice and some other species, the erythroid ANK-1 TSS is not annotated. The black line at the top of the figure shows the alignment of the erythroid ANK-1 mRNA. The sequence of the −78 to −70 region (and of a nearly identical region at −158 to −150) is conserved among primates (boxed). The sequence of the −78 to −70 region varies among other mammals, but the region from −78 to −73 is conserved in placental mammals and vertebrates (green boxes). Overall, the sequence of the −78 to −70 region matches the experimentally derived consensus sequence except at position 8. These data suggest that our experimental analysis did not detect all of the tolerated variation within this region.
To establish that TFIID associates with the −78 to −70 region in the wild-type ANK-1 promoter and in the three sequences identified through in vitro transcription, we performed in vitro DNase I footprint analysis of labeled 175-bp probes containing these sequences. Purified TFIID protected the region between positions −78 and −70 of the ANK-1WT, ANK-1GC, ANK-1GG, and ANK-1GGG promoters (Fig. 4A). TFIID did not protect this region in the ANK-1 promoter containing the critical TG deletion (ANK-1−TG) (Fig. 4B). TFIID also protected the probes from DNase I digestion downstream of −60, where other ANK-1 TSSs are present (Fig. 4A), although in the ANK-1−TG sequence, which lacked specific protection of the −78 to −70 region, this downstream protection was also reduced. We conclude that the consensus sequence is necessary for TFIID binding to the −78 to −70 region.
Neither the detection of RNA Pol II in the −100 to −50 region nor the DNase I footprint analysis demonstrates direct binding of TFIID to DNA. The Mg-agarose EMSA is the best-characterized method for detecting TFIID binding to DNA. We incubated 32P-labeled 281-bp probes (from −296 to −15) containing the ANK-1WT, ANK-1GC, ANK-1GG, ANK-1GGG, or ANK-1−TG sequences with recombinant TFIIA and purified TFIID (Fig. 4C). The affinity of each site for TFIID was estimated by comparing the relative amounts of bound and unbound probe. As shown previously, the ANK-1−TG sequence had significantly lower binding affinity for TFIID than the ANK-1WT sequence (0.52 ± 0.12, P < 0.01, n = 4) (Fig. 4D). In contrast, all three experimentally identified sequences exhibited significantly higher TFIID binding affinity than the ANK-1WT sequence (ANK-1GC = 1.42 ± 0.10, P < 0.02; ANK-1GG = 1.46 ± 0.28, P < 0.05; ANK-1GGG = 1.37 ± 0.12, P < 0.02; n = 4) (Fig. 4D). We conclude that the experimentally identified sequences are all capable of binding TFIID.
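The binding values above are ratios relative to the wild-type probe. A minimal sketch of such a calculation follows, assuming fraction bound = bound/(bound + free) from phosphorimager counts and replicate-by-replicate pairing with the wild-type lane; both are assumptions (a bound/free ratio would be an equally plausible reading of the text), the function names are hypothetical, and the counts are invented.

```python
from statistics import mean, stdev

def fraction_bound(bound: float, free: float) -> float:
    """Fraction of labeled probe in the TFIID-shifted band."""
    return bound / (bound + free)

def relative_binding(variant, wild_type):
    """variant, wild_type: lists of (bound, free) counts, one tuple per replicate gel.

    Returns (mean, standard deviation) of the variant/wild-type fraction-bound ratios.
    """
    ratios = [fraction_bound(*v) / fraction_bound(*w)
              for v, w in zip(variant, wild_type)]
    return mean(ratios), stdev(ratios)

# Hypothetical counts from two replicate gels.
wt = [(300, 700), (200, 800)]
var = [(450, 550), (300, 700)]
m, sd = relative_binding(var, wt)
print(f"{m:.2f} ± {sd:.2f}")
```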
To compare the transcriptional activity of the wild-type and variant promoter templates, we performed a cell-free, in vitro runoff transcription assay. The templates were incubated with nuclear extracts from HeLa and K562 (erythroid) cells, and transcription levels were normalized to an internal CMV control promoter that contains a TATA box (see Materials and Methods). Consistent with the presence of the TATA box, the CMV transcripts appear as a single band (Fig. 5A). In HeLa cell extracts, ANK-1 transcripts were less abundant than the CMV transcripts, suggesting that the TATA box promoter is more active than the ANK-1 promoter. As we have previously shown, the major TSSs at −106, −100, −84, and −71 and the minor TSSs at −63 and −56 used in primary erythroid cells are also used in vitro (20, 21). The ANK-1WT, ANK-1GG, and ANK-1GGG templates were transcribed at similar levels and used the TSSs at similar frequencies (Fig. 5A). Compared to the other ANK-1 templates, the ANK-1GC template was transcribed at higher levels and preferentially used the TSSs at positions −106 and −100 (Fig. 5A). As we have shown previously, the ANK-1−TG template was transcribed at lower levels than the ANK-1WT template and used the upstream start sites (Fig. 5B).
In K562 cell extracts, which contain erythroid-specific transcription factors, ANK-1 transcripts were more abundant than the CMV transcripts, suggesting that in erythroid cells the ANK-1 promoter is more active than the CMV promoter. The ANK-1WT, ANK-1GG, and ANK-1GGG templates were transcribed at similar levels and used the TSSs at similar frequencies (Fig. 5A and C), while the ANK-1−TG template was transcribed at lower levels (Fig. 5B). The ANK-1GC template was transcribed at a significantly higher level than the other templates (P < 0.01) and preferentially used the TSSs at −106 and −100 (Fig. 5A and C). We conclude that the experimentally identified sequences are all capable of directing cell-free transcription and that the rate of transcription is greater in erythroid cell extracts.
To determine whether the consensus sequence between positions −78 and −70 of the ANK-1 promoter was necessary for ANK-1 expression, we linked the ANK-1WT promoter, the ANK-1−TG promoter, and five ANK-1 promoters with nonconsensus sequences flanking the TG dinucleotide to a luciferase reporter gene. We then performed transient-transfection assays in K562 cells and compared the relative levels of luciferase expression from each construct. The ANK-1−TG plasmid (20) and the five nonconsensus promoters were all expressed at ca. 50% of the level of the ANK-1WT promoter (0.46 ± 0.12, P < 0.05, and n = 5 for ANK-1−TG; 0.39 to 0.51, P < 0.02, and n = 5 each for the five nonconsensus promoters) (Fig. 6A). We conclude that both the TG dinucleotide identified in the German HS patient and the surrounding (G/T)(C/G)(C/G)GG__AG consensus sequence are necessary for full activity of the ANK-1 promoter.
To determine the effects of variation within the (G/T)(C/G)(C/G)GGTGAG consensus sequence, we linked the ANK-1GC, ANK-1GG, and ANK-1GGG promoters to luciferase reporter genes and performed transient-transfection assays in K562 cells. Consistent with the increased level of in vitro transcription, the ANK-1GC plasmid expressed significantly higher levels of the luciferase reporter gene than the ANK-1WT plasmid (7.15 ± 2.10, P < 0.001) (Fig. 6B). The ANK-1GG plasmid also directed higher levels of expression, whereas the level of expression from the ANK-1GGG plasmid was similar to that of the ANK-1WT plasmid (2.20 ± 0.96, P < 0.01, n = 5 and 0.84 ± 0.16, P > 0.05, n = 5, respectively) (Fig. 6B). We conclude that the experimentally identified sequences are all capable of supporting ANK-1 promoter activity and that variation within the consensus sequence alters the level of activity.
A rigorous test of whether the ANK-1 promoter variants have different relative levels of activity is a transgenic mouse assay. We attached the variant promoters to a human Aγ-globin (hu-γ) reporter gene and generated panels of transgenic mice for each sequence: ANK-1GC/γ-globin (five lines), ANK-1GG/γ-globin (six lines), and ANK-1GGG/γ-globin (five lines), for comparison with our previously described panels of 16 ANK-1WT/γ-globin and 7 ANK-1−TG/γ-globin mice (20, 32). The relative number of transgene copies in the variant promoter lines ranged from 1 to 10, as estimated by Southern blot analysis of DNA from F1 offspring (Fig. 7A and Table 2). Consistent with the cell-free transcription and transient-transfection assays, analysis of reticulocyte RNA from the same animals demonstrated that the level of human γ-globin mRNA in the ANK-1GC animals was 25.7% of mouse α-globin mRNA per transgene copy, 5-fold higher than the level in ANK-1WT animals (P < 0.01) (Table 2; Fig. 7A and B). Also consistent with the cell-free transcription assays, the level of human γ-globin mRNA in the ANK-1GG and ANK-1GGG animals was similar to the level in ANK-1WT animals. We conclude that variation in the −78 to −70 region of the ANK-1 promoter can affect the relative activity of the promoter in both in vitro and in vivo assays.
Many transgenes are subject to position effects that lead to expression in a subpopulation of cells in mice (25). We have previously shown that the ANK-1WT and ANK-1−TG promoters direct uniform, position-independent, copy number-dependent expression of γ-globin mRNA in transgenic mice (20, 22, 32). Similarly, γ-globin protein was detected in 100% of erythrocytes from all lines of mice carrying the variant ANK-1 promoter transgenes (uniform) (Table 2 and Fig. 7C). Likewise, we found that all of the ANK-1GC, ANK-1GG, and ANK-1GGG lines expressed γ-globin mRNA (position independent) (Table 2) and that there was a significant correlation between transgene copy number and the level of γ-globin mRNA in all three sets of animals (copy number dependent) (Table 2). We conclude that while sequence variation in the region of the HS mutation alters the activity of the ANK-1 promoter, variation in this region does not affect the ability of the ANK-1 promoter to direct uniform, position-independent, and copy number-dependent expression.
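Copy-number dependence is typically assessed by regressing per-line expression on transgene copy number. The sketch below fits an ordinary least-squares line and reports Pearson's r using only the standard library; it stands in for, and does not reproduce, the GraphPad Prism analysis mentioned in Materials and Methods, it omits the P value, and the function name and example values are hypothetical.

```python
from statistics import mean

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept, plus Pearson's r, for paired data."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Hypothetical lines: transgene copies vs. γ-globin/α-globin mRNA (not per copy).
copies = [1, 2, 4, 6, 10]
expression = [0.06, 0.11, 0.19, 0.31, 0.52]
slope, intercept, r = linear_fit(copies, expression)
print(f"slope={slope:.3f} per copy, intercept={intercept:.3f}, r={r:.3f}")
```

A positive slope with r close to 1 is the pattern expected for copy number-dependent expression; a significance test on r or the slope would be needed to make the claim formally.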
Approximately 50% of human gene promoters are defined as CpG-rich (37). The erythroid ANK-1 promoter does not contain any other previously defined promoter motif, such as a TATA box, CAAT box, Inr, or DPE (21). Our analysis demonstrates that the TG dinucleotide deletion in the German HS patient disrupts the function of a previously undescribed promoter element in the ANK-1 promoter that binds TFIID. Within the consensus sequence derived from our experimental results, (G/T)(C/G)(C/G)GGTGAG, one sequence, GCGGGTGAG, bound more TFIID in vitro and directed higher levels of expression in cell-free transcription, transient-transfection, and transgenic mouse assays than the wild-type ANK-1 sequence. This sequence differs from the wild-type sequence at positions 1 (G instead of T), 2 (C instead of G), and 3 (G instead of C). We propose that sequence variation within the consensus sequence can alter the affinity for TFIID and may be a mechanism for producing different rates of transcription from core promoters.
Our data show that both the TG dinucleotide at −72/73 and the surrounding (G/T)(C/G)(C/G)GG__AG consensus sequence are necessary for full activity of the ANK-1 promoter. Because the TG dinucleotide at −72/73 was not varied in our degenerate promoter library, variation at one or both of these residues may also be tolerated. Upstream of the −78 to −70 region, between positions −158 and −150, is a sequence that differs from the consensus in that a C appears at positions 4 and 6 in place of the invariant G seen in the experimental analysis and the invariant T in the experimental design, respectively (Fig. 1). This sequence would not have been detected in our experiments but may also be a functional component of the ANK-1 promoter. Smale has proposed a model in which the DNA-binding protein Sp1 directs TFIID to sequences within the promoter to initiate transcription from CpG-rich promoters (9, 35). The ANK-1 promoter has two Sp1 binding sites, located at positions −176 and −160 (21). For the ANK-1 promoter, we propose that these TFIID binding elements are the sequences between positions −78 and −70 and between positions −158 and −150. In support of this hypothesis, the ANK-1−TG mutation and nonfunctional sequences abolish the TFIID footprint in the −78 to −70 region in vitro but reduce TFIID binding to the full-length ANK-1 promoter, and ANK-1 promoter function, by only about 50%. The increased activity of the functional ANK-1 promoters in erythroid K562 cells compared to HeLa cells suggests that erythroid transcription factors are also involved in recruiting TFIID to the −78 to −70 region. We have previously demonstrated that GATA-1 binds at position −123 of the ANK-1 promoter and that mutation of this site significantly reduces ANK-1 promoter activity (21).
We propose that while the basal promoter activity of the ANK-1 promoter seen in HeLa cells requires the (G/T)(C/G)(C/G)GGTGAG consensus sequence, full activity of the ANK-1 promoter in erythroid cells requires GATA-1 binding as well.
The identification of noncoding DNA variants that cause differential gene expression is a powerful method for discovering new regulatory elements. The number of disease mutations identified in cis-regulatory regions is rapidly increasing due to the advent of high-density single nucleotide polymorphism detection and next-generation sequencing technology (15). For some DNA variants, such as those that alter either the conserved TATA box (3) or the binding site for the erythroid Krüppel-like transcription factor (29) in the human β-globin promoter, the association of the causative variant with the disease (β-thalassemia) is relatively easy to demonstrate. Other variants, such as the one that creates a GATA-1 binding site and causes α-thalassemia by recruiting the transcriptional machinery away from the α-globin genes (11), are more complex and require both genomic and functional analysis to prove that they are causative. The approach we have described here can be used to determine whether any disease-associated DNA variant discovered within a core promoter alters a functional promoter element.
This work was supported by NHGRI intramural funds (L.E. and D.M.B.), HD000850 (L.A.S.), DK60239, DK04015 (P.G.G.), and Fonds de la Recherche en Santé du Québec (K.L.).
K.L., A.N.O., E.E.D., M.Q.Y., C.W., L.A.S., and L.J.G. performed experiments. K.L., L.E., P.G.G., and D.M.B. wrote the paper. L.E., P.G.G., and D.M.B. designed the experiments.
Published ahead of print on 17 May 2010.