Hereditary spherocytosis (HS; OMIM 182900) is a dominantly inherited hemolytic anemia that affects approximately 1 in 2,500 people of all races worldwide (1, 17, 18). Typically, HS patients have mild symptoms, which can be exacerbated by viral infections (19). These symptoms include elevated reticulocyte counts and smaller, spherical erythrocytes on a blood smear and are accompanied by abnormal osmotic fragility (13, 19, 23). The majority of HS mutations have been found in the genes encoding the erythrocyte membrane skeleton proteins ankyrin-1 (ANK-1; ~60%) and Band 3 (SLC4A1; ~20%) (1, 19). Virtually all of the described HS mutations cause a functional deficiency of erythrocyte skeleton proteins, through premature termination or amino acid substitutions in regions critical for the protein-protein interactions that stabilize the erythrocyte membrane skeleton (17, 18). In the 10 to 20% of patients in whom no mutations have been detected in the coding regions of the membrane skeleton protein genes, the causative mutations are proposed to lie in cis-acting regulatory regions, where they decrease transcription of mRNA and thereby cause haploinsufficiency (12, 17, 18).
Support for this hypothesis has come from our previous analysis of a German patient with a severe form of HS (20). The patient was shown to have two mutations in the ANK-1 gene. The first was a 20-bp deletion in exon 6, leading to premature termination, presumably inherited from the father. The second mutation was a deletion of a TG dinucleotide in the 5′ untranslated region of the ANK-1 gene, located at position −72/73 relative to the ATG initiation codon (12, 20) or +12/13 relative to the transcriptional start site (TSS) listed in the database of transcriptional start sites (DBTSS) (37-39). We showed that the TATA-binding protein (TBP) subunit of the transcription initiation complex TFIID bound to a region spanning nucleotides −78 to −70 that included the TG dinucleotide. Deletion of the TG dinucleotide disrupted TBP/TFIID binding in vitro. Finally, we showed that the TG deletion caused ankyrin deficiency by decreasing ANK-1 promoter function both in vitro and in transgenic mice, establishing the TG deletion as a causative HS mutation (20).
In eukaryotes, protein-coding genes are transcribed by RNA polymerase II (Pol II) and are referred to as class II genes. The TSS and the sequences immediately flanking it are referred to as the core promoter (24, 34), which is functionally defined as the minimal DNA region required to direct low levels of accurate RNA Pol II transcription initiation in vitro (9). Core promoters contain one or more DNA sequence elements that direct the recruitment and assembly of the class II basal, or general, transcription factors (TFIID, TFIIA, TFIIB, TFIIF, TFIIE, and TFIIH) and RNA Pol II into a functional preinitiation complex (PIC) at the TSS (31, 34). For example, the TATA box (consensus sequence TATAAA) is located 25 to 30 bp upstream of the transcription initiation site (7) and is directly bound by the TBP subunit of the TFIID complex. The initiator element [Inr; consensus sequence YYAN(T/A)YY] encompasses the TSS and is recognized by the TBP-associated factors TAF1 and TAF2 of the TFIID complex (10). The TFIIB recognition element [BRE; consensus sequence (G/C)(G/C)(G/A)CGCC] immediately flanks the TATA box and is directly bound by TFIIB (27). The downstream promoter element [DPE; consensus sequence (A/G)G(A/T)(C/T)(G/A/C)], originally described in Drosophila but conserved in mammals (8), is located 30 bp downstream of the TSS in TATA-less promoters and binds the TAF6 and TAF9 subunits of TFIID (8).
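As an illustration of how the consensus sequences quoted above can be read, the following minimal Python sketch translates each consensus string into a regular expression and reports where it matches a DNA sequence. The function names (`consensus_to_regex`, `find_motifs`) and the toy sequences are hypothetical and introduced only for this example; the sketch scans the given strand only and ignores the positional constraints described in the text (for example, the distance of the TATA box from the TSS), so a match is not evidence of a functional element.

```python
import re

# Consensus sequences exactly as quoted in the text.
CONSENSUS = {
    "TATA": "TATAAA",
    "Inr": "YYAN(T/A)YY",
    "BRE": "(G/C)(G/C)(G/A)CGCC",
    "DPE": "(A/G)G(A/T)(C/T)(G/A/C)",
}

# Single-letter ambiguity codes used above: Y = pyrimidine, R = purine, N = any base.
IUPAC = {"Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Translate a consensus string such as 'YYAN(T/A)YY' into a regex pattern."""
    # Turn each parenthesised alternative, e.g. '(G/A/C)', into a class '[GAC]'.
    pattern = re.sub(
        r"\(([ACGT](?:/[ACGT])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus,
    )
    # Expand the remaining single-letter ambiguity codes.
    return "".join(IUPAC.get(ch, ch) for ch in pattern)


def find_motifs(sequence, consensus):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A lookahead lets the regex engine report overlapping matches.
    regex = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "GGTATAAAGGCCAGTCC"  # toy sequence, not taken from any real promoter
    for name, consensus in CONSENSUS.items():
        print(name, consensus_to_regex(consensus), find_motifs(toy, consensus))
```

Scanning the reverse complement as well, or restricting matches to a window around a known TSS, would be natural extensions but are left out to keep the sketch short.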
In mammals, approximately half of the promoters for protein-coding genes are associated with CpG islands (4, 37). These promoters generally lack consensus or near-consensus TATA boxes, DPEs, or Inr elements (5, 33). Common features of CpG island promoters are multiple, dispersed transcription initiation sites and the presence of multiple binding sites for the transcription factor Sp1 (5, 6). Transcription start sites are often located 40 to 80 bp downstream of the Sp1 sites, suggesting that Sp1 may direct the basal machinery to form a PIC (34). However, the variety of TSSs in dispersed promoters makes it difficult to determine the position of core promoter elements relative to the initiation sites. We have previously demonstrated that the minimal human ANK-1 promoter has a high G+C content (77%) with no consensus promoter motifs (e.g., TATA box or Inr sequence). The ANK-1 promoter has the multiple transcription initiation sites typical of CpG island promoters (21).
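For readers who want to see how a figure such as the G+C content quoted above is obtained, a minimal Python sketch follows. It computes the G+C fraction of a sequence and, as an additional assumption not drawn from this text, the observed/expected CpG ratio that is commonly used when describing CpG islands. The function names and the toy sequence are hypothetical; the ANK-1 promoter sequence itself is not reproduced here.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    This ratio is a commonly used descriptor of CpG islands; using it here
    is an assumption of this sketch, not something stated in the text.
    """
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # 'CG' cannot overlap itself, so str.count gives the exact CpG count.
    return seq.count("CG") * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    toy = "GCGCGGCCGCATGCGCCGGA"  # toy sequence, not the ANK-1 promoter
    print(f"G+C fraction: {gc_fraction(toy):.2f}")
    print(f"CpG obs/exp:  {cpg_obs_exp(toy):.2f}")
```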
We hypothesized that the region surrounding the HS mutation defines a functional component of the ANK-1 promoter and that sequence variations in this region would affect the level of transcription from the ANK-1 promoter. To test this hypothesis, we generated a library of ~16,000 ANK-1 promoters with degenerate sequence in the region occupied by TBP/TFIID. This library was used as a template for cell-free in vitro transcription, and active ANK-1 promoter sequences were identified by rapid amplification of cDNA ends (RACE) analysis of the transcribed RNA. We identified four functional sequences: the wild-type ANK-1 sequence and three additional sequences that differed from the wild-type sequence by two or three bases. Analysis of these individual sequences showed that all were active in cell-free transcription, transient-transfection, and transgenic mouse assays, while randomly selected sequences were inactive. Two of the variant sequences directed levels of expression similar to that of the wild-type sequence, while the fourth sequence showed 5-fold-higher expression than the wild type. We conclude that the region from −78 to −70 of the ANK-1 gene is a regulatory region and that sequence variation in this region determines the level of transcription from the ANK-1 promoter.
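The idea of a degenerate promoter library, and of comparing recovered sequences with the wild type by the number of differing bases, can be sketched in a few lines of Python. Everything specific in this sketch is an assumption: the number of randomized positions is not stated in this section, and seven is used only because 4^7 = 16,384 is of the same order as the ~16,000 promoters mentioned; `PLACEHOLDER_WT` is an arbitrary string, not the ANK-1 sequence; and the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def degenerate_library(n_positions):
    """Yield every DNA sequence of length n_positions (4 ** n_positions in total)."""
    for combo in product(BASES, repeat=n_positions):
        yield "".join(combo)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# ASSUMPTION: seven randomized positions (4 ** 7 = 16,384); not stated in the text.
N_POSITIONS = 7
# ASSUMPTION: arbitrary placeholder, NOT the wild-type ANK-1 sequence.
PLACEHOLDER_WT = "GATTACA"

library = list(degenerate_library(N_POSITIONS))

# Library members that differ from the placeholder by two or three bases,
# mirroring the kind of comparison described for the functional variants.
near_variants = [s for s in library if hamming(s, PLACEHOLDER_WT) in (2, 3)]

if __name__ == "__main__":
    print(len(library), "sequences in the full library")
    print(len(near_variants), "sequences differ from the placeholder by 2 or 3 bases")
```

Enumerating the library in software is only a bookkeeping aid; which sequences are transcriptionally active was determined experimentally, as described above.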