Erythrocyte membrane protein genes serve as excellent models of complex gene locus structure and function, but their study has been complicated by both their large size and their complexity. To begin to understand the intricate interplay of transcription, dynamic chromatin architecture, transcription factor binding, and genomic organization in regulation of erythrocyte membrane protein genes, we performed chromatin immunoprecipitation (ChIP) coupled with microarray analysis and ChIP coupled with massively parallel DNA sequencing in both erythroid and nonerythroid cells. Unexpectedly, most regions of GATA-1 and NF-E2 binding were remote from gene promoters and transcriptional start sites, located primarily in introns. Cooccupancy with FOG-1, SCL, and MTA-2 was found at all regions of GATA-1 binding, with cooccupancy of SCL and MTA-2 also found at regions of NF-E2 binding. Cooccupancy of GATA-1 and NF-E2 was found frequently. A common signature of histone H3 trimethylation at lysine 4, GATA-1, NF-E2, FOG-1, SCL, and MTA-2 binding and consensus GATA-1-E-box binding motifs located 34 to 90 bp away from NF-E2 binding motifs was found frequently in erythroid cell-expressed genes. These results provide insights into our understanding of membrane protein gene regulation in erythropoiesis and the regulation of complex genetic loci in erythroid and nonerythroid cells and identify numerous candidate regions for mutations associated with membrane-linked hemolytic anemia.
The erythrocyte membrane is a multifunctional, complex structure that provides the red cell with the deformability and stability required to withstand its passage through the macro- and microcirculation. It plays critical roles in erythropoiesis, including responding to erythropoietin, importing the iron required for hemoglobin synthesis, and regulating cellular metabolism. Qualitative and quantitative disorders of erythrocyte membrane proteins have been associated with inherited abnormalities of red cell shape, including the hereditary spherocytosis, hereditary elliptocytosis, and hereditary pyropoikilocytosis syndromes (65, 103). Despite biochemical and genetic linkage to specific erythrocyte membrane protein genes, e.g., ankyrin-1, α- or β-spectrin, and band 3, mutations are found in the coding exons and promoter regions in only ~75% of the cases studied. This suggests that in the remaining quarter of cases, the disease-causing mutation may lie in critical regulatory regions outside the promoters and exons.
Most erythrocyte membrane protein genes are large, composed of >25 exons. They encode numerous diverse and complex isoforms, frequently generated by alternate splicing, alternate promoter usage, or alternate polyadenylation (18). In many cases, alternate promoters direct combinations of exons encoding diverse tissue-specific, cell type-specific, developmental-stage-specific, and/or differentiation stage-specific isoforms (6, 12, 13, 19, 21-24, 44, 52, 62, 78, 86, 108, 112-114). As such, erythrocyte membrane protein genes serve as excellent models of complex gene locus structure and function. Study of the regulation of erythrocyte membrane protein genes has been hampered by both their large size and their complexity. The limited information available on their regulation in erythroid cells consists primarily of in vitro studies of core erythroid cell promoters (6, 21, 22, 51, 53).
Advances in technology have permitted the rapid identification of critical regions of gene regulation on a genome-wide scale, identifying regions bound by transcription factors and other DNA-associated proteins, delineating regions of various histone architectures, and revealing the methylation status of regions of DNA. Techniques available for mapping protein-DNA interactions in vivo couple chromatin immunoprecipitation (ChIP) with microarrays that contain regions of genomic DNA (ChIP-chip) or with massively parallel DNA sequencing (ChIP-seq). These technologies are ideally suited for application to the study of the regulation of the large and complex membrane protein gene loci in erythroid cells.
To begin to understand the intricate interplay of transcription, dynamic chromatin architecture, transcription factor binding, and genomic organization in regulation of erythrocyte membrane protein genes, we performed ChIP-chip and ChIP-seq with erythroid and nonerythroid cells. Unexpectedly, most regions of GATA-1 and NF-E2 binding were remote from gene promoters and transcriptional start sites (TSS), located primarily in introns. Cooccupancy with FOG-1, SCL, and MTA-2 was found at all regions of GATA-1 binding. Interestingly, cooccupancy of SCL and MTA-2 was also found at regions of NF-E2 binding. Cooccupancy of GATA-1 and NF-E2 was found frequently. A common signature of histone H3 trimethylation at lysine 4 (H3Me3K4), GATA-1, NF-E2, FOG-1, SCL, and MTA-2 binding and consensus GATA-1-E-box binding motifs located 34 to 90 bp away from NF-E2 binding motifs was found frequently in erythroid cell-expressed genes.
These results provide insights into our understanding of membrane protein gene regulation in erythropoiesis and the regulation of complex genetic loci in erythroid and nonerythroid cells and identify numerous candidate regions for mutations associated with membrane-linked hemolytic anemia.
K562 cells (chronic myelogenous leukemia in blast crisis with erythroid characteristics, ATCC, CCL 243) were maintained in RPMI 1640 medium with 10% fetal calf serum. HeLa cells (epithelial-like carcinoma, cervix, CCL 2) were maintained in Eagle's minimal essential medium with 10% fetal calf serum. Human CD34-selected stem and progenitor cells were obtained from the Yale Center of Excellence in Molecular Hematology Cell Core and cultured in StemSpan SF expansion medium (StemSpan 09650) with estradiol (100 ng/ml), dexamethasone (10 ng/ml), human transferrin (200 ng/ml), insulin (10 ng/ml), Flt3 ligand (100 ng/ml), stem cell factor (100 ng/ml), interleukin-3 (50 ng/ml), interleukin-6 (20 ng/ml), insulin-like growth factor 1 (50 ng/ml), and erythropoietin (3 U/ml) for 9 to 14 days (63, 77). Fluorescence-activated cell sorter analysis was used to analyze the cellular expression of CD71 (transferrin receptor) and CD235a (glycophorin A). Magnetic bead selection for CD71 (MACS 130-046-201; Miltenyi Biotec) and CD235a (MACS 130-050-501; Miltenyi Biotec) was used to purify an R3/R4 population of erythroid cells (117).
RNA prepared from K562 cells, HeLa cells, and primary, cultured human erythroid cells (RNeasy mini kit; Qiagen) was treated with amplification-grade DNase I and reverse transcribed with an oligo(dT) primer by use of a SuperScript first-strand synthesis system (Invitrogen). Primer pairs for each of the 15 target membrane protein genes were designed using Primer3 software. Reverse transcription products were amplified by real-time PCR using an iCycler (Bio-Rad) with the primers listed in Table S1 in the supplemental material. PCR specificity was verified by assessing amplification product melting curves. Real-time PCR data were normalized to an ornithine decarboxylase antizyme 1 (OAZ1) mRNA control. The changes in specific mRNA levels were calculated using the ΔΔCT method (where CT is threshold cycle), with results presented as means ± standard errors of the means. Results were normalized to the gene with the highest expression level in each group (100). Triplicate analyses were performed for each target gene.
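The ΔΔCT normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes an amplification efficiency of 2 (one doubling per cycle), uses OAZ1 as the reference gene, and treats the highest-expressing gene in a group as the calibrator set to 100. The function name `relative_levels` and the CT values in the usage example are hypothetical.

```python
from statistics import mean


def relative_levels(ct_by_gene, ct_reference):
    """Return OAZ1-normalized expression scaled so the highest gene equals 100.

    ct_by_gene: dict mapping gene name -> list of replicate CT values
    ct_reference: list of replicate CT values for the reference gene (OAZ1)
    Assumes an amplification efficiency of 2 (one doubling per cycle).
    """
    ref = mean(ct_reference)
    # ΔCT = CT(target) - CT(reference); a lower ΔCT means higher expression.
    delta_ct = {gene: mean(cts) - ref for gene, cts in ct_by_gene.items()}
    # The calibrator is the gene with the lowest ΔCT (highest expression).
    calibrator = min(delta_ct.values())
    # ΔΔCT = ΔCT(gene) - ΔCT(calibrator); relative level = 100 * 2^(-ΔΔCT).
    return {gene: 100 * 2 ** -(d - calibrator) for gene, d in delta_ct.items()}


levels = relative_levels(
    {"SLC4A1": [20.0, 20.1, 19.9], "ANK1": [22.0, 22.1, 21.9]},
    ct_reference=[18.0, 18.0, 18.0],
)
print(levels)
```

Choosing the highest-expressing gene as calibrator makes every other gene a fraction of 100, which matches the per-group normalization described in the text.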
ChIP assays were performed as previously described (20), with minor variations (ChIP kit 17-295; Upstate/Millipore). Briefly, ~1 × 10⁸ cells were cross-linked with 1% formaldehyde for 10 min at 37°C. Cells were washed twice in cold phosphate-buffered saline with protease inhibitors and then placed in hypotonic buffer for 20 min, followed by Dounce homogenization to isolate cross-linked nuclei. Nuclei were placed in sodium dodecyl sulfate lysis buffer for 30 min and then sonicated on ice with 21 10-s pulses, each followed by a 10-s rest period. Samples were diluted and then precleared at 4°C for 60 min with protein A or G agarose beads. Samples were immunoprecipitated for 12 to 18 h on a rotating platform at 4°C. Antibodies utilized for immunoprecipitation included H3Me3K4 (ab8580; Abcam), RNA polymerase II (RNAPII) (sc-899; Santa Cruz), GATA-1 (sc-265; Santa Cruz), and NF-E2 (sc-22827; Santa Cruz) antibodies, as well as nonspecific rabbit immunoglobulin G (IgG) (sc-2091; Santa Cruz). Antibody-bound DNA-protein complexes were collected using protein A or G agarose beads, then washed and eluted from the beads according to the manufacturer's instructions. Cross-linking of DNA-protein adducts was reversed by incubation at 65°C for 4 h. The resulting DNA was purified with a QIAquick PCR purification kit (Qiagen) and amplified with a GenomePlex whole-genome amplification kit (Sigma), each according to the manufacturer's instructions. Amplified DNA was purified using a QIAquick PCR purification kit (Qiagen) before amplification, labeling, and hybridization to arrays or before quantitative PCR (qPCR) analyses.
For ChIP-seq experiments, 1 × 10⁸ K562 or HeLa cells were cross-linked with 1% formaldehyde for 10 min at room temperature, followed by Dounce homogenization. Cross-linked nuclei were isolated and sonicated to obtain chromatin with DNA fragments averaging ~500 bp. For each ChIP, 20 μg of antibody or the appropriate control IgG species was used. The antigen-antibody complex was captured on protein G beads, washed four times with radioimmunoprecipitation assay buffer, and then washed with phosphate-buffered saline. The DNA-protein complex was eluted from the protein G beads with 1% sodium dodecyl sulfate at 65°C, and cross-linking of DNA-protein adducts was reversed by overnight incubation at 65°C. After proteinase K and RNase digestion of the reverse-cross-linked sample, DNA was extracted with phenol-chloroform, precipitated with ethanol, suspended in 50 ml of 10 mM Tris-EDTA, pH 8.0, and used for Solexa deep sequencing and qPCR validation.
Additional ChIP experiments were performed with K562 cell chromatin using antibodies against FOG-1 (sc-9361; Santa Cruz), SCL (sc-12984; Santa Cruz), and MTA-2 (sc-9447; Santa Cruz) as described above for ChIP-chip, with the following modifications. Prior to the addition of formaldehyde, ethylene glycol-bis, dimethyl adipimidate, and di(N-succinimidyl) glutarate were added to the culture medium at a concentration of 0.15 mM, and the cells were incubated at room temperature for 30 min to maximize protein-protein cross-linking (116). The remainder of the ChIP assay was completed as described above.
A custom, high-density genomic tiling array containing probes for the genomic regions of 100 genes expressed in erythroid cells, including the 15 target membrane protein genes, was designed with NimbleGen Systems software. The probes were designed to have a melting temperature of ≥76°C; most were ~50 bp long, and some were lengthened to raise their melting temperature to 76°C. Probes were tiled at 10- to 100-bp spacing (typically ~65 bp), and regions of repetitive DNA were excluded. Ten to 100 kb of flanking DNA was included on the chip for each locus, with 50 to 100 kb of flanking DNA included for the 15 target membrane protein genes. Labeling and hybridization of DNA samples for ChIP-chip analysis were performed by NimbleGen Systems.
Approximately 500 ng of ChIP DNA was run on a 1.5% agarose gel to size select ChIP DNA in the 200- to 300-bp range. The size-selected ChIP DNA was purified using a gel extraction kit (Qiagen) and processed for Solexa sequencing according to the manufacturer's protocol. Briefly, size-selected ChIP DNA was end repaired using an End-IT DNA end repair kit (ER0720; Epicentre), followed by addition of an adenine base at the 3′ end by Klenow reaction and ligation of Solexa adaptors. The modified DNA was then PCR amplified with an initial denaturation at 98°C for 30 s, followed by 15 cycles of denaturation at 98°C for 30 s, annealing at 65°C for 30 s, and extension at 72°C for 30 s, with a final extension at 72°C for 5 min. Amplified ChIP DNA was then size selected on a 1.5% agarose gel, purified by using a gel extraction kit (Qiagen), and subjected to deep sequencing. An average of 5 million reads was obtained from each sequencing run.
Data obtained from ChIP-chip experiments were analyzed using the Tamalpais peak calling algorithm (4) to determine areas of DNA-protein interaction. Paired control and immunoprecipitation data files were processed with R Smudgekit version 2.4 software (Jay Emerson, unpublished) to remove chip hybridization artifacts. Ratio GFF files of control and experimental data were generated, and the three replicate data files were processed together on the Tamalpais web server to generate candidate GATA-1 and NF-E2 binding regions. Peaks identified at all four levels of stringency (L1 to L4) were subjected to additional analyses: although levels L1, L2, and L3 are more stringent and identify binding regions with higher accuracy than L4, L4 peaks also often represent valid binding sites.
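Smudgekit and Tamalpais are specialized tools whose internals are not described here. As a rough sketch of the kind of ratio file such a pipeline consumes, assuming each probe has paired IP and control intensities, per-probe log2 ratios can be written in GFF's nine-column layout. The `ratio_gff_lines` helper, the `log2_ratio` feature type, and the empty attribute column are hypothetical choices, not the NimbleGen or Tamalpais file specification.

```python
import math


def ratio_gff_lines(probes, seqid, source="ChIP-chip"):
    """Yield tab-separated GFF lines with a log2(IP/control) score per probe.

    probes: iterable of (start, end, ip_signal, control_signal) tuples,
            with 1-based inclusive coordinates and positive intensities.
    """
    for start, end, ip, control in probes:
        log2_ratio = math.log2(ip / control)
        # GFF columns: seqid, source, feature type, start, end, score,
        # strand, phase, attributes ("." marks an empty column).
        yield "\t".join([
            seqid, source, "log2_ratio", str(start), str(end),
            f"{log2_ratio:.3f}", ".", ".", ".",
        ])


for line in ratio_gff_lines([(1001, 1050, 400.0, 100.0), (1066, 1115, 90.0, 100.0)], "chr8"):
    print(line)
```

A log2 ratio is symmetric around zero, so enriched probes are positive and depleted probes negative, which is convenient for downstream peak calling.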
The conservation of each candidate GATA-1 and NF-E2 binding region among the corresponding genes of placental mammals was determined using the UCSC hg18 genome browser database (43) with the 44-way placental-mammal PhastCons track (91). For each potential region of NF-E2 and GATA-1 binding, the maximum PhastCons score was determined using the Galaxy aggregate function (25, 46, 48). The UCSC genome browser 7X regulatory potential table was used to determine the maximum regulatory potential (RP) score for each region (16, 48). TESS, the Transcription Element Search System (http://www.cbil.upenn.edu/cgi-bin/tess/tess), was used to identify potential GATA and NF-E2 binding motifs in regions of GATA-1 and NF-E2 occupancy. Maximum PhastCons and RP scores were also determined for each GATA-1 and NF-E2 binding motif present in the regions of GATA-1 and NF-E2 occupancy predicted by Tamalpais.
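The aggregate step amounts to taking, for each candidate region, the maximum of the per-base PhastCons (or RP) scores that fall inside it. A minimal stand-alone sketch, assuming the scores are already loaded as sorted positions with matching values (the function name and inputs are hypothetical, and this is not Galaxy's own code):

```python
from bisect import bisect_left, bisect_right


def max_score_per_region(positions, scores, regions):
    """Return the maximum per-base score within each (start, end) region.

    positions: sorted base coordinates that carry a score (e.g., a PhastCons track)
    scores: score values aligned with positions
    regions: (start, end) tuples with inclusive coordinates
    Regions without any scored base yield None.
    """
    maxima = []
    for start, end in regions:
        lo = bisect_left(positions, start)   # first scored base >= start
        hi = bisect_right(positions, end)    # one past the last scored base <= end
        window = scores[lo:hi]
        maxima.append(max(window) if window else None)
    return maxima


maxima = max_score_per_region(
    positions=[1, 2, 3, 10, 11],
    scores=[0.1, 0.9, 0.3, 0.5, 0.2],
    regions=[(1, 3), (4, 9), (10, 20)],
)
print(maxima)
```

Regions whose maximum exceeds a cutoff, such as the PhastCons threshold of 0.8 applied in the Results, can then be flagged with a simple comparison.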
For ChIP-seq analyses of RNAPII binding, files for data visualization were created by the Yale CEGS Solexa processing pipeline. Each mapped sequence read was extended to a 200-bp window. The number of overlapping windows at each window start point was counted to generate the value displayed via an SGR file format.
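The read-extension and counting step can be sketched as below. This assumes each read is given by its 5′ coordinate and strand, plus-strand reads extend downstream and minus-strand reads upstream, and the SGR output is the three-column chromosome/position/value text format. It illustrates the idea under those assumptions and is not the Yale CEGS pipeline itself; `sgr_rows` is a hypothetical name.

```python
from bisect import bisect_right


def sgr_rows(chrom, reads, window=200):
    """Extend reads to fixed-length windows and count overlaps at each window start.

    reads: (five_prime_pos, strand) tuples with 0-based coordinates.
    Each read becomes the half-open window [start, start + window).
    Returns SGR lines: chromosome, position, number of overlapping windows.
    """
    starts = []
    for pos, strand in reads:
        # Plus-strand reads extend toward higher coordinates, minus-strand reads toward lower ones.
        start = pos if strand == "+" else pos - window + 1
        starts.append(max(start, 0))
    starts.sort()
    rows = []
    for s in sorted(set(starts)):
        # A window beginning at w covers s when s - window < w <= s.
        count = bisect_right(starts, s) - bisect_right(starts, s - window)
        rows.append(f"{chrom}\t{s}\t{count}")
    return rows


for row in sgr_rows("chr8", [(100, "+"), (349, "-"), (400, "+")]):
    print(row)
```

Extending each read to roughly the expected fragment length approximates where the immunoprecipitated fragment lay, so reads from both strands pile up over the same binding site.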
Primers were designed for each binding region in the target genes identified by the Tamalpais peak calling algorithm at stringency levels L1 to L4 (see Table S2 in the supplemental material). Immunoprecipitated DNA was analyzed by quantitative real-time PCR (iCycler; Bio-Rad) using the appropriate primers. SYBR green fluorescence was measured in 25-μl PCRs, and the amount of product was determined relative to a standard curve generated from a titration of input chromatin. Amplification of a single product was confirmed by dissociation curve analysis. Enrichment of binding sites in target DNA over input was determined using ΔΔCT analysis. Results are presented as means ± standard errors of the means, with triplicate analyses performed for each binding site. Student's t test was used to compare the enrichment obtained with each specific antibody to that obtained with nonspecific IgG. P values of <0.05 were considered significant.
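One common way to express site enrichment over input from qPCR CT values, followed by the Student's t comparison against IgG, is sketched below. The exact ΔΔCT formulation is not specified beyond the text above, so this assumes an amplification efficiency of 2, ignores the input dilution factor, and computes only the pooled-variance t statistic; a P value can then be read from a t distribution, for example with `scipy.stats.ttest_ind`. Function names and CT values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def enrichment_over_input(ct_ip, ct_input):
    """Per-replicate fold enrichment over input: 2^-(CT_IP - CT_input)."""
    return [2 ** -(ip - inp) for ip, inp in zip(ct_ip, ct_input)]


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


gata1 = enrichment_over_input([24.0, 24.2, 23.8], [27.0, 27.0, 27.0])
igg = enrichment_over_input([27.5, 27.4, 27.6], [27.0, 27.0, 27.0])
t, df = student_t(gata1, igg)
print(f"t = {t:.2f}, df = {df}")
```

Pooling the variances corresponds to the classic Student's test named in the text, which assumes equal variances in the antibody and IgG groups.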
Binding reactions were carried out as previously described (108). Competitor oligonucleotides were added at a 100-fold molar excess. Antibodies to GATA-1 were obtained from Santa Cruz Biotechnology (M-20 and sc-1234; Santa Cruz, CA). Oligonucleotides used in gel mobility shift analyses are listed in Table S3 in the supplemental material.
The erythrocyte membrane contains 15 major proteins and hundreds of minor ones (66). The major membrane proteins are known to be critical for membrane assembly, structure, and function, and mutations in several of their genes result in inherited hemolytic anemia. Because little is known about the regulation of these genes (Table 1) in erythroid cells, ChIP-chip or ChIP-seq was performed to begin to identify and characterize the regulatory elements controlling their expression.
Quantitative reverse transcription-PCR was utilized to confirm that all 15 membrane genes were expressed both in K562 cells and in primary, cultured human erythroid cells. As expected, mRNA from all 15 genes was detectable both in K562 cells and in an R3/R4 population (117) of primary, cultured human erythroid cells. In HeLa cells, mRNA was detectable for 11 of 15 membrane genes, although generally at much lower levels than in K562 or primary, cultured erythroid cells (Fig. 1).
RNAPII preinitiation complex assembly is a feature of all protein-coding eukaryotic gene promoters. This process has been studied on a genome-wide scale via ChIP-chip and ChIP-seq with several cell types (8, 47, 83). RNAPII occupancy was examined across the 15 erythrocyte membrane genes in erythroid (K562) and nonerythroid (HeLa) cells by using ChIP-seq (Fig. 2; also see Fig. S1 in the supplemental material). In K562 cells, RNAPII binding was detected at a single promoter or putative promoter in 4 of 15 genes and at multiple promoters/putative promoters in 10 of 15 genes. The genes with multiple RNAPII peaks in K562 chromatin included ACTB, TPM3, SPTB, ADD2, ANK1, EPB41, EPB49, ERMAP, ICAM4, and SLC4A1. RNAPII binding was not found in the region of the putative TSS (identified by the RefSeq Genes track in the UCSC genome browser) of the TMOD1 gene in K562 cells, but a peak of RNAPII enrichment was found ~20 kb 5′ of the putative TSS. This corresponds to the TSS of “exon 0,” an alternate transcript of TMOD1 (115).
Interestingly, several genes had intragenic peaks of RNAPII binding. For example, both the ANK1 and EPB41 loci had significant intragenic peaks of RNAPII binding that did not correlate with any known TSS (see Fig. S1 in the supplemental material). Both ANK1 and EPB41 are large complex loci which encode multiple isoforms with alternate 5′ ends. There are two expressed sequence tags in the intragenic region of RNAPII binding in the ANK1 locus (GenBank accession no. BY800201 and DW421256) and one in the EPB41 locus (GenBank accession no. EH345598). The intragenic RNAPII peak in these genes likely represents either the first exon of an alternate 5′ isoform or a separate gene embedded within the locus.
Previous studies have shown that over 10% of genes with detectable RNAPII/preinitiation complex binding at their promoter regions do not produce detectable transcripts, attributed to promoter stalling or pausing (54, 60, 71, 85). Transcripts of all 15 membrane protein genes were detected in K562 cell mRNA; thus, this phenomenon was not observed with these genes.
In nonerythroid (HeLa) cells, RNAPII binding was detected at a single promoter/putative promoter/TSS in 7 of 15 genes and at multiple promoters/putative promoters/TSS of three genes, ANK1, ACTB, and EPB41. Five genes with undetectable or nearly undetectable mRNA levels in HeLa cells (Fig. 1) showed no RNAPII binding across their loci (SPTA, EPB42, EPB49, SPTB, and ADD2).
H3Me3K4 is both necessary for and a specific marker of transcriptionally active genes (72, 87, 88, 95). ChIP-chip with a custom-designed NimbleGen DNA microarray was used to analyze H3Me3K4 enrichment across the 15 membrane protein genes in erythroid (K562) and nonerythroid (HeLa) cells. Microarray probes corresponding to the erythrocyte membrane protein genes spanned 3.35 Mb of the human genome on 11 different chromosomes (Table 1). Fourteen of 15 genes were enriched for H3Me3K4 at a promoter, putative promoter, or TSS in K562 cells (Fig. 2; also see Fig. S1 in the supplemental material). As with RNAPII occupancy, TMOD1 was not enriched at the putative TSS but showed a large peak of H3Me3K4 enrichment 20 kb 5′ of the putative TSS, corresponding to the TSS of “exon 0.” Two compact genes, β-actin and ICAM4, showed H3Me3K4 enrichment not only in the promoter region but also throughout the gene body. In many cases, H3Me3K4 enrichment at the promoter/TSS was cell type specific, with some genes showing dramatically lower or undetectable H3Me3K4 enrichment in HeLa cell chromatin (Fig. 2; also see Fig. S2 in the supplemental material).
The genetic regulation of the erythrocyte membrane protein genes is complex, and the expression of these genes is influenced by many cis- and trans-acting factors. To gain insight into some of the trans-acting factors involved in the transcriptional regulation of these genes, regions of GATA-1 and NF-E2 binding were studied. GATA-1 is a zinc finger transcription factor that promotes erythroid cell, megakaryocyte, and mast cell development (70, 80, 81, 90, 93, 96, 106). Able to act as either an activator or a repressor, GATA-1 exerts its effects on target genes through a number of molecular mechanisms, including cofactor recruitment (5, 30, 33, 41) and higher-order chromatin organization (39, 45). NF-E2 is a heterodimer of an erythroid cell-specific 45-kDa subunit and a ubiquitously expressed 18-kDa subunit (1, 36). Like GATA-1, NF-E2 plays critical roles in erythropoiesis and megakaryocytopoiesis (74) and influences higher-order chromatin structure (3, 27). Previous studies employing ChIP-chip technology to study GATA-1 binding have been limited to the region of human chromosome 11 containing the β-globin locus (34), the region of mouse chromosome 7 in and around the β-globin locus (10, 97), and the region of murine chromosome 6 in and around the GATA-2 locus (30). To date, p45 NF-E2 binding has not been studied using high-throughput genomic strategies.
GATA-1 binding in the 15 erythrocyte membrane protein genes was studied. Tamalpais identified 19 regions of GATA-1 occupancy in K562 chromatin (Table 2 and Fig. 2; also see Fig. S3 in the supplemental material), with predicted regions of GATA-1 occupancy averaging 539 bp in length (range, 300 to 1,200 bp). All 19 regions of GATA-1 occupancy in K562 chromatin were validated by use of ChIP-qPCR (Fig. 3A). K562 cells have been utilized as models of erythroid cells in numerous studies of gene structure and function. However, clonal variation in copy number produced by karyotypic abnormalities dictates that validation with primary erythroid cells be performed to confirm biologic relevance (49). The regions of GATA-1 occupancy identified for membrane protein genes in K562 cells were examined in chromatin from R3/R4 stage primary, cultured human erythroid cells by use of ChIP-qPCR (117). All 19 GATA-1 regions identified in K562 chromatin were also occupied in chromatin of R3/R4 primary erythroid cells (Fig. 3A).
GATA-1 binding was identified in 8 of 15 major erythrocyte membrane protein genes. There are several possible explanations why GATA-1 binding was not detected in the remaining seven genes. One possibility is that GATA-1 binding occurs at an earlier time point in erythrocyte differentiation (i.e., R1/R2) than our assay examined (R3/R4). Another possibility is that sites of low-level GATA-1 occupancy were not identified by the Tamalpais peak calling algorithm, which is very stringent in order to provide a low false-discovery rate. Due to the high stringency of the Tamalpais algorithm, it is likely, as reported by others utilizing similar genomic technologies (46), that not every region of GATA-1 binding in the major erythrocyte membrane genes was identified. Finally, it is possible that some of these erythrocyte genes exhibit GATA-1-independent expression.
H3Me3K4 has been associated with transcriptionally active chromatin, promoters, and enhancers. In the β-globin locus control region (LCR), disruption of GATA-1 binding to hypersensitive site 2 (HS2) leads to decreased levels of H3Me3K4, implying that GATA-1 is involved in the establishment of active histone methylation patterns (11, 84). The 19 regions of GATA-1 occupancy identified in the membrane genes were tested for H3Me3K4 enrichment in erythroid (K562) and nonerythroid (HeLa) cells. In K562 chromatin, 16 of 19 regions of GATA-1 occupancy (Fig. 3B) had significant enrichment for H3Me3K4. In nonerythroid cell (HeLa) chromatin, only 6 of 19 regions had significant enrichment for H3Me3K4. These six regions of GATA-1 occupancy were present in four genes: ADD3, ERMAP, EPB41, and TMOD1. mRNA transcripts of all four genes could be detected in nonerythroid (HeLa) cells but at significantly lower levels than in erythroid (K562) cells. Correspondingly, in these regions, the degree of H3Me3K4 enrichment in nonerythroid cell (HeLa) chromatin (4.05 ± 0.45 relative units) was significantly lower than that in erythroid cell (K562) chromatin (9.55 ± 1.90 relative units).
GATA-1 interacts with numerous proteins and protein complexes during hematopoiesis, including but not limited to FOG-1, SP/X-KLF proteins, the NuRD complex, SCL complexes, RNAPII, RUNX1, CBP/P300, Cbf-β, Fli-1, PU.1, the MeCP1 complex, the ACF/WRF complex, and the Mediator complex. FOG-1 is a transcriptional cofactor that plays a critical role in erythroid cell and megakaryocyte differentiation. It is found at nearly all sites of GATA-1 binding in erythroid cells, where it heterodimerizes with GATA-1 and confers either activating or repressive functions (14, 33, 40, 41, 57, 76). Consistent with this, FOG-1 binding was demonstrated at all 19 regions of GATA-1 occupancy in erythrocyte membrane protein genes (Fig. 3C).
SCL is a basic helix-loop-helix transcription factor essential for hematopoiesis (26, 75, 89, 102). It is frequently found at activating GATA-1 sites and is depleted at repressive GATA-1 sites (97). At some, but not all, sites of SCL-GATA-1 cooccupancy, an E box (the consensus SCL binding motif) and a GATA-1 site occur in tandem, 9 to 12 bp or even up to 27 bp apart (17, 56). MTA-2 is a member of the NuRD (nucleosome remodeling and deacetylase) complex (118). It is recruited by FOG-1 to sites of GATA-1 binding and participates in GATA-1-mediated repression (33). Sites of GATA-1 occupancy in membrane protein genes were tested for SCL and MTA-2 binding. Consistent with a previous report (97), all regions of GATA-1 occupancy were also occupied by SCL (Fig. 3C). Interestingly, MTA-2, although a member of the NuRD complex described as being recruited by FOG-1 to GATA-1-repressed genes, was found at all sites of GATA-1 occupancy in these erythroid cell-expressed membrane protein genes. The function of the NuRD complex at these GATA-1 sites is unclear. A similar observation has recently been described (cited as unpublished data in reference 97). Of note, GATA-1 site 11 had lower levels of enrichment for FOG-1, SCL, and MTA-2 than the other sites of GATA-1 occupancy. Although enrichment at site 11 was quantitatively lower, it was statistically significant for all three factors (P = 0.010, P = 0.001, and P = 0.033, respectively).
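A scan for tandem E box-GATA pairs like those described above can be sketched with regular expressions. This sketch assumes the E box consensus CANNTG, uses the minimal GATA motif, measures spacing from the last base of the E box to the first base of the GATA motif, and scans one strand only; published composite-motif definitions differ in how spacing is counted, so the 9-to-12-bp default is illustrative. `ebox_gata_pairs` is a hypothetical helper.

```python
import re

EBOX = re.compile(r"(?=CA[ACGT]{2}TG)")  # lookahead so overlapping E boxes are all reported
GATA = re.compile(r"(?=GATA)")


def ebox_gata_pairs(seq, min_gap=9, max_gap=12):
    """Return (ebox_start, gata_start, gap) for E boxes followed by a GATA motif.

    gap = number of bases between the end of the 6-bp E box and the start of GATA.
    Coordinates are 0-based; only the given strand is scanned.
    """
    seq = seq.upper()
    gata_starts = [m.start() for m in GATA.finditer(seq)]
    pairs = []
    for m in EBOX.finditer(seq):
        ebox_end = m.start() + 6
        for g in gata_starts:
            gap = g - ebox_end
            if min_gap <= gap <= max_gap:
                pairs.append((m.start(), g, gap))
    return pairs


print(ebox_gata_pairs("CAGCTG" + "A" * 10 + "GATA" + "TT"))
```

Widening `max_gap` to 27 would capture the longer spacings mentioned in the text at the cost of more chance matches.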
ChIP-chip analyses identified 19 regions of GATA-1 binding in the 15 major membrane protein genes. These genes and their flanking DNA span 3.35 Mb of the human genome. This 3.35 Mb of DNA contains a large number of potential GATA-1 binding motifs: the minimal GATA-1 binding site, GATA, was found 15,875 times, and the GATA-1 consensus site, WGATAR, was found 6,269 times.
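Counting degenerate motifs such as WGATAR across a sequence is a small exercise in translating IUPAC codes to regular expressions. The text does not state whether one or both strands were counted, so this sketch assumes both strands by default and counts overlapping matches; `count_motif` and the test sequence are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTWSRYKMN", "TGCAWSYRMKN")


def reverse_complement(motif):
    return motif.translate(COMPLEMENT)[::-1]


def count_motif(seq, motif, both_strands=True):
    """Count (overlapping) occurrences of an IUPAC motif in seq."""
    seq = seq.upper()

    def hits(m):
        # A zero-width lookahead lets overlapping matches be counted.
        return len(re.findall("(?=" + "".join(IUPAC[c] for c in m) + ")", seq))

    n = hits(motif)
    rc = reverse_complement(motif)
    if both_strands and rc != motif:  # skip palindromes to avoid double counting
        n += hits(rc)
    return n


seq = "AGATAACCTTATCT"
print(count_motif(seq, "GATA"), count_motif(seq, "WGATAR"))
```

Because the minimal motif has only four specified bases, it is expected by chance far more often than the six-base consensus, which is consistent with the roughly 2.5-fold difference in counts reported above.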
Functional regions of GATA-1 binding are expected to be more conserved than nonfunctional sites (10, 31, 48). Recent studies have used PhastCons analyses and RP scores to predict whether a region of DNA contains a cis-regulatory module (CRM) (10, 46, 48). PhastCons applies a phylogenetic hidden Markov model to aligned genomic sequences to estimate the probability that each nucleotide is conserved (92). The UCSC genome browser 44-way placental-mammal PhastCons scores were determined for each of the 19 regions of GATA-1 binding. Thirteen of 19 (68%) GATA-1 binding regions had PhastCons scores of >0.8, suggesting that they contain a functional CRM (see Table S4 in the supplemental material).
An alternative way to predict the presence of CRMs is the RP score, which evaluates whether a DNA sequence has patterns more similar to those of regulatory elements or to those of neutral DNA. Positive RP scores (RP scores of >0) (48) identify conserved regions predicted to contain a functional CRM. Fourteen of 19 regions (73%) of GATA-1 binding in erythrocyte membrane genes had positive RP scores (see Table S4 in the supplemental material).
Regions of protein occupancy identified by ChIP assays do not always contain consensus DNA-protein binding motifs. Genome-wide ChIP-based studies of various transcription factors and other regulatory proteins have highlighted this observation. In some cases, corresponding DNA-protein binding motifs were identified infrequently for some proteins, e.g., E2F1 (4) and OCT4 (4, 38), and very frequently for others, e.g., CTCF (46) and SCL (107). In erythrocyte membrane protein genes, the minimal consensus GATA-1 binding motif, GATA, was found in 18 of 19 regions, with only a single region, located in the promoter of the protein 4.1R gene, lacking any GATA-1 binding motifs. The canonical GATA-1 binding motif, (A/T)GATA(A/G), was present in 15 regions of GATA-1 occupancy.
ChIP assays identify regions where DNA and proteins interact in vivo, but they lack the resolution to define the precise site of DNA-protein interaction. As the minimal GATA-1 binding motif, GATA, is common, most regions of GATA-1 occupancy predicted by Tamalpais had several potential sites of GATA-1 binding. The in vitro ability of each of these potential binding sites to bind GATA-1 was analyzed using electrophoretic mobility shift assays (EMSA). With GATA used as the minimal GATA-1 binding sequence, there were 67 potential GATA-1 binding motifs in the 18 regions of GATA-1 occupancy. Double-stranded, 20-bp oligonucleotide probes corresponding to each of the 67 potential GATA-1 binding motifs were created (see Table S3 in the supplemental material) and used for EMSA with K562 cell nuclear extracts. Nineteen GATA-1 motifs with in vitro GATA-1 binding ability were identified in 10 regions of GATA-1 occupancy (see Table S5 in the supplemental material). Four regions of GATA-1 occupancy contained multiple GATA-1 binding motifs which were able to bind GATA-1 by EMSA: intron 3 of the α-spectrin gene (four positive binding sites), the 3′ flanking adjacent region of the ERMAP gene (two positive binding sites), intron 1 of the protein 4.1R gene (four positive binding sites), and intron 1/exon 2 of the protein 4.1R gene (two positive binding sites).
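Designing one EMSA probe per candidate motif, as described above, can be sketched as taking a fixed-length window roughly centered on each GATA occurrence (or its reverse-strand form, TATC) within a region. The centering rule, the inward shift at region ends, and the inclusion of the reverse orientation are assumptions for illustration; `gata_probes` is a hypothetical helper.

```python
import re


def gata_probes(region_seq, region_start, length=20):
    """Return (genomic_position, probe_sequence) for each GATA/TATC in a region.

    Each probe is a window of `length` bases roughly centered on the 4-bp motif,
    shifted inward when the motif lies near a region boundary.
    region_start is the genomic coordinate of region_seq[0].
    """
    seq = region_seq.upper()
    flank = (length - 4) // 2
    probes = []
    for m in re.finditer(r"(?=GATA|TATC)", seq):
        start = max(0, m.start() - flank)
        end = min(len(seq), start + length)
        start = max(0, end - length)  # keep full length if clipped at the 3' end
        probes.append((region_start + m.start(), seq[start:end]))
    return probes


region = "A" * 8 + "GATA" + "C" * 8
for pos, probe in gata_probes(region, region_start=1000):
    print(pos, probe)
```

Centering the motif leaves similar flanking sequence on both sides, so differences in EMSA binding are less likely to reflect probe-end effects.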
Nine regions of GATA-1 occupancy did not contain any motifs that bound GATA-1 in vitro. Fisher's exact test identified no significant differences between regions of GATA-1 occupancy with and without in vitro GATA-1 binding, including in the number of regions with PhastCons scores of <0.8 (P = 0.6), the locations of binding sites (e.g., intron, 5′ flanking region, 3′ flanking region) (P = 0.11), and the number of regions with a Tamalpais stringency level of L4 (P = 0.99) (see Table S5 in the supplemental material). In addition, PhastCons scores were calculated for each of the 67 potential GATA-1 binding motifs; these did not differ significantly between EMSA-positive and EMSA-negative motifs (P = 0.46, t test).
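Fisher's exact test on a 2×2 table, as used above to compare regions with and without in vitro GATA-1 binding, can be computed directly from hypergeometric probabilities. This stand-alone sketch uses the common two-sided convention of summing all tables with the same margins that are no more probable than the observed one; `scipy.stats.fisher_exact` provides the same test. The example table is hypothetical, not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum every table no more probable than the observed one (small tolerance for rounding).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


print(fisher_exact_two_sided(3, 1, 1, 3))
```

An exact test suits comparisons like these because the groups are small (10 versus 9 regions), where chi-square approximations are unreliable.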
Tamalpais identified 18 regions of NF-E2 occupancy in K562 chromatin of the 15 membrane protein genes (Table 3 and Fig. 2; also see Fig. S4 in the supplemental material), with predicted regions of NF-E2 binding averaging 672 bp (range, 250 to 1,200 bp). Seventeen of 18 regions of NF-E2 occupancy in K562 chromatin were validated using ChIP-qPCR (Fig. 4A). All validated NF-E2 sites were also occupied in chromatin from R3/R4 primary, cultured erythroid cells (Fig. 4A).
As with GATA-1, disruption of NF-E2 binding to HS2 in the β-globin LCR leads to decreased levels of H3Me3K4, implying that NF-E2 may also be involved in establishing active histone methylation patterns (11, 84). The 18 sites of NF-E2 occupancy identified in erythrocyte membrane protein genes were tested for the presence of H3Me3K4 in both erythroid (K562) and nonerythroid (HeLa) cells. As with regions of GATA-1 occupancy, regions of NF-E2 binding in erythroid cells were associated with enrichment for H3Me3K4. Sixteen of 19 NF-E2 sites (Fig. 4B) demonstrated significant enrichment for H3Me3K4 in erythroid (K562) chromatin, whereas only 9 of 18 regions showed enrichment for H3Me3K4, and at lower levels, in nonerythroid (HeLa) chromatin.
PhastCons scores from the UCSC Genome Browser 44-way placental mammal alignment were calculated for each of the 18 regions of NF-E2 binding. Seventy-two percent (13/18) of the regions had PhastCons scores of >0.8, indicating that they are likely to contain a functional CRM (see Table S6 in the supplemental material). In addition, 83% of the NF-E2 regions (15/18) had positive RP scores.
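Tallying how many regions pass a conservation cutoff is simple arithmetic. The sketch below assumes, hypothetically, that each region is summarized by the maximum per-base PhastCons value it contains (the summary statistic used for Table S6 is not stated in this section) and that a region counts only if its score is strictly greater than 0.8; the function name and input layout are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical helper for illustration only; not the study's code.


def conserved_fraction(region_scores: Dict[str, List[float]],
                       threshold: float = 0.8) -> Tuple[int, int, float]:
    """Return (regions above threshold, total regions, percentage).

    Assumption: a region's score is the maximum of its per-base PhastCons
    values, and a region passes only if that score is > threshold.
    """
    summary = {name: max(scores) for name, scores in region_scores.items()}
    hits = sum(1 for s in summary.values() if s > threshold)
    total = len(summary)
    return hits, total, 100.0 * hits / total
```

With 13 of 18 regions above the cutoff, the returned percentage would be 72.2 (13/18), consistent with the value reported above.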
GATA-1 and NF-E2 cooccupancy has been demonstrated to occur at several sites, including the HSs of the β-globin LCR and the flanking regions of the α-globin genes (29, 82, 94). Regions of GATA-1 occupancy identified in membrane protein genes were examined for NF-E2 cooccupancy. Five sites of GATA-1-NF-E2 cooccupancy were identified: one site in intron 1 of the α-adducin gene, one site in intron 2 of the tropomyosin gene, one site in the 5′ adjacent region of the protein 4.1R gene, and two sites in the 5′ intergenic region of the tropomyosin gene (Table 4).
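Calling cooccupancy amounts to intersecting two sets of genomic intervals. A minimal sketch, assuming regions are given as half-open (chromosome, start, end) tuples and that any overlap of at least 1 bp counts as cooccupancy; the overlap criterion actually applied to the Tamalpais calls is not stated here, and the function names are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end)
Interval = Tuple[str, int, int]

# Hypothetical helpers for illustration only; not the study's code.


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of shared base pairs (0 if on different chromosomes or disjoint)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def cooccupied_regions(gata1: List[Interval], nfe2: List[Interval],
                       min_overlap: int = 1) -> List[Tuple[Interval, Interval]]:
    """Pair each GATA-1 region with every NF-E2 region it overlaps.

    A brute-force pairwise scan, adequate for a few hundred regions; a sorted
    sweep or an interval tree would scale better to genome-wide peak sets.
    """
    nfe2_sorted = sorted(nfe2)
    return [(g, n) for g in sorted(gata1) for n in nfe2_sorted
            if overlap_bp(g, n) >= min_overlap]
```

Raising `min_overlap` (or requiring overlap as a fraction of region length) would give a stricter definition of cooccupancy.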
To determine whether GATA-1-NF-E2 cooccupancy is a more widespread phenomenon in erythroid cell-expressed genes, GATA-1 and NF-E2 cooccupancy was examined in the other erythroid cell-expressed genes present on the high-density NimbleGen microarray. The array contained a total of 100 erythroid cell-expressed genes: the 15 major membrane protein genes and 85 other erythroid cell-expressed genes. The probes corresponding to these genes span 12.80 Mb of the human genome on 20 different chromosomes. Across these 100 genes, Tamalpais identified 74 regions of GATA-1 occupancy and 78 regions of NF-E2 occupancy. Half of the GATA-1 binding regions (37/74) and nearly half of the NF-E2 binding regions (37/78) were located in introns, with 21 of the 37 intronic GATA-1 sites and 25 of the 37 intronic NF-E2 sites located in intron 1.
Eighteen regions bound both GATA-1 and NF-E2 (Fig. 5A and Table 4), and all of these were validated using ChIP-qPCR with K562 chromatin (Fig. 5A). Ten of the 18 regions were located in introns. Fifteen of the 18 regions demonstrated significant occupancy of H3Me3K4 in K562 chromatin (Fig. 5B), while only 6 of the 18 regions demonstrated any detectable H3Me3K4 occupancy in HeLa chromatin (data not shown). Seventeen regions contained consensus GATA-1 binding sites, while 15 regions contained potential NF-E2 binding sites. A region of GATA-1 and NF-E2 cooccupancy in the 5′ adjacent region of the protein 4.1R gene did not contain consensus binding sites for either GATA-1 or NF-E2. Half of the regions with GATA-1-NF-E2 cooccupancy had single consensus GATA-1 and NF-E2 binding sites separated by <50 bp in the corresponding genomic DNA. Thus, these sites differ slightly from sites of GATA-1-NF-E2 cooccupancy in the HS cores of the β-globin LCR, which contain consensus NF-E2 binding sequences ~50 bp from tandem inverted GATA-1 consensus sequences (28, 69, 94), possibly indicating that the mechanism(s) of cooperation at these sites differs from that in the β-globin LCR.
In hematopoietic cells, SCL and MTA-2, like GATA-1, associate with other regulatory proteins, so the identification of FOG-1 or SCL binding at sites of GATA-1-NF-E2 cooccupancy would not be unexpected (97). Indeed, in erythroid cell-expressed genes, all 18 regions of GATA-1-NF-E2 cooccupancy also bound FOG-1, MTA-2, and SCL (Fig. 5C). In half of these regions, consensus NF-E2 binding motifs were located 34 to 90 bp away from GATA-1-E-box motifs.
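Checking whether an NF-E2 motif lies 34 to 90 bp from a GATA-1-E-box composite motif can be sketched with regular expressions. Everything beyond the 34 to 90 bp window is an assumption: the consensus patterns (an E-box CANNTG separated by 7 or 8 bp from WGATAR, and a TGCTGA(C/G)TCA NF-E2 core) are simplified stand-ins rather than the motif definitions used in the study, the function names are hypothetical, and distance is taken here as the edge-to-edge gap between motifs. Both strands are scanned, and minus-strand hits are reported in plus-strand coordinates.

```python
import re
from typing import List, Tuple

# Assumed, simplified consensus patterns (not taken from the paper):
GATA_EBOX = "CA[ACGT]{2}TG[ACGT]{7,8}[AT]GATA[AG]"  # E-box, 7-8 bp spacer, WGATAR
NFE2 = "TGCTGA[CG]TCA"                              # MARE-like NF-E2 core

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement (non-ACGT characters are left unchanged)."""
    return seq.upper().translate(_COMP)[::-1]


def find_motif(seq: str, pattern: str) -> List[Tuple[int, int]]:
    """Half-open plus-strand intervals of matches on either strand."""
    seq = seq.upper()
    length = len(seq)
    lookahead = re.compile(f"(?=({pattern}))")  # lookahead => overlapping hits
    hits = set()
    for m in lookahead.finditer(seq):
        hits.add((m.start(), m.start() + len(m.group(1))))
    for m in lookahead.finditer(revcomp(seq)):
        s, e = m.start(), m.start() + len(m.group(1))
        hits.add((length - e, length - s))  # map minus-strand hit to plus coords
    return sorted(hits)


def gap(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Edge-to-edge distance in bp between two intervals (0 if they overlap)."""
    return max(0, b[0] - a[1], a[0] - b[1])


def spaced_pairs(seq: str, lo: int = 34, hi: int = 90):
    """(GATA-E-box hit, NF-E2 hit, gap) for every pair with lo <= gap <= hi."""
    gata = find_motif(seq, GATA_EBOX)
    nfe2 = find_motif(seq, NFE2)
    return [(g, n, gap(g, n)) for g in gata for n in nfe2
            if lo <= gap(g, n) <= hi]
```

Position weight matrices (for example, from JASPAR) would give more sensitive motif calls than these fixed regular expressions.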
Little is known about SCL binding at GATA-1-independent regions of NF-E2 occupancy. The 13 GATA-1-independent regions of NF-E2 binding in erythrocyte membrane protein genes were tested for the presence of SCL. All GATA-1-independent regions of NF-E2 occupancy also showed enrichment for SCL (Fig. 5C), demonstrating that SCL can associate with NF-E2 in the absence of GATA-1. Interestingly, the regions which bound both NF-E2 and SCL were also positive for MTA-2 (Fig. 5C), providing another, but different, example of cooccupancy of these proteins in erythroid cell-expressed genes and suggesting that cooperation between them may be an important factor in erythrocyte gene regulation.
Previous studies attempting to identify critical regulatory elements of membrane protein genes have largely been limited to the study of core promoters, relying primarily on in vitro assays, such as reporter gene assays and EMSA (6, 21, 22, 51, 53). ChIP techniques have rarely been applied to the study of these genes (108, 111). When ChIP has been employed, the data have been of limited utility for a number of reasons, including the limited regions analyzed, the time and cost required to complete the studies, and investigator bias in choosing regions of study. High-throughput ChIP-based technologies overcome many of these problems: they allow unbiased analyses of large contiguous genomic regions on cost-effective, commercially available platforms, as well as comparisons and classifications of numerous genes in a single experiment.
This is the first report to use ChIP-chip for an unbiased study of both GATA-1 and NF-E2 binding in erythroid cell-expressed genes at numerous loci spread throughout the human genome. This approach has revealed that, contrary to expectation, most regions of GATA-1 and NF-E2 occupancy are not located at the core promoters of erythrocyte genes but lie outside the core promoter region, primarily in introns and frequently in intron 1. GATA-1 and NF-E2 in vivo binding outside core promoter regions has been described in erythrocyte enhancers, most notably the β-globin LCR, but extensive non-promoter-related binding throughout numerous genomic loci was unexpected. High-throughput ChIP-based studies of other transcription factors have also demonstrated that most binding sites of some, but not all, regulatory proteins are not necessarily located at promoters or CpG islands. Together, these data add to the growing body of evidence supporting the critical role of long-range mechanisms, such as chromatin looping, in gene regulation (9, 39, 42, 50, 59, 73, 79, 98).
Genome-wide ChIP-based studies provide additional data supporting the role of long-range interactions in gene regulation. Several reports using high-throughput ChIP-based techniques have found multiple regions of factor occupancy in vivo that do not contain a consensus binding site in the corresponding DNA (4, 32, 38, 105, 110). Similarly, not all regions of GATA-1 occupancy in membrane protein genes contain a consensus GATA-1 binding site in the corresponding DNA. GATA-1 has been shown to play a critical role in long-range gene interactions at the c-kit and β-globin loci via formation of chromatin loops (39, 50, 101), suggesting a possible mechanism of action at regions of GATA-1 and NF-E2 binding that lack consensus GATA-1 or NF-E2 DNA binding sites.
Many different strategies have been employed to identify or predict precisely which GATA-1 consensus binding sites in genomic DNA interact with the GATA-1 protein in vivo. These have included various in silico and experimental techniques, which have demonstrated that GATA-1 binding sites that regulate gene expression during erythropoiesis are under strong selective constraint (10, 104). Consistent with other reports, regions of DNA in membrane protein genes that bound GATA-1 in vivo were more likely to show evolutionary conservation across species. However, although these regions had conservation scores predictive of a cis-regulatory element, attempts to refine the sites of GATA-1 binding by in vitro EMSA were unsuccessful, and EMSA binding did not correlate with PhastCons score. Numerous factors contribute to DNA-protein binding and subsequent erythrocyte gene expression, including cis sequences, the concentration and stability of regulatory proteins, and chromatin architecture (7, 10, 35, 37, 41, 55, 58, 61, 64, 99). It is likely that a complex combination of factors regulates GATA-1 binding to its cognate DNA binding site.
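The comparison of PhastCons scores between EMSA-positive and EMSA-negative motifs described above is a two-sample t test. Which variant was used is not stated; the sketch below assumes Welch's unequal-variance form and computes only the statistic and its Welch-Satterthwaite degrees of freedom, because a p-value requires the t-distribution CDF, which the Python standard library does not provide. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple

# Hypothetical helper for illustration only; assumes Welch's t test.


def welch_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs at least two values and non-zero sample variance.
    """
    nx, ny = len(x), len(y)
    se2_x = variance(x) / nx  # squared standard error of mean(x)
    se2_y = variance(y) / ny  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
    df = (se2_x + se2_y) ** 2 / (se2_x ** 2 / (nx - 1) + se2_y ** 2 / (ny - 1))
    return t, df
```

In practice, `scipy.stats.ttest_ind(x, y, equal_var=False)` returns both the Welch statistic and its p-value.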
As noted above, several recent studies have detailed binding partners at sites of GATA-1 and NF-E2 binding, as well as the composition of DNA-protein complexes. The GATA-1-associated proteins identified at membrane protein genes, including FOG-1, SCL, and MTA-2, have been described previously. In addition to heterodimerizing with small Maf proteins, p45 NF-E2 has been shown to interact with various proteins, such as MLL2, WWP1, and MCRS2 (2, 15, 67, 107, 109). With a few exceptions, however, the proteins and multiprotein regulatory complexes that interact with NF-E2 have not been characterized to the extent that those of GATA-1 have (68). Finding SCL and MTA-2 at all sites of NF-E2 binding implies that these transcription factors may play a broader role in the regulation of erythroid cell-expressed genes than previously appreciated.
Importantly, these studies revealed a common erythroid cell type-specific chromatin signature located throughout the genomic loci of many erythroid cell-expressed genes. This signature, comprising H3Me3K4 together with cooccupancy by GATA-1, NF-E2, FOG-1, SCL, and MTA-2, is present at approximately a quarter of the GATA-1 binding sites. In the genomic DNA underlying half of these sites, single GATA-1-E-box consensus motifs are separated from NF-E2 consensus motifs by 34 to 90 bp. An important goal in understanding erythrocyte gene regulation is a cell type-specific topographic map of chromatin architecture and the interacting regulatory proteins, together with the primary sequence and epigenetic configuration of the associated genomic DNA.
Disorders of erythrocyte shape comprise an important group of inherited hemolytic anemias. These disorders include the hereditary spherocytosis, elliptocytosis, and pyropoikilocytosis syndromes, which are often associated with qualitative and quantitative abnormalities of major erythrocyte membrane proteins. In many cases, causative mutations have been identified in the genes encoding these proteins. However, in as many as 25% of cases, despite specific protein deficiency and/or genetic linkage, the causative mutation is not identified, even after nucleotide sequence analysis of the coding exons, the immediate flanking intronic sequences, and the promoter regions (103). The high-throughput genomic strategies employed in this study identify numerous excellent candidate regions for mutations associated with membrane-linked hemolytic anemia.
This work was supported in part by grants K12HD000850, P30DK072442, HL65448, and DK62039 from the National Institutes of Health.
Published ahead of print on 17 August 2009.
†Supplemental material for this article may be found at http://mcb.asm.org/.