General strategy in yeast for genome-wide screening for binding sites of DNA-binding proteins
We devised an improved assay system to perform whole genome screens for transcription factor-binding sites. The screens in yeast yield minimal background and quickly eliminate false-positives. In essence, a transcription factor is tested against random genomic fragments to isolate DNA that can be directly bound by the protein of interest. By fusing the transcription factor to the activating domain of the yeast transcription factor GAL4 (GAL4AD), the protein will become a transcriptional activator, regardless of the normal role it plays in vivo or any co-factors that it would normally need to activate transcription. Binding of the transcription factor to the genomic DNA fragment results in activation of the URA3 reporter gene and growth of yeast on plates lacking uracil.
Figure 1. (A) Strategy of genomic library screens in yeast to identify binding sites for transcription factors. The transcription factor of interest is expressed in haploid yeast of mating type MATα. A library of genomic fragments cloned upstream of URA3 ...
The screening efficiency relies on several key aspects: (i) high-quality libraries of genomic DNA with potentially several-fold coverage of the zebrafish and mouse genomes; (ii) tight repression of the SPO13 promoter upstream of the URA3 reporter gene; (iii) negative selection of genomic library yeast in 5-FOA containing media prior to screens resulting in few false-positives; (iv) ADE2-based expression plasmids for both auxotrophic selection and a visual color ‘sectoring’ assay; (v) maintenance of libraries and transcription factors in haploid yeast of opposite mating type for fast and easy execution of library screens and (vi) PCR amplification of the genomic fragments directly from yeast for rapid sequencing and reanalysis of the positive genomic fragments.
Construction of whole genome libraries for zebrafish and mouse
Libraries were constructed in plasmid pYoh366, which contains the URA3 reporter gene downstream of the SPO13 promoter in a CEN plasmid (one copy per yeast cell) with the selective marker gene TRP1. Genomic DNA from zebrafish and mouse was partially digested with Tsp509I. Genomic fragments of ~500 bp in length were gel-purified and cloned into the EcoRI site of pYoh366. A total of 3×10⁷ independent clones with an average size of 300 bp were obtained for the zebrafish library, an approximately 4- to 6-fold coverage of the zebrafish genome. A total of 1.7×10⁷ independent clones with an average size of 700 bp were obtained for the mouse library, representing an approximately 3- to 4-fold coverage of the mouse genome. For both libraries, >90% of plasmids had inserts. The libraries were transformed into yeast strain BY404 (MATa), and >10⁸ independent colonies from each library were pooled and frozen in aliquots.
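As a rough check on the coverage figures above, fold coverage can be estimated as the number of clones times the mean insert size, divided by the genome size. The sketch below is illustrative only: the haploid genome sizes (about 1.5 Gb for zebrafish and 2.7 Gb for mouse) are assumptions not stated in the text, as is the use of 0.9 (the lower bound of ">90%") for the fraction of plasmids carrying an insert.

```python
def fold_coverage(n_clones, mean_insert_bp, genome_bp, insert_fraction=1.0):
    """Expected fold coverage of a genome by a library of cloned fragments."""
    return n_clones * mean_insert_bp * insert_fraction / genome_bp

# Assumed haploid genome sizes in bp (not given in the text).
ZEBRAFISH_GENOME_BP = 1.5e9
MOUSE_GENOME_BP = 2.7e9

# Clone numbers and mean insert sizes are the values reported for each library.
zebrafish = fold_coverage(3e7, 300, ZEBRAFISH_GENOME_BP, insert_fraction=0.9)
mouse = fold_coverage(1.7e7, 700, MOUSE_GENOME_BP, insert_fraction=0.9)

print(f"zebrafish: {zebrafish:.1f}-fold, mouse: {mouse:.1f}-fold")
```

Under these assumptions the estimates fall near 5.4-fold for zebrafish and 4.0-fold for mouse, in line with the quoted ranges. Different genome-size estimates would shift the numbers accordingly.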
Yeast expression plasmids for DNA-binding proteins
Several marker genes (such as HIS3) are available for yeast expression plasmids. These markers allow for screens that involve more than one transcription factor. The marker gene ADE2 is particularly advantageous because loss of the plasmid can be monitored by accumulation of a red adenine precursor on plates with limiting amounts of adenine (30). A colony that grows on plates lacking uracil yet shows red sectors or is completely red has lost the transcription factor plasmid in part or all of the colony, which readily identifies it as a false-positive clone.
Whole-genome screens for DNA-binding proteins in yeast
Screens were performed according to the outline in Figure 1. A total of 2×10⁸ haploid yeast of mating type MATa containing the genomic library were mated on non-selective YPD plates with an equal number of haploid yeast (MATα) containing the expression plasmid for a transcription factor. This approach is significantly more efficient than transforming the library into yeast expressing the target transcription factor, because (i) maintaining the library in yeast makes it an easily renewable resource and (ii) mating assays are more convenient and efficient than DNA transformation. The typical mating efficiency was 5–10%, yielding >10⁷ diploid yeast per screen. After 3 days of growth at 30°C, Ura+ clones (non-sectoring white colonies in the case of ADE2 plasmids) were single-colony purified and checked for plasmid-dependency of the Ura+ phenotype. The library plasmids were rescued, transformed into BY404, retested for the Ura+ phenotype in mating assays for the transcription factor, sequenced and analyzed computationally.
Figure 2. Definition of p53 and FoxI1 consensus DNA-binding sites based on genomic fragments isolated in yeast screens. (A) A ‘standard’ p53 PSSM (position-specific scoring matrix) model was obtained based on alignment of 162 experimentally verified ...
Characterization of the mouse genomic DNA library using p53
p53 is a tumor suppressor protein that is activated in response to cellular stresses. It has well-established DNA-binding characteristics (2). All experiments were performed with the complete mouse library of 1.7×10⁷ independent genomic fragments. Growing the yeast containing the genomic library in 5-FOA-containing media prior to the screen eliminated background from ‘self-activating’ fragments almost entirely (18 Ura+ clones per 1×10⁶ ...).
We performed a screen with p53 under fully optimized conditions. Yeast with an empty control plasmid was processed in parallel to determine the background of false-positives. We obtained 330 Ura+ colonies per 1×10⁶ diploid yeast for p53, whereas yeast with an empty plasmid yielded only two colonies per 1×10⁶.
The very low level of false-positives in our assay allowed us to streamline the screening procedure significantly. Once Ura+ clones emerged after 2–5 days, a PCR reaction was used to amplify the genomic insert directly from yeast. One primer hybridized to the upstream SPO13 sequence and one to the downstream URA3 sequence so that only plasmid DNA, and not yeast genomic DNA, was amplified. The PCR product was reintroduced into haploid yeast by co-transformation with a partially overlapping gapped plasmid, and the reporter gene plasmid was generated in yeast by homologous recombination (32). The phenotype of the genomic fragment was confirmed in yeast after mating to haploid yeast with the transcription factor. In parallel, the PCR product was purified and sequenced. This approach reduces manual processing of yeast clones, speeds up screens and is adaptable to high-throughput screening.
Characterization of the zebrafish genomic DNA library using FoxI1
FoxI1 is a forkhead class transcription factor involved in proper organization of the zebrafish and mouse otic vesicle and zebrafish cranio-facial cartilage during early embryonic development (33–36). Because FoxI1 can be either an activator or a repressor in vivo, we fused the open reading frame of FoxI1 to the Gal4 activating domain (to ensure transcriptional activation) and screened the zebrafish library (3×10⁷ unique genomic inserts, 4- to 6-fold coverage of the zebrafish genome). In parallel, we performed a control-mating assay for haploid yeast with an empty plasmid. For this particular screen, we cloned the FoxI1 open reading frame into the pYOH-1 plasmid (modified from the Clontech plasmid pACT-1), which is marked with ADE2. False-positives do not need to maintain the FoxI1 plasmid to grow on SC-Ura plates, so their colonies take on a ‘sectored’ or all-red appearance. True-positives should remain completely white.
Using the approach described for p53, 2.5×10⁷ diploid yeast were screened on plates lacking uracil for 5 days at 30°C. The negative control resulted in 14 false-positive Ura+ colonies per 1×10⁶ diploid yeast screened. In contrast, FoxI1 resulted in 132 Ura+ colonies per 1×10⁶. A total of 710 white colonies were single-colony purified. We PCR-amplified the genomic inserts directly from yeast and sequenced the fragments with nested PCR primers. The fragments were simultaneously re-tested in yeast as described before. All 710 fragments were again Ura+ and dependent on FoxI1 for growth.
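The colony rates reported for the two screens can be compared with a short calculation. This is a minimal sketch that assumes the empty-plasmid control rate approximates the rate of false-positives within the transcription factor screen itself; the function name is hypothetical.

```python
def screen_summary(tf_colonies_per_million, control_colonies_per_million):
    """Return (fold enrichment over control, estimated background fraction)."""
    fold = tf_colonies_per_million / control_colonies_per_million
    background_fraction = control_colonies_per_million / tf_colonies_per_million
    return fold, background_fraction

# Ura+ colonies per 1e6 diploid yeast: (transcription factor, empty plasmid).
screens = {"p53": (330, 2), "FoxI1": (132, 14)}

for name, (tf_rate, control_rate) in screens.items():
    fold, background = screen_summary(tf_rate, control_rate)
    print(f"{name}: {fold:.1f}-fold over control, "
          f"~{background:.1%} estimated background")
```

On these numbers the p53 screen is about 165-fold over its control and the FoxI1 screen about 9.4-fold. The estimate for FoxI1 is taken before the ADE2 color assay is applied, which removes colonies that have lost the transcription factor plasmid.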
Computational analysis of the genomic fragments for consensus DNA-binding sites
Of the 140 starting clones from the p53 screen, 37 were duplicates and 3 had no insert. Of the remaining 100, 90 clones could be fully sequenced using primers up- and downstream of the genomic fragments, and 10 clones had an interior sequence gap. For four of these, sequences from the reference mouse genome were retrieved to fill the gap. The remaining six clones were chimeric and could not be mapped precisely because the sequencing gap included the region where the two distinct genomic fragments join. The February 2006 mouse genome assembly and BLAT at http://genome.ucsc.edu/ were used to map the 94 complete clones. Thirty-five clones could not be mapped because they contained exclusively repetitive sequences. Of the remaining 59 clones, 33 mapped to a single genomic locus and 26 were split clones that mapped to more than one locus (24 with 2, 1 with 3 and 1 with 4 fragments). Chimeric clones were a result of the library construction because the genomic fragments had compatible ends. In total, because of some chimerism, 59 clones mapped to 81 unique loci. Typically, chimeric clones had a predicted binding site in only one of the two (or more) genomic fragments in the clone. In summary, ~70% of library plasmids (94/140) were useful for the p53 DNA-binding site analysis and ~40% could be uniquely mapped to the genome.
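The clone counts in this paragraph can be followed as a simple tally. The sketch below only restates the reported numbers. The count of four gap-filled clones is inferred from 94 complete clones minus 90 fully sequenced ones and is not stated explicitly.

```python
starting_clones = 140
duplicates, no_insert = 37, 3
candidates = starting_clones - duplicates - no_insert      # 100

fully_sequenced, interior_gap = 90, 10
unresolved_chimeras = 6
gaps_filled = interior_gap - unresolved_chimeras           # 4 (inferred)
complete = fully_sequenced + gaps_filled                   # 94

repetitive_only = 35
mapped = complete - repetitive_only                        # 59
single_locus, multi_locus = 33, 26

# Internal consistency of the reported counts.
assert candidates == fully_sequenced + interior_gap
assert mapped == single_locus + multi_locus

useful_fraction = complete / starting_clones
mapped_fraction = mapped / starting_clones
print(f"useful: {useful_fraction:.0%}, uniquely mapped: {mapped_fraction:.0%}")
```

The two fractions come out at 67% and 42%, which the text rounds to ~70% and ~40%.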
The consensus p53 DNA-binding site is two half-sites of RRRCWWGYYY separated by 0–13 bp (2). We built a new matrix from a larger data set of 162 p53 DNA-binding sites. According to the site predicted by our 162 known p53-binding sites, a spacing of 0 between the p53 half-sites is much preferred over any other spacing. This result was confirmed in yeast assays (26). A logo of this half-site model is shown in Figure 2A.
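The consensus described above (two RRRCWWGYYY half-sites separated by 0–13 bp) can be written as a pattern search using IUPAC codes (R = A/G, W = A/T, Y = C/T). The following is a minimal sketch of a consensus scan, not the PSSM model used in the analysis; the function names and the example sequence are hypothetical.

```python
import re

# IUPAC nucleotide codes needed for the half-site, as regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}

HALF_SITE = "RRRCWWGYYY"

def iupac_to_regex(motif):
    """Translate an IUPAC motif into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in motif)

def p53_site_pattern(min_spacer=0, max_spacer=13):
    """Compile a pattern for two half-sites separated by a bounded spacer."""
    half = iupac_to_regex(HALF_SITE)
    # Lazy quantifier: report the shortest allowed spacer at each start.
    spacer = f"[ACGT]{{{min_spacer},{max_spacer}}}?"
    # Lookahead so that overlapping candidate sites are all reported.
    return re.compile(f"(?=({half}{spacer}{half}))")

def find_p53_sites(sequence, **spacer_limits):
    """Return (start, matched_site) for every consensus site in the sequence."""
    pattern = p53_site_pattern(**spacer_limits)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

# Made-up example: two half-sites with no spacer, flanked by two bases.
example = "TT" + "GGGCATGTCC" + "AGACTTGCCT" + "AA"
print(find_p53_sites(example))
```

Because RRRCWWGYYY is its own reverse complement under IUPAC codes, scanning one strand is sufficient for this consensus. A strict consensus scan will miss the sites that score well under a matrix model but deviate at one or more positions.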
We applied the CONSENSUS program (37) to 94 isolated clones with complete sequence data. The best motif is a 10 bp approximate palindrome that matches our standard p53 DNA-binding site model perfectly with a P-value of 1.73×10⁻⁶.
The p53 motif signal was very strong when analyzing the complete set of 94 clones. To test whether we could obtain an accurate motif for p53 with fewer sequences, we generated random subsets of the 94 sequences, ranging in size from 5 to 50 sequences, and applied CONSENSUS to these data sets. With 20 or more sequences, the top predictions matched perfectly with our p53 DNA-binding site model at least 70% of the time. We then added two additional motif-finding programs, Gibbs sampler (38) and Projection (39). The combined sensitivity of these three programs was 100% on 10 random subsets of 20 sequences each, indicating that just 20 genomic fragments per transcription factor may be sufficient to characterize the majority of putative transcription factors.
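The subsampling test described above can be outlined as a small harness that draws random subsets and records how often a motif finder recovers the expected site. This is a sketch under stated assumptions: `toy_find_motif` is a hypothetical stand-in (the most widely shared k-mer), not CONSENSUS, Gibbs sampler or Projection, and the toy sequences are randomly generated rather than real clones.

```python
import random
from collections import Counter

def subsample_success_rate(sequences, subset_size, n_trials,
                           find_motif, is_correct, seed=0):
    """Fraction of random subsets for which the finder recovers the site."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        subset = rng.sample(sequences, subset_size)
        if is_correct(find_motif(subset)):
            hits += 1
    return hits / n_trials

def toy_find_motif(subset, k=10):
    """Stand-in motif finder: the k-mer present in the most sequences."""
    support = Counter()
    for seq in subset:
        # A set counts each k-mer at most once per sequence.
        support.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return support.most_common(1)[0][0]

# Toy data: one planted site in random flanking sequence (not real clones).
rng = random.Random(1)
planted = "GGGCATGTCC"
toy_clones = ["".join(rng.choice("ACGT") for _ in range(30)) + planted +
              "".join(rng.choice("ACGT") for _ in range(30))
              for _ in range(94)]

for size in (5, 10, 20, 50):
    rate = subsample_success_rate(toy_clones, size, 10, toy_find_motif,
                                  lambda motif: motif == planted)
    print(size, rate)
```

A real evaluation would replace `toy_find_motif` with a call to the motif-finding program and `is_correct` with a comparison against the reference site model.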
We mapped the likely p53 DNA-binding site in the library sequences. Out of the 94 complete library clones, 81 contained at least one perfect p53 DNA-binding site. Some clones contained as many as five predicted sites. Eleven of the remaining clones contained a good match to the p53 consensus DNA-binding site, but scored below our very stringent cutoff value. Two clones contained only one good p53 half-site.
A total of 154 unique sequences were obtained from the 710 total clones of the FoxI1 screen. One hundred and thirty-four of the 154 unique sequences could be mapped to the zebrafish genome (zebrafish build Zv6). Occasionally a sequence mapped to more than one location with 100% identity; these occurrences can often be explained by errors in the current zebrafish genomic build. In general, mapping was less ‘robust’ because of high polymorphism rates in the zebrafish genome and the less ‘complete’ nature of the assembly when compared to mouse. Approximately 50% of the clones were chimeric, consisting of two different fragments fused together (Supplementary Data).
Human FoxI1 (HFH-3, FREAC6, HNF-3, FKHL10) binds to the motif TRTTTRKDD as determined by SELEX enrichment (40). The CONSENSUS program identified two common motifs in the isolated fragments. The less frequent motif was identical to the published binding site (Figure 2B). The more commonly represented binding site, TSATTGGYY, was similar but had some obvious differences (Figure 2B), particularly the presence of an A in position 6 that is a T in the published consensus. This could reflect either slight differences in binding preference between the human and zebrafish forms of FoxI1, or differences in binding preference in the context of histone packaging in vivo. Forkhead class transcription factors have been shown to bind more stably to DNA in the context of histones (41); this may change the recognition site preferences when compared to in vitro determinations. Thirteen fragments did not contain either predicted motif. The discovery of another putative binding site for FoxI1, as well as the confirmation of the previously published site, reinforces the value of the technique. It is of particular note that motif 1 contains the NF-Y core sequence CCAAT, common in most eukaryotic promoters (42). It is unclear what the presence of this sequence signifies, but NF-Y has been shown to be involved in DNA compaction (43), similar to the role described for FoxI1 (21), and, like forkhead proteins, has a structure that is similar to histones (44). Further study is needed to determine the significance of the CCAAT motif.
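The CCAAT core is easiest to see on the opposite strand of the TSATTGGYY motif. A minimal sketch of a reverse complement that handles IUPAC ambiguity codes is given below; the helper name is hypothetical.

```python
# Complement table for IUPAC nucleotide codes
# (R<->Y, K<->M, B<->V, D<->H; S, W and N are self-complementary).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(motif):
    """Reverse complement of a sequence or IUPAC motif."""
    return motif.upper().translate(COMPLEMENT)[::-1]

foxi1_motif = "TSATTGGYY"
opposite_strand = reverse_complement(foxi1_motif)

print(opposite_strand)             # RRCCAATSA
print("CCAAT" in opposite_strand)  # True
```

Read on the opposite strand, the motif is RRCCAATSA, which contains CCAAT at positions 3–7.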
Identification of putative target genes
Fifty-nine unique clones identified 81 fragments with unique genomic locations because several clones were chimeric. Of the 56 fragments with a good to excellent p53 DNA-binding site, 10 mapped to introns. The remaining fragments were found several thousand base pairs up- or downstream of genes, similar to the findings of ChIP studies for p53 (46) (Supplementary Data). These 46 fragments would not have been sampled by typical ‘promoter’ arrays. From the list of genes neighboring the isolated fragments with p53 DNA-binding sites, six genes (including Sec61a2, Ass1, Aldh2, Kit1 and Ela3b) are known to be up-regulated in a p53-dependent fashion, but their p53 DNA-binding sites have not been reported (48–51).
Because of chimerism, 154 unique sequences were mapped to 134 genomic loci. The remaining fragments were unmappable (Supplementary Data). There were only 20 instances where the fragment mapped within 10 kb of the transcriptional start site. This is consistent with a proposed second role for FoxI1 as a global chromatin remodeling protein (21). Few genes are known to be directly regulated by FoxI1, but within the small set of genes that have FoxI1 sites <10 kb away, two genes, including Lhx3, have roles in ear development (52). This makes them candidates for regulation by FoxI1, since FoxI1 has a known role in ear development (34–36).
In vivo testing of zebrafish fragments
Recently developed techniques allow researchers to use zebrafish embryos as a rapid readout for testing in vivo activity of putative cis-regulatory elements (55). We selected 13 fragments and subcloned them into a TOL2-based transposon vector containing Gateway™ compatible cloning sites upstream of a minimal promoter and GFP (gift of A. McCallion). All 13 fragments showed a significant increase in GFP expression compared to empty vector alone (Figure 3A). This demonstrated a specific transcriptional response dependent on the inserted genomic fragment, since previous research has shown that random genomic fragments do not consistently activate GFP expression (55). Typically, expression of GFP was similar to the known expression pattern of FoxI1 (Figure 3B). To test whether transactivation was dependent on FoxI1, we selected two fragments, z84 and z11, and tested them in the presence or absence of morpholino oligonucleotides that inhibit FoxI1 expression (OpenBiosystems). Fragment z11 showed no significant differences in the presence or absence of embryonic FoxI1. However, fragment z84 showed a significant increase in GFP expression when FoxI1 protein was inhibited (Figure 3C, P < 0.001). We have previously demonstrated that FoxI1 is capable of remodeling higher-order chromatin structure and that only a few genes are activated or inhibited by FoxI1 expression (21). In this study, we further found that FoxI1 has an inhibitory role on GFP expression in the context of certain genomic fragments. It is likely that FoxI1 binding is crucial for recruiting other transcription factors that act as positive and/or negative regulators to the sites on those fragments.
Figure 3. In vivo testing of isolated FoxI1 fragments. (A) Relative GFP activation compared to the empty TOL2 construct (pcfos). (B) Comparison of GFP activation from several of the fragments compared to the in situ expression pattern of FoxI1 (upper left). Some ...
In summary, the fragments isolated using the yeast technique showed strong enrichment for cis-regulatory elements (13/13 tested) and we have demonstrated in one instance (of two tested) that FoxI1 is responsible for transcriptional regulation from the fragment.