In this study, we present high-resolution
in vitro DNA binding specificity data and motifs for 27
S. cerevisiae TFs, including some that contain a DBD for which no high-resolution motif had existed previously (for example, Vhr1 and Vhr2). These results contribute towards a complete set of high-resolution DNA binding specificity data for all TFs in this important model organism. In particular, our
in vitro PBM analysis of
S. cerevisiae TF DNA binding brings the set of known yeast TFs with high-resolution DNA binding specificity data to 150 (about 85%) out of a conservative total estimate of 176 TFs likely to have inherent sequence-specific, double-stranded DNA binding activity. With the addition of a more permissive set of 40 proteins (Additional file
12) that might exhibit DNA binding specificity (total of 216), the collection still covers approximately 70% of all
S. cerevisiae TF DNA binding specificities. We note that these estimates may differ from previous studies because we refer strictly to TFs with intrinsic DNA binding specificity and do not include proteins that interact with DNA only indirectly.
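The coverage figures quoted above follow from simple ratios of the counts given in the text. The short sketch below reproduces that arithmetic; the variable and function names are illustrative and are not part of the study's analysis pipeline. With the permissive set included, the ratio comes out just under 70%.

```python
# Counts taken from the text; all names here are illustrative only.
characterized = 150        # TFs with high-resolution DNA binding specificity data
conservative_total = 176   # conservative estimate of sequence-specific TFs
permissive_extra = 40      # additional proteins that might bind DNA specifically


def coverage(n_characterized, n_total):
    """Return coverage as a percentage, rounded to one decimal place."""
    return round(100.0 * n_characterized / n_total, 1)


permissive_total = conservative_total + permissive_extra
conservative_cov = coverage(characterized, conservative_total)
permissive_cov = coverage(characterized, permissive_total)
remaining = conservative_total - characterized  # TFs still lacking data

print(f"Conservative coverage: {conservative_cov}% of {conservative_total}")
print(f"Permissive coverage:   {permissive_cov}% of {permissive_total}")
print(f"Conservative-set TFs without high-resolution data: {remaining}")
```

The remaining count from the conservative set corresponds to the highest-confidence candidates for future analysis discussed in the following paragraph.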
In total, our curated collection contains high-resolution DNA binding data for approximately 85% of all known and likely sequence-specific DNA-binding proteins in
S. cerevisiae. The remaining approximately 15% of sequence-specific
S. cerevisiae DNA-binding proteins might require targeted investigation or specialized strategies in order to achieve complete coverage of high-resolution DNA binding specificity data for all
S. cerevisiae TFs. We have identified 26 proteins that are known TFs, that have lower-resolution experimental data on their DNA binding specificity, or that contain a known sequence-specific DBD; we consider these proteins the highest-confidence candidates for future high-resolution
in vitro PBM analysis (Additional file
12). Although most of these 26 proteins are from DBD classes with known sequence-specific DNA binding activity (bZIP, homeodomain, zinc cluster, copper-fist, bHLH), the failure of previous attempts to characterize them by
in vitro methods may indicate that specific small-molecule cofactors and/or protein partners are required for specific DNA binding [
22]. Investigations of the effects of post-translational modifications on TFs might also reveal requirements for DNA binding specificity or conditions for modified DNA binding specificities.
Generation of a complete set of DNA binding specificity profiles for all
S. cerevisiae TFs might also require experimental testing of proteins that are lower-confidence candidates for sequence-specific DNA binding activity, or that are yet to be identified by other criteria. Considering the set of all 222 proteins identified from previous TF candidate lists [
7,
10,
11] and updated annotations in the
Saccharomyces Genome Database [
50], we identified 40 proteins (Additional file
12) that either contain putative nucleic acid binding domains (Myb; zf-C2H2) found in other proteins that exhibit sequence-specific DNA binding, or are known to interact with DNA or to be involved in transcriptional regulation, but for which it is currently unknown whether they bind DNA directly in a sequence-specific manner. We note that the availability of a DNA binding site motif from ChIP-chip data cannot be considered evidence of direct DNA binding by the TF tested by ChIP, as some factors may bind DNA only indirectly as part of transcriptional regulatory complexes [
13]. Several of these proteins belong to multisubunit complexes (for example, the Hap2/3/4/5 complex) and may need to be examined for DNA binding specificity in the context of their protein partners [
51]. We annotated a set of 156 proteins as unlikely (Additional file
12) to possess sequence-specific DNA binding activity since they contain protein structural domains that have never yielded a motif in this or prior large-scale
in vitro surveys of TF DNA binding specificity, or interact with DNA indirectly, or lack prior literature evidence for direct sequence-specific DNA binding. Finally, in addition to traditional sequence-specific DNA binding site motifs, DNA structural motifs such as the recombination intermediates recognized by HU protein [
52] or alterations in DNA helical twist angle patterns could be investigated.
Towards the goal of collating a complete set of
cis-regulatory DNA sequences in
S. cerevisiae, we performed a complementary analysis that considers candidate regulatory elements not from a protein-centric viewpoint, but rather from the standpoint of putative
cis-regulatory motifs. We collected 4,160 previously published
S. cerevisiae DNA motifs (Additional file
13), including known TF binding site motifs and candidate regulatory motifs derived from ChIP and gene expression data (Additional file
1). Our goal was to identify 'orphan' motifs, that is, those that do not match any known TF DNA binding site motifs. We identified 34 orphan motifs (Figure S6 in Additional file
1); comparisons to all TF DNA binding site motifs in the JASPAR, TRANSFAC, and UniPROBE databases [
53] (Additional file
1) did not identify significant matches to binding site motifs of known TFs containing DBDs not yet annotated as occurring in any
S. cerevisiae genes. Some orphan motifs might correspond to novel TFs with DBDs not yet annotated in yeast, while others might represent weak matches to known TF binding site motifs for TFs that might be utilized only in specific cellular conditions, or in the presence of particular co-factors, or in the context of a limited number of
cis-regulatory regions. Alternatively, some of the orphan motifs may represent enriched DNA sequences without a transcriptional regulatory role, or may be artifactual motifs returned by various motif discovery algorithms. Directed experimentation will be required to distinguish among these different possible scenarios.
The high-resolution nature of the
in vitro data that we compiled in this study allowed us to perform in-depth analyses of the DNA binding specificity of TFs, resulting in novel structural and gene regulatory insights, which would not have been possible using only the motifs reported in the literature from small-scale experiments that assay binding to only a subset of potential DNA binding sequences or from ChIP experiments. Our results suggest a number of structural studies that would be interesting to pursue to investigate distinct DNA binding specificities recognized either by an individual TF or different TF family members. For example, structural studies would aid in understanding how the bZIP protein Hac1 can bind E-boxes (typical of bHLH proteins) as well as the bZIP ATF/CREB motifs [
54]. Similarly, structural studies of Upc2 would provide insights into how it and its close paralog Ecm22 recognize the sterol response element (SRE; TCGTATA) [
55], whereas most other members of the fungal-specific Zn2Cys6 family recognize CG-rich binding sites primarily comprising CGG triplet half-sites separated by degenerate spacers of varying lengths [
11]. It would also be interesting to determine how structurally distinct DBDs can recognize similar DNA sequences. Vhr1 and Vhr2 contain a relatively uncharacterized DBD for which no structural data are available from any species; it is not yet even known which amino acid residues in the Vhr1 DBD contact DNA. Our PBM data indicate many similarities in DNA binding specificity between the VHR class and members of the well-characterized bZIP family. Finally, the
in vivo utilization of primary and secondary motifs for distinct biological functions by Sko1 suggests a novel gene regulatory mechanism, namely, the potential for different functions to be divided among distinct DNA binding sites in the genome for a particular TF. The extent of functionally distinct primary and secondary TF motifs would be interesting to investigate in higher eukaryotes in future studies.
In summary, this study expands our understanding of redundancy and divergence among TF family members from a structural standpoint and in terms of their regulatory functions. Moreover, this study brings us closer to, and outlines a set of priorities for, the complete characterization of TF-DNA interaction specificities in S. cerevisiae. The data presented here will be a valuable resource for further studies of transcriptional regulatory networks, and also for further investigations of protein-DNA recognition rules within different TF families. Such efforts in S. cerevisiae serve as a template for similar work aimed at cataloguing and completely characterizing TF DNA binding specificity in higher eukaryotic model organisms and in human. Ultimately, a complete compendium of human TF-DNA interaction specificity will involve cell- and tissue-specific, as well as disease-specific, interaction data that will provide invaluable details towards our understanding of development and disease.