Genome-wide gene expression data obtained using SAGE or DNA microarrays have provided a wealth of information on the regulation of genes under certain conditions or by specific transcription factors. The combination of this information with sequence analysis programs has enabled researchers to identify potential regulatory sites. For example, in a pioneering paper, Tavazoie et al. clustered expression data and used multiple local sequence alignment algorithms on the promoter regions of the co-clustered genes to discover regulatory motifs [19]. This approach has been further refined by using Bayesian networks to incorporate additional constraints regarding the relative positions and orientations of the motifs [20]. Another approach has been to break the genes into modules and perform module assignments and motif searches at the same time via an expectation maximization algorithm (as opposed to clustering first and finding motifs later) [21]. Although these approaches have worked well at identifying potential target sites, one drawback is that the expression patterns have to cluster well for these methods to work. For a small number of microarray experiments, this may not always be the case. A method that does not utilize clustering is a regression-model-based analysis that locates "words" in the promoter that correlate with modulation of expression [23]. However, this approach is restricted to retrieving functional consensus binding sites in the promoter regions, and it needs to be modified for transcription factors with low sequence specificity. Most of these approaches attack the difficult problem of what to do when relatively little is known about the regulatory system and sequence recognition by the protein. Consequently, they develop pattern recognition algorithms that are essentially unsupervised. Our focus has been to take advantage, as much as possible, of knowledge about the biological system and to use that information, combined with expression analysis, to identify potential target sites. The minor loss of generality of the tools resulting from such an approach is more than offset by its predictive power.
To determine whether the changes in expression of a specific gene are the result of a transcription factor working at the promoter, we developed an algorithm that combines expression data with information on the binding site preference for a transcription factor. As a test of this algorithm, we identified genes in yeast that are direct targets for regulation by the a1-α2 repressor complex. We also used this method to identify genes that are repressed in diploid cells but that are not direct targets of the complex, as well as functional a1-α2 binding sites that do not appear to repress transcription in their genomic context. The combination of these sets of findings has provided insight into the regulatory network and mechanism of repression by the a1-α2 complex.
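The general idea of combining a binding site preference with expression data can be illustrated with a minimal sketch. This is not the algorithm used in this study: the weight matrix values, the gene names, the log-ratio convention (log2 of diploid over haploid expression), and both thresholds are hypothetical, chosen only to show how a site score and an expression change might be considered together. Sequences are assumed to contain only uppercase A, C, G, and T.

```python
import math

# Hypothetical 4-bp weight matrix (one dict per site position, base -> probability).
# The values are invented for illustration; they are not the a1-α2 matrix.
TOY_PWM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_site_score(promoter, pwm, background=0.25):
    """Highest log2-odds score of any window on either strand.

    Returns -inf if the promoter is shorter than the matrix.
    """
    width = len(pwm)
    best = float("-inf")
    for strand in (promoter, reverse_complement(promoter)):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            score = sum(math.log2(pwm[i][base] / background)
                        for i, base in enumerate(window))
            best = max(best, score)
    return best


def classify_genes(promoters, log_ratios, pwm, min_site_score, max_log_ratio):
    """Label each gene by whether it has a site and whether it is repressed."""
    labels = {}
    for gene, promoter in promoters.items():
        has_site = best_site_score(promoter, pwm) >= min_site_score
        repressed = log_ratios[gene] <= max_log_ratio
        if has_site and repressed:
            labels[gene] = "candidate direct target"
        elif repressed:
            labels[gene] = "repressed, no site"
        elif has_site:
            labels[gene] = "site, not repressed"
        else:
            labels[gene] = "neither"
    return labels


# Toy inputs (hypothetical gene names, sequences, and log2 ratios).
promoters = {"GENE_A": "CCTGTACC", "GENE_B": "CCCCCCCC", "GENE_C": "GGTGTAGG"}
log_ratios = {"GENE_A": -2.0, "GENE_B": -1.5, "GENE_C": 0.1}
labels = classify_genes(promoters, log_ratios, TOY_PWM,
                        min_site_score=6.0, max_log_ratio=-1.0)
```

The three labels other than "neither" correspond loosely to the categories discussed here: candidate direct targets, repressed genes without a recognizable site, and sites that do not appear to repress in their genomic context.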
The primary goal of this study was to identify genes that are direct targets for repression by the a1-α2 complex. There are two major functional subsets among the a1-α2 target genes identified in this analysis (Table ). One, not surprisingly, involves genes that are required for various processes in mating of the two haploid cell types. These include components of the mating pheromone signal transduction pathway, such as GPA1 and STE5, which are activated in response to the binding of pheromone from the other cell type [24]. This group also includes genes further down that pathway, such as FAR1, which are required for cell-cycle arrest before mating. A number of these genes have previously been shown or suspected to be under the control of the a1-α2 repressor complex [25]. Repression of these genes in diploid cells is biologically important because it prevents further mating by diploid cells. If diploid cells mated, they would form triploids or higher-order polyploids, which are genetically unstable during meiosis and therefore detrimental to cell survival.
The second subset of genes identified in the analysis is associated with mating type switching and recombination. The HO gene is a known target of the a1-α2 complex and its promoter contains 10 binding sites of varying affinity [25]. Repression of HO is essential in diploid cells because it prevents switching of one of the MAT loci to form diploid cells homozygous at the MAT locus. Although diploid in genomic content, cells homozygous for the MAT loci are competent to mate and therefore would form higher-order polyploids that are genetically unstable. We have also shown that NEJ1, which is involved in non-homologous end-joining (NHEJ), is a direct target for the a1-α2 complex [27]. It has been proposed that repression of the NHEJ pathway may promote homologous recombination and crossing over in diploid cells. In addition, we found that RDH54, a gene involved in double-stranded DNA break repair, is a direct target for the a1-α2 complex [29]. This result is somewhat unexpected because RDH54 is required for meiosis and null mutants show significantly reduced spore viability. It is likely that the a1-α2 complex only partially reduces the level of expression of the gene and that diploid cells require a lower level of activity of the protein.
We also identified several genes that fell outside of these two subsets. One is RME1, which encodes a transcriptional repressor of IME1, the master regulator of meiosis [30]. a1-α2-mediated repression of RME1 is required to allow diploid cells to enter the meiotic pathway. Interestingly, we also found that PDE1 and MET31 are weak, but reproducible, direct targets for repression by the a1-α2 complex. The Pde1 protein is a low-affinity cAMP phosphodiesterase that appears to have a role in the response to stress and cell aging [33]. Repression of PDE1 in diploids may partially account for the difference in starvation response between haploids and diploids. Met31 is a zinc finger DNA-binding protein that activates genes involved in sulfur metabolism [34]. It is unclear why this gene would be a target for the a1-α2 complex.
It is possible that the presence of an a1-α2 target site upstream of a gene that has lower expression in diploid cells was fortuitous and that these sites were not functional targets. However, if this were the case, there would be little pressure to conserve these binding sites through evolution. Several closely related species of yeast have been sequenced, and comparison of the corresponding promoter regions has led to the discovery of conserved regulatory motifs [35]. Although lack of conservation does not imply non-functionality, significant conservation strongly argues for functionality of a putative regulatory element. To investigate this possibility, we performed a phylogenetic comparison using the PhyloGibbs program to infer whether these sites are preserved among six sequenced Saccharomyces species [37]. The program identified the a1-α2 binding site among a promoter set that included many known haploid-specific genes (for example, HO, NEJ1, GPA1, and STE4). This analysis also showed that the a1-α2 binding sites in the RDH54 and PDE1 promoters are strongly conserved among multiple species, suggesting that these sites play an important functional role.
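As a rough illustration of what conservation of a site across species means in practice, the sketch below computes per-species identity over the columns of a putative site in an alignment of orthologous promoters. It is a deliberately simple stand-in and does not reproduce PhyloGibbs, which models phylogeny and motif sampling jointly. The species labels, sequences, and site coordinates are hypothetical, and the promoters are assumed to be already aligned to equal length.

```python
def site_identity(aligned, reference, start, width):
    """Fraction of site columns in each species that match the reference.

    aligned   : dict mapping species label -> aligned promoter sequence
    reference : label of the species in which the site was found
    start     : 0-based alignment column where the site begins
    width     : number of columns in the site
    Gap characters ("-") in the reference never count as matches.
    """
    ref_site = aligned[reference][start:start + width]
    identities = {}
    for name, seq in aligned.items():
        if name == reference:
            continue
        other = seq[start:start + width]
        matches = sum(1 for a, b in zip(ref_site, other) if a == b and a != "-")
        identities[name] = matches / width
    return identities


# Toy alignment (hypothetical labels and sequences).
alignment = {
    "species_1": "ACCTGTAATGG",
    "species_2": "ATCTGTAATGC",
    "species_3": "GCCTGTCATAG",
}
identities = site_identity(alignment, "species_1", start=3, width=6)
mean_identity = sum(identities.values()) / len(identities)
```

A high mean identity over the site columns, relative to the flanking sequence, would be one simple indication that a site is under selective constraint.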
Our analysis identified a number of haploid-specific genes that do not appear to be direct targets of the a1-α2 repressor complex (Table ). Genes in this list do not contain a recognizable a1-α2-binding site and, with the exception of NEM1, are not detectably bound by the a1-α2 complex in the ChIP assays. It is possible that a1-α2 indirectly turns off these genes by repressing an activator protein that is required for their expression. However, besides MET31, there were no obvious genes coding for activator proteins that were direct targets of the a1-α2 complex. It is possible that the haploid-specific genes without a1-α2 sites are indirectly repressed through more complex mechanisms that involve repression of RME1.
We also identified potential a1-α2-binding sites in the genome that do not appear to repress expression of nearby genes. Although sites from the PRM8 and LSM1 promoters appear to be moderate binding sites for the a1-α2 complex in vitro, ChIP and heterologous reporter assays showed that these sites are neither bound by the proteins nor functional as repressor sites in vivo. Many of these sites lie in open reading frames of actively transcribed genes, so it is possible that transcription through the binding site or the chromatin structure of the region prevents high-affinity binding by the complex. The model that the genomic context of these sites is important for their regulatory activity is further supported by our finding that some of these sites, such as COX13, function as strong a1-α2-dependent repressor sites in the context of the heterologous promoter. Although the a1-α2 complex is bound to the COX13 site in vivo, it does not appear to repress transcription of this gene in diploid cells. Interestingly, this binding site is very close to the end of the coding region of IME4, an inducer of meiosis that is expressed in diploid cells [38]. The IME4 gene is expressed only in diploid cells, and it was thought that the a1-α2 complex may indirectly activate its expression by repressing a repressor protein, such as RME1. However, the fact that a1-α2 binds to the downstream region of this gene suggests that it may play a direct role in its expression.
Our data show that the algorithm we have developed is useful in distinguishing between direct and indirect targets of a transcription factor. Although we have used mutational data to define the binding site for the a1-α2 complex, in principle binding site sequences derived from site selection experiments may also be used. This analysis may also complement genome-wide ChIP studies to identify the target sites of the transcription factor.
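To illustrate how sequences from a site selection experiment could supply the binding site preference, the following sketch builds a position probability matrix from a set of aligned, equal-length selected sites. The sequences and the pseudocount value are hypothetical, and this is one common way of constructing such a matrix rather than the procedure used in this study.

```python
from collections import Counter

BASES = "ACGT"


def pwm_from_sites(sites, pseudocount=1.0):
    """Build a position probability matrix from aligned binding sites.

    sites       : list of equal-length uppercase DNA strings
    pseudocount : value added to every base count so that bases unseen at a
                  position still receive a non-zero probability
    Returns a list with one dict (base -> probability) per site position.
    """
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + len(BASES) * pseudocount
    pwm = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        pwm.append({base: (counts.get(base, 0) + pseudocount) / total
                    for base in BASES})
    return pwm


# Toy selected sites (hypothetical sequences).
selected_sites = ["TGTA", "TGTA", "TGTC", "AGTA"]
selection_pwm = pwm_from_sites(selected_sites)
```

A matrix built this way has the same form as one derived from mutational data, so it could be used with the same kind of promoter scan.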