The transcription factor YY1 is a Gli-Kruppel-type zinc finger protein that is highly conserved from insects through vertebrates [1]. YY1 can function as an activator, repressor, or initiator, depending on the other regulatory elements in the region [2]. YY1 also interacts with a variety of proteins, including components of the RNA polymerase II complex, transcription factors, and histone-modifying complexes [3-5]. According to genome-wide surveys, about 10% of all human genes contain YY1 binding motifs in their promoter regions [6]. Functionally, YY1 is involved in many biological processes, including embryonic development, cell cycle progression, apoptosis, B cell development, Polycomb group (PcG)-mediated repression, genomic imprinting, and X chromosome inactivation [2, 3, 7]. YY1 was also initially identified as a factor controlling the transcriptional activity of the murine retrotransposon 'Intracisternal A Particle' [8]. Since then, many retroposons, including SINE, LINE, and endogenous retrovirus families, have been shown to contain YY1 binding sites in their promoter regions [3, 4]. Because YY1 binding sites are so widespread in genome-wide repeats, YY1 has also been regarded as a surveillance gene responsible for repressing transcriptional background noise from these repeats [9].
The olfactory receptor (OLFR) genes of mammals encode short G protein-coupled receptors, each with a single coding exon, that are responsible for sensing a large number of airborne scents [10]. This gene family comprises over 800 and 1,300 members in human and mouse, respectively, forming the largest gene family in mammalian genomes [10-13]. Aquatic vertebrates of the teleost fish lineage also have a similar odorant receptor gene family [14]. However, the odorant receptor (OR) family of the fish lineage consists of far fewer genes than that of mammals, and these OR genes are also much more divergent in sequence than those of mammals. Mammalian OLFR genes are divided into Class I and Class II groups based on sequence identity [15]. Class II genes make up ~90% of OLFRs and are thought to have expanded during the transition to land-based living.
In mammals, these olfactory receptors presumably expanded because of the selective advantage conferred by a well-developed sense of smell [15]. While mice and other mammals retain function and expression of almost all OLFRs, the majority of these are pseudogenized in humans [16]. The mammalian OLFR genes are highly tissue-specific and are expressed primarily in the olfactory epithelium, though a subset is expressed in other tissues, such as kidney and sperm, where it serves a chemosensory role [17-19]. Furthermore, only one allele of a single OLFR gene, out of the entire repertoire of roughly 1,000 genes, is selected and expressed in each neuron of the olfactory epithelium [20]. This unusual transcriptional control of the OLFR gene family is likely mediated through unknown trans-acting factors [21]. The tissue-specific nature of OLFR expression, coupled with the widespread duplication of these genes, requires a mitotically stable global silencing mechanism in all cell types. According to recent studies, the potential cis-regulatory elements recruiting these trans factors are hypothesized to be located within the protein-coding regions of the OLFR genes rather than in the surrounding genomic regions [22, 23].
While performing genome-wide searches for the DNA-binding motifs of YY1, we discovered that the mammalian OLFR genes contain unusual clusters of YY1 binding sites within their protein-coding regions, whereas most YY1 binding sites are solitary and located upstream of the regulated gene. In this study, we further analyzed the significance of this discovery using several bioinformatic and statistical measures, described below. Specifically, we tested whether the presence of the YY1 binding sites could be explained by DNA sequence or amino acid motif conservation.
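As a rough illustration of what a search for clustered binding sites involves, the following sketch scans both strands of a DNA sequence for a consensus motif and groups nearby hits into clusters. The consensus GCCATNTT, the 500-bp window, the two-site minimum, and all function names are assumptions made for illustration only; they are not the motif definition, thresholds, or software used in this study.

```python
import re

# IUPAC codes needed for the assumed consensus; N matches any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a motif into a lookahead regex so overlapping hits are found."""
    pattern = "".join(IUPAC[base] for base in motif)
    return re.compile("(?=(" + pattern + "))")


def find_sites(seq, motif="GCCATNTT"):
    """Return sorted (start, strand) tuples for motif hits on both strands.

    The default motif is an assumed illustrative consensus, not the
    study's actual definition.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for match in motif_regex(pattern).finditer(seq):
            hits.append((match.start(), strand))
    return sorted(hits)


def find_clusters(sites, window=500, min_sites=2):
    """Group consecutive sites lying within `window` bp of the group's first site.

    Only groups with at least `min_sites` members are reported; a solitary
    site therefore never forms a cluster. Both thresholds are hypothetical.
    """
    clusters, current = [], []
    for pos, _strand in sites:
        if current and pos - current[0] > window:
            if len(current) >= min_sites:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Toy sequence with one forward and one reverse-strand hit.
    toy = "TTGCCATATTCCCCAAGATGGCTT"
    sites = find_sites(toy)
    print(sites, find_clusters(sites))
```

Applied to the coding sequence of a gene, a result with several sites in one cluster would correspond to the clustered pattern described above, whereas a single upstream hit would correspond to the more typical solitary site.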