There are an estimated 1400 site-specific DNA-binding factors encoded in the human genome [1]. Although these factors can influence transcription when their binding sites are cloned in front of core promoters, they usually do not function alone. Most often, individual transcription factors collaborate to orchestrate gene expression through combinatorial binding to regulatory regions in chromatin [2]. These regions, termed cis modules, thereby activate, repress or otherwise epigenetically modify the transcriptional responses of individual genes. Elucidating the position and activities of individual cis modules using reporter genes is time-consuming and expensive. With recent advances in DNA sequencing technology, it is now feasible to generate global protein-DNA interaction profiles by chromatin immunoprecipitation (ChIP) followed by ultra-high-throughput sequencing (ChIP-seq) [3]. Cis modules can then often be identified either by bioinformatic searches for one or more cis motifs recognized by other, unrelated factors near the binding sites of the factor analyzed by ChIP-seq, or by the co-localization of bound sites for two or more different site-specific factors.
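
The co-localization idea described above can be illustrated with a minimal sketch. It assumes each factor's bound sites are available as peak summit coordinates per chromosome. The function name `colocalized_sites`, the input layout, and the 200 bp window are hypothetical choices for illustration only and are not taken from this study.

```python
from bisect import bisect_left, bisect_right


def colocalized_sites(peaks_a, peaks_b, window=200):
    """Pair summits of factor A with summits of factor B within `window` bp.

    peaks_a, peaks_b: dict mapping chromosome name -> list of summit positions.
    The 200 bp default is an arbitrary illustrative value (an assumption).
    Returns a list of (chromosome, summit_a, summit_b) tuples.
    """
    pairs = []
    for chrom, summits_a in peaks_a.items():
        # Sort factor B summits once so each lookup is a binary search.
        summits_b = sorted(peaks_b.get(chrom, []))
        for pos_a in sorted(summits_a):
            lo = bisect_left(summits_b, pos_a - window)
            hi = bisect_right(summits_b, pos_a + window)
            for pos_b in summits_b[lo:hi]:
                pairs.append((chrom, pos_a, pos_b))
    return pairs


# Toy coordinates, invented purely for demonstration.
factor_a = {"chr1": [1000, 5000]}
factor_b = {"chr1": [1100, 9000], "chr2": [1000]}
print(colocalized_sites(factor_a, factor_b))
```

Candidate regions found this way would still need to be filtered and validated experimentally; the sketch only shows the coordinate comparison itself.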
Nuclear receptors (NRs) represent a special class of transcription factors that direct target gene transcription in a ligand-dependent fashion. NRs contain a DNA-binding domain that recognizes a specific DNA sequence, as well as a ligand-binding domain that renders these factors environmentally dependent regulators via interaction with distinct cognate ligands [4]. The great majority of NRs homodimerize or heterodimerize with another NR, and then bind to two copies of a repeated hexanucleotide sequence (called a half-site) separated by variable spacing [5]. The half-site consensus, AGGTCA, can occur in either orientation, and variation from the consensus allows numerous alternative binding sites of (probably) variable affinity [5]. Based on the number of spacer nucleotides separating the two half-sites and the orientation of the two half-sites relative to each other, NR binding sites have been categorized as direct repeats (DR0-DR8), everted repeats (ER0-ER8) or inverted repeats (IR0-IR8) [5].
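
The repeat nomenclature can be made concrete with a short sketch that labels pairs of exact-consensus half-sites by orientation and spacer length. It rests on two assumptions that the text does not spell out: the orientation convention (same orientation = DR; AGGTCA followed by TGACCT = IR; TGACCT followed by AGGTCA = ER), and exact matching to the consensus, which ignores the degenerate half-sites that real binding sites often contain. The function name `classify_nr_sites` is hypothetical.

```python
HALF_SITE = "AGGTCA"      # half-site consensus given in the text
HALF_SITE_RC = "TGACCT"   # reverse complement of the consensus
HALF_LEN = len(HALF_SITE)


def classify_nr_sites(seq, max_spacer=8):
    """Label every pair of exact-consensus half-sites as DRn, IRn or ERn.

    Assumed convention: same orientation -> DR; AGGTCA then TGACCT -> IR;
    TGACCT then AGGTCA -> ER. n is the spacer length (0 to max_spacer).
    Returns a list of (start_of_first_half_site, label) tuples.
    """
    seq = seq.upper()
    # Record the position and orientation of each exact half-site match.
    hits = []
    for i in range(len(seq) - HALF_LEN + 1):
        hexamer = seq[i:i + HALF_LEN]
        if hexamer == HALF_SITE:
            hits.append((i, "+"))
        elif hexamer == HALF_SITE_RC:
            hits.append((i, "-"))

    sites = []
    for a, (pos1, ori1) in enumerate(hits):
        for pos2, ori2 in hits[a + 1:]:
            spacer = pos2 - (pos1 + HALF_LEN)
            if spacer < 0:
                continue  # overlapping half-sites are not a repeat
            if spacer > max_spacer:
                break     # hits are position-sorted, so later ones are farther
            if ori1 == ori2:
                kind = "DR"
            elif ori1 == "+":
                kind = "IR"
            else:
                kind = "ER"
            sites.append((pos1, f"{kind}{spacer}"))
    return sites


print(classify_nr_sites("AGGTCA" + "A" + "AGGTCA"))  # [(0, 'DR1')]
```

A practical motif scan would use a position weight matrix rather than exact matching, since variation from the consensus is common; exact matching is used here only to keep the classification logic visible.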
NR2C2 (human testicular receptor 4, TR4, in the older nomenclature) belongs to the nuclear receptor superfamily and is termed an orphan receptor because no ligand for it has been discovered [6]. TR4 was initially identified in hypothalamus, prostate, and testis cDNA libraries, but has since been demonstrated to be broadly expressed in many physiological systems [9]. For example, TR4 has been shown to activate target gene expression in liver carcinoma HepG2 cells [11]. In contrast, in erythroid cells, TR4 can heterodimerize with a closely related family member (TR2, or NR2C1) and bind to a DR1 element (a direct repeat with a one-nucleotide spacer) to repress target gene transcription [12]. The binding affinity of the TR4 homodimer for the DR1 element in vitro is equivalent to that of the TR2:TR4 heterodimer [15], and TR4 mRNA is more abundant than TR2 mRNA in human erythroid cells (Tanabe, unpublished observations). However, the broader physiological functions of this broadly expressed nuclear receptor, and its genome-wide binding patterns in vivo, remain obscure. We therefore chose to begin by investigating genome-wide TR4 binding, anticipating that these studies might reveal some common, but also perhaps some tissue-specific, metabolic processes to which this factor contributes.
In this study we report the first genome-wide identification of cellular targets of TR4 and a preliminary characterization of TR4 binding in vivo in multiple cell types, including cells in which TR4 has been suggested to be an activator (liver) and cells in which it has been suggested to be a repressor (blood). Using ChIP-seq, we determined TR4 in vivo binding in four human ENCODE cell lines: K562 erythroleukemia cells, HepG2 liver carcinoma cells, HeLa cervical carcinoma cells, and GM12878 immortalized lymphoblast cells. The TR4 binding patterns identified in these four diverse cell lines suggest that this factor controls cell metabolism by binding to the proximal promoter regions that are common to several hundred genes. Motif analysis shows that TR4 strongly prefers a DR1 sequence to all other categories of repeat elements in vivo. By integrating TR4 binding data with histone modification patterns and other genomic structures, we predict, and then experimentally test, putative cis modules.