TF-responsiveness was measured from the published data on the change of gene expression following induced overexpression of 50 TFs and 3 other genes [
9]. Expression of a transgene inserted in the ROSA26 locus was induced by doxycycline withdrawal (Dox-), whereas control cells were continuously cultured in a Dox+ condition. The effect of TF manipulation was measured by the logratio of gene expression (Dox-/Dox+) on the 2nd day after induction. We analyzed only those genes whose expression was determined with sufficient accuracy (so that 1.5-fold changes were statistically significant) and whose TSS were known. Thresholds of TF-responsiveness that separate responsive and non-responsive genes (Figure ) were estimated as 75-percentile and 50-percentile, respectively, for two groups of genes with log-expression (log10, RNA-seq) from 0.5 to 1.5 and from 2.5 to 3.5; then thresholds were linearly interpolated as a function of gene expression in ES cells.
The following organs from 30-week old male mice of C57BL/6 strain were used for gene expression profiling: brain cortex, cerebellum, eyes, skeletal muscle, heart, bone, liver, kidney, bladder, skin, visceral fat, lung, small intestine, large intestine, and stomach. Mouse husbandry and organ collection were approved by the Institutional Animal Care and Use Committee (ASP# 220-LG-2011). Although comparable data is available from the GNF database [
15], the advantage of our data is that we used in-house designed microarrays[
35] that represent a large set of genes (
N = 25030), and probes for many genes are more sensitive compared to arrays used in the GNF database. Mice were euthanized by cervical dislocation. Total RNA was isolated by TRIzol (Invitrogen). Cy3-CTP labeled sample targets were prepared with total RNA by Low RNA Input Fluorescent Linear Amplification Kit (Agilent). Cy5-CTP labeled reference target was produced from mixture of Stratagene Universal Mouse Reference RNA and MC1 cells RNA. Samples were collected in 2 replications taken from different animals.
To characterize the effect of inhibitors, which are known to support the pluripotent state of ES cells [
36], we treated B6R(5) mouse ES cells (C57BL/6 strain) with FGFR inhibitor PD173074 [
37] (100 μM), MEK inhibitor PD98059 [
38] (25 μM), and GSK-3 inhibitor BIO [
39] (2 μM) 24 hr after plating. Cells were grown without feeders on gelatin-coated 6-well plates at 100,000 cells/well (10
4 cells/cm
2) in complete ES medium, which was changed daily. Inhibitors dissolved in DMSO were added 24 hr after plating and cells were harvested 48 hr after treatment (72 hr after plating). Control cells were treated with DMSO. RNA was extracted and processed as described. Gene expression data were analyzed using the NIA Array Analysis [
40].
Whole-genome data on chromatin modifications H3K4me3, H3K27me3, and H3K36me3 [
7] were re-mapped to the latest mouse genome (mm9, NCBI/NIH) using the UCSC coordinate conversion tool (
http://genome.ucsc.edu/cgi-bin/hgLiftOver). Tri-methylation of H3K4 and H3K27 was counted for a gene if methylation peaks identified using hidden Markov models and sliding windows was within 1 Kb from the TSS, whereas tri-methylation of H3K36 was counted along the entire transcript length. Genes with H3K9me3 marks with 5 Kb of TSS were taken from the reference [
18]. Strength of H3K4me3 and H3K27me3 methylation was assessed by the number of ChIP-seq tags within 1 and 3 Kb from TSS, respectively, based on the data from the reference [
7]. Expression levels were estimated using published RNA-seq data [
19] and transcript coordinates from the NIA Mouse Gene Index, assembly mm9 [
20], and expressed in log-transformed number of tags per 1 Kb transcript length. The RNA-seq method with random tags is better for comparing expression levels of different genes than microarrays because it does not have biases related to the position of the oligo and its sequence. Obtained gene expression values correlated well with microarray results (
r = 0.71, Additional File
17). A few genes (
N = 345) had no RNA-seq tags, possibly because tags may have been assigned to a different gene copy in the genome. The expression of these genes was interpolated from microarray data (i.e., gene expression in Dox+ conditions) using linear regression. To plot the relationship between TF-responsiveness and chromatin status, genes were ordered by increasing TF-responsiveness and split into sequential sets of 100 genes. We then estimated the average TF-responsiveness and the proportion of genes with a specific chromatin modification in each group of 100 genes.
The location of the main TSS for 17,412 non-redundant genes was taken from [
7] and was identified using CisView [
41] for 4,981 other genes. CpG islands were identified using CpGProD software [
42] and attributed to genes if they were located within 1 Kb from the TSS. TF binding motifs over-represented in promoters of responsive and non-responsive genes were identified and annotated using CisFinder [
23]. We analyzed genes with and without CpG islands separately, and for each group of genes we analyzed 200-bp regions upstream of the TSS and 200-bp regions downstream of the TSS (4 pairs of comparisons in total). Promoters of all genes (from -500 to +500 bp) were searched for the occurrence of TF binding motifs using CisFinder assuming 5 false positive matches per 10 Kb of random sequence. Thresholds for the matching score were further adjusted to a minimum of 2-fold enrichment of motif abundance either in the group of non-responsive genes or responsive genes. In particular, all motif matches were separated into groups according to their orientation ("+" or "-") and position relative to TSS in 100-bp intervals (i.e., from -500 to -400; from -400 to -300,..., and from 400 to 500). The combination of these two criteria yielded 20 groups of motif matches, which were analyzed separately. If the over-representation ratio of motif matches in a specific group was >2 fold, then all matches were counted in this group. However, if the over-representation ratio was <2 fold, then the matching threshold was increased to achieve the 2-fold enrichment. If the 2-fold enrichment was not achieved after any increase of the threshold, then no matches were counted in that group. Because TATA box has a strictly defined location in the promoter, it was handled separately from other TF binding motifs. TATA box was identified using the degenerative pattern KAWWW starting from 40 to 20 bp upstream of the TSS [
41].