In order to test our proposals we analyzed two publicly available datasets corresponding to the two conditions (CM and E2T) described above
]. Details on the datasets are available in the Methods section. We report here the results of a set of enrichment tests performed on these data. Details on the tests are also available in the Methods section and in the Additional file
ERα preferentially binds DNA within particular classes of TE sequences in both CM and E2T conditions. While some of the selected TEs are enriched in both conditions, a few of them are specifically enriched only in CM or E2T datasets.
We show the results of our enrichment test in Figure
A for the “CM” dataset
] and in Figure
B for the “E2T” dataset
] (see also Additional file
: Table S3 and Additional Table S4 for absolute values and Figure
Figure 1 Enrichment of TEs in the ERα binding datasets. Histogram of the z-score values showing the enrichment, with respect to random reshuffling, of single TEs (left) and classes (right) in the CM dataset (A) and in the E2T dataset (B) respectively. (more ...)
Figure 2 Relative fractions of enriched single TEs and classes. Pie-charts showing the relative fraction of enriched single TEs and classes shown in Figure
A and Figure
B. Top pies: single TEs; bottom pies: classes; left pies: (more ...)
The very high levels of enrichment observed prove that ERα preferentially binds DNA within a few specific families of TEs both in CM and in E2T conditions. This confirms previous results for ERα reported in
] (which were based on a different ChIP-seq dataset) and agrees with what already observed for p53
], POU5F1 also known as OCT4
] and STAT1
Examining results in more detail, we see that the TEs are not all on the same ground and that the highest enrichment scores are reached for a few selected TE families. This suggests that besides a general affinity for the given TE class, ERα is able to perform a fine-tuned selection and shows a specific affinity for a few well-defined small families of TEs within the class. This remarkable selectivity allowed us to perform a detailed analysis of the combinatorial regulatory patterns involving ERα and to identify a few putative new co-regulators. Our results suggest that each particular TE family within a given class should actually correspond to a different combinatorial regulatory pattern.
As far as the difference between the two datasets is concerned, when comparing the enrichment scores we clearly see a twofold pattern. The ERV-like TEs (in particular those belonging to the ERV1 and ERVL classes) are enriched in both datasets while the MIR-like TEs show a preferential enrichment in the E2T dataset. In order to test this observation further we performed the same enrichment analysis separately for the three groups of sequences: i) intersection of the two datasets, ii) the binding events present only in the CM dataset and iii) the binding events present only in the E2T dataset. The results are reported in Table
and nicely confirm the previous results.
Table 1 z-scores (first five columns) and numbers of instances (last five columns) of the most enriched transposable elements and classes of TEs for the five possible combinations of the two datasets: CM, E2T, CM only (= CM\ E2T), E2T only (= E2T\ CM) and the (more ...)
We see two possible explanations for the different behaviour in the two datasets. They correspond to different experimental settings: the E2T dataset corresponds to an acute, sudden exposure to elevated concentration of 17β-estradiol, while the CM dataset corresponds to cells that are grown in a steady state situation of low estrogen concentration. Therefore, either i) the two different exposures to estrogen likely correspond to different concentrations of the Estrogen Receptor in the nucleus, thus making it possible that also binding sequences with a lower affinity to the receptor are bound in the dataset (E2T) with the highest concentration; or ii) continuous versus pulse estrogen exposure would result in different concentration and activity of cofactors known to play a crucial role in fine tuning the binding of ERα to its target sequences.
Most of the enriched classes of TEs predate the human-mouse separation but a few of them appeared in the last 100 Myrs and induced a rewiring of the human regulatory network with respect to the murine one.
An interesting feature of the above results is that most of the enriched TE families are rather old Transposons. This is particularly true for the MIR-like sequences, whose amplification is known to predate (at least in part) the mammalian radiation
]. In the case of the ERV-like sequences the time window is slightly larger. TEs belonging to the ERVL class, which is the largest class of ERV sequences, and is enriched in both our datasets, ceased their activities only 40 Myrs ago
], while TEs belonging to the ERVK class (which is enriched only in the CM dataset) are definitely younger. A tentative estimate of their mean age gives numbers ranging from 8 Myrs (if only human ERVK are considered) up to 18 Myrs (if human and chimp ERVK are considered)
]. Even if it is widely accepted that these TEs are no longer active, examples of very recent ERVK insertions (about 6 Myrs old) exist
]. In any case it is clear that most of the ERVK insertions occurred after the new world/old world monkeys split.
Thus it seems that, as far as these TEs binding sequences are concerned, a twofold picture exists also for the network rewiring.
A large portion of the ERα regulatory network (and in particular the part which is activated in the E2T condition) underwent a dramatic rewiring at the beginning of the mammalian radiation (more than 100 Myrs ago) but then it stabilized, and is now shared by all the Mammalians. This obviously does not exclude local changes and possibly also transposition events after the mouse/human divergence, but suggests that these should not be the rule and that a massive rewiring of this subnetwork in the last 100 Myrs should be excluded. Finding evidences of this conservation by a direct comparison of the human and mouse sequences is very difficult because most of these ancient TEs are too corrupted to be identified. Nonetheless, by combining the syntenic maps and the TE annotations of the human and mouse genomes downloaded from the Ensembl database
] we were able to single out, among the TEs bound by ERα in our datasets, several pairs of conserved Trasposons (our results are summarized in Additional file
: Tables S6 and S7). These examples are of particular interest since the very fact that they could be identified suggests that a large portion of the TE sequence should be under purifying selection and, therefore, they may represent important regulatory sequences. As expected from the above discussion, most of the conserved TEs belong to MIR-like classes (since most of the ERV type TEs are younger than the human-mouse separation) with a few notable exceptions like MER20, MER31A and MER77 which are known to be among the oldest ERV-type TEs.
Among the TEs enriched in our dataset belonging to the MIR family, a fraction between 15% and 25% of the TEs was conserved between human and mouse. We then looked for putative regulated genes of these conserved MIR-like TE’s (details in the Methods section). We found several putatively regulated genes (see Additional file
: Table S10 and S11) and, remarkably enough, most of them turned out to be important known targets or cofactors of ERα. In particular, among the regulated genes, we found GREB1 (growth regulation by estrogen in breast cancer 1 gene) which is known to be regulated by ER and to be involved in estrogen-responsive breast cancer (see for instance
]); RARα, which is a known cofactor of ERα
]; TAP/SEC14L2 which is known to be downregulated by ER in breast cancer cells
]; GLUT1/SLC2A1, whose expression in the luminal epithelial cells of the uterus was shown to be regulated by ERα
] and is involved in the ERα mediated response to hypoxia
] and several other ERα regulated genes like Anxa6
] and PRKACA
These findings suggest that also some of the remaining genes targeted by conserved TEs could be involved in ERα regulated pathways and that this combination of ChIP-seq and evolutionary conserved TEs could be in general (i.e. also for the other TFs associated TEs dicussed in the introduction) a powerful tool to identify reliable targets.
On the contrary, the portion of the regulatory network that is activated in CM conditions, which is strongly enriched in ERVL and ERVK insertions, most probably underwent a significant rewiring in the last 100 Myrs. However, note that, as above, also in this case one cannot exclude local cases of convergent evolution (see below for a nice example) or that part of these insertions (being too young to have been modeled by evolution) simply correspond to non-functional bindings, thus mitigating the rewiring effect of the ERV-like transposons expansion.
MIR-like and ERV-like TEs targeted by ERα are preferentially located near the TSS of regulated genes
In order to support the idea that the MIR-like and ERV-like TEs in the peak datasets have a regulatory role, we identified their chromosomal localization and measured their distance from the Transcription Start Site (TSS) of the nearest gene. Results of this analysis are reported in Figure
A for MIR-like TEs and in Figure
B for ERV-like TEs. We see that in both cases the TEs nicely peak around the TSS. The pattern is very similar in the two datasets and supports our suggestion that these TEs were indeed exapted by the Estrogen Receptor and are likely to play a regulatory role in driving the response to estrogen stimulation.
Figure 3 Selected classes of TEs are located nearby the TSS of known genes. (A) Histogram of the distance of MIR-like TEs targeted by ERα from the TSS of the nearest gene. (B) Histogram of the distance of ERV-like TEs targeted by ERα from the TSS (more ...)
TEs selected by ERα are enriched for binding sequences of TFs representing known cofactors of ERα. Our approach allows the identification of new putative cofactors
In order to exert its regulatory functions, ERα very often interacts with other TFs. In particular, clear evidences of regulatory interactions with FoxA1, RARα, GATA3, SP1 and AP1 (JUN
FOS) have been recently observed (
In order to test whether these interactions are mediated by TEs and in particular by those that are enriched in the two datasets, we evaluated the Transposon Affinity Score (TAS) value (for the definition see the Methods section) for each TF. Results are reported in Additional file
: Table S3 and S4 separately for the two datasets (only TFs with TAS value greater than 0.1 for at least one of the enriched TEs are considered).
Remarkably enough, most of the known cofactors of ERα listed above have a TAS value above threshold in at least one of the enriched TEs (in most cases in several TEs). Moreover the same procedure allowed us to identify a few combinatorial patterns associated to selected classes of TEs, thus supporting our suggestion of a TEs driven combinatorial organization of transcriptional regulation. These data (reported in Additional file
: Table S3 and S4 and in Figure
) allowed also the identification of several other putative cofactors (like AP-2 factors or ERR) for which, as far as we know, experimental evidences like those mentioned above for FoxA1, RARα, GATA3, SP1 and AP1 are still missing. In Figure
we report the heat-maps showing the fraction of enriched transposable elements in the datasets which carry given computationally predicted transcription factor binding sites. Yellow corresponds to low fractions, blue to high fractions. In the figure we also report the hierarchical clustering of both TEs and TFs. These heat-maps are probably the best representation of the “Transposon code” mentioned above: each column corresponds to a different TE and the blue spots along the column to the particular combination of TFBSs brought about by that particular TE. Well defined combinatorial patterns involving TEs belonging to the same class can be recognised in the heat-maps. Their organization is made more explicit by the hierarchical clusterings reported along the sides of the heat maps. Smaller versions of these maps, involving only the most important known cofactors of ERα can be found in the Additional Material (Additional file
: Figure S1 and Additional file
: Figure S2).
Figure 4 Enriched TEs are often bound by additional TFs, besides ERα. The heat-maps show the fraction of enriched TEs in the datasets which carry particular computationally predicted TFBSs. A refers to the CM dataset while B refers to the E2T dataset. (more ...)
Selection of combinatorial patterns using our “Transposon Affinity Score” has evolutionary implications. A TAS value above 10% (i.e. ten times the one expected by chance) is likely to occur only if the TE contained in its ancestral sequence one or more subsequences similar to the TFBS under study. If we accept this assumption then the TAS value measures the fraction of such ancient subsequences modified by evolution to become TFBSs (or conserved, if they were already present).
TFs enriched in the datasets, and in particular the known ERα cofactors, tend to be simultaneously present and to appear at fixed distance in the TEs that we studied.
A possible objection to the above statement is that simultaneous enrichment does not necessarily imply that the two TFBSs are simultaneously present in the same TE instance. To address this issue, for each set of TEs we looked for correlators of TFBSs at fixed distance along the sequence, i.e. we counted how many times a pair of TFBSs occurs exactly at a distance d
along the TE sequence. We report the results of this analysis in Additional file
: Table S5. An interesting example is plotted in Figure
for the case of ERR and GATA-3. As can be seen from Figure
, this correlation function has a very sharp peak at a fixed distance. This behaviour is typical of most of the identified correlators and agrees with a set of recent observations (see for instance
]) showing that TFs cooperation, to be effective, requires tight spacing patterns between the TFs.
Figure 5 TFBSs tend to appear at fixed distance in the TEs that we studied. The case of MER41B is reported as an example. In A is shown the logo of ERα ChIP-seq regions carrying GATA-3 and ERR at the fixed distance of 24 nts (measured from start to start (more ...)
Both in the CM and in the E2T datasets a putative ERR binding site is associated to GATA-3 at a fixed distance of 24 bp in the context of MER41A/B TEs. These TEs also host other important correlations: for example BRCA1 with E2F-1 (at a fixed distance of 33 bps) and RARα with GATA-3 (at a fixed distance of 19 bps, see the results listed in Additional file
: Table S5). Thus we see that two of the TEs showing the highest level of enrichment in our datasets host a collection of TFBSs located at a fixed distance among them, whose relative position seems to be carefully conserved by evolution. This example is interesting also for another reason. BRCA1 is characterized by a PWM with a very low information content and for this reason we could not single it out in the enrichment analysis described in the previous section. The same happened also for other TFs with similar scarcely informative PWMs. This problem is completely overcome by the correlation analysis discussed here, which is much more constrained and allows precise identification of TFs even when they display scarcely informative PWMs.
The genes of our datasets regulated by particular classes of TEs show a non trivial enrichment with respect to a few Gene Ontology categories
To further support these results, we searched if genes connected to specific TEs (or classes of TEs) are enriched for specific GO categories, following the procedure discussed in the Methods section. We performed this analysis both on the whole set of enriched TEs and separately on the subset of TEs conserved between human and mouse. Results of the analysis are reported in Additional file
: Tables S1 and S2 for the whole set of enriched TEs and in Additional file
: Tables S8 and S9 for the subset of conserved TEs.
We found several interesting results. A few of them are clearly related to the expected function of estrogen, like “cell-cell signaling” (p
4 10^-3) or “secreted” (p
5 10^-4), (enriched as expected in the E2T dataset) but a few others, which present rather strong functional enrichment are new and could suggest new functions (both in CM and E2T conditions) driven by selected classes of TEs. Interesting examples are GTPase regulator activity (p
6 × 10^-5), regulation of Ras protein signal transduction (p
6 × 10^-4) or Keratin (p
10^-6). In particular, this latter category, which is enriched in the E2T dataset and is strongly associated to the MER20 TE, suggests that this particular class of TEs could play a critical role in remodeling cell morphogenesis, a key process in mammary gland differentiation
]. In agreement with this observation, it is very interesting to notice that this specific TE is enriched in E2-treated dataset suggesting a link with increasing level of estrogen during pregnancy.