IAPs and ETn/MusDs are high-copy-number ERV families and, while hundreds to thousands of copies are present in the genome, relatively few are present near genes. Because DNA methylation in general targets TE copies, it is important for the host to manage the impact of epigenetic regulation of the copies that remain near genes. We show here, for the first time, that two ERV families, ETn/MusD and IAPs, are differentially targeted by DNA methylation when near genes, with nearly all IAP copies remaining methylated throughout the genome but ETn/MusD copies being less methylated when near TSSs. Our dataset, although limited, contains every ETn/MusD copy close to genes and 30% of all IAP copies found near genes (78% of all IAP copies within 2 kb of a TSS). Therefore, our conclusions could reasonably apply to all copies of both types of ERVs in the genome.
We have previously shown that the repressive mark H3K9me3 spreads robustly from IAPs but less so from ETn/MusDs [5]. Further evidence that these two ERV families are distinctly epigenetically regulated comes from a recent study showing that knockdown of both Dnmt1 and SetDB1 (responsible for depositing H3K9me3 on these ERV families) is required in ES cells to achieve robust de-repression of IAP transcription, whereas only SetDB1 knockdown is necessary for activation of ETn/MusD [14]. These data could suggest that IAPs are more detrimental to host genes than ETn/MusDs, and are thus under more stringent control.
A recent study demonstrated that Alu SINE elements are hypomethylated in humans when positioned near expressed genes, but are methylated when near silenced genes [32]. However, in marked contrast to ERVs, Alus are generally well tolerated near genes and in fact show enrichment in gene-rich regions [33], suggesting that epigenetic interactions between Alus and host genes are quite different from those between ERVs and genes. In rice, the retrotransposon dasheng shows tissue-specific DNA methylation that correlates with the tissue specificity of nearby gene expression [35]. Furthermore, unmethylated dasheng copies impact host gene expression by producing antisense chimeric transcripts that putatively promote mRNA degradation [35]. Here, we found that mouse ERV elements impact the host gene by donating a promoter and producing fusion transcripts.
All 5' LTRs included in our analysis are methylated. Therefore we hypothesize that, since the regulatory sequences necessary for ERV transcription and possible transposition are present in the 5' LTR, methylation, and consequently silencing, of this LTR is necessary to reduce harmful effects of putative new transpositions. Furthermore, we have shown that, compared with CGI promoters, non-CGI promoters are relatively depleted of instances where the 5' LTR is proximal. This observation suggests that spreading of DNA methylation from 5' LTRs into non-CGI promoters might be the more likely scenario, thereby leading to harmful effects on gene expression and negative selection against such ERV copies. Indeed, the role of CpG methylation in the regulation of non-CGI genes remains unclear. Several reports have shown that expression of non-CGI genes is independent of DNA methylation [36], while a recent report reveals in vitro silencing of two CpG-poor genes caused by DNA methylation and nucleosome remodeling [37], confirming our previous observations [38]. CGI sequences are known to be resistant to methylation in humans and play an important role in maintaining an open chromatin environment via transcription factor binding and H3K4me3 enrichment ([40] and reviewed in [41]). The presence of H3K4me3 has previously been shown to exclude DNA methylation [24], suggesting CGI promoters may normally be protected from DNA methylation spreading from nearby ERVs. By contrast, CpG-poor genes are thought to harbor less ubiquitous H3K4me3 enrichment than CGI genes ([23] and reviewed in [42]) and hence may be more sensitive to ERV DNA methylation spreading. We show that H3K4me3 euchromatin is able to spread from gene promoters to nearby sequences, likely contributing to the lack of methylation at ERV copies in these regions. In agreement with our observations, Hejnar et al. have elegantly constructed a vector harboring a CGI from the mouse Aprt gene upstream of avian Rous sarcoma virus-derived sequences and transfected it into non-permissive mammalian cells in order to follow the methylation status and transcription levels of integrated copies [43]. While the Rous sarcoma virus is known to be methylated when inserted into mammalian cells, the adjacent CGI protects the inserted copies from DNA methylation and allows for virus transcription [43]. Hejnar's group has recently shown that proviruses inserted close to TSSs enriched in H3K4me3 are not immediately silenced compared with intergenic insertions and are resistant to DNA methylation [44], further supporting our hypothesis.
Boundary elements that act to separate euchromatin and heterochromatin domains may also act in blocking the accumulation and spreading of repressive marks, as has been shown for CTCF [26] or H2AZ [45]. A high proportion of 5' LTRs close to gene TSSs presented CTCF bound to their intervening regions, suggesting that 5' LTRs that remain after selection may require more than just H3K4me3 enrichment to block heterochromatin spreading. Interestingly, a recent genome-wide study in the human genome showed that gene promoters resistant to aberrant DNA methylation in cancer exhibited an increased frequency of retroelements nearby when compared with promoters prone to methylation. It was hypothesized that methylation-resistant genes may harbor more transcription factor-binding sites or boundary elements that act to prevent methylation, whereas methylation-prone genes do not have these protecting factors and are therefore more susceptible to potential silencing, which results in stronger negative selection against nearby insertions [46]. This hypothesis is in accordance with our data.
The complex relationship that exists between TEs and host genes suggests that selection may act not only on the potential harmful effects of TEs on host genes but also on the epigenetic consequences of the TE presence. The fight between ERV heterochromatin and host CGI promoter euchromatin favors the host gene (Figure ), with the gene-induced open chromatin sometimes impacting the nearby ERV and, in turn, increasing expression of the host gene through alternative promoters. Cases where the ERV-induced heterochromatin overcomes the promoter euchromatin (Figure ) are likely to be quite rare as most such insertions will be eliminated due to selection unless their effects do not significantly impact host fitness. While all the mechanisms underlying this chromatin battle remain unknown, it is important to note that every TE family may have a different relationship with host genes and most copies that have survived selection seem to have reached an epigenetic equilibrium with their associated host gene (Figure ).
Figure 7. Gene-endogenous retrovirus confrontation. (A) Cartoon showing spreading of H3K4me3 euchromatin from the gene promoter towards the ERV sequence. The ERV becomes unmethylated and is able to act as an alternative promoter, potentially increasing expression ...