Almost half of the human genome and as much as 40% of the mouse genome is composed of repetitive DNA sequences. The majority of these repeats are retrotransposons of the SINE and LINE families, and such repeats are generally repressed by epigenetic mechanisms. It has been proposed that these elements can act as methylation centers from which DNA methylation spreads into gene promoters in cancer. Contrary to a methylation center function, we have found that retrotransposons are enriched near promoter CpG islands that remain methylation-free in cancer. It is therefore important to determine what influence, if any, these repetitive elements have on nearby gene promoters. Using an in vitro system, we confirm here that SINE B1 elements can influence the activity of downstream gene promoters, which acquire DNA methylation and lose activating histone marks, resulting in a repressed state. The SINE sequences themselves did not immediately acquire DNA methylation but were marked by H3K9me2 and H3K27me3. Moreover, our bisulfite sequencing data did not support the idea that the gain of DNA methylation in gene promoters occurred by methylation spreading from SINE B1 repeats. Genome-wide analysis of SINE repeat distribution showed that SINE enrichment near genes correlates with binding of USF1, USF2 and CTCF, proteins with insulator function. In summary, our work supports the concept that SINE repeats interfere negatively with gene expression and that their presence near gene promoters is counter-selected, except when the promoter is protected by an insulator element.
Retrotransposons of the SINE and LINE classes have been extremely successful in colonizing mammalian genomes (1). In human and mouse specifically, these two classes of repeats make up almost 50% of the genome. This high frequency was in the past regarded as the result of multiplication by retrotransposition and random integration into the genome. However, new evidence suggests that the current distribution of these elements was shaped by evolutionary constraints and that retrotransposons and other repeats have evolved from parasitic sequences into functional genomic elements (2, 3). Earlier evidence also indicated that the distribution of interspersed elements is conserved between the human and mouse genomes (4). A direct comparison of these genomes revealed spatial concordance in the positioning of SINE but not LINE repeats, indicating evolutionary pressure to maintain and/or exclude these repeats from orthologous regions (5).
Mobilization of retroelements is inhibited by epigenetic mechanisms (6). SINE and LINE elements are frequently methylated, as confirmed by recent genome-wide methylation studies in mouse and human (7, 8). In normal cells, DNA methylation is rare in promoter-associated CpG islands but is important for X-inactivation (9) and genomic imprinting (10). During tumorigenesis, hundreds to thousands of genes gain methylation in promoter-associated CpG islands, affecting many pathways, including tumor-suppressor pathways (11). The causes of this massive switch in DNA methylation remain unknown. When searching for sequences that distinguish genes rarely methylated in cancer from those frequently methylated, we found that SINE and LINE retrotransposons are enriched around the transcription start sites (TSSs) of genes that are never or rarely methylated in cancer (12). This association was strong enough to build a mathematical model predicting gene methylation from the presence of retrotransposons near the gene TSS, and this model performed well when applied to methylation data from both cancer cell lines and primary tissues. Similar findings, showing that a signature of retrotransposon abundance predicts epigenetic states, have been described for DNA methylation in normal tissues (13), DNA methylation in cancer (14), genomic imprinting (15) and X-chromosome inactivation (16), among other studies. Given the hypermethylated state of retrotransposons in normal cells, it is somewhat paradoxical that SINE and LINE elements are poorly represented in the vicinity of genes that become hypermethylated in cancer (12, 14, 17). Moreover, SINE B1 elements were previously reported to cause transcriptional silencing of the adjacent aprt gene by spreading of DNA methylation (18).
We hypothesized that the enrichment of retrotransposons near cancer methylation-resistant genes reflects a negative influence of SINE repeats on gene expression and counter-selection during evolution, resulting in exclusion of these repeats from vulnerable genomic environments. Thus, only gene promoters with intrinsic resistance to DNA methylation would be permissive to the nearby presence of retrotransposons. This hypothesis predicts that (i) SINE retrotransposons cause silencing of vulnerable nearby gene promoters, and (ii) SINE abundance near promoters correlates with genomic features that limit the influence of retrotransposons on adjacent chromatin. Here, we show that both predictions are supported.
Promoter regions containing CpG islands from the murine genes Cdkn2d (−440 bp to +618 bp from the gene TSS), Cdkn2a long transcript, also known as p14Arf (−273 bp to +474 bp from the gene TSS), and Mlh1 (−226 bp to +267 bp from the gene TSS) were identified from mouse genome sequence repositories and amplified by PCR. These genes were selected from among thousands of others with a promoter CpG island because p14Arf and Mlh1 have tumorigenic potential in mice and their human homologues are prone to hypermethylation in human neoplasias (19, 20). Cdkn2d was included because it belongs to the same gene family as p14Arf. In addition, none of these promoter CpG islands is constitutively methylated in normal cells. NIH/3T3 DNA was used as template for the Cdkn2d and Mlh1 PCRs, and p14Arf was amplified from BALB/c mouse liver DNA. The PCR fragments were digested with appropriate restriction enzymes and subcloned into a pGL3-Basic reporter plasmid (wild-type promoters). SINE B1 elements were inserted immediately upstream of the cloned promoters, generating plasmid variants containing 2, 4 or 6 B1 elements (see Table S1 for details on the cloned SINE B1s and other repetitive elements used in this study). Mouse embryonic fibroblast NIH/3T3 cells were transiently co-transfected with the promoter constructs (1 μg) and 20 ng of pRL-TK vector using FuGene reagent (Roche) according to standard methods. Cells were harvested approximately 24, 48 and 72 hours after transfection, and luciferase activity in cell extracts was measured using the Luciferase Assay System (Promega, Madison, WI, USA) as specified by the manufacturer. Luciferase activity of each construct was normalized to pRL-TK activity, and the normalized value of each wild-type promoter was set to 1.0. Stable transfections were performed by co-transfection of a neomycin-expression vector.
Neomycin selection was initiated on day 2 after transfection: the medium was replaced with fresh DMEM containing 10% calf serum and G418 at 400 μg/ml (Invitrogen), and the cells were subcultured every 2–3 days. Cell pellets were collected for DNA extraction and luciferase readings at days 19, 36, 48, 60 and 90 for Cdkn2d and Mlh1 constructs and at days 15, 22, 29, 36, 43 and 50 for p14Arf constructs. Sequences of all primers used in this work are provided in Table S2. To investigate human SINEs, we cloned the human CDH1 gene promoter upstream of the luciferase reporter gene in four configurations: wt-CDH1, without any retroelements (−303 bp to +1,370 bp from the gene TSS); SINE-CDH1, with Alu and Mir sequences downstream of the gene promoter (−303 bp to +2,318 bp from the gene TSS); LINE-CDH1, with a LINE element sequence cloned downstream of the gene promoter at position +836 bp from the gene TSS in the wt-CDH1 plasmid; and (S/L)INE-CDH1, with Alu, Mir and LINE sequences downstream of the promoter (see Fig. 6a for a graphic representation of these constructs). Luciferase activity of these constructs was measured as described above after transient transfection into the human cell lines RKO (colorectal carcinoma) and NCI-H1299 (lung carcinoma). All cell lines were obtained directly from ATCC (Manassas, Virginia) and were not re-authenticated in our laboratory because they had already been authenticated by the vendor.
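The normalization described above (firefly luciferase signal divided by the co-transfected pRL-TK Renilla signal, then scaled so that each wild-type promoter equals 1.0) can be sketched as below. This is an illustrative sketch, not the analysis code used in this study: it assumes that replicate wells are normalized individually and averaged before dividing by the wild-type mean, and the function names and readings are hypothetical.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Firefly luciferase signal corrected for transfection efficiency (pRL-TK Renilla)."""
    return firefly / renilla


def fold_vs_wildtype(construct_reps, wildtype_reps):
    """Mean normalized activity of a construct relative to its wild-type promoter (= 1.0).

    Each argument is a list of (firefly, renilla) readings from replicate wells.
    """
    construct = mean(normalized_activity(f, r) for f, r in construct_reps)
    wildtype = mean(normalized_activity(f, r) for f, r in wildtype_reps)
    return construct / wildtype


# Made-up readings for illustration only; not data from this study.
wt = [(1000.0, 200.0), (1100.0, 220.0)]    # ratios 5.0 and 5.0
b1_4 = [(600.0, 200.0), (440.0, 220.0)]    # ratios 3.0 and 2.0
print(fold_vs_wildtype(wt, wt))    # 1.0
print(fold_vs_wildtype(b1_4, wt))  # 0.5
```

Averaging per-well ratios is one reasonable choice; pooling raw sums before dividing would give brighter wells more weight.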
Bisulfite treatment was performed as previously reported (21), and 1/10th of the final volume was used as template for PCR. All promoters except p14Arf (which is deleted in NIH/3T3 cells) were amplified by semi-nested PCR, in which one of the primers of the first reaction was located in the plasmid sequence to avoid detection of the endogenous gene. Methylation density was determined by COBRA (Combined Bisulfite Restriction Analysis (22)): PCR products were digested, separated by 6% polyacrylamide gel electrophoresis, stained with ethidium bromide, imaged and quantitated in a Bio-Rad Geldoc 2000 imager (Bio-Rad, Hercules, CA), and the methylation density of each sample was computed as the ratio of the density of the digested band to the density of all bands in the same lane. DNA methylation was confirmed by bisulfite sequencing and/or pyroMeth analysis. The full list of primers used in these studies is presented in Table S2.
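The COBRA quantification described above reduces to a ratio of band intensities within a lane. The sketch below assumes that densitometry values for each band have already been exported from the imager; the function name and example intensities are hypothetical. It relies on the COBRA principle that a restriction site containing a CpG survives bisulfite conversion only where that cytosine was methylated, so the cut product reports the methylated fraction.

```python
def cobra_methylation_density(digested, undigested):
    """Methylated fraction of one COBRA lane from gel densitometry.

    digested:   intensities of the fragment(s) produced by the restriction enzyme
    undigested: intensities of the uncut band(s)
    """
    cut = sum(digested)
    total = cut + sum(undigested)
    if total <= 0:
        raise ValueError("lane has no measurable signal")
    return cut / total


# Made-up band intensities (arbitrary units) for two lanes.
print(cobra_methylation_density([30.0, 20.0], [50.0]))  # 0.5
print(cobra_methylation_density([], [80.0]))            # 0.0
```

Summing all cut fragments accommodates enzymes that yield more than one digested band per amplicon.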
Cells in log-phase growth were crosslinked with 1% formaldehyde, washed twice with cold PBS containing protease inhibitors and harvested by scraping. Cells were sonicated in SDS lysis buffer, followed by elution in ChIP buffer. We chose to evaluate promoter marking by activating histone modifications that are universally observed at active promoters (H3K9ac) or preferentially found at active promoter CpG islands (H3K4me3), and by repressive modifications that are associated with gene repression with or without DNA methylation (H3K9me2) or are typically mutually exclusive with DNA methylation (H3K27me3). The following antibodies were used for immunoprecipitation: H3K4me3 (Millipore, 17-614); H3K9ac (Millipore, 07-352); H3K9me2 (ABCAM, ab1220); H3K27me3 (Millipore, 17-622); Histone H3 (ABCAM, ab1791-100); and rabbit IgG (ABCAM, ab46540). Chromatin-antibody complexes were captured using Dynabeads Protein A/G (Invitrogen, Carlsbad), and the immunoprecipitated DNA was treated with proteinase K, purified by column filtration and eluted in Tris buffer. Quantitative real-time PCR was used to detect amplicons from the target sequences (p14Arf promoter and SINE B1) and from positive controls for active (Tuba1a) and repressed (Hbb-b1 and Nanog) genes, and the fold enrichment of each histone modification and of rabbit IgG relative to histone H3 was calculated using the delta-Ct method.
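The delta-Ct calculation used to express each histone modification (and the IgG control) relative to total histone H3 can be written as follows. This sketch assumes perfect PCR efficiency (the product doubles each cycle) and that both immunoprecipitations are measured with the same amplicon; the function name and Ct values are illustrative, not data from this study.

```python
def fold_enrichment_vs_h3(ct_target, ct_h3, amplification=2.0):
    """Signal of a histone mark (or IgG) relative to total histone H3 at one amplicon.

    delta_ct = Ct(target IP) - Ct(H3 IP); fold = amplification ** -delta_ct.
    amplification = 2.0 assumes the PCR product doubles every cycle.
    """
    delta_ct = ct_target - ct_h3
    return amplification ** (-delta_ct)


# Hypothetical Ct values for one amplicon (e.g. the p14Arf promoter).
ct = {"H3": 22.0, "H3K4me3": 24.0, "IgG": 30.0}
for antibody in ("H3K4me3", "IgG"):
    print(antibody, fold_enrichment_vs_h3(ct[antibody], ct["H3"]))
# H3K4me3 0.25
# IgG 0.00390625
```

Because IgG is normalized the same way, a mark can be judged enriched when its value clearly exceeds the IgG background at the same amplicon.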
Coordinates of the TSSs of mouse and human RefSeq genes and of SINE repeats were downloaded from the UCSC Genome Browser (mm8 and hg18 releases). Coordinates of CTCF binding sites in the human genome were obtained from public data releases of the ENCODE Chromatin Group at the Broad Institute and Massachusetts General Hospital, and mouse CTCF binding sites were taken from a previous report (23). Genome coordinates of USF1 and USF2 binding were obtained from (24). Each RefSeq gene was represented by 20 bins of 1 kb each (10 bins upstream and 10 bins downstream of the gene TSS), and each bin was annotated as occupied or not by SINE retrotransposons (SINEs spanning two bins were assigned to the bin closer to the TSS). We compared the average abundance of SINE retrotransposons per bin between genes bound and not bound by each of CTCF, USF1 and USF2 in the 20-kb genomic region around the gene TSS. Genes were also classified according to whether or not a CpG island overlapped their proximal promoter region (−200 bp to +200 bp). The frequency distributions of SINEs in the two groups (genes bound and not bound by insulator proteins) were compared using Student's t-test.
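The binning scheme described above can be sketched in Python as follows. Several details are assumptions not stated in the text: coordinates are half-open intervals on the same chromosome, upstream and downstream are defined relative to the gene's strand, each gene is scored by its number of SINE-occupied bins when comparing groups, and a SINE straddling the TSS is assigned to the upstream bin when the two candidate bins are equally close. All function names are hypothetical.

```python
import math
from statistics import mean, stdev

BIN_SIZE = 1_000       # 1-kb bins
N_UP = N_DOWN = 10     # 10 bins upstream + 10 downstream = 20-kb window


def _offset(pos, tss, strand):
    """Signed distance from the TSS, oriented by the direction of transcription."""
    return pos - tss if strand == "+" else tss - pos


def _distance_to_tss(i):
    # Bins 0..9 lie upstream (bin 9 ends at the TSS); bins 10..19 lie downstream
    # (bin 10 starts at the TSS). Returns how many bins separate bin i from the TSS.
    return i - N_UP if i >= N_UP else (N_UP - 1) - i


def sine_bin_occupancy(tss, strand, sines):
    """0/1 flags for the 20 bins around one TSS.

    sines: half-open (start, end) genomic intervals of SINEs on the gene's chromosome.
    A SINE overlapping several bins is counted only in the bin closest to the TSS.
    """
    occupied = [0] * (N_UP + N_DOWN)
    for start, end in sines:
        a = _offset(start, tss, strand)
        b = _offset(end - 1, tss, strand)  # last base of the element
        first = math.floor(min(a, b) / BIN_SIZE) + N_UP
        last = math.floor(max(a, b) / BIN_SIZE) + N_UP
        bins = [i for i in range(first, last + 1) if 0 <= i < N_UP + N_DOWN]
        if bins:
            occupied[min(bins, key=_distance_to_tss)] = 1
    return occupied


def bin_profile(occupancies):
    """Fraction of genes carrying a SINE in each of the 20 bins."""
    return [mean(column) for column in zip(*occupancies)]


def students_t(x, y):
    """Two-sample Student's t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
```

For two groups of occupancy vectors, `students_t([sum(o) for o in bound], [sum(o) for o in unbound])` gives the pooled-variance statistic; `scipy.stats.ttest_ind`, whose default `equal_var=True` performs the same Student's test, would also return the p-value.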
To test the capacity of SINE B1 elements to promote gene silencing, we generated a system in which the luciferase gene is under the control of one of three mouse gene promoters (Cdkn2d, p14Arf and Mlh1), and we inserted two or four copies of SINE B1 elements upstream of these promoters (Fig. 1a; all primers used in this study are listed in Table S2). One plasmid containing six copies of SINE B1 elements (6B1-p14Arf) was also used in some experiments. These plasmids were transfected into the immortalized mouse fibroblast cell line NIH/3T3 (see Methods for details). This cell line was chosen because it has been shown to sustain DNA methylation of transfected plasmids (25) and because of its high transfection efficiency. We initially assayed luciferase activity after short-term transfection and observed that plasmids with SINE B1 elements were 30–60% less transcriptionally active than plasmids without SINE B1 insertions (Fig. 1b–d). The level of repression depended on the number of inserted SINE copies and increased with time. Transcriptional repression was independent of the orientation of the SINE B1 elements, as these retrotransposons caused the same degree of repression when inserted in reverse orientation relative to the gene promoter (Fig. S1).
We followed promoter activity over a two-month period and found that the p14Arf plasmids with SINE B1 insertions became fully repressed (Fig. 2a). Repression of p14Arf promoter activity was reproducible in two independent experiments, arguing against effects related to plasmid insertion sites in the genome. Notably, promoter activity was lost progressively over time, suggesting that reinforcing mechanisms act to promote gene repression. To evaluate whether epigenetic reprogramming had occurred, DNA methylation density near the p14Arf TSS was measured using the COBRA assay (22). No special primer design was required to avoid amplifying an endogenous copy of p14Arf, because this gene is homozygously deleted in NIH/3T3 cells. After an initial gain of methylation in the p14Arf promoter by day 29, no significant difference was observed at later time points (Fig. 2b), despite consistent transcriptional repression. The methylation data at day 29 were confirmed by pyroMeth, showing that the gain of methylation was not restricted to an individual CpG site but was concordant across 9 individual CpG sites located between positions −21 and +88 bp from the gene TSS (Fig. 2c). We hypothesized that for p14Arf the DNA methylation mark was replaced by other epigenetic mechanisms of silencing, most likely histone marks. Indeed, ChIP assays showed that the p14Arf promoter was strongly enriched for H3K4me3 and H3K9ac in p14Arf-wt compared with 4B1-p14Arf plasmids. However, the repressive marks H3K9me2 and H3K27me3 did not differ significantly (Fig. 2d), suggesting that other repressive marks may be involved. In summary, the presence of SINE B1 elements near the gene TSS promoted epigenetic reprogramming, with changes in DNA methylation and histone posttranslational modifications.
We also studied the long-term effects of SINE B1s on Cdkn2d and Mlh1 promoter activity. Consistent with a role of SINE B1s as repressors, the Cdkn2d promoter cloned close to B1 sequences gradually lost activity until it retained only 5% to 10% of its original strength (Fig. 3a). In sharp contrast to p14Arf, there was a progressive and persistent gain of methylation in the Cdkn2d promoter in B1-containing plasmids (Fig. 3b). An exception to continued silencing by SINE B1s was seen for Mlh1 plasmids (Fig. 3c). Despite the decrease in Mlh1 promoter activity in B1-containing plasmids in short-term transfections, no additional repression was measured afterwards. Inverting the orientation of the Mlh1 promoter did not change this result in an independent experiment. These differences between p14Arf, Cdkn2d and Mlh1 in their response to SINE B1s in long-term transfections are also supported by the non-normalized data (Fig. S2). The Mlh1 promoter remained mostly unmethylated in both wt- and B1-containing plasmids, in agreement with the lack of change in promoter activity measured by luciferase assay (Fig. 3d). Bisulfite sequencing of the Mlh1 promoter region confirmed that CpG sites close to the TSS were resistant to de novo methylation, whereas CpG sites farther from the TSS gained methylation (Fig. 3e, 3f). However, this gain in methylation did not differ between wt- and B1-containing plasmids, suggesting that it occurred stochastically.
An important question is whether SINE B1s act as methylation centers in this system, as previously reported for the mouse aprt gene. To address this question, we performed bisulfite sequencing to obtain more detailed information on methylation changes in the p14Arf promoter. The pattern of methylation was consistent with the COBRA and pyroMeth data, with higher overall methylation of B1-containing plasmids (Fig. 4a). This experiment yielded two additional pieces of information. First, the pattern of methylation was not consistent with methylation spreading from B1 elements into nearby DNA, as the promoter sequence adjacent to SINE B1 remained mostly unmethylated. We did observe, however, internal sequences in the p14Arf promoter that gained methylation before other CpG sites and from which methylation spread; we refer to this region as a cryptic methylation center. Interestingly, alignment of the four central CpG sites in the cryptic methylation center revealed a repeated sequence motif, (A/G)GCG(A/G)(A/G), although no protein is known or predicted to bind to this site. Although also present in wt-p14Arf, these regions gained methylation more rapidly in B1-containing plasmids. Second, as observed for the Mlh1 promoter, CpG sites away from the TSS were more susceptible to gain of methylation (Fig. 4b), reaching methylation densities as high as 90% (cryptic methylation center) and 35% (region upstream of the TSS).
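The notation (A/G)GCG(A/G)(A/G), equivalent to RGCGRR in IUPAC codes, translates directly into a regular expression. The sketch below, offered only as an illustration, scans one strand of a sequence and reports each motif occurrence together with the position of the CpG it contains; the function name is hypothetical, and scanning the opposite strand would require the reverse-complement pattern (C/T)(C/T)CGC(C/T).

```python
import re

# (A/G)GCG(A/G)(A/G) as a regular expression; the zero-width lookahead lets
# overlapping occurrences be reported.
MOTIF = re.compile(r"(?=([AG]GCG[AG][AG]))")


def find_motif_cpgs(seq):
    """Return 0-based (motif_start, cpg_start) pairs on the given strand only.

    The CpG is the 'CG' at motif offsets 2-3, so cpg_start = motif_start + 2.
    """
    seq = seq.upper()
    return [(m.start(), m.start() + 2) for m in MOTIF.finditer(seq)]


print(find_motif_cpgs("ttAGCGAATGGCGGAC"))  # [(2, 4), (9, 11)]
```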
To better characterize the epigenetic status of the SINE B1 elements themselves, we performed bisulfite sequencing of the 5′-most region of the p14Arf promoter together with the two or four adjacent copies of SINE B1. The B1 elements in p14Arf plasmids remained methylation-free, supporting the idea that in this system these elements do not function as centers for spreading of DNA methylation (Fig. 5a). ChIP assays showed, however, that the B1 elements acquired H3K9me2 and H3K27me3, both marks associated with a repressed state, as judged by comparison with the physiologically repressed genes Hbb-b1 and Nanog and the constitutively expressed Tuba1a gene (Fig. 5b).
Since mouse SINE B1 elements caused transcriptional repression, we investigated whether human SINE and LINE repeats could have the same effect, using the human E-cadherin gene (CDH1) as a model (Fig. 6a). The human CDH1 gene has four SINEs adjacent to the 3′ end of its CpG island (three Alus and one Mir element), and in reporter assays removal of these repeats resulted in higher, stable promoter activity in a cell line-dependent manner (Fig. 6b). In contrast, introducing a LINE sequence into plasmids with or without SINEs did not change promoter activity. Thus, SINE retrotransposons of human origin, like the mouse SINE B1 elements tested, can interfere negatively with promoter activity.
To investigate the second prediction derived from our hypothesis, i.e. that SINE fitness is influenced by insulator elements, we annotated the abundance of mouse and human SINE retrotransposons in a 20-kb genomic region around gene TSSs and compared the frequency of these elements between genes bound and not bound by CTCF, USF1 and USF2 (Fig. 7). These transcription factors have previously been shown to function as barriers to heterochromatin spreading (26, 27), and data on their in vivo binding in mouse ES cells (23) and multiple normal cell lines are available from whole-genome maps (24) and from public releases of the ENCODE Chromatin Group at the Broad Institute and Massachusetts General Hospital. Other putative insulators such as SP1 (25, 28) and VEZF1 (29) were not tested because they lack extensive in vivo binding studies, and predicted binding sites were excluded from our study. In general, genes bound by any of these factors show a higher frequency of SINE repeats than unbound genes, and the difference is statistically significant (p<0.01, Student's t-test). One exception is that Alu repeats are similarly distributed between CTCF-bound and -unbound genes with promoter CpG islands (but not among genes with non-CpG island promoters). This finding was concordant between the mouse and human genomes, and we attribute the lack of difference to the high frequency of CTCF binding at CpG islands (40% of promoter CGIs are bound by CTCF, compared with 20% of non-CpG island promoters), to a weaker insulating activity of CTCF compared with USF1 and USF2, and to the maintenance of CpG islands in an open chromatin state by additional mechanisms (for example, binding of the Cfp1 and KDM2A proteins (30, 31)). USF1 and USF2 were similarly distributed between CGI and non-CGI promoters (USF1 and USF2 are present at 7% and 5% of gene promoters, respectively).
In conclusion, SINE repeats are better tolerated near gene promoters insulated by USF1 and USF2, and by CTCF in the case of non-CpG island promoters.
Here we show that SINE B1 elements can influence the activity of proximal promoters and ultimately lead to epigenetic reprogramming. The early effect on promoter activity is compatible with direct recruitment of co-repressors, although alternative mechanisms cannot be ruled out. Independent of the mechanism by which SINE elements promote transcriptional repression and epigenetic remodeling of adjacent promoters, our data indicate that their close proximity to gene promoters has a deleterious effect and, as such, that their insertion near gene promoters is evolutionarily constrained. Consistent with this, we show here that Alu repeats in the human genome are more frequently found near gene promoters bound by insulator proteins. Additionally, our results show that for the tested gene promoters transcriptional repression occurred before DNA methylation, supporting previous findings in a different cellular system (32). In the case of the Cdkn2d promoter, transcriptional repression was sufficient to trigger relatively stable DNA methylation. A more dynamic mechanism of repression was involved in p14Arf silencing, with DNA methylation being subsequently replaced by histone marking. A similar phenomenon has been observed in cancer cells and dubbed an “epigenetic switch”; in that case, however, repressive histone marks were replaced by DNA methylation (33, 34). It is clear from our results that not all genes are equally sensitive to repression by retrotransposons: Mlh1 promoter activity was only moderately affected by B1 SINEs, showing an initial decrease very early after transfection but no additional effect later on. We speculate that the proximal promoter of Mlh1 is protected from heterochromatinization; consistent with this, profiling of multiple cancer tissues has shown that Mlh1 is methylated in a lower fraction of tumors than other classically studied genes such as p16 and DAPK1.
Our data add to previous reports in which silencing of nearby genes was induced by a retrotransposon. As mentioned earlier, mouse B1 elements were reported to act as methylation centers from which DNA methylation spread into the aprt gene promoter (18). Our data differ from that report in that spreading of DNA methylation from SINE B1 into the nearby gene promoters was not observed. Still, this retroelement mediated gene repression and facilitated methylation spreading from a cryptic methylation center located in the proximal promoter of the p14Arf gene. A possible explanation for this difference is that hundreds of thousands of SINE B1 elements occur in the mouse genome, and different subfamilies likely exert their influence on nearby sequences through alternative mechanisms. For example, a subfamily of B1 elements has been described whose influence on the transcriptional activity of proximal genes is mediated by the transcription factors Ahr and Slug (35). The primary role of this newly identified SINE B1 subfamily in the genome remains to be fully understood, as the same research group later described an insulator function for this element (36). Such apparently contradictory findings show the complexity of the issue, with alternative outcomes of SINE B1 presence being mediated by sequence variation, physical location in the genome and potentially tissue-specific regulation. Lunyak et al. (37) identified an insulator activity for another SINE family, SINE B2. In their study, the insulating activity of SINE B2 elements appeared to be both tissue-specific, creating a permissive chromatin state for transcription of the pituitary-specific growth hormone gene, and developmentally regulated. Altogether, these data indicate that it is unlikely that all retroelements of a given family have a universal function.
Subtle differences in sequence and genomic location, which create opportunities for interaction with diverse regulatory elements, will ultimately shape their activity.
A weakness of our study is that our system is based on random integration of plasmids. Thus, we cannot rule out site-specific effects that may influence SINE B1 activity and the transcriptional outcome of the tested gene promoters. However, site-specific effects do not appear to have been the major determinants of the transcriptional and epigenetic fate of the tested constructs, given the reproducibility of the data across different gene promoters and between repeated experiments. In addition, the observed correlation between SINE elements and known insulator factors in both the human and mouse genomes suggests that, by and large, genes need to be isolated from these elements. Recent reports revealed that retrotransposition occurs at relatively high levels and accounts for genome variability across individuals and between normal and disease states (38–41). Once enough data have been generated with matched genome, epigenome and transcriptome information for multiple subjects, it will be possible to assess more directly the effect of newly inserted elements on the transcription and epigenetic state of neighboring genes. Naturally, hundreds to thousands of observations will be necessary to distinguish noise from an actual effect.
This work was supported by National Institutes of Health grants P50CA100632, RO1CA098006 and R33CA89837. J-PJI is an American Cancer Society Professor. All DNA sequencing was performed in the DNA Analysis Core Facility at the M.D. Anderson Cancer Center, which is supported by NCI Grant CA-16672 (DAF).
Conflict of interest: None