|Home | About | Journals | Submit | Contact Us | Français|
The mammalian placenta is remarkably distinct between species, suggesting a history of rapid evolutionary diversification1. To gain insight into the molecular drivers of placental evolution, we compared biochemically predicted enhancers between mouse and rat trophoblast stem cells (TSCs) and find that species-specific enhancers are highly enriched for endogenous retroviruses (ERVs) on a genome-wide level. One of these ERV families, RLTR13D5, contributes hundreds of mouse-specific H3K4me1/H3K27ac-defined enhancers that functionally bind Cdx2, Eomes, and Elf5 - core factors that define the TSC regulatory network. Furthermore, we demonstrate that RLTR13D5 is capable of driving gene expression in rat placental cells. Comparison with other tissues revealed that species-specific ERV enhancer activity is generally restricted to hypomethylated tissues, suggesting that tissues permissive to ERV activity gain access to an otherwise silenced source of regulatory variation. Overall, our results implicate ERV enhancer cooption as a mechanism underlying the striking evolutionary diversification of placental development.
During pregnancy, maternal-fetal physiological exchange is mediated by the placenta, an organ that emerged in mammals 150 mya. Though the placenta performs the same basic function in all mammals, striking interspecies differences exist in overall structure, organization of tissue layers, and trophoblast cell types1. The dramatic evolutionary diversification of the placenta is thought to be driven in part by parent-offspring conflict, where disagreement over optimal parental investment leads to antagonistic coevolution at the placental interface2-4. Evidence suggests that regulatory mutations may underlie the morphological diversification of the placenta, as the development of the placenta is governed by highly conserved proteins5,6. Although many placental-specific proteins are rapidly evolving, these are primarily hormones and growth factors secreted during the later physiological response to pregnancy and are not expressed during placenta development7-9. Overall, this suggests that that regulatory mutations—rather than protein-coding mutations—may form the basis for placental morphological evolution.
As mounting evidence has implicated regulatory mutations as a general mechanism underlying developmental diversification10, we sought to investigate the regulatory landscape of early placental development in two closely related species - mouse and rat. Despite the similarities between mouse and rat placentation, genes expressed by the mature placenta show clear signs of rapid evolution since rodents diverged9, suggesting that evolution at the regulatory level may also be detected. We cultured mouse and rat trophoblast stem cells (TSCs), which represent the first cell population to give rise to the fetal placenta11, and performed 3′ RNA-Seq12 and ChIP-Seq against histone marks indicative of promoters (H3K4me3), enhancers (H3K4me1 and H3K27ac), and repressed regions (H3K27me3 and H3K9me3)13. Only high quality uniquely mapping reads were retained, and histone marked regions were identified using MACS (v2.09) with an FDR < 0.05. We predicted 9,460 mouse and 7,932 rat TSC promoters based on H3K4me3 enrichment over gene transcriptional start sites (TSS), which were associated with expressed genes (Fig. 1a, b). We predicted 52,476 mouse and 41,142 rat TSC enhancers based on distal enrichment of H3K4me1 (>5 kb from a gene TSS), and 25,736 mouse and 4,471 rat regions of distal H3K27ac enrichment. These predicted enhancers are significantly enriched near genes with annotated placental function (Fig. S1). Repressive marks H3K9me3 and H3K27me3 were predominantly intergenic, consistent with their association with inactive chromatin (Fig. 1b). Notably, we did not observe H3K27me3 at promoters in either species (Fig. 1a,b, Fig. S2a,b), suggesting that H3K27me3 does not associate with silenced promoters within the placenta. These observations are consistent with a previous study14 and, together, strongly suggest that Polycomb activity in TSCs is distinct from its role in embryonic stem cells (ESCs), and that trophoblast-specific mechanisms of gene repression are likely to be conserved across rodents.
If regulatory mutations drive morphological differences between mouse and rat, then identifying species-specific regulatory elements might reveal genomic regions that underlie novel adaptations. To this end, we compared the regulatory landscape between these species by mapping each regulatory element from rat to its orthologous position in the mouse genome, and then examined whether the chromatin state at each region was epigenetically conserved (Fig. 2a). We found that, although the majority of promoter regions are conserved between mouse and rat, both enhancer and repressed regions are predominantly species-specific. Interestingly, 8-10% of species-specific enhancers could not be mapped to the other genome. We found that over 80% of these unmappable enhancers directly overlap species-specific transposable elements (TEs). TEs have been established as important mediators of regulatory evolution due to their intrinsic regulatory activity15, and species-specific TEs constitute the majority of genomic DNA unique to mouse or rat16. Taken together, these analyses indicate that the TSC regulatory landscape has undergone substantial evolution since the divergence of rodents, and pinpoint TEs as being a significant source of this variation.
As TEs have been implicated in the regulatory evolution of other systems17, we next investigated whether specific families of TEs have contributed to rapid evolutionary amplifications of enhancers, promoters, or repressed regions within TSCs. Therefore, we identified TEs whose individual copies were significantly overrepresented within each set of regulatory elements, using a conservative binomial test to compare the observed overlap against a background expectation (Methods) (Fig. 2b, Tables S1-S13). In repressed regions, we found an enrichment of species-specific endogenous retroviruses (ERVs) within H3K9me3 regions and ancestral LINE1s within H3K27me3 regions, suggesting distinct epigenetic strategies for silencing different classes of TEs. In promoters, marked by H3K4me3, we found no enrichment of TEs, consistent with the high overall conservation of promoters observed between mouse and rat. Surprisingly, we found multiple species-specific ERVs enriched in both mouse and rat TSC enhancer regions (marked by both H3K4me1 and H3K27ac or either/or; see above) (Fig. 2c, Fig. S3). These observations suggest that the amplification of specific TE families may have shaped enhancer activity during placental evolution.
The enrichment of species-specific ERV families within predicted TSC enhancer regions prompted us to investigate ERVs as potential drivers of placenta regulatory evolution. For this analysis, we selected one mouse-specific ERV, RLTR13D5, which is highly enriched within mouse TSC enhancer regions defined by both H3K4me1 and H3K27ac. RLTR13D5 is present in 608 copies that exhibit ~91% sequence identity to their consensus sequence, indicating that it likely integrated 15-25 mya but is no longer actively replicating (Fig. 3a). Of these 608 copies, 95 exhibit enrichment of both H3K4me1 and H3K27ac in the absence of H3K9me3, compared to < 20 expected by chance (Fig. 3b,c). The strong association between RLTR13D5 copies and enhancer marks prompted us to ask whether the RLTR13D5 consensus sequence harbored transcription factor binding motifs that might drive enhancer function. Strikingly, statistically significant binding sites were predicted for Eomes, Cdx2, and Elf5 (Fig. 3d), which together are known to define the core TSC regulatory network18-20. Furthermore, although individual RLTR13D5 copies exhibit 9% nucleotide divergence on average, the majority of copies retain these binding motifs (Fig. S4). These results suggest that RLTR13D5 copies may serve as mouse specific enhancer elements by recruiting the core TSC regulatory machinery.
Although the presence of binding motifs is suggestive of function, we next tested whether Eomes, Cdx2, and Elf5 physically associate with RLTR13D5 and other ERVs. To this end, we performed ChIP-Seq with anti-Eomes, anti-Cdx2, and anti-Elf5 antibodies in mouse TSCs, using 100 bp paired-end reads to assist in detecting punctuate binding sites within repetitive regions. We identified thousands of binding sites for Eomes (45,730), Cdx2 (11,451), and Elf5 (34,751), and de novo motif discovery recovered the canonical motifs for each transcription factor (Fig. 3d, ,4a).4a). All three transcription factors showed significant association with RLTR13D5 (Fig. 4b). Examination of the transcription factor and histone occupancy across all RLTR13D5 copies revealed strong patterns of co-occupancy at a subset of copies with potential enhancer activity (Fig. 4c), as well as a clear bimodal distribution of H3K4me1 and H3K27ac with all transcription factor binding centered between the histone peaks—a configuration strongly associated with active enhancers (Fig. 4d)21. Notably, 40% (241) of all RLTR13D5 copies were bound by at least 1 transcription factor, and 16% (96) were bound by all three - Eomes, Cdx2, and Elf5. As regions containing multiple transcription factor binding sites are most likely to function as active enhancers22, we next examined how frequently the co-association of all three transcription factors occurred genome-wide and subsequently what portion of these regions derive from ERVs. We found a total of 945 triply bound regions genome-wide, 96 (10%) of which were derived from RLTR13D5. Strikingly, in addition to RLTR13D5, several closely related ERVs including RLTR13B4 and RLTR13C3 were also dramatically enriched at triply bound regions (Fig. 4b). Overall, we find that 35% (336) of all genomic regions triply bound by Eomes, Cdx2, and Elf5 are derived directly from the mouse-specific RLTR13 ERV superfamily, demonstrating a central role in dramatically reshaping the TSC core regulatory network.
As RLTR13 elements have all the hallmarks of active TSC enhancers, we next investigated whether these elements functionally influence placental gene expression. First, given that RLTR13 is mouse-specific, we asked whether genes proximal to RLTR13-derived enhancers display species-specific patterns of expression. From our 3′ RNA-Seq data, we identified 9,698 orthologous genes that exhibit TSC expression in either rat, mouse, or both. We then determined which of these genes are proximal to the 336 triply bound RLTR13 elements within a range of 100 kb. This yielded 114 genes that collectively exhibit increased expression levels in mouse (P = 0.0036, Wilcoxon signed rank test), consistent with RLTR13 elements enhancing proximal gene expression in a species-specific manner (Fig. 5a). Next, we tested whether RLTR13-derived enhancers could functionally drive gene expression by performing a luciferase assay in Rcho-1 cells23, a readily transfectable rat TSC model with no native RLTR13 elements in its genome. We selected two copies of RLTR13D5 for examination. The first was an ‘active’ copy bound by H3K27ac, H3K4me1, Cdx2, Eomes, and Elf5, and was adjacent to Apoceb3, which is expressed in mouse TSCs but not in rat. The second was a ‘decayed’ copy that harbors a 200 bp deletion that removed binding sites for Eomes, Cdx2, and Elf5 (Fig. 5b). We found the active copy drove a significant 2-fold increase in expression over the minimal promoter, while the decayed copy failed to drive expression (Fig. 5c). Overall, these results demonstrate that RLTR13D5 is capable of driving gene expression in placental cells and provide strong evidence that RLTR13-derived enhancers have facilitated the evolution of mouse-specific gene expression patterns in TSCs.
We next asked whether species-specific ERVs might function as enhancer elements in other tissues. Using the mouse ENCODE H3K4me1 datasets24, we identified putative regulatory TEs in 11 non-placental tissues. First, we found RLTR13D5 and the majority of other putative ERV enhancers predicted from our TSC dataset were not enriched within enhancer regions of other tissues, indicating that their activity is restricted to the placenta (Fig. 6a). We next found that while putative regulatory TEs could be identified in most tissues, most of these TEs are ancient (shared between mouse, rat, human) and constitute multiple classes including DNA transposons (Fig. 6b). This is in contrast to TSCs, where the majority of regulatory TEs are species-specific ERVs. Notably, the only other samples exhibiting similar patterns were embryonic stem cells (ESCs), and testes. Intriguingly, placenta, ESCs and testes all feature global DNA hypomethylation and intrinsic ERV activity. This is in stark contrast to the embryo proper which undergoes genome-wide methylation and silences retroviral activity25-27. Overall, this suggests a correlation between a permissive epigenetic state and the ability of ERVs to escape repression. This escape from repression within these few cellular contexts, most notably the placenta, may allow for ERVs to function as active enhancers, potentially altering the development of these tissues.
We suggest that species-specific ERVs contribute to the rapid divergence of the placental gene regulatory network. ERVs have affected diverse aspects of mammalian biology17,28,29, and have been influential in shaping placental evolution by contributing viral proteins that mediate placental growth, immunosuppression, and cell fusion30. ERVs are normally repressed by the embryo, but for reasons that remain unclear, ERVs are highly active in the mammalian placenta31. Our findings suggest a model where placental ERV activity may be adaptive over time. Under parent-offspring conflict theory, the placental interface is shaped by ongoing conflict between mother and fetus4. We speculate that ERV variation, exposed by the permissive epigenetic state within the placenta, allows the fetus increased evolvability against maternal defenses. Specifically, by relaxing epigenetic repression of ERV activity, the placenta gains access to a highly polymorphic source of enhancer elements that may dramatically influence its developmental phenotype (Fig. S5). As the placenta is a transient organ, the long-term advantage conferred by increased developmental evolvability would outweigh the potentially mutagenic effects of ERV activity. We propose this model as a plausible explanation for the persistence of placenta-specific ERV activity, which has been observed in all major mammalian taxa31. Our study demonstrates that ERVs facilitate placental evolution at the regulatory level by serving as active developmental enhancers, and that this mechanism—made possible by the unique epigenetic environment of trophoblast cells—may contribute to the remarkable morphological diversification of the placenta.
All 3′ RNA-Seq and ChIP-Seq data for both mouse and rat have been deposited at the Gene Expression Omnibus (GEO), accession # GSE42207.
Mouse TSCs were obtained from Dr. Janet Rossant, Hospital for Sick Children, (Toronto, Canada) and maintained in DMEM/F12 with 15 mM Hepes, 20% FBS, 2 mM glutamine, 100 ug/ml streptomycin, 1 mM sodium pyruvate, 100 uM BME, supplemented with Activin, FGF4, and Heparin as previously described32. Rat TSCs and Rcho-1 TSCs were cultured as previously described23,33.
Each ChIP was performed with 20 million cells (100 million for transcription factors/TFs) using the ChIP Assay kit (Millipore) following manufacturer’s instructions. Briefly, cells were cross-linked in 2% formaldehyde for 15 minutes, quenched in 1 M glycine for 5 minutes, washed twice with PBS, and resuspended in lysis buffer (1% SDS, 10mM EDTA, 50mM Tris-HCl, pH 8.1) supplemented with a protease inhibitor cocktail (Roche) for 30 minutes. Cell lysates were diluted 1:1 with dilution buffer (0.01% SDS, 1.1% Triton X-100,1.2mM EDTA, 16.7mM Tris-HCl, pH 8.1, 167mM NaCl) then sonicated for 12 cycles (30 seconds on/off) at 60% amplitude to produce an average fragment size range of 300-600 bp. Immunoprecipitation was performed using 2-5 ug antibody (H3K4me3: ActiveMotif 39159, H3K27me3: ActiveMotif 39535, H3K27ac: Abcam ab4729, H3K9me3: Abcam ab8898, H3K4me1: Abcam ab8895, CDX2: Bethyl Labs A300-691-A, EOMES: Abcam ab23345, ELF5: Santa Cruz sc-9645x) conjugated to 50 ul protein G Dynabeads (Invitrogen) overnight. Bead-chromatin complexes were washed using High Salt Immune Complex Wash (0.1% SDS, 1%Triton X-100, 2mM EDTA, 20mM Tris-HCl, pH 8.1, 500mM NaCl), Low Salt Immune Complex Wash (0.1% SDS, 1% Triton X-100, 2mM EDTA, 20mM Tris-HCl, pH 8.1, 150mM NaCl), LiCl Immune Complex Wash (0.25M LiCl,1% IGEPAL, 1% deoxycholate acid, 1mM EDTA, 10mM Tris-HCl, pH 8.1), and TE buffer (10mM Tris-HCl, 1mM EDTA, pH 8.0) with each wash performed twice for 5 minutes. Cell lysis, sonication, immunoprecipitation, and cleanup steps were all performed at 4 °C. Finally, chromatin was eluted from beads using elution buffer (1% SDS, 0.1M NaHCO3) and protein-DNA crosslinks were reversed with the addition of 5M NaCl, and DNA was purified using Qiaquick column cleanup (Qiagen). 50-500 ng immunoprecipitated DNA was prepared for sequencing using the Illumina genomic DNA preparation kit. Briefly, DNA fragments were end repaired and ligated to Illumina adapter linkers, size selected using Invitrogen E-gel SizeSelect agarose gels, PCR amplified 15 cycles (18 for TFs), and purified using Ampure XP beads (Beckman Coulter). Libraries were sequenced using the Illumina Genome Analyzer IIx or HiSeq 2000.
High quality 36 bp or paired-end 100 bp reads (for TFs) were aligned to the mouse (mm9) or rat (rn4) genomes using BWA (v0.9.6)34 and filtered to remove reads that mapped to multiple locations. Data from replicates were pooled and regions of enriched occupancy relative to a background input were identified using MACS (v2.09)35 with default settings and regions with an FDR < 0.05 were retained for analysis. Histone marks were called using the “broad” peak setting, which accounts for breaks in coverage due to repetitive sequence. All association of MACS ChIP-Seq defined regions to gene body features (e.g. gene transcriptional start site/TSS and exon/intron coordinates) was performed using coordinates downloaded from ENSEMBL e63 for mouse and rat. Basic genomic region manipulations including intersections and windows were performed using BedTools36. Functional annotation enrichment of genes near predicted enhancers (defined as regions of H3K27ac + H3K4me1 enrichment >5 kb from a gene TSS) was examined using GREAT37. ChIP aggregate profiles and heatmap graphics were generated using siteproBW and heatmaprBW from the Cistrome package38. For heatmaps, regions that fell within 5 kb of another region were discarded for visualization purposes. For comparative analysis, genomic coordinates of were converted across species using the UCSC liftOver tool requiring at least 50% of the region to be mappable (−minmatch 0.5). For each histone mark, epigenetically conserved regions were defined as regions that mapped across species and resided within 1 kb of ChIP-Seq defined region in the other species (e.g. a rat H3K4me1 region mapped to mouse, overlapping or within 1 kb of a mouse H3K4me1 region). Regions of TF overlap were defined as multiple TF binding sites within a 1 kb window. De novo motif discovery was performed using MDSeqPos module from Cistrome on repeat-masked sequence with the top 1000 enriched ChIP-Seq peaks for each TF dataset based on MACS enrichment scores. Scanning of existing motifs was performed using FIMO, part of the MEME suite39, with binding motif profiles downloaded from Uniprobe40.
All repeat data (annotations, consensus sequences) were obtained from RepeatMasker libraries downloaded from the RepeatMasker website (http://www.repeatmasker.org, mouse: v20090604, rat: v20080521). Repeats classified as “Simple,” “Satellite,” and “Unknown” were discarded. Relative repeat ages were estimated based on median percent sequence divergence between extant copies, which is generally indicative of actual repeat age41. Divergence cutoffs for species-specific/shared repeats were determined as previously described42. Mouse repeat ages in years were estimated using the divergence/substitution rate (4.5 × 10−9)42.
Overrepresented repeat families were determined by comparing the observed number of copies from a family overlapping a ChIP-Seq dataset against a background expectation. The background was estimated by generating 1000 randomized datasets based on each ChIP-Seq dataset, matched for region size, chromosome, and relative distribution from annotated genes (i.e. an equivalent number of regions directly overlapping a gene TSS, exon, intron, 10 kb gene proximal region, 100 kb distal region, or > 100kb intergenic regions) to account for genomic biases in TE integration sites. The number of overlapping TE copies within each of the 1000 randomized datasets were averaged to determine a background expectation, and the enrichment of the observed overlap against the background was assessed with a binomial test using a Bonferroni corrected P < 0.005. Candidate enriched repeat families were further filtered to require ≥ 30 observed overlaps and ≥ 2 fold observed/expected enrichment. These additional thresholds primarily removed SINE elements, which were modestly but significantly enriched in multiple datasets. Though SINES are likely to contribute regulatory elements as well, the extremely high frequency and small size of SINES relative to broad regions of histone mark enrichment led us to discard them as candidates in our analysis. Further, as multiply mapping reads were discarded, our analysis was generally biased against extremely recent TE families that contain non-unique copies.
Illumina read data in FASTQ format for mouse ENCODE H3K4me1 ChIP-Seq experiments was downloaded from the UCSC Genome Browser ENCODE portal24. Replicates were pooled together, re-mapped and processed as described above.
Two replicates from different passages (10 million cells each) were prepared for each cell type. mRNA was extracted directly from cell lysates using Dynabeads Oligo (dT)25 (Invitrogen) and assessed for quality using a Bioanalyzer. 500 ng of mRNA was then used to prepare 3′ RNA-Seq libraries as previously described12. Theoretically, 3′ RNA-Seq ensures only a single 3′ fragment per mRNA transcript is represented in the data. Briefly, mRNA was heat sheared for 7 minutes to produce an average fragment size range of 300-500 bp, then used to generate cDNA libraries using a custom oligo dT primer containing Illumina-compatible adapter sequence. cDNA fragments were end-repaired and ligated to standard Illumina adapters. Size-selection was performed using E-gel SizeSelect agarose gels (Invitrogen), products were PCR amplified for 15 cycles, and purified using Ampure XP beads. Library quality was assessed using the 2100 Bioanalyzer (Agilent) and Qubit (Invitrogen), and sequenced on the Genome Analyzer IIx.
High-quality 36 bp reads were aligned to the mouse (mm9) or rat (rn4) genomes using Bowtie 0.12.743, and reads mapping to multiple locations were removed. Significant transcribed regions were detected using Unipeak v0.99 (Foley J., and Sidow A., in review). Regions were associated with annotated gene coordinates downloaded from ENSEMBL build e63, and multiple regions mapping to a single gene were combined, resulting in a raw count total of transcripts per gene. For cross-species analysis, only genes with 1:1 direct orthologs between mouse and rat (defined by ENSEMBL) were retained. Read counts were normalized across samples and species using the DESeq R package(v1.5)44. Genes were associated to RLTR13 elements by identifying the closest proximal gene TSS within 100 kb upstream or downstream of the element. The paired comparison of gene expression between species was performed using normalized expression levels averaged between replicates and significance was assessed using a Wilcoxon signed rank test.
To prepare the promoter-reporter constructs, positive/”active” and negative/”decayed” RLTR13D5 sequences were cloned into the KpnI and XhoI sites of pGL4.12 [luc2CP] firefly luciferase vector (Promega) containing a 230bp minimal HSVTK promoter. Rcho-1 TSCs, a commonly used TSC model positive for Cdx2, Eomes and Elf5 expression, were plated into 24 well plates in proliferating condition (RPMI with 20% FBS). 24h after plating, proliferation medium was replaced with transfection medium and 300ng of promoter-reporter vector along with 30ng of the control renilla vector (pGL4.74[hRluc/TK]) were transfected into Rcho-1 rat TSCs in each well using 3ul of Lipofectamine 2000 (Invitrogen) following manufacturer’s protocol. 12h after transfection, transfection medium was replaced with proliferation medium (RPMI with 20% FBS) and cultured for another 12h. 24h after transfection, cells were washed with cold PBS, lysed in 200ul of passive lysis buffer and standard dual luciferase assays were performed on the cell lysates by using Dual-Luciferase Reporter (DLR) Assay reagents (Promega).
The authors wish to thank G. Barsh for helpful comments, the A. Sidow lab for assistance with sequencing and J. Rossant for contribution of mouse TSCs. This work was supported by the Stanford Genome Training Grant (EBC) (T32 HG000044), the National Science Foundation Graduate Research Fellowship (EBC) (2008052909), the Stanford Bio-X program (JCB), and the Burroughs Welcome Prematurity Initiative (JCB).
Author contributions EBC and JBC conceived and designed the study and wrote the manuscript. EBC designed and performed RNA-Seq and ChIP-Seq experiments and analyzed the data. MAR and MJS provided rat samples. MAR performed luciferase assays.