During germ cell and preimplantation development, mammalian cells undergo nearly complete reprogramming of DNA methylation patterns. We profiled the methylomes of human and chimp sperm as a basis for comparison to methylation patterns of ES cells. While the majority of promoters escape methylation in both ES cells and sperm, the corresponding hypomethylated regions show substantial structural differences. Repeat elements are heavily methylated in both germ and somatic cells; however, retrotransposons from several subfamilies evade methylation more effectively during male germ cell development, while other subfamilies show the opposite trend. Comparing methylomes of human and chimp sperm revealed a subset of differentially methylated promoters and strikingly divergent methylation in retrotransposon subfamilies, with an evolutionary impact that is apparent in the underlying genomic sequence. Thus, the features that determine DNA methylation patterns differ between male germ cells and somatic cells, and elements of these features have diverged between humans and chimpanzees.
In mammals, proper DNA methylation is essential for both fertility and viability of offspring (Bestor, 1998; Bourc’his and Bestor, 2004; Li et al., 1992; Okano et al., 1999; Walsh et al., 1998). DNA methylation in germ cells is required for successful meiosis (Bourc’his and Bestor, 2004), and blastocysts derived from ES cells lacking DNMTs cannot survive past approximately 10 days of development (Li et al., 1992).
Mammalian germ cells are derived from somatic cells, rather than being set aside during the first zygotic cleavages. During germ cell development, the genome undergoes a wave of nearly complete demethylation and remethylation (Popp et al., 2010; Walsh et al., 1998). This reprogramming event correlates with re-establishment of totipotency and with the creation of sex-specific methylation patterns at imprinted loci (reviewed by Sasaki and Matsui, 2008). Germ cell methylation patterns are erased and reset during a second wave of epigenetic reprogramming that occurs during preimplantation development. Post-fertilization, DNA methylation levels reach a nadir around the 8-cell stage, after which methylation is re-written, attaining its somatic level by the blastocyst stage (Mayer et al., 2000). Since this is completed prior to the establishment of the inner cell mass from which cultured embryonic stem (ES) cells are derived, one can view ES cells and mature germ cells as the terminal products of the two landmark epigenetic reprogramming events in mammals.
Mobile genetic elements constitute roughly half of most mammalian genomes (Lander et al., 2001). Repression of transposons relies critically on DNA methylation and is essential for the maintenance of genomic stability in the long term and of germ cell function in the near term (Bestor, 1998; Bourc’his and Bestor, 2004; Okano et al., 1999; Walsh et al., 1998). At least in part, silencing of repeated DNA depends upon an abundant class of PIWI-associated small RNAs, called piRNAs (reviewed in (Aravin and Hannon, 2008)). In the absence of this pathway, methylation is lost on at least some element copies, transposons are de-repressed, and germ cell development is arrested in meiosis.
CpG dinucleotides are underrepresented in mammalian genomes, most likely because a higher rate of spontaneous deamination of methylated cytosines exerts evolutionary pressure for CpG depletion by frequent CpG-to-TpG transitions (Duncan and Miller, 1980; Ehrlich et al., 1990). Mammalian genomes contain areas of relatively high CpG density, called “CpG islands” (CGIs) (Gardiner-Garden and Frommer, 1987), which have avoided CpG depletion over evolutionary time. CGIs are frequently observed at promoters and in some cases have been shown to exert regulatory effects. Thus, selection against CpG depletion may reflect the importance of specific CpG dinucleotides as sequence-based binding sites or simply the requirement for a certain regional density of CpGs. As an alternative, the existence of CGIs may simply be an artifact of longstanding hypomethylation of these regions, and consequent relief from CpG erosion, in mammalian germ cells. Under this hypo-deamination model, selective pressure is independent of CpG density, per se, and CGIs may instead be a secondary consequence of protection from methylation at specific sites combined with prevalent methylation elsewhere in the genome (Cooper and Krawczak, 1989; Duncan and Miller, 1980; Ehrlich et al., 1990).
Studies encompassing evolutionarily distant species have shown that broad features of the epigenome, such as the high methylation levels of gene bodies and repeats, are deeply conserved (Zemach et al., 2010). In closely related species, however, fine-scale analysis of DNA methylation state reveals variation. The chimpanzee and human genomes share more than 95% sequence homology but display regions of differential methylation (Enard et al., 2004). Through focused studies, we have gained glimpses into the characteristics of the methylome and the evolutionary pressures that shape it. We wished to enable genome-wide comparisons of DNA methylation states in closely related species and to examine possible differences between the two major waves of epigenetic remodeling that occur during the mammalian life cycle. We therefore produced full-genome, single-CpG resolution DNA methylation profiles in human and chimp sperm and compared these with methylation maps from human ES cells (Lister et al., 2009).
We conducted genome-wide shotgun bisulfite sequencing of individual sperm DNA samples isolated from two human and two chimp donors (see supplementary information for details). Basic data analysis was conducted using a custom pipeline. We were able to determine methylation status for 96% of genomic CpGs in the human and chimp samples from a total of 28M and 27M CpGs, respectively (Table 1). Read coverage for CpGs on autosomes averaged 16X in human with an overall methylation level of ~70% for all CpG sites. For chimp we sequenced to an average coverage of nearly 14X and observed an average methylation level of ~67%. We did not observe significant methylation at non-CpG sites in either dataset. For comparison, we applied our analysis pipeline to a whole-genome bisulfite dataset from human ES cells (Lister et al., 2009). This dataset was comparable to our own, with 93% of CpG dinucleotides covered and an average depth of 14X on CpGs genome-wide.
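As a rough illustration of how summary statistics of this kind can be derived from bisulfite read counts, the Python sketch below computes the fraction of CpGs covered, the mean coverage, and the mean methylation level. It is not the custom pipeline used in this study: the input format (one pair of methylated and unmethylated read counts per CpG), the function name, and the choice to average per-site methylation levels over covered sites are all assumptions made for the example.

```python
from typing import List, Tuple

def summarize_cpgs(counts: List[Tuple[int, int]]):
    """Summarize hypothetical per-CpG bisulfite counts.

    counts holds one (methylated_reads, unmethylated_reads) pair per CpG.
    Returns (fraction of CpGs covered, mean coverage, mean methylation),
    with the last two computed over covered CpGs only (an assumption).
    """
    covered = [(m, u) for m, u in counts if m + u > 0]
    frac_covered = len(covered) / len(counts)
    mean_coverage = sum(m + u for m, u in covered) / len(covered)
    # Per-site methylation level = methylated reads / all reads at that site.
    mean_methylation = sum(m / (m + u) for m, u in covered) / len(covered)
    return frac_covered, mean_coverage, mean_methylation

# Toy data: three covered CpGs and one CpG with no reads.
example = [(8, 2), (0, 10), (5, 5), (0, 0)]
frac, cov, meth = summarize_cpgs(example)
```

Averaging per-site levels weights every covered CpG equally; pooling reads across sites before dividing would instead weight sites by depth, and the two can differ when coverage is uneven.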
We identified contiguous domains of low methylation, termed hypomethylated regions or HMRs, in a manner independent of genomic annotations such as CGIs and promoters. Since methylation levels in sperm were generally high, HMRs appeared obvious on browser plots as valleys in which methylation dropped to very low levels. To call HMRs in a statistically principled manner, we designed a novel computational approach, based on a 2-state hidden Markov model with Beta-Binomial emission distributions (see supplementary information). This algorithm identified ~79k HMRs in human sperm and ~70k HMRs in chimp sperm. Only ~44.5k HMRs were identified using the human ES cell dataset, despite similar sequence coverage and overall methylation level (Lister et al., 2009; see Tables 1 and S1A). The size of HMRs also differed between germ and ES cells. In both chimp and human sperm, the mean size of HMRs was ~1.8kb and the median was ~1.3kb. In ES cells HMRs showed a mean size of ~1.2kb with a median of 833bp. HMRs overlapped all classes of genomic annotation (see Table S1B).
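To make the idea of a 2-state hidden Markov model with Beta-Binomial emissions concrete, the following Python sketch decodes a short series of CpG read counts into "hypo" and "hyper" states with the Viterbi algorithm. It is a minimal illustration under stated assumptions, not the method described in the supplementary information: the emission parameters, the single self-transition probability `p_stay`, and the uniform initial distribution are hypothetical fixed values, whereas a real implementation would estimate such parameters from the data.

```python
import math

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_betabinom(k, n, a, b):
    """Log probability of k methylated reads out of n under Beta-Binomial(a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)

def viterbi_hmr(counts, params, p_stay=0.95):
    """Most likely state path for a list of (methylated, total) read counts.

    params maps each state name to its (alpha, beta) emission parameters.
    p_stay is an assumed probability of remaining in the same state.
    """
    states = list(params)
    log_stay, log_switch = math.log(p_stay), math.log(1 - p_stay)
    k0, n0 = counts[0]
    # Uniform initial distribution over the two states (an assumption).
    score = {s: math.log(0.5) + log_betabinom(k0, n0, *params[s]) for s in states}
    back = []
    for k, n in counts[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this CpG.
            prev = max(states,
                       key=lambda r: score[r] + (log_stay if r == s else log_switch))
            trans = log_stay if prev == s else log_switch
            new_score[s] = score[prev] + trans + log_betabinom(k, n, *params[s])
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(states, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Toy example: a valley of low methylation flanked by high methylation.
example_params = {"hypo": (1.0, 9.0), "hyper": (9.0, 1.0)}  # hypothetical values
example_counts = [(9, 10), (10, 10), (8, 10), (0, 10),
                  (1, 10), (0, 10), (9, 10), (10, 10)]
path = viterbi_hmr(example_counts, example_params)
```

Contiguous runs of "hypo" in the decoded path would correspond to candidate HMRs. The Beta-Binomial emission lets sites with few reads contribute less evidence than deeply covered sites.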
Average methylation levels differed by a small amount among the human donors (donor 1: 72%; donor 2: 67%), but were more similar among chimp donors (donor 1 and 2: 67%). The methylation status of individual CpGs of HMRs correlated very highly between individuals, with divergence being higher in repeats as compared to promoters (Fig. 1A, B). High inter-individual correlations at the CpG and the HMR levels imply that our datasets permit accurate calling of CpG methylation genome-wide.
We also compared methylation between species at an individual nucleotide level (see supplementary methods for details). As expected, the correlations between human and chimp sperm methylation are high, but the correlation remains generally highest within species.
We also directly compared the methylomes from each of the human and chimp donors with the human ES cell (ESC) methylome. The nucleotide-level correlations between sperm methylation of each of the four primate individuals were higher than their correlations with ESC methylation patterns (Fig. 1A). However, the human ESC methylome did show substantially higher correlation with the human germ cell methylomes than with those of chimp donors. Considered together these results indicate that, although waves of reprogramming in developing germ cells and embryos culminate in high genome-wide methylation, these two methylomes bear substantial differences overall.
The majority of promoters are associated with HMRs in both sperm and ESC, indicating widespread bookmarking of promoters during both waves of epigenetic reprogramming. A number of promoters did show differential methylation, with 1336 showing sperm-specific HMRs but only 201 showing ESC-specific HMRs (Fig. 2A). Promoters hypomethylated in germ cells were strongly enriched for putative binding sites of transcription factors known to function in testis, including NRF1, NF-Y, YY1 and CREB (see Fig S1). A similar analysis of ESC-specific HMRs failed to yield significant results.
Only the genes with sperm-specific promoter hypomethylation revealed a strong enrichment for functional (GO) categories. These were associated with germ cell functions (Fig. 2B; Tables S2) at distinct stages of gametogenesis (e.g. embryonic germ cell development and spermiogenesis). Thus, genes acting at developmental stages, potentially separated by decades, appear to maintain a permissive epigenetic state. Of the 8 genes analyzed from the piRNA metabolic process category, 7 showed promoter hypomethylation in sperm but not in ES cells and one was hypomethylated in both (Fig. 2B).
Retention of histones in human sperm was reported to be extensive (Hammoud et al., 2009). Our analysis of this data revealed a strong correlation between retained histones marked by H3K4me3 and HMRs at promoters. Among the 25.8k promoters marked by H3K4me3 in sperm, 91% overlapped an identified HMR. In general these results support prior observations that the presence of H3K4me3 at promoters is often accompanied by hypomethylation (Hammoud et al., 2009; Ooi et al., 2007).
It was previously posited that genes involved in early embryonic development had a distinct chromatin status in sperm, being hypomethylated, histone-retained, enriched in H3K4me3 marks, and thus poised for expression (Hammoud et al., 2009). At least with respect to DNA methylation, we do not detect a preferential link between HMRs in sperm and developmental regulators but instead widespread HMRs. One potential explanation for this perceived discrepancy is that our comparisons involve sperm and ES cells, while prior studies used a differentiated cell type to contrast with sperm.
The genes with promoters that lack HMRs in both sperm and ESC (N = 5,380; Fig. 2A) show strong enrichment for G-protein coupled receptors and genes involved in neurological functions (Table S2C and S2D). The reason why many of these genes, associated with highly specialized cell types, seem to lack promoter HMRs in sperm and ESC remains obscure.
Differences in average size and CpG densities suggest that the HMRs emerging after germ cell reprogramming differ qualitatively from those emerging after zygotic reprogramming (Fig. 3A, Table S1A). The majority of HMRs have CpG density between 1% and 10%, and promoter HMRs fall almost exclusively in this range for the sperm methylomes. Those HMRs falling below 1% CpG density lie almost exclusively in repeats. These are overrepresented in human sperm relative to chimp sperm and human ESCs. Promoter-associated HMRs have sizes concentrated between 1kb and 10kb in human and chimp sperm, with an overall trend to be broader than promoter-associated HMRs in ESCs (Fig. 3A). A notable increase in CpG density accompanies narrowing of HMRs and results in a significant portion of ESC HMRs with a CpG density above 10%.
To probe structural differences among HMRs in ES cells and sperm, we plotted the average methylation around HMR-associated transcriptional start sites (TSSs), genome-wide (Fig. 3B, upper). This revealed a general principle, that a core HMR in ES cells, referred to as a nested HMR (Fig. 3B, lower), often lies within an extended HMR in sperm. The median size of nested ESC HMRs is 1498, less than half the median size of 3109 for the sperm HMRs in which they reside. This phenomenon was also observed independently in a comparison of somatic and sperm HMRs, where variations in boundaries were additionally correlated with tissue-specific expression (Hodges et al., submitted). Extended HMRs are reminiscent of the concept of CpG shores (Doi et al., 2009), though in comparisons of sperm and ESC, we made no attempt to correlate gene expression with the widespread phenomenon of nesting that we report herein.
The observation of nested HMRs could arise either from a true expansion of the hypomethylated domain in sperm or as an artifact of sperm having less precise HMR boundaries than ESC. Examining degrees of change in methylation states across boundary CpGs in both cell types supports the former conclusion (Fig. 3C). Thus, nesting appears to represent a general phenomenon and likely reflects differences in the underlying mechanisms by which the boundaries of hypomethylated regions are determined during the waves of de novo methylation that lead to sperm and ESC.
As a step toward addressing such mechanisms, we asked whether any features are associated with HMR boundaries in either cell type. Two interesting characteristics emerged. Approaching the boundaries of either the extended sperm HMRs or the nested ES cell HMRs, CpG densities dropped just prior to the start of the HMR and rose dramatically again thereafter, though overall densities were higher in the nested portions (Fig. 3D). This reflects an increase in the average inter-CpG distance at the boundaries of HMRs (Fig. 3E). Because our method of identifying HMRs is agnostic to inter-CpG distance, this is not simply an artifact of our approach. One could imagine increases in inter-CpG distance interrupting a processive activity, preventing the spread of de novo methylation either directly or indirectly.
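The notion of inter-CpG distance used here can be illustrated with a short Python sketch that lists the spacing between consecutive CpG dinucleotides in a sequence. The function names and the toy sequence are hypothetical, and distance is measured between the cytosine positions of adjacent CpGs, which is an assumption about the definition.

```python
def cpg_positions(seq):
    """0-based positions of the C in every CpG dinucleotide of seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def inter_cpg_distances(seq):
    """Distances (in bases) between consecutive CpG dinucleotides."""
    pos = cpg_positions(seq)
    return [b - a for a, b in zip(pos, pos[1:])]

# Toy sequence with CpGs at positions 1, 5, 7 and 13.
toy = "ACGTTCGCGAAAACG"
distances = inter_cpg_distances(toy)
```

Averaging such distances in windows aligned on HMR boundaries would give a profile of the kind summarized in Fig. 3E, with larger values indicating sparser CpGs.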
Though we had no a priori expectation that sequence features would reside at sperm or ESC HMR boundaries, we searched for motifs that might occur at or near boundary CpGs, independent of CpG density. We noted a trend towards enrichment for an ACGT motif at ESC boundary CpGs with a corresponding depletion immediately outside ESC HMRs (Fig. S2). This pattern was not significantly enriched at the boundaries of extended sperm HMRs. Building upon this observation, we also searched for larger motifs, focusing on those containing a central CpG core. Patterns with strong differences across HMR boundaries tended to have the ACGT core (Table S3). The most enriched pattern for sperm was AACGTT. For ESCs we saw a well-known E-box pattern, CACGTG. Plotting observed-to-expected frequencies centered on CpGs around boundaries of extended and nested HMRs (Fig. 3F), there was a clear depletion just outside each boundary followed by a sharp enrichment at the boundary CpG for each pattern in the appropriate cell type (Fig. S2B). These results raise the possibility that one or more DNA binding proteins might localize to HMR boundaries during waves of de novo methylation and help to define transitions in methylation states.
Consistent with prior observations and with the known role of DNA methylation in transposon silencing, most repeat elements were highly methylated in both sperm and ESC. However, a substantial fraction of HMRs overlapped transposons in chimp and human sperm, with all repeat classes represented (Fig 4A, Table S1B). Fewer repeat-associated HMRs appeared in ESCs. In sperm, HMRs collectively contained 4-5% of all bases assigned to repeats, compared to 1.3% in ESC (see Table S1B). Overall, this suggests that different mechanisms, with different stringencies, direct repeat methylation during germ cell and preimplantation development.
We noted a strong decrease in methylation of sperm DNA within pericentromeric regions, extending several megabases outward from the unassembled core centromeres (Fig. 4B). This was not seen in ESC or in terminally differentiated cells (Hodges et al., submitted). This striking pattern was attributable to sperm-specific hypomethylation of ~75-80% of the satellite repeats concentrated in pericentromeric regions (Fig. 4A). In ESC, only 16% of pericentromeric satellites were hypomethylated, a figure in accord with the overall hypomethylation rates of non-pericentromeric satellites in ESC and sperm (Table S4A). Prior studies of mouse germ cells using methylation sensitive restriction enzymes had noted selectively low methylation at pericentromeric satellites, suggesting that this is a conserved property (Yamagata et al., 2007).
Proper methylation of retrotransposons is required for transcriptional silencing of full-length and potentially active copies (Bourc’his and Bestor, 2004; Goodier and Kazazian, 2008; Walsh et al., 1998). However, specific retroelements can be active or unmethylated in male germ cells (e.g., AluY and AluYa5) (Schmid, 1991). Given our read lengths, we were able to address the methylation state of virtually all repeat families and most individual copies (see Table S4B).
Overall, retrotransposon copies that were full length or close to consensus showed a slight bias towards hypomethylation (Fig. S3A, S3B). However, neither of these attributes could explain the variation observed in retrotransposon methylation. Hypomethylated repeat copies did tend to have greater CpG density, especially within the LTR and SVA classes (Fig. 4C). For LINEs, LTR elements, and terminal repeats, HMRs concentrated within regulatory regions, which often show higher CpG density than their coding regions (Figs. 4D, S3C, D; Tables S4D, G). SINE elements displayed a more uniform hypomethylation (Fig. S4E). Thus, similar mechanisms appear to define HMRs in both repeat and non-repeat portions of the genome, since for most repeats, there is a strong association of sperm HMRs with regulatory regions.
Among the LINEs, subfamilies of L1 were often hypomethylated in both sperm and ES cells and these trended strongly towards the active groups (Table S4E and S4H). L1PA subfamilies are considered the most active in the human genome (Khan et al., 2006), and the youngest of these (L1HS and L1PA2) were among the very few subfamilies enriched for hypomethylation in ES cells relative to sperm. Specifically in sperm, we noted hypomethylation of several other L1 families (e.g. L1PA4-16 and L1M3).
Among LTR subfamilies, sperm HMRs were enriched for ERV elements (Table S4C). Hypomethylated copies exist either as part of full-length provirus-like elements or as solo LTRs, with the greatest enrichment for LTRs belonging to “class I” elements (e.g. LTR12; see Table S4D and S4G). The few LTR subfamilies with more hypomethylated copies in ESC than sperm are all recently derived, human-specific ERVs (e.g., LTR5 and 13 and HERVH LTR7).
Sperm hypomethylation has been previously reported for primate Alu elements (Kochanek et al., 1993; Liu et al., 1994), and our data revealed several Alu subfamilies with differential methylation in sperm and ES cells, e.g., the AluY subfamily (Tables S4F and S4I). The more precisely defined AluYa5 (human) and AluYd4 (chimp) showed extreme enrichment for hypomethylation in sperm.
SVA elements showed strong, species-specific differences in methylation in human and chimp sperm (Fig. 4A). SVAs are composite elements consisting of hexameric repeats, an Alu-like region, a VNTR (variable number of tandem repeats) region and a SINE-R (Shen et al., 1994). SVA elements were active in the most recent common ancestor of chimp and human (Mills et al., 2006), and multiple examples of neoinsertions suggest that they still cause genomic rearrangements and disease in human (Ostertag et al., 2003).
Among the SVAs, the youngest subfamilies, D-F (Wang et al., 2005), showed the greatest frequency of hypomethylation in human sperm (Fig. 5A). Notably, these have a higher CpG density than do older subfamilies. 358 SVA insertions can be assigned as high-confidence orthologs between human and chimp, which remain highly similar in sequence (see supplement). Methylation through these element copies was distributed through the full range from very low to very high average methylation, with two modes near 20% and 80% methylation (Fig. 5B). In human sperm, 35% of orthologous SVAs had a methylation level below 50%. In sharp contrast, only 6% of copies fell below 50% methylation in chimp. We also annotated 921 SVA elements that appear to represent new insertions occurring after the human-chimp divergence (Mills et al., 2006). 852 (93%) of these were hypomethylated in sperm compared with only 62 (7%) in ES cells (Fig 5A). Considered together, our data indicate that SVA elements have come under different degrees of epigenetic control in the human and chimp lineages.
Many SVA insertions occur at or around promoters (Lander et al., 2001; Mikkelsen, 2005), and these elements often have a CpG content high enough to fit the traditional definition of a CpG island. Given their properties, SVA elements have the potential to introduce differential, species- and cell type-specific methylation near genes that may be relevant for their regulation. Figure 5C exemplifies such a situation where, in the case of TLR1, no HMR exists near the promoter in chimp sperm or human ES cells, but one is contributed in human sperm by a nearby SVA element. Although sperm are largely transcriptionally silent, similar HMRs are expected to exist in transcriptionally active developing germ cells (data not shown).
CGIs are the most well known evolutionary signature of vertebrate DNA methylation. Their original definition required a CpG observed-to-expected (o/e) ratio of at least 0.6. Although the full set of HMRs in human sperm and ESCs did not reach this empirical cut off, they did pass the 0.4 benchmark used by Weber and colleagues (Fig. 6A) (Weber et al., 2007). In general promoter-associated HMRs did surpass the 0.6 o/e cut off in both sperm and ESC.
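For reference, the CpG observed-to-expected ratio of Gardiner-Garden and Frommer (1987) is the number of CpG dinucleotides multiplied by the sequence length and divided by the product of the C and G counts. The Python sketch below computes it for a single sequence; the function name and toy sequences are hypothetical, and returning 0.0 when either base is absent is an arbitrary convention chosen for the example.

```python
def cpg_obs_exp(seq):
    """CpG observed/expected ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    if c_count == 0 or g_count == 0:
        return 0.0  # ratio undefined; 0.0 is a convention for this sketch
    return cpg_count * n / (c_count * g_count)

# Toy sequences (far shorter than any real HMR or CpG island).
ratio_balanced = cpg_obs_exp("CCGG")          # 1 CpG, 2 C, 2 G, length 4
ratio_rich = cpg_obs_exp("ACGTTCGCGAAAACG")   # 4 CpG, 4 C, 4 G, length 15
```

A ratio near 1 means CpGs occur about as often as expected from base composition alone, while values well below 1 indicate CpG depletion.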
The differences in CpG density in nested and extended HMRs (Fig. 3B) imply distinct CpG depletion pressure in these regions. Average CpG composition genome-wide is ~0.2 o/e, but reaches ~0.35 in extended HMRs and 0.68 in nested HMRs. We analyzed sperm-specific and ESC-specific HMRs in an attempt to decompose the CpG depletion pressure exerted by the two methylomes. The ESC-specific HMRs reached only 0.35 o/e CpG composition, while the sperm-specific HMRs reached a CpG composition of 0.5.
The life cycle of a germ cell can be separated into two components. The first is the time from fertilization to the time that somatically derived, primordial germ cells (PGCs) reach the genital ridge. Second is the time during which the PGC develops into a mature germ cell, which contributes to the zygote. The latter period generally spans from birth to the end of the reproductive life of the animal. Our data suggest a model in which methylation patterns present during both of these intervals shape genomic CpG distributions but indicate a greater influence of methylation profiles during germ cell maturation (Fig. 6A).
We sought to measure the degree to which differential methylation could lead to CpG decay over the ~6My of divergent evolution separating human and chimp. We focused on regions that qualified as HMRs in either chimp or human, since these regions could have either lost methylation along one lineage or gained methylation along the other. For a given regional methylation level, we measured CpG decay as the proportion of regions having lost more than 5% of inferred ancestral CpGs (using gorilla as outgroup) and plotted the relationship between average methylation and decay rate (Fig. 6B). The correlation between regional methylation level and CpG decay was extremely strong for both human and chimp. These results indicate that CpG decay is appreciable as a function of methylation even over relatively brief evolutionary periods.
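The decay measure described above, the proportion of regions that have lost more than 5% of their inferred ancestral CpGs, can be sketched in a few lines of Python. The input format (a pair of ancestral and retained CpG counts per region) and the function name are assumptions for illustration; inferring the ancestral CpGs from an outgroup is not shown.

```python
def cpg_decay_rate(regions, threshold=0.05):
    """Proportion of regions that lost more than `threshold` of ancestral CpGs.

    regions: list of (ancestral_cpg_count, retained_cpg_count) pairs
    (a hypothetical input format). Regions with no ancestral CpGs are skipped.
    """
    usable = [(anc, kept) for anc, kept in regions if anc > 0]
    decayed = sum(1 for anc, kept in usable if (anc - kept) / anc > threshold)
    return decayed / len(usable)

# Toy bin of four regions with losses of 1%, 10%, 2.5% and 20%.
toy_regions = [(100, 99), (100, 90), (40, 39), (50, 40)]
rate = cpg_decay_rate(toy_regions)
```

Computing this proportion separately for groups of regions binned by average methylation level would yield the kind of relationship plotted in Fig. 6B.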
This observation predicted that we might see signatures of selective pressure preventing erosion of some CpGs that are maintained despite germline methylation. To address this question, we analyzed segregating sites at CpG dinucleotides using data from the HapMap 3 project (CEU population; (Altshuler et al., 2010)). CpGs were treated symmetrically, so each derived allele at these sites can be classified as A, G or T. As expected, segregating sites with T as the derived allele represent the vast majority.
We generated frequency spectra for each derived allele nucleotide with sites classified according to their methylation level in sperm (Fig. S4). As methylation levels increased, derived allele frequencies shifted towards the low ends of the spectra (Fig. 6C and S4). This shift was observed not only for derived TpG alleles, which could be explained by an extreme bias in mutation rate, but also for ApG and GpG derived alleles. One interpretation of these findings is that selection is on average weaker at individual CpG sites with lower sperm methylation. Such an interpretation is consistent with recent findings of Cohen et al. (Cohen et al., 2011) who used sophisticated evolutionary models to posit that selection for high CpG content is not a significant factor contributing to maintenance of CGIs in the genome.
The strong connection between HMRs and gene promoters suggests that the evolutionary gain or loss of HMRs may be associated with changes in selective pressure on functional regulatory regions. To investigate this possibility we analyzed sequence divergence in HMRs, focusing on those that are human- or chimp-specific. Since these differentially methylated regions will have different rates of C-to-T transitions, we counted changes from the inferred ancestor only at non-CpG sites. Genomic intervals differing by more than 1% relative to the inferred ancestor were counted as having divergent sequences.
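A minimal Python sketch of this divergence measure is given below. It compares an inferred ancestral sequence with an aligned descendant sequence, ignores every position that is part of a CpG in either sequence, and reports the fraction of remaining sites that differ. The function name, the ungapped equal-length alignment, and the rule of masking CpGs found in either sequence are assumptions; the exact masking used in the study is described in the supplementary methods.

```python
def non_cpg_divergence(ancestor, descendant):
    """Fraction of non-CpG sites that differ between two aligned sequences."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    a, d = ancestor.upper(), descendant.upper()
    masked = set()
    # Mask both bases of any CpG present in either sequence (an assumption).
    for s in (a, d):
        for i in range(len(s) - 1):
            if s[i:i + 2] == "CG":
                masked.update((i, i + 1))
    sites = [i for i in range(len(a))
             if i not in masked and a[i] in "ACGT" and d[i] in "ACGT"]
    if not sites:
        return 0.0
    diffs = sum(1 for i in sites if a[i] != d[i])
    return diffs / len(sites)

# Toy alignment: the C-to-T change inside the ancestral CpG is ignored,
# while the A-to-T change at the last position is counted.
toy_divergence = non_cpg_divergence("AACGTTAGCA", "AATGTTAGCT")
is_divergent = toy_divergence > 0.01  # the 1% cut-off stated in the text
```

Masking CpG positions keeps the elevated C-to-T transition rate at methylated CpGs from inflating divergence in the more highly methylated lineage.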
Only 10% of HMRs shared between human and chimp showed divergence from the ancestral sequence at non-CpG sites (Fig 6D). At chimp-specific HMRs, 15% of human sequences and 19% of chimp sequences diverged from the inferred ancestor. At human-specific HMRs, 22% of human sequences diverged and 18% of chimp sequences diverged. These results indicated that changes in methylation state between human and chimp are associated with accelerated non-CpG sequence divergence. Interestingly, in both cases the species with lower methylation state had a greater rate of divergence, which is consistent with adaptation at novel regulatory regions as a driver for these changes.
We only identified 104 promoters that are hypomethylated in human but not in chimp sperm and only 52 genes with differential promoter methylation in the opposite orientation. Neither set showed significant enrichment for any ontology category. However, analysis of genes with promoters within 10kb of an identified human-specific sperm HMR revealed a strong enrichment for neuronal functions (see Tables S5). The HTR3E gene, a serotonin receptor subunit, is an example of such a gene, whose promoter is selectively hypomethylated in human sperm (Fig. 6E).
Overall, sperm methylation patterns were highly similar in all our samples. However, there were differences, even among individuals. There has been much discussion regarding the role of germline transmission of epigenetic marks in inter-individual variation (Curley et al., 2011). Changes in epigenetic state could allow flexibility in phenotype that could be reverted over short time spans if a trait became disadvantageous. Erosion of CpG content provides a mechanism to allow fixation of a positive trait in the long run. Thus, changes in DNA methylation patterns preceding changes in DNA sequence present an attractive model for at least one mode of adaptation. While evaluating such hypotheses will require many more datasets, the work presented here builds a firm foundation for such studies.
Global resetting of DNA methylation patterns happens twice during mammalian development: once during germ cell development and once early in embryogenesis. Our data permit a genome-scale analysis of these two events. While high genome-wide levels of methylation are re-established during both waves of epigenetic remodeling, some regions are protected and establish HMR boundaries that appear relevant even in fully differentiated somatic cells (Hodges et al., submitted). A few promoters showed selective hypomethylation in sperm, and these are strongly enriched for annotations related to germ cell processes. Far fewer were selectively hypomethylated in ES cells and these were not enriched in any particular annotation category. Promoters of genes retaining nucleosomes have recently been shown to be hypomethylated in human sperm (Hammoud et al., 2009), and both of these features have been proposed to aid rapid activation during development. We find that gene-associated hypomethylation in sperm can be extended to more than 70% of all annotated genes in both human and chimp. Among these we failed to find any enrichment for regulators of early development. Instead, it seems that promoter regions are generally identified and bookmarked in sperm (see (Zaidi et al., 2010)).
Genome-wide, CpG sites seem to adopt a methylated state by default (Edwards et al., 2010). This raises the problem of precisely how regions that become HMRs are identified as such. Regions of hypomethylation at promoters have been correlated with regulatory DNA in various developmental contexts (Illingworth et al., 2008; Lister et al., 2009; Rollins et al., 2006; Straussman et al., 2009). Based upon analysis of histone marks and on the proposed binding properties of Dnmt3s (Dhayalan et al., 2010; Zhang et al., 2010), active transcription and accompanying methylation of K4 on histone H3 are thought to locally inhibit the methylation machinery. This could enable large-scale recognition of promoter regions if widespread transcription occurs during fetal germ cell development as genomic methylation patterns are erased and reset. It is also plausible that specific protein/DNA complexes act locally even in the absence of active transcription, to prevent access by de novo methyltransferases. Proteins observed to function as boundary elements, such as CTCF and Sp1 (reviewed in (Gaszner and Felsenfeld, 2006)), provide candidates for such functions.
Despite overall similarity in the sets of promoters they mark, the HMRs observed at promoters in mature male germ cells usually extend beyond the boundaries of HMRs in ES cells when the two overlap. These wider HMRs do not seem to reflect less precision in HMR boundaries, as methylation differences across HMR boundaries are similar between sperm and ES cells. Because this “nested” HMR phenomenon is observed at so many promoters, it does not seem to be associated with the regulation of any specific genes during germ cell development. We have observed a clear increase in CpG content through the extended portion of these HMRs relative to the genome-wide average, suggesting they have to some degree avoided pressure to decay, and hence are more than a transient state. The phenomenon that we observe is similar to the concept of CpG shores (Doi et al., 2009). Perhaps the extended HMRs in germ cells presage the extent of “shores” that correlate with changes in gene expression.
Our data suggest that HMRs emerge from de novo methylation in male germ cells with sizes that differ from those that emerge from somatic reprogramming. Thus, despite involvement of similar methyltransferases and targeting of similar sets of sequences, the determinants of HMR sizes likely differ between the two reprogramming events. We have begun to see hints of the mechanisms determining such differences by comparing boundary-associated motifs in sperm and ES cells.
It is thought that germ cell genomes must be closely guarded from the activity of mobile genetic elements. While repeats were generally heavily methylated, we did find HMRs that overlapped repeats, and these were substantially more prevalent in sperm. We and others have characterized a conserved, small RNA-based silencing pathway, termed the piRNA pathway, that is important for recognizing and silencing mobile elements in germ cells (Aravin and Hannon, 2008). Our data indicate that both individual element copies and broader element subfamilies can evade piRNA-based silencing. Yet both these element copies and element families are often efficiently silenced during preimplantation development. This suggests fundamental differences in the mechanisms that recognize repeats and mark them for repression during the two major waves of epigenetic reprogramming in mammals.
Examining patterns of repeat-associated HMRs is potentially enlightening. HMRs are more prevalent in younger transposon subfamilies, and the hypomethylated regions themselves tend to overlap with promoters or regulatory regions, just as they do in genes. Thus, it may be that active elements evade default methylation by being initially recognized as gene-like as a consequence of their binding transcription factors and possibly even being transcribed. In these cases, we imagine that silencing of most elements would be enforced by the piRNA pathway but that some sites, such as those we observe herein, might still escape. A number of examples can be cited in support of this hypothesis. The 5′UTRs of the L1PA subfamilies are known to carry conserved YY1 binding sites, while other recent subfamilies acquired RUNX3 and SRY binding motifs, all of which could promote transcription in developing germ cells (Khan et al., 2006; Lee et al., 2010). Similarly, the sperm-enriched, hypomethylated ERV9 LTR12 elements have been shown to bind NF-Y, MZF1 and GATA-2 in erythroid K562 cells (Yu et al., 2005). In each of these cases, HMRs within these elements tend to encompass such potential transcription factor binding sites.
Similarly, Alu RNAs have been detected in human sperm (Kochanek et al., 1993). This suggests a potential link between Alu HMRs and the transcriptional activity of individual repeats, though previous studies also reported that the binding of SABP across Alu elements in sperm prevents their methylation (Chesnokov and Schmid, 1995). Interestingly, Alu hypomethylation is not seen in female germ cells (Liu et al., 1994) and has been proposed as one mediator of sex-specific imprints.
Satellites resist methylation in sperm when localized in clusters at centromeres, but are generally methylated when located elsewhere, even if they are clustered. This is consistent with previous observations made in mouse through the use of methylation-sensitive enzymes (Yamagata et al., 2007). Recent reports have shown that the transient transcriptional activation of paternal pericentromeric satellites is essential for centromeric heterochromatin formation in 2-cell zygotes (Probst et al., 2010). This could indicate that hypomethylation of satellite repeats in male germ cells marks paternal centromeres, in a manner similar to imprinting, allowing their rapid transcriptional activation upon fertilization.
In addition to a characteristic location within chromocenters in sperm, centromeres display a distinct chromatin structure differentiating them regionally during meiosis from other chromosomal regions (reviewed by (Dalal, 2009)). This has prompted suggestions that centromeric chromatin states might be critical for proper meiosis, a hypothesis strongly supported by our observation of selective hypomethylation of megabase domains of centromeric satellite clusters. Prior studies have demonstrated that de-repression of satellite repeats in mitotic cells creates segregation defects due to the formation of anaphase bridges (Frescas et al., 2008). Low methylation levels have also been correlated with the ability to bind cohesin complexes (Parelho et al., 2008). Considered as a whole, these observations suggest a model in which selective hypomethylation of centromeric satellites might be critical for accurate chromosome segregation during meiosis.
The most striking example of species-specific methylation to emerge from our analysis involved the SVA elements. These primate-specific composite elements contain a high density of CpGs, remain active in human and chimp and include many copies that are clear orthologs between human and chimp (Bantysh and Buzdin, 2009; Mills et al., 2006). Transduction of SVAs has been implicated in human diseases and gene formation (Damert et al., 2009; Ostertag et al., 2003). Our results indicate that for a subset of SVA elements, the ability to methylate these elements has either been acquired along the chimp lineage or lost in the human lineage during the past six million years, despite very little sequence change in these elements.
It has been thought that CGIs arose as the result of protection from methylation-associated deamination over long evolutionary periods. This is consistent with the observed correlation between the location of CGIs and regions that lack methylation in both germline and somatic cells. However, recent results have pointed to functions for CGIs that may be associated with their high CpG density (Thomson et al., 2010), with the plausible interpretation that selection may be acting to preserve CpG density in CGIs. We find that while most CGIs fall within HMRs of sperm, most HMRs extend well beyond the annotated CGIs, even using weaker CGI definitions. Thus, hypomethylated regions in male germ cells do not appear to require a critical CpG density to avoid methylation. Instead, our results are consistent with CGIs arising as a consequence of different mutational pressures rather than selection for CpG density.
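CpG density in discussions like this one is commonly summarized as the ratio of observed to expected CpG dinucleotides, the quantity that underlies standard CGI definitions. The sketch below computes that ratio for a single sequence. It is an illustration under stated assumptions rather than the procedure used here: the function name `cpg_obs_exp` is hypothetical, and the expected count is taken to be (number of C × number of G) / sequence length.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio for a DNA sequence.

    observed = number of CG dinucleotides
    expected = (count of C * count of G) / sequence length
    Returns 0.0 when the expected count is zero (no C or no G present).
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    if n == 0 or c == 0 or g == 0:
        return 0.0
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    expected = c * g / n
    return observed / expected


# Toy sequences, chosen only to exercise the function.
ratio_rich = cpg_obs_exp("CGCGCG")    # 3 CpGs; expected 3*3/6 = 1.5
ratio_poor = cpg_obs_exp("CAGTCAGT")  # C and G present, but no CpG
```

A region that has lost CpGs through deamination would show a ratio well below 1, whereas a region protected from methylation over long periods would be expected to retain a ratio closer to, or above, 1.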
In our datasets, signatures of deamination-induced CpG depletion are clear. Yet we also observe CpG depletion from many sperm and ES cell HMRs. Several scenarios could resolve this conundrum. For example, such regions may have been methylated for substantial periods prior to assuming their unmethylated status. Thus, they may have decayed at some time in the past but are now stabilized by their hypomethylated status. Such sites could also actually be methylated during a period of germ cell development to which our current datasets are blind (e.g. in fetal gonocytes or female germ cells). In accord with this explanation, we have observed distinct CpG densities associated with sperm-specific and ESC-specific HMRs. Moreover, at HMRs where only the central, nested portion is hypomethylated in ES cells, we observe greater CpG retention through regions hypomethylated in both ES cells and sperm. Overall, we cannot exclude a model in which selection acts to preserve critical functions requiring specific local CpG densities. However, our results lend additional support to the recent conclusions of Cohen et al. (2011), whose sophisticated evolutionary modeling showed that CGIs can be explained without invoking selection on CpG sites. Our results suggest a refinement of the hypo-deamination model in which CpG retention is a function of the time spent hypomethylated during each generation in germ cells and their somatic precursors.
The detailed comparative analysis performed here has revealed that, over the ~6 million years since the divergence of human and chimp, most patterns of DNA methylation remain conserved in male germ cells. We have directly related evolutionary changes in CpG methylation with loss of CpG dinucleotides and have shown that even small differences in methylation can lead to substantial loss of CpGs over relatively short evolutionary periods. At the same time, there are many genomic regions that are highly conserved in sequence yet show quite different patterns of methylation. This could indicate an ability of the genome and the epigenome to evolve independently. However, we do find that the most drastic changes in methylation between human and chimp, where an HMR in one species shows high levels of methylation in the other, are accompanied by increased sequence divergence even at non-CpG dinucleotides. One interpretation is that most species-specific HMRs have arisen newly along one lineage and that these novel functional elements are showing signs of recent adaptation. On the other hand, if this accelerated sequence change were more a reflection of relaxed selective pressure, we would expect species-specific HMRs to more frequently result from loss of functional elements along the opposite lineage. Resolution of these questions can only come from a broadening of the studies reported herein to many more species.
Detailed methods can be found as supplementary information.
Two anonymous human donors were used and data pooled after sequencing. Two chimp donors were used. Semen was collected at the New Iberia Research Center (New Iberia, LA 70560) or the Southwest National Primate Research Center (San Antonio, TX 78227). Coagulated semen was separated from the liquid phase manually. Both human and chimp samples were diluted (1:1) in HBS buffer (0.01 M HEPES pH 7.4; 150 mM NaCl) and passed through a silica-based gradient, SpermFilter (Cryobiosystems), by centrifugation (according to manufacturer’s instructions).
DNA from ~100 million cells was extracted and sheared to a size of ~150-200 nt by sonication. Double-stranded DNA fragments were end repaired, A-tailed, and ligated to methylated Illumina adaptors. Ligated fragments were bisulfite converted using the EZ-DNA Methylation-Gold Kit (Zymo Research). Following PCR enrichment, fragments of 340 to 360 bp were size selected and sequenced.
Reads were mapped with RMAPBS (Smith et al., 2009). The accuracy of our mapping method is discussed in Supplementary Information. Mapped reads were used to infer the methylation frequency at each CpG dinucleotide. These frequencies, along with the number of reads contributing to each frequency estimate, were supplied to a segmentation algorithm used to identify HMRs. Ortholog mapping between human and chimp was done with the liftOver tool available through the UCSC Genome Browser. Sequence conservation between human, chimp and rhesus was measured based on MULTIZ 44-way vertebrate alignments, also available through the UCSC Genome Browser. Complete details of all computational methods are provided in Supplementary Information.
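To illustrate the flow from per-CpG read counts to HMR calls, the following minimal sketch estimates the methylation frequency at each CpG and then groups consecutive low-methylation CpGs into candidate regions. It is not the segmentation algorithm used in this study, which is described in Supplementary Information; a simple threshold-and-run rule is substituted here for illustration. The function names, the 0.2 methylation cutoff, and the minimum of three CpGs per region are all assumptions.

```python
from typing import List, Optional, Tuple

# One CpG site: (genomic position, reads supporting methylation, total reads).
Site = Tuple[int, int, int]


def methylation_level(methylated_reads: int, total_reads: int) -> Optional[float]:
    """Fraction of reads reporting a methylated C at a CpG.
    Returns None when the site has no coverage."""
    if total_reads == 0:
        return None
    return methylated_reads / total_reads


def call_hmrs(sites: List[Site], max_level: float = 0.2,
              min_cpgs: int = 3) -> List[Tuple[int, int, int]]:
    """Group consecutive CpGs with level <= max_level into candidate HMRs.

    `sites` must be sorted by position. Uncovered sites and sites above the
    cutoff both end the current run. Returns (first_pos, last_pos, n_cpgs)
    for each run containing at least `min_cpgs` CpGs.
    """
    hmrs = []
    run: List[int] = []
    for pos, meth, total in sites:
        level = methylation_level(meth, total)
        if level is not None and level <= max_level:
            run.append(pos)
        else:
            if len(run) >= min_cpgs:
                hmrs.append((run[0], run[-1], len(run)))
            run = []
    if len(run) >= min_cpgs:  # flush a run that reaches the last site
        hmrs.append((run[0], run[-1], len(run)))
    return hmrs


# Hypothetical counts: three adjacent low-methylation CpGs form one region;
# the final two low sites are too few to be reported.
example_sites = [(10, 9, 10), (20, 1, 10), (30, 0, 8), (40, 1, 12),
                 (50, 10, 10), (60, 0, 5), (70, 0, 5)]
hmrs = call_hmrs(example_sites)
```

Unlike this fixed cutoff, a segmentation method that is supplied the number of reads behind each frequency estimate, as described above, can weight well-covered CpGs more heavily than sparsely covered ones.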
We thank Michelle Rooks, Pramod Thekkat and Colin Malone for help with experimental procedures and Assaf Gordon, Luigi Manna and the CSHL and USC High Performance Computing Centers for computational support. We thank Babette Fontenot (New Iberia Research Center) and Jerilyn Pecotte (Southwest National Primate Center) for help with chimp sperm collection. We thank Sergey Nuzhdin, Ed Green, Peter Calabrese, Maren Friesen, Magnus Norborg, and Marie-Stanislas Remigereau for helpful discussions. This work was supported in part by grants from the NIH (R01HG005238 and 1RC2HD064459) and by a kind gift from Kathryn W. Davis. Data analyzed herein has been deposited in GEO (accession #*).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.