Methylomes of mature male germ cells in human and chimp
We conducted genome-wide shotgun bisulfite sequencing of individual sperm DNA samples isolated from two human and two chimp donors (see supplementary information for details). Basic data analysis was conducted using a custom pipeline. We were able to determine methylation status for 96% of genomic CpGs in the human and chimp samples, from a total of 28M and 27M CpGs, respectively. Read coverage for CpGs on autosomes averaged 16X in human, with an overall methylation level of ~70% across all CpG sites. For chimp, we sequenced to an average coverage of nearly 14X and observed an average methylation level of ~67%. We did not observe significant methylation at non-CpG sites in either dataset. For comparison, we applied our analysis pipeline to a whole-genome bisulfite dataset from human ES cells (Lister et al., 2009). This dataset was comparable to our own, with 93% of CpG dinucleotides covered and an average depth of 14X on CpGs genome-wide.
[Figure: Shotgun bisulfite sequencing of human and chimp sperm methylomes]
We identified contiguous domains of low methylation, termed hypomethylated regions (HMRs), independently of genomic annotations such as CGIs and promoters. Because methylation levels in sperm were generally high, HMRs appeared on browser plots as obvious valleys in which methylation dropped to very low levels. To call HMRs in a statistically principled manner, we designed a novel computational approach based on a two-state hidden Markov model with beta-binomial emission distributions (see supplementary information). This algorithm identified ~79k HMRs in human sperm and ~70k HMRs in chimp sperm. Only ~44.5k HMRs were identified in the human ES cell dataset (Lister et al., 2009), despite similar sequence coverage and overall methylation level (see Table S1A). HMR sizes also differed between germ and ES cells. In both chimp and human sperm, the mean HMR size was ~1.8kb and the median was ~1.3kb. In ES cells, HMRs had a mean size of ~1.2kb and a median of 833bp. HMRs overlapped all classes of genomic annotation (see Table S1B).
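The HMR caller is described here only as a two-state HMM with beta-binomial emissions. The following is a minimal sketch of how such a model could label CpGs, not the study's implementation. It runs Viterbi decoding over per-CpG (methylated, total) read counts. The state names, the emission parameters and the `p_stay` transition probability are illustrative assumptions, and the parameter fitting (e.g. by EM) that a real caller would need is omitted.

```python
import math


def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, a, b):
    """log P(k methylated reads out of n) under a Beta(a, b)-Binomial model."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def viterbi_hmr(counts, params, p_stay=0.99):
    """Most probable state path for a 2-state HMM over consecutive CpGs.

    counts: list of (methylated_reads, total_reads), one tuple per CpG in order.
            A CpG with zero coverage contributes log(1) = 0 to every state.
    params: dict mapping state name -> (alpha, beta) of its beta-binomial emission.
    p_stay: hypothetical probability of staying in the same state between CpGs.
    """
    states = list(params)
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform initial state distribution.
    score = {s: -math.log(len(states)) + betabinom_logpmf(*counts[0], *params[s])
             for s in states}
    backpointers = []
    for k, n in counts[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: score[r] + trans(r, s))
            new_score[s] = (score[best_prev] + trans(best_prev, s)
                            + betabinom_logpmf(k, n, *params[s]))
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=score.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Illustrative emissions: mean methylation 10% inside HMRs, 90% elsewhere.
    params = {"HMR": (1.0, 9.0), "background": (9.0, 1.0)}
    counts = [(9, 10), (10, 10), (9, 10), (1, 10), (0, 10), (1, 12), (0, 8), (10, 10), (9, 10)]
    print(viterbi_hmr(counts, params))
```

Beta-binomial emissions let each state absorb extra-binomial variation between CpGs rather than treating read counts as exact binomial draws. Consecutive "HMR" labels would then be merged into intervals.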
Global comparisons among primate sperm methylomes and with human ES cells
Average methylation levels differed slightly between the two human donors (donor 1: 72%; donor 2: 67%) but were the same for both chimp donors (67%). The methylation status of individual CpGs within HMRs was very highly correlated between individuals, with divergence higher in repeats than in promoters. High inter-individual correlations at both the CpG and the HMR levels imply that our datasets permit accurate calling of CpG methylation genome-wide.
[Figure: A global view of sperm and ESC methylomes]
We also compared methylation between species at the level of individual nucleotides (see supplementary methods for details). As expected, correlations between human and chimp sperm methylation are high, but correlations are generally highest within species.
We also directly compared the methylomes from each of the human and chimp donors with the human ES cell (ESC) methylome. The nucleotide-level correlations in sperm methylation among the four primate individuals were all higher than their correlations with ESC methylation patterns. However, the human ESC methylome did show substantially higher correlation with the human germ cell methylomes than with those of the chimp donors. Considered together, these results indicate that, although the waves of reprogramming in developing germ cells and embryos both culminate in high genome-wide methylation, the two resulting methylomes differ substantially overall.
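The correlation measure and coverage filter are not specified in this section. A minimal sketch follows, assuming Pearson correlation of per-CpG methylation levels computed only over CpGs covered by at least a hypothetical `min_cov` reads in both samples.

```python
import math


def methylation_levels(counts, min_cov):
    """Map CpG position -> methylation level, keeping CpGs with >= min_cov reads."""
    return {pos: m / n for pos, (m, n) in counts.items() if n >= min_cov}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cpg_correlation(sample_a, sample_b, min_cov=5):
    """Correlation of per-CpG methylation over CpGs covered in both samples.

    sample_a, sample_b: dict position -> (methylated_reads, total_reads).
    min_cov: hypothetical minimum read depth required in each sample.
    """
    a = methylation_levels(sample_a, min_cov)
    b = methylation_levels(sample_b, min_cov)
    shared = sorted(a.keys() & b.keys())
    return pearson([a[p] for p in shared], [b[p] for p in shared])
```

Calling `cpg_correlation` for every pair among the four donors and the ESC sample yields the kind of pairwise matrix compared in the text. For human versus chimp, the positions would first need to be mapped to orthologous coordinates.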
Comparison of hypomethylated promoters between sperm and ESC methylomes
The majority of promoters are associated with HMRs in both sperm and ESC, indicating widespread bookmarking of promoters during both waves of epigenetic reprogramming. Some promoters did show differential methylation: 1,336 had sperm-specific HMRs, but only 201 had ESC-specific HMRs. Promoters hypomethylated in germ cells were strongly enriched for putative binding sites of transcription factors known to function in testis, including NRF1, NF-Y, YY1 and CREB (see Fig. S1). A similar analysis of ESC-specific HMRs yielded no significant enrichment.
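The classification of promoters into shared, sperm-specific and ESC-specific groups can be sketched as interval overlap. This is a minimal sketch, assuming promoters and HMRs are half-open intervals on one chromosome and that a promoter counts as "associated" with an HMR if they overlap by at least one base. The study's promoter window and overlap rule are not stated here.

```python
import bisect


def overlaps_any(start, end, starts, intervals):
    """True if half-open [start, end) overlaps any interval in `intervals`.

    `intervals` must be sorted and non-overlapping, and `starts` must hold their
    start coordinates; the only candidate is the last interval starting before `end`.
    """
    i = bisect.bisect_left(starts, end)
    return i > 0 and intervals[i - 1][1] > start


def classify_promoters(promoters, sperm_hmrs, esc_hmrs):
    """Label promoters as 'shared', 'sperm-specific', 'ESC-specific' or 'neither'.

    All arguments are lists of (start, end) half-open intervals on one chromosome.
    """
    sperm, esc = sorted(sperm_hmrs), sorted(esc_hmrs)
    sperm_starts = [s for s, _ in sperm]
    esc_starts = [s for s, _ in esc]
    labels = {}
    for p_start, p_end in promoters:
        in_sperm = overlaps_any(p_start, p_end, sperm_starts, sperm)
        in_esc = overlaps_any(p_start, p_end, esc_starts, esc)
        if in_sperm and in_esc:
            label = "shared"
        elif in_sperm:
            label = "sperm-specific"
        elif in_esc:
            label = "ESC-specific"
        else:
            label = "neither"
        labels[(p_start, p_end)] = label
    return labels
```

Binary search keeps each lookup logarithmic in the number of HMRs, which matters at the scale of ~79k sperm HMRs.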
[Figure: Differentially reprogrammed genes and their functions]
Only the genes with sperm-specific promoter hypomethylation showed strong enrichment for functional (GO) categories. These were associated with germ cell functions (Table S2) at distinct stages of gametogenesis (e.g., embryonic germ cell development and spermiogenesis). Thus, genes acting at developmental stages potentially separated by decades appear to maintain a permissive epigenetic state. Of the 8 genes analyzed from the piRNA metabolic process category, 7 showed promoter hypomethylation in sperm but not in ES cells, and one was hypomethylated in both.
Retention of histones in human sperm has been reported to be extensive (Hammoud et al., 2009). Our analysis of these data revealed a strong correlation between retained histones marked by H3K4me3 and HMRs at promoters: of the 25.8k promoters marked by H3K4me3 in sperm, 91% overlapped an identified HMR. In general, these results support prior observations that the presence of H3K4me3 at promoters is often accompanied by hypomethylation (Hammoud et al., 2009; Ooi et al., 2007).
It was previously posited that genes involved in early embryonic development have a distinct chromatin status in sperm: hypomethylated, histone-retained, enriched in H3K4me3 marks, and thus poised for expression (Hammoud et al., 2009). At least with respect to DNA methylation, we do not detect a preferential association between sperm HMRs and developmental regulators. Instead, HMRs are widespread. One potential explanation for this apparent discrepancy is that our comparisons involve sperm and ES cells, whereas prior studies contrasted sperm with a differentiated cell type.
The genes with promoters that lack HMRs in both sperm and ESC (N = 5,380) show strong enrichment for G-protein coupled receptors and for genes involved in neurological functions (Tables S2C and S2D). Why many of these genes, associated with highly specialized cell types, lack promoter HMRs in both sperm and ESC remains unclear.
Shared HMRs show distinct characteristics in sperm and ES cells
Differences in average size and CpG density suggest that the HMRs emerging after germ cell reprogramming differ qualitatively from those emerging after zygotic reprogramming (Table S1A). The majority of HMRs have CpG densities between 1% and 10%, and in the sperm methylomes promoter HMRs fall almost exclusively in this range. HMRs with CpG densities below 1% lie almost exclusively in repeats and are overrepresented in human sperm relative to chimp sperm and human ESCs. Promoter-associated HMRs in human and chimp sperm have sizes concentrated between 1kb and 10kb and tend overall to be broader than promoter-associated HMRs in ESCs. The narrower ESC HMRs are accompanied by a notable increase in CpG density, so that a significant portion of ESC HMRs have a CpG density above 10%.
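"CpG density" is not formally defined in this section. A minimal sketch follows, assuming it means CpG dinucleotides per base pair of the region, which places a region into the three ranges discussed above.

```python
def cpg_density(seq):
    """CpG dinucleotides per base pair (assumed definition of 'CpG density')."""
    seq = seq.upper()
    # "CG" cannot overlap itself, so str.count finds every CpG.
    return seq.count("CG") / len(seq)


def density_class(seq):
    """Bin a region into the density ranges discussed in the text."""
    d = cpg_density(seq)
    if d < 0.01:
        return "<1%"
    if d <= 0.10:
        return "1-10%"
    return ">10%"
```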
[Figure: Characteristics of HMRs emerging from germline and somatic reprogramming]
To probe structural differences between HMRs in ES cells and sperm, we plotted the average methylation around HMR-associated transcriptional start sites (TSSs) genome-wide. This revealed a general principle: a core HMR in ES cells, referred to as a nested HMR, often lies within an extended HMR in sperm. The median size of nested ESC HMRs is 1,498bp, less than half the median size (3,109bp) of the sperm HMRs in which they reside. This phenomenon was also observed independently in a comparison of somatic and sperm HMRs, where variations in boundaries were additionally correlated with tissue-specific expression (Hodges et al., submitted). Extended HMRs are reminiscent of the concept of CpG shores (Doi et al., 2009). However, in our comparisons of sperm and ESC we made no attempt to correlate gene expression with the widespread nesting reported here.
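Nesting can be made concrete as interval containment. This is a minimal sketch, assuming an ESC HMR is "nested" only if a single sperm HMR fully contains it (the study's exact rule may be looser). It pairs each nested ESC HMR with its enclosing sperm HMR and reports the two median sizes, counting each enclosing sperm HMR once.

```python
import bisect
from statistics import median


def nested_pairs(esc_hmrs, sperm_hmrs):
    """Pair each ESC HMR with the sperm HMR that fully contains it, if any.

    Intervals are half-open (start, end) on one chromosome; sperm HMRs are
    assumed non-overlapping (they are sorted here).
    """
    sperm = sorted(sperm_hmrs)
    starts = [s for s, _ in sperm]
    pairs = []
    for e_start, e_end in esc_hmrs:
        # Last sperm HMR starting at or before the ESC HMR's start.
        i = bisect.bisect_right(starts, e_start) - 1
        if i >= 0 and sperm[i][1] >= e_end:
            pairs.append(((e_start, e_end), sperm[i]))
    return pairs


def median_sizes(pairs):
    """Median size of nested ESC HMRs and of the distinct sperm HMRs containing them."""
    esc_sizes = [end - start for (start, end), _ in pairs]
    sperm_sizes = [end - start for start, end in {sp for _, sp in pairs}]
    return median(esc_sizes), median(sperm_sizes)
```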
The observation of nested HMRs could arise either from a true expansion of the hypomethylated domain in sperm or as an artifact of sperm having less precisely defined HMR boundaries than ESC. The degree of change in methylation state across boundary CpGs in both cell types supports the former interpretation. Thus, nesting appears to be a general phenomenon. It likely reflects differences in the mechanisms that determine the boundaries of hypomethylated regions during the waves of de novo methylation leading to sperm and ESC.
As a step toward addressing such mechanisms, we asked whether any features are associated with HMR boundaries in either cell type. Two interesting characteristics emerged. Approaching the boundaries of either the extended sperm HMRs or the nested ES cell HMRs, CpG density dropped just before the start of the HMR and rose sharply again thereafter, though overall densities were higher in the nested portions. This reflects an increase in the average inter-CpG distance at HMR boundaries. Because our method of identifying HMRs is agnostic to inter-CpG distance, this is not simply an artifact of our approach. One could imagine increased inter-CpG distances interrupting a processive activity and thereby preventing the spread of de novo methylation, either directly or indirectly.
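A boundary profile of inter-CpG distance can be sketched by aligning CpGs on each boundary. This is a minimal sketch, assuming left (upstream) boundaries given as coordinates, each snapped to the first CpG at or after it. Right boundaries could be handled the same way after negating and reversing the coordinates. The `flank` window is a hypothetical parameter.

```python
import bisect


def inter_cpg_profile(cpg_positions, boundaries, flank=3):
    """Mean distance from each CpG to the next CpG, by CpG offset from HMR boundaries.

    cpg_positions: sorted CpG coordinates on one chromosome.
    boundaries: coordinates of left (upstream) HMR boundaries; each is snapped to
                the first CpG at or after it, which becomes offset 0.
    flank: hypothetical number of CpGs examined on each side of the boundary.
    Offset k reports the gap between CpG (boundary + k) and CpG (boundary + k + 1),
    so offset -1 is the gap that ends at the boundary CpG.
    """
    gaps = [b - a for a, b in zip(cpg_positions, cpg_positions[1:])]
    offsets = range(-flank, flank + 1)
    sums = dict.fromkeys(offsets, 0)
    counts = dict.fromkeys(offsets, 0)
    for boundary in boundaries:
        i = bisect.bisect_left(cpg_positions, boundary)
        for k in offsets:
            j = i + k
            if 0 <= j < len(gaps):
                sums[k] += gaps[j]
                counts[k] += 1
    return {k: sums[k] / counts[k] for k in offsets if counts[k]}
```

A peak at offset -1 would correspond to the widened CpG spacing described just outside HMR boundaries.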
Although we had no a priori expectation that sequence features would reside at sperm or ESC HMR boundaries, we searched for motifs that might occur at or near boundary CpGs, independent of CpG density. We noted a trend towards enrichment of an ACGT motif at ESC boundary CpGs, with a corresponding depletion immediately outside ESC HMRs (Fig. S2). This pattern was not significantly enriched at the boundaries of extended sperm HMRs. Building on this observation, we also searched for longer motifs, focusing on those containing a central CpG. Patterns with strong differences across HMR boundaries tended to have the ACGT core (Table S3). The most enriched pattern for sperm was AACGTT; for ESCs, we saw the well-known E-box pattern, CACGTG. We plotted observed-to-expected frequencies centered on CpGs around the boundaries of extended and nested HMRs. For each pattern in the appropriate cell type, these plots revealed a clear depletion just outside the boundary followed by a sharp enrichment at the boundary CpG (Fig. S2B). These results raise the possibility that one or more DNA-binding proteins localize to HMR boundaries during waves of de novo methylation and help define transitions in methylation state.
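How expected motif frequencies were computed is not described. This is a minimal sketch under an assumed mononucleotide background: the expected count of a motif in a window is the product of its bases' frequencies in that window times the number of positions scanned. Both AACGTT and CACGTG are reverse-complement palindromes, so a single-strand scan counts each site once.

```python
from collections import Counter


def motif_obs_exp(seq, motif):
    """Observed/expected occurrences of `motif` in `seq` (one strand).

    Expected count assumes a mononucleotide background: the product of the
    motif's base frequencies in `seq`, times the number of positions scanned.
    Returns nan when the expected count is zero.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    n_pos = len(seq) - k + 1
    if n_pos <= 0:
        return float("nan")
    observed = sum(1 for i in range(n_pos) if seq[i:i + k] == motif)
    freqs = Counter(seq)
    p = 1.0
    for base in motif:
        p *= freqs[base] / len(seq)
    expected = p * n_pos
    return observed / expected if expected > 0 else float("nan")
```

Applied to windows at fixed offsets from boundary CpGs, this yields a positional obs/exp profile. A dinucleotide-aware background would be a stricter choice for CpG-containing motifs.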
Differential repeat methylation in sperm and ES cells
Consistent with prior observations and with the known role of DNA methylation in transposon silencing, most repeat elements were highly methylated in both sperm and ESC. However, a substantial fraction of HMRs overlapped transposons in chimp and human sperm, with all repeat classes represented (Table S1B). Fewer repeat-associated HMRs appeared in ESCs. In sperm, HMRs collectively contained 4–5% of all bases assigned to repeats, compared to 1.3% in ESC (see Table S1B). Overall, this suggests that different mechanisms, with different stringencies, direct repeat methylation during germ cell and preimplantation development.
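The base-level figure (the share of repeat-annotated bases lying inside HMRs) can be sketched with an interval sweep. This is a minimal sketch, assuming half-open intervals on one chromosome. Repeat annotations are merged first so that overlapping repeat copies are not double-counted.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_fraction(repeats, hmrs):
    """Fraction of repeat-annotated bases lying inside HMRs (one chromosome)."""
    reps, hs = merge(repeats), merge(hmrs)
    total = sum(e - s for s, e in reps)
    shared = 0
    i = j = 0
    # Sweep both sorted lists, accumulating the length of each pairwise intersection.
    while i < len(reps) and j < len(hs):
        start = max(reps[i][0], hs[j][0])
        end = min(reps[i][1], hs[j][1])
        if end > start:
            shared += end - start
        # Advance whichever interval finishes first.
        if reps[i][1] < hs[j][1]:
            i += 1
        else:
            j += 1
    return shared / total if total else 0.0
```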
[Figure: Differential repeat methylation during male germ cell and somatic reprogramming]
Sperm-specific satellite hypomethylation is concentrated at centromeres
We noted a strong decrease in methylation of sperm DNA within pericentromeric regions, extending several megabases outward from the unassembled core centromeres. This was not seen in ESC or in terminally differentiated cells (Hodges et al., submitted). This striking pattern was attributable to sperm-specific hypomethylation of ~75–80% of the satellite repeats concentrated in pericentromeric regions. In ESC, only 16% of pericentromeric satellites were hypomethylated, a figure in accord with the overall hypomethylation rates of non-pericentromeric satellites in both ESC and sperm (Table S4A). Prior studies of mouse germ cells using methylation-sensitive restriction enzymes noted selectively low methylation at pericentromeric satellites, suggesting that this is a conserved property (Yamagata et al., 2007).
Retroelement methylation patterns are determined at the subfamily level
Proper methylation of retrotransposons is required for transcriptional silencing of full-length and potentially active copies (Bourc’his and Bestor, 2004; Goodier and Kazazian, 2008; Walsh et al., 1998). However, specific retroelements can be active or unmethylated in male germ cells (e.g., AluY and AluYa5) (Schmid, 1991). Given our read lengths, we were able to assess the methylation state of virtually all repeat families and most individual copies (see Table S4B).
Overall, retrotransposon copies that were full length or close to consensus showed a slight bias towards hypomethylation (Figs. S3A and S3B). However, neither attribute could explain the variation observed in retrotransposon methylation. Hypomethylated repeat copies did tend to have greater CpG density, especially within the LTR and SVA classes. For LINEs, LTR elements and terminal repeats, HMRs were concentrated within regulatory regions, which often show higher CpG density than the coding regions of these elements (Figs. S3C and S3D; Tables S4D and S4G). SINE elements displayed more uniform hypomethylation (Fig. S4E). Because sperm HMRs are strongly associated with regulatory regions for most repeats, similar mechanisms appear to define HMRs in both repeat and non-repeat portions of the genome.
Among the LINEs, L1 subfamilies were often hypomethylated in both sperm and ES cells, and these trended strongly towards the active groups (Tables S4E and S4H). L1PA subfamilies are considered the most active in the human genome (Khan et al., 2006). The youngest of these (L1HS and L1PA2) were among the very few subfamilies enriched for hypomethylation in ES cells relative to sperm. In sperm specifically, we noted hypomethylation of several other L1 families (e.g., L1PA4-16 and L1M3).
Among LTR subfamilies, sperm HMRs were enriched for ERV elements (Table S4C). Hypomethylated copies exist either as part of full-length provirus-like elements or as solo LTRs, with the greatest enrichment for LTRs belonging to “class I” elements (e.g., LTR12; see Tables S4D and S4G). The few LTR subfamilies with more hypomethylated copies in ESC than in sperm are all recently derived, human-specific ERVs (e.g., LTR5, LTR13 and the HERVH LTR, LTR7).
Sperm hypomethylation has previously been reported for primate Alu elements (Kochanek et al., 1993; Liu et al., 1994). Our data revealed several Alu subfamilies with differential methylation in sperm and ES cells, e.g., the AluY subfamily (Tables S4F and S4I). The more precisely defined AluYa5 (human) and AluYd4 (chimp) subfamilies showed extreme enrichment for hypomethylation in sperm.
Species-specific methylation of the SVA element
SVA elements showed strong, species-specific differences in methylation between human and chimp sperm. SVAs are composite elements consisting of hexameric repeats, an Alu-like region, a VNTR (variable number of tandem repeats) region and a SINE-R (Shen et al., 1994). SVA elements were active in the most recent common ancestor of chimp and human (Mills et al., 2006). Multiple examples of new insertions suggest that they still cause genomic rearrangements and disease in humans (Ostertag et al., 2003).
Among the SVAs, the youngest subfamilies, D–F (Wang et al., 2005), showed the highest frequency of hypomethylation in human sperm. Notably, these have a higher CpG density than older subfamilies. A total of 358 SVA insertions can be assigned as high-confidence orthologs between human and chimp, and these remain highly similar in sequence (see supplement). The average methylation of these element copies spanned the full range from very low to very high, with two modes near 20% and 80%. In human sperm, 35% of orthologous SVAs had a methylation level below 50%. In sharp contrast, only 6% of copies fell below 50% methylation in chimp. We also annotated 921 SVA elements that appear to represent new insertions occurring after the human-chimp divergence (Mills et al., 2006). Of these, 852 (93%) were hypomethylated in sperm, compared with only 62 (7%) in ES cells. Considered together, our data indicate that SVA elements have come under different degrees of epigenetic control in the human and chimp lineages.
[Figure: Divergent methylation of SVA elements between human and chimp]
Many SVA insertions occur at or around promoters (Lander et al., 2001; Mikkelsen, 2005), and these elements often have a CpG content high enough to fit the traditional definition of a CpG island. Given these properties, SVA elements have the potential to introduce differential, species- and cell type-specific methylation near genes, which may be relevant for their regulation. The TLR1 locus exemplifies such a situation. No HMR exists near its promoter in chimp sperm or human ES cells, but one is contributed in human sperm by a nearby SVA element. Although sperm are largely transcriptionally silent, similar HMRs are expected to exist in transcriptionally active developing germ cells (data not shown).
Signatures of selection accompany differential methylation between primates
CGIs are the best-known evolutionary signature of vertebrate DNA methylation. Their original definition required a CpG observed-to-expected (o/e) ratio of at least 0.6. HMRs in human sperm and ESCs as a whole did not reach this empirical cutoff, but they did pass the 0.4 benchmark used by Weber and colleagues (Weber et al., 2007). Promoter-associated HMRs generally surpassed the 0.6 o/e cutoff in both sperm and ESC.
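For reference, the CpG o/e ratio is commonly computed as the CpG count times sequence length, divided by the product of the C and G counts. This is the Gardiner-Garden and Frommer formulation that accompanies the original 0.6 CGI threshold. Some studies use variants (e.g. an expected CpG frequency derived from GC content), so applying this exact formula to the numbers above is an assumption. A minimal sketch:

```python
def cpg_obs_exp(seq):
    """CpG observed/expected ratio: (#CpG * length) / (#C * #G).

    This is the common Gardiner-Garden and Frommer formulation; other studies
    use variants, so treat it as an assumption about the exact formula.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def passes(seq, cutoff):
    """True if the region's CpG o/e ratio reaches `cutoff` (e.g. 0.6 or 0.4)."""
    return cpg_obs_exp(seq) >= cutoff
```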
[Figure: Sequence features associated with methylome divergence]
The differences in CpG density between nested and extended HMRs imply distinct CpG depletion pressures in these regions. Average CpG composition genome-wide is ~0.2 o/e, but it reaches ~0.35 in extended HMRs and 0.68 in nested HMRs. We analyzed sperm-specific and ESC-specific HMRs in an attempt to decompose the CpG depletion pressure exerted by the two methylomes. The ESC-specific HMRs reached a CpG composition of only 0.35 o/e, whereas the sperm-specific HMRs reached 0.5.
The life cycle of a germ cell can be separated into two components. The first is the time from fertilization until somatically derived primordial germ cells (PGCs) reach the genital ridge. The second is the time during which the PGC develops into a mature germ cell that contributes to the zygote. This latter period generally spans from birth to the end of the animal's reproductive life. Our data suggest a model in which methylation patterns present during both of these intervals shape genomic CpG distributions. They also indicate that methylation profiles during germ cell maturation have the greater influence.
We sought to measure the degree to which differential methylation could lead to CpG decay over the ~6 My of divergent evolution separating human and chimp. We focused on regions that qualified as HMRs in either chimp or human, since these regions could have either lost methylation along one lineage or gained methylation along the other. For a given regional methylation level, we measured CpG decay as the proportion of regions that had lost more than 5% of inferred ancestral CpGs (using gorilla as an outgroup). We then plotted the relationship between average methylation and decay rate. The correlation between regional methylation level and CpG decay was extremely strong for both human and chimp. These results indicate that methylation-dependent CpG decay is appreciable even over relatively brief evolutionary periods.
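The decay measure described above can be sketched as a binned proportion. This is a minimal sketch that takes the ancestral reconstruction as given: each region is supplied as its average methylation, its count of inferred ancestral CpGs, and the number of those CpGs lost on the lineage in question. The ten equal-width methylation bins are a hypothetical choice.

```python
from collections import defaultdict


def decay_by_methylation(regions, n_bins=10, loss_threshold=0.05):
    """Proportion of regions with CpG decay, per bin of average methylation.

    regions: iterable of (avg_methylation in [0, 1], ancestral_cpgs, lost_cpgs),
             where ancestral CpGs are those inferred for the ancestor and lost
             CpGs are those no longer present in the extant lineage.
    A region counts as decayed if it lost more than `loss_threshold` of its
    ancestral CpGs. Returns {bin lower edge: proportion decayed}.
    """
    n_regions = defaultdict(int)
    n_decayed = defaultdict(int)
    for meth, ancestral, lost in regions:
        if ancestral == 0:
            continue  # decay is undefined without ancestral CpGs
        b = min(int(meth * n_bins), n_bins - 1)  # methylation 1.0 goes in the top bin
        n_regions[b] += 1
        if lost / ancestral > loss_threshold:
            n_decayed[b] += 1
    return {b / n_bins: n_decayed[b] / n_regions[b] for b in sorted(n_regions)}
```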
This observation predicted that we might see signatures of selective pressure preventing the erosion of some CpGs that are maintained despite germline methylation. To address this question, we analyzed segregating sites at CpG dinucleotides using data from the HapMap 3 project (CEU population; Altshuler et al., 2010). CpGs were treated symmetrically: a change at the G of a CpG was scored as the complementary change at the C on the opposite strand. Each derived allele at these sites can therefore be classified as A, G or T. As expected, segregating sites with T as the derived allele represent the vast majority.
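The symmetric treatment of CpGs amounts to a simple strand fold. This is a minimal sketch, assuming each segregating site is described by its position within the CpG (0 for the C, 1 for the G) and its derived base on the forward strand. Both parameters are a hypothetical input format, not HapMap's.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def fold_cpg_allele(offset, derived):
    """Express a derived allele at a CpG relative to the C of the dinucleotide.

    offset: 0 if the segregating site is the C of the CpG, 1 if it is the G.
    derived: derived base as read on the forward strand.
    A change at the G is read off the reverse strand, where that position is a C,
    so the result is always A, G or T (a derived ApG, GpG or TpG).
    """
    derived = derived.upper()
    if offset == 0:
        allele = derived
    elif offset == 1:
        allele = COMPLEMENT[derived]
    else:
        raise ValueError("offset must be 0 (C of CpG) or 1 (G of CpG)")
    if allele == "C":
        raise ValueError("derived allele is identical to the ancestral C")
    return allele
```

For example, a G-to-A change at the G position folds to "T", the same class as a C-to-T transition on the other strand.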
We generated frequency spectra for each derived allele nucleotide, with sites classified according to their methylation level in sperm (Fig. S4). As methylation levels increased, derived allele frequencies shifted towards the low end of the spectra (Fig. S4). This shift was observed not only for derived TpG alleles, which could be explained by an extreme bias in mutation rate, but also for ApG and GpG derived alleles. One interpretation of these findings is that selection is, on average, weaker at individual CpG sites with lower sperm methylation. Such an interpretation is consistent with recent findings of Cohen et al. (2011). They used sophisticated evolutionary models to argue that selection for high CpG content is not a significant factor in the maintenance of CGIs in the genome.
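The stratified spectra can be sketched as counts keyed by derived allele and methylation class. This is a minimal sketch, assuming each site already carries a derived allele (A, G or T at the C position), a derived allele count among `n_chrom` sampled chromosomes, and a sperm methylation level. The three-way methylation split and its cutoffs are hypothetical. A lower mean derived allele frequency indicates a spectrum shifted toward rare alleles.

```python
from collections import Counter, defaultdict


def methylation_class(meth, cutoffs=(0.2, 0.8)):
    """Hypothetical three-way split of sperm methylation levels."""
    if meth < cutoffs[0]:
        return "low"
    if meth < cutoffs[1]:
        return "intermediate"
    return "high"


def stratified_spectra(sites, n_chrom):
    """Derived allele frequency spectra keyed by (derived allele, methylation class).

    sites: iterable of (derived_allele, derived_count, sperm_methylation).
    n_chrom: number of sampled chromosomes; only segregating sites
             (0 < derived_count < n_chrom) are kept.
    Returns {(allele, class): Counter(derived_count -> number of sites)}.
    """
    spectra = defaultdict(Counter)
    for allele, count, meth in sites:
        if 0 < count < n_chrom:
            spectra[(allele, methylation_class(meth))][count] += 1
    return spectra


def mean_daf(spectrum, n_chrom):
    """Mean derived allele frequency; lower values mean a shift toward rare alleles."""
    n_sites = sum(spectrum.values())
    return sum(count * n for count, n in spectrum.items()) / (n_sites * n_chrom)
```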
The strong connection between HMRs and gene promoters suggests that the evolutionary gain or loss of HMRs may be associated with changes in selective pressure on functional regulatory regions. To investigate this possibility, we analyzed sequence divergence in HMRs, focusing on those that are human- or chimp-specific. These differentially methylated regions will have different rates of C-to-T transitions at CpGs, so we counted changes from the inferred ancestor only at non-CpG sites. Genomic intervals differing from the inferred ancestor at more than 1% of such sites were counted as divergent.
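The non-CpG divergence test can be sketched as follows, assuming the extant and ancestral sequences of a region are already aligned without gaps and have equal length. The exact masking rule is not specified here. As an assumption, this sketch excludes any position that belongs to a CpG in either sequence.

```python
def cpg_mask(seq):
    """Boolean list marking positions that are part of a CpG dinucleotide."""
    seq = seq.upper()
    mask = [False] * len(seq)
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            mask[i] = mask[i + 1] = True
    return mask


def non_cpg_divergence(extant, ancestor):
    """Fraction of non-CpG positions at which `extant` differs from `ancestor`."""
    if len(extant) != len(ancestor):
        raise ValueError("sequences must be aligned and of equal length")
    m_ext, m_anc = cpg_mask(extant), cpg_mask(ancestor)
    compared = diffs = 0
    for a, b, in_cpg_ext, in_cpg_anc in zip(extant.upper(), ancestor.upper(), m_ext, m_anc):
        if in_cpg_ext or in_cpg_anc:
            continue  # skip CpG context in either sequence
        compared += 1
        diffs += a != b
    return diffs / compared if compared else 0.0


def is_divergent(extant, ancestor, threshold=0.01):
    """True if more than `threshold` of non-CpG positions differ from the ancestor."""
    return non_cpg_divergence(extant, ancestor) > threshold
```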
Only 10% of HMRs shared between human and chimp showed divergence from the ancestral sequence at non-CpG sites. At chimp-specific HMRs, 15% of human sequences and 19% of chimp sequences diverged from the inferred ancestor. At human-specific HMRs, 22% of human sequences and 18% of chimp sequences diverged. These results indicate that changes in methylation state between human and chimp are associated with accelerated non-CpG sequence divergence. Interestingly, in both cases the species with the lower methylation state had the greater rate of divergence, which is consistent with adaptation at novel regulatory regions driving these changes.
We identified only 104 promoters that are hypomethylated in human but not in chimp sperm, and only 52 genes with differential promoter methylation in the opposite direction. Neither set showed significant enrichment for any ontology category. However, analysis of genes with promoters within 10kb of an identified human-specific sperm HMR revealed a strong enrichment for neuronal functions (see Table S5). The HTR3E gene, which encodes a serotonin receptor subunit, is an example of such a gene; its promoter is selectively hypomethylated in human sperm.