Epigenomic factors, including DNA methylation, histone modifications and non-coding RNAs, are an integral link between DNA sequence variation and the modulation of transcriptional output. Increased understanding of these elements will be crucial for a coherent functional assessment of the large number of non-coding DNA variants identified in contemporary whole-genome sequencing case-control studies, and for delineating their developmental and tissue-specific features and constraints.
This study investigated the DNA methylation state of genomic loci with evidence of involvement in T2D and found an association at the susceptibility locus within the FTO gene. This region was identified in a T2D GWA study. The associated common haplotype spans intron 1, exon 2 and intron 2 and is captured equivalently by several SNPs. It was subsequently shown to mediate its effect on disease susceptibility through obesity [22], [33], [34], [35]. Murine models initially supported FTO itself being the causal gene at the locus. An obesity-protective phenotype was observed in the fto knock-out mouse [36], and a similar but weaker effect in the hypomorphic fto I367F missense mutant [37]. In humans, a loss-of-function mutation in FTO was found to cause an autosomal-recessive lethal syndrome (OMIM #612938) characterised by multiple malformations and severe growth retardation, whereas heterozygous relatives were not obese [38]. Attempts to detect risk-SNP-dependent differences in FTO expression in human skeletal muscle and subcutaneous adipose tissue have so far been unsuccessful [39]. No allele-specific expression has been demonstrated in immortalised lymphoblastoid cell lines [40]. Subsequently, Meyre et al. identified further heterozygous loss-of-function FTO mutations in obese as well as lean subjects. This further clouded FTO's causal role [41] and illustrated the complexity of interpreting the function of this dioxygenase in energy balance [42].
Identification of stable epigenetic modifications may aid the exploration of genotype-phenotype interactions in complex diseases. DNA methylation can exert its functional influence through a range of processes [43]. It can act directly, by affecting transcription factor binding, or indirectly, through changes in post-translational histone modification, chromatin packaging, and chromatin conformation and function. How readily these epigenetic influences can be detected depends on how directly they are associated with genotype, an association that ranges from obligatory to stochastic [44]. We have therefore used the power of large-scale GWA studies to look for associations between genotype and methylation state.
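A genotype-methylation association can be illustrated with a minimal sketch, assuming an additive model. In that model, methylation level is regressed on the number of risk alleles an individual carries (0, 1 or 2). This is not the analysis used in this study. The function names `genotype_means` and `additive_slope` and the example values are hypothetical.

```python
# Minimal sketch of a genotype-methylation association under an additive
# model. Function names and example values are hypothetical and do not
# reproduce the analysis used in this study.
from statistics import mean


def genotype_means(dosages, methylation):
    """Mean methylation level for each risk-allele dosage (0, 1 or 2)."""
    groups = {}
    for d, m in zip(dosages, methylation):
        groups.setdefault(d, []).append(m)
    return {d: mean(vals) for d, vals in sorted(groups.items())}


def additive_slope(dosages, methylation):
    """Least-squares slope of methylation on risk-allele dosage.

    Under an additive model the slope is the expected change in
    methylation per copy of the risk allele.
    """
    if len(dosages) != len(methylation) or len(dosages) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mx, my = mean(dosages), mean(methylation)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all samples share one genotype; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, methylation))
    return sxy / sxx


if __name__ == "__main__":
    # Toy data: copies of a risk allele and a methylation fraction per sample.
    dosages = [0, 0, 1, 1, 2, 2]
    methylation = [0.40, 0.42, 0.45, 0.47, 0.50, 0.52]
    print(genotype_means(dosages, methylation))
    print(round(additive_slope(dosages, methylation), 3))
```

A real analysis would also need a significance test and covariates. The sketch only shows that the association is defined on genotype, not on disease status.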
We identified a methylation association with the strongly replicated disease haplotype of the FTO gene, tagged by SNP rs8050136. Importantly, this association is with individuals' genotypes, not with their phenotype status. We confirmed and validated these results at single-base resolution within the contributing signal using bisulphite pyrosequencing. In doing so, we found that the methylation signal was genetically driven by the phase of three CpG-creating SNPs. These SNPs are in LD within a narrow 900 bp peak window. We did not find evidence of a cis-methylation or hepitype effect [15], [16], [17]. The ~10% change identified in our MeDIP experiment is likely to be an underestimate in this region, as BATMAN calculations are based on the reference genome. Additional work in murine strains also supports the inference that inherited genetic variability is a major determinant of epigenetic variability [45]. Zhang et al. identified allele-specific methylation (ASM) with differences of as much as 85% between alleles across CpG islands (CGIs) [46]. However, it is unclear how much of this effect is driven by each of the following:
- CpG-creating SNPs themselves;
- their additional influence on the methylation of surrounding CpGs;
- other effects of genetic polymorphisms on the methylation machinery;
- a combination of these factors.

Shoemaker et al. recently observed ASM at 23–37% of heterozygous SNPs in different human cell lines, with 38–88% of these regions dependent on CpG-SNP variation [47]. Our finding is a genetically driven difference in methylation capability, detectable over kilobases, which we have termed Haplotype-Specific Methylation (HSM). The term distinguishes it from epigenetic ASM, in which methylation varies between alleles at individual non-SNP CpGs. It also distinguishes it from a hepitype, in which genetic variability combines with ASM within a haplotype. In a similarly designed but larger study, the FTO HSM region would be identified as a direct T2D-DMR.
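A CpG-creating SNP is one whose allele forms a C-G dinucleotide that the other allele lacks. The minimal sketch below makes this concrete, assuming a plain sequence string and 0-based positions. It flags CpG-creating substitutions and counts CpG sites on a reference haplotype and on an alternative haplotype carrying three in-phase alleles. The sequence, SNP positions and function names are hypothetical toy values, not the FTO sequence.

```python
# Minimal sketch: find CpG-creating substitutions and count CpG sites on two
# haplotypes. Sequence, positions and function names are hypothetical.


def cpg_positions(seq):
    """0-based indices of the C in every CpG on the given strand.

    CpG is palindromic, so counting one strand counts each site once.
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def apply_alleles(ref_seq, alleles):
    """Return ref_seq with {0-based position: base} substitutions applied."""
    bases = list(ref_seq.upper())
    for pos, base in alleles.items():
        if not 0 <= pos < len(bases):
            raise IndexError(f"position {pos} outside sequence")
        bases[pos] = base.upper()
    return "".join(bases)


def is_cpg_creating(ref_seq, pos, alt):
    """True if placing `alt` at `pos` forms a CpG absent from the reference."""
    before = set(cpg_positions(ref_seq))
    after = set(cpg_positions(apply_alleles(ref_seq, {pos: alt})))
    return bool(after - before)


if __name__ == "__main__":
    ref = "ATGCATCAATTGAC"            # toy reference haplotype with no CpGs
    snps = {1: "C", 7: "G", 10: "C"}  # three hypothetical in-phase alleles
    alt = apply_alleles(ref, snps)    # toy alternative haplotype
    for pos, base in snps.items():
        print(pos, base, is_cpg_creating(ref, pos, base))
    print(len(cpg_positions(ref)), len(cpg_positions(alt)))
```

In this toy case the alternative haplotype gains three methylatable sites, while the reference has none.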
In this integrated analysis, we did not find a methylation association distinguishing risk from non-risk haplotypes in the other T2D-associated LD blocks. This does not, however, exclude more subtle effects in these regions. MeDIP cannot evaluate individual cytosines, but it does allow broader-scale haplotype methylation differences, such as those driven by CpG-SNPs, to be identified [14], [47]. These genetic drivers of ASM can be identified in easily accessible tissue. They can then be followed up in the most appropriate disease-related tissue to examine any modulation of surrounding CpGs.
Recent work has proposed that the lack of evidence for modulation of FTO expression by susceptibility SNPs may be because this region affects more distant neighbouring genes. These include IRX3 [30] and RBL2 [48]. Ragvin et al. used comparative genomics to identify highly conserved non-coding elements (HCNEs) and overlying genomic regulatory blocks. They proposed that enhancers in the first-intron susceptibility region exert long-range regulatory effects on the expression of the developmental transcription factor gene IRX3 (Iroquois homeobox 3). This gene lies in a gene desert ~170 kb 3′ of FTO [30]. Enhancers have several relevant properties:
- they lie predominantly in intergenic or intronic regions and can regulate gene transcription over long distances [49];
- they have an activating effect on chromatin structure [50];
- they are sensitive to CpG methylation [51];
- they have an important role in developmental processes [43], [52], [53].

Ragvin et al. implicated two HCNE-containing elements with enhancer activity in a metabolic role. One of them lies within the 7.7 kb methylation window (chr16:52,371,700–52,379,399). On the susceptibility haplotype, this enhancer carries additional methylatable CpG sites and therefore shows higher regional methylation. This higher methylation may impede enhancer-specific transcription factor recruitment [51]. It may also impede the subsequent chromatin looping, enhancer-promoter interaction and enhanceosome formation, leading to down-regulation of IRX3 expression. In addition, this HCNE lies just over 200 bp 5′ of the 900 bp window (chr16:52,378,500–52,379,399). The 900 bp peak therefore falls within a 2 kb ‘shore’ of this enhancer. Such ‘enhancer shores’ may behave similarly to ‘CpG island shores’ (the 2 kb flanking each island) and other regions of low CpG density, which have been associated with more dynamic DNA methylation [54]. Our ChIP-chip data from skeletal muscle indicate an H3K4me1 signature within the 7.7 kb region. ChIP-Seq data from cell lines confirm a 5 kb block of H3K4me1 enrichment lying entirely within this region (http://bioinformatics-renlab.ucsd.edu/enhancer) (Supplementary Figure S4) [50]. A recent examination of histone modifications in pancreatic islets also identified this enhancer mark, 1.2 kb wide, over rs8050136 within the region [55]. No allele-specific expression has been found in the sources examined [40]. The DNA methylation state of the enhancer-containing haplotype may be observable in all tissues, because it is predominantly driven genetically by CpG-SNPs. Any resulting effect on expression, however, may be seen only in specific cell types, at specific times and/or under specific environmental conditions.
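The ‘enhancer shore’ argument rests on simple interval arithmetic, sketched minimally below. The 7.7 kb and 900 bp window coordinates come from the text and are treated as 1-based and inclusive, which is an assumption; it gives exactly 7,700 and 900 bp. The HCNE coordinates are hypothetical, chosen only to place the HCNE just over 200 bp 5′ of the window. The sketch assumes that 5′ corresponds to lower chromosome coordinates here. `Interval`, `gap_bp` and `in_shore` are illustrative names, not part of any genomics library.

```python
# Minimal sketch of the interval arithmetic behind the 'enhancer shore'
# argument. Window coordinates are from the text (assumed 1-based, closed);
# the HCNE coordinates are hypothetical.
from typing import NamedTuple


class Interval(NamedTuple):
    """1-based, closed genomic interval on a single chromosome."""
    chrom: str
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start + 1


def gap_bp(a: Interval, b: Interval) -> int:
    """Bases strictly between two intervals (0 if they touch or overlap)."""
    if a.chrom != b.chrom:
        raise ValueError("intervals are on different chromosomes")
    if a.end < b.start:
        return b.start - a.end - 1
    if b.end < a.start:
        return a.start - b.end - 1
    return 0


def in_shore(feature: Interval, query: Interval, shore_bp: int = 2000) -> bool:
    """True if `query` lies entirely inside one flank of `feature`, each flank
    extending `shore_bp` bases beyond the feature."""
    if feature.chrom != query.chrom:
        return False
    flanks = [
        (feature.start - shore_bp, feature.start - 1),  # lower-coordinate flank
        (feature.end + 1, feature.end + shore_bp),      # higher-coordinate flank
    ]
    return any(lo <= query.start and query.end <= hi for lo, hi in flanks)


if __name__ == "__main__":
    hsm_region = Interval("chr16", 52_371_700, 52_379_399)  # 7.7 kb window
    peak = Interval("chr16", 52_378_500, 52_379_399)        # 900 bp window
    hcne = Interval("chr16", 52_377_900, 52_378_289)        # hypothetical HCNE
    print(hsm_region.length(), peak.length(), gap_bp(hcne, peak))
    print(in_shore(hcne, peak), in_shore(hcne, peak, shore_bp=1000))
```

With a ~210 bp gap, the whole 900 bp peak ends about 1.1 kb from the HCNE. It therefore fits inside a 2 kb shore but not a 1 kb one.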
The 7.7 kb HSM region overlaps the HCNEs identified by Ragvin et al. Despite this interesting overlap, the supporting functional data come from the phylogenetically distant zebrafish. In zebrafish, knock-down of the orthologous irx3a reduced the numbers of insulin-secreting β cells and glucagon-secreting α cells and increased the number of ghrelin-producing ε cells. This role of IRX3 in pancreatic development appears to conflict with two observations:
- most obese individuals show increased pancreatic β-cell mass as a compensatory response to coexisting peripheral insulin resistance [56];
- most previously identified obesity genes are involved in neuronally mediated central energy balance [42].

However, the crucial finding is the evidence of functional enhancer capability in this conserved non-coding region. Its downstream target may have changed, or evolved to take on a more complex role, over time. IRX3's role in the neurodevelopment of the mammalian posterior forebrain, including the hypothalamus, may in fact be critical [54], [57], [58]. The earlier evidence in favour of a causal role for FTO also needs revisiting. The fto knock-out mouse targeted exons 2 and 3, extending only ~1 kb into intron 1 [36], and therefore did not remove any of the putative enhancers. If FTO is involved in the phenotype, the observed methylation change could affect its expression by altering gene-body methylation. It could also influence the isoform balance by modifying exon inclusion or exclusion [59].
The loss or gain of CpG dinucleotides over evolutionary time leads to genetically driven variation in DNA methylation and consequently higher variance. This has been proposed as a major driver of evolutionary adaptation as well as of disease susceptibility [60], [61]. The loss of a CpG site through deamination of methylated cytosine can have a considerable influence on regional methylation [17]. It is also an important mechanism in the formation of transcription factor binding sites [62], such as those for p53, which has a role in the regulation of insulin resistance [63]. Conversely, the acquisition of methylation capability through a cluster of in-phase alleles within a regulatory domain could be selected for, if functionally significant.
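The CpG-loss mechanism can be illustrated with a toy sketch. Deamination of 5-methylcytosine yields thymine. If the resulting T:G mismatch is not repaired before replication, a CpG becomes TpG on the affected strand, which reads as CpA on the opposite strand. The function `deaminate_cpg` is hypothetical and models only the final forward-strand sequence, not repair or mutation rates.

```python
# Toy sketch of CpG loss by deamination of 5-methylcytosine. It models only
# the resulting forward-strand sequence, not repair or mutation rates.


def deaminate_cpg(seq: str, pos: int, strand: str = "+") -> str:
    """Return the forward-strand sequence after the methylated C of the CpG
    starting at 0-based `pos` has deaminated to T and been fixed by replication.

    strand="+": the forward-strand C deaminates, so CpG -> TpG.
    strand="-": the reverse-strand C (paired with the forward G at pos + 1)
    deaminates, so the forward strand reads CpA.
    """
    s = seq.upper()
    if s[pos:pos + 2] != "CG":
        raise ValueError(f"no CpG at position {pos}")
    if strand == "+":
        return s[:pos] + "T" + s[pos + 1:]
    if strand == "-":
        return s[:pos + 1] + "A" + s[pos + 2:]
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    site = "AACGTT"                     # hypothetical sequence with one CpG
    print(deaminate_cpg(site, 2, "+"))  # CpG -> TpG on the forward strand
    print(deaminate_cpg(site, 2, "-"))  # CpG -> CpA on the forward strand
```

Either outcome removes the methylatable site, which is why CpG-poor sequence accumulates over evolutionary time.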
Trans-ethnic studies, especially in genetically diverse African-derived populations, can help narrow down the location of a causal variant within regions of strong LD in the initial study population [64]. The SNP rs3751812 was the only SNP to confer significant risk in an African-American study (T allele, as on the CEU susceptibility haplotype, ) [65]. In two African HapMap populations, it lies in an LD block that overlaps the 7.7 kb window.
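The fine-mapping logic rests on pairwise LD. SNPs that are perfectly correlated (r² = 1) in one population cannot be distinguished by association. Lower r² in another population can separate them. A minimal sketch computes r² from phased two-SNP haplotypes using the standard definition D = p_AB − p_A·p_B. The haplotype counts are invented toy data, and the two SNPs are placeholders, not rs3751812 or HapMap genotypes.

```python
# Minimal sketch: pairwise LD (r^2) between two biallelic SNPs from phased
# haplotypes, using D = p_AB - p_A * p_B. Haplotype counts are invented.


def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) pairs, one per
    phased chromosome.
    """
    n = len(haplotypes)
    alleles1 = sorted({h[0] for h in haplotypes})
    alleles2 = sorted({h[1] for h in haplotypes})
    if len(alleles1) != 2 or len(alleles2) != 2:
        raise ValueError("both SNPs must be biallelic in the sample")
    a, b = alleles1[0], alleles2[0]  # r^2 does not depend on which allele is used
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h[0] == a and h[1] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


if __name__ == "__main__":
    # Population 1: the two SNPs always co-occur, so association cannot separate them.
    pop1 = [("T", "C")] * 5 + [("G", "A")] * 5
    # Population 2: recombinant haplotypes weaken the correlation.
    pop2 = [("T", "C")] * 4 + [("G", "A")] * 4 + [("T", "A"), ("G", "C")]
    print(round(ld_r2(pop1), 2), round(ld_r2(pop2), 2))
```

In the toy data, r² drops from 1.0 to 0.36 once recombinant haplotypes appear. That drop is what lets a second population separate SNPs that were interchangeable in the first.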
In conclusion, we have identified variant-CpG-restricted Haplotype-Specific Methylation within the FTO obesity susceptibility locus. Previous SNP association signals were equivalent across this region. They could therefore be consistent with a difference in CpG methylation capability, inherent to this haplotype, being the driving factor. To our knowledge, this is the first identification of an association with a measurable methylation difference within a GWA study SNP association locus. Detailed analysis of the methylation signal and pyrosequencing validation indicate that the genetic phase of CpG-creating SNPs strongly influences this finding. This suggests that LD with CpG-creating SNPs is a relevant consideration in genomic methylation studies. A 7.7 kb region showed the most significant haplotype-specific methylation and overlies putative enhancer sequence. The increased methylation capability we observed at this enhancer region may contribute to reducing the efficiency of this regulatory element [51]. Thus, investigating epigenetic variation may be very useful in narrowing down significant regions within large LD association blocks and in proposing functional hypotheses for follow-up of GWA studies.