|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: CGB CML VKR PD MIM SB GAH. Performed the experiments: CGB SF VKR PA. Analyzed the data: CGB GAW AET PA. Contributed reagents/materials/analysis tools: CGB CML GAW AET ES TAD IP IMM JM RP NN TMF ATH. Wrote the paper: CGB SF SB GAH.
Recent multi-dimensional approaches to the study of complex disease have revealed powerful insights into how genetic and epigenetic factors may underlie their aetiopathogenesis. We examined genotype-epigenotype interactions in the context of Type 2 Diabetes (T2D), focussing on known regions of genomic susceptibility. We assayed DNA methylation in 60 females, stratified according to disease susceptibility haplotype using previously identified association loci. CpG methylation was assessed using methylated DNA immunoprecipitation on a targeted array (MeDIP-chip) and absolute methylation values were estimated using a Bayesian algorithm (BATMAN). Absolute methylation levels were quantified across LD blocks, and we identified increased DNA methylation on the FTO obesity susceptibility haplotype, tagged by the rs8050136 risk allele A (p=9.40×10−4, permutation p=1.0×10−3). Further analysis across the 46 kb LD block using sliding windows localised the most significant difference to be within a 7.7 kb region (p=1.13×10−7). Sequence level analysis, followed by pyrosequencing validation, revealed that the methylation difference was driven by the co-ordinated phase of CpG-creating SNPs across the risk haplotype. This 7.7 kb region of haplotype-specific methylation (HSM), encapsulates a Highly Conserved Non-Coding Element (HCNE) that has previously been validated as a long-range enhancer, supported by the histone H3K4me1 enhancer signature. This study demonstrates that integration of Genome-Wide Association (GWA) SNP and epigenomic DNA methylation data can identify potential novel genotype-epigenotype interactions within disease-associated loci, thus providing a novel route to aid unravelling common complex diseases.
Type 2 diabetes (T2D) and obesity are complex diseases with polygenic susceptibility. Recent genome wide association (GWA) studies have provided significant insight towards gene discovery and, to date, around 30 common variant loci have been associated with T2D , . Despite these important advances, known genetic variants explain <10% of the heritable component of disease, and little is known about their contribution to aetiology. Ongoing studies are attempting to explain this ‘missing heritability’ in complex diseases via the detection of rare and structural variants, interactions between discovery SNPs and causal variants, and identification of associated stable epigenetic modifications .
Environmental determinants of these complex diseases are well-characterised by a wealth of epidemiological literature  but much less is understood about their mechanistic role or interaction with genetic susceptibility. Epigenetic mechanisms regulate gene expression, may be induced via environmental agents and can be inherited through somatic or germline pathways . Genome-wide epigenetic reprogramming occurs during gametogenesis, implicating early fetal development as a period susceptible to this environmental influence. Strong evidence in humans and rodents implicates in utero fetal nutritional deficiency with adult-onset chronic diseases including T2D , . Furthermore, differentially methylated regions (DMRs) of genes of relevance to T2D have been identified in individuals prenatally exposed to famine . Epigenetic variability, set at an extremely early development stage, permeates through all three subsequently developing germ layers , thereby facilitating possible identification in mesoderm-derived blood.
An epigenomic approach may provide insight into the genetic, epigenetic and environmental factors underlying the aetiology of complex diseases. The potential for interaction between these factors is increasingly understood, however the relationship between specific epigenetic modifications and genomic features, whether independent or dependent, is poorly explained in the context of disease and phenotype. In contrast, the role of aberrant epigenetic marks is well established at single gene loci with rare imprinted disorders  and as well in cancer genomes with global hypomethylation and locus hypermethylation . Recent investigations in other complex diseases, such as Systemic Lupus Erythematosus with discordant monozygotic twins  and Type 1 diabetes with imprinted common susceptibility , have been successful. A range of new technology platforms and bioinformatic tools now exist with which to test hypotheses around these putative interactions. The developmental and environmental origins of T2D suggest a potential role of programmed and/or stochastic epigenetic processes in determining disease risk that could arise during early embryonic development and are identifiable in a range of tissues. We hypothesize that genotype-epigenotype interactions may underlie the aetiopathogenesis of T2D, as a complex disease, and use an integrated platform with which to investigate their occurrence. We applied methylated DNA immunoprecipitation (MeDIP) to fragmented genomic DNA sampled from participants with and without T2D-risk genotypes. Enriched fragments were hybridised against total input DNA on a Nimblegen 385k array, encompassing known regions of genetic susceptibility to T2D (regions identified from previous locus and genome-wide association analyses; monogenic T2D and obesity genes and related disorders (as of 2008) (see Supplementary Table S1 for full list). With each array tiling >21 Mb, a total of >1.2 Gb of sequence was subsequently statistically assessed for DNA methylation status. By combining genotype with methylation status, an investigation for allele-  or haplotype-specific effects and hepitypes , ,  could be made to understand the potential sources of methylation variability at these genomic locations.
We determined DNA methylation differences on DNA derived from peripheral whole blood from 60 Caucasians, comprising 30 with Type 2 diabetes (UK Warren 2 T2D case sample) and 30 without T2D (Exeter Family Study of Childhood Health). An all-female set was used to remove any confounding effect of sex.
MeDIP-chip was performed on individual samples to quantify DNA methylation. The relationship between genetic haplotypes and DNA methylation at the genomic regions covered by the microarray was estimated by summating the methylation scores, estimated by the BATMAN algorithm, across each T2D association SNP LD block (as defined by Gabriel et al. ), and calculating an average ‘methylation load’ for all 60 individuals. Individuals then were grouped into three genotypic sets by their respective genotype for each T2D-association SNP (or tagging SNP in perfect LD with association SNP). Kruskal-Wallis and Linear Regression analyses were performed to determine the relationship between genotype and average methylation score. Uncorrected p-values for the Linear Regression analysis are shown in Table 1.
In our integrated epigenomic-genomic analysis, the large 46 kb LD block of the FTO gene (Supplementary Figure S1) was the only locus to reach nominal statistical significance by Kruskal-Wallis analysis (p=0.014). We performed a permutation analysis, in which the genotype assignment and observed methylation scores were shuffled 10,000 times, achieving a significant empirical p-value of 0.012. Subsequent analysis by Linear Regression was highly significant with p=9.40×10−4, and empirical p-value calculation by 10,000 permutations of 1.0×10−3. Age was included as a factor in this later analysis, as methylation changes due to age are significant in certain loci , , , but was excluded as a significant confounder (p=0.676). We also performed linear regression analysis for the other tested loci and did not find any association between age and methylation (data not shown). FTO SNP rs8050136 risk allele A homozygotes were shown to have a higher average level of methylation of 0.531, with heterozygotes at an intermediate level at 0.510 and G homozygotes with 0.497 (Table 2). Of note, although the initial association for FTO with BMI was identified by SNP rs9939609 , numerous subsequent studies have used different SNPs, all of which capture the same common haplotype (HapMap CEU frequency =0.425 (Figure 1)). This LD Block encompasses just under half of the first intron, exon 2 and the beginning of the second intron of the major FTO isoform (Supplementary Figure S1).
To investigate whether this methylation difference was being driven by a distinct region within the 46 kb LD block, and if so, to isolate its location, a sliding windows analysis utilising BATMAN methylation scores (estimated in 100 bp windows) was performed. The sliding windows analysis uses a computational approach to move different window sizes across each LD block, starting with a window size of 1 (100 bp), increasing iteratively by one on each pass, up to the maximum window number of 334 (Supplementary Videos S1 and S2). A central peak of methylation difference was most prominent in the 9 window (900 bp wide, at windows 161–169, chr16:52,378,500-52,379,399) analysis with Kruskal-Wallis p=5.630×10−5, empirical p=<1×10−5 with 10,000 permutations and Linear Regression analysis p=1.94×10−5 (Figure 2). Methylation averages were again highest for the A homozygotes (AA 0.603, AC 0.564, CC 0.507). In addition, plotting the slope of linear regression analysis for the 9 window across the LD block indicated that the p-value peaks all co-locate with the same negative regression slope peaks. From these findings, we infer that all of these regions could be contributing, with varying strength, to the increased methylation signal (Supplementary Figure S2).
The strongest association between methylation and genotype was identified with a larger window size of 60 (7.7 kb, after exclusion of repeat regions, - windows 110–169, chr16:52,371,700–52,379,399) with Kruskal-Wallis p=8.69×10−6, empirical p=<1×10−5 with 10,000 permutations, Linear Regression analysis p=1.33×10−7 (Figure 3) and age p=0.444. This 7.7 kb window becomes most significant just at the point when it encapsulates the 900 bp window at its downstream edge. The same trend in the average methylation scores was again identified (AA 0.529, AC 0.503, CC 0.478, Figure 4). The sliding windows analysis was also performed across the other T2D association LD blocks, but none showed any significant increasing p-values with window sizes greater than 10 (1 kb) (data not shown).
The findings at the FTO LD block, using an integrated epigenomic-genomic analysis, give a strong estimate of the relationship between haplotype and DNA methylation level. Since this approach is calculated by statistically-derived methylation estimates in 100 bp windows, it is unable to define the actual methylation state neither of individual CpG sites nor their relationship to SNPs. We therefore proceeded to investigate the genetic and epigenetic architecture of the 900 bp peak window to explore the exact methylation differences driving this signal.
The central 9 window peak of 900 bp contains only seven CpG sites in the reference sequence, although of the eight common SNPs within the same region, three create or abrogate additional CpG sites (being YpG or CpR SNPs, IUPAC: Y=C or T, R=G or A). These SNPs (rs7206629, rs7202116 and rs7202296) were all included in the AFD-EUR European panel (dbSNP). In this dataset they all possess identical minor allele frequencies (MAFs) implying high likelihood of residing on the same haplotype (Table 3). Only rs7202116 is included in the HapMap CEU data set, and the methylated state allele, G of a CpG, is present on the susceptibility haplotype (Figure 1), and is in absolute and near-perfect LD with the susceptibility SNPs identified (Table 4). Therefore those with the obesity-susceptibility haplotype would possess ten CpG sites capable of methylation, compared with seven, within this peak region.
Methylation differences identified between the susceptibility and non-susceptibility haplotypes were validated and quantified using bisulphite-pyrosequencing on 80 samples, using the initial 60 plus an additional 20 samples from the same studies. Within the 900 bp window, four CpGs were investigated, located at the downstream end of the 7.7 kb peak, and included one of the SNP dependent CpGs (rs7202296). An additional six CpGs that lie directly beneath the broad 7.7 kb peak, including one SNP dependent CpG (rs11075988) were examined. The common SNP distribution within the 7.7 kb region, which includes the 900 bp peak, is detailed in Supplementary Figure S4.
Methylation levels at the cytosine adjacent to rs7202296 (CpR) were, as expected, completely dependent on the presence of its CpG conformation, with average methylation levels of 87% for A homozygotes, 55.7% for AC heterozygotes and 11.5% for C homozygotes, with respect to rs8050136 genotype (Table 5). On examination of the sequence information from the pyrosequencing data, due to non-perfect LD between rs8050136 and rs7202296, one individual homozygous for the A rs8050136 allele was in fact homozygous CpA and another was heterozygote. Excluding these two individuals, the methylation level was 98.6% for those who were truly genetically CpG at rs7202296. The additional cytosines examined within this window did not show differences that could be accounted for by allele-specific methylation (ASM) or the presence of a hepitype, whereby a cis-methylation effect on the surrounding non-polymorphic CpG's methylation state would have been observed . Of note is the ~50% methylation values of the cytosine at 52,379,254, implying that whilst no evidence of ASM, there may be parent-of-origin specific methylation imprinting of this CpG, although FTO is not known to be imprinted . The three in-phase CpG-creating SNPs could easily account on their own for the ~10% difference seen in methylation levels in the 900 bp window peak identified in the initial MeDIP-array. The additional CpGs beneath the peak of Window 60, also showed methylation changes associated with a CpR site at rs11075988. At this SNP, the allele that abrogated the ability to methylate was in LD with the rs8050136 susceptibility genotype A, i.e. it was on the alternate phase. Decreased methylation of this one SNP is therefore vastly outweighed by the rest of the haplotype. The additional six CpGs showed methylation percentages for AA, AC and CC genotypic groups respectively of 77.6, 72.8, 76.7 (16:52377247), 92.7, 92.0, 90.8 (16:52377254), 9.5, 38.4, 69.0 (16:52377271, rs11075988 dependent), 96.7, 98.5, 98.8 (16:52377279), 82.6, 78.4, 81.1 (16:52377282), and 87.4, 85.9, 84.9 (16:52377308).
Finally, we used the same bisulphite-pyrosequencing assays to determine whether methylation differences within the 900 bp window peak were displayed in tissues which show high FTO expression and have a role in central energy balance. We studied 14 healthy brain samples (4 hypothalamus and 10 prefrontal cortex) and replicated the same findings at the CpG-creating SNPs described above. We did not see any allele-specific effects on the surrounding CpG sites.
Examination of the three critical CpG-creating SNPs within the narrow peak (rs7206629, rs7202116 and rs7202296) revealed that the latter two have transitioned from CpA to CpG, gaining the ability to methylate from the ancestral state (identical in Pan Troglodytes, Pongo pygmaeus and Macaca mulatta, Table 6). This mutational event is rarer than the reverse transition by deamination of the methylated cytosine , and identical frequency of both SNPs are seen in three ethnicity panels (Table 3). Analysis of dbSNP data (release 129) indicated that whilst ~60% of 193,133 validated SNPs on chromosome 16, with known ancestral state, occur at a CpG site, only ~12% of them have gained the ability to methylate since their common ancestor and less than 0.5% lie within at least 75 bp of each other, as per rs7202116 and rs7202996. Furthermore, rs7202116 is located within a short 49 bp mammalian evolutionarily constrained element (Chr16:52,379,085–52,379,132) as identified by Genomic Evolutionary Rate Profiling (GERP)  of 31 eutherian mammals.
We analysed the 420 HapMap Phase II phased SNP haplotypes for CEU, YRI and ASN and found that this region comprises 60 unique haplotypes. A Median-Joining Network analysis ,  (Supplementary Figure S3) was performed, including an inferred ancestral haplotype, which indicated that the susceptibility haplotype is closer to, or less mutational events away from, this ancestral haplotype. This finding suggests that the ability to methylate was gained early, is maintained on the susceptibility haplotype, and has been lost in the younger haplotypes present in humans. However no haplotypic evidence of selection was apparent (XP-EHH  or iHS , data not shown), although these tests are more powerful for more recent mutational events.
A ChIP-chip study, using the same T2D array as part of a larger experiment, investigated normal skeletal muscle cells and identified evidence of enhancer activity within the 7.7 kb window, with the classic signature of H3K4me1 and no other strong signal (Supplementary Figure S4). These findings are consistent with those of Ragvin et al. who have recently identified long range enhancers within the first intron of FTO that influence the expression of IRX3, located ~470 kb downstream of this region. . One of the two critical Highly Conserved Noncoding Elements (HCNEs) proposed to affect IRX3 expression by enhancer activity that overlaps with risk variants, designated as element 6 by Ragvin et al. is located at Chr16:52,377,745–52,378,287, and resides centrally within the broad 7.7 kb peak (Figure 3). This 542 bp element is also only 213 bps 5′ of the 900 bp window.
Direct analysis of allele-specific expression of FTO mRNA is not possible as no common coding polymorphisms exist within the LD block, or rest of the gene. Examining the Sanger GENEVAR HapMap CEU expression data identified no significant differences with respect to FTO haplotype status in FTO or IRX3 expression from lymphoblastoid cell lines in children or adults  (data not shown). We also used an available RNA-Seq dataset derived from the cerebellum of six male individuals , but were unable to identify allele-specific expression of FTO (by attempting to utilise any pre-mRNA intronic reads that may be present) or IRX3. This was due lack of informative SNPs in former and low coverage across the latter.
We also compared individuals with and without T2D, as a case-control analysis, across all the assayed regions of the array, (T2D and obesity related phenotype candidates, 1q linkage region and imprinted regions). We did not identify any disease-related differently methylated regions (DMRs) of statistical significance post false discovery rate correction (data not shown). The average of the variance and standard deviation in methylation level for all 60 individuals across the entire MeDIP data set was 0.02 and 0.14, respectively.
Epigenomic factors, including DNA methylation, histone modifications and non-coding RNAs, are an integral link between DNA sequence variation and subsequent transcriptional output modulation. Increased understanding of these elements will be crucial to obtain a coherent functional assessment of the large number of non-coding DNA variants identified from contemporary whole genome sequencing case-control studies, and to delineate their developmental and tissue-specific features and constraints.
This study investigated the DNA methylation state of genomic loci with evidence for involvement with T2D and found an association with the susceptibility locus that resides within the FTO gene. This region was identified in a T2D GWA Study, with a common haplotype located across intron 1, exon 2 and intron 2 captured equivalently by various SNPs, subsequently shown to be mediating its disease susceptibility effect through obesity , , , . Murine models initially lent support to FTO itself being the causal gene in the locus; an obesity protective phenotype was observed in the fto knock-out mouse  and a similar but lesser effect in the hypomorphic fto missense I367F mutant . A loss-of-function mutation in FTO was recognised in humans as an autosomal-recessive lethal syndrome (OMIM #612938) with a phenotype of multiple malformations and severe growth retardation, with non-obese heterozygote relatives . Attempts to identify expression differences in FTO in human skeletal muscle and subcutaneous adipose tissue with respect to risk SNPs have been unsuccessful to date  and no evidence of allele-specific expression in immortalised lymphoblastic cell lines has been established . Subsequently Meyre et al. identified further FTO heterozygous, loss-of-function mutations in obese as well as lean subjects, further clouding FTO's causative role  and illustrating the complexity of interpreting the function of this dioxygenase in energy balance .
Identification of stable epigenetic modifications may aid the exploration of genotype-phenotype interactions in complex diseases. DNA methylation can exert its functional influence through a range of different processes, including direct effects on transcription factor binding, or indirectly via changes to post-translational histone packaging and modulation of chromatin conformation and function . The ability to detect these epigenetic influences will depend on their direct association with genotypic factors and will therefore range from obligatory to stochastic . Thus we have utilised the power of the large scale GWA studies to look for genotype-methylation state associations.
We have identified a methylation association with the strongly replicated disease haplotype of the FTO gene, tagged by SNP rs8050136. Therefore the association identified is with the individuals' genotypes not their particular phenotype status. We confirmed and validated these results at single-base resolution within the contributory signal using bisulphite pyrosequencing. In doing so, we identified that the methylation signal was genetically led by the phase of three CpG-creating SNPs in LD within a narrow 900 bp window peak. We did not find evidence of a cis-methylation or hepitype effect , , . The ~10% change identified in our MeDIP experiment is likely to be an underestimate in this region as BATMAN calculations are based upon the reference genome. Additional work in murine strains also supports the inference that inherited genetic variability is a major determinate on epigenetic variability . Zhang et al. identified ASM with as much as 85% difference between alleles across CpG Islands (CGIs) . However, it is not clear how much of this effect is driven by CpG-creating SNPs or the additional influence on surrounding CpG methylation, other genetic polymorphism effects on the methylation machinery, or a combination of all of these factors. Shoemaker et al. have recently observed ASM in 23–37% heterozygous SNPs in differing human cell lines, with 38–88% of these regions dependent on CpG-SNP variation . We have termed our findings of a genetically-driven difference in methylation ability, detected over kilobases, Haplotype-Specific Methylation (HSM), to differentiate this state from epigenetic ASM where methylation will vary between alleles at individual non-SNP CpGs, or a hepitype where genetic variability combines with ASM within a haplotype. In a similarly designed, but larger study, the FTO HSM region would be identified as a direct T2D-DMR.
We did not find an association between risk and non-risk haplotypes in the other T2D association LD blocks in this integrated analysis, however this does not exclude the possibility of more subtle effects in these regions. Although the limitation of the MeDIP technique is that is does not enable the evaluation of individual cytosines, it does allow more broad-scale haplotype methylation differences to be identified, such as those driven by CpG-SNPs , . These genetic drivers of ASM can be identified in easy accessible tissue, which can then be followed up in the most appropriate disease-related tissue to examine for any surrounding CpG modulation.
Recent work has hypothesized that the lack of evidence for FTO expression modulation by susceptibility SNPs may be due to this region having effects on distal surrounding genes including IRX3  and RBL2 . Ragvin et al. used comparative genomics to identify HCNEs and overlying genomic regulatory blocks, and proposed that enhancers in the first intron susceptibility region exert long range regulatory effects on expression of the developmental transcription factor gene IRX3, Iroquois Homeobox 3 located in a gene desert ~170 kb 3′ of FTO . Enhancers are located predominantly in intergenic or intronic regions and may act as regulators of gene transcription over long distances , have an activating function on chromatin structure , are sensitive to CpG methylation  and have a important role in developmental processes , , . Of two HCNE-containing elements with enhancer effects implicated with a metabolic role, one is located within the 7.7 kb methylation window (chr16:52,371,700–52,379,399). Higher regional methylation of this enhancer, caused through increased methylatable CpG sites, on the susceptibility haplotype may impede its action in terms of enhancer-specific transcription factor recruitment, subsequent chromatin DNA looping, enhancer-promoter interaction and enhanceosome formation  with subsequent down-regulation on IRX3 expression. Additionally this HCNE is just over 200 bp 5′ of the 900 bp window (chr16:52,378,500–52,379,399). Therefore the 900 bp peak is within a 2 kb ‘shore’ region of this enhancer and it may be possible that these ‘Enhancer shores’ act in a similar fashion to ‘CpG Island shores’ (2 kb either side of Islands) and regions of low CpG density, which have been identified with more dynamic DNA methylation effects . Our ChIP-chip data from skeletal muscle indicate a H3K4me1 signature within the 7.7 kb region, as well ChIP-Seq data from cell lines confirms a 5K block of H3K4me1 enrichment completely encapsulated here (http://bioinformatics-renlab.ucsd.edu/enhancer) (Supplementary Figure S4)  and a recent examination of histone modifications in pancreatic islets also identified this enhancer marker 1.2 kb wide over rs8050136 within the region . No evidence of allele-specific expression was identified from different sources; therefore whilst the DNA methylation state of the enhancer-including haplotype may be observed in all tissues, due to being predominately genetically driven due to CpG-SNPs, the possible outcome of effect on expression may only be seen in precise cell types at a precise time and/or environmental-specific manner .
Despite the interesting genomic overlap between the 7.7 kb region of HSM and the HCNEs identified by Ragvin et al., the phylogenetically distant zebrafish knock-down of the orthologous irx3a has reduced pancreatic β insulin- and α glucagon-secreting cells and increased ghrelin-producing ε cells. The role of IRX3 in pancreatic development is in conflict with the knowledge that most obese individuals display an increase in pancreatic beta cell mass as a compensatory response to the peripheral insulin resistance that co-exists  and the knowledge that most previously-identified obesity genes are involved in neuronally-mediated central energy balance . However the evidence of functional enhancer capability of this conserved non-coding region is the crucial finding, as its downstream target may have changed or evolved to take on a more complex role over time. It is possible that IRX3's role in neurodevelopment of the posterior forebrain in mammals, including the hypothalamus, may in fact be critical , , . Redressing the previous evidence in favour of FTO causative role, the fto knock-out mouse targeted exon 2 and 3, only ~1 kb into intron 1 , therefore did not remove any of the putative enhancers. If FTO is involved in the phenotype, the observed methylation change could affect expression by changing gene-body methylation or influencing the isoform balance by modifying exon inclusion or exclusion .
Loss or gain of CpG dinucleotides over evolutionary time leading to a genetically-driven variation in DNA methylation and subsequent higher variance has been proposed to be a major driver in evolutionary adaption as well as disease susceptibility , . The loss of a CpG site, by deamination of methylated cytosine, can not only can have considerable influence on regional methylation , but is also an important mechanism in the formation of transcription factor binding sites , such as for p53 that has a role in regulation of insulin resistance . Acquiring the ability to methylate by a cluster of in-phase alleles within a regulatory domain could also be selected for, if functionally significant.
Trans-ethnic studies, especially in genetically diverse African-derived populations, can be informative in narrowing down the location of a causative variant in regions of strong LD in the initial study population . The SNP rs3751812 was the only to confer significant risk (T allele as on CEU susceptibility haplotype, Figure 1) in an African-American study  and is in a LD block in two African HapMap populations overlapping the 7.7 kb window.
In conclusion, we have identified variant-CpG restricted Haplotype-Specific Methylation within the FTO obesity susceptibility locus. Previous association SNP findings were equivalent across this region and therefore could be consistent with a difference in CpG methylation capability being the driving factor that is inherent to this haplotype. To our knowledge, this is the first identification of any association with a measureable methylation difference within a GWA Study SNP association locus. Detailed analysis of the methylation signal and pyrosequencing validation indicate the genetic phase of CpG-creating SNPs are a strong influence in this finding, indicating LD with CpG-creating SNPs as a relevant consideration in genomic methylation studies. A region of 7.7 kb drove the most significant haplotype-specific methylation, and overlies a region containing putative enhancer sequence. Our observation of increased methylation ability at this enhancer region may contribute towards reducing the efficiency of this regulatory element . Thus the investigation of epigenetic variation may be very useful in narrowing down significant regions in large LD association blocks and proposing functional hypotheses for subsequent follow-up from GWA studies.
Whole blood genomic DNA from Caucasian individuals living in the UK recruited to two pre-existing population cohorts (coordinated by the same research centre) was used for the experimental analyses. Forty UK Warren 2 T2D female participants were selected from trios and had a diagnosis of Type 2 diabetes made by either current prescription of a diabetes-specific medication or laboratory evidence of hyperglycemia (World Health Organization definition) . Forty female participants without diabetes were selected from the Exeter Family Study of Childhood Health and had normal fasting glucose and/or HbA1c levels. The mean age of participants was 37 years (standard deviation 6.6) and mean body mass index 29.4 kg/m2 (standard deviation 8.1). Four hypothalamus and ten prefrontal cortex samples were selected from ‘normal brain’ post-mortem tissues from the MRC London Brainbank for Neurodegenerative Diseases at the Institute of Psychiatry in London, and DNA was extracted using standard phenol: chloroform techniques. Written informed consent was obtained from all participants. Ethics approval was covered by the Southwest Multicentre Research Ethics Committee MREC/00/6/55 (UK Warren 2 T2D samples) and Royal Devon and Exeter Healthcare NHS Trust Study 1104 (Exeter Family Study). Post-mortem brain tissue subjects are approached in life for written consent for brain banking, and all tissue donations are consented, collected and stored following legal and ethical guidelines (NHS reference number 08/MRE09/38). The HTA license number for the brainbank is 12293.
The Nimblegen© array comprised of 387,835 50-75mer Probes divided into 122 regions covering 37,037,978 bases of the genome. (080314_HG18_PA_Tiling). This was targeted at all then known T2D Association Loci (20 regions: +/− 60 kb around gene or LD block around association SNP in non-genic regions), Monogenic Obesity & Diabetes (13 genes), the T2D Chromosome 1q linkage region (1q21-24: 148.4–171.3 Mb) comprising 22.8 Mb, identified in multi-continental populations including European, East-Asian, Native-American, and African-American ), all Known Imprinted Genes & Imprinting Control Regions from the Imprinted Gene and Parent-Of-Origin Effect Database (www.otago.ac.nz/IGC) gene or control region +/− 5000 bp numbering 82 loci  which includes confirmed human, as well as regions syntenic to mouse imprinted loci which have not yet been confirmed in human. Finally, a miscellaneous group of nine loci including Coronary Artery Disease and Stature, hyperuricaemia association regions and the PPARGCA1 gene  were included. For full list see Supplementary Table S1. All coordinates are for genome build NCBI Hs36.1/HG18.
The protocol was modified from Weber et al.  as previously described. 1.7 µg DNA is sonicated to produce 300-700 bp fragments. 500 ng DNA is set aside as the ‘input’ control and the remaining 1.2 µg is denatured to become the ‘MeDIP’ fraction. Each denatured DNA sample was incubated with 5MeC-mAb and added to Dynabeads® (Invitrogen). ‘MeDIP’ and ‘input’ DNA are then amplified using the Sigma-Aldrich GenomePlex® Whole Genome Amplification kit. Quality control for MeDIP enrichment efficiency is performed by Q-PCR comparison of two differentially methylated regions of the genome (selected from the Human Epigenome Project primer sequences; methylated locus ID: 6583, unmethylated: 8804; available from www.epigenome.org) with a minimum of a 3 cycle lag for the input compared to the MeDIP fraction for the methylated locus.
Intensity and Spatial Plots were drawn. Lowess Normalisation was used to smooth out data variation arising from non-linear responses during labelling, hybridisation, or scanning to the two different dye colours. Quantile Normalisation was performed to correct for variation in probe level intensities between arrays. The normalised and raw data are available from GEO (Gene Expression Omnibus, NCBI) under the accession number GSE20553.
Array IP/intensity is assumed to be proportional to the density of methylated CpGs. A Bayesian estimation of methylation within fragments of varied CpG density is made using BATMAN (described in detail in Down et al.), with the data output resulting in 100 bp window methylation scores . The algorithm was run individually on each sample.
BATMAN reported methylation values for 100 bp regions, called “proxies”, within larger regions of interest (ROI) varying from 500 to 4000 bp in length. We aimed to call DMRs at the length scale of ROIs to confer with normal methylation consistency that occurs over 500–1000 bp. However, we also observed that some proxies within ROIs exhibited significant variation in methylation. We therefore sought an algorithm that would call DMRs at the length of ROIs, but would use the methylation values of the individual proxies within a ROI rather than an average across all proxies. We adapted an empirical Bayesian method , where the case/control status of a sample s (Ps=1 for case, Ps=0 for control) is modelled as
where h is the logit-link function (logistic regression), nprox is the number of proxies in the ROI, and mps is the methylation value for proxy p in sample s. Because the number of samples is not large (~number of proxies) we simplify the problem by assuming that all regression coefficients are drawn from an underlying distribution with mean zero and finite variance σ2. The null hypothesis
then becomes Ho: σ2=0 . The Bayesian algorithm reports p-values for rejecting the null-hypothesis for each ROI. To correct for multiple testing, we estimated the False Discovery Rate (FDR) using two approaches: the q-value approach  and a permutation-based approach whereby sample labels were permuted a large number of times (>1000).
Linkage Disequilibrium blocks around genotyped susceptibility genes were defined as per Gabriel et al. , as implemented in HAPLOVIEW v.4.1 . At each block, subjects were grouped by genotype without reference to case or control status, for each T2D susceptibility SNP. Average methylation score was calculated per block by summating scores for all BATMAN windows within in it and dividing by the number of windows. Non-parametric (Kruskal-Wallis) and parametric (Linear Regression) tests generated p value statistics for the mean methylation score with respect to genotype status. Permutation empirical p-values were calculated by retaining observed methylation scores and shuffling genotype assignment 10,000 times.
A sliding windows analysis was performed across these LD blocks using 100 bp BATMAN methylation output windows, similar to that used in the context of contiguous SNP haplotypes . Starting with a window size of one and moving one window along per calculation across the entire block, Kruskal–Wallis and Linear Regression analyses were performed for the genotype groups with respect to methylation scores. Window size was increased by one on each pass and the analysis repeated, until the window size equalled the entire LD block. The resulting p-values were outputted and plotted at the midway point for each window. Detection of both non-linear and a biologically plausible linear association, between genotype and methylation status provides robust support to the significant relationship at the FTO LD block (Supplementary Video S1 and S2). Sliding windows analysis, statistical calculation, and permutation scripts were written with the R package .
Primers were designed using the Biotage PSQ assay design software 1.06 (© Biotage). Primer sequences and PCR reaction details are available from the authors on request. The reaction was performed on the Biotage PSQ HS96 instrument, as per manufactures instructions.
All genome co-ordinates are given for NCBI Build Hs36.1/UCSC hg18. Scripts for analysis and mining of non-ancestral methylation creating SNPs from dbSNP were written in Perl.
Analysis was performed on phased SNP data from HapMap Phase II SNP (420 phased haplotypes from the CEU, YRI & ASN populations), plus the addition of the inferred ancestral haplotype from ENSEMBL SNP data/dbSNP, for the FTO susceptibility LD Block with the NETWORK 126.96.36.199 software  (© 2008 Fluxus Technology Ltd). Modelling the hypermutability of CpG transition between 2x to 15x baseline rate did not change tree pattern, only shortened intervening branches with increasing rate (Supplementary Figure S3 is shown for 15x) , .
RNA-Seq data generated by an Illumina Genome Analyser from RNA derived from <24 hr post-mortem Cerebellum tissue samples of six anonymous unrelated donor males was available from the paper of Wang et al. . These data were aligned using TopHat 1.0, which incorporates the Bowtie aligner and additionally generates splice junction reads . SAM output files were visualised with the Integrative Genomics Viewer (version 1.4.01, http://www.broadinstitute.org/igv).
Chromatin Immunoprecipitation for H3K4me1, H3K4me2, H3K4me3, CTCF, H3K9me1 and H3K9me2 was performed on the same T2D designed arrays in normal human skeletal muscle cells, as part of a larger experiment utilising a standard protocol . ChIP Peaks were located by MPeak . Duplicates were performed for all, with 90–95% agreement between replicates for all antibodies.
FTO HapMap CEU Linkage Disequilibrium. Location of FTO Association LD block indicated by Red Rectangle, as visualised in HAPLOVIEW with LD block as defined by Gabriel et al. .
(8.73 MB TIF)
Linear Regression Slope viewed with Linear Regression p-values. Plot of Negative Slope of Linear Regression for the 9 window across the LD block (below) indicating Negative Slope at regions of p-value peaks (above).
(0.43 MB TIF)
Median-Joining Network of FTO susceptibility region haplotypes. Evolutionary relations of the 60 distinct haplotypes from 420 phased haplotypes from HapMap phased haplotypes (CEU, YRI and ASN) plus the Ancestral haplotype. Blue (methylation capable haplotypes within the 900 bp narrow peak with possession of the rs7202116 G allele) and Yellow (non-methylation capable) circles are proportional in size to the number of copies of that haplotype. Lines joining haplotypes are proportional to the number of mutational events separating them. Red nodes are unseen haplotypes, within this sampled set, that are inferred by the MJ algorithm . The thick blue outlined circle represents the haplotype identical to entire CEU susceptibility haplotype indicated in Figure 1 (made up of 49 CEU, 6 YRI & 9 ASN haplotypes).
(8.48 MB TIF)
Enhancer Prediction in 7.7 kb Broad Peak. Enhancer prediction within the Haplotype-Specific Methylation broad peak window of 7.7 kb. Enhancer prediction from H3K4me1 Chip-Seq from Heintzmans et al. in red (http://bioinformatics-renlab.ucsd.edu/enhancer) . Location of Ragvin et al. predicted enhancer (black with red border) and the H3K4me1 data from skeletal muscle indicated in the H3K4me1_SO2 row. Blue rectangle indicates 900 bp differential methylation region window that lies in the shore region of the Ragvin et al. enhancer. CpG creating SNPs in the 900 bp window are indicated with red rectangles. Highest Vertebrate PhastCons Conserved Elements LOD score  is underlined in red.
(6.64 MB TIF)
(0.05 MB XLS)
Sliding windows for Kruskal-Wallis analyses across FTO LD block.
(1.05 MB MOV)
Sliding windows for Linear Regression analyses across FTO LD block.
(0.72 MB MOV)
Competing Interests: The authors have declared that no competing interests exist.
Funding: The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. CGB and SB were funded by the Wellcome Trust (084071). AET was supported by a Heller Research Fellowship. CML is funded by WT086596/Z/08/Z (Wellcome Trust Research Career Development Fellowship). SF was funded by the Medical Research Council and Barts and the London Charity. Additional funding was from NIDDK (DK-073490).