Unraveling how common genetic variations contribute to disease development is complicated, because the effect of a genetic variation may be limited to a certain developmental stage and/or cell type, and may depend on additional environmental factors. Our analysis of asthma-associated SNPs demonstrates how this problem can be tackled using epigenetic information, which reveals which genomic regions are active in different cell types.
Previously, Pham and colleagues explored the association between potential enhancers and disease-associated variants extracted from a comprehensive GWAS catalogue. Their primary focus was on promoter-distal regions marked by H3K4me1 and/or H3K27ac. Specifically, the authors validated a novel macrophage-specific enhancer signature encompassing ETS, CEBP, bZIP, EGR, E-Box and NFkB motifs by ChIP-sequencing, which confirmed their associations with epigenetic changes related to differentiation. Another recent study by Maurano et al. examined the distribution of 5,654 noncoding significant associations (5,134 SNPs) for 207 diseases and 447 quantitative traits. By combining these associations with deep genome-scale maps of regulatory DNA marked by DNase I hypersensitive sites (DHSs), their study revealed a collective 40% enrichment of GWAS SNPs in DHSs. For externally replicated noncoding SNPs, 69.8% reside within a DHS. Of GWAS SNPs in DHSs, 93.2% (2,874) overlap a transcription factor recognition site. Common variants associated with specific diseases or trait classes were systematically enriched in the recognition sequences of transcription factors governing physiological processes relevant to the same classes. Our studies are in agreement with these findings, and suggest that the combined analysis of GWAS and epigenetic information predicts which SNPs are more likely to contribute functionally to disease and in which cell types these effects will be noticeable.
With the rapidly increasing amount of epigenetic data for different human tissues in different developmental stages, such analysis will become increasingly powerful.
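The kind of overlap analysis discussed above can be illustrated with a minimal sketch. It counts how many SNPs fall inside a set of active regions (for example H3K4me1 peaks or DHSs) and compares that count with the expectation under a deliberately simple null model in which SNPs are placed uniformly along the genome. The function names, the toy coordinates, and the uniform null are assumptions made for illustration only; they are not the procedure used in our analysis or in the studies cited, which may use a different background such as matched control SNPs.

```python
from bisect import bisect_right
from math import comb


def merge_intervals(intervals):
    """Merge overlapping half-open [start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def in_regions(pos, regions, starts):
    """Return True if pos lies in one of the sorted, non-overlapping regions."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < regions[i][1]


def enrichment(snps, regions_by_chrom, genome_size):
    """Return (hits, fold enrichment, one-sided binomial p-value).

    Assumed null model: each SNP lands in an active region with probability
    equal to the fraction of the genome covered by those regions.
    """
    merged = {c: merge_intervals(iv) for c, iv in regions_by_chrom.items()}
    starts = {c: [s for s, _ in iv] for c, iv in merged.items()}
    covered = sum(e - s for iv in merged.values() for s, e in iv)
    p = covered / genome_size
    hits = sum(
        1
        for chrom, pos in snps
        if chrom in merged and in_regions(pos, merged[chrom], starts[chrom])
    )
    n = len(snps)
    fold = (hits / n) / p
    # Probability of observing at least `hits` overlaps by chance.
    pval = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))
    return hits, fold, pval


# Hypothetical toy data: regions cover 400 of 2000 bases (20%).
toy_regions = {"chr1": [(100, 200), (150, 300), (500, 600)], "chr2": [(0, 100)]}
toy_snps = [("chr1", 120), ("chr1", 299), ("chr1", 300),
            ("chr1", 550), ("chr2", 50), ("chr3", 10)]
hits, fold, pval = enrichment(toy_snps, toy_regions, genome_size=2000)
print(hits, round(fold, 2), round(pval, 5))
```

Running the same calculation separately for the active regions of each cell type would give one enrichment value per cell type, which is the quantity that allows cell types to be ranked by their likely relevance to the disease.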
Significant progress has been made in the ENCyclopedia Of DNA Elements (ENCODE) project since our initial results were submitted for publication. Multiple types of ENCODE data can now be linked with disease-associated SNPs, which could help pinpoint regulatory regions with significant enrichment for functional SNPs. Researchers have employed ENCODE epigenetic data as a guide to unveil regulatory regions in which genetic variants could affect a given complex trait. For instance, Farrell et al. applied ENCODE data to uncover the function of a DNA fragment encompassing a 3-bp deletion polymorphism, which was shown to have enhancer-like activity. The 3-bp deletion polymorphism could possibly represent the most significant functional motif accounting for the HBS1L-MYB intergenic polymorphism associated with the trait of interest, fetal hemoglobin.
Our present analysis is meant more as a proof of concept than an optimized, definitive study; there are multiple ways in which it can be significantly improved. First, the number of published asthma-associated SNPs is continuously increasing, with a gain of 65% from January to October 2012, and additional SNPs will continue to be discovered. Moreover, rather than relying on SNPs that reach statistical significance in whole-genome studies, our approach would be even more powerful if GWAS data were re-analyzed from scratch, limiting the SNPs considered to those in active genomic regions of the cells of interest.
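One reason why such a re-analysis could gain power is that restricting the tested SNPs to active regions reduces the number of tests and therefore relaxes a multiple-testing threshold. The sketch below illustrates this with a Bonferroni correction on invented p-values. The choice of Bonferroni, the function names, and all coordinates and p-values are assumptions for illustration and do not come from any real GWAS.

```python
def in_any(pos, regions):
    """Return True if pos lies in any half-open [start, end) region."""
    return any(start <= pos < end for start, end in regions)


def bonferroni_hits(pvalues, alpha=0.05):
    """Return (threshold, sorted SNPs with p <= alpha / number of tests)."""
    if not pvalues:
        return 0.0, []
    threshold = alpha / len(pvalues)
    return threshold, sorted(snp for snp, p in pvalues.items() if p <= threshold)


def restrict_to_active(pvalues, active_regions):
    """Keep only SNPs, keyed as (chrom, pos), located in active regions."""
    return {
        (chrom, pos): p
        for (chrom, pos), p in pvalues.items()
        if chrom in active_regions and in_any(pos, active_regions[chrom])
    }


# Hypothetical association p-values for ten SNPs.
snp_pvalues = {
    ("chr1", 120): 0.6, ("chr1", 150): 0.01, ("chr1", 250): 0.2,
    ("chr1", 900): 0.004, ("chr1", 1500): 0.3, ("chr2", 40): 0.03,
    ("chr2", 700): 0.5, ("chr2", 800): 0.09, ("chr3", 10): 0.7,
    ("chr3", 20): 0.45,
}
# Hypothetical active regions in the cell type of interest.
active = {"chr1": [(100, 300)], "chr2": [(0, 100)]}

genome_threshold, genome_hits = bonferroni_hits(snp_pvalues)
active_threshold, active_hits = bonferroni_hits(restrict_to_active(snp_pvalues, active))
print(genome_threshold, genome_hits)
print(active_threshold, active_hits)
```

In this toy example the SNP at chr1:150 passes only the relaxed threshold of the restricted analysis, whereas the SNP at chr1:900 is detected only genome-wide because it lies outside the active regions. The restricted analysis therefore trades sensitivity outside active regions for sensitivity inside them.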
Second, the peak-calling algorithm used (MACS) is not optimal for identifying histone modifications; it was designed for identifying much better-defined transcription factor binding sites. Preliminary analysis showed that we obtained a higher enrichment with other algorithms, such as SICER and ZINBA. Moreover, rather than calling peaks, it may be preferable to identify enhancers based on the profiles of H3K4me1 enrichment together with other chromatin marks, as implemented in ChromaSig. The more sophisticated analysis of active genomic regions by the Kellis group, which combined multiple chromatin markers in CD4+ T cells, gave a higher enrichment of disease-associated SNPs than our approach of relying solely on H3K4me1 peaks to identify enhancers. We assume that this is both because other cis-regulatory elements such as suppressors or isolators may well have similar importance to enhancers, and because even for enhancers, H3K4me1 in combination with other markers may provide a more accurate identification. We expect classifications such as those by the Kellis group to become available for multiple cell types in the near future. Also, the chromatin state classifications could be further tailored to our type of analysis by focusing on those states that show the highest correlation with disease-associated SNPs, and identifying the optimal set of chromatin marks that identifies these regions. Finally, we want to reiterate that asthma-associated SNPs are significantly enriched not only in enhancers and promoters, but also in coding and untranslated regions. The transcription of these regions could further depend on both genetic and epigenetic factors.
In conclusion, we have demonstrated a novel approach to GWAS data analysis that integrates epigenomic information to identify SNPs and cell types contributing to disease. We expect our approach to be broadly applicable, and to further enhance the value of the accumulating information from GWAS of disease. Future work needs to experimentally confirm the functional role of the identified SNPs.