Recent studies of human genetics, such as the International HapMap Project (1) and the 1000 Genomes Project (2), have identified a large number of genetic variants in the human genome. Furthermore, genome-wide association studies (GWAS) (3) and exome sequencing (4) are extensively used to investigate, on a genome-wide scale, the relationship between genetic variants and human diseases/traits. An examination of the genomic locations of GWAS-associated variants shows that a large proportion (~88%) fall outside coding regions, and such variants are harder to interpret than protein-coding variants (6). Therefore, elucidating the molecular function of genetic variants located in non-coding regions is critical to a full understanding of genetic disorders.
However, achieving this goal poses many practical and computational challenges (7). A major difficulty is that the role of non-coding variants in the processes underlying disease/trait association remains unclear. These variants could affect many biological activities, including transcription, splicing, post-transcriptional regulation, translation initiation/elongation and post-translational modification (8). Previously, conservation information was frequently used to prioritize the functional importance of non-coding genetic variation (9). At the level of transcriptional regulation, mutations in promoter regions may affect the recruitment of RNA polymerase and other regulators, especially the binding of transcription factors (TFs) to the promoter that is needed to initiate gene transcription. Tools such as is-rSNP (11), sTRAP (12) and regSNPs (13) have been successfully developed to evaluate changes in binding affinity caused by genetic variation. However, although algorithms that rely solely on TF motifs are effective at finding regulatory elements in proximal promoter regions, they inevitably introduce a large number of false positives in distal promoter/enhancer regions, where the search space becomes substantially larger. A growing number of studies have shown that mutations within distal regulatory elements, such as enhancers, insulators and silencers, can also disrupt or alter TF binding, nucleosome positioning signals and chromatin states. Furthermore, locally altered chromosome conformation can block or create looping interactions between distal elements and promoter regions (14) and thereby influence gene regulation. Unfortunately, few tools or resources have used such information to study genetic variants.
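Motif-based tools of the kind cited above generally ask how well the reference and alternative alleles fit a TF position weight matrix. The sketch below illustrates that general idea only; it is not the scoring scheme of is-rSNP, sTRAP or regSNPs. It assumes a hypothetical 4-bp frequency matrix (not a real TF motif), a uniform nucleotide background, and scoring of the forward strand only, keeping the best log-odds window that overlaps the variant.

```python
import math

# Hypothetical position frequency matrix (one dict per motif position).
# These numbers are illustrative only and do not describe any real TF.
PFM = [
    {"A": 0.80, "C": 0.05, "G": 0.10, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.10, "C": 0.05, "G": 0.80, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
BACKGROUND = 0.25  # assumed uniform background frequency for each base


def window_score(window):
    """Log2-odds score of one motif-length window against the PFM."""
    return sum(math.log2(PFM[i][base] / BACKGROUND) for i, base in enumerate(window))


def best_overlapping_score(seq, pos):
    """Highest score among forward-strand windows that contain position `pos`."""
    width = len(PFM)
    start_min = max(0, pos - width + 1)
    start_max = min(pos, len(seq) - width)
    return max(window_score(seq[s:s + width]) for s in range(start_min, start_max + 1))


def allele_score_change(left_flank, ref, alt, right_flank):
    """Best motif score with the alt allele minus best score with the ref allele.

    Negative values suggest the alternative allele weakens the motif match.
    """
    pos = len(left_flank)  # 0-based index of the single-nucleotide variant
    ref_score = best_overlapping_score(left_flank + ref + right_flank, pos)
    alt_score = best_overlapping_score(left_flank + alt + right_flank, pos)
    return alt_score - ref_score


if __name__ == "__main__":
    # C>T inside the consensus ACGT lowers the best overlapping score.
    print(allele_score_change("TTA", "C", "T", "GTAA"))
```

Real tools also scan the reverse complement, use curated motif libraries and convert raw score differences into significance estimates; those steps are omitted here for brevity.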
The Encyclopedia of DNA Elements (ENCODE) project has produced a comprehensive map of functional elements and active chromatin marks using advanced techniques such as ChIP-seq, DNase-seq, bisulfite sequencing and chromosome conformation capture (15). Recent studies have shown that disease-associated single-nucleotide polymorphisms (SNPs) detected by GWAS are significantly enriched in regions that harbor functional elements, such as transcription factor binding sites (TFBSs), histone modification-marked regions, DNase I hypersensitive sites (DHSs) and expression quantitative trait loci (16–19). Two recently published databases, HaploReg (20) and RegulomeDB (21), have used these regulatory signals and marks to annotate genetic variants and offer comprehensive resources on regulatory variation. Moreover, different functional elements have been reported to act in a tissue- or cell type-specific manner. SNPs associated with the same trait tend to lie within active chromatin marks of the same or related cell types (22), suggesting that regulatory signals may be detected using the chromatin marks of phenotypically relevant cell types. Computational algorithms including ChromHMM (23) and Segway (24) have been successfully applied to annotate different functional elements across the genome. Therefore, an integrated analysis of GWAS data and cell type-specific functional elements is needed to capture the regulatory variants underlying a particular disease/trait.
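A basic step in such an integrated analysis is intersecting variant positions with the regulatory regions annotated for one cell type. The sketch below is a minimal illustration under stated assumptions: BED-style 0-based half-open coordinates, no overlapping regions within a chromosome, and made-up coordinates and labels. Production pipelines usually rely on dedicated interval tools rather than hand-written lookups.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical regulatory regions for a single cell type as
# (chrom, start, end, label); coordinates are invented for illustration.
REGIONS = [
    ("chr1", 1000, 1500, "enhancer"),
    ("chr1", 3000, 3200, "promoter"),
    ("chr2", 500, 900, "insulator"),
]


def build_index(regions):
    """Group regions by chromosome and sort them by start coordinate.

    Assumes regions on the same chromosome do not overlap.
    """
    index = defaultdict(list)
    for chrom, start, end, label in regions:
        index[chrom].append((start, end, label))
    for chrom in index:
        index[chrom].sort()
    starts = {chrom: [s for s, _, _ in ivs] for chrom, ivs in index.items()}
    return index, starts


def annotate(variants, index, starts):
    """Return the region label containing each (chrom, 0-based pos), or None."""
    labels = []
    for chrom, pos in variants:
        ivs = index.get(chrom, [])
        # Rightmost region whose start is <= pos; it is the only candidate.
        i = bisect_right(starts.get(chrom, []), pos) - 1
        if i >= 0 and pos < ivs[i][1]:
            labels.append(ivs[i][2])
        else:
            labels.append(None)
    return labels


if __name__ == "__main__":
    index, starts = build_index(REGIONS)
    print(annotate([("chr1", 1200), ("chr1", 1500), ("chr2", 500)], index, starts))
```

Running the same lookup against region sets from several cell types would give a per-cell-type annotation profile for each variant.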
Here, we present GWAS3D (http://jjwanglab.org/gwas3d), a web server that systematically analyzes the probability that genetic variants affect regulatory pathways and underlie disease/trait associations by integrating chromatin states, functional genomics data, sequence motifs and cross-species conservation for a set of GWAS data or a variant list. We first collected and curated genome-wide chromosome interaction (5C, Hi-C, ChIA-PET) data, enhancer/insulator/promoter marks [H3K4me1, H3K27ac, p300, CCCTC-binding factor (CTCF), DHS] and ChromHMM-predicted functional elements in 16 different cell types. We then mapped genetic variants on the reference genome to these regulatory regions and evaluated changes in the binding affinity of regulatory factors by scanning 73 ENCODE motifs. Finally, we combined the original GWAS signal, risk haplotype, binding affinity significance and conservation information to prioritize the genetic variants. In addition, the system provides comprehensive annotation and visualization to help users interpret the results. Compared with existing software and databases, GWAS3D uses the latest information to provide a one-stop web-based tool with which clinicians and biologists can evaluate the deleteriousness of disease/trait-associated variants that affect transcriptional regulation across a broader spectrum, especially non-coding genetic variants.
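This paragraph does not say how the GWAS signal, risk haplotype, binding affinity and conservation evidence are combined. As a hedged illustration of one common way to merge several p-values into a single prioritization score, the sketch below applies Fisher's method. Treating these evidence sources as independent p-values is an assumption made for the example, and this is not presented as the GWAS3D scoring scheme.

```python
import math


def fisher_combine(p_values):
    """Combine k independent p-values with Fisher's method.

    Under the null, -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom. For even degrees of freedom the survival function
    has the closed form exp(-y) * sum_{i<k} y**i / i!, where y is half of
    the statistic, so no statistics library is needed.
    """
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    y = -sum(math.log(p) for p in p_values)  # half of Fisher's statistic
    term, total = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= y / i  # builds y**i / i! incrementally
        total += term
    return min(1.0, math.exp(-y) * total)


if __name__ == "__main__":
    # Hypothetical evidence for one variant: GWAS, motif change, conservation.
    print(fisher_combine([0.001, 0.04, 0.2]))
```

Fisher's method is anticonservative when its inputs are positively correlated, as signals from the same locus often are, so a real prioritization scheme would need to account for that dependence or use a different combination rule.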