While DNA sequence variants such as null alleles can produce extremes of gene expression that may be deleterious, population-based studies of healthy individuals have reported high levels of heritable inter-individual variation in gene expression, and studies have mapped genetic variation contributing to gene expression levels in a number of different cell types [8]. Collectively, studies that map genetic variation contributing to transcriptional variation are referred to as expression quantitative trait locus (eQTL) mapping studies. Their general design consists of genotyping subjects genome-wide and capturing a transcriptome-wide mRNA profile using microarrays or, more recently, high-throughput RNA sequencing. As in disease gene discovery studies, imputation using a reference map of human genome variation is used to interrogate variants that are not directly genotyped but are in LD with a genotyped marker. Thus, almost every common marker (i.e., alleles with a minor allele frequency > 5%) evaluated in GWAS will have been evaluated, directly or indirectly, in an eQTL study. An eQTL analysis itself applies regression-based or non-parametric models to test millions of genetic variants for regulatory effects on the expression of nearby and distant genes. In a whole-genome eQTL analysis, many millions of tests are performed (N = number of genetic variants × number of genes or transcripts), requiring strict statistical thresholds for significance. While necessary in a genome-wide analysis, such strict thresholds can obscure many biologically meaningful effects of genetic variation on mRNA expression. Approaches to appropriately constrain such analyses have therefore evolved: based on our understanding of the architecture of mammalian genes, the sequences that regulate the expression of a given gene are most likely to be found near that gene, and genetic variation in them is most likely to influence its expression. Such "cis-eQTL" analyses focus on the effects of genetic variants on the expression of genes in their vicinity and have empirically been shown to be well powered to detect regulatory effects that replicate [15]. For a given tissue and cohort, these analyses provide a list of genetic variants associated with a given gene's or transcript's expression levels, the allelic direction of each association, and the magnitude of the effect, often quantified as the mean difference in expression between individuals homozygous for one allele and those homozygous for the other. As such, they provide an annotation of functional regulatory variation in the human genome.
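To make the testing step concrete, here is a minimal Python sketch of a single cis-eQTL test under an additive model. It is an illustration, not the procedure of any particular study. Expression is regressed on allele dosage (0, 1 or 2 copies of the alternate allele) by ordinary least squares. The slope is the per-allele effect, so the difference between the two homozygote groups is roughly twice the slope. The p-value uses a normal approximation to the t statistic, which is only reasonable for larger samples. The sketch also enumerates cis variant-gene pairs within an assumed 1 Mb window of each gene's transcription start site (TSS), and it compares Bonferroni thresholds for hypothetical genome-wide and cis-only study sizes. All function names, the window size, the positions and the variant and gene counts are illustrative assumptions.

```python
import math

def eqtl_test(dosages, expression):
    """OLS regression of expression on allele dosage (additive model).

    Returns (slope, standard error, two-sided p-value). The p-value uses a
    normal approximation to the t statistic, so it is only indicative for small n.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx                                   # per-allele change in expression
    intercept = mean_e - slope * mean_g
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(slope / se) / math.sqrt(2))       # two-sided normal tail probability
    return slope, se, p

def homozygote_difference(dosages, expression):
    """Mean expression in alternate-allele homozygotes minus reference homozygotes."""
    hom_alt = [e for g, e in zip(dosages, expression) if g == 2]
    hom_ref = [e for g, e in zip(dosages, expression) if g == 0]
    return sum(hom_alt) / len(hom_alt) - sum(hom_ref) / len(hom_ref)

def cis_pairs(variants, genes, window=1_000_000):
    """Yield (variant, gene) pairs on the same chromosome within `window` bp of the TSS."""
    for vid, (vchrom, vpos) in variants.items():
        for gid, (gchrom, tss) in genes.items():
            if vchrom == gchrom and abs(vpos - tss) <= window:
                yield vid, gid

def bonferroni(n_tests, alpha=0.05):
    """Per-test threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests

# Toy data (not from any real study): 9 individuals, expression rising with dosage.
dosages = [0, 0, 0, 1, 1, 1, 2, 2, 2]
expr = [1.0, 1.2, 0.9, 1.9, 2.1, 2.0, 3.1, 2.9, 3.0]
slope, se, p = eqtl_test(dosages, expr)

# Hypothetical positions: only rs_a lies within 1 Mb of GENE_A's TSS.
variants = {"rs_a": ("chr20", 44_700_000), "rs_b": ("chr20", 47_000_000),
            "rs_c": ("chr1", 44_700_000)}
genes = {"GENE_A": ("chr20", 45_000_000)}

# Hypothetical study sizes: 10 million variants, 20,000 genes, and an assumed
# average of 5,000 variants inside each gene's cis window.
genome_wide_tests = 10_000_000 * 20_000
cis_tests = 20_000 * 5_000
print(bonferroni(genome_wide_tests), bonferroni(cis_tests))  # ~2.5e-13 vs ~5e-10
```

Restricting tests to the cis window shrinks N by orders of magnitude in this toy arithmetic. That relaxes the per-test threshold, which is the motivation for cis-eQTL analyses given in the text.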
When GWAS variants and eQTL variants co-localize to the same genomic region, this generates a testable hypothesis that a given genetic variant influences trait variance through its effect on the expression of a given gene. The figure shows an example of an eQTL that co-localizes with Multiple Sclerosis (MS)-associated variants: a region of chromosome 20 where a genetic variant contributing to MS susceptibility has been localized (lower panel). This susceptibility locus encompasses multiple genes; however, the MS-associated variant and surrogate markers in LD with it are also associated with altered expression of one of the genes in the region, CD40, in peripheral blood mononuclear cells of MS patients. This result suggests that MS susceptibility may be due, in part, to altered CD40 expression levels influenced by genetic variants near the gene. The result is by no means conclusive, but it does suggest testable hypotheses to pursue in future studies.
Colocalization of cis-expression quantitative trait loci and Multiple Sclerosis GWAS signals in the CD40 locus
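A first-pass check for this kind of co-localization is to ask whether the GWAS lead SNP, or any SNP in strong LD with it, is a reported eQTL. The Python sketch below does this with toy genotype dosages, using the squared Pearson correlation (r²) between dosage vectors as the LD measure. The r² ≥ 0.8 proxy threshold, the SNP and gene names and the input layout are hypothetical choices for illustration. As the text stresses, such an overlap only motivates a hypothesis. It does not show that the two signals share a causal variant.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype-dosage vectors (an LD measure)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (v1 * v2)

def ld_proxies(lead, genotypes, threshold=0.8):
    """SNPs (including the lead SNP itself) with r^2 >= threshold to the lead SNP."""
    return {snp for snp, g in genotypes.items()
            if r_squared(genotypes[lead], g) >= threshold}

def eqtl_overlap(lead, genotypes, eqtl_genes, threshold=0.8):
    """Map each gene to the lead SNP or LD proxies reported as eQTLs for that gene."""
    hits = {}
    for snp in sorted(ld_proxies(lead, genotypes, threshold)):
        for gene in eqtl_genes.get(snp, []):
            hits.setdefault(gene, []).append(snp)
    return hits

# Toy genotypes for 8 individuals (hypothetical).
genotypes = {
    "lead": [0, 1, 2, 0, 1, 2, 0, 1],
    "snpB": [0, 1, 2, 0, 1, 2, 0, 2],   # r^2 with lead is about 0.85
    "snpC": [1, 1, 1, 1, 1, 1, 1, 0],   # nearly uncorrelated with lead
}
eqtl_genes = {"snpB": ["GENE_2"], "snpC": ["GENE_1"]}  # hypothetical eQTL lookup
print(eqtl_overlap("lead", genotypes, eqtl_genes))  # {'GENE_2': ['snpB']}
```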
To date, the majority of eQTL studies have been performed in healthy subjects, and they reveal a substantial amount of functional regulatory variation in the human genome. The degree to which the detected associations are population- or cell-type-specific, or observable only under certain conditions, is an important consideration when integrating them with GWAS associations. Clearly, some genes exhibit highly cell-type-, tissue-, and context-specific expression patterns [16], and the extent to which eQTL patterns are shared across cell types or tissues is still being quantified. Most eQTL studies to date have been of modest size, which limits the assessment of tissue overlap: with poor statistical power, failure to detect an eQTL in a second tissue cannot be taken as evidence that it is absent. Published studies have estimated that only 0.4%–0.5% of genes have a significant cis-eQTL in two or more tissues, and that approximately 70–80% of eQTLs may be cell-type specific [8]. While these modestly sized studies may overestimate the true proportion of cell-type-specific eQTLs, it is clear that, in many situations, the cell or tissue type in which mRNA is profiled will strongly affect the presence and magnitude of a variant's effect on gene expression. This matters because it determines the relevance of existing eQTL datasets to disease studies and informs the design of new studies, which must balance the competing options of accessing a specific cell type of interest (which may be technically and practically challenging) or using a suitable surrogate cell population or cell mixture. To date, the majority of human eQTL studies have been conducted using cell lines, e.g., lymphoblastoid cell lines (LCLs) derived from B lymphocytes transformed with Epstein-Barr virus [13], but studies are increasingly being conducted in primary cells or tissues, including blood [3], monocytes [9], primary B cells [9], liver [11], adipose tissue [3], skeletal muscle [31], brain [32], T cells [8], and fibroblasts [8]. Recognizing the utility of eQTL analysis and the tissue specificity of eQTL associations, the U.S. National Institutes of Health has funded an unprecedented multi-tissue eQTL study of human gene expression (http://commonfund.nih.gov/GTEx/). The Genotype-Tissue Expression (GTEx) program will publish summarized eQTL analysis results from many human tissues and will catalyze the development of methods to make the best use of this wealth of data.
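The power problem behind the tissue-sharing caveat can be made concrete. The Python sketch below uses a normal approximation to the power of a single-variant regression test, with non-centrality n·r²/(1 − r²), where r² is the fraction of expression variance the variant explains. The 5% variance explained, the significance level of 1e-5 and the sample sizes are hypothetical values chosen for illustration. The sketch also shows why limited power tends to inflate apparent tissue specificity. Suppose an eQTL is truly present in two tissues and each study detects it with 50% power (assumed independent between tissues). It is then seen in both tissues only a quarter of the time, and in exactly one tissue half of the time.

```python
import math
from statistics import NormalDist

def eqtl_power(n, r2, alpha):
    """Approximate power to detect an eQTL explaining a fraction r2 of expression variance.

    Normal approximation: the test statistic is taken to be N(sqrt(ncp), 1)
    with non-centrality ncp = n * r2 / (1 - r2).
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)           # two-sided critical value
    shift = math.sqrt(n * r2 / (1 - r2))
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)

def detection_pattern(power_a, power_b):
    """For an eQTL truly present in two tissues (independent studies), return
    (P(detected in both), P(detected in exactly one))."""
    both = power_a * power_b
    exactly_one = power_a * (1 - power_b) + power_b * (1 - power_a)
    return both, exactly_one

# Hypothetical effect: the variant explains 5% of expression variance; alpha = 1e-5.
for n in (100, 1000):
    print(n, round(eqtl_power(n, 0.05, 1e-5), 3))

print(detection_pattern(0.5, 0.5))  # (0.25, 0.5)
```

In this toy setting, two-thirds of the detected instances of a truly shared eQTL would look tissue specific, which is one way modestly sized studies can overestimate cell-type specificity.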
Population ancestry also contributes to the context specificity of eQTLs. A cis-eQTL analysis in LCLs from eight cohorts representing different human ancestries estimated that up to 31% of well-annotated genes have a significant cis-eQTL in at least one population. Of these genes, more than 50% exhibit a cis-eQTL in at least two independent populations, and 6% have an eQTL effect in all eight populations [13]. This pattern likely reflects the modest sample size of each population, effects of the transformation process, possible effects of environmental variables on the source tissue, and some clear examples of population-specific associations. It suggests that population ancestry may in some cases be an important consideration when integrating eQTL observations with GWAS results, particularly as GWAS are applied to populations not of European ancestry. Interestingly, although eQTL relationships are expected to vary among populations because of differences in LD structure, current studies suggest that the allelic direction and magnitude of the effect on expression levels vary little for associations that are shared across human populations [13]. Thus, at a minority of loci there may be little in the way of population-specific modifying effects, and further work is needed to understand the role of these eQTLs, which may relate to fundamental aspects of human biology. Notably, nearly all eQTL investigations conducted to date have focused on common genetic variants (minor allele frequency > 5%), so our understanding of the levels and patterns of regulatory variation, and of their effect sizes, is limited to this set. However, with recent developments in genome sequencing and in rare-variant genotyping and imputation, investigators are now beginning to identify and characterize the regulatory effects of low-frequency variants [36], which may help in interpreting rare-variant trait associations identified by GWAS or whole-genome sequencing projects.
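Cross-population sharing figures like those above are tabulations over per-population eQTL results. As a hedged illustration, the Python sketch below works through toy data from three hypothetical populations. For each gene, it counts how many populations show a cis-eQTL and checks whether the effect direction is concordant wherever the eQTL is detected. It assumes effect sizes have already been aligned to the same allele in every population, which real cross-population comparisons must check explicitly. The population and gene names are placeholders.

```python
from collections import defaultdict

def sharing_summary(effects):
    """Summarize cis-eQTL sharing across populations.

    `effects` maps population -> {gene: effect size of the same (aligned) allele}.
    Returns ({gene: (number of populations, same sign everywhere?)},
             fraction of eGenes seen in >= 2 populations,
             fraction of eGenes seen in all populations).
    """
    per_gene = defaultdict(list)
    for pop_effects in effects.values():
        for gene, beta in pop_effects.items():
            per_gene[gene].append(beta)
    n_pops = len(effects)
    summary = {
        gene: (len(betas), all(b > 0 for b in betas) or all(b < 0 for b in betas))
        for gene, betas in per_gene.items()
    }
    counts = [k for k, _ in summary.values()]
    frac_ge2 = sum(k >= 2 for k in counts) / len(counts)
    frac_all = sum(k == n_pops for k in counts) / len(counts)
    return summary, frac_ge2, frac_all

# Toy effect sizes (hypothetical, not from any study), aligned to the same allele.
effects = {
    "POP1": {"GENE_1": 0.40, "GENE_2": -0.25, "GENE_3": 0.10},
    "POP2": {"GENE_1": 0.35, "GENE_2": -0.30},
    "POP3": {"GENE_1": 0.45, "GENE_4": 0.20},
}
summary, frac_ge2, frac_all = sharing_summary(effects)
print(summary["GENE_1"], frac_ge2, frac_all)  # (3, True) 0.5 0.25
```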
Studies have demonstrated that, as a group, trait-associated variants identified through GWAS are enriched for eQTLs [37], with inflammatory disease risk variants showing a similar degree of enrichment for eQTLs in relevant cell types [6]. Publicly available eQTL genome browsers and databases provide useful resources for investigating potential eQTL associations for SNPs of interest. However, because most report only significant findings, and because analyses use different statistical models and criteria for data inclusion, the absence of a reported eQTL (a potential false negative) is difficult to interpret. Even when a trait-associated variant is also found to be an eQTL, caution is needed in interpretation. Because of patterns of LD in the human genome and the large number of disease-associated and eQTL-annotated genetic variants, a simple overlap of significant eQTL variants and GWAS variants can occur even when the causal variants underlying the two signals are different. Statistical frameworks have now been developed to estimate the degree to which these signals of association overlap [41], and methods development in this area continues.
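One widely used framework of this kind is the approximate Bayes factor colocalization approach of Giambartolomei and colleagues, implemented in the R package coloc; it is not necessarily the framework cited above. It compares five hypotheses for a region. Under H0 there is no association with either trait. Under H1 or H2 only the first or only the second trait is associated. Under H3 both traits are associated through different causal variants, and under H4 both share one causal variant. The Python sketch below re-implements that calculation under the method's simplifying assumption of at most one causal variant per trait in the region. The effect-size prior standard deviation of 0.15 and the prior probabilities p1 = p2 = 1e-4 and p12 = 1e-5 are intended to mirror the coloc defaults as we understand them. The summary statistics are toy values.

```python
import math

def log_abf(beta, se, prior_sd=0.15):
    """Wakefield's log approximate Bayes factor (association vs. none) for one SNP."""
    v, w = se ** 2, prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)

def log_sum(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_diff(x, y):
    """log(exp(x) - exp(y)); -inf if the difference is not numerically positive."""
    if y >= x:
        return float("-inf")
    return x + math.log(-math.expm1(y - x))

def coloc_posteriors(trait1, trait2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP (beta, se) pairs for two traits.

    Both lists must cover the same SNPs in the same order.
    """
    l1 = [log_abf(b, s) for b, s in trait1]
    l2 = [log_abf(b, s) for b, s in trait2]
    s1, s2 = log_sum(l1), log_sum(l2)
    s12 = log_sum([a + b for a, b in zip(l1, l2)])     # same SNP causal for both traits
    log_h = [
        0.0,                                                    # H0: neither trait
        math.log(p1) + s1,                                      # H1: trait 1 only
        math.log(p2) + s2,                                      # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),   # H3: distinct causal SNPs
        math.log(p12) + s12,                                    # H4: one shared causal SNP
    ]
    total = log_sum(log_h)
    return [math.exp(h - total) for h in log_h]

# Toy summary statistics for three SNPs (hypothetical values).
gwas = [(0.40, 0.05), (0.00, 0.05), (0.00, 0.05)]        # strong signal at the first SNP
eqtl_same = [(0.40, 0.05), (0.00, 0.05), (0.00, 0.05)]   # eQTL at the same SNP
eqtl_other = [(0.00, 0.05), (0.00, 0.05), (0.40, 0.05)]  # eQTL at a different SNP
print([round(pp, 3) for pp in coloc_posteriors(gwas, eqtl_same)])   # mass on H4
print([round(pp, 3) for pp in coloc_posteriors(gwas, eqtl_other)])  # mass on H3
```

In the second toy case, the GWAS and eQTL signals both sit in the same three-SNP region, so a naive overlap check would flag them together. Yet the posterior favors distinct causal variants (H3), which is the situation the text warns about.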
Because many cellular processes are context specific, functional experiments to elucidate the mechanisms underlying disease associations need to be performed in relevant cell types. Until recently, identifying the most relevant cell type for a trait of interest relied on educated guesses based on interpretation of the current literature. For inflammatory diseases, this poses a particular challenge: while a specific cell type may have a prominent role based on our current understanding of disease pathophysiology and treatment mechanisms, inflammatory diseases are systemic, and multiple immune cell types are likely to contribute to disease susceptibility and to be targets of susceptibility variants. Studies quantifying enrichment of eQTLs among inflammatory disease-associated variants have identified the cell types in which this enrichment is greatest and have suggested that these cell types may be the most relevant to genetic susceptibility to an inflammatory disease [9]. A different approach has been proposed by Raychaudhuri and colleagues [43], who devised a systematic assessment of cell types that starts with a set of disease-associated variants, identifies the genes within each locus, and applies a probabilistic model to identify the tissue(s) or cell type(s) in which the expression of these candidate susceptibility genes is enriched. Applying this method to three inflammatory diseases, using RNA expression profiles from a large number of purified murine immune cell types from the Immunological Genome Consortium [44], they identified the cell types most relevant to systemic lupus erythematosus, Crohn's disease, and rheumatoid arthritis. While limited by the completeness of the list of susceptibility alleles and by the use of gene sets defined in murine cells, these results implicate a different combination of cell types for each disease, consistent with our current understanding of pathophysiology. Similar approaches using genome-wide maps of chromatin state in different human cell types have also successfully identified cell types relevant to specific diseases and will help in designing studies that seek to explore disease-related eQTLs (Raychaudhuri, personal communication