In this work, we have presented a statistical framework for analyzing and identifying eQTLs, combining data from multiple tissues. Our approach considers a range of alternative models, one for each possible configuration of eQTL sharing among tissues. We compute Bayes Factors that quantify the support in the data for each possible configuration, and these are used both to develop powerful test statistics for detecting genes that have an eQTL in at least one tissue (by Bayesian model averaging across configurations), and to identify the tissue(s) in which these eQTLs are active (by comparing the Bayes factors for different configurations against one another). Our framework allows for heterogeneity of eQTL effects among tissues in which the eQTL is active, for different variances of gene expression measurements in each tissue, and for intra-individual correlations that may exist due to samples being obtained from the same individuals. For eQTL detection, our framework provides consistent, and sometimes substantial, gains in power compared to a tissue-by-tissue analysis and ANOVA or simple linear regression. Concerning the tissue specificity of eQTLs, our framework efficiently borrows information across genes to estimate configuration proportions, and then uses these estimates to assess the evidence for each possible configuration. When re-analyzing the gene expression levels in three cell types from 75 individuals (
), we found evidence for substantial sharing of eQTLs among tissues, considerably more than suggested by the original analysis.
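As a rough illustration of how per-configuration Bayes factors can be combined, the following minimal sketch averages them across configurations and then compares configurations against one another. It is not the implementation used in this work: the function names, the two-tissue configurations, the Bayes factor values and the prior weights are all hypothetical, and the weights are assumed to be strictly positive and to sum to one.

```python
import math

def log10_sum_exp(values):
    """Numerically stable log10(sum(10**v for v in values))."""
    m = max(values)
    return m + math.log10(sum(10 ** (v - m) for v in values))

def bma_log10_bf(log10_bfs, weights):
    """Model-averaged log10 Bayes factor: log10(sum_c w_c * BF_c)."""
    if any(w <= 0 for w in weights.values()):
        raise ValueError("this sketch assumes strictly positive weights")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return log10_sum_exp(
        [math.log10(weights[c]) + log10_bfs[c] for c in log10_bfs]
    )

def config_posteriors(log10_bfs, weights):
    """Posterior probability of each configuration, conditional on the
    SNP being an eQTL in at least one tissue."""
    total = bma_log10_bf(log10_bfs, weights)
    return {
        c: 10 ** (math.log10(weights[c]) + log10_bfs[c] - total)
        for c in log10_bfs
    }

# Hypothetical two-tissue example: 1 = eQTL active in that tissue.
log10_bfs = {(1, 0): 1.0, (0, 1): 0.0, (1, 1): 3.0}
weights = {(1, 0): 0.25, (0, 1): 0.25, (1, 1): 0.5}

print(bma_log10_bf(log10_bfs, weights))
print(config_posteriors(log10_bfs, weights))
```

Working on the log10 scale avoids overflow when individual Bayes factors are very large, which is common for strong eQTLs.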
In the next few years, we expect that expression data will be available on large numbers of diverse tissue types in sufficient sample sizes to allow eQTLs to be mapped effectively (for example, the NIH GTEx project aims to collect such data). The methods presented here represent a substantive step towards improved analyses that fully exploit the richness of these kinds of data. However, we also see several directions for potential extensions and improvements. First, our current framework can only partially deal with the challenges of large numbers of tissues. Specifically, because with
tissues, there are
possible configurations of eQTL sharing among tissues, some of our current methods, which consider all possible configurations, will become impractical for moderate
(speculatively, above about 10). Our test statistic
partially addresses this problem, by allowing for heterogeneity while averaging over only
configurations, which is practical for very large
. Our simulation results suggest that
is a powerful test statistic for identifying SNPs that are eQTLs in at least one tissue. However, our preferred approach for identifying which
tissues such SNPs are active in involves a hierarchical model that estimates the frequency of different patterns of sharing from the data, and this hierarchical model scales poorly with
. In particular, having a separate parameter for each possible configuration is unattractive (both statistically and computationally) for large
, and alternative approaches will likely be required. There are several possible ways forward here: for example, one would be to reduce the number of distinct configurations by clustering “similar” configurations together; another would be to focus less on the discrete configurations, and instead to model heterogeneity in effect sizes in a continuous way, perhaps using a mixture of multivariate normal distributions with more complex covariance structures than we allow here. We expect this to remain an area of active research in the coming years, especially since these types of issues will likely arise in many genomics applications involving multiple cell types, and not only in eQTL mapping.
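The scaling problem can be made concrete with a short sketch. It assumes that a configuration is a binary vector marking the tissues in which the eQTL is active, and that the all-zero (null) vector is excluded; the function names are hypothetical.

```python
from itertools import product

def n_configurations(n_tissues):
    """Number of non-null activity configurations for n_tissues tissues."""
    return 2 ** n_tissues - 1

def enumerate_configurations(n_tissues):
    """Yield every binary activity vector except the all-zero (null) one."""
    for config in product((0, 1), repeat=n_tissues):
        if any(config):
            yield config

# Explicit enumeration is only feasible for a small number of tissues.
for config in enumerate_configurations(3):
    print(config)

# The count grows exponentially with the number of tissues.
for n_tissues in (3, 10, 20, 40):
    print(n_tissues, n_configurations(n_tissues))
```

Under this assumption, three tissues give 7 configurations and ten tissues give 1023, so a model with one free parameter per configuration quickly has more parameters than can be estimated reliably.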
Another important issue to address is that most future expression data sets will likely be collected by RNA-seq, which provides count data that are not normally distributed. Previous eQTL analyses of RNA-seq (e.g. 
) have nonetheless performed eQTL mapping using a normal model, by first transforming (normalized) count data at each gene to the quantiles of a standard normal distribution. Although this approach would not be attractive in experiments with small sample sizes, it works well with the moderate to large sample sizes typically used in eQTL mapping experiments. As a first step, this approach could also be used to apply our methods to count data. However, ultimately it would seem preferable to replace the normal model with a model that is better adapted to count-based data, perhaps a quasi-Poisson generalized linear model (
); Bayes Factors under these models could be approximated using Laplace approximations, similar to the approximations used here for the normal model 
. The quasi-Poisson model has the advantage over the normal transformation approach that it preserves the fact that there is more information about eQTL effects in tissues where a gene is highly expressed than in tissues where it is lowly expressed. This information is lost by normal transformation. In our primary analyses here we addressed this by analyzing only genes that were robustly expressed in all tissues, but this is sub-optimal, and will become increasingly unattractive as the number of tissues grows.
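The normal transformation discussed above can be sketched as a rank-based mapping to standard normal quantiles. This is a minimal illustration under stated assumptions, not the exact procedure of the cited analyses: the function name is hypothetical, ties receive their average rank, and ranks are mapped to probabilities with the `rank / (n + 1)` convention, which is only one of several in common use.

```python
from statistics import NormalDist

def quantile_normalize_to_normal(values):
    """Map the values for one gene to standard normal quantiles by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank over the tie
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # rank / (n + 1) lies strictly between 0 and 1, so inv_cdf is defined.
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]

# Hypothetical normalized counts for one gene in two tissues.
low_expression = [5, 100, 20]
high_expression = [5000, 100000, 20000]
print(quantile_normalize_to_normal(low_expression))
print(quantile_normalize_to_normal(high_expression))
```

Both calls return the same transformed values because only the ranks matter, which illustrates how the transformation discards the difference in expression level between tissues.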
Our analyses here assess (cis) eQTL sharing among tissues by performing association testing at the level of individual SNPs. A different approach to investigating eQTL sharing among tissues is to study the “cross-heritability” of expression levels among tissues (e.g. 
). These methods are based on polygenic models, and attempt to estimate the combined influence of all shared eQTLs; this contrasts with our analysis, where the focus is on sharing of individually-identifiable eQTLs of moderate-to-large effect. Both 
estimate cross-tissue heritability to be low. 
, studying expression in Blood and Adipose tissues from Icelanders, estimated cross-tissue heritability as
obtained an estimate of mean genetic correlation close to zero for Blood and LCLs in monozygotic twins (
). These results may appear to conflict with our results (both from our model-based approach, and the less-model-based pairwise analysis approach from 
), which suggest that most large-effect cis eQTLs are shared among fibroblasts, LCLs and T cells. However, these low estimates of cross-tissue heritability reflect not only the extent of sharing of eQTLs, but also the absolute size of the eQTL effects. If eQTL effects are small, explaining only a small proportion of the total variance in gene expression, then cross-tissue heritability will also be small, even if all eQTLs have exactly the same effect in all tissues. Thus, to assess eQTL sharing in the heritability-based approaches, it is helpful to contrast cross-tissue heritability,
, with within-tissue heritability,
, (which is also affected by eQTL effect size, but not by sharing). Specifically, within the polygenic model it can be shown that the correlation coefficient
of the eQTL effects in two tissues
. Applying this to the cis estimates of
, for adipose and blood, yields
. Although this estimate of effect correlation within a polygenic model is not directly comparable with our estimate of sharing of eQTLs in a decidedly non-polygenic model (and for different cell types!), this result suggests that the two analyses may be less in conflict than they initially appear.
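The contrast between cross-tissue and within-tissue heritability can be illustrated with the standard genetic-correlation identity, in which the effect correlation is the cross-tissue heritability divided by the geometric mean of the two within-tissue heritabilities. The sketch below assumes that this is the relation intended above; the function name is hypothetical, and the numerical values are invented for illustration and are not the estimates from the cited studies.

```python
import math

def effect_correlation(h2_cross, h2_tissue1, h2_tissue2):
    """Effect correlation implied by cross- and within-tissue heritability,
    using the standard genetic-correlation identity (an assumption here)."""
    if h2_tissue1 <= 0 or h2_tissue2 <= 0:
        raise ValueError("within-tissue heritabilities must be positive")
    return h2_cross / math.sqrt(h2_tissue1 * h2_tissue2)

# Hypothetical values: a low cross-tissue heritability is still
# compatible with a sizeable effect correlation when the
# within-tissue heritabilities are themselves low.
print(effect_correlation(h2_cross=0.03, h2_tissue1=0.05, h2_tissue2=0.05))
```

With these made-up inputs the implied correlation is about 0.6, even though the cross-tissue heritability itself is only 0.03.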