Association studies provide genome-wide information about the genetic basis of complex disease, but medical research has primarily focused on protein-coding variants, due to the difficulty of interpreting non-coding mutations. This picture has changed with advances in the systematic annotation of functional non-coding elements. Evolutionary conservation, functional genomics, chromatin state, sequence motifs, and molecular quantitative trait loci all provide complementary information about non-coding function. These functional maps can help prioritize variants on risk haplotypes, filter mutations encountered in the clinic, and perform systems-level analyses to reveal processes underlying disease associations. Advances in predictive modeling can enable dataset integration to reveal pathways shared across loci and alleles, and richer regulatory models can guide the search for epistatic interactions. Lastly, new massively parallel reporter experiments can systematically validate regulatory predictions. Ultimately, advances in regulatory and systems genomics can help unleash the value of whole-genome sequencing for personalized genomic risk assessment, diagnosis, and treatment.
Understanding the genetic basis of disease can revolutionize medicine by elucidating relevant biochemical pathways for drug targets and by enabling personalized risk assessments1,2. As technologies have evolved over the past century, geneticists are no longer limited to studying Mendelian disorders and can tackle complex phenotypes. The associations discovered have broadened from individual variants primarily in coding regions to much richer disease architectures, including non-coding variants, wider allelic spectra, numerous loci, and weak effect sizes (Table 1). In the last few years, a new wave of technological advances has intensified the shift towards tackling more complex genetic architectures and uncovering the molecular mechanisms underlying them.
In the early twentieth century, several metabolic disorders were shown to be genetic and Mendelian, and later positional cloning allowed the identification of many such loci, such as those curated by the Online Mendelian Inheritance in Man database (OMIM)3,4. Starting in the 1980s, linkage analysis was used to correlate the inheritance of traits in families with the inheritance of mapped polymorphic markers which could be assayed through restriction fragment length polymorphism (RFLP) analysis5,6. However, the regions mapped by linkage analysis were necessarily large, and cloning candidate genes for follow-up association studies, resequencing, and functional assays required the application of painstaking molecular techniques before the completion of the Human Genome Project7. In addition, complex phenotypes were not amenable to linkage because of the large sample sizes needed to detect loci with modest effects above the genomic background8. The long haplotype structure of the human genome, and its systematic mapping by the HapMap Project9, has allowed single nucleotide polymorphisms (SNPs) to be used as markers for common haplotypes, which could be genotyped using chip technology. The stage was set for a flood of unbiased, genome-wide association studies (GWAS) to search across unrelated individuals10 for common variants associated with complex disease and diverse molecular phenotypes (Fig. 1, Table 2).
Relative to linkage analysis and sequencing, GWAS have less power in cases where different rare mutations act in different families or individuals at the same locus (allelic heterogeneity). However, they are far more sensitive than family studies to complex polygenic associations where a phenotype is associated with the joint effect of many weakly-contributing variants across different loci (locus heterogeneity). In this sense GWAS have been a resounding success, identifying thousands of disease-associated loci for further study11 and revealing previously-unknown mechanisms for diseases such as Crohn’s disease, macular degeneration, and type 2 diabetes2. However, the pursuit of GWAS has also received criticism (Box 1) because of the structure of the knowledge it has been producing relative to the determinism of highly-penetrant Mendelian genetic discoveries2,12,13. The current tension mirrors the intellectual rift in the early 1900s between Mendelians, who modeled inheritance of discrete traits as being carried by single genes, and the biometrician adherents of Galton, who studied the inheritance of continuous traits; the fields were reconciled by R.A. Fisher, who proposed that quantitative traits’ heritability was owed to the contribution of many genes with small effect14,15.
Although several predominant criticisms of GWAS have been voiced, responses to each can guide future studies.
Cumulative predictive power. Generally, the discovered loci reaching genome-wide significance have weak additive predictive power for specific phenotypes, which limits their clinical relevance for some traits at present130–132. However, risk prediction using the loci discovered for complex disease using GWAS often performs similarly to using classical clinical tests, and has unique properties, such as stability over the lifespan133. Predictors that jointly use hundreds or thousands of weakly-contributing loci have also been shown to explain a larger proportion of variance than was initially appreciated134,135. Integrating these discoveries into clinical protocols is in its infancy, and should be expected to mature.
Non-coding variants with unknown effect. Most of the loci are non-coding and many are far from discovered genes, and, because of linkage disequilibrium (LD), encompass many variants; therefore, they are not immediately informative or biochemically tractable for experimental work. Assigning a prior probability to the deleteriousness of a non-coding mutation is challenging136. To address this challenge, non-coding sequence is being annotated at a rapid pace through systematic efforts such as the ENCODE Project21 and the Roadmap Epigenomics Mapping Consortium22, and through studies of the impact of common variants on genomewide molecular phenotypes, discussed below.
Detection of rare variants. Significant loci tend to additively explain only a small proportion of the narrow-sense heritability of phenotypes12, suggesting that rare rather than common variants may underlie their genetics, which will only be discovered through whole-exome and whole-genome sequencing or family-based studies13. Many explanations for “hidden heritability” among the discovered common-variant associations have been proposed12. The relative importance of rare and common variants is a topic of intense debate137, ranging from arguments that associations with common variants are in fact driven by synthetic associations with large-effect rare variants in long-range LD138, to arguments that common associations of weak effect contribute to heritability well beyond the threshold of statistical significance139, and that narrow-sense heritability may be overestimated in many twin studies due to epistasis disguised as additivity98.
Reproducibility. GWAS sometimes do not replicate across studies or populations140, leading to the report of false positives and suspicion of the validity of novel associations, especially when they are non-coding. This could be partly due both to the difficulties in imputing genotypes, which will benefit from an increased understanding of common human variation, and to the poor definition of organismal phenotypes140, which can benefit from molecular disease biomarkers discussed below. Moreover, while the specific loci involved may differ across populations, they may reflect the same underlying molecular pathways, and thus regulatory annotations may be more reproducible across populations. Focusing on molecular phenotypes may improve reproducibility by isolating potential socio-economic or other environmental factors that occur downstream of molecular phenotypes and can strongly affect organismal phenotypes.
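As a rough illustration of the cumulative predictors mentioned under "Cumulative predictive power" above, an additive risk score can be sketched as a weighted sum of risk-allele counts. The function name, the per-locus weights, and the genotypes below are hypothetical and are not taken from any of the cited studies.

```python
def polygenic_score(dosages, weights):
    """Additive score: sum over loci of (risk-allele count) x (per-allele weight).

    dosages: risk-allele counts per locus (0, 1, or 2) for one individual.
    weights: per-allele effect sizes (for example log odds ratios) per locus.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same loci")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three weakly contributing loci.
weights = [0.10, 0.05, 0.20]
# Hypothetical genotype: two risk alleles, none, and one.
person = [2, 0, 1]
score = polygenic_score(person, weights)
```

The sketch treats loci as independent and strictly additive, which is the simplifying assumption the criticisms above are aimed at.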
In this review, we discuss both the computational challenges and the opportunities presented by the large number of non-coding disease-associated variants being discovered through GWAS and medical resequencing. We first survey the types of regulatory annotations available, including those from functional and comparative genomics as well as quantitative trait loci (QTLs) and allele-specific events, and the ways in which these can be used to dissect disease-associated haplotypes to identify the most promising causal variants at a locus. We then discuss the utility of these regulatory annotations to perform systems-level analysis of GWAS and allelic spectra, revealing relevant cell types and regulatory mechanisms. Finally, we present a variety of bioinformatics hurdles and computational challenges that lie ahead for the field, such as discovering epistatic interactions, connections between molecular and organismal phenotype, and patterns that must be mined from potentially sensitive medical data.
Interpretation of the molecular mechanisms of disease-associated loci can be a great challenge. Even though protein biochemistry has been used to characterize missense and nonsense coding mutations that most often underlie monogenic traits, the frequency with which loss-of-function mutations and rare coding variants are being discovered in healthy individuals16,17 suggests our understanding is far from complete. The challenge of interpretation is even greater for non-coding variants, given the diversity of non-coding functions, the incomplete annotation of regulatory elements, and potentially still unknown mechanisms of regulatory control. Several pioneering studies have provided a model for the types of systematic regulatory annotations needed, by revealing the diverse mechanisms of action underlying human disease, including at the transcriptional, splicing, and translational level (Table 3).
In each of these cases, extensive experimental follow-up was needed to uncover the molecular mechanisms responsible for the disease association signal, and many more disease-associated variants remain uncharacterized, emphasizing the need for systematic methods for annotating regulatory regions, their functional nucleotides, and their interconnections.
Recognizing the need for systematic interpretation of non-coding disease-associated variants, several large-scale projects are currently underway to enhance the annotation of the non-coding genome (Fig. 2). These rely on reference annotation maps using both functional genomics and comparative genomics, and can dramatically increase the annotation of regulatory elements, which can have a strong impact for interpreting both existing GWAS and individual personal genomes.
Massively parallel short-read sequencing technologies have obviated the need for the extremely expensive tiling microarrays previously used to map biochemically active regions of the human genome. This has enabled chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) applied to map transcription factor binding, chromatin regulators, or histone modification marks18, mapping of DNA methylation using bisulfite sequencing (BS-Seq)19 and mapping of accessible chromatin regions by DNase hypersensitivity analysis (DNase-Seq)20. Computational integration of these datasets through supervised or unsupervised machine learning enables mapping of functional non-coding elements such as distal enhancers, transcription factor binding sites, and regulatory RNA genes on a genome-wide scale. For example, the Encyclopedia of DNA Elements (ENCODE) project is releasing comprehensive maps of chromatin states, TF binding, and transcription for a selection of cell lines and DNase maps for many primary cells21, and the NIH Epigenomics Roadmap Project22 and BluePrint project23 both aim to construct reference epigenome maps of hundreds of primary cells and cultured cells. Regulatory maps can then guide the way towards the most likely causal regulators on a haplotype (Fig. 2a).
While maps of regulatory regions can be highly informative, increasing their resolution from hundreds of nucleotides to single nucleotides requires additional computational or experimental developments. This can leverage systematic efforts that seek to elucidate the binding specificities of transcription factors24,25 and splicing regulators26,27, and also to discover regulatory motifs genome-wide based on their enrichment and conservation properties28,29. Similarly, new technologies have been applied to enhance existing techniques, such as digital genomic footprinting using DNase-seq30, dynamic application of micrococcal nuclease (MNase)31, or the use of lambda exonuclease (ChIP-exo)32, dramatically increasing the mapping resolution of regulatory elements even without knowledge of the specific motifs involved.
Even when the functional elements and motifs are known, we need models to distinguish how mutations in different positions of a regulatory motif or element will affect its function. These models can be used to distinguish silent from deleterious mutations, as is possible within protein-coding regions. This requires integrative models of sequence motifs, chromatin state, and expression patterns24,33–36, which can be trained on experimentally tractable tissues or through in vitro experiments and applied to predict the effect of newly-observed rare and private mutations. The massive scale of regulatory predictions, encompassing hundreds of regulators and millions of regulatory motif instances, demands correspondingly massively parallel methods to validate them. Such methods, which exploit emerging large-scale synthesis and sequencing technologies, are being developed both in model organisms and cultured human cells37–39, and enable testing mechanistic hypotheses about causal variants at unprecedented scales (Fig. 2b).
Even when a regulatory element is rarely used and its activity unobserved in the cell types and tissues sampled, its effect on fitness can still be recognized based on its preferential conservation across multiple related species. Genome-wide comparative analysis of many mammals has revealed a high-resolution map of constrained elements spanning 4.5% of the human genome40,41, revealing millions of likely new elements, including individual transcription factor binding sites, whose nucleotides have been preserved across evolutionary time. Beyond the overall level of evolutionary constraint, the specific evolutionary signatures encoded in the patterns of substitutions, insertions and deletions across related species can provide information for the type of molecular function likely encoded by the constrained elements41–44. Together, constraint and evolutionary signatures can pinpoint functional transcription factor binding motifs and individual binding sites (Fig. 2c), non-coding RNA genes and structures, microRNAs and their targets, and yet uncharacterized sequence elements that confer a selective advantage.
Even in the absence of conserved sequence, the conservation of biochemical activity can indicate conserved functional elements whose sequence features are not detectable by traditional alignment and constraint measures due to turnover45,46. Because some fraction of protein binding and RNA transcription may be nonfunctional “noise,” cross-species analysis of transcription factor binding47 or gene expression48 can help reveal the subset of elements that are most likely to be functional. However, lineage-specific elements may nevertheless be important and not captured through this method.
For protein-coding mutations, knowledge of protein structure and function, and the unambiguous nature of the genetic code, has allowed the development of a class of predictive algorithms that can score the severity of missense and nonsense variants49–52. Reference annotations are needed to bring functional datasets to bear on understanding the molecular roles of disease-associated common variants in individual regions, especially for non-coding variants (Fig. 2). In addition, new methods are needed to define the relationship between global genetic architectures and genome-wide functional landscapes.
An immediate concern for practitioners of GWAS is the interpretation and prioritization of non-coding variants53. A number of resources, including HaploReg54 (L.D.W. and M.K.), RegulomeDB55, and ENSEMBL’s Variant Effect Predictor56 aim to annotate non-coding common variants from association studies using conservation, functional genomics, and regulatory motif data. Tools such as ANNOVAR57 and VAAST58 are specialized for annotating whole-genome/exome sequencing data, and leverage population-level negative selection to identify extremely rare coding alleles that are most likely to be functional. None of these tools presently brings together all of the available annotation resources listed in the previous section, however, and they will need to be continuously updated to reflect the exponential growth of regulatory knowledge (Table 4).
Prior knowledge of gene interrelationships has been leveraged in studies of gene expression to discover differentially-regulated pathways even where single genes in those pathways change expression too little to rise to statistical significance59. These methods for gene set enrichment analysis (GSEA) are being applied to GWAS, where similarly, genetic risk is expected to be concentrated along biological pathways and multiple testing diminishes the statistical significance of associations considered individually. Dozens of methods have been developed to use prior knowledge from gene functional annotation databases to perform pathway analysis on GWAS60,61 (Fig. 3a).
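The intuition behind pathway analysis of GWAS can be sketched with a simple over-representation test. This is a hypergeometric test, a simpler relative of GSEA rather than GSEA itself, and it is not the method of any particular cited tool. The function name and the toy counts are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, pathway_genes, hit_genes, overlap):
    """One-sided P(X >= overlap) under the hypergeometric null.

    total_genes:   number of genes considered in the study.
    pathway_genes: how many of those belong to the pathway.
    hit_genes:     how many genes lie near GWAS-associated loci.
    overlap:       how many GWAS-hit genes fall in the pathway.
    """
    upper = min(pathway_genes, hit_genes)
    denom = comb(total_genes, hit_genes)
    tail = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, hit_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


# Toy example: 10 genes, 5 in the pathway, 5 GWAS hits, all 5 hits in the pathway.
p_all = hypergeom_enrichment_p(total_genes=10, pathway_genes=5, hit_genes=5, overlap=5)
```

In practice, gene length, LD between neighboring genes, and the mapping of variants to genes all bias a naive test like this, which is part of why dozens of dedicated methods exist.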
A recent study used chromatin state maps to discover an enrichment of cell type-specific enhancers among the top associations in several GWAS62 (L.D.W., M.K., and colleagues), demonstrating the utility of high-resolution functional genomics maps to serve as a type of pathway annotation. Similar results have been seen using DNase hypersensitivity maps across a large number of cell types63, and by examining concordance between expression quantitative trait loci (eQTLs) and GWAS64,65. These approaches have demonstrated the power of reference epigenomes to identify relevant tissues for further study (Fig. 3b). Another way to use prior knowledge about variant function is to incorporate the information into the association study itself through Bayesian methods61,66–69 or using boosting to prioritize disease networks70. However, it is difficult to evaluate the utility of these weighting schemes, which essentially discard loci about which there is the least functional data.
For potentially causal rare variants discovered through whole-genome sequencing, a class of techniques has been developed that deal successfully with allelic heterogeneity and low allele frequencies by pooling mutations across individuals by genes, pathways, or other functional annotations and filters71; the additional use of functional genomic maps has recently been proposed72. Improved annotation of non-coding regions will obviously empower this type of analysis (Fig. 3c).
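The pooling idea can be sketched as a collapsing (burden) test: each individual is scored as a carrier if they hold any rare allele in the gene, and carrier counts are compared between cases and controls with a one-sided Fisher exact test. This is a minimal sketch of one member of this class of techniques, not a reimplementation of a cited method. The function names and genotypes are hypothetical.

```python
from math import comb


def is_carrier(genotypes):
    """True if the individual carries at least one rare allele in the gene."""
    return any(g > 0 for g in genotypes)


def burden_test(case_genotypes, control_genotypes):
    """Collapse rare variants per individual, then test for carrier excess in cases.

    Each argument is a list of per-individual lists of rare-allele counts,
    one entry per rare variant site in the gene.
    Returns (case carriers, control carriers, one-sided Fisher exact p-value).
    """
    case_carriers = sum(is_carrier(g) for g in case_genotypes)
    control_carriers = sum(is_carrier(g) for g in control_genotypes)
    n_cases = len(case_genotypes)
    n_total = n_cases + len(control_genotypes)
    carriers = case_carriers + control_carriers
    # P(X >= case_carriers) when `carriers` are spread at random over all individuals.
    tail = sum(
        comb(carriers, k) * comb(n_total - carriers, n_cases - k)
        for k in range(case_carriers, min(carriers, n_cases) + 1)
    )
    return case_carriers, control_carriers, tail / comb(n_total, n_cases)


# Hypothetical gene with three rare sites; different cases carry different alleles.
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0], [0, 0, 0]]
controls = [[0, 0, 0] for _ in range(6)]
n_case, n_ctrl, p_burden = burden_test(cases, controls)
```

No single site is seen more than twice here, yet the gene-level carrier count separates cases from controls, which is the point of pooling under allelic heterogeneity.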
Table 5 lists examples of new insights from computational methods integrating regulatory elements with GWAS.
While until this point we have discussed regulatory annotations from reference cell lines, biochemical activity is itself genotype-dependent, and thus a single reference annotation fails to capture the complexity of the regulatory genome. Moreover, we treated LD as a property of the human genome, while it is in fact population specific, and patterns of LD and selection have varied across both geography and time. This increased complexity can in fact be leveraged to gain additional insights into genome regulation, and provide additional power for the aforementioned analyses.
Two powerful tools have emerged to identify non-coding loci that affect molecular phenotypes: association studies and allele-specificity studies. Association studies (Fig. 1b) have been used to discover non-coding cis regulators of methylation (meQTLs)73, DNase I sensitivity (dsQTLs)74, transcription factor binding75, gene expression (eQTLs)76, and alternative splicing77. In the same manner as GWAS on organism-level quantitative traits, these studies consider a phenotype associated with a particular genomic locus (such as steady-state mRNA level corresponding to a gene) in the same cell type isolated across unrelated individuals, and search for genetic regulators of those molecular processes. A recent related study used eQTL data to reveal selective signatures of epistasis between deleterious coding variants and the regulatory variants that modulate their penetrance78, a method which should be broadly applicable to testing hypotheses about cis regulatory interactions from genomics models.
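A single cis-eQTL test of the kind described above can be sketched as an ordinary least-squares regression of expression on allele dosage across unrelated individuals. This is a minimal sketch that ignores covariates, normalization, and multiple testing. The function name and the data are hypothetical.

```python
def eqtl_fit(dosages, expression):
    """Least-squares fit of expression on allele dosage (0, 1, or 2).

    Returns (slope, intercept, r_squared). The slope is the estimated
    change in expression per additional copy of the allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched vectors with at least three individuals")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or expression")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (sxy * sxy) / (sxx * syy)


# Hypothetical dosages at one SNP and mRNA levels of a nearby gene.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, intercept, r2 = eqtl_fit(dosages, expression)
```

The same regression applies to other molecular phenotypes, such as methylation level or DNase I sensitivity, by swapping the response vector.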
In contrast, allele specificity tests examine heterozygous sites in individuals and look for a skew in the molecular signal towards one of the alleles (Fig. 1c). Allele-specific methylation79, histone modification80, DNase I sensitivity81, protein binding82, and expression83 have been surveyed genomewide. While association studies have the advantage of identifying regulatory variants that may be acting at some genetic distance from the regulated locus, and can include homozygous individuals in the sample, allele-specific studies can be performed on single individuals, and inherently control for possible trans-regulatory differences caused by individuals’ genetic background.
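The skew test at a heterozygous site can be sketched as an exact two-sided binomial test against a 50:50 expectation. This is a minimal sketch that assumes no reference-mapping bias and no overdispersion, both of which matter in real data. The function name and read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for a 50:50 split of reads.

    With a null probability of 0.5 the distribution is symmetric, so the
    two-sided p-value is twice the smaller tail, capped at 1.
    """
    n = ref_reads + alt_reads
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts of reads covering each allele at one heterozygous SNP.
p_skewed = allelic_imbalance_p(ref_reads=9, alt_reads=1)
p_balanced = allelic_imbalance_p(ref_reads=5, alt_reads=5)
```

Because both alleles are measured in the same cell, trans-acting differences affect both counts equally, which is why the test needs only a single individual.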
Causal variants within associated haplotypes should be identified not only for further research, but also for genetic counseling; because of variations in LD patterns, a SNP that marks a risk haplotype efficiently in one population may not in another84. Computational methods that explicitly model ethnic background in admixed populations can increase their power by exploiting their shared ancestry85.
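To make the population dependence of tag SNPs concrete, the sketch below computes the LD statistic r² between a marker and a causal variant from haplotype counts in two populations. The function name and the counts are invented for illustration and do not come from HapMap or any other dataset.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic sites from counts of the four haplotypes.

    D = p(AB) - p(A) * p(B), and r^2 = D^2 / (p(A) q(A) p(B) q(B)).
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_a = (n_AB + n_Ab) / total  # frequency of allele A at the marker
    p_b = (n_AB + n_aB) / total  # frequency of allele B at the causal site
    d = n_AB / total - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("one of the sites is monomorphic")
    return d * d / denom


# Hypothetical population 1: marker and causal allele always travel together.
r2_pop1 = ld_r_squared(n_AB=50, n_Ab=0, n_aB=0, n_ab=50)
# Hypothetical population 2: the same two sites are in linkage equilibrium.
r2_pop2 = ld_r_squared(n_AB=25, n_Ab=25, n_aB=25, n_ab=25)
```

In the first population the marker is a perfect proxy for the causal allele, while in the second it carries no information about it, even though the allele frequencies are identical.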
Haplotype structure and allele frequencies from the HapMap project9 and 1000 Genomes project86 provide evidence of both positive and negative selection currently acting on the human lineage. Although the relative importance of population structure and selective sweeps in recent human history is debated87–89, many non-coding loci show multiple lines of evidence for local adaptation90.
Ultimately, linkage analysis and GWAS are sensitive to complementary genetic architectures, but a wide spectrum of diseases likely exhibit both locus and allelic heterogeneity. Because the genomically-distributed signals of association with complex disease are weak, the potential confounding effects of population stratification and cryptic relatedness become especially important to control. Family-based methods such as linkage analysis and the transmission disequilibrium test (TDT) are free of these complications, and have been combined with association tests in a new class of methods91. In addition, new methods in phylogenomics and ancestral recombination graph reconstruction provide an opportunity to enhance association studies by explicitly taking population structure and region-specific relatedness into account92,93.
Modeling of allele frequency data94,95 and sequence divergence data46 suggests that a large amount of negative selection is occurring outside of mammalian conserved elements, evidence for widespread non-coding function. These same forces can maintain disease-associated alleles at lower frequency in the population dependent on their penetrance and expressivity.
Even when considering genome-wide enrichments of functional annotations in disease-associated regions, the aforementioned methods have so far considered each locus as acting independently and considered their effects as additive. Functional genomics should enable us to consider higher-order interactions between these individual loci, by leveraging functional and variation information to build interaction and regulatory networks. These networks can then guide the search for epistatic effects.
Substantial disagreement exists over the relative importance of epistasis in the genetic basis of complex disease96–98. While genetic interactions have been systematically mapped in yeast99 and cases have been identified in human66, testing for all possible interactions remains impossible; understandably, detecting epistasis in association studies is an area of intense theoretical interest66,100,101. One method102 successfully discovered epistasis between two taste receptor genes affecting nicotine dependence by using a multifactor dimensionality reduction (MDR) method integrated with linkage information from a pedigree disequilibrium test, similar to the hybrid linkage-association studies described previously91.
Some methods propose to limit the search space for interactions by only searching among the most significant independently-associated loci; this approach failed to discover any interactions among the 180 loci reported to be associated with height103. Another proposed limit on the search space is with prior knowledge from gene annotations and protein-protein interactions104–106. Again, epigenomic maps and improved regulatory annotation hold promise for zeroing in on relevant combinations of SNPs that might be expected to interact.
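The arithmetic behind restricting the search space can be sketched directly: the number of pairwise tests grows quadratically with the number of variants, and so does the multiple-testing burden. The 180-locus figure comes from the text above. The one-million-variant panel and the 0.05 family-wise error rate are assumptions chosen for illustration.

```python
from math import comb


def pairwise_tests(n_variants):
    """Number of distinct unordered variant pairs to test for interaction."""
    return comb(n_variants, 2)


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


# Restricting to the 180 height-associated loci mentioned in the text.
pairs_top_loci = pairwise_tests(180)
# A hypothetical genome-wide panel of one million variants.
pairs_genome_wide = pairwise_tests(1_000_000)

threshold_top_loci = bonferroni_threshold(pairs_top_loci)
threshold_genome_wide = bonferroni_threshold(pairs_genome_wide)
```

Any prior that shrinks the candidate set, whether marginal significance, protein-protein interactions, or regulatory annotation, relaxes the required threshold by the same quadratic factor.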
Unlike promoters, enhancers pose the dual challenge of both pinpointing their location in vast nonfunctional sequences, and linking them to their target genes. These distal regulatory elements often interact physically with promoters, and technologies to detect these interactions, such as chromosome conformation capture (3C, Hi-C)107,108 and chromatin interaction paired-end tagging (ChIA-PET)109, are advancing rapidly.
Another way of detecting enhancer-gene relationships is to measure the correlation of these elements’ activity with expression across multiple cell types and conditions. This technique is being used to infer gene regulatory networks in human35 and model organisms99,110. While protein-protein interaction and metabolic networks are the most common types of prior knowledge integrated into existing algorithms, these regulatory networks may provide a more useful starting point in the search for epistasis.
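The correlation approach can be sketched as follows: for one enhancer, compute the Pearson correlation between its activity and each candidate gene's expression across cell types, then rank the genes. This is a minimal sketch that ignores genomic distance and significance testing. The gene names and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two matched vectors of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("constant vector has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical enhancer activity (for example a chromatin mark signal) in four cell types.
enhancer_activity = [0.0, 1.0, 2.0, 3.0]
# Hypothetical expression of two nearby genes in the same four cell types.
gene_expression = {
    "geneA": [1.0, 3.0, 5.0, 7.0],
    "geneB": [4.0, 1.0, 3.0, 2.0],
}
correlations = {g: pearson(enhancer_activity, e) for g, e in gene_expression.items()}
best_target = max(correlations, key=lambda g: abs(correlations[g]))
```

Correlation alone does not give direction or prove physical contact, which is why such links are usefully cross-checked against chromatin interaction data.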
Molecular QTL data discovered from inter-individual variation can also be used to help infer regulatory networks111, which, unlike evidence learned solely from expression patterns, provide unambiguous directionality for causality.
Chemical perturbations of cultured cells have been used for network inference. These experiments are useful not only for their relevance to understanding pharmacological mechanisms, but also for revealing the difference in network topology between normal and cancerous cells112, including gene-gene and gene-drug interactions relevant to interpreting genetic architecture of cancer.
While human genetic history and selective pressures are closely intertwined, model organisms offer an opportunity to measure the global effects of selection and the resulting genetic interactions in a controlled setting113,114. Model organisms have also proven useful for testing gene-gene99 and gene-drug115 interactions on a scale that is impossible in humans.
While genotyping and sequencing are already becoming commonplace for discovery of disease loci and increasingly for diagnostics in a clinical setting, in the future the democratization of genome-wide molecular profiling technologies will further enable cohort-level molecular association studies and personal functional genomics in a medical setting. These can complement existing genetic and chemical biomarkers with molecular-level diagnostics of disease state.
One of the major clinical applications of DNA microarrays was to identify disease-involved genes and to classify disease subtypes by genome-wide expression signatures116, and disease-associated gene sets from microarrays and now RNA-seq can be used to define biological pathways, such as those in the Molecular Signatures Database (MSigDB)117. Similarly, chromatin maps can be compared across lineages or between disease and normal tissue to define sets of regulating loci (Fig. 1d). These sets can be used for enrichment and pathway analysis of GWAS, as described previously.
Microarray-based assays for methylation are now allowing for the first time “epigenome-wide association studies” (EWAS)118, which identify differentially-methylated sites associated with disease without taking into account genotype (Fig. 1d). Such studies may bypass some of the environmental variability that lowers the penetrance of genetic factors119. Integrating family members into EWAS may be especially useful in order to test for imprinting and other parent-of-origin effects.
One important future use of molecular QTLs may be to empower Mendelian randomization studies120,121. Molecular traits - expression, epigenetic state, or biomarkers - can be important stepping stones between genetic variation and complex phenotypes, but the direction of causality can be unclear between the molecular trait and the organismal trait. A recent study used this method to challenge the idea that raising HDL cholesterol levels reduces risk of myocardial infarction, showing that alleles for higher HDL did not convey the genetic protection from heart disease that would be expected if HDL cholesterol were causal122.
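The core calculation in a single-variant Mendelian randomization analysis can be sketched as a Wald ratio: the variant's effect on the disease outcome divided by its effect on the molecular trait. This is a minimal sketch under the usual instrumental-variable assumptions, in particular that the variant influences the outcome only through the trait. The function name and the effect sizes are hypothetical and are not the values from the cited HDL study.

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Estimated causal effect of the exposure on the outcome.

    beta_exposure: per-allele effect of the variant on the molecular trait.
    beta_outcome:  per-allele effect of the same variant on the outcome
                   (for example the log odds of disease).
    """
    if beta_exposure == 0:
        raise ValueError("variant must be associated with the exposure")
    return beta_outcome / beta_exposure


# Hypothetical variant raising a biomarker by 0.5 SD per allele.
# If the biomarker were causal, the variant should also shift disease risk.
causal_like = wald_ratio(beta_exposure=0.5, beta_outcome=0.1)
# Same effect on the biomarker but no effect on disease: estimate of zero.
non_causal_like = wald_ratio(beta_exposure=0.5, beta_outcome=0.0)
```

A ratio near zero despite a clear effect on the biomarker is the pattern that argues against the biomarker being causal.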
Once these regulatory mechanisms are predicted from functional genomics and molecular variation, the next challenge is applying this knowledge to rare variants discovered by whole-genome sequencing (Fig. 2d). A goal for regulatory genomics should be to develop models that predict the effect of novel regulatory variants with the same accuracy as existing methods for novel protein-coding variants.
Some expression signatures of disease subtypes or progression are already being used clinically, and their use promises to grow. However, analogous to the problem of rare variants discovered through sequencing, clinical functional genomics samples will also exhibit patterns too rare in the population to have been correlated with disease. As a recent pilot study on an individual demonstrates123, there are both great power and many challenges associated with interpreting such personal -omics profiling, and new computational models are needed that can generalize from the effects of common genetic and functional variation to personal genetics and functional genomics.
In addition to these conceptual challenges of statistical and computational integration of disparate datasets, each of these topics has relied on extensive data sharing between genomics and medical genetics researchers. However, sharing is still limited due to privacy concerns and informatics challenges of database interoperability. These challenges are even greater for non-genomic datasets such as medical records and drug response, resulting in treasure troves of information remaining unused. To complete the integration of genomics into the drug discovery and target validation pipelines, several additional hurdles need to be overcome:
In order to facilitate integrative analysis, GWAS investigators should report the association of all variants, not just those that are most significant. The editorial board of Nature Genetics recently articulated a policy to this effect124, but concerns remain about sufficiently de-identifying association results in order to protect subject privacy125. Procedures in place at central archives such as the NCBI’s database of Genotypes and Phenotypes (dbGaP) and the European Genome-Phenome Archive (EGA) are crucial to balancing the rights of human subjects with the principles of scientific openness.
The interoperability of databases remains paramount to integrative analysis. Continuing efforts by the UCSC Genome Browser and the ENSEMBL Genome Browser have facilitated integration of epigenomic and variation data, but better connections to domain-specific knowledge bases such as the GTEx eQTL Browser, dbGaP analyses, and the NHGRI GWAS Catalog11 would broaden the scope of connections available to geneticists.
Medical records have been successfully mined to discover epidemiological patterns126, adverse drug reactions127, and disease risk factors and heterogeneity128. As electronic medical records become populated with genetic data, cooperation with clinicians will be needed in order to mine patient data for genetic associations with biomarkers and disease, and discover novel patterns of disease heterogeneity129.
Ultimately, informatics challenges will need to be resolved in order to connect the resulting molecular predictions to patient records, environmental variables, drug screening and response databases, towards enabling genomics as commonplace for clinical practice.
Data from GWAS and whole-genome sequencing continue to expand the catalog of non-coding variants implicated in human disease, and data from epigenome mapping consortia complemented with regulatory modeling are needed to prioritize candidate causal variants and candidate affected tissues. Thoughtful integration of systematic and manual annotations of gene sets along with higher-resolution functional maps may hold the key to implicating pathways and cell types, both through joint consideration of the many weak additive associations discovered in GWAS as well as in the search for epistatic interactions between variants. Clinically relevant regulatory interactions may then be tested experimentally in the tissues or in vitro experimental conditions that are predicted to recapitulate the phenotype. In addition, an explosion of functional genomics data has been facilitated by high-throughput sequencing technology, allowing “intermediate” molecular phenotypes to be correlated with both organismal phenotype and with genotype. This new type of data can be combined with genetic associations to decipher the mechanisms underlying complex disease.
L.D.W. and M.K. were funded by NIH grants R01HG004037 and RC1HG005334 and NSF CAREER grant 0644282.