Nat Biotechnol. Author manuscript; available in PMC 2013 July 7.
Published in final edited form as:
PMCID: PMC3703467
NIHMSID: NIHMS415408

Interpreting non-coding variation in complex disease genetics

Abstract

Association studies provide genome-wide information about the genetic basis of complex disease, but medical research has primarily focused on protein-coding variants, due to the difficulty of interpreting non-coding mutations. This picture has changed with advances in the systematic annotation of functional non-coding elements. Evolutionary conservation, functional genomics, chromatin state, sequence motifs, and molecular quantitative trait loci all provide complementary information about non-coding function. These functional maps can help prioritize variants on risk haplotypes, filter mutations encountered in the clinic, and perform systems-level analyses to reveal processes underlying disease associations. Advances in predictive modeling can enable dataset integration to reveal pathways shared across loci and alleles, and richer regulatory models can guide the search for epistatic interactions. Lastly, new massively parallel reporter experiments can systematically validate regulatory predictions. Ultimately, advances in regulatory and systems genomics can help unleash the value of whole-genome sequencing for personalized genomic risk assessment, diagnosis, and treatment.

Understanding the genetic basis of disease can revolutionize medicine by elucidating relevant biochemical pathways for drug targets and by enabling personalized risk assessments1,2. As technologies have evolved over the past century, geneticists are no longer limited to studying Mendelian disorders and can now tackle complex phenotypes. The associations discovered have broadened from individual variants primarily in coding regions to much richer disease architectures, including non-coding variants, wider allelic spectra, numerous loci, and weak effect sizes (Table 1). In the last few years, a new wave of technological advances has intensified the shift towards tackling more complex genetic architectures and uncovering the molecular mechanisms underlying them.

Table 1
The diversity of genetic architectures underlying human phenotypes.

In the early twentieth century, several metabolic disorders were shown to be genetic and Mendelian, and later positional cloning allowed the identification of many such loci, such as those curated by the Online Mendelian Inheritance in Man database (OMIM)3,4. Starting in the 1980s, linkage analysis was used to correlate the inheritance of traits in families with the inheritance of mapped polymorphic markers which could be assayed through restriction fragment length polymorphism (RFLP) analysis5,6. However, the regions mapped by linkage analysis were necessarily large, and cloning candidate genes for follow-up association studies, resequencing, and functional assays required the application of painstaking molecular techniques before the completion of the Human Genome Project7. In addition, complex phenotypes were not amenable to linkage because of the large sample sizes needed to detect loci with modest effects above the genomic background8. The long haplotype structure of the human genome, and its systematic mapping by the HapMap Project9, has allowed single nucleotide polymorphisms (SNPs) to be used as markers for common haplotypes, which could be genotyped using chip technology. The stage was set for a flood of unbiased, genome-wide association studies (GWAS) to search across unrelated individuals10 for common variants associated with complex disease and diverse molecular phenotypes (Fig. 1, Table 2).

Figure 1
Four types of next-generation association tests

Table 2
Computational tools for association analyses.

Relative to linkage analysis and sequencing, GWAS have less power in cases where different rare mutations act in different families or individuals at the same locus (allelic heterogeneity). However, they are far more sensitive than family studies to complex polygenic associations where a phenotype is associated with the joint effect of many weakly-contributing variants across different loci (locus heterogeneity). In this sense GWAS have been a resounding success, identifying thousands of disease-associated loci for further study11 and revealing previously-unknown mechanisms for diseases such as Crohn’s disease, macular degeneration, and type 2 diabetes2. However, the pursuit of GWAS has also received criticism (Box 1) because of the structure of the knowledge it has been producing relative to the determinism of highly-penetrant Mendelian genetic discoveries2,12,13. The current tension mirrors the intellectual rift in the early 1900s between Mendelians, who modeled inheritance of discrete traits as being carried by single genes, and the biometrician adherents of Galton, who studied the inheritance of continuous traits; the fields were reconciled by R.A. Fisher, who proposed that quantitative traits’ heritability was owed to the contribution of many genes with small effect14,15.

Box 1

Potential and limitations of genome-wide association studies

Although several predominant criticisms of GWAS have been voiced, responses to each can guide future studies.

Cumulative predictive power. Generally, the discovered loci reaching genome-wide significance have weak additive predictive power for specific phenotypes, which limits their clinical relevance for some traits at present130–132. However, risk prediction using the loci discovered for complex disease using GWAS often performs similarly to using classical clinical tests, and has unique properties, such as stability over the lifespan133. Predictors that jointly use hundreds or thousands of weakly-contributing loci have also been shown to explain a larger proportion of variance than was initially appreciated134,135. Integrating these discoveries into clinical protocols is in its infancy, and should be expected to mature.

Non-coding variants with unknown effect. Most of the loci are non-coding and many are far from any known gene, and, because of linkage disequilibrium (LD), each encompasses many variants; therefore, they are not immediately informative or biochemically tractable for experimental work. Assigning a prior probability to the deleteriousness of a non-coding mutation is challenging136. To address this challenge, non-coding sequence is being annotated at a rapid pace through systematic efforts such as the ENCODE Project21 and the Roadmap Epigenomics Mapping Consortium22, and through studies of the impact of common variants on genome-wide molecular phenotypes, discussed below.

Detection of rare variants. Significant loci tend to additively explain only a small proportion of the narrow-sense heritability of phenotypes12, suggesting that rare rather than common variants may underlie their genetics; such variants would only be discovered through whole-exome and whole-genome sequencing or family-based studies13. Many explanations for “hidden heritability” among the discovered common-variant associations have been proposed12. The relative importance of rare and common variants is a topic of intense debate137: some argue that associations with common variants are in fact driven by synthetic associations with large-effect rare variants in long-range LD138, others that common associations of weak effect contribute to heritability well beyond the threshold of statistical significance139, and others that narrow-sense heritability may be overestimated in many twin studies due to epistasis disguised as additivity98.

Reproducibility. GWAS sometimes do not replicate across studies or populations140, leading to reports of false positives and to suspicion about the validity of novel associations, especially when they are non-coding. This could be partly due both to difficulties in imputing genotypes, which will benefit from an increased understanding of common human variation, and to the poor definition of organismal phenotypes140, which can benefit from the molecular disease biomarkers discussed below. Moreover, while the specific loci involved may differ across populations, they may reflect the same underlying molecular pathways, and thus regulatory annotations may be more reproducible across populations. Focusing on molecular phenotypes may improve reproducibility by isolating potential socio-economic or other environmental factors that occur downstream of molecular phenotypes and can strongly affect organismal phenotypes.

In this review, we discuss both the computational challenges and the opportunities presented by the large number of non-coding disease-associated variants being discovered through GWAS and medical resequencing. We first survey the types of regulatory annotations available, including those from functional and comparative genomics as well as quantitative trait loci (QTLs) and allele-specific events, and the ways in which these can be used to dissect disease-associated haplotypes to identify the most promising causal variants at a locus. We then discuss the utility of these regulatory annotations to perform systems-level analysis of GWAS and allelic spectra, revealing relevant cell types and regulatory mechanisms. Finally, we present a variety of bioinformatics hurdles and computational challenges that lie ahead for the field, such as discovering epistatic interactions, connections between molecular and organismal phenotype, and patterns that must be mined from potentially sensitive medical data.

Systematic annotation of the non-coding genome

Interpretation of the molecular mechanisms of disease-associated loci can be a great challenge. Even though protein biochemistry has been used to characterize missense and nonsense coding mutations that most often underlie monogenic traits, the frequency with which loss-of-function mutations and rare coding variants are being discovered in healthy individuals16,17 suggests our understanding is far from complete. The challenge of interpretation is even greater for non-coding variants, given the diversity of non-coding functions, the incomplete annotation of regulatory elements, and potentially still unknown mechanisms of regulatory control. Several pioneering studies have provided a model for the types of systematic regulatory annotations needed, by revealing the diverse mechanisms of action underlying human disease, including at the transcriptional, splicing, and translational level (Table 3).

Table 3
Mechanisms through which non-coding variants influence human disease.

In each of these cases, extensive experimental follow-up was needed to uncover the molecular mechanisms responsible for the disease association signal, and many more disease-associated variants remain uncharacterized, emphasizing the need for systematic methods for annotating regulatory regions, their functional nucleotides, and their interconnections.

Recognizing the need for systematic interpretation of non-coding disease-associated variants, several large-scale projects are currently underway to enhance the annotation of the non-coding genome (Fig. 2). These rely on reference annotation maps using both functional genomics and comparative genomics, and can dramatically increase the annotation of regulatory elements, which can have a strong impact for interpreting both existing GWAS and individual personal genomes.

Figure 2
Dissecting haplotypes discovered through association tests

Reference functional genomics and chromatin state maps

Massively parallel short-read sequencing technologies have obviated the need for the extremely expensive tiling microarrays previously used to map biochemically active regions of the human genome. This has enabled chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) applied to map transcription factor binding, chromatin regulators, or histone modification marks18, mapping of DNA methylation using bisulfite sequencing (BS-Seq)19 and mapping of accessible chromatin regions by DNase hypersensitivity analysis (DNase-Seq)20. Computational integration of these datasets through supervised or unsupervised machine learning enables mapping of functional non-coding elements such as distal enhancers, transcription factor binding sites, and regulatory RNA genes on a genome-wide scale. For example, the Encyclopedia of DNA Elements (ENCODE) project is releasing comprehensive maps of chromatin states, TF binding, and transcription for a selection of cell lines and DNase maps for many primary cells21, and the NIH Epigenomics Roadmap Project22 and BluePrint project23 both aim to construct reference epigenome maps of hundreds of primary cells and cultured cells. Regulatory maps can then guide the way towards the most likely causal regulators on a haplotype (Fig. 2a).

Nucleotide-resolution regulatory annotations

While maps of regulatory regions can be highly informative, increasing their resolution from hundreds of nucleotides to single nucleotides requires additional computational or experimental developments. This can leverage systematic efforts that seek to elucidate the binding specificities of transcription factors24,25 and splicing regulators26,27, and also to discover regulatory motifs genome-wide based on their enrichment and conservation properties28,29. Similarly, new technologies have been applied to enhance existing techniques, such as digital genomic footprinting using DNase-seq30, dynamic application of micrococcal nuclease (MNase)31, or the use of lambda exonuclease (ChIP-exo)32, dramatically increasing the mapping resolution of regulatory elements even without knowledge of the specific motifs involved.

Predictive models of variant effects

Even when the functional elements and motifs are known, we need models to distinguish how mutations in different positions of a regulatory motif or element will affect its function. These models can be used to distinguish silent from deleterious mutations, as is possible within protein-coding regions. This requires integrative models of sequence motifs, chromatin state, and expression patterns24,33–36, which can be trained on experimentally tractable tissues or through in vitro experiments and applied to predict the effect of newly-observed rare and private mutations. The massive scale of regulatory predictions, encompassing hundreds of regulators and millions of regulatory motif instances, demands correspondingly massively parallel methods to validate them. Such methods, which exploit emerging large-scale synthesis and sequencing technologies, are being developed both in model organisms and cultured human cells37–39, and enable testing mechanistic hypotheses about causal variants at unprecedented scales (Fig. 2b).
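To make the idea of position-dependent effects concrete, the following minimal Python sketch scores a single-nucleotide substitution against a position weight matrix and reports the change in log-odds score. The motif counts, the pseudocount, the uniform background, and all function names are hypothetical and chosen only for illustration. A weight matrix captures only the sequence-motif component of the integrative models described above, not chromatin state or expression.

```python
import math

# Hypothetical base counts for a 4-bp motif, one dict per motif position.
# These numbers are invented for illustration and describe no real factor.
COUNTS = [
    {"A": 18, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 19, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 18, "T": 1},
    {"A": 2, "C": 2, "G": 1, "T": 15},
]


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Convert base counts to log2 odds against a uniform background (assumed)."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def score(site, matrix):
    """Sum the per-position log-odds for a site as long as the motif."""
    return sum(matrix[i][base] for i, base in enumerate(site))


def variant_effect(ref_site, position, alt_base, matrix):
    """Score change (alt minus ref) for a substitution at a 0-based position."""
    alt_site = ref_site[:position] + alt_base + ref_site[position + 1:]
    return score(alt_site, matrix) - score(ref_site, matrix)


MATRIX = log_odds_matrix(COUNTS)
# A substitution at a strongly constrained position costs more than one
# at a weakly constrained position of the same motif instance.
strong = variant_effect("ACGT", 1, "T", MATRIX)
weak = variant_effect("ACGT", 3, "A", MATRIX)
```

In this toy matrix the substitution at the second position lowers the score more than the one at the fourth position, which is the kind of distinction between near-silent and disruptive changes that a predictive model would need to make.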

Comparative genomics between related species

Even when a regulatory element is rarely used and its activity unobserved in the cell types and tissues sampled, its effect on fitness can still be recognized based on its preferential conservation across multiple related species. Genome-wide comparative analysis of many mammals has revealed a high-resolution map of constrained elements spanning 4.5% of the human genome40,41, revealing millions of likely new elements, including individual transcription factor binding sites, whose nucleotides have been preserved across evolutionary time. Beyond the overall level of evolutionary constraint, the specific evolutionary signatures encoded in the patterns of substitutions, insertions and deletions across related species can provide information about the type of molecular function likely encoded by the constrained elements41–44. Together, constraint and evolutionary signatures can pinpoint functional transcription factor binding motifs and individual binding sites (Fig. 2c), non-coding RNA genes and structures, microRNAs and their targets, and yet uncharacterized sequence elements that confer a selective advantage.

Evolutionarily conserved biochemical activity

Even in the absence of conserved sequence, the conservation of biochemical activity can indicate conserved functional elements whose sequence features are not detectable by traditional alignment and constraint measures due to turnover45,46. Because some fraction of protein binding and RNA transcription may be nonfunctional “noise,” cross-species analysis of transcription factor binding47 or gene expression48 can help reveal the subset of elements that are most likely to be functional. However, lineage-specific elements may nevertheless be important and are not captured through this method.

Interpreting variants using functional genomic annotations

For protein-coding mutations, knowledge of protein structure and function, and the unambiguous nature of the genetic code, have allowed the development of a class of predictive algorithms that can score the severity of missense and nonsense variants49–52. Reference annotations are needed to bring functional datasets to bear on understanding the molecular roles of disease-associated common variants in individual regions, especially for non-coding variants (Fig. 2). In addition, new methods are needed to define the relationship between global genetic architectures and genome-wide functional landscapes.

Tools for prioritizing variants

An immediate concern for practitioners of GWAS is the interpretation and prioritization of non-coding variants53. A number of resources, including HaploReg54 (L.D.W. and M.K.), RegulomeDB55, and ENSEMBL’s Variant Effect Predictor56, aim to annotate non-coding common variants from association studies using conservation, functional genomics, and regulatory motif data. Tools such as ANNOVAR57 and VAAST58 are specialized for annotating whole-genome/exome sequencing data, and leverage population-level negative selection to identify extremely rare coding alleles that are most likely to be functional. None of these tools presently brings together all of the available annotation resources listed in the previous section, however, and they will need to be continuously updated to reflect the exponential growth of regulatory knowledge (Table 4).

Table 4
Comparison of recent tools to systematically annotate variants

Gene set enrichment analysis

Prior knowledge of gene interrelationships has been leveraged in studies of gene expression to discover differentially-regulated pathways even where single genes in those pathways change expression too little to rise to statistical significance59. These methods for gene set enrichment analysis (GSEA) are being applied to GWAS, where similarly, genetic risk is expected to be concentrated along biological pathways and multiple testing diminishes the statistical significance of associations considered individually. Dozens of methods have been developed to use prior knowledge from gene functional annotation databases to perform pathway analysis on GWAS60,61 (Fig. 3a).
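As a rough illustration of pathway analysis on GWAS results, the sketch below asks whether genes near associated loci overlap a pathway more often than expected by chance. It uses a simple hypergeometric over-representation test, not the rank-based running-sum statistic of GSEA itself, and it ignores gene length and LD, which published methods must correct for. The gene counts in the example are invented.

```python
from math import comb


def hypergeom_upper_tail(overlap, n_genes, n_pathway, n_selected):
    """P(X >= overlap) when n_selected genes are drawn without replacement
    from n_genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_selected)
    favourable = sum(
        comb(n_pathway, i) * comb(n_genes - n_pathway, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(n_genes, n_selected)


# Hypothetical numbers: 20,000 genes in the genome, a 100-gene pathway,
# 200 genes near GWAS loci, 8 of which fall in the pathway
# (about 1 would be expected by chance).
p_value = hypergeom_upper_tail(8, 20_000, 100, 200)
```

Exact integer arithmetic keeps the sketch dependency-free; for many pathways, a multiple-testing correction across pathways would also be needed.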

Figure 3
Systems-level analyses beyond isolated common haplotypes. (a) Gene-based enrichment analysis of genetic architecture

Regulatory element enrichment analysis

A recent study used chromatin state maps to discover an enrichment of cell type-specific enhancers among the top associations in several GWAS62 (L.D.W., M.K., and colleagues), demonstrating the utility of high-resolution functional genomics maps to serve as a type of pathway annotation. Similar results have been seen using DNase hypersensitivity maps across a large number of cell types63, and by examining concordance between expression quantitative trait loci (eQTLs) and GWAS64,65. These approaches have demonstrated the power of reference epigenomes to identify relevant tissues for further study (Fig. 3b). Another way to use prior knowledge about variant function is to incorporate the information into the association study itself through Bayesian methods61,66–69 or using boosting to prioritize disease networks70. However, it is difficult to evaluate the utility of these weighting schemes, which essentially discard loci about which there is the least functional data.
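The enrichment of associated variants in a set of regulatory elements can be sketched as a fold enrichment over a background set of variants. This is a simplified stand-in for the published analyses, which additionally match background variants on properties such as allele frequency and account for LD. The coordinates are invented, the intervals are assumed to be non-overlapping and on a single chromosome, and the function names are hypothetical.

```python
from bisect import bisect_right


def overlaps(position, starts, ends):
    """True if position lies in one of the sorted half-open [start, end) intervals."""
    i = bisect_right(starts, position) - 1
    return i >= 0 and position < ends[i]


def fold_enrichment(test_positions, background_positions, intervals):
    """Fraction of test variants inside the intervals divided by the
    same fraction for background variants."""
    intervals = sorted(intervals)
    starts = [start for start, _ in intervals]
    ends = [end for _, end in intervals]

    def fraction(positions):
        return sum(overlaps(p, starts, ends) for p in positions) / len(positions)

    background_fraction = fraction(background_positions)
    if background_fraction == 0:
        raise ValueError("no background variant overlaps the annotation")
    return fraction(test_positions) / background_fraction


# Toy example: enhancers active in one cell type, GWAS lead SNPs,
# and a background set of genotyped SNPs (all coordinates invented).
ENHANCERS = [(100, 200), (500, 600)]
GWAS_SNPS = [150, 550, 560, 900]
BACKGROUND_SNPS = [10, 150, 300, 400, 700, 800, 900, 950]
enrichment = fold_enrichment(GWAS_SNPS, BACKGROUND_SNPS, ENHANCERS)
```

Repeating the calculation with enhancer maps from different cell types, and comparing the resulting enrichments, is one simple way to ask which tissues are most relevant to a trait.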

Burden tests; dealing with heterogeneity

For potentially causal rare variants discovered through whole-genome sequencing, a class of techniques has been developed that deal successfully with allelic heterogeneity and low allele frequencies by pooling mutations across individuals by genes, pathways, or other functional annotations and filters71; the additional use of functional genomic maps has recently been proposed72. Improved annotation of non-coding regions will obviously empower this type of analysis (Fig. 3c).
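One of the simplest members of this class is a collapsing test: each individual is reduced to a carrier or non-carrier of any rare variant in a unit such as a gene, and carrier counts are compared between cases and controls. The sketch below pairs that collapsing step with a one-sided Fisher exact test. The genotype matrices are invented and the choice of test is an assumption made for illustration, not the specific method of the cited work.

```python
from math import comb


def count_carriers(genotypes, rare_variant_indices):
    """Number of individuals carrying at least one rare allele in the unit.
    genotypes: one list of allele counts (0, 1, or 2) per individual."""
    return sum(
        any(individual[i] > 0 for i in rare_variant_indices)
        for individual in genotypes
    )


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """P(at least case_carriers carriers among cases) under random labels."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    favourable = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return favourable / comb(total, n_cases)


# Toy data: six cases each carry a different private variant in the same
# gene (allelic heterogeneity); six controls carry none.
CASES = [[1 if j == i else 0 for j in range(6)] for i in range(6)]
CONTROLS = [[0] * 6 for _ in range(6)]
VARIANTS = range(6)

p_value = fisher_one_sided(
    count_carriers(CASES, VARIANTS), len(CASES),
    count_carriers(CONTROLS, VARIANTS), len(CONTROLS),
)
```

Each variant here is observed exactly once, so no single-variant test could reach significance, yet the pooled carrier count separates cases from controls. Restricting `VARIANTS` to those falling in annotated regulatory elements is where functional maps would enter.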

Table 5 lists examples of new insights from computational methods integrating regulatory elements with GWAS.

Table 5
Examples of regulatory enrichment analyses of genetic associations.

Interpreting variants using population variation in molecular phenotypes

Although we have so far discussed regulatory annotations from reference cell lines, biochemical activity is itself genotype-dependent, and thus a single reference annotation fails to capture the complexity of the regulatory genome. Moreover, we have treated LD as a property of the human genome, while it is in fact population-specific, and patterns of LD and selection have varied across both geography and time. This increased complexity can in fact be leveraged to gain additional insights into genome regulation, and provide additional power for the aforementioned analyses.

Genotype-associated molecular activity

Two powerful tools have emerged to identify non-coding loci that affect molecular phenotypes: association studies and allele-specificity studies. Association studies (Fig. 1b) have been used to discover non-coding cis regulators of methylation (meQTLs)73, DNase I sensitivity (dsQTLs)74, transcription factor binding75, gene expression (eQTLs)76, and alternative splicing77. In the same manner as GWAS on organism-level quantitative traits, these studies consider a phenotype associated with a particular genomic locus (such as steady-state mRNA level corresponding to a gene) in the same cell type isolated across unrelated individuals, and search for genetic regulators of those molecular processes. A recent related study used eQTL data to reveal selective signatures of epistasis between deleterious coding variants and the regulatory variants that modulate their penetrance78, a method which should be broadly applicable to testing hypotheses about cis regulatory interactions from genomics models.
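At its core, a cis-eQTL test of this kind regresses a molecular phenotype on genotype at a nearby variant. The following sketch fits an ordinary least-squares line of expression on allele dosage and returns the slope and its t-statistic. The data are invented, and real analyses would add covariates, expression normalization, and genome-wide multiple-testing control, none of which is shown here.

```python
import math


def dosage_regression(dosage, expression):
    """Least-squares fit of expression on allele dosage (0, 1, or 2).
    Returns (slope, intercept, t_statistic of the slope)."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum(
        (y - intercept - slope * x) ** 2 for x, y in zip(dosage, expression)
    )
    standard_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, intercept, slope / standard_error


# Invented example: nine unrelated individuals, one SNP, one gene.
DOSAGE = [0, 0, 0, 1, 1, 1, 2, 2, 2]
EXPRESSION = [5.1, 4.9, 5.0, 6.0, 6.2, 5.8, 7.1, 6.9, 7.0]
slope, intercept, t_statistic = dosage_regression(DOSAGE, EXPRESSION)
```

The same template applies to the other molecular traits listed above, with methylation level, DNase I sensitivity, or binding signal in place of expression.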

Allele-specific activity

In contrast, allele-specificity tests examine heterozygous sites in individuals and look for a skew in the molecular signal towards one of the alleles (Fig. 1c). Allele-specific methylation79, histone modification80, DNase I sensitivity81, protein binding82, and expression83 have been surveyed genome-wide. While association studies have the advantage of identifying regulatory variants that may be acting at some genetic distance from the regulated locus, and can include homozygous individuals in the sample, allele-specific studies can be performed on single individuals, and inherently control for possible trans-regulatory differences caused by individuals’ genetic background.
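The skew at a heterozygous site can be tested, in its simplest form, with an exact binomial test of the read counts for the two alleles against an expected 50:50 ratio. The sketch below assumes independent reads and no bias toward the reference allele during alignment, which is a simplification that practical pipelines must address. The read counts are invented.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads):
    """Two-sided exact binomial p-value for a 50:50 allelic ratio."""
    n = ref_reads + alt_reads
    k = max(ref_reads, alt_reads)
    # With p = 0.5 the distribution is symmetric, so the two-sided
    # p-value is twice the upper tail, capped at 1.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


balanced = allelic_imbalance_p(10, 10)  # no evidence of skew
skewed = allelic_imbalance_p(30, 5)     # strong skew towards one allele
```

Because both alleles are measured in the same cells, a significant skew points to a cis-acting difference, consistent with the control for trans effects described above.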

Importance of population-specific effects

Causal variants within associated haplotypes should be identified not only for further research, but also for genetic counseling; because of variations in LD patterns, a SNP that marks a risk haplotype efficiently in one population may not in another84. Computational methods that explicitly model ethnic background in admixed populations can increase their power by exploiting their shared ancestry85.
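The dependence of tagging on population can be illustrated by computing the LD statistic r² between a marker SNP and a causal variant separately in two populations. The haplotypes below are invented, and phased data are assumed to keep the calculation short.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic sites from phased haplotypes.
    haplotypes: list of (allele_at_marker, allele_at_causal), coded 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denominator


# Hypothetical population 1: marker and causal allele always co-occur.
POPULATION_1 = [(1, 1)] * 5 + [(0, 0)] * 5
# Hypothetical population 2: recombination has largely broken the association.
POPULATION_2 = [(1, 1)] * 3 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 3

r2_pop1 = r_squared(POPULATION_1)
r2_pop2 = r_squared(POPULATION_2)
```

In the first toy population the marker is a perfect proxy for the causal variant, while in the second it carries almost no information about it, which is why a risk estimate based on the marker alone may not transfer.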

Population differentiation and positive selection

Haplotype structure and allele frequencies from the HapMap project9 and 1000 Genomes project86 provide evidence of both positive and negative selection currently acting on the human lineage. Although the relative importance of population structure and selective sweeps in recent human history is debated87–89, many non-coding loci show multiple lines of evidence for local adaptation90.

Utilizing population structure and relatedness

Ultimately, linkage analysis and GWAS are sensitive to complementary genetic architectures, but a wide spectrum of diseases likely exhibit both locus and allele heterogeneity. Because the genomically-distributed signals of association with complex disease are weak, the potential confounding effects of population stratification and cryptic relatedness become especially important to control. Family-based methods such as linkage analysis and the transmission disequilibrium test (TDT) are free of these complications, and have been combined with association tests in a new class of methods91. In addition, new methods in phylogenomics and ancestral recombination graph reconstruction provide an opportunity to enhance association studies by explicitly taking population structure and region-specific relatedness into account92,93.
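The robustness of the TDT to stratification comes from comparing alleles within families. A minimal sketch of the classical statistic follows: among heterozygous parents of affected offspring, the number of times the candidate allele is transmitted is compared with the number of times it is not. The counts are invented.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test from heterozygous-parent counts.
    Returns (chi-square statistic with 1 degree of freedom, p-value)."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    statistic = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: the allele is transmitted 60 times and
# not transmitted 40 times by heterozygous parents.
statistic, p_value = tdt(60, 40)
```

Only heterozygous parents are informative, so the test trades sample size for protection against confounding by ancestry.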

Aggregate measures of purifying selection

Modeling of allele frequency data94,95 and sequence divergence data46 suggests that a large amount of negative selection is occurring outside of mammalian conserved elements, which is evidence for widespread non-coding function. These same forces can maintain disease-associated alleles at lower frequency in the population, depending on their penetrance and expressivity.

Identifying higher-order relationships between variants

Even when considering genome-wide enrichments of functional annotations in disease-associated regions, the aforementioned methods have so far considered each locus as acting independently and considered their effects as additive. Functional genomics should enable us to consider higher-order interactions between these individual loci, by leveraging functional and variation information to build interaction and regulatory networks. These networks can then guide the search for epistatic effects.

Detecting epistasis de novo

Substantial disagreement exists over the relative importance of epistasis in the genetic basis of complex disease96–98. While genetic interactions have been systematically mapped in yeast99 and cases have been identified in humans66, testing for all possible interactions remains impossible; understandably, detecting epistasis in association studies is an area of intense theoretical interest66,100,101. One method102 successfully discovered epistasis between two taste receptor genes affecting nicotine dependence by using a multifactor dimensionality reduction (MDR) method integrated with linkage information from a pedigree disequilibrium test, similar to the hybrid linkage-association studies described previously91.

Guiding search for epistasis

Some methods propose to limit the search space for interactions by only searching among the most significant independently-associated loci; this method failed to discover any interactions among the 180 loci reported to be associated with height103. Another proposal is to limit the search space using prior knowledge from gene annotations and protein-protein interactions104–106. Again, epigenomic maps and improved regulatory annotation hold promise for zeroing in on relevant combinations of SNPs that might be expected to interact.
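The motivation for restricting the search becomes clear from the arithmetic of pairwise testing. The sketch below counts the pairs and the corresponding Bonferroni-corrected significance threshold for the 180 height loci mentioned above and, as an assumed contrast, for an exhaustive scan of one million genotyped SNPs. The one-million figure and the 0.05 family-wise error rate are illustrative assumptions.

```python
from math import comb


def pairwise_tests(n_variants):
    """Number of distinct unordered variant pairs."""
    return comb(n_variants, 2)


def bonferroni_threshold(n_variants, alpha=0.05):
    """Per-test p-value threshold for a family-wise error rate of alpha."""
    return alpha / pairwise_tests(n_variants)


restricted_pairs = pairwise_tests(180)          # top associated loci only
exhaustive_pairs = pairwise_tests(1_000_000)    # assumed array size
restricted_threshold = bonferroni_threshold(180)
exhaustive_threshold = bonferroni_threshold(1_000_000)
```

Restricting to 180 loci gives 16,110 pairs, against roughly 5 × 10^11 for the exhaustive scan, so the per-test threshold differs by more than seven orders of magnitude. Higher-order interactions grow faster still.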

Linking enhancers to their target genes using physical interaction data

Unlike promoters, enhancers pose the dual challenge of both pinpointing their location in vast nonfunctional sequences, and linking them to their target genes. These distal regulatory elements often interact physically with promoters, and technologies to detect these interactions, such as chromatin conformation capture (3C, Hi-C)107,108 and chromatin interaction paired-end tagging (ChIA-PET)109 are advancing rapidly.

Linking enhancers to their target genes using cell-to-cell variability

Another way of detecting enhancer-gene relationships is to measure the correlation of these elements’ activity with expression across multiple cell types and conditions. This technique is being used to infer gene regulatory networks in humans35 and model organisms99,110. While protein-protein interaction and metabolic networks are the most common types of prior knowledge integrated into existing algorithms, these regulatory networks may provide a more useful starting point in the search for epistasis.
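A minimal version of this correlation-based linking is sketched below: an enhancer's activity profile across cell types is compared with the expression profile of each nearby gene, and the best-correlated gene is proposed as the target. Pearson correlation, the gene names, and all values are assumptions for illustration; published approaches also weigh genomic distance and assess significance.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def best_target(enhancer_activity, candidate_expression):
    """Name of the candidate gene whose expression tracks the enhancer best."""
    return max(
        candidate_expression,
        key=lambda gene: pearson(enhancer_activity, candidate_expression[gene]),
    )


# Invented profiles across five cell types.
ENHANCER = [0.1, 0.9, 0.8, 0.2, 0.05]
NEARBY_GENES = {
    "GENE_A": [1.0, 8.5, 7.9, 2.1, 0.7],  # tracks the enhancer
    "GENE_B": [5.0, 5.2, 4.9, 5.1, 5.0],  # roughly constant
    "GENE_C": [9.0, 1.0, 2.0, 8.0, 9.5],  # anti-correlated
}
target = best_target(ENHANCER, NEARBY_GENES)
```

Correlation alone cannot establish direction or rule out a shared upstream regulator, so links of this kind are best treated as hypotheses to be checked against physical interaction or QTL data.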

Inferring networks from individual-to-individual variability

Molecular QTL data discovered from inter-individual variation can also be used to help infer regulatory networks111, which, unlike networks learned solely from expression patterns, provide unambiguous directionality for causality.

Inferring networks from systematic perturbations

Chemical perturbations of cultured cells have been used for network inference. These experiments are useful not only for their relevance to understanding pharmacological mechanisms, but also for revealing the difference in network topology between normal and cancerous cells112, including gene-gene and gene-drug interactions relevant to interpreting genetic architecture of cancer.

Artificial selection and drug response experiments in model organisms

While human genetic history and selective pressures are closely intertwined, model organisms offer an opportunity to measure the global effects of selection and the resulting genetic interactions in a controlled setting113,114. Model organisms have also proven useful for testing gene-gene99 and gene-drug115 interactions on a scale that is impossible in humans.

Functional genomics in a medical setting

While genotyping and sequencing are already becoming commonplace for discovery of disease loci and increasingly for diagnostics in a clinical setting, in the future the democratization of genome-wide molecular profiling technologies will further enable cohort-level molecular association studies and personal functional genomics in a medical setting. These can complement existing genetic and chemical biomarkers with molecular-level diagnostics of disease state.

Functional genomics of disease cohorts

One of the major clinical applications of DNA microarrays was to identify disease-involved genes and to classify disease subtypes by genome-wide expression signatures116, and disease-associated gene sets from microarrays and now RNA-seq can be used to define biological pathways, such as those in the Molecular Signatures Database (MSigDB)117. Similarly, chromatin maps can be compared across lineages or between disease and normal tissue to define sets of regulating loci (Fig. 1d). These sets can be used for enrichment and pathway analysis of GWAS, as described previously.

Epigenome-phenotype association

Microarray-based assays for methylation now allow, for the first time, “epigenome-wide association studies” (EWAS)118, which identify differentially-methylated sites associated with disease without taking into account genotype (Fig. 1d). Such studies may bypass some of the environmental variability that lowers the penetrance of genetic factors119. Integrating family members into EWAS may be especially useful in order to test for imprinting and other parent-of-origin effects.

Genetic association with molecular phenotypes for determining causality

One important future use of molecular QTLs may be to empower Mendelian randomization studies120,121. Molecular traits (expression, epigenetic state, or biomarkers) can be important stepping stones between genetic variation and complex phenotypes, but the direction of causality between the molecular trait and the organismal trait can be unclear. A recent study used this method to challenge the idea that raising HDL cholesterol levels reduces risk of myocardial infarction, showing that alleles for higher HDL did not confer the genetic protection from heart disease that would be expected if HDL cholesterol were causal122.
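The logic of Mendelian randomization with a single genetic instrument can be written as a ratio of two association estimates: the variant's effect on the outcome divided by its effect on the molecular trait. The sketch below computes this Wald ratio with a first-order standard error. It assumes the variant affects the outcome only through the trait, and every number is invented rather than taken from the cited study.

```python
def wald_ratio(beta_outcome, se_outcome, beta_exposure):
    """Causal effect of exposure on outcome implied by one instrument.
    Returns (estimate, approximate standard error); the standard error
    ignores uncertainty in beta_exposure (first-order approximation)."""
    if beta_exposure == 0:
        raise ValueError("the variant must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    standard_error = se_outcome / abs(beta_exposure)
    return estimate, standard_error


# Hypothetical variant raising a biomarker by 0.3 units per allele.
# Scenario 1: the variant also shifts disease risk (log odds 0.06).
causal_like = wald_ratio(0.06, 0.02, 0.3)
# Scenario 2: the variant has no detectable effect on disease risk.
null_like = wald_ratio(0.0, 0.02, 0.3)
```

An estimate near zero despite a clear effect on the biomarker, as in the second scenario, is the pattern that argues against the biomarker being causal.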

Predicting molecular consequences of rare and private mutations

Once these regulatory mechanisms are predicted from functional genomics and molecular variation, the next challenge is applying this knowledge to rare variants discovered by whole-genome sequencing (Fig. 2d). A goal for regulatory genomics should be to develop models that predict the effect of novel regulatory variants with the same accuracy as existing methods for novel protein-coding variants.

Functional genomics of individuals

Some expression signatures of disease subtypes or progression are already being used clinically, and their use promises to grow. However, analogous to the problem of rare variants discovered through sequencing, clinical functional genomics samples will also exhibit patterns too rare in the population to have been correlated with disease. As a recent pilot study on an individual demonstrates123, interpreting such personal -omics profiling offers great power but also poses many challenges, and new computational models are needed that can generalize from the effects of common genetic and functional variation to personal genetics and functional genomics.

Hurdles in biomedical informatics and interoperability

In addition to these conceptual challenges of statistical and computational integration of disparate datasets, each of these topics has relied on extensive data sharing between genomics and medical genetics researchers. However, sharing is still limited due to privacy concerns and informatics challenges of database interoperability. These challenges are even greater for non-genomic datasets such as medical records and drug response, resulting in treasure troves of information remaining unused. To complete the integration of genomics into the drug discovery and target validation pipelines, several additional hurdles need to be overcome:

GWAS P-value sharing

In order to facilitate integrative analysis, GWAS investigators should report the association of all variants, not just those that are most significant. The editorial board of Nature Genetics recently articulated a policy to this effect124, but concerns remain about sufficiently de-identifying association results in order to protect subject privacy125. Procedures in place at central archives such as the NCBI’s database of Genotypes and Phenotypes (dbGaP) and the European Genome-Phenome Archive (EGA) are crucial to balancing the rights of human subjects with the principles of scientific openness.

Database integration

The interoperability of databases remains paramount to integrative analysis. Continuing efforts by the UCSC Genome Browser and the Ensembl Genome Browser have facilitated integration of epigenomic and variation data, but better connections to domain-specific knowledge bases such as the GTEx eQTL Browser, dbGaP analyses, and the NHGRI GWAS Catalog11 would broaden the scope of connections available to geneticists.

Medical record standardization

Medical records have been successfully mined to discover epidemiological patterns126, adverse drug reactions127, and disease risk factors and heterogeneity128. As electronic medical records become populated with genetic data, cooperation with clinicians will be needed in order to mine patient data for genetic associations with biomarkers and disease, and discover novel patterns of disease heterogeneity129.

Integration of medical and pharmacogenomics datasets

Ultimately, informatics challenges will need to be resolved in order to connect the resulting molecular predictions to patient records, environmental variables, and drug screening and response databases, so that genomics can become commonplace in clinical practice.

CONCLUSIONS

Data from GWAS and whole-genome sequencing continue to expand the catalog of non-coding variants implicated in human disease, and data from epigenome mapping consortia, complemented with regulatory modeling, are needed to prioritize candidate causal variants and candidate affected tissues. Thoughtful integration of systematic and manual annotations of gene sets along with higher-resolution functional maps may hold the key to implicating pathways and cell types, both through joint consideration of the many weak additive associations discovered in GWAS and through the search for epistatic interactions between variants. Clinically relevant regulatory interactions may then be tested experimentally in the tissues or in vitro experimental conditions that are predicted to recapitulate the phenotype. In addition, an explosion of functional genomics data has been facilitated by high-throughput sequencing technology, allowing “intermediate” molecular phenotypes to be correlated with both organismal phenotype and genotype. This new type of data can be combined with genetic associations to decipher the mechanisms underlying complex disease.

Acknowledgments

L.D.W. and M.K. were funded by NIH grants R01HG004037 and RC1HG005334 and NSF CAREER grant 0644282.

References

1. Collins F. Has the revolution arrived? Nature. 2010;464:674–675. [PubMed]
2. Lander ES. Initial impact of the sequencing of the human genome. Nature. 2011;470:187–197. [PubMed]
3. Botstein D, Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nature Genetics. 2003;33:228–237. [PubMed]
4. Hamosh A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research. 2004;33:D514–D517. [PMC free article] [PubMed]
5. Botstein D, White RL, Skolnick M, Davis RW. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet. 1980;32:314–331. [PubMed]
6. Lander ES, Botstein D. Mapping Mendelian Factors Underlying Quantitative Traits Using RFLP Linkage Maps. Genetics. 1989;121:185–199. [PubMed]
7. Watson JD. The Human Genome Project: Past, Present, and Future. Science. 1990;248:44–49. [PubMed]
8. Lander E, Kruglyak L. Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet. 1995;11:241–247. [PubMed]
9. Gibbs RA, et al. The International HapMap Project. Nature. 2003;426:789–796. [PubMed]
10. McCarthy MI, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Reviews Genetics. 2008;9:356–369. [PubMed]
11. Hindorff LA, et al. Potential Etiologic and Functional Implications of Genome-Wide Association Loci for Human Diseases and Traits. PNAS. 2009;106:9362–9367. [PubMed]
12. Manolio TA, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753. [PMC free article] [PubMed]
13. Cirulli ET, Goldstein DB. Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nature Reviews Genetics. 2010;11:415–425. [PubMed]
14. Fisher R. The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society Edinburgh. 1918;52:399–433.
15. Visscher PM, McEvoy B, Yang J. From Galton to GWAS: Quantitative Genetics of Human Height. Genetics Research. 2010;92:371–379. [PubMed]
16. MacArthur DG, et al. A Systematic Survey of Loss-of-Function Variants in Human Protein-Coding Genes. Science. 2012;335:823–828. [PMC free article] [PubMed]
17. Nelson MR, et al. An Abundance of Rare Functional Variants in 202 Drug Target Genes Sequenced in 14,002 People. Science. 2012 doi: 10.1126/science.1217876. [PubMed] [Cross Ref]
18. Park PJ. ChIP–seq: advantages and challenges of a maturing technology. Nature Reviews Genetics. 2009;10:669–680. [PMC free article] [PubMed]
19. Meissner A, et al. Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucl Acids Res. 2005;33:5868–5877. [PMC free article] [PubMed]
20. Boyle AP, et al. High-Resolution Mapping and Characterization of Open Chromatin across the Genome. Cell. 2008;132:311–322. [PMC free article] [PubMed]
21. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. [PMC free article] [PubMed]
22. Bernstein BE, et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol. 2010;28:1045–1048. [PMC free article] [PubMed]
23. Adams D, et al. BLUEPRINT to decode the epigenetic signature written in blood. Nature Biotechnology. 2012;30:224–226. [PubMed]
24. Bussemaker HJ, Foat BC, Ward LD. Predictive Modeling of Genome-Wide mRNA Expression: From Modules to Molecules. Annual Review of Biophysics and Biomolecular Structure. 2007;36:329–347. [PubMed]
25. Tompa M, et al. Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology. 2005;23:137–144. [PubMed]
26. Barash Y, et al. Deciphering the splicing code. Nature. 2010;465:53–59. [PubMed]
27. Wang Z, Burge CB. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA. 2008;14:802–813. [PubMed]
28. Xie X, et al. Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature. 2005;434:338–345. [PMC free article] [PubMed]
29. Moses A, Chiang D, Pollard D, Iyer V, Eisen M. MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model. Genome Biology. 2004;5:R98. [PMC free article] [PubMed]
30. Hesselberth JR, et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nature Methods. 2009;6:283–289. [PMC free article] [PubMed]
31. Henikoff JG, Belsky JA, Krassovsky K, MacAlpine DM, Henikoff S. Epigenome characterization at single base-pair resolution. PNAS. 2011;108:18318–18323. [PubMed]
32. Rhee HS, Pugh BF. Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell. 2011;147:1408–1419. [PMC free article] [PubMed]
33. Beer MA, Tavazoie S. Predicting gene expression from sequence. Cell. 2004;117:185–198. [PubMed]
34. Roy S, et al. Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science. 2010;330:1787–1797. [PMC free article] [PubMed]
35. Gerstein MB, et al. Architecture of the human regulatory network derived from ENCODE data. Nature. in press. [PubMed]
36. Davidson EH, et al. A Genomic Regulatory Network for Development. Science. 2002;295:1669–1678. [PubMed]
37. Patwardhan RP, et al. Massively parallel functional dissection of mammalian enhancers in vivo. Nat Biotechnol. 2012;30:265–270. [PMC free article] [PubMed]
38. Sharon E, et al. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat Biotechnol. 2012;30:521–530. [PMC free article] [PubMed]
39. Melnikov A, et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat Biotechnol. 2012;30:271–277. [PMC free article] [PubMed]
40. Davydov EV, et al. Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++. PLoS Comput Biol. 2010;6:e1001025. [PMC free article] [PubMed]
41. Lindblad-Toh K, et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 2011;478:476–482. [PMC free article] [PubMed]
42. Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003;423:241–254. [PubMed]
43. Stark A, et al. Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature. 2007;450:219–232. [PMC free article] [PubMed]
44. Papatsenko D, Kislyuk A, Levine M, Dubchak I. Conservation patterns in different functional sequence categories of divergent Drosophila species. Genomics. 2006;88:431–442. [PubMed]
45. Dermitzakis ET, Clark AG. Evolution of Transcription Factor Binding Sites in Mammalian Gene Regulatory Regions: Conservation and Turnover. Mol Biol Evol. 2002;19:1114–1121. [PubMed]
46. Meader S, Ponting CP, Lunter G. Massive turnover of functional sequence in human and other mammalian genomes. Genome Res. 2010;20:1335–1343. [PubMed]
47. Schmidt D, et al. Five-Vertebrate ChIP-Seq Reveals the Evolutionary Dynamics of Transcription Factor Binding. Science. 2010;328:1036–1040. [PMC free article] [PubMed]
48. Brawand D, et al. The evolution of gene expression levels in mammalian organs. Nature. 2011;478:343–348. [PubMed]
49. Ng PC, Henikoff S. SIFT: Predicting Amino Acid Changes That Affect Protein Function. Nucl Acids Res. 2003;31:3812–3814. [PMC free article] [PubMed]
50. Yue P, Melamud E, Moult J. SNPs3D: Candidate gene and SNP selection for association studies. BMC Bioinformatics. 2006;7:166. [PMC free article] [PubMed]
51. Ramensky V, Bork P, Sunyaev S. Human Non-synonymous SNPs: Server and Survey. Nucl Acids Res. 2002;30:3894–3900. [PMC free article] [PubMed]
52. Adzhubei IA, et al. A method and server for predicting damaging missense mutations. Nature Methods. 2010;7:248–249. [PMC free article] [PubMed]
53. Baker M. Functional genomics: The changes that count. Nature. 2012;482:257–262. [PubMed]
54. Ward LD, Kellis M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 2012;40:D930–934. [PMC free article] [PubMed]
55. Boyle AP, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22:1790–1797. [PubMed]
56. McLaren W, et al. Deriving the Consequences of Genomic Variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010;26:2069–2070. [PMC free article] [PubMed]
57. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucl Acids Res. 2010;38:e164. [PMC free article] [PubMed]
58. Yandell M, et al. A Probabilistic Disease-Gene Finder for Personal Genomes. Genome Res. 2011 doi: 10.1101/gr.123158.111. [PubMed] [Cross Ref]
59. Subramanian A, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 2005;102:15545–15550. [PubMed]
60. Wang K, Li M, Hakonarson H. Analysing biological pathways in genome-wide association studies. Nat Rev Genet. 2010;11:843–854. [PubMed]
61. McKinney BA, Pajewski NM. Six Degrees of Epistasis: Statistical Network Models for GWAS. Front Genet. 2012;2 [PMC free article] [PubMed]
62. Ernst J, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473:43–49. [PMC free article] [PubMed]
63. Maurano MT, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–1195. [PubMed]
64. Nica AC, et al. Candidate Causal Regulatory Effects by Integration of Expression QTLs with Complex Trait Genetic Associations. PLoS Genet. 2010;6:e1000895. [PMC free article] [PubMed]
65. Nicolae DL, et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 2010;6:e1000888. [PMC free article] [PubMed]
66. Cantor RM, Lange K, Sinsheimer JS. Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application. Am J Hum Genet. 2010;86:6–22. [PubMed]
67. Knight J, Barnes MR, Breen G, Weale ME. Using Functional Annotation for the Empirical Determination of Bayes Factors for Genome-Wide Association Study Analysis. PLoS ONE. 2011;6:e14808. [PMC free article] [PubMed]
68. Lewinger JP, Conti DV, Baurley JW, Triche TJ, Thomas DC. Hierarchical Bayes prioritization of marker associations from a genome-wide association scan for further investigation. Genet Epidemiol. 2007;31:871–882. [PubMed]
69. Chen GK, Witte JS. Enriching the analysis of genomewide association studies with hierarchical modeling. Am J Hum Genet. 2007;81:397–404. [PubMed]
70. Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 2011;21:1109–1121. [PubMed]
71. Dering C, Hemmelmann C, Pugh E, Ziegler A. Statistical analysis of rare sequence variants: an overview of collapsing methods. Genetic Epidemiology. 2011;35:S12–S17. [PMC free article] [PubMed]
72. Bansal V, Libiger O, Torkamani A, Schork NJ. Statistical analysis strategies for association studies involving rare variants. Nature Reviews Genetics. 2010;11:773–785. [PubMed]
73. Pai AA, Bell JT, Marioni JC, Pritchard JK, Gilad Y. A genome-wide study of DNA methylation patterns and gene expression levels in multiple human and chimpanzee tissues. PLoS Genet. 2011;7:e1001316. [PMC free article] [PubMed]
74. Degner JF, et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature. 2012;482:390–394. [PMC free article] [PubMed]
75. Kasowski M, et al. Variation in Transcription Factor Binding Among Humans. Science. 2010;328:232–235. [PMC free article] [PubMed]
76. Majewski J, Pastinen T. The study of eQTL variations by RNA-seq: from SNPs to phenotypes. Trends in Genetics. 2011;27:72–79. [PubMed]
77. Pickrell JK, et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010;464:768–772. [PMC free article] [PubMed]
78. Lappalainen T, Montgomery SB, Nica AC, Dermitzakis ET. Epistatic Selection between Coding and Regulatory Variation in Human Evolution and Disease. Am J Hum Genet. 2011;89:459–463. [PubMed]
79. Kerkel K, et al. Genomic surveys by methylation-sensitive SNP analysis identify sequence-dependent allele-specific DNA methylation. Nat Genet. 2008;40:904–908. [PubMed]
80. Prendergast JG, Tong P, Hay DC, Farrington SM, Semple CA. A genome-wide screen in human embryonic stem cells reveals novel sites of allele-specific histone modification associated with known disease loci. Epigenetics & Chromatin. 2012;5:6. [PMC free article] [PubMed]
81. McDaniell R, et al. Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science. 2010;328:235–239. [PMC free article] [PubMed]
82. Maynard ND, Chen J, Stuart RK, Fan JB, Ren B. Genome-wide mapping of allele-specific protein-DNA interactions in human cells. Nature Methods. 2008;5:307–309. [PubMed]
83. Ge B, et al. Global patterns of cis variation in human cells revealed by high-density allelic expression analysis. Nat Genet. 2009;41:1216–1222. [PubMed]
84. Ng PC, Murray SS, Levy S, Venter JC. An agenda for personalized medicine. Nature. 2009;461:724–726. [PubMed]
85. Patterson N, et al. Methods for high-density admixture mapping of disease genes. Am J Hum Genet. 2004;74:979–1000. [PubMed]
86. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. [PMC free article] [PubMed]
87. Coop G, et al. The Role of Geography in Human Adaptation. PLoS Genet. 2009;5:e1000500. [PMC free article] [PubMed]
88. Hernandez RD, et al. Classic Selective Sweeps Were Rare in Recent Human Evolution. Science. 2011;331:920–924. [PMC free article] [PubMed]
89. Sabeti PC, et al. Positive natural selection in the human lineage. Science. 2006;312:1614–1620. [PubMed]
90. Grossman SR, et al. A Composite of Multiple Signals Distinguishes Causal Variants in Regions of Positive Selection. Science. 2010;327:883–886. [PubMed]
91. Ott J, Kamatani Y, Lathrop M. Family-based designs for genome-wide association studies. Nature Reviews Genetics. 2011;12:465–474. [PubMed]
92. Minichiello MJ, Durbin R. Mapping trait loci by use of inferred ancestral recombination graphs. Am J Hum Genet. 2006;79:910–922. [PubMed]
93. Wu Y. Association mapping of complex diseases with ancestral recombination graphs: models and efficient algorithms. J Comput Biol. 2008;15:667–684. [PubMed]
94. Asthana S, et al. Widely Distributed Noncoding Purifying Selection in the Human Genome. PNAS. 2007;104:12410–12415. [PubMed]
95. Ward LD, Kellis M. Evidence of Abundant Purifying Selection in Humans for Recently Acquired Regulatory Functions. Science. 2012 doi: 10.1126/science.1225057. [PubMed] [Cross Ref]
96. Hill WG, Goddard ME, Visscher PM. Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits. PLoS Genet. 2008;4:e1000008. [PMC free article] [PubMed]
97. Shao H, et al. Genetic Architecture of Complex Traits: Large Phenotypic Effects and Pervasive Epistasis. PNAS. 2008;105:19910–19914. [PubMed]
98. Zuk O, Hechter E, Sunyaev SR, Lander ES. The Mystery of Missing Heritability: Genetic Interactions Create Phantom Heritability. PNAS. 2012 doi: 10.1073/pnas.1119675109. [PubMed] [Cross Ref]
99. Costanzo M, et al. The Genetic Landscape of a Cell. Science. 2010;327:425–431. [PubMed]
100. Cordell HJ. Detecting gene–gene interactions that underlie human diseases. Nature Reviews Genetics. 2009;10:392–404. [PMC free article] [PubMed]
101. Musani SK, et al. Detection of gene x gene interactions in genome-wide association studies of human population data. Hum Hered. 2007;63:67–84. [PubMed]
102. Lou XY, et al. A Combinatorial Approach to Detecting Gene-Gene and Gene-Environment Interactions in Family Studies. Am J Hum Genet. 2008;83:457–467. [PubMed]
103. Allen HL, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467:832–838. [PMC free article] [PubMed]
104. Emily M, Mailund T, Hein J, Schauser L, Schierup MH. Using biological networks to search for interacting loci in genome-wide association studies. Eur J Hum Genet. 2009;17:1231–1240. [PMC free article] [PubMed]
105. Mechanic LE, Luke BT, Goodman JE, Chanock SJ, Harris CC. Polymorphism Interaction Analysis (PIA): a method for investigating complex gene-gene interactions. BMC Bioinformatics. 2008;9:146. [PMC free article] [PubMed]
106. Pattin KA, Moore JH. Exploiting the Proteome to Improve the Genome-Wide Genetic Analysis of Epistasis in Common Human Diseases. Hum Genet. 2008;124:19–29. [PMC free article] [PubMed]
107. Dekker J, Rippe K, Dekker M, Kleckner N. Capturing chromosome conformation. Science. 2002;295:1306–1311. [PubMed]
108. Lieberman-Aiden E, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326:289–293. [PMC free article] [PubMed]
109. Fullwood MJ, et al. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature. 2009;462:58–64. [PMC free article] [PubMed]
110. Cheng C, et al. Construction and analysis of an integrated regulatory network derived from high-throughput sequencing data. PLoS Comput Biol. 2011;7:e1002190. [PMC free article] [PubMed]
111. Zhu J, et al. Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nature Genetics. 2008;40:854–861. [PMC free article] [PubMed]
112. Barretina J, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603–607. [PMC free article] [PubMed]
113. Burke MK, et al. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature. 2010;467:587–590. [PubMed]
114. Gresham D, et al. The repertoire and dynamics of evolutionary adaptations to controlled nutrient-limited environments in yeast. PLoS Genet. 2008;4:e1000303. [PMC free article] [PubMed]
115. Perlstein EO, Ruderfer DM, Roberts DC, Schreiber SL, Kruglyak L. Genetic basis of individual differences in the response to small-molecule drugs in yeast. Nature Genetics. 2007;39:496–502. [PubMed]
116. Quackenbush J. Microarray analysis and tumor classification. N Engl J Med. 2006;354:2463–2472. [PubMed]
117. Liberzon A, et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics. 2011;27:1739–1740. [PMC free article] [PubMed]
118. Rakyan VK, Down TA, Balding DJ, Beck S. Epigenome-wide association studies for common human diseases. Nature Reviews Genetics. 2011;12:529–541. [PMC free article] [PubMed]
119. Petronis A. Epigenetics as a unifying principle in the aetiology of complex traits and diseases. Nature. 2010;465:721–727. [PubMed]
120. Chen LS, Emmert-Streib F, Storey JD. Harnessing naturally randomized transcription to infer regulatory relationships among genes. Genome Biol. 2007;8:R219. [PMC free article] [PubMed]
121. Lawlor DA, Harbord RM, Sterne JAC, Timpson N, Davey Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med. 2008;27:1133–1163. [PubMed]
122. Voight BF, et al. Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study. The Lancet. doi: 10.1016/S0140-6736(12)60312-2. [PMC free article] [PubMed] [Cross Ref]
123. Chen R, et al. Personal Omics Profiling Reveals Dynamic Molecular and Medical Phenotypes. Cell. 2012;148:1293–1307. [PMC free article] [PubMed]
124. Asking for more. Nature Genetics. 2012;44:733. [PubMed]
125. Homer N, et al. Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays. PLoS Genet. 2008;4:e1000167. [PMC free article] [PubMed]
126. Salathé M, et al. Digital epidemiology. PLoS Comput Biol. 2012;8:e1002616. [PMC free article] [PubMed]
127. Brownstein JS, Sordo M, Kohane IS, Mandl KD. The Tell-Tale Heart: Population-Based Surveillance Reveals an Association of Rofecoxib and Celecoxib with Myocardial Infarction. PLoS ONE. 2007;2 [PMC free article] [PubMed]
128. Roque FS, et al. Using Electronic Patient Records to Discover Disease Correlations and Stratify Patient Cohorts. PLoS Comput Biol. 2011;7:e1002141. [PMC free article] [PubMed]
129. Wilke RA, et al. The emerging role of electronic medical records in pharmacogenomics. Clin Pharmacol Ther. 2011;89:379–386. [PMC free article] [PubMed]
130. Kraft P, Hunter DJ. Genetic Risk Prediction — Are We There Yet? New England Journal of Medicine. 2009;360:1701–1703. [PubMed]
131. Yngvadottir B, MacArthur DG, Jin H, Tyler-Smith C. The promise and reality of personal genomics. Genome Biol. 2009;10:237. [PMC free article] [PubMed]
132. Roberts NJ, et al. The Predictive Capacity of Personal Genome Sequencing. Sci Transl Med. 2012;4:133ra58. [PubMed]
133. Jostins L, Barrett JC. Genetic risk prediction in complex disease. Hum Mol Genet. 2011;20:R182–188. [PMC free article] [PubMed]
134. Stahl EA, et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nature Genetics. 2012;44:483–489. [PMC free article] [PubMed]
135. Purcell SM, et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–752. [PubMed]
136. Cooper GM, Shendure J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nature Reviews Genetics. 2011;12:628–640. [PubMed]
137. Gibson G. Rare and common variants: twenty arguments. Nature Reviews Genetics. 2012;13:135–145. [PubMed]
138. Goldstein DB. The Importance of Synthetic Associations Will Only Be Resolved Empirically. PLoS Biol. 2011;9 [PMC free article] [PubMed]
139. Yang J, et al. Common SNPs explain a large proportion of heritability for human height. Nat Genet. 2010;42:565–569. [PMC free article] [PubMed]
140. Nebert DW, Zhang G, Vesell ES. From Human Genetics and Genomics to Pharmacogenetics and Pharmacogenomics: Past Lessons, Future Directions. Drug Metab Rev. 2008;40:187–224. [PMC free article] [PubMed]
141. Garrod AE, Harris H. Inborn errors of metabolism. 1909
142. Woo SL, Lidsky AS, Güttler F, Chandra T, Robson KJ. Cloned human phenylalanine hydroxylase gene allows prenatal diagnosis and carrier detection of classical phenylketonuria. Nature. 1983;306:151–155. [PubMed]
143. Riordan JR, et al. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science. 1989;245:1066–1073. [PubMed]
144. Audrézet M, et al. Genomic rearrangements in the CFTR gene: Extensive allelic heterogeneity and diverse mutational mechanisms. Human Mutation. 2004;23:343–357. [PubMed]
145. Zschocke J. Phenylketonuria mutations in Europe. Human Mutation. 2003;21:345–356. [PubMed]
146. Amiel J, et al. Hirschsprung Disease, Associated Syndromes and Genetics: A Review. J Med Genet. 2008;45:1–14. [PubMed]
147. Nica AC, et al. The Architecture of Gene Regulatory Variation across Multiple Human Tissues: The MuTHER Study. PLoS Genet. 2011;7:e1002003. [PMC free article] [PubMed]
148. King JL, Jukes TH. Non-Darwinian Evolution. Science. 1969;164:788–798. [PubMed]
149. Kimura M. Evolutionary rate at the molecular level. Nature. 1968;217:624. [PubMed]
150. Ohno S. So much ‘junk’ DNA in our genome. Brookhaven symposia in biology. 1972;23:366–370. [PubMed]
151. Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature. 2009;458:719–724. [PMC free article] [PubMed]
152. Korn JM, et al. Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genetics. 2008;40:1253–1260. [PMC free article] [PubMed]
153. McKenna A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. [PubMed]
154. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR. MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol. 2010;34:816–834. [PMC free article] [PubMed]
155. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nature Genetics. 2007;39:906–913. [PubMed]
156. Servin B, Stephens M. Imputation-Based Analysis of Association Studies: Candidate Regions and Quantitative Traits. PLoS Genet. 2007;3:e114. [PMC free article] [PubMed]
157. Price AL, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–909. [PubMed]
158. Purcell S, et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet. 2007;81:559–575. [PubMed]
159. Veyrieras J-B, et al. High-Resolution Mapping of Expression-QTLs Yields Insight into Human Gene Regulation. PLoS Genet. 2008;4 [PMC free article] [PubMed]
160. Shabalin AA. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28:1353–1358. [PMC free article] [PubMed]
161. Rozowsky J, et al. AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Syst Biol. 2011;7:522. [PMC free article] [PubMed]
162. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:Article 3. [PubMed]
163. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–140. [PMC free article] [PubMed]
164. Faustino NA, Cooper TA. Pre-mRNA Splicing and Human Disease. Genes Dev. 2003;17:419–437. [PubMed]
165. Cáceres JF, Kornblihtt AR. Alternative splicing: multiple control mechanisms and involvement in human disease. Trends in Genetics. 2002;18:186–193. [PubMed]
166. López-Bigas N, Audit B, Ouzounis C, Parra G, Guigó R. Are splicing mutations the most frequent cause of hereditary disease? FEBS Letters. 2005;579:1900–1903. [PubMed]
167. Barbaux S, et al. Donor splice-site mutations in WT1 are responsible for Frasier syndrome. Nat Genet. 1997;17:467–470. [PubMed]
168. Lorson CL, Hahnen E, Androphy EJ, Wirth B. A Single Nucleotide in the SMN Gene Regulates Splicing and Is Responsible for Spinal Muscular Atrophy. PNAS. 1999;96:6307–6311. [PubMed]
169. Cazzola M, Skoda RC. Translational Pathophysiology: A Novel Molecular Mechanism of Human Disease. Blood. 2000;95:3280–3288. [PubMed]
170. Bisio A, et al. Functional analysis of CDKN2A/p16INK4a 5′-UTR variants predisposing to melanoma. Hum Mol Genet. 2010;19:1479–1491. [PubMed]
171. Abelson JF, et al. Sequence Variants in SLITRK1 Are Associated with Tourette’s Syndrome. Science. 2005;310:317–320. [PubMed]
172. Guttman M, et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009;458:223–227.
173. Ponting CP, Oliver PL, Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009;136:629–641.
174. Bonafé L, et al. Evolutionary Comparison Provides Evidence for Pathogenicity of RMRP Mutations. PLoS Genet. 2005;1:e47.
175. Cooper TA, Wan L, Dreyfuss G. RNA and Disease. Cell. 2009;136:777–793.
176. Knight JC. Regulatory polymorphisms underlying complex disease traits. Journal of Molecular Medicine. 2004;83:97–109.
177. Martin MP, et al. Genetic Acceleration of AIDS Progression by a Promoter Variant of CCR5. Science. 1998;282:1907–1911.
178. Bream JH, et al. CCR5 Promoter Alleles and Specific DNA Binding Factors. Science. 1999;284:223–223.
179. Bray NJ, et al. Allelic expression of APOE in human brain: effects of epsilon status and promoter haplotypes. Hum Mol Genet. 2004;13:2885–2892.
180. St George-Hyslop PH, Petit A. Molecular biology and genetics of Alzheimer’s disease. C R Biol. 2005;328:119–130.
181. Exner M, Minar E, Wagner O, Schillinger M. The role of heme oxygenase-1 promoter polymorphisms in human disease. Free Radical Biology and Medicine. 2004;37:1097–1104.
182. Kleinjan DA, van Heyningen V. Long-Range Control of Gene Expression: Emerging Mechanisms and Disruption in Disease. The American Journal of Human Genetics. 2005;76:8–32.
183. Noonan JP, McCallion AS. Genomics of Long-Range Regulatory Elements. Annual Review of Genomics and Human Genetics. 2010;11:1–23.
184. Visel A, Rubin EM, Pennacchio LA. Genomic views of distant-acting enhancers. Nature. 2009;461:199–205.
185. Lettice LA, et al. A Long-Range Shh Enhancer Regulates Expression in the Developing Limb and Fin and Is Associated with Preaxial Polydactyly. Hum Mol Genet. 2003;12:1725–1735.
186. Sakabe NJ, Savic D, Nobrega MA. Transcriptional enhancers in development and disease. Genome Biology. 2012;13:238.
187. Pomerantz MM, et al. The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nature Genetics. 2009;41:882–884.
188. Tuupanen S, et al. The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling. Nature Genetics. 2009;41:885–890.
189. Wasserman NF, Aneas I, Nobrega MA. An 8q24 Gene Desert Variant Associated with Prostate Cancer Risk Confers Differential in Vivo Activity to a MYC Enhancer. Genome Res. 2010;20:1191–1197.
190. Duan J, et al. Synonymous mutations in the human dopamine receptor D2 (DRD2) affect mRNA stability and synthesis of the receptor. Hum Mol Genet. 2003;12:205–216.
191. SeattleSeq Annotation. Available at <http://snp.gs.washington.edu/SeattleSeqAnnotation/>.
192. Burgner D, et al. A Genome-Wide Association Study Identifies Novel and Functionally Related Susceptibility Loci for Kawasaki Disease. PLoS Genet. 2009;5:e1000319.
193. Emilsson V, et al. Genetics of gene expression and its effect on disease. Nature. 2008;452:423–428.
194. Segrè AV, Groop L, Mootha VK, Daly MJ, Altshuler D. Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits. PLoS Genet. 2010;6
195. Raychaudhuri S, et al. Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet. 2009;5:e1000534.
196. Fransen K, et al. Analysis of SNPs with an effect on gene expression identifies UBE2L3 and BCL3 as potential new risk genes for Crohn’s disease. Hum Mol Genet. 2010;19:3482–3488.
197. Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nature Biotechnology. 2010;28:817–825.
198. Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M. Linking disease associations with regulatory information in the human genome. Genome Res. 2012;22:1748–1759.
199. John S, et al. Chromatin accessibility pre-determines glucocorticoid receptor binding patterns. Nature Genetics. 2011;43:264–268.
200. Cowper-Sal-lari R, et al. Breast cancer risk-associated SNPs modulate the affinity of chromatin for FOXA1 and alter gene expression. Nat Genet. 2012 doi: 10.1038/ng.2416.