Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, ‘missing’ heritability can be explained. Here we examine potential sources of missing heritability and propose research strategies, including and extending beyond current genome-wide association approaches, to illuminate the genetics of complex diseases and enhance its potential to enable effective disease prevention or treatment.
Many common human diseases and traits are known to cluster in families and are believed to be influenced by several genetic and environmental factors, but until recently the identification of genetic variants contributing to these ‘complex diseases’ has been slow and arduous1. Genome-wide association studies (GWAS), in which several hundred thousand to more than a million single nucleotide polymorphisms (SNPs) are assayed in thousands of individuals, represent a powerful new tool for investigating the genetic architecture of complex diseases1,2. In the past few years, these studies have identified hundreds of genetic variants associated with such conditions and have provided valuable insights into the complexities of their genetic architecture3,4.
The genome-wide association (GWA) method represents an important advance compared to ‘candidate gene’ studies, in which sample sizes are generally smaller and the variants assayed are limited to a selected few, often on the basis of imperfect understanding of biological pathways and often yielding associations that are difficult to replicate5,6. GWAS are also an important step beyond family-based linkage studies, in which inheritance patterns are related to several hundreds to thousands of genomic markers. Despite many clear successes in single-gene ‘Mendelian’ disorders7,8, the limited success of linkage studies in complex diseases has been attributed to their low power and resolution for variants of modest effect9–11.
The underlying rationale for GWAS is the ‘common disease, common variant’ hypothesis, positing that common diseases are attributable in part to allelic variants present in more than 1–5% of the population12–14. Such studies have been facilitated by the development of commercial ‘SNP chips’ or arrays that capture most, although not all, common variation in the genome. Although the allelic architecture of some conditions, notably age-related macular degeneration, for the most part reflects the contributions of several variants of large effect (defined loosely here as those increasing disease risk by twofold or more), most common variants individually or in combination confer relatively small increments in risk (1.1–1.5-fold) and explain only a small proportion of heritability, the portion of phenotypic variance in a population attributable to additive genetic factors3. For example, at least 40 loci have been associated with human height, a classic complex trait with an estimated heritability of about 80%, yet they explain only about 5% of phenotypic variance despite studies of tens of thousands of people15. Disease-associated variants occur more frequently in protein-coding regions than expected from their representation on genotyping arrays, although over-representation of common and functional variants on these arrays may introduce analytical biases. Nevertheless, the vast majority (>80%) of associated variants fall outside coding regions, emphasizing the importance of including both coding and non-coding regions in the search for disease-associated variants3.
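The gap between "loci found" and "variance explained" is ultimately simple arithmetic. Under textbook assumptions (a biallelic SNP with additive effect β per allele, in phenotypic standard-deviation units, and Hardy-Weinberg proportions), one SNP explains 2p(1 − p)β² of phenotypic variance. The minimal sketch below sums this over independent SNPs and divides by heritability; the allele frequencies and effect sizes are hypothetical and are not the published height loci.

```python
def variance_explained(p: float, beta: float) -> float:
    """Fraction of phenotypic variance explained by one additive biallelic SNP.

    p: allele frequency; beta: per-allele effect in phenotypic SD units.
    Assumes Hardy-Weinberg equilibrium, so genotype variance is 2p(1 - p).
    """
    return 2.0 * p * (1.0 - p) * beta ** 2


def fraction_of_heritability(snps: list[tuple[float, float]], h2: float) -> float:
    """Summed variance explained by independent SNPs, as a share of heritability h2."""
    return sum(variance_explained(p, beta) for p, beta in snps) / h2


# Hypothetical example: 40 independent SNPs, each with allele frequency 0.3
# and an effect of 0.05 SD per allele, for a trait with h2 = 0.8.
hypothetical_snps = [(0.3, 0.05)] * 40
total = sum(variance_explained(p, b) for p, b in hypothetical_snps)
print(f"phenotypic variance explained: {total:.1%}")
print(f"share of heritability explained: {fraction_of_heritability(hypothetical_snps, 0.8):.1%}")
```

With these made-up inputs the 40 SNPs explain roughly 4% of phenotypic variance, about a twentieth of an 80% heritability, which shows how dozens of genuine but small effects can leave most heritability unaccounted for.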
Two questions therefore arise: why is so much of the heritability apparently unexplained by initial GWA findings, and why does this matter? It matters because a substantial proportion of individual differences in disease susceptibility is known to be due to genetic factors, and understanding this genetic variation may contribute to better prevention, diagnosis and treatment of disease. It is important to recognize, however, that few investigators expected these studies to find all of the variants associated with common diseases immediately, or even most of them; the hope was that they would at least find some16. Limitations in the design of early GWAS, such as imprecise phenotyping and the use of control groups of questionable comparability, may have reduced estimates of effect sizes while preserving some ability to identify associated variants17. These studies have considerably surpassed early expectations, reproducibly identifying hundreds of variants in many dozens of traits, but for many traits they have explained only a small proportion of estimated heritability18.
Many explanations for this missing heritability have been suggested, including much larger numbers of variants of smaller effect yet to be found; rarer variants (possibly with larger effects) that are poorly detected by available genotyping arrays that focus on variants present in 5% or more of the population; structural variants poorly captured by existing arrays; low power to detect gene–gene interactions; and inadequate accounting for shared environment among relatives. Consensus is lacking, however, on approaches and priorities for research to examine what has been termed the ‘dark matter’ of genome-wide association: dark matter in the sense that one is sure it exists, can detect its influence, but simply cannot ‘see’ it (yet). Here we examine potential sources of missing heritability and propose research strategies to illuminate the genetics of complex diseases.
It is reasonable to assume that allelic architecture (number, type, effect size and frequency of susceptibility variants) may differ across traits, and that missing heritability may take a different form for different diseases19, but at present our understanding is too limited to distinguish these possibilities. Age-related macular degeneration may provide the best example of a common disease in which heritability is substantially explained by a small number of common variants of large effect20, but for other conditions, such as Crohn’s disease, the proportion of heritability explained is not nearly so large despite a much larger number of identified variants21 (Table 1). There are no obvious differences between these two traits in genetic architecture as predicted from clinical and epidemiological data that would explain the differences observed in their allelic architecture. Some apparent differences may simply be due to differences in the stage of investigation across traits. Studies in several conditions have clearly demonstrated that the number of detected variants increases with increasing sample size22–24.
Population genetic theory suggests an explanation for the paucity of variants explaining a large proportion of disease predisposition, in that decreased reproductive fitness should typically act to reduce the frequencies of high-risk variants. This might explain the relative lack of variants detected so far for some neuropsychiatric conditions, such as autism spectrum disorders, given their low reproductive fitness25. Yet for a condition such as type 1 diabetes, which has a similar prevalence, familial risk, early onset and poor reproductive fitness (at least before the discovery of insulin therapy), more than 40 loci have already been reported; this might be because the overall sample sizes studied in type 1 diabetes have been very large26,27. Present-day reproductive fitness may correlate poorly with the forces that have shaped variation throughout human evolution; moreover, focusing on the reproductive effects of a single disease ignores the pleiotropic effects (effects of the same variant on multiple characteristics or disease risks) of multiple alleles influencing that condition simultaneously with many other conditions28.
Selection might also be responsible for keeping genetic effect sizes low, as variants of larger effect may be selected against and eventually disappear19. Long-term stabilizing selection minimizes the production of individuals at the extremes of a trait29, in part by reducing the additive genetic effects of alleles already present or those arising de novo by mutation30 to levels potentially beneath the ability of studies of feasible size to detect them. Selection may also contribute to differences in the ability to detect loci in different complex diseases, if genetic susceptibility to some diseases is more strongly affected by selection than other diseases, or if environmental perturbations vary in intensity across diseases. Immune and infectious agents have been recognized as among the strongest selection pressures in human evolution31, and immune-related genes have been strongly implicated in Crohn’s disease and other immune-mediated diseases3, suggesting either that pleiotropic effects of these variants reduce the efficiency of negative selection, or that strong environmental perturbation in modern societies might expose the disease risk associated with these variants. Selection may thus explain why disease allele frequencies are low and allelic effects are small, but this should manifest as low, rather than missing, heritability.
A probable contributor to the small genetic effect sizes observed so far is that current investigations have incompletely surveyed the potential causal variants within each gene. Relative risks observed for marker SNPs may underestimate the actual risks associated with the true causal variants. Notably, 11 out of 30 genes implicated as carrying common variants associated with lipid levels also carry known rare alleles of large effect identified in Mendelian dyslipidemias, including ABCA1, PCSK9 and LDLR22,32, suggesting that genes containing common variants with modest effects on complex traits may also contain rare variants with larger effects.
An important consideration is that the overwhelming majority of GWAS and other genetic studies have been limited to populations of European ancestry, whereas genetic variation is greatest in populations of recent African ancestry2, and studies in non-Europeans have yielded intriguing new variants33,34. Studies of populations of recent African ancestry in particular are likely to increase the yield of rare variants and to narrow the large chromosomal regions of association identified in ‘younger’ populations, such as those of European ancestry, in which linkage disequilibrium (the tendency for alleles at adjacent genetic loci to be inherited together) extends over longer distances31. Isolated populations may also be of value given their potential to be enriched in unique variants35.
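Linkage disequilibrium between two biallelic loci is conventionally summarized by D (observed minus expected haplotype frequency), its normalized form D′, and r². The minimal sketch below computes all three from a haplotype frequency and two allele frequencies; the example values are hypothetical. It also illustrates why a common marker SNP tags a rare variant poorly even when recombination has never separated them (D′ = 1 but r² is small).

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B (strictly between 0 and 1).
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b
    # D' divides D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical: a rare allele B (frequency 1%) found only on haplotypes that
# carry a common marker allele A (frequency 30%).
d, d_prime, r2 = ld_stats(p_ab=0.01, p_a=0.3, p_b=0.01)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r^2 = {r2:.3f}")
```

Because association power at a marker scales with r² rather than D′, the common marker in this example would recover only a small fraction of the rare variant's signal.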
The accuracy of current heritability estimates is also important, because experimentally identified variants could never explain all the variance in an erroneously inflated heritability estimate. Heritability of quantitative traits, formally defined as the proportion of phenotypic variance in a population attributable to additive genetic factors (narrow-sense heritability, h² (ref. 36)), is typically estimated from family studies, and can be expected to vary across environments. Narrow-sense heritability estimates in humans can be inflated if family resemblance is influenced by non-additive genetic effects (dominance and epistasis, or gene–gene interaction), shared familial environments, and by correlations or interactions among genotypes and environment36,37. However, heritabilities estimated from pedigree studies in animals agree well with heritability estimated from response to artificial selection, suggesting that estimates from family studies are not necessarily inflated.
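A standard quantitative-genetic decomposition makes the inflation argument concrete. In the textbook notation below (not taken from this article), V_P is phenotypic variance; V_A, V_D and V_I are additive, dominance and interaction (epistatic) genetic variance; V_E is environmental variance, of which V_C is the part shared by siblings. Ignoring epistatic terms in the sib covariance and assuming no gene–environment correlation or interaction, twice the full-sib correlation, a common family-based estimator, equals h² only when V_D and V_C are both zero:

```latex
\begin{aligned}
V_P &= V_A + V_D + V_I + V_E, \qquad h^2 = \frac{V_A}{V_P} \\
\operatorname{Cov}_{\mathrm{FS}} &= \tfrac{1}{2} V_A + \tfrac{1}{4} V_D + V_C \\
2\,\frac{\operatorname{Cov}_{\mathrm{FS}}}{V_P} &= h^2 + \frac{V_D}{2 V_P} + \frac{2 V_C}{V_P} \;\ge\; h^2
\end{aligned}
```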
Teasing apart the contributions to heritability of environmental factors shared among relatives will soon be possible because the availability of genome-wide markers now provides empirical estimates of identity-by-descent (IBD) allele sharing between pairs of relatives. For example, full sibs share on average half their genetic complement, but this proportion can vary—in one large study it ranged from 0.37 to 0.62 (ref. 38). By relating phenotypic differences to the observed IBD sharing fraction among sib pairs, marker data were used to generate a heritability estimate of 0.8 for height38. This is remarkably consistent with estimates using traditional methods but free of their assumptions, suggesting that for height at least, heritability is not over-estimated. Applying such estimation to distantly related or ‘unrelated’ individuals is now feasible using dense genomic scans39; given the number of people with dense genotyping data, heritability estimates could be generated for a wide variety of traits free of potential confounding by unmeasured shared environment.
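The idea of relating phenotypic differences to realized IBD sharing can be sketched with Haseman–Elston regression, a simpler stand-in that is not necessarily the model used in ref. 38. Assumptions (ours): a standardized trait, purely additive genetic variance and no shared environment, so a sib pair with realized IBD fraction π has trait correlation πh² and E[(y₁ − y₂)²] = 2 − 2πh². Regressing squared differences on π therefore gives a slope of −2h². The IBD fractions are drawn uniformly over the range quoted in the text, which is a simplification, and all data are simulated.

```python
import math
import random


def simulate_sib_pairs(n_pairs: int, h2: float, seed: int = 1) -> list[tuple[float, float]]:
    """Simulate (ibd_fraction, squared_trait_difference) for hypothetical sib pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_pairs):
        ibd = rng.uniform(0.37, 0.62)  # realized IBD fraction (simplified distribution)
        r = ibd * h2                   # expected trait correlation for this pair
        shared = rng.gauss(0.0, 1.0)   # component common to both sibs
        y1 = math.sqrt(r) * shared + math.sqrt(1 - r) * rng.gauss(0.0, 1.0)
        y2 = math.sqrt(r) * shared + math.sqrt(1 - r) * rng.gauss(0.0, 1.0)
        data.append((ibd, (y1 - y2) ** 2))
    return data


def haseman_elston_h2(data: list[tuple[float, float]]) -> float:
    """Least-squares slope of squared difference on IBD fraction; h2 = -slope / 2."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    return -(sxy / sxx) / 2.0


pairs = simulate_sib_pairs(n_pairs=200_000, h2=0.8)
print(f"estimated h2: {haseman_elston_h2(pairs):.2f} (simulated with h2 = 0.8)")
```

Because realized IBD sharing varies only modestly around 0.5 among sibs, many pairs are needed for a precise estimate; the benefit is that shared environment, which does not track π within sibships, does not bias the slope.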
Improving estimates of all contributors to heritability will facilitate determination of the proportion of genetic variance that has been explained. Despite imprecision in current estimates, it may still be possible to know that ‘all the heritability’ has been explained by predicting phenotypes in a new set of individuals from trait-associated markers, and correlating the predicted phenotypes with the actual values. If the markers truly explain all the additive genetic variance, the squared correlation between predicted and actual phenotype will be equal to the heritability40. Population-based heritability estimates thus provide a valuable metric for completeness of available genetic risk information, but individualized disease prevention and treatment will ultimately require identifying the variants accounting for risk in a given individual rather than on a population basis.
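The prediction check described above can be illustrated by simulation. In this minimal sketch (all values hypothetical), 50 independent additive SNPs with known effects generate a genetic value, environmental noise is scaled so that heritability is 0.5, and the "prediction" is the genetic value itself. The squared correlation between prediction and phenotype then lands close to h². With estimated rather than true effects, or with SNPs missing, it would fall below h².

```python
import math
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_prediction_r2(n: int = 20_000, m: int = 50, h2: float = 0.5, seed: int = 7) -> float:
    """Squared correlation between the true genetic value (used as predictor) and phenotype."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.05, 0.5) for _ in range(m)]  # hypothetical allele frequencies
    effects = [rng.gauss(0.0, 1.0) for _ in range(m)]    # hypothetical per-allele effects
    g = []
    for _ in range(n):
        # Allele count (0, 1 or 2) at each SNP under Hardy-Weinberg proportions.
        g.append(sum(b * ((rng.random() < p) + (rng.random() < p)) for p, b in zip(freqs, effects)))
    mean_g = sum(g) / n
    var_g = sum((x - mean_g) ** 2 for x in g) / n
    sd_e = math.sqrt(var_g * (1 - h2) / h2)  # environmental SD giving the target h2
    y = [x + rng.gauss(0.0, sd_e) for x in g]
    return pearson(g, y) ** 2


print(f"squared correlation: {simulate_prediction_r2():.3f} (target h2 = 0.5)")
```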
Much of the speculation about missing heritability from GWAS has focused on the possible contribution of variants of low minor allele frequency (MAF), defined here as roughly 0.5% < MAF < 5%, or of rare variants (MAF < 0.5%). Such variants are not sufficiently frequent to be captured by current GWA genotyping arrays14,41, nor do they carry sufficiently large effect sizes to be detected by classical linkage analysis in family studies (Fig. 1). Once MAF falls below 0.5%, detection of associations becomes unlikely unless effect sizes are very large, as in monogenic conditions. For modest effect sizes, association testing may require composite tests of overall ‘mutational load’, comparing frequencies of mutations of potentially similar functional effect in cases and controls.
Low frequency variants could have substantial effect sizes (increasing disease risk two- to threefold) without demonstrating clear Mendelian segregation, and could contribute substantially to missing heritability42. For example, 20 variants with a risk allele frequency of 1% and an allelic odds ratio of three would account for most familial aggregation of type 2 diabetes. Here the odds of disease is the probability of disease divided by the probability of not having it, and the allelic odds ratio compares these odds for the risk allele with those for the alternative allele. There are relatively few examples of such variants contributing to complex traits, possibly owing to insufficiently large sample sizes or insufficiently comprehensive arrays.
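One way to see how a handful of low frequency variants could produce substantial familial aggregation is to compute the sibling recurrence risk ratio (λs) they imply. The sketch below is our own illustration, not the calculation of ref. 42, and rests on stated assumptions: a rare disease, so the allelic odds ratio approximates the per-allele relative risk; a multiplicative risk model within and across loci; Hardy-Weinberg proportions; and unlinked loci. A locus then contributes λs = ((1 + t)/2)², where t = E[f²]/E[f]² and f is the per-allele risk factor (1, or the odds ratio for the risk allele).

```python
def locus_lambda_s(p: float, odds_ratio: float) -> float:
    """Sibling recurrence risk ratio contributed by one multiplicative risk locus."""
    mean_f = (1 - p) + p * odds_ratio         # E[f] over a random allele
    mean_f2 = (1 - p) + p * odds_ratio ** 2   # E[f^2]
    t = mean_f2 / mean_f ** 2
    # Sibs share 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2 and 1/4.
    return 0.25 + 0.5 * t + 0.25 * t ** 2


def total_lambda_s(n_loci: int, p: float, odds_ratio: float) -> float:
    """Under a multiplicative model, locus-specific lambda_s values multiply."""
    return locus_lambda_s(p, odds_ratio) ** n_loci


print(f"per locus: {locus_lambda_s(0.01, 3.0):.4f}")
print(f"20 loci:   {total_lambda_s(20, 0.01, 3.0):.2f}")
```

Under these assumptions each locus contributes a λs of about 1.04 and 20 such loci together give about 2.1; how that compares with the observed familial aggregation depends on the sibling recurrence risk assumed for the disease.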
The primary technology for the detection of rare SNPs is sequencing, which may target regions of interest, or may examine the whole genome. ‘Next-generation’ sequencing technologies, which process millions of sequence reads in parallel, provide monumental increases in speed and volume of generated data free of the cloning biases and arduous sample preparation characteristic of capillary sequencing43. Detection of associations with low frequency and rare variants will be facilitated by the comprehensive catalogue of variants with MAF ≥ 1% being generated by the 1,000 Genomes Project (http://www.1000genomes.org/page.php), which will also identify many variants at lower allele frequencies. The pilot effort of that program has already identified more than 11 million new SNPs in initially low-depth coverage of 172 individuals44.
Current strategies for using sequencing to identify rare variants underlying or co-located with GWA-defined associations include sequencing genomic regions defined by strong and repeatedly replicated associations with common variants, and sequencing a larger fraction of the genome in people with extreme phenotypes. In the absence of GWA-defined signals, sequencing candidate genes in subjects at the extremes of a quantitative trait (such as lipid levels or age at onset) can identify other associated variants, both common and rare45,46. An important finding from these studies is that much of the information is provided by people at the extremes of trait distributions, who seem to be more likely to carry loss-of-function alleles47.
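Why the extremes are so informative can be shown with a small simulation. Assumptions (hypothetical throughout): a rare allele carried by 1% of people lowers a standardized trait by 1 SD in carriers, everyone else has a standard normal trait, and we compare the carrier frequency in the bottom and top 1% of the trait distribution with that in the whole sample.

```python
import random


def carrier_freq_in_tails(n: int = 200_000, carrier_freq: float = 0.01, shift: float = -1.0,
                          tail: float = 0.01, seed: int = 3) -> tuple[float, float, float]:
    """Return carrier frequency overall, in the lowest tail and in the highest tail."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        carrier = rng.random() < carrier_freq
        trait = rng.gauss(shift if carrier else 0.0, 1.0)  # carriers' mean is shifted
        people.append((trait, carrier))
    people.sort()  # ascending by trait value
    k = int(n * tail)
    overall = sum(c for _, c in people) / n
    low = sum(c for _, c in people[:k]) / k
    high = sum(c for _, c in people[-k:]) / k
    return overall, low, high


overall, low, high = carrier_freq_in_tails()
print(f"overall {overall:.3%}, bottom 1% {low:.2%}, top 1% {high:.2%}")
```

In this setting carriers are several times more common in the bottom tail than in the population and nearly absent from the top tail, so sequencing a few hundred people from the lower extreme would be expected to find far more carriers than sequencing the same number chosen at random.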
Sample sizes used for the initial identification of DNA sequence variants have generally been modest, and the sample size needed to detect a variant increases essentially linearly with 1/MAF. Much larger samples are needed to detect associations with variants than to detect the variants themselves, and these requirements also scale roughly linearly with 1/MAF, given a fixed odds ratio (OR) and a fixed degree of linkage disequilibrium with genotyped markers. Sample size for association detection also scales approximately quadratically with 1/|OR − 1|, and thus increases sharply as the OR approaches 1. Sample size is therefore even more strongly affected by small odds ratios than by small MAF, so low frequency and rare variants will need to have higher odds ratios to be detected.
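Both scaling rules fall out of a standard normal-approximation sample-size formula for comparing allele frequencies in cases and controls. The sketch below assumes equal numbers of cases and controls, that the tested SNP is the causal variant (perfect linkage disequilibrium), a genome-wide significance level of 5 × 10⁻⁸ and 80% power; these settings are illustrative choices, not values from the text. For small allele frequency p and OR near 1 the required sample size behaves like 1/(p(OR − 1)²).

```python
from statistics import NormalDist


def cases_needed(maf: float, odds_ratio: float, alpha: float = 5e-8, power: float = 0.8) -> float:
    """Approximate number of cases (and, equally, controls) for an allelic test.

    maf is taken as the risk-allele frequency in controls; the tested SNP is
    assumed to be the causal variant.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    p0 = maf                               # risk-allele frequency in controls
    odds1 = odds_ratio * p0 / (1 - p0)     # risk-allele odds in cases
    p1 = odds1 / (1 + odds1)               # risk-allele frequency in cases
    alleles = z ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / (p1 - p0) ** 2
    return alleles / 2                     # two alleles per person


for maf, odds_ratio in [(0.02, 1.5), (0.01, 1.5), (0.01, 1.2), (0.01, 1.1)]:
    print(f"MAF={maf:.2f} OR={odds_ratio:.1f}: ~{cases_needed(maf, odds_ratio):,.0f} cases")
```

Halving the allele frequency roughly doubles the required sample, whereas halving OR − 1 roughly quadruples it, which is the asymmetry described above.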
Complicating matters further, numerous rare variants may be detected in a gene or region, but they may have disparate effects on phenotype. Common variants have typically been analysed individually23,48, but when each rare variant has only one or two carriers, pooling variants using specific criteria becomes attractive47,49,50. Pooling variants of a similar class increases the effective MAF of the class and reduces the number of tests performed, but raises several other questions (Box 1).
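The simplest form of pooling is a collapsing, or "burden", test: each person is coded as a carrier if they carry at least one qualifying rare variant in the gene or region, and carrier proportions are compared between cases and controls. The minimal sketch below uses a two-proportion z-test; which variants qualify (MAF cut-off, predicted function) is precisely the open question raised in the text, and the genotypes are hypothetical.

```python
import math
from statistics import NormalDist


def is_carrier(genotypes: list[int]) -> bool:
    """genotypes: minor-allele counts (0, 1 or 2) at the qualifying rare variants."""
    return any(g > 0 for g in genotypes)


def burden_test(cases: list[list[int]], controls: list[list[int]]) -> tuple[float, float]:
    """Return (z, two-sided P) comparing carrier proportions in cases and controls."""
    x1 = sum(is_carrier(g) for g in cases)
    x2 = sum(is_carrier(g) for g in controls)
    n1, n2 = len(cases), len(controls)
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:  # no carriers (or all carriers): nothing to test
        return 0.0, 1.0
    z = (x1 / n1 - x2 / n2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical data: three rare sites, each seen in only a few people.
cases = [[1, 0, 0]] * 20 + [[0, 1, 0]] * 20 + [[0, 0, 0]] * 960
controls = [[0, 0, 1]] * 15 + [[0, 0, 0]] * 985
z, p = burden_test(cases, controls)
print(f"z = {z:.2f}, P = {p:.1e}")
```

No single site here would reach significance on its own, but the collapsed carrier count (40 of 1,000 cases versus 15 of 1,000 controls) does; the cost is that variants which lower risk, or neutral variants included by mistake, dilute the signal.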
Research strategies using rare, low frequency and structural variants include: (1) Using expanding catalogues of human sequence variation44, together with linkage disequilibrium between rare, low frequency or structural variants and GWA-genotyped SNPs and/or improved detection methods, to identify the variants underlying association signals identified by SNP arrays. (2) Improving approaches for using common SNPs to predict and control for differences in rare and low frequency SNPs. (3) Using targeted sequencing judiciously, focusing on people with extreme or unusual phenotypes. (4) Including populations of recent African ancestry in sequencing studies to increase the yield of rare variants and narrow large linkage disequilibrium blocks, and considering isolated or founder populations potentially enriched with unique variants. (5) Focusing discovery efforts on well-phenotyped groups, accessible families with large sibships, and families that allow return to family members for iterative phenotyping. (6) Increasing emphasis on other structural variants such as inversions and translocations. (7) Implementing chromosomal-region-specific matching throughout the genome, selecting for each case, and for each part of the case’s genome, a control that is more similar to the case within that genomic region, rather than matching genome-wide on measures such as geographic ancestry. (8) Pooling rare variants for analysis using logical criteria, by addressing the questions: do the different rare variants increase or decrease disease risk? What classes of variants should be pooled? What is the optimal MAF threshold for pooling? (9) Improving CNV detection by developing more extensive population databases in large cohorts, to understand allele and mutation frequencies and inheritance among unaffected individuals, and by improving CNV calling algorithms.
Determining which of the multitude of variants carried by an individual are responsible for a given phenotype represents a massive task, especially if the causal alleles are relatively anonymous in terms of known functional consequences. Because only a small proportion of variants will have obvious functional consequences for the encoded protein, lesser statistical evidence of association may suffice to implicate those that do. The best approaches for combining functional credibility and statistical support in the evaluation of such variants remain to be determined. GWAS have tended to focus almost exclusively on statistical evidence and to de-emphasize considerations of biological plausibility, but the challenge of sifting through the millions of rare variants by which two individuals differ may prompt a return to biology if rare variants are to be grouped and analysed properly.
The sheer number of inter-individual differences, mostly rare, to be detected by whole-genome sequencing (roughly 0.4% of 3 billion base pairs51) also raises the question of finding appropriate comparison subjects, or allelic matches, because people carrying rare variants at some loci may differ from the general population in ancestry or other important factors. To reduce the number of variants that must be considered in a case-control comparison, it would be useful to implement chromosomal-region-specific matching throughout the genome, selecting closely related alleles and regions from the comparison population and thereby greatly reducing the number of incidental allelic differences from cases.
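Chromosomal-region-specific matching could be prototyped along the following lines. This is a hypothetical illustration, not a published method: genomes are coded as allele counts (0, 1 or 2) per site, the genome is split into fixed-size windows, and for each window the control with the fewest allelic differences from the case (the sum of absolute allele-count differences) is chosen. The window size, distance measure and tie-breaking (first control listed wins) are all our assumptions.

```python
def match_by_region(case: list[int], controls: dict[str, list[int]], window: int) -> list[str]:
    """Return the ID of the best-matching control for each window of the case genome."""
    matches = []
    for start in range(0, len(case), window):
        end = start + window

        def allelic_differences(control_id: str) -> int:
            genome = controls[control_id]
            return sum(abs(a - b) for a, b in zip(case[start:end], genome[start:end]))

        # min() keeps the first control listed when several are equally close.
        matches.append(min(controls, key=allelic_differences))
    return matches


# Hypothetical toy genomes with six sites, split into two windows of three sites.
case = [0, 1, 2, 2, 2, 0]
controls = {"c1": [0, 1, 2, 0, 0, 2], "c2": [1, 1, 1, 2, 2, 0]}
print(match_by_region(case, controls, window=3))
```

In the toy example control c1 matches the case exactly in the first window and c2 in the second, so neither would be the best genome-wide match for both regions.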
Structural variation, including copy number variants (CNVs, such as insertions and deletions) and copy neutral variation (such as inversions and translocations), may account for some of the unexplained heritability if those variants contribute to the genetic basis of human disease and are incompletely assessed by commercial SNP genotyping arrays. Although this type of variation has not been explicitly examined in most GWAS until now, CNVs in particular (regions 1 kilobase (kb) or longer present in variable numbers across individuals) have gained attention as methods to detect them have improved52,53. Other forms of structural variation, such as inversions, translocations, microsatellite repeat expansions, insertions of new sequence and complex rearrangements, have been implicated in rare Mendelian conditions, but such variation remains largely unexplored in relation to complex traits54.
Variation due to CNVs arises from a combination of rare and common alleles; as with SNPs, most variants are rare, but most of the differences between any two individuals arise from a limited set of common (MAF ≥ 5%) copy number polymorphisms (CNPs)55. Disease-associated CNVs detected so far, like disease-associated SNPs, include rare variants with large associated effect sizes, and common variants with more modest effects but carried by a large proportion of the population (Table 2). An added twist is that rare, highly penetrant CNVs have generally been large (600 kb–3 megabases (Mb), affecting many genes), whereas disease-associated common CNPs have been much smaller (20–45 kb) and have identified specific genomic features for follow-up study. Because both rare and common CNVs are under-ascertained by current methods, the relative effect of these variants will remain an important research question for CNVs, just as for SNPs. Of note, CNVs arising de novo in current cases and shown to be of importance in neuropsychiatric and developmental conditions56–58 will not contribute to family resemblance and heritability, but could explain some of the variation at present attributed to ‘environment’.
Several approaches have been developed for integrating analysis of CNVs into GWAS, including innovation in the design of GWA arrays (with associated discoveries in neuropsychiatric disorders59,60) and the use of the linkage disequilibrium relationships between SNPs and common CNPs (with associated discoveries in Crohn’s disease and body weight52,61). These approaches are early in their development and have important limitations, although rapid progress is expected as CNV detection algorithms evolve and large-scale sequencing studies produce comprehensive, high-resolution maps of segregating CNPs that can be measured in large reference panels.
Many GWA data sets already have sufficient genotype and intensity information to permit calling of large, rare CNVs even if specific CNV probes were not included. As with non-structural single nucleotide sequence variants, more detailed (‘iterative’) phenotyping in relatives may reveal subtle phenotypic effects that were not initially appreciated.
Family studies provide several opportunities for the investigation and interpretation of as-yet-unidentified genetic variation of many types underlying complex diseases (Box 2). Family studies may facilitate the detection of rare and low frequency variants, and the identification of their associations with common diseases, because predisposing variants will be present at much higher frequency in affected relatives of an index case.
To investigate missing heritability using family studies, the following strategies are proposed: (1) Examine phenotypic effects of rare variants, particularly for subtle phenotypic abnormalities. (2) Investigate mutation rates and inheritance patterns of recurrent mutations. (3) Assess inheritance patterns of rare and structural variants. (4) Investigate parent-of-origin-specific effects. (5) Enhance power for identifying associated loci by studying affected sibs, particularly for conditions with substantial genetic heterogeneity. (6) Identify associated loci by unexpectedly long runs of identity-by-state sharing among distantly related affected relatives. (7) Enhance power of GWA scans by up-weighting P values in preselected regions based on linkage signals. (8) Identify gene–gene interactions by positive correlations between family-specific logarithm of the odds (lod) scores or evidence of linkage disequilibrium among unlinked loci.
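Up-weighting P values in linkage regions (item 7) can be sketched as a weighted Bonferroni rule, a general p-value-weighting idea shown here as an illustration rather than as a specific published pipeline. Test i is declared significant if p_i ≤ αw_i/m, where the weights are non-negative and average 1; this keeps the family-wise error rate at or below α while shifting power towards the preselected regions. The up-weight factor and the toy scan are hypothetical.

```python
def weighted_bonferroni(p_values: list[float], in_linkage_region: list[bool],
                        up_weight: float = 5.0, alpha: float = 0.05) -> list[int]:
    """Indices of tests significant under a weighted Bonferroni rule.

    Tests inside linkage regions get up_weight times the weight of other tests;
    weights are rescaled to average 1, so the family-wise error rate stays <= alpha.
    """
    m = len(p_values)
    raw = [up_weight if flag else 1.0 for flag in in_linkage_region]
    scale = m / sum(raw)
    weights = [w * scale for w in raw]
    return [i for i, (p, w) in enumerate(zip(p_values, weights)) if p <= alpha * w / m]


# Hypothetical scan: 1,000 tests, the first 10 lying under a linkage peak.
p_values = [0.5] * 1000
p_values[0] = 1e-4      # inside the linkage region
p_values[500] = 4.9e-5  # outside it
linkage = [i < 10 for i in range(1000)]
print(weighted_bonferroni(p_values, linkage))                 # weighted rule
print(weighted_bonferroni(p_values, linkage, up_weight=1.0))  # ordinary Bonferroni
```

The weighted rule flags only test 0, whereas ordinary Bonferroni flags only test 500: prior linkage evidence buys power inside the preselected regions at a modest cost everywhere else.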
Family studies also permit the investigation of parent-of-origin-specific effects, as have been reported for structural variants62,63. If not properly accounted for, such effects could mask associations and diminish the proportion of heritability explained. High-density SNP data in extended pedigrees can be used to localize predisposition genes, as unexpectedly long runs of identity-by-state sharing among affected relatives suggest true IBD that is probably due to an underlying genetic cause64. Linkage data can also enhance the power of high-density GWA scans by, in effect, relaxing P-value thresholds in the few regions where suggestive linkage and association findings overlap but neither is definitive65. Family studies may also be useful in identifying gene–gene interactions, because affected relatives are more likely to share two nearby epistatic loci in linkage disequilibrium that would be unlinked in unrelated individuals66,67.
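A naive version of the identity-by-state scan can be written in a few lines. With unphased genotypes coded as minor-allele counts, two people share no allele IBS at a marker only when they are opposite homozygotes (0 versus 2), so a long stretch of markers with no opposite homozygotes is compatible with a shared IBD segment. This heuristic and its function name are our own illustration; published methods also model allele frequencies, genotyping error and genetic map distance, and marker count is only a crude proxy for segment length.

```python
def longest_ibs_run(geno_a: list[int], geno_b: list[int]) -> tuple[int, int]:
    """Length and start index of the longest run of markers sharing at least one allele IBS.

    Genotypes are unphased minor-allele counts (0, 1 or 2); a run is broken only
    where the two individuals are opposite homozygotes.
    """
    best_len, best_start = 0, 0
    run_len, run_start = 0, 0
    for i, (a, b) in enumerate(zip(geno_a, geno_b)):
        if abs(a - b) < 2:  # at least one allele shared identical by state
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_len, best_start = run_len, run_start
        else:               # opposite homozygotes: IBD is impossible here
            run_len = 0
    return best_len, best_start


# Hypothetical pair of affected relatives typed at eight markers.
print(longest_ibs_run([0, 1, 2, 2, 1, 0, 0, 2], [2, 1, 2, 1, 1, 0, 1, 0]))
```

In practice a run would be flagged only if it is much longer than expected by chance given the relatives' degree of relationship and the local allele frequencies.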
The nearly 400 GWAS published so far represent a wealth of data on the genetics of complex diseases4. These studies have provided valuable insights into the genetics of common diseases, particularly about the underlying genetic architecture of complex traits and the predominance of non-coding variants that may have a role in their aetiology. Just as linkage studies demonstrated that complex diseases cannot be explained by a small number of rare variants with large effects, GWAS have shown that they cannot be explained by a limited number of common variants of moderate effect (Fig. 1). The distinction between low frequency and truly rare alleles is largely an operational one, relating to the potential, given realistic effect sizes, for detecting associations with low frequency variants by GWAS at attainable sample sizes. Low frequency variants of intermediate effect might also explain some of the missing heritability, and their associations should be tractable through large meta-analyses and/or imputation of genome-wide association data.
GWAS will probably remain an efficient way of investigating the remaining heritability, because their association signals may well define the genomic regions where rare variants, structural variants, and other forms of underlying variation are likely to cluster. The value of future studies can be enhanced by expanding to non-European samples and less common diseases and including more precise phenotypes and measures of environmental exposures48,68 (Box 3). Information on lower frequency alleles emerging from projects such as the 1,000 Genomes Project will be used to produce even more comprehensive GWA arrays, and will facilitate the investigation of the lower frequency spectrum without the need for de novo sequencing.
The following steps can be used to make the most of existing and future GWAS: (1) Ensure the wide availability of data with appropriate protections for consent and privacy. (2) Increase sample sizes and ensure thorough meta- and mega-analyses of comparable data, with increased focus on conditions with relatively small sample sizes studied so far. (3) Expand studies to non-European samples and more diverse diseases. (4) Improve phenotyping by expanding to subtler or more quantitative or precise phenotypes as needed to reduce heterogeneity or explore pleiotropic effects. (5) Capture a larger proportion of variation in implicated genes. (6) Enhance the investigation of the X chromosome, particularly as the methods for imputation of X and Y markers improve. (7) Investigate gene–gene interactions, including dominance and epistasis. (8) Investigate gene–environment interactions: measure environment rigorously and analyse it against GWA data; examine rare exposures in common diseases for unusual responders; consider including GWA in monozygotic twins or migrant studies to identify gene–environment interactions; conduct suitably large (several hundred thousand people) prospective cohort studies with GWA genotyping and reproducible, reliable exposure measures at baseline; include routine biobanking of material suitable for epigenetic analysis, such as non-immortalized lymphocytes for DNA methylation or cryopreserved cell or nuclear preparations for chromatin studies; relate quantitative phenotypes to epigenetic variation, which unlike SNPs is inherently quantitative; measure epigenetic variants in appropriate tissues when technically feasible. (9) Measure CNVs: use linkage disequilibrium patterns of SNP data and improved maps and imputation methods to identify common CNPs; use SNP intensity data to identify large CNVs where feasible; use the best possible CNP typing array until next-generation sequencing becomes feasible for this purpose.
GWAS were initially designed to focus on the higher end of the frequency-effect size spectrum, so much work remains to be done, both in finding other variants in the lower frequency and larger effect domains shown in Fig. 1, and in understanding their functional and pathophysiological properties. To the extent that there are several causal variants on a common haplotype or that causal variants are in imperfect linkage disequilibrium with genotyped markers, marker SNPs will underestimate the associated disease risk.
The modest size of genetic effects detected so far confirms the multifactorial aetiology of these conditions and suggests that complex diseases will require substantially greater research effort to detect additional genetic influences. Near-term approaches for finding missing heritability on which there seems to be wide agreement include: targeted or whole-genome sequencing in people with extreme phenotypes, especially those with available family members and consent for recontact and iterative phenotyping; use of expanded reference panels of genomic variation such as 1,000 Genomes to enhance coverage of existing and future GWAS; mining of existing GWAS for associations with structural variants and evidence of gene–gene interactions; improved methods for detection of CNVs and other structural variants, applied to large, well-phenotyped groups and families; and expansion of sample sizes for numerous complex diseases through larger individual studies and meta-analyses, including people of non-European ancestry.
Given all that has been learned of the genetic architecture of common diseases in the past few years, it may also be worthwhile to attempt exhaustive characterization of some well-studied traits by cataloguing all the contributing variation, be it in DNA sequence, DNA structure, chromatin structure or environmental modifiers, and by defining all its functional implications. Potential criteria for deciding which traits to pursue aggressively in this way might include the strength and robustness of detected associations, evidence that associations are disrupted by varying linkage disequilibrium patterns, documented associations of identified loci with multiple traits, and the public health importance of the traits to be studied.
Explaining missing heritability, however intellectually satisfying, will probably have fewer practical applications as an end in itself than as a means to an end. The ultimate goal of this line of research, as with nearly all research in the genetics of complex disease, is to improve understanding of human physiology and disease aetiology so that more effective means of diagnosis, treatment and prevention can be developed. If one or more genetic variants were found that opened the door to effective new treatments at low cost and with minimal side effects (LDL-receptor mutations and the statin class of drugs come to mind), one would probably be content to leave some heritability unexplained. It is the expectation that associations identified by GWAS or other genomic methods will eventually enable effective disease prevention or treatment, either through delineation of the functional properties of variants recognized at present or through identification of new variants in which true functionality lies, that primarily motivates the hunt for missing heritability.
It is difficult to imagine predictive variants accounting for a sizeable proportion of disease risk without also explaining a sizeable proportion of heritability, and the limited incremental value in disease prediction of the variants identified so far suggests that genetic prediction of complex diseases on a population basis will be challenging69–71. Still, the identification of even many hundreds of risk variants of small effect should permit identification of the small proportion of a population at the highest genetically defined risk, in whom targeted prevention strategies should be explored. If such variants were tested across several diseases, as is now feasible with dense genome-wide association genotyping and will be greatly facilitated by whole-genome sequencing, a sizeable number of people could be identified as being at greatly increased risk for at least one disease. Identification of genetic variants that influence disease risk, prognosis or response to treatment should enable the development of diagnostic and interventional strategies that are safe, effective and, as necessary, individualized71, although the value of genetic variants in disease prediction and the steps needed to realize it are widely debated69,70. Given how little of the demonstrable genetic influence on most common diseases has actually been explained, despite the identification of hundreds of associated genetic variants, the search for missing heritability provides a potentially valuable path towards further discoveries.
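The arithmetic behind the point about testing across several diseases is simple if one assumes, purely for illustration, that genetic risk scores for different diseases are independent: the chance of falling in the top fraction q of risk for at least one of k diseases is 1 − (1 − q)^k. The sketch below also shows one common way to build such a score, as a sum of risk-allele counts weighted by per-allele log odds ratios. Both the additive score and the independence assumption are simplifications chosen here, not claims made in the paper, and the function names are hypothetical.

```python
def polygenic_score(genotypes: list[int], log_odds_ratios: list[float]) -> float:
    """Additive risk score: risk-allele counts (0, 1 or 2) times per-allele log ORs."""
    if len(genotypes) != len(log_odds_ratios):
        raise ValueError("one weight per genotyped variant is required")
    return sum(g * w for g, w in zip(genotypes, log_odds_ratios))


def high_risk_for_any(top_fraction: float, n_diseases: int) -> float:
    """Probability of being in the top `top_fraction` for at least one of `n_diseases`.

    Assumes the disease-specific risk scores are statistically independent.
    """
    return 1 - (1 - top_fraction) ** n_diseases


for k in (1, 5, 10, 20):
    print(k, round(high_risk_for_any(0.05, k), 3))
```

With a top-5% cutoff per disease, about 40% of people would be flagged for at least one of ten diseases under independence; correlated risk scores would lower that figure.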
This paper is inspired by the deliberations of an expert working group convened by the National Human Genome Research Institute (NHGRI) on 2–3 February 2009 to address the heritability left unexplained by GWAS. The authors acknowledge the participation of J. C. Cohen, M. Daly and A. P. Feinberg in the workshop.
Author Contributions T.A.M., F.S.C., N.J.C., D.B.G., L.A.H., D.J.H., M.I.M. and E.M.R. planned and participated in the workshop; L.R.C., A.C., J.H.C., A.E.G., A.K., L.K., E.M., C.N.R., M.S., D.V., A.S.W., M.B., A.G.C., E.E.E., G.G., J.L.H., T.F.C.M., S.A.M. and P.M.V. participated in the workshop; T.A.M., P.M.V., G.G., M.I.M., E.E.E., T.F.C.M. and S.A.M. drafted the manuscript; F.S.C., N.J.C., D.B.G., L.A.H., D.J.H., E.M.R., L.R.C., A.C., J.H.C., A.P.R., A.E.G., A.K., L.K., E.M., C.N.R., M.S., D.V., A.S.W., M.B., A.G.C. and J.L.H. critically reviewed and revised the manuscript for content.
Author Information Reprints and permissions information is available at www.nature.com/reprints. The authors declare competing financial interests: details accompany the full-text HTML version of the paper at www.nature.com/nature.