Although recent genome-wide studies have provided valuable insights into the genetic basis of human disease, they have explained relatively little of the heritability of most complex traits, and the variants identified through these studies have small effect sizes. This has led to the important and hotly debated issue of where the ‘missing heritability’ of complex diseases might be found. Here, seven leading geneticists offer their opinion about where this heritability is likely to lie, what this could tell us about the underlying genetic architecture of common diseases and how this could inform research strategies for uncovering genetic risk factors.
Understanding the heritability of genetic diseases requires a more comprehensive assessment of human genetic variation. Human genomes are rich in structural diversity, but the discovery and genotyping of this type of variation has lagged far behind those of the SNP1,2. Although there has been a tremendous push to close this gap over the past 4 years3–5, two aspects remain understudied. The first is the exploration of the landscape and impact of large variants (deletions, duplications and inversions) that are individually rare but collectively common in the human population6,7. An estimated 8% of the general population carry a large (>500 kb) deletion or duplication that occurs at an allele frequency of <0.05%7. The available data suggest that these variants are under strong selection, affect transcription8 and contribute to a variety of different diseases9. These genomic imbalances represent a special class of rare variants that can potentially affect many genes and pathways in a single individual. Not only are large numbers of cases and controls required to assess the clinical significance of particular events, but the modelling of other forms of genetic variation in this sensitized background of localized haploidy or triploidy remains largely unexplored. The second aspect involves the several hundred genes that map to regions of copy-number polymorphic (CNP) duplications. Available data suggest that these genes are highly variable among individuals, are enriched in genes associated with drug detoxification, immunity and environmental interaction10, and have been subject to bursts of rapid, and sometimes adaptive, evolution in humans and our ape relatives. However, because of their repetitive and multicopy nature, these genes are considered inaccessible by most existing genotyping and sequencing technologies. 
There is a pressing need to characterize not only copy number but also the sequence content and structural arrangement in these diverse regions of our genome. Such regions are more likely to be subject to recurrent mutations and be inadequately assayed by a correlated neighbouring SNP. It is therefore premature to conclude that CNPs have limited impact in terms of common disease until these more complex regions are tested11. Excluding the most variable and diverse regions of human genetic variation because they are difficult to study is an unacceptable loss in the pursuit of genotype–phenotype correlations.
I find it hard to imagine that there will be a single answer to the question of where to find missing heritability, but I have a suggestion as to what might help find it. Even in crosses between inbred mouse strains, in which the genetics is simplified to a comparison between two genomes (and related genomes at that), there is variation in genetic architecture among phenotypes. For example, susceptibility to infectious disease has often turned out to be due to variants of large effect12,13; however, apparently equally complex phenotypes (such as cell counts of red and white blood cells, variation in high and low density lipoproteins and obesity) have a much more complex genetic architecture due to the joint action of very many loci of small effect14. Differences also exist in the extent to which epistasis shapes a phenotype: pervasive epistatic effects have been documented in autoimmune conditions15, morphology16 and susceptibility to cancer17, but the genetic architecture underlying fear-related phenotypes consists almost entirely of multiple small additive effects14,18.
Differences in genetic architecture reflect the complex, often opposing effects of selection, population history, migration and mutation rates. Is it possible to be more specific, to make predictions about genetic architecture? Interactions between selection and the size and structure of populations contribute to allele frequencies in predictable ways19, and theoretical models and data have already been used to argue that additive genetic effects are likely to be common in complex phenotypes20. But this conclusion applies broadly, averaging across all phenotypes in different populations. A more fine-grained analysis, examining individual phenotypes and taking into account the characteristics of individual populations, has yet to be undertaken. We would not, for example, expect the genetic architecture of schizophrenia and autism, both conditions that considerably lower reproductive fitness, to be the same as that of intelligence, height or weight. Modelling per-locus effect sizes has been used to constrain the genetic architecture of schizophrenia21. Presumably, modelling fitness would constrain possibilities yet further. So, understanding why genetic architecture differs for different traits could help when choosing the correct tools to find the underlying genes and deciding whether to look for common or rare variants, and studying the genetic architecture might even tell us what type of variant to expect (for example, a SNP or a copy-number variant). Population and theoretical genetics approaches may hold the key to finding the missing heritability.
In a nutshell, I think the missing heritability problem is overblown, and the focus on hits that are significant genome-wide is distracting attention from more general concerns over the ability of genome-wide association (GWA) studies to fully describe the architecture of phenotypic variation. A lot of the confusion may arise because heritability seems often to be equated with genetic contributions, but it actually refers to the ratio of the genetic to the total phenotypic variance in a population22. The population is assumed to share a common environment, but if there is hidden environmental structure that interacts with genes, then this effect shunts genetic variance to the denominator and reduces the (narrow sense) heritability estimate. So the heritability gives a lower bound on estimates of how much total variance the genetic component should explain. Exceptions will occur if heritability is estimated in a biased way with respect to environmental exposures or from pedigrees that have a high risk.
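The variance-ratio argument can be written out explicitly. What follows is a minimal sketch under a standard quantitative-genetic decomposition; the symbols $V_A$ (additive genetic variance), $V_G$ (total genetic variance), $V_E$ (environmental variance), $V_{G \times E}$ (gene-environment interaction variance) and $V_P$ (total phenotypic variance) are notation assumed here for illustration and are not used in the text itself.

```latex
h^2 = \frac{V_A}{V_P},
\qquad
V_P = V_G + V_E + V_{G \times E},
\qquad
V_{G \times E} > 0
\;\Longrightarrow\;
\frac{V_A}{V_G + V_E + V_{G \times E}} < \frac{V_A}{V_G + V_E}
```

Read this way, any hidden interaction variance enlarges the denominator without adding to the numerator, which is why the narrow-sense estimate behaves as a lower bound in the sense described above.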
More fundamentally then, there is a missing genetic variance problem, which really relates to misplaced preconceptions. It would have been nice if GWA studies typically uncovered a dozen associations each explaining 5–10% of the variance, but the fact that they do not suggests only that the allelic effects are smaller or the causal alleles are too rare. With so many tests, there is a high false-negative rate, as true associations are hidden in the fog of random associations. I am convinced that myriad common variants of small effect do explain the vast majority of genetic effects predicted by heritability estimates, even though we cannot detect them individually23. Rare variants of large effect, sometimes in synthetic association with common variants, will also contribute24.
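To give a sense of scale for "small effect", here is a minimal sketch in Python. It assumes a biallelic locus in Hardy-Weinberg proportions, a purely additive per-allele effect and a phenotype scaled to unit variance, in which case the locus explains 2p(1 - p)a² of the phenotypic variance. The function names, the allele frequency and the effect size are illustrative assumptions, not values taken from any study cited here.

```python
import math


def locus_variance_explained(freq, effect_sd):
    """Fraction of phenotypic variance explained by one additive biallelic locus.

    Assumes Hardy-Weinberg proportions and a phenotype with unit variance;
    effect_sd is the per-allele effect in phenotypic standard deviations.
    """
    return 2.0 * freq * (1.0 - freq) * effect_sd ** 2


def loci_needed(target_variance, freq, effect_sd):
    """Number of independent, identical loci needed to reach target_variance."""
    per_locus = locus_variance_explained(freq, effect_sd)
    return math.ceil(target_variance / per_locus)


if __name__ == "__main__":
    # Hypothetical common variant: frequency 0.3, effect of 0.05 SD per allele.
    print(locus_variance_explained(0.3, 0.05))
    # How many such loci would be needed to explain half of the variance?
    print(loci_needed(0.5, 0.3, 0.05))
```

Under these assumptions a single common variant of this size explains roughly a tenth of a percent of the variance, so hundreds of such loci would be needed to account for a heritable trait, and none of them need be individually detectable.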
To my mind, the really interesting questions concern differences in the architecture of diseases, and the rising prevalence of chronic disease. Gene–environment (G×E) interactions may be important in both of these situations, but they will not appear in the heritability estimates. For example, lifestyle changes may alter the distribution of genetic effects in some people, and genetic buffering may be disrupted in some pedigrees. There are many reasons why such G×E interactions will not be revealed by GWA studies. Foremost is lack of statistical power, especially if a small fraction of individuals experience the adverse exposure. Population stratification can also induce false negatives in which the allelic effect is in the opposite direction to the population effect — and I suspect this is very common. And of course we are largely unable to identify what the relevant environments are. For these reasons, stratifying GWA studies by environments is unlikely to reveal any but the largest interaction effects, and testing for G×E effects will not obviously explain much more of the variance. G×E interactions may be strongest for rare alleles, but this will be very hard to detect on a case-by-case basis. But we certainly need to learn a great deal more about how the environment does modify allelic effects, because after all it is not genotype–phenotype so much as genotype effect–phenotype associations that really matter. A good place to start is environmental influences on the transcriptome and metabolome.
Recently, some susceptibility variants for cancer and type 2 diabetes (T2D) were shown to confer risk only when inherited from a specific parent, and a variant was discovered that can either confer or reduce risk of T2D depending on the parent of origin25. Such variants contribute to missing heritability in two ways: first, they are more difficult to discover and second, even if discovered, their contribution to heritability would be underestimated when evaluated under models that do not take parental origin into account. In another example, variants that increase the recombination rate for fathers reduce the recombination rate for mothers26. As recombination rates of parents affect transmissions and recombinations are sometimes associated with mutations, there could be sex-specific association between variants in parents and risks in offspring. More generally, for diseases in which prenatal conditions have a role, interactions between the genetic variants present in the parent and offspring are possible. In T2D, we estimate that about 13 to 14% of the heritability accounted for by the known variants can be attributed to parent-of-origin effects. Parent-of-origin effects have also been reported for type 1 diabetes27. Considering that the power to detect such variants is low and these effects could be more prevalent with rare variants, such effects should not be overlooked.
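The underestimation can be illustrated with a toy calculation. The sketch below is written in Python and assumes a quantitative trait, random mating and an allele whose effect has opposite sign depending on the transmitting parent; the function name and all numerical values are hypothetical and chosen only to make the arithmetic transparent. It compares the variance explained when parental origin is known with the variance explained by allele count alone.

```python
from itertools import product


def variance_explained(freq, pat_effect, mat_effect):
    """Return (variance with origin known, variance from allele count alone).

    freq is the allele frequency; pat_effect and mat_effect are the trait
    shifts when the allele is inherited from the father or the mother.
    """
    cells = []
    for pat, mat in product((0, 1), repeat=2):
        prob = (freq if pat else 1 - freq) * (freq if mat else 1 - freq)
        mean = pat_effect * pat + mat_effect * mat
        cells.append((prob, pat + mat, mean))

    grand_mean = sum(p * m for p, _, m in cells)
    with_origin = sum(p * (m - grand_mean) ** 2 for p, _, m in cells)

    # Pool the two kinds of heterozygote, as a model ignoring origin would.
    by_count = 0.0
    for count in (0, 1, 2):
        group = [(p, m) for p, c, m in cells if c == count]
        weight = sum(p for p, _ in group)
        if weight > 0:
            group_mean = sum(p * m for p, m in group) / weight
            by_count += weight * (group_mean - grand_mean) ** 2
    return with_origin, by_count


if __name__ == "__main__":
    # Hypothetical allele: +0.2 SD if paternal, -0.2 SD if maternal.
    print(variance_explained(0.3, 0.2, -0.2))
```

In this deliberately extreme case the two kinds of heterozygote cancel, so a model based on allele count attributes essentially no variance to a locus that does contribute when origin is tracked.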
More important to consider, however, is the fact that complex inheritance can take on numerous forms. Epigenetic effects beyond imprinting that are sequence-independent and that might be environmentally induced but can be transmitted for one or more generations28 could contribute to missing heritability. Phase-dependent interactions between variants that are not in linkage disequilibrium and that are difficult to detect without long-range haplotypes29 are another possibility. Predicting the amount of missing heritability explained by each of these would be speculation, but it is reasonable to assume that complex inheritance as a whole could account for a substantial fraction of heritability. Everyone is looking forward to full-genome sequencing of large samples. Simply being able to identify and type variants, rare or common, old or new, that are not tagged by the current common SNPs will no doubt lead to many important discoveries. However, various other types of information can add value to sequence data, such as knowledge about the epigenome30 and information on families and parental origins (including information from long-range phasing, which among other uses29 would assist in determining the age of the nearest common ancestor when two individuals share a region by descent). The possibility that some heritability comes from entirely unforeseen sources is actually something to look forward to in the future.
Although GWA studies have been successful in identifying common variants involved in complex trait aetiology, for the majority of complex traits, <10% of genetic variance is explained by common variants31. Genetic variance may also be explained by gene interactions and structural variation, and there is strong evidence that rare variants have an important role. These variants, although individually rare, are collectively frequent, and even though their effect sizes are greater than those observed for common variants, they are not large enough to produce familial aggregation32. For a variety of complex traits, exome data are currently being generated, and whole-genome sequencing will follow for the next wave of GWA studies. Detecting associations with rare variants is the first step towards a better understanding of the extent of their role in complex trait aetiology. Common variant analyses can be used for direct mapping of rare variants; however, these methods are underpowered owing to low allele frequencies and allelic heterogeneity33,34. Association methods developed specifically for rare variants jointly analyse variants in a locus or gene instead of individually testing each variant. These methods include the comparison of rare variants found only in cases to those found only in controls (RVE)35, the combined multivariate and collapsing (CMC)4 method and the weighted sum statistic (WSS)36, with the CMC and WSS methods having a power advantage over the RVE method. Some methods, such as the CMC method, can easily be adapted for the analysis of quantitative traits37, the control of confounding factors, and the inclusion of variant-predicted functionality and accuracy of genotype calls. Because sequencing uncovers both causal and non-functional variants, methods for analysing rare variants must be robust to misclassification. Even if detected gene associations are replicated, it is not possible within this analysis framework to tease apart causal from neutral variation.
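The idea of analysing the variants in a gene jointly, rather than one at a time, can be sketched in a few lines of Python. This is a simplified collapsing test written for illustration only; it is not the published CMC, WSS or RVE procedure, and the function names and the choice of a one-sided Fisher exact test are assumptions made here. Each individual is reduced to a carrier indicator (at least one rare allele anywhere in the region), and carrier counts are then compared between cases and controls.

```python
from math import comb


def collapse(genotypes):
    """Map each individual's rare-allele counts to a 0/1 carrier indicator."""
    return [1 if any(g > 0 for g in person) else 0 for person in genotypes]


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for an excess of carriers among cases.

    a: case carriers, b: case non-carriers,
    c: control carriers, d: control non-carriers.
    """
    n_cases = a + b
    n_carriers = a + c
    total = a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_carriers, k) * comb(total - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, n_cases)


def collapsing_test(case_genotypes, control_genotypes):
    """Return the 2x2 carrier table and the one-sided p-value."""
    case_carriers = collapse(case_genotypes)
    control_carriers = collapse(control_genotypes)
    a = sum(case_carriers)
    b = len(case_carriers) - a
    c = sum(control_carriers)
    d = len(control_carriers) - c
    return (a, b, c, d), fisher_one_sided(a, b, c, d)


if __name__ == "__main__":
    # Toy data: rows are individuals, columns are rare variant sites in one gene.
    cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0], [0, 0, 0]]
    controls = [[0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
    print(collapsing_test(cases, controls))
```

In the toy data no single site is seen more than twice, yet the collapsed carrier counts differ between the groups. The sketch also makes the misclassification concern concrete: every variant counts equally towards carrier status, whether causal or neutral.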
Due to the current cost of exome sequencing, study designs that maximize power for a given number of sequenced individuals are beneficial; these designs include extreme quantitative trait sampling and the use of common controls. When analysing rare variants, it is important to adequately control for population substructure and admixture; rare variants tend to have occurred more recently and therefore have greater population diversity than common variants. Analysis methods that evaluate effect sizes are necessary for the estimation of the amount of explained genetic variance. Computationally efficient methods are also important, in particular for the analysis of exome and whole-genome sequence data. Sequencing and the analysis of phenotyped samples will elucidate over the next few years whether or not the majority of the missing heritability for complex traits is due to rare variants.
The case of the missing heritability for common human diseases should not be a mystery to anyone given the inherent complexity of the relationship between genotype and phenotype. Consider, for example, the discovery of non-coding microRNAs that provide a new mechanism of translational regulation. Genetic variation influencing the expression of non-coding RNAs has the potential to add a previously unexpected layer of complexity to the genetic architecture of biological and clinical traits. For example, Nicoloso et al.38 recently showed that SNPs associated with breast cancer susceptibility can alter microRNA gene regulation. This is one of many layers of complexity that need to be considered before we will truly be able to identify the missing heritability that has not been accounted for by agnostic or unbiased GWA studies that focus on one genetic variation at a time.
Given this expected complexity, it is likely that synergistic interactions among variants in the sequences that regulate both microRNA and mRNA expression have a big impact on protein expression. Similarly, it is reasonable to presume that coding sequence variations, for example, could synergistically act through protein–protein interactions and protein–DNA interactions in transcriptional networks and biochemical systems. Such biomolecular interactions that depend on multiple genetic variations can substantially complicate the relationship between genotype and phenotype, making it impossible to explain phenotypic variation simply by adding together independent genetic effects. This hypothesis is completely consistent with the current results of GWA studies. Indeed, this kind of complexity is routinely observed in simple organisms, such as yeast39. If this is true for yeast, why would we expect humans to be markedly simpler?
The idea that multiple genetic variations can and do interact through different layers of genomic complexity is not new. In fact, Bateson40 coined the term epistasis to describe one gene standing upon or modifying the effects of another gene. This early definition of epistasis has given way to more modern definitions that recognize the complexity of gene networks and biochemical systems41–43. This logically leads to the idea that a significant proportion of the missing heritability is not due to single common variants, nor single rare variants, but rather to rare combinations of common variants. As such, solving part of the missing heritability problem will require the application of statistical and computational methods that detect patterns of epistasis across the genome implemented in a systems biology framework that accounts for the highly interconnected nature of biomolecular networks44,45. High-throughput technology alone will not solve this problem. The time is now to philosophically and analytically retool for a complex genetic architecture or we will continue to underdeliver on the promises of human genetics. Indeed, life, and thus genetics, is complicated46 and some will soon ask, as seismologists have47, whether we are trying to predict the unpredictable.
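A toy penetrance table shows how risk can depend entirely on combinations of genotypes while each locus, viewed alone, appears inert. The sketch below is in Python; the table values, the allele frequencies of 0.5 and the function name are hypothetical choices made for illustration, not estimates for any real disease.

```python
# Genotype frequencies for AA, Aa, aa at allele frequency 0.5 (Hardy-Weinberg).
GENOTYPE_FREQ = (0.25, 0.5, 0.25)

# Hypothetical probability of disease: rows are locus A, columns are locus B.
PENETRANCE = (
    (0.0, 0.1, 0.0),
    (0.1, 0.0, 0.1),
    (0.0, 0.1, 0.0),
)


def marginal_penetrance(table, freqs, locus):
    """Average disease risk per genotype at one locus, ignoring the other.

    locus 0 averages over columns (locus B); locus 1 averages over rows.
    """
    size = len(freqs)
    result = []
    for i in range(size):
        if locus == 0:
            values = [table[i][j] for j in range(size)]
        else:
            values = [table[j][i] for j in range(size)]
        result.append(sum(f * v for f, v in zip(freqs, values)))
    return result


if __name__ == "__main__":
    print(marginal_penetrance(PENETRANCE, GENOTYPE_FREQ, 0))
    print(marginal_penetrance(PENETRANCE, GENOTYPE_FREQ, 1))
```

Every single-locus genotype carries the same average risk in this table, so a scan that tests one variant at a time would find nothing at either locus, even though the two-locus genotype fully determines whether risk is zero or elevated.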
Testing associations between genotype and phenotype is central to many genetic studies of inherited traits and disease. Historically, Mendel’s laws and Morgan’s chromosome theory of inheritance have dominated research such as GWA studies, and alternative modes of inheritance have been largely and appropriately dismissed for lack of evidence. Now we discover that most genetic variants that account for the heritable component of phenotypic variation elude discovery48. Perhaps heritability is overestimated. Alternatively, perhaps the missing variants reside in largely unexplored regions of the genome, or in largely untested classes of genetic variation. Another possibility is that genetic variants are missed because they are rare and their effects are small. Or perhaps genetic complexity is greater than imagined, with a very large number of closely linked genes that show context-dependent and non-additive effects49. But will these possibilities reveal the ‘whole truth’ about ‘missing heritability’?
Recent studies in mice provide striking evidence for transgenerational genetic effects in which phenotypic variation in the present generation results from genetic variants in previous generations28. Remarkably, these studies show that transgenerational effects persist across several if not many generations, and that these effects are common and usually as strong as conventional inheritance. The original discoveries involve pigmentation50, germ cell51,52 and heart development53, embryogenesis54 and growth54, and ongoing work provides examples involving metabolism, behaviour and many other traits61 (J.H.N., unpublished observations). Some transgenerational effects are reminiscent of paramutations50,53,54, whereas others involve interacting genes in different generations51,52. The obvious molecular mechanisms for epigenetic inheritance are DNA methylation and histone modifications55. However, evidence from plants, flies and more recently from mice suggests another possibility28: a combination of small RNAs50,53,54, RNA-binding proteins that are involved in both RNA editing56 and microRNA access to their target mRNAs57, and DNA methylation mediated by RNA-editing enzymes58,59 controls translation in RNA granules that are abundant in the gametes of both males and females50,60.
How are these discoveries relevant to missing heritability? Because transgenerational effects loosen conventional genotype–phenotype associations, even complete genome surveys will fail to reveal the full repertoire of genetic variants. Under these conditions, traits are heritable, with family members being more similar to each other than unrelated individuals. Remarkably in these cases, the genotype of individuals in previous generations is a better predictor of phenotype than the individual’s own genotype. Two key questions emerge: first, do transgenerational effects occur in humans? This test could simply involve examining associations between genotypes and phenotypes across generations. Second, what is the molecular basis for epigenetic inheritance? Although other possibilities exist, perhaps the most provocative evidence suggests that small RNAs and related protein functions are responsible for the epigenetic persistence of genetic memory.
E.E.E. thanks G. Cooper for helpful suggestions. J.F. is supported by the Wellcome Trust. J.H.M. is supported by US National Institutes of Health R01s LM009012, LM010098 and AI59694.
Competing interests statement
E.E.E. and A.K. declare competing financial interests; see Web version for details.
Evan E. Eichler’s homepage: http://eichlerlab.gs.washington.edu
Jonathan Flint’s homepage: http://www.well.ox.ac.uk/flint
Greg Gibson’s homepage: http://www.gibsongroup.biology.gatech.edu
Suzanne M. Leal’s homepage: http://www.bcm.edu/genetics/leal
Jason H. Moore’s homepage: http://www.epistasis.org
The contributors*
Evan E. Eichler is a professor and Howard Hughes Medical Institute (HHMI) Investigator in the Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA. He received his Ph.D. in 1995 from the Department of Molecular and Human Genetics at Baylor College of Medicine, Houston, Texas, USA. He joined the faculty of Case Western Reserve University, Cleveland, Ohio, USA, in 1997 and later the Department of Genome Sciences in 2004. He was appointed as an HHMI Investigator in 2005 and an American Association for the Advancement of Science Fellow in 2006, and was awarded the American Society of Human Genetics Curt Stern Award in 2008. He is an editor of the journal Genome Research and a participant in numerous genome sequencing projects. His research group develops experimental and computational methods for studying complex regions of genome structural variation, including deletions, duplications and inversions. The long-term goal of his research is to understand the evolution and mechanisms of primate gene duplication and their relationship to copy-number variation and human disease.
Jonathan Flint is a psychiatrist working at the University of Oxford, UK. After working on small chromosomal deletions (or copy-number variants) that cause mental retardation, he turned to investigating the genetic basis of anxiety and depression in animal models and in humans. He developed strategies for fine-mapping genetic loci underpinning complex traits using heterogeneous stocks for genetic mapping. These strategies apply to all phenotypes, not just behaviour.
Greg Gibson is a professor in the School of Biology at the Georgia Institute of Technology in Atlanta, Georgia, USA. He trained as a developmental molecular geneticist at the University of Basel, Switzerland, and at Stanford University, California, USA, and then worked on genomic approaches to quantitative genetic analysis of complex traits in Drosophila melanogaster for 12 years, mainly at North Carolina State University, Raleigh, USA. Recently, he has shifted his research programme to human genetics, with a particular interest in the causes of environment and culture-dependent increases in the incidence of common complex diseases. His ‘human transition project’ uses genome-wide genetics of gene expression and metabolomics studies in mixed populations in developing countries, such as Morocco and Fiji, as well as in diverse urban settings.
Augustine Kong is Vice President of Statistics at deCODE genetics in Iceland. He received his Ph.D. from Harvard University, Cambridge, Massachusetts, USA, and was a professor at the University of Chicago, Illinois, USA, from 1987 to 2000. He started working in Iceland in 1996, and was the architect of the deCODE genetic map constructed in 2002. He has extensive experience with large-scale and population-based genetic studies covering a wide range of human traits. Recently, by combining principles from family and genome-wide association studies, he and his team derived methods that can, reliably and systematically, phase markers over a long range and determine the parental origins of alleles.
Suzanne M. Leal is a professor in the Department of Molecular and Human Genetics at Baylor College of Medicine, Houston, Texas, USA. She received an M.S. in biostatistics and a Ph.D. in epidemiology from Columbia University, New York, USA. She is interested in understanding the genetic etiology of complex and Mendelian traits in relation to common and rare variants, structural variation and gene–gene and gene–environment interactions. She is involved in the study of several traits, including non-syndromic hearing impairment, aneurysm and dissection, opiate addiction, pain perception and platelet reactivity. She is also involved in developing methods for analysing complex traits, and her latest work focuses on the analysis of next-generation sequence data and detecting associations with rare variants. This work led to the development of the combined multivariate and collapsing (CMC) method, which was one of the first approaches designed specifically to detect associations with rare variants.
Jason H. Moore is the Frank Lane Research Scholar in Computational Genetics, Professor of Genetics, Professor of Community and Family Medicine and Director of Bioinformatics at Dartmouth Medical School, Lebanon, New Hampshire, USA. He holds adjunct appointments in Computer Science at the University of New Hampshire, Durham, USA, and the University of Vermont, Burlington, USA. He also holds an adjunct appointment in Psychiatry and Human Behavior at Brown University, Providence, Rhode Island, USA. He serves as a founding Editor-in-Chief of the journal BioData Mining and is an editor of the Cambridge University Press book series Systems Genetics. He has published extensively on the detection, characterization and interpretation of epistasis and other phenomena that account for the complex mapping relationship between genotype and phenotype in the context of human health.
Joseph H. Nadeau is Chair of the Department of Genetics, James H. Jewel Professor of Genetics and Director of the Division of Bioinformatics in the Center for Proteomics and Bioinformatics at Case Western Reserve University School of Medicine, Cleveland, Ohio, USA. He has joint appointments in the Case Comprehensive Cancer Center and the Department of Electrical Engineering and Computer Science in the Case School of Engineering. He is an Elected Fellow of the American Association for the Advancement of Science, founding director of the Mouse Genome Database and Informatics Program, and co-founding editor of Mammalian Genome and WIREs Systems Biology and Medicine, which recently won the R. R. Hawkins Award for outstanding scholarly publication from the American Publishers Association. His work focuses on the genetics and systems properties of complex traits and common diseases in mouse models of human disease.
*Listed in alphabetical order.
Evan E. Eichler, Department of Genome Sciences, University of Washington, S413C Foege Building, 1705 NE Pacific St., Seattle, Washington 98195-5065, USA. Email: eee@gs.washington.edu.
Jonathan Flint, Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK. Email: jf@well.ox.ac.uk.
Greg Gibson, School of Biology, Georgia Institute of Technology, 310 Ferst Drive, Atlanta, Georgia 30332, USA. Email: firstname.lastname@example.org.
Augustine Kong, deCODE genetics, Sturlugata 8, IS-101 Reykjavik, Iceland. Email: kong@decode.is.
Suzanne M. Leal, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77025, USA. Email: sleal@bcm.edu.
Jason H. Moore, Computational Genetics Laboratory, Dartmouth Medical School, One Medical Center Drive, Lebanon, New Hampshire 03756, USA. Email: email@example.com.
Joseph H. Nadeau, Department of Genetics, Case Western Reserve University School of Medicine, Biomedical Research Building 731, 2109 Adelbert Road, Cleveland, Ohio 44106-4955, USA. Email: firstname.lastname@example.org.