Human diseases are caused by alleles that encompass the full range of variant types, from single-nucleotide changes to copy-number variants, and these variations span a broad frequency spectrum, from the very rare to the common. The picture emerging from analysis of whole-genome sequences, the 1000 Genomes Project pilot studies, and targeted genomic sequencing derived from very large sample sizes reveals an abundance of rare and private variants. One implication of this realization is that recent mutation may have a greater influence on disease susceptibility or protection than is conferred by variations that arose in distant ancestors.
Common chronic diseases such as diabetes, coronary heart disease, stroke, neuropsychiatric illness (including schizophrenia, autism, and developmental disabilities), chronic respiratory disease, and cancer account for an overwhelmingly large fraction of mortality, morbidity, and health care expenditure (http://www.cdc.gov/nchs/). These diseases disproportionately affect aging populations and burden the health care systems and economies of industrialized nations throughout the world. Understanding the underlying causes of such disorders is a key step toward enabling earlier and more precise diagnosis, prognosis, interventional therapy, and potentially prevention.
Most common diseases are complex or multifactorial with both environmental and genetic contributions along with their nearly intractable interaction effects. In general, the environmental components are challenging to identify and quantitate. In contrast, as a result of the emergence of powerful genomic technologies, the analysis of the genetic components is becoming increasingly tractable and relatively inexpensive. These technical improvements have fueled a pipeline of discovery of the genes and variants that predispose to human maladies. The technical improvements have also impacted genetic diagnostics, as it is now practical to sequence an individual’s entire genome for less than the cost of a comprehensive set of whole-body imaging scans. Furthermore, whole-genome sequencing is rapidly becoming less expensive than current clinically implemented “multigene panel testing” for molecular diagnosis of disease traits with even modest genetic heterogeneity.
A “common disease/common variant” (CDCV) hypothesis has been popularized as an explanation for common disorders and has garnered much support (Reich and Lander, 2001). This model presupposes that different combinations of common alleles aggregate in specific individuals to increase disease risk. The CDCV hypothesis was a major intellectual impetus for the International Haplotype Mapping (HapMap) project and ultimately led to a proliferation of genome-wide association studies (GWAS) identifying regions influencing disease status or risk factor levels (http://www.genome.gov/gwastudies). As a consequence, insights into potential new pathways underlying common disease have emerged.
Manolio et al. (2009) suggested that the genetic variance explained per se is of interest for what it might suggest about effective research paradigms. Accounting for the genetic variance is not the same as achieving utility and impact. If the goal of our shared research program is to achieve mechanistic understanding and lessen the impact of human disease, then the magnitude of the genetic effect is less important than the possible insight provided by the newly identified loci. Genes identified through their weakly acting common alleles may give important clues about pathways and/or be excellent targets for therapeutic and preventive strategies.
However impressive the information gleaned from GWAS has been, the results have explained only a few percent of the apparent genetic variance contributing to common diseases. Furthermore, these studies have not yet delivered medically actionable variants that inform medical decision making by helping to establish an etiological diagnosis and lead to a more efficacious treatment or prevention plan. Both diagnostic utility and classification of pathogenicity are closely associated with the magnitude of each variant-specific effect. For most complex diseases, unless a variant clearly partitions the affected into distinct biological subgroups or can be incorporated into risk-prediction models, there is likely to be limited diagnostic usefulness. Morrison et al. (2007) proposed the use of a composite “genetic risk score” for risk assessment. However, it remains to be determined whether variants identified by GWAS have a role as biomarkers in risk assessment and clinical decision making. Overall, these data do not support a simple additive version of the CDCV model as an explanation for the majority of the genetic component underlying risk for common disease.
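As a rough illustration of what a composite genetic risk score can look like, the Python sketch below sums risk-allele counts weighted by per-variant effect sizes. The variant identifiers, weights, and genotype are invented for illustration, and this is a generic additive score, not necessarily the scoring used by Morrison et al. (2007).

```python
from typing import Mapping


def genetic_risk_score(genotype: Mapping[str, int],
                       weights: Mapping[str, float]) -> float:
    """Additive score: sum over variants of (risk-allele count x weight)."""
    score = 0.0
    for variant, weight in weights.items():
        # Assumption: a variant missing from the genotype counts as 0 risk alleles.
        count = genotype.get(variant, 0)
        if count not in (0, 1, 2):
            raise ValueError(f"risk-allele count for {variant} must be 0, 1, or 2")
        score += count * weight
    return score


# Hypothetical per-allele weights (for example, log odds ratios) for made-up variants.
example_weights = {"variant_A": 0.10, "variant_B": 0.05, "variant_C": 0.20}
# Hypothetical individual: number of risk alleles carried at each variant.
example_genotype = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

print(round(genetic_risk_score(example_genotype, example_weights), 2))
```

Because the score is a plain weighted sum, it embodies exactly the simple additive assumption questioned above; it cannot represent allelic or locus interactions.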
Research efforts have shifted to exploration of less frequent variants in common disorders. It has been noted for decades that the mutational changes that underlie rare and highly penetrant Mendelian disease may share features with genetic factors that underlie more common forms of the disease (Boerwinkle and Utermann, 1988; Goldstein and Brown, 2001). Clearly, the relationship between rare and common disease is not a simple one, but there are emerging examples wherein specific loci that cause “Mendelian disease” are contributing to the background risk to a parallel common disorder. Although a common disease/rare variant (CDRV) hypothesis is attractive, it demands tenable and complete explanations as to how the functional roles of individual alleles can work to produce the ultimate phenotypic effects. The models must consider the range of variant types, including single-base or simple-nucleotide variants (SNV), short insertions or deletions (indels), structural variants, and copy-number variants (CNV), the penetrance of individual alleles, and allelic and locus interactions (dominance and epistasis, respectively) and show how these all combine to produce the population frequency and the phenotypic complexity of different disorders.
Fortunately, the current state of knowledge of key examples supports models that close the gap between complex and Mendelian traits. These examples show how mutations in single genes can fulfill the definition of Mendelian disease but, in a different context, form part of the menu of causal contributors to complex disorders (Greeley et al., 2010; Voight et al., 2010). As we begin to observe instances wherein variation at more than one locus contributes to perturbations of networks and ultimate phenotype, the relevance of assessing genome-wide variation becomes more apparent. Thus, inferences about individual mutation burden by geneticists in the last century are now open to direct observation (Muller, 1950).
Interpreting the interplay between different types of variation and their contribution to disease depends heavily on our understanding of the normal patterns of genetic variation. For rare variants, this has been a particular challenge, as highly accurate data need to be generated from many samples in order to properly determine the frequency and population distribution of the genetic variants. To illustrate this: the successful HapMap project (International HapMap 3 Consortium, 2010; International HapMap Consortium, 2005; Frazer et al., 2007) that provided an early survey of single-base variation across major human populations cataloged only a fraction of the genetic variation above a frequency of 5%. Even the 1000 Genomes Project pilot studies comprehensively captured only variation at greater than 1% frequency (1000 Genomes Project Consortium, 2010).
Our view of the site frequency spectrum of these rare variants (<1%) has recently been shaped by personal genome data generated using whole-exome sequencing and whole-genome sequencing (Gonzaga-Jauregui et al., 2011). The number of diploid human genome sequences available for analyses is growing rapidly. Remarkably, from the small number determined and publicly available to date, it is apparent that even more genetic variation exists between individuals than was previously expected (Ahn et al., 2009; Bentley et al., 2008; Kim et al., 2009; Levy et al., 2007; Lupski et al., 2010; Schuster et al., 2010; Wang et al., 2008; Wheeler et al., 2008). When compared with the haploid reference, each individual human genome on average contains some three and a half million SNV and about 1,000 CNV (>450 bp) (Conrad et al., 2010), many of which appear to be rare in the population from which the individual was sampled. In addition, each individual personal genome sequence still reveals 200,000–500,000 SNV that have not been observed in other publicly available personal genomes, many of which may be unique to that individual’s family or clan. In parallel, recent studies that deeply sequence relatively large samples (hundreds to thousands of individuals) show that the rate of identification of variants that have not been seen (private variants) continues unabated with every new individual sampled (Coventry et al., 2010; Turner et al., 2008). The extent of some of this nucleotide variation may have been anticipated from human genetic studies during the previous three decades that established that an SNV occurs about every 1 Kb; however, the extent of rare and “private” SNV was not anticipated, and the extent of CNV was unexpected.
There are technical limitations to some of these studies—including a background of variation introduced in cultured cells—as well as in the mutation detection methods themselves, particularly for CNV in the 100 bp to 500 bp range, low-copy repeat sequences, and simple repeats. Nevertheless, the enormous extent of private variation has been clearly established.
A number of factors may have led to the observed skewing of the allele frequency spectrum toward rare and private variants. The explosion of human populations in the current historical epoch could, by itself, account for the short branch lengths and low frequencies of the most distal segments of human variant genealogies (Boyko et al., 2008; Coventry et al., 2010; Turner et al., 2008). In addition, secular factors that have enabled the explosion of the population, such as abundant food supplies, improved sanitation, and routine vaccinations, may directly participate in the relaxation of the most important selective pressures that have constrained the population in the past. Even the widespread availability of minimal routine health care may be artificially slowing negative selection. Dramatic reductions in maternal death and infant mortality, properly celebrated in the last 100 years, may be influencing the distribution of genetic variation and contribute to relaxed selection. Finally, mutation rate, perhaps partially driven by increased paternal age (Crow, 2008) and undiscovered environmental factors, may contribute to the observed rare variant spectra.
The conceptual shift to emphasizing studies of abundant, rare, and heterogeneous variants profoundly impacts our approaches to studying the genetic architecture of human disease, leading to a genome-wide, versus a locus- or gene-specific, emphasis. Genetic architecture here refers to the types of variation (SNV, CNV, etc., both coding and noncoding), their allele frequency distribution (common, rare, intermediate), the size of an allele’s effects, and new mutation rates. For a given individual, what is important to know is not only the number and location of pathogenic variants taken one at a time but also the unique composition of his or her genome-wide mutational burden. If this is the case, then the risk conferred by any particular allele estimated from the population risk would be much less relevant than the personal risk emerging from the total mutational burden in each individual. The shift in emphasis to a whole-genome view changes how we should consider the way in which harmful combinations of mutant alleles assemble or accumulate in each genome. Each personal genome has a collection or “ecology” of deleterious and protective variations, which in combination (not necessarily in sum) dictate the health of the individual. Understanding this genome ecology will be a substantial challenge in human genetics and has ramifications for the extent to which genetic information can be maximized for medical utility.
Each personal genome combines inherited alleles and new variation introduced by de novo mutation. Interestingly, CNV may contribute in a significant way, from both the novel combinations inherited from each parent and the new mutations. This is the very type of variation not fully taken into account when previous mutation models were being considered. Locus-specific mutation rates for SNV are 2.0–2.5 × 10⁻⁸ and have recently been shown to potentially differ in male versus female germ cells (Conrad et al., 2011); for CNV, new mutation rates can be substantially higher: between 10⁻⁶ and 10⁻⁴, 100 to 10,000 times more frequent than in SNV (Lupski, 2007a). The latter figures implicate CNV in sporadic traits (Lupski, 2007a) including birth defects (Lu et al., 2008) and highlight the contribution of new mutation to individual mutational burden (Potocki et al., 1999). Either new or recent (i.e., arising in close relatives or “clan members”) de novo mutations could substantially contribute to phenotypic extremes, such as birth defects and disease.
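The fold difference quoted above follows directly from the ratio of the rates. The minimal Python check below takes 10⁻⁸ as the order of magnitude of the SNV rate, an assumption made here to reproduce the 100 to 10,000 figure; using the quoted 2.0–2.5 × 10⁻⁸ would give somewhat lower ratios.

```python
# Order-of-magnitude SNV mutation rate (assumption; the text quotes 2.0-2.5e-8).
snv_rate = 1e-8
# Range of new-mutation rates for CNV quoted in the text.
cnv_rate_low, cnv_rate_high = 1e-6, 1e-4

# Ratio of CNV to SNV rates at each end of the quoted CNV range.
fold_low = cnv_rate_low / snv_rate
fold_high = cnv_rate_high / snv_rate

print(f"CNV rates are roughly {fold_low:.0f}x to {fold_high:.0f}x the SNV rate")
```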
Although de novo CNV have been detectable now for some time with microarray technologies, identifying smaller de novo events (e.g., SNV) has become feasible only recently with the advent of large-scale DNA sequencing technologies. Recent exome sequencing studies of family trios with patients manifesting sporadic intellectual disability (previously more frequently referred to as mental retardation [MR]) identified a high frequency of de novo mutations in “MR genes” (Vissers et al., 2010). Such studies support established theory that if the mutational target is large (and hence the observed gene mutation rate is high), de novo mutations may account for a high incidence of disease even when the selection coefficient is close to 1.0. These early studies suggest a resolution to the question of why the frequency of neurodevelopmental disabilities is high despite near genetic lethality for such traits. Relatedly, sequencing studies of multiple ion channel genes in patients with epilepsy (Klassen et al., 2011) and of known autism susceptibility genes in subjects with high-functioning autism (Schaaf et al., 2011) reveal many rare variants and also de novo mutations that may be contributing to disease.
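The mutation-selection argument above can be made concrete with a standard approximation: for a dominant disorder at equilibrium, birth incidence is roughly 2μ/s, where μ is the rate of disease-causing mutation per generation summed over all genes in the mutational target and s is the selection coefficient. The Python sketch below uses made-up values (a per-gene rate of 10⁻⁶ and a target of 1,000 genes) chosen only to show the scaling; they are not estimates from the studies cited.

```python
def dominant_incidence(per_gene_mutation_rate: float,
                       n_genes: int,
                       selection_coefficient: float = 1.0) -> float:
    """Approximate equilibrium birth incidence of a dominant disorder.

    Uses incidence ~ 2 * (total mutation rate over the target) / s,
    a simplification that treats all genes as having the same rate.
    """
    if not 0.0 < selection_coefficient <= 1.0:
        raise ValueError("selection coefficient must be in (0, 1]")
    total_rate = per_gene_mutation_rate * n_genes
    return 2.0 * total_rate / selection_coefficient


# Hypothetical inputs: one gene versus a large mutational target, with s = 1.
single_gene = dominant_incidence(1e-6, 1)
large_target = dominant_incidence(1e-6, 1000)

print(f"single gene:  about {single_gene:.6f} of births")
print(f"1,000 genes:  about {large_target:.6f} of births")
```

Under these assumed numbers, near-complete selection against each mutation still leaves an incidence on the order of a few per thousand births once the target spans many genes.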
The concept of new mutation in X-linked lethal disorders was well established by Haldane (Haldane, 1935). However, the new mutation contribution to many human disease traits may be greater than anticipated (Hoischen et al., 2010), particularly for genetically heterogeneous conditions in which hundreds of genes could be involved but only one or a few loci are responsible in an individual patient. The developmental timing of new somatic mutations is perhaps underappreciated (Lupski, 2010) as previous studies have emphasized germline events. New mutations may occur in the germline, during any stage of development of the organism, in stem cells, or in differentiated somatic cells.
Chromothripsis in cancer (Stephens et al., 2011) and complex genomic rearrangements (CGR) associated with selected genomic disorders (Liu et al., 2011) both illustrate the complexity of gene alterations that can be brought about by new mutation CNV events. In each case, a single mutational event can result in a cataclysmic chromosomal catastrophe and alter the copy number or structure of several different genes.
Most sites of variation have low minor allele frequencies (that is, are rare) and are of recent origin, and therefore the major contributors to inherited disease susceptibility are likely to be those alleles that arose recently in an extended pedigree. Purifying natural selection is expected to eliminate highly deleterious variants before they reach a high frequency, such that disease risk alleles with large effects should be enriched at the lower frequencies (Marth et al., 2011). The idea that there are unique combinations of rare variants characteristic of a recent family lineage and that these combinations can have a causative role in disease is encapsulated by what we refer to as “clan genomics” (Figure 1). The population from which one comes and its collection of older common variants may have less influence on an individual’s disease susceptibility than the collection of recently arisen rare variants and de novo mutations (Figures 2A and 2B). The most important thing that an individual needs to consider in terms of their genetic variation with relation to disease susceptibility is therefore recent “genetic history” of their extended pedigree or clan. From the standpoint of delivering personalized genomic medicine, the medically actionable alleles are the ones of most interest; and these may be highly weighted toward recent rare variants.
Nevertheless, the most important thing is not to focus disproportionately on specific variants, but rather to integrate across all classes of risk-associated variants. In some individuals, risk may be caused by an unusual combination of common variants, whereas in others it will be due to a smaller number of large effect rare variants.
Resequencing studies of genes that can cause rare Mendelian forms of common complex traits reveal that rare variants can contribute to hypertension (Ji et al., 2008; Wagner, 2008), hypercholesterolemia (Kotowski et al., 2006), hypertriglyceridemia (Romeo et al., 2009), and nonalcoholic fatty liver disease (Romeo et al., 2008) in the population at large. These examples inform models where individual alleles with high penetrance contribute to common complex traits. In addition, when GWAS signals have identified variants for common traits, their molecular mechanistic underpinnings often support those already established by Mendelian forms of the condition (Sankaran et al., 2008, 2009; Vernimmen et al., 2009).
The idea that genes responsible for Mendelian disease can also have a role in the common form of the same or a similar condition is not new. For example, the pioneering studies of Michael Brown and Joseph Goldstein showed that individuals with compound heterozygous mutations in the low-density lipoprotein receptor (LDLR) gene manifest the Mendelian disorder familial hypercholesterolemia (FH) (Brown and Goldstein, 1986; Goldstein and Brown, 1987). FH patients have extremely high cholesterol levels and can have coronary atherosclerotic heart disease and myocardial infarctions in their teenage years. Interestingly, the type of LDLR gene mutation predicts cardiovascular risk in children with familial hypercholesterolemia (Guardamagna et al., 2009). Heterozygous rare variant mutations at the LDLR locus can also cause the complex traits of early onset hypercholesterolemia, coronary atherosclerotic heart disease, and myocardial infarctions in carriers with disease manifesting in the fourth or fifth decades of life.
Heterozygous carriers for recessive disease genes do not manifest the recessive disease but may be susceptible to a milder or related malady, which may consist of a complex trait with a similar phenotype. For example, heterozygote carriers of mutations in the ataxia telangiectasia locus are susceptible to breast cancer (Athma et al., 1996), and similar heterozygous carrier susceptibilities are also manifest for other recessive human cancer predisposition syndromes (Heim et al., 1991). Carriers for mutations in the Gaucher disease causative gene, GBA encoding glucocerebrosidase, are at increased risk for Parkinson disease (Goker-Alpan et al., 2004; Sidransky et al., 2009). Heterozygous carriers of mutations in the cystic fibrosis transmembrane regulator gene, CFTR, can be susceptible to idiopathic pancreatitis (Cohn et al., 1998; Sharer et al., 1998; Weiss et al., 2005), chronic obstructive pulmonary disease (COPD) (Divac et al., 2004), and even chronic rhinosinusitis (Wang et al., 2000, 2005). Carriers of α-1-antitrypsin (AAT) deficiency can also be susceptible to COPD (Hersh et al., 2004; Poller et al., 1990). Interestingly, even such common traits as age-related macular degeneration (AMD) and carpal tunnel syndrome are associated with heterozygous carrier status for mutations in ABCA4, the gene responsible for Stargardt macular dystrophy (Bacq et al., 2009), and Charcot-Marie-Tooth neuropathy genes (Lupski et al., 2010), respectively (Figures 2C and 2D). In the latter case, haploinsufficiency due to either heterozygous SNV (Lupski et al., 2010) or heterozygous CNV (Del Colle et al., 2003) can convey the trait. Whereas most carrier states may have rare allele frequencies, others will actually have a significant carrier frequency in selected populations (e.g., CFTR ~4% in European descendants).
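To see why a carrier state can be far more common than the recessive disease itself, the Python sketch below converts the ~4% carrier frequency mentioned above into a mutant allele frequency and an expected frequency of individuals with two mutant alleles. It assumes Hardy-Weinberg proportions and pools all mutant alleles into a single class; both are simplifying assumptions introduced here, not claims from the studies cited.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq: float) -> float:
    """Solve 2*q*(1-q) = carrier_freq for the smaller root q."""
    if not 0.0 <= carrier_freq <= 0.5:
        raise ValueError("carrier frequency must be between 0 and 0.5")
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


carrier_freq = 0.04                      # ~4% carriers, as quoted in the text
q = allele_freq_from_carrier_freq(carrier_freq)
biallelic_freq = q ** 2                  # expected frequency of two mutant alleles

print(f"mutant allele frequency q is about {q:.4f}")
print(f"two mutant alleles expected in about 1 in {1 / biallelic_freq:.0f} individuals")
print(f"carriers outnumber them about {carrier_freq / biallelic_freq:.0f} to 1")
```

Under these assumptions carriers outnumber individuals with two mutant alleles by roughly two orders of magnitude, which is why even a modest carrier-associated risk can matter at the population level.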
In addition to variants that cause Mendelian disease-informing complex traits, there is a striking reciprocity of genes implicated by GWAS that are also known to underlie rare Mendelian diseases. For example, 11 of 30 genes associated with serum lipid levels are implicated in single-gene disorders of lipid metabolism (Kathiresan et al., 2009). We reviewed the current listing of annotated genes with significant associations in 891 GWAS studies (http://www.genome.gov/gwastudies/). We found that at least 268 genes implicated by GWAS are also known to bear mutations in rare single-gene disorders. Some of these associations are intuitive, such as those associated with biochemical traits and related inborn errors of metabolism. There are also a significant number of genes that underlie developmental disorders that harbor common variants affecting risk of cancer, body growth, and cardiovascular traits (Table S1 available online). This raises the testable hypothesis that genetic influences on human diseases can largely be accounted for by a subset of genes that play roles in a restricted set of pathways. Immune and inflammatory pathways provide a robust example as do those genes involved in lipid metabolism. It is important to note that in our survey, for most cases of GWAS the causal gene underlying a given GWAS signal is unknown.
Whereas GWAS can indirectly implicate “Mendelian genes” in complex disease risk, different mutations of a single gene, or a CNV at a single locus, are directly implicated in complex disease risk. A poignant example of the different phenotypic consequences of distinct allelic variants at a locus is provided by the fragile X mental retardation 1 (FMR1) locus. Triplet repeat expansion of the CGG repeat element in the 5′ untranslated region (UTR) of the FMR1 gene—an especially unstable form of indel mutation—causes severe X-linked mental retardation in both males and females. Alleles with lower numbers of CGG repeats (55–200 repeats; called premutation alleles), however, cause adult onset tremor/ataxia syndrome (FXTAS) in approximately 33% of males and 10% of females (Hagerman et al., 2004; Jacquemont et al., 2004). Thus, premutation variants that have been considered nonpathogenic can have phenotypic consequences for common complex traits such as tremor and ataxia.
Rare CNV at different loci have also recently been associated with complex traits including Alzheimer disease (Rovelet-Lecrux et al., 2006), Parkinson disease (Farrer et al., 2004; Singleton et al., 2003), lupus glomerulonephritis (Aitman et al., 2006), Crohn disease (Fellermann et al., 2003, 2006; McCarroll et al., 2008), psoriasis (Hollox et al., 2008), pancreatitis (Le Maréchal et al., 2006), and obesity (Bochukova et al., 2010). Many rare CNV have also been associated with intellectual disability (Stankiewicz and Beaudet, 2007) and with some forms of neuropsychiatric illness, including schizophrenia (Consortium, 2008; Lupski, 2008; McCarthy et al., 2009; Stefansson et al., 2008) and autism (Kumar et al., 2008; Shinawi et al., 2010; Weiss et al., 2008).
A further illustrative example of the connection between Mendelian and complex traits is provided by variants at the MECP2 locus and their contribution to disease. Heterozygous loss-of-function SNV in MECP2 result in the X-linked dominant trait of Rett Syndrome in girls (Amir et al., 1999); however, hemizygous loss-of-function mutations are thought to be lethal in males. Recently, duplication CNV including MECP2 have been associated with an intellectual disability plus seizure disorder in males (Carvalho et al., 2009; del Gaudio et al., 2006; Friez et al., 2006; Meins et al., 2005; Van Esch et al., 2005) and autism spectrum disorder (Ramocki et al., 2009; Schaaf et al., 2011). Male patients with triplication of MECP2 have a more severe phenotype (del Gaudio et al., 2006). Of note, maternal carriers of the MECP2 duplication (CNV) appear more susceptible to psychiatric symptoms unrelated to having a child with a disability (Ramocki et al., 2009). Thus, at a single locus, the genetic variation can cause an X-linked dominant disorder in females and an X-linked recessive trait in males and can be associated with susceptibility to a common complex trait in carrier mothers.
For mutations at a single locus, allelic interactions can profoundly affect clinical phenotype. At the ABCA4 locus, the disease severity is related to the residual activity of encoded transporter protein (Figures 2A–2C). Recessive Stargardt disease is caused by compound heterozygous mutations at this locus (Allikmets et al., 1997b). Homozygous or compound heterozygous mutations, if both null, result in retinitis pigmentosa. Within a single pedigree or clan, different combinations of alleles can result in differing ages of onset (Lewis et al., 1999), completely different diseases (Shroyer et al., 2001a), or both a recessive Stargardt disease and susceptibility to a complex trait, age-related macular degeneration, due to a heterozygous carrier state (Figure 2) (Shroyer et al., 1999, 2001b).
Rare point mutations (either functional noncoding SNV [Kurotaki et al., 2005] or coding SNV with incomplete penetrance [Shy et al., 2006]) in combination with a deletion CNV have been shown to contribute together to particular phenotypes. A combination of a rare deletion CNV with a de novo duplication CNV can also result in a phenotype that appears to be a complex trait (Potocki et al., 1999). Sometimes SNV mutations at two different loci, i.e., digenic inheritance, are required to manifest a trait segregating as a recessive disease, and the mutational load required may have a single mutant allele at each of the two loci (Kajiwara et al., 1994) (double heterozygous) or two mutant alleles at one locus and one at the other (triallelic inheritance) (Figure 3) (Katsanis et al., 2001). With respect to models for Mendelian transmission, a deletion CNV renders a locus monoallelic, whereas a duplication CNV results in a triallelic locus (Figure 3).
It is now well established that even simple Mendelian traits can have modifier loci (Badano and Katsanis, 2002; Dipple and McCabe, 2000), demonstrating the potential importance of nonhomologous allelic interaction and epistasis. For example, severity of disease for CMT can be due to a combination of mutations at more than one CMT locus (Chung et al., 2005; Hodapp et al., 2006; Meggouh et al., 2005) (Figures 4A–4C).
In contrast to the trans-genetics of Mendelism (Figures 2–4), genetic interactions occurring on the same chromosome or in cis (Figures 5A–5B) can also have profound consequences as exemplified at the alpha globin locus. For structural variants, the genomic mutational load can reflect the size of the CNV and inclusion of additional dosage-sensitive genes or genomic segments in cis (Bi et al., 2009; Lupski et al., 1991, 1992; Roa et al., 1996). Two extreme examples of this “cis-genetics” effect are segmental aneuploidy (Figure 5) and complete aneuploidy (e.g., trisomy 21) that convey complex phenotypes related to the size of the CNV and number of dosage-sensitive genes and/or genomic segments involved. For Down syndrome associated with trisomy 21, this includes an endophenotype of early onset Alzheimer disease; the amyloid precursor protein (APP) gene maps to chromosome 21, and duplications involving this gene have indeed been associated with Alzheimer disease (Rovelet-Lecrux et al., 2006).
For intellectual disability, recent studies suggest the possibility that two independent CNV (El-Hattab et al., 2010; Girirajan et al., 2010; Potocki et al., 1999) can contribute to the ultimate phenotype, as shown in individual patients and as predicted by previous models (Lupski, 2007b).
In aggregate these data show that rare variants and the genome-wide totality of pathogenic alleles contribute to complex traits (Allikmets, 2000; Allikmets et al., 1997a; Douros et al., 2008; Hersh et al., 2004; Lupski, 2007b; Poller et al., 1990; Wittrup et al., 1997, 2006). Unfortunately, such rare variants are not being accounted for in many current GWAS, and CNV and noncoding SNV are not detected by typical whole-exome sequencing approaches.
In the past, focused, locus-specific, single-gene analyses have elucidated genetic etiologies for disease, but it is now emerging that whole-genome sequencing will produce a more complete assessment of genetic variation contributing to personal health. The genome of each individual contains the inherited contribution of common variants that segregate within the population, the inherited contributions of rare variants that emerged in recent history in the clan, the new combinations of such recently arising variants from both parents, and the new mutation contributions yielding the total mutational burden (Figure 1). Highly penetrant rare variants, and often de novo mutations, contribute medically actionable alleles to Mendelian disease and perhaps extremes of phenotypes in common disease. Common variants can contribute to medically actionable variants for pharmacogenomics traits.
What emerges is a unified picture whereby previously distinct entities or categories of human diseases, chromosomal syndromes, genomic disorders, Mendelian traits, and common diseases or complex traits, can now be considered as part of one continuum (Figure 6), whereby common and rare variants including de novo mutations in the context of environmental influences result in perturbation of the biological balance of a restricted set of networks activating final common pathways that ultimately cause disease. Even though there may be many loci that contribute to interindividual inherited susceptibility of a phenotype in a population, in any one individual rare or common variants from just a few may be responsible for the trait (i.e., oligogenic inheritance). Extreme genetic heterogeneity and the contributions of new mutation may underlie some of the apparent complexity of complex traits.
A unified genetic model for human disease breaks down the artificial boundaries between categories of human disease (Figure 6). It views all human disease categories including complex traits, Mendelian disease, genomic disorders, and even chromosomal syndromes as representing a spectrum of phenotypic manifestations reflecting the totality of pathogenic variants: ancestral alleles, those arising in recent ancestors (clan), unique combinations inherited from parents, and de novo variants (Figure 1). A full accounting of individual mutational load genome-wide and expansion of the current genocentric, locus-specific model opens the door to reinvestigation of classic problems in human genetics. These challenges include understanding the molecular basis of incomplete penetrance and variable expressivity of monogenic traits, clinical manifestations of “recessive alleles” (i.e., weak semidominance), homologous allelic interaction and nonhomologous allelic interaction, and their effects on disease and health. This new synthesis is required to interpret the ecology of individual genomes in the context of complete individual genetic variation data, population genetics, and evolution.
Genome-wide assays including whole-genome sequencing, copy-number arrays, and transcriptional profiling are among the current technologies that can be used to further explore and test the “genome-wide totality of pathogenic variants” hypothesis. These genome analysis methods can now generate a massive data flow, opening up to experimental exploration fundamental questions that have occupied the minds of generations of scientists and philosophers. Yet, such genome-wide experimental assays alone will be insufficient. Other challenges include: How many types of variants (repeat expansions, CNV between 100 and 500 bp, etc.) are we missing with current techniques? How will we validate the phenotypic effects of variants observed in a single individual or family? What analytical approaches should clinical genome sequencing projects adopt given the sheer complexity of some of the gene-disease associations described herein? How can we integrate disease risk emerging from common and rare variants in an individual genome? Can disease phenotypes be refined and redefined by molecular correlates such as gene expression, chromatin conformation, DNA methylation, and all of the other ‘omics? Can individual serial observation of molecular phenotypes, much as we currently do for routine lab measures such as glucose and lipids, show us stronger effects of underlying genetic variation that are otherwise poorly captured by cross-sectional studies and lead us to yet new models?
This work was supported in part by the National Human Genome Research Institute (5 U54 HG003273) to R.A.G. and the National Institute of Neurological Disorders and Stroke (R01NS058529) to J.R.L. J.R.L. is a consultant for Athena Diagnostics, has stock ownership in 23andMe and Ion Torrent Systems, and is a coinventor on multiple United States and European patents for DNA diagnostics. R.A.G. and J.W.B. are founding shareholders in Seq-Wright, Inc. The Department of Molecular and Human Genetics derives revenue from clinical testing by high-resolution human genome analyses.
Supplemental Information includes one table and can be found with this article online at doi:10.1016/j.cell.2011.09.008.