There has been growing debate over the nature of the genetic contribution to individual susceptibility to common complex diseases such as diabetes, osteoporosis, and cancer. The ‘Common Disease, Common Variant (CDCV)’ hypothesis argues that genetic variations with appreciable frequency in the population at large, but relatively low ‘penetrance’ (or the probability that a carrier of the relevant variants will express the disease), are the major contributors to genetic susceptibility to common diseases. The ‘Common Disease, Rare Variant (CDRV)’ hypothesis, on the other hand, argues that multiple rare DNA sequence variations, each with relatively high penetrance, are the major contributors to genetic susceptibility to common diseases. Both hypotheses have their place in current research efforts.
Debates concerning precisely how genetic variations contribute to phenotypic expression have been at the heart of a great deal of biomedical research for more than a century. In fact, one of the most contentious yet insightful of these debates occurred at the turn of the 20th century and was rooted in positions championed by two opposing intellectual camps. The ‘Mendelians,’ in the form of William Bateson, Hugo de Vries, and others, focused on discrete gene-based units of inheritance and Mendel’s laws as the fundamental factors responsible for phenotypic expression and phenotypic similarities and differences across generations. On the other hand, the ‘Biometricians,’ as represented primarily by Karl Pearson, focused on the measurement and statistical analysis of continuous phenotypes such as height as well as the variation exhibited by such phenotypes within a population. The Biometricians rejected aspects of what is known today as Mendelian genetics as espoused by the ‘Mendelian’ camp at the time due to the fact that discrete units of heredity, such as Mendelian-segregating genes, could not, it seemed to them, explain the continuous range of phenotypic variation seen in real populations.
The debate between the Mendelians and Biometricians was resolved, to a high degree, by RA Fisher among others. Fisher essentially argued that multiple genes, in the form in which the Mendelians believed them to exist, each following Mendel’s laws yet working collectively (primarily additively), could influence phenotypic expression and hence the continuous variation that a phenotype might exhibit in the population at large. The historical vagaries surrounding the Mendelians’ and Biometricians’ opposition of each other, as well as Fisher’s and others’ contribution to the resolution of this opposition, have been very elegantly and richly detailed by William Provine in his book “The Origins of Theoretical Population Genetics”.
The distinction between overt, single gene-based, Mendelian forms of inheritance and the more polygenic or multifactorial forms of inheritance of the type envisioned by the Biometricians and later refined by Fisher provides context for contemporary debates concerning the genetic basis of complex disease entities such as hypertension, cancer and diabetes – especially the ‘Common Disease, Common Variant (CDCV)’ vs. ‘Common Disease, Rare Variant (CDRV)’ debate – in at least two ways. First, there is still debate over the actual number of genes or genetic variations that might influence any particular trait. For example, in the early 1960s, a vigorous debate, very much analogous to the Mendelian/Biometrician debate, occurred over the nature of essential or primary hypertension and its frequency in the population at large. Basically, Sir Robert Platt argued that hypertension was due, in large part, to common genetic variations with relatively high ‘penetrance,’ whereas George Pickering argued that hypertension was due to the existence of a number of genetic variations, each with reduced penetrance (or ‘polygenes’), as comprehensively summarized by Swales. Although most contemporary geneticists consider Pickering’s hypothesis to be more consistent with the empirical studies of the epidemiology of hypertension, there is clearly room for peaceful co-existence for the two perspectives since some overtly Mendelian forms of hypertension exist which are largely attributable to genetic variations with high penetrance (albeit with low frequency). This is true of virtually all other common chronic diseases as well, since Mendelian, primarily single genetic defect-related, forms of most diseases have been identified.
Second – and more to the point of this review – although most geneticists would argue that the genetic basis of most common chronic diseases is more likely to be consistent with the Biometrician/Fisherian/Pickering view entailing multiple genetic factors working in aggregate, there is considerable debate over the actual frequency of the multiple genes and genetic variations that may be at play, and this issue is at the heart of the CDCV vs. CDRV debate.
The CDCV hypothesis has its roots in a number of publications, but one of the most prominent, by Reich and Lander, considered what they termed the ‘allelic spectrum of disease’ and used empirical data to qualify this spectrum. This spectrum is simply the totality of variations that contribute to a disease, including low penetrance, high penetrance, common (i.e., variations having a frequency of greater than 1% in the population), and rare variations (i.e., variations with a frequency less than 1%). Essentially, Reich and Lander provided a theoretical perspective from which they attempted to “…weave together strands from the human mutation and population genetics literature to provide a framework for understanding and predicting the allelic spectra of disease genes. The theory does a reasonable job for diseases where the genetic etiology is well understood… [but] also has bearing on the Common Disease/Common Variants (CD/CV) hypothesis, predicting that at loci where the total frequency of disease alleles is not too small, disease loci will have relatively simple spectra.” Although not complete advocates of the CDCV hypothesis, Reich and Lander concluded that, on the basis of the available data, the CDCV hypothesis is not incompatible with many diseases.
A number of investigators have challenged the CDCV hypothesis and offered the alternative CDRV hypothesis in its place. For example, Pritchard argued that population processes operative in the human lineage would be more likely to favor the existence of multiple rare variations contributing to disease rather than common variations. Essentially, Pritchard posited that population-level processes, such as mutation, random genetic drift, and purifying selection against susceptibility mutations, influence the frequency of ‘deleterious’ (i.e., disease susceptibility) variations. These processes, he argued, have acted on the human population during its expansion in the last few centuries or so, and have led to a situation in which the genomic positions or loci harboring variations underlying disease susceptibility are likely to be mildly deleterious (and hence not subject to overt selection), have a high overall mutation rate, have a total frequency that is quite high, and exhibit extensive allelic heterogeneity. Thus, Pritchard argued that the notion that multiple, very recent rare variations arising in the last two centuries contribute to disease is more consistent with human population pathobiology than the notion that older, common variations contribute to disease. For example, common variations are likely to be older and hence have been subjected to potential selective forces over time. By reaching an appreciable frequency, they therefore are not as likely to have been subjected to negative selection. Rare variations, on the other hand, are either likely to be new (i.e., only a few generations old) and hence not have been subjected to negative selection for a long time, or are rare because they are being selected against due to their deleterious nature.
In this light, it is of interest that recent reports on the frequency of human alleles and their likely ‘functional’ or phenotypic effects suggest that less frequent variations are more likely to be functional than common variations.
The CDCV vs. CDRV debate is more than just an academic debate, as each position in the debate entails, or is consistent with, different strategies for the identification of variations contributing to disease susceptibility. We take the view that both positions are more or less defensible and correct in that multiple common variations with low penetrance and multiple rare variations with moderate to high penetrance contribute to the expression and frequency of common human diseases in the population at large.
The evidence that multiple rare variations might be contributing to human phenotypic variation is consistent with some early in-depth sequencing and re-sequencing studies of human genic variation. For example, studies by Nickerson and colleagues in the late 1990s on the lipoprotein lipase (LpL) gene suggested that a number of naturally occurring variations, both common and rare, are likely to influence LpL function. LpL is a gene known to be a contributor to cholesterol levels and ultimately, when dysregulated, to heart disease. Follow-up survey sequencing studies by Nickerson and colleagues, among others, have consistently shown that rare, likely functionally significant, variations occur naturally in human genes of relevance to a number of human phenotypes and diseases. The identification of a number of rare variations from survey sequencing studies of physiologically important human genes is also consistent with studies of genes known to lead to rare diseases, such as cystic fibrosis and BRCA1 and BRCA2 forms of breast cancer, in which hundreds of rare, yet disease-causing, variations have been identified over the years [11, 12].
The availability of high-throughput genotyping technologies, coupled with the results of major polymorphism characterization efforts such as the International HapMap Initiative, has made it possible to conduct genome wide association (GWA) studies seeking to identify common variations that are statistically linked with particular diseases. To date, hundreds of GWA studies have been performed, with many having identified unequivocal, statistically compelling, associations between particular genetic variations and diseases of all sorts. However, as successful as these studies have been in identifying such associations, the genetic variations identified to date from such studies collectively explain only a small fraction of the burden of any disease in the population at large. Importantly, intensive GWA studies investigating many traits and diseases have led to associations involving genes or genomic regions that typically have 30 to over 50 associated sequence variants, each of relatively low penetrance, with a typical risk ratio of approximately 1.2. With rare exceptions, more than 90–95% of the heritable component of a disease has been left unexplained after extensive GWA study interrogation. This suggests that individual common inherited variations are not likely to explain the majority of common chronic disease prevalence and ultimately raises the question as to the nature of the remaining genetic factors contributing to disease, or what has been termed the ‘missing heritability’ of disease phenotypes. The fact that GWA studies have seemingly reached their limits in the identification of common variations contributing to common diseases obviously leaves the door open for the discovery of multiple rare variations that contribute to common diseases (or possibly other forms of genetic and epigenetic variation).
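As a rough illustration of why variants of this size leave most familial risk unexplained, the following sketch computes the sibling relative risk attributable to a set of independent loci under a multiplicative per-allele model with Hardy-Weinberg proportions and random mating. Only the per-allele risk ratio of about 1.2 and the range of 30 to over 50 variants come from the text above; the allele frequency of 0.3, the choice of 40 loci, the assumption that loci combine multiplicatively, and the function names are illustrative assumptions.

```python
def locus_sibling_lambda(allele_freq, risk_ratio):
    """Sibling relative risk contributed by one biallelic locus.

    Assumes a multiplicative model: each copy of the risk allele
    multiplies disease risk by `risk_ratio`, with Hardy-Weinberg
    proportions and random mating.
    """
    p, q = allele_freq, 1.0 - allele_freq
    # Scaled per-allele variance in risk at this locus.
    w = p * q * (risk_ratio - 1.0) ** 2 / (p * risk_ratio + q) ** 2
    return (1.0 + w / 2.0) ** 2


def combined_sibling_lambda(loci):
    """Combine independent loci, assuming their effects multiply."""
    total = 1.0
    for allele_freq, risk_ratio in loci:
        total *= locus_sibling_lambda(allele_freq, risk_ratio)
    return total


# Hypothetical panel: 40 common variants, each with frequency 0.3
# and a per-allele risk ratio of 1.2 (illustrative values only).
panel = [(0.3, 1.2)] * 40
print(f"per-locus sibling relative risk: {locus_sibling_lambda(0.3, 1.2):.4f}")
print(f"combined over 40 loci:           {combined_sibling_lambda(panel):.3f}")
```

Under these assumptions each locus raises the sibling relative risk by well under 1%, so even dozens of such loci together yield only a modest combined value, which can then be compared against whatever sibling relative risk is actually observed for the disease in question.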
Many investigators have gone beyond survey sequencing of human genes to catalog rare sequence variations, to actually contrasting and comparing the frequency of rare variations in individuals with and without disease. Table 1 lists relevant studies. Virtually all of these studies have observed frequency differences between individuals with and without a particular disease phenotype for multiple rare single nucleotide polymorphisms (SNPs) in functionally-relevant (e.g., coding) regions of the genes studied. Although published studies of this sort may reflect publication bias (i.e., many other studies may have attempted to do this for different diseases and genes and simply did not find anything interesting), they do indicate that multiple rare variations are likely to be associated with some human diseases and disease-related phenotypes.
One important aspect of these studies is that they focused on the identification of multiple rare variations, any one of which might often be possessed (in isolation of others) by the individuals with the disease phenotype of interest. This suggests that all the variations contributing to the pool of variations that were (collectively) greater in frequency among the individuals with the disease phenotype perturb the gene of interest in roughly the same way to induce that disease phenotype. In fact, such ‘allelic heterogeneity’ is an important feature in the formulation of the CDRV hypothesis, in that it is argued that there may be many genes or genomic regions that might influence a disease, and that each of these genes or genomic regions may harbor many different rare variations that affect it (note that this ‘allelic heterogeneity’ argument has also been made in the context of somatic mutations that influence tumorigenesis, as discussed below). Many genes that have been implicated in common disease pathogenesis have been shown to harbor multiple functionally-significant, naturally occurring, rare (and common) variations (see, e.g., Table 9.1).
In addition to the identification of rare SNPs contributing to common diseases and disease-related phenotypes, there have been a number of studies investigating the contribution of rare structural variations to human phenotypic variation. However, such studies are in their infancy and have often raised more questions than they have answered, as discussed below in the case of neuropsychiatric disease.
A number of recent studies have considered the contribution of copy number variations (CNVs) to neuropsychiatric diseases, including autism and schizophrenia. Each of these studies provided compelling statistical evidence suggesting that, e.g., autistic, schizophrenic, or bipolar individuals are more likely to possess CNVs in their genome – and in particular deletions of genomic regions. The main theme of these studies is that they find evidence of not one particular CNV being present in the genomes of individuals with these conditions, but rather any of a number of rare CNVs. The finding that there are more rare CNVs, as a whole, among individuals with neuropsychiatric conditions suggests that one of two sorts of phenomena must be at play: either the genomes of individuals with these conditions are unusually ‘fragile’ in the sense that deletions arise at arbitrary places in the genome and this reflects some fundamental genomic ‘lesion’ associated with the etiologies of these disorders, or the actual locations of the deletions are of crucial importance in that these locations, when perturbed, cause brain dysfunction. Both of these phenomena are problematic. It is quite unlikely that individuals with ‘fragile’ genomes would only manifest the unique features of autism, schizophrenia, and bipolar disorder and not other conditions such as mental retardation, metabolic problems, developmental anomalies, etc. In other words, the unique phenotypes of autism, schizophrenia, and bipolar disorder seem too specific for a gross molecular lesion such as global genomic fragility. In addition, if the genomic locations affected by the slight increase in number of rare CNVs in cases versus controls actually do harbor genes that are specific to, e.g., schizophrenia, then it is important for the scientific community to demonstrate that this is the case. This would rule out the ‘fragile genome’ hypothesis as well as any belief that the ‘multiple rare CNVs and neuropsychiatric disease’ findings reflect false positive results.
One very interesting question that has been raised by researchers contemplating the role of rare variations in complex diseases is whether or not diseases that are actually influenced by rare variations are likely to exhibit familial clustering. For example, Bodmer and Bonilla have argued that such diseases are not likely to be familial and provide some theoretical calculations to show why this is the case. The suggestion that diseases influenced by rare variations are not familial has important consequences in that it suggests that family-based studies of the type pursued via classical genetics techniques (such as linkage analysis) are not likely to be useful for discovering causative genetic variations. However, the arguments by Bodmer and Bonilla are problematic for at least two reasons.
First, if someone possesses a disease that is influenced by multiple rare inherited variations that work additively, then that individual’s parents obviously possessed the right constellation of variations to at least lead to a non-zero probability that they would produce an offspring that could ultimately manifest the disease. This fact would clearly lead to a higher probability of those parents producing another offspring with the phenotype than parents without the appropriate constellation of genetic variations. In this sense, one could say that the parents are ‘enriched’ for predisposing variations simply because they produced an offspring with the disease, and this enrichment could lead them to produce another offspring with the disease with a higher than average probability. Note that this probability would clearly be a function of the number of variations that could contribute to the phenotype: if there was only a single rare variant with low penetrance that could induce the phenotype, then the probability that an additional affected offspring would be produced by the parents is small. If, however, there are many variations that work additively, then the probability would be higher. The calculations by Bodmer and Bonilla are consistent with the assumption of the existence of only a few predisposing variations with low penetrance, as opposed to any number of rare variations that work additively.
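The dependence of recurrence risk on the number of contributing variations can be sketched numerically. The example below is a minimal illustration rather than a reconstruction of the Bodmer and Bonilla calculations: it assumes that each rare risk allele adds a fixed increment to an individual's probability of disease, that loci are independent and in Hardy-Weinberg proportions, and that mating is random. Under those assumptions the covariance in risk between full siblings is half the additive variance in risk. The function name and all parameter values are hypothetical.

```python
def sibling_recurrence(baseline, effect, n_loci, allele_freq):
    """Return (prevalence, recurrence risk, sibling relative risk).

    Hypothetical additive model: an individual's disease probability is
    baseline + effect * (number of rare risk alleles carried), summed
    over `n_loci` independent loci that share one allele frequency.
    """
    prevalence = baseline + effect * 2 * n_loci * allele_freq
    max_risk = baseline + effect * 2 * n_loci
    if prevalence <= 0.0 or max_risk > 1.0:
        raise ValueError("parameters do not give valid probabilities")
    # Additive variance in risk, summed over loci.
    additive_var = n_loci * effect ** 2 * 2 * allele_freq * (1 - allele_freq)
    # Full siblings share half of the additive variance.
    sib_cov = additive_var / 2
    # P(second sibling affected | first sibling affected).
    recurrence = prevalence + sib_cov / prevalence
    return prevalence, recurrence, recurrence / prevalence


# Illustrative values only: 1% baseline risk, each rare allele
# (frequency 0.5%) adds 2 percentage points of risk.
for n_loci in (1, 5, 20):
    k, rec, lam = sibling_recurrence(0.01, 0.02, n_loci, 0.005)
    print(f"{n_loci:>2} loci: prevalence={k:.4f} "
          f"recurrence={rec:.4f} sibling relative risk={lam:.3f}")
```

With these particular values the sibling relative risk stays close to 1 when a single low-penetrance variant is involved and grows as more additively acting variants are allowed, which is the qualitative point made above; other parameter choices can behave differently.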
Second, the empirical evidence is consistent with the suggestion that diseases influenced by rare variations are indeed familial. Consider the simple fact that most common, chronic conditions such as diabetes, hypertension, and cancer have been subjected to GWA studies, as noted above, and the results of these studies suggest that common variations explain only a small fraction of these diseases’ frequencies in the population at large. This leaves the door open to other genomic explanations for their frequency, such as the implication of multiple low frequency (MAF 1–5%) or rare (MAF < 1%) variations. In fact, the lack of a major contribution to these diseases by common variations via GWA studies is motivating studies seeking to identify rare variations, including CNVs, that may contribute to them (see below). Virtually all the diseases for which GWA studies have been pursued – and for which the alternative CDRV hypothesis to the CDCV hypothesis is being explored – exhibit familial clustering.
In order to adequately test the CDRV hypothesis against the CDCV hypothesis for any disease, rare variations have to first be identified among individuals with the disease. This requires DNA sequencing protocols. Although Table 1 documents studies that have exploited DNA sequencing technologies to identify rare variations associated with different disease phenotypes, these studies were performed with a singular focus on a particular gene or genomic region. In order to facilitate the search for rare variations in different genes, if not the entire genome, using contemporary DNA sequencing technologies, the ‘1000 Genomes’ project was initiated (www.1000genomes.org/). This project seeks to characterize sequence variation in 1000 individuals in order to provide a baseline for further disease-oriented DNA sequencing studies as well as develop appropriate protocols and bioinformatics tools.
Insight into rare variations and their potential impact on phenotypic expression has also benefited from a number of large-scale sequencing projects that attempted to sequence and assemble the entire genomes of individual humans [21, 22, 23]. These studies identified hundreds of thousands of novel genomic variations across the individuals studied that are very likely to be rare in the population or have arisen de novo in the genomes of the individuals sequenced. Studies investigating the potential functional impact of these rare or de novo variations suggest that many of them are likely to be functional and phenotypically-relevant.
A debate analogous to the debate about the role of common and rare variations in inherited or congenital diseases involves cancer genomics. This debate concerns the identification and differentiation of ‘driver’ mutations from ‘passenger’ mutations in tumorigenesis. Driver mutations are those mutations that essentially cause or lead to tumorigenesis. Passenger mutations, on the other hand, are simply those somatic mutations that build up over the unchecked cell replication that is the hallmark of cancer. A number of very recent papers describing tumor sequencing and resequencing studies – many sparked by The Cancer Genome Atlas (TCGA) initiative (http://cancergenome.nih.gov/) – have identified a number of mutations in cancers [26, 27, 28, 29], some subsets of which are likely to be causative or driver mutations. However, attempts to identify causal or driver mutations among the discovered mutations on the basis of their frequency across different samples have been criticized as highly problematic [30, 31, 32]. The current or prevailing belief is that it is unlikely that a single, or even a few, commonly observed mutations are responsible for any one tumor type. Rather, the evidence to date is consistent with the notion that a number of perturbations in particular genes and genetic pathways, induced more than likely by singular or rare mutations – all of which have similar tumorigenic effects – are responsible for tumorigenesis.
In order to substantiate claims about the role of either specific common or rare variations in disease, some form of validation of an initial finding implicating those variations is in order. For common variations implicated in GWA and candidate gene association studies, the sine qua non of validation is replication of the association in a population or sample of individuals independent of that used in the initial study. However, replication studies of associations involving rare variations that exploit follow-up or ancillary populations are problematic given the infrequency of the variations of interest. This problem can be overcome to some degree by testing the hypothesis that the genes or genomic regions of interest have a collection of variations that, as a group, are more frequent among individuals with a disease phenotype than individuals without that phenotype (Figure 1). Statistical methods for carrying out such hypothesis tests are in their infancy, but will be crucial for advancing the CDRV hypothesis [16, 34, 35]. In addition to statistical evidence for associations between variations and a disease phenotype (whether implicating common or rare variations, or whether identified in an initial or replication study), it is important to assess the biological significance of the variation(s) in question via computational methods, laboratory assays or model systems.
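One simple way to test the group-wise hypothesis described above is a ‘collapsing’ comparison: each individual is scored as a carrier if he or she has at least one rare variant anywhere in the gene or region, and carrier counts are then compared between cases and controls. The sketch below uses a one-sided Fisher's exact test for that comparison. It is offered as a minimal illustration of the idea, not as the method used in the cited references; the function names and the genotype data are hypothetical.

```python
from math import comb


def count_carriers(genotypes):
    """Count individuals carrying at least one rare allele.

    `genotypes` is a list of per-individual lists, one entry per rare
    variant site, holding the number of rare alleles at that site.
    """
    return sum(1 for person in genotypes if any(g > 0 for g in person))


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with fixed margins.

    a = case carriers, b = case non-carriers,
    c = control carriers, d = control non-carriers.
    """
    n_cases, n_carriers, n_total = a + b, a + c, a + b + c + d
    denom = comb(n_total, n_carriers)
    p_value = 0.0
    for x in range(a, min(n_cases, n_carriers) + 1):
        p_value += comb(n_cases, x) * comb(n_total - n_cases, n_carriers - x) / denom
    return p_value


def collapsing_test(cases, controls):
    """Test whether rare-variant carriers are enriched among cases."""
    a = count_carriers(cases)
    c = count_carriers(controls)
    return a, c, fisher_one_sided(a, len(cases) - a, c, len(controls) - c)


# Hypothetical data: 3 rare sites in one gene, 6 cases and 6 controls.
cases = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 1]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
case_carriers, control_carriers, p = collapsing_test(cases, controls)
print(f"carriers: {case_carriers} cases vs {control_carriers} controls, p = {p:.4f}")
```

Note that no single site in the hypothetical data is carried by more than two cases, so a site-by-site test would have little to work with, whereas the collapsed carrier counts differ clearly between the two groups.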
The contemporary CDCV vs. CDRV debate is, as noted, not only rooted in historical debates about the nature of phenotypic variation, but also implicates different strategies for identifying genetic variations that predispose individuals to a disease. It is safe to say, however, that strategies for uncovering common and rare variations should be pursued for any disease phenotype, and that the CDCV/CDRV debate should be seen as not an ‘either/or’ debate, but rather as a debate about the degree to which common and rare variations contribute to a particular disease phenotype. As noted, rare, Mendelian forms exist for most of the common chronic diseases for which the CDCV vs. CDRV debate has been invoked. For example, Liddle’s syndrome is a very rare form of hypertension influenced by rare genetic variations, and familial breast cancer induced by BRCA1 and BRCA2 mutations implicates multiple highly penetrant, yet very rare, variations; yet both hypertension and breast cancer have more common forms for which GWA studies and related strategies have been, and should be, pursued.
The many discoveries resulting from GWA studies themselves suggest that there must be genetic factors contributing to common complex diseases that are simply not amenable to detection via the GWA study strategy, as emphasized throughout this review, since the variations identified via GWA studies only explain a small fraction of the prevalence of the diseases studied. Although it could be that the ‘missing heritability’ associated with these diseases that is not accounted for by common variations is accounted for by subtle gene x environment interactions, common CNVs with low penetrance, complicated epistatic interactions involving many common variations, and/or epigenomic phenomena, to the exclusion of rare variations, this seems unlikely. In fact, evidence of the type provided in Table 1 suggests otherwise. Ultimately, the question as to the veracity (or the degree of veracity) of the CDCV hypothesis vs. the CDRV hypothesis for any particular disease, like virtually all scientific questions, is an empirical one.
The authors benefited from the following research grants: The National Institute on Aging Longevity Consortium (U19 AG023122-01); The NIMH-funded Genetic Association Information Network Study of Bipolar Disorder (R01 MH078151-01A1); National Institutes of Health grants: N01 MH22005, U01 DA024417-01, and P50 MH081755-01; and the Scripps Translational Sciences Institute Clinical Translational Science Award (U54 RR0252204-01). Additional funding came from Scripps Genomic Medicine and the Price Foundation.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.