Related Articles

1.  Fine Mapping of Five Loci Associated with Low-Density Lipoprotein Cholesterol Detects Variants That Double the Explained Heritability 
PLoS Genetics  2011;7(7):e1002198.
Complex trait genome-wide association studies (GWAS) provide an efficient strategy for evaluating large numbers of common variants in large numbers of individuals and for identifying trait-associated variants. Nevertheless, GWAS often leave much of the trait heritability unexplained. We hypothesized that some of this unexplained heritability might be due to common and rare variants that reside in GWAS-identified loci but lack appropriate proxies in modern genotyping arrays. To assess this hypothesis, we re-examined 7 genes (APOE, APOC1, APOC2, SORT1, LDLR, APOB, and PCSK9) in 5 loci associated with low-density lipoprotein cholesterol (LDL-C) in multiple GWAS. For each gene, we first catalogued genetic variation by re-sequencing 256 Sardinian individuals with extreme LDL-C values. Next, we genotyped variants identified by us and by the 1000 Genomes Project (totaling 3,277 SNPs) in 5,524 volunteers. We found that in one locus (PCSK9) the GWAS signal could be explained by a previously described low-frequency variant and that in three loci (PCSK9, APOE, and LDLR) there were additional variants independently associated with LDL-C, including a novel and rare LDLR variant that seems specific to Sardinians. Overall, this more detailed assessment of SNP variation in these loci increased estimates of the heritability of LDL-C accounted for by these genes from 3.1% to 6.5%. All association signals and the heritability estimates were successfully confirmed in a sample of ∼10,000 Finnish and Norwegian individuals. Our results thus suggest that focusing on variants accessible via GWAS can lead to clear underestimates of the trait heritability explained by a set of loci. Further, our results suggest that, as a prelude to large-scale sequencing efforts, targeted re-sequencing paired with large-scale genotyping will increase estimates of complex trait heritability explained by known loci.
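To make the "heritability explained by these genes" bookkeeping concrete: under a purely additive model with Hardy-Weinberg equilibrium, a SNP with effect-allele frequency p and per-allele effect β explains 2p(1−p)β² of the trait variance, and unlinked SNPs simply add. The minimal sketch below assumes no LD between the SNPs; the `Snp` class and all numbers are hypothetical, and this is not necessarily how the authors computed their 3.1% and 6.5% estimates. It shows why a rare variant with a large effect can contribute as much as a common variant with a small one.

```python
"""Sketch: fraction of trait variance explained by unlinked biallelic SNPs.

Assumptions (not from the paper): additive effects, Hardy-Weinberg
equilibrium, no LD between SNPs. All inputs are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class Snp:
    freq: float  # frequency p of the effect allele
    beta: float  # per-allele effect, in trait units


def variance_explained(snps, trait_variance):
    """Sum of 2p(1-p)beta^2 over SNPs, divided by the trait variance."""
    total = 0.0
    for s in snps:
        total += 2.0 * s.freq * (1.0 - s.freq) * s.beta ** 2
    return total / trait_variance


def heritability_explained(snps, trait_variance, h2):
    """Share of the narrow-sense heritability h2 accounted for by the SNPs."""
    return variance_explained(snps, trait_variance) / h2


if __name__ == "__main__":
    # Hypothetical: one common small-effect SNP and one rare large-effect SNP
    # on a trait standardized to variance 1.
    snps = [Snp(freq=0.3, beta=0.1), Snp(freq=0.005, beta=1.0)]
    print(variance_explained(snps, trait_variance=1.0))
    print(heritability_explained(snps, trait_variance=1.0, h2=0.5))
```

With these made-up inputs the rare variant (p = 0.005) explains more variance than the common one (p = 0.3), which is the kind of contribution a GWAS array without a good proxy would miss.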
Author Summary
Despite the striking success of genome-wide association studies in identifying genetic loci associated with common complex traits and diseases, much of the heritable risk for these traits and diseases remains unexplained. A higher resolution investigation of the genome through sequencing studies is expected to clarify the sources of this missing heritability. As a preview of what we might learn in these more detailed assessments of genetic variation, we used sequencing to identify potentially interesting variants in seven genes associated with low-density lipoprotein cholesterol (LDL-C) in 256 Sardinian individuals with extreme LDL-C levels, followed by large-scale genotyping in 5,524 individuals, to examine newly discovered and previously described variants. We found that a combination of common and rare variants in these loci contributes to variation in LDL-C levels, and also that the initial estimate of the heritability explained by these loci doubled. Importantly, our results include a Sardinian-specific rare variant, highlighting the need for sequencing studies in isolated populations. Our results provide insights about what extensive whole-genome sequencing efforts are likely to reveal for the understanding of the genetic architecture of complex traits.
doi:10.1371/journal.pgen.1002198
PMCID: PMC3145627  PMID: 21829380
2.  Quantifying Missing Heritability at Known GWAS Loci 
PLoS Genetics  2013;9(12):e1003993.
Recent work has shown that much of the missing heritability of complex traits can be resolved by estimates of heritability explained by all genotyped SNPs. However, it is currently unknown how much heritability is missing due to poor tagging or additional causal variants at known GWAS loci. Here, we use variance components to quantify the heritability explained by all SNPs at known GWAS loci in nine diseases from WTCCC1 and WTCCC2. After accounting for expectation, we observed that all SNPs at known GWAS loci explain more heritability than GWAS-associated SNPs on average. For some diseases, this increase was individually significant, namely Multiple Sclerosis (MS) and Crohn's Disease (CD); all analyses of autoimmune diseases excluded the well-studied MHC region. Additionally, we found that GWAS loci from other related traits also explained significant heritability. The union of all autoimmune disease loci explained more MS heritability than known MS SNPs and more CD heritability than known CD SNPs, with an analogous increase for all autoimmune diseases analyzed. We also observed significant increases in an analysis of Rheumatoid Arthritis (RA) samples typed on ImmunoChip, with more heritability from all SNPs at GWAS loci and more heritability from all autoimmune disease loci compared to known RA SNPs (including those identified in this cohort). Our methods adjust for LD between SNPs, which can bias standard estimates of heritability from SNPs even if all causal variants are typed. By comparing adjusted estimates, we hypothesize that the genome-wide distribution of causal variants is enriched for low-frequency alleles, but that causal variants at known GWAS loci are skewed towards common alleles. These findings have important ramifications for fine-mapping study design and our understanding of complex disease architecture.
Author Summary
Heritable diseases have an unknown underlying “genetic architecture” that defines the distribution of effect-sizes for disease-causing mutations. Understanding this genetic architecture is an important first step in designing disease-mapping studies, and many theories have been developed on the nature of this distribution. Here, we evaluate the hypothesis that additional heritable variation lies at previously known associated loci but is not fully explained by the single most associated marker. We develop methods based on variance-components analysis to quantify this type of “local” heritability, demonstrating that standard estimates can be falsely inflated or deflated due to correlation between neighboring markers, and propose a robust adjustment. In analysis of nine common diseases we find a significant average increase of local heritability, consistent with multiple common causal variants at an average locus. Intriguingly, for autoimmune diseases we also observe significant local heritability in loci not associated with the specific disease but with other autoimmune diseases, implying a highly correlated underlying disease architecture. These findings have important implications for the design of future studies and our general understanding of common disease.
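The idea of "local" heritability (variance captured by relationships computed from a subset of SNPs) can be illustrated with Haseman-Elston (HE) regression, a simpler moment estimator swapped in here for the paper's REML variance components: for a standardized phenotype, the slope of y_i·y_j on the genomic relationship A_ij over pairs i < j estimates the SNP heritability of whichever SNPs built A. The sketch below uses simulated, unlinked SNPs (nothing from WTCCC) and does not model the LD bias or the adjustment the authors describe; `simulate`, `grm`, and `he_regression` are hypothetical helper names.

```python
"""Sketch: HE-regression estimate of SNP heritability, genome-wide vs. a subset.

Swapped-in technique: HE regression, not the paper's REML. Simulated data;
SNPs are unlinked, so LD effects are deliberately absent.
"""
import numpy as np


def standardize(genotypes):
    """Center and scale 0/1/2 genotype columns by their sample allele frequency."""
    p = genotypes.mean(axis=0) / 2.0
    return (genotypes - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))


def grm(genotypes):
    """Genomic relationship matrix A = Z Z^T / M from an N x M genotype matrix."""
    z = standardize(genotypes)
    return z @ z.T / genotypes.shape[1]


def he_regression(y, a):
    """Slope of y_i * y_j (standardized y) regressed on A_ij over pairs i < j."""
    y = (y - y.mean()) / y.std()
    iu = np.triu_indices(len(y), k=1)
    products = np.outer(y, y)[iu]
    rel = a[iu]
    return np.cov(rel, products)[0, 1] / np.var(rel, ddof=1)


def simulate(n=2000, m=500, h2=0.5, seed=1):
    """Unlinked SNPs with small random additive effects summing to SNP h2."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, size=m)
    g = rng.binomial(2, freqs, size=(n, m)).astype(float)
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)
    y = standardize(g) @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return g, y


if __name__ == "__main__":
    g, y = simulate()
    print("all SNPs:      ", he_regression(y, grm(g)))
    print("first 250 SNPs:", he_regression(y, grm(g[:, :250])))
```

Because effects are spread evenly across SNPs in this toy setup, the subset estimate should land near half of the genome-wide one; with real data the subset would be the SNPs inside known loci, and LD with SNPs outside them is what makes the naive estimate biased.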
doi:10.1371/journal.pgen.1003993
PMCID: PMC3873246  PMID: 24385918
3.  Partitioning the Heritability of Tourette Syndrome and Obsessive Compulsive Disorder Reveals Differences in Genetic Architecture 
Davis, Lea K. | Yu, Dongmei | Keenan, Clare L. | Gamazon, Eric R. | Konkashbaev, Anuar I. | Derks, Eske M. | Neale, Benjamin M. | Yang, Jian | Lee, S. Hong | Evans, Patrick | Barr, Cathy L. | Bellodi, Laura | Benarroch, Fortu | Berrio, Gabriel Bedoya | Bienvenu, Oscar J. | Bloch, Michael H. | Blom, Rianne M. | Bruun, Ruth D. | Budman, Cathy L. | Camarena, Beatriz | Campbell, Desmond | Cappi, Carolina | Cardona Silgado, Julio C. | Cath, Danielle C. | Cavallini, Maria C. | Chavira, Denise A. | Chouinard, Sylvain | Conti, David V. | Cook, Edwin H. | Coric, Vladimir | Cullen, Bernadette A. | Deforce, Dieter | Delorme, Richard | Dion, Yves | Edlund, Christopher K. | Egberts, Karin | Falkai, Peter | Fernandez, Thomas V. | Gallagher, Patience J. | Garrido, Helena | Geller, Daniel | Girard, Simon L. | Grabe, Hans J. | Grados, Marco A. | Greenberg, Benjamin D. | Gross-Tsur, Varda | Haddad, Stephen | Heiman, Gary A. | Hemmings, Sian M. J. | Hounie, Ana G. | Illmann, Cornelia | Jankovic, Joseph | Jenike, Michael A. | Kennedy, James L. | King, Robert A. | Kremeyer, Barbara | Kurlan, Roger | Lanzagorta, Nuria | Leboyer, Marion | Leckman, James F. | Lennertz, Leonhard | Liu, Chunyu | Lochner, Christine | Lowe, Thomas L. | Macciardi, Fabio | McCracken, James T. | McGrath, Lauren M. | Mesa Restrepo, Sandra C. | Moessner, Rainald | Morgan, Jubel | Muller, Heike | Murphy, Dennis L. | Naarden, Allan L. | Ochoa, William Cornejo | Ophoff, Roel A. | Osiecki, Lisa | Pakstis, Andrew J. | Pato, Michele T. | Pato, Carlos N. | Piacentini, John | Pittenger, Christopher | Pollak, Yehuda | Rauch, Scott L. | Renner, Tobias J. | Reus, Victor I. | Richter, Margaret A. | Riddle, Mark A. | Robertson, Mary M. | Romero, Roxana | Rosàrio, Maria C. | Rosenberg, David | Rouleau, Guy A. | Ruhrmann, Stephan | Ruiz-Linares, Andres | Sampaio, Aline S. | Samuels, Jack | Sandor, Paul | Sheppard, Brooke | Singer, Harvey S. | Smit, Jan H. | Stein, Dan J. | Strengman, E. | Tischfield, Jay A. | Valencia Duarte, Ana V. | Vallada, Homero | Van Nieuwerburgh, Filip | Veenstra-VanderWeele, Jeremy | Walitza, Susanne | Wang, Ying | Wendland, Jens R. | Westenberg, Herman G. M. | Shugart, Yin Yao | Miguel, Euripedes C. | McMahon, William | Wagner, Michael | Nicolini, Humberto | Posthuma, Danielle | Hanna, Gregory L. | Heutink, Peter | Denys, Damiaan | Arnold, Paul D. | Oostra, Ben A. | Nestadt, Gerald | Freimer, Nelson B. | Pauls, David L. | Wray, Naomi R. | Stewart, S. Evelyn | Mathews, Carol A. | Knowles, James A. | Cox, Nancy J. | Scharf, Jeremiah M.
PLoS Genetics  2013;9(10):e1003864.
The direct estimation of heritability from genome-wide common variant data as implemented in the program Genome-wide Complex Trait Analysis (GCTA) has provided a means to quantify heritability attributable to all interrogated variants. We have quantified the variance in liability to disease explained by all SNPs for two phenotypically-related neurobehavioral disorders, obsessive-compulsive disorder (OCD) and Tourette Syndrome (TS), using GCTA. Our analysis yielded a heritability point estimate of 0.58 (se = 0.09, p = 5.64e-12) for TS, and 0.37 (se = 0.07, p = 1.5e-07) for OCD. In addition, we conducted multiple genomic partitioning analyses to identify genomic elements that concentrate this heritability. We examined genomic architectures of TS and OCD by chromosome, MAF bin, and functional annotations. In addition, we assessed heritability for early onset and adult onset OCD. Among other notable results, we found that SNPs with a minor allele frequency of less than 5% accounted for 21% of the TS heritability and 0% of the OCD heritability. Additionally, we identified a significant contribution to TS and OCD heritability by variants significantly associated with gene expression in two regions of the brain (parietal cortex and cerebellum) for which we had available expression quantitative trait loci (eQTLs). Finally we analyzed the genetic correlation between TS and OCD, revealing a genetic correlation of 0.41 (se = 0.15, p = 0.002). These results are very close to previous heritability estimates for TS and OCD based on twin and family studies, suggesting that very little, if any, heritability is truly missing (i.e., unassayed) from TS and OCD GWAS studies of common variation. The results also indicate that there is some genetic overlap between these two phenotypically-related neuropsychiatric disorders, but suggest that the two disorders have distinct genetic architectures.
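"Variance in liability to disease" means the case-control estimate, which GCTA first obtains on the observed 0/1 scale, has been rescaled to an underlying normal liability. The sketch below implements the widely used observed-to-liability transformation, h²_l = h²_o · K²(1−K)² / (z² P(1−P)), where K is the assumed population prevalence, P the fraction of cases in the sample, and z the standard normal density at the liability threshold. The prevalence and case fraction in the example are hypothetical, not the values used for TS or OCD.

```python
"""Sketch: convert a case-control SNP heritability to the liability scale.

Assumes a threshold-liability model with a standard normal liability.
"""
from statistics import NormalDist


def observed_to_liability(h2_observed, prevalence, case_fraction):
    """Rescale an observed (0/1) scale heritability to the liability scale.

    prevalence:    K, assumed population prevalence of the disorder
    case_fraction: P, proportion of cases in the analysed sample
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability threshold t
    z = std_normal.pdf(threshold)                      # normal density at t
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


if __name__ == "__main__":
    # Hypothetical: 1% prevalence, sample ascertained to 50% cases.
    print(observed_to_liability(0.3, prevalence=0.01, case_fraction=0.5))
```

For a rare disorder studied in a case-enriched sample the multiplier is below 1 (about 0.55 for K = 0.01, P = 0.5), so liability-scale estimates depend noticeably on the prevalence one assumes.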
Author Summary
Family and twin studies have shown that genetic risk factors are important in the development of Tourette Syndrome (TS) and obsessive compulsive disorder (OCD). However, efforts to identify the individual genetic risk factors involved in these two neuropsychiatric disorders have been largely unsuccessful. One possible explanation for this is that many genetic variations scattered throughout the genome each contribute a small amount to the overall risk. For TS and OCD, the genetic architecture (characterized by the number, frequency, and distribution of genetic risk factors) is presently unknown. This study examined the genetic architecture of TS and OCD in a variety of ways. We found that rare genetic changes account for more genetic risk in TS than in OCD; certain chromosomes contribute to OCD risk more than others; and variants that influence the level of genes expressed in two regions of the brain can account for a significant amount of risk for both TS and OCD. Results from this study may help to determine what kinds of variants act as individual risk factors for TS and OCD and where in the human genome they are located.
doi:10.1371/journal.pgen.1003864
PMCID: PMC3812053  PMID: 24204291
4.  Improved Detection of Common Variants Associated with Schizophrenia and Bipolar Disorder Using Pleiotropy-Informed Conditional False Discovery Rate 
PLoS Genetics  2013;9(4):e1003455.
Several lines of evidence suggest that genome-wide association studies (GWAS) have the potential to explain more of the “missing heritability” of common complex phenotypes. However, reliable methods to identify a larger proportion of single nucleotide polymorphisms (SNPs) that impact disease risk are currently lacking. Here, we use a genetic pleiotropy-informed conditional false discovery rate (FDR) method on GWAS summary statistics data to identify new loci associated with schizophrenia (SCZ) and bipolar disorder (BD), two highly heritable disorders with significant missing heritability. Epidemiological and clinical evidence suggests similar disease characteristics and overlapping genes between SCZ and BD. We computed conditional Q–Q curves of data from the Psychiatric Genomics Consortium (SCZ: n = 9,379 cases and n = 7,736 controls; BD: n = 6,990 cases and n = 4,820 controls) to show enrichment of SNPs associated with SCZ as a function of association with BD and vice versa with a corresponding reduction in FDR. Applying the conditional FDR method, we identified 58 loci associated with SCZ and 35 loci associated with BD below the conditional FDR level of 0.05. Of these, 14 loci were associated with both SCZ and BD (conjunction FDR). Together, these findings show the feasibility of genetic pleiotropy-informed methods to improve gene discovery in SCZ and BD and indicate overlapping genetic mechanisms between these two disorders.
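The conditional FDR idea can be sketched in a few lines: for SNP k, divide its primary p-value by the empirical conditional CDF of primary p-values among SNPs whose secondary p-value is at least as small, treating the null as uniform. The conjunction FDR is then the larger of the two conditional FDRs, so a SNP passes only if it is significant for each trait given the other. This is a simplified estimator under those assumptions; it omits steps a real analysis would need (for example LD pruning or genomic control), and the p-values in the test are made up.

```python
"""Sketch: empirical conditional and conjunction FDR from two p-value vectors.

cFDR(p1_k | p2_k) = p1_k * #{i: p2_i <= p2_k} / #{i: p1_i <= p1_k and p2_i <= p2_k}
Assumes uniform null p-values; capped at 1. O(n^2), for illustration only.
"""


def conditional_fdr(p_primary, p_secondary):
    """Conditional FDR of each SNP's primary p-value given its secondary one."""
    result = []
    for p1_k, p2_k in zip(p_primary, p_secondary):
        n_cond = 0   # SNPs at least as significant for the secondary trait
        n_joint = 0  # ...that are also at least as significant for the primary
        for p1_i, p2_i in zip(p_primary, p_secondary):
            if p2_i <= p2_k:
                n_cond += 1
                if p1_i <= p1_k:
                    n_joint += 1
        # n_joint >= 1 because SNP k itself satisfies both conditions.
        result.append(min(1.0, p1_k * n_cond / n_joint))
    return result


def conjunction_fdr(p_a, p_b):
    """Conjunction FDR: the larger of the two conditional FDRs for each SNP."""
    fdr_a = conditional_fdr(p_a, p_b)
    fdr_b = conditional_fdr(p_b, p_a)
    return [max(x, y) for x, y in zip(fdr_a, fdr_b)]
```

A SNP with a modest primary p-value can still pass a 0.05 conditional threshold if few SNPs are jointly that significant, which is how pleiotropic enrichment raises discovery relative to the unconditional analysis.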
Author Summary
Genome-wide association studies (GWAS) have thus far identified only a small fraction of the heritability of common complex disorders, such as severe mental disorders. We used a conditional false discovery rate approach for analysis of GWAS data, exploiting “genetic pleiotropy” to increase discovery of common gene variants associated with schizophrenia and bipolar disorder. Leveraging the increased power from combining GWAS of two associated phenotypes, we found a striking overlap in polygenic signals, allowing for the discovery of several new common gene variants associated with bipolar disorder and schizophrenia that were not identified in the original analysis using traditional GWAS methods. Some of the gene variants have been identified in other studies with large targeted replication samples, validating the present findings. Our pleiotropy-informed method may be of significant importance for detecting effects that are below the traditional genome-wide significance level in GWAS, particularly in highly polygenic, complex phenotypes, such as schizophrenia and bipolar disorder, where most of the genetic signal is missing (i.e., “missing heritability”). The findings also offer insights into mechanistic relationships between bipolar disorder and schizophrenia pathogenesis.
doi:10.1371/journal.pgen.1003455
PMCID: PMC3636100  PMID: 23637625
5.  Statistical Power to Detect Genetic (Co)Variance of Complex Traits Using SNP Data in Unrelated Samples 
PLoS Genetics  2014;10(4):e1004269.
We have recently developed analysis methods (GREML) to estimate the genetic variance of a complex trait/disease and the genetic correlation between two complex traits/diseases using genome-wide single nucleotide polymorphism (SNP) data in unrelated individuals. Here we use analytical derivations and simulations to quantify the sampling variance of the estimate of the proportion of phenotypic variance captured by all SNPs for quantitative traits and case-control studies. We also derive the approximate sampling variance of the estimate of a genetic correlation in a bivariate analysis, when two complex traits are either measured on the same or different individuals. We show that the sampling variance is inversely proportional to the number of pairwise contrasts in the analysis and to the variance in SNP-derived genetic relationships. For bivariate analysis, the sampling variance of the genetic correlation additionally depends on the harmonic mean of the proportion of variance explained by the SNPs for the two traits and the genetic correlation between the traits, and depends on the phenotypic correlation when the traits are measured on the same individuals. We provide an online tool for calculating the power of detecting genetic (co)variation using genome-wide SNP data. The new theory and online tool will be helpful to plan experimental designs to estimate the missing heritability that has not yet been fully revealed through genome-wide association studies, and to estimate the genetic overlap between complex traits (diseases) in particular when the traits (diseases) are not measured on the same samples.
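The scaling in this abstract can be turned into a quick planning calculation. With about N²/2 pairwise contrasts, the sampling variance of the SNP-heritability estimate is roughly 2 / (N² · var(A_ij)), where var(A_ij) is the variance of off-diagonal genomic relationships. The sketch below assumes var(A_ij) = 2×10⁻⁵, a commonly quoted ballpark for unrelated individuals (giving SE ≈ 316/N); treat that value as an assumption and substitute the one from your own data. Power uses a 1-df test with non-centrality (h²/SE)². The function names are hypothetical and this is not the authors' online tool.

```python
"""Sketch: approximate SE and power for GREML SNP heritability (quantitative trait).

Assumes unrelated individuals and var(A_ij) = 2e-5 unless overridden.
"""
from statistics import NormalDist


def se_h2_snp(n, var_relationship=2e-5):
    """SE of SNP h2: sqrt(2 / (N^2 * var(A_ij))), i.e. 1/(#pairs * var(A))."""
    return (2.0 / (n * n * var_relationship)) ** 0.5


def power_h2_snp(h2, n, alpha=0.05, var_relationship=2e-5):
    """Power of a two-sided 1-df test of h2 = 0 with NCP (h2 / SE)^2."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = h2 / se_h2_snp(n, var_relationship)
    # A 1-df non-central chi-square is the square of a N(shift, 1) variable.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def min_sample_size(h2, target_power=0.8, alpha=0.05, var_relationship=2e-5):
    """Smallest N, in steps of 100, whose power reaches target_power."""
    if h2 <= 0:
        raise ValueError("h2 must be positive")
    n = 100
    while power_h2_snp(h2, n, alpha, var_relationship) < target_power:
        n += 100
    return n


if __name__ == "__main__":
    print("SE at N = 10,000:", se_h2_snp(10_000))
    print("N for 80% power at h2 = 0.2:", min_sample_size(0.2))
```

Because the SE falls as 1/N rather than 1/√N, doubling the sample halves the SE, which is why sample-size planning for GREML is so sensitive to N.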
Author Summary
Genome-wide association studies (GWAS) have identified thousands of genetic variants for hundreds of traits and diseases. However, the genetic variants discovered from GWAS explain only a small fraction of the heritability, raising the question of “missing heritability”. We have recently developed approaches (called GREML) to estimate the overall contribution of all SNPs to the phenotypic variance of a trait (disease) and the proportion of genetic overlap between traits (diseases). A frequently asked question is how many samples are required to estimate the proportion of variance attributable to all SNPs and the proportion of genetic overlap with useful precision. In this study, we derive the standard errors of the estimated parameters from theory and find that they are highly consistent with values observed in published results and in simulations. The theory, together with an online application tool, will be helpful for planning experimental designs to quantify the missing heritability and to estimate the genetic overlap between traits (diseases), especially when it is infeasible to measure the traits (diseases) on the same individuals.
doi:10.1371/journal.pgen.1004269
PMCID: PMC3983037  PMID: 24721987
6.  Ubiquitous Polygenicity of Human Complex Traits: Genome-Wide Analysis of 49 Traits in Koreans 
PLoS Genetics  2013;9(3):e1003355.
Recent studies in populations of European ancestry have shown that 30%∼50% of heritability for human complex traits such as height and body mass index, and common diseases such as schizophrenia and rheumatoid arthritis, can be captured by common SNPs, and that the genetic variation attributed to each chromosome is proportional to its length. Using genome-wide estimation and partitioning approaches, we analysed 49 human quantitative traits, many of which are relevant to human diseases, in 7,170 unrelated Korean individuals genotyped on 326,262 SNPs. For 43 of the 49 traits, we estimated a nominally significant (P<0.05) proportion of variance explained by all SNPs on the Affymetrix 5.0 genotyping array. On average across the 47 of the 49 traits for which this SNP-explained variance is estimated to be non-zero, common SNPs explain approximately one-third (range of 7.8% to 76.8%) of narrow-sense heritability.
The estimate of SNP-explained variance is highly correlated with the proportion of SNPs with association P<0.031 (r² = 0.92). Longer genomic segments tend to explain more phenotypic variation, with a correlation of 0.78 between the estimate of variance explained by individual chromosomes and their physical length, and 1% of the genome explains approximately 1% of the genetic variance. Although a few SNPs have large effects for some traits, these results suggest that polygenicity is ubiquitous for most human complex traits and that a substantial proportion of the “missing heritability” is captured by common SNPs.
Author Summary
The “missing heritability” problem has been intensely debated for the last few years. Possible explanations include the existence of many genetic variants each with a small effect, rare variants with large effects, and heritability being over-estimated. Previous studies using whole-genome estimation have demonstrated that for human complex traits such as height, body mass index, and intelligence, a large portion of the heritability can be captured by all the common SNPs on the current genotyping arrays. These studies, however, each focused on only a few traits. In this study, we analysed 49 quantitative traits in a sample of ∼7,000 unrelated Korean individuals. We found that, on average over all the traits, common SNPs on the Affymetrix 5.0 genotyping array explain approximately a third of the heritability, that genetic variants are widely distributed across the whole genome with longer chromosomes explaining more phenotypic variation, and that approximately any 1% of the genome explains 1% of the heritability. Despite examples where a few variants explain a substantial amount of variation, all these results are consistent with polygenicity being ubiquitous for most complex traits.
doi:10.1371/journal.pgen.1003355
PMCID: PMC3591292  PMID: 23505390
7.  Human metabolic profiles are stably controlled by genetic and environmental variation 
A comprehensive variation map of the human metabolome identifies genetic and stable-environmental sources as major drivers of metabolite concentrations. The data suggest that sample sizes of a few thousand are sufficient to detect metabolite biomarkers predictive of disease.
We designed a longitudinal twin study to characterize the genetic, stable-environmental, and longitudinally fluctuating influences on metabolite concentrations in two human biofluids, urine and plasma, focusing specifically on the representative subset of metabolites detectable by 1H nuclear magnetic resonance (1H NMR) spectroscopy. We identified widespread genetic and stable-environmental influences on the urine and plasma metabolomes, with 30% and 42% of variation, respectively, attributable on average to familial sources, and 47% and 60% attributable to longitudinally stable sources. Ten of the metabolites annotated in the study are estimated to have >60% familial contribution to their variation in concentration. Our findings have implications for the design and interpretation of 1H NMR-based molecular epidemiology studies. On the basis of the stable component of variation quantified in the current paper, we specified a model of disease association under which we inferred that sample sizes of a few thousand should be sufficient to detect disease-predictive metabolite biomarkers.
Metabolites are small molecules involved in biochemical processes in living systems. Their concentration in biofluids, such as urine and plasma, can offer insights into the functional status of biological pathways within an organism, and reflect input from multiple levels of biological organization (genetic, epigenetic, transcriptomic, and proteomic) as well as from environmental and lifestyle factors. Metabolite levels have the potential to indicate a broad variety of deviations from the ‘normal’ physiological state, such as those that accompany a disease, or an increased susceptibility to disease. A number of recent studies have demonstrated that metabolite concentrations can be used to diagnose disease states accurately. A more ambitious goal is to identify metabolite biomarkers that are predictive of future disease onset, providing the possibility of intervention in susceptible individuals.
If an extreme concentration of a metabolite is to serve as an indicator of disease status, it is usually important to know the distribution of metabolite levels among healthy individuals. It is also useful to characterize the sources of that observed variation in the healthy population. A proportion of that variation, the heritable component, is attributable to genetic differences between individuals, potentially at many genetic loci. An effective molecular indicator of a heritable, complex disease is likely to have a substantive heritable component. Non-heritable biological variation in metabolite concentrations can arise from a variety of environmental influences, such as dietary intake, lifestyle choices, general physical condition, composition of gut microflora, and use of medication. Variation across a population in stable-environmental influences leads to long-term differences between individuals in their baseline metabolite levels. Dynamic environmental pressures lead to short-term fluctuations within an individual about their baseline level. A metabolite whose concentration changes substantially in response to short-term pressures is relatively unlikely to offer long-term prediction of disease. In summary, the potential suitability of a metabolite to predict disease is reflected by the relative contributions of heritable and stable/unstable-environmental factors to its variation in concentration across the healthy population.
Studies involving twins are an established technique for quantifying the heritable component of phenotypes in human populations. Monozygotic (MZ) twins share the same DNA genome-wide, while dizygotic (DZ) twins share approximately half their inherited DNA, as do ordinary siblings. By comparing the average extent of phenotypic concordance within MZ pairs to that within DZ pairs, it is possible to quantify the heritability of a trait, and also to quantify the familiality, which refers to the combination of heritable and common-environmental effects (i.e., environmental influences shared by twins in a pair). In addition to incorporating twins into the study design, it is useful to quantify the phenotype in some individuals at multiple time points. The longitudinal aspect of such a study allows environmental effects to be decomposed into those that affect the phenotype over the short term and those that exert stable influence.
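The logic of combining twin and repeat-sample correlations can be sketched with classical Falconer-style moment equations. This is a swapped-in textbook approximation, not the bespoke method the authors developed. It assumes an ACE model in which MZ twins share all familial effects and DZ twins share half the additive genetic effects, that correlations are measured at a single time point, and that short-term fluctuations are uncorrelated between the two samples from one person. The input correlations below are hypothetical.

```python
"""Sketch: split trait variance using twin and test-retest correlations.

Falconer-style approximation (not the paper's method):
  heritable a2 = 2 (r_MZ - r_DZ);  common environment c2 = 2 r_DZ - r_MZ
  familial = a2 + c2 (= r_MZ);     stable = test-retest correlation
With noisy inputs the components can fall outside [0, 1].
"""


def twin_longitudinal_decomposition(r_mz, r_dz, r_retest):
    """Return variance shares for heritable, familial, stable, unstable parts."""
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic (heritable) share
    c2 = 2.0 * r_dz - r_mz     # environment shared by both twins
    familial = a2 + c2         # heritable + common-environmental
    stable = r_retest          # familial + individual-environmental
    return {
        "heritable": a2,
        "common_env": c2,
        "familial": familial,
        "individual_env": stable - familial,
        "unstable": 1.0 - stable,
    }


if __name__ == "__main__":
    # Hypothetical correlations for a single metabolite peak.
    for name, share in twin_longitudinal_decomposition(0.5, 0.3, 0.7).items():
        print(f"{name:>15}: {share:.2f}")
```

The repeat sample is what separates the individual-environmental share (stable over months but not shared with a twin) from the unstable share, which a single-time-point twin design would lump together as "environment".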
For the current study, urine and blood samples were collected from a cohort of MZ and DZ twins, with some twins donating samples on two occasions several months apart. Samples were analysed by 1H nuclear magnetic resonance (1H NMR) spectroscopy—an untargeted, discovery-driven technique for quantifying metabolite concentrations in biological samples. The application of 1H NMR to a biological sample creates a spectrum, made up of multiple peaks, with each peak's size quantitatively representing the concentration of its corresponding hydrogen-containing metabolite.
In each biological sample in our study, we extracted a full set of peaks, and thereby quantified the concentrations of all common plasma and urine metabolites detectable by 1H NMR. We developed bespoke statistical methods to decompose the observed concentration variation at each metabolite peak into that originating from familial, individual-environmental, and unstable-environmental sources.
We quantified the variability landscape across all common metabolite peaks in the urine and plasma 1H NMR metabolomes. We annotated a subset of peaks with a total of 65 metabolites; the variance decompositions for these are shown in Figure 1. Ten metabolites' concentrations were estimated to have familial contributions in excess of 60%. The average proportion of stable variation across all extracted metabolite peaks was estimated to be 47% in the urine samples and 60% in the plasma samples; the average estimated familiality was 30% for urine and 42% for plasma. These results comprise the first quantitative variation map of the 1H NMR metabolome. The identification and quantification of substantive widespread stability provides support for the use of these biofluids in molecular epidemiology studies. On the basis of our findings, we performed power calculations for a hypothetical study searching for predictive disease biomarkers among 1H NMR-detectable urine and plasma metabolites. Our calculations suggest that sample sizes of 2,000–5,000 should allow reliable identification of disease-predictive metabolite concentrations explaining 5–10% of disease risk, while greater sample sizes of 5,000–20,000 would be required to identify metabolite concentrations explaining 1–2% of disease risk.
1H Nuclear Magnetic Resonance spectroscopy (1H NMR) is increasingly used to measure metabolite concentrations in sets of biological samples for top-down systems biology and molecular epidemiology. For such purposes, knowledge of the sources of human variation in metabolite concentrations is valuable, but currently sparse. We conducted and analysed a study to create such a resource. In our unique design, identical and non-identical twin pairs donated plasma and urine samples longitudinally. We acquired 1H NMR spectra on the samples, and statistically decomposed variation in metabolite concentration into familial (genetic and common-environmental), individual-environmental, and longitudinally unstable components. We estimate that stable variation, comprising familial and individual-environmental factors, accounts on average for 60% (plasma) and 47% (urine) of biological variation in 1H NMR-detectable metabolite concentrations. Clinically predictive metabolic variation is likely nested within this stable component, so our results have implications for the effective design of biomarker-discovery studies. We provide a power-calculation method which reveals that sample sizes of a few thousand should offer sufficient statistical precision to detect 1H NMR-based biomarkers quantifying predisposition to disease.
doi:10.1038/msb.2011.57
PMCID: PMC3202796  PMID: 21878913
biomarker; 1H nuclear magnetic resonance spectroscopy; metabolome-wide association study; top-down systems biology; variance decomposition
8.  Lessons from Model Organisms: Phenotypic Robustness and Missing Heritability in Complex Disease 
PLoS Genetics  2012;8(11):e1003041.
Genetically tractable model organisms from phages to mice have taught us invaluable lessons about fundamental biological processes and disease-causing mutations. Owing to technological and computational advances, human biology and the causes of human diseases have become accessible as never before. Progress in identifying genetic determinants for human diseases has been most remarkable for Mendelian traits. In contrast, identifying genetic determinants for complex diseases such as diabetes, cancer, and cardiovascular and neurological diseases has remained challenging, despite the fact that these diseases cluster in families. Hundreds of variants associated with complex diseases have been found in genome-wide association studies (GWAS), yet most of these variants explain only a modest amount of the observed heritability, a phenomenon known as “missing heritability.” The missing heritability has been attributed to many factors, mainly inadequacies in genotyping and phenotyping. We argue that lessons learned about complex traits in model organisms offer an alternative explanation for missing heritability in humans. In diverse model organisms, phenotypic robustness differs among individuals, and those with decreased robustness show increased penetrance of mutations and express previously cryptic genetic variation. We propose that phenotypic robustness also differs among humans and that individuals with lower robustness will be more responsive to genetic and environmental perturbations and hence susceptible to disease. Phenotypic robustness is a quantitative trait that can be accurately measured in model organisms, but not as yet in humans. We propose feasible approaches to measure robustness in large human populations, proof-of-principle experiments for robustness markers in model organisms, and a new GWAS design that takes differences in robustness into account.
doi:10.1371/journal.pgen.1003041
PMCID: PMC3499356  PMID: 23166511
9.  Missing heritability and strategies for finding the underlying causes of complex disease 
Nature reviews. Genetics  2010;11(6):446-450.
Although recent genome-wide studies have provided valuable insights into the genetic basis of human disease, they have explained relatively little of the heritability of most complex traits, and the variants identified through these studies have small effect sizes. This has led to the important and hotly debated issue of where the ‘missing heritability’ of complex diseases might be found. Here, seven leading geneticists offer their opinion about where this heritability is likely to lie, what this could tell us about the underlying genetic architecture of common diseases and how this could inform research strategies for uncovering genetic risk factors.
doi:10.1038/nrg2809
PMCID: PMC2942068  PMID: 20479774
10.  An Evolutionary Perspective on Epistasis and the Missing Heritability 
PLoS Genetics  2013;9(2):e1003295.
The relative importance of additive and non-additive genetic variance has been widely debated in quantitative genetics. By approaching this question from an evolutionary perspective we show that, while additive variance can be maintained under selection at a low level for some patterns of epistasis, the majority of the genetic variance that will persist is actually non-additive. We propose that one reason that the problem of the “missing heritability” arises is because the additive genetic variation that is estimated to be contributing to the variance of a trait will most likely be an artefact of the non-additive variance that can be maintained over evolutionary time. In addition, it can be shown that even a small reduction in linkage disequilibrium between causal variants and observed SNPs rapidly erodes estimates of epistatic variance, leading to an inflation in the perceived importance of additive effects. We demonstrate that the perception of independent additive effects comprising the majority of the genetic architecture of complex traits is biased upwards and that the search for causal variants in complex traits under selection is potentially underpowered by parameterising for additive effects alone. Given dense SNP panels the detection of causal variants through genome-wide association studies may be improved by searching for epistatic effects explicitly.
Author Summary
In this study we have shown that two independent problems may have a common cause. Why do traits under selection exhibit additive genetic variance, and why is the proportion of the heritability explained by additive effects much smaller than the total heritability estimated to exist? Our results indicate that epistatic interactions can allow deleterious mutations to persist under selection and that these interactions can abate the depletion of additive genetic variation. Furthermore, a much larger element of non-additive genetic variance is maintained, which supports the notion that the heritability estimated from family studies could be a mixture of both additive and non-additive components. We show that searching directly for epistatic effects greatly improves the discovery of variants under selection, despite the multiple testing penalty being much larger. Finally, we demonstrate that common practices in genome-wide association studies could lead to both an ascertainment bias in detecting additive effects and a confirmation bias in perceiving that most of the genetic variance is additive.
doi:10.1371/journal.pgen.1003295
PMCID: PMC3585114  PMID: 23509438
11.  Genetic Dissection of the Drosophila melanogaster Female Head Transcriptome Reveals Widespread Allelic Heterogeneity 
PLoS Genetics  2014;10(5):e1004322.
Modern genetic mapping is plagued by the “missing heritability” problem, which refers to the discordance between the estimated heritabilities of quantitative traits and the variance accounted for by mapped causative variants. One major potential explanation for the missing heritability is allelic heterogeneity, in which there are multiple causative variants at each causative gene with only a fraction having been identified. The majority of genome-wide association studies (GWAS) implicitly assume that a single SNP can explain all the variance for a causative locus. However, if allelic heterogeneity is prevalent, a substantial amount of genetic variance will remain unexplained. In this paper, we take a haplotype-based mapping approach and quantify the number of alleles segregating at each locus using a large set of 7922 eQTL contributing to regulatory variation in the Drosophila melanogaster female head. Not only does this study provide a comprehensive eQTL map for a major community genetic resource, the Drosophila Synthetic Population Resource, but it also provides a direct test of the allelic heterogeneity hypothesis. We find that 95% of cis-eQTLs and 78% of trans-eQTLs are due to multiple alleles, demonstrating that allelic heterogeneity is widespread in Drosophila eQTL. Allelic heterogeneity likely contributes significantly to the missing heritability problem common in GWAS.
Author Summary
For traits with complex genetic inheritance it has generally proven very difficult to identify the majority of the specific causative variants involved. A range of hypotheses have been put forward to explain this so-called “missing heritability”. One idea—allelic heterogeneity, where genes each harbor multiple different causative variants—has received little attention, because it is difficult to detect with most genetic mapping designs. Here we make use of a panel of Drosophila melanogaster lines derived from multiple founders, allowing us to directly test for the presence of multiple alleles at a large set of genetic loci influencing gene expression. We find that the vast majority of loci harbor more than two functional alleles, demonstrating extensive allelic heterogeneity at the level of gene expression and suggesting that such heterogeneity is an important factor determining the genetic basis of complex trait variation in general.
doi:10.1371/journal.pgen.1004322
PMCID: PMC4014434  PMID: 24810915
12.  Hope for GWAS: Relevant Risk Genes Uncovered from GWAS Statistical Noise 
Hundreds of genetic variants have been associated with common diseases through genome-wide association studies (GWAS), yet there are limits to current approaches in detecting true small-effect risk variants against a background of false positive findings. Here we addressed the missing heritability problem, aiming to test whether there are indeed risk variants within GWAS statistical noise and to develop a systematic strategy to retrieve these hidden variants. Employing an integrative approach, which combines protein-protein interactions with association data from GWAS for 6 common diseases, we found that genes associated with any of these diseases at less stringent significance levels (p < 0.1) are functionally connected beyond noise expectation. This functional coherence was used to identify disease-relevant subnetworks, which were shown to be enriched in known genes, outperforming the selection of top GWAS genes. As a proof of principle, we applied this approach to breast cancer, supporting well-known breast cancer genes while pinpointing novel susceptibility genes for experimental validation. This study reinforces the idea that GWAS are under-analyzed and that the missing heritability is hidden rather than absent. It extends the use of protein networks to reveal this missing heritability, thus leveraging the large investment in GWAS, which has so far produced little tangible gain.
doi:10.3390/ijms151017601
PMCID: PMC4227180  PMID: 25268625
genome-wide association studies (GWAS); missing heritability; protein-protein interaction networks; functional coherence
13.  Genomics in the Post-GWAS Era 
Seminars in Liver Disease  2011;31(2):215-222.
The field of genomics has entered a new era in which the ability to identify genetic variants that impact complex human traits and disease in an unbiased fashion using genome-wide approaches is widely accessible. To date, the workhorse of these efforts has been the genome-wide association study (GWAS), which has quickly moved from novel to routine, and has provided key insights into aspects of the underlying allelic architecture of complex traits. The main lesson learned from the early GWAS efforts is that though many disease-associated variants are often discovered, most have only a minor effect on disease, and in total explain only a small amount of the apparent heritability. Here we provide a brief overview of the genetic variation classes that may harbor the heritability missing from GWAS, and touch on approaches that will be leveraged in the coming years as genomics—and by extension medicine—becomes increasingly personalized.
doi:10.1055/s-0031-1276641
PMCID: PMC3285545  PMID: 21538286
Genetics; genomics; GWAS; liver; complex disease
14.  Genetic variations in colorectal cancer risk and clinical outcome 
Colorectal cancer (CRC) has an apparent hereditary component, as evidenced by the well-characterized genetic syndromes and family history associated with the increased risk of this disease. However, in a large fraction of CRC cases, no known genetic syndrome or family history can be identified, suggesting the presence of “missing heritability” in CRC etiology. The genome-wide association study (GWAS) platform has led to the identification of multiple replicable common genetic variants associated with CRC risk. These newly discovered genetic variations might account for a portion of the missing heritability. Here, we summarize the recent GWASs related to newly identified genetic variants associated with CRC risk and clinical outcome. The findings from these studies suggest that there is a lack of understanding of the mechanism of many single nucleotide polymorphisms (SNPs) that are associated with CRC. In addition, the utility of SNPs as prognostic markers of CRC in clinical settings remains to be further assessed. Finally, the currently validated SNPs explain only a small fraction of total heritability in complex-trait diseases like CRC. Thus, the “missing heritability” still needs to be explored further. Future epidemiological and functional investigations of these variants will add to our understanding of CRC pathogenesis, and may ultimately lead to individualized strategies for prevention and treatment of CRC.
doi:10.3748/wjg.v20.i15.4167
PMCID: PMC3989953  PMID: 24764655
Colorectal cancer; Genome-wide association study; Single nucleotide polymorphism; Signal transduction pathways; Cell cycle control; Gene desert; Genome instability
15.  Insight in Genome-Wide Association of Metabolite Quantitative Traits by Exome Sequence Analyses 
PLoS Genetics  2015;11(1):e1004835.
Metabolite quantitative traits carry great promise for epidemiological studies, and their genetic background has been addressed using genome-wide association studies (GWAS). Thus far, the role of less common variants has not been exhaustively studied. Here, we performed a GWAS for metabolite quantitative traits in serum, followed by exome sequence analysis to zoom in on putative causal variants in the associated genes. ¹H nuclear magnetic resonance (¹H-NMR) spectroscopy experiments yielded successful quantification of 42 unique metabolites in 2,482 individuals from the Erasmus Rucphen Family (ERF) study. Heritability of the metabolites was estimated using SOLAR. GWAS was performed with linear mixed models, using HapMap imputations. Based on physical vicinity and pathway analyses, candidate genes were screened for coding region variation using exome sequence data. Heritability estimates for metabolites ranged between 10% and 52%. GWAS replicated three known loci at metabolome-wide significance: CPS1 with glycine (P = 1.27 × 10⁻³²), PRODH with proline (P = 1.11 × 10⁻¹⁹) and SLC16A9 with carnitine level (P = 4.81 × 10⁻¹⁴), and uncovered a novel association between DMGDH and dimethylglycine level (P = 1.65 × 10⁻¹⁹). In addition, we found three novel, suggestively significant loci: TNP1 with pyruvate (P = 1.26 × 10⁻⁸), KCNJ16 with 3-hydroxybutyrate (P = 1.65 × 10⁻⁸) and the 2p12 locus with valine (P = 3.49 × 10⁻⁸). Exome sequence analysis identified potentially causal coding and regulatory variants located in the genes CPS1, KCNJ2 and PRODH, and revealed allelic heterogeneity for CPS1 and PRODH. Combined GWAS and exome analysis of metabolites detected by high-resolution ¹H-NMR is a robust approach to uncover metabolite quantitative trait loci (mQTL) and the likely causative variants in these loci. It is anticipated that insight into the genetics of intermediate phenotypes will provide additional insight into the genetics of complex traits.
Author Summary
Human metabolic individuality is under strict control of genetic and environmental factors. In our study, we aimed to find the genetic determinants of circulating molecules in the sera of a large set of individuals representing the general population. First, we performed a hypothesis-free genome-wide screen in this population to identify genetic regions of interest. Our study confirmed four known gene-metabolite connections, but also pointed to four novel ones. Genome-wide screens enriched for common intergenic variants may miss causal genetic variants that directly change the protein sequence. To investigate this further, we zoomed into regions of interest and tested whether the association signals obtained in the first stage were direct, or whether they reflect causal variants that were not captured in the initial panel. These subsequent tests showed that protein-coding and regulatory variants are involved in metabolite levels. For two genomic regions we also found that the genes harbour more than one causal variant, each influencing metabolite levels independently of the others. We also observed a strong connection between markers of cardio-metabolic health and metabolites. Taken together, our novel loci merit further research into their causal relation to diseases such as type 2 diabetes and cardiovascular disease.
doi:10.1371/journal.pgen.1004835
PMCID: PMC4287344  PMID: 25569235
16.  Gene set analysis of SNP data: benefits, challenges, and future directions 
The last decade of human genetic research witnessed the completion of hundreds of genome-wide association studies (GWASs). However, the genetic variants discovered through these efforts account for only a small proportion of the heritability of complex traits. One explanation for the missing heritability is that the common analysis approach, assessing the effect of each single-nucleotide polymorphism (SNP) individually, is not well suited to the detection of small effects of multiple SNPs. Gene set analysis (GSA) is one of several approaches that may contribute to the discovery of additional genetic risk factors for complex traits. Complex phenotypes are thought to be controlled by networks of interacting biochemical and physiological pathways influenced by the products of sets of genes. By assessing the overall evidence of association of a phenotype with all measured variation in a set of genes, GSA may identify functionally relevant sets of genes corresponding to relevant biomolecular pathways, which will enable more focused studies of genetic risk factors. This approach may thus contribute to the discovery of genetic variants responsible for some of the missing heritability. With the increased use of these approaches for the secondary analysis of data from GWAS, it is important to understand the different GSA methods and their strengths and weaknesses, and consider challenges inherent in these types of analyses. This paper provides an overview of GSA, highlighting the key challenges, potential solutions, and directions for ongoing research.
doi:10.1038/ejhg.2011.57
PMCID: PMC3172936  PMID: 21487444
pathway analysis; multilocus; complex traits; genetic association studies
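The core GSA idea described above, assessing the combined evidence of association across all measured SNPs in a gene set rather than SNP by SNP, can be illustrated with a minimal sketch. It assumes the simplest possible combination rule, Fisher's method, which treats the per-SNP p-values as independent; this is a stand-in chosen for illustration, not one of the specific GSA methods the review compares, and because SNPs in linkage disequilibrium violate the independence assumption, practical GSA tools typically calibrate set-level statistics by permutation instead. The function name `fisher_combined_p` and the example p-values are hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Naive Fisher combination of per-SNP p-values for one gene set.

    Assumes the p-values are independent; SNPs in linkage disequilibrium
    violate this, so real GSA methods usually calibrate by permutation.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    # Fisher statistic X = -2 * sum(ln p) is chi-square with 2k df under H0.
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    k = len(p_values)
    # Closed-form chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical per-SNP p-values for the SNPs mapped to one gene set.
print(fisher_combined_p([0.01, 0.02, 0.03]))
```

With a single SNP the function returns that SNP's own p-value, which is a quick sanity check that the closed-form survival function is wired correctly.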
17.  A Genome-Wide Assessment of the Role of Untagged Copy Number Variants in Type 1 Diabetes 
PLoS Genetics  2014;10(5):e1004367.
Genome-wide association studies (GWAS) for type 1 diabetes (T1D) have successfully identified more than 40 independent T1D-associated tagging single nucleotide polymorphisms (SNPs). However, owing to technical limitations of copy number variant (CNV) genotyping assays, the assessment of the role of CNVs has been limited to the subset of these in high linkage disequilibrium with tag SNPs. The contribution of untagged CNVs, often multi-allelic and difficult to genotype using existing assays, to the heritability of T1D remains an open question. To investigate this issue, we designed a custom comparative genomic hybridization array (aCGH) specifically to assay untagged CNV loci identified from a variety of sources. To overcome the technical limitations of the case-control design for this class of CNVs, we genotyped the Type 1 Diabetes Genetics Consortium (T1DGC) family resource (representing 3,903 transmissions from parents to affected offspring) and used an association testing strategy that does not require discrete genotype calls. Our design targeted 4,309 CNVs, of which 3,410 passed stringent quality control filters. As a positive control, the scan confirmed the known T1D association at the INS locus by direct typing of the 5′ variable number of tandem repeat (VNTR) locus. Our results clarify that the disease association is indistinguishable from the two main polymorphic allele classes of the INS VNTR, class I and class III. We also identified novel technical artifacts resulting in spurious associations at the somatically rearranging T cell receptor (TCRA/TCRD and TCRB) and immunoglobulin heavy chain (IGH) loci on chromosomes 14q11.2, 7q34 and 14q32.33, respectively. However, our data did not identify novel T1D loci, and our results do not support a major role of untagged CNVs in T1D heritability.
Author Summary
For many complex traits, and in particular type 1 diabetes (T1D), the genome-wide association study (GWAS) design has been successful at detecting a large number of loci that contribute to disease risk. However, in the case of T1D, as for almost all other traits, the sum of these loci does not fully explain the heritability estimated from familial studies. This observation raises the possibility that additional variants exist but have not yet been found because they have not been effectively targeted by the GWAS design. Here, we focus on a specific class of large deletions/duplications called copy number variants (CNVs), and more precisely on the subset of these loci that mutate rapidly and are therefore highly polymorphic. A consequence of this high level of polymorphism is that these variants have typically not been captured by previous GWAS. We use a family-based design that is optimized to capture these previously untested variants, and then perform a genome-wide scan to assess their contribution to T1D. Our scan was technically successful but did not identify novel associations. This suggests that little was missed by the GWAS strategy, and that the remaining heritability of T1D is most likely driven by a large number of variants, either rare or common, each with a small individual contribution to disease risk.
doi:10.1371/journal.pgen.1004367
PMCID: PMC4038470  PMID: 24875393
18.  Gene-based multiple trait analysis for exome sequencing data 
BMC Proceedings  2011;5(Suppl 9):S75.
The common genetic variants identified through genome-wide association studies explain only a small proportion of the genetic risk for complex diseases. Advances in next-generation sequencing technologies have enabled the detection of rare variants that are expected to contribute significantly to the missing heritability. Some genetic association studies provide multiple correlated traits for analysis, and multiple-trait analysis has the potential to improve the power to detect pleiotropic genetic variants that influence multiple traits. We propose a gene-level association test for multiple traits that accounts for correlation among the traits. Gene- or region-level testing for association involves both common and rare variants. Statistical tests designed for common variants may have limited power for individual rare variants because of their low frequency and multiple testing issues. To address these concerns, we use the weighted-sum pooling method to test the joint association of multiple rare and common variants within a gene. The proposed method is applied to the Genetic Analysis Workshop 17 (GAW17) simulated mini-exome data to analyze multiple traits. Because of the nature of the GAW17 simulation model, increased power was not observed for multiple-trait analysis compared with single-trait analysis. However, testing multiple traits did not result in a substantial loss of power. We conclude that this method would be useful for identifying pleiotropic genes.
doi:10.1186/1753-6561-5-S9-S75
PMCID: PMC3287915  PMID: 22373189
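The weighted-sum pooling step mentioned above collapses all variants in a gene into one score per individual, so that rare alleles are not tested one at a time. A minimal sketch follows, assuming the Madsen-Browning form of the weights (smoothed allele frequency q = (m + 1) / (2n + 2), weight sqrt(n q (1 - q))). It is an assumption that the frequencies are estimated from all individuals, as one might do for quantitative traits; the original Madsen-Browning method uses unaffected individuals only. The multiple-trait test that accounts for correlation among traits is not shown, and the function name `weighted_sum_scores` is hypothetical.

```python
import math
from typing import List


def weighted_sum_scores(genotypes: List[List[int]]) -> List[float]:
    """Madsen-Browning-style weighted-sum score for each individual.

    genotypes[j][i] is the minor-allele count (0, 1 or 2) of individual j
    at variant i within one gene. Assumption: allele frequencies are
    estimated from all individuals (the original method uses unaffected ones).
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    weights = []
    for i in range(n_variants):
        m = sum(row[i] for row in genotypes)        # observed minor-allele count
        q = (m + 1) / (2 * n + 2)                   # smoothed allele frequency
        weights.append(math.sqrt(n * q * (1 - q)))  # rarer variant -> smaller weight
    # Dividing by the weight makes each rare allele count for more in the score.
    return [sum(g / w for g, w in zip(row, weights)) for row in genotypes]


scores = weighted_sum_scores([[0, 0], [1, 0], [0, 1], [0, 0]])
print(scores)
```

The resulting per-individual scores would then be tested against each trait (or jointly against all traits) in place of the individual genotypes, which is what keeps the multiple-testing burden at one test per gene.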
19.  Dark Matter: Are Mice the Solution to Missing Heritability? 
Genome-wide association studies (GWAS) in humans have identified hundreds of single nucleotide polymorphisms associated with complex traits, yet for most traits studied, the sum total of all these identified variants fails to explain a significant portion of the heritable variation. Reasons for this “missing heritability” are thought to include the existence of rare causative variants not captured by current genotyping arrays, structural variants that go undetected by existing technology, insufficient power to identify multi-gene interactions, small sample sizes, and the influence of environmental and epigenetic effects. As genotyping technologies have evolved, it has become inexpensive and relatively straightforward to perform GWAS in mice. Mice offer a powerful tool for elucidating the genetic architecture of behavioral and physiological traits, and are complementary to human studies. Unlike F2 crosses of inbred strains, advanced intercross lines, heterogeneous stocks, outbred, and wild-caught mice show more rapid breakdown of linkage disequilibrium, which allows for increasingly high-resolution mapping. Because some of these populations are created using a small number of founder chromosomes, they are not expected to harbor rare alleles. We discuss the differences between these mouse populations and examine their potential to overcome some of the pitfalls that have plagued human GWAS.
doi:10.3389/fgene.2011.00032
PMCID: PMC3268586  PMID: 22303328
GWAS; quantitative trait loci; complex traits; forward genetics; advanced intercross lines; heterogeneous stock; outbred mice; wild mice
20.  The power of regional heritability analysis for rare and common variant detection: simulations and application to eye biometrical traits 
Frontiers in Genetics  2013;4:232.
Genome-wide association studies (GWAS) have provided valuable insights into the genetic basis of complex traits. However, they have explained relatively little trait heritability. Recently, we proposed a new analytical approach called regional heritability mapping (RHM) that captures more of the missing genetic variation. This method is applicable to both related and unrelated populations. Here, we demonstrate the power of RHM in comparison with single-SNP GWAS and gene-based association approaches under a wide range of scenarios with variable numbers of quantitative trait loci (QTL) with common and rare causal variants in a narrow genomic region. Simulations based on real genotype data were performed to assess power to capture QTL variance, and we demonstrate that RHM has greater power to detect rare variants and/or multiple alleles in a region than other approaches. In addition, we show that RHM can more accurately capture the QTL variance when it is caused by multiple independent effects and/or rare variants. We applied RHM to analyze three biometrical eye traits for which single-SNP GWAS have been published or performed, to evaluate the effectiveness of this method in real data analysis, and detected some additional loci that were not detected by other GWAS methods. RHM has the potential to explain some of the missing heritability by capturing variance caused by QTL with low MAF and multiple independent QTL in a region, not captured by other GWAS methods. RHM analyses can be implemented using the software REACTA (http://www.epcc.ed.ac.uk/projects-portfolio/reacta).
doi:10.3389/fgene.2013.00232
PMCID: PMC3832942  PMID: 24312116
common and rare variants; GWAS; regional heritability mapping; multiple independent effects; missing heritability
21.  The combination of a genome-wide association study of lymphocyte count and analysis of gene expression data reveals novel asthma candidate genes 
Human Molecular Genetics  2012;21(9):2111-2123.
Recent genome-wide association studies (GWAS) have identified a number of novel genetic associations with complex human diseases. In spite of these successes, results from GWAS generally explain only a small proportion of disease heritability, an observation termed the ‘missing heritability problem’. Several sources for the missing heritability have been proposed, including the contribution of many common variants with small individual effect sizes, which cannot be reliably found using the standard GWAS approach. The goal of our study was to explore a complementary approach, which combines GWAS results with functional data in order to identify novel genetic associations with small effect sizes. To do so, we conducted a GWAS for lymphocyte count, a physiologic quantitative trait associated with asthma, in 462 Hutterites. In parallel, we performed a genome-wide gene expression study in lymphoblastoid cell lines from 96 Hutterites. We found significant support for genetic associations using the GWAS data when we considered variants near the 193 genes whose expression levels across individuals were most correlated with lymphocyte counts. Interestingly, these variants are also enriched with signatures of an association with asthma susceptibility, an observation we were able to replicate. The associated loci include genes previously implicated in asthma susceptibility as well as novel candidate genes enriched for functions related to T cell receptor signaling and adenosine triphosphate synthesis. Our results, therefore, establish a new set of asthma susceptibility candidate genes. More generally, our observations support the notion that many loci of small effects influence variation in lymphocyte count and asthma susceptibility.
doi:10.1093/hmg/dds021
PMCID: PMC3315207  PMID: 22286170
22.  Using genome-wide complex trait analysis to quantify ‘missing heritability’ in Parkinson's disease 
Human Molecular Genetics  2012;21(22):4996-5009.
Genome-wide association studies (GWASs) have been successful at identifying single-nucleotide polymorphisms (SNPs) highly associated with common traits; however, a great deal of the heritable variation associated with common traits remains unaccounted for within the genome. Genome-wide complex trait analysis (GCTA) is a statistical method that applies a linear mixed model to estimate the phenotypic variance of complex traits explained by genome-wide SNPs, including those not associated with the trait in a GWAS. We applied GCTA to 8 cohorts containing 7096 case and 19 455 control individuals of European ancestry in order to examine the missing heritability present in Parkinson's disease (PD). We meta-analyzed our initial results to produce robust heritability estimates for PD types across cohorts. Our results identify 27% (95% CI 17–38, P = 8.08 × 10⁻⁸) phenotypic variance associated with all types of PD, 15% (95% CI −0.2 to 33, P = 0.09) phenotypic variance associated with early-onset PD and 31% (95% CI 17–44, P = 1.34 × 10⁻⁵) phenotypic variance associated with late-onset PD. This is a substantial increase from the genetic variance identified by top GWAS hits alone (between 3 and 5%) and indicates there are substantially more risk loci to be identified. Our results suggest that although GWASs are a useful tool in identifying the most common variants associated with complex disease, many common variants of small effect remain to be discovered.
doi:10.1093/hmg/dds335
PMCID: PMC3576713  PMID: 22892372
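GCTA's linear mixed model rests on a genetic relationship matrix (GRM) estimated from genome-wide SNPs: each SNP is standardized by its sample allele frequency p, and relatedness between two individuals is the average product of their standardized genotypes. A minimal sketch of that construction, under the assumption of Hardy-Weinberg genotype variance 2p(1 - p), is shown below. GCTA itself uses a slightly different estimator on the diagonal, and the variance-component (REML) fit that turns the GRM into the heritability estimates quoted above is not shown; the function name `genetic_relationship_matrix` is hypothetical.

```python
from typing import List


def genetic_relationship_matrix(genotypes: List[List[int]]) -> List[List[float]]:
    """Simple GRM: A[j][k] = average over SNPs of standardized genotype products.

    genotypes[j][i] is the allele count (0, 1 or 2) of individual j at SNP i.
    Sketch only: GCTA applies a slightly different estimator on the diagonal.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    std_cols = []
    for i in range(n_snp):
        col = [row[i] for row in genotypes]
        p = sum(col) / (2 * n_ind)   # sample allele frequency
        var = 2 * p * (1 - p)        # genotype variance under Hardy-Weinberg
        if var == 0:
            continue                 # monomorphic SNPs carry no information
        sd = var ** 0.5
        std_cols.append([(x - 2 * p) / sd for x in col])
    if not std_cols:
        raise ValueError("no polymorphic SNPs")
    n_used = len(std_cols)
    return [[sum(c[j] * c[k] for c in std_cols) / n_used for k in range(n_ind)]
            for j in range(n_ind)]


A = genetic_relationship_matrix([[0, 1, 2], [0, 1, 1], [2, 1, 0]])
for row in A:
    print(row)
```

Because every SNP contributes, including those far below genome-wide significance, the variance explained through this matrix can exceed what the top GWAS hits capture, which is the contrast the abstract draws (27% versus 3 to 5%).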
23.  Comparison of Family History and SNPs for Predicting Risk of Complex Disease 
PLoS Genetics  2012;8(10):e1002973.
The clinical utility of family history and genetic tests is generally well understood for simple Mendelian disorders and rare subforms of complex diseases that are directly attributable to highly penetrant genetic variants. However, little is presently known regarding the performance of these methods in situations where disease susceptibility depends on the cumulative contribution of multiple genetic factors of moderate or low penetrance. Using quantitative genetic theory, we develop a model for studying the predictive ability of family history and single nucleotide polymorphism (SNP)–based methods for assessing risk of polygenic disorders. We show that family history is most useful for highly common, heritable conditions (e.g., coronary artery disease), where it explains roughly 20%–30% of disease heritability, on par with the most successful SNP models based on associations discovered to date. In contrast, we find that for diseases of moderate or low frequency (e.g., Crohn disease) family history accounts for less than 4% of disease heritability, substantially lagging behind SNPs in almost all cases. These results indicate that, for a broad range of diseases, already identified SNP associations may be better predictors of risk than their family history–based counterparts, despite the large fraction of missing heritability that remains to be explained. Our model illustrates the difficulty of using either family history or SNPs for standalone disease prediction. On the other hand, we show that, unlike family history, SNP–based tests can reveal extreme likelihood ratios for a relatively large percentage of individuals, thus providing potentially valuable adjunctive evidence in a differential diagnosis.
Author Summary
In clinical practice, obtaining a detailed family history is often considered the standard-of-care for characterizing the inherited component of an individual's disease risk. Recently, genetic risk assessments based on the cumulative effect of known single nucleotide polymorphism (SNP) disease associations have been proposed as another potentially useful source of information. To date, however, little is known regarding the predictive power of each approach. In this study, we develop models based on quantitative genetic theory to analyze and compare family history and SNP–based models. Our models explain the impact of disease frequency and heritability on performance for each method, and reveal a wide range of scenarios (16 out of the 23 diseases considered) where SNP associations may already be better predictors of risk than family history. Our results confirm the difficulty of obtaining accurate prediction when SNP or family history–based methods are used alone, and they show the benefits of combining information from the two approaches. They also suggest that, in some situations, SNP associations may be potentially useful as supporting evidence alongside other types of clinical information. To our knowledge, this study is the first broad comparison of family history– and SNP–based methods across a wide range of health conditions.
doi:10.1371/journal.pgen.1002973
PMCID: PMC3469463  PMID: 23071447
24.  A Study of CNVs As Trait-Associated Polymorphisms and As Expression Quantitative Trait Loci 
PLoS Genetics  2011;7(2):e1001292.
We conducted a comprehensive study of copy number variants (CNVs) well-tagged by SNPs (r2≥0.8) by analyzing their effect on gene expression and their association with disease susceptibility and other complex human traits. We tested whether these CNVs were more likely to be functional than frequency-matched SNPs as trait-associated loci or as expression quantitative trait loci (eQTLs) influencing phenotype by altering gene regulation. Our study found that CNV–tagging SNPs are significantly enriched for cis eQTLs; furthermore, we observed that trait associations from the NHGRI catalog show an overrepresentation of SNPs tagging CNVs relative to frequency-matched SNPs. We found that these SNPs tagging CNVs are more likely to affect multiple expression traits than frequency-matched variants. Given these findings on the functional relevance of CNVs, we created an online resource of expression-associated CNVs (eCNVs) using the most comprehensive population-based map of CNVs to inform future studies of complex traits. Although previous studies of common CNVs that can be typed on existing platforms and/or interrogated by SNPs in genome-wide association studies concluded that such CNVs appear unlikely to have a major role in the genetic basis of several complex diseases examined, our findings indicate that it would be premature to dismiss the possibility that even common CNVs may contribute to complex phenotypes and at least some common diseases.
Author Summary
Despite the large number of SNPs found to be reproducibly associated with complex diseases, they collectively account for only a small proportion of the overall heritability of such traits. CNVs have thus been proposed to explain some of the missing heritability and to alter disease susceptibility. However, a recent study of the genetics of 8 common diseases involving 16,000 cases and 3,000 controls failed to identify any novel CNVs associated with disease and concluded that CNVs are unlikely to play a major role in their etiology. The studies we report here show that we must be careful not to dismiss the possibility that CNVs may indeed underlie some of the observed associations with complex disease. Our findings show that well-tagged CNVs are disproportionately more likely to be eQTLs, as well as cis-eQTLs, than frequency-matched SNPs; furthermore, reproducible trait associations, as represented in the NHGRI catalog, are enriched for well-tagged CNVs relative to frequency-matched SNPs. Because of these findings on the strong functional relevance of these CNVs, we created a database (available at http://www.scandb.org/) of expression-associated CNVs to supplement our earlier studies of SNP eQTLs and to contribute to future studies of the genetics of complex traits.
doi:10.1371/journal.pgen.1001292
PMCID: PMC3033384  PMID: 21304891
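The abstract above restricts attention to CNVs "well-tagged by SNPs (r2≥0.8)". A minimal sketch of such a tagging check follows, assuming SNP genotypes coded as allele dosages (0, 1, 2) and CNV genotypes coded as integer copy numbers for the same individuals; r² is computed here as the squared Pearson correlation of those codes, which is one common convention but not necessarily the exact procedure the authors used. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP or CNV cannot tag anything
    return (sxy * sxy) / (sxx * syy)


def is_well_tagged(snp_dosages: Sequence[float], cnv_copies: Sequence[float],
                   threshold: float = 0.8) -> bool:
    """True if the SNP tags the CNV at r^2 >= threshold (0.8 as in the abstract)."""
    return r_squared(snp_dosages, cnv_copies) >= threshold


# Toy example: each extra copy of the SNP allele coincides with one fewer CNV copy.
snp = [0, 1, 2, 0, 1, 2]
cnv = [2, 1, 0, 2, 1, 0]
print(r_squared(snp, cnv), is_well_tagged(snp, cnv))
```

Squaring the correlation makes the check indifferent to whether the tagging allele is associated with gain or loss of copies.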
25.  Genome-Wide Interaction-Based Association Analysis Identified Multiple New Susceptibility Loci for Common Diseases 
PLoS Genetics  2011;7(3):e1001338.
Genome-wide interaction-based association (GWIBA) analysis has the potential to identify novel susceptibility loci. These interaction effects could be missed with the prevailing approaches in genome-wide association studies (GWAS). However, no convincing loci have been discovered exclusively from GWIBA methods, and the intensive computation involved is a major barrier for application. Here, we developed a fast, multi-thread/parallel program named “pair-wise interaction-based association mapping” (PIAM) for exhaustive two-locus searches. With this program, we performed a complete GWIBA analysis on seven diseases with stringent control for false positives, and we validated the results for three of these diseases. We identified one pair-wise interaction between a previously identified locus, C1orf106, and one new locus, TEC, that was specific for Crohn's disease, with a Bonferroni corrected P<0.05 (P = 0.039). This interaction was replicated with a pair of proxy linked loci (P = 0.013) on an independent dataset. Five other interactions had corrected P<0.5. We identified the allelic effect of a locus close to SLC7A13 for coronary artery disease. This was replicated with a linked locus on an independent dataset (P = 1.09×10⁻⁷). Through a local validation analysis that evaluated association signals, rather than locus-based associations, we found that several other regions showed association/interaction signals with nominal P<0.05. In conclusion, this study demonstrated that the GWIBA approach was successful for identifying novel loci, and the results provide new insights into the genetic architecture of common diseases. In addition, our PIAM program was capable of handling very large GWAS datasets that are likely to be produced in the future.
Author Summary
Recent studies on the genetic basis of common diseases have identified many loci that confer disease susceptibility. However, much of the heritability of these diseases remains unexplained. Loci involved in gene–gene interactions are considered cryptic, because they confer susceptibility but may not generate a detectable signal on their own. These interactions may account for the “missing heritability” of common diseases. Theoretically, these interactions can be identified with genome-wide interaction-based association analysis. In practice, however, very few gene–gene interactions have been identified with that method, and most were based on prior biological knowledge. Here, we applied a parallel computing technique that facilitated the identification of multiple new cryptic susceptibility loci involved in common diseases. We applied stringent control for false positives, and we validated our findings with independent datasets. This study demonstrated that interactions between gene loci could be successfully identified with the genome-wide interaction-based approach. With this approach, we also identified cryptic loci with moderate single-locus effects. The identified loci and interactions merit further investigation through fine mapping and functional analyses. Our results extend the current knowledge of common diseases for future studies in genetic mapping. This approach is applicable to current and future genome-wide association datasets.
doi:10.1371/journal.pgen.1001338
PMCID: PMC3060075  PMID: 21437271
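The exhaustive two-locus search and Bonferroni correction described in the abstract above can be sketched as follows. This is a sequential toy, not the PIAM program (which is multi-threaded and whose internals are not described here). The per-pair test `pair_test` is a hypothetical placeholder for whatever interaction statistic is used, and the correction multiplies each raw P value by the number of pairs, C(m, 2) for m loci, capped at 1.

```python
from itertools import combinations
from math import comb
from typing import Callable, List, Sequence, Tuple


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p * n_tests)


def exhaustive_two_locus_scan(
    loci: Sequence[str],
    pair_test: Callable[[str, str], float],  # hypothetical per-pair interaction test
    alpha: float = 0.05,
) -> List[Tuple[str, str, float]]:
    """Test every unordered pair of loci; keep pairs with corrected P < alpha."""
    n_tests = comb(len(loci), 2)  # number of distinct pairs
    hits = []
    for a, b in combinations(loci, 2):
        p_corrected = bonferroni(pair_test(a, b), n_tests)
        if p_corrected < alpha:
            hits.append((a, b, p_corrected))
    return hits


# Toy usage with made-up P values: only the L1-L2 pair has a small raw P.
raw_p = {("L1", "L2"): 0.001}
print(exhaustive_two_locus_scan(["L1", "L2", "L3", "L4"],
                                lambda a, b: raw_p.get((a, b), 0.5)))

# The number of pairs grows quadratically, e.g. for 500,000 SNPs:
print(comb(500_000, 2))
```

The quadratic growth in the last line illustrates why the abstract calls computation "a major barrier" and why a parallel implementation matters at genome-wide scale.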