Although recent genome-wide studies have provided valuable insights into the genetic basis of human disease, they have explained relatively little of the heritability of most complex traits, and the variants identified through these studies have small effect sizes. This has led to the important and hotly debated issue of where the ‘missing heritability’ of complex diseases might be found. Here, seven leading geneticists offer their opinion about where this heritability is likely to lie, what this could tell us about the underlying genetic architecture of common diseases and how this could inform research strategies for uncovering genetic risk factors.
The field of genomics has entered a new era in which the ability to identify genetic variants that impact complex human traits and disease in an unbiased fashion using genome-wide approaches is widely accessible. To date, the workhorse of these efforts has been the genome-wide association study (GWAS), which has quickly moved from novel to routine, and has provided key insights into aspects of the underlying allelic architecture of complex traits. The main lesson learned from the early GWAS efforts is that though many disease-associated variants are often discovered, most have only a minor effect on disease, and in total explain only a small amount of the apparent heritability. Here we provide a brief overview of the genetic variation classes that may harbor the heritability missing from GWAS, and touch on approaches that will be leveraged in the coming years as genomics—and by extension medicine—becomes increasingly personalized.
Genetics; genomics; GWAS; liver; complex disease
The common genetic variants identified through genome-wide association studies explain only a small proportion of the genetic risk for complex diseases. The advancement of next-generation sequencing technologies has enabled the detection of rare variants that are expected to contribute significantly to the missing heritability. Some genetic association studies provide multiple correlated traits for analysis. Multiple trait analysis has the potential to improve the power to detect pleiotropic genetic variants that influence multiple traits. We propose a gene-level association test for multiple traits that accounts for correlation among the traits. Gene- or region-level testing for association involves both common and rare variants. Statistical tests for common variants may have limited power for individual rare variants because of their low frequency and multiple testing issues. To address these concerns, we use the weighted-sum pooling method to test the joint association of multiple rare and common variants within a gene. The proposed method is applied to the Genetic Analysis Workshop 17 (GAW17) simulated mini-exome data to analyze multiple traits. Because of the nature of the GAW17 simulation model, increased power was not observed for multiple-trait analysis compared to single-trait analysis. However, testing multiple traits did not result in a substantial loss of power. We conclude that this method would be useful for identifying pleiotropic genes.
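The weighted-sum pooling idea can be illustrated with a minimal sketch. It assumes a Madsen-Browning style weight, 1/sqrt(p(1-p)), with an add-one smoothed allele frequency p; the abstract does not specify the exact weights, and the function name `weighted_sum_scores` is hypothetical.

```python
import math

def weighted_sum_scores(genotypes):
    """Collapse the variants of one gene into a per-individual score.

    genotypes: list of rows, one per individual, each row holding
    minor-allele counts (0, 1 or 2) for every variant in the gene.
    Weighting scheme is an assumption, not taken from the abstract.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    weights = []
    for j in range(m):
        count = sum(row[j] for row in genotypes)
        p = (count + 1) / (2 * n + 2)          # smoothed allele frequency
        weights.append(1 / math.sqrt(p * (1 - p)))  # rarer => larger weight
    scores = [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
    return weights, scores
```

Because rare variants receive larger weights, a single rare allele contributes more to the gene-level score than a single common allele; the resulting score can then be tested against one or several traits.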
Genetically tractable model organisms from phages to mice have taught us invaluable lessons about fundamental biological processes and disease-causing mutations. Owing to technological and computational advances, human biology and the causes of human diseases have become accessible as never before. Progress in identifying genetic determinants for human diseases has been most remarkable for Mendelian traits. In contrast, identifying genetic determinants for complex diseases such as diabetes, cancer, and cardiovascular and neurological diseases has remained challenging, despite the fact that these diseases cluster in families. Hundreds of variants associated with complex diseases have been found in genome-wide association studies (GWAS), yet most of these variants explain only a modest amount of the observed heritability, a phenomenon known as “missing heritability.” The missing heritability has been attributed to many factors, mainly inadequacies in genotyping and phenotyping. We argue that lessons learned about complex traits in model organisms offer an alternative explanation for missing heritability in humans. In diverse model organisms, phenotypic robustness differs among individuals, and those with decreased robustness show increased penetrance of mutations and express previously cryptic genetic variation. We propose that phenotypic robustness also differs among humans and that individuals with lower robustness will be more responsive to genetic and environmental perturbations and hence susceptible to disease. Phenotypic robustness is a quantitative trait that can be accurately measured in model organisms, but not as yet in humans. We propose feasible approaches to measure robustness in large human populations, proof-of-principle experiments for robustness markers in model organisms, and a new GWAS design that takes differences in robustness into account.
Many common human diseases and complex traits are highly heritable and influenced by multiple genetic and environmental factors. Although genome-wide association studies (GWAS) have successfully identified many disease-associated variants, these genetic variants explain only a small proportion of the heritability of most complex diseases. Genetic interactions (gene-gene and gene-environment) substantially contribute to complex traits and diseases and could be one of the main sources of the missing heritability. This paper provides an overview of the available statistical methods and related computer software for identifying genetic interactions in animal and plant experimental crosses and human genetic association studies. The main discussion falls under the three broad issues in statistical analysis of genetic interactions: the definition, detection and interpretation of genetic interactions. Recently developed methods based on modern techniques for high-dimensional data are reviewed, including penalized likelihood approaches and hierarchical models; the relationships among these methods are also discussed. I conclude this review by highlighting some areas of future research.
Bayesian methods; Complex traits; Epistasis; Gene-environment interactions; Genetic association; High-dimensionality; Hierarchical models; Penalized likelihood; Quantitative trait loci
Recent genome-wide association studies (GWAS) have identified a number of novel genetic associations with complex human diseases. In spite of these successes, results from GWAS generally explain only a small proportion of disease heritability, an observation termed the ‘missing heritability problem’. Several sources for the missing heritability have been proposed, including the contribution of many common variants with small individual effect sizes, which cannot be reliably found using the standard GWAS approach. The goal of our study was to explore a complementary approach, which combines GWAS results with functional data in order to identify novel genetic associations with small effect sizes. To do so, we conducted a GWAS for lymphocyte count, a physiologic quantitative trait associated with asthma, in 462 Hutterites. In parallel, we performed a genome-wide gene expression study in lymphoblastoid cell lines from 96 Hutterites. We found significant support for genetic associations using the GWAS data when we considered variants near the 193 genes whose expression levels across individuals were most correlated with lymphocyte counts. Interestingly, these variants are also enriched with signatures of an association with asthma susceptibility, an observation we were able to replicate. The associated loci include genes previously implicated in asthma susceptibility as well as novel candidate genes enriched for functions related to T cell receptor signaling and adenosine triphosphate synthesis. Our results, therefore, establish a new set of asthma susceptibility candidate genes. More generally, our observations support the notion that many loci of small effects influence variation in lymphocyte count and asthma susceptibility.
Genome-wide association studies (GWAS) in humans have identified hundreds of single nucleotide polymorphisms associated with complex traits, yet for most traits studied, the sum total of all these identified variants fails to explain a significant portion of the heritable variation. Reasons for this “missing heritability” are thought to include the existence of rare causative variants not captured by current genotyping arrays, structural variants that go undetected by existing technology, insufficient power to identify multi-gene interactions, small sample sizes, and the influence of environmental and epigenetic effects. As genotyping technologies have evolved, it has become inexpensive and relatively straightforward to perform GWAS in mice. Mice offer a powerful tool for elucidating the genetic architecture of behavioral and physiological traits, and are complementary to human studies. Unlike F2 crosses of inbred strains, advanced intercross lines, heterogeneous stocks, outbred, and wild-caught mice have more rapid breakdown of linkage disequilibrium, which allows for increasingly high-resolution mapping. Because some of these populations are created using a small number of founder chromosomes, they are not expected to harbor rare alleles. We discuss the differences between these mouse populations and examine their potential to overcome some of the pitfalls that have plagued human GWAS.
GWAS; quantitative trait loci; complex traits; forward genetics; advanced intercross lines; heterogeneous stock; outbred mice; wild mice
Genome-wide association studies have been successful in identifying common variants for common complex traits in recent years. However, common variants have generally failed to explain substantial proportions of the trait heritabilities. Rare variants, structural variations, and gene-gene and gene-environment interactions, among others, have been suggested as potential sources of the so-called missing heritability. With the advent of exome-wide and whole-genome next-generation sequencing technologies, finding rare variants in functionally important sites (e.g., protein-coding regions) becomes feasible. We investigate the role of linkage information to select families enriched for rare variants using the simulated Genetic Analysis Workshop 17 data. In each replicate of simulated phenotypes Q1 and Q2 on 697 subjects in 8 extended pedigrees, we select one pedigree with the largest family-specific LOD score. Across all 200 replications, we compare the probability that rare causal alleles will be carried in the selected pedigree versus a randomly chosen pedigree. One example of successful enrichment was exhibited for gene VEGFC. The causal variant had minor allele frequency of 0.0717% in the simulated unrelated individuals and explained about 0.1% of the phenotypic variance. However, it explained 7.9% of the phenotypic variance in the eight simulated pedigrees and 23.8% in the family that carried the minor allele. The carrier’s family was selected in all 200 replications. Thus our results show that family-specific linkage information is useful for selecting families for sequencing, thus ensuring that rare functional variants are segregating in the sequencing samples.
Genome-wide association studies (GWAS) have identified many common variants associated with complex traits in human populations. Thus far, most reported variants have relatively small effects and explain only a small proportion of phenotypic variance, leading to the issues of ‘missing’ heritability and its explanation. Using height as an example, we examined two possible sources of missing heritability: first, variants with smaller effects whose associations with height failed to reach genome-wide significance and second, allelic heterogeneity due to the effects of multiple variants at a single locus. Using a novel analytical approach we examined allelic heterogeneity of height-associated loci selected from SNPs of different significance levels based on the summary data of the GIANT (stage 1) studies. In a sample of 1,304 individuals collected from an island population of the Adriatic coast of Croatia, we assessed the extent of height variance explained by incorporating the effects of less significant height loci and multiple effective SNPs at the same loci. Our results indicate that approximately half of the 118 loci that achieved stringent genome-wide significance (p-value < 5×10⁻⁸) showed evidence of allelic heterogeneity. Additionally, including less significant loci (i.e., p-value < 5×10⁻⁴) and accounting for effects of allelic heterogeneity substantially improved the variance explained in height.
The last decade of human genetic research witnessed the completion of hundreds of genome-wide association studies (GWASs). However, the genetic variants discovered through these efforts account for only a small proportion of the heritability of complex traits. One explanation for the missing heritability is that the common analysis approach, assessing the effect of each single-nucleotide polymorphism (SNP) individually, is not well suited to the detection of small effects of multiple SNPs. Gene set analysis (GSA) is one of several approaches that may contribute to the discovery of additional genetic risk factors for complex traits. Complex phenotypes are thought to be controlled by networks of interacting biochemical and physiological pathways influenced by the products of sets of genes. By assessing the overall evidence of association of a phenotype with all measured variation in a set of genes, GSA may identify functionally relevant sets of genes corresponding to relevant biomolecular pathways, which will enable more focused studies of genetic risk factors. This approach may thus contribute to the discovery of genetic variants responsible for some of the missing heritability. With the increased use of these approaches for the secondary analysis of data from GWAS, it is important to understand the different GSA methods and their strengths and weaknesses, and consider challenges inherent in these types of analyses. This paper provides an overview of GSA, highlighting the key challenges, potential solutions, and directions for ongoing research.
pathway analysis; multilocus; complex traits; genetic association studies
Genome-wide association studies (GWAS) have provided valuable insights into the genetic basis of complex traits. However, they have explained relatively little trait heritability. Recently, we proposed a new analytical approach called regional heritability mapping (RHM) that captures more of the missing genetic variation. This method is applicable both to related and unrelated populations. Here, we demonstrate the power of RHM in comparison with single-SNP GWAS and gene-based association approaches under a wide range of scenarios with variable numbers of quantitative trait loci (QTL) with common and rare causal variants in a narrow genomic region. Simulations based on real genotype data were performed to assess power to capture QTL variance, and we demonstrate that RHM has greater power to detect rare variants and/or multiple alleles in a region than other approaches. In addition, we show that RHM can capture the QTL variance more accurately when it is caused by multiple independent effects and/or rare variants. To evaluate the effectiveness of this method in real data analysis, we applied RHM to three biometrical eye traits for which single-SNP GWAS have been published or performed, and detected some additional loci that were not detected by other GWAS methods. RHM has the potential to explain some of the missing heritability by capturing variance caused by QTL with low minor allele frequency (MAF) and by multiple independent QTL in a region, which is not captured by other GWAS methods. RHM analyses can be implemented using the software REACTA (http://www.epcc.ed.ac.uk/projects-portfolio/reacta).
common and rare variants; GWAS; regional heritability mapping; multiple independent effects; missing heritability
The variance explained by genetic variants as identified in (genome-wide) genetic association studies is typically small compared to family-based heritability estimates. Explanations of this ‘missing heritability’ have been mainly genetic, such as genetic heterogeneity and complex (epi-)genetic mechanisms.
We used comprehensive simulation studies to show that three phenotypic measurement issues also provide viable explanations of the missing heritability: phenotypic complexity, measurement bias, and phenotypic resolution. We identify the circumstances in which the use of phenotypic sum-scores and the presence of measurement bias lower the power to detect genetic variants. In addition, we show how the differential resolution of psychometric instruments (i.e., whether the instrument includes items that resolve individual differences in the normal range or in the clinical range of a phenotype) affects the power to detect genetic variants.
We conclude that careful phenotypic data modelling can improve the genetic signal, and thus the statistical power to identify genetic variants by 20–99%.
Genome-wide association studies (GWAS) have been successful in detecting common genetic variants underlying common traits and diseases. Despite the GWAS success stories, the percent trait variance explained by GWAS signals has been, at best, modest, leaving the so-called “missing heritability”. Also, the predictive power of common variants identified by GWAS has not been encouraging. Given these observations, along with the fact that the effects of rare variants are often, by design, unaccounted for by GWAS, and given the availability of sequence data, there is a growing need for robust analytic approaches to evaluate the contribution of rare variants to common complex diseases. Here we propose a new method that enables the simultaneous analysis of rare and common variants in disease etiology. We refer to this method as SCARVA (simultaneous common and rare variants analysis). SCARVA is simple to use and is efficient. We used SCARVA to analyze two independent real datasets to identify rare and common variants underlying variation in obesity among participants in the Africa America Diabetes Mellitus (AADM) study and plasma triglyceride levels in the Dallas Heart Study (DHS). We found common and rare variants associated with both traits, consistent with published results.
association; common variant; haplotype; rare variant
It has been proposed that single nucleotide polymorphisms (SNPs) discovered by genome-wide association studies (GWAS) account for only a small fraction of the genetic variation of complex traits in human populations. The remaining unexplained variance, or missing heritability, is thought to be due to marginal effects of many loci with small effects and has eluded attempts to identify its sources. Combining different studies appears to resolve this problem in part. However, neither individual GWAS nor meta-analytic combinations thereof are helpful for disclosing which genetic variants contribute to explain a particular phenotype. Here, we propose that most of the missing heritability is latent in the GWAS data, which conceals intermediate phenotypes. To uncover such latent information, we propose the PGMRA server that introduces phenomics (the full set of phenotype features of an individual) to identify SNP-set structures in a broader sense, i.e. causally cohesive genotype–phenotype relations. These relations are agnostically identified (without considering disease status of the subjects) and organized in an interpretable fashion. Then, by incorporating a posteriori the subject status within each relation, we can establish the risk surface of a disease in an unbiased mode. This approach complements, rather than replaces, current analysis methods. The server is publicly available at http://phop.ugr.es/fenogeno.
Coronary atherosclerosis is a complex heritable trait with an enigmatic genetic etiology. Genome-wide association studies (GWAS) have successfully led to identification of over 100 different loci for susceptibility to coronary atherosclerosis. Most identified single nucleotide polymorphisms (SNPs) and genes have not been previously implicated in the pathogenesis of atherosclerosis and hence have modest biological plausibility. The novel discoveries, however, might provide the opportunity for identification of new pathways and consequently novel preventive and therapeutic targets. A notable outcome of GWAS is the relatively modest effect sizes of the associated SNPs. Collectively, the identified SNPs account for a relatively small fraction of heritability of coronary atherosclerosis, which raises the question of “missing heritability”. Because GWAS test the common disease/common variant hypothesis, a plausible explanation might be the presence of uncommon and rare variants in the genome that are untested in GWAS but that might exert large effect sizes on the risk of atherosclerosis. The latter, however, remains an empirical question pending validation through experimentation. Alternative mechanisms, such as transgenerational epigenetics including microRNAs, might in part account for the heritability of coronary atherosclerosis. Collectively, the recent findings are indicative of the etiological complexity of coronary atherosclerosis. Hence, it is expected that the genetic etiology of coronary atherosclerosis will remain enigmatic in the foreseeable future.
Atherosclerosis; Coronary artery disease; Genetics; GWAS; Polymorphism
Genome-wide association studies (GWAS) have identified many common polymorphisms associated with complex traits. However, these associated common variants explain only a small fraction of the phenotypic variances, leaving a substantial portion of genetic heritability unexplained. As a result, searches for "missing" heritability are drawing increasing attention, particularly for rare variant studies that often require a large sample size and, thus, extensive sequencing effort. Although the development of next generation sequencing (NGS) technologies has made it possible to sequence a large number of reads economically and efficiently, it is still often cost prohibitive to sequence thousands of individuals that are generally required for association studies. A more efficient and cost-effective design would involve pooling the genetic materials of multiple individuals together and then sequencing the pools, instead of the individuals. This pooled sequencing approach has improved the plausibility of association studies for rare variants, while, at the same time, posed a great challenge to the pooled sequencing data analysis, essentially because individual sample identity is lost, and NGS sequencing errors could be hard to distinguish from low frequency alleles.
A unified approach for estimating minor allele frequency, SNP calling and association studies based on pooled sequencing data using an expectation maximization (EM) algorithm is developed in this paper. This approach makes it possible to study the effects of minor allele frequency, sequencing error rate, number of pools, number of individuals in each pool, and the sequencing depth on the estimation accuracy of minor allele frequencies. We show that the naive method of estimating minor allele frequencies by taking the fraction of observed minor alleles can be significantly biased, especially for rare variants. In contrast, our EM approach can give an unbiased estimate of the minor allele frequency under all scenarios studied in this paper. A SNP calling approach, EM-SNP, for pooled sequencing data based on the EM algorithm is then developed and compared with another recent SNP calling method, SNVer. We show that EM-SNP outperforms SNVer in terms of the fraction of db-SNPs among the called SNPs, as well as transition/transversion (Ti/Tv) ratio. Finally, the EM approach is used to study the association between variants and type I diabetes.
The EM-based approach for the analysis of pooled sequencing data can accurately estimate minor allele frequencies, call SNPs, and find associations between variants and complex traits. This approach is especially useful for studies involving rare variants.
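To make the EM idea for pooled sequencing concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the abstract: every pool contains the same number of chromosomes, the sequencing error rate is known and symmetric, and the number of minor alleles in a pool is binomial. The function name `em_pooled_maf` and its parameters are hypothetical, and the published EM-SNP procedure may differ in its details.

```python
from math import comb  # requires Python 3.8+

def em_pooled_maf(pools, n_chrom, error=0.01, iters=200, tol=1e-10):
    """Estimate minor allele frequency from pooled read counts.

    pools:   list of (minor_reads, depth) tuples, one per pool
    n_chrom: number of chromosomes in each pool (2 x individuals)
    error:   assumed per-read error rate (must be > 0 in this sketch)
    """
    p = 0.05  # arbitrary starting value
    for _ in range(iters):
        expected_total = 0.0
        for x, d in pools:
            # E-step: posterior over k, the unknown minor-allele count in the pool
            post = []
            for k in range(n_chrom + 1):
                f = k / n_chrom
                q = f * (1 - error) + (1 - f) * error  # P(read shows minor allele)
                prior = comb(n_chrom, k) * p**k * (1 - p)**(n_chrom - k)
                post.append(prior * q**x * (1 - q)**(d - x))
            z = sum(post)
            expected_total += sum(k * w for k, w in enumerate(post)) / z
        # M-step: update the frequency from the expected allele counts
        new_p = expected_total / (len(pools) * n_chrom)
        converged = abs(new_p - p) < tol
        p = new_p
        if converged:
            break
    return p
```

When the few minor-allele reads in each pool are better explained by sequencing error than by a true carrier, the estimate shrinks well below the naive read fraction, which is the kind of bias correction the abstract describes. For realistic depths the likelihood terms should be computed in log space to avoid underflow.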
Although genome-wide association studies have uncovered variants associated with more than 150 traits, the percentage of phenotypic variation explained by these associations remains small. This has led to the search for the dark matter that explains this missing genetic component of heritability. One potential explanation for dark matter is rare variants, and several statistics have been devised to detect associations resulting from aggregations of rare variants in relatively short regions of interest, such as candidate genes. In this paper we investigate the feasibility of extending this approach in an agnostic way, in which we consider all variants within a much broader region of interest, such as an entire chromosome or even the entire exome. Our method searches for subsets of variant sites using either Markov chain Monte Carlo or genetic algorithms. The analysis was performed with knowledge of the Genetic Analysis Workshop 17 answers.
Single nucleotide polymorphisms (SNPs) discovered by genome-wide association studies (GWASs) account for only a small fraction of the genetic variation of complex traits in human populations. Where is the remaining heritability? We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis, and validated the estimation method by simulations based upon the observed genotype data. We show that 45% of variance can be explained by considering all SNPs simultaneously. Thus, most of the heritability is not missing but has not previously been detected because the individual effects are too small to pass stringent significance tests. We provide evidence that the remaining heritability is due to incomplete linkage disequilibrium (LD) between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency (MAF) than the SNPs explored to date.
Several lines of evidence suggest that genome-wide association studies (GWAS) have the potential to explain more of the “missing heritability” of common complex phenotypes. However, reliable methods to identify a larger proportion of single nucleotide polymorphisms (SNPs) that impact disease risk are currently lacking. Here, we use a genetic pleiotropy-informed conditional false discovery rate (FDR) method on GWAS summary statistics data to identify new loci associated with schizophrenia (SCZ) and bipolar disorders (BD), two highly heritable disorders with significant missing heritability. Epidemiological and clinical evidence suggest similar disease characteristics and overlapping genes between SCZ and BD. Here, we computed conditional Q–Q curves of data from the Psychiatric Genome Consortium (SCZ; n = 9,379 cases and n = 7,736 controls; BD: n = 6,990 cases and n = 4,820 controls) to show enrichment of SNPs associated with SCZ as a function of association with BD and vice versa with a corresponding reduction in FDR. Applying the conditional FDR method, we identified 58 loci associated with SCZ and 35 loci associated with BD below the conditional FDR level of 0.05. Of these, 14 loci were associated with both SCZ and BD (conjunction FDR). Together, these findings show the feasibility of genetic pleiotropy-informed methods to improve gene discovery in SCZ and BD and indicate overlapping genetic mechanisms between these two disorders.
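The conditional FDR idea can be sketched with a simple empirical estimator. This is an illustration only: it assumes the commonly used form cFDR(p1 | p2) = p1 / F(p1 | p2), where F is the empirical distribution of primary-trait p-values among SNPs whose conditional-trait p-value is at most p2. The function name `conditional_fdr` is hypothetical, and the published analysis includes further steps that are omitted here.

```python
def conditional_fdr(p_primary, p_conditional, i):
    """Empirical conditional FDR for SNP i.

    p_primary:     p-values for the trait of interest (e.g. SCZ)
    p_conditional: p-values for the related trait (e.g. BD), same SNP order
    """
    p1, p2 = p_primary[i], p_conditional[i]
    # SNPs at least as significant as SNP i in the conditional trait
    n_cond = sum(1 for b in p_conditional if b <= p2)
    # ... that are also at least as significant in the primary trait
    n_both = sum(1 for a, b in zip(p_primary, p_conditional)
                 if a <= p1 and b <= p2)
    # n_both >= 1 because SNP i always counts itself
    return min(1.0, p1 * n_cond / n_both)
```

If SNPs that are significant in the second trait are enriched for small p-values in the first, `n_both / n_cond` exceeds `p1` and the estimated FDR falls below the unconditional value.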
Genome-wide association studies (GWAS) have thus far identified only a small fraction of the heritability of common complex disorders, such as severe mental disorders. We used a conditional false discovery rate approach for analysis of GWAS data, exploiting “genetic pleiotropy” to increase discovery of common gene variants associated with schizophrenia and bipolar disorders. Leveraging the increased power from combining GWAS of two associated phenotypes, we found a striking overlap in polygenic signals, allowing for the discovery of several new common gene variants associated with bipolar disorder and schizophrenia that were not identified in the original analysis using traditional GWAS methods. Some of the gene variants have been identified in other studies with large targeted replication samples, validating the present findings. Our pleiotropy-informed method may be of significant importance for detecting effects that are below the traditional genome-wide significance level in GWAS, particularly in highly polygenic, complex phenotypes, such as schizophrenia and bipolar disorder, where most of the genetic signal is missing (i.e., “missing heritability”). The findings also offer insights into mechanistic relationships between bipolar disorder and schizophrenia pathogenesis.
Complex diseases are often highly heritable. However, for many complex traits only a small proportion of the heritability can be explained by observed genetic variants in traditional genome-wide association (GWA) studies. Moreover, for some of those traits few significant SNPs have been identified. Single SNP association methods test for association at a single SNP, ignoring the effect of other SNPs. We show using a simple multi-locus odds model of complex disease that moderate to large effect sizes of causal variants may be estimated as relatively small effect sizes in single SNP association testing. This underestimation effect is most severe for diseases influenced by numerous risk variants. We relate the underestimation effect to the concept of non-collapsibility found in the statistics literature. As described, continuous phenotypes generated with linear genetic models are not affected by this underestimation effect. Since many GWA studies apply single SNP analysis to dichotomous phenotypes, previously reported results potentially underestimate true effect sizes, thereby impeding identification of true effect SNPs. Therefore, when a multi-locus model of disease risk is assumed, a multi SNP analysis may be more appropriate.
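The non-collapsibility effect can be shown numerically with a small sketch. It assumes a logistic model with two independent binary risk factors (carrier status at locus A and at locus B); the model, the parameter values used below and the function name `marginal_odds_ratio` are illustrative assumptions, not the authors' simulation.

```python
import math

def marginal_odds_ratio(intercept, beta_a, beta_b, freq_b):
    """Odds ratio for locus A when locus B is ignored.

    Disease risk follows logit(P) = intercept + beta_a*a + beta_b*b,
    with a, b in {0, 1} and P(b = 1) = freq_b independent of a.
    """
    def risk(a):
        total = 0.0
        for b, weight in ((0, 1 - freq_b), (1, freq_b)):
            eta = intercept + beta_a * a + beta_b * b
            total += weight / (1 + math.exp(-eta))  # average risk over b
        return total

    def odds(r):
        return r / (1 - r)

    return odds(risk(1)) / odds(risk(0))
```

With a conditional odds ratio of 2 for locus A (`beta_a = math.log(2)`) and a strong second locus (`beta_b = 3`, `freq_b = 0.5`, `intercept = -2`), the single-locus odds ratio comes out clearly below 2, even though A and B are independent and there is no confounding. Setting `beta_b = 0` recovers the conditional value.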
Human genome resequencing technologies are becoming ever more affordable and provide a valuable source of data about rare genetic variants in the human genome. Such rare variation may play an important role in explaining the missing heritability of complex human traits. We implement an existing method for analyzing rare variants by testing for association with the mutational load across genes. In this study, we make use of simulated data from the Genetic Analysis Workshop 17 to assess the power of this approach to detect association with simulated quantitative and dichotomous phenotypes and to evaluate the impact of missing genotypes on the power of the analysis. According to our results, the mutational load based rare variant analysis method is relatively robust to call-rate and is adequately powered for genome-wide association analysis.
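A mutational-load analysis can be sketched as follows. The sketch assumes that load is the count of minor alleles at variants below a frequency threshold and that association with a quantitative phenotype is summarized by a least-squares slope; the threshold, the function names and the handling of missing genotypes (coded as `None` and skipped) are illustrative assumptions rather than the method evaluated in the abstract.

```python
def mutational_load(genotypes, maf_threshold=0.01):
    """Per-individual count of rare minor alleles within one gene.

    genotypes: rows of allele counts (0, 1, 2) or None for a missing call.
    """
    m = len(genotypes[0])
    rare = []
    for j in range(m):
        calls = [row[j] for row in genotypes if row[j] is not None]
        if not calls:
            continue  # variant has no called genotypes
        freq = sum(calls) / (2 * len(calls))
        if min(freq, 1 - freq) < maf_threshold:
            rare.append(j)
    # Missing calls contribute nothing to the load
    return [sum(row[j] for j in rare if row[j] is not None) for row in genotypes]

def regression_slope(x, y):
    """Least-squares slope of y on x (x must not be constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx
```

The slope of the phenotype on the load score gives a single gene-level effect estimate, which is what allows many individually uninformative rare variants to be tested together.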
Genome-wide association studies have thus far failed to explain the observed heritability of complex human diseases. This is referred to as the “missing heritability” problem. However, these analyses have usually neglected to consider a role for epigenetic variation, which has been associated with many human diseases. We extend models of epigenetic inheritance to investigate whether environment-sensitive epigenetic modifications of DNA might explain observed patterns of familial aggregation. We find that variation in epigenetic state and environmental state can result in highly heritable phenotypes through a combination of epigenetic and environmental inheritance. These two inheritance processes together can produce familial covariances significantly higher than those predicted by models of purely epigenetic inheritance and similar to those expected from genetic effects. The results suggest that epigenetic variation, inherited both directly and through shared environmental effects, may make a key contribution to the missing heritability.
Genome-wide association studies (GWAS) have identified around 60 common variants associated with multiple sclerosis (MS), but these loci explain only a fraction of the heritability of MS. Some missing heritability may be caused by rare variants, which have been suggested to play an important role in the aetiology of complex diseases such as MS. However, current genetic and statistical methods for detecting rare variants are expensive and time-consuming. ‘Population-based linkage analysis’ (PBLA), or so-called identity-by-descent (IBD) mapping, is a novel way to detect rare variants in extant GWAS datasets. We employed BEAGLE fastIBD to search for rare MS variants utilising IBD mapping in a large GWAS dataset of 3,543 cases and 5,898 controls. We identified a genome-wide significant linkage signal on chromosome 19 (LOD = 4.65; p = 1.9×10⁻⁶). Network analysis of cases and controls sharing haplotypes on chromosome 19 further strengthened the association, as there were more large networks of cases sharing haplotypes than of controls. This linkage region includes a cluster of zinc finger genes of unknown function. Analysis of genome-wide transcriptome data suggests that genes in this zinc finger cluster may be involved in very early developmental regulation of the CNS. Our study also indicates that BEAGLE fastIBD allows identification of rare variants in a large population of unrelated individuals with moderate computational intensity. Even with the development of whole-genome sequencing, IBD mapping may still be a promising way to narrow down the region of interest for sequencing priority.
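The abstract does not state how the p-value was derived from the LOD score. One common convention, assumed here purely for illustration, treats 2·ln(10)·LOD as a chi-square statistic with one degree of freedom and halves the tail probability because linkage is a one-sided alternative. A minimal sketch using only the standard library:

```python
import math


def lod_to_pvalue(lod):
    """Approximate pointwise p-value for a LOD score.

    Assumption (not taken from the abstract): 2*ln(10)*LOD follows a 50:50
    mixture of a point mass at zero and a chi-square with 1 degree of freedom,
    so the p-value is half the chi-square(1) upper tail.
    """
    chi2 = 2.0 * math.log(10.0) * lod
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    upper_tail = math.erfc(math.sqrt(chi2 / 2.0))
    return 0.5 * upper_tail


print(lod_to_pvalue(4.65))
```

Under this convention, a LOD of 4.65 gives a p-value on the order of 2×10⁻⁶, in line with the figure reported for the chromosome 19 signal.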
Hundreds of genome-wide association studies have been performed in recent years to try to identify common variants that associate with complex disease. These have met with varying success. Some of the strongest effects of common variants have been found in late-onset diseases and in drug response. The major histocompatibility complex has also shown very strong association with a variety of disorders. Although there have been some notable success stories in neuropsychiatric genetics, on the whole, common variation has explained little of the high heritability of these traits. In contrast, early studies of rare copy number variants have led rapidly to a number of genes and loci that strongly associate with neuropsychiatric disorders. It is likely that the use of whole-genome sequencing to extend the study of rare variation in neuropsychiatry will greatly advance our understanding of neuropsychiatric genetics.
genome-wide association study; rare variant; neuropsychiatric; schizophrenia; sequencing
Osteoporosis is a complex disorder and commonly leads to fractures in elderly persons. Genome-wide association studies (GWAS) have become an unbiased approach to identify variations in the genome that potentially affect health. However, the genetic variants identified so far explain only a small proportion of the heritability for complex traits. Due to the modest genetic effect size and inadequate power, true association signals may not be revealed based on a stringent genome-wide significance threshold. Here, we take advantage of SNP and transcript arrays and integrate GWAS and expression signature profiling relevant to the skeletal system in cellular and animal models to prioritize the discovery of novel candidate genes for osteoporosis-related traits, including bone mineral density (BMD) at the lumbar spine (LS) and femoral neck (FN), as well as geometric indices of the hip (femoral neck-shaft angle, NSA; femoral neck length, NL; and narrow-neck width, NW). A two-stage meta-analysis of GWAS from 7,633 Caucasian women and 3,657 men revealed three novel loci associated with osteoporosis-related traits, including chromosome 1p13.2 (RAP1A, p = 3.6×10⁻⁸), 2q11.2 (TBC1D8), and 18q11.2 (OSBPL1A), and confirmed a previously reported region near the TNFRSF11B/OPG gene. We also prioritized 16 suggestive genome-wide significant candidate genes based on their potential involvement in skeletal metabolism. Among them, 3 candidate genes were associated with BMD in women. Notably, 2 of these 3 genes (GPR177, p = 2.6×10⁻¹³; SOX6, p = 6.4×10⁻¹⁰) have been successfully replicated in a large-scale meta-analysis of BMD, whereas none of the non-prioritized candidates associated with BMD were. Our results support the concept of our prioritization strategy.
In the absence of direct biological support for identified genes, we highlighted the efficiency of subsequent functional characterization using publicly available expression profiling relevant to the skeletal system in cellular or whole animal models to prioritize candidate genes for further functional validation.
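The abstract does not say how the discovery and replication stages were combined. As an illustration only, the sketch below assumes a fixed-effect inverse-variance meta-analysis of per-study effect estimates and compares the combined p-value with the conventional genome-wide threshold of 5×10⁻⁸. The effect sizes and standard errors are made-up numbers, not results from the study.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def inverse_variance_meta(betas, std_errors):
    """Fixed-effect inverse-variance meta-analysis of one SNP across studies.

    Returns the combined effect, its standard error, and a two-sided p-value
    from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided normal tail: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p_value


# Hypothetical stage-1 (discovery) and stage-2 (replication) estimates for one SNP.
beta, se, p_value = inverse_variance_meta([0.10, 0.12], [0.02, 0.03])
print(round(beta, 4), round(se, 4), p_value < GENOME_WIDE_ALPHA)
```

Weighting by inverse variance lets the more precise study dominate the combined estimate, which is why a signal that is only suggestive in each stage can cross the genome-wide threshold once the stages are pooled.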
BMD and hip geometry are two major predictors of osteoporotic fractures, the most severe consequence of osteoporosis in elderly persons. We performed sex-specific genome-wide association studies (GWAS) for BMD at the lumbar spine and femoral neck skeletal sites as well as hip geometric indices (NSA, NL, and NW) in the Framingham Osteoporosis Study and then replicated the top findings in two independent studies. Three novel loci were significant: in women, chromosome 1p13.2 (RAP1A) for NW; in men, 2q11.2 (TBC1D8) for NSA and 18q11.2 (OSBPL1A) for NW. We confirmed a previously reported region on 8q24.12 (TNFRSF11B/OPG) for lumbar spine BMD in women. In addition, we integrated GWAS signals with eQTL in several tissues and publicly available expression signature profiling in cellular and whole-animal models, and prioritized 16 candidate genes/loci based on their potential involvement in skeletal metabolism. Among three prioritized loci (GPR177, SOX6, and CASR genes) associated with BMD in women, GPR177 and SOX6 were later successfully replicated in a large-scale meta-analysis, whereas none of the non-prioritized candidates associated with BMD were. Our results support the concept of using expression profiling to support the candidacy of suggestive GWAS signals that may contain important genes of interest.