Genome-wide association studies (GWAS) have been used successfully in detecting associations between common genetic variants and complex diseases. However, common SNPs detected by current GWAS only explain a small proportion of heritable variability. With the development of next-generation sequencing technologies, researchers find more and more evidence to support the role played by rare variants in heritable variability. However, rare and common variants are often studied separately. The objective of this paper is to develop a robust strategy to analyze association between complex traits and genetic regions using both common and rare variants.
We propose a weighted selective collapsing strategy for both candidate gene studies and genome-wide association scans. The strategy considers genetic information from both common and rare variants, selectively collapses all variants in a given region by a forward selection procedure, and uses an adaptive weight to favor more likely causal rare variants. Under this strategy, two tests are proposed. One test, denoted by BwSC, is sensitive to the directions of genetic effects, and it separates the deleterious and protective effects into two components. The other, denoted by BwSCd, is robust to the directions of genetic effects, and it considers the difference of the two components. In our simulation studies, BwSC achieves a higher power when the causal variants have the same genetic effect, while BwSCd is as powerful as several existing tests when a mixed genetic effect exists. Both of the proposed tests work well with and without the existence of genetic effects from common variants.
Two tests using a weighted selective collapsing strategy provide potentially powerful methods for association studies of sequencing data. The tests have a higher power when both common and rare variants contribute to the heritable variability and the effect of common variants is not strong enough to be detected by traditional methods. Our simulation studies have demonstrated a substantially higher power for both tests in all scenarios regardless of whether the common SNPs are associated with the trait or not.
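The idea of selectively collapsing variants by forward selection can be illustrated with a minimal sketch. This is not the BwSC or BwSCd statistic: the scoring rule (the gap in collapsed carrier rate between cases and controls), the function names, and the toy data are all assumptions made for illustration, and the adaptive weighting is omitted.

```python
def carrier_rate_gap(genotypes, status, selected):
    """Collapsed carrier rate in cases minus that in controls.

    An individual is a carrier if it has a minor allele at any selected variant.
    """
    def rate(group):
        rows = [g for g, s in zip(genotypes, status) if s == group]
        carriers = sum(1 for g in rows if any(g[j] > 0 for j in selected))
        return carriers / len(rows)
    return rate(1) - rate(0)


def forward_select(genotypes, status):
    """Greedily add the variant that most increases the case-control gap."""
    n_variants = len(genotypes[0])
    selected, best = [], 0.0
    while True:
        candidates = [j for j in range(n_variants) if j not in selected]
        if not candidates:
            break
        score, j = max(
            (carrier_rate_gap(genotypes, status, selected + [j]), j)
            for j in candidates
        )
        if score <= best:  # stop when no candidate improves the gap
            break
        selected.append(j)
        best = score
    return selected, best


# Toy data (hypothetical): rows are individuals, columns are variants (0/1/2).
genotypes = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0],  # cases
    [0, 0, 1], [0, 0, 0], [0, 0, 0], [0, 0, 1],  # controls
]
status = [1, 1, 1, 1, 0, 0, 0, 0]
selected, gap = forward_select(genotypes, status)
```

Because this sketch uses a signed gap, it only picks up variants enriched in cases; a direction-robust variant would have to track deleterious and protective components separately, as the abstract describes.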
About forty percent of the genetic variance of age-related macular degeneration can be explained by common variation at five common single nucleotide polymorphisms. We evaluated the degree to which these known variants explain the clustering of age-related macular degeneration in a group of densely affected families. We sought to determine if the actual number of risk alleles at the five variants in densely affected families matched the expected number. Using data from 322 families with age-related macular degeneration, we employed a simulation strategy to generate comparison groups of families and determined whether their genetic profile at the known age-related macular degeneration risk loci differed from the observed genetic profile, given the density of disease observed. Overall, the genotypic loads for the five single nucleotide polymorphisms in the families did not deviate significantly from the genotypic loads predicted by the simulation. However, for a subset of densely affected families, the mean genotypic load in the families was significantly lower than the expected load determined from the simulation. Given that these densely affected families may harbor rare, more penetrant variants for age-related macular degeneration, linkage analyses and resequencing targeting these families may be an effective approach to finding additional implicated genes.
Age-related macular degeneration; complex trait; simulation; single nucleotide polymorphisms; liability threshold model
About 40% of the genetic variance of age-related macular degeneration (AMD) can be explained by common variation at five common single-nucleotide polymorphisms (SNPs). We evaluated the degree to which these known variants explain the clustering of AMD in a group of densely affected families. We sought to determine whether the actual number of risk alleles at the five variants in densely affected families matched the expected number. Using data from 322 families with AMD, we used a simulation strategy to generate comparison groups of families and determined whether their genetic profile at the known AMD risk loci differed from the observed genetic profile, given the density of disease observed. Overall, the genotypic loads for the five SNPs in the families did not deviate significantly from the genotypic loads predicted by the simulation. However, for a subset of densely affected families, the mean genotypic load in the families was significantly lower than the expected load determined from the simulation. Given that these densely affected families may harbor rare, more penetrant variants for AMD, linkage analyses and resequencing targeting these families may be an effective approach to finding additional implicated genes.
AMD; complex trait; simulation; SNPs; liability threshold model
Risk prediction that capitalizes on emerging genetic findings holds great promise for improving public health and clinical care. However, recent risk prediction research has shown that predictive tests formed on existing common genetic loci, including those from genome-wide association studies, have lacked sufficient accuracy for clinical use. Because most rare variants on the genome have not yet been studied for their role in risk prediction, future disease prediction discoveries should shift toward a more comprehensive risk prediction strategy that takes into account both common and rare variants. We are proposing a collapsing receiver operating characteristic (CROC) approach for risk prediction research on both common and rare variants. The new approach is an extension of a previously developed forward ROC (FROC) approach, with additional procedures for handling rare variants. The approach was evaluated through the use of 533 single-nucleotide polymorphisms (SNPs) in 37 candidate genes from the Genetic Analysis Workshop 17 mini-exome data set. We found that a prediction model built on all SNPs gained more accuracy (AUC = 0.605) than one built on common variants alone (AUC = 0.585). We further evaluated the performance of two approaches by gradually reducing the number of common variants in the analysis. We found that the CROC method attained more accuracy than the FROC method when the number of common variants in the data decreased. In an extreme scenario, when there are only rare variants in the data, the CROC reached an AUC value of 0.603, whereas the FROC had an AUC value of 0.524.
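The AUC values quoted above can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. A minimal sketch of that calculation follows; the function name and the toy scores are hypothetical, and this is not the CROC or FROC procedure itself.

```python
def auc(case_scores, control_scores):
    """Fraction of case-control pairs in which the case scores higher.

    Ties count as half a win, which matches the Mann-Whitney form of the AUC.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy predicted risks (hypothetical values, not from the GAW17 data).
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

An AUC of 0.5 corresponds to a model that ranks cases and controls no better than chance, which is why values such as 0.524 indicate very limited discrimination.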
Although recent genome-wide studies have provided valuable insights into the genetic basis of human disease, they have explained relatively little of the heritability of most complex traits, and the variants identified through these studies have small effect sizes. This has led to the important and hotly debated issue of where the ‘missing heritability’ of complex diseases might be found. Here, seven leading geneticists offer their opinion about where this heritability is likely to lie, what this could tell us about the underlying genetic architecture of common diseases and how this could inform research strategies for uncovering genetic risk factors.
Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, ‘missing’ heritability can be explained. Here we examine potential sources of missing heritability and propose research strategies, including and extending beyond current genome-wide association approaches, to illuminate the genetics of complex diseases and enhance its potential to enable effective disease prevention or treatment.
Following years of linear gains in the genetic dissection of human disease, we are now in a period of exponential discovery. This is particularly apparent for complex disease. Genome-wide association studies have provided myriad associations between common variability and disease, and shown that common genetic variability is unlikely to explain the entire genetic predisposition to disease. Here, we detail how one can expand on this success and systematically identify genetic risks that lead or predispose to disease using next generation sequencing. Geneticists have had for many years a protocol to identify Mendelian disease. Now we have available a similar set of tools for the identification of rare moderate risk loci and common low risk variants. While undoubtedly major challenges remain, particularly with data handling and the functional classification of variants, we suggest that these will be largely practical and not conceptual.
Molecular marker information is a common source to draw inferences about the relationship between genetic and phenotypic variation. Genetic effects are often modelled as additively acting marker allele effects. The true mode of biological action can, of course, be different from this plain assumption. One possibility to better understand the genetic architecture of complex traits is to include intra-locus (dominance) and inter-locus (epistasis) interaction of alleles as well as the additive genetic effects when fitting a model to a trait. Several Bayesian MCMC approaches exist for the genome-wide estimation of genetic effects with high accuracy of genetic value prediction. Including pairwise interaction for thousands of loci would probably go beyond the scope of such a sampling algorithm because then millions of effects are to be estimated simultaneously leading to months of computation time. Alternative solving strategies are required when epistasis is studied.
We extended a fast Bayesian method (fBayesB), which was previously proposed for a purely additive model, to include non-additive effects. The fBayesB approach was used to estimate genetic effects on the basis of simulated datasets. Different scenarios were simulated to study the loss of accuracy of prediction, if epistatic effects were not simulated but modelled and vice versa.
If 23 QTL were simulated to cause additive and dominance effects, both fBayesB and a conventional MCMC sampler BayesB yielded similar results in terms of accuracy of genetic value prediction and bias of variance component estimation based on a model including additive and dominance effects. Applying fBayesB to data with epistasis, accuracy could be improved by 5% when all pairwise interactions were modelled as well. The accuracy decreased more than 20% if genetic variation was spread over 230 QTL. In this scenario, accuracy based on modelling only additive and dominance effects was generally superior to that of the complex model including epistatic effects.
This simulation study showed that the fBayesB approach is convenient for genetic value prediction. Jointly estimating additive and non-additive effects (especially dominance) has reasonable impact on the accuracy of prediction and the proportion of genetic variation assigned to the additive genetic source.
The corrected QT (QTc) interval is a complex quantitative trait, believed to be influenced by several genetic and environmental factors. It is a strong prognostic indicator of cardiovascular mortality in patients with and without cardiac disease. More than 700 mutations have been described in 12 genes (LQT1-LQT12) involved in congenital long QT syndrome. However, the heritability (genetic contribution) of QTc interval in the general population cannot be adequately explained by these long QT syndrome genes. In order to further investigate the genetic architecture underlying QTc interval in the general population, genome-wide association studies, in which up to one million single nucleotide polymorphisms are assayed in thousands of individuals, are now being employed and have already led to the discovery of variants in seven novel loci and five loci that are known to cause congenital long or short QT syndrome. Here we show that a combined risk score using 11 of these loci explains about 10% of the heritability of QTc. Additional discovery of both common and rare variants will yield further etiological insight and accelerate clinical applications.
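A combined risk score of this kind is commonly computed as a weighted sum of risk-allele counts across loci. The sketch below assumes that form; the function name, the weights, and the genotypes are hypothetical and are not taken from the study.

```python
def risk_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per locus)."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per locus")
    return sum(w * c for w, c in zip(weights, allele_counts))


# Hypothetical per-allele effects for three loci and one individual's genotype.
example_score = risk_score([1, 2, 0], [0.5, 1.0, 2.0])
```

Setting every weight to 1 reduces the score to a simple count of risk alleles.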
Genome-wide association studies have successfully identified numerous loci at which common variants influence disease risks or quantitative traits of interest. Despite these successes, the variants identified by these studies have generally explained only a small fraction of the variations in the phenotype. One explanation may be that many rare variants that are not included in the common genotyping platforms may contribute substantially to the genetic variations of the diseases. Next-generation sequencing, which would better allow for the analysis of rare variants, is now becoming available and affordable; however, the presence of a large number of rare variants challenges the statistical endeavor to stably identify these disease-causing genetic variants. We conduct a genome-wide association study of Genetic Analysis Workshop 17 case-control data produced by the next-generation sequencing technique and propose that collapsing rare variants within each genetic region through a supervised dimension reduction algorithm leads to several macrovariants constructed for rare variants within each genetic region. A simultaneous association of the phenotype to all common variants and macrovariants is undertaken using a linear discriminant analysis using the penalized orthogonal-components regression algorithm. The results suggest that the proposed analysis strategy shows promise but needs further development.
There is increasing empirical evidence that whole-genome prediction (WGP) is a powerful tool for predicting line and hybrid performance in maize. However, there is a lack of knowledge about the sensitivity of WGP models towards the genetic architecture of the trait. Whereas previous studies exclusively focused on highly polygenic traits, important agronomic traits such as disease resistances, nutrifunctional or climate adaptational traits have a genetic architecture which is either much less complex or unknown. For such cases, information about model robustness and guidelines for model selection are lacking. Here, we compared five WGP models with different assumptions about the distribution of the underlying genetic effects. As contrasting model traits, we chose three highly polygenic agronomic traits and three metabolites each with a major QTL explaining 22 to 30% of the genetic variance in a panel of 289 diverse maize inbred lines genotyped with 56,110 SNPs.
We found the five WGP models to be remarkably robust towards trait architecture, with the largest differences in prediction accuracies ranging between 0.05 and 0.14 for the same trait, most likely as the result of the high level of linkage disequilibrium prevailing in elite maize germplasm. Whereas RR-BLUP performed best for the agronomic traits, it was inferior to LASSO or elastic net for the three metabolites. We found the approach of genome partitioning of genetic variance, first applied in human genetics, to be useful in guiding the breeder as to which model to choose if prior knowledge of the trait architecture is lacking.
Our results suggest that in diverse germplasm of elite maize inbred lines with a high level of LD, WGP models differ only slightly in their accuracies, irrespective of the number and effects of QTL found in previous linkage or association mapping studies. However, small gains in prediction accuracies can be achieved if the WGP model is selected according to the genetic architecture of the trait. If the trait architecture is unknown, e.g. for novel traits which only recently received attention in breeding, we suggest inspecting the distribution of the genetic variance explained by each chromosome to guide model selection in WGP.
Genomic selection; Whole-genome prediction; Genetic architecture; Complex traits; Zea mays
Rapid advances in sequencing technologies set the stage for the large-scale medical sequencing efforts to be performed in the near future, with the goal of assessing the importance of rare variants in complex diseases. The discovery of new disease susceptibility genes requires powerful statistical methods for rare variant analysis. The low frequency and the expected large number of such variants pose great difficulties for the analysis of these data. We propose here a robust and powerful testing strategy to study the role rare variants may play in affecting susceptibility to complex traits. The strategy is based on assessing whether rare variants in a genetic region collectively occur at significantly higher frequencies in cases compared with controls (or vice versa). A main feature of the proposed methodology is that, although it is an overall test assessing a possibly large number of rare variants simultaneously, the disease variants can be both protective and risk variants, with moderate decreases in statistical power when both types of variants are present. Using simulations, we show that this approach can be powerful under complex and general disease models, as well as in larger genetic regions where the proportion of disease susceptibility variants may be small. Comparisons with previously published tests on simulated data show that the proposed approach can have better power than the existing methods. An application to a recently published study on Type-1 Diabetes finds rare variants in gene IFIH1 to be protective against Type-1 Diabetes.
Risk to common diseases, such as diabetes, heart disease, etc., is influenced by a complex interaction among genetic and environmental factors. Most of the disease-association studies conducted so far have focused on common variants, widely available on genotyping platforms. However, recent advances in sequencing technologies pave the way for large-scale medical sequencing studies with the goal of elucidating the role rare variants may play in affecting susceptibility to complex traits. The large number of rare variants and their low frequencies pose great challenges for the analysis of these data. We present here a novel testing strategy, based on a weighted-sum statistic, that is less sensitive than existing methods to the presence of both risk and protective variants in the genetic region under investigation. We show applications to simulated data and to a real dataset on Type-1 Diabetes.
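A weighted-sum statistic gives each variant a weight that grows as the variant becomes rarer, then sums the weighted minor-allele counts per individual. The sketch below uses a generic frequency-based weight with add-one smoothing in controls; this weighting, the function name, and the toy data are assumptions for illustration and are not the statistic proposed in the abstract.

```python
import math


def weighted_sum_scores(genotypes, status):
    """Per-individual weighted allele counts, with rarer variants weighted more.

    Allele frequency q is estimated in controls with add-one smoothing so that
    variants unseen in controls still get a finite weight.
    """
    controls = [g for g, s in zip(genotypes, status) if s == 0]
    n_controls = len(controls)
    weights = []
    for j in range(len(genotypes[0])):
        minor_alleles = sum(g[j] for g in controls)
        q = (minor_alleles + 1) / (2 * n_controls + 2)
        weights.append(1.0 / math.sqrt(q * (1.0 - q)))
    scores = [sum(w * x for w, x in zip(weights, g)) for g in genotypes]
    return scores, weights


# Toy data (hypothetical): one case carrying variant 0, one control carrying variant 1.
scores, weights = weighted_sum_scores([[1, 0], [0, 1]], [1, 0])
```

A full test would compare the case scores with the control scores, for example through a rank sum with a permutation-based p-value; that step is left out here.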
Coronary artery disease (CAD) is the leading cause of death worldwide. Affected individuals cluster in families in patterns that reflect the sharing of numerous susceptibility genes. Genome-wide and large-scale gene-centric genotyping studies that involve tens of thousands of cases and controls have now mapped common disease variants to 34 distinct loci. Some coronary disease common variants show allelic heterogeneity or copy number variation. Some of the loci include candidate genes that imply conventional or emerging risk factor-mediated mechanisms of disease pathogenesis. Quantitative trait loci associations with risk factors have been informative in Mendelian randomization studies as well as fine-mapping of causative variants. For most loci, however, plausible mechanistic links are uncertain or obscure at present, although they provide potentially novel directions for research into this disease's pathogenesis. The common variants explain ∼4% of inter-individual variation in disease risk and no more than 13% of the total heritability of coronary disease. Although many CAD genes are presently undiscovered, it is likely that larger collaborative genome-wide association studies will map further common/low-penetrance variants, and it is hoped that low-frequency or rare high-penetrance variants will also be identified in medical resequencing experiments.
Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence phenotype. Genome-wide association (GWA) studies have identified >600 variants associated with human traits [1], but these typically explain small fractions of phenotypic variation, raising questions about the utility of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait [2,3]. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P=0.016), and that underlie skeletal growth defects (P<0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants, and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented amongst variants that alter amino acid structure of proteins and expression levels of nearby genes. Our data explain ∼10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to ∼16% of phenotypic variation (∼20% of heritable variation).
Although additional approaches are needed to fully dissect the genetic architecture of polygenic human traits, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
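The two figures quoted for height, ∼16% of phenotypic variation and ∼20% of heritable variation, together imply a heritability of roughly 0.8. The short calculation below makes that step explicit; the function name is hypothetical, and because the inputs are rounded values from the abstract, the outputs are approximate.

```python
def implied_heritability(phenotypic_fraction, heritable_fraction):
    """Heritability implied when the same variants explain both fractions."""
    return phenotypic_fraction / heritable_fraction


# 16% of phenotypic variation is stated to equal about 20% of heritable variation.
h2 = implied_heritability(0.16, 0.20)

# Share of heritable variation covered by the ~10% of phenotypic variation
# explained by the identified loci, under the same implied heritability.
current_share = 0.10 / h2
```

Under these rounded inputs, the identified loci account for about one eighth of the heritable variation in height.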
Recently there has been great interest in identifying rare variants associated with common diseases. We apply several collapsing-based and kernel-based single-gene association tests to Genetic Analysis Workshop 17 (GAW17) rare variant association data with unrelated individuals without knowledge of the simulation model. We also implement modified versions of these methods using additional information, such as minor allele frequency (MAF) and functional annotation. For each of four given traits provided in GAW17, we use the Bayesian mixed-effects model to estimate the phenotypic variance explained by the given environmental and genotypic data and to infer an individual-specific genetic effect to use directly in single-gene association tests. After obtaining information on the GAW17 simulation model, we compare the performance of all methods and examine the top genes identified by those methods. We find that collapsing-based methods with weights based on MAFs are sensitive to the “lower MAF, larger effect size” assumption, whereas kernel-based methods are more robust when this assumption is violated. In addition, many false-positive genes identified by multiple methods often contain variants with exactly the same genotype distribution as the causal variants used in the simulation model. When the sample size is much smaller than the number of rare variants, it is more likely that causal and noncausal variants will share the same or similar genotype distribution. This likely contributes to the low power and large number of false-positive results of all methods in detecting causal variants associated with disease in the GAW17 data set.
In recent years, genome-wide association studies have revolutionized the understanding of the genetic architecture of complex disease, particularly in the context of disorders that present in old age, such as type 2 diabetes and cardiovascular disease. This new era is made all the more compelling by the fact that, through extensive validation efforts, there is now very strong consensus among human geneticists on what the key loci are that contribute to the pathogenesis of these traits. However, as these variants have been almost exclusively uncovered in an adult setting, there is the question of when these genetic variants start exerting their effects; indeed many may start setting up an individual’s predisposition to a disease of old age very early on in life. To this end, we review what breakthroughs have been made in elucidating which of these genetic factors are operating in childhood and conversely what discoveries have actually been made in the pediatric setting that have then been found subsequently to increase one’s risk of a late-onset disease. After all, it is well known that complex traits like obesity, type 2 diabetes and inflammatory bowel disease are strongly determined by genetic factors, but the isolation of genes in these complex phenotypes in adults has been impeded by interaction with strong environmental factors. Distillation of the genetic component in these complex traits, which will at least partially have origins in childhood, should be easier to determine in a pediatric setting, where the relatively short period of a child’s lifetime limits the impact of environmental exposure.
Disease; late-onset; childhood; genetic; association
Genome-wide association studies and follow-up meta-analyses in Crohn's disease (CD) and ulcerative colitis (UC) have recently identified 163 disease-associated loci that meet genome-wide significance for these two inflammatory bowel diseases (IBD). These discoveries have already had a tremendous impact on our understanding of the genetic architecture of these diseases and have directed functional studies that have revealed some of the biological functions that are important to IBD (e.g. autophagy). Nonetheless, these loci can only explain a small proportion of disease variance (∼14% in CD and 7.5% in UC), suggesting that not only are additional loci to be found but that the known loci may contain high effect rare risk variants that have gone undetected by GWAS. To test this, we have used a targeted sequencing approach in 200 UC cases and 150 healthy controls (HC), all of French Canadian descent, to study 55 genes in regions associated with UC. We performed follow-up genotyping of 42 rare non-synonymous variants in independent case-control cohorts (totaling 14,435 UC cases and 20,204 HC). Our results confirmed significant association with rare non-synonymous coding variants in both IL23R and CARD9, previously identified from sequencing of CD loci, and identified a novel association in RNF186. With the exception of CARD9 (OR = 0.39), the rare non-synonymous variants identified were of moderate effect (OR = 1.49 for RNF186 and OR = 0.79 for IL23R). RNF186 encodes a protein with a RING domain having predicted E3 ubiquitin-protein ligase activity and two transmembrane domains. Importantly, the disease-coding variant is located in the ubiquitin ligase domain. Finally, our results suggest that rare variants in genes identified by genome-wide association in UC are unlikely to contribute significantly to the overall variance for the disease. Rather, these are expected to help focus functional studies of the corresponding disease loci.
Genetic studies of common diseases have seen tremendous progress in the last half-decade primarily due to recent technologies that enable a systematic examination of genetic markers across the entire genome in large numbers of patients and healthy controls. The studies, while identifying genomic regions that influence a person's risk for developing disease, often do not pinpoint the actual gene or gene variants that account for this risk (called a causal gene/variant). A prime example of this can be seen with the 163 genetic risk factors that have recently been associated with the chronic inflammatory bowel diseases known as Crohn's disease and ulcerative colitis. For less than a handful of these 163 is the causative change in the genetic code known. The current study used an approach to directly look at the genetic code for a subset of these and identified a causative change in the genetic code for eight risk factors for ulcerative colitis. This finding is particularly important because it directs biological studies to understand the mechanisms that lead to this chronic life-long inflammatory disease.
Genome-wide association studies (GWAS) have detected many disease associations. However, the reported variants tend to explain small fractions of risk, and there are doubts about issues such as the portability of findings over different ethnic groups or the relative roles of rare versus common variants in the genetic architecture of complex disease. Studying the degree of sharing of disease-associated variants across populations can help in solving these issues. We present a comprehensive survey of GWAS replicability across 28 diseases. Most loci and SNPs discovered in Europeans for these conditions have been extensively replicated using peoples of European and East Asian ancestry, while the replication with individuals of African ancestry is much less common. We found a strong and significant correlation of Odds Ratios across Europeans and East Asians, indicating that underlying causal variants are common and shared between the two ancestries. Moreover, SNPs that failed to replicate in East Asians map into genomic regions where Linkage Disequilibrium patterns differ significantly between populations. Finally, we observed that GWAS with larger sample sizes have detected variants with weaker effects rather than with lower frequencies. Our results indicate that most GWAS results are due to common variants. In addition, the sharing of disease alleles and the high correlation in their effect sizes suggest that most of the underlying causal variants are shared between Europeans and East Asians and that they tend to map close to the associated marker SNPs.
Describing and identifying the genetic variants that increase risk for complex diseases remains a central focus of human genetics and is fundamental for the emergent field of personalized medicine. Over the last six years, GWAS have revolutionized the field, discovering hundreds of disease loci. However, with only a handful of exceptions, the causal variants that generate the associations unveiled by GWAS have not been identified, and their frequency and degree of sharing across populations remains unknown. Here, we present a comprehensive comparison of GWAS results designed to try to understand the nature of causal variants. By examining the results of GWAS for 28 diseases that have been performed with peoples of European, East Asian, and African ancestries, we conclude that a large fraction of associations are caused by common causal variants that should map relatively close to the associated markers. Our results indicate that many of the disease risk variants discovered by GWAS are shared across Eurasians.
Epidemiologic evidence supports a genetic predisposition to stroke. Recent advances, primarily using the genome-wide association study approach, are transforming what we know about the genetics of multifactorial stroke, and are identifying novel stroke genes. The current findings are consistent with different stroke subtypes having different genetic architecture. These discoveries may identify novel pathways involved in stroke pathogenesis, and suggest new treatment approaches. However, the already identified genetic variants explain only a small proportion of overall stroke risk, and therefore are not currently useful in predicting risk for the individual patient. Such risk prediction may become a reality as identification of a greater number of stroke risk variants that explain the majority of genetic risk proceeds, and perhaps when information on rare variants, identified by whole-genome sequencing, is also incorporated into risk algorithms. Pharmacogenomics may offer the potential for earlier implementation of 'personalized genetic' medicine. Genetic variants affecting clopidogrel and warfarin metabolism may identify non-responders and reduce side-effects, but these approaches have not yet been widely adopted in clinical practice.
Genome-wide association studies have successfully identified numerous loci at which common variants influence disease risk or quantitative traits. Despite these successes, the variants identified by these studies have generally explained only a small fraction of the heritable component of disease risk, and have not pinpointed with certainty the causal variant(s) at the associated loci. Furthermore, the mechanisms of action by which associated loci influence disease or quantitative phenotypes are often unclear, because we do not know through which gene(s) the associated variants exert their effects or because these gene(s) are of unknown function or have no clear connection to known disease biology. Thus, the initial set of genome-wide association studies serves as a starting point for future genetic and functional studies. We outline possible next steps that may help accelerate progress from genetic studies to the biological knowledge that can guide the development of predictive, preventive, or therapeutic measures.
Cigarette smoking is a common addiction that increases the risk for many diseases, including lung cancer and chronic obstructive pulmonary disease. Genome-wide association studies (GWAS) have successfully identified and validated several susceptibility loci for nicotine consumption and dependence. However, the trait variance explained by these genes is only a small fraction of the estimated genetic risk. Pathway analysis complements single marker methods by incorporating biological knowledge into the evaluation of GWAS, under the assumption that causal variants lie in functionally related genes, enabling the evaluation of a broad range of signals. Our approach to the identification of pathways enriched for multiple genes associated with smoking quantity includes the analysis of two studies and the replication of common findings in a third dataset. This study identified pathways for the cholinergic receptors, which included SNPs known to be genome-wide significant, as well as novel pathways, such as genes involved in the sensory perception of smell, that do not contain any single SNP that achieves that stringent threshold.
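As an illustration of what "pathways enriched for multiple associated genes" can mean in practice, the following is a minimal sketch of a hypergeometric over-representation test. The abstract does not state which enrichment statistic was used, so this is a generic, commonly used technique and not necessarily the authors' method; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, pathway_genes, hit_genes, hits_in_pathway):
    """Upper-tail hypergeometric p-value: P(overlap >= hits_in_pathway).

    total_genes     -- number of genes tested genome-wide (assumed input)
    pathway_genes   -- number of those genes annotated to the pathway
    hit_genes       -- number of genes flagged as associated with the trait
    hits_in_pathway -- number of flagged genes that fall in the pathway
    """
    max_overlap = min(pathway_genes, hit_genes)
    denom = comb(total_genes, hit_genes)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, hit_genes - k)
        for k in range(hits_in_pathway, max_overlap + 1)
    )
    return tail / denom


# Hypothetical counts, for illustration only.
p_value = hypergeom_enrichment_p(
    total_genes=20000, pathway_genes=50, hit_genes=200, hits_in_pathway=5
)
print(f"enrichment p-value: {p_value:.3g}")
```

A test of this kind can flag a pathway even when no single SNP in it reaches genome-wide significance, because it scores the joint excess of moderately associated genes.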
Cases with a family history are enriched for genetic risk variants, and the power of association studies can be improved by selecting cases with a family history of disease. However, in recent genome-wide association scans utilising familial sampling, the excess relative risk for familial cases is less than predicted when compared to unselected cases. This can be explained by incomplete linkage disequilibrium between the tested marker and the underlying causal variant.
We show that the allele frequency and effect size of the underlying causal variant can be estimated by combining marker data from studies that ascertain cases based on different family histories. This allows us to learn about the genetic architecture of a complex trait, without having identified any causal variants. We consider several validated common marker alleles for breast cancer, using our own study of high risk, predominantly bilateral cases, cases preferentially selected to have at least two affected first or second degree relatives, and published estimates of relative risk from standard case/control studies.
To obtain realistic estimates and to accommodate some prior beliefs, we use Bayesian estimation to infer that the causal variants are probably common, with minor allele frequency >5%, and have small effects, with relative risk around 1.2.
These results strongly support the common disease common variant hypothesis for these specific loci associated with breast cancer.
Our results agree with recent assertions that synthetic associations of rare variants are unlikely to account for most associations seen in genome-wide studies.
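The attenuation argument above (incomplete linkage disequilibrium between the tested marker and the causal variant) can be illustrated with a small calculation. This is a minimal sketch, not the Bayesian estimation procedure described in the abstract: it assumes a multiplicative per-allele risk model and two biallelic loci, and all function and parameter names are hypothetical. The value 1.2 is the approximate causal relative risk quoted above; the allele frequencies and LD coefficient are illustrative.

```python
def marker_relative_risk(causal_freq, marker_freq, d, causal_rr):
    """Allelic relative risk observed at a marker in LD with a causal variant.

    causal_freq -- frequency of the causal risk allele
    marker_freq -- frequency of the tested marker allele
    d           -- LD coefficient D between the two alleles
    causal_rr   -- per-allele relative risk of the causal variant
    Assumes a multiplicative (per-chromosome) risk model.
    """
    # Frequency of the haplotype carrying both the marker and the causal allele.
    hap_both = causal_freq * marker_freq + d
    p_causal_given_marker = hap_both / marker_freq
    p_causal_given_other = (causal_freq - hap_both) / (1 - marker_freq)
    # Average risk of chromosomes carrying each marker allele.
    risk_marker = p_causal_given_marker * causal_rr + (1 - p_causal_given_marker)
    risk_other = p_causal_given_other * causal_rr + (1 - p_causal_given_other)
    return risk_marker / risk_other


def r_squared(causal_freq, marker_freq, d):
    """Squared correlation between the marker allele and the causal allele."""
    return d * d / (
        causal_freq * (1 - causal_freq) * marker_freq * (1 - marker_freq)
    )


# Illustrative values: both alleles at 20% frequency, causal relative risk 1.2.
for d in (0.16, 0.08, 0.0):
    rr = marker_relative_risk(0.2, 0.2, d, 1.2)
    print(f"D={d:.2f}  r^2={r_squared(0.2, 0.2, d):.2f}  marker RR={rr:.3f}")
```

Under these assumptions the marker recovers the full causal relative risk only in perfect LD, and its apparent effect shrinks toward 1 as LD weakens, which is the direction of the discrepancy described for familial cases.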
Breast cancer; family history; ascertainment bias; causal variant; synthetic association
Genome-wide association studies are providing new insights into the genetic basis of metabolic and cardiovascular traits. In the past 3 years, common variants in ∼50 loci have been strongly associated with metabolic and cardiovascular traits. Several of these loci have implicated genes without a previously known connection with metabolism. Further studies will be required to characterize the full impact of these loci on metabolism. Many of the identified loci include multiple independent variants that influence the same metabolic or cardiovascular trait and a few loci harbor independent variants that each influence distinct traits. The total proportion of trait heritability explained by variants identified so far is still modest (typically <10%). Future studies will build on these successes by identifying additional common and rare variants and by determining the functional impact of the underlying alleles and genes.
Multiple sclerosis (MS) is a complex disease with underlying genetic and environmental factors. Although alleles within the major histocompatibility complex (MHC) are known to exert strong effects on MS risk, much remains to be learned about the contributions of loci with more modest effects identified by genome-wide association studies (GWASs), as well as loci that remain undiscovered. We use a recently developed method to estimate the proportion of variance in disease liability explained by 475,806 single nucleotide polymorphisms (SNPs) genotyped in 1,854 MS cases and 5,164 controls. We reveal that ~30% of MS genetic liability is explained by SNPs in this dataset, the majority of which is accounted for by common variants. These results suggest that the unaccounted-for proportion could be explained by variants that are in imperfect linkage disequilibrium with common GWAS SNPs, highlighting the potential importance of rare variants in the susceptibility to MS.
To discover common variants in 6 lipid metabolic genes and construct and validate a genetic risk score (GRS) based on the joint effects of genetic variants in multiple genes from lipid and other pathobiologic pathways.
The genetic basis of coronary artery disease (CAD) remains incompletely explained. Discovery and aggregation of genetic variants from multiple pathways may advance this objective.
Premature CAD cases (N=1,918) and CAD-free controls (N=1,032) were selected from our angiographic registry. In a discovery phase, single nucleotide polymorphisms (SNPs) at 56 loci from internal discovery and external reports were tested for associations with biomarkers and CAD: 28 promising SNPs were then tested jointly for CAD associations, and a genetic risk score (GRS) consisting of SNPs contributing independently was constructed and validated in a replication set of familial cases and population-based controls (N=1,320).
Five variants contributed jointly to CAD prediction in a multigenic GRS model: odds ratio (OR) =1.24 (95% CI 1.16–1.33) per risk allele, p=8.2×10⁻¹¹, adjusted OR=2.03 (1.53–2.70), 4th vs. 1st quartile. GRS5 had minor impact on area under the receiver-operating characteristic curve (p>0.05) but resulted in substantial net reclassification improvement: 0.16 overall, 0.28 in intermediate risk patients (both p<0.0001). GRS5 predicted familial CAD with similar magnitude in the validation set.
CorGen demonstrates the ability of a multigenic, multipathway GRS to improve discrimination of angiographic CAD. Genetic risk scores promise to increase understanding of the genetic basis of CAD and improve identification of individuals at increased CAD risk.
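To make the per-allele odds ratio reported above concrete, the following minimal sketch treats the GRS as an unweighted count of risk alleles and applies the odds ratio multiplicatively. Both choices are assumptions made for illustration: the abstract reports an OR of 1.24 per risk allele but does not spell out how the score was constructed, and the function names and example genotypes are hypothetical.

```python
PER_ALLELE_OR = 1.24  # per-risk-allele odds ratio reported in the abstract


def genetic_risk_score(genotypes):
    """Unweighted GRS: total risk-allele count across SNPs.

    genotypes -- risk-allele counts (0, 1 or 2), one per SNP.
    """
    genotypes = list(genotypes)
    for g in genotypes:
        if g not in (0, 1, 2):
            raise ValueError(f"risk-allele count must be 0, 1 or 2, got {g!r}")
    return sum(genotypes)


def odds_ratio_vs_zero(score, per_allele_or=PER_ALLELE_OR):
    """Odds relative to a carrier of zero risk alleles (log-additive model)."""
    return per_allele_or ** score


# Hypothetical individual genotyped at five SNPs.
score = genetic_risk_score([0, 1, 2, 1, 0])
print(f"GRS={score}, odds ratio vs. zero risk alleles={odds_ratio_vs_zero(score):.2f}")
```

Under this log-additive assumption each additional risk allele multiplies the odds by the same factor, so individuals at the extremes of the score distribution differ far more in odds than any single variant would suggest.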
Coronary artery disease; genetics; risk; risk score