Genome-wide association studies (GWAS) have led to a rapid increase in available data on common genetic variants and phenotypes and numerous discoveries of new loci associated with susceptibility to common complex diseases. Integrating the evidence from GWAS and candidate gene studies depends on concerted efforts in data production, online publication, database development, and continuously updated data synthesis. Here the authors summarize current experience and challenges on these fronts, which were discussed at a 2008 multidisciplinary workshop sponsored by the Human Genome Epidemiology Network. Comprehensive field synopses that integrate many reported gene-disease associations have been systematically developed for several fields, including Alzheimer's disease, schizophrenia, bladder cancer, coronary heart disease, preterm birth, and DNA repair genes in various cancers. The authors summarize insights from these field synopses and discuss remaining unresolved issues, especially in the light of evidence from GWAS, for which they summarize empirical P-value and effect-size data on 223 discovered associations for binary outcomes (142 with P < 10^-7). They also present a vision of collaboration that builds reliable cumulative evidence for genetic associations with common complex diseases and a transparent, distributed, authoritative knowledge base on genetic variation and human health. As a next step in the evolution of Human Genome Epidemiology reviews, the authors invite investigators to submit field synopses for possible publication in the American Journal of Epidemiology.
association; database; encyclopedias; epidemiologic methods; genome, human; genome-wide association study; genomics; meta-analysis
Despite numerous candidate gene and linkage studies, the field of type 2 diabetes (T2D) genetics had until recently succeeded in identifying few genuine disease-susceptibility loci. The advent of genome-wide association (GWA) scans has transformed the situation, leading to an expansion in the number of established, robustly replicating T2D loci to almost 20. These novel findings offer unique insights into the pathogenesis of T2D and in the main point towards the etiological importance of disorders of beta-cell development and function. All associated variants have common allele frequencies in the discovery populations, and exert modest to small effects on the risk of disease, characteristics which limit their prognostic and diagnostic potential. However, ongoing studies focussing on the role of copy number variation and targeting low frequency polymorphisms should identify additional T2D-susceptibility loci, some of which may have larger effect sizes and offer better individual prediction of disease risk.
The approach to molecular genetic studies of complex phenotypes has evolved considerably in recent years. The candidate gene approach, restricted to analysis of a few single nucleotide polymorphisms (SNPs) in a modest number of cases and controls, has been supplanted by the unbiased approach of genome-wide association studies (GWAS), wherein a large number of tagger SNPs are typed in a large number of individuals. GWAS, which are designed upon the common disease-common variant (CD-CV) hypothesis, have identified a large number of SNPs and loci for complex phenotypes. However, alleles identified through GWAS are typically not causative but rather in linkage disequilibrium (LD) with the true causal variants. The common alleles, which may not capture the uncommon and rare variants, account for only a fraction of the heritability of complex traits. Hence, the focus is shifting to the rare variant-common disease (RV-CD) hypothesis, which surmises that rare variants exert large effect sizes on the phenotype. In conjunction with this conceptual shift, technological advances in DNA sequencing have dramatically enhanced whole genome and whole exome sequencing capacity. The sequencing approach affords identification of not only the rare but also the common variants. The approach, whether used to complement GWAS or as a stand-alone approach, could define the genetic architecture of complex phenotypes. Robust phenotyping and large-scale sequencing studies are essential to extract the information content of the vast number of DNA sequence variants (DSVs) in the genome. To garner meaningful clinical information and link the genotype to a phenotype, identification and characterization of a very large number of causal fields beyond the information content of DNA sequence variants would be necessary. This review provides an update on the current progress and limitations in identifying DSVs that are associated with phenotypic effects.
Recently, genome-wide association studies (GWAS) have led to the discovery of hundreds of susceptibility loci that are associated with complex metabolic diseases, such as type 2 diabetes and hyperthyroidism. The majority of the susceptibility loci are common across different races or populations, while some of them show ethnicity-specific distribution. Although the many novel susceptibility loci identified by GWAS have provided insight into biology through the discovery of genes or pathways that were not previously known, most of them lie in introns, and the associated variants cumulatively explain only a small fraction of total heritability. Here we review genetic studies of metabolic disorders, mainly type 2 diabetes and hyperthyroidism, including candidate gene-based findings and more recent GWAS discoveries; we also cover the clinical relevance of these novel loci and gene-environment interactions. Finally, we discuss future directions for genetic studies exploring the pathogenesis of metabolic diseases.
Genome wide association study; Gene-environmental interaction; Hyperthyroidism; Risk prediction; Type 2 diabetes.
Genome-wide association studies (GWAS) have been used successfully in detecting associations between common genetic variants and complex diseases. However, common SNPs detected by current GWAS only explain a small proportion of heritable variability. With the development of next-generation sequencing technologies, researchers find more and more evidence to support the role played by rare variants in heritable variability. However, rare and common variants are often studied separately. The objective of this paper is to develop a robust strategy to analyze association between complex traits and genetic regions using both common and rare variants.
We propose a weighted selective collapsing strategy for both candidate gene studies and genome-wide association scans. The strategy considers genetic information from both common and rare variants, selectively collapses all variants in a given region by a forward selection procedure, and uses an adaptive weight to favor more likely causal rare variants. Under this strategy, two tests are proposed. One test, denoted by BwSC, is sensitive to the directions of genetic effects, and it separates the deleterious and protective effects into two components. The other, denoted by BwSCd, is robust to the directions of genetic effects, and it considers the difference of the two components. In our simulation studies, BwSC achieves a higher power when the causal variants have the same genetic effect, while BwSCd is as powerful as several existing tests when a mixed genetic effect exists. Both of the proposed tests work well with and without the existence of genetic effects from common variants.
Two tests using a weighted selective collapsing strategy provide potentially powerful methods for association studies of sequencing data. The tests have a higher power when both common and rare variants contribute to the heritable variability and the effect of common variants is not strong enough to be detected by traditional methods. Our simulation studies have demonstrated a substantially higher power for both tests in all scenarios, regardless of whether the common SNPs are associated with the trait or not.
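The general idea of collapsing can be illustrated with a much simpler test than the BwSC and BwSCd statistics described above. The following is a minimal sketch, assuming a plain carrier-based collapse of rare variants followed by a 2x2 Pearson chi-square test; it does not implement the forward selection or adaptive weighting of the proposed strategy. The function names, the 1% frequency cutoff, and the toy genotypes are all hypothetical.

```python
import math

def collapse_carriers(genotypes, maf, maf_cutoff=0.01):
    """Flag individuals carrying at least one minor allele at any rare variant."""
    rare = [j for j, f in enumerate(maf) if f < maf_cutoff]
    return [1 if any(row[j] > 0 for j in rare) else 0 for row in genotypes]

def carrier_chi2(carriers, is_case):
    """Pearson chi-square (1 df) and p-value for carrier status by case status."""
    a = sum(1 for c, y in zip(carriers, is_case) if c and y)          # case carriers
    b = sum(1 for c, y in zip(carriers, is_case) if c and not y)      # control carriers
    c0 = sum(1 for c, y in zip(carriers, is_case) if not c and y)     # case non-carriers
    d = sum(1 for c, y in zip(carriers, is_case) if not c and not y)  # control non-carriers
    denom = (a + b) * (c0 + d) * (a + c0) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence either way
    stat = (a + b + c0 + d) * (a * d - b * c0) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail

# Toy data (hypothetical): rows are individuals, columns are three variants.
genotypes = [
    [1, 0, 1], [0, 1, 0], [1, 0, 2], [0, 1, 1], [0, 0, 0], [0, 0, 1],  # cases
    [0, 0, 1], [0, 0, 0], [0, 0, 2], [1, 0, 0], [0, 0, 0], [0, 0, 1],  # controls
]
maf = [0.005, 0.004, 0.30]  # the third variant is common and is not collapsed
is_case = [1] * 6 + [0] * 6

carriers = collapse_carriers(genotypes, maf)
stat, p = carrier_chi2(carriers, is_case)
print(carriers, round(stat, 3), round(p, 3))
```

Because this simple collapse ignores common variants and treats every rare allele as acting in the same direction, it loses power under exactly the mixed-effect scenarios that the two proposed tests are designed to handle. With counts as small as in the toy table, an exact test would normally be preferred over the chi-square approximation.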
Contemporary genomic tools now allow the fast and reliable genotyping of hundreds of thousands of variants and permit an unbiased interrogation of the common variability across the human genome. These technical advances have been the basis of numerous recent investigations of genes underlying complex genetic traits, and the results for blood pressure and hypertension have been of particular interest. The pathophysiology of the complex genetic trait blood pressure and hypertension is unclear. The heritability of essential hypertension is high and insights can be gained by finding associated genes. Current genome-wide association studies (GWAS) have identified 10 to 20 loci in or near genes that generally were not expected to be associated with blood pressure or essential hypertension; more significant variants will be discovered when even larger and more refined studies become available. This article gives a short introduction to GWAS and summarizes the current findings for blood pressure and hypertension.
Blood pressure; Hypertension; Genome-wide association study; Genomics
Large-scale meta-analyses of genome-wide association scans (GWAS) have been successful in discovering common risk variants with modest and small effects. The detection of lower frequency signals will undoubtedly require concerted efforts of at least similar scale. We investigate the sample size-dictated power limits of GWAS meta-analyses, in the presence and absence of modest levels of heterogeneity and across a range of different allelic architectures. We find that data combination through large-scale collaboration is vital in the quest for complex trait susceptibility loci, but that effect size heterogeneity across meta-analysed studies drawn from similar populations does not appear to have a profound effect on sample size requirements.
genetic study; sample size; heterogeneity; replication; study design
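The dependence of power on sample size discussed in the preceding abstract can be sketched with a standard normal approximation. This is a minimal illustration, not the authors' simulation design: it assumes a standardized quantitative trait, Hardy-Weinberg equilibrium, a small per-allele effect, a genome-wide threshold of 5e-8, and no between-study heterogeneity. The function name and all numeric inputs are hypothetical.

```python
from statistics import NormalDist

def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power of a 1-df additive test for a standardized trait.

    Assumes the non-centrality parameter is about n * 2 * maf * (1 - maf) * beta**2,
    which holds for small effects on a trait with unit variance.
    """
    nd = NormalDist()
    ncp = n * 2.0 * maf * (1.0 - maf) * beta ** 2
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = ncp ** 0.5                      # expected z-score under the alternative
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

# Power for a low-frequency variant at increasing combined sample sizes.
for n in (5_000, 20_000, 80_000):
    print(n, round(gwas_power(n, maf=0.05, beta=0.08), 3))
```

Under these assumptions the power for a 5% frequency variant of small effect is negligible at a few thousand samples and becomes high only at combined sample sizes in the tens of thousands, which is the scale argument for pooling data across studies.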
Obesity is a classical complex trait, influenced by both genetic and lifestyle factors. The number of obesity gene variants is currently unknown but, based on sound evolutionary principles, likely to be many, each with a modest effect on the phenotype. Recent advances in our knowledge of variation in the human genome and high throughput genotyping technologies have made possible genome-wide association (GWA) analysis and the identification of bona fide susceptibility genes for many complex diseases and phenotypes, including obesity and its comorbid conditions. GWA analysis in even larger numbers of individuals through collaborative efforts of many investigators will likely identify those polygenes of moderate and modest effect size that manifest in our typical environment. Once the subset of real-world-relevant obesity susceptibility variants is identified, follow-up studies, including detailed molecular analysis of the loci, stratified analyses, prospective and interventional studies in humans, and mechanistic studies in cells and animals will allow us to define the genetic architecture of the locus and dissect how these genes interact with specific environmental and other factors. The molecular and analytical tools to accomplish these goals are now in hand, but cooperation among investigators will be necessary to amass the requisite numbers of phenotyped and genotyped individuals. Identification of susceptibility genes for obesity and determining how they interact with each other and the environment will lead to new insights into the molecular, cellular, and physiological basis of energy homeostasis, and novel strategies for prevention and treatment.
DNA sequence variants (DSVs) are major components of the “causal field” for virtually all medical phenotypes, whether single-gene familial disorders or complex traits without a clear familial aggregation. The causal variants in single gene disorders are necessary and sufficient to impart large effects. In contrast, complex traits are due to a much more complicated network of contributory components that in aggregate increase the probability of disease. The conventional approach to identification of the causal variants for single gene disorders is genetic linkage. However, it does not offer sufficient resolution to map the causal genes in small size families or sporadic cases. The approach to genetic studies of complex traits entails candidate gene or Genome Wide Association Studies (GWAS). GWAS provides an unbiased survey of the effects of common genetic variants (common disease - common variant hypothesis). GWAS have led to identification of a large number of alleles for various cardiovascular diseases. However, common alleles account for a relatively small fraction of the total heritability of the traits. Accordingly, the focus has shifted toward identification of rare variants that might impart larger effect sizes (rare variant-common disease hypothesis). This shift is made feasible by recent advances in massively parallel DNA sequencing platforms, which afford the opportunity to identify virtually all common as well as rare alleles in individuals. In this review, we discuss various strategies that are used to delineate the genetic contribution to medically important cardiovascular phenotypes, emphasizing the utility of the new deep sequencing approaches.
Genetics; Next-Generation Sequencing; Complex traits; Polymorphism
Genome-wide association studies (GWAS) have been applied to various gastrointestinal and liver diseases in recent years. A large number of susceptibility genes and key biological pathways in disease development have been identified. So far, studies in inflammatory bowel diseases, and in particular Crohn’s disease, have been especially successful in defining new susceptibility loci using the GWAS design. The identification of associations related to autophagy as well as several genes involved in immunological response will be important to future research on Crohn’s disease. In this review, key methodological aspects of GWAS, the importance of proper cohort collection, genotyping issues and statistical methods are summarized. Ways of addressing the shortcomings of the GWAS design, when it comes to rare variants, are also discussed. For each of the relevant conditions, findings from the various GWAS are summarized with a focus on the affected biological systems.
Genome-wide association studies; Inflammatory bowel disease; Gastroenterology; Hepatology
Within the last 3 years, genome-wide association studies (GWAS) have had unprecedented success in identifying loci that are involved in common diseases. For example, more than 35 susceptibility loci have been identified for type 2 diabetes and 32 for obesity thus far. However, the causal gene and variant at a specific linkage disequilibrium block is often unclear. Using a combination of different mouse alleles, we can greatly facilitate the understanding of which candidate gene at a particular disease locus is associated with the disease in humans, and also provide functional analysis of variants through an allelic series, including analysis of hypomorph and hypermorph point mutations, and knockout and overexpression alleles. The phenotyping of these alleles for specific traits of interest, in combination with the functional analysis of the genetic variants, may reveal the molecular and cellular mechanism of action of these disease variants, and ultimately lead to the identification of novel therapeutic strategies for common human diseases. In this Commentary, we discuss the progress of GWAS in identifying common disease loci for metabolic disease, and the use of the mouse as a model to confirm candidate genes and provide mechanistic insights.
We conducted a systematic study of top susceptibility variants from a genome-wide association (GWA) study of Bipolar Disorder to gain insight into the functional consequences of genetic variation influencing disease risk. We report here the results of experiments to explore the effects of these susceptibility variants on DNA methylation and mRNA expression in human cerebellum samples. Among the top susceptibility variants, we identified an enrichment of cis regulatory loci on mRNA expression (eQTLs), and a significant excess of quantitative trait loci for DNA CpG methylation, hereafter referred to as mQTLs. Bipolar Disorder susceptibility variants that cis-regulate both cerebellar expression and methylation of the same gene are a very small proportion of Bipolar Disorder susceptibility variants. This finding suggests that mQTLs and eQTLs provide orthogonal ways of functionally annotating genetic variation within the context of studies of pathophysiology in brain. No lymphocyte mQTL enrichment was found, suggesting that mQTL enrichment was specific to the cerebellum, in contrast to eQTLs. Separately, we found that using mQTL information to restrict the number of SNPs studied enhances our ability to detect a significant association. With this restriction a priori informed by the observed functional enrichment, we identified a significant association (rs12618769, Pbonferroni<0.05) from two other GWA studies (TGen+GAIN; 2,191 cases and 1,434 controls) of Bipolar Disorder, which we replicated in an independent GWA study (WTCCC). Collectively, our findings highlight the importance of integrating functional annotation of genetic variants for gene expression and DNA methylation to advance biological understanding of Bipolar Disorder.
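The benefit of restricting the SNP set before testing, as described above, follows directly from how a Bonferroni correction scales with the number of tests. The sketch below uses hypothetical numbers that are not taken from the study (a nominal p-value of 2e-5, 500,000 genome-wide SNPs, and 2,000 functionally selected SNPs) purely to show the arithmetic.

```python
def bonferroni_adjust(p, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p * n_tests)

# Hypothetical illustration: the same nominal p-value under two testing burdens.
p_nominal = 2e-5
genome_wide = bonferroni_adjust(p_nominal, 500_000)  # all SNPs on an array
restricted = bonferroni_adjust(p_nominal, 2_000)     # SNPs pre-selected as mQTLs

print(genome_wide, restricted)
```

The correction is only valid when the restricted set is chosen a priori, without reference to the association results being tested, which is how the abstract describes the mQTL-informed restriction.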
In recent years, GWA studies have successfully identified common SNPs associated with complex diseases. However, most of the variants found this way account for only a small portion of the trait variance. This fact has led researchers to focus on rare-variant mapping with large-scale sequencing, which can be facilitated by using linkage information. The question arises why linkage analysis often fails to identify genes when analyzing complex diseases. Using simulations, we have investigated the power of parametric and nonparametric linkage statistics (KC-LOD, NPL, LOD and MOD scores) to detect the effect of genes responsible for complex diseases using different pedigree structures.
As expected, a small number of pedigrees with fewer than three affected individuals has low power to map disease genes with modest effect. Interestingly, the power decreases when unaffected individuals are included in the analysis, irrespective of the true mode of inheritance. Furthermore, we found that the best performing statistic depends not only on the type of pedigrees but also on the true mode of inheritance.
When applied in a sensible way, linkage is an appropriate and robust technique to map genes for complex disease. Unlike association analysis, linkage analysis is not hampered by allelic heterogeneity. So, why does linkage analysis often fail with complex diseases? Evidently, when using an insufficient number of small pedigrees, one might miss a true genetic linkage even when a real effect exists. Furthermore, we show that the test statistic has an important effect on the power to detect linkage as well. Therefore, a linkage analysis might fail if an inadequate test statistic is employed. We provide recommendations regarding the most favorable test statistics, in terms of power, for a given mode of inheritance and type of pedigrees under study, in order to reduce the probability of missing a true linkage.
Linkage; Parametric analysis; Nonparametric analysis; NPL score; LOD score; MOD score; Complex diseases; Rare variants
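For readers unfamiliar with the LOD score named in the abstract above, the simplest case can be written out directly. The sketch below assumes phase-known, fully informative meioses, where the LOD score is the base-10 log of the likelihood at recombination fraction theta relative to the likelihood at theta = 0.5. It is far simpler than the pedigree-based parametric and nonparametric statistics compared in the study; the function names and the example counts are hypothetical.

```python
import math

def two_point_lod(recombinants, meioses, theta):
    """LOD score for phase-known, fully informative meioses at a given theta."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    log_l_null = n * math.log10(0.5)  # free recombination (no linkage)
    return log_l_theta - log_l_null

def max_lod(recombinants, meioses):
    """Maximize the LOD score over a grid of theta values from 0.01 to 0.50."""
    grid = [i / 100 for i in range(1, 51)]
    return max((two_point_lod(recombinants, meioses, t), t) for t in grid)

# Hypothetical example: 1 recombinant observed among 10 informative meioses.
lod, theta_hat = max_lod(1, 10)
print(round(lod, 3), theta_hat)
```

With so few meioses the maximized score stays well below the conventional threshold of 3, which is one way to see why a small number of small pedigrees has little power.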
The identification of complex disease susceptibility loci through genome-wide association studies (GWAS) has recently become possible and is now a method of choice for investigating the genetic basis of complex traits. The number of results from such studies is constantly increasing, but the challenge ahead is to identify the biological context in which these statistically significant candidate variants act. Regulatory variation plays an important role in shaping phenotypic differences among individuals and thus is very likely to also influence disease susceptibility. As such, integrating gene expression data and other disease-relevant intermediate phenotypes with GWAS results could potentially help prioritize fine-mapping efforts and provide a shortcut to disease biology. Combining these different levels of information in a meaningful way is, however, not trivial. In the present review, we outline the approaches that have been explored so far for this purpose and their achievements. We also discuss the limitations of the methods and how upcoming technological developments could help circumvent these limitations. Overall, such efforts will be very helpful in understanding, initially, regulatory effects on disease and, more broadly, disease etiology in general.
New advances in genomic technology are being introduced at an increasing pace and are revolutionizing the field of genetics for both complex and Mendelian diseases. For instance, during the past few years, genome-wide association studies (GWAS) have identified a large number of significant associations between genomic loci and movement disorders such as Parkinson’s disease and progressive supranuclear palsy. GWAS are carried out through the use of high-throughput SNP genotyping arrays, which are also used to perform linkage analyses in families previously considered statistically underpowered for genetic analyses. In inherited movement disorders, using this latter technology, it has repeatedly been shown that mutations in a single gene can lead to different phenotypes, while the same clinical entity can be caused by mutations in different genes. This is being highlighted with the use of next-generation sequencing (NGS) technologies and leads to the search for genes or genetic modifiers that contribute to the phenotypic expression of movement disorders. Establishing an accurate genome–epigenome–phenotype relationship is becoming a major challenge in post-genomic research, one that should be facilitated through the implementation of both functional and cellular analyses. In this review, we summarize the latest genetic discoveries made by the use of NGS technologies and propose future directions and challenges to truly understand the pathophysiology of movement disorders.
next-generation sequencing; movement disorders; gene discovery; novel neurological phenotypes
Multiple genome-wide association studies (GWASs) and two large scale meta-analyses have been performed for Crohn's disease and have identified 71 susceptibility loci. These findings have contributed greatly to our current understanding of the disease pathogenesis. Yet, these loci only explain approximately 23% of the disease heritability. One of the future challenges in this post-GWAS era is to identify potential sources of the remaining heritability. Such sources may include common variants with limited effect size, rare variants with higher effect sizes, structural variations, or even more complicated mechanisms such as epistatic, gene-environment and epigenetic interactions. Here, we outline potential sources of this hidden heritability, focusing on Crohn's disease and the currently available data. We also discuss future strategies to determine more about the heritability; these strategies include expanding current GWAS, fine-mapping, whole genome sequencing or exome sequencing, and using family-based approaches. Despite the current limitations, such strategies may help to transfer research achievements into clinical practice and guide the improvement of preventive and therapeutic measures.
Many disease-associated variants affect gene expression levels (expression quantitative trait loci, eQTLs) and expression profiling using next generation sequencing (NGS) technology is a powerful way to detect these eQTLs. We analyzed 94 total blood samples from healthy volunteers with DeepSAGE to gain specific insight into how genetic variants affect the expression of genes and lengths of 3′-untranslated regions (3′-UTRs). We detected previously unknown cis-eQTL effects for GWAS hits in disease- and physiology-associated traits. Apart from cis-eQTLs that are typically easily identifiable using microarrays or RNA-sequencing, DeepSAGE also revealed many cis-eQTLs for antisense and other non-coding transcripts, often in genomic regions containing retrotransposon-derived elements. We also identified and confirmed SNPs that affect the usage of alternative polyadenylation sites, thereby potentially influencing the stability of messenger RNAs (mRNA). We then combined the power of RNA-sequencing with DeepSAGE by performing a meta-analysis of three datasets, leading to the identification of many more cis-eQTLs. Our results indicate that DeepSAGE data is useful for eQTL mapping of known and unknown transcripts, and for identifying SNPs that affect alternative polyadenylation. Because of the inherent differences between DeepSAGE and RNA-sequencing, our complementary, integrative approach leads to greater insight into the molecular consequences of many disease-associated variants.
Many genetic variants that are associated with diseases also affect gene expression levels. We used a next generation sequencing approach targeting 3′ transcript ends (DeepSAGE) to gain specific insight into how genetic variants affect the expression of genes and the usage and length of 3′-untranslated regions. We detected many associations for antisense and other non-coding transcripts, often in genomic regions containing retrotransposon-derived elements. Some of these variants are also associated with disease. We also identified and confirmed variants that affect the usage of alternative polyadenylation sites, thereby potentially influencing the stability of mRNAs. We conclude that DeepSAGE is useful for detecting eQTL effects on both known and unknown transcripts, and for identifying variants that affect alternative polyadenylation.
Genome-wide association studies (GWAS) have identified around 60 common variants associated with multiple sclerosis (MS), but these loci only explain a fraction of the heritability of MS. Some missing heritability may be caused by rare variants that have been suggested to play an important role in the aetiology of complex diseases such as MS. However, current genetic and statistical methods for detecting rare variants are expensive and time-consuming. ‘Population-based linkage analysis’ (PBLA), or so-called identity-by-descent (IBD) mapping, is a novel way to detect rare variants in extant GWAS datasets. We employed BEAGLE fastIBD to search for rare MS variants utilising IBD mapping in a large GWAS dataset of 3,543 cases and 5,898 controls. We identified a genome-wide significant linkage signal on chromosome 19 (LOD = 4.65; p = 1.9×10^-6). Network analysis of cases and controls sharing haplotypes on chromosome 19 further strengthened the association, as there are more large networks of cases sharing haplotypes than of controls. This linkage region includes a cluster of zinc finger genes of unknown function. Analysis of genome-wide transcriptome data suggests that genes in this zinc finger cluster may be involved in very early developmental regulation of the CNS. Our study also indicates that BEAGLE fastIBD allowed identification of rare variants in a large unrelated population with moderate computational intensity. Even with the development of whole-genome sequencing, IBD mapping may still be a promising way to narrow down the region of interest for sequencing priority.
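The relationship between the LOD score and the p-value reported above can be checked with the usual asymptotic conversion. The sketch below assumes that 2 ln(10) × LOD is referred to a chi-square distribution with 1 degree of freedom and that the test is one-sided, so the tail probability is halved. Whether the authors used exactly this conversion is an assumption; the function name is hypothetical.

```python
import math

def lod_to_p(lod):
    """One-sided asymptotic p-value for a LOD score (chi-square with 1 df, halved)."""
    chi2 = 2.0 * math.log(10.0) * lod
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)); halve it for a one-sided test.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))

p_value = lod_to_p(4.65)
print(f"{p_value:.2e}")
```

Under these assumptions the conversion gives a value close to the reported p-value for LOD = 4.65, which suggests the two figures in the abstract are mutually consistent.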
Genome-wide association studies are providing new insights into the genetic basis of metabolic and cardiovascular traits. In the past 3 years, common variants in ∼50 loci have been strongly associated with metabolic and cardiovascular traits. Several of these loci have implicated genes without a previously known connection with metabolism. Further studies will be required to characterize the full impact of these loci on metabolism. Many of the identified loci include multiple independent variants that influence the same metabolic or cardiovascular trait and a few loci harbor independent variants that each influence distinct traits. The total proportion of trait heritability explained by variants identified so far is still modest (typically <10%). Future studies will build on these successes by identifying additional common and rare variants and by determining the functional impact of the underlying alleles and genes.
Rapid advances in sequencing technologies set the stage for the large-scale medical sequencing efforts to be performed in the near future, with the goal of assessing the importance of rare variants in complex diseases. The discovery of new disease susceptibility genes requires powerful statistical methods for rare variant analysis. The low frequency and the expected large number of such variants pose great difficulties for the analysis of these data. We propose here a robust and powerful testing strategy to study the role rare variants may play in affecting susceptibility to complex traits. The strategy is based on assessing whether rare variants in a genetic region collectively occur at significantly higher frequencies in cases compared with controls (or vice versa). A main feature of the proposed methodology is that, although it is an overall test assessing a possibly large number of rare variants simultaneously, the disease variants can be both protective and risk variants, with moderate decreases in statistical power when both types of variants are present. Using simulations, we show that this approach can be powerful under complex and general disease models, as well as in larger genetic regions where the proportion of disease susceptibility variants may be small. Comparisons with previously published tests on simulated data show that the proposed approach can have better power than the existing methods. An application to a recently published study on Type-1 Diabetes finds rare variants in gene IFIH1 to be protective against Type-1 Diabetes.
Risk to common diseases, such as diabetes, heart disease, etc., is influenced by a complex interaction among genetic and environmental factors. Most of the disease-association studies conducted so far have focused on common variants, widely available on genotyping platforms. However, recent advances in sequencing technologies pave the way for large-scale medical sequencing studies with the goal of elucidating the role rare variants may play in affecting susceptibility to complex traits. The large number of rare variants and their low frequencies pose great challenges for the analysis of these data. We present here a novel testing strategy, based on a weighted-sum statistic, that is less sensitive than existing methods to the presence of both risk and protective variants in the genetic region under investigation. We show applications to simulated data and to a real dataset on Type-1 Diabetes.
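A weighted-sum statistic of the general kind referred to above can be sketched as follows. This is a deliberately simplified, one-sided illustration and not the authors' strategy: each variant is weighted by sqrt(n q (1 - q)) with the frequency q estimated in controls, each individual's weighted allele count is computed, and the sum over cases is compared with its permutation distribution. The function names, the use of a raw score sum rather than ranks, and the toy data are all assumptions.

```python
import math
import random

def variant_weights(genotypes, is_case):
    """Weight each variant by sqrt(n * q * (1 - q)), with q estimated in controls."""
    n = len(genotypes)
    controls = [row for row, y in zip(genotypes, is_case) if not y]
    weights = []
    for j in range(len(genotypes[0])):
        minor = sum(row[j] for row in controls)
        q = (minor + 1) / (2 * len(controls) + 2)  # smoothed so q is never zero
        weights.append(math.sqrt(n * q * (1 - q)))
    return weights

def case_score_sum(genotypes, is_case):
    """Sum over cases of the weighted minor-allele count (rarer variants count more)."""
    w = variant_weights(genotypes, is_case)
    scores = [sum(g / wj for g, wj in zip(row, w)) for row in genotypes]
    return sum(s for s, y in zip(scores, is_case) if y)

def permutation_p(genotypes, is_case, n_perm=1000, seed=0):
    """One-sided permutation p-value for an excess of rare alleles in cases."""
    rng = random.Random(seed)
    observed = case_score_sum(genotypes, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if case_score_sum(genotypes, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): every case carries one rare allele, no control does.
toy_genotypes = [[1, 0]] * 5 + [[0, 1]] * 5 + [[0, 0]] * 10
toy_is_case = [1] * 10 + [0] * 10
print(permutation_p(toy_genotypes, toy_is_case))
```

A one-sided statistic like this one loses power when risk and protective variants occur in the same region, because their contributions cancel in the case sum; that sensitivity is the limitation the proposed strategy is intended to reduce.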
It is widely believed that both common and rare variants contribute to the risk of common diseases and to complex traits, and that the cumulative effects of multiple rare variants can explain a significant proportion of trait variance. Advances in high-throughput DNA sequencing technologies allow us to genotype rare causal variants and investigate the effects of such rare variants on complex traits. We developed an adaptive ridge regression method to analyze the collective effects of multiple variants in the same gene or the same functional unit. Our model focuses on continuous traits and incorporates covariates to remove potential confounding effects. The proposed method estimates and tests multiple rare variants collectively but does not depend on the assumption that all rare variant effects act in the same direction. Compared with the Bayesian hierarchical generalized linear model approach, the state-of-the-art method of rare variant detection, the proposed new method is easy to implement, yet it has higher statistical power. Application of the new method is demonstrated using the well-known data from the Dallas Heart Study.
Coronary atherosclerosis is a complex heritable trait with an enigmatic genetic etiology. Genome-wide association studies (GWAS) have successfully led to identification of over 100 different loci for susceptibility to coronary atherosclerosis. Most identified single nucleotide polymorphisms (SNPs) and genes have not been previously implicated in the pathogenesis of atherosclerosis and hence have modest biological plausibility. The novel discoveries, however, might provide the opportunity for identification of new pathways and consequently novel preventive and therapeutic targets. A notable outcome of GWAS is the relatively modest effect sizes of the associated SNPs. Collectively, the identified SNPs account for a relatively small fraction of the heritability of coronary atherosclerosis, which raises the question of “missing heritability”. Because GWAS test the common disease-common variant hypothesis, a plausible explanation might be the presence of uncommon and rare variants in the genome that are untested in GWAS but that might exert large effects on the risk of atherosclerosis. The latter, however, remains an empirical question pending validation through experimentation. Alternative mechanisms, such as transgenerational epigenetics including microRNAs, might in part account for the heritability of coronary atherosclerosis. Collectively, the recent findings are indicative of the etiological complexity of coronary atherosclerosis. Hence, it is expected that the genetic etiology of coronary atherosclerosis will remain enigmatic in the foreseeable future.
Atherosclerosis; Coronary artery disease; Genetics; GWAS; Polymorphism
The International HapMap Project produced a genome-wide database of human genetic variation for use in genetic association studies of common diseases. The initial output of these studies has been overwhelming, with over 150 risk loci identified in studies of more than 60 common diseases and traits. These associations have suggested previously unsuspected etiologic pathways for common diseases that will be of use in identifying new therapeutic targets and developing targeted interventions based on genetically defined risk. Here we examine the development and application of the HapMap to genome-wide association (GWA) studies; present and future technologies for GWA research; current major efforts in GWA studies; successes and limitations of the GWA approach in identifying polymorphisms related to complex diseases; data release and privacy policies; use of these findings by clinicians, the public, and academic physicians; and sources of ongoing authoritative information on this rapidly evolving field.
complex diseases; genetic association; genomic variation
In recent years, the capacity to genotype large sets of single nucleotide polymorphisms (SNPs) has increased considerably, and over 1 million SNP markers across the genome can now be genotyped. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have implicated over 1500 SNPs as associated with disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve the refining of these putative loci.
A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost-effective approach would be to sequence a portion of the individuals and then apply genotype imputation methods to impute markers in the remaining individuals. A potentially attractive alternative would be to impute based on the 1000 Genomes Project; however, this has the drawback of using a reference population that does not necessarily match the disease status and LD pattern of the study population. Using data from both a candidate gene sequencing study and the 1000 Genomes Project, we explored a variety of approaches for carrying out the imputation with a reference panel consisting of sequence data for a fraction of the study participants.
Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results suggest the following design guidelines for sequencing studies, which take advantage of recent advances in genotype imputation methodology: select the largest and most diverse reference panel for sequencing, and genotype as many “anchor” markers as possible.