The approach to molecular genetic studies of complex phenotypes has evolved considerably in recent years. The candidate gene approach, restricted to analysis of a few single nucleotide polymorphisms (SNPs) in a modest number of cases and controls, has been supplanted by the unbiased approach of Genome-Wide Association Studies (GWAS), wherein a large number of tag SNPs are typed in a large number of individuals. GWAS, which are designed upon the common disease-common variant (CD-CV) hypothesis, have identified a large number of SNPs and loci for complex phenotypes. However, alleles identified through GWAS are typically not causative but rather in linkage disequilibrium (LD) with the true causal variants. The common alleles, which may not capture the uncommon and rare variants, account for only a fraction of the heritability of complex traits. Hence, the focus is shifting to the rare variant-common disease (RV-CD) hypothesis, which surmises that rare variants exert large effect sizes on the phenotype. In conjunction with this conceptual shift, technological advances in DNA sequencing have dramatically enhanced whole genome and whole exome sequencing capacity. The sequencing approach affords identification of not only the rare but also the common variants. The approach, whether used to complement GWAS or as a stand-alone approach, could define the genetic architecture of complex phenotypes. Robust phenotyping and large-scale sequencing studies are essential to extract the information content of the vast number of DNA sequence variants (DSVs) in the genome. To garner meaningful clinical information and link the genotype to a phenotype, identification and characterization of a very large number of causal fields beyond the information content of DNA sequence variants would be necessary. This review provides an update on the current progress and limitations in identifying DSVs that are associated with phenotypic effects.
Genome-wide association studies (GWAS) have led to a rapid increase in available data on common genetic variants and phenotypes and numerous discoveries of new loci associated with susceptibility to common complex diseases. Integrating the evidence from GWAS and candidate gene studies depends on concerted efforts in data production, online publication, database development, and continuously updated data synthesis. Here the authors summarize current experience and challenges on these fronts, which were discussed at a 2008 multidisciplinary workshop sponsored by the Human Genome Epidemiology Network. Comprehensive field synopses that integrate many reported gene-disease associations have been systematically developed for several fields, including Alzheimer's disease, schizophrenia, bladder cancer, coronary heart disease, preterm birth, and DNA repair genes in various cancers. The authors summarize insights from these field synopses and discuss remaining unresolved issues, especially in the light of evidence from GWAS, for which they summarize empirical P-value and effect-size data on 223 discovered associations for binary outcomes (142 with P < 10^-7). They also present a vision of collaboration that builds reliable cumulative evidence for genetic associations with common complex diseases and a transparent, distributed, authoritative knowledge base on genetic variation and human health. As a next step in the evolution of Human Genome Epidemiology reviews, the authors invite investigators to submit field synopses for possible publication in the American Journal of Epidemiology.
association; database; encyclopedias; epidemiologic methods; genome, human; genome-wide association study; genomics; meta-analysis
Despite numerous candidate gene and linkage studies, the field of type 2 diabetes (T2D) genetics had until recently succeeded in identifying few genuine disease-susceptibility loci. The advent of genome-wide association (GWA) scans has transformed the situation, leading to an expansion in the number of established, robustly replicating T2D loci to almost 20. These novel findings offer unique insights into the pathogenesis of T2D and in the main point towards the etiological importance of disorders of beta-cell development and function. All associated variants have common allele frequencies in the discovery populations, and exert modest to small effects on the risk of disease, characteristics which limit their prognostic and diagnostic potential. However, ongoing studies focussing on the role of copy number variation and targeting low frequency polymorphisms should identify additional T2D-susceptibility loci, some of which may have larger effect sizes and offer better individual prediction of disease risk.
Recently, genome-wide association studies (GWAS) have led to the discovery of hundreds of susceptibility loci that are associated with complex metabolic diseases, such as type 2 diabetes and hyperthyroidism. The majority of the susceptibility loci are common across different races or populations, while some of them show ethnicity-specific distribution. Though the abundant novel susceptibility loci identified by GWAS have provided insight into biology through the discovery of new genes or pathways that were previously not known, most of them are in introns and the associated variants cumulatively explain only a small fraction of total heritability. Here we review the genetic studies on metabolic disorders, mainly type 2 diabetes and hyperthyroidism, including candidate gene-based findings and the more recent GWAS discoveries; we also cover the clinical relevance of these novel loci and gene-environment interactions. Finally, we discuss future directions for genetic studies exploring the pathogenesis of metabolic diseases.
Genome wide association study; Gene-environmental interaction; Hyperthyroidism; Risk prediction; Type 2 diabetes.
Contemporary genomic tools now allow the fast and reliable genotyping of hundreds of thousands of variants and permit an unbiased interrogation of the common variability across the human genome. These technical advances have been the basis of numerous recent investigations of genes underlying complex genetic traits, and the results for blood pressure and hypertension have been of particular interest. The pathophysiology of the complex genetic trait blood pressure and hypertension is unclear. The heritability of essential hypertension is high and insights can be gained by finding associated genes. Current genome-wide association studies (GWAS) have identified 10 to 20 loci in or near genes that generally were not expected to be associated with blood pressure or essential hypertension; more significant variants will be discovered when even larger and more refined studies become available. This article gives a short introduction to GWAS and summarizes the current findings for blood pressure and hypertension.
Blood pressure; Hypertension; Genome-wide association study; Genomics
Large-scale meta-analyses of genome-wide association scans (GWAS) have been successful in discovering common risk variants with modest and small effects. The detection of lower frequency signals will undoubtedly require concerted efforts of at least similar scale. We investigate the sample size-dictated power limits of GWAS meta-analyses, in the presence and absence of modest levels of heterogeneity and across a range of different allelic architectures. We find that data combination through large-scale collaboration is vital in the quest for complex trait susceptibility loci, but that effect size heterogeneity across meta-analysed studies drawn from similar populations does not appear to have a profound effect on sample size requirements.
genetic study; sample size; heterogeneity; replication; study design
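The dependence of power on sample size discussed in the abstract above can be illustrated with a rough calculation. The following is a minimal sketch, not the authors' method: it assumes a simple two-sample allelic z-test under a normal approximation, no between-study heterogeneity, and a conventional genome-wide threshold of 5e-8. The function name `allelic_test_power` and all parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic z-test (normal approximation).

    Assumptions (illustrative only): alleles are independent, each person
    contributes two alleles, and the far rejection tail is ignored.
    """
    # Risk-allele frequency in cases implied by the allelic odds ratio.
    case_freq = odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)
    # Standard error of the difference in allele frequencies.
    se = math.sqrt(
        case_freq * (1 - case_freq) / (2 * n_cases)
        + control_freq * (1 - control_freq) / (2 * n_controls)
    )
    expected_z = abs(case_freq - control_freq) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(expected_z - z_crit)


# Hypothetical scenario: 5% risk-allele frequency, odds ratio 1.2.
single_study = allelic_test_power(2_000, 2_000, 0.05, 1.2)
pooled = allelic_test_power(20_000, 20_000, 0.05, 1.2)
print(f"single study: {single_study:.4f}, pooled: {pooled:.4f}")
```

Under these assumed values, a single study of 2,000 cases and 2,000 controls has almost no power at the genome-wide threshold, whereas a tenfold larger pooled sample has substantial power, which is the qualitative point about data combination.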
Over the past two decades, DNA samples from thousands of families have been collected and genotyped for linkage studies of common complex diseases such as type 2 diabetes, asthma and prostate cancer. Unfortunately, little success has been achieved in identifying genetic susceptibility risk factors through these considerable efforts. However, significant success in identifying common disease risk-associated variants has been recently achieved from genome-wide association (GWA) studies using unrelated case-control samples. These GWA studies are typically performed using population-based cases and controls that are ascertained irrespective of their family history for the disease of interest. Few genetic association studies have taken full advantage of the considerable resources that are available from the linkage-based family collections despite evidence showing cases that have a positive family history of disease are more likely to carry common genetic variants associated with disease susceptibility. Herein, we argue that population stratification is still a concern in case-control genetic association studies, despite the development of analytic methods designed to account for this source of confounding, for a subset of SNPs in the genome, most notably those SNPs in regions involved with natural selection. We note that current analytic approaches designed to address the issue of population stratification in case-control studies cannot definitively distinguish between true and false associations and we argue that family-based samples can still serve an invaluable role in following-up findings from case-control studies.
population stratification; prostate cancer; association; 8q24
DNA sequence variants (DSVs) are major components of the “causal field” for virtually all medical phenotypes, whether single-gene familial disorders or complex traits without a clear familial aggregation. The causal variants in single gene disorders are necessary and sufficient to impart large effects. In contrast, complex traits are due to a much more complicated network of contributory components that in aggregate increase the probability of disease. The conventional approach to identification of the causal variants for single gene disorders is genetic linkage. However, it does not offer sufficient resolution to map the causal genes in small families or sporadic cases. The approach to genetic studies of complex traits entails candidate gene studies or Genome Wide Association Studies (GWAS). GWAS provide an unbiased survey of the effects of common genetic variants (common disease-common variant hypothesis). GWAS have led to identification of a large number of alleles for various cardiovascular diseases. However, common alleles account for a relatively small fraction of the total heritability of the traits. Accordingly, the focus has shifted toward identification of rare variants that might impart larger effect sizes (rare variant-common disease hypothesis). This shift is made feasible by recent advances in massively parallel DNA sequencing platforms, which afford the opportunity to identify virtually all common as well as rare alleles in individuals. In this review, we discuss various strategies that are used to delineate the genetic contribution to medically important cardiovascular phenotypes, emphasizing the utility of the new deep sequencing approaches.
Genetics; Next-Generation Sequencing; Complex traits; Polymorphism
In the past decade, significant progress in genomic medicine and technological advances have revolutionized our approach to common complex disorders in many areas of medicine, including ophthalmology. A major disorder that still needs major genetic progress is diabetic retinopathy (DR), one of the leading causes of blindness in adults.
To perform a literature review, present the current findings, and highlight some key challenges.
Thorough literature review of the genetic factors for DR, including heritability scores, twin studies, family studies, candidate gene studies, linkage studies, and genome-wide association studies (GWAS).
While there is clear demonstration of a genetic contribution in the development and progression of DR, the identification of susceptibility loci through candidate gene approaches, linkage studies, and GWAS is still in its infancy. The greatest obstacles remain a lack of power due to small sample size of available studies and a lack of phenotype standardization. In this review, we also discuss novel technologies and novel approaches, such as intermediate phenotypes for biomarkers, proteomics, metabolomics, exome chips, and next-generation sequencing that may facilitate future studies of DR.
Conclusions and Relevance
The field of the genetics of DR is still in its infancy and is a challenge due to the complexity of the disease itself. This review outlines some strategies and lessons for future investigation to improve our understanding of this most complex of genetic disorders.
Within the last 3 years, genome-wide association studies (GWAS) have had unprecedented success in identifying loci that are involved in common diseases. For example, more than 35 susceptibility loci have been identified for type 2 diabetes and 32 for obesity thus far. However, the causal gene and variant at a specific linkage disequilibrium block is often unclear. Using a combination of different mouse alleles, we can greatly facilitate the understanding of which candidate gene at a particular disease locus is associated with the disease in humans, and also provide functional analysis of variants through an allelic series, including analysis of hypomorph and hypermorph point mutations, and knockout and overexpression alleles. The phenotyping of these alleles for specific traits of interest, in combination with the functional analysis of the genetic variants, may reveal the molecular and cellular mechanism of action of these disease variants, and ultimately lead to the identification of novel therapeutic strategies for common human diseases. In this Commentary, we discuss the progress of GWAS in identifying common disease loci for metabolic disease, and the use of the mouse as a model to confirm candidate genes and provide mechanistic insights.
Genome-wide association studies (GWAS) have been applied to various gastrointestinal and liver diseases in recent years. A large number of susceptibility genes and key biological pathways in disease development have been identified. So far, studies in inflammatory bowel diseases, and in particular Crohn’s disease, have been especially successful in defining new susceptibility loci using the GWAS design. The identification of associations related to autophagy as well as several genes involved in immunological response will be important to future research on Crohn’s disease. In this review, key methodological aspects of GWAS, the importance of proper cohort collection, genotyping issues and statistical methods are summarized. Ways of addressing the shortcomings of the GWAS design, when it comes to rare variants, are also discussed. For each of the relevant conditions, findings from the various GWAS are summarized with a focus on the affected biological systems.
Genome-wide association studies; Inflammatory bowel disease; Gastroenterology; Hepatology
We conducted a systematic study of top susceptibility variants from a genome-wide association (GWA) study of Bipolar Disorder to gain insight into the functional consequences of genetic variation influencing disease risk. We report here the results of experiments to explore the effects of these susceptibility variants on DNA methylation and mRNA expression in human cerebellum samples. Among the top susceptibility variants, we identified an enrichment of cis regulatory loci on mRNA expression (eQTLs), and a significant excess of quantitative trait loci for DNA CpG methylation, hereafter referred to as mQTLs. Bipolar Disorder susceptibility variants that cis-regulate both cerebellar expression and methylation of the same gene are a very small proportion of Bipolar Disorder susceptibility variants. This finding suggests that mQTLs and eQTLs provide orthogonal ways of functionally annotating genetic variation within the context of studies of pathophysiology in brain. No lymphocyte mQTL enrichment was found, suggesting that mQTL enrichment was specific to the cerebellum, in contrast to eQTLs. Separately, we found that using mQTL information to restrict the number of SNPs studied enhances our ability to detect a significant association. With this restriction a priori informed by the observed functional enrichment, we identified a significant association (rs12618769, Pbonferroni<0.05) from two other GWA studies (TGen+GAIN; 2,191 cases and 1,434 controls) of Bipolar Disorder, which we replicated in an independent GWA study (WTCCC). Collectively, our findings highlight the importance of integrating functional annotation of genetic variants for gene expression and DNA methylation to advance biological understanding of Bipolar Disorder.
In recent years, GWA studies have successfully identified common SNPs associated with complex diseases. However, most of the variants found this way account for only a small portion of the trait variance. This fact has led researchers to focus on rare-variant mapping with large-scale sequencing, which can be facilitated by using linkage information. The question arises why linkage analysis often fails to identify genes when analyzing complex diseases. Using simulations, we have investigated the power of parametric and nonparametric linkage statistics (KC-LOD, NPL, LOD and MOD scores) to detect the effect of genes responsible for complex diseases using different pedigree structures.
As expected, a small number of pedigrees with fewer than three affected individuals has low power to map disease genes with modest effect. Interestingly, the power decreases when unaffected individuals are included in the analysis, irrespective of the true mode of inheritance. Furthermore, we found that the best performing statistic depends not only on the type of pedigrees but also on the true mode of inheritance.
When applied in a sensible way, linkage is an appropriate and robust technique to map genes for complex disease. Unlike association analysis, linkage analysis is not hampered by allelic heterogeneity. So, why does linkage analysis often fail with complex diseases? Evidently, when using an insufficient number of small pedigrees, one might miss a true genetic linkage even though a real effect exists. Furthermore, we show that the test statistic has an important effect on the power to detect linkage as well. Therefore, a linkage analysis might fail if an inadequate test statistic is employed. We provide recommendations regarding the most favorable test statistics, in terms of power, for a given mode of inheritance and type of pedigrees under study, in order to reduce the probability of missing a true linkage.
Linkage; Parametric analysis; Nonparametric analysis; NPL score; LOD score; MOD score; Complex diseases; Rare variants
Genome-wide association studies (GWAS) have identified reproducible genetic associations with hundreds of human diseases and traits. The vast majority of these associated single nucleotide polymorphisms (SNPs) are non-coding, highlighting the challenge in moving from genetic findings to mechanistic and functional insights. Nevertheless, large-scale (epi)genomic studies and bioinformatic analyses strongly suggest that GWAS hits are not randomly distributed in the genome but rather pinpoint specific biological pathways important for disease development or phenotypic variation. In this review, we focus on GWAS discoveries for the three main blood cell types: red blood cells, white blood cells and platelets. We summarize the knowledge gained from GWAS of these phenotypes and discuss their possible clinical implications for common (e.g., anemia) and rare (e.g., myeloproliferative neoplasms) human blood-related diseases. Finally, we argue that blood phenotypes are ideal to study the genetics of complex human traits because they are fully amenable to experimental testing.
GWAS; hemoglobin; hematocrit; red blood cell; erythrocyte; white blood cell; leukocyte; platelet; human genetics
The identification of complex disease susceptibility loci through genome-wide association studies (GWAS) has recently become possible and is now a method of choice for investigating the genetic basis of complex traits. The number of results from such studies is constantly increasing, but the challenge ahead is to identify the biological context in which these statistically significant candidate variants act. Regulatory variation plays an important role in shaping phenotypic differences among individuals and thus is very likely to also influence disease susceptibility. As such, integrating gene expression data and other disease-relevant intermediate phenotypes with GWAS results could potentially help prioritize fine-mapping efforts and provide a shortcut to disease biology. Combining these different levels of information in a meaningful way is, however, not trivial. In the present review, we outline the approaches that have been explored so far to this end and their achievements. We also discuss the limitations of the methods and how upcoming technological developments could help circumvent these limitations. Overall, such efforts will be very helpful in understanding regulatory effects on disease in the first instance and disease etiology in general.
Genome-wide association studies (GWAS) have been used successfully in detecting associations between common genetic variants and complex diseases. However, common SNPs detected by current GWAS only explain a small proportion of heritable variability. With the development of next-generation sequencing technologies, researchers find more and more evidence to support the role played by rare variants in heritable variability. However, rare and common variants are often studied separately. The objective of this paper is to develop a robust strategy to analyze association between complex traits and genetic regions using both common and rare variants.
We propose a weighted selective collapsing strategy for both candidate gene studies and genome-wide association scans. The strategy considers genetic information from both common and rare variants, selectively collapses all variants in a given region by a forward selection procedure, and uses an adaptive weight to favor more likely causal rare variants. Under this strategy, two tests are proposed. One test, denoted by BwSC, is sensitive to the directions of genetic effects, and it separates the deleterious and protective effects into two components. The other, denoted by BwSCd, is robust to the directions of genetic effects, and it considers the difference of the two components. In our simulation studies, BwSC achieves a higher power when the causal variants have the same genetic effect, while BwSCd is as powerful as several existing tests when a mixed genetic effect exists. Both of the proposed tests work well with and without the existence of genetic effects from common variants.
Two tests using a weighted selective collapsing strategy provide potentially powerful methods for association studies of sequencing data. The tests have a higher power when both common and rare variants contribute to the heritable variability and the effect of common variants is not strong enough to be detected by traditional methods. Our simulation studies have demonstrated a substantially higher power for both tests in all scenarios regardless of whether the common SNPs are associated with the trait or not.
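To make the general idea of weighted collapsing concrete, here is a minimal sketch of a plain weighted burden score. It is not BwSC or BwSCd: it omits the forward selection step and the separation of deleterious and protective components, and it implicitly assumes that all variants act in the same direction. The frequency-based weights, the function names, and the toy genotypes are all assumptions chosen for illustration.

```python
import math


def variant_weights(genotypes, is_case):
    """Weight each variant by 1/sqrt(q(1-q)), with q estimated in controls.

    genotypes: one list of minor-allele counts (0, 1 or 2) per individual.
    Rarer variants receive larger weights. The +1/+2 terms are an assumed
    smoothing so that variants unseen in controls still get a finite weight.
    """
    controls = [row for row, case in zip(genotypes, is_case) if not case]
    weights = []
    for j in range(len(genotypes[0])):
        q = (sum(row[j] for row in controls) + 1) / (2 * len(controls) + 2)
        weights.append(1.0 / math.sqrt(q * (1 - q)))
    return weights


def burden_scores(genotypes, weights):
    """Collapse each individual's genotypes in the region into one weighted score."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def burden_z(scores, is_case):
    """Welch-type z statistic comparing mean burden in cases versus controls."""
    cases = [s for s, c in zip(scores, is_case) if c]
    ctrls = [s for s, c in zip(scores, is_case) if not c]

    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    se = math.sqrt(var(cases) / len(cases) + var(ctrls) / len(ctrls))
    return (mean(cases) - mean(ctrls)) / se


# Toy region with three variants; the first four individuals are cases.
toy_genotypes = [
    [1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1],
    [0, 0, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0],
]
toy_is_case = [True, True, True, True, False, False, False, False]

toy_weights = variant_weights(toy_genotypes, toy_is_case)
toy_scores = burden_scores(toy_genotypes, toy_weights)
print(burden_z(toy_scores, toy_is_case))
```

In practice the significance of such a statistic is usually assessed by permuting case-control labels, because weights estimated from the same data make the nominal distribution unreliable.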
Multiple genome-wide association studies (GWASs) and two large scale meta-analyses have been performed for Crohn's disease and have identified 71 susceptibility loci. These findings have contributed greatly to our current understanding of the disease pathogenesis. Yet, these loci only explain approximately 23% of the disease heritability. One of the future challenges in this post-GWAS era is to identify potential sources of the remaining heritability. Such sources may include common variants with limited effect size, rare variants with higher effect sizes, structural variations, or even more complicated mechanisms such as epistatic, gene-environment and epigenetic interactions. Here, we outline potential sources of this hidden heritability, focusing on Crohn's disease and the currently available data. We also discuss future strategies to determine more about the heritability; these strategies include expanding current GWAS, fine-mapping, whole genome sequencing or exome sequencing, and using family-based approaches. Despite the current limitations, such strategies may help to transfer research achievements into clinical practice and guide the improvement of preventive and therapeutic measures.
Genome-wide association studies (GWAS) have identified around 60 common variants associated with multiple sclerosis (MS), but these loci only explain a fraction of the heritability of MS. Some missing heritability may be caused by rare variants that have been suggested to play an important role in the aetiology of complex diseases such as MS. However, current genetic and statistical methods for detecting rare variants are expensive and time consuming. ‘Population-based linkage analysis’ (PBLA), or so-called identity-by-descent (IBD) mapping, is a novel way to detect rare variants in extant GWAS datasets. We employed BEAGLE fastIBD to search for rare MS variants utilising IBD mapping in a large GWAS dataset of 3,543 cases and 5,898 controls. We identified a genome-wide significant linkage signal on chromosome 19 (LOD = 4.65; p = 1.9×10^-6). Network analysis of cases and controls sharing haplotypes on chromosome 19 further strengthened the association, as there are more large networks of cases sharing haplotypes than of controls. This linkage region includes a cluster of zinc finger genes of unknown function. Analysis of genome-wide transcriptome data suggests that genes in this zinc finger cluster may be involved in very early developmental regulation of the CNS. Our study also indicates that BEAGLE fastIBD allowed identification of rare variants in a large unrelated population with moderate computational intensity. Even with the development of whole-genome sequencing, IBD mapping may still be a promising way to narrow down the region of interest for sequencing priority.
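The LOD score and p-value reported for the chromosome 19 signal can be related through the standard asymptotic conversion, in which 2·ln(10)·LOD is treated as a chi-square statistic with one degree of freedom and the test is one-sided. The sketch below assumes that conversion; the abstract does not state how its p-value was obtained, and the function name `lod_to_pvalue` is hypothetical.

```python
import math


def lod_to_pvalue(lod):
    """One-sided asymptotic p-value for a LOD score.

    Assumes 2 * ln(10) * LOD follows a 50:50 mixture of a point mass at
    zero and a chi-square distribution with one degree of freedom.
    """
    chi2 = 2 * math.log(10) * lod
    # For 1 df, P(chi-square > x) equals erfc(sqrt(x / 2)); halve for one side.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))


print(f"{lod_to_pvalue(4.65):.2e}")
```

Under this assumption, a LOD of 4.65 gives a p-value of about 1.8×10^-6 to 1.9×10^-6, in line with the reported figure, and the classical LOD threshold of 3 corresponds to a p-value of about 10^-4.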
Genome-wide association studies are providing new insights into the genetic basis of metabolic and cardiovascular traits. In the past 3 years, common variants in ∼50 loci have been strongly associated with metabolic and cardiovascular traits. Several of these loci have implicated genes without a previously known connection with metabolism. Further studies will be required to characterize the full impact of these loci on metabolism. Many of the identified loci include multiple independent variants that influence the same metabolic or cardiovascular trait and a few loci harbor independent variants that each influence distinct traits. The total proportion of trait heritability explained by variants identified so far is still modest (typically <10%). Future studies will build on these successes by identifying additional common and rare variants and by determining the functional impact of the underlying alleles and genes.
Obesity is a classical complex trait, influenced by both genetic and lifestyle factors. The number of obesity gene variants is currently unknown but, based on sound evolutionary principles, likely to be many, each with a modest effect on the phenotype. Recent advances in our knowledge of variation in the human genome and high throughput genotyping technologies have made possible genome-wide association (GWA) analysis and the identification of bona fide susceptibility genes for many complex diseases and phenotypes, including obesity and its comorbid conditions. GWA analysis in even larger numbers of individuals through collaborative efforts of many investigators will likely identify those polygenes of moderate and modest effect size that manifest in our typical environment. Once the subset of real-world-relevant obesity susceptibility variants is identified, follow-up studies, including detailed molecular analysis of the loci, stratified analyses, prospective and interventional studies in humans, and mechanistic studies in cells and animals will allow us to define the genetic architecture of the locus and dissect how these genes interact with specific environmental and other factors. The molecular and analytical tools to accomplish these goals are now in hand, but cooperation among investigators will be necessary to amass the requisite numbers of phenotyped and genotyped individuals. Identification of susceptibility genes for obesity and determining how they interact with each other and the environment will lead to new insights into the molecular, cellular, and physiological basis of energy homeostasis, and novel strategies for prevention and treatment.
Genetic association and linkage studies can provide insights into complex disease biology, guiding the development of new diagnostic and therapeutic strategies. Over the past decade, genetic association studies have largely focused on common, easy to measure genetic variants shared between many individuals. These common variants typically have subtle functional consequence and translating the resulting association signals into biological insights can be challenging. In the last few years, exome sequencing has emerged as a cost-effective strategy for extending these studies to include rare coding variants, which often have more marked functional consequences. Here, we provide practical guidance in the design and analysis of complex trait association studies focused on rare, coding variants.
There has been considerable progress in our understanding of the genetic architecture of susceptibility to inflammatory diseases in recent years: several hundred susceptibility loci have been discovered in genome-wide association studies (GWAS) of human populations. This success has created an important challenge in identifying the functional consequences of these risk-associated variants and in elucidating how the repercussions of individual susceptibility loci integrate to yield dysregulation of immune pathways and, ultimately, syndromic clinical phenotypes. The integration of GWAS association signals with high-resolution transcriptome and other genomic data that capture the dynamics of cellular state and function in the context of individual's collection of susceptibility alleles has proven to be a successful avenue of investigation. The rapid pace of methodological development in this area has been coupled with an accumulation of experimental data that makes the elucidation of complex biological networks underlying susceptibility to these common inflammatory diseases a reasonable goal in the near future.
Coronary atherosclerosis is a complex heritable trait with an enigmatic genetic etiology. Genome-wide association studies (GWAS) have successfully led to identification of over 100 different loci for susceptibility to coronary atherosclerosis. Most identified single nucleotide polymorphisms (SNPs) and genes have not been previously implicated in the pathogenesis of atherosclerosis and hence have modest biological plausibility. The novel discoveries, however, might provide the opportunity for identification of new pathways and consequently novel preventive and therapeutic targets. A notable outcome of GWAS is the relatively modest effect sizes of the associated SNPs. Collectively, the identified SNPs account for a relatively small fraction of the heritability of coronary atherosclerosis, which raises the question of “missing heritability”. Because GWAS test the common disease-common variant hypothesis, a plausible explanation might be the presence of uncommon and rare variants in the genome that are untested in GWAS but that might exert large effect sizes on the risk of atherosclerosis. The latter, however, remains an empirical question pending validation through experimentation. Alternative mechanisms, such as transgenerational epigenetics including microRNAs, might in part account for the heritability of coronary atherosclerosis. Collectively, the recent findings are indicative of the etiological complexity of coronary atherosclerosis. Hence, it is expected that the genetic etiology of coronary atherosclerosis will remain enigmatic for the foreseeable future.
Atherosclerosis; Coronary artery disease; Genetics; GWAS; Polymorphism
In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) have increased considerably, with the ability to genotype over 1 million SNP markers across the genome. This advancement in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have implicated over 1500 SNPs associated with disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve the refining of these putative loci.
A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by the association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost-effective approach would be to sequence a portion of the individuals, followed by the application of genotype imputation methods for imputing markers in the remaining individuals. A potentially attractive alternative option would be to impute based on the 1000 Genomes Project; however, this has the drawback of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out the imputation using a reference panel consisting of sequence data for a fraction of the study participants, using data from both a candidate gene sequencing study and the 1000 Genomes Project.
Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results indicate the following sequencing study design guidelines which take advantage of the recent advances in genotype imputation methodology: Select the largest and most diverse reference panel for sequencing and genotype as many “anchor” markers as possible.