Although genetic studies have been critically important for the identification of therapeutic targets in Mendelian disorders, genetic approaches aiming to identify targets for common, complex diseases have traditionally had much more limited success. However, during the past year, a novel genetic approach — genome-wide association (GWA) — has demonstrated its potential to identify common genetic variants associated with complex diseases such as diabetes, inflammatory bowel disease and cancer. Here, we highlight some of these recent successes, and discuss the potential for GWA studies to identify novel therapeutic targets and genetic biomarkers that will be useful for drug discovery, patient selection and stratification in common diseases.
Genetic factors are known to have an important role in many common diseases, and the identification of genetic determinants for such diseases has the potential to provide insights into disease pathogenesis, revealing novel therapeutic targets or strategies. Genetic factors could also provide useful biomarkers for diagnosis, patient stratification and prognostic or therapeutic categorization. In addition, given that inherited genetic factors are present at birth, knowledge of these factors could facilitate timely preventative or ameliorative interventions.
During the past 25 years, genetic linkage-based studies have proved very effective in identifying causal genetic factors in Mendelian (single gene) disorders; causal genes for more than 1,300 dominant and recessive Mendelian diseases have been identified1. Most common diseases and endophenotypes, however, do not exhibit Mendelian inheritance, but rather feature complex, multifactorial expression and inheritance. Although linkage-based methods have been broadly applied, these studies have had little success in identifying the allelic determinants of common disorders2. In particular, there has been poor replication among studies, whereby an initial study identifies an allele (genotype) with large estimated genetic effects (relative risk) but subsequent studies fail to corroborate the results3,4. In part, this reflects the dependence of linkage-based studies on unusually informative families (with multiple affected and unaffected individuals), which induce a bias toward rare, semi-Mendelian disease subsets in subpopulations. Reports of successful identification of genetic variants in common diseases using an approach that circumvents this limitation — genome-wide association (GWA) studies — have therefore generated considerable excitement.
Human GWA studies are based on three hypotheses: first, the common trait/common variant hypothesis proposes that the genetic architecture of complex traits consists of a limited number of common alleles, each conferring a small increase in risk to the individual5,6; second, the brief history of most human populations precludes sufficient generations (or meioses) to create recombination events (or mutations) between closely located, common (ancient) variants; and, third, suppression of meiotic recombination (coldspots) occurs very frequently. Thus, approximately 80% of the human genome is composed of regions of around 10 kb that exhibit reduced recombination in human populations (haplotypes)7. Genetic variants (alleles) within haplotypes are in linkage disequilibrium (LD). This phenomenon enables much of the recombination history in a population to be ascertained by genotyping a large set of well-spaced, common (ancient) variants throughout the genome, especially if variant selection is informed by knowledge of haplotypes. During the last 10 years, more than 10 million single nucleotide polymorphisms (SNPs) have been identified8. Furthermore, the International HapMap project has genotyped approximately 4 million common SNPs (occurring with a minor-allele frequency of more than 5%) in human populations and has assembled these genotypes computationally into a genome-wide map of SNP-tagged haplotypes7. These resources, together with array technologies for massively parallel SNP genotyping and the well-established epidemiological case-control association studies, have rendered GWA feasible (BOX 1, FIG. 1).
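To make the notion of LD concrete, the following minimal sketch computes three standard pairwise LD statistics (D, |D′| and r²) for two biallelic loci from haplotype and allele frequencies. The function name and the example frequencies are illustrative assumptions and are not taken from any of the studies discussed here.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    All inputs are hypothetical population frequencies in the open interval (0, 1).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic (0 < frequency < 1)")

    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D_max: largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))

    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two loci.
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r_squared


# Illustrative values: allele A always travels with allele B (complete LD).
d, d_prime, r2 = linkage_disequilibrium(p_ab=0.3, p_a=0.3, p_b=0.3)
print(f"D={d:.3f}  |D'|={d_prime:.3f}  r^2={r2:.3f}")
```

When r² between a genotyped SNP and an ungenotyped variant is close to 1, the genotyped SNP serves as a good proxy (tag) for that variant, which is the property that tag-SNP arrays exploit.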
Initial genetic association studies focused on candidate loci and exhibited a lack of replication among studies9,10. There were biological explanations for inconsistent results: unobserved, confounding biological sources of heterogeneity, including inconsistent or poorly defined measurements of the phenotype, heterogeneous genetic sources for the phenotype (genocopies), population stratification (ethnic ancestry), population-specific LD, heterogeneous genetic and epigenetic backgrounds or heterogeneous environmental influences (phenocopies). In addition, there were statistical reasons for irreproducibility, including failure to control the rate of false discoveries, model misspecification and heterogeneous bias in estimated effects among studies11–14. Also, a frequent source of non-replication was lack of power due to the limited number of individuals genotyped and phenotyped15,16.
In order to ameliorate poor replication, GWA experiments employ multi-tiered experimental designs with discovery, replication and biological validation stages17 (FIG. 1). Tiered designs are critical for cost-effective detection of meaningful, hypothesis-generating, genotype–phenotype associations given the large number of comparisons involved, prior probability estimates of association, sample sizes, resampling procedures and statistical significance thresholds. GWA studies also owe their statistical power to their large cohort size and high rate of SNP detection. Currently, a respected threshold for uncorrected, significant associations is P < 5 × 10⁻⁷ (REFS 18,19). Alleles with moderately less significant associations, however, are often also reported, as they might indicate loci that reach the aforementioned threshold in subsequent studies.
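As an illustration of how a single SNP is scored against such a threshold, the sketch below runs a basic allelic case-control test: a 2 × 2 chi-square test with one degree of freedom and no continuity correction, together with the allelic odds ratio. This is one simple choice among several test statistics used in practice, not necessarily the one used in the studies cited. The allele counts and function name are hypothetical.

```python
import math

# Significance threshold quoted in the text for uncorrected associations.
GENOME_WIDE_THRESHOLD = 5e-7


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Allelic chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (odds_ratio, chi_square, p_value). All four counts are assumed
    to be positive integers; they are hypothetical inputs for illustration.
    """
    n = case_risk + case_other + control_risk + control_other
    cross_diff = case_risk * control_other - case_other * control_risk
    denominator = (
        (case_risk + case_other)
        * (control_risk + control_other)
        * (case_risk + control_risk)
        * (case_other + control_other)
    )
    chi_square = n * cross_diff ** 2 / denominator
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # For one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 1,000 cases and 1,000 controls (2,000 alleles each).
odds_ratio, chi_square, p_value = allelic_association(600, 1400, 500, 1500)
passes = p_value < GENOME_WIDE_THRESHOLD
print(f"OR={odds_ratio:.2f}  chi2={chi_square:.2f}  P={p_value:.2e}  passes={passes}")
```

With these illustrative counts the association is nominally significant but falls well short of the genome-wide threshold, which is the situation in which larger replication cohorts become decisive.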
The first GWA study, published in 2002, evaluated acute myocardial infarction (AMI)20. The discovery, or nomination, phase comprised the examination of genotype–phenotype association signals in 65,671 coding domain SNPs (cSNPs) in 752 cases and controls (TABLE 1). Although subsequent studies have used up to 20 times this number of non-coding SNPs, gene-tagging SNPs are more informative, as the majority of true-positive associations are expected to be with genes1. Even more informative are screens that employ functional cSNPs, such as nonsynonymous SNPs (nsSNPs), that are candidate, causal (risk-enhancing) gene alleles1,21–28. The replication, or confirmatory, phase examined associations of 26 SNPs in 2,137 individuals and confirmed association of AMI with a 50 kb region containing lymphotoxin-α (LTA), nuclear factor of kappa light polypeptide gene enhancer in B cells (also known as RELA), nuclear factor of kappa light polypeptide gene enhancer in B cells inhibitor-like 1 (NFKBIL1) and human leukocyte antigen (HLA)-B associated transcript 1 (BAT1) genes. Additional replication studies have been undertaken, some of which have confirmed an association of this region with AMI-related phenotypes and, in particular, one nsSNP in LTA29–35. The association of LTA with AMI was an unexpected finding, suggesting a novel therapeutic target.
A second, pioneering GWA study examined age-related macular degeneration (AMD)36 (TABLE 1). The discovery phase sought associations of 105,980 SNPs with AMD in 96 cases and 50 control individuals. Despite the small cohort size, SNPs in the complement factor H (CFH) gene, including an nsSNP, showed significant association with AMD. Replication was not performed, but subsequent studies have replicated associations of CFH alleles with AMD37–40. Of all common diseases examined by GWA to date, AMD is unique in that a single haplotype explains 61% of the genetic variance, conferring a homozygous odds ratio of 7.4. To put this in perspective, this is of a similar magnitude to the classic associations of HLA-B27 with anterior uveitis/ankylosing spondylitis and HLA alleles with type 1 diabetes mellitus (T1DM). Complement pathway dysregulation was a novel, unexpected association with AMD. Subsequent studies have shown an association of AMD with two additional members of the alternative complement pathway (factor B (CFB) and C3)41,42. These findings, together with biological validation studies, have led to the initial development of new AMD therapies, based upon complement inhibition.
In the past year, technical challenges associated with GWA were largely overcome, genotyping costs were decreased and a significant number of studies have used SNP genotyping arrays in larger population groups to produce replicated associations between individual SNP alleles and common diseases.
Five large GWA studies have examined Crohn’s disease and ulcerative colitis, two histologically distinct types of inflammatory bowel disease (IBD) (TABLE 1). Four of the studies used micro-arrays featuring between 300,000 and 400,000 SNPs18,43–45, whereas the fifth study genotyped approximately 16,000 nsSNPs24. Two follow-up studies sought to replicate the most significant signals from the Wellcome Trust case control consortium (WTCCC) study18, one in a European population and another in a Japanese population46,47. The European study replicated significant signals of the WTCCC study, but some of the alleles failed to reach significance in the Japanese study and others were not detected. The failure to replicate signals in different studies might reflect true differences between populations, differences in phenotype ascertainment or a lack of power.
Considering the six studies of European populations, there was significant replication of specific allele associations with Crohn’s disease (TABLES 1,2). Three associations were concordant in four out of five studies (representing the genes caspase recruitment domain 15 protein (CARD15, also known as NOD2), interleukin 23 receptor (IL23R) and ATG16 autophagy related 16-like 1 (ATG16L1)). Of note, CARD15 had previously been identified as a susceptibility gene by linkage-based approaches48,49. One gene, prostaglandin E receptor 4 (PTGER4), showed association in two out of five studies. In addition, several disease-associated intergenic segments have been replicated. IBD susceptibility genes that have been identified to date appear to coalesce into biological networks involving innate immunity, autophagy and phagocytosis50. In addition, alleles of two genes associated with Crohn’s disease (IL23R and PTPN2) have shown association with other autoimmune disorders21,51, suggesting the existence of autoimmune susceptibility ‘supergenes’. There is great interest in alleles that exhibit pleiotropic associations, as they potentially represent blockbuster targets that cross over therapeutic categories (TABLE 2).
In common with most GWA studies to date, estimated genetic effects (relative risks) of IBD-associated loci are small18,46. However, as many of these variants were common, the population attributable risk — an estimate of the percentage of cases of disease that would be avoided if the allele(s) were absent — was substantial. Of several studies that looked for epistatic interactions between IBD association signals, two found suggestive evidence of epistasis involving two different pairs of genes24,52.
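The dependence of population attributable risk on allele frequency can be illustrated with Levin's textbook formula, PAR = p(RR − 1) / (1 + p(RR − 1)), where p is the frequency of risk-genotype carriers in the population and RR is their relative risk. The sketch below assumes this formulation; the studies reviewed here may have used other estimators, and the frequencies and relative risks shown are hypothetical.

```python
def population_attributable_risk(carrier_freq, relative_risk):
    """Levin's formula for population attributable risk (as a fraction).

    carrier_freq  : proportion of the population carrying the risk genotype
    relative_risk : risk in carriers divided by risk in non-carriers
    Both arguments are hypothetical illustration values.
    """
    if not 0.0 <= carrier_freq <= 1.0:
        raise ValueError("carrier_freq must lie between 0 and 1")
    if relative_risk < 1.0:
        raise ValueError("this sketch only handles risk-increasing alleles")
    excess = carrier_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# A common variant with a small effect versus a rare variant with a large effect.
common_small = population_attributable_risk(carrier_freq=0.50, relative_risk=1.2)
rare_large = population_attributable_risk(carrier_freq=0.01, relative_risk=3.0)
print(f"common, small effect: {common_small:.1%}")
print(f"rare, large effect:   {rare_large:.1%}")
```

In this example the common allele with a relative risk of only 1.2 accounts for roughly 9% of cases, several times more than the rare allele with a relative risk of 3.0, which is why small estimated effects can still translate into substantial PAR values.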
A good example of the capabilities and limitations of GWA studies is type 2 diabetes mellitus18,53–57 (T2DM; TABLE 1). Two studies examined association both with SNPs and haplotypes in the discovery phase54,57. Haplotype-based analysis can be more powerful than marker-by-marker analysis in association studies22,58–61. For example, haplotypes can correlate a specific phenotype with a specific gene in a small population sample even when individual SNPs cannot62. Case-control and family-based association studies were employed in several studies of T2DM.
The replication phases of these studies were impressive; two of them included over 9,000 replication individuals53,54. One study sought to replicate signals identified by the WTCCC study18 by genotyping the most significant SNPs; 9 of 77 candidate SNPs reached a P < 5 × 10⁻⁷ significance level63. The eight genes represented by these SNPs were replicated in at least one other independent study (TABLES 1–3). The concordance of T2DM-associated genes between GWA studies is striking: of 10 novel associations, only two were unique to a single study.
Reassuringly, some of the genes identified by GWA in studies of T2DM have previously been associated with the disease in other types of genetic studies. For example, transcription factor 7-like 2 (TCF7L2) had previously shown linkage to T2DM in the Icelandic population, and significant association in a candidate gene association study64. Heterozygous and homozygous carriers of TCF7L2 risk alleles had relative risks of 1.45 and 2.41, respectively. TCF7L2 is a transcription factor that regulates the pro-glucagon gene in entero-endocrine cells65. TCF7L2 alleles have also shown associations with endophenotypes such as a lower likelihood of response to the oral hypoglycaemic drug sulphonylurea66 and increased risk of progression to T2DM among persons with impaired glucose tolerance67.
In common with IBD, T2DM associations exhibited small estimated-effect sizes. Some of the candidate genes from GWA studies were consistent with biological processes that have previously been implicated in the pathogenesis of T2DM, such as pancreatic islet beta-cell function and insulin biosynthesis. However, these studies also suggested new components of these processes, such as zinc transport and Wnt-signalling56,64,68. Validation of T2DM candidate genes as therapeutic targets will require additional studies to identify causal susceptibility alleles and to determine their precise effect on cell biology.
Three studies performed initial modelling of how loci combine to affect susceptibility to T2DM56,63,69. One study found evidence of epistatic interactions between two genes. Otherwise, T2DM appeared to fit a polygenic threshold model with additive/multiplicative effects of individual loci. However, until the causal alleles that underpin these association signals have been found, it is not possible to make categorical statements about the allelic architecture of T2DM.
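A multiplicative model of the kind mentioned above can be sketched in a few lines: each copy of a risk allele multiplies an individual's relative risk by that locus's per-allele relative risk, and loci combine by multiplication with no interaction (epistasis) terms. The function name, the second locus and its per-allele value are hypothetical; only the heterozygous TCF7L2 relative risk of 1.45 comes from the text.

```python
from math import prod


def combined_relative_risk(per_allele_rr, risk_allele_counts):
    """Relative risk under a purely multiplicative model with no epistasis.

    per_allele_rr      : per-allele relative risk at each locus
    risk_allele_counts : number of risk alleles (0, 1 or 2) carried at each locus
    """
    if len(per_allele_rr) != len(risk_allele_counts):
        raise ValueError("one allele count is required per locus")
    if any(count not in (0, 1, 2) for count in risk_allele_counts):
        raise ValueError("allele counts must be 0, 1 or 2")
    return prod(rr ** count for rr, count in zip(per_allele_rr, risk_allele_counts))


# Locus 1 uses the heterozygous TCF7L2 relative risk quoted in the text (1.45);
# locus 2 and its per-allele relative risk of 1.2 are hypothetical.
per_allele = [1.45, 1.2]
print(combined_relative_risk(per_allele, [1, 0]))  # heterozygous at locus 1 only
print(combined_relative_risk(per_allele, [2, 1]))  # homozygous at locus 1, heterozygous at locus 2
```

Under this strict per-allele model, a homozygote at the first locus would have a relative risk of 1.45² ≈ 2.10, somewhat below the homozygous value of 2.41 reported for TCF7L2, a reminder that such models are approximations to be tested against data rather than assumed.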
Frequencies of T2DM associated alleles showed considerable variation between ethnic and racial groups. Despite these differences, however, T2DM-associated risk alleles were conserved between independent populations, implying an ancient origin of these polymorphisms70.
Expansion of an initial association of an allele with a categorical trait (such as the presence of a disease) to quantitative component phenotypes (endophenotypes) is an approach pioneered with apolipoprotein E (APOE) alleles in Alzheimer’s disease. It appears to be highly instructive in elucidating the mechanism of action of alleles in disease pathogenesis. One T2DM GWA extended its analysis to a quantitative endophenotype: T2DM-related obesity (measured by body-mass index (BMI); TABLE 1)54,71. Alleles associated with T2DM in the fat-mass and obesity-associated gene (FTO)18,55,63 also showed an association with BMI (TABLES 2,3). Association of FTO with obesity has since been confirmed72.
Two GWA studies examined T1DM. One examined 6,500 nsSNPs28 and the other evaluated 392,575 SNPs18. Four T1DM susceptibility loci had previously been identified by linkage-based methods (class II MHC alleles, CTLA4, PTPN22 and insulin). GWA studies replicated the association with PTPN22 and identified several novel loci, including C12orf30, KIAA0350 (also known as CLEC16A) and IFIH1 (each replicated in two studies). Twenty-one T1DM candidate genes that have previously shown linkage or association are currently undergoing replication studies73.
T1DM, like rheumatoid arthritis and IBD, is an autoimmune disorder. Medical practitioners have long noted familial aggregation of autoimmune diseases. One study showed association of both rheumatoid arthritis and T1DM with specific polymorphisms (IL2RA-rs2104286 and PTPN22-rs6679677; TABLE 2)18. T1DM, rheumatoid arthritis and IBD also show association with MHC alleles74–76. These findings suggest common underlying aetiological pathways (and therapeutic targets) for several, common autoimmune disorders77.
GWA studies of cancer based on common, inherited SNPs are useful for the identification of germ-line risk alleles, but not somatic mutations. Three GWA studies sought inherited association signals in prostate cancer53,78,79; FIG. 2 shows details of the discovery phase of one of these studies. An association signal at chromosome 8q24 that had previously been identified by linkage analysis80 was replicated in two GWA studies53,79. In addition, these studies identified a second 8q24 association, approximately 300 kb upstream from the first. As yet, the functional basis of these associations is unclear. Although individual 8q24 alleles showed modest estimated genetic effects, the cumulative effect of several loci fit a multiplicative model that conferred a population-attributable risk (PAR), that is, an expected reduction in prostate-cancer incidence if the risk alleles did not exist in the population, of up to 68%81. As noted above, PAR values are strongly affected by allele frequency and represent only an approximate measure of the contribution of those alleles to disease incidence.
One study of prostate cancer78 identified a TCF2 (also known as HNF1B) susceptibility allele. Intriguingly, this allele appeared to diminish the risk of T2DM (TABLE 2), possibly representing antagonistic pleiotropy. This is supported by epidemiological evidence which suggests that diabetic men have a slightly lower prostate cancer risk than non-diabetic men82.
In addition to common diseases, GWA studies are applicable to complex traits. One study undertook GWA with numerous quantitative and categorical memory-associated endophenotypes87. Despite a small discovery cohort (341 individuals), associations with the KIBRA (also known as WWC1) gene have been replicated87–89. A notable innovation in this study was that associations were sought with multi-scale and multi-modality endophenotypes; that is, performance in seven memory-associated tests and functional magnetic resonance image-based measures of the hippocampus during three memory-associated tests. This study provides evidence that progress can be made in the elucidation of the genetic determinants of subjective, qualitative neurologic traits by using objective, quantitative, surrogate endophenotypes.
As well as identifying novel associations, GWA studies have confirmed several susceptibility genes that were previously established by linkage analysis in large pedigrees. For example, a GWA study of late-onset Alzheimer’s disease (LOAD) identified the well-established APOE-susceptibility allele90. This association was also replicated in a study that genotyped 17,343 putative functional cSNPs23.
A remaining problem with large GWA studies is the cost of genotyping, but one study provided evidence that sample pooling strategies might help to overcome this issue. In a GWA study of bipolar disorder, investigators created 39 pools, containing DNA from 2,672 individuals91. These pools were used for both discovery and replication experiments. Pools were individually genotyped for 555,235 SNPs and normalized allele frequencies were inferred from intensity data. Replicates were assayed for each pool. Thirty-seven SNPs showing allele frequency differences in both cohorts were individually genotyped and one SNP retained a significant association. The aforementioned WTCCC study also studied bipolar disorder, identifying an association at 16p12 (REF. 18). One locus, for glutamate receptor, metabotropic 7 (GRM7), showed association in both studies.
The rate of publication of GWA studies continues to increase. Recent studies have investigated asthma92, nicotine dependence93, coronary artery disease19,26, atrial fibrillation94, prolonged QT interval and sudden cardiac death95,96, coeliac disease97, lung cancer98, psoriasis21 and liver cirrhosis25, among others (TABLE 1).
The utility of GWA studies for the identification of novel genomic associations with complex diseases has unambiguously been established over the past year. In general, GWA studies have employed large case–control cohorts featuring both familial and sporadic cases, categorical trait definitions and up to half a million commonly polymorphic SNPs. To date, with the exception of CFH in AMD, the estimated genetic effects of replicated associations have been uniformly and surprisingly small.
Encouragingly, most associated haplotype intervals identified to date are sufficiently small to feature a single gene. In large measure, this reflects the use of several, outbred populations in confirmatory, fine-mapping studies. Even when the association is within a single gene, the predisposing variant might affect an adjacent gene, as in adult lactose intolerance99. Although some association intervals have been found to contain a single, unequivocally functional gene variant, the causality of alleles has been established in only a minority of cases. Causal alleles identified to date do not yet show much difference in genetic mechanism from those identified in Mendelian disorders; this could reflect ascertainment bias1,83,84.
Many genes identified by GWA were not candidate genes previously, highlighting the hypothesis-informing value of genetic studies. Already, there are examples of potentially tractable therapeutic targets that had not previously been considered in a disease or trait. As yet, the confluence of associated genes into biological networks and pathways is at an early stage. In part, this reflects scant or incorrect annotation of many genes. There appears to be a significant conservation of associations of common alleles between human populations. Thus, to date, it appears that GWA studies are fulfilling expectations with regard to the elucidation of molecular mechanisms underpinning poorly understood, common diseases.
In the few informative studies reported to date, endophenotypes have been highly instructive in dissecting the network or pathway that is perturbed by an individual allele, which affects a complex trait. It is particularly exciting to see the application of multi-mode endophenotypes, such as combinations of psychological testing, brain imaging and gene expression87. This is clearly an area of potential opportunity.
The cost of enrolling the very large cohorts that are needed to discover and validate alleles with small effect sizes has hitherto precluded the collection and integration of rich, accurate clinical metadata. It is likely that future studies will use a much greater stratification of traits than the phenotypically crude studies reported so far. Recent GWA studies of breast cancer provide a good example of the added genetic complexity that can be revealed by trait stratification19,85,86. In addition, following replication of associations with categorical traits, it is anticipated that targeted genotypic examination of many endophenotypes will be highly instructive in the dissection of the role of individual alleles in disease pathogenesis.
GWA studies show significant potential to redefine disease classification. In some cases, GWA studies are identifying molecular factors that enable patient stratification and might prove useful in personalized medicine. Cancers provide the clearest examples of this to date. In other cases, exemplified by IBD, GWA studies are pointing to common molecular underpinnings in diseases that were believed to be distinct. In restless leg syndrome, replicated associations have provided concrete evidence that the phenotype represents a bona fide neurological disorder100,101. In mental illness, there is great anticipation that GWA studies will provide an objective, molecular revision of disease categorization.
Many questions remain concerning the genetic architecture of common diseases. These include the extent of locus and allelic heterogeneity; fit with an additive-threshold model or, alternatively, the extent of epistasis; the relative contributions of rare and common alleles, and of high- and low-penetrance alleles; and the roles of various types of variation, from genome rearrangements to SNPs. GWA studies are not designed to evaluate these questions. Once loci have been identified, however, methods such as deep resequencing can nominate candidate susceptibility alleles and provide data for the evaluation of genetic architecture102. Meaningful, individual risk determinations will require the identification of causal alleles, the development of multiplexed molecular diagnostics and significant modelling.
The trends observed in recent GWA studies are anticipated to continue. Chips with 900,000 and 1,000,000 SNPs were recently launched and genotyping accuracies have improved. Cohort sizes are steadily increasing and biobanks of unparalleled size and phenotype definition are being established. Combinations of genotype- and haplotype-based associations are becoming more prevalent. Experimental designs and statistical methods are also becoming more uniform, enabling more meaningful meta-analysis. In particular, the emergence of adaptive designs and the use of Bayesian inferential methods will produce a probabilistic synthesis from combined analyses83. Importantly, this will provide an intuitive framework for combining information from multiple studies, resulting in more effective detection and replication of weak associations103.
As noted above, phenotypes studied to date have been crude. The use of endophenotypes is expected to increase significantly. In particular, biomarker phenotypes are anticipated to become widely used. These will probably include gene expression, proteomic, metabolomic and imaging biomarkers. As determinants of complex traits are identified, genetic stratification will become possible, potentially reducing the genetic complexity of traits and enabling the identification of additional association signals. An example of this was the recent use of periodic limb movements and serum ferritin levels in GWA studies of restless leg syndrome100. An area of substantial future interest for the pharmaceutical industry will be pharmacogenetic GWA studies to identify markers for patient stratification in clinical trials. Comprehensive pharmacogenetic information will, in turn, facilitate the practice of personalized medicine. Pharmacogenetic GWA studies and early adoption of personalized therapy are likely to be used in the selection of expensive or chronic medications in life threatening conditions or where the therapeutic index is narrow or adverse event concerns are high, such as cancer chemotherapy.
Despite the current excitement, GWA studies have only been able to account for a small proportion of the expected genetic variance in complex traits24,102. This is not surprising given current limitations. First, current GWA studies are designed to identify common risk alleles that are predicted to be important in complex disorders under the common disease/common alleles hypothesis5,6. Increasing evidence suggests that some complex disorders and traits, such as schizophrenia, hypercholesterolaemia and body mass, are genetically heterogeneous104–107. The genetic basis of such diseases is more likely to conform to the common trait/rare variant hypothesis, which proposes that many rare variants exist, with substantial allelic heterogeneity at causal loci58,108,109. The GWA approach is unable to detect susceptibility loci that harbour numerous, individually rare (recent), polymorphisms. Instead, a resequencing approach will be needed to identify rare alleles. Encouragingly, massively parallel sequencing methods provide a potential solution102,107,110, suggesting disease-specific rare alleles and recent mutations that provide supplementary genotyping array content. Second, a proportion of the genome cannot effectively be examined on the basis of tag SNP genotypes. Approximately 20% of the genome is comprised of recombination hotspots that are not amenable to LD-based approaches7. Alternatively, at recombination coldspots, haplotype blocks might be too large for unambiguous identification of causal loci. The extent of the effect of genomic copy number variation (CNV) on association signals is not yet clear, although recent genotyping arrays do provide CNV information. Insufficient numbers of cases will be available for GWA studies of many orphan diseases, uncommon disease complications or adverse events. For some common diseases, these considerations could obfuscate a substantial proportion of the genetic variance. 
Supplementation of genotyping array content reflective of CNV regions should, however, circumvent some of these limitations. Use of adaptive statistical methods and resampling strategies might also circumvent the need for thousands of affected individuals in studies of orphan diseases83.
GWA successes are creating substantial need for downstream genetics, biochemistry and cell biology efforts to confirm the biological relevance of genotype–phenotype associations and to elucidate the underlying mechanisms of disease. This is especially true of association signals in gene deserts or alleles without apparent functional consequence. Translation of the fruits of GWA studies to clinical practice will require the derivation of predictive models of the genetic architecture of complex traits that evaluate with much greater precision the contributions of factors such as epistasis, genocopies, phenocopies and penetrance.
A Deo lumen, ab amicis auxilium. This work was partially supported by National Institutes of Health grants N01A000,064 and U01AI066,569, and by National Science Foundation grant 0524,775. The authors thank the reviewers for their helpful suggestions.
Entrez Gene: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene APOE | ATG16L1 | BAT1 | C3 | CFB | CFH | CLEC16A | FTO | GRM7 | HLA-B27 | HNF1B | IFIH1 | IL2RA | IL23R | NFKBIL1 | NOD2 | PTGER4 | PTPN2 | PTPN22 | RELA | TCF7L2 | WWC1
OMIM: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM age-related macular degeneration | acute myocardial infarction | Alzheimer’s disease | anterior uveitis/ankylosing spondylitis | breast cancer | colorectal cancer | Crohn’s disease | prostate cancer | restless leg syndrome | type 1 diabetes mellitus | type 2 diabetes mellitus | ulcerative colitis
Stephen Kingsmore’s homepage: http://www.ncgr.org