Atherosclerotic vascular disease is a major health care burden, being the leading cause of morbidity and death worldwide.1 A better understanding of the genetic basis of atherosclerotic vascular disease is urgently needed to provide new insights into the underlying pathophysiological mechanisms and facilitate development of novel diagnostic and therapeutic modalities. The advent of genome-wide association (GWA) studies (see supplementary Table 1 for glossary) is an important step in this direction, having led to the identification of susceptibility alleles for many of the common ‘complex’ diseases. This is in contrast to genetic linkage studies, which had limited success in identifying genes for ‘complex’ diseases or quantitative trait loci and candidate gene-based association studies, the results of which have been mostly irreproducible.
GWA studies became possible with the completion of the Human Genome Project,2 the discovery of millions of single nucleotide polymorphisms (SNPs) in the human genome, and the International HapMap Project3 which characterized the patterns of linkage disequilibrium (LD) in the human genome, as well as the availability of high-throughput genotyping platforms and decreased costs of genotyping. In contrast to candidate gene studies in which genes are selected on the basis of known or suspected disease mechanisms, GWA studies permit a relatively comprehensive scan of the genome in an agnostic fashion and thus have the potential to identify novel disease susceptibility or quantitative trait loci.
Although there are at least 7 million common SNPs (minor allele frequency >5%) in the human genome,4 neighboring SNPs are often strongly correlated with each other (ie, in LD). LD is measured by the r2 statistic, which indicates the correlation of alleles at two sites, and ranges from 0 (no correlation) to 1 (perfect correlation). GWA studies take advantage of patterns of LD, such that genotyping 500k SNPs (in non-African samples) can achieve high coverage of ~ 90% of all known SNPs, despite directly testing less than one-tenth of the SNPs. The GWA design is based on the assumption that common variants with modest effects on a complex trait exist and explain a substantial proportion of variation in the trait. GWA studies have 2 key advantages compared with the hitherto widely used family-based linkage approaches. First, GWA signals are localized to small (10~100 kb) regions of the chromosome, making fine mapping of the actual disease susceptibility/quantitative trait locus less effort-intensive. Second, these studies can identify alleles with modest effect size that are unlikely to be uncovered with linkage studies.
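As a minimal illustration of the r2 statistic described above, the following Python sketch computes it from haplotype and allele frequencies at two biallelic sites. The function name and the example frequencies are hypothetical and are not drawn from any study cited here.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation (r^2) between two biallelic sites.

    p_ab : frequency of the haplotype carrying allele A at site 1 and B at site 2
    p_a  : frequency of allele A at site 1
    p_b  : frequency of allele B at site 2
    """
    # D is the deviation of the haplotype frequency from its expectation
    # under independence of the two sites.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Hypothetical frequencies: allele A always travels with allele B,
# so the two sites are perfectly correlated.
print(ld_r2(p_ab=0.3, p_a=0.3, p_b=0.3))
# Hypothetical frequencies: the haplotype occurs exactly as often as
# expected by chance, so the sites are uncorrelated.
print(ld_r2(p_ab=0.09, p_a=0.3, p_b=0.3))
```

A tag SNP with r2 close to 1 to an untyped SNP carries nearly the same information, which is why a subset of SNPs can cover most common variation.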
In a relatively short time, GWA studies have identified >130 susceptibility alleles for common ‘complex’ diseases (http://www.genome.gov/GWAstudies/)5 including atherosclerotic vascular disease (Table 1). In addition, this approach has also been used to identify genetic variants that influence quantitative traits related to vascular disease, such as body mass index,13,14 plasma lipid levels,10,15–17 and circulating markers of inflammation.18–21 At least 35 loci that influence atherosclerotic vascular disease and related intermediate traits have been identified. Many of these discoveries point to previously unknown etiologic pathways in disease pathogenesis, highlighting the potential of this research approach to provide new insights into pathophysiological mechanisms underlying atherosclerotic vascular disease.
Our review summarizes the methodological approaches to a GWA study, and provides an update on the results of GWA studies for atherosclerotic vascular diseases - coronary artery disease (CAD), peripheral arterial disease (PAD), and aneurysmal disease - and risk factors for atherosclerosis, including obesity, type 2 diabetes, hypertension, plasma lipid levels, and markers of inflammation. Finally, we discuss limitations of the GWA approach in identifying susceptibility genes for cardiovascular diseases. Recent reviews provide a genetic epidemiologic perspective on GWA studies5,22 as well as an emerging consensus regarding the GWA approach and the challenges of such studies.23
For case–control studies, samples of unrelated affected and unaffected individuals are ascertained from the study population. The selected case and control subjects should be matched for sociodemographic variables including age, and race/ethnicity and be from the same geographic regions. For a complex disease, such as atherosclerotic vascular disease, several possible phenotypes may be studied, including CAD, myocardial infarction (MI), PAD, and aortic aneurysmal disease. Although considerable overlap exists among these phenotypes, the underlying pathophysiologic factors may be quite different. For example, MI often results from rupture of a ‘vulnerable’ atherosclerotic plaque whereas angina is typically due to advanced coronary atherosclerosis and subsequent luminal compromise.24 Aortic aneurysmal disease commonly results from atherosclerosis-related weakening of the arterial wall. Selection of controls for atherosclerosis phenotypes is also problematic since many individuals may have subclinical disease. A possible solution is the use of so-called ‘super controls’ who are older, have no known history of disease and have been screened for the presence of subclinical disease. However, this may result in odds ratios that are biased. Standardizing phenotype measurement is particularly important when samples are pooled across many centers and if meta-analyses are planned.
Current GWA studies assay a dense set of markers across the genome in study samples. SNPs can be selected to obtain an evenly spaced coverage across the whole genome or to capture the common variations in the whole genome as defined by the HapMap database. Affymetrix® Inc (Santa Clara, CA) and Illumina® Inc (San Diego, CA) provide the commonly used platforms for genotyping, using different SNP selection strategies. Illumina’s probes are based almost entirely on haplotype-tagging SNPs identified in the HapMap data (www.illumina.com). In contrast, the SNP probes on Affymetrix arrays are for random SNPs chosen to cover the genome, supplemented by tag SNPs (www.affymetrix.com). Genomic coverage of GWA genotyping platforms is often estimated by the percent of common SNPs having an r2≥0.8 with at least 1 SNP on the platform and varies with platform (Figure 1). For example, the Affymetrix 500k and Illumina 1M chips provide coverage of about 70%25 and 95% of the common variation (r2 > 0.8) in European populations in the HapMap database. Because of the lower LD in populations with recent African ancestry, only ~40% of the common variation is covered by the Affymetrix 500k chip. Platforms consisting of ~ 1 million SNPs provide better coverage in African populations but may not be the most cost-efficient for GWA studies in European populations.26 In the latter, a cost-effective strategy may be to genotype a large number of individuals with the ~300k-500k tag SNPs and carry out imputation of the untyped SNPs.27
The number of probes for copy number variants (CNVs) varies with different platforms. CNVs range from 1 kb to 5000 kb of DNA and may influence disease susceptibility.28,29 Genome-wide SNP assays often include tags to capture CNVs. The majority of CNV tags are not SNPs and therefore have little effect on the coverage of, and power to detect association with, SNPs.26 The Illumina HumanCNV370-Duo SNP array provides comprehensive coverage of 14,000 regions with CNVs. In the newly developed HumanHap650Y and Human1M arrays, there are ~4,300 and ~260,000 SNPs, respectively, located in novel and reported CNV regions (www.illumina.com). The GeneChip Array 6.0 contains more than 906,600 SNPs and more than 946,000 probes for the detection of CNVs (www.affymetrix.com).
Genotyping errors may cause spurious associations as well as weaken power and must be accounted for before proceeding with statistical analyses.30 In particular, differential genotyping error rates in cases and controls may inflate type-I error. Visual inspection of genotype cluster plots or intensity values generated by the genotyping assay for SNPs that show significant association helps ensure that the associations do not merely reflect genotyping artifact.8 Typically, genotyping data should meet the following criteria for quality control:22 1) the minimum rate of successfully genotyped SNPs per sample, the proportion of samples for which a SNP can be measured (ie, SNP call rate), and the concordance rates in duplicate samples should be >99%; 2) severe departure from Hardy-Weinberg equilibrium (P<10−4 to 10−6) is not present; and 3) there are no Mendelian inheritance errors in nuclear families.
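The Hardy-Weinberg criterion above can be sketched as a 1-degree-of-freedom chi-square goodness-of-fit test on the three genotype counts of a SNP. This is only one possible approach (exact tests are often preferred for low genotype counts); the function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi2_test(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    Returns (chi-square statistic, P-value) for observed genotype counts.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts with a marked heterozygote deficit.
chi2, p_value = hwe_chi2_test(n_aa=400, n_ab=200, n_bb=400)
# Flag the SNP using the more lenient end of the range quoted in the text.
print(chi2, p_value, p_value < 1e-4)
```

In practice this test is usually applied to control samples, since a true association can itself distort genotype proportions in cases.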
Associations between SNPs and phenotypes are tested by counting alleles or genotypes.31 A genetic model (eg, a dominant model versus an additive model) must be specified to determine the appropriate test for association. In the dominant model, the heterozygotes have the same risk or mean value of a quantitative trait as one of the homozygotes; whereas the additive model assumes that each additional copy of the variant allele increases the odds ratio (OR), or mean value of the quantitative trait, by the same amount. The Cochran-Armitage trend test is a genotype-based contingency-table test for association that detects trends across ordinal categories (ie, genotypes).23 Analysis of variance is typically used to test for association between a genotype and quantitative phenotypes. The results of GWA studies are usually plotted as −log10 P-value of each SNP versus its chromosomal locations (the so-called ‘Manhattan’ plot). A quantile-quantile (Q-Q) plot is used to characterize the extent to which the observed distribution of the test statistic follows the expected (ie, null) distribution.8 In addition to a true association, systematic bias due to unrecognized population structure and genotyping artifacts can distort the null distribution and present as deviations from the identity line.
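To make the additive-model trend test concrete, here is a minimal Python sketch of the Cochran-Armitage test on a 2×3 table of genotype counts, using weights 0, 1 and 2 for the number of copies of the variant allele. The function name and the counts are hypothetical; the weights would change under a dominant or recessive model.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases, controls : counts for genotypes with 0, 1 and 2 variant alleles
    weights         : scores per genotype; (0, 1, 2) encodes an additive model
    Returns (chi-square statistic with 1 df, P-value).
    """
    r_total = sum(cases)
    s_total = sum(controls)
    n_total = r_total + s_total
    col = [r + s for r, s in zip(cases, controls)]  # genotype totals

    # Trend statistic: weighted difference between cases and controls.
    t = sum(w * (r * s_total - s * r_total)
            for w, r, s in zip(weights, cases, controls))

    # Variance of the trend statistic under the null hypothesis.
    term1 = sum(w * w * c * (n_total - c) for w, c in zip(weights, col))
    term2 = 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                    for i in range(len(col)) for j in range(i + 1, len(col)))
    var_t = r_total * s_total / n_total * (term1 - term2)

    chi2 = t * t / var_t
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, 1 df
    return chi2, p_value


# Hypothetical counts: the variant allele is enriched in cases.
print(cochran_armitage_trend(cases=(10, 40, 50), controls=(50, 40, 10)))
```

Plotting −log10 of the resulting P-value for every SNP against chromosomal position gives the Manhattan plot described above.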
Typically, in a GWA study, a test of association is performed at each assayed SNP, correcting the significance level (P-value) for multiple testing. A P-value of 5×10−8 (a P value of 0.05 after a Bonferroni correction for 1 million independent tests)32 has been proposed as a conservative threshold for declaring a significant association in a GWA study. Other methods, such as the family-wise error rate,33 the false-positive-report probability,9,34 the false discovery rate,35 Bayes factors,8 and permutation testing have been proposed to correct for multiple testing. Most recent studies have used P-values between 10−4 and 10−7 to minimize false associations.8,13,14,36–39
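The arithmetic behind the Bonferroni threshold, and one standard way of controlling the false discovery rate, can be sketched as follows. The Benjamini-Hochberg step-up procedure is shown as an assumed example of an FDR method and is not necessarily the one used in the cited work; the P-values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests declared significant at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose P-value is at most (k / m) * q.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    return sorted(order[:cutoff_rank])


# 0.05 spread over 1 million independent tests gives the 5 x 10^-8 threshold.
print(bonferroni_threshold(0.05, 1_000_000))

# Hypothetical P-values for eight SNPs.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals, q=0.05))
```

Bonferroni is conservative for GWA data because SNPs in LD are not independent tests, which is one reason the alternative methods listed above have been proposed.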
Population stratification, or population structure, may be a source of spurious association and should be assessed and reported in GWA studies. Several statistical methods (ie, structured association, genomic control, and principal components analysis) have been used to correct for population stratification.40 The method of structured association uses anonymous genetic markers (ie, neutral markers) scattered throughout the genome to assign the samples to discrete subpopulation clusters and then assess evidence of association within each cluster.41 ‘Genomic control’42 is based on the fact that in the presence of population substructure, the standard χ2 statistic used in case-control studies is inflated by a multiplicative factor (λ). This factor is proportional to the degree of stratification and can be estimated with a set of unlinked genetic markers across the genome. By rescaling the χ2 test statistic using λ in the disease-marker association tests, background population differences can be corrected for. Principal components analysis43 is used to model ancestry differences of cases and controls, and the association statistics are computed using ancestry-adjusted genotypes and phenotypes.
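The genomic control rescaling can be sketched in a few lines. The sketch assumes the commonly used median-based estimator of λ (observed median χ2 divided by the median of a 1-degree-of-freedom χ2 distribution, about 0.455) and floors λ at 1 so that statistics are never inflated; both choices are assumptions not specified in the text, and the input statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(chi2_stats) -> float:
    """Inflation factor estimated from 1-df chi-square statistics at unlinked markers."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    return max(1.0, lam)  # do not inflate statistics when lambda < 1


def genomic_control_correct(chi2_stats):
    """Return (lambda, rescaled chi-square statistics)."""
    lam = genomic_control_lambda(chi2_stats)
    return lam, [x / lam for x in chi2_stats]


# Hypothetical statistics from null markers showing modest inflation.
null_stats = [0.10, 0.35, 0.50, 0.62, 1.40, 2.90, 0.55]
lam, corrected = genomic_control_correct(null_stats)
print(lam, corrected)
```

A λ close to 1 suggests little stratification, whereas larger values indicate that uncorrected P-values would be too small.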
There is considerable interest in methods that can impute genotypes at SNPs not directly genotyped. These methods use genotypes at variants in LD to infer missing genotypes at untyped variants. Power in GWA studies can be modestly improved by testing for associations that are attributable to variants that have not been directly typed but imputed.44 Furthermore, this approach can facilitate combining data from genome-wide scans that use different SNP sets. With the use of the ‘imputed’ genotype data, association tests can be performed at a finer resolution across the genome.44 If significant association is found for an imputed SNP, confirmation should be sought by genotyping.45
In common ‘complex’ diseases, most susceptibility variants have modest effect sizes with ORs of 1.1~1.5.46 The statistical power to detect association between genetic variation and a phenotype is a function of several factors, including the frequency of the risk allele or genotype, the effect size of the associated allele or genotype, the correlation between the genotyped marker and the ‘true’ risk allele, sample size, and genetic heterogeneity of the study population. Statistical power is enhanced when risk allele frequency is higher, the risk allele has a relatively strong effect, and a large sample size is available. Typically, sample sizes of at least 1,000 cases and 1,000 controls are required to detect odds ratios >1.5 with at least 80% power. Much larger samples are needed to detect risk alleles with ORs < 1.5 and to detect gene-gene, or gene-environment effects. More penetrant alleles with large effects will require smaller samples.
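How these factors interact can be explored with a rough power calculation. The sketch below uses a normal approximation for a two-sided allelic (allele-count) test and assumes a multiplicative per-allele OR, a directly genotyped risk allele, and Hardy-Weinberg proportions; the function names and input values are hypothetical, and the result depends strongly on the allele frequency and significance threshold chosen.

```python
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    return odds_ratio * control_freq / (1 + control_freq * (odds_ratio - 1))


def allelic_test_power(control_freq, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided test comparing allele frequencies."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    # Each individual contributes two alleles.
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_effect = abs(p1 - p0) / se
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls in either rejection region.
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


# Hypothetical scenario: risk-allele frequency 0.2 in controls.
for odds_ratio in (1.1, 1.3, 1.5):
    power = allelic_test_power(0.2, odds_ratio, 1000, 1000, alpha=1e-6)
    print(odds_ratio, round(power, 3))
```

Rerunning the loop with larger sample sizes or a different threshold shows how quickly power falls off for ORs near 1.1.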
Stringent statistical thresholds reduce false-positive results but may overlook true associations. False-negative results may also result from the lack of the genetic variant of relevance on the genotyping platform, genetic heterogeneity between cases and controls, or lack of variation in that SNP in the population under study.22 Several variants associated with cardiovascular disease in candidate-gene based association studies have not been confirmed in GWA studies, eg, the C677T SNP in the 5,10-methylenetetrahydrofolate reductase (MTHFR) gene, and Q192R SNP in the paraoxonase 1 (PON1) gene. Incorporating prior probability (eg, based on results of previous linkage analyses),47 gene expression data,48 and pathway analyses49 may enhance the interpretation of GWA study results.
In a GWA study, replication of results in independent samples is critical for separating the many false-positive associations from the few true-positive associations. Several criteria for replication have been proposed,45 including the use of the same or similar population cohort and phenotype, adequate sample size, the same genetic model, and a similar effect size and statistical significance for the same SNP or a SNP in high LD with the original SNP (r2 close to 1.0) as the initial report. Selection of SNPs to be replicated from the initial study should be based on LD structure, putative functional data or published medical literature. Statistical significance should first be obtained with the genetic model reported in initial study and a joint or combined analysis should lead to a smaller P-value than that seen in the initial report. Failure to replicate initial results45 may be due to insufficient statistical power to detect risk alleles of modest effect sizes due to reasons alluded to above. Recently, the ‘flip-flop’ phenomenon, where an initial study finds an allele to be protective but a follow-up study finds it to be the risk allele, has been reported.50 Flip-flop phenomena may result from either variation in LD architecture at the susceptibility locus across different ethnic populations, or as an artifact of sampling variation within a sample from the same ethnic group.50
When an association signal is detected, fine mapping and biochemical assays are needed to confirm the causal variant and its functional effects. For most GWA ‘hits’, systematic efforts to identify the underlying causal variant or variants have yet to be reported. Both bioinformatics and experimental tools will be needed to maximize the biological information obtained from GWA studies. Bioinformatics tools can be used to display association data in the context of the increasingly rich functional annotation of the genome. For example, the application ‘WGAviewer’51 identifies the associations that seem most likely to be biologically relevant and selects genomic regions that may need to be resequenced in a search for causal variants. To identify putative causal variants, the pattern of LD in the genomic region of interest should be characterized on the basis of resequencing data. Many of the complex-trait susceptibility variants so far identified map to sequence of unknown function that is some distance from the nearest coding sequence, and the design of appropriate functional assays is likely to pose challenges.23 Such variants may regulate gene expression and the availability of genome-wide profiles of gene expression along with dense genotype data from the same samples could be a valuable resource for functional annotation of these variants.48
The database of Genotypes and Phenotypes (dbGaP) was established by the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/dbGaP) to archive and distribute the results of GWA studies.52 In addition, a catalog of published GWA studies is available online at the National Human Genome Research Institute (NHGRI) Website (www.genome.gov/GWAstudies). Several initiatives have been launched by the NIH, including the SNP Typing for Association with Multiple Phenotypes from Existing Epidemiologic Data (STAMPEED), the Genes, Environment, and Health Initiative (GEI), and the Electronic Medical Records and Genomics (eMERGE). The STAMPEED project supports studies to identify genetic variants related to heart, lung, and blood disorders and their risk factors; the GEI aims to analyze genetic variation in groups of patients with specific illnesses and develop technology to produce and validate new methods for monitoring environmental exposures that interact with a genetic variation and thereby result in human diseases; and the eMERGE investigators will study the relationship between genetic variation and common human disease/traits, leveraging data present in electronic medical records. Data will be available at no cost to qualified investigators.
Vascular diseases as well as related quantitative traits have been the subject of several GWA studies. The tables herein summarize the results of published GWA studies for several atherosclerotic vascular diseases and intermediate phenotypes, and we discuss these further below.
GWA studies have identified several statistically significant and replicated loci associated with CAD in Europeans (Table 1). The effect size (OR) of susceptibility alleles is modest, ranging from 1.10 to 1.37. Three independent studies6–8 identified common sequence variants on chromosome 9p21 that were associated with CAD. The findings were replicated in 875 cases with a history of MI and 1,644 controls from the German MI family study.9 Two other loci associated with CAD in the Wellcome Trust Case Control Consortium (WTCCC) were replicated in the German MI family study, including MTHFD1L and an intergenic locus. A combined analysis of data from the WTCCC and German MI family studies identified 4 additional loci (near CELSR2/PSRC1/SORT1, MIA3, a locus at chromosome 10q11.21, and SMAD3) significantly associated with CAD (P<1.3 × 10−6) with a high probability (> 80%) of a true association.9 Several of the newly identified genes and loci are promising candidates.53 For example, SORT1 has been implicated in mediating endocytosis and degradation of lipoprotein lipase (LPL), MTHFD1L is involved in various cellular processes including purine and methionine synthesis, and PSRC1, MIA3, and SMAD3 have a role in cell growth. However, further analysis of the loci in a wider range of subjects is necessary. Alleles associated with plasma lipid levels have also been noted to be associated with CAD in the WTCCC sample.10 The effect sizes and statistical significance of variants associated with CAD are summarized in Table 1. The population-attributable fractions for several of the susceptibility SNPs were noted to be substantial (10% ~ 22%), although the estimates are likely inflated when based on ORs instead of relative risks.9
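The point about inflated population-attributable fractions can be illustrated with a short calculation. The sketch assumes Levin's formula for the attributable fraction and a widely used approximation for converting an OR to a relative risk given the baseline risk in the unexposed; neither is stated in the text, and the carrier frequency and baseline risk below are hypothetical.

```python
def population_attributable_fraction(exposure_freq: float, relative_risk: float) -> float:
    """Levin's formula: share of cases attributable to the exposure."""
    excess = exposure_freq * (relative_risk - 1)
    return excess / (1 + excess)


def odds_ratio_to_relative_risk(odds_ratio: float, baseline_risk: float) -> float:
    """Approximate relative risk from an OR and the risk among the unexposed."""
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


# Hypothetical values: half the population carries the risk genotype and
# 20% of non-carriers develop disease.
carrier_freq, baseline_risk, odds_ratio = 0.5, 0.2, 1.3

rr = odds_ratio_to_relative_risk(odds_ratio, baseline_risk)
paf_from_or = population_attributable_fraction(carrier_freq, odds_ratio)
paf_from_rr = population_attributable_fraction(carrier_freq, rr)
print(rr, paf_from_or, paf_from_rr)
```

Because the OR exceeds the relative risk whenever the disease is common, substituting it into the formula yields the larger of the two fractions.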
Two GWA studies have identified variants associated with PAD. Helgadottir et al.11 investigated the contributions of the 2 common variants - rs10757278 and rs10811661 - on chromosome 9p21 that have been reported to be associated with CAD and type 2 diabetes (T2D), respectively, to PAD. The SNP rs10757278, but not rs10811661, was associated with PAD (OR = 1.14) in several populations (Table 1, Figure 2).11 Thorgeirsson et al.12 found a common variant in the nicotinic acetylcholine receptor gene cluster on chromosome 15q24 to affect nicotine dependence, smoking quantity, and the risk of PAD and lung cancer (Table 1). A synonymous SNP (rs1051730) within the cholinergic receptor nicotinic alpha 3 (CHRNA3) gene was significantly associated with PAD (OR = 1.19). This study provides an example of a SNP associated with a behavioral disorder (increased smoking quantity and addiction to nicotine) and, in turn, with the risk of a common ‘complex’ disease.
Helgadottir et al.11 investigated the contributions of the common variants on chromosome 9p21 to abdominal aortic aneurysm (Table 1). The SNP rs10757278-G was associated with abdominal aortic aneurysm (OR = 1.31) (Table 1, Figure 2).11 Interestingly, the G allele was also associated with intracranial aneurysms, which are not related to atherosclerosis, suggesting that the 9p21 locus may influence abnormal vascular remodeling and/or repair.11
The chromosome 9p21 region of association is limited to a 9-kb region flanked by strong recombination hotspots, distant from any annotated gene. Members of the gene family, cyclin-dependent kinase inhibitors (CDKN2A and CDKN2B), are the nearest genes. Different variants at the chromosome 9p21 locus are associated with different phenotypes (Figure 2). The SNP rs10757278 is associated with CAD and other vascular diseases but not with T2D,6,11 whereas rs10811661 is associated with T2D but not CAD and other vascular diseases.8,11,37–39 Figure 2 summarizes the ORs of these 2 SNPs on chromosome 9p21 for different disease phenotypes including CAD/MI, PAD, abdominal aortic aneurysm, intracranial aneurysm, and T2D. These findings suggest that: 1) loci on chromosome 9p21 responsible for vascular disease phenotypes and T2D are distinct; 2) the same locus is associated with CAD and MI and therefore is unlikely to be a ‘plaque’ vulnerability locus; and 3) the same locus is associated with PAD, aneurysmal disease, and coronary artery calcification.7 How this locus mediates susceptibility to vascular disease is not yet known, although inhibition of TGF-β-induced growth has been speculated.54 An antisense non-coding RNA gene (ANRIL) on the high-risk haplotype spanning 53 kb is expressed in tissues and cells that are affected by atherosclerosis and has been put forward as a potential candidate gene at the chromosome 9p21 CAD locus.55
SNPs in the FTO (fat mass and obesity associated) gene have been associated with body mass index and with early-onset and severe obesity (body mass index > 40 kg/m2) in GWA studies.13,14 The FTO gene is expressed in the hypothalamus, the key part of the brain that influences appetite. Another GWA study in 16,876 individuals of European origin identified a SNP (rs17782313, P=2.9×10−6) downstream of the melanocortin-4 receptor (MC4R) gene, as being associated with body mass index in adults and children and with severe obesity in children (OR=1.3, and P=8.0×10−11).56 Chambers et al.57 also showed the MC4R SNP to be associated with insulin resistance (P=3.2×10−6) in 2,684 Indian Asian and European individuals, independent of body mass index.
Novel insights into T2D genetics have been provided by at least 6 GWA studies58 that identified 11 common variants in 11 genes as influencing the risk of T2D. Recently, the DIAGRAM consortium59 performed meta-analyses of 3 GWA scans for T2D comprising 10,128 individuals of European descent and ~2.2 million SNPs (a combination of measured and imputed SNPs), followed by replication in an independent sample, with an effective sample size of up to 53,975. At least 6 previously unknown loci with robust evidence for association were identified, highlighting the value of large consortia for complex disease genetics. However, none of the variants have, as yet, been implicated in CAD, although as noted above, the chromosome 9p21 loci for T2D and CAD are adjacent.
In the WTCCC study of 2,000 cases and 3,000 controls,8 no locus was noted to be significantly associated with hypertension. In the Framingham Heart Study, Levy et al.60 also did not find a significant association signal for blood pressure. The failure to find susceptibility/quantitative trait loci may be due to significant genetic and phenotypic heterogeneity for this phenotype, risk alleles that have very modest effect sizes, and a potentially important role of rare variants in blood pressure regulation.
Several GWA studies10,15–17,37 have been performed to detect SNPs associated with interindividual variation in plasma lipids including levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides. These separate GWA reports have identified 8 novel genes/loci that contribute significantly to variations in lipid levels, including ANGPTL3, CELSR2/PSRC1/SORT1, GALNT2, GCKR, MLXIPL, MVK/MMAB, NCAN/CILP2/PBX4, and TRIB1. Excepting the loci near MVK/MMAB, all the newly identified loci were replicated in at least 2 GWA studies. In addition, 11 gene regions with prior evidence for association with lipid traits were confirmed in the 4 GWA studies.10,16,17,37 These genes are ABCA1, APOA1, APOB, APOE, CETP, HMGCR, LDLR, LIPC, LIPG, LPL, and PCSK9. Several SNPs within these genes have been shown to be associated with the corresponding lipid trait and most of the associated SNPs are located within intronic, regulatory, or intergenic regions, indicating the need for fine mapping to identify the true ‘causal’ variants. Some loci were associated with 2 lipid traits, such as APOB with LDL cholesterol and triglycerides, APOA1/C3/A4/A5 and LPL with HDL cholesterol and triglycerides, suggesting that these variants influence common metabolic pathways for certain lipid traits.
Circulating C-reactive protein (CRP), a marker of inflammation, is associated with increased risk of coronary heart disease.61 Two GWA studies for plasma CRP levels have been reported (Supplementary Table 2).18,19 In the Women’s Genome Health Study (WGHS),19 seven loci were associated with plasma CRP, explaining 0.6% to 3.4% of the variance in CRP levels. These loci are implicated in metabolic syndrome (GCKR, APOE), insulin resistance (LEPR), beta cell function (HNF1A), and atherosclerosis (IL6R, CRP). Another GWA study18 confirmed the association between common variants in HNF1A, CRP and APOE genes and plasma CRP levels. Of note, numerous SNPs in each of these genes were associated with plasma CRP. A GWA study62 of a subset of participants in the WGHS (Supplementary Table 2), showed SNPs in the intercellular adhesion molecule 1 (ICAM1) gene to be associated with plasma levels of soluble ICAM1. In addition, a SNP in the ABO locus was also associated with plasma levels of soluble ICAM1, indicating an unexpected association of a blood group antigen with inflammatory processes. This study highlights the potential of GWA studies of circulating proteins to identify novel trans-acting loci that affect gene expression.
Uric acid has been implicated as a risk factor for coronary heart disease and hypertension and participates in oxidative stress and inflammation pathways.63 Two GWA studies have implicated the SLC2A9 gene as influencing serum uric acid levels (Supplementary Table 2).20,21 Polymorphisms in SLC2A9 may affect uric acid synthesis, renal reabsorption, or both,64 thereby influencing serum uric acid levels.
Although numerous novel susceptibility loci have been identified in GWA studies, fine localization of the causal variant or variants remains a major challenge. This is because many of the susceptibility variants map to sequence of unknown function without any obvious nearby candidate genes. Resequencing of the interesting genomic region may be needed to identify the ‘causal’ variant followed by subsequent functional annotation to understand how the variant influences disease susceptibility, as discussed above.
The genetic factors contributing to complex diseases/traits are presumed to be a combination of common (frequency > 0.05), less common (frequency 0.01–0.05) and rare (frequency < 0.01) sequence variants. Rare variants may exert relatively strong phenotypic effects in the individuals carrying them, and may be more valuable in individualized risk stratification, given their greater predictive value.65 However, GWA studies are not powered to detect rare variants and resequencing of genomes/candidate genes may be needed to identify rare etiologic variants.66–68 Next-generation sequencing platforms are poised to provide low-cost, high-volume sequence data and the cost for resequencing exonic regions of the genome is now approaching that for GWA studies.69 The recently announced 1,000 Genomes Project (http://www.1000genomes.org) to sequence the genomes of at least 1,000 asymptomatic individuals will provide a high resolution catalogue of human DNA variation. The sequence data will be made public; however, the phenotype data will not be available. In contrast, the Personal Genome Project (PGP) (www.personalgenomes.org), initiated in 2006, aims to publish the full sequence of the complete genomes and the phenotypes (eg, medical records, various laboratory measurements, and imaging tests) of up to 100,000 volunteers, with the ultimate goal of promoting individualized medicine.
Testing associations in diverse non-European populations is needed to confirm association signals that have been detected in European populations, as well as to identify association signals specific to diverse populations. For example, Steinthorsdottir et al.70 found the SNP rs7756992 in CDKAL1 to be associated with T2D in samples of European ancestry (risk allele frequency (RAF) in controls=0.258) and were able to replicate this association in a southern Han Chinese sample (RAF=0.462, 1,500 cases and 1,000 controls, P=1×10−4) but not in an African sample (RAF=0.612, 860 cases and 1,100 controls, P=0.72). Recently, 2 GWA studies71,72 independently identified SNPs (eg, rs2237892) in KCNQ1 as being associated with T2D in East Asian populations, but not in populations of European descent. This is likely because the RAF of rs2237892 in Europeans is low (0.05–0.07 versus 0.28–0.40 in East Asians), resulting in greatly reduced statistical power.73 Even in a large meta-analysis,59 the significance level of the KCNQ1 risk allele (rs2237892) in European populations was ~0.01, far short of the significance that would have triggered replication efforts. Similarly, the difference in RAF of the TCF7L2 variants between East Asians (0.03 for rs7903146) and Europeans (0.30 for rs7901695) leads to a weaker association signal in East Asians.58,71,72
A detailed description of genetic diversity in non-European populations, such as African-Americans as well as Hispanics, is needed. Patterns of genomic variation in the HapMap database, in populations other than the original samples (eg, Yoruba in Africa, CEPH (Centre d’Etude du Polymorphisme Humain) from Utah, Han Chinese in Beijing and Japanese in Tokyo) need to be comprehensively cataloged. In addition, new population-specific SNP arrays will be needed to efficiently capture most of the common genetic variation in such populations.
GWA studies can identify novel etiologic pathways of disease and thereby facilitate development of new therapeutic agents for prevention and/or treatment.23 In addition, knowledge of individual patterns of disease predisposition (eg, genetic profiling), could facilitate individualized approaches to diagnosis, prognostication, and therapy. However, the clinical utility of the diagnostic or predictive value of disease susceptibility variants/quantitative trait loci has yet to be established. The major limitation is that the variants so far identified for most complex traits explain only a small proportion of individual variation in disease risk.74 Several Web-based companies have already started to market genotyping of disease susceptibility variants to the public.75 Genetic variants with modest effects tend to have low predictive value, because the difference in absolute risks tends to be small between carriers and noncarriers of the risk variants. Whether combining tests for multiple variants in genomic profiles will yield higher predictive value requires confirmation. Furthermore, clinical trials are needed to establish whether knowledge of risk based on genomic profiles leads to adoption of lifestyle changes by individuals.
GWA studies have become a valuable tool for identifying susceptibility variants and delineating the genetic architecture of common ‘complex’ diseases, including atherosclerotic vascular disease. The susceptibility loci identified in GWA studies may provide novel insights into the etiological mechanisms that influence disease predisposition and thereby facilitate development of new preventive and therapeutic strategies. However, significant challenges remain in identifying the determinants of the residual disease susceptibility due to genetic factors, including the contribution of gene-gene and gene-environmental interactions. Further work is also needed to perform functional annotation of the risk alleles and to investigate whether such alleles can impact risk stratification and prognostication in the clinic.
The authors acknowledge helpful comments by Teri Manolio and Daniel Schaid.
This work was supported by National Institutes of Health grants HL75794 and HG004599.
No conflicts of interest to disclose.
Statement of Responsibility
Both authors have read and agree to the manuscript as written.