GWAS have given complex disease genetics a major boost by rapidly identifying novel susceptibility loci that had not been found using linkage, candidate gene, or other approaches. However, most GWAS to date have been conducted in populations of European ancestry, and the burden of risk that these loci pose in other populations is unknown. A first step in addressing this issue is to examine allele frequencies across multiple populations, as we have done for this large set of 621 loci associated with 26 common complex diseases and traits. The present study has demonstrated wide between-population variation, as well as a lack of correlation in allele frequencies between the groups of European ancestry and the non-European groups. These observations appear to hold for a wide variety of diseases considered in this study, including various types of cancer (e.g., breast cancer, prostate cancer, colorectal cancer), metabolic disease (e.g., type 2 diabetes), behavioral/mental health conditions (e.g., bipolar disorder, schizophrenia), systemic autoimmune diseases (e.g., systemic lupus erythematosus, rheumatoid arthritis), and neurodegenerative diseases (e.g., Alzheimer's disease). Continuous traits, including height, bone mineral density, serum lipids, and C-reactive protein, show the same pattern. These findings have two clear implications: (1) the burden of disease posed by each of these loci will vary considerably among populations, with public health implications that differ accordingly; and (2) findings from GWAS in European ancestry groups may not be directly replicable or transferable to other populations, because replication of a variant identified in one population may not be possible in another population where the risk allele is very rare or absent.
Empirical evidence supporting this notion has started to emerge for type 2 diabetes, for which two GWAS in East Asian populations recently identified a signal in the KCNQ1 gene. This signal had been missed in all previous GWAS of European-descent populations because the risk allele is far less frequent in those populations, which greatly reduces the power to detect the association [2]. These observations provide compelling reasons for ensuring that more human populations, sampled from widely contrasting geographical locations around the world, are included in the international effort to use genomic tools to gain novel insight into the pathophysiology of common human diseases.
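The dependence of detection power on risk allele frequency can be illustrated with a rough calculation. The sketch below is not the analysis used in any of the cited studies: it assumes a simple two-sided allelic z-test comparing allele frequencies in cases and controls, a genome-wide significance threshold of 5e-8, and hypothetical values for the odds ratio, sample sizes, and allele frequencies. The function name `allelic_test_power` is introduced here for illustration only.

```python
from statistics import NormalDist


def allelic_test_power(control_freq, odds_ratio, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided allelic z-test (normal approximation).

    control_freq: risk allele frequency in controls (used here as a proxy
                  for the population frequency; an assumption of this sketch).
    odds_ratio:   per-allele odds ratio, assumed equal across populations.
    """
    if not 0.0 < control_freq < 1.0:
        raise ValueError("control_freq must be strictly between 0 and 1")
    if odds_ratio <= 0 or n_cases <= 0 or n_controls <= 0:
        raise ValueError("odds_ratio and sample sizes must be positive")

    # Risk allele frequency in cases implied by the allelic odds ratio.
    case_odds = odds_ratio * control_freq / (1.0 - control_freq)
    case_freq = case_odds / (1.0 + case_odds)

    # Each individual contributes two alleles to the comparison.
    variance = (case_freq * (1.0 - case_freq) / (2 * n_cases)
                + control_freq * (1.0 - control_freq) / (2 * n_controls))
    ncp = abs(case_freq - control_freq) / variance ** 0.5

    std_normal = NormalDist()
    z_crit = -std_normal.inv_cdf(alpha / 2.0)
    # Probability that the test statistic falls in either rejection tail.
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)


if __name__ == "__main__":
    # Hypothetical scenario: same effect size and sample size, different frequencies.
    for freq in (0.05, 0.10, 0.20, 0.30):
        power = allelic_test_power(freq, odds_ratio=1.3, n_cases=5000, n_controls=5000)
        print(f"risk allele frequency {freq:.2f}: approximate power {power:.3f}")
```

Under these assumptions, the same effect size is much harder to detect where the risk allele is uncommon, which is the mechanism described above for the KCNQ1 signal.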
Nearly all the diseases and traits considered in the present study show considerable ethnic and/or population differences in prevalence and incidence rates between the source populations represented by the HapMap 3 dataset. For example, on a global level, comprehensive reviews have shown that rates of rheumatoid arthritis [3], schizophrenia [4], and type 1 diabetes [5] differ markedly between countries (the latter by up to 350-fold) [5]. Similarly, in the United States, African Americans, Mexican Americans, and non-Hispanic White Americans (represented in the HapMap by ASW, MEX, and CEU, respectively) differ considerably in rates of obesity, type 2 diabetes, hypertension, dyslipidemia, and coronary artery disease [6]. While many of these differences can be attributed to environmental, lifestyle, and behavioral characteristics, it is nonetheless important to identify the genetic contribution to these differences. A survey of the relative frequencies of potential disease risk variants is a first step towards achieving this goal. The findings of the present study provide a compelling summary of such differences and highlight the need to expand current GWAS and follow-up studies to multiple populations.
Background population differentiation across continental populations for loci across the genome is well documented for the original HapMap populations [7] and usually exceeds the finer-grained differentiation within continents, as demonstrated by Heath et al. [8] in their study of the fine structure of European populations. This is consistent with the finding in this study of greater correlation within continental groups than between continental groups for these GWAS-identified disease and trait loci. The findings of this study for loci of clinical and/or public health significance are therefore broadly similar to those from genome-wide studies of unselected loci.
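As a small illustration of what population differentiation at a single locus means numerically, the sketch below computes a basic two-population fixation index (Wright's F_ST) from allele frequencies. It is an assumption-laden example rather than the measure used in this study or in the cited work: it weights the two populations equally, ignores sample-size corrections, and uses made-up frequencies. The function name `fst_two_populations` is hypothetical.

```python
def fst_two_populations(freq_a, freq_b):
    """Wright's F_ST for one biallelic locus in two equally weighted populations.

    F_ST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled population and H_S is the mean within-population heterozygosity.
    """
    for freq in (freq_a, freq_b):
        if not 0.0 <= freq <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")

    mean_freq = (freq_a + freq_b) / 2.0
    h_total = 2.0 * mean_freq * (1.0 - mean_freq)
    h_within = (2.0 * freq_a * (1.0 - freq_a) + 2.0 * freq_b * (1.0 - freq_b)) / 2.0

    if h_total == 0.0:
        # Locus is monomorphic in both populations: no differentiation by convention.
        return 0.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical frequencies: a similar pair and a strongly divergent pair.
    print(fst_two_populations(0.30, 0.35))
    print(fst_two_populations(0.10, 0.90))
```

Pairs of populations with similar frequencies give values near zero, while strongly divergent frequencies give values approaching one.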
The question of how often genetic or environmental variants produce unequal effects in different populations is often posed in the context of explaining health disparities and deciding whether population-specific interventions are warranted for specific health conditions. Thus, the emphasis has been on ‘ethnicity-specific disease risk’ or the consistency of genetic effects across different racial or ethnic groups [9]. The largest systematic effort to investigate this question, a meta-analysis of 43 gene-disease associations [11], found that genetic effects are largely consistent across ethnic groups. A more recent study [12] investigated risk allele frequencies and population differentiation among 53 world populations at 25 SNPs that showed robust association with 6 complex diseases (from the Wellcome Trust Case Control Consortium study) and found that risk allele frequencies varied substantially across the populations, with some alleles fixed or absent in a population. In the present study, we present systematic evidence that allele frequencies at GWAS-discovered risk loci for common complex diseases differ substantially between global population groups. This implies that, assuming similar effect sizes for a locus across populations, the population attributable risk (PAR) for any given associated allele would vary considerably across populations simply as a function of the frequency of that allele (apart from other genetic and/or environmental factors). This will be true for single-gene effects but may also have immense implications for gene-gene and gene-environment interactions, in which the frequency (or rarity) of a specific risk variant may significantly modify the disease risk arising from the interaction.
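To make the dependence of PAR on allele frequency concrete, the following sketch computes PAR for a single locus at several frequencies. It rests on assumptions not stated in the text: Hardy-Weinberg genotype proportions, a multiplicative per-allele relative risk (1, r, r^2 for 0, 1, and 2 risk alleles), and the same relative risk in every population. Under those assumptions the mean relative risk is (1 + f(r - 1))^2 and PAR = 1 - 1 / (1 + f(r - 1))^2. The function name and all numerical values are hypothetical.

```python
def population_attributable_risk(risk_allele_freq, relative_risk):
    """PAR for one biallelic locus under a multiplicative per-allele model.

    Assumes Hardy-Weinberg genotype proportions, so the population mean
    relative risk is (1 - f)^2 + 2 f (1 - f) r + f^2 r^2 = (1 + f (r - 1))^2.
    """
    if not 0.0 <= risk_allele_freq <= 1.0:
        raise ValueError("risk_allele_freq must lie in [0, 1]")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")

    mean_relative_risk = (1.0 + risk_allele_freq * (relative_risk - 1.0)) ** 2
    # Fraction of disease risk that would be removed if everyone carried
    # the baseline (non-risk) genotype.
    return 1.0 - 1.0 / mean_relative_risk


if __name__ == "__main__":
    # Hypothetical locus with the same per-allele relative risk everywhere.
    for freq in (0.0, 0.05, 0.20, 0.40):
        par = population_attributable_risk(freq, relative_risk=1.3)
        print(f"risk allele frequency {freq:.2f}: PAR {par:.3f}")
```

With the effect size held fixed, PAR rises steadily with risk allele frequency and is zero where the allele is absent, which is the point made above about differing burden across populations.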