Mitochondrial aldehyde dehydrogenase (ALDH2) is one of the most important enzymes in human alcohol metabolism. The oriental ALDH2*504Lys variant functions as a dominant negative, greatly reducing activity in heterozygotes and abolishing activity in homozygotes. This allele is associated with serious disorders such as alcoholic liver disease, late onset Alzheimer disease, colorectal cancer, and esophageal cancer, and is best known for protection against alcoholism. Many hundreds of papers in various languages have been published on this variant, providing allele frequency data for many different populations. To develop a highly refined global geographic distribution of ALDH2*504Lys, we have collected new data on 4,091 individuals from 86 population samples and assembled published data on a total of 80,691 individuals from 366 population samples. The allele is essentially absent in all parts of the world except East Asia. The ALDH2*504Lys allele has its highest frequency in Southeast China, and occurs in most areas of China, Japan, Korea, Mongolia, and Indochina, with frequencies gradually declining radially from Southeast China. As the indigenous populations in South China have much lower frequencies than the southern Han migrants from Central China, we conclude that ALDH2*504Lys was carried by Han Chinese as they spread throughout East Asia. Esophageal cancer, with its highest incidence in East Asia, may be associated with ALDH2*504Lys because of a toxic effect of increased acetaldehyde in the tissue where ingested ethanol has its highest concentration. While the distributions of esophageal cancer and ALDH2*504Lys do not precisely correlate, that does not disprove the hypothesis. In general, the study of fine scale geographic distributions of ALDH2*504Lys and diseases may help in understanding the multiple relationships among genes, diseases, environments, and cultures.
Alcoholism is a major public health problem globally. Many genes have allele frequency variation that has been associated with risk of developing either alcoholism or complications of alcoholism. The ethanol metabolizing genes, especially the alcohol dehydrogenase (ADH) genes and the aldehyde dehydrogenase 2 (ALDH2) gene, show the strongest such associations (Couzigou et al., 1994; Long et al., 1998; Reich et al., 1998; Chen et al., 1999; Luo et al., 2006). The major metabolic pathway for ethanol is degradation by ADH enzymes to acetaldehyde, followed by degradation of that intermediate metabolite to acetate by ALDH enzymes. The mitochondrial ALDH2, encoded by the ALDH2 gene on chromosome 12, has the lowest Km (~1 μmol/L) for acetaldehyde (Algar & Holmes, 1989). The variants at ADH genes and at ALDH2 that are associated with alcoholism appear to interact by increasing the transient levels of the toxic acetaldehyde.
The geographic distribution of the most relevant ALDH2 variant, ALDH2*504Lys, is quite dramatic: it is present only in East Asian populations, with frequencies as high as 40% in some East Asian population samples. Flushing and discomfort, such as headache and nausea, occur in many individuals of East Asian ancestry after drinking even small amounts of ethanol; these symptoms do not occur in individuals of European ancestry after drinking equivalent, and even larger, amounts of ethanol (Wolff, 1972). That early observation led to the conclusion that alcohol metabolism is quite different between these populations. As early as 1975, the atypical form of ALDH2 isozymes was found (Stamatoyannopoulos et al., 1975; Greenfield & Pietruszko, 1977; Hempel & Pietruszko, 1978), including the allele designated ALDH2*2 (ALDH2*504Lys). The difference was identified as a substitution of the glutamic acid at codon position 504 with lysine (hence the reference to the variant allele as ALDH2*504Lys; Ikawa et al., 1983). Position 487 of the mature protein is actually codon position 504 of the gene when the mitochondrial leader sequence is included; hence the change in this genomic era from “487” to “504” as the referent position. As this atypical form of the enzyme was seen only in East Asians, it was also referred to as the oriental variant (Yoshida et al., 1984). The variant allele acts as a dominant negative, with the heterozygote having greatly reduced enzyme activity and the homozygote having no activity; in both cases acetaldehyde accumulates as ethanol is converted to acetaldehyde. The greatly reduced enzyme activity in the heterozygous state reflects the fact that the ALDH2 tetramer is essentially a dimer of a pair of subunits with an important functional interface. If that pair is a heterodimer of the 504 glutamate and lysine monomers, there is essentially no activity (Larson et al., 2005).
ALDH2*504Lys appears to confer protection against some diseases such as alcoholic liver disease. The unpleasant sensations experienced after drinking by heterozygotes for ALDH2*504Lys (and even more so by homozygotes), such as the flushing reaction seen in many East Asians, discourage or stop further heavy drinking, thus reducing serious harm to the liver and other relevant organs (Yu et al., 2002). However, some social attitudes reduce this protection by promoting drinking in many ethnic groups of East Asia (Hasituya et al., 2007), resulting in more serious diseases in the ALDH2*504Lys heterozygotes.
The accumulation of acetaldehyde in the body has many consequences in addition to the aversion to consuming ethanol. Li et al. (2006) showed that the Glu504Lys polymorphism was associated with the efficacy of sublingual nitroglycerin, and recently Chen et al. (2008) showed that ALDH2 activity is critical for protection from ischemia. These findings emphasize the importance of studies on genetic variation at ALDH2. The 504Lys variant is believed to increase the risk of many disorders, including many cancers. Cancer incidences increase among alcoholics in organs including the esophagus, stomach, liver, and upper aerodigestive tract, in which acetaldehyde is produced by the alcohol dehydrogenases (Yokoyama et al., 2001). Esophageal cancer is of particular interest because studies have shown an increased risk of developing esophageal cancer in ALDH2*504Lys heterozygotes in different East Asian populations (Yokoyama & Omori, 2005; Yokoyama et al., 2006; Yang et al., 2007; Li et al., 2008; Druesne-Pecollo et al., 2009). The geographic distribution of esophageal cancer, with its much higher frequencies in individuals of East Asian ancestry (Parkin et al., 1997), suggests a potential association of this cancer with ALDH2*504Lys. The association is believed to be mediated through levels of acetaldehyde following drinking alcohol. The hypothesis is supported by associations of variants at two different ADH genes (Hashibe et al., 2008).
A plethora of papers on ALDH2 exists in the global literature because of its relevance for public health and human population genetics studies. It may be among the more intensively studied human genes. The large number of publications also provides allele frequency data of ALDH2*504Lys (ALDH2*2 or *487Lys) in many populations, allowing us to determine the detailed geographic distribution of this allele, with the resulting potential to study the demographic histories of populations and the multiple factors affecting the allele frequency. The allele frequency of ALDH2*504Lys ranges from 0 to 40% among the East Asian populations based on the published data, constituting large variation just within East Asia and belying the common impression that the allele is common in all East Asians. We also find that some key areas or ethnic groups in East Asia have not been studied so far for the frequencies of ALDH2 alleles. To assess the detailed distribution of ALDH2*504Lys, we collected relevant data from the literature, and filled many blanks on the global map by typing relevant new population samples.
In this paper, we present new data on the ALDH2*504Lys frequency of 4,091 individuals in 86 populations from China, Laos, Vietnam, Russia, Japan, and other countries around the world (Table 1). Various collaborating laboratories have used different typing methods of either traditional PCR-RFLP (Oota et al., 2004) or Taqman® SNP genotyping assay (C__11703892_10). Added to the data we have extracted from the literature, the total sample size is 80,691 individuals from 366 population samples (see Table S1 and ALFRED online database for detailed data including references to the relevant geographic areas and ethnic groups).
A refined map of ALDH2*504Lys allele frequency was generated from these frequency data using the Surfer 8.0 program to interpolate the clinal patterns (Figure 1). For some of the populations, more than one sample was studied. The different samples from the same population usually had similar allele frequencies, while some showed notable deviation from the common data. Some of this inconsistency may have resulted from technical problems such as typing errors or sampling bias. For each population with multiple estimates, we chose either the most commonly estimated frequency or the estimate based on the largest sample size. For instance, the frequencies are all around 17% in 11 Korean population samples; therefore, two estimates of 3% and 36% were rejected in constructing our map. Among nine Japanese samples from Tokyo, with frequencies ranging from 21.5% to 29.0%, we chose only the frequency of 26.6%, which had the largest sample size of 642. In some cases, we preferred random sample data to the control sample data of case-control studies, or recent data obtained by new typing methods to data published decades ago. All of the data are included in Table S1 with an indication of which samples were included in Figure 1.
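The selection rule described above can be illustrated with a minimal sketch. It is not the procedure actually used to build Figure 1: the function name `pick_representative`, the median-based consistency check, and the `tolerance` threshold are all assumptions introduced here for illustration, and the sample values are hypothetical rather than taken from Table S1.

```python
from statistics import median


def pick_representative(estimates, tolerance=0.05):
    """Pick one (allele_frequency, sample_size) pair for a population.

    Hypothetical rule, assumed for this sketch only:
    1. discard estimates further than `tolerance` from the median frequency;
    2. among the remaining estimates, keep the one with the largest sample.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    centre = median(freq for freq, _ in estimates)
    consistent = [(f, n) for f, n in estimates if abs(f - centre) <= tolerance]
    # With an even number of widely spread estimates nothing may lie near
    # the median; in that case fall back to the full list.
    if not consistent:
        consistent = list(estimates)
    return max(consistent, key=lambda pair: pair[1])


# Hypothetical estimates for one population: three that agree, two outliers.
samples = [(0.17, 200), (0.16, 350), (0.18, 120), (0.03, 500), (0.36, 90)]
print(pick_representative(samples))
```

In this illustration the two outlying estimates are dropped even though one of them has the largest sample size, which mirrors the preference for the commonly observed frequency over a single deviating study.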
In total, the map shows a pattern of a single center of expansion within East Asia. The highest frequencies appear in a restricted area in Southeast China, among the Han Chinese in South Fujian province and East Guangdong province (the Hakka and Minnam populations), decreasing gradually to the north and west. Hakka from Changting County in Fujian have the highest frequency, 40.9%. The Hakka population samples from Taiwan and Sichuan also exhibit high frequencies, indicating that Hakka have maintained a high frequency during their migrations. The allele frequencies in other Han Chinese populations range from 9% to 40%, exhibiting a cline clearly decreasing from southeast to northwest, except for two small peaks in Shanghai in East China and Shandong in Central China.
Another high frequency area for the ALDH2*504Lys allele is Central Japan with 34.1% in Chiba. However, this high frequency area seems to be an extension from East China. The frequency decreases from around 30% in Honshu to around 10% in Ryukyu and Hokkaido, corresponding well to the migration history of modern Japanese (the descendants of Yayoi People, Hammer et al., 2006). Therefore, it is most probable that the ALDH2*504Lys allele in Japan was brought by the early Yayoi migrants from mainland East Asia.
Because the ALDH2*504Lys allele reduces activity in heterozygotes, though with a less severe phenotype than in homozygous ALDH2*504Lys individuals, we have also considered the combined distribution of both homozygotes and heterozygotes (i.e., 2pq + q², where q is the frequency of ALDH2*504Lys and p = 1 − q). Figure 2 shows the distribution of this ALDH2*504Lys “carrier” frequency. The high frequency area of the “carriers” is much wider than the high frequency area of the allele, as expected, indicating that more populations may be at risk for the associated disorders.
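As a worked illustration of the carrier frequency, the following minimal sketch computes 2pq + q² from an allele frequency q. It assumes Hardy-Weinberg genotype proportions, and the function name `carrier_frequency` is introduced here for illustration only.

```python
def carrier_frequency(q: float) -> float:
    """Expected fraction of individuals carrying at least one variant allele.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch):
    heterozygotes occur at 2pq and variant homozygotes at q^2, with p = 1 - q.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = 1.0 - q
    heterozygotes = 2.0 * p * q
    homozygotes = q * q
    return heterozygotes + homozygotes


# Allele frequencies spanning the range reported for East Asian samples.
for q in (0.10, 0.20, 0.409):
    print(f"allele frequency {q:.3f} -> carrier frequency {carrier_frequency(q):.3f}")
```

Under this assumption an allele frequency of 20% already corresponds to a carrier frequency of 36%, which is why the high frequency area for carriers is wider than that for the allele itself.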
The frequency decline from Southeast China to West and North China is quite smooth. The allele frequencies decrease to less than 20% in Southwest and Central China, and to less than 10% in Manchuria, Mongolia, Xinjiang, and Tibet within the broader region of East Asia. In Central Asia and Siberia, beyond the pronounced genetic influence of Han Chinese, the ALDH2*504Lys allele is rare. The allele is also detected in some Iranian populations, which may be explained by diffusion along the Silk Road. We conclude that the spread of ALDH2*504Lys to the north and west was concomitant with the expansion of Han Chinese and diffusion of the allele into surrounding populations.
Although the ALDH2*504Lys allele frequency reaches a peak in Southeastern Chinese populations, we cannot draw the conclusion that this allele originated there. The population history shows clearly that Hakka and Minnam Chinese presently in Southeast China are descendants of migrants from Central China (Wen et al., 2004). The indigenous populations in South China, such as Hmong-Mien populations (Hmong and She) from the Yangtze River area, and Daic populations (Kam, Laka, Mulam, and Maonan) from the Pearl River area, exhibit much lower frequency of ALDH2*504Lys. ALDH2*504Lys is almost absent in the aboriginal populations of Hainan and Taiwan, the two largest islands in South China. Therefore, it is unlikely that the Southeast Chinese obtained the ALDH2*504Lys allele from the indigenous populations. Unlike the gradually decreasing frequency to the north and west, the allele frequency drops sharply to the south. The allele exists at low frequency in Peninsular Southeast Asia, and is rare in the Southeast Asian islands. If this allele originated in the Southeast Chinese populations after they arrived in the present region, the quick expansion of the allele to the north and west cannot be explained. Therefore, we conclude that the ALDH2*504Lys allele was most probably carried south by the Han Chinese migrants from Central China, rather than originating in the indigenous populations in the region where it now has the highest frequencies.
Understanding why the present Central China populations exhibit a much lower ALDH2*504Lys frequency than the Southeast China populations is crucial in the study of the history of this allele. Both the decrease in Central China and the increase in Southeast China should be accounted for. In the history of China, many Altaic populations moved from North China to Central China after wars in the 4th, 12th, and 13th centuries, which also resulted in the migration of some Chinese populations from Central China to South China. These Altaic populations later merged with the Central Chinese populations after their kingdoms or dynasties ended. The most famous examples are Sienbers (Xianbei, founders of Former Yan Kingdom, Later Yan Kingdom, Western Qin Kingdom, Southern Liang Kingdom and Southern Yan Kingdom of Sixteen Kingdoms Period, and Northern Dynasties), Huns (founders of Han-Zhao Kingdom and Northern Liang Kingdom of Sixteen Kingdoms Period), Khitans (founders of Liao Dynasty), and Jurchens (founders of Jin Dynasty). Those Altaic migrants may have included very few or no individuals carrying the ALDH2*504Lys allele, because present Altaic populations have a low frequency of the allele. The merging of these Altaic populations could have decreased the proportion of ALDH2*504Lys in the Central Chinese populations. On the other hand, some as yet unknown protective effects of ALDH2*504Lys against diseases might also have contributed to the increased frequency of this allele in Southern Chinese. Since migrations to South China resulted from wars, the refugees may have been subjected to considerable stress, and a selective advantage could have had great impact. We can speculate that the ALDH2*504Lys heterozygotes had an advantage because they tended to drink less alcohol or had some other advantage (Chen et al., 1999).
The recent appreciation of other metabolic/pharmacologic roles for ALDH2 (Li et al., 2006; Larson et al., 2007; Chen et al., 2008) suggests that if selective factors are responsible for the high ALDH2*2 frequency in East Asia, their nature may be unrelated to the current association with esophageal cancer or ethanol metabolism. Alternative hypotheses of increased resistance to some disease organisms (Enoch & Goldman, 1990; Yokoyama et al., 2001; Oota et al., 2004; Yokoyama & Omori, 2005; Yang et al., 2007; Li et al., 2008) would also explain a clear advantage to heterozygotes. However, positive selection on ALDH2*504Lys cannot be detected statistically using the extended haplotype test (Sabeti et al., 2007), as very low levels of recombination exist in the genomic region of the ALDH2 locus (Oota et al., 2004). Other methods suggest positive selection on ALDH2*504Lys (Long et al., 2006).
Whatever positive selection may have increased the frequency of ALDH2*504Lys, serious diseases such as esophageal cancer or ischemia could act to decrease the ALDH2*504Lys allele frequency among the populations, since studies report that heavy alcohol drinkers who are heterozygotes for ALDH2*504Lys have a higher risk for esophageal cancer (Yokoyama & Omori, 2005; Yang et al., 2007; Li et al., 2008). In addition, ALDH2 activation was shown to reduce ischemic damage to the heart, suggesting that patients with reduced ALDH2 activity may suffer increased damage during cardiac ischemic events or coronary bypass surgery (Chen et al., 2008). The typical age of onset for esophageal cancer in the high incidence area can be earlier than 30 (He et al., 2006). We compared the geographic distribution of esophageal cancer incidence with the ALDH2*504Lys allele and carrier frequency distributions. We collected the male esophageal cancer incidence data of 355 populations from the literature, covering most countries in the world (Table S2). Central and Southeast China were examined in detail. Figure 3 illustrates the world distribution of esophageal cancer incidence and the details in East Asia. The extremely high incidences only appear in East Asia and some populations in Central Asia where the frequency of ALDH2*504Lys carriers is also high. However, comparison of Figure 2 and Figure 3 shows that the distributions are far from identical, although the high cancer incidence areas mostly fall within the high frequency area of the derived allele carriers. The acetaldehyde accumulation resulting from ALDH2*504Lys in those who drink alcohol is certainly not the only risk factor for esophageal cancer. As noted above, ALDH2 also has other metabolic functions that could be independently influencing the distribution of ALDH2 variants (Li et al., 2006; Larson et al., 2007).
Some environmental factors such as soil and vegetation characteristics and life styles may also be associated with the esophageal cancer risk (Wu et al., 2007; Fan et al., 2008; Moradi, 2008).
In Central Chinese populations, the heritability of esophageal cancer is estimated at around 49% (Han et al., 1994; Li et al., 1998). East Asian migrants in America also have a much higher esophageal cancer incidence than European Americans and African Americans (Parkin et al., 1997), indicating the pronounced heritability of esophageal cancer. Therefore, the incidence of esophageal cancer is affected by multiple factors that interact with the ALDH2*504Lys allele frequency in a complex way. That complexity could explain the differences between the distributions of esophageal cancer and the ALDH2*504Lys allele carriers in East Asia.
In most areas of South China and Southeast Asia, the incidence of esophageal cancer is much lower than that observed in Central China, indicating that there are fewer environmental risk factors and lower susceptibility to esophageal cancer in South China. However, there is still a high incidence area in Southeast China, which might be associated with the highest allele frequency of ALDH2*504Lys in exactly the same geographic area. Whereas the high incidence of esophageal cancer in Southeast China may be a consequence of the high ALDH2*504Lys frequency, it is possible that the high incidence of esophageal cancer in Central China is working to decrease the ALDH2*504Lys frequency, while cultural pressure to consume ethanol increases as the impact of *504Lys decreases. The answer depends on which risk factors are most important in which area and how they interact.
In conclusion, we hypothesize that the oriental ALDH2*504Lys variant might have originated in the ancient Han Chinese population in Central China and spread to most areas of East Asia with the expansion of Han Chinese and their genetic influences on neighboring populations over the past few thousand years. Some diseases such as esophageal cancer show a complex relationship with the frequency of ALDH2*504Lys. Where the ALDH2*504Lys frequency is high for whatever reason, as in Southeast China, there is a clear increased risk of esophageal cancer in heterozygotes that results in higher esophageal cancer incidences in some subregions. In other areas of China there is also an increased risk of esophageal cancer in heterozygotes (Yang et al., 2007; Wu et al., 2001; Chen, 2005; Yang, 2005; Xiao, 2007) but the lower frequency of ALDH2*504Lys is not sufficient to explain the high incidence of esophageal cancer. More genetic epidemiological investigations in China are required to reveal any possible reciprocal relationship between esophageal cancer and the ALDH2*504Lys allele and identify the other risk factors that appear to be present.
KKK was supported in part by U.S. Public Health Service grants AA009379 and GM057672 and by National Science Foundation grant BCS0725180. NY and ER were supported by the Biodiversity and Dynamics of Gene Pools program of the Presidium of the Russian Academy of Sciences. SB, VS and AM were supported by the Russian Foundation for Basic Research grants 09-04-01755-a, 06-04-48274, 07-04-01629, 07-04-10173, and 07-04-10177. LJ is supported by grants from the National Outstanding Youth Science Foundation of China (30625016), National Science Foundation of China (30890034), 863 Program (2007AA02Z312), and Shanghai Leading Academic Discipline Project (B111). Dr. Maria Landi from the National Cancer Institute gave us important suggestions on data analyses. We especially thank the many individuals who volunteered to provide samples for this study.
Web Resource: The ALDH2*504Lys allele frequency is being updated in ALFRED, the Allele Frequency Database: http://alfred.med.yale.edu/alfred/SiteTable1A_working.asp?siteuid=SI000734O