Comparative studies of ethnically diverse human populations, particularly in Africa, are important for reconstructing human evolutionary history and for understanding the genetic basis of phenotypic adaptation and complex disease. African populations are characterized by greater levels of genetic diversity, extensive population substructure, and less linkage disequilibrium (LD) among loci compared to non-African populations. Africans also possess a number of genetic adaptations that have evolved in response to diverse climates and diets, as well as exposure to infectious disease. This review summarizes patterns and the evolutionary origins of genetic diversity present in African populations, as well as their implications for the mapping of complex traits, including disease susceptibility.
One of the “grand challenges” of the post-genome era is to “develop a detailed understanding of the heritable variation in the human genome” (36). By characterizing genetic variation among individuals and populations, we may gain a better understanding of differential susceptibility to disease, differential response to pharmacological agents, human evolutionary history, and the complex interaction of genetic and environmental factors in producing phenotypes. Africa is an important region in which to study human genetic diversity because of its complex population history and its dramatic variation in climate, diet, and exposure to infectious disease, which have produced high levels of genetic and phenotypic variation in African populations. A better understanding of levels and patterns of variation in African genomes, together with phenotype data on variable traits, including susceptibility to disease and drug response, will be critical for reconstructing modern human origins, for understanding the genetic basis of adaptation to diverse environments, and for developing more effective vaccines and other therapeutic treatments for disease. This information will also be important for identifying variants that play a role in susceptibility to a number of complex diseases in people of recent African ancestry (172, 187, 208).
Africa is a region of considerable genetic, linguistic, cultural, and phenotypic diversity. There are more than 2000 distinct ethno-linguistic groups in Africa, speaking languages that constitute nearly a third of the world’s languages (http://www.ethnologue.com/) (Figure 1). These populations practice a wide range of subsistence patterns including various modes of agriculture, pastoralism, and hunting-gathering. Africans also live in climates that range from the world’s largest desert and second largest tropical rainforest to savanna, swamps, and mountain highlands, and these climates have, in some cases, undergone dramatic changes in the recent past (106, 172).
According to the Out of Africa (OOA) model of modern human origins, anatomically modern humans originated in Africa and then spread across the rest of the globe within the past ~100,000 years (206). The transition to modern humans within Africa was not sudden; rather, the paleobiological record indicates an irregular mosaic of modern, archaic, and regional morphological and behavioral traits that occurred over a substantial period of time and across a broad geographic range within Africa (127). The earliest known derived suite of morphological traits associated with modern humans appears in fossil remains from Ethiopia, dated to ~150–190 kya (128, 229). However, this finding does not rule out the existence of modern morphological traits in other regions of Africa before 100 kya, particularly where specimens may be less well preserved and/or where extensive archaeological and paleobiological investigations have not been conducted (172). Indeed, a multiregional origin model for modern humans within Africa is not as unlikely as it would be for global populations, considering the greater potential for migration and admixture within a single continental region (172, 241). A more fully modern suite of traits appears in East Africa and Southwest Asia around 90 kya, followed by a rapid spread of modern humans throughout the rest of Africa and Eurasia within the past 40,000–80,000 years (120, 172) (Figure 2).
Two migration routes of modern humans out of Africa have been proposed. The presence of modern humans in Oceania as early as ~50 kya (65, 66), which predates their presence in Europe ~40 kya, has suggested a southern coastal route around the Indian Ocean, in which modern humans first left Africa (possibly via Ethiopia) by crossing the Bab-el-Mandeb strait at the mouth of the Red Sea and then rapidly migrated to Southeast Asia and Oceania (62, 172) (Figure 1). This migration model is supported by the presence of very old mtDNA haplotypes in South Asia and their absence in the Levant (120, 168, 197). Other models have traditionally favored a second (or single) northern route via the Sinai Peninsula into the Levant (62, 172) (Figure 1). Regardless of the route of migration out of Africa, the shared patterns of genetic diversity among non-African populations [e.g., at the CD4 locus (200)] and the divergent patterns of genetic variation among African populations argue against repeated sampling of African diversity from multiple source populations (172, 200, 206). However, analyses of more independent loci and a larger number of African populations, particularly from East Africa, will be necessary to better estimate the number and source of migration events out of Africa (172). After modern humans migrated from Africa, there could have been some admixture with archaic populations in Eurasia, such as Neanderthals. This hypothesis remains a topic of considerable interest and debate and is the subject of a number of recent studies and reviews (46, 59, 71, 73, 77, 144, 158, 172, 184, 185, 224).
The migration of modern humans out of Africa is thought to have been accompanied by a population bottleneck. The size of the population(s) migrating out of Africa has been estimated at ~600 effective founding females (i.e., a census size of ~1800 females) on the basis of mtDNA evidence (62, 120); at ~1000 effective founding males and females (i.e., a census size of ~3000 individuals) on the basis of 783 autosomal microsatellites genotyped in the Centre d’Etude du Polymorphisme Humain (CEPH) human genome diversity panel (HGDP) (112); and at ~1500 effective founders (i.e., a census size of ~4500 individuals) on the basis of a combined analysis of mtDNA, Y chromosome, and X chromosome nucleotide diversity data (72). These estimates imply that Eurasian populations must subsequently have expanded rapidly to account for the long-term effective population size (Ne) of ~10,000 individuals (census size of ~30,000 individuals) estimated for global populations (172, 243). Indeed, several recent studies indicate a rapid expansion of Eurasian populations within the past ~50,000 years, whereas Africans have maintained a large effective population size (72, 125, 243).
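Why a brief bottleneck implies a later expansion can be illustrated with the standard approximation that the long-term effective size of a population whose size changes over time is the harmonic mean of its per-generation sizes, so a few small generations weigh heavily. The sketch below uses hypothetical 1000-generation trajectories (illustrative numbers, not estimates from the cited studies) and a census-to-effective ratio of 3, the ratio implicit in the figures above.

```python
from statistics import harmonic_mean

def long_term_ne(sizes):
    """Approximate long-term Ne as the harmonic mean of per-generation effective sizes."""
    return harmonic_mean(sizes)

def census_from_effective(ne, ratio=3):
    """Scale an effective size to a census size; ratio=3 mirrors the ~600 -> ~1800 conversion above."""
    return ne * ratio

# Hypothetical 1000-generation histories (illustrative only).
constant = [10_000] * 1000
bottleneck_no_growth = [1_000] * 50 + [10_000] * 950   # 50 founder generations, then back to 10,000
bottleneck_growth = [1_000] * 50 + [20_000] * 950      # same bottleneck, then expansion to 20,000

for label, history in [("constant", constant),
                       ("bottleneck, no growth", bottleneck_no_growth),
                       ("bottleneck, growth", bottleneck_growth)]:
    print(f"{label}: long-term Ne ~ {long_term_ne(history):,.0f}")
```

Under these assumptions the bottlenecked history without growth drops to a long-term Ne of roughly 6900, whereas expansion to 20,000 after the bottleneck brings it back just above 10,000.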
The pattern of genetic variation in modern African populations is influenced by demographic history (e.g., changes in population size, short- and long-range migration events, and admixture) as well as by locus-specific forces such as natural selection, recombination, and mutation. For example, the migration of agricultural Bantu speakers from West Africa throughout sub-Saharan Africa within the past ~4000 years and their subsequent admixture with indigenous populations have had a major impact on patterns of variation in modern African populations (157, 167, 172, 201, 235a) (Figure 1). Although Africa is critical for understanding modern human origins and genetic risk factors for disease, it has been underrepresented in human genetic studies. Much of what we currently know about African genetic diversity comes from a limited number of the ~2000 ethno-linguistic groups in Africa, and the majority of these data are from mtDNA and Y chromosome studies. Large-scale autosomal studies of African genetic diversity are only now becoming available.
Phylogenetic analyses of both mtDNA and Y chromosome DNA indicate that the oldest lineages are specific to Africa and have a time to most recent common ancestor (TMRCA) of ~200 kya (75, 206). Interestingly, the most ancient mtDNA lineage (L0d) [dated to ~106 kya (75)], which is common in click-speaking southern African Khoisan (SAK) populations, has recently been identified at low frequency (5%) in the click-speaking Sandawe population from Tanzania (75, 201). The maximum likelihood estimate of the divergence time of these populations, based on all mtDNA lineages, is ~44 kya, indicating that any common ancestry is quite old. This finding supports studies of classical polymorphisms, as well as archaeological data, suggesting that Khoisan-speaking populations may have originated in eastern Africa and subsequently migrated into southern Africa (26), although a southern African origin of Khoisan speakers cannot be ruled out.
Phylogenetic analysis indicates that the most recent African-specific mtDNA haplogroup lineage, L3, is the likely precursor of modern European and Asian mtDNA haplotypes (226). Indeed, a subset of this lineage (M1) is observed at high frequency in Ethiopian populations (101, 168) and may have expanded out of Africa ~60 kya (168). This observation strengthens the proposal that the dispersal of modern humans out of Africa may have occurred via Ethiopia (117, 200). However, a more recent analysis of whole mtDNA genomes suggests that the M1 lineage may have originated in southwestern Asia and was then introduced into East Africa ~40–45 kya (150), whereas others have argued for a much more recent introduction of the M1 lineage into Africa from the Middle East (63).
The migration of modern humans out of Africa resulted in a population bottleneck and a concomitant loss of genetic diversity (112, 169). Numerous studies have shown higher levels of nucleotide and haplotype diversity in Africans than in non-Africans in both nuclear and mitochondrial genomes (40, 72, 93, 111, 200, 202, 206, 208). Non-African populations appear to carry a subset of the genetic diversity present in sub-Saharan Africa, and more private alleles and haplotypes are observed in Africa than in other regions (38, 93, 111, 169, 200, 202, 206, 208, 243), as expected under an OOA model. For example, a resequencing study of 3873 genes in 154 chromosomes from European, Latino/Hispanic, Asian, and African American populations observed that African Americans had the highest percentage of rare single nucleotide polymorphisms (SNPs) (64%) and the lowest percentage of common SNPs (36%); additionally, 44% of all SNPs in this population were private (78). The high level of genetic diversity in African populations is also consistent with a larger long-term effective population size (Ne) than in non-Africans (72, 195, 196, 202, 206); on the basis of a resequencing analysis of several 10-kb regions, Ne is estimated to be ~15,000 for Africans and ~7500 for non-Africans (243) (see Supplemental Material).
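The link between diversity and long-term Ne rests on the neutral expectation that nucleotide diversity (π, the mean number of pairwise differences per site) estimates θ = 4Neμ for a diploid autosomal locus. The minimal sketch below assumes aligned sequences of equal length, neutral equilibrium, and a hypothetical per-site mutation rate; it is meant only to show the arithmetic, not to reproduce the cited estimates.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (pi) for aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)

def ne_from_theta(theta_per_site, mu_per_site):
    """Invert theta = 4 * Ne * mu (diploid autosome, neutral equilibrium assumed)."""
    return theta_per_site / (4 * mu_per_site)

# Toy alignment: 4 differences over 3 pairs and 4 sites gives pi = 1/3.
toy = ["AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(toy))

# Hypothetical values: theta = 7.2e-4 per site with mu = 1.2e-8 per site per generation
# corresponds to Ne = 15,000, the African estimate quoted above.
print(ne_from_theta(7.2e-4, 1.2e-8))
```

Halving θ at a fixed μ halves the implied Ne, which is the sense in which lower non-African diversity reflects a smaller long-term effective size.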
Although most studies of genetic variation in humans have focused on nucleotide and microsatellite diversity, a number of recent studies have demonstrated considerable amounts of structural variation (SV) in the human genome, including both copy number variation (which can include insertions and deletions as well as gene duplications) and inversions (17, 37, 191, 211) (http://projects.tcag.ca/variation/). Some of these structural variants are also associated with phenotypic variability (37, 171, 193). For example, variation in copy number of the amylase gene, which plays a role in starch digestion, is correlated with enzyme activity level and with diet in ethnically diverse human populations (156). Additionally, SVs may play an important role in susceptibility to common disease (109, 124). A recent study that used high-resolution paired-end mapping to identify SVs in the genomes of a single African individual (Yoruba from Nigeria) and an individual of European descent identified 1175 insertions/deletions (INDELs) and 122 inversions (103). By extrapolation, these researchers predicted 761 and 887 SVs in the full genomes of these European and African individuals, respectively. Additionally, 45% of the SVs were shared between these samples, suggesting that a large proportion of SV events occurred prior to the divergence of African and non-African populations. The majority of these SVs were less than 10 kb in size, but at least 15% were larger than 100 kb, and some SVs were predicted to be several megabases in size in both the European and African samples, indicating that the genomes of healthy individuals may differ by megabases of nucleotide sequence (103). To date, few population genetic studies of SVs across ethnically diverse populations have been performed (37). Instead, most studies have focused on the European, Japanese, Chinese, and African (Yoruba) HapMap populations (37).
A study of 67 common copy number variants (CNVs) in these populations indicated that 11% of the variation was due to differences among populations and that many of the variants were shared among populations from different regions, further supporting the argument that these variants existed prior to migration of modern humans out of Africa (171). There are currently no studies of SV variability within and between ethnically diverse African populations. Such knowledge will be informative for reconstructing human evolutionary history and for understanding the role of SVs in normal phenotypic diversity and in susceptibility to disease.
Measures of population structure on a global level indicate that only ~10%–16% (Wright’s fixation index, FST = 0.10–0.16) of observed genetic variation is due to differences among populations from Africa, Europe, and Asia (26, 40, 206, 228). Analysis of population structure using the program STRUCTURE (162), based on 1048 individuals from the CEPH human diversity panel genotyped for 993 genome-wide microsatellite and insertion/deletion markers, indicates that individuals cluster into five major geographic regions: Africa, Europe/Middle East, East Asia, Oceania, and the New World (175). Two recent studies of >500,000 SNPs genotyped in the CEPH diversity panel support these initial findings (93, 111). Analyses within the African populations indicate that additional substructure exists, particularly between hunter-gatherer and agriculturalist populations (93, 111). However, the CEPH diversity panel includes just eight African populations, four of which are agricultural Bantu speakers likely to share recent common ancestry (Figure 1). Thus, results from these studies may not reflect the full extent of population structure within Africa.
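For a single biallelic marker, the among-population share of variation summarized by FST can be computed with Nei's heterozygosity formulation, FST = (HT − HS)/HT, where HS is the mean expected heterozygosity within subpopulations and HT is the expected heterozygosity of the pooled population. This is a simplified sketch that assumes equal subpopulation sizes and known allele frequencies; published analyses typically use estimators that correct for sample size and combine many loci.

```python
def fst_biallelic(freqs):
    """F_ST = (H_T - H_S) / H_T for one biallelic locus, assuming equal-sized subpopulations.

    freqs: frequency of one allele in each subpopulation.
    """
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)   # mean within-population heterozygosity
    p_bar = sum(freqs) / len(freqs)                           # pooled allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                             # heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # monomorphic overall: nothing to partition
    return (h_t - h_s) / h_t

# Hypothetical allele frequencies in two populations.
print(round(fst_biallelic([0.2, 0.5]), 3))   # moderate differentiation
print(fst_biallelic([0.0, 1.0]))             # alternative alleles fixed: F_ST = 1
```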
Several studies of nucleotide and haplotype variation have indicated that ancestral African populations were geographically structured prior to the migration of modern humans out of Africa (72, 73, 82, 158, 200, 241). Additionally, a recent study of 800 short tandem repeat polymorphisms (STRPs) and 400 INDELs genotyped in more than 3000 geographically and ethnically diverse Africans indicates the presence of at least 13 genetically distinct ancestral populations in Africa and high levels of population admixture in many regions (F.A. Reed & S.A. Tishkoff, unpublished data). Population clusters are correlated with self-described ethnicity and shared cultural and/or linguistic properties (e.g., Pygmies, Khoisan-speaking hunter-gatherers, Bantu speakers, Cushitic speakers). This study reveals extensive admixture between inferred ancestral populations in most African populations. One exception is West African Niger-Kordofanian (i.e., Bantu) speakers, who are more genetically homogeneous than other African populations, likely reflecting the recent and rapid spread of Bantu speakers from a common origin in Cameroon/Nigeria (although fine-scale genetic structure can be detected among these populations). Thus, the pattern of genetic diversity in Africa indicates that African populations have maintained a large and subdivided population structure throughout much of their evolutionary history (Figure 2). Historic subdivision among African populations is likely due to ethnic and linguistic barriers, as well as to a number of geographic, ecological, and climatic factors (including periods of glaciation and warming) that could have contributed to population expansions, contractions, fragmentations, and extinctions during recent human evolution in Africa (172, 206).
Linkage disequilibrium (LD), the nonrandom association between alleles at different loci, is typically measured using two different estimators: D′ and r² (161). Levels and patterns of LD depend on a number of demographic factors, including population size and structure, as well as locus-specific factors such as selection, mutation, recombination (1, 161, 206, 207), and gene conversion (see Supplemental Material). LD is particularly useful for inferring evolutionary and demographic processes, as well as for mapping disease-susceptibility loci. Therefore, an understanding of levels and patterns of LD has broader implications for studies of human evolutionary history and disease.
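For two biallelic loci with alleles A/a and B/b, the basic disequilibrium coefficient is D = pAB − pA·pB; D′ rescales D by its maximum possible value given the allele frequencies, and r² = D²/[pA(1 − pA)pB(1 − pB)]. The sketch below computes all three from haplotype frequencies, which are assumed to be known (e.g., from phased data); the hypothetical example shows why the two measures can disagree, with D′ = 1 while r² is well below 1.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, D', r^2) from the four two-locus haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab          # frequency of allele A at locus 1
    p_B = p_AB + p_aB          # frequency of allele B at locus 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_AB - p_A * p_B
    # The maximum attainable |D| depends on the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2

# Hypothetical frequencies: the Ab haplotype is absent, so D' = 1,
# but the two loci have different allele frequencies, so r^2 < 1.
print(ld_stats(0.5, 0.0, 0.1, 0.4))
```

D′ = 1 says only that one of the four haplotypes is missing, whereas r² also reflects how well one marker predicts the other, which is why r² is the measure usually used when choosing markers for association studies.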
Several haplotype studies have indicated lower levels of LD in African populations than in non-Africans (200, 202, 206, 207). Studies of long-range LD between SNP markers at multiple nuclear loci confirmed these initial results and demonstrated that haplotype blocks (regions in which SNPs are in strong LD) extend over greater genomic distances and are more uniform in non-Africans than in African populations (69, 116, 174, 183).
Given that recombination is an important determinant of the extent of LD, an alternative way to assess LD is to estimate the population recombination rate (ρ = 4Ner, where Ne is the effective population size and r is the per-generation meiotic recombination rate, expressed per kb) (161). Empirical studies have shown that African Americans have higher ρ estimates than Europeans and Asians (33, 47, 60), consistent with previous studies that described less LD in Africans than in non-African populations. The divergent patterns of LD can be explained by the distinct demographic histories of African and non-African populations (206, 208) (Figure 2). Specifically, African populations have shorter blocks of LD because ancestral Africans maintained a larger effective population size (Ne) and because there has been more time for recombination to decrease levels of LD. Greater LD in non-African populations is likely the result of a founding event during the expansion of modern humans out of Africa within the past 100,000 years (200, 206, 208).
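The dependence of LD on Ne can be made explicit by combining ρ = 4Ner with Sved's classic equilibrium approximation, E[r²] ≈ 1/(1 + ρ). The sketch assumes a uniform recombination rate of 10⁻⁵ per kb per generation (roughly 1 cM/Mb) and plugs in the Ne values quoted above (~15,000 for Africans, ~7500 for non-Africans). Because non-African populations are not at equilibrium after the bottleneck, the numbers are only indicative.

```python
def population_rho(ne, r_per_kb, distance_kb):
    """Population recombination rate rho = 4 * Ne * r over a given distance."""
    return 4 * ne * r_per_kb * distance_kb

def expected_r2(rho):
    """Equilibrium drift-recombination approximation E[r^2] ~ 1 / (1 + rho)."""
    return 1 / (1 + rho)

R_PER_KB = 1e-5   # assumed: ~1 cM/Mb, uniform, per generation
for label, ne in [("African (Ne ~15,000)", 15_000), ("non-African (Ne ~7500)", 7_500)]:
    for dist in (1, 10, 50):
        rho = population_rho(ne, R_PER_KB, dist)
        print(f"{label}, {dist} kb: rho = {rho:.2f}, E[r^2] ~ {expected_r2(rho):.3f}")
```

At 10 kb, for example, this gives ρ = 6 and E[r²] ≈ 0.14 for the larger Ne versus ρ = 3 and E[r²] = 0.25 for the smaller one, matching the qualitative pattern of shorter LD blocks in Africa.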
However, an ongoing challenge has been to characterize patterns of LD among populations within continental regions, especially in Africa. Some evidence suggests variation in levels and patterns of LD among subpopulations in Africa. Tishkoff and colleagues (200) noted that African populations have divergent patterns of LD; specifically, alleles that were in positive association in one population were in negative association in another. Additionally, a resequencing analysis of the IL-13 gene in 126 geographically diverse Africans identified divergent patterns of LD across West and East African populations (195). These observations suggest that African populations are not characterized by a single pattern of LD and that each may have a distinct haplotype block structure (Figure 3). Theoretically, under a model of population subdivision, allelic associations can differ between populations due to the stochastic effects of genetic drift (1).
Recombination hotspots, regions in which historical crossing-over events are clustered and which separate relatively large haplotype blocks, have been a topic of considerable interest in the scientific literature (12, 42, 43). The occurrence of recombination hotspots in human DNA has been demonstrated empirically from studies of single-sperm DNA and from pedigree analyses (12, 43, 94). However, the extent to which hotspots are a general feature of the genome remains unknown, particularly because genetic drift can produce a similar pattern of LD (12, 47).
Several recent studies have observed that the locations of most hotspots tend to be shared among diverse populations (38, 47, 76). However, several datasets suggest that some hotspots may be population specific (34, 38, 47, 76) and that African and African American populations have more recombination hotspots than non-Africans (34, 47). Given that recombination rates vary between species (165, 223) and even between individuals (43, 101a, 102), it is possible that hotspots could also differ among ethnically distinct populations, including Africans. Indeed, the identification of haplotypes at the RNF212 gene that are associated with recombination rate and occur at different frequencies in the HapMap populations (102) raises the possibility that population-specific genetic variants may influence recombination rates.
The mapping of complex disease genes relies on the identification of an association of polymorphic markers, either individually or as haplotypes, with disease susceptibility loci (207). The International HapMap Project (http://www.hapmap.org/) has characterized patterns of haplotype structure and LD across the human genome to facilitate mapping of complex disease genes (40, 41, 129). Another goal of this project has been to identify haplotype tag SNPs (htSNPs) that distinguish major haplotypes, thereby reducing the number of SNPs needed for association studies (207).
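The idea behind htSNP selection can be sketched as a greedy cover over pairwise r²: repeatedly choose the SNP that captures (r² at or above a threshold) the most not-yet-captured SNPs. This is a simplified illustration in the spirit of pairwise tagging methods, not the HapMap Project's own procedure; the function name, the 0.8 threshold, and the r² matrix are hypothetical.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedy tag-SNP selection from a symmetric matrix of pairwise r^2 values.

    Every SNP ends up with r^2 >= threshold to at least one chosen tag (itself included).
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []

    def captured_by(i):
        # SNPs still untagged that SNP i would capture
        return {j for j in untagged if j == i or r2[i][j] >= threshold}

    while untagged:
        # Most newly captured SNPs wins; ties go to the lowest index.
        best = max(range(n), key=lambda i: (len(captured_by(i)), -i))
        tags.append(best)
        untagged -= captured_by(best)
    return tags

# Hypothetical r^2 matrix: SNPs 0-2 form one LD group, SNP 3 stands alone.
r2 = [
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.95, 0.05],
    [0.85, 0.95, 1.00, 0.20],
    [0.10, 0.05, 0.20, 1.00],
]
print(greedy_tag_snps(r2))
```

Because LD is weaker and differently structured in many African populations, a tag set chosen from one population's r² matrix may leave SNPs uncaptured in another; raising the threshold or lowering pairwise r² values increases the number of tags required.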
To date, 3.4 million SNPs have been characterized in 270 individuals from four populations: Yoruba from Nigeria, European Americans, Japanese, and Chinese. Knowledge of the frequency and distribution of these SNPs across ethnically diverse populations is important to assess their usefulness as markers for gene mapping studies in diverse ethnic groups (206, 207). A survey of 3024 SNPs spaced across 36 genomic regions genotyped in 927 unrelated individuals from the CEPH human genetic diversity panel indicates that although haplotype block sharing with the HapMap populations is high in European and East Asian populations, sharing for most other populations is low, particularly for haplotypes in African hunter-gatherer populations (38). These results suggest that distinct panels of htSNPs and denser SNP coverage will be needed for African populations (38, 207). Seven additional populations have been added to the HapMap initiative: Luhya (Bantu) and Maasai (Nilo-Saharan) from Kenya, Tuscans from Italy, Gujarati Indians from Texas, metropolitan Chinese in Denver, people of Mexican ancestry in Los Angeles, and African Americans from the Southwest United States. The characterization of SNP and haplotype diversity in these additional populations will be important for the identification of htSNPs that are more informative across ethnically diverse populations.
Although the SNPs used in the HapMap study have been highly informative for association mapping, the initial identification of SNPs in one or a few populations can result in an ascertainment bias (AB) toward high-frequency, presumably older, SNPs. Several studies have shown that AB can distort estimates of migration rates (221), mutation rates (140), recombination rates (33, 140), and LD (7, 33). Although the effects of AB can sometimes be corrected (33, 142, 143), these correction methods make a number of assumptions that are not applicable to African populations, including the assumption of no population substructure (142). To infer human genetic variation more accurately, it will be necessary to characterize the entire frequency distribution of nucleotide variants in diverse populations. Additionally, because variants associated with disease could be geographically restricted due to new mutation, genetic drift, or region-specific selection pressure, de novo identification of genetic variation in diverse African populations will be important. The HapMap ENCODE (http://www.hapmap.org) and the proposed “1000 genomes” (http://www.genome.gov) resequencing projects aim to discover novel variation, including rare SNPs and structural variants, in targeted regions of the genome (as well as in whole genomes for a subset of samples) from the extended HapMap populations and other ethnically diverse populations. The extensive levels of substructure identified in Africa will likely require analysis of additional ethnically and geographically diverse African populations.
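A simple model shows how discovery in a small panel skews SNP sets toward common variants. If a SNP has allele frequency p in the discovery population and the panel contains n chromosomes, the SNP is found only if both alleles appear, which under binomial sampling happens with probability 1 − pⁿ − (1 − p)ⁿ. The sketch assumes this idealized discovery rule and a hypothetical four-chromosome panel; real ascertainment schemes are more complex.

```python
def p_ascertained(p, n):
    """Probability that a biallelic SNP with allele frequency p is polymorphic
    (and therefore discovered) in a panel of n independently sampled chromosomes."""
    return 1 - p**n - (1 - p)**n

panel = 4  # hypothetical small discovery panel (e.g., two diploid individuals)
for p in (0.01, 0.05, 0.2, 0.5):
    print(f"p = {p:.2f}: discovery probability {p_ascertained(p, panel):.3f}")

# A variant absent from the discovery population is never ascertained,
# however large the panel (e.g., an allele private to some African populations).
print(p_ascertained(0.0, 48))
```

Common variants are almost always discovered while rare ones are mostly missed, and variants absent from the discovery population are missed entirely, which is why de novo resequencing in diverse African populations is needed.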
Natural selection, the process by which favorable heritable traits become more common in successive generations, operates to increase or decrease the frequency of mutations that affect an individual’s fitness. When a mutation is advantageous, positive selection can rapidly increase its frequency, together with that of linked variants (i.e., genetic hitchhiking), and replace pre-existing variation in a given population (i.e., a selective sweep) (83, 141, 180, 206). The strength of selection and local rates of recombination dictate how large a genomic region is affected by a selective sweep. If selection is recent, there may not have been enough time for the selected variant to become fixed in the population, resulting in an incomplete selective sweep. The genetic signatures of a selective sweep include a region of extensive LD [extended haplotype homozygosity (EHH)] and low variation on high-frequency chromosomes carrying the derived beneficial mutation relative to chromosomes carrying the ancestral allele (179, 203, 219). After the sweep, given enough time, new mutations and recombination will lead to an excess of rare variants and a decrease in the extent of LD. Weak purifying selection is also expected to produce an excess of low-frequency variants: deleterious mutations entering the population generally remain at low frequencies because their adverse effect on fitness makes it unlikely that they will reach high frequencies. In contrast, long-term balancing selection (resulting from greater fitness of heterozygotes or from situations in which maintaining multiple alleles in a population is adaptively advantageous) is expected to result in an excess of alleles at intermediate frequency.
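In its basic form, the EHH signature described above is the probability that two chromosomes carrying the same core allele are identical over the whole interval from the core SNP to a given marker: for haplotype classes with counts n_h among n chromosomes, EHH = Σ C(n_h, 2) / C(n, 2). The minimal sketch below assumes phased haplotypes stored as strings of 0/1 alleles (a hypothetical encoding) with made-up data; published scans add refinements not shown here.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core, end):
    """EHH between SNP indices `core` and `end` for chromosomes sharing the core allele."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    lo, hi = sorted((core, end))
    classes = Counter(h[lo:hi + 1] for h in haplotypes)   # distinct haplotypes over the interval
    return sum(comb(k, 2) for k in classes.values()) / comb(n, 2)

# Hypothetical phased data, core SNP at index 0.
derived = ["10000", "10000", "10000", "10001"]    # carry the derived allele (1)
ancestral = ["00110", "01011", "00101", "01100"]  # carry the ancestral allele (0)

for end in range(1, 5):
    print(end, round(ehh(derived, 0, end), 3), round(ehh(ancestral, 0, end), 3))
```

In this toy example, chromosomes with the derived allele stay identical far from the core while ancestral chromosomes diverge within a few markers, the contrast expected after a recent, incomplete sweep.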
Demographic processes can also cause similar skews in the frequency of polymorphisms. For example, genetic drift has less effect in a rapidly expanding population, so new mutations are more likely to be retained, leading to an excess of rare polymorphisms (mimicking the pattern seen under positive or purifying selection). In contrast, a population bottleneck is expected to cause the loss of low-frequency variants and thus produce an excess of intermediate-frequency variants (mimicking the pattern observed under balancing selection) (141). Although natural selection and demographic history can cause similar departures from a neutral equilibrium model, it is possible to distinguish these forces either by simulating the expected pattern of variation under different demographic scenarios or by using an outlier approach, in which targets of selection are identified because they show an unusual pattern of variation or population differentiation compared with the empirical distribution observed at other loci across the genome (98, 141). Given the vast number of published studies of natural selection, the next sections focus on a few case studies of genetic and phenotypic adaptation in sub-Saharan Africa.
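A standard summary of these frequency skews is Tajima's D, which compares two estimates of θ: one based on average pairwise differences (π) and one based on the number of segregating sites (S). Negative values indicate an excess of rare variants (as after a sweep, under purifying selection, or after population growth); positive values indicate an excess of intermediate-frequency variants (as under balancing selection or after a bottleneck). The sketch implements Tajima's (1989) normalization from summary statistics; the input values are hypothetical.

```python
from math import sqrt

def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n (sequences), segregating sites S,
    and mean pairwise differences pi (per locus, not per site)."""
    if n < 4 or seg_sites == 0:
        raise ValueError("need n >= 4 and at least one segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: pi minus Watterson's estimate S / a1
    return (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))

n, S = 10, 10
neutral_pi = S / sum(1 / i for i in range(1, n))   # pi equal to Watterson's estimate gives D = 0
for pi in (1.5, neutral_pi, 6.0):
    print(f"pi = {pi:.2f}: D = {tajimas_d(n, S, pi):+.2f}")
```

A single D value cannot by itself separate selection from demography; as the text notes, it is usually interpreted against simulations or against the genome-wide distribution.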
Malaria, caused in sub-Saharan Africa primarily by infection with the Plasmodium falciparum parasite, is a major cause of mortality in the region, resulting in more than 1 million deaths (primarily of children) each year (107). Given the enormous impact of malaria in Africa, it is not surprising that this disease has exerted strong selective pressure on African populations during recent human evolutionary history.
A number of genetic variants in African populations have been shown to confer resistance to malaria. One of the best-known genetic adaptations is the HbS mutation in the β-globin gene, which causes sickle cell anemia in homozygous individuals. Individuals who are heterozygous for the sickle cell trait are protected against malarial infection and have higher reproductive fitness (107), which maintains the HbS allele at high frequency in many malaria-endemic regions. A recent genetic study observed long-range LD extending over 400 kb at the β-globin locus on haplotypes carrying the HbS mutation in West African and Caribbean African populations, consistent with the pattern expected under positive selection (81). Other well-known hemoglobin variants associated with malaria resistance include hemoglobins C (HbC) and E (HbE). Studies have also identified patterns of LD on chromosomes carrying either the HbC variant (236) or the HbE variant (147) that are consistent with recent positive selection.
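Why heterozygote advantage keeps HbS common can be shown with the standard one-locus model: if the fitnesses of the AA, AS, and SS genotypes are 1 − s, 1, and 1 − t, the S allele converges to a stable equilibrium frequency of s/(s + t). The selection coefficients below are hypothetical round numbers chosen only for illustration (a malaria cost s to AA homozygotes and a large sickle cell disease cost t to SS homozygotes), not estimates for any population.

```python
def next_freq(q, s, t):
    """One generation of viability selection; q is the frequency of allele S.

    Genotype fitnesses: AA = 1 - s, AS = 1, SS = 1 - t; random mating assumed.
    """
    p = 1 - q
    w_aa, w_as, w_ss = 1 - s, 1.0, 1 - t
    w_bar = p * p * w_aa + 2 * p * q * w_as + q * q * w_ss   # mean fitness
    return (q * q * w_ss + p * q * w_as) / w_bar

def equilibrium_freq(s, t):
    """Stable equilibrium frequency of S under heterozygote advantage."""
    return s / (s + t)

s, t = 0.1, 0.8      # hypothetical selection coefficients
q = 0.01             # S starts rare
for _ in range(1000):
    q = next_freq(q, s, t)
print(round(q, 4), round(equilibrium_freq(s, t), 4))
```

The allele rises from rarity and then stops at an intermediate frequency because, at equilibrium, the marginal fitnesses of the A and S alleles are equal; this is the balancing-selection case described earlier.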
Glucose-6-phosphate dehydrogenase (G6PD) mutations that result in reduced enzyme activity are also associated with malaria resistance (205). The most common G6PD mutation in Africa, G6PD A-, occurs at a frequency of ~25% in malaria endemic regions (205). Several empirical studies have found evidence for recent selection at the G6PD locus in African populations. For example, a study of SNP and microsatellite haplotype variability demonstrated a signature of strong positive selection of the A- variant (205). On the basis of the breakdown of LD between the microsatellite markers and the G6PD A- variant, the age of this variant was estimated to be between 3840 and 11,760 years (205). Similarly, nucleotide sequence analyses of the G6PD locus in Africa also showed patterns of variation consistent with recent positive selection (181, 217). A signature of a recent partial selective sweep was also supported by two studies that showed extensive LD extending >400 kb on chromosomes with the G6PD A- mutation (179, 182). A comparative analysis of human and nonhuman primates suggested that signatures of selection at the A- allele are unique to humans (218). Overall, these data are consistent with other evidence suggesting that the malaria parasite has had a significant impact on humans only within the past 10,000 years (96, 220), possibly corresponding with the development of agriculture and/or pastoralism in Africa (205).
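The logic of dating a variant from the breakdown of LD can be sketched with the classic single-marker decay model. If chromosomes carrying the mutation still show an excess of the marker allele present on the founding haplotype, that excess, δ = (p_D − p_N)/(1 − p_N), decays roughly as (1 − c)^g, where c is the recombination fraction between mutation and marker and g is the number of generations; solving for g gives an age. This is a deliberately simplified model (one marker, no marker mutation, no drift), not the method used for the published G6PD estimate, and all inputs, including the 25-year generation time, are hypothetical.

```python
import math

def age_from_ld_decay(p_mut, p_background, c):
    """Generations since a mutation arose, from decay of its association with a linked marker.

    p_mut: frequency of the founding marker allele on chromosomes carrying the mutation
    p_background: frequency of that marker allele on chromosomes without the mutation
    c: recombination fraction per generation between mutation and marker (0 < c < 1)
    """
    if not 0 < c < 1:
        raise ValueError("c must lie strictly between 0 and 1")
    delta = (p_mut - p_background) / (1 - p_background)   # remaining excess association
    if not 0 < delta <= 1:
        raise ValueError("association must be positive and at most complete")
    return math.log(delta) / math.log(1 - c)

GENERATION_YEARS = 25   # assumed generation time
g = age_from_ld_decay(p_mut=0.85, p_background=0.25, c=0.001)
print(f"~{g:.0f} generations, ~{g * GENERATION_YEARS:,.0f} years")
```

With these made-up inputs the remaining association is δ = 0.8, giving about 223 generations; stronger residual association implies a younger variant, and the estimate is sensitive to the assumed c and generation time.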
Variation at the Duffy gene on chromosome 1 confers resistance to malaria caused by Plasmodium vivax, which is not prevalent in Africa today but may have been in the past. The Duffy gene encodes a receptor on the surface of erythrocytes and is characterized by three alleles (FY*A, FY*B, and FY*O). The FY*O allele is at or close to fixation in most sub-Saharan African populations but is very rare outside of Africa (79). A resequencing study of nucleotide variation at the FY locus in five sub-Saharan African populations and a comparative Italian population (79) reported that variation at this locus is two- to threefold lower in Africans than in the Italian sample, the opposite of the pattern observed at most loci. A more extensive resequencing analysis of this locus (80) also revealed reduced sequence variation around the FY*O mutation and an excess of high-frequency derived alleles at linked sites, consistent with a selective sweep of this region. Additionally, researchers observed unusually large FST values for the FY*A and FY*O variants at this locus across African, European, and Asian populations, consistent with local adaptation in different geographic regions. These results have led to the conclusion that positive selection has been a dominant force in shaping the distribution of Duffy alleles among human populations.
Lactase persistence (LP), the ability to digest milk and other dairy products into adulthood, is a classic example of a genetic adaptation in humans. LP varies in frequency among human populations; it is most common in northern Europeans and in certain African and Arabian nomadic tribes that practice pastoralism, and it is at low frequency in East Asians and West sub-Saharan Africans (88). A number of studies have demonstrated a strong association in European populations between LP and the T allele at a C/T SNP located 13,910 bp (~13.9 kb) upstream of the lactase gene (LCT), designated C/T-13910 (58, 159). In vitro studies also showed that the T-13910 variant enhances transcription from the LCT promoter (110, 148). In a study of long-range LD in Europeans, Bersaglieri and colleagues (21) found that haplotypes containing the LP-associated T-13910 variant were largely identical over nearly 1 Mb, consistent with a strong selective sweep. In genome-wide scans of selection in the HapMap samples, the LCT gene showed the strongest signal of positive selection in Europeans (180, 219).
Although the T-13910 variant is likely the causal mutation of the lactase persistence trait in Europeans, analyses of this SNP in ethnically and geographically diverse African populations indicated that it is present in only a few West African pastoralist populations, such as the Fulani (or Fulbe) and Hausa from Cameroon (35, 135, 136). These results suggested that the T-13910 allele may not be a strong predictor of lactase persistence in most sub-Saharan Africans. A more recent genotype and phenotype association study in a sample of 43 populations from Tanzania, Kenya, and the Sudan identified three novel SNPs located ~14 kb upstream of LCT that are significantly associated with the LP trait in African populations (203). These SNPs are located within 100 bp of the European LP-associated variant (C/T-13910). One LP-associated SNP (G/C-14010) is common in Tanzanian and Kenyan pastoralist populations, whereas the other two (T/G-13915 and C/G-13907) are common in northern Sudanese and Kenyans. The derived alleles at these loci (C-14010, G-13915, and G-13907) were shown to enhance transcription from the LCT promoter in vitro (203). Genotyping of 123 SNPs across a 3-Mb region in these populations demonstrated that these African LP-associated variants exist on haplotype backgrounds that are distinct from the European LP-associated variant and from each other. In addition, haplotype homozygosity extends >2 Mb on chromosomes carrying the LP-associated C-14010 variant, consistent with an ongoing selective sweep over the past 3000–7000 years. An independent study of Sudanese populations also identified a significant association of the G-13915 allele with LP in that region (90), and a recent study confirmed the enhancer effect of the G-13915 variant, together with a C-3712 variant, on the same haplotype background (57). These data provide a striking example of convergent evolution and local adaptation due to strong selective pressure resulting from shared cultural traits (e.g., cattle domestication and adult milk consumption) in Europeans and Africans. These studies also demonstrate the effect of local adaptation on patterns of genetic variation and the importance of resequencing across geographically and ethnically diverse African populations to identify population-specific variants associated with variable traits, including disease susceptibility.
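The observation that "haplotype homozygosity extends >2 Mb" on C-14010 chromosomes can be quantified with extended haplotype homozygosity (EHH): among chromosomes carrying a given core allele, the probability that two randomly drawn chromosomes are identical over every site from the core out to a given marker. A recent sweep leaves a long, slowly decaying EHH curve around the selected allele. The sketch below is a minimal illustration, assuming phased haplotypes encoded as strings of alleles and evenly spaced markers; real analyses plot EHH against physical or genetic distance over many more chromosomes, and the haplotypes shown are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes: list[str], core: int, end: int) -> float:
    """Extended haplotype homozygosity between site `core` and site `end`.

    Returns the probability that two haplotypes drawn without replacement
    carry identical alleles at every site from `core` through `end`.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = min(core, end), max(core, end)
    # Group haplotypes by their allele string over the interval [lo, hi].
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    # Identical pairs within each group, divided by all possible pairs.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


def ehh_decay(haplotypes: list[str], core: int, core_allele: str) -> list[float]:
    """EHH for carriers of `core_allele`, from the core to each site on its right."""
    carriers = [h for h in haplotypes if h[core] == core_allele]
    if len(carriers) < 2:
        raise ValueError("need at least two carriers of the core allele")
    return [ehh(carriers, core, j) for j in range(core, len(carriers[0]))]


# Hypothetical phased haplotypes; site 0 is the core SNP.
haps = ["1000", "1000", "1000", "1001", "0101", "0010", "0111", "0000"]
print(ehh_decay(haps, 0, "1"))  # → [1.0, 1.0, 1.0, 0.5]
print(ehh_decay(haps, 0, "0"))  # → [1.0, 0.3333333333333333, 0.0, 0.0]
```

In this toy sample, EHH stays high far from the core on chromosomes carrying allele "1" but collapses within two markers on chromosomes carrying allele "0", the qualitative contrast expected between a recently swept allele and its ancestral background.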
Another important dietary adaptation in human populations is the ability to taste bitter compounds. A hypothesized selective advantage of bitter taste is that it helps individuals avoid ingesting toxic substances in plants. Variation at the TAS2R genes is associated with sensitivity to bitter taste substances (54). An analysis of nucleotide and haplotype variation at 24 TAS2R genes in 55 globally diverse individuals identified substantial amino acid diversity, an excess of nonsynonymous substitutions, and high levels of population differentiation at variable sites, suggesting that amino acid variability at these loci may be maintained by natural selection (99).
There have also been a number of in-depth studies at individual TAS2R loci for which there are known associations with bitter taste perception. For example, the ability to taste phenylthiocarbamide (PTC), a synthetic bitter substance, is a highly variable trait in humans (83). Although several TAS2R loci contribute to variability in PTC taste perception (54, 55), 50%–85% of the phenotypic variance in PTC sensitivity is attributed to variation at the TAS2R38 gene (237). Studies have identified three amino acid substitutions at TAS2R38 that are in nearly complete LD in non-African populations and that form two common amino acid haplotypes (a taster haplotype, PAV, and a nontaster haplotype, AVI; PAV is dominant). Furthermore, considerably more haplotype variability has been observed in Africa (M.C. Campbell and S.A. Tishkoff, unpublished data, 238), and these haplotypes are associated with a broad range of taste perception phenotypes (M.C. Campbell and S.A. Tishkoff, unpublished data).
Genetic analyses of both African and non-African populations have detected signatures of balancing selection at the TAS2R38 locus, including an excess of intermediate-frequency variants, a low amount of genetic differentiation between the continental populations (FST = 0.056), and an ancient divergence between the major taster and nontaster haplotypes (238). Furthermore, Wooding and coworkers (237) showed that PTC taste sensitivity in chimpanzees is associated with different amino acid haplotypes at the TAS2R38 gene than in humans, implying that the taster/nontaster variants arose independently in humans and chimpanzees.
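One common way to quantify an "excess of intermediate-frequency variants" is Tajima's D, which contrasts the mean number of pairwise differences (π) with Watterson's estimator based on the number of segregating sites (S/a1). Intermediate-frequency variants inflate π relative to S/a1 and push D above zero, as expected under balancing selection, whereas an excess of rare variants (e.g., after a sweep or population expansion) pushes D below zero. The sketch below is a minimal implementation of the standard formula from per-site allele counts; it is not necessarily the statistic used in (238), and the counts in the usage note are hypothetical. Demographic history, including population structure and bottlenecks, can also shift D, so a positive value alone does not establish balancing selection.

```python
import math


def tajimas_d(n: int, allele_counts: list[int]) -> float:
    """Tajima's D for a sample of n sequences.

    `allele_counts` gives, for each site, how many of the n sequences carry the
    non-reference allele; sites with a count of 0 or n are monomorphic and ignored.
    """
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    seg = [k for k in allele_counts if 0 < k < n]
    s = len(seg)  # number of segregating sites
    if s == 0:
        raise ValueError("no segregating sites")
    # pi: average number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = s / a1  # Watterson's estimator
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

With 10 sequences, four sites at intermediate frequency (`tajimas_d(10, [5, 5, 5, 5])`) give D of roughly +2.2, while four singleton sites (`tajimas_d(10, [1, 1, 1, 1])`) give roughly −1.7.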
It has also been suggested that low sensitivity to bitter taste may provide a selective advantage against malarial infection in some African populations. Recent data have shown that the K172 allele at the TAS2R16 gene (which is associated with low sensitivity to bitter taste substances, including salicin) occurs at moderately high frequencies in malaria-endemic regions in central Africa (189). Furthermore, sequence analysis of the entire TAS2R16 coding region, as well as part of the 5' and 3' untranslated regions (UTRs), in 997 individuals from 60 human populations detected a signature of positive selection on chromosomes with the K172N variant (189). Although the variant driving the signal of positive selection was not conclusively determined, the authors speculated that differential selection in malarial (favoring the K variant) and non-malarial (favoring the N variant) environments has maintained both alleles at relatively high frequencies in Africa. An earlier study suggested that the higher dietary intake of naturally occurring bitter substances, such as organic cyanogens, may be protective against malarial infection in populations from Central and Southeast Africa (91). Additionally, an inhibitory effect of cyanide on the normal development of the P. falciparum parasite has been observed in vitro (137a). Thus, individuals with a low sensitivity to the bitter taste of cyanide compounds may have a survival advantage against malarial infection through a higher intake of this bitter compound (189). However, this hypothesis remains to be tested.
Several studies have reported more evidence for positive selection in populations outside of Africa relative to those in Africa. For example, Akey and colleagues (6) examined 132 genes in 24 African Americans and 23 European Americans and found evidence for selection at eight genes only in the European-derived population. A number of genome scans for selection (which aim to identify de novo targets of selection) have identified differential patterns of selection in Africans and non-Africans (25, 84, 97, 98, 137, 180, 192, 219, 225, 234). Several of these studies have observed more loci under recent selection in non-African relative to African populations (25, 97, 137, 192, 234). Furthermore, it has been hypothesized that non-African populations have experienced more recent strong local adaptation as modern humans migrated out of Africa into novel and diverse environments (6, 84, 192, 234).
Although an increase in positive selection might conceivably occur in populations that have migrated into new environments, it is premature to conclude that the amount of recent positive selection is greater in non-African versus African populations. For example, Voight and colleagues (219) identified widespread evidence of recent selection in each of the HapMap samples in a genome-wide scan. Furthermore, they observed the strongest signals of selection in the HapMap Nigerian population compared to HapMap European and Asian populations (although the power to detect selection may be greater in larger African populations). Additionally, a recent study demonstrated that non-Africans have an excess proportion of nonsynonymous variation, including many variants that are likely to be deleterious (114), which is attributed to population demographic history rather than increased adaptive evolution. Therefore, demographic factors may have influenced the differential pattern of selection observed in African and non-African populations.
However, the primary limitation in comparing the frequency of selective events in African and non-African populations is that African populations have been severely understudied. For instance, many studies have used African Americans as the sole representative of African populations. However, the statistical power to detect selective sweeps is likely to be lower in studies using African American samples because of their recent admixture with Europeans (25, 234). Demographic events such as a population bottleneck in Eurasian populations may also have mimicked patterns of variation caused by selective events in non-African populations (25). Indeed, one might predict that Africans would have relatively high amounts of local adaptation, considering that Africa has the highest levels of genetic diversity and contains populations living in a wide range of environments and with high exposure to infectious disease. However, signatures of selection in African populations may be missed because studies have relied mainly on one or two African populations. To gain a clearer understanding of genetic and phenotypic adaptations in Africa, it is important to scan for genetic signatures of selection in a broad range of ethnically diverse African populations living in distinct environments. Additionally, it will be important to clearly identify functional variants that are likely to be targets of selection and to verify their impact on phenotypic variation (32) before the relative number of selection events in African and non-African populations can be clearly determined.
Host genetic variation plays a key role in influencing susceptibility to many infectious diseases in humans. Through recurrent exposure to different pathogens, a number of genetic adaptations have evolved that provide resistance to infection. Although the number of known candidate genes related to infectious disease has expanded, progress in the identification of genes that influence infectious disease susceptibility and/or resistance in diverse African populations has been slow. Understanding the genetic basis of infectious disease in Africans may provide useful insight into devising effective strategies to combat these diseases, which have a large impact on African populations. Here we focus on the three infectious diseases that cause the highest number of deaths in Africa: acquired immune deficiency syndrome (AIDS), tuberculosis (TB), and malaria.
It is estimated that 42 million people worldwide are infected with HIV, the virus that causes AIDS (146). Moreover, more than 75% of HIV-1 infections and 84% of all AIDS-related deaths occur in sub-Saharan Africa (235). Although most HIV-infected individuals progress to more advanced stages of this disease, researchers have identified a number of Africans who do not progress to AIDS despite exposure to HIV (146). This observation suggests that polymorphisms associated with disease susceptibility and resistance may be present in African populations (Table 1).
Chemokine receptors, which aid in the entry of HIV into the host cell, play a role in AIDS susceptibility (146). The protective effect of the Δ32 mutation at the chemokine receptor 5 (CCR5) gene has been well established in populations of Northern European ancestry (108). Because this mutation occurs at a relatively low frequency in sub-Saharan Africa, it is unlikely to play a major role in disease resistance in that region. However, several studies have demonstrated that mutations at the CCR2 gene play a role in resistance to HIV in Africans; for example, the CCR2–64I allele is associated with delayed HIV disease progression in African and African American populations (118, 233), although this effect was not observed in a large number of individuals from Uganda (170), indicating that the underlying genetic and/or environmental factors that affect resistance may vary across African populations. See Table 1 for more details about other genes identified in African populations.
Mycobacterium tuberculosis infection leading to TB causes significant mortality throughout the world, particularly in resource-poor countries. Furthermore, HIV infection is strongly associated with an increased risk for TB in sub-Saharan Africa (45, 119). In Africa, the rates of TB range from 50 to more than 300 cases per 100,000 individuals (215). Genetic variation has been shown to influence susceptibility to TB in African populations. For example, polymorphisms in the NRAMP1 gene have been associated with increased TB susceptibility in ethnically diverse populations from the Gambia (20a) and a single population in northern Tanzania (188). Other genes that are thought to play a role in TB susceptibility are shown in Table 1.
As previously mentioned, malaria infection has been a strong selective force in recent human evolution. Approximately 40% of the world's population is at risk for malaria infection, and approximately 90% of all malaria deaths occur in sub-Saharan Africa (67). It is estimated that 500 million new cases of malarial illness caused by the Plasmodium falciparum parasite occur every year in Africa (http://www.rbm.who.int/amd2003/amr2003/ch1.htm).
Most of the common variants associated with resistance to malarial infection in Africans are expressed in red blood cells or play a role in immune response. These variants include hemoglobin HbS, HbC, HbE and α+-thalassemia, the G6PD A- allele, the FY*O Duffy allele (which prevents P. vivax from invading erythrocytes), and a number of HLA alleles (67, 83, 86).
Interestingly, several studies have shown that ethnically diverse African populations may differ in regard to genetic susceptibility to malarial infection. A variant in the promoter region of the IL4 gene, for example, is associated with a decrease in P. falciparum infection in the pastoralist Fulani from Mali, as evidenced by lower parasite load, but no such genetic association is observed in the neighboring agriculturalist Dogon population (52, 213). Other studies also reported lower prevalence of malaria parasites and fewer clinical attacks of malaria among the Fulani compared to other ethnically distinct populations in neighboring villages (107, 133, 152a). Differences in the expression profile of genes involved in immune response in the Fulani have been suggested to explain the distinct resistance to malaria in this population, but not in neighboring African populations (210). Therefore, novel genetic adaptations to malaria have evolved in genetically distinct populations in response to differential exposure to pathogens. These studies demonstrate that ethnically diverse African populations may have different resistance or susceptibility alleles (Table 1), motivating the need to include a large number of genetically distinct populations in studies of infectious disease susceptibility in Africa.
There are a number of useful approaches for mapping complex disease traits that involve analyses of the association of markers and disease traits in pedigrees or parent/offspring trios [i.e., linkage and transmission disequilibrium test (TDT) analyses], and/or in populations (i.e., case/control association studies) (87). Several genome-wide SNP panels have been developed for use in genome-wide association studies (GWAS) in populations. Association mapping for complex disease relies on having some a priori knowledge of genetic diversity, population structure, and LD in both case and control populations to identify polymorphisms associated with disease genes and to avoid erroneous associations (122). Given the high levels of substructure and admixture between genetically distinct populations in Africa, even within small geographic regions, it is particularly important to control for population heterogeneity and substructure in GWAS of African populations [using, for example, programs such as PLINK (166)]. Another useful method for mapping complex traits in highly admixed populations (e.g., African Americans who have African and European ancestry) is mapping by admixture linkage disequilibrium (MALD) (see Supplemental Material). The MALD approach assumes that the genomic region that contains the susceptibility allele for a given disease will have enriched ancestry from populations in which the disease phenotype is more prevalent. Thus, detailed characterization of allelic variation across ancestral West African populations, which will enable accurate inference of African ancestry, will be important for the success of this approach.
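The transmission disequilibrium test (TDT) mentioned above uses parent/offspring trios: for each heterozygous parent of an affected child, it records whether a candidate allele was transmitted (b) or not transmitted (c), and tests b against c with the McNemar-type statistic (b − c)²/(b + c), which has a chi-square distribution with 1 degree of freedom under the null hypothesis. Because alleles are compared within the same parents, the TDT is robust to population stratification, which makes family-based designs attractive in highly structured African populations. The sketch below is a minimal illustration, assuming a biallelic marker and counts already tallied from heterozygous parents; the counts are hypothetical, and it is not PLINK's implementation.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Transmission disequilibrium test for one allele of a biallelic marker.

    `transmitted` (b) and `untransmitted` (c) count, over heterozygous parents
    of affected offspring, how often the allele was or was not passed on.
    Returns the chi-square statistic (1 df) and its upper-tail p-value.
    """
    b, c = transmitted, untransmitted
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the allele was transmitted 60 times and withheld 40 times.
chi2, p = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # chi2 = 4.00, p = 0.0455
```

Only heterozygous parents are informative, since a homozygous parent transmits the same allele regardless of disease status.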
Overall, the genetic factors underlying complex diseases are still poorly understood. To date, two models of complex disease have been proposed. The common disease/common variant (CD/CV) hypothesis posits that alleles influencing complex diseases are relatively common and are therefore found in multiple populations (51, 206). In contrast, some data have suggested that complex disease is caused by rare susceptibility alleles at many loci with small effect (50, 160). Additionally, gene × environment interactions as well as epistatic interactions among loci likely influence complex disease susceptibility (including infectious disease susceptibility) (44, 230, 232). Due to local adaptation, there may be population- or region-specific susceptibility alleles underlying some complex diseases in Africa. Here we discuss three complex diseases that are common in populations of recent African descent (see Supplemental Material for a review of genetic susceptibility to prostate cancer in populations of African descent and Reference 187 for a detailed review of additional genetic disease susceptibility studies in Africa).
Obesity is a multifactorial disease that disproportionately affects African Americans and Afro-Caribbeans living in the United States (151). Moreover, this disease is increasing in prevalence in sub-Saharan Africa (13), particularly among urban residents. A recent study reported that the incidence of obesity in urban West Africa has more than doubled (114%) over the past 15 years (2). Obesity is a serious health concern because it is closely associated with other common disorders, such as type 2 diabetes and hypertension (208).
Although environmental factors are important determinants of obesity (92, 132), studies have also identified candidate loci that contribute to the onset of this disease. For example, the human uncoupling protein gene UCP3 has been correlated with obesity and lower resting energy expenditure in African Americans and the Mende tribe of Sierra Leone (10, 100); in contrast, the risk for obesity associated with this gene varied among non-African populations (48, 134, 152). Another closely linked human uncoupling protein gene, UCP2, was also correlated with obesity in African American children (240). However, whether these tightly linked genes exert independent effects on obesity in populations of African ancestry has not been firmly established.
Other candidate gene studies have reported associations between polymorphisms in promoter regions and weight-related phenotypes in populations of African descent. For example, polymorphisms in the promoter of the angiotensin-converting enzyme (ACE) gene are correlated with obesity in Nigerians and African Americans (105). Also, allelic variation in the promoter of the agouti-related protein (AGRP) gene was found to affect gene expression and was implicated in the regulation of body weight in people of African origin (11, 22).
Genome-wide linkage analyses of obesity-related phenotypes have indicated strong linkage to chromosomes 1, 2, 5, 7, 8, and 11 in West Africans (4, 29). These results suggest that multiple loci, together with environmental factors, likely contribute to the phenotypic variance of obesity-related traits.
Type 2 diabetes mellitus (T2DM) is a late-onset metabolic disorder characterized by reduced insulin secretion and impaired insulin action (89). African Americans have a twofold higher risk of T2DM than other populations in the United States. Furthermore, the prevalence of T2DM is lower in Africa (~1%–2%) than among people of African descent in industrialized nations (~11%–13%) (177).
One of the earliest T2DM susceptibility loci identified was the calpain 10 (CAPN10) gene on chromosome 2, which was implicated in the disease in Mexican Americans (89). Subsequent association studies, however, have yielded inconsistent results among geographically distinct populations (70). In addition, the risk associated with this gene differs between ethnically diverse populations from West Africa. Specifically, a CAPN10 haplotype defined by known risk polymorphisms was associated with T2DM in populations from Nigeria, but not in distinct populations from Ghana (31). Further study in a larger set of Africans will be needed to resolve this discrepancy.
Other recently identified candidate loci associated with diabetes or diabetes-related traits in Africans and African Americans include the AGRP gene (22), the transcription factor 7-like 2 (TCF7L2) gene (85), and the proprotein convertase subtilisin/kexin type 2 (PCSK2) gene (108a). Genome-wide linkage analyses in West African families have also indicated suggestive linkage to diabetes on chromosomes 12 and 19, as well as stronger evidence of linkage on chromosome 20 (177). Other studies have reported strong linkage for several quantitative traits that contribute to diabetes on chromosomes 4, 6, 8, 10, 15, 16, 17, and 18 (28, 30, 178).
Increased risk for type 2 diabetes in Africans and other indigenous populations has also been attributed to changes in selective pressure (i.e., the thrifty gene hypothesis) (139). Prior to 10,000 years ago, all modern humans subsisted as hunter-gatherers who likely experienced frequent cycles of feast and famine. According to the thrifty gene hypothesis, genetic variants that once promoted the efficient absorption, storage, or utilization of nutrients in this ancestral environment are maladaptive in modern environments, increasing risk for disease (51, 153). Although genetic evidence for this hypothesis has been inconclusive (34, 68, 214), recent data from the Macaque Genome Project indicated that a number of polymorphisms in the macaque correspond to known disease-predisposing alleles in humans (39). These results suggest that ancestral variants may influence disease susceptibility in humans.
Hypertension (HT) disproportionately affects people of African descent living in Western environments. For example, African Americans have a 1.6-fold higher prevalence of HT than European Americans (49). However, HT was not considered to be a major disease in sub-Saharan Africa until recently. Studies have now reported a growing incidence of HT, especially in urban settings in Nigeria, Cameroon, and Tanzania (56, 212).
To date, candidate gene studies have shown inconsistent associations with HT. For example, increased risk for disease was correlated with the G protein β3-subunit (GNB3) gene, angiotensin (AGT), and angiotensin II receptor (AGTR1) in some populations of African ancestry (53, 123, 186, 245), but these results were not replicated in other studies (61, 216, 231). A study of the heritability of AGT and ACE levels in Nigerians and African Americans indicated that heritability is high in Nigerians (77% for AGT and 67% for ACE) but low in African Americans (18% for AGT and ACE), suggesting a strong environmental component to variability in African American populations (44).
Other evidence has suggested that interactions between susceptibility loci may play an important role in HT susceptibility. For example, polymorphisms in the ACE, ACE4, and ACE8 genes were jointly associated with blood pressure through epistatic interaction in Nigerian families (244). Moreover, interaction between genes (ACE and GRK4) on different chromosomes was also found to influence blood pressure in a cohort of West African families (231). These studies suggest a complex model of disease susceptibility that involves epistatic interactions and may account for inconsistent associations in previous studies.
The results of genome-wide linkage studies have also suggested that chromosomes 6q24 and 21q21 likely contain genes that influence risk for hypertension in African Americans. In another study, fine mapping of HT susceptibility loci on chromosome 6 uncovered polymorphisms in the vanin 1 (VNN1) gene that were correlated with HT in African Americans and Mexican Americans (246).
From an evolutionary perspective, salt retention (a characteristic of HT) has been proposed to represent a phenotypic adaptation to heat. Specifically, ancient human populations living in hot, humid areas, who consumed low levels of dietary salt, were theorized to have adapted to their environment by retaining salt, while populations in cooler, temperate climates adapted to conditions of higher sodium levels (14). Under this scenario, polymorphisms that promote salt retention will increase in frequency in hot and humid environments due to selective pressure. This hypothesis was recently supported by several studies that observed the highest frequency of variants associated with salt retention and/or high blood pressure in Africans and decreasing allele frequency outside of Africa (138, 198, 242).
Individuals with distinct ancestry are known to vary in drug response. For example, drugs commonly used to treat heart disease are known to be less effective in individuals of African descent relative to individuals of European descent (see Supplemental Material). To better understand variation in drug response, it is important to characterize levels and patterns of genetic diversity at genes that may influence drug response, drug metabolism, and/or transport in ethnically diverse populations (23). To date, some of the most intensively studied drug metabolism enzyme (DME) loci include cytochrome P450s (CYPs), N-acetyl transferases (NATs), and multidrug transporters (MDR). These genes are highly polymorphic and their variation results in proteins with increased, normal, or decreased activity (187).
Studies have shown differences in the distribution of variation at these DME loci in African and non-African populations. For example, the CYP2B6 gene is involved in the metabolism of several clinically important drugs, including artemisinin and efavirenz used to treat malaria and HIV infection, respectively. Common CYP2B6 polymorphisms vary in frequency between ethnically diverse populations from West Africa, as well as between African and non-African populations. The functional significance of most of these variants is not yet known (131). However, recent studies have shown that two polymorphisms (983T>C and 516G>T), found at a higher frequency in African populations relative to non-Africans, are associated with a reduction in CYP2B6 protein expression as well as a large increase in levels of the antiretroviral drug efavirenz in the plasma of African HIV patients (130, 145, 239).
Additionally, the frequency of functional variants at the NAT2 gene, known to play a role in the metabolism of the drug isoniazid (used to treat TB), was found to vary among ethnically diverse African populations. In particular, haplotypes associated with the fast-acetylation and the slow-acetylation phenotypes at the NAT2 gene differ in frequency among African populations, particularly between hunter-gatherer and agriculturalist populations (H.M. Mortensen and S.A. Tishkoff, unpublished data, 154, 155). However, Africans have high levels of haplotype diversity and the effect of many of these haplotypes remains unknown. These studies imply that ethnically diverse Africans may differ in response to drugs used to treat infectious disease due to variation at genes involved in drug metabolism or transport. Given the high numbers of deaths due to infectious disease in Africa, the study of variation at genes that play a role in response to these drugs across ethnically diverse Africans is of critical importance.
To date, only a fraction of the ~2000 linguistically distinct ethnic groups in Africa has been extensively studied for genome-wide variation. Extensive sampling of East African populations will be informative for testing models of the origin and dispersal of modern humans out of Africa, whereas in-depth sampling of West African populations will be informative for determining African American ancestry and for identifying markers and populations useful in MALD mapping. It is important to use an ethical approach when collecting samples for genetic diversity studies, particularly if those samples and genetic data will be made publicly available. In addition to obtaining research permits from local African governments and informed consent from individual participants (including the benefits and risks involved in the use of samples in current and future studies), results should ideally be made available to participants after translation into the local language. Additionally, efforts should be made to train local African scientists and to build resources across Africa for independent human genetics research. The African Society of Human Genetics was formed in 2004 to help achieve many of these goals (176) (http://www.afshg.org).
As we begin to build diverse sets of samples, a shift toward genome-wide studies of genetic diversity will be particularly informative for making inferences about African demographic history and the genetic basis of complex traits. A better understanding of the distribution and frequency of structural variation and its role in phenotypic diversity will be of particular interest. The development of high-throughput SNP genotyping methods (e.g., the Affymetrix 6.0 and Illumina 1M SNP chips), which are rapidly becoming less expensive, will make GWAS feasible in large numbers of Africans. Indeed, several recent GWAS in Europeans that identified genetic variants associated with traits such as height, skin pigmentation, and eye color indicate that very large sample sizes (>3000–5000) are required to identify variants associated with complex traits influenced by multiple loci and the environment (194, 227). However, to date no GWAS has been performed on a comparable scale in Africa. Such studies will be highly informative for identifying genes that play a role in a number of common traits (e.g., height), as well as genes that play a role in susceptibility to infectious and other complex diseases. The high levels of genetic substructure in Africa, even within small geographic regions (F.A. Reed and S.A. Tishkoff, unpublished data), require determination of individual ancestry and proper correction for substructure in association studies.
Genotyping ethnically diverse African populations living in distinct climates and with distinct subsistence patterns on these high-density SNP panels will also be useful for conducting whole-genome scans of selection to identify genes that have played a role in local adaptation and disease. The continued development of statistical and computational methods for inferring demographic history and natural selection will shed light on human evolutionary history in Africa. Approaches that incorporate detailed geographic information such as natural boundaries (e.g., mountain ranges, rivers, and deserts) will be particularly useful for inferring African demographic history (163, 172).
Given that African populations possess a large fraction of population-specific alleles and may have experienced local adaptation, resequencing across diverse African populations will be important for identifying population-specific functional variants. Targeted resequencing of genes that play a key role in disease susceptibility and drug response will be particularly important for the design of more effective treatments for individuals of recent African descent. Additionally, whole-genome resequencing (a goal of the 1000 Genomes Project) will be informative for identifying large-scale structural variants and rare variants that may play an important role in disease, and for reconstructing human evolutionary history.
We thank S.M. Williams, J.M. Akey, J.S. Friedlaender, and N.A. Rosenberg for critical review of sections of the manuscript and/or figures and for helpful suggestions. The authors are funded by the U.S. National Science Foundation (NSF) grant BSC-0552486, U.S. National Institutes of Health (NIH) grant R01GM076637, and a David and Lucile Packard Career Award to S.A.T.
The authors are not aware of any biases that might be perceived as affecting the objectivity of this review.