PCA of genetic variation within Africa indicated 43 significant PCs (P < 0.05 under the Tracy-Widom distribution). PC1 (10.8% of the extracted variation) distinguishes eastern and Saharan Africa from western, central, and southern Africa. The second PC (6.1%) distinguishes the Hadza, and the third PC (4.9%) distinguishes Pygmy and SAK individuals from other Africans. The fourth PC (3.7%) is associated with the Mozabites, some Dogon, and the CMA individuals, who show ancestry from the European–Middle Eastern cluster. The fifth PC (3.1%) is associated with SAK speakers. The 10th PC (2.2%) was of particular interest because it is associated with the SAK, Sandawe, and some Dogon individuals, suggesting shared ancestry.
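How a count such as "43 significant PCs" and a figure such as "10.8% of the extracted variation" can arise is illustrated by the small sketch below. It is our own illustration, not the authors' pipeline. It assumes the input is the list of nonzero eigenvalues of the individuals' covariance matrix, applies the Tracy-Widom normalization of Johnstone as adapted to genotype data by Patterson, Price and Reich (2006), and estimates the effective number of markers by matching the first two moments of a white Wishart matrix (EIGENSOFT's own estimator may differ in detail). The critical value 0.9793 (approximate 95th percentile of the Tracy-Widom distribution with beta = 1) and all function names are assumptions made for illustration.

```python
import math

# Approximate 95th percentile of the Tracy-Widom distribution (beta = 1);
# a normalized eigenvalue above this value corresponds to P < 0.05.
TW1_95 = 0.9793


def percent_variance(eigenvalues):
    """Percentage of the total variation carried by each PC."""
    total = sum(eigenvalues)
    return [100.0 * lam / total for lam in eigenvalues]


def tw_statistics(eigenvalues):
    """Tracy-Widom statistic for each successive PC.

    `eigenvalues` are the nonzero eigenvalues of the sample covariance
    matrix. At step k the k largest eigenvalues are set aside and the
    largest remaining one is tested against the rest.
    """
    lams = sorted(eigenvalues, reverse=True)
    stats = []
    for k in range(len(lams) - 1):
        rest = lams[k:]
        m = len(rest)
        s1 = sum(rest)
        s2 = sum(lam * lam for lam in rest)
        denom = m * s2 - s1 * s1
        if denom <= 1e-12 * s1 * s1:
            break  # remaining eigenvalues are all equal: nothing left to test
        # Effective number of markers from matching the first two moments
        # of a white Wishart matrix (an assumption of this sketch).
        n_eff = (m + 1) * s1 * s1 / denom
        if n_eff <= 1:
            break
        # Largest remaining eigenvalue, rescaled so the m eigenvalues sum to m.
        l = m * rest[0] / s1
        a = math.sqrt(n_eff - 1) + math.sqrt(m)
        mu = a * a / n_eff
        sigma = (a / n_eff) * (1 / math.sqrt(n_eff - 1) + 1 / math.sqrt(m)) ** (1 / 3)
        stats.append((l - mu) / sigma)
    return stats


def count_significant(eigenvalues, critical=TW1_95):
    """Number of leading PCs whose statistic exceeds the critical value."""
    count = 0
    for x in tw_statistics(eigenvalues):
        if x <= critical:
            break
        count += 1
    return count
```

For example, one strong eigenvalue on top of 19 nearly equal ones, `count_significant([5.0] + [0.99] * 7 + [1.0] * 6 + [1.01] * 6)`, yields 1: the leading eigenvalue stands well clear of the bulk, while the next one is indistinguishable from sampling noise.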
We incorporated geographic data into a Bayesian clustering analysis that assumes no admixture (TESS software) (25) and distinguished six clusters within continental Africa. The most geographically widespread cluster (orange) extends from far western Africa (the Mandinka) through central Africa to the Bantu speakers of South Africa (the Venda and Xhosa). It corresponds to the distribution of the Niger-Kordofanian language family, possibly reflecting the spread of Bantu-speaking populations from near the Nigerian/Cameroonian highlands across eastern and southern Africa within the past 5000 to 3000 years (26, 27). Another inferred cluster includes the Pygmy and SAK populations (green), with a noncontiguous geographic distribution in central and southeastern Africa, consistent with the STRUCTURE and phylogenetic analyses. A geographically contiguous cluster (blue) extends across northern Africa into Mali (the Dogon), Ethiopia, and northern Kenya. With the exception of the Dogon, these populations speak Afroasiatic languages. Chadic-speaking and Nilo-Saharan–speaking populations from Nigeria, Cameroon, and central Chad, as well as several Nilo-Saharan–speaking populations from southern Sudan, constitute another cluster (red). Nilo-Saharan and Cushitic speakers from Sudan, Kenya, and Tanzania, as well as some of the Bantu speakers from Kenya, Tanzania, and Rwanda (Hutu/Tutsi), constitute another cluster (purple), consistent with linguistic evidence for gene flow among these populations over the past ~5000 years (28, 29). Finally, the Hadza are the sole constituents of a sixth cluster (yellow), consistent with their distinctive genetic structure identified by PCA and STRUCTURE.
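The spatial ingredient of a TESS-style no-admixture model can be sketched as follows. This is an illustration under stated assumptions, not TESS itself: each individual carries a single cluster label; individuals at neighboring sampling sites (TESS derives neighbors from a tessellation of the sampling locations) are more likely to share a label through a Potts-type prior whose interaction strength we call `psi`; and cluster allele frequencies are taken as known. The hypothetical function `label_posterior` returns the conditional probability of each label for one individual. A full analysis would resample all labels and frequencies repeatedly by MCMC.

```python
import math


def genotype_loglik(genotypes, freqs):
    """Log-likelihood of diploid genotypes (0, 1 or 2 copies of the
    counted allele) under Hardy-Weinberg proportions."""
    total = 0.0
    for g, p in zip(genotypes, freqs):
        total += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
    return total


def label_posterior(genotypes, cluster_freqs, neighbor_labels, psi):
    """P(label = k | genotypes, neighbors' labels) for each cluster k.

    The prior rewards agreement with neighbors: each neighbor already in
    cluster k adds `psi` to the log-score of k (Potts-model form).
    """
    scores = []
    for k, freqs in enumerate(cluster_freqs):
        same = sum(1 for z in neighbor_labels if z == k)
        scores.append(genotype_loglik(genotypes, freqs) + psi * same)
    top = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [w / norm for w in weights]
```

When both clusters have identical allele frequencies, the genotypes carry no information and the spatial prior alone decides: with three neighbors labeled 1 and `psi = 0.5`, the probability of label 1 rises to about 0.82. When the genotypes match one cluster strongly, they outweigh the neighbors.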
STRUCTURE analysis of the Africa data set indicated 14 ancestral clusters (figs. S15 to S18). Analyses of subregions within Africa indicated additional substructure (figs. S19 to S29). At low K values, the Africa-wide STRUCTURE results (fig. S15) recapitulated the PCA and worldwide STRUCTURE results. However, as K increased, additional population clusters were distinguished (4): the Mbugu [who speak a mixed Bantu and Cushitic language (30), shown in dark purple]; Cushitic-speaking individuals of southern Ethiopian origin (light purple); Nilotic Nilo-Saharan–speaking individuals (red); central Sudanic Nilo-Saharan–speaking individuals (tan); and Chadic-speaking and Baggara individuals (maroon). At K = 14, subtle substructure was also apparent among East African Bantu speakers (light orange), West Central African Bantu speakers (medium orange), and individuals from Nigeria and farther west who speak various non-Bantu Niger-Kordofanian languages (dark orange). Bantu speakers of South Africa (Xhosa, Venda) showed substantial levels of the SAK and western African Bantu AACs and low levels of the East African Bantu AAC (the latter is also present in Bantu speakers from the Democratic Republic of the Congo and Rwanda). Our results indicate a distinct East African Bantu migration into southern Africa and are consistent with linguistic and archeological evidence of East African Bantu migration from an area west of Lake Victoria (28) and of the incorporation of Khoekhoe ancestry into several Southeast Bantu populations ~1500 to 1000 years ago (31).
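Under the admixture model used by STRUCTURE, each allele copy carried by an individual is drawn from cluster k with probability q_k (the individual's inferred ancestry proportion for that cluster). The chance of carrying a given allele is therefore a q-weighted mix of the cluster allele frequencies. The sketch below computes the resulting genotype log-likelihood for one individual, assuming biallelic markers, Hardy-Weinberg proportions, and known cluster frequencies. STRUCTURE itself samples q and the frequencies jointly by MCMC, and the function name `admixed_loglik` is illustrative.

```python
import math


def admixed_loglik(genotypes, q, cluster_freqs):
    """Genotype log-likelihood for one admixed individual.

    genotypes[l]: copies (0, 1, 2) of the counted allele at marker l
    q[k]: ancestry proportion from cluster k (sums to 1)
    cluster_freqs[k][l]: frequency of the counted allele at marker l in cluster k
    """
    total = 0.0
    for l, g in enumerate(genotypes):
        # Each allele copy independently comes from cluster k with probability q[k].
        p = sum(qk * freqs[l] for qk, freqs in zip(q, cluster_freqs))
        total += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
    return total
```

A heterozygote at two markers whose alleles are common in different clusters is, as expected, better explained by equal membership in both clusters (log-likelihood log 0.25) than by membership in only one.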
High levels of heterogeneous ancestry (i.e., multiple cluster assignments) were observed in nearly all African individuals, with the exception of western and central African Niger-Kordofanian speakers (medium orange), who are relatively homogeneous at large K values (fig. S15). Considerable Niger-Kordofanian ancestry (shades of orange) was observed in nearly all populations, reflecting the recent spread of Bantu speakers across equatorial, eastern, and southern Africa (27) and subsequent admixture with local populations (28). Many Nilo-Saharan–speaking populations in East Africa, such as the Maasai, show multiple cluster assignments from the Nilo-Saharan (red) and Cushitic (dark purple) AACs, in accord with linguistic evidence of repeated Nilotic assimilation of Cushites over the past 3000 years (32) and with the high frequency of a shared East African–specific mutation associated with lactose tolerance (33).
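The notion of "multiple cluster assignments" can be made concrete by counting, for each individual, the clusters that contribute at least some minimum share of inferred ancestry. The 0.1 cutoff below is an arbitrary assumption for illustration; the text does not state which threshold, if any, was used, and both function names are hypothetical.

```python
def assigned_clusters(q, threshold=0.1):
    """Indices of clusters contributing at least `threshold` of an
    individual's inferred ancestry proportions `q`."""
    return [k for k, share in enumerate(q) if share >= threshold]


def fraction_heterogeneous(memberships, threshold=0.1):
    """Share of individuals assigned to more than one cluster."""
    multi = sum(1 for q in memberships if len(assigned_clusters(q, threshold)) > 1)
    return multi / len(memberships)
```

With this cutoff, an individual with proportions (0.7, 0.25, 0.05) is assigned to the first two clusters, whereas one with (0.95, 0.05) counts as homogeneous.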
Our data support the hypothesis that the Sahel has been a corridor for bidirectional migration between eastern and western Africa (34–36). The highest proportion of the Nilo-Saharan AAC was observed in the southern and central Sudanese populations (Nuer, Dinka, Shilluk, and Nyimang), with decreasing frequency from northern Kenya (e.g., Pokot) to northern Tanzania (Datog, Maasai) (fig. S15). Additionally, all Nilo-Saharan–speaking populations from Kenya, Tanzania, southern Sudan, and Chad clustered with west central Afroasiatic Chadic–speaking populations in the global analysis at K ≤ 11, which is consistent with linguistic and archeological data suggesting bidirectional migration of Nilo-Saharans from source populations in Sudan within the past ~10,500 to 3000 years (4, 29). The proposed migration of proto-Chadic Afroasiatic speakers from the central Sahara into the Lake Chad Basin ~7000 years ago may have resulted in a language shift from Nilo-Saharan to Afroasiatic among Chadic speakers (37). However, our data suggest that this shift was not accompanied by large amounts of Afroasiatic gene flow. Other populations of interest, including the Fulani (Nigeria and Cameroon), the Baggara Arabs (Cameroon), the Koma (Nigeria), and the Beja (Sudan), are discussed in (4).