1.  Population Genomics of Sub-Saharan Drosophila melanogaster: African Diversity and Non-African Admixture 
PLoS Genetics  2012;8(12):e1003080.
Drosophila melanogaster has played a pivotal role in the development of modern population genetics. However, many basic questions regarding the demographic and adaptive history of this species remain unresolved. We report the genome sequencing of 139 wild-derived strains of D. melanogaster, representing 22 population samples from the sub-Saharan ancestral range of this species, along with one European population. Most genomes were sequenced above 25X depth from haploid embryos. Results indicated a pervasive influence of non-African admixture in many African populations, motivating the development and application of a novel admixture detection method. Admixture proportions varied among populations, with greater admixture in urban locations. Admixture levels also varied across the genome, with localized peaks and valleys suggestive of a non-neutral introgression process. Genomes from the same location differed starkly in ancestry, suggesting that isolation mechanisms may exist within African populations. After removing putatively admixed genomic segments, the greatest genetic diversity was observed in southern Africa (e.g. Zambia), while diversity in other populations was largely consistent with a geographic expansion from this potentially ancestral region. The European population showed different levels of diversity reduction on each chromosome arm, and some African populations displayed chromosome arm-specific diversity reductions. Inversions in the European sample were associated with strong elevations in diversity across chromosome arms. Genomic scans were conducted to identify loci that may represent targets of positive selection within an African population, between African populations, and between European and African populations. A disproportionate number of candidate selective sweep regions were located near genes with varied roles in gene regulation. 
Outliers for Europe-Africa FST were found to be enriched in genomic regions of locally elevated cosmopolitan admixture, possibly reflecting a role for some of these loci in driving the introgression of non-African alleles into African populations.
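The diversity comparisons above rest on nucleotide diversity (π), the average number of pairwise differences per site. The following is a minimal Python sketch. It assumes aligned sequences of equal length in which masked sites (for example, putatively admixed segments) are coded as `N`. The function name and the masking convention are illustrative, not the authors' pipeline.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Return pi: pairwise differences per compared site, pooled over all sequence pairs.

    Sites where either sequence in a pair carries 'N' (masked or missing)
    are skipped for that pair only (an assumed masking convention).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    differences = 0
    compared = 0
    for a, b in combinations(seqs, 2):
        for x, y in zip(a, b):
            if x == "N" or y == "N":
                continue  # masked site: not informative for this pair
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no unmasked sites to compare")
    return differences / compared


print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))  # 2 differences / 12 site comparisons
```

This sketch weights each pair of sequences by its number of unmasked sites. That is only one of several conventions for handling missing data.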
Author Summary
Improvements in DNA sequencing technology have allowed genetic variation to be studied at the level of fully sequenced genomes. We have sequenced more than 100 D. melanogaster genomes originating from sub-Saharan Africa, which is thought to contain the ancestral range of this model organism. We found evidence for recent and substantial non-African gene flow into African populations, which may be driven by natural selection. The data also helped to refine our understanding of the species' history, which may have involved a geographic expansion from southern central Africa (e.g. Zambia). Lastly, we identified a large number of genes and functions that may have experienced recent adaptive evolution in one or more populations. An understanding of genomic variation in ancestral range populations of D. melanogaster will improve our ability to make population genetic inferences for worldwide populations. The results presented here should motivate statistical, mathematical, and computational studies to identify evolutionary models that are most compatible with observed data. Finally, the potential signals of natural selection identified here should facilitate detailed follow-up studies on the genetic basis of adaptive evolutionary change.
PMCID: PMC3527209  PMID: 23284287
2.  Worldwide Distribution of the MYH9 Kidney Disease Susceptibility Alleles and Haplotypes: Evidence of Historical Selection in Africa 
PLoS ONE  2010;5(7):e11474.
MYH9 was recently identified as a renal susceptibility gene (OR 3–8, p<10⁻⁸) for major forms of kidney disease disproportionately affecting individuals of African descent. The risk haplotype (E-1) occurs at much higher frequencies in African Americans (≥60%) than in European Americans (<4%), revealing a genetic basis for a major health disparity. The population distributions of MYH9 risk alleles and the E-1 risk haplotype, and the demographic and selective forces acting on the MYH9 region, are not well explored. We reconstructed MYH9 haplotypes from 4 tagging single nucleotide polymorphisms (SNPs) spanning introns 12–23 using available data from HapMap Phase II, and by genotyping 938 DNAs from the Human Genome Diversity Panel (HGDP). The E-1 risk haplotype followed a cline, being most frequent within sub-Saharan African populations (range 50–80%), less frequent in populations from the Middle East (9–27%) and Europe (0–9%), and rare or absent in Asia, the Americas, and Oceania. Fixation indices (FST) for the MYH9 risk haplotypes were calculated for pairwise comparisons between continental populations; FST ranged from 0.27 to 0.40 for Africa compared with other continental populations, possibly due to selection. Uniquely in Africa, the Yoruba population showed high-frequency extended haplotype length around the core risk allele (C) compared to the alternative allele (T) at the same locus (rs4821481, iHS = 2.67), as well as high population differentiation (FST(CEU vs. YRI) = 0.51) in HapMap Phase II data, also observable only in the Yoruba population from HGDP (FST = 0.49), pointing to an instance of recent selection in the genomic region. The population-specific divergence in MYH9 risk allele frequencies among the world's populations may prove important in risk assessment and public health policies to mitigate the burden of kidney disease in vulnerable populations.
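Pairwise FST can be estimated in several ways, and the abstract does not say which estimator was used. The sketch below uses Nei's GST, (HT − HS)/HT, computed from haplotype frequencies with populations weighted equally. It is a swapped-in illustration, not necessarily the study's estimator. The frequencies in the usage line are illustrative values based on the ≥60% and <4% E-1 frequencies quoted above, not the study's estimates.

```python
def nei_gst(freqs_by_pop):
    """Nei's G_ST from per-population allele or haplotype frequencies.

    freqs_by_pop: list of dicts, one per population, mapping allele/haplotype
    label -> frequency (each dict should sum to 1). Populations are weighted equally.
    """
    k = len(freqs_by_pop)
    if k < 2:
        raise ValueError("need at least two populations")
    labels = set().union(*freqs_by_pop)
    # H_S: mean expected heterozygosity within populations
    h_s = sum(1 - sum(p ** 2 for p in pop.values()) for pop in freqs_by_pop) / k
    # H_T: expected heterozygosity of the pooled (equal-weight) population
    pooled = {a: sum(pop.get(a, 0.0) for pop in freqs_by_pop) / k for a in labels}
    h_t = 1 - sum(p ** 2 for p in pooled.values())
    if h_t == 0:
        return 0.0  # monomorphic across all populations: no differentiation to measure
    return (h_t - h_s) / h_t


# Illustrative frequencies only: E-1 at 60% in one population and 4% in another.
print(round(nei_gst([{"E-1": 0.60, "other": 0.40}, {"E-1": 0.04, "other": 0.96}]), 2))  # → 0.36
```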
PMCID: PMC2901326  PMID: 20634883
3.  Genome-wide scan with nearly 700 000 SNPs in two Sardinian sub-populations suggests some regions as candidate targets for positive selection 
European Journal of Human Genetics  2012;20(11):1155-1161.
This paper explores the genetic structure and signatures of natural selection in different sub-populations from the Island of Sardinia, exploiting information from nearly 700 000 autosomal SNPs genotyped with the Affymetrix Genome-Wide Human SNP 6.0 Array. The genetic structure of the Sardinian population and its position within the context of other Mediterranean and European human groups were investigated in depth by comparing our data with publicly available data sets. Principal components and admixture analyses suggest a clustering of the examined samples in two significantly differentiated sub-populations (Ogliastra and Southern Sardinia), as confirmed by AMOVA (FST=0.011; P<0.001). Differentiation of these sub-populations was still evident when they were pooled together with supplementary Sardinian samples from HGDP and compared with several other European, North-African and Near Eastern populations, confirming the uniqueness of the Sardinian genetic background. Moreover, after several statistical approaches were applied to assess differences at the SNP level, the most highly differentiated genomic regions between Ogliastra and Southern Sardinia were investigated with an extended haplotype homozygosity (EHH)-based test to identify potential selective sweeps. Using this approach, 40 genomic regions were detected, with significant differences between Ogliastra and Southern Sardinia. These regions were subsequently investigated using a long-range haplotype test, which found significant REHH values for SNPs rs11070188 and rs11070192 in the Ogliastra sub-population. In light of these results and the overlap of the different computed statistics, the region encompassing these loci can be considered a strong candidate to have undergone selective pressure in Ogliastra.
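EHH-based tests measure how far haplotypes carrying a core allele remain identical as one moves away from that allele. The following is a minimal sketch under several assumptions: phased haplotypes are stored as equal-length allele strings, the core SNP is biallelic, and extension runs in one direction only. In that simple case, REHH is the core allele's EHH divided by the EHH of the alternative allele. The function names are hypothetical. Published implementations also handle missing data, both flanks, and genetic-map distances.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity from core_index to end_index (inclusive).

    Among haplotypes carrying core_allele, this is the probability that two
    randomly chosen ones are identical over the whole core-to-end interval.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    lo, hi = sorted((core_index, end_index))
    counts = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


def rehh(haplotypes, core_index, core_allele, other_allele, end_index):
    """Relative EHH for a biallelic core: EHH(core allele) / EHH(alternative allele)."""
    other = ehh(haplotypes, core_index, other_allele, end_index)
    if other == 0:
        return float("inf")  # alternative-allele haplotypes have fully decayed
    return ehh(haplotypes, core_index, core_allele, end_index) / other


haps = ["0000", "0001", "0000", "1010", "1110", "1011"]
print(ehh(haps, 0, "0", 2), rehh(haps, 0, "0", "1", 2))
```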
PMCID: PMC3476709  PMID: 22535185
Sardinia; natural selection; extended haplotype homozygosity (EHH)
4.  Balancing Selection on a Regulatory Region Exhibiting Ancient Variation That Predates Human–Neandertal Divergence 
PLoS Genetics  2013;9(4):e1003404.
Ancient population structure shaping contemporary genetic variation has recently been appreciated and has important implications regarding our understanding of the structure of modern human genomes. We identified a ∼36-kb DNA segment in the human genome that displays an ancient substructure. The variation at this locus exists primarily as two highly divergent haplogroups. One of these haplogroups (the NE1 haplogroup) aligns with the Neandertal haplotype and contains a 4.6-kb deletion polymorphism in perfect linkage disequilibrium with 12 single nucleotide polymorphisms (SNPs) across diverse populations. The other haplogroup, which does not contain the 4.6-kb deletion, aligns with the chimpanzee haplotype and is likely ancestral. Africans have higher overall pairwise differences with the Neandertal haplotype than Eurasians do for this NE1 locus (p<10⁻¹⁵). Moreover, the nucleotide diversity at this locus is higher in Eurasians than in Africans. These results mimic signatures of recent Neandertal admixture contributing to this locus. However, an in-depth assessment of the variation in this region across multiple populations reveals that African NE1 haplotypes, albeit rare, harbor more sequence variation than NE1 haplotypes found in Europeans, indicating an ancient African origin of this haplogroup and refuting recent Neandertal admixture. Population genetic analyses of the SNPs within each of these haplogroups, along with genome-wide comparisons, revealed significant FST (p = 0.00003) and positive Tajima's D (p = 0.00285) statistics, pointing to non-neutral evolution of this locus. The NE1 locus harbors no protein-coding genes, but contains transcribed sequences as well as sequences with putative regulatory function based on bioinformatic predictions and in vitro experiments. We postulate that the variation observed at this locus predates Human–Neandertal divergence and is evolving under balancing selection, especially among European populations.
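Tajima's D compares the average number of pairwise differences (π) with Watterson's estimate, S/a₁. Positive values, as reported for the NE1 locus, indicate an excess of intermediate-frequency variants, which is one expected signature of balancing selection. The following is a minimal sketch of the standard Tajima (1989) formula. It assumes π is the mean number of pairwise differences per sequence pair (not per site) for n sampled sequences with S segregating sites. The p-value reported in the abstract came from genome-wide comparisons, which this sketch does not reproduce.

```python
import math


def tajimas_d(n, segregating_sites, pi):
    """Tajima's D for n sequences, S segregating sites and mean pairwise differences pi."""
    if n < 4:
        raise ValueError("need at least four sequences (the variance term vanishes for n < 4)")
    if segregating_sites == 0:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = segregating_sites
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Four sequences (6 pairs) with one segregating site:
print(tajimas_d(4, 1, pi=4 / 6))  # variant carried by 2 of 4 sequences -> D > 0
print(tajimas_d(4, 1, pi=3 / 6))  # singleton variant -> D < 0
```

With four sequences there are six pairs. A variant at frequency 2/4 separates 2 × 2 = 4 of them, giving π = 4/6. A singleton separates 3 of them, giving π = 3/6.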
Author Summary
Natural selection shapes the genome in a non-random way, as an allele that contributes more to the reproductive fitness of a species increases in frequency within the population. Under balancing selection, a particular kind of natural selection, more than one allele increases in frequency in the population, likely due to a reproductive advantage of individuals carrying both alleles. Only a handful of loci have been well documented to evolve under balancing selection, with the HBB gene (sickle cell locus) being the best studied. Here, we report a non-coding (but putatively functional) locus that has maintained two divergent alleles in the human population since before the Human–Neandertal divergence and is therefore likely to be under balancing selection. These findings also provide a clear example of ancient African substructure.
PMCID: PMC3623772  PMID: 23593015
5.  Natural selection at genomic regions associated with obesity and type-2 diabetes: East Asians and sub-Saharan Africans exhibit high levels of differentiation at type-2 diabetes regions 
Human genetics  2010;129(4):407-418.
Different populations suffer from different rates of obesity and type-2 diabetes (T2D). Little is known about the genetic or adaptive component, if any, that underlies these differences. Given the cultural, geographic, and dietary variation that accumulated among humans over the last 60,000 years, we examined whether loci identified by genome-wide association studies for these traits have been subject to recent selection pressures. Using genome-wide SNP data on 938 individuals in 53 populations from the Human Genome Diversity Panel, we compare population differentiation and haplotype patterns at these loci to the rest of the genome. Using an “expanding window” approach (100 to 1,600 kb) for the individual loci as well as the loci as ensembles, we find a high degree of differentiation for the ensemble of T2D loci. This differentiation is most pronounced for East Asians and sub-Saharan Africans, suggesting that these groups experienced natural selection at loci associated with T2D. Haplotype analysis suggests an excess of obesity loci with evidence of recent positive selection among South Asians and Europeans, compared to sub-Saharan Africans and Native Americans. We also identify individual loci that may have been subjected to natural selection, such as the T2D locus, HHEX, which displays both elevated differentiation and extended haplotype homozygosity in comparisons of East Asians with other groups. Our findings suggest that there is an evolutionary genetic basis for population differences in these traits, and we have identified potential group-specific genetic risk factors.
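The expanding-window idea can be illustrated by summarizing a per-SNP statistic (for example, FST) over progressively wider windows centered on a locus. The following is a minimal sketch. The doubling sequence of window widths between 100 and 1,600 kb and the use of a simple mean are assumptions, since the abstract gives only the range.

```python
import math


def expanding_window_means(positions, values, center, widths_kb=(100, 200, 400, 800, 1600)):
    """Mean of per-SNP values within windows of each width (kb) centered on `center` (bp).

    Returns a dict width_kb -> mean, or NaN when a window contains no SNPs.
    """
    result = {}
    for width in widths_kb:
        half = width * 1000 / 2
        inside = [v for p, v in zip(positions, values) if abs(p - center) <= half]
        result[width] = sum(inside) / len(inside) if inside else math.nan
    return result


print(expanding_window_means([0, 40_000, 90_000, 300_000], [1.0, 2.0, 3.0, 4.0], center=0))
```

Comparing each window's value with the same summary computed for windows across the rest of the genome gives the locus-versus-genome contrast described in the abstract.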
PMCID: PMC3113599  PMID: 21188420
obesity; type-2 diabetes; genetics; natural selection; population differentiation
6.  Population genetic study of the brain-derived neurotrophic factor (BDNF) gene 
Molecular psychiatry  2009;15(8):810-815.
Genetic variants in the brain-derived neurotrophic factor (BDNF) gene, predominantly the functional Val66Met polymorphism, have been associated with risk of bipolar disorder and other psychiatric disorders. However, not all studies support these findings, and overall the evidence for BDNF association with disease risk is weak. As differences in population genetic structure between patient samples could cause discrepant or spurious association results, we investigated this possibility by carrying out population genetic analyses of the BDNF genomic region. Substantial variation was detected in BDNF coding region SNP allele and haplotype frequencies between 58 global populations, with the derived Met allele of Val66Met ranging from 0–72% frequency across populations. FST analyses to assess diversity in the HapMap populations determined that the Val66Met FST value was at the 99.8th percentile among all SNPs in the genome. As the BDNF population genetic differences may be due to local selection, we performed the long-range haplotype (LRH) test for selection using 68 SNPs spanning the BDNF genomic region in 12 European-derived pedigrees. Evidence for positive selection was found for a high frequency Val-carrying haplotype, with a relative extended haplotype homozygosity (REHH) value above the 99th percentile compared to HapMap data (P = 4.6 × 10⁻⁴). In conclusion, we observed considerable BDNF allele and haplotype diversity among global populations and evidence for positive selection at the BDNF locus. These phenomena can have a profound impact on detection of disease susceptibility genes and must be considered in gene association studies of BDNF.
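Statements such as "the Val66Met FST value was at the 99.8th percentile among all SNPs" rank one locus against a genome-wide empirical distribution. The following is a minimal sketch, assuming the genome-wide values are available as a plain list. The helper names are hypothetical and the example data are synthetic.

```python
def empirical_percentile(value, genome_wide):
    """Percentage of genome-wide values less than or equal to `value`."""
    if not genome_wide:
        raise ValueError("empty background distribution")
    return 100.0 * sum(v <= value for v in genome_wide) / len(genome_wide)


def empirical_p(value, genome_wide):
    """Upper-tail empirical p-value: fraction of genome-wide values >= `value`."""
    if not genome_wide:
        raise ValueError("empty background distribution")
    return sum(v >= value for v in genome_wide) / len(genome_wide)


background = list(range(1, 1001))  # synthetic stand-in for genome-wide statistics
print(empirical_percentile(998, background), empirical_p(998, background))
```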
PMCID: PMC2888876  PMID: 19255578
selection; diversity; gene association; single-nucleotide polymorphism; haplotype; susceptibility locus
7.  Inactive alleles of cytochrome P450 2C19 may be positively selected in human evolution 
Cytochrome P450 CYP2C19 metabolizes a wide range of pharmacologically active substances and a relatively small number of naturally occurring environmental toxins. Poor activity alleles of CYP2C19 are very frequent worldwide, particularly in Asia, raising the possibility that reduced metabolism could be advantageous in some circumstances. The evolutionary selective forces acting on this gene have not previously been investigated.
We analyzed CYP2C19 genetic markers in 127 Gambians and in 120 chromosomes from Yoruba, Europeans and Asians (Japanese + Han Chinese) in the HapMap database. Haplotype breakdown was explored using bifurcation plots and relative extended haplotype homozygosity (REHH). Allele frequency differentiation across populations was estimated using the fixation index (FST) and haplotype diversity with coalescent models.
Bifurcation plots suggested conservation of alleles conferring slow metabolism (CYP2C19*2 and *3). REHH was high around CYP2C19*2 in Yoruba (REHH 8.3, at 133.3 kb from the core) and to a lesser extent in Europeans (3.5, at 37.7 kb) and Asians (2.8, at −29.7 kb). FST at the CYP2C19 locus was low overall (0.098). CYP2C19*3 was an FST outlier in Asians (0.293); CYP2C19 haplotype diversity ≤ 0.037, p < 0.001.
We found some evidence that the slow metabolizing allele CYP2C19*2 is subject to positive selective forces worldwide. Similar evidence was also found for CYP2C19*3, which is frequent only in Asia. FST is low at the CYP2C19 locus, suggesting balancing selection overall. The biological factors responsible for these selective pressures are currently unknown. One possible explanation is that early humans were exposed to a ubiquitous novel toxin activated by CYP2C19. The genetic adaptation took place within the last 10,000 years, which coincides with the development of systematic agricultural practices.
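Haplotype diversity is commonly summarized with Nei's unbiased estimator, H = n/(n − 1)·(1 − Σ pᵢ²), where pᵢ are the sample haplotype frequencies among n chromosomes. The abstract does not specify its estimator or the coalescent comparison behind the reported p-value. This is therefore only a sketch of the standard statistic, using hypothetical haplotype labels.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p ** 2 for p in freqs))


print(haplotype_diversity(["*1", "*1", "*2", "*2"]))  # two haplotypes at equal frequency
```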
PMCID: PMC4036532  PMID: 24690327
Positive selection; Cytochrome P450 2C19; Xenobiotics; Drug metabolism; Extended haplotype homozygosity; Bifurcation plots
8.  Ancestral Components of Admixed Genomes in a Mexican Cohort 
PLoS Genetics  2011;7(12):e1002410.
For most of the world, human genome structure at a population level is shaped by interplay between ancient geographic isolation and more recent demographic shifts, factors that are captured by the concepts of biogeographic ancestry and admixture, respectively. The ancestry of non-admixed individuals can often be traced to a specific population in a precise region, but current approaches for studying admixed individuals generally yield coarse information in which genome ancestry proportions are identified according to continent of origin. Here we introduce a new analytic strategy for this problem that allows fine-grained characterization of admixed individuals with respect to both geographic and genomic coordinates. Ancestry segments from different continents, identified with a probabilistic model, are used to construct and study “virtual genomes” of admixed individuals. We apply this approach to a cohort of 492 parent–offspring trios from Mexico City. The relative contributions from the three continental-level ancestral populations—Africa, Europe, and America—vary substantially between individuals, and the distribution of haplotype block length suggests an admixing time of 10–15 generations. The European and Indigenous American virtual genomes of each Mexican individual can be traced to precise regions within each continent, and they reveal a gradient of Amerindian ancestry between indigenous people of southwestern Mexico and Mayans of the Yucatan Peninsula. This contrasts sharply with the African roots of African Americans, which have been characterized by a uniform mixing of multiple West African populations. We also use the virtual European and Indigenous American genomes to search for the signatures of selection in the ancestral populations, and we identify previously known targets of selection in other populations, as well as new candidate loci. 
The ability to infer precise ancestral components of admixed genomes will facilitate studies of disease-related phenotypes and will allow new insight into the adaptive and demographic history of indigenous people.
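The link between ancestry-block length and admixture time can be sketched with a simple single-pulse model. This model is an assumption, because the abstract does not describe the authors' estimator. Suppose ancestry A makes up a fraction m of the genome and admixture happened T generations ago. Then A tracts end at roughly T(1 − m) breakpoints per Morgan, so their mean length L̄ is about 1/(T(1 − m)) Morgans, and T ≈ 1/(L̄(1 − m)). The inputs in the usage line are illustrative, not values from the study.

```python
def admixture_generations(mean_tract_morgans, ancestry_fraction):
    """Moment estimate of generations since a single admixture pulse.

    Assumes tract lengths of the focal ancestry are roughly exponential with
    rate T * (1 - ancestry_fraction) per Morgan.
    """
    if mean_tract_morgans <= 0 or not 0 < ancestry_fraction < 1:
        raise ValueError("need positive tract length and 0 < fraction < 1")
    return 1 / (mean_tract_morgans * (1 - ancestry_fraction))


# Illustrative: mean 20 cM tracts of an ancestry covering half the genome.
print(admixture_generations(0.20, 0.5))  # about 10 generations
```

Multiple pulses or continuous gene flow would bias such a simple estimate. This is one reason model-based methods fit the full tract-length distribution rather than only its mean.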
Author Summary
Admixed individuals, such as African Americans and Latinos, arise from mating between individuals from different continents. Detailed knowledge about the ancestral origin of an admixed population not only provides insight regarding the history of the population itself, but also affords opportunities to study the evolutionary biology of the ancestral populations. Applying novel statistical methods, we analyzed the high-density genotype data of nearly 1,500 Mexican individuals from Mexico City, who are admixed among Indigenous Americans, Europeans, and Africans. The relative contributions from the three continental-level ancestral populations vary substantially between individuals. The European ancestors of these Mexican individuals genetically resemble Southern Europeans, such as Spaniards and Portuguese. The Indigenous American ancestry of the Mexicans in our study is largely attributed to the indigenous groups residing in the southwestern region of Mexico, although some individuals have inherited varying degrees of ancestry from the Mayans of the Yucatan Peninsula and other indigenous American populations. A search for signatures of selection, focusing on the parts of the genomes derived from an ancestral population (e.g. Indigenous American), identifies regions in which a genetic variant may have been favored by natural selection in that ancestral population.
PMCID: PMC3240599  PMID: 22194699
9.  Genomic Ancestry of North Africans Supports Back-to-Africa Migrations 
PLoS Genetics  2012;8(1):e1002397.
North African populations are distinct from sub-Saharan Africans based on cultural, linguistic, and phenotypic attributes; however, the time and the extent of genetic divergence between populations north and south of the Sahara remain poorly understood. Here, we interrogate the multilayered history of North Africa by characterizing the effect of hypothesized migrations from the Near East, Europe, and sub-Saharan Africa on current genetic diversity. We present dense, genome-wide SNP genotyping array data (730,000 sites) from seven North African populations, spanning from Egypt to Morocco, and one Spanish population. We identify a gradient of likely autochthonous Maghrebi ancestry that increases from east to west across northern Africa; this ancestry is likely derived from “back-to-Africa” gene flow more than 12,000 years ago (ya), prior to the Holocene. The indigenous North African ancestry is more frequent in populations with historical Berber ethnicity. In most North African populations we also see substantial shared ancestry with the Near East, and to a lesser extent sub-Saharan Africa and Europe. To estimate the time of migration from sub-Saharan populations into North Africa, we implement a maximum likelihood dating method based on the distribution of migrant tracts. In order to first identify migrant tracts, we assign local ancestry to haplotypes using a novel, principal component-based analysis of three ancestral populations. We estimate that a migration of western African origin into Morocco began about 40 generations ago (approximately 1,200 ya); a migration of individuals with Nilotic ancestry into Egypt occurred about 25 generations ago (approximately 750 ya). Our genomic data reveal an extraordinarily complex history of migrations, involving at least five ancestral populations, into North Africa.
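The generation-to-year conversions quoted above imply a generation time of about 30 years. This value is inferred from the stated numbers and is not given explicitly in the abstract:

```latex
\[
  g = \frac{1200\ \text{years}}{40\ \text{generations}}
    = \frac{750\ \text{years}}{25\ \text{generations}}
    = 30\ \text{years per generation}
\]
```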
Author Summary
Proposed migrations between North Africa and neighboring regions have included Paleolithic gene flow from the Near East, an Arabic migration across the whole of North Africa 1,400 years ago (ya), and trans-Saharan transport of slaves from sub-Saharan Africa. Historical records, archaeology, and mitochondrial and Y-chromosome DNA have been marshaled in support of one theory or another, but there is little consensus regarding the overall genetic background of North African populations or their origin and expansion. We characterize the patterns of genetic variation in North Africa using ∼730,000 single nucleotide polymorphisms from across the genome for seven populations. We observe two distinct, opposite gradients of ancestry: an east-to-west increase in likely autochthonous North African ancestry and an east-to-west decrease in likely Near Eastern Arabic ancestry. The indigenous North African ancestry may have been more common in Berber populations and appears most closely related to populations outside of Africa, but divergence between Maghrebi peoples and Near Eastern/Europeans likely precedes the Holocene (>12,000 ya). We also find significant signatures of sub-Saharan African ancestry that vary substantially among populations. These sub-Saharan ancestries appear to be a recent introduction into North African populations, dating to about 1,200 years ago in southern Morocco and about 750 years ago into Egypt, possibly reflecting the patterns of the trans-Saharan slave trade that occurred during this period.
PMCID: PMC3257290  PMID: 22253600
10.  Mortality in Patients with HIV-1 Infection Starting Antiretroviral Therapy in South Africa, Europe, or North America: A Collaborative Analysis of Prospective Studies 
PLoS Medicine  2014;11(9):e1001718.
Analyzing survival in HIV treatment cohorts, Andrew Boulle and colleagues find mortality rates in South Africa comparable to or better than those in North America by 4 years after starting antiretroviral therapy.
Please see later in the article for the Editors' Summary
High early mortality in patients with HIV-1 starting antiretroviral therapy (ART) in sub-Saharan Africa, compared to Europe and North America, is well documented. Longer-term comparisons between settings have been limited by poor ascertainment of mortality in high burden African settings. This study aimed to compare mortality up to four years on ART between South Africa, Europe, and North America.
Methods and Findings
Data from four South African cohorts in which patients lost to follow-up (LTF) could be linked to the national population register to determine vital status were combined with data from Europe and North America. Cumulative mortality, crude and adjusted (for characteristics at ART initiation) mortality rate ratios (relative to South Africa), and predicted mortality rates were described by region at 0–3, 3–6, 6–12, 12–24, and 24–48 months on ART for the period 2001–2010. Of the adults included (30,467 [South Africa], 29,727 [Europe], and 7,160 [North America]), 20,306 (67%), 9,961 (34%), and 824 (12%) were women. Patients began treatment with markedly more advanced disease in South Africa (median CD4 count 102, 213, and 172 cells/µl in South Africa, Europe, and North America, respectively). High early mortality after starting ART in South Africa occurred mainly in patients starting ART with CD4 count <50 cells/µl. Cumulative mortality at 4 years was 16.6%, 4.7%, and 15.3% in South Africa, Europe, and North America, respectively. Mortality was initially much lower in Europe and North America than South Africa, but the differences were reduced or reversed (North America) at longer durations on ART (adjusted rate ratios 0.46, 95% CI 0.37–0.58, and 1.62, 95% CI 1.27–2.05 between 24 and 48 months on ART comparing Europe and North America to South Africa). While bias due to under-ascertainment of mortality was minimised through death registry linkage, residual bias could still be present due to differing approaches to and frequency of linkage.
After accounting for under-ascertainment of mortality, with increasing duration on ART, the mortality rate on HIV treatment in South Africa declines to levels comparable to or below those described in participating North American cohorts, while substantially narrowing the differential with the European cohorts.
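Mortality rate ratios like those reported above compare deaths per unit of person-time between regions. The study's ratios were adjusted for characteristics at ART initiation, which requires a regression model. The sketch below shows only the crude ratio with a Wald 95% confidence interval on the log scale, using hypothetical counts.

```python
import math


def rate_ratio(deaths_a, person_years_a, deaths_b, person_years_b, z=1.96):
    """Crude mortality rate ratio (A vs B) with a log-scale Wald confidence interval."""
    if min(deaths_a, deaths_b) <= 0 or min(person_years_a, person_years_b) <= 0:
        raise ValueError("deaths and person-years must be positive")
    ratio = (deaths_a / person_years_a) / (deaths_b / person_years_b)
    se_log = math.sqrt(1 / deaths_a + 1 / deaths_b)  # SE of log(rate ratio), Poisson counts
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


# Hypothetical counts: 50 deaths in 2,000 person-years vs 100 deaths in 2,000 person-years.
print(rate_ratio(50, 2000, 100, 2000))
```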
Editors' Summary
AIDS has killed about 36 million people since the first recorded case of the disease in 1981, and a similar number of people (including 25 million living in sub-Saharan Africa) are currently infected with HIV, the virus that causes AIDS. HIV destroys immune system cells (including CD4 cells, a type of lymphocyte), leaving infected individuals susceptible to other serious infections. Early in the AIDS epidemic, HIV-positive people usually died within 10 years of becoming infected. In 1996, effective antiretroviral therapy (ART) became available and, for people living in high-income countries, HIV infection became a chronic condition. But ART was expensive, so HIV/AIDS remained largely untreated and fatal in resource-limited countries. Then, in 2003, the international community began to work towards achieving universal access to ART. By the end of 2012, nearly two-thirds of HIV-positive people (nearly 10 million individuals) living in low- and middle-income countries who were eligible for treatment because their CD4 cell count had fallen below 350 cells/mm³ of blood or because they had developed an AIDS-defining condition were receiving treatment.
Why Was This Study Done?
It is known that a larger proportion of HIV-positive patients starting ART die during the first year of treatment in sub-Saharan Africa than in Europe and North America. This difference arises in part because patients in resource-limited settings tend to have lower CD4 counts when they start treatment than patients in wealthy countries. However, the lack of reliable data on mortality (death) in resource-limited settings has made it hard to compare longer-term outcomes in different settings. Information on the long-term outcomes of HIV-positive patients receiving ART in resource-limited countries is needed to guide the development of appropriate health systems and treatment regimens in these settings. In this collaborative analysis of prospective cohort studies, the researchers compare mortality up to 4 years on ART in South Africa, Europe, and North America. A prospective cohort study follows a group of individuals over time to see whether differences in specific characteristics at the start of the study affect subsequent outcomes. A collaborative analysis combines individual patient data from several studies.
What Did the Researchers Do and Find?
The researchers combined data from four South African cohorts of HIV-positive patients starting ART included in the International Epidemiologic Databases to Evaluate AIDS South African (IeDEA-SA) collaboration with data from six North American cohorts and nine European cohorts included in the ART Cohort Collaboration (ART-CC). The South African cohorts were chosen because, unusually for studies undertaken in sub-Saharan Africa, the vital status (whether they had died) of patients who had been lost to follow-up in these cohorts could be obtained from the national population register. Patients in South Africa began treatment with more advanced disease (indicated by a lower average CD4 count) than patients in Europe or North America. Notably, high early mortality after starting ART in South Africa occurred mainly in patients starting ART with a CD4 count below 50 cells/mm³. The cumulative mortality after 4 years of ART was 16.6%, 4.7%, and 15.3% in South Africa, Europe, and North America, respectively. After adjusting for patient characteristics at ART initiation, the mortality rate among patients beginning ART was initially lower in Europe and North America than in South Africa. However, although the adjusted mortality rate in Europe remained lower than the rate in South Africa, the rate in North America was higher than that in South Africa between 24 and 48 months on ART.
What Do These Findings Mean?
Although the linkage to national vital registration systems (databases of births and deaths) undertaken in this collaborative analysis is likely to have greatly reduced bias due to under-ascertainment of mortality, the accuracy of these findings may still be limited by differences in how this linkage was undertaken in different settings. Nevertheless, these findings suggest that mortality among HIV-infected patients receiving ART in South Africa, although initially higher than in Europe and North America, rapidly declines with increasing duration on ART and, after 4 years of treatment, approaches the rate seen in high-income settings. Intriguingly, these findings also highlight the relatively higher late mortality in North America compared to either Europe or South Africa, a result that needs to be investigated to explore the extent to which differences in mortality ascertainment, patient characteristics and comorbidities, or health systems and treatment regimens contribute to variations in outcomes among HIV-positive patients in various settings.
Additional Information
Please access these websites via the online version of this summary at
This study is further discussed in a PLOS Medicine Perspective by Agnes Binagwaho and colleagues
Information is available from the US National Institute of Allergy and Infectious Diseases on HIV infection and AIDS
NAM/aidsmap provides basic information about HIV/AIDS, and summaries of recent research findings on HIV care and treatment
Information is available from Avert, an international AIDS charity, on many aspects of HIV/AIDS, including information on universal access to ART, on HIV and AIDS in sub-Saharan Africa, and on HIV and AIDS in South Africa (in English and Spanish)
The World Health Organization provides information on all aspects of HIV/AIDS (in several languages); its 2013 Consolidated guidelines on the use of antiretroviral drugs for treating and preventing HIV infections: recommendations for a public health approach are available
The 2013 UNAIDS World AIDS Day Report provides up-to-date information about the AIDS epidemic and efforts to halt it
Information about the International Epidemiologic Databases to Evaluate AIDS South African (IeDEA-SA) collaboration and about the ART Cohort Collaboration is available
Personal stories about living with HIV/AIDS are available through Avert, NAM/aidsmap, and Healthtalkonline
PMCID: PMC4159124  PMID: 25203931
11.  Geographic Differences in Genetic Susceptibility to IgA Nephropathy: GWAS Replication Study and Geospatial Risk Analysis 
PLoS Genetics  2012;8(6):e1002765.
IgA nephropathy (IgAN), a major cause of kidney failure worldwide, is common in Asians, moderately prevalent in Europeans, and rare in Africans. It is not known if these differences represent variation in genes, environment, or ascertainment. In a recent GWAS, we localized five IgAN susceptibility loci on Chr.6p21 (HLA-DQB1/DRB1, PSMB9/TAP1, and DPA1/DPB2 loci), Chr.1q32 (CFHR3/R1 locus), and Chr.22q12 (HORMAD2 locus). These IgAN loci are associated with risk of other immune-mediated disorders such as type I diabetes, multiple sclerosis, or inflammatory bowel disease. We tested association of these loci in eight new independent cohorts of Asian, European, and African-American ancestry (N = 4,789), followed by meta-analysis with risk-score modeling in 12 cohorts (N = 10,755) and geospatial analysis in 85 world populations. Four susceptibility loci robustly replicated and all five loci were genome-wide significant in the combined cohort (P = 5×10⁻³² to 3×10⁻¹⁰), with heterogeneity detected only at the PSMB9/TAP1 locus (I² = 0.60). Conditional analyses identified two new independent risk alleles within the HLA-DQB1/DRB1 locus, defining multiple risk and protective haplotypes within this interval. We also detected a significant genetic interaction, whereby the odds ratio for the HORMAD2 protective allele was reversed in homozygotes for a CFHR3/R1 deletion (P = 2.5×10⁻⁴). A seven-SNP genetic risk score, which explained 4.7% of overall IgAN risk, increased sharply with Eastward and Northward distance from Africa (r = 0.30, P = 3×10⁻¹²⁸). This model paralleled the known East–West gradient in disease risk. Moreover, the prediction of a South–North axis was confirmed by registry data showing that the prevalence of IgAN-attributable kidney failure is increased in Northern Europe, similar to multiple sclerosis and type I diabetes. Variation at IgAN susceptibility loci correlates with differences in disease prevalence among world populations. 
These findings inform genetic, biological, and epidemiological investigations of IgAN and permit cross-comparison with other complex traits that share genetic risk loci and geographic patterns with IgAN.
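The abstract does not give the formula for its seven-SNP risk score. A common construction, assumed here rather than taken from the paper, is a weighted sum of risk-allele counts with the natural log of each per-allele odds ratio as the weight, so that protective alleles (OR below 1) lower the score. The SNP identifiers and odds ratios below are hypothetical placeholders, not the study's estimates.

```python
import math


def genetic_risk_score(dosages, odds_ratios):
    """Weighted genetic risk score: sum over SNPs of (risk-allele count * ln(OR)).

    dosages: dict mapping SNP id -> number of risk alleles carried (0, 1 or 2);
             SNPs absent from the dict are treated as carrying 0 copies.
    odds_ratios: dict mapping SNP id -> per-allele odds ratio (hypothetical here).
    """
    score = 0.0
    for snp, or_value in odds_ratios.items():
        count = dosages.get(snp, 0)
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count {count} for {snp}")
        score += count * math.log(or_value)
    return score


# Hypothetical SNPs and odds ratios, for illustration only.
ors = {"snp_a": 1.3, "snp_b": 1.2, "snp_c": 0.8}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(round(genetic_risk_score(person, ors), 4))
```

Using log odds ratios makes the score additive on the log-odds scale, which matches how per-allele effects are usually estimated in logistic-regression GWAS; unweighted allele counting is a simpler alternative.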
Author Summary
IgA nephropathy (IgAN) is the most common cause of kidney failure in Asia, has lower prevalence in Europe, and is very infrequent among populations of African ancestry. A long-standing question in the field is whether these differences represent variation in genes, environment, or ascertainment. In a recent genome-wide association study of 5,966 individuals, we identified five susceptibility loci for this trait. In this paper, we study the largest IgAN case-control cohort reported to date, composed of 10,755 individuals of European, Asian, and African-American ancestry. We confirm that all five loci are significant contributors to disease risk across this multi-ethnic cohort. In addition, we identify two novel independent susceptibility alleles within the HLA-DQB1/DRB1 locus and a new genetic interaction between loci on Chr.1q32 and Chr.22q12. We develop a seven-SNP genetic risk score that explains nearly 5% of variation in disease risk. In geospatial analysis of 85 world populations, the genetic risk score closely parallels worldwide patterns of disease prevalence. The genetic risk score also predicts an unsuspected Northward risk gradient in Europe. This genetic prediction is verified by examination of registry data demonstrating, similarly to other immune-mediated diseases such as multiple sclerosis and type I diabetes, a previously unrecognized increase in IgAN-attributable kidney failure in Northern European countries.
PMCID: PMC3380840  PMID: 22737082
12.  Genome-Wide Association Study of White Blood Cell Count in 16,388 African Americans: the Continental Origins and Genetic Epidemiology Network (COGENT) 
PLoS Genetics  2011;7(6):e1002108.
Total white blood cell (WBC) and neutrophil counts are lower among individuals of African descent due to the common African-derived “null” variant of the Duffy Antigen Receptor for Chemokines (DARC) gene. Additional common genetic polymorphisms were recently associated with total WBC and WBC sub-type levels in European and Japanese populations. No additional loci that account for WBC variability have been identified in African Americans. In order to address this, we performed a large genome-wide association study (GWAS) of total WBC and cell subtype counts in 16,388 African-American participants from 7 population-based cohorts available in the Continental Origins and Genetic Epidemiology Network. In addition to the DARC locus on chromosome 1q23, we identified two other regions (chromosomes 4q13 and 16q22) associated with WBC in African Americans (P < 2.5×10⁻⁸). The lead SNP (rs9131) on chromosome 4q13 is located in the CXCL2 gene, which encodes a chemotactic cytokine for polymorphonuclear leukocytes. Independent evidence of the novel CXCL2 association with WBC was present in 3,551 Hispanic Americans, 14,767 Japanese, and 19,509 European Americans. The index SNP (rs12149261) on chromosome 16q22 associated with WBC count is located in a large inter-chromosomal segmental duplication encompassing part of the hydrocephalus inducing homolog (HYDIN) gene. We demonstrate that the chromosome 16q22 association finding is most likely due to a genotyping artifact as a consequence of sequence similarity between duplicated regions on chromosomes 16q22 and 1q21. Among the WBC loci recently identified in European or Japanese populations, replication was observed in our African-American meta-analysis for rs445 of CDK6 on chromosome 7q21 and rs4065321 of PSMD3-CSF3 region on chromosome 17q21. In summary, the CXCL2, CDK6, and PSMD3-CSF3 regions are associated with WBC count in African American and other populations. 
We also demonstrate that large inter-chromosomal duplications can result in false positive associations in GWAS.
Author Summary
Although recent genome-wide association studies have identified common genetic variants associated with total white blood cell (WBC) and WBC sub-type counts in European and Japanese ancestry populations, whether these or other loci account for differences in WBC count among African Americans is unknown. By examining >16,000 African Americans, we show that, in addition to the previously identified Duffy Antigen Receptor for Chemokines (DARC) locus on chromosome 1, another variant, rs9131, and other nearby variants on human chromosome 4 are associated with total WBC count in African Americans. The variants span the CXCL2 gene, which encodes an inflammatory mediator involved in WBC production and migration. We show that the association is not restricted to African Americans but is also present in independent samples of European Americans, Hispanic Americans, and Japanese. This finding is potentially important because WBCs mediate, or show altered counts in, a variety of acute and chronic disorders.
PMCID: PMC3128101  PMID: 21738479
13.  Genomic landscape of positive natural selection in Northern European populations 
Analyzing genetic variation of human populations to detect loci that have been affected by positive natural selection is important for understanding adaptive history and phenotypic variation in humans. In this study, we analyzed recent positive selection in Northern Europe from genome-wide data sets of 250 000 and 500 000 single-nucleotide polymorphisms (SNPs) in a total of 999 individuals from Great Britain, Northern Germany, Eastern and Western Finland, and Sweden. Coalescent simulations were used to demonstrate that the integrated haplotype score (iHS) and long-range haplotype (LRH) statistics have sufficient power in genome-wide data sets of different sample sizes and SNP densities. Furthermore, the behavior of the FST statistic in closely related populations was characterized by allele frequency simulations. In the analysis of the North European data set, 60 regions in the genome showed strong signs of recent positive selection. Of these, 21 regions have not been discovered in previous scans, and many contain genes with interesting functions (eg, RAB38, IFNG, NOS1AP, and APOE). In the putatively selected regions, we observed a statistically significant overrepresentation of genetic association with complex disease, which emphasizes the importance of the analysis of positive selection in understanding the evolution of human disease. Altogether, this study demonstrates the potential of genome-wide data sets to discover loci that lie behind evolutionary adaptation in different human populations.
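Both iHS and LRH are built on extended haplotype homozygosity (EHH): among chromosomes carrying a given core allele, the probability that two randomly chosen chromosomes are identical over every SNP from the core out to a given marker. The sketch below computes EHH only; iHS (not shown) integrates EHH over genetic distance separately for ancestral and derived core alleles and standardizes the log ratio within allele-frequency bins. The toy haplotypes are hypothetical, and the function name `ehh` is illustrative rather than any library's API.

```python
from collections import Counter


def ehh(haplotypes, core, end):
    """Extended haplotype homozygosity from SNP index `core` out to `end`.

    haplotypes: list of equal-length strings of '0'/'1' alleles, one per
    chromosome, already restricted to chromosomes carrying the core allele.
    Returns the fraction of chromosome pairs that are identical across every
    SNP between core and end (inclusive).
    """
    lo, hi = min(core, end), max(core, end)
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    # Group chromosomes by their allele string over the extended interval.
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    pairs_total = n * (n - 1) / 2
    pairs_same = sum(c * (c - 1) / 2 for c in counts.values())
    return pairs_same / pairs_total


# Four toy chromosomes that all carry allele '0' at the core SNP (index 0).
haps = ["0000", "0001", "0011", "0011"]
for end in range(4):
    print(end, ehh(haps, 0, end))
```

EHH starts at 1 at the core and decays as recombination and mutation break up the haplotype; a recently swept allele decays unusually slowly relative to other alleles at similar frequency.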
PMCID: PMC2987258  PMID: 19844263
natural selection; genetic variation; population; Europe
14.  Inferring the Joint Demographic History of Multiple Populations from Multidimensional SNP Frequency Data 
PLoS Genetics  2009;5(10):e1000695.
Demographic models built from genetic data play important roles in illuminating prehistorical events and serving as null models in genome scans for selection. We introduce an inference method based on the joint frequency spectrum of genetic variants within and between populations. For candidate models we numerically compute the expected spectrum using a diffusion approximation to the one-locus, two-allele Wright-Fisher process, involving up to three simultaneous populations. Our approach is a composite likelihood scheme, since linkage between neutral loci alters the variance but not the expectation of the frequency spectrum. We thus use bootstraps incorporating linkage to estimate uncertainties for parameters and significance values for hypothesis tests. Our method can also incorporate selection on single sites, predicting the joint distribution of selected alleles among populations experiencing a bevy of evolutionary forces, including expansions, contractions, migrations, and admixture. We model human expansion out of Africa and the settlement of the New World, using 5 Mb of noncoding DNA resequenced in 68 individuals from 4 populations (YRI, CHB, CEU, and MXL) by the Environmental Genome Project. We infer divergence between West African and Eurasian populations 140 thousand years ago (95% confidence interval: 40–270 kya). This is earlier than other genetic studies, in part because we incorporate migration. We estimate the European (CEU) and East Asian (CHB) divergence time to be 23 kya (95% c.i.: 17–43 kya), long after archeological evidence places modern humans in Europe. Finally, we estimate divergence between East Asians (CHB) and Mexican-Americans (MXL) of 22 kya (95% c.i.: 16.3–26.9 kya), and our analysis yields no evidence for subsequent migration. 
Furthermore, combining our demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations accurately predicts the frequency spectrum of nonsynonymous variants across three continental populations (YRI, CHB, CEU).
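The inference compares an expected joint frequency spectrum (computed from the diffusion approximation, not shown here) with the observed one. The observed spectrum is simply a tally of SNPs by their derived-allele counts in each population. The sketch below builds that two-population tally, assuming biallelic SNPs with known ancestral state and no missing genotypes; the counts are toy values and `joint_sfs` is a hypothetical helper, not part of the authors' software.

```python
def joint_sfs(counts1, counts2, n1, n2):
    """Observed joint site frequency spectrum for two populations.

    counts1[i], counts2[i]: derived-allele counts at SNP i in samples of
    n1 and n2 chromosomes. Entry [a][b] of the result is the number of SNPs
    with a derived copies in population 1 and b in population 2.
    """
    if len(counts1) != len(counts2):
        raise ValueError("count vectors must have equal length")
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for a, b in zip(counts1, counts2):
        if not (0 <= a <= n1 and 0 <= b <= n2):
            raise ValueError("derived count out of range")
        sfs[a][b] += 1
    return sfs


# Five toy SNPs, 4 chromosomes sampled per population.
spectrum = joint_sfs([1, 0, 4, 1, 2], [0, 3, 4, 0, 2], n1=4, n2=4)
for row in spectrum:
    print(row)
```

Under a composite-likelihood approach each cell is typically treated as an independent Poisson count whose mean is the model's expected value; linkage among SNPs is why the abstract turns to bootstraps for uncertainty.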
Author Summary
The demographic history of our species is reflected in patterns of genetic variation within and among populations. We developed an efficient method for calculating the expected distribution of genetic variation, given a demographic model including such events as population size changes, population splits and joins, and migration. We applied our approach to publicly available human sequencing data, searching for models that best reproduce the observed patterns. Our joint analysis of data from African, European, and Asian populations yielded new dates for when these populations diverged. In particular, we found that African and Eurasian populations diverged around 140,000 years ago. This is earlier than other genetic studies suggest, because our model includes the effects of migration, which we found to be important for reproducing observed patterns of variation in the data. We also analyzed data from European, Asian, and Mexican populations to model the peopling of the Americas. Here, we find no evidence for recurrent migration after East Asian and Native American populations diverged. Our methods are not limited to studying humans, and we hope that future sequencing projects will offer more insights into the history of both our own species and others.
PMCID: PMC2760211  PMID: 19851460
15.  The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent 
PLoS Genetics  2013;9(11):e1003912.
Skin pigmentation is one of the most variable phenotypic traits in humans. A non-synonymous substitution (rs1426654) in the third exon of SLC24A5 accounts for lighter skin in Europeans but not in East Asians. A previous genome-wide association study carried out in a heterogeneous sample of UK immigrants of South Asian descent suggested that this gene also contributes significantly to skin pigmentation variation among South Asians. In the present study, we have quantitatively assessed skin pigmentation for a largely homogeneous cohort of 1228 individuals from the Southern region of the Indian subcontinent. Our data confirm a significant association of the rs1426654 SNP with skin pigmentation, explaining about 27% of total phenotypic variation in the cohort studied. Our extensive survey of the polymorphism in 1573 individuals from 54 ethnic populations across the Indian subcontinent reveals a wide presence of the derived-A allele, although the frequencies vary substantially among populations. We also show that the geospatial pattern of this allele is complex, but most importantly, reflects strong influence of language, geography and demographic history of the populations. Sequencing 11.74 kb of SLC24A5 in 95 individuals worldwide reveals that the rs1426654-A alleles in South Asian and West Eurasian populations are monophyletic and occur on the background of a common haplotype that is characterized by low genetic diversity. We date the coalescence of the light skin associated allele at 22–28 KYA. Both our sequence and genome-wide genotype data confirm that this gene has been a target for positive selection among Europeans. However, the latter also shows additional evidence of selection in populations of the Middle East, Central Asia, Pakistan and North India but not in South India.
Author Summary
Human skin color is one of the most visible aspects of human diversity. The genetic basis of pigmentation in Europeans has been understood to some extent, but our knowledge about South Asians has been restricted to a handful of studies. It has been suggested that a single nucleotide difference in SLC24A5 accounts for 25–38% of European-African pigmentation differences and correlates with lighter skin. This genetic variant has also been associated with skin color variation among South Asians living in the UK. Here, we report a study based on a homogeneous cohort from South India. Our results confirm that SLC24A5 plays a key role in pigmentation diversity of South Asians. Country-wide screening of the variant reveals that the light skin associated allele is widespread in the Indian subcontinent and its complex patterning is shaped by a combination of processes involving selection and demographic history of the populations. By studying the variation of SLC24A5 sequences among a diverse set of individuals, we show that the light skin associated allele in South Asians is identical by descent to that found in Europeans. Our study also provides new insights into positive selection acting on the gene and the evolutionary history of light skin in humans.
PMCID: PMC3820762  PMID: 24244186
16.  Single nucleotide polymorphisms generated by genotyping by sequencing to characterize genome-wide diversity, linkage disequilibrium, and selective sweeps in cultivated watermelon 
BMC Genomics  2014;15(1):767.
A large single nucleotide polymorphism (SNP) dataset was used to analyze genome-wide diversity in a diverse collection of watermelon cultivars representing globally cultivated watermelon genetic diversity. The marker density required for conducting successful association mapping depends on the extent of linkage disequilibrium (LD) within a population. Use of genotyping by sequencing reveals large numbers of SNPs that in turn generate opportunities in genome-wide association mapping and marker-assisted selection, even in crops such as watermelon for which few genomic resources are available. In this paper, we used genome-wide genetic diversity to study LD, selective sweeps, and pairwise FST distributions among worldwide cultivated watermelons to track signals of domestication.
We examined 183 Citrullus lanatus var. lanatus accessions representing domesticated watermelon and generated a set of 11,485 SNP markers using genotyping by sequencing. With a diverse panel of worldwide cultivated watermelons, we identified a set of 5,254 SNPs with a minor allele frequency of ≥ 0.05, distributed across the genome. All ancestries were traced to Africa and an admixture of various ancestries constituted secondary gene pools across various continents. A sliding window analysis using pairwise FST values was used to resolve selective sweeps. We identified strong selection on chromosomes 3 and 9 that might have contributed to the domestication process. Pairwise analysis of adjacent SNPs within a chromosome as well as within a haplotype allowed us to estimate genome-wide LD decay. LD was also detected within individual genes on various chromosomes. Principal component and ancestry analyses were used to account for population structure in a genome-wide association study. We further mapped important genes for soluble solid content using a mixed linear model.
Information concerning the SNP resources, population structure, and LD developed in this study will help in identifying agronomically important candidate genes from the genomic regions underlying selection and for mapping quantitative trait loci using a genome-wide association study in sweet watermelon.
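The abstract mentions a sliding-window analysis of pairwise FST without giving window or step sizes. A minimal sketch of the general idea, assuming per-SNP FST values have already been computed, averages them within overlapping windows along a chromosome; windows with unusually high mean FST are then candidate sweep regions. The window size, step, positions and values below are all hypothetical.

```python
def sliding_window_mean(positions, values, window, step):
    """Mean of per-SNP statistics (e.g. pairwise FST) in windows along one chromosome.

    positions: base-pair positions of SNPs; values: matching per-SNP statistics.
    Windows are [start, start + window) and advance by `step` bp.
    Returns (window_start, window_end, n_snps, mean) for every non-empty window.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not positions:
        return []
    results = []
    start, last = min(positions), max(positions)
    while start <= last:
        end = start + window
        in_win = [v for p, v in zip(positions, values) if start <= p < end]
        if in_win:
            results.append((start, end, len(in_win), sum(in_win) / len(in_win)))
        start += step
    return results


# Toy data: five SNPs, 200 bp windows sliding by 100 bp.
res = sliding_window_mean([100, 150, 300, 420, 480], [0.1, 0.3, 0.5, 0.2, 0.4], 200, 100)
for w in res:
    print(w)
```

Overlapping windows smooth out single-SNP noise; a real scan would also set a minimum SNP count per window and call outliers against the empirical genome-wide distribution.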
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-767) contains supplementary material, which is available to authorized users.
PMCID: PMC4246513  PMID: 25196513
Linkage disequilibrium; GWAS; Selective sweep; Population structure; Genotyping by sequencing; Watermelon; Citrullus lanatus var. lanatus
17.  Lactase Persistence and Lipid Pathway Selection in the Maasai 
PLoS ONE  2012;7(9):e44751.
The Maasai are a pastoral people in Kenya and Tanzania, whose traditional diet of milk, blood and meat is rich in lactose, fat and cholesterol. In spite of this, they have low levels of blood cholesterol, and seldom suffer from gallstones or cardiac diseases. Field studies in the 1970s suggested that the Maasai have a genetic adaptation for cholesterol homeostasis. Analysis of HapMap 3 data using the Fixation Index (Fst) and two metrics of haplotype diversity, the integrated Haplotype Score (iHS) and the Cross Population Extended Haplotype Homozygosity (XP-EHH), identified genomic regions and single nucleotide polymorphisms (SNPs) as strong candidates for recent selection for lactase persistence and cholesterol regulation in 143–156 founder individuals from the Maasai population in Kinyawa, Kenya (MKK). The non-synonymous SNP with the highest genome-wide Fst was the TC polymorphism at rs2241883 in Fatty Acid Binding Protein 1 (FABP1), known to reduce low density lipoprotein and triglyceride levels in Europeans. The strongest signal identified by all three metrics was a 1.7 Mb region on Chr2q21. This region contains the genes LCT (Lactase) and MCM6 (Minichromosome Maintenance Complex Component) involved in lactase persistence, and the gene Rab3GAP1 (Rab3 GTPase-activating Protein Catalytic Subunit), which contains polymorphisms associated with total cholesterol levels in a genome-wide association study of >100,000 individuals of European ancestry. Sanger sequencing of DNA from six MKK samples showed that the GC-14010 polymorphism in the MCM6 gene, known to be associated with lactase persistence in Africans, is segregating in MKK at high frequency (∼58%). The Cytochrome P450 Family 3 Subfamily A (CYP3A) cluster of genes, involved in cholesterol metabolism, was identified by Fst and iHS as candidate loci under selection. 
Overall, our study identified several specific genomic regions under selection in the Maasai which contain polymorphisms in genes associated with lactase persistence and cholesterol regulation.
PMCID: PMC3461017  PMID: 23028602
18.  Genome wide signatures of positive selection: The comparison of independent samples and the identification of regions associated to traits 
BMC Genomics  2009;10:178.
The goal of genome wide analyses of polymorphisms is to achieve a better understanding of the link between genotype and phenotype. Part of that goal is to understand the selective forces that have operated on a population.
In this study we compared the signals of selection, identified through population divergence in the Bovine HapMap project, to those found in an independent sample of cattle from Australia. Evidence for population differentiation across the genome, as measured by FST, was highly correlated in the two data sets. Nevertheless, 40% of the variance in FST between the two studies was attributed to the differences in breed composition. Seventy-six percent of the variance in FST was attributed to differences in SNP composition and density when the same breeds were compared. The difference between FST of adjacent loci increased rapidly with the increase in distance between SNP, reaching an asymptote after 20 kb. Using 129 SNP that have highly divergent FST values in both data sets, we identified 12 regions that had additive effects on the traits residual feed intake, beef yield or intramuscular fatness measured in the Australian sample. Four of these regions had effects on more than one trait. One of these regions includes the R3HDM1 gene, which is under selection in European humans.
Firstly, many different populations will be necessary for a full description of selective signatures across the genome, not just a small set of highly divergent populations. Secondly, it is necessary to use the same SNP when comparing the signatures of selection from one study to another. Thirdly, useful signatures of selection can be obtained where many of the groups have only minor genetic differences and may not be clearly separated in a principal component analysis. Fourthly, combining analyses of genome wide selection signatures and genome wide associations to traits helps to define the trait under selection or the population group in which the QTL is likely to be segregating. Finally, the FST difference between adjacent loci suggests that 150,000 evenly spaced SNP will be required to study selective signatures in all parts of the bovine genome.
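The figure of 150,000 SNP is consistent with spacing markers at roughly the 20 kb distance beyond which the FST difference between adjacent loci stops increasing. Assuming a bovine genome length G of about 3 Gb (an approximate value not stated in the abstract) and a spacing d of 20 kb, the arithmetic is:

```latex
N_{\text{SNP}} \approx \frac{G}{d}
  = \frac{3 \times 10^{9}\ \text{bp}}{2 \times 10^{4}\ \text{bp}}
  = 1.5 \times 10^{5}
```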
PMCID: PMC2681478  PMID: 19393047
19.  Population- and genome-specific patterns of linkage disequilibrium and SNP variation in spring and winter wheat (Triticum aestivum L.) 
BMC Genomics  2010;11:727.
Single nucleotide polymorphisms (SNPs) are ideally suited for the construction of high-resolution genetic maps, studying population evolutionary history and performing genome-wide association mapping experiments. Here, we used a genome-wide set of 1536 SNPs to study linkage disequilibrium (LD) and population structure in a panel of 478 spring and winter wheat cultivars (Triticum aestivum) from 17 populations across the United States and Mexico.
Most of the wheat oligo pool assay (OPA) SNPs that were polymorphic within the complete set of 478 cultivars were also polymorphic in all subpopulations. Higher levels of genetic differentiation were observed among wheat lines within populations than among populations. A total of nine genetically distinct clusters were identified, suggesting that some of the pre-defined populations shared a significant proportion of genetic ancestry. Estimates of population structure (FST) at individual loci showed a high level of heterogeneity across the genome. In addition, seven genomic regions with elevated FST were detected between the spring and winter wheat populations. Some of these regions overlapped with previously mapped flowering time QTL. Across all populations, the highest extent of significant LD was observed in the wheat D-genome, followed by lower LD in the A- and B-genomes. The differences in the extent of LD among populations and genomes were mostly driven by differences in long-range LD (>10 cM).
Genome- and population-specific patterns of genetic differentiation and LD were discovered in the populations of wheat cultivars from different geographic regions. Our study demonstrated that the estimates of population structure between spring and winter wheat lines can identify genomic regions harboring candidate genes involved in the regulation of growth habit. Variation in LD suggests that breeding and selection had a different impact on each wheat genome both within and among populations. The higher extent of LD in the wheat D-genome versus the A- and B-genomes likely reflects the episodes of recent introgression and population bottleneck accompanying the origin of hexaploid wheat. The assessment of LD and population structure in this assembled panel of diverse lines provides critical information for the development of genetic resources for genome-wide association mapping of agronomically important traits in wheat.
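The abstract does not name its LD statistic. A common choice is r², and for largely homozygous inbred cultivars the squared correlation between two SNPs' genotype codes is often used as an approximation to haplotype r²; both points are assumptions here, not details from the paper. Plotting r² against inter-marker distance then gives the LD decay curves used to compare the A-, B- and D-genomes. The genotype codes below are toy values.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two SNPs' genotype codes.

    x, y: per-individual genotype codes (e.g. 0/2 for homozygous inbred lines,
    or 0/1/2 allele dosages). Individuals with a missing call (None) at either
    SNP are skipped. Returns NaN if either SNP is monomorphic in the sample.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two non-missing individuals")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # monomorphic SNP: r^2 undefined
    return sxy * sxy / (sxx * syy)


# Toy inbred lines coded 0/2: perfect LD, then no LD.
print(genotype_r2([0, 2, 2, 0], [0, 2, 2, 0]))
print(genotype_r2([0, 0, 2, 2], [0, 2, 0, 2]))
```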
PMCID: PMC3020227  PMID: 21190581
20.  Reconstructing the Population Genetic History of the Caribbean 
PLoS Genetics  2013;9(11):e1003925.
The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, we investigate the population genetic history of this region by characterizing patterns of genome-wide variation among 330 individuals from three of the Greater Antilles (Cuba, Puerto Rico, Hispaniola), two mainland (Honduras, Colombia), and three Native South American (Yukpa, Bari, and Warao) populations. We combine these data with a unique database of genomic variation in over 3,000 individuals from diverse European, African, and Native American populations. We use local ancestry inference and tract length distributions to test different demographic scenarios for the pre- and post-colonial history of the region. We develop a novel ancestry-specific PCA (ASPCA) method to reconstruct the sub-continental origin of Native American, European, and African haplotypes from admixed genomes. We find that the most likely source of the indigenous ancestry in Caribbean islanders is a Native South American component shared among inland Amazonian tribes, Central America, and the Yucatan peninsula, suggesting extensive gene flow across the Caribbean in pre-Columbian times. We find evidence of two pulses of African migration. The first pulse—which today is reflected by shorter, older ancestry tracts—consists of a genetic component more similar to coastal West African regions involved in early stages of the trans-Atlantic slave trade. The second pulse—reflected by longer, younger tracts—is more similar to present-day West-Central African populations, supporting historical records of later transatlantic deportation. Surprisingly, we also identify a Latino-specific European component that has significantly diverged from its parental Iberian source populations, presumably as a result of small European founder population size. 
We demonstrate that the ancestral components in admixed genomes can be traced back to distinct sub-continental source populations with far greater resolution than previously thought, even when limited pre-Columbian Caribbean haplotypes have survived.
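The link between tract length and pulse age can be illustrated with a simplified single-pulse model, which is an assumption here and much simpler than the multi-pulse models fitted in the study. If a fraction m of ancestry arrives in one pulse T generations ago and the population then mates randomly, tracts of that ancestry are approximately exponential in length with rate T(1 − m) per Morgan, so their mean length is 1/(T(1 − m)) Morgans. Older pulses therefore leave shorter tracts. The generation counts and proportion below are hypothetical.

```python
def expected_tract_length_cm(generations, proportion):
    """Approximate mean length (cM) of ancestry tracts from a single admixture pulse.

    Simplified model: one pulse `generations` ago contributing `proportion`
    of ancestry, followed by random mating. Tract lengths are then roughly
    exponential with rate generations * (1 - proportion) per Morgan.
    """
    if generations <= 0 or not 0 < proportion < 1:
        raise ValueError("generations must be > 0 and 0 < proportion < 1")
    mean_morgans = 1.0 / (generations * (1.0 - proportion))
    return 100.0 * mean_morgans


# Hypothetical pulses with the same proportion but different ages.
older = expected_tract_length_cm(generations=14, proportion=0.2)
younger = expected_tract_length_cm(generations=7, proportion=0.2)
print(round(older, 2), round(younger, 2))
```

In this simple model, halving the time since admixture doubles the expected tract length, which is the intuition behind reading shorter tracts as an older pulse.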
Author Summary
Latinos are often regarded as a single heterogeneous group, whose complex variation is not fully appreciated in several social, demographic, and biomedical contexts. By making use of genomic data, we characterize ancestral components of Caribbean populations on a sub-continental level and unveil fine-scale patterns of population structure distinguishing insular from mainland Caribbean populations as well as from other Hispanic/Latino groups. We provide genetic evidence for an inland South American origin of the Native American component in island populations and for extensive pre-Columbian gene flow across the Caribbean basin. The Caribbean-derived European component shows significant differentiation from parental Iberian populations, presumably as a result of founder effects during the colonization of the New World. Based on demographic models, we reconstruct the complex population history of the Caribbean since the onset of continental admixture. We find that insular populations are best modeled as mixtures absorbing two pulses of African migrants, coinciding with the early and maximum activity stages of the transatlantic slave trade. These two pulses appear to have originated in different regions within West Africa, imprinting two distinguishable signatures on present-day Afro-Caribbean genomes and shedding light on the genetic impact of the slave trade in the Caribbean.
PMCID: PMC3828151  PMID: 24244192
21.  Worldwide distribution of NAT2 diversity: Implications for NAT2 evolutionary history 
BMC Genetics  2008;9:21.
The N-acetyltransferase 2 (NAT2) gene plays a crucial role in the metabolism of many drugs and xenobiotics. As it represents a likely target of population-specific selection pressures, we fully sequenced the NAT2 coding region in 97 Mandenka individuals from Senegal, and compared these sequences to extant data on other African populations. The Mandenka data were further included in a worldwide dataset composed of 41 published population samples (6,727 individuals) from four continental regions that were adequately genotyped for all common NAT2 variants so as to provide further insights into the worldwide haplotype diversity and population structure at NAT2.
The sequencing analysis of the NAT2 gene in the Mandenka sample revealed twelve polymorphic sites in the coding exon (two of which are newly identified mutations, C345T and C638T), defining 16 haplotypes. High diversity and no molecular signal of departure from neutrality were observed in this West African sample. On the basis of the worldwide genotyping survey dataset, we found a strong genetic structure differentiating East Asians from both Europeans and sub-Saharan Africans. This pattern could result from region- or population-specific selective pressures acting at this locus, as further suggested in the HapMap data by extremely high values of FST for a few SNP positions in the NAT2 coding exon (T341C, C481T and A803G) in comparison to the empirical distribution of FST values across the whole 400-kb region of the NAT gene family.
Patterns of sequence variation at NAT2 are consistent with selective neutrality in all sub-Saharan African populations investigated, whereas the high level of population differentiation between Europeans and East Asians inferred from SNPs could suggest population-specific selective pressures acting at this locus, probably caused by differences in diet or exposure to other environmental signals.
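The abstract reports "high diversity" for the Mandenka sample without naming the statistic. One standard measure, assumed here, is Nei's unbiased haplotype (gene) diversity, H = n/(n − 1) × (1 − Σ pᵢ²), where pᵢ is the sample frequency of haplotype i among n chromosomes. The haplotype labels below are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity: n/(n-1) * (1 - sum(p_i^2)).

    haplotypes: list of haplotype labels, one per sampled chromosome.
    Ranges from 0 (all chromosomes identical) to 1 (all distinct).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    freqs = [c / n for c in Counter(haplotypes).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


# Hypothetical sample of six chromosomes carrying three haplotypes.
print(round(haplotype_diversity(["h1", "h1", "h1", "h2", "h2", "h3"]), 4))
```

The n/(n − 1) factor removes the downward bias of the plug-in estimate 1 − Σ pᵢ² in finite samples.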
PMCID: PMC2292740  PMID: 18304320
22.  Genome-Wide Scan for Signatures of Human Population Differentiation and Their Relationship with Natural Selection, Functional Pathways and Diseases 
PLoS ONE  2009;4(11):e7927.
Genetic differences both between individuals and between populations are studied for their evolutionary relevance and for their potential medical applications. Most of the genetic differentiation among populations is caused by random drift, which should affect all loci across the genome in a similar manner. When a locus shows extraordinarily high or low levels of population differentiation, this may be interpreted as evidence for natural selection. The most widely used measure of population differentiation was devised by Wright and is known as the fixation index, or FST. We performed a genome-wide estimation of FST on about 4 million SNPs from HapMap project data. We demonstrated a heterogeneous distribution of FST values between autosomes and sex chromosomes. When we compared the FST values obtained in this study with another evolutionary measure obtained by a comparative interspecific approach, we found that genes under positive selection appeared to show low levels of population differentiation. We applied a gene set approach, widely used in microarray data analysis, to detect functional pathways under selection. We found that one pathway related to antigen processing and presentation showed low FST values, while several pathways related to cell signalling, growth and morphogenesis showed high FST values. Finally, we detected a signature of selection within genes associated with complex human diseases. These results can help identify which processes occurred during human evolution and adaptation to different environments. They also support the hypothesis that common diseases could have a genetic background shaped by human evolution.
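The gene set approach mentioned above is not specified in detail in the abstract. One simple version, sketched here as an assumption, summarizes each gene by a single FST value, takes the mean over a pathway's genes, and compares it with the means of many random gene sets of the same size drawn from all genes. Both tails are reported because the study looks for pathways with unusually high as well as unusually low FST. The function name, the per-gene summary, and the +1 correction are illustrative choices, not the authors' method.

```python
import random

def gene_set_permutation_test(fst_by_gene, gene_set, n_perm=10_000, seed=0):
    """Permutation test of a gene set's mean FST against random sets of equal size.

    fst_by_gene: dict mapping gene name -> one FST summary per gene (hypothetical input).
    Returns (observed mean, p-value for high FST, p-value for low FST).
    """
    genes = list(fst_by_gene)
    members = [g for g in gene_set if g in fst_by_gene]
    k = len(members)
    if k == 0:
        raise ValueError("no gene of the set has an FST value")
    observed = sum(fst_by_gene[g] for g in members) / k
    rng = random.Random(seed)  # fixed seed for reproducibility
    n_high = n_low = 0
    for _ in range(n_perm):
        # Draw k genes without replacement from the whole genome-wide list
        draw = rng.sample(genes, k)
        null_mean = sum(fst_by_gene[g] for g in draw) / k
        n_high += null_mean >= observed
        n_low += null_mean <= observed
    # +1 in numerator and denominator keeps p-values strictly above zero
    return observed, (n_high + 1) / (n_perm + 1), (n_low + 1) / (n_perm + 1)

# Toy usage: 100 hypothetical genes whose FST rises with the index
toy_fst = {f"g{i}": i / 100 for i in range(100)}
obs, p_high, p_low = gene_set_permutation_test(toy_fst, ["g95", "g96", "g97", "g98", "g99"])
print(obs, p_high, p_low)
```

In real data, per-gene FST summaries depend on gene length and SNP density, so a production test would need to control for those; this sketch ignores them.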
PMCID: PMC2775949  PMID: 19936260
23.  Genome-wide analysis reveals the ancient and recent admixture history of East African Shorthorn Zebu from Western Kenya 
Heredity  2014;113(4):297-305.
The Kenyan East African zebu cattle are valuable and widely used genetic resources. Previous studies using microsatellite loci revealed the complex history of these populations with the presence of taurine and zebu genetic backgrounds. Here, we estimate at genome-wide level the genetic composition and population structure of the East African Shorthorn Zebu (EASZ) of western Kenya. A total of 548 EASZ from 20 sub-locations were genotyped using the Illumina BovineSNP50 v. 1 beadchip. STRUCTURE analysis reveals admixture with Asian zebu, African and European taurine cattle. The EASZ were separated into three categories: substantial (⩾12.5%), moderate (1.56%
PMCID: PMC4181064  PMID: 24736786
BMC Medical Genetics  2011;12:55.
We hypothesized that the frequencies of risk alleles of SNPs mediating susceptibility to cardiovascular diseases differ among populations of varying geographic origin and that population-specific selection has operated on some of these variants.
From the database of genome-wide association studies (GWAS), we selected 36 cardiovascular phenotypes including coronary heart disease, hypertension, and stroke, as well as related quantitative traits (e.g., body mass index and plasma lipid levels). We identified 292 SNPs in 270 genes associated with a disease or trait at P < 5 × 10⁻⁸. Of these, 158 SNPs (54.1%) had been genotyped in 938 individuals belonging to 52 populations from seven geographic areas as part of the Human Genome Diversity Project (HGDP). A measure of population differentiation, FST, was calculated to quantify differences in risk allele frequencies (RAFs) among populations and geographic areas.
Large differences in RAFs were noted in populations of Africa, East Asia, America and Oceania, when compared with other geographic regions. The mean global FST (0.1042) for 158 SNPs among the populations was not significantly higher than the mean global FST of 158 autosomal SNPs randomly sampled from the HGDP database. Significantly higher global FST (P < 0.05) was noted in eight SNPs, based on an empirical distribution of global FST of 2036 putatively neutral SNPs. For four of these SNPs, additional evidence of selection was noted based on the integrated Haplotype Score.
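Judging a SNP against "an empirical distribution of global FST of 2036 putatively neutral SNPs" amounts to an empirical upper-tail p-value: the share of neutral SNPs whose FST is at least as large as the candidate's. The sketch below is a minimal version under that reading; the function name is hypothetical, and the +1 correction (which treats the observed value as one more draw from the null) is a common convention that the abstract does not confirm.

```python
def empirical_p_value(observed_fst, neutral_fst):
    """Upper-tail empirical p-value of one SNP's FST against neutral SNPs.

    neutral_fst: FST values of putatively neutral reference SNPs.
    """
    if not neutral_fst:
        raise ValueError("need at least one neutral FST value")
    n_ge = sum(1 for f in neutral_fst if f >= observed_fst)
    # +1 correction: p can never be exactly zero with a finite reference set
    return (n_ge + 1) / (len(neutral_fst) + 1)

# Toy reference distribution of 1000 evenly spaced FST values (not real data)
neutral = [i / 1000 for i in range(1000)]
print(empirical_p_value(0.999, neutral))  # 2/1001, about 0.002: flagged at P < 0.05
print(empirical_p_value(0.5, neutral))    # 501/1001, about 0.5: unremarkable
```

With 2036 reference SNPs, the smallest attainable p-value under this convention is 1/2037, which bounds how extreme an outlier can be declared.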
Large differences in RAFs for a set of common SNPs that influence risk of cardiovascular disease were noted between the major world populations. Pairwise comparisons revealed RAF differences for at least eight SNPs that might be due to population-specific selection or demographic factors. These findings are relevant to a better understanding of geographic variation in the prevalence of cardiovascular disease.
PMCID: PMC3103418  PMID: 21507254
cardiovascular disease; genetics; genome-wide association study; risk allele frequency; population differentiation
The distribution of genetic variation among populations is conveniently measured by Wright’s FST, which is a scaled variance taking on values in [0,1]. For certain types of genetic markers, and for single-nucleotide polymorphisms (SNPs) in particular, it is reasonable to presume that allelic differences at most loci are selectively neutral. For such loci, the distribution of genetic variation among populations is determined by the size of local populations, the pattern and rate of migration among those populations, and the rate of mutation. Because the demographic parameters (population sizes and migration rates) are common across all autosomal loci, locus-specific estimates of FST will depart from a common distribution only for loci with unusually high or low rates of mutation or for loci that are closely associated with genomic regions having a relationship with fitness. Thus, loci that are statistical outliers showing significantly more among-population differentiation than others may mark genomic regions subject to diversifying selection among the sample populations. Similarly, statistical outliers showing significantly less differentiation among populations than others may mark genomic regions subject to stabilizing selection across the sample populations. We propose several Bayesian hierarchical models to estimate locus-specific effects on FST, and we apply these models to single nucleotide polymorphism data from the HapMap project. Because loci that are physically associated with one another are likely to show similar patterns of variation, we introduce conditional autoregressive models to incorporate the local correlation among loci for high-resolution genomic data. We estimate the posterior distributions of model parameters using Markov chain Monte Carlo (MCMC) simulations. 
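Describing FST as "a scaled variance taking on values in [0,1]" corresponds to Wright's definition FST = Var(p) / (p̄(1 - p̄)), where p is the allele frequency in each subpopulation and p̄ its mean across subpopulations. Because the variance of frequencies in [0,1] cannot exceed p̄(1 - p̄), the ratio stays in [0,1]. The sketch below computes this population-level quantity from known frequencies; it is not a sample-size-corrected estimator (such as Weir and Cockerham's) and not the Bayesian hierarchical model the abstract proposes. The function name and optional weights are illustrative assumptions.

```python
def wright_fst(freqs, weights=None):
    """Wright's FST as a scaled variance of allele frequencies among populations.

    freqs:   allele frequency in each subpopulation (values in [0, 1]).
    weights: optional relative subpopulation sizes; equal weights by default.
    """
    if not freqs:
        raise ValueError("need at least one subpopulation")
    if weights is None:
        weights = [1.0] * len(freqs)
    total = sum(weights)
    pbar = sum(w * p for w, p in zip(weights, freqs)) / total
    if pbar <= 0.0 or pbar >= 1.0:
        return 0.0  # monomorphic overall: no variance to partition (convention)
    var_p = sum(w * (p - pbar) ** 2 for w, p in zip(weights, freqs)) / total
    return var_p / (pbar * (1.0 - pbar))

print(wright_fst([0.0, 1.0]))  # 1.0: alternative alleles fixed in the two populations
print(wright_fst([0.2, 0.8]))  # about 0.36
print(wright_fst([0.3, 0.3]))  # 0.0: identical frequencies, no differentiation
```

Under neutrality, loci share the same demographic history, so values from this formula should scatter around a common distribution; the loci that fall far outside it are the candidates the abstract describes.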
Model comparison using several criteria, including DIC and LPML, reveals that a model with locus- and population-specific effects is superior to other models for the data used in the analysis. To detect statistical outliers we propose an approach that measures divergence between the posterior distributions of locus-specific effects and the common FST with the Kullback-Leibler divergence measure. We calibrate this measure by comparing values with those produced from the divergence between a biased and a fair coin. We conduct a simulation study to illustrate the performance of our approach for detecting loci subject to stabilizing/divergent selection, and we apply the proposed models to low- and high-resolution SNP data from the HapMap project. Model comparison using DIC and LPML reveals that CAR models are superior to alternative models for the high-resolution data. For both low- and high-resolution data, we identify statistical outliers that are associated with known genes.
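The coin calibration turns an abstract divergence value into something intuitive: a Kullback-Leibler divergence d is matched to the heads probability p of a biased coin whose divergence from a fair coin equals d. For Bernoulli distributions, KL(p || 1/2) = p ln(2p) + (1 - p) ln(2(1 - p)), which rises from 0 at p = 0.5 to ln 2 at p = 1, so p can be recovered by bisection. The sketch below shows only this calibration step, assuming natural logarithms; computing the divergence between posterior distributions themselves is outside its scope, and both function names are hypothetical.

```python
import math

def kl_coin(p):
    """KL divergence (nats) of a coin with heads probability p from a fair coin."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Terms with q == 0 are dropped, following the convention 0 * log(0) = 0
    return sum(q * math.log(2.0 * q) for q in (p, 1.0 - p) if q > 0.0)

def calibrate(d, tol=1e-12):
    """Heads probability p in [0.5, 1] whose coin has KL divergence d from a fair coin."""
    if not 0.0 <= d <= math.log(2.0):
        raise ValueError("d must lie in [0, ln 2]")
    lo, hi = 0.5, 1.0
    # kl_coin is increasing on [0.5, 1], so bisection converges to the unique root
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if kl_coin(mid) < d:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# A locus whose divergence equals that of a coin landing heads 80% of the time
d = kl_coin(0.8)
print(d, calibrate(d))
```

A calibrated p near 0.5 means the locus-specific posterior is barely distinguishable from the common FST, while p near 1 signals a strong outlier; where to place the cut-off is a modelling choice the abstract does not fix.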
PMCID: PMC2713112  PMID: 19623271
Bayesian approach; Hierarchical model; SNP; Wright’s Fst; MCMC

