Mol Biol Evol. 2009 June; 26(6): 1357–1367.
Published online 2009 March 11. doi:  10.1093/molbev/msp045
PMCID: PMC2734137

Population Genomic Analysis of ALMS1 in Humans Reveals a Surprisingly Complex Evolutionary History

Abstract

Mutations in the human gene ALMS1 result in Alström Syndrome, which presents with early childhood obesity and insulin resistance leading to Type 2 diabetes. Previous genomewide scans for selection in the HapMap data based on linkage disequilibrium and population structure suggest that ALMS1 was subject to recent positive selection. Through a detailed population genomic analysis of existing genomewide data sets and new resequencing data obtained in geographically diverse populations, we find that the signature of selection at ALMS1 is considerably more complex than what would be expected for an idealized model of a selective sweep acting on a newly arisen advantageous mutation. Specifically, we observed three highly divergent and globally dispersed haplogroups, two of which carry a set of seven derived nonsynonymous single nucleotide polymorphisms that are nearly fixed in Asian populations. Our data suggest that the interaction of human demographic history and positive selection on standing variation in Eurasian populations approximately 15 thousand years ago parsimoniously explains the spectrum of extant ALMS1 variation. These results provide new insights into the evolutionary history of ALMS1 in humans and suggest that selective events identified in genomewide scans may be more complex than currently appreciated.

Keywords: ALMS1, positive selection, standing variation

Introduction

The recent availability of dense catalogs of human genetic variation such as the HapMap (International HapMap Consortium 2005) and Perlegen (Hinds et al. 2005) data sets has facilitated global inferences of positive selection. Numerous genomewide scans have identified putative targets of positive selection with patterns of variation that show significant deviations from neutral expectations (Sabeti et al. 2002; Kelley et al. 2006; Voight et al. 2006; Wang et al. 2006; Zhang et al. 2006; Kimura et al. 2007; Tang et al. 2007). Although these analyses have provided considerable insight into how often and where in the genome positive selection has shaped extant patterns of human genetic variation, a deeper understanding of human evolutionary history will require in-depth follow-up studies of “outlier loci” identified in genomewide scans (Biswas and Akey 2006).

To this end, we have performed a detailed population genomic analysis of ALMS1, which has been identified as a putative target of recent adaptive evolution in several genomewide scans for selection (International HapMap Consortium 2005; Wang et al. 2006; Kimura et al. 2007; Tang et al. 2007). Mutations in ALMS1 can lead to Alström Syndrome, a rare autosomal recessive disorder with a spectrum of phenotypes including early onset obesity, metabolic disorders, and sensory impairment (Collin et al. 2005; Li et al. 2007). Recent in vitro work demonstrates that ALMS1 is widely expressed and localizes to centrosomes and the base of cilia (Hearn et al. 2005; Arsov et al. 2006), and studies in mice confirm that ALMS1 is involved in cilia formation and function (Li et al. 2007). Alström Syndrome belongs to a growing class of human diseases, referred to as ciliopathies, that includes disorders such as nephronophthisis, Bardet–Biedl syndrome (BBS), and Meckel–Gruber syndrome (MKS) (Badano et al. 2006). Interestingly, several phenotypes, such as childhood obesity and insulin resistance, overlap between Alström Syndrome and BBS (Hildebrandt and Otto 2005), and hypomorphic mutations in MKS causing genes are associated with BBS (Leitch et al. 2008). Thus, distinct genetic perturbations to the network of proteins involved in cilia formation and function can result in overlapping and pleiotropic phenotypic anomalies.

To better understand the evolutionary history of ALMS1, we analyzed ALMS1 genotype, sequence, and haplotype data. These analyses show that ALMS1 has been subjected to recent positive selection in Eurasian populations approximately 15 thousand years ago (kya). However, unexpectedly, the signature of selection at ALMS1 is considerably more complex than what would be expected for an idealized model of selection acting on a newly arisen advantageous mutation. Rather, the interaction of human demography and positive selection on standing variation in Eurasians parsimoniously explains the spectrum of extant ALMS1 variation. In addition, by reanalyzing previously published genomewide association data, we provide evidence that ALMS1 genetic variation contributes to interindividual variation in metabolic phenotypes such as insulin and glucose levels. In summary, our results provide new insights into the evolutionary history of ALMS1 in humans, highlight the need for careful follow-up studies of candidate selection genes identified in genomewide analyses, and suggest that selective events in human populations may be more complex than currently appreciated.

Materials and Methods

Samples

We sequenced approximately 6 kb of ALMS1 in DNA samples from 91 individuals representing six populations that were obtained from the Coriell Institute for Medical Research Cell Repositories (Camden, NJ). Coriell repository numbers for these samples are as follows: CEPH (n = 21: NA06990, NA07019, NA07348–9, NA10830–1, NA10842–5, NA10848, NA10850–4, NA10857–8, NA10860–1, and NA17201), Han Chinese of L.A. (n = 21: NA17733–NA17749, NA17752–56), Middle East (n = 10: NA17041–50), Pygmy (n = 10: NA10469–73, NA10492–96), South Africa (n = 9: NA17341–49), South America (n = 10: NA17301–10) and South East Asia (n = 10: NA17081–90). In addition, we sequenced the same regions in four nonhuman primate DNA samples from the Coriell Institute for Medical Research Cell Repositories with the following repository numbers: gorilla (Gorilla gorilla; AG05251), bonobo (Pan paniscus; AG05253), chimpanzee (Pan troglodytes; AG06939), and orangutan (Pongo pygmaeus; AG12256).

DNA Sequencing

Sequencing primers were designed from published human sequence (NM_015120) with primer3 (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) for coding and noncoding regions of ALMS1: upstream, intron 2, exon 5, intron 7, exon 8, intron 8, exon 10, and downstream (primer sequences are available upon request). We used standard polymerase chain reaction–based sequencing reactions with the Applied Biosystems Big Dye sequencing protocol on an ABI 3130xl. Sequence data were assembled using Phred/Phrap (Ewing and Green 1998; Ewing et al. 1998), and the alignments were inspected for accuracy with Consed (Gordon et al. 1998, 2001). Polymorphisms were identified with PolyPhred 4.0 (Bhangale et al. 2006). All polymorphic sites were manually verified and confirmed by sequencing the opposite strand. Genotype data from 210 unrelated individuals were obtained from the HapMap project (Release 22 NCBI Build 36) (International HapMap Consortium 2005).

Linkage Disequilibrium (LD)

We calculated r^2 between all pairwise combinations (Hill 1968) of markers in ALMS1 and approximately 1 Mb of flanking sequences (both 5′ and 3′) using HapMap genotype data. Estimates of r^2 were obtained from Haploview (Barrett et al. 2005) for all markers with a minor allele frequency ≥5% and used in subsequent analyses. To evaluate and compare the distribution of LD within and between the HapMap CEU, YRI, and ASN samples, and how LD decays as a function of distance from ALMS1, we calculated a statistic related to Z_nS (Kelly 1997). Specifically, we calculated the average r^2 between all pairwise comparisons of single nucleotide polymorphisms (SNPs) in bin 1 and bin 2:

$$ \overline{r^2} = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} r^2_{ij} $$

where r^2_ij is the r^2 between the ith SNP in bin 1 and the jth SNP in bin 2, n1 is the number of SNPs in bin 1, and n2 is the number of SNPs in bin 2. Here, n2 represents the number of SNPs in ALMS1 and n1 the number of SNPs in nonoverlapping 50-kb windows up and downstream of ALMS1.
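
The between-bin average can be illustrated with a minimal Python sketch. It assumes phased haplotypes coded 0/1 per SNP, whereas the analysis above took r^2 estimates from Haploview on genotype data; the function names and the toy data are hypothetical.

```python
def r_squared(snp_a, snp_b):
    """r^2 between two biallelic SNPs given phased 0/1 alleles per haplotype."""
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("SNP vectors must be non-empty and of equal length")
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(x * y for x, y in zip(snp_a, snp_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def mean_r2_between_bins(bin1, bin2):
    """Average r^2 over all n1 * n2 SNP pairs, one SNP taken from each bin."""
    total = sum(r_squared(a, b) for a in bin1 for b in bin2)
    return total / (len(bin1) * len(bin2))


# Toy data (hypothetical): each list holds one SNP's alleles on four haplotypes.
flank_bin = [[0, 0, 1, 1]]                # bin 1: a flanking 50-kb window
gene_bin = [[0, 0, 1, 1], [0, 1, 0, 1]]   # bin 2: SNPs within the gene
example_mean_r2 = mean_r2_between_bins(flank_bin, gene_bin)
```

In the toy data the first pair is in complete LD and the second pair is independent, so the average sits halfway between the two extremes.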

Haplotype Analysis

Haplotypes were reconstructed in the HapMap and sequence data with Phase 2.1.1 (Stephens et al. 2001; Stephens and Scheet 2005) using 10 iterations to confirm consistency among runs, and the run with the best average goodness-of-fit was used. We defined Haplogroup A (ancestral) and Haplogroup D (derived) based on the allelic state of seven nonsynonymous SNPs (nsSNPs) (rs3813227, rs6546837, rs6546838, rs6724782, rs6546839, rs2056486, and rs10193972) and Haplogroup D1 and Haplogroup D2 based on the allelic state of two additional SNPs (rs6730785 and rs7598901). The ancestral allele was determined by the chimpanzee sequence.
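
As a rough illustration of the Haplogroup A versus D definition, the following Python sketch classifies a phased haplotype from its allelic state at the seven nsSNPs. The 0/1 coding (0 = ancestral, 1 = derived), the function name, and the "recombinant" label for mixed states are assumptions made for illustration. The D1/D2 split is not sketched because the defining allelic states at rs6730785 and rs7598901 are not given in the text.

```python
# The seven nsSNPs listed above; alleles are assumed to be coded
# 0 = ancestral (matches chimpanzee) and 1 = derived.
NSSNPS = (
    "rs3813227", "rs6546837", "rs6546838", "rs6724782",
    "rs6546839", "rs2056486", "rs10193972",
)


def classify_haplotype(haplotype):
    """Return 'A', 'D', or 'recombinant' for a dict mapping rsID -> 0/1."""
    states = [haplotype[rs] for rs in NSSNPS]
    if all(s == 0 for s in states):
        return "A"            # ancestral at all seven nsSNPs
    if all(s == 1 for s in states):
        return "D"            # derived at all seven nsSNPs
    return "recombinant"      # mixed states (assumed label)


# Hypothetical haplotypes for illustration only.
all_ancestral = {rs: 0 for rs in NSSNPS}
all_derived = {rs: 1 for rs in NSSNPS}
mixed = dict(all_derived, rs3813227=0)
labels = [classify_haplotype(h) for h in (all_ancestral, all_derived, mixed)]
```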

We used Neighbor from the software package PHYLIP 3.6 (Felsenstein 1989, 2005) to construct unrooted phylogenetic trees on phased sequence (Dnadist was used to calculate the pairwise distance matrix) and HapMap data (average pairwise distances were calculated for the distance matrix). In both cases, we removed recombinant haplotypes occurring among the seven aforementioned nsSNPs (three unique haplotypes/four total haplotypes from the sequence data and two unique haplotypes/three total haplotypes from the HapMap data). We visualized the Neighbor-Joining trees with the APE package in R (http://cran.r-project.org/web/packages/ape/).

Time to the Most Recent Common Ancestor (TMRCA) Estimates

We used the method described by Thomson et al. (2000) to estimate the TMRCA on our phased sequence data, as this method does not assume any particular population model. Analyses were performed both on all haplotypes and on only those haplotypes with no recombination among the seven nsSNPs, and we found minimal effects on the estimated TMRCA (data not shown). For the sequence mutation rate, we used the average divergence between chimpanzee and human sequences divided by two times the estimated divergence time of 6 million years, which we calculated to be 36/(2 × 6,000,000), or 3 × 10^−6. Briefly, to estimate the TMRCA, we used the simple estimate of T, the time since the MRCA (Thomson et al. 2000; Mekel-Bobrov et al. 2005):

$$ \hat{T} = \frac{\sum_{i=1}^{n} x_i}{n\mu} $$

where $\hat{T}$ is the unbiased estimator of T, x_i is the number of mutational differences between the ith sequence and the MRCA, n is the total number of sequences in the sample, and μ is the mutation rate. In addition, we used three additional methods to estimate the ALMS1 TMRCA (McPeek and Strahs 1999; Bahlo and Griffiths 2000; Templeton 2002), all of which yielded similarly old dates and were not significantly different from one another (data not shown).
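
A minimal Python sketch of this estimator follows. The mutation rate is computed from the figures given above (36 differences and a 6-million-year divergence time); the per-sequence difference counts are hypothetical and do not come from the ALMS1 data.

```python
def thomson_tmrca(diffs_from_mrca, mu):
    """Estimate T as the mean number of differences from the MRCA divided by mu."""
    n = len(diffs_from_mrca)
    if n == 0:
        raise ValueError("need at least one sequence")
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return sum(diffs_from_mrca) / (n * mu)


# Mutation rate for the sequenced region, per year, from the text:
# 36 human-chimpanzee differences over 2 x 6,000,000 years of divergence.
mu_per_year = 36 / (2 * 6_000_000)

# Hypothetical counts of differences between each sequence and the MRCA.
example_diffs = [5, 7, 6, 8]
example_tmrca_years = thomson_tmrca(example_diffs, mu_per_year)
```

Because the estimate is simply a mean divided by a rate, it scales inversely with the assumed mutation rate, so uncertainty in the divergence time carries directly into the TMRCA.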

Coalescent Simulations

We calculated three standard neutrality tests of the site frequency spectrum: Tajima's D (Tajima 1989), Fu and Li's F test (Fu and Li 1993), and Fay and Wu's H test (Fay and Wu 2000). We used the nonhuman primate sequence to establish the ancestral allele for Fay and Wu's H test. To interpret summary statistics derived from the resequencing data, we performed additional coalescent simulations with the program ms (Hudson 2002) using previously inferred demographic parameters that were found to best fit genomic patterns of variation in the HapMap YRI, CEU, and ASN samples (Schaffner et al. 2005). The exact parameters can be found in table 1 of Schaffner et al. (2005), and involve multiple bottlenecks, population expansions, population splitting, recombination, and gene conversion. The only exception is that we did not include migration following population splitting: Schaffner et al. (2005) found that omitting migration resulted in only slightly worse fitting models, and the modest increase in levels of population differentiation resulted in more accepted simulation replicates to analyze. The ms command line argument for this model is available upon request. We used a rejection sampling method (Beaumont et al. 2002) to account for the a priori observation of ALMS1 population structure, and a total of 1 × 10^7 simulations were performed. Initially, we attempted to accept data sets if they matched observed levels of differentiation in our resequencing data (five or more SNPs with an FST ≥ 0.80 between African and Han Chinese samples and two or more SNPs with an FST ≥ 0.52 between African and CEPH samples). However, none of the 10 million simulation replicates met these criteria, indicating that such levels of structure are incompatible with a neutral demographic model that is consistent with major features of human genomic variation (Schaffner et al. 2005).
Thus, for computational tractability, we relaxed the acceptance criteria to one or more SNPs with a pairwise FST ≥ 0.80 and 0.52 between African and Han Chinese samples and African and CEPH samples, respectively. Using these thresholds, 1,405 data sets of the 10 million simulations were accepted and analyzed further. In particular, we evaluated the probability of observing divergent haplotype lineages, TMRCA, and Tajima's D as or more extreme than that observed for ALMS1. In accepted data sets, we calculated TMRCA as described above, Tajima's D (Tajima 1989), and the average number of nucleotide differences between haplogroups carrying the derived allele at the highly differentiated SNP:

$$ d_{xy} = \frac{1}{|D_1|\,|D_2|} \sum_{i \in D_1} \sum_{j \in D_2} d_{ij} $$

where d_ij is the number of nucleotide differences between haplotypes i and j, and D1 and D2 denote the set of haplotypes belonging to derived haplogroup lineages 1 and 2, respectively. In the simulated data sets, D1 and D2 were chosen so as to maximize dxy.
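
The between-haplogroup divergence can be sketched in Python as below, assuming haplotypes are equal-length strings of alleles. The function names and the example haplotypes are hypothetical, and the search over partitions that maximizes dxy in simulated data is not shown.

```python
def n_differences(hap_i, hap_j):
    """Number of sites at which two equal-length haplotypes differ."""
    if len(hap_i) != len(hap_j):
        raise ValueError("haplotypes must have the same length")
    return sum(a != b for a, b in zip(hap_i, hap_j))


def d_xy(group1, group2):
    """Average number of differences over all between-group haplotype pairs."""
    if not group1 or not group2:
        raise ValueError("both haplogroups must be non-empty")
    total = sum(n_differences(i, j) for i in group1 for j in group2)
    return total / (len(group1) * len(group2))


# Hypothetical haplotypes (0 = ancestral, 1 = derived at each site).
lineage1 = ["00110", "00111"]
lineage2 = ["11000", "11001"]
example_dxy = d_xy(lineage1, lineage2)
```

Only between-group pairs enter the sum, so within-group diversity does not affect the statistic.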

Table 1
Summary of Coalescent Simulation Results Conditional on Ascertainment

Human Genome Diversity Project–Centre d'Etude du Polymorphisme Humain (HGDP–CEPH) Analysis

We defined haplogroups in the HGDP–CEPH data set (Li et al. 2008) with six SNPs: Haplogroups A and D were defined based on alleles of four genotyped nsSNPs (rs3813227, rs6546838, rs2056486, and rs10193972), and Haplogroups D1 and D2 were further defined by two additional genotyped SNPs (rs2037814 and rs3820700). Recombinant haplotypes among Haplogroups A and D were excluded from the haplotype frequency map, whereas recombinants between Haplogroups D1 and D2 were included and defined by the allelic status of rs10193972. In order to avoid any single population sample falling below a sample size of 10, we combined the Bantu SE and SW individuals into one Bantu South population.

We developed a simple heuristic statistic to determine how unusual the geographic distribution of ALMS1 genetic variation is relative to the rest of the genome using all autosomal HGDP–CEPH data that had less than 10% missing data. Specifically, for the ith SNP, we define the global deviance score, GDi, as follows:

[Equation image not recovered: definition of the global deviance score GD_i.]

where FST_i12, FST_i13, and FST_i23 are the unbiased pairwise FST (Weir 1996) between East Asian and African samples, East Asian and American samples, and African and American samples, respectively, for the ith SNP, and the remaining symbol in the equation denotes the average allele frequency across samples weighted by sample size. In words, the global deviance score is large when levels of differentiation between Asian and African and between Asian and American samples are greater than the genomewide average and the level of differentiation between African and American samples is less than the genomewide average. We included the Bantu (North and South), Biaka, Mbuti, Mandenka, Yoruba, and San in the African sample; the Colombian, Karitiana, Maya, Pima, and Surui in the American sample; and the Cambodian, Dai, Daur, Han (North and South), Hezhen, Japanese, Lahu, Miaozu, Mongola, Naxi, Oroqen, She, Tu, Tujia, Xibo, Yakut, and Yizu in the East Asian sample.

We used the expression analysis tool (Thomas et al. 2006) to identify enriched PANTHER Pathways, Biological Processes, and Molecular Functions (Thomas et al. 2003) among genes in the top 0.1% of the distribution of GD scores. Pathways and terms with fewer than five genes were excluded from further analysis, and Bonferroni corrections were used to correct for multiple testing.

Estimating the Time of the Selective Sweep

We estimated the time since the selective sweep for the derived class of ALMS1 lineages by analyzing the amount of nucleotide diversity that has accumulated on the selected haplotypes, as described in Akey et al. (2004), where the time back to the selective sweep, t, can be estimated by S/(nμ), where S is the number of segregating sites, n is the number of haplotypes included, and μ is the neutral mutation rate of the locus. For ALMS1 derived haplogroups, n = 120, S = 13, and μ = 1.75 × 10^−4. Note that this calculation should be treated as a rough approximation because it assumes a starlike phylogeny, which ALMS1 violates.
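
The arithmetic behind this estimate can be checked with a few lines of Python. The values of S, n, and μ are those given above; the 25-year generation time used to convert generations to years is an assumption that is not stated in this section.

```python
def sweep_time_generations(segregating_sites, n_haplotypes, mu):
    """Time since the sweep, t = S / (n * mu), in generations."""
    return segregating_sites / (n_haplotypes * mu)


S = 13          # segregating sites on the derived haplogroups
n = 120         # number of derived haplotypes
mu = 1.75e-4    # neutral mutation rate of the locus

t_generations = sweep_time_generations(S, n, mu)

# Assumed generation time (not given in the text above).
YEARS_PER_GENERATION = 25
t_years = t_generations * YEARS_PER_GENERATION
```

This gives roughly 619 generations, or about 15.5 thousand years under the assumed generation time, in line with the sweep age quoted in the Discussion.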

Estimating the Strength of Selection

We used the following simple deterministic formula to estimate the selection coefficient, s (Gillespie 1998):

$$ \Delta p = \frac{pqs\,[ph + q(1 - h)]}{\bar{w}} $$

where $\bar{w} = 1 - 2pqhs - q^2 s$, p is the frequency of the selected allele, q is the frequency of the nonselected allele, and h is the heterozygous effect. We assumed an initial frequency of 10% (a conservatively high estimate based on current frequencies in African samples) and a final frequency of 95% (a conservatively low estimate based on current frequencies in East Asian samples) for the putatively selected allele. The range of s reported in the main text is based on varying the age of the selective event (from 500 to 1,000 generations) and heterozygous effects (h = 0, 0.5, and 1).
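
One way to turn this recursion into an estimate of s is to iterate it generation by generation and search for the value of s that carries the allele from 10% to 95% in the specified time. The Python sketch below does this by bisection. The solver, its bounds, and the function names are assumptions made for illustration and are not necessarily the procedure the authors used.

```python
def next_frequency(p, s, h):
    """One generation of the deterministic recursion p' = p + delta_p."""
    q = 1.0 - p
    w_bar = 1.0 - 2.0 * p * q * h * s - q * q * s
    return p + p * q * s * (p * h + q * (1.0 - h)) / w_bar


def frequency_after(p0, s, h, generations):
    """Allele frequency after a given number of generations of selection."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s, h)
    return p


def solve_s(p0, p_final, h, generations, lo=1e-6, hi=0.9, tol=1e-8):
    """Bisection for the s that moves the allele from p0 to p_final."""
    if frequency_after(p0, hi, h, generations) < p_final:
        raise ValueError("target frequency not reachable within search bounds")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frequency_after(p0, mid, h, generations) < p_final:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Grid from the text: 500 to 1,000 generations and h = 0, 0.5, 1,
# with initial and final frequencies of 10% and 95%.
estimates = {
    (h, g): solve_s(0.10, 0.95, h, g)
    for h in (0.0, 0.5, 1.0)
    for g in (500, 1000)
}
```

Bisection is adequate here because, for a fixed number of generations, the final frequency increases monotonically with s, so the target frequency is crossed exactly once.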

Results

ALMS1 Genetic Variation Is Highly Structured among Populations and Localizes the Signature of Selection

ALMS1 was initially identified as a potential target of positive selection based on large allele frequency differences among populations for six nonsynonymous SNPs (nsSNPs) (International HapMap Consortium 2005). In order to better understand how unusual levels of population structure are in the ALMS1 region relative to the rest of the genome and fine-scale map the signature of selection, we first performed a genomewide analysis of allele frequency differences in the HapMap data. Specifically, we calculated the average pairwise FST (Weir 1996) in nonoverlapping 100-kb windows using HapMap Phase II data (autosomal regions only) among the Yoruba (YRI) individuals from Ibadan, Nigeria (n = 60), CEPH (CEU) individuals with ancestry from northern and western Europe (n = 60), Japanese (JPT) individuals from Tokyo, Japan (n = 45), and Han Chinese (CHB) individuals from Beijing, China (n = 45). In all of the analyses, we combined the JPT and CHB individuals into a single Asian sample (ASN). Windows containing ALMS1 were in the extreme 99th percentile of the empirical distribution for both the ASN and YRI and CEU and YRI comparisons, and only 12 of the 28,652 windows were more differentiated than ALMS1 in the ASN and YRI comparisons.

We performed three additional analyses to determine how robust the signature of strong population structure at ALMS1 is to potential confounding variables. First, we repeated the genomewide analysis of FST on the HapMap data with window sizes in units of genetic distance (0.1 cM) estimated from fine-scale recombination rates. Second, we adjusted each window specific estimate of FST for a larger set of potential confounding variables (number of SNPs, recombination rate, GC%, and heterozygosity per 100-kb bin) by multiple regression. Finally, we performed a genomewide analysis of FST on Class A Perlegen SNPs as described above, which were discovered more uniformly and manifest less ascertainment bias relative to the HapMap SNPs (Hinds et al. 2005; Kelley et al. 2006). In all three cases, ALMS1 remained one of the most differentiated regions in the genome (results not shown), indicating that our results are robust to ascertainment bias, recombination rate heterogeneity, and additional confounding variables.

The distribution of pairwise FST among the ASN, CEU, and YRI samples for all SNPs across an approximately 800-kb region centered on ALMS1 is shown in figure 1. The largest values of FST across the region are coincident with the location of ALMS1 (fig. 1), and SNPs located immediately up and downstream of ALMS1 show markedly lower FST values. Extreme levels of population structure are found throughout ALMS1; specifically, 45 and 35 SNPs have FST values greater than the 99th percentile for the ASN and YRI and CEU and YRI samples, respectively. These highly differentiated SNPs include seven nsSNPs (fig. 2), six of which have been previously described (International HapMap Consortium 2005), and are located in exons 5, 8, and 10 of ALMS1. The derived alleles at each of the seven nsSNPs are found at 99%, 80%, and 8–9% frequency in the ASN, CEU, and YRI samples, respectively. Levels of differentiation between the ASN and CEU samples are not unusual compared with the genome at large (fig. 1).

FIG. 1.
Distribution of FST across the ALMS1 region. The top panel shows the ALMS1 gene structure (with exons shaded in black) and locations of previously estimated recombination hotspots (International HapMap Consortium, 2007). Pairwise FST between each HapMap ...
FIG. 2.
Patterns of LD in ALMS1 and flanking regions. The location of ALMS1 is marked by a gray rectangle and additional genes located in the region are shown as white rectangles. Previously inferred recombination hotspots are denoted by black rectangles. For ...

In summary, the patterns of FST in the HapMap and Perlegen data suggest ALMS1 was a target of recent positive selection in East Asian and European populations, and indicate several plausible sites (i.e., one or more of the seven highly differentiated nsSNPs) conferring a fitness advantage. Consistent with this interpretation, previous genomewide scans of selection based on LD have also identified ALMS1 as an outlier in the ASN and CEU samples (Wang et al. 2006; Kimura et al. 2007; Tang et al. 2007). A graphical summary of the distribution of LD across the ALMS1 region in the HapMap samples is shown in figure 2.

Unusual Patterns of ALMS1 Haplotype Variation Are Inconsistent with Simple Models of a Selective Sweep

Under simple models of genetic hitchhiking (Maynard-Smith and Haigh 1974), we would expect to find a single haplotype carrying an advantageous allele at high frequency. To test this prediction, we reconstructed haplotypes (Stephens et al. 2001; Stephens and Scheet 2005) in the HapMap samples. A visual representation of haplotypes shows a striking departure from predictions of a simple hitchhiking model, where haplotypes carrying the derived alleles at the seven highly differentiated nsSNPs exist on two distinct backgrounds (fig. 3). In addition, haplotypes carrying the ancestral allele at each of the seven highly differentiated nsSNPs exist on a background that is distinct relative to the two derived classes (fig. 3). Similar results were observed in visual representations of genotypes (results not shown), demonstrating these patterns are not an artifact of haplotype inference.

FIG. 3.
ALMS1 haplotypes are organized into three divergent haplogroup lineages. On the left, a visual representation of ALMS1 haplotypes derived from the HapMap samples is shown, where rows correspond to individual phased haplotypes and columns represent SNPs. ...

To more quantitatively assess haplotype structure at ALMS1, we constructed a Neighbor-Joining tree based on pairwise distances among haplotypes. The Neighbor-Joining tree shows three distinct haplogroups (fig. 3), formed by an initial split between haplotypes carrying the seven ancestral nsSNPs from those containing the derived nsSNPs (which we will refer to as Haplogroup A and Haplogroup D, respectively). Furthermore, there is an additional deep split among the derived haplotypes forming two distinct haplogroups (which we will refer to as Haplogroup D1 and Haplogroup D2).

The average pairwise difference (based on all 242 HapMap SNPs spanning ALMS1) between Haplogroups A and D and D1 and D2 are 104.2 and 57.6, respectively. Thus, on average, Haplogroups A and D possess alternative alleles at approximately 43% of sites and Haplogroups D1 and D2 differ at approximately 24% of sites. In contrast, the average number of pairwise differences within Haplogroups A, D, D1, and D2 are 24.0, 15.5, 7.3, and 1.5, respectively. Although marked differences exist in Haplogroup D frequency between African and non-African HapMap samples (0.99, 0.80, and 0.08 in the ASN, CEU, and YRI samples, respectively) both derived lineages are found in Africa (Haplogroups D1 and D2 exist in the YRI sample at a frequency of 6% and 2%, respectively; fig. 3). Furthermore, haplotype heterozygosities of the YRI, ASN, and CEU samples are 0.953, 0.837, and 0.700, respectively.

In order to examine a data set not limited by the ascertainment biases inherent in the HapMap data set, we sequenced approximately 6 kb of coding and noncoding ALMS1 in 91 globally dispersed individuals (see Methods). As shown in supplementary figure S1, Supplementary Material online, the Neighbor-Joining tree of ALMS1 sequence variation recapitulates the topology of the three divergent haplogroups consisting of Haplogroups A, D1, and D2, and derived haplogroups are present in low frequencies in African samples (0.13). Consistent with the patterns of divergence among haplogroups in the HapMap data, our estimate of the TMRCA of ALMS1 is 2,158 ± 848 kya (see Methods), which is among the oldest reported autosomal TMRCAs (Kreitman and Di Rienzo 2004).

Thus, both the HapMap and resequencing data demonstrate that the origins of Haplogroups D1 and D2 can be traced back to Africa, and these haplogroups have dramatically increased in frequency in Eurasian populations sometime after the dispersal of humans out of Africa. The large divergence among haplogroups and their global occurrence strongly argue for a model of selection acting upon standing variation, rather than on a newly arisen advantageous mutation.

ALMS1 DNA Sequence Variation

Summary and standard neutrality test statistics for the resequencing data are shown in supplementary table S1, Supplementary Material online. Typically, patterns of DNA sequence variation are evaluated by determining how unusual observed values are under neutral expectations. However, this canonical approach fails to properly account for the fact that ALMS1 was not chosen at random, but rather was ascertained based on its high level of population structure. Such ascertainment biases need to be taken into account when interpreting patterns of DNA sequence variation in subsequent analyses of outlier loci (Kreitman and Di Rienzo 2004; Thornton and Jensen 2007). To this end, we used a rejection sampling approach (Beaumont et al. 2002) to explicitly control for the a priori observation of strong population structure when evaluating the probability of observing additional aspects of ALMS1 genetic variation under neutrality as described in the Methods section. For simplicity, we will focus on results from the Han Chinese and CEPH, as the calibrated model of Schaffner et al. (2005) is most appropriate for these samples.

Two interesting points emerge from the simulations. First, when ascertainment is taken into account, values of haplogroup divergence, Tajima's D, and TMRCA are either marginally significant or not significant at all in both the Han Chinese and CEPH samples. At least for Tajima's D, this result is unsurprising, as theoretical analyses have shown that tests of the site frequency spectrum have low power to detect deviations from neutrality under models of selection from standing variation (Hermisson and Pennings 2005; Przeworski et al. 2005; Barrett and Schluter 2008).

Second, table 1 illustrates the contrasting patterns of sequence characteristics between the CEPH and the Han Chinese samples. Specifically, the average pairwise difference between derived lineages is marginally significant in the Han Chinese, but not in the CEPH (table 1). In contrast, Tajima's D and the TMRCA are marginally significant in the CEPH, but not the Han Chinese. This result is due to the fact that the ancestral haplogroup is absent in the Han Chinese, but its frequency is 26% in the CEPH, raising the TMRCA of the latter. Furthermore, the presence of three common and divergent haplogroups in the CEPH leads to a modestly positive Tajima's D.

Unusual Worldwide Distribution of ALMS1 Haplogroups

Recently, over 650,000 SNPs were genotyped in the HGDP–CEPH samples (Li et al. 2008), which consist of over 1,000 individuals from 52 populations (see Methods). In the HGDP–CEPH data, four of the ALMS1 nsSNPs described above (rs3813227, rs6546838, rs2056486, and rs10193972) were genotyped, and we used two additional genotyped SNPs (rs3820700 and rs1052161) to distinguish between Haplogroups D1 and D2. The worldwide distribution of ALMS1 haplogroups (fig. 4) reveals a particularly interesting pattern where Haplogroup D is nearly fixed in East Asian samples (98.9%), but is at considerably lower frequency in the American samples (43.0%). Similarly, the frequency of Haplogroup D1 in the American samples is extremely low (0.8%) compared with East Asian samples (24.6%). Conversely, Haplogroup A is common in the Americas (57.03%) but nearly absent in East Asia (0.01%). This geographic distribution is peculiar given that Asia was the likely source population of the Americas (Karafet et al. 1997; Mulligan et al. 2004; Goebel et al. 2008; Volodko et al. 2008). The simplest explanation for these data is that Haplogroups A and D were both present in Asia before the founding of the Americas, but Haplogroup D dramatically increased in frequency in East Asia sometime after the colonization of the Americas 15–20 kya (Karafet et al. 1997; Mulligan et al. 2004; Goebel et al. 2008; Volodko et al. 2008). The caveats to this interpretation are that the HGDP–CEPH samples are not ideally suited to test models for the peopling of the Americas, and that the SNPs typed in these samples are subject to ascertainment bias that is difficult to account for.

FIG. 4.
Distribution of ALMS1 haplogroups in 52 populations. Haplogroup frequencies are indicated with pie charts. Haplogroups A, D1, and D2 are shown in magenta, green, and blue, respectively.

To evaluate how unusual the worldwide distribution of ALMS1 allele frequency variation is relative to the rest of the genome, which would provide insight into whether purely neutral processes such as genetic drift and serial founder effects (Edmonds et al. 2004; Klopfstein et al. 2006; Hallatschek and Nelson 2008) can account for patterns of ALMS1 variation, we analyzed 643,884 SNPs (see Methods) genotyped in the HGDP–CEPH panel (Li et al. 2008). Specifically, we defined a simple heuristic statistic, which we refer to as the global deviance score (see Methods), to capture the worldwide frequency distribution of ALMS1. Seven ALMS1 SNPs rank in the top 50 SNPs (99.99th percentile). Interestingly, 27 of the top 50 SNPs are located in regions of the genome that have previously been implicated as targets of adaptive evolution (supplementary table S2, Supplementary Material online; see also Wang et al. 2006; Frazer et al. 2007; Kimura et al. 2007; Tang et al. 2007). In addition, genes in the 99.9th percentile of the empirical distribution of global deviance scores are significantly enriched (Bonferroni corrected P < 0.05) for particular PANTHER Pathways, Biological Processes, and Molecular Functions (supplementary table S3, Supplementary Material online). Of particular interest is the observation that genes involved in carbohydrate metabolism (including ALMS1) are significantly enriched among the top 0.1% of loci (supplementary table S3, Supplementary Material online), consistent with previous genomewide scans for selection (Kelley and Swanson 2008), indicating this class of genes has been particularly important in the recent evolutionary history of East Asian populations.

In short, the geographic distribution of ALMS1 haplogroup frequencies in the HGDP–CEPH samples further supports a model of selection from standing variation. In particular, the presence of Haplogroup A at high frequency in the American samples, combined with its extremely low frequency in East Asia, suggests that the ancestral haplogroup was present at an appreciable frequency in Asia prior to the colonization of the Americas and was subsequently driven to near extinction as selection promoted the rapid increase in Haplogroup D frequency.

Discussion

ALMS1 possesses many anomalous patterns of genetic variation such as extensive population structure, including a cadre of seven nsSNPs, three divergent haplogroup lineages, and a peculiar spatial distribution in geographically diverse populations. We have shown that these characteristics are inconsistent with purely neutral explanations. However, our data are equally inconsistent with simple models of positive selection acting on a newly arisen advantageous mutation (Maynard-Smith and Haigh 1974). Rather, our results support a model of positive selection acting on standing variation in Eurasian populations. In this model, one or more polymorphisms on Haplogroups D1 and D2, which are found at low frequency in the African samples we analyzed, became adaptive following their dispersal out of Africa and rapidly increased in frequency in Eurasians. Furthermore, by considering the geographic distribution of haplogroup frequencies, we are able to narrow down the likely time frame of selection to be either concurrent with or subsequent to the colonization of the Americas 15–20 kya. This interpretation is consistent with our estimate of the time since the selective sweep on the derived lineages of 15.5 kya (see Methods).

A particularly interesting feature of ALMS1 is the old and divergent haplogroup lineages. After taking into account levels of population structure, the estimated TMRCA and average pairwise divergence among haplogroups are not unusual (table 1), suggesting these characteristics occur with appreciable frequency in highly structured regions of the genome (see also Cornejo and Escalante 2006; Garrigan and Hammer 2006). We note, however, that our analyses have primarily focused on elucidating the recent evolutionary history of ALMS1, which shows compelling evidence for recent directional selection acting on preexisting variation in Eurasian populations; additional studies will be necessary to better delimit the contribution of additional models, such as balancing selection, to the long-term evolutionary history of ALMS1. Indeed, ALMS1 possesses higher levels of LD in the YRI relative to the genome-at-large (data not shown), a finding that is surprising given its ancient TMRCA, suggesting some form of nonneutral evolution, such as balancing or frequency-dependent selection, in Africa.

We estimate the selection coefficient, s, for ALMS1 to be approximately 0.01–0.05, which is commensurate with magnitudes of selection observed for genes underlying lactase persistence (LCT, s = 0.01–0.05; Bersaglieri et al. 2004; Enattah et al. 2007; Tishkoff et al. 2007) and resistance to malaria (G6PD, s = 0.02–0.05; Tishkoff et al. 2001). Thus, the estimated strength of selection for ALMS1 is among the strongest identified in humans, which raises the question of what historical selective pressure acted on ALMS1 genetic variation. Although it is clear from Alström Syndrome patients that ALMS1 mutations can influence a spectrum of phenotypes, including obesity, type 2 diabetes, and metabolic disorders, the phenotypic consequences of nonsyndromic variation are unknown.
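To give a feel for what selection coefficients in this range imply for the speed of a frequency increase, the following sketch uses a textbook deterministic model of genic selection in which the odds p/(1 − p) of the favored haplogroup are multiplied by (1 + s) each generation. This is not the estimation procedure used in this study (see Methods); the starting frequency and the generation time are hypothetical values chosen purely for illustration, and drift, migration, and population structure are ignored.

```python
import math


def generations_to_reach(p0: float, p1: float, s: float) -> float:
    """Generations for a variant to rise from frequency p0 to p1.

    Assumes deterministic genic selection in which the odds p / (1 - p)
    are multiplied by (1 + s) every generation (no drift, no migration).
    """
    if not (0.0 < p0 < p1 < 1.0) or s <= 0.0:
        raise ValueError("need 0 < p0 < p1 < 1 and s > 0")
    odds_ratio = (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))
    return math.log(odds_ratio) / math.log1p(s)


YEARS_PER_GENERATION = 25.0  # assumption, not taken from the text
P_END = 0.989                # Haplogroup D frequency in East Asia (from the text)
P_START = 0.10               # hypothetical starting frequency, illustration only

for s in (0.01, 0.05):       # bounds of the range quoted in the text
    t = generations_to_reach(P_START, P_END, s)
    ky = t * YEARS_PER_GENERATION / 1000.0
    print(f"s = {s:.2f}: {t:.0f} generations, about {ky:.1f} ky")
```

With these assumed inputs, s = 0.01 corresponds to roughly 670 generations, whereas s = 0.05 requires far fewer. The result is sensitive to the assumed starting frequency, which is exactly the quantity that distinguishes selection on standing variation from selection on a new mutation.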

To explore the role of ALMS1 in metabolic phenotypes further, we reanalyzed the results from a number of genomewide association studies for type 2 diabetes ('t Hart et al. 2003; Patel et al. 2006; Wellcome Trust Case Control Consortium 2007; Saxena et al. 2007), none of which implicate the ALMS1 region. However, 18 metabolic traits were measured in Saxena et al. (2007) that were not extensively discussed in the original publication. We obtained association data from this study to test the hypothesis that ALMS1 genetic variation is associated with insulin or glucose-related phenotypes, given the observed clinical manifestations of individuals with Alström Syndrome. Interestingly, in nondiabetic controls ALMS1 SNPs show nominal levels of association to five insulin and glucose related phenotypes (supplementary table S3, Supplementary Material online). The strongest association was observed between rs7598660 and 2-h insulin levels (P = 1.38 × 10−4; supplementary fig. S3, Supplementary Material online), which ranked as the 43rd most significant association among the approximately 380,000 genotyped SNPs. Although these results should be interpreted with caution because the statistical evidence supporting them is modest and does not attain genomewide significance, they suggest that nonsyndromic variation in ALMS1 may contribute to interindividual variation in the same metabolic phenotypes that are perturbed in Alström Syndrome patients. Additional studies will ultimately be necessary to more clearly define the functional and phenotypic consequences of ALMS1 genetic variation, which in turn will inform inferences about the historical selective pressures acting on this genomic region.
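The caution urged above can be made concrete with back-of-the-envelope multiple-testing arithmetic. The sketch below is only an illustration: it assumes a conventional family-wise error rate of 0.05, treats the approximately 380,000 tests as independent with uniformly distributed null P values, and considers a single phenotype, none of which is stated in the text.

```python
N_SNPS = 380_000      # approximate number of genotyped SNPs (from the text)
P_OBSERVED = 1.38e-4  # rs7598660 versus 2-h insulin levels (from the text)
OBSERVED_RANK = 43    # rank of that association (from the text)
ALPHA = 0.05          # conventional family-wise error rate (assumption)

# Bonferroni-style genomewide significance threshold for one phenotype.
bonferroni_threshold = ALPHA / N_SNPS

# Under the null with independent, uniform P values, the expected number
# of SNPs with P <= P_OBSERVED is simply N_SNPS * P_OBSERVED.
expected_null_hits = N_SNPS * P_OBSERVED

print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
print(f"Observed P exceeds threshold: {P_OBSERVED > bonferroni_threshold}")
print(f"Expected null hits at P <= {P_OBSERVED:.2e}: {expected_null_hits:.1f}")
print(f"Observed rank: {OBSERVED_RANK}")
```

Under these assumptions the observed P value is about three orders of magnitude above the Bonferroni threshold, and the number of SNPs expected to reach it by chance is of the same order as the observed rank, which is why the association is best treated as hypothesis generating.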

A closer inspection of the genomewide association results for ALMS1 also provides insight into how past adaptive evolution may influence the present-day distribution of, and susceptibility to, disease. Specifically, the strongest association between ALMS1 genetic variation and metabolic phenotypes was observed between rs7598660 and 2-h insulin levels (supplementary fig. S3, Supplementary Material online). The ancestral allele of rs7598660 is associated with higher 2-h insulin levels (i.e., greater insulin resistance), whereas the derived allele is associated with lower 2-h insulin levels (i.e., less insulin resistance; supplementary fig. S3, Supplementary Material online). As the derived allele is only present on a subset of Haplogroup D2 chromosomes (supplementary table S5, Supplementary Material online), it is unlikely that the rs7598660 polymorphism (or linked variation) was the direct target of selection; rather, it likely increased in frequency in non-African populations by hitchhiking. Therefore, geographically varying selective pressures on ALMS1 resulted in large allele frequency differences of a putative polymorphism (rs7598660 or linked variant) that influences insulin resistance, which is tangential to the primary selective force. Interestingly, the frequency of the rs7598660 derived allele among the HapMap YRI, CEU, and ASN samples is 0.008, 0.669, and 0.367, respectively, which is consistent with a higher prevalence of insulin resistance in African-Americans relative to European-Americans (Haffner et al. 1996; Reiner et al. 2007). Thus, models that attempt to place human disease into an evolutionary context (Di Rienzo and Hudson 2005; Biswas and Akey 2006) may also need to account for indirect selective effects, where susceptibility alleles are not causally related to historical selective pressures but merely go along for the ride on a selected haplotype.
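The magnitude of the allele frequency differences quoted above can be summarized with a simple fixation index. The sketch below uses the basic Wright-style definition, the variance of allele frequencies among populations divided by p̄(1 − p̄), with equal population weights and no sample-size correction. It is an illustrative calculation only and is not the estimator used in this study's analyses of population structure.

```python
def wright_fst(freqs: list[float]) -> float:
    """Unweighted F_ST for one biallelic SNP from population allele frequencies.

    F_ST = Var(p) / (p_bar * (1 - p_bar)), using the population variance
    across equally weighted populations (no sample-size correction).
    """
    n = len(freqs)
    if n < 2:
        raise ValueError("need at least two populations")
    p_bar = sum(freqs) / n
    if not 0.0 < p_bar < 1.0:
        raise ValueError("SNP must be polymorphic in the pooled sample")
    var_p = sum((p - p_bar) ** 2 for p in freqs) / n
    return var_p / (p_bar * (1.0 - p_bar))


# rs7598660 derived allele frequencies in HapMap YRI, CEU, and ASN (from the text)
rs7598660_freqs = [0.008, 0.669, 0.367]

fst = wright_fst(rs7598660_freqs)
print(f"Unweighted F_ST for rs7598660: {fst:.3f}")
```

Under these simplifying assumptions the three frequencies give a value of roughly 0.32, illustrating how strongly differentiated this SNP is among the three HapMap samples.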

In summary, we have shown that the evolutionary history of ALMS1 is considerably more complex than might have been expected based on its identification as an outlier locus in genomewide scans for selection, involving the interaction of demographic history, geographically restricted selection, and selection from standing variation. An emerging question in the evolution of natural populations is to what extent selection acts on new or preexisting mutations (Orr and Betancourt 2001; Hermisson and Pennings 2005; Przeworski et al. 2005; Barrett and Schluter 2008). This issue has important implications for the evolutionary trajectory of populations (Hermisson and Pennings 2005) and more practically on the types of signatures to pursue in the search for selected loci (Przeworski et al. 2005). A number of examples in humans have been described that are consistent with selection from standing variation such as FY (Hamblin et al. 2002), LCT (Tishkoff et al. 2007; Enattah et al. 2008), and NAT2 (Magalon et al. 2008). Thus, we suspect that when additional candidate selection genes are examined with more scrutiny, selection from standing variation will be found to be a common mechanism of adaptation, driven by the rapid dispersal of humans into new environments during the last 60 ky.

Supplementary Materials

Supplementary tables S1–S5 and supplementary figures S1 and S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

We thank members of the Akey Lab and Willie Swanson for helpful discussions and comments on the manuscript. This work was supported by a research grant (1R01GM076036-01A1) from the NIH and a Sloan Fellowship in Computational Biology to J.M.A. and by an NHGRI Interdisciplinary Training in Genomic Sciences grant (HG00035) to L.B.S.

References

  • Akey JM, Eberle MA, Rieder MJ, Carlson CS, Shriver MD, Nickerson DA, Kruglyak L. Population history and natural selection shape patterns of genetic variation in 132 genes. PLoS Biol. 2004;2:e286.
  • Arsov T, Silva DG, O'Bryan MK, et al. (13 co-authors) Fat aussie—a new Alstrom syndrome mouse showing a critical role for ALMS1 in obesity, diabetes, and spermatogenesis. Mol Endocrinol. 2006;20:1610–1622.
  • Badano JL, Mitsuma N, Beales PL, Katsanis N. The ciliopathies: an emerging class of human genetic disorders. Annu Rev Genom Hum Genet. 2006;7:125–148.
  • Bahlo M, Griffiths RC. Inference from gene trees in a subdivided population. Theor Popul Biol. 2000;57:79–95.
  • Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–265.
  • Barrett RD, Schluter D. Adaptation from standing genetic variation. Trends Ecol Evol. 2008;23:38–44.
  • Beaumont MA, Zhang W, Balding DJ. Approximate Bayesian computation in population genetics. Genetics. 2002;162:2025–2035.
  • Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA, Rhodes M, Reich DE, Hirschhorn JN. Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet. 2004;74:1111–1120.
  • Bhangale TR, Stephens M, Nickerson DA. Automating resequencing-based detection of insertion–deletion polymorphisms. Nat Genet. 2006;38:1457–1462.
  • Biswas S, Akey JM. Genomic insights into positive selection. Trends Genet. 2006;22:437–446.
  • Collin GB, Cyr E, Bronson R, et al. (11 co-authors) Alms1-disrupted mice recapitulate human Alstrom syndrome. Hum Mol Genet. 2005;14:2323–2333.
  • Cornejo OE, Escalante AA. The origin and age of Plasmodium vivax. Trends Parasitol. 2006;22:558–563.
  • Di Rienzo A, Hudson RR. An evolutionary framework for common diseases: the ancestral-susceptibility model. Trends Genet. 2005;21:596–601.
  • Edmonds CA, Lillie AS, Cavalli-Sforza LL. Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA. 2004;101:975–979.
  • Enattah NS, Jensen TG, Nielsen M, et al. (22 co-authors) Independent introduction of two lactase-persistence alleles into human populations reflects different history of adaptation to milk culture. Am J Hum Genet. 2008;82:57–72.
  • Enattah NS, Trudeau A, Pimenoff V, et al. (27 co-authors) Evidence of still-ongoing convergence evolution of the lactase persistence T-13910 alleles in humans. Am J Hum Genet. 2007;81:615–625.
  • Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998;8:186–194.
  • Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998;8:175–185.
  • Fay JC, Wu CI. Hitchhiking under positive Darwinian selection. Genetics. 2000;155:1405–1413.
  • Felsenstein J. PHYLIP-Phylogeny Inference Package. Cladistics. 1989;5:164–166.
  • Felsenstein J. 2005 PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author.
  • Frazer KA, Ballinger DG, Cox DR, et al. (233 co-authors) A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861.
  • Fu YX, Li WH. Statistical tests of neutrality of mutations. Genetics. 1993;133:693–709.
  • Garrigan D, Hammer MF. Reconstructing human origins in the genomic era. Nat Rev Genet. 2006;7:669–680.
  • Gillespie JH. Population genetics: a concise guide. Baltimore (MD): The Johns Hopkins University Press; 1998.
  • Goebel T, Waters MR, O'Rourke DH. The late Pleistocene dispersal of modern humans in the Americas. Science. 2008;319:1497–1502.
  • Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence finishing. Genome Res. 1998;8:195–202.
  • Gordon D, Desmarais C, Green P. Automated finishing with autofinish. Genome Res. 2001;11:614–625.
  • Haffner SM, D'Agostino R, Saad MF, et al. (11 co-authors) Increased insulin resistance and insulin secretion in nondiabetic African-Americans and Hispanics compared with non-Hispanic whites. The Insulin Resistance Atherosclerosis Study. Diabetes. 1996;45:742–748.
  • Hallatschek O, Nelson DR. Gene surfing in expanding populations. Theor Popul Biol. 2008;73:158–170.
  • Hamblin MT, Thompson EE, Di Rienzo A. Complex signatures of natural selection at the Duffy blood group locus. Am J Hum Genet. 2002;70:369–383.
  • Hearn T, Spalluto C, Phillips VJ, Renforth GL, Copin N, Hanley NA, Wilson DI. Subcellular localization of ALMS1 supports involvement of centrosome and basal body dysfunction in the pathogenesis of obesity, insulin resistance, and type 2 diabetes. Diabetes. 2005;54:1581–1587.
  • Hermisson J, Pennings PS. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics. 2005;169:2335–2352.
  • Hildebrandt F, Otto E. Cilia and centrosomes: a unifying pathogenic concept for cystic kidney disease? Nat Rev Genet. 2005;6:928–940.
  • Hill WG. Linkage disequilibrium in finite populations. Theor Appl Genet. 1968;38:226–231.
  • Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR. Whole-genome patterns of common DNA variation in three human populations. Science. 2005;307:1072–1079.
  • Hudson RR. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics. 2002;18:337–338.
  • International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–1320.
  • Karafet T, Zegura SL, Vuturo-Brady J, et al. (14 co-authors) Y chromosome markers and Trans-Bering Strait dispersals. Am J Phys Anthropol. 1997;102:301–314.
  • Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res. 2006;16:980–989.
  • Kelley JL, Swanson WJ. Positive selection in the human genome: from genome scans to biological significance. Annu Rev Genom Hum Genet. 2008;9:143–160.
  • Kelly JK. A test of neutrality based on interlocus associations. Genetics. 1997;146:1197–1206.
  • Kimura R, Fujimoto A, Tokunaga K, Ohashi J. A practical genome scan for population-specific strong selective sweeps that have reached fixation. PLoS ONE. 2007;2:e286.
  • Klopfstein S, Currat M, Excoffier L. The fate of mutations surfing on the wave of a range expansion. Mol Biol Evol. 2006;23:482–490.
  • Kreitman M, Di Rienzo A. Balancing claims for balancing selection. Trends Genet. 2004;20:300–304.
  • Leitch CC, Zaghloul NA, Davis EE, et al. (14 co-authors) Hypomorphic mutations in syndromic encephalocele genes are associated with Bardet–Biedl syndrome. Nat Genet. 2008;40:443–448.
  • Li G, Vega R, Nelms K, Gekakis N, Goodnow C, McNamara P, Wu H, Hong NA, Glynne R. A role for Alstrom syndrome protein, alms1, in kidney ciliogenesis and cellular quiescence. PLoS Genet. 2007;3:e8.
  • Li JZ, Absher DM, Tang H, et al. (11 co-authors) Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008;319:1100–1104.
  • Magalon H, Patin E, Austerlitz F, Hegay T, Aldashev A, Quintana-Murci L, Heyer E. Population genetic diversity of the NAT2 gene supports a role of acetylation in human adaptation to farming in Central Asia. Eur J Hum Genet. 2008;16:243–251.
  • Maynard-Smith JM, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res. 1974;23:23–35.
  • McPeek MS, Strahs A. Assessment of linkage disequilibrium by the decay of haplotype sharing, with application to fine-scale genetic mapping. Am J Hum Genet. 1999;65:858–875.
  • Mekel-Bobrov N, Gilbert SL, Evans PD, Vallender EJ, Anderson JR, Hudson RR, Tishkoff SA, Lahn BT. Ongoing adaptive evolution of ASPM, a brain size determinant in Homo sapiens. Science. 2005;309:1720–1722.
  • Mulligan CJ, Hunley K, Cole S, Long JC. Population genetics, history, and health patterns in native americans. Annu Rev Genom Hum Genet. 2004;5:295–315.
  • Orr HA, Betancourt AJ. Haldane's sieve and adaptation from the standing genetic variation. Genetics. 2001;157:875–884.
  • Patel S, Minton JA, Weedon MN, Frayling TM, Ricketts C, Hitman GA, McCarthy MI, Hattersley AT, Walker M, Barrett TG. Common variations in the ALMS1 gene do not contribute to susceptibility to type 2 diabetes in a large white UK population. Diabetologia. 2006;49:1209–1213.
  • Przeworski M, Coop G, Wall JD. The signature of positive selection on standing genetic variation. Evolution. 2005;59:2312–2323.
  • Reiner AP, Carlson CS, Ziv E, Iribarren C, Jaquish CE, Nickerson DA. Genetic ancestry, population sub-structure, and cardiovascular disease-related traits among African-American participants in the CARDIA Study. Hum Genet. 2007;121:565–575.
  • Sabeti PC, Reich DE, Higgins JM, et al. (17 co-authors) Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002;419:832–837.
  • Saxena R, Voight BF, Lyssenko V, et al. (66 co-authors) Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 2007;316:1331–1336.
  • Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D. Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 2005;15:1576–1583.
  • Stephens M, Scheet P. Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation. Am J Hum Genet. 2005;76:449–462.
  • Stephens M, Smith NJ, Donnelly P. A new statistical method for haplotype reconstruction from population data. Am J Hum Genet. 2001;68:978–989.
  • 't Hart LM, Maassen JA, Dekker JM, Heine RJ. Lack of association between gene variants in the ALMS1 gene and Type 2 diabetes mellitus. Diabetologia. 2003;46:1023–1024.
  • Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–595.
  • Tang K, Thornton KR, Stoneking M. A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol. 2007;5:e171.
  • Templeton A. Out of Africa again and again. Nature. 2002;416:45–51.
  • Thomas PD, Campbell MJ, Kejariwal A, Mi H, Karlak B, Daverman R, Diemer K, Muruganujan A, Narechania A. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 2003;13:2129–2141.
  • Thomas PD, Kejariwal A, Guo N, Mi H, Campbell MJ, Muruganujan A, Lazareva-Ulitsky B. Applications for protein sequence-function evolution data: mRNA/protein expression analysis and coding SNP scoring tools. Nucleic Acids Res. 2006;34:W645–W650.
  • Thomson R, Pritchard JK, Shen P, Oefner PJ, Feldman MW. Recent common ancestry of human Y chromosomes: evidence from DNA sequence data. Proc Natl Acad Sci USA. 2000;97:7360–7365.
  • Thornton KR, Jensen JD. Controlling the false-positive rate in multilocus genome scans for selection. Genetics. 2007;175:737–750.
  • Tishkoff SA, Reed FA, Ranciaro A, et al. (19 co-authors) Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 2007;39:31–40.
  • Tishkoff SA, Varkonyi R, Cahinhinan N, et al. (17 co-authors) Haplotype diversity and linkage disequilibrium at human G6PD: recent origin of alleles that confer malarial resistance. Science. 2001;293:455–462.
  • Voight BF, Kudaravalli S, Wen X, Pritchard JK. A map of recent positive selection in the human genome. PLoS Biol. 2006;4:e72.
  • Volodko NV, Starikovskaya EB, Mazunin IO, Eltsov NP, Naidenko PV, Wallace DG, Sukernik RI. Mitochondrial genome diversity in arctic Siberians, with particular reference to the evolutionary history of Beringia and Pleistocenic peopling of the Americas. Am J Hum Genet. 2008;82:1084–1100.
  • Wang ET, Kodama G, Baldi P, Moyzis RK. Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci USA. 2006;103:135–140.
  • Weir BS. Genetic data analysis II. Sunderland: Sinauer Associates, Inc Publishers; 1996.
  • Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678.
  • Zhang C, Bailey DK, Awad T, et al. (12 co-authors) A whole genome long-range haplotype (WGLRH) test for detecting imprints of positive selection in human populations. Bioinformatics. 2006;22:2122–2128.

Articles from Molecular Biology and Evolution are provided here courtesy of Oxford University Press