We sequenced the genomes of a ~7,000-year-old farmer from Germany and eight
~8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analyzed these and other
ancient genomes1–4 with 2,345 contemporary humans to show that most
present-day Europeans derive from at least three highly differentiated populations: West
European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near
Easterners; Ancient North Eurasians (ANE) related to Upper Paleolithic Siberians3, who contributed to both Europeans and Near
Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but
also harbored WHG-related ancestry. We model these populations’ deep relationships
and show that EEF had ~44% ancestry from a “Basal Eurasian”
population that split prior to the diversification of other non-African lineages.
Using DNA extracted from a finger bone found in Denisova Cave in southern Siberia, we have sequenced the genome of an archaic hominin to about 1.9-fold coverage. This individual is from a group that shares a common origin with Neanderthals. This population was not involved in the putative gene flow from Neanderthals into Eurasians; however, the data suggest that it contributed 4–6% of its genetic material to the genomes of present-day Melanesians. We designate this hominin population ‘Denisovans’ and suggest that it may have been widespread in Asia during the Late Pleistocene epoch. A tooth found in Denisova Cave carries a mitochondrial genome highly similar to that of the finger bone. This tooth shares no derived morphological features with Neanderthals or modern humans, further indicating that Denisovans have an evolutionary history distinct from Neanderthals and modern humans.
We propose a method that uses genetic data to test for the occurrence of a recent range expansion and to infer the location of the origin of the expansion. We introduce a statistic ψ (the directionality index) that detects asymmetries in the two-dimensional allele frequency spectrum of pairs of populations. These asymmetries are caused by the series of founder events that happen during an expansion, and they arise because low-frequency alleles tend to be lost during founder events, thus creating clines in the frequencies of surviving low-frequency alleles. Using simulations, we show that ψ is more powerful for detecting range expansions than both FST and clines in heterozygosity. We also show how we can adapt our approach to more complicated scenarios, such as expansions with multiple origins or barriers to migration, and we illustrate the utility of ψ by applying it to a data set from modern humans.
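The kind of asymmetry that ψ is built on can be illustrated with a small sketch. The function below is a simplified stand-in, not necessarily the exact definition of ψ used in the paper: it assumes derived-allele counts per site for two populations and averages the frequency difference over alleles that are present in both samples and not fixed in both. The function name, argument names, and toy data are hypothetical.

```python
from typing import Sequence


def directionality_sketch(
    counts_a: Sequence[int],
    counts_b: Sequence[int],
    n_a: int,
    n_b: int,
) -> float:
    """Mean derived-allele-frequency difference (A minus B) over shared sites.

    counts_a[i] and counts_b[i] are derived-allele counts at site i in samples
    of n_a and n_b chromosomes. This is an illustrative asymmetry index, not
    necessarily the exact definition of psi used in the paper.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both populations need one count per site")
    diffs = []
    for c_a, c_b in zip(counts_a, counts_b):
        if c_a == 0 or c_b == 0:
            continue  # allele absent from one sample: not shared
        if c_a == n_a and c_b == n_b:
            continue  # fixed in both samples: carries no asymmetry signal
        diffs.append(c_a / n_a - c_b / n_b)
    if not diffs:
        raise ValueError("no shared segregating sites")
    return sum(diffs) / len(diffs)


# Hypothetical toy data: four sites, two chromosomes sampled per population.
psi_ab = directionality_sketch([2, 1, 2, 0], [1, 1, 1, 2], n_a=2, n_b=2)
psi_ba = directionality_sketch([1, 1, 1, 2], [2, 1, 2, 0], n_a=2, n_b=2)
```

Under this sketch a positive value means that shared alleles sit at higher frequency in population A, which would be consistent with A having passed through more founder events than B. Swapping the two populations flips the sign.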
A likelihood method is introduced that jointly estimates the number of loci and the additive effect of alleles that account for the genetic variance of a normally distributed quantitative character in a randomly mating population. The method assumes that measurements of the character are available from one or both parents and an arbitrary number of full siblings. The method uses the fact, first recognized by Karl Pearson in 1904, that the variance of a character among offspring depends on both the parental phenotypes and on the number of loci. Simulations show that the method performs well provided that data from a sufficient number of families (on the order of thousands) are available. This method assumes that loci are in Hardy-Weinberg and linkage equilibrium but does not assume anything about linkage relationships. It performs equally well if all loci are on the same non-recombining chromosome provided they are in linkage equilibrium. The method can be adapted to take account of loci already identified as being associated with the character of interest. In that case, the method estimates the number of loci not already known to affect the character. Applied to measurements of crown-rump length in 281 family trios in a captive colony of African green monkeys (Chlorocebus aethiops sabaeus), the method estimates the number of loci to be 112 and the additive effect to be 0.26 cm. A parametric bootstrap analysis shows that a rough confidence interval has a lower bound of 14 loci.
Wright-Castle method; quantitative genetics
We introduce a new method to detect ancient selective sweeps centered on a candidate site. We explored different patterns produced by sweeps around a fixed beneficial mutation, and found that a particularly informative statistic measures the consistency between majority haplotypes near the mutation and genotypic data from a closely related population. We incorporated this statistic into an approximate Bayesian computation (ABC) method that tests for sweeps at a candidate site. We applied this method to simulated data and show that it has some power to detect sweeps that occurred more than 10,000 generations in the past. We also applied it to 1000 Genomes and Complete Genomics data combined with high-coverage Denisovan and Neanderthal genomes to test for sweeps in modern humans since the separation from the Neanderthal–Denisovan ancestor. We tested sites at which humans are fixed for the derived (i.e., nonchimpanzee) allele, whereas the Neanderthal and Denisovan genomes are homozygous for the ancestral allele. We observe only weak differences in statistics indicative of selection between functional categories. When we compare patterns of scaled diversity or use our ABC approach, we fail to find a significant difference in signals of classic selective sweeps between regions surrounding nonsynonymous and synonymous changes, but we detect a slight enrichment for reduced scaled diversity around splice site changes. We also present a list of candidate sites that show high probability of having undergone a classic sweep in the modern human lineage since the split from Neanderthals and Denisovans.
selective sweeps; modern humans; Neanderthal; Denisova; approximate Bayesian computation
We present a high-quality genome sequence of a Neandertal woman from Siberia. We show that her parents were related at the level of half siblings and that mating among close relatives was common among her recent ancestors. We also sequenced the genome of a Neandertal from the Caucasus to low coverage. An analysis of the relationships and population history of available archaic genomes and 25 present-day human genomes shows that several gene flow events occurred among Neandertals, Denisovans and early modern humans, possibly including gene flow into Denisovans from an unknown archaic group. Thus, interbreeding, albeit of low magnitude, occurred among many hominin groups in the Late Pleistocene. In addition, the high-quality Neandertal genome allows us to establish a definitive list of substitutions that became fixed in modern humans after their separation from the ancestors of Neandertals and Denisovans.
Estimating dispersal distances from population genetic data provides an important alternative to logistically taxing methods for directly observing dispersal. While methods for estimating dispersal rates between a modest number of discrete demes are well developed, methods of inference applicable to “isolation-by-distance” models are much less established. Here we present a method for estimating ρσ², the product of population density (ρ) and the variance of the dispersal displacement distribution (σ²). The method is based on the assumption that low-frequency alleles are identical by descent. Hence, the extent of geographic clustering of such alleles, relative to their frequency in the population, provides information about ρσ². We show that a novel likelihood-based method can infer this composite parameter with a modest bias in a lattice model of isolation-by-distance. For calculating the likelihood, we use an importance sampling approach to average over the unobserved intra-allelic genealogies, where the intra-allelic genealogies are modeled as a pure birth process. The approach also leads to a likelihood ratio test of isotropy of dispersal, i.e. whether dispersal distances on two axes are different. We test the performance of our methods using simulations of new mutations in a lattice model and illustrate their use with a data set from Arabidopsis thaliana.
Balancing selection has maintained human leukocyte antigen (HLA) allele diversity, but it is unclear whether this selection is symmetric (all heterozygotes are comparable and all homozygotes are comparable in terms of fitness) or asymmetric (distinct heterozygote genotypes display greater fitness than others). We tested the hypothesis that HLA is under asymmetric balancing selection in populations by estimating allelic branch lengths from genetic sequence data encoding peptide-binding domains. Significant deviations indicated changes in the ratio of terminal to internal branch lengths. Such deviations could arise even if no individual alleles present a strikingly altered branch length (e.g. if there is an overall distortion, with all or many terminal branches being longer than expected). DQ and DP loci were also analyzed as haplotypes. Using allele frequencies for 419 distinct populations in 10 geographical regions, we examined population differentiation in alleles within and between regions, and the relationship between allelic branch length and frequency. The strongest evidence for asymmetrical balancing selection was observed for HLA-DRB1, HLA-B and HLA-DPA1, with significant deviation (P ≤ 1.1 × 10⁻⁴) in about half of the populations. There were significant results at all loci except HLA-DQB1/DQA1. We observed moderate genetic variation within and between geographic regions, similar to the rest of the genome. Branch length was not correlated with allele frequency. In conclusion, sequence data suggest that balancing selection in HLA is asymmetric (some heterozygotes enjoy greater fitness than others). Because HLA polymorphism is crucial for pathogen resistance, this may manifest as a frequency-dependent selection with fluctuation in the fitness of specific heterozygotes over time.
Neanderthals have been shown to share more genetic variants with present-day non-Africans than Africans. Recent admixture between Neanderthals and modern humans outside of Africa was proposed as the most parsimonious explanation for this observation. However, the hypothesis of ancient population structure within Africa could not be ruled out as an alternative explanation. We use simulations to test whether the site frequency spectrum, conditioned on a derived Neanderthal and an ancestral Yoruba (African) nucleotide (the doubly conditioned site frequency spectrum [dcfs]), can distinguish between models that assume recent admixture or ancient population structure. We compare the simulations to the dcfs calculated from data taken from populations of European, Chinese, and Japanese descent in the Complete Genomics Diversity Panel. Simulations under a variety of plausible demographic parameters were used to examine the shape of the dcfs for both models. The observed shape of the dcfs cannot be explained by any set of parameter values used in the simulations of the ancient structure model. The dcfs simulations for the recent admixture model provide a good fit to the observed dcfs for non-Africans, thereby supporting the hypothesis that recent admixture with Neanderthals accounts for the greater similarity of Neanderthals to non-Africans than Africans.
admixture; ancient structure; Neanderthal; frequency spectrum
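The conditioning that defines the dcfs can be made concrete with a minimal sketch. It assumes each site is summarized as a tuple of the Neanderthal state, the Yoruba state, and the derived-allele count in a non-African sample of n chromosomes, with states coded "D" (derived) and "A" (ancestral). The coding, the function name, and the toy data are assumptions for illustration, not the paper's pipeline.

```python
from typing import Iterable, List, Tuple

# (Neanderthal state, Yoruba state, derived count in the non-African sample)
Site = Tuple[str, str, int]


def doubly_conditioned_sfs(sites: Iterable[Site], n: int) -> List[float]:
    """Proportion of retained sites at each derived-allele count 0..n.

    A site is retained only if the Neanderthal carries the derived state ("D")
    and the Yoruba chromosome carries the ancestral state ("A").
    """
    spectrum = [0] * (n + 1)
    for neanderthal, yoruba, derived_count in sites:
        if neanderthal != "D" or yoruba != "A":
            continue  # fails the double conditioning
        if not 0 <= derived_count <= n:
            raise ValueError("derived count outside 0..n")
        spectrum[derived_count] += 1
    total = sum(spectrum)
    if total == 0:
        raise ValueError("no sites satisfy the conditioning")
    return [count / total for count in spectrum]


# Hypothetical toy data: five sites, four sampled chromosomes.
toy_sites = [("D", "A", 1), ("D", "A", 1), ("D", "A", 3), ("A", "A", 2), ("D", "D", 1)]
toy_dcfs = doubly_conditioned_sfs(toy_sites, n=4)
```

In this toy input three of the five sites pass the conditioning, so the spectrum is built from those three only. Comparing the shape of such a spectrum between simulated and observed data is the step the abstract describes.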
Elucidating the process of speciation requires an in-depth understanding of the evolutionary history of the species in question. Studies that rely upon a limited number of genetic loci do not always reveal actual evolutionary history, and often confuse inferences related to phylogeny and speciation. Whole-genome data, however, can overcome this issue by providing a nearly unbiased window into the patterns and processes of speciation. In order to reveal the complexity of the speciation process, we sequenced and analyzed the genomes of 10 wild pigs, representing morphologically or geographically well-defined species and subspecies of the genus Sus from insular and mainland Southeast Asia, and one African common warthog.
Our data highlight the importance of past cyclical climatic fluctuations in facilitating the dispersal and isolation of populations, thus leading to the diversification of suids in one of the most species-rich regions of the world. Moreover, admixture analyses revealed extensive, intra- and inter-specific gene-flow that explains previous conflicting results obtained from a limited number of loci. We show that these multiple episodes of gene-flow resulted from both natural and human-mediated dispersal.
Our results demonstrate the importance of past climatic fluctuations and human mediated translocations in driving and complicating the process of speciation in island Southeast Asia. This case study demonstrates that genomics is a powerful tool to decipher the evolutionary history of a genus, and reveals the complexity of the process of speciation.
We investigate the consequences of adopting the criteria used by the state of California, as described by Myers et al. (2011), for conducting familial searches. We carried out a simulation study of randomly generated profiles of related and unrelated individuals with 13-locus CODIS genotypes and YFiler® Y-chromosome haplotypes, on which the Myers protocol for relative identification was carried out. For Y-chromosome-sharing first-degree relatives, the Myers protocol has a high probability of identifying their relationship. For unrelated individuals, there is a low probability that an unrelated person in the database will be identified as a first-degree relative. For more distant Y-haplotype-sharing relatives (half-siblings, first cousins, half-first cousins or second cousins) there is a substantial probability that the more distant relative will be incorrectly identified as a first-degree relative. For example, a first cousin may be identified as a full sibling, with a probability that depends on the population background. Although the California familial search policy is likely to identify a first-degree relative if his profile is in the database, and it poses little risk of falsely identifying an unrelated individual in a database as a first-degree relative, there is a substantial risk of falsely identifying a more distant Y-haplotype-sharing relative in the database as a first-degree relative, with the consequence that their immediate family may become the target for further investigation. This risk falls disproportionately on those ethnic groups that are currently overrepresented in state and federal databases.
It is a challenging task to infer selection intensity and allele age from population genetic data. Here we present a method that can efficiently estimate selection intensity and allele age from the multilocus haplotype structure in the vicinity of a segregating mutant under positive selection. We use a structured-coalescent approach to model the effect of directional selection on the gene genealogies of neutral markers linked to the selected mutant. The frequency trajectory of the selected allele follows the Wright-Fisher model. Given the position of the selected mutant, we propose a simplified multilocus haplotype model that can efficiently model the dynamics of the ancestral haplotypes under the joint influence of selection and recombination. This model approximates the ancestral genealogies of the sample, which reduces the number of states from an exponential function of the number of single-nucleotide polymorphism loci to a quadratic function. That allows parameter inference from data covering DNA regions as large as several hundred kilo-bases. Importance sampling algorithms are adopted to evaluate the probability of a sample by exploring the space of both allele frequency trajectories of the selected mutation and gene genealogies of the linked sites. We demonstrate by simulation that the method can accurately estimate selection intensity for moderate and strong positive selection. We apply the method to a data set of the G6PD gene in an African population and obtain an estimate of 0.0456 (95% confidence interval 0.0144−0.0769) for the selection intensity. The proposed method is novel in jointly modeling the multilocus haplotype pattern caused by recombination and mutation, allowing the analysis of haplotype data in recombining regions. Moreover, the method is applicable to data from populations under exponential growth and a variety of other demographic histories.
selection coefficient; allele age; haplotype structure; structured coalescent; importance sampling; time-varying population size
We present a DNA library preparation method that has allowed us to reconstruct a high coverage (30X) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of “missing evolution” in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans.
Despite extensive genetic analysis, the evolutionary relationship between polar bears (Ursus maritimus) and brown bears (U. arctos) remains unclear. The two most recent comprehensive reports indicate a recent divergence with little subsequent admixture or a much more ancient divergence followed by extensive admixture. At the center of this controversy are the Alaskan ABC Islands brown bears that show evidence of shared ancestry with polar bears. We present an analysis of genome-wide sequence data for seven polar bears, one ABC Islands brown bear, one mainland Alaskan brown bear, and a black bear (U. americanus), plus recently published datasets from other bears. Surprisingly, we find clear evidence for gene flow from polar bears into ABC Islands brown bears but no evidence of gene flow from brown bears into polar bears. Importantly, while polar bears contributed <1% of the autosomal genome of the ABC Islands brown bear, they contributed 6.5% of the X chromosome. The magnitude of sex-biased polar bear ancestry and the clear direction of gene flow suggest a model wherein the enigmatic ABC Islands brown bears are the descendants of a polar bear population that was gradually converted into brown bears via male-dominated brown bear admixture. We present a model that reconciles heretofore conflicting genetic observations. We posit that the enigmatic ABC Islands brown bears derive from a population of polar bears likely stranded by the receding ice at the end of the last glacial period. Since then, male brown bear migration onto the island has gradually converted these bears into an admixed population whose phenotype and genotype are principally brown bear, except at mtDNA and X-linked loci. This process of genome erosion and conversion may be a common outcome when climate change or other forces cause a population to become isolated and then overrun by species with which it can hybridize.
The evolutionary genetic relationship between polar bears (Ursus maritimus) and brown bears (U. arctos) is a subject of continuing controversy. To address this we generated genome-wide sequence data for seven polar bears, two brown bears (including one from the enigmatic ABC Islands population), and a black bear (U. americanus). These data reveal remarkable genetic homogeneity within polar bears and clear evidence of past hybridization with brown bears. Hybridization, however, appears to be limited to habitat islands, where isolated populations of polar bears are gradually converted into brown bears via male-mediated dispersal and sex-biased gene flow. Our simplified and comprehensive model for the origin and evolution of polar bears resolves conflicting interpretations of mitochondrial and nuclear genetic data, and highlights the potential effect of natural climate change on long-term evolutionary processes.
One enduring question in evolutionary biology is the extent of archaic admixture in the genomes of present-day populations. In this paper, we present a test for ancient admixture that exploits the asymmetry in the frequencies of the two nonconcordant gene trees in a three-population tree. This test was first applied to detect interbreeding between Neandertals and modern humans. We derive the analytic expectation of a test statistic, called the D statistic, which is sensitive to asymmetry under alternative demographic scenarios. We show that the D statistic is insensitive to some demographic assumptions such as ancestral population sizes and requires only the assumption that the ancestral populations were randomly mating. An important aspect of D statistics is that they can be used to detect archaic admixture even when no archaic sample is available. We explore the effect of sequencing error on the false-positive rate of the test for admixture, and we show how to estimate the proportion of archaic ancestry in the genomes of present-day populations. We also investigate a model of subdivision in ancestral populations that can result in D statistics that indicate recent admixture.
admixture; gene genealogies; lineage sorting
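A minimal sketch of the D statistic in its commonly used ABBA/BABA form follows. It assumes that each informative site has already been reduced to a four-letter pattern over (P1, P2, P3, outgroup), with "A" the outgroup allele and "B" the derived allele. The function name and the toy counts are hypothetical, and the sketch omits the error and ancestral-subdivision effects discussed in the abstract.

```python
from collections import Counter
from typing import Iterable


def d_statistic(patterns: Iterable[str]) -> float:
    """D = (nABBA - nBABA) / (nABBA + nBABA).

    Each pattern is a 4-character string over (P1, P2, P3, outgroup).
    Patterns other than ABBA and BABA are ignored.
    """
    counts = Counter(patterns)
    abba = counts["ABBA"]  # derived allele shared by P2 and P3
    baba = counts["BABA"]  # derived allele shared by P1 and P3
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Hypothetical toy counts: 30 ABBA, 10 BABA, 100 concordant BBAA sites.
toy_patterns = ["ABBA"] * 30 + ["BABA"] * 10 + ["BBAA"] * 100
toy_d = d_statistic(toy_patterns)
```

If the two nonconcordant patterns arise only from lineage sorting in a randomly mating ancestral population, they are expected to be equally frequent and D is close to zero. A consistent excess of one pattern is the asymmetry the test looks for.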
We generalize a recently introduced graphical framework to compute the probability that haplotypes or genotypes of two individuals drawn from a finite, subdivided population match. As in the previous work, we assume an infinite-alleles model. We focus on the case of a population divided into two subpopulations, but the underlying framework can be applied to a general model of population subdivision. We examine the effect of population subdivision on the match probabilities and the accuracy of the product rule which approximates multi-locus match probabilities as a product of one-locus match probabilities. We quantify the deviation from predictions of the product rule by R, the ratio of the multi-locus match probability to the product of the one-locus match probabilities. We carry out the computation for two loci and find that ignoring subdivision can lead to underestimation of the match probabilities if the population under consideration actually has subdivision structure and the individuals originate from the same subpopulation. On the other hand, under a given model of population subdivision, we find that the ratio R for two loci is only slightly greater than 1 for a large range of symmetric and asymmetric migration rates. Keeping in mind that the infinite-alleles model is not the appropriate mutation model for STR loci, we conclude that, for two loci and biologically reasonable parameter values, population subdivision may lead to results that disfavor innocent suspects because of an increase in identity-by-descent in finite populations. On the other hand, for the same range of parameters, population subdivision does not lead to a substantial increase in linkage disequilibrium between loci. Those results are consistent with established practice.
Match probability; Population subdivision; Product rule; Match graph
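The ratio R can be illustrated empirically. The sketch below is not the graphical computation developed in the paper: it assumes a list of sampled haplotype pairs and estimates R as the observed frequency of matches at all loci divided by the product of the per-locus match frequencies. The function name and the toy pairs are hypothetical.

```python
from math import prod
from typing import Sequence, Tuple

Haplotype = Tuple[str, ...]


def product_rule_ratio(pairs: Sequence[Tuple[Haplotype, Haplotype]]) -> float:
    """Empirical R: P(match at all loci) / product of one-locus match rates."""
    if not pairs:
        raise ValueError("need at least one pair of haplotypes")
    n_loci = len(pairs[0][0])
    per_locus = [0] * n_loci  # number of pairs matching at each locus
    all_match = 0             # number of pairs matching at every locus
    for h1, h2 in pairs:
        if len(h1) != n_loci or len(h2) != n_loci:
            raise ValueError("all haplotypes must cover the same loci")
        matches = [a == b for a, b in zip(h1, h2)]
        for locus, is_match in enumerate(matches):
            per_locus[locus] += is_match
        all_match += all(matches)
    n_pairs = len(pairs)
    expected = prod(count / n_pairs for count in per_locus)
    if expected == 0:
        raise ValueError("no matches at one or more loci")
    return (all_match / n_pairs) / expected


# Hypothetical toy data: four pairs of two-locus haplotypes.
toy_pairs = [
    (("a", "x"), ("a", "x")),  # match at both loci
    (("a", "x"), ("a", "y")),  # match at locus 1 only
    (("a", "x"), ("b", "x")),  # match at locus 2 only
    (("a", "x"), ("a", "x")),  # match at both loci
]
toy_r = product_rule_ratio(toy_pairs)
```

A value of R above 1 would mean that multi-locus matches are more common than the product rule predicts, and a value below 1 the opposite. With so few pairs the toy estimate is dominated by sampling noise.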
A rare mutation in the RSPH9 gene leading to Primary Ciliary Dyskinesia was previously identified in two Bedouin families, one from Israel and one from the United Arab Emirates (UAE). Herein we analyze mutation segregation in the Israeli family, present the clinical disease spectrum, and estimate mutation age in the two families. Mutation segregation was studied by restriction fragment length analysis. Mutation ages were estimated using a model of the decrease in the length of ancestral haplotypes. The mutations in each of the two families had a common ancestor less than 95 and 17 generations in the past. If the mutations in the two families are descended from a common ancestor, that mutation would have to have arisen at least 150 generations ago. If the Bedouin population has been roughly constant in size for at least 6000 years, it is possible that the mutations in the two families are identical by descent. If there were substantial fluctuations in the size of the Bedouin population, it is more likely that there were two independent mutations. Based on the available data, the population genetic analysis does not strongly favor one conclusion over the other.
primary ciliary dyskinesia; Bedouin; founder mutation
Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, ‘missing’ heritability can be explained. Here we examine potential sources of missing heritability and propose research strategies, including and extending beyond current genome-wide association approaches, to illuminate the genetics of complex diseases and enhance its potential to enable effective disease prevention or treatment.
In this report, we investigate the statistical power of several tests of selective neutrality based on patterns of genetic diversity within and between species. The goal is to compare tests based solely on population genetic data with tests using comparative data or a combination of comparative and population genetic data. We show that in the presence of repeated selective sweeps on a relatively neutral background, tests based on the dN/dS ratios in comparative data almost always have more power to detect selection than tests based on population genetic data, even if the overall level of divergence is low. Tests based solely on the distribution of allele frequencies or the site frequency spectrum, such as the Ewens–Watterson test or Tajima's D, have less power in detecting both positive and negative selection because of the transient nature of positive selection and the weak signal left by negative selection. The Hudson–Kreitman–Aguadé test is the most powerful test for detecting positive selection among the population genetic tests investigated, whereas the McDonald–Kreitman test typically has more power to detect negative selection. We discuss our findings in the light of the discordant results obtained in several recently published genomic scans.
HKA test; Ewens–Watterson test; dN/dS; McDonald–Kreitman test; Tajima's D; neutrality test; genomic scan; statistical power
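As an illustration of how a test can combine comparative and population genetic data, the following sketch computes the neutrality index commonly derived from the McDonald–Kreitman 2×2 table. It is not the power analysis performed in the report. The function name is hypothetical, the counts are invented for the example, and a real analysis would add a significance test on the table.

```python
def neutrality_index(d_n: int, d_s: int, p_n: int, p_s: int) -> float:
    """Neutrality index NI = (Pn / Ps) / (Dn / Ds).

    d_n, d_s: nonsynonymous and synonymous fixed differences between species.
    p_n, p_s: nonsynonymous and synonymous polymorphisms within a species.
    """
    if min(d_n, d_s, p_n, p_s) <= 0:
        raise ValueError("all four counts must be positive for this sketch")
    return (p_n / p_s) / (d_n / d_s)


# Hypothetical counts, not taken from any data set.
toy_ni = neutrality_index(d_n=10, d_s=20, p_n=5, p_s=40)
```

Under neutrality the ratio of nonsynonymous to synonymous changes is expected to be similar within and between species, giving a value near 1. Values below 1 are usually read as an excess of nonsynonymous divergence, and values above 1 as an excess of nonsynonymous polymorphism.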
Metagenomic sequencing projects from environments dominated by a small number of species produce genome-wide population samples. We present a two-site composite likelihood estimator of the scaled recombination rate, ρ = 2Nₑc, that operates on metagenomic assemblies in which each sequenced fragment derives from a different individual. This new estimator properly accounts for sequencing error, as quantified by per-base quality scores, and missing data, as inferred from the placement of reads in a metagenomic assembly. We apply our estimator to data from a sludge metagenome project to demonstrate how this method will elucidate the rates of exchange of genetic material in natural microbial populations. Surprisingly, for a fixed amount of sequencing, this estimator has lower variance than similar methods that operate on more traditional population genetic samples of comparable size. In addition, we can infer variation in recombination rate across the genome because metagenomic projects sample genetic diversity genome-wide, not just at particular loci. The method itself makes no assumption specific to microbial populations, opening the door for application to any mixed population sample where the number of individuals sampled is much greater than the number of fragments sequenced.
At a broad scale, the exchange of genetic material through homologous recombination (i.e. what happens in animals during sex) increases the potential rate of adaptation. Bacteria often reproduce clonally, without recombination, by making exact copies of their genomes, but they also have mechanisms analogous to sex that allow them to recombine sporadically. Despite microbes' critical role at the base of our world's ecosystem, microbiologists know surprisingly little about how microbes grow and evolve outside the laboratory. Metagenomic sequencing projects provide a means to sample the genetic diversity of natural microbial populations and have the potential to reveal much about the ecology and evolution of these populations. Here we present a novel method to estimate the recombination rate from metagenomic data, while explicitly allowing for imperfections such as sequencing error and missing data.
We consider gene trees in three species for which the species tree is known. We show that population subdivision in ancestral species can lead to asymmetry in the frequencies of the two gene trees not concordant with the species tree and, if subdivision is extreme, cause one of the nonconcordant gene trees to be more probable than the concordant gene tree. Although published data for the human–chimp–gorilla clade and for three species of Drosophila show asymmetry consistent with our model, sequencing error could also account for observed patterns. We show that substantial levels of persistent ancestral subdivision are needed to account for the observed levels of asymmetry found in these two studies.
gene genealogy; transspecies polymorphism; lineage sorting
A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual using 8,341 mtDNA sequences identified among 4.8 Gb of DNA generated from ~0.3 grams of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs and allows an estimate of the divergence date between the two mtDNA lineages of 660,000±140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared to other primate lineages suggesting that the effective population size of Neandertals was small.
The genealogical relationships of individuals in a finite population can create statistical non-independence of alleles at unlinked loci. In this paper, we introduce a flexible graphical method for computing the probabilities that two individuals in a finite, randomly-mating population have the same haplotype or genotype at several loci. This method allows us to generalize the analysis of Laurie and Weir (2003) to cases with more loci and other models of mating. We show that monogamy increases the probabilities of genotypic matches at unlinked loci and that the effect of monogamy increases with the number L of loci. We conjecture a sharp upper bound on the effect of monogamy for a given L.
match probability; product rule; unlinked; linkage disequilibrium; monogamy; match graph
Despite being one of the most studied families within the Carnivora, the phylogenetic relationships among the members of the bear family (Ursidae) have long remained unclear. Widely divergent topologies have been suggested based on various data sets and methods.
We present a fully resolved phylogeny for ursids based on ten complete mitochondrial genome sequences from all eight living and two recently extinct bear species, the European cave bear (Ursus spelaeus) and the American giant short-faced bear (Arctodus simus). The mitogenomic data yield a well-resolved topology for ursids, with the sloth bear at the basal position within the genus Ursus. The sun bear is the sister taxon to both the American and Asian black bears, and this clade is the sister clade of cave bear, brown bear and polar bear confirming a recent study on bear mitochondrial genomes.
Sequences from extinct bears represent the third and fourth Pleistocene species for which complete mitochondrial genomes have been sequenced. Moreover, the cave bear specimen demonstrates that mitogenomic studies can be applied to Pleistocene fossils that have not been preserved in permafrost, and therefore have a broad application within ancient DNA research. Molecular dating of the mtDNA divergence times suggests a rapid radiation of bears in both the Old and New Worlds around 5 million years ago, at the Miocene-Pliocene boundary. This coincides with major global changes, such as the Messinian crisis and the first opening of the Bering Strait, and suggests a global influence of such events on species radiations.