To study how LD decays with the distance in the genome, we computed the average value,
, of the measure of linkage disequilibrium
D (the excess rate of occurrence of derived alleles at two SNPs compared with the expectation if they were independent
[36]) between pairs of SNPs binned by genetic distance
x (see
Methods). Immediately after the time of last gene flow between Neandertal (or their relatives) and human ancestors, long range LD is generated, and it is then expected to decay at a constant rate per generation as recombination breaks down the segments shared with Neandertals. Thus, in the absence of new LD-generating events (discussed further below), the

statistic across pairs of introgressed alleles is expected to have an exponential decay with genetic distance, and the genetic extent of the decay can thus be interpreted in terms of the time of last shared ancestry between Neandertals (or their relatives) and modern humans (Section S1 and Appendix A in
Text S1).
To amplify the signal of admixture LD relative to non-admixture LD, we restricted our analysis to SNPs where the “derived” allele (the one that has arisen as a new mutation as determined by comparison to chimpanzee) is found in Neandertals and occurs in the tested population at a frequency of <10%. The justification for this frequency threshold is two-fold. First, the signal of Neandertals being more closely related to non-Africans than to Africans is substantially enriched at SNPs below this threshold (Section S1 in
Text S1). Second, under the model of recent gene flow, such SNPs have an increased probability of having arisen due to mutations on the Neandertal lineage; we estimate that about 30% of them will have arisen on the Neandertal lineage under a model of history that we fitted to the data. This ascertainment enriches the class of informative SNPs by a factor of ten (Section S1 in
Text S1). Our simulations show that restricting to this class of SNPs yields accurate estimates of the time of gene flow for a wide range of demographic histories consistent with patterns of human variation (Section S2 in
Text S1).
To assess how useful this statistic is for measuring admixture LD, we performed coalescent simulations of 100 regions of a million base pairs each, for a range of demographic histories chosen to be plausible for Neandertals, West Africans and non-Africans (these histories were constrained by the observed population differentiation between west Africans and Europeans as measured by their F
ST and the quantitative extent to which Neandertals share more derived alleles with Europeans than with Africans). The simulation results, which we discuss at length in Section S2 of
Text S1, and summarize in , show that we obtain accurate and relatively unbiased estimates of the number of generations since admixture (never more than 15% from the true value) for (1) constant-sized population scenarios, (2) demographic models that include population bottlenecks as well as more recent admixture after the gene flow, (3) hybrid models of ancient structure and recent gene flow, and (4) mutation rates that differ by a factor of 5 from what we use in our main simulations ( see ). Two other SNP ascertainment schemes yield qualitatively consistent findings but the ascertainment we used provides the most accurate estimates under the range of demographic models considered (Section S5 of
Text S1 and ). The simulations also show that in the absence of gene flow (including in the scenario of ancient subdivision), the dates obtained are always at least 5,000 generations for scenarios of demographic history that match the constraints of real human data. Thus, an empirical estimate of a date much less than 5,000 generations likely reflects real gene flow.
| Table 1Estimates of the time of gene flow for different demographic models and mutation rates as well as different ascertainments. |
We applied our statistic to data from Pilot 1 of the 1000 Genomes Project, which discovered polymorphisms in 59 West Africans, 60 European Americans, and 60 East Asians (Han Chinese and Japanese from Tokyo)
[37]. We binned pairs of SNPs by the genetic distance between them using the deCODE genetic map. We considered all pairs of SNPs that are at most 1 cM apart. We computed the average LD over all pairs of SNPs in each bin and fit an exponential curve to the decay of LD (from 0.02–1 cM in 0.001 cM increments).
shows the extent of LD for pairs of SNPs where both SNPs have a derived allele frequency <10%. This figure shows that the extent of LD is larger in Europeans and East Asians than in West Africans, both when the Neandertal genome carries the derived and when it carries the ancestral allele. Empirical features of these LD decay curves show that, for alleles derived in the Neandertal genome, the pattern in Europeans and East Asians is reflecting “admixture LD”. LD in West Africans is less extensive when Neandertals carry the derived allele than when they carry the ancestral allele, while the reverse is seen in Eurasians. To understand this, we note that in the absence of gene flow, polymorphic sites where Neandertals carry the derived allele must have arisen from mutations that occurred prior to Neandertal-human divergence so that they are old and recombination will have had a lot of time to break down the LD, while sites where Neandertals carry the ancestral allele mutations will include mutations that have arisen since the Neandertal-human split and thus LD will be expected to be more extensive, exactly as is seen in West Africans. In contrast, if gene flow occurred, then LD can be greater at sites where Neandertals carry the derived allele as is observed in Europeans and East Asians. This signal persists when we stratify the LD decay curves by the frequency of the ascertained SNPs (Figure S8 in
Text S1). Thus the scale of the LD at these sites must be conveying information about the date of gene flow.
A concern in interpreting the extent of LD in terms of a date is that all available genetic maps (which specify the probability of recombination per generation between all pairs of SNPs) are likely to be inaccurate at the scale of tens of kilobases that is relevant to our analysis. We confirmed that errors in genetic maps can bias LD-based date estimates by simulating a gene flow event 2,000 generations ago using a model in which recombination was localized to hot spots
[38] but where the data were analyzed assuming a genetic map that assumed homogeneous recombination rates across the genome. This led to a date of 1,597 generations since admixture. We developed a statistical model of the random errors that relate the true and observed genetic maps (see
Methods). The precision of the map is modeled using a scalar parameter α. A unit interval of the observed genetic map corresponds to an interval in the true map of expected unit length and variance 1/α. To validate this error model, we estimated the map error in these simulations (α) by comparing the true and the observed genetic maps. Theoretical arguments (Section S3 in
Text S1) show that we can obtain a corrected date (
tGF) from the uncorrected date in generations (λ) using the equation
tGF
=

α(e
λ/α-1). We applied this correction to obtain a date of 1,926 generations. While this error model appears to provide an adequate description of random errors in a genetic map, it does not account for systematic biases.
To apply this statistical correction to real data, we estimated the error rate α in the genetic map by comparing the genomic distribution of a set of cross-over events from 728 meioses previously detected in a European American Hutterite pedigree
[39] to what would be expected if the map were perfect. Unfortunately, the map that we would ideally want to use for estimating the date of Neandertal admixture is not the genetic map that applies to Hutterites today, but the time-averaged genetic map that applied between the present and the date of gene flow. Obviously, such a map is not available, but we hypothesize that by performing our analyses using a genetic map that is built from samples more closely related to the Hutterite pedigree than the map that we would like to analyze (the deCODE pedigree map built in Icelanders) as well as a genetic map that averages over too long a period of time (the European LD Map, which measures recombination over approximately five hundred thousand years), we can obtain some sense of the robustness of our inferences to uncertainties in how the European genetic map has changed over time.
shows the estimates of λ, α and
tGF in Europeans obtained using the two genetic maps. The estimates of
tGF are in 1,805–2,043 over the deCODE and European LD maps. We also estimated λ in East Asians using the “East Asian LD map”. We find that λ in East Asians based on the East Asian LD map is 1,253–1,287, similar to the 1,159–1,183 in Europeans based on the European LD map, although the similarity of these numbers does not prove the Neandertal genetic material in Europeans and East Asians derives from the same ancestral gene flow event. While a shared ancestral gene flow event is plausible, the gene flow events could in principle have occurred in different places at around the same time
[40]. We also cannot reliably estimate the recombination rate correction factor α for the East Asian map because we do not have access to cross-over events in an East Asian pedigree, and hence we do not present an estimate of
tGF in East Asians and focus on Europeans in the rest of this paper.
| Table 2Admixture dates for Europeans. |
To convert the date estimates in generations to date estimates in years, we use an average generation interval which has been estimated to be 29 in diverse modern hunter gatherer societies as well as in developing and industrialized nation states
[41]. We assume a uniform prior probability distribution of generation times between 25 and 33 years per generation for the true value of this quantity and integrate this with the uncertainty of λ and α, and obtain an estimate of last gene exchange between Neandertals and European ancestors of 47,334–63,146 years for the deCODE map, and 49,021–64,926 years for the European LD Map (95% credible intervals). Taking the conservative union of these ranges, we obtain 47,000–65,000 years BP. In our simulations of ascertainment strategy, we found demographic models that can produce biases in the date estimates that could be as large as 15% (Section S2 in
Text S1). To be conservative, we applied this to the uncorrected dates from each of the maps and then applied the relevant map correction. The union of the resulting intervals leads us to conclude that the true date of gene flow could be as young as 37,000 years BP or as old as 86,000 years BP.
We considered the possibility that our results might be biased by natural selection, which is known to affect patterns of human genetic diversity and to have had a much larger effect closer to genes
[42],
[43]. We estimated the time of gene flow stratifying the SNPs by their distance to the nearest exon, dividing the data into 5 bins such that each bin contained 20% of all the SNPs. Using the deCODE map, we obtain λ

=

1,145–1,301 in all bins (Table S8 in
Text S1). This estimate is concordant with the λ

=

1,201 obtained without stratification, and suggests that our inferences are not an artifact of LD generated by directional natural selection.