Demographic models inferred from genetic data play several important roles in population genetics. First, they complement archeological evidence in understanding prehistorical events (such as the number and timing of major continental migrations) which have left no written record 
. Second, they facilitate the search for genetic regions that have been targets of non-neutral forces, such as recent natural selection, by guiding our expectations as to how much sequence and haplotype variation one expects to see in a given genomic region (and, more importantly, the variance around these expectations) 
. Finally, existing demographic models can guide sampling design for subsequent population or medical genetic studies.
Given their many uses, it is not surprising that many studies have inferred demographic models for populations of humans and other species 
The process of inferring a demographic model consistent with a particular data set typically involves exploring a large parameter space by simulating the model many times, often using coalescent-theory based Monte Carlo approaches. For computational reasons, many of the demographic inference procedures developed thus far have focused on single population models or models with multiple populations but no subsequent migration after subpopulations split (i.e., 
, but also see 
). Methods that do consider multiple populations with migration often assume independent non-recombining regions 
and do not often scale to genomic size data sets. Approaches for jointly considering recombination and migration often use a restricted set of summary statistics 
of the data, which limits their statistical power. Finally, complex demographic inferences that make use of many summary statistics are often very computationally intensive 
, which precludes thorough investigation of their statistical properties.
Here, we develop and apply a computationally efficient diffusion-based approach to the problem of demographic inference, based on the multi-population allele frequency spectrum (AFS) (i.e., the joint distribution of allele frequencies across diallelic variants) 
. Given a genetic region sequenced in multiple individuals from each of
populations, the resulting AFS is a P
-dimensional matrix. Each entry of this matrix records the number of diallelic genetic polymorphisms in which the derived allele was found in the corresponding number of samples from each population. For example, if diploid individuals from two populations were sequenced, with 10 individuals from population 1 and 5 from population 2, the AFS would be a 21-by-11 matrix (indexed from 0). The [2,0] entry would record the number of polymorphisms for which the derived allele was seen twice in population 1 but never seen in population 2, while the 
entry would record polymorphisms for which the derived allele was homozygous in all individuals from population 1 and seen 5 times in population 2. If all polymorphic sites possess only two alleles and can be considered independent, the AFS is a complete summary of the data. Many of the statistics commonly used for population genetic inference, such as
, are summaries of the AFS (see 
Efficient techniques exist for simulating the AFS of a single population 
. The joint AFS between two populations has been used by several recent studies 
, but these have all relied upon very computationally intensive coalescent simulations. Here we approximate the joint multi-population AFS by numerical solution of a diffusion equation, and our implementation supports up to three simultaneous populations. Because the diffusion approach neglects linkage, our comparison with the data is through a composite likelihood function. Such likelihoods are consistent estimators under a wide range of population genetic scenarios for selectively-neutral data, but do not correctly capture variances 
. (Lower recombination induces higher linkage and higher variance in the entries of the AFS.) As we demonstrate below, the efficiency of our diffusion approach enables both conventional and parametric bootstrap resampling of the data, allowing us to accurately estimate confidence intervals for parameter values and critical values for hypothesis tests 
, accounting for any degree of linkage found in the data. This bootstrap procedure overcomes the traditional concerns with composite likelihood as a philosophy for inference in population genetics.
To demonstrate the utility of our approach, we apply our method to two epochs in human history, using single nucleotide polymorphism (SNP) data from the Environmental Genome Project (EGP) 
, the largest public database of human resequencing data. We first study the expansion of humans out of Africa, jointly modeling the history of African, European, and East Asian populations. We then study the settlement of the New World, jointly modeling European, East Asian, and admixed Mexican populations. In both cases, we quantify the uncertainty of our parameter inferences and test hypotheses about migration (bootstrapping to account for linkage). In particular, we infer an earlier divergence between African and Eurasian populations than previous studies, because our inferences account for the substantial migration between these populations. Our methods also find no evidence for multiple migrations between East Asia and the New World. While similarly complex models for human continental populations have been studied 
, to our knowledge, our analysis is the first in which the full joint AFS is used for inference and in which uncertainty and goodness-of-fit have been quantified.
An important advantage of the diffusion approach is the ease with which selection can be incorporated. As an illustrative application, we also predict the distribution of protein-coding variation between populations. In agreement with the data, we find that less nonsynonymous variation is shared between populations than might be expected based only on patterns of shared noncoding variation.
While no model can capture the full complexity of any species' genetic history, the models presented refine our understanding of the expansion of humanity across the globe. None of the methodology is specific to humans, and we expect our method will find wide application to demographic inference of other species.