Our results provide important insights into detecting historic admixture. The simulations we present illustrate the effect that initial parameters have on the outcome of human admixture. Simple adjustments to the parameters in our simulation series changed the expected allele frequency from as low as 0.0005 to over 0.50, an increase of three orders of magnitude. The results of any admixture study using genetic data, then, are highly dependent on the variables presented in these simulations (e.g., mutation rate, population sizes, and time since admixture, measured in generations).
High mutation rates can decrease the expected migrant allele frequency and the variability by more than 50 percent, especially in populations that experienced earlier migrations. For example, an increased mutation rate can change the mean final allele frequency from 0.0243 to 0.0128, or from 0.5016 to 0.2699 (depending on other variables, as reported in Figure ). Researchers should keep this in mind when selecting loci for analysis. Because mutation rates vary widely among loci, the choice of locus can have a profound impact on the number of migrant alleles detected generations later. Many studies advocate the use of mtDNA because of the feasibility of data collection and other factors. However, because the mutation rate is generally higher in mtDNA, it could obscure the signal in studies addressing historic admixture, even when the time frame is relatively recent.
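As a rough illustration of the mutation effect described above, the sketch below computes the expected migrant allele frequency when mutation removes migrant alleles at a constant per-generation rate. It is a simplified model and not necessarily the one used in the simulations reported here: the function name, the assumption of no back mutation, and all numerical values are hypothetical. Under neutral Wright-Fisher sampling, drift leaves the mean frequency unchanged, so in this model the expectation depends only on the starting frequency, the mutation rate, and the number of generations.

```python
def expected_migrant_frequency(p0: float, mu: float, generations: int) -> float:
    """Expected migrant allele frequency after `generations` generations.

    Assumes (hypothetically) that each migrant allele copy mutates to a
    non-migrant state with probability `mu` per generation and that no
    back mutation restores it.
    """
    if not 0.0 <= p0 <= 1.0 or not 0.0 <= mu <= 1.0 or generations < 0:
        raise ValueError("p0 and mu must lie in [0, 1]; generations must be >= 0")
    # Each generation retains a fraction (1 - mu) of the migrant copies.
    return p0 * (1.0 - mu) ** generations


# Hypothetical comparison of a slowly and a rapidly mutating locus.
for mu in (1e-8, 1e-2):
    print(mu, expected_migrant_frequency(p0=0.5, mu=mu, generations=60))
```

The decay is geometric in the number of generations, which is consistent with the observation that high mutation rates matter most for earlier migrations.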
The sizes of the migrant and native populations are fundamental to understanding the expected allele frequency. With times since admixture as short as those we consider in our simulations, the most important factors are the sizes of the migrating and native populations. In our simulations, if the native population is large, changing the migrating population size changes the mean final allele frequency from 0.0243 to 0.0010. If the native population is small, those numbers change to 0.5016 and 0.0407. These are the largest differences illustrated by our simulations, and they attest to the important role of population sizes. Researchers should not expect to find many alleles from a small migratory group of 50 individuals in a large population today, even if sampling methods are exhaustive.
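The dependence on population sizes can be sketched with a simple calculation of the starting migrant allele frequency immediately after a single admixture event. The sketch assumes that the allele is absent from the native population and that migrants and natives contribute to the merged gene pool in proportion to their numbers. The function name and the population sizes in the loop are hypothetical and are not the parameter values used in our simulation series.

```python
def initial_migrant_frequency(n_migrants: int, n_natives: int,
                              freq_in_migrants: float = 1.0) -> float:
    """Migrant allele frequency in the merged population right after admixture.

    `freq_in_migrants` is the allele's frequency within the migrant group;
    the default of 1.0 corresponds to an allele fixed in the migrants.
    The allele is assumed to be absent from the native population.
    """
    if n_migrants < 0 or n_natives < 0 or n_migrants + n_natives == 0:
        raise ValueError("population sizes must be non-negative and not both zero")
    return freq_in_migrants * n_migrants / (n_migrants + n_natives)


# Hypothetical migrant and native population sizes.
for n_migrants, n_natives in [(50, 50_000), (500, 50_000), (500, 500)]:
    print(n_migrants, n_natives, initial_migrant_frequency(n_migrants, n_natives))
```

Because neutral drift does not change the mean frequency, this starting value largely sets the mean final frequency when mutation is negligible.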
Additionally, we see that time plays an important role. The standard deviations presented in Table demonstrate that allelic frequencies vary widely, particularly as the number of generations increases. High mutation rates combined with large time spans can reduce migrant allele frequencies significantly. When the mutation rate is low, however, the time since admixture has little or no effect on the final mean allele frequency, but it still has a profound impact on the standard deviation. For example, a change in time since admixture in one parameter set almost doubles the standard deviation, from 0.0525 to 0.1044. As time increases, genetic drift causes the spread of final allele frequencies to increase, particularly when the population sizes are small. Thus, as the time since the admixture event increases, the number of loci and subjects sampled becomes increasingly important.
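The growth of the spread with time can be illustrated with a minimal neutral Wright-Fisher simulation. This is a sketch under stated assumptions (constant population size, no mutation, no selection, non-overlapping generations) and is not the simulation code used in this study; the function names and parameter values are hypothetical. The second function gives the standard textbook expectation for the standard deviation of the allele frequency under pure drift, including replicates in which the allele has been lost or fixed.

```python
import random
import statistics


def wright_fisher_final_frequencies(p0, n_diploid, generations, replicates, seed=0):
    """Simulate neutral drift and return the final frequency of each replicate."""
    rng = random.Random(seed)
    copies = 2 * n_diploid  # gene copies in a diploid population
    finals = []
    for _rep in range(replicates):
        p = p0
        for _gen in range(generations):
            # Binomial sampling of the next generation's gene copies.
            count = sum(1 for _copy in range(copies) if rng.random() < p)
            p = count / copies
            if p == 0.0 or p == 1.0:
                break  # loss and fixation are absorbing without mutation
        finals.append(p)
    return finals


def theoretical_drift_sd(p0, n_diploid, generations):
    """Expected SD of allele frequency after pure drift in a Wright-Fisher model."""
    copies = 2 * n_diploid
    variance = p0 * (1 - p0) * (1 - (1 - 1 / copies) ** generations)
    return variance ** 0.5


# Hypothetical parameters: a small population observed at two time points.
for t in (10, 40):
    finals = wright_fisher_final_frequencies(0.5, n_diploid=100, generations=t,
                                             replicates=300, seed=1)
    print(t, statistics.mean(finals), statistics.pstdev(finals),
          theoretical_drift_sd(0.5, 100, t))
```

In this model the mean stays near the starting frequency while the standard deviation grows with the number of generations, and it grows faster when the population is smaller.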
In our second simulation, most of the migrant alleles are present in less than 2% of the population. In a study that samples only a few subjects from each of many human populations, alleles from a small-scale admixture will usually not be recovered at all. These rare alleles could also easily be ignored in favor of haplotypes that better categorize the population into clusters.
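The difficulty of recovering such rare alleles can be quantified with a simple sampling calculation. The sketch below assumes that sampled gene copies are independent random draws from a large population; the function name and the sample sizes are hypothetical. For an allele at a frequency of 2%, a sample of 20 gene copies contains at least one migrant copy only about one time in three.

```python
def detection_probability(allele_freq: float, n_copies_sampled: int) -> float:
    """Probability that at least one sampled gene copy carries the allele.

    Assumes independent random draws from a population large enough that
    sampling does not noticeably change the allele frequency.
    """
    if not 0.0 <= allele_freq <= 1.0 or n_copies_sampled < 0:
        raise ValueError("allele_freq must lie in [0, 1]; sample size must be >= 0")
    # Complement of the probability that every sampled copy lacks the allele.
    return 1.0 - (1.0 - allele_freq) ** n_copies_sampled


# Hypothetical sample sizes for an allele at 2% frequency.
for n in (10, 20, 100, 500):
    print(n, detection_probability(0.02, n))
```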
Our results demonstrate a profound and general fact: the values of these genetic parameters can drastically alter the expected frequency of migrant alleles in today's populations. Even in our simulations, where steps have been taken to ensure a best-case scenario for the migrant allele, there is often a large spread of possible outcomes. DNA data have been touted as a panacea for recovering information about the past, but their use depends so extensively on factors that are beyond our control that their application is not always appropriate. It is imperative, therefore, that researchers understand the implications of the variables we have presented and not rely solely on DNA sequence data when researching small, recent human migrations. We can only hope to understand basic details of population history when quantifying genetic data, and even valid results derived from genetic data may still be misleading if viewed unilaterally, as demonstrated by Harpending et al. [46, 47].
Our results, however, are not completely ominous. Carefully designed studies should be able to draw specific and valid conclusions from genetic data. One area for major improvement is the number of individuals and loci sampled. Our results indicate that a large sample size and large number of loci are needed to obtain robust results. Studies that are unable to sample sufficiently do not have the power to draw appropriate conclusions and should be interpreted with caution. Our results give guidelines for a variety of conditions and allow researchers to analyze the benefits of increasing sample sizes given their populations of interest. Because of the real possibility that a certain allele will have drifted to extinction, even sampling 100% of a population at a single locus may not reveal a single migrant allele, even if it was fixed in the migrant population. If one is faced with the challenge of researching small-scale admixture, it is necessary to identify migrant alleles even if they show up in a very small proportion of loci and subjects. Consequently, phylogenetic methods must be created that can pinpoint very small similarities between populations. Table summarizes the genetic and experimental factors that we believe will increase the chance of detecting admixture in today's populations. One complication that arises in such situations, however, is that very recent migration and admixture will further complicate the results. Identifying migrant alleles that are rare will be very difficult, not only because of the increased sampling necessary to detect them, but because of the noise that is likely to be introduced in the time since the event under examination.
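The benefit of sampling many loci, given that the migrant allele may have drifted to extinction at any one of them, can be sketched as follows. The calculation assumes that loci are unlinked and therefore independent and that sampled gene copies are independent random draws; the function name and the frequencies in the example are hypothetical. A locus at which the migrant allele has been lost has a frequency of zero and contributes nothing, regardless of how many individuals are sampled.

```python
from math import prod


def prob_detect_at_any_locus(locus_freqs, n_copies_sampled):
    """Probability of observing a migrant allele at one or more loci.

    `locus_freqs` holds the current migrant allele frequency at each locus
    (0.0 where the allele has been lost). Loci are assumed independent.
    """
    if n_copies_sampled < 0 or any(not 0.0 <= p <= 1.0 for p in locus_freqs):
        raise ValueError("frequencies must lie in [0, 1]; sample size must be >= 0")
    # Probability that the sample misses the migrant allele at every locus.
    miss_everywhere = prod((1.0 - p) ** n_copies_sampled for p in locus_freqs)
    return 1.0 - miss_everywhere


# Hypothetical scenario: the allele is lost at most loci and rare at the rest.
single_locus_lost = [0.0]
ten_loci = [0.0] * 7 + [0.01, 0.02, 0.005]
for n in (50, 200):
    print(n, prob_detect_at_any_locus(single_locus_lost, n),
          prob_detect_at_any_locus(ten_loci, n))
```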
Table 3. Improving probability of detecting historic admixture
Perhaps most importantly, it must be remembered that drift is stochastic and that historic genetic parameters are, for the most part, unknown. Thus, the absence of specific genetic data is not conclusive evidence against historic admixture. Our results illustrate several parameter sets that would cause admixture to be either completely or practically undetectable today. To address the inconsistent results found in DNA, all but the largest genetic studies need to continue to consider anthropological, archeological, and linguistic data in order to formulate conclusions. Finally, our study demonstrates the utility of simulation studies for putting bounds on parameter values and sample sizes in studies of human migration events.