The rejection algorithm can be an efficient way to infer distributions of genetic and demographic parameters from polymorphism data (Ramakrishnan et al., 2004). In this approach, values for a parameter of interest are simulated from a prior distribution and accepted with a probability proportional to their likelihood. Because likelihoods are difficult to compute directly, approximate simulation methods have been developed in which summary statistics are used in place of the full dataset, and a candidate parameter value is accepted based on the values of those statistics (Tavaré et al., 1997). Bayesian statistics incorporate prior information in the form of distributions of the parameters of interest. The rejection method has been extended by simulating parameter values from prior distributions and then simulating gene genealogies under the coalescent given these parameter values (Pritchard et al., 1999). Summary statistics calculated from the simulated genealogies are compared with summary statistics calculated from the observed data; if the simulated statistics fall within a specified tolerance of the observed statistics, the parameter values are accepted and form a sample from the posterior distribution. This combination of summary statistics in place of the full dataset with Bayesian prior distributions is termed approximate Bayesian computation. Approximate Bayesian methods based on summary statistics can deal with complex models and, in principle, provide estimates of the parameters of interest under any model that can be simulated (Excoffier and Heckel, 2006). REJECTOR implements an approximate Bayesian method, termed rejection-based approximate Bayesian inference (Beaumont et al., 2002), for inferring population history.
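The acceptance rule described above can be written compactly. The notation here is an assumption introduced for illustration and does not come from the cited papers: $\pi(\theta)$ is the prior, $D_{\mathrm{obs}}$ the observed data, $S(\cdot)$ the vector of summary statistics, $\rho$ a distance between statistic vectors and $\delta$ the tolerance. The final approximation is only as good as the statistics are informative and the tolerance is small.

```latex
\begin{align*}
\theta_i &\sim \pi(\theta), \qquad D_i \sim p(D \mid \theta_i), \\
\text{accept } \theta_i &\iff \rho\bigl(S(D_i),\, S(D_{\mathrm{obs}})\bigr) \le \delta, \\
\{\theta_i : \text{accepted}\} &\sim \pi\bigl(\theta \mid \rho(S(D), S(D_{\mathrm{obs}})) \le \delta\bigr) \approx \pi(\theta \mid D_{\mathrm{obs}}).
\end{align*}
```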
REJECTOR calculates summary statistic values from the observed data, then simulates a series of population histories from prior distributions for parameters of interest (e.g. time of divergence, population size and migration rate) and calculates summary statistic values for each simulated population history. REJECTOR accepts a simulated population history if its summary statistic values fall within a user-specified tolerance of the values calculated from the observed data. The parameter values of the accepted histories form posterior distributions that can be used to estimate the true parameter values. In addition to parameter estimation, REJECTOR can be used to compare alternative models of population history by comparing the proportion of simulated histories accepted under each model.
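The loop described above can be illustrated with a minimal sketch. It is not REJECTOR's code and does not call SIMCOAL2: the toy coalescent simulator, the single summary statistic (number of segregating sites), the uniform prior and all function and variable names are assumptions chosen only to keep the example self-contained.

```python
import random


def simulate_segregating_sites(theta, n_samples, rng):
    """Toy coalescent: draw the total tree length, then a Poisson number of mutations."""
    total_length = 0.0
    for k in range(n_samples, 1, -1):
        # Waiting time while k lineages remain, in coalescent time units.
        t_k = rng.expovariate(k * (k - 1) / 2.0)
        total_length += k * t_k
    # Poisson(theta / 2 * total_length), drawn by counting the arrivals of a
    # unit-rate process that fall before `mean`.
    mean = theta / 2.0 * total_length
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def rejection_abc(observed_s, n_samples, n_sims, tolerance, prior_max, seed=0):
    """Keep prior draws whose simulated statistic lies within `tolerance` of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(0.0, prior_max)  # hypothetical uniform prior
        s_sim = simulate_segregating_sites(theta, n_samples, rng)
        if abs(s_sim - observed_s) <= tolerance:
            accepted.append(theta)
    return accepted


N_SIMS = 20000
posterior = rejection_abc(observed_s=12, n_samples=10, n_sims=N_SIMS,
                          tolerance=1, prior_max=20.0)
acceptance_rate = len(posterior) / N_SIMS
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} of {N_SIMS} draws; posterior mean {posterior_mean:.2f}")
```

Running the same loop under two competing simulators and comparing the two values of `acceptance_rate` would give the kind of model comparison mentioned above, provided both runs use the same tolerance and the same summary statistics.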
REJECTOR can analyze any number of unlinked blocks of genetic data, each containing any number and combination of SNP/UEP, microsatellite and sequence loci. For coalescent simulation, REJECTOR uses the existing SIMCOAL2 software (Laval and Excoffier, 2004), in an updated version that allows for multiple blocks of linked loci. Comparisons can be made using a wide array of summary statistics, and different categories of genetic data can be analyzed jointly. Because each iteration of a rejection algorithm-based simulation is independent of all others, the computational work can be divided among multiple processors. A recent advance in Bayesian simulation, sequential Monte Carlo, promises improved efficiency for rejection-based methods by propagating parameter values through a series of intermediate distributions with decreasing tolerance values (Sisson et al., 2007), but the current version of REJECTOR does not include this feature.
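Because iterations are independent, the work can be split into batches and the accepted values concatenated afterwards. The sketch below illustrates that idea only and says nothing about how REJECTOR distributes its own jobs. The Gaussian stand-in for "simulate a history and summarize it", the uniform prior on [0, 10] and the names `run_batch`, `run_all` and `mapper` are all hypothetical.

```python
import random


def run_batch(job):
    """Run one independent batch of rejection iterations with its own seed."""
    seed, n_sims, observed, tolerance = job
    rng = random.Random(seed)  # a distinct seed per batch keeps batches from repeating draws
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(0.0, 10.0)      # hypothetical prior
        simulated = rng.gauss(theta, 1.0)   # toy stand-in for simulate + summarize
        if abs(simulated - observed) <= tolerance:
            accepted.append(theta)
    return accepted


def run_all(n_batches, n_sims_per_batch, observed, tolerance, mapper=map):
    """Dispatch the batches through `mapper` and merge the accepted values."""
    jobs = [(seed, n_sims_per_batch, observed, tolerance) for seed in range(n_batches)]
    merged = []
    for part in mapper(run_batch, jobs):
        merged.extend(part)
    return merged


# Serial run using the built-in map.
serial = run_all(4, 2000, observed=3.0, tolerance=0.2)
print(len(serial), "accepted values")

# To spread the batches over several processes, pass a pool's map instead:
#   from concurrent.futures import ProcessPoolExecutor
#   if __name__ == "__main__":
#       with ProcessPoolExecutor() as pool:
#           parallel = run_all(4, 2000, 3.0, 0.2, mapper=pool.map)
```

Since each batch depends only on its own seed, the merged result is the same whether the batches run one after another or on separate processors.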