Coalescent Simulations Under GTR + Γ
When data were simulated under GTR + Γ, and sequence length was 100 bp (resulting in 20–30 segregating sites), estimates of ρ were quite accurate, independent of the substitution model assumed (). Success values increased with higher recombination rates, from 40%–50% to 80%–90%. Similar results were obtained when sequence length was 200 bp (encompassing 40–50 segregating sites) ().
Recombination Detection and Estimation for 20–25 Segregating Sites
Recombination Estimation and Detection for 40–50 Segregating Sites
The LPT for the presence of recombination showed false-positive rates around 5% for all estimation models (, first row). Power of the test increased with increasing values of ρ, from 30%–40% at ρ = 2 (7 recombination events expected on average) to 89%–97% when ρ = 100 (355 recombination events expected), although for the two-allele models power seemed higher for ρ = 50 than for ρ = 100. Using the “all”-allele models slightly increased power, but incorporating rate variation among sites (Γ) did not have a significant effect. Results were very similar when sequence length was 200 bp, although power increased 10%–15% at low levels of recombination ().
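The expected numbers of recombination events quoted above are consistent with the standard coalescent expectation E[R] = ρ Σ 1/i, summed over i = 1 to n − 1. The sketch below reproduces that arithmetic. The sample size n = 20 is an assumption inferred from the quoted figures, not a value stated in this section, and the function names are illustrative.

```python
def harmonic_sum(n):
    """Return sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def expected_recombination_events(rho, n):
    """Expected number of recombination events in the sample history,
    using the standard coalescent expectation rho * sum_{i=1}^{n-1} 1/i."""
    return rho * harmonic_sum(n)


n = 20  # assumed sample size (not given in this section)
for rho in (2, 50, 100):
    print(rho, round(expected_recombination_events(rho, n), 1))
```

Under this assumption, ρ = 2 and ρ = 100 give roughly 7 and 355 expected events, matching the values in the text.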
Simulations Under Codon Models
Simulating data under a codon model without recombination did not increase the false-positive rate of the LPT in any case. False-positive rates for sequences simulated under ω = 0.2, 1, and 5 were only 6%, 3%, and 2%, respectively.
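The false-positive rates reported for the LPT come from a permutation scheme. As a rough illustration only, and assuming the LPT follows the usual approach of shuffling the positions of segregating sites, the sketch below asks how often a shuffled arrangement scores at least as well as the observed one. The scoring function `toy_score`, the decaying association matrix, and all names are hypothetical stand-ins; the actual test scores each arrangement with the composite likelihood, which is not reproduced here.

```python
import random


def permutation_p_value(positions, score, n_perm=999, seed=1):
    """Generic permutation test: shuffle site positions and count how often
    the shuffled score is at least as large as the observed score."""
    rng = random.Random(seed)
    observed = score(positions)
    shuffled = list(positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if score(shuffled) >= observed:
            hits += 1
    # Count the observed arrangement itself as one permutation.
    return (hits + 1) / (n_perm + 1)


def make_toy_score(assoc):
    """Hypothetical score (NOT the composite likelihood): it is larger when
    strongly associated site pairs sit close together."""
    def toy_score(positions):
        total = 0.0
        k = len(positions)
        for i in range(k):
            for j in range(i + 1, k):
                total -= assoc[i][j] * abs(positions[i] - positions[j])
        return total
    return toy_score


# Toy data: association between sites decays with their separation,
# the pattern expected when recombination is present.
k = 10
assoc = [[1.0 / (1 + abs(i - j)) for j in range(k)] for i in range(k)]
positions = list(range(k))
p = permutation_p_value(positions, make_toy_score(assoc))
print(p)
```

Without recombination, site order carries no information about association, so shuffled arrangements score about as well as the observed one and the P value is not expected to be small.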
Simulations with Exponential Growth
Exponential growth resulted in a consistent underestimation of the recombination rate (). Increasing values of the population growth parameter G were associated with decreasing estimates. At the highest growth rate (G = 80), the median estimates were 0, 0.5, and 4 for the JC2, JCall, and GTRall models, respectively, when the true value of 4Nrl was 10. The power of the LPT was also strongly diminished by growth. When G = 20, power was only 21%–30%, decreasing to 8%–12% at the highest growth rate. The different estimation models did not have a clear effect on the recombination rate estimates, although for G = 40 and 80 using all alleles resulted in better estimates.
Recombination Estimation and Detection in Fluctuating Populations
Simulations with Recombination and Selection
The ability to estimate or detect recombination decreased with the presence and intensity of selection. In the absence of recombination, the estimated recombination rate was zero or close to zero even in the presence of selection (). Also, selection did not affect the otherwise conservative false-positive rate of the recombination test. When ρ = 10, selection considerably reduced the accuracy and success of the estimator, especially when selection was strong and only biallelic sites were taken into account. In this case, selection clearly reduced the power of the recombination test, from 90% in the neutral case to almost 70% and 50% for the weak and strong selection cases, respectively. All calculations above used the specific Watterson’s estimate of 4Nμ for each replicate. Interestingly, when the simulated (parametric) value of 4Nμ = 0.1 was used instead, the impact of selection on success was diminished (data not shown).
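For reference, Watterson's estimate of 4Nμ divides the number of segregating sites S by the harmonic sum over the sample size. The sketch below is a minimal version of that calculation; the input values (S = 45, n = 20, 200 bp) are hypothetical, and the per-site scaling is an assumption, since this section does not state whether 4Nμ is expressed per site or per locus.

```python
def watterson_theta(num_segregating, n):
    """Watterson's estimator for the whole locus: S / sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, n))
    return num_segregating / a_n


def watterson_theta_per_site(num_segregating, n, seq_length):
    """Same estimator rescaled by sequence length (assumed per-site scaling)."""
    return watterson_theta(num_segregating, n) / seq_length


# Hypothetical replicate: 45 segregating sites, 20 sequences, 200 bp.
print(watterson_theta(45, 20))
print(watterson_theta_per_site(45, 20, 200))
```

Because S varies from replicate to replicate, this estimate differs between replicates, whereas the parametric value is fixed by the simulation settings.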
Recombination Estimation and Detection in the Presence of Selection
Simulations Under Population Subdivision
When data were simulated under a population subdivision model with no migration, but this subdivision was ignored during the estimation procedure, there was a tendency to underestimate the simulated value of the recombination rate. This was especially true when only sites with two alleles were used. For example, when there was recombination in just one population (ρ = 10), the median estimate decreased from 8 to 4, and success from 72% to 15%, when only biallelic sites were considered. A similar effect was observed for the recombination permutation test. False positives increased and power decreased with population subdivision when only sites with two alleles were used ().
Recombination Estimation and Detection in Subdivided Populations
Simulations with Longitudinal Sampling
Sampling time had an effect on the estimation of the recombination rate. When sampling occurred as early as generation 250, the estimated recombination rate was 0 although the simulated value was 10 (), even though there were already 61 segregating sites in the sample. From generation 500 onward the estimation improved, especially if all sites were considered. The power of the recombination test also increased with an increasing number of generations completed before sampling.
Recombination Estimation and Detection in Longitudinal Samples
When sequences sampled at different time points were considered as contemporaneous and treated as a single sample, the median estimate seemed to slightly overestimate the actual value of the recombination rate (). Estimator success and power of the recombination test were a little higher when all alleles were considered than when only biallelic sites were used.
Estimates of recombination from patient 1 (P1) changed between different time points, with a single peak at time point 5 (). Most estimates were significantly different from zero (i.e., the LPT was significant), except for time points 1 and 2. In most cases, the model assumed did not have a strong effect. In patient 2 (P2), there were several nonsignificant time points interspersed between different recombination peaks. In this case the model had a stronger effect. For example, at time points 5 and 10, recombination was not detected when considering only biallelic sites, but it was inferred when considering all sites, even resulting in large estimates.
Recombination Estimation and Detection from Empirical Data Sets
When we analyzed 10 mixed samples including two randomly chosen P1 sequences from each time point, we obtained consistent estimates of the recombination rate (mean estimate JC2 = 8.4, SD JC2 = 4), but well below most of the estimates previously obtained at each point. In 8 out of the 10 mixed samples examined, the LPT was significant.
Variability Between Runs
The SD associated with 10 repetitions of the recombination rate estimate from sample time 5 of P1 was very high using the JC2 (SD = 24.88) or JCall (SD = 18.51) models, and significantly smaller (P < 0.01) with the GTRall model (SD = 5.87) (). When we repeated this experiment with samples 10 and 14 from individual P2, we obtained similar results. On the other hand, the results of the LPT seemed to be very consistent across different runs (). However, the same experiment performed with simulated data (GTR + Γ, ρ = 50, 40–50 segregating sites) resulted in much better repeatability. For JC2, the average over 100 replicates of the SD associated with 10 repetitions was 3.4 ± 1.9.
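The repeatability summary for simulated data (an SD per replicate, then an average across replicates) can be sketched as follows. The numbers are made up for illustration, and the use of the sample SD (n − 1 denominator) is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def repeatability_summary(estimates_by_replicate):
    """For each replicate, take the SD of its repeated estimates, then
    return the mean and SD of those per-replicate SDs."""
    per_replicate_sd = [stdev(runs) for runs in estimates_by_replicate]
    return mean(per_replicate_sd), stdev(per_replicate_sd)


# Hypothetical input: 3 replicates, each with 4 repeated estimates of rho.
example = [
    [48.0, 52.0, 50.0, 47.0],
    [55.0, 49.0, 51.0, 53.0],
    [44.0, 46.0, 50.0, 45.0],
]
avg_sd, sd_of_sd = repeatability_summary(example)
print(f"{avg_sd:.2f} ± {sd_of_sd:.2f}")
```

In the layout described in the text, the input would instead hold 100 replicates with 10 repeated estimates each.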
Repeatability of the Estimation Procedure
Comparison with Other Recombination Detection Methods
With low variation, and especially at low recombination rates, the LPT was significantly more powerful than any of the 14 methods evaluated by Posada and Crandall (2001b) (). At higher variation levels the LPT was tied with the best methods (). At the same time, the likelihood permutation showed a false-positive rate around 5%, and it did not seem to be influenced by increasing levels of rate heterogeneity (), whereas many of the other methods had false-positive rates above 5%. These results were independent of the estimation model used (JC2 or JCall).
Fig. 1 Relative power and false-positive rate of the LPT. Performance of the LPT under the JC2 and JCall is compared to the best results obtained by any of the 14 recombination detection methods evaluated by Posada and Crandall (2001b). (A) Power to detect recombination