|Home | About | Journals | Submit | Contact Us | Français|
The macroevolutionary consequences of recent climate change remain controversial, and there is little paleobotanical or morphological evidence that Pleistocene (1.8–0.12 Ma) glacial cycles acted as drivers of speciation, especially among lineages with long generation times, such as trees. We combined genetic and ecogeographic data from 2 closely related North American tree species, Populus balsamifera and P. trichocarpa (Salicacaeae), to determine if their divergence coincided with and was possibly caused by Pleistocene climatic events. We analyzed 32 nuclear loci from individuals of P. balsamifera and P. trichocarpa to produce coalescent-based estimates of the divergence time between the 2 species. We coupled the coalescent analyses with paleodistribution models to assess the influence of climate change on species' range. Furthermore, measures of niche overlap were used to investigate patterns of ecological differentiation between species. We estimated the divergence date of P. balsamifera and P. trichocarpa at approximately 75 Ka, which corresponds closely with the onset of Marine Isotope Stage 4 (~76 Ka) and a rapid increase in global ice volume. Significance tests of niche overlap, in conjunction with genetic estimates of migration, suggested that speciation occurred in allopatry, possibly resulting from the environmental effects of Pleistocene glacial cycles. Our results indicate that the divergence of keystone tree species, which have shaped community diversity in northern North American ecosystems, was recent and may have been a consequence of Pleistocene-era glaciation and climate change.
Despite evidence for the influence of Pleistocene (1.8–0.12 Ma) climate change on patterns of intraspecific genetic variation (Hewitt 1996), there is no consensus regarding its importance for driving speciation (Bennett 2004; Futuyma 2010). For many lineages, especially those with long generation times (e.g., trees), fossil evidence indicates macroevolutionary stasis throughout the Pleistocene (Willis and Niklas 2004; Eldredge 2005). Explanations for this stasis include the relatively short duration of climate cycles, which are insufficient to result in species-level divergence (Bennett 1990), as well as the inhibition of species-wide phenotypic evolution encountered in a geographic mosaic (Eldredge et al. 2005). Evaluation of the stasis hypothesis is incomplete however, because the paleobotanical record is unable to assess lineage splitting without coincident morphological divergence. Despite the well-known history of climate change and glaciation in North America, no clear examples of Pleistocene-age tree speciation from temperate or boreal regions have been demonstrated.
Computational, theoretical, and DNA sequencing advances of the last two decades provide the opportunity to integrate information on the timing of genome-wide divergence with patterns of historical range changes to understand past speciation events (Richards et al. 2007). Coalescent analyses of multilocus DNA sequence data allow estimation of the timing and demographic consequences of speciation, even for lineages with poor or no fossil record (Wakeley and Hey 1997). For recent divergence events, detailed estimates of species' historical distributions can also be inferred, allowing for the development of data-driven scenarios to understand processes underlying divergence (Knowles et al. 2007; Hickerson 2010).
Populus balsamifera L. and P. trichocarpa Torr & Gray are relatively short-lived (≤200 years maximum life span), fast-growing, dioecious tree species that are capable of clonal growth and very long distance gene flow (Eckenwalder 1996). The species are among the best-understood North American trees with respect to patterns of nucleotide variation and physiology (Gilchrist et al. 2006; Soolanayakanahally et al. 2009; Dillen et al. 2010; Keller et al. 2010; Olson et al. 2010). Nevertheless, their history of divergence remains underexplored, despite their morphological similarity, close phylogenetic affinity, and natural hybrid formation ( Viereck 1970; Hamzeh and Dayanandan 2004; Eckenwalder 2010), which may indicate recent speciation. Both P. balsamifera and P. trichocarpa range into high latitudes of North America and are parapatric in Alaska and Canada with little distributional overlap (Fig. 1a,1b). The fossil record of western North America indicates that antecedents (e.g., P. eotremuloides; Fig. 1b) of this lineage occupied the Pacific Northwest since at least the late Oligocene (~23 Ma) and were anatomically similar to P. trichocarpa (Axelrod 1988, 1998).
Here we investigate the timing and geography of divergence between P. balsamifera and P. trichocarpa, 2 keystone species of the boreal and Pacific temperate forests, by coupling coalescent-based divergence time estimation with modeling of historical species distributions. Integrating these approaches allows data- and model-based inferences regarding the following questions: (i) was the divergence of P. balsamifera and P. trichocarpa temporally associated with a Pleistocene-age climatic event and (ii) if so, did the divergence coincide with climatically mediated shifts in the species distribution?
Silica-dried leaf material of 6 Populus species was obtained from collectors in North America and Europe (Fig. 1a,b; online Appendix 1, available from http://www.sysbio.oxfordjournals.org/). The 2 ingroup species, P. balsamifera and P. trichocarpa, were represented by 29 individuals from 23 populations. The outgroup species (P. angustifolia James, P. deltoides Marshall, P. fremontii Watson, and P. tremula L.), each represented by 3–5 individuals (17 total), were chosen based on availability and putative phylogenetic position with respect to the ingroup (Hamzeh and Dayanandan 2004). Total genomic DNA was extracted with Dneasy plant kits (Qiagen, Valencia, CA). For each individual, we amplified up to 32 nuclear loci (each ~600 bp and arbitrarily chosen from a set of 590 loci developed by Olson et al. 2010; online Appendix 2). Polymerase chain reaction products were Sanger sequenced in both directions, and sequences were edited and aligned manually using Aligner v3.0.0 (Codon Code Corporation, Dedham, MA). All polymorphic and heterozygous sites were visually confirmed. Gene sequences from P. trichocarpa, P. tremula, P. fremontii, P. angustifolia, and P. deltoides were deposited in GenBank with accession numbers JN70676–JN707345, whereas sequences for P. balsamifera are available on GenBank from a previous study ( Keller 2010).
Per-locus estimates of genetic diversity (S, θW; Watterson 1975; π; Nei 1987; Hd; Nei 1987), neutrality (Tajima's D; Tajima 1989), and linkage disequilibrium (ZnS; Kelly 1997) were calculated for P. trichocarpa and P. balsamifera in DnaSP v5 (Rozas et al. 2003). Diversity and Tajima's D were estimated for both total sites and silent sites only. Estimates of nucleotide divergence (K; Tajima 1983) were calculated in DnaSP among all 6 of the sampled species. Distributions of statistics were compared using nonparametric Wilcoxon rank sum tests. Using the ms coalescent simulator (Hudson 2002), we tested whether per-locus estimates of Tajima's D differed from expectations under a standard neutral model by computing the distribution of 10,000 coalescent simulations using the observed per-locus estimates of θ and ρ (see Olson et al. (2010) for estimates of ρ [ρ= 2Nc, where c = recombination fraction between sites]). Similarly, we tested whether mean (across loci) estimates of D differed from standard neutral model expectations by simulating 10,000 thirty-two–locus data sets and calculating the mean Tajima's D across loci for each simulation.
For phylogenetic analyses, some individuals and loci were removed from the total data set to minimize missing data. The final phylogenetic data set consisted of 16 loci and 31 individuals that included members of P. angustifolia, P. fremontii, and P. tremula as outgroup taxa (online Appendices 1 and 2). For each of the 16 loci, data sets were constructed using phased haplotype sequences; in addition, a combined-locus data set was produced via concatenation of the individual loci. We determined the haplotype phase of diploid sequences using PHASE v2.1.1 ( Stephens 2001; Stephens and Scheet 2005), employing the general model for recombination rate variation (-MR0; Li and Stephens 2003) and a run length of 10,000 iterations (burn-in = 100; thinning interval = 10). Ninety-two percent of the haplotypes included in the phylogenetic analyses were determined at a posterior probability of >95%. The few gaps (~10) in the data set were treated as missing data. Nucleotide substitution models for maximum likelihood (ML) analyses were determined using JModelTest v0.1.1 (Guindon and Gascuel 2003; Posada 2008). The optimal models for each locus (online Appendix 1) and the concatenated data set were identified with the Akaike information criterion (AIC; Akaike 1974). ML analyses were conducted in GARLI v2.0 (Zwickl 2006; http://code.google.com/p/garli/). For each of the 17 data sets, we performed 12 replicate runs (6 with starting trees determined by stepwise addition and 6 with random starting trees) of 2 million iterations each. For ML analyses of the concatenated data set, we used a partitioned model approach, for which individual loci defined partitions. To determine branch support, 200 ML bootstrap replicates were produced in GARLI. Three tree searches were performed per replicate, each run for a length of 250,000 iterations.
The estimation of an ML species tree was performed in STEM v2.0 ( Kubatko 2009). The STEM method produces an ML estimate of the species tree from a collection of fully resolved gene trees by using the coalescent process to model levels of discordance between the gene trees and the species tree (Kubatko et al. 2009). Because STEM assumes no recombination, we used the 4-gamete test to identify intragenic recombination break points. Five loci exhibited intragenic recombination and were reduced to the longest region without observed recombination. Interspecific gene flow would also violate assumptions of the current version of STEM analysis; however, none of our loci exhibited obvious signs of introgression. STEM analyses were conducted using the 16 resolved ML gene trees estimated in GARLI and a point estimate of θ, which was calculated as the average per-site θ across the 16 included loci. We performed 3 optimized ML tree searches as well as 3 simulated annealing searches for the 15 most likely trees to ensure consistency among runs and identify any topologies that may be tied for the ML score.
The genealogical sorting index (gsi; Cummings et al. 2008) was used to quantify the degree of exclusive ancestry for P. balsamifera and P. trichocarpa from single-gene (gsi) and gene ensemble (gsiT) data sets. The gsi provides a measure of genealogical divergence on a continuous scale from 0 to 1, where 1 represents the complete monophyly of a group, and provides a useful tool for identifying boundaries between young species (Cummings et al. 2008). A permutation test is used to evaluate the null hypothesis that a gsi value equal to or greater than the empirical estimate would be observed by chance alone, assuming that the groups under consideration are not taxonomically distinct. The gsi calculations and significance testing were performed using the genealogical sorting index software (available at http://www.genealogicalsorting.org/). The single-gene ML trees, estimated in GARLI, were used as input for gsi and gsiT calculations, and P values were generated from 10,000 permutations. Because small sample sizes can bias P values, we only report gsi and gsiT results for P. balsamifera and P. trichocarpa.
We used the Bayesian model–based clustering method implemented in STRUCTURE v2.2.3 (Pritchard et al. 2000; Falush et al. 2003) to investigate patterns of genetic structure within and between P. balsamifera and P. trichocarpa. Three individuals (2 P. balsamifera individuals, 1 P. trichocarpa individual) were excluded from the analysis due to missing data. To determine the number of distinct genetic clusters in the data, we performed 10 replicate runs for each value of K (i.e., the number of demes) from 1 to 10, with no prior placed on the population of origin. The linkage model (Falush et al. 2003) with correlated allele frequencies was used for each run, which included a burn-in of 500,000 and 1,000,000 iterations of data collection. We used the program CORRSIEVE v1.6-3 ( Campana 2011) to calculate Evannos 2005 ΔK and (Campana 2011) ΔFST, as well as the average maximum Pearson's correlation coefficient (r) between all combinations of subpopulations in replicate runs. The average maximum correlation coefficient is used to assess replication reliability of the K solution, which is inferred to be stable when r is greater than or equal to a user-defined value of 0.99 ( Cockram 2008). ΔK and ΔFST are measures of second-order rate of change of the likelihood function and FST estimate, respectively, and are complementary means of identifying the optimal value for the number of demes sampled (Campana et al. 2011). The program CLUMPP ( Jakobsson 2007) was used to conduct model averaging of individual ancestry coefficients across the replicate runs.
To characterize the divergence of P. balsamifera and P. trichocarpa, we estimated the parameters of the isolation-migration (IM) model (i.e., θ1, θ2, θA, T, M12, and M21; Wakeley and Hey 1997) using the program MIMAR ( Becquet 2007). MIMAR differs from the implementation of IM by Hey and Nielsen (2004) by incorporating recombination. MIMAR uses per-locus counts of derived polymorphisms found in each of 4 classes (S1, polymorphisms unique to sample 1; S2, polymorphisms unique to sample 2; SS, polymorphisms shared between samples; SF, polymorphisms fixed between species). We used 4 outgroup species (i.e., P. tremula, P. angustifolia, P. fremontii, and P. deltoides) to infer the ancestral state of each polymorphism. Coalescent simulations were performed under a standard neutral model, which is consistent with the observed patterns of nucleotide variation in P. balsamifera and P. trichocarpa (Neiman et al. 2009; Keller et al. 2010; also see Results section).
Analyses were conducted using 32 loci and 29 ingroup individuals (16 P. balsamifera individuals, 13 P. trichocarpa individuals). The synonymous mutation rate per base pair per year was assumed to be 2.5 × 10−9 as estimated by Tuskan et al. (2006) and was input as a point estimate. We performed analyses under 2 models of migration (symmetric and asymmetric). Under both models, estimates of θ and T (time to divergence) were sampled from the following uniform prior distributions: θbalsamiferaU[0.0001, 0.005], θtrichocarpaU[0.0005, 0.005], θancestorU[0.0005, 0.005], TU[0, 150,000]. The migration prior for the symmetric model was lnU[−15, 3], and for the asymmetric model, the priors were M12 lnU[−15, 5] and M21 lnU[−15, 5]. Prior values were chosen to bracket contemporary diversity measures and maximize the step acceptance rates. We performed 3 replicate runs, each with a length of 2,500,000 iterations, generating 15 genealogies per step, and with a burn-in of 500,000. Ninety percent highest posterior density (HPD) intervals were calculated using the BOA package in R 2.6.2 (Smith 2007).
To determine which migration model provided a better fit to the observed data, we performed a goodness-of-fit test using the program MIMARGOF ( Becquet 2007). The test produces distributions of summary statistics (S1, S2, SS, SF, π, FST, and Tajima's D) calculated from the data, which are simulated on parameters sampled from the posterior distribution output by MIMAR. Estimates of the same summary statistics, calculated from the observed data, are compared with the simulated distributions to determine a predictive posterior P value. MIMAR and STRUCTURE analyses were conducted using the computational portal (https://biotech.inbre.alaska.edu/portal), maintained by the UAF Life Sciences Informatics CORE.
To infer distributional changes of P. balsamifera and P. trichocarpa since the last interglacial (LIG), we produced species distribution models (SDMs) for the present, last glacial maximum (LGM; 21 Ka), and LIG (135 Ka) time periods. Three climate layers corresponding to current, LGM, and LIG conditions were created from monthly climate data at a spatial resolution of 2.5′ (arc-minutes). Data for current climate were obtained from the WorldClim database (http://www.worldclim.org; Hijmans 2005), the LGM data were taken from the general circulation model (GCM) simulations from the Community Climate System Model (CCSM; Collins 2004) and the Model for Interdisciplinary Research on Climate (version 3.2; Hasumi and Emori 2004), and the LIG data were taken from a global, coupled ocean-atmosphere-land-sea-ice GCM (National Center for Atmospheric Research CCSM, http://www.ccsm.ucar.edu/; Kiehl and Gent 2004). The original GCM data (available at http://www.pmip2.cnrs-gif.fr) were obtained at a spatial resolution of 2.8. Climate surfaces were created by A. Townsend Peterson (University of Kansas) and followed the methods described by Waltari et al. 2007. Species occurrence data were obtained from 9 herbaria (ALA, DAV, JEPS, MO, ARIZ, ASC, ASU, WTU, and OSC) accessed through the Global Biodiversity Information Facility data portal, http://data.gbif.org. Collection coordinates were plotted in Google Earth to confirm that each occurred in reasonable locations. To reduce the effects of spatial autocorrelation in climate variables, we removed occurrence locations that were within 150 km of one another.
Current, LGM, and LIG distribution models for P. trichocarpa and P. balsamifera were generated with the maximum entropy method implemented in the program MAXENT v3.3.3a (Phillips et al. 2006; Phillips 2008). Current distribution models were developed using 7 contemporary climate variables (i.e., mean annual temperature, mean diurnal range, maximum temperature of warmest month, minimum temperature of coldest month, annual precipitation, precipitation of wettest month, and precipitation of driest month), which were chosen due to the low level of intervariable correlation, and species occurrence information. These models were then projected onto the LGM and LIG climate models to identify the potential geographic distributions of the species during glacial and interglacial periods during the middle and late Pleistocene, assuming that the species' ecological tolerances were the same in the past as they are today. To account for error in the predictive modeling approach, we generated 15 jackknife replicates with deletion of 30% of species' occurrence points for each analysis and report the variance and mean prediction across these replicates. Replicates were run for a maximum of 500 iterations with a convergence threshold of 10 − 5. The optimal L1 regularization multiplier value (β) for each species was determined using the corrected AIC as implemented in ENMTools v1.2 ( Warren 2008; Warren and Seifert 2011).
Novel climate conditions can confound the accurate projection of past and future SDMs. Therefore, we used the multivariate environmental similarity surface and most dissimilar variable plots from MAXENT to identify regions of novel climate conditions and those climate variables that were most responsible for the conditions (Elith et al. 2010). The LGM models for both species showed that large regions of the projected species' distributions (see Results section) were influenced by novel climate conditions (online Appendix 3a). These novel conditions were attributed to values of mean diurnal range (i.e., the mean of all weekly diurnal temperature ranges; online Appendix 3b) that are not presently observed within the study area. Mean diurnal range contributed 2.6% and 0% to the present-day models of P. balsamifera and P. trichocarpa, respectively. After removing this variable from the data, we generated new SDMs and observed very little change in the projected distributions of either species (online Appendix 3c), whereas the novel environmental conditions were nearly eliminated. The LIG climate models showed only a small area that was influenced by novel climate conditions, and removing these conditions (online Appendix 3e) did not substantially change the inferred distributions (online Appendix 3d,f).
To assess levels of climate niche divergence between P. balsamifera and P. trichocarpa, we first calculated the niche overlap statistic Schoener's D (Schoener 1968) for the 2 species using ENMtools. We then used the “background similarity test” in ENMtools, which employs a randomization procedure to compare the observed level of niche overlap between 2 species to null distributions of expected overlap values based on characteristics of each species' environmental background ( Warren 2008). Two null distributions were generated per test, each by comparing the distribution model of a focal species to 1000 models created by randomly drawing points from the background range of the other species. The program DIVA-GIS v7.1.6 (Hijmans et al. 2005) was used to create mask files of the species' background ranges from which to generate the random sample points. The background range of each species was determined to include any raster cell from the original SDM with a logistic probability of presence >0.1. This threshold was set as the lowest probability of presence value for an actual occurrence point of each species. In all, 8 similarity tests were performed—1 for each of the 7 climate variables and 1 with all variables included.
We re-sequenced 32 nuclear loci for both P. balsamifera (16 individuals) and P. trichocarpa (13 individuals), producing >18,000 bp per individual. Populus trichocarpa harbored significantly greater nucleotide and haplotype diversity than P. balsamifera (Wilcoxon rank sum: πsilent: z = 2.13, P = 0.033; θsilent: z = 1.99, P = 0.046; Hd: z = 1.94, P = 0.052), whereas estimates of linkage disequilibrium (ZnS) and Tajima's D (Wilcoxon rank sum—ZnS: z = 1.16, P = 0.245; Tajima's D: z = 0.104, P = 0.917) were similar in the 2 species (Table 1). Mean Tajima's D averaged across loci did not deviate significantly from neutral expectations for either P. balsamifera or P. trichocarpa with values from individual loci ranging between −1.99 and 2.49 (online Appendix 4).
Estimates of K (average number of nucleotide differences between species; Table 2) showed P. tremula to be more divergent, on average, from the 5 North American species than each of those species was to one another. In contrast, the lowest levels of divergence were found between P. trichocarpa and P. balsamifera (KS = 0.0068, KTOTAL = 0.0047).
After the data sets were reduced to minimize missing data, the complete concatenated data set comprised 9268 nucleotide characters, of which 323 (3.5%) were parsimony informative. All data matrices and trees are available at TreeBASE (http://www.treebase.org/; S11943). The best ML estimate of phylogeny provided moderate bootstrap support (72%) for the monophyly of the ingroup (i.e., P. trichocarpa and P. balsamifera) as well as for the monophyly of P. balsamifera (70%; Fig. 2a). Populus trichocarpa was recovered as paraphyletic with respect to P. balsamifera. An analysis on the concatenated data set, constrained so that both P. trichocarpa and P. balsamifera were monophyletic, produced a significantly less likely topology (Kishino–Hasegawa: P = 0.007) than the best ML estimate. STEM search replicates, using 16 ML gene trees, consistently recovered a single, identical ML species tree (Fig. 2b), which also placed P. balsamifera and P. trichocarpa together as sister species. The primary difference between the concatenated and STEM topologies was the lack of a monophyletic outgroup in the STEM tree. Estimates of gsiT for P. balsamifera (0.528) and P. trichocarpa (0.394) were moderate; however, P values indicated a significant level of genealogical divergence for both species (Table 3). For P. balsamifera, all 16 loci demonstrated significant levels of exclusive ancestry, whereas 13 loci were significant for P. trichocarpa.
STRUCTURE analyses applied to P. trichocarpa and P. balsamifera produced stable solutions for Ks between 1 and 10, and both ΔK and ΔFST supported an optimal clustering model of K = 2 (Table 4). Although all individuals in both species exhibited some loci in both clusters, all P. balsamifera individuals could be unambiguously assigned (q > 0.8) to a single cluster (Fig. 3). In P. trichocarpa, however, admixture was higher (q > 0.25) and no individual could be unambiguously assigned to a single cluster. These results were consistent with those from the genealogical sorting analyses, which indicated a lower degree of exclusive ancestry for P. trichocarpa relative to P. balsamifera.
Goodness-of-fit tests revealed that the symmetric and asymmetric migration models provided an equally good fit to the observed data (online Appendix 5). Therefore, we confine our discussion to the results of the simpler symmetric migration model. Assuming a mutation rate of 2.5 × 10−9, a generation time of t = 15 years, and symmetric migration, P. balsamifera and P. trichocarpa diverged ≈75 Ka (HPD: 35–464 Ka; Fig. 4c). The estimates of effective population size (Fig. 4a) were highest for the ancestral population (NA≈ 11,847; HPD: 6187–19,573) and somewhat smaller for P. trichocarpa (N2≈ 7943; HPD: 5231–21,667) and P. balsamifera (N1≈ 5163; HPD: 2893–8867). Finally, the estimate of migrants per generation (M; Fig. 4b) was very low (M≈ 0.44; HPD: 0–0.74).
Figure 5 shows our predictions of present and past (LGM and LIG) distributions for P. balsamifera and P. trichocarpa. Model accuracy for P. trichocarpa was relatively high (Area Under the Curve [AUC] = 0.943; standard deviation [SD] = 0.034), and the accuracy for P. balsamifera was adequate for inference (AUC = 0.728; SD = 0.021; Hosmer and Lemeshow 2000; Betts et al. 2006); model accuracy is no better than random when AUC = 0.5. Distribution models for the present were mostly congruent with the actual distributions of both species (Fig. 1a,b) but overpredicted large areas of occurrence in south central Mexico (P. trichocarpa) and southwest Alaska (P. balsamifera and P. trichocarpa). The environmental variables that most contributed to the model predictions, measured as the percentage of total model gain contributed by a given variable, were annual mean temperature (54%) for P. balsamifera and the mean temperature of the coldest month (77%) for P. trichocarpa.
Paleodistribution models indicated a history of dramatic range shifts for P. balsamifera over the last 135 kyr (Fig. 5). By contrast, P. trichocarpa's range remained relatively stable in size and location. The suitable range of P. balsamifera was predicted to be much larger during the LIG than at present and much smaller during the LGM, at which time we predict there were 2 regions of suitable habitat: central Alaska and the intermountain west. Of the inferred ranges of P. trichocarpa, the largest are from the present and the LGM periods and the smallest from the LIG. As compared with the present, the inferred LGM distribution of P. trichocarpa indicates much more suitable habitat at the southern extent of the distribution (i.e., Baja California and Mexico).
The results of a background similarity test using all 7 bioclimatic variables indicate that P. trichocarpa is more ecologically similar to P. balsamifera than would be expected based on background environmental heterogeneity (Fig. 6a). Tests performed on individual climate variables suggested that for the annual mean temperature variable (Fig. 6b), the level of niche overlap was not different than that expected by chance, whereas for mean diurnal range and the maximum temperature of the warmest month, P. balsamifera was more similar to P. trichocarpa than expected by chance (Fig. 6c,d). Populus balsamifera was significantly different from P. trichocarpa for the minimum temperature of the coldest month (Fig. 6e). Precipitation variables (Fig. 6f,g,h) showed high levels of similarity both between the SDMs (D = 0.81–0.89) and between the background regions inhabited by each species (represented by the null distributions).
Coalescent-based analyses from 32 loci estimate that the most likely time of divergence of P. balsamifera and P. trichocarpa was during the late Pleistocene ~75 Ka, corresponding closely with the onset of Marine Isotope Stage 4 (~76 Ka) and a rapid increase in global ice volume (Landais et al. 2004). This recent date provides a clear contrast to the idea of macroevolutionary stasis in tree lineages since the Pliocene (5.3–2.6 Ma; Huntley 1991). The hypothesis of uncommon Pleistocene tree speciation is based primarily on the absence of evidence for morphological diversification in the Pleistocene—which may not reflect reproductive isolation (Willis and Niklas 2004; Brochmann and Brysting 2008). Already in a few cases of herbaceous (e.g., Pritzelago, Anthyllis, and Reseda; Kadereit et al. 2004; Martin-Bravo et al. 2010) and shrubby plant lineages (Inga; Richardson et al. 2001), speciation has been shown to have coincided with the climatic and physiographic processes of the Ice Age.
The precision of our estimates of divergence time is limited by the inherent variance in the coalescent process (Wakeley 2008), and their accuracy is dependent on the estimate of mutation rate in Populus. Nonetheless, the overall similarity between the P. balsamifera and P. trichocarpa genomes suggests a very recent divergence. The synonymous mutation rate for Populus, 2.5 × 10− 9 per site per year, used in this study was taken from Tuskan et al. (2006); the authors estimated the rate as one-sixth that of Arabidopsis (i.e., 1.5 × 10− 8; Koch et al. 2000), based on the ratio of a Populus/Salix divergence estimate derived from KS-converted mutation rates (8–13 Ma; Sterck 2005) to the same estimate derived from the fossil record (60–65 Ma; Eckenwalder 1996). Recently, Ossowski et al. (2010) have estimated an overall mutation rate for Arabidopsis of 5.9 × 10−9 per site per generation. Applying the KS conversion of Sterck et al. (2005) (Populus/Salix divergence ≈ 20 Ma) and, subsequently, the fossil record adjustment of Tuskan et al. (2006) (20 Ma divided by 60–65 Ma ≈ 1/3) to this estimate produces a Populus mutation rate estimate of 2 × 10−9. Because coalescent time should scale linearly with mutation rate (assuming no changes in population size; Wakeley 2008), a mutation rate discrepancy of the magnitude suggested by the results of (Ossowski 2010) would not produce a divergence time estimate for P. balsamifera and P. trichocarpa that predates the Pleistocene. The accurate determination of generation time may also affect correct divergence date estimation. In our study, however, because generation time was used to scale the per-year mutation rate as well as the Tgenerations estimate from MIMAR, our divergence estimate was independent of generation time. By contrast, our estimates of effective population size (Ne) are dependent on the generation time, meaning that underestimates of generation time will result in inflated estimates Ne.
By understanding the geographic context of P. balsamifera and P. trichocarpa divergence, we sought to elucidate the evolutionary processes that contributed to it. The standard view of Pleistocene speciation invokes an allopatric model of divergence as physical or ecological barriers restrict gene flow between isolated conspecific populations (Hewitt 1996; Knowles 2000). If speciation of P. balsamifera and P. trichocarpa occurred allopatrically, then pre- and post-zygotic isolation could remain weak (Via 2009). Hybridization between these two species is common where their current ranges are in close proximity ( Viereck 1970), indicating that neither the pre- nor the post-zygotic isolating barriers are strong. In addition, patterns of population structure across both species indicate that although individual genomes cluster primarily with other individuals in the same species, there is also evidence of admixture, especially in P. trichocarpa (Fig. 3).
Estimates of niche overlap between species that are not identically distributed can be influenced by background environmental conditions not shared by both species' ranges (Warren et al. 2008). To account for the effects of dissimilar environmental backgrounds, we performed background similarity tests. The null distributions created by this procedure (shown in Fig. 6) represent the level of niche overlap expected if the overlap between species is only explained by differences in available habitat (Warren et al. 2008). An observed niche overlap value outside of the 95% confidence interval of the null distribution is interpreted as elevated ecological similarity or divergence relative to that expected by chance. Our results indicated a significant level of ecological similarity between the P. balsamifera and P. trichocarpa (Fig. 6), suggesting that, with the possible exception of cold tolerance (Fig. 6e), they have not diverged across measured niche axes.
Our low migration rate estimates between P. trichocarpa and P. balsamifera (M≈ 0.44) indicate that gene flow between species is low, with migrants entering the population at approximately half the rate of new mutations. Today, these species hybridize in regions of range overlap; however, historical levels of gene flow must have been sufficiently low for their independent evolution and divergence. We recognize that caution must be exercised, however, in interpreting migration rates produced by IM-type models, which are estimated by integration across time ( Becquet 2009).
Fossil evidence of a potential ancestor to P. balsamifera and P. trichocarpa occurs along the northwest coast of North America (Fig. 1b). During glacial periods, plants in this region were restricted to coastal or southern refugia and became isolated from populations existing further to the east in the Rocky Mountains or central Alaska (Fig. 1c; Brunsfeld et al. 2001). Both our SDM results (Fig. 5b) and the genetic analyses of Keller et al. (2010) provide evidence for refugial populations of P. balsamifera in unglaciated regions of the intermountain west. The isolation of these allopatric populations likely reduced gene flow resulting in independent evolutionary lineages that have diverged due to drift or local adaptation to low-temperature regimes (Fig. 6e). Although our SDM analyses indicate that the suitable range edges of P. balsamifera and P. trichocarpa remained in relatively close proximity (<50 km) throughout the last 135 kyr, measures of straight-line distance do not account for the existence of physical barriers to gene flow (e.g., glacier ice, mountain ranges).
An Ice Age scenario of allopatric speciation by founder effect is consistent with the recent divergence time, inference of historical geography, and paraphyletic pattern recovered from our analyses (Figs. 2a, 4, and 5; Rieseberg 1994; Harrison 1998). Paraphyly is a predicted pattern of relationship early during speciation when insufficient time has passed since divergence for monophyly to arise in the source population, in this case P. trichcocarpa. Nevertheless, measures of genealogical sorting (gsi; Table 3) suggest that, despite incomplete lineage sorting, the divergence process has been sufficient for the development of taxonomic distinctiveness between P. trichocarpa and P. balsamifera. We note that because STEM analyses are focused on identifying species-level relationships (Kubatko et al. 2009), visualization of the results from this analysis did not allow an assessment of monophyly for the species. Without also including a concatenated gene-tree approach, the paraphyly of P. trichocarpa with respect to P. balsamifera would not have been revealed.
This project was funded by the National Science Foundation Plant Genome Award DBI-0701911. The UAF Life Sciences Informatics CORE is supported by grant number RR016466 from the National Center for Research Resources, a component of National Institutes of Health.
The authors thank A. Black, W. Fertig, A. Breen, A. Robertson, P. Ingvarsson, E. Etherington, K. Smikrud, J. Kiser, S. Silim, and W. R. Schroeder of the Agroforestry Division of Agriculture and Agri-Food Canada in Indian Head, Saskatchewan, for tissue donations. We thank J. Sellner and R. Jay for helping in the collection of sequence data. We also thank A. T. Peterson for providing environmental data layers for the development of SDMs, as well as D. Warren and S. Keller for helpful discussion.