Understanding the genetic mechanisms of speciation and basis of species differences is among the most important challenges in evolutionary biology. Two questions of particular interest are what roles divergent selection and chromosomal differentiation play in these processes. A number of recently proposed theories argue that chromosomal rearrangements can facilitate the development and maintenance of reproductive isolation and species differences by suppressing recombination within rearranged regions. Reduced recombination permits the accumulation of alleles contributing to isolation and adaptive differentiation and protects existing differences from the homogenizing effects of introgression between incipient species. Here, we examine patterns of genetic diversity and divergence in rearranged versus collinear regions in two widespread, extensively hybridizing sunflower species, Helianthus annuus and Helianthus petiolaris, using sequence data from 77 loci distributed throughout the genomes of the two species. We find weak evidence for increased genetic divergence near chromosomal break points but not within rearranged regions overall. We find no evidence for increased rates of adaptive divergence on rearranged chromosomes; in fact, collinear chromosomes show a far greater excess of fixed amino acid differences between the two species. A comparison with a third sunflower species indicates that much of the nonsynonymous divergence between H. annuus and H. petiolaris probably occurred during or soon after their formation. Our results suggest a limited role for chromosomal rearrangements in genetic divergence, but they do document substantial adaptive divergence and provide further evidence of how species integrity and genetic identity can be maintained at many loci in the face of extensive hybridization and gene flow.
The roles of divergent selection in speciation (Coyne and Orr 2004) and in the development and maintenance of species differences (Rieseberg et al. 2004) have been long-standing questions in evolutionary biology. As more molecular data become available, it is becoming increasingly clear that positive selection has played a major role in both molecular and phenotypic evolutionary change (Rieseberg et al. 2002; Eyre-Walker 2006). Many examples of positive selection apparently acting on individual genes or gene families have been documented (Nei and references therein). Interest in documenting positive selection within species and adaptive divergence between species is motivated not only by the ongoing debate about the relative roles of neutral and selective processes in evolution but also by the promise that identifying genes under divergent selection between species can help us to understand both gene function and the nature of adaptive phenotypic differences (Steiner et al. 2007; Barrett et al. 2008).
As more and more genomic data, including dozens to hundreds of unlinked markers (Storz 2005) or complete genome sequences (Kosiol et al. 2008), have become available for an increasing number of species, there has been increased interest in the degree to which interspecific patterns of genetic variation reflect widespread positive selection throughout the genomes of related species. By comparing the distribution of genetic divergence or evolutionary rate measures to a neutral model, outliers undergoing divergent or balancing selection can be identified (Lewontin and Krakauer 1973; Beaumont and Balding 2004).
In chromosomally divergent species, a related question concerning the role of chromosomal rearrangements in creating and maintaining species boundaries has been a topic of debate for close to a century (Sturtevant 1938; White 1978; Ayala and Coluzzi 2005). Traditionally, chromosomal rearrangements have been thought to contribute to speciation (White 1978; King 1993) or to the maintenance of species identities in hybridizing species (Barton and Bengtsson 1986; Levin 2002) via underdominant fitness effects associated with meiotic abnormalities and the creation of unbalanced gametes in chromosomal heterozygotes. However, this view of the role of chromosomal rearrangements in speciation and species boundaries suffers from significant theoretical and empirical difficulties (Rieseberg 2001). Rearrangements must be strongly underdominant for them to effectively reduce gene flow between incipient species, but such rearrangements are very unlikely to be fixed within either species, except under very restrictive conditions involving extremely small effective population sizes or strong meiotic drive (Wright 1940, 1941; Walsh 1982; Lande 1985). More weakly underdominant rearrangements are considerably easier to fix within species, but will also contribute less to reduced fitness of hybrids and thus isolation between species. (This theoretical difficulty may be partially overcome by invoking a cascade model in which numerous rearrangements that are weakly underdominant individually are strongly underdominant in concert; White 1978.) In addition, the fitness effects of rearrangements can be quite variable, and in many cases little reduction in fertility is seen (Sites and Moritz 1987; Coyne et al. 1991, 1993; Davisson and Akeson 1993).
Recently, a number of related models have been proposed in which chromosomal rearrangements can reduce gene flow and potentially contribute to speciation indirectly through the suppression of recombination rather than directly through underdominance. Rieseberg (2001) suggested that if a gene contributing to reproductive isolation between species is located within a chromosomal rearrangement, the isolating effects of that gene could extend much farther along the chromosome due to recombination suppression. Depending on the number of rearrangements by which species differ and how strongly recombination is suppressed within them, relatively few isolation genes could effectively block introgression for a significant portion of the genome. This model was partly motivated by evidence that pollen sterility quantitative trait loci (QTL) may cluster near chromosomal break points in sunflowers (although some sterility is probably due to the rearrangements themselves; Gardner et al. 2000; Lai, Nakazato, et al. 2005) and that introgression was suppressed across much larger linkage blocks in rearranged regions versus collinear regions (Rieseberg et al. 1999). Noor et al. (2001) proposed a closely related, nonmutually exclusive model in which asymmetrically acting isolation alleles, which would normally be removed from the incipient species by selection, may be maintained if such alleles from both species are found within the same chromosomal inversion. If these isolation alleles are prevented from recombining, neither allele or arrangement conferring sterility in the alternate genetic background can be removed by selection and neither arrangement can introgress across the incipient species boundary. This model is supported by both empirical data and theory. 
A number of asymmetric incompatibilities have been documented (Coyne and Orr 1989; Palopoli and Wu 1994), and loci conferring reproductive isolation between Drosophila pseudoobscura and Drosophila persimilis are located almost exclusively in genomic regions that show fixed chromosomal differences between the two species (Orr 1987; Noor et al. 2001a and 2001b). Navarro and Barton (2003a) modeled the effects of chromosomal rearrangements between hybridizing species and found that they could greatly facilitate the accumulation of incompatibility alleles. Kirkpatrick and Barton (2006) suggested a different mechanism by which adaptive species-specific differences may be preferentially found in inverted regions—if two or more alleles conferring fitness benefits in a particular environment or genetic background are captured within the same inversion, they become tightly linked due to recombination suppression within the inversion; such inversions can increase rapidly in frequency due to the cumulative effects of multiple adaptive alleles.
A common prediction of these recombination suppression models is that interspecific gene flow will be reduced across rearranged chromosomal regions. In species pairs with a history of gene flow, this should facilitate local adaptation and the preferential accumulation of genetic differences (including hybrid incompatibilities) on rearranged chromosomes (Noor et al. 2001b; Rieseberg 2001; Navarro and Barton 2003a; Kirkpatrick and Barton 2006). The widespread annual sunflowers of the genus Helianthus are ideal for testing predictions about the roles of chromosomal rearrangements and divergent selection in the maintenance of species integrity despite ongoing gene flow. Helianthus annuus and Helianthus petiolaris are strongly differentiated in karyotype (Burke et al. 2004; Lai, Nakazato, et al. 2005), morphology (Rosenthal et al. 2002, 2005), and ecological preference, most notably with respect to soil moisture and salt content (Heiser 1947; Gross et al. 2004; Karrenberg et al. 2006). At the same time, they hybridize extensively throughout their shared range, have high rates of complex backcross production (Rieseberg et al. 1999), and have long-term rates of gene flow of approximately Nefm = 0.5 in each direction (Strasburg and Rieseberg 2008).
Here, we compare patterns of sequence diversity and differentiation between H. annuus and H. petiolaris at 77 loci distributed across collinear and rearranged chromosomes to test for correlations between chromosomal rearrangements and patterns of divergence and to examine patterns of natural selection across the two species’ genomes. We also compare each of these two species to a third, historically allopatric species, Helianthus argophyllus, at a subset of these loci in order to address the degree to which long-term sympatry and hybridization have affected patterns of genetic differentiation among these species.
Achenes were collected from four H. annuus and three H. petiolaris populations distributed across the sympatric area of their distribution (fig. 1; locality information given in Yatabe et al.). Only locally allopatric populations were sampled to exclude early generation hybrids between the two species. A fourth H. petiolaris population was sampled but later determined to be of hybrid origin based on sequence and flow cytometry data (not shown); this population was excluded from all analyses. Achenes were also collected from three H. argophyllus populations. Achenes were germinated in greenhouses at Indiana University. DNA was extracted from leaf tissue for population genetic analyses from two individuals per population; in total, eight H. annuus and six H. petiolaris individuals were sequenced. For H. argophyllus, leaf tissue was collected from two individuals each from three populations (fig. 1). All DNA extractions were performed using a QIAGEN DNeasy Plant kit (Qiagen, Valencia, CA).
A transcript map was recently generated for sunflower based on the mapping of single-nucleotide polymorphisms (SNPs) for 243 expressed sequence tags (ESTs; Lai, Livingstone, et al. 2005). In the present paper, 139 mapped ESTs distributed across three collinear linkage groups (1, 3, and 10) and five rearranged chromosomes (12, 13, 14, 16, and 17) were tested for single band amplification in H. annuus and H. petiolaris. Of these, 92 ESTs were selected for sequencing tests and finally 77 were retained for the full analysis between these two species (33 collinear and 44 rearranged). Relationships among the five rearranged chromosomes between H. annuus and H. petiolaris that were included in this study, as well as the locations of all 77 loci, are shown in figure 2. Twenty-two of these loci were also sequenced in H. argophyllus. Sequencing reactions were performed on polymerase chain reaction (PCR) products previously cleaned using ExoSAP-IT (USB, Cleveland, Ohio). Sequencing reactions were carried out in a total volume of 10 μl containing 2 μl of water, 3 μl of 5 mM MgCl2, 2 μl of 2 μM primer, 2 μl of cleaned-up PCR product (equivalent to 10 to 20 ng of DNA), and 1 μl of ABI BigDye version 3.1. Sequencing reactions were then purified with the magnetic bead CleanSeq kit from Agencourt and loaded on an ABI 3730 capillary sequencer (Applied Biosystems, Foster City, CA).
Sequences were aligned using CodonCode Aligner version 1.2.0 (CodonCode Corporation, Dedham, MA) or Sequencher version 4.7 (Gene Codes Corporation, Ann Arbor, MI). PCR products from individuals heterozygous for indels were cloned using the TOPO TA cloning kit with DH5α-T1R One Shot chemically competent cells (Invitrogen, Carlsbad, CA). Plasmid DNA was then isolated using QIAprep Miniprep (Qiagen). Sequencing reactions were performed directly on purified plasmids using the same protocol as for PCR products. All sequences have been submitted to GenBank; accession numbers are given in supplementary table S1 (Supplementary Material online).
Because PCR products were sequenced directly, it was not always possible to infer allelic phase. In addition, phase was not applicable for the concatenated sequences. Thus, all analyses considered genotypic rather than haplotypic data. We have included some measures of uncorrected average pairwise sequence diversity and divergence based on arbitrarily phased SNPs; these measures are not affected by SNP phase. We have taken care not to use measures involving full haplotypic data or corrected sequence diversity/divergence, which would be affected by SNP phase. Because EST sequences were short (<500 bp), most calculations and tests were performed for each EST separately and for concatenated sequences of ESTs from collinear and rearranged linkage groups or of all ESTs together. DnaSP 4.50.1 software (Rozas et al. 2003) was used to calculate various measures of sequence diversity and divergence, including gross and net divergence between species (net divergence = gross divergence − average diversity within each species) as well as Pi(a)/Pi(s) and Ka/Ks ratios (of nonsynonymous to synonymous intraspecific diversity and interspecific divergence, respectively) and to perform McDonald–Kreitman (MK) tests (McDonald and Kreitman 1991). The total number of synonymous and nonsynonymous sites for a set of sequences was estimated by averaging the number of synonymous and nonsynonymous sites over all sequences when more than two sequences were compared. We attempted to correct for a possible loss of power in the MK test due to weakly deleterious variants possibly segregating at low frequency within each species (Fay et al. 2001) by performing the test on the entire data set, on only polymorphic sites with common variants (minor allele frequency >2/28) and on only polymorphic sites with rare variants (minor allele frequency ≤2/28). For sites with more than two alleles segregating, the frequency of the rarest allele was used. 
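The divergence bookkeeping described above can be illustrated with a small sketch. It is not DnaSP's implementation: the function names and the toy sequences are hypothetical, the sequences are assumed to be already aligned and arbitrarily phased, and gaps and ambiguity codes are ignored. Net divergence is taken, as defined in the text, to be gross divergence minus the average of the two within-species diversities.

```python
from itertools import combinations, product

def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def diversity(seqs):
    """Average pairwise p-distance within one species (uncorrected diversity)."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def gross_divergence(seqs_a, seqs_b):
    """Average p-distance over all between-species sequence pairs."""
    pairs = list(product(seqs_a, seqs_b))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def net_divergence(seqs_a, seqs_b):
    """Gross divergence minus the mean of the two within-species diversities."""
    within = (diversity(seqs_a) + diversity(seqs_b)) / 2
    return gross_divergence(seqs_a, seqs_b) - within

# Hypothetical 10-site alignment, two sequences per species (illustration only).
species_a = ["AAAAAAAAAA", "AAAAAAAAAT"]
species_b = ["AAAAAAAACC", "AAAAAAAACC"]

print(gross_divergence(species_a, species_b))  # gross divergence
print(net_divergence(species_a, species_b))    # net divergence
```

In this toy case one species is polymorphic at a single site, so net divergence falls below gross divergence by half of that species' diversity.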
We also attempted to correct for bias in the MK test due to past changes in selective constraints (due, e.g., to the significant increase in effective population size H. annuus and H. petiolaris have undergone since their initial divergence; Strasburg and Rieseberg 2008) by comparing rates of amino acid divergence above neutral expectations across loci. If lower selective constraints in the past due to smaller historical effective population sizes are the cause of the excess amino acid divergence, then those demographic changes should affect the entire genome, whereas positive selection is only expected to affect a subset of genes (Fay et al. 2002). We estimated the excess of amino acid differences between species over neutral expectations as the number of amino acid differences minus the number of silent differences times the ratio of amino acid to silent polymorphisms, Da − Ds · (Pa/Ps), after Fay et al. (2001).
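The excess statistic just defined is simple arithmetic on the four MK counts, as the following minimal sketch shows. The function name is hypothetical and the counts in the usage lines are illustrative values, not the entries of table 4. The final ratio, excess divided by Da, is the proportion of amino acid differences in excess of the neutral expectation.

```python
def mk_excess(da, ds, pa, ps):
    """Excess amino acid differences over the neutral expectation.

    da, ds: fixed amino acid and silent differences between species
    pa, ps: amino acid and silent polymorphisms within species
    Returns (expected, excess, fraction), with excess = Da - Ds * (Pa / Ps).
    """
    if ps == 0:
        raise ValueError("silent polymorphism count must be nonzero")
    expected = ds * (pa / ps)   # amino acid differences expected under neutrality
    excess = da - expected      # Da - Ds * (Pa / Ps)
    fraction = excess / da if da else float("nan")
    return expected, excess, fraction

# Hypothetical counts for illustration only (not the values in table 4).
expected, excess, fraction = mk_excess(da=10, ds=20, pa=50, ps=400)
print(expected, excess, fraction)
```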
Autocorrelation of genetic differentiation among loci along each linkage group was examined using the program SPAGeDi (Hardy and Vekemans 2002). Data from 99 mapped microsatellite loci (Yatabe et al. 2007) were also included, and genetic differentiation was measured using the γ′ST and G′ST (Hedrick 2005) statistics for sequence and microsatellite data, respectively. Following Scotti-Saintagne et al. (2004), we tested for correlations between genetic differentiation and map distance using Moran's index (Sokal and Oden 1978); significance was tested using 20,000 permutations. A genome scan for divergent selection on individual SNPs was performed using the program BayesFst (Beaumont and Balding 2004), with all populations within a species combined and independent normal priors for all locus parameters. Each analysis was run for 3.2 million steps following a 640,000 step burn-in; 10,000 values were used to simulate the posterior distribution. Output from BayesFst was analyzed using the CODA package of R version 2.8.0 (http://www.r-project.org/), using a one-tailed 0.05 cutoff for sites showing evidence of divergent selection. Four runs with separate random number seeds gave identical results for each species pair.
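For readers unfamiliar with Moran's index, the sketch below computes it for one map-distance class with a permutation test. This is a generic illustration and not SPAGeDi's implementation: the binary weighting (1 for locus pairs whose map distance falls in the class, 0 otherwise), the one-sided permutation scheme, the function names, and the toy data are all assumptions made here.

```python
import random

def morans_i(values, positions, lo, hi):
    """Moran's I with binary weights: 1 if lo <= |map distance| < hi, else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    cross = 0.0
    n_weights = 0
    for i in range(n):
        for j in range(n):
            if i != j and lo <= abs(positions[i] - positions[j]) < hi:
                cross += dev[i] * dev[j]
                n_weights += 1
    if n_weights == 0 or denom == 0:
        return None  # undefined: no locus pairs in this class, or no variance
    return (n / n_weights) * (cross / denom)

def permutation_p(values, positions, lo, hi, n_perm=999, seed=1):
    """One-sided permutation P value: shuffle values across map positions."""
    rng = random.Random(seed)
    observed = morans_i(values, positions, lo, hi)
    if observed is None:
        return None, None
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, positions, lo, hi) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical differentiation values at five loci spaced 5 cM apart.
diff = [0.1, 0.2, 0.3, 0.4, 0.5]
cm = [0, 5, 10, 15, 20]
obs, p = permutation_p(diff, cm, lo=0, hi=7.5)
print(obs, p)
```

Values near zero indicate no autocorrelation within the distance class; positive values indicate that neighbouring loci tend to show similar differentiation.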
Seventy-seven ESTs were sequenced in eight H. annuus individuals (two individuals per population from four populations) and six H. petiolaris individuals (two individuals per population from three populations). Among the ESTs, 17 were composed entirely of noncoding regions, 37 were composed entirely of coding regions, and 23 contained both. The average aligned sequence length was 154 bp, with a minimum of 53 bp and a maximum of 487 bp. The total aligned length of the concatenated sequences was 11,869 bp (4,052 bp and 7,817 bp for loci on collinear and rearranged chromosomes, respectively). Twenty-two of these loci were also sequenced in six H. argophyllus individuals (two individuals per population from three populations); the total aligned length of these 22 loci was 5,085 bp. Additional basic information and summary statistics for each locus individually are given in supplementary table S2 (Supplementary Material online).
Within-species diversity data are given in table 1. For both H. annuus and H. petiolaris, overall diversity is lower on collinear chromosomes than on rearranged chromosomes, although differences are fairly small. Both of these species have far more genetic diversity than H. argophyllus, as expected based on their respective ranges and on previous work (Strasburg and Rieseberg 2008, forthcoming).
Measures of genetic distance among populations within H. annuus and H. petiolaris and between the two species are given in table 2. Considering all 77 loci together, the genetic distances between populations of the same species are very similar based on Nei's (1982) γST genetic distance (values range from 0.22 to 0.26) and on net sequence divergence (0.15–0.27%). Distances between populations of different species are also fairly consistent across population pairs, ranging from γST of 0.41 to 0.47 and net sequence divergence of 0.78–0.98%. These results are consistent with previous studies showing high levels of gene flow and relatively little genetic structure within each species (Schwarzbach and Rieseberg 2002; Gross et al. 2003; Strasburg and Rieseberg 2008). When loci on collinear and rearranged chromosomes are considered separately, there is somewhat more variation in comparisons among individual populations, but no consistent trend toward increased divergence for either class of loci.
Sequence divergence among species pairs overall is given in table 3. Gross divergence between H. annuus and H. petiolaris for loci on rearranged chromosomes is somewhat higher than for loci on collinear chromosomes (although not significantly so), but this difference largely disappears in the comparison of net divergence, which takes into account diversity within each species. Interestingly, net sequence divergence between H. annuus and H. argophyllus is higher than between H. annuus and H. petiolaris, despite the fact that the former species pair is roughly 40% younger than the latter pair (1.1 vs. 1.8 My—Strasburg and Rieseberg, forthcoming). These results are consistent with genome-wide microsatellite divergence data showing H. annuus and H. petiolaris, which are broadly sympatric, to be more similar to each other than are H. annuus and H. argophyllus, which have historically been allopatric (Yatabe et al. 2007). Helianthus annuus and H. argophyllus have come into contact within the past ~100 years and are exchanging genetic material in their area of range overlap (Strasburg and Rieseberg, forthcoming), but the H. annuus populations sampled here are well outside the H. argophyllus range (see fig. 1).
Over the 11,869 bp distributed across eight chromosomes, only 26 fixed nucleotide differences (plus one fixed indel) were found (table 3) between H. annuus and H. petiolaris. An additional 649 mutations were polymorphic in one species and monomorphic in the other and 113 mutations were shared by the two species. Close to three times more fixed differences are found on rearranged chromosomes than collinear chromosomes, but the total length of sequence from rearranged chromosomes is almost twice as high, and the ratio of fixed differences to sequence length is not significantly different between the two groups (χ2 = 0.32, degrees of freedom [df] = 1, P = 0.44).
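The comparison of fixed differences with sequence length can be sketched as a goodness-of-fit test. The text does not state the exact formulation, so the continuity-corrected (Yates) chi-square used here is an assumption; it is chosen because it gives a statistic close to the reported 0.32. The counts of 7 and 19 fixed nucleotide differences on collinear and rearranged chromosomes are those given later in the results, and the function name is hypothetical.

```python
def yates_chi2(observed, weights):
    """Continuity-corrected chi-square for counts against proportional expectations.

    observed: counts per class (e.g., fixed differences per chromosome class)
    weights:  quantities that set the expected proportions (e.g., sequence lengths)
    """
    total = sum(observed)
    weight_sum = sum(weights)
    chi2 = 0.0
    for obs, w in zip(observed, weights):
        expected = total * w / weight_sum
        corrected = max(abs(obs - expected) - 0.5, 0.0)  # Yates correction
        chi2 += corrected ** 2 / expected
    return chi2

# Fixed nucleotide differences (collinear, rearranged) against aligned lengths in bp.
chi2 = yates_chi2([7, 19], [4052, 7817])
print(round(chi2, 2))  # 0.32
```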
The distributions of gross and net sequence divergence values by locus are shown in figure 3. Loci found on collinear and rearranged chromosomes are not significantly different from each other for either average gross (one-tailed t-test, t = 0.65, df = 75, P = 0.27) or net (t = 0.11, df = 75, P = 0.46) divergence. Considering rearranged chromosomes only, nine loci map to regions within three inversions that distinguish the two species (see fig. 2), where effective recombination suppression is likely to be high, whereas 35 loci are outside of inversions, including areas within large translocations, where recombination suppression may be fairly limited except near break points (Grant 1975). Net sequence divergence is very similar for loci within inversions and loci outside of inversions (0.69% vs. 0.66%, t = 0.09, df = 42, P = 0.47). If a single dramatic outlier, HT080 (see fig. 3C), is excluded, loci within inversions show substantially lower divergence, and the difference is marginally significant (0.39% vs. 0.66%, t = 1.64, df = 41, P = 0.052). Sample sizes are obviously small here, but there does not appear to be any effect of inversions generally on increasing genetic divergence between the two species.
When we focus on loci that are closely linked (within 5 cM) to a chromosomal break point, where recombination suppression is expected to be highest, a marginally significant trend arises, in which the 18 loci near break points have greater net sequence divergence than the 26 loci on rearranged chromosomes but not near break points (0.84% vs. 0.55%, t = 1.44, df = 42, P = 0.071). In this case, if the outlier, HT080 (see fig. 3D), is excluded, the difference becomes significant (0.84% vs. 0.45%, t = 2.35, df = 41, P = 0.014). If there is an effect of rearrangements on genetic divergence, it appears to be localized to areas very near break points.
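The degrees of freedom reported for these comparisons (for example, df = 42 for 18 versus 26 loci) are what a pooled-variance two-sample t-test would give, although the text does not name the variant used. A minimal sketch of that test statistic follows; the function name and the per-locus values are hypothetical, since the locus-level divergences are given only in the supplementary material.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df

# Hypothetical net divergence values (%) for loci near and far from break points.
near = [0.9, 1.4, 0.2, 1.1, 0.6]
far = [0.4, 0.7, 0.1, 0.9, 0.3, 0.5]
t, df = pooled_t(near, far)
print(t, df)
```

A one-tailed P value would then be read from the t distribution with the returned degrees of freedom.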
For both collinear and rearranged genes, the majority show virtually no differentiation (net sequence divergence less than 0.5%), with a long tail of more strongly differentiated genes. This pattern is consistent with expectations under fairly high gene flow with diversifying selection at a number of loci (Le Corre and Kremer 2003; Latta 2004), as is the case with these two species. The farthest outlier for both gross and net divergence, HT123, is an unknown protein found on a collinear chromosome. Another outlier for net divergence, HT080, is found on a rearranged chromosome and contains 3 of the 27 fixed differences between the species (see below).
No autocorrelations were observed in levels of genetic differentiation along linkage groups for any map distance class (supplementary fig. S1, Supplementary Material online). This pattern holds true when all loci are analyzed together or when collinear and rearranged chromosomes are considered separately. It also holds when sequence data are analyzed alone or with the previously published microsatellite data (Yatabe et al. 2007).
Results of MK tests are given in table 4. There is a dramatic excess of fixed amino acid differences between H. annuus and H. petiolaris compared with nonsynonymous polymorphism within each species when standardized by silent fixed differences/polymorphism, and this excess is largely explained by loci found on collinear chromosomes—half of all fixed amino acid differences are found on collinear chromosomes, compared with just 1/8 of all fixed synonymous or noncoding sequence differences. This pattern holds when only common polymorphisms or only rare polymorphisms are considered, indicating that it is not significantly affected by slightly deleterious nonsynonymous mutations segregating at low frequencies in either species (in fact, patterns of excess amino acid divergence are very slightly higher when only rare polymorphisms are considered). The silent substitutions in table 4 include both synonymous substitutions in protein-coding regions as well as substitutions in noncoding regions. MK tests restricted to coding regions are qualitatively similar, with the same set of comparisons showing significant differences in patterns of amino acid divergence versus polymorphism (supplementary table S3, Supplementary Material online). Taking all sites together, the excess of amino acid differences between H. annuus and H. petiolaris over neutral expectations is 7.5 or 75% of amino acid differences. Excess amino acid differences for loci on collinear and rearranged chromosomes are 4.6 and 2.9, respectively, despite the fact that rearranged loci contain almost 75% more coding sequence (5,138 vs. 2,948 bp).
Helianthus annuus and H. argophyllus show no evidence of divergence due to natural selection based on MK tests. In contrast, H. petiolaris and H. argophyllus show a significant excess of amino acid divergence. Ratios of nonsynonymous to silent divergence are virtually identical for H. annuus/H. petiolaris and H. petiolaris/H. argophyllus; the ratio of nonsynonymous to silent diversity is somewhat higher for H. petiolaris/H. argophyllus but not significantly so.
Plots of synonymous versus nonsynonymous divergence and diversity values for loci containing coding regions (25 and 35 loci on collinear and rearranged chromosomes, respectively) for H. annuus and H. petiolaris are shown in figure 4. Diversity values are averages between the two species. As expected, the large majority of loci have Ka/Ks or Pi(a)/Pi(s) ratios less than one, indicating a prominent role for purifying selection; half of all protein-coding loci show no amino acid divergence and more than half show no amino acid polymorphism. Ratios for individual loci must be interpreted carefully given their short length, but there are no obvious differences in the distribution of nonsynonymous to synonymous divergence or diversity ratios for loci in collinear versus rearranged chromosomes. Excluding loci with no synonymous divergence, the unweighted average Ka/Ks ratio for collinear loci, 0.13, is not significantly different from the average for rearranged loci, 0.10 (t = 0.51, df = 53, P = 0.59). Likewise, Pi(a)/Pi(s) ratios are not significantly different (0.08 for collinear loci, 0.09 for rearranged loci; t = 0.36, df = 53, P = 0.76). However, when concatenated sequences are considered together, Ka/Ks is almost twice as high for collinear loci as for rearranged loci (0.18 vs. 0.10), whereas Pi(a)/Pi(s) is 0.10 for both classes.
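An unweighted average of per-locus ratios and a ratio computed on concatenated sequence can differ for purely arithmetic reasons. One possible contributor is that loci with no synonymous divergence drop out of the per-locus average but still contribute nonsynonymous changes to the concatenated estimate. The sketch below illustrates this with hypothetical counts; the function names are assumptions, the rates are uncorrected proportions, and nothing here is taken from the data set.

```python
def per_locus_mean_ratio(loci):
    """Unweighted mean of Ka/Ks over loci with nonzero synonymous divergence."""
    ratios = []
    for nonsyn_subs, nonsyn_sites, syn_subs, syn_sites in loci:
        ka = nonsyn_subs / nonsyn_sites
        ks = syn_subs / syn_sites
        if ks > 0:  # loci with Ks = 0 are excluded, as in the text
            ratios.append(ka / ks)
    return sum(ratios) / len(ratios)

def concatenated_ratio(loci):
    """Ka/Ks computed after pooling substitutions and sites across all loci."""
    ka = sum(l[0] for l in loci) / sum(l[1] for l in loci)
    ks = sum(l[2] for l in loci) / sum(l[3] for l in loci)
    return ka / ks

# Hypothetical loci: (nonsyn. substitutions, nonsyn. sites, syn. substitutions, syn. sites)
loci = [
    (0, 150, 3, 50),
    (1, 300, 2, 100),
    (3, 150, 0, 50),   # no synonymous divergence: excluded from the per-locus mean
]
print(per_locus_mean_ratio(loci))
print(concatenated_ratio(loci))
```

In this toy example the concatenated ratio is roughly three times the per-locus mean, entirely because of the third locus.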
Two of the three loci with Ka/Ks ratios greater than one also have fixed differences between H. annuus and H. petiolaris that were found to be outliers in a genome scan for positive selection (table 5). All sites found to be significantly differentiated in the genome scan have fixed differences between H. annuus and H. petiolaris; 26 sites have fixed nucleotide differences (7 on collinear chromosomes and 19 on rearranged chromosomes), and 8 sites are part of a fixed 8-bp indel difference between the two species. Counting the indel, which is located on a collinear chromosome, as a single site, there is not a significant difference between the proportion of fixed differences on collinear chromosomes versus rearranged chromosomes compared with overall sequence length (χ2 = 0.08, df = 1, P = 0.62). Among the loci that are found on rearranged chromosomes, those within 5 cM of a chromosomal break point between H. annuus and H. petiolaris are marginally overrepresented among loci showing fixed differences between the two species, although the trend is not significant. Six of the 10 loci showing fixed differences are within 5 cM of a break point, compared with 18 of 44 rearranged loci overall (Fisher's exact test, P = 0.23). Considering individual sites, 13 of 19 fixed differences are within 5 cM of break points, compared with 3,765 of 7,817 sites overall (χ2 = 2.36, df = 1, P = 0.078). These fixation events are not necessarily independent; in particular, four loci within 5 cM of the same break point on chromosome 17 account for 8 of the 19 fixed differences on rearranged chromosomes. 
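The locus-level comparison above can be sketched with a one-tailed Fisher's exact test computed from the hypergeometric distribution. The table layout is an assumption: the two groups in the text (6 of 10 loci with fixed differences, 18 of 44 rearranged loci overall) are treated as the two rows of a 2×2 table, even though the first group is a subset of the second. With that layout the one-tailed P value comes out close to the reported 0.23. The function name is hypothetical.

```python
from math import comb

def fisher_one_tailed(a, b, c, d):
    """Upper-tail Fisher's exact P for the 2x2 table [[a, b], [c, d]].

    Returns the probability of observing a or more in the top-left cell,
    given fixed row and column totals.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)

# Rows: loci with fixed differences (6 near a break point, 4 not);
#       rearranged loci overall (18 near a break point, 26 not).
p = fisher_one_tailed(6, 4, 18, 26)
print(round(p, 2))  # 0.23
```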
As seen in the MK test results, amino acid differences are overrepresented among fixed differences compared with the ratio of nonsynonymous to synonymous polymorphism within species, and this overrepresentation is largely due to fixed differences on collinear chromosomes; amino acid differences represent 63% of fixed differences on collinear chromosomes (including the 8-bp indel as a single difference) compared with just 26% of fixed differences on rearranged chromosomes. Interestingly, none of the eight fixed differences around the break point on chromosome 17 are nonsynonymous changes. Although a total of 16 and 41 sites show fixed differences between H. annuus/H. argophyllus and H. petiolaris/H. argophyllus, respectively, no sites were found to be more differentiated than neutral expectations based on Bayesian analyses. Presumably, this is because fewer sites were available for analysis and the average differentiation is higher for these two species pairs (especially H. petiolaris/H. argophyllus).
An excess of amino acid differences between species relative to amino acid polymorphism within species may be due to natural selection, but it may also be explained by reduced selective constraints due to smaller effective population sizes in the past (Eyre-Walker 2002). Both H. annuus and H. petiolaris have undergone significant population growth since their divergence (Strasburg and Rieseberg 2008), so such an explanation is plausible here. To disentangle the effects of demographic changes from divergent selection, we plotted the amount of amino acid divergence between H. annuus and H. petiolaris against the excess of amino acid divergence over neutral expectations, equal to Ka − Ks·[Pi(a)/Pi(s)] (Fay et al. 2001) for loci at which there is any amino acid variation. As seen in figure 5, there is a significant correlation between the rate of amino acid divergence between the two species and the excess of amino acid divergence relative to amino acid polymorphism within species, indicating that not all loci have been affected equally by a genome-wide increase in selective constraint. We find that genes that are evolving more quickly have a significantly higher excess amino acid divergence, as Fay et al. (2002) found with their “fast” versus “neutral” genes in a comparison of Drosophila melanogaster and Drosophila simulans. The relationship between Ka and excess of Ka is highly significant (r2 = 0.286, df = 28, P = 0.0012); and if the most divergent gene, an outlier that shows essentially no excess of amino acid divergence, is removed, the relationship becomes even stronger (r2 = 0.585, df = 27, P < 0.0001). This pattern is more consistent with a subset of loci being under divergent selection than with a genome-wide excess of amino acid divergence due to lowered selective constraints at some point in the past.
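The strength of such a relationship is summarized by the squared Pearson correlation, which can be converted to a t statistic with the standard identity t = sqrt(r² · df / (1 − r²)). The sketch below shows both steps. The function names are hypothetical and the per-locus values in the usage lines are invented for illustration; only the r² and df passed to the final call come from the text.

```python
from math import sqrt

def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def t_from_r2(r2, df):
    """t statistic for a correlation with the given residual degrees of freedom."""
    return sqrt(r2 * df / (1 - r2))

# Hypothetical per-locus Ka and excess Ka values (illustration only).
ka = [0.001, 0.004, 0.006, 0.010, 0.015]
excess = [0.000, 0.003, 0.002, 0.008, 0.011]
print(pearson_r2(ka, excess))

# Converting the reported r2 = 0.286 with df = 28 to a t statistic.
print(round(t_from_r2(0.286, 28), 2))  # 3.35
```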
We have attempted to address a number of questions concerning the evolution and differentiation of two widespread, broadly sympatric annual sunflower species, H. annuus and H. petiolaris. 1) Is there molecular evidence for positive selection in the divergence of these two species? 2) What effect, if any, do chromosomal rearrangements have in the partitioning of sequence divergence generally and nonsynonymous divergence specifically between the two species? 3) How do genomic patterns of diversity and divergence between H. annuus and H. petiolaris compare with patterns between these two species and a third, historically allopatric species, H. argophyllus?
We found considerable evidence for divergent selection between H. annuus and H. petiolaris, with four times as many fixed amino acid differences as predicted under neutrality based on the ratio of synonymous to nonsynonymous polymorphism within each species. Our results add to a growing body of literature documenting significant divergent selection at the molecular level, at the level of both individual genes and large groups of genes or entire genomes (e.g., Fay et al. 2001; Barrier et al. 2003; Kosiol et al. 2008; Minder and Widmer 2008; Vamathevan et al. 2008). Our data indicate that approximately 75% of fixed amino acid differences between H. annuus and H. petiolaris were driven by natural selection (with an admittedly small sample size, given that the two species show very few fixed differences at nonsynonymous or silent sites). This value falls within the range reported by a number of studies, mostly in Drosophila and primates, which find that anywhere from 35% to more than 90% of nonsynonymous substitutions are driven by positive selection (Fay et al. 2002; Sawyer et al. 2003, 2007; Bierne and Eyre-Walker 2004; Obbard et al. 2006). Comparably high rates of adaptive divergence have also been described in bacteria (Charlesworth and Eyre-Walker 2006) and viruses (Nielsen and Yang 2003). Other estimates have been considerably lower (Bustamante et al. 2005), and estimates vary among taxonomic groups (Eyre-Walker 2006), but overall rates of adaptive evolution are quite high, and many of these values are likely even to be underestimates (Charlesworth and Eyre-Walker 2008).
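The link between a fourfold excess of fixed amino acid differences and a fraction of roughly 75% can be sketched as follows. This is a minimal illustration that assumes the usual MK-style neutral expectation for nonsynonymous fixations, Ds·(Pn/Ps); the function names and the counts are hypothetical and are not the counts from this study.

```python
def expected_nonsyn_fixations(ds, pn, ps):
    """Neutral expectation for nonsynonymous fixed differences: Ds * (Pn / Ps)."""
    return ds * pn / ps


def adaptive_fraction(dn, ds, pn, ps):
    """Fraction of nonsynonymous fixations in excess of the neutral expectation."""
    return 1.0 - expected_nonsyn_fixations(ds, pn, ps) / dn


# Hypothetical counts in which the observed Dn is four times the expectation.
dn, ds, pn, ps = 16, 8, 50, 100
print(expected_nonsyn_fixations(ds, pn, ps))  # 4.0
print(adaptive_fraction(dn, ds, pn, ps))      # 0.75
```

Whenever the observed number of fixed amino acid differences is four times the neutral expectation, the excess fraction is 1 − 1/4 = 0.75, whatever the underlying counts.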
While there is substantial information available concerning morphological and ecological differentiation between H. annuus and H. petiolaris (Heiser 1947; Gross et al. 2004; Lexer et al. 2005; Karrenberg et al. 2006), we do not have adequate information to describe in detail the ecological or other adaptive significance of the specific amino acid differences documented here. We can, however, draw some tentative conclusions based on previous work. Fifteen of the 27 fixed differences between H. annuus and H. petiolaris are within 5 cM of a QTL for phenotypic differences between the two species (Rieseberg et al. 2003; Lexer et al. 2005), including seed shape (5 of the 17 loci, accounting for 11 of the 27 fixed differences), floral shape, tissue ion concentrations, and growth architecture (see table 5). In addition, 13 of the 19 fixed differences on rearranged chromosomes map near chromosomal break points, where QTLs for species differences and pollen sterility are preferentially located (Lai, Nakazato, et al. 2005). The extent to which these fixations are related to the QTL for species differences, proximity to chromosomal break points, or some other factor is not clear. But they represent interesting candidate genes for future study, especially those containing multiple fixed nonsynonymous differences. For example, a cellulose synthase on linkage group 16 has four fixed amino acid substitutions between the two species; and CRA1, a seed storage protein-encoding gene on linkage group 10, has two fixed nonsynonymous differences (one nonconservative) three codons apart. The only gene within an inversion showing fixed differences (one amino acid and two synonymous differences) is an expansin; expression changes in members of this gene family have been associated with both drought (Wu and Cosgrove 2000; Buchanan et al. 2005) and salt (Camacho-Cristobal et al. 2008; Kwon et al. 2008) stress.
Regarding the effects of chromosomal rearrangements differentiating H. annuus and H. petiolaris, we find little support for models that implicate a role for rearrangements in increasing differentiation via the suppression of recombination over large genomic distances. We also see little evidence for a particular role of inversions, which are likely to show stronger recombination suppression than large translocations. However, we do see evidence that recombination suppression very near chromosomal break points has a weak effect in increasing differentiation in those regions, as evidenced by marginally higher net sequence divergence and a marginally greater frequency of fixed sequence differences near break points. These results are similar to those reported by Yatabe et al. (2007), who found no difference in microsatellite differentiation on collinear versus rearranged chromosomes, but did find a weak effect of proximity to chromosomal break points on differentiation. Their interpretation, which is also consistent with our results, was that the unit of isolation between H. annuus and H. petiolaris is likely to be quite small and that genes or genomic regions contributing to isolation are readily decoupled from nearby regions, except in areas where recombination is suppressed most strongly, such as near chromosomal break points. Portions of rearranged chromosomes more distant from break points are likely to show more limited recombination suppression (Grant 1975; Schaeffer and Anderson 2005).
Other empirical tests confirm the predictions of recombination suppression models, at least in some systems (reviewed in Hoffmann and Rieseberg 2008). These predictions have largely been supported in Drosophila (Noor et al. 2001b; Schaeffer et al. 2003; Brown et al. 2004; Machado et al. 2007; Noor et al. 2007), tomatoes (Livingstone and Rieseberg 2004), Anopheles mosquitoes (Stump et al. 2007), and chromosome races of the Sorex araneus complex of shrews (Basset et al. 2006). However, although a number of studies comparing humans and chimps suggested increased sequence divergence, gene expression divergence, or evidence of selection in rearranged relative to collinear chromosomes (Navarro and Barton 2003b; Marques-Bonet et al. 2004), some researchers have questioned whether the sequence divergence and Ka/Ks differences documented by Navarro and Barton (2003b) can plausibly be explained by the theoretical model and hybridization scenario they suggest (Hey 2003; Lu et al. 2003); and reanalyses or studies done with larger data sets have brought these earlier results into doubt (Vallender and Lahn 2004; Zhang et al. 2004; Mikkelsen et al. 2005; Szamalek et al. 2007).
Interestingly, the bulk of excess amino acid divergence we see is found on collinear chromosomes rather than rearranged chromosomes; in fact, polymorphism and divergence ratios for rearranged chromosomes are not significantly different from neutral expectations. Lexer et al. (2005) also found that an excess of QTL for species differences was located on collinear chromosomes, a pattern generally consistent with our data. The reason for this pattern is not immediately clear. Although recombination suppression in rearrangements may limit the efficiency of positive selection due to Hill–Robertson interference (Hill and Robertson 1966), it is not expected to have an effect within each species if rearrangements are fixed, as is the case here (Burke et al. 2004), and recombination levels are not decreased. One highly speculative possibility is that the chromosomal variants that are currently fixed between the two species were still segregating within one or both species for some time following initial species divergence, and that during this period adaptive evolution was inhibited on chromosomes that were polymorphic for rearrangements but not on collinear chromosomes. Comparisons between these two species and H. argophyllus indicate that much of the adaptive divergence did occur soon after initial species divergence (see below), which is consistent with this hypothesis. Alternatively, this pattern may be a sampling artifact that will disappear when additional loci are sampled.
As far as we are aware, the degree to which segregating chromosomal rearrangements, especially inversions, might limit adaptive evolution due to recombination suppression has not been explored empirically or theoretically. Although recombination is generally very strongly reduced near chromosomal break points, gene flux can be fairly high near the center of inversions due to gene conversion and double crossing over (Andolfatto et al. 2001; Schaeffer and Anderson 2005). There is evidence of positive selection and rapid decay of linkage disequilibrium within inversions in D. pseudoobscura, suggesting that gene flux is enough to break up associations that are not maintained by epistatic selection (Schaeffer et al. 2003; Schaeffer and Anderson 2005). The degree of linkage disequilibrium within inversions is a function of both the degree of recombination suppression and the age of the inversion. Unfortunately, it is difficult to put ages on the rearrangements between H. annuus and H. petiolaris because it is not clear what arrangements are derived or when the rearrangements occurred relative to species divergence.
A comparison of H. annuus versus H. argophyllus and H. petiolaris versus H. argophyllus MK tests is potentially informative with regard to the timing of adaptive divergence among these species. The sister species H. annuus and H. argophyllus show no evidence of adaptive divergence—polymorphism and divergence ratios indicate a 56% deficit of nonsynonymous fixations relative to neutral expectations, suggestive of a history of largely purifying selection following divergence for these two species. In contrast, H. petiolaris and H. argophyllus show patterns of adaptive divergence comparable to H. annuus and H. petiolaris; they show an excess of almost 11 nonsynonymous fixations relative to neutral expectations, indicating that close to 70% of their amino acid divergence was driven by positive selection. Helianthus annuus and H. petiolaris diverged roughly 700,000 years before H. annuus and H. argophyllus (Strasburg and Rieseberg 2008), and MK test results suggest either that the adaptive divergence between H. annuus and H. petiolaris largely occurred prior to H. annuus/H. argophyllus speciation or that it has occurred largely along the H. petiolaris lineage. Which scenario is more likely could be determined by polarizing each substitution to determine whether H. annuus or H. petiolaris has the derived allele. We have attempted to do so for coding substitutions using Helianthus tuberosus (Jerusalem artichoke) and Taraxacum officinale (dandelion) sequences from the Compositae Genome Project EST database (http://www.compgenomics.ucdavis.edu/). We were able to reliably polarize 14 of 19 coding substitutions; H. annuus had the derived allele in six cases, and H. petiolaris had the derived allele in eight cases. 
Although these sample sizes are far too small for anything more than a speculative answer, it appears to be the case that each species has undergone a comparable amount of divergent evolution, suggesting that most of it took place relatively soon after their initial divergence, prior to the divergence of H. annuus and H. argophyllus.
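The polarization step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure actually used: the rule that all outgroup sequences must agree, the function names, and the example nucleotides are hypothetical, and the binomial comparison of the 6 versus 8 derived alleles is added here only to show how weak the contrast between the two lineages is.

```python
from math import comb


def polarize(allele_a, allele_b, outgroup_alleles):
    """Return which focal species carries the derived allele, or None.

    Assumption of this sketch: a substitution is polarized only when every
    outgroup sequence shows the same state and that state matches exactly
    one of the two focal species.
    """
    states = set(outgroup_alleles)
    if len(states) != 1:
        return None  # outgroups missing or in conflict
    ancestral = states.pop()
    if ancestral == allele_a and ancestral != allele_b:
        return "B"
    if ancestral == allele_b and ancestral != allele_a:
        return "A"
    return None  # outgroup matches both or neither focal species


def two_sided_binomial_p(k, n):
    """Exact two-sided binomial P value against equal probabilities (p = 0.5)."""
    probs = [comb(n, i) / 2 ** n for i in range(n + 1)]
    # Sum all outcomes no more probable than the observed one.
    return sum(p for p in probs if p <= probs[k] * (1 + 1e-9))


# Hypothetical site: species A has G, species B has A, both outgroups have G.
print(polarize("G", "A", ["G", "G"]))  # B
# Six versus eight derived alleles among the 14 polarized substitutions.
print(two_sided_binomial_p(6, 14))     # about 0.79
```

With only 14 polarized substitutions, a 6 to 8 split is fully compatible with equal amounts of change along the two lineages, in line with the cautious interpretation given in the text.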
If this hypothesis is correct, it has implications for the mode of speciation for H. annuus/H. petiolaris and H. annuus/H. argophyllus. Helianthus annuus and H. petiolaris are currently broadly sympatric and have been hybridizing for most if not all of their existence (Strasburg and Rieseberg 2008). If these two species underwent parapatric or sympatric speciation, we would expect to find a signal of positive selection at loci directly or indirectly involved in reproductive isolation through divergent adaptation (Albertson et al. 2003; Blais et al. 2007; Via and West 2008). In contrast, although the early biogeographic history of H. annuus and H. argophyllus is not well understood, they have historically been allopatric and it is plausible that their speciation was allopatric as well. If that is the case, reproductive isolation may have been a byproduct of the accumulation of nonadaptive incompatibility alleles (Turelli et al. 2001), in which case we would not necessarily expect to see a signature of divergent selection during and soon after speciation. Although our data are consistent with these predictions and although it is tempting to speculate about how much of the divergence between H. annuus and H. petiolaris occurred during incipient speciation and was perhaps involved in the speciation process itself, our data and the biogeographic histories of these three species are not sufficiently detailed to address this question rigorously.
As genome-wide genetic diversity and divergence data become increasingly available, empirical support has increased for a genic, as opposed to genomic, view of speciation and species divergence (Wu 2001; Lexer and Widmer 2008). Species that are strongly reproductively isolated at a number of genes and maintain morphological and ecological differences may still actively hybridize and exchange genetic material across much of their genomes (Strasburg and Rieseberg 2008). Adaptive divergence can proceed at the loci contributing to reproductive isolation and species differences even as much of the genome shows little or no genetic differentiation over millions of years. In species that are chromosomally differentiated, the role of rearrangements in promoting differentiation over large genomic regions may be limited depending on factors such as the nature of the rearrangements and amount of hybridization that occurs between species; but chromosomal break points, where recombination is suppressed most strongly, may permit both adaptive and nonadaptive divergence. Future work in this system will include analyses with more markers to allow more fine-scale characterization of genomic patterns of differentiation as well as more detailed population genetic and gene expression studies of genes near break points and other genes showing significant differentiation. This will help us to understand in more detail the degree of overall genomic differentiation between these species as well as the genetic basis of their isolation and phenotypic differentiation.
We would like to thank Eric Baack and Sophie Karrenberg for sharing their collections; Steve Wooley for writing a computer program to make filtered alignments and BayesFst input files; and Briana Gross, Ken Olsen, Nic Kooyers, Kate Waselkov, and Stuart McDaniel for comments on a previous version of the manuscript. This work was supported by a National Institutes of Health Ruth L. Kirschstein Postdoctoral Fellowship (5F32GM072409-02) to J.L.S. and grants from the National Science Foundation (DEB-0314654 and DBI0421630) and the National Institutes of Health (GM059065) to L.H.R.