Our results indicate highly elevated substitution rates in the chloroplast clpP1 exons in Sileneae and in Oenothera. Although long branches could also reflect greater age, we argue that this is unlikely because of the comparisons with other genes (), which, unless acquired by horizontal transfer, must be of the same age. Even if the topological relationships are sometimes ambiguous, any resolution of the trees () shows that the sister group has a significantly shorter branch. Because sister groups by definition have the same age, either the short branch reflects a decrease in substitution rate or the long branch reflects an increase. This supports the interpretation that the long clpP1 branches most likely result from an increase in substitution rate.
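The sister-group argument rests on simple arithmetic: a branch length is approximately rate times time, and two sister lineages share the same divergence time, so time cancels when their branch lengths are compared. The Python sketch below only illustrates that cancellation; the function name and branch lengths are hypothetical, not estimates from this study, and a formal relative-rate test would additionally require an outgroup and a significance test.

```python
def relative_rate(branch_a: float, branch_b: float) -> float:
    """Ratio of substitution rates r_a / r_b for two sister lineages.

    A branch length is (approximately) rate * time. Sister lineages share the
    same divergence time t, so t cancels: (r_a * t) / (r_b * t) = r_a / r_b.
    """
    if branch_a < 0 or branch_b <= 0:
        raise ValueError("need branch_a >= 0 and branch_b > 0")
    return branch_a / branch_b


# Hypothetical branch lengths (substitutions per site) for two sister lineages.
long_branch = 0.40
short_branch = 0.05
ratio = relative_rate(long_branch, short_branch)
print(f"The long lineage evolves about {ratio:.1f} times faster than its sister.")
```

Note that the ratio alone cannot say which lineage changed; that is why the text relies on comparisons with other genes from the same taxa.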
In Sileneae, there are at least three independent increases in substitution rates. Loss of introns in clpP1 appears to be correlated with elevated substitution rates in several lineages. In two of the Sileneae lineages, high substitution rates are accompanied by both gene duplications and significant positive selection.
Although the rate of synonymous substitutions in the clpP1 gene of the Sileneae species with generally elevated substitution rates appears higher than that of other chloroplast genes in the same taxa (), most of the total increase in substitution rate can be ascribed to non-synonymous substitutions. A generally elevated substitution rate across a whole genome, such as the mitochondrial genome of some species of Plantago, could be explained by, e.g., an increased amount of oxygen free radicals [22], but elevated substitution rates in, and exclusive to, a specific gene are harder to explain. The very long branch leading to Silene conica and S. conoidea is puzzling in this context, because the rates of both synonymous and non-synonymous substitutions are very high, and there is no significant positive selection. The pattern is similar, but less extreme, in L. chalcedonica (). The lengths of the branches leading to Solanum/Lycopersicon (Solanaceae), within the Fabaceae, to Vitis, and to Cucumis might also indicate elevated substitution rates, but this is much less striking than in the long Sileneae branches and in Oenothera (Fig. S1).
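Separating a rate increase into synonymous and non-synonymous components depends on classifying each codon difference by whether it changes the encoded amino acid. A minimal Python sketch, assuming two already aligned, in-frame coding sequences. It uses the standard genetic code (plastids use NCBI translation table 11, which has the same amino acid assignments and differs only in alternative start codons) and counts only codons that differ at a single position. The function name is hypothetical, and real dN and dS estimates additionally normalize by the numbers of synonymous and non-synonymous sites and correct for multiple hits, typically with a codon substitution model.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG codon order ('*' marks stop codons).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_single_site_differences(seq1: str, seq2: str) -> tuple[int, int]:
    """Return (synonymous, non_synonymous) counts for codons differing at one site.

    Codons with gaps/ambiguity codes, or with two or three differences, are
    skipped; those need pathway weighting (e.g. Nei-Gojobori) or a codon model.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame, and equal length")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # gap or ambiguity code in this codon
        if sum(a != b for a, b in zip(c1, c2)) != 1:
            continue  # identical codon, or a multi-hit codon
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Toy example: CTT->CTC is silent (Leu->Leu), AAA->AGA is not (Lys->Arg).
print(count_single_site_differences("ATGCTTAAA", "ATGCTCAGA"))
```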
By comparing synonymous and non-synonymous substitutions, we were able to detect statistically significant positive selection at the gene level on three branches. However, we anticipate that the episodes of positive selection in the tree might actually be more extensive than those detected. The power of the tests employed here is relatively low, i.e. positive selection is difficult to detect even if it acts on many sites [7]. Recently, methods have been developed to detect positive selection on individual codons along specific branches (e.g. the branch-site likelihood method [23]). These methods are generally more powerful, and their use has resulted in a marked increase in the number of published reports of positive selection [24]. However, simulations by Zhang [24] showed that this power comes at the cost of high false-positive rates.
Under the assumption that substitutions in non-coding sequences are selectively neutral, elevated substitution rates in exons compared to introns provide an indication of positive selection [25]. Comparing the branch lengths of the exon tree () and the intron tree () shows that this is the case in S. fruticosa and probably also in Lychnis chalcedonica. For example, the uncorrected pairwise base distance between S. fruticosa (Sf1) and S. schafta is 0.23 in exons but 0.05 in introns, and for L. chalcedonica (exon: Lc1; intron: Lc3/Lc4 combined) and L. flos-jovis these figures are 0.30 and 0.04, respectively. Finally, Oenothera flava (the only Oenothera species in this study that has introns) shows more variability in exons than in introns compared to Eucalyptus (0.26 and 0.18, respectively), although the difference is less striking for this taxon.
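The exon/intron contrast rests on uncorrected (p) distances, i.e. the proportion of aligned sites that differ. A minimal Python sketch, assuming pre-aligned sequences and treating '-', '?' and 'N' as non-comparable (a choice made here for illustration, not necessarily the one used in the study). The dictionary only reuses the distances quoted above to show the size of each exon/intron contrast.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected pairwise distance: the proportion of compared sites that differ.

    Alignment columns with a gap or missing data in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "-?N" or b in "-?N":
            continue
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Exon and intron p-distances reported in the text.
reported = {
    "S. fruticosa (Sf1) vs S. schafta": (0.23, 0.05),
    "L. chalcedonica vs L. flos-jovis": (0.30, 0.04),
    "O. flava vs Eucalyptus": (0.26, 0.18),
}
for pair, (exon, intron) in reported.items():
    print(f"{pair}: exon/intron distance ratio = {exon / intron:.1f}")
```

With the reported values the exon/intron ratios are roughly 4.6, 7.5, and 1.4, matching the observation that the contrast is weakest for Oenothera.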
The clpP1 exon sequences with the most extreme substitution rates are those of Silene conica and S. conoidea, but because these species lack introns in the gene, the exon/intron comparison cannot be made. The variability in the gene is approximately an order of magnitude higher (with synonymous and non-synonymous substitutions contributing roughly equally to the increase) than that of spacer regions in the cpDNA of S. conica (see below). The pairwise base difference between Silene conica and Silene latifolia in intergenic spacers is on average 0.03 (rbcL/accD: 0.037, petA/psbJ: 0.035, psbE/petL: 0.028, rpl20/rps12: 0.022; data from [18]), whereas the difference in the clpP1 gene is 0.31. Despite the very divergent sequences, the dN/dS ratios did not indicate positive selection acting within this group. Because ratios around 1 imply the absence of both positive and purifying selection, this could suggest that the S. conica and S. conoidea sequences have lost their function. However, the simulation experiment strongly rejected the hypothesis that the absence of stop codons can be explained by chance alone. Further support for maintained function comes from the seven indels in the clpP1 sequence of Silene conica and S. conoidea, all of which have lengths that are multiples of three. The existence of these indels that do not disrupt the reading frame is in itself strong evidence for maintained gene function. Finally, even though S. conica appears to have a somewhat elevated cpDNA substitution rate in general [18], it seems unlikely that the very high substitution rates in clpP1 are the effect of lost function. Xing and Lee [26] found that alternative splicing could greatly relax selection pressure (measured as dN/dS). This effect was accompanied by a strong decrease in synonymous substitutions. Because we observe a strong increase in synonymous substitutions in our data (), this explanation, too, seems unlikely in this particular case.
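The stop-codon argument can be framed as a Monte Carlo question: if substitutions accumulated with no selection at all, how often would a reading frame of this length remain free of premature stop codons? The Python sketch below is one simple way to pose that question and is not the simulation used in the study. It assumes a hypothetical random ancestral open reading frame, equal probabilities for all point substitutions, and an arbitrary number of substitutions; the plastid code (NCBI table 11) uses the same stop codons TAA, TAG, and TGA.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STOP_CODONS = {c for c, aa in zip(CODONS, AMINO_ACIDS) if aa == "*"}
SENSE_CODONS = [c for c in CODONS if c not in STOP_CODONS]


def has_stop(seq: str) -> bool:
    """True if any complete in-frame codon of seq is a stop codon."""
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 2, 3))


def mutate_neutrally(seq: str, n_subs: int, rng: random.Random) -> str:
    """Apply n_subs point substitutions at random sites, ignoring their effects."""
    bases = list(seq)
    for _ in range(n_subs):
        pos = rng.randrange(len(bases))
        bases[pos] = rng.choice([b for b in BASES if b != bases[pos]])
    return "".join(bases)


def prob_no_stop(ancestor: str, n_subs: int, replicates: int, seed: int = 1) -> float:
    """Fraction of neutral replicates that remain free of in-frame stop codons."""
    rng = random.Random(seed)
    intact = sum(
        not has_stop(mutate_neutrally(ancestor, n_subs, rng)) for _ in range(replicates)
    )
    return intact / replicates


# Hypothetical stop-free ancestor of 200 sense codons and 100 neutral substitutions.
rng = random.Random(42)
ancestor = "".join(rng.choice(SENSE_CODONS) for _ in range(200))
print(prob_no_stop(ancestor, n_subs=100, replicates=1000))
```

A very small fraction would indicate that an intact reading frame is unlikely to persist by chance under neutral change.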
Whether there is a causal relationship between the increases in synonymous and non-synonymous rates in Silene conica/conoidea remains unclear. However, there are other indications that positive selection of clpP1 is correlated with increased synonymous substitution rates: Lychnis chalcedonica and Oenothera elata also have elevated synonymous substitution rates (). Some of the other branches in the eudicot clpP1 tree () have combinations of branch lengths and dN/dS ratios that suggest a more widespread occurrence of positive selection on the gene. In particular, the branch leading to Solanum/Lycopersicon (node 14 in ), which has a dN/dS ratio significantly higher than 1 before Bonferroni correction (), but also the branches leading to node 6 (), Medicago, Vitis, and Cucumis appear to be interesting targets for further investigation.
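Branch-level tests of this kind are commonly likelihood-ratio tests: twice the log-likelihood difference between a model with ω free on the branch of interest and a model with ω fixed at 1 is compared with a χ² distribution, and when several branches are tested, a Bonferroni correction divides the significance level by the number of tests. A minimal Python sketch, assuming one extra free parameter (1 degree of freedom) and hypothetical log-likelihoods and p-values. When ω = 1 lies on the boundary of the parameter space, as in the branch-site test, the χ²₁ reference used here is conservative.

```python
import math


def lrt_pvalue_df1(lnl_null: float, lnl_alt: float) -> float:
    """P-value of a likelihood-ratio test with one extra free parameter.

    The statistic 2 * (lnL_alt - lnL_null) is referred to chi-square(1), whose
    survival function is P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.erfc(math.sqrt(stat / 2.0))


def bonferroni_significant(pvalues: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag tests that stay significant at level alpha / m for m tests."""
    threshold = alpha / len(pvalues)
    return [p < threshold for p in pvalues]


# Hypothetical log-likelihoods: 2 * 1.92 = 3.84, the chi-square(1) 5% cut-off.
print(round(lrt_pvalue_df1(-1000.00, -998.08), 3))
# Hypothetical p-values for three branches: only the first survives correction.
print(bonferroni_significant([0.004, 0.02, 0.30]))
```

The second hypothetical branch (p = 0.02) is significant at 0.05 on its own but not after correction, which is the situation described for the Solanum/Lycopersicon branch.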
In a systematic search for positive selection in higher plants based on almost 140,000 embryophyte gene sequences from GenBank, very few cases of ω values above 1 were found when averaging over whole genes [27]. Only in two cases was ω > 2, and both of these were sequence pairs of nuclear-encoded genes [27]. This illustrates how unusual our findings are. The recent report on positive selection in the chloroplast gene rbcL [10] clearly shows that specific sites in that gene are under positive selection in a wide range of land plants. Our study suggests that positive selection in the clpP1 gene might be widespread in flowering plants. In rbcL, only a small proportion of the sites appear to be under positive selection [10], whereas in clpP1 a very large proportion of the sites are affected.
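The contrast between rbcL and clpP1 follows from how a whole-gene ω averages over sites. If sites fall into classes with proportions p_k and ratios ω_k, and the synonymous rate is assumed equal across classes, the gene-level ratio is approximately the weighted mean Σ p_k ω_k, so a few strongly selected sites are swamped by many constrained ones. A minimal Python sketch; the site-class proportions and ω values are hypothetical illustrations, not estimates for rbcL or clpP1.

```python
def gene_level_omega(site_classes: list[tuple[float, float]]) -> float:
    """Proportion-weighted mean of site-class omegas, sum(p_k * omega_k).

    Assumes the same synonymous rate in every class, so dS cancels.
    """
    total = sum(p for p, _ in site_classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("site-class proportions must sum to 1")
    return sum(p * omega for p, omega in site_classes)


# Hypothetical: 5% of sites at omega = 3, the rest strongly constrained.
few_selected = [(0.05, 3.0), (0.95, 0.1)]
# Hypothetical: 60% of sites at omega = 3.
many_selected = [(0.60, 3.0), (0.40, 0.1)]
print(f"few selected sites:  omega = {gene_level_omega(few_selected):.3f}")
print(f"many selected sites: omega = {gene_level_omega(many_selected):.3f}")
```

With these numbers the gene-level ω is about 0.25 in the first case, far below 1 despite the selected sites, and about 1.8 in the second.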
In the present study, the rates of synonymous substitutions are rather conserved across taxa and genes, with the important exception of the species undergoing rapid evolution in the clpP1 gene (). None of the “normal” taxa or genes shows dS rates as high as those of the clpP1 gene in these three species. The degree of dS elevation also reveals an interesting pattern: the species with the strongest estimated positive selection has the least elevated dS rate and vice versa, i.e. the rate of non-synonymous substitutions is roughly the same among the three species.
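The inverse pattern follows directly from ω = dN/dS: if dN is roughly constant among lineages, the lineage with the highest dS must show the lowest ω. A minimal Python sketch with hypothetical rates for three lineages (not the estimates from this study).

```python
def omega(dn: float, ds: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates, dN/dS."""
    if ds <= 0:
        raise ValueError("dS must be positive")
    return dn / ds


# Hypothetical: identical dN, increasingly elevated dS.
shared_dn = 0.6
lineages = {"lineage A": 0.3, "lineage B": 0.5, "lineage C": 0.9}
estimates = {name: omega(shared_dn, ds) for name, ds in lineages.items()}
for name, value in sorted(estimates.items(), key=lambda item: -item[1]):
    print(f"{name}: dS = {lineages[name]:.1f}, omega = {value:.2f}")
```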
Elevated evolutionary rates due to positive selection or relaxed selective constraints are often preceded by gene duplication [28]. We detected extra clpP1 gene copies only in Lychnis chalcedonica and Silene fruticosa. Likewise, the completely sequenced chloroplast genome of Oenothera elata (NC_002693) contains only one copy of clpP1.
Only one of the four clpP1 copies in L. chalcedonica (Lc1) is potentially functional; the others contain stop codons or are incomplete. The intron-containing Lc4 fragment shows obvious signs of elevated substitution rates, although less so than Lc1. The Lc3 copy, apparently a pseudogene, is less divergent than the other copies in Lychnis chalcedonica. This does not seem to be an artifact of missing data, because in the region where the sequences of Lc1, Lc2, and Lc3 overlap, the uncorrected distance between Lc1/Lc2 and L. flos-jovis (the “normal” Lychnis species in this study) is 30.3%/34.6%, whereas between Lc3 and L. flos-jovis it is 3.9%. Thus, in the absence of a formal phylogenetic analysis of the clpP1 copies in L. chalcedonica, we may speculate that at least the duplication leading to Lc3 preceded the onset of positive selection. In S. fruticosa, Sf2 is markedly less divergent than the two other copies, and is thus probably also the result of an ancient duplication that preceded the increase in non-synonymous rates. In the region where the sequences of Sf1 and Sf2 overlap, the uncorrected distance between Sf1 and S. schafta is 22.6%, whereas between Sf2 and S. schafta it is 5.2%. Both cases may thus agree with one of the few documented cases in which gene duplication precedes the onset of positive selection [29]. It may be that positive selection, under some circumstances, can be triggered by duplication rather than being an expected outcome of it.
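The reasoning about duplication timing relies on a simple heuristic: among the copies in one species, the copy closest to a “normal” relative is the best candidate for having diverged before the rate increase. A minimal Python sketch of that comparison using the uncorrected distances quoted above; the function name is hypothetical, and this is no substitute for the formal phylogenetic analysis of the copies that the text notes is lacking.

```python
# Uncorrected distances (proportions) to the "normal" relative, from the text.
copy_distances = {
    ("L. chalcedonica", "L. flos-jovis"): {"Lc1": 0.303, "Lc2": 0.346, "Lc3": 0.039},
    ("S. fruticosa", "S. schafta"): {"Sf1": 0.226, "Sf2": 0.052},
}


def least_divergent_copy(distances: dict[str, float]) -> str:
    """Name of the gene copy with the smallest distance to the reference."""
    return min(distances, key=distances.get)


for (species, reference), distances in copy_distances.items():
    best = least_divergent_copy(distances)
    fastest = max(distances.values())
    print(
        f"{species}: {best} is closest to {reference} "
        f"({distances[best]:.3f} vs up to {fastest:.3f})"
    )
```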
The very large insertions (174 to 197 amino acids) found in clpP1 exon 1 of Silene fruticosa (Sf1), Lychnis flos-cuculi, and L. abyssinica do not cause frame shifts or stop codons. To our knowledge, the effect of indel evolution has not been studied in relation to positive selection. It is possible that repetitive insertions are beneficial: as long as the repeats are multiples of three nucleotides, they preserve the reading frame, which reduces the probability of stop codons, while possibly fostering new phenotypic variants.
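Whether an indel preserves the downstream reading frame depends only on its length being a multiple of three, which is easy to check in an alignment. A minimal Python sketch, assuming a pairwise nucleotide alignment with '-' for gaps: it lists gap-run lengths, checks that each is a multiple of three, and scans the ungapped sequence for stop codons before the final codon. The function names and toy sequences are hypothetical.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}  # shared by the standard and plastid codes


def gap_run_lengths(aligned: str) -> list[int]:
    """Lengths of each run of consecutive gap characters in an aligned sequence."""
    return [len(match.group()) for match in re.finditer(r"-+", aligned)]


def indels_preserve_frame(aligned_a: str, aligned_b: str) -> bool:
    """True if every gap run in both aligned sequences is a multiple of three."""
    return all(
        length % 3 == 0
        for aligned in (aligned_a, aligned_b)
        for length in gap_run_lengths(aligned)
    )


def has_internal_stop(aligned: str) -> bool:
    """True if the ungapped sequence has a stop codon before its last codon."""
    seq = aligned.replace("-", "").upper()
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 3, 3))


reference = "ATG---AAACCCTAA"  # toy reference with a 3-nt gap
insertion = "ATGGCTAAACCCTAA"  # toy sequence carrying the in-frame insertion
print(indels_preserve_frame(reference, insertion))
print(has_internal_stop(insertion))
```

Insertions of 174 to 197 amino acids correspond to 522 to 591 nucleotides, so they pass the length check by construction; the stop-codon scan is the more informative test.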
Conclusions
In our study, four distantly related taxa or groups of taxa (Oenothera, Silene fruticosa, Silene conica/conoidea, and Lychnis chalcedonica/flos-cuculi/abyssinica) exhibit substitution rates in clpP1 exon sequences that are unprecedented in the chloroplast genome. We conclude that these high evolutionary rates are correlated with positive selection on clpP1 in the evolutionary histories of at least three of these four groups. In the case of Lychnis, this was probably preceded by a duplication of a segment including clpP1, psbB, psbT, psbN, and psbH, but only the clpP1 gene shows signs of positive selection. We cannot rule out gene duplication as a causal agent in the other three cases, because duplicates may be ambiguous, extinct, or undetected. A major implication of the present results is that positive selection may, in some cases, be accompanied by elevated synonymous substitution rates. If this proves to be the case, it will have far-reaching consequences for our ability to detect positive selection. Also, the fact that positive selection appears to have originated in at least three closely related lineages of Sileneae calls for caution when interpreting plastid data at population and phylogenetic levels (cf. [9]). The relationship between cpDNA duplications, increased substitution rates, positive selection, and indel evolution in the chloroplast genome needs further scrutiny, and plant clpP1 appears to constitute an excellent model system for such studies.