Cypripedium calceolus, although widespread in Eurasia, is rare in many countries in which it occurs. Population genetics studies with nuclear DNA markers on this species have been hampered by its large nuclear genome size. Plastid DNA markers are used here to gain an understanding of variation within and between populations and of biogeographical patterns.
Thirteen length-variable regions (microsatellites and insertions/deletions) were identified in non-coding plastid DNA. These and a previously identified complex microsatellite in the trnL-trnF intergenic spacer were used to identify plastid DNA haplotypes for European samples, with sampling focused on England, Denmark and Sweden.
The 13 additional length-variable regions identified were two homopolymer (polyA) repeats in the rps16 intron and a homopolymer (polyA) repeat and ten indels in the accD-psaI intergenic spacer. In accD-psaI, most of these were in an extremely AT-rich region, and it was not possible to design primers in the flanking regions; therefore, the whole intergenic spacer was sequenced. Together, these new regions and the trnL-trnF complex microsatellite allowed 23 haplotypes to be characterized. Many were found in only one or a few samples (probably due to low sampling density), but some commoner haplotypes were widespread. Most of the genetic variation was found within rather than between populations (82·5 vs. 17·5 %, respectively). Two haplotypes occurred from the Spanish Pyrenees to Sweden.
Plastid DNA data can be used to gain an understanding of patterns of genetic variation and seed-mediated gene flow in orchids. Although these data are less information-rich than those for nuclear DNA, they present a useful option for studying species with large genomes. Here they support the hypothesis of long-distance seed dispersal often proposed for orchids.
Cypripedium calceolus is one of the rarest plant species in the UK, with only one surviving original clump and some reintroduced plants present in the wild in northern England. It was formerly more widespread, although never common, and the reduction in its populations to near extinction has been attributed to the collection of plants for gardens and herbarium specimens (Nelson, 1994; Farrell, 1999). In addition to the plants in the wild, there are several plants of known UK origin in cultivation and three plants growing in semi-natural habitat which have been considered to be possible introductions from non-native material (Farrell, 1999; I. Taylor, Natural England, pers. comm.). One of these has been shown to be a member of the North American C. parviflorum complex (M. F. Fay et al., unpubl. res.), and is excluded from the present study.
Due to its extreme rarity in England, C. calceolus has been considered an important target for conservation activities including site management, wardening, pollination, ex situ seed germination and, more recently, genetic fingerprinting. However, largely due to the large genome size of this species (1C = 32·4 pg, Bennett et al., 2000), the nuclear DNA-based techniques increasingly used for studies of rare and endangered species (Qamaruz-Zaman et al., 1998; Fay and Krauss, 2003), including amplified fragment length polymorphisms (AFLP, Vos et al., 1995) and random amplified polymorphic DNA (RAPD, Williams et al., 1990), have not proved effective (Fay and Cowan, 2001; Fay et al., 2002; Fay and Krauss, 2003). When the standard AFLP protocol is used with genomes as large as that in C. calceolus, levels of genetic variation are dramatically underestimated (Fay and Cowan, 2001; Fay et al., 2002, 2005; Leitch and Fay, 2008). Nuclear microsatellites, similarly, have been shown to be difficult to apply to organisms with large genomes (Garner, 2002).
To overcome this problem, Fay and Cowan (2001) investigated the use of plastid microsatellites as alternative markers. The plastid genome comprises a single, circular chromosome. Unlike the nuclear chromosomes, this exists as a single type inherited from one parent only (the female in orchids and most other angiosperms; Corriveau and Coleman, 1988) and shows relatively low levels of size variation (Chase and Palmer, 1989). The sequence of genes is well characterized, and in many cases the genes contain introns and/or are separated by intergenic spacers (IGS), both non-coding regions of DNA. At least some of these non-coding regions contain repetitive motifs, commonly consisting of either all adenine (A) or all thymine (T) residues or a combination of these two bases. When these regions contain higher numbers of repeating units (generally ten or more for single nucleotide repeats), they are referred to as plastid microsatellites. They show high levels of variability, and the length of a microsatellite can vary extensively within and between populations. They have proved especially useful in studies of biogeography, and in Europe they have been used to study biogeographical patterns in a range of orchids, including Orchis spp. (Qamaruz-Zaman, 2000; Fay et al., 2007; Bateman et al., 2008), Dactylorhiza spp. (Shipunov et al., 2004, 2005; Pillon et al., 2007) and Spiranthes romanzoffiana (Forrest et al., 2004). In addition, non-coding plastid DNA often also includes insertions/deletions (indels) due to tandem repeats of longer motifs (minisatellites) that can vary in length from a few to hundreds of base pairs (bp), for example in Sorbus (King and Ferris, 2002) and Anacamptis (Cozzolino et al., 2004).
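As an illustration of the definition above (single-nucleotide repeats of roughly ten or more A or T residues), the following minimal Python sketch scans a sequence for such runs. The function name, the threshold default and the toy sequence are assumptions made for illustration only; they are not taken from the study, and the toy sequence is not C. calceolus data.

```python
import re

def find_homopolymer_runs(seq, min_len=10, bases="AT"):
    """Return (start_index, base, length) for each single-base run of at least min_len.

    start_index is 0-based. Only the bases listed in `bases` are considered.
    """
    seq = seq.upper()
    # Builds a pattern such as "A{10,}|T{10,}" for the default arguments.
    pattern = re.compile("|".join(f"{b}{{{min_len},}}" for b in bases))
    return [(m.start(), m.group()[0], len(m.group())) for m in pattern.finditer(seq)]

# Hypothetical toy sequence: an 11-bp polyA run, a short TTTT run (ignored), a 12-bp polyT run.
toy = "GCGT" + "A" * 11 + "CGGATTTTC" + "T" * 12 + "GC"
for start, base, length in find_homopolymer_runs(toy):
    print(f"poly{base} run of {length} bp starting at index {start}")
```

A scan of this kind only flags candidate regions; whether a run is actually length-variable still has to be established by comparing samples, as described in the text.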
The placement of these length-variable regions varies from taxon to taxon, and an initial survey of plastid DNA is necessary to locate them. Once suitable regions are identified, primers are designed to amplify fragments containing them. Amplification products are obtained for all individuals under study, and their length in base pairs is determined. This technique has a general advantage over multilocus genetic fingerprinting methods such as RAPD and AFLP in that only one short region is amplified per reaction, making the technique less sensitive to DNA quality and quantity. It is therefore, at least potentially, applicable to DNA samples extracted under less than ideal circumstances, e.g. from herbarium material (Fay and Cowan, 2001; Cozzolino et al., 2007).
Here we report the development of a set of length-variable fragments of plastid DNA for C. calceolus and their use in analysing patterns of genetic variation in this species. For plants in England, these data are used to infer the origin of putatively non-native material and to inform the choice of material to be used for reintroduction purposes.
The plant material used in this study is listed in Table 1. Detailed localities are not given, due to the sensitivity surrounding many of the sites. New specimens were collected into silica gel using the method of Chase and Hills (1991). These included the one remaining wild plant and all living plants of known wild origin in England. Representative samples from elsewhere in Europe were included for comparative purposes. In Denmark, only two populations are known, and 34 plants were sampled from these. Because these had previously been shown to be genetically invariant (I. Kahandawala et al., unpubl. res.), however, only three representative samples are included from each population here. DNA was extracted from silica-gel dried leaf material, using a modified 2× CTAB (cetyltrimethyl-ammonium bromide) procedure (Doyle and Doyle, 1987) followed by purification on a caesium chloride gradient or columns following the manufacturers' protocols.
Length-variable regions flanked by conserved regions were sought in the accD-psaI IGS and the rps16 intron. These were amplified using primers (Table 2) given in Small et al. (1998) and Oxelman et al. (1997), respectively, from three representative samples (English, Polish and Swiss) of C. calceolus and sequenced using modified dideoxy cycle sequencing with dye terminators run on an ABI 377 automated sequencer. The sequences were aligned manually in PAUP* 4·0b4 (Swofford, 2002), and variable regions were identified. Primers to amplify the length-variable regions in the rps16 intron were designed in conserved flanking regions (Table 3). For each region, one primer was labelled with a fluorescent dye to enable the amplification products to be visualized using an ABI 377 automated sequencer. Sizes (bp) were determined using GeneScan 3·1 and Genotyper 2·0 (Applied Biosystems Inc., Foster City, CA, USA). Due to the AT-rich nature of the flanking regions for the length-variable regions in accD-psaI (see Results), primers could not be designed to amplify these regions. The entire IGS was therefore sequenced for all individuals to allow these regions to be scored. Sequences containing unique or rare indels were redone from new PCR products to test for reproducibility and to exclude possible Taq-induced artefacts. The new regions identified here and one identified by Fay and Cowan (2001) in the trnL-trnF IGS (cyp2) were scored for all individuals. [The other region in the trnL intron, orch1, used by Fay and Cowan (2001), proved to be invariant in a subset of the samples and was not used further.]
A matrix was prepared, using different numbers for alleles of different lengths of the microsatellite regions and 1 vs. 2 (see Table 4) for the simple indels. Where two or more indel events occurred at the same position in different samples (presumed to be single indels of different lengths rather than sequential indels), the alleles were numbered from 1 to 3 or 4, as appropriate (see Table 4).
Haplotypes numbered 1–23 were defined by the combination of the alleles for the 14 different loci. The haplotype definitions are given in Table 5.
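The step from the allele matrix to haplotypes can be sketched as follows: each sample is a tuple of allele codes, and samples with identical tuples share a haplotype. This is a minimal Python sketch under stated assumptions. The sample names and allele codes are hypothetical, only four loci are shown instead of 14, and labels are assigned in order of first appearance, so they do not correspond to the H1–H23 numbering of Table 5.

```python
def assign_haplotypes(allele_matrix):
    """Group samples by their combination of allele codes.

    allele_matrix maps sample name -> sequence of allele codes (one per locus).
    Returns (assignment, labels): sample -> label, and allele tuple -> label.
    """
    labels = {}
    assignment = {}
    for sample, alleles in allele_matrix.items():
        key = tuple(alleles)
        if key not in labels:
            # Label by order of first appearance (hypothetical numbering).
            labels[key] = f"H{len(labels) + 1}"
        assignment[sample] = labels[key]
    return assignment, labels

# Hypothetical matrix: two microsatellite loci (length classes) and two indels (1 vs. 2).
toy_matrix = {
    "sample_a": [2, 1, 1, 2],
    "sample_b": [2, 1, 1, 2],
    "sample_c": [3, 1, 2, 2],
}
assignment, labels = assign_haplotypes(toy_matrix)
print(assignment)
```

Because a haplotype is the full combination of alleles, two samples that differ at a single locus are assigned to different haplotypes.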
Haplotype frequencies in the sampled populations were estimated using ARLEQUIN software (Schneider et al., 2000). Genetic diversity was estimated as the number of haplotypes, number of polymorphic sites and gene diversity using ARLEQUIN and haplotype richness following Petit et al. (1998) using the CONTRIB program. Haplotype richness was corrected for differences in sample size using the rarefaction method. To provide robust estimates, the sample size of the smallest population sample (n = 3) was used for rarefaction. The distribution of plastid DNA diversity within and among populations was studied using analysis of molecular variance (AMOVA, Excoffier et al., 1992) as implemented in ARLEQUIN, and the significance of the AMOVA was tested based on 10 000 permutations of haplotypes among populations. A comparison of the distribution of diversity for ordered vs. unordered haplotypes was carried out following Pons and Petit (1996) using the PERMUT/cpSSR software. Two different methods were used for obtaining ordered haplotype information: (1) using the number of mutation steps between haplotypes (genetic divergence estimated as NST), and (2) making use of information on repeat number of each polymorphic site (genetic divergence estimated as RST). The latter method was specifically designed for microsatellites, but the former method makes fewer assumptions about the nature and complexity of mutational patterns detected by fragment length analysis (Pons and Petit, 1996). A median-joining (MJ) network (Bandelt et al., 1999) was constructed based on plastid DNA haplotypes using the program NETWORK 4·2·0·1 (www.fluxus-engineering.com). This method is able to resolve even complex haplotype patterns (Posada and Crandall, 2001). It uses parsimony criteria to identify median vectors, i.e. consensus sequences of mutually close sequences of markers that are biologically equivalent to possible unsampled or extinct ancestral haplotypes. To reduce complexity of the network, rapidly mutating characters were downweighted in the analysis.
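Two of the diversity measures named above can be written down compactly. The sketch below uses the standard unbiased estimator of gene diversity, n/(n − 1) × (1 − Σp²), and the standard rarefaction formula for the expected number of haplotypes in a subsample of size k. It is an assumption that these textbook forms match the exact conventions of ARLEQUIN and CONTRIB, which are not spelled out in the text, and the haplotype counts used are hypothetical.

```python
from math import comb

def gene_diversity(counts):
    """Unbiased gene diversity from haplotype counts in one population."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two individuals are needed")
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts))

def rarefied_richness(counts, k):
    """Expected number of distinct haplotypes in a random subsample of k individuals."""
    n = sum(counts)
    if k > n:
        raise ValueError("subsample size exceeds sample size")
    # For each haplotype: 1 minus the probability that none of its copies is drawn.
    return sum(1 - comb(n - c, k) / comb(n, k) for c in counts)

# Hypothetical population: two haplotypes observed 4 and 2 times, rarefied to k = 3.
counts = [4, 2]
print(gene_diversity(counts))
print(rarefied_richness(counts, 3))
```

Rarefying every population to the smallest sample size (n = 3 here) makes richness comparable across populations, at the cost of compressing differences among the larger samples.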
Two homopolymer microsatellites (both A on the sense strand) were detected in the rps16 intron, each consisting of 11 or 12 bp in the samples studied. In the full sampling, ten indels ranging in size from 6 to 21 bp and one homopolymer microsatellite (also A on the sense strand) of 11–12 bp were found in the accD-psaI IGS. The amplified region of the accD-psaI IGS was exceedingly AT-rich (approx. 80 %, depending on the sample) and most of the length-variable regions were found in a fragment of 461 bp in the aligned matrix (positions 328–788), in which only one base (a G at position 581) was not A or T. In all cases where sequences were redone from new PCR products, all the same indels were found. Representative sequences are available from the first author.
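The AT content of the 461-bp fragment follows directly from the figures given: one non-A/T base in 461 positions gives 460/461, or about 99·8 %. The short Python sketch below shows the calculation. The placeholder string only reproduces the base counts (460 A/T and one G) and is not the real sequence or its base order.

```python
def at_fraction(seq):
    """Fraction of bases in seq that are A or T (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)

# Placeholder with the same composition as the reported fragment: 460 A/T bases plus one G.
placeholder = "AT" * 230 + "G"
print(len(placeholder), round(100 * at_fraction(placeholder), 2))
```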
All 14 length-variable regions were successfully scored for all samples, and all loci were variable. The final matrix contained information for 84 (111 if all 34 Danish samples were included) individuals. The alleles found are presented in Table 4. Twenty-three haplotypes were obtained from the combined alleles for the 14 regions (Table 5).
Haplotype H6 was the most common. Haplotypes H14 and H16 were also relatively common, and these three haplotypes were also widespread (see Fig. 1, Table 6). Two (H6 and H16) were found in individuals ranging from the Spanish Pyrenees to northern Sweden. Other haplotypes, at this level of sampling, were more localized, with some only being represented by one individual.
The more diverse populations in terms of gene diversity and haplotype richness were Eastern Alps, France, Poland/Estonia and Switzerland (Table 6). If haplotype numbers and numbers of polymorphic sites are considered in addition, then Sweden/Dalarna also emerges as being among the more variable populations (eight haplotypes found among 37 specimens, eight polymorphic sites; Table 7). A high proportion of the genetic variability in the haplotype data was attributed to the within-population level (82·5 %) whereas only 17·5 % resided among populations (Table 8). Genetic divergence estimated with ordered haplotypes was not significantly different from divergence for unordered haplotypes, regardless of whether RST or NST was used to estimate divergence for ordered haplotypes. The results for the NST analysis are shown in Table 9.
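The within- versus among-population split reported above comes from a single-level AMOVA, in which the among-population share of the variance equals ΦST (so 17·5 % corresponds to ΦST = 0·175). The following Python sketch shows the variance partition of Excoffier et al. (1992) for haplotype data. It assumes that the squared distance between two haplotypes is the number of loci at which they differ; the distance actually used in ARLEQUIN for this study is not stated in the text. Population names and haplotypes are hypothetical, and the permutation test is omitted.

```python
from itertools import combinations

def mismatch_sq(h1, h2):
    """Assumed squared distance: number of loci with different alleles."""
    return sum(a != b for a, b in zip(h1, h2))

def ssd(haplotypes):
    """Sum of squared deviations: pairwise squared distances divided by group size."""
    n = len(haplotypes)
    return sum(mismatch_sq(a, b) for a, b in combinations(haplotypes, 2)) / n

def amova_one_level(populations):
    """Partition variance among and within populations; returns Phi_ST and percentages."""
    pops = list(populations.values())
    n_total = sum(len(p) for p in pops)
    n_pops = len(pops)
    ssd_within = sum(ssd(p) for p in pops)
    ssd_among = ssd([h for p in pops for h in p]) - ssd_within
    var_within = ssd_within / (n_total - n_pops)
    # n0 corrects for unequal sample sizes among populations.
    n0 = (n_total - sum(len(p) ** 2 for p in pops) / n_total) / (n_pops - 1)
    var_among = (ssd_among / (n_pops - 1) - var_within) / n0
    total = var_among + var_within
    if total == 0:
        raise ValueError("no variation in the data")
    return {
        "phi_st": var_among / total,
        "pct_among": 100 * var_among / total,
        "pct_within": 100 * var_within / total,
    }

# Hypothetical data: three populations, haplotypes coded at three loci.
toy = {
    "pop1": [(1, 1, 2), (1, 1, 2), (2, 1, 2)],
    "pop2": [(1, 1, 2), (2, 1, 1), (2, 1, 2)],
    "pop3": [(1, 2, 2), (1, 1, 2), (1, 1, 2)],
}
print(amova_one_level(toy))
```

The among-population component can come out negative when populations are more similar than expected by chance; this is a known property of the estimator and is usually interpreted as zero structure.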
The MJ network was composed of a predominantly Western European torso of closely related haplotypes (just one mutation step between nodes) and a number of predominantly Central European branches made up of more divergent haplotypes (up to four mutation steps between nodes; Fig. 1). Haplotypes H6, H14 and H16 were the most common. Each of these haplotypes was found in a central position in the network (at least three connections branching out from each haplotype), thus indicating that these are older, ancestral haplotypes that gave rise to less common, derived haplotypes found elsewhere in the network.
Five haplotypes (H2, H10, H14, H17 and H23) were found among individuals from England (Table 6). Of these, H14 was widespread, also being found in France, Switzerland and Sweden. Haplotype H10 was also found in France and Spain, H17 in Spain and H23 in Sweden. Haplotype H2 was only found in England, in one of the putatively introduced plants (the other putative introduction shared H17 with one plant from Spain). In the MJ network, H2 falls on a branch with three other haplotypes: H4 and H5 (both Poland/Estonia) and H8 (Sweden/Dalarna). This supports the suggestion that the plant carrying H2 is non-native.
Plastid DNA polymorphisms are here shown to have the capacity to reveal genetic variability in C. calceolus, providing an alternative to the use of nuclear DNA, which in this species is compromised by the large nuclear genome size. Although data from plastid DNA are generally less information-rich due to the uniparental pattern of inheritance, in this case genetic fingerprinting using AFLP appears to have given a dramatic underestimate of levels of genetic variation (Fay and Cowan, 2001; Fay et al., 2002, 2005; Leitch and Fay, 2008). In a separate study, aiming at developing nuclear microsatellites for C. calceolus, only two out of seven which gave scorable results were polymorphic (I. Kahandawala et al., unpubl. res.), possibly indicating a similar problem.
The A/T-rich region of the accD-psaI IGS was particularly informative in this case, and the extra effort required in terms of sequencing the fragment rather than sizing it (as for the other loci) was repaid by the increase in the number of haplotypes revealed. The percentage A/T approaches 100 % for 461 bp, making this the most A/T-rich region of DNA of which we are aware.
Central European material, despite relatively sparse sampling, appears to be genetically more variable than northern and western material. In principle, the high levels of genetic diversity in the Eastern Alps and Switzerland (Table 7) are compatible with two hypotheses: (1) the presence of refugial populations in Central Europe, in regions surrounding the ice sheets that covered the Alps during the Weichselian glaciation; and (2) genetic admixture in this region involving two or more postglacial colonization routes from unsampled southern/south-eastern European refugia. A study of genetic structure with nuclear markers, including the estimation of admixture proportions for individual plants, would provide additional insight (although this may be difficult to achieve due to the problems with genome size discussed above).
The high level of genetic diversity detected in Poland/Estonia (Table 7) corroborates the findings of Kull and Paaver (1997) and Brzosko et al. (2002). The former, studying three strongly polymorphic aspartate aminotransferase loci, observed high levels of heterozygosity in seven Estonian populations of C. calceolus (Ho = 0·40–0·53). An eighth population was less genetically diverse (Ho = 0·16), which was ascribed to a founder effect. Brzosko et al. (2002) examined 11 allozyme loci in three populations from the Biebrza valley, north-eastern Poland, and assessed the genetic diversity as percentage polymorphic loci (P), mean number of alleles per locus (A), and observed (Ho) and expected (He) heterozygosity. They found diversity within populations to be relatively high compared with rare taxa and taxa with similar life histories (P = 45·5 % in all populations; A = 0·184; Ho = 0·184 on average; He = 0·156 on average). The genetic variation found between populations was even smaller (1·6 %) than in the present study.
The long branches in the predominantly Central European section of the haplotype network (Fig. 1) may be indicative of the presence of genetic material derived from a different post-glacial colonization route (or routes), possibly stemming from unsampled populations to the east of the region on which the present study was focused. Central European and eastern material remains undersampled and is an obvious focus for future studies, although the difficulty of obtaining samples due to legal protection of this species in many countries may make this problematic. Denser sampling in these regions would potentially clarify the number of refugia that existed for this species during the last ice age. The variability found in Swedish material may indicate that Sweden was recolonized by individuals from two or more refugia, migrating along western and central/eastern routes.
The absence of a significant difference between the divergences for ordered haplotypes and unordered haplotypes (Table 9) indicates the absence of marked phylogeographical structure in the data, i.e. seed-mediated gene flow was more important than mutation in generating the observed genetic structure. This may obscure the geographical patterns, even if more central and eastern samples are obtained.
Orchid seeds are commonly said to be capable of long-distance dispersal by the wind due to their small size. Darwin (1877, pp. 278–279) stated that ‘The minute seeds within their light coats are well fitted for wide dissemination; and I have several times observed seedlings springing up in my orchard and in a newly-planted wood, which must have come from a considerable distance. This was especially the case with Epipactis latifolia; and an instance has been recorded by a good observer [Mr. Bree, in “Loudon's Mag. of Nat. Hist,” vol. ii. 1829, p. 70.] of seedlings of this plant appearing at the distance of between eight and ten miles from any place where it grew.’ Summerhayes (1951) stated that ‘the seeds are certainly very light, and easily carried for long distances by the wind’ and ‘the seeds are shaken out a few at a time, and are distributed far and wide. These seeds, as in the case with others distributed by the wind, are probably carried upwards, by ascending columns of hot air, to very considerable heights, and then transported long distances in the upper atmosphere before sinking slowly to earth again.’ In their extensive review on orchid seeds, Arditti and Ghani (2000) tabulated published dispersal distances of up to 2000 km for orchid seeds. In contrast, some authors have reported that most orchid seeds fall within only a short distance of the source plant. In Spiranthes spiralis, the vast majority of seeds were demonstrated to fall within 15 cm of the source plant (Machon et al., 2003). In Anacamptis morio, Dactylorhiza majalis and Pseudorchis albida, Jersáková and Malinová (2007) demonstrated that the proportion of seeds landing more than 1 m away from the source plant approached zero. In Orchis purpurea, Jacquemyn et al. (2007) demonstrated ‘rather limited seed dispersal distances’ (approx. 4–5 m), and in Cypripedium macranthos, Chung et al. (2009) stated that ‘most seeds fall close to the maternal plants’ on the basis of the significant fine-scale spatial genetic structure they found. Although orchid seeds do have the potential for dispersal over great distances, it is thus probable that the pattern of dispersal follows a leptokurtic distribution in most (if not all) cases.
In the present study, some of the haplotypes found in C. calceolus were widely distributed, in extreme cases from the Pyrenees to northern Sweden. This and the highly homogeneous genetic structure revealed by AMOVA (Table 8) can be interpreted as support for wide-ranging seed dispersal in these light-seeded, wind-dispersed orchids, at least as an occasional (or rare) occurrence. Our data do not allow us to say whether dispersal occurred from the Pyrenees to Sweden in one or more than one step, but the hypervariable nature of the loci involved leads us to believe that it is unlikely to represent more than a few seed dispersal events. An alternative explanation is that both the Pyrenees and Sweden were colonized by individuals from geographically intermediate populations in glacial refugia. This explanation, however, still involves dispersal over substantial distances.
On the basis of the results presented here, English Nature (now Natural England) reached the decision that seedlings resulting from self-pollination of the two putatively introduced individuals of C. calceolus or cross pollination of these individuals with other plants in England should be excluded from the reintroduction programme for this species. Despite the extreme demographic bottleneck suffered by this species in England, a higher level of genetic diversity has been maintained than in the larger populations in Denmark. Data for nuclear microsatellites also show that the native English plants are genetically variable, whereas Danish plants are genetically homogeneous (I. Kahandawala et al., unpubl. res.). This should be taken into account for seed storage and other conservation activities. Due to the low level of sampling reported here, it is not possible to make firm recommendations about conservation activities elsewhere in Europe, but it appears that Central (and some Northern) European populations may be more diverse than those in Western Europe. Swedish populations are more diverse than those in Denmark, and this could be due to there being more than one migration route following the last glaciation. At the same time, the lack of genetic polymorphism in Danish populations could be due to a founder effect, as the present occurrence in the region of Jutland was probably founded by long-distance seed dispersal. Thus, the sampled populations (discovered as late as 1884 and 1968, respectively) are the only Danish populations known since 1767 and the only ones ever known from Jutland.
This work was funded by the Royal Botanic Gardens, Kew, and English Nature (now part of Natural England) as part of the Biodiversity Action Plan for C. calceolus. We thank all colleagues who provided plant material, particularly Fred Campbell (Sweden), Marcin Zych and Maciej Romański (Poland), José M. Iriondo (Spain), Peter Schönswetter and Andreas Tribsch (Austria) and Tiiu Kull (Estonia). We also thank Mark Chase, Phil Cribb and Ilia Leitch (Kew) and Ian Taylor (Natural England) for useful comments on this project and Olivier Maurin and Robyn Cowan for support in the laboratory.