The chloroplast genes matK and rbcL have been proposed as a “core” DNA barcode for identifying plant species. Published estimates of successful species identification using these loci (70-80%) may be inflated because they may have involved comparisons among distantly related species within target genera. To assess the ability of the proposed two-locus barcode to discriminate closely related species, we carried out a hierarchically structured set of comparisons within Viburnum, a clade of woody angiosperms containing ca. 170 species (some 70 of which are currently used in horticulture). For 112 Viburnum species, we evaluated rbcL + matK, as well as the chloroplast regions rpl32-trnL, trnH-psbA, trnK, and the nuclear ribosomal internal transcribed spacer region (nrITS).
At most, rbcL + matK could discriminate 53% of all Viburnum species, with only 18% of the comparisons having genetic distances >1%. When comparisons were progressively restricted to species within major Viburnum subclades, there was a significant decrease in both the discriminatory power and the genetic distances. trnH-psbA and nrITS show much higher levels of variation and potential discriminatory power, and their use in plant barcoding should be reconsidered. As barcoding has often been used to discriminate species within local areas, we also compared Viburnum species within two regions, Japan and Mexico and Central America. Greater success in discriminating among the Japanese species reflects the deeper evolutionary history of Viburnum in that area, as compared to the recent radiation of a single clade into the mountains of Latin America.
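The share of pairwise comparisons exceeding a distance threshold, such as the 1% figure quoted above, can be illustrated with a short sketch. It assumes an uncorrected p-distance over pre-aligned sequences; the abstract does not state which distance measure was used, and the function names and toy alignment below are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among positions where both sequences have a base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguity codes; only unambiguous bases are compared.
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def fraction_above(alignment, threshold=0.01):
    """Share of all pairwise comparisons whose p-distance exceeds the threshold."""
    pairs = list(combinations(alignment.values(), 2))
    above = sum(1 for a, b in pairs if p_distance(a, b) > threshold)
    return above / len(pairs)


# Made-up 20-site alignment, for illustration only (not data from the study).
toy = {
    "sp1": "ACGTACGTACGTACGTACGT",
    "sp2": "ACGTACGTACGTACGTACGT",
    "sp3": "ACGTACGTACGTACGTACGA",
}
print(fraction_above(toy))
```

In this toy case two of the three comparisons differ at one site in twenty, so two thirds of the comparisons exceed the 1% threshold, while sp1 and sp2 cannot be told apart at all.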
We found very low levels of discrimination among closely related species of Viburnum, and low levels of variation in the proposed barcoding loci may limit success within other clades of long-lived woody plants. Including the supplementary barcodes trnH-psbA and nrITS increased discrimination rates, but these loci were often more effective alone than in combination with rbcL + matK. We surmise that the efficacy of barcoding in plants has often been overestimated because of the lack of comparisons among closely related species. Phylogenetic information must be incorporated to properly evaluate relatedness in assessing the utility of barcoding loci.
The trnH–psbA intergenic spacer region has been used in many DNA barcoding studies. However, a comprehensive evaluation with rigorous sequence preprocessing and statistical testing on the utility of trnH–psbA and its combinations as DNA barcodes is lacking.
Sequences were retrieved from GenBank for a meta-analysis on the usefulness of trnH–psbA and its combinations as DNA barcodes. After preprocessing, we constructed full and matching data sets that contained 17 983 trnH–psbA sequences and 2190 sets of trnH–psbA, matK, rbcL, and ITS2 sequences from the same sample, respectively. These data sets were used to analyze the ability of trnH–psbA and its combinations to discriminate species by the BLAST and BLAST+P methods. Fisher's exact test was used to evaluate the significance of performance differences. For the full data set, the identification success rates of trnH–psbA exceeded 70% in 18 families and 12 genera. For the matching data set, the identification rates of trnH–psbA were significantly higher than those of the other loci in two families and four genera. Similarly, the identification rates of trnH–psbA+ITS2 were significantly higher than those of matK+rbcL in 18 families and 21 genera.
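A comparison of identification success between two loci with Fisher's exact test can be sketched as follows. This is a minimal standard-library implementation of the two-sided test on a 2x2 table, not the authors' pipeline; the function name and the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be two loci; columns could be species identified / not identified.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the
    # observed one (small tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: locus A identifies 8 of 10 species, locus B 1 of 6.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))  # 0.035
```

An exact test suits this setting because many families and genera contribute only a handful of species, where large-sample approximations are unreliable.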
This study provides valuable information on the higher utility of trnH–psbA and its combinations. We found that trnH–psbA+ITS2 combination performs better or equally well compared with other combinations in most taxonomic groups investigated. This information will guide the optimal usage of trnH–psbA and its combinations for species identification.
The rapidly increasing number of available plant genomes opens up almost unlimited prospects for biology in general and molecular phylogenetics in particular. A recent study took advantage of this data and identified a set of nuclear genes that occur in single copy in multiple sequenced angiosperms. The present study is the first to apply genomic sequence of one of these low copy genes, agt1, as a phylogenetic marker for species-level phylogenetics. Its utility is compared to the performance of several coding and non-coding chloroplast loci that have been suggested as most applicable for this taxonomic level. As a model group, we chose Tildenia, a subgenus of Peperomia (Piperaceae), one of the largest plant genera. Relationships are particularly difficult to resolve within these species rich groups due to low levels of polymorphisms and fast or recent radiation. Therefore, Tildenia is a perfect test case for applying new phylogenetic tools.
We show that the nuclear marker agt1, and in particular the agt1 introns, provide a significantly increased phylogenetic signal compared to chloroplast markers commonly used for low level phylogenetics. 25% of aligned characters from agt1 intron sequence are parsimony informative. In comparison, the introns and spacer of several common chloroplast markers (trnK intron, trnK-psbA spacer, ndhF-rpl32 spacer, rpl32-trnL spacer, psbA-trnH spacer) provide less than 10% parsimony informative characters. The agt1 dataset provides a deeper resolution than the chloroplast markers in Tildenia.
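The percentage of parsimony-informative characters reported above can be computed from an alignment as in the sketch below. It assumes the usual definition (a column is informative when at least two character states each occur in at least two sequences) and ignores gaps and ambiguity codes; the function names and toy alignment are hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two bases each occur in at least two sequences."""
    counts = Counter(ch for ch in column.upper() if ch in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def percent_informative(alignment):
    """Percentage of alignment columns that are parsimony informative."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    informative = sum(
        is_parsimony_informative("".join(seq[i] for seq in alignment))
        for i in range(length)
    )
    return 100.0 * informative / length


# Made-up alignment of four sequences and eight sites (illustration only).
toy = [
    "AACGTACG",
    "AACGTACG",
    "ATCGAACC",
    "ATCGAACG",
]
print(percent_informative(toy))  # 25.0
```

The last column shows why variable sites are not automatically informative: a base found in a single sequence cannot favor one tree topology over another.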
Single (or very low) copy nuclear genes are of immense value in plant phylogenetics. Compared to other nuclear genes that are members of gene families of all sizes, lab effort, such as cloning, can be kept to a minimum. They also provide regions with different phylogenetic content deriving from coding and non-coding parts of different length. Thus, they can be applied to a wide range of taxonomic levels from family down to population level. As more plant genomes are sequenced, we will obtain increasingly precise information about which genes return to single copy most rapidly following gene duplication and may be most useful across a wide range of plant groups.
DNA barcoding will revolutionize our understanding of fern ecology, most especially because the accurate identification of the independent but cryptic gametophyte phase of the fern's life history—an endeavor previously impossible—will finally be feasible. In this study, we assess the discriminatory power of the core plant DNA barcode (rbcL and matK), as well as alternatively proposed fern barcodes (trnH-psbA and trnL-F), across all major fern lineages. We also present plastid barcode data for two genera in the hyperdiverse polypod clade—Deparia (Woodsiaceae) and the Cheilanthes marginata group (currently being segregated as a new genus of Pteridaceae)—to further evaluate the resolving power of these loci.
Our results clearly demonstrate the value of matK data, previously unavailable in ferns because of difficulties in amplification due to a major rearrangement of the plastid genome. With its high sequence variation, matK complements rbcL to provide a two-locus barcode with strong resolving power. With sequence variation comparable to matK, trnL-F appears to be a suitable alternative barcode region in ferns, and perhaps should be added to the core barcode region if universal primer development for matK fails. In contrast, trnH-psbA shows dramatically reduced sequence variation for the majority of ferns. This is likely due to the translocation of this segment of the plastid genome into the inverted repeat regions, which are known to have a highly constrained substitution rate.
Our study provides the first endorsement of the two-locus barcode (rbcL+matK) in ferns, and favors trnL-F over trnH-psbA as a potential back-up locus. Future work should focus on gathering more fern matK sequence data to facilitate universal primer development.
A universal barcode system for land plants would be a valuable resource, with potential utility in fields as diverse as ecology, floristics, law enforcement and industry. However, the application of plant barcoding has been constrained by a lack of consensus regarding the most variable and technically practical DNA region(s). We compared eight candidate plant barcoding regions from the plastome and one from the mitochondrial genome for how well they discriminated the monophyly of 92 species in 32 diverse genera of land plants (N = 251 samples). The plastid markers comprise portions of five coding (rpoB, rpoC1, rbcL, matK and 23S rDNA) and three non-coding (trnH-psbA, atpF–atpH, and psbK–psbI) loci. Our survey included several taxonomically complex groups, and in all cases we examined multiple populations and species. The regions differed in their ability to discriminate species, and in ease of retrieval, in terms of amplification and sequencing success. Single locus resolution ranged from 7% (23S rDNA) to 59% (trnH-psbA) of species with well-supported monophyly. Sequence recovery rates were related primarily to amplification success (85–100% for plastid loci), with matK requiring the greatest effort to achieve reasonable recovery (88% using 10 primer pairs). Several loci (matK, psbK–psbI, trnH-psbA) were problematic for generating fully bidirectional sequences. Setting aside technical issues related to amplification and sequencing, combining the more variable plastid markers provided clear benefits for resolving species, although with diminishing returns, as all combinations assessed using four to seven regions had only marginally different success rates (69–71%; values that were approached by several two- and three-region combinations). This performance plateau may indicate fundamental upper limits on the precision of species discrimination that is possible with DNA barcoding systems that include moderate numbers of plastid markers. 
Resolution to the contentious debate on plant barcoding should therefore involve increased attention to practical issues related to the ease of sequence recovery, global alignability, and marker redundancy in multilocus plant DNA barcoding systems.
Species identification of living organisms by standard DNA sequences has been well accepted. The Consortium for the Barcode of Life (CBOL) recommends the chloroplast regions rbcL and matK as the DNA barcodes for land plants. This study aims to evaluate the feasibility and limitations of rbcL, matK, and five other commonly used regions as DNA barcodes for medicinal Gentiana species and their adulterants, Gentiana rhodantha and Podophyllum hexandrum.
The power of rbcL, matK, the nuclear internal transcribed spacer (ITS) and 5S rRNA intergenic spacer, and the chloroplast trnH-psbA, trnL-F and rpl36-rps8 intergenic spacers to differentiate species was tested by distinguishing medicinal Gentiana species, including Gentiana scabra, Gentiana triflora, Gentiana manshurica and Gentiana rigescens, from common adulterants such as Gentiana rhodantha and Podophyllum hexandrum (a toxic herb producing podophyllotoxin).
All seven tested loci could be used to differentiate medicinal Gentiana species from their adulterants, and to distinguish Guanlongdan from Jianlongdan. In terms of general differentiation powers, rbcL and matK had no significant advantages over the other five loci. Only the 5S rRNA and trnL-F intergenic spacers were able to discriminate the closely related species G. triflora, G. scabra and G. manshurica.
The DNA barcodes rbcL and matK are useful in differentiation of closely related medicinal species of Gentiana, but had no significant advantages over the other five tested loci.
Although consensus has now been reached on a general two-locus DNA barcode for land plants, the selected combination of markers (rbcL + matK) is not applicable for ferns at the moment. Yet especially for ferns, DNA barcoding is potentially of great value since fern gametophytes—while playing an essential role in fern colonization and reproduction—generally lack the morphological complexity for morphology-based identification and have therefore been underappreciated in ecological studies. We evaluated the potential of a combination of rbcL with a noncoding plastid marker, trnL-F, to obtain DNA-identifications for fern species. A regional approach was adopted, by creating a reference database of trusted rbcL and trnL-F sequences for the wild-occurring homosporous ferns of NW-Europe. A combination of parsimony analyses and distance-based analyses was performed to evaluate the discriminatory power of the two-region barcode. DNA was successfully extracted from 86 tiny fern gametophytes and was used as a test case for the performance of DNA-based identification. Primer universality proved high for both markers. Based on the combined rbcL + trnL-F dataset, all genera as well as all species with non-equal chloroplast genomes formed their own well supported monophyletic clade, indicating a high discriminatory power. Interspecific distances were larger than intraspecific distances for all tested taxa. Identification tests on gametophytes showed a comparable result. All test samples could be identified to genus level, species identification was well possible unless they belonged to a pair of Dryopteris species with completely identical chloroplast genomes. Our results suggest a high potential of the combined use of rbcL and trnL-F as a two-locus cpDNA barcode for identification of fern species. A regional approach may be preferred for ecological tests. 
We here offer such a ready-to-use barcoding approach for ferns, which opens the way for answering a whole range of questions previously unaddressed in fern gametophyte ecology.
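The comparison of interspecific and intraspecific distances described above is often summarized as a "barcoding gap". A minimal sketch follows, assuming pairwise distances have already been computed and that each species passes when its smallest distance to any other species exceeds its largest within-species distance. The function name, species labels, and distances are hypothetical.

```python
from collections import defaultdict


def species_with_gap(pairwise):
    """Map each species to True if min interspecific > max intraspecific distance.

    pairwise: iterable of (species_a, species_b, distance) tuples, one per
    pair of samples. Species with a single sample get a max intraspecific
    distance of 0.0.
    """
    max_intra = defaultdict(float)
    min_inter = {}
    for sp_a, sp_b, dist in pairwise:
        if sp_a == sp_b:
            max_intra[sp_a] = max(max_intra[sp_a], dist)
        else:
            for sp in (sp_a, sp_b):
                min_inter[sp] = min(min_inter.get(sp, float("inf")), dist)
    return {sp: min_inter[sp] > max_intra[sp] for sp in min_inter}


# Made-up distances: species B and C share identical sequences (distance 0).
toy = [
    ("A", "A", 0.001),
    ("A", "B", 0.020),
    ("A", "C", 0.030),
    ("B", "B", 0.0),
    ("C", "C", 0.0),
    ("B", "C", 0.0),
]
print(species_with_gap(toy))  # {'A': True, 'B': False, 'C': False}
```

The toy data mimic the situation reported for the Dryopteris pair: two species with identical plastid sequences have no gap and cannot be separated, however well the remaining species resolve.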
Phylogenetic relationships among the nine species of Eleusine were investigated based on RFLP of seven amplified chloroplast genes/intergenic spacers, the trnK gene sequence and cpSSR markers. The maternal genome donor (E. indica, 2n=2x=18) of the allotetraploid (2n=4x=36, 2n=2x=38) Eleusine species, and the phylogenetic relationships between cultivated E. coracana (2n=4x=36) and wild species have been successfully resolved. Species-specific markers were also identified. The explicit identification of the maternal parent and of the immediate wild progenitor of finger millet will be immensely useful for future genetic improvement and biotechnological program(s) of the crop species.
Assessment of phylogenetic relationships is an important component of any successful crop improvement programme, as wild relatives of the crop species often carry agronomically beneficial traits. Since its domestication in East Africa, Eleusine coracana (2n = 4x = 36), a species belonging to the genus Eleusine (x = 8, 9, 10), has held a prominent place in the semi-arid regions of India, Nepal and Africa. The patterns of variation between the cultivated and wild species reported so far and the interpretations based upon them have been considered primarily in terms of nuclear events. We analysed, for the first time, the phylogenetic relationship between finger millet (E. coracana) and its wild relatives by species-specific chloroplast deoxyribonucleic acid (cpDNA) polymerase chain reaction–restriction fragment length polymorphism (PCR–RFLP) and chloroplast simple sequence repeat (cpSSR) markers/sequences. Restriction fragment length polymorphism of the seven amplified chloroplast genes/intergenic spacers (trnK, psbD, psaA, trnH–trnK, trnL–trnF, 16S and trnS–psbC), nucleotide sequencing of the chloroplast trnK gene and chloroplast microsatellite polymorphism were analysed in all nine known species of Eleusine. The RFLP of all seven amplified chloroplast genes/intergenic spacers and trnK gene sequences in the diploid (2n = 16, 18, 20) and allotetraploid (2n = 36, 38) species resulted in well-resolved phylogenetic trees with high bootstrap values. Eleusine coracana, E. africana, E. tristachya, E. indica and E. kigeziensis did not show even a single change in restriction site. Eleusine intermedia and E. floccifolia were also shown to have identical cpDNA fragment patterns. The cpDNA diversity in Eleusine multiflora was found to be more extensive than that of the other eight species. The trnK gene sequence data complemented the results obtained by PCR–RFLP. The maternal lineage of all three allotetraploid species (AABB, AADD) was the same, with E. indica being the maternal diploid progenitor species. The markers specific to certain species were also identified.
cpSSR; Eleusine; PCR–RFLP; phylogeny; Poaceae; trnK gene sequence.
Plastid genomes exhibit different levels of variability in their sequences, depending on the respective kinds of genomic regions. Genes are usually more conserved while noncoding introns and spacers evolve at a faster pace. While a set of about thirty maximum variable noncoding genomic regions has been suggested to provide universally promising phylogenetic markers throughout angiosperms, applications often require several regions to be sequenced for many individuals. Our project aims to illuminate evolutionary relationships and species-limits in the genus Pyrus (Rosaceae)—a typical case with very low genetic distances between taxa. In this study, we have sequenced the plastid genome of Pyrus spinosa and aligned it to the already available P. pyrifolia sequence. The overall p-distance of the two Pyrus genomes was 0.00145. The intergenic spacers between ndhC–trnV, trnR–atpA, ndhF–rpl32, psbM–trnD, and trnQ–rps16 were the most variable regions, also comprising the highest total numbers of substitutions, indels and inversions (potentially informative characters). Our comparative analysis of further plastid genome pairs with similar low p-distances from Oenothera (representing another rosid), Olea (asterids) and Cymbidium (monocots) showed in each case a different ranking of genomic regions in terms of variability and potentially informative characters. Only two intergenic spacers (ndhF–rpl32 and trnK–rps16) were consistently found among the 30 top-ranked regions. We have mapped the occurrence of substitutions and microstructural mutations in the four genome pairs. High AT content in specific sequence elements seems to foster frequent mutations. We conclude that the variability among the fastest evolving plastid genomic regions is lineage-specific and thus cannot be precisely predicted across angiosperms. The often lineage-specific occurrence of stem-loop elements in the sequences of introns and spacers also governs lineage-specific mutations. 
Sequencing whole plastid genomes to find markers for evolutionary analyses is therefore particularly useful when overall genetic distances are low.
Chrysanthemum L. (Asteraceae-Anthemideae) is a genus with rapid speciation. It comprises about 40 species, most of which are distributed in East Asia. Many of these are narrowly distributed and habitat-specific. Considerable variations in morphology and ploidy are found in this genus. Some species have been the subjects of many studies, but the relationships between Chrysanthemum and its allies and the phylogeny of this genus remain poorly understood. In the present study, 32 species/varieties from Chrysanthemum and 11 from the allied genera were analyzed using DNA sequences of the single-copy nuclear CDS gene and seven cpDNA loci (psbA-trnH, trnC-ycf6, ycf6-psbM, trnY-rpoB, rpS4-trnT, trnL-F, and rpL16). The cpDNA and nuclear CDS gene trees both suggest that 1) Chrysanthemum is not a monophyletic taxon, and the affinity between Chrysanthemum and Ajania is so close that these two genera should be incorporated taxonomically; 2) Phaeostigma is more closely related to the Chrysanthemum+Ajania than other generic allies. According to pollen morphology and to the present cpDNA and CDS data, Ajania purpurea is a member of Phaeostigma. Species differentiation in Chrysanthemum appears to be correlated with geographic and environmental conditions. The Chinese Chrysanthemum species can be divided into two groups, the C. zawadskii group and the C. indicum group. The former is distributed in northern China and the latter in southern China. Many polyploid species, such as C. argyrophyllum, may have originated from allopolyploidization involving divergent progenitors. Considering all the evidence from present and previous studies, we conclude that geographic and ecological factors as well as hybridization and polyploidy play important roles in the divergence and speciation of the genus Chrysanthemum.
DNA barcoding as a tool for species identification has been successful in animals and other organisms, including certain groups of plants. This new tool has scarcely been explored, particularly for tree species, in biodiversity-rich countries like India. rbcL and matK are standard barcode loci, while ITS and trnH-psbA are considered supplementary loci for plants.
Methodology and Principal Findings
Plant barcode loci, namely, rbcL, matK, ITS, trnH-psbA, and the recently proposed ITS2, were tested for their efficacy as barcode loci using 300 accessions of tropical tree species. We tested these loci for PCR success, sequencing success, and species discrimination ability using three methods. rbcL was the best locus as far as PCR and sequencing success rates were concerned, but not for species discrimination among tropical tree species. ITS and trnH-psbA were the second best loci in PCR and sequencing success, respectively. The species discrimination ability of ITS ranged from 24.4 percent to 74.3 percent and that of trnH-psbA from 25.6 percent to 67.7 percent, depending upon the data set and the method used. matK provided the least PCR success, followed by ITS2 (59.0%). Species resolution by ITS2 and rbcL ranged from 9.0 percent to 48.7 percent and 13.2 percent to 43.6 percent, respectively. Further, we observed that the barcode loci studied here are poorly represented in the NCBI nucleotide database for tree species.
Although a conservatively estimated success rate of 60–70 percent for both ITS and trnH-psbA may not be considered highly successful, it would certainly help in large-scale biodiversity inventories, particularly for tropical tree species, considering the success rates reported so far by plant DNA barcoding programs. The recommended matK and rbcL primer combination may not work as barcode markers in tropical tree species.
Oncidium spp. produce commercially important orchid cut flowers. However, they are amenable to intergeneric and inter-specific crossing making phylogenetic identification very difficult. Molecular markers derived from the chloroplast genome can provide useful tools for phylogenetic resolution.
The complete chloroplast genome of the economically important Oncidium variety Onc. Gower Ramsey (Accession no. GQ324949) was determined using polymerase chain reaction (PCR) and Sanger-based ABI sequencing. The length of the Oncidium chloroplast genome is 146,484 bp. Genome structure, gene order and orientation are similar to those of Phalaenopsis, but differ from those of typical Poaceae, the other monocots for which several chloroplast (cp) genomes have been published. The Onc. Gower Ramsey chloroplast-encoded NADH dehydrogenase (ndh) genes, except ndhE, lack apparent functions. Deletions and other types of mutations were also found in the ndh genes of 15 other economically important Oncidiinae varieties, except ndhE in some species. The positions of some species in the evolution and taxonomy of Oncidiinae are difficult to identify. To identify the relationships between the 15 Oncidiinae hybrids, eight regions of the Onc. Gower Ramsey chloroplast genome were amplified by PCR for phylogenetic analysis. A total of 7042 bp derived from the eight regions could identify the relationships at the species level, which were supported by high bootstrap values. One particular 1846 bp region, derived from two PCR products (trnHGUG-psbA and trnFGAA-ndhJ), was adequate for correct phylogenetic placement of 13 of the 15 varieties (with the exception of Degarmoara Flying High and Odontoglossum Violetta von Holm). Thus the chloroplast genome provides a useful molecular marker for species identification.
In this report, we used Phalaenopsis aphrodite as a prototype for primer design to complete the Onc. Gower Ramsey genome sequence. Gene annotation showed that most of the ndh genes in Oncidiinae, with the exception of ndhE, are non-functional. This phenomenon was observed in all of the Oncidiinae species tested. The genes and chloroplast DNA regions that would be the most useful for phylogenetic analysis were determined to be the trnHGUG-psbA and the trnFGAA-ndhJ regions. We conclude that complete chloroplast genome information is useful for plant phylogenetic and evolutionary studies in Oncidium with applications for breeding and variety identification.
The utility of DNA barcoding for identifying representative specimens of the circumpolar tree genus Fraxinus (56 species) was investigated. We examined the genetic variability of several loci suggested in chloroplast DNA barcode protocols, such as matK, rpoB, rpoC1 and trnH-psbA, in a large worldwide sample of Fraxinus species. The chloroplast intergenic spacer rpl32-trnL was further assessed in search of a potentially variable and useful locus. The results of the study suggest that the proposed cpDNA loci, alone or in combination, cannot fully discriminate among species because of the generally low rates of substitution in the chloroplast genome of Fraxinus. The intergenic spacer trnH-psbA was the best performing locus, but genetic distance-based discrimination was only moderately successful and separated the samples only at the subgenus level. The BLAST approach performed better than neighbor-joining tree reconstruction with pairwise Kimura two-parameter rates of substitution, but it correctly identified fewer than half of the species sampled. Such rates are substantially lower than the success rate required for a standardised barcoding approach. Consequently, the current cpDNA barcodes are inadequate to fully discriminate Fraxinus species. Given that a low rate of substitution is common among the plastid genomes of trees, the use of the plant cpDNA “universal” barcode may not be suitable for the safe identification of tree species below a generic or sectional level. Supplementary barcoding loci of the nuclear genome and alternative solutions are proposed and discussed.
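The Kimura two-parameter (K2P) distance mentioned above corrects the observed proportions of transitions (P) and transversions (Q) for multiple substitutions, using d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)). The sketch below applies that formula to a pair of pre-aligned sequences; the function name and the toy sequences are hypothetical, and sites with gaps or ambiguity codes are simply skipped.

```python
from math import log, sqrt

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * log((1 - 2 * p - q) * sqrt(1 - 2 * q))


# Made-up pair differing by a single A<->G transition in ten sites.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

For this pair the uncorrected distance is 0.1, while the K2P estimate is slightly higher (about 0.112) because the model allows for substitutions hidden by later changes at the same site. At the very low divergences typical of tree plastid genomes the correction is negligible, so the choice of distance matters less than the lack of variation itself.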
A DNA barcode is a DNA fragment used to identify species. For land plants, DNA fragments of the plastid genome could be the primary consideration. Unfortunately, most of the plastid candidate barcodes lack species-level resolution. Identifying DNA barcodes with high resolution at the species level is critical to the success of DNA barcoding in plants. We searched the available plastid genomes for the most variable regions and tested the best candidates using both a large number of tree species and seven well-sampled plant groups. Two regions of the plastid gene ycf1, ycf1a and ycf1b, were the most variable loci; they performed better than existing plastid candidate barcodes and can serve as a barcode of land plants. Primers were designed for the amplification of these regions, and the PCR success of these primers ranged from 82.80% to 98.17%. Of 420 tree species, 357 species could be distinguished using ycf1b, which was slightly better than the combination of matK and rbcL. For the well-sampled representative plant groups, ycf1b generally performed better than any of matK, rbcL and trnH-psbA. We concluded that ycf1a or ycf1b is the most variable plastid genome region and can serve as a core barcode of land plants.
Based on the testing of several loci, predominantly against floristic backgrounds, individual or different combinations of loci have been suggested as possible universal DNA barcodes for plants. The present investigation was undertaken to check the applicability of the recommended locus/loci for congeneric species with Dendrobium species as an illustrative example.
Six loci, namely matK, rbcL, rpoB, rpoC1 and the trnH-psbA spacer from the chloroplast genome and ITS from the nuclear genome, were compared for their amplification, sequencing and species discrimination success rates among multiple accessions of 36 Dendrobium species. The trnH-psbA spacer could not be considered for analysis as good quality sequences were not obtained with its forward primer. Among the tested loci, ITS, recommended by some as a possible barcode for plants, provided 100% species identification. Another locus, matK, also recommended as a universal barcode for plants, resolved 80.56% of species. ITS remained the best even when sequences of the investigated loci from additional Dendrobium species available in NCBI GenBank (93, 33, 20, 18 and 17 for ITS, matK, rbcL, rpoB and rpoC1, respectively) were also considered when calculating the percentage of species resolved. The species discrimination of various combinations of the loci was also compared based on the 36 investigated species and an additional 16 for which sequences of all five loci were available in GenBank. The two-locus combination matK+rbcL, recommended by the Plant Working Group of the Consortium for the Barcode of Life (CBOL), could discriminate 86.11% of the 36 species. The species discriminating ability of this barcode was reduced to 80.77% when additional sequences available in NCBI were included in the analysis. Among the recommended combinations, the barcode based on three loci (matK, rpoB and rpoC1) resolved the maximum number of species.
Any recommended barcode based on the loci tested so far is unlikely to provide 100% species identification across the plant kingdom and thus is unlikely to act as a universal barcode. It appears that barcodes, if based on a single locus or a limited number of loci, would be taxon specific, as exemplified by the success of ITS among Dendrobium species, though ITS may not be suitable for other plants because of the problems that are discussed.
Dendrobium; DNA barcoding; ITS; matK
A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level.
Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species.
A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.
The first comprehensive molecular phylogenetic reconstruction of the Cichorieae subtribe Lactucinae is provided. Sequences for two datasets, one of the nuclear rDNA ITS region, the other of five concatenated non-coding chloroplast DNA markers including the petD region and the psbA-trnH, 5′trnL(UAA)-trnF, rpl32-trnL(UAG) and trnQ(UUG)-5′rps16 spacers, were, with few exceptions, newly generated for 130 samples of 78 species. The sampling spans the entire subtribe Lactucinae while focusing on its Chinese centre of diversity; more than 3/4 of the Chinese Lactucinae species are represented. The nuclear and plastid phylogenies inferred from the two independent datasets show various hard topological incongruences. They concern the internal topology of major lineages, in one case the placement of taxa in major lineages, the relationships between major lineages and even the circumscription of the subtribe, indicating potential events of ancient as well as of more recent reticulation and chloroplast capture in the evolution of the subtribe. The core of the subtribe is clearly monophyletic, consisting of the six lineages, Cicerbita, Cicerbita II, Lactuca, Melanoseris, Notoseris and Paraprenanthes. The Faberia lineage and the monospecific Prenanthes purpurea lineage are part of a monophyletic subtribe Lactucinae only in the nuclear or plastid phylogeny, respectively. Morphological and karyological support for their placement is considered. In the light of the molecular phylogenetic reconstruction and of additional morphological data, the conflicting taxonomies of the Chinese Lactuca alliance are discussed and it is concluded that the major lineages revealed are best treated at generic rank. An improved species level taxonomy of the Chinese Lactucinae is outlined; new synonymies and some new combinations are provided.
The concept of DNA barcoding for species identification has gained considerable momentum in animals because of fairly successful species identification using cytochrome oxidase I (COI). In plants, matK and rbcL have been proposed as standard barcodes. However, barcoding in complex genera is a challenging task.
Methodology and Principal Findings
We investigated the species discriminatory power of four of the reportedly most promising plant DNA barcoding loci (one from the nuclear genome, ITS, and three from the plastid genome, trnH-psbA, rbcL and matK) in species of Indian Berberis L. (Berberidaceae) and two other genera, Ficus L. (Moraceae) and Gossypium L. (Malvaceae). Berberis species were delineated using morphological characters. These characters resulted in a well resolved species tree. Applying both nucleotide distance and nucleotide character-based approaches, we found that none of the loci, either singly or in combinations, could discriminate the species of Berberis. ITS resolved all the tested species of Ficus and Gossypium and trnH-psbA resolved 82% of the tested species in Ficus. The highly regarded matK and rbcL could not resolve all the species. Finally, we employed amplified fragment length polymorphism (AFLP) analysis in species of Berberis to determine their relationships. Using ten primer pair combinations in AFLP, the data demonstrated incomplete species resolution. Further, AFLP analysis showed that there was a tendency of the Berberis accessions to cluster according to their geographic origin rather than species affiliation.
We reconfirm the earlier reports that the concept of a universal barcode in plants may not work in a number of genera. Our results also suggest that matK and rbcL, recommended as universal barcode loci for plants, may not work in all the genera of land plants. Morphological, geographical and molecular data analyses of Indian species of Berberis suggest probable reticulate evolution and thus barcode markers may not work in this case.
Although DNA barcoding has become a useful tool for species identification and biodiversity surveys in plant sciences, there remains little consensus concerning appropriate sampling strategies and the treatment of indels. To address these two issues, we sampled 39 populations for nine Taxus species across their entire ranges, with two to three individuals per population randomly sampled. We sequenced one core DNA barcode (matK) and three supplementary regions (trnH-psbA, trnL-trnF and ITS) for all samples to test the effects of sampling design and the utility of indels. Our results suggested that increasing sampling within populations did not change the clustering of individuals, and that mean within-population P-distances were zero for most populations in all regions. Based on the markers tested here, comparison of methods either including or excluding indels indicated that discrimination and nodal support of monophyletic groups were significantly increased when indels were included. Thus we concluded that one individual per population was adequate to represent the within-population variation in these species for DNA barcoding, and that intra-specific sampling was best focused on representing the entire ranges of certain taxa. We also found that indels occurring in the chloroplast trnL-trnF and trnH-psbA regions were informative for differentiating among closely related taxa, and we propose that indel-coding methods be considered in future DNA barcoding projects on closely related plant species at or below the generic level.
DNA barcoding; indel (gap) coding; sampling strategy; noncoding chloroplast regions; Taxus
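The effect of including or excluding indels on pairwise P-distances, as discussed in the Taxus abstract above, can be illustrated with a minimal sketch. It is not the indel-coding method used in that study: here each gap column is simply counted as a difference when `gaps_as_state` is set, whereas formal indel coding usually scores each indel event as a single character. The function name, the flag, and the toy sequences are assumptions made for illustration.

```python
def p_distance(a: str, b: str, gaps_as_state: bool = False) -> float:
    """Uncorrected p-distance between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # a gap shared by both sequences is uninformative for this pair
        if (x == "-" or y == "-") and not gaps_as_state:
            continue  # pairwise deletion: ignore sites with a gap in either sequence
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 0.0


# Toy aligned sequences (invented): identical except for a 2-bp indel.
seq1 = "ACGTAC--GT"
seq2 = "ACGTACTTGT"

d_excl = p_distance(seq1, seq2)                      # indel sites ignored
d_incl = p_distance(seq1, seq2, gaps_as_state=True)  # indel sites counted
print(d_excl, d_incl)
```

With gaps ignored the two toy sequences are indistinguishable, while counting gap sites separates them, which is the kind of gain in discrimination the abstract attributes to indels.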
Despite considerable progress, many details regarding the evolution of the Arcto-Tertiary flora, including the timing, direction, and relative importance of migration routes in the evolution of woody and herbaceous taxa of the Northern Hemisphere, remain poorly understood. Meehania (Lamiaceae) comprises seven species and five subspecies of annual or perennial herbs, and is one of the few Lamiaceae genera known to have an exclusively disjunct distribution between eastern Asia and eastern North America. We analyzed the phylogeny and biogeographical history of Meehania to explore how the Arcto-Tertiary biogeographic hypothesis and two possible migration routes explain the disjunct distribution of Northern Hemisphere herbaceous plants. Parsimony and Bayesian inference were used for phylogenetic analyses based on five plastid sequences (rbcL, rps16, rpl32-trnH, psbA-trnH, and trnL-F) and two nuclear (ITS and ETS) gene regions. Divergence time estimation and biogeographic inference were performed using Bayesian methods as implemented in BEAST and S-DIVA, respectively. Analyses including 11 of the 12 known Meehania taxa revealed incongruence between the chloroplast and nuclear trees, particularly in the positions of Glechoma and Meehania cordata, possibly indicating allopolyploidy with chloroplast capture in the late Miocene. Based on nrDNA, Meehania is monophyletic, and the North American species M. cordata is sister to a clade containing the eastern Asian species. The divergence between the North American M. cordata and the eastern Asian species was dated to about 9.81 Mya according to the Bayesian relaxed clock methods applied to the combined nuclear data. Biogeographic analyses suggest a primary role of the Arcto-Tertiary flora in the distribution of the study taxa, with a northeast Asian origin of Meehania.
Our results suggest an Arcto-Tertiary origin of Meehania, with its present distribution most probably being a result of vicariance and southward migrations of populations during climatic oscillations in the middle Miocene with subsequent migration into eastern North America via the Bering land bridge in the late Miocene.
Crofton weed (Ageratina adenophora) is one of the most hazardous invasive plant species, causing serious economic losses and environmental damage worldwide. However, the sequence resources and genome information available for A. adenophora are rather limited, making phylogenetic identification and evolutionary studies very difficult. Here, we report the complete sequence of the A. adenophora chloroplast (cp) genome based on Illumina sequencing.
The A. adenophora cp genome is 150,689 bp in length including a small single-copy (SSC) region of 18,358 bp and a large single-copy (LSC) region of 84,815 bp separated by a pair of inverted repeats (IRs) of 23,755 bp. The genome contains 130 unique genes and 18 duplicated in the IR regions, with the gene content and organization similar to other Asteraceae cp genomes. Comparative analysis identified five DNA regions (ndhD-ccsA, psbI-trnS, ndhF-ycf1, ndhI-ndhG and atpA-trnR) containing parsimony-informative characters higher than 2%, which may be potential informative markers for barcoding and phylogenetic analysis. Repeat structure, codon usage and contraction of the IR were also investigated to reveal the pattern of evolution. Phylogenetic analysis demonstrated a sister relationship between A. adenophora and Guizotia abyssinica and supported a monophyly of the Asterales.
We have assembled and analyzed the chloroplast genome of A. adenophora in this study, which was the first sequenced plastome in the Eupatorieae tribe. The complete chloroplast genome information is useful for plant phylogenetic and evolutionary studies within this invasive species and also within the Asteraceae family.
Rhododendron is a group of famous landscape plants with high medicinal value. However, there is no simple or universal way to discriminate the various species of this group. The deoxyribonucleic acid (DNA) barcoding technique is a new biological tool that can accurately and objectively identify species by using short and standard DNA regions.
To choose a suitable DNA marker to authenticate the Rhododendron species.
Materials and Methods:
Four candidate DNA barcodes (rbcL, matK, the psbA-trnH intergenic spacer, and ITS2) were tested on 68 samples of 38 species.
The psbA-trnH candidate barcode yielded 86.8% sequencing efficiency. The highest interspecific divergence was provided by the psbA-trnH intergenic spacer, based on six parameters and the Wilcoxon signed rank tests. Although there was not a clear barcoding gap, the Wilcoxon two-sample tests indicated that the interspecific divergence of the psbA-trnH intergenic spacer was significantly higher than the relevant intraspecific variation. The psbA-trnH DNA barcode possessed the highest species identification efficiency, at 100% by the BLAST1 method. The present results showed that the psbA-trnH intergenic spacer was the most promising of the four markers for barcoding the Rhododendron species. To further evaluate the ability of the psbA-trnH marker to discriminate closely related species, the sampling was expanded to 94 samples of 53 species in the genus, and the rate of successful identification was 93.6%. The psbA-trnH region would be useful even for unidentified samples, as it could significantly narrow the set of possible taxa.
The psbA-trnH intergenic region is a valuable DNA marker for identifying the Rhododendron species.
Deoxyribonucleic acid barcoding; psbA-trnH; Rhododendron; species identification
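The notion of a "barcoding gap" referred to in the Rhododendron abstract above can be sketched as a comparison between the largest intraspecific distance and the smallest interspecific distance. The function name, the input layout, and the distances below are hypothetical and invented for illustration; they are not data from the study, and the study's own evaluation also relied on Wilcoxon tests that are not reproduced here.

```python
def barcoding_gap(pairwise):
    """Summarise intra- vs interspecific distances.

    `pairwise` is a list of (species_a, species_b, distance) tuples, one per
    pair of samples. Returns (max_intraspecific, min_interspecific, gap_exists).
    """
    intra = [d for s1, s2, d in pairwise if s1 == s2]
    inter = [d for s1, s2, d in pairwise if s1 != s2]
    if not intra or not inter:
        raise ValueError("need at least one intra- and one interspecific pair")
    max_intra = max(intra)
    min_inter = min(inter)
    # A gap exists only if every interspecific distance exceeds every
    # intraspecific distance.
    return max_intra, min_inter, min_inter > max_intra


# Invented toy distances with placeholder species labels.
toy_pairs = [
    ("sp_A", "sp_A", 0.002),
    ("sp_B", "sp_B", 0.004),
    ("sp_A", "sp_B", 0.003),
    ("sp_A", "sp_C", 0.020),
]
max_intra, min_inter, gap = barcoding_gap(toy_pairs)
print(max_intra, min_inter, gap)
```

In this toy case the distributions overlap, so no clear gap is reported even though most interspecific distances may still be larger than intraspecific ones, which is the situation the abstract describes.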
This study aims to determine the candidate markers that can be used as DNA barcode in the Lauraceae family.
Material and Methods:
Polymerase chain reaction amplification, sequencing efficiency, differential intra- and interspecific divergences, DNA barcoding gap, and identification efficiency were used to evaluate the four different DNA sequences of psbA-trnH, matK, rbcL, and ITS2. We tested the discrimination ability of psbA-trnH in 68 plant samples belonging to 42 species from 11 distinct genera and found that the rate of successful identification with psbA-trnH was 82.4% at the species level. However, the rates of correct identification with matK and rbcL were only 30.9% and 25.0%, respectively, using BLAST1. The PCR amplification efficiency of the ITS2 region was poor; thus, ITS2 was not included in subsequent experiments. To verify the identification capacity of psbA-trnH in more samples, 175 samples belonging to 117 species from the experimental data and from the GenBank database of the Lauraceae family were tested.
Using the BLAST1 method, the identification efficiencies were 84.0% and 92.3% at the species and genus level, respectively.
Therefore, psbA-trnH is confirmed as a useful marker for differentiating closely related species within Lauraceae.
Deoxyribonucleic acid barcoding; ITS2; Lauraceae; matK; psbA-trnH; rbcL
DNA barcoding is expected to be an effective identification tool for organisms with heteromorphic generations such as pteridophytes, which possess a morphologically simple gametophyte generation. Although a reference data set including complete coverage of the target local flora/fauna is necessary for accurate identification, DNA barcode studies including such rich taxonomic sampling on a countrywide scale are lacking.
The Japanese pteridophyte flora (733 taxa including subspecies and varieties) was used to test the utility of two plastid DNA barcode regions (rbcL and trnH-psbA) with the intention of developing an identification system for native gametophytes. DNA sequences were obtained from each of 689 (94.0%) taxa for rbcL and 617 (84.2%) taxa for trnH-psbA. Mean interspecific divergence values across all taxon pairs (K2P genetic distances) did not reveal a significant difference in rate between trnH-psbA and rbcL, but mean K2P distances of each genus showed significant heterogeneity according to systematic position. The minimum fail rate of taxon discrimination in an identification test using BLAST (12.52%) was obtained when rbcL and trnH-psbA were combined, and became lower in datasets excluding infraspecific taxa or apogamous taxa, or including sexual diploids only.
This study demonstrates the overall effectiveness of DNA barcodes for species identification in the Japanese pteridophyte flora. Although this flora is characterized by a high occurrence of apogamous taxa that pose a serious challenge to identification using DNA barcodes, such taxa are limited to a small number of genera, and only minimally detract from the overall success rate. In the case that a query sequence is matched to a known apogamous genus, routine species identification may not be possible. Otherwise, DNA barcoding is a practical tool for identification of most Japanese pteridophytes, and is especially anticipated to be helpful for identification of non-hybridizing gametophytes.
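The K2P (Kimura two-parameter) genetic distances used in the pteridophyte comparison above can be sketched as follows. The distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The function name and the toy sequences are assumptions for illustration; sites with gaps or ambiguity codes are simply skipped here, which may differ from the treatment in the study.

```python
import math

# Transitions are purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only sites where both sequences have an unambiguous base.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    p = sum(1 for pair in pairs if pair in TRANSITIONS) / n
    q = sum(1 for x, y in pairs if x != y and (x, y) not in TRANSITIONS) / n
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# Toy 20-bp sequences (invented) differing by a single A->G transition.
ref = "ACGTACGTACGTACGTACGT"
alt = "GCGTACGTACGTACGTACGT"
d = k2p_distance(ref, alt)
print(round(d, 4))
```

For small divergences the corrected distance is only slightly larger than the raw proportion of differing sites, which is why K2P and uncorrected distances tend to agree closely among congeneric taxa.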
Background and Aims
Cypripedium calceolus, although widespread in Eurasia, is rare in many countries in which it occurs. Population genetics studies with nuclear DNA markers on this species have been hampered by its large nuclear genome size. Plastid DNA markers are used here to gain an understanding of variation within and between populations and of biogeographical patterns.
Thirteen length-variable regions (microsatellites and insertions/deletions) were identified in non-coding plastid DNA. These and a previously identified complex microsatellite in the trnL-trnF intergenic spacer were used to identify plastid DNA haplotypes for European samples, with sampling focused on England, Denmark and Sweden.
The 13 additional length-variable regions identified were two homopolymer (polyA) repeats in the rps16 intron and a homopolymer (polyA) repeat and ten indels in the accD-psaI intergenic spacer. In accD-psaI, most of these were in an extremely AT-rich region, and it was not possible to design primers in the flanking regions; therefore, the whole intergenic spacer was sequenced. Together, these new regions and the trnL-trnF complex microsatellite allowed 23 haplotypes to be characterized. Many were found in only one or a few samples (probably due to low sampling density), but some commoner haplotypes were widespread. Most of the genetic variation was found within rather than between populations (83 vs. 18%, respectively). Two haplotypes occurred from the Spanish Pyrenees to Sweden.
Plastid DNA data can be used to gain an understanding of patterns of genetic variation and seed-mediated gene flow in orchids. Although these data are less information-rich than those for nuclear DNA, they present a useful option for studying species with large genomes. Here they support the hypothesis of long-distance seed dispersal often proposed for orchids.
Biogeography; Cypripedium calceolus; genome size; plastid microsatellites; population genetics; seed dispersal