The utility of DNA barcoding for identifying representative specimens of the circumpolar tree genus Fraxinus (56 species) was investigated. We examined the genetic variability of several loci suggested in chloroplast DNA barcode protocols such as matK, rpoB, rpoC1 and trnH-psbA in a large worldwide sample of Fraxinus species. The chloroplast intergenic spacer rpl32-trnL was further assessed in search of a potentially variable and useful locus. The results of the study suggest that the proposed cpDNA loci, alone or in combination, cannot fully discriminate among species because of the generally low rates of substitution in the chloroplast genome of Fraxinus. The intergenic spacer trnH-psbA was the best performing locus, but genetic distance-based discrimination was only moderately successful and separated the samples only at the subgenus level. Use of the BLAST approach was better than the neighbor-joining tree reconstruction method with pairwise Kimura's two-parameter rates of substitution, but allowed for the correct identification of fewer than half of the species sampled. Such rates are substantially lower than the success rate required for a standardised barcoding approach. Consequently, the current cpDNA barcodes are inadequate to fully discriminate Fraxinus species. Given that a low rate of substitution is common among the plastid genomes of trees, the use of the plant cpDNA “universal” barcode may not be suitable for the safe identification of tree species below a generic or sectional level. Supplementary barcoding loci of the nuclear genome and alternative solutions are proposed and discussed.
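For readers unfamiliar with the pairwise Kimura two-parameter (K2P) rates mentioned above, the following is a minimal sketch of the standard K2P distance, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. It is not the authors' pipeline: the function name `k2p_distance` is hypothetical, the sequences are assumed to be already aligned, and gaps or ambiguity codes are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two pre-aligned sequences.

    Illustrative helper (hypothetical name). Sites where either sequence
    has a gap or an ambiguity code are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the K2P correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

With slowly evolving chloroplast loci, many species pairs would yield a distance of zero under such a measure, which is consistent with the limited distance-based discrimination reported here.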
This study reports the complete chloroplast (cp) DNA sequence of Eleutherococcus senticosus (GenBank: JN637765), an endangered endemic species. The genome is 156,768 bp in length, and contains a pair of inverted repeat (IR) regions of 25,930 bp each, a large single copy (LSC) region of 86,755 bp and a small single copy (SSC) region of 18,153 bp. The structural organization, gene and intron contents, gene order, AT content, codon usage, and transcription units of the E. senticosus chloroplast genome are similar to those of typical land plant cp DNA. We aligned and analyzed the sequences of 86 coding genes, 19 introns and 113 intergenic spacers (IGS) in three different taxonomic hierarchies: Eleutherococcus vs. Panax, Eleutherococcus vs. Daucus, and Eleutherococcus vs. Nicotiana. The distribution of indels, the number of polymorphic sites and nucleotide diversity indicate that positional constraint is more important than functional constraint for the evolution of cp genome sequences in Asterids. For example, the intron sequences in the LSC region exhibited base substitution rates 5 to 11 times higher than those in the IR regions, while the intron sequences in the SSC region evolved 7 to 14 times faster than those in the IR region. Furthermore, the Ka/Ks ratio of the gene coding sequences supports a stronger evolutionary constraint in the IR region than in the LSC or SSC regions. Therefore, our data suggest that selective sweeps by base collection mechanisms more frequently eliminate polymorphisms in the IR region than in other regions. Chloroplast genome regions that have high levels of base substitutions also show higher incidences of indels. Thirty-five simple sequence repeat (SSR) loci were identified in the Eleutherococcus chloroplast genome. Of these, 27 are homopolymers, while six are di-polymers and two are tri-polymers.
In addition to the SSR loci, we also identified 18 medium size repeat units ranging from 22 to 79 bp, 11 of which are distributed in the IGS or intron regions. These medium size repeats may contribute to developing a cp genome-specific gene introduction vector because these regions may be used as specific recombination sites.
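The SSR classes named above (homopolymers, di-polymers and tri-polymers) can be located with a simple regular-expression scan. The sketch below is illustrative only: the function name `find_ssrs` and the minimum repeat counts (10, 5 and 4) are assumptions, since the abstract does not state the thresholds or the software used.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return (start, unit, repeat_count) for mono-, di- and tri-nucleotide SSRs.

    Illustrative helper (hypothetical name). The default minimum repeat
    counts are assumed values, not thresholds taken from the study.
    """
    thresholds = min_repeats or {1: 10, 2: 5, 3: 4}
    seq = seq.upper()
    hits = []
    for unit_len, min_n in sorted(thresholds.items()):
        # A unit of unit_len bases followed by at least (min_n - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if unit_len > 1 and len(set(unit)) == 1:
                continue  # "AA" or "AAA" units are just homopolymers
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)
```

For example, `find_ssrs("GC" + "A" * 12 + "GC" + "AT" * 6 + "GGC")` reports one homopolymer run and one dinucleotide repeat; changing the thresholds changes how many loci are counted, so reported SSR totals are only comparable under the same settings.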
chloroplast genome; Eleutherococcus senticosus; indels; nucleotide diversity; positional effect
DNA barcoding should provide rapid, accurate and automatable species identifications by using a standardized DNA region as a tag. Based on sequences available in GenBank and sequences produced for this study, we evaluated the resolution power of the whole chloroplast trnL (UAA) intron (254–767 bp) and of a shorter fragment of this intron (the P6 loop, 10–143 bp) amplified with highly conserved primers. The main limitation of the whole trnL intron for DNA barcoding remains its relatively low resolution (67.3% of the species from GenBank unambiguously identified). The resolution of the P6 loop is lower (19.5% identified) but remains higher than those of existing alternative systems. The resolution is much higher in specific contexts such as species originating from a single ecosystem, or commonly eaten plants. Despite the relatively low resolution, the whole trnL intron and its P6 loop have many advantages: the primers are highly conserved, and the amplification system is very robust. The P6 loop can even be amplified when using highly degraded DNA from processed food or from permafrost samples, and has the potential to be extensively used in the food industry, in forensic science, in diet analyses based on feces and in ancient DNA studies.
The chloroplast genes matK and rbcL have been proposed as a “core” DNA barcode for identifying plant species. Published estimates of successful species identification using these loci (70-80%) may be inflated because they may have involved comparisons among distantly related species within target genera. To assess the ability of the proposed two-locus barcode to discriminate closely related species, we carried out a hierarchically structured set of comparisons within Viburnum, a clade of woody angiosperms containing ca. 170 species (some 70 of which are currently used in horticulture). For 112 Viburnum species, we evaluated rbcL + matK, as well as the chloroplast regions rpl32-trnL, trnH-psbA, trnK, and the nuclear ribosomal internal transcribed spacer region (nrITS).
At most, rbcL + matK could discriminate 53% of all Viburnum species, with only 18% of the comparisons having genetic distances >1%. When comparisons were progressively restricted to species within major Viburnum subclades, there was a significant decrease in both the discriminatory power and the genetic distances. trnH-psbA and nrITS show much higher levels of variation and potential discriminatory power, and their use in plant barcoding should be reconsidered. As barcoding has often been used to discriminate species within local areas, we also compared Viburnum species within two regions, Japan and Mexico and Central America. Greater success in discriminating among the Japanese species reflects the deeper evolutionary history of Viburnum in that area, as compared to the recent radiation of a single clade into the mountains of Latin America.
We found very low levels of discrimination among closely related species of Viburnum, and low levels of variation in the proposed barcoding loci may limit success within other clades of long-lived woody plants. Including the supplementary barcodes trnH-psbA and nrITS increased discrimination rates, but these loci were often more effective alone than in combination with rbcL + matK. We surmise that the efficacy of barcoding in plants has often been overestimated because of the lack of comparisons among closely related species. Phylogenetic information must be incorporated to properly evaluate relatedness in assessing the utility of barcoding loci.
The rapidly increasing number of available plant genomes opens up almost unlimited prospects for biology in general and molecular phylogenetics in particular. A recent study took advantage of these data and identified a set of nuclear genes that occur in single copy in multiple sequenced angiosperms. The present study is the first to apply the genomic sequence of one of these low copy genes, agt1, as a phylogenetic marker for species-level phylogenetics. Its utility is compared to the performance of several coding and non-coding chloroplast loci that have been suggested as most applicable for this taxonomic level. As a model group, we chose Tildenia, a subgenus of Peperomia (Piperaceae), one of the largest plant genera. Relationships are particularly difficult to resolve within these species-rich groups due to low levels of polymorphisms and fast or recent radiation. Therefore, Tildenia is a perfect test case for applying new phylogenetic tools.
We show that the nuclear marker agt1, and in particular the agt1 introns, provide a significantly increased phylogenetic signal compared to chloroplast markers commonly used for low level phylogenetics. 25% of aligned characters from agt1 intron sequence are parsimony informative. In comparison, the introns and spacer of several common chloroplast markers (trnK intron, trnK-psbA spacer, ndhF-rpl32 spacer, rpl32-trnL spacer, psbA-trnH spacer) provide less than 10% parsimony informative characters. The agt1 dataset provides a deeper resolution than the chloroplast markers in Tildenia.
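The percentages above refer to parsimony-informative characters: alignment columns in which at least two character states each occur in at least two sequences. A minimal sketch of how such a proportion could be computed is shown below; the function name is hypothetical, and treating gaps and ambiguity codes as missing data is an assumption rather than the study's stated procedure.

```python
from collections import Counter


def parsimony_informative_fraction(alignment):
    """Fraction of alignment columns that are parsimony informative.

    `alignment` is a list of equal-length aligned sequences. Illustrative
    helper (hypothetical name); gaps and ambiguity codes are treated as
    missing data, which is an assumption.
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain at least one non-empty sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    informative = 0
    for column in zip(*alignment):
        counts = Counter(c.upper() for c in column if c.upper() in "ACGT")
        # A column is informative if two or more states each appear at least twice.
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            informative += 1
    return informative / length
```

In the toy alignment `["AAGT", "AAGC", "ACTT", "ACTA"]`, the second and third columns are informative and the other two are not, giving a fraction of 0.5.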
Single (or very low) copy nuclear genes are of immense value in plant phylogenetics. Unlike other nuclear genes, which belong to gene families of all sizes, they keep lab effort, such as cloning, to a minimum. They also provide regions with different phylogenetic content deriving from coding and non-coding parts of different length. Thus, they can be applied to a wide range of taxonomic levels from family down to population level. As more plant genomes are sequenced, we will obtain increasingly precise information about which genes return to single copy most rapidly following gene duplication and may be most useful across a wide range of plant groups.
Chloroplast genomes evolve slowly and many primers for PCR amplification and analysis of chloroplast sequences can be used across a wide array of genera. In some cases 'universal' primers have been designed for the purpose of working across species boundaries. However, the essential information on these primer sequences is scattered throughout the literature.
A database is presented here which assembles published primer information for chloroplast DNA. Additional primers were designed to fill gaps where little or no primer information could be found. Amplicons are either the genes themselves (typically useful in studies of sequence variation in higher-order phylogeny) or they are spacers, introns, and intergenic regions (for studies of phylogeographic patterns within and among species). The current list of 'generic' primers consists of more than 700 sequences. Wherever possible, we give the locations of the primers in the thirteen fully sequenced chloroplast genomes (Nicotiana tabacum, Atropa belladonna, Spinacia oleracea, Arabidopsis thaliana, Populus trichocarpa, Oryza sativa, Pinus thunbergii, Marchantia polymorpha, Zea mays, Oenothera elata, Acorus calamus, Eucalyptus globulus, Medicago truncatula).
The database described here is designed to serve as a resource for researchers who are venturing into the study of poorly described chloroplast genomes, whether for large- or small-scale DNA sequencing projects, to study molecular variation or to investigate chloroplast evolution.
A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level.
Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species.
A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.
Comprehensive sampling is crucial to DNA barcoding, but it is rarely performed because materials are usually unavailable. In practice, only a few rather than all species of a genus are required to be identified. Thus identification of a given species using a limited sample is of great importance in current applications of DNA barcodes. Here, we selected 70 individuals representing 48 species from each major lineage of Solanum, one of the most species-rich genera of seed plants, to explore whether DNA barcodes can provide reliable discrimination of specific species in the context of incomplete sampling. The chloroplast regions ndhF and trnS-trnG and the nuclear gene waxy, the commonly used markers in Solanum phylogeny, were selected as the supplementary barcodes. The tree-building and modified barcode gap methods were employed to assess species resolution. The results showed that four Solanum species of quarantine concern could be successfully identified through the two-step barcoding sampling strategy. In addition, discrepancies between nuclear and cpDNA barcodes in some samples demonstrated the ability to discriminate hybrid species, and highlighted the necessity of using barcode regions with different modes of inheritance. We conclude that efficient phylogenetic markers are good candidates as the supplementary barcodes in a given taxonomic group. Critically, we hypothesized that a particular species could be identified from a phylogenetic framework using incomplete sampling; through this, DNA barcoding will greatly benefit the current fields of its application.
Herpotrichiellaceous black yeasts and relatives comprise severe pathogens flanked by nonpathogenic environmental siblings. Reliable identification by conventional methods is notoriously difficult. Molecular identification is hampered by the sequence variability in the internal transcribed spacer (ITS) domain caused by difficult-to-sequence homopolymeric regions and by poor taxonomic attribution of sequences deposited in GenBank. Here, we present a potential solution using short barcode identifiers (27 to 50 bp) based on ITS2 ribosomal DNA (rDNA), which allows unambiguous definition of species-specific fragments. Starting from proven sequences of ex-type and authentic strains, we were able to describe 103 identifiers. Multiple BLAST searches of these proposed barcode identifiers in GenBank revealed uniqueness for 100 taxonomic entities, whereas the three remaining identifiers each matched with two entities, but the species of these identifiers could easily be discriminated by differences in the remaining ITS regions. Using the proposed barcode identifiers, a 4.1-fold increase of 100% matches in GenBank was achieved in comparison to the classical approach using the complete ITS sequences. The proposed barcode identifiers will be made accessible for the diagnostic laboratory in a permanently updated online database, thereby providing a highly practical, reliable, and cost-effective tool for identification of clinically important black yeasts and relatives.
Species identification of living organisms by standard DNA sequences has been well-accepted. The Consortium for the Barcode of Life (CBOL) recommends the chloroplast regions rbcL and matK as the DNA barcodes for land plants. This study aims to evaluate the feasibility and limitations of rbcL, matK, and five other commonly used regions as DNA barcodes for medicinal Gentiana and their adulterants, Gentiana rhodantha and Podophyllum hexandrum.
The species differentiation power of rbcL, matK, the nuclear internal transcribed spacer (ITS) and 5S rRNA intergenic spacer, and the chloroplast trnH-psbA, trnL-F and rpl36-rps8 intergenic spacers was tested for distinguishing medicinal Gentiana, including Gentiana scabra, Gentiana triflora, Gentiana manshurica and Gentiana rigescens, from common adulterants such as Gentiana rhodantha and Podophyllum hexandrum (a toxic herb producing podophyllotoxin).
All seven tested loci could be used to differentiate medicinal Gentiana species from their adulterants, and to distinguish Guanlongdan from Jianlongdan. In terms of general differentiation powers, rbcL and matK had no significant advantages over the other five loci. Only the 5S rRNA and trnL-F intergenic spacers were able to discriminate the closely related species G. triflora, G. scabra and G. manshurica.
The DNA barcodes rbcL and matK are useful in differentiation of closely related medicinal species of Gentiana, but had no significant advantages over the other five tested loci.
Intercontinental disjunctions between tropical regions, which harbor two-thirds of the flowering plants, have drawn great interest from biologists and biogeographers. Most previous studies on these distribution patterns focused on woody plants, and paid little attention to herbs. The Orchidaceae is one of the largest families of angiosperms, with a herbaceous habit and a high species diversity in the Tropics. Here we investigate the evolutionary and biogeographical history of the slipper orchids, which represent a monophyletic subfamily (Cypripedioideae) of the orchid family and comprise five genera that are disjunctly distributed in tropical to temperate regions. A relatively well-resolved and highly supported phylogeny of slipper orchids was reconstructed based on sequence analyses of six maternally inherited chloroplast and two low-copy nuclear genes (LFY and ACO). We found that the genus Cypripedium, with a wide distribution in the northern temperate and subtropical zones, diverged first, followed by Selenipedium, endemic to South America, and finally the conduplicate-leaved genera in the Tropics. Mexipedium and Phragmipedium from the neotropics are most closely related, and form a clade sister to Paphiopedilum from tropical Asia. According to molecular clock estimates, the genus Selenipedium originated in the Palaeocene, while the most recent common ancestor of conduplicate-leaved slipper orchids could be dated back to the Eocene. Ancestral area reconstruction indicates that vicariance is responsible for the disjunct distribution of conduplicate slipper orchids in palaeotropical and neotropical regions. Our study sheds some light on mechanisms underlying generic and species diversification in the orchid family and tropical disjunctions of herbaceous plant groups. In addition, we suggest that biogeographical studies should sample both regional endemics and their widespread relatives.
Melon, Cucumis melo, and cucumber, C. sativus, are among the most widely cultivated crops worldwide. Cucumis, as traditionally conceived, is geographically centered in Africa, with C. sativus and C. hystrix thought to be the only Cucumis species in Asia. This taxonomy forms the basis for all ongoing Cucumis breeding and genomics efforts. We tested relationships among Cucumis and related genera based on DNA sequences from chloroplast gene, intron, and spacer regions (rbcL, matK, rpl20-rps12, trnL, and trnL-F), adding nuclear internal transcribed spacer sequences to resolve relationships within Cucumis.
Analyses of combined chloroplast sequences (4,375 aligned nucleotides) for 123 of the 130 genera of Cucurbitaceae indicate that the genera Cucumella, Dicaelospermum, Mukia, Myrmecosicyos, and Oreosyce are embedded within Cucumis. Phylogenetic trees from nuclear sequences for these taxa are congruent, and the combined data yield a well-supported phylogeny. The nesting of the five genera in Cucumis greatly changes the natural geographic range of the genus, extending it throughout the Malesian region and into Australia. The closest relative of Cucumis is Muellerargia, with one species in Australia and Indonesia, the other in Madagascar. Cucumber and its sister species, C. hystrix, are nested among Australian, Malaysian, and Western Indian species placed in Mukia or Dicaelospermum and in one case not yet formally described. Cucumis melo is sister to this Australian/Asian clade, rather than being close to African species as previously thought. Molecular clocks indicate that the deepest divergences in Cucumis, including the split between C. melo and its Australian/Asian sister clade, go back to the mid-Eocene.
Based on congruent nuclear and chloroplast phylogenies we conclude that Cucumis comprises an old Australian/Asian component that was heretofore unsuspected. Cucumis sativus evolved within this Australian/Asian clade and is phylogenetically far more distant from C. melo than implied by the current morphological classification.
Taxonomic identification of biological specimens based on DNA sequence information (a.k.a. DNA barcoding) is becoming increasingly common in biodiversity science. Although several methods have been proposed, many of them are not universally applicable due to the need for prerequisite phylogenetic/machine-learning analyses, the need for huge computational resources, or the lack of a firm theoretical background. Here, we propose two new computational methods of DNA barcoding and show a benchmark for bacterial/archaeal 16S, animal COX1, fungal internal transcribed spacer, and three plant chloroplast (rbcL, matK, and trnH-psbA) barcode loci that can be used to compare the performance of existing and new methods. The benchmark was performed under two alternative situations: query sequences were available in the corresponding reference sequence databases in one, but were not available in the other. In the former situation, the commonly used “1-nearest-neighbor” (1-NN) method, which assigns the taxonomic information of the most similar sequences in a reference database (i.e., BLAST-top-hit reference sequence) to a query, displays the highest rate and highest precision of successful taxonomic identification. However, in the latter situation, the 1-NN method produced extremely high rates of misidentification for all the barcode loci examined. In contrast, one of our new methods, the query-centric auto-k-nearest-neighbor (QCauto) method, consistently produced low rates of misidentification for all the loci examined in both situations. These results indicate that the 1-NN method is most suitable if the reference sequences of all potentially observable species are available in databases; otherwise, the QCauto method returns the most reliable identification results. The benchmark results also indicated that the taxon coverage of reference sequences is far from complete for genus or species level identification in all the barcode loci examined.
Therefore, we need to accelerate the registration of reference barcode sequences to apply high-throughput DNA barcoding to genus or species level identification in biodiversity research.
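The behaviour of the 1-NN method described above can be illustrated with a toy sketch. This is not the benchmark's implementation and it does not reproduce QCauto: similarity is reduced to simple percent identity over pre-aligned, equal-length sequences (an assumption standing in for a BLAST top hit), and the names `identity` and `one_nearest_neighbor` are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal aligned length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def one_nearest_neighbor(query, references):
    """Assign the label of the most similar reference sequence to the query.

    `references` maps a taxon label to an aligned sequence. Illustrative
    helper (hypothetical name); percent identity stands in for a BLAST
    top hit, which is a simplifying assumption.
    """
    label, seq = max(references.items(), key=lambda item: identity(query, item[1]))
    return label, identity(query, seq)


# Toy reference set with made-up labels and sequences.
references = {
    "Species A": "ACGTACGTAC",
    "Species B": "ACGTTCGTAA",
}
```

A query from a species that is absent from `references`, such as `"ACCTACGTAG"`, is still assigned to its closest neighbour ("Species A" at 0.8 identity). This is the failure mode the abstract attributes to 1-NN when reference coverage is incomplete.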
Thailand, a part of the Indo-Burma biodiversity hotspot, has many endemic animals and plants. Some of its fungal species are difficult to recognize and separate, complicating assessments of biodiversity. We assessed species diversity within the fungal genera Annulohypoxylon and Hypoxylon, which produce biologically active and potentially therapeutic compounds, by applying classical taxonomic methods to 552 teleomorphs collected from across Thailand. Using probability of correct identification (PCI), we also assessed the efficacy of automated species identification with a fungal barcode marker, ITS, in the model system of Annulohypoxylon and Hypoxylon. The 552 teleomorphs yielded 137 ITS sequences; in addition, we examined 128 GenBank ITS sequences, to assess biases in evaluating a DNA barcode with GenBank data. The use of multiple sequence alignment in a barcode database like BOLD raises some concerns about non-protein barcode markers like ITS, so we also compared species identification using different alignment methods. Our results suggest the following. (1) Multiple sequence alignment of ITS sequences is competitive with pairwise alignment when identifying species, so BOLD should be able to preserve its present bioinformatics workflow for species identification for ITS, and possibly therefore with at least some other non-protein barcode markers. (2) Automated species identification is insensitive to a specific choice of evolutionary distance, contributing to resolution of a current debate in DNA barcoding. (3) Statistical methods are available to address, at least partially, the possibility of expert misidentification of species. Phylogenetic trees discovered a cryptic species and strongly supported monophyletic clades for many Annulohypoxylon and Hypoxylon species, suggesting that ITS can contribute usefully to a barcode for these fungi. 
The PCIs here, derived solely from ITS, suggest that a fungal barcode will require secondary markers in Annulohypoxylon and Hypoxylon, however. The URL http://tinyurl.com/spouge-barcode contains computer programs and other supplementary material relevant to this article.
Comparative chloroplast genome analyses are mostly carried out at lower taxonomic levels, such as the family and genus levels. At higher taxonomic levels, chloroplast genomes are generally used to reconstruct phylogenies. However, little attention has been paid to chloroplast genome evolution within orders. Here, we present the chloroplast genome of Sedum sarmentosum and take advantage of several available (or elucidated) chloroplast genomes to examine the evolution of chloroplast genomes in Saxifragales. The chloroplast genome of S. sarmentosum is 150,448 bp long and includes 82,212 bp of a large single-copy (LSC) region, 16,670 bp of a small single-copy (SSC) region, and a pair of 25,783 bp sequences of inverted repeats (IRs). The genome contains 131 unique genes, 18 of which are duplicated within the IRs. Based on a comparative analysis of chloroplast genomes from four representative Saxifragales families, we observed two gene losses and two pseudogenes in Paeonia obovata, and the loss of an intron was detected in the rps16 gene of Penthorum chinense. Comparisons among the 72 common protein-coding genes confirmed that the chloroplast genomes of S. sarmentosum and Paeonia obovata exhibit accelerated sequence evolution. Furthermore, a strong correlation was observed between the rates of genome evolution and genome size. The detected genome size variations are predominantly caused by the length of intergenic spacers, rather than losses of genes and introns, gene pseudogenization or IR expansion or contraction. The genome sizes of these species are negatively correlated with nucleotide substitution rates. Species with shorter life cycles tend to exhibit shorter chloroplast genomes than those with longer life cycles.
Sesamum indicum is an important oil-yielding crop species. The complete chloroplast (cp) genome of S. indicum (GenBank acc no. JN637766) is 153,324 bp in length, and has a pair of inverted repeat (IR) regions consisting of 25,141 bp each. The lengths of the large single copy (LSC) and the small single copy (SSC) regions are 85,170 bp and 17,872 bp, respectively. Comparative cp DNA sequence analyses of S. indicum with other cp genomes reveal that the genome structure, gene order, gene and intron contents, AT contents, codon usage, and transcription units are similar to those of typical angiosperm cp genomes. Nucleotide diversity of the IR region between Sesamum and three other cp genomes is much lower than that of the LSC and SSC regions in both the coding and noncoding regions. In summary, regional constraints strongly affect the sequence evolution of the cp genomes, while functional constraints affect it only weakly. Five short inversions associated with short palindromic sequences that form stem-loop structures were observed in the chloroplast genome of S. indicum. Twenty-eight different simple sequence repeat loci have been detected in the chloroplast genome of S. indicum. Almost all of the SSR loci were composed of A or T, so this may also contribute to the A-T richness of the cp genome of S. indicum. Seven large repeated loci in the chloroplast genome of S. indicum were also identified, and these loci are useful for developing S. indicum-specific cp genome vectors. The complete cp DNA sequences of S. indicum reported in this paper are a prerequisite for modifying this important oilseed crop by cp genetic engineering techniques.
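The region lengths reported above can be checked for internal consistency: in a typical quadripartite chloroplast genome, the total length equals LSC + SSC + 2 × IR, because the inverted repeat occurs twice. The short sketch below performs that check with the S. indicum figures; the function name is a hypothetical helper, not part of any published tool.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total genome length implied by one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Figures as stated in the abstract for Sesamum indicum (bp).
sesamum_total = quadripartite_length(lsc=85_170, ssc=17_872, ir=25_141)
print(sesamum_total)  # 153324, matching the reported genome length
```

The same check is a quick way to catch transcription errors in region lengths when comparing published chloroplast genomes.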
DNA barcoding of land plants has relied traditionally on a small number of markers from the plastid genome. In contrast, low-copy nuclear genes have received little attention as DNA barcodes because of the absence of universal primers for PCR amplification.
From pooled-species 454 transcriptome data we identified two variable intron-less nuclear loci for each of two species-rich genera of the Hawaiian flora: Clermontia (Campanulaceae) and Cyrtandra (Gesneriaceae) and compared their utility as DNA barcodes with that of plastid genes. We found that nuclear genes showed an overall greater variability, but also displayed a high level of heterozygosity, intraspecific variation, and retention of ancient alleles. Thus, nuclear genes displayed fewer species-diagnostic haplotypes compared to plastid genes and no interspecies gaps.
The apparently greater coalescence times of nuclear genes are likely to limit their utility as barcodes, as only a small proportion of their alleles were fixed and unique to individual species. In both groups, species-diagnostic markers from either genome were scarce on the youngest island; a minimum age of ca. two million years may be needed for a species flock to be barcoded. For young plant groups, nuclear genes may not be a superior alternative to slowly evolving plastid genes.
Adaptive radiation; Island biogeography; Lobeliads; Next-generation sequencing; Progression rule; Single-copy nuclear genes
Based on the testing of several loci, predominantly against floristic backgrounds, individual or different combinations of loci have been suggested as possible universal DNA barcodes for plants. The present investigation was undertaken to check the applicability of the recommended locus/loci for congeneric species with Dendrobium species as an illustrative example.
Six loci, matK, rbcL, rpoB, rpoC1, trnH-psbA spacer from the chloroplast genome and ITS, from the nuclear genome, were compared for their amplification, sequencing and species discrimination success rates among multiple accessions of 36 Dendrobium species. The trnH-psbA spacer could not be considered for analysis as good quality sequences were not obtained with its forward primer. Among the tested loci, ITS, recommended by some as a possible barcode for plants, provided 100% species identification. Another locus, matK, also recommended as a universal barcode for plants, resolved 80.56% species. ITS remained the best even when sequences of investigated loci of additional Dendrobium species available on the NCBI GenBank (93, 33, 20, 18 and 17 of ITS, matK, rbcL, rpoB and rpoC1, respectively) were also considered for calculating the percent species resolution capabilities. The species discrimination of various combinations of the loci was also compared based on the 36 investigated species and additional 16 for which sequences of all the five loci were available on GenBank. Two-locus combination of matK+rbcL recommended by the Plant Working Group of Consortium for Barcoding of Life (CBOL) could discriminate 86.11% of 36 species. The species discriminating ability of this barcode was reduced to 80.77% when additional sequences available on NCBI were included in the analysis. Among the recommended combinations, the barcode based on three loci - matK, rpoB and rpoC1- resolved maximum number of species.
Any recommended barcode based on the loci tested so far is unlikely to provide 100% species identification across the plant kingdom and is thus unlikely to act as a universal barcode. It appears that barcodes based on a single locus or a limited number of loci would be taxon specific, as exemplified by the success of ITS among Dendrobium species, although ITS may not be suitable for other plants because of the problems that are discussed.
Dendrobium; DNA barcoding; ITS; matK
The Campanulaceae (the "harebell" or "bellflower" family) is a derived angiosperm family comprising about 600 species treated in 35 to 55 genera. Taxonomic treatments vary widely and little phylogenetic work has been done in the family. Gene order in the chloroplast genome usually varies little among vascular plants. However, chloroplast genomes of Campanulaceae are an exception, and phylogenetic analyses based solely on chloroplast rearrangement characters support a reasonably well-resolved tree.
Chloroplast DNA physical maps were constructed for eighteen representatives of the family. So many gene order changes have occurred among the genomes that characterizing individual mutational events was not always possible. Therefore, we examined different, novel scoring methods to prepare data matrices for cladistic analysis. These approaches yielded largely congruent results but varied in the amount of resolution and homoplasy. The strongly supported nodes were common to all gene order analyses as well as to parallel analyses based on ITS and rbcL sequence data. The results suggest some interesting and unexpected intrafamilial relationships. For example, fifteen of the taxa form a derived clade, whereas the remaining three taxa (Platycodon, Codonopsis, and Cyananthus) form the basal clade. This major subdivision of the family corresponds to the distribution of pollen morphology characteristics but is not compatible with previous taxonomic treatments.
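The abstract does not describe the scoring methods themselves. As a generic illustration only, and not the authors' method, one common way to turn gene orders into a binary character matrix is to score the presence or absence of each gene adjacency. The sketch below assumes linear, signed gene orders (the sign marks strand orientation); the taxon names, gene numbers and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Adjacency = Tuple[int, int]


def adjacencies(order: List[int]) -> Set[Adjacency]:
    """Return the strand-independent adjacencies of a signed, linear gene order."""
    result: Set[Adjacency] = set()
    for left, right in zip(order, order[1:]):
        # (a, b) read on one strand is the same adjacency as (-b, -a) on the other,
        # so keep a single canonical representative of the two.
        result.add(max((left, right), (-right, -left)))
    return result


def adjacency_matrix(
    genomes: Dict[str, List[int]]
) -> Tuple[List[Adjacency], Dict[str, List[int]]]:
    """Score every observed adjacency as present (1) or absent (0) per taxon."""
    per_taxon = {name: adjacencies(order) for name, order in genomes.items()}
    columns = sorted(set().union(*per_taxon.values()))
    rows = {
        name: [int(col in adj) for col in columns]
        for name, adj in per_taxon.items()
    }
    return columns, rows


# Toy data: taxon_B differs from taxon_A by a single inversion of genes 2-3.
genomes = {
    "taxon_A": [1, 2, 3, 4, 5],
    "taxon_B": [1, -3, -2, 4, 5],
}
columns, rows = adjacency_matrix(genomes)
for name, row in rows.items():
    print(name, row)
```

In this toy case the single inversion removes two adjacencies and creates two new ones, so the two taxa differ in four of the six characters. When many rearrangements overlap, as described above, individual events can no longer be read off so directly, which is why the choice of scoring scheme matters.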
Our use of gene order data in the Campanulaceae provides the most highly resolved phylogeny as yet developed for a plant family using only cpDNA rearrangements. The gene order data showed markedly less homoplasy than sequence data for the same taxa but did not resolve quite as many nodes. The rearrangement characters, though relatively few in number, support robust and meaningful phylogenetic hypotheses and provide new insights into evolutionary relationships within the Campanulaceae.
The chloroplast trnH-psbA spacer region has been proposed as a prime candidate for use in DNA barcoding of plants because of its high substitution rate. However, frequent inversions associated with palindromic sequences within this region have been found in multiple lineages of angiosperms and may complicate its use as a barcode, especially if they occur within species.
Here, we evaluate the implications of intraspecific inversions in the trnH-psbA region for DNA barcoding efforts. We report polymorphic inversions within six species of Gentianaceae, all narrowly circumscribed morphologically: Gentiana algida, Gentiana fremontii, Gentianopsis crinita, Gentianopsis thermalis, Gentianopsis macrantha and Frasera speciosa. We analyze these sequences together with those from 15 other species of Gentianaceae and show that typical simple methods of sequence alignment can lead to misassignment of conspecifics and incorrect assessment of relationships.
Frequent inversions in the trnH-psbA region, if not recognized and aligned appropriately, may lead to large overestimates of the number of substitution events separating closely related lineages and to uniting more distantly related taxa that share the same form of the inversion. Thus, alignment of the trnH-psbA spacer region will need careful attention if it is used as a marker for DNA barcoding.
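The effect described above can be shown with a toy example. The sketch below uses invented sequences (not real trnH-psbA data) in which two sequences differ by one true substitution plus an inversion of a short loop flanked by inverted repeats. Counting mismatches naively treats every inverted position as a substitution; reverse-complementing the inverted segment before comparison recovers the single true difference. All names and coordinates are hypothetical.

```python
# Toy illustration only: sequences and coordinates are invented.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def flip_segment(seq: str, start: int, end: int) -> str:
    """Reverse-complement seq[start:end], leaving the flanks untouched."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


left, right = "GCATTC", "GAATGC"   # inverted-repeat flanks (hypothetical)
loop = "AAAAGG"                    # segment that is inverted in seq_b

seq_a = left + loop + right + "TTA"
# seq_b carries the inverted loop plus one genuine substitution in the tail.
seq_b = left + reverse_complement(loop) + right + "TCA"

naive = hamming(seq_a, seq_b)
corrected = hamming(seq_a, flip_segment(seq_b, len(left), len(left) + len(loop)))
print(naive, corrected)
```

Here the naive comparison counts 7 differences while the inversion-aware comparison counts 1, which illustrates how an unrecognized inversion can inflate the apparent divergence between close relatives.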
The Chloroplast Genome Database (ChloroplastDB) is an interactive, web-based database for fully sequenced plastid genomes, containing genomic, protein, DNA and RNA sequences, gene locations, RNA-editing sites, putative protein families and alignments. With recent technical advances, the rate of generating new organelle genomes has increased dramatically. However, the established ontology for chloroplast genes and gene features has not been uniformly applied to all chloroplast genomes available in the sequence databases. For example, annotations for some published genome sequences have not evolved with gene naming conventions. ChloroplastDB provides unified annotations, gene name search, BLAST and download functions for chloroplast encoded genes and genomic sequences. A user can retrieve all orthologous sequences with one search regardless of gene names in GenBank. This feature alone greatly facilitates comparative research on sequence evolution including changes in gene content, codon usage, gene structure and post-transcriptional modifications such as RNA editing. Orthologous protein sets are classified by TribeMCL and each set is assigned a standard gene name. Over the next few years, as the number of sequenced chloroplast genomes increases rapidly, the tools available in ChloroplastDB will allow researchers to easily identify and compile target data for comparative analysis of chloroplast genes and genomes.
Welwitschia mirabilis is the only extant member of the family Welwitschiaceae, one of three lineages of gnetophytes, an enigmatic group of gymnosperms variously allied with flowering plants or conifers. Limited sequence data and rapid divergence rates have precluded consensus on the evolutionary placement of gnetophytes based on molecular characters. Here we report on the first complete gnetophyte chloroplast genome sequence, from Welwitschia mirabilis, as well as analyses on divergence rates of protein-coding genes, comparisons of gene content and order, and phylogenetic implications.
The chloroplast genome of Welwitschia mirabilis [GenBank: EU342371] comprises 119,726 base pairs and exhibits large and small single copy regions and two copies of the large inverted repeat (IR). Only 101 unique gene species are encoded. The Welwitschia plastome is the most compact photosynthetic land plant plastome sequenced to date; 66% of the sequence codes for product. The genome also exhibits a slightly expanded IR, a minimum of 9 inversions that modify gene order, and 19 genes that are lost or present as pseudogenes. Phylogenetic analyses, including one representative of each extant seed plant lineage and based on 57 concatenated protein-coding sequences, place Welwitschia at the base of all seed plants (distance, maximum parsimony) or as the sister to Pinus (the only conifer representative) in a monophyletic gymnosperm clade (maximum likelihood, Bayesian). Relative rate tests on these gene sequences show the Welwitschia sequences to be evolving at faster rates than those of other seed plants. For these genes individually, a comparison of average pairwise distances indicates that relative divergence in Welwitschia ranges from amounts about equal to other seed plants to amounts almost three times greater than the average for non-gnetophyte seed plants.
Although the basic organization of the Welwitschia plastome is typical, its compactness, gene content and high nucleotide divergence rates are atypical. The current lack of additional conifer plastome sequences precludes any discrimination between the gnetifer and gnepine hypotheses of seed plant relationships. However, both phylogenetic analyses and shared genome features identified here are consistent with either of the hypotheses that link gnetophytes with conifers, but are inconsistent with the anthophyte hypothesis.
Complete chloroplast genome sequences provide a valuable source of molecular markers for studies in molecular ecology and evolution of plants. To obtain complete genome sequences, recent studies have made use of the polymerase chain reaction to amplify overlapping fragments from conserved gene loci. However, this approach is time consuming and can be more difficult to implement where gene organisation differs among plants. An alternative approach is to first isolate chloroplasts and then use the capacity of high-throughput sequencing to obtain complete genome sequences. We report our findings from studies of the latter approach, which used a simple chloroplast isolation procedure, multiply-primed rolling circle amplification of chloroplast DNA, Illumina Genome Analyzer II sequencing, and de novo assembly of paired-end sequence reads.
A modified rapid chloroplast isolation protocol was used to obtain plant DNA that was enriched for chloroplast DNA, but nevertheless contained nuclear and mitochondrial DNA. Multiply-primed rolling circle amplification of this mixed template produced sufficient quantities of chloroplast DNA, even when the amount of starting material was small, and improved the template quality for Illumina Genome Analyzer II (hereafter Illumina GAII) sequencing. We demonstrate, using independent samples of karaka (Corynocarpus laevigatus), that there is high fidelity in the sequence obtained from this template. Although fewer than 20% of our sequenced reads could be mapped to the chloroplast genome, it was relatively easy to assemble complete chloroplast genome sequences from the mixture of nuclear, mitochondrial and chloroplast reads.
We report successful whole genome sequencing of chloroplast DNA from karaka, obtained efficiently and with high fidelity.
The Streptophyta comprise all land plants and six monophyletic groups of charophycean green algae. Phylogenetic analyses of four genes from three cellular compartments support the following branching order for these algal lineages: Mesostigmatales, Chlorokybales, Klebsormidiales, Zygnematales, Coleochaetales and Charales, with the last lineage being sister to land plants. Comparative analyses of the Mesostigma viride (Mesostigmatales) and land plant chloroplast genome sequences revealed that this genome experienced many gene losses, intron insertions and gene rearrangements during the evolution of charophyceans. On the other hand, the chloroplast genome of Chaetosphaeridium globosum (Coleochaetales) is highly similar to its land plant counterparts in terms of gene content, intron composition and gene order, indicating that most of the features characteristic of land plant chloroplast DNA (cpDNA) were acquired from charophycean green algae. To gain further insight into when the highly conservative pattern displayed by land plant cpDNAs originated in the Streptophyta, we have determined the cpDNA sequences of the distantly related zygnematalean algae Staurastrum punctulatum and Zygnema circumcarinatum.
The 157,089 bp Staurastrum and 165,372 bp Zygnema cpDNAs encode 121 and 125 genes, respectively. Although both cpDNAs lack an rRNA-encoding inverted repeat (IR), they are substantially larger than Chaetosphaeridium and land plant cpDNAs. This increased size is explained by the expansion of intergenic spacers and introns. The Staurastrum and Zygnema genomes differ extensively from one another and from their streptophyte counterparts at the level of gene order, with the Staurastrum genome more closely resembling its land plant counterparts than does Zygnema cpDNA. Many intergenic regions in Zygnema cpDNA harbor tandem repeats. The introns in both Staurastrum (8 introns) and Zygnema (13 introns) cpDNAs represent subsets of those found in land plant cpDNAs. They represent 16 distinct insertion sites, only five of which are shared by the two zygnematalean genomes. Three of these insertion sites have not been identified in Chaetosphaeridium cpDNA.
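The intron counts above are mutually consistent under inclusion-exclusion: 8 + 13 - 5 = 16 distinct insertion sites. A minimal sketch of that check follows; the site labels are placeholders chosen only so that the set sizes match the reported numbers, not real intron positions.

```python
# Placeholder site labels: only the set sizes reflect the abstract.
staurastrum_sites = {f"site_{i}" for i in range(0, 8)}    # 8 introns
zygnema_sites = {f"site_{i}" for i in range(3, 16)}       # 13 introns

shared = staurastrum_sites & zygnema_sites
distinct = staurastrum_sites | zygnema_sites

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
assert len(distinct) == len(staurastrum_sites) + len(zygnema_sites) - len(shared)

print(len(shared), len(distinct))
```

With 5 shared sites, 3 sites are unique to Staurastrum and 8 are unique to Zygnema in this bookkeeping.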
The chloroplast genome experienced substantial changes in overall structure, gene order, and intron content during the evolution of the Zygnematales. Most of the features considered earlier as typical of land plant cpDNAs probably originated before the emergence of the Zygnematales and Coleochaetales.
The organization of a cloned rRNA gene cluster from Chlorella ellipsoidea chloroplast DNA (cpDNA) has been analyzed. Southern hybridization experiments with labelled chloroplast rRNAs as probes revealed an extraordinarily large 16S-23S rRNA spacer region of ca. 4.8 kbp, almost twice as large as those of most higher plants. The nucleotide sequence determined for this region shows that: (1) The tRNAIle gene located in this region is similar to those of higher plant chloroplasts, blue-green algae and E. coli but, in contrast to higher plant chloroplasts, does not contain any introns. (2) The tRNAAla gene is absent from this region. (3) There are four open reading frames (ORFs) coding for 55, 102, 107 and 110 amino acids, respectively. (4) A few sets of unique sequences were found repeatedly in this region. (5) The 23S rRNA gene is encoded on the opposite strand, in the reverse orientation. This arrangement of the 16S-23S rRNA region of Chlorella cpDNA is quite different from any of those reported so far for various organisms.