Chickpea (Cicer arietinum, Leguminosae), an important grain legume, is widely used for food and fodder throughout the world. We sequenced the complete plastid genome of chickpea, which is 125,319 bp in size and contains only one copy of the inverted repeat (IR). The genome encodes 108 genes, including 4 rRNAs, 29 tRNAs, and 75 proteins. The genes rps16, infA, and ycf4 are absent in the chickpea plastid genome, and ndhB has an internal stop codon in the 5′ exon, similar to other legumes. Two genes have lost their introns, one in the 3′ exon of the trans-spliced gene rps12, and the one between exons 1 and 2 of clpP; this represents the first documented case of the loss of introns from both of these genes in the same plastid genome. An extensive phylogenetic survey of these intron losses was performed on 302 taxa across legumes and the related family Polygalaceae. The clpP intron has been lost exclusively in taxa from the temperate “IR-lacking clade” (IRLC), whereas the rps12 intron has been lost in most members of the IRLC (with the exception of Wisteria, Callerya, Afgekia, and certain species of Millettia, which represent the earliest diverging lineages of this clade), and in the tribe Desmodieae, which is closely related to the tribes Phaseoleae and Psoraleeae. Data provided here suggest that the loss of the rps12 intron occurred after the loss of the IR. The two new genomic changes identified in the present study provide additional support for the monophyly of the IR-loss clade, and resolution of the pattern of the earliest-branching lineages in this clade. The availability of the complete chickpea plastid genome sequence also provides valuable information on intergenic spacer regions among legumes and endogenous regulatory sequences for plastid genetic engineering.
Gene mapping and genomic sequencing have demonstrated that plastid genome organization is generally highly conserved among angiosperms (Palmer, 1991; Raubeson and Jansen, 2005). Most genomes are characterized by a quadripartite structure, with two copies of an inverted repeat (IR) separating the large (LSC) and small (SSC) single copy regions. The genomes usually include 120–130 genes and range in size from 120 to 170 kilobases (kb). Gene content and gene order are conserved throughout angiosperms with the ancestral configuration depicted by the earliest-branching angiosperm clades Amborella and Nymphaeales (Goremykin et al., 2003; Raubeson et al., 2007; Jansen et al., 2007). In a typical angiosperm chloroplast genome, there are 18 genes containing introns, six in tRNA genes and the remaining twelve in protein-coding genes. Fifteen of the intron-containing genes have only two exons, and the remaining three have three exons.
Changes in this highly conserved organization of plastid genomes have been utilized to resolve phylogenetic relationships among major clades in a number of angiosperm families, including Asteraceae (Jansen and Palmer, 1987; Kim et al., 2005), Berberidaceae (Kim and Jansen, 1995), Cactaceae (Wallace and Cota, 1996), Campanulaceae (Cosner et al., 2004), Leguminosae (Bruneau et al., 1990; Lavin et al., 1990; Doyle et al., 1995; Doyle et al., 1996), Geraniaceae (Chumley et al., 2006), Lobeliaceae (Knox et al., 1993), Oleaceae (Lee et al., 2007), Onagraceae (Hachtel et al., 1991; Greiner et al., 2008), Poaceae (Doyle et al., 1992), and Ranunculaceae (Hoot and Palmer, 1994; Johansson, 1999). The types of changes that have been used include inversions, loss of the 22–25 kb IR which contains a duplicated set of rRNA and tRNA genes, expansion/contraction of the IR, and gene/intron loss (Downie and Palmer, 1992). Although all of these genomic changes have exhibited some homoplasy, they have served as powerful phylogenetic markers for several reasons: (1) these types of changes are generally rare, resulting in lower homoplasy relative to nucleotide substitutions; (2) assessing homology of these events is generally straightforward; (3) the direction of evolutionary change is easily discerned; and (4) once a rearrangement is detected it is relatively easy to survey numerous taxa for each event. The relative phylogenetic utility of the different types of plastid genomic rearrangements varies considerably with inversions exhibiting the least amount of homoplasy. However, even genomic changes that are homoplasious provide valuable phylogenetic characters within each major lineage in which they occur. One example of this phenomenon concerns the rpoC1 intron, which has been lost independently six times among angiosperms (Downie et al., 1996) but its absence is still a valuable phylogenetic marker for resolving relationships within each of these six lineages.
The Leguminosae (also Fabaceae; the legumes) is one angiosperm family that has experienced considerable numbers of plastid genomic rearrangements. Legumes are the third largest family of angiosperms with 730 genera and more than 19,000 species distributed throughout the world (Lewis et al., 2005). Legumes are second only to grasses in their agricultural and economic value, and include many important species grown for food, fodder, wood, ornamentals, and raw materials for industry and also for their ecologically important role in biological nitrogen fixation. A number of previous studies have examined the phylogenetic distribution of different plastid genome rearrangements among legumes, including the loss of one copy of the IR (Palmer and Thompson, 1982; Lavin et al., 1990), inversions of 50 kb (Palmer and Thompson, 1981; Doyle et al., 1996), and 78 kb (Bruneau et al., 1990), loss of the rpl22 and rps16 genes (Doyle et al., 1995), and loss of the rpl2 intron (Doyle et al., 1995). These genomic rearrangements, combined with DNA sequence data (Doyle, 1995; Käss and Wink, 1995, 1996; Doyle et al., 1997, 2000; Kajita et al., 2001; Wojciechowski et al., 2004), have provided valuable phylogenetic data for resolving relationships among several deep nodes of legumes. The first, and probably most dramatic example of the phylogenetic utility of a plastid genomic rearrangement among legumes is the loss of one copy of the IR by all members sampled from the tribes Carmichaelieae, Cicereae, Hedysareae, Trifolieae, Fabeae (Vicieae), Galegeae, and three genera of Millettieae (Lavin et al., 1990; Liston, 1995). 
The monophyly of this clade, known as the “IR-lacking clade” or IRLC (Wojciechowski et al., 2000), was later confirmed by phylogenetic analyses of DNA sequences of the plastid genes rbcL (Doyle et al., 1997; Käss and Wink, 1997) and matK (Wojciechowski et al., 2004), the plastid trnL intron (Pennington et al., 2001), and the internal transcribed spacer (ITS) regions of the nuclear ribosomal DNA (Hu et al., 2002).
In addition to their use in phylogenetic studies, plastid genome sequences are very useful for engineering foreign genes. However, complete chloroplast genome sequences had been determined for only six crop species as of 2004. Consequently, complete plastid genome sequences of several major crop species, including fiber crops (Lee et al., 2006), tubers (Daniell et al., 2006, 2008), cereals (Saski et al., 2007), trees (Steane, 2005; Bausher et al., 2006; Ravi et al., 2006; Samson et al., 2007), vegetables (Ruhlman et al., 2006), fruits (Jansen et al., 2006; Daniell et al., 2006) and legumes (Saski et al., 2005; Guo et al., 2007), have been determined recently. Plastid genetic engineering offers a number of unique advantages, including high levels of transgene expression (DeCosa et al., 2001), multi-gene engineering in a single transformation event (Quesada-Vargas et al., 2005), and transgene containment via maternal inheritance (Daniell, 2002; Daniell, 2007) or cytoplasmic male sterility (Ruiz and Daniell, 2005). Plastid transgenic lines also lack gene silencing (DeCosa et al., 2001; Lee et al., 2003), position effects, owing to site-specific transgene integration (Daniell et al., 2002), and pleiotropic effects, owing to subcellular compartmentalization of transgene products (Lee et al., 2003; Daniell et al., 2001; Leelavathi et al., 2003); transgene silencing, position effects, and pleiotropic effects are often encountered in nuclear genetic engineering.
Therefore, transgenes have been integrated into plastid genomes to confer valuable agronomic traits, including herbicide resistance (Daniell et al., 1998), insect resistance (McBride et al., 1995; DeCosa et al., 2001), disease resistance (DeGray et al., 2001), drought tolerance (Lee et al., 2003), salt tolerance (Kumar et al., 2004), and phytoremediation (Ruiz et al., 2003; Hussein et al., 2007), or to express various therapeutic proteins or biomaterials (Verma and Daniell, 2007; Kamarajugadda and Daniell, 2006; Daniell et al., 2005). However, soybean is the only legume that has been transformed via the plastid genome so far (Dufourmantel et al., 2004, 2005), and more genome sequence information is needed to facilitate plastid genetic engineering in other economically important legumes.
During the past eight years plastid genome sequences have been completed for four legumes: Lotus japonicus (Regel) K. Larsen, Medicago truncatula Gaertn., Glycine max Merr., and Phaseolus vulgaris L. In this paper, we report the complete plastid genome sequence of Cicer arietinum L. (chickpea). Sequences of these five legume plastid genomes, all from taxa belonging to the subfamily Papilionoideae and two of which are from members of the IRLC (Cicer, Medicago), will enable more detailed comparisons of the organization and evolution of the plastid genomes of legumes. Our comparisons have identified two additional genomic rearrangements, the losses of introns in the clpP and rps12 (3′-end) genes, and we survey the phylogenetic distribution of these changes in 302 taxa of legumes and the related family Polygalaceae.
Chickpea (C. arietinum L.) seeds were obtained from IARI (Indian Agricultural Research Institute) New Delhi, India. Fresh leaves were harvested from greenhouse grown chickpea seedlings. Prior to plastid isolation, plants were kept in the dark for 48 h to reduce the levels of starch. Plastid isolation was performed as described by Jansen et al. (2005) and Samson et al. (2007).
Purified plastids were used to amplify the entire plastid genome by rolling circle amplification (RCA) using the Repli-g RCA kit (Qiagen GmbH, Hilden, Germany) following the methods described in Jansen et al. (2005). The success of genome amplification and the quality of the DNA were verified by digestion with the restriction enzymes BstXI, EcoRI, and HindIII, and visualization of the resulting fragments on ethidium bromide-stained 1% agarose gels.
Purified RCA products were subjected to nebulization, followed by end repair, and size-fractionated by agarose gel electrophoresis to obtain fragment lengths ranging from 2.0 to 3.5 kb. Repaired products were blunt-end cloned into the pCR4Blunt-TOPO vector, followed by transformation into Escherichia coli ElectroMax™ DH5α cells by electroporation (TOPO Shotgun Cloning Kit; Invitrogen, Carlsbad, CA, USA). Transformed cells were selected on Luria–Bertani (LB) agar containing 100 μg/ml ampicillin and arrayed into 30 × 96-well microtitre plates. Sequencing reactions were carried out in both the forward and reverse directions using the BigDye Terminator v3.1 Cycle Sequencing Kit and separated on a 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA, USA). Sequence data were assembled using Sequencher version 4.5 (Gene Codes, Ann Arbor, MI, USA) following quality and vector trimming. Gap regions were filled by sequencing PCR fragments generated using primers that flank the gaps. The assembly was considered complete when a quality score of ≥20 was obtained at every base position with at least 6× coverage.
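The completion criterion stated above (quality score ≥20 and at least 6× coverage at every position) can be illustrated with a minimal sketch. The function names and the per-base list inputs are assumptions made for illustration only; they are not part of Sequencher or of the workflow used in this study.

```python
from typing import List, Sequence

MIN_QUALITY = 20   # minimum per-base quality score stated in the text
MIN_COVERAGE = 6   # minimum read depth (6x) stated in the text


def assembly_complete(quality: Sequence[int], coverage: Sequence[int]) -> bool:
    """Return True only if every position meets both thresholds."""
    if len(quality) != len(coverage):
        raise ValueError("quality and coverage must have the same length")
    return all(q >= MIN_QUALITY and c >= MIN_COVERAGE
               for q, c in zip(quality, coverage))


def failing_positions(quality: Sequence[int], coverage: Sequence[int]) -> List[int]:
    """Return 1-based positions that fail either threshold (hypothetical helper)."""
    return [i for i, (q, c) in enumerate(zip(quality, coverage), start=1)
            if q < MIN_QUALITY or c < MIN_COVERAGE]
```

Positions reported by `failing_positions` would correspond to regions that still need gap-filling or additional reads.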
The annotation program Dual Organellar Genome Annotator (DOGMA; Wyman et al., 2004) was used to annotate the C. arietinum plastid genome, after uploading a FASTA-formatted file of complete nucleotide sequence to the program's server. BLASTX and BLASTN searches, against a custom database of previously published plastid genomes, identified putative protein-coding genes, tRNAs, and rRNAs. For genes with low sequence identity, manual annotation was performed, after identifying the position of the start and stop codons, as well as the translated amino acid sequence, using the plastid/bacterial genetic code.
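As an illustration of the manual check of reading frames, the following sketch lists in-frame stop codons that occur before the final codon of a coding sequence. It assumes the three stop codons of the plastid/bacterial genetic code (TAA, TAG, TGA); the function name and the example sequences are hypothetical, and this is not how DOGMA performs annotation.

```python
from typing import List

# Stop codons of the bacterial/plastid genetic code (NCBI translation table 11).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_codons(cds: str) -> List[int]:
    """Return 1-based codon indices of in-frame stops before the last codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [idx for idx, codon in enumerate(codons[:-1], start=1)
            if codon in STOP_CODONS]
```

A non-empty result for a gene model would flag a candidate such as the internal stop codon reported for ndhB for closer manual inspection.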
MultiPipMaker (Schwartz et al., 2003; http://bio.cse.psu.edu) was used for multiple genome alignment of chickpea with four published legume plastid genomes from the subfamily Papilionoideae; Lotus japonicus (NC_002694, Kato et al., 2000), Medicago truncatula (NC_003119), Glycine max (NC_007942, Saski et al., 2005), and Phaseolus vulgaris (NC_009259, Guo et al., 2007). We generated the alignments of whole genomes using chickpea as the reference genome.
We surveyed for the presence/absence of two introns in 318 accessions of 301 legume species, representing all 3 subfamilies and 198 genera, and 1 member of the related family Polygalaceae (Table 1), using primers designed to span the intron in each gene: clpPF3 and clpPR3 (5′-ATGCCMATTGGTGTTCCAAAAGTRCC and 5′-GCGTGAGGGAATGCTAGACGTTTGGT) for the clpP gene, and rps12F and rps12R (5′-CCYAAAAAACCAAACTCTGCYTTACGTAAA and 5′-TTATTTTGGCTTTTTBGCMCCATATT) for the rps12 gene. PCR amplification products were resolved on 1% agarose gels and fragment sizes were determined by comparison to DNA size markers.
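The survey primers contain IUPAC ambiguity codes (M, R, Y, B), so each primer stands for a small pool of sequences. The sketch below expands a degenerate primer into a regular expression and counts the variants it represents. Primer sequences are copied from the text with any whitespace removed; the helper names are illustrative assumptions, not part of the original protocol.

```python
import re
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

PRIMERS = {
    "clpPF3": "ATGCCMATTGGTGTTCCAAAAGTRCC",
    "clpPR3": "GCGTGAGGGAATGCTAGACGTTTGGT",
    "rps12F": "CCYAAAAAACCAAACTCTGCYTTACGTAAA",
    "rps12R": "TTATTTTGGCTTTTTBGCMCCATATT",
}


def primer_to_regex(primer: str) -> re.Pattern:
    """Compile a regex matching every sequence the degenerate primer encodes."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by the primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())
```

For example, `degeneracy(PRIMERS["rps12R"])` is 6, because B stands for three bases and M for two.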
Chickpea has a circular plastid genome 125,319 bp in length with only one copy of the IR region (Fig. 1, GenBank accession number EU835853). Gene order in chickpea is similar to the ancestral angiosperm gene order (Raubeson et al., 2007) except for the loss of one copy of the IR and the presence of a single, large inversion of approximately 50 kb that reverses the order of the genes between rbcL and rps16. The same inversion is present in the four other completely sequenced legume plastid genomes, Glycine max (Saski et al., 2005), Lotus japonicus (Kato et al., 2000), Medicago truncatula, and Phaseolus vulgaris (Guo et al., 2007), and is apparently shared by the majority of papilionoid legumes (Doyle et al., 1996). The AT content of the chickpea plastid genome is 66.1%, similar to other legumes including Glycine max (64.63%), Lotus japonicus (64.0%), Phaseolus vulgaris (64.56%), and Medicago truncatula (66.03%).
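AT content, as reported above, is the percentage of A and T bases among all bases in the sequence. A minimal sketch of the calculation follows; the function name is an assumption, and skipping ambiguous bases such as N is a choice made here for illustration rather than a documented step of the original analysis.

```python
def at_content(sequence: str) -> float:
    """Percentage of A/T among unambiguous bases (A, C, G, T) in a sequence."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at = sum(1 for base in counted if base in "AT")
    return 100.0 * at / len(counted)
```

Applied to the full genome sequence read from a FASTA file, a function of this kind would be expected to give a value close to the 66.1% reported for chickpea.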
The chickpea plastid genome has 108 total genes, including 4 rRNA genes, 29 tRNA genes, and 75 protein-coding genes. Three genes, rps16, infA, and ycf4, found in most angiosperm plastid genomes, including representatives of the early-branching lineages (Goremykin et al., 2003; Hansen et al., 2007; Raubeson et al., 2007) are not present in the chickpea plastid genome. In ndhB, there is an internal stop codon, similar to other legume plastid genomes. There is no stop codon (Met…Val) in the rps8 gene, a characteristic feature of the Medicago truncatula plastid genome.
Fifteen genes contain one or two introns, nine of which are in protein-coding genes while six are in the tRNA genes. The protein-coding gene rpl2 in chickpea, which contains a single intron of 669 bp has 16 amino acids missing at the 5′-end relative to most other legumes (Fig. 2), while Medicago has 11 amino acids missing in the same portion of the rpl2 gene. Among intron-containing genes, trnK-UUU has the largest intron (2491 bp), and it includes the matK gene. The smallest intron is in trnL-UAA (555 bp). The ycf3 gene has two introns of 733 and 737 bp.
In the Cicer arietinum chloroplast genome we observed the absence of the clpP and rps12 3′-end introns. This is the first documented case of the loss of both introns within the same plastid genome. Therefore, 301 legume taxa from 198 genera and one member of the related family Polygalaceae (Table 1) were subjected to a PCR-based survey for the presence/absence of the clpP and rps12 3′-end introns. These taxa represent all major groups of legumes, with 23 caesalpinioid species (18 genera), 18 mimosoid species (15 genera), 260 species of papilionoids (165 genera), and the genus Polygala L. in the family Polygalaceae (potential sister group to Leguminosae in Fabales; sensu APG II, 2003). For clpP, the expected fragment size is 1100–1300 bp if the intron is present and 300–350 bp if the intron is lost (Fig. 3). For rps12, the expected fragment size is 800–850 bp if the intron is present and 250–280 bp if the intron is lost (Fig. 4). Comparison of the sizes of both the clpP and rps12 PCR products from a diversity of legumes, represented in Figs. 3 and 4, reveals fragments very similar in length, a result consistent with a process by which plastid introns are excised precisely and entirely, as observed earlier by Doyle et al. (1995) and in other taxa by Downie et al. (1991). Indeed, sequence analysis of ten taxa selected from our survey that included those with or without the clpP and rps12 introns confirms that intron excision has occurred at precisely the same points in the gene sequence in each taxon lacking the intron (MFW, unpublished data). Furthermore, the minor variation in length of the clpP PCR products we observed (Fig. 3A) was due to the length of the intron in each taxon, which ranged from 702 bp (Lespedeza cuneata) to 799 bp (Lotus corniculatus) in the taxa we sequenced, whereas the length of the rps12-3′ intron was 529–532 bp in the taxa analyzed (Glycine max, Phaseolus vulgaris, Callerya reticulata; MFW, unpublished data).
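The scoring rule implied by the expected fragment sizes can be written as a small lookup. This is a sketch only: the size ranges are those given in the text, whereas the function name, the "ambiguous" label, and the optional tolerance parameter are assumptions introduced for illustration.

```python
# Expected PCR fragment sizes (bp) from the text, per gene and intron state.
EXPECTED_BP = {
    "clpP": {"present": (1100, 1300), "lost": (300, 350)},
    "rps12": {"present": (800, 850), "lost": (250, 280)},
}


def score_intron(gene: str, fragment_bp: int, tolerance_bp: int = 0) -> str:
    """Classify a fragment as intron 'present', 'lost', or 'ambiguous'."""
    for state, (low, high) in EXPECTED_BP[gene].items():
        if low - tolerance_bp <= fragment_bp <= high + tolerance_bp:
            return state
    return "ambiguous"  # size falls outside both expected ranges
```

A fragment scored as "ambiguous" would correspond to the weak or inconclusive reactions that call for resampling or sequencing.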
Extra minor PCR products, weak or inconclusive reactions, and polymorphism for intron presence/absence were observed in our survey for these two introns (Table 1). Virtually all of these cases occurred in reactions surveying for the first (of two) clpP introns in taxa that presumably have the intron present. Possible explanations include partial excision reactions, poor-quality DNA (many taxa were sampled from herbarium specimens), sequence variation in primer binding sites, or rearrangements in or near the gene containing the intron.
Our survey (Table 1) reveals that the loss of the clpP intron is, with a few exceptions, limited to the large IR-lacking clade (Fig. 5). Loss of the clpP intron was detected in individual accessions of the mimosoid Inga Mill. and the papilionoids Aotus Sm., Hypocalyptus Thunb., and Platymiscium Vogel, although the latter three results are ambiguous and need to be confirmed with sampling of additional specimens. The rps12 intron is also lost in all members of the IRLC surveyed, with notable exceptions (all accessions of Callerya Endl., Wisteria Nutt., Afgekia Craib, and Millettia japonica A. Gray). This intron is also lost independently in the papilionoid tribe Desmodieae (Fig. 5), a monophyletic group nested in the “Millettioids-Phaseoloids” clade (Kajita et al., 2001; Wojciechowski et al., 2004). Desmodieae, which consists of 30 genera and ca. 530 species distributed in tropical to warm temperate regions of the world (Lewis et al., 2005), is represented in this analysis by the genera Alysicarpus Desv., Campylotropis Bunge, Desmodium Desv., Kummerowia Schindl., and Lespedeza Michx. (Table 1).
A number of previous studies indicated that legume plastid genomes have experienced substantial numbers of rearrangements (Palmer and Thompson, 1981, 1982; Palmer et al., 1988; Lavin et al., 1990; Bruneau et al., 1990; Doyle et al., 1995, 1996). Complete sequencing of plastid genomes of five legumes (Cicer, Glycine, Lotus, Medicago, and Phaseolus), combined with earlier gene mapping studies, has revealed nine genomic rearrangements, including two large inversions, the loss of the IR, three gene losses and three intron losses, two of which are reported here (Table 2). Thus, the legumes represent one of only a few angiosperm families that have experienced multiple plastid genomic rearrangements and gene/intron losses (Jansen et al., 2007), and serve as an excellent choice in which to investigate contrasting patterns of plastid DNA evolution. Other families known to have comparable plastid genomic rearrangements include Asteraceae (two inversions; Jansen and Palmer, 1987; Kim et al., 2005), Campanulaceae (up to 42 inversions, two gene losses, 8 putative transpositions; Cosner et al., 1997, 2004; Haberle et al., 2008), Geraniaceae (12 inversions, 8 IR boundary changes; Chumley et al., 2006), Lobeliaceae (11 inversions; Knox et al., 1993), Oleaceae (4 inversions, 1 gene loss, 1 intron loss, 5 gene duplications; Lee et al., 2007), Poaceae (3 inversions, 3 gene losses, 2 intron losses; Doyle et al., 1992), and Ranunculaceae (9 inversions, 1 intron loss; Hoot and Palmer, 1994; Johansson, 1999).
The causes for the propensity of plastid genomic changes in these lineages are unknown but several explanations have been proposed. For legumes, it was suggested that the loss of the IR has a destabilizing effect on genome organization (Palmer, 1991; Palmer and Thompson, 1982). However, given that most of the angiosperms with highly rearranged plastid genomes still retain two copies of the IR (i.e., Campanulaceae, Lobeliaceae, Oleaceae, Poaceae, Ranunculaceae, and most Geraniaceae) IR-loss does not provide a general explanation for the extensive rearrangements in plastid genomes. Moreover, the fact that the majority of the known plastid genomic rearrangements in legumes are also found in papilionoid taxa with two copies of the IR in their genomes (Table 2; Fig. 5) argues against this explanation as well. Another possible explanation for the higher incidence of rearrangements in some lineages is the presence of dispersed sequence repeats, which could facilitate rearrangements by intramolecular recombination (Palmer, 1991). In wheat (Ogihara et al., 1988) and Oenothera (Hupfer et al., 2000), such repeats have been directly implicated in inversions, and the strong correlation detected between the number of dispersed repeats and the extent of genomic rearrangements in several lineages is consistent with this explanation (Pombert et al., 2005, 2006; Haberle et al., 2008). Recent comparisons of the number and distribution of repeated sequences in completely sequenced legume plastid genomes have demonstrated the presence of numerous dispersed repeats, many more than in related rosid genomes that are not rearranged (Saski et al., 2005).
We have plotted the distribution of several of the less ambiguous of the nine legume plastid genomic rearrangements (Table 2) on a phylogenetic tree based on cladistic analyses of complete nucleotide sequences of the plastid matK gene to assess the phylogenetic implications of these rare genomic changes (Fig. 5; tree summarized from Wojciechowski et al., 2004). Clearly, most plastid genomic rearrangements among legumes are restricted to the papilionoids with the exception of the loss of the rpl22 gene (Downie and Palmer, 1992; Doyle et al., 1995), which characterizes all taxa sampled from all three subfamilies of legumes, and the loss of the rpl2 intron in numerous species of the caesalpinioid genus Bauhinia (Lai et al., 1997). Interestingly, the rpl22 gene has not been lost from any other land plants (Downie and Palmer, 1992), and a functional copy has been isolated from the nuclear genome in Pisum sativum (Gantt et al., 1991).
The loss of one copy of the IR, as originally suggested by Lavin et al. (1990), has occurred only once among legumes and is restricted to a large clade of papilionoid legumes that includes the traditional tribes Carmichaelieae, Cicereae (chickpea), Galegeae, Hedysareae, Trifolieae, and Fabeae (Vicieae) and several genera formerly treated in the tribe Millettieae (so-called “IRLC millettioids”), including Callerya, Wisteria, Afgekia, Endosamara R. Geesink, and probably Antheroporum Gagnep. and Sarcodum Lour. (Hu et al., 2000, 2002; Hu and Chang, 2003). Taxa lacking the IR have been shown to form a monophyletic group informally known as the IRLC, which is well supported in phylogenetic trees based on plastid matK and nuclear rDNA sequence analyses (Hu et al., 2000, 2002; Wojciechowski et al., 2000, 2004). Judging from its taxonomic distribution, this mutational event in the plastid genome occurred relatively late in the evolution and diversification of the Papilionoideae, and it is a molecular synapomorphy for a large (ca. 4400 species), derived group of primarily herbaceous taxa with a temperate distribution (Wojciechowski et al., 2000, 2004) and an estimated age of 39 Ma (Lavin et al., 2005). Like the now established monophyly of the taxa marked by loss of the IR (Palmer et al., 1988), results from molecular phylogenetic studies have provided both greater resolution and corroborating evidence for the relationships of many of the temperate and tropical groups long considered “derived” within papilionoids based upon the presence of morphological and/or chemical characters that have served as important taxonomic markers (Polhill, 1994).
For example, the hypothesis of a single origin of canavanine biosynthesis (production of non-protein amino acids such as L-canavanine, a close analog of arginine) in Papilionoideae (Bell, 1981), which occurs in all the tribes that comprise the IRLC plus the related tropical tribes that comprise their sister group (i.e., Millettieae, Phaseoleae, and allies), has been supported by recent analyses of plastid rbcL and matK gene sequences in legumes (Kajita et al., 2001; Wojciechowski et al., 2004). Distribution of the two legume plastid genome inversions provided additional support for clades identified in phylogenetic trees based on analyses of gene sequences. The 50-kb inversion defines an early evolutionary split in the diversification of the papilionoid clade with all members of this clade having the inversion except for taxa from the tribes Sophoreae, Swartzieae, and Dipterygeae (Doyle et al., 1996; Pennington et al., 2001; Wojciechowski et al., 2004), although the exact membership of this clade remains unresolved (due to lack of sampling all relevant taxa). The 78-kb inversion is much more limited in its distribution, being restricted to a majority of the genera that traditionally comprise subtribe Phaseolinae of the tribe Phaseoleae (Bruneau et al., 1990), which is also supported as a monophyletic group in trees based on plastid gene sequences (Thulin et al., 2004; Wojciechowski et al., 2004). Recent evidence indicates this inversion may be a synapomorphy for this clade, defined by the most recent common ancestor of the genera Wajira Thulin and Phaseolus L. (M. Moore, M.F. Wojciechowski, A. Delgado, and P.S. Soltis, unpublished data).
Two other gene losses in legumes have been detected in at least one genus in 15 (rps16 and ycf4) of the 28 papilionoid tribes (sensu Lewis et al., 2005). The taxonomic distribution of the rps16 loss based on filter hybridizations (Doyle et al., 1995) suggested multiple, independent losses within papilionoids, but more rigorous PCR and sequencing strategies are needed to confirm these events. Among 64 completely sequenced seed plant plastid genomes there have been four independent losses of rps16: in Pinus, legumes, two members of the Malpighiales (Passiflora and Populus), and the monocot Dioscorea (Jansen et al., 2007). This gene has also been lost in the genus Adonis in the Ranunculaceae based on filter hybridization data (Johansson, 1999). The loss of ycf4 (formerly called ORF184) has been documented in Pisum (Nagano et al., 1991; Smith et al., 1991) and it is lacking in three (Cicer, Glycine, and Medicago) of the five completely sequenced legume plastid genomes. An earlier survey of the phylogenetic distribution of this loss among 392 legume genera based on filter hybridization screens with ycf4 gene-specific probes indicated at least 15 independent losses within tribe Phaseoleae alone (Doyle et al., 1995). Both the genome sequences and the filter hybridization data suggest that considerable homoplasy will limit the phylogenetic utility of this gene loss within legumes.
The two plastid genomic changes identified by sequencing the chickpea genome provide valuable information for resolving relationships among the IRLC papilionoid legumes (Table 2 and Fig. 5). Intron losses for both the clpP and rps12 genes have been identified in other angiosperm lineages as well. For example, both of the clpP gene introns have been lost in Poaceae, Onagraceae, Oleaceae, and Pinus (reviewed in Jansen et al., 2007), and the intron in the 3′-end of rps12 has been lost independently twice in the monocot order Asparagales (McPherson et al., 2004). However, the losses in Cicer represent the first documented case of the loss of introns from both of these genes in the same plastid genome. The clpP intron loss, which appears to have occurred only once within Leguminosae, provides additional support for the monophyly of the IRLC.
While the data suggest that loss of the rps12 intron generally coincides phylogenetically with the loss of the IR (Table 1), the distribution of the rps12 intron loss is more informative because it marks a slightly less-inclusive clade within the IRLC that provides additional data to resolve relationships among the early-branching lineages of the IRLC. That Callerya, Wisteria, Afgekia, and Millettia japonica unambiguously possess the intron in the rps12 gene is interesting because trees based on nucleotide sequences (Wojciechowski et al., 2000, 2004; Hu et al., 2002) have not been able to resolve the relationships of these and other lineages at the base of this clade. Indeed, results from molecular phylogenetic analyses are not in agreement on this point, with some data suggesting Glycyrrhiza L. is the sister group to the rest of the IRLC or Glycyrrhiza + Callerya and/or Wisteria s.l. (i.e., including other IRLC millettioids) are sister to the rest of the IRLC. While the consensus seems to be that Callerya + Wisteria s.l. form a clade (e.g., Hu and Chang, 2003), the presence of the rps12 intron in Callerya, Wisteria, and other members of the IRLC millettioids suggests they comprise the earliest-branching lineages that form the sister group to the rest of the IRLC, which are characterized by the loss of the intron from rps12. Furthermore, this result indicates that the loss of this intron occurred subsequent to the loss of one copy of the IR in these taxa. A second, independent loss of the rps12 intron has occurred in the more distantly related tribe Desmodieae (Fig. 5), which retains both copies of the IR, a group that is also marked by loss of the rpl2 intron (Doyle et al. 1995; Bailey et al. 1997).
Legume plastid genomes have undergone considerable diversification in gene/intron content and gene order, and these changes provide valuable information for resolving phylogenetic relationships among and within some major clades identified on the basis of analyses of DNA sequences. The two new genomic changes identified in the present study provide additional support of the monophyly of the IR-loss clade, and resolution of the early-branching pattern in this clade. In addition to providing insight into plastid genome evolution and phylogenetic relationships of legumes, the availability of complete plastid genome sequences facilitates plastid genetic engineering for improvement of agronomic traits and production of vaccines, biopharmaceuticals, biomaterials and industrial enzymes. Complete plastid genome sequences provide valuable information on spacer regions for integration of transgenes at optimal sites via homologous recombination, as well as endogenous regulatory sequences for optimal expression of transgenes, and should help in expanding plastid technology to other economically important crops.
Investigations reported in this article were supported in part by grants from USDA 3611-21000-017-00D and NIH R01 GM 63879 to H.D. Research by R.K.J. was supported, in part, by NSF ATOL Grant DEB0120709. M.F.W. was supported, in part, by NSF Grant DEB0542958. The authors thank Matt Lavin, Alfonso Delgado-Salinas, Jennifer Trusty, Kelly Steele, Jer-Ming Hu, R. Toby Pennington, Mats Thulin, Aaron Liston, and the herbaria of The Royal Botanic Gardens at Kew (K), Edinburgh (E), Munich (M/MSB), and the Arizona State University Vascular Plant Herbarium (ASU) for tissue or DNA samples, and Dr. Michael Bausher (USDA) for help in early stages of genome sequencing. We also thank Anne Bruneau and an anonymous reviewer for their helpful comments on the manuscript.