|Home | About | Journals | Submit | Contact Us | Français|
Sequence analysis of organelle genomes and comprehensive analysis of C-to-U editing sites from flowering and non-flowering plants have provided extensive sequence information from diverse taxa. This study includes the first comprehensive analysis of RNA editing sites from a gymnosperm mitochondrial genome, and utilizes informatics analyses to determine conserved features in the RNA sequence context around editing sites. We have identified 565 editing sites in 21 full-length and 4 partial cDNAs of the 39 protein-coding genes identified from the mitochondrial genome of Cycas taitungensis. The information profiles and RNA sequence context of C-to-U editing sites in the Cycas genome exhibit similarity in the immediate flanking nucleotides. Relative entropy analyses indicate that similar regions in the 5′ flanking 20 nucleotides have information content compared to angiosperm mitochondrial genomes. These results suggest that evolutionary constraints exist on the nucleotide sequences immediately adjacent to C-to-U editing sites, and similar regions are utilized in editing site recognition.
The online version of this article (doi:10.1007/s00294-010-0312-4) contains supplementary material, which is available to authorized users.
Comprehensive analysis of RNA editing sites has been reported for several chloroplast and mitochondrial genomes (mtDNAs) of non-flowering and flowering plants. Chloroplast genomes (cpDNAs) of flowering plants typically possess about 30 C-to-U editing sites and no U-to-C editing sites (Wakasugi et al. 2001). However, cpDNAs of some non-flowering plants exhibit much higher frequencies of RNA editing. For example, the cpDNA from the hornwort, Anthoceros formosae, exhibits extensive editing with 509 C-to-U and 433 U-to-C editing sites (Kugita et al. 2003a, b), but other non-flowering plants such as Marchantia have no RNA editing sites (Oda et al. 1992). Comprehensive analysis of editing sites in the cpDNA of the fern Adiantum capillus-veneris reported 350 C-to-U and 35 U-to-C editing sites (Wolf et al. 2003, 2004). The moss Takakia lepidozioides has been partially characterized for editing sites, and 302 C-to-U and no U-to-C editing sites were reported (Yura et al. 2008). Several angiosperm mtDNAs have between 350 and 500 C-to-U editing sites (Giege and Brennicke 1999; Kubo et al. 2000; Notsu et al. 2002; Handa 2003; Mower and Palmer 2006), and the Cycas mtDNA was predicted to include over 1,000 C-to-U editing sites (Chaw et al. 2008).
Thus, the prevalence of C-to-U editing sites in the cpDNAs and mtDNAs of diverse plants offers an excellent opportunity to evaluate the nucleotide distribution in different organelle systems and diverse taxa of land plants. Several computational studies have examined RNA editing sites (Cummings and Myers 2004; Tillich et al. 2006; Mulligan et al. 2007; Jobson and Qiu 2008; Yura et al. 2008). C-to-U editing sites in angiosperm mtDNAs have similar information profiles in the 5′ and 3′ flanking regions and a similar RNA context (Mulligan et al. 2007). These results are consistent with molecular analyses that demonstrated that 5′ flanking regions are required for editing site conversion (Takenaka et al. 2004; van der Merwe et al. 2006). Edited chloroplast RNA fragments for ndhB-9 and ndhF-1 share sequence similarity around the editing site, and are specifically bound by the same chloroplast protein (Kobayashi et al. 2008). In addition, CRR4, a PPR protein required for editing in Arabidopsis chloroplasts, is a sequence-specific RNA-binding protein that binds to sequences comprised of 25 nucleotides upstream and 10 nucleotides downstream of the ndhD-1 editing site (Okuda et al. 2006).
RNA editing sites in lower plant cpDNAs have also been analyzed with computational tools. The distribution of nucleotides around editing sites in lower and higher plant chloroplasts was examined within codons, and a detailed analysis of the effects on codon changes and usage was developed (Tillich et al. 2006), and a model for the evolution of RNA editing based on conservation of codon usage was proposed. The RNA sequences flanking 302 C-to-U editing sites from Takakia cpDNA were classified into eight groups with common patterns, and these patterns could be used to predict novel editing sites (Yura et al. 2008). Recently, Jobson and Qiu (2008) examined C-to-U and U-to-C editing patterns with respect to codon position and amino acid changes in the cpDNAs and mtDNAs of plants (Jobson and Qiu 2008). Editing was reported to increase the hydrophobicity and molecular size of the amino acid side chain, was most abundant in genes of membrane proteins, and was more frequent in T-rich sequences and in genes under positive selection (Jobson and Qiu 2008).
In this paper, we report the first extensive analysis of RNA editing from the mtDNA of a gymnosperm, Cycas taitungensis (Taitung cycad) and confirm 565 editing sites in 25 mitochondrial genes. The Cycas editing sites and sequence data from known flowering and non-flowering plant mtDNAs are analyzed with informatics tools. Common features of the editing sites from these diverse taxa suggest similar mechanisms of editing site recognition and conversion.
Five grams of fresh Cycas leaves were frozen in liquid nitrogen and pulverized with a mortar and pestle. Total RNA was extracted and purified by RNeasy® Plant Mini Kit (Qiagen, Hilden). For reverse transcriptase-polymerase chain reaction (RT-PCR) assay, total RNA was treated with DNAse I and then extracted with phenol–chloroform to eliminate DNA contamination. RNA was reverse transcribed to synthesize cDNA with Superscript II reverse transcriptase (Invitrogen, Indianapolis) and a gene-specific primer (Additional file 2) according to the manufacturer’s protocol. cDNAs were PCR amplified with specific reverse and forward primer pairs (Additional file 2). The PCR products were purified by gel extraction (Gel-M, Viogene Inc., Taiwan) and directly sequenced with the BigDye terminator cycle sequencing kit (Applied Biosystems, Foster City, CA) according to the manufacturer’s protocol. DNA was sequenced with Applied Biosystems ABI 3700 sequencer.
cpDNA sequences and the identification of editing sites were obtained from the following Genbank accessions and citations: Adiantum capillus-veneris, AY178864 (Wolf et al. 2003, 2004); Anthoceros formosae, NC_004543 (Kugita et al. 2003a, b); Takakia lepidozioides, AB193121, AB254134, AB299142, AB367138, AB367138 (Yura et al. 2008); Zea mays, X86563 (Maier et al. 1995); and Nicotiana tabacum, Z00044 (Shinozaki et al. 1986; Tsudzuki et al. 2001). mtDNA sequences and the identification of editing sites were obtained from the following Genbank accessions and citations: C. taitungensis, AP009381 (Chaw et al. 2008); Arabidopsis thaliana, NC001284 (Giege and Brennicke 1999); Beta vulgaris, AP006444 (Kubo et al. 2000); Brassica napus, BA000009, DQ381444–DQ381465 (Handa 2003; Mower and Palmer 2006); Oryza sativa, BA000029 (Notsu et al. 2002). Additional file 3 provides accession numbers for cDNA sequence of Cycas mitochondrial genes.
Protein-coding sequences for each genome were annotated with edited nucleotides represented by an upper case C, and these sequences are available in Additional files 4, 5, 6, and 7. Thus, edited nucleotide positions are represented as the unedited nucleotide in the sequences analyzed in this study. The sequence files were limited to known protein-coding sequences larger than 100 nucleotides, and small or uncharacterized ORFs, introns, and other non-coding sequences were not analyzed.
Computational analyses were performed as previously described (Mulligan et al. 2007). Briefly, the nucleotide distribution around all edited and unedited cytidines was analyzed in a one-, two-, or three-nucleotide sliding window. Each coding sequence was scanned for edited and unedited C, and the sequence was written to an array of edited or unedited sequences. Thus, the sequences flanking all edited or unedited cytidines were aligned in a matrix. The frequency of nucleotides around the edited or unedited nucleotide (P or Q, respectively) was used to calculate the selectivity ratio (P/Q). Thus, a nucleotide or series of nucleotides with a selectivity ratio of one has the same relative frequency around edited and unedited cytidines, while a nucleotide with a selectivity ratio greater than 1 is more frequently present around an edited cytidine. Relative entropy was calculated as the Kullback–Leibler distance by the equation d = ∑Pk log(Pk/Qk) over k terms (k = 4n) for the distribution of nucleotides in 1, 2, or 3 nucleotide windows.
Random editing site assignment was used to produce coding sequences with randomly assigned editing sites. The editing site reassignment program scans each coding-sequence entry, and determines the number and codon position of each of the editing sites. This program randomly assigns a cytidine in the same codon position as an editing site and maintains the number and codon position of editing sites in a coding sequence. As a result, it is not “random”. Statistics such as mean, standard deviation, variance, and confidence intervals were determined from 1,000 iterations of random editing site reassignment.
Table 1 shows the distribution of RNA editing sites within gene sequences of the C. taitungensis mtDNA. Five hundred and sixty-five editing sites are confirmed by cDNA sequence analysis of 21 genes and partial sequences for four additional genes. Using PREP-Mt with several cutoff scores (Mower 2005), Chaw et al. (2008) predicted that the Cycas mtDNA has more than 1,000 C-to-U editing sites in the 39 protein-coding genes. Table 2 shows the distribution of editing sites within codons, and the distribution in the first, second and third codon positions are 30, 65, and 5%, respectively. This pattern of editing site distribution is very similar to that in angiosperm mtDNAs (Mulligan et al. 2007). Some start and stop codons are created by RNA editing in the Cycas mtDNA (Additional file 1). Start codons are created by editing ACG to ATG in three genes, including atp1, cox1 and sdh3, while stop codons are created by CGA to UGA conversion in atp6, atp9, ccmFC, nad4, rps12, and sdh3, and by CAA to UAA conversion for atp8, nad4L, and rps11. Table 2 shows the distribution of observed editing sites within codons in Cycas mitochondrial genome.
The relative entropy around Cycas editing sites is analyzed in a sliding window of 1, 2, or 3 nucleotides (Fig. 1a, b, c, respectively). The profiles show large values at nucleotides −1 and −2 and small peaks in the 5′ flanking region (−9, −6/−5/−4) and at +1. The influence of codon position is analyzed by separate analyses of editing sites in the first or second codon position, and the relative entropy of these subsets also exhibits similar information profiles (Fig. 1d). The highest relative entropy is present in the −1 position, and profile is similar in the 5′ flanking region with a peak at −5. The major difference in the relative entropy around editing sites in codon positions 1 and 2 is the large peak at the +2 position in CPA1, and a similar result was observed in angiosperm mitochondrial genomes (Mulligan et al. 2007). This position represents the first downstream wobble position, and synonymous mutations may allow optimization of the editing site for efficient editing, and would result in increased entropy at these positions. The information content around editing sites in angiosperm mtDNAs is similar with very high information immediately 5′ of the editing site, a peak at nucleotides −6/−5/−4 and +1, and relatively little information in the 3′ flanking region (Mulligan et al. 2007).
The highest level of information around C-to-U editing sites resides in the nucleotides immediately upstream of the edited nucleotide. Table 3 compares the selectivity ratios (P/Q) for the distribution of dinucleotides in the −2/−1 position around editing sites in three mtDNAs and five cpDNAs. The analysis of C-to-U editing sites in Cycas mtDNA reveals a similar distribution in the angiosperm mtDNAs, Arabidopsis and Oryza. The dinucleotides UU, UC, CU, and AU are highly enriched at the −2/−1 position (Table 3). Dinucleotides with a purine at the −1 position were rarely observed at the −2/−1 position and exhibited very small selectivity ratios.
A scatter plot compares the selectivity ratios for dinucleotides upstream of Cycas and angiosperm mitochondrial editing sites (Fig. 2a); thus, each of the 16 points corresponds to the selectivity ratios for a specific dinucleotide. An extraordinary level of congruence exists between the selectivity ratios of the dinucleotides in the −2/−1 position of the Cycas mtDNA and the angiosperm mtDNAs. Linear regression analysis of these data indicates slope values near 1, Y-intercepts near zero, and coefficients of determination (R2) greater than 0.9 (Fig. 2a).
The C-to-U editing sites in the cpDNAs of non-flowering plants show a similar trend in the pyrimidine-rich dinucleotides upstream of editing sites (Table 3). The selectivity ratios upstream of C-to-U editing sites in the cpDNAs of the hornwort (Anthoceros), the moss (Takakia), and the fern (Adiantum) are very similar to selectivity ratios observed in angiosperm mitochondria. Figure 2b compares the selectivity ratios of Takakia cpDNA editing sites with plant mtDNA editing sites, and the high degree of similarity is indicated by a coefficient of determination (R2) greater than 0.85. Therefore, the distribution of nucleotides around C-to-U editing sites in non-flowering plant cpDNAs is very similar to those in plant mtDNAs, which strengthens a common origin and early evolution of RNA editing in the two organelle systems.
Informatics analyses demonstrate strong similarities in the C-to-U editing sites across diverse taxa and organelle systems. The information profiles around C-to-U editing sites generally exhibit high relative entropies in the −1/−2 regions and smaller peaks in the 5′ flanking 20 nucleotides. In contrast, there is generally little information in the 3′ flanking nucleotides. Furthermore, RNA sequence context at C-to-U editing sites is very similar across these diverse taxa and both organelle systems. In plant mitochondria and in lower plant chloroplasts, pyrimidine-rich dinucleotides are highly enriched upstream of C-to-U editing sites, and a very low frequency of purines exists at −1. Thus, there is a strong selection of nucleotides immediately adjacent to C-to-U editing sites across eight organelle genomes including taxa that diverged at least 400 million years ago (mya) (Palmer et al. 2004). The conserved information profile and nucleotide context around C-to-U editing sites may result from constraints related to the editing mechanism. The peaks in the information profile suggest that similar positions are utilized in editing site recognition. These results further substantiate the model that common editing site features may exist immediately adjacent to C-to-U editing sites, and that cis-elements for individual editing sites reside in the 5′ flanking region.
The information profile around C-to-U editing sites usually exhibits small peaks in the 5′ flanking region. In contrast to the strong sequence conservation at −1/−2 positions of editing sites, relatively little sequence similarity is observed in the scatter plots in these upstream regions (data not shown). Molecular analyses of the sequences required for editing site conversion in angiosperm chloroplast and mitochondrial systems have demonstrated that the cis-element includes approximately 20 nucleotides of upstream sequence and relatively little downstream sequence (Shikanai 2006; van der Merwe et al. 2006; Hayes and Hanson 2008). Clusters of editing sites have been proposed for the recognition of editing sites in higher plant chloroplasts (Chateigner-Boutin and Hanson 2002), and some editing sites can be grouped into clusters that exhibit sequence similarity. In some cases, PPR genes are required for processing two or more editing sites, and the cis-elements share limited sequence similarity (Chateigner-Boutin 2008; Okuda et al. 2009, 2010; Zehrmann et al. 2009). The analysis of the cis-elements for 34 editing sites and 15 PPR proteins in Arabidopsis required for RNA editing indicated that the cis-elements for editing sites were not strikingly similar (Hammani et al. 2009).
Informatics analyses demonstrate that C-to-U editing sites share common features across diverse taxa and organelle systems. The information profiles around editing sites in these diverse systems show similar patterns. Furthermore, the nucleotides at −1/−2 show remarkable similarity across diverse taxa and different organelles systems. The conserved information profiles and nucleotide context around C-to-U editing sites across these broad taxa may be a constraint of common features of the editing mechanism.
Below is the link to the electronic supplementary material.
This work was supported by a National Science Council grant to S.-M.C. (97-2621-B001-003-MY3), and a National Science Foundation grant to R.M.M. (MCB-0929423).
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.