In flowering plants, mitochondrial and chloroplast mRNAs are edited by C-to-U base modification. In plant organelles, RNA editing appears to be generally a correcting mechanism that restores the proper function of the encoded product. Members of the Arabidopsis RNA editing-Interacting Protein (RIP) family have been recently shown to be essential components of the plant editing machinery. We report the use of a strand- and transcript-specific RNA-seq method (STS-PCRseq) to explore the effect of mutation or silencing of every RIP gene on plant organelle editing. We confirm RIP1 to be a major editing factor that controls the editing extent of 75% of the mitochondrial sites and 20% of the plastid C targets of editing. The quantitative nature of RNA sequencing allows the precise determination of overlapping effects of RIP factors on RNA editing. Over 85% of the sites under the influence of RIP3 and RIP8, two moderately important mitochondrial factors, are also controlled by RIP1. Previously uncharacterized RIP family members were found to have only a slight effect on RNA editing. The preferential location of editing sites controlled by RIP7 on some transcripts suggests an RNA metabolism function for this factor other than editing. In addition to a complete characterization of the RIP factors for their effect on RNA editing, our study highlights the potential of RNA-seq for studying plant organelle editing. Unlike previous attempts to use RNA-seq to analyze RNA editing extent, our methodology focuses on sequencing of organelle cDNAs corresponding to known transcripts. As a result, the depth of coverage of each editing site reaches unprecedented values, assuring a reliable measurement of editing extent and the detection of numerous new sites. This strategy can be applied to the study of RNA editing in any organism.
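At its simplest, the editing extent measured by RNA-seq is the fraction of reads that carry T (read from U) at a known C site. The minimal Python sketch below assumes per-site counts of C and T reads from strand-specific cDNA reads. It flags a site as factor-dependent when its editing extent drops by a chosen threshold in the mutant. The threshold `min_drop`, the input format and all function names are hypothetical, and a real analysis would add a statistical test. The overlap helper computes the kind of figure quoted for RIP3 and RIP8, namely the share of one factor's sites that another factor also controls.

```python
from typing import Dict, Set, Tuple

# Hypothetical input format: site identifier -> (C read count, T read count)
SiteCounts = Dict[str, Tuple[int, int]]


def editing_extent(c_reads: int, t_reads: int) -> float:
    """Fraction of reads showing the edited base (T in cDNA, U in RNA)."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("site has no C or T coverage")
    return t_reads / total


def affected_sites(wild_type: SiteCounts, mutant: SiteCounts,
                   min_drop: float = 0.1) -> Set[str]:
    """Sites whose editing extent falls by at least min_drop in the mutant."""
    hits = set()
    for site, (wt_c, wt_t) in wild_type.items():
        if site not in mutant:
            continue  # site not covered in the mutant library
        mut_c, mut_t = mutant[site]
        drop = editing_extent(wt_c, wt_t) - editing_extent(mut_c, mut_t)
        if drop >= min_drop:
            hits.add(site)
    return hits


def overlap_fraction(sites_a: Set[str], sites_b: Set[str]) -> float:
    """Share of sites in sites_a that are also in sites_b."""
    return len(sites_a & sites_b) / len(sites_a) if sites_a else 0.0
```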
RNA editing is a co- or post-transcriptional RNA processing reaction that changes the nucleotide sequence of the RNA substrate. In flowering plants, mRNA editing is confined to organelle transcripts, altering cytidine to uridine. Recently, some members of a small Arabidopsis gene family were found to be important for editing of chloroplast and mitochondrial transcripts. Several methods have been developed to measure the amount of edited transcripts at specific Cs, but most of these methods either lack sensitivity or are unable to determine the number and location of edited Cs in a particular transcript. While sensitive assays have been previously developed, they are costly and labor-intensive, which precludes their use on a large scale. To define the role of an entire gene family in RNA editing, we have successfully adapted RNA sequencing technology to measure the effect of mutation and silencing of family members on organelle RNA editing. Our method to measure editing extent is sensitive, reliable, and cost-effective. As well as detecting additional family members that play a role in RNA editing, we have detected numerous new editing sites. Our strategy should benefit the investigation of RNA editing in any organism.
In plant organelles, specific messenger RNAs (mRNAs) are subjected to conversion editing, a process that often converts the first or second nucleotide of a codon and hence the encoded amino acid. No systematic patterns in converted sites were found on mRNAs, and the converted sites rarely encoded residues located at the active sites of proteins. The role and origin of RNA editing in plant organelles remain to be elucidated.
Here we study the relationship between amino acid residues encoded by edited codons and the structural characteristics of these residues within proteins, e.g., in protein-protein interfaces, elements of secondary structure, or protein structural cores. We find that the residues encoded by edited codons are significantly biased toward involvement in helices and protein structural cores. RNA editing can convert codons for hydrophilic to hydrophobic amino acids. Hence, only the edited form of an mRNA can be translated into a polypeptide with helix-preferring and core-forming residues at the appropriate positions, which is often required for a protein to form a functional three-dimensional (3D) structure.
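The genetic code alone determines which C-to-U edits can turn a hydrophilic residue into a hydrophobic one. The Python sketch below enumerates every single C-to-U change in the standard genetic code, which plant organelles use, and keeps the changes that move a residue from a negative to a positive Kyte-Doolittle hydropathy value. The hydropathy scale and the sign cutoff are assumptions chosen for illustration. They are not the classification used in the study.

```python
# Standard genetic code, built from the conventional UCAG ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def c_to_u_variants(codon):
    """Yield (1-based codon position, edited codon) for each single C-to-U change."""
    for pos, base in enumerate(codon):
        if base == "C":
            yield pos + 1, codon[:pos] + "U" + codon[pos + 1:]


def hydrophilic_to_hydrophobic():
    """List edits that turn a hydrophilic residue into a hydrophobic one."""
    changes = []
    for codon, aa in sorted(CODON_TABLE.items()):
        for pos, edited in c_to_u_variants(codon):
            new_aa = CODON_TABLE[edited]
            if "*" in (aa, new_aa):
                continue  # skip edits involving stop codons
            if KD[aa] < 0 < KD[new_aa]:
                changes.append((codon, pos, aa, edited, new_aa))
    return changes


for change in hydrophilic_to_hydrophobic():
    print(change)
```

Every hit lies at the first or second codon position. In the standard code, a C-to-U change at the third position is always synonymous, because codons ending in C and codons ending in U encode the same amino acid.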
We have performed a novel analysis of the location of residues affected by RNA editing in proteins in plant organelles. This study documents that RNA editing sites are often found in positions important for 3D structure formation. Without RNA editing, these proteins would not fold properly, thus affecting gene expression. We suggest that RNA editing may have conferred an evolutionary advantage by acting as a mechanism to reduce susceptibility to DNA damage: it allows the GC content of DNA to increase while maintaining the RNA codons needed to encode residues required for protein folding and activity.
In plants, RNA editing is a process that converts specific cytidines to uridines and uridines to cytidines in transcripts from virtually all mitochondrial protein-coding genes. There are thousands of plant mitochondrial genes in the sequence databases, but sites of RNA editing have not been determined for most. Accurate methods of RNA editing site prediction will be important in filling in this information gap and could reduce or even eliminate the need for experimental determination of editing sites for many sequences. Because RNA editing tends to increase protein conservation across species by "correcting" codons that specify unconserved amino acids, this principle can be used to predict editing sites by identifying positions where an RNA editing event would increase the conservation of a protein to homologues from other plants. PREP-Mt takes this approach to predict editing sites for any protein-coding gene in plant mitochondria.
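A deliberately simplified Python sketch can illustrate the conservation principle behind PREP-Mt. For each C in a codon, it predicts an edit if the C-to-U change makes the encoded amino acid match more homologous proteins than the genomic codon does. The sketch assumes that the homologs are already aligned residue-by-residue to the query without gaps, and it uses a plain count comparison. It does not reproduce the real program's alignment handling or scoring cutoffs, and the function name is hypothetical.

```python
from typing import List

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def predict_edits(cds: str, homologs: List[str]) -> List[int]:
    """Return 0-based positions of Cs whose C-to-U edit raises conservation.

    Assumes each homolog is a protein sequence aligned position-by-position
    (no gaps) to the translation of cds.
    """
    cds = cds.upper().replace("T", "U")
    predicted = []
    for start in range(0, len(cds) - 2, 3):
        codon = cds[start:start + 3]
        if codon not in CODON_TABLE:
            continue  # ambiguous bases such as N
        column = [h[start // 3] for h in homologs if start // 3 < len(h)]
        support_now = column.count(CODON_TABLE[codon])
        for offset, base in enumerate(codon):
            if base != "C":
                continue
            edited = codon[:offset] + "U" + codon[offset + 1:]
            new_aa = CODON_TABLE[edited]
            # Simplification: this sketch never predicts stop-creating edits.
            if new_aa != "*" and column.count(new_aa) > support_now:
                predicted.append(start + offset)
    return predicted


# Genomic UCA (Ser) where most homologs have Leu: predict the C at position 4.
print(predict_edits("ATGTCATGG", ["MLW", "MLW", "MSW"]))  # [4]
```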
To test the general applicability of the PREP-Mt methodology, RNA editing sites were predicted for 370 full-length or nearly full-length DNA sequences and then compared to the known sites of RNA editing for these sequences. Of 60,263 cytidines in this test set, PREP-Mt correctly classified 58,994 as either an edited or unedited site (accuracy = 97.9%). PREP-Mt properly identified 3,038 of the 3,698 known sites of RNA editing (sensitivity = 82.2%) and 55,956 of the 56,565 known unedited sites (specificity = 98.9%). Accuracy and sensitivity increased to 98.7% and 94.7%, respectively, after excluding the 489 silent editing sites (which have no effect on protein sequence or function) from the test set.
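The PREP-Mt figures follow from a standard confusion-matrix calculation, and the Python sketch below recomputes them from the counts given above. It also shows that the post-exclusion values are reproduced if all 489 silent sites are counted among the 660 missed edited sites. That placement is an inference from the arithmetic, not a statement in the text.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # edited sites correctly predicted
        "specificity": tn / (tn + fp),   # unedited sites correctly predicted
    }


TP = 3038                 # known edited sites predicted as edited
FN = 3698 - TP            # 660 known edited sites missed
TN = 55956                # known unedited sites predicted as unedited
FP = 56565 - TN           # 609 unedited sites predicted as edited

full = classification_metrics(TP, FN, TN, FP)
print({k: round(v, 3) for k, v in full.items()})
# {'accuracy': 0.979, 'sensitivity': 0.822, 'specificity': 0.989}

# Dropping the 489 silent sites, assuming all were among the missed sites.
no_silent = classification_metrics(TP, FN - 489, TN, FP)
print({k: round(v, 3) for k, v in no_silent.items()})
# {'accuracy': 0.987, 'sensitivity': 0.947, 'specificity': 0.989}
```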
These results indicate that PREP-Mt is effective at identifying C to U RNA editing sites in plant mitochondrial protein-coding genes. Thus, PREP-Mt should be useful in predicting protein sequences for use in molecular, biochemical, and phylogenetic analyses. In addition, PREP-Mt could be used to determine functionality of a mitochondrial gene or to identify particular sequences with unusual editing properties. The PREP-Mt methodology should be applicable to any system where RNA editing increases protein conservation across species.
Three nonsense codons and an unusual initiation codon were located within the putative coding region of the atpB gene of chloroplast DNA of the hornwort Anthoceros formosae. Nucleotide sequencing of cDNA prepared from transcripts revealed extensive RNA editing. The unusual initiation codon ACG was changed to AUG and three nonsense codons were converted into sense codons. In total 15 C residues of the genomic DNA were replaced by U residues in the mRNA sequences, while 14 U residues were replaced by C residues. This is the highest number of editing events for a chloroplast mRNA reported so far. Partial editing was also shown in a cDNA clone where 23 sites were edited but six sites remained unedited, indicating the presence of incompletely edited (immature) mRNA. The predicted secondary structure of the mRNA shows that every editing site has a complementary sequence that can form continuous base pairing longer than 5 bp, suggesting that mispairing in the double strand is the site determinant for RNA editing in Anthoceros chloroplasts. Comparison of the cDNA sequence with other chloroplast genes suggests that the mechanism arose in the first land plants and has been reduced during evolution.
RNA editing describes the process in which individual or short stretches of nucleotides in a messenger or structural RNA are inserted, deleted, or substituted. A high level of RNA editing has been observed in the mitochondrial genome of Physarum polycephalum. The most frequent editing type in Physarum is the insertion of individual Cs. RNA editing is extremely accurate in Physarum; however, little is known about its mechanism. Here, we demonstrate how analyzing two organisms from the Myxomycetes, namely Physarum polycephalum and Didymium iridis, allows us to test hypotheses about the editing mechanism that cannot be tested from a single organism alone. First, we show that using the recently determined full transcriptome information of Physarum dramatically improves the accuracy of computational editing site prediction in Didymium. We use this approach to predict genes in the mitochondrial genome of Didymium and identify six new edited genes as well as one new gene that appears unedited. Next, we investigate sequence conservation in the vicinity of editing sites between the two organisms in order to identify, through increased conservation, the sites that harbor the information for locating editing sites. Our results imply that the information contained within only nine or ten nucleotides on either side of the editing site (a distance previously suggested through experiments) is not enough to locate the editing sites. Finally, we show that the codon position bias in C insertional RNA editing of these two organisms is correlated with the selection pressure on the respective genes, thereby directly testing an evolutionary theory on the origin of this codon bias. Beyond revealing interesting properties of insertional RNA editing in Myxomycetes, our work suggests possible approaches to use when searches for sequence motifs underlying a biological process fail.
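A per-offset identity profile is one way to measure increased conservation near editing sites. The Python sketch below assumes a gap-aware pairwise alignment of the two mitochondrial sequences, for example Physarum and Didymium, with editing sites given as alignment column indices. For insertional editing, anchoring each site at the column just 5' of the insertion is an assumption, and the function name is hypothetical.

```python
from typing import Dict, List


def conservation_profile(seq_a: str, seq_b: str, sites: List[int],
                         flank: int = 10) -> Dict[int, float]:
    """Fraction of identical, ungapped columns at each offset around the sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    profile = {}
    for offset in range(-flank, flank + 1):
        matches = compared = 0
        for site in sites:
            col = site + offset
            if 0 <= col < len(seq_a) and "-" not in (seq_a[col], seq_b[col]):
                compared += 1
                matches += seq_a[col] == seq_b[col]
        profile[offset] = matches / compared if compared else float("nan")
    return profile
```

Offsets whose identity stays well above the alignment-wide background are candidates for carrying site-recognition information. Comparing profiles for increasing `flank` values is one way to ask whether nine or ten nucleotides are enough.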
RNA is an important biomolecule that is deeply involved in all aspects of molecular biology, such as protein production, gene regulation, and viral replication. However, many significant aspects such as the mechanism of RNA editing are not well understood. RNA editing is the process in which an organism's RNA is modified through the insertion, deletion, or substitution of single or short stretches of nucleotides. The slime mold Physarum polycephalum is a model organism for the study of RNA editing; however, hardly anything is known about its editing machinery. We show that the combination of two organisms (Physarum polycephalum and Didymium iridis) can provide a better understanding of insertional RNA editing than one organism alone. We predict several new edited genes in Didymium. By comparing the sequences of the two organisms in the vicinity of the editing sites we establish minimal requirements for the location of the information by which these editing sites are recognized. Lastly, we directly verify a theory for one of the most striking features of the editing sites, namely their codon bias.
In plant mitochondria, the post-transcriptional RNA editing process converts C to U at a number of specific sites of the mRNA sequence and usually restores phylogenetically conserved codons and the encoded amino acid residues. Sites undergoing RNA editing evolve at a higher rate than sites not modified by the process. As a result, editing sites strongly affect the evolution of plant mitochondrial genomes, representing an important source of sequence variability and potentially informative characters.
To date, no clear and convincing evidence has established whether editing sites really affect the topology of reconstructed phylogenetic trees. We therefore investigated the effect of RNA editing on phylogenetic tree reconstruction using twenty different plant mitochondrial gene sequences and computer simulations.
Based on our simulation study, we suggest that the editing ‘noise’ in tree topology inference is mainly manifested at the cDNA level. In particular, editing sites tend to confuse tree topologies when the simulated genomic and cDNA sequences are shorter than 500 bp and have an editing percentage higher than 5.0%. Similar results were also obtained with genuine plant mitochondrial genes: in this case, topological incongruence increases as the editing percentage rises from about 3.0 to 14.0%. However, when the average gene length exceeds 1,000 bp (rps3, matR and atp1), no differences between the inferred genomic and cDNA topologies could be detected.
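Simulations of this kind derive an artificial cDNA from a genomic sequence by converting a chosen share of Cs to Ts. The minimal Python sketch below performs only that step and reports the uncorrected p-distance that the conversion introduces. Several choices here are assumptions: the editing percentage is defined as edited sites per 100 nucleotides, the sequences are uniform random, and the function names are hypothetical. The sketch does not cover the subsequent tree building and topology comparison.

```python
import random


def random_sequence(length: int, rng: random.Random) -> str:
    """Uniform random genomic sequence (a simplifying assumption)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def simulate_cdna(genomic: str, editing_percentage: float,
                  rng: random.Random) -> str:
    """Turn randomly chosen Cs into Ts at the requested editing percentage.

    Editing percentage is taken here as edited sites per 100 nucleotides.
    """
    c_positions = [i for i, base in enumerate(genomic) if base == "C"]
    n_edits = round(len(genomic) * editing_percentage / 100)
    if n_edits > len(c_positions):
        raise ValueError("not enough Cs for the requested editing percentage")
    edited = set(rng.sample(c_positions, n_edits))
    return "".join("T" if i in edited else b for i, b in enumerate(genomic))


def p_distance(a: str, b: str) -> float:
    """Proportion of differing positions between equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


rng = random.Random(42)
genomic = random_sequence(400, rng)
cdna = simulate_cdna(genomic, 5.0, rng)
print(p_distance(genomic, cdna))  # 0.05: 20 edits in 400 bp
```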
Our in silico simulations and our analyses of genuine plant mitochondrial genes strongly suggest that editing sites contribute to the generation of misleading phylogenetic trees when the analyzed mitochondrial gene sequence is highly edited (more than 3.0%) and short (less than 500 bp).
In the absence of direct experimental evidence, the results presented here therefore favor the use of genomic rather than cDNA mitochondrial sequences for reconstructing phylogenetic events in land plants.
Ebolavirus (EBOV), the causative agent of a severe hemorrhagic fever and a biosafety level 4 pathogen, increases its genome coding capacity by producing multiple transcripts encoding structural and nonstructural glycoproteins from a single gene. This is achieved through RNA editing, during which non-templated adenosine residues are incorporated into the EBOV mRNAs at an editing site that encodes 7 adenosine residues. However, the mechanism of EBOV RNA editing is currently not understood. In this study, we report for the first time that minigenomes containing the glycoprotein gene editing site can undergo RNA editing, thereby eliminating the requirement for a biosafety level 4 laboratory to study EBOV RNA editing. Using a newly developed dual-reporter minigenome, we have characterized the mechanism of EBOV RNA editing, and have identified cis-acting sequences that are required for editing, located between 9 nt upstream and 9 nt downstream of the editing site. Moreover, we show that a secondary structure in the upstream cis-acting sequence plays an important role in RNA editing. EBOV RNA editing is glycoprotein gene-specific, as a stretch encoding 7 adenosine residues located in the viral polymerase gene did not serve as an editing site, most likely due to an absence of the necessary cis-acting sequences. Finally, the EBOV protein VP30 was identified as a trans-acting factor for RNA editing, constituting a novel function for this protein. Overall, our results provide novel insights into the RNA editing mechanism of EBOV, further understanding of which might result in novel intervention strategies against this viral pathogen.
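Scanning for candidate sites comes down to finding runs of seven As and extracting the flanking windows where the cis-acting signals were mapped. The Python sketch below does this on mRNA-sense sequence. Measuring the 9-nt windows from the ends of the run is an assumption, and the function names are hypothetical. The polymerase-gene result shows that an A7 run alone does not predict editing, so the hits are only candidates for further testing.

```python
import re
from typing import List, Tuple


def find_a_runs(mrna: str, run_length: int = 7,
                flank: int = 9) -> List[Tuple[int, str, str, str]]:
    """Return (start, upstream, run, downstream) for each run of exactly run_length As."""
    hits = []
    for match in re.finditer(r"A+", mrna):
        if match.end() - match.start() == run_length:
            upstream = mrna[max(0, match.start() - flank):match.start()]
            downstream = mrna[match.end():match.end() + flank]
            hits.append((match.start(), upstream, match.group(), downstream))
    return hits


def insert_extra_a(mrna: str, run_start: int) -> str:
    """Model an edited mRNA: one non-templated A added to the run (+1 frameshift downstream)."""
    return mrna[:run_start] + "A" + mrna[run_start:]


mrna = "GGCUUAAAAAAAGGCC"  # made-up toy sequence, not EBOV
for start, up, run, down in find_a_runs(mrna):
    print(start, up, run, down)            # 5 GGCUU AAAAAAA GGCC
    print(insert_extra_a(mrna, start))     # GGCUUAAAAAAAAGGCC
```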
Ebola virus (EBOV) causes severe hemorrhagic fever with case fatality rates of up to 90% and no therapy or vaccine currently available. A better understanding of the EBOV life cycle is important to develop new countermeasures against this virus; however, research with live EBOV is restricted to high containment laboratories. One unique feature of the EBOV life cycle is that its surface glycoprotein is expressed only after editing of the glycoprotein mRNA by the viral polymerase, leading to an insertion of a non-templated nucleotide into the mRNA. While this phenomenon has been long known, the mechanism of mRNA editing for EBOV is not understood. We have developed a unique minigenome system that allows the study of EBOV mRNA editing outside of a high containment laboratory. Using this system we have characterized EBOV mRNA editing and defined the sequence requirements for this process. We show that signals both upstream and downstream of the editing site are important, and that a secondary structure in the RNA upstream of the editing site as well as the viral protein VP30 contribute to editing. These findings provide new detailed molecular information about an essential process in the EBOV life cycle, which might serve as a novel target for antivirals.
RNA editing in land plant organelles is a process primarily involving the conversion of cytidine to uridine in pre-mRNAs. The process is required for gene expression in plant organelles, because this conversion alters the encoded amino acid residues and improves the sequence identity to homologous proteins. A recent study uncovered that proteins encoded in the nuclear genome are essential for editing site recognition in chloroplasts; the mechanisms by which this recognition occurs remain unclear. To understand these mechanisms, we determined the genomic and cDNA sequences of moss Takakia lepidozioides chloroplast genes, then computationally analyzed the sequences within −30 to +10 nucleotides of RNA editing sites (neighbor sequences) likely to be recognized by trans-factors. As the T. lepidozioides chloroplast has many RNA editing sites, the analysis of these sequences provides a unique opportunity to perform statistical analyses of chloroplast RNA editing sites. We divided the 302 obtained neighbor sequences into eight groups based on sequence similarity to identify group-specific patterns. The patterns were then applied to predict novel RNA editing sites in T. lepidozioides transcripts; ∼60% of these predicted sites are true editing sites. The success of this prediction algorithm suggests that the obtained patterns are indicative of key sites recognized by trans-factors around editing sites of T. lepidozioides chloroplast genes.
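The neighbor sequences are fixed windows around each site. A position-by-position nucleotide frequency table gives a first look at the patterns they share. The Python sketch below extracts the −30 to +10 window, which is 41 nt long with the edited C at index 30, and tabulates the frequencies. It does not reproduce the grouping into eight similarity classes or the singlet and doublet propensity analysis, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def neighbor_sequence(transcript: str, site: int, upstream: int = 30,
                      downstream: int = 10) -> Optional[str]:
    """Window from -upstream to +downstream around a 0-based editing site."""
    start, end = site - upstream, site + downstream + 1
    if start < 0 or end > len(transcript):
        return None  # window would run off the transcript
    return transcript[start:end].upper().replace("T", "U")


def position_frequencies(neighbors: List[str]) -> List[Dict[str, float]]:
    """Nucleotide frequency at each window position across equal-length windows."""
    profile = []
    for i in range(len(neighbors[0])):
        counts = Counter(seq[i] for seq in neighbors)
        profile.append({b: counts[b] / len(neighbors) for b in "ACGU"})
    return profile
```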
bioinformatics; chloroplast; computational biology; plant organelle; singlet and doublet propensities; Takakia lepidozioides
RNA editing is a post-transcriptional process that, in seed plants, involves a cytosine to uracil change in messenger RNA, causing the translated protein to differ from that predicted by the DNA sequence. RNA editing occurs extensively in plant mitochondria, but large differences in editing frequencies are found in some groups. The underlying processes responsible for the distribution of edited sites are largely unknown, but gene function, substitution rate, and gene conversion have been proposed to influence editing frequencies.
We studied five mitochondrial genes in the monocot order Alismatales, all showing marked differences in editing frequencies among taxa. A general tendency to lose edited sites was observed in all taxa, but this tendency was particularly strong in two clades, with most of the edited sites lost in parallel in two different areas of the phylogeny. This pattern was observed in at least four of the five genes analyzed. Except in the groups that show an unusually low editing frequency, the rate of C-to-T changes in edited sites was not significantly higher than in non-edited 3rd codon positions. This may indicate that selection is not actively removing edited sites in nine of the 12 families of the core Alismatales. In all genes but ccmB, a significant correlation was found between frequency of change in edited sites and synonymous substitution rate. In general, taxa with higher substitution rates tend to have fewer edited sites, as indicated by the phylogenetically independent correlation analyses. The elimination of edited sites in groups that lack or have reduced levels of editing could be a result of gene conversion involving a cDNA copy (retroprocessing). If so, this phenomenon could be relatively common in the Alismatales, and may have affected some groups recurrently. Indirect evidence of retroprocessing without a necessary correlation with substitution rate was found mostly in families Alismataceae and Hydrocharitaceae (i.e., groups that suffered a rapid elimination of all their edited sites, without a change in substitution rate).
The effects of substitution rate, selection, and/or gene conversion on the dynamics of edited sites in plant mitochondria remain poorly understood. Although we found an inverse correlation between substitution rate and editing frequency, this correlation is partially obscured by gene retroprocessing in lineages that have lost most of their edited sites. The presence of processed paralogs in plant mitochondria deserves further study, since most evidence of their occurrence is circumstantial.
The C->U editing of RNA is widely found in plant and animal species. In mammals it is a discrete process confined to the editing of apolipoprotein B (apoB) mRNA in eutherians and the editing of the mitochondrial tRNA for glycine in marsupials. Here we have identified and characterised apoB mRNA editing in the American opossum Monodelphis domestica. The apoB mRNA editing site is highly conserved in the opossum and undergoes complete editing in the small intestine, but not in the liver or other tissues. Opossum APOBEC-1 cDNA was cloned, sequenced and expressed. The encoded protein is similar to APOBEC-1 of eutherians. Motifs previously identified as involved in zinc binding, RNA binding and catalysis, nuclear localisation and a C-terminal leucine-rich domain are all conserved. Opossum APOBEC-1 contains a seven amino acid C-terminal extension also found in humans and rabbits, but not present in rodents. The opossum APOBEC-1 gene has the same intron/exon organisation in the coding sequence as the eutherian gene. Northern blot and RT-PCR analyses and an editing assay indicate that no APOBEC-1 was expressed in the liver. Thus the far upstream promoter responsible for hepatic expression in rodents does not operate in the opossum. An APOBEC-1-like enzyme such as might be involved in C->U RNA editing of tRNA in marsupial mitochondria was not demonstrated. The activity of opossum APOBEC-1 in the presence of both chicken and rodent auxiliary editing proteins was comparable to that of other mammals. These studies extend the origins of APOBEC-1 back 170 000 000 years to marsupials and help bridge the gap in the origins of this RNA editing process between birds and eutherian mammals.
Adenosine-to-inosine modification of RNA molecules (A-to-I RNA editing) is an important mechanism that increases transcriptome diversity. It occurs when a genomically encoded adenosine (A) is converted to an inosine (I) by ADAR proteins. Sequencing reactions read inosine as guanosine (G); therefore, current methods to detect A-to-I editing sites align RNA sequences to their corresponding DNA regions and identify A-to-G mismatches. However, such methods perform poorly on RNAs that underwent extensive editing (“ultra”-editing), as the large number of mismatches obscures the genomic origin of these RNAs. Therefore, only a few anecdotal ultra-edited RNAs have been discovered so far. Here we introduce and apply a novel computational method to identify ultra-edited RNAs. We detected 760 ESTs containing 15,646 editing sites (more than 20 sites per EST, on average), of which 13,668 are novel. Ultra-edited RNAs exhibit the known sequence motif of ADARs and tend to localize in sense strand Alu elements. Compared to sites of mild editing, ultra-editing occurs primarily in Alu-rich regions, where potential base pairing with neighboring, inverted Alus creates particularly long double-stranded RNA structures. Ultra-editing sites are underrepresented in old Alu subfamilies, tend to be non-conserved, and avoid exons, suggesting that ultra-editing is usually deleterious. A possible biological function of ultra-editing could be mediated by non-canonical splicing and cleavage of the RNA near the editing sites.
The traditional view of mRNA as a pure intermediate between DNA and protein has changed over the last decades since the discovery of numerous RNA processing pathways. A frequent RNA modification is A-to-I editing, or the conversion of adenosine (A) to inosine (I). Since inosine is read as a guanosine (G), A-to-I editing leads to changes in the RNA sequence that can alter the function of its encoded protein. In recent years, tens of thousands of human A-to-I editing sites were discovered by computationally comparing RNA sequences to the human genome and searching for A-to-G mismatches. However, previous screens usually ignored RNA sequences that were edited to an extreme degree, because the large number of A-to-G mismatches carried by these RNAs obscured their genomic origin. We developed a new computational framework to detect extreme A-to-I editing, or ultra-editing, based on masking potential editing sites before the alignment to the genome. Our method detected about 14,000 editing sites, with each edited molecule carrying, on average, more than 20 edited nucleotides. We demonstrated that the likely reason for the ultra-editing of those sequences is their potential to fold back into a particularly long double-stranded structure, which is the preferred target of the editing enzymes.
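The masking idea can be sketched in a few lines of Python. Every A is converted to G in both the read and the genome, the masked read is located in the masked genome, and the genomic As that appear as G in the read are reported as edits. This simplified version assumes exact matching on one strand. It also rejects placements that would imply G-to-A changes, which masking hides as well. It does not reproduce the published framework's aligner, strand handling or filters, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def mask(seq: str) -> str:
    """Hide potential A-to-I(G) sites by writing every A as G."""
    return seq.upper().replace("A", "G")


def locate_ultra_edited(read: str, reference: str) -> Optional[Tuple[int, List[int]]]:
    """Return (start, edited reference positions) for the first valid placement."""
    read, reference = read.upper(), reference.upper()
    masked_read, masked_ref = mask(read), mask(reference)
    start = masked_ref.find(masked_read)
    while start != -1:
        window = reference[start:start + len(read)]
        pairs = list(zip(read, window))
        # Masking also hides G-to-A differences; such placements are not A-to-I.
        if not any(g == "G" and r == "A" for r, g in pairs):
            sites = [start + i for i, (r, g) in enumerate(pairs)
                     if g == "A" and r == "G"]
            return start, sites
        start = masked_ref.find(masked_read, start + 1)
    return None


print(locate_ultra_edited("GCGAGTG", "TTTACAAGTACCC"))  # (3, [3, 5, 9])
```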
In bean, potato, and Oenothera plants, the C encoded at position 4 (C4) in the mitochondrial tRNA Phe GAA gene is converted into a U in the mature tRNA. This nucleotide change corrects a mismatched C4-A69 base pair which appears when the gene sequence is folded into the cloverleaf structure. C-to-U conversions constitute the most common editing events occurring in plant mitochondrial mRNAs. While most of these conversions introduce changes in the amino acids specified by the mRNA and appear to be essential for the synthesis of functional proteins in plant mitochondria, the putative role of mitochondrial tRNA editing has not yet been defined. Since the edited form of the tRNA, unlike the nonedited form, has the correct secondary and tertiary structures, the two main processes which might be affected by a nucleotide conversion are aminoacylation and maturation. To test these possibilities, we determined the aminoacylation properties of unedited and edited potato mitochondrial tRNAPhe in vitro transcripts, as well as the processing efficiency of in vitro-synthesized potato mitochondrial tRNAPhe precursors. Reverse transcription-PCR amplification of natural precursors followed by cDNA sequencing was also used to investigate the influence of editing on processing. Our results show that C-to-U conversion at position 4 in the potato mitochondrial tRNA Phe GAA is not required for aminoacylation with phenylalanine but is likely to be essential for efficient processing of this tRNA.
Uridine insertion/deletion RNA editing is a post-transcriptional RNA modification occurring in the mitochondria of kinetoplastid protozoa. The U-insertion/deletion Edited Sequence Database is a compilation of mitochondrial genes and edited mRNAs from five kinetoplastid species. It contains separate files with the DNA, mRNA (both unedited and edited) and predicted protein sequences, as well as alignments of the Leishmania tarentolae and Trypanosoma brucei protein sequences from edited and unedited genes. The sequence files are in GCG format. A 'map' sequence file showing the location of U-deletions, U-insertions and the translated amino acid sequences is also provided for each gene. Genomic maps for each species are also provided with clickable genes, including maxicircle-encoded gRNAs. Sets of aligned nuclear rRNA sequences from kinetoplastid protozoa are also provided, which were used for phylogenetic reconstructions in an analysis of the origin of RNA editing. The database is available through the World Wide Web as an HTML document at the URL http://www.lifesci.ucla.edu/RNA/trypanosome/database.html
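U-insertion/deletion editing changes only the number of Us, so the non-U nucleotides of the unedited and edited mRNAs are identical and can anchor the comparison. The Python sketch below uses that property to produce a minimal map of U insertions (positive) and deletions (negative) for each gap between consecutive non-U nucleotides. It only illustrates the idea and does not follow the database's own map format. The function name is hypothetical.

```python
import re
from typing import List, Tuple


def u_edit_map(unedited: str, edited: str) -> List[Tuple[int, int]]:
    """Return (gap index, change in U count) for every gap that was edited.

    Gap k lies 5' of the k-th non-U nucleotide (0-based); the last gap is 3'
    of the final non-U nucleotide.
    """
    unedited = unedited.upper().replace("T", "U")
    edited = edited.upper().replace("T", "U")
    if unedited.replace("U", "") != edited.replace("U", ""):
        raise ValueError("sequences differ at non-U positions")
    runs_before = [len(run) for run in re.split(r"[^U]", unedited)]
    runs_after = [len(run) for run in re.split(r"[^U]", edited)]
    return [(k, after - before)
            for k, (before, after) in enumerate(zip(runs_before, runs_after))
            if after != before]


# One U inserted between A and G, two Us deleted between G and C.
print(u_edit_map("AGUUC", "AUGC"))  # [(1, 1), (2, -2)]
```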
RNA editing revises genetically encoded information at precise locations, creating single base changes in mRNA. These changes can result in altered coding potential and modifications to protein function. Sequence analysis of the Shab potassium channel of Drosophila melanogaster revealed five such RNA editing sites. Four are constitutively edited (I583V, T643A, Y660C and I681V) and one undergoes developmentally regulated editing (T671A). These sites are located in the S4, S5–S6 loop and the S6 segments of the channel. We examined the biophysical consequences of editing at these sites by creating point mutations, each containing the genomic (unedited) base at one of the five sites in the background of a channel in which all other sites are edited. We also created a completely unedited construct. The function of these constructs was characterized using two-microelectrode voltage clamp in Xenopus oocytes. Each individual ‘unediting’ mutation slowed the time course of deactivation and the rise time during channel activation. Two of the mutants exhibited significant hyperpolarized shifts in their midpoints of activation. Constructs that deactivated slowly also inactivated slowly, supporting a mechanism of closed-state inactivation. One of the editing sites, position 660, aligns with the Shaker 449 residue, which is known to be important in tetraethylammonium (TEA) block. The aromatic, genomically encoded residue tyrosine at this position in Shab enhances TEA block 14-fold compared to the edited residue, cysteine. These results show that both the position of the RNA editing site and the identity of the substituted amino acid are important for channel function.
RNA editing; activation; inactivation; pore block; voltage-gated potassium channel; Shab
The C↔U substitution types of RNA editing have been observed frequently in organellar genomes of land plants. Although various attempts have been made to explain why such a seemingly inefficient genetic mechanism would have evolved, no satisfactory explanation exists in our view. In this study, we examined editing patterns in chloroplast genomes of the hornwort Anthoceros formosae and the fern Adiantum capillus-veneris and in mitochondrial genomes of the angiosperms Arabidopsis thaliana, Beta vulgaris and Oryza sativa, to gain an understanding of the question of how RNA editing originated.
We found that 1) most editing sites were distributed at the 2nd and 1st codon positions, 2) editing affected codons whose conversion causes large changes in hydrophobicity and molecular size much more frequently than codons whose conversion causes little change, 3) editing uniformly increased protein hydrophobicity, 4) editing occurred more frequently in ancestrally T-rich sequences, which were more abundant in genes encoding membrane-bound proteins with many hydrophobic amino acids than in genes encoding soluble proteins, and 5) editing occurred most often in genes found to be under strong selective constraint.
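Two of these measurements are easy to compute for a single gene: the codon-position distribution of editing sites and the change in overall protein hydrophobicity. The Python sketch below assumes 0-based site positions within a coding sequence that starts at its first codon. It measures hydrophobicity with the Kyte-Doolittle grand average of hydropathy (GRAVY). That choice of scale is an assumption and may differ from the one used in the study.

```python
from collections import Counter
from typing import Iterable

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def codon_position_counts(sites: Iterable[int]) -> Counter:
    """Count editing sites at codon positions 1, 2 and 3."""
    return Counter(site % 3 + 1 for site in sites)


def apply_edits(cds: str, sites: Iterable[int]) -> str:
    """Apply C-to-U edits at the given 0-based positions."""
    bases = list(cds.upper().replace("T", "U"))
    for site in sites:
        if bases[site] != "C":
            raise ValueError(f"position {site} is not a C")
        bases[site] = "U"
    return "".join(bases)


def gravy(cds: str) -> float:
    """Mean Kyte-Doolittle value of the translated sense codons."""
    cds = cds.upper().replace("T", "U")
    residues = [CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3)]
    residues = [aa for aa in residues if aa != "*"]
    return sum(KD[aa] for aa in residues) / len(residues)


cds, sites = "AUGUCACCA", [4, 7]           # Met-Ser-Pro, two 2nd-position Cs
print(codon_position_counts(sites))       # Counter({2: 2})
print(gravy(cds), gravy(apply_edits(cds, sites)))
```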
These analyses show that editing mostly affects functionally important and evolutionarily conserved codon positions, codons and genes encoding membrane-bound proteins. In particular, the abundance of RNA editing in plant organellar genomes may be associated with the disproportionately large percentages of genes in these two genomes that encode membrane-bound proteins, which are rich in hydrophobic amino acids and selectively constrained. These data support a hypothesis that natural selection imposed by protein functional constraints has contributed to selective fixation of certain editing sites and maintenance of the editing activity in plant organelles over a period of more than four hundred million years. The retention of genes encoding RNA editing activity may be driven by forces that shape the nucleotide composition equilibrium in the two organellar genomes of these plants. Nevertheless, the causes of lineage-specific occurrence of a large portion of RNA editing sites remain to be determined.
This article was reviewed by Michael Gray (nominated by Laurence Hurst), Kirsten Krause (nominated by Martin Lercher), and Jeffery Mower (nominated by David Ardell).
RNA editing is a transcript-based layer of gene regulation. To date, no systematic study on RNA editing of plant nuclear genes has been reported. Here, a transcriptome-wide search for editing sites in nuclear transcripts of Arabidopsis (Arabidopsis thaliana) was performed.
MPSS (massively parallel signature sequencing) and PARE (parallel analysis of RNA ends) data retrieved from public databases were utilized, focusing on single-base conversion editing. Besides cytidine (C)-to-uridine (U) editing in mitochondrial transcripts, many nuclear transcripts were found to be diversely edited. Interestingly, a sizable portion of these nuclear genes are involved in chloroplast- or mitochondrion-related functions, and many editing events are tissue-specific. Some editing sites, such as adenosine (A)-to-U editing loci, were found to be surrounded by distinctive sequence elements. The editing events of some nuclear transcripts are highly enriched surrounding the borders between coding sequences (CDSs) and 3′ untranslated regions (UTRs), suggesting site-specific editing. Furthermore, RNA editing is potentially implicated in new start or stop codon generation, and may affect alternative splicing of certain protein-coding transcripts. RNA editing in the precursor microRNAs (pre-miRNAs) of the ath-miR854 family, resulting in secondary structure transformation, implies its potential role in microRNA (miRNA) maturation.
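Detecting single-base conversions from short signature tags amounts to finding tags that have no exact genomic match but align with exactly one mismatch. The brute-force Python sketch below illustrates this on one strand of a DNA reference. It ignores the reverse strand, sequencing-error filtering and tag abundance, all of which a real MPSS or PARE analysis would need. The function name is hypothetical.

```python
from typing import List, Tuple


def one_mismatch_hits(tag: str, genome: str) -> List[Tuple[int, str]]:
    """Genome positions where tag aligns with exactly one mismatch.

    Returns (genomic position of the mismatch, "X-to-Y" conversion), where X is
    the genomic base and Y the base seen in the tag; T stands for U.
    """
    tag, genome = tag.upper(), genome.upper()
    if tag in genome:
        return []  # an exact match needs no editing to explain it
    hits = []
    for start in range(len(genome) - len(tag) + 1):
        window = genome[start:start + len(tag)]
        mismatches = [(i, g, t) for i, (g, t) in enumerate(zip(window, tag))
                      if g != t]
        if len(mismatches) == 1:
            i, g, t = mismatches[0]
            hits.append((start + i, f"{g}-to-{t}"))
    return hits


print(one_mismatch_hits("ATTCA", "GGATCCAAGT"))  # [(4, 'C-to-T')]
```

A tag with a single hit is the least ambiguous candidate. Tags with several single-mismatch hits would normally be set aside.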
To our knowledge, the results provide the first global view of RNA editing in plant nuclear transcripts.
RNAs transcribed from the mitochondrial genome of Physarum polycephalum are heavily edited. The most prevalent editing event is the insertion of single Cs, with Us and dinucleotides also added at specific sites. The existence of insertional editing makes gene identification difficult and localization of editing sites has relied upon characterization of individual cDNAs. We have now determined the complete mitochondrial transcriptome of Physarum using Illumina deep sequencing of purified mitochondrial RNA. We report the first instances of A and G insertions and sites of partial and extragenic editing in Physarum mitochondrial RNAs, as well as an additional 772 C, U and dinucleotide insertions. The notable lack of antisense RNAs in our non-size selected, directional library argues strongly against an RNA-guided editing mechanism. Also of interest are our findings that sites of C to U changes are unedited at a significantly higher frequency than insertional editing sites and that substitutional editing of neighboring sites appears to be coupled. Finally, in addition to the characterization of RNAs from 17 predicted genes, our data identified nine new mitochondrial genes, four of which encode proteins that do not resemble other proteins in the database. Curiously, one of the latter mRNAs contains no editing sites.
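Locating insertional editing sites amounts to aligning each transcript to its genomic template. The sketch below assumes, for simplicity, that the RNA differs from the template only by insertions (ignoring the C-to-U substitutions also mentioned above) and uses greedy subsequence matching; the function name is hypothetical and this is not the method used in the study.

```python
def locate_insertions(genomic: str, rna: str) -> list[tuple[int, str]]:
    """Return (RNA index, inserted base) pairs, assuming the RNA differs from the
    genomic template only by inserted nucleotides.

    Greedy matching: each RNA base is matched to the next genomic base when they
    agree, otherwise it is called an insertion. This always finds a valid placement
    when one exists, but inside runs of identical bases it tends to place the
    insertion toward the 3' end, which may not be the biological position.
    """
    insertions = []
    g = 0  # index of the next unmatched genomic base
    for r, base in enumerate(rna):
        if g < len(genomic) and base == genomic[g]:
            g += 1
        else:
            insertions.append((r, base))
    if g != len(genomic):
        raise ValueError("genomic sequence is not a subsequence of the RNA")
    return insertions


print(locate_insertions("AUGGCU", "AUCGGCUU"))
```

A dinucleotide insertion would appear here as two adjacent single-base insertions.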
Pentatricopeptide repeat (PPR) proteins with an E domain have been identified as specific factors for C to U RNA editing in plant organelles. These PPR proteins bind to a unique sequence motif 5′ of their target editing sites. Recently, involvement of a combinatorial amino acid code in the P (normal length) and S type (short) PPR domains in sequence specific RNA binding was reported. PPR proteins involved in RNA editing, however, contain not only P and S motifs but also their long variants L (long) and L2 (long2) and the S2 (short2) motifs. We now find that inclusion of these motifs improves the prediction of RNA editing target sites. Previously overlooked RNA editing target sites are suggested from the PPR motif structures of known E-class PPR proteins and are experimentally verified. RNA editing target sites are assigned for the novel PPR protein MEF32 (mitochondrial editing factor 32) and are confirmed in the cDNA.
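Target prediction from a combinatorial amino acid code can be sketched as scoring how well the nucleotides upstream of a candidate site match the bases predicted from each PPR motif. The code table below is an illustrative, simplified subset of the residue-6/residue-1′ code reported for P and S motifs; it is an assumption, it does not model the L, L2 and S2 variants separately, and the one-motif-per-nucleotide alignment (N-terminal motif at the 5′ end) follows the parallel binding orientation generally described. Names are hypothetical.

```python
# ASSUMPTION: simplified subset of the PPR code, (residue 6, residue 1') -> bases bound.
PPR_CODE = {
    ("T", "N"): "A",
    ("T", "D"): "G",
    ("N", "D"): "U",
    ("N", "N"): "CU",  # pyrimidine
    ("N", "S"): "C",
}


def ppr_match_score(motifs: list[tuple[str, str]], rna_window: str) -> float:
    """Fraction of informative motifs whose predicted base matches the aligned nucleotide.

    `motifs` lists (residue 6, residue 1') pairs in N- to C-terminal order, aligned
    one-to-one with `rna_window` read 5' to 3'. Pairs absent from PPR_CODE are skipped.
    """
    if len(motifs) != len(rna_window):
        raise ValueError("need exactly one nucleotide per motif")
    scored = matched = 0
    for pair, nt in zip(motifs, rna_window.upper().replace("T", "U")):
        allowed = PPR_CODE.get(pair)
        if allowed is None:
            continue  # residue combination not covered by this simplified code
        scored += 1
        if nt in allowed:
            matched += 1
    return matched / scored if scored else 0.0


motifs = [("T", "N"), ("N", "D"), ("N", "S"), ("T", "D")]
print(ppr_match_score(motifs, "AUCG"), ppr_match_score(motifs, "AUCA"))
```

Scanning such a score along a transcript ranks candidate binding windows; extending the table with entries for L, L2 and S2 motifs is the kind of refinement the abstract reports as improving prediction.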
Adenosine-to-inosine (A-to-I) RNA editing is recognized as a cellular mechanism for generating both RNA and protein diversity. Inosine base pairs with cytidine during reverse transcription and therefore appears as guanosine during sequencing of cDNA. Current approaches to RNA editing identification largely depend on the comparison between transcriptomes and genomic DNA (gDNA) sequencing datasets from the same individuals, and it has been challenging to identify editing candidates from transcriptomes in the absence of gDNA information.
We have developed a new strategy to accurately predict constitutive RNA editing sites from publicly available human RNA-seq datasets in the absence of relevant genomic sequences. Our approach establishes new parameters to increase the ability to map mismatches and to minimize sequencing/mapping errors and unreported genome variations. We identified 695 novel constitutive A-to-I editing sites that appear in clusters (named “editing boxes”) in multiple samples and which exhibit spatial and dynamic regulation across human tissues. Some of these editing boxes are enriched in non-repetitive regions lacking inverted repeat structures and show an extremely high frequency of A-to-I conversion. We validated a number of editing boxes in multiple human cell lines and confirmed that ADAR1 is responsible for the observed promiscuous editing events in non-repetitive regions, further expanding our knowledge of the catalytic substrate of A-to-I RNA editing by ADAR enzymes.
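Because inosine is read as guanosine, candidate A-to-I sites appear as A-to-G mismatches on the transcribed strand. Grouping such mismatch positions into clusters can be sketched as below; the gap and minimum-size thresholds are hypothetical defaults, not the parameters used to define the published editing boxes.

```python
def editing_boxes(positions, max_gap=30, min_sites=3):
    """Group A-to-G mismatch positions into clusters ("editing boxes").

    Neighbouring positions at most `max_gap` nt apart join the same cluster;
    clusters with fewer than `min_sites` positions are dropped. Both thresholds
    are illustrative assumptions.
    """
    boxes, current = [], []
    for pos in sorted(set(positions)):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_sites:
                boxes.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_sites:  # close the final cluster
        boxes.append(current)
    return boxes


print(editing_boxes([900, 100, 125, 110, 405, 400]))
```

Requiring several nearby mismatches is one simple way to favour genuine editing clusters over isolated sequencing errors or unreported SNPs.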
The approach we present here provides a novel way of identifying A-to-I RNA editing events by analyzing only RNA-seq datasets. This method has allowed us to gain new insights into RNA editing and should also aid in the identification of more constitutive A-to-I editing sites from additional transcriptomes.
RNA-seq; RNA editing; Potential SNP score; Constitutive editing; Editing box
RNA editing in chloroplasts of angiosperms proceeds by C-to-U conversions at specific sites. Nuclear-encoded factors are required for the recognition of cis-elements located immediately upstream of editing sites. The ensemble of editing sites in a chloroplast genome differs widely between species, and editing sites are thought to evolve rapidly. However, large-scale analyses of the evolution of individual editing sites have not yet been undertaken.
Here, we analyzed the evolution of two chloroplast editing sites, matK-2 and matK-3, for which DNA sequences from thousands of angiosperm species are available. Both sites are found in most major taxa, including deep-branching families such as the Nymphaeaceae. However, 36 isolated taxa scattered across the entire tree lack a C at one of the two matK editing sites. Tests of several exemplary species from this in silico analysis of matK processing unexpectedly revealed that one of the two sites remains unedited in almost half of all species examined. A comparison of sequences between editors and non-editors showed that specific nucleotides co-evolve with the C at the matK editing sites, suggesting that these nucleotides are critical for editing-site recognition.
(i) Both matK editing sites were present in the common ancestor of all angiosperms and have been independently lost multiple times during angiosperm evolution.
(ii) The editing activities corresponding to matK-2 and matK-3 are unstable.
(iii) A small number of third-codon positions in the vicinity of editing sites are selectively constrained independent of the presence of the editing site, most likely because of interacting RNA-binding proteins.
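The editor/non-editor comparison behind these conclusions can be sketched as a co-occurrence tally between the genomic base at an editing site and the base at a nearby alignment column. Note that a genomic C is only a proxy here: as the abstract shows, a C can still remain unedited, so real editor status needs experimental data. Names and the example alignment are hypothetical.

```python
from collections import Counter


def site_cooccurrence(aligned_seqs, site, neighbour):
    """Count (base at editing site, base at neighbouring column) pairs across species.

    `aligned_seqs` maps species name -> gap-aligned DNA sequence; `site` and
    `neighbour` are 0-based alignment columns. Species with a gap at either
    column are skipped.
    """
    counts = Counter()
    for seq in aligned_seqs.values():
        a, b = seq[site].upper(), seq[neighbour].upper()
        if "-" in (a, b):
            continue
        counts[(a, b)] += 1
    return counts


toy = {"sp1": "ACTG", "sp2": "ACTA", "sp3": "ATTA", "sp4": "AT-A"}
print(site_cooccurrence(toy, site=1, neighbour=3))
```

A neighbouring base that is conserved whenever the C is present, but free to vary when it is absent, is the kind of co-evolution signal the study interprets as a recognition requirement.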
Double-stranded (ds) RNA-specific adenosine deaminase converts adenosine residues into inosines in dsRNA and edits transcripts of certain cellular and viral genes such as glutamate receptor (GluR) subunits and hepatitis delta antigen. The first member of this type of deaminase, DRADA1, has been recently cloned based on the amino acid sequence information derived from biochemically purified proteins. Our search for DRADA1-like genes through expressed sequence tag databases led to the cloning of the second member of this class of enzyme, DRADA2, which has a high degree of sequence homology to DRADA1 yet exhibits a distinctive RNA editing site selectivity. There are four differentially spliced isoforms of human DRADA2. These different isoforms of recombinant DRADA2 proteins, including one which is a human homolog of the recently reported rat RED1, were analyzed in vitro for their GluR B subunit (GluR-B) RNA editing site selectivity. As originally reported for rat RED1, the DRADA2a and -2b isoforms edit GluR-B RNA efficiently at the so-called Q/R site, whereas DRADA1 barely edits this site. In contrast, the R/G site of GluR-B RNA was edited efficiently by the DRADA2a and -2b isoforms as well as DRADA1. Isoforms DRADA2c and -2d, which have a distinctive truncated shorter C-terminal structure, displayed weak adenosine-to-inosine conversion activity but no editing activity tested at three known sites of GluR-B RNA. The possible role of these DRADA2c and -2d isoforms in the regulatory mechanism of RNA editing is discussed.
RNA editing is the process whereby an RNA sequence is modified from the sequence of the corresponding DNA template. In the mitochondria of land plants, some cytidines are converted to uridines before translation. Despite substantial study, the molecular biological mechanism by which C-to-U RNA editing proceeds remains relatively obscure, although several experimental studies have implicated a role for cis-recognition. A highly non-random distribution of nucleotides is observed in the immediate vicinity of edited sites (within 20 nucleotides 5' and 3'), but no precise consensus motif has been identified.
Data for analysis were derived from the complete mitochondrial genomes of Arabidopsis thaliana, Brassica napus, and Oryza sativa; additionally, a combined data set of observations across all three genomes was generated. We selected datasets based on the 20 nucleotides 5' and the 20 nucleotides 3' of edited sites and an equivalently sized and appropriately constructed null-set of non-edited sites. We used tree-based statistical methods and random forests to generate models of C-to-U RNA editing based on the nucleotides surrounding the edited/non-edited sites and on the estimated folding energies of those regions. Tree-based statistical methods based on primary sequence data surrounding edited/non-edited sites and estimates of free energy of folding yield models with optimistic re-substitution-based estimates of ~0.71 accuracy, ~0.64 sensitivity, and ~0.88 specificity. Random forest analysis yielded better models and more exact performance estimates with ~0.74 accuracy, ~0.72 sensitivity, and ~0.81 specificity for the combined observations.
Simple models do moderately well in predicting which cytidines will be edited to uridines, and provide the first quantitative predictive models for RNA edited sites in plant mitochondria. Our analysis shows that the identity of the nucleotide -1 to the edited C and the estimated free energy of folding for a 41 nt region surrounding the edited C are the most important variables that distinguish most edited from non-edited sites. However, the results suggest that primary sequence data and simple free energy of folding calculations alone are insufficient to make highly accurate predictions.
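The inputs and metrics described in this abstract can be sketched as follows: extracting the 41-nt window around a candidate C (20 nt on each side), reading the -1 nucleotide, and computing accuracy, sensitivity and specificity from a confusion matrix. This is an illustrative sketch with hypothetical names, not the authors' code, and the counts in the usage line are invented for illustration.

```python
def site_window(seq: str, i: int, flank: int = 20):
    """Return the (2*flank + 1)-nt window centred on 0-based position i, or None
    if the window would run off either end of the sequence."""
    if i < flank or i + flank >= len(seq):
        return None
    return seq[i - flank : i + flank + 1]


def minus_one_base(seq: str, i: int):
    """Nucleotide immediately 5' of position i (the -1 position), or None at the start."""
    return seq[i - 1] if i > 0 else None


def performance(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Invented counts for 100 edited and 100 non-edited sites.
print(performance(tp=72, fp=19, tn=81, fn=28))
```

Windows like these, one-hot encoded and combined with a folding-energy estimate, are the kind of feature vector a tree or random forest model can be trained on.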
Although a large amount of experimentally derived information about RNA editing sites currently exists, this information has remained scattered across a variety of sources and in diverse data formats. Standard collections of high-quality experimental data would greatly help the systematic study of RNA editing, especially the development of computational algorithms to predict RNA editing sites. dbRES () is a public database of known RNA editing sites. All sites are manually curated from the literature and GenBank annotations. dbRES version 1.1 contains 5437 RNA editing sites of 251 transcripts, covering 96 organisms across plants, metazoa, protozoa, fungi and viruses. dbRES provides comprehensive annotations and data summaries, including (but not limited to) transcript sequences, RNA editing types, editing site locations, amino acid changes, organisms, subcellular organelles (if available) and cited references. A user-friendly web interface has been developed to facilitate both data retrieval and online display of RNA editing site information.
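A curated editing-site collection of this kind can be sketched as one record per site carrying the annotation fields listed above. The schema below is hypothetical and is not dbRES's own data model; it only illustrates how such records support simple queries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EditingSite:
    """One curated RNA editing site (hypothetical schema, not dbRES's own)."""
    transcript: str
    organism: str
    edit_type: str                   # e.g. "C-to-U"
    position: int                    # 1-based position in the transcript
    aa_change: Optional[str] = None  # e.g. "S->L", if the edit changes the protein
    organelle: Optional[str] = None  # e.g. "mitochondrion", if known
    references: tuple = ()           # cited sources


def select(sites, organism=None, edit_type=None):
    """Filter records by organism and/or editing type (None means 'any')."""
    return [
        s for s in sites
        if (organism is None or s.organism == organism)
        and (edit_type is None or s.edit_type == edit_type)
    ]
```

Keeping every site in one uniform structure is what makes a collection directly usable as training or benchmark data for the prediction algorithms the abstract mentions.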
RNA editing can lead to amino acid substitutions in protein sequences, alternative pre-mRNA splicing and changes in gene expression levels. The exact in vivo modes of interaction of RNA editing enzymes with their targets are not well understood. Alterations in RNA editing have been linked to various human disorders, and an improved understanding of the editing mechanism and its specificity could explain the phenotypes that result from mis-regulation of RNA editing. Unbiased, high-throughput methods for genome-wide detection of RNA editing events in human cells are necessary to decipher the RNA editing regulatory code. With the rapidly falling cost of genome re-sequencing, the future method of choice for detecting RNA editing events will be whole-genome gDNA and cDNA sequencing. We describe a detailed procedure for the computational identification of RNA editing targets using data from deep sequencing of DNA and RNA from the peripheral blood mononuclear cells of a human individual with severe hemophilia A who is resistant to HIV infection. Interestingly, we find that mRNAs of the cyclin-dependent kinase CDK13 and the DNA repair enzyme NEIL1 undergo extensive A-to-I RNA editing that leads to amino acid substitutions in protein sequences.
RNA editing; Single nucleotide variants; High-throughput sequencing; Bioinformatics; Human immunodeficiency virus infection
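The central gDNA/cDNA comparison in the procedure described above can be sketched as a per-site filter over base counts from both datasets. The thresholds are illustrative assumptions, the function name is hypothetical, and only forward-strand transcripts are handled; editing in minus-strand transcripts would appear as T-to-C on the reference strand.

```python
def a_to_i_candidate(gdna: dict, cdna: dict, min_cov: int = 10, min_g_frac: float = 0.1) -> bool:
    """Flag a forward-strand site as a candidate A-to-I editing site.

    `gdna` and `cdna` map bases ('A', 'C', 'G', 'T') to read counts at the same
    position. The genome must be covered by >= min_cov reads that are all A, and
    the cDNA by >= min_cov reads with a G fraction >= min_g_frac, since inosine
    is read as G. Thresholds are illustrative assumptions.
    """
    g_cov = sum(gdna.values())
    c_cov = sum(cdna.values())
    if g_cov < min_cov or c_cov < min_cov:
        return False  # too little evidence in either dataset
    if gdna.get("A", 0) != g_cov:
        return False  # any non-A genomic read suggests a SNP or error, not editing
    return cdna.get("G", 0) / c_cov >= min_g_frac


print(a_to_i_candidate({"A": 20}, {"A": 15, "G": 5}))
```

Requiring a homozygous genomic A is what lets matched gDNA sequencing rule out SNPs, the main confounder when editing is called from RNA alone.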
Human APOBEC3 proteins are cytidine deaminases that contribute broadly to innate immunity through the control of exogenous retrovirus replication and endogenous retroelement retrotransposition. As an intrinsic antiretroviral defense mechanism, APOBEC3 proteins induce extensive guanosine-to-adenosine (G-to-A) mutagenesis and inhibit synthesis of nascent human immunodeficiency virus type 1 (HIV-1) cDNA. Human APOBEC3 proteins have additionally been proposed to induce infrequent, potentially non-lethal G-to-A mutations that make subtle contributions to sequence diversification of the viral genome and adaptation through acquisition of beneficial mutations. Using single-cycle HIV-1 infections in culture and highly parallel DNA sequencing, we defined trinucleotide contexts of the edited sites for APOBEC3D, APOBEC3F, APOBEC3G, and APOBEC3H. We then compared these APOBEC3 editing contexts with the patterns of G-to-A mutations in HIV-1 DNA in cells obtained sequentially from ten patients with primary HIV-1 infection. Viral substitutions were highest in the preferred trinucleotide contexts of the edited sites for the APOBEC3 deaminases. Consistent with the effects of immune selection, amino acid changes accumulated at the APOBEC3 editing contexts located within human leukocyte antigen (HLA)-appropriate epitopes that are known or predicted to enable peptide binding. Thus, APOBEC3 activity may induce mutations that influence the genetic diversity and adaptation of the HIV-1 population in natural infection.
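Defining trinucleotide editing contexts can be sketched as counting the reference trinucleotide centred on each G-to-A change in aligned viral DNA. APOBEC3 deaminates C on the minus-strand cDNA, which appears as G-to-A on the plus strand; for example, a preference for a 3′ G neighbour (GG context) is commonly attributed to APOBEC3G. The sketch assumes ungapped, equal-length plus-strand sequences; the function name is hypothetical.

```python
from collections import Counter


def g_to_a_contexts(reference: str, mutant: str) -> Counter:
    """Count the reference trinucleotide (5' base, G, 3' base) around each G-to-A change.

    Sequences must be aligned, ungapped and the same length; changes at the first
    or last position are skipped because they lack a full context.
    """
    if len(reference) != len(mutant):
        raise ValueError("sequences must be the same length")
    ref, mut = reference.upper(), mutant.upper()
    return Counter(
        ref[i - 1 : i + 2]
        for i in range(1, len(ref) - 1)
        if ref[i] == "G" and mut[i] == "A"
    )


print(g_to_a_contexts("AGGTGAC", "AAGTAAC"))
```

Normalising these counts by how often each trinucleotide occurs in the reference gives the per-context mutation rates that can then be compared across deaminases and patient samples.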
Cytidine deaminases of the human APOBEC3 gene family act as an intrinsic defense mechanism against infection with HIV-1 and other viruses. The APOBEC3 proteins introduce mutations into the viral genome by inducing enzymatic modification of nucleotide sequences and inhibiting synthesis of cDNA strands from the viral RNA. Viral Vif counters this impediment to the fidelity of HIV-1 replication by targeting the APOBEC3 proteins for degradation. Low-level APOBEC3 activity that outlasts blockade by viral Vif may foster infrequent mutations that provide a source of genetic variation upon which natural selection acts. Here, we defined the APOBEC3 nucleotide contexts of the edited sites by titrating wild-type and non-editing mutant APOBEC3 proteins in cultured cells. We then followed the patterns of G-to-A mutations we identified in viral DNA in cells obtained from ten patients with acute infection. Our deep sequencing analyses demonstrate an association between sub-lethal APOBEC3 editing and HIV-1 diversification. Mutations at APOBEC3 editing contexts that occurred at particular positions within specific known or predicted epitopes could disrupt peptide binding critical for immune control. Our findings reveal a role for human APOBEC3 in HIV-1 sequence diversification that may influence fitness and evolution of beneficial variants and phenotypes in the population.