We present the results of the most comprehensive survey of RNA editing across the mouse genome performed thus far, and one of the largest for any organism. We produced an initial conservative set of RNA editing calls by applying stringent filters to remove systematic alignment artifacts. Using knowledge of the local clustering of A-to-G editing sites, we then extended this call set to 7,389 sites. Our analysis confirms that adenosine deamination is the primary mechanism responsible for RNA editing in the mouse, as we found just a small fraction (<0.1%) of edits that are not A-to-I sequence changes. This result agrees with a recent analysis of RNA editing within the human transcriptome [12].
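The cluster-based extension described above can be illustrated with a minimal sketch. It assumes, hypothetically, that a lower-confidence A-to-G candidate is rescued when it lies within a fixed window of a site in the conservative set. The function name, the tuple layout and the 50 bp window are illustrative assumptions and are not taken from the study.

```python
from bisect import bisect_left


def extend_by_clustering(confident, candidates, window=50):
    """Rescue candidate A-to-G sites lying within `window` bp of a confident site.

    `confident` and `candidates` are iterables of (chrom, pos) tuples.
    The 50 bp default is an illustrative assumption, not the study's value.
    """
    # Index confident positions per chromosome, sorted for binary search.
    by_chrom = {}
    for chrom, pos in confident:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    rescued = []
    for chrom, pos in candidates:
        positions = by_chrom.get(chrom, [])
        i = bisect_left(positions, pos)
        # The nearest confident neighbours sit at index i - 1 (left) and i (right).
        neighbours = positions[max(i - 1, 0):i + 1]
        if any(abs(pos - p) <= window for p in neighbours):
            rescued.append((chrom, pos))

    # Extended call set: conservative sites plus rescued candidates.
    return sorted(set(confident) | set(rescued))
```

In this sketch a candidate on a chromosome with no confident site is never rescued, which mirrors the idea that clustering evidence is only available near existing calls.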
We found that for 23% of the RNA editing sites we identified (1,710 out of 7,389), we could not determine the edited strand confidently because of missing annotation or because there were overlapping genes on opposite strands. Validation using Sequenom genotyping, however, confirmed these positions as being true RNA editing sites.
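The two sources of strand ambiguity named above (missing annotation, and overlapping genes on opposite strands) can be expressed as a simple rule. The sketch below is an illustration under assumed inputs: the gene intervals are hypothetical (start, end, strand) tuples with 1-based inclusive coordinates, and the function is not the procedure used in the study.

```python
def infer_edited_strand(site_pos, genes):
    """Return '+', '-', or None when the edited strand is ambiguous.

    `genes` is a list of (start, end, strand) intervals on the site's
    chromosome, 1-based and inclusive (a hypothetical layout).
    """
    # Collect the strands of all annotated genes that overlap the site.
    strands = {strand for start, end, strand in genes if start <= site_pos <= end}
    if len(strands) == 1:
        return strands.pop()
    # Empty set: no annotation. Two strands: genes overlap on opposite strands.
    return None


# Proportion of sites with an undetermined strand, using the counts in the text.
ambiguous_fraction = 1710 / 7389
```

With the counts reported above, `ambiguous_fraction` rounds to 23%.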
One of the notable features of our editing calls is the striking conservation of editing sites across mouse strains, including the evolutionarily distant wild-derived strains. In particular, we observed that 92% of clustered RNA editing sites are edited in 14 or more strains and, for sites shared between strains, the variation in editing level between replicates of a strain was almost identical to the variation between strains (at 90% of sites the standard deviation is 10% within strains and 15% across strains). We believe that variability in the overall editing rate in different strains (Figure ) is most likely not attributable to biological differences, but reflects differences in transcriptome sequence coverage (and hence our sensitivity to detect edited bases), alignment issues, including the influence of SNPs and indels close to editing sites, and other types of systematic errors. Although we did not specifically validate the level of editing calculated from the RNA-seq data, the single molecule counting approach we applied is analogous to those used for ChIP-seq and expression analysis, where sequencing is the gold standard. Furthermore, RNA-seq was recently shown to provide good concordance with clonal sequence validation of RNA editing sites [12].
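The read-counting estimate of editing level and the within-strain versus across-strain comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the editing level is taken as the fraction of reads carrying G at an A-to-G site, the population standard deviation is used, and the function names and input layout are hypothetical rather than those of the study.

```python
from statistics import mean, pstdev


def editing_level(g_reads, a_reads):
    """Fraction of reads carrying G at an A-to-G site."""
    total = g_reads + a_reads
    if total == 0:
        raise ValueError("site has no coverage")
    return g_reads / total


def within_and_across_sd(levels_by_strain):
    """Summarise the variability of editing level at one site.

    `levels_by_strain` maps strain name -> list of per-replicate levels.
    Returns (mean within-strain SD, SD of the per-strain mean levels).
    Population SD is an assumption made for this sketch.
    """
    replicate_lists = list(levels_by_strain.values())
    within = mean([pstdev(reps) for reps in replicate_lists])
    across = pstdev([mean(reps) for reps in replicate_lists])
    return within, across
```

Comparing the two returned values per site gives the kind of within-strain versus across-strain contrast summarised in the paragraph above.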
The high level of conservation of RNA editing sites and of the rate of editing amongst the mouse strains examined in this study is remarkable considering the level of genomic sequence variation between them. For example, there are more than 35 million single nucleotide differences between SPRET/EiJ and the strain C57BL/6NJ [13], yet the positions of editing in the brain transcriptomes of these strains are overwhelmingly conserved. Importantly, a large proportion of RNA editing sites (90.5%) could not be lifted over to the human genome because most RNA editing takes place in sequence that is repetitive or not highly conserved. Interestingly, we found a few instances where RNA editing at one site correlated with genomic base differences at nearby sites. For example, the presence of a SNP in Cds2 (chr2:132,135,391) correlated perfectly with the absence of RNA editing at a site 225 bp downstream (chr2:132,135,616); 80% of Cds2 transcripts are edited in strains without this SNP (Figure s12 in Additional file 2). It is possible that the reason for this and other differences (Figures s13 and s14 in Additional file 2) is an alteration of the double-stranded RNA structure of the transcript caused by genomic variation. We are not, however, able to confirm this hypothesis, since the analysis of the RNA sequence structure of Cds2 using software tools, including mFold, was inconclusive.
We identified and validated 24 previously unknown RNA editing sites in protein coding sequence and found most of the known sites (19 out of 23). Of the four known sites we missed, two were poorly covered by RNA-seq data in our study (fewer than ten reads). We highlight two particularly interesting examples of the effect of RNA editing. In the Cacna1d gene we found two proximal non-synonymous coding RNA editing sites that were edited at a relatively constant level across all strains. Another example is an edit in a Cds2 transcript that occurred in all strains except the wild-derived strains, which have the corresponding A-to-G base change in their genomic sequence. In rat, the orthologous position in Cds2 is a G and agrees with the wild strains, indicating that the G-to-A SNP in the laboratory strains potentially occurred after their divergence. Thus, RNA editing may act on this site in the laboratory strains to preserve the ancestral sequence of the 3' UTR of the Cds2 gene.