A major goal of any genome sequencing project is the accurate identification of all encoded gene products. This is particularly challenging in organisms whose transcripts are subject to the addition of nucleotides that are not encoded in the gene itself. Indeed, such ‘cryptogenes’ are often invisible to standard gene-finding algorithms, as evidenced by the initial characterization of the mitochondrial genomes of Trypanosoma brucei and Leishmania tarentolae, as well as that of P. polycephalum.
In Physarum mitochondria, approximately 1 of every 25 nt in edited mRNAs and 1 of every 40 nt in stable RNAs are added during transcription, most of them as single-nucleotide insertions. ORFs are created by repeated frameshifts, which can occur as often as every fourth codon. It is therefore not surprising that standard bioinformatics tools failed to identify all of the genes in the Physarum mitochondrial genome. Although the genes for atp8, nad2, nad6 and nad4L were expected to be encoded in Physarum mitochondria, these four genes were found only upon application of PIE, an algorithm developed expressly to search for genes that require insertional editing for their expression. Additional potential genes have recently been mapped to other regions of the Physarum mitochondrial genome using PIE (C. Ainsley, H. Lee, and R. Bundschuh, unpublished data), further demonstrating the efficacy of this algorithm.
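This toy sketch shows why insertional editing hides ORFs from conventional gene finders. It is not the PIE algorithm. It applies a hypothetical C insertion to a made-up template, then translates both the template and the edited sequence. For simplicity it assumes the standard genetic code and writes sequences in the DNA alphabet (T for U). The helper names `apply_insertions` and `translate`, and the example sequence, are illustrative only.

```python
from itertools import product

# Standard genetic code (assumed for illustration; organellar codes can differ).
# Codons are enumerated in TCAG order, matching the amino-acid string below.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_insertions(template: str, insertions: dict[int, str]) -> str:
    """Build an edited sequence from a template (hypothetical helper).

    `insertions` maps a 0-based template position to the nucleotide(s)
    inserted immediately before that position; use len(template) to append.
    """
    pieces = []
    for i, base in enumerate(template):
        pieces.append(insertions.get(i, ""))
        pieces.append(base)
    pieces.append(insertions.get(len(template), ""))
    return "".join(pieces)


def translate(seq: str) -> str:
    """Translate the complete codons in frame 0; '*' marks a stop codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical template: read directly, an in-frame TAA stop truncates the ORF.
template = "ATGTAAGGG"
# A single C inserted before template position 3 shifts every downstream codon.
edited = apply_insertions(template, {3: "C"})

print(translate(template))  # M*G
print(translate(edited))    # MLR
```

Each single-nucleotide insertion moves all downstream codons by one position. An ORF assembled from insertions every few codons therefore looks, in the genomic sequence, like a run of frameshifts and premature stop codons.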
Use of the PIE algorithm is not restricted to identifying sites of C insertion in Physarum mRNAs. Mitochondrial mRNAs in a number of other myxomycetes are edited by the insertion of single U residues. We have tested the general utility of the PIE algorithm by applying it to the cox1 genes of Clastoderma debaryanum, Arcyria cinerea, Stemonitis flavogenita, and Didymium nigripes, whose mRNAs were previously characterized by Horton and Landweber (17). In principle, the same general strategy should be applicable to all types of insertional editing, with only minor changes to reflect the characteristics of editing in each organism. This includes a possible search for novel cryptogenes in kinetoplastids, although in this case the algorithm would have to be modified to accommodate deletions and insertions of longer stretches of consecutive uridines. However, because kinetoplastid editing sites are known to be specified by guide RNAs (18), an approach that searches for cryptogenes together with their cognate guide RNAs (19) would be more direct.
The genes for atp8, nad2, nad6 and nad4L are not as well conserved as the previously localized Physarum mitochondrial genes, which may account for the failure of BLASTX and BEAUTY to find them. As anticipated from the insertion site probabilities generated for each gene, editing site predictions for these four genes were less accurate than those for the cox2 mRNA. Nevertheless, although the exact boundaries were not correctly identified for every gene, the PIE predictions allowed facile characterization of the corresponding cDNAs. This characterization revealed two new features of Physarum mitochondrial gene expression: overlapping genes and deletion editing.
Nucleotide deletions have been observed previously in kinetoplastid mRNAs (18), but the discovery of deletion editing in Physarum mitochondria was surprising, given that no deletions have been reported in the previously characterized mitochondrial RNAs, which include seven mRNAs (nad… 9, and the unpublished nad1), the large and small rRNAs, and four tRNAs (5). Although nucleotide deletions are much less frequent than insertions in T. brucei (322 deletions versus 3030 insertions), L. tarentolae (161 deletions versus 1436 insertions), and other kinetoplastids (18), they still make up a substantial proportion of all editing events in these organelles (roughly 10% in both T. brucei and L. tarentolae). In contrast, the three deleted A residues described in this work constitute <1% of the known editing sites in Physarum.
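The counts quoted above put a number on "substantial proportion". This short sketch computes deletions as a share of all insertion and deletion events for the two kinetoplastids, using only figures given in the text. The variable and function names are illustrative.

```python
# Editing-event counts quoted in the text for two kinetoplastids.
COUNTS = {
    "T. brucei": {"deletions": 322, "insertions": 3030},
    "L. tarentolae": {"deletions": 161, "insertions": 1436},
}


def deletion_fraction(deletions: int, insertions: int) -> float:
    """Fraction of all insertion/deletion editing events that are deletions."""
    return deletions / (deletions + insertions)


for species, c in COUNTS.items():
    frac = deletion_fraction(c["deletions"], c["insertions"])
    print(f"{species}: {frac:.1%} of editing events are deletions")
# T. brucei: 9.6% of editing events are deletions
# L. tarentolae: 10.1% of editing events are deletions

# Physarum: 3 deletions making up < 1% of known sites means 3 / N < 1/100,
# so the number of known editing sites N must exceed 300.
physarum_site_lower_bound = 3 * 100
```

The final line simply restates the text's "<1%" figure as an inequality. It bounds the number of known Physarum sites from below and does not report that number.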
The existence of nucleotide deletions extends the list of editing types that occur in Physarum mitochondria, which already includes single C insertions, U insertions, dinucleotide insertions (GU, CU, UA, GC, AA and UU), and C-to-U changes (5). It remains to be determined whether these nucleotide deletions occur co-transcriptionally, as observed for the nucleotide insertions (20), or post-transcriptionally, as is the case for C-to-U changes (21). Although all forms of editing in Physarum mitochondria are virtually 100% efficient in vivo, RNAs made in vitro contain a mixture of unedited, edited and misedited sites (22). Interestingly, one form of misediting observed during run-on transcription in partially purified mitochondrial transcription elongation complexes is the deletion of three encoded nucleotides immediately downstream of an insertion site. This finding suggests that nucleotide deletions may also occur co-transcriptionally, although this remains to be tested.
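The editing types listed for Physarum mitochondria can all be represented as events placed at template coordinates. These are single- and dinucleotide insertions, nucleotide deletions, and C-to-U changes. The sketch below is a hypothetical data model. It does not describe how PIE or the Physarum editing machinery works. It ignores whether an event is co-transcriptional or post-transcriptional and simply replays the events on a template written in the RNA alphabet. The `Edit` class, the `apply_edits` function, and the example sequence are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Edit:
    """One editing event at a 0-based template position (hypothetical model)."""
    pos: int
    kind: Literal["insert", "delete", "substitute"]
    bases: str = ""  # inserted nucleotide(s), or the replacement base


def apply_edits(template: str, edits: list[Edit]) -> str:
    """Replay editing events on a template to produce the edited RNA.

    Insertions go immediately before the template base at `pos`
    (pos == len(template) appends); a deletion removes the base at `pos`;
    a substitution (e.g. C-to-U) replaces it.
    """
    by_pos: dict[int, list[Edit]] = {}
    for edit in edits:
        by_pos.setdefault(edit.pos, []).append(edit)

    out = []
    for i in range(len(template) + 1):
        here = by_pos.get(i, [])
        out.extend(e.bases for e in here if e.kind == "insert")
        if i == len(template):
            break  # only insertions can occur after the last template base
        if any(e.kind == "delete" for e in here):
            continue  # the encoded base at this position is removed
        sub = next((e.bases for e in here if e.kind == "substitute"), None)
        out.append(sub if sub is not None else template[i])
    return "".join(out)


# Hypothetical template and events covering each editing type named in the text.
template = "AUGCAAAGC"
edits = [
    Edit(3, "insert", "C"),      # single C insertion
    Edit(5, "delete"),           # deletion of an encoded A
    Edit(8, "substitute", "U"),  # C-to-U change
    Edit(9, "insert", "GU"),     # dinucleotide insertion at the 3' end
]
print(apply_edits(template, edits))  # AUGCCAAGUGU
```

Keying every event to template coordinates keeps the edits independent of one another, so they can be replayed in a single pass. An insertion-only gene finder would have to be extended in the same spirit to handle deletions.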