Small RNAs are well described in higher eukaryotes such as mammals and plants; however, knowledge in simple eukaryotes such as filamentous fungi is limited. In this study, we discovered and characterized methylguanosine-capped and polyadenylated small RNAs (CPA-sRNAs) by using differential RNA selection, full-length cDNA cloning and 454 transcriptome sequencing of the rice blast fungus Magnaporthe oryzae. This fungus causes blast, a devastating disease of rice, the principal food staple for over half the world’s population. CPA-sRNAs mapped primarily to the transcription initiation and termination sites of protein-coding genes and were positively correlated with gene expression, particularly for highly expressed genes including those encoding ribosomal proteins. Numerous CPA-sRNAs also mapped to rRNAs, tRNAs, snRNAs, transposable elements and intergenic regions. Many other 454 sequence reads could not be mapped to the genome; however, inspection revealed evidence for non-template additions and chimeric sequences. CPA-sRNAs were independently confirmed using a high affinity variant of eIF-4E to capture 5′-methylguanosine-capped RNA followed by 3′-RACE sequencing. These results expand the repertoire of small RNAs in filamentous fungi.
Deep transcriptome analyses have revealed that almost the entire genome of complex eukaryotes such as mammals is transcribed (1–3). Transcript length varies from as little as 16nt (such as tiRNA) to >100kb (such as Xist RNA). Mammals contain a variety of small RNAs, including tiRNAs (16nt), siRNAs/miRNAs/piRNAs (20–30nt), tRNA halves (30–40nt), PASR/TASR (22–200nt) and snoRNAs (70–200nt), that can modulate transcription, translation, replication and chromatin structure (1,3). A growing number of ncRNAs have been described in Saccharomyces cerevisiae. Many of these are driven by RNA Pol II and include cryptic unstable transcripts (CUTs, ~400nt) (4) and stable unannotated transcripts (SUTs, ~700nt) (5). Most map to transcription start sites or the 3′-end of protein-coding genes and appear to be the result of bidirectional promoter activity. In general, CUTs are degraded rapidly by the Nrd1-exosome-Trf4-Air2-Mtr4p polyadenylation (TRAMP) complex (4). In several cases, these ncRNAs have been shown to interact with the ribosomal complex and to be translated (6,7). In the filamentous fungus Neurospora crassa, several new species of small RNAs have been recently described including miRNA-like small RNAs (milRNAs), Dicer-independent small interfering RNAs (disiRNAs) and qiRNAs (8,9). qiRNAs arise in response to DNA damage and map to sense and antisense strands of the rDNA array.
In this study, we undertook small RNA profiling in the ascomycete filamentous fungus, Magnaporthe oryzae (anamorph Pyricularia oryzae Cav), which causes blast, the most destructive disease of rice worldwide. The fungus not only destroys rice leaves, panicles and roots but also infects other cereals including wheat, barley, finger millet and grasses (10–12). Due to its agronomic significance and molecular genetic tractability, M. oryzae has emerged as a model to study fungal pathogenesis. In 2005, the genome (40Mb) of M. oryzae was sequenced and ~11000 protein-coding genes identified (13). Studies using expressed sequence tags (EST), serial analysis of gene expression (SAGE), massively parallel signature sequencing (MPSS) and microarray expression profiling have revealed that the transcriptome is more complex than initially appreciated (13–15). Here, we conducted pyrosequencing of cDNA and describe a distinct class of small RNAs that are 5′- and 3′-modified, which we refer to as CPA-sRNAs (5′-methylguanosine-capped and 3′-polyAdenylated small RNAs) (Figure 1A). CPA-sRNAs share no similarity to qiRNAs, milRNAs and disiRNAs discovered recently in N. crassa, which appear to possess no 5′- and 3′-modifications (8,9).
Magnaporthe oryzae isolate 70–15 was used in this study because of the availability of genomic (13) and transcriptomic (14,15) resources. Conidia were germinated and mycelia cultured in a liquid medium (0.2% yeast extract and 1% sucrose) by shaking at 200rpm, 25°C for 3 days. The mycelia were filtered through cheesecloth and used for RNA isolation.
Total RNA was isolated from 2g of mycelia using the Trizol method (15,16). PolyA+ RNA was purified using a PolyATtract mRNA Isolation System III (Promega) according to manufacturer’s procedure. To construct the CPA-sRNA library, protocols used to generate full-length cDNA were followed, from which small molecules were size selected and sequenced (16). Briefly, the free phosphate at the 5′-ends of 1µg polyA+ RNA from mycelia was removed by treating with bacterial alkaline phosphatase (BAP, Epicenter) followed by removal of the 5′-methylguanosine caps by treating with tobacco acid pyrophosphatase (Epicenter). PolyA+ RNA with an exposed 5′-phosphate was ligated to a 5′-RNA oligo linker (5′-AGCAUCGAGUCGGCCUUGUUGGCCUACUGG-3′) using T4 RNA ligase (Epicenter). The ligated polyA+ RNA was treated with DNase I (Invitrogen) to remove contaminating genomic DNA and re-purified using the PolyATtract mRNA Isolation System III. The 3′-oligo (dT)20VN linker (5′-GCGGCTGAAGACGGCCTATGTGGCC(T)20VN-3′) was used to synthesize cDNA using SuperScriptIII (Invitrogen) according to supplier’s procedure. RNA was digested with RNase H (Invitrogen). Double-stranded cDNA was amplified with high fidelity Platinum Taq DNA polymerase (Invitrogen) using 5′-PCR primers specific for the 5′-RNA linker (5′-AGCATCGAGTCGGCCTTGTTG-3′) and 3′-PCR primers specific for the 3′-oligo(dT)20VN linker (5′-GCGGCTGAAGACGGCCTATGTG-3′). The conditions used for PCR amplification were 94°C for 2min followed by 30 cycles of 94°C for 30s, 60°C for 30s and 72°C for 1min and a final extension at 72°C for 10min. PCR products were resolved on 3% agarose gels and cDNA between 60 and 200nt were purified using a Gel and PCR Clean-Up System (Promega). Purified cDNA was ligated to 454 adapters and analyzed directly by 454 sequencing at the Joint Genome Institute, Walnut Creek, CA, USA.
We obtained 127330 raw reads in FASTA format from a 454 sequencing run. The 454 sequencing adapters and the linkers at the 5′- and 3′-ends were removed from the raw reads, and the remaining sequences were named CPA-sRNAs. Overall, we obtained a total of 80111 CPA-sRNAs from mycelia with a size of ≥10nt. We retained 25389 reads with a size between 16 and 218nt for matching to version 6 of the M. oryzae genome assembly (GenBank ID; NZ_AACU00000000.2) (13). A detailed matching analysis was carried out using stringent BLASTN criteria of 80% coverage and 98% sequence identity. We also utilized Magnaporthe transcriptome data (14,15) including ESTs, MPSS tags and RL-SAGE tags to annotate CPA-sRNAs. All the genomic features (contigs, genes, tRNAs, rRNAs, snRNAs, repeats, mitochondrial genome) and transcriptomic data (ESTs, SAGE, MPSS) were visualized in a genome browser based on gbrowse (17).
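The read selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline that was actually run: the `Hit` record and its field names are hypothetical, coverage is taken to be the aligned length divided by the read length, and the thresholds are treated as inclusive.

```python
from dataclasses import dataclass

# Thresholds taken from the text; treating them as inclusive is an assumption.
MIN_LEN, MAX_LEN = 16, 218   # read length window in nt
MIN_COVERAGE = 0.80          # fraction of the read covered by the alignment
MIN_IDENTITY = 98.0          # percent identity of the alignment


@dataclass
class Hit:
    """Hypothetical record for one BLASTN alignment of a trimmed read."""
    read_id: str
    read_len: int        # length of the trimmed read in nt
    align_len: int       # number of read bases inside the alignment
    pct_identity: float  # percent identity reported for the alignment


def passes_length(read_len: int) -> bool:
    """Keep reads inside the 16 to 218 nt window."""
    return MIN_LEN <= read_len <= MAX_LEN


def passes_blast(hit: Hit) -> bool:
    """Apply the coverage and identity cut-offs to one alignment."""
    coverage = hit.align_len / hit.read_len
    return coverage >= MIN_COVERAGE and hit.pct_identity >= MIN_IDENTITY


def filter_hits(hits):
    """Return only the alignments that satisfy both filters."""
    return [h for h in hits if passes_length(h.read_len) and passes_blast(h)]
```

A read can contribute several retained alignments, which is why multiple mappings have to be handled separately when counts are assigned to features.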
To define the transcriptional start and stop sites for protein-coding genes, we devised two approaches. First, we assigned a 5′-transcription start site (TSS) and 3′-transcription termination site (TTS) to gene models supported by ESTs. This provided a TSS and TTS for 2558 and 2551 genes, respectively. For the remaining annotated genes, we defined UTRs as 500bp from start and stop codons. This is likely a slight overestimate of the average actual UTR length for protein-coding genes, but a value of 500bp captured the vast majority of TUs. The average 5′-UTR for gene models supported by EST evidence was 327nt. For other RNA species we defined the 5′-TSS and 3′-TTS as the first and last nucleotide of the mature RNA. For tRNAs, we used 150-nt upstream from 5′-mature tRNA for the 5′-leader and 150-nt downstream from 3′-mature tRNA for the 3′-terminator region.
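As a rough illustration of the two-tier rule above, the sketch below returns a TSS and TTS for a gene model, using EST-supported coordinates when present and otherwise extending 500bp beyond the coding region. The `Gene` tuple, its field names and the 1-based coordinate convention are assumptions made for the example; clamping at the far end of a contig is not handled.

```python
from typing import NamedTuple, Optional

DEFAULT_UTR = 500  # bp added beyond start and stop codons when no EST support exists


class Gene(NamedTuple):
    """Hypothetical gene model; coordinates are 1-based genome positions."""
    cds_start: int                 # leftmost coding position
    cds_end: int                   # rightmost coding position
    strand: str                    # "+" or "-"
    est_tss: Optional[int] = None  # EST-supported start site, if any
    est_tts: Optional[int] = None  # EST-supported termination site, if any


def transcript_bounds(gene: Gene, utr: int = DEFAULT_UTR):
    """Return (TSS, TTS), preferring EST evidence over the fixed-length fallback."""
    if gene.strand == "+":
        default_tss = max(1, gene.cds_start - utr)
        default_tts = gene.cds_end + utr
    else:
        # On the minus strand transcription starts at the right-hand end.
        default_tss = gene.cds_end + utr
        default_tts = max(1, gene.cds_start - utr)
    tss = gene.est_tss if gene.est_tss is not None else default_tss
    tts = gene.est_tts if gene.est_tts is not None else default_tts
    return tss, tts
```

For example, `transcript_bounds(Gene(1000, 2000, "+"))` gives `(500, 2500)`, while the same gene on the minus strand gives `(2500, 500)`.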
CPA-sRNAs may align to the genome one or more times. The genomic location of each alignment may correspond to features such as genes, tRNA, rRNA or transposable elements. Thus the alignments were used to map CPA-sRNAs to genomic features. We pursued three methods for describing CPA-sRNA mapped genomic data (alignment counts, read counts and prorating) that account for ambiguity in determining the genomic origin of each CPA-sRNA. Alignment counts are the simple summation of all CPA-sRNA alignments to a given genomic feature. Since CPA-sRNAs may align multiple times to the genome, use of alignment counts alone might result in overcounting. This is most evident with CPA-sRNAs that map to transposable elements: there are 20671 alignments to transposable elements that originate from only 325 CPA-sRNAs. We addressed this issue by defining read counts, such that each CPA-sRNA is counted only once for a given genomic feature to which it maps. As CPA-sRNAs may map to multiple features, read counts do not directly reflect a CPA-sRNA’s origin (multiple mappings arise from either multiple alignments to multiple features or a single alignment spanning adjacent features). We further refined our approach by taking multiple mappings into consideration. We prorated the counts for CPA-sRNAs by apportioning the counts across any multiple alignments and any features associated with each alignment (prorating). This was done stepwise: first, each CPA-sRNA was assigned a weight based on the number of copies found in sequencing data. Second, the weight from a given read was divided evenly between its genomic alignments. Third, each feature within a given alignment was given an equal portion of that alignment’s weight. Last, the weight of each parent feature was divided among its sub-features (sub-features are components of a feature; for example, an exon is a sub-feature of a gene).
Summation of the apportioned CPA-sRNA weights for a given feature yields a balanced portrayal of CPA-sRNA coverage for that feature and summation of values for sub-features equals that of its feature. Supplementary Figure S1 provides a visualization of prorating using hypothetical examples.
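The three counting schemes can be made concrete with a small Python sketch. The input layout is hypothetical: each read is a dict with a copy number and a list of alignments, and each alignment maps feature names to the sub-features it overlaps. Two further assumptions are made: the parent weight is split equally among sub-features, and an alignment that touches no annotated feature would need to be given an explicit label such as "intergenic" by the caller so that its weight is not lost.

```python
from collections import defaultdict


def alignment_counts(reads):
    """Count every alignment that touches a feature (prone to overcounting)."""
    counts = defaultdict(int)
    for read in reads:
        for features in read["alignments"]:
            for feature in features:
                counts[feature] += 1
    return dict(counts)


def read_counts(reads):
    """Count each read at most once per feature it maps to."""
    counts = defaultdict(int)
    for read in reads:
        touched = {f for features in read["alignments"] for f in features}
        for feature in touched:
            counts[feature] += 1
    return dict(counts)


def prorate(reads):
    """Apportion each read's weight across alignments, features and sub-features."""
    feature_weight = defaultdict(float)
    sub_weight = defaultdict(float)
    for read in reads:
        alignments = read["alignments"]
        if not alignments:
            continue
        # Steps 1 and 2: weight = copy number, split evenly between alignments.
        per_alignment = read["copies"] / len(alignments)
        for features in alignments:
            if not features:
                continue  # caller should label such alignments, e.g. "intergenic"
            # Step 3: equal share for each feature within the alignment.
            per_feature = per_alignment / len(features)
            for feature, subs in features.items():
                feature_weight[feature] += per_feature
                # Step 4: the parent's share is split among its sub-features.
                for sub in subs:
                    sub_weight[(feature, sub)] += per_feature / len(subs)
    return dict(feature_weight), dict(sub_weight)
```

With a two-copy read that aligns once to an exon of one gene and once across a transposable element and a neighbouring gene, plus a single-copy read that aligns three times to the same transposable element, the prorated feature weights sum to three, the total number of sequenced copies, whereas the alignment count for the transposable element alone is four.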
5′-methylguanosine-capped transcripts were purified using recombinant eIF4EK119A, which binds 5′-m7GpppN RNA caps with a 10- to 15-fold higher affinity than wild-type eIF-4E (18,19). GST-tagged eIF4EK119A protein was bound to glutathione agarose beads (4E-beads) for 1h at room temperature in PBS. The 4E-beads were washed in the binding buffer: 10mM KHPO4, pH 8.0, 100mM KCl, 2mM EDTA, 5% glycerol, 0.005% Triton X-100, 1.3% poly(vinyl alcohol) 98–99% hydrolyzed (Aldrich), 1mM DTT and 20U/ml RNase inhibitor (Ambion). About 120µg of total RNA was heat denatured, diluted in the binding buffer, added to 200µl (packed bead volume) of 4E-beads in a siliconized tube (Genemate, ISC BioExpress) and mixed for 1h at room temperature. Samples were briefly centrifuged to pellet the beads with bound RNA and washed three times (5min each) by mixing at room temperature in the binding buffer. The bound RNA on 4E-beads was phenol/chloroform extracted, precipitated and dissolved in RNase-free water. The quantity of 5′-methylguanosine-capped RNA was measured by NanoDrop (Thermo Fisher) analysis and its integrity was determined with an Agilent 2100 Bioanalyzer.
5′-methylguanosine-capped RNA was treated with DNase I (NEB) to remove any contaminating genomic DNA. cDNA was synthesized in 20µl reactions by adding the following reagents: 1µg of 5′-methylguanosine-capped RNA, 50 pmol of 3′-oligo(dT)20VN primer, 5mM of dNTPs, 1U of RNaseOut (Invitrogen) and 5U of SuperScript III (Invitrogen). Supplementary Table S1 lists all primer sequences used in this study. The reverse transcription reaction was incubated at 42°C for 2h and heat inactivated. For evaluating CPA-sRNAs in 5′-methylguanosine-capped cDNA, PCR amplification was performed using a forward primer specific to the 5′-end of the CPA-sRNAs of interest described in the ‘Results’ section and a reverse primer specific to the 3′-oligo(dT)20VN linker. PCR was done with high fidelity Platinum Taq DNA polymerase (Invitrogen) under the following conditions: 94°C for 2min followed by 35 cycles at 94°C for 30s, 55°C for 30s, 72°C for 30s and a final extension at 72°C for 5min. PCR products were resolved on a 3% agarose gel, purified and cloned into the pGEM-T easy vector according to supplier’s procedure (Promega). About 20 randomly selected white colonies were sequenced using the Sanger method.
Northern blot analyses were conducted using total RNA, RNA purified using eIF4EK119A or oligo(dT) columns and separated on 15% denaturing polyacrylamide gels. Blots were hybridized with [γ32P]ATP-labeled oligo(dT)20 probes. To document that the cDNA contained long as well as short full-length cDNAs, we confirmed the presence of the full-length actin gene (MGG_03982.6) using specific PCR primers prior to size selection (Supplementary Figure S2).
Correlation analyses were conducted using normalized signal intensity values of microarray data for M. oryzae mycelia grown in complete media for 48h and a further 12h in minimal media (NCBI GEO Accession #; GSE2716, Sample ID #s; GSM 52525, GSM 52524, GSM 52520) with the number of assigned CPA-sRNAs using JMP (SAS Institute) software. Analyses were conducted for both CPA-sRNAs mapping in the sense and antisense orientation with expression values for individual genes. Genes were also grouped into 100 bins based on gene expression and the relationship between the mean gene expression and the mean number of CPA-sRNAs per bin compared. Likewise, the relationship between MPSS or SAGE tags, which were both derived from RNA extracted after 72h growth on complete media, and CPA-sRNAs was determined by comparing the mean number of tags and the mean number of CPA-sRNA per bin. GO annotations for M. oryzae genes were obtained from the previously published work (20).
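The binning step can be illustrated with a short Python sketch. The analysis itself was run in JMP; the code below is only a stand-in that assumes equal-sized bins after sorting genes by expression and uses the Pearson coefficient, neither of which is specified in the text. The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def bin_means(genes, n_bins=100):
    """Sort (expression, cpa_count) pairs by expression and average within bins.

    Bins are made as equal in size as possible; empty bins are skipped.
    Returns a list of (mean_expression, mean_cpa_count) tuples.
    """
    ordered = sorted(genes, key=lambda g: g[0])
    n = len(ordered)
    means = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_expr = sum(g[0] for g in chunk) / len(chunk)
            mean_count = sum(g[1] for g in chunk) / len(chunk)
            means.append((mean_expr, mean_count))
    return means


def binned_correlation(genes, n_bins=100):
    """Correlate mean expression with mean CPA-sRNA count across bins."""
    means = bin_means(genes, n_bins)
    return pearson([m[0] for m in means], [m[1] for m in means])
```

Averaging within bins smooths gene-to-gene scatter, so a trend across bins can be clear even when individual genes deviate from it.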
Full-length cDNA was constructed from mycelial RNA, which was separated on an agarose gel and the fraction <200nt subjected to 454 sequencing. A total of 127330 reads were obtained, from which 25389 CPA-sRNAs (≥16nt; excluding 3′-polyA sequences) were further analyzed (Supplementary Table S2). Of the CPA-sRNAs, 57.4% (14547) mapped to version 6 of the M. oryzae genome (BLASTN criteria of >80% coverage and >98% sequence identity). Interestingly, 84% (12235) of CPA-sRNAs mapped to unique loci and 16% (2354) mapped to multiple locations in the genome. 10265 (9780 prorated) CPA-sRNAs mapped to protein-coding TUs (13), and the remainder mapped to intergenic regions, transposable elements, rRNAs, tRNAs and snRNAs (Table 1 and Supplementary Table S3). 2778 (2498 prorated) CPA-sRNAs mapped to intergenic regions of M. oryzae. Of these, 1130 CPA-sRNAs overlapped EST or SAGE or MPSS sequences (Table 2). CPA-sRNAs ranged in length from 16 to 218nt with a mean of 41nt (Figure 1B).
To validate CPA-sRNAs, 5′-methylguanosine-capped RNA was purified from total RNA using a high affinity variant of eIF-4E, which was previously used to prove that specific miRNA precursors have 5′-methylguanosine caps (18,19). Gel blot analysis of 5′-methylguanosine-capped RNA using [γ32P]ATP labeled oligo(dT)20 revealed a smear from 20 to 200nt confirming diversity of length and that CPA-sRNAs contain both a 5′-methylguanosine cap and polyA tract (Supplementary Figure S2). The presence of a 3′-polyA tail was confirmed by 3′-RACE on individual CPA-sRNAs, which were subsequently cloned and sequenced. Sequencing of 3′-RACE products confirmed that CPA-sRNAs mapped to protein-coding genes, transposable elements, snRNAs, tRNAs, rRNA genes and to intergenic locations and is described in more detail below (Figure 2B–F and Supplementary Figure S2).
We also examined the genomic context of CPA-sRNAs to exclude the possibility that they may have arisen from loci corresponding to longer RNAs rich in adenosine. 83% (12096 out of 14547) of CPA-sRNAs aligned to genome regions lacking adenosine enrichment (≥5As) (data not shown), indicating that most CPA-sRNAs were not derived from internal poly-adenosine sequences of transcribed regions. We found that many CPA-sRNAs mapped to MPSS and SAGE tags derived from 3′-polyadenylated RNA located in intergenic regions and genes (Table 2). These tag associations were previously unexplained but in light of the present findings, they were likely derived from 3′-polyadenylated small RNAs. Taken together, these data from different approaches provide compelling evidence that CPA-sRNAs exist in fungal tissue and represent a distinct class of small RNAs.
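A minimal sketch of the adenosine-enrichment check is given below. It assumes that each aligned genomic region is supplied as a string in the same orientation as the CPA-sRNA, and that "enrichment" means a run of at least five consecutive adenosines; the exact window inspected in the study is not stated, so that choice is left to the caller.

```python
import re


def has_a_run(seq: str, min_run: int = 5) -> bool:
    """True if the sequence contains at least `min_run` consecutive adenosines."""
    return re.search("A{%d,}" % min_run, seq.upper()) is not None


def fraction_without_a_run(regions, min_run: int = 5) -> float:
    """Fraction of genomic regions that lack an adenosine run."""
    if not regions:
        raise ValueError("no regions supplied")
    clean = sum(1 for seq in regions if not has_a_run(seq, min_run))
    return clean / len(regions)
```

Applied to the reported totals, 12096 of 14547 aligned CPA-sRNAs corresponds to a fraction of about 0.83.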
A total of 10265 (9780 prorated) CPA-sRNAs mapped to 4327 (39% of the total number) predicted protein-coding mRNAs (TUs), with more than a quarter (3507, 2201 prorated) mapping in the antisense orientation (Table 1). The majority of CPA-sRNAs mapped to UTRs [2981 (2323 prorated) to 5′-UTRs and 6260 (5489 prorated) to 3′-UTRs], whereas only 681 (378 prorated) mapped to introns. Examination of sense CPA-sRNAs mapping to 5′- or 3′-UTRs revealed that the vast majority were associated with the transcript initiation (TSS) or termination (TTS) site, respectively (Figure 1C). CPA-sRNAs were predominantly (4095 out of 4327) associated with genes supported by ESTs, MPSS and RL-SAGE tags (Table 2). 3′-RACE was used to confirm the presence of CPA-sRNAs for nine randomly selected protein-encoding genes (Figure 2B), which included S-adenosyl methionine synthetase (MGG_0383.6), chitinase 18–11 (MGG_06594.6), Sad1/UNC domain-containing protein (MGG_00469.6), cell wall glucanosyl transferase Mwg1 (MGG_00592.6), yjeF-related protein (MGG_02597.6), ubiquitin (MGG_07928.6), 40S ribosomal protein S24 (MGG_10680.6), glutamine synthetase (MGG_14279.6) and nuclear encoded mitochondrial hypoxia responsive domain containing protein (MGG_01210.6). 3′-RACE products of the expected size (80–200nt) were obtained for all nine genes. Sequencing of cloned 3′-RACE products confirmed they aligned with CPA-sRNAs obtained from pyrosequencing, including the splice junction for chitinase 18–11 (MGG_06594.6) (Figure 3A).
To identify a possible role of CPA-sRNAs, we correlated their abundance with mycelial gene expression. Overall, we observed a positive correlation between CPA-sRNAs mapping in the sense orientation and mycelial gene expression, although not all individual genes followed this pattern (Figure 4A and B and Supplementary Table S4). Notably, the most highly expressed group of genes had the highest numbers of mapped CPA-sRNAs. Inspection of 127 genes with ≥10 CPA-sRNAs mapped in the sense orientation showed that nearly all were functionally assigned with gene ontology (GO) terms involved in metabolism, with 67 (53%) being assigned to mycelial development and 43 (34%) to translation [(20); Figure 4E and Supplementary Table S5]. Of the latter, most (42) were assigned to structural components of the ribosome (Figure 4F). Further analysis confirmed a strong positive correlation between CPA-sRNAs and gene expression for all (65) annotated structural ribosomal proteins (see asterisk in Figure 4A and Supplementary Table S4). In contrast, antisense CPA-sRNAs did not map primarily to TSS and TTS, nor was there evidence supporting a correlation with gene expression (Figures 1C and 4C and D). In addition, we also observed a strong positive correlation between both sense-mapping MPSS or SAGE tags and sense-mapped CPA-sRNAs (Supplementary Figure S3).
We detected 425 (289 prorated) CPA-sRNAs that mapped to 287 tRNA loci, with 31 (6 prorated) that mapped in the antisense orientation (Table 1, Supplementary Table S3). Of the 425 CPA-sRNAs, 114 mapped to pseudo-tRNAs, of which there are 141 essentially identical copies in the M. oryzae genome. Most CPA-sRNAs mapped around the beginning or end of the mature tRNA (Figure 3B and Supplementary Figure S4A). Several CPA-sRNAs corresponded to the entire tRNA, whereas others were shorter or longer than the corresponding tRNA. A number of CPA-sRNAs mapped to positions ~–50 and ~+50nt from 5′- and 3′-ends, respectively, of the mature tRNA locus and likely correspond to the pre-tRNA transcript. 3′-RACE confirmed CPA-sRNAs for five tRNAs (Figure 2C and Supplementary Figure S4B). The expected 3′-RACE PCR and sequence products were obtained for Ala tRNA (MGG_20297.6), Cys tRNA (MGG_20209.6), Gln tRNA (MGG_20266.6) and Leu tRNA (MGG_20218.6). The 3′-RACE and sequencing also revealed CPA-sRNAs corresponding to all three Pro tRNA paralogs (MGG_20065.6; MGG_20044.6 and MGG_20298.6). Analysis of CPA-sRNAs mapping to rRNA revealed 1675 (1597 prorated) that mapped to the 18S-5.8S-28S rRNA repeat locus (Table 1 and Supplementary Figure S5A and B). We obtained diverse CPA-sRNAs for 5.8S rRNA, many of which were supported by SAGE tags (Supplementary Figure S5C). We found 66 (46 prorated) CPA-sRNAs for the 8S rRNA locus, which has multiple copies dispersed throughout the M. oryzae genome (Table 1 and Supplementary Table S3). CPA-sRNAs associated with the 5′-end of 18S rRNA and 28S rRNA were validated by 3′-RACE analysis and sequencing (Figures 2D and 3C).
More than 10% of the M. oryzae genome consists of repetitive elements (13). We found 379 (320 prorated) CPA-sRNAs mapping to transposable elements, including 102 (42 prorated) antisense mappings (Table 1 and Supplementary Table S3). Fifty (46 prorated) CPA-sRNAs mapped specifically to the LTR region of MAGGY, a gypsy-like element linked with pathogenicity (21), which were validated by 3′-RACE analysis (Figure 2E and Supplementary Figure S6). The LTR (250nt) of MAGGY is structurally similar to retroviruses, which acts as a transcription initiator and terminator (21). We also identified a number of CPA-sRNAs that mapped to other retro-elements including MGR583 or SINE element (218; 155 prorated), PYRET (40; 29 prorated), OCCAN (11; 8 prorated) and MOLLY (5; 3 prorated), as well as to DNA transposon, POT2 (64; 20 prorated). Overall, we observed that CPA-sRNAs mapped primarily to the LTR (putative TSS and TTS) of retro-transposons while CPA-sRNAs were distributed across the entire transcript of DNA-transposons.
Only 43 (43 prorated) CPA-sRNAs aligned to the mitochondrial genome (~35kb) (Table 1; Supplementary Table S3 and Figure S7). Of these, 31 (30 prorated) mapped to rRNA genes (rrnL, large subunit ribosomal RNA and rrnS, small subunit ribosomal RNA). In contrast to nuclear genes, only 11 (8 prorated) CPA-sRNAs were found to align to the coding sequence of two mitochondrial genes (MGG_21013.6, cytochrome b and MGG_21007.6, ATP synthase subunit 6). We did not detect CPA-sRNAs for mitochondrial tRNAs except for the SeC (MGG_21117.6) and Arg tRNA (MGG_21120.6) loci. Mitochondrial transcripts are typically transcribed by a simpler RNA polymerase, homologous to the bacteriophage T7/T3 RNA polymerase subunit, as compared to the more complex nuclear RNA Pol I, II and III (22). Although we have limited knowledge of the mitochondrial RNA polymerase in filamentous fungi, our data suggest that CPA-sRNAs are biased toward nuclear RNA polymerase-derived transcripts.
Deep sequencing projects of small RNA species typically reveal many sequences that do not align to the genome and are often disregarded as sequencing errors. However, evidence now suggests that these molecules result from post-transcriptional RNA modifications (23–25). Gowda et al. (15) reported previously that a large fraction of MPSS and RL-SAGE tags did not match the M. oryzae genome. Similarly, we found that ~42% of CPA-sRNAs (10824 out of 25389) could not be aligned to the genome. Interestingly, we identified 388 CPA-sRNAs that matched only to M. oryzae ESTs but not to the genome (data not shown). In addition, 1585 CPA-sRNAs matched SAGE and/or MPSS tags but were unaligned to the genome. Further examination revealed that a small number (284) of unaligned CPA-sRNAs aligned to the genome sequence of strains P123 and Y34 (Y. Peng et al. unpublished data) suggesting that some CPA-sRNAs map to gaps in the 70–15 reference strain genome sequence. Manual inspection revealed evidence of 46 chimeric CPA-sRNAs (Supplementary Table S6), some of which were possibly derived from the fusion of RNAs from two or more non-contiguous genomic locations (3). We observed chimeric RNA fusions in head-to-head or head-to-tail orientations of the same exon, two exons of the same gene, exon–exon junctions of two genes, protein-coding region–intergenic or rRNA, intergenic–intergenic, rRNA–intergenic, rRNA–rRNA and rRNA–tRNA. Twenty-five percent of chimeric CPA-sRNAs were derived from exonic regions, where fusion points coincide with the canonical splicing sites (Supplementary Table S6). We also found evidence for short homologous sequences (SHS) for 44% of chimeric CPA-sRNAs, similar to reports of animal (26) and plant (27) chimeric RNAs. Further examination of these chimeric CPA-sRNAs revealed a high rate (96%) of non-template nucleotide additions, 20% (9 out of 45) of which had 1–3 non-template nucleotides internally at the point of RNA fusion. 
We also observed non-templated nucleotides in many genome-matched CPA-sRNAs. For example, 54% of Gln tRNA-associated CPA-sRNAs (7 out of 13) had non-templated nucleotides (G, AAC, A, C, C, T) at the 3′-end (Figure 3B). Similarly for 5.8S rRNA CPA-sRNAs, 81% (13 out of 16) and 88% (14 out of 16) had non-templated nucleotides at the 5′- and 3′-regions, respectively. Non-template sequence diversity at the 3′-end of 5.8S rRNA is supported by 26 RL-SAGE tags that matched the 3′-ends of CPA-sRNAs (15). Thus chimeric and non-templated nucleotide additions may explain why at least some CPA-sRNAs do not align to the genome.
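One simple way to flag candidate non-templated 3′ additions is sketched below. This is a hypothetical helper, not the procedure used in the study: it assumes the read (with the polyA tail already removed) and the genomic template are given in the same orientation and start at the same position, and it treats everything after the first mismatch as the candidate addition. The `max_tail` cut-off is an arbitrary guard so that an internal mismatch is not reported as a long tail.

```python
from typing import Optional


def three_prime_addition(read: str, template: str, max_tail: int = 3) -> Optional[str]:
    """Return the candidate non-templated 3' tail of a read, if any.

    Returns "" when the read is fully templated, the tail when it is at most
    `max_tail` nt long, and None when the mismatch lies too far from the 3' end
    to be explained by a short terminal addition.
    """
    read, template = read.upper(), template.upper()
    i = 0
    # Walk along the read while it agrees with the genomic template.
    while i < len(read) and i < len(template) and read[i] == template[i]:
        i += 1
    tail = read[i:]
    if len(tail) > max_tail:
        return None
    return tail
```

For instance, a read ending in AAC where the genome continues with GGG would yield the tail "AAC", comparable in kind to the short additions listed above for Gln tRNA-associated CPA-sRNAs.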
Magnaporthe oryzae has emerged as a model to study fungal pathogenesis due to its agronomic significance, genetic tractability, availability of genomic sequences and expression datasets including expressed sequence tags (EST), serial analysis of gene expression (SAGE) tags, massively parallel signature sequencing (MPSS) tags and microarray data (13–15). During the course of analyzing a full-length cDNA library, we identified CPA-sRNAs. These are a distinct class of small RNAs because they are both 5′-methylguanosine-capped and 3′-polyadenylated and are associated with TUs of RNA Pol I (rRNA), Pol II (mRNA/retrotransposons) and Pol III (snRNA/tRNA).
RNA Pol II transcripts have been intensively studied with respect to 5′- and 3′-end modifications (28). Nuclear capping occurs co-transcriptionally by adding a 5′-N-methyl guanosine with an inverted 5′–5′-triphosphate bridge to the first gene-encoded nucleotide of RNA Pol II transcripts. A polyA tail is added to the 3′-ends of RNA Pol II transcripts by post-transcriptional events. These modifications protect RNAs from nucleases, and signal for RNA export and translation. Additional capping in the cytoplasm (29) and polyadenylation linked to nonsense-mediated mRNA decay (6) have been reported in eukaryotes. Recently, small ncRNAs such as CUTs and SUTs have been described in S. cerevisiae that appear to be the products of RNA Pol II and thus are likely capped and polyadenylated (6). This suggests that research on capping and polyadenylation is far from complete and that further studies may shed light on additional functions of capping and polyadenylation of small and long transcripts.
Although the ends of RNA Pol I and III transcripts are less well defined as compared to RNA Pol II, it has been shown in yeast recently that some of these elements contain 3′-polyA tails (30). Furthermore, small RNAs (21–400nt) in animals (31,32) and plants (33) possess short stretches of adenosine (1–7nt). The snRNA U6, which is transcribed by Pol III, carries a methylguanosine cap at the 5′-end (34). In our study, we obtained CPA-sRNAs for both U6 and U2 (Figure 2F). Recently small RNAs from humans have been reported to possess cap structures at the 5′-ends (35,36), many of which were associated with the TSS. Several reports have shown that some classes of small RNA (<200nt) associate with RNA Pol II TSS and TTS (1,37,38). Our study provides detailed evidence that CPA-sRNAs are not only associated with TSS and TTS of protein-coding genes but are also associated with snRNAs, tRNAs, rRNAs and retrotransposons. qiRNAs, milRNAs and disiRNAs reported recently in N. crassa, however, do not appear to possess 5′- and 3′-modified ends (8,9).
We have no knowledge of the mechanisms of CPA-sRNA biogenesis; however, it is likely they are either derived directly from the genome or are processed from longer transcripts. In the former case, they could be the product of an uncharacterized RNA polymerase, such as Pol IV, a homolog of DNA-dependent RNA polymerase II (36,39), or Pol III (40), and then be processed, if necessary, by adding methylguanosine to the 5′-end and polyA to the 3′-end, possibly by a cytosolic mechanism (29). Alternatively, CPA-sRNAs could be derived by the action of an endo-ribonuclease/splicing complex on long mRNAs, releasing short RNA fragments, which might be spliced together and undergo addition of untemplated nucleotides before acquiring a 5′-methylguanosine cap and 3′-polyA tail.
Currently, we have no direct evidence for a biological function for CPA-sRNAs. However, a growing body of research has shown that non-protein-coding small RNAs modulate many biological processes, including chromosome replication, chromatin remodeling, transcription regulation, RNA processing and stability as well as protein stability and translocation (2,3,41). Several studies have shown that ncRNA transcription is most predominant at the promoters (or TSSs), and also occurs at intergenic regions as well as within genes. Our finding of a strong positive correlation between CPA-sRNAs and gene expression suggests CPA-sRNAs may play a positive role in gene regulation. Small RNAs complementary to promoter regions have been shown to activate gene expression in several different cellular contexts (42–44). These RNAs are typically biased toward genes with higher levels of expression (38). Similarly CUTs are also typically positively correlated with gene expression (4). It is possible that CPA-sRNAs associate with transcriptionally active regions forming RNA–DNA hybrids, which create transcriptional bubbles (nucleosome-free single-stranded DNA) at the TSS and TTS (45). At TSS, CPA-sRNAs may facilitate transcription factor and RNA polymerase binding or act as primers for RNA synthesis, whereas CPA-sRNAs at TTS may block further transcription and facilitate the release of the RNA polymerase. On the other hand, as has been pointed out for CUTs, it is possible they simply reflect inefficient start site initiation and represent ‘transcriptional noise’ (6).
Finally, we provide an explanation of why at least some small RNAs do not align with the genome sequence. While we cannot discount that many may be the result of sequencing error, our analyses reveal that some are the products of fragment fusion or contain non-templated additions at their termini or point of fusion such that they no longer align to the genome using strict criteria. We suggest that RNA editing and/or posttranscriptional modification may be involved in generating CPA-sRNA diversity, thus increasing the complexity of the small RNA transcriptome. However, their origin and significance remain to be determined. While there remains much to be learned about CPA-sRNA biogenesis and function, their discovery and characterization add another fascinating chapter in genome and RNA biology.
Supplementary Data are available at NAR Online.
Funding for open access charge: United States Department of Agriculture (Award #2005-04936 to R.A.D.); National Institutes of Health grant (CA63640 to C.H.H.).
Conflict of interest statement. None declared.
We appreciate all the members in the fungal genomics laboratory for their feedback on this project. We also thank Malathy Krishnamurthy for help editing this manuscript.