MicroRNAs (miRNAs) are endogenous small non-coding RNAs of about 20–24 nt that can be coded either by intergenic or intragenic loci.
In plants a few examples of intronic miRNAs have been found in Arabidopsis and rice. In Arabidopsis, miR402 has been identified within the intron of gene At1g77230 in the same orientation as the hosting pre-mRNA [1
]. A putative mirtron, miR1429.2, has also been identified in rice by Zhu et al. [2
] as part of a homeobox gene. In Arabidopsis the intronic miR838 is particularly interesting being located in the intron of Dicer-like 1 (Dcl1
) gene (which codes the key enzyme involved in plant miRNA maturation,[3
]), and thus enabling a self-regulatory mechanism that helps maintain DCL1 homeostasis [4
The rice miR3981 was detected in the exon of a putative glyoxalase gene and its biogenesis pathway might be involved in the regulation of glyoxalase expression. Excessive levels of glyoxalase mRNA should lead to an increase in pre-miR3981 processing by the enzyme DCL1 and with glyoxalase also being a putative target for miR3981, an increase in the amount of miRNA3981 could lead to a decrease in glyoxalase mRNA levels, thus preventing durable high expression of the gene [5
]. This suggests the possibility of a balanced competition, mediated by DCL1, between glyoxalase protein production and miR3981 biogenesis leading to a fine-tuning of glyoxalase expression in rice. The possibility that the biogenesis of exonic miRNAs destabilizes the corresponding protein-coding transcript and reduces protein synthesis makes their characterization very intriguing and suggests a new mechanism of post-transcriptional regulation of gene expression.
This work was designed to carry out the computational identification of putative exonic miRNAs in plants by taking advantage of the huge amount of ESTs available for many species. The analysis of putative miRNA target sites in the host mRNA was also performed in order to gain some insight into possible novel mechanisms of gene regulation.
TCs (Tentative Consensus), from 30 plant species, publicly available from the DFCI Gene Index Project (http://compbio.dfci.harvard.edu/tgi/
), were computationally screened for the presence of miRNA precursors and miRNA target sites (Additional file 1
detailed methods can be found as additional file). TCs hosting miRNA precursors are reported in Additional file 2
. Some of the annotated TCs refer both to a miRNA and a protein and thus represent the starting point for the identification of candidate exonic miRNAs.
The relationships between miRNAs and transcription factors (TFs) is particularly relevant, being the basis of complex regulatory networks [6
] and several examples of TF genes hosting microRNA precursor sequence were found and are reported in aAdditional file 2
: examples include AP-2, NAC and SPL coding genes. Interestingly, in some of the miRNA-containing TCs identified in this study, the miRNA host gene is also its conserved target, for example miR444, a monocot-specific miRNA identified in several monocots (barley, maize, purple false brome, rice, wheat), is part of TCs coding for MADS-box transcription factors (Additional file 2
) and this family of transcription factors is also a well-known conserved target for miR444. Additional file 3
contains the example of the barley EST called GH228935 [GenBank: GH228935], which is a MADS-box-related sequence containing both the target site for miR444b (+/−) and the precursor sequence for miR444a (+/+).
All TCs or singletons that include both a miRNA precursor sequence and one or more miRNA target sites are presented in Table . TCs that include both a miRNA target site with +/− orientation and the miRNA precursor sequence with +/+ orientation on the coding strand can be referred to those cases where the miRNA is oriented on the same strand of the target gene. The presence of the target sequence upstream or downstream of the precursor sequence could affect both host gene post-transcriptional regulation as well as miRNA regulation or self-regulation in the case that the target site belongs to the same miRNA family as the miRNA precursor. The latter mechanism could be a further buffering system for modulation of miRNA expression and could be in accordance with the model of miRNA self-regulation proposed by Meng et al. [6
] that suggests a feedback model in which the miRNA* binds to the complementary sites on their precursor thus exerting a cleavage-based modulating role. Examples of putative self-regulation based on the presence of a target site upstream or downstream of the precursor have been found in our data for miR444 as reported above, miR395 (rice), miR159/319 (soybean), miR1118 (wheat) and miR1219 (moss).
Reports those TCs or singletons that include both a miRNA precursor sequence and one or more miRNA target sites
On the other hand, those TCs where the orientation of miRNA precursor sequence with respect to coding strand is +/− should refer to those miRNAs coded in the antisense orientation with respect to the host gene. Antisense transcription associated with miRNA target mRNAs have been previously discovered in Arabidopsis [8
] and may serve as a link between miRNA and RNA silencing pathways, suggesting that miRNAs may have additional roles in post-transcriptional regulation that are independent of cleavage.
TCs or singletons coding for more than one miRNA precursor (miRNA clusters) have also been identified both in the +/+ and +/− orientations (sense/antisense miRNAs) and are summarized in Additional file 4
. The occurrence of pairs of antisense miRNAs in plants has been previously reported [9
]. Most of the miRNA clusters highlighted in plants contain several members of a specific miRNA family rather than encoding miRNA with unrelated sequences, as is frequently the case for animal miRNA clusters [10
]. Examples of plant miRNA clusters with members of different families are reported in Additional file 3
Among the exonic miRNAs identified, we chose to explore, as an interesting example, the relationship between miR1118 and calmodulin-coding genes in wheat (Table ). Calmodulin genes are of particular interest due to the fundamental role of calmodulin in the calcium-dependent regulation during plant response to endogenous and exogenous stimuli [11
]. Calmodulin-binding proteins have already been shown to be targeted by miRNAs in plants [12
], while no report is available on miRNA targeting to calmodulin itself.
Many plant species possess calmodulin (CaM) multigene families composed of several genes: some of which encode identical proteins, while others encode different isoforms of calmodulin-related proteins. In wheat, at least 13 CaM-related genes have been identified and classified into four subfamilies (SF-1:4 on the basis of their nucleotide sequence) coding for three distinct isoforms named TaCaM-I, TaCaM-II and TaCaM-III [13
]. Genes from SF-1, SF-3 and SF-4 encode for the same isoform (TaCaM-I, the most similar to calmodulin from other plant species and organisms), while SF-2, the most divergent calmodulin subfamily with most recent origin in wheat, encodes at least two isoforms (TaCaM-II and TaCaM-III).
To deeply characterize the calmodulin gene family for the presence of miRNA-coding sequences, ten cDNA clones corresponding to 10 different genes encoding calmodulin-like proteins were screened against the wheat miRNA mature sequences present in miRBase. Each cDNA clone contained at least part of the 5′UTR, the full coding region, the 3′-UTR and the polyA tail. Blast output is reported in Additional file 5
. The most significant match was found for tae-miR1118 mature sequence (score 43, e-value 2e-006), furthermore, an additional miRNA (tae-miR1125 mature sequence) showed a very significant match (score 39; e-value 2e-005). Both miRNAs were found in the two genes belonging to SF-2 (TaCaM2-1
), but not in the third sequence belonging to the same subfamily (TaCaM2-3
). Mature sequences for both miRNAs are included in the 3′UTR regions of the two calmodulin genes. Although the coding sequences of TaCaM2-1
are substantially different and produce two distinguishable calmodulin isoforms (TaCaM-III and TaCaM-II, respectively), the corresponding 3′UTR regions differ only by one nucleotide. The nucleotide sequence alignment between the 3′ UTR regions of the three calmodulin genes belonging to SF-2 revealed a striking difference between TaCaM2-3
and the other two genes: TaCaM2-1
share a long insertion of 239 bp that is absent in TaCaM2-3
(Figure and Additional file 6
). A Blast search against the complete TREP database, containing a collection of repetitive DNA sequences from different Triticeae
species, highlighted a strong match (score 315, e-value 1e-86) of the inserted sequence with a DNA transposon of the Mariner class called DTT_Polyphemus_464G14-3. This finding is in agreement with a possible model for the evolution of MIR genes that proposes their origin from transposable elements, on the basis of the observation that the DNA-type nonautonomous elements called MITE (miniature inverted-repeat transposable element) can fold into imperfect stem loops that are typical of miRNA precursors [3
]. Several pieces of evidence pointed to plant mariner
-like elements (MLEs) as the autonomous partners of the nonautonomous Stowaway
Reports the positions of miR1118 and miR1125 coding sequences inside wheat SF-2 calmodulin genes (TaCaM2-1,TaCaM2-2,TaCaM2-3).
The presence of the transposon-related insertion in calmodulin genes from other plant species has was investigated, and a blast search using the sequence of TaCaM2-2
gene as query was performed. The results are reported in Additional file 7
: a significant match with the 3′UTR region of the wheat gene was detected only with a barley gene, where, however, the transposon-related sequence was absent.
Moreover, while the miR1125 precursor sequence deposited in the miRBase found a significant match (81% identity) entirely included in the inserted sequence, the miR1118 precursors sequence matched (100% identity) only partially into the insertion while it spanned also the very last part of the 3′UTR of the TaCaM
genes (Figure ). The reverse complement sequence for miR1118 found inside the 239 nt insertion could thus represent a target site for the same miRNA. This is very possible since only 3 mismatches were found when compared with the mature sequence, and the target site is also predicted by psRNAtarget. A second reverse complement site is also shared by all three SF-2 genes (TaCaM2-1
), being located in the last part of the 3′ UTR region. This site corresponds to the miRNA* of the precursor deposited in the miRBase and it can be considered a less likely target site due to the high number of mismatches with the mature sequence. However, the prediction of the RNA secondary structures by mFold software (Additional file 8
) highlighted the worse energetic stability of the miR1118 precursor stored in miRBase compared to that of the inserted sequence (−32.5 kcal/mol vs.−117.6 kcal/mol). Although the transposable element is longer than the deposited precursor (239 nt vs. 140 nt) it has a Minimal Folding Energy Index (MFEI) higher than the miRNA precursor (1.16 vs. 0.59) and it is in better agreement with the values known for plant miRNAs. The existence of two alternative miR1118 precursors could thus be possible. The inserted sequence has acquired some mutations since the insertion event and these variations have led to a better stability of the candidate miRNA precursor, thus confirming this hypothesis. In fact, compared with the inserted sequence, the predicted secondary structure of the transposable element stored in the TREP database shows a worse stability (−84.4 kcal/mol), a lower MFEI (0.88) and a higher number of mismatches (4) between miRNA and miRNA*.