In the previous description of 8.4 fold coverage assembly of the grapevine (Vitis vinifera
L.) genome [26
] we identified 164 candidate conserved miRNAs. Here we present a comprehensive characterization of conserved miRNAs in grapevine including a refinement of mature miRNA sequences and important information regarding pattern of expression of both mature miRNAs and precursors. Three complementary approaches were followed to characterize the expression pattern of both the mature miRNAs and their precursors. To allow a comparison between methods, all technologies were applied to leaf tissue. Whole transcriptome deep sequencing was performed on all tissues available from the highly homozygote sequenced clone PN40024. To maximize the coverage of tissues studied, microarray and 454 transcriptomic analyses were performed on berries - organs of particular agronomic importance - from other clones as they cannot be easily obtained from the very weak PN40024 clone.
Comparative prediction of microRNA precursors
We previously used the MicroHarvester software [13
] with all plant miRNAs present in release 9.1 of miRBase [27
] to identify 164 candidate conserved miRNAs and their precursors in the 8.4 fold coverage assembly of the genome of the grapevine (Vitis vinifera
]. Manual refinement of these predictions provided 140 high-confidence candidate pre-miRNAs classified in 28 conserved families (79 unique predicted mature microRNA sequences).
For the most part, we confirm existing patterns of miRNA family conservation with respect to completely sequenced plant genomes for which extensive analyses of miRNAs have been performed (Arabidopsis thaliana
, Populus trichocarpa
, Oryza sativa
and Physcomitrella patens
). Of the 28 families for which we identified putative precursor sequences in Vitis vinifera
(Table ), 9 are represented in all four of these species and a further 10 have been characterized in all three magnoliophytes. One family (miR403) is present in both of the previously sequenced Dicots, while two previously Arabidopsis-specific families (miR828 and miR845) were predicted in grapevine (suggesting their loss or - as yet - undetected presence in poplar), while miR479 and miR482 (annotated only in poplar and grapevine) are likely to have evolved in a common ancestor of these organisms after its divergence from the Arabidopsis lineage [26
]. miR477 precursors have been characterized only in poplar, grapevine and P. patens
, while a series of miR535 precursors represent the first members of this family to be identified in core eudicots, having been identified only in P. patens
, rice and more recently in the California poppy (Eschscholzia californica
]. The only families tested for which members have been identified in poplar and at least one other of the aforementioned genomes but for which microHarvester failed to identify candidate precursors in the grapevine genome were miR472, miR530 and miR827.
With respect to the reference annotation of protein coding genes in the Vitis vinifera genome, 127 putative pre-miRNAs were intergenic in location (17 overlapped with annotated genes but on the non-coding strand). Four precursor predictions fell within or overlapped annotated coding or UTR exons although homology searches and transcriptomics data generated subsequently to the initial annotations call into question the validity of all but two of these exon annotations. miRNA 156 h is probably an incorrect prediction derived from a coincidentally plausible hairpin structure formed by the opposite strand to the presumed target (a Squamosa-promoter Binding Protein (SBP) box gene). A similar situation is observed for miR171g which falls on the opposite strand to to a GRAS domain transcription factor gene. Nine precursor predictions were apparently intronic in location. Manual checks of the automated annotations suggested that all of the introns putatively containing pre-miRNAs were likely to be erroneous predictions, being atypically long (over 13 kb) and interrupting putative retroelement derived genes or obvious fusion gene predictions (not shown).
Deep sequencing of small RNAs from grapevine leaf tissue
We generated 13,078,222 reads with Illumina sequencing of small RNA isolated from Vitis vinifera
L. clone PN40024 leaves. 2,585,821 individual small RNA reads of 18-27 bases (19.8% of the total reads generated) yielded at least one perfect match to the draft genome after removal of adapter sequences and allowing for post transcriptional oligoadenylation of reads. After exclusion of reads mapping to annotated structural RNAs, Over 7% of the total mapped sequences were of length 21 bases and accounted for 7.8% of the genomic loci represented by the mapped data (mean redundancy of 4.38 reads/locus). 15% of loci represented were of length 24 (10.7% of tags sequenced) with a mean redundancy of 3.08 reads/locus, suggesting, in accord with other studies [29
], that miRNAs in our sample tend to be expressed at higher levels or processed more specifically than the more heterogeneous 24 base small RNAs.
Mapping of the short tags onto the genome sequence revealed that of the 28 families predicted by our comparative analysis, 23 showed at least one sequence tag either in exact or very close correspondence to the position of one of the predicted mature sequences (the exceptions being miR395, miR396, miR477, miR828 and miR845). In some cases, the most commonly observed sequences were identical to the predicted mature sequences while for other families, the predominant mature miRNA sequenced exhibited small variations (shifts or differences of length of one or two bases) with respect to the predicted mature sequences. This finding was not unexpected given the variation in mature miRNA lengths within families observed in other plant species and the nature of the comparative method used to generate the initial predictions. For predicted precursors for which matching small RNA reads were recovered, the vast majority of reads conform to the sequences indicated in Table [see also Additional File 1
: Supplemental figure S2], consistent with the primary requirement for the annotation of plant miRNA sequences [31
We recovered a number of reads that include an additional 3' base that does not correspond to any genomic locus, this tendency has been observed in other species (e.g. [29
]). Furthermore, a low but notable proportion of mapped small RNA sequences show mismatches to the genomic sequence while preferentially mapping only to putative miRNA precursor loci. This is probably due to errors in sequencing or during reverse transcription or amplification and particularly to the higher error rate of the Illumina sequencing strategy in GC rich sequences [32
]. During the course of these analyses it became clear that two precursors (miR172a and 172b) were likely to derive from the opposite strand from that initially predicted. Where appropriate, corrected mature sequences have been deposited in miRBase.
Due to identity or similarity among mature miRNAs belonging to the same family, deep sequencing of small RNAs does not allow consistent unambiguous assignment of mature miRNAs to their genomic loci of origin. Nevertheless our analysis provided indications of the presence and relative abundance of 36 distinct mature miRNAs from conserved families in leaves - corresponding to up to 65 distinct precursor loci (see Table for summary and Additional File 1
: Supplemental figure S2 for detailed maps of all small RNA reads mapping to predicted precursors).
A 12 K CombiMatrix custom array was developed to validate our in-silico miRNA predictions and to profile miRNAs expression in different tissues.
Slides were hybridized with low molecular weight RNA (LMW-RNA), extracted from six grapevine (V. vinifera L. cv Corvina) tissues: ripening berries (three stages analyzed), roots, leaves and young inflorescences. Each hybridization and LMW-RNA extraction was performed twice.
In addition to the mature miRNA sequences, the probe set included probes shifted 5 or 10 bases 3' or 5' with respect to the central base of the corresponding mature miRNAs as well as probes derived from regions of the stem not predicted to overlap with the mature miRNA sequence and controls containing maximally destabilizing substitutions with respect to probe sequences [see Additional File 2
: Supplemental figure S5]. Except for probes shifted 5 nucleotides towards the 5' end of the miRNA precursor, for more than 90% of the probes a signal drop-off greater than 90% was observed - indicating no significant hybridization for these probes occurred. On the contrary, for probes shifted 5 nucleotide towards the 5' end of the miRNA precursor the lack of signal drop-off might be due to the fact that probes were synthesized with their 3' termini towards the slide, and that no "spacer" oligonucleotide was used (according to CombiMatrix protocols). As a consequence, steric effects might reduce the specificity determined by the 3'-most five bases of the probes.
Other than for 26 out of 140 pre-miRNAs (Table ), no detectable signals were recorded for the probes designed on the precursor loop regions - likely due to size fractionation of RNA samples and the relatively short half-life of pre-miRNAs. We conclude that our miRNA expression data are principally derived from mature miRNAs molecules, without appreciable pre-miRNA contamination.
Finally, it should be noted that recent studies have demonstrated appreciable levels of cross-hybridization between closely related miRNAs and probes differing by only one or two bases [33
]. It is therefore difficult to exclude the possibility that cross-hybridization within miRNA families causes a distortion of quantitative estimates of expression levels of some individual mature miRNA sequences.
Microarray analysis of miRNA expression
Of the mature miRNA sequences considered, 56 (corresponding to 23 different families), showed significant expression in at least one tissue tested (Table and Additional File 1
: Supplemental Figure S1), and another 6 showed a borderline signal. Specifically, 41 different miRNAs showed significant signal in roots, 47 in leaves, 49 in young inflorescences, 53 in green berries, 42 in berries at veraison (the point where growth ends and maturation begins) and 40 in mature berries.
To evaluate the statistical significance of the differential expression of mature miRNAs in the six tissue considered, we set up two distinct comparisons: one among the three developmental stages of the ripening berries and the other one among leaves, roots and inflorescences. ANOVA analyses were performed with a P-value threshold of 0.05 and subsequently a Scheffè test was used to assess which of the three tissues showed significant differences. Thirteen different mature miRNAs showed a statistically significant change in signal between the ripening stages of the berry (Figure ), and 27 miRNAs showed significant changes in their expression when comparing three different tissues (leaves, roots and inflorescences)(Figure ). miR395a and miR171h show a distinctive pattern of expression - being highly expressed at veraison with respect to the other two stages (4.4 and 2.3 fold changes of expression level respectively) (Figure ). Seven miRNAs (miR156f, miR169a, miR169f, miR169r, miR169x, miR319b and miR535a) are more expressed in mature berries than in green berries (Figure ). Four miRNAs (miR171c, miR172c, miR396c, miR403a) are, on the contrary, more expressed in green berries, their expression decreasing during ripening (Figure ).
Figure 2 Differential expression of mature miRNAs by tissue. miRNAs showing significant changes in expression by tissue are reported. Panels A-C: miRNAs differentially expressed in one stage of berry ripening A: at veraison, B: in green berries, C: in mature berries. (more ...)
Clear patterns also emerge from analyses of differential expression between roots, leaves and young inflorescences. Thirteen miRNAs are significantly differentially expressed in roots, showing a similar expression in the other tissues. In particular miR397a, miR398b and miR408 all show at least 100 fold higher expression in root than either leaf or early inflorescences, while miR159a, miR160a, miR399a, miR399b, miR403a and miR535 show more modest, but still significant, changes in the same comparisons (Figure ). On the contrary miR164a, miR164b, miR171c and miR172c show a significantly lower level of expression in roots (Figure ).
Five miRNAs (miR169v, miR169y, miR171f, miR171h and miR319b) yield significantly higher signals in young inflorescences than both leaves and roots (between 2 and 7.2 fold higher levels in this tissue)(Figure ). Only one miRNA, miR160c, shows a leaf-specific expression profile (2.5 fold lower level in leaves with respect to other tissues) (Figure ). Finally, six miRNAs (miR169a, miR169e, miR169f, miR169x, miR171e and miR395a) exhibit significant differences in expression levels in all comparisons between leaf, root and inflorescences (Figure ). Five of these miRNAs (169a, 169e, 169f, 169x and 171e) show the highest expression in young inflorescences and the lowest in roots.
Following the widespread assumption that many miRNA/target interactions are conserved between related species [2
], our data regarding differential expression of mature miRNA sequences raise some intriguing possibilities particularly with respect to the potential importance of miRNA in the regulation of fruit maturation.
Li et al. [34
] recently showed that the transcription factor NFYA5 is targeted by miR169 and that overexpression of miR169 leads to excessive water loss through leaves and hypersensitivity to drought stress in A. thaliana
. In this light, the preponderance of miR169 family members in the group of miRNAs upregulated in mature berries is striking and might reflect a mechanism to protect maturing fruit from dehydration. We also note that miR535 family, identified so far only in O. sativa
and P. patens
] is upregulated during berry maturation. This is a first indication of a possible function of miR535 for which no information was previously available. miR396c shows 6 fold decrease in expression during ripening. The mir396 family targets seven Growth Regulating Factor
) genes in Arabidopsis [14
genes encode putative transcription factors associated with cell expansion in leaf and other tissues in A. thaliana
and O. sativa
]. A potential role for miR396 in the regulation of cell expansion during fruit maturation is an intriguing hypothesis. In addition, recent data also link miR396 to responses to abiotic stresses including drought [38
], again suggesting the importance of water homeostasis during berry ripening. miR172, downregulated during berry maturation, targets Apetala 2
(AP2) -like transcription factors, regulators of flowering time, organ identity and of vegetative phase change [39
]. In grapevine, genes related to AP2 are upregulated at veraison, being involved in berry maturation [40
] and putatively connected with abiotic and biotic stress resistance. This evidence fits well with our findings. The sharp up-regulation of miR395 at veraison suggests a further role for miRNAs in an agronomically important aspect of grape maturation. miR395 is known to contribute to the regulation of sulfur metabolism, targeting both sulfate transporters and ATP sulphurylase genes. A direct connection between ATP sulfurylases and berry maturation has not been demonstrated, but it is known that a Glutathione S-transferase is strongly connected with berry ripening and in particular with coloration during berry development [40
miR397a, miR398b and miR408 which are extremely highly expressed in root tissues target various copper proteins: plantacyanin, laccases and a superoxide dismutase, all putatively involved in stress responses and lignification [14
]. These miRNAs have also been shown to be coexpressed in Arabidopsis under conditions of copper deprivation [43
]. Moreover some laccase
genes in Arabidopsis are root specific (for example AtLAC15
) or mostly expressed in roots [44
] and are involved in root elongation and lignification [45
]. Given that grapevine roots are much more lignified than those of Arabidopsis, it is plausible that regulation of laccase expression is vital in the grapevine. It is interesting to note that the laccase family is, along with other polyphenol oxidase gene families, massively expanded in grapevine with respect to Arabidopsis (>60 genes in V. vinifera
, 17 in Arabidopsis).
Whole transcriptome sequencing and differential expression of precursors
The majority of plant miRNA genes are transcribed by RNA polymerase II and result in the production of polyadenylated primary transcripts [8
]. A strict correlation between expression levels of individual precursors and levels of mature miRNAs should not be expected. Mature miRNAs are likely to be, in general, more stable than their corresponding primary transcripts and may derive from more than one genomic locus. Furthermore, recent data in plants [46
] and animals [47
] suggest that a variety of mechanisms, including alternative splicing and the specific binding of protein factors, can regulate the efficiency with which pri- or pre-miRNAs are processed. High levels of primary transcript can thus be associated with low levels of mature miRNAs and vice-versa. These considerations notwithstanding, it is reasonable to presume that sequences derived from highly expressed pri-miRNA transcripts should be represented in whole transcriptome "deep sequencing" experiments.
To investigate this hypothesis, we have analyzed whole polyA+
transcriptome data generated with the Illumina Solexa technology [48
] and Roche 454 next generation sequencing platforms.
A total of 135,047,735 Illumina sequences (33-35 bases in length) derived from polyA+ RNA isolated from 4 tissues (in vitro cultivated juvenile leaf (29,829,113 sequences), in vitro cultivated juvenile stem (30,785,175 sequences), in vitro cultivated juvenile root (29,254,635 sequences) and embryonic callus (45,178,812 sequences) were mapped to the grapevine genome and coordinates compared to those for predicted pre-miRNAs.
The statistical significance of the number of reads mapping within a predicted pre-miRNA was evaluated (see Materials and Methods) and 52 predicted precursors show significant expression in at least one tissue (25 in leaf, 38 in stem, 17 in root, 33 in callus)(Table ). Many predicted precursors show a wide expression (miR156d, miR159c, miR166a and c, miR168, miR171a, miR398a, miR398b and c, miR408, miR482). In some families, when expressed, precursors show overlapping patterns. For example, miR319c, miR319e and miR319f are all expressed in stem, while miR319c and miR319g are expressed in callus, no expression of miR319 was detected in leaf or root. A similar situation is observed for the miR396 family. In other cases, different precursors seem to be predominantly expressed in different tissues. For example miR171e transcripts are detected only in callus, miR171f is only transcribed in stem while miR171g is observed in callus and root - a similar situation can be observed for several families including miR166, miR167 and miR169). These data suggest that tissue specific expression of different precursors within single families is widespread in the grapevine.
454 sequencing generated 613,098 and 581,655 reads respectively from leaf and berry polyA+ RNA. The expression of 15 unique predicted precursor sequences received ulterior support from these data (Table ). With the exception of miR160b and the miR535 family the expression of all precursors detected by 454 sequencing in leaf was also strongly supported by Illumina data. Interestingly, given the lack of detectable expression of the mature sequence in leaf or berry, miR482 precursors were detected at high levels both by Illumina and 454 sequencing, suggesting post-transcriptional regulation of processing of this transcript.
Estimation of primary microRNA transcripts and splice sites
For a number of predicted microRNAs the density of coverage of the corresponding genomic loci was sufficient to attempt to estimate primary transcript coordinates as well as patterns of splicing and alternative splicing.
We constructed Position Specific Scoring Matrices (PSSMs) of experimentally validated grapevine canonical splice donor and acceptor contexts (French-Italian Consortium for Characterization of the Grapevine Genome, unpublished data) and used these matrices to evaluate all possible canonical splice donors and acceptors from 3 kb upstream to 3 kb downstream of predicted microRNA precursors showing extensive coverage by Illumina RNA-Seq reads. The positions and flanking exonic sequences of all possible splice donor/acceptor pairs were used to combinatorially generate possible splice junctions. RNA-Seq reads which did not map perfectly to the genome sequence were compared to these computationally generated splice junctions and pairs providing perfect matches (with at least 8 bases on either side of the splice junction) were recorded along with the tissue distribution of reads supporting each splice event. Additionally, for each supported splice event, we recorded the ratio of base coverage by RNA-Seq of the flanking 40 putatively exonic bases and the coverage of flanking 40 putatively intronic bases (not including reads previously identified as covering the putative splice junction). Three fold greater coverage of exonic regions was considered as additional support for the presence of a functional splice junction. Introns inferred from mapping of 454 transcriptome reads were also recorded.
Visual examination of RNA-Seq coverage of regions upstream and downstream of miRNA precursor loci was used to provide initial estimates of transcript start and end positions. This step was complicated by the known propensity of RNA-Seq to provide uneven coverage of transcript termini - presumably due to the dynamics the nebulization step in sample preparation and to issues associated with differential recovery of fragments during sample preparation. Accordingly, we subjected the 6 kb interval centered on predicted precursors to promoter prediction analysis by TSSP-TCM [49
] in order to attempt to provide support for manually identified transcription start sites. Estimated transcript coordinates, putative intron coordinates quality scores for each donor/acceptor, frequencies of splice junction-covering reads, and TSS proposed by TSSP-TCM are reported in Additional File 1
: Supplemental Table S4.
Figure shows the transcriptional landscape for the genomic region from 3 Kbp upstream to 3 Kbp downstream of three exemplar predicted miRNA precursors, including introns inferred from 454 and Illumina sequence data, the concordance of splicing events identified by 454 and Illumina reads is notable and consistent with the reliability of the Illumina data to infer splicing events. Detailed genomic alignments of all reads supporting splices indicated in Figure are available in Additional File 1
: Supplemental Figure S4. We note that relative numbers of tags representing different regions of putative primary miRNA transcripts vary but tend to be consistent between different tissues. The GC content of 100 base windows centered on each genomic position are also shown and illustrate, within the proposed primary transcripts, an apparent correspondence between depth of coverage and GC content [32
Figure 3 Transcription and splicing of pri-miRNAs in Vitis vinifera. A summary of transcription of genomic loci containing predicted pre-miRNAs is provided. Illumina whole transcriptome reads per base are reported for four tissues as log(number of reads/expected (more ...)
Figure shows the transcriptional context of miR394b and the presence of a canonical intron supported by 14 Illumina reads (7 distinct sequences). This intron was also easily detectable through RACE experiments (see Additional File 1
: Supplemental Table S3). We note that the position of the intron corresponds well to a region of low, or undetectable levels of Illumina transcriptome coverage, and that tissue specific differences in Illumina reads mapping to this region are quite apparent. These data suggest that our approach is capable of identifying introns in pri-miRNA transcripts and differences in steady state levels of pri-miRNA transcripts between tissues. Additionally, 3' RACE experiments indicated a transcript 3' end within 20 bp of the position predicted from RNA-Seq read coverage (see Additional File 1
: Supplemental Table S3).
The miR162 precursor (Figure ) is of particular interest in that it covers a region including several potential canonical introns that are supported by multiple Illumina and 454 reads. All tissues indicate that the transcriptional start site falls between positions 4,714,680 - 4,714,687 on chromosome 17. The postulated canonical introns imply alternative splicing of the nascent primary transcript from this locus, as does the coverage of the region by 454 contigs (whose map positions are also consistent with the Illumina data with respect to the overall coordinates of the nascent primary transcript). Several of these introns, and the alternative splicing of this transcript were also supported by preliminary RACE experiments (see Additional File 1
: Supplemental Table S3). Indeed, while the boundaries of proposed introns correspond to "shoulders" of falling transcript coverage, significant levels of reads mapping within the introns are observed. Interestingly, Hirsch et al. [46
] recently demonstrated that the primary miR162a transcript of Arabidopsis is subjected to complex pattern of alternative splicing, similar to that proposed for the grapevine miR162 transcript. In Arabidopsis, only unspliced isoforms are capable of yielding mature miRNAs. Our findings therefore suggest conservation of alternative splicing as a key regulatory mechanism in miR162 expression and indicate that Illumina and 454 transcript data can also be used to identify alternatively spliced plant pri-miRNAs.
Figure shows evidence for expression of the miR168 locus. Analogously to miR162, our data suggest alternative splicing of the pri-mRNA, while the distribution of 454 contigs is highly consistent with the Illumina data. Vaucheret et al. [50
] showed that AGO1, the target of miR168 is involved in the regulation of miR168 stability. Our data may hint at yet another mechanism of regulation of this intriguing miRNA.
Of 25 precursor loci chosen on the basis of extensive RNA-Seq coverage (see Additional File 1
: Supplemental Table S3), 18 showed evidence of transcript splicing and 8 of alternative splicing, suggesting that post-transcriptional modification of miRNA transcripts is likely to be widespread. It is possible that some splicing events frequently identified by deep sequencing approaches could be associated with regulation of downstream processing of transcripts as has been shown for the miR162 transcript of Arabidopsis [46
]. For miR162 and miR168, this hypothesis might be consistent with the low levels of mature microRNA observed by deep-sequencing, in contrast to the apparently high spliced transcript levels. For several pre/pri-miRNA loci (notably miR162 and miR168) we infer several closely related canonical introns (shared splice donors with splice acceptor sites shifted by a few tens of bases or vice-versa). We speculate that this phenomenon might be due, in part, to the incapacity of the Nonsense Mediated Decay pathway (which is dependent on ribosomal scanning of mRNAs [51
] to monitor "erroneous" splicing of non-coding transcripts.
The estimation of primary transcript coordinates, and in particular transcription start sites is a critical step towards the elucidation of specific mechanisms regulating the expression of miRNAs at the transcriptional level. Our Illumina transcript reads are non-directional - it is not possible to establish from which strand of the genome a transcript is derived. However, we show elsewhere that both concomitant transcription of both genomic strands at single loci and transcription of intergenic regions are rare in grapevine (French-Italian Consortium for Characterization of the Grapevine Genome, unpublished data). Thus, evidence of transcription of intergenic pre-miRNAs can reasonably be considered as validation of transcription of the precursor.
The finding that relative depth of coverage of different regions of primary transcripts is consistent between tissues suggest the presence of systematic biases in either the procedure used to fragment the cDNA, in amplification of fragments for sequencing, or in sequencing efficiency. Dohm et al. [32
] observed a strong relationship between local GC content and depth of coverage with Illumina genome resequencing. Indeed, we observe a relationship between local GC content and depth of coverage - even within regions that show contiguous coverage and are unlikely to represent introns (correlation between log coverage for positions represented by at least one sequence and GC content of 100 base window centered on that position for all bases within 3 kb of a predicted precursor is >0.25 for all tissues, p = 0). However, grapevine introns between both coding and non-coding exons show a low GC content (34.7 and 32.3% respectively) with respect to coding and non-coding exons (44 and 37.3% respectively) [26
]. Thus, it may be difficult to differentiate between introns and regions where low coverage is a result of low GC content in exonic regions on the basis of Illumina transcriptome data - particularly where levels of template are likely to be low and a-priori
gene models are not available. However, the discovery that putative splice junctions for pri-miRNAs can be identified by discontiguous mapping of illumina reads may help to ameliorate this problem for plant pri-miRNAs. The fact that we recovered evidence of alternative splicing of miR162, is consistent with data from A. thaliana
] and validates our basic approach. Indeed, other putative pri-miRNAs, including miR394b show evidence of splicing from both transcript coverage and discontiguous mapping of whole transcriptome reads.