Diatoms, which are important planktons widespread in various aquatic environments, are believed to play a vital role in primary production as well as silica cycling. The genomes of the pennate diatom Phaeodactylum tricornutum and the centric diatom Thalassiosira pseudonana have been sequenced, revealing some characteristics of the diatoms' mosaic genome as well as some features of their fatty acid metabolism and urea cycle, and indicating their unusual properties. To identify microRNAs (miRNAs) from P. tricornutum and to study their probable roles in nitrogen and silicon metabolism, we constructed and sequenced small RNA (sRNA) libraries from P. tricornutum under normal (PT1), nitrogen-limited (PT2) and silicon-limited (PT3) conditions.
A total of 13 miRNAs were identified. They were probable P. tricornutum-specific novel miRNAs. These miRNAs were sequenced from P. tricornutum under normal, nitrogen-limited and/or silicon-limited conditions, and their potential targets were involved in various processes, such as signal transduction, protein amino acid phosphorylation, fatty acid biosynthetic process, regulation of transcription and so on.
Our results indicated that P. tricornutum contained novel miRNAs that have no identifiable homologs in other organisms and that might play important regulatory roles in P. tricornutum metabolism.
Diatoms are important plankton that are believed to be responsible for one-fifth of the primary productivity on Earth [1,2]. There are two major classes of diatoms, the pennates and the centrics. With their vital role in silica cycling [3,4], their unusual evolutionary position of secondary endosymbiotic origin [5-9], the presence of C4 photosynthesis in some species, and their potential as sources of biodiesel fuel, diatoms have attracted increasing attention. As early as 2002, Scala et al. analyzed EST (expressed sequence tag) data of the pennate diatom Phaeodactylum tricornutum and found that some of its genes were more similar to those of animals than to those of photosynthetic counterparts, implying an unusual evolutionary history. The genomes of P. tricornutum and the centric diatom Thalassiosira pseudonana have been sequenced, shedding light on significant features of diatom genomes, including a mosaic genome that contains 'animal-like', 'plant-like' and 'bacteria-like' genes, fatty acid metabolism in both peroxisomes and mitochondria, and the presence of the enzymes necessary for a complete urea cycle [7,13,14]. These characteristics prompted us to hypothesize that the gene expression regulators (e.g. miRNAs) of diatoms may differ in specificity from those of other photosynthetic organisms.
miRNAs are important post-transcriptional regulators. They regulate gene expression in eukaryotes by targeting mRNAs for translational repression or cleavage [15-17]. It is believed that miRNAs exist extensively in eukaryotes such as animals and plants with high conservation in each kingdom [18,19]. The expression of miRNAs has a spatio-temporal pattern [15,17,20-22] and they influence the transcription and translation of many genes . Generally, their functions involve various processes, including developmental patterning, organ separation, cell differentiation and proliferation, tumor generation, cell death and cell apoptosis, stress resistance, auxin response, fat metabolism and miRNA biogenesis . In higher plants and animals, miRNAs have been extensively studied but rarely so in algae.
P. tricornutum is an atypical diatom with a weakly silicified outer shell, and the unusual property of being pleiomorphic with three convertible morphotypes  (i.e. oval, fusiform and triradiate), and silicification essentially restricted to one valve of the oval cells [24-28]. With its characteristics of short life-cycle, small genome size and ease of transformation, P. tricornutum has become an attractive photosynthetic model [12,14,29,30]. Additionally, being rich in polyunsaturated fatty acid (PUFA), especially in eicosapentaenoic acid (EPA), P. tricornutum has been used as a food organism and is considered a potential source of EPA. There have been many studies investigating the factors affecting its cell composition [31-34]. There were reports that microalgae accumulated lipids under nitrogen-limited as well as silicon-limited conditions [35,36], with similar studies conducted on P. tricornutum [33,34]. Accumulation of lipids in cells and a significant change in fatty acid composition were observed in P. tricornutum under low nitrogen conditions. Using suppression subtractive hybridization technology, Tang et al. separated a number of upregulated genes from P. tricornutum under nitrogen starvation, seven of which had high similarity with functional genes related to nitrogen utilization . Studies of lipid metabolism of P. tricornutum under silicon-limited conditions are scarce. Notwithstanding, Sapriel et al. identified 223 genes regulated by silicic acid availability, including 13 upregulated and 210 downregulated genes, from P. tricornutum under silicon-limited conditions . Interestingly, they also observed some upregulated genes coding for transporters of metabolites related to nitrogen assimilation and transfer from P. tricornutum in the complete medium compared to silicon-limited conditions. A previous study on T. pseudonana showed that a glutamate acetyltransferase was involved in silicon metabolism . How are these genes regulated? 
Do miRNAs play a role in P. tricornutum nitrogen and silicon metabolism? There have been few studies that address these questions.
In the present study, we constructed small RNA (sRNA) libraries from P. tricornutum under normal, nitrogen-limited and silicon-limited conditions and then used high-throughput Solexa technology to deeply sequence the sRNAs. The sequencing data were analyzed and miRNAs were identified from all samples studied.
To determine the likely roles of miRNAs in nitrogen and silicon metabolism in P. tricornutum, we constructed and sequenced small RNA libraries from P. tricornutum grown in normal (PT1), nitrogen-free (PT2) and silicon-free (PT3) media, respectively. After removing adaptor sequences and filtering out low-quality data (see Additional file 1 for a flow chart of the read-processing procedure), we obtained small RNAs with a size range of 10-30 nt, enriched at 20-22 nt (Figure 1). After removing sequences shorter than 18 nt, we obtained 8 924 476, 5 609 466 and 6 982 282 total sequences, representing 718 770, 596 498 and 672 323 unique, although sometimes partially overlapping, clean reads from PT1, PT2 and PT3, respectively (Table 1). Of these unique sequences, about 73% (521 761), 74% (441 959) and 73% (491 748) were sequenced only once. In PT1, PT2 and PT3, respectively, 4 105 629, 2 492 000 and 2 908 127 total sequences (221 523, 262 038 and 250 371 unique sequences) had at least one perfect match in the P. tricornutum nuclear genome, whereas 3 076 974, 1 503 395 and 2 410 100 total sequences (68 048, 43 151 and 55 321 unique sequences) matched the chloroplast genome (Table 1). Unexpectedly, a majority of sRNAs were located on the minus strand of chromosome 13 and on both strands of the chloroplast genome (Figure 2). The usual preference for a U at the 5' end of plant small RNA sequences was not observed (see Additional file 2 for the nucleotide bias of redundant small RNAs at each position). The four bases appeared equally often at each position.
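The read bookkeeping described above (length filter, collapsing to unique sequences, counting sequences seen only once) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `summarize_reads` and the toy reads are hypothetical, and only the 18-nt minimum length and the PT1 counts are taken from the text.

```python
from collections import Counter


def summarize_reads(reads, min_len=18):
    """Collapse reads into unique sequences and report simple library counts."""
    # Keep only reads of at least min_len nt (the text discards reads < 18 nt).
    kept = [r for r in reads if len(r) >= min_len]
    counts = Counter(kept)  # unique sequence -> number of times it was read
    singletons = sum(1 for n in counts.values() if n == 1)
    return {
        "total": len(kept),
        "unique": len(counts),
        "singletons": singletons,
        "singleton_fraction": singletons / len(counts) if counts else 0.0,
    }


# Toy input (made-up sequences, not data from the study).
toy_reads = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGT",
    "TTGGCCAATTGGCCAATTGGCC",
    "ACGT",  # shorter than 18 nt, so it is dropped
]
stats = summarize_reads(toy_reads)

# Singleton share reported for PT1: 521 761 of 718 770 unique sequences.
pt1_singleton_pct = 100 * 521761 / 718770
```

Applied to the PT1 counts, the same ratio gives roughly 73%, in line with the percentage quoted in the text.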
All clean reads were annotated according to their identity with non-coding RNAs (Rfam, GenBank), plant miRNAs (miRBase), exons and introns (P. tricornutum genome) and siRNAs (Table 2 and Additional file 3). When an sRNA mapped to more than one category, the following priority rule was adopted: rRNA etc. (in which GenBank > Rfam) > known miRNA > exon > intron. Degraded rRNA fragments were the most abundant sequences retrieved from the P. tricornutum total sRNA pools, with the highest read frequency of all small RNA classes in all samples: 62.53, 48.29 and 54.96% for PT1, PT2 and PT3, respectively (Table 2 and Additional file 3). In the unique sRNA pools, however, non-annotated sRNAs represented a significant part, with 50.53, 50.61 and 54.98% in PT1, PT2 and PT3, respectively. Homologs of known plant miRNAs accounted for approximately 0.5% of the unique sequences in all three samples, whereas in the total sequence pools they accounted for approximately 0.6% in PT2 and PT3 and only 0.4% in PT1. sRNAs mapped to exons and introns in either the sense or antisense direction also represented a considerable part. The remaining sRNAs were snRNA, snoRNA and tRNA. Analysis of common and specific sequences showed that only approximately 15% of the unique sequences were shared by any two samples (Table 3 and Additional file 4), suggesting a diverse set of endogenous small RNAs in P. tricornutum.
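The priority rule for reads that hit more than one category can be sketched as a simple ordered lookup. This is an illustrative sketch only: the category labels and the function `assign_category` are hypothetical names, and the ordering is the one stated in the text (GenBank before Rfam for rRNA etc., then known miRNA, exon, intron).

```python
# Highest priority first; labels are placeholders chosen for this sketch.
PRIORITY = [
    "ncRNA_GenBank",  # rRNA etc. annotated via GenBank
    "ncRNA_Rfam",     # rRNA etc. annotated via Rfam
    "known_miRNA",
    "exon",
    "intron",
]


def assign_category(hits):
    """Return the single category for a read given the set of categories it hit."""
    for category in PRIORITY:
        if category in hits:
            return category
    return "unannotated"


# A read matching both an exon and a known miRNA is counted as a known miRNA.
example = assign_category({"exon", "known_miRNA"})
```

Keeping the order in one list makes the rule easy to audit and guarantees that each read is counted in exactly one category.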
The identification of a large number of small RNAs in P. tricornutum prompted us to examine whether some were miRNAs. We first compared all the non-annotated sRNAs with the sequences of animal and virus miRNAs available from miRBase (miRBase Sequence Database version 15) to identify homologs of known miRNAs. We then used the small RNAs with homology to known miRNAs (including plant, animal and virus miRNAs) and the remaining non-annotated sRNAs to identify candidate known and novel miRNA families in P. tricornutum, respectively (see Additional file 1 for a flow chart of the miRNA identification procedure). These small RNAs were first mapped onto the P. tricornutum nuclear genome. We then extracted 300 nt upstream and 300 nt downstream of those loci and examined whether they could form hairpin secondary structures, a characteristic of known plant and animal pre-miRNAs, using criteria developed previously for plant miRNA prediction. Briefly, precursors with a free energy ≤ -18 kcal/mol as checked by Mfold [44,45], ≥ 16 bp and ≤ 4 bulges or asymmetries between miRNA and miRNA*, a miRNA sequence length of 18-25 nt and a flank sequence length of 20 were considered potential P. tricornutum pre-miRNAs and selected for further analysis. Secondary structure prediction identified a total of 21 small RNA species derived from genomic loci whose surrounding sequences could form hairpin structures that met the requirements for a miRNA precursor. We then checked the structural stability of these 21 sequences. Five had a P-value lower than 0.05 and were checked for 5' homogeneity using 0.5 as the cutoff. For sequences with a P-value above 0.05, a more stringent 5' homogeneity cutoff of 0.75 was used. Altogether we obtained 14 sequences for manual rechecking according to criteria established previously for miRNA identification [46-48]. Finally, we determined 13 sequences to be P. tricornutum miRNAs. They were submitted to miRBase and named pti-miR5471-5483. For seven of these 13 small RNAs, the pre-miRNA hairpins were supported by EST data.
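The structural thresholds quoted above for a potential pre-miRNA can be written as a single predicate. The sketch below assumes the folding energy and duplex features have already been computed by an external tool; the `HairpinCandidate` class and its field names are hypothetical, and only the numeric cutoffs come from the text.

```python
from dataclasses import dataclass


@dataclass
class HairpinCandidate:
    mfe_kcal_per_mol: float     # predicted folding free energy of the precursor
    paired_bases: int           # base pairs between miRNA and miRNA*
    bulges_or_asymmetries: int  # bulges or asymmetries within the duplex
    mirna_length: int           # length of the candidate mature miRNA (nt)


def is_potential_pre_mirna(c):
    """Apply the cutoffs stated in the text to one folded candidate."""
    return (
        c.mfe_kcal_per_mol <= -18.0
        and c.paired_bases >= 16
        and c.bulges_or_asymmetries <= 4
        and 18 <= c.mirna_length <= 25
    )


# Made-up candidate for illustration; not one of the reported precursors.
candidate = HairpinCandidate(-26.1, 18, 2, 21)
keep = is_potential_pre_mirna(candidate)
```

Candidates passing this filter would still need the stability test and the 5' homogeneity check described in the text before being accepted.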
Each miRNA had a single precursor. The length of the pre-miRNAs ranged from 101 to 360 nt, with a mean of 235 nt (Table 4; see Additional file 5 for patterns of reads mapped to the pre-miRNAs and Additional file 6 for the stem-loops of the pre-miRNAs). The MFE ranged from -105 to -26.1 kcal/mol, with a mean of -67.61 kcal/mol. Most pre-miRNAs were located in intergenic regions; the others mapped to genes encoding hypothetical proteins, which were probably mis-annotated.
To investigate the probable roles of miRNAs in nitrogen and silicon metabolism in P. tricornutum, we sequenced small RNAs from P. tricornutum grown in normal, nitrogen-limited and silicon-limited media. Of the 13 miRNAs identified, two appeared in all three small RNA libraries, one exclusively in PT2 and eight exclusively in PT3; one was shared by PT1 and PT3, and one by PT2 and PT3 (Table 4). The expression of miRNAs in these samples indicated that they might play an important role under nitrogen-limited and/or silicon-limited conditions. To determine the genes likely to be regulated, we predicted targets of these miRNAs. Using the rules for target prediction suggested by Allen et al., no target was identified. When position one and positions beyond 21 nt were ignored and four mismatches were allowed in the miRNA-target duplex at positions 2-21, some potential target sites were suggested (see Additional file 7 for information on potential target genes). Some of these potential targets were involved in lipid metabolism, suggesting that P. tricornutum miRNAs might play a role in fatty acid metabolism. This was in accord with reports that P. tricornutum accumulated lipids under nitrogen-limited and silicon-limited conditions [32-34]. However, as the genome of P. tricornutum is not fully annotated and the functions of many protein-coding genes are unknown, it is difficult to determine whether these miRNA targets have any functional bias.
It has been reported that in Arabidopsis, miRNAs direct the generation of siRNAs (termed ta-siRNAs), which are phased relative to each other. To determine whether miRNAs direct the generation of siRNAs in P. tricornutum, we identified potential siRNAs and determined their locations. Potential siRNAs were found in all samples, with 499, 2032 and 2483 unique sequences and 1206, 6135 and 7836 total sequences in PT1, PT2 and PT3, respectively. The majority of siRNAs were produced from a few hot spots distributed across all the chromosomes; however, they were not phased relative to each other. To determine whether small RNAs play a role in silencing of repetitive sequences in P. tricornutum, as in other organisms, we performed a BLAST search against P. tricornutum repeat sequences and found 16 (PT1), 100 (PT2) and 167 (PT3) siRNAs derived from these regions. This implied that small RNAs might induce silencing of repetitive sequences in P. tricornutum.
Northern blotting was used to detect the expression of miRNAs and their precursors in P. tricornutum. 5S rRNA was blotted as a loading control. Northern blot hybridization detected precursors of the expected size (~100 nt for pti-miR5473 and ~200 nt for pti-miR5475) in all samples (Figure 3). This provided strong evidence for their expression.
We compared all P. tricornutum small RNAs (Table 1) with all known plant, animal and virus miRNAs in miRBase, and found significant identities (Table 5). However, these sequences did not pass the criteria we used to identify miRNAs. The most straightforward interpretation of the relative lack of universally conserved miRNAs between P. tricornutum and other organisms is that miRNAs in P. tricornutum are rare owing to its small genome size, although the possibility that P. tricornutum contains novel miRNAs with no sequence homology to any known ones cannot be ruled out. In a study of miRNAs in the unicellular green alga Chlamydomonas reinhardtii, Zhao et al. compared its miRNAs with all known plant and animal miRNAs, and found no homologs. In fact, C. reinhardtii lacked miRNAs homologous even to those of other green algae. We therefore asked whether P. tricornutum, like C. reinhardtii, had specific miRNAs with no sequence homology to any known miRNAs. We predicted novel miRNAs from the non-annotated small RNAs, using the same criteria as used to identify known miRNAs. A total of 13 novel miRNAs were identified from P. tricornutum under normal, nitrogen-limited and/or silicon-limited conditions. They lacked homology with all known miRNAs in miRBase, including C. reinhardtii miRNAs. We therefore propose that miRNAs in algae may have evolved independently of those in animals and plants, consistent with the suggestion of Zhao et al.
We also used the P. tricornutum chloroplast genome to identify miRNAs. Two loci met all the criteria we used to identify miRNAs. Interestingly, one of these miRNA-like small RNAs was a homolog of cin-miR4175, and part of the potential precursor shared 74% identity (21% mismatches and 6.5% gaps) with the cin-miR4175 precursor. EST analysis of P. tricornutum showed that many of its genes were more similar to those of animals than to those of photosynthetic organisms. Complete genome sequences showed that diatoms had a mosaic genome with genes from animals, plants and bacteria [13,14]. Thus P. tricornutum might share some miRNAs with animals, although the percentage may be relatively low. We propose that this animal miRNA-like small RNA might be present in diatoms as a result of gene transfer, might be a conserved miRNA derived from the heterotrophic secondary host before the secondary endosymbiosis, or might be a miRNA that was lost in the plant/red algal lineage during evolution, similar to the incorporation of animal-like genes in diatoms. If this small RNA is a genuine miRNA (i.e. P. tricornutum contains an animal-like miRNA located in the chloroplast genome), it would represent a very interesting discovery.
De Riso et al. successfully demonstrated gene silencing in P. tricornutum. They analyzed the molecular players involved in RNA silencing in P. tricornutum and identified both Dicer-like proteins (RNase III-type endonucleases that process double-stranded RNA) and Argonaute-like proteins (core components of the effector RNA-induced silencing complex, RISC). These Argonaute-like proteins in P. tricornutum clustered in a clade different from those of either animals or plants, suggesting that P. tricornutum might possess a special RISC pathway different from that of animals and plants, which could explain the lack of preference for U at the 5' end of P. tricornutum sRNAs.
miRNAs have been found to play important regulatory roles in various processes in multicellular organisms as well as the unicellular green alga C. reinhardtii [18,40]. In the present study, miRNAs were sequenced from P. tricornutum under normal, nitrogen-limited and silicon-limited conditions (Table 4). This suggests that miRNAs might play important roles in P. tricornutum.
Two miRNAs appeared in all samples (Table 4). Candidate target genes for these miRNAs included DNA-directed RNA polymerase, glutamate synthase and Δ5 fatty acid desaturase (fatty acid metabolism). This indicates that P. tricornutum miRNAs might play important roles in a range of biological processes. It was reported that the composition of fatty acids was significantly influenced by the availability of nitrogen [32-34] and silicon [35,36]. Some genes related to glutamate/glutamine metabolism are regulated by silicon availability. Interestingly, we predicted that one gene involved in glutamate synthesis (ferredoxin-dependent glutamate synthase) was targeted by pti-miR5474, which was downregulated in both PT2 and PT3, indicating that miRNAs might play a role in silicon-regulated glutamate metabolism.
Eight miRNAs were sequenced exclusively from PT3 (Table 4). Candidate target genes for these miRNAs include phospholipase C isoform delta (lipid metabolic process), nucleotide transporter, ornithine aminotransferase and nucleosome remodeling factor. In P. tricornutum, silicification is restricted to one valve of the oval cells and there is no silicon requirement for growth. The strain used in the present study was a fusiform type whose cell wall was not silicified. Nevertheless, miRNA species were most abundant in PT3 (12/13), and their targets were involved in various processes, indicating that various biological processes might be influenced by silicon availability through miRNA regulation.
It was interesting that a majority of sRNAs were located on the minus strand of chromosome 13 and on both strands of the chloroplast genome (Figure 2). As reported by McFadden and van Dooren, green algae/plants and red algae originated from a primary endosymbiosis between a eukaryote and an endosymbiont, whereas diatoms originated from a secondary endosymbiosis between a heterotrophic organism and a red alga. The diatom chloroplast originated from the plastid of the secondary endosymbiont, whereas the nucleus of the secondary endosymbiont was lost, leaving enormous numbers of its genes - typically more than 90% - housed in the nucleus of the secondary host [6,7,50-52]. We hypothesized that the enrichment of sRNAs on the minus strand of chr 13 as well as on both strands of the chloroplast genome indicated that chr 13 might be related to the secondary endosymbiont; for example, chr 13 might have originated from the nucleus of the secondary endosymbiont, or the majority of the nuclear genes of the secondary endosymbiont might have been transferred to chr 13. To test this hypothesis, we extracted the hot-spot loci from which most small RNAs were derived. These were 39000-46000 nt on the minus strand of chr 13, 63675-70586 nt on the sense strand of the chloroplast genome, and 110485-117369 nt on the minus strand of the chloroplast genome. We then aligned them and found that the hot-spot locus of chr 13 had no homology with the chloroplast genome. Thus, even if chr 13 is related to the secondary endosymbiont, our data provide little support for this hypothesis. We also found that the two hot-spot loci of the chloroplast genome in fact share 100% identity: they are the two inverted repeats, IRa and IRb, of the chloroplast genome. Thus, small RNAs might play an important role in silencing of the inverted repeat regions.
We detected precursors of the expected size for pti-miR5473 and pti-miR5475. In other organisms, precursors are more difficult to detect than mature miRNAs in wild-type samples [53,54], probably because of their transient accumulation in cells and rapid conversion into mature miRNAs. We readily detected miRNA precursors in all three samples of P. tricornutum (Figure 3), implying that diatoms might possess a miRNA-processing machinery different from that of other organisms, which allows miRNA precursors to accumulate. Bands of the sizes expected for the mature miRNAs were not detected. The most straightforward interpretation is low expression of the mature miRNAs in the samples examined, although we cannot rule out that these sequences are not real miRNAs but sequencing artifacts or fragments of longer transcripts. More sensitive technology is needed for further analysis.
Our results indicated that P. tricornutum possessed a complex sRNA processing system. It contained novel miRNAs that have no sequence homology with miRNAs of other organisms and that might play important regulatory roles in P. tricornutum metabolism.
Axenic cultures of Phaeodactylum tricornutum were available in our laboratory. Cultures were grown in f/2 medium made with steam-sterilized local seawater supplemented with inorganic nutrients and filter-sterilized f/2 vitamins. Cultures were grown at 20°C under cool white fluorescent lights at 24 μmol m⁻² s⁻¹ with a 12-h photoperiod for one week. Cells were then harvested by centrifugation for 10 min at 4000 g, washed with sterilized seawater, aliquoted into 500-mL conical flasks and incubated in normal, nitrogen-free or silicon-free f/2 medium made with artificial seawater for 48 h. Cells were then harvested by centrifugation for 10 min at 4000 g, washed with 4 mL of sterilized seawater, aliquoted into 1.5-mL Eppendorf tubes, and pelleted for 2 min at 10 000 g. Cell pellets were frozen instantly in liquid nitrogen and stored at -80°C before RNA extraction.
Total RNA was extracted from Phaeodactylum tricornutum cells using the Trizol method according to manufacturer's protocol (Invitrogen, USA). Basically, sRNAs were separated by size fractionation on denaturing polyacrylamide gels. Fragments of 18-28 nt were gel-purified then ligated to a 5'-adaptor and a 3'-adaptor and then RT-PCR-amplified using SuperScript II Reverse Transcription Kit (Invitrogen, USA). RT-PCR product was then sequenced directly using a Solexa 1G Genome Analyzer according to the manufacturer's protocols (see Additional file 1 for flow chart of the procedure for sample preparation and sequencing).
After removing adaptor sequences and filtering the low-quality tags from the raw reads, the remaining small RNA sequences (clean reads) were mapped to the Phaeodactylum tricornutum v2.051706 genome and chloroplast genome using the Short Oligonucleotide Analysis Package (SOAP); all hits were reported and no mismatches were allowed. Degradation fragments of non-coding RNAs (rRNA, tRNA, snRNA and snoRNA) were identified by comparing all the clean reads with the sequences of non-coding RNAs available in Rfam and the GenBank non-coding RNA database, using blastn with an e-value of 0.01 as the cutoff. Degraded fragments of mRNA were identified by aligning all the clean reads with the exons and introns of mRNAs annotated on the Phaeodactylum tricornutum genome and chloroplast genome. sRNAs that overlapped perfectly with mRNA sequences were considered mRNA degradation fragments. Homologs of known miRNAs were identified by comparing all the clean reads with the sequences of known miRNAs available from miRBase (miRBase Sequence Database version 15). If a Phaeodactylum tricornutum sRNA exhibited homology with ≤ 2 mismatches (or 90% identity) to a known miRNA, it was considered a homolog of a known miRNA. Potential siRNA candidates were identified by aligning tags from clean reads to each other; two perfectly complementary sRNAs with a 2-nt overhang at the 3' end were annotated as siRNAs. The remaining sequences were used for further characterization (see Additional file 1 for a flow chart of the read-processing procedure). All of the raw reads and clean reads generated in this study have been submitted to the GEO at NCBI under accession number GSE29321.
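The siRNA criterion (two perfectly complementary sRNAs with a 2-nt overhang at the 3' end) can be illustrated with a short Python check. This is a sketch under explicit assumptions rather than the procedure actually used: it assumes both strands have the same length and that each carries the overhang at its own 3' end; the function names and the example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq):
    """Upper-case a read and write U as T so RNA and DNA letters compare equal."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    return normalize(seq).translate(_COMPLEMENT)[::-1]


def is_sirna_pair(a, b, overhang=2):
    """True if a and b form a perfect duplex with `overhang` unpaired nt at each 3' end."""
    a, b = normalize(a), normalize(b)
    # Assumption of this sketch: equal-length strands.
    if len(a) != len(b) or len(a) <= overhang:
        return False
    # The 5' part of each strand pairs with the 5' part of the other, antiparallel.
    return reverse_complement(a[:-overhang]) == b[:-overhang]


# Made-up 21-nt strand and a partner built to pair with it (not data from the study).
strand_a = "ACGTTGCAAGGCTTAACCGGA"
strand_b = reverse_complement(strand_a[:-2]) + "TT"
paired = is_sirna_pair(strand_a, strand_b)
```

A full blunt-ended reverse complement does not satisfy this test, which is what distinguishes the overhang rule from a plain complementarity search.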
After initial processing, homologs of known miRNAs and the remaining non-annotated sRNAs were used to identify miRNAs (see Additional file 1 for a flow chart of the miRNA identification procedure). We first mapped them to the genome. sRNAs with more than one read and ≤ 20 hits to the genome were used for pre-miRNA secondary structure filtering. Sequences 300 nt upstream and 300 nt downstream of those loci were extracted and examined for hairpin secondary structures to identify potential miRNAs, using criteria developed previously for plant miRNA prediction. Briefly, precursors with a free energy ≤ -18 kcal/mol as checked by Mfold [44,45], ≥ 16 bp and ≤ 4 bulges or asymmetries between miRNA and miRNA*, a miRNA sequence length of 18-25 nt and a flank sequence length of 20 were considered potential Phaeodactylum tricornutum pre-miRNAs and selected for further analysis. The stability of the candidate pre-miRNAs was checked using randfold with a dinucleotide shuffling test. The 5' homogeneity was then checked; it was defined as the number of reads with the same 5' end as the mature miRNA divided by the total number of reads mapped to the precursor. For precursors with a randfold P-value ≤ 0.05, a 5' homogeneity > 0.5 was required. For precursors with a P-value > 0.05, a 5' homogeneity ≥ 0.75 was required. We then checked the remaining sequences manually according to criteria established previously [46-48]. Sequences that violated none, or only slightly violated one, of the primary criteria suggested by each author were retained.
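The 5' homogeneity definition and its two P-value-dependent cutoffs translate directly into code. The sketch below assumes reads have already been mapped to a precursor and summarized as counts per 5' start position; the function names and the example counts are hypothetical, while the formula and thresholds are those given in the text.

```python
def five_prime_homogeneity(start_counts, mature_start):
    """Fraction of precursor-mapped reads sharing the mature miRNA's 5' end.

    start_counts maps a 5' start position on the precursor to a read count.
    """
    total = sum(start_counts.values())
    if total == 0:
        return 0.0
    return start_counts.get(mature_start, 0) / total


def passes_homogeneity(homogeneity, randfold_p):
    """Apply the cutoff that depends on the randfold P-value."""
    if randfold_p <= 0.05:
        return homogeneity > 0.5
    return homogeneity >= 0.75


# Made-up example: 60 of 100 reads start at position 10 of the precursor.
h = five_prime_homogeneity({10: 60, 11: 25, 12: 15}, mature_start=10)
accepted_if_stable = passes_homogeneity(h, randfold_p=0.01)
accepted_if_unstable = passes_homogeneity(h, randfold_p=0.20)
```

In this example a homogeneity of 0.6 is sufficient for a precursor with a significant randfold result but not for one with P > 0.05, which shows how the stricter cutoff compensates for weaker structural evidence.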
miRanda [62-65] was used to detect potential target sites for the Phaeodactylum tricornutum candidate miRNA sequences. The parameters employed were as follows: match score S ≥ 90, target duplex free energy ΔG ≤ -20 kcal/mol and scaling parameter = 2. The miRNA-target duplexes were then checked manually according to the rules suggested by Allen et al. and Schwab et al. Briefly, we required ≤ 4 mismatches between the small RNA and the target at positions 2-21, counting from the 5' end of the miRNA; ≤ 2 adjacent mismatches; no adjacent mismatches at positions 2-12; no mismatches at positions 10-11; and ≤ 2.5 mismatches at positions 1-12 (counting G-U pairs as 0.5 mismatches). The minimum free energy (MFE) of the miRNA/target duplex should be >74% of the MFE of the miRNA bound to its perfect complement.
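The positional mismatch rules listed above can be expressed as a small checker. This is a sketch under stated assumptions, not the manual procedure used in the study: it handles gapless duplexes only, it treats a G-U wobble as a mismatch in every rule except the weighted count for positions 1-12 (where it counts 0.5), and it omits the MFE ratio criterion, which needs a folding tool. All function names are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
WEIGHT = {"M": 0.0, "W": 0.5, "X": 1.0}


def pair_states(mirna, target_site):
    """Per-position states from the miRNA 5' end: M (pair), W (G-U), X (mismatch).

    Both sequences are given 5'->3'; the target is reversed so that index i
    of the miRNA faces index i of the reversed target. Gaps are not supported.
    """
    m = mirna.upper().replace("T", "U")
    t = target_site.upper().replace("T", "U")[::-1]
    if len(m) != len(t):
        raise ValueError("this sketch only handles gapless duplexes")
    states = []
    for a, b in zip(m, t):
        if (a, b) in WATSON_CRICK:
            states.append("M")
        elif (a, b) in WOBBLE:
            states.append("W")
        else:
            states.append("X")
    return states


def longest_run(flags):
    """Length of the longest stretch of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def passes_target_rules(states):
    """Check the positional rules; positions are 1-based from the miRNA 5' end."""
    unpaired = [s != "M" for s in states]
    pos_2_21 = unpaired[1:21]
    pos_2_12 = unpaired[1:12]
    weighted_1_12 = sum(WEIGHT[s] for s in states[:12])
    return (
        sum(pos_2_21) <= 4                # at most 4 mismatches at positions 2-21
        and longest_run(pos_2_21) <= 2    # at most 2 adjacent mismatches
        and longest_run(pos_2_12) <= 1    # no adjacent mismatches at positions 2-12
        and not any(unpaired[9:11])       # no mismatch at positions 10-11
        and weighted_1_12 <= 2.5          # weighted mismatches at positions 1-12
    )
```

Working on a list of per-position states keeps the rules independent of how the duplex was aligned, so the same checker could be fed states parsed from the output of a target-prediction tool.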
The expression of two miRNAs (pti-miR5473 and pti-miR5475) and their precursors was verified by northern blot hybridization using the High Sensitive MiRNA Northern Blot Assay kit (Signosis, USA) according to the manufacturer's protocol. Biotin-labeled High Sensitive probes were designed to be complementary to the mature miRNAs and to Phaeodactylum tricornutum 5S rRNA. Total RNA (5 μg) was loaded into each well.
AYH carried out the experiments, performed the data analysis and drafted the manuscript. LWH cultured the P. tricornutum, prepared the samples and participated in data analysis. GCW conceived of the study, and drafted the manuscript. All authors read and approved the final manuscript.
Flow chart of the procedure for sample preparation and sequencing, processing of reads and miRNA identification. (A) Flow chart of the procedure for sample preparation and sequencing. (1) P. tricornutum log-phase cells were incubated in normal, nitrogen-limited and silicon-limited medium for 48 h, harvested, frozen instantly in liquid nitrogen and stored at -80°C before RNA extraction. (2) Total RNA was extracted using the Trizol method. (3) Fragments of 18-28 nt were gel-purified. (4) A 3' adaptor was ligated to the 3' end of sRNAs. (5) A 5' adaptor was ligated to the 5' end of sRNAs. (6) sRNAs were RT-PCR-amplified. (7) Sequencing. (B) Flow chart of the procedure for processing of reads. The numbers in parentheses represent the total reads from PT1, PT2 and PT3, respectively. (1) Initial processing: remove adaptors, filter low-quality tags and remove tags shorter than 18 nt. (2) Common/specific tags identified between samples. (3) Length distribution analysis of clean reads. (4) Clean reads matched to the P. tricornutum nuclear genome using SOAP. (5) Clean reads matched to the P. tricornutum chloroplast genome using SOAP. (6) Clean reads compared with non-coding RNAs from GenBank and Rfam. (7) Exon/intron fragments identified. (8) siRNAs identified. (9) Plant miRNA homologs identified. (10) sRNAs annotated. (11) miRNAs identified by hairpin structure filtering. (12) Target prediction. (C) Flow chart of the procedure for miRNA identification. (a) mfold was used to predict the secondary structure of extracted sequences. Sequences with ΔG < -18 kcal/mol, ≥ 16 bp and ≤ 4 bulges or asymmetries between the miRNA and the other arm, a miRNA sequence length of 18-25 nt and a flank sequence length of 20 were retained for further analysis. (b) randfold was used to check the stability of the candidate pre-miRNAs. (c) 5' homogeneity was checked. For precursors with a randfold P-value ≤ 0.05, a 5' homogeneity > 0.5 was required. For precursors with a P-value > 0.05, a 5' homogeneity ≥ 0.75 was required. (d) Criteria established previously for miRNA identification were used to check the remaining sequences manually.
Nucleotide bias at each position for total small RNA. The percentages of each type of bases in positions 1 to 24 were indicated by the area. (A) PT1. (B) PT2. (C) PT3.
Categorization of P. tricornutum small RNAs. The proportions of unique/total sRNA tags matched to each category of RNA are shown. (A1) Categorization of unique small RNAs in PT1. (B1) Categorization of unique small RNAs in PT2. (C1) Categorization of unique small RNAs in PT3. (A2) Categorization of total small RNAs in PT1. (B2) Categorization of total small RNAs in PT2. (C2) Categorization of total small RNAs in PT3.
Common and specific sequences between samples. The common and specific tags of every two samples, including the unique tags and total tags were summarized. (A1) unique sequences of PT1 & PT2. (B1) unique sequences of PT1 & PT3. (C1) unique sequences of PT2 & PT3. (A2) total sequences of PT1 & PT2. (B2) total sequences of PT1 & PT3. (C2) total sequences of PT2 & PT3.
Patterns of reads mapped to pre-miRNAs.
Stem loops for pre-miRNAs.
Candidate targets for P. tricornutum miRNAs.
We thank Zhaolei Zhang for his constructive suggestions in drafting the manuscript. The work was supported by the National Natural Science Foundation of China [30830015, 30970302, 40806063 and B49082401], and the Innovative Foundation of Chinese Academy of Sciences (KGCX2-YW-374-3).