Diatoms are unicellular photosynthetic phytoplankton of global importance. They are responsible for 20% of global carbon fixation [1], [2], and as such not only provide a major source of carbon for food webs but are also key players in atmospheric carbon cycling and its attendant environmental issues. Diatoms belong to the heterokont class of microalgae and have an evolutionary history distinct from that of land plants and red, green, and glaucophyte algae: they are the result of a secondary endosymbiotic event in which a free-living heterotroph acquired a plastid through enslavement of a red or green alga [3], [4]. This history has given diatoms a unique genetic complement [5] and, by inference, potentially unique mechanisms for controlling gene expression. In addition, the ocean environment is subject to dynamic changes that frequently occur on short time scales, and organisms in this environment must have gene expression control mechanisms that enable rapid adaptation to these changes.
Complete genome sequences have now been reported for two diatoms, Thalassiosira pseudonana [3] and Phaeodactylum tricornutum [6]. Both species were selected for their small genome sizes (32.4 megabases (Mb) and 27.4 Mb, respectively). These two diatom genomes contain between 10,000 and 14,000 genes [3], [6], [7], [8]. Only around 50% of these genes can be assigned a putative function based on current experimental knowledge, and about 35% are specific to each diatom [6], [9].
Small non-coding RNA genes have been found in numerous organisms, where they act as transcriptional and translational regulators of gene expression. Their ability to silence specific genes affects a wide range of biological functions, from gene regulation during embryological development and cell differentiation to genome rearrangement [10]. Of the classes of small RNAs, the microRNA (miRNA) family is the most extensively characterized. MicroRNAs are estimated to account for approximately 0.5–1.5% of the total genes in the genome of an organism [11], and it is estimated that 20 to 30% of human genes are regulated by miRNAs [12]. The remaining types of small RNAs may be grouped collectively as endogenous short interfering RNAs (siRNAs). Endogenous siRNAs have been found in multicellular and unicellular plants and animals [13]. The common characteristic of siRNA genes is that their biogenesis involves double-stranded RNA without a hairpin precursor, in contrast to miRNAs [14]. The siRNA types of relevance to this study include repeat-associated siRNAs (rasiRNAs) and natural antisense siRNAs (nat-siRNAs).
While most research on eukaryotic small RNAs to date has focused on multicellular plants and animals, there have recently been studies in unicellular eukaryotes. In particular, several types of small RNAs were reported in the unicellular green alga Chlamydomonas reinhardtii following 454 sequencing of a small cDNA library [15], [16], and evidence of miRNAs has also recently been found in the heterokont brown alga Ectocarpus siliculosus [17] and the red alga Porphyra yezoensis [18]. Overall, these studies identified miRNAs, phased siRNAs, tasiRNAs, and nat-siRNAs. All of the miRNAs found to have the characteristic hairpin precursor structure were novel and did not exhibit sequence identity with known plant and animal miRNAs. These results are significant because they imply that deep sequencing might be the key to discovering miRNAs in organisms that have not yet been studied extensively for small RNAs, or in which specific miRNAs are not exceptionally highly represented. Highly expressed plant and animal small RNAs were initially characterized using traditional cloning approaches [19], [20], [21]. Re-examination of these organisms by deep sequencing has revealed a larger population that includes miRNAs expressed at lower levels [22]. miRNAs with lower expression levels are generally not conserved between organisms, suggesting that they play specialized roles [22].
Applied Biosystems (ABI) SOLiD next-generation sequencing is an emerging technology that may aid in the discovery of small RNAs. The SOLiD (Supported Oligonucleotide Ligation and Detection) platform uses a sequencing-by-ligation method, which involves iterations of hybridization and ligation on a glass slide support, using probes labeled with four different fluorescent dyes [23], [24]. Each dye encodes a set of two-nucleotide pairs, generating sequence data in “colorspace” format rather than in nucleotide “base space” format. The promise of resequencing applications and transcriptomic analyses has brought this next-generation sequencing technology to the forefront [23], [25]. To date, only a small number of published studies have used SOLiD to identify miRNAs, all in standard model organisms such as human, rat, and Arabidopsis [26], [27], [28], [29], and no studies have been reported on a nonstandard model organism such as a diatom.
In this study, we applied both 454 and SOLiD deep sequencing approaches to characterize classes of small RNAs from the diatom Thalassiosira pseudonana, determine their similarity to plant and animal small RNAs, examine their genomic distribution, and identify potential target mRNAs for regulation. The two approaches (which included two different sample preparation methods and biological and technical replicates) were compared to obtain a broader view of the small RNA population, with 454 providing accurate evaluation of small RNA lengths and terminal nucleotides, and SOLiD providing deep coverage. For SOLiD data analysis, the standard ABI SOLiD data processing pipeline includes a step in which the data are filtered by comparison to the Sanger miRBase database of known miRNAs (http://microrna.sanger.ac.uk/) [30]. Because the majority of miRNAs in the diatom may be novel, as was found for Chlamydomonas miRNAs [15], [16], filtering against the known miRNAs in miRBase may have the undesirable effect of excluding novel sequences. Therefore, this study also reports the development of a new methodology to process SOLiD data to extract the entire small RNA population, which can then be examined to identify and predict novel miRNAs and endogenous siRNAs.
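The difference between filtering reads against a database of known miRNAs and retaining the entire small RNA population can be illustrated with a short Python sketch. This is not the methodology developed in this study; it is a simplified illustration under stated assumptions: reads are already in base space, the 18 to 30 nucleotide size window is a hypothetical choice, and the function names, the example sequences, and the set of known sequences are placeholders.

```python
from collections import Counter


def collapse_reads(reads, min_len=18, max_len=30):
    """Collapse identical reads into unique sequences with read counts.

    The size window is a hypothetical default, not a value from the study.
    """
    return Counter(r for r in reads if min_len <= len(r) <= max_len)


def annotate_known(counts, known_sequences):
    """Label each unique sequence as known or novel instead of discarding it.

    Returns {sequence: (read_count, is_known)} so that sequences absent from
    the reference set remain available for downstream prediction.
    """
    return {seq: (n, seq in known_sequences) for seq, n in counts.items()}


if __name__ == "__main__":
    # Placeholder reads and a placeholder "known" set, for illustration only.
    reads = ["A" * 21, "A" * 21, "C" * 22, "G" * 10]
    known = {"A" * 21}
    for seq, (n, is_known) in annotate_known(collapse_reads(reads), known).items():
        print(seq, n, "known" if is_known else "novel")
```

In this sketch, a pipeline that kept only sequences present in the known set would drop the second unique sequence entirely, whereas annotating instead of filtering preserves it as a candidate novel small RNA.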