MicroRNAs (miRNAs) are small, noncoding RNAs that post-transcriptionally influence a wide range of cellular processes such as the host response to viral infection, innate immunity, cell cycle progression, migration and apoptosis through the inhibition of target mRNA translation. Due to the growing number of microRNAs and identification of their functional roles, miRNA profiling of many different sample types has become more expansive, especially with relevance to disease signatures. Here, we address some of the advantages and potential pitfalls of the currently available methods for miRNA expression profiling. Some of the topics discussed include isomiRNAs, comparison of different profiling platforms, normalization strategies and issues with regard to sample preparation and experimental analyses.
MiRNAs are a class of small, noncoding RNAs that have emerged as a novel and rapidly expanding area of interest in disease research. This is in large part due to their ability to fine tune expression of a vast number of target genes through mRNA degradation or translational repression. Binding of mature miRNAs (approximately 19-25 nt) to target genes through the 5′ nucleotides 2-8, termed the “seed sequence”, allows each of these small RNAs to impact the expression of hundreds of downstream genes. These and other rules derived from bioinformatics experiments capture the majority of cases, but not all. For instance, examples exist where nucleotides outside the seed sequence contribute to target recognition 1-3. The size of the miRNA is variable with experimentally confirmed extremes of 16 nt and 35 nt in length 4-7. Lastly, miRNAs are subject to mutation and selection and single nucleotide polymorphisms in mature and precursor miRNAs have been described, which can modulate targeting, processing and expression 8-10. Any study of miRNAs must start with an accurate and detailed accounting of the species and expression levels of the miRNA collection in the cell of interest.
In addition to cellular microRNAs, several viruses, including members of the herpesvirus family, encode miRNAs that aid in the establishment of viral latency and persistence, immune evasion, cellular proliferation and tumor progression 11-16. MiRNAs encoded by the herpesviruses are also involved in controlling the molecular switch of the viral lifecycle between latency and lytic replication and reactivation 11, 12, 17-23. Several groups have examined changes in the cellular miRNA repertoire following acute viral infection, which has piqued even more interest in the field of miRNA profiling 8-12, 17-36.
MiRNA profiling can provide useful information on the pathogenesis of many different diseases. Identification of novel miRNAs can reveal new levels of gene regulation associated with a specific disease. Profiling of host miRNAs at different stages of cancer, for example, can give insight into potential biomarkers associated with tumor progression, metastasis and angiogenesis 11, 12, 35, 37-40. In some cases, altered expression of a few key miRNAs can separate groups of patients into clinically relevant classes. Moreover, miRNAs are relatively stable and the detection of miRNAs in serum and plasma may allow miRNAs to emerge as clinically useful disease biomarkers 27, 34-37, 41-43. Here, we will delineate some of the broader concerns with miRNA profiling and the advantages and pitfalls of the different profiling techniques.
We start our discussion with a brief overview of the miRNA maturation pathway, since any intermediate in this pathway can be subjected to profiling and may reveal non-redundant information about the cell (Figure 1). For example, qPCR analysis of the miRNA gene locus may provide insight into deletions, amplifications or SNPs in the microRNA gene. Both primary and precursor microRNAs (pri- and pre-miRNAs) can provide information on the transcriptional regulation of microRNAs whereas profiling the more mature microRNA form can lead to insights on function, target specificity, and changes in processing and microRNA export and stability.
Specifically, each processing step has its own rate constant k, which determines the pool size and pool turnover for the next intermediate (Figure 2). The primary microRNA transcript is subject to less processing and therefore depends on the fewest processing rates (k values). As the microRNA matures, the number of k values with the potential to influence the pool size and turnover increases. Another factor to consider for profiling of different classes of microRNAs is that it is increasingly difficult to design specific primers for the detection of smaller targets. Depending on the diagnostic objective, it may be desirable to profile the most mature, most stable pool (e.g. for tissue-of-origin profiling), or it may be opportune to profile more immature intermediates (e.g. to interrogate the cell response to infection or drug).
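The way successive rate constants shape the pool of each intermediate can be illustrated with a minimal sketch. It assumes, purely for illustration, a linear pathway (pri-miRNA, pre-miRNA, mature miRNA) with first-order kinetics, a constant transcription rate and a separate decay rate at each stage. The function name and all rate values are hypothetical and are not taken from Figure 2.

```python
def steady_state_pools(transcription_rate, k_proc, k_deg):
    """Steady-state pool size of each intermediate in a linear processing chain.

    Assumptions (illustrative only): first-order kinetics; each intermediate is
    either processed to the next stage (rate k_proc[i]) or degraded (rate k_deg[i]).
    """
    pools = []
    influx = transcription_rate
    for kp, kd in zip(k_proc, k_deg):
        pool = influx / (kp + kd)   # influx is balanced by total loss from the pool
        pools.append(pool)
        influx = kp * pool          # only the processed share feeds the next stage
    return pools


# Hypothetical rates for pri-miRNA, pre-miRNA and mature miRNA.
# The mature form is not processed further, so its k_proc is 0.
pools = steady_state_pools(10.0, k_proc=[1.0, 0.5, 0.0], k_deg=[1.0, 0.5, 0.1])
print(pools)
```

In this sketch the pri-miRNA pool depends only on its own two rates, whereas the mature pool depends on every upstream rate as well, which mirrors the point that later intermediates are influenced by more k values.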
Following microRNA processing, one strand of the mature miRNA is incorporated into the RNA-induced silencing complex (RISC), which can now recognize its target mRNA. Of note, not every miRNA that is expressed is loaded onto the RISC complex 44. The remaining unused strand is called the *strand or the passenger strand and is often degraded 18, 24, 45-47. Selection of the incorporated strand depends on the relative stability of the base pairing at the 5′ end of each strand. The mature miRNA strand exhibiting weaker base pairing at its 5′ end is usually selected for incorporation into the RISC 44, 48. However, if there is little difference in the base pairing stability, each strand may be incorporated into the complex, thus increasing the number of potential mRNA targets.
Quantitative real-time PCR (qPCR) is considered the gold standard of gene expression measurement. Several companies have developed qPCR-based assays for the detection of miRNA expression 49-51 and have increased throughput with the introduction of microfluidic array platforms, which allow one to analyze thousands of miRNAs at the same time 52-54. This technology can also be applied to pre-miRNA and pri-miRNA profiling 12, 40 and is typically done on total RNA. Size selection is not necessary because of the superior specificity of the primers.
Another technique for miRNA profiling is the hybridization-based glass-slide microarray 55-60. Microarrays have been used extensively in the past for gene expression profiling and thus are well-optimized and robust. The main difference between mRNA and miRNA profiling is the small size of the miRNA target. Thus, the advantages of longer oligos (e.g. 70-mers used in many commercial mRNA slide arrays) do not exist for miRNA profiling. At least in principle, both mature and pre-miRNAs can hybridize to the target, since they contain the same sequence. This may boost sensitivity or may require additional size selection prior to labeling if very precise measurements of mature miRNAs are needed.
Since clinical samples are often small and only a very limited amount of miRNAs can be extracted, non-specific amplification methods are of great practical value. Of course, all non-specific amplification methods may affect the linearity of the final assay. These methods are well established and commercially available for mRNAs and typically incorporate a T7 polymerase recognition site within the primer for reverse transcription 61. Because miRNAs are much smaller, adding an amplification step is not trivial and is likely to introduce non-linearity into the measurement process.
The growing popularity of Next Generation Sequencing has introduced yet another technique for studying miRNA profiles. In this method, adapters are ligated to sample RNA and cDNA libraries are made and amplified by PCR. These libraries are then sequenced and the output reveals sequencing reads of varying lengths corresponding to miRNAs, which are then aligned to the reference sequence of choice. Importantly, PCR amplification of sequencing libraries can also introduce issues with linearity. Northern blots and RNAse protection assays are other ways of measuring miRNA expression but are time-consuming. They require large amounts of sample input and are not suitable for high-throughput screens. For these reasons, we will not focus on these techniques in this review, although they can be used to confirm miRNA expression.
A more recent development in microRNA expression profiling is the nanotechnology-based assay for microRNA 62, 63. This platform eliminates some of the biases resulting from other profiling platforms because it does not require enzymatic modifications such as ligation or reverse transcription 62, 63. Instead, it utilizes nanopores that can detect the position and conformation of the target molecule within the pore and provides highly sensitive, quantitative data for miRNA molecules in a given sample 62-64. For more on this profiling technique, please refer to the following reviews 64-68.
Choosing the appropriate methods for miRNA profiling can be difficult as there are many factors to consider 69. Some platforms are able to detect more drastic changes than others, due to variation in dynamic range. However, this large range of detection is not necessary if only a 2-fold change in expression is significant in the samples being analyzed. qPCR and sequencing both have broad dynamic ranges but require different amounts of sample input and runtime (Table 1). Sequencing can be helpful in determining sequence and end variation but also requires additional time and expertise in bioinformatic analysis. We estimate as much analysis time as profiling time for qPCR and microarrays and about ten times as much analysis time for complete sequencing-based analyses. Therefore, if prior studies have narrowed the field of interest to a smaller group of interesting miRNAs, qPCR assays and microarrays may be most cost- and time-effective (Table 1). The specificity and quality of the profiling results are often more important than the sheer number of miRNAs profiled. It is advisable to choose the platform that attains the best balance between cost, precision, accuracy and sample availability for your study (Table 1).
At present, the Sanger database (http://mirbase.org/, release 18.0) lists 18,226 precursor miRNAs expressing 21,643 mature miRNAs from 168 different species 4-7. Of these, over 1500 are human miRNA sequences and over 800 are derived from mouse 4-7. A virtual database of tissue-specific expression for experimental design purposes can be found at www.microrna.org. At first glance, the ultimate goal in miRNA profiling is to obtain an accurate overall picture of the miRNA repertoire. However, the experimental design to achieve this goal is not always straightforward. First, we need to consider an important biological property of the cellular miRNA pool. NextGen sequencing has shown that, with few exceptions, a small number (<20) of miRNAs accounts for > 80% of all cellular miRNAs 70. As an extreme example, the liver-specific microRNA miR-122 accounts for > 50% of all miRNA molecules expressed in hepatocytes 38. Similar results were obtained when looking at the overall expression of viral microRNAs. In KSHV-infected lymphoma cells, the 23 viral miRNAs account for over 80% of all miRNA molecules 31.
For these abundant miRNAs, a small (2-fold) change will have a significant impact on target protein levels. However, even a 10-fold change in a miRNA of low abundance (<1% of total) may not translate into a measurable biological phenotype. This shouldn’t discount the importance of low abundance miRNAs completely, as less abundant miRNAs may still serve as specific biomarkers for a correlated disease phenotype. Biomarkers can be of high predictive (and commercial) value even if the functional mechanism is unknown 27, 34, 41, 43, 53, 71-73. Lastly, only a few miRNAs have a known biological function and validated, experimental targets. All too often, exhaustive profiling experiments yield statistically significant changes in miRNAs that are completely uncharacterized. At that point, the follow-up work to attach biological plausibility and significance to a profiling result becomes substantial. Several groups have studied the biological activity of microRNAs in terms of kinetics of the mRNA-miRNA interaction and miRNA abundance with relevance to experimental applications 74-80.
One source of confusion when comparing miRNA profiling data is that they are only somewhat reproducible between different platforms and even intraplatform variation is common 69. Typically, the relative quantification of miRNA expression is similar among the various platforms and biological repeats whereas the absolute expression measurements tend to vary.
Often, the variation with regard to sample classification between profiling platforms results from the contribution of miRNAs with very low abundance. This reflects the different sensitivities of the expression profiling methods. One approach is to use rigorous thresholding, by including only the 50 most abundant miRNAs in an experiment in cross-platform comparison. Because of the biological makeup of the miRNA pool (see above), these would capture > 90% of the miRNA repertoire. A disadvantage of this approach is that it can be subject to platform-specific biases with regard to a single miRNA. For instance, in the early days of profiling by sequencing, the miRNAs were cloned prior to Sanger sequencing of libraries. Because one of the cloning steps relied on a particular restriction endonuclease, miRNAs that contained the endonuclease recognition site were never part of the library 30. By extrapolation, there will always be a few miRNAs that, because of their particular sequence, are not efficiently labeled, sequenced or amplified and are thus lost to analysis.
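The thresholding approach described above can be sketched as follows. This is a minimal illustration, assuming the profiling output is available as a mapping from miRNA name to an abundance value (read count or linearized signal). The function name, the miRNA names and the counts are hypothetical.

```python
def top_n_by_abundance(counts, n=50):
    """Return the n most abundant miRNAs and the fraction of the total pool they cover."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:n]
    covered = sum(value for _, value in kept) / total if total else 0.0
    return [name for name, _ in kept], covered


# Hypothetical abundances for one sample; a real experiment would use n=50.
counts = {"miR-A": 5000, "miR-B": 3000, "miR-C": 1500, "miR-D": 400, "miR-E": 100}
names, covered = top_n_by_abundance(counts, n=3)
print(names, covered)
```

Reporting the covered fraction alongside the retained names makes it easy to check whether the threshold really captures most of the pool in a given sample before platforms are compared.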
Another factor that can complicate miRNA profiling is the recent discovery of isomiRNAs 81-83. IsomiRNAs are miRNAs that display sequence variations, typically by shortening or lengthening of the 3′ end. Figure 3 shows an example of the variation in miRNA length and sequence. Some have argued that shorter isomiRNAs may be miRNA degradation products but the presence of longer isomiRNA sequences suggests otherwise 81, 83. A recent study identified over 3300 miRNA variants and found that in several cases, the most abundant miRNA differed from the miRBase sequence 39. Moreover, the dominant isomiRNA expressed can vary by tissue and sample. A comprehensive list of isomiRNAs that have been experimentally identified can be found at http://galas.systemsbiology.net/cgi-bin/isomir/find.pl 83. This database may prove useful in determining the most abundant sequence of a particular miRNA and the degree of heterogeneity for each miRNA species 83. It is somewhat reassuring that in many cases, one or two isomiRNAs account for > 90% of the species and thus of the signal detected (Figure 3). The remaining variants, even if extensive, are not abundant enough to contribute to the signal (Figure 3). A potential pitfall is that certain commercial assays may include the wrong isoform as the sensor.
MiRNA sequence variations can result from post-transcriptional modifications and differential processing. Alternative processing of primary miRNA transcripts by Drosha can generate multiple pre-miRNA species and further expand the sequence heterogeneity of mature miRNAs 84. In addition, RNA editing events like deamination, 3′ end processing of stable intermediates, base substitutions and 3′ extensions and 5′ deletions can add to the complex nature of miRNA sequence variation 82, 85-87. Several groups have shown that although sequencing or enzymatic errors could potentially account for some of this variation, miRNA heterogeneity is not a technical artifact but more likely generated in vivo through biologically relevant processes 84.
One could infer that variations in the 5′ end of the miRNA may drastically alter the seed sequence and target specificity. Further, variations even within the 3′ end may affect the affinity with which a miRNA binds to its target 86, 88. MiRNA sequence variations may also alter the specificity of miRNA association with different Argonaute proteins, another functional consequence of isomiRNAs and RNA editing 82, 85.
MiRNA end heterogeneity can affect the consistency and accuracy of measuring miRNA expression levels. Since qPCR and microarrays rely heavily on the availability and accuracy of miRBase sequences for primer and probe design, sequence variation can lead to miRNA detection issues. One study found that as few as 1-2 nucleotide changes in the miRBase sequence from either end can drastically affect the miRNA profiling results 83.
In the broader picture, accumulation of miRNA expression levels can depend on the rate of transcription, processing and miRNA decay. The stability of miRNAs can be controlled by cis-acting modifications, protein complex formation and exposure to nucleases 89. When a mature miRNA is in a complex, especially within the Ago/RISC complex, its stability is greatly increased, allowing for enhanced detection of these miRNAs. A recent study showed that these miRNA/Ago complexes could be found in serum and plasma and exhibited high stability for miRNA profiling 37. Therefore, miRNAs that preferentially mask themselves in these ribonucleoprotein complexes may outlast others and thus could be overrepresented in the miRNA repertoire.
Here, we will focus on some of the technical issues that may arise with qPCR-based profiling of miRNAs. Although much of this section is dedicated to qPCR profiling due to our experience in this area, many of the same problems are also encountered using microarrays 90.
Overall, qPCR is a popular, reliable technique for miRNA profiling because of its high sensitivity, reproducibility and large dynamic range. More recently, this method has expanded to accommodate even more high-throughput capability with the introduction of microfluidic qPCR 52-54, 91, 92. These methods and their smaller reaction size (down to nanoliters) provide the user with rapid, cost-effective customizable arrays that decrease sample input and allow thousands of reactions per experimental run. qPCR-based profiling is more rapid than other platforms and accommodates a wide range of samples, from cells to formalin-fixed, paraffin-embedded (FFPE) tissues requiring limited input. qPCR assays can be easily automated using robotic systems, which reduce hands-on time significantly and decrease variation due to human pipetting error 93.
qPCR profiling is highly compatible with fixed tissue samples. Even during RNA-protein crosslinking, short RNAs like miRNAs may be less affected than other RNA species due to their smaller size and high stability. However, prior to profiling, RNA sample quality should be tested by running an RNA or Agilent gel. Although RNA quality is less important when detecting miRNAs as compared with mRNAs, it can provide insight into the potential degradation of RNA, quality of the nucleic acid isolation procedure and could affect the overall outcome of the qPCR results.
Once purified RNA is obtained, the process of cDNA synthesis can introduce unexpected variation, more so than the qPCR step itself 61, 94. One study found that the cDNA synthesis reaction could introduce up to 100-fold variation in RT yields 61, 95. Introduction of errors due to secondary structure, variation in priming efficiency and properties of the RT enzyme itself can all influence the product yield from the RT reaction 61.
Much of the error introduced with qPCR-based profiling is due to preferential ligation and amplification. Certain miRNAs can preferentially bind or hybridize to the primers or probes used and similarly enzymes can exhibit biases toward certain sequences. This ultimately relies on the access to the target site and folding. For microarrays, this bias may occur at the RNA labeling step. Newly developed microarrays have aimed to eliminate some of these issues by using a label-free system 58. Although the hybridization efficiency may introduce bias, the preferential binding of specific sequences and associated error likely exists for all platforms. This is especially true when universal array conditions are applied since the optimal conditions of specific probes may be compromised. It is therefore possible that some miRNAs may be left out of analysis if optimal binding does not occur.
Profiling pitfalls can also occur with the use of universal annealing temperatures, primer efficiency and qPCR cycling conditions. Comparing large numbers of miRNAs can introduce Tm issues since miRNAs can vary highly in their GC content and therefore the Tm may differ, leading to complications in primer design. We have seen the same bias in targeted NextGen sequencing, where a GC content >65% yielded a >1 log decrease in coverage. This may be acceptable for resequencing, but would distort any NextGen sequencing based quantification. The primer and probe design can also be complicated by size. Due to the small size of miRNAs, primers must utilize nearly the entire miRNA sequence. Stem-loop RT primers, as opposed to universal RT primers, are long oligos that bind the stem-loop region of the microRNA and contain a region complementary to the sequence of each specific microRNA. The use of these stem-loop primers increases primer specificity while the incorporation of modified oligonucleotides such as locked nucleic acids (or LNAs) can help the issue of Tm variation. Finally, since qPCR primers can form primer dimers and lead to false positive results, it can prove useful to confirm PCR products via gel electrophoresis. While electrophoresis may not distinguish one amplified miRNA from another due to size similarity, it can boost confidence in results by eliminating potential false positives.
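The spread in melting temperature caused by GC content can be illustrated with a rough calculation. The sketch below uses the Wallace rule of thumb (2 °C per A/T pair plus 4 °C per G/C pair), which is a crude approximation intended for short oligonucleotides and is not the method used by any of the commercial assays discussed here. The two 22-nt sequences are hypothetical and were chosen only to contrast low and high GC content.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in an RNA or DNA sequence."""
    seq = seq.upper().replace("U", "T")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C using the Wallace rule of thumb."""
    seq = seq.upper().replace("U", "T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical 22-nt sequences, not real miRNAs.
low_gc = "UAAUGCUAAUUAGCAUUAAUUA"
high_gc = "GCGGCCAGGCUGCGGCCGCGGA"

for name, seq in [("low GC", low_gc), ("high GC", high_gc)]:
    print(name, round(gc_fraction(seq), 2), wallace_tm(seq))
```

Even with this simple estimate, two targets of identical length differ by tens of degrees, which illustrates why a single universal annealing temperature cannot be optimal for every miRNA on a plate.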
There are two distinct priming methods used among commercially available qPCR-based platforms. Some platforms make use of unique, sequence-specific RT primers for cDNA synthesis while others make cDNA using universal tailing primers. The universal tailing system, used by the Qiagen™, Exiqon™ and SABioscience™ platforms, uses a polyadenylase to generate poly-A tails and an anchored poly-T primer to generate cDNA 83. The stem-loop systems, such as the Applied Biosystems (ABI) Taqman™ assays, use a sequence-specific looped RT primer. In our hands, this results in enhanced specificity and sensitivity compared to the conventional linear RT primers.
We directly compared two platforms -- one universal tailing primer platform and one sequence-specific stem-loop primer platform. While the CT data generated from the qPCR correlated rather well, gel electrophoresis analysis of the PCR products using the Caliper™ Labchip GX system revealed a large degree of variation in product size, indicative of non-specific priming (Figure 4). We compared the ten most highly expressed miRNAs to those that were expressed at moderate levels or whose expression levels were undetectable. We observed that the specificity of the sequence-specific RT primers was greatly increased and yielded one prominent PCR product (Figure 4). This was evident for both highly abundant (A) and moderately abundant (B) miRNAs. The sequence-specific RT primer assay failed to yield a signal (CT) for low abundance miRNAs (C). By contrast, the universal tailing RT-primer-based assay yielded a quantitative signal (present call), but most of this signal was due to non-specific amplification. Thus, one can expect a large number of false positive results for this assay. Figure 4 also shows that the most highly expressed miRNAs were not always consistent between platforms due to differences in the workflow of these two commercially available miRNA assays.
Each system, however, has its advantages and drawbacks. The use of unique RT primers can be more time-consuming than the alternate universal method since each miRNA has a specific RT and qPCR primer and probe. In an attempt to eliminate this time constraint, ABI recently introduced pooled RT primers that contain a mixture of these stem-loop primer sequences and decrease the number of sequence-specific reactions needed. Exiqon™, which uses a single universal RT primer, has enhanced the specificity and selectivity of their qPCR primers compared to other assays by incorporating locked nucleic acid analogs (LNAs) 96.
The method of priming can affect both the overall detection and the degree of sensitivity to sequence alterations at miRNA ends. In this case, platforms like Exiqon™ display an advantage in detecting isomiRNAs because, unlike the unique stem-loop primers, they do not rely on the 3′ end sequence for primer annealing 83. Rather, the 5′ sequence of the miRNA (first 15-20 nt) influences primer design since shorter primers can be made with LNA technology. Problems with cross-priming can also lead to specificity issues and make it difficult to discriminate between closely related miRNAs. This is especially troubling when trying to detect the expression of miRNAs that belong to the same family and may only differ by 1-3 nucleotides. Unlike qPCR, hybridization-based microarrays do not offer the option of post-assay QC to verify the identity of the hybridized molecule. Moreover, many miRNAs exhibit a high degree of species conservation and therefore some cross-species reactivity will occur. In most cases, the addition of LNAs and sequence-specific primers can both be helpful in boosting assay sensitivity.
Another factor for consideration in data interpretation is qPCR efficiency. The qPCR efficiency can vary due to a number of different reaction parameters including sample concentration, degradation, dyes, non-specific products and the presence of PCR inhibitors or enhancers. Ideally, the efficiency of the qPCR reaction should be 100%, which translates to a doubling of the PCR product with each cycle. This would result in a 10-fold increase every 3.3 cycles. However, if a particular primer is performing at only 70% efficiency, or if the efficiency varies even slightly between two samples, the calculated differences in expression may deviate substantially from the true differences 97. If variation in PCR efficiency occurs between different samples, it would be reflected in the respective standard curves and cause the slope to vary. Calculation of expression levels based on these standard curve equations would therefore be skewed without also factoring in the PCR efficiency. Therefore, it is important to test the efficiency of the qPCR reaction by performing serial dilutions of known templates to estimate the efficiency differences based on the template, primers, cycling parameters and reaction chemistry. There are several publications and online resources available dedicated to this issue 97-99 (http://www.gene-quantification.de/efficiency.html).
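The relationship between standard curve slope, efficiency and fold change can be made concrete with a short calculation. The sketch below uses the standard relation E = 10^(-1/slope) - 1, where the slope comes from a plot of CT against log10 of template input, and computes the fold change implied by a CT difference as (1 + E)^ΔCT. The function names and the dilution series values are hypothetical.

```python
import math


def fit_slope(log10_inputs, cts):
    """Least-squares slope of CT versus log10(template input)."""
    n = len(cts)
    mean_x = sum(log10_inputs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_inputs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_inputs, cts))
    return sxy / sxx


def efficiency_from_slope(slope):
    """Amplification efficiency (1.0 means 100%) from the standard curve slope."""
    return 10 ** (-1.0 / slope) - 1.0


def fold_change(delta_ct, efficiency=1.0):
    """Fold difference implied by a CT difference at a given efficiency."""
    return (1.0 + efficiency) ** delta_ct


# Hypothetical 10-fold dilution series of a known template.
log10_inputs = [0, -1, -2, -3]
cts = [20.0, 23.32, 26.64, 29.96]

slope = fit_slope(log10_inputs, cts)
eff = efficiency_from_slope(slope)
print(round(slope, 2), round(eff, 3))

# The same 5-cycle difference read at 100% and at 70% efficiency.
print(fold_change(5, 1.0), round(fold_change(5, 0.7), 1))
```

In this example a 5-cycle difference corresponds to a 32-fold change at 100% efficiency but only about a 14-fold change at 70% efficiency, which shows how strongly an uncorrected efficiency difference can skew the result.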
MiRNAs represent only a small fraction of the total RNA and this fraction can easily vary across samples. Since changes in miRNA expression can be clinically or biologically significant, normalization of data is one of the most important and challenging issues faced when profiling. Cell number cannot really be used as a normalization factor when dealing with tissue samples and normalizing to 18/28S ribosomal RNA (rRNA) can present a challenge for miRNA-enriched samples, where rRNA is absent. Some of the characteristics of a good normalizer are as follows: invariant expression across all samples, expression of the normalizer with target in the cells of interest, high stability in storage conditions, efficiency of extraction and quantification is similar to the target of interest 95, 100, 101. Normalizing to a housekeeping gene can remove differences due to sampling, input and quality of RNA and can identify true changes in gene expression.
Currently, many groups use other noncoding RNAs as normalizers for miRNA expression. These may not serve as great reference genes because small nuclear RNAs like U6 do not share the same properties as miRNAs in terms of their transcription, processing and tissue-specific expression patterns. Therefore, especially in experiments analyzing potential defects in miRNA processing and regulation, other small RNAs could be misleading as reference genes. Some miRNAs exhibit stable expression under different conditions and may serve as miRNA reference genes that are good alternatives to U6 101. However, the best way to approach analysis of miRNA expression data is through global mean normalization of a set of reference genes. These may be tissue-specific. This method takes a minimum of three stable housekeeping genes and takes the geometric mean to provide a reliable normalization factor that can control for outliers and differences in abundance between genes 95, 100, 101.
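The geometric mean approach described above can be sketched in a few lines. This is a minimal illustration, assuming CT values are converted to relative quantities as (1 + E)^(-CT) with a shared efficiency E, and that at least three reference genes are supplied. The function names and the CT values are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalized_expression(ct_target, ct_refs, efficiency=1.0):
    """Target quantity divided by the geometric mean of the reference quantities."""
    if len(ct_refs) < 3:
        raise ValueError("use at least three reference genes")
    base = 1.0 + efficiency
    ref_quantities = [base ** (-ct) for ct in ct_refs]
    factor = geometric_mean(ref_quantities)   # normalization factor
    return base ** (-ct_target) / factor


# Hypothetical CT values: one target miRNA and three reference genes.
value = normalized_expression(25.0, [20.0, 22.0, 24.0])
print(value)
```

Because the geometric mean of the quantities corresponds to the arithmetic mean of the CT values, a single aberrant reference gene shifts the normalization factor less than it would under an arithmetic mean of the quantities.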
In addition to reference genes, it is also important to include plate normalizing factors to account for potential plate or slide variation that may arise from pipetting errors and different lot numbers of arrays. Microarray normalization of fluorescent spot intensity can also exhibit variation due to background fluorescence correction and spot quality 102. Although there are several factors to consider for normalization of qPCR and microarray experiments, these methods have been around longer and have undergone several rounds of quality control and optimization, making them an attractive platform for measuring miRNA expression.
qPCR-based miRNA profiling can be used with many different sample types and is especially useful when sample is very limited (<50 ng). This can be a great way to determine the known miRNA profile of a specific disease or cell type. However, newly identified miRNAs may not be available for profiling use since the throughput of qPCR is lagging behind the growing number of miRBase entries 4-6.
One important factor to be considered in experimental design is the potential difference in tissue and cell-type expression levels. Specific microRNAs may be restricted in expression to specific cell types and tissues and this could contribute to differences in microRNA profiling data between different samples. Moreover, in addition to cell-type specific miRNA expression, the mRNA targets may vary between cell or tissue type 103. One may also find that tissue culture-adapted cell lines have lower variability due to their clonal nature and may differ in expression compared to primary cells.
Several databases including miRanda (microRNA.org) provide expression profiles for miRNAs that can be accessed prior to experimental design. Cell type-specific miRNAs can be helpful to include in profiling studies as controls for confirming cell or tissue type differences. It is important to note that some miRNAs that are “tissue-specific” may not be completely absent and can still be expressed at low levels in other tissues but present at much higher levels in certain tissues or cell types. Since qPCR displays very high sensitivity, it also may be able to detect miRNAs that are present at very low levels that other techniques could deem undetectable. In this case, it is up to the user to determine how physiologically relevant the data is by possibly confirming the results with other techniques. Finally, it is of utmost importance that qPCR-based profiling be performed in several replicates to ensure high confidence in miRNA expression data. While technical replicates can be useful, they are not independent samples and therefore do not increase the number of biological observations of a given subpopulation. Only true biological replicates increase the statistical power of the study.
The use of Next Generation sequencing to analyze gene expression has grown rapidly due to its comprehensive nature. Sequencing can provide information about both known and novel miRNAs, giving it an advantage over techniques that depend on known sequences for assay design. However, there are issues that must be taken into consideration prior to performing your experiment 104. Sequencing can introduce errors during library preparation and may also be contaminated with RNA degradation products, creating elevated noise as compared with other platforms.
The sequencing platform can easily detect sequence variation of miRNAs, expression levels of other members of the small RNA repertoire and post-transcriptional modifications of these small RNAs, providing a more complete picture of a given sample. High-throughput sequencing is also the best way to discover novel miRNAs. However, confirming a novel miRNA requires much more than the detection of reads. A recent sequencing study found that a considerable proportion of newly identified miRNAs could merely be artifacts 105.
Another advantage of the sequencing platform for miRNA profiling is the large dynamic range provided by this technique if enough input is available. Sequencing can cover a dynamic range of more than five orders of magnitude, although its 95% confidence interval is actually similar to that of the microarray 51, 56, 60, 106. As with other techniques, several versions of sequencing platforms exist, and the choice of platform depends on the desired read length. The Roche 454 sequencing system provides the user with longer reads of about 400 bp but the overall number of reads tends to be lower (~400,000/sample). Other systems such as Solexa (Illumina) and SOLiD (ABI) output a higher number of shorter reads (~35 bp reads and up to 100 million reads/sample). Both approaches have proved efficient for assessing microRNA expression 18, 23, 25, 26, 29, 31, 70, 81, 83, 106, 107. A high correlation between the different sequencing platforms has been observed and there seems to be less variation here as compared with the different commercially available qPCR and microarray assays 51, 106.
Several sources of bias can emerge during sample preparation for sequencing. First, although miRNAs can exhibit high stability, RNA quality should be tested before proceeding with the experiment. The most commonly observed problems with sample preparation involve the introduction of RNA degradation products, contamination of the sample with ribosomal RNA, and adapter and enzyme bias. Factors such as the preferential ligation of adapters or barcodes to RNA containing certain sequences and variation in enzymatic efficiency can generate bias during library preparation. Also, since the library undergoes both a reverse transcription and a PCR amplification step prior to sequencing, some of the same biases discussed earlier may be introduced and amplified during this time 108. Assay reproducibility may ultimately depend on the consistency of the preparation method and the quality of the input RNA. Spiked-in oligonucleotides of known concentrations can also be added to samples prior to library preparation to help track and quantify sample input lost during the procedure, detect potential sequencing errors and assess other factors that can affect the data. Moreover, since Next Generation sequencing is fairly novel, newer products that aim to eliminate biases such as ribosomal RNA contamination should further improve its reliability as a miRNA profiling platform.
Next Generation sequencing has many applications. In terms of miRNA profiling, it provides a broad picture of the small RNA profile. Sequencing is not restricted to detecting the expression of known miRNAs and exhibits the highest discerning power for identifying unknown, novel miRNAs. In addition to discovery, this technique can be applied to detect different isoforms of miRNAs, new sequence variants and other types of small RNAs such as mirtrons, shRNAs, piwi-interacting RNAs or degradation pathway intermediates. Therefore, if one is interested in more broad-based profiling or obtaining data on different classes of small RNAs, sequencing is the best profiling platform.
Deep sequencing techniques have been used to profile and identify both viral and cellular miRNAs in response to infection with many different viruses, including the herpesviruses EBV, KSHV, MHV68, rLCV and Marek’s disease virus, as well as HCV, avian influenza virus and adenovirus 9, 10, 17-21, 23, 25, 26, 29, 31, 33. These studies have provided insight into how miRNAs may control the lifecycle and pathogenesis of virus-associated diseases. One study using sequence-based profiling of small RNAs revealed that miRNAs, newly identified miRNA offset RNAs (moRNAs) and antisense miRNAs (as-miRNAs) are all expressed following infection with Kaposi’s sarcoma-associated herpesvirus (KSHV), leaving us with a much more complete picture of virus-induced changes in the small RNA repertoire 18. Furthermore, these types of studies have contributed important information to the field of small RNAs as a whole. High throughput sequencing-based miRNA profiling will continue to grow as a field and shed light on various aspects of miRNA expression, function and processing.
After sequencing, the noncoding transcripts detected need to be evaluated further to ensure that they are functionally relevant and do not simply reflect technical noise. MicroRNA sequencing data can be confirmed by Northern blot analysis and also by confirming the alignment of the sequence reads with the pre-microRNA stem-loop region in the reference genome. When examining both viral and host miRNAs within infected cells, an important factor in experimental design is the percent of infection or transformation, as it can affect the proportion of sequencing reads mapping to viral miRNAs. Another area of caution is the discovery of novel miRNAs. For miRNA discovery, computational predictions alone are not robust enough and can lead to false positive results. However, novel viral and cellular miRNAs can be predicted using computational strategies and then experimentally confirmed by several other methods. One such prediction database for viral miRNAs is vir-mir db, a database that predicts miRNA hairpin loops based on virus sequence and secondary structure 28. Predictions from online databases or computational programs can then be mapped to small RNA reads for alignment. One potential caveat in attempting discovery in this manner is that the 5′ end of predicted miRNAs or stem-loop regions might not be precise, owing to inaccurate prediction algorithms or miRNA editing events. For other experimental methods and criteria regarding new miRNAs, researchers are encouraged to examine the new criteria for miRNA discovery referenced here 105.
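As a rough illustration of checking reads against a predicted stem-loop, the following Python sketch tallies the positions at which reads match a hairpin sequence exactly. It is a toy under stated assumptions, not a substitute for a real aligner: the function name `five_prime_starts` is hypothetical, the hairpin is an arbitrary made-up sequence, and only exact matches on the given strand are considered. A spread of start positions for the same arm would hint at the 5′ end imprecision mentioned above.

```python
from collections import Counter


def five_prime_starts(hairpin, reads):
    """Count reads by the 0-based hairpin position where they match exactly.

    Reads that do not occur in the hairpin are ignored. Sequences are
    upper-cased and T is converted to U so that DNA and RNA notation agree.
    """
    hairpin = hairpin.upper().replace("T", "U")
    starts = Counter()
    for read in reads:
        position = hairpin.find(read.upper().replace("T", "U"))
        if position != -1:
            starts[position] += 1
    return starts


# Arbitrary toy hairpin (not a real pre-miRNA) and reads cut from it.
toy_hairpin = "ACGUACGGAUCCGAUUAGCAUGCAAGGCUUACGAUCGGAUCCGUACGU"
toy_reads = [
    toy_hairpin[5:27],   # read starting at position 5
    toy_hairpin[5:27],   # identical read, same start
    toy_hairpin[6:27],   # same arm, 5' end shifted by one nucleotide
    "U" * 20,            # unrelated read that should not match
]
starts = five_prime_starts(toy_hairpin, toy_reads)
print(sorted(starts.items()))
```

In practice one would use an established short-read aligner that tolerates mismatches and reports strand; the sketch only shows what kind of summary is being sought.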
While deep sequencing may capture non-specific fragments such as degradation products, its high sensitivity can be of great experimental use. First of all, miRNAs that are undetectable by Northern blotting or microarray may still produce low read counts in sequencing. The question that then arises is: what is the biological relevance of such a low-abundance miRNA? The high sensitivity of Next Generation sequencing also allows identification of potential disease-specific mutations within the miRNA transcriptome, which can correlate with and sometimes predict disease progression or outcome 9, 10. Other techniques such as qPCR-based profiling, while sensitive, may detect but not identify these SNPs, since mutations can inhibit primer binding or yield an altered melting profile for the mutated miRNA.
Analysis of sequencing data is complex and requires in-depth bioinformatics and complicated sequence analysis algorithms. In addition to its long runtime, sequencing may require more time for analysis and in some cases extra personnel with expertise in statistical analysis of this type of data. While more user-friendly programs are being introduced for this growing field, sequencing can provide an overwhelming amount of read data that can be difficult to decode and translate into miRNA profiles 107. Moreover, the different statistical analysis methods may alter the classification and clustering of sequencing samples (our data, Lee et al, submitted). A reliable standard approach has yet to emerge.
Sequencing read data are typically first aligned to known miRNAs registered in miRBase or to the genome of the organism being studied. However, bias introduced by RNA ligase preferences, the cDNA synthesis reaction and PCR amplification of libraries may cause problems with normalization. One approach is to scale the data to the size of the library, reporting the reads of a particular miRNA over the total number of aligned reads as a fraction of the miRNA profile. The use of spike-in oligonucleotides may also be helpful in normalizing the data. Universal reference miRNAs are also currently in development to aid in the process of normalizing complex sequencing data 109, 110.
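The two normalization ideas mentioned above, library-size scaling and spike-in scaling, can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than an established pipeline: the function names `reads_per_million` and `spike_in_scaled`, the miRNA labels and all counts are hypothetical, and the spike-in is assumed to be counted separately from the miRNA reads.

```python
def reads_per_million(counts):
    """Express each miRNA as reads per million aligned miRNA reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no aligned reads")
    return {mirna: 1e6 * n / total for mirna, n in counts.items()}


def spike_in_scaled(counts, spike_observed, spike_expected):
    """Rescale counts so that the spike-in matches its expected read count.

    spike_expected is whatever count the known spike-in input should have
    produced under the user's own calibration (an assumption of this sketch).
    """
    if spike_observed <= 0:
        raise ValueError("spike-in was not detected")
    factor = spike_expected / spike_observed
    return {mirna: n * factor for mirna, n in counts.items()}


# Made-up aligned read counts for one library.
mirna_counts = {"miR-A": 500, "miR-B": 300, "miR-C": 200}

rpm = reads_per_million(mirna_counts)
scaled = spike_in_scaled(mirna_counts, spike_observed=50, spike_expected=100)
print(rpm)
print(scaled)
```

Library-size scaling makes profiles comparable as fractions of the total, but a large change in one abundant miRNA shifts every other fraction; spike-in scaling avoids this dependence at the cost of relying on accurate spike-in addition and recovery.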
In addition to analyzing the mature miRNA repertoire, precursor miRNA (pre-miRNA) profiling has also been used successfully to delineate stages of cellular transformation and disease 12. Expression levels of these stem-loop precursors can be analyzed using Next Generation sequencing of small RNAs or qPCR-based assays 18, 21, 40, 93. Pre-miRNAs can offer several technical advantages in terms of profiling and are similar in this respect to mRNAs. Their increased length (~70 nt) allows for better primer design and enhanced assay specificity. Also, because they represent an intermediate of miRNA processing, pre-miRNAs may offer a more robust and quick response to changes in cellular transcription (Figure 1). In addition, their decreased stability may also increase assay sensitivity for profiling various types of samples. One limitation of pre-miRNA profiling is that differences in miRNA processing and maturation may not become evident until the mature miRNA stage; in addition, modifications of pre-miRNAs can be tissue-specific 111. Consequently, although pre-miRNA levels often correlate with the levels of mature miRNAs, profiling of both the precursor and mature species can provide non-redundant information for classifying samples into distinct and biologically relevant clusters. The two assays have slightly different performance characteristics, so one may detect a change that the other cannot. Combining pre-miRNA and mature miRNA profiling can therefore provide a more complete picture of the response to disease, viral infection and other cellular stresses.
MiRNA profiling provides valuable information on the expression levels and kinetics of these small RNAs and may shed light on their potential biological roles in normal cellular function and disease. We have taken an in-depth look at several profiling techniques and reviewed the advantages and potential pitfalls of each. The best microRNA profiling platform may ultimately be determined by the convenience, dynamic range, variability or experience with a particular technique as well as local availability and cost. Any platform that is well tested in your experimental system and displays low technical variance will likely be an effective way of examining the small RNA repertoire. Once a primary profiling technique is chosen, it is advisable to avoid switching between platforms and preparatory reagents to achieve the best consistency among results. Confirmation of results using alternate methods such as qPCR or Northern blotting is also important. Furthermore, modulation of miRNA expression levels by addition of commercially available mimics and miRNA hairpin inhibitors allows researchers to explore the potential importance and function of a given miRNA signature.
This work is supported by NIH grant CA019014 to DPD. PC is a recipient of a T32 award T32-CA009156-37.