MicroRNAs (miRs) are endogenous, non-coding RNAs involved in many cellular processes and have been associated with the development and progression of cancer. There are many different ways to evaluate miRs.
We described some of the most commonly used and promising miR detection methods.
Each miR detection method has benefits and limitations. Microarray profiling and quantitative real-time reverse transcription PCR (qRT-PCR) are the two most common methods to evaluate miR expression. However, the results from microarray and qRT-PCR do not always agree. High-throughput, high-resolution next generation sequencing of small RNAs may offer the opportunity to quickly and accurately discover new miRs and confirm the presence of known miRs in the near future.
All of the current and new technologies have benefits and limitations to consider when designing miR studies. Results can vary across platforms, requiring careful and critical evaluation when interpreting findings.
Although miR detection and expression analyses are rapidly improving, there are still many technical challenges to overcome. The old molecular epidemiology tenet of rigorous biomarker validation and confirmation in independent studies remains essential.
In 1993, Rosalind Lee, Rhonda Feinbaum, and Victor Ambros reported the discovery of a gene, lin-4, that coded for small RNAs rather than a protein (1). This discovery led to the identification of an entirely new class of RNA: microRNA (miR). Mature miRs are small, single-stranded RNAs about 22 nucleotides in length that are highly conserved across species (2). By degrading messenger RNA (mRNA) transcripts or inhibiting protein translation, miRs negatively regulate gene expression for a variety of fundamental biological processes, such as apoptosis, development, differentiation, and proliferation (2, 3). It is estimated that miRs regulate approximately 30% of human genes (4), and miR dysregulation has been associated with the development and progression of cancer (5, 6). In fact, the same miR can act as an oncogene in some tissues and as a tumor suppressor in others (5).
These discoveries have sparked a great deal of interest in miR research. Because of their unique roles in post-transcriptional regulation and the control of protein translation, miRs are important epigenetic modulators. For example, since miRs can inhibit protein translation, gene expression may be high while the encoded protein expression is low (7). In addition, while mRNA is not stable in formalin-fixed, paraffin-embedded (FFPE) tissue, miR expression profiles seem to correlate well between fresh and FFPE samples, possibly due to miRs’ small size and resistance to RNA degradation (8–13). Stable miRs have also been detected in serum, plasma, urine, and other biological fluids and may be associated with cancer (14–22). These features make miRs extremely attractive for epidemiologic research, where archived FFPE tissue, blood, or other biological fluids are most often the available specimens.
MiRs have been evaluated through a number of different methods, and each has its own limitations. Some of the most commonly used and promising methods are listed in Table 1. The cloning method was originally used to discover miRs, which were subsequently confirmed with Northern blot (1, 23–26). Although most new miRs are still discovered through cloning, these methods are time-consuming, low-throughput, and biased toward the discovery of highly abundant miRs (27, 28). Similarly, other miR profiling methods have benefits and limitations. For example, while in situ hybridization is low-throughput and has limited sensitivity and specificity, it shows the cellular localization of miRs, which is useful for characterizing the biology of individual miRs (28, 29). More generally, methods that directly detect miRs have low sensitivity because of the miRs’ extremely short sequences and relatively low copy numbers. For these reasons, methods that do not involve miR amplification require more input total RNA. However, methods that do employ amplification can be error prone due to the extremely short and inflexible template characteristics and similarity in sequences within miR families. Amplified samples are also more greatly affected by handling errors (30).
Although there is currently no gold standard for measuring miR expression (29), oligonucleotide microarray (microchip) and quantitative real-time reverse transcription PCR (qRT-PCR) are two of the most common methods for evaluating known miRs (7, 27, 29–31). While some studies of cancer cell lines (32, 33) or human tissue (34) found good correlation between microarray and qRT-PCR for selected miRs, one recent study compared semi-high-throughput microarray and qRT-PCR in proliferating murine myoblast cells and concluded that there was low correlation across platforms (27). Similarly, we found poor overall correlation between microarray- and qRT-PCR-based miR expression in 49 samples from lung cancer cases in the Environmental And Genetics in Lung cancer Etiology (EAGLE) population-based case-control study. Microarray and qRT-PCR miR expression were significantly correlated for only 4 out of 9 (44%) human miRs evaluated. Other studies of miR expression in cancer have also reported a relatively poor replication of microarray miR expression by qRT-PCR (35–37), and studies with 100% validation often report only 1 to 3 miRs (38–43).
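Cross-platform agreement of the kind discussed above is often summarized per miR with a rank correlation across samples. The following is a minimal sketch of a Spearman correlation in plain Python. It is not the analysis used in the EAGLE comparison; the function names and the example values are hypothetical, and it assumes both platforms have been put on a scale where higher values mean higher expression (for qRT-PCR, for example, a negated normalized cycle threshold).

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("both platforms must cover the same samples")
    return pearson(ranks(x), ranks(y))


# Hypothetical expression values for one miR in five samples.
microarray = [7.1, 8.4, 6.9, 9.2, 8.0]
qrt_pcr = [-4.2, -3.1, -4.8, -2.5, -3.6]
rho = spearman(microarray, qrt_pcr)
```

A rank-based measure is chosen here because the two platforms report on different scales with different dynamic ranges, so only the ordering of samples is compared.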
There are a number of reasons why the results from qRT-PCR and microarray might differ. First, the larger dynamic range of stem-loop qRT-PCR (7 logs vs. 3–4 logs for microarray) may provide greater sensitivity (27). qRT-PCR may also have higher specificity compared to microarray in distinguishing miRs with bases that differ at the 3′-end since stem-loop primers can distinguish between miRs that differ by one nucleotide (27, 44). In addition, since miRs vary slightly in length and GC content, they have different melting temperatures (30). Yet all miR probes on a microarray must undergo the same hybridization conditions since they are all on the same microchip. These uniform hybridization conditions can lead to sequence-dependent differences in hybridization affinity that may result in either false positives due to non-specific hybridization or false negatives due to hybridization signals that do not exceed the background threshold (32). Dual channel (color) hybridization is less affected by this limitation than single channel hybridization since the ratio between the two channels is used for data analysis rather than signal intensity. On the other hand, qRT-PCR relies exclusively on the success of cDNA synthesis, which is initiated by a stem-loop primer that anneals to a short sequence at the 3′-end of the miR. Failure to initiate cDNA synthesis could result in false negatives. Readers should take care when reviewing older studies since stem-loop primers designed based on older versions of miR sequence databases, such as miRBase 9.2 (45), may not correctly prime to natural miR sequences due to inaccuracies in miR sequences from earlier versions as compared with the current version, miRBase 14 (46). However, most modern commercially available stem-loop primers are designed based on later versions of miRBase.
In addition, qRT-PCR requires extreme care to avoid contamination or other technical errors and can produce variable results even in expert laboratories, suggesting that it is not the ideal gold standard (29).
It is considered good practice to profile miRs by microarray followed by validation with qRT-PCR (5). However, there are no standard guidelines for conducting and reporting such validation. For example, some authors report validation by qRT-PCR for some miRs and by Northern blot for other miRs, or report validation of precursor miRs but not mature miRs, without any explanation as to why these tests were chosen (47–49). In addition, when authors report that a few miRs were validated by qRT-PCR, it is often unclear if other miRs were also tested but not validated by qRT-PCR. Standardized guidelines would aid interpretation of miR data by creating transparency in reporting. Furthermore, relative quantification of miR expression by qRT-PCR depends on the small nuclear RNA used as an internal control. There is no standard as to which internal control should be used for the normalization of qRT-PCR data, and inappropriate normalization can result in erroneous conclusions (50). Clarity in describing how standardization controls are chosen would also aid data interpretation.
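To illustrate how strongly relative quantification depends on the internal control, the sketch below uses the widely applied comparative cycle threshold (2^-ΔΔCt) calculation. This method is not named in the studies cited above and is shown only as one common approach; it assumes roughly 100% amplification efficiency for both target and control. The function name and all Ct values are hypothetical.

```python
def fold_change(ct_target_sample, ct_control_sample,
                ct_target_calibrator, ct_control_calibrator):
    """Relative expression by the comparative Ct (2^-ddCt) calculation.

    Assumes target and internal control both amplify with ~100% efficiency.
    """
    # Normalize the target to the internal control within each specimen.
    delta_sample = ct_target_sample - ct_control_sample
    delta_calibrator = ct_target_calibrator - ct_control_calibrator
    # Compare the specimen of interest with the calibrator specimen.
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values for one miR in tumor (sample) and normal (calibrator).
miR_tumor, miR_normal = 24.0, 26.0

# Internal control A is stable between tumor and normal tissue.
with_control_a = fold_change(miR_tumor, 20.0, miR_normal, 20.0)

# Internal control B is itself more abundant in tumor (lower Ct).
with_control_b = fold_change(miR_tumor, 18.0, miR_normal, 20.0)
```

With these illustrative numbers, control A yields an apparent 4-fold increase in tumor, whereas control B yields no change at all, which is why reporting how the control was chosen matters for interpretation.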
Since the full complement of human miRs has not been ascertained (29), platforms like microarray and qRT-PCR that can only identify known sequences are limited. Emerging sequencing technologies provide a new discovery approach and have already been used to study small RNA, of which microRNA is one of the main components. Next-generation high-resolution deep sequencing allows both discovery of new miRs and confirmation of known miRs (7) in a high-speed, high-throughput fashion without the need for gels (51) or the ambiguity in data interpretation inherent in other methods. These new methods primarily include three platforms: the Roche (454) Genome Sequencer (GS), which uses pyrosequencing to simultaneously generate over 1 million reads in excess of 400 base pairs (bp) (52); the Illumina (Solexa) Genome Analyzer (GA), which uses sequencing-by-synthesis to produce approximately 200 million 75- to 100-bp reads (53); and the Applied Biosystems SOLiD system, which uses sequencing by oligo ligation and detection to produce 400 million 50-bp reads (54).
In brief, these methods determine the nucleotide sequence by imaging the light emitted each time a new nucleotide is added to the growing strand (51). To ensure sufficient light signal intensity for accurate detection of each added nucleotide, these methods typically amplify the fragments through emulsion PCR or library generation followed by PCR-based cluster amplification. However, amplification can result in sequence errors, and some sequences may be preferentially amplified, limiting the ability to accurately quantify relative abundance. These methods can also be less accurate in homopolymer regions (runs of identical bases). New techniques to read the sequence derived from a single molecule are currently under development. Limitations of next-generation sequencing include bioinformatic challenges due to large quantities of data and the high cost of instruments and reagents, although each sample can be bar-coded, allowing samples to be mixed and run simultaneously to reduce cost. The third generation of sequencing technologies currently under development could eventually provide lower-cost options (51, 55).
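The bar-coding strategy mentioned above implies a sorting step after the run, in which each read is assigned back to its sample of origin. The following is a simplified sketch of that step, not the procedure of any particular platform or software package. It assumes that every barcode has the same length, sits at the start of the read, and must match exactly; the function name, barcodes, and reads are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Assign pooled reads to samples by an exact-match leading barcode.

    Assumes fixed-length barcodes at the start of every read (an assumption
    of this sketch). Reads with an unknown barcode go to "unassigned".
    """
    lengths = {len(barcode) for barcode in barcode_to_sample}
    if len(lengths) != 1:
        raise ValueError("expected one or more barcodes of identical length")
    barcode_length = lengths.pop()

    bins = defaultdict(list)
    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_length], "unassigned")
        # Store the insert with the barcode trimmed off.
        bins[sample].append(read[barcode_length:])
    return dict(bins)


# Hypothetical barcodes and pooled reads.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
pooled_reads = [
    "ACGTTGAGGTAGTAGGTTGTATAGTT",
    "TGCATGAGGTAGTAGGTTGTATAGTT",
    "GGGGTTTT",
]
by_sample = demultiplex(pooled_reads, barcodes)
```

In practice, sequencing errors within the barcode itself would call for barcodes that tolerate mismatches, which this exact-match sketch does not attempt.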
In summary, miR research is an exciting and growing field. Accurate and quantitative estimation of miR profiles or specific miR expression levels and their correlation with a given condition is the key to fully understanding the function of miR biological processing. All of the current and new technologies have benefits and limitations to consider when designing miR studies. Results can vary across platforms, requiring careful and critical evaluation when interpreting findings. When costs come down as they have for genotyping, next-generation sequencing may allow fast and possibly more accurate miR profiling in a way that could greatly enhance epidemiologic research.
The authors thank Dr. Aaron Schetter for his critical review of this manuscript.