The fields of applied and translational microRNA research have exploded in recent years as microRNAs have been implicated across a spectrum of diseases. MicroRNA biomarkers, microRNA therapeutics, microRNA regulation of cellular physiology and even xenomiRs have stimulated great interest, which has brought many researchers into the field. Despite many successes in determining general mechanisms of microRNA generation and function, the application of microRNAs in translational areas has not had as much success. It has been a challenge to localize microRNAs to a given cell type within tissues and assay them reliably. At supraphysiologic levels, microRNAs may regulate hosts of genes that are not the physiologic biochemical targets. Thus the applied and translational microRNA literature is filled with pitfalls and claims that are neither scientifically rigorous nor reproducible. This review is focused on increasing awareness of the challenges of working with microRNAs in translational research and recommends better practices in this area of discovery.
Since the discovery of mammalian microRNAs in the early 2000s, there has been a huge effort to understand the functionality of this class of small RNAs. These highly conserved molecules are an essential regulatory element affecting the activation of whole pathways and certainly have a central role in disease processes. Investigators have rapidly explored all manner of ideas related to microRNAs. Much of this work has been elegant, highlighting roles of microRNAs in transcriptional regulation and their potential as biomarkers and therapeutics. Another significant portion of microRNA studies advanced ideas that, although initially intriguing or plausible, have been shown through further, more rigorous study to be of questionable validity. This has created a legacy of ideas that have challenged the clarity of the physiologic roles of microRNAs.
The field of microRNA research is one with unique challenges. The manner of their numerical nomenclature, while highly organized, does not allow an intrinsic understanding of the microRNA's activity or localization in the same manner that gene names can be used. microRNAs can be difficult to assay specifically, as their short length (around 22 nt) and similar sequences can confound hybridization and amplification techniques and can complicate interpretation of cross-species discoveries. Many microRNA assaying techniques are new and not fully characterized, with nonstandard normalization methods, allowing for their misuse or misinterpretation. microRNAs also regulate mRNAs through short (6–8 bp) seed sequences, which by random chance can be found on many genes, making in silico identification of microRNA-regulated pathways possible but also potentially meaningless. All of these challenges have hampered the reproducibility and rigor expected of microRNA experiments, particularly in the translational and applied space. This is important in light of new NIH guidelines to assay rigor and reproducibility for grant funding (http://grants.nih.gov/reproducibility/).
We have evaluated aspects of the microRNA literature, primarily in the translational realm, and have identified both best practices and common misconceptions that, in our view, are complicating scientific discovery. It is our hope that by clearly stating these problems related to microRNA studies and suggesting better approaches and examples, a more rigorous and more biologically relevant approach to microRNA research can ensue. Further, we hope to encourage reconsideration of closely held, inaccurate beliefs based on less-than-rigorous microRNA literature so that the field can advance and the important functions and roles of microRNA can be more firmly established.
Although this should be obvious, all genes, proteins, microRNAs, and other RNA moieties are expressed by cells. A complex collection of different cell types constitutes a tissue. Thus, macerating a tissue for RNA isolation and discovering the presence of a particular microRNA does not localize it to the most common cell type or the cell type of interest to the investigator. This mistake is common and relates to the anonymity of microRNAs and our current lack of a centralized resource that provides microRNA expression at the cellular level.1-4
Although many microRNAs are ubiquitously expressed, others have expression specificity or marked concentration differences between cell types (Fig. 1).5 While a problem for many cell-specific microRNAs, 2 examples illustrate this point. miR-126 is expressed highly, but not necessarily exclusively, in endothelial cells.6,7 microRNAs miR-451a and miR-144, in a bicistronic cluster, are expressed exclusively in red blood cells (RBCs), while miR-486 is found in RBCs and possibly platelets (Sequence Read Archive [SRA] sample ERR747965). Robust expression of these 4 microRNAs is present in all tissue samples as a result of blood vessels, comprised of endothelial cells containing red blood cells (RBCs), being a constituent of all organs/tissues. While saline flushes can remove most RBCs in laboratory animal tissues, this is not achievable with surgically resected human tissues. However, the extent of rinsing of a tissue, to remove RBCs, may modulate the amounts of these 3 RBC microRNAs and can strongly and unknowingly complicate tissue signals (see point 2).8
These four microRNAs have been assigned incorrect functional significance in unrelated cell types. Again, because of tissue-level expression data, a miR-126 signal has repeatedly been reported as lower in cancer vs. normal tissue comparisons.9,10 As a result, its levels are then experimentally manipulated, likely inappropriately, in epithelial cancer cell lines, in an effort to determine function.10,11 microRNAs miR-451a and miR-144 have yet to be identified in any non-neoplastic cell type, yet they repeatedly are assigned non-RBC functionality due to tissue level data being confused for cell level expression data (Tables 1 and 2).12-17 miR-486 is also abundant in platelets, but neither RBC nor platelet expression justifies its assigned role in a variety of disease states in which it is unlikely to participate.18,19 Thus it is important to identify the cellular source of a tissue-level microRNA signature, which can be done by checking known datasets,5,20 obtaining cell-specific data from the Sequence Read Archive or Gene Expression Omnibus, or measuring the microRNA by qPCR or northern blot in a cell type of interest.
Might a normally absent microRNA be “turned on” in a disease state? Although plausible, as discussed in point 4 below, microRNA expression levels appear to be tightly regulated and tend to not have the marked fold changes observed for mRNAs. Thus, if a microRNA is not expressed in a certain cell type in a non-perturbed state, it is unlikely to be significantly increased by perturbation. It is worth noting that cancer-derived cells and cell lines may not express the same microRNAs as a matched native cell due to larger chromosomal structural changes and major pathway alterations. Therefore, caution should be taken in making that comparison. For almost all microRNA studies, it is essential to have definitive knowledge of the cellular origin of the microRNA.
As a correlate to point 1, because many microRNAs have widely variable expression differences between cell types, the ratio of cell types in tissues is important. The cellular composition of tissue changes in most disease states. In cancer, malignant cells replace the native epithelium. In inflammatory diseases, the number of infiltrating leukocytes can increase markedly. If a rare cell type has a 4-fold increase in a tissue, then its cell-specific microRNAs will also increase in a commensurate way (Fig. 2). Several examples from the literature support this underappreciated concept. The Tewari group demonstrated that the plasma expression level of miR-223, a microRNA with exclusivity to neutrophils and monocytes, strongly correlated with the absolute neutrophil count (R = 0.76).21 In a study of inducible colitis, the Zen group showed a strong increase of miR-150 in colons with inflammation.22 miR-150 is highly expressed exclusively in lymphocytes, which are increased multi-fold in colitis. Thus a mere alteration in the ratio of one cell type to another can drive perceived changes in tissue microRNA levels.
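The composition effect described above can be illustrated with a minimal sketch in which the tissue-level signal is treated as a weighted sum of per-cell expression. The cell fractions and expression values are hypothetical and chosen only for illustration; they are not taken from the cited studies.

```python
def tissue_level(fractions, per_cell_expression):
    """Tissue signal modeled as sum of (cell-type fraction * per-cell expression)."""
    return sum(fractions[cell] * per_cell_expression[cell] for cell in fractions)


# Hypothetical per-cell expression (arbitrary units) of a microRNA that is
# restricted to lymphocytes; its per-cell level is identical in both tissues.
per_cell = {"epithelium": 0.0, "lymphocyte": 100.0}

# Hypothetical tissue composition: lymphocytes increase 4-fold with inflammation.
control = {"epithelium": 0.98, "lymphocyte": 0.02}
inflamed = {"epithelium": 0.92, "lymphocyte": 0.08}

fold_change = tissue_level(inflamed, per_cell) / tissue_level(control, per_cell)
print(f"Apparent tissue-level fold change: {fold_change:.1f}")
```

In this toy model the tissue signal rises about 4-fold although no cell has changed its expression of the microRNA, which is the pattern a composition shift alone would produce.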
It is not yet possible to deconvolute complex tissues into their intrinsic cell types based on microRNA expression levels alone. To avoid the pitfall of conflating tissue composition and microRNA expression changes, histopathologic examination of case and control tissue can be cross-checked against microRNA expression manuscripts to identify microRNAs that may change only as a result of cellular composition changes. A change in the tissue level of a microRNA does not necessarily imply its up- or downregulation. Failure to recognize this cause of expression alteration has probably adversely affected the interpretation of numerous studies.
The tools that have been developed to analyze microRNA expression are still being refined, and several technical and methodologic challenges remain. In recent years, for example, small RNA-seq for microRNA (microRNA-seq) has become a major method to analyze microRNA expression. While microRNA-seq is used extensively, it does have its limitations. Here, we will discuss several challenges related to microRNA-seq in general and normalization methods specifically.
Currently, most microRNA data sets are normalized to reads per million microRNA reads (RPM). This is similar to Fragments Per Kilobase of transcript per Million mapped reads (FPKM), established for mRNA reads, without a need to normalize for transcript length since mature microRNAs are all ~17-25 nt. Also, since the number of reads can vary >10-fold within/between projects, it is superior to using raw microRNA counts. The fundamental problem with RPM is that it is a dependent normalization value, such that any change in one microRNA's read counts will adjust all other microRNAs' values whether or not absolute expression actually changed (Fig. 3). Between random fluctuations in sequencing reads and such a change, one could identify statistically significant microRNA changes that are merely a feature of other fluctuations. Array and qRT-PCR based methods do not have the same challenge, as these are independent observations.
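The dependence of RPM values on one another can be seen in a small worked example. The sketch below assumes the simple definition RPM = 1,000,000 × (reads for one microRNA) / (total microRNA reads); the microRNA names and read counts are hypothetical.

```python
def rpm(counts):
    """Reads per million microRNA reads for each microRNA in one sample."""
    total = sum(counts.values())
    return {name: 1_000_000 * reads / total for name, reads in counts.items()}


# Hypothetical raw read counts. Only miR-A differs between the two samples;
# the raw counts of miR-B and miR-C are identical.
sample_a = {"miR-A": 50_000, "miR-B": 30_000, "miR-C": 20_000}
sample_b = {"miR-A": 250_000, "miR-B": 30_000, "miR-C": 20_000}

rpm_a = rpm(sample_a)
rpm_b = rpm(sample_b)

for name in sample_a:
    ratio = rpm_b[name] / rpm_a[name]
    print(f"{name}: {rpm_a[name]:>9.0f} RPM -> {rpm_b[name]:>9.0f} RPM (ratio {ratio:.2f})")
```

Here miR-B and miR-C appear to fall to one third of their former RPM values although their read counts did not change, because the increase in miR-A enlarges the denominator shared by all microRNAs.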
A second challenge of microRNA-seq is the diversity of alignment tool methods. These have an outsized effect on which microRNAs are identified and how many reads are called. In a comparison study, we used 8 different tools on the same datasets.23 The differences depend on the library to which the microRNAs are being aligned (genome or hairpin microRNA), the alignment tool (Bowtie, BLASTn, PatMaN), the alignment parameters (allowed mismatches) and the handling of reads mapping to multiple locations. For a sample containing 33.23 million reads (SRA sample SRR873410) the methods ranged from discovering 26.24 million microRNA reads down to only 16.39 million microRNA reads. In this same data set, the methods found between 489 and 1,499 microRNAs in total. Thus, differences in alignment tools can significantly affect the counting of microRNAs in a sample. We found strong agreement between 2 new tools, miRge and Chimira, and can recommend them both.23,24
Finally, there are biases in microRNA-seq library preparation, mainly due to challenges of adaptor ligation, GC content, and PCR amplification bias.25-27 The most commonly used method for Illumina sequencing is the TruSeq SmallRNA kit. As an example of the biases that exist, we have found that in almost all examples of microRNA-seq using this method there is a ~40 fold higher level of miR-143 than miR-145, 2 microRNAs in a bicistronic cluster. However, by other methods including RNA gel blot and qRT-PCR, these levels are significantly more equivalent.28-30 Two studies have demonstrated wide differences in microRNA RPM depending on the use of different library preparation kits (TruSeq small RNA [Illumina], NEXTflex [Bioo Scientific], and NEBNext [New England Biolabs]), with some microRNAs having counts as high as 100,000 RPM by one method and <1,000 RPM by a different method from matched samples.31,32 Thus in comparing studies, it is important to keep these concerns in mind and for all projects within a given laboratory or core sequencing facility be consistent in regards to the preparation method used.
For studies designed to investigate a handful of microRNAs or even hundreds of microRNAs, other methods such as qPCR arrays, hybridization arrays, droplet digital PCR (ddPCR), or NanoString may be better approaches. In a head-to-head analysis of methods in the miRQC study, the sequencing methods were not superior and, in fact, correlated less well with the other methods for matched samples.33 The improvements that need to be made for microRNA-seq to become a true gold standard are 1) improvement of library preparation to reduce bias; 2) the use of a consensus method of alignment; and 3) the development of universal internal controls as has recently been proposed.34
Another major methodologic challenge concerns methods to measure microRNAs in biofluids, particularly finding and using appropriate internal or external controls to which the data can be normalized. In the first paper describing circulating microRNAs for cancer detection, the Tewari group used 3 spiked-in C. elegans microRNAs to normalize the data.35 Other groups have tried a variety of methods, including housekeeping-style RNAs, with the small nuclear RNA U6 (RNU6B, U6) being one of them. U6 has been used at least as far back as 2007 to normalize microRNA data, despite evidence showing its poor performance as a normalization tool in some settings.36,37 Whether or not it is the best normalization standard for tissue and cell work is beyond the scope of this commentary, and we recommend several manuscripts on the topic for further information.37-39 However, for serum and plasma, it is likely unacceptable. U6 would be present in serum or plasma only as a result of cell lysis or coagulation, which are variable events and highly influenced by pre-analytical factors (clotting differences, spin speed differences, red blood cell lysis, storage temperature, etc.).40,41
The best approach to normalize serum/plasma microRNA is still open to debate. A microRNA spike-in during the extraction and amplification steps is a current best practice but does not provide biological normalization.42 If the experiment is using an array-based method with hundreds of microRNAs, a global normalization approach may be useful.33,43 Many microRNAs have been tried as “housekeeping genes,” with the most common being miR-16. However, this microRNA has also been reported to vary in many human diseases.5 Thus, while there may not yet be an agreed upon way of doing this normalization in a fluid sample, the main pitfall to avoid is the use of any single RNA species, such as U6, without evidence for its reliability.
Most well-performed studies that have investigated microRNA expression changes in a disease state generally find small (~1.5-4-fold) changes in microRNA levels.44-46 This may not hold in cancer, where, as described above, genetic alterations may drive some large differences in a microRNA signal. Also, microRNAs may change to a greater extent during development.47 Nonetheless, huge fold differences in microRNAs in somatic adult cells may be due to a methodologic problem, often poor normalization. For example, a 2013 study investigated microRNAs as serum biomarkers for endometriosis.48 Using a TaqMan microRNA array, the investigators found 22 microRNAs that all had >1,000 fold differences between samples. However, this conclusion was based on 2 pooled samples of 10 individuals each. Therefore, any technical variables could not be accounted for or removed. The need to pool samples for such experiments is often based on financial resources, but it carries a high cost of uncontrolled variability. Additionally, a subset of these microRNAs were “confirmed” in a second sample of serum, with normalization to RNU6B, which was described in point 3 as inappropriate as it is not native to serum.
Experimental noise—background signal—is all too often misinterpreted as fertile ground for biomarker discovery. A detection or abundance threshold is a must for any profiling study, since undetected features cannot be normalized or assessed for expression differences. Unfortunately, scores of studies have reported putatively significant findings that are simply noise (examples available upon request).
As an example, 2 very closely related reports described differential regulation of microRNAs in brain cells exposed to HIV-1 proteins.49,50 Specifically, 69 microRNAs had a 2-fold change or greater response to treatment in neurons and a neuronal cell line. A reanalysis of the microarray data revealed that only 1 of the 69 microRNAs was expressed above a common detection threshold setting (negative control (background) signal plus 2 standard deviations). Despite this, numerous microRNAs were “confirmed” by quantitative PCR.
It is often obvious if a study has generated seemingly significant findings within the noise of the data. These papers report obscure microRNAs or the anti-sense microRNA (not Argonaute (AGO)-incorporated), which mostly has extremely low levels of mature expression. These papers often claim large fold changes of expression which are beyond the physiologically reasonable changes in disease (see point 4).
The pitfall to avoid is reporting noise as results. Certain basal levels of expression should be adhered to, and the minimal threshold should be reported in the methods section. As stated, 2 standard deviations above background signal, which indicates statistical significance, may be a reasonable cutoff level for hybridization array based data. For microRNA-seq, a rule of thumb is that microRNAs should have values above 100 RPM, with the added caveat that a change in the level of one microRNA affects the levels of all others (point 3, above). Indeed, microRNAs with <1,000 RPM may not have biologic activity.51 It is also worth determining if the described microRNA is expressed in the cell type of interest (point 1). Failing to account for noise, one will very likely identify a host of obscure microRNAs that have no biologic functionality in the tissue or cell type being evaluated. Such results are unlikely to be reproducible in follow-up experiments, and functional studies of these microRNAs are generally artificial.
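The two cutoffs mentioned above (background plus 2 standard deviations for array data, and 100 RPM for microRNA-seq) could be applied as in the following minimal sketch. The probe intensities, RPM values, and function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify which estimator to use.

```python
from statistics import mean, stdev


def array_threshold(background, n_sd=2.0):
    """Detection cutoff: mean of negative-control signal plus n_sd sample SDs."""
    return mean(background) + n_sd * stdev(background)


def above_threshold(values, threshold):
    """Keep only the features whose value exceeds the threshold."""
    return {name: value for name, value in values.items() if value > threshold}


# Hypothetical negative-control (background) probe intensities from one array.
background = [48.0, 52.0, 50.0, 47.0, 53.0]
array_signals = {"miR-A": 900.0, "miR-B": 54.0, "miR-C": 51.0}

cutoff = array_threshold(background)
detected_on_array = above_threshold(array_signals, cutoff)
print(f"Array cutoff: {cutoff:.1f}; detected: {sorted(detected_on_array)}")

# Hypothetical microRNA-seq values filtered with the 100 RPM rule of thumb.
seq_rpm = {"miR-A": 45_000.0, "miR-B": 120.0, "miR-C": 12.0}
detected_by_seq = above_threshold(seq_rpm, 100.0)
print(f"Detected by sequencing: {sorted(detected_by_seq)}")
```

Whatever cutoff is chosen, the key point from the text is that it should be applied before differential expression is assessed and stated in the methods section.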
The ability to transfect or otherwise introduce microRNAs or their inhibitors into cells is important to understanding the biology and regulatory behavior of this class of RNA. However, one must be aware of potential problems. For example, abnormal AGO loading and formation of high molecular weight aggregates have been reported in transfection experiments,52 while the presence of microRNA inhibitors has been found to directly inhibit qPCR reactions. Length-dependent activation of dsRNA sensor pathways by microRNA mimics (but not by commercially available 21-nucleotide controls) can also cloud interpretation of results.53,54 While exogenous delivery of microRNAs is routinely performed and widely published, just as with mRNA studies, marked overexpression of a microRNA can result in biologically inaccurate discoveries. The recent efforts to understand extracellular vesicle (EV)-mediated transfer of microRNAs through the use of transfection are illuminating.
With the seminal work by Valadi et al, the field of EV microRNA transfer was born.55 There has been a significant effort to understand how microRNAs are packaged (selective versus stochastic) and transferred, and into what cell types they can be delivered to provide paracrine signaling.56-58 Many groups break this latter experiment into 2 parts. First they demonstrate transfer of EVs from one cell type to another, often using fluorescent lipid dyes that may themselves form micelles or transfer promiscuously. Then they use a more traditional transfection approach to increase their microRNA of interest in the recipient cell and assess its effects. The problem with this approach is the vastly different uptake of microRNAs via EVs and artificial liposomal systems. As seen in Fig. 4, the amount of microRNA that recipient cells can take up by EV transfer likely differs, by orders of magnitude, from the microRNA levels achievable by transfection. The supraphysiologic levels of microRNA obtainable by transfection may have different biological functions, as described below.
Consider the abundance of microRNAs in EVs in the circulation. Using a published estimate of microRNA copies per exosome,59 and selecting a hypothetical abundant microRNA with a prevalence of 5% of total microRNA, one copy would be expected in every 2,400 EVs. This estimate is close to the finding of abundant miR-223 and miR-126 at a frequency of one copy per 7,000 and 18,000 blood plasma EVs, respectively.59 Based on our nanoparticle tracking data and published blood EV abundance estimates,60 there may be approximately 3 to 11 trillion EVs in circulation in the average human, with around 1 to 5 billion carrying a copy of our hypothetical microRNA. These “loaded” EVs, though, are part of a system that includes about 25 trillion erythrocytes, 1.5 trillion endothelial cells, 1.5 trillion platelets, and 500 billion lymphocytes (based on estimates from Bianconi et al61 and Sender et al62). Even if all blood EVs could somehow be concentrated and delivered to a specific, small cell population (with no uptake by any of the cell types listed above), circulating blood carries only enough EVs to introduce a regulation-relevant thousand or more EVs into only a million or so cells, around the same number one might culture in a single well of a 6-well culture plate. In contrast, typical microRNA mimic transfection experiments involve millions of copies per cell, delivered using lipid components optimized for nucleic acid delivery into the cytoplasm of cells.
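The back-of-the-envelope estimate above can be reproduced with a few lines of arithmetic. The sketch uses only the figures quoted in the text (one copy per 2,400 EVs, 3 to 11 trillion circulating EVs, and roughly 1,000 microRNA-carrying EVs per recipient cell); reading "a thousand or more EVs" as 1,000 loaded EVs per cell is our interpretation, and the constant and function names are illustrative.

```python
EVS_PER_MICRORNA_COPY = 2_400        # one copy of the hypothetical microRNA per 2,400 EVs
CIRCULATING_EVS = (3e12, 11e12)      # low and high estimates of EVs in circulation
LOADED_EVS_PER_CELL = 1_000          # assumed "regulation-relevant" dose per cell


def loaded_evs(total_evs):
    """Number of EVs expected to carry one copy of the microRNA."""
    return total_evs / EVS_PER_MICRORNA_COPY


def cells_supplied(total_evs):
    """Cells that could each receive LOADED_EVS_PER_CELL loaded EVs."""
    return loaded_evs(total_evs) / LOADED_EVS_PER_CELL


for total in CIRCULATING_EVS:
    print(
        f"{total:.0e} EVs -> {loaded_evs(total):.2e} loaded EVs "
        f"-> {cells_supplied(total):.2e} cells at {LOADED_EVS_PER_CELL} EVs each"
    )
```

Under these assumptions the whole circulation supplies about 1.25 to 4.6 billion loaded EVs, enough for roughly 1 to 5 million cells, which matches the "1 to 5 billion" and "a million or so cells" figures given in the text.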
We do not suggest that EVs have no role in delivering microRNAs in vivo. In tissue, especially at local cell-cell “synapses,” concentrations of EVs might reach levels that are relevant to functional transfer of microRNA. Although much remains to be learned about EV-mediated delivery,63,64 some EVs might also deliver microRNAs in the “right,” i.e., AGO-associated, form to plug into regulatory complexes more efficiently than exogenous synthetic complexes. EV-delivered microRNAs might also accumulate in specific cells over extended periods of time, to be activated under certain stimuli. However, it is clear that there are insufficient microRNA-containing EVs in the entirety of human circulation to achieve the immediate, per-cell exposure levels of typical microRNA transfection experiments, and this must be considered when designing experiments and interpreting results.
How does supraphysiologic overexpression affect the functionality of the microRNA in the cell? Mayya and Duchaine have shown that microRNAs work in a variable dose-response fashion with their mRNA binding sites.65 Thus, the concentration of a microRNA may dictate which genes it regulates. If the microRNA is at a low concentration it may modulate only a subset of genes, whereas at high concentrations, it may find additional mRNA binding partners. Which of these are physiologic partners and which are non-physiologic “off-target” effects are not known but are likely impacting biologic pathways and cell phenotypes in different ways.
A caveat: it is unknown what percent of microRNAs from either method are incorporated into the RISC complex. Interesting work by Thomson et al suggests that only a small percent of transfected microRNAs end up in a potentially functional pool.53 So perhaps the discrepancy is not as wide as thought, but this should be clarified before more 2-step exosomal microRNA transfer experiments are performed in this fashion. To avoid this pitfall, one should consider the biological levels of the microRNA in a cell and transfect it accordingly. Supraphysiologic levels could lead to unintended and biologically unlikely regulatory activities. It is also worth noting that experimental expression of a microRNA in a cell type that does not natively express the microRNA will likely result in a change to the cellular phenotype. This is simply the result of the minimalist requirement of a short (6-8bp) complementary seed sequence to be present on an mRNA that can be bound by abundant microRNAs in a RISC complex. These phenotypes could be interesting and important from a therapeutic angle, but not necessarily biologically relevant.20
Even if a microRNA is abundant under physiologic conditions, it might not play an important role in posttranscriptional regulation of specific RNAs. This is because microRNA concentration is only one of several factors that influence the extent of regulation. Even changes in concentration of microRNAs may be inconsequential. A common approach to determining microRNA regulation of a gene in the literature is as follows: 1) profiling reveals that miR-X is upregulated in a disease of interest; 2) an in silico predicted target of the microRNA, Target A, is related to the disease and is downregulated; 3) supraphysiologic miR-X overexpression in a transformed cell line partially silences a reporter with recognition sites from putative Target A; 4) miR-X must regulate Target A. This conclusion is often true; however, there are many additional factors that must be considered in microRNA-mediated regulation, particularly due to the supraphysiologic and ex vivo nature of these experiments.
Perhaps the most intuitive factor is target concentration.66 A microRNA that increases 2-fold under cell stimulation might not have a proportional effect on a target mRNA with 100-fold greater transcriptional abundance under the same circumstances. Each microRNA species may regulate many messages and the fold expression effect can be diluted out by its affinity to binding partners other than the gene of interest. Nor are mRNAs the only RNAs recognized by microRNAs. microRNAs have been found in RISC in proximity to a variety of RNA molecules, including long noncoding RNAs, tRNAs, even other microRNAs.67,68 Altering the levels of one “competing” target will not necessarily have a strong effect on other targets.69
Complicating the issue further is our limited understanding of how microRNAs interact with targets. Canonical, perfect seed matches may be enriched and relatively simple to search for, but they are not the only modes of microRNA:target recognition. Some interactions appear to have no seed-based interaction at all.68 Quantitating and correlating altered microRNA and transcript (or recognition element) levels, then matching them up via bioinformatics approaches, does not determine direct regulation. The cross-linking, ligation, and sequencing of hybrids (CLASH) approach68,70 which binds microRNAs to their targets, does demonstrate a true interaction and should be a more advantageous approach. Unfortunately, it has not been widely adopted and, like earlier methods, requires large amounts of sample to generate small amounts of direct interaction data. More sensitive refinements of CLASH and related methods are needed to assess direct binding and regulation in smaller samples, since the “regulome” for each specific microRNA is likely to be different from one cell type to another and under different conditions. Full and accurate knowledge of the target landscape is essential to understanding effects of microRNAs.
The subcellular localization of microRNAs strongly affects what putative targets they can regulate. First, the nuclear/cytoplasmic ratio of a specific microRNA will clearly affect its availability for canonical effects on messages. If a microRNA is upregulated during disease, no change in its regulatory influence would be expected if the nuclear ratio also shifted such that most of the “extra” copies were found in the nucleus. Similarly, a tilt toward the cytoplasm for a predominantly nuclear microRNA could lead to increased target regulation even if the overall microRNA level did not change or even declined. Second, location of a microRNA and its cognate mRNA within the cytoplasm might influence whether the message is translated, cleaved, or repressed. It has been proposed, for example, that P bodies are repositories of “inactive” mRNAs.71 Alternative interpretations have been proposed, though. Transcripts of important proteins are held in reserve with microRNAs and translational machinery near synapses, far from the nucleus.72 Upon appropriate signaling, proteins can be made locally and rapidly without waiting for communication with the nucleus.
In our view, one of the potentially most transformative recent insights in the field was reported by La Rocca and colleagues.73,74 In many or most mammalian tissues—and presumably in most cells—microRNAs were found predominantly in a low molecular weight form (inactive complex with AGO) rather than a high molecular weight form (within an active RISC complex and bound to mRNA).74 Complexed with AGO alone, microRNAs might be long-lived, but they have no regulatory influence.73 To regulate, a microRNA:AGO must be incorporated with other proteins into RISC and associated with a target RNA. La Rocca et al. found that the exceptions to the inactive, low mass forms were from high-activity organs such as thymus (immune cell development and maturation) and brain (high energy use). In vitro, cancer cell lines and activated cells also had predominantly high molecular weight complexes. These findings suggest that microRNA-mediated regulation may be rare and inconsequential in most adult cells and is important chiefly in development, proliferation and activation, high metabolic activity, and cancer. In stimulated T cells, compared to resting T cells, redistribution of microRNAs into the high molecular weight form was distinct for each microRNA. Strikingly, La Rocca et al reported a microRNA that declined in overall abundance during T-cell activation yet became a more functional regulator as it was incorporated into active, high molecular weight complexes.74 In future studies, one must now consider not only abundance across the microRNA:target network, but also the activity state of AGO and microRNAs in the cell.
One of the benefits of our information age and the sharing economy is the number of publicly available data sets and tools to evaluate microRNAs. There is significant value in utilizing these tools (Table 3). However, caution and skepticism should always be advised as, in general, these materials may not have been developed to address one's exact question.
An important challenge for microRNA bioinformatics is the prediction of mRNA targets. Multiple programs built on different algorithms have been developed, including miRanda, TargetScan, RNA22, and RNAHybrid.75-78 Due to differences in their models, each will predict different gene targets for any given microRNA. While some researchers have focused on any gene that shows up on multiple lists made by the different tools, no consistently significant enrichment of “true” targets was found in a test of this method.79 We further recommend confirming that possible mRNA binding partners are expressed in the appropriate cell type and at a reasonable level relative to the microRNA (see points 1 and 7).
Another approach to finding relevant gene-microRNA interactions is to query “validated” target databases. These tools use the literature and/or public databases to find microRNA-target interactions with experimental support; however, they should be used with caution and at least some fact-checking. We have previously noted the pitfalls of algorithms that apparently use simple text searches,80 such that, e.g., a mention of “β-actin” on page 122 of an edition might be recorded by the algorithm as miR-122 targeting the protein. We have also observed that the number and identity of interactions predicted by some databases can change drastically from month to month, even without version updates, compromising reliability.80 An example of a reliable validated target database is DIANA-TarBase, which is “manually” curated and helpfully provides not only the supporting study or studies for each interaction, but also valuable methods information.81 Even so, each interaction rests on studies that may have their inherent idiosyncrasies. If possible, it is recommended to check the primary sources for interactions of interest.
Related to target prediction and validation, many microRNA studies consist of profiling experiments followed by extensive pathway analysis. Differentially expressed microRNAs are assigned to genes, which are, in turn, assigned to biological pathways. Selected pathways are then discussed in the context of the biological question. This approach involves multiple levels of prediction and potential bias. Algorithms or literature records are used to assign microRNAs to possible targets. Unless gene expression data are available, the relative concentrations (or even presence) of targets are not assessed in the cell or tissue being studied. Pathway analysis introduces a publication bias, relying on published interactions and pathway associations. With much of the field focused on the role of microRNAs in neoplasia, it particularly biases microRNA function toward cell proliferation pathways. Finally, identified pathways are integrated into a discussion. What is an appropriate control for a pathway analysis? Would a random selection of microRNAs also yield “significant” pathways that could be rationalized post hoc? In general, we would suggest that pathway analysis is of limited importance and utility when performed alone. It is useful in conjunction with other data and analyses to suggest interactions that are then experimentally confirmed.
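One way to approach the random-selection question is an empirical control: draw random microRNA sets of the same size as the selected set and ask how often they hit the pathway at least as often. The Python sketch below is only an illustration of that idea, assuming a precomputed microRNA-to-target mapping; the function names, microRNA names, gene names, and the number of draws are hypothetical, and this is not a method described in the cited studies.

```python
import random


def pathway_hits(mirnas, targets_by_mirna, pathway_genes):
    """Number of pathway genes targeted by at least one microRNA in the set."""
    targeted = set()
    for mirna in mirnas:
        targeted |= targets_by_mirna.get(mirna, set())
    return len(targeted & pathway_genes)


def random_set_control(selected, targets_by_mirna, pathway_genes,
                       n_draws=1000, seed=0):
    """Empirical p-value: how often does a random microRNA set of the same
    size hit the pathway at least as often as the selected set?"""
    rng = random.Random(seed)
    pool = sorted(targets_by_mirna)
    observed = pathway_hits(selected, targets_by_mirna, pathway_genes)
    as_extreme = sum(
        pathway_hits(rng.sample(pool, len(selected)), targets_by_mirna,
                     pathway_genes) >= observed
        for _ in range(n_draws)
    )
    # Add-one correction so the estimate is never exactly zero.
    return observed, (as_extreme + 1) / (n_draws + 1)


pathway = {"G0", "G1", "G2", "G3", "G4"}

# Toy case 1 (hypothetical): only two microRNAs target the pathway.
specific = {f"miR-x{i}": {f"OTHER{i}"} for i in range(18)}
specific["miR-a"] = {"G0", "G1", "G2"}
specific["miR-b"] = {"G3", "G4"}
obs_specific, p_specific = random_set_control(
    ["miR-a", "miR-b"], specific, pathway)

# Toy case 2 (hypothetical): every microRNA is predicted to target the whole
# pathway, so any selection of microRNAs looks equally "enriched".
promiscuous = {name: set(pathway) for name in specific}
obs_promiscuous, p_promiscuous = random_set_control(
    ["miR-a", "miR-b"], promiscuous, pathway)

print(obs_specific, p_specific)
print(obs_promiscuous, p_promiscuous)
```

Both toy cases give the same observed overlap with the pathway, but only the first is unusual relative to random microRNA sets; in the second, the overlap reflects the breadth of the target lists rather than the selection.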
A substantial and growing amount of microRNA expression data is present at the Gene Expression Omnibus (GEO) or Sequence Read Archive (SRA). These data repositories can be used to evaluate microRNA localization (see point 1) or microRNA response to a stimulus. Quality checks should be performed when using public data as we have found them to be variable. For microRNA RNA-seq, the percent of all reads that are called microRNAs is important. In our experience, <25% is poor, 25–50% is likely OK, and >50% is typical of good data. It is also important to have a depth of coverage >1,000,000 microRNA reads for accurate RPM values. For array data, as stated above, first check that signal exceeds background, then ensure that an appropriate global normalization has been performed.
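The RNA-seq thresholds above can be written down as a small screening function. The following Python sketch encodes the rules of thumb from the text (<25% poor, 25–50% likely OK, >50% good, and more than 1,000,000 microRNA reads); the function names and the example read counts are hypothetical, and computing RPM relative to total microRNA reads is an assumption of this sketch.

```python
def mirna_library_qc(total_reads, mirna_reads, min_mirna_reads=1_000_000):
    """Classify a small RNA-seq library using the rules of thumb in the text."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    fraction = mirna_reads / total_reads
    if fraction < 0.25:
        quality = "poor"
    elif fraction <= 0.50:
        quality = "likely OK"
    else:
        quality = "good"
    return {
        "mirna_fraction": fraction,
        "quality": quality,
        "sufficient_depth": mirna_reads > min_mirna_reads,
    }


def reads_per_million(mirna_counts):
    """Scale raw counts to reads per million (RPM) of total microRNA reads."""
    total = sum(mirna_counts.values())
    if total == 0:
        raise ValueError("no microRNA reads to normalize")
    return {name: count * 1_000_000 / total
            for name, count in mirna_counts.items()}


# Hypothetical libraries: (total reads, reads called as microRNAs).
deep_library = mirna_library_qc(4_000_000, 2_400_000)
shallow_library = mirna_library_qc(4_000_000, 800_000)
print(deep_library)
print(shallow_library)
```

A library that fails either check is better excluded or flagged before its RPM values are compared with those from other data sets.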
Some of the influential manuscripts in the microRNA field represent excellent case studies to help work through the challenges and successes of microRNA research. We present 4 important and often well-cited publications as case studies to demonstrate the challenges in rigor and reproducibility and how to successfully navigate through them.
Case Study 1: In 2007, a major study reported on the downregulation of microRNAs in HIV-1 infection.82 Huang et al's profile of microRNAs in activated and inactivated CD4+ T-cells reported active T cells are better hosts of HIV replication than resting cells. Downregulated microRNAs, the authors reasoned, could be involved in repressing HIV transcripts, so they used target prediction algorithms to identify potential microRNA recognition elements in the 3′ portion of the HIV genome. Five microRNAs that were 2x downregulated and had predicted binding sites in HIV RNA were selected for further investigation (miR-28, miR-125b, miR-150, miR-223, and miR-382). All five microRNAs were found to bind directly to HIV sequences in reporter assays and to have a negative effect on HIV when transfected together into cells.
Numerous subsequent studies have taken these 5 microRNAs to be an exclusive set of reliable “anti-HIV” microRNAs, even though no additional confirmation of results was done and despite identification of many additional candidates according to the supplemental materials of the original study. To our knowledge, there has been no published reproduction of the luciferase and mutation assays to confirm direct targeting of HIV by specific microRNAs. A seminal “guidepost” manuscript, such as this, can sometimes restrict subsequent studies and allow other manuscripts to be published with less-rigorous methodology. For example, a later study used a microarray approach to identify anti-HIV microRNAs that were downregulated by cocaine treatment of CD4+ T-cells.83 The authors reported that the same 5 microRNAs were downregulated by cocaine treatment. However, these data were cherry-picked, as analysis of the raw microarray data demonstrated that every detected microRNA on the array was “downregulated” by cocaine treatment, presumably the result of a dye bias due to improper methods. Essentially, this second paper's findings were accepted as plausible because they aligned with the earlier study. This story reminds us to carefully interpret these “guidepost” manuscripts and subsequent studies and to maintain a healthy level of skepticism until the data have been rigorously confirmed.
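A global shift of the kind described here is straightforward to screen for before any individual microRNA is singled out. The Python sketch below is an illustration only: the microRNA names and log2 ratios are invented toy values (not data from either study), the 90% same-direction cutoff is arbitrary, and median centering assumes that most microRNAs do not truly change between conditions.

```python
from statistics import median


def global_shift_report(log2_ratios, same_direction_cutoff=0.9):
    """Flag arrays on which nearly every microRNA moves in the same direction.

    `log2_ratios` maps each detected microRNA to log2(treated / control).
    """
    values = list(log2_ratios.values())
    fraction_down = sum(v < 0 for v in values) / len(values)
    center = median(values)
    suspicious = (fraction_down >= same_direction_cutoff
                  or fraction_down <= 1 - same_direction_cutoff)
    # Re-center on the median so that only changes relative to the bulk of
    # microRNAs remain (assumes most microRNAs are unchanged).
    recentered = {name: v - center for name, v in log2_ratios.items()}
    return {
        "fraction_down": fraction_down,
        "median_log2_ratio": center,
        "suspicious_global_shift": suspicious,
        "recentered": recentered,
    }


# Invented toy values: every detected microRNA appears "downregulated".
toy_ratios = {
    "miR-A": -1.0, "miR-B": -1.5, "miR-C": -1.0,
    "miR-D": -0.5, "miR-E": -1.0, "miR-F": -2.0,
}
report = global_shift_report(toy_ratios)
print(report["fraction_down"], report["median_log2_ratio"])
print(report["recentered"])
```

In the toy input, a microRNA with an apparent 2-fold decrease is indistinguishable from the array-wide shift once the ratios are re-centered.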
Case Study 2: A recent manuscript is a good example of how to find fluid-based microRNA biomarkers in disease. Allen-Rhoades et al. designed a study to identify plasma microRNAs that could detect and monitor childhood osteosarcoma.43 The first component was performed in an osteosarcoma mouse model. After power calculations were performed to determine how many animals would be required, human osteosarcoma tumors were grown in nude mice with serial plasma collections over time. All plasma samples were treated identically and all aspects of the method were reported, including centrifugation speed, time, and temperature. RNA extraction was monitored by the addition of synthetic microRNAs that were evaluated by PCR for consistency. microRNAs were initially evaluated by Exiqon array and normalized by global means after removing outliers and delineating a background threshold level. Four microRNAs were selected for follow-up based on their fold changes and evaluation of the literature.
Validation of significant microRNA findings was performed in a second set of mice by a qPCR method. The qPCR data were normalized to the synthetic spike-in microRNA and to 3 microRNAs that were identified from the array data using the GeNorm and NormFinder algorithms.84 Validation confirmed one microRNA with lower signal and 3 microRNAs with increased signal. After validation, human samples from 3 sources were used. This can be less than ideal, as different initial preparation and storage methods can alter microRNA expression levels.41 Nonetheless, within the project, these fluids were handled consistently. Global microRNA analysis was replicated in the human samples to find controls appropriate for normalization. Finally, the microRNA signature was replicated in the human subjects by qPCR.
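Normalizing qPCR data to several reference microRNAs is commonly done by subtracting the mean reference Ct from the target Ct and comparing groups with the 2^-ΔΔCt calculation. The Python sketch below shows that general calculation under stated assumptions; it is not the exact procedure of the cited study, it assumes roughly 100% amplification efficiency, and all Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Target Ct minus the mean Ct of the reference microRNAs.

    Averaging Ct values corresponds to a geometric mean of the reference
    quantities when amplification efficiency is close to 100%.
    """
    return target_ct - mean(reference_cts)


def fold_change(delta_ct_case, delta_ct_control):
    """Relative expression (case vs control) by the 2^-ddCt calculation."""
    return 2 ** -(delta_ct_case - delta_ct_control)


# Hypothetical Ct values: three reference microRNAs plus one spike-in.
references_case = [20.0, 21.0, 22.0, 21.0]
references_control = [20.0, 21.0, 22.0, 21.0]

dct_case = delta_ct(25.0, references_case)        # 25 - 21 = 4
dct_control = delta_ct(27.0, references_control)  # 27 - 21 = 6
relative_expression = fold_change(dct_case, dct_control)
print(dct_case, dct_control, relative_expression)
```

Using several references rather than one dampens the effect of any single unstable control on the normalized result.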
Key strengths of this approach include 1) robust and consistent methods; 2) population sizes based on power calculations; 3) appropriate controls for microRNA expression levels; 4) developing a basal threshold of expression; 5) initial testing in an animal model with significantly reduced signal heterogeneity compared to humans; 6) follow up on both elevated and decreased microRNA signals; 7) validation in human subjects.
Case Study 3: Our third case study involves a popular concept, not just one paper: the idea that microRNAs from the diet or other environmental sources (xenomiRs) affect mammalian cell function. According to the hypothesis, microRNAs from food first survive processing, then pass through the harsh environment of the digestive tract intact, cross the intestinal barrier, enter the blood circulation, and finally act systemically at zeptomolar to femtomolar concentrations to cause demonstrable phenotypes (see, for example, the concentrations in ). The premise of the function of such xenomiRs may come from studies of C. elegans, which takes up dsRNA from its environment and incorporates it effectively into the RNAi pathway.86 However, mammals lack the C. elegans mechanisms for uptake,87 systemic distribution,88 and amplification.89,90
The landmark publication on plant xenomiR uptake reported that rice microRNA miR168a was abundant in mammalian serum and plasma and could bind the LDLRAP1 mRNA.91 This finding has now been attributed to artifact or contamination.92-95 Furthermore, artificial overexpression of a foreign microRNA in vitro can always have an effect as described (point 6), but this does not mean it is involved in a natural process. Numerous follow-up studies have presented a mixed picture regarding xenomiRs. Many robust uptake experiments using a variety of species and microRNA sources (plant and animal) have found no significant influx of microRNAs into the bloodstream.92,93,96-98 In one human feeding study involving dicots, the most efficiently absorbed microRNA was a monocot microRNA that would not have been present in the food source.99,100 Transgenic mouse studies found no uptake of specific milk microRNAs.101,102 Finally, positive results from a milk intake study involving humans103 could not be validated.104
Exciting new concepts in the microRNA oeuvre must be supported by rigorous data. When dealing with very low RNA concentrations, one must be on the lookout for laboratory contaminants, nonspecific amplification, sequencing bias, and effects of analysis pipelines, among others, all of which are more parsimonious explanations for the presence of foreign RNA than hypothetical uptake pathways. For any novel finding, rigorous adherence to optimal methods, including adequate controls, sufficient replication (‘n's), and public availability of raw data, is vital to evaluation and general acceptance.
Case Study 4: The Massagué laboratory reported that microRNAs miR-126 and miR-335 suppress breast cancer metastasis, based on their interpretation of a large number of experiments.105 While this study may be generally correct, there are reasons to be skeptical that these microRNAs represent definitive metastasis suppressors. Their initial study compared highly metastatic clones of the breast cancer cell line MDA-MB-231 to the parent MDA-MB-231 and was performed by TaqMan microRNA assay. They demonstrated higher expression of 8 microRNAs (miR-126, miR-489, miR-127-3p, miR-199a-5p, miR-122, miR-203, miR-206 and miR-335) in the parent MDA-MB-231 cell line. In subsequent publicly available datasets, generated by Agilent or Affymetrix hybridization microRNA arrays (GEO records GSM1564334 and GSM1571270) or by small RNA-seq (SRA record SRR029132), these microRNAs are either trivially expressed or not expressed in MDA-MB-231. This may indicate different MDA-MB-231 parental clones across these experiments and speaks to the NIH's concern over the identity and validity of cell lines. Alternatively, it may indicate that low to negligible expression levels of these microRNAs were being interpreted. miR-126 is well-known to be highly expressed in endothelial cells, miR-122 is well-known to be exclusively expressed in hepatocytes at functional levels, and miR-206 is known to be expressed exclusively in skeletal myocytes (see point 1).7,106 So despite the caveat that malignant cells can have aberrant microRNA expression, it is unlikely that MDA-MB-231 gains expression of all 8 of these microRNAs only to lose them during metastasis.
Because these microRNAs had such low expression in the parent line, these significant fold-changes could easily have been experimental noise (see point 5) or perhaps differences among quantitative expression platforms (qPCR vs array and RNA-seq).33 To validate the importance of these microRNAs, they overexpressed 3 of these microRNAs by 2-5 log10 fold and saw marked reduction of lung colonization. However, supraphysiologic overexpression (see point 6) is likely to be toxic, and indeed the measures of anti-metastasis effects (cell proliferation reduction, apoptosis, and reduced migration) are frequently reported with supraphysiologic overexpression of inappropriate microRNAs (Tables 1 and 2). Additionally, they reported that low levels of miR-126 and miR-335 corresponded with clinical metastases based on human breast tissue samples. Although it is difficult to know which cell type provided the most miR-335 signal (expression is high in retinal pigment epithelial cells [SRA SRR493011], CD14+ monocytes [SRA SRR527681], and neural stem cells [SRA SRR1988280]), the tissue-level signal for miR-126 would be from endothelial cells and would indicate the vascularity of the tissue and probably not changes in epithelial miR-126 levels (see point 2). The sum of these points is to question the strength of this finding and to be wary of other studies that reference this paper to suggest miR-126 and miR-335 have a role in epithelial cancer.107
The application of microRNAs to the fields of applied and translational research is exciting, with much room for growth and discovery. The points outlined throughout this manuscript speak to general concerns and successes in how microRNA research is being conducted to achieve high levels of rigor and reproducibility. As we conduct our own research and review that of others, we keep some of these key points in mind:
No potential conflicts of interest were disclosed.
The authors thank Dr. Michael Paulaitis for his helpful thoughts on the manuscript. KWW is funded by NIH (NIDA) DA040385 and MKH is supported by the American Heart Association [13GRNT16420015].
Kenneth W. Witwer http://orcid.org/0000-0003-1664-4233
Marc K. Halushka http://orcid.org/0000-0002-7112-7389