Here we present a unifying hypothesis about how messenger RNAs, transcribed pseudogenes, and long non-coding RNAs “talk” to each other using microRNA response elements (MREs) as letters of a new language. We propose that this “competing endogenous RNA” (ceRNA) activity forms a large-scale regulatory network across the transcriptome, greatly expanding the functional genetic information in the human genome and playing important roles in pathological conditions, such as cancer.
Lower organisms such as Caenorhabditis elegans have a number of protein-coding genes comparable to that of humans (Baltimore, 2001). However, the human genome is ~30 times larger than that of C. elegans, suggesting that the non-coding portion of the genome is of crucial importance in dictating the greater complexity of higher eukaryotes (Costa, 2008; Mattick, 2009). Indeed, a significant proportion of the mammalian transcriptome does not correspond to annotated exons of protein-coding genes (Kapranov et al., 2007), implying that the fraction of the mammalian genome “carrying information” is significantly larger than previously expected. Remarkably, systematic analyses of the cancer genome and transcriptome have identified profound alterations in non-coding genes (Beroukhim et al., 2010; Futreal et al., 2004; Stratton et al., 2009). Rearrangements such as deletion, amplification, inversion, and chromosomal translocation are observed to alter non-coding genes, as they do coding genes.
Although recent studies have begun associating a subset of long non-coding RNAs (lncRNAs) with specific regulatory mechanisms, less is known about non-coding transcripts on a genome-wide scale (Nagano and Fraser, 2011). Moreover, little is known about the potential non-coding functions of coding genes. Recent theoretical and experimental studies have suggested that, in particular cases (Seitz, 2009; Poliseno et al., 2010), RNAs influence each other’s levels by competing for a limited pool of microRNAs. Here we describe a unifying hypothesis that attributes this new and potentially predictable function to the coding and non-coding transcriptome. We outline this “competing endogenous RNA” (ceRNA) hypothesis, including its logic, and then discuss recent experimental evidence for it and the consequences of altering its homeostasis. Overall, we hypothesize that all types of RNA transcripts communicate through a new “language” mediated by microRNA binding sites (MREs or “microRNA response elements”) and that recent advances in experimental techniques are finally allowing us to hear and translate this language.
Approximately 22 nucleotides in length, microRNAs bind to sequences with partial complementarity on target RNA transcripts, called microRNA response elements (MREs), usually resulting in the repression of target gene expression (Bartel, 2009; Thomas et al., 2010). MicroRNAs can function in a combinatorial manner if an mRNA transcript harbors numerous MREs. Furthermore, each microRNA may repress up to hundreds of transcripts, and thus it is estimated that microRNAs regulate a large proportion of the transcriptome (Friedman et al., 2009; Thomas et al., 2010). In fact, microRNAs have been implicated in numerous diseases (http://cmbi.bjmu.edu.cn/hmdd), including cancer (Calin et al., 2002; Lu et al., 2008).
Approximately 20,000 protein-coding genes have been identified in the human genome, many of which are densely covered in MREs (Baltimore, 2001; Friedman et al., 2009). Our increasing capacity to identify MREs on coding gene transcripts allows us to predict the extent of microRNA-dependent regulation. We believe that this predictability, coupled with appropriate validation steps, will be critical in testing the ceRNA hypothesis.
Pseudogenes are genomic loci that resemble known genes but are defined as “nonfunctional,” “junk,” or “evolutionary relics” because, except for a few cases, they do not encode functional proteins; their translation is interrupted by premature stop codons, frameshift mutations, insertions, or deletions (D'Errico et al., 2004). Sequencing efforts have revealed ~19,000 pseudogenes in humans, many of which are transcribed and are often well conserved, suggesting that selective pressure to maintain pseudogenes exists (Pink et al., 2011).
Despite lacking canonical promoters, processed pseudogenes (i.e., ones without introns) can use proximal regulatory elements for transcription (Birney et al., 2007). Indeed, transcription of pseudogenes displays tissue specificity and can be activated or silenced in specific pathological conditions, such as cancer (Pink et al., 2011). Importantly, the high sequence conservation between genes and their associated pseudogenes implies that the same microRNAs can target them (Poliseno et al., 2010).
LncRNAs are typically 300 to thousands of nucleotides in length. The number of reported lncRNAs is expanding, and of these, a subset has been linked to epigenetic mechanisms, including XIST (X-inactive specific transcript), which is implicated in X-chromosome inactivation (Brown et al., 1992), and the recently identified large intergenic non-coding (linc-)RNAs (Gong and Maquat, 2011; Guttman et al., 2009; Huarte et al., 2010; Khalil et al., 2009). Importantly, microRNAs also regulate lncRNAs, as shown in recent global analyses of Argonaute (Ago)-bound transcripts through the HITS-CLIP technique (Chi et al., 2009; Licatalosi et al., 2008).
MicroRNAs are negative regulators of gene expression, decreasing the stability of target RNAs or limiting their translation (Fabian et al., 2010). Accordingly, microRNAs are commonly viewed as active regulatory elements, whereas the target mRNAs are viewed as passive targets of repression (Figure 1A, left).
By contrast, in 2009 Seitz hypothesized that computationally identified microRNA binding sites can titrate microRNAs and thereby regulate microRNA availability (Seitz, 2009). We have more recently demonstrated experimentally that pseudogenes, owing to their high sequence homology, can act as bona fide microRNA competitors, thereby actively competing with their ancestral protein-coding genes for the same pool of microRNAs through sets of conserved MREs (Poliseno et al., 2010). The consequence of competition for microRNA is observed as a decrease in microRNA detection, and thus an impairment of microRNA activity (Cazalla et al., 2010; Lee et al., 2010; Wang et al., 2010).
Thus, we hypothesize that in addition to the conventional microRNA → RNA function, a reversed RNA → microRNA logic exists (Figure 1A, right), in which bona fide coding and non-coding RNA targets can cross-talk through their ability to compete for microRNA binding. On the basis of this hypothesis, MREs can be viewed as the letters of an “RNA language” by which transcripts can actively communicate with each other to regulate their respective expression levels (Figure 1B). We hypothesize that RNAs that share multiple MREs will cross-talk effectively. Importantly, we predict that this “RNA language” can be used to functionalize the entire mRNA dimension through the identification of cross-talking ceRNAs, as well as ceRNA networks.
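As a rough illustration of the prediction that RNAs sharing multiple MREs will cross-talk effectively, the following sketch ranks transcript pairs by the number of microRNAs they have in common. It is only a toy under stated assumptions: the transcript names, the microRNA names, and the `TOY_MRE_MAP` annotation are hypothetical placeholders, and counting shared microRNAs is a simplification rather than a validated measure of ceRNA activity.

```python
from itertools import combinations

# Hypothetical toy annotation (not real data): transcript name -> set of
# microRNAs for which the transcript carries at least one MRE.
TOY_MRE_MAP = {
    "gene_A": {"miR-x1", "miR-x2", "miR-x3"},
    "pseudogene_A": {"miR-x1", "miR-x2", "miR-x3"},
    "lncRNA_B": {"miR-x2", "miR-x4"},
    "gene_C": {"miR-x5"},
}


def shared_mirna_counts(mre_map):
    """Return {(transcript_a, transcript_b): number of shared microRNAs}.

    Pairs that share no microRNA are omitted, since under this toy
    scoring they have no basis for competition.
    """
    counts = {}
    for a, b in combinations(sorted(mre_map), 2):
        shared = mre_map[a] & mre_map[b]
        if shared:
            counts[(a, b)] = len(shared)
    return counts


if __name__ == "__main__":
    ranked = sorted(shared_mirna_counts(TOY_MRE_MAP).items(),
                    key=lambda item: -item[1])
    for (a, b), n in ranked:
        print(f"{a} <-> {b}: {n} shared microRNA(s)")
```

In this toy annotation the gene and its pseudogene share every microRNA and therefore rank first, which mirrors the reasoning behind treating pseudogenes as natural competitors of their ancestral genes.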
Besides attributing a new, global function to all the non-coding RNAs, the ceRNA hypothesis challenges the notion that a protein-coding gene must be translated into a protein to exert function. We propose that mRNAs may also possess an additional and predictable function through their ability to regulate other mRNAs. Moreover, the non-coding function of an mRNA may be consistent with its coding function, but the two functions could also be incoherent or even opposite in effect, thereby creating built-in regulatory loops, functional complexity, and diversification, in both physiological and pathological conditions.
Furthermore, the ceRNA hypothesis may explain the regulatory function of 3’UTRs (Rastinejad and Blau, 1993; Rastinejad et al., 1993). Besides acting as cis regulatory elements that alter the stability of their own transcripts, 3’UTRs may also act in trans to modulate gene expression through microRNA binding (Figure 1C). This is particularly relevant given the recent identification of 3’UTRs expressed separately from the associated protein-coding sequences to which they are normally linked (Mercer et al., 2010). In addition, we are proposing that all types of RNAs may compete with each other for microRNAs, generating large-scale trans-regulatory crosstalk across the transcriptome as a whole.
The ceRNA hypothesis relies on knowledge of the precise number and location of MREs, “the letters” of the RNA code. Although several target prediction algorithms are successful in identifying some microRNA targets, they commonly fail to predict some important microRNA targets, mainly because the rules of targeting are still not understood (Bartel, 2009; Thomas et al., 2010). We expect that better target prediction algorithms and innovative biochemical techniques will contribute significantly to the definition of the ceRNA language. For example, the high-throughput sequencing of RNAs isolated by crosslinking immunoprecipitation, or HITS-CLIP, allows the identification of MREs associated with the RNA-induced silencing complex (RISC) (Thomas et al., 2010).
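To make the idea of locating MREs concrete, here is a minimal sketch of one widely used simplification in target prediction: scanning a transcript for perfect matches to the microRNA “seed” (nucleotides 2 to 8 of the microRNA). This is not the method of any particular algorithm cited here, and, as noted above, seed matching alone misses some genuine targets. The sequences `TOY_MIRNA` and `TOY_TRANSCRIPT` and the function names are hypothetical and chosen only for illustration.

```python
# Watson-Crick complements for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


def find_seed_matches(mirna, transcript):
    """Return 0-based start positions of perfect seed matches.

    The seed is taken as microRNA nucleotides 2-8 (1-based), which is a
    simplifying assumption; the site searched for in the transcript is
    the reverse complement of that seed.
    """
    seed = mirna[1:8]
    site = reverse_complement_rna(seed)
    positions = []
    start = transcript.find(site)
    while start != -1:
        positions.append(start)
        start = transcript.find(site, start + 1)
    return positions


# Hypothetical toy sequences, written 5' to 3'.
TOY_MIRNA = "UGAGGUAGUAGGUUGUAUAGUU"
TOY_TRANSCRIPT = "AAACUACCUCAGGGCUACCUCUUU"

if __name__ == "__main__":
    print(find_seed_matches(TOY_MIRNA, TOY_TRANSCRIPT))  # [3, 14]
```

A scan of this kind yields candidate MRE positions only; biochemical evidence such as HITS-CLIP would still be needed to show that a candidate site is actually bound.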
What cellular conditions must exist for a ceRNA network to operate? First, the relative concentration of the ceRNAs and their microRNAs is clearly important. Changes in the ceRNA expression levels need to be large enough to either overcome or relieve the microRNA repression on competing ceRNAs. This is exemplified by RNA transcripts “switched” on or off at the transcriptional level in different developmental stages or physiological/pathological conditions. Similarly, the sequestered microRNAs must be neither absent nor grossly overexpressed, because either condition will abolish competition.
Second, the effectiveness of a ceRNA would depend on the number of microRNAs it can “sponge”. This in turn would depend on the ceRNA’s accessibility to microRNA molecules, which is influenced by its subcellular localization and its interaction with RNA binding proteins. The specific tissue, developmental, or pathological context in which the ceRNA is expressed would also impact its overall influence because not all microRNAs are present everywhere and at all times (Venables et al., 2009). Although a ceRNA network could be built around a single microRNA, we hypothesize that the most robust ceRNA networks would contain transcripts that share multiple MREs targeted by multiple microRNAs. Thus, overall ceRNA networks would also depend on the identity, concentration, and subcellular distribution of the RNA and the microRNA species that are present in a given cell type at a given moment.
Third, not all the MREs on ceRNAs are equal. Although two MREs may be predicted to bind the same microRNA, their specific nucleotide composition may differ in part, and the effectiveness with which each MRE binds a microRNA is critical for overall ceRNA function.
Similarly, microRNAs are predicted to target tens to hundreds of RNAs, but they do not exert the same degree of repression on all of them; the primary targets are usually few, while the rest are finely tuned (Bartel and Chen, 2004; Seitz, 2009). It is conceivable that, if a given microRNA is sequestered by a ceRNA, the primary targets of that microRNA would be preferentially affected.
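The conditions above (relative concentrations, and MREs of unequal effectiveness) can be explored with a simple equilibrium-binding sketch. It assumes, purely for illustration, that each MRE binds one microRNA reversibly with its own dissociation constant, that a single microRNA species is involved, and that nothing is degraded. The function names, concentrations, and dissociation constants are hypothetical and in arbitrary units; this is not a model proposed in the text.

```python
def free_mirna(mirna_total, site_classes, tol=1e-9):
    """Solve for the free microRNA concentration by bisection.

    site_classes is a list of (site_concentration, kd) pairs, with kd > 0.
    Mass balance: free + sum(sites * free / (free + kd)) = mirna_total.
    The left-hand side increases with `free`, so bisection on
    [0, mirna_total] converges to the unique solution.
    """
    lo, hi = 0.0, mirna_total
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        bound = sum(sites * mid / (mid + kd) for sites, kd in site_classes)
        if mid + bound > mirna_total:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def occupancies(mirna_total, site_classes):
    """Return the fraction of occupied sites for each site class."""
    free = free_mirna(mirna_total, site_classes)
    return [free / (free + kd) for _, kd in site_classes]


if __name__ == "__main__":
    # Hypothetical scenario: a strong-site target and a weak-site target,
    # before and after a ceRNA carrying many strong sites is induced.
    targets = [(50.0, 1.0), (50.0, 50.0)]
    with_cerna = targets + [(400.0, 1.0)]
    print("before ceRNA:", occupancies(100.0, targets))
    print("after ceRNA: ", occupancies(100.0, with_cerna)[:2])
```

Varying `mirna_total` in this sketch from zero to a large excess shows why both extremes abolish competition under these assumptions: with no microRNA every occupancy is zero, and with a large excess every site approaches full occupancy regardless of how many competing sites are added.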
More recently, our work demonstrated experimentally that a non-coding pseudogene can indeed bind to and compete for the same collection of microRNAs as its ancestral gene (Poliseno et al., 2010). Specifically, we found that many MREs in the tumor suppressor gene PTEN are conserved in its related pseudogene PTENP1, and overexpression of the PTENP1 3’UTR increased levels of PTEN and growth inhibition in a DICER-dependent manner. Interestingly, copy number losses at the PTENP1 locus in sporadic colon cancer suggest that PTENP1 could be considered a tumor suppressor gene (Poliseno et al., 2010).
We and other laboratories have extended this analysis to other gene-pseudogene partners (e.g. KRAS and its pseudogene KRAS1P) and protein-coding mRNAs [e.g. PTEN 3’UTR (Poliseno et al., 2010), versican 3’UTR (Lee et al., 2010; Lee et al., 2009), CD44 3’UTR (Jeyapalan et al., 2010)]. Overall, these findings suggest that 3’UTRs from both pseudogenes and coding genes may possess powerful biological activity through their ability to act as endogenous decoys for microRNAs.
Approximately three years before the identification of these endogenous decoys, numerous studies found that exogenously expressed “microRNA sponges” were able to inhibit microRNA function specifically and effectively (Ebert et al., 2007; Brown et al., 2007; Gentner et al., 2009). MicroRNA sponges are artificial transcripts that contain multiple copies of a single MRE in tandem. They are often cloned into viral vectors so that they can be expressed at high levels (Ebert and Sharp, 2010). The applications for sponging constructs are exciting, and perhaps represent the future of RNA-based therapeutic modalities (Brown et al., 2007; Gentner et al., 2009). Analogously, we propose that ceRNAs are “endogenous sponges” that are able to impact the distribution of microRNA molecules on all their targets. Unlike artificial sponges, ceRNAs contain MREs for a combination of different microRNAs, and thus they can impact the multiple targets of multiple microRNAs.
In addition to pseudogenes, other examples of ceRNA have been reported recently. Franco-Zorrilla and colleagues demonstrated that the non-coding RNA IPS1 in Arabidopsis thaliana sequesters miR-399 by mimicking its target site, a phenomenon called “target mimicry” (Franco-Zorrilla et al., 2007). Analogously, a non-coding RNA in Herpesvirus saimiri has been shown to bind to and cause the degradation of human miR-27, possibly to produce a permissive cellular environment for viral infection and transformation (Cazalla et al., 2010). Furthermore, the lncRNA highly up-regulated in liver cancer (HULC) sequestered endogenous miR-372 to modulate its own transcriptional upregulation in hepatocellular carcinoma (HCC) (Wang et al., 2010). Notably, none of the endogenous sponges reported thus far encodes a protein.
Although our hypothesis applies to both protein-coding and non-coding RNAs, we speculate that non-coding ceRNAs may be highly effective inhibitors precisely because they are devoted to microRNA binding, without any interference from active translation (Gu et al., 2009). Further development and widespread use of HITS-CLIP and other related techniques will ultimately reveal the full extent of pseudogene and lncRNA regulation by microRNAs, and consequently their respective impact and positioning within ceRNA networks.
The ceRNA hypothesis is strongly supported by the fact that a single microRNA’s effectiveness is influenced by the concentration of its target mRNAs (Arvey et al., 2010). MicroRNAs that have a larger repertoire of target genes may down-regulate each individual target gene to a lesser extent than microRNAs with fewer targets. By the same token, when a given mRNA is upregulated, the repression conferred by its targeting microRNAs would be diluted because the total number of MREs exceeds that of the microRNAs themselves (Figure 2). Thus, altering the expression levels of an individual ceRNA would have repercussions on other ceRNAs with which it shares MREs.
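The dilution argument can be illustrated with the standard closed-form solution for one-to-one reversible binding. The sketch below assumes, as a deliberate simplification, that all MREs for a given microRNA bind with the same dissociation constant and that total microRNA stays fixed; the function name and the numbers are hypothetical and in arbitrary units.

```python
import math


def site_occupancy(mirna_total, sites_total, kd):
    """Fraction of MREs occupied at equilibrium for 1:1 binding.

    Solves bound^2 - (M + T + Kd) * bound + M * T = 0 for the smaller
    root, where M is total microRNA and T is total binding sites, then
    returns bound / T.
    """
    s = mirna_total + sites_total + kd
    bound = (s - math.sqrt(s * s - 4.0 * mirna_total * sites_total)) / 2.0
    return bound / sites_total


if __name__ == "__main__":
    # Hypothetical scenario: the microRNA pool is held constant while the
    # total number of MREs grows as a competing transcript is upregulated.
    for sites in (50.0, 100.0, 200.0, 500.0):
        occ = site_occupancy(mirna_total=100.0, sites_total=sites, kd=10.0)
        print(f"sites={sites:6.0f}  occupancy per site={occ:.2f}")
```

Under these assumptions the occupancy of each individual site falls as the total number of sites rises, which is the sense in which repression is diluted once MREs outnumber microRNAs.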
In principle, almost any RNA molecule that possesses at least one MRE accessible to microRNA binding could act as a ceRNA. Therefore, characterizing ceRNA networks requires the accurate identification of MREs within RNA molecules. Indeed, we speculate that this type of analysis could uncover molecular interactions and gene regulatory networks that have been missed by proteomic and conventional genomic methods. In this framework, aberrant expression of coding and non-coding genes should be systematically studied in the context of human disease.
Pseudogenes are a compelling example of ceRNA because they likely possess many (if not all) of the same MREs that are harbored on their ancestral genes and thus can act as “perfect sponges.” However, the ability of pseudogenes to regulate the biology of a cell may go beyond the modulation of the levels of their ancestral genes. For instance, PTENP1 is biologically active even in a PTEN null context, as it alters the microRNA network normally regulating PTEN (Poliseno et al., 2010). Moreover, genes such as OCT4, NPM1, and many ribosomal protein genes often have numerous differentially regulated pseudogenes (Balasubramanian et al., 2009), indicating that gene-pseudogene networks can become extensive and intricately dynamic.
In the context of cancer, a straightforward implication of our hypothesis is that pseudogenes and lncRNAs should now be systematically studied as potential tumor suppressors and oncogenes through their ceRNA function. Accordingly, the notion of endogenous lncRNA sponges was recently linked to the progression of liver cancer. It was reported that the lncRNA HULC is one of the most upregulated of all genes in hepatocellular carcinoma (Panzitt et al., 2007). Wang and colleagues identified that CREB (cAMP response element binding protein) is involved in the upregulation of HULC (Wang et al., 2010). They also demonstrated that HULC RNA inhibits miR-372 activity through a ceRNA function. This in turn leads to derepression of one of its target genes, PRKACB, which can then induce the phosphorylation and activation of CREB. Overall, HULC lncRNA is part of a self-amplifying autoregulatory loop in which it sponges miR-372 to activate CREB, and in turn upregulates its own levels.
Gross genomic losses and amplifications commonly observed in cancer could have potentially dramatic consequences for the ceRNAs contained in those regions. Moreover, under the ceRNA hypothesis, gene loss events should be clearly distinguished from point mutations that abolish protein function but retain full ceRNA function.
If the ceRNA hypothesis proves correct, then one would need to consider the repercussions of knocking out and overexpressing ceRNAs when modeling diseases in mice. For instance, when generating knockout mice, one must consider whether only the protein or also the transcript expression is disrupted. Many experimental techniques normally neglect UTRs and limit functional studies to gene coding regions. For example, when generating transgenic mice, it has been standard to overexpress only coding sequences, but not UTRs. However, binding sites for microRNAs could occur in 3’UTRs, 5’UTRs, and coding regions (Tay et al., 2008), suggesting that the entire transcript may possess an inherent trans regulatory function. Thus, by limiting their focus or scope to coding regions, many conventional tools and techniques may have been neglecting the full function of the gene.
Chromosomal translocation events and recurrent “readthrough” transcripts are common in cancers. For example, the t(15;17) translocation, which generates PML-RARα and RARα-PML fusion transcripts, is often seen in acute promyelocytic leukemia, whereas the “readthrough” transcript CDK2-RAB5B is common in melanoma (Berger et al., 2010; Scaglioni and Pandolfi, 2007). Such events could be considered “UTR-swaps,” leading to perturbed MRE levels due to misplacement and consequent altered expression of UTRs (Figure 3). ceRNA perturbation could also possibly occur as a consequence of somatic genomic rearrangements affecting non-coding regions, which are emerging as hitherto unappreciated events in many cancers (Stephens et al., 2009).
Aberrant alternative splicing events could also introduce new RNA sequences and potentially new MREs into the cell. Because splicing can be perturbed in disease and cancer (Venables et al., 2009), the associated perturbation of the ceRNA network may also contribute to pathologies. Similarly, the shortening of 3’UTRs as observed in human cancer cells (Mayr and Bartel, 2009) would not only impact microRNA-dependent mRNA regulation but, on the flip side, could also alter the capacity of a given mRNA transcript to “sponge” or titrate away microRNAs.
All these described events have a single commonality: they represent perturbations in the expression levels of a given transcript (and consequently of its MREs), irrespective of whether or not the transcript is translated into a protein. Thus, it will be interesting to determine if elevated or depressed levels of a given transcript could exert oncogenic activities by altering competition for microRNAs.
In conclusion, we hypothesize that cross-talk between RNAs, both coding and non-coding, through MREs forms a large-scale regulatory network across the transcriptome. This ceRNA activity could offer answers to evolutionary questions, as it may, in part, explain the correlation of genome size and organism complexity. Moreover, perturbations of ceRNA and ceRNA networks could have consequences for diseases, but on the flip side, they may explain disease processes and present opportunities for new therapies. Although the understanding of this field and its consequences is in its infancy, we believe that experimental tools are now poised to fully identify microRNA binding sites and catalogue the basic lexicon of the ceRNA “language.” We envision that the ceRNA language will allow us to predict and manipulate regulatory networks working through microRNA competition. Future challenges will then be to understand why such regulatory networks exist, how they may have evolved, and what the consequences are when they are perturbed. Only then will we be able to fully decipher the “Rosetta Stone” of this hidden RNA language.
We thank all Pandolfi lab members for critical discussions. L.S. was supported by fellowships from the Canadian Institutes of Health Research and the Human Frontier Science Program. L.P. was supported by fellowships from the Istituto Toscano Tumori and the American Italian Cancer Foundation. Y.T. is supported by a Special Fellow Award from The Leukemia & Lymphoma Society. L.K. is supported by an NHMRC Overseas Biomedical Postdoctoral Fellowship. This work was supported by NIH grant R01 CA-82328-09 awarded to P.P.P.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.