Gene expression is finely regulated to ensure that the correct complement of RNA and proteins is present in the correct cell at the correct time. Owing to its diversity—in sequence and structure—RNA plays critical roles in cell biology, and is regulated by numerous proteins that modulate its content and spatial-temporal expression. Methodological advances, including bioinformatic, microarray-based, biochemical and deep sequencing studies, are producing new insights into the role that regulation of RNA complexity—the sum of the unique isoforms of RNA in a cell, including mRNA variants, non-coding RNAs and microRNAs (miRNAs)—plays in generating organismal complexity from a relatively small number of genes. Here we review this progress, focusing on mRNAs and the ways in which the technological advances are beginning to revolutionize our ability to understand the mechanisms and consequences of mRNA diversification.
The recognition of RNA regulation as a central point in gene expression and the generation of phenotypic complexity
1 began with new methodologies and biological insights developed in the 1970s–1980s. Nascent transcripts were found to be generated as long heterogeneous nuclear RNAs (hnRNAs)
2,3 (now termed pre-mRNA) that serve as precursors for smaller 5′ capped and 3′ polyadenylated mRNAs that are then exported to the cytoplasm. Insights into the mechanism by which pre-mRNA is processed to mature mRNA resulted from methodological advances (including S1 nuclease mapping
4 and electron microscopy to visualize R-loops of adenovirus mRNA:DNA hybrids
5,6) that enabled nucleotide-level examination of the precursor-product relationship of adenoviral transcripts. These efforts revealed that adenoviral mRNA has “an amazing sequence arrangement”
6 such that processing of pre-mRNA to mature mRNA involves the intra-molecular joining (splicing) of expressed sequences (exons) that are separated by non-coding intervening sequences (introns)
7 in the primary transcript. This was quickly recognized as a general feature of eukaryotic RNA processing
8,9.
The discovery of splicing led to the realization that RNA has the potential to be more complex than DNA
7,10. This potential was demonstrated by the finding, first in adenovirus
11 and subsequently in eukaryotic cells during cell differentiation
12 and in different tissues
13, that alternative mRNA products could be generated from a single pre-mRNA precursor in a regulated manner. In this way regulation of alternative splicing and polyadenylation enables a single mammalian gene to encode multiple mRNAs that possess distinct coding and regulatory sequences.
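The combinatorial consequence of this can be made concrete with a minimal sketch. It assumes a deliberately simplified model in which some exons are constitutive and others are cassette exons that are included or skipped independently of one another; the exon names and the function `enumerate_isoforms` are hypothetical, and alternative polyadenylation and other event types are not modeled.

```python
from itertools import product


def enumerate_isoforms(exons, cassette):
    """Yield every exon combination for a toy gene model.

    exons    -- ordered list of exon names in the pre-mRNA (hypothetical)
    cassette -- set of exon names that may be included or skipped;
                all other exons are treated as constitutive.
    Assumption: cassette exons are chosen independently of one another.
    """
    optional = [e for e in exons if e in cassette]
    # Each cassette exon is either included (True) or skipped (False).
    for choices in product([True, False], repeat=len(optional)):
        included = dict(zip(optional, choices))
        # Constitutive exons are absent from `included`, so default to True.
        yield tuple(e for e in exons if included.get(e, True))


# Hypothetical five-exon gene with two cassette exons.
exons = ["E1", "E2", "E3", "E4", "E5"]
isoforms = list(enumerate_isoforms(exons, cassette={"E2", "E4"}))
for iso in isoforms:
    print("-".join(iso))
```

Under this model, n independent cassette exons give up to 2^n mRNA isoforms from one pre-mRNA, which is why a modest number of regulated events can greatly expand the set of transcripts from a single gene.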
A more recent epoch in understanding RNA complexity was ushered in with the ability to sequence complete genomes, and the concomitant realization that humans and worms have roughly the same number of protein coding genes (and, more recently, that human and chimpanzee genomic coding regions are 99.7% identical)
14. These observations, together with the development of the RNA World hypothesis
15,16, led to a new concept that is explored in this review: that biological complexity (the variation in cell type and function) has RNA complexity at its core. In this view, it is the intricate unfolding of the genetic information in DNA into diverse RNA species, mediated by RNA-protein interactions, that leads to biological variation not evident from analysis of DNA sequence alone.
The known roles of RNA in the cell have expanded from it being a machine and template for protein synthesis to a regulatory hub for post-transcriptional control, with emerging and still incompletely understood roles as a
trans-acting factor that is capable of regulating expression of genetic information. For example, miRNAs
17, piRNAs
18 and long non-coding RNAs
19,20 act to direct different RNA binding proteins (RNABPs) to their regulatory targets in order to suppress translation
21, provide protection from transposable elements
18, and mediate epigenetic changes
1,22,23, respectively. Adding to its versatility, RNA transcripts are diversified from the point of transcription onwards through the action of a plethora of mechanisms, including alternative transcription initiation
24–26, alternative splicing
27–29, alternative polyadenylation
30, RNA editing
31, and post-transcriptional modification (pseudouridylation
32, methylation
33, and non-canonical polyadenylation and RNA terminal polyuridylation
34,35). Once generated, mature RNA isoforms are subject to many levels of regulation that include the regulation of translation by miRNAs
21 and regulatory factors
36, the use of alternative translational start sites
37, RNA localization
38, and mRNA stability and turnover
39,40.
RNA regulation is achieved through the concerted action of multiple RNABPs
41 that bind to ‘core’ and ‘auxiliary’ elements, which are required for and modulate pre-mRNA processing events, respectively (
Box 1). Core splicing elements demarcate exons and the sequences required for their splicing, and auxiliary splicing elements, which are located in introns and/or exons, bind factors that enhance or inhibit splicing. Similarly, mRNA 3′ end maturation also depends on the presence of core and auxiliary elements that define the site of transcript cleavage and polyadenylation
42,43. The identification of alternative polyadenylation sites in the majority of human genes and evidence for tissue-specific biases in alternative polyadenylation
8,44–46 suggest that regulation of alternative polyadenylation through auxiliary control might be a common mechanism to diversify the transcriptome.
BOX 1. Alternative splicing and polyadenylation
Core elements necessary for pre-mRNA splicing include the 5′ and 3′ splice sites (SS), a branch point sequence (BP) upstream of the 3′SS, and a polypyrimidine-rich tract (PPT) between the BP and the 3′ SS. All of these elements are bound by components of the spliceosome, which is a dynamic macromolecular complex that consists of snRNAs and ~170 proteins
29. Auxiliary sequences are variable in number and location (they can be located in exons and in the flanking intronic sequences) and are bound by factors that generally function to either enhance or inhibit basal splicing activity. The combinatorial actions of both core and auxiliary splicing factors participate in the regulation of alternative splicing. For example, the SR proteins comprise a family of auxiliary RNABPs that bind to splicing enhancer elements to facilitate exon identification and promote splicing (although, like most RNABPs, they are also able to serve other functions in the cell). In contrast, the binding of auxiliary hnRNP proteins to splicing silencer elements has a negative effect on exon inclusion; in many cases they antagonize the “pro-splicing” activity of SR proteins. Interestingly, the levels of some core snRNPs vary between tissues
128, and such variations might contribute to splicing regulation
41. Core elements necessary for maturation of the 3′ end of an mRNA include a poly(A) signal (an adenylate-rich hexameric sequence, most often AAUAAA) and a U/GU-rich sequence, which are positioned upstream and downstream of the poly(A) site respectively. These elements direct the endonucleolytic cleavage and polyadenylation of the transcript. Although a number of auxiliary elements that affect the use of poly(A) sites have been identified
43, the extent to which these elements regulate alternative poly(A) site use remains unclear.
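As a rough illustration of how the core 3′ end elements described in Box 1 might be located in a sequence, the following minimal sketch scans an RNA string for the AAUAAA hexamer and scores the U/G content of the sequence that follows it. This is a toy heuristic, not an established poly(A) site predictor: the function name `find_polya_signals`, the 30-nucleotide window and the 0.7 threshold are assumptions chosen only for illustration, and the poly(A) site itself is not modeled (the window after the hexamer stands in for the region containing the cleavage site and the downstream U/GU-rich element).

```python
import re

POLYA_SIGNAL = "AAUAAA"  # most common poly(A) signal hexamer, as named in Box 1


def find_polya_signals(rna, window=30, min_ug_fraction=0.7):
    """Return a list of (position, ug_fraction, passes) for each AAUAAA hit.

    window and min_ug_fraction are illustrative assumptions, not
    experimentally derived parameters.
    """
    # Accept DNA-style input by normalizing case and converting T to U.
    rna = rna.upper().replace("T", "U")
    hits = []
    # A zero-width lookahead lets overlapping hexamers be reported too.
    for match in re.finditer(f"(?={POLYA_SIGNAL})", rna):
        start = match.start()
        end = start + len(POLYA_SIGNAL)
        downstream = rna[end:end + window]
        if downstream:
            ug_fraction = sum(base in "UG" for base in downstream) / len(downstream)
        else:
            ug_fraction = 0.0  # hexamer sits at the very end of the sequence
        hits.append((start, ug_fraction, ug_fraction >= min_ug_fraction))
    return hits


# Hypothetical sequence: a hexamer followed by a GU-rich stretch.
example = "CC" + "AAUAAA" + "GU" * 10 + "CCCC"
for position, ug_fraction, passes in find_polya_signals(example):
    print(position, f"{ug_fraction:.2f}", passes)
```

A hit that fails the U/G threshold is still reported, with `passes` set to False, so that candidate signals lacking an obvious downstream element can be inspected rather than silently discarded.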
Current interest relating to RNA complexity has three main aspects: meeting methodological challenges so that the vast amount of information present in RNA can be collated; analysis of these data sets so that new rules of RNA regulation can be detailed; and application of the new insights to achieve a basic understanding of cellular control and, ultimately, an understanding of gene dysregulation in human disease. This review will discuss each of these points (methodology, RNA analysis and, more briefly, its biological manifestations), in each case focusing on the control of RNA complexity. Although this review touches on many aspects of RNA function, including links to transcriptional and translational regulation, space does not allow a full discussion of these issues, which are covered in several excellent reviews
19,24,36,41,47–50.