This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Alu elements are the most abundant repetitive elements in the human genome; they emerged 65 million years ago from a 5′ to 3′ fusion of the 7SL RNA gene and amplified throughout the human genome by retrotransposition to reach the present number of more than one million copies. In recent years, several lines of evidence have demonstrated that these elements modulate gene expression at the post-transcriptional level in at least three independent ways: they have been shown to be involved in alternative splicing, RNA editing and translation regulation. These findings highlight how the genome adapted to these repetitive elements by assigning them important functions in the regulation of gene expression. Alu elements should therefore be considered a large reservoir of potential regulatory functions that have been actively participating in primate evolution.
The initial sequencing of the human genome revealed that 55% of its nucleotide sequence is composed of repetitive elements (1). Among the different families of repetitive elements, Alu elements are the most abundant in the human genome. They are present in more than one million copies, which together represent 10% of the whole genome mass. Alu elements belong to the SINE (short interspersed nuclear element) family of repetitive elements; they emerged 65 million years ago with the radiation of primates by a fusion of the 5′ and 3′ ends of the 7SL RNA gene, which encodes the RNA moiety of the signal recognition particle (SRP). The first fossil Alu monomers (FAMs) arose from this fusion (2); they were ~160 bp long and are poorly represented in the human genome (2). According to the current model, modern Alu elements emerged from a head-to-tail fusion of two distinct FAMs (3) that gave rise to a dimeric structure composed of two similar but distinct monomers (left and right arms) joined by an A-rich linker (Figure 1). Modern Alu elements are ~300 bp in length and are classified into subfamilies according to their relative ages [for a review see Ref. (4)]. Dimeric Alu elements are unique to primates; they amplified throughout primate genomes via RNA intermediates by a mechanism of retrotransposition that remains to be elucidated. Because they do not encode any protein, their amplification has depended on the transposition machinery of other retrotransposons. It has recently been shown that they could use LINE-1 (long interspersed nuclear element 1) elements for this purpose (5).
Alu elements, like other repetitive elements, were originally considered parasites of the genome that had no major effect on its stability or on gene expression. They were thought to be ‘selfish’ or ‘junk’ DNA (6,7), but several lines of evidence now show that the presence of repetitive elements, and especially of Alu elements, had a great influence on the human genome, in particular on its evolution. These effects were both negative and positive. On the one hand, integrations into genic regions that caused gene inactivation might often have been deleterious for the organism. On the other hand, because of their extended sequence homology, Alu elements induced a considerable number of non-allelic recombinations that led to both duplications and deletions of DNA segments, thereby accelerating evolution. Another function frequently attributed to Alu elements is their ability to provide new regulatory elements to neighboring genes. It has indeed been reported several times that Alu elements became effectors of gene transcription by providing new enhancers, promoters and polyadenylation signals to many genes [for reviews see Refs. (8,9)].
In this review, we focus on more recently discovered effects of Alu elements on gene expression at the post-transcriptional level. We describe their influence on mRNA splicing, on RNA editing and on protein translation.
Alternative splicing is a mechanism by which the use of alternative splice sites in pre-mRNAs generates multiple protein variants from a single gene. This type of variation in splicing patterns is a major source of protein diversity, and it has been estimated that 30–60% of all human genes produce alternative exons (10). This mechanism could expand proteome diversity by several orders of magnitude; in an extreme example, the Drosophila axon guidance receptor gene, Dscam, may potentially generate 38 000 DSCAM isoforms by alternative splicing (11). There are several ways by which a gene can acquire new alternative splice sites; one of them is the mutation of pre-existing intronic sequences, which results in the recruitment of intronic sequences into the coding regions of mRNAs. This process is called exonization.
Of the thousands of Alu elements that are found in introns of the human genome, a certain number of complete or partial Alu sequences are also present in the coding regions of mature mRNAs (12). The presence of several potential splice sites in the Alu consensus sequence (13,14) strongly suggested that they were recruited into coding regions through exonization (Figure 2). Recently, genome-scale computational studies confirmed this hypothesis. By comparing the human genome to cDNA and EST databases (14), Sorek and colleagues identified a subset of alternatively spliced internal exons, of which 5% were derived from Alu elements. As the same analysis performed on constitutive exons failed to identify any Alu element, they concluded that all Alu-containing exons are alternatively spliced.
The Alu consensus contains 9 potential 5′ splice sites and 14 potential 3′ splice sites (14). Of these 23 potential splice sites, 19 are present on the minus strand; this means that an Alu element is more likely to be exonized when it is inserted in the orientation opposite to that of transcription. The same study indeed confirmed that 85% of Alu-containing exons derive from antisense Alu elements (14).
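The strand asymmetry described above can be illustrated with a small sketch. It assumes, purely for illustration, that a bare GT (5′ splice site) or AG (3′ splice site) dinucleotide counts as a candidate; the analysis in (14) used its own criteria, which are not reproduced here, and the toy sequence is hypothetical rather than an Alu consensus.

```python
# Toy scan for canonical splice-site dinucleotides on both strands.
# Assumption: any GT (donor) or AG (acceptor) is a candidate site; real
# analyses score longer motifs, so these counts are illustrative only.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_dinucleotide(seq, dinuc):
    """Count occurrences of a dinucleotide at every position of seq."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def candidate_splice_sites(seq):
    """Count GT and AG dinucleotides on seq (plus) and its reverse complement (minus)."""
    minus = reverse_complement(seq)
    return {
        "plus": {"GT": count_dinucleotide(seq, "GT"),
                 "AG": count_dinucleotide(seq, "AG")},
        "minus": {"GT": count_dinucleotide(minus, "GT"),
                  "AG": count_dinucleotide(minus, "AG")},
    }


# Figures quoted in the text (14): 9 + 14 potential sites, 19 of them on the minus strand.
total_sites = 9 + 14
minus_fraction = 19 / total_sites

if __name__ == "__main__":
    toy = "AAGTCCAGGT"  # hypothetical sequence, not an Alu consensus
    print(candidate_splice_sites(toy))
    print(round(minus_fraction, 2))
```

With the numbers quoted in the text, about 83% of the potential sites lie on the minus strand, which is close to the 85% of Alu-containing exons reported to derive from antisense insertions.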
Sequence comparison of Alu-containing exons revealed that not all potential splice sites were used at the same frequency (14); the most favored sites in the antisense Alu consensus were positions 275 and 279 used as 3′ splice sites (referred to as the proximal and distal splice sites, respectively) and position 158 used as a 5′ splice site. Further sequence comparison of the nucleotides surrounding the alternative 3′ splice sites helped to determine the mutations in the Alu consensus that govern the selection of this site. The results of the sequence analysis were confirmed in vivo by experiments using an ADAR2 minigene. The ADAR2 gene is known to contain an alternative Alu exon that adds 40 amino acids to the open reading frame (ORF) of the protein (15). By site-directed mutagenesis of the ADAR2 minigene, Lev-Maor et al. (16) discovered a delicate interplay between the proximal and distal 3′ splice sites; when the G immediately upstream of the distal site is mutated to any other nucleotide, the Alu exon becomes constitutively spliced and uses the proximal 3′ splice site. A similar computational and mutational approach determined the minimal base substitutions required in an antisense intronic Alu element to form a 5′ splice site at position 158. The base compositions at positions 2 and 5 of the intron were found to be critical for maintaining a fine balance of base pairing between the 5′ splice site and U1 snRNA, which determines the level of alternative splicing (17).
Most striking was the fact that single mutations could turn splicing from alternative to constitutive (16,17). Several cases are known, such as the Alport (18) and Sly (19) syndromes, in which constitutive inclusion of an Alu exon leads to a genetic disease. More recently, it was discovered that insertion of an alternative Alu exon could also lead to a genetic disease. A mutation in intron 6 of the CTDP1 gene, which creates an alternatively spliced Alu exon, results in CCFDN (congenital cataracts, facial dysmorphism and neuropathy) syndrome (20).
Although these studies provided the molecular bases of Alu element exonization, the regulation of Alu exonization might be even more complex. Recent work on intron 16 of the human angiotensin converting enzyme gene revealed that exonized antisense Alu elements contain several auxiliary splicing enhancer sequences that facilitate their exonization (21). Similarly, the same group determined that Alu elements inserted in the sense orientation contain strong splicing silencer sequences that repress their exonization (22). The presence of such enhancers and silencers may provide an alternative explanation for the observation that 85% of exonized Alu elements are inserted in the antisense orientation (14).
The evolutionary process leading to exonization of a partial or complete Alu element is a complex series of successive mutations. Insertion of an Alu exon is likely to introduce a premature termination codon or a frame-shift. Several mutations are therefore required to create an Alu exon. The precise scenario of sequential mutations that lead to exonization of an Alu element has recently been elucidated for five different loci by careful PCR analyses on different species of the primate lineage (23,24). For example, in the TNF receptor gene icp75TNFR, the alternative replacement of the first exon by an Alu exon, which lies upstream of exon 1, required three successive mutations. After integration of the Alu element (58–40 mya), an A to G transition gave rise to a new ATG initiator codon before the divergence of platyrrhines. Following the divergence of the New World monkeys (40–25 mya), a C to T transition created a new 3′ splice site, and a subsequent 7 nt deletion opened the ORF by eliminating a stop codon (23).
Taken together, these data show that the million Alu elements present in the human genome could act as a very large reservoir of alternative exons. As all identified Alu exons were shown to be alternatively spliced (14), a strong selective pressure must exist to avoid the loss of the original form of the protein. Because Alu exonization is prone to introduce premature termination through frame-shifts, the loss of the natural polypeptide would most likely be deleterious for the organism. These alternative Alu exons might nevertheless have played an important role in the evolution of primates. For example, humans and mice have almost all of their respective genes in common. However, it has been estimated that although they share 70% of their constitutive splice sites, they share only 15% of their alternative splice sites (25). This difference in alternative splicing creates a species-specific pool of alternative exons and might therefore account for a significant part of the morphological and physiological differences between mice and humans.
RNA editing is a process by which the nucleotide sequence of RNA molecules is changed co- or post-transcriptionally. The modifications in the RNA include nucleotide insertions, deletions and base modifications. Among these modifications, base conversions appear to be the major type of editing. The best-characterized base conversions are hydrolytic deamination reactions by which cytosine is converted to uracil and adenosine (A) to inosine (I). The A–I editing reaction is catalyzed in vivo by members of the adenosine deaminase acting on RNA (ADAR) family of enzymes (Figure 3A), which preferentially edit adenosines located in double-stranded regions of RNA molecules [for reviews see Refs. (26,27)]. The precise role of A–I editing in cell metabolism is still unclear, but it has been shown to be required for normal life; the ADAR1 knockout is embryonic lethal owing to liver disintegration (28), while ADAR2−/− mice die young and are prone to seizures (29). Until recently, very few positions edited by ADAR were known in the human transcriptome. This was in contrast to the apparent mass of inosine, estimated at one molecule per 17 000 bases in rat brain tissue and one molecule per 33 000 bases in heart tissue (30).
The missing mass of inosine in the human transcriptome has recently been localized within Alu elements by three independent groups using genome-scale computational searches for A–I editing sites. These three groups identified editing sites by aligning mRNAs (EST and cDNA databases) to the human genome sequence and detecting A–G substitutions. As I is read as G during sequencing, an A–G substitution between genomic DNA and mRNA reflects the presence of an inosine at the edited site. To eliminate single nucleotide polymorphisms and sequencing errors, the three groups designed elaborate in silico filters that retain only A–I substitutions. The mRNA datasets and the filters used are the major differences between these three studies. Using this approach, Kim et al. (31) identified 30 085 substitutions in 2674 different transcripts, Levanon et al. (32) identified 12 723 substitutions in 1637 different transcripts, and Athanasiadis et al. (33) found 14 500 substitutions in 1445 mRNAs.
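The core of the alignment-based approach described above can be sketched as follows. This is a minimal illustration, assuming that the genomic and cDNA sequences are already aligned, gap-free and given on the transcribed strand; the function names and the thresholds in `looks_edited` are hypothetical and do not reproduce the filters used in (31–33).

```python
from typing import List


def a_to_g_sites(genomic: str, cdna: str) -> List[int]:
    """Return 0-based positions where a genomic A aligns with a cDNA G."""
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [i for i, (g, c) in enumerate(zip(genomic.upper(), cdna.upper()))
            if g == "A" and c == "G"]


def mismatch_count(genomic: str, cdna: str) -> int:
    """Count all positions at which the two aligned sequences differ."""
    return sum(1 for g, c in zip(genomic.upper(), cdna.upper()) if g != c)


def looks_edited(genomic: str, cdna: str,
                 min_sites: int = 2, min_fraction: float = 0.8) -> bool:
    """Toy filter (hypothetical thresholds): keep alignments in which
    A-to-G changes are clustered and dominate all mismatches, as expected
    for editing rather than for isolated SNPs or sequencing errors."""
    total = mismatch_count(genomic, cdna)
    if total == 0:
        return False
    sites = a_to_g_sites(genomic, cdna)
    return len(sites) >= min_sites and len(sites) / total >= min_fraction


if __name__ == "__main__":
    genomic = "ACGTAACGA"  # hypothetical sequences for illustration
    cdna = "GCGTAGCGA"
    print(a_to_g_sites(genomic, cdna), looks_edited(genomic, cdna))
```

Requiring several A-to-G changes within one alignment is one simple way to separate editing from single nucleotide polymorphisms, which tend to appear as isolated mismatches of any type.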
Together, these three studies demonstrated that A–I editing is a widespread mechanism and, more surprisingly, that >90% of all A–I substitutions occur within Alu elements contained in mRNAs. Further investigations revealed that 54% of all identified editing events occurred in the 3′-untranslated region (3′-UTR) of mRNAs, 12% occurred in 5′-UTRs and 33% in introns (32); this clearly demonstrated that A–I editing preferentially occurs in non-coding regions of mRNAs. In order to assess the accuracy of the in silico analysis, 26 previously unknown editing sites were confirmed experimentally (32).
ADAR enzymes have no strict sequence requirements at the editing sites; a double-stranded RNA region seems to be sufficient. By comparing edited mRNAs, hot spots of editing along the Alu consensus sequence were identified. Adenosines 27, 28, 136 and 162 of the Alu consensus were more prone to editing than the others (32). Neighboring bases might also influence the editing frequency; T is over-represented at the position preceding the edited adenosine, whereas G is under-represented at the same position (31).
At first, the preferential editing of Alu sequences inside mRNAs might have been attributed to the secondary structure of Alu RNA, which contains long double-stranded regions (34) (Figure 4). However, Athanasiadis et al. (33) found an interesting correlation between adenosine editing inside an Alu element and the presence of an inverted Alu element in close proximity. They demonstrated that editing is favored when a distance <2 kb separates two Alu elements in opposite orientations. These data defined a model in which two closely inserted Alu elements base pair and become an ideal substrate for ADAR (Figure 3B). This model was recently confirmed by the study of the editing patterns of cyclin M3 intron 2 and NFκB1 intron 16, which showed that the base pairing between two Alu elements occurs intramolecularly, and not intermolecularly (35).
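The proximity criterion in this model lends itself to a short sketch. It assumes a hypothetical list of Alu annotations (start, end, strand) on a single transcript and reports pairs in opposite orientations separated by <2 kb; the `Alu` record and the function name are illustrative and are not taken from (33).

```python
from typing import List, NamedTuple, Tuple


class Alu(NamedTuple):
    start: int   # first base of the element (transcript coordinates)
    end: int     # last base of the element
    strand: str  # "+" or "-"


def inverted_pairs(alus: List[Alu], max_gap: int = 2000) -> List[Tuple[Alu, Alu]]:
    """Return pairs of oppositely oriented Alu elements whose gap is < max_gap."""
    ordered = sorted(alus, key=lambda a: a.start)
    pairs = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:]:
            gap = right.start - left.end
            if gap >= max_gap:
                break  # elements are sorted by start, so later gaps only grow
            if left.strand != right.strand:
                pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    # Hypothetical annotations for illustration only.
    annotations = [Alu(100, 400, "+"), Alu(1500, 1800, "-"),
                   Alu(5000, 5300, "+"), Alu(5400, 5700, "+")]
    for left, right in inverted_pairs(annotations):
        print(left, right)
```

In this toy input only the first two elements qualify: they are inverted relative to each other and 1100 bases apart, whereas the last two are close together but in the same orientation.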
Interestingly, the three genome-scale studies failed to detect any known editing sites in the coding region of glutamate (36) and serotonin receptor (37) mRNAs; this might be explained by the high stringency of the in silico filters used. Moreover, it strongly suggests that not all editing sites of the human transcriptome have been uncovered.
Similar sequence comparisons carried out in mouse and fly transcriptomes revealed that the high levels of editing are specific to primates (38). As Alu elements exist only in primates, the correlation between their presence and abundant A–I editing is striking. This logically led to the speculation that the appearance of Alu elements in the primate lineage led to widespread editing, which most likely played a role in primate evolution (38).
In summary, Alu elements play an important role in editing of the human transcriptome by providing ideal templates to the ADAR family of enzymes. The large number of Alu elements present in mRNAs and their relatively low divergence explain why they are more prone to be edited than other sequences, and why widespread editing is specific to primates (39). Although the precise role of RNA editing is still speculative, it might affect gene expression at several steps. As inosine does not base pair with uracil but with cytosine, editing might influence the stability of RNA molecules by creating and disrupting secondary structures. At another level, as inosine is recognized as guanosine by the translation and splicing machineries, A–I editing could lead to amino acid substitutions in the coding sequence, or to modification of splice sites in introns that could induce premature termination or frame-shifts. Knowing that aberrant editing is found in several neurological disorders (40,41), it is highly probable that the phenomenon is of great physiological importance.
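The consequence of inosine being read as guanosine in a coding sequence can be made concrete with a small sketch that uses the standard genetic code. The codons below are hypothetical inputs chosen for illustration, and the helper name `recode` is an assumption of this example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def recode(codon: str, position: int):
    """Return (amino acid before, amino acid after) when the A at the given
    0-based codon position is edited to I and therefore read as G."""
    codon = codon.upper()
    if codon[position] != "A":
        raise ValueError("only adenosines can undergo A-I editing")
    edited = codon[:position] + "G" + codon[position + 1:]
    return CODON_TABLE[codon], CODON_TABLE[edited]


if __name__ == "__main__":
    print(recode("CAG", 1))  # glutamine codon read as an arginine codon
    print(recode("TAG", 1))  # stop codon read as a tryptophan codon
```

The second call shows that editing can in principle also remove a stop codon, which is one way in which A–I editing could alter the reading frame outcome of a transcript.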
It has been known for a long time that Alu RNAs, transcribed from Alu elements, are present in the cytosol of primate cells. Alu elements contain the internal A and B boxes of the RNA polymerase III promoter from the 7SL RNA gene (Figure 1). These internal promoter elements diverge significantly from the consensus (42) and are too weak to drive efficient transcription of Alu elements, which therefore depends on the sequences flanking their site of insertion (43). In normal growth conditions, Alu RNAs are present at very low levels in the cytosol (10³–10⁴ molecules per cell), but numerous stress conditions, such as viral infection, cycloheximide exposure or heat shock, transiently increase their level of expression (44), which rapidly decreases upon recovery. This precisely controlled regulation raised the attractive possibility that Alu RNAs may serve a specific function in cell metabolism that is required during stress conditions. This hypothesis was supported by two independent studies showing that an overexpressed Alu RNA was able to stimulate translation of a co-transfected reporter gene in mammalian cells (45,46). These data suggested for the first time that transcribed Alu elements may serve a specific function in translation regulation. Alu RNAs were initially thought to act as inhibitors of the double-stranded RNA-dependent protein kinase, PKR (45). They were proposed to bind PKR and prevent its autophosphorylation; this, in turn, would prevent the phosphorylation of eIF2α, which would otherwise inhibit translation initiation. By preventing inhibition of translation initiation, Alu RNAs would then behave as translational activators. However, it has been discovered that Alu RNAs can activate PKR when they are present at low concentrations and inhibit it at higher concentrations (47). Given that Alu RNA stimulates translation of reporter genes in PKR knockout cells (46), the involvement of PKR in this mechanism is questionable, even though it cannot be excluded.
Alu RNAs transcribed from Alu elements are highly structured RNAs that have maintained strong structural similarities with their ancestor, SRP RNA. The typical Alu RNA is a dimer of related but non-equivalent arms that are joined by an A-rich linker and followed by a short poly(A) stretch (Figure 4). Each arm is related to the Alu domain of SRP RNA in terms of sequence and secondary structure and can bind the cognate SRP protein SRP9/14 in vitro (48) and in vivo (49). However, the left arm shows a higher affinity for these proteins than the right one (48). Recent results from our group showed that synthetic Alu RNPs composed of Alu RNA in complex with recombinant SRP9/14 have a different effect on protein translation than naked Alu RNA. While Alu RNA stimulates the translation of all reporter mRNAs in a cell-free translation system, Alu RNP acts as a general inhibitor of protein translation (50). Such opposite activities of Alu RNP and Alu RNA were at first quite surprising; however, these data could easily be explained by conformational changes of Alu RNA upon SRP9/14 binding. It was previously observed in SRP that the RNA is in a loosely folded state in the absence of SRP9/14, whereas it assumes a very compact structure in its presence (51,52).
As SRP9/14 is known to be present in a large excess over SRP in mammalian cells (53), the occurrence of Alu RNP in vivo becomes very likely. Moreover, it has been shown that Alu RNA sediments in high molecular weight complexes (53,54) and that Alu RNAs expressed in response to adenovirus infection are assembled into SRP9/14-containing RNPs (49). These data make the occurrence of naked Alu RNA in the cytosol questionable; however, they do not exclude a mechanism by which the activity of Alu RNA would be modulated by the binding of SRP9/14.
Further investigation into the mechanism by which Alu RNP and Alu RNA influence protein translation showed that both of them act at the level of translation initiation (50). These results unexpectedly demonstrated that, despite strong structural similarities, Alu RNP and SRP influence translation in very different ways; while SRP mediates a transient delay in translation by blocking the elongation step, Alu RNP inhibits translation by reducing initiation.
Alu elements are also frequently found in the UTRs of transcripts (55,56), where they are transcribed by RNA polymerase II as part of mRNAs. As 5′- and 3′-UTRs are hot spots for the regulation of translation initiation, these elements are suitably located to modulate it. Several independent studies have described a role for Alu elements present in UTRs in regulating translation initiation. For example, it was suggested that an Alu element in the 5′-UTR of the human growth hormone receptor (hGHR) mRNA could regulate the translation of this mRNA (57). Similarly, an antisense Alu element in the 3′-UTR of the manganese superoxide dismutase (MnSOD) mRNA acts as a translation inhibitor (58). In the case of the mRNA of ZNF177, a zinc finger protein of unknown function, an inverted partial Alu element in the 5′-UTR has been shown to strongly decrease the translation efficiency of the mRNA (56). The best-characterized example of translation regulation by an Alu element in a UTR is BRCA1. BRCA1 is a DNA repair protein whose mutation is associated with breast cancer. Alu elements make up 40% of the 80 kb genomic sequence of this gene (59). BRCA1 mRNA exists in two forms that differ in their leader sequences and in their patterns of expression (60). These two transcripts are formed by the selective use of different promoters (61). The isoform with a short 5′-UTR is expressed in normal and cancerous mammary tissue, whereas the isoform with a longer 5′-UTR is expressed only in breast cancer tissue (62). The latter mRNA is translated much less efficiently than the former, and this translational defect has been shown to be due to an Alu element in the 5′-UTR of this transcript (62). This Alu element has a 60 nt deletion in the left arm, but the right arm is intact and forms a stable secondary structure that partially prevents translation initiation. Deregulation of BRCA1 transcription in cancer then results in a higher proportion of translationally inhibited mRNA, which contributes to a decrease in the BRCA1 protein level, leading to the accumulation of defects and mutations and ultimately to cancer.
Altogether, these data show that Alu elements modulate protein translation in at least two different ways; they can act as trans regulatory factors when transcribed by Pol III and assembled into Alu RNPs, and as cis regulatory elements when transcribed by Pol II in 5′- and 3′-UTRs. Unpublished data from our group show that Alu elements in the UTRs of some mRNAs are able to bind SRP9/14 in vitro. This observation suggests new potential roles for Alu elements in UTRs, for example a role in the stabilization of the mRNA by SRP9/14 binding. The impact of Alu elements on protein translation is most likely only partially uncovered, and further work will be required to fully understand their impact on gene expression.
Together with the fact that Alu elements have a high potential to modulate gene transcription by binding several transcription factors (63), the findings reported here demonstrate that ‘junk RNA’, transcribed from Alu elements, is useful for many purposes in cell metabolism. Alu elements were probably no more than selfish DNA originally, but the genome clearly adapted to their presence by assigning them important functions in the regulation of gene expression. This gain of regulatory function, known as exaptation (64), most likely participated in primate evolution and probably helped in the divergence of primates from other mammals. Alu elements should therefore be considered a huge reservoir of potential regulatory functions that are expressed or not, at the mercy of point mutations occurring randomly over time, as detailed previously for alternative splice site formation (23,24).
Other families of retrotransposons have already been suggested to affect gene expression. LINE-1 elements, for example, have been shown in several reports to influence gene transcription and to introduce polyadenylation signals [reviewed in Refs. (65,66)]. However, the effects of LINE-1 elements on gene expression have been less studied, and roles such as those described for Alu elements are still speculative.
We believe that the extent of the ‘Alu phenomenon’ in both the human genome and the transcriptome has been only partially uncovered. Several other potential functions of Alu elements may not yet be suspected; it was indeed recently reported that Alu elements in the 3′-UTRs of mRNAs are probable microRNA targets (67).
Funding to pay the Open Access publication charges for this article was provided by the Canton of Geneva.
Conflict of interest statement. None declared.