Nucleotide-repeat expansions underlie a heterogeneous group of neurodegenerative and neuromuscular disorders for which there are currently no effective therapies. Recently, it was discovered that such repetitive RNA motifs can support translation initiation in the absence of an AUG start codon across a wide variety of sequence contexts, and that the products of these atypical translation initiation events contribute to neuronal toxicity. This review examines what we currently know and don’t know about repeat associated non-AUG (RAN) translation in the context of established canonical and non-canonical mechanisms of translation initiation. We highlight recent findings related to RAN translation in three repeat expansion disorders: CGG repeats in fragile X-associated tremor ataxia syndrome (FXTAS), GGGGCC repeats in C9orf72 associated amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) and CAG repeats in Huntington disease. These studies suggest that mechanistic differences may exist for RAN translation dependent on repeat type, repeat reading frame, and the surrounding sequence context, but that for at least some repeats, RAN translation retains a dependence on some of the canonical translational initiation machinery.
Nucleotide-repeat expansions underlie a heterogeneous group of primarily neurological diseases that in aggregate impact a large number of patients (Mason et al., 2014). Repeats can cause problems through a variety of mechanisms delineated over the past 25 years. Expansion of trinucleotide repeats within protein-coding open reading frames (ORFs) causes a gain-of-function toxicity downstream of the production of polyglutamine or (less frequently) polyalanine proteins (Orr and Zoghbi, 2007). This toxicity results both from alterations in the native functions of the protein in which the repeat resides and from toxicity independent of protein context, related to perturbations in neuronal proteostasis. Repeat expansions located outside of known protein-coding ORFs can elicit changes in the expression of the gene in which they reside, leading to reduced or enhanced expression at the transcript and protein level (He and Todd, 2011). Such non-coding repeats can also elicit toxicity as RNA by binding to and sequestering specific RNA-binding proteins via presentation of a repetitive motif (Mohan et al., 2014).
The discovery of repeat-associated non-AUG (RAN)-initiated translation blurs the lines that define which repeats elicit toxicity via protein gain-of-function and which act through RNA repeat-elicited gain-of-function (Cleary and Ranum, 2014; Kearse and Todd, 2014). This non-canonical translational initiation process enables elongation through a repeat strand in the absence of an AUG initiation codon and in multiple reading frames, producing multiple homopolymeric or dipeptide repeat-containing proteins. Originally described in association with CAG-repeat expansions causative for spinocerebellar ataxia type 8 (SCA8), this process also occurs in association with expansions of CAG, CUG, GGGGCC, GGCCCC, and CGG repeats (Zu et al., 2011; Ash et al., 2013; Gendron et al., 2013; Mori et al., 2013a; Mori et al., 2013b; Todd et al., 2013; Zu et al., 2013; Bañez-Coronel et al., 2015). Repeats can drive RAN translation in a surprising variety of RNA contexts, including within 5’ untranslated regions (UTRs), protein-coding ORFs, or introns and “non-coding” RNAs. The identification of this novel translational initiation event has led to a flurry of activity within the research community, with a significant body of work now demonstrating 1) the presence of RAN-translated peptides across a wide spectrum of neurodegenerative disorders and 2) an association between these short repeat-containing proteins and neuronal toxicity. Despite this interest, the mechanism or mechanisms by which RAN translation occurs remain largely unknown. The focus of this review will be what we know thus far about RAN translation initiation, with particular attention paid to how RAN translation compares to canonical translation and other forms of initiation described over the past 40 years. We hope that revisiting these foundational experiments will shed light on this new disease-relevant process.
Translation initiation is the step-wise assembly of elongation-competent 80S ribosomes at start codons of mRNA. It is a highly complex process, entailing the concerted activity of at least nine eukaryotic initiation factors (eIFs; Jackson et al., 2010). A comprehensive account of the roles of each eIF and each stage of initiation is beyond the scope of this review, but several steps are worth highlighting. In most cases, initiation begins with the recognition of the 5’ 7-methylguanosine (m7G) cap on mRNA by the eIF4F complex (Fig. 1, Step 1; Sonenberg et al., 1978; Sonenberg et al., 1979; Grifo et al., 1983). The eIF4F complex is composed of eIF4E (the direct cap-binding subunit), eIF4G (a scaffolding subunit), and eIF4A (a DEAD-box RNA helicase), with eIF4B and eIF4H serving as additional helicase stimulatory factors. eIF4G recognizes the poly(A)-binding protein (PABP; Imataka et al., 1998; Kessler and Sachs, 1998), which in turn binds to the 3’ polyA tail on mRNAs. This is thought to result in circularization of the mRNA and greater initiation efficiency (Borman et al., 2000; Kahvejian et al., 2005).
The eIF4F complex, still bound to the m7G cap, is joined by the 43S pre-initiation complex (PIC), composed of the 40S ribosomal subunit, eIF1, eIF1A, eIF3, eIF5, and the ternary complex [in turn composed of methionine-conjugated tRNA (tRNAMet), and eIF2-GTP; Fig. 1, Step 2]. This joining of the 43S PIC to the eIF4F complex is mediated by an eIF4G-eIF3 interaction (LeFebvre et al., 2006). Successful translation of most eukaryotic mRNAs is thought to require the RNA helicase activity of eIF4A in order to resolve RNA-RNA secondary structures adjacent to the m7G cap and prepare a “landing pad” for the 43S PIC (Pestova and Kolupaeva, 2002). In the ribosomal scanning model of translation initiation (Kozak, 1978), the 43S PIC and components of the eIF4F complex scan through the 5’ UTR in the 5’ to 3’ direction (Fig. 1, Step 3). This stage is also known to require eIF4A in order to resolve weaker internal secondary structures, though additional helicases assist in melting stronger structures (Jackson, 1991; Chuang et al., 1997; Svitkin et al., 2001; Pisareva et al., 2008; Zhang et al., 2015b). The 43S PIC scans until encountering an AUG codon in a good Kozak context, (A/G)NNAUGG (Fig. 1, Step 4; Kozak, 1984, 1986). At this point, base-pairing between the AUG codon and CAU anti-codon loop on tRNAMet results in the ejection of eIF1, a factor which, along with eIF1A, increases the stringency of AUG start-codon selection (Yoon and Donahue, 1992; Pestova and Kolupaeva, 2002; Unbehaun et al., 2004; Maag et al., 2005; Passmore et al., 2007). eIF2 hydrolyzes its bound GTP with the assistance of eIF5, the associated GTPase-activating protein (GAP; Fig. 1, Step 5). At this point, the 40S ribosome is committed to its selection of start codon, and forms a tighter interaction with the substrate mRNA, collectively known as the 48S PIC. In the final stages of initiation, the 40S subunit is joined by the 60S ribosomal subunit (Fig. 1, Step 6), the majority of remaining eIFs are ejected, eIF5B hydrolyzes its bound GTP (Fig. 1, Step 7) and translation elongation begins with formation of the first peptide bond (Dever and Green, 2012 for review).
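The start-codon rule in the scanning model can be illustrated with a small sketch. It is a deliberately deterministic simplification: it walks an mRNA string 5’ to 3’ and reports the first AUG that sits in the (A/G)NNAUGG context quoted above. Real 43S PICs do not behave this rigidly, and the function name and the toy sequence are illustrative assumptions, not taken from any published tool or transcript.

```python
import re

# Kozak context as written in the text: (A/G)NNAUGG.
# The purine sits 3 nt upstream of the A of AUG; a G follows the AUG.
GOOD_KOZAK = re.compile(r"[AG][ACGU]{2}AUGG")


def first_good_kozak_aug(mrna: str):
    """Return the 0-based index of the A in the first AUG found in a
    good Kozak context, scanning 5' to 3', or None if there is none.

    Simplification (assumption): scanning is treated as deterministic;
    AUGs in poor context are simply skipped.
    """
    rna = mrna.upper().replace("T", "U")  # accept DNA-style input too
    match = GOOD_KOZAK.search(rna)
    return None if match is None else match.start() + 3


# Hypothetical toy 5' end: the first AUG (index 5) is in poor context,
# the second (index 16) matches (A/G)NNAUGG.
toy = "GGCUUAUGCCGCCACCAUGGCG"
print(first_good_kozak_aug(toy))
```

Because `re.search` returns the leftmost match, the first qualifying AUG is reported, which mirrors the 5’ to 3’ direction of scanning.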
RNA secondary and tertiary structures contribute significantly to the dynamics and regulation of translation initiation. Structures in the 5’ UTR impact translation initiation both positively and negatively, depending on the structure’s location. When placed upstream of an AUG start codon, highly structured regions are known to inhibit initiation, either by blocking the eIF4E-m7G interaction when located adjacent to the cap, or by impeding 5’-to-3’ translocation of the 43S PIC when located internally in the 5’ UTR (Kozak, 1980, 1986, 1988, 1994). In contrast, secondary structures downstream of start codons facilitate initiation at imperfect start codons: those with poor Kozak context and even non-AUG codons (Kozak, 1990, 1994). Initiation at non-AUG codons occurs at reduced efficiency relative to AUG codons in in vitro translation systems (Peabody, 1987, 1989), but the presence of secondary structures downstream markedly increases initiation efficiency (Kozak, 1989, 1990, 1994).
Recent advances in ribosomal footprinting methodologies suggest that these in vitro findings may reflect a common but heretofore unrecognized set of initiation events in vivo. Ribosome profiling combines the traditional aspects of an RNase-protection assay with next generation sequencing to identify the positions of initiating and elongating ribosomes on mRNAs on a transcriptome-wide level (Ingolia et al., 2009). This technique has found evidence for thousands of unpredicted translation initiation events, many of which occur at non-AUG codons (Ingolia et al., 2011; Lee et al., 2012; Gao et al., 2015). This is especially true for upstream open reading frames (uORFs), which are short ORFs upstream of canonical, annotated ORFs in the same mRNA transcript. Many of these uORFs appear to play regulatory roles in translation that are dependent on metabolic conditions and cell-cycle stage (Brar et al., 2012). These findings are now supported by a variety of studies utilizing mass spectrometry to confirm the presence of these uORF-coded peptides, many of which possess functional roles (Magny et al., 2013; Menschaert et al., 2013; Slavoff et al., 2013; Vanderperre et al., 2013; Chanut-Delalande et al., 2014; Pauli et al., 2014; Anderson et al., 2015). Thus, initiation at non-AUG codons occurs in vivo, may be regulated in part by trans-acting eIFs as well as mRNA-specific cis factors, and appears to play important regulatory roles in global protein translation.
While the majority of eukaryotic mRNAs are likely translated via the canonical mechanism described above, multiple atypical mechanisms exist, and like canonical initiation, atypical modes of initiation are modulated by mRNA secondary structure. Multiple viral and cellular RNAs are translated via an internal ribosome entry site (IRES)-mediated pathway (Fig. 2B; Lozano and Martínez-Salas, 2015). IRESes are complex RNA structures that directly recruit ribosomal subunits and eIFs to internal sites within RNA transcripts. These are highly heterogeneous structurally and functionally, but the common, central feature is that translation initiation bypasses the eIF4E-m7G interaction. Therefore, IRES-based translation is said to be m7G cap-independent, allowing for initiation within bicistronic or circular RNA elements.
In addition, IRESes are heterogeneous in which eIFs are and aren’t recruited or required, with some forms requiring most eIFs while others require only the 40S ribosomal subunit and an alanine-conjugated tRNA (Fig. 2C; Wilson et al., 2000a; Wilson et al., 2000b; Jan et al., 2001). The latter, employed by the cricket paralysis virus (CrPV), intriguingly utilizes CCU for an initiation codon and codes for a protein with an N-terminal alanine. Though viral IRESes are the most thoroughly characterized, multiple eukaryotic mRNAs are also thought to contain IRESes, including c-Myc, p53, Bcl-2, and others (Komar and Hatzoglou, 2005). Translation of these proteins is maintained under various stress conditions in which canonical translation is inhibited, implying a unique mechanism that escapes this general inhibition.
In a second example of non-canonical modes of initiation, histone 4 mRNA appears to be translated through a ribosomal tethering mechanism (Fig. 2D; Chappell et al., 2006a; Chappell et al., 2006b). The 5’ UTR of histone 4 is shorter than most 5’ UTRs (the mouse homolog is 9 nucleotides, at the short extreme). It is efficiently translated, however, despite translation initiation generally requiring a 5’ UTR of at least 20 nucleotides (Kozak, 1987, 1991). Translation of histone 4 mRNA begins with the recruitment of the eIF4F complex to a structural element within the ORF. eIF4F then binds to the m7G cap, which is buried within a nearby structural element. The 43S PIC is subsequently recruited to eIF4F, and is then transferred directly to the AUG start codon (Martin et al., 2011). Thus, translation of histone 4 is cap-dependent, but deviates from canonical translation in that the ribosome is initially recruited internally. And, as with IRES-mediated translation, translation of histone 4 is mediated through interactions between eIFs and mRNA structural elements.
In each of the above examples, mRNA secondary structure encodes “instructions” for how a given transcript is to be translated. Laura Ranum and colleagues’ discovery of RAN translation introduced a novel mode of translation to this mechanistic multitude (Zu et al., 2011). Expansions of protein-coding CAG repeats in the gene Ataxin 8 (ATXN8) lead to the neurodegenerative disorder SCA8. Unexpectedly, mutation of the only AUG codon upstream of expanded CAG repeats did not abrogate protein translation (Zu et al., 2011). Moreover, translation initiated in multiple reading frames, generating homopolymeric proteins with glutamine, serine, or alanine repeats (depending on the reading frame). RAN initiation on ATXN8 transcripts depended on the stability of secondary structures formed from the expanded CAG repeats, as decreasing the number of CAG repeats or their GC content abrogated RAN translation (Zu et al., 2011). RAN translation products from all three reading frames accumulated in cells transfected with expanded CAG reporters, occasionally even within the same transfected cell. Antisense ATXN8 transcripts bearing expanded CUG repeats also supported RAN translation (Zu et al., 2011). Antibodies generated against the predicted polyalanine product of the ATXN8 sense transcript recognized a protein in the cerebellums of SCA8 human patients and mouse models. A similar approach provided in vivo evidence of a polyglutamine RAN product from DMPK antisense transcripts bearing expanded CAG repeats, associated with myotonic dystrophy type 1 (DM1; Zu et al., 2011).
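Why a single CAG tract yields glutamine, serine, or alanine homopolymers follows directly from the standard genetic code, and can be checked with a few lines of code. The sketch below is only an illustration: the function name is an assumption, the codon table holds just the three codons a pure CAG tract can produce, and nothing here models how or where initiation occurs.

```python
# Only the codons that arise from reading a pure CAG tract
# (standard genetic code).
CODON_TO_AA = {"CAG": "Gln", "AGC": "Ser", "GCA": "Ala"}


def frame_products(unit: str, n_repeats: int) -> dict:
    """Translate a pure repeat tract in each of the three reading frames.

    Returns {frame_offset: [amino acids]}. Incomplete codons at the
    3' end are ignored.
    """
    rna = unit * n_repeats
    products = {}
    for frame in range(3):
        codons = [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]
        products[frame] = [CODON_TO_AA[c] for c in codons]
    return products


for frame, residues in frame_products("CAG", 5).items():
    print(frame, "-".join(residues))
```

Each frame reads one codon over and over (CAG, AGC, or GCA), which is why every product is homopolymeric.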
Since this initial observation, several groups have demonstrated that RAN translation occurs at a wide variety of different repeat expansions (Ash et al., 2013; Gendron et al., 2013; Mori et al., 2013a; Mori et al., 2013b; Todd et al., 2013; Zu et al., 2013; Bañez-Coronel et al., 2015). This review focuses on RAN translation in three distinct repeat-expansion disorders that occur in different sequence contexts: CGG repeats in the 5’ UTR of the fragile X mental retardation 1 (FMR1) gene, as occurs in fragile X-associated tremor/ataxia syndrome (FXTAS), intronic GGGGCC repeats in C9orf72-associated amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD), and CAG repeats in the protein-coding sequence of Huntingtin (HTT) in Huntington disease. Each example offers unique insights into how RAN translation occurs, but also presents unique challenges in our effort to understand this process mechanistically. Here we review the current literature pertinent to each repeat expansion, in search of hints as to the mechanism of RAN translation.
FXTAS is a late-onset neurodegenerative disorder caused by the expansion of CGG repeats in the 5’ UTR of FMR1. In unaffected individuals, repeats number less than 45. Individuals with FXTAS carry between 55 and 200 repeats, known as the “premutation” range (Pembrey et al., 1985; Dorn et al., 1994; Hagerman et al., 2001). Premutation repeat expansions result in enhanced transcription of FMR1 mRNA bearing these repeats (Tassone et al., 2000; Tassone et al., 2007). In contrast, expansions to greater than 200 repeats trigger transcriptional silencing of the FMR1 locus, leading to loss of FMR1 mRNA and the Fragile X protein, FMRP. Transcriptional silencing manifests as Fragile X syndrome, a clinically distinct neurodevelopmental disorder characterized by intellectual disability and autistic features (Pieretti et al., 1991; Verkerk et al., 1991). Approximately 40% of male premutation carriers develop FXTAS (approximately 1:3000 of the total population), with increased penetrance at older ages and larger repeat sizes (Rousseau et al., 1995; Dombrowski et al., 2002; Jacquemont et al., 2004). FXTAS is characterized clinically by action tremors, ataxia, parkinsonism, and cognitive decline, and pathologically by both neuronal and non-neuronal ubiquitinated inclusions throughout the cerebral cortex, brainstem, and cerebellum (Leehey, 2009; Leehey and Hagerman, 2012). Premutation carrier women are also at increased risk of premature ovarian insufficiency (FXPOI; Cronister et al., 1991; Allingham-Hawkins et al., 1999).
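The repeat-size ranges given above can be restated as a small lookup, which makes the thresholds explicit. This is a sketch of the ranges as stated in this paragraph only; the function name and labels are illustrative, and the 45 to 54 repeat interval is left unclassified because the paragraph does not assign it to a category.

```python
def classify_fmr1_cgg(n_repeats: int) -> str:
    """Map an FMR1 5' UTR CGG repeat count onto the ranges in the text."""
    if n_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if n_repeats < 45:
        return "unaffected range"
    if 55 <= n_repeats <= 200:
        return "premutation"
    if n_repeats > 200:
        return "full mutation (transcriptional silencing)"
    # 45-54 repeats: not assigned to a category in the paragraph above.
    return "not classified in the text"


for n in (30, 50, 90, 250):
    print(n, classify_fmr1_cgg(n))
```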
Our lab demonstrated that the CGG-expanded FMR1 5’ UTR supports RAN translation initiation (Todd et al., 2013). Initiation within the 5’ UTR occurs in at least two reading frames in the absence of an AUG start codon: the GGC (+1) frame yields a polyglycine product (FMRpolyG), and the GCG (+2) frame yields a polyalanine product (FMRpolyA). FMRpolyG accumulates in ubiquitinated inclusions in patient tissue and cellular and animal disease models, is necessary to elicit toxicity in Drosophila models of disease, and induces proteasome perturbations in Drosophila and HeLa cells (Todd et al., 2013; Buijsen et al., 2014; Oh et al., 2015). In an inducible mouse model of FXTAS that expresses the FMR1 5'UTR with 90 CGG repeats, turning off transgene expression reverses the formation of neuronal FMRpolyG-positive inclusions and repeat-elicited behavioral deficits (Hukema et al., 2015). Finally, FMRpolyG-positive inclusions have been found in ovarian stromal cells in FXTAS mouse models and a FXPOI patient, suggesting FMRpolyG expression is linked to other Fragile X-related clinical phenotypes (Buijsen et al., 2016).
In an effort to investigate RAN translation of CGG-expanded FMR1 mechanistically, our lab developed several transfectable reporters for expression of FMRpolyG and FMRpolyA. First, we observed production of an FMRpolyG-green fluorescent protein (GFP) fusion reporter construct bearing 30, 50, or 88 CGG repeats, suggesting that RAN translation can occur in the absence of pathological expansions (Todd et al., 2013). In contrast, FMRpolyA was expressed from reporter constructs bearing 88 repeats but not 30 repeats (Todd et al., 2013). Second, insertion of a stop codon immediately upstream of the CGG repeats precluded expression of FMRpolyG reporters, indicating that RAN translation of FMRpolyG initiates upstream of the repeats (Todd et al., 2013). Further mutational analysis revealed that initiation in this frame can occur at multiple upstream near-AUG codons in the human 5’UTR (Todd et al., 2013). In contrast, insertion of a stop codon did not preclude expression of FMRpolyA (Todd et al., 2013). This suggests that RAN translation can initiate in the GCG frame within the repeats. These results raise the intriguing possibility that RAN translation of the same sequence can differ mechanistically in different reading frames.
One important question is whether RAN translation of CGG-expanded FMR1 is cap-dependent or utilizes an IRES-like cap-independent mechanism (Fig. 2A–C). There is some evidence to suggest that the FMR1 5’ UTR possesses IRES activity. Insertion of the FMR1 5’ UTR between two ORFs in a plasmid-based bicistronic reporter doesn’t eliminate translation of the second ORF (Chiang et al., 2001; Dobson et al., 2008). Although some expression of the second ORF may be due to the presence of a cryptic promoter element within the FMR1 5’ UTR, this same finding was observed when in vitro transcribed bicistronic RNA was transfected into cells (Dobson et al., 2008). Further experiments indicated that translation from monocistronic reporters with the 5’ UTR of FMR1 was less cap-dependent than translation from reporters bearing the 5’ UTR of β-globin, as would be predicted if an IRES were utilized (Dobson et al., 2008). This putative IRES activity was partially dependent on the CGG repeat and required both the approximately 100 nucleotides upstream of the repeats in the FMR1 5’ UTR and the region containing the repeat itself. However, these studies were done before a conceptualization of RAN translation was in place, and it is unclear whether these initiation events occurred at the AUG of the reporter ORF or within the FMR1 5’ UTR itself. Also, some of these studies were performed with a very short repeat (9 CGGs). Follow-up work suggested that a significant fraction of FMRP is translated through a cap-dependent process, highlighting the significance of the eIF4E-m7G interaction both in cells and in vitro translation systems (Chen et al., 2003; Ludwig et al., 2011).
More recently, we demonstrated that RAN translation of CGG-expanded FMR1 reporters is cap-dependent in multiple repeat reading frames (Kearse et al., 2016). Using a modified nanoLuciferase that selectively reports on RAN translation from the FMR1 5’ UTR, we found that in vitro transcribed, m7G-capped RAN-translation reporter RNAs are efficiently translated both in transfected HeLa cells and in rabbit reticulocyte lysate in vitro translation reactions. However, when m7G is substituted with an A-cap, which is not recognized by eIF4E, expression of these RAN reporters decreased dramatically. Furthermore, addition of excess free m7G cap, which binds to and sequesters eIF4E, also blocked CGG RAN translation of both FMRpolyG and FMRpolyA reporters. Neither of these manipulations affected expression of an IRES reporter, strongly suggesting that RAN translation of CGG repeats in the FMR1 5’ UTR is a cap-dependent process akin to the first stage of canonical translation (Kearse et al., 2016).
In the next stage of canonical initiation, the 43S PIC scans through the 5’ UTR, performing a base-by-base inspection for an AUG codon (Fig. 1, Step 3). If the scanning model were to hold for RAN translation of CGG-expanded FMR1, then the 43S PIC would need to scan through the CGG repeats to reach the AUG start codon for FMRP. In silico modeling predicts and in vitro analysis suggests that consecutive CGG repeats form a stable hairpin structure (Sobczak et al., 2010; Kiliszek et al., 2011), presenting a significant impedance to scanning 43S PICs. Consistent with ribosomal scanning, increasing the length of CGG repeats reduces the expression of a downstream AUG initiated reporter (Chen et al., 2003; Ludwig et al., 2011). In addition, Ludwig et al. (2011) observed initiation at a near-AUG codon in an artificial hairpin inserted in the 5’ UTR in place of the CGG repeats, further supporting a scanning mechanism.
More directly, to determine whether the scanning model holds true for RAN translation of CGG-expanded FMR1, we treated cells with hippuristanol, an inhibitor of eIF4A, an RNA helicase that is required for 43S PIC scanning. Addition of hippuristanol effectively blocked translation from both an AUG-driven reporter and CGG RAN translation reporters in both the polyglycine and polyalanine reading frames, but had no effect on IRES-mediated translation (Kearse et al., 2016). This would suggest that the initial stages of RAN translation at CGG repeats resemble canonical initiation and require ribosomal scanning.
Marilyn Kozak demonstrated that downstream secondary structures enhance initiation at upstream AUG and non-AUG codons (Kozak, 1989, 1990, 1994). The increase in non-AUG codon usage is maximal when a hairpin falls 14 nucleotides downstream of the AUG codon. Based on the known size of ribosomes, this orientation would place the start codon within the P site of the 40S ribosome, opposite the anti-codon loop of tRNAMet (Kozak, 1990). These findings led to the hypothesis that secondary structure causes scanning 43S PICs to stall, increasing initiation at optimally positioned non-AUG codons. RAN translation on CGG-expanded FMR1 mRNA to generate FMRpolyG may utilize a similar mechanism (Todd et al., 2013). If mRNA secondary structures are necessary for CGG RAN translation initiation, then it creates specific testable hypotheses regarding how such structures promote initiation at non-AUG codons. For example, stalling of scanning ribosomes (Kozak, 1989, 1990, 1994) is predicted to lead to both congestion of 43S PICs on mRNAs upstream of the repeat and an increase in the dwell time of the 40S subunit over imperfect codon-anticodon matches. Stalling could also favor the dissociation of key eIFs that help determine AUG start codon fidelity (eIF1 and eIF1A) or favor alternative ribosomal conformations as occurs with IRES-mediated translation (Fernández et al., 2014; Muhs et al., 2015). Both of these events could presumably enhance the rate of enzymatic catalysis and 48S complex formation at upstream non-AUG codons. Consistent with this, as the size of CGG repeats increases in reporter constructs, there is an increase in the relative efficiency of RAN translation in both the polyglycine and polyalanine reading frames in RNA transfected cells (Kearse et al, 2016). This is coupled with a decrease in the need for any specific near-AUG codon upstream of the repeat in the glycine reading frame, suggesting impaired start codon selection fidelity. 
At large repeat sizes, there is even evidence for initiation within the CGG repeat itself in both the polyalanine and polyglycine reading frames, although these events remain inefficient compared to translation from near-cognate codons located upstream of the repeat (Kearse et al., 2016).
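The idea of near-cognate start codons upstream of a repeat can be made concrete with a short sketch. It enumerates the nine codons that differ from AUG at exactly one position, then walks upstream from a repeat in a chosen frame, collecting near-cognate codons until an in-frame stop codon is reached. The frame convention follows the text (+0 is CGG, +1 is GGC, +2 is GCG). The function names and the toy sequence are assumptions for illustration; the sequence is not the FMR1 5’ UTR, and the sketch says nothing about which candidates are actually used.

```python
BASES = "ACGU"
STOPS = {"UAA", "UAG", "UGA"}


def near_cognates(start: str = "AUG") -> list:
    """Codons differing from the start codon at exactly one position."""
    return sorted(
        start[:i] + b + start[i + 1:]
        for i in range(3)
        for b in BASES
        if b != start[i]
    )


def upstream_candidates(rna: str, repeat_start: int, frame: int) -> list:
    """List (index, codon) for near-cognate codons upstream of the repeat
    that are in the given frame and not separated from the repeat by an
    in-frame stop codon. Ordered 5' to 3'."""
    wanted = set(near_cognates())
    found = []
    pos = repeat_start + frame - 3
    while pos >= 0:
        codon = rna[pos:pos + 3]
        if codon in STOPS:
            break  # translation starting further upstream would terminate here
        if codon in wanted:
            found.append((pos, codon))
        pos -= 3
    return found[::-1]


# Hypothetical toy transcript: 15 nt of upstream sequence, then (CGG)x4.
toy = "UGAACGCCCGUGCCC" + "CGG" * 4
print(near_cognates())
print(upstream_candidates(toy, repeat_start=15, frame=0))
```

In this toy case the +0 frame contains ACG and GUG candidates downstream of an in-frame UGA, while the +1 frame contains none, showing how the set of available upstream start sites can differ between reading frames of the same sequence.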
In consideration of the existing literature, our current working model of RAN translation at CGG-expanded FMR1 is as follows (Fig. 3A): the eIF4F complex and 43S PIC bind to the m7G cap on FMR1 mRNA. This complex then scans downstream through the 5’ UTR until encountering secondary structure formed either by CGG repeats or the surrounding, intrinsic sequence of the 5’ UTR. Ribosomal stalling results in aberrant translation initiation at non-AUG codons either upstream of or within the repeat in the +1 and +2 frames, resulting in the production of FMRpolyG and FMRpolyA. Important remaining questions include what specific protein factors or neuronal factors are important for RAN translation at CGG repeats, whether initiation also occurs in the CGG (+0, polyarginine) frame or on antisense FMR1 transcripts, and how transferable these mechanisms are to RAN translation at other repeat expansions in different sequence contexts.
The C9orf72 GGGGCC/GGCCCC hexanucleotide repeat expansion was identified by two groups in 2011 as the most common known cause of ALS and FTD (DeJesus-Hernandez et al., 2011; Renton et al., 2011). ALS is the most frequently occurring form of motor neuron disease, affecting approximately 2–4/100,000 individuals (Johnston et al., 2006), and is characterized by progressive paralysis typically leading to death within two to three years after onset. FTD is the second most common form of presenile dementia and affects approximately 20/100,000 individuals between the ages of 45 and 65 (Onyike and Diehl-Schmid, 2013; Luukkainen et al., 2015). FTD presents heterogeneously and is divided into three clinical syndromes: behavioral variant, semantic dementia, and progressive nonfluent aphasia (Neary et al., 1998). The C9orf72 hexanucleotide repeat expansion is most frequently associated with the behavioral variant (DeJesus-Hernandez et al., 2011; Renton et al., 2011; Gijselinck et al., 2012), characterized by changes in personality and conduct. Although ALS and FTD each manifest with a unique set of symptoms and pathology, they are believed to constitute two ends of a single disease spectrum. Approximately 50% of ALS patients develop FTD-like cognitive and behavioral impairment (Lomen-Hoerth et al., 2003; Ringholz et al., 2005), while up to 50% of FTD patients develop motor dysfunction (Lomen-Hoerth et al., 2002). Additionally, TDP-43-positive inclusions are present within the neurons and glia of a majority of ALS patients, as well as in the most common variant of FTD (FTLD-TDP; Neumann et al., 2006).
In C9 ALS/FTD, the GGGGCC repeat, located within the first intron of transcript isoforms 1 and 3, and the promoter region of isoform 2, is expanded from 2–25 repeats in healthy individuals to more than a thousand repeats in C9 ALS/FTD patients (DeJesus-Hernandez et al., 2011; Renton et al., 2011; Gijselinck et al., 2012). Both the sense and antisense strands of C9orf72 are transcribed in mutation carriers, resulting in the production of GGGGCC and GGCCCC-repeat containing RNAs (Gendron et al., 2013; Mori et al., 2013b; Zu et al., 2013). These expanded repeat sequences are both predicted to form highly stable RNA secondary structures, with the sense RNA repeat generating a G-quadruplex and hairpin in vitro (Fratta et al., 2012; Reddy et al., 2013; Haeusler et al., 2014; Su et al., 2014) and the antisense RNA repeat recently shown to assume an A-form-like double helix in vitro (Dodd et al., 2016).
In addition to TDP-43-positive inclusions within both neurons and glia (Neumann et al., 2006), neuronal TDP-43-negative inclusions that co-stain for ubiquitin and ubiquitin-binding proteins are uniquely found throughout the CNS of C9-associated ALS and FTD patients (Al-Sarraj et al., 2011; Boxer et al., 2011). Immunohistochemical (IHC) analysis by multiple laboratories indicates that RAN-translation-derived proteins constitute these TDP-43-negative inclusions (Ash et al., 2013; Gendron et al., 2013; Mori et al., 2013a; Mori et al., 2013b; Zu et al., 2013). A total of six different dipeptide repeat proteins (DRPs) are generated from the GGGGCC and GGCCCC transcripts (Fig. 3B & 3D). Specifically, glycine-alanine (GA) and glycine-arginine (GR) DRPs are generated from the sense strand, proline-alanine (PA) and proline-arginine (PR) arise from the antisense strand, and two glycine-proline (GP) containing proteins arise from RAN translation of both strands (Ash et al., 2013; Gendron et al., 2013; Mori et al., 2013a; Mori et al., 2013b; Zu et al., 2013).
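The six dipeptide repeat products listed above follow from reading the hexanucleotide repeat in three frames on each strand, which a short sketch can verify. It takes the sense repeat unit, derives the antisense unit as its reverse complement, and reads two consecutive codons per frame. The function names are illustrative assumptions, the codon table contains only the six codons these repeats produce (standard genetic code), and dipeptide names are normalized to the labels used in the text (for example, an Ala-Pro reading is reported as PA).

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

# Only the codons that occur in GGGGCC / GGCCCC tracts (standard code).
CODON_TO_AA = {"GGG": "G", "GGC": "G", "GCC": "A",
               "CCG": "P", "CCC": "P", "CGG": "R"}

# A polymer of (Ala-Pro)n is the same repeat as (Pro-Ala)n, so each
# residue pair maps to the name used in the text.
DRP_NAME = {frozenset("GA"): "GA", frozenset("GP"): "GP",
            frozenset("GR"): "GR", frozenset("PA"): "PA",
            frozenset("PR"): "PR"}


def reverse_complement(rna: str) -> str:
    return rna.translate(COMPLEMENT)[::-1]


def dipeptide_repeats(unit: str) -> list:
    """Dipeptide repeat encoded by a hexanucleotide unit in frames 0, 1, 2."""
    rna = unit * 3  # enough sequence to read two codons in every frame
    names = []
    for frame in range(3):
        first = CODON_TO_AA[rna[frame:frame + 3]]
        second = CODON_TO_AA[rna[frame + 3:frame + 6]]
        names.append(DRP_NAME[frozenset((first, second))])
    return names


sense = "GGGGCC"
antisense = reverse_complement(sense)
print("sense    ", sense, dipeptide_repeats(sense))
print("antisense", antisense, dipeptide_repeats(antisense))
```

The sense strand gives GA, GP, and GR, and the antisense strand gives GP, PA, and PR, so GP is the one product shared by both strands.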
DRPs form both neuronal cytoplasmic and intranuclear inclusions (NCIs and NIIs) throughout the CNS. However, the distribution of DRPs throughout the brain is highly variable, with the highest burden occurring in the hippocampus, cerebellum, neocortex, and thalamus (Ash et al., 2013; Zhang et al., 2014; Schludi et al., 2015). Additionally, although limited by potential differences in antibody affinities, IHC studies suggest that the different DRPs are not present in equal abundance. In several brain regions assessed with multiple different antibodies for each DRP, polyGA appears to be most abundant, followed by polyGP and polyGR, while the DRPs derived exclusively from antisense transcripts (polyPA and polyPR) appear to be least abundant (Mori et al., 2013a; Mori et al., 2013b; Mackenzie et al., 2015).
Despite the different CNS regions that exhibit marked neurodegeneration in ALS and FTD—the motor cortex and spinal cord in ALS, and the frontal and temporal lobes in FTD—quantitative IHC studies show that DRP abundance within the frontal cortex and lower motor neurons is not significantly different between C9 patients with pure ALS or FTD (Mackenzie et al., 2015). Several additional studies have similarly shown that DRP burden is not well-correlated with degeneration (Mackenzie et al., 2013; Davidson et al., 2015; Schludi et al., 2015). This is in contrast to TDP-43-positive inclusions, which are most abundant in the most severely affected brain regions (Mackenzie et al., 2013; Davidson et al., 2015; Mackenzie et al., 2015).
This lack of correlation may suggest that RAN translation is not the driving force in disease pathogenesis. Alternatively, it may be that DRP inclusion formation is neuroprotective while soluble DRP oligomers drive toxicity, as has been proposed in several other neurodegenerative proteinopathies (Saudou et al., 1998; Haass and Selkoe, 2007). As a third possibility, Edbauer and Haass (2015) propose an “amyloid-like” mechanism of toxicity for the DRPs, in which accumulation of DRPs initiates a cascade of events that leads to TDP-43 mis-localization and aggregation in selectively vulnerable neurons (Edbauer and Haass, 2015). Work is still needed to distinguish between these possibilities, perhaps using the AAV GGGGCC66 mouse model in which neuronal loss and TDP-43 pathology are detectable (Chew et al., 2015).
Although their distribution throughout the brain raises questions about their exact role in disease, it is clear from studies in vitro and in vivo that DRP expression in isolation can induce neurodegeneration. From yeast (Jovičić et al., 2015), to Drosophila (Mizielinska et al., 2014; Wen et al., 2014; Freibaum et al., 2015; Tran et al., 2015; Yang et al., 2015), cultured cells (Zu et al., 2013; Zhang et al., 2014; Tao et al., 2015; Yamakawa et al., 2015), and primary mammalian neurons (May et al., 2014; Wen et al., 2014; Zhang et al., 2014), DRP expression leads to cell death and/or reduced survival. Significantly, in many of these systems, DRP expression is sufficient to trigger toxicity, as demonstrated by the use of alternative codons in place of GGGGCC that allow for DRP production in the absence of the potentially toxic repeat-containing RNA species (May et al., 2014; Mizielinska et al., 2014; Wen et al., 2014; Zhang et al., 2014; Jovičić et al., 2015; Tao et al., 2015; Yamakawa et al., 2015; Yang et al., 2015). Furthermore, transgenic flies expressing GGGGCC repeats of various lengths with stop codons in all three reading frames form RNA foci, but only flies containing pure repeats produce polyGR and polyGP proteins and undergo significant cell death (Mizielinska et al., 2014).
Multiple studies suggest that arginine-containing DRPs are the most toxic RAN species. Both polyGR and polyPR form intranuclear aggregates that disrupt nucleoli when overexpressed in model systems (Kwon et al., 2014; Wen et al., 2014; Tao et al., 2015; Yamakawa et al., 2015). However, nucleolar DRPs are not detected in patient brain tissue (Mackenzie et al., 2015; Schludi et al., 2015), and when co-expressed with polyGA, polyGR proteins are recruited into cytoplasmic polyGA inclusions (Yang et al., 2015), suggesting that nucleolar stress may not be a significant driver of toxicity in patients. In contrast, Wen et al. (2014) identified nucleolar polyPR-positive inclusions in spinal cord tissue from a C9 ALS patient, and suggested that the high toxicity of nucleolar polyPR results in increased vulnerability of neurons containing these species. PolyGR proteins can also mediate toxicity through impairment of the Notch pathway (Yang et al., 2015), and polyPR and polyGR inhibit nucleocytoplasmic transport in flies and yeast (Freibaum et al., 2015; Jovičić et al., 2015). Importantly, GGGGCC repeats directly interact with RanGAP, a regulator of nucleocytoplasmic transport (Zhang et al., 2015a), suggesting that the arginine-containing DRPs and the repeat-containing RNA both contribute to this mode of toxicity.
Although comparatively less toxic than polyGR or polyPR when each is expressed in isolation (Mizielinska et al., 2014; Wen et al., 2014; Freibaum et al., 2015; Jovičić et al., 2015; Yamakawa et al., 2015; Yang et al., 2015), non-arginine-containing DRPs also appear important to neurodegeneration in model systems. Adult flies expressing exclusively GA-100 DRPs within neurons have significantly reduced survival (Mizielinska et al., 2014), and expression of polyGA in primary mammalian neurons causes toxicity through impairment of the ubiquitin-proteasome system (Zhang et al., 2014), induction of ER stress (Zhang et al., 2014), and sequestration of Unc119, a trafficking protein with a GAGASA binding motif (May et al., 2014).
Despite compelling evidence that RAN translation and the resulting DRPs are involved in disease pathogenesis, little is known about the mechanism by which the expanded GGGGCC and GGCCCC repeats trigger DRP production. When placed in a 5’ leader context, RAN translation at GGGGCC/GGCCCC repeats is repeat-length dependent, with more robust DRP production occurring with longer repeats (Mori et al., 2013b; Zu et al., 2013; Su et al., 2014), consistent with observations at CAG and CGG repeats (Zu et al., 2011; Kearse et al., 2016). The repeat-length requirement for initiation also appears to differ between reading frames and between sequence contexts. When GGGGCC and GGCCCC repeats are placed downstream of a synthetic sequence, all DRPs are detected by immunocytochemistry (ICC) with as few as 30 or 40 repeats, respectively (Zu et al., 2013). When GGGGCC repeats are instead placed downstream of 113 nucleotides from intron1, a partially native context (Fig. 3B), production of polyGA occurs similarly with as few as 38 repeats (Mori et al., 2013b). However, within this sequence context, polyGP detection required 66 repeats in one report and 145 in another, while polyGR was not detected within cells expressing constructs containing up to 145 repeats (Mori et al., 2013b; Su et al., 2014). Antisense DRPs also showed different length requirements for detection when placed downstream of native sequence; while polyGP and polyPR are detected in cells expressing 66 GGCCCC repeats downstream of 99 native nucleotides, polyPA is not (Gendron et al., 2013). While these apparent differences in length requirements may reflect artifacts of detection based on antibody avidity or differences in DRP solubility, they could also indicate an inherent discrepancy in RAN translational efficiency across reading frames.
For example, different RNA secondary structures might favor initiation in certain frames at shorter repeats, with promiscuity or frame-shifting becoming more prominent at larger repeat sizes.
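The relationship between reading frame and DRP identity discussed above can be made concrete with a short sketch. The Python below is an illustration only, not an analysis method from any of the cited studies: the function names are hypothetical, and the codon table is deliberately restricted to the six codons that arise from these two repeats. It translates the sense GGGGCC repeat and its antisense reverse complement GGCCCC in all three frames.

```python
# Illustrative sketch: which dipeptide repeat each reading frame encodes.
# Function names are hypothetical; the codon table holds only the standard
# genetic code entries needed for GGGGCC and GGCCCC repeats.

CODON_TABLE = {
    "GGG": "G", "GGC": "G",  # glycine
    "GCC": "A",              # alanine
    "CCG": "P", "CCC": "P",  # proline
    "CGG": "R",              # arginine
}

COMPLEMENT = {"G": "C", "C": "G", "A": "U", "U": "A"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def translate_frames(repeat_unit, n_repeats):
    """Translate a pure repeat tract in frames 0, 1 and 2.

    Only complete codons are translated; no start codon is required,
    mirroring the idea that initiation can occur within the repeat.
    """
    seq = repeat_unit * n_repeats
    products = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        products.append("".join(CODON_TABLE[c] for c in codons))
    return products


if __name__ == "__main__":
    sense = "GGGGCC"
    antisense = reverse_complement(sense)  # "GGCCCC"
    for label, unit in (("sense", sense), ("antisense", antisense)):
        for frame, protein in enumerate(translate_frames(unit, 4)):
            # The first two residues name the repeating dipeptide.
            print(label, unit, "frame", frame, "poly" + protein[:2])
```

Under these assumptions the sense strand yields GA, GP and GR repeats and the antisense strand yields GP, AP and PR repeats (an AP repeat is the same polymer as polyPA), which is why polyGP is the only DRP that can derive from either strand.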
Beyond these initial insights, however, are a series of unanswered questions related to the mechanism of RAN translation at GGGGCC and GGCCCC repeats. First, it remains unknown exactly what RNA species actually undergo RAN translation in C9 repeat expansion patients. The GGGGCC repeat expansion is located within the first intron of C9orf72. Therefore, in patients, RAN-translated GGGGCC repeats could conceivably derive from a retained intron, a spliced intron in a lariat, or within aberrant disease-specific transcripts generated by transcriptional stalling (Fig. 3B). There is some evidence for generation of such aberrant transcripts, at least in vitro (Haeusler et al., 2014). The ratio of exon1a-intron1 (unspliced or abortive) RNA to exon1-exon2 (mature, spliced) RNA (Fig. 3B), however, is not altered in C9 iPSC-derived neurons and patient brain tissue relative to controls, arguing against significant production of truncated transcripts or increased intron retention (Tran et al., 2015). However, a recent study suggests that intron retention occurs with some frequency in both control and C9 patient cells (Niblock et al., 2016), and this may only have pathological consequences when the expanded repeat is present. Therefore, the lack of increased retention does not rule out the possibility that such transcripts undergo RAN translation in C9 patients.
In Drosophila, placement of the repeat into an efficiently spliced intron dramatically reduces both RAN translation and its relative toxicity compared to repeats placed into a 5’ leader sequence, suggesting that spliced lariats containing GGGGCC repeats may not be efficiently utilized to produce DRPs (Tran et al., 2015). However, DRP production from the intronic repeat becomes sufficient to elicit toxicity when Drosophila are grown at elevated temperatures, indicating that an intronic context is able to support pathological RAN translation under certain conditions (Tran et al., 2015). Whether the limited amount of DRPs observed was produced from a spliced or retained intronic repeat is unclear. However, if an intron lariat is the transcript subtype utilized, then some mechanism must exist for it to bypass normal degradation mechanisms, exit to the cytoplasm, and become engaged with translational machinery.
Each of these target transcript possibilities has significant implications for what translational initiation factors would be required and what translational mode would be preferentially utilized. For instance, if an intronic lariat RNA is the substrate of GGGGCC/GGCCCC RAN translation, then this almost by definition rules out a role for cap-dependent, canonical processes and strongly favors mechanisms more in line with internal ribosome entry. Similarly, where initiation occurs in each reading frame and what trans factors are required will likely be highly dependent on the RNA species being studied. Therefore, more direct studies are needed to address these questions. Doing so will help formulate a clearer picture of how RAN translation occurs, and will likely provide new potential targets for therapeutically inhibiting this pathological process.
RAN translational events at both CGG repeats in FXTAS and GGGGCC/GGCCCC repeats in C9orf72 occur within putatively non-coding transcripts or non-coding regions of coding transcripts. However, RAN translation can also occur efficiently at CAG repeats embedded within annotated open reading frames. This was initially suggested by work on the CAG repeat in SCA8 (Zu et al., 2011), where the AUG codon normally resides just proximal to the repeat itself, and more recent data demonstrate that RAN translation also occurs in Huntington disease (Bañez-Coronel et al., 2015).
Huntington disease (HD) is the most common known neurodegenerative repeat expansion disorder, affecting 5.8/100,000 people worldwide (Pringsheim et al., 2012). Huntington disease results from a CAG-repeat expansion in the first coding exon of the Huntingtin gene, HTT. Normal repeat sizes are typically in the 20s, with disease occurring at repeat sizes greater than 35 and 100% penetrance at repeat sizes of 40 or greater (Bean and Bayrak-Toydemir, 2014). The repeat begins 51 nucleotides (17 amino acids) 3’ to the AUG initiation codon that demarcates the very large (3144 amino acids) annotated ORF for the full-length huntingtin protein (HTT). The CAG repeat within this annotated ORF codes for a polyglutamine stretch that can be released as a smaller peptide by either alternative splicing or protease cleavage. A large body of evidence suggests that large (usually greater than 60) CAG repeats in isolation or in the context of HTT exon 1 are sufficient to elicit toxicity in mouse and fly models of disease (Mangiarini et al., 1996; Jackson et al., 1998; Schilling et al., 1999). However, evidence also suggests that the native functions of the huntingtin protein contribute to phenotypic and molecular findings observed in Huntington disease patients and some mouse models of HD (see review by Saudou et al., 2016).
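As a small worked example of the numbers quoted above, the sketch below bins an HTT CAG repeat count using only the two thresholds stated in the text (greater than 35 repeats for disease, 40 or greater for 100% penetrance) and checks that a 51-nucleotide offset corresponds to a whole number of codons. The function names and category labels are hypothetical, chosen for illustration, and are not clinical reporting categories.

```python
# Illustrative sketch using only the thresholds quoted in the text.
# Function names and category labels are hypothetical.

def classify_htt_cag(n_repeats):
    """Bin an HTT CAG repeat count by the thresholds given in the text."""
    if n_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if n_repeats >= 40:
        return "fully penetrant"          # 100% penetrance at 40 or greater
    if n_repeats > 35:
        return "above disease threshold"  # greater than 35 but below 40
    return "below disease threshold"      # includes typical sizes in the 20s


def offset_in_codons(nt_offset=51):
    """Return (whole codons, leftover nucleotides) for a nucleotide offset."""
    return divmod(nt_offset, 3)


if __name__ == "__main__":
    for n in (22, 35, 36, 40, 60):
        print(n, classify_htt_cag(n))
    # 51 nucleotides divide evenly into 17 codons, so the repeat lies in
    # the same reading frame as the annotated AUG.
    print(offset_in_codons(51))
```

The zero remainder is the relevant point for what follows: the glutamine frame of the repeat is continuous with the annotated ORF, whereas the serine and alanine frames are not.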
Huntington disease shows significant genetic anticipation, with serial repeat expansions over generations leading to an earlier onset of disease and a more severe phenotype, with early dementia, dystonia and parkinsonism in addition to or in place of the cardinal features of chorea and psychosis seen in the later onset form of the disease (Hansotia et al., 1968; Gonzalez-Alegre and Afifi, 2006). While long thought to represent the impact of larger polyglutamine expansions, the qualitatively different phenotype observed in patients could reflect alternative mechanisms of pathophysiology that only occur at larger repeat sizes rather than a simple additive effect of more polyglutamine (Williams and Paulson, 2008).
Against this background, Bañez-Coronel and colleagues (2015) provided compelling evidence that RAN translation products are generated from the CAG repeat in Huntington disease. Using antibodies generated against the predicted C-terminal regions that would be produced from proteins initiating in the AGC (serine) and GCA (alanine) reading frames, they observed significant staining in both striatal and cerebellar tissue from Huntington disease patient brains and in a mouse model of Huntington disease. They also identified RAN-translated proteins arising from an antisense strand generated through the repeat in the CUG orientation, producing polyleucine and polycysteine products. In the sense strand, a cutoff for detection of the serine product by ICC was observed in cells at 35 repeats and for alanine at 45 repeats, both around the threshold for pathogenicity (Bañez-Coronel et al., 2015). Intriguingly, the presence of an AUG in the polyglutamine frame appeared to have little impact on RAN translation in neighboring frames from CMV-driven, plasmid-derived transcripts (Bañez-Coronel et al., 2015). No RAN translation was observed from constructs with CAA repeats or when the codon utilized to make a homopolymeric protein was varied throughout the sequence, suggesting that the repeat sequence itself and/or its secondary structure is important for RAN initiation (Bañez-Coronel et al., 2015).
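The logic of the CAA control can be illustrated with a brief sketch. The code below is an illustration under stated assumptions rather than a reconstruction of the published constructs: the function name is hypothetical and the codon table lists only the standard genetic code entries needed for CAG, CAA and CUG repeats. It reports the homopolymer encoded by each reading frame of a trinucleotide repeat.

```python
# Illustrative sketch: homopolymer encoded by each frame of a trinucleotide
# repeat. The function name is hypothetical; the codon table is a subset of
# the standard genetic code.

CODONS = {
    "CAG": "Q", "AGC": "S", "GCA": "A",  # frames of a CAG repeat
    "CAA": "Q", "AAC": "N", "ACA": "T",  # frames of a CAA repeat
    "CUG": "L", "UGC": "C", "GCU": "A",  # frames of the antisense CUG repeat
}


def homopolymers_by_frame(unit):
    """Return the amino acid encoded in frames 0, 1 and 2 of a repeat."""
    seq = unit * 3  # enough copies for every frame to hold a full codon
    return [CODONS[seq[frame:frame + 3]] for frame in range(3)]


if __name__ == "__main__":
    for unit in ("CAG", "CAA", "CUG"):
        print(unit, homopolymers_by_frame(unit))
```

In this sketch CAG and CAA repeats both encode glutamine in frame 0, so the two tracts specify the same polyglutamine protein while differing in RNA sequence, which is what allows a CAA construct to separate the contribution of the repeat RNA from that of the encoded protein. The CUG frames give leucine, cysteine and alanine, in line with the polyleucine and polycysteine antisense products described above.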
How might RAN translation from expanded Huntingtin transcripts be mediated mechanistically? For the most part, this question remains unaddressed. However, the findings at the HTT locus are consistent with previous work on CAG RAN translation in Spinocerebellar Ataxia type 8 (SCA8) (Zu et al., 2011). In SCA8, an AUG initiation codon in ATXN8 resides just one codon above the repeat in the normal sequence context. Mutation of that start codon to AAG failed to prevent translation of the polyglutamine-containing protein (Zu et al., 2011). Placement of a stop codon immediately upstream of the repeat or insertion of an AUG and V5 tag above the repeat in the glutamine reading frame failed to block RAN translation of alanine- and serine-containing products (Zu et al., 2011). As in the HD sequence context, changing the CAG repeat to CAA in the context of the SCA8 leader sequence precluded RAN translation in all reading frames (Zu et al., 2011).
In SCA8, the exact sequence located upstream of the repeat and the cellular context influence RAN translation at CAG repeats in at least some reading frames. For example, placement of a stop codon immediately above the repeat in the polyglutamine frame of the ATXN8 sequence reduced RAN translation of a polyglutamine product to below levels detectable by western blot in transfected HEK293 cells (Zu et al., 2011). Zu et al. (2011) also varied the 20 nucleotides immediately upstream of a CAG repeat to match the upstream sequences of other repeat expansion disorders. Lentiviral delivery of constructs with the Spinocerebellar Ataxia type 3 (SCA3) sequence placed upstream of a CAG repeat in HEK293 cells or mouse brains exhibited no detectable polyglutamine RAN production. In contrast, inclusion of the upstream sequence from another CAG repeat expansion disorder, Huntington Disease Like 2 (HDL2), supported RAN polyglutamine translation in both of these settings (Zu et al., 2011). Polyalanine expression, however, was robustly maintained in both sequence contexts (Zu et al., 2011). Attempts to recapitulate CAG RAN translation in an in vitro rabbit reticulocyte lysate system revealed a complete loss of polyalanine production regardless of the upstream sequence (Zu et al., 2011). Moreover, in vitro translation of polyglutamine or polyserine products from these constructs required near-AUG start codons (Zu et al., 2011), akin to that reported for RAN translation at CGG repeats (Kearse et al., 2016). However, unlike the case for FMRpolyG translation from CGG repeats, this requirement for a near-AUG codon was lost in CAG repeat transfected cells. Thus, across a number of disease-specific sequence contexts, the upstream sequence, repeat length, and cellular context all appear to impact RAN translation efficiency at CAG repeats.
While there are hints that similar sequence dependent effects may be present in Huntington’s Disease, future studies using quantitative methodologies will be needed to address this question empirically.
A related question is whether methionine forms the N-terminus of RAN translation products. In canonical translation, tRNAMet is already bound to the 40S ribosome throughout initiation (Fig. 1; Step 2). Initiation at non-AUG codons predominantly utilizes an N-terminal methionine in vitro (Peabody, 1989). If initiation of RAN translation occurs similarly, this would suggest that methionine remains the first amino acid of RAN products. The current published evidence on this question, however, is mixed. For the CAG repeat in ATXN8, data from in vitro rabbit reticulocyte lysate assays demonstrate incorporation of methionine at the N-terminus when a near-AUG codon is included above the repeat in the glutamine reading frame (Zu et al., 2011). In contrast, in the alanine reading frame, mass spectrometry identified a series of peptides with varying lengths of N-terminal polyalanine in transfected HEK293 cells, suggesting initiation within the repeat itself (Zu et al., 2011). Whether the N-terminal amino acid utilized at initiation was alanine (which is utilized in IRES-based translation by the CrPV; Fig. 2C) or methionine remains unclear, however, as N-terminal methionines are often proteolytically removed and replaced by an acetyl group (Giglione et al., 2004).
One further issue is whether frameshifting may also contribute to toxicity and inclusion burden in Huntington’s disease (Wojciechowska et al., 2014). Frameshifting from the polyglutamine reading frame into the polyalanine or polyserine reading frames has been suggested for a number of years as a potential mechanism for aspects of neurotoxicity in animal models of SCA3 and Huntington disease (Toulouse et al., 2005; Davies and Rubinsztein, 2006; Stochmanski et al., 2012; Girstmair et al., 2013). It is important to note that most of this work on frameshifting was conducted prior to the discovery and description of RAN translation, which makes its interpretation difficult. However, frameshifting does occur. One recent report published after the original description of RAN translation used both mass spectrometry and 2D electrophoresis to demonstrate that at least some of the HTT polyalanine product results from -1 frameshifting (Girstmair et al., 2013). In contrast, no frameshifting products were observed by Bañez-Coronel and colleagues when they used a series of amino- and carboxy-terminally tagged AUG-driven constructs to mimic exon 1 of Huntington disease in transfected cells (Bañez-Coronel et al., 2015). The discrepancy between these two results is potentially important, because the detection method utilized for histochemical staining of sense strand derived RAN translation products in Huntington’s disease (carboxy-terminal antibodies directed against the predicted amino acid sequence below the repeats) cannot distinguish between proteins generated by RAN translation and those generated by frameshifting (Bañez-Coronel et al., 2015). Thus, the relative contribution of these two processes (which are not mutually exclusive) to disease pathogenesis awaits further analysis.
RAN translation represents a new and provocative mechanism by which protein translation can occur in the setting of nucleotide repeat expansions to produce a novel set of toxic proteins. However, at this early stage in our understanding of this process, there are many more questions than answers. This review has tried to take the limited mechanistic data generated to date and place it into the context of known canonical and non-canonical translation initiation processes. These different modes of translation provide a framework for which questions are of greatest importance. By determining the cap-dependency, the requirement for linear and continuous 5’-to-3’ scanning, and the N-terminal amino acid used during RAN translation, we will be able to take advantage of previous work on processes with similar biology. Such an approach can also narrow down and prioritize which of a myriad of potential trans factors should be studied and guide strategies for interventions that might selectively preclude RAN translation.
One important question going forward is whether the mechanisms underlying RAN translation are the same or different across repeat types, reading frames and sequence contexts. Some discrepancies suggest that different mechanisms may be in play. For example, data on RAN translation at CGG repeats thus far are most consistent with a scanning mechanism and use of a near-AUG codon for initiation just 5’ to the repeat (Todd et al., 2013; Kearse et al., 2016), but this would seem unlikely as a mechanism to explain initiation within an open reading frame, as occurs in Huntington disease (Bañez-Coronel et al., 2015). In this fashion RAN translation may be analogous to the situation with viral IRES RNA elements, which display a significant variance in both sequence and initiation factor requirements to achieve the same goal of bypassing cap-dependent ribosomal loading. We may need to begin thinking of these processes as RAN Translations rather than as a single entity. However, only after careful identification of the key factors required for RAN translation can this delineation really be made across different repeat and disease contexts.
Along these same lines, it is important to recognize that aspects of what we currently observe in RAN translation may be part of a larger set of mechanisms that allow for translation initiation in the absence of an AUG start codon, a process that may be much more common than previously thought. Ribosome profiling datasets suggest significant non- and near-AUG initiated translation throughout the transcriptome (Ingolia et al., 2011). Thus, RAN translation may reflect an aberrancy of normal non-canonical initiation processes, one that produces toxic proteins from processes that otherwise have valuable functions in other settings. Defining these normal functions and their roles in neuronal biology will be critical if RAN translation is to serve as a therapeutic target.
Lastly, it is worth noting that the novelty of RAN translation may well prove to be its greatest value, both in revealing interesting biology and in providing a particularly good target for therapy development. If this process proves important to neurodegeneration, as current data would support, then identification of factors that are selectively critical for RAN translation but not canonical translation may offer a real treatment opportunity going forward.
We thank members of the Todd lab for fruitful suggestions and discussions. KMG and AEL were supported by NIH T-32-GM007315. PKT was supported by VAMC BLRD 1I21BX001841 and 1I01BX001689 and NIH R01NS086810.