The rugged nature of the RNA structural free energy landscape allows cellular RNAs to respond to environmental conditions or fluctuating levels of effector molecules by undergoing dynamic conformational changes that switch on or off activities such as catalysis, transcription or translation. Infectious RNAs must also temporally control incompatible activities and rapidly complete their life cycle before being targeted by cellular defenses. Viral genomic RNAs must switch between translation and replication, and untranslated subviral RNAs must control other activities such as RNA editing or self-cleavage. Unlike well-characterized riboswitches in cellular RNAs, the control of infectious RNA activities by altering the configuration of functional RNA domains has only recently been recognized. In this review, we will present some of these molecular rearrangements found in RNA viruses, viroids and virus-associated RNAs, relating how these dynamic regions were discovered, the activities that might be regulated, and what factors or conditions might cause a switch between conformations.
To achieve a thermodynamically stable conformation, RNA maximizes base pairing and base stacking by folding into local secondary structures such as hairpins followed by long range interactions between remaining accessible sequences that pack the molecule into a tighter, globular configuration (1–3). Intricate and distinctive three-dimensional structural domains composed of a finite set of structural motifs (4) provide pockets and platforms for interactions between RNA and small metabolites, proteins or other nucleic acids. The rugged nature of the RNA structural free energy landscape allows RNA to act as a sensor, which can respond to increasing temperature or fluctuating levels of effector molecules by undergoing dynamic conformational changes that switch on or off activities such as catalysis, transcription or translation (5–8). Many such cellular riboswitches have been well characterized, are ubiquitous in prokaryotes, and have been recently found in lower and higher eukaryotes (9–12).
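The idea that folding maximizes base pairing can be made concrete with a toy dynamic program. The sketch below is a Nussinov-style base-pair maximization and is not one of the folding programs cited in this review: it counts nested pairs only and ignores stacking energies, loop penalties and tertiary interactions. The function name, the set of allowed pairs (Watson-Crick plus G-U) and the minimum hairpin loop of 3 nt are assumptions chosen for illustration.

```python
# Toy Nussinov-style base-pair maximization (illustrative only).
# Assumption: Watson-Crick and G-U wobble pairs are the only allowed pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs that seq can form.

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (an assumed value, used here to forbid impossibly tight hairpins).
    """
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the maximum pair count for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j stays unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some base k, leaving >= min_loop bases between.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    # A short hairpin-forming sequence made up for this example.
    print(max_base_pairs("GGGAAACCC"))
```

Counting pairs is only a crude stand-in for free energy, which is one reason such a calculation cannot reveal the metastable alternatives discussed in the rest of this review.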
RNA viruses that enter a host cell must complete a variety of processes in a limited time-span to amplify and repackage their genomes before being targeted by cellular defenses. Upon cell entry, positive-strand RNA genomes must unpack from virions and be recognized by the translational machinery to produce the RNA-dependent RNA polymerase (RdRp) and other proteins. The same RNA genome must then be transcribed into complementary minus-strands followed by synthesis of progeny plus-strands. Further translation of the initial or progeny plus-strands may be needed to produce additional products such as structural proteins, followed by packaging progeny into virions and finally egress from the cell (13, 14). A number of these steps require that the viral RNA switch between activities that are mutually exclusive. For example, the initial infecting RNA genome must regulate translation by sensing when sufficient supplies of RdRp have been synthesized so that translation can be restricted and complementary strand synthesis can commence. Another transition shuts down minus-strand synthesis, which allows cellular and viral resources to concentrate exclusively on generation of progeny plus-strands. Additional transitions may also specify transit of plus- and/or minus-strand templates to membrane stacks or invaginations where replication takes place (15). Lastly, some models for replication of RNA genomes suggest that nascent plus-strands may be limited templates for further minus-strand synthesis (the “stamping hypothesis”; 16), which if correct would imply that newly synthesized plus-strands assume a conformation that restricts access of the RdRp to promoter elements or the RNA’s 3′ end.
Despite obvious requirements for RNA viruses to sense when conditions during the infection process dictate a switch between activities, little is understood about how viral RNA genomes regulate such transitions. Accumulating evidence for the role of dynamic conformational changes in regulating the function of cellular RNAs strongly suggests that viral RNAs also use structural plasticity to regulate transitions between alternative processes. Identifying regions in a viral RNA that adopt multiple, functional conformations, however, is a daunting undertaking fraught with potential problems. The propensity of up to 96% of bases in an RNA molecule to form either canonical (i.e., Watson-Crick) or non-canonical basepairs using all three edges of the nitrogenous base (17, 18), coupled with the inability of most computational structure folding programs to predict short and/or long range tertiary interactions (19, 20), greatly complicates prediction of RNA tertiary structure from the RNA’s primary structure. The accuracy of structural information provided by widely used “wet lab” methods, such as biochemical structure probing, is highly dependent on whether the prepared RNA has adopted a form that is biologically relevant. RNA has a natural propensity to misfold, becoming trapped in unproductive metastable conformations (1). Resolving in vitro transcribed RNA into its lowest free energy form is possible using tricks of the trade, such as different ionic concentrations and heating followed by slow or snap cooling. This form, however, may not be the natural configuration of the RNA following transcription in a cellular environment. Folding of RNA occurs co-transcriptionally, with the slow rate of RdRp synthesis allowing secondary structures to form sequentially as transcription proceeds (21, 22). 5′ structures therefore form first but some are sufficiently dynamic to allow for the disassembly and reassembly of newly transcribed sequences, a process that can require participation by proteins that function as RNA chaperones (23). Assembly of RNA into a functional configuration may also depend on natural, strategically placed transcription pause sites that may temporarily or irreversibly reduce the rate of transcription (21, 22), allowing formation of local stable structures that are transiently not influenced by nearby downstream sequences (24). All of these factors can allow RNA to assume an initial, metastable functional state, which may not be discernible following heating and cooling and other unnatural manipulations of fully formed RNA transcripts that lead to adoption of more stable configurations.
Despite these problems in experimental and computational design and tools, recent studies on a few diverse viruses and other infectious RNAs have revealed the existence of overlapping stable and metastable structures that are required for critical functions. Many such structures were found fortuitously during genetic or biochemical structure analyses because they contained portions of other known elements with which they would not be structurally compatible. Alternative, metastable structures have also been revealed by computational approaches using programs such as MPGAfold (25), which gives insights into secondary structure dynamics by resolving biologically relevant, intermediate secondary structures during a process that evolves the RNAs to their most stable form. In this review, we will present some of these alternative configurations in RNA structure that are located in RNA viruses, viroids, and virus-associated RNAs, relating how these dynamic regions were discovered, the activities that might be regulated, and what factors or conditions might cause a switch between conformations. This field is still in its infancy, however, and the reader should bear in mind that these examples likely represent only the tip of the iceberg.
Phytopathogenic viroids are single-stranded, non-coding, circular RNAs with genome lengths of 250 to 400 nt making them the simplest RNA infectious agent (26). The 30 known species are divided into two families: the Avsunviroidae have branched lowest free energy forms, replicate in chloroplasts using the phage-like chloroplast DNA-dependent RNA polymerase and self-process multimeric replication products to their mature monomeric circular form. In contrast, members of the Pospiviroidae assume an unbranched, rod-shaped structure stabilized by a high degree of intramolecular base-pairing as their most stable form, and lack ribozyme activity. The Pospiviroidae use DNA-dependent RNA polymerase II to replicate in the nucleus by a rolling circle mechanism that produces multimeric complementary strands that are templates for the synthesis of multimeric infectious strands (26). The multimeric replicative forms are processed by as yet unidentified type III RNase into monomers with 5′-phosphomonoester and 3′ hydroxyl ends (27). The processed ends are then ligated by an unidentified T4 RNA ligase-type enzyme to produce the mature circular form, which transits between cells in the absence of a protective capsid.
Members of the Pospiviroidae share high homology in the central portion of their rod-shaped structure, which is known as the central conserved region (CCR). Within the CCR of the rod-shaped structure is an interior loop with high sequence and structural similarity to 5S rRNA loop E that is highly susceptible to inter-strand cross-linking. Processing of multimeric infectious strands occurs within the CCR, one base pair away from the loop E-type element (28). Reisner and colleagues discovered that the form of Pospiviroidae member Potato spindle tuber viroid (PSTVd) that is processed by nuclear extracts in vitro is not susceptible to cross-linking and thus lacks the loop E element, suggesting that the highly stable rod-shaped structure is not active for viroid processing (29). The structure that was a substrate for processing was metastable, forming when in vitro synthesized RNA was heated and rapidly renatured (snap-cooled) in a low ionic strength buffer (30). Biochemical structure mapping of an actively processed form of PSTVd with a 17 nt duplication of the CCR suggested that the alternative, metastable conformation contains a hairpin capped by a GNRA tetraloop that is conserved in many members of the Pospiviroidae. A model was proposed suggesting that synthesis of viroid multimers during replication in the nucleus allows assumption of the metastable form that is stabilized by nuclear proteins. The first cleavage needed to process the multimer and release monomers was proposed to cause a conformational transition to the more stable loop E-containing rod-shaped form, which was thought to be the substrate for the second cleavage and final ligation events (28). A miniature version of PSTVd (148 nt) containing the CCR, a 17 nt CCR duplication, and short flanking regions capped by tetraloops was also correctly processed in nuclear extracts, indicating that the remainder of the viroid is not required for processing (31).
A recent study re-evaluated the conformation of the actively processed metastable form. Gas et al. (32) used transgenic Arabidopsis thaliana plants expressing dimer strands of members of three different genera of Pospiviroidae, Citrus exocortis viroid (CEVd), Hop stunt viroid (HSVd), and Apple scar skin viroid (ASSVd), to more accurately reproduce processing events within a host. The in vivo mapped processing sites were equivalent to the site mapped in PSTVd in vitro, but mutagenesis suggested that the tetraloop-capped hairpin either did not form or was not important for viroid maturation. Instead, maintenance of a metastable structure originally proposed by Diener (33) containing a double-stranded region joining branched monomeric units was critical for processing to the mature single viroid circular form (Fig. 1). This metastable structure is phylogenetically conserved throughout the Pospiviroidae.
The double-stranded metastable dimer structure was also predicted to represent a significant intermediate by the sophisticated RNA structure folding program, MPGAfold (25, 34). MPGAfold is designed to mimic co-transcriptional RNA folding and includes parameters that evolve a sequence towards structural fitness (i.e., assuming a structure with the lowest free energy) by facilitating stem-chain growth from a nucleation point. The program functions in parallel on populations representing thousands of possible RNA structures and can predict H-type pseudoknots. Populations of RNA assume distinct, transiently stable conformations representing possible functional metastable structures that can be visualized during the process. The Stem trace component of STRUCTURELAB also allows visualization of identical substructures that form within the population (35). MPGAfold predicted that the CCR of PSTVd dimers transiently assumes the double-stranded structure with the remainder of the molecule assuming first a metastable branched configuration and then the stable rod-shaped structure. MPGAfold also predicted that PSTVd circular monomers form a branched intermediate structure that fits the biochemical mapping data (36).
A second metastable structure in PSTVd that forms between positions 227–237 and 318–328 has also been reported (37). Mutations that reduce the stability of the structure revert to wild-type, and the structure can be detected in vitro and in vivo. The function of this alternative structure in the viroid life cycle is not yet known.
A question that arises is why members of the Pospiviroidae evolved metastable and stable RNA configurations to complete activities required for replication and maturation. The stable rod-shaped structure is likely more resistant to cellular RNases and extracellular conditions encountered during capsid-free intercellular transit. This form, however, may not have the structural landscape required for specific interactions with cellular enzymes needed to complete the replication cycle. By incorporating the ability to shift between structures, viroids in the Pospiviroidae have the capacity to infect in the absence of a need to generate protective structural proteins, which may have enhanced the rate of systemic spread within a host.
Hepatitis delta virus (HDV) is a 1.6 kb subviral human pathogen with three distinct genotypes that can increase the severity of liver disease in people infected with its helper virus, hepatitis B virus (38). HDV shares selected similarities with viroids by also having a circular single-stranded RNA genome that adopts an unbranched rod-shaped secondary structure as its most stable configuration. In addition, HDV replicates in the nucleus by a rolling circle mechanism using host DNA-dependent RNA polymerase II, and multimeric forms of both the infectious genome and complement antigenome are cleaved autocatalytically by self-encoded ribozymes that require pseudoknot structures not found in the rod-like form (39–42). Since the conformational switch that controls HDV ribozyme activity has been recently reviewed (43, 44), it will not be covered here.
Unlike viroids, HDV encodes a protein known as the HD antigen (HDAg) that is translated from the antigenome (42). Sequencing of HDV revealed an intriguing heterogeneity at position 1012 (the amber/W site) that is generated during HDV infection (45, 46). An adenosine at this location produces an amber stop codon, and translation termination leads to production of the short form of HDAg, S-HDAg. A guanylate at position 1012 results in a tryptophan codon, extending HDAg by an additional 19 amino acids and producing the long form of the protein (L-HDAg; 46). Both S-HDAg and L-HDAg are critical for HDV infection, with S-HDAg required for replication of HDV RNA and L-HDAg limiting replication and initiating assembly of HDV particles (47).
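The effect of the nucleotide identity at the amber/W site can be illustrated with a two-entry codon lookup. This is a minimal sketch: the amber stop codon is UAG and the tryptophan codon is UGG in the standard genetic code, but the function name and the tiny lookup table are assumptions made for illustration, and the genomic context of position 1012 is not modeled.

```python
# Minimal codon lookup covering only the two codons discussed in the text.
CODON_MEANING = {"UAG": "stop (amber)", "UGG": "Trp (W)"}


def edit_amber_w(codon):
    """Replace the middle adenosine of the amber codon with guanosine.

    Hypothetical helper: it models only the net A-to-G change at the
    amber/W site, not the mechanism that produces it.
    """
    if codon != "UAG":
        raise ValueError("expected the amber codon UAG")
    return codon[0] + "G" + codon[2]


if __name__ == "__main__":
    unedited = "UAG"
    edited = edit_amber_w(unedited)
    # The unedited codon terminates translation (S-HDAg); the edited codon
    # is read as tryptophan, so translation continues (L-HDAg).
    print(unedited, "->", CODON_MEANING[unedited])
    print(edited, "->", CODON_MEANING[edited])
```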
Conversion of the adenosine at the amber/W site to a guanosine is mediated by a process known as RNA editing (48). Deamination of the adenine base in the antigenome is directed by a cellular adenosine deaminase, ADAR1, producing an inosine that leads to nucleotide misincorporation during further HDV replication (49). Controlling the level and timing of edited transcripts is critical, as too much editing limits the accumulation of S-HDAg, which severely restricts replication, while insufficient editing reduces L-HDAg levels, impacting packaging of HDV and transmission within the host (47, 50).
Editing by ADAR1 requires at least 6 contiguous base-pairs around the editing site, with the target adenosine positioned as either an AU pair or AC mismatch (49, 51). While an appropriate base-paired region exists and is a substrate for editing in the rod-shaped form of HDV genotype I (48), sequence differences between genotype members cause disrupted base-pairing in the vicinity of the amber/W site in genotype III (Fig. 2; 50). Using either full length or miniature versions of HDV genotype III, editing was determined to require an alternative, metastable double-hairpin branched structure with 80 nt structurally rearranged compared with the more stable rod form (50, 52). The metastable structure contains two extensive hairpins linked by a central base-paired stem, with the edited adenylate at the base of one hairpin within the linked region. The alternative structure only forms during replication as a minor portion of a population that is dominated by the more stable rod-shaped form (53), thus limiting the extent of editing.
With the balance of S-HDAg to L-HDAg critical for maintaining and propagating HDV, the question arises of how similar editing efficiencies are maintained when only one genotype requires formation of an alternative configuration. For genotype I, which requires the rod-shaped structure for editing (48), enhancing the stability of the rod-shaped structure in a region near the edited site increased the efficiency of editing, suggesting that the editing site in genotype I is in a suboptimal configuration (53, 54). In addition, S-HDAg binds to HDV genotype I RNA, which inhibits editing at the amber/W site and prevents rapid accumulation of edited RNA early in the replication cycle (47). In contrast, editing of HDV genotype III is insensitive to the level of S-HDAg but is affected by binding of L-HDAg (55). This results in feedback inhibition by the editing product, which decreases the availability of edited transcripts.
Recently, two genotype III HDV isolates were compared that differed in the efficiency of branched structure editing (56). The Peruvian isolate edited 3 times more efficiently in vitro than the Ecuadorian isolate, while in vivo, the opposite was true. MPGAfold revealed that differences in their free energy folding landscapes affected relative abilities to form the active branched structure. The Peruvian HDV was significantly less likely to form the productive branched structure due to enhanced stability of the rod-shaped structure and was also more likely to adopt unproductive alternative branched structures. Synthesis of transcripts by T7 RNA polymerase under conditions that reduced the rate of transcription to more accurately simulate the rate of HDV transcription in cells confirmed the population structural predictions by finding more branched structures in the Ecuadorian isolate population. These results suggested that levels of edited genotype III HDV are controlled by both the fraction of RNA that assumes the productive metastable branched structure and the efficiency with which ADAR1 edits the branched RNA (56).
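One simple way to read this conclusion is as a product of two factors. The sketch below assumes that only the branched form is an editing substrate and that the two factors are independent; the function name and all numbers are hypothetical and are not taken from reference 56.

```python
def edited_fraction(branched_fraction, editing_efficiency):
    """Fraction of transcripts edited, assuming only the branched form is edited.

    Both arguments are fractions between 0 and 1. Treating them as
    independent multiplicative factors is an illustrative assumption.
    """
    for name, value in (("branched_fraction", branched_fraction),
                        ("editing_efficiency", editing_efficiency)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return branched_fraction * editing_efficiency


if __name__ == "__main__":
    # Hypothetical isolates: A edits the branched RNA three times more
    # efficiently, but B forms the branched structure four times more often.
    isolate_a = edited_fraction(branched_fraction=0.05, editing_efficiency=0.6)
    isolate_b = edited_fraction(branched_fraction=0.20, editing_efficiency=0.2)
    print("A edits more than B:", isolate_a > isolate_b)
```

With these made-up values, the isolate whose branched RNA is the better ADAR1 substrate still yields fewer edited transcripts overall, which is the kind of reversal that a difference in folding landscapes could produce.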
Members of the coronavirus family in the order Nidovirales are divided into three groups and have single-stranded RNA genomes of 27–31 kb (57). Coronavirus replication takes place in the cytoplasm, producing progeny full length plus-strands from minus-sense intermediates together with a nested set of 3′ co-terminal subgenomic (sg)RNAs that are translated to produce most of the virus-encoded products (58–60). SgRNAs, which contain identical 5′ leader sequences derived from the 5′ end of the genome connected to different lengths of 3′ sequences, are generated by premature termination of transcription at specific locations during minus-strand synthesis, followed by reinitiation of synthesis to include the 5′ leader (59, 60). Translation of the genomic RNA that includes a ribosomal frameshifting event produces two polyproteins that are extensively processed to intermediate and final forms (58–60).
The 3′ UTR of coronaviruses ranges from about 270 to 500 nt and is followed by a poly(A) tail. Using a thermodynamically-based computational approach that predicts sequential stem-loop structures, two hairpins were identified in the 3′ UTR of group 2 Bovine Coronavirus (BCoV) where the loop of one hairpin formed the stem of the upstream hairpin (61). This pseudoknot signature was conserved in group 1 and 2 coronaviruses, and mutations that disrupted the pseudoknot reduced replication of a BCoV-derived defective interfering RNA (DI RNA) replicon (61). Compensatory mutations between the pseudoknot partner residues restored infectivity, although some of the replicating DI RNAs restored the wild-type pseudoknot sequence through recombination with the helper BCoV genomic RNA. This suggested a need to maintain the original sequence and not just the structure.
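In a list of base pairs, a pseudoknot of this kind, in which loop residues of one hairpin pair with sequence outside that hairpin, appears as crossing pairs: pairs (i, j) and (k, l) with i < k < j < l. The sketch below checks for that pattern. The function names and the example coordinates are hypothetical and do not correspond to BCoV positions.

```python
def crossing_pairs(pairs):
    """Return every combination of base pairs (i, j), (k, l) with i < k < j < l."""
    # Normalize each pair so the 5' position comes first, then sort by it.
    ordered = sorted((min(a, b), max(a, b)) for a, b in pairs)
    found = []
    for idx, (i, j) in enumerate(ordered):
        for k, l in ordered[idx + 1:]:
            if i < k < j < l:
                found.append(((i, j), (k, l)))
    return found


def is_pseudoknotted(pairs):
    """True if the structure contains at least one crossing pair of base pairs."""
    return bool(crossing_pairs(pairs))


if __name__ == "__main__":
    # Hypothetical coordinates: a plain hairpin with three nested pairs.
    hairpin = [(1, 10), (2, 9), (3, 8)]
    # The same kind of hairpin whose loop bases 5 and 6 also pair downstream.
    pseudoknot = [(1, 10), (2, 9), (5, 15), (6, 14)]
    print(is_pseudoknotted(hairpin), is_pseudoknotted(pseudoknot))
```

Crossing pairs are exactly what nested-structure prediction methods exclude, which is consistent with the difficulty of predicting such interactions noted earlier in this review.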
Biochemical structure probing was not consistent with formation of the pseudoknot in transcripts synthesized in vitro (61). Although the hairpin was well supported, the loop of the hairpin was highly susceptible to single-stranded specific enzymes, unlike upstream partner residues that were in a double-stranded or stacked configuration. These upstream bases were determined by others to form the lower stem of an essential bulged stem loop structure located just downstream of the nucleocapsid gene stop codon in all group 2 and 3 coronaviruses and the distantly related group 2 member SARS (Fig. 3; 62–64). The mutually exclusive pseudoknot and bulged stem loop structures were both critical for virus accumulation (65). Only mutations in the lower stem of the bulged loop structure that maintained both the lower stem of the hairpin and the downstream pseudoknot were tolerated. These compensatory alterations, however, produced a small plaque phenotype, suggesting that specific residues impact the correct adoption or timing of the alternative structures during the virus life cycle.
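Two structures are mutually exclusive in this sense when they require the same nucleotide to pair with different partners. The following sketch reports such positions for two base-pair lists. The function name and the coordinates are hypothetical and are not taken from any coronavirus sequence.

```python
def conflicting_positions(structure_a, structure_b):
    """Return positions that are paired in both structures but to different partners."""

    def partner_map(pairs):
        # Map every paired position to its partner, in both directions.
        partners = {}
        for i, j in pairs:
            partners[i] = j
            partners[j] = i
        return partners

    a = partner_map(structure_a)
    b = partner_map(structure_b)
    return sorted(pos for pos in a.keys() & b.keys() if a[pos] != b[pos])


if __name__ == "__main__":
    # Hypothetical example: positions 29 and 30 form the lower stem of one
    # element in structure A and part of a pseudoknot stem in structure B.
    bulged_stem_loop = [(10, 30), (11, 29)]
    pseudoknot_stem = [(29, 50), (30, 49)]
    print(conflicting_positions(bulged_stem_loop, pseudoknot_stem))
```

A non-empty result means the two elements cannot exist in the same molecule at the same time, so any function that needs both implies a switch between conformations.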
Using an unstable insertion in the large loop of the MHV pseudoknot (L1 in Fig. 3), two classes of second site mutations accumulated in the replicating population (66). One class of alterations was located in the viral-encoded nsp8 and nsp9 ORFs, providing evidence for a possible interaction between these proteins and the region of the molecular switch (although an RNA:RNA interaction was not ruled out). Both nsp8 and nsp9 are RNA-binding proteins, and nsp8 has RdRp activity, but is not the principal viral polymerase (59). Second site changes were also found at a residue in a phylogenetically conserved sequence just upstream of the poly(A) tail, suggesting that pseudoknot function or adoption may necessitate interaction with sequences near the 3′ end (Fig. 3; 66). This putative interaction was phylogenetically conserved, and supported by finding that virus viability did not depend on the presence of the hypervariable region between the pseudoknot and the 3′ end sequence.
A model was presented where the initial structure of MHV contains the bulged stem-loop, the stem loop of the pseudoknot structure and the 3′ end-loop 1 interaction (66). The authors proposed that viral proteins including nsp8 and nsp9 bind to these elements causing a conformational shift that releases the 3′ end-loop 1 interaction and disrupts base-pairing in the lower stem of the bulged stem structure, causing formation of the pseudoknot (Fig. 3). This alternative conformation is proposed to contain the proper structures and proteins for attracting the viral RdRp and associated factors to the template allowing minus-strand synthesis to proceed. Assays that only measure initiation of minus-strand synthesis do not require that the virus contain the region with the bulged stem-loop/pseudoknot structure (67), since the replication-active alternative structure should form in the absence of the controlling region. Only group 2 coronaviruses contain both the bulge stem-loop and pseudoknot structures, suggesting that group 1 and 3 viruses must have developed alternative means to regulate the same function. Although host proteins have been found that interact with the coronavirus 3′ UTR, none of these proteins target the region of the switch (60).
A molecular switch has also been discovered in the arterivirus family of the Nidovirales. Equine arteritis virus (EAV) contains two stem-loop structures (SL4 and SL5) that control initiation of minus-strand synthesis (68, 69). The 43-nt SL5 is located in the 3′ UTR and the 14-nt SL4 is located at the 3′ terminus of the upstream nucleocapsid (N) protein ORF. Revertants arising from EAV containing disabling SL5 mutations had second site mutations in SL4, leading to the discovery of a 10 base, two gap pseudoknot interaction between the two hairpins that disrupts the structure of SL4. Compensatory alterations between the two stem loops restored low to moderate levels of virus accumulation. Similar pseudoknot interactions that would disrupt one or both of the stem-loop structures were possible in all known arteriviruses, although the biological process that requires alternative RNA structures in this region remains unknown.
Turnip crinkle virus (TCV) and other members of the Carmovirus genus in the family Tombusviridae are among the smallest of the plus-strand RNA viruses. Besides the single genomic RNA, TCV can be associated with non-coding subviral RNAs known as satellite (sat) RNAs, one of which (satC) is partially derived from the TCV genome (70,71). Most of the 15 carmoviruses share less than 50% sequence similarity, with little or no conservation in regions important for critical replication/translation functions such as 5′ and 3′ UTRs. However, carmoviruses share the capacity to fold into several distinctive structures at their 3′ ends with important roles in replication either demonstrated or predicted (Fig. 4). TCV and all carmoviruses with the exception of Galinsoga mosaic virus contain a very stable 3′ terminal hairpin (Pr) tentatively identified as the core promoter for minus-strand synthesis based on analysis of the comparable hairpin in satC (72). Directly upstream of all carmoviral Pr hairpins is a structurally conserved and critically important hairpin (H5), which contains a large symmetrical internal loop composed of phylogenetically conserved sequences (73). The 3′ side of the H5 internal loop forms an RNA:RNA interaction with bases at the 3′ terminus (termed ψ1), which is important for efficient TCV accumulation in vivo (74) and is present throughout the Tombusviridae (Fig. 2; 75). TCV and the most related carmovirus, Cardamine chlorotic fleck virus (CCFV; 65% nt identity), have two juxtaposed hairpins just upstream of H5 (H4a/H4b), involved in two additional pseudoknots (ψ2, ψ3). Adjacent to these elements is another conserved hairpin, H4, which is important for both replication and translation (76; X. Yuan and A.E. Simon, unpublished), which forms a fourth pseudoknot (ψ4) with the 5′ side of the H5 large symmetrical loop (X. Yuan and A.E. Simon, unpublished).
SatC (356 nt) shares 150 3′ co-terminal nt with TCV genomic RNA, differing in 10 positions that reduce the stability of the Pr and H5 hairpins (Fig. 4). The ability of satC to form similar structures within the TCV-derived region was supported by hairpin exchanges with CCFV and by in vivo functional selection, where randomization of entire hairpins or portions of hairpins followed by selection in host plants led to recovery of similar structures (77–81). Mutations that disrupted ψ1 and freed the 3′ terminus enhanced transcription of satC transcripts by purified TCV recombinant RdRp in vitro, while compensatory alterations that restored the pseudoknot reduced transcription to wild-type levels. These results affirmed a requirement for the hairpins and pseudoknot during the course of events that lead to minus-strand transcription (73).
Unexpectedly, solution structure mapping of wild-type and mutant satC full-length transcripts did not support the presence of ψ1 or any of the TCV-related hairpins, suggesting that the initial conformation of the transcripts was significantly different from the structure that was required in vivo (80, 82). The possibility that these transcripts synthesized by T7 RNA polymerase had formed non-productive, kinetically-trapped intermediates was ruled out by folding the RNA using different mono/divalent ion concentrations and different heating/cooling treatments and finding that the population of RNAs always adopted a single, stable configuration (82). This initial conformation, termed the “pre-active” structure, also contained an important pseudoknot (ψ2) that forms between the terminal loop of hairpin H4b and sequence flanking H5 in satC (Fig. 4; 80, 82) and TCV (76) and is conserved in many carmoviruses (80). The presence of ψ2 in the initial structure adopted by satC transcripts and the formation of ψ1 during the process that leads to minus-strand synthesis in vitro (73) strongly suggested that the pre-active satC structure was a functional alternative to the phylogenetically conserved, multiple hairpin structure.
Alterations in several regions of satC H5 or short 3′ or 5′ end deletions caused identical structural rearrangements in the Pr, H4a and H4a-flanking regions that correlated with enhanced transcription in vitro (82). These findings led to the hypothesis that satC initially adopts the pre-active structure and requires a conformational shift to an active structure for transcription in vitro and in vivo. Altering the H4a-flanking sequence, known as the “DR”, decreased satC accumulation in vivo and substantially reduced transcription in vitro, indicating that this region is critical for initiation of minus-strand synthesis. However, mutations in the DR lost some or all of their inhibitory effects when combined with alterations that either shifted the structure to the active form in vitro, or stabilized active structure hairpins in vivo (80, 82). The suggestion was made that the DR was necessary for the switch from the pre-active to the active conformation, but was of reduced importance if transcripts initially assumed the active structure, or if enhanced stability of the active structure lowered the activation energy between conversion of the two structures.
An important question is why untranslated satC requires a conformation that is not active for transcription. One possibility is that this pre-active conformation prevents newly synthesized plus-strands from being used as templates for further minus-strand synthesis, as suggested by the stamping model hypothesis for viral replication (16). By restricting access of RdRp to nascent plus-strands, most progeny would be “stamped” off of the original parental genome, thereby reducing the amount of potentially deleterious mutations that would accumulate during multiple rounds of replication.
Although the factor that mediates the switch between pre-active and active satC structures is not known, a likely possibility is the viral RdRp. The RdRp was recently implicated in the switch between translation and replication in TCV genomic RNA (X. Yuan and A.E. Simon, unpublished). A portion of the 3′ region of TCV (ψ2, ψ3, H4a/H4b and H5) folds into a T-shaped structure (76) that binds to 60S ribosomal subunits and functions (with upstream sequences) as a translational enhancer (83). Binding of the RdRp to the region causes a substantial shift in the conformation of the RNA from H4 to the 3′ end and disrupts the ribosome-interacting site. This conformational switch is postulated to restrict translation and promote replication (X. Yuan and A.E. Simon, unpublished).
Retroviral RNAs are unique from the perspective that two copies of the genome are packaged per virion (84, 85). The 5′ untranslated region of the human immunodeficiency virus (HIV-1) is 335 nt and is the most conserved region of the genome. The UTR contains sequences and structures that influence: 1) transcriptional transactivation (the TAR domain); 2) RNA splicing (the splice-donor site domain [SD]); 3) reverse transcription (the primer binding site [PBS] domain); 4) genomic RNA encapsidation (the packaging signal [ψ]); and 5) RNA dimerization (the Dimer Initiation Site [DIS]). These domains are illustrated in Fig. 5 (86, 87).
Reports describing the characterization of HIV and bovine immunodeficiency-like virus (BIV) Tat protein-TAR RNA interactions have greatly enhanced our understanding of the importance of bulged nucleotides, non-canonical base pairing, arginine-rich regions in RNA binding proteins, and the interactions of peptides and proteins with the RNA grooves (88). The Tat protein is a transcriptional transactivator that enhances the efficiency of RNA polymerase elongation (89). Tat interacts with Cyclin T1 and recruits the viral TAR RNA at the 5′ end of the viral long terminal repeat (90). The Tat protein has two functional regions: the arginine-rich motif (ARM; the TAR RNA binding region) and the activation domain that interacts with Cyclin T1, thereby increasing the specificity and affinity for TAR RNA binding (91).
Using nuclear magnetic resonance spectroscopy, Puglisi et al. (92) discovered that the TAR RNA stem forms an A-helix that includes two bulged nucleotides, which are now known to be key determinants for Tat protein binding. As an example of creative structural features that stabilize RNA-protein complexes, the HIV TAR RNA bound to arginine has a U(U-A) base triple in the major groove (93). In the BIV Tat peptide-TAR interaction, the unstacking of nucleotide U10 is the single RNA conformational change that accompanies Tat peptide binding. At the same time, the Tat protein, which is unstructured in the absence of RNA, is conformationally altered to form a β-hairpin (93). These critical viral RNA-protein interactions and the corresponding conformational changes promote the assembly of a TAR-Tat-Cyclin T1 complex that facilitates viral RNA transcription.
In vitro data suggest that RNA conformational switching could determine how the HIV UTR is used in different stages of the viral life cycle (94–96; also reviewed by Rein (85)). The 5′ UTR of HIV genomic RNA can fold into two mutually exclusive conformations called the long-distance interaction form (LDI; Fig. 5, left) and the branched multiple hairpins form (BMH; Fig. 5, right). The LDI form is the more energetically stable, while the BMH form is competent to generate the RNA dimers that are packaged into viral particles. The BMH form also exposes both the splice donor site (SD) and the RNA packaging hairpin (ψ) (86). The LDI conformation cannot form RNA dimers because the dimerization initiation site (DIS) is not exposed. Experimental data further suggest that the viral nucleocapsid protein induces a shift from LDI to BMH and also from BMH to LDI (Fig. 5; 95). Mihailescu and Marino (97) used NMR spectroscopy to examine the protonation state of HIV-1 RNA in the dimer initiation site region, and concluded that the nucleocapsid protein (NCp7) catalyzes a structural rearrangement that is correlated directly with protonation of the N1 base nitrogen of DIS loop residue A272. The protonation of A272 changes the base pairing potential of the base, thereby providing a molecular mechanism for the conformational changes.
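The statement that LDI is the more stable conformer can be made concrete with a simple two-state equilibrium. The sketch below is illustrative only: it assumes that LDI and BMH are the only populated states and that they interconvert freely, and the function name `bmh_fraction` and the free energy differences passed to it are hypothetical values chosen for the example, not measurements from the studies cited above.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def bmh_fraction(delta_g_kcal: float, temp_k: float = 310.15) -> float:
    """Equilibrium fraction of BMH in a two-state LDI <-> BMH model.

    delta_g_kcal is G(BMH) - G(LDI) in kcal/mol, so a positive value
    means that LDI is the more stable conformer (assumption of this sketch).
    """
    return 1.0 / (1.0 + math.exp(delta_g_kcal / (R_KCAL * temp_k)))


if __name__ == "__main__":
    # Hypothetical free energy differences, for illustration only.
    for dg in (0.0, 1.0, 2.0, 4.0):
        print(f"dG = {dg:.1f} kcal/mol -> BMH fraction = {bmh_fraction(dg):.4f}")
```

Under these assumptions, even a modest free energy difference leaves only a minority of molecules in the BMH form at equilibrium, which is one way to see why a factor such as the nucleocapsid protein, or a change in protonation state, could be needed to shift the population.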
Although in vitro biochemical structural data are consistent with the LDI and BMH conformers, the conformation of the RNA in the virion and during the intracellular life cycle is not clear. Paillart et al. (98) performed RNA structure probing in cultured cells and reported data consistent with the BMH conformer, but not LDI. Berkhout and colleagues (86) countered that Paillart et al. did not examine the structures of the spliced leader variants and also commented that the in vivo probing methods employed may not have detected minority structures or transient structures. A recent method, called SHAPE (selective 2′-hydroxyl acylation and primer extension), has been described as a means of assessing structural features at nearly every nucleotide of an RNA molecule (99). Using this approach, Wilkinson et al. (99) analyzed structural features of the 5′ 10% of the HIV genome under four experimental conditions: 1) HIV-1 genomic RNA inside the virion; 2) HIV-1 genomic RNA extracted from virions; 3) HIV genomic RNA inside the virion, but under conditions where the nucleocapsid protein-RNA interactions were disrupted by aldrithiol-2 (AT-2), a zinc-ejecting agent; and 4) an HIV-1 transcript generated by in vitro transcription. The results of this analysis suggest that HIV RNA forms a single predominant structure that more closely resembles the branched multiple hairpin structure described by Damgaard et al. (100). Additional structural and quantitative data describing retroviral RNA dimerization have been described by D’Souza and Summers (84) and Badorrek et al. (101).
Complementary sequences near the 5′ and 3′ termini of the yellow fever flavivirus genome were described by Strauss et al. in the late 1980s (102). More recently, the critical role of “cyclization motifs” for the replication of flavivirus RNAs has been revealed (103–106). Additional mechanistic details (104, 107, 108) underscore the importance of these sequences for viral RNA replication, and physical evidence for circularization using atomic force microscopy (104) has been reported. Dengue (DEN) virus-specific peptide-conjugated phosphorodiamidate morpholino oligomers (P4-PMOs) directed against the 3′ cyclization sequences were reported to be highly efficacious in reducing viral RNA replication (109).
Increasing evidence suggests that the initiation of negative strand RNA synthesis involves both the 5′ and 3′ untranslated regions of positive sense viral RNAs. A parallel and perplexing question for researchers in the positive strand RNA virus field is this: What “switches” the viral RNA from translation to replication? Important work by Gamarnik and Andino (110) addressed this question, and the authors proposed that the genomic RNA can be loaded with translating ribosomes or with the replication complex, but not both. The involvement of both 5′ and 3′ termini in viral RNA replication may help explain how ribosomes and RdRp that are moving in opposite directions avoid mid-transcript collisions.
In vitro transcription using dengue virus RNA transcripts revealed that the 3′ UTR alone is a poor template for the RdRp (105); however, template efficiency improved significantly in the presence of the 5′ UTR, leading to the concept of trans-initiation of replication (108). The 5′ stem-loop A (SLA) region was identified as a flavivirus promoter element and binding site for the viral RdRp (108). More recently, an additional region of 5′-3′ complementarity termed the “UAR” (upstream AUG region) was determined to also be required for replication (Fig. 6; 111). By bringing together the 5′ and 3′ termini, the idea is that the polymerase bound to the SLA is positioned to initiate minus strand synthesis at the 3′ terminus (Fig. 6).
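Candidate regions of 5′-3′ complementarity such as those described above can be located with a simple sequence search. The following is a minimal sketch that considers Watson-Crick pairs only (no G-U wobble pairs and no thermodynamic scoring); the function names and the short test sequences are hypothetical and are not taken from any flavivirus genome.

```python
from typing import Optional, Tuple

# Watson-Crick partners only; G-U wobble pairs are ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_complementary_stretch(
    five_prime: str, three_prime: str, min_len: int = 4
) -> Optional[Tuple[int, int, int]]:
    """Find the longest stretch of five_prime that can pair with three_prime.

    Returns (start in five_prime, start in three_prime, length) using
    0-based indices, or None if no stretch of at least min_len exists.
    """
    n = len(five_prime)
    # Try the longest candidate segments first and stop at the first hit.
    for length in range(n, min_len - 1, -1):
        for i in range(n - length + 1):
            target = reverse_complement(five_prime[i:i + length])
            j = three_prime.find(target)
            if j != -1:
                return (i, j, length)
    return None


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    print(longest_complementary_stretch("AAGCUGUACGAA", "CCCGUACAGCCC"))
```

A search of this kind only nominates candidate pairings. As the studies cited above illustrate, demonstrating that such sequences actually interact and are required for replication depends on mutational and compensatory-mutation experiments.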
In vitro replication assays and reverse genetic experiments using infectious viral clones provide compelling evidence of the importance of 5′-3′ long range RNA-RNA interactions (112). Alvarez et al. (104) have proposed that cyclization/circularization places key structural elements for viral RNA translation and replication in proximity so that they can be regulated coordinately. DEN differs from TCV in that overlapping regions for translation and replication are in the 5′ UTR whereas in TCV, they reside at the 3′ UTR. The DEN 3′ CS region is not required for efficient RNA translation (113, 114).
Alfalfa mosaic virus and, more broadly, the ilarviruses, have an unusual replication strategy. The genomes of these viruses, which are positive stranded and segmented (RNAs 1–3), are not infectious unless the viral coat protein or the coat protein mRNA (subgenomic RNA 4) is present in the inoculum (115). The RNAs have a 5′ cap structure, but are not 3′ polyadenylated. Biochemical and X-ray crystallographic evidence reveal large conformational changes that accompany the formation of the viral RNA-coat protein interaction; however, the functional role(s) of these conformational changes in regulating viral RNA translation and replication are still not clear.
Experimental analyses of the AMV-coat protein interactions represent some of the earliest detailed biochemical work on RNA-protein complexes. Nuclease sensitivity was used to map regions of the viral RNA that are protected by coat protein, and some of the first electrophoretic mobility bandshift experiments were done using this system (116). These viruses were attractive models for RNA work prior to the advent of bacteriophage SP6/T7 transcription kits because of the ability to isolate relatively large quantities of biochemically pure RNA and coat protein.
The amino terminus of many plant viral coat proteins is highly basic, and is referred to as the amino terminal “arm” (117). The N-terminal arms are unstructured, and in the case of AMV, interfere with virus crystallization (118). Coat protein molecules lacking the basic amino terminus are unable to activate viral RNA replication (119). Coupled with data showing that coat protein binds the 3′ terminus of the viral RNAs, the results suggested that coat protein binding to the RNAs is functionally significant for mechanisms beyond assembly of virus particles.
The 3′ termini of AMV and ilarvirus RNAs can be aligned on conserved (A)(U)UGC sequences that are spaced regularly near the extreme 3′ terminus (Fig. 7). RNA secondary structure mapping demonstrated that single-stranded AUGC sequences flanked two hairpins at the extreme 3′ terminus (Fig. 7). In the presence of viral coat protein, the AUGC regions were not cleaved by T1 ribonuclease (G-specificity), suggesting either that the AUGCs were protected by bound protein, or that the RNA conformation was altered (120). A wide range of biochemical methods was applied toward mapping the coat protein binding sites and determining the RNA sequence and structural features that are recognized by the coat protein (121–126). However, it was not until the structure of AMV 3′-terminal RNA bound to the N-terminal arm of the viral coat protein was solved (127) that the extent of conformational changes in both the protein and the RNA became clear. The coat protein-induced pairing is shown schematically in Fig. 7, and the resulting structure is presented in Fig. 8. In the presence of the viral coat protein, four new base pairs form that stabilize the complex in a manner that was completely unexpected. A “kink” is present in the backbone, as a result of the inter-AUGC base pairing. Like the Tat-TAR complex, AMV coat protein binding to its RNA is dependent on a critical arginine residue (arg17) that nucleates both binding and conformational changes (128). The unstructured coat protein N-terminus is converted to an alpha helix with a long tail upon binding RNA; therefore, both the RNA and the protein change their shapes through the process of co-folding. The structure data are consistent with in vitro genetic selection results (129) and also with data suggesting that the RNA is more compact when bound to a coat protein peptide (124).
In spite of the level of detail offered by the co-crystal structure, the biological significance of the complex has not been resolved. One line of reasoning is that coat protein binding to the 3′ terminus generates a unique structure that is critical for template selection and specific RdRp binding, thereby explaining why coat protein is required in the inoculum to initiate viral RNA replication (127, 130, 131). An alternate proposal (132) is that coat protein binding enhances viral RNA translation by facilitating end-to-end circularization interactions. The crystal structure data are not compatible with the conformational switch model (133), which suggests that the viral RNA structure is extended upon coat protein binding. Instead, the RNA conformation is compacted by the formation of four additional base pairs (127). Nonetheless, the presence of covarying nucleotides in the 3′ RNA sequence (133) may be consistent with RNA conformation(s) that are important in the viral life cycle. Additional experimentation is needed to gain a better understanding of the switch between viral RNA translation and viral RNA replication.
In 2004, Fujita and colleagues (134) reported that the cytoplasmic RNA helicase RIG-I senses cytoplasmic viral RNA. RNA-RIG-I binding correlated with activation of a signal transduction cascade that culminated in interferon expression and the establishment of an anti-viral cell state. Since that time, an additional helicase, MDA5, has been identified (135), as well as a regulatory protein called LGP2 that retains some of the structural features of RIG-I and MDA5 (136). Details of interactions between viral RNAs and RIG-I or MDA5 are starting to emerge, revealing new questions about conformational changes upon viral RNA-protein interactions.
Models of RNA-free RIG-I protein present the protein in a closed conformation, with interacting N- and C-termini (Fig. 9; 137). The C-terminal domain (CTD) has regulatory functions, and also recognizes and binds the 5′ triphosphate group (138) present on uncapped viral RNAs (139). Although RIG-I requires a 5′ triphosphate to recognize single-stranded RNA, not all 5′-triphosphorylated RNAs are activators (139). A second RNA activation domain has been identified in the 3′ untranslated region of hepatitis C virus RNA (140, 141). Contrary to many expectations, the activating domain is unlikely to be double-stranded, having little potential for forming stable secondary structure because the 100 nt domain is a pyrimidine-rich polyU/UC sequence. Interestingly, the antisense strand of the polyU/UC sequence (polyAG/A) is also strongly stimulatory (140, 141).
Future studies will likely reveal important details about the RNA-RIG-I interactions and their biological functions, but for now, there are some puzzling issues. First, RIG-I is a prototypical RNA helicase, containing the common conserved motifs. However, the requirement for helicase activity is not understood, with one report stating that helicase activity is inversely proportional to RNA-mediated signaling (137). Helicases contain ATP binding domains and exhibit ATPase activity; however, Bamming et al. (142) reported that the ATPase motifs can be mutated without loss of biological activity, suggesting that ATPase activity is not essential. Conversely, other authors have reported that disrupting ATPase activity also disrupts signaling (143). Other data indicate that LGP2 can function as a repressor of RIG-I signaling in the absence of RNA binding (144), raising questions about the regulatory mechanism.
RNA modifications also affect RIG-I signaling, and modified RNAs may represent important tools for dissecting the events leading to interferon expression. Following work on siRNA-mediated activation of toll-like receptor signaling (145, 146), it was reported that 2′-OH-modified RNAs did not activate RIG-I (139, 147). It is now clear that, although the modified RNAs do not activate RIG-I, they retain the ability to bind it (140), suggesting that the RNA-protein complex is trapped in an inactive form. Although it is anticipated that RNA binds to the helicase domain, the RNA binding site on RIG-I has not been mapped experimentally.
Innate immunity is a cell’s first response to a viral infection, and the sensing of cytoplasmic viral RNAs is a key component of the response portfolio. As expected, viruses have evolved mechanisms to defeat the innate immune signaling pathways by, for example, using a virally-encoded protease to cleave off a key membrane-bound mitochondrial protein in the signaling pathway (148). In future months and years, studying the interactions of viral RNAs with cytoplasmic helicases RIG-I, MDA5 and LGP2 will likely yield new details of RNA and protein conformational alterations that are associated with critical cellular functions.
In this review, we have described examples of viral RNA conformational changes that correlate closely with corresponding events in the viral life cycle. Over a number of years, the experimental work in this area has established many of the approaches and methodologies that are used routinely to study RNA structure and RNA-protein interactions. The work is far from finished, however, because many unanswered questions remain about how particular RNA structures relate to specific details of the viral life cycle in infected cells. In other words, correlations are highly suggestive in many cases, but specific mechanisms are lacking.
The precise mechanisms may be difficult to nail down because of the dynamic nature of RNA conformation. For example, the proposed riboswitch involving LDI and BMH conformers of the HIV-1 5′ UTR represents an elegant theoretical mechanism for controlling the differential activities of the viral RNA during its life cycle. However, evidence for the LDI-BMH conformational switch in vivo is lacking. The highly conserved nature of the HIV 5′ UTR nucleotide sequences could be a coincidence for the LDI-BMH folding models; however, short-lived conformers that are difficult to trap experimentally may have important regulatory significance. The limitations of experimental methodologies tend to constrain our image of RNA structures to static rods that are frozen in time, rather than flexible strings with multiple important short-distance and long-distance interactions (112) that are constantly changing with new intra- or inter-molecular interactions. An added dimension is the role of specific RNA-protein interactions that stabilize RNA structures. The AMV RNA-protein complex is an example of how a bound protein can lock an RNA into a conformation that could serve as a regulatory signal. However, a search of the literature reveals that different laboratories often report non-overlapping lists of proteins that bind to a particular viral RNA 5′ or 3′ untranslated region. Future experiments that explore binding kinetics and competition among proteins for overlapping RNA binding sites could bring us closer to understanding how viral RNA translation and replication are coordinately regulated in infected cells.
Work in the Simon lab is supported by NSF (MCB-0615154) and U.S. Public Health Service (GM 061515-05A2/G120CD). Work in the Gehrke lab is supported by the U.S. Public Health Service through an award from the Harvard Digestive Diseases Center (P30 DK034854). We thank Jorgen Kjems and Laura Guogas for helpful comments about the manuscript.