Rotaviruses are a major cause of acute, often fatal, gastroenteritis in infants and young children world-wide. Virions contain an 11 segment double-stranded RNA genome. Little is known about the cis-acting sequences and structural elements of the viral RNAs. Using a database of 1621 full-length sequences of mammalian group A rotavirus RNA segments, we evaluated the codon, sequence and RNA structural conservation of the complete genome. Codon conservation regions were found in eight ORFs, suggesting the presence of functional RNA elements. Using ConStruct and RNAz programmes, we identified conserved secondary structures in the positive-sense RNAs including long-range interactions (LRIs) at the 5′ and 3′ terminal regions of all segments. In RNA9, two mutually exclusive structures were observed suggesting a switch mechanism between a conserved terminal LRI and an independent 3′ stem–loop structure. In RNA6, a conserved stem–loop was found in a region previously reported to have translation enhancement activity. Biochemical structural analysis of RNA11 confirmed the presence of terminal LRIs and two internal helices with high codon and sequence conservation. These extensive in silico and in vitro analyses provide evidence of the conservation, complexity, multi-functionality and dynamics of rotavirus RNA structures which likely influence RNA replication, translation and genome packaging.
Rotaviruses are responsible for the death from gastroenteritis of over 600000 children per annum, mainly in developing countries of sub-Saharan Africa, South and South–East Asia (1). Two rotavirus vaccines (2,3) have been licensed since 2006 and are now in use in over 100 countries. Rotaviruses are a genus of the Reoviridae and have a double-stranded RNA (dsRNA) genome comprising 11 segments (4). The segments are monocistronic with the exception of RNA11, which encodes two proteins (NSP5 and NSP6). Rotaviruses replicate in the mature enterocytes at the tip of the villi of the small intestine. Adsorption of infectious virions [triple-layered particles (TLPs)] to specific receptors and interaction with co-receptors are followed by virus particle uptake into the cytoplasm and removal of the outer VP4/VP7 capsid layer. The resulting double-layered particles (DLPs) transcribe and release large numbers of all 11, capped, non-polyadenylated, positive-sense RNAs which can act as mRNA for translation or may be incorporated into progeny viral particles in lipid droplet associated cytoplasmic inclusion bodies termed ‘viroplasms’ (5) where early morphogenesis and viral RNA replication take place. Viroplasms release DLPs into the endoplasmic reticulum where they acquire their outer layer to become TLPs which are released from cells by lysis or other, non-lytic mechanisms (4).
RNA structures are instrumental in RNA–RNA and RNA–protein interactions; in viruses these have many functions including control of transcription, sub-cellular trafficking including nuclear export, translation, packaging, and, in some cases, effector functions such as immune evasion (6) and inhibition of apoptosis (7). Viral RNAs carry multiple overlapping functional cis-acting elements, and it is likely that many of the structures involved form only transiently during passage through the host cell. To date there has been little analysis of the structure–function relationships of rotavirus RNAs. Signals essential for negative strand synthesis were identified at the 5′ consensus sequences and 3′ consensus sequences (3′CSs) of viral RNAs (8,9). Structural studies of the individual positive-sense RNAs have been limited to computational analyses of single sequences (9,10).
The genomes of many other RNA viruses have been analyzed much more extensively, and many characteristic motifs have been noted whose structural solution has been critical in establishing their function, such as internal ribosome entry site (IRES) structures to control translation (11), nuclear export signals (12) and RNA packaging signals (13). Packaging in non-segmented RNA viruses relies on specific regions of the RNA genomic strand folding into one or more complex 3D structures which can be recognized by viral structural proteins. Segmented RNA viruses, such as rotaviruses, require additional mechanisms to encapsidate 11 distinct RNA segments into each particle. Rotaviruses achieve this with remarkable specificity, evidenced by the very low particle-to-pfu ratio (between 2 and 5) in cell culture-adapted rotavirus preparations (14). In addition, substitution of particular segments by their equivalent from a different rotavirus strain (re-assortment) occurs frequently during co-infection, implying a sequentially and/or structurally ordered packaging mechanism. In other segmented RNA viruses, such as influenza virus, cis-acting sequences involved in genome encapsidation have been identified within coding regions by observing focal restriction of codon usage (15).
To probe RNA structure, there is an increasing variety of physics-based analytical tools relying mostly on prediction of Watson–Crick and wobble base pairs. Free energy minimization is used by many RNA folding programs such as mfold (16,17) and RNAfold (18) for secondary structure prediction of an individual RNA molecule. However, the 2D minimum free energy (MFE) structure derived from these limited parameters calculated under non-physiological conditions, does not truly represent the native biological structure. Its accuracy can be improved by applying constraints derived from biochemical probing experiments. Algorithms which allow pseudoknot prediction (19), and comparative folding methods which consider both thermodynamics and covariation in related RNA sequences can also increase accuracy (20,21). RNAalifold predicts the MFE consensus structure from multiple sequence alignments of a set of related RNA sequences (21), and has been used to search for functional RNA motifs in viral RNAs (22–24). However, covariation analysis using the RNAalifold requires high-quality sequence alignments and careful selection of input sequences to avoid bias towards more closely related sequences within an alignment. The most recent version of the consensus structure prediction programme ConStruct allows the use of the RNAalifold algorithm to measure covariation from sequence alignments; it also has additional features such as a graphical user interface for the optimization of sequence alignments, the assignment of sequence weighting to minimize sampling bias, and the calculation of a thermodynamic score from a superimposed consensus base pairing matrix (25). RNAz is a useful tool to detect non-coding RNAs (ncRNAs) from sequence alignments of genomes (26). Because of its ability to detect conserved and thermodynamically stable secondary structure, it also allows high-throughput screening of conserved structures in viral RNA genomes.
Cis-acting functions in viral RNA are inevitably associated with the presence of structural motifs. For rotavirus genome packaging, as well as for other functions of the viral RNA such as replication and translation, structural signals should be detectable. Secondary structures may provide clues to the mechanism and the method of maintenance of specificity of packaging, and the capability for and control of re-assortment. To seek evidence for conserved structural elements in the published RNA sequences of rotavirus, we analyzed over 1600 full-length sequences of all 11 segments of group A rotaviruses using a variety of bioinformatic methods.
Conserved long-range interactions (LRIs) were found between the 5′- and 3′-terminal regions, sometimes extending into the coding region. Alternative structures, possibly with distinct roles in RNA replication and translation, were identified in two segments. Biochemical structure probing of segment 11 led to a complete secondary structure model and confirmed the in silico prediction of three highly conserved helices. These studies provide a first insight into RNA structure–function relationships in rotaviruses.
Positive-sense RNA sequences of all segments of group A rotaviruses (Rotavirus A) were obtained from GenBank (27). Sequences with full-length ORFs were used in the codon conservation analysis. Only full-length sequences with complete 5′- and 3′-UTRs were used in the ConStruct and RNAz analyses. Sequences of avian strains and of rearranged RNA segments of mammalian strains were excluded from all analyses. Table 1 summarizes the numbers of sequences evaluated for each RNA segment. Details of the rotavirus strains used in the RNA structural conservation analysis including GenBank accession numbers are listed in Supplementary Table S1.
The nucleotide variation at each independent position across each segment was calculated as follows:
where N = total number of sequences analyzed at each position and n = number of A,C,G or T nucleotides.
The method used for estimating variability inside the rotavirus ORFs was previously successfully applied to influenza A virus RNAs (15). Normalized mean pair-wise distance was calculated for each amino acid position from alignments of rotavirus ORFs as described earlier (15). To identify clusters of low variability, for each site a moving average was computed by taking the average of the normalized MPD (nMPD) scores over an 11 amino acid window: the site in question plus five sites on each side. The statistical significance of low-nMPD scores was determined by computing a bootstrapping value for each site as described earlier (15).
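The smoothing step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `moving_average` is hypothetical, and truncating the window at the ends of the ORF is an assumption, since the text does not say how edge sites were handled.

```python
from typing import List


def moving_average(scores: List[float], half_window: int = 5) -> List[float]:
    """Centered moving average of per-site nMPD scores.

    Each site is averaged with `half_window` sites on each side
    (5 + 1 + 5 = 11 sites, as in the text). Assumption: near the ends
    of the ORF the window is truncated to the sites that exist.
    """
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)          # first site in the window
        hi = min(n, i + half_window + 1)      # one past the last site
        window = scores[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Sites whose smoothed value is low relative to the rest of the ORF would then be candidates for the bootstrap test mentioned in the text, which is not reproduced here.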
RNAfold was used to predict MFE structures of single sequences. RNAcofold was used to predict MFE structures involving two interacting sequences. Both RNAfold and RNAcofold are part of the Vienna RNA package (28).
Full-length sequences were aligned using the multiple sequence alignment program ClustalX2 (29). Due to the large number of closely related full-length sequences of RNA7 available from GenBank, sequences with >99% pair-wise identity were filtered from RNA7 alignments using the WEIGHT utility from the SQUID C function library (Eddy,S. 2005 http://selab.janelia.org/software.html). Full-length alignments were analyzed for RNA5–RNA10. A refined analysis was carried out with an alignment of RNA11 sequences in which the 150-nt long insertion regions in the 3′-UTR of DS-1-like strains were excluded. The extremely high memory requirement due to the length of RNA1–RNA4 made it impossible to analyze full-length alignments of these four larger segments. Therefore, we analyzed 800-nt hybrid alignments generated by artificially joining the alignment of 400-nt sequences from the 5′-terminus to the alignment of 400-nt sequences from the 3′-terminus in all segments longer than 800 nt (RNA1–9). In order to minimize sampling bias towards closely related strains, a weight value was assigned to each individual sequence from the corresponding ClustalX2 alignment using WEIGHT. The ConStruct package consists of two main programs, the script generator CS_FOLD and the dot plot analysis tool CS_DP (25). A base pairing probability dot plot and an MFE structure were created by RNAfold for each sequence (30). ConStruct project files were generated either by CS_FOLD or by custom-made bash scripts to facilitate sequence weight assignment and sequence selection for subgroup analyses (see below). A weighted consensus base pairing probability matrix was computed by CS_DP for each alignment to calculate the thermodynamic scores, while covariation scores were computed using an RNAalifold algorithm which takes stacking into account. Gap positions in some alignments were edited manually.
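The construction of an 800-nt hybrid alignment can be illustrated with a short sketch. The helper names `terminal_segments` and `join_alignments` are hypothetical, and representing an alignment as a dictionary of strain name to aligned row is an assumption made for brevity; the original work used ClustalX2 alignments and custom scripts.

```python
from typing import Dict, Tuple


def terminal_segments(seq: str, flank: int = 400) -> Tuple[str, str]:
    """Return the 5'-terminal and 3'-terminal `flank` nucleotides of a sequence.

    Only segments longer than 2 * flank (800 nt here) are accepted,
    matching the rule stated in the text.
    """
    if len(seq) <= 2 * flank:
        raise ValueError("sequence must be longer than 2 * flank")
    return seq[:flank], seq[-flank:]


def join_alignments(five_prime: Dict[str, str],
                    three_prime: Dict[str, str]) -> Dict[str, str]:
    """Join two alignments row by row into one hybrid alignment.

    Assumption: both alignments contain exactly the same strain names.
    """
    if set(five_prime) != set(three_prime):
        raise ValueError("alignments must contain the same sequences")
    return {name: five_prime[name] + three_prime[name] for name in five_prime}
```

In this sketch the two terminal regions would be aligned separately before being joined, so that gaps introduced in one terminus do not shift columns in the other.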
The relative weightings for thermodynamics (wTD) and covariation (wCS) used to calculate the final probability score were wTD=0.7 and wCS=0.3, while the minimum thresholds for thermodynamics (tTD) and covariation (tCS) used were tTD=0.10 and tCS=0.10 (25). In addition to the probability scores, structural alignments, base frequencies and base pair frequencies (AU, GC and GU) for all predicted conserved base pairs were produced by CS_DP from each sequence alignment (25). Consensus structure and base pairing probability values were mapped to the sequences of the bovine rotavirus UK strain unless specified otherwise. Subgroup analysis was carried out by modifying the project file and sequence alignment to select only a particular group of sequences. Detailed analyses of particular sections of alignment were done by generating project files and alignments using custom-made shell scripts.
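As a rough illustration of how the weights and thresholds quoted above might interact, the sketch below combines a thermodynamic and a covariation score for one candidate base pair. The exact formula used by CS_DP is not given in this text; the weighted sum, the zeroing of sub-threshold scores and the function name `combined_score` are all assumptions made for illustration.

```python
def combined_score(td: float, cs: float,
                   w_td: float = 0.7, w_cs: float = 0.3,
                   t_td: float = 0.10, t_cs: float = 0.10) -> float:
    """Weighted combination of thermodynamic (td) and covariation (cs) scores.

    Assumption (not from the source): a score below its threshold is
    treated as zero, and the final score is a simple weighted sum.
    """
    td_used = td if td >= t_td else 0.0   # drop weak thermodynamic support
    cs_used = cs if cs >= t_cs else 0.0   # drop weak covariation support
    return w_td * td_used + w_cs * cs_used
```

With these default weights, a base pair supported only by thermodynamics can reach at most 0.7, which shows why covariation support is needed for the highest scores under this reading.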
Short-range RNAz analysis was carried out as described previously (26). Sub-section alignments were generated from full-length alignments using a window size of 60nt or 120nt and a sliding distance of 10 or 20nt. The long-range RNAz analysis involved additional preprocessing of the input alignment. All non-overlapping combinations of subsection alignments were joined to create 120-nt hybrid alignments, which were analyzed by RNAz as described earlier. A custom-made shell script was used to extract data from the output file including Z, SCI and P scores. A matrix was generated for each of these values using the OpenOffice Spreadsheet (Supplementary Figure S5). The artificial joining of two alignments may introduce false negatives if one strand of a predicted helix spans across the artificial joining point. Therefore, we manually evaluated the secondary structure predicted for individual strains and for the consensus sequence for all the high scoring alignments (P-score>0.5). Clusters of high P-scores on the matrix suggest that the conserved structures are true positive results. The calculation of P-score by RNAz 2.0 stems from a support vector machine (SVM) training on functional RNA structures in the Rfam 9.0 database (26). Since ‘hybrid alignments’ were used in the long-range RNAz analysis and the SVM was trained for continuous alignments, the corresponding confidence levels for the 0.9 and 0.5 P-score cut-off points are indicative, but not definitive.
The cDNA of group A bovine rotavirus UK strain RNA11 has been cloned previously into the multiple cloning site of the TA cloning vector pCR2.1 from Invitrogen (plasmid kindly provided by Malcolm McCrae, University of Warwick, UK). PCR amplification from this plasmid generated cDNA with a T7 promoter attached directly to the 5′-end of the gene. The following primers were used: 5′-TAATACGACTCACTATAGGCTTTTAAAGCGCTACAGTG-3′ (+sense) (T7 promoter sequence underlined) and 5′-GGTCACAAAACGGGAGTGGGG-3′ (−sense) (italicized Gs represent the 5′- and complementary 3′-ends, respectively, of RNA11). Subsequent in vitro transcription from this cDNA using T7 polymerase generated RNA11 transcripts with authentic 5′- and 3′-termini. Viral RNA transcripts were synthesized using the Ambion MEGAscript T7 transcription kits according to the manufacturer’s protocol. Per reaction, 0.5μg cDNA was used. After 1.5h incubation at 37°C, 1μl of DNase Turbo (1U/μl; Ambion) was added directly to the transcription reaction which was incubated for a further 15min at 37°C. RNA was then purified by phenol/chloroform extraction, followed by precipitation with 2.5× volumes of ethanol in the presence of 0.5M ammonium acetate. Polyacrylamide gel electrophoresis was used to confirm the size and integrity of the RNA, and the concentration was determined using a NanoDrop® ND-1000 UV-Vis Spectrophotometer.
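The windowing and pairing scheme for the long-range analysis can be sketched as below. The function names are hypothetical and 0-based column coordinates are an assumption; the original analysis used custom shell scripts around RNAz, which are not reproduced here.

```python
from typing import List, Tuple


def window_starts(length: int, window: int, step: int) -> List[int]:
    """0-based start columns of sliding windows that fit in the alignment."""
    return list(range(0, length - window + 1, step))


def non_overlapping_pairs(length: int, window: int = 60,
                          step: int = 10) -> List[Tuple[int, int]]:
    """All pairs of window starts (a, b) whose windows do not overlap."""
    starts = window_starts(length, window, step)
    return [(a, b) for i, a in enumerate(starts)
            for b in starts[i + 1:] if b >= a + window]


def hybrid_rows(rows: List[str], a: int, b: int, window: int) -> List[str]:
    """Join columns [a, a+window) and [b, b+window) of every alignment row."""
    return [row[a:a + window] + row[b:b + window] for row in rows]
```

Two 60-nt windows joined this way give one 120-nt hybrid alignment; scoring each pair and placing the result at cell (a, b) yields the kind of matrix described in the text.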
The size of RNA11 (667nt) made end labeling techniques impractical for probing the majority of the structure, thus the RNA was analyzed by primer extension. To ensure structural uniformity, RNA was heated to 80°C for 10min and allowed to cool to room temperature before structure mapping was carried out. Enzymatic probing reactions were carried out essentially as described by Ambion using RNases T1, CL3, A, I, V1 (Ambion) and U2 (Thermo Scientific) at limiting dilutions to achieve partial cleavages. RNase A cleaves unpaired cytosine and uracil bases, RNase T1 cleaves unpaired guanine residues, RNase U2 cleaves both unpaired adenines and guanines (but shows a preference for guanines) and RNase I cleaves unpaired bases with no base specificity. RNase V1 cleaves phosphodiester bonds flanked by a helical backbone (31). Reactions were incubated at room temperature for 10min in 10-μl reactions containing 1μg carrier yeast tRNA. Each reaction contained 2μg of RNA11.
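The nuclease specificities listed above can be turned into a small helper that predicts which positions of a candidate secondary structure each enzyme would be expected to cut. This is a simplified sketch: the function and table names are hypothetical, RNase CL3 is omitted because its specificity is not described in this text, and treating RNase V1 as cutting every base-paired position is an approximation.

```python
# Unpaired bases targeted by each single-strand-specific RNase (from the text).
UNPAIRED_TARGETS = {
    "A": set("CU"),     # RNase A: unpaired C and U
    "T1": set("G"),     # RNase T1: unpaired G
    "U2": set("AG"),    # RNase U2: unpaired A and G
    "I": set("ACGU"),   # RNase I: any unpaired base
}


def expected_cleavage_sites(seq: str, dot_bracket: str, rnase: str) -> list:
    """Return 1-based positions an RNase is expected to cleave.

    `dot_bracket` marks paired positions with '(' or ')' and unpaired
    positions with '.'. Simplification: V1 is modelled as cutting all
    paired positions.
    """
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    sites = []
    for pos, (base, state) in enumerate(zip(seq.upper(), dot_bracket), start=1):
        paired = state in "()"
        if rnase == "V1":
            if paired:
                sites.append(pos)
        elif not paired and base in UNPAIRED_TARGETS[rnase]:
            sites.append(pos)
    return sites
```

Comparing such expected sites with observed cleavages is one way to judge how compatible a structural model is with probing data.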
For primer extension reactions, 1μg of partially digested RNA was annealed to 50pmol primer by heating to 70°C for 1min in a buffer containing 60mM Tris–HCl pH 7.5 and 100mM KCl, followed by slow cooling to 42°C. RNA11-specific primers used in primer extension reactions are listed in Supplementary Table S2. Primer extension reactions were carried out in 10-μl reaction volumes containing 0.5μg partially digested RNA/primer mix, 16U RNasin (Promega), 8.8 units AMV RT (Promega) and 0.37MBq 33P-dATP (Perkin Elmer) in an extension buffer containing 7.5mM Tris–HCl pH 8.3, 30mM KCl, 7.5mM MgCl2, 6mM DTT, 3μM dATP and 0.3mM each of dCTP, dGTP and dUTP. Reaction mixes were incubated at 42°C for 30min followed by the addition of an equal volume of Ambion Gel Loading buffer II and heating to 80°C for 10min. Samples were analyzed on 10–12% polyacrylamide 7M urea gels alongside dideoxy-sequencing ladders of template DNA, primed by the same oligonucleotide primer that had been used for cDNA synthesis.
Due to the nature of the primer extension protocol, it was only possible to obtain structural data up to nucleotide 635 in the 3′ terminal region, and therefore an absence of data beyond nucleotide 635 is not indicative of an absence of cleavage by RNases. Untreated control lanes indicate reverse transcriptase pausing generated by enzyme detachment as a result of homopolymeric runs of nucleotides or the presence of RNA secondary structure ahead of, or behind, the catalytic site (32–34). Cleavage sites were identified by the higher intensity of the cDNA band in the RNase-treated lane compared to the untreated controls. Results represent cleavages in multiple experiments, using the lowest concentration of RNase where a difference between treated and untreated lanes was clearly visible to avoid the small but finite risk of generating artifactual data from secondary cleavages in RNA fragments.
Sequence conservation was identified in many rotavirus RNA segments. For RNA11, conserved regions are present in the 5′- and 3′-UTR, but also as discrete regions within the ORFs, most notably at nucleotides 94–130 and 187–314 (Figure 1). In RNA8 (Supplementary Figure S1), three conserved regions with variation values of <0.1 were found at nucleotides 1–27, 42–60 and 1042–1059. Much of the second conserved region is within the NSP2 ORF. High nucleotide sequence conservation cannot be wholly explained by the need to maintain protein sequence because alternative codons can be used for most amino acids, suggesting that sequence-dependent cis-acting functions contribute to the conservation.
To identify likely cis-acting functional domains within the coding regions of rotavirus RNAs, the nMPD scores for each amino acid codon in the 12 ORFs were calculated. A low nMPD score represents higher conservation of a particular codon than would be expected from amino acid conservation or codon bias. Examples of nMPD plots for NSP2, NSP1 and NSP5/NSP6 are shown in Figure 2, and the results for these and all the remaining rotavirus ORFs are presented in Supplementary Figure S2. Large areas of low moving average value were found near the 5′-termini of ORFs at amino acids (aa) 1–12 of NSP2 (Figure 2A), amino acids 1–20 of VP1, amino acids 1–15 of VP3 and amino acids 1–10 of VP7, all with bootstrap values below 0.001. Clusters of low nMPD scores were also found at amino acids 1–10 of NSP5; amino acids 6–17 of NSP6; amino acids 1–7 of VP4; and amino acids 1–3 and 15–19 of NSP3. Taken together, these observations suggest that functional RNA domains exist in the coding region, possibly as part of a single continuous cis-acting sequence extended from the 5′-UTR. No significant functional constraints were detected in the ORFs for NSP1 (Figure 2B), VP2, VP6 and NSP4. Codon conservation of the two overlapping ORFs on RNA11, NSP5 and NSP6, was found beyond the 5′-terminal regions (Figure 2C). At amino acids 21–112 of the NSP5 ORF, where the two ORFs overlap, three distinct regions of low moving average were found. The alternating pattern of low (amino acids 25–36, amino acids 59–93 and amino acids 106–111 of NSP5) and high (amino acids 37–58 and amino acids 94–105 of NSP5) moving averages within this region raises two possibilities: (i) the low-nMPD clusters define regions which encode amino acid residues crucial for protein functions in both NSP5 and NSP6 ORFs; or (ii) at least part of the low-nMPD clusters are results of constraints imposed by RNA cis-acting functions.
Because NSP6 is not essential for viral replication in vitro (35) and is not expressed in some rotavirus strains (Supplementary Table S1), the second possibility seems more likely.
For some ORFs in which we could not identify any low-nMPD regions using alignments of all available sequences, we also analyzed individual genotypes or groups of genotypes (Supplementary Figure S3). A small cluster of low-nMPD scores was found at the 5′ terminus of the VP2 ORF (amino acids 1–11) when 42 sequences from the C2 genotype (36) were analyzed alone. A small cluster of conserved codons also appeared at amino acids 387–389 of VP6 when sequences of subgroup II consisting of genotypes I2, I5 and I6 (36) were analyzed separately. Due to insufficient numbers of available sequences this process was not possible for other genotypes of VP2 and NSP4, and for NSP5 from RNA11 sequences which do not express NSP6. The extreme variability including large insertions and deletions in NSP1 precluded meaningful alignment for an analysis based on individual genotypes.
We investigated whether conserved RNA structures are present in the coding and non-coding regions of the rotavirus RNA segments. Using the ConStruct RNA consensus structure prediction package (25), base pairing probabilities were computed from the alignments of full-length sequences (RNA5–RNA11) and alignments of 800-nt ‘hybrid’ sequences (RNA1–RNA9, ‘Materials and Methods’ section). Most of the conserved RNA structures obtained from the ConStruct analysis were formed by sequences within the 200nt 5′- and 3′-terminal regions of the RNA (Supplementary Figure S4 and Table S2). For RNA5–RNA9, predictions for the 5′- and 3′-terminal 400-nt regions from analyses of full-length and 800-nt ‘hybrid’ alignments were identical (data not shown), validating the ‘hybrid’ alignments as representative for the prediction of conserved secondary structure in the terminal regions of larger segments (RNA1–RNA4). The most common type of conserved secondary structure identified was a long stretch of helical regions held by LRIs between the 5′- and 3′-terminal regions. Structures with a medium to high average base pairing probability were identified in RNAs 1, 2, 3, 5, 7, 8, 10 and 11, and others with a low average base pairing probability in RNA4 and RNA6. In RNA8, conserved LRI helices were predicted to form between nucleotides 56–94 and 1004–1042 (Figure 3A). In RNA11, such a structure was predicted with high certainty to form between nucleotides 18–52 and 607–638 including a long continuous helix H1 (Figure 3A), in agreement with the prediction by Tortorici et al. (9) using the Massively Parallel Genetic Algorithm (MPGAfold). In RNA11, in addition to the terminal LRIs, two remarkably conserved helices with high base pairing probability, named H2 and H3, were found in the central region of the RNA (Figure 3C and D).
Many stem–loop structures were also identified in the 5′- and 3′-terminal regions, including a stem–loop (nt 15–55) in the 5′-terminal region of RNA8 (Figure 3A), SL1 (nt 1201–1259) and SL2 (nt 1261–1342) in the 3′-terminal region of RNA6 (Figure 4A) and SL1 (nt 998–1073) within the 3′-UTR of RNA9 (Figure 5A). RNA10 was predicted to have a conserved terminal structure consisting of three conserved stem–loops in the 3′-UTR linked by many LRI helices (Supplementary Figure S4).
In addition to using ConStruct, we evaluated structural conservation using the RNAz programme, which calculates the probability of the presence of thermodynamically stable and conserved secondary structure within a particular region. The short-range-RNAz analysis confirmed the presence of many stem–loops predicted by ConStruct such as 3′SL1 and 3′SL2 in RNA6 (Figure 4B), conserved stem–loops in the 3′-UTR of RNA10 (Supplementary Figure S4) and a stem–loop at nucleotides 939–976 in RNA8 (Figure 3A). The long-range analysis not only confirmed the presence of nearly all LRI helices predicted by ConStruct, but also suggested an additional conserved stacked helix (Figure 5B) formed by LRIs between the 5′-terminal region (nt 11–42) and the 5′-strand of the 3′SL1 (nt 1003–1031) of RNA9.
Data from nMPD, nucleotide variation and structural conservation analyses can be used to study how conserved RNA structures are maintained. Both sequence conservation and covariation are involved in maintaining the terminal structures of RNA8 and RNA11. Three types of base pairing can be found within the long-terminal LRIs of RNA8 (Figure 3): (i) between two conserved nucleotides (e.g. base pairs G69:C1029); (ii) between a conserved nucleotide (e.g. U81 in base pairs 81:1017) and a varying 3′ nt (e.g. swapping between A1017 and G1017); and (iii) between two co-varying nucleotides (e.g. U70:A1028 or C70:G1028). In both RNA8 and RNA11, the long-terminal LRIs are formed between highly conserved sequences on the 5′-strand overlapping the low-nMPD regions, and highly variable sequences within the 3′-UTRs (e.g. nucleotides 1005–1034 of RNA11) on the 3′-strand. The much higher sequence conservation on the 5′-strands may be due to protein-coding constraints in addition to the formation of LRIs, but also to other cis-acting functions. Sequence conservation seems to play a key role in maintaining some conserved LRIs. In H1, H2 and H3 helices of RNA11, most base pairs are formed between conserved nucleotides within clusters of low-nMPD scores (Supplementary Text S1), indicating strong selection pressure to maintain these helical structures. Individual and clusters of low-nMPD scores allowed us to evaluate the functional importance of many predicted conserved structures, but due to the mechanism of covariation the lack of codon conservation does not necessarily indicate a lack of functional constraints.
In RNA6, both SL1 and SL2 are largely maintained by covariation (Figure 4). In contrast to the high level of nucleotide variation at the stem region, most nucleotides in the loop region (nt 1226–1232) are highly conserved and may be important for tertiary interactions, intermolecular interactions or protein binding. The 3′-UTR of RNA6 had previously been shown to possess translation enhancement activity, attached to either the RNA6 sequence or a reporter gene (37). This activity is contained within nucleotides 1218–1276, therefore it seems very likely that the stacked stem–loop at nucleotides 1218–1238 of SL1 functions as a translation enhancer when placed after the stop codon of an ORF. A conserved stem–loop (SL3) was predicted at the 5′-terminus but with only low-base pairing probabilities (Figure 4A). The SL3 structure is maintained mostly by weak A:U and G:U wobble base pairs. Evaluation of individual MFE structures shows that within all groups of similar sequences, SL3 is predicted in some sequences while a short-LRI helix (LRI-1) is predicted in others. Using SL3 or LRI-1 as folding constraints, RNAfold predicted two MFE structure models for the bovine rotavirus UK strain with a very small free energy difference of 1.50kcal/mol. The major conserved LRI-2 was extended with the formation of a 3-bp-short helix in model 1 (Figure 4A), while a 4-helix junction is formed between LRI-1, LRI-2, SL1 and SL2 in Model 2 (Figure 4B). Because the 5′-terminus of all rotavirus RNAs contains a cap structure (38), we carried out an additional ConStruct analysis with the first three bases 5′-GGC unpaired to mimic interaction with a cap-binding protein. With this small change the consensus structure prediction switches to the formation of LRI-1 at low base pairing probability.
In RNA9, both SL1 (Figure 5A) and the alternative LRI (Figure 5B) structures can be formed involving extremely conserved sequences (Figure 5C and D). The low-nMPD region in the N-terminal of NSP3 ORF (amino acids 1–3, nucleotides 35–43) was also found to be completely within the 5′-strand of the LRI. In the bovine rotavirus UK strain, both the long- and short-range interactions involve a total of 27bp with identical numbers of 11G:C, 13A:U and three G:U pairs. The prediction of these two mutually exclusive conformations with equal thermodynamic stability suggests the presence of a molecular switch in RNA9.
Two rotavirus proteins, VP1 and NSP3, are known to specifically bind to the 3′CS of all rotavirus positive strand ssRNAs. VP1 interacts with the 3′CS UGUGACC-3′ (39,40), while NSP3 interacts with the four base sequence GACC-3' at the 3′-termini of rotavirus RNAs (41,42). Because the 3′-strand of SL1 involves the first 4nt of the 3′CS, we speculated that the switch from the SL1 to the LRI structure may be induced by VP1 binding, causing the destabilization of the lower helix of SL1, while NSP3-binding to GACC-3′ may not be sufficient to cause the destabilization of SL1. To test the hypothesis in silico, alignments of RNA9 sequences with a deletion of the entire 3′CS (mimicking VP1-binding) or GACC-3′ (mimicking NSP3-binding) were analyzed by ConStruct. When UGUGACC-3′ was deleted a large part of the LRI structure (base pairs from 22:1021 to 42:1003) was predicted with medium to low-base pairing probability (Figure 5B), while the SL1 structure was predicted when GACC-3′ was deleted. To evaluate the effect of the formation of SL1 or LRI on the overall structure of RNA9, MFE structures for the RNA9 of the bovine rotavirus UK strain were predicted using RNAfold with folding constraints forcing the formation of SL1 or LRI. In the SL1 model (Figure 5E), the 5′-terminal sequence was found to form another long-LRI structure, and a long-single-stranded region is formed at nucleotides 970–997. In the long-LRI model (Figure 5F), weak helices were formed at the 5′-terminal region. Interestingly, no ‘hinge region’ was found to isolate the switch region from sequences in the central region of the RNA. Instead, the central region is predicted to undergo extensive structural rearrangement.
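The in silico deletion used to mimic protein binding amounts to trimming a known 3′-terminal motif from each sequence before re-analysis. The sketch below shows this step only; the function name `delete_3prime` is hypothetical and it assumes ungapped sequences that end exactly with the motif.

```python
def delete_3prime(seq: str, motif: str) -> str:
    """Remove a 3'-terminal motif from an RNA sequence.

    DNA-style input (T) is converted to RNA (U) first. A ValueError is
    raised if the sequence does not end with the motif, so that strains
    lacking the consensus are not silently truncated.
    """
    rna = seq.upper().replace("T", "U")
    if not rna.endswith(motif):
        raise ValueError("sequence does not end with the expected motif")
    return rna[:-len(motif)]


# Motifs named in the text.
CS3_FULL = "UGUGACC"   # entire 3'CS, deleted to mimic VP1 binding
CS3_NSP3 = "GACC"      # last four bases, deleted to mimic NSP3 binding
```

Applying the function with `CS3_FULL` or `CS3_NSP3` to every sequence of an alignment gives the two truncated data sets whose consensus structures are compared in the text.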
As a consequence of the switch from SL1 to the long-LRI conformation, the accessibility of the 3′-terminal region is much higher in the LRI model. To test if there are possibilities of intermolecular RNA–RNA interactions between this 3′-accessible region and 5′-regions of other RNA segments, RNAcofold was used to predict structures between nucleotides 1026–1076 of RNA9 and nucleotides 1–80 of all RNA segments of the bovine rotavirus UK strain. The highest sequence complementarity and lowest free energy were found between nucleotides 1–45 of RNA8 and nucleotides 1030–1076 of RNA9 (Figure 5G). A large part of this potential interaction is formed between the loop region (nucleotide 22–48) of the conserved stem–loop in RNA8 (Figure 3A) and the discontinuous loop region of SL1 (nucleotide 1017–1054) in RNA9, raising the possibility of a kissing-loop interaction between the two RNA segments which may trigger the switch in conformation in RNA9. However, the RNAcofold algorithm does not allow the prediction of kissing loop interactions.
ThenMPD analysis identified a short-low-scoring region at the 5′-terminus of the VP4 (amino acids 1–7). However, prediction of the terminal structure using ConStruct yielded a consensus structure with a 5′ terminal stem–loop and two short helices with a lower base pairing probability than in other segments (Supplementary Figure S4). It is possible that different strains adopt different secondary structures due to high-nucleotide variability at the 3′-terminal region. We divided the RNA4 sequences into three large clusters of genotypes based on their phylogenetic distances and analyzed them separately using ConStruct. The ‘Wa’ cluster contains the most common human P genotypes P, P, P and P, the ‘AU-1’ cluster contains P, P and P; while the ‘SA11’ cluster contains P, P and P, mostly animal rotavirus strains. By mapping the base pairing probability to the MFE structure of representative strains in each cluster (the Wa cluster: human P ST3 strain; the SA11 cluster: bovine P UK strain; the AU-1 cluster: human P AU-1 strain), it became obvious that different RNA4 clusters adopt different LRI structures at the terminal regions (Figure 6). In each cluster, the conserved LRI was found to extend beyond the 5-bp consensus LRI, and the 5′-terminal stem–loop is always predicted. Despite the difference in positions of bulges, most nucleotides in the 3′-UTR are base paired with the 5′-terminal region in all clusters to form a similar terminal structure. The low-structural conservation and high-codon conservation at the terminal regions of RNA4 suggest that the much more conserved 5′-strand may be involved in other functions.
We analyzed the in vitro secondary structure of the shortest segment, RNA11, by biochemical structure probing followed by primer extension (Figure 7). The presence of the H1, H2 and H3 LRIs is confirmed by many cleavage sites on both the 5′- and 3′-strands produced by RNase V1, which cleaves specifically within regions with helical backbone conformation (31). Because helical backbone conformation may extend to several nucleotides adjacent to a helix, RNase V1 cleavages are also observed in nucleotides between helices or nucleotides flanked by long helices (31). The presence of the 5′-terminal stem–loop is confirmed by RNase I data.
Based on both the biochemical structure mapping data and the consensus structure models derived from ConStruct and RNAz analyses, we built a complete secondary structure model for the RNA11 of the bovine rotavirus UK strain. The first step was to map the RNase cleavage sites to different models including the consensus model from ConStruct, MFE predictions from high-scoring windows in RNAz analysis, and the MFE structures predicted by RNAfold and Mfold. This led to a selection of helices and stem–loops which are compatible with the structure mapping data. By joining different combinations of these substructures together, a model with maximal agreement of the biochemical data, conservation data and thermodynamics was derived (Figure 8). Many of the structures from the ConStruct model are compatible with the biochemical data and are also present in this strain-specific model. In the H2 helix, only the 11 bp predicted with high probabilities by ConStruct are included in this model (base pairs from 213:480 to 223:470), because the MFE structure by RNAfold predicts the formation of a 7-bp helix adjacent to H2 which gives an overall higher thermodynamic stability and agrees with the RNase V1 cleavages at A207 and C209.
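The idea of ranking candidate models by their agreement with probing data can be sketched as follows. This is a hypothetical scoring scheme, not the procedure used to derive Figure 8: a helix-specific (RNase V1) site counts as consistent if it lies in or within `flank` nucleotides of a paired position, and a single-strand-specific (RNase I) site counts as consistent if it is unpaired. The function names, the `flank` parameter and the toy structures are assumptions.

```python
def paired_positions(dot_bracket):
    """Return the set of 1-based paired positions in a dot-bracket string."""
    stack, paired = [], set()
    for i, ch in enumerate(dot_bracket, 1):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' in structure")
            paired.update((i, stack.pop()))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return paired


def agreement(dot_bracket, v1_sites, ss_sites, flank=1):
    """Count probing sites consistent with a candidate structure.

    v1_sites: 1-based positions cleaved by a helix-specific nuclease.
    ss_sites: 1-based positions cleaved by a single-strand-specific nuclease.
    flank:    tolerance (nt) around helices for helix-specific cleavage
              (hypothetical parameter).
    """
    paired = paired_positions(dot_bracket)
    near_helix = {p + d for p in paired for d in range(-flank, flank + 1)}
    ok_v1 = sum(1 for s in v1_sites if s in near_helix)
    ok_ss = sum(1 for s in ss_sites if s not in paired)
    return ok_v1 + ok_ss


if __name__ == "__main__":
    # Toy 12-nt hairpin versus a fully unpaired alternative.
    candidates = ["((((....))))", "............"]
    v1, ss = [2, 10], [6, 7]
    for model in candidates:
        print(model, agreement(model, v1, ss))
```

In practice such a count would be only one criterion alongside conservation and thermodynamic stability, as described above.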
Many interesting features are found in the complete structure of RNA11. First, the sequence immediately before the start codon of NSP6 is highly accessible, strongly supported by RNase I cleavages at nucleotide 74–80. Second, there are many repeat sequence motifs. For example, 5 CUUC motifs are found at nucleotide 49–52 (within H1), 53–56, 62–65, 89–92 and 218–221 (within H2); while 5 GAAG motifs are found at 183–186, 472–475 (within H2), 555–558, 582–585 and 607–610 (within H1). Interestingly, all the CUUC and GAAG motifs are base paired with each other and the CUUC motif is always on the 5′-strand of the resulting helix. Third, a highly conserved palindromic sequence motif with extremely high GC content, 5′-G640GGAGCUCCC649-3′, is found in the small stem–loop at nucleotide 640–656 in the 3′-terminal region. In retroviruses, autocomplementary motifs underpin intermolecular RNA pairing (43) and hepatitis C virus may have a similar mechanism (44). Fourth, the region between the 3′-strands of the H2 and H3 helices is highly conserved and is highly accessible in all MFE and consensus models. Lastly, a tertiary interaction between nucleotides 391–419 and nucleotides 502–526 (triangles in Figure 8) was suggested by RNAz from two overlapping hybrid alignments with P=0.47 and 0.39 (Supplementary Figure S5). This interaction was evaluated using ConStruct by joining subsection alignments of the two regions, and large regions of sequence complementarity can be found between the two regions in most strains. However, a single consensus structure cannot be determined because of the varying position of interaction sites in different strains. We notice that many base pairs in this proposed tertiary interaction are formed between regions of high accessibility in our secondary-structure model with no conserved structure present.
The long single-stranded region between 5′-strands of H2 and H3 may allow enough flexibility for H3 to flip towards H2 to form this tertiary interaction, bringing the start and stop codons of NSP6 into close proximity. More structure mapping data will be needed to prove whether this proposed tertiary structure can be formed.
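Two of the sequence features described above can be checked with a few lines of code: CUUC and GAAG are reverse complements of each other, which is why they can pair, and the motif GGGAGCUCCC is its own reverse complement (palindromic in the nucleic acid sense). The following is a minimal sketch; the helper names and the short test string passed to `find_motif` are assumptions for illustration, not the RNA11 sequence.

```python
# Translation table for RNA complementation (A<->U, G<->C).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string over A, C, G, U."""
    return rna.translate(COMPLEMENT)[::-1]


def is_self_complementary(rna):
    """True if the sequence equals its own reverse complement."""
    return rna == reverse_complement(rna)


def find_motif(rna, motif):
    """Return 1-based start positions of all occurrences, overlaps included."""
    hits = []
    i = rna.find(motif)
    while i != -1:
        hits.append(i + 1)
        i = rna.find(motif, i + 1)
    return hits


if __name__ == "__main__":
    print(reverse_complement("CUUC"))           # the pairing partner of CUUC
    print(is_self_complementary("GGGAGCUCCC"))  # motif at nucleotide 640-649
    print(find_motif("AACUUCGGCUUC", "CUUC"))   # toy sequence, not RNA11
```

A self-complementary motif can in principle pair with a second copy of itself on another molecule, which is the basis of the intermolecular pairing mechanism mentioned above for retroviruses.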
This detailed and extensive study has evaluated sequence, codon and structural conservation in all 11 rotavirus RNA segments using all the available complete published sequences. All segments show structurally conserved elements and in some we have identified different alternative stable conformations. We also derived a complete in vitro secondary-structure model for RNA11 from a combination of in silico data and biochemical structure mapping. These data not only suggest the presence of cis-acting elements within the highly structured rotavirus (+)RNAs, but also illustrate the potential dynamics and multifunctionality of these RNA structures. Both sequence conservation and covariation are important in maintaining these conserved structures. Where strict sequence conservation is observed it is likely to be the result of multiple functional constraints. We were able to use knowledge of the functions of certain rotavirus proteins (VP1, VP6, NSP2 and NSP3), along with the RNA structures we identified, to develop hypotheses linking these structures to particular functions.
For RNA conservation analyses, the ConStruct and RNAz programmes both have advantages and disadvantages. ConStruct allows the detailed analysis of conservation, stability and dynamics of structures within intact RNA, and enables analyses of a large number of sequences while applying sequence weighting to minimize sampling bias. However, ConStruct is impractical for in silico analyses of larger RNAs and does not easily identify tertiary interactions and alternative conformations involving mutually exclusive structures. In contrast, RNAz allows high-throughput genomic screening of regions containing conserved local structures. By using hybrid alignments, we can detect potential LRIs, intermolecular interactions, mutually exclusive structures and tertiary interactions. However, the SVM model in RNAz was not designed for such a purpose, and the incorporated folding algorithm cannot directly predict some tertiary interactions such as kissing hairpins due to their pseudoknot character. Our analysis showed that RNAz is still highly useful in suggesting candidate functional structures because nearly all of the conserved interactions predicted by ConStruct with high base-pairing probabilities lead to a high P-score of ≥0.5 (and mostly >0.9) in the long-range RNAz analysis. Further developments of the RNAz programme may allow more accurate predictions of LRIs and other RNA–RNA interactions. We showed that the two in silico tools, when used together, are very powerful in detecting potential local structures and LRIs. This highlights the importance of cross-validation of results obtained by different methodologies.
Previous studies of two single rotavirus sequences from separate strains have suggested that a terminally unpaired long-helical ‘panhandle’ may link the 5′- and 3′-termini in RNA8 and RNA11 (9,10). We found such structures, with evidence of evolutionary conservation, in all rotavirus RNA segments. Predicted conserved LRIs in the terminal regions of all RNA segments do not result from bias within the folding algorithms as they are strongly supported by sequence and codon conservation data. These LRIs may facilitate RNA circularization, which has been observed in many other viruses including HIV (45), orthomyxoviruses (46–52), enteroviruses (53) and hepatitis C virus (54). Circularization may be important for a number of steps in the rotavirus life cycle such as RNA replication (10,55), translation (9,10,55) or genome assortment. NSP3 interacts with the GACC-3′ 4-nt sequence at the 3′-termini of rotavirus RNAs as well as with the cap-binding eIF4GI (56); however, these interactions may be insufficient for efficient circularization of the RNA molecule and the terminal LRIs may approximate the 5′- and 3′-terminal regions before the NSP3–eIF4GI interaction can occur. Furthermore, several terminal LRIs, including those in RNA4, RNA8 and RNA11, link the stop codon to the start codon via a long stacked helix and may have additional roles in translation reinitiation. LRI structures may also function to prevent 5′- and 3′-terminal sequences from base pairing with sequences in the central region of the RNA and thus maintain the structural integrity of functional terminal stem–loops and accessibility of sequences at the termini for intermolecular interactions. For example, replication of rotavirus (+)RNAs requires the binding of UGUG from the 3′CS UGUGACC-3′ (39,40) and possibly the 5′-cap structure to the viral RdRP VP1 (57,58).
The in vitro structure mapping data for RNA11 (Figure 7) support a previously proposed stem–loop formed by 12 nt at the 5′-terminus (9). Similar 5′-terminal structures were predicted by ConStruct in RNA1, 2, 4 and 6 (Supplementary Figure S4). However, alternative terminal conformations involving a short-LRI helix were predicted by ConStruct for RNA6 and RNA11. The small difference in free energy between the 5′-stem–loop and the short-LRI conformations suggests a flexible terminal structure which may adopt different conformations in vivo at different stages of the life cycle such as translation and replication. The 3′CS is likely unpaired in the conformation adopted during replication (10,40,59).
The proposed structural rearrangement between two alternative conformations of RNA9 is striking. Despite its extremely high thermodynamic stability and sequence conservation, SL1 is incompatible with RNA replication since the UGUG nucleotides of the 3′CS form part of a highly stable helix. This is good evidence for the existence of distinct, function-specific structural conformations at different stages of the viral life cycle. Such a model may explain the presence of an extensive predicted molecular switch in RNA9. SL1 formation may be important for translation of NSP3, as supported by the compatibility of SL1 with NSP3-binding and the shortened distance between the start and stop codon in the SL1 conformation. Replication requires switching from the SL1 to the LRI conformation triggered by VP1 binding to the 3′CS. Biochemical analyses will be needed to address the validity of such models.
The biochemical structure mapping analysis of RNA11 validated the presence of all conserved structures predicted by ConStruct at high probability and many others at medium to low probability. The presence of two highly conserved internal LRI helices, H2 and H3, is unique to RNA11 and may be particularly important to ensure correct folding of RNA11 and to isolate the intervening sequence between them, such that it folds independently of the remainder of the structure. This is most obvious between H2 and H3 where the formation of the two LRIs creates a highly structured region on the 5′-side and a highly unstructured region on the opposite side. The co-localization of conserved LRIs and isolated codon-conservation regions suggests that RNA sequence is important beyond the maintenance of two ORFs.
Several structures suggest translational control mechanisms. In RNA9, a second upstream initiation codon (AUG/CUG) exists at nucleotide 26 in frame with the probable authentic AUG at nucleotide 35. This, together with the presence of strong structure, raises the possibility that translation initiation of NSP3 is mediated through an IRES (60,61). The identification of the conserved SL1 in RNA6 has provided us with structural insights into the previously described translation enhancement activity of the 3′-UTR of this RNA segment (37). IRESs are usually found in viral 5′-UTRs but there are well characterized examples of translational elements in the 3′-UTR such as the cap-independent translation elements found in many plant viruses (62). Because of the abundance of G:C base pairs, a possible function of the small stem–loop at the tip of SL1 may be to force ribosomal disassembly and facilitate the recycling of 40S ribosomal subunits at the 5′-cap. The conserved internal sequences and structures within RNA11 may be linked with the expression of the second ORF encoding NSP6. Currently, the translation mechanism used to express this protein is not known. Pulse/chase radio-labeling of virus-infected cells using antiserum against NSP6 showed that it is expressed at a low but steady rate throughout the replication cycle and localizes to viroplasms within the cytoplasm (63). Expression in rabbit reticulocyte lysate found the NSP5 and the NSP6 ORFs to be translated into proteins of the expected size and to be expressed in equimolar quantities (data not shown). Neither the NSP5 nor the NSP6 start codon is found in the optimal Kozak start codon context (GCC(A/G)CCAUGG) (64,65). Expression of NSP6 is likely to be initiated through leaky scanning, with a proportion of ribosomes passing through the NSP5 start codon and starting translation at the NSP6 start codon.
The potential tertiary interactions bringing the start and the stop codons of NSP6 in close spatial proximity and the high accessibility of sequence immediately before the NSP6 start codon suggest a ribosomal shunting reinitiation mechanism (66), as proposed for cauliflower mosaic virus (67), prototype foamy virus (68), Sendai virus (69) and adenovirus (70).
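Two of the checks underlying this discussion, whether two codons share a reading frame and how closely a start codon matches the Kozak context GCC(A/G)CCAUGG, can be expressed as a minimal sketch. The scoring below simply counts matching flanking positions (six upstream and one downstream of the AUG); it is an assumed illustrative measure rather than an established metric, and the function names and test sequences are hypothetical.

```python
# Kozak context quoted in the text, with R standing for A or G.
# The AUG occupies 0-based positions 6-8 of this pattern.
KOZAK = "GCCRCCAUGG"


def in_frame(pos_a, pos_b):
    """True if two codon start positions lie in the same reading frame."""
    return (pos_b - pos_a) % 3 == 0


def kozak_matches(rna, aug_index):
    """Count flanking positions (max 7) matching the Kozak context.

    aug_index is the 0-based index of the A of the AUG in rna.
    Positions falling outside the sequence are not counted.
    """
    if rna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    score = 0
    for k, cons in enumerate(KOZAK):
        if 6 <= k <= 8:  # skip the AUG itself
            continue
        pos = aug_index - 6 + k
        if 0 <= pos < len(rna):
            base = rna[pos]
            if base == cons or (cons == "R" and base in "AG"):
                score += 1
    return score


if __name__ == "__main__":
    # Codons starting at nucleotides 26 and 35 of RNA9 are 9 nt apart.
    print(in_frame(26, 35))
    # Toy contexts: a perfect match and a poor one (not rotavirus sequences).
    print(kozak_matches("GCCACCAUGG", 6), kozak_matches("UUUUUUAUGU", 6))
```

A weak context at the first AUG is what makes leaky scanning plausible, since a proportion of scanning ribosomes would be expected to bypass it.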
Packaging and replication of rotavirus (+)RNA are most likely controlled by the interaction between VP1 and the 3′CS (57); however, there is currently no mechanism to explain the precision of genome assortment which ensures that one copy of each RNA segment is packaged per virion. In bacteriophage Phi6, which has a genome of 3 dsRNA segments, the availability of an in vitro assembly system (71,72) led to the proposal of a mechanism involving capsid expansion, exposure of RNA-binding sites on the capsid, and sequential segment packaging (73,74). In rotavirus, the requirement of 11 different RNA-binding sites makes this model unlikely. It has been proposed that rotavirus RNA segments form inactive prereplication complexes with VP1 (57), and it is possible that these complexes are linked to each other through RNA–RNA interactions. Because many RNAs have strong intra-molecular structures, the helix-destabilizing activity (75) of NSP2 in viroplasms may assist strand exchange reactions by reducing the kinetic barrier between intra- and intermolecular interactions to form inter-segmental linkages. Our data from RNA9 suggest that the formation of a prereplication complex and establishment of RNA–RNA interactions are likely to involve structural changes. Further in silico and in vitro analyses are needed to test these hypotheses.
The consistent finding of conserved structures at the segment termini suggests that they may contribute to packaging specificity. Reoviruses efficiently package internally deleted genomic RNA segments, indicating that the packaging signals are located within the 200 nt at the 5′- and 3′-terminal regions (76,77). For influenza virus, another virus with a segmented RNA genome, specific packaging-signal regions of 20–200 nt in length were identified in segments encoding NA (78), HA (79,80), NS (81), PB2, PB1 and PA (82–84). Codon variability studies suggested that the signals extended into the coding region from both 5′- and 3′-ends (15) and segment specificity of the signals was shown to influence influenza A virus segment re-assortment (85). Codon variability studies revealed 30–90 nt sequences of low variability at and near most of the segment termini where mapping had suggested packaging signals and, in addition, previously unidentified conserved areas in the ORFs for PA, MA and PB2 (15).
This first comprehensive combined computational and biochemical analysis of rotavirus RNA has generated novel and unexpected insights into the pervasiveness, complexity and dynamic nature of RNA structures within the genome. A number of the questions raised by the data are currently under investigation; others await development of a universally applicable reverse genetics system.
Supplementary Data are available at NAR Online.
Wellcome Trust (grant WT082031MA to A.M.L. and U.D.); Biomedical Research Centre (grant RG52162 to A.M.L.); Studentship from the Medical Research Council (to J.C.vK.); Royal Society University Research Fellowship (to J.R.G.). Funding for open access charge: Wellcome Trust (grant WT082031MA to A.M.L. and U.D.).
Conflict of interest statement. None declared.
The authors would like to thank Suzanne Diston for secretarial assistance; Malcolm McCrae, University of Warwick, UK, for kindly providing the plasmid containing the RNA11 sequence of the bovine rotavirus UK strain; and Gerhard Steger, University of Düsseldorf, Germany, for advice on installing and using the ConStruct RNA consensus structure prediction package.