The 3′ splice site of the influenza A segment 7 transcript is used to produce mRNA for the critical M2 ion-channel protein. In solution, a 63 nt fragment that includes this region can adopt two conformations: a pseudoknot and a hairpin. The splice site, a binding site for the SF2/ASF exonic splicing enhancer and a polypyrimidine tract each occupy a different structural context in the two conformations. The most dramatic difference occurs for the splice site. In the hairpin, the splice site lies between two residues that are involved in a 2 by 2 nucleotide internal loop. In the pseudoknot, however, these bases are canonically paired within one of the pseudoknotted helices. The conformational switching observed in this region has implications for the regulation of splicing of the segment 7 mRNA. A measure of stability of the structures also shows interesting trends with respect to host specificity: avian strains tend to be the most stable, followed by swine and then human.
Influenza viruses are divided into three major clades: influenza A, B and C. Of these, influenza A and B are the most dangerous, as they cause seasonal epidemics. There are an estimated three to five million serious infections yearly, resulting in approximately 500,000 deaths.1 The economic cost of seasonal influenza is very steep. In the US alone, the yearly cost is $10.4 billion in direct medical expenses and $16.3 billion in lost wages due to illness or death.2 Even more troubling is the propensity of influenza A strains to cause pandemic outbreaks. Influenza A is one of the major killers of the 20th century. The Spanish Flu of 1918, caused by a pandemic H1N1 avian strain, killed at least 20 million people3 and perhaps as many as 100 million.4 A major factor in the rise of pandemics is the propensity of strains from different host species, e.g., human and avian, to combine and form a novel strain to which humans are immunologically naive.5,6 Re-assortment in influenza is possible because the virus has a segmented genome. Eight negative-sense RNA segments make up the influenza A genome. Viral RNAs (vRNAs) bind multiple copies of the virally encoded nucleoprotein (NP) and the heterotrimeric viral polymerase to form viral ribonucleoproteins (vRNPs), which are uniquely packaged into active virions.
RNA structure plays an important role in the formation of the vRNP and in the replication of the vRNA. The 5′ and 3′ untranslated regions of the vRNAs have conserved complementary regions that allow long-range base pairs to form.7-9 The association of these ends forms the promoter necessary for the initiation of RNA synthesis,10 and RNA structure may influence which positive-sense RNA [(+)RNA] is produced11-14: protein-coding mRNA or template complementary RNA (cRNA), which is used for producing more vRNA. This interaction is stable under physiological conditions15; the particular structure, however, remains controversial, with two competing models: a stem-like “panhandle” structure16,17 or a more complex “corkscrew” model.18 Beyond this region there is little evidence for additional structure in the vRNA. Calculations of global folding free energy indicate that the vRNA is much less stable than the (+)RNA and shows little propensity for maintaining RNA structure in influenza A strains.19
In contrast to the vRNA, the influenza A (+)RNA shows evidence for stable, global RNA structure in four of the eight viral segments: segments 1, 5, 7 and 8.19 Additionally, a survey for local RNA structure revealed 20 regions where the (+)RNA showed a propensity for forming unusually stable and conserved predicted RNA structure.20 Several sites also occurred in regions with suppressed third codon variability, which indicates the possible constraint of maintaining RNA structure.21 Five regions of special interest occurred within or near to functional sites, showed suppressed codon evolution and were within or near to predicted structured regions.20 These sites occurred in segments 8, 7 and 2.
The best characterized of these predicted regions occurs at the 3′ splice site in the segment 7 transcript. Segment 7 encodes the M1 matrix protein and the alternatively spliced M2 ion channel protein as well as the less well-understood M3 polypeptide and M4 protein.22 The conserved structural region encompasses the 63 nt surrounding the splice site and, in addition to the splice site, contains multiple splicing signals (Fig. 1): a binding site for the SF2/ASF exonic splicing enhancer,23 a polypyrimidine tract and a putative branch point signal. This region folds as both a pseudoknot and a hairpin, and each conformation places these splicing signals in different structural contexts (Fig. 1).24 When the RNA is folded in the presence of Mg²⁺, the two conformations are populated in a ratio of roughly 50/50, which implies a similar free energy of folding for each structure. Such a delicate equilibrium may be easily influenced by changes in the cellular environment or the effects of proteins. A similar, but structurally distinct, conformational switch, from hairpin to pseudoknot, was discovered in the 3′ splice site of the influenza segment 8 transcript.25 The structures discovered in segment 8 comprise a distinct family of structured RNAs conserved between influenza A and B.26 Segment 7, in contrast, is not spliced in influenza B, and the pseudoknot and hairpin are not predicted to form in influenza B.
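The link between a roughly equal population of the two conformations and a similar folding free energy can be made explicit with the standard two-state relation. The following is only an illustrative sketch: the two-state assumption, the temperature of 310.15 K (37°C) and the 1 kcal/mol perturbation are assumptions chosen for illustration and are not values taken from the experiments.

```latex
% Two-state equilibrium between hairpin and pseudoknot (assumed model)
\begin{align}
K &= \frac{[\text{pseudoknot}]}{[\text{hairpin}]} \approx \frac{50}{50} = 1, \\
\Delta G^{\circ} &= -RT \ln K \approx 0 .
\end{align}
% Illustration with assumed values: T = 310.15 K and
% R = 1.987 \times 10^{-3} kcal mol^{-1} K^{-1}, so RT \approx 0.616 kcal/mol.
% A hypothetical stabilization of one conformer by 1 kcal/mol would give
\begin{equation}
K' = e^{1/0.616} \approx 5.1, \qquad \frac{K'}{1 + K'} \approx 0.84 .
\end{equation}
```

Under these assumptions, a perturbation of only about 1 kcal/mol would shift the population from roughly 50/50 to roughly 84/16, which is one way to see why such an equilibrium could be sensitive to the cellular environment or to protein binding.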
RNA secondary structure is known to play important roles in alternative splicing.27 In particular, hiding or revealing splice sites28-31 and protein binding sites32-34 are mechanisms used in nature to regulate the splicing of mRNA. The splicing of the M2 mRNA of influenza A is timed to produce this protein late in viral infection, and this product is roughly 5% as abundant as the M1 transcript.35 Thus, RNA conformational switching could be involved in the regulation of the amount and timing of splicing. This raises the possibility of specifically targeting either or both conformations of the 3′ splice site of segment 7 to modulate biological activity or for the application of oligonucleotide36-38 or small-molecule39,40 therapeutics. For example, the hairpin and pseudoknot structures were probed with a library of 861 unique pentamer and hexamer oligonucleotides. The observed binding pattern was unique and specific to each conformation.24
The hairpin conformation and structurally relevant mutations of the 3′ splice site are shown in Figure 2A. P1 and P2 are common to the hairpin and the pseudoknot. P1, which contains the polypyrimidine tract and putative branch point, is less stable and more accessible to enzymes in the hairpin conformation.24 P2 and P3 are separated by a 2 by 2 nucleotide internal loop, and the splice site occurs within the 5′ side of this loop. The nucleotides in this loop likely form non-canonical GA and GG pairing interactions.24 Similar GA/GG loops are observed in the HIV-1 Rev binding domain41 and the ribosomal loop-E motif42 and are important for protein recognition. The P3 hairpin consists of a hexamer terminal loop and a six base pair stem, which is interrupted by a single bulged A at nt 730. Most of the terminal loop and the 3′ half of P3 comprise the key residues for binding the SF2/ASF exonic splicing enhancer protein23 (Fig. 1). This purine-rich binding site extends into the 3′ side of the 2 by 2 nt internal loop and two nts of the P2 stem (Fig. 1).
The hairpin structure is well conserved. Canonical base pairing is, on average, 96.8% conserved (Table 1). Each helix is supported by at least one compensatory (double point mutation that preserves base pairing) or consistent mutation (single point mutation that preserves canonical pairing; Fig. 2A).
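A conservation figure of this kind can be computed from a sequence alignment and a list of paired columns. The sketch below is a minimal illustration and not the procedure used to generate Table 1: the function names, the toy alignment and the pair list are hypothetical, GU wobble pairs are assumed to count as canonical, and averaging the per-pair percentages is an assumed definition of "on average".

```python
# Canonical pairs: Watson-Crick plus GU wobble (assumption: wobble counts as canonical).
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_conservation(alignment, pairs):
    """Return the percent of sequences with a canonical pair at each (i, j).

    alignment: list of equal-length RNA strings.
    pairs: list of 0-based column index tuples (i, j).
    """
    result = {}
    for i, j in pairs:
        n_canonical = sum(1 for seq in alignment if seq[i] + seq[j] in CANONICAL)
        result[(i, j)] = 100.0 * n_canonical / len(alignment)
    return result


def mean_conservation(alignment, pairs):
    """Average the per-pair conservation percentages over all pairs."""
    per_pair = pair_conservation(alignment, pairs)
    return sum(per_pair.values()) / len(per_pair)


# Hypothetical toy alignment of a 3 bp hairpin; NOT real influenza sequences.
toy_alignment = [
    "GGCAAAAGCC",
    "GGCAAAAGCU",
    "GACAAAAGUC",
    "GGCAAAAGAC",
]
toy_pairs = [(0, 9), (1, 8), (2, 7)]

per_pair = pair_conservation(toy_alignment, toy_pairs)
average = mean_conservation(toy_alignment, toy_pairs)
```

In the toy data only the fourth sequence breaks a pair (a GA mismatch at columns 1 and 8), so that pair is 75% conserved while the other two are fully conserved.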
In the pseudoknot, nucleotides that make up P3 in the hairpin conformation are re-arranged to make P3′, and nucleotides 714 to 717 are paired in the P0 helix (Fig. 2B). In this structure the 3′ splice site is canonically paired in the middle of the P0 helix. In addition to four canonical pairs, P3′ has several non-canonical pairing possibilities (Fig. 2B). The loop of P3′ may contain three continuous GA pairs. Continuous stretches of three purine-purine pairs (the 3Rs motif) are especially stabilizing in internal loops.43
P0 and P3′ are both well conserved, with canonical pairing 99.0 and 100.0% conserved, respectively (Table 1). There are five consistent mutations in P0, but only a single change at C720 in P3′. Interestingly, when mutations occur within the loop of P3′, they always lead to purines at these positions (Table 1). This observation supports the potential formation of a structure similar to the 3Rs motif.
The 3′ splice site structured region was initially discovered by identifying parts of the coding RNA with constrained evolution of synonymous sites.20 This implies a need to maintain RNA secondary structure in addition to protein sequence,21 which reduces the allowable synonymous site substitutions. Indeed, the sequence in this structured region has multiple constraints on its evolution: in addition to encoding both the hairpin and pseudoknot structures, it must maintain the M1 open reading frame (ORF) and, after nt 714, the M2 ORF, an SF2/ASF protein binding site, and the polypyrimidine and branch point sequences (Fig. 1). These constraints explain the small number of double point mutations (compensatory changes) that preserve canonical base pairing compared with single point mutations (consistent changes; Table 1). For example, residues 689 and 702 are most often an AU pair (Fig. 2B), but mutate to GU and AC pairs with much higher frequency than the GC double point mutation. In no case do compensatory changes outnumber single mutations, and many single mutations resulted in non-canonical pairs. Mutations from canonical to non-canonical pairs resulted primarily in CA pairs, followed by GA pairs (Table 1). CA and GA pairs are commonly observed non-canonical pairs that are able to maintain helicity in RNA structures44-46; they also play important roles in molecular recognition.47,48 Thus, it may be hard to make double point mutations that do not alter the protein coding sequence, as synonymous sites are rarely paired in these structures.
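The distinction between compensatory, consistent and non-canonical changes can be stated as a small classification rule. The sketch below is illustrative only: the function name and return labels are hypothetical, and GU wobble pairs are assumed to count as canonical. It is applied to the three variants of the 689/702 AU pair mentioned above.

```python
# Canonical pairs: Watson-Crick plus GU wobble (assumption: wobble counts as canonical).
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def classify_pair_change(ref_pair, obs_pair):
    """Classify an observed base pair relative to a reference canonical pair.

    Both arguments are 2-character strings with the 5' base first (e.g. "AU").
    Returns "unchanged", "consistent" (one base changed, pairing kept),
    "compensatory" (both bases changed, pairing kept) or "non-canonical".
    """
    if obs_pair == ref_pair:
        return "unchanged"
    if obs_pair not in CANONICAL:
        return "non-canonical"
    n_changed = sum(r != o for r, o in zip(ref_pair, obs_pair))
    return "compensatory" if n_changed == 2 else "consistent"


# Variants of the AU pair at residues 689 and 702 named in the text.
labels = {obs: classify_pair_change("AU", obs) for obs in ("GU", "AC", "GC")}
for obs, label in labels.items():
    print(obs, label)
```

Under this rule, GU is a consistent change, AC is a non-canonical (CA-type) pair and GC is the compensatory double mutation, matching the usage of these terms in the text.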
An interesting trend was observed when all available influenza A sequences for the 3′ splice site structure were sorted by expected stability. Sequences with the highest fraction of canonical and GC pairs, which are expected to stabilize structure, were overwhelmingly from avian-specific strains.24 As the fraction of canonical and GC pairs decreased, the percentage of avian-specific strains decreased. Sequences with the smallest fraction of canonical and GC pairs were mainly from human-specific strains, and the percentage of human-specific strains decreased as canonical and GC pairing potential increased. Swine-specific strains tended to fall in between. Interestingly, no significant difference was observed between the hairpin and the pseudoknot in terms of host-specific structural stability.23 Perhaps the ratio of each conformation needs to be maintained irrespective of host.
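Sorting sequences by this kind of expected stability can be sketched as follows. This is a hypothetical illustration, not the published analysis: the function names, the toy sequences and the pair list are invented, and ranking first by the canonical-pair fraction and then by the GC-pair fraction is only one assumed way to combine the two quantities.

```python
# Canonical pairs: Watson-Crick plus GU wobble (assumption: wobble counts as canonical).
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}
GC_PAIRS = {"GC", "CG"}


def pairing_fractions(seq, pairs):
    """Return (fraction of canonical pairs, fraction of GC pairs) for one sequence."""
    n_pairs = len(pairs)
    n_canonical = sum(seq[i] + seq[j] in CANONICAL for i, j in pairs)
    n_gc = sum(seq[i] + seq[j] in GC_PAIRS for i, j in pairs)
    return n_canonical / n_pairs, n_gc / n_pairs


def rank_by_expected_stability(records, pairs):
    """Sort (name, sequence) records from most to least stable by the proxy above."""
    return sorted(records, key=lambda rec: pairing_fractions(rec[1], pairs), reverse=True)


# Hypothetical toy records for a 3 bp hairpin; NOT real influenza sequences.
toy_pairs = [(0, 9), (1, 8), (2, 7)]
toy_records = [
    ("seq_c", "GGCAAAAGAC"),  # one GA mismatch
    ("seq_a", "GGCAAAAGCC"),  # three GC pairs
    ("seq_b", "GACAAAAGUC"),  # two GC pairs and one AU pair
]

ranked_names = [name for name, _ in rank_by_expected_stability(toy_records, toy_pairs)]
```

With host annotations attached to each record, the proportion of avian, swine and human strains could then be tallied along the ranked list; a thermodynamic folding calculation would be a more rigorous stability measure than this pair-counting proxy.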
This trend in host-specific structural stability was explored in the context of whole coding region folding thermodynamics and was found to be a general phenomenon in influenza A.19 In general, avian sequences are more stable than human sequences, and swine-specific strains fall in between. This trend parallels the temperature at which the virus must replicate: the human respiratory tract, the swine respiratory tract and the avian gut are at 33, 37 and 42°C, respectively.49 Perhaps the RNA structure is optimized to perform its function at each temperature.
A new structured RNA family has been characterized in influenza A. This family joins the segment 8 structured splice site26 in the growing collection of known influenza RNA structures. Both of these structured regions are proposed to influence splicing of their respective mRNAs. Conformational switching places sites that are functionally important for splicing in different structural contexts. In particular, these sites are expected to be more accessible in the hairpin conformation than in the pseudoknot. Switching between hairpin and pseudoknot may be a conserved mechanism for modulation of splicing in influenza. Such a switch makes an attractive target for RNA therapeutics, as either structure may be specifically targeted with small molecules or oligonucleotides to inhibit the virus.
A seed alignment for this structured region, created by collapsing alignments to only include non-redundant sequences, has been submitted to the Rfam database.
Previously published online: www.landesbioscience.com/journals/rnabiology/article/22343