Many viruses contain RNA elements that modulate splicing and/or promote nuclear export of their RNAs. The RNAs of the major human pathogen, hepatitis B virus (HBV), contain a large (~600 bases) composite cis-acting 'post-transcriptional regulatory element' (PRE). This element promotes expression from these naturally intronless transcripts. Indeed, the related woodchuck hepadnavirus PRE (WPRE) is used to enhance expression in gene therapy and other expression vectors. These PREs are likely to act through a combination of mechanisms, including promotion of RNA nuclear export. Functional components of both the HBV PRE and WPRE are 2 conserved RNA cis-acting stem-loop (SL) structures, SLα and SLβ. They lie within the coding regions of the polymerase (P) gene, and of both the P and X genes, respectively. Based on previous studies using mutagenesis and/or nuclear magnetic resonance (NMR), here we propose 2 covariance models for SLα and SLβ. The model for the 30-nucleotide SLα contains a G-bulge and a CNGG(U) apical loop, of which the first and the fourth loop residues form a CG pair and the fifth loop residue is bulged out, as observed in the NMR structure. The model for the 23-nucleotide SLβ contains a 7-base-pair stem and a 9-nucleotide loop. Comparison of the models with other RNA structural elements, as well as similarity searches of the human transcriptome and viral genomes, demonstrates that SLα and SLβ are specific to HBV transcripts. However, they are well conserved among the hepadnaviruses of non-human primates, the woodchuck and the ground squirrel.
HBV is a widespread causative agent of human liver disease, with an estimated 240 million people currently infected. HBV is the smallest DNA virus but has characteristics of both DNA and RNA viruses. Because of the restrictions of its small genome size (~3.2 kb), HBV has extensive overlapping open reading frames (ORFs) encoding its proteins (Fig. 1).1 HBV is the type member of the mammalian hepadnaviridae. Related viruses of birds form the avihepadnaviridae. HBV RNAs are transcribed by cellular RNA polymerase II. They begin at different transcription start sites but end at a common non-canonical polyadenylation signal.2 Remarkably, HBV has 3 major greater-than-genome length intronless transcripts: the pregenomic RNA (pgRNA), preC RNA (pcRNA), and long X RNA (lxRNA).3 Although these 3 transcripts contain all the HBV ORFs, they are not translated into all the major HBV proteins. The pgRNA encodes the core (C) protein and polymerase (P) protein. This pgRNA also serves as the template for the synthesis of partially double-stranded HBV genomic DNA by the P protein. The longer pcRNA encodes the precore (also known as preC and PC) protein that forms the HBeAg (hepatitis B e antigen) after post-translational cleavage events. The lxRNA encodes a transcriptional trans-activator, the X protein.3 This X protein is also encoded by the shorter X mRNA. Other HBV subgenomic transcripts are the preS1, preS2 and S mRNAs that encode 3 surface (S) proteins, the structural proteins of the virus particles and subparticles (HBsAg).
Notably, all the HBV transcripts contain most of the cis-acting PRE including 3 conserved RNA structures: the stem-loop α (SLα) and SLβ of the PRE at the 3′ end, and the epsilon element (ε) at the 3′ end or both ends (Fig. 1).4 The ε element located at the 5′ end of the pgRNA is necessary for viral replication and packaging, whereas the ε element located at the 3′ end of the transcripts may have different functions and a slightly different conformation. The 5′ ε structure forms part of an RNA family ‘clan’.5 In contrast, SLα and SLβ are not present in avian hepadnavirus transcripts. The avian hepadnaviruses also lack the X ORF that harbors most of the PRE sequence.
The HBV PRE can enhance the expression of HBV transcripts and promote the nuclear export of unspliced HBV subgenomic RNAs via a CRM1-independent pathway (see refs. 2 and 7 for reviews).2,4,6,7 In contrast, the PRE is not involved in the nuclear export of the full-length pgRNA, which uses a TAP/NXF1-dependent pathway.8 There are also spliced products of the pgRNA whose role is poorly understood.2,9 In the cytoplasm, HBV transcripts serve as mRNAs to produce HBV proteins.
The HBV PRE (nucleotides 1151-1805; RefSeq: NC_003977.1; GenBank: X04615.1) contains 2 major nuclear export elements, recently named the sub-element of PRE 1 (SEP1) (nucleotides 1590-1705) and SEP2 (nucleotides 1239-1442) (Fig. 2A).10 The SEP1 element binds directly to the ZC3H18 protein and recruits the TREX complex and other cellular RNA export factors including ARS2, Acinus and Brr2 for nuclear export. SEP2 can be further divided into 2 sub-elements, PREα and PREβ, which encompass SLα (nucleotides 1292-1321) and SLβ (nucleotides 1411-1433), respectively.4 However, SLα and SLβ can function independently in nuclear export. Minimum PRE fragments containing nucleotides 1278-1340 and 1347-1457 are sufficient to provide the nuclear export function of SLα and SLβ. Disrupting the base-pairs of the stem-loops by mutagenesis significantly reduced their nuclear export activity, which could be restored by compensatory mutations. In addition, SLα (which contains a weak 3′ splice site at position 1305), in conjunction with an upstream RNA stem-loop (which contains a weak 5′ splice site at position 458), can allow the preS2/S mRNAs to escape over-splicing.11 Thus, it has been postulated that cellular factors bind to SLα to prevent spliceosome assembly. However, the cellular splicing and nuclear export factors that bind to SLα and SLβ have yet to be determined. Additionally, the PRE promotes the splicing and stability of the pgRNA but not the nuclear export of this RNA.9,12 Intriguingly, the splicing regulatory element-1 (SRE-1) (nucleotides 1252-1348) of the PRE, which encompasses SLα, is an exonic splicing enhancer for the pgRNA. Together, these observations indicate that SLα and SLβ are involved in the complex splicing and/or nuclear export of HBV transcripts.
The woodchuck hepadnavirus PRE (WPRE) is more potent than HBV PRE, probably due to the presence of an additional third RNA element PREγ.13 Consequently, WPRE has been widely used in various vectors to promote expression of transgenes.14,15 The WPRE can functionally substitute for HBV PRE for the nuclear export of unspliced HBV subgenomic transcripts.13 As with HBV SLα, disrupting the base-pairs of the WPRE SLα using mutagenesis reduced the nuclear export activity significantly. However, there is insufficient evidence to support the presence of conserved RNA structure in the PREγ.
Sequence conservation analyses of human HBV genotypes A-H (n = 32)16 and woodchuck hepadnavirus (n = 9) sequences show that SLα and SLβ are highly conserved RNA structures (Fig. 2B and 2C). Based on previous studies using nuclear magnetic resonance (NMR) (PDB: 2JYM)17 and mutagenesis,4,13 we created the seed alignments for SLα and SLβ (see the Stockholm files in the Supplemental Material). We then built the covariance models for SLα and SLβ using their seed alignments and compared the covariance models with other Rfam models using CMCompare.18 Relatively low pairwise-scores were obtained from the model comparisons, i.e., 12.1 or below, indicating that no similar RNA families were present in the database. However, it is notable that the comparison of SLα and SLβ models with other Rfam models returned a few stem-loop-like structures: human T-cell lymphotropic virus ribosomal frameshift site (RF01790) and retroviral readthrough element 2 (RF01092), respectively, with pairwise-scores of 9.8 and 10.7, respectively.
We next examined the sensitivity and specificity of the SLα and SLβ models using 41416 complete viral genomes (6495 mammalian hepadnaviruses and 34921 other viruses; retrieved from GenBank on 24 February 2016). Using the score thresholds obtained from 5-fold cross-validation, both the SLα and SLβ models achieved 100% specificity, with slightly lower sensitivities of 99.2% and 99.4%, respectively (see Tables S1 and S2). Besides detecting the hepadnaviruses from human and woodchuck, the models also identified both SLα and SLβ in non-human primate and ground squirrel hepadnaviruses. The searches did not return results above the score thresholds for the distantly related bat hepadnaviruses or for some HBV sequences (potentially non-viable) with a deleted X gene or truncated and/or mutated P and/or X genes. Taken together, the results suggest that SLα and SLβ are evolutionarily conserved RNA structures in most mammalian hepadnaviruses but not in bat hepadnaviruses or the more distantly related avian hepadnaviruses.
On the other hand, searching the human transcriptome (retrieved from RefSeq Release 74 on 1 March 2016) using the models did not return matches above the score thresholds, suggesting that similar RNA elements are not present in the host RNAs.
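To make the sensitivity and specificity figures above concrete, the following minimal Python sketch shows how the two metrics are defined once every searched genome has a best hit score and a known label. The function name, the toy scores and the threshold of 20 bits are hypothetical illustrations, not values from this study; the actual thresholds were derived by 5-fold cross-validation (Tables S1 and S2).

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) for hits called at score >= threshold.

    scores: best bit score per genome (hypothetical input format).
    labels: True if the genome is expected to carry the element.
    """
    tp = fn = tn = fp = 0
    for score, is_positive in zip(scores, labels):
        called = score >= threshold
        if is_positive and called:
            tp += 1
        elif is_positive:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative genome")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data only: three genomes expected to contain the element, two that are not.
toy_scores = [35.2, 28.9, 12.0, 8.5, 3.1]
toy_labels = [True, True, True, False, False]
sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=20.0)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

A specificity of 100% therefore means that no non-hepadnavirus genome scored at or above the threshold, while a sensitivity below 100% reflects hepadnavirus genomes that fell below it.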
Compensating base-pair changes are common among the seed sequences of SLα, providing independent support for the existence of the base-pairs. SLα is located within the P reading frame (Fig. 1 and Fig. S1A). There are a total of 3888 possible sequence variants that allow compensatory base-pair changes in SLα while preserving the P amino acid sequence. One hundred and three sequence variants (including missense variants/mutations and single base deletions) are found in mammalian hepadnaviruses in GenBank (n = 6450). Further, the most common sequence variant comprised 24% of the sequences, suggesting that the base-pairs of SLα are constrained by sequence in addition to the P reading frame.
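The idea of counting variants that keep both the protein sequence and the base-pairs can be sketched in a few lines of Python. This is an illustrative sketch only: the 9-nucleotide toy hairpin, the pair list and the decision to accept G-U wobble pairs are assumptions made here, and the sketch is not the procedure used to obtain the count of 3888 reported above.

```python
from itertools import product

# Standard genetic code, built in UCAG order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumption: Watson-Crick pairs plus G-U wobble count as valid pairs.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def synonymous_variants(rna):
    """All sequences encoding the same peptide as `rna` (in frame, length % 3 == 0)."""
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    choices = [
        [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        for codon in codons
    ]
    return ["".join(combo) for combo in product(*choices)]


def keeps_pairs(rna, pairs):
    """True if every (i, j) position pair in `pairs` can still base-pair."""
    return all((rna[i], rna[j]) in VALID_PAIRS for i, j in pairs)


# Hypothetical toy hairpin: 3-base-pair stem (0-8, 1-7, 2-6) and a 3-nt loop.
toy_rna = "GCUAAAAGC"  # encodes Ala-Lys-Ser
toy_pairs = [(0, 8), (1, 7), (2, 6)]
variants = synonymous_variants(toy_rna)
structure_preserving = [v for v in variants if keeps_pairs(v, toy_pairs)]
print(len(variants), len(structure_preserving))
```

Comparing the number of structure-preserving synonymous variants with the number actually observed in sequence databases is one way to ask whether a stem is constrained beyond what the reading frame alone requires.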
The RNA consensus diagram for the seed alignment of SLα (Fig. 2A) shows that the lower 5 base-pairs of the stem are less conserved. Indeed, this is in agreement with the consensus mammalian hepadnavirus sequences in GenBank (Fig. S2A). Interestingly, in 99% of the mammalian hepadnavirus sequences the apical loop of SLα has a U-bulge, which can be represented as CNGG(U). The first loop residue (C) pairs with the fourth loop residue (G), forming a tetraloop and bulging out the fifth loop residue (U). Therefore, the minimum part of the 30-nucleotide stem-loop required for function is most likely the upper 8 base-pairs together with the G- and U-bulges.
Based on mutagenesis, SLβ is likely to be a 23-nucleotide stem-loop which contains a 7-base-pair stem and a 9-residue loop containing mainly U and C with one G (Fig. 2A).4 However, no NMR or crystal structure is available. In contrast to SLα, there are no sequence variations for the stem of SLβ among the seed sequences. This region also encodes both the P and X proteins so its conservation is likely due to the constraints imposed by this (Fig. 1 and Fig. S1B). As a result, the stem of SLβ would be expected to lack compensatory base-pair changes. However, there are few changes that would change the protein sequences (2% in mammalian hepadnaviruses in GenBank (n = 6426); Fig. S1B).
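The stem and loop sizes given for SLβ can be checked against a dot-bracket representation with a short Python sketch. The dot-bracket string below is constructed here from the stated 7-base-pair stem and 9-residue loop; it is not copied from the Stockholm files in the Supplemental Material, and the function names are hypothetical.

```python
def parse_dot_bracket(structure):
    """Return sorted (open, close) index pairs for a dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def hairpin_summary(structure):
    """Length, base-pair count and apical loop size of a single hairpin."""
    pairs = parse_dot_bracket(structure)
    if not pairs:
        raise ValueError("structure has no base-pairs")
    inner_open, inner_close = max(pairs)  # innermost pair has the largest open index
    return {
        "length": len(structure),
        "base_pairs": len(pairs),
        "apical_loop": inner_close - inner_open - 1,
    }


# Assumed representation of SLβ: 7 paired + 9 unpaired + 7 paired residues.
sl_beta = "(" * 7 + "." * 9 + ")" * 7
print(hairpin_summary(sl_beta))
```

The same helper tolerates bulges in the stem, since unpaired residues outside the innermost pair are simply not counted in the apical loop.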
Unlike the HBV PRE, the nuclear export pathways mediated by retrovirus regulatory elements such as the HIV-1 Rev response element (RRE) and the constitutive transport element (CTE) of simian retrovirus type 1 and Mason-Pfizer monkey virus are well characterized.19 The lack of similarity between the PRE and the retrovirus RRE and CTE suggests that the PRE utilizes a distinct nuclear export pathway(s), which is still poorly understood,2,9 and recent evidence indicates that it may function differently in different cell types.20 The evidence for this distinction is as follows. Firstly, inhibiting leucine-rich NES-dependent export disrupted RRE-mediated RNA export (via the CRM1-dependent pathway) but not PRE-mediated RNA export.21 Secondly, TAP/NXF1 binds directly to the CTE (in the TAP/NXF1-dependent pathway) but not the PRE.22 Thirdly, a RanBP1 mutant disrupted the nuclear export pathways for the PRE and RRE but not the CTE.21
Therefore, the proposed models for SLα and SLβ could serve as the basis for understanding the structural tolerance of the RNA stem-loops, which could be the binding sites for cellular splicing and nuclear export factors. Understanding the sequence variations of the RNA stem-loops is also important for antiviral siRNA (small interfering RNA) design, as siRNA targeting the SLα region has been effective in vitro.23-25
The representative set of HBV genotype A-H sequences (n = 32) was taken from a reference alignment used in HBVRegDB.5,16 The annotations on the HBV sequences were manually curated by comparison with the annotation analysis obtained from HBVdb.26 Representative woodchuck hepadnavirus sequences (n = 9) were retrieved from GenBank and manually annotated. Multiple sequence alignment was done using CLUSTALW 2.0.27,28 Phylogenetic analysis was done using HBVseq.29 Using the aligned sequences, CDS (coding sequence) annotations and phylogenetic data as input, sequence conservation analyses were done using the CDS-plotcon webserver and the StructureDist algorithm in SSE 1.1.30,31
Seed sequences of SLα and SLβ were extracted from the aligned sequences. Based on previous studies using NMR (PDB: 2JYM)17 and/or mutagenesis,4,13 Stockholm files of SLα and SLβ were prepared. RNA secondary structures were drawn using R2R. Covariance models were built and calibrated using Infernal. Comparison of the covariance models with other Rfam models was done using the CMCompare webserver.18 Using the cmsearch tool from Infernal, the covariance models were evaluated on 41416 complete viral genomes using 5-fold cross-validation (6495 Orthohepadnavirus and 34921 other viruses; retrieved from the GenBank viral division on 24 February 2016). Similarity searches of the human transcriptome (retrieved from RefSeq Release 74 on 1 March 2016) were done using the score thresholds obtained from the cross-validation step. The covariance models have been submitted to the Rfam database.
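The build, calibrate and search steps described above correspond to Infernal's cmbuild, cmcalibrate and cmsearch programs. The Python sketch below only assembles the command lines so that the order of steps is explicit; all file names and the bit-score threshold are hypothetical placeholders, and the exact options used in this study are not stated in the text.

```python
import shlex


def infernal_commands(seed_sto, cm_path, seq_db, tblout, bit_score_threshold):
    """Return argument lists for a build -> calibrate -> search workflow.

    Only assembles commands; nothing is executed here.
    """
    return [
        # Build a covariance model from a Stockholm seed alignment.
        ["cmbuild", cm_path, seed_sto],
        # Calibrate E-value parameters for the model (modifies the CM file).
        ["cmcalibrate", cm_path],
        # Search a sequence database, reporting hits at or above the bit score.
        ["cmsearch", "-T", str(bit_score_threshold), "--tblout", tblout, cm_path, seq_db],
    ]


# Hypothetical file names and threshold, for illustration only.
commands = infernal_commands(
    seed_sto="SLalpha_seed.sto",
    cm_path="SLalpha.cm",
    seq_db="viral_genomes.fasta",
    tblout="SLalpha_hits.tbl",
    bit_score_threshold=20.0,
)
for command in commands:
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` on a system where Infernal is installed; keeping the steps as data makes it straightforward to repeat them for each cross-validation fold.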
No potential conflicts of interest were disclosed.
CSL is a recipient of a Dr Sulaiman Daud 125th Jubilee Postgraduate Scholarship.