Like all retroviruses, the Human Immunodeficiency Virus (HIV) selectively packages two copies of its unspliced RNA genome, both of which are utilized for strand-transfer mediated recombination during reverse transcription – a process that enables rapid evolution under environmental and chemotherapeutic pressures. The viral RNA appears to be selected for packaging as a dimer, and there is evidence that dimerization and packaging are mechanistically coupled. Both processes are mediated by interactions between the nucleocapsid (NC) domains of a small number of assembling viral Gag polyproteins and RNA elements within the 5′-untranslated region (5′-UTR) of the genome. A number of secondary structures have been predicted for regions of the genome that are responsible for packaging, and high-resolution structures have been determined for a few small RNA fragments and protein-RNA complexes. However, major questions remain open regarding the RNA structures, and potentially the structural changes, that are responsible for dimeric genome selection. Here we review efforts that have been made to identify the molecular determinants and mechanism of HIV-1 genome packaging.
During the late phase of the viral replication cycle, the Human Immunodeficiency Virus Type-1 (HIV-1) selectively and efficiently packages two copies of its positive-strand, unspliced, 5′-capped and 3′-polyadenylated RNA genome by a mechanism that has been extensively studied (for previous reviews, see1–9) but remains only partially understood. The packaging mechanism efficiently discriminates against the monomeric genome, the spliced viral mRNAs that encode viral accessory and envelope proteins, and the far more abundant cellular mRNAs2. Packaging is mediated by the retroviral Gag proteins, which can efficiently assemble in the absence of their native genomes by incorporating an equivalent amount of cellular RNAs10–14. Although retroviruses can package essentially any RNA (some mutants even package ribosomes14), RNAs containing the appropriate viral packaging signals are efficiently enriched in assembling virions. The Gag protein contains three independently folded domains (from N- to C-terminus): matrix (MA), capsid (CA) and nucleocapsid (NC), as well as three other unstructured but functionally important segments, Figure 1. Genome selection appears to proceed via the direct binding of NC to conserved RNA packaging signals, called Ψ-sites, that are generally located near the 5′-end of the viral RNA. It now seems likely that a ribonucleoprotein complex comprising a relatively small number of Gag molecules and two copies of the genome is trafficked to plasma membrane assembly sites, where several thousand additional Gag molecules localize and assemble to form an immature virus particle15,16. During or shortly after budding, the Gag proteins are cleaved by the viral protease to produce the mature MA, CA and NC proteins, which rearrange to form the mature and infectious virus particle, Figure 1.
The requirement for two genome molecules is intriguing since all other viruses contain only a single copy of their genetic material17. Both RNA molecules are utilized for strand transfer-mediated recombination during reverse transcription18,19, but only one DNA allele is generated, and retroviruses are therefore considered “pseudodiploid.” Recombination appears to enhance viral fitness in several ways. It enables strand transfer-mediated read-through at sites of RNA damage20,21, which could serve as a mechanism for dealing with what appears to be a relatively fragile genome22, and as a defense against restriction nucleases23. In addition, although cells containing a single integrated provirus can only produce homozygous virions, cells from infected individuals have sometimes been observed to contain several integrated proviruses24–26, which probably explains the very high number of circulating recombinant forms of HIV-127. Thus, strand transfer-mediated recombination from heterozygotes likely serves as a primary pathway for the rapid evolution of viruses that are resistant to antiretroviral therapies28. Retroviral genomes exist as weak, non-covalently linked dimers in immature and young virus particles, and the stability of the RNA dimer29–34 increases with virus age, which might be important for subsequent reverse transcription events. The genome also appears to play a structural role in virus assembly, although this function can also be achieved by cellular RNAs13.
As for most other retroviruses, the nucleotides that participate in HIV-1 genome selection appear to reside near the 5′-end of the genome and primarily within the 5′-Untranslated Region (5′-UTR)1–9. Relatively short elements within the 5′-UTR that are independently capable of directing heterologous RNAs into assembling virus-like particles (VLPs) have been identified for some retroviruses (for example, the Rous Sarcoma Virus (RSV)35–40 and Moloney Murine Leukemia Virus (MoMuLV)41,42), but HIV-1 appears to require most of its 5′-UTR43–58 as well as downstream nucleotides within the gag coding region59–61 for optimal packaging efficiency. The 5′-UTR is the most conserved region of the HIV-1 genome62,63 (http://www.hiv.lanl.gov/), and in addition to promoting packaging, it also helps regulate or promote transcriptional activation, splicing, primer binding during reverse transcription, and dimerization. For some retroviruses, elements known to be important for packaging reside downstream of the major 5′ splice donor site (SD), providing a potential mechanism for selecting the full-length genome and ignoring spliced viral mRNAs64. RNA elements that are believed to facilitate HIV-1 genome packaging reside near elements that promote RNA dimerization3,4,6,8, and since both processes are promoted by the NC domain of Gag, it is likely that genome dimerization and packaging are intimately coupled1,2,65. This now certainly seems to be true for the evolutionarily distant MoMuLV66–68. Efforts to elucidate the structural determinants and mechanisms that regulate these activities have been made using a variety of biochemical, in vivo packaging and computational approaches, and there is general consensus that some activities are controlled by well-defined hairpin structures within the HIV-1 5′-UTR48–50,54,57,69–75. However, there is less agreement regarding the structures that regulate genome packaging.
Here we review efforts that have been made to understand the mechanism of HIV-1 genome selection, with emphasis on the RNA and protein structures that appear to play important roles.
There is now considerable evidence that genome selection is mediated primarily by the NC domain of the HIV-1 Gag polyprotein. Substitution of the HIV-1 NC domain by that of MoMuLV leads to the preferential packaging of the MoMuLV RNA76 into HIV-1 derived chimeric virions, and conversely, substitution of the MoMuLV NC domain by that of HIV-1 results in preferential packaging of the HIV-1 genome77. Similar results have been observed for other retroviruses78. Retroviruses belonging to a given genus appear to be more readily capable of packaging each other’s RNAs79–81. Non-reciprocal packaging has also been observed for some combinations of retroviruses82,83, and interestingly, HIV-1 and the mouse mammary tumor virus (MMTV) preferentially package their native RNAs, even when the NC domains of these retroviruses are swapped84. These findings suggest that the NC domains are not uniquely responsible for RNA selection.
Except for the spumaviruses, all retroviral NC proteins contain one or two copies of a conserved CCHC array motif (Cys-X2-Cys-X4-His-X4-Cys; X = non Cys/His amino acid) that was originally discovered by Henderson and co-workers85 and later proposed by Berg to function as a zinc binding site86. Although early biochemical studies led to suggestions that the arrays have weak affinity for zinc87 and do not contain zinc in virions88,89, subsequent in vivo mutagenesis experiments showed that conservative single atom S to O substitutions in the arrays (Cys to Ser) block both viral replication and genome packaging, consistent with a zinc binding function90. Studies with peptides and both recombinant and virus-derived HIV-1 NC protein confirmed that the arrays are capable of binding zinc with high affinity91–94, and zinc-edge Extended X-ray Absorption Fine Structure Spectroscopy (EXAFS) studies of intact retroviruses eventually provided strong evidence that the arrays are populated with zinc in mature particles95,96. These and other studies44,90,97–99 have provided convincing evidence that the NC zinc knuckles play an essential role in selecting the unspliced genome during retrovirus assembly. Compounds that eject zinc from the NC zinc knuckles have potent antiviral activity in cell cultures100, and several antiviral zinc ejectors have been identified100–103. Unfortunately, poor specificity and toxicity have thus far precluded the therapeutic development of this class of antivirals, although they have been used as reagents in structure probing experiments104.
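The CCHC array spacing quoted above (Cys-X2-Cys-X4-His-X4-Cys, with X any residue other than Cys or His) can be written as a simple pattern search. The following is a minimal sketch, not a method used in the studies cited here. The function name `find_cchc_arrays` and the example string `toy` are hypothetical, and the toy string is an illustrative NC-like segment, not a curated sequence.

```python
import re

# Cys-X2-Cys-X4-His-X4-Cys, where X is any residue other than Cys or His.
# The lookahead lets the scan report every start position, even if
# candidate arrays were to overlap.
CCHC_PATTERN = re.compile(r"(?=(C[^CH]{2}C[^CH]{4}H[^CH]{4}C))")


def find_cchc_arrays(protein_seq):
    """Return (start_index, matched_segment) for each CCHC array found.

    Indices are 0-based positions in the one-letter amino acid string.
    """
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in CCHC_PATTERN.finditer(seq)]


# Hypothetical toy string for illustration only (assumption, not source data).
toy = "MQKGNCFNCGKEGHIAKNCRAPRKKGCWKCGKEGHQMKDCTE"

for start, segment in find_cchc_arrays(toy):
    print(start, segment)
```

Replacing the first Cys of an array with Ser in the input string removes the match, which mirrors in a purely textual way the Cys to Ser substitutions described above. The pattern says nothing about zinc affinity or folding.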
HIV-1 genome packaging might also be facilitated by the nucleic acid chaperone property of NC, which is known to promote dimerization of viral RNA33,105,106. Recent studies have shown that dimerization of the MoMuLV packaging signal leads to a register shift in base pairs that exposes high affinity NC binding sites, and that the MoMuLV NC protein facilitates the RNA structural changes associated with dimerization66. An important future goal should be to determine if the HIV-1 leader exhibits similar dimerization-dependent NC binding. NC can also stimulate annealing of the appropriate cellular tRNA primer to the primer binding site (PBS), a requirement for the initiation of reverse transcription107,108. In the presence of NC, pausing of reverse transcriptase (RT) is greatly reduced, thus increasing the efficiency of the synthesis of full-length DNA109,110. NC induces annealing of the strong stop cDNA with the complementary region of the 3′-end of the genomic RNA, and increases the strand transfer efficiency during reverse transcription111–115. Mutational analysis indicates that the N-terminal highly basic residues of NC are crucial110, and the two specific zinc-finger architectures also appear to be required for its nucleic acid chaperone activity. As discussed in a recent review by Levin and coworkers116, the mechanism for the chaperone activity likely involves relatively weak and labile NC-RNA interactions that lower the energy barrier for breaking and/or reforming base pairs117–119.
To date, no X-ray crystal structures have been reported for any CCHC zinc knuckle. However, NMR structures have been reported for several isolated zinc knuckle domains120–123, intact NC proteins96,124–129, and NC:RNA complexes66,128,130–137. The zinc knuckles form highly stable mini-globular domains that are structurally distinct from other CCHC-type modules131,138,139. Observation of weak NOEs between aromatic residues of the N- and C-terminal HIV-1 NC zinc knuckles suggested that the domains pack together in solution126, but an alternative model, in which the zinc knuckles are connected by a flexible tether and interact weakly, was proposed on the basis of NMR relaxation and chemical shift analyses140. The knuckles of the HIV-2 and SIV NC proteins appear to interact tightly with each other in the absence of nucleic acids127,128. Tight inter-knuckle packing is clearly observed for HIV-1 NC upon binding to RNAs with high affinity binding sites (see below)132,133,137.
The MA domain of Gag plays important roles in intracellular trafficking and membrane targeting141–144, and by this means may also be important for genome packaging. In 1993, Arlinghaus and co-workers showed that approximately 18% of the cellular MoMuLV Gag protein is associated with the nucleus, and a role of nuclear Gag in regulating splicing and/or dimerization activities was proposed145. Green and co-workers subsequently reported that mutating two basic residues in the HIV-1 MA domain of Gag can lead to accumulation of Gag and the viral genome in the nucleus146. These mutations also led to a significant reduction in virus particle production, and the particles that were formed were poorly infectious and appeared to contain predominantly monomeric genomes. Earlier studies had shown that the MA domain of Gag contains a nuclear localization signal (NLS)146–150, and these findings suggested that MA also possesses a nuclear export signal (NES)146, and that both signals play a role in the transient shuttling of Gag through the nucleus prior to virus assembly. These findings apparently have not been replicated in other laboratories, which has raised questions about the proposed role of nuclear shuttling by HIV-1 Gag. However, Parent and co-workers reported a similar transient nuclear shuttling activity by the Gag proteins of the Rous sarcoma virus (RSV). In this case, nuclear export could be blocked by specific mutations in MA or by treatment with agents that inhibit the Chromosome Region Maintenance-1 (CRM-1) dependent nuclear export pathway151,152. Interestingly, a mutation in RSV MA that enhanced membrane binding (Myr1Glu) and blocked nuclear shuttling led to the formation of particles with reduced levels of primarily monomeric genomes153,154. These studies suggested that transient nuclear localization of Gag plays a role in genome packaging.
Although arguments have been made that question a nuclear shuttling role for Gag in genome packaging152, Parent and co-workers very recently showed that wild-type levels of RSV genome packaging could be restored by inserting a non-viral NLS into Myr1Glu-Gag154. Thus, although questions remain regarding the ability of HIV-1 Gag to transiently access the nucleus, the more extensive studies with RSV are consistent with a packaging mechanism in which genome recognition occurs in the nucleus, and transient nuclear shuttling of Gag, directed by the MA domain, is required for efficient trafficking of the genome from the nucleus to virus assembly sites on the plasma membrane. Note, however, that very recent studies by Hu and co-workers indicate that dimerization of the HIV-1 genome takes place mainly in the cytoplasm, subsequent to nuclear export155. It is conceivable that different retroviruses use unrelated mechanisms to select their genomes. An alternative (and untested) possibility is that multiple RNA export pathways are available to all retroviruses, and that the dominant pathway either changes temporally with the age of the infected cells or is dependent on conditions employed for the packaging/replication experiments.
The MA domain could also function by interacting directly with the viral RNA. Because they are highly basic, retroviral MA proteins are capable of binding to nucleic acids156–158. Early SELEX experiments identified a short RNA sequence that can bind to MA with high affinity (~10⁻⁹ M), but the sequence did not correspond to sequences within the viral genome and the biological relevance of these findings was not clear159. A subsequent SELEX study by a different group identified a different RNA with high affinity for the MA domain of Gag (~5 × 10⁻⁷ M) that possessed a sequence similar to one in the Pol gene160. Although mutations in Pol designed to disrupt interactions with RNA led to a delay in replication, the relevance of the specific interactions proposed by these studies remains to be firmly established. More recent studies examined the abilities of nucleic acids and liposomes to compete for the myristylated MA protein and the myr-MA domain of a MA-CA construct. These studies showed that liposomes can only compete efficiently for MA if they contain phosphatidylinositol-4,5-bisphosphate (PIP2), a cellular factor required for Gag membrane targeting161,162, suggesting that MA interactions with the viral genome may contribute to membrane selectivity163. Similar conclusions were reached in a subsequent study164. Other studies have shown that the ability of HIV-1 Gag to promote tRNA binding to a PBS oligonucleotide is stimulated by the binding of PIP2 to the MA domain165. It therefore seems plausible that MA-RNA interactions, mediated by PIP2-dependent membrane binding, could function in the regulation of Gag’s chaperone activity, possibly inhibiting primer binding prior to membrane binding165.
Early studies by Mangel and co-workers showed that RNAs extracted from RSV under mild denaturing conditions were dimeric, as determined by sucrose gradient sedimentation, but formed monomers in the presence of the gene-32 protein, which unwinds double-helical nucleic acid structures and preferentially binds to single-stranded RNAs166. These studies showed that the RNA in virions exists as non-covalently linked dimers, but they did not rule out the possibility that the RNAs might be recognized by Gag as monomers and form dimers during or shortly after assembly. About the same time, Berezesky and co-workers showed that MoMuLV virions produced from cells treated with actinomycin D, which inhibits mRNA synthesis, packaged significantly less viral RNA, and that the RNAs that were packaged were dimeric167. These findings suggested that RNAs might be selected for packaging as dimers prior to virus budding and assembly. MoMuLV particles that were rapidly harvested, and mutant virions that lack protease activity, package loosely formed dimers that are readily dissociated by mild heating33, suggesting that protease-induced viral maturation promotes a maturation of the viral RNA to a more stable dimeric form. Similar maturation-dependent RNA dimer stability has been observed for HIV-1105. More recent studies by Laughrea and co-workers revealed that small amounts of monomeric genome could be isolated from very young virus particles (i.e. less than 30 min old), and that the monomeric RNAs convert to dimers in a protease- and time-dependent manner168. Sakuragi and co-workers showed that HIV-1 derived RNAs containing an additional, downstream 5′-UTR element (which lacked the PolyA and SD residues, to prevent polyadenylation and aberrant splicing) could be efficiently packaged into HIV-1 virus-like particles as monomers, suggesting that it is the structure of the dimeric 5′-UTR that is responsible for packaging169.
A 144-nucleotide region that included residues of the PBS through the AUG hairpin was identified by this approach as the minimal sequence necessary and sufficient for genomic RNA dimerization during virus assembly169,170.
Early biochemical and mutagenesis studies led to proposals that dimerization is initiated by a conserved hairpin that serves as the dimer initiation site (DIS)48,51,74,171–175, and recent genetic and biochemical studies by Hu and co-workers have provided strong in vivo evidence in support of this mechanism176,177. They showed that, for viruses produced by cells dually infected with different HIV-1 subtypes (subtype B, DIS loop sequence GCGCGC; and subtype C, DIS loop sequence GUGCAC), the inter-subtype recombination frequency is much lower than the intra-subtype recombination frequency. The recombination rate was increased four-fold by a 3-nucleotide substitution in the subtype B DIS, which made it fully complementary with the subtype C DIS176. Similar conclusions were reached in studies involving co-infection with mutants containing DIS sequences that were complementary to each other, but not self-complementary177, Figure 2. These results strongly argue that HIV-1 genomes dimerize prior to packaging, and that RNA-RNA recognition is mediated by the DIS.
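The loop-loop complementarity argument can be checked directly on the two six-nucleotide loop sequences quoted above. The following is a minimal sketch that counts strict Watson-Crick pairs only (G-U wobble pairs are not counted) and treats the loops as isolated strings. It does not model flanking residues, stacking, or energetics, and the function names are hypothetical.

```python
# Strict Watson-Crick partners for RNA bases (no G-U wobble pairing).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string."""
    return "".join(PAIR[base] for base in reversed(rna.upper()))


def is_self_complementary(loop):
    """True if the loop is palindromic, i.e. equals its reverse complement."""
    return loop.upper() == reverse_complement(loop)


def watson_crick_pairs(loop_a, loop_b):
    """Count Watson-Crick pairs when two equal-length loops align antiparallel."""
    if len(loop_a) != len(loop_b):
        raise ValueError("loops must have the same length")
    return sum(
        PAIR[x] == y
        for x, y in zip(loop_a.upper(), reversed(loop_b.upper()))
    )


SUBTYPE_B = "GCGCGC"  # DIS loop sequence quoted in the text
SUBTYPE_C = "GUGCAC"  # DIS loop sequence quoted in the text

print(is_self_complementary(SUBTYPE_B), is_self_complementary(SUBTYPE_C))
print(watson_crick_pairs(SUBTYPE_B, SUBTYPE_B))  # homotypic B:B loop contact
print(watson_crick_pairs(SUBTYPE_B, SUBTYPE_C))  # heterotypic B:C loop contact
```

Under these assumptions each loop pairs fully with a second copy of itself, while the B:C combination pairs at fewer positions. This is consistent with the lower inter-subtype recombination frequency described above, although the sketch is not a substitute for the experimental measurements.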
The incorporation of RNA into assembling particles was recently imaged in live cells by Bieniasz and colleagues15,16. In these studies, the genome RNA was engineered with multiple stem loops that bind to the bacteriophage MS2 coat protein, which could then be imaged in vivo upon binding to an MS2-NLS-GFP (GFP = green fluorescent protein) protein. The HIV-1 Gag protein was tagged with the mCherry fluorophore. The genome RNA and Gag proteins were then imaged in living cells that co-expressed them, using dual-color total internal reflection (TIR) fluorescence microscopy. In the absence of Gag, genome RNA molecules were highly dynamic and did not localize at specific sites at or near the plasma membrane. However, in the presence of Gag, genomes were targeted to well-defined sites on the plasma membrane, initially exhibiting slow lateral movement and gradually becoming static. The targeting of the RNA molecules to plasma membrane assembly sites was achieved by a sub-detectable number of Gag proteins (likely ~12 or fewer). Detectable and increasing populations of Gag molecules appeared at the RNA localization sites after the RNA became stationary. The intensity of the GFP-labeled RNA signal did not increase throughout this process, indicating that both RNA molecules are simultaneously trafficked to plasma membrane assembly sites.
The location in the cell where retroviral RNAs dimerize has been a subject of considerable recent interest. As indicated above, RSV genomes are likely to dimerize in the nucleus. Dimerization of the MoMuLV genome has also been proposed to occur within the nucleus, based in part on the higher recombination frequencies observed for co-expressed MoMuLV-based vectors transcribed from physically proximal proviruses as compared with vectors expressed by spatially separated proviruses178,179. In addition, co-expressed MoMuLV genome RNAs preferentially self-dimerize, resulting in a non-random co-packaging of homo- and heterodimeric genomes consistent with dimerization in the nucleus180,181. In contrast, HIV-1 co-packages co-expressed genomes as homodimers and heterodimers in random proportions, suggesting that dimerization may occur outside the nucleus182,183. Very recently, Hu and co-workers developed a cell fusion system, in which two groups of cells (modified to help enable cell-cell fusion) were differentially infected with two strains of HIV-1, each containing mutations in Gag that cause a severe replication defect. When the two groups of cells are fused, the different Gag proteins can co-assemble, rescuing each other’s replication defect. Interestingly, viruses that assemble and bud from the fused cells contain heterozygous RNAs at levels similar to those obtained using a single-cell co-infection system (based on a recombination assay), suggesting that HIV-1 RNA dimerization occurs in the cytoplasm and not in the nucleus184. In addition, cells were co-transfected with viral RNAs mutated to facilitate nuclear export by either the CRM-1 or nuclear export factor-1 (NXF1) dependent pathways. Different viral RNAs exported by the same pathway (CRM-1 or NXF1) were found to dimerize at levels similar to those observed for wild-type-like RNAs, whereas differentially exported RNAs exhibited a reduced tendency to be packaged as heterodimers184.
Thus, the findings obtained by different laboratories involving different retroviruses argue both for and against nuclear-associated genome dimerization. It is conceivable that dimerization occurs at multiple sites in infected cells, and that the site that predominates depends on temporal or other conditional factors that have not been explored.
Over the past 20 years, considerable effort has been made by many research groups to identify the RNA residues and structures that are responsible for genome selection and packaging. Most studies indicate that the primary packaging determinants reside within the 5′-UTR, and most structural studies therefore focused on this region of viral genomes. Deletion mutagenesis studies have shown that ~120 nucleotides located upstream of the Gag start codon are required for efficient packaging43–46,48–51. Although this region has sometimes been referred to as the Ψ-site, accumulating lines of evidence indicate that efficient HIV-1 genome packaging requires nearly the entire 5′-UTR and possibly extends into the gag coding region9,54,59,185. Thus, unlike for some other retroviruses, a minimal packaging element that is independently capable of directing efficient HIV-1 genome packaging has yet to be identified and independently validated.
Electron microscopy (EM) studies of RSV, MoMuLV and some other tumor viral RNAs purified by sucrose gradient sedimentation revealed that residues near the 5′-end of the RNAs (ca. 300–500 nucleotides) adopt an X- or Y-shaped dimer linkage structure (DLS)29,32,65,186,187. Circular loop structures without free 5′-ends were observed in the EM studies of the HIV-1 genome RNAs extracted from virions188. It is worth noting that the RNA samples used in this study were treated with 50% formamide and 2.5 M urea in order to unwind the tightly packed coil structures, suggesting that the dimer linkages are highly stable. Similar loop structures were observed in an NC-incubated RNA (nucleotides 1–744) prepared by in vitro transcription71. These findings were interpreted in terms of two dimer interfaces near the very 5′-end of the RNA.
More than 20 different secondary structures have been predicted for the HIV-1 5′-UTR over the past 25 years47–50,69–71,73–75,189,190. Collectively, the site-directed mutagenesis experiments, chemical and enzymatic accessibility assays, phylogenetic studies, and free energy calculations are generally consistent with a 5′-UTR structure that consists of a series of stem-loop structures connected by relatively short linkers, Figure 3. These stem-loops are (from 5′ to 3′): trans-activation region (TAR), the 5′ polyadenylation signal (polyA), the primer binding site (PBS), the dimer initiation site (DIS), the major splice donor (SD), and the Ψ hairpin. The chemical probing data for these regions are not entirely self-consistent, and for other regions of the 5′-UTR, significant variations in nucleotide accessibility have been reported. Results of nucleotide accessibility mapping experiments highlighting similarities and differences among the chemical probing results obtained for these regions of the 5′-UTR are summarized in Figure 5. Based in part on these variations, five different models have been proposed for the AUG region over the past eight years alone, Figure 3b-f, and six different models have been proposed for the PBS loop, Figure 4. In the following subsections, experimental data and modeling results are summarized according to the current understanding of structure/function relationships.
There has been considerable interest in the TAR domain of the 5′-UTR due to its essential role in Tat-mediated transcriptional activation191–194, but other roles in replication have also been proposed, including roles in dimerization71, strand transfer during reverse transcription171, as a possible HIV-1 derived miRNA during latency184, and packaging54,55,195. Mutagenesis studies have sometimes been difficult to interpret due to the dominant negative effects that the mutations have had on transcription. For example, Berkhout and co-workers showed that mutations designed to disrupt the structure of TAR lead to significant reductions in genome packaging efficiency195 and concluded that the TAR hairpin structure shown in Figure 3b is important for genome packaging. Similar results have been reported by Clever and co-workers196. However, efficient genome packaging and replication were subsequently demonstrated for an HIV-1 variant in which the TAR hairpin and Tat gene are substituted by a tetracycline-inducible tetO-rtTA system197, indicating that the TAR hairpin is required for efficient transcription and replication but does not play an essential role in packaging198.
Chemical probing, mutagenesis, and other structural studies are in general agreement that the 57-nucleotide TAR sequence forms the stable hairpin shown in Figure 3b71,104,199. The TAR hairpin contains a conserved 3-nucleotide pyrimidine bulge that binds to the viral Tat transactivator protein200–203 and an apical 6-nucleotide loop that binds to the cyclin T1 subunit of the cellular transcriptional elongation factor (pTEFb)204–206. The high-resolution 3D solution NMR structure of the upper portion of the TAR stem loop has been determined in both the absence and presence of the bound ligands155,200,207–218, Figure 6a. The unbound Tat binding 5′-bulge residues are conformationally flexible, but become rigid upon binding to Tat-derived peptides. Binding is mediated by an arginine side chain (Arginine fork)216,219, which stabilizes a coaxially aligned TAR conformation by interacting with residues at the interface that connects the lower and upper A-form helical segments200,207,208,217,218. The 5′-uracil of the bulge forms a base triple with the adjacent A-U base pair, causing a significant distortion of the RNA backbone. The intrinsic conformational mobility of the bulge plays an important role in ligand binding220–224, and information about local dynamics trajectories and transiently formed structures has facilitated the identification of new inhibitors that bind to the Tat binding pocket and inhibit Tat-mediated activation and viral replication221.
Residues immediately following the TAR hairpin have been predicted to form a second hairpin, called polyA, which contains the AAUAAA polyadenylation signal in an unstructured loop190. This sequence is part of the repeat (R) element that is duplicated near the 3′-end of the viral transcript. Although the 3′ polyadenylation signal is known to function in mRNA maturation, little is known about the function of the polyA hairpin located in the 5′-UTR. Parslow and coworkers found that mutations disrupting the polyA base pairing caused profound defects in both genome packaging and viral replication196. Compensatory mutations that restored base pairing also restored genome packaging, leading to the conclusion that the secondary structure of the polyA hairpin, and not its primary sequence, is important for packaging196. Berkhout and coworkers similarly found that destabilization of the polyA hairpin results in diminished genome packaging225. However, the Berkhout lab later showed that destabilizing the polyA hairpin can lead to significant reductions in the intracellular levels of HIV RNA, and this can account for the observed reductions in packaging. Their results further suggested that the polyA hairpin is required for both efficient repression of the 5′-polyadenylation site and full activation of the 3′-polyadenylation signal, but may not be essential for packaging (although a 5′-terminal hairpin seems to be necessary)226.
To date, no 3D structural information is available for residues of the polyadenylation signal. In addition to the proposed hairpin structure shown in Figure 3b, residues in this region have been proposed to form base pairs with DIS in the long-distance interactive (LDI) model of Berkhout and co-workers227. Paillart and co-workers suggested on the basis of a phylogenetic analysis and biochemical studies that the loop residues of the polyA hairpin base pair with residues in the matrix coding region of the genome, forming a long range pseudoknot228. This proposed interaction is conserved in all HIV-1 isolates as well as in HIV-2 and simian immunodeficiency virus228.
Residues of the PBS region of the 5′-UTR play a critical role in replication by serving as the binding site for the human tRNALys, the primer for reverse transcription, and by serving as regulatory elements that affect the initiation of reverse transcription208,229–231. Numerous secondary structural models have been proposed for residues of the PBS, based on chemical and enzymatic probing, computer modeling, and mutagenesis experiments71–73,104,199,205,213,227,231–234, Figure 4. In the upper loop, some of the structures contain an apical hairpin ending in an A-rich loop, which is not supported by the phylogenetic analysis. Kjems and co-workers instead proposed that the important A-rich sequence is displayed in an internal loop which is conserved in all major subtype isolates of HIV-171, Figure 4a. The bottom region of the PBS shows more variation among models, with the most dramatic differences in the CU-rich region, Figure 4b. In spite of the significant discrepancies in the top and bottom parts of the PBS, nearly all models include a common base-paired helix, PBS2, in the center of the PBS, Figure 4.
The Lever group first predicted that the CU rich region forms a long-range interaction with the AG rich linker between the Ψ and AUG hairpins72, and supporting evidence has more recently been obtained by mutagenesis235 and SHAPE probing104,233, Figure 5. Kjems and coworkers predicted that the CU rich region forms a small hairpin with the nucleotides immediately following the PBS2 stem loop9. However, the Berkhout group interpreted the chemical reactivity of the CU rich region in terms of a loop conformation, Figure 5. To identify the contribution of the PBS region to genome packaging, Parslow and co-workers57 created a series of 13 disruptive and compensatory mutations within this region and tested them using a single-round replication assay. In agreement with the results obtained by the Berkhout group213 using a spreading-virus assay, mutations in the top part of the PBS had little functional effect on genome packaging. However, although mutations designed to disrupt the predicted base pairing in the lower stem impaired replication, compensatory mutations intended to restore its structure exacerbated the defect. Mutations targeting the middle stem PBS2 by both groups confirmed its biological role in genome packaging. However, compensatory PBS2 mutations designed by the Berkhout group failed to restore replication, whereas the Parslow group’s mutations corrected the packaging defect and partially restored viral infectivity. This discrepancy may have resulted from differences in the mutated sequences or in the assays used. Although there is currently little agreement regarding the structure of the PBS, all studies consistently indicate that some of the PBS residues play an important role in genome packaging.
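The design logic behind the disruptive and compensatory mutations discussed above can be illustrated with a minimal sketch. The stem sequences below are hypothetical and are not taken from the HIV-1 genome, and the pairing rules (Watson-Crick plus G-U wobble) are a simplifying assumption. The sketch only counts predicted base pairs; as the PBS results show, restoring base pairing on paper does not guarantee that function is restored.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair (simplifying assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_base_pairs(five_prime: str, three_prime: str) -> int:
    """Count the base pairs in a stem formed by two equal-length strands.

    Both strands are written 5'->3', so the first residue of one strand
    faces the last residue of the other.
    """
    if len(five_prime) != len(three_prime):
        raise ValueError("strands must have equal length")
    return sum((a, b) in PAIRS for a, b in zip(five_prime, reversed(three_prime)))


# Hypothetical 4-bp stem (illustration only, not an HIV-1 sequence).
wild_type = ("GGCA", "UGCC")
disrupted = ("CCGA", "UGCC")    # 5' strand mutated: most pairing is lost
compensated = ("CCGA", "UCGG")  # 3' strand mutated to match: pairing restored

for name, stem in [("wild type", wild_type),
                   ("disrupted", disrupted),
                   ("compensated", compensated)]:
    print(name, count_base_pairs(*stem))
```

A compensated stem has the same predicted secondary structure as the wild type but a different primary sequence, which is why such mutants are used to separate structural from sequence-specific effects.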
The DIS has also received considerable attention, with nearly all studies supporting the hairpin structure shown in Figure 3b. The DIS promotes dimerization via formation of intermolecular “kissing contacts” involving the GC-rich palindromic loops74,236, Figure 3b. As indicated above, co-transfection with plasmids that encode non-self-complementary DIS loops that are complementary to each other has been used to enhance packaging of heterodimeric RNAs177. Although DIS-containing oligo-RNAs are capable of converting from a kissing dimer to an extended duplex, larger RNAs that include the entire 5′-UTR and the first 265 nucleotides of the gag coding region do not appear to form an extended duplex, and it is likely that the kissing dimer is the biologically relevant species173. Mutations designed to disrupt the DIS structure, or deletion of the entire DIS stem-loop, inhibit RNA dimerization in vitro and lead to significant reductions in both genome packaging and infectivity in vivo48,51,53,74,171–175,193,210. Multiple secondary structures have been predicted for residues below the GG bulge, Figure 3b71,104. Although DIS is important for genome packaging, it does not appear to bind with high affinity to the NC protein237, and it is conceivable that, as observed for the MLV packaging signal66, DIS-dependent HIV-1 genome dimerization promotes exposure of high-affinity NC binding sites.
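The distinction drawn above between palindromic (self-complementary) loops, which can support homodimer kissing contacts, and pairs of non-self-complementary loops that are complementary only to each other can be made concrete with a short sketch. The loop sequences used here are illustrative assumptions and are not quoted from the text, and the test considers perfect Watson-Crick complementarity only.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(loop: str) -> bool:
    """True if two copies of the loop can pair with each other (palindrome)."""
    return loop.upper() == reverse_complement(loop)


def are_mutually_complementary(loop_a: str, loop_b: str) -> bool:
    """True if loop_a can form a fully paired duplex with loop_b."""
    return loop_a.upper() == reverse_complement(loop_b)


# Illustrative sequences (assumptions, not taken from the text).
palindromic = "GCGCGC"
mutant_a, mutant_b = "GCGAGC", "GCUCGC"

print(is_self_complementary(palindromic))              # homodimer possible
print(is_self_complementary(mutant_a))                 # no homodimer
print(are_mutually_complementary(mutant_a, mutant_b))  # heterodimer possible
```

Under these assumptions, an RNA carrying `mutant_a` cannot pair with a second copy of itself but can pair with an RNA carrying `mutant_b`, which is the principle behind using such loop pairs to favor heterodimer formation.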
The structure of the kissing dimer has been determined by NMR74,237–242 and crystallography243,244, Figure 6b. Some discrepancies are apparent upon comparison of the structures. The NMR solution structure shows a symmetrical homodimer where the averaged loop plane is approximately perpendicular to the axis of the stems, whereas a perfect coaxial alignment with the stems was observed in the crystallography structure. Unlike the NMR structure where all bases are inside the RNA molecule, the crystallographic structure presents the two loop adenines as stacked in a bulged out conformation. A recent NMR structure of the kissing dimer consisted of two coaxially aligned hairpins with loop adenines in a bulged in conformation239. The kissing dimer can be converted into a more thermodynamically stable extended dimer by incubation at 55°C245–247, or by incubation at lower temperatures in the presence of NC106,248,249, and X-ray and NMR structures of the extended dimer have also been solved246,247. However, as indicated above the biological relevance of this species remains uncertain. Marino and co-workers demonstrated a proton-coupled dynamic conformational switch in the kissing complex. The NC catalyzed maturation of the kissing complex was shown to directly correlate to the observed proton coupled dynamics242. The structure237,250 and dynamics251 of the lower stem and bulge have also been examined.
We note here that a second dimer interface that is separate and distinct from the DIS was proposed on the basis of EM studies (see above). Based on computer-modeling analysis, polyA was proposed to be the second DLS188. However, deletions and mutations of the proposed dimer-promoting, palindromic AAGCUU element in polyA were found to have no impact on genome dimerization58,252. More recent biochemical studies with oligo-RNAs indicate that TAR has a high propensity to dimerize and is capable of promoting dimerization in RNAs that lack the DIS sequence, and it was therefore suggested that TAR may contribute to a second DLS71. However, as indicated above, TAR can also be deleted or substituted by a non-viral transcriptional activator without significantly affecting RNA packaging or infectivity. Thus, either the second dimerization site observed in the EM images has yet to be properly identified, or its role in genome packaging is not significant.
Results of nucleotide accessibility mapping are generally consistent with the SD hairpin structure shown in Figure 3b. The major splice donor site is located in the GGUG loop of the proposed hairpin. The isolated SD hairpin is capable of binding NC with high affinity253. The three-dimensional structures of the isolated SD hairpin253 and its complex with NC133 have been determined by NMR methods, Figure 6c. High affinity binding is mediated by interactions between the zinc knuckles of NC and exposed guanosines in the GGUG tetraloop (as also observed in a NC:Ψ hairpin structure; see below)133. Interestingly, mutations designed to disrupt the base pairing in the lower stem of the hairpin did not affect genome encapsidation or replication, leading to suggestions that this hairpin does not play a role in genome packaging48,49.
Most evidence indicates that the hairpin structure predicted for the Ψ residues (Figure 3b) plays an important role in genome packaging – hence its name. Hayashi et al reported that a fragment of the 5′-UTR containing the Ψ hairpin is capable of directing the packaging of heterologous RNAs into virus-like particles69, but this finding has not been independently verified. Russell and co-workers reported that the Ψ hairpin and downstream GA rich region are required not only for packaging but also for genome dimerization235. Deletion of Ψ, and mutations designed to disrupt base pairing in the stem, have been reported to lead to significant reductions in genome packaging47,49,59,235, whereas compensatory mutations designed to restore base pairing in the stem47, and the complete substitution of Ψ by a different NC-binding RNA fragment254, largely restored packaging efficiency.
RNAs containing Ψ bind NC with high affinity75,133,211, and high-resolution NMR structures have been determined for the free Ψ hairpin255 and for a high affinity NC:Ψ RNA complex132, Figure 6d. Both of the HIV-1 NC zinc knuckles contain a hydrophobic pocket that interacts specifically with the bases of exposed guanosines130,132,133,137. For both zinc knuckles, the guanosine nucleobase inserts deeply into the hydrophobic pocket and forms hydrogen bonds with backbone NH and carbonyl groups of the polypeptide. The ability to bind specifically and tightly to exposed guanosine bases was proposed to be a primary function of the HIV-1 NC zinc knuckles132,133. Surprisingly, although mutations that disrupt the stem structure of the Ψ hairpin severely impair genome packaging, substitution of the NC binding GGAG loop by GCUA or AAGA did not significantly affect packaging or replication235. Both substitutions contain a tetraloop guanosine, and oligoribonucleotide Ψ RNAs containing these mutations are capable of binding NC (albeit with weaker affinities of ~ 4 μM and ~ 800 nM, respectively) (Heng and Summers, unpublished results). The single zinc knuckle domain of the MoMuLV NC protein was also found to bind exposed guanosines66,136, but only one of the two RSV NC zinc fingers appears to function as a guanosine binding site upon binding to a minimal RNA packaging element135. Thus, guanosine recognition does not appear to be a universal function of the highly conserved structural motif.
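To place the two dissociation constants quoted above for the GCUA and AAGA loop mutants on a common energy scale, they can be converted to standard binding free energies using ΔG° = RT ln(Kd), with Kd expressed relative to a 1 M standard state. The following is a rough sketch only: the temperature of 298.15 K is an assumption, since the measurement conditions are not given here, and the Kd values are the approximate figures from the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = RT ln(Kd / 1 M); more negative values mean tighter binding.
    The default temperature is an assumption, not a value from the text.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


# Approximate Kd values quoted in the text for the two loop substitutions.
dg_gcua = binding_free_energy(4e-6)    # ~4 uM
dg_aaga = binding_free_energy(800e-9)  # ~800 nM

print(f"GCUA: {dg_gcua:.1f} kcal/mol")
print(f"AAGA: {dg_aaga:.1f} kcal/mol")
print(f"difference: {dg_gcua - dg_aaga:.2f} kcal/mol")
```

Under these assumptions, the fivefold difference in Kd between the two mutants corresponds to a difference of roughly 1 kcal/mol in binding free energy.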
Residues that span the gag start codon (here called AUG) have been proposed to adopt a number of different secondary structures, Figure 3b49,256,257. Early nucleotide probing, mutagenesis studies and free energy predictions were generally consistent with the hairpin structure shown in Figure 3b48,59,73. Studies of AUG-derived oligoribonucleotides indicated that the hairpin contains an unstable stem and a stable GNRA tetraloop (N: any base; R: G or A)258–260, and the 3D structure of the hairpin RNA was solved by NMR methods260, Figure 6e. GNRA tetraloops have often been found in nature to participate in long range RNA-RNA interaction214,261, and it was therefore hypothesized that the GNRA loop of AUG might interact with the other parts of the HIV-1 5′-UTR261. More recently, using a combination of phylogenetic analyses, biochemical probing, mutagenesis, and computer-assisted RNA modeling, Berkhout and co-workers suggested that AUG might instead base-pair with residues of the Unique-5′ element (U5) that connects the poly(A) and PBS stem loops234, Figure 3c. Supporting evidence for U5-AUG base pairing was obtained by Kjems and co-workers in enzymatic probing studies of a relatively large 5′-fragment of the HIV-1 genome (residues 1–744), combined with a newer computational RNA structure prediction algorithm71. However, subsequent nucleotide reactivity probing of viral RNAs in transfected cells and intact virions by Ehresmann and co-workers revealed that several nucleic acid bases predicted to participate in U5:AUG interactions were readily modified by chemical probes, indicating that they are exposed and highly reactive, a finding that is not compatible with the proposed U5:AUG base pairing62. 
More recently, results of SHAPE experiments employed by Weeks and co-workers were used to argue that residues of AUG do, in fact, base pair with residues of U5, forming a stable structure that persists in transfected cells, in virions, and in in vitro transcribed 5′-UTR RNAs104 (although the proposed base pairing pattern differed somewhat from the predictions by the Kjems and Berkhout laboratories). The SHAPE method is purportedly more sensitive to unstructured elements than traditional chemical probing techniques, and it is therefore unclear why the residues of U5 and AUG were reactive to traditional chemical probes but poorly reactive to the SHAPE probes. All of the in vivo chemical probing experiments were conducted using transfected cells and virus particles under what appear to be similar, native-like conditions. The AUG structure was revised in a subsequent paper by the Weeks group262, although the SHAPE reactivities of the participating residues did not appear to change relative to the earlier studies.
Although early studies showed that a relatively small region of the HIV-1 5′-UTR containing the Ψ hairpin is sufficient to direct the packaging of heterologous RNAs into virus-like particles69, more recent studies showed that mutations in the stem of Ψ designed to disrupt base pairing, or substitution of the NC-binding GGAG tetraloop by AAGA, resulted in only modest reductions in packaging47, and it is now clear that a much larger portion of the 5′-UTR participates in genome recognition and packaging. Unfortunately, 3D structural information for larger, multi-hairpin RNAs remains sparse. High-resolution structural studies of RNA by X-ray crystallography can be problematic, due in part to conformational heterogeneity and a relatively uniform, negative surface charge that can hamper crystallization. NMR has its own issues: signal degeneracy, spin relaxation, and a general paucity of long-range restraint information have limited the utility of NMR for studying even modest-sized RNAs. As of 2010, the RNA structures deposited in the Nucleic Acid Structure Database comprise only 25 nucleotides on average, and only three structures comprise more than 50 nucleotides263. For these reasons, structural studies of larger RNAs, including the intact 5′-UTR, have relied primarily on nucleotide protection, mutagenesis, and biochemical experiments62,71,73,79,104,199,233,234.
RNA cross-linking has been utilized to probe for long-range RNA-RNA interactions, and Kjems and co-workers used a UV cross-linking approach to identify a novel tertiary interaction within the PBS hairpin structure71. More recently, Fabris and co-workers used mass spectrometry and chemical cross linking to probe for potential long-range interactions in a 5′-UTR fragment encompassing residues of the DIS through the AUG hairpins264, and a 3D model was generated using the results of chemical crosslinking as distance restraints. These studies indicated that the 5′-UTR fragment forms a globular structure, in which the GAGA tetraloop of AUG participates in A-minor like interactions with the stem of DIS (a common interaction for this class of tetraloops258,261). Of course, in the context of the intact 5′-UTR, it is certainly possible that AUG could instead base pair with U5 or participate in other predicted long-distance interactions (see below). No other high-resolution 3D structural information for larger fragments of the 5′-UTR has been published to date.
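The GNRA consensus (N: any base; R: G or A) used above to classify the AUG tetraloop can be written as a simple pattern. The sketch below applies it to the three tetraloop sequences named in this review: GAGA (AUG), GGAG (Ψ) and GGUG (SD). It checks sequence only and says nothing about whether a loop actually adopts the GNRA fold.

```python
import re

# G, then any base, then a purine (G or A), then A.
GNRA_PATTERN = re.compile(r"G[ACGU][AG]A")


def is_gnra(tetraloop: str) -> bool:
    """True if a four-nucleotide RNA loop matches the GNRA consensus."""
    return GNRA_PATTERN.fullmatch(tetraloop.upper()) is not None


for name, loop in [("AUG", "GAGA"), ("Psi", "GGAG"), ("SD", "GGUG")]:
    print(name, loop, is_gnra(loop))
```

Only the GAGA loop of AUG matches the consensus; the GGAG and GGUG loops end in G rather than A and so fall outside this class.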
There is considerable evidence that the HIV-1 genome, and its 5′-UTR, can adopt multiple conformations. Even the earliest nucleotide probing experiments by the Ehresmann group suggested that the dimeric form of the 5′-end of the viral genome (residues 1–500) likely adopts multiple distinct and mutually exclusive structures that could modulate the function of the 5′-UTR73. The isolated, recombinant HIV-1 5′-UTR (residues 1–373) appears to form a mixture of monomers and dimers, and native gel electrophoresis studies revealed intermediate bands after incubation for 8 hr at 55°C in dimer formation buffer (50 mM Tris-HCl, pH 7.5, 10 mM KCl, 1 mM MgCl2)265. Berkhout and co-workers showed that a 1–290 nucleotide fragment of the HIV-1 5′-UTR can adopt two distinct monomeric conformations that migrate at different rates on native polyacrylamide gels266. The fast migrating species was proposed to form a rod-like structure (termed LDI, for “Long Distance Interactions”)267 based on site-directed mutagenesis and secondary structure calculations267, Figure 3f. Addition of Mg2+ shifts the equilibrium towards the slowly migrating species, which is referred to as the “branched multiple hairpin” (BMH) conformer266. A longer fragment (1–744) of the HIV-1 genome is also capable of forming LDI and BMH conformations71. The NC protein can efficiently promote dimerization of the BMH conformer, but not the LDI species, and the burial of the DIS loop in the LDI was proposed to inhibit dimerization267. The equilibrium between these two species was hypothesized to regulate dimerization, translation and genome packaging268,269. However, mutations designed to shift the LDI-BMH equilibrium did not significantly affect translation efficiency227. No evidence for an LDI-like species was observed by in vivo nucleotide probing, although the presence of a minor LDI species would likely be difficult to detect by this approach62.
There is also evidence suggesting that the structure of the genome changes as the virus matures. Genomes isolated from mature HIV-1 viruses under mildly denaturing conditions exist as a mixture of monomeric and dimeric species (80–95% dimer)168, and the proportion of dimers can be reduced to ~ 40–50% by inactivating the protease105, the distal NC zinc fingers252 or the DIS52,210. The electrophoretic mobility of the dimeric genome increases as the virus ages (from 2 hr to 9 hr). In addition, dimers in newly released viruses are thermolabile, and can dissociate during extraction270, but after ~ 4–6 hr the dimers become “mature” and are thermostable168. Thus, both the monomer/dimer ratio and the stability of the dimer appear to vary as a function of the age of the viruses168. In this regard, the conclusion based on SHAPE experiments that the 5′-UTR adopts a single structure in cells, in virions, and in vitro104 seems surprising.
A number of findings made over the past five years or so have significantly advanced our understanding of the mechanism of HIV-1 genome packaging. Fluorescence imaging studies have enabled real-time visualization in living cells of RNA trafficking, the assembly of Gag proteins, and the recruitment of cellular proteins during virus assembly15,16,271–273. These studies have shown that genome trafficking to virus assembly sites (from unknown origins in the cell) is dependent on Gag, that virions do not assemble repeatedly at pre-formed assembly sites, that two copies of the genome are simultaneously trafficked to the membrane, that the RNA-Gag complex contains a small number of Gag proteins (probably ~12 or fewer), and that additional Gag molecules are recruited for assembly subsequent to the docking of the initial Gag:RNA complex on the plasma membrane. Although small amounts of monomeric genomes can be isolated from young HIV-1 particles under mildly denaturing conditions, the findings from fluorescence imaging experiments, together with recent results of genetic recombination experiments, provide convincing evidence that HIV-1 genomes are selected for packaging as dimers.
A significant outstanding question is: Where in the cell does Gag:RNA recognition occur? Over the past five years or so, Parent and colleagues have provided compelling evidence that genome packaging by RSV Gag is dependent on the shuttling of Gag through the nucleus, and that this process is mediated by the NLS and NES activities of Gag’s MA domain. Although Green and co-workers presented early evidence that HIV-1 Gag transiently accesses the nucleus prior to assembly, and that mutations in the MA domain of Gag that interfere with this process lead to aberrant genome packaging, the role of HIV-1 Gag in nuclear export of the viral genome remains both controversial and unvalidated. By engineering the RNA to utilize different nuclear export pathways, Hu and co-workers have obtained evidence that dimerization of the HIV-1 genome probably takes place in the cytoplasm (although dimerization does appear to depend, at least to some extent, on the mechanism of nuclear export)184. Assuming that HIV-1 Gag binds preferentially to the dimeric genome (as observed for MoMuLV, but not yet reported for HIV-1), this finding would seemingly favor a packaging mechanism in which HIV-1 Gag:RNA assembly is initiated in the cytoplasm.
Regardless of the location of Gag:RNA recognition, there remains considerable interest in understanding the molecular basis for genome selection. How does Gag specifically direct the packaging of the dimeric, unspliced genome? Although the MA domain of Gag could play a direct role by binding tightly and specifically to the genome, such a mechanism has yet to be firmly established by experiment. Since MA plays known roles in intracellular trafficking, an alternative role would be to help direct Gag to sites in the cell where the viral genomes are located. On the other hand, there is considerable experimental evidence that the NC domain of Gag plays a direct role in genome selection by binding tightly to recognition elements in the viral RNA. In the case of MoMuLV, it appears that dimerization of the genome, which may be induced by the chaperone activity of NC, exposes about a dozen high affinity NC binding sites that were sequestered and unable to bind NC in the monomer, and it was suggested that dimerization of the MoMuLV 5′-UTR leads to formation of an RNA structure that exposes unstructured UCUG NC binding elements66,67,274. Similar conclusions were reached recently by SHAPE-based in virio chemical probing and mutagenesis experiments68. The RNA binding properties of a larger MoMuLV Gag fragment differ from those of the isolated NC domain68, and quantitative NC binding and 3D structural studies of such complexes are therefore also warranted.
High-resolution NMR structures have been reported for several isolated RNA hairpins corresponding to functionally important elements within the HIV-1 5′-UTR, and for the most part, the structures are consistent with secondary structure predictions made on the basis of chemical accessibility and phylogenetic mapping, mutagenesis and biochemical studies, and free energy calculations. NC:RNA structures have been determined for two of the predicted hairpins (SD and Ψ) and for a single-stranded element within the U5 region of the 5′-UTR, revealing details about atomic-level interactions that are important for high-affinity NC:RNA binding. However, none of these RNA elements appear to be strictly required for packaging, and because these studies involved relatively small fragments of the 5′-UTR, they did not lead to insights into the mechanism of dimeric genome selection. To date, quantitative thermodynamic studies of the dimerization-dependent NC binding behavior of the HIV-1 5′-UTR (or other larger portions of the genome) have not been reported, and in this regard, it should be of interest to determine if the HIV-1 5′-UTR exhibits dimerization-dependent NC (or Gag) binding behavior similar to that observed for the MoMuLV 5′-UTR.
There is also a need to develop better approaches for structural studies of large RNAs that are conformationally heterogeneous, as appears to be the case for the HIV-1 5′-UTR. The use of chemical probing and nucleotide cross-linking experiments has provided insightful structural information for some regions of the 5′-UTR, but for other regions, inconsistencies among the experimental data have led to a variety of structural predictions and mechanistic interpretations. These inconsistencies could potentially be due to the presence of multiple RNA structures and conformational heterogeneity, which could complicate interpretation of bulk reactivity data. For example, a 100 nucleotide RNA corresponding to the MoMuLV core encapsidation signal was shown by NMR to exist under physiological salt conditions as a mixture of four minor monomeric conformers, two minor dimeric conformers, and several minor higher-order multimers, in addition to the major dimeric species274, and this was offered as a potential explanation for the significant differences observed between a dimer model derived by chemical probing275 and the NMR-derived structure274. Conformational heterogeneity is likely to present problems for any RNA structure determination method that cannot distinguish between multiple equilibrium species. New methods that combine NMR-derived high-resolution local structural information, which can distinguish between slowly interconverting RNA conformers, with low-resolution global structural information derived by single molecule cryo-Electron Tomography274 should prove useful for future studies of larger and functionally relevant fragments of the HIV-1 genome.
Support from the NIH (R01 GM42561, AI30917) is gratefully acknowledged.