Retroviral capsid cores are proteinaceous containers that self-assemble to encase the viral genome and a handful of proteins that promote infection. Their function is to protect the viral genes and aid in their delivery to the host’s nucleus, and, in many cases, infection pathways are influenced by capsid-cellular interactions. From a mathematical perspective, capsid cores are polyhedral cages and, as such, follow well-defined geometric rules. However, their shapes differ markedly depending on virus type. Given the capsid’s specific roles in the viral life cycle, the availability of detailed molecular structures, particularly at assembly interfaces, opens novel avenues for targeted drug development against these pathogens. Here, we summarize recent advances in the structure and understanding of retroviral capsids, with particular emphasis on assemblies and the capsid cores.
The majority of enveloped viruses, including those in the Hepadnaviridae, Retroviridae, and Flaviviridae families, contain a spherical or ovoid protein shell, the capsid core, that encloses the viral genome. While these viruses exhibit significantly different mechanisms of replication, they all face similar problems during particle assembly. Retroviruses themselves exhibit diverse assembly mechanisms. B- and D-type assembly viruses, like mouse mammary tumor virus (MMTV) and Mason-Pfizer monkey virus (M-MPV), form immature particles in the cytoplasm, while C-type assembly viruses, like bovine leukemia virus (BLV), Rous sarcoma virus (RSV) and human immunodeficiency virus (HIV), assemble at the cytoplasmic face of the plasma membrane and bud directly from the cell surface. The determinants for assembly are encoded in the respective group-specific antigen (Gag) polyproteins. Gag is the only viral protein required for assembly and generation of immature virus particles. In C-type assembly, myristoylated Gag precursor proteins associate with viral RNA in a patch at the inner leaflet of the host-cell plasma membrane; the patch grows laterally until the particle buds off. As a consequence, the immature virus particle contains a cell-derived lipid envelope that is wrapped around the nascent viral capsid core [4,5].
In immature virus particles, unprocessed Gag is radially arranged, with the matrix protein (MA) at one end, anchored by its N-terminal myristoyl tail to the inner membrane surface, and the nucleocapsid protein (NC) at the other end, interacting with the viral RNA, with each domain playing a distinct role. During viral maturation, Gag is cleaved by the viral protease (PR), generating three major cleavage products common to all retroviruses, MA, capsid (CA) and NC, together with virus-specific numbers of spacer peptides and subdomains. Proteolytic processing by PR results in a dramatic rearrangement of Gag: only MA remains associated with the viral lipid envelope, while CA and NC condense around the viral genome. NC forms a ribonucleoprotein (RNP) complex inside the virion, and CA assembles into a lattice that ultimately generates the spherical or ovoid protein shell of the capsid core.
The capsid core plays numerous roles during the viral life cycle, including interfering with innate immune sensing [6-10], regulating reverse transcription [11-13], and controlling the nuclear import pathway [8,9,11,13-15]. Many of these functions depend strongly on interactions between the capsid core and cellular host factors [11,14-16]. For instance, some proteins stabilize the capsid core (cyclophilin A (CypA)), others disrupt it (tripartite motif (TRIM) proteins), and several assist in nuclear import (transportin 3, nucleoporin 153, nucleoporin 358 (Nup358), and cleavage and polyadenylation specificity factor 6) [16,19,20]. Furthermore, the capsid core engages with the cytoskeleton of the cell, using dynein and kinesin motors for trafficking along microtubules to the nucleus. Given the importance of the capsid core for infection and the life cycle of retroviruses, the last two decades have witnessed enormous efforts focused on determining atomic structures of CA, alone, in assemblies, and in complexes with cellular interactors. In this review, we present the current knowledge regarding the structure of the retroviral capsid core and its relationship to biological function.
All retroviruses, irrespective of whether they assemble in the cytoplasm or at the plasma membrane, initially form immature viral particles, which mature by proteolytic cleavage of Gag, generating polymorphic conical or spherical capsid cores. Despite this heterogeneity in size and morphology, the unit building block, the capsid protein CA, is structurally highly conserved among retroviruses. It contains about 230 amino acids and is composed of two α-helical domains connected by a flexible linker (Figure 1A). The N-terminal domain (NTD) and the C-terminal domain (CTD) [26,27] can exist as independently folded entities in solution. The NTD is monomeric, while the CTD and the full-length CA protein exist either as a monomer (human T-cell lymphotropic virus type 1 (HTLV-1), RSV) [29,30] or in a concentration-dependent monomer-dimer equilibrium (HIV type 1 (HIV-1)) that is modulated by experimental conditions [31,32].
The NTD, comprising ca. 150 amino acids, consists of six or seven helices (Figure 1B) with a β-hairpin at the N-terminus. In lentiviruses, the loop following helix 4 constitutes the binding interface for CypA, a human protein involved in immunological responses. This CypA-binding loop mediates interactions between the CA protein and this important host factor in a cell-type-specific manner [9,33]. HIV-1 is unique among primate lentiviruses in its ability to interact with CypA and is therefore uniquely sensitive to cyclosporine A, in contrast to human immunodeficiency virus type 2 (HIV-2) or simian immunodeficiency virus of macaques (SIVmac). Intriguingly, in rhesus macaques, the CypA coding sequence has been inserted by retrotransposition into the TRIM5 gene. The TRIM protein superfamily is associated with the innate immune response. TRIM5α restricts retroviral infection early after viral entry into the target cell [18,35] by recognizing and binding to the surface of capsid assemblies via its SPRY domain. Similarly, TRIMCyp recognizes the retroviral capsid core through a CypA domain, rather than the SPRY domain. The CypA-binding loop of CA is conserved in lentiviruses, including feline immunodeficiency virus (FIV) and the endogenous lentiviruses of rabbits (rabbit endogenous lentivirus type K (RELIK)) and lemurs (prosimian immunodeficiency virus (PSIV)). Since the structure of the CA NTD-CypA complex is remarkably similar for RELIK/PSIV and HIV-1, a conserved function of CypA in lentiviral infection is likely. Notably, the equivalent loop in RSV’s CA protein is essential for infectivity, although no direct interaction between this loop and a CypA-type host factor has been demonstrated.
The CTD contains four helices, numbered eight to eleven, as well as a 3₁₀ helix (Figure 1C). The region around residues ~150-170 is known as the major homology region (MHR) and is highly conserved among all retroviruses as well as the Ty3 retrotransposon. Intriguingly, in higher vertebrates, the fold of an endogenous synaptic protein, Arc, is remarkably similar to that of the retroviral CTD, and Arc has been proposed to be a remnant of a past retroviral infection. Together, helix 9 and the 3₁₀ helix constitute the core of the CTD-CTD dimerization interface (Figure 1C and Figure 2), whether in an isolated CA dimer or in a multimeric assembly (see below). Located at the center of the dimer interface are hydrophobic amino acids, such as L151, W184 and M185 in HIV-1, W133, I168 and L172 in BLV, and W153, L180 and V188 in RSV. The relative arrangement of monomers at the CTD dimer interface is remarkably variable, and several distinctly different conformations are seen for the HIV-1 and RSV dimer interfaces (Figure 2A, 2B, 2E and 2F). For both CA proteins, dimerization affinity is pH dependent, and the quaternary structure of the dimer is governed by the protonation state of E175 in HIV-1 and of D191 in RSV (Figure 2E, 2F).
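The pH dependence noted above can be illustrated with the Henderson-Hasselbalch relation, which gives the fraction of an acidic side chain that is protonated at a given pH. The sketch below is illustrative only: the pKa is a hypothetical placeholder, not a measured value for HIV-1 E175 or RSV D191 (whose interfacial environments may shift their pKa values), and the two pH values are those of the RSV CTD crystal structures discussed later in this review.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of an acidic (Asp/Glu) side chain in the protonated, neutral form.

    Henderson-Hasselbalch: pH = pKa + log10([A-]/[HA]),
    so [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# HYPOTHETICAL pKa chosen only for illustration;
# not a measured value for E175 (HIV-1) or D191 (RSV).
ASSUMED_PKA = 5.0

for ph in (4.3, 8.5):
    print(f"pH {ph}: fraction protonated = {fraction_protonated(ph, ASSUMED_PKA):.4f}")
```

With this assumed pKa, roughly 83% of such side chains are protonated at pH 4.3 and fewer than 0.1% at pH 8.5, showing how the pH shift between the two crystal forms can switch an interfacial carboxylate between neutral and charged states.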
Although structures and conformational details of the CA protein and its individual domains are informative, it is the higher-order oligomeric arrangements that constitute the units of extended CA lattices or helical assemblies. As we discuss below, the quaternary structures of the isolated CTD dimers and the relative orientations of the monomeric units in these dimers (Figures 2A, 2B, 2E and 2F) are strikingly similar to their counterparts observed in assembled and crystallographic lattices (Figures 2C, 2D, 2G and 2H), highlighting their biological relevance. In the following sections, we describe how CA proteins are put together in these oligomers.
Retroviral capsid cores exhibit different overall shapes (spherical, spherocylindrical, or conical), yet all follow the common design principles of convex polyhedral cages. Like icosahedral viral capsids, all retroviral capsid cores are built from hexameric lattices, and the shell is closed by incorporating twelve pentamers. The pleomorphic shapes of capsid cores arise from the distribution of the pentamers in the shells, which is governed by simple geometric considerations: spherical capsid cores have twelve uniformly distributed pentamers, spherocylinders have six pentamers at each end of a tube, and cones have different numbers of pentamers at the two ends, the smaller number at the narrow end and the larger number at the wide end. Non-symmetric structures contain randomly and non-uniformly distributed pentamers. Intriguingly, the first retroviral pentamers were visualized in cryo-electron microscopy (cryo-EM) studies of in vitro assembled, double-shelled RSV particles, whose inner shell was a sphere made up of 12 pentamers, surrounded by an outer shell containing 20 hexamers and 12 pentamers.
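The requirement for exactly twelve pentamers, whatever the number of hexamers, follows from Euler's polyhedron formula, V - E + F = 2. The derivation below is a sketch under an idealized mapping that we assume for illustration: each hexamer or pentamer is a face, each two-fold interface an edge shared by two faces, and each three-fold interface a vertex shared by three faces. The cone-angle relation is the standard result for a hexagonal sheet with n pentagonal defects at its apex, each defect removing a 60° wedge from the flat lattice.

```latex
% h = number of hexamers (hexagonal faces), p = number of pentamers (pentagonal faces)
% idealization: each edge is shared by 2 faces, each vertex by 3 faces
F = h + p, \qquad E = \frac{6h + 5p}{2}, \qquad V = \frac{6h + 5p}{3}

V - E + F = \frac{6h + 5p}{3} - \frac{6h + 5p}{2} + h + p
          = -\frac{6h + 5p}{6} + h + p
          = \frac{p}{6}

V - E + F = 2 \;\Longrightarrow\; p = 12 \quad \text{for any } h

% cone with n pentamers at the narrow end (12 - n at the wide end), full apex angle \theta
\sin\frac{\theta}{2} = 1 - \frac{n}{6}
\;\Longrightarrow\;
\theta(n = 1,\dots,5) \approx 112.9^\circ,\ 83.6^\circ,\ 60.0^\circ,\ 38.9^\circ,\ 19.2^\circ
```

Both RSV shells described above (h = 0 and h = 20, each with p = 12) satisfy the first relation. Setting n = 6 in the second gives θ = 0, a cylinder, which corresponds to the spherocylinder case with six pentamers capping each end of a tube. Both relations describe an ideal, unstrained lattice; real cores can deviate through lattice strain and interface flexibility.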
Given the importance of the hexameric and pentameric building blocks in retroviral capsid cores, high-resolution structures of hexamers and pentamers were pursued. To this end, HIV-1 CA proteins were engineered to form stable, disulfide-cross-linked hexamers or pentamers for X-ray crystal structure determination. Specific mutant CA proteins (A14C/E45C/W184A/M185A for the hexamer and N21C/A22C/W184A/M185A or P17C/R18L/T19C/W184A/M185A for the pentamer) permitted disulfide cross-linking between NTDs while disrupting the CTD dimer interface (Figures 3A & 3B). The cross-linked CA hexamer is made up of two concentric rings, with NTDs forming the inner ring and CTDs the outer ring (Figure 3A). While extensive contacts exist between the NTDs, with helix 2 from one monomer packing against helices 1 and 3 from the neighboring monomer, no contacts are seen between CTDs. The cross-linked CA pentamer has a similar structure, except that the inner ring contains five NTDs, with 15 helices forming the inter-monomer contacts (Figure 3B), compared to 18 in the hexamer (Figure 3A).
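The helix counts quoted above follow from simple ring bookkeeping. A minimal sketch, assuming an idealized closed ring in which each of the n NTD-NTD interfaces is built from helix 2 of one subunit and helices 1 and 3 of its neighbor; the function name is hypothetical.

```python
def ntd_ring_interface_helices(n_subunits: int, helices_per_interface: int = 3) -> int:
    """Count helices lining the NTD-NTD interfaces of a closed ring of CA subunits.

    Idealization: a closed ring of n NTDs has exactly n equivalent interfaces
    (subunit i packs against subunit i+1, and the last closes on the first),
    each built from helix 2 of one subunit plus helices 1 and 3 of its neighbor.
    """
    if n_subunits < 3:
        raise ValueError("a closed ring needs at least three subunits")
    interfaces = n_subunits
    return interfaces * helices_per_interface


for name, n in (("pentamer", 5), ("hexamer", 6)):
    print(name, ntd_ring_interface_helices(n))
```

Equivalently, each subunit contributes its helices 1, 2 and 3 once, so the count is simply three times the number of subunits in the ring.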
Essentially identical hydrophobic contacts are seen between the helices at this interface in the hexamer and the pentamer, and the alternative packing is achieved by a ~10° rotation of adjacent monomers around the center of the three interface helices. At the center of the cross-linked pentamer, the NTDs are in closer proximity than at the center of the cross-linked hexamer (cf. Figures 3A and 3B). Although pentamers must be present in a capsid core (see below), compelling evidence for the particular biophysical or biochemical factors that govern the switch between hexamers and pentamers is lacking. Nevertheless, subtle changes in curvature can be introduced at the three interfaces, NTD-NTD, CTD-CTD and CTD-NTD, by conformational changes in the hinge between NTD and CTD, creating the rich polymorphism that is observed for retroviral cores.
Recently, X-ray structures of native, full-length CA proteins from HIV-1 and BLV were solved (Figures 3C & 3D) [45,46]. These crystal structures permitted the unambiguous identification of inter-hexamer interactions in hexagonal CA lattices formed without cross-linking. For HIV-1, these interactions differ considerably from those derived from the cross-linked hexamer structure. Details of hydrophilic and water-mediated interactions at the dimer and trimer interfaces in the hexagonal lattice were determined. A large number of structural water molecules form an extensive hydration layer, which is associated with packing differences at the two- and three-fold interfaces between the hydrated and non-hydrated hexamers. In addition, chloride and iodide ions mediate intra-hexameric interactions at the periphery and at the center of the hexamers. Whether these anion-mediated interactions play any biological role in assembly is unclear; they may simply aid in ordering the protein and nucleating crystallization. The water-mediated packing differences have been proposed to permit isoenergetic structural rearrangements, conferring the structural plasticity necessary for formation of an asymmetric conical capsid core.
In summary, the structural convergence of HIV-1, BLV and RSV hexamers suggests a conserved role for these oligomers in retroviral capsids (Figures 3A-E). Furthermore, there is strong evidence for the existence of pentameric units in HIV-1 and RSV capsid cores; however, high-resolution atomic structures of native pentamers have yet to be determined.
Inside immature virions, Gag is arranged in a quasi-spherical shell with discernible boundary defects; the latter can be considered a sign of three-dimensional geometric frustration. Because of the size and disorder of immature retroviral Gag lattices, atomic models are hard to derive. However, a combination of cryo-EM [49-52] and computational modeling [53,54] has recently generated detailed models for the packing of the Gag protein in immature lattices. These models include structural details of the subdomains unique to the immature assembly, such as the p10 peptide in RSV [49,54], the spacer peptide (SP) in RSV and spacer peptide 1 (SP1) in HIV [34,39,41].
In the immature assembly, the packing arrangement in the CA-NTD layer is virus-specific. In HIV (Figure 4A), helix 1 resides at the NTD interfaces, while helix 2 forms the pseudo three-fold symmetry axis. By comparison, in RSV, helices 2 and 7 reside at the NTD interfaces, while helix 4 is arranged around a pseudo three-fold symmetry axis (Figure 4B). A quaternary organization similar to that of RSV is also seen for M-MPV [34,39,41,53,54]. In addition, the short p10 peptide, present only in RSV (Figure 4B), propagates the lattice by connecting adjacent NTDs from different Gag monomers [49,54]. The causes and functional consequences of these structural differences between immature assemblies of different retroviruses are currently unknown, although differences in the cellular mechanisms of assembly of D-type (M-MPV) versus C-type (RSV, HIV) retroviruses may play a role.
In contrast to the varied arrangement of the CA-NTDs, the packing of the CA-CTDs in immature assemblies is conserved among retroviruses. The dimer interface always involves helix 9, although marked differences in the relative helix 9-helix 9 orientations are seen in the different CTD dimer structures (described above and Figure 2). For immature RSV particles, cryo-EM data suggest that the CTDs, given their close proximity, may form hexameric rings (Figure 4B) [34,39,41]. In the case of HIV-1, the arrangement of the CTDs in the immature lattice (Figure 2C) is remarkably similar to the structure of the CTD dimer in complex with a CA assembly inhibitor (CAI) peptide (Figure 2B), suggesting that this inhibitor functions by stabilizing the immature assembly, preventing maturation into conical capsid cores after cleavage of SP1. In addition, the quaternary structure of the RSV CTD dimer crystallized at low pH (pH 4.3; Figure 2E) resembles the arrangement at the dimer interface in mature HIV capsid assemblies (Figure 2D), while the RSV CTD dimer structure at high pH (pH 8.5; Figure 2F) is similar to the dimer interface in immature RSV capsid assemblies (Figure 2G).
The region between the CTD of CA and NC in Gag, namely SP in RSV, SP1 in HIV, a short, non-cleavable SP1-like junction in M-MPV and a long ‘charged assembly helix’ at the C-terminus of CA in murine leukemia virus (MLV), plays important roles in virus function, since mutations or deletions in this region interfere with Gag-Gag assembly and affect virus infectivity [47,54,57].
At present, the conformation of the region between CA and NC in immature Gag assemblies remains uncertain. In isolated SP/SP1 or CA-SP1 polypeptides, SP/SP1 is flexible and disordered [53,57], although a helical conformation of SP1 can be induced by appending a leucine zipper to the SP1 peptide. In addition, it has been proposed that in immature assemblies the SP/SP1 region of RSV/HIV forms a six-helix bundle, tethering together six Gag molecules [53,54]. Likewise, for MLV, cryo-electron tomography (cryo-ET) of in vitro assembled virus-like particles suggested the presence of helices between CA and NC. For RSV and HIV, the SP/SP1 region has been invoked as a molecular switch that, while intact, stabilizes the immature assembly. Removal of SP/SP1 by proteolytic cleavage may therefore relieve constraints on Gag and allow conversion to the mature capsid core [58,60]. The importance of the region between CA and NC is further supported by the altered phenotypes observed when mutations are introduced in or near the SP/SP1 cleavage site and by the effects of maturation inhibitors that target specific regions in SP1. Finally, differences between viruses in the linkage between CA and NC in Gag may be related to virus-specific assembly pathways, since M-MPV, a virus lacking a cleavable SP/SP1 peptide, assembles in the cytoplasm.
Atomic models for asymmetric conical HIV-1 capsid cores were determined by a combination of experimental and computational approaches. Employing molecular dynamics flexible fitting (MDFF) and large-scale molecular dynamics simulations, a hybrid model was created in which the cross-linked hexameric CA X-ray structure and the dimeric CA CTD NMR structure were fit into the cryo-EM density of in vitro generated tubular capsid assemblies. In the first step, a hexamer of hexamers (HOH) was created, in which CA hexamers were connected by CA CTD dimers. Assembling a large number of HOH units permitted the generation of a curved hexameric lattice, recapitulating the arrangement in tubular capsid assemblies. Next, with the cross-linked pentameric CA X-ray structure placed at the center, a pentamer of hexamers (POH) was modeled, creating units that permitted closure of the conical HIV-1 core. A POH constitutes an acutely curved unit, in contrast to the flat HOH. Differences between the POH and HOH structures revealed key interactions, such as details at the three-fold axis that modulate the bite angle between neighboring oligomers. In the POH and HOH, the same three helices form the trimer interface, although it is more tightly packed in the POH than in the HOH. These interactions were probed and validated by mutagenesis. For example, the A204C variant, which upon oxidation cross-links the capsid cores at the trimer interfaces, results in hyperstable capsid cores and non-infectious virions. Finally, an atomic model of a native HIV-1 capsid core was derived, based on the shape determined by cryo-ET and on the POH and HOH building blocks. This all-atom capsid core model contains 216 hexamers and 12 pentamers (Figure 5A).
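The composition of the all-atom core model can be tallied with simple bookkeeping. The sketch below assumes an idealized lattice in which every CA subunit sits in exactly one CTD-CTD two-fold interface and one three-fold interface; this is an illustrative assumption, not a property read from the published model, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapsidLattice:
    """Idealized bookkeeping for a closed CA lattice of hexamers and pentamers.

    Assumption (illustrative idealization): every CA subunit sits in exactly one
    two-fold (CTD dimer) interface and one three-fold (trimer) interface.
    """

    hexamers: int
    pentamers: int

    def __post_init__(self) -> None:
        # Interfaces are counted as subunits/2 and subunits/3, so both must divide evenly.
        if self.subunits % 6 != 0:
            raise ValueError("subunit count must be divisible by 6 for this idealized lattice")

    @property
    def subunits(self) -> int:
        # six CA per hexamer, five CA per pentamer
        return 6 * self.hexamers + 5 * self.pentamers

    @property
    def dimer_interfaces(self) -> int:
        # each two-fold interface pairs two CTDs
        return self.subunits // 2

    @property
    def trimer_interfaces(self) -> int:
        # each three-fold interface joins three CTDs
        return self.subunits // 3

    def euler_characteristic(self) -> int:
        # capsomers = faces, two-fold interfaces = edges, three-fold interfaces = vertices
        faces = self.hexamers + self.pentamers
        return self.trimer_interfaces - self.dimer_interfaces + faces


core = CapsidLattice(hexamers=216, pentamers=12)
print(core.subunits, core.dimer_interfaces, core.trimer_interfaces, core.euler_characteristic())
```

For the 216-hexamer, 12-pentamer model, this bookkeeping gives 1,356 CA subunits, 678 two-fold and 452 three-fold interfaces, and an Euler characteristic of 2, as expected for a closed shell.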
The final step in generating infectious HIV virions from immature viral particles is a global rearrangement of the Gag assembly, known as maturation, which is triggered by sequential cleavages of the Gag polyprotein by PR [67,68]. Starting from a quasi-spherical assembly of Gag (Figure 5B), cleavage between SP1 and NC releases the NC-SP2-p6 polypeptide, bound to the viral RNA genome. Subsequently, CA is cleaved from the membrane-attached MA, followed by cleavage of SP1 from the CTD of CA, which leads to the formation of the conical capsid core (Figure 5A). Similarly, the NC-SP2-p6 polypeptide undergoes further cleavages, first releasing p6 and then SP2, resulting in the final NC-containing RNP core. The ‘scars’ or defects seen in the immature lattice (Figure 5B) may relate to the mechanism of maturation, since gaps in the lattice may allow PR to access the lamina between MA and CA from the inside of the virion. In turn, this may suggest that PR-mediated release of CA from MA in Gag begins at the edges of the immature lattice. In general, however, the molecular mechanism(s) of maturation remain elusive, and two alternative pathways have been proposed: a displacive transition and a disassembly-reassembly pathway. Some experimental evidence supports the former [69-71], although biochemical data also support the latter [61,72]. Irrespective of the pathway, given the large structural differences between mature and immature assemblies, the global rearrangement upon maturation most likely requires disruption of the CTD dimer before self-assembly of the mature capsid core can occur.
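As a purely illustrative sketch, the cleavage sequence described above can be written as an ordered list of PR cleavage sites applied to the HIV-1 Gag domain order (MA-CA-SP1-NC-SP2-p6). This is bookkeeping only, not a kinetic model of PR. The function names are hypothetical, and because the timing of the NC-SP2-p6 cuts relative to the CA-SP1 cut is not specified in the text, the order chosen for those steps is an assumption.

```python
from typing import List, Sequence, Tuple

# HIV-1 Gag domains, N- to C-terminus.
GAG: Tuple[str, ...] = ("MA", "CA", "SP1", "NC", "SP2", "p6")

# One cleavage order consistent with the description above; each pair names the
# two domains flanking a PR cleavage site. The placement of the NC-SP2-p6 cuts
# relative to the CA-SP1 cut is an assumption.
CLEAVAGE_ORDER: List[Tuple[str, str]] = [
    ("SP1", "NC"),  # releases NC-SP2-p6 from MA-CA-SP1
    ("MA", "CA"),   # separates CA(-SP1) from membrane-bound MA
    ("CA", "SP1"),  # removes SP1 from the CA CTD
    ("SP2", "p6"),  # releases p6
    ("NC", "SP2"),  # releases SP2, leaving NC
]


def cleave(fragments: Sequence[Tuple[str, ...]], site: Tuple[str, str]) -> List[Tuple[str, ...]]:
    """Split the fragment containing the adjacent domain pair `site` between its two domains."""
    left, right = site
    result: List[Tuple[str, ...]] = []
    for frag in fragments:
        if left in frag and right in frag:
            i = frag.index(left)
            if i + 1 >= len(frag) or frag[i + 1] != right:
                raise ValueError(f"{left} and {right} are not adjacent in {frag}")
            result.extend([frag[: i + 1], frag[i + 1 :]])
        else:
            result.append(frag)
    return result


def mature(gag: Sequence[str] = GAG, order: Sequence[Tuple[str, str]] = CLEAVAGE_ORDER) -> List[List[str]]:
    """Apply cleavages in order and record the products present after each step."""
    fragments: List[Tuple[str, ...]] = [tuple(gag)]
    history: List[List[str]] = []
    for site in order:
        fragments = cleave(fragments, site)
        history.append(["-".join(f) for f in fragments])
    return history


for step, products in zip(CLEAVAGE_ORDER, mature()):
    print(f"{step[0]}/{step[1]}: {products}")
```

Encoding the order as data rather than hard-coding it makes it easy to try an alternative sequence and check that every cut still falls between adjacent domains.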
The HIV capsid core is genetically fragile and has to strike a balance: it must be stable enough to protect the HIV genome from cytosolic nucleic acid sensors and to evade restriction factors, yet still allow reverse transcription and formation of the pre-integration complex. Efficient generation of pre-integration complexes, their association with the nuclear pore [11,15] and nuclear import critically depend on the presence of CypA in the target cell [9,11,74,75]. As described above, interactions between the CA protein and CypA involve the CypA-binding loop, which engages the catalytic site of CypA. At the nuclear pore, the Nup358 protein also contains a CypA domain (Nup358Cyp), which has been shown to interact with the CA protein, albeit through a larger binding interface than CypA. Intriguingly, although structures of the CA protein with CypA are available, the precise biological role of CypA during lentiviral infection has still not been elucidated.
Some hints about CypA’s role in capsid assemblies have recently emerged from magic-angle-spinning (MAS) solid-state NMR. These experiments permitted evaluation of conformational dynamics at the atomic level and revealed exceptional mobility of the CypA-binding loop on the nano- to microsecond timescales [31,78]. Interestingly, these motions were significantly attenuated upon CypA binding, and the dynamics profiles of CypA escape mutants closely resembled those of CypA-bound capsid assemblies. This resemblance in motional properties suggests that changes in sequence-dependent conformational dynamics may be a critical factor in the mechanism by which HIV-1 CA mutants escape CypA dependence. In addition, a single CypA molecule can simultaneously interact with two hexameric units of capsid assemblies using a non-canonical interface on CypA, and this second interface may be involved in the protective role played by CypA.
In summary, viral capsid cores are proteinaceous containers with highly plastic, yet robust, properties. In retroviruses, the CA protein and capsid assemblies play important roles in nearly every step of the life cycle, from the immediate post-entry steps and trafficking to the nucleus, to assembly, budding and final maturation of the viral particle. Elucidating the relationship between structure, dynamics and function of capsid assemblies in virions requires deciphering the interfaces between subcomponents and building blocks and generating a detailed overall atomic model, which is attainable only by integrating data from a plethora of experimental techniques, supplemented by functional and computational studies. Such studies have already yielded important insights and opened new avenues for further research, and, no doubt, will continue to do so in the future.
We thank Boon Chong Goh for rendering the capsid assembly in an immature virion and helpful discussions, Klaus Schulten for continuous support and enlightening conversations, Chris Aiken, Peijun Zhang and all members of the Pittsburgh Center for HIV Protein Interactions for years of fruitful collaborations, and Teresa Brosenitsch for editorial support. This work is a contribution from the Pittsburgh Center for HIV Protein Interactions and was supported in part by National Institutes of Health grants P50GM082251 (A.M.G), R01GM067887 (J.R.P).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.