The mature capsids of HIV and other retroviruses organize and package the viral genome and its associated enzymes for delivery into host cells. The HIV capsid is a fullerene cone: a variably curved, closed shell composed of approximately 250 hexamers and exactly 12 pentamers of the viral CA protein. We devised methods for isolating soluble, assembly-competent CA hexamers and derived four crystallographically independent models that define the structure of this capsid assembly unit at atomic resolution. A ring of six CA N-terminal domains forms an apparently rigid core, surrounded by an outer ring of C-terminal domains. Mobility of the outer ring appears to be an underlying mechanism for generating the variably curved lattice in authentic capsids. Hexamer-stabilizing interfaces are highly hydrated, and this property may be key to forming quasi-equivalent interactions within hexamers and pentamers. The structures also clarify the molecular basis for capsid assembly inhibition, and should facilitate structure-based drug design strategies.
The ribonucleoprotein genomic complex of human immunodeficiency virus type 1 (HIV-1) is encased within the mature capsid, a predominantly cone-shaped shell assembled from ~1,500 copies of the viral CA protein (recently reviewed by Ganser-Pornillos et al., 2008). The HIV-1 capsid is a fullerene cone (Ganser et al., 1999), with a body composed of a curved two-dimensional (2D) array of ~250 CA hexamers. To form a closed shell, the ends of the cone are capped by exactly twelve pentamers, with seven at the broad end and five at the narrow end. The fullerene model and its generality across retroviruses are now widely accepted (e.g., see Heymann et al., 2008).
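The quoted copy number can be checked against the fullerene geometry with simple arithmetic. The following is a minimal Python sketch; the function and constant names are illustrative, and the hexamer count of 250 is the approximate figure quoted above, so the total is only an estimate.

```python
# Rough tally of CA subunits in a fullerene cone (illustrative names).
HEXAMERS = 250          # approximate value quoted in the text
PENTAMERS_BROAD = 7     # pentamers capping the broad end
PENTAMERS_NARROW = 5    # pentamers capping the narrow end

def ca_copies(hexamers: int, pentamers: int) -> int:
    """Each hexamer contributes 6 CA chains and each pentamer 5."""
    return 6 * hexamers + 5 * pentamers

pentamers = PENTAMERS_BROAD + PENTAMERS_NARROW
total = ca_copies(HEXAMERS, pentamers)
print(pentamers, total)  # 12 1560
```

The result of 1,560 chains is in line with the ~1,500 copies cited above, given that the hexamer count is approximate.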
HIV-1 CA is a highly helical protein with two independently folded domains, the N-terminal domain (NTD) and C-terminal domain (CTD), which are flexibly linked. Published high-resolution CA structures include isolated domains and full-length monomers from HIV-1 and other retroviruses (Berthet-Colominas et al., 1999; Campos-Olivas et al., 2000; Cornilescu et al., 2001; Gamble et al., 1997; Gitti et al., 1996; Jin et al., 1999; Khorasanizadeh et al., 1999; Mortuza et al., 2009; Mortuza et al., 2004). These structures collectively demonstrate that retroviral CA proteins share a common tertiary fold despite having widely divergent amino acid sequences, and that, as a corollary, mature retroviral capsids are likely to be stabilized by similar quaternary interactions. The hexagonal capsid lattice is composed of three different types of interfaces: a six-fold symmetric NTD-NTD interface that creates hexameric rings, an intermolecular interface between the two domains (NTD-CTD) that reinforces the hexamer, and a homodimeric CTD-CTD interface that links the hexameric building blocks into an infinite hexagonal lattice (Bowzard et al., 2001; Gamble et al., 1997; Ganser-Pornillos et al., 2007; Ganser-Pornillos et al., 2004; Lanman et al., 2003; Lanman et al., 2004; Li et al., 2000; von Schwedler et al., 1998; von Schwedler et al., 2003). This lattice architecture was unambiguously established by an electron cryomicroscopy (cryoEM) structure of 2D crystals of HIV-1 CA hexamers, albeit only at moderate resolution (~9 Å) (Ganser-Pornillos et al., 2007). Moreover, crystal structures of isolated NTD and CTD have provided atomic models for two of the different types of interfaces in the hexagonal lattice: the homodimeric CTD-CTD interface of HIV-1 CA (Worthylake et al., 1999), and the hexameric interface formed by the isolated NTD of murine leukemia virus (MLV) CA (Mortuza et al., 2004).
Despite steady progress in elucidating the structure of the retroviral capsid lattice, high-resolution crystal structures of hexagonal arrays of full-length retroviral CA proteins have not yet been reported. This is presumably due to the low intrinsic stability of CA hexamers and the challenges of preparing discrete oligomeric CA assemblies. Here, we describe engineered HIV-1 CA proteins that form homogeneous populations of stable, soluble hexamers, which are functional for assembly in vitro. The X-ray crystal structures of these CA proteins extend our understanding of the hexameric capsid assembly unit to atomic resolution.
Discrete HIV-1 CA hexamers were stabilized by two independent methods: thiol crosslinking and template-directed hexamerization. In the former case, the cryoEM-based coordinate model of the HIV-1 CA lattice (Ganser-Pornillos et al., 2007) was examined to identify residue pairs that appeared to be in close contact across the NTD-NTD interface. Cysteines were introduced at these positions, and the mutant CA proteins were assembled under reducing conditions into cylinders, which are known to faithfully mimic the hexagonal portion of the HIV-1 capsid (Briggs et al., 2003; Campbell and Vogt, 1995; Ehrlich et al., 1992; Ganser-Pornillos et al., 2004; Gross et al., 1997; Li et al., 2000). Our expectation was that, within the assembled cylinders, one or more of the engineered cysteine pairs would be positioned optimally to form intermolecular disulfide bonds, and upon oxidation, create a covalently linked hexamer. This approach also provided a functional check for the mutant proteins.
As shown in Figure 1, a construct with cysteine substitutions for A14 and E45 assembled into cylinders that were morphologically very similar to those formed by the wildtype protein (compare Figs. 1A and 1B). Upon oxidative crosslinking, the CA subunits from these assemblies migrated almost exclusively as hexamers in nonreducing SDS-PAGE gels (Fig. 1D, lane 5). Indeed, complete crosslinking was observed even in the presence of a 500-fold molar excess of reducing agent (not shown), demonstrating that disulfide bond formation was driven by the specific noncovalent protein-protein interactions within the cylinders. Not surprisingly, the crosslinked cylinders were extraordinarily stable, and remained intact under conditions in which wildtype cylinders readily disassembled (e.g., low salt, or in the presence of the proline rotamase cyclophilin A) (not shown). To obtain isolated hexamers for crystallization, we therefore introduced two additional mutations (W184A and M185A) at the CTD-CTD interface, to weaken the extended lattice (Gamble et al., 1997; von Schwedler et al., 2003) while leaving the hexamer-stabilizing interfaces unchanged. This CA construct assembled less efficiently but still formed cylinders (Fig. 1C). More importantly, it crosslinked into discrete, soluble hexamers with ~100% efficiency (Fig. 1D, lane 6 and Fig. 1E).
Our second approach for creating discrete hexamers was to fuse the CcmK4 protein to the C-terminal end of HIV-1 CA. CcmK4 forms stable hexameric rings in solution and has accessible termini (Kerfeld et al., 2005), making it an attractive template for driving CA hexamerization. Various linker lengths and sequences were tested, and all of the CA-CcmK4 fusion proteins that expressed solubly were hexameric in solution, as analyzed by analytical equilibrium sedimentation (not shown). However, different constructs eluted with different retention times by gel filtration and we assumed that only the constructs that behaved as apparent hexamers (late eluters) were correctly folded (not shown). The successful construct consisted of CA residues 1-226, a two-residue linker, and the full-length CcmK4 sequence followed by the remains of an affinity tag. As in the crosslinked construct, it was necessary to introduce the W184A and M185A mutations to the CTD region to prevent the hexamers from polymerizing into insoluble aggregates.
Four crystallographically independent models of the HIV-1 CA hexamer were derived. The first model was obtained from crystals of CcmK4-templated CA, and was determined to 7 Å resolution (Fig. S1; see Table 1 for crystallographic statistics). The remaining three hexamer models were determined from two different crystal forms of crosslinked CA, which diffracted to 1.9 and 2.7 Å resolution (Figs. S2 and S3; Table 1).
The quaternary organizations of the crosslinked and templated hexamers are identical (to the limit of their respective resolutions), which mutually validates the two oligomer-stabilizing strategies. The hexamers are also very similar to the cryoEM structure (Ganser-Pornillos et al., 2007) (Fig. S4), and are consistent with other structural and biochemical data (Bartonova et al., 2008; Bowzard et al., 2001; Cardone et al., 2009; Gamble et al., 1997; Ganser-Pornillos et al., 2007; Ganser-Pornillos et al., 2004; Lanman et al., 2003; Lanman et al., 2004; Li et al., 2000; Mortuza et al., 2004; von Schwedler et al., 1998; von Schwedler et al., 2003). As shown in Figure 2, the CA hexamer is composed of two concentric rings, with the NTD and CTD forming the inner and outer rings, respectively. Intramolecular interactions between the two domains of each protomer are minimal, but both pack against the neighboring NTD subunit. Indeed, the NTD-NTD and NTD-CTD interfaces essentially merge into one contiguous hexamerization interface (Fig. 3), emphasizing the degree to which the NTD and CTD interactions cooperate to create the hexameric assembly.
Two of the different crystal forms (one of CcmK4-templated and one of crosslinked CA) were composed of stacked sheets, with each sheet corresponding to a flattened version of the mature hexameric lattice (Fig. 2C, Figs. S1B and S2B), as also seen in the HIV-1 CA cryoEM structure (Ganser-Pornillos et al., 2007). These structures therefore recapitulate all three relevant CA-CA interfaces. Within each sheet, neighboring hexamers are connected exclusively by CTD contacts made through the CTD-CTD dimerization interface (Fig. 2C). The crystallographically distinct dimer interactions appear very similar to each other, and resemble the X-ray structure of the isolated full-affinity CTD dimer (pdb code 1a43) (Worthylake et al., 1999), although the agreement in the relative orientations of the CTDs about the dyad is not exact (Fig. 2D). This dimer interface is distinctly different from those seen in a shorter CTD construct (2a8o) (Gamble et al., 1997), or in the presence of an assembly inhibitor (2buo) (Ternois et al., 2005), or upon mutation-induced domain swapping (2ont) (Ivanov et al., 2007) (not shown).
Previous studies have suggested conformational plasticity in the tertiary fold of the CTD (Alcaraz et al., 2007; Bartonova et al., 2008; Berthet-Colominas et al., 1999; Bhattacharya et al., 2008; Ivanov et al., 2007; Ternois et al., 2005; Wong et al., 2008), and this flexibility is also evident in our structures of crosslinked CA. In one crystal form, the N-terminal two-thirds of helix 9 and the preceding loop are poorly ordered, and in the second crystal form this region adopts two distinct conformations. In one conformation, the loop trajectory places it close to the neighboring NTD, allowing polar sidechains to participate in the hydrogen bond network at the NTD-CTD interface (Fig. S5A). This CA CTD configuration has not been observed previously. In the second conformation, the loop resembles the position seen in the 1a43 monomer, and does not contact the NTD (Fig. S5B). We identified three additional areas of structural variability (not shown): the major homology region (MHR) hairpin in two protomers appeared to have an alternative conformation with a non-canonical hydrogen bonding network, which we did not attempt to model due to poor density; the native C198-C218 disulfide bond appeared to exist in both the reduced and oxidized forms, with varying occupancies for each protomer; and helix 10 was characterized by variable and poor densities across the different subunits, indicative of positional disorder and mobility. It is likely that the range of conformations seen in the crystal structures reflects both the natural plasticity of the CTD and amplified flexibility arising from the W184A and M185A mutations within helix 9.
The first three helices of CA contain the NTD-NTD interacting residues, which form a loose 18-helix barrel at the center of the hexamer (Fig. 2B). This interface contains a small hydrophobic core of aliphatic sidechains, which include M39 and A42 (indicated in Fig. 3B). These residues were previously shown by mutagenesis to be critical for both CA assembly in vitro and viral infectivity in vivo (Ganser-Pornillos et al., 2004; von Schwedler et al., 1998; von Schwedler et al., 2003). Despite these limited hydrophobic interactions, the bulk of the interface is created by hydrophilic contacts. In particular, numerous ordered water molecules bridge polar sidechain and backbone atoms throughout the entire interface, forming a pervasive hydrogen-bonding network (Fig. 3C). Bridging waters were also observed in the hexameric X-ray structure of the isolated NTD of MLV CA (Mortuza et al., 2004), suggesting that heavily solvated interfaces may be a general property of retroviral CA hexamer interfaces. To our knowledge, similar water-rich interfaces have not been seen previously in non-retroviral capsid assemblies.
A very recent cryoEM study showed that the CA hexamer and pentamer of Rous sarcoma virus are quasi-equivalent, with pentamers formed by simply removing a protomer from the hexamer and closing the ring (Cardone et al., 2009). By analogy, helices 1-3 in HIV-1 CA must therefore switch from an 18-helix barrel in the hexamer to a 15-helix barrel in the pentamer, with concomitant adjustments in intermolecular contacts. Although details of the pentameric interactions are not yet known, the highly hydrated character of the hexameric interface is compatible with this quasi-equivalent switching mechanism, because water molecules should be particularly adept at repositioning to accommodate altered orientations in hydrogen bonding and side chain packing geometries.
Intermolecular NTD-CTD interactions are made primarily by extended sidechains from helix 8 of the CTD, which pack against the C-terminal end of helix 3, the intervening loop, and the N-terminal end of helix 4 of the NTD (Fig. 3). Additional contacts are also made between helix 11 (CTD) and the C-terminal end of helix 7 (NTD). As in the NTD-NTD interface, NTD-CTD contacts also include a hydrophobic component (Fig. 3B), and mutations in two such residues (Y169 and L211) were recently shown to disrupt mature capsid assembly and viral infectivity (Bartonova et al., 2008). Once again, however, polar and water-mediated interactions are prevalent in this region (Fig. 3C). In this case, the most striking feature is a series of interdomain helix capping interactions. As shown in Figure 3D, the R173 sidechain (CTD) forms a C-cap for helix 3 (NTD), D166 (CTD) an N-cap for helix 4 (NTD), and E71 (NTD) an N-cap for helix 11 (CTD). In some protomers, Q219 (CTD) also forms an intermolecular C-cap for helix 7 (NTD) and R143 (NTD) forms an intramolecular C-cap for helix 8 (CTD). Thus, every helix in the NTD-CTD region has a corresponding cap that is provided by the other domain, and the capping residue has an extended and inherently flexible sidechain. We speculate that the helix caps can serve as pivot points, allowing relative motions between the NTD and CTD to accommodate the varying degrees of surface curvature required to create a conical lattice.
The HIV-1 capsid performs an essential role early in the viral replication cycle, and inhibition of capsid assembly by small molecules is therefore being pursued as a therapeutic strategy for the treatment of HIV/AIDS. Indeed, even molecules that simply alter capsid stability may have therapeutic benefit because CA mutations that either enhance or reduce stability (and presumably, the rate of disassembly) lead to dramatic decreases in viral infectivity (Forshey et al., 2002). The NTD-CTD interface appears to be a particularly attractive site for inhibition, because two experimental inhibitors of HIV-1 capsid assembly (CAP-1 and CA-I) both appear to target this set of interactions.
CAP-1 is a small molecule that binds within a hydrophobic pocket at the bottom of the NTD, through an induced fit mechanism (Kelly et al., 2007; Tang et al., 2003). In the CA hexamer, the aromatic sidechain of F32 fills the pocket, forming orthogonal ring stacking interactions with H62 and Y145 (Fig. 4A). The stack is reinforced by the guanidinium group of R162 from the neighboring CTD. Comparison with 2jpr, the NTD/CAP-1 structure (Kelly et al., 2007), shows that the inhibitor displaces the three aromatic sidechains from the pocket (arrows in Fig. 4B), and disrupts the polar network that stabilizes the helix 3/4 loop conformation. This analysis confirms the idea that CAP-1 acts as an allosteric inhibitor of capsid assembly, by accessing an adjacent hidden pocket and altering the local geometry required to make the NTD-CTD interface (Kelly et al., 2007).
CA-I and its progeny are small peptides that bind to the CTD (Bartonova et al., 2008; Bhattacharya et al., 2008; Sticht et al., 2005; Ternois et al., 2005), and were proposed to disrupt capsid assembly by competitively inhibiting formation of the NTD-CTD interface (Ganser-Pornillos et al., 2007; Sticht et al., 2005; Ternois et al., 2005). Superposition of the NTD-CTD interface with 2buo, the CTD/CA-I co-crystal structure (Ternois et al., 2005), confirms that the helical peptide occludes binding of the NTD to the CTD (Fig. 4C), although the binding modes of the two “ligands” are completely different, and their respective interactions are mediated by distinct sets of residues. Specifically, binding of helix 4 to the CTD is mediated primarily by polar contacts (as discussed above), whereas the CA-I peptide occupies a highly hydrophobic pocket that is exposed by a slight opening of the domain (Bartonova et al., 2008; Bhattacharya et al., 2008; Sticht et al., 2005; Ternois et al., 2005).
CA-I induces a different CTD-CTD dimer (Bartonova et al., 2008; Ternois et al., 2005), in terms of the relative orientations of the domains, compared to those observed in this study and in 1a43, the unbound isolated domain (Worthylake et al., 1999) (not shown). These observations lend support to the proposal that CA-I allosterically induces an “inert” CTD structure that is incapable of extending the capsid lattice (Bartonova et al., 2008). The two possible mechanisms are not exclusive, and CA-I may inhibit by disrupting both NTD-CTD and CTD-CTD interactions.
Conical fullerene shells are highly curved, and this curvature varies in two different ways. Specifically, the dihedral angles between adjacent hexamer planes vary gradually and continuously along lines of hexamers within the body of the cone, but change more drastically at the ends, at sites immediately adjacent to the pentamers (Li et al., 2000). By analogy to other characterized viral capsids, curvature may be accommodated by structural plasticity in the tertiary folds of the individual subunits, and/or by alterations at the three different intersubunit interfaces.
Although our 3D crystal lattices are not curved, we identified regions in which conformational variation can occur, by comparing our multiple crystallographically distinct CA structures (4 distinct hexamers and 19 distinct monomers). On the tertiary level, least squares superposition of the NTD models from the high-resolution structures illustrates that this domain is essentially invariant, except for the cyclophilin-binding loop, which does not form capsid-stabilizing contacts and is dispensable for assembly (Ganser-Pornillos et al., 2004) (Fig. 5A). On the quaternary level, the six-NTD core is also invariant across our four different hexamers (Fig. 5C), indicating that distortion of the hexamer ring is not a major mechanism for generating lattice curvature. Nevertheless, the water-mediated NTD-NTD interfaces seen in our structures are likely to facilitate repacking into the pentamer, which is the dominant factor in closing the capsid shell.
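The counts of distinct hexamers and monomers can be reconciled with the asymmetric-unit contents reported for each crystal form in the crystallographic methods. The sketch below is only a bookkeeping check; the dictionary labels are illustrative, and the hexagonal form is counted in its pseudo-cell setting, where a single CA chain is expanded into a hexamer by the crystallographic six-fold.

```python
# Asymmetric-unit contents as described in the crystallographic methods.
crystal_forms = {
    "CcmK4-templated (C2)":                   {"hexamers": 1, "monomers": 6},
    "crosslinked hexagonal (P6 pseudo-cell)": {"hexamers": 1, "monomers": 1},
    "crosslinked orthorhombic (P212121)":     {"hexamers": 2, "monomers": 12},
}

n_hexamers = sum(form["hexamers"] for form in crystal_forms.values())
n_monomers = sum(form["monomers"] for form in crystal_forms.values())
print(n_hexamers, n_monomers)  # 4 19
```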
In contrast to the NTD, the globular fold of the CTD is more flexible, and the greatest variability is observed in a 15-residue stretch that includes the dimerization helix (Fig. 5B and Fig. S5). The structural variation seen in this region could, in principle, affect packing geometry at the CTD-CTD dimerization interface. However, the observed conformations, although intriguing, cannot be unambiguously interpreted in functional terms because they occur at sites where we mutated the protein to prevent formation of hyperstable lattices.
The most significant structural variation we observed is in the relative orientations of the two domains. This variability is best described as rigid-body motions of the CTD subunits relative to the NTD core of the hexamer (Fig. 5C). The mobility of the CTD subunits is restricted primarily by their interactions with neighboring NTD subunits, because intramolecular contacts between the two domains of each protomer and intermolecular contacts between adjacent CTDs within each hexamer are minimal. The intermolecular NTD-CTD interface must therefore have inherent plasticity that can accommodate the observed rigid-body motions. A superposition of the independent NTD-CTD interfaces in our high-resolution CA hexamers is shown in Figure 5D. It illustrates that although the two CA domains can move relative to one another, critical direct protein-protein interactions (such as the helix caps) are preserved, because they act as pivot points for the rigid-body motions. At the periphery of the interface, polar contacts are mediated by water molecules (Fig. 3C), which can presumably adjust slightly to maintain energetically favorable positions (as discussed above). As with the NTD-NTD contacts, the use of flexible sidechains and solvent-mediated interactions for NTD-CTD interactions likely facilitates the formation of the quasi-equivalent pentamer.
The picture that emerges is that the CA hexamer is organized as an apparently rigid ring of NTD subunits, surrounded by a belt of relatively mobile CTD subunits. As interactions between adjacent hexamers are mediated only by the CTD-CTD dimerization interface, variations in NTD-CTD angles within the hexamer effectively change the tilt of the dimerization interface relative to the hexamer plane. This provides an intuitively straightforward mechanism for modulation of dihedral angles between adjacent hexamers to generate curvature in the HIV capsid. Whether this is augmented by variation in CTD-CTD packing at the dimerization interface remains an open question.
A variety of CA-CcmK4 fusion constructs were assembled from coding regions of HIV-1NL4-3 CA and CcmK4 from Synechocystis sp. PCC 6803, and verified by DNA sequencing. CA-CcmK4 constructs with different CA end points (residues 221-228) and linker sequences (2-5 residues) were constructed, all with C-terminal polyhistidine tags. Protein expression followed the auto-induction method (Studier, 2005). Cell pellets were lysed in buffer (20 mM sodium phosphate, pH 7.4, 500 mM NaCl, 20 mM imidazole) supplemented with protease inhibitors. The soluble fraction was applied to a gravity column packed with Ni-NTA beads (Qiagen), washed, and eluted with 20 mM sodium phosphate, pH 7.4, 200 mM NaCl, 500 mM imidazole. After proteolytic removal of the tag, proteins were purified to homogeneity on a Q column, followed by gel filtration in 25 mM Bis-Tris, pH 6.0, 300 mM NaCl, 5 mM β-mercaptoethanol.
Crystallization used protein at 15 mg/mL in gel filtration buffer, with drops containing 3 μL protein and 2 μL well solution (0.1 M imidazole, pH 6.5, 600 mM sodium acetate) at 4°C. Crystals were cryoprotected with 25% glycerol and flash frozen in liquid nitrogen prior to data collection. The best crystals were obtained for a construct that had the CA endpoint at H226, a two-residue linker (EL), and full-length CcmK4. The CcmK4 portion contained the E104Y mutation, which was introduced to improve crystal quality by surface entropy reduction (Goldschmidt et al., 2007).
Data were collected at SSRL beamline 7-1 and processed with HKL2000 (Otwinowski and Minor, 1997). A self rotation function computed using MOLREP (Vagin and Teplyakov, 1997) revealed a strong six-fold non-crystallographic axis parallel to the c* axis of the C2 space group, and consideration of likely solvent content was consistent with the presence of six CA-CcmK4 subunits in the asymmetric unit. The EM-based CA hexamer model was positioned using EPMR (Kissinger et al., 1999). Due to the high degree of noncrystallographic symmetry (NCS) in this crystal form, the test set was selected in thin resolution shells using DATAMAN (Kleywegt and Jones, 1996). With the limited resolution, refinement was restricted to treating the separate NTDs and CTDs as rigid bodies with PHENIX (Adams et al., 2002) to yield an Rwork of 28% and an Rfree of 32% (Table 1).
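The reason for choosing test reflections in thin shells is that NCS relates reflections of similar resolution, so a randomly scattered test set would remain correlated with the working set. The following is a minimal Python sketch of the general idea only; it is not the DATAMAN algorithm, and the function name, shell count, and synthetic d-spacings are all assumptions made for illustration.

```python
import random

def select_test_set_in_shells(d_spacings, n_shells=50, test_fraction=0.05, seed=0):
    """Flag whole thin resolution shells as test reflections.

    d_spacings: d-spacing in Å for each reflection.
    Returns one boolean per reflection (True = test set).
    """
    # Bin on 1/d^3 so that shells hold roughly equal numbers of reflections.
    s3 = [1.0 / d ** 3 for d in d_spacings]
    lo, hi = min(s3), max(s3)
    width = (hi - lo) / n_shells or 1.0   # guard against a zero-width range
    shell_of = [min(int((v - lo) / width), n_shells - 1) for v in s3]
    # Pick n_test complete shells at random.
    n_test = max(1, round(n_shells * test_fraction))
    test_shells = set(random.Random(seed).sample(range(n_shells), n_test))
    return [s in test_shells for s in shell_of]

# Synthetic d-spacings between 50 and 7 Å (made-up data, for illustration only)
rng = random.Random(1)
d = [rng.uniform(50.0 ** -3, 7.0 ** -3) ** (-1.0 / 3.0) for _ in range(2000)]
flags = select_test_set_in_shells(d)
print(sum(flags) / len(flags))  # fraction of reflections flagged as test
```

Because entire shells are flagged, every test reflection has its NCS-related neighbors in the test set as well, which is the property the shell-based selection is meant to provide.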
Despite extensive effort, we were not able to define a precise location for the CcmK4 portion of the fusion protein. Some density is seen aligned with the CA hexamer six-fold, and fills a volume that must be occupied by CcmK4 in order to complete the crystal lattice, but it was not possible to position a CcmK4 hexamer into this density and multiple molecular replacement calculations failed to find a convincing solution. Analysis of washed crystals on SDS-PAGE indicated that the fusion protein was intact (not shown), and our preferred explanation is that CcmK4 can occupy multiple conformations. One extreme possibility is that the CcmK4 hexamers are oriented 50% up and 50% down with respect to the CA hexamer. This is suggested by the location of the N-termini on the outer rim of the CcmK4 hexamer, and the apparently equal probability that the CA hexamer might nucleate on either side of the CcmK4 hexamer.
Cysteine mutants were based on a pET11a (Novagen) construct harboring HIV-1NL4-3 CA under the control of the T7 promoter. Mutations were introduced using the QuikChange method (Stratagene) and verified by DNA sequencing. The two native cysteines at the CTD (C198 and C218) were retained, since the cryoEM model indicated a very low likelihood of spurious crosslinking with these residues. Proteins were expressed and purified as previously described (Yoo et al., 1997), with the addition of 200 mM β-mercaptoethanol (βME) to all buffers. CA proteins were assembled in vitro either by direct dilution (von Schwedler et al., 1998) or by overnight dialysis (Gross et al., 1997) into assembly buffer (50 mM Tris, pH 8, 1 M NaCl) containing 20-200 mM βME. Final protein concentrations were 0.1-30 mg/mL. Assembled particles were visualized by transmission EM, as previously described (Ganser-Pornillos et al., 2004). Crosslinking was achieved by subsequent dialysis into assembly buffer with the βME concentration dropped to 20 mM or lower. The extent and efficiency of crosslinking were assessed using non-reducing SDS-PAGE.
Crosslinked CA A14C/E45C/W184A/M185A hexamers were prepared by sequential dialysis of 10-30 mg/mL protein into assembly buffer containing 200 mM βME, assembly buffer with 0.2 mM βME, and finally, 20 mM Tris, pH 8. Each dialysis step was performed at 4 °C, for at least 8 hours. The soluble crosslinked hexamers were somewhat prone to aggregation, but remained competent for crystal formation even after storage at 4 °C for several days.
The crosslinked hexamers readily formed several visually distinct crystal forms. The best crystals showed hexagonal and prism-like morphology, and were obtained with the same precipitant (10-12% PEG 8,000) and protein-precipitant ratio (2:1), but at different pH and temperature (hexagonal = 100 mM sodium malonate, pH 6.5, 4 °C; prism = 100 mM Tris, pH 7.4, 20 °C). Crystals were cryoprotected by soaking in mother liquor containing 30% glycerol or ethylene glycol for 10 min (in 10% increments).
Data were collected at APS beamline 22-BM. The hexagonal crystals had unit cell parameters of a = b = 157.3 Å, c = 56.8 Å, α = β = 90°, γ = 120° (Table 1). Two-thirds of the reflections were systematically weak, indicative of translational pseudosymmetry. Strong reflections followed the selection rule (h,h+3n,l), and were on average 4 times larger than the weak reflections. A Patterson map calculated with only the strong subset showed a peak of equal intensity to the origin at fractional coordinates (0.67,0.33,0) (peak intensity was 80% of origin when calculated with all data). Using only the strong reflections, the data can therefore be indexed in space group P6 with a smaller unit cell (a’ = b’ = 91.0 Å, c’ = 56.8 Å, Rsym = 10%) (Table 1) containing one CA protein in the asymmetric unit. Note that the pseudo-cell dimensions closely match the dimensions of the 2D crystal lattice in the cryoEM structure (92.7 Å) (Ganser-Pornillos et al., 2007), and that the a’ and c’ edges in the pseudo-cell are related to the true cell a and c edges by the equations a’ ≈ a/sqrt(3) and c’ ≈ c, respectively. These relationships indicated that the crystal was composed of stacked sheets of CA hexamers, and that each sheet is a flattened version of the 2D CA lattice within the capsid (Fig. S2).
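The relationship between the true cell and the pseudo-cell, and the proportion of strong reflections implied by the selection rule, can be checked numerically. This is a minimal Python sketch using only the values quoted above; the function name and the size of the index box are illustrative choices.

```python
import math

A_TRUE = 157.3    # true hexagonal cell edge a (Å), from the text
A_PSEUDO = 91.0   # reported pseudo-cell edge a' (Å), from the text

def is_strong(h: int, k: int) -> bool:
    """Selection rule from the text: strong reflections have k = h + 3n."""
    return (k - h) % 3 == 0

a_pred = A_TRUE / math.sqrt(3)
print(round(a_pred, 1))  # 90.8, close to the reported 91.0 Å

# Fraction of (h, k) index pairs that satisfy the rule within a square box
idx = range(-15, 15)
strong = sum(is_strong(h, k) for h in idx for k in idx)
print(strong / len(idx) ** 2)  # one third of the index pairs
```

One third of the index pairs satisfy the rule, so two-thirds are expected to be weak, matching the observation above. This is also what an in-plane cell area three times that of the pseudo-cell, (a/a’)² ≈ 3, would produce.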
Molecular replacement in the pseudo-cell setting was performed with MOLREP (Vagin and Teplyakov, 1997). To provide a check against possible model bias, we used crystal structures of the NTD and CTD in complex with assembly inhibitors (2pxr and 2buo, respectively). Our expectation was that the model-phased map would indicate different polypeptide conformations at the inhibitor-binding sites compared to the search models, and indeed, these were observed. The map also showed clearly defined density for regions that were absent from the search models. The merged intensities and rigid-body refined coordinates (Rfree = 39%) were submitted to the Bias Removal Server (www.tuna.tamu.edu) for map calculation with the Shake&wARP algorithm (Reddy et al., 2003) (Fig. S2A). The full model was rebuilt manually into this map with COOT (Emsley and Cowtan, 2004). Positional and isotropic B-factor refinement were performed in PHENIX (Adams et al., 2002), using simulated annealing and automated water-picking protocols. The current model has Rwork of 23% and Rfree of 27% (against randomly selected 5% of the data), with good geometry and no residues in disallowed regions of the Ramachandran plot (Table 1).
Statistical analyses of the reflection intensities, test refinements, and real space considerations indicated that the true space group is most likely perfectly hemihedrally twinned P3 (twin law = “-h,-k,l”), with one hexamer in the asymmetric unit. The combined pathologies of pseudosymmetry and twinning have made refinement in the true space group problematic. We therefore chose to simply report the structure in the pseudo-cell setting, with the understanding that it does not completely reflect the structural plasticity of the protein. The model represents an average of both the pseudotranslationally related molecules (because only the strong reflections were used) and the two “twin domains” (because the twin-related reflections were merged). Fortunately, the pseudosymmetry and twinning in this crystal form appeared to be mainly due to alternative conformations in a small portion of the monomer, spanning ~15 residues at the helix 8/9 loop and the N-terminal end of helix 9. This region was characterized by poor density in this crystal form and was therefore not modeled. As illustrated in Figure S5, this same region was also variable in the orthorhombic crystal form (which displayed no diffraction pathologies).
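Why perfect twinning leads to an averaged model can be illustrated with the standard two-domain intensity relation, in which each observed intensity is a weighted sum of the intensities of a reflection and its twin mate. The following Python sketch applies the twin law quoted above to made-up intensities; the function names and numbers are illustrative and are not taken from the data.

```python
def twin_mate(hkl):
    """Apply the twin law (-h, -k, l) quoted in the text."""
    h, k, l = hkl
    return (-h, -k, l)

def twinned_intensity(i_true, hkl, alpha):
    """Observed intensity as a weighted sum over the two twin domains.

    i_true: dict mapping (h, k, l) -> untwinned intensity
    alpha:  twin fraction (0.5 for a perfect twin)
    """
    return (1 - alpha) * i_true[hkl] + alpha * i_true[twin_mate(hkl)]

# Made-up intensities for one twin-related pair (not real data)
i_true = {(1, 2, 3): 100.0, (-1, -2, 3): 20.0}
obs_a = twinned_intensity(i_true, (1, 2, 3), 0.5)
obs_b = twinned_intensity(i_true, (-1, -2, 3), 0.5)
print(obs_a, obs_b)  # 60.0 60.0
```

With a twin fraction of 0.5 the two observations are identical, so the individual intensities cannot be recovered algebraically (the usual detwinning formula divides by 1 - 2α, which is zero here).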
Data on the prism-like crystals were collected at SSRL beamline 7-1. Based on systematic absences, the space group was identified as P212121 (Rsym = 8.5%), with two hexamers in the asymmetric unit (Fig. S3). This crystal form was also solved by molecular replacement in MOLREP (Vagin and Teplyakov, 1997) (R = 49%), with a hexameric search model derived from the partially refined structure in the hexagonal pseudo-cell. The solution was deemed reliable by the appearance of unbiased density for regions that were deliberately deleted from the search model, and further confirmed with an anomalous difference density map derived from a selenomethionine dataset collected at APS beamline 22-ID to 3.5 Å (densities for all 120 Se sites in the asymmetric unit were clearly visible at 2-15σ; not shown). The two hexamers in the asymmetric unit are stacked head-to-head with approximately coincident six-fold axes. The self-rotation function of the molecular replacement solution was identical to the experimental one.
The test set was selected in thin resolution shells using DATAMAN (Kleywegt and Jones, 1996). Initially, 22 domains in the asymmetric unit, omitting the NTD and CTD of one CA molecule, were refined as rigid bodies in REFMAC (Murshudov et al., 1997) (Rfree = ~39%). The domains were re-fit into their corresponding omit maps with COOT (Emsley and Cowtan, 2004), then copied onto the other molecules using the NCS transformation matrices. Simulated annealing and omit refinement were performed in PHENIX (Adams et al., 2002). Density for the NTD was found to be of significantly better quality than that for the CTD. Subsequent rounds of model building used NCS-averaged maps calculated separately for the two domains, simulated annealing omit maps, and a Shake&wARP map (Reddy et al., 2003). The highly variable regions at the CTD (residues 176-187) were left unmodeled until the last refinement cycle, to obtain the best unbiased maps for chain tracing (Fig. S5). This region was completely modeled in chains C, E, F, J, and L, partially modeled in A, B, G, and I, and unmodeled in D, H, and K. Due to the relatively poor quality of the electron density, the chain traces for this region must be considered tentative. Density quality for helix 10 was also highly variable. We attempted to derive a model that would account for the observed flexibility in the protein while taking advantage of the twelve-fold improvement in observation/parameter ratio afforded by NCS. The current best approach (adapted from ter Haar et al., 1998) is to define four segments of the CTD as separate NCS groups (residues 149-174, 175-189, 190-204, and 209-219). The globular region of the NTD, the helix 6/7 loop, and the β-hairpin were also defined as separate NCS groups. The current model has Rwork and Rfree of 24% and 26%, respectively, with good geometry (Table 1).
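As an illustration of the quantities reported above, the following Python sketch flags a test set in thin resolution shells and computes a conventional R-factor, R = Σ||Fobs| - |Fcalc|| / Σ|Fobs|. It is not the algorithm implemented in DATAMAN, REFMAC, or PHENIX. The function names, the equal-width binning in 1/d², and the default shell count and test fraction are assumptions made for the example. Thin-shell selection is commonly used when NCS is present, so that NCS-related reflections at similar resolution fall into the same set.

```python
import random


def thin_shell_test_flags(d_spacings, n_shells=50, test_fraction=0.05, seed=0):
    """Return one boolean per reflection; True marks a test-set reflection.

    Hypothetical scheme: reflections are binned into equal-width shells
    in 1/d^2, and whole shells are chosen at random for the test set.
    """
    s2 = [1.0 / d ** 2 for d in d_spacings]
    lo, hi = min(s2), max(s2)
    width = (hi - lo) / n_shells
    if width == 0.0:
        width = 1.0  # all reflections at one resolution: single shell
    shell_of = [min(int((s - lo) / width), n_shells - 1) for s in s2]
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * n_shells))
    test_shells = set(rng.sample(range(n_shells), n_test))
    return [shell in test_shells for shell in shell_of]


def r_factor(f_obs, f_calc):
    """Conventional R-factor over paired structure factor amplitudes."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_r_free(f_obs, f_calc, test_flags):
    """Compute R over the working set and over the test set separately."""
    work = [(o, c) for o, c, t in zip(f_obs, f_calc, test_flags) if not t]
    free = [(o, c) for o, c, t in zip(f_obs, f_calc, test_flags) if t]
    return r_factor(*zip(*work)), r_factor(*zip(*free))
```

In this scheme Rwork is computed from the reflections used in refinement and Rfree from the flagged shells only, so the test reflections never contribute to the refinement target.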
Coordinates and structure factors are available from www.rcsb.org: templated CA, 3gv2; crosslinked CA, hexagonal crystal form, 3h47; crosslinked CA, orthorhombic crystal form, 3h4e.
We thank Kent Baker and V. Mitch Luna for technical assistance, Todd Yeates for crystallographic discussions, J. Harless for clerical support, and acknowledge the DNA synthesis and sequencing core facilities at The Scripps Research Institute and University of Utah. Diffraction data were collected at Stanford Synchrotron Radiation Laboratory (SSRL) beamline 7-1 and at Southeast Regional Collaborative Access Team (SER-CAT) beamlines 22-BM and 22-ID at Advanced Photon Source (APS), Argonne National Laboratory. SSRL and APS are supported by the U.S. Department of Energy, Office of Biological and Environmental Research. The SSRL Structural Molecular Biology Program is also supported by the National Institutes of Health (NIH), National Center for Research Resources, Biomedical Technology Program, and the National Institute of General Medical Sciences. This study was funded by NIH grants R01-GM066087 (M.Y.) and P50-GM082545 (M.Y., C.P.H. and W.I.S.). B.G.-P. was supported by a postdoctoral fellowship from the George E. Hewitt Foundation for Medical Research.
Supplemental Data
The supplemental data include five figures (Figs. S1-S5).