A three-dimensional model of the gp120 portion of the trimeric HIV envelope glycoprotein complex was developed subject to the conditions that the surface of gp120 that is occluded in the oligomer interface should (i) maximize carbohydrate exclusion, (ii) minimize conserved residue exposure, and (iii) be sterically compatible with binding of the 17b antibody. Since gp120 interacts with gp41, which is a symmetric trimer in its isolated form and remains trimeric when in the complex, the model was constrained to be threefold symmetric. The occluded interface in the oligomer is expected to comprise gp120-gp41 contacts, gp120-gp120 contacts, and occluded surfaces that large ligands such as antibodies cannot access. The constraining conditions for the model were chosen because, a priori, they represent reasonable expectations about the oligomer interface and because they could be reduced to quantifiable criteria.
Glycosylation sites occur almost exclusively on the exposed surfaces of protein molecules, and they occur near protein interfaces only at the periphery. Carbohydrate residues in N-linked glycans tend to be both flexible and highly hydrated; although they can be secured by protein contacts, the resultant entropic loss is large, making such interactions generally unlikely. As a consequence, we could expect the oligomeric interface to be free of glycosylation.
We would also expect exposed surface residues on gp120 to be variable, a consequence of immune pressure. Possible causes of conservation are limited primarily to occlusion at the oligomer interface, involvement with receptor binding, or constraints of folding topology. Conservation due to the last two causes could be eliminated by examining residues for proximity to either the CD4 or the 17b Fab binding site (the 17b site here served as a surrogate for the chemokine receptor binding site [61]) and by performing a clustering analysis to remove statistical outliers (constraints on exposed residues for structural purposes should be rare).
Several ligands are known to bind to gp120 in the context of the oligomeric complex as well as to isolated gp120. The sites for such ligands must be accessible on the oligomer, away from the occluded interface, and also appropriately oriented for biological interactions. The 17b antibody is known to bind to oligomeric gp120 (53) and does not cause appreciable dissociation of the gp120 from the oligomer (44). Therefore, any valid oligomer model must be sterically compatible with 17b binding. Similarly, productive binding of HIV to CD4-positive cells is known to involve intact glycoprotein oligomers. Therefore, one expects the CD4 binding site to be both free on the oligomer surface and appropriately oriented for attachment to CD4 on the cell surface.
The crystal structure of core gp120 was first elaborated with a modeled completion of two N-terminal residues (one of which is glycosylated), of the V4 loop, and of the two N-acetylglucosamine residues at each site of N-linked glycosylation. This elaborated core was then oriented as a rigid body relative to a coordinate system established with a threefold-symmetry axis perpendicular to the viral surface (Fig. ). The initial orientation of gp120 was set such that the D1D2 portion of CD4 in the core gp120-CD4 complex would be superimposed onto a protomer of D1D2 in the structure of dimeric soluble CD4, oriented with its diad axis perpendicular to the hypothetical cell surface. (The dimeric soluble CD4, which consists of the entire extracellular portion of CD4, crystallizes as a dimer in three different space groups [60]; its orientation is physiologically relevant and thus serves to position the hypothetical cell surface.) The gp120 protomer was then reoriented about its center of mass, displaced from the triad axis sufficiently to avoid collisions with other protomers. First, all rotational orientations about the z axis were tested with respect to the quantifiable criteria, and the optimal value was found to be at 30° (Fig. a, left panel). Then, with the Z orientation at the optimum, rotations were made successively about the X and Y rotational axes (Fig. a, middle and right panels). A protomer in the optimized model can be obtained from the 1gc1 PDB coordinates by the Euler rotation (Θ1 = 6.90, Θ2 = 112.34, Θ3 = 22.60) followed by the translation (tx = −25.91, ty = −71.24, tz = 30.90). This procedure of successive rotations, which was used for computational economy, does not sample all of the rotational space; nevertheless, we expect the result to be close to the optimum for the three conditions since it preserves the initial orientation relative to CD4 on the cell surface. In addition, visual inspection of the final orientation confirms that it is close to, if not at, the global optimum.
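Because the model is constrained to be threefold symmetric, placing one protomer fixes the other two. The following minimal sketch illustrates that step only; it assumes the triad axis coincides with the z axis of the coordinate system, and the function names and the two-point "protomer" are hypothetical placeholders rather than the coordinates or software used for the model.

```python
import math

def rotate_z(point, angle_deg):
    """Rotate an (x, y, z) point about the z axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)

def build_trimer(protomer):
    """Return three copies of a protomer related by rotations of 0, 120,
    and 240 degrees about the z axis (assumed here to be the triad axis)."""
    return [[rotate_z(p, k * 120.0) for p in protomer] for k in range(3)]

# Hypothetical toy protomer: two points, the first 35 Å from the axis along x.
protomer = [(35.0, 0.0, 0.0), (40.0, 5.0, 10.0)]
trimer = build_trimer(protomer)
```

Each copy keeps its distance from the axis and its height along z, so any criterion evaluated on one protomer relative to the axis applies equally to the other two.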
FIG. 2 Quantitative modeling of the gp120 oligomer. Quantitative modeling employed three criteria: carbohydrate exposure (open circles), occlusion of conserved residues which are solvent exposed on a gp120 protomer (filled symbols), and steric considerations ...
The carbohydrate criterion and the exposed and conserved surface residue criterion were completely independent. Nonetheless, quantification of these two criteria led to maxima and minima within 20° of each other for all independent axial rotations (with the 17b steric criterion used bifunctionally to eliminate incompatible orientations) (Fig. ). Taken together, the three criteria produced well-defined peaks with all three independent rotational parameters, thereby allowing a “best” orientation to be derived at an X rotation of 0°, a Y rotation of 0°, and a Z rotation of 30° (0° 0° 30°). This best orientation actually corresponds to two possible alignments with respect to the viral membrane, “up” and “down.” One of these alignments could be eliminated due to CD4 (or 17b) steric constraints; the binding of CD4 (or 17b) in this orientation would bury it in the viral membrane (Fig. ), thus permitting a single unique best orientation to be derived.
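The way the three criteria combine in a one-axis rotational scan can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the score (mean distance of carbohydrate markers from the trimer axis minus mean distance of conserved exposed residues from it), the `allowed` predicate standing in for the 17b steric check, and all names and coordinates are hypothetical.

```python
import math

def rotate_about_center_z(point, center, angle_deg):
    """Rotate a point about a line parallel to z passing through center."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a),
            point[2])

def mean_axis_distance(points):
    """Mean distance of the points from the z axis (assumed trimer axis)."""
    return sum(math.hypot(x, y) for x, y, _ in points) / len(points)

def scan_z(glycans, conserved, center, allowed=lambda angle: True, step=10):
    """Scan rotations about z and return (best_angle, best_score).

    Assumed score: carbohydrate far from the axis and conserved residues
    close to it are both rewarded. Angles rejected by `allowed`
    (a stand-in for the steric criterion) are skipped."""
    best = None
    for angle in range(0, 360, step):
        if not allowed(angle):
            continue
        g = [rotate_about_center_z(p, center, angle) for p in glycans]
        c = [rotate_about_center_z(p, center, angle) for p in conserved]
        score = mean_axis_distance(g) - mean_axis_distance(c)
        if best is None or score > best[1]:
            best = (angle, score)
    return best

# Hypothetical toy data: protomer center 35 Å from the axis.
center = (35.0, 0.0, 0.0)
glycans = [(35.0, 10.0, 0.0)]
conserved = [(35.0, -10.0, 0.0)]
best_angle, best_score = scan_z(glycans, conserved, center)
```

Treating the steric criterion as a filter rather than a score term mirrors its role above, where it eliminates incompatible orientations instead of ranking them.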
FIG. 3 Trimeric model of gp120. Three orientations of the model are shown. The images at the top depict the view from the orientation of the viral membrane. The middle images depict the view from the side, in between the viral and target cell membranes. The ...
The resultant model of the core oligomer (Fig. ) displayed several relevant features that were not included in the modeling criteria. The observed and deduced binding sites for CD4 and neutralizing antibodies were all on exposed surfaces. The variable loops (V1 to V5) were all well exposed on the oligomer. The N and C termini were clustered at the end pointing toward the viral membrane (with the C terminus proximal to the oligomer axis poised to interact with the trimeric gp41). Additionally, the region identified by substitutional mutagenesis as interacting with chemokine receptors (45) was free of carbohydrate and pointed directly at the target membrane. These features are consistent with the biology of the gp120 trimer and with the qualitative characteristics of the rough model obtained previously (62).
To estimate the precision of the surface criterion quantification procedure, we turned to the influenza virus hemagglutinin system. Atomic-level structures are known for both the HA1-HA2 heterotrimer (59) (equivalent to the gp120-gp41 oligomeric complex) and a monomeric proteolytic fragment, the HA top, complexed with the Fab of a neutralizing antibody (4) (equivalent to the gp120-17b complex). In addition, the fusion-activated forms of gp41 and HA2 are structurally similar, and although core gp120 and the HA top show no sequence similarity, they have almost identical radii of gyration (20.6 and 20.8 Å, respectively), making rotations about a position displaced the same distance from the trimer axis reasonable.
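For reference, a radius of gyration can be computed from a set of coordinates as the root-mean-square distance from their centroid. The sketch below assumes equal weighting of all points; whether the values quoted above were mass weighted is not stated, and the function name and test coordinates are illustrative only.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of (x, y, z) points from their centroid.
    Assumption: every point carries equal weight (no mass weighting)."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    total = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2
                for p in coords)
    return math.sqrt(total / n)

# Hypothetical check: the eight corners of a cube with edge length 2 Å.
cube = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
rg_cube = radius_of_gyration(cube)
```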
The HA top contains much less glycosylation than core gp120. Only three sites are present, although no carbohydrate is modeled in the 2vir coordinates (X31 isolate). In contrast, 18 sites are present on the gp120 core model (HXBc2 isolate). To increase the accuracy of this criterion for the HA top, all nonconserved sites of glycosylation were identified and then used to enhance overall surface coverage, increasing the residues used in the carbohydrate criterion to 11. Even so, this was less than two-thirds of that used for gp120 (compare the structures in Fig. c). In addition, because side chains, instead of glycan moieties, were used to mark the positions of the carbohydrates, criterion uncertainty increased. Nevertheless, the residues identified tended to be on the outside of the trimer, and for rotations in Y and Z this criterion produced well-defined maxima close to the known trimer orientation (Fig. b and c).
The hemagglutinin sequence is much less divergent than the gp120 sequence. This makes it difficult to find a reasonable set of conserved exposed residues. (If there were no divergence, for example, this criterion would be meaningless.) Using 485 aligned HA1 sequences and a minimum solvent exposure of 40%, 39 residues showed 98% identity, 28 showed 99% identity, and 12 showed 100% identity. The 100% criterion was judged too strict because sequencing errors might account for some divergence, and so a 99% criterion was used. This stringent 99% criterion selected approximately the same number of surface residues as the less-restrictive gp120 criterion (for which all single-atom substitutions [e.g., Gln to Glu] were included, as well as larger substitutions as long as they did not change the character of the amino acid [e.g., Lys to Arg or Tyr to Trp]). While these residues tended to cluster at the known oligomer interface, the minima produced by this criterion were not well-defined (Fig. b and c). For example, the minimum Z rotation of the furthest conserved residues appears close to the maximum for the nearest (Fig. b, left panel), and the X rotation showed very little change in parameter distance (Fig. b, middle panel). This lack of definition may be related to the packing of the hemagglutinin trimer, with conserved residues clustering at two discrete interfaces, as opposed to gp120, for which one central cluster was observed (Fig. c). Such clustering would account, for example, for the local maximum at the origin of the Z rotation for the minimum conserved residue distance.
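The selection of conserved, exposed residues described above can be sketched as a filter over alignment columns. The sketch assumes that identity is measured as the frequency of the most common residue in each column and that solvent exposure is supplied as a per-column fraction; neither detail is specified in the text, and the function name and toy alignment are hypothetical.

```python
from collections import Counter

def conserved_exposed_positions(alignment, exposure,
                                min_identity=0.99, min_exposure=0.40):
    """Return column indices that are both solvent exposed and conserved.

    alignment: list of equal-length aligned sequences (strings).
    exposure: fractional solvent exposure for each alignment column.
    Assumption: identity is the frequency of the most common residue."""
    hits = []
    for col in range(len(alignment[0])):
        if exposure[col] < min_exposure:
            continue
        counts = Counter(seq[col] for seq in alignment)
        identity = counts.most_common(1)[0][1] / len(alignment)
        if identity >= min_identity:
            hits.append(col)
    return hits

# Hypothetical toy alignment: 100 sequences, one substitution in column 1.
alignment = ["ACD"] * 99 + ["AGD"]
exposure = [0.5, 0.6, 0.1]
at_99 = conserved_exposed_positions(alignment, exposure)
at_100 = conserved_exposed_positions(alignment, exposure, min_identity=1.0)
```

In the toy data, the single substitution removes column 1 under a 100% threshold but not under 99%, the kind of sensitivity to isolated differences (for example, sequencing errors) that motivated the 99% cutoff.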
Finally, because the HC19 Fab binds further from the trimer axis than 17b, it does not produce as much of a steric clash. Indeed, for rotations in X, no angles are restricted.
Judging from the poor shape of the quantification curves, the surface criteria optimization did not work as well with the HA top as with the gp120 core. Overall results of the optimization procedures are shown in Table . The internal agreement of the criterion optimizations, as judged by the peak orientational difference between the carbohydrate and conserved-residue extrema, was much better for gp120. Despite the poor agreement, the mean of the extrema for the HA top was within 20° of the correct orientation, suggesting that averaging these independent criteria reduced the overall error. In the case of gp120, the 17b steric criterion reduced the mean deviation even further, to one-fourth that observed for the HA top. This suggested that the error associated with the rotational parameters of the resultant gp120 oligomeric model was only 5°.
While it is conceivable that the deletion of the gp120 N and C termini (57 and 19 amino acids, respectively) could alter the modeling results, we feel that this is unlikely. With respect to the carbohydrate and 17b steric criteria, the missing termini are carbohydrate free and 17b binds to core gp120. Thus, both criteria should be unaffected. With respect to the conserved and exposed amino acid criterion, similar extrema were observed for both the furthest and closest residues (Fig. a, left panel). If some of the 15 exposed and conserved amino acids are covered by the termini, a subset should give similar results. In the event that all or nearly all 15 are covered, this would localize much of the missing termini in the same region as the previous exposed amino acids; since the missing termini correspond to the epitopes of virtually all of the nonneutralizing antibodies, this would again place the expected oligomer interface in a similar region. Finally, the agreement between the carbohydrate and exposed-residue criteria was so good that even if the exposed-residue criterion were omitted, the extrema would change by only 5° on average. Thus, we feel that the presence of the missing termini is unlikely to substantially alter the results obtained here.
Although the modeling precisely defined the rotational parameters of the trimer, the translational parameter was only partially determined. Steric constraints defined a minimum approach, and distance constraints between the C terminus of gp120 and the N terminus of gp41 defined a maximum, but these two criteria did not discriminate sharply. With the center of rotation for a protomer placed at 35 Å from the trimer axis, which was close to the sterically constrained minimum approach, the Cα diameter of the model was ~110 Å; if the (mannose)3 carbohydrate extensions were included, the diameter increased to ~150 Å. These dimensions agreed with electron microscopic observations of the viral spike, although these observations vary widely, from 100 Å (by negative staining of gp120 from SIV [22]) to 150 Å (by ultrathin section tannic acid cytochemistry and surface replica electron microscopy of HIV-1 [21]). If the distance from the trimer axis is proportional to the dimensions of the protomer in the x directions, using the relative proportions of hemagglutinin as a guide would place the gp120 core 30.5 Å from the trimer axis (Fig. c). A protomer of this hemagglutinin-proportionate model can be obtained from the 1gc1 coordinates with the Euler rotation specified previously followed by the translation (tx = −30.43, ty = −71.24, tz = 30.90). At this distance, steric clashes occur between neighboring gp120 protomers with the carbohydrate at position 197, although these are easily resolved by minor movements of this flexible portion of the model. (This proportionate model is shown in Fig. c and ; the rest of the figures depict the gp120 protomer with a center of mass at x = 35 Å.)
FIG. 7 Selected features of the oligomeric gp120 model. (a) The entire extracellular portion of CD4 (yellow Cα worm) is shown binding to the oriented gp120 oligomer (copper-brown Cα worm). Carbohydrate on the gp120 is colored cyan. Distances ...
Electrostatic analysis of the oligomeric core gp120.
Electrostatic analysis of our model of the oligomeric core gp120 defined an electropositive surface which would face the target cell membrane. We tested the robustness of this basic region to changes in the modeling parameters. As can be seen in Fig. , rotations of ±30° and translations of ±5 Å maintained the electropositivity of the region.
FIG. 4 Robustness of the basic cell-facing surface of core gp120 to variations in modeling parameters. The oligomer modeling was dependent on four independent parameters, one translational (trans) and three rotational (rot). The effect of varying these parameters ...
We used sequence analysis combined with homology modeling to test the conservation of this basic region throughout the primate immunodeficiency viruses. Although the charge of the region differed among viruses, its basic nature was conserved in different clades as well as with HIV-2 and SIV (Fig. ).
FIG. 5 Conservation of basic cell-facing surface on core gp120. Homology modeling was used to construct the corresponding structures of core gp120 for HIV-1 clades C and O as well as HIV-2 and SIV, starting with the crystal structure of HIV-1 clade B core gp120 ...
Analysis of the V3 loop.
The V3 loop comprises roughly 30 amino acids. It is too large to be modeled correctly without experimental constraints. We used the steric constraints inherent in connecting the V3 loop to core gp120 as well as those consistent with 17b Fab binding. Nevertheless, very different V3 loop structures could be successfully built (Fig. a).
FIG. 6 Modeling and electrostatic contribution of the V3 loop. (a) Different V3 loop models were constructed: alpha (green), beta (dark green; toward the back left of gp120 in this orientation), Haiti (forest green; depicted as the front right model in this ...
The overall charge of the V3 loop ranges from +2 to +10, with that of a CCR5-using isolate generally in the range of +3 to +5 and that of a CXCR4-using isolate (often a T-cell-line-adapted strain) being from +7 to +10. With the HXBc2 sequence (+9 charge on the V3 loop), we analyzed the electrostatics of three very different V3 loop structures, as well as the effect of varying the overall net V3 loop charge (Fig. c).
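A net loop charge of this kind can be estimated directly from sequence. The sketch below assumes the common simplification of counting Lys and Arg as +1 and Asp and Glu as −1, treating His as neutral and ignoring chain termini; the exact counting convention behind the values above is not stated, and the peptide shown is a made-up example, not a V3 sequence.

```python
def net_charge(sequence):
    """Approximate net charge of a peptide from its one-letter sequence.
    Assumptions: K and R count +1, D and E count -1, His is neutral,
    and terminal charges are ignored."""
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative

# Hypothetical peptide used only to exercise the function.
toy_peptide = "CTRKRGDGAEHC"
toy_charge = net_charge(toy_peptide)
```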
Analysis of the potential near the cell-facing region of the oligomer showed a correlation between the potential and the overall V3 loop charge, especially at distances further than 20 Å from the gp120 core (Fig. c, middle and right panels). In contrast, the potential did not seem to correlate with the precise structure of the V3 loop (Fig. c, left panels).
The relatively low electrostatic potential for a zero V3 loop net charge suggested that the contribution of the core to the overall potential was small. We tested this explicitly by calculating the electrostatic potential as a function of only the V3 loop (Fig. c). At long distances and with a high charge (for example, CXCR4-using isolates), the V3 loop potential dominated, approximating closely the potential for the entire core with V3 loop. At short distances and with a low V3 loop charge, the core contributed significantly to the electrostatic potential. This was especially true at x = 35, y = 0, directly below the protomer, close to the above-described basic conserved region on the core.