The primary goal of these studies has been to probe the solution properties of IN tetramers, as this oligomeric state has been frequently implicated in the synaptic and strand transfer complexes formed between IN and DNA. Here, we have employed SAXS to consider models for the arrangements of domains in a series of complexes formed between HIV IN and human LEDGF. Although the resolution of SAXS is not sufficient to reveal specific inter-domain interactions, this method can rule out models of quaternary arrangements that are inconsistent with the determined shape properties and with hydrodynamic properties derived from complementary methods. The different IN and LEDGF truncations examined here have allowed us to consider specific questions about the relative positioning of domains such as the NTD and the IBD.
There are currently no crystal structures available containing full-length HIV IN. Thus, the intermolecular contacts observed in two-domain crystal structures have been used to construct plausible models of higher oligomeric forms of the IN protein (18, 25, 26, 34, 35, 37–39, 62). Indeed, the models we consider here are based on similar logic. We note that our use of the term “tetramer” is only intended to indicate a stoichiometry of an IN oligomer that involves four subunits and does not imply an equivalence or particular symmetry relationship between the protein domains. As noted earlier, the solution properties of the complexes we have studied here are largely consistent with tetrameric assemblies that could be best thought of as a dimer of dimers rather than a symmetric tetramer, a view that is very much in line with current thinking about the functional forms of IN (18, 21, 25, 28, 32, 63).
The IN tetramer model most consistent with our experimental data (,
model 1) is based on elements from three separate crystal structures (15, 18, 26). The dimer-dimer interface of this model is formed from two NTDs and two CTDs of IN that interact with one another and with the CCDs. The IN CTDs contribute extensively to this interface, consistent with the previously established importance of this domain to oligomerization (20) and the associated impact of mutations at this interface (L241A, L242A) (64, 65). Interestingly, the remaining two CTDs do not contribute to the dimer-dimer interface in this model and are largely solvent-exposed. A more extensive kink in the helix linking the CCD and the CTD or an entirely different linker conformation could allow these additional domains to participate in the dimer-dimer interface, but we restricted our modeling to rigid-body placement of structural fragments available from existing crystal structures and did not employ any type of flexible fitting.
Model 1 also features an asymmetric positioning of the NTDs. One NTD is largely buried in the dimer-dimer interface, where it occupies an apical position with respect to the CCD dimer. The other NTD occupies a lateral position, where it is engaged in binding to LEDGF and is not directly involved in tetramer formation. Due largely to its extended form, this model provides the best agreement with the SAXS envelope and therefore best represents the shape of the IN·LEDGF tetramer in solution.
The second type of model we considered (,
model 3) is based on a different dimer-dimer interface. In this case, the CTDs are not directly involved, and the interface is mediated entirely by interactions between the four NTDs in the tetramer, as well as between the NTDs and the CCDs. This dimer-dimer interface is based on crystal packing between two IN(NTD-CCD) dimers in the crystal structure (18), where a dimer of NTDs in apical positions is associated with each CCD dimer, and two of these NTDs are shared with the opposing CCD dimer where they occupy lateral positions and engage the LEDGF(IBD)s. The protein-protein interface in this model is more extensive in terms of buried surface area compared with model 1; however, the more compact globular shape is less consistent with the elongated nature of the SAXS envelope.
In addition to models 1 and 3, we considered alternative modes of IBD binding to the CCD, in which the IBD binds without interactions involving the NTD. These IBD positions (models 2 and 4) are unlikely for the 4:2 IN·IBD complex, but would be expected to be occupied in the 4:4 IN·LEDGF(Cterm) complex. Indeed, the SAXS data for this larger complex support the placement of LEDGF molecules in the context of models 1 and 2.
If the IN·IBD tetramer represented by model 1 is in fact closely related to the oligomeric form that exists in vivo, then we should be able to explain previous observations based on the proposed domain organization. For example, the zwitterionic detergent CHAPS has a reported dissociative effect on IN tetramerization (66, 67). In the IN(CCD-CTD) crystal structure, two CHAPS molecules are bound to each CTD, and one of these binding sites would most likely be excluded by the dimer-dimer interface in model 1. At high concentrations, CHAPS would therefore be expected to compete for this binding site and disrupt formation of the IN tetramer.
A tetramer model should also be able to explain the observation that LEDGF binding lowers the dimer-tetramer Kd value, thereby stabilizing the tetrameric state (this study and see Refs. 21, 22). Because the IBD-binding sites are well separated from the surfaces used for oligomerization in all of our candidate models, the effects of IBD binding on tetramer formation are expected to be indirect. We propose that this stabilizing effect involves the positioning of NTDs in the dimeric versus tetrameric forms of IN. Upon tetramerization, model 1 requires that one NTD remain in the apical position and one move to the lateral position. This is likely to be true even in the absence of LEDGF, because there does not appear to be room in the dimer-dimer interface for both NTDs without making significant adjustments to the CTD positions. Binding of LEDGF to IN could therefore promote tetramer formation by stabilizing the lateral configuration of two of the NTDs. This mechanism of tetramerization also explains why two types of IBD-binding sites are created in the tetramer, resulting in a stable 4:2 IN·LEDGF complex with the IBD but a 4:4 complex with the higher affinity Cterm construct. Because we do not yet know the nature of the interactions between IN and the LEDGF sequences flanking the IBD, we cannot rule out additional effects where LEDGF plays a more direct role in facilitating tetramerization via these additional elements.
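The link between a lower dimer-tetramer Kd and a larger tetramer population can be illustrated with a simple two-state mass-action calculation. The sketch below is only an illustration: the strict 2 dimer ⇌ tetramer model, the function name `tetramer_fraction`, and the Kd and concentration values are assumptions chosen for the example, not quantities measured in this study.

```python
import math


def tetramer_fraction(total_dimer_conc, kd):
    """Fraction of protein in tetramers for 2 D <=> T with Kd = [D]^2 / [T].

    total_dimer_conc: total protein in dimer equivalents, [D] + 2[T].
    kd: dimer-tetramer dissociation constant, in the same units.
    """
    # Mass balance: 2[D]^2 + Kd*[D] - Kd*total = 0; keep the positive root.
    free_dimer = (-kd + math.sqrt(kd * kd + 8.0 * kd * total_dimer_conc)) / 4.0
    tetramer = free_dimer ** 2 / kd
    # Each tetramer accounts for two dimer equivalents.
    return 2.0 * tetramer / total_dimer_conc


if __name__ == "__main__":
    # Hypothetical values in arbitrary but consistent units (e.g. micromolar).
    for kd in (10.0, 1.0, 0.1):
        frac = tetramer_fraction(3.0, kd)
        print(f"Kd = {kd:5.1f}  tetramer fraction = {frac:.2f}")
```

At a fixed total concentration, lowering Kd in this model shifts protein from the dimeric to the tetrameric state, which is the qualitative behavior attributed to LEDGF binding above.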
A recent cryo-EM study of wild-type IN bound to full-length LEDGF (~60 kDa) both in the absence and presence of U5 DNA described an IN:LEDGF stoichiometry of 4:2 based on mass spectral analysis of a chemically cross-linked complex. This work proposed an IN·LEDGF tetramer structure in the absence of DNA that differs considerably from the models described here (27). We calculated geometric properties for an IN·IBD complex based on the EM tetramer and docked this model into our SAXS envelope. The results are summarized in and in supplemental Fig. 5. Although the dimensions of the EM model result in favorable Dmax and Rg values, the shape of the model results in a poor fit to the SAXS envelope (cc = 0.61), with several regions protruding outside the envelope and parts of the envelope unaccounted for by the model. The calculated P(r) distribution is also in poor agreement with that derived from the experimental data. It is thus difficult to reconcile the EM protein-only model with the hydrodynamic properties and shapes of the IN·LEDGF structures described here. Further biophysical characterization of the full-length IN·LEDGF complex in solution, perhaps coupled with EM studies of the minimal IN·LEDGF complexes described here, will no doubt be required to fully understand the differences.
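For readers unfamiliar with the geometric quantities compared above, the following sketch shows how Rg, Dmax, and an unnormalized pair-distance histogram (a crude stand-in for P(r)) can be computed from a set of model coordinates. It is a simplified illustration, not the procedure used in this work: it treats every atom as an equally weighted point, ignores atomic form factors and the hydration shell, and all function names are hypothetical.

```python
import math
from itertools import combinations


def radius_of_gyration(coords):
    """Root-mean-square distance of equally weighted points from their centroid."""
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / n)


def max_dimension(coords):
    """Largest pairwise distance in the model (Dmax)."""
    return max(math.dist(a, b) for a, b in combinations(coords, 2))


def pair_distance_histogram(coords, bin_width):
    """Counts of pairwise distances per bin; bin i covers [i*w, (i+1)*w)."""
    distances = [math.dist(a, b) for a, b in combinations(coords, 2)]
    counts = [0] * (int(max(distances) / bin_width) + 1)
    for d in distances:
        counts[int(d / bin_width)] += 1
    return counts


if __name__ == "__main__":
    # Toy "model": four points at the corners of a square (arbitrary units).
    square = [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0)]
    print(radius_of_gyration(square))
    print(max_dimension(square))
    print(pair_distance_histogram(square, 0.5))
```

Two models can share similar Rg and Dmax values yet differ in how their pair distances are distributed, which is why the P(r) comparison and the envelope fit can disagree with a model whose overall dimensions appear favorable.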
The Kd values of dimerization and tetramerization reported here for IN and IN·LEDGF complexes are in the micromolar range, supporting the idea that IN may switch among oligomeric forms during viral replication (20, 22, 66–68). Indeed, viral DNA has been reported to dissociate IN tetramers (67), and recent studies suggest that distinct IN arrangements are formed on DNA during the various steps on the integration pathway (29, 32), with dimers first binding to each viral LTR (30). Thus, there are likely to be distinct quaternary structures of dimeric and tetrameric forms of IN that form during the integration reaction when bound to DNA. Similarly, the domain organization of IN oligomers may change upon binding viral DNA, an observation that has been made in a number of nucleic acid-binding proteins (69). Indeed, the distances between IN active sites in our tetramer models (77 Å for model 1) are far greater than the ~18 Å required for catalysis of concerted integration (18, 26, 67), indicating that they cannot represent the form of IN required for the final stage of the integration process.
LEDGF stimulates tetramerization of lentiviral INs. Although the IN·LEDGF interaction appears to be most important for viral integration, the capacity for IN alone to multimerize into tetramers could be important for several additional stages of the viral life cycle. Many amino acid substitutions in IN are known to affect assembly and morphology of viral particles (70), and recent work has demonstrated a role for IN in the uncoating of the viral core (71). Additionally, mutations that disrupt IN tetramerization affect its ability to interact with Gemin2 and assemble with reverse transcriptase on viral RNA (65).
In viral producer cells, IN is synthesized as a part of the Gag-Pol polyprotein precursor, which contains the myristoylated matrix (MA) protein at the N terminus; structural proteins, including capsid (CA), and enzyme precursors, including reverse transcriptase (RT), as intervening components; and IN at the C terminus. HIV is thought to contain ~2000 Gag molecules (composed of MA, CA, and nucleocapsid) and ~100 Gag-Pol molecules (composed of Gag plus protease, RT, and IN) (72, 73). Thus, the great majority of IN monomers synthesized will not contribute to an IN tetramer involved in catalysis of integration but, judging from mutational studies, may well participate in some aspects of assembly. LEDGF does not appear to play an essential role in these potential nonintegration activities of IN (8, 74), but because LEDGF binds tightly to IN in infected cells, the properties of IN tetramers described here are likely to represent much of the IN present during viral assembly and maturation.
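The stoichiometric argument above can be checked with a short calculation. The sketch below assumes one IN monomer per Gag-Pol precursor and a single catalytic tetramer per virion; both are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
def catalytic_in_fraction(gag_pol_per_virion=100, catalytic_tetramers=1):
    """Fraction of packaged IN monomers that end up in catalytic tetramers.

    Assumes one IN monomer per Gag-Pol molecule (illustrative assumption).
    """
    in_monomers = gag_pol_per_virion          # one IN per Gag-Pol
    in_catalytic = 4 * catalytic_tetramers    # four subunits per tetramer
    return in_catalytic / in_monomers


if __name__ == "__main__":
    fraction = catalytic_in_fraction()
    print(f"IN monomers in a catalytic tetramer: {fraction:.0%}")
    print(f"IN monomers available for other roles: {1 - fraction:.0%}")
```

Under these assumptions, only a few percent of the packaged IN monomers would be needed for a single catalytic tetramer.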
Our ultimate goal in this work is to develop structural models for IN assemblies that play a role in the virus life cycle and to understand the role of host factors in the formation and function of these assemblies. Here, we have presented initial studies that aimed to develop a biophysical basis for understanding the oligomeric states, stoichiometries, and solution shapes of IN·LEDGF complexes. The next step will be to carry out similar studies on IN assemblies bound to DNA, again considering different combinations of protein variants and substrate forms. For these macromolecular complexes, neutron scattering offers the additional advantage of providing contrast between the protein and DNA components.