The HIV integrase enzyme (IN) catalyzes the initial DNA breaking and joining reactions that integrate viral DNA in the host chromosome. Structures for individual IN domains have been determined by X-ray crystallography and NMR spectroscopy, but the structure of the complete IN-DNA complex has remained elusive. Homogeneous complexes of IN tetramers were assembled on DNA three-way junction substrates designed to resemble integration intermediates. Electron microscopy and single particle image analysis of these complexes yielded a three-dimensional reconstruction at ~27 Å resolution. The map of the IN-DNA complex displays four lobes of density ~50 Å in diameter. Three of the lobes form a roughly triangular base with a central channel ~20 Å in diameter. The fourth lobe is centered between two lobes and extends ~40 Å above the base. We propose that the central channel tethers the target DNA, and two of the lobes may bind the ends of the viral DNA. The asymmetry of the complex is a feature not incorporated in previous structural models and potentially provides a first view of an asymmetric reaction intermediate.
The retroviral-encoded integrase (IN) enzymes are members of a large family of recombinases that contain the D,D-35-E active site motif 1; 2; 3. X-ray crystallography and nuclear magnetic resonance spectroscopy have been used to determine high resolution structures of the three protein domains and two domain fragments of retroviral INs 4; 5; 6; 7; 8; 9; 10; 11. However, there is no structure available for the complete three domain IN protein or IN-DNA complex.
IN is a promising target for antiviral drugs because it is essential for HIV replication and because there are no close counterparts in host cells 1; 12; 13; 14. Understanding the correctly assembled IN-DNA complex is crucial for structure-based design of improved integrase inhibitors. Properly assembled IN-DNA complexes respond to small molecule inhibitors in vitro differently than do dissociated mixtures of free IN and DNA substrates. That is, integration reactions containing preassembled IN-DNA complexes are less prone to inhibition by nuisance compounds 15; 16, and screens using such reactions have yielded molecules that are showing success in clinical trials 14; 17. Thus further improvement of IN inhibitors would be greatly aided by structural information on the correctly assembled HIV IN-DNA complex.
The DNA breaking and joining reactions mediating HIV integration 18; 19; 20; 21; 22; 23; 24 are diagrammed in Fig. 1A. The immediate precursor for integration is the linear viral cDNA (Fig. 1A, part 1). Prior to integration, two nucleotides are removed from each 3' end by IN (Fig. 1A, part 2), a reaction that may serve to generate a homogeneous substrate for subsequent reaction steps 25; 26 and stabilize the IN-DNA complex 27; 28. A coupled transesterification reaction mediated by IN joins the recessed 3' ends of the viral DNA to the protruding 5' ends in the target DNA (Fig. 1A, part 3) 29. The specific enzymes responsible for repair of the resulting DNA gaps at each end of the viral DNA (Fig. 1A, parts 4 and 5) have not been fully identified, but host cell gap repair enzymes are likely candidates 30.
The complex that carries out integration in vivo is expected to involve a multimer of IN. Support for this idea can be inferred from the substrate symmetry, since the two viral DNA ends can be reasonably modeled as each bound by a different IN subunit in an IN multimer 4; 7; 8; 9; 10. Additional evidence is based on the results of genetic complementation studies, in which different IN mutants were found to complement each other when present in the same complex 31; 32. Furthermore, purified IN forms multimers readily in vitro 4; 33; 34.
A complication in studying IN-DNA complexes has been the poor solubility of the protein in vitro. In one approach to this problem, several studies have reported surface mutations that improved solubility and allowed three-dimensional (3D) crystallization and X-ray structural analysis of IN domains 7; 8; 35. Another strategy has been to assemble IN with DNA fragments. Exact mimics of integration intermediates, however, are not stably base paired (Fig. 1B). The DNA can instead be stabilized by linking the structure together as a pair of DNA three-way junctions, and such modified structures were bona fide substrates for Rous sarcoma virus (RSV) IN 36. Alternatively, the addition of oligonucleotides resembling the viral DNA ends yielded more homogeneous and soluble RSV IN complexes 37.
In this study we examined soluble HIV IN derivatives with DNA three-way junction substrates. Physical and spectroscopic analysis suggested that IN formed a tetramer bound to a single DNA substrate. Since the complexes were soluble and monodisperse, we used electron microscopy and image reconstruction to derive a 3D map at 27 Å resolution. A remarkable feature is that the triangular base of the complex encloses a central channel that we propose binds the target DNA. The structure was found to be asymmetric, a feature not previously considered in structural models for IN-DNA complexes. However, a recent functional study in vitro did conclude that the two viral DNA ends become integrated sequentially into target DNA in a defined order, implying the existence of an asymmetric intermediate 28.
IN complexes were assembled on DNA substrates designed to resemble the product of the IN-catalyzed DNA strand transfer reaction (Fig. 1A and B). Pilot studies suggested that a DNA formed by annealing five oligonucleotides as shown in Fig. 1C yielded the optimal DNA for assembly. The horizontal parts of the DNA as drawn mimic the integration target DNA, and the two diagonal DNA duplexes match the viral DNA ends (U3 and U5). In the intermediate shown in Figure 1B, note that each viral DNA end is joined to the target DNA on one DNA strand only. The branched DNA molecule in Figure 1C differs from the authentic intermediate by the DNA loop that attaches the 5' end of the right viral DNA end to the free 3' end in the adjacent target DNA. This prevents dissociation of the annealed oligonucleotides due to melting of the 5 bases of target DNA between the points of joining of the two viral DNA ends (grey circle in Fig. 1B), which is expected to take place at physiological temperatures and is known to take place in the authentic intermediate 38. The substrate that we used (Fig. 1C) differs from previously reported paired DNA three-way junctions 36 by having the non-biological connection in only one half of the DNA complex, a modification that was required for efficient assembly of HIV IN-DNA complexes (data not shown). The lengths of the DNA arms in the paired DNA three-way junction substrate were selected on the basis of pilot assembly experiments testing DNAs with different arm lengths (data not shown).
Several IN mutants were prepared and tested for improved complex formation and integration activity, including C56S/W131D/F139D/F185K/C280S, C56S/W131D/F139D/F185H/C280S, C56S/W131D/F139D/C280S, C56S/W131D/F139D/F185H/E246C/C280S, and C56S/W131D/F139D/E246C/C280S. The substitutions at positions 131, 139, and 185 improved solubility 8; 35, while the substitutions of Cys residues at 56 and 280 were well tolerated and may have reduced formation of oxidative side products during protein purification and storage 33; 39. To assess DNA binding efficiency, IN mutants were further modified to contain the E246C substitution and were tested for efficient cross-linking to DNA substrates that contained tethered sulfur atoms at LTR position 7 as described previously 39. The DNA substrates were end-labeled and incubated with the IN derivatives, followed by electrophoresis of the reaction products. Relative affinity was assessed by monitoring the formation of complexes in the presence of increasing salt concentrations. This analysis indicated that IN C56S/W131D/F139D/F185H/C280S formed DNA complexes with relatively high affinity (unpublished data), so IN derivatives containing these substitutions were used in further experiments.
To assemble complexes, purified IN protein was mixed with the annealed DNA three-way junction in the presence of 1 M NaCl and 5 mM CHAPS detergent. IN is known not to bind DNA in high ionic strength buffers, so the mixture was then dialyzed against buffer containing 100 mM NaCl and 5 mM CHAPS, which allowed assembly of IN-DNA complexes.
Size exclusion chromatography showed that the IN-DNA complexes eluted as a single peak with a mobility slightly greater than the 158 kDa marker (Fig. 2A). The expected size of an IN tetramer bound to one molecule of the paired DNA three-way junction is 168 kDa. The ratio of absorbance at 254 and 280 nm was consistent with a 4:1 stoichiometric ratio of protein and DNA (Fig. 2A and data not shown). The dissociation constant of the IN-DNA complex (KD=120 nM) was estimated by dilution of IN-DNA complexes followed by gel filtration (Fig. 2B).
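The expected mass quoted above can be checked with a rough calculation. The sketch below is only an illustration: it takes the five oligonucleotide sequences listed in the methods, uses the 128 kDa tetramer mass given in the text (32 kDa per subunit), and assumes an average of about 0.33 kDa per nucleotide, a textbook approximation that is not stated in the text. The function names are hypothetical.

```python
# Rough mass estimate for one IN tetramer bound to one paired three-way junction.
# From the text: 128 kDa tetramer (32 kDa per IN subunit) and the oligo sequences.
# ASSUMPTION (not from the text): ~0.33 kDa average mass per nucleotide.

OLIGOS = {
    "U3Bb": "CAAGTCACTGCTTTTACTGGAAGGGCTAATTA",
    "U3Tb": "TAATTAGCCCTTCCACCGCGCGTAGCCACAC",
    "U5B1b": "ACTGCTAGAGATTTTCC",
    "U5B2b": "GTGTGGCTACG",
    "U5Tb": "GGAAAATCTCAGCACGCGGGCAGTGACTTG",
}

KDA_PER_NUCLEOTIDE = 0.33  # assumed average; terminal phosphates are ignored
IN_SUBUNIT_KDA = 32.0      # 128 kDa tetramer divided by 4


def total_nucleotides(oligos):
    """Count nucleotides across all annealed oligonucleotides."""
    return sum(len(seq) for seq in oligos.values())


def dna_mass_kda(oligos):
    """Approximate mass of the annealed DNA substrate in kDa."""
    return total_nucleotides(oligos) * KDA_PER_NUCLEOTIDE


def complex_mass_kda(oligos, subunits=4):
    """Approximate mass of an IN multimer bound to one DNA substrate."""
    return subunits * IN_SUBUNIT_KDA + dna_mass_kda(oligos)


if __name__ == "__main__":
    print("nucleotides:", total_nucleotides(OLIGOS))
    print("DNA mass (kDa): %.1f" % dna_mass_kda(OLIGOS))
    print("complex mass (kDa): %.1f" % complex_mass_kda(OLIGOS))
```

Under these assumptions the DNA contributes roughly 40 kDa, so a tetramer on one substrate comes to about 168 kDa, whereas an octamer (`subunits=8`) would be close to 300 kDa.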
IN-DNA interactions in the assembled complexes were also characterized by DNase I protection (Fig. 2C). One strand of the DNA substrate was end-labeled with 32P (asterisk in Fig. 2D), and labeled substrates were incubated with varying concentrations of IN. The entire substrate within the complex became protected from DNase I digestion at ~150 μg/ml IN. Thus, we infer that the IN tetramer binds the paired three-way junction so that there is steric interference with DNase I over most of the DNA length.
Electron micrographs of negatively stained IN-DNA complexes showed a homogeneous distribution of compact, globular particles (Fig. 3a). Cross-correlation analysis using the EMAN software suite was used to sort individual particle images into classes of similar views that were then averaged. Data quality measures are presented in Fig. 3b and c. Representative individual particle images and their class averages are shown in Fig. 3d–f, respectively. The class averages were then merged to yield a starting 3D map of the complex. Comparison of the raw images with projections of the 3D model generated an improved set of class averages and subsequently an improved 3D map. The iterative refinement was continued until there was no further statistical improvement in the 3D map. The resolution was estimated by the Fourier shell correlation method, in which the data set was randomly divided in half and the two resulting maps were compared in resolution shells (Fig. 3b). Using a correlation coefficient cut-off value of 0.5, the resolution was estimated to be 27 Å.
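The 0.5 cut-off criterion can be illustrated with a short sketch. Given a Fourier shell correlation curve (correlation per resolution shell versus spatial frequency), the resolution is read off where the curve first falls below the cut-off. The function name and the example curve below are hypothetical and are not the data of Fig. 3b. Linear interpolation between shells is an assumption made here for illustration.

```python
def resolution_at_cutoff(freqs, fsc, cutoff=0.5):
    """Return the resolution (in Å) where the FSC curve first drops below cutoff.

    freqs: spatial frequencies in 1/Å, in ascending order.
    fsc:   correlation coefficient for each shell, same length as freqs.
    Returns None if the curve never crosses the cutoff.
    """
    for i in range(1, len(freqs)):
        above, below = fsc[i - 1], fsc[i]
        if above >= cutoff > below:
            # Linearly interpolate the crossing point between the two shells.
            frac = (above - cutoff) / (above - below)
            f_cross = freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


if __name__ == "__main__":
    # Hypothetical curve for illustration only.
    freqs = [0.01, 0.02, 0.03, 0.04, 0.05]
    fsc = [0.98, 0.90, 0.70, 0.40, 0.10]
    print("resolution (Å): %.1f" % resolution_at_cutoff(freqs, fsc))
```

For this made-up curve the crossing falls between the shells at 0.03 and 0.04 Å⁻¹, giving a value near 27 Å.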
Since we do not expect duplex DNA to be visible in a low resolution 3D map of negatively stained particles, the observed density is interpreted as representing predominantly the 128 kDa IN tetramer only (Fig. 4a). The map displays four lobes of density ~50 Å in diameter. Three of the lobes form a roughly triangular base with a central channel ~20 Å in diameter (Fig. 4a, top row). The fourth lobe is centered between two lobes and extends ~40 Å above the base (Fig. 4a, bottom row). We propose that the central channel tethers the target DNA, and two of the lobes may bind the ends of the viral DNA.
Several models for the structure of the IN-DNA complex have been proposed previously, based on the available high resolution structural data for IN domains, IN-DNA crosslinking data 39; 40; 41; 42; 43, as well as other biochemical experiments. We generated low resolution molecular envelopes for each model (Fig. 4b, c, d) to allow visual comparison with our 3D map.
Model 1 (Fig. 4b) satisfied constraints from structural and crosslinking studies, with particular emphasis on the results of disulfide-mediated cross-linking experiments 39. Model 2 (Fig. 4c) emphasized constraints on particle dimensions derived from fluorescence anisotropy studies 44. Model 3 (Fig. 4d) attempted to merge two of the two-domain IN structures, and DNA binding was modeled using the structure of a transposon-DNA complex 10; 45. A fourth model, emphasizing photo-crosslinking data, proposed that an octamer of IN was the binding moiety 41, but this is inconsistent with both the gel filtration data (Fig. 2A) and the dimensions of the reconstructed particle.
A common feature of models 1–3 is that the IN-DNA complex has two-fold rotational symmetry (C2). IN dimers are bound to each viral DNA end, and these assemble as a symmetric tetramer. However, the map in Figure 4a does not display C2 symmetry. Image reconstruction from negatively stained images must be interpreted with some caution due to possible artifacts arising during sample preparation. Nevertheless, uranyl acetate is itself a mordant, which can rapidly fix and preserve even transient biological structures 46. Although the previously proposed models are comparable in size to the 3D reconstruction of the IN-DNA complex, none of them is a close match to the map in Figure 4a.
The image reconstruction reported here suggests that the IN-DNA complex has a triangular base with a central channel, which resembles a variety of other DNA binding proteins that wrap around their substrates, including PCNA 47, topoisomerases 48; 49; 50 and polymerases 51; 52; 53. For the case of IN, the simplest interpretation is that the channel tethers the target DNA, and the lobes of density may anchor the viral DNA ends during catalysis. Circumferential binding of IN around the target DNA may serve to exclude solvent from the IN active site during the strand transfer reaction, thereby favoring use of the viral DNA 3' end as a nucleophile instead of water.
The unexpected asymmetry in the complex is intriguing. It is possible that the asymmetry results from the averaging of images of particles with different conformations. In this case, the map would be a composite and the variable features would be smeared out. It is also possible that the structure reflects an authentic asymmetric intermediate in the integration reaction. It is possible and perhaps likely that strand transfer at each of the two viral DNA ends is not simultaneous but instead sequential. If so, the integration reaction would proceed through a series of asymmetric reaction intermediates to accomplish the sequential integration of the two DNA ends, and indeed a recent biochemical study suggested that this is the case 28. The DNA substrate in the complex is asymmetric in primary sequence and also contains the stabilizing DNA loop at only one of the two junctions between the viral and target DNA. Thus, it is possible that the asymmetric DNA may have stabilized a previously unappreciated asymmetric reaction intermediate.
The plasmid expressing IN C56S/W131D/F139D/F185H/C280S from a T7 promoter was constructed by replacing segments of the IN coding region of a synthetic IN gene containing a hexahistidine affinity tag 54 with synthetic oligonucleotides 39. The modified IN protein was expressed in Escherichia coli strain BL21/DE3 by the addition of IPTG. A cell pellet was resuspended in 20 mM Tris pH 7.9, 0.2 M NaCl and lysed by sonication in the presence of 2 mg/ml lysozyme. The suspension was adjusted to 1 M NaCl, 5 mM β-mercaptoethanol (BME), 10 mM CHAPS, 5 mM imidazole, and 1X protease inhibitor cocktail 1 (Calbiochem) and then clarified by centrifugation for 30 min at 15,000 rpm in a JA20 rotor. Aggregates were removed by passage through a 0.45 micron filter, and IN was purified by binding and elution on Ni-NTA agarose (Qiagen) as described 39. A Centriprep YM-30 concentrator was used to exchange the elution buffer for thrombin buffer (1.0 M NaCl, 20 mM Tris pH 7.9, 5 mM BME, 10 mM CHAPS) and to remove imidazole. The hexahistidine tag was cleaved by overnight incubation with thrombin (1.5 units per ml IN solution) at 4 °C. The cleaved hexahistidine tag was removed by repeated passage of the IN solution over the Ni-NTA agarose. Thrombin was removed by chromatography using benzamidine sepharose 6B. IN was concentrated using a Centriprep YM-30 and dialyzed overnight at 0 °C against sample buffer (1 M NaCl, 10 mM Hepes pH 7.5, 10 mM BME, 10 μM ZnSO4, 10 mM CHAPS). IN activity was assayed using end-labeled DNA substrates as described 39.
The paired DNA three-way junction substrate was prepared by annealing 80 nmol of each of 5 oligonucleotides of sequence: U3Bb 5' CAAGTCACTGCTTTTACTGGAAGGGCTAATTA 3', U3Tb 5' TAATTAGCCCTTCCACCGCGCGTAGCCACAC 3', U5B1b 5' pACTGCTAGAGATTTTCC 3', U5B2b 5' GTGTGGCTACG 3', U5Tb 5' GGAAAATCTCAGCACGCGGGCAGTGACTTG 3'. Oligonucleotides were mixed, heated to 95 °C and cooled to 4 °C over 45 min.
To prepare IN-DNA complexes, 80 nmol DNA substrate was mixed with 320 nmol of purified IN in 1 M NaCl, 5 mM CHAPS, 10 mM DTT, 1 mM EDTA, 10 mM Tris-HCl pH 8.0. The mixture was then dialyzed against 5 mM Hepes pH 7.3, 5 mM dithiothreitol, 5 mM CHAPS, 100 mM NaCl, 10 μM ZnSO4. For gel filtration analysis, complexes were diluted in running buffer (80 mM NaCl, 20 mM Hepes pH 7.5, 10 mM CHAPS, and 10 mM BME) to 0.1 mg/ml, or as indicated in Figure 2, and then separated in running buffer by Superose 12 chromatography.
IN-DNA complexes prepared as described above were diluted to 0.35 mg/ml using buffer containing 20 mM Hepes pH 7.6, 10 mM CHAPS, 80 mM NaCl, and 10 mM dithiothreitol. Aliquots (~4 μl) were incubated for 1 min at room temperature on carbon-coated Maxtaform, 300-mesh Cu/Rh grids (Ted Pella, Inc., Redding, CA) rendered hydrophilic by glow discharge in the presence of amylamine. Excess solution was removed by blotting, and the sample was stained for 30 sec with 2% uranyl acetate. Images were recorded on Kodak SO163 film using a CM100 electron microscope (FEI/Philips) at a magnification of 52,000 ±1% and an underfocus of ~2.5 μm. Negatives were digitized on a Zeiss SCAI flat-bed scanning densitometer (ZI/Zeiss) with a step size of 7 μm, followed by 2-fold pixel-averaging, which resulted in a pixel size of 2.69 Å on the object scale.
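The stated pixel size follows directly from the scanner step, the pixel averaging, and the magnification. The short sketch below reproduces that arithmetic. The function name is hypothetical, and the calculation assumes that the nominal magnification applies at the film plane.

```python
ANGSTROM_PER_MICRON = 1.0e4


def pixel_size_angstrom(scan_step_um, averaging, magnification):
    """Pixel size on the object scale, in Å.

    scan_step_um:  scanner step size on the film, in micrometers.
    averaging:     linear pixel-averaging factor applied after scanning.
    magnification: microscope magnification at the film plane (assumed nominal).
    """
    pixel_on_film_um = scan_step_um * averaging
    return pixel_on_film_um / magnification * ANGSTROM_PER_MICRON


if __name__ == "__main__":
    nominal = pixel_size_angstrom(7.0, 2, 52000)
    # The +/-1% uncertainty in magnification carries over to the pixel size.
    low = pixel_size_angstrom(7.0, 2, 52000 * 1.01)
    high = pixel_size_angstrom(7.0, 2, 52000 * 0.99)
    print("pixel size (Å): %.2f (range %.2f to %.2f)" % (nominal, low, high))
```

With a 7 μm step, 2-fold averaging, and a magnification of 52,000, this gives 14 μm / 52,000, or about 2.69 Å per pixel, so the sampling is far finer than the 27 Å resolution of the final map.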
Image processing was performed with the EMAN software suite 55. 2,196 particles were manually selected and extracted as 100 x 100 pixel images. The optical density histograms for the pixels in each image were scaled to the mean and standard deviation for all images. The contrast transfer function (CTF) parameters for each micrograph were determined using the routine ctfit from the computed Fourier transform of the carbon film of each micrograph, and phase corrections were then applied to each particle image. The particles were then centered using cenalignint. To minimize the influence of surrounding noise, the 100 x 100 pixel images were masked at 64 x 64 pixels. Reference-free class averages of the particles were then generated using startnrclasses with about 60 particles in each of the 25 classes. The starting 3D model for reference-based alignment was generated by a cross common lines approach (program startany), using the 12 class averages with the highest signal-to-noise ratio. The class averages were first low-pass filtered at 20 Å, and five rounds of iteration were performed to determine the Euler angles for each group of class averages. Projections of the 3D starting model were computed at 9° intervals. The program refine was used to determine the x,y origin and the Euler angles for each particle by cross-correlation with the 216 projections of the starting model. Particle images with the same Euler angles were averaged, and the distribution of correlation coefficient values was determined. Those images with correlation coefficients that deviated by 0.8σ were rejected. The final set of 216 class averages was used to generate a new 3D model. For the next round of refinement, the new model was smoothed using threed.1a.mrc. After 20 cycles of refinement, the process was halted because the Fourier shell correlation with the previous model did not yield any substantial differences within the resolution cutoff.
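The rejection step can be sketched in a few lines. This is not EMAN code. It is a minimal illustration that assumes the criterion is one-sided, that is, a particle is dropped when its correlation coefficient lies more than 0.8σ below the mean of its class. The text does not specify the direction, so that choice, the function name, and the example values are all assumptions.

```python
from statistics import mean, pstdev


def keep_indices(correlations, n_sigma=0.8):
    """Return indices of particles retained after sigma-based rejection.

    ASSUMPTION: a particle is rejected when its correlation coefficient is
    more than n_sigma standard deviations below the class mean.
    """
    mu = mean(correlations)
    sigma = pstdev(correlations)
    threshold = mu - n_sigma * sigma
    return [i for i, cc in enumerate(correlations) if cc >= threshold]


if __name__ == "__main__":
    # Hypothetical correlation coefficients for one class of five particles.
    ccs = [0.90, 0.80, 0.85, 0.30, 0.88]
    kept = keep_indices(ccs)
    print("kept particles:", kept)
    print("class average would use %d of %d images" % (len(kept), len(ccs)))
```

In this made-up class only the poorly correlated fourth particle falls below the threshold, so the class average is computed from the remaining four images.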
The final 3D map was generated from 1,783 particles without applying any symmetry. To estimate the resolution of the final map, these 1,783 particles were randomly divided into two groups, and the two resulting 3D maps were correlated in Fourier space. The resolution was defined using a Fourier shell correlation cut-off value of 0.5. The final 3D map was visualized by the use of Chimera and Vis5d software (http://www.ssec.wisc.edu/~billh/vis5d.html). A protein density of 0.81 Da/Å³ (the reciprocal of a partial specific volume of ~1.23 Å³/Da) was used to set the isosurface threshold that corresponded to the molecular volume.
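The relation between the 0.81 Da/Å³ value and the isosurface threshold can be sketched as follows. The example takes the 128 kDa tetramer mass and the 2.69 Å pixel size from the text, and assumes that the map is sampled at that same voxel size and that only protein contributes to the enclosed volume. The function names and the toy density values are hypothetical.

```python
def molecular_volume_A3(mass_da, density_da_per_A3=0.81):
    """Volume (Å^3) occupied by a protein of the given mass."""
    return mass_da / density_da_per_A3


def voxels_enclosed(mass_da, voxel_size_A, density_da_per_A3=0.81):
    """Number of voxels the isosurface should enclose for the given mass."""
    return molecular_volume_A3(mass_da, density_da_per_A3) / voxel_size_A ** 3


def threshold_for_voxel_count(densities, n_voxels):
    """Density value such that the n_voxels highest voxels lie at or above it."""
    ranked = sorted(densities, reverse=True)
    return ranked[n_voxels - 1]


if __name__ == "__main__":
    # From the text: 128 kDa tetramer, 2.69 Å pixel size.
    # ASSUMPTION: the 3D map uses the same 2.69 Å sampling.
    print("volume (Å^3): %.0f" % molecular_volume_A3(128000))
    print("voxels: %.0f" % voxels_enclosed(128000, 2.69))
    # Toy map of five voxels, enclosing the two densest.
    print("threshold:", threshold_for_voxel_count([0.1, 0.5, 0.9, 0.3, 0.7], 2))
```

Under these assumptions the tetramer occupies roughly 158,000 Å³, or on the order of 8,000 voxels, and the threshold is the density level at which the isosurface encloses that many voxels.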
We thank Dr. Leslie Orgel and members of the Bushman laboratory for help and comments on the manuscript. F. B. was supported by grant GM068408 from the National Institutes of Health, the James B. Pendleton Charitable Trust, and Frederic and Robin Withington. K. G. was supported by the UCSD Center for AIDS Research (NIAID 2 P30 AI 36214-09A1). G. R. was supported by a postdoctoral fellowship from the Universitywide AIDS Research Program. M. Y. was supported by grant RO1 GM066087 from the National Institutes of Health and was the recipient of a Clinical Scientist Award in Translational Research from the Burroughs Wellcome Fund during this work.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.