Rationalization of the crosslinking data in the context of currently available structural information
Photocrosslinking and chemical crosslinking data available to date, combined with results presented in this study, were compared with the interactions observed in the recently solved structures of the PFV intasome. In order to identify corresponding residues, a structure-based sequence alignment of ASV IN, HIV-1 IN, and PFV IN was created by superimposing the coordinates of the individual domains of the ASV and HIV-1 INs on the structure of full-length PFV IN complexed with the viral and target DNAs (). A summary of our analyses is presented in , , ,
. Comparison of the data from different sources was complicated by the fact that investigators have numbered the nucleotides in the DNA substrates in different ways. For example, in several studies numbering of the cleaved strand starts with the first adenine at the 3′-end, so that the numbers “−1” and “−2” are assigned to the two extra nucleotides at the 5′-end of the non-cleaved strand (e.g. Gao et al. 
). In the structures of PFV IN complexed with DNA, numbering from the 5′-end was introduced for the cleaved strand of viral DNA, which places the 3′-end adenine at number 17. Because the length of the oligonucleotides used in different studies varies, numbering from the 5′-end introduces additional confusion: structurally equivalent nucleotides in cleaved strands of different lengths receive different numbers. We, as well as some others, elected to number the non-cleaved viral DNA strand from the first nucleotide at the 5′-end. The first nucleotide at the 3′-end of the cleaved strand of processed substrate (closest to the junction in Y-mer or X-mer integration intermediate substrate) is assigned #3 (, green strand). For the target DNA, numbering of both strands starts from the junction of the integration site (, pink and blue strands). In order to compare our crosslinking results with IN-DNA contact data from other laboratories, we have translated all nucleotide numbering of the strands that vary in substrate DNAs into this format. However, as a reference, we have included in curly brackets the original numbers from Maertens et al. 
and Krishnan et al. 
for the nucleotides shown to interact with PFV IN.
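The translation between numbering schemes described above can be illustrated with a minimal sketch. It assumes that the cleaved strand is numbered contiguously from its 5′-end in the source study and that the 3′-terminal nucleotide of the processed strand is assigned #3 in the common scheme; the function name and parameters are hypothetical and are not taken from any of the cited studies.

```python
def to_common_numbering(pos_from_5prime: int,
                        terminal_pos: int,
                        terminal_label: int = 3) -> int:
    """Translate a cleaved-strand position numbered from the 5'-end
    into the junction-based scheme used in this section.

    pos_from_5prime: position in the source study (1 = 5'-terminal nucleotide).
    terminal_pos:    5'-based number of the 3'-terminal nucleotide of the
                     processed strand (17 in the PFV intasome structures).
    terminal_label:  number given to that 3'-terminal nucleotide in the
                     common scheme (#3 for processed substrate).
    """
    if not 1 <= pos_from_5prime <= terminal_pos:
        raise ValueError("position lies outside the cleaved strand")
    # Moving one nucleotide toward the 5'-end moves one step away from
    # the junction, so the common number increases by one.
    return terminal_label + (terminal_pos - pos_from_5prime)


# PFV example: the 3'-end adenine (number 17 from the 5'-end) becomes #3.
pfv_terminal_adenine = to_common_numbering(17, terminal_pos=17)
pfv_neighbor = to_common_numbering(16, terminal_pos=17)
```

Because the result depends only on the distance from the 3′-end, structurally equivalent nucleotides in oligonucleotides of different lengths receive the same number under these assumptions.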
To identify the functionally equivalent residues in ASV, HIV, and PFV INs, the structures of individual NTD, CCD, and CTD domains were superimposed upon the structure of the complex of PFV IN with DNA (PDB code 3OS0). Some chemical and photocrosslinking data identify the individual points of contact between the proteins and DNA. If a method does not allow one to specify a single contact point in both protein and DNA, then these data are not sufficient to establish the exact correlation with results from crystallography, even when they do not contradict them. Such data can be categorized only as either “do not contradict,” if IN and DNA are in proximity to each other in the PFV IN structure, or “no contact,” if IN is remote from substrate DNA in the PFV intasome. Specific residues shown to interact with DNA that are either in good correlation with the PFV structural results or do not contradict them are bolded in , , , . The tabulated results show that the correlation between the PFV crystal structures and experimental data from crosslinking, mutagenesis, protease mapping, and mass spectrometry for ASV, MuLV, and HIV-1 IN proteins is highest for the CCD (, with color coding as in ). The crosslinking results that pinpoint individual IN-DNA contacts in the NTD and CTD of HIV-1 and ASV IN proteins show low correlation with the interactions observed in the structure of PFV IN complexed with DNA.
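The categorization rule described in this paragraph can be written as a small decision function. This is only a sketch of the logic as stated in the text: the function name, argument names, and the two labels used for pinpointed contacts ("correlates" and "does not correlate") are hypothetical, and the judgment of proximity in the PFV structure is supplied by the user rather than computed.

```python
def categorize(pinpoints_both_partners: bool,
               near_dna_in_pfv: bool,
               same_contact_in_pfv: bool = False) -> str:
    """Categorize one biochemical data point against the PFV intasome.

    pinpoints_both_partners: True if the method identifies a single contact
                             point in both the protein and the DNA.
    near_dna_in_pfv:         True if the equivalent IN region is in proximity
                             to DNA in the PFV structure.
    same_contact_in_pfv:     True if the equivalent residue-nucleotide contact
                             is observed in the PFV structure.
    """
    if not pinpoints_both_partners:
        # Such data cannot establish an exact correlation; they can only be
        # sorted by whether IN and DNA are close in the structure.
        return "do not contradict" if near_dna_in_pfv else "no contact"
    # Hypothetical labels for data that specify both contact partners.
    return "correlates" if same_contact_in_pfv else "does not correlate"


# A peptide-level crosslink whose PFV equivalent lies close to DNA:
peptide_level_result = categorize(False, near_dna_in_pfv=True)
```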
Interactions between DNA and the CTD
For the CTD, none of the individual contacts revealed in ASV and HIV-1 IN proteins by crosslinking or other methods can be correlated with those observed in the crystal structures of PFV IN-DNA complexes (). Our crosslinking results with ASV IN show contacts of Arg244 to both strands of viral DNA at positions 10–12. However, in the PFV intasome structure 
, the equivalent residue, Asn348, is separated from the corresponding positions on DNA by the linker regions that connect the CCD with the NTD and CTD (). We note that while not seen in the PFV intasome structure, CTD interactions with the trans viral DNA remain a possibility and could be accomplished with minor movement of the domain. Results of Gao et al. 
indicate that residues Ser230 and Glu246 of HIV-1 IN interact with bases 1 and 7 of the non-cleaved strand of viral DNA, respectively. Crosslinking experiments based on the electron microscopy model obtained by Michel et al. 
provided evidence for contact between Lys266 in HIV-1 IN and nucleotides 6–7 in the non-cleaved strand of viral DNA. These results are not in agreement with the HIV-1 model of Krishnan et al. 
, which was derived from the PFV crystal structure. Contact between the CTD of HIV-1 IN and the base of thymidine 6 of the non-cleaved strand of viral DNA as reported by Esposito et al. 
faces the same problems as the contact of nucleotide 7 with residue 246 
, as the linker regions separate the protein and DNA in the PFV intasome.
Structural interpretation of crosslinking data for ASV IN.
Residues Leu234, Arg262, Arg263 and Arg269 in the CTD of HIV-1 IN, which have been shown to interact with DNA by modeling and/or experimental studies, were also implicated as interacting with DNA by several mass-spectrometry and mutagenesis studies (). Residues that are structurally equivalent to Arg262 in HIV-1 IN are Ile366 and Ser262 in PFV and ASV INs, respectively. Because of the different sizes of the side chains of these residues in the three INs, analogous contacts cannot be made with PFV IN, and for ASV IN this seems problematic. Similarly, the capability of Leu234 and Arg263 in HIV-1 IN to contact DNA appears to correlate with the presence of arginine at structurally equivalent positions in ASV and MuLV IN proteins. However, HIV-1 IN Arg269 and PFV IN Ser373 both interact with DNA. The segment containing Ser373 is located at the very end of the visible CTD of PFV IN, and the flexibility of this part of the protein may facilitate interaction with DNA.
Heuer et al. 
showed that the azidophenacyl photocrosslinker, attached to a unique phosphorothioate located between nucleotides 6 and 7 of the cleaved strand of viral DNA, could be crosslinked to the peptide comprising residues 247–270 of HIV-1 IN. While some residues from the corresponding range in PFV IN are within reach of the equivalent nucleotides 6 and 7 in the crystal structure, the specific residues in HIV-1 IN that are involved in these interactions are unknown.
Interactions between DNA and the CCD
Much more information regarding the sites of contact with DNA is available for the CCDs of various INs. Of the 27 individual residues and 7 peptide ranges identified in 50 experimental data points that were analyzed and presented in , as making contact between the CCD and DNA, 37 IN-DNA contacts corresponded to residues analogous to those observed to interact with DNA in the crystal structure of the PFV intasome.
Our photocrosslinking data indicate that S124C of ASV IN makes contact with the third nucleotide of the cleaved strand of target DNA, and a minor contact with nucleotide 8 of the same strand (). In the crystal structure of the PFV intasome the analogous residue makes contacts with nucleotides 3 on the cleaved and 6 on the non-cleaved strands of the target DNA (shown in red in , they correspond to nucleotides 3 and 3, respectively, in the numbering system used here). The nucleotide corresponding to nucleotide 8 on host DNA complexed to ASV IN (minor contact) is not visible in the structure of the PFV intasome due to the mobility of the ends of the host DNA in the absence of contacts with the protein. This crosslink might be attributed to the flexibility of the photocrosslinking tether combined with mobility of the ends of host DNA (see Materials and Methods).
Photo- and chemical crosslinking data for I146C of ASV IN identified nucleotide 3 of the cleaved strand of viral DNA as the point of contact. Contact between I146C and nucleotide 2 of the non-cleaved strand of viral DNA was also detected by chemical crosslinking (,
). In MuLV, the structural equivalent of this residue is Cys209. Photo- and chemical-crosslinking experiments on MuLV by Vera et al. 
confirmed the involvement of this residue in the interactions with the viral end of DNA in the active site area. Cys209 in MuLV IN is reported to make contact with nucleotide 1 on the non-cleaved strand of viral DNA (). The corresponding residue in PFV IN, Thr210, also contacts the base of nucleotide 3, as in ASV IN, but in the non-cleaved DNA strand. All chemical crosslinks involving the ASV I146C derivative are maintained with the bases of the corresponding nucleotides. The contacts between Thr210 and DNA in PFV IN are localized in the minor groove between two strands; therefore the data from ASV IN correlate reasonably well with the PFV structure (). Residue 146 in ASV IN and the corresponding residues in HIV-1 and PFV INs are located within the active site flexible loop, which has been shown to adopt multiple conformations in different IN structures with various inhibitor, substrate, and pH/buffer conditions. The tip of this loop can move up to 7 Å under conditions that do not alter the overall three-dimensional structure of the CCD. In the PFV intasome, this loop is inserted between the ends of the complementary strands of viral DNA (). Therefore, if a similar position is assumed by the ASV loop when complexed with viral DNA, 146C would be able to interact with nucleotides on both strands.
Photo- and chemical crosslinking data for CCD-DNA contacts have been reported by several other groups. The contact for the HIV-1 residue Lys159 reported by Jenkins et al. 
is with the A3 nucleotide at the 3′-end of the processed strand. This amino acid is equivalent to Lys228 in PFV IN, which interacts with the phosphate backbone between nucleotides 3 and 4. The crosslinking observed in 
between HIV-1 Lys159 and N7 of the base of A3 requires some adjustment of the orientation of the A3 base relative to that seen in the PFV intasome structure.
The results of S-S crosslinking 
of both blunt and processed DNA substrates and the results of photocrosslinking 
of blunt DNA substrates to HIV-1 IN Q148 implicate two neighboring nucleotides of the non-cleaved strand of viral DNA, #2 and #1, respectively, for interaction. Although these nucleotides are found in the crystal structure of PFV IN in the vicinity of S217 (analogous to Q148 in HIV-1 IN), their bases, modified for crosslinking experiments, point away from the side chain of S217. As suggested by Krishnan et al. 
, such discrepancies can be attributed to the experimental setup (blunt vs. processed substrates) or to conformational mobility of the crosslinker.
Several amino acid residues of HIV-1 IN were reported by Alian et al. 
to be involved in crosslinking, but these results do not match the IN-DNA contacts found in the PFV intasome structure for the corresponding pairs. There was a very low correlation of crosslinking data for HIV-1 residues 143, 160, and 164 using processed DNA with the model of HIV-1, which is derived from the PFV intasome structure 
. Nucleotide A1 of the non-cleaved strand of viral DNA, identified by crosslinking as interacting with all three HIV-1 IN residues (), cannot reach the corresponding residues in the crystal structure of PFV IN. Only one contact was detected between HIV-1 Y143 and nucleotide A1 when a blunt-ended substrate was used 
. The same contact was identified by Esposito et al.
in photocrosslinking experiments with blunt DNA substrates 
. Alian et al. 
suggested that the loop comprising HIV-1 residues 160–164 comes in close proximity to the 5′-end of the non-cleaved strand of viral DNA only during strand transfer. This hypothesis is inconsistent with the HIV-1 IN model 
, as Lys160 lies within contact range of G8 and quite far from the integration center. HIV-1 Y143 is not listed as a possible contact with viral end DNA by Krishnan et al. 
but is positioned in close proximity to processed target DNA nucleotides (#1, 2, 3, −1, −2, and −3) closest to the integration site. It should be noted that, under some conditions, DTNB activation can produce nonspecific crosslinks.
Gao et al.
detected contacts between HIV-1 I191C and two nucleotides, 1 and 7, of non-processed viral DNA by S-S crosslinking 
. In the PFV intasome structure 
, the amide of V260 (I191 in HIV-1 IN) is located 4.5 Å away from the phosphate of nucleotide #7 of the non-cleaved strand of viral DNA, which is reasonable if the length of the thiopropyl linker is taken into account.
While the photocrosslinking experiments that probed interactions between specific modified nucleotides and HIV-1 IN 
in most cases do not provide exact localization of the contact sites on the IN protein, comparison of the relative positions of the identified peptides (49–69, 51–64, 139–152, and 158–198) and DNA shows good correlation for 11 out of 13 reported crosslinking contacts when compared to the PFV intasome structure 
, the ASV IN two-domain structure (PDB code 1C0M) 
superimposed on the corresponding domains of the PFV intasome, and the model of the HIV-1 intasome 
. Some of these peptides have been targeted from multiple locations on DNA. For example, HIV-1 peptide 49–69 comes into close proximity to the viral processed DNA (phosphate between C4 and G5, G5 base 
), non-processed viral DNA (A1 base, and the phosphate between G4 and C5 
), and non-cleaved strand of target DNA (backbone of G1, G2, and C(−2), G(−3) 
). The latter contacts are located on the opposite sides of the same strand of target DNA from the integration site (a similar spatial relationship is illustrated in for ASV IN substrate) and are made with residues from two IN monomers in the model of HIV-1 IN.
Introduction of the photoactivatable nucleotide analogs I-dU and I-dC into position 3 of the cleavable strand and positions 1 and 2 of the non-cleavable strand of blunt viral DNA substrates resulted in crosslinks with the CCD, although the exact positions in the protein were not elucidated 
. Nucleotides in these positions are also found in close proximity to the active site of the CCD in the PFV intasome.
Mutagenesis experiments carried out by Chen et al. 
on HIV-1 IN provided a list of residues (V54, V72, T124, T125, S153, K156, E157, K160, G193, 54–57) likely to be important for DNA binding and substrate specificity. Circular dichroism, fluorescence, and NMR experiments involving a synthetic analog of the α4 helix of the HIV-1 CCD and the U5 LTR end 
revealed that the HIV-1 IN residues E152, S153, N155, K156, and K159 were likely to make contact with DNA. Protease mapping with HIV-1 IN 
assigned a similar role to the residues K111, K136, K159, E138, K185, K186, and K188, and mass spectrometry footprinting experiments 
indicated that K159 and K160 are involved in DNA interactions. The corresponding residues in the PFV IN-DNA complex structure are within range to establish contacts with target or viral DNAs. However, the PFV equivalents of some residues in HIV-1 IN implicated in DNA binding in these experiments (e.g. 161, 162, 171, 172, 197 and 201 and peptides 128–130, 163–165) are not in a suitable range to contact DNA in the PFV intasome. Several positions in the fragment comprising residues 207–219, shown to interact with DNA by protease mapping 
and mass spectrometry 
, belong to the linker region between the CCD and CTD. This region differs in length in HIV-1, ASV, and PFV INs and exhibits little sequence homology. The HIV-1 IN model built by Krishnan et al. 
allows for the residues from this fragment to maintain contacts with the non-cleaved strand of viral DNA (,
), correlating with the mapping data listed above.
Mutagenesis experiments by Esposito et al. 
indicated that nucleotides 3, 4, 12, and 13 of the cleaved strand of viral DNA and nucleotide 2 of the non-cleaved strand participate in CCD-DNA interactions. The contacts of nucleotides 2, 3, and 4 are in good agreement with the model of the HIV-1 intasome and structural data from PFV IN. Similarly, the loop comprising residues 207–209 of HIV-1 IN is in close proximity to nucleotides 12 and 13 of the cleaved strand. While the mutagenesis results 
do not contradict the structural data, they do not locate the contact residues in the protein. In contrast, our S-S crosslinking data identify both counterparts in the ASV IN-DNA interactions. For example, results with the I146C derivative of ASV IN implicate this residue in interactions with nucleotide 3 of the cleaved strand and nucleotide 2 of the non-cleaved strand of viral DNA.
In conclusion, the high degree of correlation between the structural and biochemical data on IN-DNA contacts in the CCD indicates that the mode of DNA binding to this domain is highly conserved in PFV, HIV-1, and ASV INs. Differences in protein structure and composition may explain the lack of correspondence in details of DNA binding by the NTD and CTD of PFV in the crystal structure of the intasome, when compared with data obtained from analysis of crosslinking and other experiments performed with ASV and HIV-1 IN proteins. The presence of an additional domain at the N terminus of PFV IN (the NED) certainly sets it apart from the other two retroviral IN proteins. In addition, variations in length and sequence of the linker regions between the NTD and CCD, and the CCD and CTD, suggest that residues at different positions in these domains could have been selected to perform analogous functions during the course of evolution of these viruses. On the other hand, depending on the concentration, IN proteins can exist in a variety of multimers in solution (dimers, tetramers, and higher forms), each of which might interact with DNA in unique ways during the assembly of a functional intasome. Such interactions may be detected in biochemical experiments, but not represented in the intasome crystal. Furthermore, the same amino acid in individual subunits might make different contacts with DNA in one or more of these multimers. We note that the NTDs and CTDs of only two of the four component subunits are visible in the crystal of the PFV intasome, and it is unknown if or how these domains in the other two subunits might interact with DNA. Additional crystal structures, including those of other retroviral intasomes, could help to resolve some of these issues. 
However, until we understand more about the dynamic properties of IN, and the conformational changes that accompany intasome assembly, it will be important to keep all of these factors in mind when interpreting both structural and biochemical data.