PA1 and PA25 are large hairpin polyamides that are effective in nearly eliminating HPV16 episomes (DNA) in cell culture, and PA25 has broad spectrum activity against three cancer-causing forms of HPV (Edwards, T. G., Koeller, K. J., Slomczynska, U., Fok, K., Helmus, M., Bashkin, J. K., Fisher, C., Antiviral Res. 91 (2011) 177-186). Described here are the interactions of these PAs with sequences in the long control region (LCR) of HPV16 (7348-122). Using an FeEDTA conjugate of PA1 (designed to recognize 5’-W2GW7-3’; W = A or T), 34 affinity cleavage (AC) patterns were detected for this fragment. These sites can be rationalized with sequences featuring perfect, single, double, triple and quadruple mismatches. Quantitative DNase I footprinting analysis indicates that perfect sites bind PA1 with Kds between 0.7-2.2 nM. Kds for single, double, triple and quadruple mismatch sites range from 1-3 nM to 20 nM. Using AC and EDTA conjugates, we report that unlike smaller 8-ring hairpin PAs, introduction of a chiral turn in this large polyamide has no effect on binding orientation (forward vs. reverse). Despite its design to recognize 5’-W2GW5GW4-3’ via two Im residues, a motif not represented in this HPV sequence, a PA25-EDTA conjugate yielded 31 affinity cleavage sites on the region. Low nM Kds for PA25 without EDTA indicate a high tolerance for triple and quadruple mismatches. While there is extensive coverage of the sequence examined, AC cleavage patterns for the two PAs show discrete binding events and do not overlap significantly. This indicates that within the context of A/T rich sequences, these PAs do not recognize a simple shared sequence-related feature of the DNA. These insights continue to inform the complex nature of large hairpin PA-DNA interactions and antiviral behavior.
Hairpin polyamides (PAs) are small polymers composed principally of pyrrole and imidazole. They are inspired by the crescent-shaped distamycin, a natural product that recognizes the hydrogen bonding patterns in AT rich sequences in the minor groove of DNA. Because of their elegant design but high toxicity, these natural products have been elaborated synthetically over a number of years to achieve high DNA affinity and reported sequence specificity [1-10], and have subsequently been applied to control the expression of genes [3, 11] and as chemical probes of nucleic acid structure.
A number of larger hairpin PAs (10 or more rings) have been shown to practically eliminate the HPV genome in cell culture with low IC50s and no detectable toxicity [13, 14]. The two lead compounds reported in this anti-HPV study are PA1 and PA25 (Fig. 1), which exhibit IC50s of 0.1 and 0.036 μM against HPV16, respectively. Recently, the mechanism of action of this class of anti-HPV compounds has been partly elucidated in several reports that show an important role for the DNA Damage Response (DDR), which is modulated by PA administration to cells bearing small, circular dsDNA viral genomes, or episomes [14-16]. The access of these PAs to a DDR-related mechanism was unprecedented for any polyamides. Since our initial work was published, Dervan has reported other significant noncanonical PA influences on the DDR, though the mechanism seems largely unrelated since it does not depend on the presence of small, supercoiled circular dsDNA molecules.
In our recent studies of the interactions of PA1 with a 524 bp fragment of the E1 gene region of HPV16, we noted that even though only three perfect sites were predicted, 19 binding events were characterized using a combination of DNase I footprinting and affinity cleavage. All of these could be rationalized as perfect, single or double mismatch sites, with the two former types of sites having indistinguishable Kd values. As far as we know, such tolerance for mismatch sequences is not universal among the more extensively characterized 8-ring hairpin PAs. For example, among 8-ring PAs, single mismatch sites are bound with Kd values 1-7 to 26-55 times weaker than cognate (perfect) sites. The density and degree of overlap of sites can be attributed in part to the AT-rich nature of the HPV genome.
Our initial results led to additional questions. For example, given that the antiviral PAs are larger and distinct from their 8-ring hairpin cousins, how well do they tolerate sites with a greater number of mismatches? Another issue that emerged is the binding orientation. In theory, hairpin PAs can bind in either forward (the N terminal Im of the PA aligns with the 5’ end of the duplex) or reverse orientation (the C terminal Im of the PA aligns with the 5’ end of the duplex) (Fig. 1). Among the 8-ring PAs, the forward orientation is generally considered preferable (vide infra); the combination of the high AT density of the HPV genome and the apparently greater tolerance for mismatches suggests that larger PAs like PA1 might behave differently with respect to binding orientation. These issues could have significant implications for the mechanism of action of larger PAs (binding ten or more bp) that show significant anti-HPV activity and in some cases show broad-spectrum anti-HPV activity [13, 20].
It is therefore imperative that we understand that mechanism on a molecular scale, approaching it from chemical, biophysical and biological perspectives. Here we address these issues with a comparative biophysical study of two antiviral PAs, PA1 and PA25, and their interactions with the long control region (LCR) of HPV16.
PA1, dIm-PyPyβPyPyPy-γ-PyPyβPyPyPyPyβ-Ta, its EDTA conjugate (PA1-EDTA) and PA25 were prepared as previously described (PA1, PA25) [13, 21]. Minimal details (HRMS only) were given to characterize PA1 and PA25 in a biology report, so their 600 MHz 1H and corresponding 13C NMR data are reported here.
The building blocks used were amino acids and amines with the following abbreviations: Im = imidazole or N-methyl-2-carboxy-4-amino-imidazole; dIm = desamino-imidazole or N-methyl-2-carboxy-imidazole; Py = pyrrole or N-methyl-2-carboxy-4-amino-pyrrole; β = β-alanine; γ = γ-aminobutyric acid; γ(NH2) = (R)-2-Cbz-4-Boc-diaminobutyric acid, yielding (R)-2,4-diaminobutyric acid incorporated into the PA via the acid and 4-amino group after all deprotection; Ta = the amino tail at the PA C-terminus, formed from NMe(CH2CH2CH2NH2)2. New compounds PA2 (or amino-PA1), PA2-EDTA and PA25-EDTA were prepared by analogy with PA1 and PA1-EDTA, based on literature [21, 22]. For new polyamide PA2, purity was assessed by two orthogonal analytical HPLC methods, one using a Phenomenex Jupiter® Proteo C12 RP column (90 Å, 4 μm) with a mobile phase of (A: 0.1% HCO2H in H2O; B: MeCN) and the other a Phenomenex Synergi Polar-RP column (80 Å, 4 μm) with a mobile phase of (A: 0.1% NH4OH in H2O; B: MeCN). Analytical HPLC with in-line diode array and MS employed an Agilent 1100 or 1200 HPLC with thermostatted column compartment and an Agilent 1956B MSD single quadrupole mass spec detector, m/z range up to 3,000. HRMS was carried out at the proteomics center of the Donald J. Danforth Plant Sciences Center or by Prof. B. J. Bythell of UMSL.
PA stocks were prepared by dissolving in 100% DMSO and quantitated using an extinction coefficient of 92,600 M-1 cm-1 at 305 nm (PA1, PA2 and derivatives), or 91,700 M-1 cm-1 at 305 nm (PA25 and derivatives). Both values were obtained via fluorescence assay.
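The quantitation step above amounts to the Beer-Lambert law, c = A / (ε · l). The following is a minimal sketch of that arithmetic, not the authors' procedure: the function name, the 1 cm path length, the dilution factor, and the absorbance reading are all hypothetical; only the two extinction coefficients come from the text.

```python
def stock_concentration_molar(absorbance, epsilon, path_cm=1.0, dilution_factor=1.0):
    """Beer-Lambert law, c = A / (epsilon * l), scaled back by the dilution factor."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return absorbance / (epsilon * path_cm) * dilution_factor


# Extinction coefficients at 305 nm quoted in the text (M^-1 cm^-1).
EPSILON_PA1 = 92_600   # PA1, PA2 and derivatives
EPSILON_PA25 = 91_700  # PA25 and derivatives

# Hypothetical reading: A305 = 0.463 measured on a 1:1000 dilution of a PA1 stock.
conc = stock_concentration_molar(0.463, EPSILON_PA1, dilution_factor=1000)
print(f"PA1 stock: {conc * 1e3:.2f} mM")
```

With these illustrative inputs the stock works out to about 5 mM; a real measurement would substitute the observed absorbance and the actual dilution.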
PA2 precursor dIm-PyPy-β-PyPyPy-γ(NHCbz)-PyPy-β-PyPyPyPy-β-Ta and PA2 (i.e. amino-PA1) dIm-PyPy-β-PyPyPy-γ(NH2)-PyPy-β-PyPyPyPy-β-Ta were prepared and characterized by methods analogous to the literature [13, 22, 24].
1H NMR (600 MHz, DMSO-d6) δ = 10.46 (s, 1 H), 9.93 (s, 1 H), 9.91 (s, 4 H), 9.90 - 9.87 (m, 2 H), 9.86 (s, 1 H), 9.83 (s, 1 H), 9.63 (br. s., 2 H), 8.12 - 8.02 (m, 5 H), 7.98 - 7.92 (m, 1 H), 7.88 (br. s., 3 H), 7.40 (s, 1 H), 7.28 (d, J = 1.8 Hz, 1 H), 7.24 - 7.21 (m, 3 H), 7.21 - 7.18 (m, 4 H), 7.18 - 7.16 (m, 3 H), 7.15 (d, J = 1.8 Hz, 1 H), 7.08 (d, J = 1.8 Hz, 1 H), 7.07 (d, J = 1.8 Hz, 1 H), 7.06 (s, 1 H), 7.04 (d, J = 1.8 Hz, 1 H), 6.90 (d, J = 1.8 Hz, 1 H), 6.89 (d, J = 1.8 Hz, 1 H), 6.89 (s, 2 H), 6.85 (s, 2 H), 6.84 (d, J = 1.2 Hz, 1 H), 3.99 (s, 3 H), 3.85 (s, 3 H), 3.85 (s, 3 H), 3.84 (s, 9 H), 3.82 (s, 6 H), 3.81 (s, 3 H), 3.81 (s, 9 H), 3.49 - 3.42 (m, 4 H), 3.41 - 3.35 (m, 4 H), 3.25 - 3.16 (m, 4 H), 3.16 - 3.10 (m, 2 H), 3.10 - 2.98 (m, 4 H), 2.87 (br. s, 2 H), 2.74 (d, J = 4.7 Hz, 3 H), 2.55 - 2.51 (m, 2 H), 2.36 (t, J = 7.0 Hz, 2 H), 2.28 (t, J = 7.3 Hz, 2 H), 1.96 - 1.87 (m, 2 H), 1.82 - 1.74 (m, 2 H)
13C NMR (151 MHz, DMSO-d6) δ = 171.1, 169.3, 167.9, 167.8, 161.3, 158.6, 158.5, 158.5, 158.4, 158.4, 158.4, 158.1, 157.9, 155.9, 138.7, 126.8, 126.3, 123.0, 122.9, 122.8, 122.8, 122.8, 122.7, 122.7, 122.2, 122.2, 122.1, 122.1, 122.1, 122.0, 121.9, 121.4, 118.6, 118.5, 118.4, 118.2, 118.1, 118.0, 117.8, 104.9, 104.8, 104.8, 104.7, 104.3, 104.3, 104.2, 104.0, 104.0, 103.9, 53.3, 52.1, 40.0, 39.3, 38.2, 36.2, 36.1, 36.1, 36.0, 35.8, 35.8, 35.6, 35.6, 35.5, 35.4, 35.1, 33.3, 25.7, 24.0, 21.8
1H NMR (600 MHz, DMSO- d6) δ = 10.56 (s, 1 H), 10.45 (s, 1 H), 9.93 (s, 2 H), 9.91 (s, 5 H), 9.89 (s, 1 H), 9.61 - 9.53 (m, 1 H), 8.35 - 8.25 (m, 3 H), 8.23 (t, J = 5.3 Hz, 1 H), 8.12 - 8.01 (m, 4 H), 7.91 (br. s, 2 H), 7.85 (br. s., 3 H), 7.39 (s, 1 H), 7.28 (d, J = 1.8 Hz, 1 H), 7.24 (d, J = 1.2 Hz, 1 H), 7.22 (s, 2 H), 7.21 (d, J = 1.8 Hz, 1 H), 7.20 - 7.17 (m, 4 H), 7.16 (d, J = 1.8 Hz, 1 H), 7.15 (d, J = 1.8 Hz, 1 H), 7.11 (s, 1 H), 7.09 (d, J = 1.2 Hz, 1 H), 7.07 (d, J = 1.2 Hz, 1 H), 7.06 (d, J = 1.2 Hz, 1 H), 7.05 (d, J = 1.2 Hz, 1 H), 7.03 (s, 1 H), 6.98 (d, J = 1.8 Hz, 1 H), 6.93 (d, J = 1.8 Hz, 1 H), 6.89 (d, J = 1.8 Hz, 2 H), 6.86 (d, J = 1.8 Hz, 1 H), 6.85 (d, J = 1.8 Hz, 1 H), 3.99 (s, 3 H), 3.85 (s, 6 H), 3.84 (s, 3 H), 3.84 (s, 3 H), 3.84 (s, 9 H), 3.82 (s, 9 H), 3.80 (s, 3 H), 3.48 - 3.42 (m, 4 H), 3.42 - 3.35 (m, 2 H), 3.35 - 3.23 (m, 2 H), 3.21 - 3.16 (m, 1 H), 3.16 - 3.09 (m, 2 H), 3.09 - 2.98 (m, 4 H), 2.91 - 2.82 (m, 2 H), 2.74 (d, J = 4.7 Hz, 3 H), 2.56 - 2.51 (m, 4 H), 2.36 (t, J = 7.3 Hz, 2 H), 2.03 - 1.94 (m, 2 H), 1.94 - 1.86 (m, 2 H), 1.82 - 1.74 (m, 2 H) 13C NMR (151 MHz, DMSO- d6) δ = 171.0, 167.8, 167.8, 165.3, 161.9, 161.3, 161.3, 161.2, 158.5, 158.4, 158.3, 158.1, 158.1, 157.9, 157.6, 156.0, 138.7, 127.9, 126.8, 126.3, 123.2, 123.0, 122.8, 122.8, 122.7, 122.7, 122.7, 122.7, 122.4, 122.2, 122.2, 122.2, 122.2, 122.1, 122.1, 121.9, 121.4, 120.5, 118.6, 118.5, 118.4, 118.4, 118.4, 118.3, 118.2, 118.0, 117.9, 117.9, 117.6, 115.6, 104.9, 104.8, 104.8, 104.3, 104.2, 104.2, 104.0, 104.0, 53.3, 52.1, 50.8, 40.4, 40.0, 36.2, 36.2, 36.1, 36.1, 36.0, 36.0, 35.9, 35.8, 35.6, 35.5, 35.5, 35.4, 35.1, 24.0, 21.8 HRMS (ESI) calcd for C91H112N32O16 [MH]+, 1909.9012, found, 1909.9018. HPLC purity: 95%.
The EDTA conjugate of PA2 was prepared via the same method as previously described for the PA1-EDTA conjugate. To a vigorously-stirred mixture of EDTA dianhydride (25 mg, 0.096 mmol, 22 eq) in N,N-diisopropylethylamine (DIEA) (0.5 mL), DMF (0.25 mL) and DMSO (0.25 mL) at 55 °C was added a mixture of Cbz-protected PA2 (12 mg, 0.0055 mmol, 1 eq) in DIEA (0.5 mL) and DMF (0.5 mL) dropwise over a period of 30 min. The reaction mixture was stirred at 55 °C for 30 min. Following addition of 0.1 N NaOH (1.0 mL), the mixture was stirred at 55 °C for an additional 20 min. The bottom layer of the biphasic mixture was removed from the reaction vessel and neutralized with formic acid (0.013 mL). The mixture was diluted with 0.5 mL DMSO, filtered through a 20 μm polyethylene filter, and purified by reversed-phase HPLC [A: H2O (0.1% TFA), B: MeOH] using a Phenomenex Jupiter 250 × 30 mm, 5 μm, 100 Å, C18(2) column. Concentration of pooled fractions (checked by analytical HPLC/MS), followed by lyophilization, gave the Cbz-protected PA2-EDTA conjugate (9.9 mg, 73% yield) as a white, fluffy solid.
The Cbz-protected PA2-EDTA conjugate (3.7 mg, 0.0015 mmol) was treated with 10% trifluoromethylsulfonic acid/trifluoroacetic acid (0.30 mL) at 25 °C for 30 minutes to remove the Cbz-protecting group. The mixture was then flash frozen with liquid nitrogen, treated with N,N-dimethylformamide (0.30 mL) and allowed to warm to 25 °C. The mixture was diluted with N,N-dimethylformamide (0.30 mL) and water (0.30 mL), filtered through a 20 μm polyethylene filter, and purified by reversed-phase HPLC [A: H2O (0.2% formic acid), B: MeOH] using a Macherey Nagel 50 × 21 mm, 5 μm, C18 column. Concentration of pooled fractions, followed by lyophilization, gave the desired EDTA conjugate, dIm-PyPy-β-PyPyPy-γ(NH2)-PyPy-β-PyPyPyPy-β-Ta-EDTA (2 HCO2H) (PA2-EDTA, i.e. amino-PA1-EDTA) (3.0 mg, 86% yield) as a white, fluffy solid: exact mass [MH]+ = 2183.9813, experimental (ESI) [MH]+ = 2183.9727.
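Agreement between a calculated and a measured exact mass is commonly summarized as a mass error in parts per million. The sketch below applies that standard formula to the [MH]+ values quoted above; the function name and the sign convention (found minus calculated) are choices made here for illustration, not something stated in the text.

```python
def mass_error_ppm(found, calculated):
    """Mass error in ppm, using the (found - calculated) / calculated convention."""
    if calculated <= 0:
        raise ValueError("calculated mass must be positive")
    return (found - calculated) / calculated * 1e6


# [MH]+ values for PA2-EDTA as quoted in the text.
error = mass_error_ppm(found=2183.9727, calculated=2183.9813)
print(f"PA2-EDTA mass error: {error:.2f} ppm")
```

The same function can be applied to any calcd/found pair reported in this section.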
1H NMR (600 MHz, DMSO-d6) δ = 10.46 (s, 1 H), 10.35 (s, 1 H), 9.95 (s, 1 H), 9.92 (s, 1 H), 9.91 (s, 4 H), 9.90 - 9.87 (m, 2 H), 9.86 (s, 1 H), 9.85 (s, 1 H), 9.82 (s, 1 H), 9.59 (br. s., 2 H), 8.11 - 8.01 (m, 7 H), 7.97 - 7.91 (m, 2 H), 7.87 (br. s., 3 H), 7.51 (s, 1 H), 7.39 (s, 1 H), 7.32 (d, J = 1.8 Hz, 1 H), 7.28 (d, J = 1.2 Hz, 1 H), 7.23 - 7.21 (m, 2 H), 7.20 (d, J = 1.8 Hz, 1 H), 7.20 - 7.18 (m, 6 H), 7.17 - 7.16 (m, 2 H), 7.16 (d, J = 1.8 Hz, 1 H), 7.15 (d, J = 1.8 Hz, 1 H), 7.07 (d, J = 1.2 Hz, 1 H), 7.06 - 7.05 (m, 2 H), 7.04 (d, J = 1.2 Hz, 1 H), 6.89 (d, J = 1.2 Hz, 1 H), 6.89 - 6.87 (m, 4 H), 6.87 - 6.84 (m, 4 H), 6.84 (s, 1 H), 3.99 (s, 3 H), 3.95 (s, 3 H), 3.84 (s, 12 H), 3.83 (s, 6 H), 3.82 (s, 3 H), 3.82 (s, 6 H), 3.81 (s, 6 H), 3.80 (s, 3 H), 3.79 (s, 6 H), 3.57 - 3.50 (m, 2 H), 3.48 - 3.42 (m, 4 H), 3.42 - 3.34 (m, 2 H), 3.24 - 3.16 (m, 2 H), 3.16 - 3.09 (m, 2 H), 3.09 - 2.98 (m, 4 H), 2.91 - 2.83 (m, 4 H), 2.74 (d, J = 4.7 Hz, 3 H), 2.58 - 2.51 (m, 6 H), 2.36 (t, J = 7.0 Hz, 2 H), 2.27 (t, J = 7.6 Hz, 2 H), 1.97 - 1.87 (m, 4 H), 1.78 (td, J = 6.6, 13.8 Hz, 4 H) 13C NMR (151 MHz, DMSO-d6) δ = 171.0, 169.2, 167.9, 167.8, 167.8, 161.3, 161.3, 161.3, 158.8, 158.5, 158.5, 158.5, 158.4, 158.4, 158.3, 158.1, 157.9, 155.9, 138.7, 136.2, 133.7, 126.8, 126.3, 123.0, 122.9, 122.8, 122.8, 122.8, 122.7, 122.7, 122.7, 122.2, 122.1, 122.1, 122.1, 122.0, 122.0, 121.9, 121.9, 121.9, 121.9, 121.8, 121.4, 119.5, 118.6, 118.5, 118.2, 118.1, 117.9, 117.8, 117.5, 115.5, 114.4, 105.7, 104.9, 104.8, 104.7, 104.3, 104.3, 104.2, 104.2, 104.1, 104.0, 104.0, 103.9, 103.9, 53.3, 52.3, 52.1, 40.4, 40.0, 39.3, 38.2, 36.2, 36.2, 36.1, 36.1, 36.1, 36.0, 35.8, 35.8, 35.6, 35.6, 35.5, 35.4, 35.2, 35.1, 35.0, 34.9, 33.3, 25.7, 24.1, 21.8, 21.7 HRMS (ESI) calcd for C120H144N42O22 [MH]+, 2526.1447, found, 2526.1557, HPLC purity: 98%.
The EDTA conjugate of PA25 was prepared via the same method as previously described for the PA1-EDTA conjugate. EDTA dianhydride (17 mg, 0.067 mmol) was stirred vigorously at 55 °C in 0.5 mL DIEA, 0.25 mL DMF and 0.25 mL DMSO. To the anhydride was added a biphasic mixture of 10 mg (0.0034 mmol) PA25, 0.5 mL DIEA, and 0.5 mL DMF over 15 min via syringe. After HPLC (Phenomenex Jupiter 250 × 21 mm, C12, 4 μm column; A: H2O with 0.1% TFA, B: MeOH) and lyophilization, 3.5 mg (0.0012 mmol, 35% yield) of PA25-EDTA, ImPyPy-β-PyPyIm-β-PyPy-γ-PyPy-β-PyPyPy-β-PyPyPy-β-Ta-EDTA (2 TFA), was collected as a white, fluffy solid: LCMS [M+Na+H]2+ 1412.11, found 1413.0; HPLC purity: 86%.
To study this region of the HPV16 genome (Accession # AF125673) quantitatively via CE, two fragments were generated by PCR. To generate the 7348-7798 bp fragment, two primers were used: (top strand) 5’-FAM-ATT GTG TTG TGG TTA TTC AT-3’ and (bottom strand) 5’-HEX-CAT GTA TGA ACT AGG GTG AC-3’, where FAM indicates the fluorescein dye and HEX indicates the hexachlorofluorescein dye. To generate the 7662-122 bp fragment, two primers were used: (top strand) 5’-FAM-TAA ATC ACT ATG CGC CAA CGC-3’ and (bottom strand) 5’-HEX-CCT GTG GGT CCT GAA ACA TTG-3’. All oligonucleotides were obtained from IDT (Coralville, Iowa).
DNase I footprinting experiments were performed in TKMC buffer (10 mM Tris, 10 mM KCl, 5 mM MgCl2 and 5 mM CaCl2) with 10 mM CHAPS as previously described. DNA concentrations were adjusted between 200 pM and 1 nM as needed. The reaction was initiated with 0.01-0.1 U DNase I; the amount of DNase I depended on the age of the enzyme, the DNA fragment, and the DNA concentration. Reaction products were purified using the Qiagen MinElute purification kit. Capillary electrophoresis (CE) analyses of samples were conducted at the DNA Core Facility at the University of Missouri. Data were processed using Genemarker V1.97 software (Softgenetics LLC, State College, PA). DNase I cleavage products were mapped using Sanger sequencing (USB, Affymetrix, Santa Clara, CA). Peaks in the footprint were normalized to a peak not sensitive to PA concentration and plotted as fraction bound vs. PA concentration; the data were fit to a binding isotherm as previously described.
0.8 eq of Fe(NH4)2(SO4)2 was added to the polyamide-EDTA conjugate and incubated for 5 min. The DNA fragment was then added to 1 nM and the resulting mixture was incubated for 4-6 hours at room temperature in 10 mM Tris, 10 mM CHAPS, pH 7.5. The PA-EDTA conjugate concentration was varied from 5 nM to 250 nM; results are reported at 50 nM. To initiate the cleavage reaction, 5 μL of fresh 100 mM DTT was added to a 100 μL reaction, which was then incubated for 30 min to 4 hr. The reaction was quenched and the products were purified using a Qiagen PCR purification kit. Fragments were analyzed by capillary electrophoresis. Affinity cleavage results were mapped using Maxam-Gilbert reactions on 5’ FAM end-labeled DNA.
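The normalization and fitting step described for the footprinting data can be illustrated with a short sketch. The text only says that the data were fit to "a binding isotherm as previously described", so the single-site form θ = [PA] / (Kd + [PA]), the treatment of total PA as free PA, the function names, and the synthetic data below are all assumptions made for illustration. The fit uses a plain log-spaced scan rather than any particular fitting library.

```python
def fraction_bound(peak, ref_peak, peak_free, ref_free):
    """Fraction bound from a footprint peak normalized to a PA-insensitive reference peak."""
    return 1.0 - (peak / ref_peak) / (peak_free / ref_free)


def isotherm(conc, kd):
    """Assumed single-site isotherm: theta = [PA] / (Kd + [PA])."""
    return conc / (kd + conc)


def fit_kd(concs, thetas, lo=1e-12, hi=1e-6, steps=6000):
    """Least-squares Kd found by scanning log-spaced candidates between lo and hi (molar)."""
    best_kd, best_sse = None, float("inf")
    for i in range(steps + 1):
        kd = lo * (hi / lo) ** (i / steps)
        sse = sum((t - isotherm(c, kd)) ** 2 for c, t in zip(concs, thetas))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic titration (hypothetical): PA concentrations in molar, true Kd = 1.5 nM.
concs = [0.1e-9, 0.3e-9, 1e-9, 3e-9, 10e-9, 30e-9]
thetas = [isotherm(c, 1.5e-9) for c in concs]
kd_fit = fit_kd(concs, thetas)
print(f"fitted Kd: {kd_fit * 1e9:.2f} nM")  # close to the 1.5 nM used to generate the data
```

When the DNA concentration approaches the Kd, as it can at 1 nM DNA, the free and total PA concentrations differ and a depletion-corrected model may be more appropriate than this simple form.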
The first reported study of anti-HPV PAs involved tandem (covalently linked) hairpin PAs that targeted a 12 bp recognition sequence for the HPV transcription and replication factor E2. The binding of the tandem PA to the minor groove of the E2 recognition sequence (ACCN6GGT) prevented this protein from binding the major groove.
PA1 and PA25 (Fig. 1) both represent an evolution from the above design. They are both single hairpins, with a ten bp recognition sequence of 5’-W2GW7-3’ for PA1 (where W = A or T) and a 13 bp recognition sequence of 5’-W2GW5GW4-3’ for PA25. As mentioned above, they have both been shown to nearly eliminate the HPV16 genome in cell culture.
The LCR of HPV16 starts at bp 7158 and ends at bp 83 of the circular genome. It features a number of binding sites for the E2 proteins and the E1 hexamer helicase assembly site that are typically involved in initiating replication of the viral genome. In this study, we examine PA1, PA2 and PA25 binding behaviors toward the sequence comprising 7348-122 bp of the HPV16 genome. In this region, there are nine predicted PA1 recognition sites; five of these are concentrated in the region where E1 and E2 bind (positions commencing at nt 7903, 14, 15, 26, and 62 and indicated with bolded arrows in Fig. 2). There are no predicted perfect PA25 sites in this region, but it does feature 2 predicted single mismatch and almost 30 predicted double mismatch sites. Here “mismatch” refers to an alignment between PA residue and nt that does not conform to the reported PA-DNA recognition rules [30-36]. Typically this means that a G or C appears where the PA residue targets W, or that W aligns with an Im residue.
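The site prediction described above (matching a degenerate motif such as 5’-W2GW7-3’ and counting mismatches) can be sketched as a sliding-window scan. This is an illustrative sketch only: the function names are hypothetical, a "reverse" site is modelled simply as the motif read in the opposite direction along the same strand, and positions are 0-based indices into each strand string (for the bottom strand, into the reverse complement), not HPV16 genome coordinates.

```python
IUPAC = {"W": "AT", "G": "G", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(window, motif):
    """Number of positions where the base is not allowed by the motif code."""
    return sum(base not in IUPAC[code] for base, code in zip(window, motif))


def scan(seq, motif, max_mismatches=0):
    """List (start, strand, orientation, mismatches) for every window within tolerance."""
    seq = seq.upper()
    targets = {"forward": motif, "reverse": motif[::-1]}
    strands = {"top": seq, "bottom": reverse_complement(seq)}
    n = len(motif)
    hits = []
    for strand, s in strands.items():
        for i in range(len(s) - n + 1):
            window = s[i:i + n]
            for orientation, target in targets.items():
                mm = count_mismatches(window, target)
                if mm <= max_mismatches:
                    hits.append((i, strand, orientation, mm))
    return hits


PA1_MOTIF = "WWGWWWWWWW"  # 5'-W2GW7-3' written out in full

# Toy sequence (hypothetical), containing one perfect forward site on the top strand.
print(scan("CCAAGATTTTTACC", PA1_MOTIF))
print(len(scan("CCAAGATTTTTACC", PA1_MOTIF, max_mismatches=2)))
```

Raising `max_mismatches` shows how quickly the number of candidate sites grows in an A/T rich sequence, which is why a single cleavage pattern can often be assigned to more than one candidate site.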
Since this fragment of the genome is too large for accurate studies with capillary electrophoresis (CE) [21, 37-39], this sequence was divided into two overlapping fragments, 7348-7798 bp (451 bp) and 7662-122 bp (365 bp).
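Because the second fragment spans the origin of the circular genome, its length is not a simple difference of coordinates. The sketch below shows the arithmetic; the function name is hypothetical, and the genome length of 7904 bp is an assumption chosen because it reproduces the 365 bp quoted in the text, not a value stated in the source.

```python
def circular_fragment_length(start, end, genome_length):
    """Length of an inclusive start-end fragment on a circular genome numbered 1..genome_length."""
    if not (1 <= start <= genome_length and 1 <= end <= genome_length):
        raise ValueError("coordinates must lie within the genome")
    if end >= start:
        return end - start + 1
    # The fragment wraps past the last base and continues from position 1.
    return (genome_length - start + 1) + end


GENOME_LENGTH = 7904  # assumption: consistent with the 365 bp stated for 7662-122

print(circular_fragment_length(7348, 7798, GENOME_LENGTH))  # 451
print(circular_fragment_length(7662, 122, GENOME_LENGTH))   # 365
print(circular_fragment_length(7662, 7798, GENOME_LENGTH))  # 137 bp shared by both fragments
```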
In the affinity cleavage (AC) technique, hydroxyl radical cleavage of DNA at the binding site is achieved using FeEDTA [40-42]. Since the EDTA is located on the tail of the PA (open end of the hairpin; Fig. 1), the location of the tail on the DNA sequence is indicated with the cleavage patterns. Information about PA binding orientation is also provided by AC data (vide infra).
Figure 2 shows the affinity cleavage of PA1-EDTA with the 7348-122 bp DNA fragment of HPV16. As noted previously, 40-50 bp on the ends of the duplex are not accessible via CE. Thirty-four AC patterns were observed, and all AC sites have 3’ shifts consistent with minor groove binders. Some AC patterns are extended, suggestive of overlapping binding events (AC 5, 14, 22, and 24). Prediction of the PA1 binding sites according to the rules developed by the Dervan lab showed that a single AC pattern can often be rationalized by several possible predicted PA1 binding sites. Binding site assignments assume that forward sites are preferred over reverse ones (see below), and that sites with fewer mismatches are preferred over those with a higher number of mismatches.
Relative to the HPV16 fragment studied previously (2150-2622), the 7348-122 region is slightly more GC rich (36.4% vs. 33.2%) and 40% longer but harbors about twice the number of AC sites. This would suggest that either there are runs of more suitable PA1 sites or that there is a greater number of sites with multiple binding orientations. While there is a greater number of perfect sites in this region (7 vs. 3), they account for only a small fraction of the observed AC sites. As described below, the larger number of AC patterns results from sites with greater numbers of mismatches and, as a consequence, from PA1 binding the same sequence in more than one orientation.
AC sites 1, 13, 30-32, and 34 can be rationalized by forward orientation perfect match binding sites only. Since single and double mismatch sites have already been shown to be abundant and generally of affinity similar to that observed for perfect matches, these sites will be tabulated but not discussed. Discussion is restricted to binding behaviors that are distinct from those observed earlier in the 2150-2672 fragment.
AC sites 11, 19, 20, and the 5’ part of AC 25 can be rationalized by forward triple mismatch sites only. In AC sites 9 and 10, there are two peaks on the top strand; on the bottom strand, there is only one AC peak. This could be explained by differences in resolution (bottom strand vs. top strand). AC sites 9 and 10 are unusual cases because AC 9 can be explained by PA1 bound to a forward triple mismatch site while AC 10 can only be rationalized by a quadruple mismatch or by PA1 binding to the DNA in some other conformation (half hairpin or linear).
AC sites 18 and 33 can be rationalized by PA1 bound to a forward quadruple mismatch site or to a reverse triple mismatch site; these cannot be distinguished.
For AC 12, the top strand AC pattern is missing even though the AC pattern further down the sequence was present (AC 13) and the bottom strand pattern was present for this same site. We suspect resolution issues associated with CE analysis. This AC pattern could be rationalized only by a forward quadruple mismatch site.
In summary, for the 7348-122 bp region of the circular HPV16 genome, eight perfect match forward and four perfect match reverse PA1 binding sites were observed by AC. A large number of single, double, triple, and, more rarely, quadruple mismatch sites that bound in both forward and reverse fashion was also observed in this sequence.
DNase I footprinting was used to evaluate all PA binding sites on this portion of the HPV16 genome. Integration positions were chosen based on location with respect to the footprint and for optimum signal-to-noise ratios. In addition to the footprint itself, signal-to-noise ratios are influenced by the distance of the position from the dye and the slight sequence preference of DNase I [44, 45]. HPV16 sequences that are not amenable to cleavage by the enzyme have been reported and are present in this region as well (7831-7840 and 7857-7866). It is also important to note that since DNase I is larger than PA1, steric encumbrance occurs and the footprint is always larger than the region covered by the PA, so integration sites can lie outside the PA1 binding site.
The integration sites for DNase I footprinting are indicated with solid boxes in Fig. 2. Twenty-nine nucleotide positions were integrated to obtain Kds (Table 1). In most cases, a single Kd corresponds to each identified AC site, even if there are multiple possible sequences for an AC pattern. Note that the Kd for the PA1-EDTA conjugate as measured by footprinting is indistinguishable from that obtained for unmodified polyamide PA1.
Table 1 summarizes the Kd values obtained by DNase I footprinting. All of the Kds for perfect match sites range from 0.7 nM to 1.9 nM, similar to those obtained previously for such sites on a different part of the HPV16 genome. Most Kds for single mismatch sites fall between 0.8 and 2.6 nM, demonstrating good tolerance for such sites (see below). There are three higher Kds: 8.1 nM at 7878, 10.8 nM at 7774, and 13.1 nM at 7827. The 7770-7779 site is either a reverse single mismatch or a forward double mismatch site. The Kd quantitated at 7827 could be the result of the binding affinity being determined at the mismatch nucleotide, where local binding may not be as avid. In this case, this was the location of the most favorable signal-to-noise ratio in the footprint.
Kds for the double mismatch sites range from those typical of perfect and single mismatch sites (1.5 nM to 3.4 nM) to those for three mismatches, with Kds between 7 and 20 nM, the latter with a high relative error. Triple mismatch sites displayed slightly higher Kds than is typical of most sites with fewer mismatches (3.1-6.8 nM). The same pattern holds for the quadruple mismatch binding sites (1.9 nM to 5.9 nM). Given the emphasis on perfect matches and the general intolerance of mismatches in the smaller, 6-8-ring hairpin polyamides [18, 46-51], we were somewhat surprised to observe such high affinities at sites with a large number of mismatches. However, we note that Sugiyama reports 18 bp specificity with a tandem hairpin in which three hairpins are linked together to form one large PA, though it is for a human telomeric repeat sequence and the actual measured specificity is low (Kd(mismatch)/Kd(match) is 2.6).
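A Kd ratio such as Kd(mismatch)/Kd(match) can be converted to a free-energy penalty with the standard relation ΔΔG = RT ln(Kd(mismatch)/Kd(match)). The sketch below applies it to the ratio of 2.6 quoted above; the function name and the temperature of 298.15 K are assumptions made for illustration, since the text does not state the temperature for this comparison.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_kcal_per_mol(kd_mismatch, kd_match, temp_k=298.15):
    """Free-energy penalty of a mismatch site: ddG = R * T * ln(Kd_mismatch / Kd_match)."""
    if kd_mismatch <= 0 or kd_match <= 0:
        raise ValueError("Kd values must be positive")
    return R_KCAL * temp_k * math.log(kd_mismatch / kd_match)


# Specificity ratio of 2.6 quoted in the text (only the ratio matters, not the units).
print(f"{ddg_kcal_per_mol(2.6, 1.0):.2f} kcal/mol")  # 0.57 kcal/mol
```

On this scale a 10-fold difference in Kd corresponds to roughly 1.4 kcal/mol, so ratios of two to three represent a small energetic discrimination.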
Previous reports indicated that for some shorter hairpin PAs (i.e. 6-ring) and longer PAs recognizing 10 bp, the introduction of a chiral center in the gamma turn of the hairpin PA resulted in a strong preference for the forward binding orientation [35, 53, 54]. Sugiyama found discrimination between A/T and T/A base pairs with chiral β-hydroxy-γ-aminobutyric acid/β-alanine pairs. To examine the effect of introducing a chiral group in the gamma turn of PA1, a chiral version of PA1 was prepared with the chiral γ-turn (R)-2,4-diaminobutyric acid, giving PA2 (or amino-PA1); the EDTA conjugate of PA2, PA2-EDTA (or amino-PA1-EDTA), was also prepared. Binding sites on the 450 bp fragment (7348-7798) were evaluated using affinity cleavage at 50 nM conjugate. As shown in Fig. 3, there is no appreciable difference between the binding sites for PA1-EDTA and PA2-EDTA. Thus in the context of this larger PA, the introduction of an (R)-amino-gamma turn does not influence the binding orientation of the PA family.
In spite of the fact that there are no perfect match sites for PA25 in this 678 bp region, 31 distinct AC patterns were observed for PA25 and are numbered in Fig. 4. Extended AC patterns 1, 2, 4, 6, 8, 17, 18, 22, 25, and 28 have multiple peaks (high intensity surrounded by those with lower intensity), for a total of 43 discernible PA25 binding sites.
PA25, the longest PA studied in this fashion to date, was initially designed to recognize the sequence 5’-W2GW5GW4-3’. In attempting to reconcile AC patterns with probable binding sites, the first question is: How is the PA25 site anchored? Unlike the mono-imidazole derivative PA1, PA25 has two Im residues and therefore the theoretical ability to recognize two G's on either side of a 5 residue/nt spacer (5’-GW5G-3’). Is the GW5G site, or at least a GN5G site, required for PA25 binding? To address this, all 5’-GN5G-3’ patterns were mapped on the fragment (Fig. S1). There are 14 on the top strand and 27 on the bottom, with a number on the latter overlapping.
What is more striking is that 16 AC patterns, roughly half of the total number of AC patterns for PA25 (1, 5, 7, 12, 13, 16, 17abc, 18a, 20, 25ab, 26, 27, 31), exist in locations that are not near a 5’-GN5G-3’ sequence at all. Conversely, there are two 5’-GN5G-3’ sequences at 7620-7640 that could be rationalized as multiple mismatch sites (triple at 7638 and quadruple at 7632) but for which no AC patterns are observed. It is possible that the low signal-to-noise ratios common for positions over 200 bp from the dye prevent us from detecting these sites. Regardless of this possibility, it is clear that the 5’-GN5G-3’ motif is not required for PA25 binding.
This raises the question of which G, if either, is more important in anchoring a site and rationalizing an AC pattern: the first (N-terminal) or the second (internal). To address this, the sequences at the observed AC patterns were analyzed (Tables S2-S4). The goal was to identify a binding site with the smallest number of mismatches, forward preferred if the number of mismatches is minimized, and then catalog whether the sequence has a first G, a second G, both or neither. Two thirds of these sequences are triple, quadruple, or quintuple mismatches. Over 80% of these sequences were anchored with one G or the other, with no preference for one over the other and no obvious correlation with the number of mismatches. Only 6 sites had either zero or both G's. The A/T rich character of the genome is reflected in this observation.
Thus it appears that neither G is critical to the PA25 recognition sequence, and if the fragment were even more A/T rich, we predict that both PAs would populate those sites as much as any other part of the sequence.
Unlike for PA1, where the binding sequences were far more straightforward, the data do not permit us to establish exact sequences for PA25 binding. For this reason, regions supported by DNase I footprints are presented in lieu of specific sequences (Fig. 4). Almost all of the sequence accessible via CE exhibits PA25 footprinting behavior at 10 nM PA (DNA is 1 nM). These results are consistent with the size of DNase I, the relatively large size of PA25, and the high tolerance for mismatches. Discrepancies between footprinting and AC patterns cannot be interpreted due to signal-to-noise issues in those regions.
The 27 Kds obtained from the sites of integration appear in Table S1 and are summarized in Fig. 5. Eighteen of the Kds are between 0.7 and 1.9 nM. There are a few slightly higher Kds, but all are less than 3-fold above this range and thus not significantly different. All of these values compare well to the Kd of 1.2 ± 0.5 nM obtained for two overlapping perfect match sites integrated at position 3662 (outside the LCR; Gaofei He, unpublished results). Given that a large majority of sites are likely to feature triple or quadruple mismatch sequences, this indicates a high tolerance for mismatches. This is discussed in comparison to PA1 below (Section 4.2).
There are numerous examples of modifications to smaller hairpin PAs (6-10 rings) that influence binding orientation. For example, the 8-ring ImPyPy-γ—PyPyPyβDp binds in the forward direction with a Kd 16-fold stronger than a reverse sequence. Neutralizing the C terminal charge via acetylation decreases this preference 4-fold (Table 2, Entry 1; ). More dramatic are chiral modifications to the turn of the hairpin. When a chiral α-amino substitution is introduced into the γ-aminobutyric acid (γ-turn) in the R configuration of a 6-ring and an 8-ring hairpin PA, preferences for the forward orientation over the reverse are over 1000-fold (Table 2, Entries 2,3; [35, 57].
The study of Rucker et al.  with 10-ring PAs and β alanine substitution illustrates the complexity that can emerge. ImImPyPyIm-γ-PyPyPyPyPy-β-Dp binds in a decidedly forward orientation to the extent that it binds a predicted reverse site in a forward fashion, which corresponds to a double mismatch (Table 2, Entry 4). Reverse binding is too weak to quantitate but represents a forward preference of at least 500-fold. Substituting β adjacent to the N-terminal imidazoles (ImIm-β-PyIm-γ-PyPy-β-PyPy-β-Dp) results in a PA that prefers to bind what is a predicted reverse site in the forward fashion, with a double mismatch but with a Kd similar to that predicted for the reverse orientation (i.e., no thermodynamic discrimination between forward and reverse orientations).
In another case, the substitution of one pyrrole for β-Ala in the polyamide with a β-substituted γ-turn results in a preference for the reverse orientation of binding. This conclusion was based on thermal melting data (Table 2, Entry 5). In contrast, α-substitution (R) at the γ-turn resulted in a forward binding orientation preference.
Engineering sequences to test forward and reverse binding is much more difficult for a large PA with a demonstrated tolerance for mismatches than for the PAs just discussed. Here we rely on the affinity cleavage technique and PA2-EDTA, a version of PA1 that is modified with an amine in a chiral (R)-γ-turn, (R)-γ(NH2). While such modifications have been shown to influence binding orientation in smaller PAs, (R)-γ(NH2) has no appreciable effect on AC patterns as exhibited by PA2-EDTA (vs. PA1-EDTA as shown in Fig. 3). This result is entirely consistent with the high tolerance of large PAs for mismatches.
Given that both PA1 and PA25 are active against HPV16 and were originally designed to bind at or near the E1/E2 protein-DNA binding sites for an HPV sequence, a comparison of binding behaviors is instructive. Since the size difference between these PAs results in footprints that also differ in size, AC patterns are more readily compared than the footprints themselves (Fig. 6). In making comparisons, we note that signal-to-noise ratios were generally better for PA1 than for PA25; this had an impact on interpreting data far from the dye (the 5’ end of the DNA fragment). Due in part to this fact, we were better able to obtain data farther out from the dye (longer fragments) for PA1 than for PA25. This resulted in some locations where data for one strand (top or bottom, depending on the label) were more accessible for PA1 than for PA25.
With that in mind, some observations emerge. First, the number of AC patterns (i.e., binding events) is similar for PA1 and PA25: 34 for the former (5 compounded, that is, having more than one maximum) and 31 for the latter (10 compounded). This means that the general specificity is not dramatically different, even though PA25 is 40% longer than PA1 (see below). Second, while the footprints are extensive, the fact that there are numerous but distinct AC patterns for both PAs indicates that there are discrete binding sites/orientations in both cases. In other words, while the high tolerance for mismatches means that exact sequence binding rules do not exist, neither PA binds indiscriminately. Third, while there is some overlap of AC intensities (black, Fig. 6), there are many places where the AC patterns are distinct from one another (red for PA25 and blue for PA1, Fig. 6). This indicates that the PAs are not targeting the same local regions (e.g., 10-30 bp) and therefore are not similarly targeting any DNA structure driven by local sequence-dependent features (e.g., minor groove width, roll, propeller twist, and helical twist; ). Finally, while the more active PA25 may target approximately the same number of sites in the LCR as PA1, the greater length of PA25 means that it more completely blankets or enshrouds the viral DNA in a way that PA1 cannot. This correlates with activity and may provide important insights into antiviral behavior.
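The kind of position-by-position comparison summarized by the color coding of Fig. 6 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to prepare the figure: the dictionaries of normalized cleavage intensities, the threshold, and the function names are all hypothetical, and the numbers are placeholders rather than measured data.

```python
def classify_positions(pa1, pa25, threshold=0.0):
    """Label each position as shared, PA1 only, or PA25 only.

    pa1, pa25: dicts mapping nucleotide position -> normalized AC intensity.
    A position counts as cleaved when its intensity exceeds `threshold`;
    positions cleaved by neither PA are left out of the result.
    """
    labels = {}
    for pos in sorted(set(pa1) | set(pa25)):
        in_pa1 = pa1.get(pos, 0.0) > threshold
        in_pa25 = pa25.get(pos, 0.0) > threshold
        if in_pa1 and in_pa25:
            labels[pos] = "shared"
        elif in_pa1:
            labels[pos] = "PA1 only"
        elif in_pa25:
            labels[pos] = "PA25 only"
    return labels


def overlap_fraction(labels):
    """Fraction of cleaved positions that are shared by both PAs."""
    if not labels:
        return 0.0
    shared = sum(1 for label in labels.values() if label == "shared")
    return shared / len(labels)


# Placeholder intensities for illustration only; not experimental values.
pa1_toy = {10: 0.8, 11: 0.5, 30: 0.9}
pa25_toy = {11: 0.4, 50: 0.7}
toy_labels = classify_positions(pa1_toy, pa25_toy)
print(toy_labels)
print(overlap_fraction(toy_labels))
```

A low shared fraction across the fragment would be the numerical counterpart of the observation that the two AC patterns do not overlap significantly.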
A comparison of the distribution of Kds for the two PAs appears in Fig. 5. While most Kds for both PAs are between 0.7 and 4 nM, there is clearly a greater proportion of tighter Kds measured for PA25 than for PA1. This means that, taken as aggregate data in this region of the HPV16 genome, PA25 tolerates a greater number of mismatches than PA1. That is, the average Kd is lower (1.7 nM for PA25; 4.5 nM for PA1). This is reasonable given that the number of potential hydrogen bonds with bases is also greater for PA25 than for PA1 (24 vs. 17). Thus, with respect to hairpin PA length and sequence specificity, the length of PA1 is probably at or possibly past the point of maximum benefit to specificity. The longer PA25 tolerates mismatches better than PA1 and is therefore less specific. These properties are at least partly correlated with antiviral behavior, in that PA25 is a better, broad-spectrum antiviral compound, with anti-HPV18 activity superior to that of PA1.
The above discussion leads to the question of whether it matters where either PA binds, or whether some larger effect is responsible for the antiviral behavior. Both compounds have low IC50s toward HPV16 and both have high binding-site density in the LCR. It is expected that this pattern would extend to other parts of the genome. Recently we reported that PA25 induces the DNA damage response (DDR) in HPV-infected cells bearing the unintegrated viral genome (episome), presumably triggered by a change in the shape of the episome brought on by PA binding. The high number of binding sites for both PA1 and PA25 is consistent with this suggestion, as is the relatively rigid nature of the PA molecules, which would strain a supercoiled structure. It is less clear whether specific binding events, rather than a global effect, are responsible for inducing the DDR. A related question is whether PA1 and PA25 share a common antiviral mechanism at the molecular level; more specifically, whether precisely the same binding events are responsible for exactly the same elements of the DDR being induced by each compound, or whether there is a combination of binding events that can induce viral DNA degradation. This question continues to drive our efforts to elucidate the details of the interactions of antiviral PAs with their DNA targets.
We thank NIH for AI083803 awarded to JKB and NanoVir, LLC for financial support. JKB is part owner of NanoVir, LLC. We thank NanoVir, LLC for the loan of equipment and the use of active antiviral compounds. We thank the DNA Core at the University of Missouri for DNA fragment analysis. We thank the Danforth Plant Science Center for HRMS [National Science Foundation (NSF) Grant DBI 0922879]. We thank Prof. Ben Bythell for mass spectrometry data. The Agilent 600 MHz NMR spectrometer was obtained using funds from the NSF (#0959360).