|Home | About | Journals | Submit | Contact Us | Français|
The ability to design proteins with high affinity and selectivity for any given small molecule would have numerous applications in biosensing, diagnostics, and therapeutics, and is a rigorous test of our understanding of the physiochemical principles that govern molecular recognition phenomena. Attempts to design ligand binding proteins have met with little success, however, and the computational design of precise molecular recognition between proteins and small molecules remains an “unsolved problem”1. We describe a general method for the computational design of small molecule binding sites with pre-organized hydrogen bonding and hydrophobic interfaces and high overall shape complementary to the ligand, and use it to design protein binding sites for the steroid digoxigenin (DIG). Of 17 designs that were experimentally characterized, two bind DIG; the highest affinity design has the lowest predicted interaction energy and the most pre-organized binding site in the set. A comprehensive binding-fitness landscape of this design generated by library selection and deep sequencing was used to guide optimization of binding affinity to a picomolar level, and two X-ray co-crystal structures of optimized complexes show atomic level agreement with the design models. The designed binder has a high selectivity for DIG over the related steroids digitoxigenin, progesterone, and β-estradiol, which can be reprogrammed through the designed hydrogen-bonding interactions. Taken together, the binding fitness landscape, co-crystal structures, and thermodynamic binding parameters illustrate how increases in binding affinity can result from distal sequence changes that limit the protein ensemble to conformers making the most energetically favorable interactions with the ligand. The computational design method presented here should enable the development of a new generation of biosensors, therapeutics, and diagnostics.
Current approaches for designing ligand binding proteins for medical2 and biotechnological uses rely upon raising antibodies against a target antigen in immunized animals3,4 and/or performing laboratory directed evolution of proteins with an existing low affinity for the desired ligand5-7, both of which offer incomplete control over molecular details. Computational design has the potential to provide a general, complementary approach for small molecule recognition in which design features and selectivity can be rationally programmed. Structural and biophysical characterization of previously designed ligand binding proteins has revealed numerous discrepancies with the design models, however, and it was concluded that protein-ligand interaction design is an unsolved problem1,8. The lack of accuracy in programming protein-small molecule interactions also contributes to low catalytic efficiencies of computationally designed enzymes relative to their natural counterparts9-14. The development of robust computational methods for the design of small molecule-binding proteins with high affinity and selectivity would have wide-ranging applications.
The goal of existing methods for computational enzyme design is to promote catalysis by creating energetically favorable hydrogen bonding, van der Waals, and electrostatic interactions to a high-energy reaction transition state(s) and/or intermediate(s). Although these interactions are also important for stabilizing the bound ground-state conformations of protein-small molecule complexes, they are not the sole determinant of small molecule binding. We developed a computational method for designing ligand binding proteins with two properties characteristic of naturally occurring binding sites in addition to specific energetically favorable interactions with the ligand: (1) high overall shape complementarity to the ligand, and (2) structural pre-organization in the unbound protein state, which minimizes entropy loss upon ligand binding15,16. To program in specific interactions with the small molecule, disembodied binding sites are created by positioning amino acid side chains around the ligand in orientations optimal for hydrogen bonding and other energetically favorable interactions and then placed at geometrically compatible binding sites in a set of scaffold protein structures using RosettaMatch17. The surrounding side chain identities and conformations are then optimized to generate additional protein-ligand interactions and buttressing protein-protein interactions (Fig 1a). Designs with protein-small molecule shape complementarity below those typical of native protein complexes18 or having interface side chain conformations with low Boltzmann-weighted probabilities in the unbound state16 are then discarded.
We used the method to design proteins that bind the steroid digoxigenin (DIG; Supplementary Fig. 1), the aglycone of digoxin, a cardiac glycoside used to treat heart disease19, and a commonly used non-radioactive biomolecular labeling reagent20. Anti-DIG antibodies are routinely administered to treat overdoses of digoxin, which has a narrow therapeutic window21, and are widely used to detect biomolecules in applications such as fluorescence in situ hybridization20. We created idealized DIG binding sites with hydrogen bonds from Tyr or His to the lactone carbonyl oxygen and both hydroxyl groups of DIG and hydrophobic packing interactions between Tyr, Phe, or Trp and the steroid ring system (Fig. 1a). These interactions were embedded in designed binding sites with high shape complementarity to DIG as outlined above, and 17 designs were selected for experimental characterization based on computed binding affinity, shape complementarity, and the extent of binding site pre-organization in the unbound state (Fig. 1b and Supplementary Tables 1 and 2).
Binding of the designed proteins to DIG was probed by yeast surface display22 and flow cytometry using DIG-functionalized bovine serum albumin (DIG-BSA) or ribonuclease (DIG-RNase). DIG5 and DIG10 bound to both labels (Fig. 1c and Supplementary Fig. 2), and binding was reduced to background levels when ~1 mM of unlabeled DIG was added as a competitor (Fig. 1c and Supplementary Fig. 3).
Fluorescence polarization (FP) measurements with purified proteins and Alexa488 fluorophore-conjugated DIG (DIG-PEG3-Alexa488) indicated affinities in the low to mid micromolar range, with DIG10 binding more tightly (Fig. 2a,b). Isothermal titration calorimetry (ITC) measurements confirmed that the affinity of DIG10 for unlabeled DIG is the same as that for the labeled DIG conjugate (Fig. 2b, Supplementary Fig. 4, and Supplementary Table 3). The scaffold from which both DIG5 and DIG10 derive, PDB ID 1z1s, a protein of unknown function from Pseudomonas aeruginosa, does not bind to either label (Fig. 1c and Supplementary Fig. 3a) when expressed on the yeast surface or to DIG-PEG3-Alexa488 in solution (Fig. 2a), suggesting that the binding activities of both proteins are mediated by the computationally designed interfaces. Indeed, substitution of small nonpolar residues in the central binding pockets of DIG5 and DIG10 with arginines resulted in complete loss of binding, and mutation of the designed hydrogen-bonding tyrosine and histidine residues to the nearly isosteric phenylalanine reduced binding; for DIG10, substitution of any of the three interacting tyrosines abolished binding completely (Fig. 1d and Supplementary Fig. 5). Optimization of DIG10 by a single round of mutagenesis and selections using yeast surface display and fluorescence-activated cell sorting (FACS) identified small-to-large hydrophobic amino acid changes that increase binding affinity 75-fold by enhancing the enthalpic contribution to binding, yielding DIG10.1 (Fig. 2b,c,d, Supplementary Figs 4,6-8, and Supplementary Table 3).
To provide feedback for improving the overall design methodology and to evaluate the contribution of each residue in the DIG10.1 binding site, we used next generation sequencing to generate a comprehensive binding fitness map23-25. A library of variants with ~1-3 substitutions at 39 designed interface positions in DIG10.1 was generated using doped oligonucleotide mutagenesis, displayed on yeast, and subjected to selections using a monovalent DIG-PEG3-biotin conjugate (Supplementary Fig. 9). Variants with increased affinity for DIG were isolated by FACS, and next generation sequencing was used to quantify the frequency of every single point mutation in the unselected and selected populations. A large majority of the interrogated variants were depleted in the selected population relative to the unselected input library, suggesting that most of the designed residues are close to optimal for binding (Fig. 2e,f and Supplementary Fig. 10). In particular, mutation of the three designed hydrogen bonding residues, Tyr34, Tyr101, and Tyr115, to any other amino acid was disfavored. Several large hydrophobic residues that pack against the ligand in the computational model are also optimal for binding (e.g. Phe66 and Phe119). Besides A99, which contacts DIG directly, most of the observed mutations that improve binding are located in the second coordination shell of the ligand and fall into two functional categories: (1) core substitutions tolerating mutation to chemically-similar amino acids (e.g. Leu105 and Cys23), and (2) solvent-exposed loop amino acids having high sequence entropy (e.g. His90, Val92). The single best clone in the libraries, DIG10.2, features two of the most highly enriched mutations, Ala37Pro and His41Tyr (Fig. 2b,c and Supplementary Figs 6, 8, and 11).
Combination of the most highly enriched substitutions in a library followed by selections led to DIG10.3 (Supplementary Figs 6 and 12), which binds both DIG and its cardiac glycoside derivative, digoxin, with picomolar affinity (Fig. 2b, Supplementary Fig. 13, and Supplementary Table 4), rivaling the affinities of anti-digoxin antibody therapeutics21 and an evolved single-chain variable anti-DIG antibody fragment7. FP-based affinity measurements of DIG10.3 and its Tyr knockouts suggest that the designed hydrogen bonds each contribute ~2 kcal/mol to binding energy (Supplementary Table 5 and Supplementary Fig. 8).
The crystal structures of DIG10.2 and DIG10.3 in complex with DIG were solved to 2.05 Å and 3.2 Å resolution, respectively (Fig. 3a,b and Supplementary Figs 14-16). The structure of DIG10.2 bound to DIG shows atomic-level agreement (average all atom root mean squared deviation (RMSD) of 0.54 Å) with the design model (Fig. 3c). The ligand-protein interface has high shape complementarity (Sc = 0.66) and there are no observable water molecules within the binding pocket. The DIG binding mode is nearly identical in the structure and the model, with an average RMSD of 0.99 Å for all 28 ligand heavy atoms (Fig. 3d). As anticipated, Tyr34, Tyr101, and Tyr115 make the designed hydrogen bonds with O3, O2, and O1 of DIG, respectively. Tyr41, a residue identified during affinity maturation, engages in an additional weak hydrogen bond with the terminal hydroxyl group of DIG (O5) (Supplementary Fig. 16). Of 27 non-glycine and non-alanine non-surface protein residues within ~10 Å of the ligand, 21 adopt rotamer conformations in the design model (Supplementary Fig. 17), including Tyr101 and Tyr115 (in chain B) as well as the first-shell packing residues Trp22, Phe58, and Phe119. The structure of DIG10.3 bound to DIG also agrees with the design model (average all-atom RMSD of 0.68 Å) (Supplementary Fig. 18).
We assessed the binding specificity of DIG10.3 by determining affinities for a series of related steroids by equilibrium competition fluorescence polarization assays. Experiments with DIG, digitoxigenin, progesterone, and β-estradiol showed a decrease in affinity corresponding to the loss of one, two, and three hydrogen bonds respectively (assuming ~1.8 kcal/mol per hydrogen bond26), as expected from the structure if these compounds bind in the same orientation as DIG (Fig. 4a,b and Supplementary Table 4). We next investigated whether the observed steroid selectivity could be reprogrammed by mutagenesis of the key hydrogen-bonding tyrosine residues. The variants Tyr101Phe, Tyr34Phe, and Tyr34Phe/Tyr99Phe/Tyr101Phe show clear preferences for more hydrophobic steroids in a predictable manner that depends on the hydrogen bonding capabilities of both the protein and the steroid. Mutation of Tyr101 to Phe eliminates the DIG-specific hydrogen bond with O2 of DIG and provides better hydrophobic packing for the other three steroids lacking a hydroxyl group at that position (Fig. 4c). Substitution of Tyr34 with Phe removes a hydrogen bond to the C14 hydroxyl groups of both DIG (O3) and digitoxigenin, enhancing the preference for progesterone and maintaining the relative binding order of DIG and digitoxigenin due to the intact DIG-specific Tyr101-DIG O2 bond (Fig. 4d). Mutation of Tyr101, Tyr34, and binding site residue Tyr99 to Phe leads to decreased binding affinity for DIG and increased affinity for the more hydrophobic steroids (Fig. 4e). These results confirm that the selectivity of DIG10.3 for DIG is conferred largely through the designed hydrogen-bonding interactions and demonstrate how selectivity can be programmed through positive design alone by control of designed protein-ligand hydrogen bonding and non-polar interactions.
Comparison of the properties of successful and unsuccessful designs provides a test of the hypotheses underlying the design methodology. Although all 17 designed proteins had high computed shape complementarity to DIG by construction, the DIG10 design, which had the highest affinity for DIG, had the most favorable computed ligand interaction energies and was predicted to have the most pre-organized binding site (Fig. 1b and Supplementary Table 6), suggesting that these attributes should continue to be the focus of future design methodology development. One potential avenue for obtaining more favorable interaction energies would be the incorporation of additional binding site backbone flexibility to achieve more tightly packed binding sites: the observation that substitution of small hydrophobic interface residues to larger amino acids increased binding affinity indicates that the original DIG10 design was under-packed.
The binding fitness landscape in combination with the x-ray co-crystal structures highlight the importance of second shell interactions in stabilizing binding competent conformations. The fitness landscape favors substitution of Leu105, adjacent to the key hydrogen-bonding residue Tyr115, to Trp or other large hydrophobic residues (Fig. 2e). Tyr115 exhibits obvious conformational side chain heterogeneity in the four independent protein subunits of the 2.05 Å resolution DIG10.2 crystal structure. Mutation of Leu105 to Trp results in a single uniform set of side chain conformations for Tyr115 in the lower resolution DIG10.3 design (which contains nine independently visualized subunits) that features a more canonical hydrogen bond geometry between Tyr115 and DIG than the alternative DIG10.2 Tyr115 conformation (Fig. 3e and Supplementary Fig. 19). The higher affinity of DIG10.3 might result from a higher population of the pre-organized, higher affinity conformation of the protein15,27, and ITC studies confirmed that entropic as well as enthalpic factors contribute to its enhanced binding affinity (Fig. 2d and Supplementary Table 3). Indeed, all key hydrogen-bonding tyrosines, particularly Tyr115, have higher computed Boltzmann weighted side chain probabilities in apo-DIG10.3 than in apo-DIG10.2 and apo-DIG10 (Supplementary Table 7). Similarly, reduced backbone conformational entropy is likely responsible for the increased fitness of substitutions increasing β-sheet propensity at positions 90 and 92 which likely stabilize a more ordered extended strand backbone conformation (Fig. 2e). That conformational flexibility is selected against during affinity maturation suggests that accounting for free energy gaps between binding-competent and alternative states of the binding site28, possibly by better assessing side chain entropy or explicitly designing second shell buttressing interactions for key contacts, should aid in achieving high affinity in the next generation of computationally designed ligand binding proteins.
The binding affinity of DIG10.3 is within a factor of two of that of the widely used anti-DIG antibodies,21 and because it is stable for extended periods at ambient temperatures (Supplementary Fig. 20) and can be expressed at high levels in bacteria, it could provide more cost-effective alternative for biotechnological applications and for therapeutic purposes if it can be engineered for compatibility with the human immune response. With continued improvement in the methodology and feedback from experimental results, computational protein design should provide an increasingly powerful approach to creating a new generation of small molecule receptors for synthetic biology, therapeutic scavengers for toxic compounds, and robust binding domains for diagnostic devices.
Full details for all computational methods are given in Supplementary Methods. Example command lines and RosettaScripts31 design protocols are provided in Supplementary Data. Source code is freely available to academic users through the Rosetta Commons agreement (http://www.rosettacommons.org/). Design models, the scaffold library, and scripts for running design calculations are provided on the Baker lab website.
A set of 401 scaffolds was searched for backbones that can accommodate five pre-defined side chain interactions with DIG using RosettaMatch17 (Supplementary Table 8). This set contained scaffolds previously used for design projects within our lab10,13,32 as well as structural homologs of a subset of these scaffolds that are known to tolerate mutations. Full details are given in Supplementary Methods.
Two successive rounds of sequence design were employed. The purpose of the first was to maximize binding affinity for the ligand33. The goal of the second was to minimize protein destabilization due to aggressive scaffold mutagenesis while maintaining the binding interface designed during the first round. During the latter round, ligand-protein interactions were up-weighted by a factor of 1.5 relative to intra-protein interactions to ensure that binding energy was preserved. Two different criteria were used to minimize protein destabilization: (1) native scaffold residues identities were favored by 1.5 Rosetta energy units (Reu), and (2) no more than five residues were allowed to change from residue types observed in a multiple sequence alignment (MSA) of the scaffold if (a) these residues were present in the MSA with a frequency greater than 0.6 and, (b) if the calculated G for mutation of the scaffold residue to alanine34 was greater than 1.5 Reu in the context of the scaffold sequence. In some design calculations, identities of the matched hydrogen bonding residues were allowed to vary according to the MSA and G criteria described above. Designs having fewer than three hydrogen bonds between the protein and the ligand were rejected.
Designs were evaluated on interface energy, ligand solvent exposed surface area, ligand orientation, shape complementarity, and apo-protein binding site pre-organization. The latter was enforced by two metrics: (1) explicitly introducing second-shell amino acids that hold the pre-selected residues in place using Foldit35, and (2) selecting designs having rotamer Boltzmann probabilities16 > 0.1 for at least one hydrogen bonding residue (Supplementary Table 6). High shape complementary was enforced using by rejecting designs having Sc < 0.5. Shape complimentary was computed using the CCP4 package v.6.0.236 using the Sc program18 and the Rosetta radii library. All designs were evaluated for local sequence secondary structure compatibility, and those predicted to have backbone conformations that varied by > 0.8 Å from their native scaffold were rejected (see Supplementary Methods).
Detailed procedures for the syntheses of DIG-BSA-biotin, DIG-RNase-biotin, DIG-PEG3-biotin, and DIG-PEG3-Alexa488, as well as protein expression, purification, and crystallization, cloning, and mutagenesis methods are given in Supplementary Methods. Details about fluorescence polarization binding assays, isothermal titration calorimetry, gel filtration analysis, analytical ultracentrifugation experiments, and circular dichroism protein stability measurements are also provided in Supplementary Methods.
Designed proteins were tested for binding using yeast-surface display22. Yeast surface protein expression was monitored by binding of anti-cmyc FITC to the C-terminal myc epitope tag of the displayed protein. DIG binding was assessed by quantifying the phycoerythrin (PE) fluorescence of the displaying yeast population following incubation with DIG-BSA-biotin, DIG-RNase-biotin, or DIG-PEG3-biotin, and streptavidin-phycoerythrin (SAPE). In a typical experiment using DIG-BSA-biotin or DIG-RNase-biotin, cells were resuspended in a premixed solution of PBSF (PBS + 1 g/L of BSA) containing a 1:100 dilution of anti-cmyc FITC, 2.66 μM DIG-BSA-biotin or DIG-RNase biotin, and 664 nM SAPE for 2-4 hr at 4 °C. Cellular fluorescence was monitored on an Accuri C6 flow cytometer using a 488 nm laser for excitation and a 575 nm band pass filter for emission. Phycoerythrin fluorescence was compensated to minimize bleed-over contributions from the FITC channel. Competition assays with free digoxigenin were performed as above except that between 750 μM and 1.5 mM of digoxigenin was added to each labeling reaction mixture. Full details are given in Supplementary Methods.
Detailed procedures for constructing and selecting all libraries, including those for deep sequencing, are provided in Supplementary Methods. Yeast surface display library selections were conducted on a Cytopeia inFlux cell sorter using increasingly stringent fluorescence gates. In all labeling reactions for selections, care was taken to maintain at least a 10-fold molar excess of label to cell surface protein. Cell surface protein molarity was estimated by assuming that an O.D.600 of 1.0 = 1e7 cells/mL and that each cell displays 50,000 copies of protein22. For each round of sorting, we sorted at least 10 times the theoretical library size. FlowJo software v. 7.6 was used to analyze all data. Cell sorting parameters and statistics for all selections are given in Supplementary Table 9.
Two sequencing libraries based on DIG10.1 were assembled by recursive PCR: an N-terminal library (fragment 1 library) and a C-terminal library (fragment 2 library). To introduce mutations, we used degenerate PAGE-purified oligos in which 39 selected positions within the binding site were doped with a small amount of each non-native base at a level expected to yield 1-2 mutations per gene (TriLink BioTechnologies) (Supplementary Table 10). Yeast cells were transformed with DNA insert and restriction-digested pETCON37. Surface protein expression was induced22 and cells were labeled with anti-cymc-FITC and sorted for protein expression. Expressing cells were recovered, induced, labeled with 100 nM of DIG-PEG3-biotin for > 3 hrs at 4 °C and then SAPE and anti-cymc-FITC for 8 min at 4 °C, and then sorted. For each library, clones having binding signals higher than that of DIG10.1 were collected (Supplementary Fig. 9). To reduce noise from the first round of cell sorting, the sorted libraries were recovered, induced, and subjected to a second round of sorting using the same conditions (Supplementary Methods).
Library DNA was prepared as described25. Illumina adapter sequences and unique library barcodes were appended to each library pool by PCR amplification using population-specific primers (Supplementary Table 11). DNA was sequenced in paired-end mode on an Illumina MiSeq using a 300-cycle reagent kit and custom primers (see Supplementary Methods). Of a total 5,630,105 paired-end reads, 2,531,653 reads were mapped to library barcodes (Supplementary Table 12). For each library, paired end reads were fused and filtered for quality (Phred ≥ 30). The resulting full-length reads were aligned against DIG10.1 using Enrich38. For single mutations having ≥ 7 counts in the original input library, a relative enrichment ratio between the input library and each selected library was calculated23-25. The effect of each amino acid substitution at 39 binding site residues on binding () is given as the log base 2 frequency of observing mutation x at position i in the selected versus the unselected population, relative to that of the DIG10.1 residue (orig) at position i:
Fluorescence polarization-based affinity measurements of designs and their evolved variants were performed as described30 using Alexa488-conjugated DIG (DIG-PEG3-Alexa488). Fluorescence anisotropy (r) was measured in 96-well plate format on a SpectraMax M5e microplate reader (Molecular Devices) with λex = 485 nM and λem = 538 nM using a 515 nm emission cutoff filter. Fluorescence polarization equilibrium competition binding assays were used to determine the binding affinities of DIG10.3 and its variants for unlabeled digoxigenin, digitoxigenin, progesterone, β-estradiol, and digoxin. The inhibition constant for each protein-ligand interaction, Ki, was calculated from the measured total unlabeled ligand producing 50% binding signal inhibition (I50) and the Kd of the protein-label interaction according to a model accounting for receptor-depletion conditions30. Full details are provided in Supplementary Methods.
ITC studies were performed on an iTC200 microcalorimeter (MicroCal) at 25 °C in PBS, pH 7.4. Ligand solutions were prepared by diluting a stock solution of digoxigenin (100 mM in 100% DMSO) into the flow-through of the last buffer aliquot used to exchange the protein (final DMSO concentrations were 1-3%). ITC titration data were integrated and analyzed with Origin 7.0 (MicroCal). Full details are provided in Supplementary Methods.
We thank Dr. E.-M. Strauch and Dr. P.-S. Huang for providing the ZZ/pETCON and S2/pETCON plasmids, respectively, and Dr. B. Shen for assistance with data processing, modeling, and refinement of the x-ray crystal structures. We thank Dr. J. P. Sumida for assistance with analytical ultracentrifugation data collection, processing, and analysis. Analytical ultracentrifugation experiments were performed at the University of Washington Analytical Biopharmacy Core, which is supported by the Washington State Life Sciences Discovery Fund and the Center for the Intracellular Delivery of Biologics. We thank Dr. S. Fleishman, Dr. O. Khersonsky, and Dr. P.-S. Huang for comments on the manuscript. J.W.N. acknowledges a pre-doctoral fellowship from the National Human Genome Research Institute under the Interdisciplinary Training in Genome Sciences program (T32 HG00035). This work was supported by grants from the HHMI and DTRA to D.B., a grant from DARPA to D.B. and B.L.S., a grant from the Swiss National Science Foundation to K.J., and NSF grant MCB1121896 to C.G.K.
C.E.T., S.D.K., and D.B. designed the research. S.D.K., C.E.T., and J.D. performed computations. S.D.K. wrote program code. C.E.T. experimentally characterized the designs, constructed libraries, performed affinity maturation and deep sequencing selections, and conducted binding and biochemical studies of DIG10. J.D. characterized DIG5. J.W.N. analyzed deep sequencing data. C.E.T. and J.D. prepared protein samples for crystallographic analysis. L.D. and B.S. crystallized the protein samples and analyzed crystallographic data. A.S. and K.J. synthesized DIG-PEG3-biotin and DIG-PEG3-Alexa488. C.E.T. prepared protein samples for ITC studies, and W.J. and C.G.K. performed ITC experiments and analyzed ITC data. C.E.T., S.D.K., and D.B. analyzed data and wrote the manuscript. All authors discussed the results and commented on the manuscript.
The crystal structures of DIG10.2 and DIG10.3 bound to DIG have been deposited in the RCSB Protein Data Bank under the accession numbers 4J8T (DIG10.2) and 4J9A (DIG10.3). The authors declare no competing financial interest.
Supplementary Information is available in the online version of the paper.
Full Methods and any associated references are available in the online version of the paper.