The model system
We chose the kinase domain of p21-activated kinase 1 (PAK1) as our `target' protein. Full-length PAK1 (PAK1-fl) is a multi-domain protein that can switch between an inactive and an active conformation. In the inactive state, the auto-inhibitory domain of PAK1-fl binds to the kinase domain. In the active state, the auto-inhibitory domain is unfolded by accessory factors and no longer interacts with the kinase domain.13
For de novo interface design we targeted a region of PAK1 that is exposed when the auto-inhibitory domain is released. This is an attractive binding site for two reasons. It is a known site of protein-protein interaction. Binders that target it will also be sensitive to the activation state of PAK1, which could provide a tool for sensing or controlling PAK1 activity. The `target' patch is a hydrophobic cleft in the C-terminal lobe of the kinase domain, between two α-helices (αEF and αG). The auto-inhibitory domain of PAK1-fl forms a small helical bundle that inserts into this cleft. As a design scaffold for targeting the PAK1 kinase domain, we used a small helical bundle protein, the Hyperplastic discs protein (HYP, PDB ID: 1I2T).14 HYP is similar in size to the auto-inhibitory domain and can fit in the target cleft.
In a preliminary set of designs, we found that HYP was prone to aggregation when redesigned to bind PAK1. Because we are targeting a hydrophobic cleft on PAK1, the redesigned scaffolds typically carry additional hydrophobic groups on their surface. To increase the baseline solubility of HYP, we selected positions away from the target interface and mutated them to polar residues. The mutations G26E, L37E and L38N are present in some of the designs discussed here. We also introduced the mutation A15C into HYP. This allows conjugation of the fluorophore Bodipy, so that binding to PAK1 can be measured as a change in fluorescence polarization. Circular dichroism spectra indicate that these mutations do not perturb the helical structure of HYP (Supplementary figure S1).
Interface design protocol, DDMI
To redesign HYP to bind PAK1, we developed an interface design protocol called DDMI within the Rosetta molecular modeling suite15; 16 (Figure 1). The DDMI protocol took two inputs: 1) a PDB file containing both the scaffold and target proteins in an undocked conformation, and 2) a set of constraints, described below. Each DDMI trajectory produced a single docked conformation of the target and scaffold. In that conformation, the scaffold's interface residues had been redesigned to interact favorably with the target's interface residues. The first stage of the protocol randomized the scaffold's rigid-body position and then performed low-resolution rigid-body docking of the two proteins.12
The goal of this stage was to find a plausible docked conformation for the two proteins, so that they could then be designed to bind in this conformation. At this stage we filtered out any docked conformations with backbone clashes that the design phase would be unable to resolve. To bias the scaffold toward docking against the target cleft, we added square-well constraints to the energy function that reward burial of specified interface residues. A residue was rewarded if it was within 7 Å of any residue on the opposite chain. These constraints were removed after the initial stage of docking. Figure 2(a) shows representative sampling of structures generated during the `dock' stage of the protocol. DDMI preserved the scaffold's original sequence during this phase. This sequence was unlikely to form favorable interactions with residues on the target protein, even at low resolution, but the constraints ensured that the desired regions of the scaffold's surface were positioned near the target.
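The square-well burial reward can be sketched as follows. This is a minimal illustration and not the Rosetta constraint implementation. It assumes that each residue is represented by a single point, such as a Cβ atom or side-chain centroid. The well depth of −1.0 and the function name `burial_bonus` are hypothetical. Only the 7 Å cutoff comes from the text.

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def burial_bonus(
    scaffold: Dict[int, Coord],
    target: Sequence[Coord],
    rewarded: Sequence[int],
    cutoff: float = 7.0,
    well_depth: float = -1.0,
) -> float:
    """Square-well burial term (hypothetical parameters).

    Each rewarded scaffold residue whose representative point lies within
    `cutoff` Å of any target residue adds `well_depth` (negative = favorable).
    Residues outside the cutoff contribute nothing, giving a flat-bottomed well.
    """
    total = 0.0
    for res_id in rewarded:
        x = scaffold[res_id]
        if any(math.dist(x, y) <= cutoff for y in target):
            total += well_depth
    return total
```

A square well provides no gradient, so the term only biases which docked conformations score well. It does not pull residues toward the target during minimization, which suits the stochastic rigid-body search it accompanies here.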
Figure 1 DDMI protocol for protein interface design. Each trajectory starts with rigid-body docking using a low resolution score function with additional constraints to direct predetermined residues to the interface.
Figure 2 Conformational sampling in (a) the `dock' stage and (b) energy convergence during the `design/minimize' stage. In (a) the region in magenta depicts the `hotspot' region on the target protein. In (b) each line represents an independent trajectory.
After docking, we used iterative rounds of sequence design and structure optimization with Rosetta's all-atom energy function16 to find low-energy sequence-structure pairs. Sequence design was performed with simulated annealing and a rotamer-based representation of the amino acid side chains. Structure optimization was performed by gradient-based minimization of two sets of degrees of freedom: the rigid-body degrees of freedom orienting the two chains,17 and the backbone and side-chain torsion angles of the interface residues. In the early rounds of design and minimization, DDMI down-weighted the repulsive component of the Lennard-Jones potential. It also added coordinate constraints on the backbone Cα atoms to prevent the fledgling interface from “exploding”. The typical binding energy of the interfaces produced by the first few iterations was positive. Without the coordinate constraints, the minimizer therefore displaced the scaffold from the target, and no interface could be designed. In each iteration, DDMI decreased the weight on the coordinate constraints and increased the weight on the Lennard-Jones repulsive term. We found that after eight rounds of sequence design and structure minimization, most trajectories had converged on a local minimum (Figure 2(b)).
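The weight annealing described above can be sketched as a simple schedule. The text specifies only that the constraint weights fall and the repulsive weight rises over about eight rounds. The linear ramp, the start and end weights, and the names `ramp_schedule`, `design_fn` and `minimize_fn` are assumptions made for illustration.

```python
from typing import Callable, List, Tuple, TypeVar

Pose = TypeVar("Pose")


def ramp_schedule(
    n_rounds: int = 8,
    cst_start: float = 1.0,
    cst_end: float = 0.0,
    rep_start: float = 0.1,
    rep_end: float = 1.0,
) -> List[Tuple[float, float]]:
    """Per-round (coordinate-constraint weight, LJ-repulsive weight) pairs.

    Weights are interpolated linearly (an assumption): the constraints fade
    out while the repulsive term reaches full strength in the final round.
    """
    if n_rounds == 1:
        return [(cst_end, rep_end)]
    schedule = []
    for i in range(n_rounds):
        t = i / (n_rounds - 1)  # 0.0 in the first round, 1.0 in the last
        schedule.append((cst_start + t * (cst_end - cst_start),
                         rep_start + t * (rep_end - rep_start)))
    return schedule


def design_minimize(
    pose: Pose,
    design_fn: Callable[[Pose, float], Pose],
    minimize_fn: Callable[[Pose, float, float], Pose],
    n_rounds: int = 8,
) -> Pose:
    """Alternate sequence design and minimization under the ramped weights."""
    for cst_w, rep_w in ramp_schedule(n_rounds):
        pose = design_fn(pose, rep_w)           # hypothetical: redesign interface rotamers
        pose = minimize_fn(pose, cst_w, rep_w)  # hypothetical: rigid-body + torsion minimization
    return pose
```

Passing the design and minimization steps in as callables keeps the schedule logic separate from any particular modeling package.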
Global sampling of conformational space (within the target constraints) was achieved by running independent trajectories, each of which effectively starts from a uniquely docked complex. We performed >1 million independent DDMI trajectories with HYP and PAK1. To assess the quality of our designs, we compared our models against 43 naturally occurring protein-protein interfaces with high-resolution crystal structures (2.3 Å or less, Supplementary table S1). We observed three key differences between native and designed interfaces.

First, in the native interfaces the minimized Rosetta energies correlated strongly with interface size (Supplementary figure S2). We define the binding energy density as the binding energy (in Rosetta Energy Units, REU) per buried surface area (Å²). Its average value for the native interfaces was −0.013 REU/Å². Binding energy density was a potent discriminator between designed and natural interfaces: most DDMI trajectories resulted in interfaces with poor binding energy densities.

Second, native structures had on average 4 unsatisfied hydrogen bonds at their interfaces, whereas DDMI models often contained many more.

Third, naturally occurring interfaces were packed more tightly than those produced by DDMI, as measured by the SASApack score. The SASApack score in Rosetta is derived from the difference between the molecular surface area accessible to a 0.5 Å radius probe and that accessible to a 1.4 Å radius probe.18
This difference measures protein surface area that is in contact with neither water nor other protein atoms. It therefore reflects voids that are too small to be filled with water. The score is normalized by the average surface-area difference observed in a large set of crystal structures. A negative SASApack score indicates better packing than in crystal structures, and a positive score indicates worse packing. The average SASApack score for the set of native interfaces was −1.39 ± 1.29, whereas many of the designs had positive SASApack scores.
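To make the normalization concrete, the sketch below computes a SASApack-like score from per-atom surface areas at the two probe radii. It is not Rosetta's implementation. The per-atom-type reference means and standard deviations, the z-score form, and the function names are all assumptions. They were chosen so that negative values mean fewer small voids than in reference crystal structures, as in the text. The surface areas themselves would come from an external calculation.

```python
from statistics import mean
from typing import Dict, List, Sequence


def void_areas(area_small_probe: Sequence[float],
               area_large_probe: Sequence[float]) -> List[float]:
    """Per-atom area reachable by the 0.5 Å probe but not by the 1.4 Å probe."""
    return [max(s - l, 0.0) for s, l in zip(area_small_probe, area_large_probe)]


def packing_score(area_small_probe: Sequence[float],
                  area_large_probe: Sequence[float],
                  atom_types: Sequence[str],
                  ref_mean: Dict[str, float],
                  ref_std: Dict[str, float]) -> float:
    """Mean per-atom z-score of void area against hypothetical reference
    statistics from crystal structures; negative = better packed."""
    voids = void_areas(area_small_probe, area_large_probe)
    z = [(v - ref_mean[t]) / ref_std[t] for v, t in zip(voids, atom_types)]
    return mean(z)
```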
Benchmark scores from native PDBs and the cutoffs used as Filter II during simulations
These observations motivated the set of filters we incorporated into the DDMI protocol. DDMI discarded a design if its binding energy density was higher than −0.01 REU/Å² or if it buried 4 or more polar groups that lacked hydrogen-bonding partners. Final selection of design models additionally required a SASApack score below 2.0 and a buried surface area of at least 700 Å². Satisfying all four criteria required considerable sampling, because fewer than 1% of the DDMI trajectories passed these filters. The remaining interfaces were of moderate size (900 Å² to 1700 Å²).
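The four selection criteria can be expressed as a single predicate. This sketch assumes that each metric has already been computed for a model, for example by an interface analysis in Rosetta. The names `InterfaceMetrics` and `passes_filters` are hypothetical. The thresholds are the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class InterfaceMetrics:
    binding_energy: float      # REU; negative = favorable
    buried_sasa: float         # Å^2 buried upon complex formation
    buried_unsat_polars: int   # buried polar groups without H-bond partners
    sasapack: float            # packing score; lower = better packed

    @property
    def energy_density(self) -> float:
        """Binding energy per buried surface area (REU/Å^2)."""
        return self.binding_energy / self.buried_sasa


def passes_filters(m: InterfaceMetrics) -> bool:
    """Apply the four DDMI selection criteria described in the text."""
    if m.buried_sasa < 700.0:        # minimum interface size
        return False
    if m.energy_density > -0.01:     # binding energy density cutoff
        return False
    if m.buried_unsat_polars >= 4:   # too many unsatisfied buried polar groups
        return False
    if m.sasapack >= 2.0:            # packing cutoff
        return False
    return True
```

The buried-area check runs first, so the energy density is never computed for an interface with zero buried area.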
Four designs with favorable values for all of the evaluation metrics were selected for experimental validation: “0233”, “Spider Roll”, “1212” and “3533”. In each of them, the C-terminal helix of the scaffold interacts with the `hotspot' cleft in PAK1. The main interacting residues on PAK1 were L470, L473 and Y474; in some designs V436, R438 and R471 also contributed to the interaction. On the scaffold side, the mutations were mainly concentrated in helix IV. The interacting residues were mostly hydrophobic, with one or two aromatic amino acids forming the centre of the interaction. Design model 1212 had the greatest number of polar residues at the interface, and design model s032 (introduced below) had the fewest.
Multiple sequence alignment of the design models selected for experimental validation. The interface residues in each case are highlighted.
Computational scores of the selected design models and comparison with a native AID-kinase interaction in full-length PAK1.
Discrete molecular dynamics (DMD) to sample more backbone conformations
To study the effect of additional backbone flexibility on the design process, we used discrete molecular dynamics (DMD)19; 20; 21 to sample conformational space near one of the designed complexes (Spider Roll). Starting from the initial design model, we first performed short DMD simulations and picked ten structure snapshots from the simulation trajectory. Next, we applied the iterative sequence design and minimization phase of the DDMI protocol (the DMI phase) to these ten snapshots to generate ten new designs. The design with the best Rosetta energy was selected for the next round of DMD simulation. We repeated these DMD+DMI iterations 100 times at decreasing DMD simulation temperatures to identify low-energy designs. The resulting 1000 designs were filtered using the same criteria as in the DDMI protocol, and the top 10 remaining structures by total energy were examined visually. This complete protocol was executed twice, and we selected one design from the top 10 of each execution for experimental validation. The designs were named “s032” and “s037”.

With the target protein superimposed, the backbone RMSD to Spider Roll was 5.36 Å for s032 and 4.53 Å for s037. These RMSDs reflect a substantial shift in the docked conformation of these designs against the target. They also reflect a significant change to the starting backbone geometries: the two designs moved 1.28 Å and 1.52 Å in Cα RMSD away from the starting Spider Roll backbone. In contrast, gradient-based minimization in the standard DDMI protocol produced designs with Cα RMSDs to the starting HYP scaffold of under 0.3 Å. The sequences of the two new designs differ significantly from the initial Spider Roll design. Of the 21 designed residue positions, 12 are mutated in s032 and 18 in s037.
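The DMD+DMI refinement loop can be outlined as below. The DMD simulation, the DMI design step and the energy function stand in for external programs, so they are passed in as callables. Their names, the linear cooling schedule and the temperature values (in arbitrary DMD units) are assumptions and are not given in the text. The round and snapshot counts follow the description above and give 100 × 10 = 1000 designs.

```python
from typing import Callable, List, Sequence, TypeVar

Pose = TypeVar("Pose")


def dmd_dmi_refine(
    start: Pose,
    run_dmd: Callable[[Pose, float, int], Sequence[Pose]],
    run_dmi: Callable[[Pose], Pose],
    energy: Callable[[Pose], float],
    n_rounds: int = 100,
    n_snapshots: int = 10,
    t_start: float = 0.8,
    t_end: float = 0.5,
) -> List[Pose]:
    """Alternate short DMD runs with DMI redesign at decreasing temperature.

    Returns every design produced. The lowest-energy design of each round
    seeds the next DMD simulation.
    """
    designs: List[Pose] = []
    current = start
    for r in range(n_rounds):
        frac = r / (n_rounds - 1) if n_rounds > 1 else 1.0
        temperature = t_start + frac * (t_end - t_start)  # linear cooling (assumed)
        snapshots = run_dmd(current, temperature, n_snapshots)
        round_designs = [run_dmi(s) for s in snapshots]
        designs.extend(round_designs)
        current = min(round_designs, key=energy)          # best Rosetta energy
    return designs
```

The final steps described in the text are applying the DDMI filters and ranking the survivors by total energy. Both would be a filter and a sort over the returned list.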
Binding measurements and `hotspot' mapping
The six designs were expressed as MBP fusions. All of them expressed in the soluble fraction of E. coli lysate, but 3533 and s032 aggregated when MBP was removed with TEV protease, so no further studies were performed with these two designs. The circular dichroism (CD) spectra indicate that the designs are helical. All of the designs exhibited cooperative thermal melts, as monitored by the CD signal at 222 nm (Supplementary figure S1). The four soluble designs were labeled with the fluorescent probe Bodipy, and fluorescence polarization was used to monitor binding to PAK1. Spider Roll showed the highest affinity (Kd = 100 μM). Design model s037, which was derived from Spider Roll using DMD, bound with a dissociation constant of 160 μM. Model 1212 bound with a Kd of 330 μM, and model 0233 did not show conclusive binding to PAK1.
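Dissociation constants like these are usually obtained by fitting the polarization signal to a single-site binding isotherm. The sketch below is a dependency-free version of such a fit. It assumes that the labeled design is dilute relative to Kd, so the free PAK1 concentration can be approximated by the total. It uses a log-spaced grid search over Kd and, at each grid point, a closed-form linear fit of baseline and amplitude. The function name, grid bounds and units (μM) are assumptions and do not describe the authors' analysis.

```python
from typing import Sequence, Tuple


def fit_kd(conc_um: Sequence[float], signal: Sequence[float],
           kd_min: float = 1.0, kd_max: float = 1e4,
           n_grid: int = 2000) -> Tuple[float, float, float]:
    """Fit signal = a + b * c / (Kd + c); return (Kd, a, b)."""
    n = len(conc_um)
    mean_y = sum(signal) / n
    best = None
    for i in range(n_grid):
        kd = kd_min * (kd_max / kd_min) ** (i / (n_grid - 1))  # log-spaced grid
        f = [c / (kd + c) for c in conc_um]                      # fraction bound
        mean_f = sum(f) / n
        sff = sum((x - mean_f) ** 2 for x in f)
        if sff == 0.0:
            continue
        # Ordinary least squares for baseline a and amplitude b at this Kd.
        b = sum((x - mean_f) * (y - mean_y) for x, y in zip(f, signal)) / sff
        a = mean_y - b * mean_f
        sse = sum((a + b * x - y) ** 2 for x, y in zip(f, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, a, b)
    if best is None:
        raise ValueError("need at least two distinct concentrations")
    _, kd, a, b = best
    return kd, a, b
```

When Kd is as weak as 100 μM or more, the titration has to reach several hundred micromolar PAK1 for the upper plateau, and therefore the amplitude b, to be well determined.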
To test whether Spider Roll interacts with PAK1 as designed, we mutated residues on both sides of the interface that were predicted to contribute to binding (Figure 4). Consistent with the design model, the PAK1 mutations L473A and L470E each destabilized binding by over 1.4 kcal/mol. R471A and Y474A also destabilized binding, by 0.35 and 0.85 kcal/mol, respectively. R438 was predicted to form a hydrogen bond with backbone carbonyls at the C-terminus of Spider Roll, but mutating it to alanine had no effect on binding affinity. Mutations of Spider Roll residues that were designed to interact with PAK1 also weakened binding. The point mutations F59A and G56E weakened binding by 0.55 kcal/mol. The Y52A and A55E mutations had only subtle effects, with a binding energy cost of approximately 0.4 kcal/mol. Combining F59A with G56E produced a larger decrease in binding (1.4 kcal/mol). The I58A mutation was predicted to contact PAK1 at the edge of the interface and had a minimal effect on binding. The design scaffold (with the solubilizing mutations G26E, L37E and L38N) had only a basal level of affinity for PAK1. Taken together, the mutational data indicate that Spider Roll interacts with the target patch on PAK1 and that helix IV of Spider Roll is involved in binding.
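The kcal/mol values above correspond to ratios of dissociation constants through the standard relation ΔΔG = RT ln(Kd,mutant / Kd,wild type). The sketch below assumes a temperature of 298.15 K, which the text does not state. The function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_binding(kd_mutant: float, kd_wild_type: float, temp_k: float = 298.15) -> float:
    """Binding free energy change in kcal/mol; positive means weaker binding."""
    return R_KCAL * temp_k * math.log(kd_mutant / kd_wild_type)


def fold_change(ddg_kcal: float, temp_k: float = 298.15) -> float:
    """Kd ratio (mutant / wild type) implied by a given ddG in kcal/mol."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))
```

Under this assumption, `fold_change(1.4)` is about 10.6. A 1.4 kcal/mol penalty therefore corresponds to roughly a ten-fold weaker Kd.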
Figure 4 (a) Spider Roll – PAK1 model showing the interface residues. On the PAK1 side (yellow) L470, R471, L473 and Y474 are the main residues involved in interaction. On the Spider Roll side the main contributions come from Y52, A55, G56, I58 and F59.
NMR-based structure prediction of Spider Roll
To characterize the structure of Spider Roll and its binding to PAK1, we nominated the Spider Roll mutant I58A as a community outreach target of the Protein Structure Initiative (PSI) 2. This mutant expressed with a more than two-fold higher yield than Spider Roll (see Methods). We collaborated with the Northeast Structural Genomics Consortium (NESG: http://www.nesg.org) to obtain sequential polypeptide backbone and 13Cβ chemical shift assignments.

The 2D-[15N,1H] HSQC spectrum22 recorded for Spider Roll I58A shows favorable chemical shift dispersion and indicates that the protein is folded in solution (Supplementary figure S3). Assignment completeness for detectable peaks in the 2D-[15N,1H] HSQC spectrum was 91% (48/53). Polypeptide backbone and 13Cβ chemical shift assignments (Supplementary table S2) were obtained for 50 residues. These cover a total of 81% of the shifts that can be assigned with the selected set of multidimensional NMR experiments (see Methods). That set excludes the N-terminal 15N, the three prolyl 15N and the 13C' shifts of residues preceding prolyl residues.

The chemical shifts agree with the location of α-helices in the X-ray crystal structure of the design scaffold protein (HYP), except for the last ~6 residues of helix IV (Supplementary figure S4, Supplementary table S3). We then used the shifts to predict the structure of Spider Roll I58A with the program CS-ROSETTA23 (Supplementary figure S5). The CS-ROSETTA structure is very similar to the structure of the design template protein HYP. The backbone RMSD is 0.7 Å for helical residues, computed over residues 1010–1022, 1026–1036, 1041–1049 and 1051–1065 of 1I2T and residues 2–14, 18–28, 33–41 and 43–57 of the current structure. This indicates that the redesign of HYP did not significantly affect the fold of the protein.
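The backbone RMSD quoted above compares equivalent residues in two numbering schemes. The sketch below pairs the listed helical segments and computes RMSD after optimal superposition with the Kabsch algorithm. The ranges are taken from the text, and they differ by a constant offset of 1008. The sketch assumes that backbone coordinates have already been extracted into NumPy arrays keyed by residue number. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np

# Helical segments used for the comparison: (1I2T range, Spider Roll I58A range).
HELIX_RANGES = [((1010, 1022), (2, 14)), ((1026, 1036), (18, 28)),
                ((1041, 1049), (33, 41)), ((1051, 1065), (43, 57))]


def paired_residues(ranges) -> List[Tuple[int, int]]:
    """Expand inclusive residue ranges into (reference, model) residue pairs."""
    pairs = []
    for (a0, a1), (b0, b1) in ranges:
        if a1 - a0 != b1 - b0:
            raise ValueError("mapped segments must have equal length")
        pairs.extend(zip(range(a0, a1 + 1), range(b0, b1 + 1)))
    return pairs


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def backbone_rmsd(ref_bb: Dict[int, np.ndarray], model_bb: Dict[int, np.ndarray],
                  pairs: Sequence[Tuple[int, int]]) -> float:
    """ref_bb/model_bb map residue number -> (n_atoms, 3) backbone coordinates."""
    p = np.vstack([ref_bb[i] for i, _ in pairs])
    q = np.vstack([model_bb[j] for _, j in pairs])
    return kabsch_rmsd(p, q)
```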
The chemical shift indices suggest that the last ~6 residues of the C-terminal helix (helix IV) of Spider Roll are frayed in solution. To investigate the conformation of helix IV further, we derived amide proton–amide proton upper distance limit constraints from a 3D 15N-resolved [1H,1H]-NOESY spectrum. The longer distances derived for the C-terminal segment of helix IV reflect weaker NOEs, since the sequential NOEs between the last 6 residues either overlap or disappear. This is consistent with fraying of this segment. All of our designed interfaces include a fully intact C-terminal helix that makes close contact with PAK1. Fraying of helix IV in the unbound state is not inconsistent with a fully folded helix in the bound state, but it implies an additional entropic penalty for binding. Future improvements to the design of Spider Roll may therefore focus on stabilizing the C-terminal segment of helix IV.
NMR characterization of Spider Roll-PAK1 binding
The 2D-[15N,1H] HSQC spectrum22 of Spider Roll I58A was recorded as a function of PAK1 concentration. Spectra were recorded at three Spider Roll I58A / PAK1 concentration pairs: 231 μM / none, 210 μM / 120 μM, and 148 μM / 487 μM. Unexpectedly, adding PAK1 neither perturbed Spider Roll chemical shifts nor caused large broadening of Spider Roll resonances. Instead, it led to a dramatic decrease in Spider Roll signal intensities. At a PAK1 concentration of 487 μM, Spider Roll peak intensities were reduced to <5% of their starting values (Figure 5). However, no `new' peaks that could be attributed to Spider Roll bound to PAK1 emerged during the titration. This is likely because the signals of the 43 kDa complex, present at a concentration of only ~200 μM, are too weak to be detected. For simplicity, assume that both 1H and 15N line-widths scale linearly with molecular weight.22 Under that assumption, the S/N ratios for Spider Roll in complex with PAK1 are expected to be about 40-fold lower than for free Spider Roll.24 The S/N ratios observed for free Spider Roll in the 2D-[15N,1H] HSQC spectra are in the range of ~20. The signal of bound Spider Roll is therefore not expected to be detectable.
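The expected loss of signal can be made explicit. In a 2D spectrum, peak height scales with concentration divided by the product of the linewidths in the two dimensions. If both linewidths scale linearly with molecular weight, peak height therefore scales as 1/MW². The sketch below encodes that estimate. The molecular weight of free Spider Roll is not given in this section, so any value passed in for it is an illustrative assumption.

```python
def bound_snr(snr_free: float, mw_free_kda: float, mw_complex_kda: float,
              conc_free_um: float, conc_complex_um: float) -> float:
    """Estimated S/N of a bound-state peak from the free-state S/N.

    Peak height is taken as proportional to concentration / (1H linewidth *
    15N linewidth), with both linewidths proportional to molecular weight.
    """
    linewidth_factor = (mw_free_kda / mw_complex_kda) ** 2
    concentration_factor = conc_complex_um / conc_free_um
    return snr_free * linewidth_factor * concentration_factor
```

Suppose the free protein is a hypothetical 7 kDa, the complex is 43 kDa, both are at equal concentrations, and the free S/N is 20. Then `bound_snr(20, 7, 43, 200, 200)` gives about 0.5, well below a detectable level.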
Figure 5 Normalized Spider Roll I58A HSQC peak volumes as a function of PAK1 concentration. Titration 1: 231 μM Spider Roll I58A with no PAK1; Titration 2: 210 μM Spider Roll I58A with 120 μM PAK1 (green bar) and 213 μM Spider Roll …
Furthermore, Spider Roll may bind to PAK1 in several different, slowly exchanging conformations. This would broaden resonance lines and further impede detection of signals from bound Spider Roll. A third scenario that would leave the bound protein without detectable signals is aggregation of the complex.
To check whether the Spider Roll-PAK1 complex aggregates non-specifically, we performed size exclusion chromatography on the NMR sample and ran an SDS-PAGE gel of an aliquot of the same sample. The SDS gel confirmed that the same amount of Spider Roll remained in solution as was initially added to the NMR sample. Size exclusion chromatography indicated that the sample was not aggregating. The only two peaks in the chromatogram corresponded to monomeric PAK1 and Spider Roll, as is typical for proteins that bind with micromolar affinity. These experiments therefore suggest that the absence of NMR peaks from the bound state is most likely due to the slower overall rotational tumbling of the complex.
Our NMR data are consistent with a Spider Roll-PAK1 complex lifetime much longer than the time required for signal detection (the `NMR chemical shift time scale', around 0.1 s). This gives an upper bound on the `off-rate', $k_{\mathrm{off}} \ll 10\ \mathrm{s^{-1}}$. With the dissociation constant $K_D = 1 \times 10^{-4}$ M, the `on-rate' is then $k_{\mathrm{on}} = k_{\mathrm{off}}/K_D \ll 1 \times 10^{5}\ \mathrm{M^{-1}\,s^{-1}}$. On-rate constants for protein-protein binding can vary dramatically ($1 \times 10^{3}$ to … $\mathrm{M^{-1}\,s^{-1}}$) but are often near $1 \times 10^{6}\ \mathrm{M^{-1}\,s^{-1}}$. The comparatively small on-rate constant for Spider Roll could reflect the absence of an effective docking funnel. It could also reflect a conformational change that accompanies binding, such as folding of the end of helix IV in Spider Roll.
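The bound on the association rate follows from two lines of arithmetic. The sketch below uses the values from the text: a detection time scale of about 0.1 s and $K_D$ = 100 μM. The function name is hypothetical.

```python
def kon_upper_bound(min_lifetime_s: float, kd_molar: float) -> float:
    """Upper bound on k_on (M^-1 s^-1).

    A complex lifetime much longer than min_lifetime_s implies
    k_off << 1 / min_lifetime_s, and k_on = k_off / K_D.
    """
    koff_upper = 1.0 / min_lifetime_s  # s^-1
    return koff_upper / kd_molar


bound = kon_upper_bound(0.1, 1e-4)  # about 1e5 M^-1 s^-1
typical_ratio = 1e6 / bound         # typical on-rate is at least ~10x larger
```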
To confirm that the reductions in Spider Roll peak intensities in the 2D-[15N,1H] HSQC spectra are due to binding to the kinase domain, we repeated the NMR titration with the PAK1 mutant L470E. This mutant shows reduced affinity for Spider Roll in fluorescence polarization experiments. Consistent with this, the changes in peak volume were much smaller than those observed with wild-type PAK1. At a PAK1 L470E concentration of 470 μM, the Spider Roll I58A peak volumes were still 40% of their original size, in contrast to <5% for WT PAK1 (Supplementary figure S6).
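If bound Spider Roll is invisible, the remaining peak volume approximates the fraction of Spider Roll that is free. That fraction can be predicted from the total concentrations and $K_D$ with the exact 1:1 binding (quadratic) equation. The sketch below assumes simple 1:1 binding with no aggregation or additional exchange broadening. It is an illustration and not the analysis performed for this titration. The function names are hypothetical.

```python
import math


def complex_conc(total_a: float, total_b: float, kd: float) -> float:
    """Concentration of the 1:1 complex AB, in the same units as the inputs.

    Solves K_D * [AB] = ([A]_tot - [AB]) * ([B]_tot - [AB]) for the
    physically meaningful (smaller) root.
    """
    s = total_a + total_b + kd
    return (s - math.sqrt(s * s - 4.0 * total_a * total_b)) / 2.0


def free_fraction(total_a: float, total_b: float, kd: float) -> float:
    """Fraction of A that is unbound."""
    return 1.0 - complex_conc(total_a, total_b, kd) / total_a
```

A weaker-binding mutant (larger $K_D$) leaves a larger free fraction at the same concentrations. This is the qualitative trend seen with PAK1 L470E.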
Does Spider Roll adopt multiple docked positions when binding PAK1?
Both the NMR data and the mutational data indicate that Spider Roll binds the target cleft on PAK1. They do not, however, rule out the possibility that Spider Roll can adopt alternative docked orientations when bound to PAK1. To examine this possibility, we used Rosetta to perform protein-protein docking simulations with Spider Roll and PAK1. In these simulations, Spider Roll was constrained to be near the target binding site but was allowed to adopt alternative orientations relative to PAK1. We used many independent trajectories to probe the energy landscape and plotted the energies of the resulting models versus RMSD to the target conformation. We identified two clusters of low-energy structures (Figure 6). The lowest-energy cluster was centered on the design model, but the second cluster packed helix IV orthogonally to its direction in the design model. The mutational data do not strongly distinguish between the two alternatives. The mutations with the strongest effect on binding energy are buried in both sets of models (Table 4), as calculated by the NACCESS program.27
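The buried-surface comparison behind Table 4 can be sketched as follows. For each residue, the SASA in the complex is subtracted from the SASA of the isolated chain. Residues are flagged if they are substantially buried in both docked poses. The per-residue areas are assumed to have been computed beforehand, for example with NACCESS. The 50% cutoff and the function names are hypothetical.

```python
from typing import Dict, List


def fraction_buried(unbound: Dict[str, float], bound: Dict[str, float]) -> Dict[str, float]:
    """Fraction of each residue's unbound SASA that is buried in the complex."""
    out = {}
    for res, area in unbound.items():
        if area > 0.0:
            # A residue missing from `bound` is treated as not buried.
            out[res] = (area - bound.get(res, area)) / area
    return out


def buried_in_both(unbound: Dict[str, float], bound_a: Dict[str, float],
                   bound_b: Dict[str, float], min_fraction: float = 0.5) -> List[str]:
    """Residues buried at least `min_fraction` in both docked poses."""
    fa = fraction_buried(unbound, bound_a)
    fb = fraction_buried(unbound, bound_b)
    return sorted(r for r in fa if fa[r] >= min_fraction and fb.get(r, 0.0) >= min_fraction)
```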
Figure 6 Alternative docking orientations for Spider Roll. (a) Each point represents a different model of the Spider Roll / PAK1 complex, derived from an independent docking trajectory. Two alternative low energy states are observed, as shown by the two stems …
Table 4 Buried solvent accessible surface area (SASA) of the interface residues in the Spider Roll-kinase complex in two alternate docked positions. The NACCESS program26 was used to calculate the absolute SASA of each interface residue in the complex and …
Spider Roll binds preferentially to the activated form of full-length PAK1
In its inactive form, full-length PAK1 (PAK1-fl) adopts a closed conformation in which an autoinhibitory domain binds to the same kinase-domain cleft that we targeted with Spider Roll. PAK1 can be opened by introducing mutations in the autoinhibitory elements (V127E, S144E) that weaken their affinity for the kinase domain (unpublished data). Using fluorescence polarization, we measured the affinity of Spider Roll for WT PAK1-fl and for PAK1-fl carrying the mutations V127E and S144E. Spider Roll showed no binding to WT PAK1-fl, but bound PAK1-fl V127E/S144E with a Kd of 200 μM.
Spider Roll distinguishes between `closed' and `open' full-length PAK1 (PAK1-fl). Spider Roll binding titrations with PAK1 V127E/S144E mutant (PAK1-fl mutant, model for full length `open' form) and PAK1-fl (inactive `closed' form).