algorithm has enabled the design of the first high-affinity hexapeptide CAL PDZ inhibitor with demonstrated ability to rescue ΔF508-CFTR. By interfering with CAL-mediated degradation, our best designed peptide, kCAL01, can act as a CFTR “stabilizer,” allowing ΔF508-CFTR to recycle back into the membrane. Currently, the only well-studied ways to rescue mutant CFTR function with drug-like molecules are through “potentiators” and “correctors,” which do not address the problem that ΔF508-CFTR is rapidly endocytosed and degraded at physiological temperatures. Like other CAL inhibitors, kCAL01 should work in conjunction with potentiators and correctors to create an additive effect.
kCAL01 was observed to increase ΔF508-CFTR activity by 12%. While this effect is clearly statistically significant, we also wished to assess its magnitude relative to the effect of known rescue compounds. The performance of kCAL01 was benchmarked using polarized human airway epithelial cells derived from a CF patient (stably expressing ΔF508-CFTR; CFBE-
F cells). In these cells, CFTR rescue is more challenging than in heterologous cells, but the levels of rescue observed are more likely to reflect the physiological situation. Since CFTR modulation is extremely sensitive to experimental conditions, and particularly to the type of cells used, we chose to compare the performance of kCAL01 against the corrector corr-4a. There are two reasons for this choice of comparison: (a) corr-4a is a well-established benchmark for CFTR correctors; and (b) directly comparable data are available from our previous studies. Under identical experimental conditions, corr-4a produces a 15% increase in ΔF508-CFTR levels in CFBE-
F cells. Thus, the 12% increase seen with the kCAL01 inhibitor peptide is similar to that produced by a first-generation corrector. Since corr-4a and kCAL01 have orthogonal mechanisms of action, additive rescue is an attractive treatment option. Specifically, in the long term the therapeutic impact of CAL inhibitors is likely to be enhanced by their ability to provide additive rescue with correctors, offering the prospect of combination treatment.
To design kCAL01 we developed a novel, provable, ensemble-based protein design algorithm for protein-peptide and protein-protein interactions. The validation of
by comparing its predicted binding scores to CAL peptide-array data demonstrates
's strong ability to enrich for human protein sequences that bind CAL. While the HumLib array showed that CAL binds a specific motif, it also showed (along with the ProLib array) that CAL does not bind all sequences that match the motif. In HumLib, 191 of 261 sequences that match the motif did not bind CAL. Moreover, all of the peptides synthesized for this work (kCAL01-kCAL31) match the CAL motif, but have a wide range of binding affinities. Therefore,
needs to perform the difficult task of differentiating the affinities of peptides that share the CAL motif, rather than merely separating motif from non-motif sequences. The HumLib analysis, FP analysis of top and poorly-ranked
predictions, and the ProLib analysis all show that
is able to enrich for sequences within the CAL PDZ sequence motif that have high-affinity interactions with CAL.
The experimental validation of top-ranked
sequences confirms that
prospectively predicted novel high-affinity CAL peptide inhibitors. Compared to the inhibitory constant of the natural CFTR C-terminus, the designed sequences are much stronger binders. Indeed, our approach found peptide sequences that bound more tightly than iCAL35, the best previously known hexamer sequence. Interestingly, even though iCAL35 binds to the CAL PDZ domain, it is unable to mediate significant or substantial rescue of ΔF508-CFTR in CFBE-
F cells. The designed inhibitor's improvement in binding directly translates to increased ΔF508-CFTR activity in CF-patient derived airway epithelial cells, demonstrating the value of using our computational approach to design protein
Current therapeutics known to rescue CFTR function are small molecules generally discovered through high-throughput library screens. To find CFTR stabilizers we needed to discover inhibitors that could block the CAL-CFTR PPI. Unfortunately, small molecules that inhibit PPIs are rare, and the development of such inhibitors has been very difficult due to the shallow, distributed nature of the interfaces. Therefore, we have focused on tools to design peptide inhibitors, developing and validating a new
algorithm that has identified low molecular weight, high-affinity sequences. While our previous work employed high-throughput peptide arrays to screen for inhibitors, the computational design approach can easily and accurately be expanded beyond the limits of peptide array synthesis, providing a novel avenue for identifying CF therapeutic leads with improved affinity, specificity, and proteolytic stability.
In this paper we have focused on improving peptide inhibitor affinities, but our success suggests that
can also be used to improve peptide specificity and proteolytic stability. For optimal biological efficacy, CAL inhibitors should avoid off-target effects, including interactions with other CFTR trafficking proteins, such as the NHERF family. To achieve peptide specificity,
could be run to find peptides that do not bind well to these off-target interactors, a process known as negative design. The experimentally-tested poorly-ranked
predictions all had a worse affinity for CAL than the top-predicted peptides. This suggests that
has the capability to conduct negative design for the CAL system. Also, we have shown the successful application of
negative design to other biological systems. Finally, since the efficacy of natural peptides is often limited by proteolytic stability, it could be beneficial to extend the
software to incorporate non-natural amino acids, such as D-amino acids, into the design search space. This would allow the design of compounds that inhibit CAL but cannot be degraded as readily as linear L-peptides.
scoring function uses energy terms for electrostatics, van der Waals energy, and implicit solvation.
also utilizes an approximation of conformational entropy through its ensemble-based scoring. Analysis of these components can potentially identify important interactions in the top peptide inhibitor designs. Comparing the average energy contribution for the top 30 predictions to the median for all designs, we find that all components contribute favorably to peptide binding, with van der Waals giving the largest benefit (−11.2 kcal/mol), followed by electrostatics (−10.9 kcal/mol), and finally solvation (−8.2 kcal/mol). However, even within the top 30 predictions, the dominant energetic component varies greatly (electrostatics is dominant for 12 sequences, van der Waals for 6 sequences, and solvation for 12 sequences).
Tidor and co-workers have suggested that design predictions are best when re-ranking structures using a purely electrostatic energy function. We addressed this possibility by comparing the AUC obtained from a purely electrostatic function with that obtained from our complete energy function. Using only the electrostatic term, the AUC was 0.61 (bound energy only) or 0.66 (bound minus unbound). Both values are significantly lower than the 0.84 AUC value obtained with the full function. Thus, while electrostatic terms are important to the success of the algorithm, inclusion of a more complete energetic model improves the prediction. In fact, no individual energy term outperforms the
score when classifying the peptide array data. Thus,
predicts its successful designs by accurately incorporating all three energy terms through ensemble-based scoring.
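For readers unfamiliar with the AUC comparison used above, the following minimal sketch computes a ROC AUC from scores and binder labels and compares two scoring schemes. The function name, the convention that a lower score means a stronger predicted binder, and the toy numbers are assumptions for illustration only; they are not the array data or the scoring code used in this work.

```python
def roc_auc(scores, is_binder):
    """ROC AUC as the probability that a randomly chosen binder receives a
    better (lower) score than a randomly chosen non-binder; ties count 1/2.
    """
    pos = [s for s, b in zip(scores, is_binder) if b]
    neg = [s for s, b in zip(scores, is_binder) if not b]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum((p < n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: six hypothetical peptides, three of which bind.
labels = [True, True, False, True, False, False]
full_score = [-9.0, -8.0, -7.0, -3.0, -2.0, -1.0]  # hypothetical full energy function
elec_only = [-5.0, -1.0, -4.0, -3.0, -2.0, -6.0]   # hypothetical electrostatics-only score

auc_full = roc_auc(full_score, labels)
auc_elec = roc_auc(elec_only, labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of binders from non-binders, so a drop from the full function to a single term indicates that the single term ranks binders less reliably.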
Many of the binding sequences identified by
contain a positively charged residue (R/K) at position −1. Similarly, in the HumLib array, about 26% of the sequences that we consider to be binders contain a positively charged residue at position −1, and in the ProLib array 53% of the binders contain an R/K at position −1. Based on our previous NMR analysis,
Arg can form a salt bridge with Glu309 on the periphery of the CAL binding site, an electrostatic contribution that could theoretically dominate the ROC curve analysis. However, because 74% of the top binding sequences in the HumLib array do not contain the position −1 R/K, the strong AUC values suggest that the algorithm must also correctly predict these sequences. To test this assertion more forcefully, we removed all of the sequences with a positively charged residue at position −1 and then recalculated the ROC curve. This results in an AUC of 0.82, almost identical to the value of 0.84 obtained with all sequences. Thus, consistent with the significant contributions of each term in the energy function, the ROC behavior of the algorithm is not dependent on the presence or absence of a positively charged residue at position −1.
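The filtering step in this control can be sketched as follows. The record layout, function names, and peptide sequences are hypothetical, and the sketch assumes the usual PDZ ligand numbering in which position 0 is the C-terminal residue, so that position −1 is the second-to-last residue of the peptide string.

```python
BASIC_RESIDUES = {"R", "K"}

def lacks_basic_at_minus1(peptide):
    """True if the residue at position -1 is neither Arg nor Lys.

    Assumption: the string is written N- to C-terminus and position 0 is the
    C-terminal residue, so position -1 is peptide[-2].
    """
    if len(peptide) < 2:
        raise ValueError("peptide must have at least two residues")
    return peptide[-2] not in BASIC_RESIDUES

def filter_array(records):
    """Keep (sequence, score, is_binder) records without R/K at position -1."""
    return [rec for rec in records if lacks_basic_at_minus1(rec[0])]

# Toy records with made-up sequences and scores (not array data).
toy_records = [
    ("AAAARV", -9.0, True),   # Arg at position -1: removed
    ("AAAAKL", -8.0, True),   # Lys at position -1: removed
    ("AAAASV", -7.5, True),   # kept
    ("AAAATL", -2.0, False),  # kept
]

retained = filter_array(toy_records)
```

The retained records would then be passed to whatever ROC routine was used for the full array, so that the two AUC values are computed identically and differ only in the sequences included.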
A small number of
values were used to train the new
algorithm to properly scale energy terms for protein-peptide interactions, which can now be used for additional protein-peptide interaction designs. Besides the training, the only system-specific data used were the input starting structure and the CAL sequence motif. The sequence motif was used as an optional filter to expedite the search, but should not affect the ability of
to find high-affinity inhibitors. As seen from the HumLib peptide array comparison,
yields a higher ROC AUC when considering the entire array, which implies that
is better at distinguishing CAL peptide inhibitors from the entire sequence space than from within only the known sequence motif. This suggests
will be able to find new high-affinity inhibitors if the search space is expanded.
Beyond its utility in the design of enhanced CAL inhibitors, the
algorithm represents a general framework for analyzing PDZ domains and other protein-protein interfaces. PDZ domains are among the most common interaction domains in the human genome. Using traditional biochemical approaches, the characterization of the binding affinity of candidate partners, as well as the identification of high-affinity reporters and inhibitors, often requires the individual synthesis of dozens of peptides, many of which fail to interact robustly. As shown for CAL,
offers a facile mechanism to predict affinities and to design novel ligand sequences using only an initial input structure. Furthermore, the proofs and algorithm presented here provide a general approach for modeling peptide-mediated PPIs that regulate a wide variety of critical physiological processes.
The source code of our program is freely available and is distributed as open source under the GNU Lesser General Public License (Gnu, 2002). It can be downloaded at http://www.cs.duke.edu/donaldlab/osprey.php