Engineered binding proteins derived from non-antibody scaffolds constitute an increasingly prominent class of reagents in both research and therapeutic applications. The growing number of crystal structures of these “alternative” scaffold-based binding proteins in complex with their targets illustrates the mechanisms of molecular recognition that are common among these systems and those unique to each. This information is useful for critically assessing, improving and expanding engineering strategies. Furthermore, the structural features of these synthetic proteins, produced under tightly controlled directed evolution, deepen our understanding of the underlying principles governing molecular recognition.
Advances in technologies and in our understanding of molecular recognition mechanisms have enabled us to create novel molecular recognition sites on otherwise inert proteins that serve as molecular scaffolds [1, 2]. Efforts over the past ~15 years have established multiple effective platforms for generating binding proteins. Such engineered binding proteins may serve as alternatives to antibodies in medical and industrial applications and have potential advantages such as superior biophysical properties, ease of production and efficient tissue penetration.
Alternative scaffold-based binding proteins are typically generated through directed evolution. In this type of approach, large combinatorial libraries are created in which the amino acid sequence of a contiguous patch on the surface of a starting scaffold is extensively diversified. Functional binding proteins are isolated from these libraries using molecular display technologies such as phage display, yeast display and ribosome/mRNA display. Because the creation of a high-affinity binding surface often involves the mutation of ≥10–15 residues, the total number of sequences that could be encoded vastly exceeds the number that can be experimentally sampled. Thus, appropriate choices of which positions to diversify and to which amino acid types are crucial to achieving success. Three-dimensional (3D) structural data are critical in making these decisions. Clearly, a structure of the starting scaffold is essential for designing a combinatorial library, but subsequent structural characterization of engineered binding proteins is equally important. Such structures reveal how the engineered molecules actually achieve their function, validating or correcting hypotheses and offering insights into how library designs might be improved. Thus, combinatorial library design and structural characterization ideally form a positive feedback loop in which library designs are evaluated and incrementally improved based on structural data (Fig. 1).
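The gap between the number of encodable sequences and the number that can be sampled can be made concrete with a short calculation. The sketch below is illustrative only: the library size of 10^10 clones is an assumed, hypothetical figure rather than a value from this review, and the function names are invented for the example.

```python
# Sketch: size of the theoretical sequence space versus an assumed library size.
# ASSUMPTION: 1e10 clones is a hypothetical, illustrative library size;
# it is not a figure taken from the text.


def sequence_space(n_positions: int, n_types: int = 20) -> int:
    """Number of distinct sequences when n_positions each allow n_types residues."""
    return n_types ** n_positions


def max_fraction_sampled(n_positions: int, library_size: float, n_types: int = 20) -> float:
    """Upper bound on the fraction of sequence space a library can cover."""
    return min(1.0, library_size / sequence_space(n_positions, n_types))


if __name__ == "__main__":
    assumed_library_size = 1e10  # hypothetical
    for n in (10, 15):
        space = sequence_space(n)
        frac = max_fraction_sampled(n, assumed_library_size)
        print(f"{n} positions: {space:.2e} sequences, at most {frac:.1e} sampled")
```

Lowering `n_types` (restricting each position to fewer amino acid types) shrinks the space, which is one reason the choice of amino acid types matters as much as the choice of positions.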
Of the numerous alternative scaffold systems that have been explored, there are now multiple 3D structures of binder/target complexes available from four: monobodies (derived from the tenth fibronectin type III (FN3) domain of human fibronectin), affibodies (derived from the immunoglobulin-binding protein A), DARPins (based on ankyrin repeat modules) and anticalins (derived from the lipocalins bilin-binding protein and human lipocalin 2) (Table 1). The increasing number of structures available from these systems, nearly 30 now in the Protein Data Bank (PDB), may allow us to extract meaningful information that goes beyond isolated “anecdotes”. Here we review these structures and discuss the insights they provide for protein engineering. We limit our discussion to scaffold systems with multiple available structures so that trends and tendencies may be assessed. In addition, we compare these engineered interfaces to natural interfaces, and discuss what these comparisons reveal about mechanisms of molecular recognition.
Combinatorial libraries in alternative scaffold systems are designed with a particular mode of interaction with a target in mind and it is assumed that the diversified surface will mediate this interaction. Structural characterization tests this assumption and occasionally reveals unanticipated modes of interaction.
The diversified surface in DARPins comprises positions on a series of α-helices and well-structured loops that have been chosen because they often mediate interactions in natural ankyrin repeat proteins (Fig. 2A). In all DARPin structures, this surface is used as envisioned to bind to targets. The structures of DARPin/maltose binding protein (MBP) and DARPin/BppL complexes provide examples (Fig. 2A) [4, 5]. One measure of how well a structure agrees with a library design is the percentage of diversified positions that actually contact the target molecule. By this measure, among structurally characterized scaffolds, DARPin/target complexes most closely match their library design, with 75% of diversified positions contacting target on average. Diversified positions in DARPins contribute an average of 68% of DARPin buried surface area and 54% of all target-contacting residues. Thus, although diversified positions typically comprise the majority of DARPin binding sites, undiversified regions, for example several positions near the center of a helix of the ankyrin repeat units, often contribute as well (Fig. 2B). Conversely, there are other positions in the loops of the DARPin scaffold that are diversified in the library but do not contact target in any available crystal structure (Fig. 2B). These structural data provide useful guidelines for further improving the library designs.
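The two position-based measures described above can be sketched as simple set arithmetic. This is a minimal illustration, not the analysis pipeline used for this review: the function name and the residue numbers are hypothetical, and the buried-surface-area measure is omitted because it requires atomic coordinates.

```python
# Sketch: how closely a binder/target complex matches its library design.
# ASSUMPTION: the residue numbers below are hypothetical placeholders,
# not positions from any real scaffold or structure.


def design_agreement(diversified: set, contacting: set) -> dict:
    """Return two percentages relating diversified and target-contacting positions."""
    if not diversified or not contacting:
        raise ValueError("both position sets must be non-empty")
    shared = diversified & contacting  # diversified positions that touch the target
    return {
        # share of the library's diversified positions actually used for binding
        "pct_diversified_in_contact": 100.0 * len(shared) / len(diversified),
        # share of the binding site that comes from diversified positions
        "pct_contacts_from_diversified": 100.0 * len(shared) / len(contacting),
    }


if __name__ == "__main__":
    diversified = {33, 35, 36, 38, 46, 47, 66, 68}         # hypothetical
    contacting = {33, 35, 36, 38, 46, 47, 41, 42, 74, 75}  # hypothetical
    for name, value in design_agreement(diversified, contacting).items():
        print(f"{name}: {value:.0f}%")
```

Reporting both numbers matters because they answer different questions: the first shows how much of the diversified surface is used, while the second shows how much of the interface lies outside the designed region.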
Anticalin libraries diversify positions in loop and sheet regions that line a pocket in the basket-like lipocalin scaffold (Fig. 2A). Natural lipocalins typically use this pocket to bind small molecules. Anticalins use this surface to recognize their targets in all available structures and are thus also generally consistent with their library designs. However, the small number of available structures makes it impossible to assess general trends. In the anticalin/CTLA-4 complex, 63% of diversified positions contact the target. Additionally, 63% of binder buried surface and 41% of all target-contacting residues are contributed by diversified positions. Alternative library designs for small molecule targets have also been constructed, and the resulting anticalins utilized the expected basket-like pocket for binding (Fig. 2A).
Affibody libraries essentially diversify an entire face of the helical protein A scaffold (Fig. 2A). Affibodies also generally recognize their targets as envisioned. On average, 90% of diversified positions contact target in affibody structures. These diversified positions contribute 80% of affibody buried surface and 62% of all target-contacting residues. Contributions made by non-diversified positions are all from residues adjacent to diversified positions. Although these numbers suggest that affibody structures match their library design even more closely than DARPin structures do, the structure of an affibody bound to Aβ(1–40) peptide provides a dramatic exception (Fig. 2A). This affibody unexpectedly formed a disulfide-linked dimer through an engineered cysteine residue, and mutations introduced into a normally helical region of the scaffold altered the structure to a β-sheet conformation. This region bound the Aβ peptide, itself in a hairpin conformation, by forming an intermolecular β-sheet (Fig. 2A). Relatedly, some affibodies are partially or completely unstructured in the unbound state [7, 8]. Thus, although the affibody system has been effective, its structural integrity often seems to be compromised, possibly because its helical bundle architecture, unlike β-sheets, lacks strong linkage across secondary structure elements. This low structural stability makes it difficult to predict the mode of recognition for individual affibodies. These examples highlight the importance of structural characterization in understanding how engineered proteins achieve their function.
In monobodies, three surface-exposed loops at one end of the molecule are typically used to construct a binding site with the intention of mimicking antibody-like, loop-mediated interaction (Fig. 2A). Several monobodies indeed interact with their targets using these three loops, as anticipated (Fig. 2A, MBP and hSUMO1 complexes), and diversified positions contribute an average of 80% of monobody surface area burial and 68% of all contacting residues [10–12]. However, among the four systems, monobody structures show the greatest departure from their envisioned binding mode: on average, only 51% of diversified positions contact target in monobody structures. Unexpectedly, most monobodies use a single diversified loop and positions in the undiversified β-sheet regions to form a binding site (Fig. 2A, SH2 domain complex and Fig. 2B). In a recent study, a monobody/yeast SUMO structure exhibiting this interaction mode was used as a guide to create a SUMO-targeted library that diversified the single loop and contacting positions in a β-sheet. This library generated binders to other SUMO family members where a conventional monobody library did not. Mutations in the β-sheet region were crucial to this success. Building on these observations, a new library design was recently reported in which this loop and a loop located on the opposite end of the molecule were diversified along with one face of the β-sheet that frequently contributed to target recognition in existing monobody structures (Fig. 2A). This library was as effective as a conventional loop-based library in producing high-affinity monobodies to three distinct targets, and more effective in some cases. The crystal structure of a monobody isolated from this library validated this approach (Fig. 2A). This example highlights the value of structural analyses in guiding the development of new and improved library designs.
High-affinity interactions require shape and chemical complementarity. This complementarity is achieved using a variety of secondary and tertiary structural elements in natural protein-protein interactions. The structural features of binding sites in the four alternative scaffolds discussed here are similarly diverse. The success of all of these systems in producing high-affinity binders to diverse targets suggests that scaffold architecture matters little as long as diverse amino acid sequences can be presented without compromising overall structure. However, these crystal structures show that each scaffold exhibits its own distinct mode(s) of recognition that may reflect its underlying architecture.
DARPins tend to recognize convex surfaces in their targets, complementary to their concave binding sites (Fig. 2A). Similarly, anticalins with their basket-like architecture tend to “cradle” their targets (Fig. 2A). Affibodies have a flat binding site architecture and tend to recognize similarly flat surfaces in their targets (Fig. 2A). As discussed, monobodies use two distinct binding surfaces with different topographies. The monobody/MBP complexes exemplify cleft recognition using the protruding surface formed by three diversified loops (Fig. 2A) [10, 11]. This concave epitope is distinct from a convex epitope recognized by an MBP-binding DARPin (Fig. 2A). Camelid single-domain antibodies, which are similar to monobodies in structure, preferentially bind to clefts, supporting a correlation between convex binding site topography and cleft binding. The alternative recognition mode of monobodies utilizes a concave binding site that targets convex surfaces (Fig. 2A). A very recent report published after the completion of this paper describes the structures of two FN3-based binders in complex with their targets. In one, a loop-driven binding surface is used to recognize a cleft and in the other, the concave alternative surface is used to recognize a convex epitope, again illustrating two distinct binding modes in monobodies. Notably, the use of this alternative binding surface appears to dominate in cases where the target lacks a prominent cleft. Taken together, these data indicate that the topography of a scaffold’s binding site is closely correlated with the type of epitopes that it recognizes with high affinity. This observation in turn suggests that one could improve the success rate of binding protein selection through judicious choice of scaffold or library design to complement the topography of an intended epitope.
Because of the limited coverage of sequence space in combinatorial libraries, proper choices of not only positions to be diversified but also amino acid types to be included substantially impact the odds of generating high-affinity binders. DARPin, affibody and anticalin libraries have all used largely unbiased sets of amino acids [4, 17–19] (Fig. 3A). In contrast, recent monobody libraries have utilized highly biased amino acid sets [10, 13], inspired by the enrichment of certain amino acids, particularly tyrosine, in the antigen-binding site of natural antibodies and also in other natural protein-protein interfaces [20, 21] (Fig. 3A).
In monobodies, aromatic amino acids are, not surprisingly, heavily enriched in interfaces. Aromatic residues occur, on average, at 37% of target-contacting positions (41% of contacting, diversified positions) and contribute 54% of monobody surface area burial (Fig. 3A). These figures are in line with highly biased library designs (Fig. 3A). Notably, the overall usage of amino acid types at target-contacting positions matches well with the typical monobody library design, suggesting that this composition is well-suited for recognizing diverse protein targets in this system.
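The position-based composition figures quoted above amount to counting amino acid types at target-contacting positions. The following sketch shows one way such a tally could be computed. It assumes that "aromatic" means Phe, Trp and Tyr (the review does not spell out the set), and the input string of one-letter codes is a hypothetical example, not data from any structure.

```python
from collections import Counter

# ASSUMPTION: aromatic residues are taken to be Phe, Trp and Tyr (F, W, Y);
# histidine is not counted here.
AROMATIC = frozenset("FWY")


def composition(residues: str) -> dict:
    """Fractional usage of each amino acid type in a string of one-letter codes."""
    if not residues:
        raise ValueError("residue string must be non-empty")
    counts = Counter(residues.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def aromatic_fraction(residues: str) -> float:
    """Fraction of the given positions occupied by aromatic residues."""
    comp = composition(residues)
    return sum(frac for aa, frac in comp.items() if aa in AROMATIC)


if __name__ == "__main__":
    contacting = "YYSGWSYDLR"  # hypothetical target-contacting residues
    print(f"aromatic fraction: {aromatic_fraction(contacting):.2f}")
```

Comparing the output of `composition` for contacting positions against the composition encoded by a library design would give a rough picture of enrichment or depletion for each amino acid type.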
Although not strongly emphasized in DARPin libraries, aromatic amino acids are enriched in DARPin binding surfaces, on average, occurring at 18% of target contacting positions (27% of contacting, diversified positions) and contributing 33% of DARPin surface area burial (Fig. 3A). When considering all target contacting positions (including non-diversified positions) DARPins also show an enrichment of aspartate and leucine (Fig. 3A). This enrichment is likely a byproduct of leucine and aspartate residues located in the non-diversified segments connecting the diversified positions in the helices and loops that are consistently in the binding interface (Fig. 2A).
There are only three affibody structures available that reflect a “typical” mode of binding, so it is difficult to draw a general conclusion for this system. However, the enrichment of aromatics is less prominent in affibodies and other amino acid types such as arginine and valine are more strongly enriched (Fig. 3A), potentially indicating different roles of amino acids in the entirely helical context. Similarly, four of the five available anticalin structures are complexes with small molecules, which makes this type of analysis less meaningful.
Structural data reveal a consistent enrichment of aromatic amino acids across different scaffold types and even in the context of largely unbiased libraries. The unique versatility of tyrosine, in particular, in forming intermolecular contacts has been attributed to the many types of interactions that the tyrosine side chain can make (aromatic stacking, hydrogen bonding, hydrophobic interactions, cation-π interactions) [21, 22]. Interestingly, glutamine and histidine are consistently depleted at diversified positions in three scaffolds (Fig. 3A). Together, these data strongly suggest that highly biased amino acid compositions may be a generally effective strategy in library design.
Relationships between structure and affinity have been extensively examined in the context of natural interfaces. However, simple structural parameters such as interface size, packing, and hydrophobicity have been shown to be poor predictors of affinity [20, 23]. Although all these parameters surely influence affinity, their relative contributions appear to be highly context-dependent, minimizing their individual predictive power.
Because engineered binders are produced in the context of an invariant scaffold, one might expect that the structural determinants of affinity are more consistent and thus more evident. However, obvious structural determinants of affinity are also difficult to identify in alternative scaffold binders. Neither the interface size nor the shape complementarity (Sc) value, which reflects the efficiency of interface packing, is a good predictor of affinity (Fig. 3B and 3C). One striking example is a DARPin with a 22 nM Kd that exhibits an Sc value of 0.48, well below that of many other protein-protein interactions (e.g. 0.67 for antibody-antigen interactions) and the average for alternative scaffold structures (0.70).
Structural data inform us about how alternative scaffolds can be best used, how library designs might be improved, and what factors govern molecular recognition in these systems. High-affinity binders can be isolated from well-established scaffold systems, but the interface topography of individual systems dictates their epitope preferences. Structural analyses of these systems have revealed discrepancies between designed and actual usages of positions and amino acid composition for target recognition. Such knowledge will accelerate the evolution of library designs and consequently the generation of high-performance binding proteins. Systematic and in-depth analyses of the growing structural dataset of engineered binding protein/target complexes may contribute to an improved understanding of molecular recognition mechanisms.
We thank Drs. John Wojcik and Akiko Koide for critical reading of the manuscript. This work was supported by the NIH grants R01-GM072688 and R01-GM090324.