Combinatorial libraries built with severely restricted chemical diversity have yielded highly functional synthetic binding proteins. Structural analyses of these minimalist binding sites have revealed the dominant role of large tyrosine residues for mediating molecular contacts and of small serine/glycine residues for providing space and flexibility. The concept of using limited residue types to construct optimized binding proteins mirrors findings in the field of small molecule drug development, where it has been proposed that most drugs are built from a limited set of side chains presented by diverse frameworks. The physicochemical properties of tyrosine make it the amino acid that is most effective for mediating molecular recognition, and protein engineers have taken advantage of these characteristics to build tyrosine-rich protein binding sites that outperform natural proteins in terms of affinity and specificity. Knowledge from preceding studies can be used to improve current designs, and thus, synthetic protein libraries will continue to evolve and improve. In the near future, it seems likely that synthetic binding proteins will supersede natural antibodies for most purposes, and moreover, synthetic proteins will enable many new applications beyond the scope of natural proteins.
Protein-protein interactions are fundamental to biology 1-3. Consequently, there is great interest in understanding factors governing protein-protein interactions, as this knowledge would enable accurate prediction of natural binding partners and would also aid the efficient design of engineered proteins and chemical compounds for capturing proteins and perturbing their cellular functions.
Numerous studies of natural protein systems have served to advance our understanding of the fundamental principles governing protein-protein interactions. Traditional investigations have proceeded along similar lines and have relied on the following major steps: (i) biophysical characterization of the interaction in terms of affinity and specificity, (ii) determination of the three-dimensional structures of the protein-protein complex, and ideally, of each component alone, (iii) comparison of homologous sequences, and (iv) systematic mutagenesis to dissect the energetics at the interface. This approach is exemplified by the extensive studies of the interactions between human growth hormone and its receptor 4-6 and between β-lactamase and its inhibitor 7, 8. These studies have revealed details that are specific for each system, but more importantly, they have also contributed to a general understanding of the importance of shape and electrostatic complementarity in natural protein-protein interactions.
Most natural proteins have been shaped under biological selection pressure over the course of evolution. However, the number of possible amino acid combinations is vast even for a relatively small interaction interface, and even over the long time spans of natural evolution, nature has sampled only a small fraction of the theoretical sequence space. Furthermore, natural interfaces have been optimized for specialized biological functions, and nature does not necessarily maximize either affinity or specificity. Thus, it is unlikely that natural protein complexes represent optimal solutions when considering protein function from a purely biophysical perspective. This view is supported by numerous studies in which the affinities and specificities of natural protein-protein interactions have been improved through protein engineering 9-14. Considering the constraints and complexities inherent to natural systems, we have come to believe that a full understanding of the chemistry and physics of protein-protein interactions necessitates the establishment of simplified systems that are not constrained by biology.
Advances in the understanding of protein function and in protein engineering technologies have enabled the generation of “synthetic” binding proteins with binding sites built by combinatorial design rather than from natural diversity 15-18. Synthetic binding proteins are generated by selection from combinatorial libraries using molecular display technologies. In synthetic libraries, the composition of amino acid diversity and the locations at which diversity is introduced are precisely defined. To a first approximation, functional molecules derived from such libraries using in vitro methods are selected according to their functional properties devoid of ill-defined biological biases. Thus, the analysis of synthetic proteins is likely to provide a less biased view of the mechanisms governing molecular recognition and may also lead to a better understanding of natural proteins.
In recent years, minimalist synthetic proteins have emerged as a particularly intriguing product of protein engineering investigations aimed at defining the minimal requirements for specific protein-protein interactions 18. By severely restricting the amino acid diversity used in libraries, these investigations aim to uncover principles that are obscured in natural systems due to high levels of “evolutionary noise”. Outcomes reviewed in this article prove that this approach is indeed highly effective. Of particular importance has been the finding that tyrosine residues are exceptionally versatile for mediating contacts at interfaces, which helps to rationalize trends found in natural interfaces and provides guidelines for engineering highly functional synthetic binding proteins.
The engineering of synthetic binding proteins has been enabled by technologies that allow for the generation and testing of large libraries of combinatorially mutated binding sites built on defined frameworks. Phage display has been the most commonly used technology, but other methods such as yeast, ribosome and mRNA display have been developed as viable alternatives 17, 19, 20. As exemplified by phage display (Figure 1), molecular display technologies establish unambiguous physical linkage between the phenotype of the protein and the genotype of the encoding DNA. Mutations introduced into the DNA are translated into diverse libraries of proteins, which can be used in binding selections to isolate members with affinity and specificity of interest. At the end of the selection process, the sequences of binding proteins can be deduced by sequencing the encoding DNA. These technologies provide an in vitro version of classic Darwinian evolution with precise control over the library design and the selection process.
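The enrichment achieved over successive rounds of a binding selection can be illustrated with a toy model. The sketch below is not taken from the studies reviewed here: the function name `enrich` and the per-round retention probabilities are hypothetical, and the model assumes a single population of binders, a single population of non-binders, and constant retention probabilities from round to round.

```python
def enrich(binder_fraction, p_binder, p_background, rounds):
    """Toy model of a display selection (illustrative assumptions only).

    binder_fraction: starting fraction of library members that bind the target.
    p_binder:        assumed probability that a binder is retained in one round.
    p_background:    assumed probability that a non-binder is retained in one round.
    rounds:          number of selection rounds.

    Returns a list with the binder fraction after each round.
    """
    if not 0.0 < binder_fraction < 1.0:
        raise ValueError("binder_fraction must lie strictly between 0 and 1")
    if not (0.0 < p_binder <= 1.0 and 0.0 < p_background <= 1.0):
        raise ValueError("retention probabilities must lie in (0, 1]")

    history = []
    fraction = binder_fraction
    for _ in range(rounds):
        kept_binders = fraction * p_binder
        kept_background = (1.0 - fraction) * p_background
        # Renormalize: amplification between rounds restores the population size
        # but is assumed not to change the binder-to-background ratio.
        fraction = kept_binders / (kept_binders + kept_background)
        history.append(fraction)
    return history


if __name__ == "__main__":
    # Hypothetical numbers: one binder per million clones, and binders retained
    # 1000-fold more efficiently than background in each round.
    for round_number, fraction in enumerate(enrich(1e-6, 0.1, 1e-4, 3), start=1):
        print(f"round {round_number}: binder fraction = {fraction:.6f}")
```

Under these assumed parameters the binder-to-background odds grow by the same factor in every round, which is why a few rounds are enough to move from rare binders to a population dominated by them. Real selections deviate from this picture through expression biases, differences in growth rate and a spectrum of affinities.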
The most common scaffolds for synthetic binding proteins have been antibody fragments, including heterodimeric antigen-binding fragments (Fabs) 21-25, monomeric single-chain variable fragments (scFvs) 26-28, and heavy-chain variable (VH) domains 29. However, other proteins have been adapted as scaffolds to produce binding proteins with affinities comparable to those of antibodies (Figure 2) 15, 30, 31. Notably, structural analysis has shown that different scaffolds impose different constraints on the interface shape. The binding sites of Fabs and scFvs are typically flat, but cavities, clefts or protrusions can be created by different combinations of frameworks and hypervariable loops 32, 33. Single-domain β-sandwich proteins, such as VH and fibronectin type III (FN3) domains, form a convex surface 34, 35, while ankyrin repeat proteins provide a slightly concave shape 36-38 and the lipocalin fold presents a deep cavity that can accommodate small molecules 39. Protein engineers can take advantage of this diverse array of scaffolds to design synthetic binding proteins with functional properties tailored for specific demands.
Synthetic proteins produced using molecular display technologies are still made of natural amino acids, and thus, they obey the same rules that govern natural proteins. Consequently, the knowledge gained from analyses of natural protein-ligand interactions has provided valuable guidelines for designing strategies to generate synthetic binding proteins.
Structural studies of many protein-ligand complexes have shown that proteins use all types of structural motifs (i.e. loops, turns, helices and sheets) to interact with other molecules. The underlying principle is that proteins accomplish specific recognition of cognate ligands by achieving shape and electrostatic complementarity. In the case of tight protein-protein interactions, the interface is contiguous and quite large with an average buried surface of ~1,600 Å² on both sides 40. In general, interfaces are also closely packed 41, and systematic mutagenesis studies of high-affinity interfaces have revealed a characteristic spatial organization for the contacting residues. Binding interfaces tend to have “hot spots” that are intolerant to substitutions, but outside the hot spots the interfaces are remarkably plastic to amino acid substitutions 8, 42. A significant bias in amino acid composition was found amongst interface residues 40, and hot spots are particularly enriched in tyrosine, tryptophan and arginine 43.
The features of natural proteins have been formed by evolution, and thus it is difficult to define the minimum requirements for effective interactions. Recent systematic mutagenesis analyses have shown that natural interfaces can be further simplified. An “alanine shaving” analysis of human growth hormone showed that over half of the binding site residues could be simultaneously changed to alanine without significantly affecting affinity for its receptor, so long as most of the hot spot residues were maintained 44. Complementary to this study, affinity maturation of a single-domain antibody with a small binding site resulted in hot spot residues constituting 76% of the binding surface and complete elimination of neutral contacts 12. These “semi-synthetic” studies strongly suggest that protein binding sites can be minimized to an essential cluster of hot spot residues and that natural protein-protein interfaces are larger and more complex than necessitated by biophysical principles.
Examination of natural antigen-binding sites has shown that tyrosine is particularly abundant, as it accounts for ~10% of the total composition of the complementarity-determining region (CDR) loops and ~25% of the antigen contacts in functional antibodies 45. Moreover, analysis of naïve diversity in the third heavy-chain CDR (CDR-H3), which dominates most antigen-binding sites, predicts an even more extreme bias prior to antigen recognition and affinity maturation, as ~40% of the sequence is predicted to be tyrosine and ~30% is predicted to be small amino acids (serine, glycine, alanine and threonine) 46. While these biases may be the coincidental result of genetic biases in antibody genes, we and others believe that tyrosine, together with small residues that provide conformational freedom and space, is uniquely suited for mediating favorable contacts for antigen recognition 22, 45-47. Studies with minimalist synthetic antibodies have proven this hypothesis and have provided a path towards designed antibodies with functions beyond those of natural antibodies.
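Composition biases of the kind quoted above are obtained by counting residue types in loop sequences. The following is a minimal sketch of such a calculation; the function names and the example sequence are hypothetical and are not drawn from the cited analyses, and the set of small residues follows the definition used in the text (serine, glycine, alanine and threonine).

```python
from collections import Counter

# Small residues as defined in the text: serine, glycine, alanine, threonine.
SMALL_RESIDUES = frozenset("SGAT")


def residue_fractions(sequence):
    """Return the fraction of each one-letter amino acid code in a sequence."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return {residue: n / len(sequence) for residue, n in counts.items()}


def tyrosine_and_small_fractions(sequence):
    """Return (fraction of tyrosine, fraction of small residues)."""
    fractions = residue_fractions(sequence)
    tyrosine = fractions.get("Y", 0.0)
    small = sum(fractions.get(residue, 0.0) for residue in SMALL_RESIDUES)
    return tyrosine, small


if __name__ == "__main__":
    # Hypothetical loop sequence, invented purely for illustration.
    example_loop = "YSYGYSAYDY"
    tyr, small = tyrosine_and_small_fractions(example_loop)
    print(f"tyrosine: {tyr:.0%}, small residues: {small:.0%}")
```

Applied to a set of aligned CDR sequences rather than a single invented loop, the same counting would yield per-loop or per-position composition profiles.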
In the first direct demonstration of the special role of tyrosine in naïve antigen recognition, a synthetic library was constructed using a single Fab framework with CDR diversity restricted to only four amino acid types (tyrosine, serine, alanine and aspartate) 22. Despite the simple design, the library yielded high affinity antibodies against human vascular endothelial growth factor (VEGF). Although the four amino acid types were used equally at the primary sequence level, structural analysis of two distinct antibodies revealed that tyrosine plays a dominant role in the binding interfaces, as the bulky side chains contributed ~50% of the intermolecular contacts and ~70% of the buried surface area on the antibody side of the interfaces (Figure 3A).
In a subsequent study, diversity was simplified even further to a binary combination of tyrosine and serine, and remarkably, the resulting libraries remained highly functional and yielded specific antibodies against a number of protein antigens 48. The structure of a binary Fab in complex with the human death receptor DR5 revealed a chemically homogeneous antigen-binding site dominated by tyrosine. In particular, the long CDR-H3 loop, which contributed ~40% of the antibody surface buried upon antigen binding, formed a “biphasic” helix with tyrosine and serine residues clustered on opposite faces and the tyrosine face in contact with the antigen (Figure 3B). Consequently, all of the buried surface area on CDR-H3 was contributed by tyrosine residues, which constitute more than 50% of the buried surface area overall. On the antigen side of the interface, there was no corresponding bias in chemical composition, showing that tyrosine is able to mediate productive binding interactions with a diverse array of amino acid types. Furthermore, the tyrosine-rich antigen-binding sites were highly specific for their cognate antigens and showed no evidence of “sticky” behavior, as evidenced by excellent performance in cell-based assays 48, 49.
Binary tyrosine/serine libraries provide an ideal background against which the impact of additional diversity can be tested in a systematic manner. In particular, the role of additional chemical diversity can be precisely gauged, as exemplified by a study that explored the intrinsic contributions of glycine and arginine to molecular recognition 50. The two amino acids were added to the tyrosine/serine background and glycine was enriched in functional antibodies, suggesting that the flexibility and expanded range of backbone conformations afforded by this small residue impacts favorably on molecular recognition. In contrast, arginine was depleted in functional antibodies, and somewhat surprisingly, high content of this positively charged residue was correlated with increased levels of non-specific binding. Consequently, it was concluded that glycine is a useful addition to naïve libraries while arginine is generally detrimental to naïve interactions. In the future, analogous studies can be designed to explore the contributions of other amino acids to the function of antibodies and other binding proteins.
In a practical application of this approach, additional diversity was added to the tyrosine/serine background to generate a simple yet highly functional antibody library that has proven to be a rich source of high-affinity Fabs. The so-called “library D” was constructed to be heavily biased in favor of tyrosine and serine, but additional chemical diversity was added to the CDR-H3 loops, which were also varied in length 51. In addition, limited diversity was introduced at buried residues that may influence the conformations of the CDR loops, and length diversity was introduced into the third light-chain CDR (CDR-L3). Library D was incorporated into a high-throughput pipeline for antibody generation and yielded highly functional antibodies with affinities in the single-digit nanomolar range against a panel of 14 antigens. The structure of a high affinity Fab in complex with VEGF again showed the dominance of tyrosine for mediating antigen recognition, but it also revealed that a glycine residue that does not contact the antigen is nonetheless required for the CDR-H3 loop to attain a conformation that enables high affinity binding (Figure 3C).
In subsequent studies, library D has been applied to difficult molecular recognition tasks that have pushed synthetic antibody technology into applications that are beyond the scope of typical natural antibodies. In one case, Fabs were generated to specifically recognize different cross-linked forms of ubiquitin with exquisite specificity (Figure 4A) 52, and in another, Fabs were designed to discriminate between the non-active and active conformations of caspase-1 53. Library D has also been effective against nucleic acids, which have been recalcitrant to natural antibody repertoires, and it has yielded a highly specific Fab that has been used to solve the first structure of an antibody in complex with structured RNA (Figure 4B) 54. In an exciting recent development, Fabs from library D have been used to obtain the structure of the full-length KcsA potassium channel, providing the first view of this integral membrane protein in its native form (Figure 4C) 55. Taken together, these results show that minimalist synthetic antibodies can be applied to many challenging tasks that promise to expand significantly our capacity to address important questions about mechanisms of molecular recognition in biological systems.
The general effectiveness of tyrosine/serine binary interfaces has been demonstrated in non-antibody scaffolds using the small FN3 domain, which is similar in size to a single immunoglobulin domain (Figure 2B) 35. Synthetic FN3 binding proteins that recognize targets with affinities in the low-mid nanomolar range were derived using only tyrosine/serine diversity at as few as 16 positions in just two loops. These results set an exceedingly low threshold of chemical complexity for functional interfaces.
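To put this threshold in perspective, the theoretical diversity of a library with k amino acid types at n randomized positions is k^n. The short calculation below is a sketch that compares a binary code with the full set of 20 genetically encoded amino acids at 16 positions; it assumes fixed loop lengths and equal representation of every allowed residue, and the function name is hypothetical.

```python
def theoretical_diversity(n_types, n_positions):
    """Number of distinct sequences with n_types residue choices at n_positions sites."""
    if n_types < 1 or n_positions < 0:
        raise ValueError("need at least one residue type and a non-negative position count")
    return n_types ** n_positions


if __name__ == "__main__":
    positions = 16
    binary = theoretical_diversity(2, positions)   # tyrosine/serine code
    full = theoretical_diversity(20, positions)    # all 20 encoded amino acids
    print(f"binary code, {positions} positions: {binary:,} sequences")
    print(f"20 amino acids, {positions} positions: {full:.3e} sequences")
    print(f"ratio: {full // binary:.1e}")
```

A binary code at 16 positions spans 65,536 sequences, a space that a single display library can cover exhaustively, whereas the 20-letter alphabet at the same positions spans roughly 6.6 × 10^20 sequences, far beyond what any physical library can sample.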
As with synthetic antibodies, an expansion of the tyrosine/serine code was also found to be effective in the FN3 scaffold. Binders were identified from a combinatorial library that was highly enriched for tyrosine and serine but also contained seven additional amino acid types 56. Crystal structures of a model antigen, maltose binding protein (MBP), were solved in complex with an FN3 molecule from the binary tyrosine/serine library or with one from the library with expanded diversity, providing a unique opportunity to directly investigate the role of chemical diversity in a protein-protein interface (Figure 5). In both complexes, tyrosine dominated the contacts from the FN3 domain side of the interface and other amino acid types appeared to be important for conformation. The FN3 domain from the expanded library bound with higher affinity, a slower off rate and a more favorable enthalpic contribution than the one from the binary library, suggesting that additional chemical diversity endowed the interface with better shape complementarity. Notably, the FN3 domain from the expanded library was more tolerant to mutations, suggesting an additional role for amino acid diversity in maintaining evolutionary robustness.
The above described studies lead to an inevitable question: What physicochemical properties of tyrosine make this amino acid exceptionally effective for mediating recognition in protein-protein interfaces? Long before these studies of synthetic binding proteins unequivocally established the importance of tyrosine for molecular recognition, the favorable properties of this versatile residue in the binding sites of natural antibodies were succinctly summarized 45:
“Amphipathic amino acids could readily tolerate the change of environment from hydrophilic to hydrophobic that occurs upon antibody-antigen complex formation. Residues that are large and can participate in a wide variety of van der Waals' and electrostatic interactions would permit binding to a range of antigens. Amino acids with flexible side-chains could generate a structurally plastic region, i.e. a binding site possessing the ability to mould itself around the antigen to improve complementarity of the interacting surfaces. Hence, antibodies could bind to an array of novel antigens using a limited set of residues interspersed with more unique residues to which greater binding specificity can be attributed.”
The tyrosine side chain is indeed amphipathic, large and capable of forming nonpolar, hydrogen-bonding and cation-π interactions. While the side chain is not highly flexible, we contend that its rigidity is an advantage for achieving affinity, rather than a disadvantage, because only a small loss of conformational entropy is incurred when the side chain is immobilized in the binding interface. Furthermore, the lack of flexibility likely contributes to specificity, because interfaces that are rich in rigid tyrosine residues are unlikely to accommodate non-specific interactions. Consequently, it appears that tyrosine endows binding sites with characteristics that are important for both high affinity and specificity.
The versatility of tyrosine in forming intermolecular contacts has also been exploited in surface engineering to promote protein crystallization 57. Surface-exposed, “high entropy” residues, such as glutamate and lysine, were replaced with tyrosine, alanine, histidine, serine or threonine. Tyrosine replacements produced crystals under the largest number of crystallization conditions. These results are consistent with our view that the low side chain entropy of tyrosine is advantageous for forming interaction interfaces, although it is difficult to dissect the separate contributions of reduced conformational entropy and charge removal.
Consistent with the notion that tyrosine is particularly versatile amongst the genetically encoded amino acids for mediating molecular interactions, tyrosine-like molecules are used extensively for molecular recognition in natural systems (Figure 6). For example, proteins with high contents of dihydroxyphenylalanine (DOPA) are used by marine mussels to adhere securely to a wide variety of substrates 58, 59. Copolymers containing a high level of DOPA were found to be highly effective as reversible dry/wet adhesives, directly demonstrating the ability of DOPA to form molecular interactions with a variety of substrates 60. Furthermore, DOPA derivatives such as dopamine, epinephrine and norepinephrine act as neurotransmitters and hormones, and the commonly used drug acetaminophen (Tylenol©) resembles tyrosine, suggesting that tyrosine-like molecules are predisposed to form specific interactions with protein receptors.
In summary, we contend that the physicochemical character of tyrosine makes it the genetically encoded amino acid that is most effective for mediating molecular recognition. A particularly illustrative example in support of our contention is provided by a study in which a di-tyrosine motif in thrombin-binding synthetic proteins was converted into an active tetrapeptide 61. These results suggest that it may be possible to develop small-molecule lead compounds by transferring minimalist tyrosine-rich motifs to non-protein scaffolds. Notably, the concept of using limited residue types to mediate molecular recognition in synthetic protein interfaces parallels wisdom that has been gained in small molecule drug development. A systematic analysis of common features among known drug molecules revealed that 11,000 of 15,000 drugs are constructed using “top 20” side chains 62. These results indicate that the chemical diversity of small molecule drugs is limited, and similar to how the minimalist synthetic protein interfaces present tyrosines in distinct conformations, these drugs present similar side chains in diverse conformations using a variety of molecular frameworks 63. In both proteins and small molecules, it appears that conformational diversity trumps chemical diversity for efficient molecular recognition.
It is now evident that tyrosine-based minimalist synthetic libraries can produce highly functional recognition interfaces against a remarkable range of proteins. These findings raise the question of just how far we can push this minimalist approach. Because most of the studies to date have focused on the recognition of globular proteins, little is known about the capacity of tyrosine-based interfaces to recognize flexible peptides and nonprotein targets such as small molecules and sugars. Also, as tryptophan shares many attributes with tyrosine, the question remains whether tryptophan may be able to replace tyrosine as the main contributor to interface energetics. The significant enhancement of library performance by supplementing the tyrosine/serine binary code with additional diversity suggests that we might be able to find the “ultimate” amino acid composition that balances a high content of tyrosine and tailored addition of other amino acids to produce optimal shape and electrostatic complementarity.
Clearly, further investigation is needed to answer these outstanding questions. However, synthetic protein engineering employs highly controlled library designs and selection procedures, ensuring that we will be able to gain such knowledge in a comprehensive and objective manner. Furthermore, in vitro selection systems are set up to benefit from knowledge gained from preceding experiments, and thus, synthetic protein libraries will continue to evolve. Therefore, we are optimistic that in the near future synthetic binding proteins will supersede natural antibodies for most purposes and will enable many new applications beyond the scope of natural proteins.