Generation of novel protein functions is a major goal in biotechnology and also a rigorous test of our understanding of the relationship between protein structure and function. Early examples of protein engineering focused on design and directed evolution within the constraints of the original protein architecture, exemplified by the highly successful fields of antibody and enzyme engineering. Recent studies show that protein engineering strategies that step away from these natural architectures, i.e. by manipulating the organization of domains and modules and thus mimicking nonhomologous recombination, are highly effective in producing complex and sophisticated functions in terms of both molecular recognition and regulation.
Analysis of genomes shows that protein evolution progresses via point mutations, duplication and recombination of genes under selective pressure. Technologies that simulate these mechanisms can be effective strategies to generate novel protein functions and architectures for diverse applications, a major goal in the field of protein biotechnology. To date, a majority of protein engineering efforts have utilized methodologies that recapitulate the processes of gene duplication and subsequent sequence divergence. Specific and random mutations are introduced by knowledge-based design and/or more random means such as error-prone PCR into a single, essentially invariant scaffold, and mutants exhibiting a desired function are identified through screening and selection [1–9]. This type of functional evolution is generally incremental and requires a starting scaffold that is already predisposed to the desired type of function.
Nonhomologous genetic recombination, either by genome rearrangement or by alternative splicing, can produce new combinations of gene fragments and thereby drastically different polypeptide sequences. Bioinformatic analyses have suggested that domain recombination is a major driving force for leaps in protein function [10–14]. This view was also supported by simulations showing that evolution via nonhomologous recombination of protein segments is many orders of magnitude more effective than point mutation in acquiring significant new function. Herein a domain is defined as an evolutionarily and structurally separable unit within a protein. Individual domains of eukaryotic, multi-domain proteins are often encoded in an exon. A domain in isolation may or may not be autonomously folded into a well-defined structure. In contrast, a module, another term often used in the context of protein evolution, is defined as a functionally minimal unit that is transferable from one protein context to another. A module may contain multiple domains, or it may not even contain a domain, as in the case of a short peptide segment containing a binding site for another module. There has been an increasing number of successes in generating new protein functions by methodologies that recapitulate rearrangement and combination of domains and modules, demonstrating that they are indeed powerful means to generate large functional changes and expand the repertoire of synthetic proteins. This review focuses on recent design and engineering studies that are based on a structural and mechanistic understanding of how combinations of domains and modules define protein functions.
The design and engineering of specific and high-affinity recognition functions has been a major goal in protein engineering with clear applications in therapeutics and diagnostics. The main approach in this field has been to use a single, stable scaffold corresponding to a single module, such as the Fab and Fv segments of immunoglobulins and also non-antibody scaffolds [4,7]. Mutations are introduced in a small portion that is expected to form a contiguous "patch" within such a scaffold (e.g. the complementarity-determining regions of the immunoglobulins) to produce a repertoire while maintaining the overall domain architecture and tertiary structure, and variants are then identified using methods such as phage display (Fig. 1a) [17,18]. These strategies mimic evolution through point mutations and homologous recombination. Recent studies described below suggest that binding proteins with multiple recognition patches, each residing on a separate domain, offer distinct advantages over the traditional, single-patch binding proteins.
Enhancement of affinity and specificity through multivalent interactions, or avidity, is widely exploited at molecular and cellular levels. Here I classify avidity into homotropic and heterotropic types. The former refers to enhancement due to multiple copies of identical interactions and the latter to enhancement due to multiple distinct interactions.
Exploiting homotropic avidity requires relatively little engineering effort. This type of multivalent approach can be used with any molecular recognition module. For example, the immune system exploits this principle by using IgM, which contains a total of ten identical antigen-binding sites that increase the effective affinity, particularly to homo-oligomeric antigens. An oligomeric version of a synthetic binding protein based on the fibronectin type III domain (FN3), a small antibody-like domain, showed significantly higher affinity to a cell surface antigen.
Heterotropic avidity can be created by concatenating multiple domains with flexible linkers where each domain recognizes a different epitope on a target (Fig. 1b). Prominent natural examples of such "beads-on-a-string" proteins for molecular recognition are nucleic acid-binding proteins using tandem repeats of zinc fingers (ZFs), ~23-residue modular domains each folding into a compact structure. Structural studies have revealed that each ZF recognizes three nucleotide base pairs and that tandem repeats of the zinc-finger modules can bind to long stretches of nucleotide sequences to achieve high levels of specificity. By exploiting this modular design, several groups have developed synthetic zinc-finger proteins. It is now possible to construct nucleases and transcription factors with exquisite specificity using poly-ZF proteins.
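The scaling behind this modularity can be illustrated with a short Python sketch that computes the length of the site recognized by a tandem ZF array and the number of matches expected by chance in a genome. This is an idealized illustration rather than a model taken from the studies cited: it assumes exactly three base pairs per finger (as stated above), a genome of uniformly random sequence, and a single strand. The function names and the 3 × 10^9 bp genome size are hypothetical choices made for this example.

```python
def recognition_site_length(n_fingers: int, bp_per_finger: int = 3) -> int:
    """Length (bp) of the DNA site read by a tandem array of zinc fingers."""
    return n_fingers * bp_per_finger


def expected_random_matches(n_fingers: int, genome_bp: float) -> float:
    """Expected number of chance occurrences of one specific site.

    Assumption (illustrative only): each position is an independent,
    uniformly random base, so a site of length L occurs with
    probability 4**-L at any given position on one strand.
    """
    site_len = recognition_site_length(n_fingers)
    return genome_bp / 4 ** site_len


if __name__ == "__main__":
    genome_bp = 3e9  # assumed order of magnitude for a large eukaryotic genome
    for n in (1, 3, 6):
        print(n, recognition_site_length(n), expected_random_matches(n, genome_bp))
```

Under these assumptions a three-finger array (9 bp) would be expected to find thousands of chance matches in such a genome, whereas a six-finger array (18 bp) would be expected to find fewer than one, which is one way to rationalize why longer arrays are used when exquisite specificity is required.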
Molecules for recognizing proteins that are conceptually equivalent to multi-ZF proteins, termed "avimers", have been developed using small, disulfide-linked protein modules called A domains as building blocks. The A domains are small (~35 residues), autonomously folded domains primarily stabilized by disulfide bonds and calcium binding. Similar to ZFs, A domains occur as multiple repeats in human receptor proteins, and a target is contacted by multiple A domains with each domain binding to a distinct epitope. Mimicking these natural receptors, avimers were generated using phage display by step-wise procedures that resemble early ZF engineering, where a single A domain binding to a target is initially selected, then an adjacent A domain is optimized in the context of the first selected domain and so on. Individual A domains in an avimer recognize distinct epitopes of a target, which leads to high specificity and affinity (sub-nM Kd) using two or three A domain modules. This work has established a powerful strategy to generate high levels of molecular recognition function.
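Why linking a few modest binders can yield sub-nM affinity can be sketched with the commonly used effective-concentration approximation for two tethered binding modules. This is a textbook-style estimate, not the analysis used in the avimer work: it assumes that the two modules bind independently to distinct epitopes, that the linker imposes no strain, and that tethering is fully captured by a single effective concentration `c_eff`. The function name and all numerical values are hypothetical.

```python
def linked_kd(kd1: float, kd2: float, c_eff: float) -> float:
    """Approximate dissociation constant (M) of two tethered modules.

    Assumption: once the first module is bound, the second experiences
    the target epitope at an effective concentration c_eff (M), so the
    overall Kd is roughly kd1 * kd2 / c_eff. Valid only when c_eff is
    much larger than kd2; otherwise tethering gives little benefit.
    """
    if min(kd1, kd2, c_eff) <= 0:
        raise ValueError("all arguments must be positive")
    return kd1 * kd2 / c_eff


if __name__ == "__main__":
    # Two hypothetical 1 micromolar modules joined by a linker that
    # provides an effective concentration of 1 millimolar.
    print(linked_kd(1e-6, 1e-6, 1e-3))
```

With these illustrative numbers, two micromolar modules combine to give an estimated Kd on the order of 1 nM. Real constructs can deviate substantially from this estimate because linker length, epitope geometry and domain-domain contacts all influence the effective concentration.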
Structural analysis of multi-domain proteins, particularly enzymes, suggests that dramatic changes in function have emerged through joining protein domains and adjusting the newly created interface between them [10,24]. For example, the classical NAD-binding Rossmann fold domain is joined to distinct substrate recognition domains to form a series of active, NAD(P)-dependent enzymes. A recent comprehensive study of the haloacid dehalogenase superfamily convincingly shows that its subfamilies with distinct substrate specificities have emerged as a result of multiple, independent evolutionary events that inserted a domain (termed "cap") into the core catalytic domain at different positions. The emergence of distinct caps correlates with the acquisition of new catalytic capacities. In this type of domain combination, there are extensive interactions between the two linked domains, suggesting that the inter-domain interface has been evolutionarily tuned by a series of point mutations. These observations support an evolutionary path for generating sophisticated functions by constructing a single active site at the interface between two newly combined, evolutionarily unrelated domains (Fig. 1c).
Huang et al. demonstrated that highly specific and complex protein functions indeed can be generated at a newly created domain interface. Their goal was to produce highly specific and tight binding proteins to short, unstructured peptides, a class of difficult targets for molecular recognition using the conventional, single-patch strategy. A fusion protein of the erbin PDZ domain (a low-affinity peptide-binding domain) and a functionally inert second domain that serves as a diversity-presenting scaffold (the FN3 domain) was made and subsequently the interface between the two domains was optimized by combinatorial library selection of FN3 loops using phage display. This process, termed "directed domain-interface evolution", dramatically enhanced both affinity and specificity to a target peptide (~500 fold to single nM Kd and ~6,000 fold, respectively), levels unattainable by optimizing the binding interface of a single PDZ domain. The X-ray crystal structure of the evolved fusion protein confirmed that the design principles were successful. The two fused domains formed a clamshell architecture inside which the target peptide was bound (Fig. 1d). The optimized FN3 loops interact with both the target peptide and the PDZ domain extensively, thus significantly enlarging the binding surface for the peptide. These binding proteins, termed "affinity clamps", have a single-nanomolar dissociation constant and outperform a monoclonal antibody in immunochemical applications, demonstrating their practical utility.
Their subsequent work mapped the specificity profile of a highly specific affinity clamp using a phage display peptide library. These results revealed that "affinity clamping" can expand the peptide recognition epitope significantly beyond that of the starting PDZ domain. The crystal structure of this affinity clamp showed a deep binding groove for a larger portion of the target peptide constructed at the domain interface involving extensive contacts between the two domains, rationalizing the exquisite specificity of the affinity clamp (Fig. 1d). A comparison of the two crystal structures revealed a considerable difference in the relative orientation of the two domains, which explains the different functional properties of the two clamps and also implies that diverse functions can be generated by controlling the mode of interaction between the two domains. These studies experimentally establish evolutionary paths that dramatically enhance the function of single-domain modules.
The ability to control a protein's molecular recognition function is the basis for engineering higher-level functionalities such as regulatory circuits and signal sensing. Eukaryotic regulatory proteins are often constructed with discrete modules, each responsible for specific molecular recognition or catalysis. Such regulatory proteins utilize a limited number of module families rather than diverse individual proteins. Comparisons of oncogenes and proto-oncogenes show that alterations in modular architecture through nonhomologous combination and rearrangement can trigger dramatic changes in phenotype, such as carcinogenesis. Examples include BCR-Abl (fusion of dimeric BCR protein to Abl kinase) and v-Src (elimination of a phospho-Tyr site for intramolecular SH2 interaction), both of which result in kinase dysregulation [29,30]. These observations in natural proteins suggest the effectiveness of nonhomologous recombination and rearrangement of protein modules in constructing diverse regulatory pathways.
Protein modules and their respective targets are autonomous and portable, offering "mix and match" simplicity in designing this type of protein, which can be exploited to create synthetic regulatory proteins, a major goal in the field of synthetic biology. For instance, hybrid scaffold/adaptor proteins have been successfully designed that redirect the flow of cellular signals by simple mass action [31–33]. Recently, a synthetic signal transduction pathway in plants was constructed using a component of a bacterial histidine kinase pathway, demonstrating a high level of portability of such modular components. Recent studies described below demonstrate that more complex regulatory mechanisms can be designed by covalently linking modules and creating and controlling interactions between them.
Many modular proteins act as regulated switches. Modular proteins can control the activity of an "output" module at the molecular level by two distinct modes of allostery, steric and conformational. Steric allostery physically masks the active site of an output module by employing another module (Fig. 2a). This masking can be relieved by the binding of an effector to the inhibitory input module. Conformational allostery is similar to the traditional model of allostery, where an output module samples active and inactive conformations and the equilibrium between the two states is modulated by other modules connected to the output module (Fig. 2b). Conformational allostery requires an output module whose active site is malleable, such as protein kinase modules.
Building upon their pioneering work on the design of synthetic switches by modular recombination, Dueber et al. constructed switches that exhibit nonlinear responses. These switches utilized multiple copies (up to five) of low-affinity SH3-peptide interactions as negative regulators of the WASP protein. In contrast to the original switch, which showed a linear response with respect to the input (SH3 ligand), the newly designed switches showed strong positive cooperativity with a Hill coefficient as high as 3.9. This study demonstrates that homotropic cooperativity akin to that exhibited by oligomeric proteins (e.g. hemoglobin) can be designed by modular recombination.
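The practical meaning of a Hill coefficient of 3.9 can be illustrated with the standard Hill equation, sketched here in Python. The Hill equation is a generic phenomenological description and is used here only as an illustration; it is not claimed to be the fitting procedure of the study, and the function names and the half-maximal input `k_half` are hypothetical. The sketch compares how large a change in input is needed to move the output from 10% to 90% of its maximum for a non-cooperative switch (n = 1) and for a cooperative one (n = 3.9).

```python
def hill_response(ligand: float, k_half: float, n: float) -> float:
    """Fractional output of a switch described by the Hill equation."""
    x = (ligand / k_half) ** n
    return x / (1.0 + x)


def switching_range(n: float, low: float = 0.1, high: float = 0.9) -> float:
    """Fold change in input needed to go from `low` to `high` output.

    Obtained by inverting the Hill equation: the input giving a
    fractional output f is k_half * (f / (1 - f)) ** (1 / n), so the
    ratio of the two inputs does not depend on k_half.
    """
    def relative_input(f: float) -> float:
        return (f / (1.0 - f)) ** (1.0 / n)

    return relative_input(high) / relative_input(low)


if __name__ == "__main__":
    for n in (1.0, 3.9):
        print(n, round(switching_range(n), 2))
```

In this description a non-cooperative switch needs an 81-fold change in input to go from 10% to 90% output, whereas a switch with n = 3.9 needs only about a 3-fold change, which is what makes cooperative switches behave in a more threshold-like manner.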
Proteins that exhibit conformational allostery are usually multi-domain proteins in which an active site is located at a domain interface. This architecture enables conformational coupling between the active site and an effector site (or among active sites) through domain reorientation. One can construct synthetic allostery by connecting two modules in such a way that their respective functional states are coupled. It is most straightforward to set up such coupling in a mutually exclusive manner. Then, the functional state of the output module can be controlled by altering the conformational state of the input module with a signal (e.g. ligand binding) (Fig. 2c). Insertion of one protein within another, rather than end-to-end fusion, offers a higher probability of successful coupling. This topic has been reviewed recently [38,39]. A bottleneck in constructing this type of conformational coupling is the identification of a proper insertion site, which usually requires screening of a large number of constructs. A facile transposon-based method to systematically generate random insertion mutations has been developed. From ~50 β-lactamase mutants in which cytochrome b562 was inserted, a variant was identified that conferred a >100-fold change in antibiotic resistance in response to heme binding. The extreme case of this type of design is mutually exclusive folding, where a protein with a long distance between the N- and C-termini is inserted in a loop of another protein so that the folding of the two proteins is negatively coupled (Fig. 2d). Detailed biophysical and computational studies were performed on the effects of the length of linkers connecting two modules, demonstrating that linkage plays an essential role in the context of mutually exclusive folding.
Light-triggered conformational changes of natural light-sensing domains (a subfamily of the light-oxygen-voltage or LOV domains) have been exploited as an input for constructing light-sensitive proteins. The LOV domains are a subfamily of the Per–Arnt–Sim (PAS) domain family that frequently occurs fused to another protein through an N- or C-terminal helix. Light activation detaches and unfolds the C-terminal Jα helix. A light-sensitive histidine kinase exhibiting a 1,000-fold change in activity has been generated by replacing an oxygen-sensing PAS domain of a histidine kinase with a light-sensitive LOV domain while maintaining the Jα helix-mediated linkage.
A light-controlled transcription factor was engineered by designing a sequence overlap between two proteins so as to create mutually exclusive folding of an α-helix between them (Fig. 2e). In this fusion construct, the C-terminal Jα helix of another LOV domain (LOV2) and the N-terminal helix of Trp repressor were designed to be partially overlapped, i.e. encoded by the same stretch of polypeptide segment, so that the Trp repressor is functional only when light triggers the detachment and unfolding of the LOV Jα helix. One of 12 fusion proteins constructed in this manner showed light-dependent activity, validating this rational design strategy.
Light-induced conformational change was also used to regulate the enzymatic activity of dihydrofolate reductase. In this work, the insertion position was designed based on statistical coupling analysis of family members, which indicated this position as a hot spot linking a distant surface with the enzyme active site. Small light-dependent changes in enzyme activity were observed.
Another light-dependent synthetic signaling system was constructed using a modular design concept similar to the rewiring of adaptor proteins. The light-dependent interaction between a plant photosensory domain (phytochrome B) and its binding domain (Pif3) was exploited as the basis for controlling the interaction between the WASP protein and its activator Cdc42. Red illumination induces complex formation of the Cdc42-phytochrome B fusion and the WASP-Pif3 fusion and consequently activates WASP. The activation can be reversed by far-red illumination. This simple but elegant modular strategy should be applicable to many other systems with minimal modification.
Protein interaction switches have been generated by designing a hybrid sequence that encodes overlapping and mutually exclusive binding sites for two unrelated interactions. This strategy is similar to the concept of mutually exclusive folding described above, but the design is based on steric allostery instead (Fig. 2e). Sallee et al. successfully designed three types of such hybrid sequences: (a) a hybrid of two modular domains, (b) a hybrid of one modular domain and a short peptide motif, and (c) a hybrid of two peptide motifs. Biophysical characterization of a type (a) switch constructed from the syntrophin PDZ and the WASP p21-binding domain supported that the switch functions by the steric allostery mechanism.
The mutually exclusive folding principle has been expanded to construct a biosensor using only a single folded domain, rather than two domains, as the fundamental building block. Biosensor engineering usually couples ligand-induced conformational change with readout, and consequently it is difficult to design a sensor from a protein that does not exhibit a large conformational change upon ligand binding. The "alternative folding frame" strategy does not rely on this prerequisite. Instead, it creates a protein containing duplicated segments so that only one of the two fragments is incorporated in the folded protein at a time (Fig. 2f). Hence there is no net change in the total amount of folded protein regardless of which fragment is incorporated. Mutations are introduced into the duplicated segment in such a way that ligand binding alters the equilibrium between the two folding frames, thus creating a large ligand-dependent conformational rearrangement suitable for readout. Stratton et al. created a calcium sensor by designing alternative folding frames in calbindin D9k, a calcium-binding domain that does not undergo a large conformational change upon calcium binding. Biophysical analysis supports the designed mode of action. The response half-time of this sensor was 5–25 seconds, reflecting relatively slow kinetics due to a large conformational rearrangement. Although this strategy still requires that the duplicate segments do not cause the protein to aggregate and that the folding/unfolding kinetics are reasonably fast, it makes it possible to produce sensors from simple ligand-binding proteins.
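How ligand binding can shift the balance between two folding frames can be sketched with a minimal thermodynamic linkage model in Python. This is an illustrative simplification and not the biophysical analysis of the study: it assumes only two folded states, that the ligand binds to one frame only (with dissociation constant `kd`) and not at all to the other, and that the system is at equilibrium. The parameter `k_frame`, the ligand-free ratio of the non-binding frame to the binding frame, and all numerical values are hypothetical.

```python
def fraction_binding_frame(ligand: float, kd: float, k_frame: float) -> float:
    """Equilibrium fraction of protein folded in the ligand-binding frame.

    Statistical weights (relative to the ligand-free binding frame):
      binding frame, free    -> 1
      binding frame, bound   -> ligand / kd
      non-binding frame      -> k_frame
    """
    binding_weight = 1.0 + ligand / kd
    return binding_weight / (binding_weight + k_frame)


if __name__ == "__main__":
    kd = 1e-6      # assumed ligand affinity of the binding frame (M)
    k_frame = 9.0  # assumed 9:1 preference for the non-binding frame without ligand
    for ligand in (0.0, 1e-6, 1e-5, 1e-4):
        print(ligand, round(fraction_binding_frame(ligand, kd, k_frame), 3))
```

In this sketch the protein sits mostly in the non-binding frame without ligand and is pulled into the binding frame as ligand is added, so the population redistributes between frames even though the total amount of folded protein is unchanged. The model also shows that the apparent ligand affinity of such a sensor is weakened by a factor of roughly (1 + `k_frame`) relative to the parent domain, a trade-off inherent to this kind of coupled equilibrium.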
Liang et al. discovered that a synthetic allosteric enzyme constructed by recombining β-lactamase and maltose-binding protein acquired an additional mode of allostery in response to zinc binding, even though the two parent proteins are not known to bind zinc. Thus, the recombination event created a new zinc-binding site that preferentially stabilizes the off state of β-lactamase. These results nicely illustrate that module/domain recombination can be an effective route to new function in both molecular recognition and regulation. They also give a cautionary note on interpreting in vivo data on synthetic switches engineered by module recombination, because an unidentified cellular compound might act as a new allosteric effector.
The studies discussed above demonstrate the effectiveness and potential of module/domain-level protein engineering in creating sophisticated functions. To date, design efforts have been focused on generating either new regulatory mechanisms or new molecular recognition functions. Clearly, these are not mutually exclusive goals, and synergistic uses of both would greatly expand our capacity to build new protein functionalities. As such, it is highly likely that this area of research and development will see significant growth in the near future.
I would like to thank M. Biancalana, R. Gilbreth and S. Yan for critical reading of the manuscript. SK was supported by National Institutes of Health grants.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.