The field of protein engineering has always attracted a multidisciplinary cast of characters, ranging from biologists to organic chemists. Consequently, a variety of protein engineering techniques have been described, spanning the purely synthetic to the purely biosynthetic. Of these, heterologous expression of recombinantly cloned genes is by far the most commonly employed route to engineered proteins, in part because the approach is readily accessible to most researchers. Although site-directed mutagenesis benefits from the ‘replicative fidelity and efficiency of living organisms’, this also restricts the approach to the naturally occurring amino acids. This limitation is largely overcome by elegant chemistry-driven approaches, such as unnatural amino acid mutagenesis and protein total or semi-synthesis via the chemical [3–9] or enzymatic [10–13] ligation of unprotected peptides. These approaches are, however, technically demanding, which has prevented their widespread use by the biological community. Recently, a novel protein semi-synthesis approach was reported [14–16] that harnesses the phenomenon of protein splicing (for review see ) and, in the process, makes proteins harboring unnatural building blocks accessible to most researchers. This short review describes the genesis of this methodology, termed ‘expressed protein ligation’ [14,15], its applications thus far and its future potential in protein engineering.
Expressed protein ligation (EPL) allows synthetic peptides to be chemically ligated to the C terminus of recombinant proteins through a normal peptide bond. The approach arises from the unexpected convergence of two areas of research that at first glance appear unrelated: protein total synthesis and protein self-splicing. In recent years, the total chemical synthesis of proteins has been dominated by powerful fragment condensation approaches that allow fully unprotected peptide building blocks to be regioselectively pieced together in aqueous solution. Unlike full-length proteins, peptide fragments can be efficiently assembled by stepwise solid-phase peptide synthesis (synthetic polypeptides of up to ~50 residues are now readily accessible). Moreover, by using unprotected peptides, these modern ligation approaches overcome the solubility and characterization problems associated with classical convergent synthesis (reviewed in ).
Among the various chemical and enzymatic ligation methods in use, so-called ‘native chemical ligation’ has proven the most general route to fully synthetic proteins [22–27]. As illustrated in Figure 1a, native chemical ligation relies on the highly efficient reaction between a peptide possessing an N-terminal cysteine residue (peptide 2; Figure 1a) and a second peptide possessing an α-thioester moiety (peptide 1; Figure 1a). Following an initial intermolecular and, importantly, chemoselective transthioesterification, the resulting thioester-linked intermediate spontaneously rearranges via an S→N acyl transfer to give the final amide-linked product. As was pointed out in the initial description of the technique, the C-terminal segment could be expressed using recombinant DNA methods; provided the polypeptide contained an N-terminal cysteine, it could be used in native chemical ligation. Indeed, this semi-synthetic version of native chemical ligation has been successfully used to attach an ethylenediaminetetraacetic acid (EDTA) analog to the N terminus of a chimeric heterodimer of the transcription factors c-Jun and c-Fos.
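The net outcome of native chemical ligation can be checked at the sequence level. Because the S→N acyl transfer leaves a native peptide bond at the X-Cys junction, the product is simply peptide 1 followed by peptide 2, and its mass equals the two free peptides minus one water. The sketch below illustrates this. It is not from the original work. It uses standard monoisotopic residue masses and assumes unmodified residues. The function names are hypothetical, and the thioester leaving group is ignored because it is lost during ligation.

```python
from typing import Tuple

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq: str) -> float:
    """Monoisotopic mass of a linear, unmodified peptide with free termini."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def native_chemical_ligation(thioester_seq: str, cys_seq: str) -> Tuple[str, float]:
    """Sequence-level model of native chemical ligation (hypothetical helper).

    thioester_seq: peptide 1, supplied as a C-terminal alpha-thioester.
    cys_seq: peptide 2, which must begin with cysteine.
    """
    if not cys_seq.startswith("C"):
        raise ValueError("peptide 2 must carry an N-terminal cysteine")
    # Thiol-thioester exchange then S->N acyl transfer yields a normal amide
    # bond, so the product is the two sequences joined end to end.
    product = thioester_seq + cys_seq
    # Net change: condensation of two free peptides with loss of one water.
    mass = peptide_mass(thioester_seq) + peptide_mass(cys_seq) - WATER
    return product, mass
```

For example, `native_chemical_ligation("LYRAG", "CGGK")` returns the sequence `"LYRAGCGGK"` together with the mass of that peptide. This is the value one would look for when monitoring a ligation by mass spectrometry.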
Although biotechnology provides the tools needed to generate N-terminal cysteine-containing proteins, the same has not been true for the production of recombinant protein α-thioesters. Thus, native chemical ligation could not be fully interfaced with recombinant protein biotechnology. It is at this point that protein splicing enters our story, as this phenomenon provides the final piece of the puzzle. Protein splicing, first detected in yeast nearly a decade ago, is a post-translational event in which a proprotein undergoes an intramolecular rearrangement. This results in the excision of an internal sequence (termed an intein) and the concomitant joining of the flanking sequences (exteins) to form an edited version of the original protein. Unlike RNA splicing, which requires an elaborate ribonucleoprotein complex, protein splicing is an intrinsic process and can take place in vitro using purified proproteins. This has allowed the mechanism of the process to be studied in some detail using both mutagenesis and protein chemistry-based techniques [31–33].
Our current understanding of the mechanism of protein splicing is summarized in Figure 1b. The first step involves an N→S or N→O acyl shift in which the N-extein is transferred to the side-chain sulfhydryl or hydroxyl group of the first residue in the intein; a cysteine, serine or threonine residue is always located at this position. This rearrangement would appear to be energetically unfavorable and is essentially the reverse of the final step in native chemical ligation (Figure 1a). The driving force behind this event is still under study. Interestingly, in the recently solved crystal structure of the GyrA intein, the amide bond between the final residue of the N-extein and the first residue of the intein adopts the higher-energy cis conformation. This unusual stereochemical constraint may help push the equilibrium of the system towards the ester/thioester side. Given that all inteins are likely to share a common fold, it is tempting to speculate that this type of stereochemical ‘leg-up’ may be a general feature of protein splicing.
After the initial N→(S/O) rearrangement comes a transesterification step, in which the N-extein is transferred to the side chain of a second conserved cysteine, serine or threonine residue, this time located at the intein–C-extein junction. In the next step, the amide bond at the intein–C-extein junction is broken as a result of succinimide formation involving the conserved C-terminal asparagine residue of the intein. In the final step, a peptide bond is formed between the N-extein and C-extein following an (S/O)→N acyl shift, identical to the final step in native chemical ligation.
The uncanny chemical similarities between native chemical ligation (Figure 1a) and protein splicing (Figure 1b) raise an interesting question: can protein splicing be used as a route to recombinant protein α-thioesters for use in native chemical ligation? The question gains further weight from the recent clever demonstration by Xu and coworkers that protein splicing can be halted midstream (presumably at the thioester/ester intermediate stage) by mutating the key asparagine residue in the intein to alanine. Indeed, an engineered intein of this type now forms the basis of a commercially available protein expression/purification strategy (New England Biolabs).
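The four splicing steps, and the way an Asn→Ala mutation stalls them, can be captured in a toy state model. The sketch below walks a precursor through the steps described above and stops at the first step whose required residue is missing. Several simplifications apply. Only the junction residues named in the text are checked. Real inteins are far longer than the toy sequences used here. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Residues whose side chains (SH or OH) can accept the acyl group.
NUCLEOPHILES = {"C", "S", "T"}


@dataclass
class Precursor:
    n_extein: str
    intein: str
    c_extein: str


def splice(p: Precursor) -> Tuple[List[str], Optional[str]]:
    """Return the splicing steps completed and the ligated exteins (or None)."""
    steps: List[str] = []
    # Step 1: N->(S/O) acyl shift onto the first intein residue.
    if p.intein[0] not in NUCLEOPHILES:
        return steps, None
    steps.append("N->(S/O) acyl shift")
    # Step 2: transesterification onto the Cys/Ser/Thr at the intein-C-extein junction.
    if p.c_extein[0] not in NUCLEOPHILES:
        return steps, None
    steps.append("transesterification")
    # Step 3: succinimide formation by the C-terminal intein Asn releases the intein.
    if p.intein[-1] != "N":
        return steps, None  # e.g. Asn->Ala: stalled at a (thio)ester intermediate
    steps.append("Asn cyclization (succinimide formation)")
    # Step 4: (S/O)->N acyl shift gives a native peptide bond between the exteins.
    steps.append("(S/O)->N acyl shift")
    return steps, p.n_extein + p.c_extein
```

With the toy precursor `Precursor("MKLV", "CHHHN", "SGGK")`, all four steps complete and the product is `"MKLVSGGK"`. Changing the terminal intein residue to `A` stops the model after the transesterification step and returns no product. This mirrors the stalled, thioester-linked species that the engineered inteins exploit.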
With all the pieces in place, recent months have seen the publication of a series of papers [14–16] that combine protein splicing and native chemical ligation into a general new protein semi-synthesis approach. The principle of the method, referred to as expressed protein ligation (EPL) [14,15], is shown in Figure 1c. In the first step, the protein or protein fragment of interest is expressed as an intein–CBD (chitin-binding domain) fusion protein, where the CBD allows the protein of interest to be affinity purified on chitin beads. Because an engineered intein is used, protein splicing cannot proceed to completion and stalls at the thioester-linked intermediate. Consequently, exposing the immobilized fusion protein to an aqueous solution containing an N-terminal cysteine peptide and a thiol reagent (e.g. thiophenol) results in native chemical ligation of the peptide to the protein. The choice of thiol reagent is of key importance to this process, as it dictates the reactivity of the protein α-thioester generated in situ during the reaction. Thiophenol is a good choice because phenyl α-thioesters are known to be extremely reactive in native chemical ligation.
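EPL places two constraints on where a protein can be split. First, the synthetic peptide must begin with cysteine. Second, the peptide must be short enough to make by stepwise solid-phase synthesis; the text quotes roughly 50 residues. The sketch below lists the split points that satisfy both constraints and still place a chosen residue (for example, a site to be phosphorylated or labeled) in the synthetic segment. It is a planning aid under those two assumptions only; the function name and the default length limit are hypothetical. If no candidate exists, a cysteine would have to be introduced into the sequence, which alters the protein.

```python
from typing import List

# Assumption: approximate practical limit for stepwise solid-phase synthesis.
MAX_SYNTHETIC_LENGTH = 50


def epl_junctions(target: str, site: int, max_len: int = MAX_SYNTHETIC_LENGTH) -> List[int]:
    """Return 0-based split points for installing a modification at `site` by EPL.

    The recombinant part is target[:split], expressed as an intein-CBD fusion and
    released as an alpha-thioester. The synthetic part is target[split:], which
    must start with Cys, contain `site`, and be at most max_len residues long.
    """
    if not 0 <= site < len(target):
        raise ValueError("site lies outside the target sequence")
    candidates = []
    for split in range(1, len(target)):  # recombinant part must be non-empty
        peptide = target[split:]
        if target[split] == "C" and split <= site and len(peptide) <= max_len:
            candidates.append(split)
    return candidates
```

For instance, `epl_junctions("AAAACGGGCYKK", site=9)` returns `[4, 8]`, and tightening `max_len` to 5 leaves only `[8]`. Preferring the split closest to the C terminus keeps the synthetic peptide as short as possible.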
Although the three reports to date all use more or less the same procedure (Figure 1c), each illustrates a different potential application of EPL. The first study uses the commercially available Saccharomyces cerevisiae (Sce) VMA intein (Asn454→Ala) and demonstrates the utility of the approach for site-specific modification of recombinant proteins. EPL was used to test whether adding a phosphotyrosine tail to the C terminus of the protein tyrosine kinase Csk (C-terminal Src kinase) would, as in the Src family members [36,37], result in an intramolecular association between the phosphotyrosine and the Src homology 2 (SH2) domain within the protein. Csk is not naturally tyrosine phosphorylated at this position and contains 20 tyrosine residues within its sequence, so the desired 450 amino acid phosphoprotein could not be generated using standard biotechnology techniques. EPL was therefore used to ligate a short phosphotyrosyl peptide to the C terminus of Csk, with the reaction proceeding in essentially quantitative yield. Biochemical analysis of the semi-synthetic tail-phosphorylated Csk construct indicated the presence of an intramolecular phosphotyrosine–SH2 interaction. Interestingly, Csk containing the phosphorylated tail appears to have increased kinase activity towards the physiologically relevant substrate, Lck, compared with the nonphosphorylated version of the protein.
A second application of EPL, also using the commercially available Sce VMA intein, illustrates how the technique could prove extremely useful in studying complex biochemical machines. In this study, a fully functional semi-synthetic version of Escherichia coli RNA polymerase was prepared. EPL was used to ligate a synthetic peptide corresponding to residues 568–600 of the σ70 specificity factor to a recombinant fragment corresponding to the remainder of the σ70 protein (residues 1–566). The resulting ~600 amino acid construct was shown to be fully active in initiating transcription when added to the other subunits of E. coli RNA polymerase. It was also used to map the binding region of the bacteriophage T4 anti-sigma protein (AsiA) to the C-terminal region of the σ70 subunit. The ability to introduce biochemical or biophysical probes site-specifically into large multiprotein complexes such as RNA polymerase may provide new ways to study the structure and function of these complex biochemical machines. Along these lines, recent studies from our laboratory indicate that EPL can be used to introduce fluorescent probes into the preassembled ~500 kDa E. coli RNA polymerase complex in a site- and subunit-specific manner (TWM et al., unpublished data).
In a third application, EPL has been used to prepare fully active semi-synthetic versions of bovine pancreatic ribonuclease (RNase A) and the restriction endonuclease HpaI from Haemophilus parainfluenzae. Both proteins are potentially cytotoxic to bacteria when expressed in their full-length forms, making their overexpression inefficient. EPL, however, allowed inactive N-terminal fragments of the proteins to be safely expressed and then ligated in vitro to short synthetic peptides corresponding to the remainder of each protein. Following refolding, the full-length semi-synthetic proteins were shown to have enzymatic properties indistinguishable from those of their natural counterparts. Interestingly, these semi-synthetic proteins were prepared using a slightly modified version of EPL. A genetically engineered Mycobacterium xenopi (Mxe) GyrA intein was used rather than the Sce VMA intein, and 2-mercaptoethanesulfonic acid rather than thiophenol. This intein/thiol combination gave higher yields of semi-synthetic product for these two targets. As the authors point out, this suggests that for some targets more than one intein system may have to be tried to obtain optimal ligation results.
The recent observation that protein splicing can be triggered in vitro simply by reconstituting inactive N- and C-terminal fragments of an intein [38–41] adds an extra dimension to the use of splicing in protein engineering. So-called trans-splicing is made possible by the modular composition of the intein element. Of the ~40 inteins identified to date, all but two (the Porphyra purpurea DnaB intein and the Mxe GyrA intein) contain a homing endonuclease module sandwiched between the functionally important flanking regions of the intein. Indeed, mutagenesis and sequence alignment studies have shown that this entire endonuclease domain can be deleted from the intein without destroying splicing activity in the resulting minimal inteins [40,43]. This finding suggested that it might be possible to obtain a folded, functional intein from two noncovalently associated intein fragments. This has in fact been shown in two systems: the Psp Pol-1 intein and the Mycobacterium tuberculosis RecA intein [39,40]. In trans-splicing, the two proteins to be joined are each expressed fused to one of a pair of complementary intein fragments. Bringing these two fusion proteins together in vitro under suitable conditions results in splicing (see Figure 2).
As illustrated in Figure 2, the real attraction of trans-splicing from a protein engineering point of view is that it allows recombinant proteins from two different sources to be ligated together in vitro. As with EPL, this feature may prove extremely useful for obtaining cytotoxic proteins that cannot be generated using standard approaches. An additional use of trans-splicing was recently reported by Yamazaki and coworkers, who used fragments of the Pyrococcus furiosus PI-PfuI intein to link together two domains of the α subunit of E. coli RNA polymerase. This system allowed the group to express one half of the α subunit in 15N-labeled medium and the other in standard medium. Consequently, the ligated product was segmentally labeled with 15N, allowing the labeled domain to be selectively detected in heteronuclear two-dimensional nuclear magnetic resonance (2D NMR) experiments. As this work demonstrates, trans-splicing may have enormous implications for the study of large multidomain proteins by high-field NMR spectroscopy.
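Segmental labeling by trans-splicing can be pictured as follows. Each extein keeps the isotope label of the medium in which its half was expressed, while both intein fragments are excised. The sketch below is an idealized model that tracks a per-residue 15N label mask through the reaction. It assumes that the reconstituted intein obeys the same junction-residue requirements as a contiguous one. The dataclass, the function, and the toy sequences are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUCLEOPHILES = {"C", "S", "T"}


@dataclass
class SplitHalf:
    extein: str           # protein segment to be joined
    intein_fragment: str  # N- or C-terminal piece of the split intein
    labeled_15n: bool     # True if this half was expressed in 15N-labeled medium


def trans_splice(n_half: SplitHalf, c_half: SplitHalf) -> Tuple[str, List[bool]]:
    """Idealized trans-splicing: ExteinN-IntN + IntC-ExteinC -> ExteinN-ExteinC."""
    # Noncovalent association of the two fragments reconstitutes the intein.
    intein = n_half.intein_fragment + c_half.intein_fragment
    # Assumption: same junction-residue requirements as a contiguous intein.
    if (not intein or intein[0] not in NUCLEOPHILES or intein[-1] != "N"
            or not c_half.extein or c_half.extein[0] not in NUCLEOPHILES):
        raise ValueError("reconstituted intein cannot complete splicing")
    product = n_half.extein + c_half.extein
    # Each residue keeps the label of the medium its half was expressed in.
    label_mask = ([n_half.labeled_15n] * len(n_half.extein)
                  + [c_half.labeled_15n] * len(c_half.extein))
    return product, label_mask
```

For example, joining a labeled `SplitHalf("MKLA", "CFL", True)` to an unlabeled `SplitHalf("SAVK", "HVN", False)` gives `"MKLASAVK"` with only its first four residues labeled. Only that half would then contribute signals to a 15N-edited spectrum.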
One of the most important features of EPL and (potentially) trans-splicing is that they are technically easy to perform: a basic knowledge of recombinant protein expression is really all that is required. Moreover, in the case of EPL the necessary expression vector is commercially available, and customized peptides containing N-terminal cysteine residues can be obtained from a variety of sources. How, then, can these approaches complement the information already available through standard biotechnology techniques? EPL can incorporate three kinds of site-specific additions into large proteins and protein complexes: biochemical probes (e.g. cross-linkers), biophysical probes (e.g. fluorophores and spin labels) and post-translational modifications (e.g. phosphorylated and glycosylated residues). This ability will doubtless be of enormous value in the study of these systems. Equally, the ability to obtain cytotoxic proteins by either EPL or trans-splicing will allow such proteins to be generated in sufficient quantities for rigorous structural and functional studies. Segmental isotope labeling, which has also been performed successfully by EPL of labeled and unlabeled recombinant proteins (TWM et al., unpublished data), represents an exciting application of these technologies in structural biology.
As pointed out earlier in this article, methods already exist for generating recombinant proteins containing an N-terminal cysteine residue. With the advent of EPL, native chemical ligation reactions can now be performed at either end of a recombinant protein. This allows two recombinant proteins to be ligated together directly (similar to trans-splicing), but it also opens the way for inserting synthetic peptides into recombinant proteins. The most obvious way to achieve this is to adapt the sequential native chemical ligation strategies already in use for the directed assembly of three or more synthetic peptides [27,44]. Being able to insert small synthetic peptide cassettes into large proteins would make the entire primary structure, and not merely the extreme N- or C-terminal regions, amenable to modification. It is also worth noting that introducing both an N-terminal cysteine and a C-terminal thioester into the same polypeptide chain allows intramolecular native chemical ligation reactions to be performed [45,46], a process that has been used to prepare a synthetic circular protein domain. Head-to-tail cyclized recombinant peptides and proteins could conceivably be obtained in an analogous manner using EPL.
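The two extensions discussed above, inserting a synthetic cassette between recombinant segments and cyclizing a single chain, reduce to simple sequence rules. In sequential ligation, every segment after the first must begin with cysteine. In intramolecular ligation, the chain's own N-terminal cysteine attacks its C-terminal thioester, giving a ring with no ends. The sketch below encodes only these rules. It does not model the order of ligation steps or any temporary protection of a cassette's N-terminal cysteine, which real protocols may need. The function names are hypothetical.

```python
from typing import List


def sequential_ligation(segments: List[str]) -> str:
    """Assemble N->C ordered segments by successive X-Cys ligations (idealized).

    Each segment except the last is assumed to be available as a C-terminal
    alpha-thioester (e.g. a recombinant fragment released by EPL).
    """
    if len(segments) < 2:
        raise ValueError("need at least two segments to ligate")
    for number, seg in enumerate(segments[1:], start=2):
        if not seg.startswith("C"):
            raise ValueError(f"segment {number} lacks an N-terminal cysteine")
    return "".join(segments)


def cyclize(chain: str) -> str:
    """Head-to-tail product of intramolecular ligation (N-terminal Cys + C-terminal thioester).

    A cyclic peptide has no ends, so the lexicographically smallest rotation
    is returned as a canonical form for comparing cyclic sequences.
    """
    if not chain.startswith("C"):
        raise ValueError("cyclization requires an N-terminal cysteine")
    return min(chain[i:] + chain[:i] for i in range(len(chain)))
```

Inserting a synthetic cassette `"CKLY"` between a recombinant thioester `"AAG"` and a recombinant Cys-protein `"CGG"` gives `"AAGCKLYCGG"`. Two cyclic products can be compared by their canonical rotations.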
This review began with the statement that protein engineering is a multidisciplinary endeavor, requiring the interactions of many scientific fields. The novel technique of EPL continues in that tradition, combining chemistry and biology to pave new roads towards chemically engineered macromolecules.
We thank Ming-Qun Xu for providing a copy of his manuscript prior to publication and Graham C Cotton and Francine Perler for useful discussions. This work was supported by grants from the National Institutes of Health (GM55843-01; TWM), the Pew Foundation (TWM) and the National Leukemia Research Association (TWM).