This review outlines the use of expressed protein ligation (EPL) to study protein structure, function and stability. EPL is a chemoselective ligation method that joins unprotected polypeptides of synthetic and recombinant origin to produce semi-synthetic protein samples of well-defined and homogeneous chemical composition. This method has been extensively used for the site-specific introduction of biophysical probes, unnatural amino acids, and increasingly complex post-translational modifications. Since it was introduced 10 years ago, EPL applications have grown increasingly sophisticated in order to address ever more complex biological questions. In this review we highlight how this powerful technology, combined with standard biochemical analysis techniques, has been used to improve our ability to understand protein structure and function.
The field of protein engineering has always played a key role in biological science, biomedical research, and biotechnology. The development of recombinant DNA and heterologous expression techniques has allowed for the routine preparation of many proteins. These techniques, however, are limited to the introduction of the 20 natural, genetically encoded amino acids. In many cases, it is desirable to introduce chemical modifications such as post-translational modifications (PTMs), unnatural amino acids or biophysical probes that are impossible to prepare by standard ribosomal synthesis. During the last two decades, thanks to the efforts of numerous chemists, an array of different techniques has been developed for the site-specific modification of proteins. These range from classical bioconjugation techniques to more sophisticated approaches such as nonsense suppression mutagenesis.
The use of chemical ligation techniques has also emerged during the last decade as another powerful approach for the chemical engineering of proteins (see ref.  for a recent and extensive review). Chemical ligation utilizes efficient reactions between unprotected peptides to form a stable peptide bond in a chemoselective way between the α-carboxyl group of one of the peptide fragments and the α-amino group of the other peptide fragment.
A defining point, however, was established when native chemical ligation (NCL) was independently introduced by Kent and Tam in 1994 [3, 4]. In this reaction two fully unprotected peptides, one containing a C-terminal α-thioester group and the other an N-terminal Cys, react chemoselectively under neutral conditions with the formation of a native peptide bond at the ligation site. This type of thioester-based chemistry was pioneered by Wieland in the 1950s for the synthesis of small Cys-containing peptides [5, 6]. Since its introduction in 1994, NCL has been widely used for the chemical synthesis of a multitude of natural and chemically modified medium-sized proteins (see ref.  for a recent review).
The major strength of NCL is that it combines the ability of chemical (peptide) synthesis to access any desired modification with the flexibility of recombinant DNA technology to produce proteins of any size, thus permitting the semisynthesis of even large proteins. The NCL of a synthetic peptide thioester with a recombinant N-terminal Cys-containing protein was reported by Verdine and co-workers.
The NCL of recombinant polypeptide α-thioesters and synthetic N-terminal Cys-containing polypeptides was first reported in 1998 independently by the Muir and Xu groups, and it was named expressed protein ligation (EPL) and intein-mediated protein ligation (IPL), respectively.
Since its introduction in 1998, EPL has been applied to the engineering of many classes of proteins from both eukaryotic and prokaryotic organisms (see refs. [11, 12] for recent reviews). These include kinases, phosphatases, transcription factors, polymerases, ion channels, cytoplasmic and membrane signaling proteins as well as antibodies. A variety of chemical modifications have been introduced into these proteins, making it possible to answer questions that would be difficult to address by other means.
Most recently, new types of chemical ligation involving protein splicing have also emerged for the chemical engineering of proteins (see refs. [13, 14] for recent reviews). Intein-mediated protein trans-splicing is based on the use of split inteins that mediate the linking of N- and C-terminal exteins by a native peptide bond in trans, with concomitant removal of the intein complex. This naturally occurring post-translational modification is a self-processing event that only requires the polypeptide fragments to be linked to be fused to a split intein, thus providing an extremely powerful technique for the chemical modification of proteins in vitro and also in vivo. The use of protease-catalyzed protein splicing has also been introduced recently for the chemical engineering of proteins.
The goal of this review is to provide an up-to-date overview of EPL and its applications, with particular emphasis on those involved in the study of protein structure, function and stability. These include the introduction of complex protein modifications (such as lipidation or glycosylation), protein immobilization and the biosynthesis of topologically altered proteins, among others.
Native chemical ligation (NCL) is an exquisitely specific ligation reaction that has been extensively used for the total synthesis, semi-synthesis and engineering of different proteins [15–18]. In this reaction, two fully unprotected polypeptides, one containing a C-terminal α-thioester group and the other an N-terminal Cys residue, react chemoselectively under neutral aqueous conditions with the formation of a native peptide bond (Fig. 1). The initial step in this ligation involves the formation of a thioester-linked intermediate, which is generated by a trans-thioesterification reaction involving the α-thioester moiety of one fragment and the N-terminal Cys thiol group of the other fragment. This intermediate then spontaneously rearranges to produce a peptide bond at the ligation site.
The presence of other sulfhydryl groups from Cys residues within the peptide fragments does not affect the reaction since the trans-thioesterification step is reversible, and only the N-terminal Cys residue contains an α-amino group that reacts irreversibly with the thioester moiety to give the corresponding peptide bond. The rate of NCL strongly depends on the nature of the amino acid present at the C-terminus of the thioester fragment: Gly gives the fastest reaction, whereas β-branched amino acids make the reaction proceed more slowly and in lower yields. The nature of the thioester also plays an important role in the efficiency of the reaction, and aryl thioesters are usually preferred over alkyl thioesters.
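The junction requirements described above can be expressed as a small helper. The following Python sketch is illustrative only: the function name `assess_ncl_junction`, the one-letter sequence inputs and the three qualitative rate classes (including the catch-all "intermediate" class) are assumptions introduced here, not part of any published tool. It encodes only the trends stated in the text, namely that the second fragment must start with Cys, that a C-terminal Gly on the thioester fragment reacts fastest, and that β-branched residues (Val, Ile, Thr) react slowly.

```python
# Qualitative check of a proposed NCL junction.
# Illustrative sketch only: names and rate classes are hypothetical,
# and no quantitative kinetics are implied.

BETA_BRANCHED = {"V", "I", "T"}  # Val, Ile, Thr are branched at the beta-carbon


def assess_ncl_junction(thioester_fragment: str, cys_fragment: str) -> dict:
    """Assess a ligation between a peptide alpha-thioester and an N-Cys peptide.

    Both fragments are given as one-letter amino acid sequences.
    """
    if not thioester_fragment or not cys_fragment:
        raise ValueError("both fragments must be non-empty")

    c_term = thioester_fragment[-1].upper()  # residue carrying the thioester
    n_term = cys_fragment[0].upper()         # must be Cys for NCL

    if c_term == "G":
        rate = "fast"
    elif c_term in BETA_BRANCHED:
        rate = "slow"
    else:
        rate = "intermediate"  # hypothetical catch-all class

    ligatable = n_term == "C"
    # A native peptide bond is formed, so the product is the plain concatenation.
    product = thioester_fragment + cys_fragment if ligatable else None
    return {
        "ligatable": ligatable,
        "c_terminal_residue": c_term,
        "expected_rate": rate,
        "product": product,
    }


if __name__ == "__main__":
    print(assess_ncl_junction("AKLG", "CRFS"))
    print(assess_ncl_junction("AKLV", "CRFS"))
    print(assess_ncl_junction("AKLG", "SRFS"))
```

Internal Cys residues are deliberately ignored by the check, reflecting the point made above that only the N-terminal Cys leads to an irreversible rearrangement.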
Several solid-phase methods are available for the chemical synthesis of peptide α-thioesters. The most general uses tert-butoxycarbonyl (Boc)-based solid-phase peptide synthesis (SPPS) [3, 21–25]. This approach uses acid-base deprotections to which thioester linkers are stable. However, the final cleavage step typically involves the use of the highly toxic and corrosive anhydrous HF, which is not well suited for the synthesis of phospho- [9, 26] and glyco-peptides [27–29]. The commonly used 9-fluorenylmethoxycarbonyl (Fmoc)-based methodology uses repeated base treatments, which renders this strategy incompatible with thioester linkers. However, this approach allows the incorporation of acid-sensitive groups, such as phosphates, carbohydrates and prenylated moieties. Consequently, several technologies have been developed to allow the synthesis of peptide α-thioesters by Fmoc-based SPPS [30, 31]. Although none of these techniques is as robust as Boc-based SPPS, the use of safety-catch linkers (or “masked thioesters”) is quite promising [27, 32–35].
N-Cys peptides can be chemically synthesized by routine Boc- or Fmoc-based SPPS [7, 30]. It should be noted, however, that the use of side-chain protecting groups that generate formaldehyde, such as the benzyloxymethyl (Bom) or tert-butyloxymethyl (Bum) groups, may give rise to alkylation of the N-terminal Cys residue to produce an NCL-unreactive thiazolidine. The thiazolidine group can be removed, however, by treatment with methoxyamine to yield the free N-terminal Cys residue.
Despite the simplicity and robustness of NCL, which have resulted in its widespread application, this extremely powerful technology is limited at present to the synthesis of small to moderately sized proteins, mainly because SPPS is currently restricted to peptides of ≈60 amino acids in length and because of the difficulties associated with performing multiple ligation steps.
One way to overcome this size limitation is to combine the NCL approach with recombinant protein production. This can be accomplished via two different approaches. First, a synthetic peptide thioester can be reacted by NCL with a recombinant protein bearing an N-terminal Cys residue, which allows the introduction of chemically modified peptides at the N-terminus of recombinant proteins. The other possibility involves the ligation of a synthetic N-terminal Cys peptide with a recombinant protein α-thioester. The latter approach was first reported in 1998 [9, 10, 38, 39] and was termed expressed protein ligation (EPL) or, less frequently, intein-mediated protein ligation.
Recombinant protein α-thioesters can be obtained by using engineered inteins [9, 16, 38, 39]. Inteins are self-processing domains that mediate the naturally occurring process called protein splicing (Fig. 2). Protein splicing is a cellular processing event that occurs post-translationally at the polypeptide level. In this multi-step process an internal polypeptide fragment, called an intein, is self-excised from a precursor protein and in the process ligates the flanking protein sequences (N- and C-exteins) to give a different protein. The current understanding of the mechanism is summarized in Figure 2A and involves the formation of thioester/ester intermediates. The first step in the splicing process involves an N→S or N→O acyl shift in which the N-extein is transferred to the thiol/alcohol group of the first residue of the intein. After the initial N→(S/O) acyl shift, a trans-esterification step occurs in which the N-extein is transferred to the side-chain of a second conserved Cys, Ser or Thr residue, this time located at the junction between the intein and the C-extein. The amide bond at this junction is then broken as a result of succinimide formation involving a conserved Asn residue within the intein. In the final step of the process, a peptide bond is formed between the N-extein and C-extein following an (S/O)→N acyl shift (similar to the last step of NCL, see Fig. 1A). Mutation of the conserved Asn residue within the intein to Ala blocks the splicing process in midstream, thus resulting in the formation of an α-thioester linkage between the N-extein and the intein (Fig. 2B). This thioester bond can be cleaved using an appropriate thiol through a trans-thioesterification step to give the corresponding recombinant polypeptide α-thioester.
Several modified inteins are commonly used for this purpose and many are commercially available as E. coli expression vectors [41, 42]. One of the most generally useful inteins is the Mycobacterium xenopi DNA gyrase A (Mxe GyrA) intein. This intein has several important features for this purpose: (1) it is relatively small (198 amino acids) and can be expressed very efficiently in E. coli; (2) it does not have special sequence preferences for the last residues of the N-extein fragment; (3) the thiolysis reaction can be performed in the presence of detergents [43, 44], small amounts of denaturing agents and organic solvents; and (4) the GyrA intein can be efficiently refolded, thus allowing the recovery of intein-fusion proteins from E. coli inclusion bodies.
The introduction of N-terminal Cys residues into expressed proteins can be readily accomplished by cleaving (by proteolysis or auto-proteolysis) the appropriate fusion proteins. The simplest way to generate a recombinant polypeptide containing an N-terminal Cys residue is to introduce a Cys immediately downstream of the initiating Met residue. Once the translation step is completed, the endogenous methionyl aminopeptidases (MAP) remove the Met residue, thereby generating an N-terminal Cys residue in vivo [45–49]. Other approaches involve the use of exogenous proteases. Verdine and co-workers added a Factor Xa recognition sequence immediately in front of the N-terminal Cys residue of the protein of interest. After purification, the fusion protein was treated with the protease Factor Xa, which generated the corresponding N-terminal Cys protein. Tolbert and Wong have also shown that the cysteine protease from tobacco etch virus (TEV) can be used for the same purpose. This protease is highly specific and it can be overexpressed in E. coli. Other proteases that cleave at the C-terminal side of their recognition site, like enterokinase and ubiquitin C-terminal hydrolase, could be used for the generation of N-terminal Cys residues as well. Proteins with an N-terminal Cys can also be obtained by conveniently modifying expression vectors so that the putative thrombin cleavage site LVPRG is changed to LVPRC. More recently, Hauser et al.  have used the N-terminal pelB leader sequence to direct newly synthesized fusion proteins to the E. coli periplasmic space, where the corresponding endogenous leader peptidases [53, 54] can generate the desired N-terminal cysteine-containing protein fragment.
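The proteolytic strategies above share one requirement: cleavage must leave Cys as the first residue of the released fragment. The sketch below illustrates that check in Python. It is a toy string model under stated assumptions: the function names, the example fusion sequences and the `SITES` table are hypothetical, the motifs are the commonly quoted recognition sequences (IEGR for Factor Xa, LVPR for thrombin) with cleavage placed immediately after the motif, and real protease specificity is more complex than a substring match.

```python
# Toy model of generating an N-terminal Cys by proteolytic processing.
# Illustrative sketch only: a substring match stands in for protease
# specificity, and all names and sequences are hypothetical.

# Recognition motifs; cleavage is assumed to occur right after the motif.
SITES = {
    "factor_xa": "IEGR",  # commonly quoted Factor Xa site
    "thrombin": "LVPR",   # thrombin site, followed by G in standard vectors
}


def release_fragment(fusion: str, protease: str):
    """Return the sequence released downstream of the first cleavage site.

    Returns None when the fusion does not contain the recognition motif.
    """
    motif = SITES[protease]
    index = fusion.find(motif)
    if index == -1:
        return None
    return fusion[index + len(motif):]


def remove_initiator_met(sequence: str) -> str:
    """Mimic Met removal by methionyl aminopeptidase for a Met-Cys start."""
    return sequence[1:] if sequence.startswith("MC") else sequence


def has_n_terminal_cys(sequence) -> bool:
    """True when the fragment exists and begins with Cys."""
    return bool(sequence) and sequence[0] == "C"


if __name__ == "__main__":
    standard = "MHHHHHHLVPRGSAKT"   # unmodified thrombin site (LVPRG)
    modified = "MHHHHHHLVPRCSAKT"   # Gly changed to Cys (LVPRC)
    for name, seq in (("standard", standard), ("modified", modified)):
        fragment = release_fragment(seq, "thrombin")
        print(name, fragment, has_n_terminal_cys(fragment))
    print(remove_initiator_met("MCSAKT"))
```

In this model only the LVPRC construct yields a fragment that could take part in NCL, which mirrors the vector modification described in the text.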
Finally, protein splicing can also be engineered to produce recombinant N-terminal Cys-containing polypeptides. Several inteins have already been mutated in such a way that cleavage at the C-terminal splice junction (i.e. between the intein and the C-extein, see Fig. 2B) can be accomplished in a pH- and temperature-dependent fashion [10, 55, 56].
The initial scope of EPL was the site-specific introduction of chemical modifications at the C-terminus of recombinant proteins. Since then, this technology has been used for a variety of other purposes, including the incorporation of new chemical groups to evaluate novel bioconjugation techniques [57–60], protein immobilization on solid supports [61, 62], polypeptide backbone cyclization [47, 63–65], incorporation of non-natural amino acids [43, 66] and optical probes [67, 68], isotopic editing [44, 69–72], and semi-synthesis of prenylated proteins [73–75], among others. In this section we will highlight just a few representative examples that serve to illustrate the power of this technology to allow the detailed analysis of protein function and structure.
Many experimental approaches in biology and biophysics, as well as applications in diagnosis and drug discovery, require proteins to be immobilized on solid substrates. Immobilized proteins are instrumental in identifying protein-protein, protein-DNA, and protein-small molecule interactions; they can also be used for a variety of diagnostic and profiling purposes (see ref.  for a recent review).
Although enormous progress has been made in immobilizing DNA onto different types of solid supports, the immobilization of proteins has been a particularly challenging task, mainly due to the heterogeneous chemical nature of proteins and the marginal stability of the native, active tertiary structure over the denatured, inactive random-coil structure.
In 2003 we reported the use of EPL for the covalent and site-specific attachment of recombinant proteins onto glass surfaces. In this work, C-terminal α-thioesters of two fluorescent proteins (EGFP and DsRed) and the c-Crk SH3 domain were immobilized onto an N-terminal Cys-coated glass slide (Fig. 3). The reaction was highly selective, allowing the covalent immobilization of folded and biologically active proteins through their C-termini.
Yao and co-workers have also used NCL and EPL for the selective immobilization of N-terminal Cys-containing polypeptides and proteins onto α-thioester-coated glass slides. In this case, the polypeptides and proteins were site-specifically immobilized through their N-termini, which may be convenient in cases where the C-terminal immobilization described earlier affects the activity of the protein.
EPL has also been used for the site-specific introduction of reactive groups such as alkynes [59, 78] and azides [58, 60], which can later be used for the site-specific immobilization of proteins onto modified solid supports using Staudinger and/or Cu(I)-catalyzed Huisgen 1,3-dipolar cycloaddition reactions (see references [1, 79] for recent reviews).
Yao and co-workers have also used EPL for the site-specific biotinylation of proteins so that they can be selectively immobilized onto streptavidin/avidin-coated solid supports [80, 81]. This reaction was performed by either in vivo or in vitro cleavage of the corresponding modified-intein fusion protein with N-Cys biotinylated peptides. More recently, Beck-Sickinger and co-workers have used a similar approach for the immobilization of the biotinylated aldoreductase AKR1A. Investigation of the kinetic parameters of the immobilized enzyme showed they were comparable to those of the wild-type enzyme in solution and 60–300-fold greater than those of the randomly immobilized enzymes. Furthermore, the enzyme was surprisingly stable. No loss of activity was observed for over a week, and even after 50 days more than 35% of activity was maintained.
A significant number of natural products with a wide range of pharmacological activities derive from cyclic polypeptides. In fact, peptide cyclization is widely used in medicinal chemistry to improve the biochemical and biophysical properties of peptide-based drug candidates [83, 84]. Cyclization rigidifies the polypeptide backbone structure, thereby minimizing the entropic cost of receptor binding and also improving the stability of the topologically constrained polypeptide. Among the different approaches used to cyclize polypeptides, backbone or head-to-tail cyclization remains one of the most extensively used to introduce structural constraints into biologically active peptides.
Although the chemical synthesis of cyclic peptides has been well explored and a number of different solid-phase and liquid-phase approaches exist [23, 85–88], the biosynthesis of cyclic polypeptides offers many advantages over purely synthetic methods. Using the tools of molecular biology, large combinatorial libraries of cyclic peptides may be generated and screened in vivo.
A typical chemical synthesis may generate ≈10⁴ different molecules. It is not uncommon for a recombinant library to contain as many as ≈10⁹ members. The molecular diversity generated by this approach is analogous to phage-display technology. Moreover, this approach takes advantage of the enhanced pharmacological properties of backbone-cyclized peptides as opposed to linear peptides or disulfide-stabilized polypeptides. Also, the approach differs from phage-display in that the backbone-cyclized polypeptides are not fused to or displayed by any viral particle or protein, but remain on the inside of the living cell where they can be further screened for biological activity. The complex cellular cytoplasm provides the appropriate environment to address the physiological relevance of potential leads.
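To put the two library sizes in perspective, the following Python sketch estimates how many sequence positions could be fully randomized within a given library size. It is a back-of-the-envelope calculation under assumptions that are not in the original text: every randomized position is taken to sample all 20 genetically encoded amino acids uniformly, and the function name `max_randomized_positions` is hypothetical.

```python
# Rough estimate of sequence space covered by a library of a given size.
# Illustrative sketch only: assumes uniform sampling of 20 amino acids
# at every randomized position, which real libraries do not achieve.


def max_randomized_positions(library_size: int, alphabet_size: int = 20) -> int:
    """Largest n such that alphabet_size ** n does not exceed library_size."""
    if library_size < 1 or alphabet_size < 2:
        raise ValueError("library_size must be >= 1 and alphabet_size >= 2")
    positions = 0
    variants = 1
    # Integer arithmetic avoids floating-point error near exact powers.
    while variants * alphabet_size <= library_size:
        variants *= alphabet_size
        positions += 1
    return positions


if __name__ == "__main__":
    for label, size in (("chemical synthesis", 10**4), ("recombinant library", 10**9)):
        print(label, size, max_randomized_positions(size))
```

Under these assumptions a library of about 10⁴ members exhaustively covers three randomized positions, whereas one of about 10⁹ members covers six.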
An attractive alternative approach to the biosynthesis of circular polypeptides is the use of an intramolecular version of the EPL reaction. The approach employed for the biosynthesis of backbone-cyclized polypeptides using EPL is depicted in Fig. 4. The target polypeptide to be cyclized is fused at the N-terminus with a peptide leader sequence immediately followed by a Cys residue, and at the C-terminus with an engineered intein. The N-terminal leader sequence can be cleaved in vitro or in vivo by a proteolytic or self-proteolytic event, thereby generating the required N-terminal Cys residue. This Cys residue then reacts in an intramolecular fashion with the α-thioester generated by the engineered intein at the C-terminus, thus providing a recombinantly generated backbone-cyclized polypeptide. This approach has been used for the in vitro and in vivo biosynthesis of different backbone-cyclized polypeptides [48, 63–65, 89].
This biosynthetic cyclization strategy was first demonstrated in vitro and in vivo by Camarero and Muir using the N-terminal SH3 domain of the c-Crk protein as a model protein [63, 90]. Iwai and Plückthun also reported the biosynthesis of a cyclized version of the β-lactamase protein using a similar approach.
More recently, we have applied the same approach to the biosynthesis of cyclotides inside living bacterial cells (Fig. 4). Cyclotides are small globular microproteins with a unique head-to-tail cyclized backbone, which is stabilized by three disulfide bonds (Fig. 4). The number and positions of cysteine residues are conserved throughout the family, forming the cyclic cystine-knot (CCK) motif that acts as a highly stable and versatile scaffold on which hyper-variable loops are arranged. This CCK framework gives the cyclotides exceptional resistance to thermal and chemical denaturation and enzymatic degradation. Moreover, several cyclotides have been found to be able to cross eukaryotic cell membranes. All these unique properties make them ideal candidates for the development of peptide-based drugs. Our group has recently developed and successfully used a bio-mimetic approach for the biosynthesis of several folded cyclotides inside cells by making use of intramolecular NCL in combination with modified protein splicing units [64, 65] (Fig. 4). This important finding makes possible the generation of large libraries of cyclotides (≈10⁹) for high-throughput cell-based screening and selection of specific sequences able to recognize particular biomolecular targets [64, 65].
The development of EPL has helped to overcome some of the size limitations associated with the structural analysis of proteins by nuclear magnetic resonance (NMR). Although observable, NMR signals from large proteins exhibit extreme spectral overlap, which cannot be resolved even in 3D- or 4D-NMR spectra. One way to decrease such spectral complexity is to use samples where only selected amino acids are labeled with NMR-active nuclei, thus editing out signals from the rest of the molecule. EPL allows the site-specific introduction of specific NMR-active isotopes within a protein, thus facilitating the assignment of resonances from the labeled residues. In one recent example, Muir and co-workers were able to label the N-extein residue located at the intein junction of the GyrA intein with 13C and 15N using EPL. This allowed the 1JNC coupling constant of the amide bond at the N-extein-intein junction to be measured on an active intein. The data indicated that this peptide bond was highly polarized, supporting the idea that the first step in protein splicing is facilitated, in part, by destabilizing the scissile amide bond. Baranski and co-workers used EPL to introduce 13C-labeled amino acids at the carboxyl terminus of the α subunit of a heterotrimeric G protein. Analysis of the 13C resonances revealed that the Gα carboxyl terminus is highly mobile in its GDP-bound state but adopts an ordered conformation upon activation of the subunit. The authors suggest that this conformational change may facilitate the release of the Gα subunit from the G-protein-coupled receptor.
EPL also allows the generation of segmentally isotope-labeled proteins for structural studies using NMR. Using EPL, it is possible to ligate a uniformly labeled protein fragment to the rest of the unlabeled protein. This approach dramatically decreases the spectral complexity of the sample, so that the signals from the labeled fragment can be analyzed in the context of the whole protein (Fig. 5).
Allain and co-workers have used an optimized on-column EPL protocol in combination with transverse relaxation-optimized NMR spectroscopy to elucidate the structure of the two C-terminal RNA recognition motifs (RRM3 and RRM4) of the polypyrimidine tract binding protein (PTB). Although preliminary studies showed that these two domains do not interact in the free state, they interact extensively when bound to RNA. EPL allowed the production of PTB constructs where either RRM3 or RRM4 was labeled with 13C and 15N, thus allowing the rapid elucidation of the structure of the construct containing both domains. The final structure revealed a large interdomain interface, resulting in a very unusual positioning of the two RRM domains relative to one another. Based on these results, the authors suggest that this unusual structure induces the formation of RNA loops, which could repress splicing by sequestering either a short alternative exon or a branch point within these RNA loops. The same authors have also used a similar EPL approach to study the intramolecular interactions in two other RRM-containing proteins, heterogeneous nuclear ribonucleoprotein L (hnRNP L) and Npl3. The results indicated that the RRMs of hnRNP L interact, whereas those of Npl3 are independent.
A similar on-column procedure has also been reported recently for the preparation of segmentally labeled constructs of the lipoprotein apolipoprotein E3 (apoE) in order to elucidate its structure. This protein, involved in the catabolism of lipids, contains a 22 kDa N-terminal domain and a 10 kDa C-terminal domain linked by a protease-sensitive hinge region. A domain-domain interaction that seems to regulate the biological functions of apoE has been hypothesized between its two domains. Although the structure of the apoE N-terminal domain in the lipid-free state is known, there is no structure available to date for the apoE C-terminal domain or for full-length apoE. Since apoE is a 299-residue helical protein, its NMR spectrum is significantly overlapped. To reduce NMR spectral complexity for a complete spectral assignment, the authors produced several segmentally labeled apoEs, in which one domain was 13C/15N labeled whereas the other domain was deuterated.
Access to isotopically labeled proteins has also extended the scope of vibrational spectroscopy for the structural analysis of proteins. An example of this has been recently illustrated by Tatulian and co-workers, who used EPL to generate a segmentally 13C-labeled version of phospholipase A2 (PLA2) and combined it with polarized attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy to assess the mode of interaction of this protein with membranes. The use of a segmentally labeled PLA2, where only two of the three α-helices of this protein were 13C-labeled, made it possible to assign the amide I signals from the labeled α-helices. This information was used to generate structural constraints that established the orientation of the membrane-bound PLA2. This approach is in principle quite general and is likely to become a fundamental tool for determining and analyzing the structure of membrane proteins, which should provide valuable information on the molecular mechanisms of this important class of proteins.
More recently, EPL has also been used for the incorporation of the positron-emitting isotope 18F into leptin. A two-step, site-specific ligation approach was developed for this purpose, in which an aminooxy-reactive group was incorporated at the C-terminus of leptin using EPL. This modified protein was subsequently derivatized with 18F-fluorobenzaldehyde using an aniline-accelerated radiochemical oximation reaction. The modified hormone was shown to be biologically active in vitro and in vivo, and it was applied to positron emission tomography (PET) imaging in mice lacking a functional leptin gene.
EPL has also made possible the detailed structural study of proteins containing specific post-translational modifications (PTMs). Homogeneous protein samples containing chemically well-defined PTMs are extremely difficult to generate by using standard protein expression methods. PTMs are extremely important in regulating protein function. One of the most common PTMs found in eukaryotic cells is phosphorylation of Ser, Thr and Tyr residues. It has been estimated that as many as a third of all mammalian proteins are phosphorylated. In fact, one of the first applications of EPL was to produce a semi-synthetic version of the kinase Csk where one of the C-terminal Tyr residues was site-specifically phosphorylated. This seminal work served to illustrate the enormous potential of expressed protein ligation as a simple and powerful new method in protein engineering to introduce sequences of unnatural amino acids, post-translational modifications, and biophysical probes into proteins of any size. More recently, EPL has been used to prepare phosphorylated versions of the TGF-β signaling proteins Smad2 and Smad3 [26, 99, 100]. These proteins are mammalian transcription factors that upon phosphorylation are able to oligomerize and be translocated to the nucleus, where they control transcription. The preparation of semi-synthetic Smad proteins phosphorylated at specific Ser and Thr residues has allowed several high-resolution crystal structures to be determined [101–104]. These structural data have led to a detailed understanding of the interactions underlying the Smad homo- and hetero-oligomerization involved in the activation of this family of transcription factors.
Another type of PTM found in eukaryotic proteins is prenylation. This interesting PTM is found in Rab GTPases. These proteins are members of the class of monomeric GTPases and play a critical role in membrane trafficking. Rabs are anchored to the membrane by PTM with lipids at the C-terminus. Lipid modifications, and in particular prenylation, are extremely challenging to introduce because of the low solubility and stability of the lipidated protein. The development of effective ligation conditions and synthetic approaches for the production of prenylated proteins is a remarkable achievement (see ref.  for an extensive review).
Waldmann and co-workers have extensively used EPL to prepare mono- and bi-prenylated versions of the small Rab/Ypt protein family member Ypt1 [105, 106]. The co-crystal structures of these proteins with Rab GDP dissociation inhibitor (RabGDI) were determined, providing structural insight into how RabGDI is able to extract lipidated Rabs from membranes to facilitate their translocation to acceptor membranes [107–109]. Another important field where EPL has made important contributions is the study of the biological effects of PTMs on histones. Histones have flexible N-terminal tails that are heavily post-translationally modified. These modifications regulate both the structure and function of chromatin in transcription, replication, repair and condensation. Both the position and the nature of the modifications control the biological effects, and numerous enzymes have been identified that are able to introduce or remove such modifications.
Several groups have recently used protein ligation techniques for preparing homogeneously modified full-length histones. Peterson and co-workers reported the preparation of a histone H3 variant with phosphorylated Ser at position 10. This homogeneously modified histone was then incorporated into nucleosomal arrays and used in a series of enzymatic reactions involving the histone acetyltransferase Gcn5. Kinetic analysis revealed that Gcn5 had increased activity on the modified nucleosomes, while the Gcn5-containing SAGA complex was not stimulated by H3 phosphorylation in the context of nucleosomal arrays. By using a similar approach, a histone H4 variant with an ε-acetylated Lys residue at position 16 was also prepared by the same group. The incorporation of this modified histone into nucleosomal arrays inhibited the formation of compact 30-nanometer-like fibers and impeded the ability of chromatin to form cross-fiber interactions. Furthermore, this acetylated histone also inhibited the ability of the adenosine triphosphate-utilizing chromatin assembly and remodeling enzyme ACF to mobilize a mononucleosome, indicating that this single histone modification modulates both higher-order chromatin structure and functional interactions between a non-histone protein and the chromatin fiber.
McCafferty and co-workers also used chemical ligation to prepare several site-specifically modified histones, including an acetylated and methylated H3 and an acetylated H4. In this work, the semisynthetic approach to generating the modified histones was extended by adding a desulfurization step after the ligation, which converts the Cys residue at the ligation site into Ala and thereby makes the ligation traceless. The modified histones were fully functional, as evidenced by their self-assembly into a higher-order H3/H4 heterotetramer, their deposition into regular nucleosome arrays, and their utilization as substrates for histone-modifying enzymes.
Muir and co-workers have also developed a traceless semisynthetic approach for the preparation of ubiquitylated and sumoylated histone-derived peptides. This approach was recently extended to the preparation of a modified full-length H2B histone. Ubiquitylated H2B was incorporated into nucleosomes, and it was demonstrated that this modification stimulates intranucleosomal methylation of H3 Lys79 by the methyltransferase hDot1L. According to the authors, this effect is mediated through the catalytic domain of hDot1L, most likely through allosteric mechanisms. This result provides direct biochemical evidence of crosstalk between two modifications on separate histone proteins within a nucleosome.
Another important PTM found in eukaryotic proteins is glycosylation. It is estimated that more than half of the proteins present in nature are glycosylated. These carbohydrate chains play various essential roles in processes such as protein folding, cell adhesion, cell differentiation, and tumor metastasis. In contrast to nucleotide and amino acid sequences, the structure of sugar chains is not determined genetically, but solely by the activity of enzymes such as glycosyltransferases. As a result, the carbohydrate structure on the same amino acid sequence is highly variable, giving rise to variants known as glycoforms. Recent advances in the chemical synthesis of glycopeptides have provided, for the first time, a reliable way to obtain homogeneous glycopeptides or glycoproteins (see ref.  for a recent review).
EPL and NCL have been extensively used for the preparation of homogeneous glycoprotein samples. Bertozzi and co-workers prepared a variant of diptericin, an 82-residue antibacterial glycoprotein produced by insects in response to immunological challenge. Native diptericin exists as a mixture of O-linked glycoforms; one of the simplest of them possesses single GalNAc residues at the two glycosylation sites, Thr10 and Thr54. This glycoform was prepared by NCL of two synthetic glycopeptides, each of which was generated by Fmoc-based SPPS. The resulting glycoprotein was biologically active in bacterial growth inhibition assays. The same group also succeeded in the synthesis of lymphotactin, a 93-residue chemokine containing eight sites of O-linked glycosylation, using a similar approach, except that the thioester fragment was synthesized by Boc-based SPPS. NCL was also used by Imperiali and co-workers for the preparation of an N-linked chitobiose glycoprotein analogue of Im7, an 87-residue protein. Im7 is a member of a family of four homologous E colicin immunity proteins, the function of which is to inhibit the endonuclease domain of their specific bacterial colicin toxin. Im7, while not naturally glycosylated, was used as a model system for studying the effects of glycosylation on protein folding and stability. The reported results indicated that the folding mechanism of the glycosylated Im7 variant was not significantly altered relative to that of the unglycosylated analogue.
The use of EPL in this context was first reported by Tolbert and Wong for the ligation of a 392-residue intein-generated α-thioester and an N-Cys dipeptide functionalized with a single N-acetylglucosamine residue. EPL was also used for the semi-synthesis of GlyCAM-1, a 132-residue mucin-like glycoprotein that functions as a ligand for L-selectin.
An interesting example of using EPL to incorporate unnatural amino acids into proteins in order to explore their structure and function was reported by Muir and co-workers, who carried out the semi-synthesis of a potassium channel analog with a D-amino acid located at the selectivity filter [43, 66]. Potassium channels are integral membrane proteins that permit the rapid and selective conduction of potassium ions across biological membranes. The recent elucidation of the crystal structures of the bacterial potassium channel KcsA has led to unprecedented insights into the basis of the selectivity for potassium over sodium and other cations. The selectivity filter of KcsA includes Gly77, which upon mutation to Ala results in loss of function. Gly77 adopts a left-handed helical conformation, but its precise contribution to potassium channel function had been unclear. It was considered that a D-amino acid at this position would maintain the left-handed helical conformation and that the Gly is effectively serving as a D-amino acid surrogate. To test this hypothesis, a KcsA analog with Gly77 replaced by D-Ala was prepared using EPL. The crystal structure of this analog revealed that, in the presence of high [K+], the D-Ala replacement had no effect on the structure of the filter, as expected. Interestingly, the D-Ala-containing selectivity filter remained in the conductive conformation even at low [K+], and was able to conduct Na+ in the absence of K+ ions. However, the channel was still completely selective for K+ in the presence of Na+. The same authors also explored the replacement of a native peptide bond (Tyr78-Gly79) within the selectivity filter with an ester bond. This subtle replacement is nearly isosteric but is expected to reduce the electronegativity of the carbonyl group by ≈50%. The crystal structure of this modified channel was studied under different conditions. The isosteric ester substitution did not have a significant effect on the backbone structure of the selectivity filter but, as anticipated, significantly reduced the ion density at that particular location within the selectivity filter. These studies demonstrate how nature can use Gly in lieu of a D-amino acid in a protein to achieve a desired structure-function relationship. The semi-synthesis of the different analogs of this integral membrane protein and their assembly into a functional tetrameric form represent a technical milestone in the protein chemistry arena.
Of all the potential chemical modifications of a protein, fluorescent labeling has perhaps been one of the most widely used in biological research. Nature provides two natural fluorescent amino acids, Tyr and Trp, although Trp is the more sensitive and by far the most used to monitor protein folding transitions and ligand binding events. However, large proteins usually contain more than one Trp residue, which reduces the spectral resolution of the analysis. Protein ligation can expand the repertoire of fluorescent probes that can be introduced into proteins, and a large number of fluorescently labeled proteins have been prepared by EPL [18, 121]. Most of these proteins have been used to study ligand binding events, either by following the change in the fluorescence of a single probe or by using fluorescence resonance energy transfer (FRET) between donor and acceptor probes introduced site-specifically into the protein of interest.
An example of the first approach was reported by Muir and co-workers, who combined EPL and in vivo replacement of tryptophan with 7-azatryptophan (7AW, a tryptophan analogue) to produce a 7AW-labeled SH3 domain from the c-Crk-I adaptor protein. This was accomplished using a Trp-auxotrophic E. coli strain. 7AW is isosteric with tryptophan, but its fluorescence excitation and emission properties are red-shifted. Chemical ligation of the 7AW-labeled SH3 domain to the c-Crk-I Src homology 2 (SH2) domain, via EPL, generated the multidomain protein c-Crk-I with a domain-specific label. Use of this non-invasive optical probe allowed the equilibrium stability and ligand-binding properties of the SH3 domain to be studied unambiguously in the context of the full-length protein. Lorsch and co-workers have also used EPL for the site-specific labeling of eIF1A and eIF1 with fluorescein and rhodamine [122, 123]. These two translation initiation factors are required for the formation of the 43S mRNA-ribosomal subunit complex. A combination of fluorescence anisotropy and FRET measurements revealed that eIF1 and eIF1A are close to each other when initially binding to the ribosome, but that eIF1 dissociates after start codon recognition.
Cole and co-workers have also used EPL in combination with FRET spectroscopy to study serotonin N-acetyltransferase. Serotonin N-acetyltransferase [arylalkylamine N-acetyltransferase (AANAT)] is a key circadian rhythm enzyme that drives the nocturnal production of melatonin in the pineal gland. EPL was used to generate fluorescent versions of AANAT and the protein 14-3-3ζ in order to develop a rapid fluorescence-based assay for studying the interaction between AANAT and 14-3-3ζ. EPL was also used to generate doubly fluorescently labeled AANAT, which could be used to assess the stability of this protein in live cells in real time by FRET measured by microscopic imaging.
More recently, Zheng and co-workers have used EPL to site-specifically label the histone acetyltransferases (HATs) PCAF and p300 with FRET quenchers. HATs are an important class of epigenetic enzymes involved in chromatin restructuring and transcriptional regulation. The labeled proteins were used to develop a novel assay for the identification and characterization of HAT inhibitors using both FRET and fluorescence polarization. This strategy should be useful in the search for new anticancer drugs that target the substrate interfaces of the HATs, and should also prove valuable in mechanistic studies of HATs.
The field of protein engineering has been dramatically enhanced since the capability of producing recombinant α-thioesters with modified inteins was combined with NCL. This new set of technologies has been applied to many problems in biochemistry and biophysics. The fundamental strength of this technology is that it allows the preparation of homogeneous samples of proteins with site-specific chemical modifications on a scale sufficient for study by standard analytical methods. In principle, any chemical modification can be introduced into proteins using EPL, as long as the final product is stable. Careful optimization of the conditions employed during the ligation reactions allows the preparation of extremely challenging molecules such as membrane proteins (the potassium channel KcsA) or lipidated proteins (prenylated Rabs). In this review we have tried to show a representative sample of applications of EPL to the study of protein structure and function; however, the number of applications is starting to grow exponentially. For example, EPL shows special promise in the area of nanotechnology, where this mild ligation approach could be used for the site-specific immobilization of proteins onto nanoparticles and solid supports. Another field where EPL should prove increasingly useful is the development of complex protein-based therapeutics. For example, EPL has been used for the in vivo and in vitro synthesis of several cyclotides [64, 65]. Cyclotides are extremely stable micro-proteins that show special promise for the development of a new type of protein-based therapeutics. EPL will also continue to be used for the incorporation of increasingly complex PTMs, for example polysaccharide-containing modifications such as GPI anchors, and for the preparation of proteins with multiple modifications.
In summary, in spite of continuing challenges, the remarkable developments in protein semi-synthesis over the past decade assure a bright future for EPL in meeting the protein engineering challenges of this century.