The present paper reviews the use of protein splicing for the biosynthesis of backbone cyclic polypeptides. This general method allows the in vivo and in vitro biosynthesis of cyclic polypeptides using recombinant DNA expression techniques. Biosynthetic access to backbone cyclic peptides opens the possibility of generating cell-based combinatorial libraries that can be screened inside living cells for their ability to attenuate or inhibit cellular processes, thus providing a new way of finding therapeutic agents.
A significant number of natural products with a wide range of pharmacological activities are derived from cyclic polypeptides. In fact, peptide cyclization is widely used in medicinal chemistry to improve the biochemical and biophysical properties of peptide-based drug candidates [1, 2]. Cyclization rigidifies the polypeptide backbone, thereby minimizing the entropic cost of receptor binding and improving the stability of the topologically constrained polypeptide. Among the different approaches used to cyclize polypeptides, backbone (head-to-tail) cyclization remains one of the most widely used for introducing structural constraints into biologically active peptides.
Although the chemical synthesis of cyclic peptides has been well explored and a number of different solid-phase and solution-phase approaches exist [3–7], recent developments in molecular biology and protein engineering have now made the biosynthesis of cyclic peptides possible (Scheme 1). This progress has been made mainly in two areas: non-ribosomal peptide synthesis [8–10] and expressed protein ligation/protein trans-splicing [11–16]. The former strategy involves the use of genetically engineered non-ribosomal peptide synthetases and is reminiscent of more established technologies that yield novel polyketides. The latter strategy relies on the heterologous expression of recombinant proteins fused to modified intein protein splicing/trans-splicing units.
The biosynthesis of cyclic polypeptides offers many advantages over purely synthetic methods. Using the tools of molecular biology, large combinatorial libraries of cyclic peptides can be generated and screened in vivo. A typical chemical synthesis may generate 10⁴ different molecules, whereas it is not uncommon for a recombinant library to contain as many as 10⁹ members. The molecular diversity generated by this approach is thus comparable to that of phage-display technology. Moreover, this approach takes advantage of the enhanced pharmacological properties of backbone-cyclized peptides compared with linear peptides or disulfide-stabilized polypeptides. The approach also differs from phage display in that the backbone-cyclized polypeptides are not fused to or displayed by any viral particle or protein, but remain inside the living cell, where they can be screened for biological activity much as in the yeast two-hybrid system. The complex cellular cytoplasm provides an appropriate environment for assessing the physiological relevance of potential leads.
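The library sizes quoted above can be put in perspective with simple arithmetic. The sketch below assumes that each fully randomized position can be any of the 20 proteinogenic amino acids. It computes the theoretical sequence space for a given number of randomized positions and the largest fraction of that space a library of a given size could sample. The helper names are illustrative, and the 10⁴ and 10⁹ sizes are the figures from the text.

```python
def sequence_space(randomized_positions: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences if every randomized position can be any residue."""
    return alphabet_size ** randomized_positions


def max_coverage(library_size: int, randomized_positions: int) -> float:
    """Upper bound on the fraction of sequence space a library can sample.

    Assumes every library member is a distinct sequence; real libraries contain
    duplicates and codon bias, so actual coverage is lower.
    """
    return min(1.0, library_size / sequence_space(randomized_positions))


if __name__ == "__main__":
    CHEMICAL, RECOMBINANT = 10**4, 10**9  # illustrative library sizes quoted in the text
    for n in range(4, 9):
        print(f"{n} randomized positions: {sequence_space(n):.1e} sequences; "
              f"max coverage {max_coverage(CHEMICAL, n):.1e} (chemical) vs "
              f"{max_coverage(RECOMBINANT, n):.1e} (recombinant)")
```

For six fully randomized positions, 20⁶ = 6.4 × 10⁷. A set of 10⁴ molecules therefore covers at most about 0.016% of that space, whereas a 10⁹-member library could in principle exceed it.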
Protein trans-splicing has been successfully used by Benkovic and co-workers to generate backbone-cyclized polypeptides in vivo. In this approach, the peptide to be cyclized was nested between the two fragments of the naturally occurring Ssp DnaE split intein (usually referred to as the N- and C-inteins) in such a way that the N-terminus of the peptide template was fused to the C-intein fragment and its C-terminus to the N-intein fragment. Protein splicing of this chimeric protein led to the formation of the desired cyclic peptide inside E. coli cells. A potential limitation of this approach, however, was the requirement for specific N- and C-extein residues at the intein junction sites. These amino acids were necessary for efficient protein splicing to occur, which restricts the sequence diversity of the cyclic peptide.
An attractive alternative approach to the biosynthesis of circular polypeptides is the use of an intramolecular version of the Native Chemical Ligation reaction [21–23]. The present paper reviews the use of these processes for the biosynthesis of circular polypeptides (i.e. peptides and proteins) and also discusses the potential of these methods for the biosynthesis of cyclic polypeptide libraries inside living cells as a complementary source for the rapid discovery of new therapeutics.
Native Chemical Ligation (NCL) is an exquisitely specific ligation reaction that has been extensively used for the total synthesis, semi-synthesis and engineering of different proteins [22, 24–26]. In this reaction, two fully unprotected polypeptides, one containing a C-terminal α-thioester group and the other an N-terminal Cys residue, react chemoselectively under neutral aqueous conditions to form a native peptide bond (Fig. 1A). The initial step in this ligation is the formation of a thioester-linked intermediate, which is generated by a trans-thioesterification reaction between the α-thioester moiety of one fragment and the N-terminal Cys thiol group of the other fragment. This intermediate then spontaneously rearranges to produce a peptide bond at the ligation site. This type of thioester-based chemistry was first described by Wieland in the 1950s for the synthesis of small Cys-containing peptides [27, 28].
It is well established that when these two reactive groups, i.e. the C-terminal α-thioester group and the N-terminal Cys residue, are located in the same synthetic precursor, the chemical ligation proceeds in an intramolecular fashion thus resulting in the efficient formation of a circular polypeptide (Fig. 1B). This reaction has been successfully employed for the chemical synthesis of cyclic peptides and small protein domains [3, 5–7].
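The sequence-level outcome of inter- and intramolecular NCL (Fig. 1A and 1B) can be sketched as a toy model. Peptides are treated as one-letter strings and the chemistry is not simulated. The model encodes only the rules stated above: the thioester fragment is joined to a fragment that begins with Cys, and when both groups sit on one precursor the product is a ring. Writing a cyclic sequence as its lexicographically smallest rotation is an illustrative convention chosen here, and the function names are hypothetical.

```python
def ncl(thioester_fragment: str, cys_fragment: str) -> str:
    """Intermolecular NCL at the sequence level (Fig. 1A).

    Trans-thioesterification by the Cys thiol followed by the S->N acyl shift leaves
    a native peptide bond, so the product is simply the two sequences joined.
    """
    if not cys_fragment.startswith("C"):
        raise ValueError("the second fragment must have an N-terminal Cys")
    return thioester_fragment + cys_fragment


def canonical_ring(seq: str) -> str:
    """Write a cyclic sequence as its lexicographically smallest rotation."""
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def intramolecular_ncl(precursor: str) -> str:
    """Head-to-tail NCL (Fig. 1B): N-terminal Cys and C-terminal thioester on one chain."""
    if not precursor.startswith("C"):
        raise ValueError("the precursor must have an N-terminal Cys")
    # The C-terminal residue becomes bonded to the N-terminal Cys: no residues are
    # gained or lost, only the two ends are joined.
    return canonical_ring(precursor)
```

For example, `intramolecular_ncl("CGKAW")` returns `"AWCGK"`, which is the same ring in which Trp is bonded to the former N-terminal Cys.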
The discovery of protein splicing and advances in protein engineering have also made it possible to introduce the C-terminal α-thioester group and the N-terminal Cys residue into recombinant proteins. These developments allowed NCL to be performed between synthetic and/or recombinant fragments. This technology, called Expressed Protein Ligation (EPL), has provided access to a multitude of chemically engineered recombinant proteins, including biosynthetic circular polypeptides.
Recombinant protein α-thioesters can be obtained by using engineered inteins [25, 29–31]. Inteins are self-processing domains that mediate the naturally occurring process called protein splicing (Fig. 2). Protein splicing is a post-translational processing event that occurs at the polypeptide level. In this multi-step process an internal polypeptide segment, the intein, excises itself from a precursor protein and in the process ligates the flanking sequences (N- and C-exteins) to give the mature protein. The current understanding of the mechanism is summarized in Fig. 2A and involves the formation of thioester/ester intermediates. The first step in the splicing process is an N→S or N→O acyl shift in which the N-extein is transferred to the thiol/hydroxyl group of the first residue of the intein. After this initial N→(S/O) acyl shift, a trans-esterification step occurs in which the N-extein is transferred to the side chain of a second conserved Cys, Ser or Thr residue, this time located at the junction between the intein and the C-extein. The amide bond at this junction is then broken as a result of succinimide formation involving a conserved Asn residue within the intein. In the final step of the process, a peptide bond is formed between the N-extein and C-extein following an (S/O)→N acyl shift (similar to the last step of Native Chemical Ligation, see Fig. 1A). Mutation of the conserved Asn residue within the intein to Ala blocks the splicing process midstream, resulting in the formation of an α-thioester linkage between the N-extein and the intein (Fig. 2B). This thioester bond can be cleaved by an appropriate thiol through a trans-thioesterification step to give the corresponding recombinant polypeptide α-thioester.
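The splicing steps, and the way the Asn→Ala mutation diverts them toward an α-thioester, can be summarized in a toy sequence-level model. This is a sketch under simplifying assumptions. The intein is a one-letter string whose first residue must be Cys or Ser and whose last residue is the conserved Asn. Full splicing additionally needs Cys, Ser or Thr as the first C-extein residue, and thioesters are represented only by which fragments are released. The function name and the dictionary keys are hypothetical.

```python
def splice(n_extein: str, intein: str, c_extein: str, thiol_added: bool = True) -> dict:
    """Toy, sequence-level model of protein splicing (Fig. 2); chemistry is not simulated."""
    # Step 1 (N->S/O acyl shift) needs a Cys or Ser at the first intein position.
    if intein[:1] not in ("C", "S"):
        raise ValueError("intein must start with Cys or Ser in this sketch")
    # Steps 2-4: trans-(thio)esterification onto C-extein residue +1 (Cys/Ser/Thr),
    # Asn cyclization (succinimide) at the intein C-terminus, then the S/O->N acyl shift.
    if intein.endswith("N") and c_extein[:1] in ("C", "S", "T"):
        return {"spliced": n_extein + c_extein, "intein": intein}
    # Asn mutated (e.g. Asn->Ala): splicing stalls with the N-extein (thio)ester-linked
    # to the first intein residue; an added thiol releases the N-extein alpha-thioester.
    if thiol_added:
        return {"alpha_thioester": n_extein, "intein_fusion": intein + c_extein}
    return {"precursor": n_extein + intein + c_extein}
```

With a hypothetical intein string ending in Asn, `splice("MKAG", "CAAAN", "CFN")` gives the spliced product `"MKAGCFN"`. Replacing the final Asn with Ala instead releases `"MKAG"` as the α-thioester, which mirrors the IMPACT-style cleavage described next.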
The IMPACT expression system, commercially available from New England Biolabs [33, 34], allows the generation of recombinant α-thioester proteins by making use of such modified inteins in conjunction with a chitin binding domain (CBD) for easy purification by affinity chromatography (see Fig. 2B).
The introduction of N-terminal Cys residues into expressed proteins can be readily accomplished by cleaving (by proteolysis or auto-proteolysis) the appropriate fusion proteins. The simplest way to generate a recombinant polypeptide containing an N-terminal Cys residue is to place a Cys immediately after the initiating Met residue. Once translation is completed, the endogenous methionine aminopeptidase (MAP) removes the Met residue, thereby generating an N-terminal Cys residue in vivo [14, 35–38]. Other approaches involve the use of exogenous proteases. Verdine and co-workers added a Factor Xa recognition sequence immediately in front of the N-terminal Cys residue of the protein of interest. After purification, the fusion protein was treated with Factor Xa, which generated the corresponding N-terminal Cys protein. Tolbert and Wong showed that the cysteine protease from tobacco etch virus (TEV) can also be used for the same purpose. This protease is highly specific and can be overexpressed in E. coli. Other proteases that cleave on the C-terminal side of their recognition site, such as enterokinase and ubiquitin C-terminal hydrolase, could also be used to generate N-terminal Cys residues.
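Both routes to an N-terminal Cys come down to a simple rule about which residue ends up at the new N-terminus. MetAP is generally reported to remove the initiator Met only when the second residue has a small side chain (Gly, Ala, Ser, Cys, Pro, Thr or Val), which is why placing Cys right after Met works. The sketch below encodes that commonly cited rule together with cleavage immediately after a protease recognition motif. The motif strings (IEGR for Factor Xa, ENLYFQ for TEV protease) are the commonly used recognition sequences. Treat them, and the helper names, as illustrative assumptions rather than a full description of either enzyme's specificity.

```python
# Second residues commonly reported to allow MetAP to remove the initiator Met.
METAP_PERMISSIVE = frozenset("GASCPTV")

# Commonly used recognition sequences; cleavage occurs right after the motif.
FACTOR_XA_SITE = "IEGR"  # Ile-Glu-Gly-Arg | P1'
TEV_SITE = "ENLYFQ"      # Glu-Asn-Leu-Tyr-Phe-Gln | P1'


def after_metap(seq: str) -> str:
    """Mature N-terminus after possible removal of the initiator Met by MetAP."""
    if len(seq) > 1 and seq[0] == "M" and seq[1] in METAP_PERMISSIVE:
        return seq[1:]
    return seq


def cleave_after(seq: str, site: str) -> str:
    """Fragment released downstream of the first occurrence of a protease site."""
    i = seq.find(site)
    if i < 0:
        raise ValueError(f"site {site!r} not found")
    return seq[i + len(site):]
```

For instance, `after_metap("MCGKAW")` and `cleave_after("MIEGRCGKAW", FACTOR_XA_SITE)` both return `"CGKAW"`. The second call mirrors the MIEGRC leader used for the SH3 construct discussed below. A Lys in place of the Cys (`"MKCGKAW"`) leaves the Met in place.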
Protein splicing can also be engineered to produce recombinant N-terminal Cys-containing polypeptides. Several inteins have already been mutated so that cleavage at the C-terminal splice junction (i.e. between the intein and the C-extein, see Fig. 2B) can be induced in a pH- and temperature-dependent fashion [41–43].
Proteins with an N-terminal Cys can also be obtained by mutating the putative thrombin cleavage site LVPRG in expression vectors to LVPRC. Liu et al. successfully generated the Csk and Abl tyrosine kinase domains with N-terminal Cys residues using this method.
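The vector modification amounts to a single substitution at the P1' position of the thrombin site, which can be sketched as follows. The leader sequence `"MHHHHHHLVPRGSKD"` and both helper names are hypothetical. Cleavage is modeled as occurring between Arg and the next residue, and whether thrombin processes the Cys-containing site efficiently is not modeled.

```python
THROMBIN_SITE = "LVPRG"


def thrombin_site_to_cys(construct: str) -> str:
    """Mutate the first LVPRG thrombin site to LVPRC (Gly at P1' -> Cys)."""
    if THROMBIN_SITE not in construct:
        raise ValueError("no LVPRG site found")
    return construct.replace(THROMBIN_SITE, "LVPRC", 1)


def thrombin_cleave(construct: str) -> str:
    """Fragment released after cleavage between Arg (P1) and the following residue."""
    i = construct.find("LVPR")
    if i < 0:
        raise ValueError("no LVPR site found")
    return construct[i + 4:]
```

Applied to the hypothetical leader, `thrombin_cleave(thrombin_site_to_cys("MHHHHHHLVPRGSKD"))` returns `"CSKD"`, a fragment that now starts with the Cys needed for ligation.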
More recently, Hauser et al. used the N-terminal pelB leader sequence to direct newly synthesized fusion proteins to the E. coli periplasmic space, where the endogenous leader peptidases [45, 46] generate the desired N-terminal cysteine-containing protein fragment.
The approach employed for the biosynthesis of backbone-cyclized polypeptides using EPL is depicted in Figure 3. The target polypeptide to be cyclized is fused at its N-terminus to a leader sequence immediately followed by a Cys residue, and at its C-terminus to an engineered intein. The N-terminal leader sequence can be cleaved in vitro or in vivo by a proteolytic or self-proteolytic event, thereby exposing the required N-terminal Cys residue. This Cys residue then reacts intramolecularly with the α-thioester generated by the engineered intein at the C-terminus, yielding a recombinantly generated backbone-cyclized polypeptide. This approach has been used for the in vitro and in vivo biosynthesis of different backbone-cyclized polypeptides.
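The layout of the Fig. 3 precursor and the order of events can be captured in a small data structure. The class below is a hypothetical sketch: the leader is assumed to be already defined by the chosen processing route, the target is required to begin with the ligation Cys, and the intein is an opaque placeholder string. The returned ring uses the lexicographically smallest rotation purely as a display convention.

```python
from dataclasses import dataclass


@dataclass
class CyclizationConstruct:
    """Hypothetical description of an EPL cyclization precursor (Fig. 3)."""
    leader: str  # removed by (auto)proteolysis, e.g. Met via MetAP or a Factor Xa site
    target: str  # polypeptide to cyclize; must start with the ligation Cys
    intein: str  # engineered intein that generates the C-terminal alpha-thioester

    def precursor(self) -> str:
        return self.leader + self.target + self.intein

    def cyclic_product(self) -> str:
        # 1) Leader removal must expose an N-terminal Cys on the target.
        if not self.target.startswith("C"):
            raise ValueError("target must begin with Cys once the leader is removed")
        # 2) The intein N->S acyl shift links the target C-terminus to the intein as a thioester.
        # 3) The N-terminal Cys thiol attacks that thioester, releasing the intein; an
        #    S->N acyl shift then forms the backbone amide that closes the ring.
        ring = self.target
        return min(ring[i:] + ring[:i] for i in range(len(ring)))
```

For example, `CyclizationConstruct("MIEGR", "CGKAW", "[intein]")` describes a Factor Xa-type leader. Its `cyclic_product()` is `"AWCGK"`, and the intein is released as a separate product.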
This biosynthetic cyclization strategy was first demonstrated in vitro by Camarero and Muir in 1999 using the N-terminal SH3 domain of the c-Crk protein as a model. In this work, the SH3 domain was fused to a modified VMA intein at the C-terminus and to the MIEGRC motif (which contains a Factor Xa proteolysis site) at the N-terminus. After expression in E. coli and purification, the intein fusion protein was treated with Factor Xa protease. This proteolytic step afforded an N-terminal Cys-containing SH3-intein fusion protein, which spontaneously reacted intramolecularly to yield the corresponding cyclized SH3 domain. The cyclization process was extremely clean and fast, and the resulting cyclic SH3 domain was fully active. Intriguingly, this intramolecular process did not require a thiol cofactor (which is absolutely necessary to facilitate intermolecular ligation reactions). This result was explained by the close proximity of the two reacting groups in the folded state of the SH3 domain (the N- and C-termini of the natively folded SH3 domain are within 6 Å of each other), which increases their local concentration. This effect has already been reported in the cyclization of other small protein domains [5, 7].
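The proximity argument can be made semi-quantitative with a crude estimate that is not part of the original analysis. Assume that one reactive terminus is confined somewhere within a sphere whose radius equals the ~6 Å separation from the other terminus. Its effective concentration is then one molecule per sphere volume, c_eff ≈ 1 / (N_A × 4/3 π r³).

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def effective_molarity(radius_angstrom: float) -> float:
    """Concentration (mol/L) of one molecule confined to a sphere of the given radius."""
    radius_dm = radius_angstrom * 1e-9  # 1 Å = 1e-10 m = 1e-9 dm
    volume_litres = (4.0 / 3.0) * math.pi * radius_dm ** 3  # 1 dm^3 = 1 L
    return 1.0 / (AVOGADRO * volume_litres)


if __name__ == "__main__":
    # Termini of the folded SH3 domain are within ~6 Å (value from the text).
    print(f"{effective_molarity(6.0):.2f} M")
```

With r = 6 Å the estimate is roughly 1.8 M, about three orders of magnitude above a 1 mM solution. This is only an order-of-magnitude illustration, since real termini are neither fixed at 6 Å nor uniformly distributed within the sphere.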
Iwai and Plückthun also reported the biosynthesis of a cyclized version of β-lactamase using a similar approach. In their case, the N-terminal Cys residue was generated in vivo by removal of the initiating Met residue by an endogenous Met aminopeptidase. After purification of the N-terminal Cys-containing intein fusion protein at pH 8.0, cyclization was triggered by addition of a thiol cofactor at pH 5.0. The resulting cyclized protein was more resistant to irreversible heat denaturation than the linear form.
Based on the high efficiency observed during the in vitro cyclization of the SH3 domain [11, 48], Camarero and Muir used a similar approach to test whether the SH3 domain could be cyclized inside living cells. For this purpose, the Factor Xa recognition sequence in the SH3-VMA intein fusion protein was replaced by a Met residue. During expression of the resulting fusion protein in E. coli cells, the Met residue was efficiently removed by an endogenous Met aminopeptidase. This in vivo proteolytic event unmasked the N-terminal Cys residue, which then reacted intramolecularly with the α-thioester generated by the C-terminal engineered VMA intein. Analysis by SDS-PAGE showed that most of the SH3-intein fusion protein (>90%) was cleaved in vivo. Remarkably, when the entire soluble cell fraction was analyzed by reverse-phase HPLC, the expected cyclic SH3 domain and the cleaved intein were the major components of the mixture. Notably, no linear SH3 domain was found in the cellular mixture, suggesting that in vivo hydrolysis of the α-thioester linkage in the precursor protein was minimal. This work was the first example of a polypeptide chemical ligation reaction performed in the complex cytoplasmic environment of a living cell, and represents an important milestone in current efforts to generate and screen libraries of cyclic polypeptides inside living cells.
More recently, we have applied the same approach to the biosynthesis of cyclotides inside living bacterial cells (Fig. 3). Cyclotides are small globular microproteins with a unique head-to-tail cyclized backbone that is stabilized by three disulfide bonds. The number and positions of the cysteine residues are conserved throughout the family, forming the cyclic cystine-knot (CCK) motif, which acts as a highly stable and versatile scaffold on which hyper-variable loops are arranged. This CCK framework gives cyclotides exceptional resistance to thermal and chemical denaturation and to enzymatic degradation. Moreover, several cyclotides have been found to cross eukaryotic cell membranes. All these properties make them ideal candidates for the development of peptide-based drugs. Our group has recently developed and successfully used a biomimetic approach for the biosynthesis of several folded cyclotides inside cells by combining intramolecular EPL with modified protein splicing units [49, 53] (Fig. 3). This finding makes possible the generation of large libraries of cyclotides (≈10⁹ members) for high-throughput cell-based screening and selection of sequences that recognize particular biomolecular targets [49, 53].
An alternative to EPL for the cell-based biosynthesis and screening of backbone-cyclized polypeptides in vivo is protein trans-splicing (Fig. 4). This approach was first reported by Benkovic and co-workers and makes use of the Ssp DnaE split intein. Protein trans-splicing is a naturally occurring post-translational modification similar to protein splicing, with the difference that the intein self-processing domain is split into two fragments (called the N-intein and C-intein, respectively, Fig. 4A) [54, 55]. These two intein fragments are individually inactive; however, they can bind each other with high specificity under appropriate conditions to form a functional protein splicing domain (Fig. 4A). Rearranging the order of the intein elements (i.e. placing the C-intein before and the N-intein after the target sequence, see Fig. 4) causes splicing to produce a backbone-cyclized polypeptide [12, 56]. This methodology, also called SICLOPPS (split intein circular ligation of proteins and peptides), has been used to generate several natural cyclic peptides as well as large genetically encoded libraries of small cyclic peptides.
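The effect of rearranging the split-intein elements can be shown with a toy comparison of the natural and the SICLOPPS arrangements. In the natural case two chains are joined into a linear product. In the rearranged case the target is simultaneously the C-extein of the C-intein and the N-extein of the N-intein, so ligation closes a ring. The first-residue check (Cys, Ser or Thr at the C-extein +1 position) is an assumption standing in for the junction requirements discussed next. DnaE-specific preferences are not modeled, and all names are hypothetical.

```python
def _check_plus1(c_extein: str) -> None:
    # Assumption: the first C-extein residue must be Cys, Ser or Thr (the nucleophile
    # for the trans-(thio)esterification step); DnaE-specific preferences are not modeled.
    if c_extein[:1] not in ("C", "S", "T"):
        raise ValueError("C-extein +1 residue must be Cys, Ser or Thr")


def trans_splice(n_extein: str, c_extein: str) -> str:
    """Natural arrangement: [N-extein]-[IntN] + [IntC]-[C-extein] -> linear N-extein-C-extein."""
    _check_plus1(c_extein)
    return n_extein + c_extein


def siclopps(target: str) -> str:
    """Rearranged arrangement on one chain: [IntC]-[target]-[IntN] -> cyclic target.

    The target is the C-extein of IntC (its N-terminus follows IntC) and the N-extein
    of IntN (its C-terminus precedes IntN), so the new peptide bond closes a ring.
    """
    _check_plus1(target)
    # Report the ring by its lexicographically smallest rotation (illustrative convention).
    return min(target[i:] + target[:i] for i in range(len(target)))
```

For example, `siclopps("SGWAK")` returns `"AKSGW"`, the same ring with Lys bonded to the Ser that followed the C-intein. In contrast, `trans_splice("MKAG", "CWNK")` simply gives the linear product `"MKAGCWNK"`.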
It should be noted, however, that these systems require specific amino acid residues at both intein-extein junctions for efficient protein splicing to occur [12, 16, 58]. In contrast to protein trans-splicing, the only absolute sequence requirement for native chemical ligation is an N-terminal cysteine. Model studies have also shown that all 20 natural amino acids at the C-terminus of a polypeptide α-thioester can support ligation. The ligation rate, however, depends on the nature of the C-terminal amino acid of the thioester: Gly reacts fastest, while β-branched amino acids react more slowly and give lower yields. Furthermore, the engineered inteins commonly used to generate recombinant polypeptide α-thioesters have also been found to be compatible with most amino acids upstream of the cleavage site. Thus, the EPL approach is quite general with respect to the sequence of the linear peptide precursor and can therefore generate a more diverse array of backbone-cyclized polypeptides in vivo.
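The sequence rules summarized above lend themselves to a simple design check for a linear precursor intended for intramolecular EPL. The function below is a hypothetical helper. It enforces the one absolute requirement (an N-terminal Cys) and annotates the residue at the thioester position qualitatively, using only the trend stated in the text: Gly is fastest, and the β-branched residues Val, Ile and Thr are slower. It does not predict rates or yields.

```python
BETA_BRANCHED = frozenset("VIT")  # Val, Ile, Thr


def check_epl_precursor(seq: str) -> list[str]:
    """Design notes for a linear precursor destined for intramolecular EPL (qualitative only)."""
    if not seq:
        raise ValueError("empty sequence")
    if seq[0] != "C":
        raise ValueError("intramolecular EPL requires an N-terminal Cys")
    last = seq[-1]  # residue that will carry the alpha-thioester
    if last == "G":
        return ["C-terminal Gly: expected to ligate fastest"]
    if last in BETA_BRANCHED:
        return [f"C-terminal {last} is beta-branched: expect slower ligation and lower yield"]
    return [f"C-terminal {last}: compatible; the ligation rate depends on this residue"]
```

A check like this would sit naturally in a library-design script, flagging members with a β-branched thioester residue rather than discarding them, since the text notes that all 20 residues can support ligation.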
Protease-catalyzed protein splicing, also known as transpeptidation, is employed by prokaryotes to attach proteins to the peptidoglycan of the cell-wall envelope. For example, sortases are transpeptidases, found in most Gram-positive bacteria, that are specialized in this task. Among the several sortase isoforms and homologues discovered so far, Staphylococcus aureus sortase A (SrtA) has been widely employed for protein engineering [63, 64]. SrtA recognizes substrates that contain an LPXTG sequence and catalyzes cleavage of the amide bond between the Thr and Gly residues by means of an active-site cysteine (Cys184) (Fig. 5A). This process generates a covalent acyl-enzyme intermediate. The activated carboxyl group of the Thr residue then undergoes nucleophilic attack by the amino group of an oligoglycine substrate (in S. aureus, the pentaglycine (Gly5) cross-bridge of the branched lipid II precursor), producing the ligated product. In the absence of oligoglycine nucleophiles, the acyl-enzyme intermediate is hydrolyzed by water.
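The SrtA reaction can be reduced to a sequence-level rule: cut between Thr and Gly of LPXTG, then attach the Thr carbonyl to an N-terminal Gly amine, or to water if no nucleophile is present. The sketch below encodes just that rule with Python's `re` module. The function name and example sequences are hypothetical, and the enzyme's kinetics and preference for longer oligo-Gly nucleophiles are not modeled.

```python
import re
from typing import Optional

SORTING_MOTIF = re.compile(r"LP.TG")  # LPXTG, X = any residue


def sortase_transpeptidation(substrate: str, nucleophile: Optional[str]) -> str:
    """Toy model of SrtA transpeptidation at the sequence level (Fig. 5A).

    SrtA cleaves between Thr and Gly of LPXTG, forming a thioester acyl-enzyme with
    Cys184; an N-terminal Gly amine then attacks it. With no nucleophile, water
    hydrolyzes the intermediate and the substrate ends in ...LPXT.
    """
    match = SORTING_MOTIF.search(substrate)
    if match is None:
        raise ValueError("no LPXTG motif found")
    acyl_donor = substrate[: match.start() + 4]  # ...LPXT held on the enzyme
    if nucleophile is None:
        return acyl_donor  # hydrolysis product
    if not nucleophile.startswith("G"):
        raise ValueError("nucleophile must start with Gly (oligo-Gly is preferred)")
    return acyl_donor + nucleophile
```

For instance, `sortase_transpeptidation("MKLPETGGHHHHHH", "GGGGGK")` gives `"MKLPETGGGGGK"`, while passing `None` models hydrolysis and returns `"MKLPET"`.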
An intramolecular version of sortase-mediated ligation (SML) can be used to generate backbone-cyclized polypeptides (Fig. 5B). Boder and co-workers reported the SrtA-mediated cyclization of EGFP containing the (Gly)3 and LPETG sequences at the N- and C-terminus, respectively. SML-based protein cyclization has also been reported by other groups for dihydrofolate reductase (DHFR) and a pleckstrin homology (PH) domain.
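When the oligo-Gly nucleophile and the sorting motif sit on the same chain, the same rule yields a ring. The sketch below is a hypothetical sequence-level model of Fig. 5B. It uses the last LPXTG match as the C-terminal motif, requires at least two N-terminal Gly residues as the nucleophile, and discards everything after the Thr. The example sequence is invented for illustration.

```python
import re

SORTING_MOTIF = re.compile(r"LP.TG")  # LPXTG, where X is any residue


def sortase_cyclize(precursor: str) -> str:
    """Toy sequence-level model of intramolecular sortase-mediated ligation (Fig. 5B).

    The N-terminal oligo-Gly of the chain attacks the Thr acyl-enzyme formed at the
    C-terminal LPXTG motif of the same chain; the residues after Thr are released.
    """
    matches = list(SORTING_MOTIF.finditer(precursor))
    if not matches:
        raise ValueError("no LPXTG motif found")
    if not precursor.startswith("GG"):
        raise ValueError("an N-terminal oligo-Gly nucleophile is required in this sketch")
    ring = precursor[: matches[-1].start() + 4]  # keep ...LPXT; the Gly and tail leave
    # Report the ring by its lexicographically smallest rotation (illustrative convention).
    return min(ring[i:] + ring[:i] for i in range(len(ring)))
```

With a (Gly)3 N-terminus and an LPETG motif, `sortase_cyclize("GGGSKAWLPETGHHHHHH")` returns `"AWLPETGGGSK"`. The ring keeps the LPET and Gly residues at the junction, which is the kind of sequence requirement this strategy imposes.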
Protease-catalyzed protein splicing has also recently been found in animals and plants. For example, recent studies have shown that the biosynthesis of cyclotides in plants (see above) involves an asparaginyl endopeptidase (AEP)-catalyzed transpeptidation event. Cyclotides are naturally processed from precursor proteins that have both N- and C-terminal pro-regions. Mutation of a highly conserved Asn residue at the C-terminus of the cyclotide domain eliminates cyclization, as does deletion of the C-terminal pro-region following this residue. Together, these findings indicate that an endopeptidase specific for asparagine residues is likely to be involved in the cyclization process. The protease that cleaves the N-terminal pro-region, however, has yet to be identified. Identifying all the proteins required for cyclotide cyclization could provide an alternative to intramolecular EPL for the biosynthesis of genetically encoded cyclotide libraries inside living cells. It should be emphasized, however, that the EPL-based biomimetic approach is simpler, since it uses a single self-processing protein that can easily be expressed in any type of cell for rapid screening of genetically encoded libraries.
The ability to create cyclic polypeptides in vivo opens up the possibility of generating large libraries of cyclic polypeptides. Using the tools of molecular biology, genetically encoded libraries of cyclic polypeptides containing billions of members can be readily generated. This molecular diversity forms the basis for selection strategies that model natural evolutionary processes. Also, since the cyclic polypeptides are generated inside living cells, these libraries can be screened directly for their ability to attenuate or inhibit cellular processes. In contrast to phage display, where screening takes place in vitro, screening in the cytoplasm offers the advantages of a native physiological environment in which diverse biochemical events can be examined. In addition, problems arising from the presence of a fusion partner (in phage display, the viral particle), a phenomenon known as the template effect, may be circumvented.
Backbone cyclized polypeptides are relatively more stable and more resistant to cellular catabolism than linear polypeptides or disulfide-based cyclic polypeptides. Naturally occurring cyclic peptides are often associated with diverse therapeutic activities ranging from immunosuppression to antimicrobial activity. The stability of backbone cyclized polypeptides displaying certain pharmacological properties suggests that they may be suitable scaffolds on which to graft the molecular diversity of an intracellular library [52, 58].
A number of advances in in vivo library generation and screening have recently been reported. Scott and Benkovic used protein trans-splicing to generate the cyclic peptide Pseudostellarin F, an eight-amino-acid cyclic peptide with tyrosinase inhibitory activity. The in vivo biosynthesized Pseudostellarin F was fully active and was successfully screened in vivo for tyrosinase inhibition. More recently, cyclic peptide libraries based on the Pseudostellarin F scaffold revealed the structural requirements of this system: several amino acid positions near the intein-extein junction were critical for expression and cyclization, and the authors estimated that 70% of their library produced cyclic products.
Payan and co-workers also used a similar protein trans-splicing approach to generate random cyclic peptides in the cytoplasm of human cells using a retroviral expression vector. Screening of the library for modulation of the IL-4 signaling pathway led to the identification of several cyclic peptides that selectively inhibited ε promoter activity. The library was based on a five-amino-acid coding strategy, and its potential complexity was about 160,000 members at the amino acid level. Of the 565 clones tested, twenty-three hits were identified as potential therapeutic agents against allergy and asthma, and they may serve in the future as leads for the development of more potent compounds. These results demonstrated for the first time an efficient functional screen for cyclic peptides in mammalian cells.
More recently, Cheng et al. used protein trans-splicing to select backbone-cyclized peptides able to inhibit the bacterial ClpXP protease from genetically encoded libraries of octapeptides in which only five residues were randomized. Screening for potential inhibitors was performed in E. coli cells using fluorescence-activated cell sorting in combination with a genetically encoded fluorescent reporter. The selected inhibitors shared little sequence similarity and were able to interfere with unexpected steps of the ClpXP mechanism in vitro. One of the selected cyclic peptides showed antibacterial activity against Caulobacter crescentus, a model organism that requires ClpXP activity for viability.
Tavassoli et al. also recently used protein trans-splicing to screen and select genetically encoded libraries of cyclized peptides able to interfere with the Gag-TSG101 interaction, which is required for the release of HIV particles from virus-infected cells. In this case the peptide library (≈10⁶ different octamers) was screened in E. coli cells using a reverse bacterial two-hybrid system (RTHS) [72, 73]. The bacterial RTHS used a combination of chimeric repressor fusion proteins and promoter sequences to link the disruption of the targeted heterodimers to the expression of several reporter genes, allowing the selection of peptide inhibitors. This approach yielded several cyclic octapeptides that, when linked to the cell-penetrating peptide (CPP) TAT, inhibited the production of virus-like particles in human cells with a reported IC50 of 7 µM.
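The selection logic of the reverse two-hybrid system can be reduced to a small truth table. While the two target proteins, fused to repressor domains, form a heterodimer, the reporter genes stay repressed. A cell passes selection only when its cyclic peptide disrupts that interaction. The sketch below encodes only this logic. The function names and the library entries are hypothetical, and promoter leakiness, reporter identities and expression levels are not modeled.

```python
def reporters_expressed(heterodimer_formed: bool) -> bool:
    # The chimeric repressor heterodimer occupies the operator and blocks transcription.
    return not heterodimer_formed


def survives_selection(disrupts_interaction: bool, peptide_expressed: bool = True) -> bool:
    """A cell grows under selection only if its cyclic peptide breaks up the targeted heterodimer."""
    heterodimer_formed = not (peptide_expressed and disrupts_interaction)
    return reporters_expressed(heterodimer_formed)


# Hypothetical library members mapped to whether they disrupt the target interaction.
library = {"cyclo-1": True, "cyclo-2": False, "cyclo-3": False}
hits = [name for name, disrupts in library.items() if survives_selection(disrupts)]
print(hits)
```

Framing selection this way makes the key design choice explicit: the growth readout is tied to the loss of an interaction, so inhibitors are enriched directly rather than identified by screening each clone.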
The ability to biosynthesize backbone-cyclic peptides using EPL, protein trans-splicing or SML has important implications for drug-development efforts. The capability to screen for biochemical events in an environment as complex as the cell's interior should yield valuable and unique information about the potential leads identified by this method. Indeed, peptide-based libraries have already been shown to be effective in producing drug candidates in bacterial as well as mammalian systems [12, 69–71].
In summary, we have reviewed several approaches, as well as recent developments, in the use of different types of protein splicing for the biosynthesis of circular polypeptides. Protein trans-splicing has proven to be a powerful tool for the biosynthesis of circular polypeptides ranging from small peptides to large proteins [12, 15, 16, 20, 57]. It has also been shown that this approach can be used to generate large libraries of circular polypeptides inside living cells, where they can be directly screened for biological activity [16, 69–71]. However, specific residues of the native N-extein and C-extein are required for efficient protein trans-splicing with the naturally split DnaE intein. It is conceivable, therefore, that this could restrict or bias the sequence diversity of the circular peptide libraries generated by this method. Similar sequence requirements are also found in sortase-mediated intramolecular ligations, where 9 extra residues are required for sortase-catalyzed cyclization to take place.
Intramolecular EPL, on the other hand, has also been used successfully for the in vitro and in vivo biosynthesis of cyclic polypeptides [11, 13, 14, 37, 48, 49, 53]. In contrast with the protein trans-splicing approach, EPL is compatible with most amino acids at the cyclization site [17, 59], making this approach general with respect to the sequence of the linear precursor.
Cyclic peptides represent an exciting new tool for deciphering complex biological processes. As the genomics era unfolds, there will be a continuing demand for innovative strategies to address difficult biological questions, and cyclic peptides offer biologists access to highly diverse molecular libraries for genetic experiments, approaches previously restricted to synthetic chemists. Moreover, methods are being developed for the application of cyclic peptides in a variety of experimental endeavors, providing a toolbox that should benefit evolving genomics-based approaches. In the coming years, scientists will continue unraveling the overwhelming amount of genomic data encoding complex biochemical pathways and protein networks, and cyclic peptides are an attractive complementary tool that can aid this challenging task.
JAC and HS are supported by funding from the School of Pharmacy at the University of Southern California.