Inteins are a class of single-turnover enzymes that catalyze their own excision from a host protein-intein fusion (the precursor protein), yielding a free intein and a mature host protein (the extein) in which the splice junctions have been ligated with a standard peptide bond. Over 550 inteins have been identified in the InBase database, and they are ubiquitous in single-celled organisms from all domains of life (1). Intein splicing mechanisms are diverse, with three distinct mechanisms identified to date (2). In each of these mechanisms, a series of three or four carefully coordinated displacement reactions is catalyzed, in the absence of additional cofactors or energy sources, by the intein together with the extein residue that forms the C-terminal splice junction (2). Coordinating these multistep pathways likely requires a precise architecture at the intein active site, one that can be affected by proximal extein residues.
Splicing in native host proteins is so rapid that the precursor protein is rarely, if ever, detected in nature, whereas splicing in heterologous host proteins often yields substantial amounts of single splice junction cleavage byproducts or unreacted precursor. As a result, it has been proposed that the native flanking extein sequences are optimal for splicing of each intein (3). However, inteins are generally located within highly conserved extein motifs and active sites of essential proteins, so there is presumably also selective pressure to maintain the residues at the intein insertion site for optimal extein function (5). We therefore hypothesize that the native precursor does not necessarily represent the fastest splicing context for the intein but, rather, reflects a balance among host protein sequence requirements, the specificity of the homing endonuclease (present in many inteins), and the need for a functional intein.
Because of their unique chemistry, inteins have proven useful in numerous applications, such as protein tagging and purification, control of enzyme activity, assembly of active proteins in vivo, protein semisynthesis in vitro, segmental isotope labeling for NMR, and many others (for review, see Refs. 6). The technologies enabled by inteins are growing, yet fundamental gaps remain in our understanding of how inteins work, especially regarding their specificity requirements for splicing (defined here as the flanking extein sequences that permit splicing). Previous studies that varied a single amino acid flanking the N- or C-terminal splice junction showed that the subset of extein residues that promote splicing or cleavage is specific to each intein (10). To our knowledge, no studies have examined the immense number of possible combinations when multiple extein residues are varied in a full splicing reaction. In the absence of such information, researchers have designed splicing experiments on the assumption that the native host context is best and have therefore added three to five native flanking extein residues when placing an intein in a novel host protein. This practice limits the usefulness of these techniques because it leaves behind a “scar” of flanking extein residues after splicing. Another approach to improving splicing in heterologous systems uses protein engineering to change or relax intein specificity by mutating the intein (4). Complementary tools that gauge the full scope of viable substrates for important inteins are highly desirable, both for understanding inteins as enzymes and for expanding their usefulness in biotechnology by providing more options for intein insertion.
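To give a sense of scale for the “immense number of possible combinations” of flanking residues, the short Python sketch below counts and lazily enumerates that sequence space. It assumes, for illustration only, that three N-extein and three C-extein positions may each be any of the 20 standard amino acids. The helper names `flanking_combinations` and `space_size` are hypothetical; the sketch shows only the arithmetic, not how any selection library was actually constructed.

```python
from itertools import islice, product

# One-letter codes for the 20 standard amino acids.
# Assumption: every randomized position may be any of these residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def flanking_combinations(n_positions: int = 3, c_positions: int = 3):
    """Lazily yield (N-extein, C-extein) residue strings for every combination.

    Hypothetical helper for illustration; generating all combinations is
    done on demand so the full space is never held in memory.
    """
    for n_side in product(AMINO_ACIDS, repeat=n_positions):
        for c_side in product(AMINO_ACIDS, repeat=c_positions):
            yield "".join(n_side), "".join(c_side)


def space_size(n_positions: int = 3, c_positions: int = 3) -> int:
    """Number of distinct flanking combinations: 20^(n + c)."""
    return len(AMINO_ACIDS) ** (n_positions + c_positions)


if __name__ == "__main__":
    print(space_size())  # 20**6 = 64000000
    # Show the first few combinations in the order product() generates them.
    for n_ext, c_ext in islice(flanking_combinations(), 3):
        print(f"{n_ext}-[intein]-{c_ext}")
```

Even under these simplifying assumptions, there are 20^6 = 64,000,000 distinct six-residue flanking combinations, far more than could practically be tested by constructing and assaying precursors one at a time.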
Here we present a genetic selection based on kanamycin (Kan) resistance that allows comprehensive examination of all possible combinations of three proximal N-terminal extein residues and three proximal C-terminal extein residues. Unlike previous intein kanamycin selection systems, our system was optimized to maintain kanamycin resistance across an extremely broad range of post-splicing six-residue insertions, minimizing bias due to the reporter protein. The system was tested with the Nostoc punctiforme Npu DnaE, the Mycobacterium tuberculosis Mtu-H37Rv RecA, and the MP-Be DnaB inteins. We further applied this system to study splicing of the Npu DnaE intein in detail. This intein is naturally split and works in trans but can also be fused to splice in cis (16). The Npu DnaE intein is one of the fastest known inteins (17) and is important for biotechnology. Our results show that the Npu DnaE intein can efficiently splice multiple sequences of markedly different composition from its native flanking exteins, defining new potential insertion sites preceding Cys-Trp or Cys-Met. These data expand the usefulness of this intein and provide a new framework for the predictable design of protein splicing experiments.