The study of biomolecules in their native environments is a challenging task because of the vast complexity of cellular systems. Technologies developed in the last few years for the selective modification of biological species in living systems have yielded new insights into cellular processes. Key to these new techniques are bioorthogonal chemical reactions, whose components must react rapidly and selectively with each other under physiological conditions in the presence of the plethora of functionality necessary to sustain life. Herein we describe the bioorthogonal chemical reactions developed to date and how they can be used to study biomolecules.
Chemists and biologists have begun to share a common interest in developing methods to study biomolecules in their native settings. Their combined efforts have brought forth innovations such as genetically encoded fluorescent proteins (for example, the green fluorescent protein, GFP), whose widespread use and impact were recognized with the 2008 Nobel Prize in Chemistry.[1,2] However, many biomolecules, such as nucleic acids, lipids, and glycans, as well as various posttranslational modifications, cannot be monitored with genetically encoded reporters. A growing area of chemical biology strives to probe these biomolecules in living systems by using bioorthogonal chemical reactions (that is, reactions that do not interfere with biological processes). Such reactions must proceed rapidly under physiological conditions and be inert to the myriad functionalities found in vivo (Scheme 1).
Here we provide a historical account of the development of bioorthogonal reactions, starting with their roots in protein bioconjugation. We discuss how unique sequences of natural amino acids have been designed to create orthogonal functionality for selective protein modification within complex samples. We then focus on the development of bioorthogonal transformations involving unnatural functional groups and methods to incorporate these unnatural groups into a variety of biomolecules. We conclude with a discussion of avenues toward new bioorthogonal chemical reactions and applications to unexplored biological processes.
During the past century, the chemical modification of biomolecules has evolved from a means of defining composition to a highly selective method for monitoring cellular events. Proteins, with their numerous side-chain functionalities, complex tertiary structures, and diverse biological functions, were early favorites for chemical modification, initially with the goal of defining their amino acid components. The field of mechanistic enzymology benefited tremendously from the efforts of early protein chemists, as the methods they developed allowed for chemical alteration of side chains implicated in catalysis. Protein modification methods have also been important in the biotechnology industry. Popular applications include PEGylation (modification of proteins with polyethylene glycol (PEG) groups) of therapeutic proteins to improve serum half-life[5,6] and the conjugation of cytotoxins or imaging agents to cancer-targeting elements, such as monoclonal antibodies. The vast majority of examples involve classic residue-specific protein-modification methods, which continue to be vital tools used by chemical biologists.
Classic protein bioconjugation primarily encompasses simple second-order reactions that selectively target the functionalities present in the side chains of the canonical, proteogenic amino acids.[3,8–10] Of these, cysteine and lysine are the most commonly modified residues. The thiol group of cysteine can undergo disulfide exchange to form mixed disulfides (Scheme 2, entry 1), as well as alkylation with alkyl halides or Michael addition to α,β-unsaturated carbonyl compounds to yield thioethers (Scheme 2, entries 2 and 3). Furthermore, because cysteine is a relatively rare amino acid, it can often be used for single-site modification. Lysine residues are considerably more prevalent than cysteine, which limits site selectivity, but they remain popular targets because of the abundance of methods to selectively modify primary amines. Lysine can react with activated esters, sulfonyl chlorides, isocyanates, or isothiocyanates to afford amides, sulfonamides, ureas, or thioureas, respectively (Scheme 2, entries 4–7). Lysine residues also undergo reductive amination with aldehydes.[3,8–10] Note that these reagents can also modify the N termini of proteins.
By comparison, the remaining 18 proteogenic amino acids have been minimally exploited for residue-selective modification. The phenol moiety of tyrosine has been modified through electrophilic aromatic substitution reactions with diazonium salts, iodine, or nitrous acid.[3,10] Glutamate and aspartate residues have been targeted for bioconjugation through coupling with amines via carbodiimides,[3,8–10] although the potential cross-linking of proteins limits the utility of this technique. Pyrocarbonates have also been used to successfully modify histidine residues.[3,8–10]
With these classic methods, the conjugation of small-molecule probes, such as biotin and fluorophores, to proteins has become routine. Similar methods are widely used to immobilize proteins on chromatography matrices, soluble polymers, plastic surfaces, and microarray chips.
New methods have been developed for the modification of cysteine, lysine, tyrosine, and tryptophan (Scheme 3A). Many of these modern methods involve metal-mediated transformations. Furthermore, the N terminus has emerged as a popular target for protein modification (Scheme 3A). For a more in-depth discussion of chemoselective protein modification methods, the reader is directed to the recent review by Hackenberger and Schwarzer.
While the classic bioconjugation techniques for lysine and cysteine have been widely used, methods for the selective modification of amines and thiols continue to be developed and optimized. McFarland and Francis reported a lysine-specific reductive alkylation reaction that proceeds through an iridium-catalyzed transfer hydrogenation. Unlike the classic reaction based on sodium cyanoborohydride, which requires acidic conditions, the iridium-mediated process proceeds in high yield at pH 7.4 (Scheme 3B, entry 1).
Davis and co-workers have recently developed a two-step method for cysteine modification (Scheme 3B, entry 2). The first step in this procedure is the transformation of cysteine into dehydroalanine by treatment with O-mesitylenesulfonylhydroxylamine under basic conditions. The dehydroalanine residues then undergo a Michael addition with thiol reagents to yield a thioether linkage. The Michael addition is not stereoselective, so a diastereomeric mixture of modified proteins is produced. Another emerging technique for the modification of cysteine that yields a thioether linkage is thiol-ene chemistry, which involves the addition of a thiol across an alkene by a radical-based mechanism (Scheme 3B, entry 3). The radical species can be generated by standard radical initiators or by irradiation with light, with the latter method displaying greater functional-group tolerance and shorter reaction times. Thiol-ene reactions have been performed on proteins functionalized with either thiols or alkenes, although their use in the direct modification of cysteine residues has yet to be reported, possibly because generation of a protein-associated thiyl radical could lead to unwanted side reactions.
Much recent activity has focused on the modification of tyrosine and tryptophan residues, often by employing transition-metal-mediated processes that are compatible with aqueous conditions. These residues are relatively rare on protein surfaces, and thus offer opportunities for controlled single-site modification. The first example of a metal-mediated modification of tyrosine involved the oxidative coupling of two phenol groups, an approach pioneered by Kodadek et al., who used a nickel(II) catalyst and a co-oxidant to cross-link two proteins (Scheme 3B, entry 4). Finn and co-workers have since validated this as a bioconjugation method by coupling biotin and alkyne reagents to tyrosine residues on the capsid proteins of a virus particle. Francis and co-workers have also explored the modification of tyrosine residues through a three-component Mannich reaction with aldehydes and anilines (Scheme 3B, entry 5),[20,23,24] as well as through palladium π-allyl chemistry (Scheme 3B, entry 6).
Additionally, Antos and Francis have developed a bioconjugation reaction for tryptophan, the rarest amino acid, in which they used a rhodium carbenoid that was generated in situ from [Rh2(OAc)4] and a diazo compound (Scheme 3B, entry 7). However, this reaction requires acidic conditions (pH ≈ 2), which may affect the structure of some protein targets.
The N terminus of a protein has unique pH-dependent reactivity and is thus an attractive target for single-site modification. Its lower pKa value relative to the amino groups of lysine side chains renders selective acylation or alkylation possible, although difficult, in the presence of many competing lysine side chains. Transamination reactions, however, have been particularly successful for selective modification of the N terminus. Transamination of the N terminus dates back to 1956, when it was attempted by Bonetti and co-workers at 100°C, a temperature that often denatures proteins (Scheme 3C, entry 1a). Almost a decade later, Dixon performed a transamination at room temperature by using glyoxylate, catalytic base, and copper(I), which facilitated imine formation between the N terminus and the glyoxylate group (Scheme 3C, entry 1b).
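The pKa argument above can be made concrete with the Henderson–Hasselbalch equation, which gives the fraction of each amine present in its neutral, nucleophilic form. This is an illustrative sketch only: the pKa values (about 7.7 for an N-terminal α-amine and 10.5 for a lysine ε-amine) are assumed typical values rather than measurements for any particular protein, and the helper names are hypothetical.

```python
def fraction_neutral_amine(pka: float, ph: float) -> float:
    """Fraction of an amine in its neutral (nucleophilic) form, R-NH2,
    from the Henderson-Hasselbalch equation for R-NH3+ <=> R-NH2 + H+."""
    return 1.0 / (1.0 + 10 ** (pka - ph))

# Assumed, approximate pKa values; real values shift with the local protein environment.
PKA_N_TERMINUS = 7.7
PKA_LYSINE = 10.5

def selectivity_ratio(ph: float) -> float:
    """Per-residue ratio of neutral N-terminal amine to neutral lysine amine."""
    return (fraction_neutral_amine(PKA_N_TERMINUS, ph)
            / fraction_neutral_amine(PKA_LYSINE, ph))

for ph in (6.5, 7.4, 9.0):
    print(f"pH {ph}: N-term {fraction_neutral_amine(PKA_N_TERMINUS, ph):.3f}, "
          f"Lys {fraction_neutral_amine(PKA_LYSINE, ph):.5f}, "
          f"ratio {selectivity_ratio(ph):.0f}")
```

Under these assumptions the per-residue advantage of the N terminus at pH 7.4 is several hundredfold, but a protein with dozens of surface lysines erodes much of it, which is consistent with direct acylation or alkylation being possible yet difficult.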
Even with the improvements made by Dixon, the transamination reaction did not receive considerable attention until Francis and co-workers reported a biomimetic transamination that proceeds under physiological conditions without the need for metal or base additives (Scheme 3C, entry 1c). Their method involves condensation of the N-terminal amine with pyridoxal 5′-phosphate (PLP) followed by hydrolysis to give a pyruvamide. The protein can then be further modified through the ketone of the pyruvamide by reaction with hydrazide or aminooxy reagents (Section 4.1). Extensive characterization revealed that the transamination proceeded best when Ala, Gly, Asp, Glu, or Asn occupied the N-terminal position. The reaction also occurred with many other N-terminal residues, but the yields were variable.
Other chemical methods for N-terminal modification rely on a specific residue at the N terminus. For example, N-terminal serine or threonine residues undergo periodate oxidation to form glyoxylamides (Scheme 3C, entry 2). The aldehyde moiety of the glyoxylamide can then be modified with hydrazide or aminooxy reagents (Section 4.1). A Pictet–Spengler reaction can selectively modify N-terminal tryptophan residues with aldehyde probes (Scheme 3C, entry 3). There is also evidence that this reaction can be used to modify N-terminal histidine residues. The advantage of the Pictet–Spengler reaction is that a carbon–carbon bond is formed between the probe and protein in a one-step procedure, whereas hydrazide- and aminooxy-based methods form reversible linkages between the probe and protein. Additionally, aldehyde probes can be selectively conjugated to proteins containing N-terminal cysteine residues (as well as Ser, Thr, Trp, His, and Asn) to give various heterocycles (Scheme 3C, entry 4). N-Terminal cysteine residues have also been exploited in the highly successful method of protein modification known as native chemical ligation (Scheme 3C, entry 5).
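The residue dependence described in the last two paragraphs can be summarized as a small lookup. This is a purely illustrative sketch: `n_terminal_options` is a hypothetical helper that encodes only the residue preferences stated in the text for pyridoxal 5′-phosphate (PLP) transamination, periodate oxidation, the Pictet–Spengler reaction, aldehyde condensation, and native chemical ligation. It ignores folding, accessibility, and competing residues, so it is not a substitute for experimental characterization.

```python
from typing import List

# Residues reported in the text to give the best PLP transamination yields.
PLP_PREFERRED = set("AGDEN")

def n_terminal_options(sequence: str) -> List[str]:
    """List the N-terminal strategies discussed in the text that could apply,
    keyed only on the identity of the first residue (one-letter code)."""
    if not sequence:
        raise ValueError("empty sequence")
    first = sequence[0].upper()
    options = []
    if first in PLP_PREFERRED:
        options.append("PLP transamination (favored residue)")
    else:
        options.append("PLP transamination (variable yield)")
    if first in "ST":
        options.append("periodate oxidation to glyoxylamide")
    if first == "W":
        options.append("Pictet-Spengler with aldehyde probe")
    if first == "H":
        options.append("Pictet-Spengler (limited evidence)")
    if first in "CSTWHN":
        options.append("aldehyde condensation to heterocycle")
    if first == "C":
        options.append("native chemical ligation")
    return options

print(n_terminal_options("CAKL"))
```

For the hypothetical sequence `CAKL`, the helper reports variable-yield transamination, aldehyde condensation, and native chemical ligation as candidate strategies.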
In 1994, Kent and co-workers reported the ligation of thioesters with N-terminal cysteine residues to give a “native” amide bond, a reaction now termed native chemical ligation (NCL, Scheme 4A). Mechanistically, this transformation involves a rapid, reversible exchange of thioesters that is interrupted by an irreversible intramolecular reaction with the N-terminal amine of the protein (an S→N acyl transfer), ultimately forming an amide bond. The S→N acyl transfer was first discovered by Wieland et al. in 1953, but was not applied as a protein modification technique until much later. NCL can be used to selectively ligate two highly functionalized molecules under physiological conditions without the use of protecting groups. As such, NCL has become a powerful method for the modification, synthesis, and semisynthesis of proteins. Through NCL, proteins larger than the traditional limits of solid-phase peptide synthesis have been generated and studied. Furthermore, NCL has allowed portions of proteins to be isotopically labeled for structural biology studies[39,40] and has enabled the selective addition of posttranslational modifications and chemical probes.
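The bookkeeping of NCL can be sketched at the sequence level. This is a toy model under stated assumptions: peptides are plain one-letter strings, the C-terminal thioester is a boolean flag, and the `Peptide` class and `native_chemical_ligation` function are hypothetical names. The two mechanistic steps are described only in comments, not simulated.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Peptide:
    sequence: str                      # one-letter codes, N- to C-terminus
    c_terminal_thioester: bool = False

def native_chemical_ligation(thioester_peptide: Peptide, cys_peptide: Peptide) -> Peptide:
    """Sequence-level model of NCL: join a C-terminal thioester to an N-terminal Cys."""
    if not thioester_peptide.c_terminal_thioester:
        raise ValueError("first fragment needs a C-terminal thioester")
    if not cys_peptide.sequence.upper().startswith("C"):
        raise ValueError("second fragment needs an N-terminal cysteine")
    # Step 1 (reversible): the Cys thiol attacks the thioester (thioester exchange),
    #   linking the two fragments through sulfur.
    # Step 2 (irreversible): intramolecular S->N acyl transfer to the Cys alpha-amine
    #   gives a native amide bond and regenerates the Cys side-chain thiol.
    return Peptide(thioester_peptide.sequence + cys_peptide.sequence,
                   c_terminal_thioester=cys_peptide.c_terminal_thioester)

product = native_chemical_ligation(Peptide("GLRKA", True), Peptide("CNEH"))
print(product.sequence)
```

The product sequence contains the cysteine at the ligation junction, which is why NCL, in its basic form, requires a cysteine at the site where the two fragments are joined.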
Many of the applications of NCL have been enhanced by expressed protein ligation (EPL, Scheme 4B) and protein trans-splicing (PTS, Scheme 4C). EPL and PTS both rely on the biological relatives of NCL: self-splicing proteins. Protein self-splicing is a natural phenomenon wherein a domain of a protein, referred to as an intein, is excised in a posttranslational process that mechanistically mimics NCL. EPL is the ligation of a chemically prepared species (often containing an N-terminal cysteine) with a recombinantly expressed thioester-containing protein. PTS is similar to EPL in that a synthetically generated compound can be joined to a recombinantly expressed protein through a native amide bond; however, in PTS the intein is split between the synthetic peptide and the expressed protein, and reassembly of the intein allows splicing to occur.
EPL exploits inteins as a means to create a C-terminal thioester, which can then be modified through native chemical ligation (Scheme 4B). By using recombinant expression techniques, a desired polypeptide is fused with an intein that has been mutated so that splicing stalls at the thioester intermediate and the final S→N acyl transfer cannot occur. Often a chitin-binding domain is added to the fusion protein on the C-terminal side of the intein to facilitate purification. The expressed fusion protein is isolated on a chitin affinity matrix, and, following the removal of all other proteins, the desired protein is cleaved from the chitin matrix by NCL with a synthetic peptide or other molecule containing an N-terminal cysteine residue.[43,44]
The PTS technique exploits the discovery that inteins can be separated into two polypeptides (IntC and IntN) that, upon noncovalent association, can produce an active intein capable of peptide splicing (Scheme 4C). PTS has allowed for the extension of NCL into living systems, thus facilitating the study of protein–protein interactions, the synthesis of cyclic peptides,[46,47] and the semisynthesis of proteins in vivo. Yao and co-workers have also performed traditional NCL in E. coli to detect overexpressed N-terminal cysteine-containing proteins with thioester fluorophore conjugates. In this study, only the proteins modified with the thioester fluorophore were detected, but there are many natural thioester-containing species that could also react with the proteins of interest, including coenzyme A (CoA) derivatives and polyketide and fatty acid synthases. These naturally occurring thioesters limit NCL primarily to in vitro techniques, such as the preparation of semisynthetic protein samples.
Apart from the N terminus, single-site protein modification is difficult to achieve unless a single cysteine, tyrosine, or tryptophan residue can be engineered on the surface of the protein. Even in those cases, undesirable protein dimerization or solubility problems can result. However, several research groups have now demonstrated that combinations of natural amino acid side chains can create new functionalities that are both unique and armed for selective chemical or enzymatic modification. Such sequences can also be genetically encoded, thereby rendering their incorporation into a specific protein of interest straightforward.
Fluorescent proteins, most notably GFP, have become essential tools for studying the localization, dynamics, and interactions of proteins within live cells and organisms.[50–53] Unfortunately, the large size of these fluorescent proteins (ca. 200 residues) can interfere with the functions of proteins fused to them. This problem motivated Tsien and co-workers to develop an alternative method for labeling proteins with fluorophores that relies on a much smaller genetically encoded tag. They discovered that the tetracysteine motif CCXXCC, when situated in a hairpin structure, reacts selectively with biarsenical-functionalized fluorescent dyes such as FlAsH and ReAsH (Scheme 5A and B, respectively; both shown as their bis(ethanedithiol) (EDT2) adducts).[55–57] Fortuitously, the biarsenical dyes undergo a dramatic enhancement of fluorescence upon binding to the protein. Thus, background fluorescence is low, excessive washing steps are not required, and real-time imaging can be performed.
The biarsenical dyes are not without problems, however. They suffer from nonspecific hydrophobic interactions as well as reaction with other biological thiols, which renders the detection of low-abundance and disperse proteins difficult.[57,58] To overcome these problems, Tsien and co-workers have developed a labeling mixture comprising hydrophobic molecules to compete for nonspecific binding sites and excess ethanedithiol to compete with thiol-containing biomolecules. Additionally, the peptide sequence has been optimized for an enhanced association constant, thus allowing more stringent washing conditions to be used.
These improvements have led to the widespread use of the tetracysteine tag and biarsenical dyes. Applications include studies of mRNA translation, G-protein-coupled receptor activation, amyloid formation, ATP-gated P2X receptors, and transport of HIV-1 complexes. In addition, Schepartz and co-workers used a modified system termed bipartite tetracysteine display to study protein–protein interactions and protein folding. A similar split tetracysteine motif has been employed to study β-sheet formation. Extensions of the method to imaging by electron microscopy, pulse-chase experiments, Western blotting, and affinity chromatography have been performed.
Recently, Schepartz and co-workers have reported that a bisboronic acid rhodamine-based dye (RhoBo, Scheme 5C) binds to tetraserine motifs with a nanomolar Kd value. Similar to FlAsH and ReAsH, RhoBo is fluorogenic and cell-permeable, yet RhoBo does not utilize the cytotoxic element arsenic nor does it suffer from background fluorescence arising from thiol exchange. RhoBo was initially designed as a tool for monosaccharide detection, but it binds monosaccharides with a significantly higher Kd value compared to peptides with SSPGSS motifs, thus allowing RhoBo to selectively label proteins in the presence of carbohydrates. However, some endogenous proteins contain SSXXSS-like sequences that might lead to off-target labeling in cell-based systems.
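The sequence requirements discussed above (CCXXCC for the biarsenical dyes, SSXXSS for RhoBo) lend themselves to a simple primary-sequence scan, for example to check an expressed construct or to flag endogenous SSXXSS-like stretches that could cause off-target RhoBo labeling. A minimal sketch, assuming one-letter protein sequences; `find_motifs` and the example sequence are hypothetical. The scan does not assess hairpin structure or binding affinity, both of which the text notes are important.

```python
import re
from typing import List, Tuple

# X = any amino acid, written as "." in the regular expressions.
MOTIFS = {
    "tetracysteine (CCXXCC; FlAsH/ReAsH)": "CC..CC",
    "tetraserine (SSXXSS; RhoBo)": "SS..SS",
}

def find_motifs(sequence: str) -> List[Tuple[str, int, str]]:
    """Return (motif name, 1-based start, matched text) for every occurrence,
    including overlapping ones, of each motif in a protein sequence."""
    seq = sequence.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead with a capture group reports overlapping matches.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, m.start() + 1, m.group(1)))
    return hits

for hit in find_motifs("MAEFLNCCPGCCMEPSSPGSSK"):
    print(hit)
```

A hit indicates only that the primary sequence is compatible. Whether a given site is actually labeled depends on its structural context and accessibility.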
The hexahistidine peptide, a popular purification tag, has been exploited for labeling proteins in cellular systems. Its imidazole side chains chelate nickel nitrilotriacetate (Ni-NTA) with high affinity. Rhodamine derivatives, cyanine dyes, and fluorescein have been conjugated to Ni-NTA and used for imaging proteins containing His6 or His10 sequences (Scheme 6, entry 1). Two shortcomings of the technique derive from its reliance on nickel: the paramagnetic nature of nickel(II) often leads to fluorescence quenching, and nickel(II) can be toxic. Lippard and co-workers have reported a chlorinated fluorescein analogue whose fluorescence is not quenched by nickel(II) ions, thereby addressing the first limitation, but the latter problem still remains. Hauser and Tsien were able to overcome the toxicity and quenching problems associated with polyhistidine chelators by switching from nickel(II) to zinc(II). The fluorescent tag termed HisZiFit (Scheme 6, entry 2) was designed to bind His6 in a zinc(II)-dependent manner, upon which its fluorescence is activated. This reagent was used to study the stromal interaction molecule STIM1.
Hamachi and co-workers have also exploited zinc(II) in the development of chelating probes that recognize tetraaspartate sequences (Scheme 6, entry 3). Multinuclear zinc complexes were synthesized, conjugated to fluorescein and cyanine dyes, and used to image the muscarinic acetylcholine receptor in Chinese hamster ovary (CHO) cells. Two strategies were explored to engineer fluorescence activation of these zinc complexes upon protein binding. The first involved the use of an Asp4GlyAsp4 tag and a pyrene chromophore. When both tetraaspartate motifs were chelated to the zinc complex, the pyrenes formed an excimer with altered fluorescent properties. The second method is based on a fluorophore whose spectroscopic properties change with pH. The tetraaspartate motif created a local acidic environment that was reflected in the fluorescence of the bound dye. More recently, a zinc reagent bearing a chloroacetamide group was designed to alkylate a cysteine residue positioned near the tetraaspartate motif. This reagent extended the use of the dye to Western blotting and affinity-purification applications, which require more robust protein conjugation reactions.
Other metal-based peptide tags have been developed for imaging proteins in complex systems. From library screens, Imperiali and co-workers discovered peptides that form luminescent complexes with terbium(III) (Scheme 6, entry 4).[83,84] These lanthanide-binding sequences were used to detect tagged ubiquitin in cell lysates and to study, through luminescence resonance energy transfer, peptide–protein interactions between phosphopeptides and Src homology 2 domains. So far, the applications of lanthanide-binding tags have focused on in vitro studies because of the cell-impermeability of these metals and their potential toxicity if they are able to cross the cell membrane. However, these issues might be overcome by using chelators that can deliver the metals to their in vivo targets.
Nature provides a suite of enzymes that covalently modify proteins with small-molecule cofactors, and many of these enzymes recognize short peptide sequences that can be transplanted into heterologous proteins.[88,89] In some cases, the enzymes will recognize unnatural modifications to their small-molecule substrates, thereby allowing introduction of bioorthogonal functional groups or novel moieties such as biophysical probes. These methods are growing in popularity as a means to selectively modify proteins in live cells.
Biotin ligase has been artfully employed by Ting and co-workers for this purpose. The biotin ligase from E. coli, BirA, biotinylates a lysine residue within a 15-residue acceptor peptide (Scheme 7, entry 1) that is orthogonal to the peptide recognized by mammalian biotin ligases. Consequently, mammalian proteins tagged with the BirA recognition motif can be selectively biotinylated and visualized with streptavidin-conjugated quantum dots. Ting and co-workers also demonstrated that BirA can accept a ketone-containing analogue of biotin termed ketobiotin as a substrate. After enzymatic transfer to the protein of interest, the ketobiotin can be covalently labeled with hydrazide or aminooxy compounds (Section 4.1). The tolerance of BirA for unnatural substrates was limited to conservatively modified biotin isosteres. However, ligases from P. horikoshii and yeast were able to catalyze the transfer of azido- and alkynylbiotin analogues to proteins, thus enabling detection by Staudinger ligation or CuAAC (Sections 4.2 and 4.3, respectively).
The success of the biotin ligase method prompted Ting and co-workers to consider other enzyme-mediated strategies to tag proteins. They first focused their attention on transglutaminase (TGase; Scheme 7, entry 2), which had been employed previously for the in vitro modification of glutamine-tagged proteins with amine-conjugated probes.[94,95] Lin and Ting extended its applications to protein labeling on live cells. More recently, Ting and co-workers used lipoic acid ligase (LplA) to site-specifically modify proteins with short-chain azido fatty acids (Scheme 7, entry 3). Additionally, a mutant LplA transferred aryl azides onto proteins for photo-cross-linking applications (Scheme 7, entry 4). Lipoic acid ligase and biotin ligase have also been used to orthogonally label two independent proteins in the same cell.
Some posttranslational modifications have intrinsic orthogonal reactivity that allows for direct labeling of the protein at the modification site. This is the case for the aldehyde-containing formylglycine (FGly) residue formed by the action of the formylglycine-generating enzyme (FGE).[98,99] FGE recognizes a six-residue motif, in which a cysteine residue is oxidized to FGly. Normally found in type I sulfatases, the motif can be transplanted into heterologous proteins, where it is still recognized by FGE. We have exploited the FGE consensus sequence as a genetically encoded aldehyde tag for site-specific protein modification (Scheme 7, entry 5).[100–102] Coexpression of the tagged protein alongside FGE directly produces the aldehyde-functionalized protein. The aldehyde can be modified by using a variety of methods, such as condensations with aminooxy or hydrazide probes (Section 4.1). While most organisms have endogenous FGE activity, Carrico et al. verified that conversion of cysteine into FGly is enhanced when FGE is overexpressed, because more enzyme is then available to oxidize the tagged cysteine residues. The aldehyde tag has been employed to modify proteins expressed in E. coli as well as in mammalian cells, including secreted, cytosolic, and membrane-associated proteins.
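The sequence-level effect of FGE on an aldehyde-tagged protein can be sketched as a pattern substitution. This sketch rests on an assumption not stated in the paragraph above: that the six-residue tag is the commonly cited LCTPSR sequence, whose C-X-P-X-R core is what FGE acts on. `fge_convert` is a hypothetical helper, and it assumes complete conversion, whereas the text notes that conversion in cells can be incomplete and improves with FGE overexpression.

```python
import re

# Assumption: FGE acts on the cysteine of a C-X-P-X-R core (contained in the tag LCTPSR).
FGE_CORE = re.compile(r"C(?=.P.R)")

def fge_convert(sequence: str) -> str:
    """Model FGE action at the sequence level: each Cys that begins a CXPXR core
    is shown as the aldehyde-bearing residue '[FGly]'; all other residues are unchanged.
    Assumes 100% conversion."""
    return FGE_CORE.sub("[FGly]", sequence.upper())

print(fge_convert("MLCTPSRGS"))
```

Only the cysteine inside the tag is converted. A cysteine elsewhere in the sequence, outside a CXPXR context, is left untouched. That selectivity is what makes the resulting aldehyde a unique handle for the aminooxy or hydrazide chemistry described in Section 4.1.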
Site-specific protein modification has also been accomplished by use of bacterial sortases (Scheme 7, entry 6). Sortase A (SrtA), the most commonly used enzyme, naturally catalyzes the conjugation of proteins to the bacterial cell wall. The enzyme recognizes a peptide sequence (LPXTG) near the C terminus of its target site, cleaves the Thr–Gly bond, and forms an amide bond between the new C-terminal threonine residue and the N-terminal glycine of a polyglycine species. SrtA requires the polyglycine sequence but will tolerate heterologous sequences (or unnatural moieties) beyond the LPXTG motif. SrtA has been used for many in vitro applications, including peptide–, protein–, and carbohydrate–peptide ligations.[106,107] In 2007, Ploegh and co-workers reported the first sortase modification of proteins on live cells. In their report, an MHC H-2Kb protein was modified with a variety of oligoglycine probes containing biotin, fluorescein, tetramethylrhodamine, an aryl azide, and an ortho-nitrophenyl group. In later studies, Nagamune and co-workers used SrtA on live cells to label the extracellular C terminus of the membrane protein ODF with biotin and Alexa Fluor 488. The ligation of GFP to ODF was also demonstrated using SrtA. The benefit of sortase tagging is that there is no observed limitation to the size of the modification introduced, thereby eliminating the need for two-step strategies. However, only C-terminal modifications are possible with this technique.
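The SrtA transpeptidation described above (cleavage of the Thr–Gly bond in LPXTG and formation of a new amide bond to an oligoglycine nucleophile) can be sketched as a string operation. This is a minimal model under stated assumptions: sequences are one-letter strings, the last LPXTG occurrence is taken as the C-terminal recognition site, a nucleophile beginning with at least one Gly is accepted (a simplification), and `sortase_ligate` and the example sequences are hypothetical.

```python
import re

def sortase_ligate(substrate: str, nucleophile: str) -> str:
    """Sequence-level model of SrtA transpeptidation.

    substrate:   protein carrying an LPXTG motif near its C terminus
    nucleophile: probe-bearing species that starts with glycine (e.g. 'GGG...')
    Returns the substrate up to and including the Thr of LPXTG, fused to the
    nucleophile; residues after the Thr-Gly bond are released.
    """
    s, n = substrate.upper(), nucleophile.upper()
    if not n.startswith("G"):
        raise ValueError("nucleophile must start with glycine")
    # Use the last LPXTG, since the recognition motif is expected near the C terminus.
    matches = list(re.finditer(r"LP.TG", s))
    if not matches:
        raise ValueError("no LPXTG motif found")
    thr_end = matches[-1].start() + 4  # index just past the Thr residue
    return s[:thr_end] + n

print(sortase_ligate("MKHHLPETGGHHHHHH", "GGGK"))
```

The model makes the text's limitation visible: the new junction is always on the C-terminal side of the LPXT sequence, so this approach labels C termini only.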
Cell-surface proteins have been selectively modified using phosphopantetheinyl transferases (PPtases) from E. coli (AcpS) and B. subtilis (Sfp). PPtases catalyze the addition of a CoA-activated phosphopantetheine group to the serine residue of an acyl or peptidyl carrier protein (Scheme 7, entry 7). AcpS and Sfp do not load mammalian carrier proteins, and, therefore, these enzymes and their complementary bacterial carrier proteins can be used orthogonally within a mammalian system. These enzymes are highly promiscuous and can introduce a variety of functionality, including biotin and cyanine dyes, into proteins.[112,113] PPtases were used to study the transport of the transferrin receptor 1 and yeast cell-wall protein Sag1. In the latter study, a two-color labeling strategy was employed, which takes advantage of the pseudo-orthogonality of AcpS and Sfp. Sfp is not selective for the type of carrier protein it labels, while AcpS only modifies acyl carrier proteins. Thus, if a cell is exhaustively labeled with AcpS and then subjected to Sfp, two separate proteins may be selectively tagged and simultaneously studied.
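The order dependence of the two-color strategy can be checked with a small simulation. This is a hedged sketch: it encodes only the specificities stated in the text (AcpS modifies acyl carrier proteins, Sfp modifies acyl or peptidyl carrier proteins). It assumes each labeling step is exhaustive, and the protein and probe names are hypothetical.

```python
from typing import Dict, Optional

# Carrier-protein specificity as described in the text.
ENZYME_ACCEPTS = {
    "AcpS": {"ACP"},          # acyl carrier proteins only
    "Sfp": {"ACP", "PCP"},    # acyl or peptidyl carrier proteins
}

def label_step(carriers: Dict[str, str], labels: Dict[str, Optional[str]],
               enzyme: str, probe: str) -> None:
    """Transfer `probe` to every still-unlabeled carrier the enzyme accepts
    (assumes exhaustive labeling). `carriers` maps a fusion protein to 'ACP' or 'PCP'."""
    for protein, carrier in carriers.items():
        if labels.get(protein) is None and carrier in ENZYME_ACCEPTS[enzyme]:
            labels[protein] = probe

carriers = {"protein_X": "ACP", "protein_Y": "PCP"}   # hypothetical fusions
labels: Dict[str, Optional[str]] = {name: None for name in carriers}
label_step(carriers, labels, "AcpS", "CoA-dye-A")   # step 1: only the ACP fusion
label_step(carriers, labels, "Sfp", "CoA-dye-B")    # step 2: the remaining PCP fusion
print(labels)
```

Reversing the order (Sfp first) would put the same probe on both fusions, which is why exhaustive AcpS labeling must come first.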
The PPtase system suffers from the same problems as GFP, in that the carrier proteins (80–120 residues) can often lead to substantial perturbations of the protein of interest. To overcome this limitation, Walsh and co-workers performed phage-display selections to obtain an 11-residue peptide (termed ybbR) that was selectively modified by Sfp. Furthermore, phage-display screening resulted in two superior peptide sequences: one for Sfp and one for AcpS. The two new peptides (A1 and S6) both contain 12 residues and are modified more efficiently than ybbR. Additionally, these peptides display orthogonal reactivity to each other, as A1 is selective for modification by AcpS and S6 is selective for Sfp. This makes the two-color labeling strategy originally demonstrated by Johnsson and co-workers considerably more straightforward.
Johnsson and co-workers have developed a protein-labeling method that utilizes the human DNA repair protein O6-alkylguanine-DNA alkyltransferase (hAGT), which repairs guanine residues alkylated at the O6 position by transferring the alkyl group to an active-site cysteine. These researchers found that when O6-benzylguanine derivatives are introduced into cells, the benzyl group is readily transferred to hAGT. Thus, if hAGT is fused to a protein of interest, O6-benzylguanine derivatives can be used to specifically label the desired protein (Scheme 7, entry 8).[119,120] This method has been termed “SNAP tag”. A recent adaptation, termed “CLIP tag”, employs an orthogonal engineered enzyme that reacts with O2-benzylcytosine derivatives. Lastly, Promega has created a “HaloTag”, in which a protein of interest is fused to a bacterial haloalkane dehalogenase (DhaA) that has been mutated at the catalytic site to trap the covalent intermediate. The protein of interest can then be tagged with alkyl chloride probes (Scheme 7, entry 9).
Genetically encoded peptide tags have expanded the repertoire of proteins that can be probed in cellular systems. However, the other biomolecules—glycans, lipids, nucleic acids, and various metabolites—are not amenable to such genetically encoded tags. Instead, these biomolecules can be tagged by metabolic labeling with bioorthogonal chemical reporters, namely, functional groups that possess unique reactivity orthogonal to those of natural biomolecules. The process entails two steps. First, cells (or organisms) are incubated with a metabolic precursor adorned with a unique functional group—the chemical reporter. The metabolite could be a monosaccharide for glycan labeling, a nucleoside for DNA labeling, an amino acid for protein labeling, or a fatty acid for lipid labeling. Once the chemical reporter is incorporated into the target biomolecule, it is treated in a second step with a probe molecule bearing complementary bioorthogonal functionality.
Exquisite selectivity of the chemical reporter and probe molecule is critical for execution of this method of biomolecule labeling, but equally important are the intrinsic kinetics of the chemical reaction. Most reactions used to selectively label biomolecules follow second-order kinetics, and consequently, their rates depend on the concentrations of the two reactive components and the second-order rate constant. While the concentrations of the labeled species can be controlled to some extent in vitro, the labeled biomolecules are often at low concentrations in vivo. Biological labeling agents such as monoclonal antibodies typically bind their antigens with bimolecular rate constants that approach the diffusion limit (ca. 10⁹ M⁻¹ s⁻¹). Consequently, such reagents can be used at very low concentrations and still bind to their targets at reasonable rates. By contrast, most second-order chemical reactions have rate constants that are 8–15 orders of magnitude lower than this; the reactions discussed in this section have rate constants ranging from 10⁻⁴ to 10³ M⁻¹ s⁻¹.[126,127] These rate constants necessitate the use of relatively high concentrations (often high micromolar to millimolar) of secondary reagent when employing bioorthogonal chemical reactions in vivo, which may require consideration of solubility and toxicity when designing the labeling reagent. This point highlights the importance of optimizing the intrinsic kinetics of the bioorthogonal reaction as a means of reducing the concentrations required for in vivo labeling.
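The practical consequence of these rate constants can be estimated with a pseudo-first-order approximation. When the probe is in large excess over its target, the target is consumed with an observed rate constant k_obs = k2[probe], giving a half-life t1/2 = ln 2 / (k2[probe]). The sketch below assumes this regime. The rate constants and concentrations are illustrative values chosen from the ranges quoted above, and `labeling_half_life` is a hypothetical helper.

```python
import math

def labeling_half_life(k2: float, probe_conc: float) -> float:
    """Half-life (s) of the target under pseudo-first-order conditions.

    Assumes the probe is in large excess, so its concentration stays roughly
    constant: rate = k2 * [probe] * [target], k_obs = k2 * [probe],
    t_1/2 = ln(2) / k_obs.  k2 in M^-1 s^-1, probe_conc in M.
    """
    return math.log(2) / (k2 * probe_conc)

scenarios = [
    # (description, k2 in M^-1 s^-1, probe concentration in M); illustrative values
    ("antibody-like, 1 nM", 1e9, 1e-9),
    ("fast chemistry, 100 uM", 1e3, 100e-6),
    ("slow chemistry, 100 uM", 1e-3, 100e-6),
    ("slow chemistry, 1 mM", 1e-3, 1e-3),
]
for name, k2, conc in scenarios:
    t = labeling_half_life(k2, conc)
    print(f"{name:>24}: t1/2 = {t:.3g} s ({t / 3600:.3g} h)")
```

With these numbers, the antibody-like case reaches half-completion in under a second at nanomolar concentration. A reaction with k2 = 10⁻³ M⁻¹ s⁻¹ still needs days even at 1 mM probe. This gap is the quantitative reason for seeking faster bioorthogonal reactions.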
In Section 4 we discuss the reactions developed for use in the bioorthogonal chemical reporter strategy, and in Section 5 we cover their applications to specific biomolecule classes. The reactions include condensation of aldehydes and ketones with aminooxy and hydrazide probes (Section 4.1), the Staudinger ligation of triarylphosphines and azides (Section 4.2), and various reactions of azides and alkynes (Section 4.3). Bioorthogonal reactions involving alkenes are emerging (Section 4.4), although their use in complex biological systems is still on the horizon.
Ketones and aldehydes react with amine nucleophiles whose reactivity is enhanced by the α effect. Prototypical examples are aminooxy and hydrazide compounds, which form oxime and hydrazone linkages, respectively, under physiological conditions (Scheme 8). While biological nucleophiles (amines, thiols, and alcohols) also react with ketones and aldehydes, the equilibrium in water generally favors the free carbonyl compound. Accordingly, ketones and aldehydes have a rich history in the field of protein modification.[100,129–131] These carbonyl compounds have not been widely employed for labeling biomolecules inside cells or within live organisms, in part because of competition with endogenous aldehydes and ketones, including those in glucose and pyruvate (although, notably, Schultz and co-workers reported the intracellular labeling of a ketone-functionalized protein). However, aldehydes and ketones are absent from cell surfaces, and in this environment they can serve as unique chemical reporters. For example, we demonstrated that certain keto sugars are metabolized by cells and integrated into cell-surface glycans, where they can be reacted with aminooxy and hydrazide probes.[133,134] Paulson and co-workers introduced aldehydes into cell-surface sialic acid residues by mild periodate oxidation and then captured the modified glycoproteins by reaction with aminooxybiotin followed by streptavidin chromatography. In this case, aniline was used as a catalyst[135,136] to accelerate the reaction under neutral conditions, an improvement over the acidic conditions typically used for oxime formation. As described above, both biotin ligase and aldehyde tags have been used to label cell-surface proteins with ketones and aldehydes, respectively.
Azide has proven to be a particularly powerful chemical reporter group. Unlike aldehydes and ketones, azide is totally absent from biological systems, and it also possesses orthogonal reactivity to the majority of biological functional groups. Importantly, the azide group is small[138–140] and therefore only minimally perturbs a modified substrate. These favorable properties went unexploited in biological settings, however, until a suitable reactive partner for azide was developed. In 2000, azide made its debut as a chemical reporter group with the development of the Staudinger ligation.
The Staudinger ligation is a modification of the classic Staudinger reduction of azides with triphenylphosphine. As shown in Scheme 9, strategic placement of an ester group on one of the phosphine’s aryl substituents (1) allows an intermediate aza-ylide (2) to undergo intramolecular formation of an amide bond (3). This intramolecular capture is the central feature of the reaction; without it, the aza-ylide intermediate would simply hydrolyze to afford the corresponding amine and phosphine oxide. Instead, hydrolysis of intermediate 3 produces a stable ligation product (4), which includes the phosphine oxide within its structure.
The bioorthogonality of the reagents used in the Staudinger ligation warrants some discussion. Azides and phosphines have potential cross-reactivity with thiols and disulfides, respectively. Thiols are capable of reducing alkyl and aryl azides, especially under basic conditions, and dithiols such as dithiothreitol are particularly reactive in this regard.[143–145] Although the reduction of alkyl azides is quite slow at physiological pH, there is evidence that cytosolic glutathione can reduce the azide in 3′-azidothymidine. We addressed this potential problem in the context of labeling glycans with azido sugars. Jurkat cells bearing azido sugars in their cell-surface glycans were treated with a Staudinger ligation reagent or tris(2-carboxyethyl)phosphine (TCEP)—a trialkylphosphine known to readily reduce disulfides—followed by an amine-reactive biotin probe. The quantity of amines on the cell surface was subsequently measured by incubation with a fluorescent avidin reagent followed by flow cytometry analysis. Cells labeled with azido sugar displayed slightly higher levels of cell-surface amines than unlabeled cells, which suggests that some azides had been reduced in situ. However, the quantity of cell-surface amines was significantly higher for cells treated with TCEP, thus indicating that the majority of the azides were not reduced by cellular thiols.
In principle, phosphines can reduce disulfide bonds.[148,149] However, triarylphosphines are generally not capable of reducing alkyl disulfides (namely, those found in biological systems) under physiological conditions. We confirmed this observation in a biological setting by treating Jurkat cells with a triarylphosphine and quantifying the amount of free sulfhydryl groups on the cell surface by using a thiol-specific biotin reagent. No increase in free sulfhydryl groups was observed. By contrast, TCEP produced a marked increase in free sulfhydryl groups.
An appealing feature of the Staudinger ligation is that its mechanism lends itself to the invention of fluorogenic reagents for the real-time imaging of biomolecules. In 2003, Lemieux et al. reported a fluorogenic phosphine in which one of the aryl rings was replaced with a coumarin dye (5, Scheme 10A). The fluorescence of 5 was quenched by the lone pair of electrons on the phosphorus atom. Phosphine oxidation during the course of the Staudinger ligation relieved the quenching effect, thereby producing a highly fluorescent biomolecule-bound product (6). One problem with this “smart” probe was background fluorescence caused by low levels of oxidation of the phosphine in air. To overcome this problem, we exploited another step of the Staudinger ligation—ester cleavage. As shown in Scheme 10B, incorporation of a fluorescence resonance energy transfer (FRET) quenching group at the ester position in 7 provided an alternative means of fluorescence activation upon Staudinger ligation that was not sensitive to phosphine oxidation.
The Staudinger ligation has been adapted for applications beyond biomolecule labeling, most significantly for protein synthesis. “Traceless” versions of the Staudinger ligation have been developed to produce amide bonds without inclusion of the phosphine oxide moiety. Raines and co-workers merged this concept with thioester chemistry (reminiscent of NCL) to develop a traceless Staudinger ligation for peptide coupling (Scheme 11).[154–156] This traceless Staudinger-mediated peptide coupling involves the reaction of a peptide containing a C-terminal phosphinothioester (9) with an azide-labeled peptide. Attack of the phosphine in 9 on the azide gives iminophosphorane 10, which rearranges to 11. Hydrolysis of 11 completes the ligation, joining the two original peptides through a native amide bond (12). Unlike the standard NCL process, the method of Raines and co-workers does not require a cysteine residue at the ligation site.[154,155] Additional reagents have been synthesized for the traceless Staudinger ligation, including an ester-linked version. The phosphine reagents have also been optimized with respect to steric, electronic, and coulombic factors.[158,159]
The exquisite bioorthogonality of azides and triarylphosphines has enabled the use of the Staudinger ligation for probing biomolecules in many cellular environments, as well as within living animals.[147,160] However, like any reaction, the Staudinger ligation is not without limitations. It suffers from relatively slow reaction kinetics, necessitating high concentrations of a triarylphosphine (>250 μM). A series of kinetic studies determined that the Staudinger ligation displays second-order kinetics in reactions with alkyl azides (k = 10⁻³ M⁻¹ s⁻¹), which indicates that the rate-determining step is the attack of the phosphine on the azide. Unfortunately, all efforts to improve the kinetics of the Staudinger ligation by increasing the nucleophilicity of the phosphine reagents (such as by addition of electron-donating groups to its aryl substituents or replacement of aryl with alkyl substituents) also resulted in increased susceptibility to phosphine oxidation in air, which subverts the desired ligation. As discussed above, the kinetics can be critical to the success of in vivo biomolecule labeling studies. Consequently, there has been considerable interest in developing faster azide reactions.
An alternative mode of reactivity for the azide is its participation as a 1,3-dipole in [3+2] cycloadditions with alkenes and alkynes. This reaction was first reported at the end of the 19th century, and its mechanism has been described as a concerted cycloaddition since the 1950s, when Rolf Huisgen introduced the concept of 1,3-dipolar cycloadditions. However, the high temperatures or pressures required to promote the cycloaddition of azides with most dipolarophiles are not compatible with living systems. Nevertheless, the potential of this transformation, especially the cycloaddition of azides and alkynes to form aromatic triazole products (ΔG° ≈ −61 kcal mol⁻¹), was too great to be overlooked.
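For scale, the standard relation between ΔG° and the equilibrium constant K can be applied to the quoted value, taking it at face value at 298 K (where RT ≈ 0.593 kcal mol⁻¹):

```latex
K = \exp\!\left(-\frac{\Delta G^\circ}{RT}\right), \qquad
\log_{10} K = \frac{-\Delta G^\circ}{2.303\,RT}
\approx \frac{61\ \text{kcal mol}^{-1}}{2.303 \times 0.593\ \text{kcal mol}^{-1}}
\approx 45
```

An equilibrium constant of order 10⁴⁵ means that triazole formation is effectively irreversible; the obstacle to using the uncatalyzed reaction in biology is its activation barrier, which is why high temperatures or pressures are needed.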
In separate efforts, the Sharpless and Meldal research groups discovered that the formal 1,3-dipolar cycloaddition of azides with terminal alkynes to produce 1,4-disubstituted 1,2,3-triazoles could be effectively catalyzed by copper(I) (Scheme 12).[168,169] This reaction, now termed the copper-catalyzed azide–alkyne 1,3-dipolar cycloaddition (CuAAC), takes advantage of the formation of a copper acetylide to activate terminal alkynes toward reaction with azides. The copper(I)-catalyzed cycloaddition proceeds roughly seven orders of magnitude faster than the uncatalyzed cycloaddition, and the reaction can be further accelerated by the use of specific ligands for copper(I).[170,171] CuAAC has all the properties of a click reaction (including efficiency, simplicity, and selectivity), as defined by Sharpless and co-workers. In fact, it has become the quintessential click reaction and is often referred to simply as “click chemistry”. CuAAC has gained widespread use in organic synthesis, combinatorial chemistry, polymer chemistry, materials chemistry, and chemical biology.[173–182] The formal cycloaddition between azides and terminal alkynes can also be catalyzed by ruthenium(II) to obtain 1,5-disubstituted 1,2,3-triazole products, but this reaction is used far less frequently than CuAAC. CuAAC was first applied as a bioconjugation strategy by Finn and co-workers, who used it to attach dyes to cowpea mosaic virus.
To date, the use of CuAAC in living systems has been hindered by the toxicity of copper(I). Bacterial and mammalian cells as well as zebrafish embryos have been subjected to click chemistry conditions. E. coli expressing protein-associated azides have been labeled with 100 μM CuBr for 16 h and survived the initial labeling, but were no longer able to divide.[186,187] Similarly, mammalian cells can survive low concentrations (below 500 μM) of copper(I) for 1 hour. However, considerable cell death occurs when optimized CuAAC conditions that require 1 mM copper(I) are employed. Zebrafish embryos exhibited a similar sensitivity to copper(I). When the embryos were treated with 1 mM CuSO4, 1.5 mM sodium ascorbate, and 0.1 mM tris(benzyltriazolylmethyl)amine ligand, all the embryos were dead within 15 minutes. Thus, as presently formulated, CuAAC is of limited use for labeling biomolecules in living systems.
To improve upon the biocompatibility of the azide–alkyne cycloaddition, we sought to activate alkynes by a method other than metal catalysis, namely by ring strain (Scheme 12, bottom). The roots of strain-promoted azide cycloadditions precede the Huisgen era and date back to Alder and Stein’s discovery that dicyclopentadiene reacted considerably faster than cyclopentadiene with azides.[189,190] Studies on strained alkenes and alkynes continued through the 1960s, and during this time, Wittig and Krebs reported that cyclooctyne, the smallest stable cycloalkyne, reacted “like an explosion” when combined with phenyl azide. Building on this classic literature, we synthesized a biotin conjugate of cyclooctyne 13 and demonstrated that it labeled azides effectively within cell-surface glycans with no apparent cytotoxic effects. This study laid the foundation for the investigation of a series of cyclooctynes (Scheme 13, 14–20) that enable the detection of azides in living systems through the strain-promoted [3+2] cycloaddition.[193–196]
Still, the first-generation strain-promoted cycloaddition was no faster than the Staudinger ligation (k = 10⁻³ M⁻¹ s⁻¹) and considerably slower than CuAAC. With the aim of improving the kinetics, a series of compounds bearing electron-withdrawing fluorine atoms at the propargylic positions was investigated. The addition of one fluorine atom (cyclooctyne 15) modestly increased the rate (ca. 4-fold), but a gem-difluoro group afforded a dramatic 60-fold enhancement (k = 10⁻¹ M⁻¹ s⁻¹). This difluorinated cyclooctyne (16, abbreviated DIFO) displayed kinetics comparable to those of CuAAC in biomolecule labeling experiments, which prompted us to dub its reaction with azides “copper-free click chemistry”. DIFO–fluorophore conjugates have now proven to be exceptional reagents for imaging azide-labeled biomolecules within complex biological systems, including live cells, C. elegans, and zebrafish embryos, with very low background fluorescence. However, background labeling is sometimes observed in protein labeling experiments analyzed by Western blot, perhaps because of nonspecific hydrophobic interactions or as yet uncharacterized reactions with protein functional groups.
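The fold enhancements above can also be read as changes in activation free energy. The sketch below is a rough illustration, assuming transition-state (Eyring) theory at 298 K and an unchanged pre-exponential factor, so that the barrier lowering is RT ln(k_new/k_parent); the rate ratios are taken relative to the non-fluorinated cyclooctyne as quoted in the text, and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def barrier_lowering_kcal(rate_ratio: float, temperature_k: float = 298.15) -> float:
    """Decrease in activation free energy (kcal/mol) implied by a rate ratio.

    Transition-state theory gives k = (k_B*T/h) * exp(-dG_act / (R*T)); if the
    prefactor is unchanged, dG_act(parent) - dG_act(new) = R*T*ln(k_new/k_parent).
    """
    return R_KCAL_PER_MOL_K * temperature_k * math.log(rate_ratio)


# Fold enhancements relative to the non-fluorinated cyclooctyne, as quoted in the text.
for name, fold in [("monofluorinated cyclooctyne 15", 4.0), ("DIFO (16)", 60.0)]:
    print(f"{name}: {fold:.0f}-fold faster -> barrier lower by about "
          f"{barrier_lowering_kcal(fold):.1f} kcal/mol")
```

On this reading, the 4-fold and 60-fold enhancements correspond to barriers lower by only about 0.8 and 2.4 kcal mol⁻¹, respectively, which shows how modest changes in activation energy translate into large changes in labeling rate.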
Since the report of DIFO, a number of new cyclooctyne reagents have been developed. Two DIFO analogues that are easier to synthesize (17 and 18) have been reported and will hopefully make copper-free click chemistry more accessible to researchers in chemical biology. We have also synthesized a more water-soluble azacyclooctyne (19) that was designed to improve other attributes of the cyclooctyne reagents, such as their pharmacokinetic properties. Boons and co-workers have reported the use of dibenzocyclooctynol 20 in copper-free click reactions. This reagent is nontoxic, has reaction kinetics similar to those of DIFO, and is synthetically quite accessible. Recent theoretical studies by Houk and co-workers have provided frameworks in which cyclooctyne reactivity can be predicted and optimized. This theoretical basis may enable the design of more reactive congeners for synthetic pursuit.
Alkenes have also been used with 1,3-dipoles and dienes in cycloadditions promoted by ring strain or light (Scheme 14). The product of an azide–alkene cycloaddition is a triazoline, which is relatively unstable compared to an aromatic triazole and is therefore disadvantageous for applications in which a stable ligation product is desired. Rutjes and co-workers circumvented this problem by using oxanorbornadienes containing electron-deficient olefins (21) as substrates. The oxanorbornadiene underwent a [3+2] cycloaddition with an azide (22) to produce 23, which then underwent a retro-Diels–Alder reaction to extrude furan and yield triazole product 24 (Scheme 14A).[127,200] These reagents are straightforward to synthesize, but they display relatively slow reaction rates (k = 10⁻⁴ M⁻¹ s⁻¹).
By contrast, the reaction of tetrazines with strained alkenes reported by Fox and co-workers is very fast. They described the inverse-electron-demand Diels–Alder reaction of trans-cyclooctene 25 with dipyridyltetrazine 26 to form ligation product 29 (Scheme 14B). The reaction proceeds via intermediate 27, which rapidly loses N₂ to yield intermediate 28, which isomerizes to the final ligation product 29. This reaction is very fast in water (k = 10³ M⁻¹ s⁻¹). Independently, Hilderbrand and co-workers developed the reaction of norbornene 30 and tetrazine 31 (Scheme 14C). This reaction proceeds by the same mechanism but is slower than the tetrazine ligation of Fox and co-workers. It is important to note that normal-demand Diels–Alder reactions have been used for bioconjugation, but they often require dienophiles that are also Michael acceptors (for example, maleimides).[202–205] Competing Michael additions of biological nucleophiles therefore severely limit the selectivity of these reactions.
Lin and co-workers have developed a photochemical 1,3-dipolar cycloaddition between diaryl tetrazoles 35 and simple alkenes (Scheme 14D) to form pyrazolines. The first report of the reaction required light with a wavelength of 302 nm to produce the nitrile–imine dipole 36. In an effort to reduce potential photodamage to living systems, Lin and co-workers modified the aryl groups on the tetrazole so that the reaction can occur with irradiation at 365 nm. This reaction does not require an activated alkene, which makes its application to living systems more facile, since techniques for metabolic labeling of biomolecules with simple alkenes are well precedented (see Section 5). In fact, alkene-containing proteins in E. coli have been modified with diaryl tetrazoles. An additional advantage of this reaction is that the resulting pyrazoline cycloadducts (38) are fluorescent.
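One way to see the motivation for shifting the irradiation wavelength is the photon energy, E = hc/λ. The sketch below uses hc ≈ 1239.84 eV nm; it is only a rough indicator, since photodamage also depends on which biomolecules absorb at a given wavelength. The function name is illustrative.

```python
HC_EV_NM = 1239.84  # Planck constant times speed of light, in eV*nm


def photon_energy_ev(wavelength_nm: float) -> float:
    """Energy of a single photon (eV) at the given wavelength (nm): E = hc / lambda."""
    return HC_EV_NM / wavelength_nm


for wavelength in (302, 365):  # original and red-shifted irradiation wavelengths
    print(f"{wavelength} nm -> {photon_energy_ev(wavelength):.2f} eV per photon")
```

Moving from 302 nm to 365 nm lowers the energy per photon from about 4.1 eV to about 3.4 eV.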
Chemoselective modification of alkenes by cross-metathesis is emerging as a bioorthogonal reaction. Ruthenium-catalyzed olefin metathesis is remarkably tolerant of functional groups and has been employed with biomolecule substrates for some time, usually in organic solvents.[209–212] Over the past few years, multiple research groups have sought to develop water-soluble olefin metathesis catalysts.[213–216] Cross-metathesis with these catalysts has been particularly challenging and only a few substrates have been successful. Recently, Davis and co-workers have modified proteins containing allyl sulfide groups (39) through cross-metathesis in a tert-butanol/water mixture by using the Hoveyda–Grubbs second-generation catalyst (Scheme 14E). The high selectivity and functional-group tolerance of cross-metathesis, coupled with the ease of introduction of alkenes into biomolecules, render this technique well-poised for further application in biology.
The bioorthogonal chemical reporter strategy requires integration of one reactive component into target biomolecules within cells or organisms. Proteins, glycans, lipids, and nucleic acids have all been adorned with bioorthogonal functional groups in a global or site-selective fashion.
A straightforward method of introducing chemical reporters into cellular proteins, pioneered by Tirrell and co-workers, is simply to supply cells with an unnatural amino acid that is tolerated by the translational machinery, particularly the aminoacyl-tRNA synthetases (aaRSs; Scheme 15A).[218–221] The concept dates back to the 1950s, when methionine residues were shown to be replaced by their selenium analogues after the addition of selenomethionine to methionine-depleted growth media. We now know that surrogates for methionine, leucine, tryptophan, or phenylalanine can be incorporated into proteins expressed in E. coli, although reports can be found for the replacement of almost any amino acid with an unnatural derivative. Yields are optimal when the E. coli strain is rendered auxotrophic for the amino acid targeted for replacement, and overexpression of the required aaRS can also help.
By using this method, a variety of bioorthogonal functional groups have been incorporated into proteins, both in E. coli and in mammalian cells. For example, the methionine surrogates homopropargylglycine (42, Hpg), homoallylglycine (43, Hag), and azidohomoalanine (44, Aha) were used to introduce alkynes, alkenes, and azides, respectively, into proteins with good efficiency (Scheme 16).[187,223] Hpg was employed to label newly synthesized proteins with an azidocoumarin dye by CuAAC in bacterial and mammalian systems. Similarly, Aha has been used to interrogate newly synthesized proteins by labeling with CuAAC, the Staudinger ligation, and, more recently, cyclooctyne probes. The method has also been applied to proteomic analysis of newly synthesized proteins, a process termed bioorthogonal noncanonical amino acid tagging (BONCAT). Moreover, Aha and Hpg have been used together to image two distinct protein populations simultaneously. Azides and alkynes have also been installed within virus-like capsids by replacing methionine residues with Aha and Hpg. Preliminary data indicate that allylcysteine can also be incorporated as a methionine surrogate and then potentially modified by cross-metathesis.
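A toy model can clarify why residue-specific replacement tends to label a protein at several sites. It assumes, hypothetically, that each methionine position independently receives the surrogate (for example, Aha) with the same probability p, and it ignores details such as processing of the initiator methionine; the function names and the example sequence are illustrative only.

```python
def met_positions(sequence: str) -> list[int]:
    """1-based positions of methionine (M), the sites a Met surrogate can occupy."""
    return [i for i, residue in enumerate(sequence, start=1) if residue == "M"]


def expected_labels(n_sites: int, p: float) -> float:
    """Mean number of surrogate residues per chain (binomial mean n * p)."""
    return n_sites * p


def prob_labeled(n_sites: int, p: float) -> float:
    """Probability that a chain carries at least one surrogate: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n_sites


seq = "MKTLAMGSMLEK"  # hypothetical sequence with three Met residues
sites = met_positions(seq)
print(sites, expected_labels(len(sites), 0.5), prob_labeled(len(sites), 0.5))
```

Under this model, a chain with three methionines and p = 0.5 carries at least one handle with probability 0.875, and raising p (for instance by using a methionine auxotroph, as noted above) increases labeling across all proteins.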
Some unnatural amino acids are too structurally dissimilar from their native relatives for recognition by natural aaRS enzymes. In such cases, the aaRSs can be mutated to accept the unnatural substrate, either by rational structure-based[231,232] or selection-based methods. Such strategies have been used to obtain a mutant phenylalanyl-tRNA synthetase (ePheRS) and incorporate the ketone-bearing amino acid para-acetylphenylalanine (45) into proteins.[231,234] The same ePheRS has also been used to incorporate para-azido-, para-bromo-, and para-iodophenylalanines into proteins.[235,236] The azide allowed for modification with phosphines or alkynes, while the halogen derivatives were modified by palladium-catalyzed cross-coupling.[237,238]
Site-specific introduction of unnatural amino acids into proteins was first reported 20 years ago as an in vitro technique,[239,240] and it has since been extended, primarily through the work of Schultz and co-workers, to in vivo applications. This method uses the codon UAG (the amber nonsense stop codon), which normally directs termination of protein synthesis, to instead encode an unnatural amino acid loaded onto a complementary tRNA. The tolerance of the ribosome for unnatural amino acids allows for their incorporation into proteins during normal translation (Scheme 15B,C).
To integrate unnatural amino acids into proteins in an in vitro setting (Scheme 15B), the gene encoding the protein of interest is first mutated so that the amber stop codon is situated at the desired modification site, and all other amber stop codons are removed from the sequence. A tRNA is synthesized that contains a complementary anticodon as well as a covalently ligated unnatural amino acid at the 3′ end. Adding this artificial tRNA and the mutated gene to an E. coli in vitro transcription/translation system generates the modified protein. Many unnatural amino acids have been incorporated into proteins by using this technique; however, it suffers from low yields, and synthesizing the aminoacylated tRNA is laborious. Despite these limitations, the in vitro technique has been used to study a variety of proteins. Residues in α helices and β sheets were replaced with ester analogues to explore the contributions of backbone amide bonds to protein structure.[243–245] In addition, the contribution of cation–π interactions to protein stability was analyzed by using unnatural amino acids. A notable extension of the system has been microinjection of engineered mRNA and aminoacylated tRNA into Xenopus oocytes.[247,248] Dougherty, Lester, and co-workers have used this approach to study the mechanism of neuroreceptors through electrophysiology, and to this end more than 100 unnatural amino acids have been incorporated into Xenopus ion channels. The in vitro method coupled with microinjection is ideally suited for these applications because electrophysiology can detect very low levels of protein.
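The two gene-level requirements described above (an amber codon at the chosen site and no other in-frame amber codons) can be sketched as follows. This is a simplified model under stated assumptions: sequences are written as DNA (TAG for the UAG codon) and start at their first codon, displaced amber codons are rewritten to the ochre stop TAA (a hypothetical choice; TGA would also serve), and the suppressor tRNA is modeled as reading every TAG as a placeholder residue X. All function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

AMBER = "TAG"       # DNA form of the UAG amber codon
OTHER_STOP = "TAA"  # assumption: displaced amber codons become ochre stops


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence (starting at codon 1) into codons."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def install_amber(cds: str, site: int) -> str:
    """Place TAG at codon `site` (1-based) and rewrite every other in-frame TAG to TAA."""
    codons = split_codons(cds)
    if not 1 <= site <= len(codons):
        raise ValueError("site is outside the coding sequence")
    codons = [OTHER_STOP if c == AMBER else c for c in codons]
    codons[site - 1] = AMBER
    return "".join(codons)


def translate_with_suppressor(cds: str, unnatural: str = "X") -> str:
    """Translate until a stop codon, except that the suppressor tRNA reads
    every TAG as the unnatural amino acid (placeholder letter `unnatural`)."""
    protein = []
    for codon in split_codons(cds):
        if codon == AMBER:
            protein.append(unnatural)
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


gene = "ATGAAATTTGGCTAG"        # hypothetical: Met-Lys-Phe-Gly, then an amber stop
edited = install_amber(gene, 3)  # put the unnatural residue at position 3
print(edited, translate_with_suppressor(edited))
print(gene, translate_with_suppressor(gene))
```

The edited gene reads ATGAAATAGGGCTAA and translates to MKXG. Translating the unedited gene with the suppressor present gives MKFGX: the original terminal amber codon is also read as the unnatural residue and translation runs past the intended stop, which is why the other amber codons must be removed.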
To overcome the limitations of the in vitro method, Schultz and co-workers have created an in vivo system for site-specific incorporation of unnatural amino acids (Scheme 15C). The breakthrough was the selection of an orthogonal tRNA and aaRS that recognize the amber stop codon and the unnatural amino acid, respectively. Expressing the corresponding genes in a heterologous host, together with the gene encoding the desired protein bearing the amber mutation, produced the modified protein. Typically, the cell-culture medium is supplemented with the unnatural amino acid, but an E. coli strain engineered to produce para-aminophenylalanine was able to incorporate this “21st amino acid” by total biosynthesis. In vivo unnatural amino acid mutagenesis has also been extended to yeast[250,251] and mammalian cells.[252–254]
Dozens of unnatural amino acids have been incorporated into proteins by using the in vivo amber stop codon method; most are based on an aromatic (tyrosine or phenylalanine) core. Amino acids containing azides (44, 46), alkynes (42, 47), ketones (45),[132,257] and alkenes (43, 48) have all been incorporated as chemical reporters for subsequent reaction with complementary bioorthogonal probes (Scheme 16). Other functional groups installed into proteins include anilines,[259,260] aryl halides, boronic acids, photoisomerizable and cross-linking groups,[263,264] post-translational modifications,[265,266] caged amino acids that mask putatively important residues,[267–269] and even whole fluorophores.[270,271]
Activity-based protein profiling (ABPP) allows for the study of specific classes of enzymes based on their catalytic mechanism, often through the use of large molecules which contain a probe (fluorophore or biotin) conjugated to a functional group designed to react with residues in the target enzyme’s active site (a mechanism-based inhibitor). The bioorthogonal chemical reporter strategy has improved ABPP by eliminating the need for a large chemical probe during the covalent labeling process (Scheme 17).[272,273] In its initial form, ABPP made use of an electrophilic group (known as a warhead) that covalently modifies the targeted class of enzymes and is conjugated to an affinity probe such as a biotin or a fluorophore. While useful for tagging active enzymes from cell lysates, such large probes often had poor pharmacokinetic properties that prevented their use in vivo. Consequently, enzyme activities could not be studied in their native environments.
In 2003, the research groups of Cravatt and Overkleeft independently reported the application of bioorthogonal reactions to ABPP. Cravatt and co-workers used an azide-functionalized phenyl sulfonate that targeted serine hydrolases, while Overkleeft and co-workers used an azide-functionalized vinyl sulfone that targeted proteasomes.[188,275] The azide probes were introduced into live cells, where the target enzymes were covalently modified. Cell lysates were generated and subsequently treated with an alkyne-rhodamine probe through CuAAC ligation (Cravatt) or with a biotinylated triarylphosphine probe through a Staudinger ligation (Overkleeft). Cravatt and co-workers further showed that this two-step procedure allowed the extension of ABPP to live mice. Since the original report, Cravatt and co-workers have used this strategy to profile breast cancer cells and cytochromes P450, and have also determined that the copper(I)-catalyzed labeling procedure is more effective when the alkyne is attached to the electrophilic trap and detection is performed with an azido probe. Additionally, Ploegh and co-workers have used an azido-epoxide warhead, a triarylphosphine-biotin probe, and streptavidin-Alexa Fluor 647 for live-cell imaging of cathepsin proteases.
Proteins can be modified experimentally through genetics (amber stop codon methods and the addition of peptide sequences), biosynthesis (metabolic incorporation of unnatural amino acids using auxotrophs), and function (activity-based protein profiling). Many of these techniques are not applicable to glycans because these biopolymers are not encoded by a genetic template and lack the enzymatic activities that many proteins possess. However, glycans can be modified by metabolic labeling with biosynthetic precursors (Scheme 18). To date, unnatural sialic acid (Sia), N-acetylgalactosamine (GalNAc), N-acetylglucosamine (GlcNAc), and fucose (Fuc) residues have been successfully incorporated into glycans through salvage pathways or, in the case of sialic acid, through de novo biosynthesis (Scheme 19). This technique, known as metabolic oligosaccharide engineering, was first employed by Reutter and co-workers to alter the acyl side chains on cell-surface sialic acids in live cells and animals using N-acetylmannosamine (ManNAc) derivatives 49 and 50 as the metabolic precursors.[280,281] These unnatural sialic acids were used to study host–virus interactions, neuronal cell differentiation, and polysialic acid. Building upon these studies, we have exploited metabolic pathways to introduce bioorthogonal functional groups into sialylated glycans. Toward this end, a ketone-containing ManNAc derivative (51), known as N-levulinoylmannosamine (ManLev), was shown to be metabolized in a variety of cell lines.[133,285,286] The ketone groups were then treated with biotin hydrazide and detected by Western blot or flow cytometry. Ketone labeling was also exploited to promote gene transfer and adenoviral uptake by targeting an anti-adenovirus antibody to ketone-containing glycans.
The GalNAc salvage pathway enzymes proved to be less tolerant than the sialic acid biosynthetic enzymes, as N-levulinoylgalactosamine (GalLev) was not significantly metabolized in mammalian cells. Ketone isostere 52 was synthesized instead to introduce ketone groups into GalNAc-containing glycans. This compound was successfully incorporated into glycans in ldlD CHO cells, which lack the UDP-Gal/UDP-GalNAc 4-epimerase and cannot biosynthesize their own GalNAc.
In 2000, we reported that azides could be incorporated into sialic acid-containing glycans by treatment of cells with ManNAc derivative 53, N-azidoacetylmannosamine (ManNAz). This report also marked the introduction of the Staudinger ligation, as the azidosialic acid residues were detected with triarylphosphine-biotin probes followed by flow cytometry analysis. In contrast to the levulinoyl derivatives, the azido derivatives could easily be extended to the GalNAc and GlcNAc salvage pathways (using GalNAz (54) and GlcNAz (55), respectively), thus facilitating the analysis of mucin-type O-linked glycans and O-GlcNAcylated proteins with the Staudinger ligation or strain-promoted [3+2] cycloaddition.[289–291] Additionally, we have shown that azides can be incorporated into glycans in mice and zebrafish by using ManNAz or GalNAz as metabolic precursors.[160,197,292] For in vivo studies, the azido glycans can be detected by either the Staudinger ligation or the copper-free click reaction. Alternatively, they can be analyzed ex vivo by using any azide reaction.
Azides and other unnatural groups have also been incorporated into cell-surface sialic acid residues by using sialic acid analogues directly. Modification of the 5-N-acyl and 9-OH positions of sialic acid is well tolerated, and numerous unnatural groups have been introduced at these positions to study sialic acid binding events. Alkyl and aryl azides have been incorporated into cell-surface glycans through sialic acid precursors modified at the 5- and 9-positions (56–58).[294,295] Photo-cross-linking of sialic acid 58 was used to study the glycan ligands for CD22. Tanaka and Kohler have performed similar photo-cross-linking studies using a diazirine-containing ManNAc derivative, 60 (ManNDAz). Luchansky et al. employed sialic acids 56, 57, and 59 along with their corresponding ManNAc precursors 53, 61, and 51 to show that bypassing the first step of the sialic acid biosynthetic pathway often increases the yield of cell-surface glycans. Notably, the sialic acid biosynthetic pathway proved to be just as efficient as the salvage pathway for the incorporation of azides into sialic acid residues when ManNAz (53) and the corresponding azidosialic acid 56 were employed as substrates.
Fucosylated glycans have been labeled with 6-azidofucose 62;[297,298] however, fucose 62 was found to be toxic in many mammalian cell lines and has not been extended to living animals. Wong and co-workers reported the use of 6-alkynyl fucose 63, which was less toxic than the azido analogue. The Wong research group also successfully labeled cell-surface sialic acids with alkynyl ManNAc precursor 64, and Wu and co-workers used the same metabolite to label cell-surface glycans in mice. The degree of sialic acid metabolic labeling with alkynyl ManNAc is superior to that achieved with ManNAz; however, the alkyne must be detected by CuAAC, which is poorly suited to detecting glycans in living systems because copper(I) is toxic. Yarema and co-workers reported the metabolic labeling of sialic acids on stem cells with thio-ManNAc analogue 65 (ManNTGc) and showed that this unnatural glycan influenced the cells' differentiation.
The ability to metabolically label glycans with bioorthogonal functional groups has made these biopolymers amenable, for the first time, to molecular imaging. Toward this end, Chang et al. used triarylphosphine-fluorophore conjugates to visualize azide-labeled glycans on cultured cells. The same report demonstrated that two glycan subtypes could be imaged simultaneously by introducing ketones into one glycan type and azides into another. Azide-labeled glycans were also imaged by the Staudinger ligation with fluorogenic phosphine 7. Triarylphosphines thus enabled profiling of cell-surface glycans; however, the inherently slow rate of the Staudinger ligation prevented more dynamic events from being observed. Subsequently, Baskin et al. performed glycan transport experiments on azide-labeled glycans with DIFO–fluorophore conjugates to gain insight into the rate and trajectory of their internalization from the membrane. DIFO–fluorophore conjugates have also been used with great success to label glycans in developing zebrafish; this was the first example of molecular imaging of glycans in a live organism. Spatiotemporal analysis of GalNAz-labeled glycans revealed dynamic aspects of mucin glycoproteins throughout the developmental program. Recently, near-IR-absorbing, alkyne-containing carbocyanine dyes have also been used to image azido glycans in mammalian cells by CuAAC.
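Why a slow ligation hides dynamic events can be made concrete with pseudo-first-order kinetics. The sketch below assumes the probe is in large excess over cell-surface azides, so that the observed rate constant is k_obs = k2·[probe] and the labeled fraction after time t is 1 − exp(−k_obs·t). The rate constants, probe concentration, and labeling window are hypothetical, illustrative numbers, not measured values from this review.

```python
import math


def fraction_labeled(k2: float, probe_molar: float, seconds: float) -> float:
    """Fraction of azides converted after `seconds` (pseudo-first-order).

    Assumes the probe is in large excess, so its concentration stays
    roughly constant and k_obs = k2 * [probe] (units: s^-1).
    """
    k_obs = k2 * probe_molar
    return 1.0 - math.exp(-k_obs * seconds)


def half_life(k2: float, probe_molar: float) -> float:
    """Pseudo-first-order half-life in seconds: ln 2 / (k2 * [probe])."""
    return math.log(2) / (k2 * probe_molar)


# Hypothetical, illustrative second-order rate constants (M^-1 s^-1).
K_SLOW = 1e-3   # stands in for a slow reaction such as the Staudinger ligation
K_FAST = 1e-1   # stands in for a faster copper-free cycloaddition
PROBE = 100e-6  # hypothetical 100 uM probe
WINDOW = 60.0   # hypothetical 1 min labeling pulse

for name, k in [("slow", K_SLOW), ("fast", K_FAST)]:
    t_half_h = half_life(k, PROBE) / 3600
    labeled = fraction_labeled(k, PROBE, WINDOW)
    print(f"{name}: t1/2 = {t_half_h:.1f} h, fraction labeled in 1 min = {labeled:.2e}")
```

Because both reactions are far from completion in a short pulse, the signal collected in a fixed window scales almost linearly with k2; a reaction that is 100-fold faster gives roughly 100-fold more signal, which is what makes short time-resolved experiments feasible.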
There is growing interest in using chemical reporters embedded within glycans for proteomic analysis of glycosylation, an effort that may reveal new biomarkers of disease. For example, we have shown that cells treated with GalNAz incorporate the unnatural metabolite into mucin glycoproteins, which can then be identified by reaction with a phosphine-based affinity probe followed by capture, elution, trypsin digestion, and identification by mass spectrometry. Through this procedure we discovered that GalNAz also labels O-GlcNAcylated proteins, presumably through conversion of UDP-GalNAz into UDP-GlcNAz by the enzyme UDP-Glc/GalNAc C-4 epimerase. Zhao and co-workers analyzed O-GlcNAc-modified proteins of the cytosol and nucleus through metabolic labeling with GlcNAz followed by reaction with a phosphine probe, enrichment, and mass spectrometric identification.[304,305] In a similar way, Lemoine and co-workers analyzed the O-GlcNAc proteome. Using the metabolic precursor alkynyl ManNAc (64), Wong and co-workers isolated sialylated glycoproteins by reacting lysates with an azido-biotin probe through CuAAC, capturing the tagged proteins on streptavidin-conjugated beads, and digesting the proteins directly off the beads with trypsin, which yielded an enriched sample ready for mass spectrometric analysis. This method identified more than 200 glycoproteins, the majority of which bore sialylated N-linked glycans. These precedents establish a platform for comparative glycoproteomic studies that seek to identify disease-correlated changes in glycoprotein abundance or in the structures of the pendant glycans.
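The trypsin digestion step in these workflows turns each enriched protein into peptides that the mass spectrometer actually measures. The sketch below performs an in silico tryptic digest using the common simplified rule that trypsin cleaves C-terminal to Lys (K) or Arg (R) unless the next residue is Pro (P). It is an illustrative assumption, not the software used by the groups cited above; real search engines apply more nuanced cleavage rules. The function name `tryptic_digest` and the `missed_cleavages` parameter are hypothetical.

```python
import re
from typing import List

# Zero-width split point: immediately after K or R, unless followed by P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> List[str]:
    """Return peptides from an in silico trypsin digest.

    `missed_cleavages` allows up to that many uncut sites inside a peptide,
    mimicking incomplete digestion.
    """
    # re.split on a zero-width pattern needs Python 3.7+; drop empty pieces
    # (e.g. the one produced when the sequence ends in K or R).
    pieces = [p for p in TRYPSIN_SITE.split(sequence.upper()) if p]
    peptides = []
    for start in range(len(pieces)):
        stop = min(start + missed_cleavages + 1, len(pieces))
        for end in range(start, stop):
            peptides.append("".join(pieces[start:end + 1]))
    return peptides


# The K in "GKP" is not cleaved because Pro follows it.
print(tryptic_digest("AGKPLRDEKR"))
print(tryptic_digest("AGKPLRDEKR", missed_cleavages=1))
```

Peptide masses predicted in this way are what database searches compare against the observed spectra.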
Like glycans, lipids, which occur both as secondary metabolites and as posttranslational modifications, cannot be studied directly with genetically encoded reporters. Bioorthogonal chemical reporters have proven particularly useful for probing lipids in cellular systems (Scheme 20A).
Myristoylation and palmitoylation are the two predominant forms of fatty acid acylation on proteins. Both have been probed using azido lipid analogues in cell-culture systems. Ploegh and co-workers demonstrated that azide-labeled fatty acid 66 is converted in situ into its CoA analogue and transferred to sites of myristoylation on endogenous proteins. The longer-chain fatty acids 67–69, by contrast, were attached to proteins at sites normally modified by palmitoylation. N-Myristoylation has also been studied in vitro and in E. coli by using azido and alkynyl fatty acids 70 and 71, which were metabolically incorporated and subsequently detected through CuAAC.[310,311] Berthiaume and co-workers have studied the role of myristoylation in apoptosis by using azido fatty acid 66 and the Staudinger ligation. The same research group has also used similar methods to identify new palmitoylated proteins in the liver.
Unlike myristoyl and palmitoyl groups, farnesyl groups (as well as geranylgeranyl groups) are derived from isoprenoid diphosphates. Proteins that have a C-terminal CAAX sequence (A = aliphatic residue; X = Met, Ser, or Phe) are farnesylated on the cysteine residue by protein farnesyltransferases (PFTases). A variety of farnesyl diphosphate analogues containing unnatural functional groups (for example, alkynes, azides, and ketones) have been synthesized and used to modify proteins in vitro.[315–319] Zhao and co-workers incorporated azides into farnesyl groups in an in vivo setting by introducing azidoisoprenol 72 as a metabolic precursor. Labeled proteins were then enriched by means of the Staudinger ligation, and new farnesylated proteins were identified by mass spectrometry. The authors also reported that the method labeled geranylgeranylated proteins. Also noteworthy is that some research groups have exploited the conserved sequence recognized by PFTases to use protein farnesylation as a method of enzyme-mediated, site-specific protein modification (Section 3.3).
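The CAAX rule stated above is simple enough to express as a sequence check, which is how candidate farnesylation sites are often shortlisted before an enrichment experiment. The sketch below encodes the review's simplified definition literally; treating "aliphatic" as the residue set {A, V, L, I} is an assumption, and real PFTase specificity is broader than Met, Ser, and Phe at the X position. The names `has_caax_box` and `farnesyl_cysteine_index` are hypothetical.

```python
from typing import Optional

# Assumed residue sets following the simplified CAAX definition in the text.
ALIPHATIC = frozenset("AVLI")   # assumption: Ala, Val, Leu, Ile count as "A"
X_RESIDUES = frozenset("MSF")   # Met, Ser, Phe, as listed in the text


def has_caax_box(sequence: str) -> bool:
    """True if the protein ends in C-A-A-X under the simplified rule."""
    seq = sequence.upper()
    if len(seq) < 4:
        return False
    c, a1, a2, x = seq[-4:]
    return c == "C" and a1 in ALIPHATIC and a2 in ALIPHATIC and x in X_RESIDUES


def farnesyl_cysteine_index(sequence: str) -> Optional[int]:
    """0-based index of the cysteine that would carry the farnesyl group,
    or None if the sequence lacks a CAAX box."""
    return len(sequence) - 4 if has_caax_box(sequence) else None


for candidate in ["GSCVLS", "GSCVIM", "GSCVLL", "CVL"]:
    print(candidate, has_caax_box(candidate), farnesyl_cysteine_index(candidate))
```

A motif filter like this only predicts candidates; metabolic labeling followed by mass spectrometry, as in the study above, is what confirms modification.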
Neef and Schultz recently extended the bioorthogonal chemical reporter strategy to study the dynamics of phospholipids in bilayers. Alkyne-modified phosphatidic acids 73–75 were synthesized and introduced into mammalian cells. The S-acetylthioethyl (SATE) groups facilitated penetration of the cell membrane and were cleaved in the cytosol by esterases. The terminal alkyne-functionalized phospholipid 73 and its nonhydrolyzable version 74 were visualized on fixed cells by CuAAC with an azidocoumarin dye. Cyclooctyne lipid 75 enabled direct visualization of lipid bilayers on live cells by copper-free click reactions. This study marks the first integration of a cyclooctyne into a biomolecule for cell-based studies.
Nucleic acids adorned with bioorthogonal functional groups have been synthesized and used for various purposes (Scheme 20B), most notably for imaging cells undergoing DNA synthesis and replication. Salic and Mitchison reported that 5-ethynyl-2′-deoxyuridine (76, EdU) can be metabolically incorporated into DNA during replication and subsequently detected with an azido fluorophore through CuAAC. The alkynes can be detected in live cells if a cell-permeable fluorophore is employed, although cell viability is compromised after exposure to copper. With this procedure, newly synthesized DNA can be visualized quickly and with good sensitivity under milder conditions than the traditional method, in which 5-bromo-2′-deoxyuridine (BrdU) is detected with an antibody that requires prior DNA denaturation to access its epitope.[324,325] EdU is even effective for visualizing cells undergoing DNA synthesis in mice. More recently, RNA synthesis has been imaged in an analogous manner using 5-ethynyluridine (77, EU) and fluorescent azides.
The azido analogue of EdU, 5-azido-2′-deoxyuridine (78, AdU), was also reported, but detection of AdU by CuAAC suffered from considerably higher background signal than detection of EdU. AdU was not the first case in which an azide was installed into a nucleic acid: aryl azidonucleotides have been used as photo-cross-linking agents to study interactions between DNA or RNA and other biomolecules. Additionally, Rajski and co-workers have synthesized S-adenosyl-L-methionine (SAM) derivatives bearing azides and alkynes to probe DNA methyltransferase activity.[328–330]
Within the last decade, bioorthogonal reactions have become essential tools for chemical biologists. They have opened up new avenues for biological investigation and produced fundamental discoveries in areas as diverse as protein biophysics, neurophysiology, developmental and stem cell biology, and cancer detection. Additional orthogonal functional groups are still needed, as are improvements in the kinetics and selectivities of the reactions already at hand. Mining the periodic table and the classic organic literature are two avenues likely to bear interesting fruit. Group 15 has been particularly fruitful in terms of bioorthogonal reactions, and perhaps its heavier members, antimony and bismuth, could prove similarly useful. In addition, novel combinations of phosphorus and sulfur have yet to be explored. Pericyclic reactions seem poised for biological applications because their rates are often accelerated in polar solvents and their concerted mechanisms leave little room for interception by biological nucleophiles and electrophiles. Biologically compatible forms of energy may also be employed strategically to promote otherwise dormant reactions in living systems. Light has already been harnessed, and focused ultrasound is another energy source that might be exploited for bioorthogonal chemistry in vivo.
Extending chemical reporters to studies of small-molecule metabolites is another promising future direction. Cofactors and cholesterol seem to be prime targets for the addition of chemical reporters. Notably, the Burkart and Marquez research groups have already made progress in applying the bioorthogonal chemical reporter strategy to the analysis of pantetheine cofactors and of the mechanism of action of flavone-8-acetic acid, respectively. The field remains wide open to merging innovative chemical reactions with biological targets that have defied conventional research tools.
We thank J. M. Baskin, K. E. Beatty, K. W. Dehnert, J. C. Jewett, and S. T. Laughlin for critical reading of the manuscript.
Ellen Sletten was born in New Hampshire (USA) in 1984. She obtained her BS in Chemistry from Stonehill College in 2006, where she worked in the laboratory of Prof. Louis J. Liotta on the synthesis of polyhydroxylated pyrrolizidines. She is currently pursuing her PhD at UC Berkeley under the direction of Prof. Carolyn Bertozzi. Her research focuses on the synthesis of cyclooctyne reagents for use in copper-free click reactions.
Prof. Carolyn Bertozzi is the T.Z. and Irmgard Chu Distinguished Professor of Chemistry and Professor of Molecular and Cell Biology at UC Berkeley, and Professor of Molecular and Cellular Biology at UCSF. She is also the Director of the Molecular Foundry at the Lawrence Berkeley National Laboratory and an Investigator of the Howard Hughes Medical Institute. She earned her AB in Chemistry from Harvard University in 1988 and obtained her PhD at UC Berkeley in 1993 with Prof. Mark Bednarski. She carried out postdoctoral research at UCSF with Prof. Steven Rosen and joined the faculty at UC Berkeley in 1996.
Ellen M. Sletten, Department of Chemistry, University of California, Berkeley, CA 94720 (USA)
Prof. Carolyn R. Bertozzi, Departments of Chemistry and Molecular and Cell Biology and Howard Hughes Medical Institute, University of California and The Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (USA), Fax: (+1) 510-643-2628.