New chemical, physical, and biological properties can be introduced into proteins through the incorporation of unnatural amino acids, and unnatural amino acids have therefore been exploited to study a wide range of biological problems involving proteins. In particular, the stability, specificity, and catalytic properties of many proteins have been studied extensively with unnatural amino acids incorporated using chemically acylated tRNAs, revealing fundamental principles of protein structure and function (Cornish et al., 1995). Microinjection of chemically acylated tRNAs into Xenopus oocytes enables structure-function studies of integral membrane proteins with unnatural amino acids. Various ion channels and neurotransmitter receptors have been probed with unnatural amino acids using electrophysiological techniques, yielding novel insights into the functional roles of conserved amino acid residues and the gating mechanisms of this important family of proteins (Beene et al., 2003, Lummis et al., 2005). Recent years have also seen elegant studies of signal transduction, ion channel selectivity, and chromatin biology using unnatural amino acids introduced by native chemical ligation or expressed protein ligation (Hahn et al., 2007, Valiyaveetil et al., 2006, McGinty et al., 2008). Many excellent reviews already cover different aspects of this subject (Cornish et al., 1995, Hohsaka and Sisido, 2002, Beene et al., 2003, Link et al., 2003, Wang and Schultz, 2004, Schwarzer and Cole, 2005, Pellois and Muir, 2006), so here we focus only on applications of unnatural amino acids that are genetically encoded using orthogonal tRNA/synthetase pairs. Representative examples are summarized to illustrate the usefulness of this technology in modifying and probing proteins, regulating protein function, and generating new protein properties. We note that some of the unnatural amino acids discussed below have also been incorporated into proteins by other approaches, details of which can be found in the reviews cited above.
Labeling and Modifying Proteins
Reactive side chains of canonical amino acids such as cysteine and lysine have long been used for selective chemical modification of proteins, but the intrinsic selectivity is low unless other copies of that amino acid are absent from the target protein or can be removed by mutation. A nonproteinogenic chemical group carried by an unnatural amino acid can serve as a unique chemical handle for bio-orthogonal reactions, through which a variety of reagents can be selectively appended to proteins in vitro. These reagents can be biophysical probes, tags, post-translational modifications, or groups that modify protein stability or activity. Site-specific labeling of a target protein via unnatural amino acids directly inside live cells, however, remains challenging.
The keto group is not present in any of the canonical amino acids, and it reacts with hydrazides, alkoxyamines, and semicarbazides under mild aqueous conditions to produce hydrazone, oxime, and semicarbazone linkages, respectively, which are stable under physiological conditions. The keto group has been genetically encoded in E. coli in the form of p-acetylphenylalanine (5) and specifically labeled in vitro with fluorescein hydrazide and biotin hydrazide in greater than 90% yield (Wang et al., 2003). Similarly, fluorescent dyes have been selectively appended to m-acetylphenylalanine (6) introduced into the membrane protein LamB in E. coli (Zhang et al., 2003). p-Acetylphenylalanine and p-benzoylphenylalanine (21) have also been incorporated into rhodopsin, a transmembrane G protein-coupled receptor, in mammalian cells (Ye et al., 2008). The purified mutant rhodopsin is labeled in vitro with fluorescein hydrazide.
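Hydrazone, oxime, and semicarbazone formation are condensation reactions that release one water molecule, so a correctly labeled protein should gain the mass of the reagent minus that of water at each keto site. The short Python sketch below computes this expected shift, for example to check a mass spectrum after labeling; the helper name and the reagent mass used are hypothetical placeholders, not values from the cited studies.

```python
# Monoisotopic mass of water (Da). Hydrazone, oxime, and semicarbazone
# formation are condensations that release one H2O per labeled keto group.
WATER_MONOISOTOPIC_DA = 18.010565


def expected_mass_shift(reagent_mass_da: float, n_sites: int = 1) -> float:
    """Hypothetical helper: expected protein mass increase (Da) after labeling.

    Each keto group (e.g. from p-acetylphenylalanine) gains one reagent
    molecule (hydrazide, alkoxyamine, or semicarbazide) and loses one water.
    """
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    return n_sites * (reagent_mass_da - WATER_MONOISOTOPIC_DA)


# Placeholder reagent mass, not a value from the cited studies.
reagent_mass = 500.0
print(f"single site: {expected_mass_shift(reagent_mass):.3f} Da")
print(f"two sites:   {expected_mass_shift(reagent_mass, n_sites=2):.3f} Da")
```

Monoisotopic masses are used here because labeling is usually verified by high-resolution mass spectrometry; for low-resolution data, average masses would be the more appropriate choice.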
Natural glycoproteins are often present as a population of different glycoforms, which complicates glycan structure analysis and the study of how glycosylation affects protein structure and function. Generating pure glycoprotein mimetics with defined glycan structures will be valuable for systematically understanding glycan function and for developing improved glycoprotein therapeutics. To prepare homogeneous glycoprotein mimetics, sugars are synthesized in the aminooxy form and attached to p-acetylphenylalanine incorporated at defined sites in proteins (Liu et al., 2003). The attached sugar is subsequently elaborated with additional saccharides by glycosyltransferases to generate oligosaccharides with defined structures. Alternatively, an aminooxy-derivatized glycan can be covalently coupled to the p-acetylphenylalanine-containing protein in a single step.
PEGylation, the attachment of polyethylene glycol (PEG) to proteins, increases protein stability and solubility. Compared with PEGylation through canonical amino acids, PEGylation through the unique chemical handle of an unnatural amino acid increases the homogeneity of the final protein product and provides flexibility in choosing PEGylation sites to optimize protein activity. PEG has been attached to p-acetylphenylalanine introduced into human growth hormone to afford a protein that retains wild-type activity but has a considerably longer serum half-life (PG Schultz, personal communication). This approach can be extended to other therapeutic proteins to generate purer and more stable protein drugs.
Azide and acetylene are two additional nonproteinogenic chemical groups, which react with each other through a copper(I)-catalyzed [2+3] cycloaddition. p-Azidophenylalanine (7) and p-propargyloxyphenylalanine (8) have been genetically incorporated into human superoxide dismutase-1 in E. coli and yeast (Deiters et al., 2003, Deiters et al., 2004). The purified mutant proteins are labeled in vitro with fluorescent dyes or PEG derivatized with the complementary functional group. The requirement for Cu⁺ as a catalyst may make it difficult to use this reaction inside live cells. The azide group can also react with phosphine derivatives through the Staudinger ligation. p-Azidophenylalanine has been incorporated into the Z-domain protein in E. coli or into peptides displayed on phage, and can be labeled with fluorescein-derived phosphines (Tsao et al., 2005).
The phenylselenide group undergoes oxidative elimination upon treatment with hydrogen peroxide, followed by Michael addition with thiols. Phenylselenocysteine (9) has been genetically incorporated into GFP in E. coli (Wang et al., 2007). After treatment with hydrogen peroxide, the resulting dehydroalanine is selectively labeled in vitro with thiol-containing mannopyranose to generate an analogue of glycosylated GFP, or with n-hexadecylmercaptan to generate a palmitoylated GFP with a non-hydrolyzable linkage. Similarly, phenylselenocysteine has been used to append analogues of methyl- or acetyl-lysine, important histone modifications that contribute to chromatin structure and accessibility, onto purified Xenopus histone H3 (Guo et al., 2008). The H3 protein with an acetyl-lysine 9 mimic is deacetylated by a histone deacetylase complex and phosphorylated by Aurora B kinase, suggesting that such chemically labeled histones are likely functional in nucleosomes and can facilitate the analysis of chromatin.
Besides serving as chemical handles, unnatural amino acids can introduce changes directly into proteins, in particular to mimic post-translational modifications. These modifications are common in eukaryotic proteins and play important roles in protein stability, localization, and function, but they cannot easily be produced in bacterial cells. It is also difficult to prepare homogeneously modified proteins from eukaryotic cells. Unnatural amino acids with the properties of post-translational modifications may allow the generation of homogeneous proteins bearing the desired modifications, to uncover their functional roles and possibly to control protein function.
Glycosylation is a frequent modification of eukaryotic proteins. Homogeneous glycoprotein mimetics with defined glycan structures can be prepared using the keto- or phenylselenide-containing unnatural amino acids, but this leaves a small unnatural linkage between the protein and the first sugar. This linkage does not prevent glycosyltransferases from attaching additional sugars (Liu et al., 2003), but whether it affects other glycoprotein functions awaits further research. To prepare natively glycosylated proteins, mutant synthetases have been evolved that site-specifically incorporate β-N-acetylglucosamine-O-serine (10) or N-acetylgalactosamine-α-O-threonine (11) into proteins in E. coli (Zhang et al., 2004b, Xu et al., 2004). Since E. coli lacks endogenous glycosyl modification, genetic incorporation of glycosylated amino acids will help improve the production of proteins that require glycosylation for proper folding and function.
Phosphorylation of serine, threonine, and tyrosine by kinases, and its reversal by phosphatases, are key switches in many signal transduction cascades. A phosphatase-resistant phosphorylation mimic can be used to identify the role of phosphorylation at a specific residue and to create stably activated proteins for functional assays.
p-Carboxymethylphenylalanine (pCMF, 12), a nonhydrolyzable analogue of phosphotyrosine, was found to mimic the phosphorylated state of Tyr (Xie et al., 2007). Wild-type human STAT1, after being phosphorylated at Tyr701, forms a homodimer and binds strongly to a DNA duplex. The mutant STAT1 in which Tyr701 is replaced by pCMF also binds the same DNA duplex tightly, suggesting that pCMF could replace phosphotyrosine in the generation of constitutively active phosphoproteins.
Acetylation is another important reversible protein modification. For instance, acetylation of lysine residues in histone proteins controls the secondary structure of chromatin as well as gene expression levels. One method for generating histones tagged with acetyl- and methyl-lysine analogues, using phenylselenocysteine, is described above.
Nε-acetyllysine (13) has been genetically incorporated into proteins in E. coli (Neumann et al., 2008). In the presence of a deacetylase inhibitor, manganese superoxide dismutase containing Nε-acetyllysine at Lys44 is produced; however, acetylation of Lys44 does not appear to affect enzyme activity. In addition, Nε-acetyllysine and other lysine analogues have been incorporated into GRB2 in HEK293 cells (Mukai et al., 2008).
Probing Protein Structure and Function
The structure and function relationship of proteins can be studied with greater precision and accuracy when various biophysical probes are site-specifically incorporated into proteins. Genetically encoded biophysical probes further extend the potential of these studies into live cells, the native environment of proteins.
NMR spectra of large proteins or complexes are generally complicated to interpret. Specific NMR labels introduced at defined locations in a protein would greatly reduce this complexity and simplify signal assignment. Trifluoromethylphenylalanine (tfm-Phe, 14) was incorporated at the subunit interface of two obligate dimers, nitroreductase and histidinol dehydrogenase, both of which contain active sites at the dimer interface (Jackson et al., 2007). Substrate- or inhibitor-induced conformational changes are evident from the ¹⁹F NMR chemical shifts. In another report, the binding of a small-molecule ligand to the thioesterase domain of fatty acid synthase was studied by NMR at 11 different sites using tfm-Phe, ¹³C- and ¹⁵N-labeled o-methyltyrosine (15), and ¹⁵N-labeled o-nitrobenzyltyrosine (16) (Cellitti et al., 2008). By comparing the spectra of the different mutants and their conformational changes upon addition of the small molecule, the binding site of this molecule was mapped on the protein. It is also worth noting that photo-decaging of 16 to regenerate ¹⁵N-labeled tyrosine provides a useful method for isotopically labeling selected tyrosines without structural change.
p-Cyanophenylalanine (pCNPhe, 17) is a probe for infrared (IR) spectroscopy (Schultz et al., 2006). The stretching vibration of its nitrile group absorbs strongly at a frequency (νCN) of ~2200 cm⁻¹, which falls in the transparent window of protein IR spectra. pCNPhe was incorporated into myoglobin at His64, a site close to the iron center of the heme group, to examine ligand-bound states of the heme. When the Fe(III) ligand in ferric myoglobin is changed from water to cyanide, νCN shifts from 2248 cm⁻¹ to ~2236 cm⁻¹. In ferrous myoglobin, the linear Fe(II)-CO complex shows a νCN absorption at 2239 cm⁻¹, whereas the bent Fe(II)-NO and Fe(II)-O₂ complexes absorb at 2230 cm⁻¹. These results demonstrate that the nitrile group is a sensitive probe of ligand binding and the local electronic environment.
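The ligand-dependent shifts quoted above are easier to compare when expressed relative to a reference state. The minimal Python sketch below simply tabulates the reported νCN values and computes those differences; the dictionary keys and the `shift` helper are illustrative names, not part of any published analysis.

```python
# Nitrile stretching frequencies (cm^-1) of pCNPhe at position 64 of
# myoglobin, transcribed from the values quoted in the text.
NU_CN = {
    "Fe(III)-H2O": 2248.0,
    "Fe(III)-CN": 2236.0,  # quoted as approximately 2236
    "Fe(II)-CO": 2239.0,   # linear complex
    "Fe(II)-NO": 2230.0,   # bent complex
    "Fe(II)-O2": 2230.0,   # bent complex
}


def shift(state: str, reference: str) -> float:
    """Change in the nitrile frequency (cm^-1) of `state` relative to `reference`."""
    return NU_CN[state] - NU_CN[reference]


# Ferric protein: swapping the water ligand for cyanide.
print(shift("Fe(III)-CN", "Fe(III)-H2O"))  # prints -12.0
# Ferrous protein: bent NO and O2 complexes relative to the linear CO complex.
for ligand in ("Fe(II)-NO", "Fe(II)-O2"):
    print(ligand, shift(ligand, "Fe(II)-CO"))
```

Both comparisons give red shifts of about 10 cm⁻¹, which is large relative to the bandwidths typically resolvable in this spectral window and underlies the probe's sensitivity.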
Fluorescent unnatural amino acids may complement the widely used fluorescent proteins in imaging protein expression, dynamics, and function. Their smaller size can minimize potential perturbations to target proteins, and their fluorophores can be designed to report different microenvironmental changes. Dansylalanine (18) contains the dansyl fluorophore, whose fluorescence intensity increases in hydrophobic surroundings. When incorporated on the surface of a β-barrel in human superoxide dismutase, dansylalanine shows little change in fluorescence upon protein denaturation; when placed at an internal site of the β-barrel, its emission is red-shifted and its intensity greatly decreased upon denaturation (Summerer et al., 2006). Fluorescence changes can therefore be used to infer when a residue becomes exposed to solvent during unfolding. L-(7-Hydroxycoumarin-4-yl)ethylglycine (19) bears the coumarin fluorophore, which is sensitive to pH and polarity. It was genetically incorporated into myoglobin in either of two helices, A or C (Wang et al., 2006). At 3 M urea, coumarin in helix A or helix C shows a similar increase in fluorescence. At 2 M urea, however, fluorescence increases only when coumarin is incorporated in helix A, suggesting that helix C remains intact at 2 M urea and is destabilized only at higher concentrations. This result is consistent with previous NMR data and suggests that coumarin fluorescence can also report local protein unfolding. By contrast, p-nitrophenylalanine (20) quenches the fluorescence of nearby tryptophans (Tsao et al., 2006). When it is incorporated into the leucine zipper protein GCN4, tryptophan fluorescence is quenched in a distance-dependent manner, making p-nitrophenylalanine a useful distance probe for monitoring protein folding or conformational changes. Though still in its infancy, the use of genetically encoded fluorescent amino acids as real-time optical reporters and biosensors in live cells is promising.
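The text states only that quenching by p-nitrophenylalanine is distance dependent. If one assumes that this quenching follows Förster-type energy transfer (an assumption made here for illustration), the measured loss of tryptophan fluorescence can be converted into an apparent donor-quencher distance. The Python sketch below shows that conversion; the Förster distance `R0_HYPOTHETICAL` and the intensity values are placeholders, not numbers from Tsao et al. (2006).

```python
def transfer_efficiency(r: float, r0: float) -> float:
    """Förster efficiency E = 1 / (1 + (r / r0)**6) for donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def efficiency_from_intensities(f_with_quencher: float, f_without_quencher: float) -> float:
    """Quenching efficiency E = 1 - F_DA / F_D from steady-state intensities."""
    return 1.0 - f_with_quencher / f_without_quencher


def distance_from_efficiency(e: float, r0: float) -> float:
    """Invert the Förster relation: r = r0 * (1/E - 1)**(1/6), for 0 < E < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


# Hypothetical numbers for illustration only (not from Tsao et al., 2006).
R0_HYPOTHETICAL = 15.0  # Förster distance in angstroms, placeholder
e = efficiency_from_intensities(f_with_quencher=40.0, f_without_quencher=100.0)
r = distance_from_efficiency(e, R0_HYPOTHETICAL)
print(f"E = {e:.2f}, apparent Trp-pNO2Phe distance = {r:.1f} angstrom")
```

Because of the sixth-power dependence, such estimates are most reliable when the distance is within roughly a factor of two of R0; far outside that range the efficiency saturates near 0 or 1.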
Introducing heavy atoms into protein crystals is a critical step for phase determination in X-ray crystallography. A new method to introduce heavy atoms is to genetically incorporate p-iodophenylalanine (3) into proteins in E. coli or yeast. p-Iodophenylalanine was incorporated into T4 lysozyme, and the mutant protein was crystallized (Xie et al., 2004). Diffraction data were collected with a laboratory Cu Kα X-ray source, and the structure was solved by single-wavelength anomalous dispersion (SAD) phasing. A single iodinated amino acid among 164 residues gives a strong anomalous signal, about 3% of the total intensities, which compares favorably with the level achieved with selenomethionine using synchrotron beams. p-Iodophenylalanine causes little structural perturbation when substituted for Phe in the core of T4 lysozyme. This approach ensures that the heavy atom iodine is quantitatively introduced at a specific site of the target protein. The strong anomalous signal, the possibility of incorporation at multiple sites and in different cell types, and the use of an in-house X-ray source should facilitate high-throughput protein structure determination.
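Why one iodine among 164 residues is enough can be rationalized with the commonly used Hendrickson-Ogata estimate of the Bijvoet ratio, ⟨|ΔF±|⟩/⟨|F|⟩ ≈ (2N_A/N_T)^(1/2) · f''/Z_eff, where N_A is the number of anomalous scatterers, N_T the number of non-hydrogen atoms, and Z_eff ≈ 6.7 e. The Python sketch below evaluates it under two assumptions that are not from the source: about 8 non-hydrogen atoms per residue, and an approximate f'' of 6.8 e for iodine at Cu Kα. It estimates a ratio of structure-factor amplitudes, not the intensity fraction quoted above, so it is only an order-of-magnitude check.

```python
import math

# Effective scattering per average protein atom (electrons), the constant
# used in the Hendrickson-Ogata estimate of the Bijvoet ratio.
Z_EFF = 6.7


def bijvoet_ratio(n_anomalous: int, n_atoms: int, f_double_prime: float) -> float:
    """Estimated <|dF+-|>/<|F|> ~ sqrt(2 * N_A / N_T) * f'' / Z_eff."""
    return math.sqrt(2.0 * n_anomalous / n_atoms) * f_double_prime / Z_EFF


# Assumptions (not from the source): ~8 non-hydrogen atoms per residue and
# an approximate tabulated f'' of about 6.8 electrons for iodine at Cu K-alpha.
ATOMS_PER_RESIDUE = 8
F_DOUBLE_PRIME_IODINE_CU_KALPHA = 6.8

n_atoms = 164 * ATOMS_PER_RESIDUE  # T4 lysozyme: 164 residues
ratio = bijvoet_ratio(1, n_atoms, F_DOUBLE_PRIME_IODINE_CU_KALPHA)
print(f"estimated Bijvoet ratio for one iodine: {100 * ratio:.1f}%")
```

The estimate lands at a few percent, the same scale as the reported signal; the square-root dependence also shows that incorporating iodine at two sites would raise the ratio by about √2 rather than twofold.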
The side-chain size of unnatural amino acids can be altered conveniently and with atomic precision to probe the effects of side-chain bulk on protein structure and function. The fast inactivation mechanism of the voltage-gated K⁺ channel Kv1.4 was examined using unnatural amino acids in HEK293 cells (Wang et al., 2007). The classic ball-and-chain model of channel inactivation proposes that the N-terminal inactivation peptide forms a ball-like domain that occludes the channel's ion exit. In contrast, a newer model hypothesizes that the inactivation peptide threads through a side portal and extends into the inner pore to block ion flow. To test this experimentally, Tyr19 in the inactivation peptide was first mutated to Phe or Trp, neither of which changed channel inactivation. However, mutation to o-methyltyrosine (1) results in markedly slower inactivation, as does mutation to dansylalanine (18). Modeling suggested that the diameter of the inactivation peptide, which is unchanged by Phe or Trp at this site but increased by o-methyltyrosine and dansylalanine, is important for channel inactivation. This is likely due to the narrow width of the side portal in the channel, supporting the new model for channel inactivation.
Identifying and Regulating Protein Activity
Identifying protein interactions and controlling protein activity in vivo ideally require methods that are noninvasive and easy to apply. Light is an attractive option. Photo-responsive unnatural amino acids that can crosslink with nearby molecules, shed protecting groups, or change conformation have been developed and genetically incorporated into proteins. They enable in vivo manipulation of proteins and should have great potential in molecular and synthetic biology applications.
Crosslinking has been an important technique for uncovering protein interactions in cells. Site-specific photocrosslinking enabled by genetically encoded unnatural amino acids may help to pinpoint transient or weak interactions, to identify which region is involved, and to distinguish direct from indirect interactions. p-Azidophenylalanine (7), p-benzoylphenylalanine (pBpA, 21), and p-(3-trifluoromethyl-3H-diazirin-3-yl)-phenylalanine (TfmdPhe, 22) contain photocrosslinking side chains and have been genetically incorporated into proteins (Chin et al., 2002, Chin et al., 2002, Tippmann et al., 2007). pBpA was incorporated into the heat shock protein ClpB in E. coli at Tyr251, which is considered the main substrate recognition residue (Schlieker et al., 2004). Biotinylated substrate peptides are crosslinked upon UV exposure, but not when pBpA is incorporated elsewhere. pBpA has also been incorporated in yeast at several sites on the extracellular loops of Ste2p, a GPCR that binds the short peptide pheromone α-factor (Huang et al., 2008). At two of these sites, biotinylated α-factor is indeed photocaptured and detected by Western blot. pBpA has also been incorporated into the ligand-binding pocket of the SH2 domain of Grb2 (growth factor receptor-bound protein 2) in mammalian cells. When activated epidermal growth factor receptor (EGFR) and pBpA-containing Grb2 are co-expressed and treated with UV light, a higher molecular weight band is detected, indicating binding of EGFR to Grb2 (Hino et al., 2005). Genetically encoded photocrosslinkers should provide high specificity and be compatible with a wide range of biological processes, making this strategy attractive for probing protein complexes and interactions in vivo.
Photocaging of critical residues can be harnessed to regulate protein activity with light. A photoremovable protecting group is installed on a suitable amino acid in the target protein, masking the amino acid and rendering the protein inactive. Photolysis releases the caging group and converts the amino acid to the wild-type active form, generating abrupt or localized changes. For example, mutating the active-site Cys residue of the proapoptotic protease caspase 3 to o-nitrobenzylcysteine (23) in yeast yields a catalytically inactive enzyme. UV illumination of the cell lysate converts ~40% of the caged caspase to the active enzyme (Wu et al., 2004). o-Nitrobenzyltyrosine (24) was incorporated into β-galactosidase at Tyr503 in E. coli, which effectively reduced the activity of this enzyme to 5% of the wild-type level (Deiters et al., 2006). After the bacterial cells are exposed to UV light for 30 minutes, the enzyme regains ~70% of wild-type activity. A photocaged serine, 4,5-dimethoxy-2-nitrobenzylserine (DMNB-Ser, 25), has been incorporated at phosphoserine sites in the transcription factor Pho4 to control its phosphorylation in yeast (Lemke et al., 2007). DMNB-Ser blocks phosphorylation until it is photolysed with 405 nm light. Serine is then regenerated and subsequently phosphorylated, triggering the nuclear export of Pho4. Photocaging prevents post-translational modification of the selected residue until a desired moment, which makes it a more versatile method for examining signal transduction than mutating the residue to a null amino acid.
Photolysis of a caged residue is irreversible. Reversible modulation can be achieved with the photochromic azobenzene group, which undergoes reversible cis-trans isomerization under light of different wavelengths. The resulting change in geometry and/or dipole moment can be used to regulate protein activity. p-Azophenylphenylalanine (AzoPhe, 26) has been incorporated at Ile71 of the E. coli catabolite activator protein, a transcriptional activator (Bose et al., 2006). Its binding affinity for the promoter sequence decreases fourfold after irradiation at 334 nm, which converts the predominant trans AzoPhe to the cis form. The cis AzoPhe is switched back to the trans state by irradiation at >420 nm, after which the protein's affinity for the promoter is completely recovered.
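A fourfold change in affinity corresponds to a modest change in binding free energy, ΔΔG = RT ln 4. The Python sketch below evaluates it, assuming the fourfold change refers to the dissociation constant and assuming 25 °C, since the measurement temperature is not given here; the helper name is illustrative.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant, kcal/(mol*K)


def ddg_from_fold_change(fold: float, temperature_k: float = 298.15) -> float:
    """Binding free-energy penalty (kcal/mol) for a `fold` increase in Kd: RT ln(fold)."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold)


# Fourfold loss of promoter affinity after trans -> cis isomerization,
# assuming 25 degrees C (the measurement temperature is not given in the text).
print(f"{ddg_from_fold_change(4.0):.2f} kcal/mol")  # prints 0.82 kcal/mol
```

Under these assumptions, the photoswitch modulates binding by less than 1 kcal/mol, which illustrates that a single side-chain isomerization can produce a measurable, fully reversible change in function without large energetic perturbations.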
Generating New Protein Properties
Natural evolution has produced numerous proteins that meet the needs of life, and directed evolution and rational engineering further expand the diversity of protein function. Genetically encoding additional functional groups into proteins raises the hope of generating entirely novel protein properties that were previously impossible, and some initial progress is indeed encouraging.
Metal ions bound in proteins participate in a large number of catalytic and electron transfer reactions in cells. Rationally engineering a metalloprotein is challenging, because it is difficult to predict and control the shell of amino acids surrounding the metal ion. Multidentate metal-chelating amino acids with coordinating atoms pre-oriented in the correct configuration may facilitate such efforts. Bipyridylalanine (BpyAla, 27) has been shown to bind copper ions reversibly when genetically incorporated into T4 lysozyme (Xie et al., 2007). This Cu(II)-binding property has been exploited to oxidatively cleave the phosphosugar moiety of the nucleic acid backbone. BpyAla was incorporated into the catabolite activator protein (CAP) at Lys26, a site close to the protein-DNA interface (Lee and Schultz, 2008). In the presence of Cu(II) and 3-mercaptopropionic acid, the mutant CAP, originally a transcription factor, cleaves double-stranded DNA at its consensus sequence with high specificity. This method may be generally applicable to other DNA-binding proteins to map their preferred DNA sequences.
The distance probe p-nitrophenylalanine (20) has recently been found to stimulate potent immune responses, opening up novel immunological applications. An animal usually does not mount a substantial immune response against a self protein. This immune self-tolerance has been shown to be broken by introducing p-nitrophenylalanine into the antigen (Grünewald et al., 2008). Incorporating this unnatural amino acid into tumor necrosis factor α (TNF-α) elicits a strong immune response against the mutant TNF-α in mice, even in the absence of adjuvant. The resulting antibodies cross-react with native mouse TNF-α. The immune response is retained in mice after exposure to the mutant TNF-α, and it protects mice against lipopolysaccharide-induced death. These striking observations may lead to new therapies for cancer or protein-misfolding diseases.