Proteolysis is a key regulatory post-translational modification in diverse cellular processes including programmed cell death, immune function, and development. Tracking proteolytic events has become a focus of researchers assessing the downstream consequences of protease activation. In this review we summarize unbiased methods for identifying protease substrates and tracking the extent of cleavage, a field termed “degradomics”. These include one- and two-dimensional gel-based methods for identifying protease substrates, N-terminal peptide identification methods for simultaneously identifying substrates and cleavage sites, and approaches for quantitation of cleavage events during endogenous proteolysis. Individual methods have identified more than 300 caspase-cleaved targets during apoptosis, suggesting broad future applications for these technologies.
Proteolysis is both a catabolic process for metabolite recycling and energy generation and an essential post-translational modification (PTM) that regulates intracellular and extracellular signal transduction.[2–4] Like other PTMs, proteolysis regulates protein function more rapidly than is possible via transcription and translation. Thus, its use is pivotal in responsive biological processes, for example in coagulation and the inflammatory response.[5–7] Site-specific proteolysis can also rapidly degrade numerous proteins in a controlled fashion, resulting in widespread morphological changes (e.g. in apoptosis or development). In studies of other PTMs, such as phosphorylation or glycosylation, systems-level protein profiling has linked physiological stimuli to site-specific protein modifications.[9,10] However, while these modifications can be targeted by affinity chromatography, proteolyzed proteins have no unique chemical handle enabling their enrichment from complex mixtures. This review focuses on approaches to track proteolysis in complex mixtures by defining the sequence specificity of proteases, identifying their protein targets and sites of cleavage, and measuring the kinetics and extent of cleavage events. Apoptosis and the proteolytic activity of the caspase family of proteases are model systems that will be used as a semi-universal metric to compare the techniques.
Key efforts towards defining proteolytic substrates focus on the determination of primary sequence specificities. Diverse methods, including substrate phage display [11,12] and mRNA display, identify small numbers of optimized cleavage sequences, while chemically and enzymatically derived peptide libraries interrogate the amino acid preferences of proteases at each subsite.[14–17] While highly successful, peptide-centric efforts cannot define the role of higher-order protein structure in substrate determination. To address this deficiency, a number of gel-based techniques have been developed to identify the protein substrates of proteases.
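As a rough illustration of how per-subsite preferences can be tabulated from a set of cleaved sequences, the following Python sketch counts residue frequencies at each position. It is not taken from any of the cited methods: the function name `subsite_frequencies`, the convention that the last character of each string is the P1 residue, and the four example sequences are all assumptions made for illustration, not data from the studies reviewed here.

```python
from collections import Counter


def subsite_frequencies(sites):
    """Tabulate residue frequencies at each subsite.

    `sites` is a list of equal-length strings covering the residues
    N-terminal to the scissile bond; the last character is taken to be P1
    (an assumed convention for this sketch).
    """
    if not sites:
        raise ValueError("need at least one cleavage-site sequence")
    width = len(sites[0])
    if any(len(seq) != width for seq in sites):
        raise ValueError("all sequences must have the same length")

    table = {}
    for pos in range(width):
        label = f"P{width - pos}"  # leftmost column is P<width>, rightmost is P1
        counts = Counter(seq[pos] for seq in sites)
        table[label] = {aa: n / len(sites) for aa, n in counts.items()}
    return table


# Hypothetical input: four made-up tetrapeptides ending in aspartate.
example_sites = ["DEVD", "DEHD", "VEID", "DQMD"]
table = subsite_frequencies(example_sites)
print(table["P1"])  # {'D': 1.0}
print(table["P4"])  # D at 0.75, V at 0.25
```

A table of this kind summarizes primary sequence preference only; it carries no information about whether a matching site is accessible in a folded protein, which is the gap the protein-level methods address.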
Two-dimensional differential gel electrophoresis (2D-DiGE) employs orthogonal electrophoresis methods to separate complex mixtures into resolvable spots. Comparative staining reveals differences between proteolyzed and control samples that can be analyzed by MS (Figure 1a). A variation on this approach, diagonal electrophoresis, uses two dimensions of SDS-PAGE with an intermediate in-gel protease treatment to identify protease targets. Proteins that migrate off the diagonal in the second, post-proteolysis dimension are likely substrates of the digesting protease (Figure 1b). 2D gel methods have been applied to caspase substrate identification, identifying 41, 13, and 15 putative substrates of caspases-1, -3, and -7 (Table 1).[19,20]
An alternative 1D gel-based method, termed PROTOMAP, has recently been described to identify cleavages that occur during Jurkat cell apoptosis.[21,22] Apoptotic and control cell lysates were separated by SDS-PAGE in adjacent lanes, and the gel lanes were sliced into bands (Figure 1c). Proteins in each band were identified by LC-MS/MS following in-gel trypsin digestion, and quantified by spectral counting. Proteins from apoptotic cells that decreased in intensity or shifted from higher to lower apparent molecular weight (261 of 1648 total proteins) were presumed to be caspase substrates. A caspase cleavage-derived peptide was detected directly in approximately one quarter of the proteolyzed proteins, and by overlaying the peptides of cleaved fragments onto a sequence map the authors could approximate the cleavage site for many others. However, without direct observation of a caspase-cleaved peptide, definitive cleavage sites cannot be assigned, and the possibility exists that non-caspase proteases are responsible for some of the observed proteolysis. A notable advantage of PROTOMAP is that many samples can be compared side by side, limited only by the size of the gel. The authors exploited this to monitor a time course of apoptotic cleavage, and identified both early and late cleavage events during apoptosis and the relative stabilities of the resulting proteolytic fragments. Unfortunately, this technique does not enrich for proteolyzed proteins, and thus preferentially samples highly abundant proteins. Consequently, extensive MS instrument time is required for comprehensive peptide identification from low amounts (~100 µg) of sample. It remains unclear whether the technique can be applied to study sparse proteolysis within a high background of unmodified proteins.
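The band-comparison logic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PROTOMAP authors' software: the data layout (one spectral count per gel band, with band 0 at the top of the gel), the function names, and the thresholds `min_shift` and `min_loss` are all hypothetical.

```python
from typing import Sequence


def weighted_band(counts: Sequence[float]) -> float:
    """Spectral-count-weighted mean band index (band 0 = highest apparent MW)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no spectral counts")
    return sum(i * c for i, c in enumerate(counts)) / total


def classify_protein(control: Sequence[float],
                     apoptotic: Sequence[float],
                     min_shift: float = 1.0,
                     min_loss: float = 0.5) -> str:
    """Label one protein by comparing its control and apoptotic lanes.

    min_shift: assumed minimum downward shift, in bands, to call a migration change.
    min_loss:  assumed minimum fractional loss of total counts to call a decrease.
    """
    ctrl_total, apo_total = sum(control), sum(apoptotic)
    if ctrl_total == 0:
        return "not detected in control"
    if apo_total <= (1.0 - min_loss) * ctrl_total:
        return "decreased"
    if weighted_band(apoptotic) - weighted_band(control) >= min_shift:
        return "shifted to lower MW"
    return "unchanged"


# Hypothetical counts for a five-band lane.
print(classify_protein([0, 20, 0, 0, 0], [0, 2, 0, 18, 0]))  # shifted to lower MW
print(classify_protein([0, 20, 0], [0, 5, 0]))               # decreased
```

Because the sketch relies on spectral counts alone, it inherits the limitation noted above: a shifted protein is only a presumed substrate until a cleavage-derived peptide is observed directly.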
Several groups have taken advantage of the semi-unique nature of protein N-termini to identify both the proteolyzed protein and the site of cleavage. Full-length proteins have a limited number of reactive nitrogen atoms corresponding to lysine ε-amines and N-terminal α-amines. Upon digestion with a residue-specific protease (e.g. trypsin), C-terminal and internal peptides gain an additional α-amine. Two teams of researchers have used this additional functionality to remove internal peptides and identify sites of proteolysis. One approach, termed COmbined FRActional DIagonal Chromatography (COFRADIC), treats full-length proteins with N-hydroxysuccinimide acetate to acetylate all lysines and N-termini (Figure 2a–b).[23–28] Subsequently, the proteins are trypsinized and the N-termini of internal peptides are capped with trinitrobenzenesulfonyl chloride, greatly increasing their hydrophobicity. The N-terminal peptides elute far earlier than the modified internal peptides on RP-HPLC, enabling their selective analysis by MS. By using an isotopically enriched amine-blocking group (e.g. trideuteroacetate) it is possible to differentiate blocked N-termini from endogenous N-terminal acetylation. COFRADIC has identified 58 caspase-cleaved peptides from apoptotic Jurkat T lymphocytes, 11 putative caspase-1 and 9 putative caspase-7 cleaved peptides in a caspase-1 treated lysate, and 585 direct human granzyme B substrates from a granzyme B-treated lysate. In a similar approach, Beynon and colleagues incubated an acetylated and trypsinized cell lysate with amine-reactive N-hydroxysuccinimide-activated beads to capture internal peptides (Figure 2b).[29,30] Each of these techniques identifies proteins based on a single peptide. Inappropriate lengths (i.e. peptides that are too long or short) or poor ionizability make ~50% of tryptic peptides unsuitable for unambiguous identification via MS and thus limit the number of possible substrate identifications.
Furthermore, incomplete capture of internal peptides can lead to a high background of false positives. On average, there are ~25 internal peptides for each N-terminal peptide in an acetylated lysate, so even small inefficiencies in their removal are highly problematic.
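The scale of this background problem follows from simple arithmetic, sketched below in Python. The calculation assumes that every N-terminal peptide is recovered and that internal and N-terminal peptides are detected equally well; both are simplifications, and the function name `n_terminal_purity` is hypothetical. The ratio of 25 internal peptides per N-terminal peptide is the approximate figure quoted above.

```python
def n_terminal_purity(removal_efficiency: float,
                      internal_per_n_term: float = 25.0) -> float:
    """Fraction of the enriched pool that is genuinely N-terminal.

    removal_efficiency: fraction of internal peptides removed (0 to 1).
    internal_per_n_term: internal peptides per N-terminal peptide before removal.
    """
    if not 0.0 <= removal_efficiency <= 1.0:
        raise ValueError("removal_efficiency must be between 0 and 1")
    leftover_internal = internal_per_n_term * (1.0 - removal_efficiency)
    return 1.0 / (1.0 + leftover_internal)


for efficiency in (0.99, 0.95, 0.90):
    print(f"{efficiency:.0%} removal -> {n_terminal_purity(efficiency):.2f} N-terminal")
# 99% removal -> 0.80 N-terminal
# 95% removal -> 0.44 N-terminal
# 90% removal -> 0.29 N-terminal
```

Under these assumptions, even 95% removal of internal peptides leaves a pool in which fewer than half of the peptides are true N-termini.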
An alternative to removing internal peptides for N-terminal peptide identification is to positively enrich for the peptides of interest. Salvesen and colleagues have developed a method for enrichment based on selective guanidinylation of lysine residues using O-methylisourea (Figure 2c). O-methylisourea inefficiently modifies protein α-amines, leaving them available for subsequent chemical biotinylation. After digestion with trypsin, N-terminal peptides can be isolated with immobilized streptavidin. In contrast to the two previous methods for N-terminal modification, this approach discards endogenously N-acetylated N-termini (~80% of all N-termini) and thus increases the sensitivity for detecting proteolytic cleavages. The authors used their technique to investigate mitochondrial transit peptides from yeast, mouse, and human cells. They found a total of 34 transit peptides from 27 proteins, only 10 of which had been previously annotated, demonstrating the utility of this method for N-terminal discovery. It should be noted that this method depends on highly efficient modification of lysine residues and highly selective biotinylation of the remaining α-amines. Biotinylation of serine, threonine, or histidine side chains could lead to false identifications. In principle, database matching in the MS analysis should distinguish real N-terminal peptides from spurious identifications, but incomplete peptide coverage, particularly close to the N-terminus, can introduce false positives.
We have employed subtiligase, an engineered variant of the bacterial protease subtilisin BPN’, to selectively label N-terminal peptides in a single step without lysine derivatization. Created by mutating the catalytic serine to a cysteine and modifying the geometry of this active site residue with a second point mutation (P225A), subtiligase has negligible amidase activity but remains active as an esterase.[33–35] The thioester-enzyme intermediate formed with peptide ester substrates during the catalytic cycle is slowly hydrolyzed by water, but can be rapidly intercepted by N-terminal amines, allowing transfer of the N-terminal portion of the ester onto free amines. We have never observed transfer onto peptide or protein side chains, suggesting an enzymatic specificity thus far unachievable via small molecule approaches. Identification of N-terminal peptides was accomplished by treating cell lysates with subtiligase and a biotinylated peptide ester (Figure 2c).[36,37] The resulting peptide mixture was trypsinized, N-terminal peptides were captured on streptavidin beads, and the desired peptides were released by cleavage of a TEV protease site in the original biotinylated peptide. TEV protease cleavage leaves a dipeptide tag that can be used to confirm true positives. In typical runs, >95% of identified peptides contain this tag. Using this method on apoptotic Jurkat cells, we identified 333 caspase-like cleavage sites on 292 protein substrates. The broad scope of these substrates enabled the discovery of unexpected caspase enzymatic activities, most notably cleavage within helices and systematic cleavage of proteins within protein-protein complexes. Though highly specific, the efficiency of subtiligase N-terminal labeling is low, and thus this technique requires large amounts of material (typically 50–100 mg of a complex mixture per experiment) to identify numerous substrates.
However, the highly enriched set of N-termini that result can be routinely analyzed in a single day via LC-MS/MS. Like all N-terminal identification procedures, the identification of one peptide per protein systematically excludes proteins with N-terminal peptides not readily identifiable via MS.
To date, the primary focus of protease-substrate classification techniques has been on identifying substrates and their cleavage sites. Efforts to understand the dynamics and extent of proteolysis could make significant contributions to our understanding of proteolysis in vivo. Simple two-plex quantitation can be introduced metabolically, chemically, or enzymatically into these techniques to evaluate relative levels of proteolysis in two samples (Figures 3a–b).[2,23,26,28,38] TMT- and iTRAQ-based quantitation allows for relative comparison of up to eight samples, enabling more detailed kinetic measurements or observations of proteolytic efficiency across multiple biological conditions (Figure 3c). Application of these methods to N-terminal peptides, however, monitors only the appearance of new N-termini, leaving the extent of cleavage undetermined. PROTOMAP and other gel-based methods can theoretically analyze every peptide in the protein and thus can monitor both appearance of new peptides and the extent of cleavage.
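The distinction between relative appearance and extent of cleavage can be made concrete with a short Python sketch. It assumes that signal (reporter-ion intensity or spectral count) scales linearly with abundance and that samples are loaded equally; the function names and the example numbers are hypothetical and do not come from the studies cited here.

```python
from typing import List, Sequence


def relative_appearance(intensities: Sequence[float]) -> List[float]:
    """Scale a neo-N-terminal peptide's signal across samples to its maximum.

    This gives only relative values: it shows when a new N-terminus appears,
    not what fraction of the parent protein has been cleaved.
    """
    peak = max(intensities)
    if peak <= 0:
        raise ValueError("peptide was not detected in any sample")
    return [value / peak for value in intensities]


def extent_of_cleavage(parent_control: float, parent_treated: float) -> float:
    """Fraction of full-length protein lost, estimated from the parent signal.

    Requires a measurement of the uncleaved protein, e.g. the full-length
    band in a gel-based experiment. The result is clamped to the range 0 to 1.
    """
    if parent_control <= 0:
        raise ValueError("parent protein was not detected in the control")
    remaining = parent_treated / parent_control
    return min(1.0, max(0.0, 1.0 - remaining))


# Hypothetical four-point time course for one neo-N-terminal peptide.
print(relative_appearance([0, 50, 150, 200]))  # [0.0, 0.25, 0.75, 1.0]
# Hypothetical full-length signal before and after treatment.
print(extent_of_cleavage(40, 10))              # 0.75
```

The first function needs only the neo-N-terminal peptide and so fits the N-terminal enrichment methods, while the second needs a measurement of the intact protein, which is why gel-based approaches can in principle report the extent of cleavage.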
Recent technological developments have revolutionized the process of protease substrate identification. These advances have provided facile means to categorize a protease’s sequence specificity, in vitro substrates, and the in vivo proteolysis that results from biological stimuli. The two main approaches, gel-based methods and N-terminal identification, provide complementary information: gel-based methods do not enrich for the peptides of interest and thus may miss low abundance substrates, while protein identification based on enrichment of single peptides is more sensitive, but inherently limited by the nature of MS-database matching. The introduction of MS-based quantitation into these analyses will enable comprehensive and high-throughput profiling of substrate proteolysis. These advances should reveal the full scope of proteolysis in both normal and disease states.
We would like to thank Juan Diaz, J. T. Koerber, Sami Mahrus, and David Wildes for their helpful comments on the manuscript. This work is funded by the National Institutes of Health grants F32-AI077177-01 (NJA) and R01 GM081051 (JAW), and by the Hartwell Foundation (JAW).
Nicholas J. Agard, Department of Pharmaceutical Chemistry, University of California, San Francisco, UCSF MC 2552, 1700 4th St, San Francisco, CA 94158, Email: Nicholas.Agard@ucsf.edu.
James A. Wells, Department of Pharmaceutical Chemistry, University of California, San Francisco, UCSF MC 2330, 1700 4th St, San Francisco, CA 94158, Email: Jim.Wells@ucsf.edu.