Post-translational modification of proteins by ubiquitin is a fundamentally important regulatory mechanism. However, proteome-wide analysis of endogenous ubiquitylation remains a challenging task and has almost always relied on cells expressing affinity-tagged ubiquitin. Here we combine single-step immunoenrichment of ubiquitylated peptides with peptide fractionation and high-resolution mass spectrometry to investigate endogenous ubiquitylation sites. We precisely map 11,054 endogenous putative ubiquitylation sites (diglycine-modified lysines) on 4,273 human proteins. The presented data set covers 67% of the known ubiquitylation sites and contains 10,254 novel sites on proteins with diverse cellular functions, including cell signaling, receptor endocytosis, DNA replication, DNA damage repair, and cell cycle progression. Our method enables site-specific quantification of ubiquitylation in response to cellular perturbations and is applicable to any cell type or tissue. Global quantification of ubiquitylation in cells treated with the proteasome inhibitor MG-132 identifies sites that are potentially involved in proteasomal degradation and suggests a nonproteasomal function for almost half of all sites. Surprisingly, ubiquitylation of about 15% of sites decreased more than twofold within four hours of MG-132 treatment, showing that inhibition of proteasomal function can dramatically reduce ubiquitylation on many sites with nonproteasomal functions. Comparison of ubiquitylation sites with acetylation sites reveals an extensive overlap between the lysine residues targeted by these two modifications. However, crosstalk between these two post-translational modifications is significantly less frequent on sites that show increased ubiquitylation upon proteasome inhibition.
Taken together, we report the largest site-specific ubiquitylation dataset in human cells, and for the first time demonstrate proteome-wide, site-specific quantification of endogenous putative ubiquitylation sites.
Ubiquitin is a 76-amino-acid protein that can be conjugated to the ε-amino group of lysines in a process termed ubiquitylation or ubiquitination (1, 2). Post-translational modification (PTM) of proteins by ubiquitin is a reversible regulatory mechanism that is well conserved in eukaryotic organisms. The role of ubiquitylation has been studied extensively in the ubiquitin proteasome system (UPS), where substrate-linked ubiquitin provides a signal for proteasomal degradation of target proteins (3). However, ubiquitylation also plays important roles in many other cellular processes, including DNA damage repair, DNA replication, cell surface receptor endocytosis, and innate immune signaling (4–6). Deregulation of the UPS has been implicated in the development of cancer and neurodegenerative disorders (7–9). The clinical use of the proteasome inhibitor bortezomib and ongoing clinical trials of several other inhibitors emphasize the therapeutic relevance of the UPS (10, 11).
Accurate mapping of PTM sites is a key requirement for determining their functional roles and understanding the regulatory complexity of the proteome. Advances in high-resolution mass spectrometry (MS)-based proteomics have enabled the identification of thousands of in vivo PTMs (12). Quantitative proteomics can be used to analyze relative changes in PTM abundance on a global scale, enabling the identification of perturbation-relevant regulatory sites in complex signaling networks.
Identification of ubiquitylation sites by mass spectrometry is based on the presence of a di-glycine (di-Gly) remnant on ubiquitylated lysines. The di-Gly remnant is derived from the two C-terminal glycine residues of ubiquitin, which remain covalently linked to modified lysines after proteolytic digestion with trypsin. The distinct mass shift (114.0429 Da) caused by the di-Gly remnant enables identification and precise localization of ubiquitylation sites based on peptide fragment masses. Trypsin proteolysis of proteins modified by ubiquitin, NEDD8, or ISG15 generates an identical di-Gly remnant on modified lysines, making it impossible to distinguish among these modifications by mass spectrometry. However, the expression of ISG15 and its conjugation to lysines are relatively low in cells grown under standard culture conditions (13), and NEDD8 is believed to target primarily cullin family proteins (14). Consequently, the great majority of cellular peptides containing the di-Gly remnant are believed to stem from ubiquitylated proteins. Therefore, in this paper we refer to all di-Gly-modified lysines as “ubiquitylation sites,” even though a small fraction of these sites is likely to originate from modification by ISG15 or NEDD8.
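To make the mass arithmetic concrete, the following minimal Python sketch computes how a di-Gly remnant shifts a peptide's precursor mass and its b-ion ladder. It assumes standard monoisotopic residue masses, leaves out cysteine (and therefore carbamidomethylation) for brevity, and uses a hypothetical peptide, AGKLVDESR; the function names are illustrative and this is not the scoring used by any search engine.

```python
# Monoisotopic residue masses in Da (standard values; cysteine omitted for brevity).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per intact peptide (free N and C termini)
PROTON = 1.007276  # added to each singly charged fragment ion
DIGLY = 2 * RESIDUE_MASS["G"]  # remnant left on the lysine, ~114.0429 Da


def residue_masses(seq, digly_sites=()):
    """Per-residue masses; 0-based positions in digly_sites must be lysines."""
    masses = []
    for i, aa in enumerate(seq):
        mass = RESIDUE_MASS[aa]
        if i in digly_sites:
            if aa != "K":
                raise ValueError(f"position {i} is {aa}, not K")
            mass += DIGLY
        masses.append(mass)
    return masses


def precursor_mass(seq, digly_sites=()):
    """Neutral monoisotopic peptide mass."""
    return sum(residue_masses(seq, digly_sites)) + WATER


def b_ions(seq, digly_sites=()):
    """Singly charged b-ion ladder b1 .. b(n-1)."""
    ladder, running = [], 0.0
    for mass in residue_masses(seq, digly_sites)[:-1]:
        running += mass
        ladder.append(running + PROTON)
    return ladder


# Hypothetical peptide with a di-Gly remnant on the lysine at index 2.
seq = "AGKLVDESR"
shift = precursor_mass(seq, {2}) - precursor_mass(seq)
print(f"precursor shift: {shift:.4f} Da")  # precursor shift: 114.0429 Da
for n, (plain, modified) in enumerate(zip(b_ions(seq), b_ions(seq, {2})), start=1):
    print(f"b{n}: {plain:.4f} -> {modified:.4f}")
```

Fragments that do not contain the modified lysine (b1 and b2 here) are unchanged, while every fragment that includes it carries the full 114.0429 Da shift. This step change along the ladder is what lets fragment masses pin the remnant to one particular lysine.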
Large-scale ubiquitylation site mapping by mass spectrometry was first demonstrated in yeast, where over 100 ubiquitylation sites were identified (15). Since then, four large-scale ubiquitylation screens have mapped 1,192 sites in human cells (16–19). The methods used in all these studies require enrichment of ubiquitylated proteins. Although many putatively ubiquitylated proteins were identified (16), only a relatively small number of ubiquitylation sites were mapped. The limited depth of previous methods and their incompatibility with proteome-wide, site-specific quantification highlighted the need for more robust methods of ubiquitylation site identification and quantification.
In this study we developed a streamlined method in which ubiquitylated peptides are enriched directly from trypsin-digested whole-cell peptide mixtures with a recently developed di-Gly-lysine-specific antibody (17). Direct immunoenrichment of ubiquitylated peptides, together with peptide fractionation and high-resolution mass spectrometry, allowed in-depth analysis of putative ubiquitylation sites. Using this method we identified a considerable portion of previously known human ubiquitylation sites and discovered more than 10,000 additional sites. Furthermore, we combined our method with stable isotope labeling by amino acids in cell culture (SILAC) to quantify changes in ubiquitylation in response to the proteasome inhibitor MG-132. The described method enables proteome-wide quantification of endogenous ubiquitylation sites in response to different cellular perturbations.
HEK293T cells were cultured in DMEM and MV4–11 cells in RPMI 1640 medium, each supplemented with 10% fetal bovine serum, L-glutamine, penicillin, and streptomycin. For SILAC labeling, cells were cultured in media containing either L-arginine and L-lysine or L-arginine-U-¹³C₆-¹⁵N₄ and L-lysine-U-¹³C₆-¹⁵N₂ (Cambridge Isotope Laboratories, Andover, MA) as described previously (20). All cells were cultured at 37 °C in a humidified incubator containing 5% CO₂. For mapping of ubiquitylation sites, three independent experiments were performed using unperturbed HEK293T cells. Before lysis, cells were washed twice with phosphate-buffered saline and harvested using a cell scraper. For the quantification of ubiquitylation upon proteasome inhibition, two replicate experiments were performed using MV4–11 cells. Cells were treated with MG-132 (10 μM) or dimethyl sulfoxide for 4 hours, harvested by centrifugation, and washed twice with phosphate-buffered saline.
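The heavy amino acids named above determine how far apart the light and heavy forms of each peptide appear in the spectrum. A minimal sketch of that arithmetic, assuming complete label incorporation and no arginine-to-proline conversion (both idealizations), with hypothetical function names and a hypothetical peptide:

```python
C13_SHIFT = 1.0033548  # mass of 13C minus 12C, Da
N15_SHIFT = 0.9970349  # mass of 15N minus 14N, Da

# Heavy labels named in the methods text.
LYS8 = 6 * C13_SHIFT + 2 * N15_SHIFT   # L-lysine-U-13C6-15N2, ~8.0142 Da
ARG10 = 6 * C13_SHIFT + 4 * N15_SHIFT  # L-arginine-U-13C6-15N4, ~10.0083 Da


def silac_shift(seq):
    """Heavy-minus-light neutral mass difference of a peptide.

    Every K and R is counted, including a di-Gly-modified lysine; the two
    glycines of the remnant come from ubiquitin and carry no label because
    only lysine and arginine were labeled.
    Assumes complete incorporation and no arginine-to-proline conversion.
    """
    return seq.count("K") * LYS8 + seq.count("R") * ARG10


def pair_spacing_mz(seq, charge):
    """Distance between the light and heavy peaks on the m/z axis."""
    return silac_shift(seq) / charge


print(f"{silac_shift('AGKLVDESR'):.4f}")         # 18.0225
print(f"{pair_spacing_mz('AGKLVDESR', 3):.4f}")  # 6.0075
```

Because a di-Gly-modified lysine is not cleaved by trypsin, ubiquitylated tryptic peptides typically contain at least two labeled residues (the modified lysine plus the C-terminal K or R), which widens the spacing of the SILAC pair.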
Peptide fractions were analyzed on a hybrid linear ion-trap/Orbitrap mass spectrometer (LTQ-Orbitrap Velos, Thermo Scientific) equipped with a nanoflow high-performance liquid chromatography system (Thermo Scientific) as described (21). Peptide samples were loaded onto C18 reversed-phase chromatography columns (length 15 cm, inner diameter 75 μm) and eluted with a linear gradient from 8 to 50% acetonitrile in water containing 0.5% acetic acid over 3 to 3.5 h. The separated peptides were ionized by electrospray and measured in the mass spectrometer. Typical mass spectrometric conditions were: spray voltage 2.0 kV, no sheath or auxiliary gas flow, heated capillary temperature 275 °C. The LTQ Orbitrap Velos was operated in data-dependent mode to switch automatically between MS and MS2 acquisitions. Survey full-scan MS spectra (m/z 300–1150) were acquired in the Orbitrap at a resolution of r = 30,000 after accumulation to a target value of 1,000,000 ions in the linear ion trap. The ten most intense ions were sequentially isolated and fragmented by higher-energy C-trap dissociation (HCD) (22) or collision-induced dissociation (CID). The ion selection threshold was 5,000 counts for both HCD and CID. An isolation window of 4.0 Da, an activation time of 0.1 ms, and a normalized collision energy of 40% (HCD) were used; for CID, the normalized collision energy was 35%. Precursors with unassigned charge states or with charge states below +3 were excluded from fragmentation. Fragment spectra were acquired in the Orbitrap mass analyzer. A lock mass ion from ambient air (m/z 445.120025) was used for internal calibration of all measurements in the Orbitrap (23).
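The precursor selection rules above (top ten by intensity, at least 5,000 counts, charge +3 or higher, assigned charge, m/z 300–1150) can be written as a simple filter. This is a hedged, simplified model of the stated rules only; the class and function names are hypothetical, and real instrument firmware applies additional acquisition logic that is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Precursor:
    mz: float
    intensity: float
    charge: Optional[int]  # None when no charge state could be assigned


def select_for_ms2(survey: List[Precursor], top_n: int = 10,
                   min_intensity: float = 5000.0, min_charge: int = 3,
                   mz_range=(300.0, 1150.0)) -> List[Precursor]:
    """Pick up to top_n precursors from one survey scan for fragmentation."""
    low, high = mz_range
    eligible = [
        p for p in survey
        if p.charge is not None          # unassigned charge states are skipped
        and p.charge >= min_charge       # +1 and +2 ions are skipped
        and p.intensity >= min_intensity
        and low <= p.mz <= high
    ]
    eligible.sort(key=lambda p: p.intensity, reverse=True)
    return eligible[:top_n]


survey = [
    Precursor(500.3, 1.0e6, 2),     # most intense, but doubly charged
    Precursor(600.8, 8.0e5, 3),
    Precursor(701.4, 4.0e3, 4),     # below the 5,000-count threshold
    Precursor(812.9, 5.0e5, None),  # charge not assigned
    Precursor(455.2, 2.0e5, 4),
]
print([p.mz for p in select_for_ms2(survey)])  # [600.8, 455.2]
```

The charge filter is what discards the abundant doubly charged background in this toy scan, even though that ion is the most intense.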
Cells were lysed in modified RIPA buffer (1% Nonidet P-40, 0.1% sodium deoxycholate, 150 mM NaCl, 1 mM EDTA in 50 mM Tris-HCl, pH 7.5) supplemented with protease inhibitors (Complete protease inhibitor mixture tablets, Roche Diagnostics) and N-ethylmaleimide (Sigma) to inhibit cysteine proteases such as deubiquitylases. Lysates were incubated for 15 min on ice and cleared by centrifugation at 16,000 × g. The insoluble fraction was resuspended in modified RIPA buffer supplemented with 500 mM NaCl and sonicated to extract chromatin-bound proteins. Protein concentration of the cleared lysates was determined using BCA Protein Assay Reagent (Thermo Scientific). Proteins were precipitated in a fourfold excess volume of ice-cold acetone overnight at −20 °C and subsequently redissolved in denaturation buffer (6 M urea, 2 M thiourea in 10 mM HEPES, pH 8). Cysteines were reduced with 1 mM dithiothreitol and alkylated with 5.5 mM chloroacetamide (24). About 20 mg of protein was digested with endoproteinase Lys-C and sequencing-grade modified trypsin (Promega, Madison, WI) after fourfold dilution in deionized water. Protease digestion was stopped by adding trifluoroacetic acid to a final concentration of 1%. Precipitates were removed by centrifugation for 10 min at 3,000 × g. Peptides were purified using reversed-phase Sep-Pak C18 cartridges (Waters, Milford, MA), lyophilized, and redissolved in immunoprecipitation buffer (10 mM sodium phosphate, 50 mM sodium chloride in 50 mM 3-(N-morpholino)propanesulfonic acid, pH 7.2). Modified peptides were immunoenriched using 100 μg (5 μg per 1 mg of protein) of di-Gly-lysine-specific monoclonal antibody (Lucerna) for 12 h at 4 °C on a rotation wheel as described previously (25). The immunoprecipitates were washed three times in immunoprecipitation buffer followed by three washes in water, and the enriched peptides were eluted with 0.1% trifluoroacetic acid in H₂O.
Immunoenriched peptides were fractionated using isoelectric focusing (25, 26) or microcolumn-based strong cation exchange chromatography as described previously (27, 28). Peptide eluates were concentrated using a sample concentrator and acidified with 150 μl of 0.1% trifluoroacetic acid before desalting on reversed-phase C18 StageTips as described previously (27).
Raw data files were analyzed using MaxQuant software (version 22.214.171.124) as described (29). Parent ion and MS2 spectra were searched with the Andromeda search engine (30) against a database containing 87,061 protein sequences from the human IPI protein database version 3.68 and 248 sequences of commonly observed contaminants. Spectra were searched with a mass tolerance of 6 ppm in MS mode and 20 ppm in HCD MS2 mode, with strict trypsin specificity and up to two missed cleavages. Cysteine carbamidomethylation was searched as a fixed modification, whereas N-terminal protein acetylation, methionine oxidation, and di-Gly-lysine were searched as variable modifications. Site localization probabilities were determined by MaxQuant using the PTM scoring algorithm as described previously (29, 31). Identifications with a di-Gly-modified lysine at the peptide C terminus were removed, because trypsin does not cleave after di-Gly-modified lysines, and peptides were filtered by posterior error probability to reach a false discovery rate of 1%. For comparison with Mascot, all raw data from HEK293T cells were used. Peak lists were generated using MaxQuant (development version 188.8.131.52), and database searches were performed with the Mascot search engine. Ubiquitylated peptides were identified using MaxQuant. C-terminally modified peptides were removed. The false discovery rate in the final dataset, estimated using a target-decoy search strategy, was below 1%.
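The two filtering ideas in this paragraph, discarding C-terminal di-Gly lysines and accepting identifications up to a 1% false discovery rate estimated with decoys, can be sketched as follows. MaxQuant itself filters on posterior error probability; this is the plain target-decoy estimate (decoys divided by targets above a score cutoff), not MaxQuant's algorithm, and the `PSM` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PSM:
    sequence: str                 # unmodified sequence, e.g. "AGKLVDESR"
    digly_sites: Tuple[int, ...]  # 0-based positions of di-Gly lysines
    score: float                  # higher means a better match
    is_decoy: bool


def drop_c_terminal_digly(psms: List[PSM]) -> List[PSM]:
    """Remove matches whose di-Gly lysine sits at the peptide C terminus.

    Trypsin does not cleave after a di-Gly-modified lysine, so such a
    peptide is not expected from a tryptic digest.
    """
    return [p for p in psms if len(p.sequence) - 1 not in p.digly_sites]


def accept_at_fdr(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Return target PSMs above the lowest score cutoff whose estimated
    FDR (decoys / targets at or above the cutoff) is <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = keep = 0
    for i, p in enumerate(ranked, start=1):
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            keep = i  # deepest prefix that still satisfies the FDR bound
    return [p for p in ranked[:keep] if not p.is_decoy]
```

Scanning to the deepest prefix that satisfies the bound, rather than stopping at the first violation, mirrors the usual q-value convention: a single early decoy does not truncate the list if enough targets follow it.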
Statistical analysis was performed in the R software environment. Secondary structure analysis was performed using NetSurfP (32). Only predictions with a probability of at least 0.5 for one of the secondary structure classes (coil, α-helix, β-strand) were considered. The mean secondary structure probabilities of modified lysine residues were compared with those of a control dataset containing all lysine residues of all ubiquitylated proteins identified in this study. p values were calculated using the unpaired Wilcoxon rank-sum test.
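For readers outside R, the unpaired Wilcoxon rank-sum (Mann–Whitney U) comparison can be sketched in plain Python. This version uses the normal approximation with a tie correction and, unlike R's `wilcox.test` defaults, no continuity correction and no exact small-sample p values, so p values will differ slightly from R's. The function names and the probabilities in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Normal approximation with tie correction, no continuity correction.
    Returns (U statistic for x, p value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical helix probabilities for modified vs. control lysines.
modified = [0.62, 0.71, 0.55, 0.80, 0.67]
control = [0.41, 0.52, 0.60, 0.38, 0.45, 0.58]
u, p = rank_sum_test(modified, control)
print(u)  # 28.0
```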
Conservation analysis was performed using orthology assignments and multiple sequence alignments from the eggNOG database version 2.0 (33). The eggNOG database contains extended versions of the manually curated eukaryotic orthology groups and adds nonsupervised orthology groups (euNOGs), thereby providing broad coverage of protein sequences across different species. First, all modified human peptide sequences were mapped to eggNOG protein sequences and orthology groups. Only peptides matching sequences in a single orthology group were considered for further analysis, to avoid overcounting ambiguous peptide matches. Lysine conservation was determined separately for each species at each alignment position corresponding to a human modification site. A lysine residue was considered conserved if at least one sequence in the multiple sequence alignment contained a lysine at the aligned position. A dataset containing all lysine residues of all proteins identified in this study served as the control. The conservation of modified and control residues was plotted separately for each species. p values were calculated using Fisher's exact test.
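The conservation rule and the per-species test can be sketched in a few lines. The sketch assumes the "at least one sequence" rule is applied to the aligned residues of one species at a time, and implements the standard two-sided Fisher's exact test on a 2×2 table (modified vs. control rows, conserved vs. not conserved columns). Function names are hypothetical, and no counts from the study are reproduced.

```python
from math import comb
from typing import Iterable


def lysine_conserved(aligned_residues: Iterable[str]) -> bool:
    """True if any sequence of the species has K at the aligned column."""
    return any(r == "K" for r in aligned_residues)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)
```

For one species the table would be `[[modified_conserved, modified_not], [control_conserved, control_not]]`; the exact integer arithmetic in `math.comb` keeps the test usable even for the large lysine counts of a proteome-wide control set.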
One-dimensional annotation enrichment was performed using log-transformed SILAC ratios. The distribution of all site ratios was compared with the distribution of ratios for sites annotated with a specific PFAM, GOCC, or GOBP term. p values were calculated using the unpaired Wilcoxon rank-sum test and adjusted for multiple comparisons (34).
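Each annotation term yields one p value, so many terms are tested at once. The text cites (34) for the adjustment without naming the procedure; the sketch below assumes the Benjamini–Hochberg false discovery rate adjustment purely for illustration, and the input p values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values for four annotation terms.
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.2])])
# [0.04, 0.0533, 0.0533, 0.2]
```

The running minimum keeps adjusted values monotone in the raw p values, which is why the second and third terms end up sharing 0.0533.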
Gene Ontology (GO) enrichment analysis was performed using the functional annotation tool of the DAVID bioinformatics resources (35). A p value < 0.01 (adjusted for multiple comparisons) was considered statistically significant. Enriched terms were sorted by p value. To show diverse processes enriched in our data, redundant or highly similar terms were removed. Protein interaction network analysis was performed using interaction data from the STRING database (36). Only interactions with a score > 0.7 are represented in the networks. Cytoscape version 2.8 (37) was used for visualization of protein interaction networks.
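The interaction-score cutoff can be applied before networks are loaded into Cytoscape. The sketch below assumes a whitespace-separated layout modeled on STRING's downloadable link files (protein1, protein2, and an integer combined score scaled to 0–1000, with an optional header row); check this against the STRING release actually used. The protein identifiers are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple


def high_confidence_edges(lines: Iterable[str], min_score: float = 0.7,
                          proteins: Optional[Set[str]] = None) -> Set[Tuple[str, str]]:
    """Undirected edges with a combined score strictly above min_score.

    Assumes whitespace-separated rows of protein1, protein2, and an integer
    combined score on a 0-1000 scale, optionally preceded by a header row.
    """
    edges = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] == "protein1":
            continue  # blank line or header
        a, b = fields[0], fields[1]
        if int(fields[2]) / 1000 <= min_score:
            continue
        if proteins is not None and (a not in proteins or b not in proteins):
            continue  # keep only interactions among the selected proteins
        edges.add(tuple(sorted((a, b))))  # collapse A-B and B-A into one edge
    return edges
```

Restricting both endpoints to the set of ubiquitylated proteins keeps the resulting subnetworks limited to interactions among modified proteins; a score of exactly 0.7 is excluded because the text requires a score above 0.7.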
HEK293T cells were cotransfected with constructs encoding GFP-tagged proteins and Strep-HA-tagged ubiquitin. Cells were lysed in modified RIPA buffer, and 1 mg of lysate was subjected to Strep-Tactin Sepharose (IBA) pull-down. After five washes in lysis buffer, proteins were resolved by SDS-PAGE and immunoblotted with antibodies against GFP (Santa Cruz, Santa Cruz, CA).
In this study we identified 11,054 ubiquitylation sites on 4,273 human proteins. To generate this map of endogenous human ubiquitylation sites, we used unperturbed human embryonic kidney (HEK293T) and acute monocytic leukemia (MV4–11) cells. Proteins were digested into peptides with trypsin, and peptides containing di-Gly remnants were immunoenriched directly using a di-Gly-lysine-specific monoclonal antibody (Fig. 1A) (17). To further reduce the complexity of the immunoenriched samples, peptides were fractionated by isoelectric focusing (26) or a microcolumn-based strong cation exchange method (27).
Peptide fractions were analyzed on a high-resolution hybrid linear ion-trap/Orbitrap mass spectrometer (LTQ-Orbitrap Velos) (21). To obtain high-quality peptide sequence information, most peptides were fragmented using HCD (22), and both intact peptide ions and their fragment ions were analyzed in the high-resolution Orbitrap mass analyzer. In our initial experiments, we observed that the majority (~95%) of di-Gly-modified peptides were identified with a charge state of +3 or higher (data not shown). In contrast, most unmodified peptides were sequenced with a charge state of +2. Because of their higher charge state, di-Gly-modified peptides appear in a lower mass-to-charge (m/z) range. We therefore only selected precursors within m/z 300–1150 for sequencing and used data-dependent acquisition settings that excluded precursors with charge states below +3. Thus, the MS method was optimized to maximize the identification of di-Gly-modified peptides and to minimize sequencing of the unmodified background peptides present in the immunoenriched samples. In the immunoenriched samples, >35% of the identified peptides contained the di-Gly remnant. In contrast, in a nonenriched proteome dataset from MV4–11 cells (unpublished data), only 0.02% of the identified peptides were di-Gly modified, indicating an enrichment of di-Gly-modified peptides of more than 1,000-fold. In total, we analyzed 50 LC-MS runs with ~3.5-h chromatographic gradients. Intact peptide ions were measured with an average absolute mass accuracy of 0.63 ppm and fragment ions with an average absolute mass accuracy of 3.38 ppm (supplemental Fig. S1A). Computational analysis was performed using MaxQuant (29), allowing a maximum false discovery rate of 1% at the peptide and protein levels. Peptide searches were performed with the Andromeda search engine (30).
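Two quantities in this paragraph follow from simple formulas: the m/z of a peptide carrying z protons, which explains why higher charge states fall lower on the m/z axis, and the mass error in parts per million (ppm), the unit of the reported accuracies and of the 6 ppm precursor tolerance used in the database search. A minimal sketch with hypothetical function names and a hypothetical 2,400 Da peptide:

```python
PROTON = 1.007276  # mass of a proton, Da


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of a peptide carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, tol_ppm: float) -> bool:
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


# A hypothetical 2,400 Da di-Gly peptide at increasing charge states.
for z in (2, 3, 4):
    value = mz(2400.0, z)
    in_window = 300.0 <= value <= 1150.0
    print(f"z={z}: m/z {value:.4f}, inside m/z 300-1150: {in_window}")
# z=2: m/z 1201.0073, inside m/z 300-1150: False
# z=3: m/z 801.0073, inside m/z 300-1150: True
# z=4: m/z 601.0073, inside m/z 300-1150: True
```

At sub-ppm precursor accuracy, an m/z of 801.0078 observed for the z = 3 ion is about 0.65 ppm from the theoretical 801.0073 and falls well inside a 6 ppm window.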
The strict sequence specificity of trypsin (38) and its inability to cleave after ubiquitylated lysines greatly facilitated unambiguous localization of ubiquitylation sites. The average localization probability of the identified sites in our data set was 98.3% (localization probabilities for individual sites are given in supplemental Tables S1 and S2). In addition, we processed the entire HEK293T dataset separately with two search engines, Andromeda (30) and Mascot (Matrix Science, Boston, MA) (39). Both search engines identified a similar number of ubiquitylation sites, and 94% of sites were identified by both (supplemental Fig. S1B). We also verified ubiquitylation of eight randomly selected proteins by Western blotting (Fig. 1B).
The ubiquitylation sites identified in this study cover 67% (800 out of 1,192 sites) of the ubiquitylation sites reported in MS-based studies (16–19) (Fig. 1C). We find a greater overlap (73%) with the single largest ubiquitylation dataset (16), in which peptides were sequenced using the HCD method, as in our experiments. Thus, we sampled a substantial portion of known ubiquitylation sites, and identified more than 10,000 novel sites.
GO term annotations show that ubiquitylated proteins are present in all major cellular compartments including the nucleus, cytoplasm, plasma membrane, and mitochondria (Fig. 2A). These proteins are involved in various biological processes such as cell signaling, response to stress, cell cycle, and cell death (Fig. 2B).
To understand the properties of ubiquitylation sites, the local secondary structures of protein sequences surrounding ubiquitylation sites were investigated using NetSurfP software (32). The probabilities of different secondary structures (coil, α-helix, and β-strand) near ubiquitylated lysines were compared with the secondary structure probabilities of all lysines on proteins identified in this study. Ubiquitylated lysines are marginally, yet significantly, more frequently present in structured regions of proteins (p = 4.2e-03 for α-helix and p = 5.4e-04 for β-strand) and depleted in unstructured regions (p = 4.6e-07 for coil) (Fig. 2C).
To examine the properties of amino acids surrounding ubiquitylation sites, we compared the frequencies of neighboring amino acid residues for ubiquitylated and nonubiquitylated lysines on ubiquitylated proteins using iceLogo (40). We observed a significant preference for hydrophobic residues such as Phe, Tyr, Trp, Leu, Ile, and Val adjacent to ubiquitylated lysines (at positions −2, −1, +1, and +2) (Fig. 2D). Hydrophobic amino acids occur more frequently in ordered secondary structures than in unstructured regions, so these data agree with the secondary structure bias described above. Although the antibody used here is reported to recognize di-Gly-modified peptides independently of sequence context (17), it is possible that the observed sequence preference stems from a subtle sequence bias of the antibody.
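The underlying comparison, residue frequencies at fixed offsets around modified versus unmodified lysines, can be sketched as below. This is a simple frequency tally, not iceLogo's statistical procedure; the function names, the hydrophobic residue set taken from the text, and the toy protein are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List


def flanking_windows(sequence: str, positions: Iterable[int], width: int = 2) -> List[Dict[int, str]]:
    """Residues at offsets -width..+width (excluding 0) around each position."""
    windows = []
    for pos in positions:
        window = {}
        for off in range(-width, width + 1):
            j = pos + off
            if off != 0 and 0 <= j < len(sequence):
                window[off] = sequence[j]
        windows.append(window)
    return windows


def positional_frequencies(windows: Iterable[Dict[int, str]]) -> Dict[int, Dict[str, float]]:
    """Fraction of each residue at each offset, over all windows."""
    counts: Dict[int, Counter] = {}
    for window in windows:
        for off, aa in window.items():
            counts.setdefault(off, Counter())[aa] += 1
    return {off: {aa: n / sum(c.values()) for aa, n in c.items()} for off, c in counts.items()}


HYDROPHOBIC = set("FYWLIV")  # residues named in the text


def hydrophobic_fraction(freqs: Dict[int, Dict[str, float]], offset: int) -> float:
    return sum((f for aa, f in freqs.get(offset, {}).items() if aa in HYDROPHOBIC), 0.0)


# Hypothetical protein with lysines at indices 2 and 6; index 2 is "modified".
protein = "AFKLGDKEE"
fg = positional_frequencies(flanking_windows(protein, [2]))
bg = positional_frequencies(flanking_windows(protein, [6]))
print(hydrophobic_fraction(fg, -1), hydrophobic_fraction(bg, -1))  # 1.0 0.0
```

Windows near a protein terminus simply omit offsets that fall outside the sequence, so frequencies at each offset are normalized only over the windows that actually cover it.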
We examined the evolutionary conservation of ubiquitylated and nonubiquitylated lysines in diverse species across the evolutionary tree. Ubiquitylated lysines are significantly more conserved (p = 2.1e-4) than nonubiquitylated lysines (Fig. 2E), indicating stronger selective pressure to retain ubiquitylated lysines. These data suggest that many ubiquitylation sites have conserved regulatory roles. Alternatively, the higher conservation of ubiquitylated lysines may reflect conservation of the structured protein regions in which they occur.
To understand the cellular functions of ubiquitylated proteins, we performed network analysis using protein interaction information from the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database (36). GO annotations were used to group proteins according to their involvement in biological processes, and interaction subnetworks were visualized using Cytoscape (37). A large number of ubiquitylation sites were observed on proteins involved in DNA replication (Fig. 3A). All subunits of several major regulatory protein complexes are ubiquitylated, including the MCM complex (MCM2–7), the GINS complex (GINS1–4), the SMC complex (SMC1–6), and the RFC complex (RFC1–4) (Fig. 3A and supplemental Table S1). Similarly, many critical regulators of the DNA damage response are modified by ubiquitylation (Fig. 3B), including several kinases (ATM, ATR, DNAPK, and CHK1), the MRN complex (MRE11-RAD50-NBS1), and the BRCA1-associated genome surveillance complex. Monoubiquitylation of the Fanconi anemia (FA) pathway components FANCD2 and FANCI at K561 and K523, respectively, has been reported to play important roles in DNA damage repair (41, 42). We identified both of these known sites and found 19 additional sites on these two proteins. Ubiquitylation sites were also identified on several other proteins involved in the FA pathway, including BRCA2, FANCA, FANCE, and BRIP1, as well as USP1 and UAF1, which reverse FANCI and FANCD2 monoubiquitylation. In agreement with the important role of ubiquitylation in cell cycle progression, we identified ubiquitylation sites on a large number of proteins involved in this process (supplemental Fig. S2, supplemental Table S1). Among these proteins are many cell cycle regulators, such as the kinases Aurora B, PLK1, and BUB1 and the checkpoint protein BUB3. Progression of the cell cycle is tightly coordinated by cyclins and cyclin-dependent kinases.
We mapped ubiquitylation sites on cyclins B1, B2, C, I, K, and YL1 and on the cyclin-dependent kinases CDK1, 2, 4, 5, 6, 9, 10, and 11. Ubiquitylated proteins were found in several other interaction networks involved in nuclear as well as nonnuclear processes, including chromatin organization and remodeling, apoptosis, and protein folding (supplemental Fig. S2). We identified 57 ubiquitylation sites on various histones (supplemental Fig. S3 and supplemental Table S1), about half of which have not been reported previously. Ubiquitylation of lysine K120 on histones H2A and H2B is known to play crucial roles in chromatin silencing (43) and X-chromosome inactivation (44). We identified ubiquitylation of this lysine and of many others in all five major histones (H1, H2A, H2B, H3, and H4). Nearly all of these sites are also modified by other PTMs, such as acetylation and methylation, that are known to play important roles in the regulation of chromatin-associated processes (45).
Protein ubiquitylation is known to play important roles in the biology of cell surface receptors. We found a large number of ubiquitylation sites on diverse cell surface receptors (Fig. 4A). Ubiquitylation of receptor tyrosine kinases (RTKs) controls the amplitude and duration of receptor signaling (46). In agreement with this, we identified ubiquitylation sites on at least 13 different RTKs (Fig. 4A). We also found ubiquitylation on numerous other cell surface receptors involved in a variety of signaling pathways, such as Wnt, NOTCH, cytokine, chemokine, integrin, and G-protein-coupled receptor signaling. Ubiquitylation was also detected on many cytoplasmic tyrosine and serine/threonine kinases that function downstream of cell surface receptors (Fig. 4A–4C, supplemental Table S1).
It is known that RTKs and other cell surface receptors can be ubiquitylated in their intracellular domains, and that the type of ubiquitylation determines the fate of the receptor molecules (6, 46). In accordance with this, all ubiquitylation sites detected on receptor tyrosine kinases in our study are located in their intracellular domains. Similarly, several other signaling receptors, such as IFNR1, IL-3R, IL-17R, and OSMR, were ubiquitylated in their intracellular domains. Surprisingly, ubiquitylation sites on several other membrane receptors, such as CD11A, CD63, CD74, CD81, CD180, and HLA-A, -B, and -C, were found only in domains annotated as extracellular (supplemental Table S3). It is unclear whether these protein domains are incorrectly annotated or whether certain proteins are preferentially ubiquitylated in their extracellular domains.
We mapped ubiquitylation sites on many proteins involved in immune response signaling downstream of tumor necrosis factor alpha (TNF-α) and interleukin 1 (IL1)/Toll-like receptors. Sites were identified on TNF receptor-associated factor (TRAF) family proteins, including TRAF2, 3, 4, 7, and TRAFD1 (Fig. 4B), as well as on several other ubiquitin ligases involved in these pathways (IAP1, IAP2, XIAP, and ITCH). In addition, all three components (HOIL1, HOIP, and SHARPIN) of the linear ubiquitin chain assembly complex (47, 48), which plays an important role in NF-κB activation downstream of these receptors, were found to be ubiquitylated. Ubiquitylation sites were also mapped on key signaling kinases in these pathways, such as IRAK1, IRAK3, and TNK1, and on NEMO, the regulatory subunit of the IκB kinase complex.
We discovered many ubiquitylation sites on proteins involved in diverse signaling pathways. More than 200 ubiquitylation sites occurred on various protein kinases known to function in signaling pathways activated by RTKs (supplemental Table S1 and Fig. 4C). These proteins include several important components of the mitogen-activated protein kinase (MAPK) and phosphatidylinositol 3-kinase (PI3-K) signaling pathways (Fig. 4C). Ubiquitylation of AKT1, a critical component of the PI3-K pathway, is known to be important for its activation (49). Based on mutational analysis, AKT1 was reported to be ubiquitylated on K8 and K14. Our results confirm ubiquitylation of AKT1 on K8 and identify two additional ubiquitylation sites on this protein (K30 and K276), one of which (K30) is modified in both AKT1 and AKT2. Identification of ubiquitylation sites on all major components of the PI3-K and MAPK pathways suggests that ubiquitylation plays a broad regulatory role in these disease-relevant signaling pathways. Ubiquitylation also plays an important role in the regulation of cell surface receptor endocytosis (50). We found ubiquitylation sites on more than 40 proteins involved in receptor endocytosis and sorting of endocytic cargo (Fig. 4D), suggesting a widespread role of ubiquitylation in this process.
To demonstrate the utility of our approach for proteome-wide quantification of endogenous ubiquitylation sites, we used a SILAC-based proteomics strategy to quantify changes in ubiquitylation after proteasome inhibition. We inhibited proteasomal activity with MG-132, reasoning that ubiquitylation sites used to target proteins to the proteasome would accumulate. Heavy isotope-labeled MV4–11 cells were treated with MG-132 for 4 hours, and light isotope-labeled cells were treated with dimethyl sulfoxide and served as control (Fig. 5A). We quantified 2,987 ubiquitylation sites in two independent biological replicates and obtained an excellent correlation between these replicates (Spearman's rank correlation coefficient 0.9) (Fig. 5B). Nearly two-thirds of the sites identified in MV4–11 cells were also identified in HEK293T cells. However, several proteins, such as the hematopoietic receptor tyrosine kinases FLT3 and CSF1R, were found only in MV4–11 cells. In agreement with the proteasome inhibitory effect of MG-132, overall ubiquitylation levels were dramatically increased (threefold higher on average) in treated cells (Fig. 5C and supplemental Table S2). Ubiquitylation of nearly a quarter of all quantified sites increased more than fourfold, and ubiquitylation of 45% of the sites was elevated more than twofold in MG-132-treated cells. Among the proteins with increased ubiquitylation in MG-132-treated cells were several cell cycle regulators and protein kinases, such as CDC2, AURKB, and CDK6, and other cell signaling kinases, including PIK3CD, AKT1 and AKT2, FLT3, CSF1R, and GRK2. In our experiments, we identified di-Gly-modified peptides from ubiquitin itself covering all seven lysines that are known to form ubiquitin linkages, and we quantified their relative changes after MG-132 treatment.
In agreement with the available literature, ubiquitylation of K6 and K63 did not show a substantial change, whereas ubiquitylation of all remaining lysines (K11, K27, K29, K33, and K48) was increased at least 1.5-fold (supplemental Table S2).
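Replicate agreement was summarized with Spearman's rank correlation. A minimal sketch, computed as the Pearson correlation of average ranks (the standard definition when ties are present); the function names and replicate ratios below are hypothetical. Because ranks do not change under a monotonic transform, the coefficient is the same for raw and log-transformed SILAC ratios.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("one replicate has no variation in ranks")
    return cov / (sx * sy)


# Hypothetical MG-132/DMSO ratios for five sites in two replicates.
rep1 = [0.5, 1.1, 2.3, 4.8, 3.0]
rep2 = [0.6, 1.0, 2.9, 5.5, 2.7]
print(f"{spearman(rep1, rep2):.2f}")  # 0.90
```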
Nearly 40% of the quantified sites did not show a substantial increase in ubiquitylation (SILAC ratio less than 1.2) in proteasome-inhibited cells, suggesting that many of these sites are possibly involved in non-proteasomal regulatory functions. Surprisingly, ubiquitylation of a large fraction of the quantified sites (15%) was reduced more than twofold after MG-132 treatment. It has been reported that overall ubiquitylation of histone H2A and H2B is reduced upon treatment with MG-132 (51). We found a strong (~10-fold) reduction in ubiquitylation of histone H2B on K120, a site that is known to have important biological functions (43, 44). In addition, we quantified 36 ubiquitylation sites on different variants of all five histones, and nearly all of them were less abundant in MG-132 treated cells (supplemental Table S2). These results show that steady-state ubiquitylation of histones is altered by disruption of the UPS.
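The site categories used in the preceding paragraphs (more than fourfold or twofold increase, no substantial increase below a ratio of 1.2, more than twofold decrease below 0.5) are threshold counts over the per-site MG-132/DMSO ratios. A small sketch with hypothetical names and ratios; note that the categories overlap by construction, which is why the reported percentages need not sum to 100%.

```python
from typing import Dict, Sequence


def summarize_ratios(ratios: Sequence[float]) -> Dict[str, float]:
    """Fractions of sites in the categories used in the text.

    `ratios` are heavy/light SILAC ratios (MG-132 / DMSO), one per site.
    Categories overlap: ">4x up" is a subset of ">2x up", and ">2x down"
    (ratio < 0.5) is a subset of "no substantial increase" (ratio < 1.2).
    """
    if not ratios:
        raise ValueError("no ratios given")
    n = len(ratios)

    def fraction(predicate) -> float:
        return sum(1 for r in ratios if predicate(r)) / n

    return {
        "increased_gt_4x": fraction(lambda r: r > 4.0),
        "increased_gt_2x": fraction(lambda r: r > 2.0),
        "no_substantial_increase": fraction(lambda r: r < 1.2),
        "decreased_gt_2x": fraction(lambda r: r < 0.5),
    }


# Hypothetical ratios for five sites.
print(summarize_ratios([5.0, 3.0, 1.5, 1.0, 0.4]))
# {'increased_gt_4x': 0.2, 'increased_gt_2x': 0.4, 'no_substantial_increase': 0.4, 'decreased_gt_2x': 0.2}
```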
Sites on proteins annotated with nuclear terms, including chromatin and nucleosome, show reduced ubiquitylation after MG-132 treatment (Fig. 5C). A similar trend is observed for ubiquitylation sites on proteins annotated with the functional terms chromatin organization, nucleosome organization, and protein-DNA complex assembly.
In addition to ubiquitylation, lysine residues can be modified by several other post-translational modifications, including acetylation, methylation, and SUMOylation (52). Reciprocal regulation of protein activity by ubiquitylation and acetylation has been shown to be functionally important in a few extensively studied proteins, such as p53 (52). To analyze the extent of overlap between ubiquitylation and acetylation, we compared the ubiquitylation sites identified in this study with a previously published dataset of human acetylation sites (25). We found that 30% (1,040/3,428) of acetylated lysines are also ubiquitylated at the same position (Fig. 6A). These results suggest, but do not mechanistically prove, a “crosstalk” between ubiquitylation and acetylation. Indeed, it has been suggested that acetylation prevents ubiquitin-mediated degradation of proteins (53). In accordance with this hypothesis, lysines modified by both ubiquitylation and acetylation showed a significantly smaller increase in ubiquitylation (p = 1.7e-15) after treatment with MG-132 (Fig. 6B). Among the proteins that harbor lysine residues modified by both ubiquitylation and acetylation, GO terms describing cytosolic and mitochondrial localization, as well as cytosolic processes such as translation, translational elongation, and mitochondrial metabolism, are significantly enriched (Fig. 6C).
Protein phosphorylation is one of the most extensively studied post-translational modifications in eukaryotes. The human genome encodes 518 protein kinases (54), 107 tyrosine phosphatases (55), and nearly 30 serine/threonine phosphatases (56). Similarly, 650 different E1, E2, and E3 enzymes are involved in ubiquitylation in human cells (57), and about 100 deubiquitylases reverse this modification (57, 58). The presence of this extensive ubiquitylation machinery suggests that the complexity of protein regulation by ubiquitylation could be comparable to that of phosphorylation. However, owing to technological challenges, the number of known ubiquitylation sites had remained small compared with the number of known phosphorylation sites. In previously described methods, identification of ubiquitylation sites entails enrichment of ubiquitylated proteins, often modified by ectopically expressed, affinity-tagged ubiquitin. Trypsin digestion of these proteins generates a complex peptide mixture in which ubiquitylated peptides make up only a small fraction, making them difficult to detect.
In this study, we applied a highly robust and streamlined proteomic method to precisely map endogenous putative ubiquitylation sites in human cells. Our approach provides several advantages over previously described MS-based methods for ubiquitylation site mapping: (1) it requires only a single-step affinity enrichment of modified peptides, (2) the highly efficient enrichment of ubiquitylated peptides enables detection of low-abundance modification sites, (3) it allows in-depth, proteome-wide analysis of endogenous ubiquitylation sites, (4) it can be easily applied to map ubiquitylation sites in any tissue or organism, and (5) it is fully compatible with site-specific quantification of ubiquitylation on a proteome-wide level.
Using this approach, we identified 11,054 putative endogenous ubiquitylation sites (diglycine-modified lysines) in human cells. These data cover over two-thirds of the sites reported in previous MS-based studies, yet more than 90% of the sites in our dataset had not been reported previously. Thus, these data substantially expand the number of currently known human ubiquitylation sites. In terms of identified sites, this study establishes ubiquitylation as the second most comprehensively studied PTM after phosphorylation. We show that ubiquitylation targets proteins involved in all major cellular functions and that its regulatory scope is comparable to that of other PTMs such as phosphorylation and acetylation. The described approach enables, for the first time, proteome-wide quantitative analysis of ubiquitylation. Our analysis of changes in ubiquitylation following proteasome inhibition showed that, after just four hours, about half of all sites had increased ubiquitylation, whereas sites on a subset of nuclear proteins had reduced ubiquitylation. It remains unclear whether sites that show increased ubiquitylation after MG-132 treatment are directly involved in proteasomal degradation. However, it is tempting to speculate that sites that do not show increased ubiquitylation upon proteasome inhibition mediate nonproteasomal regulatory functions.
The sites identified in this study will serve as a valuable resource for the functional characterization of many proteins. The described method is generic and can be easily applied to map ubiquitylation sites in any cell type, tissue, or organism, and to perform site-specific quantification of ubiquitylation upon cellular perturbations.
Data availability: Spectra supporting identification of ubiquitylated peptides, as well as all raw data associated with this manuscript, can be downloaded from ProteomeCommons.org Tranche (https://proteomecommons.org/tranche) using the hash keys and passphrases provided below. Datasets 1.1 and 1.2 contain 34 raw files comprising all data obtained from HEK293T cells. Dataset 2 contains 16 raw files, which include all SILAC quantification data from MV4-11 cells. The PDF document “MS2 Spectra” contains annotated MS2 spectra of ubiquitylated peptides identified in this study.
* This work is supported by the European Commission's 7th Framework Programme grants Proteomics Research Infrastructure Maximizing knowledge EXchange and access (XS) (INFRASTRUCTURES-F7-2010-262067/PRIME-XS) and Systems Biology of Stem Cells and Reprogramming (HEALTH-F7-2010-242129/SYBOSS), and the Lundbeck Foundation (R48-A4649). SAW is supported by a postdoctoral grant from the Danish Council for Independent Research (FSS: 10-085134). The Center for Protein Research is funded by a generous grant from the Novo Nordisk Foundation.
This article contains supplemental Figs. S1 to S3 and Tables S1 to S3.
1 The abbreviations used are: