Protein glycosylation is a common post-translational modification and has been increasingly recognized as one of the most prominent biochemical alterations associated with malignant transformation and tumorigenesis. N-linked glycosylation is prevalent in proteins on the extracellular membrane, and many clinical biomarkers and therapeutic targets are glycoproteins. Here, we describe a protocol for solid-phase extraction of N-linked glycopeptides (SPEG) and subsequent identification of N-linked glycosylation sites (N-glycosites) by tandem mass spectrometry. The method oxidizes the carbohydrates in glycopeptides into aldehydes, which can be immobilized on a solid support. The N-linked glycopeptides are then optionally labeled with a stable isotope using deuterium-labeled succinic anhydride and the peptide moieties are released by peptide-N-glycosidase. In a single analysis, the method identifies hundreds of N-linked glycoproteins, the site(s) of N-linked glycosylation, and the relative quantity of the identified glycopeptides.
Glycosylation has long been recognized as the most common protein post-translational modification. It affects many aspects of protein function, including localization, stability, enzymatic activity, and protein-protein interactions 1. Differential glycosylation is a major source of protein microheterogeneity. Glycosylation plays key roles in cell communication, signaling, and cell adhesion. Changes in the carbohydrates of cell surface and body fluid proteins have been demonstrated in cancer and other conditions, highlighting the importance of this modification 2, 3. However, studies of protein glycosylation have been complicated by the structural diversity of protein glycans and the lack of effective tools for identifying protein glycosylation sites and glycan structures. For example, the human protein database contains thousands of predicted N-linked glycosylated proteins; however, before high-throughput methods for the identification of glycosites were developed in 2003, only 172 proteins had been experimentally proven to be glycoproteins in the Protein Information Resources-Protein Sequence Database (http://pir.georgetown.edu/pirwww/search/textpsd.shtml) 4.
Recently, two new methods for isolation and identification of N-linked glycopeptides in complex biological samples – solid-phase extraction of N-linked glycopeptides and glycopeptide capture using lectin-affinity column chromatography – have been reported 4, 5. In both instances, mass spectrometry is used after the isolation of the N-glycopeptides to identify the N-glycosites and quantify the relative abundance of glycopeptides using isotope coded tags.
In the first method, glycoproteins are covalently conjugated to a solid support via hydrazide chemistry, and non-glycosylated peptides are removed by proteolysis before the N-linked glycopeptides are released from the solid support 5. In the modified method detailed in this protocol (Figure 1), glycoproteins are first digested into a mixture of glycosylated and non-glycosylated peptides. The cis-diol groups of the carbohydrates in the glycopeptides are then oxidized to aldehydes, which form covalent hydrazone bonds with hydrazide groups immobilized on a solid support. Non-glycosylated peptides are washed away while the glycosylated peptides remain on the solid support. For accurate quantitative analysis of N-linked glycopeptides using isotope labeling and mass spectrometry, the amino groups of the immobilized glycopeptides can be labeled with isotopic tags. For example, the light (d0, containing no deuterium atoms) or heavy (d4, containing four deuterium atoms) form of succinic anhydride can be used to label the glycopeptides isolated from two biological samples. Lastly, the formerly N-linked glycosylated peptides are released from the solid phase using Peptide-N-Glycosidase F (PNGase F). PNGase F treatment also converts the glycosylated asparagines to aspartic acids, generating a mass shift of approximately 1 Da (+0.984 Da) at the site of glycosylation, which is detectable with a high-accuracy mass spectrometer and is diagnostic for the glycosylation site. Therefore, in a single analysis, the method identifies N-linked glycosylated proteins, the site(s) of N-linked glycosylation, and the relative quantity of the identified glycopeptides via stable isotope tagging.
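The diagnostic mass shift produced by PNGase F can be worked out from atomic masses. The following is a minimal sketch, not part of the original protocol: the function names are hypothetical, and the only inputs are standard monoisotopic atomic masses. Converting asparagine to aspartic acid replaces the side-chain NH2 with OH, so the residue gains one oxygen and loses one nitrogen and one hydrogen.

```python
# Standard monoisotopic atomic masses in daltons.
MASS_H = 1.007825
MASS_N = 14.003074
MASS_O = 15.994915


def deamidation_shift():
    """Mass gained when Asn becomes Asp (side-chain NH2 replaced by OH)."""
    return MASS_O - MASS_N - MASS_H


def mz_shift(mass_shift, charge):
    """Shift observed on the m/z axis for an ion carrying `charge` charges."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mass_shift / charge


if __name__ == "__main__":
    shift = deamidation_shift()
    print(f"Asn -> Asp mass shift: {shift:.6f} Da")
    for z in (1, 2, 3):
        print(f"  m/z shift at charge {z}+: {mz_shift(shift, z):.6f}")
```

Because the shift is slightly less than 1 Da and is halved for a doubly charged ion, resolving it from the natural carbon-13 isotope peak is one reason a high-accuracy instrument is helpful here.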
In the second approach, glycopeptides are immobilized by lectin column-mediated affinity capture. After elution of the glycopeptides from the lectin column, peptide-N-glycosidase is used to remove the N-linked glycans, and the resulting peptides are subjected to mass spectrometry to identify the N-linked glycopeptides and sites of glycosylation 4. For glycopeptide capture using hydrazide chemistry, peptides containing either N-linked or O-linked oligosaccharides are covalently conjugated to a solid support. Non-glycosylated peptides can be removed by extensive washing before the release of glycopeptides, so the glycopeptides can be specifically enriched (over 90% enrichment 6). Different types of glycopeptides can be released specifically from the solid support by varying the glycosidases or chemicals used. For glycopeptide capture using a lectin affinity column, glycopeptides containing certain glycan structures can be selectively enriched by employing different types of lectin affinity chromatography. However, because this capture relies on affinity binding rather than a covalent bond, its specificity and reproducibility may not be easily controlled.
Extracellular proteins, such as proteins expressed on the cell surface or secreted from the cell, are exposed to extracellular environments such as surrounding tissue, blood, or other body fluids. Such proteins are thus the most easily accessible for diagnostic and therapeutic purposes. If present in easily accessible body fluids such as blood plasma or cerebrospinal fluid, they are also preferred candidate protein biomarkers. However, the challenge faced by all proteomic methods for the analysis of body fluids is the peculiar properties of these samples. The proteome of most body fluids is complex, comprising at least tens of thousands of different protein species and exhibiting a high dynamic range in protein concentration. These proteomes are also dominated by a few highly abundant proteins 7, and as a result, most proteomic analyses of body fluids detect only a limited number of highly abundant proteins. While the dynamic range of protein abundances in tissues or cells is not as large as it is in body fluids, a similar situation is still observed: proteomic analyses detect only the most abundant protein subset, composed mostly of structural intracellular proteins. Glycosylation, especially N-linked glycosylation, is a common modification of proteins that are exposed to an extracellular environment 8. Therefore, selective isolation of N-linked glycopeptides enriches tissue and body fluid proteomes for extracellular proteins. In addition, the number of N-linked glycosylation sites in the human extracellular proteome is modest and identifiable with current proteomic technology. For example, N-glycosites, which generally contain the N-X-S/T sequence motif (where X denotes any amino acid except proline) 9, are found in just 3% of tryptic peptides from the entire human proteome, yet they represent the majority of extracellular proteins (over 70%) 10.
Therefore, targeted analysis of N-glycosites significantly increases the sensitivity for low abundance glycoproteins by reducing the number of detectable peptides at each targeted mass range.
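The N-X-S/T motif described above can be located in a sequence with a short pattern search. The sketch below is illustrative only and is not the software used in this work; the function name and the one-letter sequence input are assumptions. A lookahead is used so that overlapping sequons are all reported.

```python
import re

# N, then any residue except P, then S or T. The zero-width lookahead
# allows overlapping sequons (for example in "NNST") to be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return the 0-based position of the Asn in every N-X-S/T sequon."""
    return [match.start() for match in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical inputs, one-letter amino acid code.
    for peptide in ("EEQFNSTFR", "NPSA", "NNSTA"):
        print(peptide, find_sequons(peptide))
```

The presence of a sequon marks a potential site only; in the method described here, actual occupancy is supported by capture on the hydrazide support together with the Asn-to-Asp conversion after PNGase F release.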
While the analysis of N-linked glycopeptides reduces sample complexity and peptide redundancy, and is therefore beneficial for achieving higher coverage of the proteome per analysis, it also leads to the loss of some potentially important information. First, non-glycosylated proteins are transparent to this system. Second, because only a few N-glycosites are available per protein, identifying the corresponding protein is more challenging. Third, tryptic peptides that are too short or too long to fall within the detection range of the mass spectrometer will not be identified, although this limitation may be overcome, at least in part, by the use of proteases with cleavage specificities different from that of trypsin. Fourth, because the oligosaccharides are removed from the N-linked glycopeptides before analysis, only the N-glycosite is identified and the glycan structure attached to the site is not analyzed. Finally, releasing formerly N-linked glycopeptides from their attached oligosaccharides coalesces different oligosaccharide structures into a single peptide sequence-specific signal, thus obscuring glycosylation changes that arise from alterations in oligosaccharide structure 2.
SPEG has been successfully applied to the quantitative analysis of glycopeptides from complex mixtures, including serum/plasma and other body fluids 5, 6, 11–15, the detection of glycoproteins secreted or shed into the culture medium by cells, the detection of membrane proteins and cell surface proteins from cells and tissues 16, 17, and the comparison of glycoproteins in the extracellular matrix of normal and diseased tissues 18. Several high abundance plasma proteins, including the most abundant plasma protein, albumin, do not appear to contain any N-glycosites and are therefore transparent to the method, allowing for more efficient identification and quantification of the lower abundance glycoproteins in blood 6. By combining quantitative analysis of N-glycosites with methods that determine relative protein changes in different proteomes, such as the cysteine tagging method 19, the occupancy of individual N-glycosites and changes therein can also be determined. This is of particular interest in studies in which changes of glycosylation occupancy are suspected, as exemplified by patients with Type I Congenital Disorders of Glycosylation (CDG), in which the N-linked glycosylation pathway is deficient 20. Using this method, we have identified thousands of N-glycosites from different tissues, plasma, and other body fluids, and established a database (www.unipep.org) as a public resource for glycoprotein analysis 10.
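One simple way to picture how a glycosite measurement and a protein-level measurement combine is to divide one ratio by the other. The sketch below rests on an assumption that is not stated in the text: that the change in site occupancy between two samples can be approximated as the glycopeptide abundance ratio normalized by the protein abundance ratio. The function name and the example values are hypothetical.

```python
def relative_occupancy_change(glycopeptide_ratio, protein_ratio):
    """Approximate the fold change in occupancy of one N-glycosite.

    glycopeptide_ratio: sample A / sample B abundance of the formerly
        N-linked glycopeptide (for example from d0/d4 labeling).
    protein_ratio: sample A / sample B abundance of the parent protein
        (for example from a cysteine tagging experiment).
    """
    if glycopeptide_ratio < 0 or protein_ratio <= 0:
        raise ValueError("ratios must be positive")
    return glycopeptide_ratio / protein_ratio


if __name__ == "__main__":
    # Hypothetical numbers: the glycopeptide signal halves while the
    # protein level is unchanged, suggesting reduced site occupancy.
    print(relative_occupancy_change(0.5, 1.0))
    # Glycopeptide and protein change together: occupancy is unchanged.
    print(relative_occupancy_change(2.0, 2.0))
```

Under this assumption, a value near 1 suggests that a change in glycopeptide signal simply tracks protein abundance, while a value well below 1 would be consistent with under-occupancy of the site.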
To illustrate the protocol, glycopeptides were isolated from 1 mg of protein from tissues and plasma, and ~10 µg of total N-linked glycopeptides were recovered. The isolated peptides were analyzed on an LTQ mass spectrometer, and the MS/MS spectra were searched against a protein database using SEQUEST. The analysis resulted in over 1000 peptide identifications that contain the consensus N-linked glycosylation motif (N-X-T/S, where X can be any amino acid except P) and have a PeptideProphet score of at least 0.9 6. The identified glycopeptides represent over 300 unique N-X-S/T sequons after overlapping sequences containing the same N-X-S/T sequon are resolved. The specificity of the method is over 90%, calculated as the percentage of identified peptides that contain the consensus N-linked glycosylation motif.
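The specificity figure and the count of unique sequons described above can be reproduced from a list of identifications along the following lines. This is a hedged sketch rather than the analysis pipeline actually used: the tuple layout (protein identifier, 0-based start of the peptide in the protein, unmodified peptide sequence, PeptideProphet probability), the function name, and the example records are all hypothetical.

```python
import re

# N, any residue except P, then S or T; lookahead permits overlaps.
SEQUON = re.compile(r"(?=N[^P][ST])")


def summarize(identifications, min_probability=0.9):
    """Return (specificity, number of unique sequons) for confident hits.

    Specificity is the fraction of retained peptides that contain at
    least one N-X-S/T motif. Overlapping peptides that cover the same
    Asn in the same protein are counted as a single sequon.
    """
    kept = [rec for rec in identifications if rec[3] >= min_probability]
    with_motif = 0
    sites = set()
    for protein, start, peptide, _probability in kept:
        positions = [m.start() for m in SEQUON.finditer(peptide)]
        if positions:
            with_motif += 1
        sites.update((protein, start + pos) for pos in positions)
    specificity = with_motif / len(kept) if kept else 0.0
    return specificity, len(sites)


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    hits = [
        ("P1", 10, "EEQFNSTFR", 0.99),
        ("P1", 8, "PREEQFNSTFR", 0.95),  # same Asn as the record above
        ("P2", 0, "LVEALYK", 0.97),      # no sequon
        ("P3", 5, "GNVTK", 0.50),        # below the probability cutoff
    ]
    specificity, unique_sequons = summarize(hits)
    print(f"specificity = {specificity:.2f}, unique sequons = {unique_sequons}")
```

Keying each site on the protein and the absolute Asn position is one way to resolve overlapping sequences, such as missed-cleavage variants, into a single sequon.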
Among about 300 peptides identified from the mixed samples, 28 peptides were detected and quantified in both tissue and plasma. These results indicate that the method can selectively isolate N-glycosites from mixtures of glycoproteins.
An example of quantitative analysis of glycopeptides with isotopic labeling is shown in Figure 3 6. Glycopeptides immobilized on hydrazide resin from two samples were labeled with light (d0) or heavy (d4) forms of succinic anhydride. After labeling, the beads containing the two samples were combined and the formerly N-linked glycopeptides were released. The combined samples were analyzed by LC-MS using ESI-qTOF. The quantification is illustrated for a single scan of the mass spectrometer in MS mode. The paired peptide is doubly charged and has a mass difference of four units (an m/z difference of about 2), with monoisotopic peaks at m/z 629.88 and 631.86 (Fig. 3). The peptide was sequenced by MS/MS analysis using ESI-qTOF and identified by database searching 21. It was identified with the peptide sequence EEQFN#STFR from the secreted form of the human Ig γ-1 chain C region, a classic serum protein. This shows that accurate quantification of relative glycopeptide abundance in two samples can be achieved with stable isotope tagging.
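The relationship between the observed peak spacing and the d0/d4 label can be checked with a few lines of arithmetic. The sketch below is illustrative and makes two assumptions not spelled out in the text: that each label contributes four deuterium-for-hydrogen substitutions, and that the labeled amino groups are the peptide N-terminus plus any lysine side chains. The function names are hypothetical.

```python
# Standard monoisotopic masses in daltons.
MASS_H = 1.007825
MASS_D = 2.014102  # deuterium


def pair_mass_difference(mz_light, mz_heavy, charge):
    """Convert the m/z spacing of a light/heavy pair to a mass difference."""
    return (mz_heavy - mz_light) * charge


def count_amino_groups(peptide):
    """Assumed labeling sites: the N-terminus plus each lysine (K)."""
    return 1 + peptide.upper().count("K")


def expected_label_shift(n_labels, deuteriums_per_label=4):
    """Expected heavy-minus-light mass difference for n_labels labels."""
    return n_labels * deuteriums_per_label * (MASS_D - MASS_H)


if __name__ == "__main__":
    observed = pair_mass_difference(629.88, 631.86, charge=2)
    labels = count_amino_groups("EEQFNSTFR")  # no lysine, so one site
    expected = expected_label_shift(labels)
    print(f"observed difference: {observed:.2f} Da")
    print(f"expected for {labels} label(s): {expected:.3f} Da")
```

For the pair reported above, the spacing of about 1.98 on the m/z axis at charge 2+ corresponds to roughly 3.96 Da, which is consistent with a single d4 label on a peptide that has no lysine.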
This work was supported with federal funds from the National Cancer Institute, National Institutes of Health, by grant R21-CA-114852 (to H.Z.) and by Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, under contract No. N01-HV-8179 (to R. A.).