Glycosylation is one of the most common post-translational modifications (PTMs) of proteins. At least 50% of human proteins are glycosylated, with some estimates as high as 70%. Glycoprotein analysis requires determining both the sites of glycosylation and the glycan structures associated with each site. Recent advances have led to new analytical methods that rely extensively on mass spectrometry, making it possible to determine both the glycosylation sites and the microheterogeneity at each site. These tools will be important for the eventual development of glycoproteomics.
Glycosylation plays important roles in many cellular events, ranging from structural to signaling and recognition. Key to understanding these roles is knowledge of the primary structures of the glycoconjugates. There are two major types of glycosylation: N-linked and O-linked. In N-glycosylation, an N-acetylglucosamine (GlcNAc) residue is attached by an amide bond to an asparagine residue belonging to the consensus sequence NX(S/T), where X can be any amino acid except proline. The consensus sequence is required for N-linked glycosylation; however, occupation of a potential site is not obligatory. Hence, a glycoprotein may contain a number of potential N-glycosylation sites, each of which may or may not be glycosylated. O-glycosylation, in contrast, may occur at any serine or threonine residue, with no single common core structure or consensus protein sequence. While the proteome is encoded in the genome, no template exists for glycosylation. Glycans are produced by a set of competing glycosyltransferases. For this reason, the population of glycans at a given site is often not homogeneous: a particular N- or O-glycosylation site may be occupied by a number of structurally distinct glycans, a condition referred to as microheterogeneity or site heterogeneity. A protein with three glycosylation sites and 10 different glycans at each site can give rise to 10 × 10 × 10 = 1000 different glycoforms. The large structural heterogeneity, the presence of numerous glycosylation sites, and the large number of possible stereo- and regio-isomers all conspire to make glycoprotein analysis significantly more difficult than protein analysis. For this reason, proteomic studies typically cleave and discard the glycans prior to analysis. This practice is changing, however, as new methods for glycoprotein analysis are developed and refined.
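The consensus-sequence rule and the glycoform arithmetic described above can be illustrated with a short Python sketch. The function names and the example sequence are hypothetical and chosen only for illustration; the pattern simply encodes NX(S/T) with X not proline, and a match marks a potential site only, since occupation is not obligatory.

```python
import math
import re

# N-X-(S/T) with X != P. The lookahead keeps matches one residue wide,
# so overlapping sequons (e.g. "NNSS") are each reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues inside an N-X-(S/T) sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


def count_glycoforms(glycans_per_site):
    """Number of distinct glycoforms if every site is occupied and each site
    independently carries one of its possible glycans."""
    return math.prod(glycans_per_site)


if __name__ == "__main__":
    # Hypothetical sequence: N2 (NGT) and N10 (NAS) qualify, N6 (NPS) does not.
    print(find_sequons("MNGTANPSQNAS"))
    # Three sites with 10 glycans each, as in the text.
    print(count_glycoforms([10, 10, 10]))
```

The glycoform count assumes full occupancy; allowing a site to be unoccupied would add one more state per site.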
Analysis of glycans or oligosaccharides has previously been performed by nuclear magnetic resonance (NMR). X-ray crystallography of glycans is relatively uncommon and limited to very simple structures. The major limitations of these methods are that they require pure samples and large amounts of material. Purifying glycans into individual components is hampered by the lack of effective separation methods, and biological samples often yield only femto- to picomoles of individual components, well below the analytical capabilities of these methods.
Mass spectrometry (MS) has emerged as the premier tool for oligosaccharide analysis. It provides structural information with high sensitivity. Glycan analysis is typically performed with matrix-assisted laser desorption/ionization (MALDI) or electrospray ionization (ESI), which in the positive mode produce ions that are sodium coordinated or protonated, respectively, and in the negative mode produce deprotonated ions [4,5]. The fragmentation behavior of the two types of ions in tandem MS varies slightly, but glycans commonly yield fragment ions resulting from glycosidic bond cleavages. Although per-derivatization such as permethylation is often performed, it is not required for analysis. Derivatization improves sensitivity and yields more structurally relevant fragment ions during tandem MS, but the sensitivity gain is slight, and derivatization makes it difficult to use glycosidase reactions to determine linkages and identify saccharide residues. Furthermore, per-derivatization adds steps, yields partially derivatized products, and may mask some classes of glycans, such as sulfated oligosaccharides.
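As a rough illustration of how the ion types mentioned above relate to the neutral mass of a glycan, the following Python sketch computes m/z values for sodium-coordinated, protonated, and deprotonated ions. The function names are hypothetical; the proton and sodium cation masses are standard monoisotopic values supplied here, not figures taken from this article.

```python
PROTON = 1.007276       # mass of H+ in Da (standard value, not from the article)
SODIUM_ION = 22.989221  # mass of Na+ in Da (standard value, not from the article)


def protonated_mz(neutral_mass, charge=1):
    """m/z of [M + zH]z+ (positive mode, typical of ESI)."""
    return (neutral_mass + charge * PROTON) / charge


def sodiated_mz(neutral_mass, charge=1):
    """m/z of [M + zNa]z+ (positive mode, typical of MALDI)."""
    return (neutral_mass + charge * SODIUM_ION) / charge


def deprotonated_mz(neutral_mass, charge=1):
    """m/z of [M - zH]z- (negative mode); `charge` is the number of protons lost."""
    return (neutral_mass - charge * PROTON) / charge


if __name__ == "__main__":
    mass = 1234.4334  # hypothetical neutral glycan mass in Da
    print(round(protonated_mz(mass), 4))
    print(round(sodiated_mz(mass), 4))
    print(round(deprotonated_mz(mass), 4))
```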
The standard approach (Figure 1) to determining site-specific glycosylation combines specific enzymatic proteolysis (usually with trypsin), fractionation of glycopeptides (most often by liquid chromatography or affinity chromatography), and glycopeptide analysis by MS [7-11].
In some cases, the glycoprotein or glycopeptides are deglycosylated in parallel, followed by MS analysis of the released glycans. Trypsin produces highly predictable peptide masses owing to its high activity and specificity. In addition, tryptic glycopeptides carry a basic (readily protonated) residue, which increases ionization efficiency during MS analysis. The major problem with this approach is that many glycoproteins are resistant to trypsin. In addition, the resulting glycopeptides are often too large for MS analysis. The problem is complicated by missed cleavages, particularly near the sites of glycosylation. The mixture of peptides and glycopeptides further complicates analysis because glycosylation strongly diminishes the ionization of the peptide. The problem is compounded because glycopeptides obtained from proteolytic digestion are much lower in abundance than peptides from the same glycoproteins. For these reasons, the information obtained on glycosylation is often incomplete.
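The predictability of tryptic peptides, and the effect of missed cleavages, can be sketched in Python as follows. The cleavage rule used here (cut after lysine or arginine unless the next residue is proline) is the commonly used convention for in silico trypsin digestion and is an assumption of this sketch rather than something stated in the text; the function name and the example sequence are hypothetical.

```python
def tryptic_peptides(sequence, missed_cleavages=0):
    """In silico trypsin digest.

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    With missed_cleavages > 0, peptides spanning up to that many uncut
    sites are also returned.
    """
    # Split the sequence at every allowed cleavage site.
    pieces = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    pieces.append(sequence[start:])

    # Join neighbouring pieces to model missed cleavages.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


if __name__ == "__main__":
    print(tryptic_peptides("AAKGGRPCCKDD"))
    print(tryptic_peptides("AAKGGRPCCKDD", missed_cleavages=1))
```

Allowing missed cleavages quickly enlarges the list of candidate peptides, which is one reason missed cleavages near glycosylation sites complicate assignment.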
Enrichment of the glycopeptides can be performed but has its own challenges. One strategy cleaves the carbon-carbon bond between the diols of saccharide units, producing aldehyde groups that can be captured by reaction with hydrazine functional groups immobilized on a solid support [12,13]. The unbound, nonglycosylated peptides are then removed by washing, and the immobilized glycopeptides are released from the solid-phase support by enzymatic cleavage of the glycan-peptide bond. The asparagines that were formerly glycosylated are converted to aspartic acids during this reaction, and the peptides can then be identified using MS/MS analyses. Other approaches include capture with boronic acid immobilized in columns; chemical bonds are formed between the diol groups of the glycoproteins or tryptic glycopeptides and the immobilized borate groups at high pH. Similarly, size exclusion chromatography has been used to enrich N-linked glycopeptides from human serum, resulting in a three-fold increase in the number of glycosylation sites observed. Hydrophilic interaction liquid chromatography (HILIC) has also been used on a variety of complex matrices [16-21]. Anion exchange chromatography has been employed for the enrichment of sialylated glycopeptides. Lectin affinity chromatography using ConA [22-24], WGA, or a combination of lectins has also demonstrated utility. However, despite the plurality of methods, there is still no fully satisfactory method for glycopeptide enrichment, as none is both comprehensive and highly specific.
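The asparagine-to-aspartic-acid conversion described above is what allows formerly glycosylated sites to be recognized after release. A minimal Python sketch of that logic is given below, assuming the reference (database) peptide and the peptide identified by MS/MS are already aligned and of equal length. The function name and the example peptides are hypothetical, and the mass difference between Asp and Asn residues is a standard monoisotopic value supplied here rather than taken from the article.

```python
import re

# Monoisotopic mass difference between an Asp and an Asn residue in Da
# (standard value, not from the article).
ASN_TO_ASP_SHIFT = 0.984016

SEQUON = re.compile(r"N(?=[^P][ST])")


def formerly_glycosylated_sites(reference, observed):
    """Return 1-based positions where a sequon Asn in the reference peptide
    is observed as Asp, i.e. candidate formerly N-glycosylated sites."""
    if len(reference) != len(observed):
        raise ValueError("peptides must be aligned and of equal length")
    return [
        match.start() + 1
        for match in SEQUON.finditer(reference)
        if observed[match.start()] == "D"
    ]


if __name__ == "__main__":
    sites = formerly_glycosylated_sites("VNGTLNASK", "VDGTLNASK")
    print(sites)
    # Expected mass increase of the released peptide relative to the reference.
    print(len(sites) * ASN_TO_ASP_SHIFT)
```

Restricting the check to sequon positions is a design choice of this sketch; it helps separate enzymatic conversion at glycosylation sites from unrelated deamidation elsewhere in the peptide.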
An alternative method (Figure 2) that overcomes several limitations of trypsin digestion was developed using nonspecific proteases [26,27]. This approach uses pronase, a mixture of exo- and endo-proteinases capable of cleaving essentially any peptide bond. The glycoprotein is digested except for small peptide ‘footprints’ around glycosylation sites, which are protected from digestion by the steric hindrance of the glycan moiety. The digestion products are glycopeptides with short peptide moieties, small (unglycosylated) dipeptides, and amino acids. Purification by solid-phase extraction and subsequent MS analysis yields mass spectra that contain only glycopeptides. This method is now used routinely to obtain site heterogeneity for numerous glycoproteins; its major limitation is that it currently works best for purified glycoproteins and small protein mixtures. It requires high-mass-accuracy, high-resolution instruments such as Fourier transform instruments (ion cyclotron resonance and orbitrap) and high-performance time-of-flight instruments. It may also yield N-linked glycans attached to only a single amino acid or a dipeptide, but the size of the peptide moiety can be controlled by the reaction time. O-linked glycans typically yield larger peptides regardless of reaction time [26,28,29].
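Because a nonspecific protease can cut on either side of the protected footprint, every short stretch of sequence that contains the glycosylation site is a candidate peptide moiety. The Python sketch below enumerates those candidates; it is only an illustration of the search space, not the procedure of any published tool, and the function name, the maximum length, and the example sequence are hypothetical.

```python
def candidate_footprints(sequence, site, max_length=6):
    """Return every contiguous peptide of at most `max_length` residues
    that contains the residue at 1-based position `site`."""
    index = site - 1
    if not 0 <= index < len(sequence):
        raise ValueError("site lies outside the sequence")
    footprints = []
    # The peptide must start at or before the site and end at or after it.
    for start in range(max(0, index - max_length + 1), index + 1):
        stop_limit = min(len(sequence), start + max_length)
        for stop in range(index + 1, stop_limit + 1):
            footprints.append(sequence[start:stop])
    return footprints


if __name__ == "__main__":
    # Hypothetical sequence with a glycosylated Asn at position 5.
    for peptide in candidate_footprints("GAVTNGSLE", site=5, max_length=3):
        print(peptide)
```

The number of candidates grows roughly with the square of the maximum footprint length, which is one reason accurate mass measurement matters for this approach.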
Further improvements to these methods, however, will make them more generally applicable and more robust. For example, immobilized pronase has been found to be more effective in producing glycopeptides because it reduces enzyme autolysis products. Hyper-loaded beads accelerate the reaction, yielding the desired glycopeptides after only 1.5 hours. Site heterogeneity has been completely elucidated for both N-linked and O-linked glycoproteins displaying a wide range of glycan structures. The technique is also useful for mixtures of glycoproteins, yielding quantitative determination of glycosylation sites and microheterogeneity.
In theory, tandem MS can provide the peptide sequence, the glycan sequence, and the site of glycosylation. In practice, however, tandem MS analysis of glycopeptides remains difficult and far from routine. Studies of glycopeptide fragmentation have focused almost exclusively on protonated tryptic glycopeptides (singly protonated via MALDI or multiply protonated via ESI). The typical fragments correspond to loss of the glycan moiety, while information on the peptide sequence and glycan attachment site is often minimal. Tandem MS is complicated by the size of tryptic peptides, which tend to exceed the useful mass range of collision-induced dissociation (CID), and by the labile nature of the glycan-peptide bond. A common strategy, therefore, is to determine the overall mass of the glycopeptide and perform tandem MS to obtain the peptide mass. Typically, very little information is obtained on the glycan sequence. To sequence the peptide, the glycopeptide is first deglycosylated and the peptide is then subjected separately to tandem MS.
Several reviews on glycopeptide analysis by mass spectrometry [7,20] conclude that the fragmentation behavior of glycopeptide ions under collision-induced dissociation tends to vary with the instrument, the instrumental parameters, the peptide composition, the charge carrier, and the charge state [9,20,31-35]. More recently, electron capture dissociation (ECD) and electron transfer dissociation (ETD) have been applied to glycopeptides and show considerable promise [35,36]. These methods, typically regarded as non-ergodic, cleave the peptide backbone while leaving the glycan attached to the peptide. Unfortunately, most ECD and ETD studies, and even most CID studies, have been performed on N-linked glycans, where the candidate glycosylation sites are already known from the consensus sequence. A better application would be to employ ECD and ETD on O-linked glycans, for which no consensus sequence exists.
Nonetheless, much information can still be extracted from CID. Glycopeptides subjected to CID yield low-molecular-weight ions such as m/z 163 (Hex+H+), m/z 204 (HexNAc+H+), m/z 292 (NeuAc+H+), and m/z 366 (Hex-HexNAc+H+) that are diagnostic for the presence of glycosylation [8,37]. In this way, glycopeptides have been readily identified with high sensitivity by selected ion monitoring (SIM) in ion trap MS and in quadrupole-TOF mass analyzers [38,39]. By the same token, neutral losses of saccharides such as hexose (162 Da), N-acetylhexosamine (203 Da), fucose (146 Da), and N-acetylneuraminic acid (291 Da) can also be used to indicate the presence of glycopeptides in the mass spectra. CID of pronase-produced glycopeptide ions also yields structurally informative fragments, because their sizes are more amenable to CID. Glycopeptides coordinated with Na+ yield glycan information, while protonated species yield peptide sequence information.
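The diagnostic ions and neutral losses listed above lend themselves to a simple screen of a tandem mass spectrum. The Python sketch below uses the nominal values quoted in the text; the function names, the 0.5 Da tolerance, the assumption of singly charged ions, and the example peak list are hypothetical choices made for illustration.

```python
# Nominal m/z values of diagnostic ions, as quoted in the text.
DIAGNOSTIC_IONS = {"Hex": 163.0, "HexNAc": 204.0, "NeuAc": 292.0, "Hex-HexNAc": 366.0}

# Nominal neutral-loss masses in Da, as quoted in the text.
NEUTRAL_LOSSES = {"Hex": 162.0, "HexNAc": 203.0, "Fuc": 146.0, "NeuAc": 291.0}


def diagnostic_ions(peaks, tolerance=0.5, min_intensity=0.0):
    """Return names of diagnostic ions found in `peaks`, a list of
    (m/z, intensity) pairs. Tolerance is in Da (assumed value)."""
    found = []
    for name, target in DIAGNOSTIC_IONS.items():
        if any(abs(mz - target) <= tolerance and intensity >= min_intensity
               for mz, intensity in peaks):
            found.append(name)
    return found


def neutral_losses(precursor_mz, fragment_mzs, tolerance=0.5):
    """Return names of saccharides whose mass matches the gap between the
    precursor and any fragment. Assumes singly charged ions."""
    found = []
    for name, loss in NEUTRAL_LOSSES.items():
        if any(abs((precursor_mz - mz) - loss) <= tolerance for mz in fragment_mzs):
            found.append(name)
    return found


if __name__ == "__main__":
    spectrum = [(204.09, 1000.0), (366.14, 800.0), (500.30, 50.0)]  # hypothetical peaks
    print(diagnostic_ions(spectrum))
    print(neutral_losses(1500.5, [1338.5, 1297.5]))
```

For multiply charged precursors, the neutral-loss masses would need to be divided by the charge state before comparison; that case is left out of this sketch.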
Due to the extreme heterogeneity of glycans, creating a thorough data analysis platform for large-scale glycopeptide identification is a formidable challenge. Protein databases can be filtered to retain only those proteins that contain the consensus sequence for N-linked glycosylation. Several software platforms, such as GlycoMod, GlycoPep DB, Peptoonist, and GlycoMiner, aid in the identification of intact glycopeptides, mainly N-linked glycopeptides [43,45-47]. These tools are useful for determining glycosylation sites from glycopeptides produced by a specific protease such as trypsin, while GlycoX can be used to interpret mass spectra obtained with nonspecific proteases such as pronase E and proteinase K.
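To give a sense of the matching problem such software addresses, the following Python sketch searches for monosaccharide compositions that explain the mass difference between an intact glycopeptide and its peptide. It is a generic brute-force illustration and not the algorithm of any of the tools named above. The function name, the composition limits, and the tolerance are hypothetical, and the residue masses are standard monoisotopic values supplied here (the text quotes only nominal masses).

```python
from itertools import product

# Monoisotopic residue masses in Da (standard values, not from the article).
RESIDUE_MASS = {"Hex": 162.0528, "HexNAc": 203.0794, "Fuc": 146.0579, "NeuAc": 291.0954}

# Upper bounds on residue counts searched (assumed limits for this sketch).
MAX_COUNT = {"Hex": 10, "HexNAc": 6, "Fuc": 2, "NeuAc": 4}


def glycan_compositions(glycopeptide_mass, peptide_mass, tolerance=0.01):
    """Return every composition whose summed residue mass equals
    glycopeptide_mass - peptide_mass within `tolerance` Da.

    A glycan attaches with loss of water, so the glycopeptide mass is the
    peptide mass plus the sum of the monosaccharide residue masses.
    """
    target = glycopeptide_mass - peptide_mass
    names = list(RESIDUE_MASS)
    matches = []
    for counts in product(*(range(MAX_COUNT[name] + 1) for name in names)):
        mass = sum(count * RESIDUE_MASS[name] for name, count in zip(names, counts))
        if abs(mass - target) <= tolerance:
            matches.append(dict(zip(names, counts)))
    return matches


if __name__ == "__main__":
    # Hypothetical peptide of neutral mass 1000.0 Da carrying Hex5HexNAc2.
    observed = 1000.0 + 5 * RESIDUE_MASS["Hex"] + 2 * RESIDUE_MASS["HexNAc"]
    print(glycan_compositions(observed, 1000.0))
```

A composition is not a structure: isomeric glycans share the same mass, so a match of this kind narrows the candidates but does not identify linkages or branching.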
Unlocking the structural code of glycosylation offers another dimension toward understanding protein function. Developments over the next few years in chemical and biochemical sample processing, mass spectrometry strategies, and bioinformatics will lead to the comprehensive and routine analysis of glycoproteins. However, glycoproteomics must include both glycan and protein information. The current tendency to apply the term “glycoproteomics” to studies that examine glycosylation sites while discarding glycan information should be avoided. These glycan-specific proteomics studies have their merit but provide only partial information regarding the glycoproteome.
We are grateful for funds provided by the National Institutes of Health (RO1GM049077).