Glycans are cell-type specific, post-translational protein modifications that are modulated during developmental and disease processes. As such, glycoproteins are attractive biomarker candidates. Here, we describe a mass spectrometry-based workflow that incorporates lectin affinity chromatography to enrich for proteins that carry specific glycan structures. As increases in sialylation and fucosylation are prominent among cancer-associated modifications, we focused on Sambucus nigra agglutinin (SNA) and Aleuria aurantia lectin (AAL), lectins which bind sialic acid- and fucose-containing structures, respectively. Fucosylated and sialylated glycopeptides from human lactoferrin served as positive controls, and high mannose structures from yeast invertase served as negative controls. The standards were spiked into Multiple Affinity Removal System (MARS) 14-depleted, trypsin-digested human plasma from healthy donors. Samples were loaded onto lectin columns, separated by HPLC into flow-through and bound fractions, and treated with peptide: N-glycosidase F to remove N-linked glycans. The deglycosylated peptide fractions were interrogated by ESI HPLC-MS/MS. We identified a total of 122 human plasma glycoproteins containing 247 unique glycosites. Importantly, several of the observed glycoproteins (e.g., cadherin 5 and neutrophil gelatinase-associated lipocalin) typically circulate in plasma at low ng/mL levels. Together, these results provide mass spectrometry-based evidence of the utility of incorporating lectin-separation platforms into cancer biomarker discovery pipelines.
Advances in mass spectrometry (MS)-based proteomics, including nano liquid chromatography electrospray ionization (ESI) interfaces, faster and more sensitive mass analyzers, and robust bioinformatics approaches, have brought the unbiased discovery of disease biomarkers within reach. These technological improvements have led to a new era of research aimed at improving prognoses, diagnoses, and monitoring responses to therapy through detection of biomarkers in human body fluids (i.e., plasma, saliva and urine). However, making this an effective strategy requires panels of verified disease-specific reporter molecules that, as yet, do not exist. Therefore, there is a tremendous interest in discovery efforts.
Overall, investigators who focus on the early stages of discovery pipelines use two approaches. The first is brute force protein identification to determine differences between samples obtained from patient subjects and control individuals. The inherent complexity of body fluids requires extensive sample separation, usually achieved by a series of orthogonal/complementary chromatographic steps. Successful studies typically require large amounts of starting material, time, and expertise. In the second, investigators use targeted approaches to reduce complexity. One way to direct these experiments is by considering the biology of the disease of interest, in this case, cancer. In this regard, post-translational modifications (PTMs) are especially interesting because they are linked to the disease process and in some cases play a causal role. Since these modifications can appear at multiple positions on a protein scaffold and on multiple protein backbones, the expression of these motifs is usually greatly amplified as compared to that of single proteins. Thus, through the use of affinity capture reagents, specific PTMs may be exploited as targets to enrich molecules that are a signature of a particular disease state.
Aberrant carbohydrate modifications have been recognized as a hallmark of cancer for over 30 years . Intriguingly, many of the oldest and most widely used clinical diagnostic tests detect glycoproteins. These include carcinoembryonic antigen (CEA), commonly used as a marker of colorectal cancer; CA-125, frequently employed to diagnose ovarian cancer; and prostate-specific antigen (PSA) [4-6]. Interestingly, many of the most informative tests directly assess the expression of a particular class of carbohydrates termed Lewis (Le) blood group antigens, which exhibit unique biological functions . Anti-sialyl Lea (CA 19-9), -Lex, -sialyl Lex, and -Ley antibodies are used in the evaluation of biopsy specimens from breast, bladder, colorectal, esophageal and non-small cell lung carcinoma [8-15]. In all instances, Le antigen expression is correlated with increased metastasis, advanced stage of disease and reduced survival time. The fact that cancer-related carbohydrate changes are correlated with clinically relevant outcomes such as metastasis and survival enhances their utility as biomarkers. Indeed, studies have already shown that selectively enriching cancer-related protein glycoforms affords the possibility of increasing diagnostic sensitivity and specificity. For example, separation of serum PSA by the Maackia amurensis agglutinin lectin, which specifically binds α2,3-linked sialic acid, allows discrimination (p < 0.001) between blood samples from individuals with benign prostatic hypertrophy and prostate cancer patients , which standard PSA tests fail to do .
The relationship between saccharide expression and disease progression has a biological basis as glycans regulate many processes involved in tumorigenesis. For instance, extravasation, a critical step in metastasis, is initiated by shear stress-induced interactions between selectin family endogenous lectins and their cell surface carbohydrate ligands. The selectin family consists of three members with differential expression patterns. Endothelial cells display E- and P-selectin, platelets express P-selectin, and leukocytes present L-selectin [18; 19]. Carbohydrate ligands for selectins are modified Le blood group antigens, which are abundant in malignant tissues. Evidence is accumulating that the clinical correlation between tumor expression of selectin ligands and metastasis is a reflection of a causal relationship . Mice lacking L- or P-selectin, singly or in combination, exhibit greater resistance to the metastasis of colon carcinoma cells than do wild-type animals [21; 22], and E-selectin is also implicated in tumor metastasis . Additionally, cancer-related protein scaffolds have been identified as selectin ligand carriers, and thus as participants in the extravasation process. For instance, CEA, the glycoprotein commonly used as a marker for colorectal cancer, and CD44, a highly glycosylated cell surface adhesion protein, can serve as E- and L-selectin ligands .
In light of the strong biological and clinical correlations between tumors and specific types of glycosylation, several laboratories have applied a variety of glycan-based enrichment strategies to cancer biomarker discovery. In general, these studies have tended to be small-scale proof-of-principle endeavors. It is noteworthy that, on the whole, they succeeded in identifying target biomarker candidates for further validation. Thus, the published literature supports the theory that focusing the selection process on cancer-relevant glycans increases the likelihood of identifying potential disease sentinels and/or targets. In terms of specific methods, a variety of protocols, including lectin-, antibody- and chemical-based approaches, have been developed for the capture of glycoproteins and glycopeptides. Lectin-based protocols include those using single species [24-26], as well as mixtures, e.g., concanavalin A, wheat germ agglutinin, and Artocarpus integrifolia (jacalin). The latter method has been dubbed multi-lectin affinity chromatography [27-30]. Less common are protocols employing carbohydrate-reactive antibodies such as those recognizing Le blood group antigens . In addition, two similar approaches, involving hydrazide or boronic acid chemistry, capitalize on the cis-diols in monosaccharides to covalently couple the glycans to a derivatized support [32; 33].
Here, we present the results of an extensive method development effort in which we used glycopeptide standards as positive and negative controls to optimize an automated workflow for the rapid enrichment of selectively glycosylated peptides from depleted, trypsin-digested human plasma. Specifically, we performed lectin affinity chromatography at the glycopeptide level. Using a number of requirements, including the presence of an N-linked consensus motif (NXS/T) and a +1 Da mass shift at the modified Asn residue after deglycosylation with PNGase F, we identified glycosites enriched by a specific lectin, an indication that they carried specialized carbohydrate modifications. Using commercially available preparations of Sambucus nigra agglutinin (SNA), which binds sialic acid α2,6- and α2,3-linked to terminal galactose, and Aleuria aurantia lectin (AAL), which binds α-linked fucose at both core and branch positions, we identified a total of 227 glycosites in 119 glycoproteins from human plasma. These included low-abundance proteins such as cadherin-5 and neutrophil gelatinase-associated lipocalin. Now, we are poised to compare families of SNA- and AAL-binding glycoproteins in cancer-relevant human samples.
For details, also see a Standard Operating Procedure (SOP) provided as Supplementary Methods.
Pooled human plasma collected from equal numbers of male and female healthy donors was obtained from Bioreclamation (Westbury, NY).
Batches of 200 μl of human plasma were processed according to the manufacturer's instructions using a MARS-Hu-14 (10 × 100 mm) column from Agilent Technologies (Santa Clara, CA) on a Paradigm MG4 HPLC system (Michrom Bioresources, Auburn, CA). The protein flow-through fraction was collected and re-adjusted to the original volume using a 3 or 5 kDa molecular weight cut-off (MWCO) centrifugal concentrator (Sartorius AG, Goettingen, Germany). Hereafter, a plasma volume refers to the original plasma volume.
MARS-14 depleted plasma was denatured with 6 M urea, reduced with 20 mM DTT (30 min at 37 °C), alkylated with 50 mM iodoacetamide (30 min at RT), and incubated overnight at 37°C with sequencing grade trypsin (Promega, Madison, WI) added at a 1:50 enzyme:substrate ratio (wt/wt) as previously described . Following digestion, the samples were acidified with formic acid, and desalted using HLB Oasis SPE cartridges (Waters, Milford, MA) that were prepared by washing with 80% acetonitrile in 1% formic acid prior to equilibration with 0.1% formic acid. Samples were eluted with 80% acetonitrile in 0.1% formic acid, neutralized by the addition of 0.5 M ammonium bicarbonate and concentrated by vacuum centrifugation. Peptides were stored at -80 °C until use.
Human lactoferrin (L0520) and yeast invertase (I0408) were purchased from Sigma-Aldrich (St. Louis, MO). Glycoproteins were individually trypsin-digested as described above. Since invertase was not expected to bind to the lectin columns, no further purifications were carried out. The lactoferrin digest (50 μg) was chromatographed on either an SNA or an AAL column to enrich the subset of glycopeptides that had glycan structures that interacted with the carbohydrate-binding proteins. The bound material was collected, and desalted as described below before use. Lactoferrin- and invertase-derived glycopeptide standards were characterized by matrix-assisted laser desorption ionization–time-of-flight (MALDI-TOF) MS and MS/MS, also described below.
SNA and AAL were purchased from Vector Laboratories (Burlingame, CA). Lectins (1.5-6 mg) were suspended at 10-20 mg/mL in PBS. 100 mg POROS-AL beads (Applied Biosystems, Foster City, CA) were washed 2 times in 1 mL PBS before addition of the lectin solution. Sodium cyanoborohydride was added to a final concentration of 50 mM and the sample was tumbled overnight at room temperature. The beads were washed once with 1 mL 1M Tris-HCl, pH 7.4. The remaining aldehydes were blocked by incubation for 30 min at room temperature in the same buffer containing 50 mM sodium cyanoborohydride. Unconjugated protein was removed by washing the beads (5 × 1 mL of 1 M sodium chloride). Then, the lectin-conjugated beads, approximately 300 μl in volume, were packed into 4.6 × 50 mm PEEK HPLC columns. Routinely, columns were stored in PBS with 0.02% sodium azide at 4 °C for up to 6 months. Individual columns were used for up to 75 affinity separations without changes in the performance characteristics.
A Michrom Paradigm MG4 HPLC system equipped with a CTC PAL robot configured as an autosampler and fraction collector (Michrom Bioresources, Auburn, CA) was employed. Initially, we tested sample loads, buffer compositions, and column washing conditions using glycopeptide standards spiked into MARS 14-depleted trypsin-digested plasma. We began by varying the amount of sodium chloride (0-500 mM) in Buffer A (25 mM Tris, pH 7.4, 10 mM calcium chloride, 10 mM magnesium chloride). Buffer B was 0.5 M acetic acid. Next, we optimized the wash step, in terms of timing and flow rate: ~40 column volumes at 2 mL/min vs. 0-0.75 column volumes at 50 μl/min. Conditions were assessed in terms of the binding characteristics of the glycopeptide standards, recovery of plasma glycopeptides in the bound fraction, and reproducibility (determined by monitoring peptide elution at 280 nm). Lastly, we explored sample loading (10-200 μl) in terms of bound glycopeptides.
The optimized protocol was as follows. Routinely, ~1 mg (200 μL depleted plasma) was diluted into Buffer A containing 50 mM sodium chloride, and spiked with 50 pmol of the invertase digest and 25 pmol of the lectin-enriched lactoferrin. Then, the mixture was applied to the lectin column and separated using a 3 step gradient: 1) Collection of the unbound fraction: Buffer A for 9.0 min at 50 μL/min; 2) Elution: Buffer B for 4.8 min at 500 μL/min; and 3) Re-equilibration: Buffer A for 6.0 min at 3000 μL/min. The flow-through (FT) fraction, collected from 2.8 to 6.9 min, and the bound fraction, collected from 9.0 to 11.6 min, were desalted using Oasis HLB cartridges as described above. Eluted samples were neutralized by the addition of 0.5 M ammonium bicarbonate and concentrated to < 100 μL by vacuum centrifugation.
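As a quick arithmetic check of the three-step program above, the following sketch converts each step's duration and flow rate into a delivered volume and expresses it in column volumes. The durations and flow rates are taken from the protocol text. Using the empty-tube geometric volume of a 4.6 × 50 mm column as the column volume is an assumption made here for illustration, and the names `STEPS`, `column_volume_ul` and `step_volumes_ul` are hypothetical.

```python
import math

# (step name, duration in min, flow rate in uL/min), as stated in the protocol
STEPS = [
    ("unbound collection", 9.0, 50.0),
    ("elution", 4.8, 500.0),
    ("re-equilibration", 6.0, 3000.0),
]


def column_volume_ul(inner_diameter_mm: float, length_mm: float) -> float:
    """Geometric volume of an empty cylindrical column in uL (1 mm^3 = 1 uL)."""
    radius = inner_diameter_mm / 2.0
    return math.pi * radius ** 2 * length_mm


def step_volumes_ul(steps):
    """Volume delivered in each step: duration (min) x flow rate (uL/min)."""
    return {name: minutes * flow for name, minutes, flow in steps}


if __name__ == "__main__":
    # Assumption: empty-tube volume of the 4.6 x 50 mm column, not the packed bed
    cv = column_volume_ul(4.6, 50.0)
    for name, volume in step_volumes_ul(STEPS).items():
        print(f"{name}: {volume:.0f} uL = {volume / cv:.2f} column volumes")
```

Substituting the approximately 300 μl packed bead volume given in the column preparation paragraph for the geometric volume would change the column-volume figures but not the delivered volumes.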
N-linked glycopeptides in the FT and bound fractions were deglycosylated by treatment with PNGase F (Glycerol-free, New England Biolabs; Ipswich, MA). A small volume was removed to estimate pH. If outside the desired range of 7.0-8.0, 50 mM ammonium bicarbonate was added. Initial experiments assessed the optimal enzyme to substrate ratio (1 U/μg-20 U/μg) and the effect of enzyme concentration (25 U/μl-200 U/μl) on deglycosylation efficiency. Final reaction conditions were as follows. Five μl (2,500 U) of enzyme was added to each sample in a volume of 50-100 μl prior to incubation overnight at 37 °C. Following deglycosylation, samples were desalted and concentrated using C18 ZipTips® (Millipore; Billerica, MA).
Mass spectra were acquired by matrix-assisted laser desorption ionization–time-of-flight (MALDI-TOF) MS using a 4800 MALDI TOF-TOF Proteomics Analyzer (Applied Biosystems, Foster City, CA; USA/MDS Sciex, Concord, ON, Canada) equipped with TOF/TOF™ ion optics, a 200 Hz Nd:YAG laser, controlled by the 4000 Series Explorer Software V3.5.28193. The positive ion MS spectra in the m/z range of 800-6000 Da were manually acquired by accumulating 1500-2000 shots at a fixed laser intensity (4000-4200). The accelerating voltage was 20 kV. Other instrumental settings were optimized to afford a resolution of >15,000 at m/z = 2093.89 Da. MS/MS spectra were generated by employing collision-induced dissociation (CID) using the following settings: collision cell floated at 1 kV; resolution of precursor ion selector = 300 or 400 full-width half-maximum (FWHM); metastable suppressor: on; total shots per spectrum = 4000; and fixed laser intensity = 5000-5500. No collision gas was used. Manufacturer-supplied Plate Model and Default MS Calibration algorithms were employed to generate external calibration of the MS mode using monoisotopic molecular ions from a combination of six peptide standards (des-Arg1-bradykinin, angiotensin I, [Glu1]-fibrinopeptide B, and three ACTH peptides: 1-17, 18-39 and 7-38). Acceptance criteria thresholds for a generation of MS calibration files required at least four standard molecular ions with S/N of ≥300, mass tolerance of 0.5 Da, and maximum outlier error of 25 ppm. Default calibration of the MS/MS spectra used a minimum of five matched fragment ions of [Glu1]-fibrinopeptide B that were detected with a minimum S/N of 120, within a mass tolerance window of ±1 Da and a maximum outlier error of 20 ppm. Data were processed using Data Explorer Software Version 4.9 (Applied Biosystems). Samples were desalted and concentrated using C18 ZipTips® (Millipore). 
Then aliquots of 0.5 μl (~15 nM – 3 μM) in 0.1% TFA were mixed on a MALDI target with the matrix (α-cyano-4-hydroxycinnamic acid, 5 mg/ml in 80% ACN/0.1% TFA/10 mM dibasic ammonium phosphate) at a 1:1 ratio (v/v). All data were manually processed and interpreted.
The proteolytic peptide mixtures obtained after tryptic digestion, lectin enrichment, PNGase F treatment, and desalting were analyzed by reversed-phase nano-HPLC-ESI-MS/MS using an Eksigent nano-LC 2D HPLC system (Eksigent, Dublin, CA), which was directly connected to a quadrupole time-of-flight (QqTOF) QSTAR Elite mass spectrometer (MDS SCIEX, Concord, ON, Canada). To compensate for differences in the amount of glyco/peptides in the various lectin fractions, we injected 10-20% of the bound and 5% of the flow-through material. Briefly, peptides were applied to a guard column (C18 Acclaim PepMap100, 300 μm I.D. × 5 mm, 5 μm particle size, 100 Å pore size, Dionex, Sunnyvale, CA) and washed with the aqueous loading solvent (2% solvent B in A, flow rate: 20 μL/min) for 10 min. Subsequently, samples were transferred onto the analytical C18-nanocapillary HPLC column (C18 Acclaim PepMap100, 300 μm I.D. × 15 cm, 3 μm particle size, 100 Å pore size, Dionex, Sunnyvale, CA) and eluted at a flow rate of 300 nL/min using one of two gradients: Gradient 1 (~3 h): at 2% solvent B in A for 5 min, 2-40% solvent B in A (from 5-125 min), 40-90% solvent B in A (from 125-140 min), and at 90% solvent B in A (from 140-149 min), with a total runtime of 194 min (including column equilibration), or Gradient 2 (~2 h): 2-40% solvent B in A (from 0-60 min), 40-90% solvent B in A (from 60-75 min), and at 90% solvent B in A (from 75-85 min), with a total runtime of 120 min (including column equilibration). Solvent A consisted of 0.1% formic acid in 98% H2O/2% acetonitrile and solvent B was 0.1% formic acid in 98% acetonitrile/2% H2O. Mass spectra (ESI-MS) and tandem mass spectra (ESI-MS/MS) were recorded in positive-ion mode with a resolution of 12,000-15,000 FWHM. The nanospray needle voltage was typically 2,300 V in HPLC-MS mode.
Spectra were calibrated in static nanospray mode using MS/MS fragment ions of a [Glu1]-fibrinopeptide B peptide standard (y1 fragment ion with m/z at 175.1195, and y11 fragment ion with m/z at 1285.5444). For collision-induced dissociation tandem mass spectrometry (CID-MS/MS), the mass window for precursor ion selection of the quadrupole mass analyzer was set to ± 1 m/z. The precursor ions were fragmented in a collision cell using nitrogen. Advanced information dependent acquisition was employed for MS/MS data collection, using QSTAR Elite (Analyst QS 2.0) specific features, including “Smart Collision” and “Smart Exit” (fragment intensity multiplier set to 2.0 and maximum accumulation time at 2 sec) to obtain MS/MS spectra for the six most abundant parent ions following each survey scan. Dynamic exclusion features, based on value M, were set to a width of 50 mDa and exclusion duration of 120 sec. For further details see Supplementary Methods. To increase overall sampling efficiencies, two or three replicate nano-HPLC-MS/MS experiments were performed per sample.
Mass spectrometric data were analyzed with the database search engine Protein Pilot 3.0 (revision 114732) using the Paragon Algorithm 188.8.131.52 11342 developed by ABI. Peaklists for the QSTAR Elite LC-MS/MS data sets were generated using Protein Pilot 3.0 (revision 114732). The following search parameters were used: trypsin enzyme specificity, carbamidomethyl (Cys) as a fixed modification, “thorough search effort setting” allowing for “biological modifications”. In contrast to other search engines, the “thorough” mode of the Paragon Algorithm determines modifications using feature probabilities based on the specified settings. The Protein Pilot “Modifications Catalog and Translations.xls” file and the “ParameterTranslation.xml” file that were used in this study were uploaded to Tranche (see below). The mass tolerance for the QSTAR Elite data was generally considered to be <50 ppm at the precursor and the fragment ion levels. In addition, during the search, Protein Pilot 3.0 performs a mass recalibration of the datasets based on highly confident peptide spectra. Specifically, a first search iteration is done to select high confidence peptide identifications that are used to recalibrate both the MS and MS/MS data, which are then automatically re-searched.
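To make the 50 ppm tolerance concrete, the sketch below converts a ppm tolerance into an absolute m/z window and checks whether an observed value falls inside it. It is only an illustration of the ppm arithmetic, not a description of how Protein Pilot applies its tolerances internally; the function names are hypothetical, and the example m/z is the y11 calibrant ion quoted in the calibration paragraph.

```python
from typing import Tuple


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def tolerance_window(theoretical_mz: float, tol_ppm: float) -> Tuple[float, float]:
    """Lower and upper m/z bounds for a symmetric ppm tolerance."""
    delta = theoretical_mz * tol_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 50.0) -> bool:
    """True if the observed m/z lies within tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    low, high = tolerance_window(1285.5444, 50.0)
    print(f"50 ppm window at m/z 1285.5444: {low:.4f} to {high:.4f}")
```

At m/z 1285.5444, a 50 ppm tolerance corresponds to a half-width of roughly 0.064 m/z units.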
All data were searched using the publicly available SwissProt database release version 57.6 (28-Jul-2009). For plasma samples, a merged database of 20337 sequences was generated that contained sequences from 3 sources: 1) human; 2) Saccharomyces cerevisiae invertase standards: [P10594 (INV1_YEAST), P00724 (INV2_YEAST), P10595 (INV3_YEAST), P00724 (INV2_YEAST), and P10597 (INV5_YEAST)]; and 3) the Elizabethkingia miricola (Chryseobacterium miricola) PNGase F enzyme precursor P21163|PNGF_ELIMR.
To assess and restrict rates of false positive peptide/protein identifications, we used the Proteomics System Performance Evaluation Pipeline (PSPEP) tool available in ProteinPilot 3.0. This tool automatically creates a concatenated forward and reverse decoy database, and provides an Excel output of the experimentally determined false discovery rate at the spectral, peptide and protein levels with standard statistical errors . The discriminating variable for the Paragon Algorithm is the peptide confidence value, which is a 0-99.0 scaled real number . For database searches, a cut-off peptide confidence value of 95 was chosen with the following justification. Typically, for the lectin-enriched plasma datasets, the Protein Pilot false discovery rate (FDR) analysis tool (PSPEP) algorithm  provides a local FDR at 5% for a peptide confidence score of 94, and a global FDR at 1% for a peptide confidence score of 90. For all further data processing, the cut-off peptide confidence value of 95 was chosen. A key requirement for the positive identification of a PNGaseF-deglycosylated peptide was the motif NXS/T (in which X represents any amino acid except P), where N was converted to D; the N→D conversion was reported by a search engine as N deamidation. Based on these criteria, these peptides, which were classified as “deglycosylated”, are listed in Supplementary Tables 2 and 3 as well as in Supplementary Tables 4 and 5 (the latter showing only the deglycosylated peptides). To ensure inclusion of glycosites containing lysine and/or arginine in the X position (e.g., NKT), which were likely to have been cleaved by trypsin, the amino-acid residue following the carboxy-terminal cleavage site was also considered. Peptides containing the motif NGS or NGT were excluded due to the fact that asparagine residues in that sequence are prone to chemical deamidation during overnight trypsin digestion . The latter peptides are listed in Supplementary Tables 4 and 5 (see worksheet 2). 
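The sequence-based part of these criteria can be sketched as a small classifier. The code below is a minimal illustration, not the software used in this study: it takes a peptide sequence, the positions at which the search engine reported Asn deamidation, and optionally the residues that follow the peptide in the protein, so that sequons split by tryptic cleavage (e.g., NKT) are still recognized. The function name, the label strings and the example peptides are all hypothetical.

```python
def classify_deamidated_sites(peptide, deamidated_positions, following_residues=""):
    """Classify deamidated Asn positions (0-based) against the N-X-S/T sequon.

    peptide: unmodified amino-acid sequence (Asn written as 'N').
    following_residues: protein residues after the peptide C-terminus, used
        when the sequon extends past a tryptic cleavage site.
    Returns a dict mapping each position to one of:
        'deglycosylated' - N-X-S/T with X not P and not G
        'excluded_NG'    - NGS/NGT, prone to chemical deamidation
        'no_motif'       - no sequon at this position
    """
    context = peptide + following_residues
    results = {}
    for pos in deamidated_positions:
        if pos < 0 or pos >= len(peptide) or peptide[pos] != "N":
            raise ValueError(f"position {pos} is not an Asn in {peptide!r}")
        if pos + 2 >= len(context):
            # Not enough sequence to evaluate the sequon
            results[pos] = "no_motif"
            continue
        x_residue, third = context[pos + 1], context[pos + 2]
        if x_residue == "P" or third not in "ST":
            results[pos] = "no_motif"
        elif x_residue == "G":
            results[pos] = "excluded_NG"
        else:
            results[pos] = "deglycosylated"
    return results


if __name__ == "__main__":
    # Hypothetical peptides, for illustration only
    print(classify_deamidated_sites("LNVSR", [1]))
    print(classify_deamidated_sites("AANK", [2], following_residues="TL"))
    print(classify_deamidated_sites("AANGTK", [2]))
```

Passing the residues that follow the cleavage site is what allows a peptide ending in NK or NR to be evaluated; without them such a site is reported as having no motif.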
For all deglycosylated peptides we examined the corresponding MS/MS spectra to ensure correct assignment. The MS/MS spectra were manually inspected based on an adaptation of previously published criteria, and the peptide was confirmed or deleted from the table. Specifically, deglycosylated peptides were validated according to the following criteria: (a) a good signal-to-noise ratio for the majority of fragment ions (S/N > 3); (b) a minimum of 3 consecutive y- or b-ions; (c) fragment ions that clearly determine the site of N-deamidation/deglycosylation; (d) intense y-ions from N-terminal proline; (e) an immonium ion for Arg (m/z 175.12) or Lys (m/z 147.11) consistent with the expected C-terminal tryptic site; and (f) a clean precursor ion isotope pattern demonstrating quantitative N-deamidation/deglycosylation.
The data associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hash: udWtxXPMwG0TDQ7KrbC8o+14nI1aEGopRFn44unPCO119rWL8a2GWUH2edn3N76L2PAgiUsPuXT7wdKLH3rLC2coEAMAAAAAAACVBA==
The hash may be used to prove exactly what files were published as part of this manuscript's data set, and the hash may also be used to check that the data has not changed since publication. Accessible information includes all raw data files and processed data report files, as well as detailed data file annotation legends.
Detailed information on identified peptides was extracted from ProteinPilot 3.0 result files. The peptide data generated in all experiments were assembled into a single table using software developed in-house. Retention times (RT) in different runs were aligned across all runs using a linear trend calculated by fitting a line to the RT pairs for all common observations among runs. All distinct peptides that were observed in at least one run and were identified with a confidence score of at least 95 were combined to generate a Master List. The results of each individual run were then compared to the Master List. If a peptide was identified in a run with a score of 95 or above, it was retained. Peptides with scores of < 50 were discarded, while those with scores between 50 and 95 were included if their aligned RTs differed from those of a high-confidence (>95) peptide identification by ≤ 25 seconds. The program totaled the number of all peptides and deglycosylated peptides observed in each sample. These totals, along with the percentage of deglycosylated peptides relative to total peptides observed for each sample were displayed. Due to a combination of missed tryptic cleavage sites (partial digestion) and non-specific cleavages, we observed multiple peptides covering the same protein glycosylation site. To ensure that only distinct glycosylation sites were counted, each deglycosylated peptide was annotated with the amino acid number of the glycosylated residue (the glycosite). Then, the number of distinct glycosites observed was reported. Finally, the program monitored proteins with at least one deglycosylated peptide and produced a table of glycoproteins (and the corresponding glycosites) observed in the samples.
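The alignment and filtering logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the in-house software: it assumes each run is aligned to a single reference run by an ordinary least-squares line fitted to the peptides the two runs share, and that a mid-scoring identification is compared with the retention time of the high-confidence identification of the same peptide in the Master List. All function and parameter names are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def align_rt(run_rts, reference_rts):
    """Map a run's retention times (s) onto the reference time scale.

    Both arguments are dicts of peptide -> RT; the line is fitted to the
    peptides observed in both runs and applied to every peptide in the run.
    """
    common = sorted(set(run_rts) & set(reference_rts))
    if len(common) < 2:
        raise ValueError("at least two shared peptides are required")
    slope, intercept = fit_line([run_rts[p] for p in common],
                                [reference_rts[p] for p in common])
    return {p: slope * rt + intercept for p, rt in run_rts.items()}


def keep_identification(score, aligned_rt, master_rt,
                        high=95.0, low=50.0, rt_window_s=25.0):
    """Apply the score and retention-time rules to one identification.

    master_rt is the RT of the high-confidence Master List entry for the same
    peptide, or None if the peptide is not in the Master List.
    """
    if score >= high:
        return True
    if score < low:
        return False
    return master_rt is not None and abs(aligned_rt - master_rt) <= rt_window_s
```

For example, an identification with a score of 70 would be retained only if the same peptide appears in the Master List and the two retention times agree to within 25 seconds after alignment.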
In these experiments, we took the approach of developing, as a first step, standards that allowed us to optimize the conditions that favor glycopeptide enrichment, which was a key element in our experimental design. As a test of lectin specificity, we interrogated the binding characteristics of the columns by chromatographing positive controls that met the structural requirements for interaction and negative controls that did not. We selected standards with well-characterized glycan structures [39-41] that reflected the specificity of the lectins we employed. Specifically, SNA binds sialic acid α2,6- and α2,3-linked to terminal galactose, and AAL binds fucose α1,6-linked to N-acetylglucosamine and α1,3-linked to N-acetyllactosamine. Accordingly, we expected that sialylated and fucosylated human lactoferrin glycopeptides [39; 40] would bind to both SNA and AAL. By contrast, yeast invertase glycopeptides, which carry only high mannose structures, should not interact with either lectin. Indeed, this was the case for both lectins, as exemplified by the AAL data shown in Fig. 1 (and substantiated by the MS data shown in Supplementary Figures 1A-1K). In the case of lactoferrin, glycopeptides were concentrated in the bound fraction and not observed in the flow-through. The principal species detected were two overlapping peptide sequences that were modified with various fucose-containing glycoforms (as confirmed by MALDI-MS/MS analyses, Supplementary Figures 1A-1G, and Supplementary Table 1). Conversely, invertase-derived glycopeptides were observed predominantly in the flow-through fraction. To demonstrate that the lectin chromatographic separation was glycan-dependent and reproducible, we rechromatographed the glycopeptide-containing fractions from each standard and demonstrated consistent behavior. Moreover, the lactoferrin glycopeptides recovered in the bound fraction were further enriched relative to non-glycosylated species.
Accordingly, we used unseparated invertase glycopeptides and lectin-enriched lactoferrin glycopeptides as internal standards in all subsequent experiments.
Next, we optimized the conditions for HPLC separation of human plasma glycopeptides (Fig. 2). Specifically, we digested MARS 14-depleted human plasma with trypsin, spiked in the glycopeptide standards and fractionated the mixture by using lectin chromatography. To prove binding specificity and facilitate peptide identification, we used PNGase F to remove N-linked glycans. This enzyme, an amidase, catalyzes the hydrolysis of the amide bond on the asparagine side chain, releasing the glycan and converting asparagine to aspartic acid. This modification produces a +1 Da increment in the molecular mass of the deglycosylated peptide. When observed in the context of an N-linked consensus sequence (NXS/T, where X is any residue except proline), this mass shift strongly suggested that the peptide was N-glycosylated. Due to technical constraints, lectin enrichment of O-linked glycopeptides was not addressed in these studies, and all further references to glycopeptides pertain to N-linked species unless otherwise stated.
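The nominal +1 Da increment can be checked from elemental compositions. The sketch below computes the monoisotopic residue masses of asparagine and aspartic acid from standard monoisotopic atomic masses and reports the difference, together with the corresponding m/z shift for a given charge state. The dictionary and function names are illustrative choices and not part of the original workflow.

```python
# Standard monoisotopic atomic masses (Da)
MONOISOTOPIC = {"H": 1.00782503, "C": 12.0, "N": 14.00307401, "O": 15.99491462}

# Elemental compositions of the amino-acid residues (free amino acid minus H2O)
ASN_RESIDUE = {"C": 4, "H": 6, "N": 2, "O": 2}
ASP_RESIDUE = {"C": 4, "H": 5, "N": 1, "O": 3}


def monoisotopic_mass(composition):
    """Sum of monoisotopic atomic masses for an elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def deamidation_shift():
    """Mass change (Da) when an Asn residue is converted to Asp."""
    return monoisotopic_mass(ASP_RESIDUE) - monoisotopic_mass(ASN_RESIDUE)


def mz_shift(mass_shift, charge):
    """Observed m/z shift for a given mass shift and charge state."""
    return mass_shift / charge


if __name__ == "__main__":
    delta = deamidation_shift()
    print(f"Asn -> Asp mass shift: {delta:.4f} Da")
    print(f"m/z shift for a 2+ ion: {mz_shift(delta, 2):.4f}")
```

The exact monoisotopic shift is about 0.984 Da, so for a doubly charged precursor the observed m/z shift is about 0.492.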
Initial optimization focused on the PNGase F digestion conditions. Specifically, a wide range of enzyme:substrate ratios and reaction volumes was assessed. The extent of deglycosylation of glycopeptide standards and trypsin-digested MARS 14-depleted plasma was monitored by MS. For the standards, MALDI MS was used to detect the loss of ions corresponding to glycopeptides, and the corresponding increases in signals from the deglycosylated peptides (data not shown). For plasma, ESI-HPLC MS/MS was used to assess the extent of deglycosylation. Over the range of enzyme concentrations tested, 1-20 U/μg of peptide, we observed only a modest increase (max. Δ = 2%) in product formation. Accordingly, we used the highest enzyme concentration that was practical given cost considerations (2.5 U/μg).
Next, we optimized the affinity chromatography methods. First, we focused on buffer A (25 mM Tris HCl, 10 mM CaCl2, 10 mM MgCl2), which was supplemented with increasing amounts of NaCl (0-500 mM) in an effort to identify the maximum salt concentration at which lectin-specific interactions were maintained as determined by the partitioning of lactoferrin and invertase glycopeptides to the bound and flow-through fractions, respectively. Perturbation of lactoferrin binding was first observed at 100 mM NaCl and was exacerbated at higher salt concentrations. Therefore 50 mM was chosen as the working concentration. We also optimized the column wash step, i.e., the elution time between collection of the flow-through and bound fractions with the goal of maximizing recovery of bound glycopeptides, as determined by LC MS/MS, and to increase the reproducibility of the elution profiles. Initially, we used a high-flow, high-pressure wash step (2 mL/min at 1600 psi) of approximately 40 column volumes. However, under these conditions, glycopeptides eluted from the column before application of the elution buffer, indicating that some glycan-mediated interactions did not withstand extensive washing. Furthermore, the UV traces of the wash were highly variable. Therefore, we significantly modified this step by reducing both the flow rate and the volume. Ultimately, we used 0.5 column volumes under low-flow and low-pressure conditions (50 μL/min at 30 psi), which preserved glycopeptide binding and stabilized the UV-monitored elution profile.
As a last step, we optimized the sample load with the goal of maximizing the number of glycopeptides recovered in the bound fraction. We tested a range (10-200 μL) of plasma volumes (as defined in Experimental Procedures) corresponding to ~50-1000 μg of peptides, and determined the number of glycopeptides and the percent glycopeptide enrichment (total glycopeptides/total peptides) in the flow-through and bound fractions (Table 1). Interestingly, the results differed between AAL and SNA, perhaps reflecting the relative availability of their target carbohydrate modifications among human plasma glycoproteins, as well as the significant structural differences in the lectin molecules themselves. For example, at low peptide loads, SNA selected a large number of glycopeptides (>150) at a high percent enrichment (>50%), while AAL produced a similar level of enrichment (>40%), but bound very few glyco- and non-glycopeptides. In both cases, applying increasing amounts of starting material to the column enabled the identification of greater numbers of bound glycopeptides. However, SNA-specific glycopeptide binding plateaued at 500 μg of sample, whereas the number of glycopeptides that bound to AAL continued increasing through the highest amount tested (i.e., 1 mg).
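The enrichment metric used above (total glycopeptides/total peptides, expressed as a percentage) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the example counts are invented for demonstration and are not values from Table 1.

```python
def percent_glycopeptide_enrichment(n_glycopeptides: int, n_peptides: int) -> float:
    """Percent enrichment = 100 * glycopeptides / total peptides in one fraction."""
    if n_peptides <= 0:
        raise ValueError("total peptide count must be positive")
    if not 0 <= n_glycopeptides <= n_peptides:
        raise ValueError("glycopeptide count must lie between 0 and the total peptide count")
    return 100.0 * n_glycopeptides / n_peptides


# Hypothetical counts for a bound fraction (illustration only, not from Table 1).
example_bound = percent_glycopeptide_enrichment(150, 300)
```

Because the denominator counts every identified peptide in the fraction, the value drops whenever non-glycosylated peptides are co-retained, which is one way to read the decline in enrichment at higher sample loads.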
For both lectins, the percent enrichment decreased as more sample was applied to the column. The flow-through fractions also reflected differences in lectin binding. At low peptide concentrations, fewer glycopeptides were detected in the unbound SNA fraction, as determined by the absolute glycopeptide numbers and percent enrichment. As the peptide load increased, the glycopeptide numbers and enrichment levels became more similar for the two lectins. As the goal of this workflow was to enable robust and comprehensive biomarker discovery efforts, we elected to sacrifice specificity of glycopeptide enrichment in favor of increasing the total number of bound glycopeptides. Accordingly, 200 μL of plasma (~1 mg of peptides) was chosen as the final volume of starting material. The protocol was provided online as a video demonstration intended to enable viewers to develop their own lectin chromatography systems. However, the lectin affinity datasets, specific results and conclusions are exclusively described in this paper.
We evaluated performance of the optimized lectin affinity step by separating MARS 14-depleted, trypsin-digested plasma from a pool of healthy donors. Numerous replicates were performed to assess the reproducibility of the protocol. Specifically, sample separation on AAL was repeated 14 times, and on SNA, 10 times; each bound fraction was subjected to either duplicate or triplicate LC-MS/MS analyses. The entire dataset for flow-through and bound fractions is presented in Supplementary Tables 2 and 3. This includes all peptide sequences (± modifications) with confidence scores and retention times. In addition, for both fractions of each experimental replicate, the table summarizes the number of peptides, the number of glycopeptides, and the % glycopeptide enrichment. Please note that glycopeptides containing NGS/T in the N-linked consensus sequence were excluded from the list, as asparagines neighboring glycines are easily deamidated during trypsin digestion. Additionally, we noted 4 putative glycopeptides containing the sequence NXC, which has recently been proposed as an alternative N-linked consensus sequence. MS/MS spectra for these unusual glycosites are included in the supplement; however, they were excluded from the list of glycosites observed. For easy reference, the glycoproteins and their glycosites are summarized at the bottom of each table. Spectra documenting each of the observed glycopeptides are included as Supplementary Figure 2. Finally, separate spreadsheets listing exclusively the observed glycopeptides for AAL and SNA chromatography, and the excluded NGS/NGT-containing peptides, are supplied as Supplementary Tables 4 and 5.
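The sequence filter described above can be sketched as follows. The sketch assumes the canonical N-linked consensus sequence N-X-S/T, where X is any residue except proline, and flags NGS/NGT motifs for exclusion. The function name and the example peptides are hypothetical, and NXC motifs are deliberately not handled here.

```python
import re

# Canonical N-linked sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead keeps the match to the Asn itself so overlapping sequons are found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def classify_sequons(peptide: str):
    """Return (asn_position, motif, keep) for every N-X-S/T sequon in a peptide.

    asn_position is 1-based within the peptide. keep is False for NGS/NGT,
    mirroring the conservative exclusion of Asn-Gly sites that may deamidate
    without ever having carried a glycan.
    """
    hits = []
    for match in SEQUON.finditer(peptide):
        start = match.start()
        motif = peptide[start:start + 3]
        keep = motif[1] != "G"
        hits.append((start + 1, motif, keep))
    return hits


# Hypothetical peptides for illustration only.
retained = classify_sequons("LNVSR")
excluded = classify_sequons("AANGTK")
no_sequon = classify_sequons("NPSK")
```

In practice, such a filter would be applied only to peptides in which the Asn was also observed as deamidated; the sequence check alone does not show that a site was occupied.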
Key elements of the data are summarized as follows. Overall, we used two binary gradient elution profiles (2 h or 3 h) for the reversed-phase separation prior to MS/MS. The total number of peptides identified in each run varied accordingly—563-842 and 1035-1726 for the shorter and longer times, respectively. With respect to distinct glycosites, 48-69 were observed in the flow-through fraction, 53-128 bound to AAL, and 117-145 interacted with SNA. The percent glycopeptide/total peptide values were 4-5% in the flow-through fraction for each lectin, increasing to 8-15% in the AAL bound portion, and 11-18% in the subset that interacted with SNA. In total, we identified 119 human plasma glycoproteins, containing 227 distinct glycosites, of which 62 (52%) bound both lectins; 37 (31%) bound to AAL and 20 (17%) interacted with SNA (Fig. 3A). The glycosites demonstrated a slightly different distribution; 117 (51%) bound to both lectins, while AAL and SNA each bound 55 (24%), possible evidence of heterogeneity of glycan placement along the protein backbone (Fig. 3B). In support of this, we observed that 22 glycoproteins contained multiple glycosites that were differentially identified by the two lectins. For example, complement C2 contained 3 glycosites observed in this study—N29, N621 and N651. While the first glycosite was captured by both lectins, N621 was detected only with SNA, and N651 only with AAL (Table 2).
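The partitioning summarized above amounts to simple set operations, sketched here in Python. The function name is hypothetical. The complement C2 glycosites and the glycoprotein counts (62, 37 and 20) are taken from the text; everything else is illustrative.

```python
def partition_by_lectin(aal_ids, sna_ids):
    """Split identifiers into those seen with both lectins, AAL only, or SNA only."""
    aal, sna = set(aal_ids), set(sna_ids)
    return {"both": aal & sna, "AAL only": aal - sna, "SNA only": sna - aal}


# Complement C2: N29 was captured by both lectins, N621 only by SNA, N651 only by AAL.
c2 = partition_by_lectin({"N29", "N651"}, {"N29", "N621"})

# Glycoprotein counts reported in the text, converted to rounded percentages.
protein_counts = {"both": 62, "AAL only": 37, "SNA only": 20}
total_proteins = sum(protein_counts.values())
protein_shares = {k: round(100 * n / total_proteins) for k, n in protein_counts.items()}
```

Applying the same partition per protein, as with complement C2, picks out glycoproteins whose individual sites were selected by different lectins.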
As to the reproducibility of this workflow, similar glycosites were identified in replicate lectin chromatography experiments as determined by two measures—the percentage of total glycosites observed in a given number of experiments (Fig. 4A), and pairwise repeatability, or the percentage of glycosites present in two replicate datasets (Fig. 4B). With respect to the former, fully 35% of all glycosites detected in the AAL workflow were found in the bound fractions of all 14 experiments, and >65% were observed in at least half. SNA chromatography was even more robust, with 60% of all glycosites observed in all 10 experiments, and close to 90% observed in 5 or more. With respect to the latter, the median pairwise repeatability was 73% for AAL and 80% for SNA.
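Both reproducibility measures can be sketched in Python from per-replicate sets of glycosite identifiers. The text does not give an exact formula for pairwise repeatability, so the sketch assumes one: the number of glycosites shared by two replicates divided by the number found in either. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import median


def fraction_seen_in_at_least(replicates, k):
    """Fraction of all glycosites (union over replicates) found in >= k replicates."""
    all_sites = set().union(*replicates)
    if not all_sites:
        return 0.0
    hits = sum(1 for site in all_sites if sum(site in rep for rep in replicates) >= k)
    return hits / len(all_sites)


def median_pairwise_repeatability(replicates):
    """Median over replicate pairs of |A & B| / |A | B| (assumed definition)."""
    scores = [len(a & b) / len(a | b) for a, b in combinations(replicates, 2) if a | b]
    if not scores:
        raise ValueError("need at least two non-empty replicates")
    return median(scores)


# Toy data: three replicates with hypothetical glycosite labels.
toy = [{"s1", "s2", "s3"}, {"s1", "s2"}, {"s1", "s3", "s4"}]
in_all = fraction_seen_in_at_least(toy, 3)
in_two_or_more = fraction_seen_in_at_least(toy, 2)
pairwise = median_pairwise_repeatability(toy)
```

Other overlap definitions (for example, shared sites divided by the mean number of sites per replicate) would give somewhat different values, so the denominator should be stated when comparing against the percentages reported above.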
All plasma glycoproteins and glycosites bound by AAL and SNA are listed in Table 2, along with information (where available) regarding the extent of protein glycosylation and published concentration ranges in human plasma or serum. We were particularly interested in the latter values, as the major goal of this work was to employ lectin chromatography followed by MS to capture and identify low-abundance glycopeptides from biological samples. Two measures suggest we met this objective. First, of the 119 identified glycoproteins, 55 (46%) were not counted among the top 150 most-abundant plasma proteins, which are present at concentrations greater than 600-900 ng/mL. Second, at lower abundances, we detected, with high confidence, proteins that circulate at levels that are comparable to those that are currently used as biomarkers (Fig. 5). These included cadherin-5, which is found at 0.3-4.9 ng/mL, neutrophil gelatinase-associated lipocalin with levels estimated at ~2.5 ng/mL, and hepatocyte growth factor activator, which is present at 3-6 ng/mL [45-47].
Although lectin chromatography had been previously used as a method for glycoprotein/glycopeptide enrichment [48; 49], five criteria, chosen to broaden and strengthen the utility of this protocol, shaped our approach. First, our goal was to develop a robust, easily adaptable method that would be within the financial and technical reach of most laboratories engaged in biomarker discovery. Second, we focused on biologically relevant species, i.e., carbohydrate modifications that were associated with cancer. Third, to establish lectin specificity, we included positive and negative glycopeptide standards. Fourth, we used relatively small plasma volumes that are more easily accessible from banked collections of high-quality patient and control samples. Fifth, and most importantly, we sought to develop a method that could access low-abundance glycoproteins in a complex mixture (i.e., human plasma) for the purpose of biomarker discovery.
We accomplished these goals by optimizing specific components of the workflow. The final protocol required a minimum number of procedures and no specialized equipment beyond that found in a typical protein research laboratory that uses standard chromatographic separation strategies and has access to routine biological MS technology. The lectins chosen for initial studies, AAL and SNA, recognize fucose- and sialic acid-containing carbohydrates, respectively; they were selected because expression of the glycosyltransferases that add these terminal residues is frequently upregulated in cancer cells. We used glycopeptide standards as quality control measures to monitor the performance of the lectin chromatography step. For this purpose, we used fucosylated and sialylated N-linked lactoferrin peptides, which met the structural requirements for lectin binding, and high-mannose species from invertase, which did not. In parallel, numerous parameters were assessed to develop a robust and reproducible protocol that uses relatively small plasma volumes, an important issue given the paucity of well-annotated patient and control specimen banks. Finally, using established MS platforms, our method enabled routine detection of numerous low-abundance (ng/mL) plasma glycoproteins from a pooled sample of plasmas obtained from healthy donors. Thus, by optimizing each step of the workflow, we developed a method that can be used to exploit relatively underappreciated features of cancer-related post-translational modifications, such as disease-related structural differences in glycosylation and their attachment sites in terms of the tryptic peptide sequences.
The sensitivity and specificity inherent in these data open a largely unexplored biomarker discovery space that commonly used methods (e.g., lectin chromatography at the glycoprotein level, and hydrazide- or boronic acid-mediated chemical capture of glycoproteins/glycopeptides) do not typically access (Table 3). Specifically, our ability to study oligosaccharide changes at the glycosite level provides a powerful means to assess subtle disease-related modifications in carbohydrate structure and placement along the protein backbone. Accordingly, this study identified 23 glycoproteins from healthy pooled human plasma in which distinct glycosites were detected using different lectins (Table 2). We anticipate that this precision will make our workflow highly responsive to cancer-related variations in oligosaccharide structure.
As we work at the glycopeptide level, the % glycopeptide enrichment in our workflows cannot be directly compared with that of lectin chromatography performed at the glycoprotein level. The latter methods commonly use both peptides and glycopeptides to identify enriched proteins. As a majority of human plasma proteins are glycosylated, both specifically and non-specifically bound species are likely to be derived from glycoproteins, with the latter subset artificially contributing to the reported enrichment. By contrast, our workflow set a very high bar for the identification of a lectin-bound glycoprotein: we required that the mass spectrometric data specifically identify the glycosite itself (as a deamidated asparagine after PNGase F treatment). Importantly, the MS/MS spectra for each glycosite-containing peptide were manually inspected as described in the Experimental Procedures (excluding NGS/T sequences as a conservative measure).
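The mass signature behind this criterion can be illustrated briefly. PNGase F converts the formerly glycosylated Asn to Asp, which raises the monoisotopic peptide mass by about 0.984 Da per site. The Python sketch below checks whether an observed peptide mass is consistent with that shift. The function name, the 10 ppm tolerance and the example masses are assumptions chosen for illustration; they are not parameters from the Experimental Procedures.

```python
# Monoisotopic mass difference for Asn -> Asp conversion (deamidation), in Da.
DEAMIDATION_SHIFT_DA = 0.984016


def matches_deamidation(unmodified_mass, observed_mass, n_sites=1, tol_ppm=10.0):
    """True if observed_mass equals unmodified_mass plus n_sites deamidations.

    tol_ppm is a hypothetical tolerance; choose it to suit the instrument used.
    """
    expected = unmodified_mass + n_sites * DEAMIDATION_SHIFT_DA
    error_ppm = abs(observed_mass - expected) / expected * 1e6
    return error_ppm <= tol_ppm


# Hypothetical masses for illustration only.
shifted = matches_deamidation(1000.0, 1000.984016)
unshifted = matches_deamidation(1000.0, 1000.0)
```

A mass match of this kind is necessary but not sufficient: the same shift arises from spontaneous deamidation, which is why NGS/T sites were excluded and the spectra were inspected manually.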
During method optimization, we found that AAL and SNA behaved differently with respect to glycopeptide capture from a given sample volume. Specifically, SNA bound more glycopeptides from a smaller sample volume than did AAL. This effect likely reflected the greater abundance of sialylated as compared to fucosylated species in human plasma, although other factors such as differences in carbohydrate binding kinetics between these two lectins could also play a role. As the amount of sample loaded onto the column was increased, the number of glycopeptides identified by both lectins rose, but the binding specificity decreased. Because reaching the biologically-relevant glycoproteome was our primary goal, we pursued the method that allowed deeper glycopeptide coverage. This was a feasible strategy as our glycopeptide standards confirmed that the columns were indeed selecting fucosylated and sialylated species, and the presence of an N-linked consensus sequence deglycosylated at the Asn residue allowed facile detection of deglycosylated peptides. Preliminary experiments in which we assessed the chromatographic efficiency of various column sizes suggested that the observed enrichment effect was related to the peptide content of the sample in relationship to the column bed volume (data not shown). Although we selected final conditions that optimized detection of a maximum number of lectin-bound glycopeptides from a minimal amount of plasma using relatively inexpensive amounts of lectins, it is likely that these parameters could be modified to suit a variety of endpoints. For instance, investigators who are not limited by sample volume or cost might scale up to larger columns to achieve higher yields of enriched glycopeptides.
Thus, in accord with the fact that individual lectin species have unique structural features and distinct saccharide binding motifs, our work shows that the performance of each carbohydrate-binding protein can and should be optimized using the appropriate glycopeptide standards.
We elected to begin these studies with two lectins of complementary glycan specificity, which detected distinct cancer-related carbohydrate modifications. This choice was motivated by the expectation that these lectins would enable identification of distinct, but overlapping, plasma glycopeptide subsets. Accordingly, we anticipated that this approach would allow us to identify greater numbers of low-abundance glycoproteins, thus reaching deeper into the plasma glycoproteome. Indeed, these hypotheses proved to be correct. Of the 119 plasma glycoproteins observed in the bound species, approximately 52% were selected by both lectins, 31% by AAL, and 17% by SNA. Importantly, 90% of the glycoproteins identified using AAL alone and 30% of those observed with SNA alone were low-abundance relative to the previously described set of proteins occupying the upper 4 log concentration range in human plasma. Furthermore, the glycoproteins comprised a total of 227 unique glycosites, 6% of which had not been previously annotated as such, that were selected by lectin chromatography. Of these, 40% were bound by both AAL and SNA, while the remainder were selected by a single lectin. This observation illustrates that the described method can be used to detect glycan heterogeneity at distinct sites along a peptide backbone, in contrast to alternative techniques involving chemical capture or lectin chromatography at the glycoprotein level. Our findings also support the notion that judicious selection of lectins with regard to the biology of the system results in enriched capture of disease- or process-specific glycoproteins. Finally, the overall strategy we developed, including the chromatographic enrichment protocol, is easily adapted for other protein-based affinity selectors (our unpublished observations).
Currently, we are focusing on carbohydrate-specific antibodies for the enrichment of glycosylated species bearing biologically-interesting saccharide motifs as demonstrated by the use of relevant and irrelevant glycopeptide standards.
In summary, this study design can serve as a blueprint for the development of lectin-based workflows for capturing disease-related changes in the glycoproteome, with subsequent MS identification of the modified protein substrates. Several features make glycoproteins especially attractive as biomarkers. First, biology can be used to guide the experimental design, and specific carbohydrate motifs can be targeted in a rational manner that is derived from the expression of the relevant glycosyltransferases in affected cells/tissues. Because glycosylation, like phosphorylation, is dynamic in nature and responsive to numerous biological stimuli, saccharide motifs offer a window for viewing many normal and pathological processes. Second, as multiple saccharide modifications can decorate a single protein scaffold, and as a single glycoform can be appended to numerous protein backbones, carbohydrate-based signals can be greatly amplified relative to a single protein sequence. Third, carbohydrate modifications offer some protection from proteolytic degradation, which raises the possibility that glycosylated molecules may be more biologically stable than their unmodified counterparts. Thus, we are very interested in applying this workflow, and modifications of it that utilize other lectins, to biomarker discovery projects that employ cell lines, tissues, and biological fluids in discovery-phase analyses.
This work was supported by an NCRR shared instrumentation grant S10 RR024615 (to B.W. Gibson) and by grants from the National Cancer Institute, U24 CA126477 (S.J. Fisher, UCSF) and a U24 Subcontract (B.W. Gibson, Buck Institute for Age Research), which are part of the NCI Clinical Proteomic Technologies for Cancer initiative (http://proteomics.cancer.gov).