The extensive glycosylation of HIV-1 envelope proteins (Env), gp120/gp41, is known to play an important role in evasion of host immune response by masking key neutralization epitopes and presenting the Env glycosylation as “self” to the host immune system. The Env glycosylation is mostly conserved but continues to evolve to modulate viral infectivity. Thus, profiling Env glycosylation and distinguishing interclade and intraclade glycosylation variations are necessary components in unraveling the effects of glycosylation on Env’s immunogenicity. Here, we describe a mass spectrometry-based approach to characterize the glycosylation profiles of two rVV-expressed clade C Envs by identifying the glycan motifs on each glycosylation site and determining the degree of glycosylation site occupancy. One Env is a wild-type Env, while the other is a synthetic “consensus” sequence (C.CON). The observed differences in the glycosylation profiles between the two clade C Envs show that C.CON has more unutilized sites and high levels of high mannose glycans; these features mimic the glycosylation profile of a Group M consensus immunogen, CON-S. Our results also reveal a clade-specific glycosylation pattern. Discerning interclade and intraclade glycosylation variations could provide valuable information in understanding the molecular differences among the different HIV-1 clades and in designing new Env-based immunogens.
Among the distinct HIV-1 clades that have expanded worldwide, clade C is currently one of the fastest growing1–3. Indeed, clade C accounts for more than 50% of all global infections, and 94% of clade C HIV/AIDS cases are in sub-Saharan Africa4,5. Clade C viruses in general have unique biological and immunological properties, including ease of transmission6, exclusive CCR5 tropism during early infection7,8, sensitivity to neutralization by sera from clade C donors9,10, and resistance to neutralization by the carbohydrate-binding antibody 2G12 and by the gp41-specific antibody 2F59,11,12. Additionally, clade C Envs have unique molecular features, including a V3 region that is highly conserved compared with clade B11,13, high amino acid sequence variability in the C3 region, a greater degree of amphipathicity of the α-2 helix14, shorter V1–V4 loops associated with greater neutralization sensitivity15, and shorter gp120 in early transmitted viruses10,11. These distinct molecular and immunological features of clade C Env illustrate the relevance of correlating clade-specific structural differences with the Env's immunological properties. Establishing this relationship between clade-specific structural elements and immunological properties could provide fundamental knowledge for the efficacious design of future HIV vaccine candidates.
One important molecular and structural feature of the Env that needs significantly more attention is its N-glycosylation profile. Env glycosylation is fundamental to every aspect of HIV biology, from proper Env folding and processing16,17 to virus transmission10,18–20 and immune evasion21–23. There are at least 24 potential N-linked glycosylation sites on a given Env, which can be populated with high mannose, hybrid, or a highly diverse array of complex glycans24–28. Site utilization is known to vary within isolates and across clades, allowing the overall glycosylation to evolve so as to maintain the glycan shield and facilitate viral transmission and dissemination23,29,30. Thus, systematic analysis of Env glycosylation profiles could provide valuable insights into the design of good Env immunogens.
An important step toward a systematic analysis of the glycosylation profiles of Env immunogens is to identify and compare distinct intraclade and interclade profiles. Characterizing the glycan patterns in detail and determining the extent of occupancy of potential N-glycosylation (PNG) sites are critical steps in elucidating glycosylation trends that influence the immunogenic and antigenic properties of Envs. Given the importance of glycosylation in HIV pathogenesis, the molecular insights provided by glycosylation profiling are potentially useful in HIV vaccine development.
Recently, we showed by a glycopeptide-based mass mapping approach that Env glycosylation correlates with immunological response27,28. We evaluated two recombinant Env immunogens, one derived from the Group M HIV-1 consensus (CON-S ΔCFI gp140) and one derived from a clade B primary isolate (JR-FL ΔCF gp140), and found that the better immunogen (CON-S) has predominantly high mannose glycans populating the region surrounding the immunodominant V3 region and more unutilized glycosylation sites throughout the protein. While the CON-S glycosylation profile provides one example of the glycosylation profile of a good Env immunogen, several questions remain: 1) Is the glycosylation profile displayed by CON-S a common pattern for all consensus Env immunogens? 2) How do the glycosylation profiles of Env immunogens vary within and between clades? To address these questions, it is necessary to examine the glycosylation patterns of Env immunogens derived from other clades, including clade consensus sequences, and to determine intraclade and interclade glycosylation trends.
Toward this goal, we characterized the glycosylation profiles of two clade C recombinant Envs using the same glycopeptide-based mass mapping approach described in our previous studies27,28. The clade C Envs are derived from immunogens expressing a clade C consensus sequence, C.CON, and a clade C primary isolate sequence, C.97ZA012. Comparison of their glycosylation profiles reveals a characteristic pattern: the consensus protein, C.CON, has a higher level of high mannose glycans and more unutilized glycosylation sites than C.97ZA012. Our analysis also shows that C.CON and C.97ZA012 share a common glycosylation pattern in the C2 and C3 regions, where predominantly high mannose glycans are observed. This glycosylation profile could influence the local Env conformation in these regions and may help explain the increased infectivity of clade C viruses. Comparing the glycosylation profiles of Envs from the same clade thus provides information on intraclade glycosylation variations that help to modulate intrinsic clade-specific molecular, structural, and immunological properties.
Ammonium bicarbonate, Trizma® hydrochloride, Trizma® base, ethylenediaminetetraacetic acid (EDTA), acetic acid, HPLC grade acetonitrile (CH3CN) and methanol (CH3OH), 2,5-dihydroxybenzoic acid (DHB), urea, α-cyano-4-hydroxycinnamic acid (α-CHCA), iodoacetamide (IAA), dithiothreitol (DTT), and formic acid were purchased from Sigma (St. Louis, MO). Water was purified using a Millipore Direct-Q3 Water Purification System (Billerica, MA). Sequencing grade trypsin (Tp), proteomics grade N-Glycosidase F (PNGase F) from Elizabethkingia meningosepticum, and glycerol-free PNGase F from Flavobacterium meningosepticum were obtained from Promega (Madison, WI), Sigma (St. Louis, MO), and New England BioLabs (Ipswich, MA), respectively.
C.CON and C.97ZA012 envelope proteins were obtained from the Duke Human Vaccine Research Institute in Durham, NC. These proteins were constructed with internal deletions, which markedly improve oligomerization and immunogenicity31. Both Envs were expressed and purified as described in the literature32,33. Briefly, recombinant vaccinia viruses (rVVs) expressing C.CON gp140 ΔCF and C.97ZA012 gp140 ΔCFI genes were used to produce soluble Envs34,35. For batch production, Envs were produced by infecting 293T cells with rVVs. The 293T cells were cultured in Dulbecco's modified Eagle medium (DMEM; Invitrogen Corp, Carlsbad, CA) supplemented with 10% fetal calf serum in T150 tissue culture flasks and were grown to confluence before infection with rVVs at a multiplicity of infection (MOI) of about one. At two hours postinfection, the cells were washed with serum-free DMEM, and the infection was allowed to proceed for 72 hours. Recombinant Envs were then purified from supernatants of rVV-infected 293T cell cultures by Galanthus nivalis lectin-agarose (Vector Labs, Burlingame, CA) column chromatography and stored at −70°C until use. A typical batch production of clade C Envs (using 30 T150 flasks) yielded ~600–800 µg of protein. Protein concentration was determined by absorbance. Purified recombinant envelope proteins were concentrated for MS-based glycosylation analysis.
Details of protein digestion have been described elsewhere27. Briefly, samples containing 200 µg of HIV-1 Env at a protein concentration >4 mg/mL were denatured with 6 M urea in 100 mM Tris buffer (pH 8.5) containing 3 mM EDTA. The proteins were reduced with 10 mM DTT and alkylated with 15 mM IAA at room temperature, and were digested overnight at 37°C with trypsin at a protein:enzyme ratio of 30:1 (w/w), followed by a second trypsin digestion under the same conditions. The resulting digest was subjected either to off-line reversed-phase high performance liquid chromatography (RP-HPLC) fractionation for MALDI MS analysis or to RP-HPLC/ESI-FTICR MS analysis27,36. To ensure the reproducibility and reliability of the method, protein digestion was performed three times on different days with Env samples from the same batch, and each digest was analyzed with the same experimental procedure. In addition, Env samples from two different batches with different protein concentrations were digested and analyzed to determine lot-to-lot variations in glycosylation profile.
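The quantities in this protocol can be checked with simple arithmetic. The sketch below is a hypothetical helper, not part of the authors' workflow: it derives the sample volume implied by a given protein mass and concentration, and the trypsin mass for a 30:1 (w/w) protein:enzyme ratio, assuming that the second digestion adds the same amount of trypsin as the first.

```python
def digestion_plan(protein_ug: float, conc_mg_per_ml: float,
                   protein_to_enzyme: float = 30.0, rounds: int = 2) -> dict:
    """Hypothetical helper: sample volume and trypsin mass for an in-solution digest.

    Assumes each of `rounds` digestions adds trypsin at the same w/w ratio.
    """
    if conc_mg_per_ml <= 0 or protein_to_enzyme <= 0:
        raise ValueError("concentration and ratio must be positive")
    # 1 mg/mL equals 1 ug/uL, so ug divided by mg/mL gives uL directly.
    volume_ul = protein_ug / conc_mg_per_ml
    trypsin_per_round_ug = protein_ug / protein_to_enzyme
    return {
        "sample_volume_ul": volume_ul,
        "trypsin_per_round_ug": trypsin_per_round_ug,
        "trypsin_total_ug": trypsin_per_round_ug * rounds,
    }


# 200 ug of Env at the 4 mg/mL lower limit.
print(digestion_plan(200, 4.0))
```

At the stated lower concentration limit, 200 µg corresponds to at most 50 µL of sample and about 6.7 µg of trypsin per digestion round.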
The Env deglycosylation experiment for MALDI MS analysis was performed as described elsewhere27. Briefly, glycopeptide-enriched fractions were collected by HPLC as described above, and the glycans were released by adding 12 µL of 20 mM NH4HCO3 (pH 8.5) and 4 µL of diluted PNGase F solution (500 units/mL). The reaction was incubated overnight at 37°C and stopped by heating the sample to 100°C. The resulting solution was then analyzed by MALDI MS. The deglycosylation experiment for LC/ESI-FTICR MS analysis was performed by incubating ~50 µg of Env with 1 µL of PNGase F solution (≥4500 units/mL) for one week at 37°C. The deglycosylated Env was then digested with trypsin at 37°C using the digestion procedure described above, and the resulting tryptic digest was analyzed by LC/ESI-FTICR MS.
MALDI MS and MS/MS experiments were performed on an Applied Biosystems 4700 Proteomics Analyzer (Foster City, CA) operated in positive ion mode. Samples were prepared by mixing equal volumes (1 µL each) of the analyte and matrix solutions in a microcentrifuge tube, immediately depositing the mixture on a MALDI plate, and allowing it to air-dry. The matrix was a 1:1 mixture of 10 mg/mL α-CHCA and 10 mg/mL DHB. Samples were irradiated with an Nd:YAG laser (355 nm) operated at 200 Hz. High resolution mass spectra were acquired in reflectron mode, and each spectrum was generated by averaging 3000 laser shots, accumulated as 60 shots at each of 50 different locations within the MALDI spot. The laser intensity was optimized for adequate signal-to-noise (S/N) ratio and resolution for each sample. MALDI MS/MS data were acquired using a collision energy of 1 kV with air as the collision gas.
LC/ESI-FTICR MS and MS/MS experiments were performed on a hybrid linear ion trap Fourier transform ion cyclotron resonance mass spectrometer (LTQ-FT, Thermo Scientific, San Jose, CA) directly coupled to a Dionex UltiMate capillary LC system (Sunnyvale, CA) equipped with a FAMOS well plate autosampler. The mobile phases were solvent A, 99.9% deionized H2O + 0.1% formic acid, and solvent B, 99.9% CH3CN + 0.1% formic acid. Five microliters of sample were injected onto a C18 PepMap™ 300 column (300 µm i.d. × 15 cm, 300 Å, LC Packings, Sunnyvale, CA) at a flow rate of 5 µL/min. The following multistep CH3CN/H2O gradient was used: 5% B for 5 min, followed by a linear increase to 40% B over 50 min and a linear increase to 90% B over 10 min. The column was held at 95% B for 10 min before re-equilibration. A short wash and a blank run were performed between samples to prevent sample carry-over. The ESI source was operated under the following conditions: source voltage, 2.8 kV; capillary temperature, 200°C; and capillary voltage, 46 V. Data were collected in data-dependent mode, in which the five most intense ions in each FT survey scan were sequentially and dynamically selected for collision-induced dissociation (CID) in the LTQ linear ion trap using a normalized collision energy of 30% and a 3-min dynamic exclusion window.
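The data-dependent acquisition described above (the five most intense precursors per FT survey scan, with a 3-minute dynamic exclusion window) can be illustrated with a simplified model. This is a sketch, not the LTQ-FT acquisition logic: the m/z tolerance used to decide whether two precursors are "the same" ion is a hypothetical value not given in the text, and charge-state screening and other instrument rules are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicExclusion:
    window_min: float = 3.0   # exclusion duration stated in the text
    mz_tol: float = 0.01      # hypothetical m/z match tolerance (not stated)
    entries: list = field(default_factory=list)  # (mz, expiry_time) pairs

    def is_excluded(self, mz: float, t: float) -> bool:
        # Forget entries whose exclusion window has expired, then test the rest.
        self.entries = [(m, exp) for m, exp in self.entries if exp > t]
        return any(abs(m - mz) <= self.mz_tol for m, _ in self.entries)

    def add(self, mz: float, t: float) -> None:
        self.entries.append((mz, t + self.window_min))


def select_precursors(peaks, t: float, excl: DynamicExclusion, top_n: int = 5):
    """Choose up to top_n of the most intense, non-excluded precursors in a survey scan.

    peaks: iterable of (mz, intensity) pairs; t: retention time in minutes.
    """
    chosen = []
    for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if len(chosen) == top_n:
            break
        if not excl.is_excluded(mz, t):
            chosen.append(mz)
            excl.add(mz, t)  # excluded for the next window_min minutes
    return chosen
```

With six precursors present in consecutive scans, the first scan fragments the five most intense, a scan one minute later fragments only the sixth, and the first five become eligible again once their 3-minute windows expire. This is how dynamic exclusion lets lower-abundance glycopeptide ions reach MS/MS.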
Compositions of glycopeptides with a single utilized glycosylation site were elucidated using the web-based tools GlycoPep DB37 and GlycoPep ID38. Details of the glycopeptide analysis have been described previously27,37,38. Briefly, compositional analysis was performed by first identifying the peptide portion of a glycopeptide of interest from MS and MS/MS spectra generated by MALDI MS and LC/ESI-FTICR MS. Peptide portions were identified with GlycoPep ID from characteristic signature fragment ions in the MS/MS data: the 0,2X ion, produced by cross-ring cleavage, or the Y1 ion, produced by glycosidic cleavage. Once the peptide portion is identified, plausible glycopeptide compositions are obtained from the MS data using GlycoPep DB. For glycopeptides with multiple utilized glycosylation sites, experimental masses of singly charged glycopeptide ions from the MS data were submitted to GlycoMod39. This program compares the experimental mass values entered by the user with theoretical mass values and generates a list of plausible glycopeptide compositions within a specified mass error. The GlycoMod input consisted of the mass of the singly charged glycopeptide ion, the enzyme, the protein sequence, the cysteine modification, the mass tolerance, and the possible types of glycans present in the glycopeptide. Plausible glycopeptide compositions were then manually confirmed and validated using MS/MS data.
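For illustration, the compositional search can be approximated by brute-force enumeration of monosaccharide counts whose summed residue mass matches the difference between the observed glycopeptide [M+H]+ and the peptide [M+H]+. The sketch below is not GlycoMod's implementation; the search ranges, the 0.05 Da tolerance, and the requirement of one Hex3HexNAc2 core per occupied site are assumptions.

```python
from itertools import product

# Monoisotopic residue masses (Da) of the monosaccharides named in the text.
RESIDUE = {"Hex": 162.05282, "HexNAc": 203.07937, "Fuc": 146.05791, "NeuNAc": 291.09542}


def candidate_compositions(observed_mh: float, peptide_mh: float, sites: int = 1,
                           tol: float = 0.05, max_counts=(12, 8, 3, 4)):
    """Enumerate (Hex, HexNAc, Fuc, NeuNAc) counts that explain the glycan mass.

    glycan mass = observed glycopeptide [M+H]+ - peptide [M+H]+ (both singly charged).
    Assumes every occupied site carries at least the Hex3HexNAc2 core.
    Returns ((hex, hexnac, fuc, neunac), error_da) tuples, best match first.
    """
    glycan_mass = observed_mh - peptide_mh
    hits = []
    for h, n, f, s in product(*(range(m + 1) for m in max_counts)):
        if h < 3 * sites or n < 2 * sites:
            continue  # not enough residues for the required N-glycan core(s)
        mass = (h * RESIDUE["Hex"] + n * RESIDUE["HexNAc"]
                + f * RESIDUE["Fuc"] + s * RESIDUE["NeuNAc"])
        err = mass - glycan_mass
        if abs(err) <= tol:
            hits.append(((h, n, f, s), round(err, 4)))
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

For a hypothetical peptide with [M+H]+ 1000.0 observed with an added Hex5HexNAc2 mass, the best match is (5, 2, 0, 0). For larger glycans or wider tolerances several compositions can fit, which is why plausible compositions still need MS/MS confirmation.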
Non-glycosylated peptides were identified by searching raw MS/MS data acquired on the hybrid LTQ-FTICR mass spectrometer against a custom HIV database of 107 protein entries obtained from the Los Alamos HIV sequence database (http://www.hiv.lanl.gov/content) using Mascot (Matrix Science, London, UK, version 2.2.04). Peak lists were extracted from the raw files using BioWorksBrowser (Thermo Electron Corporation, version 3.5). DTA files were searched with the following parameters: (a) enzyme, trypsin; (b) missed cleavages, 2; (c) fixed modification, carbamidomethylation; (d) variable modifications, methionine oxidation and carbamylation; (e) peptide tolerance, 0.8 Da; and (f) MS/MS tolerance, 0.4 Da. Peptides identified by the Mascot search were manually validated from the MS/MS spectra to ensure that major fragment ions (b and y ions) were observed, especially for peptides from PNGase F-treated Envs that may contain N to D conversions.
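The "enzyme: trypsin" and "missed cleavages: 2" settings determine which candidate peptides a search engine considers. A minimal sketch of that in-silico digestion is shown below. It uses the common rule of cleaving after K or R unless the next residue is P. It illustrates the rule, not Mascot's internal code, and it ignores modifications and mass filtering.

```python
import re

# Cleavage site: zero-width position after K or R that is not followed by P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(seq: str, missed_cleavages: int = 2) -> list:
    """In-silico trypsin digest allowing up to `missed_cleavages` missed sites."""
    # Drop the empty string produced when the sequence ends in K or R.
    fragments = [f for f in TRYPSIN_SITE.split(seq) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join fragment i with up to `missed_cleavages` following fragments.
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


print(tryptic_peptides("PEPTIDEKAAARPGGKLL", 0))
```

For the hypothetical sequence above, the fully cleaved products are PEPTIDEK, AAARPGGK, and LL. No cleavage occurs at the R followed by P. Allowing two missed cleavages adds the three longer joined peptides.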
We evaluated the glycosylation profiles of two clade C Env immunogens. One is a synthetic Env sequence, C.CON, generated from the alignment of HIV-1 clade C gene sequences available in the 2003 Los Alamos HIV-1 Database; the other is a primary isolate Env, C.97ZA012, from an HIV-1 strain from South Africa12. Both Envs were constructed with shortened variable loops (V1–V5). C.CON carries deletions of the cleavage site (C) and fusion domain (F) (ΔCF), whereas C.97ZA012 carries deletions of the cleavage site (C), the fusion domain (F), and the immunodominant region (I) in the transmembrane region (ΔCFI)32,33. The two clade C Envs are therefore designated C.CON gp140ΔCF and C.97ZA012 gp140ΔCFI. Notably, immunology data showed no apparent difference between gp140ΔCF and gp140ΔCFI constructs in eliciting antibody responses33. The full sequence alignment of C.CON gp140ΔCF and C.97ZA012 gp140ΔCFI, with potential N-linked glycosylation (PNG) sites in red, is shown in Figure 1. For consistency, amino acid positions are numbered according to the reference strain HXB2 (SwissProt accession number P04578). The protein sequences of the two Envs differ by 17%, as determined with the sequence alignment tool ClustalX40. There are 29 PNG sites in C.CON gp140ΔCF and 24 in C.97ZA012 gp140ΔCFI. Between the two clade C Envs, 22 PNG sites are highly conserved and nine are not conserved; the latter are shown in green boxes (Figure 1). The conserved PNG sites are spread throughout the Env sequence: eight are located in the hypervariable regions, V1 (N133, N139, and N156), V2 (N160 and N187), V3 (N301), and V4 (N386 and N397), and 14 are located in the conserved regions, C1 (N88), C2 (N197, N230, N241, N262, N276, and N289), C3 (N332, N339, and N356), and C4 (N442 and N448), and in the transmembrane region (N625 and N637).
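PNG sites such as those counted here are conventionally located by scanning for the N-X-S/T sequon, where X is any residue except proline. The sketch below returns 1-based positions in the raw input sequence. Mapping these positions onto HXB2 numbering, as done in Figure 1, requires an alignment to the reference and is not attempted here. The test sequences are hypothetical fragments, not Env sequences.

```python
import re

# N-X-S/T sequon with X != P. The lookahead makes matches zero-width,
# so overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def png_sites(seq: str) -> list:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


print(png_sites("ANGTANPSNKS"))
```

In the example, the Asn at position 6 is skipped because it is followed by proline. The lookahead is used so that adjacent sequons sharing residues are not missed.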
The PNG sites that are not conserved (green boxes in Figure 1) are located in the C1, V2, C2, V4, V5, and transmembrane regions. For simplicity, C.CON gp140ΔCF and C.97ZA012 gp140ΔCFI are referred to as C.CON and wild-type C from here on. Data presented in the following sections are from samples of the same batch that were processed and analyzed three times, and these analyses yielded reproducible glycosylation profiles.
To determine the glycan motifs on each glycosylation site and to identify unutilized glycosylation sites, the Envs were denatured, reduced, and alkylated with IAA before in-solution tryptic digestion. The resulting digest was divided into two equal aliquots, one analyzed by MALDI MS in both linear and reflectron modes and the other by LC/ESI-FTICR MS. Before mass analysis, both aliquots were subjected to reversed-phase HPLC to separate glycopeptides from peptides. As in our previous studies27,28, glycopeptide compositions and glycosylation site occupancy were unambiguously identified from tandem MS analysis and deglycosylation experiments in combination with the web-based tools GlycoPep DB, GlycoPep ID, and GlycoMod. Complete (100%) glycosylation site coverage was obtained for both C.CON and wild-type C. Table 1 summarizes the glycosylation site coverage, listing the identified tryptic peptides bearing single and multiple PNG sites (NXT/S) and their respective glycosylation site occupancy. Table 1 shows that C.CON has a lower degree of glycosylation site occupancy than wild-type C; the affected sites are either fully or variably unutilized. C.CON differs from wild-type C in site occupancy in the following regions: the end of the V2 loop, the V4 loop, and the transmembrane region.
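The occupancy logic used with the deglycosylation experiments can be sketched as follows. PNGase F converts a glycan-bearing Asn to Asp (+0.98402 Da, monoisotopic), so the number of converted sequons in a deglycosylated peptide can be read from its mass, and a site observed in both forms is variably unutilized. The tolerance is a hypothetical value chosen small enough to separate the N to D shift from the 13C isotope spacing (1.00336 Da). Because Asn can also deamidate non-enzymatically, such assignments still need MS/MS confirmation, as the text notes.

```python
ND_SHIFT = 0.98402  # Da; Asn -> Asp mass change after PNGase F release (monoisotopic)


def converted_sites(observed_mh: float, unmodified_mh: float, n_sites: int,
                    tol: float = 0.005):
    """Return how many of `n_sites` sequon Asn residues appear as Asp, or None.

    observed_mh / unmodified_mh are singly charged masses of the deglycosylated
    peptide and of the same peptide with every Asn intact.
    """
    for k in range(n_sites + 1):
        if abs(observed_mh - (unmodified_mh + k * ND_SHIFT)) <= tol:
            return k
    return None  # no match: another modification or a wrong assignment


def site_status(forms_seen: set) -> str:
    """Classify one site from the forms seen after PNGase F: 'N' and/or 'D'."""
    if forms_seen == {"D"}:
        return "fully utilized"
    if forms_seen == {"N"}:
        return "fully unutilized"
    if forms_seen == {"N", "D"}:
        return "variably unutilized"
    raise ValueError("forms_seen must be a non-empty subset of {'N', 'D'}")
```

For a hypothetical two-sequon peptide with [M+H]+ 1500.0 when both Asn residues are intact, an observed mass of 1501.968 indicates that both sites carried glycans.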
Glycopeptides identified from MALDI MS and LC/ESI-FTICR MS analyses bear high mannose, hybrid, and complex type glycans, as shown in Table 2 (see the Supplementary Table for a complete list). The glycopeptides have either single or multiple glycosylation sites. Figure 2A and Figure 3A show MALDI MS and LC/ESI-FTICR MS spectra typical of a glycopeptide-rich fraction. These data show glycopeptides with a singly utilized glycosylation site located at the beginning of the V3 region for wild-type C (Figure 2A) and in the C4 and transmembrane regions for C.CON (Figure 3A). A characteristic feature of the mass spectra of glycopeptides with a singly utilized glycosylation site is a series of peaks, singly charged in MALDI MS or doubly/multiply charged in LC/ESI-FTICR MS, separated by m/z differences corresponding to the masses of monosaccharide units (hexose (Hex), N-acetylglucosamine (HexNAc), fucose (Fuc), and sialic acid (NeuNAc)) divided by the charge state. These characteristic patterns in the MS data helped identify ions that were likely glycopeptides.
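The ladder pattern described above can be searched for programmatically by testing every pair of peaks for an m/z spacing equal to one monosaccharide residue mass divided by the charge. The sketch below assumes the charge state is already known, for example from isotope spacing, and uses a hypothetical 0.01 m/z tolerance. It is an illustration, not the procedure the authors used.

```python
from itertools import combinations

# Monoisotopic residue masses (Da) of the monosaccharides named in the text.
RESIDUE = {"Hex": 162.05282, "HexNAc": 203.07937, "Fuc": 146.05791, "NeuNAc": 291.09542}


def glycan_ladder_pairs(mzs, charge: int, tol: float = 0.01):
    """Return (lower_mz, higher_mz, residue) for peak pairs spaced by one residue.

    For ions of charge z, adding one monosaccharide shifts m/z by mass / z.
    """
    pairs = []
    for low, high in combinations(sorted(mzs), 2):
        for name, mass in RESIDUE.items():
            if abs((high - low) - mass / charge) <= tol:
                pairs.append((low, high, name))
    return pairs
```

For three hypothetical doubly charged peaks spaced by Hex/2 and then HexNAc/2, the function reports a Hex step followed by a HexNAc step. Peaks linked in this way are candidates for a single glycopeptide carrying a heterogeneous glycan series.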
For both MS techniques, compositions of glycopeptides with one glycosylation site were elucidated from the fragmentation patterns of glycopeptide peaks observed in the high resolution mass spectra. The peptide portion and the glycan composition were identified from the MS/MS data (Figure 2B, Figure 3B, and 3C); the peptide portion was determined from the characteristic 0,2X (cross-ring cleavage) or Y1 (glycosidic cleavage) ions. Peptide sequences identified from MALDI MS/MS were further validated by MS and MS/MS analysis of the deglycosylated glycopeptide fraction of interest (Figure 2C and 2D). A typical MALDI MS spectrum of the deglycosylated fraction shows deglycosylated peptides with potential N to D conversions, depending on site utilization (Figure 2C). Peptide sequences were confirmed from the fragmentation pattern observed in the MALDI MS/MS spectrum (Figure 2D). For glycopeptides with multiple glycosylation sites, compositions were determined with GlycoMod using data obtained from high resolution LC/ESI-FTICR MS (Figure 4A) or from MALDI MS in linear mode (Figure 4C). GlycoMod results were validated using the MS/MS data (Figure 4B). Overall, a total of 300 unique glycopeptide compositions per Env were identified for singly and multiply glycosylated glycopeptides. The full list of identified glycopeptides is included in the Supplementary Information.
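Once a Y1 or 0,2X ion has been recognized, the peptide mass follows from a fixed offset. The Y1 ion retains one intact HexNAc (+203.07937 Da). The 0,2X cross-ring fragment is commonly described as peptide + 83 Da, corresponding to C4H5NO (83.03711 Da) from the core GlcNAc. The sketch below converts a fragment m/z at a known charge to the peptide [M+H]+. The offsets and proton mass are standard values, and the helper itself is hypothetical.

```python
PROTON = 1.007276  # Da
OFFSETS = {
    "Y1": 203.07937,   # peptide + one intact HexNAc (core GlcNAc)
    "0,2X": 83.03711,  # peptide + C4H5NO left by cross-ring cleavage of the core GlcNAc
}


def peptide_mh_from_fragment(frag_mz: float, charge: int, ion: str = "Y1") -> float:
    """Back-calculate the peptide [M+H]+ from the m/z of a Y1 or 0,2X fragment."""
    # Convert the multiply charged m/z to the singly protonated mass first.
    frag_mh = frag_mz * charge - (charge - 1) * PROTON
    return frag_mh - OFFSETS[ion]
```

The peptide mass obtained this way is what a tool such as GlycoPep DB needs before it can propose glycan compositions for the intact glycopeptide.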
The glycan compositions of the two clade C Envs were broadly grouped according to glycan type and plotted as bar graphs. A glycopeptide may have one or more glycosylation sites. Each glycopeptide is represented by a bar or a pair of bars corresponding to the percentage of processed or high mannose glycans, arranged by position in the Env sequence. The following criteria were used to determine the relative number of processed or high mannose glycans: “high mannose” includes structures with 5–9 mannose units, and “processed glycans” includes both hybrid and complex type structures with Hex ≥ 3 and HexNAc ≥ 4, or Hex ≥ 4 and HexNAc ≥ 3, as defined previously27. Bar graphs comparing the glycan profiles of the synthetic consensus Env, C.CON, and wild-type C are shown in Figure 5. The glycan profiles of wild-type C and C.CON show similarities as well as differences throughout the Env. They differ at the following glycosylation sites: N275 and N289 in the C2 region, N301 in the V3 loop, N339 in the C3 region, N386 at the beginning of the V4 loop, and N442 in the C4 region. These sites carry more high mannose glycans (%glycan ≥ 50) in C.CON than in wild-type C. In contrast, the glycan profiles at the conserved glycosylation sites N230, N241, N262, and N332 in the C2 and C3 regions are the same for wild-type C and C.CON; these sites are populated with high mannose glycans. The glycosylation site at N234 in C.CON is absent from wild-type C because of a mutation.
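The grouping criteria above can be written as a small classifier. This sketch makes three assumptions. First, "high mannose" is taken to mean HexNAc = 2 with 5–9 Hex and no Fuc or NeuNAc, because the text gives only the mannose count. Second, the processed-glycan rule is applied as stated. Third, percentages are calculated by number of compositions rather than by ion abundance, which the text does not specify.

```python
def glycan_class(hex_: int, hexnac: int, fuc: int = 0, neunac: int = 0) -> str:
    """Group a glycan composition using the criteria stated in the text.

    Assumption: 'high mannose' = HexNAc 2 with 5-9 Hex and no Fuc/NeuNAc.
    """
    if hexnac == 2 and 5 <= hex_ <= 9 and fuc == 0 and neunac == 0:
        return "high mannose"
    if (hex_ >= 3 and hexnac >= 4) or (hex_ >= 4 and hexnac >= 3):
        return "processed"  # hybrid or complex
    return "other"  # e.g. truncated structures outside both definitions


def percent_by_class(compositions) -> dict:
    """Percentage of compositions in each class (by count, not intensity)."""
    counts = {"high mannose": 0, "processed": 0, "other": 0}
    for comp in compositions:
        counts[glycan_class(*comp)] += 1
    total = sum(counts.values())
    return {k: (100.0 * v / total if total else 0.0) for k, v in counts.items()}
```

For the hypothetical set Hex5HexNAc2, Hex9HexNAc2, Hex5HexNAc4Fuc1NeuNAc2, and Hex3HexNAc3, the result is 50% high mannose, 25% processed, and 25% other. A site would exceed the "%glycan ≥ 50" high mannose threshold only if more than this share of its compositions were high mannose.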
To determine the minimum Env concentration needed for analysis and the batch-to-batch variation in glycosylation profile of the same protein, C.CON from two different batches at different concentrations was analyzed following the procedure described above. One batch (C.CON #1) had a protein concentration of 14.2 mg/mL and the other (C.CON #2) 2.16 mg/mL, as determined from their absorbance. The two batches were produced from different rVV-infected 293T cell cultures. Glycosylation coverage of the glycosylated peptides was 100% for C.CON #1 and 48% for C.CON #2. Based on these results and on previous analyses of Envs with different initial concentrations, we determined that an initial protein concentration >4 mg/mL is needed for complete coverage, corresponding to about 28 µM. Because of this requirement, this type of analysis is well suited to recombinant proteins but possibly not to Env isolated from virions, if it is present only at low concentration.
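The stated equivalence between 4 mg/mL and about 28 µM follows from dividing the mass concentration by the molar mass. The text does not give the molar mass. The sketch below assumes roughly 140 kDa, which is the value implied by 4 mg/mL ≈ 28 µM, so its output is only as good as that assumption for a heavily glycosylated gp140.

```python
def mg_per_ml_to_um(conc_mg_per_ml: float, molar_mass_da: float) -> float:
    """Convert a mass concentration (mg/mL) to micromolar (umol/L)."""
    # mg/mL equals g/L; dividing by g/mol gives mol/L; multiply by 1e6 for umol/L.
    return conc_mg_per_ml / molar_mass_da * 1e6


ASSUMED_GP140_DA = 140_000  # hypothetical molar mass consistent with "4 mg/mL ~ 28 uM"

for label, conc in [("threshold", 4.0), ("C.CON #1", 14.2), ("C.CON #2", 2.16)]:
    print(f"{label}: {mg_per_ml_to_um(conc, ASSUMED_GP140_DA):.1f} uM")
```

Under this assumption, C.CON #1 (14.2 mg/mL) is about 101 µM, and C.CON #2 (2.16 mg/mL) is about 15 µM, which is below the threshold. This is consistent with the incomplete coverage reported for C.CON #2.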
To examine the variation in glycan profile between these samples, the glycan profiles of the eight most abundant glycopeptides in C.CON #1 were compared with those of the same eight glycopeptides in C.CON #2. The bar graph in Figure 6 shows remarkably similar glycan profiles for C.CON #1 and C.CON #2. This similarity indicates that differences in glycan profiles are not related to the analytical conditions, the starting concentration of the sample, or the particular batch of cells used. Rather, differences in the bar graphs, such as those in Figure 5, indicate fundamental differences between the proteins. We have not yet tested whether changing the cell line that produces the protein would change the glycosylation profile. However, it is reasonable to expect that a different cell line could generate protein with modified glycosylation, and that a modified glycoprotein may elicit a different immunological response.
Clade C is the most rapidly spreading form of HIV-1, and this is the first analysis of the glycan profiles of any clade C Env protein. The data reported here extend our previous work comparing glycosylation profiles with immunogenicity data for two other Env immunogens. It is well established that Env glycosylation is crucial in host immune regulation, both by affecting protein conformation and by masking key protein epitopes22,30,41–45. Thus, assessing the differences and similarities in glycosylation between Env immunogens, and identifying the profiles that correlate with Env immunogenicity, are important for improving vaccine design and efficacy. We previously established by a glycopeptide-based mass mapping approach that a good immunogen has a glycosylation profile with higher levels of high mannose glycans surrounding the immunogenic V3 region and more unutilized glycosylation sites27. When compared with immunology data, such a glycosylation profile correlates with better induction of both humoral and cellular immunity in small animal and primate models33,46. How well an Env immunogen induces a potent immune response depends in part on the number of PNG sites, the degree of glycosylation site occupancy, and the glycan motifs populating each glycosylation site. However, these elements vary considerably between isolates and across clades because of HIV genetic diversity14,47,48. Evaluating Env glycosylation is therefore crucial in establishing the utility of Env immunogens as potential components of a vaccine regimen. As an important step in understanding the influence of glycosylation on immunogenicity, the following questions must be addressed: (1) What are the interclade differences and similarities in glycosylation profiles? (2) How does the glycosylation profile of a Group M global consensus (CON-S) differ from that of a clade-specific consensus?
We characterized the glycosylation of two clade C Env immunogens to determine the variations in their glycosylation profiles and to correlate our results with immunology data. C.CON and wild-type C differ by 17% in amino acid sequence, which is well within the interclade genetic variation (>15%)47. Differences in Env amino acid sequence determine the number of PNG sites on the Env; for the collection of gp120 protein sequences in the Los Alamos HIV sequence database, the number of PNG sites ranges from 18 to 33 (median, 25)30,48. This variation arises from insertions or deletions, specifically in the hypervariable regions48. In general, the loss of a PNG site and/or a lower degree of glycosylation site occupancy imposes fewer constraints on the access of neutralizing antibodies (NAbs) to key neutralization epitopes. Our results show that the consensus protein, C.CON, has more unutilized glycosylation sites than wild-type C (see Table 1): seven unutilized sites for C.CON and two for wild-type C. Most of these unutilized sites are conserved between the two clade C Envs. Six of the seven unutilized sites for C.CON, and both unutilized sites for wild-type C, are variably utilized. The remaining unutilized site in C.CON, N625 in the transmembrane region, is not glycosylated at all, whereas the same site is utilized in wild-type C. The variably unutilized glycosylation sites are located in the V1/V2 loops (N133, N156, and N187), the V4 loop (N386 and N397), and the transmembrane region (N616) for C.CON, and in the V1/V2 region (N156 and N184) for wild-type C. Variably unutilized PNG sites that are not conserved between the two Envs (green boxes in Figure 1) are deleted or mutated in either C.CON or C.97ZA012. These PNG sites are located at N184 in the V2 loop for wild-type C and at N616 in the transmembrane region for C.CON. Corresponding glycosylation sites at N187 in wild-type C and at N616 in C.CON were deleted.
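A sequence difference such as the 17% quoted above is 100 minus the percent identity of the aligned sequences. Identity can be defined in several ways. The sketch below counts identical residues over aligned columns where neither sequence has a gap, which may not match ClustalX's exact convention. Inputs are assumed to be pre-aligned strings of equal length with "-" for gaps.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not cols:
        return 0.0
    same = sum(a == b for a, b in cols)
    return 100.0 * same / len(cols)


identity = percent_identity("ACDEFG", "ACDQ-G")
print(f"identity {identity:.1f}%, difference {100 - identity:.1f}%")
```

The choice of denominator matters. Counting gapped columns, or normalizing by the length of the shorter sequence, would give a different percentage for the same alignment. Differences in this choice can shift a figure such as 17% by a few points.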
How these open glycosylation sites influence Env immunogenicity depends on their location and distribution in the Env structure. More unutilized glycosylation sites in regions where neutralization epitopes are located would be expected to improve Env immunogenicity; indeed, removal of glycans in the variable loops is known to increase neutralization sensitivity49. The variably unutilized glycosylation sites at N133, N156, and N187 for C.CON and at N156 and N184 for wild-type C lie at the base of the V1/V2 loop, proximal to the two disulfide bonds that define the loop. Glycosylation at these sites could affect both the flexibility and the orientation of the V1/V2 loop. Additionally, it has been proposed that the V1/V2 and V3 loops interact, with glycans helping to protect the CD4 binding site against NAbs50. Thus, the absence of glycans in the V1/V2 loop would make the CD4 binding site more vulnerable to NAbs. Direct comparison of glycosylation site occupancy between C.CON and wild-type C in the V1/V2 loop shows differences at sites N133 and N187, both of which are variably unutilized in C.CON but fully utilized in wild-type C. Although wild-type C has an extra glycosylation site at N184 that is absent in C.CON, this site is only variably unutilized, which significantly mitigates its impact. Overall, C.CON is less glycosylated than wild-type C, and the reduced glycosylation in its V1/V2 loop promotes better antibody access to the CD4 binding site.
Apart from the V1/V2 loop, C.CON has two more variably unutilized PNG sites, at N386 and N397 in the V4 loop. N386 is located at the base of the V4 loop, proximal to the chemokine receptor binding site, while N397 is within the V4 loop51–55. The lack of glycans at these sites could allow better antibody access to the critical CD4-induced epitopes. Finally, the highly conserved glycosylation in the transmembrane region has been reported to effectively shield the underlying epitopes in this region56. The lack of a glycan at N625 in C.CON could increase neutralization sensitivity in this region once it is exposed.
While there is mounting evidence that the lack of glycans on PNG sites, or the absence of a PNG site, is a good measure for differentiating a good immunogen from a poor one, the diverse array of glycans decorating the glycosylation sites can also be characterized and correlated with the immunogenicity of different Envs. Indeed, the prevalent glycan type at each glycosylation site, and how these glycans are distributed throughout the Env, could help define the glycosylation pattern that potentially influences Env immunogenicity. This study takes a small step toward unraveling the correlation between glycosylation profiles and immunogenicity by comparing the profiles of two clade C proteins. MS analysis of the glycan profiles of wild-type C and C.CON shows distinct differences in glycan patterns between the two clade C Envs (Figure 5). C.CON generally displays a higher proportion of high mannose glycans than wild-type C in the C2 region, V3 loop, C3 region, V4 loop, and C4 region. When mapped onto the gp120 structure, these glycosylation sites (N275, N289, N301, N339, N386, and N442) are located in the outer domain of gp120, within the 2G12 binding site9,11,12,57,58 and the IgG1 b12 binding site59. Considering that high mannose glycans promote proper Env folding and stabilize protein conformation60–62, it is likely that the high mannose glycans at these sites in C.CON provide conformational stability in this region. This stabilization could enhance the ability of C.CON to elicit a wider breadth of NAbs to this region. This intraclade variation in glycosylation reflects a difference in structural conformation in this region between the two clade C Envs. The higher level of high mannose glycans in C.CON indicates a highly structurally conserved Env, which in part correlates with its ability to induce a wider breadth of NAbs than wild-type C63.
In addition to the differences in glycan profiles between the two clade C Envs, four conserved glycosylation sites in the C2 and C3 regions (N230, N241, N262, and N332) have similar glycan patterns in C.CON and wild-type C and predominantly bear high mannose glycans. Studies have shown that HIV-1, HIV-2, and SIV are transmitted efficiently to T cells when their respective Envs carry high levels of high mannose glycans29. This process is mediated by DC-SIGN and DC-SIGNR, calcium-dependent C-type lectins that facilitate HIV transmission and dissemination64,65. These C-type lectins bind high mannose glycans with high affinity and specificity66–69. Given this precedent, the clade-specific glycan pattern in clade C Env observed in this study (mainly the presence of high mannose glycans in the C2 region) may be responsible for the efficient transmission of clade C viruses through DC-SIGN and DC-SIGNR. This glycan profile is remarkably conserved within the clade C Env immunogens analyzed in this study and is possibly important in modulating clade C infectivity. While the role of this glycosylation pattern in Env immunology remains to be determined, future studies could focus on elucidating intraclade glycosylation variations that modulate DC-SIGN binding and infectivity.
How does the overall glycosylation profile of the clade consensus, C.CON, compare with that of the group M consensus, CON-S? Figures 7A and 7B show composite representations of the overall glycosylation profiles, including both the unutilized glycosylation sites and the glycan profiles, for CON-S, C.CON, and wild-type C. The CON-S data were obtained from our previous work27,28. Comparison of the overall glycosylation profiles reveals that C.CON has a glycosylation pattern similar to that of CON-S. Both consensus Envs have many unutilized glycosylation sites and high levels of high mannose glycans surrounding the immunogenic V3 region. CON-S has more total PNG sites and more unutilized glycosylation sites than C.CON (Figure 7A). Additionally, the unutilized glycosylation sites in C.CON, and their distribution among the Env regions, follow a trend similar to that observed in CON-S. These unutilized glycosylation sites are well conserved in both consensus Envs, indicating a trend toward more open glycosylation sites in good Env immunogens.
When all glycosylation sites are considered for CON-S, C.CON, and wild-type C, C.CON has higher levels of high mannose glycans than the clade-generic consensus Env, CON-S (Figure 7B). This difference likely reflects clade-specific differences, in that clade C proteins inherently carry more high mannose glycans in the C2 region. That this difference is clade-specific is further supported by our observation that the clade C wild-type Env has a much higher level of high mannose glycans than the clade B wild-type Env, JR-FL (data not shown). Lastly, in both consensus Env immunogens, C.CON and CON-S, the utilized glycosylation sites between the C2 region and the beginning of the C3 region carry high levels of high mannose glycans. This shared feature indicates that Env conformation in these regions is conserved in both C.CON and CON-S. These data agree well with immunological data showing that both the clade consensus and the group M consensus Env immunogens elicited a wide breadth of NAbs when used as vaccine components in small animal models63.
Mass spectrometry-based glycosylation profiling of two rVV-expressed clade C HIV-1 Envs, as described in this study, provides an informative approach to differentiating the glycosylation profiles of Env immunogens. A combination of MALDI-MS, LC/ESI FTICR-MS, tandem MS analyses, and web-based analysis tools was used to map the glycan motifs and the glycosylation site occupancy in a site-specific fashion. Our results show that the consensus protein, C.CON, has more unutilized glycosylation sites and a high level of high mannose glycans not only in the region surrounding the immunodominant V3 loop but also in the C4 region and at the beginning of the V4 loop. Interestingly, the glycosylation profile of C.CON shows some similarity to that of another consensus immunogen, CON-S, derived from global group M env sequences. The glycosylation features shared by C.CON and CON-S indicate that consensus Envs have similar, functionally conserved conformations and therefore similar immunogenicity63. Our results also reveal a clade-specific glycosylation pattern in the C2 and C3 regions.
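Site occupancy can be expressed as the fraction of a site's signal that comes from glycosylated forms:

$$\text{occupancy} = \frac{\sum_i A_{\text{glycopeptide},i}}{\sum_i A_{\text{glycopeptide},i} + A_{\text{peptide}}}$$

The sketch below computes this quantity, assuming that summed ion abundances for the glycopeptides and for the corresponding non-glycosylated peptide are available on a comparable scale. In practice ionization efficiencies differ between these species, so such ratios are semi-quantitative. The function names and the 5%/95% cutoffs used to label a site "unutilized", "partially utilized", or "fully utilized" are hypothetical and are not the criteria used in this study.

```python
from typing import Sequence

def site_occupancy(glycopeptide_abundances: Sequence[float], peptide_abundance: float) -> float:
    """Fraction of signal at one site attributed to glycosylated forms.

    Assumes all abundances are on a comparable scale (hypothetical input).
    """
    glycosylated = sum(glycopeptide_abundances)
    total = glycosylated + peptide_abundance
    if total <= 0:
        raise ValueError("no signal observed for this site")
    return glycosylated / total

def occupancy_label(occupancy: float, low: float = 0.05, high: float = 0.95) -> str:
    """Bin an occupancy value; the cutoffs are illustrative assumptions."""
    if occupancy <= low:
        return "unutilized"
    if occupancy >= high:
        return "fully utilized"
    return "partially utilized"

if __name__ == "__main__":
    occ = site_occupancy([40.0, 35.0], 25.0)  # invented abundances
    print(occ, occupancy_label(occ))  # 0.75 partially utilized
```

Keeping the occupancy calculation separate from the labeling step makes it easy to revisit the cutoffs without recomputing the underlying ratios.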
While an effective vaccine against HIV infection remains elusive, considerable progress has been made in understanding the immunopathogenesis of HIV. The Env is a major target for vaccine design, and one Env-based immunogen design strategy is to delineate Env glycosylation and modify it to improve the breadth of NAb coverage27,44,49,50,63,70–74. Thus, a systematic comparison of interclade and/or intraclade glycosylation profiles among Env immunogens may prove useful in identifying definitive glycosylation patterns that contribute positively to Env immunogenicity, and the knowledge gained from this study should help inform vaccine design.
This work was supported by NIH grants R01GM077226 and P01AI61734 and by a collaboration under the AIDS Vaccine Development Grant from the Bill and Melinda Gates Foundation to Barton F. Haynes. We also thank the Applied Proteomics Laboratory at KU for instrument time.