Naive CD4+ T cells are the common precursors of multiple effector and memory T-cell subsets and possess high plasticity in terms of differentiation potential. This stem-cell-like character is important for cell therapies aiming at the regeneration of specific immunity. Cell surface proteins are crucial for the recognition of, and response to, signals mediated by other cells or by environmental changes. Knowledge of the cell surface proteins of human naive CD4+ T cells, and of their changes during the early phase of T-cell activation, is urgently needed to guide the differentiation of naive T cells and may support the selection of pluripotent cells for cell therapy. Periodate oxidation and aniline-catalyzed oxime ligation technology was applied with subsequent quantitative liquid chromatography-tandem MS to generate a data set describing the surface proteome of primary human naive CD4+ T cells and to monitor dynamic changes during the early phase of activation. This led to the identification of 173 N-glycosylated surface proteins. To independently confirm the proteomic data set and to analyze the cell surface by an alternative technique, a systematic phenotypic expression analysis of surface antigens via flow cytometry was performed. This screening expanded the previous data set to 229 surface proteins expressed on unstimulated naive and activated CD4+ T cells. Furthermore, we generated a surface expression atlas based on transcriptome data, experimental annotation, and predicted subcellular localization, and correlated the proteomics results with this transcriptional data set. This extensive atlas provides a comprehensive resource on the naive CD4+ T cell surface and will enable future studies aimed at a deeper understanding of T-cell biology, allowing the identification of novel immune targets for the development of therapeutic treatments.
Naive CD4+ T cells are the common precursors of all other T-helper cell subsets, and it is of fundamental importance for specific immunity that their differentiation is well directed. Upon antigen recognition, a complex signaling network is engaged that triggers the differentiation of stem-cell-like, plastic, antigen-inexperienced naive T cells into antigen-specific, functionally distinct T-cell subphenotypes (1). In healthy individuals, the differentiation of naive T cells is tightly regulated. Pathology develops when effector responses are dysregulated, for example overshooting responses leading to impaired tolerance (2) or ineffective control of infections (3). Naive T cells are defined by CD45RA expression; they are early cellular targets for modulating the differentiation process and for developing long-lasting, sustainable therapeutic strategies. In contrast, memory T cells express CD45RO and comprise already committed cells such as T helper 1 and T helper 2 cells. We therefore chose to investigate naive CD4+ (CD45RA+) T cells and their phenotype during T-cell receptor (TCR) activation. The differentiation of naive CD4+ T cells is initiated by ligand binding to the TCR and to costimulatory surface receptors, together with the concerted action of specific extracellular signals and growth factors. This complex interplay, including signals mediated by other cells or by changes in the environment, allows the integration of complex immunological conditions.
Until now, studies of T-cell differentiation have focused mainly on genome-wide transcriptome and epigenome investigations, revealing a large number of potential key drivers of T-cell commitment (4–6). Proteomic approaches to T-cell differentiation, however, are rarely performed, although they are consistently requested by the immunological community (7, 8). In 2014, two mass-spectrometry-based drafts of the complete human proteome were published on the same day in the same journal, highlighting the importance of, and the need for, proteomic data (9, 10). The first proteomic study of activated human primary T helper cells, published in 2001, comprised 91 proteins identified by metabolic labeling, two-dimensional gel electrophoresis, and MALDI-TOF MS (11). Most existing studies of T-cell biology have been conducted in Jurkat T-cell lines instead of primary T cells and have focused on proteomic events close to the TCR during activation, located in lipid rafts (12–14). Other studies focused on T-cell subproteomes within the early stages of T-cell differentiation, investigating proteomic changes in the nucleus of activated human cord blood CD4+ T cells after interleukin-4 stimulation (15) or changes in the global phosphoproteome of human primary T cells in response to 5 min of TCR activation with αCD3 (16). In vitro manipulated T cells have also been analyzed, such as 7-day cultures of in vitro differentiated T helper 1 and T helper 2 cells (17). However, the surface proteome of human naive CD4+ T cells, and how these proteins change during the early time window of αCD3/αCD28 activation, has not been investigated so far. Knowledge of the surface proteins of naive CD4+ T cells would provide a molecular fingerprint to classify naive CD4+ T cells and, in particular, their cellular state during activation.
Furthermore, the investigation of cell surface proteins and their expression changes during stimulation might lead to the identification of interesting targets for therapeutic approaches such as antibody-based drugs or enrichment strategies.
The main aim of the present study was to generate an extensive ex vivo surface atlas of human naive CD4+ T cells and to analyze the surface proteins of freshly isolated cells, or of cells directly after stimulation, at a deep proteomic level. To this end, an omics approach combining quantitative proteomic and transcriptomic methods was performed. For the mass-spectrometric identification of surface proteins of CD4+/CD45RA+ naive T cells, the recently described PAL (periodate oxidation and aniline-catalyzed oxime ligation) approach (18) was adapted for use with primary T cells and combined with quantitative LC-MS/MS (19). This method allows efficient labeling of cell surface sialic acid-containing glycans on living cells. Similar cell surfaceome approaches have been applied in human and mouse studies and have shown the value of such cell-surface-focused technologies for identifying proteins via shotgun proteomics. Most of these studies are based on the cell surface capture technology, which covalently labels extracellular glycan moieties on living cells (20). They focused, for example, on the surface proteome of stem cells (21), mesenchymal stromal cells (22), and murine adipocytes in obesity (23), as well as on other immune cells such as lymphoma-derived B-cell lines (24).
Furthermore, a systematic expression analysis of 332 surface antigens via flow cytometry was performed and the combination of the two proteomic data sets (PAL-qLC-MS/MS and flow cytometry) resulted in a cell surface protein atlas of human naive CD4+ T cells containing 229 cell surface proteins.
Additionally, a cell surface expression atlas based on transcriptomic data was created to increase the resolution of the proteomic data set. A bioinformatics analysis was performed on the transcriptome of naive and activated CD4+ T cells, and annotations with experimental evidence or predictions of subcellular localization were used to select genes coding for cell surface proteins, i.e. transmembrane proteins localized in the plasma membrane and other proteins localized on the cell surface. This uncovered an additional set of genes encoding proteins located on the surface of human naive CD4+ T cells; these remain to be validated and might include interesting candidates for guiding T-cell differentiation.
The cell surface atlas described here provides novel potential immune targets for therapeutic treatments and, for the first time, gives an extensive overview of the cell surface expression pattern of CD markers and surface molecules detected on human naive and activated CD4+ T cells.
Peripheral blood mononuclear cells (PBMCs) were isolated from the heparinized blood of healthy, nonatopic blood donors (n = 19) by standard Lymphoprep (Fresenius Kabi Norge AS for Axis-Shield PoC AS, Oslo, Norway) density gradient centrifugation. Fifteen different biological replicates were used to generate and validate the surface atlas of human naive CD4+ T cells at the proteomic level, and four additional donors were included to generate the transcriptome data set by RNA microarray. The study was approved by the local ethics committee of the Technical University Munich (ethics board no. 2877/10). Informed written consent was obtained from all study subjects. Naive CD4+ T cells were separated by negative isolation (Naive CD4+ T cell Kit II, Miltenyi Biotec, Bergisch Gladbach, Germany), followed by depletion of CD45RO+ cells (Isolation Kit CD45RO Microbeads; Miltenyi Biotec) according to the manufacturer's protocol, reaching a purity of at least 98% as monitored by flow cytometry. Cells were activated with plate-bound αCD3 (0.75 μg per well of a 24-well culture plate; BD Biosciences, Heidelberg, Germany) and soluble αCD28 (0.75 μg/ml; BD Biosciences) or with the T-cell activation/expansion kit at a bead-to-cell ratio of 1:2 (Miltenyi Biotec), at a density of 1 × 10⁶ cells/ml in AIMV Medium (Gibco by LifeTechnologies, Darmstadt, Germany), and cultured at 37 °C in a humidified 5% CO₂ atmosphere.
The periodate oxidation and aniline-catalyzed oxime ligation (PAL)-based cell surface labeling procedure was performed as described previously (18) and adapted for primary human naive CD4+ T cells. All of the following steps were performed on ice. The cell pellet of 8 × 10⁶ naive CD4+ T cells per donor (n = 4, donors 1–4) and time point was washed twice in ice-cold labeling buffer (PBS with CaCl₂/MgCl₂, pH 6.7) and then added to 1 ml of oxidation/biotinylation mix in a one-pot reaction consisting of 1 mM NaIO₄, 100 μM aminooxy-biotin (Biotium Inc., Hayward, CA), and 10 mM aniline (Sigma-Aldrich, Taufkirchen, Germany) in labeling buffer, and incubated for 30 min at 4 °C in the dark with rotation. The biotinylation reaction was quenched by adding glycerol to a final concentration of 1 mM for an additional 5 min on the rotator at 4 °C. Cells were washed with 1 ml of cold washing buffer (PBS with CaCl₂/MgCl₂, pH 7.4), and an aliquot of cells was collected to check the biotinylation labeling efficiency (supplemental Fig. S1A). After centrifugation (4000 rpm, 4 °C, 10 min), each cell pellet was resuspended in 250 μl of lysis buffer (1% Nonidet P-40, 10 mM NaCl, 10 mM Tris, pH 7.6, 2 × EDTA-free complete protease inhibitor mixture (Roche, Basel, Switzerland) in ddH₂O) and frozen at −20 °C. To prepare cell membrane fractions, the lysed cells were thawed on ice with constant vortexing. Raw lysates were cleared by centrifugation (6000 × g, 4 °C, 10 min). The pellet was discarded, and the supernatant was diluted 1:5 with washing buffer. The diluted samples were incubated with 60 μl of prewashed Strep-Tactin Superflow 50% suspension (IBA GmbH, Göttingen, Germany) in LoBind tubes (Eppendorf, Hamburg, Germany) at 4 °C for 2 h on a rotator to bind biotinylated cell surface proteins to the high-affinity streptavidin beads. Samples were centrifuged at 1000 × g for 1 min to pellet the bead-protein complexes.
All subsequent washing and incubation steps (25) were carried out in a volume of 200 μl, and centrifugation during the washing procedure was carried out at 2000 × g for 2 min. First, beads were washed with washing buffer supplemented with 0.2% Nonidet P-40, followed by a washing step with 0.5% SDS (Gibco by LifeTechnologies) in washing buffer. After centrifugation, the beads of each sample were incubated with 0.5% SDS and 100 mM DTT in washing buffer at room temperature for 30 min. Beads were then washed with UC buffer (6 M urea, 100 mM Tris-HCl, pH 8.5). For alkylation, the beads were incubated in UC buffer containing 50 mM iodoacetamide (Sigma-Aldrich) at room temperature for 30 min, followed by washing steps with UC buffer, 5 M NaCl, 100 mM Na₂CO₃ (Merck, Darmstadt, Germany), pH 11.5, and 50 mM Tris-HCl, pH 8.5. Bead-protein complexes were first digested in 40 μl of 50 mM Tris-HCl, pH 8.5, containing 1 μg sequencing-grade modified trypsin (Promega, Madison, WI) in a Thermomixer (Eppendorf) at 37 °C overnight. The samples were centrifuged as before, and the supernatant containing the tryptic peptides was transferred into a new LoBind tube (Eppendorf). Beads were resuspended in 40 μl of 50 mM Tris-HCl, pH 8.5, and centrifuged, and the resulting supernatant was pooled with the first tryptic fraction. The beads were washed with 40 μl of 1 × G7 buffer (NEB, Frankfurt a.M., Germany) and then incubated with 20 μl of 1 × G7 buffer containing 500 U glycerol-free PNGase F (NEB) at 37 °C in a Thermomixer for 6 h to release the glycopeptides from the beads. After this second digest, beads were centrifuged and the supernatant was transferred into a new LoBind tube. The beads were again resuspended in 20 μl of 1 × G7 buffer and centrifuged, and the resulting supernatant was pooled with the first PNGase F fraction. Tryptic and PNGase F peptide fractions were stored at −20 °C until mass spectrometric analysis and measured separately.
As a technical validation of the PAL-qLC-MS/MS approach, a flow cytometry analysis was performed with identical samples in parallel to the MS setting, using specific antibodies for several selected surface markers (CD11a-FITC, CD69-PE, CD62L-PE-TexasRed) (BD Biosciences). The comparison of the expression patterns obtained by flow cytometry or mass spectrometry showed a very high concordance between the two techniques (supplemental Fig. S1B).
The LC-MS/MS analyses were performed as described previously (26, 27) on an LTQ-Orbitrap XL (Thermo Fisher Scientific, Waltham, MA) with the following adjustments: a nano trap column (300 μm inner diameter × 5 mm, packed with Acclaim PepMap100 C18, 5 μm, 100 Å; LC Packings, Sunnyvale, CA) was used before separation by reversed-phase chromatography (PepMap, 25 cm, 75 μm ID, 2 μm/100 Å pore size, LC Packings) operated on an RSLC (Ultimate 3000, Dionex, Sunnyvale, CA) using a nonlinear 170-min LC gradient from 5 to 31% buffer B (98% acetonitrile and 0.1% formic acid) at a flow rate of 300 nl/min, followed by a short gradient from 31 to 95% buffer B in 5 min and a 15-min equilibration to starting conditions. From the MS prescan, the 10 most abundant peptide ions were selected for fragmentation in the linear ion trap if they exceeded an intensity of at least 200 counts and were at least doubly charged. During fragment analysis, high-resolution (60,000 full-width half maximum) MS spectra were acquired in the Orbitrap with a mass range from 300 to 1500 Da. One microscan was recorded, with fill times set to 0.5 s in the FT (MS) and 0.1 s in the ion trap (MSn). Automatic gain control (AGC) targets were set to 1e+6 (MS) and 1e+4 (MSn), respectively. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (28) via the PRIDE partner repository with the data set identifier PXD001432.
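The precursor selection rule described above (the 10 most abundant ions, at least 200 counts, at least doubly charged) can be illustrated with a minimal Python sketch. The `Precursor` record and `select_for_msms` function are hypothetical names for illustration only; the sketch does not model any further instrument-side logic that the acquisition method may apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precursor:
    mz: float         # mass-to-charge ratio observed in the survey (prescan) spectrum
    charge: int       # assigned charge state
    intensity: float  # signal intensity in counts


def select_for_msms(survey, top_n=10, min_intensity=200.0, min_charge=2):
    """Return up to top_n precursors passing the intensity and charge filters,
    ordered from most to least intense (hypothetical helper)."""
    eligible = [
        p for p in survey
        if p.intensity >= min_intensity and p.charge >= min_charge
    ]
    return sorted(eligible, key=lambda p: p.intensity, reverse=True)[:top_n]
```

Filtering before ranking matters here: an intense singly charged ion is discarded rather than occupying one of the ten fragmentation slots.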
The RAW files (Thermo Fisher Scientific) were further analyzed using the Progenesis LC-MS software (version 4.0, Nonlinear Dynamics, Newcastle, UK) as described previously (19), with the following changes. Peptides were identified with the search engine Mascot (Matrix Science, release 2.4), assuming tryptic digestion with one missed cleavage allowed, a fragment ion mass tolerance of 0.6 Da, and a parent ion tolerance of 10 ppm. Carbamidomethylation was set as a fixed modification; methionine oxidation and asparagine or glutamine deamidation were allowed as variable modifications. Spectra were searched against the Ensembl human database (Release 69; 100,607 sequences) (29), excluding the common contaminants keratin and albumin. A Mascot-integrated decoy database search using the Percolator algorithm yielded an average peptide false discovery rate of < 1% with a Percolator score cutoff of 13 and a significance threshold of p < 0.05. Search results and spectral files have been uploaded to the ProteomeXchange platform (http://www.proteomexchange.org) and are available with the identifier PXD001432. Peptide assignments were re-imported into Progenesis LC-MS. The normalized abundances of all unique peptides were summed and allocated to the respective protein (supplemental Tables S1 and S2).
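The final summation step can be sketched as follows, assuming each peptide record carries its normalized abundance and the tuple of proteins it maps to, and treating a peptide as "unique" when it maps to exactly one protein. The function name and record layout are hypothetical and do not reflect the Progenesis LC-MS implementation.

```python
from collections import defaultdict


def protein_abundances(peptides):
    """Sum normalized abundances of unique peptides per protein.

    peptides: iterable of (sequence, protein_ids, normalized_abundance),
    where protein_ids is a tuple of protein identifiers. Peptides that map
    to more than one protein are treated as shared and skipped.
    """
    totals = defaultdict(float)
    for _sequence, proteins, abundance in peptides:
        if len(proteins) == 1:  # unique peptide: allocate to its only protein
            totals[proteins[0]] += abundance
    return dict(totals)
```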
To generate a high-confidence cell surface protein data set, proteins were only pursued further if one of the following criteria was met: (1) the protein was identified in either the trypsin or the PNGase F fraction with one or more peptides and a confidence score ≥ 18, or (2) the protein was identified in both the trypsin and the PNGase F fraction with at least one peptide and a confidence score ≥ 13. For this analysis, the Ensembl human database protein ID of each identified protein was converted to the respective transcript (ENST) ID to ensure stringency. All spectra from single-peptide hits were manually inspected, and a complete collection of spectra for the one-peptide hits is provided as supplemental Fig. S2.
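A minimal sketch of the two inclusion criteria follows. It assumes the confidence score is evaluated separately for each fraction; the text does not state whether the ≥ 13 threshold applies per fraction or to a combined score. `FractionHit` and `passes_identification` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FractionHit:
    peptides: int  # number of identified peptides in this fraction
    score: float   # protein confidence score in this fraction (assumed per fraction)


def passes_identification(trypsin: Optional[FractionHit],
                          pngase: Optional[FractionHit]) -> bool:
    """Apply criteria (1) and (2) to one protein (hypothetical helper)."""
    hits = [h for h in (trypsin, pngase) if h is not None and h.peptides >= 1]
    # Criterion 1: one fraction with score >= 18 is sufficient on its own.
    if any(h.score >= 18 for h in hits):
        return True
    # Criterion 2: detected in both fractions, each with score >= 13.
    return len(hits) == 2 and all(h.score >= 13 for h in hits)
```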
Furthermore, to verify the identification of cell surface proteins and to remove further contaminants, we considered for the cell surface protein atlas only proteins annotated at least as “membrane” or “secreted” in the UniProtKB/Swiss-Prot database (30). The experimental evidence for their localization on the cell surface is provided by the PAL-qLC-MS/MS approach. In addition, proteins already designated as CD molecules were included in the further analysis. Supplemental Fig. S3 displays a Venn diagram giving an overview of the number of proteins identified in the trypsin fraction, in the PNGase F fraction, or in both fractions, and lists the subcellular localizations of the identified proteins as given by the UniProtKB/Swiss-Prot database.
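The annotation filter can be expressed as a simple predicate. This sketch assumes UniProtKB subcellular-location strings are matched case-insensitively by substring, so that any location containing "membrane" or "secreted" counts; `keep_for_atlas` is a hypothetical helper and not part of any UniProt tool.

```python
from typing import Iterable, Optional


def keep_for_atlas(uniprot_locations: Iterable[str],
                   cd_name: Optional[str] = None) -> bool:
    """Keep a protein if UniProt annotates it as membrane or secreted,
    or if it already carries a CD designation (hypothetical helper)."""
    terms = [loc.lower() for loc in uniprot_locations]
    annotated = any("membrane" in t or "secreted" in t for t in terms)
    return annotated or cd_name is not None
```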
Surface staining of human naive CD4+ T cells (n = 3, donors 5–7) was performed with the LEGENDScreen Human Cell Screening (PE) Kit (Biolegend, San Diego, CA) according to the manufacturer's protocol. The kit contains 332 PE-conjugated monoclonal antibodies against human cell surface markers, plus 10 mouse, rat, and hamster Ig isotype controls. Acquisition was performed with the BD LSRFortessa and BD FACSDIVA 7.0, and data analysis was performed using BD FACSDIVA 7.0 (BD Biosciences) and FlowJo software (Tree Star, Ashland, OR). To distinguish positive from negative antibody signals for the cell surface proteins on naive and activated CD4+ T cells, the mean fluorescence intensity (MFI) detection threshold for all donors and time points was set to the highest measured Ig isotype control. Additionally, unless otherwise stated, positive signals had to be obtained in at least two donors for the respective protein to be included in the cell surface protein atlas.
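The positivity rule (MFI above the highest isotype control, in at least two donors) could be sketched as follows for a single time point. The data layout and the `positive_markers` name are assumptions for illustration, and "positive" is taken to mean strictly above the threshold.

```python
def positive_markers(mfi, isotype_mfi, min_donors=2):
    """Return markers counted as positive at one time point (hypothetical helper).

    mfi: {marker: {donor: MFI}} for the antibody stainings.
    isotype_mfi: MFIs of all Ig isotype controls; the highest one is the threshold.
    """
    threshold = max(isotype_mfi)
    positive = set()
    for marker, by_donor in mfi.items():
        n_positive = sum(1 for value in by_donor.values() if value > threshold)
        if n_positive >= min_donors:
            positive.add(marker)
    return positive
```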
Furthermore, flow cytometry was also used to check cell purity and viability after isolation and to investigate labeling efficiency and potential side effects of the one-pot reaction on protein expression (supplemental Fig. S1A and S1B). To this end, 100,000 cells per staining were washed once with flow washing buffer (PBS containing 5% Hyclone FCS (PERBIO, Lausanne, Switzerland) and 0.02% NaN₃), stained with the respective antibodies (CD4-APC-Cy7, CD3-PerCPCy5.5, CD25-PE-Cy7, CD45RA-Horizon V450, CD45RO-Alexa700, CD69-PE, CD62L-PE-TexasRed, Streptavidin-PE) (BD Biosciences) for 30 min at 4 °C in the dark, washed again with flow washing buffer, and analyzed with a BD LSRFortessa (BD Biosciences) in combination with the BD FACSDIVA 7.0 software (BD Biosciences). Propidium iodide-positive cells were excluded from the analysis.
The mean ratios of the protein abundances identified via PAL-qLC-MS/MS, as well as the mean fluorescence intensity ratios of the proteins identified via the flow cytometry screening panel (of one representative donor), were subjected to unsupervised clustering (GProX) based on the fuzzy c-means algorithm as implemented in the Mfuzz package (31, 32).
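For readers unfamiliar with fuzzy c-means, the core iteration can be sketched in NumPy. This is a generic textbook formulation, not the GProX or Mfuzz code: the fuzzifier `m = 2`, the convergence tolerance, and the random initialization are assumptions, and Mfuzz additionally standardizes each profile before clustering. Each item receives a membership value between 0 and 1 for every cluster, and the memberships of each item sum to 1.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Minimal fuzzy c-means. X: (n_items, n_features) array of profiles.

    Returns (centers, U), where U[i, j] is the membership of item i in cluster j.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each item sum to 1
    for _ in range(n_iter):
        W = U ** m
        # cluster centers: membership-weighted means of the profiles
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance of every item to every center, floored to avoid 1/0
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # u_ij = d_ij^(-2/(m-1)) / sum_k d_ik^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U
```

A membership cutoff (such as the 0.6 used for the GO analysis) can then be applied to `U` to pick the core members of each cluster.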
Based on the PAL-qLC-MS/MS clustering results, a Gene Ontology (GO) (33) enrichment analysis was performed: proteins with a GProX membership value ≥ 0.6 for the respective cluster were submitted to the Generic GO TermFinder software (34), and the results were transferred to REVIGO for reduction and visualization. Within REVIGO, the allowed similarity was set to the predefined value “small” to ensure that retained pairs of GO terms have a semantic similarity of less than 0.5 (35).
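The two steps can be sketched in Python: selecting the core members of a cluster with the 0.6 membership cutoff, and computing a one-sided hypergeometric enrichment p-value for a single GO term, which is, to our understanding, the test underlying GO::TermFinder. Multiple-testing correction and the REVIGO reduction are not shown, and all names are hypothetical.

```python
from math import comb


def core_members(memberships, cluster, cutoff=0.6):
    """Proteins whose membership value for `cluster` is >= cutoff.

    memberships: {protein: sequence of membership values, one per cluster}.
    """
    return {p for p, u in memberships.items() if u[cluster] >= cutoff}


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: proteins in the selected list annotated with the GO term
    n: size of the selected list
    K: proteins in the background annotated with the term
    N: size of the background
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```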
RNA from human naive CD4+ T cells and αCD3/αCD28-stimulated (3 h) T cells of four blood donors (n = 4, donors 8–11) was isolated using the RNeasy Mini Kit (Qiagen, Hilden, Germany), and RNA quality was then checked using an Agilent 2100 Bioanalyzer with the RNA 6000 Nano kit (Agilent Technologies, Santa Clara, CA) to obtain RNA integrity numbers (RIN). 25 μg of total RNA (RIN ≥ 9) was amplified and Cy3-labeled using the one-color Low Input Quick Amp Labeling Kit (Agilent Technologies) according to the manufacturer's protocol. Hybridization to SurePrint G3 Human Gene Expression 8 × 60K microarrays was performed using the Hybridization Kit (Agilent Technologies). The array data were deposited in NCBI's Gene Expression Omnibus (36) and are accessible through GEO Series accession number GSE61983.
Quality control was performed with the GeneSpring software GX 12.5 (Agilent Technologies). Only genes detected in all four blood donors in the medium control, in the stimulated state, or in both conditions were chosen for further analysis.
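The detection filter can be sketched as follows, assuming per-donor detection calls are already available for each condition; the `detected_genes` helper and the data layout are hypothetical.

```python
def detected_genes(flags):
    """Keep genes detected in every donor in at least one condition.

    flags: {gene: {condition: [detected_in_donor_1, ..., detected_in_donor_4]}}
    """
    keep = set()
    for gene, by_condition in flags.items():
        # a condition qualifies only if the gene was detected in all of its donors
        if any(len(calls) > 0 and all(calls) for calls in by_condition.values()):
            keep.add(gene)
    return keep
```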
Using these selection criteria for the genome-wide scan, genes coding for cell surface proteins localized in or close to the plasma membrane were identified by a bioinformatics analysis. Of a total of 27,958 Entrez Gene RNAs, 17,757 microarray probe names were assigned to 14,455 unique NCBI RefSeq (37) accession numbers by the manufacturer. We could map 13,028 of these RefSeq accession numbers to their corresponding human UniProtKB (30) accession numbers (AC; UniProt release 2013_10). In general, several human UniProt ACs can be assigned to the same gene name. To reduce redundancy, for each gene name we chose the reviewed UniProt ACs (Swiss-Prot) if available and otherwise the unreviewed ones (TrEMBL). Thereby, we assigned UniProt ACs to 12,263 gene names (supplemental Table S3). Remaining redundancy arises because 29 gene names are assigned to more than one reviewed UniProt AC; additionally, we retained 36 unreviewed UniProt ACs that we predicted to be plasma membrane proteins (see criteria below). For these UniProtKB ACs, we extracted the subcellular localization (UniProt_SL) annotation from the UniProtKB/Swiss-Prot database (30), if available. Otherwise, we predicted the subcellular localization using LocTree3 (38) and transmembrane helices (TMHs) using PolyPhobius (39). We identified cell surface proteins as follows (supplemental Table S3): (1) From UniProt_SL, we accepted all experimentally verified and probable subcellular localizations and were particularly interested in cell surface and cell membrane (plasma membrane) annotations. For proteins localized in the plasma membrane, we additionally required that they be single- or multi-pass membrane proteins. If no such information was given, we required at least one TMH predicted by PolyPhobius; otherwise, these proteins were classified as putative cell surface proteins.
For peripheral or lipid-anchored membrane proteins, we required localization on the extracellular side of the plasma membrane. Those located on the cytoplasmic side were excluded, and those without an annotated side were classified as putative cell surface proteins. (2) If a protein could not be annotated using UniProt_SL, we used the subcellular localization predicted by LocTree3 and, for plasma membrane proteins, required at least one TMH predicted by PolyPhobius.
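The decision rules in (1) and (2) can be summarized as a small classifier. This is a sketch under stated assumptions: the `Annotation` fields, the string values used for localization and topology, and the three output labels are hypothetical simplifications of the UniProt, LocTree3, and PolyPhobius outputs, and any LocTree3 localization other than plasma membrane is treated as excluded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    uniprot_sl: Optional[str] = None  # "cell surface", "cell membrane", ...; None if absent
    topology: Optional[str] = None    # "single-pass", "multi-pass", "peripheral", "lipid-anchor"
    side: Optional[str] = None        # "extracellular" or "cytoplasmic" for peripheral/lipid-anchored
    loctree3: Optional[str] = None    # predicted localization, used only without UniProt_SL
    n_tmh: int = 0                    # number of PolyPhobius-predicted transmembrane helices


def classify_surface(a: Annotation) -> str:
    """Return 'surface', 'putative', or 'excluded' (hypothetical labels)."""
    if a.uniprot_sl is not None:                       # rule (1): UniProt_SL available
        if a.uniprot_sl == "cell surface":
            return "surface"
        if a.uniprot_sl == "cell membrane":
            if a.topology in ("single-pass", "multi-pass"):
                return "surface"
            if a.topology in ("peripheral", "lipid-anchor"):
                if a.side == "extracellular":
                    return "surface"
                return "excluded" if a.side == "cytoplasmic" else "putative"
            # no topology information: fall back to predicted helices
            return "surface" if a.n_tmh >= 1 else "putative"
        return "excluded"
    # rule (2): LocTree3 prediction plus at least one predicted TMH
    if a.loctree3 == "plasma membrane" and a.n_tmh >= 1:
        return "surface"
    return "excluded"
```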
Genes passing these selection criteria were analyzed for differential gene expression with the GeneSpring software GX 12.5 (Agilent Technologies) using the paired Student t test, filtered for a corrected p value (p ≤ 0.05, Benjamini-Hochberg correction).
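GeneSpring performs the Benjamini-Hochberg correction internally; the following sketch only illustrates how BH-adjusted p-values are obtained from a list of raw p-values (for example, those of the paired t tests), so that genes can then be filtered at an adjusted p ≤ 0.05. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest to the smallest p-value, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```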
For combining the omics data sets, we first scaled the flow cytometry and the PAL-qLC-MS/MS measurements by subtracting the mean from each value and dividing by the standard deviation. As a result, both protein data sets were comparable with the normalized log2 expression values of the microarray data. On the scaled data a one-sample, two-sided Welch t test was applied to determine significantly differentially expressed proteins/genes at each time point and/or for each technology with a p value ≤ 0.01 and a scaled absolute expression measurement ≥1. For this combined data analysis, the R programming language (www.r-project.org) and the “gplots” package were used.
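The scaling and selection steps can be sketched in Python. The original analysis used R; this sketch assumes the sample standard deviation (as used by R's `scale()`) and takes the t-test p-values as given input rather than recomputing them. The helper names are hypothetical.

```python
from statistics import mean, stdev


def zscale(values):
    """Scale values as (x - mean) / sample standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(x - mu) / sd for x in values]


def significant(scaled, pvalues, p_cut=0.01, effect_cut=1.0):
    """Indices with p <= p_cut and absolute scaled value >= effect_cut."""
    return [
        i for i, (z, p) in enumerate(zip(scaled, pvalues))
        if p <= p_cut and abs(z) >= effect_cut
    ]
```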
Total RNA isolated (RNeasy Mini Kit, Qiagen) from human naive CD4+ T cells (n = 4, donors 12–15) was reverse transcribed using a high-capacity cDNA kit (Applied Biosystems, LifeTechnologies) following the manufacturer's instructions. Real-time PCR was performed using the FastStart Universal SYBR Green Mastermix (Roche) and the ViiA 7 Real-Time PCR System (Applied Biosystems, Life Technologies). The specific primers (Metabion, Munich, Germany) used for real-time PCR are listed in supplemental Table S4. All amplifications were carried out at least in technical duplicate.
For Western blot analysis, equal protein amounts of naive CD4+ T cells isolated from four additional blood donors (n = 4, donors 16–19) in lysis buffer (1% Nonidet P-40, 10 mM NaCl, 10 mM Tris, pH 7.6, 2 × EDTA-free complete protease inhibitor mixture (Roche) in ddH₂O) were mixed with NuPAGE LDS Sample Buffer (Novex, LifeTechnologies, Thermo Fisher Scientific, Waltham, MA), boiled at 95 °C for 10 min, loaded on 10% Bis-Tris protein gels (NuPAGE Novex, LifeTechnologies), and separated for 1.5 h at 120 V by SDS-PAGE (XCell SureLock Mini-Cell) in MOPS SDS running buffer (NuPAGE Novex, LifeTechnologies). The proteins were then transferred to a polyvinylidene fluoride (PVDF) membrane (Merck Millipore, Darmstadt, Germany) using the XCell I Blot Module (Novex, LifeTechnologies) at 60 V (limited to 500 mA) for 90 min. Nonspecific binding sites were blocked by incubation in 3–5% nonfat dry milk in 1 × PBS for 1 h. The membranes were then separately incubated with the primary antibodies (supplemental Table S5) in 3–5% nonfat dry milk in 1 × PBS overnight at 4 °C. Membranes were washed three times with 3–5% nonfat dry milk in 1 × PBS and incubated with the corresponding HRP-linked secondary antibody (supplemental Table S5) in 3–5% nonfat dry milk in 1 × PBS for 2 h at 4 °C. Membranes were washed three times with 1 × PBS containing 0.02% Tween and then incubated with the HRP substrate (Amersham Biosciences ECL Prime Western blotting Detection Reagent, GE Healthcare, Pittsburgh, PA) for 5 min. Chemiluminescence on all Western blots was recorded with an ECL ChemoCam Imager and ChemoStar software (INTAS, Göttingen, Germany). A Western blot assay was considered positive if a band was detected at the expected molecular weight of the protein.
For the identification of the N-glyco surface proteome, human naive CD4+ T cells were isolated from four biological replicates, and the previously described PAL (periodate oxidation and aniline-catalyzed oxime ligation)-based cell surface protein labeling and enrichment technology (18) was used with subsequent quantitative LC-MS/MS (PAL-qLC-MS/MS) (Fig. 1, blue track). Applying the stringent identification threshold described in the methods section, 242 proteins expressed on naive and activated CD4+ T cells were identified. Based on the subcellular localization (UniProt_SL) annotation given by the UniProtKB/Swiss-Prot database, 173 of them were included in the cell surface protein atlas (a complete list of these proteins is presented in Table I). Notably, for 131 of the 173 identified proteins, the UniProt_SL annotation “plasma membrane,” “(cell) membrane,” or “secreted” was given with experimental evidence, or an affiliation as a CD molecule had already been assigned (supplemental Fig. S3). For the remaining 42 proteins, our data set expands existing knowledge and provides direct experimental evidence for their plasma membrane-associated localization.
To further characterize the obtained protein data set, an extensive NCBI PubMed literature search and an additional UniProtKB/Swiss-Prot search were performed. Confirming the robustness of our approach, most of the 173 identified proteins (86%) had been described in the context of the activation/proliferation of T cells or other immune cells. For 24 of the identified proteins, this literature search found no cocitation with “T cell and/or activation/proliferation/differentiation” (Table I). To validate the expression of these proteins, naive CD4+ T cells were collected from four additional donors (donors 12–15), and all membrane-anchored proteins of this group were confirmed at the transcriptional level by real-time PCR (supplemental Fig. S4A). Validated antibodies were available for four of these targets (EVI2A, NPTN, RNF149, TMEM2). Using samples from four further biological replicates (donors 16–19), the expression of these markers was additionally confirmed at the protein level by an alternative technique, Western blot analysis (supplemental Fig. S4B). The protein expression of the remaining markers remains to be demonstrated once suitable antibodies become commercially available.
To investigate the changes in protein expression levels on the surface of human naive CD4+ T cells during the first hours of activation, quantitative label-free proteomics was applied. To this end, human naive CD4+ T cells (donors 1–4) were either left unstimulated or activated with αCD3/αCD28 for 3, 6, 12, 24, or 48 h and subjected to PAL-qLC-MS/MS sample preparation. A principal component analysis showed highly concordant protein abundances among the different blood donors at the respective time points (supplemental Fig. S5). The large majority of proteins changed in abundance within the first 3 h of activation and/or at the later time point of 24 h of stimulation. Six proteins (CD98LC, CD120b, CD218a, CD258, CD272, CD357) could not be detected on the surface of unstimulated naive CD4+ T cells. Ten proteins (SBSN, DAG1, Fas-(2), HLA-B-(3), RNF149, S1PR4, SLC1A4, SLC6A6, TNFRSF18, TNFSF8) showed donor-dependent expression on unstimulated naive CD4+ T cells in the PAL-qLC-MS/MS data set but were expressed at later time points of stimulation.
Unsupervised clustering (Fig. 2A) divided the 173 proteins identified on the surface of human naive and activated CD4+ T cells into three clusters according to their dynamic changes in protein abundance (supplemental Table S6). Cluster 1 consists of 40 proteins and Cluster 2 of 25 proteins. Both clusters show a rapid decrease in expression within the first hours of activation. Whereas expression in Cluster 1 increases again at later time points, expression in Cluster 2 remains low over time and does not return to or exceed its initial value. Cluster 3 is the largest, containing 108 proteins. The expression of these proteins changes only slightly during the first hours of activation but shows strong up-regulation after 24 h of activation.
To gain deeper insight into which kinds of proteins are present in the different expression clusters and what they have in common, a Gene Ontology (GO) enrichment analysis was performed. The Generic GO TermFinder algorithm was applied to identify enriched GO terms shared among the proteins in each cluster. To reduce the resulting list of enriched GO terms, REVIGO was used to summarize and visualize the GO term results as treemaps (Fig. 2B; for more information see supplemental Fig. S6). As expected, some GO terms, such as immune system process or response to stimulus, are shared by all clusters; however, there are also very specific hits for each cluster. Proteins in Cluster 1 are primarily involved in migration, adhesion, and response to wounding. Cluster 2, characterized by a rapid expression change within the first hours of activation, is enriched for proteins responsible for T-cell costimulation and activation. Cluster 3 covers proteins whose expression increases at a later time point and is especially enriched for transmembrane transport. Fig. 2C shows the dynamic expression profiles of selected proteins involved in various processes enriched in the preceding GO term analysis. The depicted proteins are generally associated with more than one GO term but were selected as representatives of one GO term supercluster. The graphs at the bottom do not refer to any GO term supercluster but illustrate the expression changes during activation of those proteins that had not been cocited in the context of T-cell biology so far.
To independently confirm the data obtained by mass spectrometry and to establish the cell surface atlas of naive CD4+ T cells by an alternative technique, human naive CD4+ T cells were immunostained with 332 different PE-labeled monoclonal antibodies against known cell surface proteins (Fig. 1, orange track). Because the MS-based data clearly showed that most proteins undergo their strongest expression changes either at an early (3 h) or at a later (24 h) time point, the flow cytometry analysis was restricted to these time points. Using a commercially available cell surface screening kit, 123 markers were detected on the surface of naive and/or activated CD4+ T cells (Table I). According to their expression changes within 3 and 24 h of αCD3/αCD28 stimulation, these surface markers could be divided into three clusters (supplemental Table S7), in agreement with the results obtained via the PAL-qLC-MS/MS approach (Fig. 3A). Fig. 3B displays flow cytometry histograms for selected cell surface markers of each cluster and illustrates their expression profiles during activation in detail.
The comparison between the PAL-qLC-MS/MS and the flow cytometry analyses showed that 56 proteins were detected only by flow cytometry-based cell surface phenotyping. A disagreement between the two techniques was observed for only five proteins, which were identified by PAL-qLC-MS/MS but were not detectable with the corresponding antibody at any examined time point in the flow cytometric profiling. Markers not detected in the flow cytometry-based surface screen (n = 209) might represent potential negative markers for naive CD4+ T cells.
Collectively, combining the mass spectrometry-based data set with the targeted flow cytometry screening panel expanded the set of proteins known to be present on the cell surface. This permitted the generation of an extensive surface protein atlas of human naive and activated CD4+ T cells containing a total of 229 surface proteins (all identified proteins are summarized in Table I).
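The atlas size follows from simple set arithmetic on the counts reported in this study: 173 proteins from PAL-qLC-MS/MS, 123 detected in the antibody screen of 332 markers, and 67 detected by both. The short sketch below makes that bookkeeping explicit; the helper name `atlas_counts` is hypothetical and introduced only for illustration.

```python
def atlas_counts(ms_total, flow_total, overlap, antibodies_screened):
    """Derive the atlas bookkeeping from per-technique counts.

    Uses inclusion-exclusion: |MS or flow| = |MS| + |flow| - |MS and flow|.
    """
    return {
        "flow_only": flow_total - overlap,
        "atlas_size": ms_total + flow_total - overlap,
        "not_detected_by_flow": antibodies_screened - flow_total,
    }


counts = atlas_counts(ms_total=173, flow_total=123, overlap=67, antibodies_screened=332)
print(counts)
```

With the underlying identifier tables at hand, the same quantities are simply `len(flow_ids - ms_ids)` and `len(ms_ids | flow_ids)` on Python sets.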
The combination of the nontargeted PAL-qLC-MS/MS approach and the targeted antibody-based cell surface screen led to the generation of a cell surface protein atlas. Each technique, however, has technical limitations. Although PAL-qLC-MS/MS in general offers a hypothesis-free, system-wide analysis, it is limited to the identification of N-glycosylated surface proteins, and low-abundance proteins are often difficult to detect. Conversely, the flow cytometry-based surface screen cannot identify novel or uncharacterized surface markers, because it relies on a predefined, commercially available panel of antibodies against known surface markers.
To circumvent these technical limitations and to identify additional proteins not covered by the two proteomic approaches, a surface expression atlas was created based on transcriptomic data combined with subcellular annotations supported by experimental evidence or with predicted subcellular localization (Fig. 1, green track). For this purpose, human naive CD4+ T cells from four biological replicates (donors 8–11) were either left unstimulated or exposed to αCD3/αCD28 for 3 h, to investigate changes in gene expression during the early time window of TCR activation, and were then subjected to whole-genome microarray analysis. Genes coding for cell surface proteins were identified by bioinformatics analysis among the genes detected in the genome-wide scan. Specifically, we used (1) experimentally verified or (2) probable UniProtKB subcellular localization annotations and, where neither was available, (3) predicted the subcellular localization (LocTree3) and transmembrane helices (TMHs; PolyPhobius) (supplemental Table S3). This strict selection process identified 927 genes coding for cell surface proteins, 141 of which displayed a significant expression change within the first 3 h of activation (FC ≥ 1.5, corrected p ≤ 0.05) (Fig. 4).
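The significance filter (FC ≥ 1.5, corrected p ≤ 0.05) can be sketched as follows. The text does not state which multiple-testing correction was applied, so Benjamini-Hochberg is assumed here; we also assume that the fold-change cut-off applies in both directions (|log2 FC| ≥ log2 1.5). Gene names and values in the example are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(results, fc_cutoff=1.5, alpha=0.05):
    """Genes passing |log2 FC| >= log2(fc_cutoff) and BH-adjusted p <= alpha.

    results: {gene: (log2 fold change, raw p-value)}
    """
    genes = list(results)
    adjusted = benjamini_hochberg([results[g][1] for g in genes])
    min_log2fc = math.log2(fc_cutoff)
    return sorted(
        g for g, q in zip(genes, adjusted)
        if abs(results[g][0]) >= min_log2fc and q <= alpha
    )


# Hypothetical microarray results: (log2 FC 3 h vs. unstimulated, raw p-value).
results = {
    "GENE_UP": (3.0, 0.001),
    "GENE_SMALL_FC": (0.2, 0.0001),
    "GENE_NOT_SIG": (1.0, 0.2),
    "GENE_DOWN": (-1.1, 0.004),
}
print(significant_genes(results))
```

Applying the correction across all detected genes, rather than only the 927 surface candidates, changes the adjusted values; which of the two was done is not stated in the text.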
A comparison between the surface proteins identified by the proteomic approaches (PAL-qLC-MS/MS and the flow cytometry-based surface screen) and the genes coding for surface proteins identified via bioinformatics analysis showed that 53% of the proteins identified by mass spectrometry and flow cytometry were also found in the transcriptional surface expression data set. To extend the analysis further, we additionally generated a putative cell surface data set, for which we relaxed the bioinformatics selection criteria. Here, we included proteins that the UniProtKB database describes as potentially located on the extracellular side of the plasma membrane, e.g. lipid-anchored or peripheral membrane proteins for which it is not known whether they face the extracellular side (Table I and supplemental Table S3). This putative data set contributes 248 additional genes potentially encoding cell surface proteins. Using the putative cell surface data set for the comparison with the proteomic approaches, the overlap between the transcriptomic and proteomic data sets reached 58%. The overlap increased further to 83% when proteins annotated in UniProtKB only as “membrane” proteins (without further information on their subcellular localization) were also considered.
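The stepwise increase in overlap (53%, 58%, 83%) comes from comparing the proteomic set against progressively more permissive annotation tiers. A minimal sketch of that comparison, using hypothetical gene sets and an illustrative helper `tiered_overlap`, is shown below; it does not reproduce the study's data.

```python
def tiered_overlap(proteomic, tiers):
    """Cumulative fraction of `proteomic` covered as annotation tiers are added.

    proteomic: set of gene symbols detected at the protein level
    tiers: list of (label, set of gene symbols), strictest tier first
    """
    covered = set()
    fractions = []
    for label, genes in tiers:
        covered |= genes  # each tier extends, never replaces, the previous ones
        fractions.append((label, len(proteomic & covered) / len(proteomic)))
    return fractions


# Hypothetical gene sets, for illustration only.
proteomic = {"A", "B", "C", "D"}
tiers = [
    ("strict surface", {"A", "B", "X"}),
    ("+ putative surface", {"C"}),
    ("+ 'membrane' only", {"Y"}),
]
print(tiered_overlap(proteomic, tiers))
```

Because the tiers are cumulative, the reported fraction can only stay the same or grow as criteria are relaxed, which matches the monotone sequence of percentages in the text.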
Fig. 5 depicts the combined proteomic and transcriptomic data sets and compares the quantitative expression trends of all detected cell surface proteins, whether identified by both PAL-qLC-MS/MS and flow cytometry or by only one of the two, with the respective mRNA expression data (mRNA data are shown regardless of whether the respective gene was annotated or predicted as encoding a cell surface protein). For a large set of cell surface proteins, expression patterns at the transcriptional and protein levels correlated only weakly at the early time points. Fig. 5 thus serves as a reference, presenting a scheme of numerous surface and CD markers detected on naive CD4+ T cells and their changes during T-cell activation, as revealed by combined transcriptome and proteome technologies.
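The strength of an mRNA-protein relationship like the one described for Fig. 5 is commonly summarized with a correlation coefficient over paired fold changes. The sketch below computes Pearson and rank-based Spearman coefficients in plain Python; the choice of these statistics and the paired values are assumptions for illustration, not the analysis reported here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired log2 fold changes (3 h vs. unstimulated) for five genes.
mrna_log2fc = [2.1, 0.3, -0.5, 1.2, 0.0]
protein_log2fc = [0.4, -0.2, 0.1, 1.5, -0.6]
print(round(pearson(mrna_log2fc, protein_log2fc), 2),
      round(spearman(mrna_log2fc, protein_log2fc), 2))
```

Spearman's coefficient is less sensitive to a few very large fold changes, which is why it is often reported alongside Pearson's for omics comparisons; both raise a ZeroDivisionError for constant input.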
To select from this reference map the most promising surface markers for further functional studies, we combined the transcriptomic and proteomic data sets at the level of significant regulation. We identified a distinct group of 32 cell surface markers that were measurable at the RNA as well as at the protein level (PAL-qLC-MS/MS and/or flow cytometry) and showed a significant expression change in at least one technical approach at one or more stimulation time points (Fig. 6). As expected, most of these proteins belong to the cluster of differentiation, and their roles in T-cell biology are therefore already well characterized. However, Fig. 6 highlights one cell surface marker, EVI2A, whose expression was also confirmed by Western blot analysis. No definite general function has been described for this protein so far, and in particular no role during activation; further studies are essential to identify its potential function in the context of T-cell activation.
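The selection rule for the 32 markers (detected at the RNA level, detected at the protein level by at least one proteomic technique, and significantly regulated in at least one approach at one or more time points) translates into a simple boolean filter. The record layout `MarkerEvidence`, its keys, and the example entries are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MarkerEvidence:
    symbol: str
    rna_detected: bool
    ms_detected: bool
    flow_detected: bool
    # Significance flags keyed by (approach, time point), e.g. ("rna", "3h"),
    # ("ms", "24h"), ("flow", "3h"); the keys are illustrative only.
    significant: dict = field(default_factory=dict)


def is_candidate(marker):
    """True if the marker meets all three selection criteria."""
    on_protein_level = marker.ms_detected or marker.flow_detected
    regulated = any(marker.significant.values())
    return marker.rna_detected and on_protein_level and regulated


markers = [
    MarkerEvidence("GENE_A", True, True, False, {("rna", "3h"): True}),
    MarkerEvidence("GENE_B", True, False, True, {("flow", "24h"): False}),
    MarkerEvidence("GENE_C", False, True, True, {("ms", "3h"): True}),
]
candidates = [m.symbol for m in markers if is_candidate(m)]
print(candidates)
```

GENE_B fails because it is not significantly regulated anywhere, and GENE_C fails because it lacks RNA-level detection, even though it is significant on protein level.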
In this study, we present an extensive surface atlas describing proteins located on the surface of human naive and activated CD4+ T cells. These cells are the common precursor for all other T helper cell subsets and are therefore of fundamental importance for future developments in immunology and clinical application.
We applied a combined omics approach to link proteomic and transcriptomic data. First, by combining two complementary proteomic techniques, mass spectrometry and flow cytometry, we created an extensive cell surface protein atlas of human naive and activated CD4+ T cells containing 229 proteins. All of these proteins were measured directly at the protein level and thus represent an experimentally validated set of cell surface proteins involved in the T-cell activation process. Next, we built a cell surface expression atlas based on transcriptomic data, annotations supported by experimental evidence, and predicted subcellular localization; it identified 927 genes coding for proteins located on the surface of human naive and activated CD4+ T cells, 101 of which were confirmed in the combined proteomic data set (Table I).
In detail, using the PAL-qLC-MS/MS approach, we identified 173 N-glycoproteins on the surface of human naive CD4+ T cells with high confidence. An extensive literature search revealed that 24 of these proteins had not previously been mentioned in the context of T-cell biology. We validated all membrane-anchored markers among them (n = 20) by qPCR; however, confirmation at the protein level by an alternative technique was possible for only four targets. Protein-level validation of the remaining markers is still missing because validated detection tools, such as monoclonal antibodies for flow cytometry, are not available for them. Designating these 24 markers as novel T-cell surface proteins would therefore be premature, because their expression needs to be substantiated at the protein level with an alternative technique. Very recently, however, Bausch-Fluck et al. published a mass spectrometry-derived cell surface protein atlas providing a surfaceome snapshot of 78 different human and murine cell types. Among them is a mixed CD4+/CD25− T-cell population containing nonactivated naive as well as memory T cells (40). This population is clearly distinct from the pure naive CD4+/CD45RA+ T-cell population investigated here, and that study does not focus on the activation process. Nevertheless, 84 proteins identified by our PAL-qLC-MS/MS approach were confirmed in this data set, including nine (ECE1, NPTN, RNF149, SLC1A4, SLC4A7, SLC5A3, SLC7A1, SYPL1, TMEM2) of the 24 markers not previously linked to T cells. The next step will therefore be to focus on these candidates and to confirm their status as novel surface proteins. Furthermore, functional studies will be required to elucidate their potential relevance to T-cell biology.
Unsupervised clustering of the whole mass spectrometry-based data set grouped the identified proteins into three distinct dynamic expression profiles, each characterized by specific Gene Ontology terms. Interestingly, 38% of the proteins (Clusters 1 and 2) showed a strong down-regulation within the first hours of activation. In general, cell surface proteins are subject to fast turnover, which enables them to respond to any kind of stimulation. Markers in Clusters 1 and 2, such as Lck, CD3, CD4, CD28, and CD44, are present in the immunological synapse and are known for their early expression change during stimulation. Others, such as SEMA7A or CD274 (Cluster 3), are late activation markers that act mainly as negative regulators, leading to the termination of T-cell responses. In line with this, Naramura et al. (41) showed that T-cell activation is negatively regulated by clearance/internalization of the engaged TCR from the cell surface, a process that seems to be essential for terminating the TCR signal.
To expand and corroborate the data set of surface proteins generated by PAL-qLC-MS/MS with an independent technique and to detect further surface markers, an antibody-based flow cytometry approach was performed. We screened the expression of 332 surface antigens with monoclonal antibodies and verified 67 proteins detected by PAL-qLC-MS/MS, whereas 56 additional surface proteins were identified solely by flow cytometry.
Metcalfe et al. (42) recently used a different mass spectrometry-based approach, focusing on surface proteins of murine leukocytes that contain redox-labile disulfide bonds. Because redox conditions change during immune activation, mild reducing conditions comparable with those expected during activation were applied to the murine 2B4 T-cell hybridoma cell line. In this way, a large set of 87 membrane proteins containing labile disulfide bonds was identified, half of which were confirmed by our human proteomic data set.
Of note, our proteomic surface approach is limited to N-glycoproteins in the case of PAL-qLC-MS/MS and restricted to available antibodies in the case of flow cytometry. To overcome this drawback and to estimate the number of proteins that might potentially be expressed on the surface of human naive CD4+ T cells, we complemented it with the transcriptomic approach. In particular, we identified genes whose corresponding proteins were annotated in UniProtKB as located on the cell surface or in the plasma membrane. If no experimentally verified annotation was available in UniProtKB, we used predicted subcellular localizations from the recently published LocTree3 (accuracy 80%), further supported by predicted transmembrane helices (one or more for plasma membrane proteins). We are aware that the selection criteria for cell surface protein annotation are stricter in the transcriptional data set than in the proteomic data set. However, because the PAL-qLC-MS/MS and/or flow cytometry approaches provide experimental evidence for the subcellular localization of the proteins, we tightened the criteria for the transcriptomic data set to reduce false positive results.
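The tiered localization rule can be written as a small decision function: the highest-ranking UniProtKB evidence that exists decides, and only genes without any UniProtKB location fall back to the LocTree3 prediction combined with at least one predicted transmembrane helix. The `GeneRecord` fields and the location strings compared against are assumptions for illustration; they do not reflect the exact vocabulary of UniProtKB, LocTree3, or PolyPhobius output.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Location strings treated as "cell surface"; an assumed, simplified vocabulary.
SURFACE_LOCATIONS = {"cell membrane", "cell surface", "plasma membrane"}


@dataclass
class GeneRecord:
    symbol: str
    uniprot_experimental: Optional[str] = None  # experimentally verified location
    uniprot_probable: Optional[str] = None      # "probable" location annotation
    loctree3: Optional[str] = None              # predicted location class
    predicted_tmh: int = 0                      # number of predicted TM helices


def classify_surface(rec: GeneRecord) -> Tuple[bool, str]:
    """Return (is_surface, evidence tier) following the tiered rule."""
    # Tiers 1 and 2: the highest-ranking UniProtKB annotation present decides.
    for tier, location in (("experimental", rec.uniprot_experimental),
                           ("probable", rec.uniprot_probable)):
        if location is not None:
            return location.lower() in SURFACE_LOCATIONS, tier
    # Tier 3: prediction, accepted only with at least one predicted TM helix.
    if rec.loctree3 is not None:
        accepted = rec.loctree3.lower() == "plasma membrane" and rec.predicted_tmh >= 1
        return accepted, "predicted"
    return False, "none"


for rec in (GeneRecord("GENE_A", uniprot_experimental="Cell membrane"),
            GeneRecord("GENE_B", uniprot_probable="Cytoplasm"),
            GeneRecord("GENE_C", loctree3="plasma membrane", predicted_tmh=2),
            GeneRecord("GENE_D", loctree3="plasma membrane", predicted_tmh=0)):
    print(rec.symbol, classify_surface(rec))
```

Returning the evidence tier alongside the decision keeps track of how each gene entered the surface set, which is useful when the criteria are later relaxed for a putative data set.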
The transcriptomic analysis revealed 927 transcripts that fulfilled the selection criteria. Recently, da Cunha et al. (43) generated a comprehensive catalog of cell surface proteins based on the human genome sequence (National Center for Biotechnology Information build 36.1). Using bioinformatics tools, they identified a set of more than 3700 genes believed to encode cell surface proteins of human cells. This number is in line with other estimates that around 26% of all human genes code for proteins located at the cell surface (44). The smaller number of surface proteins identified in our transcriptomic data set compared with that study could be explained by the facts that (1) we focused specifically on the subset of naive CD4+ T cells and (2) we used a much stricter selection approach, as described above.
Of the 927 transcripts selected in our bioinformatics analysis, 141 stand out because they were significantly differentially expressed after 3 h of αCD3/αCD28 activation. The 10 genes with the largest expression changes are SEMA7A, FASLG, CLIC4, FFAR3, CD200, TMEM88, TNF, SLC7A5, TNFSF14, and CD69. For most of them, the role in T-cell activation and differentiation is well known. Others, however, such as FFAR3 (free fatty acid receptor 3), which was recently described in a murine study as a crucial sensor on dendritic cell precursors for the transport of short-chain fatty acids (45), need to be investigated further at the protein level in the context of naive CD4+ T cells.
A comparison between the transcriptomic and the proteomic data sets showed that 53% of the proteins identified by mass spectrometry and flow cytometry would have been identified as cell surface proteins by the bioinformatics approach alone. An even greater overlap of 83% between the transcriptional and the proteomic approaches was reached when the putative cell surface data set and the more general annotation “membrane” (subcellular localization not further classified) were included in the analysis. The putative data set takes additional UniProtKB subcellular annotations such as “lipid-anchor” or “peripheral” (side of the plasma membrane to which the protein is attached not specified) into account and contributes 248 additional genes potentially encoding cell surface proteins. On the one hand, this comparison illustrates the limitations of relying solely on transcriptional approaches supported by bioinformatics analysis: 47% of the identified proteins are not classified as cell surface proteins by the strict transcriptional data set, underscoring the essential need for proteomic approaches. On the other hand, the bioinformatics approach used to identify cell surface proteins (including the putative data set) provides a large collection of further genes (n = 1074) potentially coding for cell surface proteins, which have yet to be experimentally validated at the proteomic level on the surface of human naive CD4+ T cells. That these surface membrane protein-coding genes identified by the bioinformatics analysis were not detected by the two proteomic approaches might be explained by the facts that (1) not all genes present at the RNA level are translated into proteins, (2) proteins are subject to post-translational modification, translocation, and degradation, and (3) not all of the genes code for proteins containing a glycosylation site.
UniProtKB proposes a potential N-glycosylation site for 69% of the genes coding for cell surface proteins (supplemental Table S3). One might therefore speculate that the PAL approach misses around 30% of the proteins expressed on the surface of naive CD4+ T cells. Because we were aware of this limitation, we added the flow cytometry approach and the transcriptome data set in order to expand the cell surface atlas and to provide the interested reader with a promising collection of further potential surface proteins.
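A potential N-glycosylation site usually refers to the N-X-S/T sequon, in which X is any residue except proline. The sketch below scans a protein sequence for such motifs. It is a simplification assumed for illustration: a sequon is necessary but not sufficient for glycosylation, it must lie in an extracellular or luminal segment to be used, and the UniProtKB annotations referred to above may rest on additional evidence. The example peptide is hypothetical.

```python
import re

# Zero-width lookahead so that overlapping motifs (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(sequence):
    """1-based positions of Asn residues that start an N-X-S/T motif (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical peptide, not taken from any protein in the atlas.
print(sequon_positions("MNGSANPTNNST"))
```

In this toy peptide, "NPT" is skipped because proline occupies the X position, while the overlapping "NNS" and "NST" motifs are both reported.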
In general, we observed only a weak correlation between the expression changes of transcripts and of the corresponding proteins, as previously shown in various studies (46, 47). Nonetheless, by combining the transcriptional and the proteomic approaches (PAL-qLC-MS/MS and/or flow cytometry), we found that 32 of the 229 cell surface proteins showed a significant expression change in at least one technical approach at one or more stimulation time points. As expected, most of these proteins, e.g. CD69 (a T-cell activation marker) and CD108 (known to modulate inflammation and T-cell-mediated immune responses), are well characterized in the context of T-cell biology. Within this data set, 21 cell surface proteins are already annotated as CD molecules. Six proteins belong to the solute carrier family, and one of them, SLC7A5 (CD98LC), was recently described in a murine study (48) as the central large neutral amino acid transporter, indispensable for the effector differentiation and clonal expansion of T cells upon antigenic TCR stimulation. In the current human study, this surface transporter was also among the top significantly regulated hits at the RNA as well as at the protein level. SLC7A5 was strongly up-regulated upon the general αCD3/αCD28 TCR stimulus, as previously shown for CD4+/CD25− T cells (49). Further members of the SLC family (SLC4A2, anion exchanger 2; SLC2A3, GLUT-3; SLC29A1, equilibrative nucleoside transporter 1; SLC1A5, neutral amino acid transporter B; SLC12A2, Na+/K+/Cl− cotransporter) were detected, but none of them has been assigned to the cluster of differentiation (CD). Although their general role as transporters is described, it remains to be validated whether they play a role in T-cell signaling as important as that of SLC7A5. Furthermore, our analysis identified EVI2A (ecotropic viral integration site 2A) as an interesting naive CD4+ T-cell marker.
Together with EVI2B, EVI2A lies in a long intron of NF1 (neurofibromin 1), which is known to be completely deleted in some patients with neurofibromatosis (50, 51). The EVI2 locus (the murine homolog of human EVI2A) is also known to be an integration site for the chronic murine leukemia virus, and integration of the virus at this locus upon infection may lead to myeloid leukemias (52). The protein is thought to assemble into complexes, either as a homodimer or as a heterodimer with other plasma membrane proteins, to form a cell surface receptor (information given by UniProtKB). Its neighboring gene, EVI2B, which was also detected at the transcriptional and protein levels in our cell surface approach but was not significantly regulated, was accepted as a new member of the CD family at the Human Leukocyte Differentiation Antigens (HLDA) Workshop in 2010 (53) and designated CD361. EVI2B was assigned as a new B-cell marker, but its expression on T cells, monocytes, granulocytes, and NK cells was also demonstrated, suggesting that this protein might have a general role in cells of the immune system. Until now, however, neither EVI2B nor EVI2A has been functionally characterized, and their functions remain unknown. Future studies are therefore essential to identify their potential function in the context of T-cell activation.
The high translational potential of technologies focusing on cell surface proteins has already been shown by Mirkowska et al. (54), whose analysis of the surface composition of xenografts from 19 patients with B-cell precursor acute lymphoblastic leukemia led to the identification of new diagnostic leukemia-associated candidate markers. In a second study, the surface proteome of myeloid leukemia cells was analyzed to identify target antigens that may be used for the development of therapeutic antibodies (55). Cell surface proteins are easily accessible targets for drugs, especially for antibodies. Seven proteins in our data set are already described as targets of approved therapeutic antibodies: therapeutic antibodies against CD52 and CD152 are applied in cancer therapy, whereas CD3, CD25, CD11a, CD49d, and CD126 are targeted in the context of autoimmune diseases (www.antibodysociety.org, www.immunologylink.com). Furthermore, the high translational potential of the presented cell surface atlas becomes evident when the identified proteins (n = 229) are subjected to a DrugBank database search (56): in total, 60 proteins identified in the current study (Table I) are targets of approved drugs (n = 51) or of experimentally validated/investigational drugs (n = 9).
Taken together, the combination of the transcriptomic and proteomic data sets enabled us to generate an extensive, experimentally based cell surface atlas of human naive and activated CD4+ T cells and to introduce a promising collection of further candidate proteins. Furthermore, the cell surface atlas can serve as a reference for mapping cell surface proteins on human naive and activated CD4+ T cells. It presents known markers and proposes potential new drug targets, such as EVI2A, that could be used to modify T-cell activation.
We thank B. Grothe (Division of Neurobiology, Department Biology II, Ludwig-Maximilians-Universität München, Germany) for support of the studies.
Author's contribution: AG, SMH, MS, CBSW, KS: designed experiments; AG, CvT, HK, KD, KS: performed experiments; AG, SMH, CvT, EK, TG, BK, LK, KS: analyzed data; CBSW, KS: supervised the project; SMH, CvT: contributed to manuscript preparation; AG, KS: wrote the final version of the manuscript.
*E.K. is supported by NIH grant GM095315. T.G. is supported by the Alexander von Humboldt Foundation through the German Federal Ministry for Education and Research and by the Ernst Ludwig Ehrlich Studienwerk. M.S. is supported by the German Research Foundation and the Else Kröner-Fresenius Stiftung. B.K. is supported by the German Research Foundation (SPP1395/InKoMBio Busch 900/6-1).
This article contains supplemental Figs. S1 to S6 and Tables S1 to S7.
The MS proteomics data in this paper have been deposited in the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the data set identifier PXD001432. The RNA array data set was deposited in NCBI's Gene Expression Omnibus and is accessible through GEO Series accession number GSE61983. This work was supported by iMed - the Helmholtz Initiative on Personalized Medicine. This work is part of the PhD thesis of A.G. and is supported by HELENA, Helmholtz Graduate School Environmental Health, Helmholtz Zentrum München, Neuherberg, Germany.
1 The abbreviations used are: