As the reliable identification of proteins by tandem mass spectrometry becomes increasingly common, the full characterization of large data sets of proteins remains a difficult challenge. Our goal was to survey the proteome of a human T-cell lymphoma-derived cell line in a single set of experiments and present an automated method for the annotation of lists of proteins. A downstream application of these data includes the identification of novel pathogenetic and candidate diagnostic markers of T-cell lymphoma.
Total protein isolated from cytoplasmic, membrane, and nuclear fractions of the SUDHL-1 T-cell lymphoma cell line was resolved by SDS-PAGE, and the entire gel lanes were digested and analyzed by tandem mass spectrometry. Acquired data files were searched against the UniProt protein database using the SEQUEST algorithm. Search results for each subcellular fraction were analyzed using INTERACT and ProteinProphet. All protein identifications with an error rate of less than 10% were directly exported into Excel and analyzed using GOMiner (NIH/NCI). The Gene Ontology molecular function and cell location data were summarized for the identified proteins, and the results were exported as user-interactive directed acyclic graphs.
A total of 1105 unique proteins were identified and fully annotated, including numerous proteins that had not been previously characterized in lymphoma, in functional categories such as cell adhesion, migration, signaling, and stress response. This study demonstrates the utility of currently available bioinformatics tools for the robust identification and annotation of large numbers of proteins in a batchwise fashion.
Identification of proteins by tandem mass spectrometry is becoming increasingly popular. However, because of the large amounts of data generated, fully characterizing the identified proteins becomes difficult when hundreds or thousands of proteins have been found. Our goal was to perform a comprehensive analysis of the entire proteome of a human T-cell lymphoma cell line in a single set of experiments. The SUDHL-1 cell line is derived from anaplastic large-cell lymphoma (ALCL), and carries the t(2;5)(p23;q35) chromosomal aberration.1 Approximately 250 proteins expressed by this cell line have been documented in the published literature since the identification of the chromosomal translocation. We hypothesized that the unbiased characterization of a comprehensive set of proteins expressed by these cells would provide novel insights into the molecular circuitry, signaling pathways, oncogenes, tumor suppressors, and cytokines that may play a role in its pathogenesis, and lead to the development of novel disease-associated diagnostic markers. Here, we present a highly automated method for the annotation of lists of proteins in a batchwise fashion.
In order to decrease the complexity of the cell lysate and maximize the total number of proteins identified in the study, we separated the cell lysate into cytoplasmic, membrane, and nuclear fractions,2–4 which were further resolved by SDS-PAGE. Entire gel lanes were then cut into equal slices, digested, and analyzed by ion trap tandem mass spectrometry (Thermo, San Jose, CA). Protein database search results from SEQUEST (Thermo) were further evaluated using INTERACT and ProteinProphet (Institute for Systems Biology, Seattle, WA),5,6 and finally analyzed by GOMiner (NIH/NCI, Atlanta, GA).7 A total of 1105 proteins were identified and annotated using this strategy (Figure 1). Our study demonstrates the utility of currently available bioinformatics tools for the robust identification of proteins and mapping to Gene Ontology (GO) terms for large numbers of proteins in a batchwise fashion. Information obtained from this study could be useful in the identification of novel diagnostic markers and proteins that are biologically relevant to the pathogenesis of this T-cell lymphoma.
The SUDHL-1 cell line was obtained from American Type Culture Collection (Rockville, MD), and maintained in RPMI 1640 medium supplemented with 10% heat-inactivated fetal calf serum and antibiotic mixture (Gibco-BRL, Gaithersburg, MD). The cells were grown to confluency and samples (approximately 2 × 10⁷ cells each) were pelleted and stored at −80°C until analysis.
Cells were lysed using 0.5 mL RIPA buffer (25 mM Tris-HCl, 0.1% SDS, 1% Triton X-100, 1% sodium deoxycholate, 0.15 M NaCl, 1 mM EDTA) per pellet. Before each use, 0.1% protease inhibitor cocktail (Sigma, St. Louis, MO) was added to the required volume of RIPA buffer. Protein concentrations were estimated using the Bradford colorimetric method against known concentrations of BSA. Lysate from one to three cell pellets was combined to obtain the desired concentration of 2–3 mg/mL of total protein.
Total cell lysate was separated into cytoplasmic and nuclear fractions using the NE-PER Nuclear and Cytoplasmic Extraction Kit (Pierce, Rockford, IL); a membrane fraction was obtained using the Mem-PER membrane protein extraction kit (Pierce) according to the manufacturer’s instructions. To prepare the fractions for separation by SDS-PAGE, the PAGEprep Protein Cleanup and Enrichment Kit (Pierce) was also used. Each cellular fraction was then resolved on a 10% SDS-PAGE and visualized with mass spectrometry–compatible silver staining (Invitrogen, Carlsbad, CA).
After staining the gel, lanes of interest were excised into equivalent slices, destained, digested, and extracted using the Invitrogen protocol. Briefly, each gel slice was destained and washed, crushed and dried, then rehydrated in ammonium carbonate buffer. Freshly prepared sequencing grade lysine-C endopeptidase (Princeton Separations, Adelphia, NJ) was added (1:50) to each tube, and the tubes incubated at 37°C for 4 h. Sequencing-grade modified trypsin (Princeton) was then added (1:50) and the tubes returned to incubate at 37°C overnight. The digested peptides were then extracted from the gel pieces with 50% acetonitrile with 0.1% trifluoroacetic acid (TFA), and reduced to a final volume of 30 μL. For reproducibility, duplicate lanes were prepared for each cellular fraction (6 lanes total) and 32 slices per lane resulted in 192 samples to be analyzed.
A 15-μL aliquot of each sample was analyzed by automated nanoflow reverse-phase LC/MS using the LCQ Deca XP ion trap mass spectrometer as previously described.8 Digested peptides were injected by an autosampler and eluted with an acetonitrile gradient (0–60% B in 60 min; A = 5% acetonitrile with 0.4% acetic acid and 0.005% heptafluorobutyric acid, B = 95% acetonitrile with 0.4% acetic acid and 0.005% heptafluorobutyric acid) through a reverse-phase column (75 μm ID fused silica packed in-house with 10 cm of 5-μm C18 particles) at a flow rate of approximately 200 nL/min into the mass spectrometer. An electrospray voltage of 2.2 kV was used, with the ion transfer tube temperature set to 220°C. Peptide analysis was performed using data-dependent acquisition of one MS scan (400 to 2000 m/z) followed by MS/MS scans of the three most abundant ions in each MS scan. Normalized collision energy for MS/MS was set to 35%, with an isolation width of 1.7 amu. To obtain better peptide coverage throughout the 1-h gradient, dynamic exclusion was set to a repeat count of 2, with an exclusion duration of 3 min.
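The data-dependent acquisition logic described above can be illustrated with a minimal sketch. It is not the instrument firmware: the function name, the rounding of m/z values to one decimal place to decide whether two ions are "the same", and the omission of a repeat-duration window are all simplifying assumptions made for illustration. Only the top-three selection, the repeat count of 2, and the 3-min exclusion duration come from the text.

```python
TOP_N = 3                # MS/MS scans per MS scan (from the text)
REPEAT_COUNT = 2         # selections allowed before an ion is excluded (from the text)
EXCLUSION_SECONDS = 180  # 3-min exclusion duration (from the text)


def mz_key(mz):
    """Hypothetical binning: ions within the same 0.1 m/z bin count as one precursor."""
    return round(mz, 1)


def select_precursors(ms_scan, time_s, counts, excluded_until):
    """Choose up to TOP_N of the most intense ions that are not currently excluded.

    ms_scan        -- list of (mz, intensity) pairs from one MS scan
    time_s         -- acquisition time of the scan, in seconds
    counts         -- dict tracking how often each ion has been selected
    excluded_until -- dict mapping an ion to the time its exclusion ends
    """
    chosen = []
    for mz, _intensity in sorted(ms_scan, key=lambda ion: ion[1], reverse=True):
        key = mz_key(mz)
        if excluded_until.get(key, -1.0) > time_s:
            continue  # still on the exclusion list
        chosen.append(mz)
        counts[key] = counts.get(key, 0) + 1
        if counts[key] >= REPEAT_COUNT:
            # Selected often enough: exclude it and reset its counter.
            excluded_until[key] = time_s + EXCLUSION_SECONDS
            counts[key] = 0
        if len(chosen) == TOP_N:
            break
    return chosen
```

Calling `select_precursors` repeatedly on the same scan shows the intended effect: after the three dominant ions have each been fragmented twice, they are skipped for 3 min and less abundant ions become eligible for MS/MS.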
The MS/MS-acquired data were searched using the SEQUEST algorithm in Bioworks 3.1 (Thermo) against amino acid sequences in the UniProt protein database (9.27.2004 download, 190,183 entries). Protein search parameters included a precursor peptide mass tolerance of ±0.7 amu, fragment mass tolerance of ±0.1 amu, and carboxymethylation for cysteine residues. The search was constrained to tryptic peptides, with one missed enzyme cleavage allowed. The peptide matching criteria of a cross-correlation score (Xcorr) greater than 1.2 for +1 peptides, greater than 2.2 for +2 peptides, and greater than 3.2 for +3 peptides, and a delta correlation score (ΔCn) greater than 0.100 were used as the threshold of acceptance. When at least two peptides met these requirements per protein, additional peptides were then included to maximize the peptide coverage of the identification.
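The acceptance thresholds above amount to a charge-dependent filter followed by a two-peptide requirement. The sketch below shows one way to express that rule. The dictionary field names (`protein`, `peptide`, `charge`, `xcorr`, `delta_cn`) are hypothetical and do not reflect the SEQUEST .out file format, and counting distinct peptide sequences toward the two-peptide minimum is an assumption.

```python
from collections import defaultdict

# Thresholds taken from the text.
XCORR_MIN = {1: 1.2, 2: 2.2, 3: 3.2}  # minimum Xcorr by precursor charge state
DELTA_CN_MIN = 0.100
MIN_PEPTIDES = 2


def passes(hit):
    """Return True if a peptide hit exceeds the Xcorr and delta-Cn thresholds."""
    threshold = XCORR_MIN.get(hit["charge"])
    if threshold is None:
        return False  # charge states other than +1, +2, +3 are not covered
    return hit["xcorr"] > threshold and hit["delta_cn"] > DELTA_CN_MIN


def accepted_proteins(hits):
    """Group passing peptides by protein and keep proteins with enough peptides."""
    peptides = defaultdict(set)
    for hit in hits:
        if passes(hit):
            peptides[hit["protein"]].add(hit["peptide"])
    return {
        protein: sorted(found)
        for protein, found in peptides.items()
        if len(found) >= MIN_PEPTIDES
    }
```

The sketch stops at the acceptance decision; the subsequent step of adding further peptides to extend coverage for an accepted protein is not modeled.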
Next, all SEQUEST data (.dta) and output (.out) files from duplicate samples were combined and evaluated using INTERACT and ProteinProphet (Institute for Systems Biology). Data analysis from INTERACT and ProteinProphet improved the confidence of protein identification by best-fit distribution of probability scores specific to each data set, and reduced the overall risk of false-positive identifications.5,6 All protein identifications with an error rate of less than 10% were summarized and directly exported into Excel. Each cellular fraction was also analyzed for unique and shared protein identifications using the Perl script program (iadiff.pl) INTERACT Difference (Eng J, personal communication, January 2004).
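To make the 10% error-rate cutoff concrete, the sketch below estimates an error rate from a list of protein probabilities. It assumes the estimate is the mean of (1 − p) over all proteins at or above a minimum probability, which is a commonly described way of reading probability-based output; it is not taken from the ProteinProphet source code, and the function names are hypothetical.

```python
def estimated_error_rate(probabilities, min_probability):
    """Expected fraction of incorrect identifications among proteins kept.

    Assumes each probability is the chance that the identification is correct,
    so (1 - p) is the expected contribution of that protein to the error count.
    """
    kept = [p for p in probabilities if p >= min_probability]
    if not kept:
        return 0.0
    return sum(1.0 - p for p in kept) / len(kept)


def lowest_threshold_under(probabilities, max_error=0.10):
    """Find the smallest probability cutoff whose estimated error is below max_error."""
    for threshold in sorted(set(probabilities)):
        if estimated_error_rate(probabilities, threshold) < max_error:
            return threshold
    return None
```

Under this assumption, lowering the probability cutoff admits more proteins but raises the estimated error rate, so the chosen cutoff is the most permissive one that still stays below 10%.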
Finally, Excel’s data text-to-column command was used to prepare tables for each cell fraction with protein name, UniProt accession number, description, MW, and probability. UniProt accession numbers were saved as a .txt file and analyzed using GOMiner, which maps identified proteins to existing Gene Ontology (GO) terms.7 Both molecular function and cell location were summarized for each cellular fraction, and the entire list of proteins and results was exported to a Web browser as interactive directed acyclic graphs (DAGs). All software was used as supplied, without further modification or interfacing.
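The table-to-text step was done by hand in Excel, but the same preparation can be sketched in a few lines. The example assumes a tab-delimited table with a header row and a column named `accession`; both the column name and the one-accession-per-line output layout are assumptions for illustration, not documented GOMiner requirements.

```python
import csv
import io


def extract_accessions(table_text, column="accession"):
    """Return accession numbers from a tab-delimited table, in order, without repeats."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    accessions = []
    for row in reader:
        accession = row[column].strip()
        if accession and accession not in accessions:
            accessions.append(accession)
    return accessions


def write_accession_file(accessions, path):
    """Write one accession per line to a plain-text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(accessions) + "\n")
```

Removing repeated accessions before export keeps a protein that appears in several rows from being counted more than once in the downstream GO summary.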
In order to reduce the complexity of our sample, SUDHL-1 total lysate was separated into three cellular fractions using the NE-PER and Mem-PER extraction kits (Pierce). To further simplify the sample prior to analysis, each fraction was resolved by one-dimensional SDS-PAGE. Figure 2 displays the separation of proteins in each cellular fraction by gel electrophoresis and demonstrates a unique pattern of protein bands in each lane.
The excised gel slices were analyzed by tandem mass spectrometry using data-dependent scanning set to cycle from 400 to 2000 m/z, followed by MS/MS scans of the three most abundant ions. Dynamic exclusion was used to obtain a more complete survey of the peptides contained in each sample by automatic recognition and exclusion of ions that had been previously acquired. Analysis of the SDS-PAGE lanes obtained from the SUDHL-1 lysate cellular fractions resulted in a total of 6328 peptides, which were matched to 5401 known database entries and 1105 unique proteins. The peptides ranged in length from 7 to 29 amino acid residues, with the average size of identified peptides at 14 amino acid residues.
The total ion chromatogram (TIC) and representative MS/MS scans of dominant ions from band 22 of the SUDHL-1 nuclear fraction are shown in Figure 3. The TIC displays the ion trap scans from the 60-min HPLC gradient, with the y-axis scaled (100% response) to the most abundant peak seen during the 1-h analysis. The TIC window is followed by representative data-dependent acquisition of top ion candidates, and MS/MS scans for tryptic peptides that identified ribosomal protein S6 kinase.
A total of 1105 proteins were identified by tandem mass spectrometry in the three cellular fractions of the SUDHL-1 lysate. Only those proteins found in duplicate experiments were included in the final summary. As summarized in Table 1, the cytoplasmic fraction contained 553 proteins, while the membrane fraction contained 295 proteins and the nuclear fraction contained 274 proteins. Analysis using INTERACT Difference showed 26 proteins identified in both the cytoplasm and membrane fractions, 37 proteins in common between the cytoplasm and nuclear fractions, and 19 proteins identified in both the membrane and nuclear fractions. Only four proteins were identified by mass spectrometry in all three cellular fractions. As reported by GOMiner, Figure 4 shows the cellular location for the protein identifications in the (A) cytoplasmic, (B) membrane, and (C) nuclear fractions of the SUDHL-1 proteome, and demonstrates protein enrichment for each cellular fraction. Categories of molecular function for the identified proteins are displayed in Figure 4D.
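The comparison of shared and unique identifications across fractions is, in essence, a set operation on accession numbers. The sketch below is a stand-in for that idea, not a reimplementation of iadiff.pl, whose internals are not described here; the function name and report keys are hypothetical.

```python
from itertools import combinations


def compare_fractions(fractions):
    """Summarize overlap between fractions.

    fractions -- dict mapping a fraction name to a set of accession numbers
    Returns a dict with one entry per pair of fractions (the shared accessions),
    an "all" entry (accessions present in every fraction), and a "unique_total"
    entry (the number of distinct accessions across all fractions).
    """
    report = {}
    for first, second in combinations(sorted(fractions), 2):
        report[(first, second)] = fractions[first] & fractions[second]
    report["all"] = set.intersection(*fractions.values())
    report["unique_total"] = len(set.union(*fractions.values()))
    return report
```

In this sketch each pairwise entry includes any accession found in all three fractions, so pairwise counts and the three-way count overlap rather than partition the data.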
In summary, more than one thousand unique proteins were identified and fully annotated using the described method. Numerous proteins that were not previously known in lymphoma, in functional categories such as cell adhesion, migration, signaling, and stress response, were identified and may serve as novel disease markers, providing insight into the pathogenesis of lymphoma. This study demonstrates the utility of currently available bioinformatics tools for the robust identification and annotation of large numbers of proteins in a batchwise fashion.
This work was supported by the ARUP Institute for Clinical and Experimental Pathology.