The current study used three different proteomic strategies, which differed by their extent of intact protein separation, to examine the proteome of a pluripotent mouse embryonic stem cell line, R1. Proteins from whole-cell lysates were subjected either to 2-D-LC, or 1-DE, or were unfractionated prior to enzymatic digestion and subsequent analysis by MS. The results yielded 1895 identified non-redundant proteins and, for 128 of these, the specific isoform could be determined based on detection of an isoform-specific peptide. When compared with two previously published proteomic studies that used the same cell line, the current study reveals 612 new proteins.
The realization of stem cell therapy depends, in part, on understanding and manipulating mechanisms necessary for the maintenance of pluripotency as well as differentiation into specific cell types. Knowing the genes and proteins that play essential roles in these processes is an important part of understanding stem cell biology and developing viable therapies. Therefore, studies that characterize the proteome of pluripotent cells will benefit the stem cell community. Toward that end, the current study used three different proteomic strategies, which differed by their extent of intact protein separation, to examine the proteome of a pluripotent mouse embryonic stem (ES) cell line, R1. This work complements previous proteomic studies of the R1 proteome by our lab, which used 2-DE, and by Graumann et al., which used subcellular fractionation followed by 1-DE for protein separation and isoelectric focusing for peptide separation.
Pluripotent mouse ES cells (R1 cell line) were cultivated as described and were passaged off feeder layers five times before lysis. Under these conditions, ES cells contained mRNA transcripts to Oct4, sex-determining region Y-box 2, Nanog, Zfp42, and either weak or no expression of transcript markers of differentiation (Brachyury, CoupTF) (data not shown). Proteins from whole-cell lysates were subjected either to 2-D-LC (separation by pI and hydrophobicity) or 1-DE (separation by molecular mass), or were left unfractionated (UF; i.e. shotgun approach), prior to enzymatic digestion and subsequent analysis by MS (Fig. 1). Detailed methods are provided in the Supporting Information. Peptides from the 1-DE (n = 20 bands) and UF samples (n = 2 replicates) were analyzed on an Agilent 1200 nanoLC system (Agilent, Santa Clara, CA, USA) connected to an LTQ-Orbitrap mass spectrometer (Thermo). 2-D-LC samples (n = 185 fractions) were analyzed on an Agilent 1100 nano-LC system connected to an LTQ mass spectrometer (Thermo). All MS acquisition details are provided in the Supporting Information.
Raw MS data were searched against the International Protein Index (IPI) Mouse v3.47 database (55 298 entries; 8/26/08) using Sorcerer 2™-SEQUEST® (Sage-N Research, Milpitas, CA, USA), with post-search analysis performed using the Trans-Proteomic Pipeline (TPP), implementing the PeptideProphet and ProteinProphet algorithms. Database search parameters are provided in the Supporting Information. The ProteinProphet interact-prot.xml result files were input into ProteinCenter (Proxeon Bioinformatics, Odense, Denmark) and filtered to display proteins with protein probability scores p > 0.9 (corresponding to a false discovery rate of ~1.0%) that were identified by two or more unique peptides. To remove redundancy in protein identifications, proteins were grouped according to “indistinguishable proteins,” which resulted in 1895 protein groups. For the final protein database, isoform notation is provided only when a peptide with an amino acid sequence that is unique to a specific protein isoform was identified. Membrane topology predictions were based on TMAP, which is integrated into ProteinCenter. All proteins identified are provided in Supporting Information Table S2, and detailed information regarding the data set can be found in Supporting Information Table S3 and the PRIDE database (www.ebi.ac.uk/pride), accession numbers 11364–11379 (inclusive). The data were converted using PRIDE Converter (http://code.google.com/p/pride-converter).
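The two filtering criteria described above (protein probability above 0.9 and at least two unique peptides) can be illustrated with a minimal Python sketch. This is not the ProteinCenter implementation; the `ProteinHit` record, the `filter_hits` function, and the placeholder accessions are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinHit:
    """Hypothetical record for one protein identification."""
    accession: str        # placeholder identifier, not a real IPI entry
    probability: float    # ProteinProphet-style protein probability
    unique_peptides: int  # number of distinct peptides supporting the hit


def filter_hits(hits, min_probability=0.9, min_unique_peptides=2):
    """Keep hits with probability > threshold and enough unique peptides."""
    return [
        h for h in hits
        if h.probability > min_probability
        and h.unique_peptides >= min_unique_peptides
    ]


example_hits = [
    ProteinHit("PROT_A", 0.99, 5),  # passes both criteria
    ProteinHit("PROT_B", 0.95, 1),  # rejected: only one unique peptide
    ProteinHit("PROT_C", 0.80, 4),  # rejected: probability too low
    ProteinHit("PROT_D", 0.91, 2),  # passes both criteria
]

kept = filter_hits(example_hits)
print([h.accession for h in kept])  # ['PROT_A', 'PROT_D']
```

In this toy input, a hit supported by a single peptide is dropped even when its probability is high, which is the behavior the two-peptide requirement is meant to enforce.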
A total of 1895 non-redundant proteins were identified among all three strategies, with 1164 identified via 2-D-LC, 924 via 1-DE, and 955 via UF (Figs. 1 and 2B). A higher percentage of the proteins identified via 1-DE (40%) and UF (40%) compared to 2-D-LC (18%) were predicted or known transmembrane proteins based on TMAP (Fig. 2A). The gene ontology classifications for subcellular localization most represented in the complete data set include cytoplasm (29%) and nucleus (17%) (Fig. 2D), and the biological processes most represented include cell organization/biogenesis and regulatory proteins (Fig. 2E). Known ES protein markers such as sex-determining region Y-box 2 (Sox2), nestin (Nes), catenin alpha-1 (Ctnna1), telomere-associated protein RIF1 (Rif1), E3 ubiquitin-protein ligase RING2 (Rnf2), undifferentiated embryonic cell transcription factor 1 (Utf1), and sal-like protein 4 (Sall4) were identified under the stringent conditions used for this database. The pluripotency markers Oct4 and Nanog were also identified, but by a single peptide (data not shown), and thus were not included in the final database. In total, 112 proteins that are known to be involved in the pluripotency regulatory network [10–13] or that are part of a protein interaction network common to pluripotent cells (PluriNet) were identified (Table 1 and Supporting Information Table S1).
Three hundred and forty-two proteins were common among all three proteomic strategies used in this study (Fig. 2B). Comparing the sequence coverage of these 342 proteins among the strategies revealed that for 160 proteins (47%) 1-DE provided the highest sequence coverage, for 113 (33%) 2-D-LC provided the highest coverage, and for 69 (20%) UF provided the highest coverage. Though the more extensive fractionation provided by 2-D-LC was expected to result in the highest sequence coverage, 1-DE provided the highest sequence coverage regardless of protein length (Supporting Information Fig. S2). However, the 2-D-LC fractions were analyzed using an LTQ, and we predict that had they been analyzed on the LTQ-Orbitrap, the sequence coverage would have been higher, as is our experience in other studies (unpublished data). The median sequence coverage for all 1895 proteins identified was 15, 18, and 12% for 2-D-LC, 1-DE, and UF, respectively.
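For readers unfamiliar with the metric, sequence coverage is commonly computed as the percentage of residues in a protein that fall within at least one identified peptide. The following Python sketch illustrates that definition and the per-protein comparison across strategies. The function names and toy sequences are hypothetical, and the exact calculation used by ProteinCenter may differ.

```python
def sequence_coverage(protein_sequence, peptides):
    """Percent of residues covered by at least one peptide (exact matches)."""
    if not protein_sequence:
        return 0.0
    covered = [False] * len(protein_sequence)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein_sequence.find(peptide)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein_sequence.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein_sequence)


def best_strategy(protein_sequence, peptides_by_strategy):
    """Return the strategy name whose peptides give the highest coverage."""
    return max(
        peptides_by_strategy,
        key=lambda name: sequence_coverage(protein_sequence, peptides_by_strategy[name]),
    )


# Toy 10-residue "protein"; overlapping peptides are counted only once.
toy_protein = "ABCDEFGHIJ"
toy_peptides = {
    "2-D-LC": ["ABC"],
    "1-DE": ["ABC", "CDE"],
    "UF": ["IJ"],
}

print(sequence_coverage(toy_protein, toy_peptides["1-DE"]))  # 50.0
print(best_strategy(toy_protein, toy_peptides))              # 1-DE
```

Tallying `best_strategy` over every protein shared by all strategies would yield counts analogous to the 160, 113, and 69 proteins reported above, which together account for all 342 shared proteins.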
It has been suggested that maintaining protein integrity during sample fractionation will facilitate the identification of protein isoforms [15, 16]. Even though the sequence coverage achieved was very similar for the 342 proteins observed in all three approaches, we independently manually examined this subset to determine whether there were any proteins for which the specific isoform could be determined, based on the criterion that a peptide corresponding to an amino acid sequence unique to the isoform was identified by MS. This analysis was facilitated using ProteinCenter, which visually maps the identified peptides to all protein isoforms contained within the database. Of the 1895 proteins identified, the specific isoform could be determined for 128 proteins. Of these, 96 could be determined by 2-D-LC, 38 by 1-DE, and 25 by UF. The method by which each protein isoform was determined is listed in Supporting Information Table S2. For 17 proteins that were observed by multiple methods, the isoform could be differentiated only in the 2-D-LC analysis. The increased number of isoforms determined via 2-D-LC is consistent with the hypothesis that more protein fractionation can lead to a more complete characterization of the protein. The determination of protein isoform can be important from a biological perspective, as the isoform of a protein can affect its localization and function. For example, protein isoforms found in the current study that have been specifically implicated in functional changes during the differentiation of stem cells include cell division control protein 42 homolog (CDC42), staufen (RNA binding protein) homolog 1 (Stau1), pyruvate kinase isozymes M1/M2 (Pkm2), and 2-oxoglutarate dehydrogenase E1 component, mitochondrial (Ogdh).
Specifically, the current study identified the M2 isoform of pyruvate kinase isozymes M1/M2, which promotes proliferation, is regulated by fructose-1,6-bisphosphate (FBP), and is present only during embryonic development, whereas the M1 isoform is found in adult heart, skeletal muscle, and brain and is not regulated by FBP [17–19].
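The isoform criterion used here (a detected peptide whose sequence occurs in exactly one isoform) can be sketched in Python as follows. This is only an illustration of the logic; the manual inspection in ProteinCenter was not performed with this code, and the function name, isoform labels, and toy sequences are assumptions.

```python
def isoform_specific_peptides(isoforms, peptides):
    """Map each isoform to the detected peptides found in that isoform only.

    isoforms: dict of isoform label -> amino acid sequence
    peptides: iterable of detected peptide sequences
    """
    specific = {label: [] for label in isoforms}
    for peptide in peptides:
        matches = [label for label, seq in isoforms.items() if peptide in seq]
        if len(matches) == 1:
            # The peptide discriminates this isoform from all others.
            specific[matches[0]].append(peptide)
    return specific


# Toy isoforms that differ at two internal residues (not real sequences).
toy_isoforms = {
    "isoform_1": "MKTAYIAKQRLLE",
    "isoform_2": "MKTAYIGGQRLLE",
}
detected = ["MKTAY", "IAKQR", "QRLLE"]

result = isoform_specific_peptides(toy_isoforms, detected)
print(result)  # {'isoform_1': ['IAKQR'], 'isoform_2': []}
```

Only the peptide spanning the region where the toy isoforms differ is informative; peptides shared by both sequences cannot distinguish them, however many are detected.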
The current data were compared with other proteomic studies of undifferentiated R1 cells by importing the protein accession numbers reported by our lab (Elliott et al.) as well as by Graumann et al. into ProteinCenter. Clustering the proteins to remove redundancy resulted in a total of 5826 protein groups collectively for the three studies, with only 87 proteins common to all three studies (Fig. 2C). The overlap among all three data sets is limited by the relatively small size of the 2-DE data set of Elliott et al. (218 proteins). The overlap between the larger data set of Graumann et al. and the current study is 1161 proteins. Of the 612 proteins found in the current study but not in the other studies, 75% were identified by three or more peptides, which supports high confidence in these identifications (Fig. 2F). Also, of the 612 proteins found only in the current study, 18 are part of the protein interaction network common among pluripotent cells (PluriNet) and 2 are experimentally linked to the pluripotency regulatory network [10, 12] (Table 1).
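The kind of cross-study comparison described above reduces to set operations on protein group identifiers once redundancy has been removed. The Python sketch below shows the idea with toy identifiers. It does not reproduce the ProteinCenter clustering step, and the function name and placeholder accessions are hypothetical.

```python
def compare_studies(current, elliott, graumann):
    """Summarize overlaps among three sets of protein group identifiers."""
    return {
        "current_only": current - elliott - graumann,
        "all_three": current & elliott & graumann,
        "current_and_graumann": current & graumann,
        "union": current | elliott | graumann,
    }


# Placeholder identifiers standing in for clustered protein groups.
current_study = {"P1", "P2", "P3", "P4"}
elliott_study = {"P2", "P5"}
graumann_study = {"P2", "P3", "P6", "P7"}

summary = compare_studies(current_study, elliott_study, graumann_study)
print(sorted(summary["current_only"]))          # ['P1', 'P4']
print(sorted(summary["all_three"]))             # ['P2']
print(sorted(summary["current_and_graumann"]))  # ['P2', 'P3']
print(len(summary["union"]))                    # 7
```

In the reported comparison, the analogous quantities are the 612 proteins unique to the current study, the 87 common to all three studies, the 1161 shared with Graumann et al., and the 5826 protein groups overall.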
In summary, the current data set adds new information to the growing knowledge of the pluripotent stem cell proteome by identifying proteins known to be important for the maintenance of pluripotency, proteins not previously identified via proteomic approaches, and specific protein isoforms. Overall, this data set should be a useful reference for future studies of stem cells.
This research was supported by funding from the Intramural Research Program of the NIH, National Institute on Aging (K. R. B.), NIH Pathway to Independence Award K99-L094708-01 (R. L. G.), the NHLBI Proteomics Innovation Contract N01-HV-28180 (J. E. V.), NIH-R01-HL085434 (J. E. V.), and AHA Grant-in-Aid #09GRNT2500002 (J. E. V.). The authors thank the Technical Implementation and Coordination Core at JHMI for their technical assistance as well as Rui Wang and Juan Vizcaino at EBI for their assistance in uploading the data to PRIDE.
Dataset information was uploaded to the PRIDE database, accession numbers 11364–11379.
The authors have declared no conflict of interest.