Pluripotent human embryonic stem cells (ESCs) can be differentiated in vitro into a variety of cells which hold promise for transplantation therapy. Human embryonal carcinoma cells (ECCs), the stem cells of human teratocarcinomas, are considered a close but malignant counterpart to human ESCs. In this study, a comprehensive quantitative proteomic analysis of ESCs and ECCs was carried out using the iTRAQ method. Using two-dimensional liquid chromatography and tandem mass spectrometry analyses, we identified and quantitated ~1,800 proteins. Among these are proteins associated with pluripotency and development as well as with tight junction signaling and the TGF beta receptor pathway. Approximately 200 proteins exhibit a >2 fold difference in abundance between ESCs and ECCs. Examples of early developmental markers high in ESCs include beta-galactoside-binding lectin (LGALS1), undifferentiated embryonic cell transcription factor-1 (UTF1), DNA cytosine methyltransferase 3 isoform-B (DNMT3B), melanoma antigen family-A4 (MAGEA4), and interferon induced transmembrane protein-1 (IFITM1). In contrast, CD99-antigen (CD99), growth differentiation factor-3 (GDF3), cellular retinoic acid binding protein-2 (CRABP2), and developmental pluripotency associated-4 (DPPA4) were among the highly expressed proteins in ECCs. Several proteins that were highly expressed in ECCs, such as heat shock 27 kDa protein-1 (HSPB1), mitogen-activated protein kinase kinase kinase-1 (MAP3K1), nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor like-2 (NFKBIL2), and S100 calcium-binding protein-A4 (S100A4), have also been linked to malignancy in other systems. Importantly, immunocytochemistry was used to validate the proteomic analyses for a subset of the proteins. In summary, this is the first large scale quantitative proteomic study of human ESCs and ECCs, and it provides critical information about the regulators of these two closely related, but developmentally distinct, stem cells.
Pluripotent cells are stem cells which can give rise to all cell types in the body. Pluripotent stem cells have been isolated from a variety of human sources as models for studying early human development as well as for in vitro differentiation into cardiocytes, motor neurons, hematopoietic cells and others for the purpose of transplantation therapy [1, 2]. Two of the most well-studied cell types are embryonic stem cells (ESCs), derived from the inner cell mass of blastocyst-staged embryos, and embryonal carcinoma cells (ECCs), the stem cells of teratocarcinomas (mixed germ cell tumors) derived from progenitors of the germline. Both of these cell types share the general properties of pluripotent stem cells in that they exhibit unlimited self-renewal and can give rise to derivatives of all three embryonic germ layers, as demonstrated by embryoid bodies in cell culture and by the development of tumors after injection into adult mice. Given these attributes, pluripotent stem cells can potentially provide sufficient numbers of differentiated cells to treat a wide variety of human conditions, including heart disease, diabetes, and many neurological disorders.
However, several major hurdles remain to be overcome if such cells are to be used clinically. Most importantly, these cells must be easily and reproducibly cultured and manipulated so that they possess the necessary characteristics for successful differentiation, transplantation and engraftment. For this purpose, identifying the factors involved in stem cell survival, proliferation and pluripotency is critical. Another critical factor is their chromosomal stability. For instance, most ECC lines are heteroploid, and those that are diploid exhibit alterations in their genomes as revealed through comparative genomic hybridization. Nonetheless, to date the only reported clinical trial using a pluripotent stem cell-derived source in humans involved human ECC-derived postmitotic neurons implanted in regions of the brain damaged by stroke [5–7]. Although the outcome has been promising, the safety concerns regarding the use of a karyotypically unstable cell line will require further monitoring. In contrast, ESC lines are routinely maintained as normal diploids, except in extended cultures, in which some lines have shown chromosomal abnormalities similar to those seen in ECCs. Like karyotypic instability, the expression of factors associated with oncogenesis inherent in embryonic stem cells also raises concerns for their use in transplantation. Altered expression of many factors has now been associated with some cancers, even though their role is unknown or is a secondary effect downstream of the cause of the tumorigenicity. Therefore, it is essential to distinguish those factors which turn on the oncogenic state from those that enhance proliferation and self-renewal without inducing aberrant cell cycles and genomic instability. These factors can then be controlled and screened in cells before transplantation to minimize the risk of potential carcinogenic outcomes.
The need for this information is highlighted by the recent approval by the FDA of the first human clinical trial utilizing human embryonic stem cells. This trial involves treating patients with spinal cord injury with hESC-derived oligodendrocyte neural progenitors. The significance of identifying factors associated with pluripotency while avoiding those associated with tumorigenesis has also been highlighted by a series of studies that have shown the conversion of adult fibroblast cells into pluripotent-like stem cells by inserting four genes [10–13]. The resulting cells, designated induced pluripotent stem (iPS) cells, express two pluripotency genes, OCT4 and SOX2, and two genes, c-Myc and KLF4, which are frequently upregulated in tumors. Although this combination of genes successfully produced ES-like colonies that could generate chimeric animals, including germline transmission, nearly 20% of the iPS-derived chimeric offspring developed tumors. In addition, Maherali et al. demonstrated that the expression of OCT4 was no longer required for iPS cell survival. Thus, while these types of studies provide hope for reprogramming adult cells for therapeutic uses, they further underline the necessity of finding genes associated with pluripotency while avoiding those associated with oncogenesis.
To define such genes, many attempts have been made to study the global stem cell genome as well as its chromatin state. While these studies provide critical information for finding the factors associated with pluripotency, investigation into the protein levels in these cells is also required, as levels of protein expression do not always directly correlate with transcriptomic changes. Indeed, with current developments in proteome-wide approaches, the characterization of the proteome of these cells has just begun. Some of these proteomic studies, which include analyses of mouse ESCs, human ESCs and human embryonal carcinoma cells (ECCs), involve non-quantitative analyses which, while useful, do not allow for differential analyses among these populations.
To date, a few proteomics studies and several transcriptomics studies have been reported comparing ESCs and ECCs [18–20]. Membrane proteomic approaches have also been reported recently using a label-free method of quantitation after extensive membrane fractionation. More recent developments in quantitative proteomic methods have been employed to study ESCs in mice but have not yet been applied to human pluripotent stem cells. For this purpose, isobaric tags for relative and absolute quantitation (iTRAQ) labeling is an effective method for comparing the expression level of even low abundance proteins. Alternatively, stable isotope labeling with amino acids in cell culture (SILAC) is another straightforward approach for labeling proteins for mass spectrometry based analysis. This approach has recently been used for quantitative comparison of the membrane proteomes of human embryonic stem cells and their differentiated derivatives after adaptation to SILAC media.
Here, we report the use of iTRAQ coupled to two-dimensional liquid chromatography and tandem mass spectrometry to compare protein expression between two distinct, but phenotypically related, pluripotent populations - human ESCs and human ECCs. Our goal was to study the proteomic differences between ESCs and ECCs to identify potential candidates that might explain the regulation of pluripotency and malignancy. This approach generated an initial high quality reference set of ~1,800 proteins, which includes low abundance protein classes such as transcription factors and kinases that were not previously described in stem cells as well as previously documented stem cell markers. We also examined the compartmental distribution of nuclear, cytoplasmic, and membrane proteins. Bioinformatics analysis of ESCs and ECCs revealed shared features of their pluripotent nature and also distinguished the expression of key factors which may be related to the oncogenic nature of ECCs.
Human ESCs (H1; WA01) were obtained from Wicell (Wisconsin) and cultured on Matrigel (BD Bioscience) coated dishes in the presence of conditioned medium derived from mouse embryonic fibroblasts (MEFs). MEFs (Millipore) were plated on gelatin (1%) coated 10 cm dishes and cultured in DMEM supplemented with 20% serum, 3.5 μl BME, 2 mM glutamax and 4 ng/mL basic fibroblast growth factor (bFGF, BD Bioscience) for 2 days. To generate conditioned media, MEFs were cultured in DMEM/F12 supplemented with 15% knock out serum, 3.5 μl BME, 2 mM glutamax, 2 mM NEAA and 4 ng/ml bFGF. Conditioned media was then collected every 24 hrs for 10 days. ESCs were passaged upon 80% confluence using 0.05% trypsin/EDTA for 5 minutes at 37°C, then neutralized using trypsin neutralizing solution (Lorenzo). ECCs were grown in DMEM/F12 supplemented with 2 mM glutamax and 10% FBS and passaged using trypsin, similar to ESCs. The human embryonal carcinoma line, NTERA-2 cl.D1, was acquired through American Type Culture Collection (Virginia) and cultured on matrigel-coated plates under conditions described previously for this cell line. Human ESCs and ECCs were constantly monitored for any differentiation events by immunocytochemistry. In order to completely remove traces of feeder cells, ESCs cultured on MEFs were subsequently passaged on matrigel coated plates for five subcultures in conditioned media. Conditioned media was filtered using a 0.2 μm filter. Before lysis, ESCs and ECCs were washed 6 times with cold PBS to remove traces of contamination from serum. Karyotypic analysis of pluripotent cells is explained in Supplementary Methods.
For the whole cell proteomic analysis, ESCs and ECCs were collected in serum-free media by washing them in ice cold PBS 3 times. The cells were lysed in 0.5% SDS and subsequently sonicated for 3 min on ice (Duty cycle 30%, output control at 3, on Sonifier 250, Branson). For the preparation of cytosolic and non-cytosolic fractions of cells, cells were washed in ice cold PBS for removal of serum. The cells were sheared by Dounce homogenizing 150 strokes in buffer containing 5 mM HEPES pH 7.4, 0.5 mM EDTA, 250 mM sucrose and freshly prepared 1 mM phenylmethylsulfonyl fluoride. Non-cytosolic fraction (pellet) was separated from cytoplasmic fraction by centrifugation at 100,000 g for 10 min at 4 °C. ESC and ECC samples from whole cell lysate, cytosolic and non-cytosolic preparations were normalized based on protein concentration and used for iTRAQ labeling.
Peptides from ESCs and ECCs were differentially labeled using iTRAQ reagent according to the manufacturer’s instructions (Applied Biosystems). Briefly, 40 μg of each sample from duplicate ESCs and ECCs were treated with 2 μl of reducing agent (tris(2-carboxyethyl) phosphine (TCEP)) at 60°C for 1 hr and alkylated with 1 μl of cysteine blocking reagent, methyl methanethiosulfonate (MMTS), for 10 minutes at room temperature. Protein samples were digested using sequencing grade trypsin (Promega) (1:15) for 12 hr at 37°C. Peptides from each sample in a final volume of 40 μl were labeled with one of the four iTRAQ reagents in 70 μl of ethanol at room temperature. After 2 hrs, iTRAQ labeling reactions were terminated by adding 100 μl water to each sample; the samples were then combined and the organic solvent was evaporated using a Speedvac. The pH was adjusted to 3.0 using 100 mM phosphoric acid and the mixture was then diluted to 1 ml in SCX solvent A (10 mM potassium phosphate buffer (pH 2.85), 25% acetonitrile). Combined mixtures of iTRAQ labeled tryptic digests from ESCs and ECCs were fractionated using strong cation exchange chromatography on a Polysulfoethyl A column (PolyLC, Columbia, MD) (300 Å, 5 μm, 100 × 2.1 mm) using an Agilent 1100 HPLC system containing a binary pump, UV detector and a fraction collector. Fractionation of peptides (0.2 ml fractions) was carried out by a linear gradient between solvent A and solvent B (solvent A containing 350 mM KCl, pH 2.85). Three SCX fractionations were carried out, one each for the whole cell lysate, cytosolic and non-cytosolic preparations. The fractions were completely dried, reconstituted in 40 μl of 0.2% formic acid and stored at −80°C until LC-MS/MS analysis.
Tandem mass spectrometry analysis of iTRAQ labeled peptides was carried out on a quadrupole time-of-flight mass spectrometer (QSTAR/pulsar, Applied Biosystems). Peptide fractions from SCX chromatography were further separated on a reversed-phase liquid chromatography (RP-LC) system (Agilent 1100 system) interfaced with the mass spectrometer. The RP-LC system consisted of a desalting column (75 μm × 3 cm, C18 material 5–10 μm, 120 Å) and an analytical column (75 μm × 10 cm, C18 material 5 μm, 120 Å) with nanoflow solvent delivery. The electrospray source was fitted with an 8 μm emitter tip (New Objective, Woburn, MA) and maintained at 900 V ion spray voltage. Peptide samples (40 μl) were loaded onto a trap column in 0.1% formic acid, 5% acetonitrile for 15 min, and LC-MS/MS data were acquired by online analysis of peptides eluted in a 5–40% acetonitrile gradient in 0.1% formic acid over 30 min at a flow rate of 300 nl/min. Using Analyst v 1.1 (Applied Biosystems), MS/MS data were acquired by targeting the three most abundant ions in the scan range of m/z 350 to 1200, and the selected ions were excluded from MS/MS for 45 s. Compared with non-labeled peptides, a twenty percent higher collision energy was applied during MS/MS scans of iTRAQ labeled peptides.
Peptide and protein identification was carried out in compliance with Molecular and Cellular Proteomics guidelines. ProteinPilot software V3.0 (Applied Biosystems), which uses the Paragon algorithm for protein identification and quantitation, was used for database search and quantitation. The ProGroup algorithm further processes these data to determine the minimal set of justifiable identified proteins. Instrument raw files were uploaded from the three sets of experiments separately (whole cell, cytosolic and non-cytosolic) and searched against human RefSeq database version 35 containing 33,888 proteins. Search parameters included iTRAQ labeling at the N-terminus and lysine residues, cysteine modification by methyl methanethiosulfonate (MMTS), methionine oxidation and digestion by trypsin. ProteinPilot 3.0 gives both global and local FDRs; estimates of both are given in Supplementary Table 1. The list of proteins shows the estimate of proteins at 1% and 5% FDR levels. Since we used a >95% confidence score cutoff (>1.3 unused score) in ProteinPilot for protein identification before FDR analysis, we included proteins identified up to 5% FDR.
Relative abundance of proteins was calculated based on individual peptide ratios. Shared peptides were not included for quantitation except for the first hit protein among the other proteins, and isoform specific identification of proteins was carried out by selecting peptides distinct to each form. The ion count threshold value for considering reporter ions for fold calculation was set at 7. When the same protein was identified in more than one experiment, the quantitation ratio was selected from the experiment with the best p-value. ProteinPilot software quantitates protein ratios for those identified with at least two peptides, taking into account the error factor and p-value, both of which are estimates of the confidence interval indicating the likelihood that a protein is differentially expressed. In addition, we used the background noise reduction and bias correction features of ProteinPilot.
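To make the quantitation step concrete, the following is a minimal Python sketch of deriving a protein-level ratio from peptide-level reporter ion counts. It is an illustration only and does not reproduce ProteinPilot's weighting, bias correction or background subtraction. The function name, the use of an unweighted mean in log space, and the reading of the threshold as "at least 7 counts in both channels" are assumptions; the example ion counts are hypothetical.

```python
import math

# Minimum reporter ion count for a peptide to be used (value taken from the text;
# treating it as an inclusive threshold on both channels is an assumption).
ION_COUNT_THRESHOLD = 7


def protein_log_ratio(peptides, num="116", den="114", threshold=ION_COUNT_THRESHOLD):
    """Return the mean log10(num/den) over peptides that pass the threshold.

    `peptides` is a list of dicts mapping an iTRAQ reporter label to its ion count.
    Returns None when no peptide passes. An unweighted mean is used here, which
    is a simplification and not ProteinPilot's actual weighting scheme.
    """
    logs = []
    for pep in peptides:
        a = pep.get(num, 0)
        b = pep.get(den, 0)
        if a >= threshold and b >= threshold:
            logs.append(math.log10(a / b))
    if not logs:
        return None
    return sum(logs) / len(logs)


# Hypothetical peptides for one protein: 116 = ESC channel, 114 = ECC channel.
example_peptides = [
    {"114": 10, "116": 40},
    {"114": 20, "116": 80},
    {"114": 3, "116": 90},  # dropped: the 114 channel is below the threshold
]
example_ratio = 10 ** protein_log_ratio(example_peptides)
```

Averaging in log space keeps a 2-fold increase and a 2-fold decrease symmetric around zero, which is why the ratio is only converted back to linear scale at the end.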
For the functional analysis we used Ingenuity Pathways Analysis (IPA) software version 7.1 (Ingenuity Systems, Mountain View, CA) (<http://www.ingenuity.com/products/pathways_analysis.html>). We uploaded the Entrez gene symbols corresponding to all proteins quantitated from both ESCs and ECCs. For each protein, the iTRAQ ratio with the lowest p-value was selected from the three experiments, which represent the aggregate of whole cell lysate, cytosolic and non-cytosolic fractions. IPA software was used to overlay the proteins identified in ESCs and ECCs onto different canonical pathways and networks along with their expression level values (p-value <0.05). For cellular localization annotation, all gi accession numbers were mapped to HPRD accessions and clustered according to primary localization (nucleus, plasma membrane, cytoplasm, extracellular matrix and unknown category) to assess the proteomic coverage attained by our method.
Antibodies and the concentrations used for immunocytochemical validation are summarized in Supplementary Methods. ESCs and ECCs were fixed in 4% paraformaldehyde for 15 min, and antibodies diluted in Dulbecco’s PBS (DPBS) containing 15% goat serum were incubated with the fixed cells for 1 hr at 25°C. Fluorescently-labeled secondary antibodies (1:200 dilution; Molecular Probes) diluted in DPBS containing 15% goat serum were used for detection. Nuclei were stained using DAPI (Sigma), and controls were performed with secondary antibodies alone. Fluorescent images were visualized using a Nikon Eclipse E800 microscope (Nikon, Inc., Melville, NY) and were captured with a Photometrics 20 MHz cooled interlined CCD camera. Alexa Fluor 488 (cyan-green color) was detected using a FITC excitation filter, a 505 nm dichroic mirror and a barrier filter (Chroma, Inc., Burlington, VT) with a band width of 515–555 nm. Alexa Fluor 594 (orange-red) fluorescence was detected using a G2ERHOD 541–551 nm excitation filter, a 575 nm dichroic mirror and a barrier filter with a band width of 590. DAPI was detected using a standard DAPI/Hoechst filter set, UV 2E/C 340–380 nm excitation filter, 400 nm dichroic mirror, and a barrier filter with a band width of 435–485 nm. The images were processed using Metamorph software, v.6.2 (Universal Imaging Corp). Importantly, to confirm differences in the relative expression between cell lines, images were captured with the same exposure time for each treatment.
The integrity of human ESCs and ECCs isolated for quantitative proteomic analysis was verified with well-established markers of pluripotency: POU class 5 homeobox 1 transcription factor (OCT4), tumor rejection antigen 1–81 (TRA-1-81) and stage specific antigen-4 (SSEA4) (Supplementary Fig. 1). Whole cell lysate, cytosolic and non-cytosolic fractions of ESCs and ECCs were compared, with the non-cytosolic fractions containing membrane, nuclear and other organelle proteomes. Importantly, technical replicates were performed for each experiment by dividing the same lysate into two aliquots. Peptides from ECCs were labeled with reagents containing 114 and 115 iTRAQ reporters while peptides from ESCs were labeled with reagents containing 116 and 117 iTRAQ reporters (Fig. 1). LC-MS/MS analysis of 70 SCX fractions from whole cell lysates, cytosolic and non-cytosolic preparations generated a total of >100,000 MS/MS spectra. Using a confidence cutoff of ProtScore >1.3 (95% confidence), a total of ~1,800 proteins were identified from 36,967 distinct peptides. MS/MS and iTRAQ reporter ion spectra of representative peptides from proteins with different expression levels in ESCs and ECCs are shown in Fig. 2. Panels A and B show the MS/MS spectra of peptides from undifferentiated embryonic cell transcription factor 1 (UTF1) and DNA cytosine-5 methyltransferase 3 beta isoform 1 (DNMT3B), which were highly expressed in ESCs. Panels C and D show the MS/MS spectra of peptides from heat shock 27kDa protein 1 (HSPB1) and CD99 antigen (CD99), which were highly expressed in ECCs. Panels E and F show the MS/MS spectra of peptides from podocalyxin-like isoform 1 (PODXL) and LIN28 homolog (LIN28), which showed no significant change in expression in whole cell analysis. Supplementary Fig. 2 shows additional MS/MS spectra and iTRAQ ratios of 8 peptides from proteins 1) highly expressed in ESCs: beta-galactoside-binding lectin (LGALS1), biglycan (BGN) and gelsolin (GSN); 2) highly expressed in ECCs: developmental pluripotency associated 4 (DPPA4), cellular retinoic acid binding protein 2 (CRABP2) and nucleolar protein 1, 120kDa (NOP2); and 3) a protein (talin 1) which showed a similar level of expression. HELLS showed a slightly higher level of expression in ESCs compared to ECCs, as shown by MS/MS spectrum and immunocytochemical staining. The complete list of these proteins along with iTRAQ ratios and FDR values can be found in Supplementary Table 1. Importantly, quantitation data are supported by p-values wherever more than two peptides are used for quantitation, each with technical replicates. The error factor and number of peptides (>95% confidence) used for quantitation are included. The error factor is similar to a standard deviation and gives a measure of the certainty of the average ratio. ProteinPilot calculates the error factor as $\mathrm{EF} = 10^{(95\%\ \text{confidence error})}$.
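As a rough illustration of how an error factor of this kind can be read, the sketch below treats it as 10 raised to the 95% confidence error of the mean log10 ratio, so that the confidence interval of a ratio runs from ratio/EF to ratio×EF. This reading, the function names, and the approximation of the confidence error as a Student's t multiplier times the standard error are assumptions for illustration; the exact calculation inside ProteinPilot is not given in the text.

```python
import math
import statistics


def error_factor(log10_ratios, t_value):
    """Approximate an error factor as 10 ** (95% confidence error in log10 space).

    The confidence error is taken as t_value * standard error of the mean of the
    peptide log10 ratios. `t_value` is the two-sided 95% Student's t multiplier
    for len(log10_ratios) - 1 degrees of freedom, supplied by the caller.
    """
    n = len(log10_ratios)
    if n < 2:
        raise ValueError("at least two peptide ratios are needed")
    standard_error = statistics.stdev(log10_ratios) / math.sqrt(n)
    return 10 ** (t_value * standard_error)


def confidence_bounds(ratio, ef):
    """Return the (lower, upper) bounds of the interval ratio/EF .. ratio*EF."""
    return ratio / ef, ratio * ef


# Hypothetical peptide log10 ratios for one protein; 4.303 is the two-sided
# 95% t multiplier for 2 degrees of freedom.
hypothetical_logs = [0.55, 0.60, 0.65]
ef = error_factor(hypothetical_logs, t_value=4.303)
lower, upper = confidence_bounds(10 ** statistics.mean(hypothetical_logs), ef)
```

An error factor of 1 means the peptide ratios agree exactly; the larger the value, the wider the multiplicative interval around the reported ratio.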
Fig. 3A shows iTRAQ fold changes for all proteins and the differential expression of a small subset of proteins from ESCs and ECCs. Approximately 213 proteins showed >2 fold higher expression levels in ESCs, while ~208 proteins were expressed more highly in ECCs. Table 1 shows a partial list of proteins (top 55), along with their iTRAQ ratios, that were overexpressed in ESCs when compared to ECCs. The transcription factors UTF1 and general transcription factor IIIC, polypeptide 4 (GTF3C4) were highly expressed in ESCs (14 and 2.4 fold respectively). UTF1 is a known pluripotency marker which decreases during the onset of differentiation of stem cells. Membrane proteins highly expressed in ESCs include annexin 1 (ANXA1), caspase recruitment domain family, member 11 (CARD11) (4.0 fold), cadherin EGF LAG seven-pass G-type receptor 3 (CELSR3) (9.2 fold), catenin (cadherin-associated protein) beta 1 (CTNNB1) (2.6 fold), interferon induced transmembrane protein 1 (IFITM1) (2.7 fold) and zyxin (ZYX) (5.3 fold). In contrast, Table 2 shows a partial list of proteins (top 55) identified in this study that were highly expressed in ECCs compared to ESCs. Among them, growth differentiation factor 3 (GDF3), DPPA4, MFGE8 and HSPB1 were identified.
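The 2 fold cutoff used to separate the two groups can be sketched in a few lines of Python. This is a minimal illustration, assuming ratios are expressed as ESC/ECC so that values above 2 are ESC-high and values below 0.5 are ECC-high; the function name and group labels are hypothetical, and the example ratios are those quoted in this article for the named proteins.

```python
def classify_by_fold_change(ratio, cutoff=2.0):
    """Classify an ESC/ECC ratio using a symmetric fold-change cutoff."""
    if ratio > cutoff:
        return "ESC-high"
    if ratio < 1.0 / cutoff:
        return "ECC-high"
    return "unchanged"


# ESC/ECC iTRAQ ratios quoted in the article for selected proteins.
reported_ratios = {
    "UTF1": 14.0,
    "GTF3C4": 2.4,
    "HSPB1": 0.3,
    "GDF3": 0.3,
    "LIN28": 0.8,
    "THY1": 1.1,
}

groups = {name: classify_by_fold_change(r) for name, r in reported_ratios.items()}
```

A fold-change cutoff alone does not account for measurement uncertainty, so in practice it would be combined with the p-value and error factor reported for each protein.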
Functional annotations of the combined list of proteins from all three experiments are shown in Fig. 3B. Categories are based on primary localization, with the total number of proteins in parentheses: cytoplasm (493), nucleus (466), mitochondrion (146), endoplasmic reticulum (80), ribosome (61), extracellular (37), integral to membrane (25), Golgi apparatus (25), lysosome (10), centrosome (9), and endosome (7). In addition to the 97 proteins localized primarily to the plasma membrane, another 160 proteins had the plasma membrane as their alternate localization. The list of proteins with NCBI sequence identifier GI number and localization information derived from the HPRD database is given in Supplementary Table 2. Functional annotation of the protein dataset using the Ingenuity pathway analysis tool revealed a large number of molecules from several canonical pathways (Fig. 3C). Supplementary Table 3 shows the list of ~480 proteins classified as cancer gene clusters using the Ingenuity pathway analysis (IPA) tool. With a 2 fold change in expression as cutoff, we found that 15% and 11% of these proteins were highly expressed in ESCs and ECCs, respectively. Cancer markers that were expressed at lower levels in ESCs when compared to ECCs included p53 induced protein (0.5 fold in non-cytosolic fraction), NFKBIL2 (0.8), S100A4 (0.7) and HSPB1 (0.3) (p<0.05).
In contrast, when pathways associated with pluripotency were studied, the Wnt pathway showed a significant number of proteins that could be detected in both cell lines. Specifically, 17 molecules associated with the Wnt pathway (www.netpath.org), such as ARRB1, LRP1, CTBP2, MAP1B, CSNK2B, CDC2, CSNK2A1, PPP2CA, RUVBL1, PIN1, SUMO1, SUMO2, MARK2, PPP2R5B, RHOA and RAC1, were identified in both ESCs and ECCs. Most of these proteins were expressed at similar levels in both cell types, while other members of this pathway such as ARRB1 (~1.5 fold), CTNNB1 (1.4 fold) and PPP2CA (~1.7 fold) were high in ESCs and CDC2 (<0.6) was low in ESCs. Among the members of the TGF beta receptor pathway, 19 molecules (ANAPC4, AP2B1, CAV1, CDC2, CDC27, CTNNB1, HDAC1, HSPA8, KPNB1, NUP153, NUP214, PPP2R2A, SNX2, SNX6, SPARC, STRAP, SUMO1, TRAP1 and XPO1) were identified in this study. Among them, SNX2, CDC2, CDC27, and STRAP showed 0.5, 0.6, 0.7 and 0.7 fold changes respectively in ESCs when compared to ECCs. Only SPARC was found to be high (2.0 fold) in ESCs.
Among the proteins associated with the kit receptor pathway, nine proteins (CLTC, CRKL, GRB2, MAPK1, PLCG1, PTPN11, RPS6KA1, STAT1 and VAV2) were detected in our study. MAPK1 was expressed at a low level in ESCs (0.54 fold). Under the Notch pathway, APP, HDAC1, HDAC2, MAPK1, SIN3A and WDR12 were identified; specifically, the cell proliferation regulatory protein WDR12 was ~2.1 fold more highly expressed in ESCs when compared to ECCs. Twenty two proteins were identified from the MAPK signaling pathway, including ARRB1, CASP3, CDC42, CRKL, DUSP5, FLNA, FLNB, FLNC, GRB2, HSPA1A, HSPA8, HSPB1, MAP2K2, MAPK1, PAK1, PAK2, PPM1A, PPP5C, RAC1, RPS6KA1 and RRAS2. CASP3 showed a 5.4 fold change, while PAK1 and PAK2 both showed 2.3 fold changes in expression level in ESCs when compared to ECCs. MAPK1, RRAS2 and HSPB1 showed less than 0.5 fold expression levels in ESCs. Among the large number of molecules associated with the EGFR pathway, molecules such as ARF4, KRT18, KRT8, MAPK1 and NDUFA13 were low in ESCs when compared to ECCs. In contrast, VAV2, connexin 43 and PAK1 levels were high in ESCs when compared to ECCs (Table 2). GRB2, which is involved in receptor tyrosine kinase signaling, showed slightly higher expression (1.3 fold, p<0.03) in ESCs compared to ECCs.
In addition to identifying specific pathways, our data also classified proteins by cellular function, including embryonic stem cell survival and cell death, cellular growth and proliferation, cellular assembly and organization, cell cycle, and DNA replication, recombination and repair (Fig. 3C). Interestingly, many proteins involved in early embryonic development were also detected in our analyses. These included DNMT3A, DNMT3B, MAGEA4, HELLS, GDF3, UTF1 and CTNNB1. Hence this proteomic dataset is a valuable resource for investigating subsets of proteins in specific pathways.
To corroborate the proteomic analyses, the relative expression of selected differentially expressed proteins was also compared between the cell lines by immunostaining. Relative levels of expression were consistent with proteomic analysis for the following proteins: UTF1, DNMT3B, CTNNB, GSN, BGN, and LGALS1, all of which showed higher expression in ESCs versus ECCs (Fig. 4). HELLS showed a slightly higher level of expression in ESCs. We also investigated proteins which were expressed more highly in ECCs compared to ESCs. These included DPPA4, GDF3, MFGE8 and HSPB1, which demonstrated similar results in immunostaining, while TLN1 showed no significant difference in expression between populations in either the iTRAQ analysis or immunostaining (Fig. 5 and Supplementary Fig. 2).
Comparisons between the expression profiles of ESCs and ECCs have recently been highlighted by the growing interest in identifying factors which distinguish pluripotency from oncogenesis. Although ESCs and ECCs provide a model to study these attributes, only a handful of comparisons have been performed on their transcriptomes, and even fewer on their protein expression. What is known is that both cell types express markers associated with pluripotency, including the three well-established transcription factors which regulate this process - Oct4, Nanog and Sox2. Although these factors are expressed at too low an abundance to be detected by current proteomic technologies, other, more abundant members were found by this study to be expressed in both ESCs and ECCs. Several of these are known markers of undifferentiated ESCs whose relative protein levels compared to human ECCs have not been reported until now. These include lin-28 homolog, THY1 cell surface antigen, UTF1, and GDF3, which showed 0.8, 1.1, 14 and 0.3 fold changes respectively in ESCs compared to ECCs.
In addition to these factors, we were also able to detect differences in protein levels of three well-established factors of pluripotency which have been previously reported in the only other study to date comparing these lines using proteomic analysis. Dormeyer et al. reported that tissue non-specific alkaline phosphatase precursor (ALPL), CD9, and beta-catenin (CTNNB) were similarly expressed in HUES-7 ESCs (Doug Melton’s Harvard line) and NT2/D1 ECCs. Our results, however, revealed that although ALPL levels were similar in both lines, H1 ESCs expressed higher levels of CD9 and CTNNB than the NTera2 ECCs. It remains to be determined whether these inconsistencies are the result of differences in the sensitivity of the proteomic analysis or of subtle differences in expression between cell lines. Furthermore, our study was able to compare expression for three other pluripotency associated factors. These included UTF1, which demonstrated higher levels of expression in ESCs compared to ECCs, while both lines expressed similar levels of LIN28 and THY1.
In addition to these established regulators of pluripotency, this report demonstrates, for the first time, the relative abundance of proteins associated with early development that also have implications in stem cell regulation. These include DPPA4, DNMT3A, DNMT3B, MAGEA4, IFITM1, left-right determination factor-B (LEFTB), CD9, helicase lymphoid-specific protein (HELLS), LIN28, insulin-like growth factor 2 mRNA binding protein 2 (IGF2BP2), podocalyxin-like 1 (PODXL) and cellular retinoic acid binding protein 2 (CRABP2). Many of these factors were recently recognized by The International Stem Cell Initiative based on gene expression across 59 human embryonic stem cell lines. In fact, our study was able to compare the protein expression of 8 of the 20 transcripts described by that report as positively correlated with NANOG expression in undifferentiated ESCs. These include DNMT3B, GDF3, LEFTB, IFITM1, UTF1, LIN28, PODXL and CD9.
Of significance is the ability of our analyses to detect quantifiable differences in the expression of these factors. While this has been performed for a number of these markers at the transcriptional level, this report underlines the importance of directly investigating differences in protein expression. Specifically, many of the early developmental markers we investigated were high in ESCs compared to ECCs, consistent with the premise that ECCs are derived from more mature germ line precursors. For instance, DNMT3B, a member of a family of proteins which play an important role in DNA methylation and genomic imprinting, was high in ESCs compared to ECCs (5.6 fold) [26, 27]. This is also consistent with results we previously reported that demonstrated decreased DNMT3B levels during ESC differentiation into motor neurons. Hence DNMT levels may be useful to delineate undifferentiated ESCs from differentiated cells as well as from ECCs. Interestingly, HELLS, also known as LSH, which supports transcriptional repression by interacting with DNMTs, was found to be at the same level in both cell types.
DPPA4, another early developmental marker, was found to be low in ESCs when compared to ECCs, consistent with previous reports demonstrating the expression of this protein in ECCs and germline cells [30, 31]. Furthermore, DPPA4 has also been implicated in the inhibition of ESC differentiation into the ectoderm lineage in mouse, which is consistent with our earlier study showing decreased expression during human ESC differentiation into motor neurons. Other early developmental factors that were also highly expressed in ESCs included IFITM1 and LEFTY1, while PODXL and IGF2BP2 levels were low. Similarly, CRABP2, a suspected marker of pluripotency, demonstrated higher expression in ECCs than in ESCs, as did key factors of chromatin remodeling. Immunocytochemical staining also showed increased ESC expression of gelsolin (GSN), which is an early developmental protein involved in actin restructuring [33, 34].
A meta-analysis across 38 studies of hESC transcriptomes comparing undifferentiated and differentiated cells identified a subset of nearly a thousand genes from ESCs common to at least three studies. This list of genes was considered as potential differentiation genes based on their high expression levels in ESCs compared to differentiated cells. We compared our entire protein list with this subset of genes to study the status of differentiation genes among ESCs and ECCs. Among these genes, our proteomic analyses found 265 proteins, of which 193 showed no change (using a 2 fold change as the cutoff) in protein abundance between ESCs and ECCs (Supplementary Table 4). Further, our proteomic analysis confirmed that 19 proteins were highly expressed in ESCs compared to ECCs: CASP3, DNMT3B, ETV1, FABP5, GMFB, HMGA1, KPNA2, LGALS1, LIG1, MTHFD2, PAK1, PSIP1, RUVBL1, SERPINB9, SLC3A2, UCHL1, UGP2, UTF1 and WDR3. Notably, BCAT1, CSE1L, GDF3, DPPA4, MFGE8 and CRABP2 were highly expressed in ECCs compared to ESCs. We carried out correlation analysis among iTRAQ ratios from whole cell, cytosolic and non-cytosolic fractions, which showed significant correlations (r values of around 0.4 to 0.5). These data are shown in Supplementary Table 5.
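The 2 fold cutoff and the correlation between fractions described above can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study. It assumes that each iTRAQ ratio is expressed as ESC/ECC (the direction is an assumption), that the cutoff is applied as a strict greater-than-2 fold change, and that a plain Pearson coefficient is an acceptable stand-in for the correlation analysis. The function names and the example ratios are hypothetical. Only the DNMT3B (5.6 fold), GDF3 (3.0 fold) and VSNL1 (1.8 fold) magnitudes are taken from the text.

```python
import math

FOLD_CUTOFF = 2.0  # 2 fold change threshold stated in the text


def classify(ratio, cutoff=FOLD_CUTOFF):
    """Label a hypothetical ESC/ECC abundance ratio using a fold-change cutoff."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if ratio > cutoff:
        return "higher in ESCs"
    if ratio < 1.0 / cutoff:
        return "higher in ECCs"
    return "no change"


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Fold changes quoted in the text, written as assumed ESC/ECC ratios.
example_ratios = {
    "DNMT3B": 5.6,       # 5.6 fold higher in ESCs
    "GDF3": 1.0 / 3.0,   # 3.0 fold higher in ECCs
    "VSNL1": 1.8,        # 1.8 fold, below the 2 fold cutoff
}
labels = {name: classify(r) for name, r in example_ratios.items()}
```

Ratios from two fractions would usually be log2-transformed before being passed to `pearson_r`, so that a 2 fold increase and a 2 fold decrease are equidistant from zero.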
Several overexpressed proteins identified in this study have not been reported earlier in the context of pluripotency or differentiation. Interestingly, three ECM glycoproteins were highly expressed in ESCs compared to ECCs: biglycan (BGN), tectorin alpha (TECTA), and galectin 1 (LGALS1) (3.5, 34 and 4.8 fold, respectively). Although the relationship between cell-matrix interactions and pluripotency is a well-recognized phenomenon in culture, to date nothing is known regarding the molecules or mechanisms involved in stem cell survival or maintenance. BTB (POZ) domain containing 5 (KLHL28) belongs to a BTB/POZ zinc finger domain family known to play important roles in transcriptional regulation. KLHL28 is 13 fold more abundant in ESCs than in ECCs.
The protein dataset from this study also included ~480 proteins associated with cancer. Many cancer-specific genes (15%) were also found to be highly expressed in ESCs when compared to ECCs. P21-activated kinase 1 (PAK1) overexpression (1.8 fold) has been reported in breast cancer, and anti-PAK1 drugs have been used for pancreatic cancer therapy. Similarly, the cancer/testis antigen melanoma antigen family A, 4 (MAGEA4) is overexpressed in many cancers, including oral squamous cell carcinoma and non-small cell lung cancer. Both PAK1 and MAGEA4 levels were high (6.0 and 3.2 fold, respectively) in ESCs when compared to ECCs. Some proteins involved in limiting proliferation were also overexpressed in ESCs. For example, the HECT-domain ubiquitin ligase Huwe1 was expressed >2.8 fold higher in ESCs. This protein has been described as controlling differentiation and proliferation through ubiquitin-mediated degradation of nMyc and as a key player in multiple cancers by degrading tumor suppressors. Likewise, A-kinase anchor protein 12 isoform 2 (AKAP12), a tumor suppressor gene whose inactivation has been implicated in gastric cancer and myeloid malignancies, was also detected 2.5 fold higher in ESCs than in ECCs. MCAM, an adhesion molecule and another marker associated with certain types of cancer, was also more highly expressed in ESCs.
Lower expression of some known cancer genes was also detected in ESCs compared to ECCs. These included well-known cell-cycle regulators such as S100A4 and p53 induced protein (TP53I11; in the non-cytosolic fraction) as well as signal transduction molecules such as heat shock 27kDa protein 1 (HSPB1), MAP3K1 (MEK1) and nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 2 (NFKBIL2). Invasive factors such as non-metastatic cells 4 protein (NME4) and milk fat globule-EGF factor 8 protein (MFGE8) were also downregulated in ESCs compared to ECCs. These results suggest that the regulation of these factors in ESCs may prevent the tumorigenicity found in ECCs. Other cancer markers such as the adhesion molecules pinin, desmosome-associated protein (PNN) and integrin, beta 1 (ITGB1) were similarly expressed between ESCs and ECCs, as were factors identified in metastatic tumors such as MTA2, MTA3 and non-metastatic cells 1 protein (NME1), suggesting that these are not contributing factors in oncogenesis. Interestingly, GDF3 was also found to be high in ECCs versus normal testis, consistent with our data showing an increase in ECCs compared to ESCs. Another study also compared the gene expression of human ECCs to normal testis. Compared to normal testis, ECCs expressed more branched chain aminotransferase 1, cytosolic (BCAT1), DNMT3B, N-acylaminoacyl-peptide hydrolase (APEH), visinin-like 1 (VSNL1), metallothionein 2A (MT2A), CD9 and GDF3. In our study, comparing ECCs to ESCs, ECCs expressed less DNMT3B (5.6 and 3.5 fold) and VSNL1 (1.8 fold) and more CD9 (1.6 fold), GDF3 (3.0 fold) and BCAT1 (4.0 fold, cytosolic), while APEH showed similar expression.
Currently, there are only a few markers that can distinguish human ECCs from human ESCs. For instance, various proteins encoded on chromosome 12p, which is duplicated in testicular cancer, were uniquely high in human ECCs. Of the 8 proteins originally reported by Dormeyer et al. as unique to ECCs, we found five to be consistently high in ECCs although still expressed in ESCs. These were GAPDH, lactate dehydrogenase B (LDHB), tyrosyl-tRNA synthetase 2, mitochondrial (YARS2), moesin (MSN) and nucleolar protein 1 (NOP2), which are also known to be high in testicular cancer. Furthermore, the germ cell marker recently used to derive adult male germ-line stem cells, ITGA6 (CD49f), showed no change in expression level in ECCs versus ESCs, consistent with the theory of germ cell origin for ESCs. Two other markers have been used previously to distinguish ECCs and ESCs. One is the well-established germ cell-specific marker VASA or DDX4 (DEAD (Asp-Glu-Ala-Asp) box polypeptide 4). The other is a protein marker recently discovered for its role in identifying ECCs in patient semen, known as AP2-gamma or transcription factor activating protein 2-gamma (TFAP2C). However, neither marker was detected in our analyses. As with Oct4, Nanog and Sox2, this is consistent with the inability of these analyses to detect low-abundance transcription factors, emphasizing the current need for complementary approaches to identify factors contributing to cell identity.
Isotope labeling-based quantitative proteomics using mass spectrometry is a powerful approach for global characterization of proteins, highlighting cell-specific key molecules. This is the largest report using this approach to compare ESCs versus ECCs, and it provides a number of candidate factors to study for roles in oncogenesis versus pluripotency. Although cells from different genetic backgrounds are expected to show extensive quantitative differences, the ESC and ECC proteomes identified in this study shared several common factors. Because this technology has greater sensitivity to detect proteins, we are able to report on a number of pluripotency-associated factors and their relative expression levels between two fundamentally similar pluripotent cell lines that differ in their oncogenic tendencies. Significant changes were observed for many targets in this study that could not be detected using label-free methods of quantitation. For instance, one other study reports proteomic comparisons between human ECCs and ESCs, but it was limited to membrane proteins and used methods with reduced sensitivity for detecting differences in protein concentration compared to the analyses performed here. Nonetheless, it provides comparisons that are relatively consistent with the results shown here. The comparison of ESCs and ECCs provides a powerful model to distinguish those factors associated with developmental potency from those regulating tumorigenicity. Large numbers of proteins reported in this study have not been studied in the context of ESCs and ECCs. All the peptides and corresponding protein data found in this study have been deposited in Human Proteinpedia (www.humanproteinpedia.org, identification number HuPA 00641) to facilitate the dissemination of this data set.
Immunocytochemical staining of ECCs and ESCs. A. Oct4, B. SSEA4, and C. Tra-1-81. DAPI (blue) was used to stain nuclei.
MS/MS spectra and the iTRAQ reporter ion spectra of 12 peptides. A. BGN, B. GSN, C. LGALS1, D. TLN1, E. NOP1, F. CRABP2, G. DPPA4 and H. HELLS.
Methods describing details of karyotyping, mass spectrometry analysis and immunocytochemical labeling experiments.
Complete list of proteins quantitated from ESCs and ECCs using iTRAQ.
Subcellular localization of proteins identified in ESCs and ECCs.
The list of proteins quantitated from ESCs and ECCs that are associated with different types of cancer.
Proteins identified in this study that have previously been reported in a meta analysis of 36 studies pertaining to ESC differentiation (Assou et al. 2006).
Correlation analysis among iTRAQ ratios from whole cell, cytosolic and non-cytosolic fractions.
This work was supported by a grant from the Maryland Stem Cell Research Fund, State of Maryland (2007-MSCRFE-0137-01) to C.L.K., J.D.G. and A.P., an NIH Roadmap grant “Technology Center for Networks and Pathways” (U54 RR 020839) to A.P., and a grant from the Maryland Stem Cell Research Fund, State of Maryland (2007-MSCRFE-0210-01) to C.L.K. We thank Marjan Gucek and Robert Cole for assistance with mass spectrometry and Ms. Fei Fei (Cyndi) Liu for help with immunocytochemistry.