Genomics methodologies have advanced to the extent that it is now possible to interrogate gene expression in a single cell, but proteomics has traditionally lagged behind: it required much greater cellular input and was not quantitative. Coupling protein with gene expression data is essential for understanding how cell behavior is regulated. Advances primarily in mass spectrometry have, however, greatly improved the sensitivity of proteomics methods over the last decade and the outcome of proteomic analyses can now also be quantified. Nevertheless, it is still difficult to obtain sufficient tissue from staged mammalian embryos to combine proteomic and genomic analyses. Recent developments in pluripotent stem cell biology have in part addressed this issue by providing surrogate scalable cell systems in which early developmental events can be modeled. Here we present an overview of current proteomics methodologies and the kind of information they can provide on the biology of human and mouse pluripotent stem cells.
Development from a single fertilized zygote to a complex multicellular organism occurs within a relatively short period of time compared with the total lifetime of the resulting adult individual. This remarkable feat requires the precise orchestration of multiple sequential and parallel events controlling cell specification, division, position, migration and communication. With the discovery of genes encoded in DNA, decades of research assumed that the blueprint for the embryo lay entirely in the regulation of gene transcription; but there is a growing realization that epigenetics and the status of proteins in a cell play equally crucial roles. During early differentiation, many changes in protein expression (up to 50%) have no corresponding change in mRNA expression (Lu et al. 2009). Showing the presence of signaling pathway components in a cell is also not sufficient to assess their importance, because protein modifications of many types can affect the functioning of the protein in the cell. It is essential to know the nature of specific signaling pathways, downstream targets, and inhibitory networks, as well as the kinetics of their activation.
Most embryos, however, do not lend themselves easily to the techniques that are available to protein chemists. Classic Western blotting to identify proteins and their activation status, more contemporary ChIP-chip or ChIP-seq to identify interacting partners, and mass spectrometry (MS) for large-scale protein identification, generally require more cells and tissue than available from the mammalian embryos that are closest in development to humans. In contrast, genomic and gene expression profiles can be generated these days from just a single cell. The gap, however, is beginning to close to the extent that some protein assays can be performed on more limited numbers of embryos directly, although the surrogate model systems offered by pluripotent stem cells from mice and humans, as we describe here, are proving exceptionally informative for events that probably take place in the early embryo.
Stem cells are defined by (1) their ability to self-renew, and (2) their ability to differentiate into one or more different cell types. At one end of the spectrum are totipotent cells, like the fertilized egg or early blastomeres that can become all cells of the conceptus. At the other end are spermatogonial stem cells that are unipotent and can only differentiate into sperm. Between these extremes are pluripotent cells of the blastocyst stage of embryonic development and the multipotent stem or progenitor cells of specific tissues and organs like the nervous system (neural progenitor cells) and skin. Multipotent stem cells are able to differentiate to different cell types but usually only those that make up the organ or tissue from which they are derived (reviewed in Jaenisch and Young 2008). In adults, they are thought to be the sources of cells for tissue repair.
Research in the 1960s on teratocarcinomas, spontaneous tumors that look like disorganized embryos found in the testes of some strains of mice, eventually led to the discovery in the 1970s that pluripotent stem cells are also present in early mouse embryos. The experiments performed showed that teratocarcinomas can be induced in mice simply by transplanting normal embryos to extrauterine sites. The tumors that formed contained multiple tissue types as well as a stem cell population that could be maintained indefinitely in an undifferentiated state in culture. When injected into syngeneic hosts, say under the kidney capsule, they would form teratocarcinomas, once more containing the same mixture of differentiated cells. These stem cells are known as embryonal carcinoma (EC) cells. They resemble undifferentiated embryonic cells of the mouse blastocyst stage embryo in many respects, expressing the same cell-surface proteins and enzymes (Gokhale and Andrews 2006; Yu and Thomson 2008). Human EC cell lines have also been derived as a similar stem cell population from the spontaneous teratocarcinomas that can occur in young men, it is thought as a result of germ cell development going awry before birth. Research with these cell lines has provided important knowledge about the properties of these tumors and the early differentiation of the stem cells that they contain. It also provided the intellectual framework for the successful derivation and culture of embryonic stem cells (ESCs) directly from embryos without an intervening teratocarcinoma stage both in mice and in humans.
The observation that early embryos can form teratocarcinomas when transplanted to animals led to the hypothesis that intact embryos may contain cells that are, or can become, pluripotent stem cells. A few years later, ESCs were indeed isolated from the inner cell mass of blastocyst-stage embryos (Evans and Kaufman 1981; Martin 1981). These cells were immortal and able to contribute to embryonic development by differentiation when injected into a host blastocyst, just like EC cells. More importantly, however, they could contribute to the germ line, forming gametes. This allowed them to become the most important vehicle for genetic manipulation in mice, the deletion of genes by homologous recombination. Whereas the first mESCs required culture on so-called “feeder” cells, usually rather ill-defined fibroblast cells that can, for instance, be derived from mouse embryos around mid-gestation, to inhibit differentiation, it is now possible to culture them and maintain the undifferentiated state in defined growth media supplemented with either growth factors or inhibitors of signaling pathways that induce differentiation (Ying et al. 2008). Many years after the first isolation of mESC, the first human ESC (hESC) lines were derived (Thomson 1998) following much the same protocol using feeder cells, although in this case, the cell-surface molecular identity of hESC clearly resembled human EC cells more than mESCs, and the growth factor requirements differed between the two species. Whereas mouse embryonic fibroblasts (MEFs) could be replaced by leukemia inhibitory factor (LIF) and bone morphogenetic protein (BMP) for mESCs, this was not the case for hESC, and two other factors are required in addition to an appropriate coating of the plastic dish: a combination of fibroblast growth factor (FGF) and nodal or activin. hESC and mESC did, however, show a similar capacity to differentiate into many cell types in both teratomas in immunocompromised mice and in culture. 
In addition, the core transcription factors that are now considered the signature of stem cell pluripotency (Loring and Rao 2006) were clearly largely conserved between mice and humans.
Of course, the use of human embryos to derive hESC made them ethically controversial, and several groups addressed the question of whether direct reprogramming of a somatic nucleus was possible, perhaps using signals much like those present in the oocyte, which led to reprogramming of the genome during the generation of the first cloned animal (Campbell et al. 1996). In 2006, however, Yamanaka and colleagues at Kyoto University reported that the introduction of genes encoding four important stem cell transcription factors (Oct4, Sox2, Klf4, and c-Myc) into adult mouse cells by retroviral transduction resulted in reprogramming them into cells with ESC-like properties (Takahashi and Yamanaka 2006). These reprogrammed cells were referred to as “iPSCs,” for induced pluripotent stem cells. In 2007, Yamanaka and the laboratory of Thomson (Takahashi et al. 2007; Yu et al. 2007) described the successful genetic reprogramming of human adult cells into human iPSCs. Because ESCs and iPSCs are pluripotent and appear to model the first differentiation steps in development faithfully, they are frequently used as a surrogate for these processes when large numbers or cell types not accessible from the human embryo are required for study. For all intents and purposes, hESCs and hiPSCs seem to be very similar, although not identical (see, e.g., commentaries Mummery 2011; Panopoulos et al. 2011; Pera 2011). Imprinting and epigenetic memory have been described as being retained during reprogramming, so that during early passages iPSCs may retain a preference to differentiate to the somatic cell type from which they derive, although this may be lost at later passages (Ohi et al. 2011).
In the present article, we will consider what proteomics has added to date to our understanding of pluripotency networks and linked signaling pathways, how cells exit the pluripotent state, and how lineage might be determined during differentiation of stem cells. Genomics provided the genetic tools for reprogramming to pluripotency and direct reprogramming: what can proteomics add? Answering these questions and others like them in stem cells will likely require standardized growth and differentiation protocols for multiple cell lines, much like those available for mESC, and many of these are now being reported. Defined protocols for differentiation are still under development, although exceptionally, the neural lineages now have some robust protocols (Gossrau et al. 2007; Koch et al. 2009; Lee et al. 2011).
To address the contribution of proteomics to understanding pluripotency and differentiation, we divide the text into two main sections:
MS has become one of the most powerful tools for protein identification in cell biology. Mass spectrometers are versatile instruments that come in many different types for multiple applications. Nevertheless, they share one basic concept: they all determine the mass of protein products that can then be used to infer identity. Detection of peptides and proteins by MS is enabled by so-called “soft ionization” techniques, such as electrospray (ES) and matrix-assisted laser desorption/ionization (MALDI), which brings these molecules from the solid to the gas phase without destroying them. Protein identification generally does not occur by measuring the mass of the intact protein because its mass alone is usually not distinctive enough to allow a unique assignment. Rather, proteins are digested with proteases such as trypsin, producing peptides that can then be introduced in the mass spectrometer. This has two distinct advantages: first, it produces an assembly of masses (also called a “peptide fingerprint”), which collectively indicate a unique protein. Second, peptides can be fragmented in the mass spectrometer (much more easily than proteins), generating peptide fragments that can be used to deduce a peptide sequence. The latter approach increases the specificity of protein identification tremendously, especially if multiple peptides can be fragmented per protein. It is for this reason that this technique, known as “tandem MS” (where “tandem” denotes the subsequent mass analysis of a peptide and its fragments) now dominates in the field of proteomics. Collectively, a wide range of mass spectrometers are available, each of which can be used for specific research questions (e.g., protein quantification, PTMs, etc.). Table 1 lists the distinguishing features of some of the most current instruments. Table 2 describes their utility in the field of proteomics, along with some examples of how they have been applied in the area of developmental biology.
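As a rough illustration of the peptide fingerprint idea described above, the following Python sketch digests a protein sequence in silico and computes the monoisotopic mass of each resulting peptide. It is a simplified sketch under stated assumptions, not the procedure of any particular search engine: the cleavage rule (after lysine or arginine, unless the next residue is proline) is the commonly used approximation of trypsin specificity, missed cleavages and modifications are ignored, and the function names and example sequence are hypothetical.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide (H at N-terminus, OH at C-terminus)


def tryptic_digest(sequence):
    """Split a sequence after K or R, except when the next residue is P.

    Assumes complete digestion (no missed cleavages).
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def peptide_fingerprint(sequence):
    """Sorted list of (mass, peptide) pairs for a protein sequence."""
    return sorted((round(peptide_mass(p), 4), p) for p in tryptic_digest(sequence))


if __name__ == "__main__":
    example = "MKWVTFISLLRPGKAEYR"  # hypothetical sequence, for illustration only
    for mass, peptide in peptide_fingerprint(example):
        print(f"{peptide:<12} {mass:10.4f}")
```

The set of masses printed for one protein is its fingerprint. In practice such a list would be matched, within an instrument-dependent mass tolerance, against the fingerprints predicted for every protein in a sequence database.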
Table 1. Characteristics of contemporary mass spectrometers
Table 2. Mass spectrometers and their use in proteomics research
Although MS is one of the key components in proteomics, a number of other developments in accessory techniques have critically contributed to disclosing ever-increasing parts of the proteome. This includes many aspects of proteomic workflows, such as sample preparation, protein and peptide separation, and bioinformatic tools for protein identification, quantification, and data mining (Fig. 1). Some of the challenges that have been met (and that continue to be addressed) are relevant for any proteomic application, whereas others are of particular importance for developmental biology. Therefore, we will briefly mention some of these challenges as well as possible solutions, before describing their application in the field of stem cell biology and mammalian development.
Figure 1. Workflow for the proteomic analysis of biological samples, from sample preparation to biological validation. The indicated techniques are a nonexhaustive list of examples, which often are used in combination.
If two features stand out distinguishing the proteome from the genome, they are its complexity and dynamic range. Complexity indicates the enormous variation in protein entities that, by far, outnumber the number of genes in the genome. This is caused by the many ways proteins can be processed posttranscriptionally, e.g., by splicing, truncation, or one of several hundred other PTMs. In particular, the decoration of proteins by modifications in a combinatorial way increases the number of protein entities exponentially. For instance, the ~15 modifications that are known to occur in the tail of Histone 3 alone (Kouzarides 2007) would result in 2^15 (~32,000) possible combinations (at least in theory), thereby exceeding the number of genes in mammalian genomes. In turn, this number is dwarfed by proteins containing >100 phosphorylation sites (Gnad et al. 2007), which could produce up to 2^100 combinations.
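The combinatorial figures quoted above follow from treating each modification site as an independent on/off switch, so that n sites give 2^n possible states. The short Python sketch below only reproduces that arithmetic. The independence of sites and the restriction to one modification type per site are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
def modification_states(n_sites):
    """Number of theoretical states for n independent binary modification sites."""
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    return 2 ** n_sites


if __name__ == "__main__":
    # ~15 modifications on the histone H3 tail
    print(modification_states(15))   # 32768
    # a protein with 100 phosphorylation sites
    print(modification_states(100))  # 1267650600228229401496703205376
```

These are upper bounds on what could exist in theory. How many of these combinations actually occur in a cell is an empirical question.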
On top of this complexity comes dynamic range, indicating that protein concentration (or copy number) can vary widely among protein species. It is estimated that protein concentration in serum spans 12 orders of magnitude (Anderson and Anderson 2002). For yeast (Picotti et al. 2009) and mammalian cells (Schwanhausser et al. 2011) there is experimental evidence that proteins vary in copy number >6 orders of magnitude. Furthermore, there is no even distribution along this range, because in mammalian cells the 25 most abundant proteins make up 25% of the cellular protein mass, whereas the lower quartile is populated by thousands of proteins (Schwanhausser et al. 2011).
Together, this serves to illustrate that “identifying the human proteome” poses an enormous analytical challenge that will only begin to be tackled in the foreseeable future. This is not to say that meeting this goal only partially cannot be informative―on the contrary. Access to sizable parts of the proteome can be achieved by the implementation of protein and peptide fractionation strategies, or combinations thereof (Fig. 1). Fractionation by protein electrophoresis (Lundby and Olsen 2011), peptide chromatography (Motoyama and Yates 2008), or isoelectric focusing (Krijgsveld et al. 2006) results in reduced complexity per collected fraction, and increased chances to identify more proteins in a subsequent mass spectrometric analysis. Typically, this is the way to uncover low-abundant proteins, which tend to be the ones of highest biological relevance (e.g., kinases and transcription factors). This is the reason why two-dimensional (2D) gel electrophoresis of proteins has largely been replaced by liquid-phase approaches: 2D gels display up to several hundreds of unique (abundant) proteins, whereas a workflow combining protein electrophoresis and peptide chromatography identifies >5000 proteins (Graumann et al. 2008).
This gain in protein number is partly owing to miniaturization of chromatographic columns (~50 µm diameter) and the introduction of high-pressure liquid chromatography (HPLC) and ultra-high-pressure chromatography (UPLC) pumps capable of stably maintaining low flow rates (~100 nl/min). On top of this, technical developments in MS have dramatically increased the speed and sensitivity of data acquisition, sequencing 5–10 peptides/sec, or >15,000/h during a chromatographic separation. In combination with fractionation strategies, this results in the identification of many thousands of proteins (Mann and Kelleher 2008). These are not just figures that are of interest to proteomic technologists, but they bear an important message for (stem cell) biologists. Namely, they indicate that relatively low amounts of starting material (10^5–10^6 cells) are sufficient to generate proteomic data sets that include all classes of cellular proteins, which are very likely to contain biologically relevant information. It is anticipated that lower cell numbers are within reach (10^3–10^4 range), while maintaining high numbers of protein identifications. This means that rare cell populations obtained from tissue culture or fluorescence-activated cell sorting (FACS) are becoming accessible for meaningful proteomic investigation.
Protein function is often modulated by PTMs. Therefore there is a tremendous interest in identifying PTMs in a biological context. Although many modifications are known to occur in nature, phosphorylation may stand above them all because of its importance in signal transduction and cellular signaling. Even though phosphorylation is widespread, it is often substoichiometric and thus tends to be very low-abundant―and difficult to detect by MS. It is for these reasons that intense efforts have focused on enrichment techniques to isolate phosphopeptides selectively from nonmodified peptides, typically using metal-affinity chromatography (Villen and Gygi 2008; Pinkse et al. 2011). Such approaches have been used to identify many thousands of phosphorylation sites (Dephoure et al. 2008) and have been applied successfully to stem cell research (further described below).
Alternative methods studying PTMs include immunoprecipitation using antibodies, followed by MS. Unfortunately, only a limited number of antibodies are truly modification specific (i.e., disregarding any sequence specificity surrounding the modification). One of the few exceptions is antibodies against phosphorylated tyrosine residues, which have been applied successfully to interrogate specific signaling cascades (Blagoev et al. 2004; Ding et al. 2011). Other modifications have been targeted with moderate success, such as antibodies recognizing acetylated lysines. Nevertheless, recent studies have shown that lysine acetylation modification may be just as widespread as phosphorylation, fulfilling an equally important role in regulating protein function (Yang and Seto 2008); nearly all enzymes in primary metabolism (glycolysis, tricarboxylic acid [TCA] cycle, fatty acid metabolism) can be modified by acetylation, the vast majority of which was shown to be functional (Zhao et al. 2010). This shows that the availability of an analytical tool opens up new ways to study basic cellular processes, even for relatively high-abundant proteins that have been studied for decades. This may be crucial for studying many developmental processes, where cellular differentiation and reprogramming (and tumorigenesis) are accompanied by drastic changes in metabolism (Zhu et al. 2010; Levine and Puzio-Kuter 2011).
Another crucial development that has brought proteomic technology considerably closer to cell biology is the ability to quantify protein expression levels. Traditionally, MS has been strong in generating protein inventories; but these lists do not necessarily indicate which proteins are biologically relevant. Identifying those is preferably done in a comparative fashion, contrasting cells that were isolated from different biological states or that received different treatments. Because MS is not an inherently quantitative method, additional means need to be applied to achieve this goal. One possibility is to take the number of fragmented peptides per protein (“spectral count”) as an approximation for protein abundance. Although this may be useful for some applications, it generally suffers from low sensitivity to detect small differences in protein levels (less than a fivefold change) (Vaudel et al. 2010). A preferred strategy entails the introduction of stable isotopes (e.g., 2H, 13C, 15N, and 18O) into proteins of one sample, which then serve as internal standards for the (unlabeled) proteins in the reciprocal sample (Gouw et al. 2010; Washburn 2011). The principle is based on the premise that labeled and unlabeled proteins/peptides behave identically in the biological experiment and throughout the analytical process, the sole difference being in their mass, which can be detected in the mass spectrometer. The relative intensity of a “light” and a “heavy” peak is then a direct measure for their protein abundance in the samples they are derived from.
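To make the light/heavy principle concrete, the sketch below derives a relative protein abundance from the intensities of paired "light" and "heavy" peptide peaks. It is a minimal illustration, not the algorithm of any specific quantification package. Summarizing a protein by the median of its peptide log2 ratios is one common, outlier-tolerant choice that is assumed here, and the function names and intensity values are hypothetical.

```python
import math
from statistics import median


def protein_log2_ratio(peptide_pairs):
    """Median log2(heavy/light) over the peptides of one protein.

    peptide_pairs: iterable of (light_intensity, heavy_intensity) tuples.
    Pairs with a missing (non-positive) intensity are skipped.
    """
    log_ratios = [
        math.log2(heavy / light)
        for light, heavy in peptide_pairs
        if light > 0 and heavy > 0
    ]
    if not log_ratios:
        raise ValueError("no quantifiable peptide pairs")
    return median(log_ratios)


def fold_change(peptide_pairs):
    """Heavy-over-light fold change derived from the median log2 ratio."""
    return 2 ** protein_log2_ratio(peptide_pairs)


if __name__ == "__main__":
    # Hypothetical intensities for three peptides of the same protein.
    pairs = [(100.0, 200.0), (50.0, 100.0), (10.0, 40.0)]
    print(protein_log2_ratio(pairs))  # 1.0
    print(fold_change(pairs))         # 2.0
```

Working in log space treats a twofold increase and a twofold decrease symmetrically, and the median keeps a single aberrant peptide (the third pair in the example) from dominating the protein-level estimate.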
Technically, proteins can be labeled by stable isotopes in various ways. Among the most powerful methods, cells can be labeled in cell culture in the presence of labeled amino acids (e.g., lysine and/or arginine). This method, known as stable isotope labeling with amino acids in cell culture (SILAC) (Mann 2006), has the advantage that full labeling of all proteins occurs metabolically, i.e., without the need for protein extraction and additional sample handling. Clearly, the prerequisite is that cells of interest can be grown in defined media: the presence of unlabeled amino acids in poorly defined supplements (e.g., serum) will compromise labeling efficiency. The use of dialyzed serum usually circumvents this problem. Another potential caveat is the metabolic conversion of (labeled) arginine into proline, thus diluting the label and complicating downstream data analysis. It should be noted, however, that conversion rates differ greatly between cell types, and therefore testing beforehand is advisable. For highly converting cells, the process can be inhibited by adding an excess of unlabeled proline (Bendall et al. 2008) or by changing the labeling regimen (Van Hoof et al. 2007). Several groups have successfully used SILAC to label ESCs (Van Hoof et al. 2007; Bendall et al. 2008; Graumann et al. 2008; Prokhorova et al. 2009), thus opening the way to use this powerful approach in developmental biology.
If SILAC is not possible or is impractical for a particular cell type, there is a range of possibilities to introduce the isotope label post isolation by chemical means (Bantscheff et al. 2007; Elliott et al. 2009). Without going into great detail here, this includes iTRAQ (Karp et al. 2010), ICPL labeling (Lottspeich and Kellermann 2011), reductive dimethylation (Boersema et al. 2009) and various others. One method is not necessarily better than the other; considerations in choosing either of these methods include: availability of appropriate mass spectrometric infrastructure, price tags of the reagents (some are only commercially available as kits, e.g., iTRAQ), and―pragmatically―experience that is present in the (collaborating) MS laboratory. Availability of appropriate software is becoming less of an issue, because packages accepting various labeling formats are being developed (Cox and Mann 2008; Mortensen et al. 2010).
In summary, it is clear from the above that proteomics has much to teach us about stem cell biology and development, in a highly complementary fashion to genomics, which lacks the ability to accurately chart the behavior and effects of proteins, and is blind to their modifications. However, the application of proteomics has been limited by (1) its requirement for large cellular input (only recently has it become possible to scale up stem cell production to levels appropriate for proteomic analysis, and have proteomics methods been adapted to lower cellular input), (2) its inability to quantify the outcome and compare relative levels of proteomic changes, and (3) limited access for most biologists to high-end proteomic platforms. All of these issues are now being addressed and it is expected that proteomic analysis will reveal much more about stem cell biology in the coming years when complemented with genomic analysis. Yet, the direct application of proteomics to the development of embryos will, for the time being, remain limited to species like zebrafish, fruit flies, and amphibians, in which it is possible to produce large numbers of identically staged embryos or tissues within a reasonable time span and cost. For mice and their embryos as well as for humans, it is likely that yet one higher order of magnitude in sensitivity is required before proteomic analysis, at least by MS, becomes feasible to implement widely in developmental biology and stem cells.
One of the strongest motivations to study cellular processes at the proteome level is the increasing awareness that transcript levels are poor indicators for protein abundance (Schwanhausser et al. 2011). This has been beautifully shown recently for differentiating stem cells, showing that transcription profiles, epigenetic marks, and protein levels were highly divergent (Lu et al. 2009). At the same time, this is one of the few papers describing the dynamics of protein expression during the early phases of differentiation at a relatively large scale. It follows earlier (mostly semiquantitative) studies, mostly studying cellular differentiation. For instance, the first large-scale MS-based study on differentiating hESCs compared the proteomes of undifferentiated hESCs with their derivatives formed after 12 days of undirected “spontaneous” differentiation into a heterogeneous population (Van Hoof et al. 2006). More than 700 of the total of nearly 2300 unique proteins found in both samples were identified as being present in undifferentiated cells only. Of these, 191 were also detected in mouse ESCs in a parallel analysis. Among these were several proteins that at the time were not previously known to be enriched in or specific for ESCs; some examples are proteins like TOP2A, MCM4, KPNA2, and Sall4. Although these numbers represent only a fraction of the actual proteome, the technique as such proved sensitive enough to detect ESC-associated low-abundance transcription factors. These included well-known pluripotency proteins like OCT4 and UTF1 (for review, see Van Hoof et al. 2008). Interestingly, based on their gene ontology, many of the identified proteins were annotated as nuclear, which is to be expected considering the high nucleus-to-cytosol ratio of ESCs. 
Even though subcellular locations were confirmed for a handful of these proteins using fluorescence microscopy, one of the major advantages of MS-based proteomics over transcriptome analysis, the ability to generate cell compartment-specific data sets, was not implemented in this study.
The nuclear proteomes of hESCs and neural stem cell analogs derived from them were, however, analyzed by a more conventional approach involving isolation of the nuclei by centrifugation before 2D difference gel electrophoresis. The proteins that were differentially expressed were then identified by MS (Barthelery et al. 2009). Although not as comprehensive as an unbiased, discovery-oriented MS-based analysis, this study identified CPSF6 as a novel potential hESC-specific protein.
In contrast to cytosolic and nuclear proteins, membrane-associated proteins―in particular, transmembrane proteins―are notoriously difficult to purify owing to their generally hydrophobic nature. Therefore, these proteins are likely to be underrepresented in global proteomics analyses. In one study, in an attempt to skew this bias for detection of hydrophilic proteins, samples of hESCs and their tumorigenic counterpart, human EC cells, were enriched for membrane-associated proteins using ultracentrifugation before MS analysis (Dormeyer et al. 2008). The high percentage of commonly expressed proteins confirmed similarities in expression patterns known to exist between hESCs and human EC cells, both of which are pluripotent, divide relatively rapidly, and self-renew by symmetrical division. The relatively few differences in surface proteins that were found might betray signaling pathways that are active specifically in carcinoma cells, shedding light on their tumorigenic behavior as opposed to the benign properties of pluripotent cells in the developing embryo.
The studies described above all used semiquantitative proteomics. However, as mentioned in the previous section, SILAC and iTRAQ are being used increasingly to quantitate the relative differences in protein levels between two samples. The use of SILAC has, however, been challenging because, unfortunately, some hESC lines show a high rate of arginine-to-proline conversion, which compromises the accuracy of SILAC-based quantitation when using heavy stable isotope-containing arginine for protein labeling in vivo. Two approaches have been developed to address this technical issue: inclusion of the “light” forms of the arginine in the “unlabeled” sample (Van Hoof et al. 2007), or lowering the arginine concentration in the medium to a minimum (Bendall et al. 2008). Two independent studies used a very similar quantitative MS approach to compare the proteomes of hESCs and their differentiated derivatives, formed after exposure of the undifferentiated cells to the BMP4-inhibitor noggin, which induces neuronal differentiation (Yocum et al. 2008; Chaerkady et al. 2009). Despite the comparable strategies and analytical methods used, only a small number of proteins were found to be commonly differentially expressed in the two studies. The lack of concordance is not typical for these two studies alone; it appears to be a widespread phenomenon that is often attributed to dissimilarities in differentiation propensities of individual hESC lines, the variety in MS strategies used, and “undersampling,” denoting the property that an MS experiment usually identifies a (random) portion of the sampled proteome.
Combining subcellular fractionation with quantitative MS showed its potential in a search for a cell-surface protein that would allow antibody-based purification of hESC-derived cardiomyocytes (Van Hoof et al. 2010). In this study, directed differentiation was used to induce cardiomyocyte formation from hESCs under SILAC conditions. Although the differentiated population did not consist exclusively of cardiomyocytes, the purified plasma membrane proteome of the cardiomyocyte-enriched populations was compared quantitatively to that of unlabeled undifferentiated hESCs to select differentially expressed surface proteins. In parallel, human fetal heart muscle cells were compared to find surface proteins commonly expressed by primary and stem cell-derived cardiomyocytes. The resulting data sets were then screened for overlapping proteins, identifying EMILIN2 as a candidate that turned out to be suitable for FACS of cardiomyocytes from the heterogeneous pool of hESC-derived differentiated cells. Unfortunately, the protein was not stably associated with the plasma membrane and could only be used to sort fixed but not live cells. The approach was nevertheless shown to be feasible even for relatively low abundance surface proteins.
An area where MS has been particularly powerful is in the characterization of protein complexes and interactions. Tagging approaches have been designed to capture proteins of interest by affinity purification, followed by proteomic identification of interaction partners. Such studies have been extremely helpful in charting molecular networks, ranging from the local environment of individual proteins to a genome-wide description of protein–protein interaction networks (reviewed in Vermeulen et al. 2008; Wodak et al. 2009; Vidal et al. 2011).
Protein–DNA networks in stem cells have primarily been approached by ChIP-chip and ChIP-seq, taking individual proteins as initial baits, and locating where they bind in the genome. Regions of prime interest include the loci of the core pluripotency factors (Oct4, Sox2, and Nanog), in an effort to explain how transcription of the genes encoding these proteins may be modulated by regulatory interactors. Indeed, from these studies it has become apparent that, e.g., enhancer domains of the Oct4 locus interact with a large number of transcriptional regulators, such as Esrrb, Tcf3, Zfx, and c-Myc (Young 2011). Recent proteomic screens, taking a more unbiased approach, have expanded these networks considerably by screening for proteins that directly interact with pluripotency factors. This has indicated that Oct4 (Pardo et al. 2010; van den Berg et al. 2010) and Nanog (Wang et al. 2006), either directly or indirectly, interact with an extensive set of proteins, including transcription factors, chromatin remodelers, and components of the basal transcriptional machinery. Many of these interactions have been shown to be functional, indicating that the activity of both Oct4 and Nanog is modulated by a large number of cofactors (Wang et al. 2006; Pardo et al. 2010; van den Berg et al. 2010). This notion also emerged from an elegant study, where affinity purification and lentiviral expression were combined to identify interactions between transcription factors and chromatin remodeling complexes (Mak et al. 2010). These studies are important for placing proteins that are pivotal for pluripotency into a biochemical context. Importantly, they complement approaches such as ChIP-chip and ChIP-seq, which study genomic localization of individual proteins. The recent protein interaction studies mentioned above have expanded and refined the circuitry controlling pluripotent cell identity, while identifying players potentially contributing to cellular reprogramming.
In addition to identifying direct protein interactions in a discovery-oriented manner, MS-based proteomics excels at finding out which signaling pathways are active in a cell, and which become activated or deactivated on external signals. Mapping such dynamic processes goes beyond simply detecting the presence of a protein and determining its subcellular location; it requires the identification of PTMs on these proteins—a task for which MS is exceptionally well suited.
Because phosphorylation is common, and probably one of the earliest PTMs that occur at the onset of differentiation, it was also the first PTM to be investigated in hESCs in a systematic way. An initial screen identified nearly 11,000 phosphorylation sites on more than 4000 proteins in hESCs (Swaney et al. 2009). Among these were several known pluripotency-associated proteins, including the transcription factors OCT4 and SOX2. OCT4 was found to be phosphorylated at serine residue 236 (Ser236), which lies within the DNA-binding domain, implying its involvement in transcriptional activity, whereas multiple phosphorylation sites were found for SOX2 (i.e., Ser246, Ser249, Ser250, and Ser251). However, because these cells were analyzed only in their undifferentiated state, the data set represented a static map of phosphorylated residues of the proteins identified, thereby limiting the interpretation of their biological significance to speculation.
Subsequent studies compared hESCs before and after differentiation to deduce phosphorylation associated with pluripotency and differentiation. Brill et al. (2009) charted the dynamics in phosphorylation that occur in hESCs treated for 4 days with retinoic acid, which induces efficient differentiation into a heterogeneous population of cells. Their semiquantitative analysis indicated that many components of the signaling cascades that are believed to be important for hESC self-renewal are phosphorylated in undifferentiated cells, among which were members of the EGF, VEGF, and PDGF pathways. Indeed, blocking each of the signal transduction pathways individually by inhibiting the activity of receptors at the top of these signaling chains resulted in major morphological changes in the cells in addition to reducing or inducing complete loss of the transcription factors OCT4 and NANOG.
Van Hoof et al. (2009) applied SILAC-based MS to quantify early phosphorylation changes that occur in hESCs at multiple time points (30 min, 1 h, and 4 h) on BMP4-induced differentiation. BMP4 is one of the most rapid inducers of differentiation in hESCs (Pera et al. 2004); early derivatives eventually form mesoderm or trophectoderm. The resulting data set, consisting of more than 5000 proteins and 3000 phosphosites, provided quantitative as well as temporal information on the dynamics of phosphorylation events in hESCs during their exit from the pluripotent state and their commitment to a specific lineage. When the data were subjected to an algorithm that links phosphorylation motifs to kinases (Linding et al. 2007), CDK1/2 emerged as a central kinase regulating self-renewal and lineage specification. Furthermore, the largest group identified comprised nucleic acid-binding proteins and transcription factors; both were significantly reduced in differentiating cells, which indicates that these classes of proteins are highly represented in hESCs. Interestingly, the three consecutive serine residues of the transcription factor SOX2 that had been found to be phosphorylated (i.e., Ser249, Ser250, and Ser251) did not show altered phosphorylation levels in the differentiating cells, even though the protein itself was rapidly eliminated after the onset of differentiation. Instead, phosphorylation of these residues was associated with increased binding of the small ubiquitin-related modifier (SUMO) to the proximate lysine residue at position 245 (Lys245), as assessed by mutating the three serine residues to aspartic acids, thereby mimicking the negative charge resulting from phosphorylation. PolySUMOylation of a lysine residue usually targets the modified protein for proteasomal degradation (Gareau and Lima 2010).
Interestingly, the combined lysine and serine residues closely match a defined phosphorylation-dependent SUMOylation motif also described for other SOX family members (Hietakangas et al. 2006). Therefore, the unwavering presence of SOX2 in undifferentiated cells, despite continuous phosphorylation and its likely subsequent SUMO-induced degradation in self-renewing hESCs, suggests that the levels of this transcription factor are tightly controlled by a fine balance between translation and degradation. The proposed importance of a consistent level of this core transcription factor in hESCs is in line with that already observed for OCT4: a twofold increase of the latter initiates differentiation into primitive endoderm and mesoderm, whereas a decrease results in the formation of trophectoderm (Niwa et al. 2000).
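Matching a sequence against a phosphorylation-dependent SUMOylation motif can be sketched as a simple pattern scan. The example below assumes the consensus ψKxExxSP described by Hietakangas et al. (2006), where ψ is a large hydrophobic residue; the choice of residues allowed for ψ, the function name, and the test peptide are hypothetical and chosen for illustration only. The peptide is not the SOX2 sequence, and strict matching of this kind would not capture sites that only "closely match" the consensus.

```python
import re

# Consensus psi-K-x-E-x-x-S-P. The hydrophobic class [VILMF] is an assumption
# made for this sketch; a lookahead is used so overlapping matches are reported.
PDSM_PATTERN = re.compile(r"(?=([VILMF]K.E..SP))")

def find_pdsm(sequence):
    """Return (lysine_position, serine_position, matched_text) tuples, 1-based."""
    hits = []
    for match in PDSM_PATTERN.finditer(sequence.upper()):
        start = match.start(1)            # 0-based index of the hydrophobic residue
        lysine_pos = start + 2            # SUMO acceptor lysine, 1-based
        serine_pos = start + 7            # phosphorylatable serine, 1-based
        hits.append((lysine_pos, serine_pos, match.group(1)))
    return hits

if __name__ == "__main__":
    toy_peptide = "GGGVKAEGGSPGGG"        # hypothetical peptide, not a real protein
    print(find_pdsm(toy_peptide))         # [(5, 10, 'VKAEGGSP')]
```

A position-weighted scoring scheme, rather than an exact regular expression, would be needed to rank near matches.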
Rigbolt et al. (2011) reported an impressive 6500 different proteins and 23,500 phosphorylation sites identified in differentiating hESCs that were exposed to either nonconditioned medium (NCM) or a diacylglycerol analog (phorbol 12-myristate 13-acetate, PMA). These extremely high numbers of identifications illustrate the rapid progress in size and biological relevance of data sets that can be generated nowadays with MS-based proteomics since the first hESC proteome was profiled five years earlier (Van Hoof et al. 2006). SILAC permitted quantitative cross-comparison between the two conditions in the study of Rigbolt et al. (2011), revealing that, irrespective of the type of differentiation induced, serine residues within basic or acidic amino acid-rich motifs generally became progressively phosphorylated, whereas those with an adjacent proline residue showed reduced phosphorylation. In contrast, a decrease in SOX2 phosphorylation was observed for only Ser246, Ser249, and Ser251 under NCM conditions; phosphorylation ratios remained unchanged on treatment with PMA. Concomitantly, the level of SOX2 protein increased in the cells differentiating in NCM, whereas the level in those differentiated in the presence of PMA decreased. This implies that, when these residues are unphosphorylated, the protein is protected from degradation, whereas phosphorylation does the opposite. This is in agreement with the earlier hypothesis that relates the introduction of an acidic group to the trio of serine residues through phosphorylation to an increase in SUMOylation of Lys245, followed by proteasomal degradation of SOX2 (Van Hoof et al. 2009). Checking the individual serine residues within the triplet independently, we found that mutating Ser249 and Ser251 to aspartic acids increased SUMOylation, whereas mutating Ser250 did not (Fig. 2). 
Combined, these data fit a model describing regulation of SOX2 protein levels within hESCs and differentiating cells that is dependent on multiple PTMs, i.e., phosphorylation and SUMOylation, the latter of which is causally related to the former (Fig. 3).
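Separating a change in phosphorylation from a change in protein abundance, as in the SOX2 comparisons above, requires relating the SILAC ratio of a phosphopeptide to that of its parent protein. The following is a minimal sketch of one common convention, subtracting the protein log2 ratio from the phosphosite log2 ratio; it is not necessarily the procedure used in the cited studies, and the function names and intensity values are hypothetical.

```python
import math

def log2_ratio(heavy, light):
    """log2 of the heavy/light SILAC intensity ratio."""
    if heavy <= 0 or light <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(heavy / light)

def abundance_corrected_site_change(site_heavy, site_light, prot_heavy, prot_light):
    """Phosphosite log2 ratio minus parent-protein log2 ratio.

    A value near 0 means the phosphopeptide changed only as much as the
    protein itself, i.e. no change in the phosphorylated fraction is inferred.
    """
    return log2_ratio(site_heavy, site_light) - log2_ratio(prot_heavy, prot_light)

if __name__ == "__main__":
    # Hypothetical intensities: phosphopeptide and protein both drop fourfold.
    print(abundance_corrected_site_change(1.0, 4.0, 2.0, 8.0))   # 0.0
    # Hypothetical intensities: phosphopeptide up eightfold, protein up twofold.
    print(abundance_corrected_site_change(8.0, 1.0, 2.0, 1.0))   # 2.0
```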
Western blots showing wild-type SOX2 and SOX2 mutants expressed in HeLa cells. HeLa cells that were transfected with 6 × histidine-tagged SUMO2 in combination with either wild-type SOX2 (lane 2) or SOX2 mutants where serine residues 249, 250, ...
Rapid advances in contemporary proteomics methodologies and commercial media preparations for cell expansion and differentiation are now making it possible to scale up cell production at every stage of development. This is independent of whether cells are undifferentiated, at lineage progenitor stages, or in the process of reprogramming to iPSCs. Genomic and proteomic assays can now be performed simultaneously and with increasing sensitivity on the same samples. Direct genome-wide comparisons of the transcriptome with the proteome will increasingly lead to integrated analyses of stem cells before and during differentiation so that systems biology approaches will be feasible. Equally important, it will be possible to integrate data on the epigenetic status of cells so that their self-renewal can be maintained, not only when the cells are undifferentiated and pluripotent, but also at lineage progenitor stages. Neural progenitors are at present the only progenitors derived from pluripotent stem cells that can be expanded robustly in culture. This is a great advantage for future clinical applications, because it does not require repeated return to the undifferentiated cells, which have the capacity to form teratomas after transplantation.
Among the specific stem cell questions that still need to be addressed are:
As further refinement takes place and proteomics increases in sensitivity, it may one day become possible to analyze the proteomic status of embryos directly with a limited amount of tissue or cells so that issues of erroneous imprinting in development can be addressed proteome- and genome-wide. Until that time, stem cells in their various forms will provide a much-needed source of new and exciting information on the control of self-renewal and directed differentiation.
Research in the Mummery Laboratory is supported by the Netherlands Proteomics Consortium (NPC: 050-040-250). J.K. is supported by a Vidi grant from the Netherlands Organisation for Scientific Research (NWO).
Editors: Patrick P.L. Tam, W. James Nelson, and Janet Rossant
Additional Perspectives on Mammalian Development available at www.cshperspectives.org