Stem cell differentiation and the maintenance of self-renewal are intrinsically complex processes requiring the coordinated dynamic expression of hundreds of genes and proteins in precise response to external signalling cues. Numerous recent reports have used both experimental and computational techniques to dissect this complexity. These reports suggest that the control of cell fate has both deterministic and stochastic elements: complex underlying regulatory networks define stable molecular ‘attractor’ states towards which individual cells are drawn over time, whereas stochastic fluctuations in gene and protein expression levels drive transitions between coexisting attractors, ensuring robustness at the population level.
Stem cells have a crucial role during development, tissue regeneration and healthy homeostatic cell turnover. All stem cells share the ability to self-renew and to differentiate into various lineages. Embryonic stem (ES) cells, which are derived from the inner cell mass of the developing blastocyst, are pluripotent, whereas stem cells derived from adult tissues are generally only multipotent, maintaining a limited, tissue-specific regenerative potential1–3. In culture, ES cells can be propagated indefinitely, whereas adult tissue stem cells are more limited in this capacity.
Owing to their ability to generate tissue de novo following disease or injury there is widespread hope of developing stem cell-based therapies for various degenerative diseases4. Recent reports have indicated that differentiated adult cells can easily be reprogrammed to an embryonic-like pluripotent state5,6 (thus potentially providing a patient-specific source of pluripotent stem cells), as well as be reprogrammed to other adult cell types without intermediate reversion to a pluripotent state7,8. These findings have served to intensify interest in understanding the molecular basis of cell fate regulation and the potential therapeutic uses of stem cells9. However, before such stem cell-based therapies can be routinely and safely developed, numerous crucial issues must be addressed. In particular, although great progress has been made towards understanding the roles of the homeodomain transcription factors OCT4 (also known as POU5F1) and NANOG, as well as SRY box-containing factor 2 (SOX2) in the maintenance of stem cell pluripotency10–16 (BOX 1), the extended molecular mechanisms of ES cell fate control have yet to be fully determined.
Maintenance of pluripotency and self-renewal in embryonic stem (ES) cells is controlled by a complex interplay between signalling from the extracellular environment and the dynamics of core transcription factors. Although self-renewal signalling pathways differ between mice and humans123, the core transcriptional circuitry seems to be remarkably conserved. In particular, the homeodomain transcription factors OCT4 (also known as POU5F1) and NANOG, as well as SRY box-containing factor 2 (SOX2), form a transcriptional module that has a central role in maintaining ES cell identity both in mice and humans10,11,13–16 (see the figure). This module is rich in positive feedback and feedforward loops (see also BOX 2). In particular, OCT4 and SOX2 form a heterodimer that positively regulates the expression of the Pou5f1 (which encodes OCT4), Sox2 and Nanog10,11,124,125. In addition, NANOG also interacts directly with OCT4 (not shown)17 and positively regulates the expression of all three genes11. Thus, these three transcription factors regulate their own and each other’s expression in a highly coordinated manner, involving positive protein–protein and protein–DNA feedback loop interactions. Furthermore, all three transcription factors co-occupy numerous developmentally important genes and repress the expression of the genes involved in lineage commitment. These include: Hand1 (heart and neural crest derivatives-expressed 1), eomesodermin (Eomes) (both involved in trophectoderm development); Lhx5 (LIM homeobox 5), Otx1 (orthodenticle homologue 1), Hoxb1 (all involved in ectoderm development); Myf5 (myogenic factor 5), T (brachyury protein homologue), Gsc (goosecoid) (all involved in mesoderm development); and Foxa2 (forkhead box A2) and Gata6 (GATA-binding protein 6) (both involved in endoderm development). 
At the same time, OCT4, NANOG and SOX2 activate genes that are associated with self-renewal and pluripotency, including other ES cell-associated transcription factors such as Tcl (T cell leukaemia/lymphoma), Tbx3, Rest, Zic3, Hesx1 (homeobox expressed in ES cells 1), Stat3 (signal transducer and activator of transcription 3), Rex1 (also known as Zfp42), Sall4, Tcf3 and Dax1 (also known as Nr0b1)11. Thus, OCT4, SOX2 and NANOG are central to the maintenance of ES cell identity; appropriate expression of this protein trio holds the cell in a pluripotent self-renewing state by activating other ES cell-specific genes and repressing genes that are associated with lineage commitment, and loss of expression leads to loss of the self-renewing ES phenotype and commitment to differentiation. Dotted arrows denote potential feedback mechanisms from downstream targets back to the core circuit.
Mathematically, a network is a data structure known as a graph35, consisting of a set of nodes and a set of edges or arcs (which are directed edges) that connect the nodes in the graph. In the context of biological regulatory networks, the node set represents the list of molecular components in the network (for example, genes and proteins) and the edge set describes functional relationships between the nodes. So, in a protein–protein interaction network, nodes represent proteins and edges represent physical interactions between proteins, whereas in a transcriptional network nodes represent transcription factors and arcs represent functional regulation of transcription. The figure shows examples of simple directed and undirected graphs. In each case, there are three nodes, labelled 1, 2 and 3, and edges and arcs are coloured black. On the left is an undirected graph on the mutually connected nodes. In this case, the edges have no specified direction and so are drawn without arrowheads. In the middle is a three-node feedback loop. In this case the nodes regulate each other in a directed cyclic manner. On the right is a three-node feedforward loop. In this case, node 1 regulates nodes 2 and 3, and node 2 also regulates node 3. Feedback and feedforward loops such as these are common in transcriptional regulatory networks and can give rise to complex dynamic behaviour. For example, the presence of a positive feedback loop is a necessary condition for the existence of multiple stable stationary states74 (see BOX 3).
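The two directed three-node motifs described above can be written down explicitly as sets of arcs. The following is a minimal sketch in Python, not taken from the original figure: the function names are illustrative assumptions, and the check simply tests whether some ordering of the three nodes matches each motif.

```python
from itertools import permutations

# Arcs (directed edges) for the two three-node motifs described in the text.
feedback_arcs = {(1, 2), (2, 3), (3, 1)}     # 1 -> 2 -> 3 -> 1
feedforward_arcs = {(1, 2), (1, 3), (2, 3)}  # 1 -> 2, 1 -> 3, 2 -> 3


def is_feedback_loop(arcs, nodes=(1, 2, 3)):
    """True if some ordering a -> b -> c -> a of the nodes is present."""
    return any({(a, b), (b, c), (c, a)} <= arcs
               for a, b, c in permutations(nodes))


def is_feedforward_loop(arcs, nodes=(1, 2, 3)):
    """True if some node a regulates b and c, and b also regulates c."""
    return any({(a, b), (a, c), (b, c)} <= arcs
               for a, b, c in permutations(nodes))


print(is_feedback_loop(feedback_arcs), is_feedforward_loop(feedback_arcs))
print(is_feedback_loop(feedforward_arcs), is_feedforward_loop(feedforward_arcs))
```

An undirected graph could be represented in the same way by storing each edge as an unordered pair (for example, a frozenset), so that no direction is implied.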
The molecular state of a cell can be described by its state vector s(t) = [m1(t), m2(t), m3(t), …, mn(t)], in which mi(t) denotes the concentration of the ith molecular component at time t. The set of all possible molecular configurations is called the ‘state space’. A dynamical system is a mathematical description of how a system’s state vector changes over time based on the interactions between all the various components in the system (in the form of a set of coupled differential or difference equations, for example). Owing to the coupling between molecular components, the expression levels of the different components in a dynamical system generally change over time in a coordinated way, and this coordination restricts the trajectories that the system may take in state space over time. An attractor of a dynamical system is a minimal subset of state space A, such that all trajectories starting in the vicinity of A approach A eventually. Intuitively, attractors can be thought of as stable preferred states in which all the various interactions in the system are balanced and towards which the system is drawn over time. Attractors can be fixed points, corresponding to static stationary states, or more complex sets, corresponding to dynamic states such as limit cycles (oscillators) or strange (chaotic) attractors79. For a given attractor A, the subset of state space NA for which all trajectories starting in NA approach A for large time is known as the ‘basin of attraction’ of A. Some dynamical systems have many coexisting attractors, in which case the system is said to be ‘multi-stable’. The basins of attraction of the various attractors in a multi-stable system partition the state space into discrete pieces. As stationary attractors can intuitively be associated with the minima of an ‘energy-like’ function79, in the context of cellular differentiation this partitioning is sometimes referred to as the attractor landscape69 (see also FIG. 2).
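To make the notions of attractor and basin of attraction concrete, the following sketch integrates a hypothetical two-component system in which each component represses the other. The equations, the parameter values (a = 4, Hill exponent 2) and the simple Euler integration scheme are illustrative assumptions and are not drawn from any of the studies discussed here. For these values the system has two stable fixed points, at (2 + √3, 2 − √3) and (2 − √3, 2 + √3), and the initial state decides which of the two is reached.

```python
import math


def simulate(x, y, a=4.0, n=2, dt=0.01, steps=5000):
    """Euler integration of a hypothetical mutual-repression circuit:
    dx/dt = a / (1 + y**n) - x,  dy/dt = a / (1 + x**n) - y.
    """
    for _ in range(steps):
        dx = a / (1.0 + y ** n) - x
        dy = a / (1.0 + x ** n) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y


high, low = 2 + math.sqrt(3), 2 - math.sqrt(3)

# Two initial states on opposite sides of the diagonal x = y
# lie in different basins of attraction.
print(simulate(2.0, 1.0))  # approaches (high, low)
print(simulate(1.0, 2.0))  # approaches (low, high)
```

In this toy system the diagonal x = y separates the two basins of attraction, so the state space is partitioned into two pieces, as described above for multi-stable systems.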
To begin to deconstruct these intrinsically complex regulatory mechanisms it is now common for stem cell studies to combine low-throughput experimental techniques with an ever-increasing range of different high-throughput experimental techniques. Consequently, stem cell studies now often produce large amounts of data, and integrating these data into a coherent quantitative picture of cell fate control at the systems level is an important current research challenge. To address this challenge several groups have begun to apply systems biology approaches to understanding the regulation of stem cell fate decisions11,12,17,18.
Instead of focusing on the role of individual genes, proteins or pathways in biological phenomena, the aim of systems biology is to characterize the ways in which essential molecular parts interact with each other to determine the collective dynamics of the system as a whole19–23. However, it is difficult to understand collective behaviour in complex systems using experimental approaches alone. Therefore systems biology approaches often employ high-throughput experimental techniques alongside theoretical and computational methods, which are specifically designed to dissect collective phenomena in complex systems24–26. Although to date systems biology approaches are mostly successful in lower organisms, such as yeast27,28 and bacteria29,30, the complexity of mammalian stem cell biology as well as the experimental reproducibility of many stem cell systems makes mammalian stem cell biology a good platform for the development of future systems biology techniques. In the context of stem cell biology, the aim of systems biology approaches is to characterize the molecular components involved in stem cell self-renewal and differentiation along specific lineages — from core transcription factors and the genes they regulate to proteins and protein complexes to microRNAs (miRNAs) and other epigenetic marks — and elucidate their functional interactions. The ultimate goal is to understand the dynamic behaviour of the resulting molecular circuits and elucidate how these circuits control cell fate changes. In the context of cellular reprogramming, systems biology approaches aim to use advances in the understanding of the molecular basis of normal cell fate decisions during development to generate strategies for the experimental conversion of adult cells from one type to another.
In this Review we discuss a range of ways in which high-throughput experimental techniques and computational methods are being fruitfully combined towards the development of stem cell systems biology approaches. We begin by outlining how data from high-throughput experiments can be used to reconstruct accurate stem cell regulatory networks. However, as stem cell regulatory circuits are typically intricate and contain highly nested feedback loops and feedforward loops that give rise to complex dynamics, it is difficult to elucidate cell behaviour from this regulatory circuitry. Therefore, we also discuss how computational techniques can be used to relate dynamic cell behaviour to regulatory architecture. In particular, we focus on how cell types can be thought of as balanced states or ‘attractors’ of underlying regulatory networks and the ways in which stochastic and deterministic mechanisms interact to define cell fate. We conclude with some suggestions of directions for future work in this area, including ways in which these notions might be used to better understand cellular reprogramming.
Molecular biology has entered the high-throughput age. Consequently, it is now typical for stem cell studies to make use of various disparate high-throughput techniques to determine the molecular mechanisms of cell fate specification. These techniques include: microarrays to assess genome-wide mRNA expression; high-throughput chromatin immunoprecipitation (ChIP) such as ChIP-on-chip31, ChIP-seq (ChIP-sequencing)32 and ChIP-PET (ChIP-paired-end-ditag)10 to assess protein–DNA interactions; and mass spectrometry proteomics33 and phosphoproteomics34 to assess the protein composition of molecular complexes and global changes in post-translational modifications. Because high-throughput techniques measure system-wide expression patterns, rather than focusing on the behaviour of key molecular elements, their development has driven increasing interest in systems biology approaches to understanding cell behaviour22. An important challenge in this area is how best to integrate the wealth of data that high-throughput studies produce into a coherent qualitative and quantitative understanding of cell behaviour at the systems level. One approach to dissecting this complexity is to represent the underlying stem cell molecular regulatory mechanisms as ‘networks’.
To make sense of complex biological datasets it is becoming common to represent molecular components and their interactions as networks and apply techniques from the mathematical theory of graphs35 to their analysis (BOX 2). The combination of high-throughput experiments and the representation of high-dimensional data in the form of networks is the basis of much of modern systems biology. This integrated experimental–theoretical approach has greatly enhanced our understanding of a wide range of complex mammalian biochemical systems36, including signalling networks37,38, protein interaction networks39,40 and genetic regulatory networks41. Representing complex biological systems as networks is useful as it provides a formal way to combine different types of biological datasets into a single conceptual framework42.
Two key elements are required to construct a biological regulatory network: a list of molecular parts (such as sets of genes, proteins or miRNAs) and a set of regulatory interactions between these parts (for example, activation or inhibition of expression). The molecular parts lists that are needed to construct a regulatory network typically come from data derived from high-throughput experiments (for example, sets of genes that are differentially expressed in treated or control conditions). Physical interactions between elements in the molecular parts list can be identified by techniques such as yeast two-hybrid screens or affinity purification followed by mass spectrometry (AP–MS); however, the regulatory nature of such interactions cannot be determined by these methods alone. Extracting putative functional connections between elements in the parts list often requires some computational input, either by reverse engineering regulatory networks from specific experimentally derived datasets using sophisticated computational inference techniques41,43,44 or by comparing experimentally derived expression patterns with databases of interactions collated from the published literature45,46.
Recently, several examples of how high-throughput experimental techniques can be used to infer regulatory networks have been documented in the published literature on stem cells. For example, Wang and co-workers17 derived a high-quality protein–protein interaction network for pluripotency in mouse ES cells that is centred around the core stem cell transcription factor NANOG (FIG. 1a). To construct this network they adopted an iterative proteomics approach in which proteins that physically associate with NANOG and NANOG-associated proteins were identified using AP–MS. By doing so, they identified a complex network that is highly enriched in stem cell-specific transcription factors, many of which transcriptionally regulate the expression of other members of the protein–protein interaction network. This indicates that stem cell fate control is highly combinatorial and involves coordinated interactions between key transcription factors and the genes that encode them.
Similarly, numerous groups have used high-throughput ChIP techniques to identify targets of core ES cell transcription factors, including NANOG14,16, OCT4 (REF. 15) and SOX2 (REF. 13), and thereby reconstruct core ES cell-specific transcriptional circuits that are centred around these (and other) factors10–12,47–50 (FIG. 1b). Furthermore, recent reports have also connected miRNAs51 and key stem cell signalling pathways32 to the core ES cell transcriptional circuit. These reports are useful because, by finding functional associates of known core factors, they produce a detailed dissection of the stem cell molecular regulatory core and thus provide the basis of a systematic understanding of the control of stem cell fate. However, by focusing on interactions involving a small number of central transcription factors, these studies are also limited in their ability to elucidate the extended molecular regulatory networks that underpin cell fate decisions. Nevertheless, although techniques such as AP–MS and high-throughput ChIP inevitably identify numerous false positive interactions, as is observed by the often poor overlap of results from comparative studies52,53, the datasets of inferred interactions produced by these techniques can be useful for the generation of hypotheses if used with caution.
In contrast to these focused experimental studies, a recent report by Müller and co-workers18 used a new computational approach to reconstruct an extended stem cell regulatory network. First, they generated a database of global gene expression patterns in approximately 150 samples of pluripotent, multipotent and differentiated human cell types, and named this database the ‘stem cell matrix’. Using a computational clustering technique, they found that undifferentiated pluripotent stem cells samples, including ES cells and induced pluripotent stem (iPS) cells, strongly clustered together on the basis of gene expression. Then, they used a graph theoretic algorithm known as MATISSE (module analysis via topology of interactions and similarity sets)54 and identified a putative pluripotency network, which they named PluriNet. This was achieved by searching for connected sub-networks involving pluripotency-related factors from a previously compiled background network of human protein–protein and protein–DNA interactions, including those in the NANOG interactome described by Wang and co-workers17. PluriNet is an undirected graph (that is, regulatory directions and effects such as activation or inhibition are not specified), and many interactions have yet to be directly experimentally characterized in any specific cell type. Despite this, the approach of Müller and co-workers18 is useful because it provides a formal way to ‘project’ experimentally derived datasets onto previously compiled databases and interpret new findings in the context of known biological processes45,46. Efforts such as PluriNet are inevitably works-in-progress; as our understanding of the molecular mechanisms of cell fate control becomes increasingly detailed, it will become important that prior knowledge is appropriately used, tested for reliability and organized in a coherent, structured and user-friendly manner so that new results can be assessed appropriately in light of previous data.
With this in mind, we have constructed a database of directed transcriptional interactions in ES cells as a supplement to this Review (see Integrated Stem Cell Molecular Interactions database). This repository currently integrates the data presented in 12 recent publications10–12,31,47,49–51,55–58, which collectively report high-throughput ChIP profiling experiments for 20 transcription factors that are known to have a central role in ES cell fate regulation. In total, the repository currently contains 50,250 putative transcription factor–gene interactions that have been identified specifically in ES cells. We connected the 20 core transcription factors to their gene targets and formed a directed background network (BOX 2). This network is highly dense and rich in feedback and feedforward loops; this indicates that many of the 20 core transcription factors share target genes, which suggests combinatorial regulation of gene expression. To generate a more focused network we collated a shortlist of 264 genes that are known to have an important role in the maintenance of ES cell self-renewal, pluripotency, cell cycle progression and differentiation along all three germ layers (mesoderm, endoderm and ectoderm). By searching for shortest paths between nodes in the background network we obtained two subnetworks: a network containing 156 mutual interactions between the 20 core transcription factors (FIG. 1b) and a network containing 1,739 links connecting the 264 shortlisted genes. Although directed, most of the interactions in these subnetworks are not signed (that is, regulatory effects such as activation or inhibition are not provided). However, regulatory effects can be inferred from studies that combine ChIP experiments with mRNA expression profiling that is obtained following loss-of-function experiments.
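The idea of extracting a focused subnetwork by searching for shortest paths in a directed background network can be sketched as follows. This is an illustrative reading of the procedure only: the text does not specify the exact algorithm, so the breadth-first search, the rule of keeping every arc that lies on a shortest path between two shortlisted nodes, and the node labels (TF1, TF2, g1, g2, g3) are all assumptions made for the example.

```python
from collections import deque


def shortest_path(arcs, source, target):
    """Breadth-first search for one shortest directed path; None if unreachable."""
    successors = {}
    for u, v in arcs:
        successors.setdefault(u, []).append(v)
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in successors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


def shortest_path_subnetwork(arcs, shortlist):
    """Collect the arcs lying on a shortest path between any ordered pair of shortlisted nodes."""
    sub = set()
    for s in shortlist:
        for t in shortlist:
            if s != t:
                path = shortest_path(arcs, s, t)
                if path:
                    sub.update(zip(path, path[1:]))
    return sub


# Hypothetical background network: transcription factor -> target arcs.
background = [("TF1", "TF2"), ("TF2", "g1"), ("TF1", "g2"),
              ("g2", "g1"), ("TF2", "TF1"), ("TF1", "g3")]
print(shortest_path_subnetwork(background, ["TF1", "g1"]))
```

Because the search returns only one shortest path per pair, alternative routes of equal length (here TF1 to g1 through g2) are not included; a fuller treatment might keep all of them.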
We have provided this initial database of interactions and the background and focused sub-networks as a web-based resource to accompany this Review (see Integrated Stem Cell Molecular Interactions database). On this site users can navigate from node to node and examine how target genes are co-regulated by core transcription factors. We invite the stem cell community to deposit additional interactions in this repository as they are reported and thereby continually improve this resource.
The work of Wang and co-workers17, our initial transcriptional interaction repository and the work of Müller and co-workers18 illustrate two different (and complementary) approaches to determining stem cell regulatory architecture. In the first approach precise experimentation is used to elucidate high-confidence functional interactions among a limited number of key components. In the second approach extended networks are generated by inferring interactions between numerous components using computational methods without direct experimental validation. It will be important to elucidate the nature of extended regulatory networks while maintaining high confidence in the inferred interactions. Perhaps the most promising way to address this issue is the combination of meticulous experimentation with computational inference, in which computational techniques are used to infer initial interaction networks and experimental techniques are used to validate inferred interactions and refine network structure59. In this regard, integration of large-scale RNA interference screens58,60,61, in which thousands of genes may be systematically and accurately individually silenced in a cell population, with high-throughput ChIP experiments and subsequent genome-wide expression profiling will be particularly useful in validating inferred regulatory interactions and their effects.
In summary, evidence suggests that ES cell fate is controlled by a core transcriptional circuit enriched in feedback and feedforward loops that itself is part of a much more extensive and highly complex dynamic regulatory network involving protein–protein interactions17, additional transcription factors12, signalling pathways32, miRNAs51 and other epigenetic modifiers56. This complexity is central to the cell’s ability to respond in a flexible way to disparate exogenous stimuli; however, it also makes it extremely difficult to determine cell behaviour from the regulatory network structure. Therefore, in the following section we discuss ways in which mathematical models can be used to make sense of this complexity and link molecular regulatory architecture to cell behaviour. In particular, we focus on the ways in which notions from dynamical systems theory can be used to interpret cell types as balanced states or attractors of underlying regulatory networks, and how molecular noise has a role in defining cell fate by triggering stochastic transitions between coexisting attracting states.
Consider the core stem cell transcriptional network given in FIG. 1b. Although this network is small, it is highly complex, containing many feedback and feedforward loops, and this complexity makes it extremely difficult to determine how this network behaves (that is, how it controls stem cell fate). To begin to elucidate how the architecture of this network relates to stem cell fate, we first note that this network is not static but instead encodes the essential topology of a complex dynamical system (BOX 3) in which transcriptional activation and inhibition may loosely be thought of as ‘forces’ that push and pull the cell in different genetic directions. Thus, the state of a cell is determined by its transcriptional (or, more generally, its molecular) expression profile, which in turn depends dynamically on the regulatory interactions that are encoded in its underlying molecular regulatory architecture.
Mathematical models can help to better understand the molecular basis of cell behaviour and can be approached at various different levels62–64. For example, coarse-grained models, such as Boolean networks, that assume that genes adopt a binary ‘ON’ or ‘OFF’ state and regulate each other’s expression through simple Boolean functions are useful in determining the collective behaviour of large complex regulatory networks65. By contrast, differential equation models, which focus in detail on smaller regulatory circuits in which additional information (such as mRNA and protein production and degradation rates) is known, are useful when examining the fine details of regulatory dynamics36. The most successful examples of integration of mathematical modelling with experimental approaches are from model organisms, such as yeast and bacteria64, which have generated a wealth of data. However, advances in experimental techniques are now increasingly facilitating the development of mathematical models of mammalian cell fate control by providing the required data. This has led to an increasing interest in the application of techniques that have been developed in model organisms to mammalian cell biology66–72.
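As an illustration of the coarse-grained approach, the sketch below enumerates the attractors of a hypothetical two-gene Boolean network in which each gene is ON only when the other is OFF. The update rule, the synchronous updating scheme and the function names are assumptions chosen for brevity; they do not correspond to any published stem cell model.

```python
from itertools import product


def toggle_rule(state):
    """Hypothetical rule: each gene is ON at the next step only if the other is OFF now."""
    x, y = state
    return (int(not y), int(not x))


def find_attractors(rule, n_genes):
    """Follow every initial state under synchronous updating until a state repeats,
    and return the set of cycles reached (a fixed point is a cycle of length 1)."""
    attractors = set()
    for state in product((0, 1), repeat=n_genes):
        seen = []
        while state not in seen:
            seen.append(state)
            state = rule(state)
        cycle = seen[seen.index(state):]
        # Rotate the cycle so that it starts at its smallest state (canonical form).
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors


print(find_attractors(toggle_rule, 2))
```

This toy network has two fixed-point attractors, (1, 0) and (0, 1), together with a two-state cycle between (0, 0) and (1, 1). The cycle arises because both genes are updated at the same instant, which is a property of the synchronous scheme assumed here.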
The central notions of this joint theoretical–experimental approach to cell fate go back to the 1940s, to the work of the physicist Max Delbrück73,74 and the developmental biologist Conrad Waddington75,76. Over 50 years ago Waddington presented his now famous ‘epigenetic landscape’ as a conceptual picture of development75,76. Waddington’s view was that development occurs similarly to a ball rolling down a sloping landscape containing multiple ‘hills’ and ‘valleys’: as development progresses, cells take different paths down this landscape and so adopt different fates, and uncontrolled differentiation does not occur because the hills act as barriers by separating the landscape into distinct valleys (cell types). So, in this view, differentiation is not terminal, but instead different cell states are maintained by epigenetic barriers that can be overcome given sufficient perturbation. In Waddington’s words “This ‘landscape’ presents, in the form of a visual model, a description of the general properties of a complicated developing system in which the course of events is controlled by many different processes that interact in such a way that they tend to balance each other.” (REF. 77). Although Waddington viewed this epigenetic landscape as a qualitative conceptualization of development, the idea that cell types may be related to ‘balanced states’ of an underlying regulatory system bears a striking resemblance to the modern mathematical notion of attractors of dynamical systems78,79.
Although formal mathematical definitions are complicated78, broadly speaking an attractor can be thought of as a balanced state (or set of states) towards which a system will converge given sufficient time79. Consider, for example, a marble at rest at the bottom of a bowl. If perturbed away from the bottom of the bowl, the marble will trace out a transient trajectory around the sides of the bowl, only to finally come to rest at the bottom again: the default resting position of the marble is at the bottom of the bowl, and this is the state towards which the marble is attracted regardless of where it starts in the bowl. Consider now the more interesting case of a marble rolling around a smooth convoluted surface with many hills (local maxima) and valleys (local minima), as illustrated in FIG. 2. Now the marble will come to rest at the bottom of one of many possible valleys, with its final resting place depending on its starting position and the nature of the particular perturbation that displaced it from its initial state. In this case, each local minimum is an attractor of the marble’s dynamics, and we might refer to the surface as a whole as the attractor landscape.
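The marble analogy can be sketched numerically with a one-dimensional landscape that has two valleys. The potential U(x) = x^4/4 − x^2/2, the step size and the assumption that the marble simply slides downhill without inertia (so that it does not oscillate before settling) are illustrative choices and are not part of the original discussion.

```python
def potential_gradient(x):
    """dU/dx for the assumed landscape U(x) = x**4/4 - x**2/2.
    The valleys (minima) are at x = -1 and x = +1; the hill (maximum) is at x = 0.
    """
    return x ** 3 - x


def resting_position(x0, step=0.01, n_steps=5000):
    """Slide downhill from x0 by repeatedly moving against the gradient."""
    x = x0
    for _ in range(n_steps):
        x -= step * potential_gradient(x)
    return x


print(resting_position(0.3))   # settles in the right-hand valley, near +1
print(resting_position(-2.0))  # settles in the left-hand valley, near -1
print(resting_position(0.0))   # balanced exactly on the hilltop: stays at 0
```

Starting positions to the right of the hilltop end in one valley and those to the left end in the other, so the hilltop marks the boundary between the two basins. A marble placed exactly on the hilltop is balanced but not stable, as any small displacement sends it into one of the valleys.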
More formal descriptions of dynamical systems, attractors and attractor landscapes are given in BOX 3; however, this intuitive picture of attractors as local minima of a complex ‘energy-like’ landscape is conceptually informative, and the parallels with Waddington’s epigenetic landscape are clear. In the context of cellular differentiation, an attractor is an internal molecular state (or set of states) towards which the cell is drawn, in which all the molecular forces that are pushing and pulling the cell in different molecular directions are balanced. Thus, attractors correspond to stable molecular configurations and have accordingly been associated with different cell types65,80. A system that exhibits many coexisting attractor states is said to be a multi-stable system; the notion that different cell types may correspond to different stable states of an underlying multi-stable regulatory system was first suggested by Delbrück73,74. The notion that cell types might correspond more generally to attractors of ‘high-dimensional’ regulatory networks was first proposed by Kauffman65,80 and has been examined extensively in the theoretical published literature since the late 1960s36,65,80–83.
Despite this longstanding theoretical interest, direct experimental evidence that different cell types might correspond to attractors of multi-stable genetic regulatory networks has been provided only recently66,68,84–89. For example, in 2005 Huang and co-workers85 provided the first evidence that mammalian cell types might correspond to attractors of a high-dimensional dynamical system. To do so, they took advantage of the fact that human HL60 promyelocytic progenitor cells can be triggered to differentiate into neutrophils in vitro if they are stimulated with all-trans retinoic acid (ATRA) or dimethylsulphoxide (DMSO). By taking samples for microarray analysis at different time points during the differentiation process, they showed that ATRA and DMSO initially triggered different genetic responses, which, however, ultimately converged over time to a common stable pattern of gene expression. This ‘homing in’ is characteristic of an attracting state and suggests that the HL60 neutrophil state is an attractor of an underlying molecular regulatory network. Similarly, others have shown that if sub-optimal ATRA stimulation is removed before commitment is complete, HL60 cells do not differentiate into neutrophils but instead revert back to the promyelocytic state66. Stability in the face of weak perturbations is another hallmark of an attracting state, so this work indicates that the HL60 promyelocytic state is also an attractor of an underlying molecular regulatory network.
In the context of mammalian ES cell biology, although direct evidence for attracting states has not yet been provided, indirect evidence for a self-sustaining self-renewing state in mouse ES cells was recently provided by Ying and co-workers90. They showed that, if shielded from inductive stimuli through fibroblast growth factor receptor and extracellular signal-regulated kinase signalling and treated with a glycogen synthase kinase 3 inhibitor to restore viability, mouse ES cells can self-renew in the absence of additional maintenance factors such as leukaemia inhibitory factor (LIF). This indicates that the ES self-renewing state is a self-sustaining ‘ground state’ of the core transcriptional circuitry.
The notion that cell types correspond to attractors of underlying regulatory networks is appealing from a systems biology point of view, as attracting states do not depend solely on individual regulatory elements, but rather result from the collective behaviour of the cell’s molecular regulatory circuitry as a whole. However, in many circumstances cell phenotypes are not well defined, and there might be substantial variability between cells, even within a clonal population in a homogeneous environment91. Thus, in addition to a deterministic control by molecular regulatory circuits, it has long been suggested that cell fate specification also has a stochastic element92–95.
Gene expression is an inherently ‘noisy’ process96: owing to the stochasticity of molecular processes, such as transcription and translation (intrinsic noise), and the effect of environmental noise on these processes (extrinsic noise), gene and protein expression levels in a given cell are continuously fluctuating97. As molecular noise can markedly affect cell behaviour, cells have adopted a range of sophisticated mechanisms to control it98. For example, they use molecular mechanisms, such as the 26S proteasome, to buffer noise by targeting transcriptional pre-initiation complexes for degradation99,100. Furthermore, cells can use epigenetic regulatory agents, such as polycomb group repressors56, to restrict the transcriptional activation of developmental genes101. In addition, they use regulatory network motifs, such as negative feedback loops, to modulate the levels of noise102,103.
In the context of stem cell differentiation, there has been a longstanding interest in the role of stochasticity in determining cell fate92–95. For example, in the early 1960s McCulloch, Till and Siminovitch92 examined the distribution of stem cell-like colony forming units (CFUs) in the spleens of irradiated mice following the injection of a suspension of adult mouse bone marrow cells. They found that the proportion of CFUs per colony varied greatly from colony to colony and was consistent with a ‘birth’ and ‘death’ process in which cell fate decisions (that is, to differentiate or self-renew) were made stochastically. Similarly, in the 1980s Ogawa and co-workers93,95 studied pairs of cells derived from single haematopoietic progenitors (‘paired progenitors’). They showed that if isolated and allowed to form separate colonies in vitro, paired progenitors show remarkably variable and seemingly uncorrelated patterns of differentiation.
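The way a stochastic 'birth' and 'death' process produces large colony-to-colony variation can be illustrated with a small simulation. The following is a minimal sketch, not the analysis used in the original study: it assumes a discrete-generation simplification in which every CFU independently either self-renews (giving two CFUs) with probability p_renew or differentiates (leaving the CFU pool); the value of p_renew, the number of generations and the function names are hypothetical.

```python
import random


def cfus_per_colony(p_renew, generations, rng):
    """Count the CFUs in one colony after a fixed number of generations, starting from one CFU."""
    cfus = 1
    for _ in range(generations):
        # Each CFU independently self-renews (two CFUs) or differentiates (no CFUs).
        cfus = sum(2 for _ in range(cfus) if rng.random() < p_renew)
    return cfus


def simulate_colonies(n_colonies=200, p_renew=0.6, generations=8, seed=0):
    """Simulate many independent colonies that share the same rules and parameters."""
    rng = random.Random(seed)
    return [cfus_per_colony(p_renew, generations, rng) for _ in range(n_colonies)]


if __name__ == "__main__":
    counts = simulate_colonies()
    empty = sum(1 for c in counts if c == 0)
    print(f"colonies without CFUs: {empty} of {len(counts)}")
    print(f"largest CFU count in a colony: {max(counts)}")
```

Even though every colony follows identical rules, some lose all of their CFUs whereas others accumulate many, which is the qualitative signature of stochastic fate decisions described above.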
These classic papers suggest that, in addition to deterministic control by an underlying regulatory network, stem cell fate specification also has an intrinsically stochastic element. Furthermore, numerous reports have suggested that rather than being a destabilizing force to be minimized, molecular noise can have a positive role in determining cell fate104,105. A key observation in this regard is that molecular noise can give rise to robust heterogeneity at the cell population level96,104 by triggering stochastic transitions between coexisting attractor states84 (BOX 4). For example, it has been suggested that in microorganisms noise-driven heterogeneity in a clonal population allows adaptation during times of stress without the need for genetic mutations, by providing a means for individual cells to ‘explore’ different phenotypes in a dynamic manner104,105. This view is supported by the observation that in yeast the expression of proteins involved in responses to environmental changes is noisier than that of proteins involved in protein synthesis106.
Consider the simple motif in which two transcription factors activate their own expression and mutually repress each other’s expression (see the figure, part a). This type of feedback naturally gives rise to multi-stability86,126 and provides the cell with the ability to make all-or-none fate decisions in response to external cues. The following stochastic differential equations describe the expression levels of two transcription factors (x1 and x2) that are interacting in this way:
In these equations k1, k2 and k3 are the (normalized) rate constants at which transcription factors bind to promoters; K1 and K2 are (normalized) dissociation rate constants; b1 and b2 are (normalized) decay rate constants; σ1 and σ2 are constants determining the amplitude of noise in the system; and W denotes a Wiener process (Brownian motion). In this simple illustrative case we have assumed that each transcription factor binds cooperatively to its own promoter and to that of the other transcription factor as a homodimer (which is why x is raised to the power of two). In the absence of molecular noise (σ1=σ2=0) this model has many coexisting steady state attractors (for appropriate parameter regimes). In the presence of molecular noise (σ1, σ2>0), individual cells do not settle at a single attractor but instead stochastically switch between distinct states at a rate that depends on the amplitude of molecular noise. However, over time the joint probability density p(x1, x2) (that is, the probability of finding a cell with expression levels of (x1, x2)) settles to a stationary state, and a robust distribution of cell types is achieved. The figure (part b) shows the stationary probability distribution for a representative simulation of this system: red hot spots indicate preferred genetic configurations at which cells will accumulate, and blue indicates low probability configurations.
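A motif of this kind can be explored numerically with the Euler-Maruyama scheme for stochastic differential equations. The sketch below is an illustration under loudly stated assumptions and does not reproduce the equations of this box: the drift term assumes a competitive promoter-occupancy form, (k_basal + k_act·x_self²)/(1 + x_self² + x_other²) minus linear decay, and the names k_act, k_basal, decay and sigma, together with their values, are hypothetical. Negative expression levels are clipped to zero as a modelling convenience.

```python
import math
import random


def drift(x_self, x_other, k_act=3.0, k_basal=0.2, decay=1.0):
    """Assumed deterministic rate of change: homodimer self-activation (x_self**2),
    repression by the other factor's homodimer (x_other**2), and linear decay."""
    production = (k_basal + k_act * x_self**2) / (1.0 + x_self**2 + x_other**2)
    return production - decay * x_self


def simulate(x1, x2, sigma=0.3, dt=0.01, steps=20000, seed=0):
    """Euler-Maruyama integration; returns the list of (x1, x2) states visited."""
    rng = random.Random(seed)
    noise_scale = sigma * math.sqrt(dt)  # standard deviation of each Wiener increment
    path = [(x1, x2)]
    for _ in range(steps):
        dx1 = drift(x1, x2) * dt + noise_scale * rng.gauss(0.0, 1.0)
        dx2 = drift(x2, x1) * dt + noise_scale * rng.gauss(0.0, 1.0)
        # Expression levels cannot be negative, so clip at zero (an assumption).
        x1 = max(0.0, x1 + dx1)
        x2 = max(0.0, x2 + dx2)
        path.append((x1, x2))
    return path


if __name__ == "__main__":
    # Without noise, the initial bias decides which factor ends up dominant.
    print("sigma = 0, start (2.0, 0.5):", simulate(2.0, 0.5, sigma=0.0)[-1])
    print("sigma = 0, start (0.5, 2.0):", simulate(0.5, 2.0, sigma=0.0)[-1])
```

With sigma set to zero, the two starting points relax to mirror-image states in which one factor is high and the other low. With sigma greater than zero, the same function can be used to look for noise-driven switching between these states, and a histogram of many long trajectories approximates the stationary density p(x1, x2) for this assumed model.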
Evidence that a similar mechanism might give rise to heterogeneity in mammalian progenitor cell populations, allowing dynamic ‘priming’ of progenitors towards different lineages, has recently been provided by Chang and co-workers84. They studied heterogeneity in the expression of the stem cell surface marker SCA1 (stem cell antigen 1) in a clonal population of EML mouse multipotent haematopoietic cells84. First, using flow cytometry they found that in EML cells SCA1 expression exhibits a characteristic bimodal distribution. Then, to probe the origin of this heterogeneity they used flow cytometry to isolate cells with the highest, middle and lowest SCA1 expression for further culture. Surprisingly, they found that over time all three selected fractions reconstituted the parental bimodal distribution. With the aid of mathematical analyses they identified discrete noise-driven transitions between two underlying and coexisting attracting states as one source of this universal reconstitution. This report is interesting, as it suggests that cells do not have a rigidly fixed identity but instead can transition stochastically between coexisting attracting states at a rate that depends on transcriptome-wide noise levels. However, crucially, at the population level the fraction of cells in the vicinity of each of the attracting states remains fixed in the long term (BOX 4). Thus, although cell identity might be somewhat indeterminate at the single cell level, the distribution of cell types at the population level is robust.
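The reconstitution of the parental distribution from sorted fractions can be illustrated with the simplest possible caricature: a population in which each cell switches at random between a 'high' and a 'low' state. This is a minimal sketch and not the mathematical analysis used by Chang and co-workers; the rates k_up (low to high) and k_down (high to low), their values and the function name are hypothetical. In this model the fraction of 'high' cells obeys dp/dt = k_up(1 − p) − k_down·p, which has the closed-form solution used below.

```python
import math


def fraction_high(t, p0, k_down, k_up):
    """Fraction of cells in the 'high' state at time t, starting from fraction p0.

    Closed-form solution of dp/dt = k_up * (1 - p) - k_down * p.
    """
    p_inf = k_up / (k_up + k_down)  # long-term fraction, independent of p0
    return p_inf + (p0 - p_inf) * math.exp(-(k_up + k_down) * t)


if __name__ == "__main__":
    k_down, k_up = 0.2, 0.1  # hypothetical switching rates per unit time
    for label, p0 in (("sorted high", 1.0), ("sorted middle", 0.5), ("sorted low", 0.0)):
        series = [fraction_high(t, p0, k_down, k_up) for t in (0, 5, 20, 100)]
        print(label, [round(p, 3) for p in series])
```

Whatever the composition of the sorted starting population, the fraction of 'high' cells relaxes to k_up/(k_up + k_down): individual cells keep switching, yet the population-level distribution is fixed.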
The notion that cell fate is controlled by the interplay between deterministic regulatory mechanisms and stochasticity is not new: a similar observation was made by McCulloch, Till and Siminovitch92, who observed that “individual cells within the population are not closely regulated” and that “it is the population as a whole that is regulated rather than individual cells.” However, the work of Chang and co-workers84 provides an elegant mechanism for this observation. Interestingly, another report (which was, in fact, published before the work of Chang and co-workers) has shown a remarkably similar phenomenon in mouse ES cells107. In this report, Chambers and co-workers107 used flow cytometry to profile NANOG expression in mouse ES cells using green fluorescent protein targeted to the Nanog locus. They found that, similarly to SCA1 expression in haematopoietic precursors, mouse ES cells also show variability in NANOG expression, which is undetectable in a fraction of OCT4-expressing cells. Importantly, they found that following cell sorting using flow cytometry, both NANOG-positive and NANOG-negative cell fractions had a heterogeneous distribution of NANOG expression over time and that ES cells lacking NANOG expression showed an increased propensity to differentiate. Their results suggest that NANOG expression levels fluctuate in mouse ES cells and that the NANOG-low expression phase might be a temporary ‘window of opportunity’, allowing dynamic priming of cellular commitment to differentiation.
The reports discussed so far suggest that at the single cell level cell fates can exhibit a surprising degree of flexibility. In 2006 Yamanaka and co-workers5 made a remarkable discovery concerning the extent of this flexibility: they showed that following retroviral infection with just four transcription factors, namely OCT4, SOX2, KLF4 (Krüppel-like factor 4) and MYC, adult fibroblasts can be reprogrammed to a state that has many ES cell characteristics. These include an ES cell morphology, the ability to form teratomas (a type of tumour containing tissue from all three germ layers) following subcutaneous injection in nude mice and the ability to differentiate into cells of all three germ layers in vitro. The cells were termed induced pluripotent stem (iPS) cells on the basis of their similarity to ES cells.
Since these initial reports, the original reprogramming strategy of Yamanaka and co-workers has been refined by numerous groups108–118. In particular, reprogramming has been achieved using forced expression of various alternative reprogramming factors108,109 (including in the absence of forced MYC expression109–111) in a range of adult somatic cell types, such as fibroblasts5,6, hepatocytes112, gastric epithelial cells112, mesenchymal stem cells113 and neural stem cells114. Furthermore, improved selection criteria have allowed the derivation of more completely reprogrammed cells, which are similar to ES cells not only in morphology, differentiation capacity, response to cytokines such as LIF and the ability to form teratomas, but also in global genetic and epigenetic profiles115 and the ability to form viable chimeras following their injection into blastocysts116,117.
Taken together, these reprogramming reports indicate that somatic cell fate is not terminal, but instead that cellular integrity is preserved by reversible epigenetic barriers that can be overcome given the correct stimuli. These observations are in accordance with Waddington’s view of development and the idea of cell types as attractors. The fact that cellular reprogramming is a multistep process119 involving numerous (possibly stochastic120) transitions indicates that cellular reprogramming might correspond to navigation through a complex noisy attractor landscape (FIG. 2). Crucially, this landscape describes both the molecular characteristics of the various different cell types and the relationships between these different cell types — that is, how easy or difficult reprogramming between distinct cell types might be121.
As the processes involved in cellular reprogramming are highly complex, it is a considerable challenge to map this cell fate landscape. We view the generation of such a map as a long-term goal for stem cell systems biology that will require coordinated and sustained collaboration between scientists from a range of disciplines using both experimental and theoretical approaches. However, some first steps towards mapping this landscape might be taken immediately using current technologies. For example, the fact that reprogramming of various adult somatic cells to a self-renewing pluripotent state can be achieved by many different methods5,6,108,120 is consistent with, although not yet proof of, the existence of a core ES cell attractor. In particular, although Ying and co-workers90 have shown that the self-renewing state of an ES cell is self-sustaining if the cell is shielded from inductive signalling, the hallmark attractor characteristic of stability in the face of different weak perturbations has yet to be shown for the ES cell state. To confirm the presence of such an attractor, similar approaches to those taken in establishing the presence of attractors in the haematopoietic system could be adopted. For example, an informative experiment would be to use high-throughput techniques to measure temporal molecular expression patterns following the treatment of ES cells with suboptimal inductive stimuli (such as administering low-dose or short-period retinoic acid treatment, or applying suboptimal levels of reagents that target and inhibit the mRNA of key pluripotency factors). This would help to determine whether there is an inductive point of no return before which perturbed cells revert to the undifferentiated ES cell state when stimuli are removed and after which the undifferentiated ES cell state cannot be recovered simply by the removal of stimuli.
The published literature on stem cell systems biology and reprogramming indicates that cell fate might be controlled by a complex interplay between determinism and stochasticity. In the case of determinism, systems-level regulatory network dynamics define the molecular attracting states towards which the cell is drawn over time, and in the case of stochasticity systems-level molecular noise drives transitions between coexisting attractor states and ensures robustness at the population level.
However, current evidence for cellular attractors is limited to a few mammalian cell types. Additional experiments are needed to clarify how universal these initial observations are. Furthermore, direct evidence for molecular attractors is currently limited to the mRNA transcript level. As mRNA expression does not necessarily correlate with protein expression, additional studies are required to clarify how coordinated regulation at different molecular regulatory layers — including mRNA transcripts, proteins and protein complexes, histone modifications, RNA polymerase, signalling pathways and miRNAs — specifies cellular attractor states. Similarly, although there have been some reports detailing the temporal molecular dynamics of ES cell fate changes following perturbation122, our understanding of the systems-level molecular dynamics of cell fate specification, particularly at the single cell level, is still incomplete. For this reason, we anticipate that the further development and use of high-throughput single cell genetic, epigenetic and proteomic techniques106 will be necessary to elucidate the nature of cell to cell variability and to better dissect the role of molecular noise in determining cell fate. In addition, experimental advances will have to be continually integrated with computational models to construct an accurate quantitative understanding of the regulation of stem cell fate. In this regard, stem cell systems biology is an exciting field of research, as it is rich in both experimental and computational challenges and has the potential for genuinely collaborative research.
Recently there have been important and exciting advances in our understanding of stem cell fate specification and cellular reprogramming. However, we still know little about these intrinsically complex processes at the systems level. We hope that an integrated approach, in which experimental approaches provide the information that forms the theory and computational modelling refines experimental approaches, will help us to better understand the molecular basis of stem cell fate decision making and cellular reprogramming.
The authors thank J. Wang for supplying the NANOG interactome data used to create FIG. 1a and Y.-S. Ang for helping to compile the list of genes used to create the supplementary stem cell transcription network.
The Black Family Stem Cell Institute homepage: http://www.blackfamilystemcell.org/BFSCIresearch.html
Integrated Stem Cell Molecular Interactions database: http://amp.pharm.mssm.edu/iscmid