The last few years have seen significant advances in our understanding of the molecular mechanisms of stem-cell-fate specification. New and emerging high-throughput techniques, as well as increasingly accurate loss-of-function perturbation techniques, are allowing us to dissect the interplay among genetic, epigenetic, proteomic, and signaling mechanisms in stem-cell-fate determination with ever-increasing fidelity (Boyer et al. 2005, 2006; Ivanova et al. 2006; Loh et al. 2006; Cole et al. 2008; Jiang et al. 2008; Johnson et al. 2008; Kim et al. 2008; Liu et al. 2008; Marson et al. 2008; Mathur et al. 2008). Taken together, recent reports using these new techniques demonstrate that stem-cell-fate specification is an extremely complex process, regulated by multiple mutually interacting molecular mechanisms involving multiple regulatory feedback loops. Given this complexity and the sensitive dependence of stem cell differentiation on signaling cues from the extracellular environment, how are we best to develop a coherent quantitative understanding of stem cell fate at the systems level? One approach that we and other researchers have begun to investigate is the application of techniques derived in the computational disciplines (mathematics, physics, computer science, etc.) to problems in stem cell biology. Here, we briefly sketch a few pertinent results from the literature in this area and discuss future potential applications of computational techniques to stem cell systems biology.
Modern stem cell studies now typically use a variety of different high-throughput techniques to deconstruct the molecular basis of cell-fate specification. Nevertheless, individual studies inevitably only focus on one chosen aspect of stem cell self-renewal, fate specification, or reprogramming. Consequently, although they typically produce a wealth of data, each individual study still only represents a small aspect of our collective knowledge of stem cell behavior. Therefore, it is useful for information from a large number of individual studies to be collated and cataloged into structured meta-data sets representing the collective knowledge about the molecular regulatory mechanisms that control stem cell self-renewal and differentiation. However, the task of constructing and maintaining such collective knowledge data sets is computationally and biologically challenging because different experimental studies consider different types of stem cells, under different culture conditions, using different experimental techniques, each of which may produce biased results because of its inherent limitations. For example, proteomic experiments are known to enrich for highly abundant proteins, whereas gene expression microarrays are noisy and mRNA levels often only partially correlate with protein expression and function. To tackle the data integration and knowledge accumulation challenge, applications of techniques from the mathematical field of graph theory (Ma’ayan 2009) have been particularly successful. The realization that complex biological systems can be conceptually represented as networks (also known as graphs in the mathematical literature) has revolutionized our approach to exploring complex biochemical systems.
To construct a biological regulatory network, elements such as genes, proteins, mRNAs, microRNAs (miRNAs), or any other kind of molecular species are represented as nodes, whereas the biochemical interactions between species, for example, protein–protein interactions or transcription factor regulation of gene expression, are represented as edges or links. Because a variety of different types of regulatory mechanisms can be represented as networks (Ma’ayan et al. 2005a), representing complex biochemical systems as networks allows the merging of different types of experimental data into a single conceptual framework (Ma’ayan 2008). An example of the successful application of graph-theoretic techniques to data integration in stem cell biology was recently given by Franz-Josef Müller, Jeanne Loring, and coworkers (Müller et al. 2008). They first classified different types of human stem cells on the basis of their genome-wide mRNA expression signatures (Müller et al. 2008) and identified a set of genes that are specifically up-regulated across a variety of different types of stem cells. Then, using available mammalian protein–protein interaction databases, they “connected” their identified stem cell gene set into a network of protein interactions, naming this integrated network PluriNet. To build PluriNet, the authors made use of a graph-theoretic algorithm and software package called Matisse (Ulitsky and Shamir 2007) to identify modules in gene expression data using background knowledge about known protein–protein interactions. In general, algorithms such as Matisse can be used to identify functional modules in complex data sets (Berger et al. 2007), whereas statistical tools can be used to characterize the functional theme of such modules (Subramanian et al. 2005). 
Alternatively, protein–protein interaction networks can be readily reconstructed experimentally using proteomic techniques such as immunoprecipitation-based “pull-downs” followed by mass spectrometry (IP-MS) (Gygi and Aebersold 2000). For example, a protein–protein interaction network centered around the transcription factor Nanog was recently constructed by Jianlong Wang, Stuart Orkin, and coworkers using a set of serial IP-MS experiments in which they pulled down different components of the Nanog interaction complex one at a time (Wang et al. 2006). Resources such as the PluriNet and empirically constructed interaction networks are useful because they can be used as a reference upon which to “project” future data and interpret new findings within the context of known biology.
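The node-and-edge representation described above, together with the idea of projecting new data onto a reference network, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the data model of PluriNet, Matisse, or any published resource. The class `RegulatoryNetwork`, its methods, and the node names (`TF_A`, `TF_B`, `gene_X`, `miR_Y`) are hypothetical placeholders that do not correspond to curated interactions.

```python
class RegulatoryNetwork:
    """Toy typed network: nodes are molecular species, edges are interactions."""

    def __init__(self):
        self.kind = {}      # node name -> species type ("protein", "gene", ...)
        self.edges = set()  # (source, target, interaction type)

    def add_node(self, name, kind):
        self.kind[name] = kind

    def add_edge(self, source, target, interaction, directed=True):
        for name in (source, target):
            if name not in self.kind:
                raise KeyError(f"unknown node: {name}")
        self.edges.add((source, target, interaction))
        if not directed:
            # Undirected interactions (e.g. protein binding) are stored both ways.
            self.edges.add((target, source, interaction))

    def regulators(self, target, interaction=None):
        """Names of nodes with an edge into `target`, optionally of one type."""
        return sorted({s for s, t, i in self.edges
                       if t == target and interaction in (None, i)})

    def project(self, hits):
        """Edges of the reference network whose endpoints are both in `hits`."""
        hits = set(hits)
        return sorted(e for e in self.edges if e[0] in hits and e[1] in hits)


def build_example():
    """Assemble a placeholder network that mixes three interaction types."""
    net = RegulatoryNetwork()
    net.add_node("TF_A", "protein")
    net.add_node("TF_B", "protein")
    net.add_node("gene_X", "gene")
    net.add_node("miR_Y", "miRNA")
    net.add_edge("TF_A", "TF_B", "protein-protein", directed=False)
    net.add_edge("TF_A", "gene_X", "transcriptional")
    net.add_edge("TF_B", "gene_X", "transcriptional")
    net.add_edge("miR_Y", "gene_X", "post-transcriptional")
    return net


if __name__ == "__main__":
    net = build_example()
    print(net.regulators("gene_X"))
    # Project a hypothetical hit list from a new experiment onto the reference.
    print(net.project(["TF_A", "gene_X", "unmapped_Z"]))
```

Because every interaction carries a type label, data from different experimental platforms can share a single structure, and names in a new hit list that are absent from the reference are simply ignored by the projection.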
Another source of data for building regulatory networks comes from high-throughput chromatin immunoprecipitation (ChIP)-chip (Kidder et al. 2008), ChIP-seq (Chen et al. 2008), and ChIP-PET (Loh et al. 2006) experiments. These techniques are commonly used to identify transcription factor–DNA interactions and thereby connect transcription factors to the putative sets of genes that they regulate. These techniques can also be used to identify a broad range of epigenetic chromatin modifications such as the methylation/acetylation status of histone proteins. Several studies have used these techniques to identify targets of a number of the core pluripotency transcription factors. To clarify the nature of the observed binding events, high-throughput transcription-factor-binding studies are often coupled to loss-of-function experiments and genome-wide mRNA expression profiling to assess the functionality of any identified putative regulatory interactions (e.g., whether observed transcriptional binding induces activation or repression of the target gene). In addition to transcriptional regulation of stem cell fate, accumulating evidence, first in Drosophila (Hatfield et al. 2005) and more recently in mammalian stem cells (Houbaviy et al. 2003; Tay et al. 2008), suggests that miRNAs are also intimately involved in the regulation of stem-cell-fate decisions (Gangaraju and Lin 2009). For example, it has recently been shown that mir-21 suppresses a set of core pluripotency genes and is itself transcriptionally suppressed by the pluripotency factor Rest (Singh et al. 2008). Consequently, databases and network analyses deconstructing the place of miRNAs in the regulation of mammalian cells are also rapidly emerging (Altuvia et al. 2005; Griffiths-Jones et al. 2006). Although transcriptional regulation of stem cell fate is now being dissected in increasing detail, the complex signaling network (Ma’ayan et al. 2005b) upstream of the transcriptional network is less well understood.
Even less information is available that sheds light on how signaling networks converge on events that occur in the nucleus. However, we anticipate that emerging high-throughput phosphoproteomics and RNA interference experiments will provide insights into the structure and function of stem cell signaling networks and their relationship to the core transcriptional network. Indeed, some progress has already been made toward this end (Chen et al. 2008).
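The coupling of binding data to loss-of-function expression profiling described above can be illustrated with a short Python sketch. It assigns a putative sign to each transcription factor–target edge from the change in target expression after the factor is knocked down. The function name `sign_edges`, the gene labels, and the fold-change threshold are assumptions chosen for illustration. The sketch also does not attempt to separate direct effects from indirect ones, which a real analysis would need to consider.

```python
def sign_edges(bound_targets, knockdown_log2fc, threshold=1.0):
    """Label TF -> target edges using expression change after TF knockdown.

    bound_targets: gene names bound by the factor (e.g. from ChIP data).
    knockdown_log2fc: dict of gene -> log2 fold change, knockdown vs control.
    threshold: minimum absolute log2 fold change treated as a response
               (an arbitrary illustrative cutoff).
    """
    signed = {}
    for gene in bound_targets:
        change = knockdown_log2fc.get(gene)
        if change is None:
            signed[gene] = "unmeasured"
        elif change <= -threshold:
            signed[gene] = "activation"       # target falls when the TF is removed
        elif change >= threshold:
            signed[gene] = "repression"       # target rises when the TF is removed
        else:
            signed[gene] = "no clear effect"  # bound, but expression barely moves
    return signed


if __name__ == "__main__":
    # Hypothetical inputs; not taken from any of the studies cited in the text.
    bound = ["g1", "g2", "g3", "g4"]
    log2fc = {"g1": -2.0, "g2": 1.5, "g3": 0.2}
    for gene, label in sign_edges(bound, log2fc).items():
        print(gene, label)
```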
Taken together, these reports suggest that stem-cell-fate determination is an intrinsically complex process, regulated by a dynamic interplay among genetic, proteomic, miRNA, and epigenetic mechanisms. To begin to make sense of this complexity, it is useful to cast this multiplicity of biochemical interactions in the form of networks that encode the architecture of the regulatory mechanisms of stem-cell-fate specification at the system level (Fig. 1). However, this approach inevitably leads to a paradox: As our understanding of the molecular basis of cell fate becomes more detailed, the networks that arise from these integrative studies become correspondingly more complex and difficult to interpret. For this reason, there is now a pressing need to generate new tools to make sense of these complex networks to understand how internal molecular circuitry defines cell fate at the systems level. In a sense, new tools are needed to “see the forest and not just the trees.” In the following section, we discuss ways in which mathematical models may be fruitfully used to make sense of this complexity.
As mentioned above, a number of recent reports have begun to reconstruct the transcriptional circuitry underpinning the maintenance of stem cell pluripotency and self-renewal (Boyer et al. 2005; Ivanova et al. 2006; Kim et al. 2008). Taken together, these studies report a complex transcriptional regulatory circuit centered around a set of core pluripotency factors (including Oct4, Sox2, Nanog, Esrrb, Tbx3, Tcl1, Dppa4, Tcf3, and others) connected to an extended set of lineage-specifying factors. Crucially, this extended circuit appears to have a highly enriched feedback loop structure, in which the core pluripotency factors regulate the expression of their target genes in a highly combinatorial manner and are themselves regulated in a coordinated way. The multiplicity of positive and negative feedback loops present in this core circuit makes determination and prediction of cell behavior from regulatory architecture intrinsically difficult. To tackle this problem, it is conceptually convenient to take a physical approach and think of transcriptional activation and inhibition as forces that “push” and “pull” the cell’s internal transcriptional state in different directions: some synergistically, pushing the cell in the same genetic direction; others competitively, pushing the cell in divergent directions. Within this framework, cell-fate determination may be seen as resulting from the sum of the internal forces that the cell experiences in response to environmental signaling cues, and cell “types” as equilibrium states in which the core transcription factors are expressed at a level that balances the system. Within the mathematical literature, such balanced configurations are referred to as attractors because if perturbed away, the system is attracted back to the balanced state over time.
A useful analogy is that of a marble perturbed from the bottom of a bowl that tracks out a transient trajectory around the sides of the bowl only to eventually return to rest again at its bottom. A system that supports the existence of multiple different attractor states is said to exhibit multistability.
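To make the notions of attractor and multistability concrete, the following Python sketch integrates a toy circuit of two mutually repressing genes. This is a standard textbook form of a genetic toggle switch and is not a model of the pluripotency circuit or of any network in the studies cited here. The production rate `a`, the Hill exponent `n`, the step size, and the starting points are all illustrative assumptions. For these particular values the circuit has two stable states, near (3.73, 0.27) and (0.27, 3.73), and which one is reached depends only on the side of the diagonal x = y from which the system starts.

```python
def toggle_step(x, y, a=4.0, n=2, dt=0.01):
    """One forward-Euler step of a toy two-gene mutual-repression circuit."""
    dx = a / (1.0 + y ** n) - x   # X is produced unless repressed by Y, and decays
    dy = a / (1.0 + x ** n) - y   # Y is produced unless repressed by X, and decays
    return x + dt * dx, y + dt * dy


def settle(x, y, steps=5000, **params):
    """Integrate long enough for transients to die away; return the final state."""
    for _ in range(steps):
        x, y = toggle_step(x, y, **params)
    return x, y


if __name__ == "__main__":
    # Different starting points on the same side of the diagonal share an attractor.
    for start in [(2.0, 1.0), (3.2, 0.8), (1.0, 2.0), (0.8, 3.2)]:
        x, y = settle(*start)
        print(start, "->", (round(x, 3), round(y, 3)))
```

Starting points that share an end state play the role of the perturbed marble returning to the bottom of its bowl, and the existence of two distinct end states is the multistability referred to in the text.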
The notion that different cell types may result from multistability of an underlying dynamical system was first suggested by the Nobel-Prize-winning physicist Max Delbrück in the late 1940s (Delbrück 1949; Thomas 1998) and has been developed extensively in a theoretical context by Stuart Kauffman and other researchers since the late 1960s (Kauffman 1969, 1993; Thomas 1998; MacArthur et al. 2008). However, although this notion has received much attention in the theoretical literature, experimental evidence that distinct mammalian cell fates may correspond to attractors of underlying high-dimensional regulatory networks has only recently been provided by Sui Huang, Donald Ingber, and coworkers (Huang et al. 2005). To do so, these authors made use of the experimental observation that similar in vitro cellular responses can often be induced by disparate chemical stimuli. In particular, they used the fact that human promyelocytic HL60 cells may be triggered to neutrophil differentiation in vitro, either by treatment with retinoic acid (RA) or by treatment with dimethylsulfoxide (DMSO). By taking microarray time courses during differentiation, they found that RA and DMSO initially triggered widely divergent genome-scale patterns of gene expression but that, over time, these patterns converged to a common end point. The fact that alternative perturbations effect a common response through divergent routes is characteristic of an attracting state, and their results therefore suggest that the HL60 neutrophil state is an attractor of an (as yet undefined) complex regulatory network. Since this initial report, further evidence that other mammalian cell types may be high-dimensional attractors has been provided (Chang et al. 2006, 2008; Ying et al. 2008).
For instance, by blocking fibroblast growth factor (FGF) receptor and extracellular signal-regulated kinase (ERK) signaling, Qi-Long Ying, Austin Smith, and coworkers demonstrated that, if protected from external inductive differentiation stimuli, mouse embryonic stem cells may be maintained in culture in a self-renewing state without the need for the additional culture stimuli usually required for their maintenance. This result suggests that in the mouse, the core self-renewing pluripotent state is internally stable and self-sustaining, indicating that it may correspond to an attractor of the complex pluripotency circuit. Within this context, recent reports that fully differentiated cell types may be reprogrammed to a primitive pluripotent state by a variety of different means (Yu et al. 2007; Nakagawa et al. 2008; Feng et al. 2009) are also indicative of the presence of a core pluripotent attracting state.
Taken together, these experimental reports are consistent with the notion that cell fates, including the primitive pluripotent self-renewing state, may correspond to different high-dimensional attractors of the cell’s internal regulatory circuitry. However, current reports have generally only provided indirect evidence of attractors or evidence for cellular attractors at the RNA level. Because cell fate is controlled by complex feedback among genetic, epigenetic, and proteomic mechanisms, a current challenge in stem cell systems biology is to extend these initial reports, to map not only the genetic profile of cellular attractors, but also the proteomic and epigenetic profiles of cellular attractors. For example, from a biological point of view, it is usual to think of cell types as characterized by fixed molecular signatures (Ivanova et al. 2002); however, from a mathematical point of view it is also natural to suspect that the complex circuitry at the core of cell-fate specification may allow not just static “fixed-point” attractors, but also stable self-sustaining oscillatory states, in which transcriptional forces balance in a dynamic manner. Oscillators are ubiquitous in complex systems containing feedback loops, and many biochemical oscillators have correspondingly been described (Winfree 2001). In the context of stem cell differentiation, recent data indicating that Nanog expression fluctuates in murine embryonic stem cells (Chambers et al. 2007) are possibly indicative of a dynamic, rather than static, attracting state. The notion of dynamic stem cell attractors is intuitively appealing because, if present, they may allow individual cells to be dynamically primed: At the Nanog high-expression phase, cells are resistant to inductive stimuli, whereas at the Nanog low-expression phase, cells are sensitive to inductive differentiation stimuli. 
Evidence for dynamic stem cell attractors is currently lacking; however, we anticipate that this may be a fruitful area for future stem cell systems biology research.
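The mathematical distinction between a static fixed-point attractor and a self-sustaining oscillatory attractor can be illustrated with a generic two-variable system whose long-term behavior is a stable cycle. The Python sketch below is purely a mathematical illustration and makes no claim about Nanog dynamics. The equations are a common textbook form for a stable limit cycle of unit radius, and the angular frequency `omega`, the step size, and the starting points are assumptions.

```python
import math


def oscillator_step(x, y, omega=1.0, dt=0.001):
    """One forward-Euler step of a generic system with a stable limit cycle."""
    r2 = x * x + y * y
    dx = x * (1.0 - r2) - omega * y   # radial relaxation toward r = 1, plus rotation
    dy = y * (1.0 - r2) + omega * x
    return x + dt * dx, y + dt * dy


def run(x, y, steps=20000, **params):
    """Advance the state by `steps` Euler steps and return it."""
    for _ in range(steps):
        x, y = oscillator_step(x, y, **params)
    return x, y


def radius(x, y):
    """Distance of the state from the origin."""
    return math.hypot(x, y)


if __name__ == "__main__":
    inside = run(0.1, 0.0)    # starts well inside the cycle
    outside = run(2.0, 0.0)   # starts well outside the cycle
    print(round(radius(*inside), 3), round(radius(*outside), 3))
    # The settled state keeps moving: roughly half a period later it has
    # travelled to the opposite side of the cycle.
    later = run(*inside, steps=3142)
    print([round(v, 2) for v in inside], [round(v, 2) for v in later])
```

Trajectories that start inside and outside the cycle are both drawn onto it, so the cycle is attracting in the same sense as a fixed point, yet the state never comes to rest.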
In an attempt to understand the robustness of cellular differentiation, Conrad Waddington suggested his now famous epigenetic landscape (Waddington 1957). His view was that development occurs rather like a marble rolling down a tilted, funneled landscape containing multiple “hills” and “valleys”: As differentiation progresses, the cell adopts a more and more specific state, corresponding to a deeper valley in the landscape, and is barred from spontaneous movement between states by the hills that split the landscape into discrete valleys. Crucially, within Waddington’s view, cell types are not terminally fixed, but rather, they are maintained by “epigenetic” barriers that can, given sufficient perturbation, be overcome. Recent demonstrations that cells can be reprogrammed from one type to another (Jaenisch and Young 2008; Zhou et al. 2008) suggest that this is indeed the case, and these reports have correspondingly led to a revived interest in Waddington’s ideas (Goldberg et al. 2007). The notion that cell fate is guided by an underlying regulatory landscape is also appealing from a theoretical point of view because, for many complex systems, attractors may be directly associated in a precise way with local minima of an appropriately defined potential energy (or energy-like) function. This observation has led other authors to conjecture that Waddington’s epigenetic landscape may, in fact, correspond to the “energy” landscape of a cell’s underlying regulatory architecture (Huang and Ingber 2007). Energy landscapes have proven to be successful in helping to understand many other complex phenomena (such as the protein folding problem [Wales 2003; Janke 2007]), and we therefore anticipate that applications of energy landscape theory will be useful in addressing the relationship between internal regulatory circuitry and cell-fate determination.
In particular, by determining the topology of cellular “energy” landscapes, it may be possible to understand not just the nature of individual cellular attractors, but also the ways in which individual attractors relate to one another (e.g., the heights of the barriers separating them). In the context of cellular reprogramming, such information would be particularly useful because it would provide a means to determine how efficiently different cell types may be reprogrammed, either to the pluripotent state or to alternative differentiated or multipotent states.
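The relationship between attractors, landscape minima, and barrier heights can be sketched for the simplest possible case, a one-dimensional double-well potential. The Python example below is a toy under stated assumptions: the quartic form of `potential`, the `tilt` parameter that makes one valley deeper than the other, and the search interval are illustrative choices that are not derived from any regulatory network. Real regulatory dynamics are high-dimensional and need not derive from a potential at all, which is why the text speaks of an energy-like function.

```python
def potential(x, tilt=0.0):
    """Toy landscape U(x) = x^4/4 - x^2/2 + tilt*x with up to two valleys."""
    return 0.25 * x ** 4 - 0.5 * x ** 2 + tilt * x


def force(x, tilt=0.0):
    """Downhill force -dU/dx; its zeros are the stationary points of U."""
    return -(x ** 3 - x + tilt)


def curvature(x):
    """Second derivative of U: positive at a valley floor, negative at a hilltop."""
    return 3.0 * x ** 2 - 1.0


def bisect(f, a, b, iters=60):
    """Refine a root of f bracketed by [a, b] by repeated halving."""
    fa = f(a)
    for _ in range(iters):
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0.0:
            b = m
        else:
            a, fa = m, fm
    return 0.5 * (a + b)


def stationary_points(tilt=0.0, lo=-2.0, hi=2.0, cells=401):
    """Scan [lo, hi] for sign changes of the force and refine each one.

    An odd cell count keeps the roots of the symmetric case off the grid.
    """
    f = lambda x: force(x, tilt)
    width = (hi - lo) / cells
    roots = []
    for k in range(cells):
        a, b = lo + k * width, lo + (k + 1) * width
        if f(a) * f(b) < 0.0:
            roots.append(bisect(f, a, b))
    return roots


def barriers(tilt=0.0):
    """Map each valley position to the height of the hill separating the valleys."""
    roots = stationary_points(tilt)
    minima = [x for x in roots if curvature(x) > 0]
    maxima = [x for x in roots if curvature(x) < 0]
    if len(minima) != 2 or len(maxima) != 1:
        raise ValueError("landscape is not a double well for this tilt")
    top = potential(maxima[0], tilt)
    return {x: top - potential(x, tilt) for x in minima}


if __name__ == "__main__":
    for tilt in (0.0, 0.1):
        for x, height in sorted(barriers(tilt).items()):
            print(f"tilt={tilt}: valley at x={x:+.3f}, barrier={height:.3f}")
```

With no tilt the two valleys are equivalent and the barrier is the same in both directions. A positive tilt deepens the left valley, so leaving it requires crossing a higher barrier than leaving the right one, a simple analog of one cell state being harder to reprogram than another.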
These are exciting times for stem cell biology. New and emerging high-throughput technologies are allowing us to deconstruct the mechanisms of cell-fate determination with ever-increasing detail. By representing the multiplicity of regulatory interactions underpinning stem cell fate as networks, we are beginning to dissect stem-cell-fate specification at the systems level. However, it is becoming clear that cell-fate specification is a fundamentally complex process and this complexity makes it intrinsically difficult to determine and predict cell behavior from regulatory network architecture. One potential way to connect cell fate to regulatory circuitry is by using regulatory architecture to define a cellular “energy” landscape—in which valleys are associated with different cell types and hills are associated with the barriers between them—and computationally explore the topology of this landscape. This approach is conceptually reminiscent of Waddington’s epigenetic landscape but has, until recently, been hampered by lack of data. However, with the advent of high-resolution high-throughput techniques, we are now beginning to accumulate sufficient data at multiple molecular and biochemical levels to make Waddington’s vision quantitative. Doing so will require interdisciplinary collaboration among experimentalists, mathematicians, physicists, and computer scientists. Thus, this is not only an outstanding problem in stem cell systems biology, but also an area rich in collaborative opportunities between experimentalists and theoreticians. Consequently, developing a rigorous understanding of stem-cell-fate determination at the systems level is a significant challenge as well as a great opportunity.