Pluripotent embryonic stem (ES) cells are derived from the inner cell mass of a developing embryo and can be cultured indefinitely in vitro. In vivo, mouse ES cells can contribute to all adult cell populations, including the germ line. Under defined in vitro conditions, both mouse and human ES cells can differentiate into numerous mammalian cell types, offering great promise for regenerative medicine. Recent studies have shown that adult mouse and human cells can be ‘reprogrammed’ into an induced pluripotent stem (iPS) cell state using simple combinations of transcription factors. To harness the exciting biomedical potential of ES/iPS cells, the molecular regulatory networks responsible for controlling pluripotency/self-renewal, as well as commitment and differentiation into different lineages, need to be characterized.

Stem cell research is increasingly employing high-throughput systems biology approaches to define molecular ‘parts lists’ and the regulatory interactions between those parts in ES cells and in their more differentiated progeny. How these parts are interconnected into the gene and cell-signaling regulatory networks ultimately responsible for self-renewal and differentiation remains unclear. Approaches that aim to bridge the gap among molecules, network architectures, and dynamics in order to ultimately ‘explain’ phenotypic behavior are in their infancy. To enable these efforts, a pipeline process that couples experimental and computational approaches has emerged; an example of such a pipeline is outlined in Figure 1. First, data are collected from different molecular regulatory layers [for example: epigenomic, messenger RNA (mRNA), and proteomic data] using emerging high-throughput technologies. Second, to extract biological knowledge from such rich, complex, but often noisy experimental datasets, advanced computational tools and databases are being developed.
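The "convert raw data into a standard record, then merge across sources" step of such a pipeline can be sketched as follows. This is a minimal illustration, not any specific tool's schema: the platform names, field names, and record layout are all invented for the example.

```python
# Illustrative sketch of standardizing heterogeneous raw records from two
# hypothetical platforms into one common (gene, value, layer) form so that
# experiments from different sources can be merged. All field names here
# are assumptions made for the example.
def standardize(record, platform):
    """Map a platform-specific raw record onto a common record format."""
    if platform == "microarray":
        return {"gene": record["probe_gene"], "value": record["intensity"], "layer": "mRNA"}
    if platform == "mass_spec":
        return {"gene": record["protein"], "value": record["abundance"], "layer": "protein"}
    raise ValueError(f"unknown platform: {platform}")

raw_microarray = [{"probe_gene": "Nanog", "intensity": 8.2}]
raw_mass_spec = [{"protein": "Nanog", "abundance": 5.1}]

# Once standardized, records from both platforms live in one merged table
# that downstream query/integration algorithms can operate on.
merged = [standardize(r, "microarray") for r in raw_microarray] + \
         [standardize(r, "mass_spec") for r in raw_mass_spec]
```

Real pipelines target community formats (for example, MIAME-compliant records for expression data) rather than ad hoc dictionaries, but the normalization-before-merging logic is the same.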
Moreover, computational methods capable of synthesizing data from numerous experimental platforms through user-friendly interactive interfaces are gradually emerging. These methods include tools that convert raw data into standardized database formats/records. Such records are organized into databases where experiments from different sources can be merged. Algorithms are then used to query these databases and integrate the high-throughput data with annotated data collated from low-throughput studies and other high-throughput studies in order to obtain new biological insights. Here, the organization of experimental data into sets of biochemically related gene products, and ultimately into interacting gene-product networks, is extremely useful. The abstraction of data into gene sets and networks is qualitative and as such typically ignores quantitative detail; however, it provides a bird's-eye view of the system as a whole when advanced algorithms are applied to dissect its complexity and rank its components. Taken together, computational tools and the algorithms embedded within them are used to make predictions that are translated into rational hypotheses, which can then be validated using low-throughput functional experiments. Although results from high-throughput experiments provide a global view of the many variables involved and their relationships, current technologies lack accuracy and a direct functional perspective. In contrast, low-throughput techniques, while providing functional understanding of specific components and interactions, lack the scope needed to capture the multi-factorial complexity of the system's behavior as a whole.
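A common way to connect experimental gene lists to such predefined gene sets is an over-representation test: given a list of genes from an experiment, ask whether it overlaps a curated gene set more than expected by chance. A minimal sketch using the hypergeometric tail probability (the counts below are hypothetical, chosen only to illustrate the calculation):

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): probability of observing at least k overlapping genes when
    n genes are drawn without replacement from a universe of N genes,
    of which K belong to the gene set of interest."""
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    ) / comb(N, n)

# Hypothetical example: 40 differentially expressed genes out of a
# 20,000-gene universe, 10 of which fall in a 200-gene pluripotency set.
# Expected overlap by chance is only 40 * 200 / 20000 = 0.4 genes,
# so an overlap of 10 yields a very small p-value.
p = hypergeom_pvalue(k=10, n=40, K=200, N=20000)
```

Production tools add multiple-testing correction across thousands of gene sets; the point here is only the qualitative gene-set abstraction the text describes, which discards expression magnitudes and keeps membership.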
FIGURE 1 Pipeline process for systematic studies of ES cells, starting with experimental methods to characterize the state of the cell at different regulatory layers. Data from such experiments are then stored in public repositories for data consolidation and …
ES cell research is an area that fits well with the systems biology pipeline process because these cells are relatively easy to handle experimentally, have defined sets of phenotypes that can be experimentally evaluated, and are relatively homogeneous in gene expression and morphology (this latter point is discussed in more detail in a subsequent section). The ability to differentiate stem cells into different cell types further adds to their appeal; in this sense, stem cells can be considered an ‘ideal model organism’. In this review, we first discuss the systems biology pipeline process by surveying different types of high-throughput experiments, with an emphasis on the computational tools and databases associated with each specific type of experimental approach. We then describe how these methodologies have been applied so far to study ES cells at different molecular regulatory layers, including the epigenetic, transcriptional, mRNA, microRNA, and proteomic layers, among others. We focus on data-mining approaches, which include algorithms, software tools, and databases; such methods are used to reconstruct in silico regulatory networks and to develop hypotheses for further experimentation. Finally, we present an initial ES cell regulatory network constructed from low-throughput studies.
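Such a reconstructed network is, at its simplest, a directed graph of regulatory interactions that algorithms can then rank. The sketch below encodes the well-characterized mutual and auto-regulation among the Oct4, Sox2, and Nanog core factors; the edge lists are illustrative rather than exhaustive, and the Klf4 input is an assumed example edge, not a claim from this review.

```python
# Minimal directed-graph encoding of a regulatory network: each key is a
# regulator, each value the list of its targets. Edges are illustrative;
# the Klf4 -> Oct4 edge is an assumed upstream input for the example.
edges = {
    "Oct4":  ["Oct4", "Sox2", "Nanog"],
    "Sox2":  ["Oct4", "Sox2", "Nanog"],
    "Nanog": ["Oct4", "Sox2", "Nanog"],
    "Klf4":  ["Oct4"],
}

def rank_by_degree(graph):
    """Rank nodes by total connectivity (in-degree + out-degree),
    a simple way to surface hub regulators in a qualitative network."""
    degree = {}
    for src, targets in graph.items():
        degree[src] = degree.get(src, 0) + len(targets)  # out-degree
        for t in targets:
            degree[t] = degree.get(t, 0) + 1             # in-degree
    return sorted(degree, key=degree.get, reverse=True)

print(rank_by_degree(edges))  # -> ['Oct4', 'Sox2', 'Nanog', 'Klf4']
```

Even this toy ranking reflects the intuition behind hub analysis: densely interconnected core factors rise to the top, while peripheral inputs rank low.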