The phenotype and functionality of a cell are largely governed by the underlying gene regulatory network (GRN). The GRN is of fundamental importance to the developmental process, in which a pluripotent progenitor cell gives rise to multiple cell types in a multicellular organism. The acquisition of different cellular phenotypes stems from differential expression patterns of specific transcription factors, which activate cascades of regulation within a complex network architecture. While experimental data are fundamental in identifying the level of transcription and the nature of transcriptional control, understanding the complex network architecture and predicting the effects of individual interactions in such networks will require a quantitative description of the interaction strengths that govern the network dynamics. In this article, we report a novel mathematical modeling effort aimed at identifying the transcription factor network governing the differentiation of progenitor cells to a specific lineage. We exploit the notion of sparsity, common to many biological networks, to identify the most plausible GRN operative in this scenario. Our model predictions are supported by concurrent experiments in differentiating embryonic stem cells to a specific lineage, in this case the pancreatic lineage. As discussed subsequently, we believe that our approach will benefit the development of targeted experimental protocols for the production of cells with a pre-specified fate.
Developments in large-scale genomic technologies have made data acquisition more tractable. This progress is increasing the emphasis on developing meaningful quantitative models that utilize the wealth of experimental data (Bansal et al.). However, it is not always obvious how the data acquired through such techniques can be assembled into unambiguous predictive models. Considerable effort has been devoted to the network identification problem (Davidson et al.; Foteinou et al.; Yeung et al.), with significant success in analyzing bacterial and yeast (Segal, 2003) networks. However, the generalization of these methods to the inference of networks in higher eukaryotes is not always obvious. Furthermore, developmental GRNs are organized very differently from the GRNs responsible for cellular physiology, housekeeping, the cell cycle, etc. (Bolouri and Davidson, 2003). In contrast to most other GRNs, a developmental network operates as a sequence of multiple cascades of transcriptional regulation (Davidson, 2001). Endomesoderm specification in the pre-gastrular sea urchin embryo (Oliveri and Davidson, 2004) was among the first developmental GRNs to be identified, followed by mesoderm specification in the frog Xenopus laevis (Koide et al.), dorsoventral patterning and segmentation of the Drosophila embryo (Stathopoulos et al.), and B-cell differentiation in the mammalian immune system (Singh et al.). Parallel efforts to identify the regulatory networks governing in vitro differentiation of embryonic stem cells have been lacking to date; this report attempts such an identification.
The primary purpose of modeling GRNs for developmental processes is to reveal pathways of differentiation that can be precisely manipulated to generate different cell types. This is currently an area of intense study owing to the heightened interest in stem cell biology (Shaywitz and Melton, 2005). The main focus of this article is to capture the regulatory network through its key features, sparsity and cascade-like architecture, and to quantify the influence of the external environment on the governing network. This endeavor has significant relevance to the field of stem cell differentiation, where cell fate induction is controlled primarily by manipulating the external environment via the extracellular matrix, growth factors, chemical inducers/repressors, etc. Such mathematical quantification will enable in silico prediction of cell fate under environmental perturbations, supporting the development of robust differentiation protocols.
The developmental regulatory network is typically organized in a distinctive cascade of control (Blais et al.) that enables the subdivision of the entire complex network into a number of smaller subsets or modules. Each module is under the control of a signature gene or ‘hub’ that plays a central role in directing the cellular response to a given stimulus. Typically, these hubs connect to very few other nodes, and the network behaves like a small-world network in which very few steps connect any two nodes. Another characteristic of developmental GRNs is the relative absence of inter-connectivity between hubs, which presumably facilitates compartmentalization of the various biological processes occurring in a cell.
These observations support the segregation of pancreatic differentiation into specific interconnected modules. Accordingly, the regulatory architecture of a single stage, the differentiation of pancreatic endoderm to pancreatic progenitor, is treated here as a single module. The governing transcription factor of this stage has been identified as Pdx-1, which we consider the ‘hub’ of the pancreatic differentiation module under consideration.
The mathematical formulation is based on the rationale that network sparsity characterizes the regulatory architecture governing development. Consequently, we develop our mathematical model on the notion of sparse coding (Lee et al.). Network sparsity has been experimentally observed in the visual system of primates (Vinje and Gallant, 2000), the auditory system of rats (DeWeese et al.), and the olfactory system of insects. Here, we adopt network sparsity as the governing criterion determining the regulatory network of differentiating embryonic stem cells and propose a formal mathematical structure to analyze such systems. We describe a novel bilevel optimization algorithm that identifies the underlying regulatory network, and validate it against an in silico network (Supplementary Material). The developed algorithm is then applied to a system of embryonic stem cells (ESCs) differentiating towards the pancreatic lineage. We show that the identified network largely conforms to a number of observations reported in the literature regarding pancreatic development. Finally, we demonstrate the predictive capability of the mathematical model by simulating a likely mechanism to induce subsequent differentiation and validating the model prediction with a concurrent experiment. The pathway for inducing endocrine differentiation predicted by our model has not yet been reported in the literature. However, concurrent experiments in our laboratory successfully validated the salient features of the model prediction. Although developed and validated for the specific case of pancreatic differentiation of mouse ESCs, the methodology is sufficiently general in scope to be applicable to any other GRN.
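To make the sparsity criterion concrete, the following is a minimal sketch of sparse network identification. It is not the bilevel algorithm of this article, whose details are not given in this section. As a simple stand-in, it fits an L1-penalized least-squares model by iterative soft-thresholding (ISTA), under the assumption of linear dynamics dx/dt = A x. The function names, the six-gene synthetic hub network, the noise level, and the penalty weight `lam` are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1: shrinks entries toward zero."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def fit_sparse_network(X, dX, lam=5.0, n_iter=500):
    """Estimate a sparse interaction matrix A (hypothetical helper).

    Minimizes 0.5 * ||dX - X A^T||_F^2 + lam * ||A||_1 by ISTA.
    X  : (n_samples, n_genes) expression levels
    dX : (n_samples, n_genes) rates of change of expression
    A[i, j] is the inferred influence of gene j on gene i.
    """
    n_genes = X.shape[1]
    A = np.zeros((n_genes, n_genes))
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # (the squared largest singular value of X).
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        residual = X @ A.T - dX            # model error per sample and gene
        grad = residual.T @ X              # gradient of the least-squares term
        A = soft_threshold(A - step * grad, step * lam)
    return A


def make_hub_example(n_samples=200, noise=0.01, seed=0):
    """Synthetic data from an assumed sparse network with one hub (gene 0)."""
    rng = np.random.default_rng(seed)
    n_genes = 6
    A_true = -0.5 * np.eye(n_genes)        # assumed self-decay of every gene
    A_true[1, 0], A_true[2, 0], A_true[3, 0] = 0.8, -0.6, 0.7  # hub targets
    A_true[4, 3] = 0.9                     # downstream cascade step
    A_true[5, 4] = 0.5                     # second cascade step
    X = rng.standard_normal((n_samples, n_genes))
    dX = X @ A_true.T + noise * rng.standard_normal((n_samples, n_genes))
    return A_true, X, dX


A_true, X, dX = make_hub_example()
A_est = fit_sparse_network(X, dX)
print("support recovered:", np.array_equal(A_est != 0, A_true != 0))
print(np.round(A_est, 2))
```

In this sketch, a larger `lam` prunes more edges and a smaller one admits more. Choosing that trade-off from data is the kind of decision an outer optimization level could make, although how the article's bilevel formulation does so is not specified here.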