Elucidating the regulatory structure and dynamics of gene networks is a major objective in biology. The inference of regulatory networks from gene expression data is known as reverse engineering
[1]–
[7]. It is being widely and successfully applied, from microbes to animals (see, for example,
[8]–
[16]). Many reverse engineering studies aim to determine regulatory structure from large-scale perturbation- or time-series data based on microarray or transcriptome-sequencing technology (reviewed in
[5],
[17]). This approach has two significant limitations: first, spatial information on gene expression is lost, since homogenised tissue samples or disaggregated cells are studied. And second, most resulting models are of a static and probabilistic nature, which cannot be used to investigate network dynamics (for example
[18]–
[20]). If dynamical models are used, they are often linear (for example
[21]–
[23]). Network inference using complex non-linear dynamical models is deemed a considerable technical challenge
[6],
[17],
[24]. However, there are many important biological questions that absolutely require consideration of non-linear and spatial aspects of a system. Here we discuss such a case, and show that reverse engineering can be used for its study with a reasonable amount of experimental and computational effort.
Our research focuses on how developmental gene regulatory networks produce spatial patterns in multi-cellular organisms, and how these patterns evolve through changes in the underlying structure of the network
[25],
[26]. In this context, reverse engineering is implemented by fitting non-linear systems of differential equations to quantitative, spatial gene expression data (reviewed in
[7]). There are many network modelling formalisms
[27]–
[29], and a number of powerful global non-linear optimisation methods
[7],
[30],
[31], which are suitable for this task.
So far, only a small number of developmental systems have been reverse-engineered using dynamical models (see, for example,
[32]–
[34]). One of those is the (trunk) gap gene network of the vinegar fly
Drosophila melanogaster (reviewed in
[35]). This regulatory network consists of four genes—
hunchback (
hb),
Krüppel (
Kr),
giant (
gt) and
knirps (
kni)—which all encode transcription factors. Gap genes are involved in establishing the segmented body plan of the animal. They are active during a very early period of
Drosophila development, called the blastoderm stage, which occurs before the onset of gastrulation. At this stage, the embryo consists of a multi-nucleate syncytium allowing transcription factors to diffuse through the tissue. Gap genes are expressed in broad, overlapping domains along the embryo's major, or antero-posterior (A–P) axis. They are regulated by long-range gradients of transcription factors encoded by maternal co-ordinate genes
bicoid (bcd),
hunchback (hb), and
caudal (cad), and are repressed by the terminal gap genes
tailless (tll) and
huckebein (hkb) in the pole regions of the embryo. Maternal co-ordinate and gap genes form the first two tiers of the segmentation gene hierarchy in
Drosophila. Together they regulate pair-rule and segment-polarity genes, the latter forming a molecular pre-pattern that leads to morphological segmentation at later stages of development (see,
[36],
[37], for review).
The particular reverse-engineering approach we use to investigate the gap gene network is called the gene circuit method
[1],
[38],
[39] (). It uses mathematical models called gene circuits that represent the basic properties of the embryo and the transcriptional regulatory interactions underlying the network. Gene circuits are described in detail in
Materials and Methods. Here we provide a brief overview of the model, which consists of a row of dividing nuclei (, top left) each harbouring an identical version of the regulatory network. There are three processes that occur within and between nuclei: (1) regulated gene product synthesis, (2) Fickian gene product diffusion, and (3) linear gene product decay (, top middle). Regulatory interactions that direct synthesis are represented by a genetic interconnectivity matrix: each regulatory weight in this matrix can represent activation, repression, or no interaction depending on whether it is positive, negative or (close to) zero (, bottom panel, right). The interconnectivity matrix can also be displayed as a network diagram (, bottom panel, left). Note that the values of regulatory weights are not set
a priori. Instead, they are estimated by fitting the model to quantitative gene expression data. Those model solutions that fit the data well are analysed to characterize the regulatory structure and dynamics of the network (, bottom). In this way, gene circuits act as tools to extract regulatory information from quantitative data.
Previous reverse-engineering studies of the gap gene network were based on quantitative expression data obtained by visualising the distribution of gap gene mRNA
[40], or protein products
[41]–
[43] using fluorescent whole-mount
in situ hybridisation, or antibody staining (immunofluorescence) respectively. Stained embryos were imaged using confocal laser-scanning microscopy, and the resulting expression profiles were quantified using a processing pipeline that includes image segmentation to identify nuclei, time classification, removal of non-specific background staining, data registration to remove embryo-to-embryo variability, and data integration (reviewed in
[44]). It took years of effort by several researchers to establish the protein data set
[41],
[43]. mRNA data, on the other hand, were acquired by one of the authors of this paper in considerably less time
[40]. However, these mRNA data remain incomplete in that they only cover a subset of gap genes (
Kr, kni, and
gt) during the earliest stages of expression.
These previous reverse-engineering studies have yielded many new insights into gap gene regulation, which would have been difficult to obtain by experimental approaches alone. An early pioneering study predicted a co-operative effect between maternal factors Bcd and Hb on the regulation of gap gene expression
[45]. Later efforts uncovered a mechanism for the dynamic anterior shift of gap domains over time
[46],
[47], removed ambiguities in the published experimental evidence
[48],
[49], identified core mechanisms for gap gene regulation
[48],
[50], and explained the robustness of the system against variable levels of maternal inputs
[47],
[51]. Taken together, these studies clearly demonstrate the utility and feasibility of the approach: over the past two decades, reverse engineering has contributed significantly to our understanding of gap gene regulation.
It would be extremely interesting to apply the gene circuit method to other developmental systems. In our view, reverse engineering has tremendous potential for the study of gene regulatory networks in development and evolution. For instance, gene circuits could be used to reconstruct homologous developmental regulatory networks across a range of species, to compare their regulatory structure and dynamical behaviour
[26]. This could be used to map which regulatory changes in a network correspond to which changes in gene expression during evolution. Alternatively, it would also be highly interesting to compare network structures and dynamics between different developmental processes.
Despite this potential, the application of dynamic, non-linear reverse-engineering approaches beyond gap genes in Drosophila has been very limited. The main reason for this, we suspect, is the following: collection of high-quality data sets—such as the spatio-temporal profiling of gap genes described above—is costly both in terms of time and resources. It is clearly the bottleneck of the approach. Protocols based on immunofluorescence require antibodies, which are difficult and expensive to obtain. Confocal microscopy is time-consuming and laborious, since a large number of embryo images need to be scanned. Moreover, while protocols for data acquisition and quantification work efficiently in Drosophila, their application to less well-established experimental models is not trivial. In particular, it is often difficult to adapt fluorescent staining protocols to non-model species.
Thus, in order to make the gene circuit method more widely applicable—and hence useful for the study of developmental gene regulatory networks—it is imperative that we simplify the method. We address an important question which applies to reverse-engineering approaches in general: how much, and what kind of data are required to successfully infer a gene regulatory network? Answering this question in the context of the gap genes will allow us to minimise the cost of data acquisition and processing. This, in turn, will decrease the barrier for applying reverse-engineering methodology to other developmental systems, many of which are similar in kind and complexity to the gap gene network.
The quality of a gene circuit model depends directly on the quality of the data it was fit to. What matters most in this regard is the timing and position of expression domain boundaries with respect to each other. The relative level of expression in each domain is less crucial. For instance, early gap gene circuit models did not capture the formation of the abdominal
kni domain correctly (see in
[45]). This was due to the incorrect relative position of this domain in the data resulting in a large gap between it and the posterior
hb domain (see ,
ibid.). This defect is no longer present in more recent models based on data with the abdominal
kni domain positioned accurately while still only measuring relative levels of protein concentration
[46],
[48]–
[50].
In this study, we present a simplified reverse-engineering protocol and apply it to a new, quantitative data set of gap gene mRNA expression in Drosophila. We demonstrate how mRNA expression data derived from a colorimetric (enzymatic) protocol for in situ hybridisation can be used to infer the regulatory structure and dynamics of the gap gene network. We compare our results with those obtained in previous studies based on protein expression data, and show that they predict equivalent regulatory mechanisms that are consistent with experimental evidence. In addition, we show that our simplified data set can be reduced even further while still yielding correct predictions. In this way, we define a set of minimal requirements for the successful inference of gap gene regulatory network structure and dynamics. These minimal requirements suggest that the adapted gene circuit method can be applied to a variety of developmental systems with a reasonable amount of effort. Such wider application of reverse-engineering methods will enable us to carry out systematic and comparative analyses of developmental gene regulatory networks.