In multicellular organisms, most genes need to be expressed in a specific spatiotemporal manner in order to guide development and to maintain post-developmental physiology. For instance, the stem cell factors Oct4, Sox2, Klf4 and c-Myc are expressed early in development and function to preserve the pluripotency of uncommitted stem cells. Their expression is sufficient to preserve or induce this unique cellular property [
1]. On the other hand, a different group of genes is activated upon cellular differentiation. A classical example is the Pax6 gene, which is required for eye development in a variety of organisms [
2]. Likewise, in
Drosophila embryos, the spatiotemporal expression of
gap and
pair-rule genes is crucial for defining segmentation patterns and development (reviewed in: [
3]). Finally, yet another group of genes is expressed in specialized, fully differentiated tissues to enable post-developmental functions in physiology throughout the lifetime of an organism. A textbook example is proinsulin, the insulin precursor, which is specifically expressed in the pancreas and regulates the level of glucose in the blood after food intake.
As exemplified above, differential gene expression is a highly regulated process. It is controlled at a first level by transcription factors (TFs): proteins that physically interact with
cis-acting genomic regions to control expression of their target genes. TFs can either repress or activate transcription, and many can do both, depending on the cellular context. In addition to TFs, chromatin modifications (e.g. histone acetylation and methylation [
4]), microRNAs (reviewed in: [
5]), RNA binding proteins, mRNA stability, export and splicing, and post-translational modifications also contribute to differential gene expression. However, it is transcriptional regulation that first and foremost determines where and when a gene is expressed, whereas other types of regulation often modulate or dampen gene expression rather than determine it.
The human genome encodes ~1500 TFs [
6] and 600 microRNAs [
7]. The function of most of these regulators is completely unknown. Indeed, even in large community efforts such as the ENCODE project, only a handful have been comprehensively studied [
8]. Furthermore, a growing number of non-protein-coding nucleotides in the 3.2 Gb human genome are being associated with a regulatory function [
8]. Thus, the comprehensive delineation of the mechanisms that control differential gene expression at a genome scale, or systems level, in humans remains a daunting task.
Systems-level studies of differential gene expression are greatly advanced by the use of genetically tractable model organisms such as the fruit fly
Drosophila melanogaster and the nematode
Caenorhabditis elegans. We have focused on
C. elegans because it is a relatively simple animal with a fixed lineage of only 959 cells. In addition, the
C. elegans genome is fully sequenced and annotated, and is compact compared to the human genome: even though both contain ~20 000 genes, the 100 Mb worm genome is about 30 times smaller [
9,
10]. Consequently, ~26% of the worm genome is exonic, compared with 1–2% in humans. In addition, the majority of intergenic regions are shorter than 2 kb [
11], and introns are much shorter with a median length of 65 bp, whereas the median length of human introns is 3 kb [
12]. Thus, the potential regulatory genomic ‘space’ that needs to be considered in studies of differential gene expression is much smaller in the worm. The
C. elegans genome also encodes fewer TFs (~940) and microRNAs (~150) than the human genome [
13–15]. Finally, studies of the mechanisms of differential gene expression at a systems level are greatly facilitated by the fact that
C. elegans is a transparent animal. By using reporters such as the green fluorescent protein (GFP), one can elucidate where and when genes are expressed in living animals, and determine how different perturbations affect gene expression [
16–20].
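The size comparison above can be checked with simple arithmetic. The sketch below uses only the figures quoted in the text (3.2 Gb and 100 Mb genomes, ~26% versus 1–2% exonic sequence); the variable names are illustrative. Treating all non-exonic sequence as an upper bound on the regulatory 'space' is a simplifying assumption, not a claim made in the text.

```python
# Figures quoted in the text; variable names are illustrative only.
HUMAN_GENOME_MB = 3200          # 3.2 Gb human genome
WORM_GENOME_MB = 100            # 100 Mb C. elegans genome
WORM_EXONIC_PERCENT = 26        # ~26% of the worm genome is exonic
HUMAN_EXONIC_PERCENT = (1, 2)   # quoted as a range of 1-2%

# Ratio of genome sizes (human / worm).
size_ratio = HUMAN_GENOME_MB / WORM_GENOME_MB

# Assumption: non-exonic sequence is used as a crude upper bound on the
# sequence that could carry regulatory information.
worm_non_exonic_mb = WORM_GENOME_MB * (100 - WORM_EXONIC_PERCENT) / 100
human_non_exonic_mb = tuple(
    HUMAN_GENOME_MB * (100 - pct) / 100 for pct in HUMAN_EXONIC_PERCENT
)

print(f"genome size ratio (human/worm): {size_ratio:.0f}x")
print(f"worm non-exonic sequence: {worm_non_exonic_mb:.0f} Mb")
print(
    "human non-exonic sequence: "
    f"{min(human_non_exonic_mb):.0f}-{max(human_non_exonic_mb):.0f} Mb"
)
```

Under these assumptions the genome size ratio comes out at 32, consistent with the rounded figure of 30 quoted above, and the non-exonic sequence differs by an even larger factor.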
Differential gene expression can be studied at a systems level using gene regulatory networks (GRNs) that model physical and regulatory interactions between genes and their
trans regulators [
21]. Physical TF–DNA interactions can be delineated using two conceptually different but highly complementary approaches. TF-centered, ‘protein-to-DNA’, methods start with a TF or set of TFs of interest and identify the genomic DNA fragments with which these TFs interact. Chromatin immunoprecipitation (ChIP) and DamID are the most widely used TF-centered methods [
22,
23]. ChIP has been particularly powerful for the identification of TF–DNA interactions in homogeneous systems such as yeast, and in mammalian tissue culture cells, including primary cells or stem cells. Although ChIP is powerful, it is technically difficult to apply systematically to most TFs in heterogeneous and complex metazoan systems such as intact worms. This is because many TFs are expressed at low levels, and some may be expressed in a limited number of cells or during a narrow developmental interval. Furthermore, antibodies that are suitable for ChIP assays are available for only a handful of worm TFs. On the other hand, gene-centered, ‘DNA-to-protein’, methods start with one or more regulatory DNA fragments and identify the TFs that can interact with these fragments [
21]. Here, we will discuss our efforts to delineate gene-centered GRNs in
C. elegans and describe some of the insights that we have obtained into the global mechanisms of differential gene expression.
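To make the distinction between the two approaches concrete, a GRN of physical TF–DNA interactions can be pictured as a set of directed edges from TFs to the DNA fragments they bind. The following is a minimal sketch of such a structure, not the representation used in any of the cited studies. The class name, method names and the identifiers `TF_A`, `TF_B`, `promoter_1` and `promoter_2` are hypothetical placeholders.

```python
from collections import defaultdict


class GeneRegulatoryNetwork:
    """Toy GRN: directed edges from TFs to the DNA fragments they bind.

    Hypothetical illustration only; all names are placeholders.
    """

    def __init__(self):
        # Two indexes over the same edge set, one per query direction.
        self._fragments_by_tf = defaultdict(set)
        self._tfs_by_fragment = defaultdict(set)

    def add_interaction(self, tf, fragment):
        """Record one physical TF-DNA interaction."""
        self._fragments_by_tf[tf].add(fragment)
        self._tfs_by_fragment[fragment].add(tf)

    def fragments_bound_by(self, tf):
        """TF-centered ('protein-to-DNA') view: start from a TF."""
        return sorted(self._fragments_by_tf.get(tf, ()))

    def tfs_binding(self, fragment):
        """Gene-centered ('DNA-to-protein') view: start from a DNA fragment."""
        return sorted(self._tfs_by_fragment.get(fragment, ()))


# Placeholder interactions, not experimental data.
grn = GeneRegulatoryNetwork()
grn.add_interaction("TF_A", "promoter_1")
grn.add_interaction("TF_A", "promoter_2")
grn.add_interaction("TF_B", "promoter_1")

print(grn.fragments_bound_by("TF_A"))  # ['promoter_1', 'promoter_2']
print(grn.tfs_binding("promoter_1"))   # ['TF_A', 'TF_B']
```

Both queries read the same set of edges. The two experimental strategies differ only in which side of the edge they start from, which is why they are complementary.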