Generation of the developmental time-series data
Thirty-one-hour time points, which spanned all stages of embryogenesis, were collected as described previously (
Arbeitman et al, 2002). To capture the rapid developmental changes that occur during the first half of embryogenesis, overlapping 1-h time points were obtained for the first 6.5 h of development. The stages of all samples were verified and only tightly staged embryo collections were used for RNA isolation and microarray analysis in order to minimize the overlap between measurements (see for the distribution of stages at each time point).
Three independent embryo collections were used for each stage of development. Details of the microarray hybridizations are found in
Arbeitman et al (2002). All samples were hybridized together with a common reference sample, which was made from pooled samples for each transcript from all stages of the
Drosophila lifecycle. Thus, the expression level of each gene in the sample can be compared relative to its corresponding reference expression level. The microarrays used for this study consist of PCR fragments of one exon of every predicted
Drosophila gene (
Li and White, 2003). The data were normalized using the intensity-dependent Qspline method (
Workman et al, 2002) and subsequently corrected for spatial biases (see Materials and methods for details).
Estimating expression changes of genes during embryogenesis
Two different statistical methods were used for identifying genes that change in expression during embryonic development. First, the widely used analysis of variance (ANOVA) was applied to identify significant changes in the general expression level. We found significant changes in transcript levels for 86% (
P<0.05) of all genes. This compares to
C. elegans (
Baugh et al, 2003) and an earlier analysis of
D.
melanogaster (
Arbeitman et al, 2002), where 68 and 95% respectively (
P<0.05) of the genes were found to change expression during embryogenesis. However, the actual implementation of ANOVA may differ slightly. Second, to explicitly analyze the temporal dependency of expression levels in individual genes, a runs test was used; it suggests temporal changes in transcript levels for 65% of genes during embryogenesis.
These estimates give a global overview of the amount of change occurring during the entire embryonic development; however, they do not pinpoint when transitions in gene expression programs occur. To get a more exact time measure, we searched for changes in expression levels using local convolution methods (
Supplementary Figure S1). More specifically, we required four points of low expression and four subsequent points of high expression (or vice versa) even if the amplitude change was relatively low (see Materials and methods). This type of convolution not only requires a sharp increase or decrease of expression, but also that the change in transcript level is consistent over a period of time, thereby reducing the rate of false positives owing to individual outliers.
Using this approach, we mapped 6233 ‘sharp' changes in transcript levels (2808 increases and 3425 decreases) to time points and developmental stages (). As indicated already in the study of
Arbeitman et al (2002), several developmental stages show an increased frequency of transcript level changes during embryogenesis, which can now be confirmed genome-wide. The local convolution analysis also revealed that the increase in the transcript level of one group of genes often coincides with the decrease of transcript expression of another group of genes and vice versa, indicating coordinated waves of expression ().
A flurry of expression changes was observed at 2–3 h (embryo stage ~5), representing the initiation of zygotic transcription and the parallel decay of some maternal transcripts. This first stage of dramatic change coincides with events just after cellularization—the process during which each nucleus is enclosed to form a cell by invagination of the plasma membrane—for example, the major morphogenetic changes leading to germ layer formation that occur during that time period in the cellularized embryo. It might also reflect the embryo patterning that begins along both the anterior–posterior and dorsal–ventral axis (
Wolpert et al, 2002).
A second major period of transcript expression change (both increase and decrease) was observed at roughly 12 h, when most embryos had reached stages 12–14, corresponding to the end of the dorsal closure, the terminal differentiation of many tissues, and to the invagination of the epithelial cells that will become the imaginal discs.
A third period of gene expression change is observed at 16 h (stages 14–16) when a discrete set of transcripts decrease their expression levels, followed by an intense increase of transcript levels of another set of genes (17–19 h). This could possibly be in preparation for the transition to the larval stage. Generally, sharp decreases of mRNAs seem to be more confined to particular time points.
The correlation between times of increase and decrease in transcript levels suggests the existence of coregulated groups of genes that drive major developmental events during embryogenesis.
Classification of gene expression behavior
To be able to group genes whose transcripts follow similar patterns over the full 30 time points, and to correlate this with the periods of rapid expression changes, we used global convolution (see Materials and methods and
Supplementary Figure S1) to assign genes to general expression classes. These are characterized by distinct plateaus of low and high expression during embryogenesis: class
I (maternal) genes encoding transcripts that start with a high relative transcript level, which subsequently decreases; class
II (transient) genes whose transcripts levels first increase and later decrease, and thus do not seem to be maternally deposited and, Finally, class
III (activated) genes encoding transcripts for which we only observe an increase in expression.
Class
III gene transcripts are most likely not present at high levels during the entire
Drosophila life cycle, as most of the corresponding genes return to low levels at various time points beyond embryogenesis (
Arbeitman et al, 2002). Although the transcripts from the transiently expressed genes (class
II) may be present in later stages in the larva, pupa, or the adult fly, we expect only a few genes with multiple expression peaks during embryogenesis (
Arbeitman et al, 2002), also indicated by the runs test above (data not shown).
Using global convolution (see Materials and methods), we found strong and significant expression correlation coefficients (r>0.8, P<10−4, t-test) between transcript levels and global convolution profiles for 26% (3379) of the transcripts present on the array. Of these, we classified 1534 as class I, 792 as class II, and 1053 as class III. Many genes do not fit to any of these three classes, for instance genes that are constitutively transcribed (or not transcribed at all) during embryogenesis. Furthermore, the requirements for assignment were rather stringent; the entire expression profile must fit the categorization to a high degree (30 time points), whereas in the local convolution, only eight time points are considered (corresponding to P<10−4, t-test). This leads to a low rate of false positives at the cost of sensitivity. For instance, we are likely to miss many genes with very short periods of transcriptional activity.
Within each of the three global classes, groups of genes can now be readily identified with common times of increase and/or decrease of relative transcript levels (). More than 62% of the class I (maternal) genes can be classified into two major groups: class I:a and b, whose transcript levels decrease at 3–5 and 12–14 h, respectively. For the class II genes, even though there are 276 possible combinations of time points of increase and decrease of transcript levels (see Materials and methods), as many as 38% of the 792 class II genes fall into only three groups: II:a (2.5–12 h), II:b (11–20 h), and II:c (15–20 h) containing 153, 100, and 50 genes, respectively. Of the genes whose transcript levels increase but are not observed to decrease during embryogenesis (class III), more than 73% can be classified into three main groups; class III:a, b, and c (times of increase at 13–14, 18–20, and 22 h of development).
In order to identify biological principles underlying these different coexpression groups, and also as a general quality control, we studied these groups using various sources of biological information (, and and Materials and methods). For example, proteins encoded by genes assigned to each of the three major classes have more interactions with each other than expected from a random group of the same size (P<0.01; , and ), showing that the gene products are not only coexpressed but also tend to interact physically to perform related cellular functions.
| Table 1aDatabase analysis of class I transcripts |
| Table 1bDatabase analysis of class II transcripts |
| Table 1cDatabase analysis of class III transcripts |
Class I: maternal genes Class
I contains 1534 genes, and thus represents the most populated of the three major expression classes. It is long been known that a large number of transcripts are deposited in the oocyte during gametogenesis. Among other vital functions, they have been shown to be responsible for establishing the major body axes and for the initiation of zygotic transcription (
Luschnig et al, 2004). The importance of these genes is reflected by a high proportion of lethal genes (observed: 10%; expected: 5%;
P<10
−3; χ
2-test) and a higher than average fraction of orthologs in this group shared with
Anopheles gambiae (), indicating functional conservation. Furthermore, analysis of Flybase GO annotation (
Drysdale and Crosby, 2005) revealed that there is a significant overrepresentation of genes involved in ‘nuclear organization and biogenesis', ‘nuclear mRNA splicing', and ‘DNA metabolism' in the class
I group. These genes facilitate the organization of the chromosomes and nuclei during the very rapid cell divisions in the precellular blastoderm embryo. Proteins encoded by the class
I genes also display a strong physical interconnectivity: 1097 connections within 1534 proteins (expected connections: 517;
P<10
−3; χ
2-test). Furthermore, they are enriched in transcription factors (
P<0.05). A total of 365 out of the 1534 genes change transcript levels at 1.5–3 h (group class
I:a; ). As expected, the functionally characterized genes of the class
I:a group are mostly involved in early development and in the cell cycle according to the interactive fly database (
Brody, 1999). The class
I:b group consists of 593 genes, which encode transcripts with decreased levels of expression by 10–11 h. These genes are annotated in the interactive fly database with functions involved in the cell cycle, chromatin organization, and DNA replication. Class
I:b is also enriched in lethal genes (
P<0.01; χ
2-test). In developmental terms, the decrease of the transcripts in class
I:a coincides with the start of gastrulation, and
I:b with the initiation of germ band retraction and dorsal closure.
Class II: transient genes Despite the stringent requirements of class
II genes to have both a sharp increase and decrease in transcript levels, 792 fly genes were classified in this class. This is consistent with earlier reports in
D. melanogaster (
Arbeitman et al, 2002) and also
C. elegans (
Baugh et al, 2003). As shown in , the class
II genes are enriched in transcription factors and lethal phenotype genes (hereafter referred to as
lethals; see Materials and methods) and not surprisingly, this class shows an overrepresentation of genes with functions involved in development. More specifically, many of these genes are annotated as encoding proteins involved in ‘histogenesis', ‘organogenesis', ‘ectoderm development', ‘cell differentiation', and ‘cell fate commitment' (; GO analysis). The class
II group is also enriched in genes that encode well-characterized transcription factors (
P<0.05,
t2-test).
Of the class II genes, 303 (38%) are found in only three groups; a, b, and c, each defined by specific times of increase and decrease of transcript levels (). In addition to the common features above, the groups also differ from each other. For example, more genes from class II:a (with a plateau of increased transcript levels starting at 3–6 h and decreasing at 12–13 h) have been functionally characterized than average for the genome (P<0.01, t2-test). Conversely, genes in class II:b (13–14 to 19–21 h) and II:c (17–18 to 21 h) are poorly characterized. Class II:a corresponds roughly to the time of cellularization and gastrulation, up to the point of germ band retraction. Class II:b and II:c range between germ band retraction and late embryonic stage.
The class II:a group (153 genes, of which 109 are annotated) contains 11 genes (of 29 on the array) that are a part of the Notch pathway (see below). Some other Notch members are found in class I, as many are maternally inherited. The drop in their transcript levels do however tend to coincide with II:a. The remaining 142 genes are often annotated as being implicated in ‘neuroblast cell fate determination' suggesting that this group is highly enriched for both the regulators (members of the Notch pathway) and their downstream effector molecules (genes involved in neurogenesis).
Genes involved in dorso-ventral patterning are also significantly overrepresented in the class II groups (P<0.01) and in particular in the class II:a category (P<0.05). Genes from the class II:a group are expressed in the procephalic ectoderm, ventral ectoderm, sensory complex, and central brain neurons (in situ data, see Materials and methods). Again, this strongly suggests a role for class II:a genes in nervous tissue and brain development.
Despite the low proportion of functionally characterized genes in the class
II:b group (51 of 100 genes), there is an enrichment in oxidoreductase and peroxidase functions (GO classification). Moreover, defense and immune response are common functional classifications in this group. For instance, eight (of 20 known) members of the Osiris cluster (
Dorer et al, 2003) are present in class
II:b. This cluster is part of the largest region of synteny between
D. melanogaster and
A. gambiae, and encodes one of the largest gene families in fruitfly (
Zdobnov et al, 2002). The genes of the Osiris cluster are still poorly characterized, but they are known to be under strong selection pressure both on protein sequence and expression level (
Dorer et al, 2003). The transcript levels of Osiris genes 3, 7, 9, 17, 18, and 20 are known to be high during embryogenesis from stages 13 to 16 (
Dorer et al, 2003), compatible with our findings.
The class
II:c group encompasses 50 genes. Only 19 of these are annotated, and most are involved in cuticle formation. The fly embryo secretes a hard proteinaceous material, which forms a thick protective cuticle surrounding the larvae. Furthermore, class
II:b (see above) is linked via
in situ data to the dorsal ectoderm, suggesting that also many genes in class
II:b may contribute to cuticle formation (
Ostrowski et al, 2002). Therefore, groups
II:b and
II:c may be of interest when designing insecticides or planning experiments that target the cuticle.
Class III: activated genes Of the 1053 transcripts with sharp increases in expression levels but without a subsequent decrease during embryogenesis (class III), 250 are activated at 11–12 h (III:a), 342 at 16–18 h (III:b), and 184 at 20 h (III:c). These genes are the most species specific of the three categories: we found significantly fewer orthologs to predicted genes from A. gambiae than for the genes represented on the microarray as a whole. Furthermore, known transcription factors are significantly underrepresented in this group (), which implies that the mRNA expression of transcriptional regulators is likely to be under tight regulation. III:a starts roughly at the same time as the dorsal closure, and III:b coincides with the late embryonic stage. The initiation of III:c cannot be correlated to any distinct developmental event.
Coordinated regulation of transcripts and protein products
The decrease in transcript levels during embryogenesis in the class
I and
II genes suggests that it is important to reduce the levels of the respective protein in a temporally controlled manner. Transcriptional regulation alone is not sufficient to ensure a rapid reduction in protein levels, as the protein degradation may take hours. We thus hypothesize that the protein products of most transcriptionally repressed genes are inactivated, for example through targeted degradation controlled by PEST regions. This mechanism has previously been suggested to be responsible for the degradation of maternal proteins in
C. elegans (
Baugh et al, 2003) as well as for proteins that are periodically expressed during the mitotic cell cycle in several eukaryotes (
de Lichtenberg et al, 2005;
Jensen et al, 2006).
To test our hypothesis, we used a computational method to systematically predict PEST regions in the
D. melanogaster proteome and compared the percentage of PEST-containing proteins encoded by the genome to that of the genes in each expression class. The highest percentages of PEST-containing proteins are encoded by class
I (39%) and class
II genes (37%). Both groups are significantly enriched in PEST regions compared to the proteome-wide content (31%) with
P-values of 10
−11 and 10
−3, respectively. The difference between class
I and
II is not statistically significant. In contrast, PEST regions are found in only 21% of the proteins encoded by class
III genes whose RNA levels remain high, which is significantly less than expected at
P<10
−9. These observations are consistent with the hypothesis of a coordinated downregulation of a gene's expression in
Drosophila at the level of both their RNA and protein products during embryogenesis, as suggested previously for other organisms (
Baugh et al, 2003;
de Lichtenberg et al, 2005;
Jensen et al, 2006).
Mapping coordinately expressed groups of genes to protein interactions
As suggested above, transcripts in the individual expression groups are tightly coexpressed at the same stages of development and are enriched in particular groups of functional (GO) categories. Consequently, one would expect that the protein products preferentially interact with each other to perform common functions. To test this hypothesis, we analyzed the integrated
D. melanogaster protein interaction network from the STRING database, which contains large-scale interaction data from fruitfly yeast two-hybrid screens (
Giot et al, 2003), small-scale interactions stored in dedicated databases and extracted from the literature and inferred interactions from different species, all embedded into a unified scoring scheme (
von Mering et al, 2005). Indeed, proteins encoded by genes in all groups except class
I:
a and class
III:a have more interactions between themselves than with proteins outside that group (
P<0.01). Certain classes of proteins and pathways contribute strongly to this enrichment, including proteins involved in energy production, transcription factors (such as
bicoid, FBgn0000166), or members of the Notch pathway (). The latter also indicates that functions are not only performed within a particular coexpression group but also that some larger developmental pathways require a concerted action of genes with different expression profiles.
Exploration of coexpressed genes involved in common pathways: Notch as an example
The Notch signaling pathway regulates cell fate determination, and consists of 61 proneural and neurogenic genes (
Brody, 1999). Of these 61 genes, 52 were assayed on the array and 20 fit significantly to the category of class
II genes. Thus, >32% of the Notch pathway genes have highly coordinated and transient gene expression, as compared to an expected 6.2% (based on 792 of 12 868 transcripts in class
II; P<10
−4,
t-test). This reflects the transient role of this pathway in cell fate specification. Roughly half of these genes are found in
II:a, along with
Notch itself. Also, we find four pathway members (e.g.
deltex: FBgn0000524) in the maternal class
I:b, whose transcripts decrease at the same time as class
II:a.
Some members of the Notch pathway in class
II include
big brain (FBgn0000180; an ion channel concentrated at apical adherence junctions),
neuralized (FBgn0002561; a gene involved in ubiquitination), and a key transcription factor acting in parallel to the proneural genes,
soxneuro (FBgn0029123;
Overton et al, 2002).
Notch itself (FBgn0004647) and many of its pathway members are highly expressed up to the 12th hour of development (stages 12–14 in this developmental time series), but then have an abrupt decrease in expression, dropping to very low transcript levels after this point. This sharp decrease of
Notch and some other transcripts suggests that the pathway is not needed anymore and should be suppressed, indicating that the major specification events have been completed. This is further indicated by the presence of a PEST sequence in
Notch, and the fact that the prolonged transcript expression of
Notch gives rise to disease (
Joutel and Tournier-Lasserve, 1998).
Coexpressed classes of genes show coordinated expression in vivo
As outlined above, the tight temporal patterns of expression observed by >30% of the Notch family members suggest a coordinated regulation in expression. This observation makes a number of predictions about the expression patterns of the other ~100 class II:b genes, the majority of which are poorly characterized. Firstly, class II genes should have transient expression, initiating at stages 6–7 and decreasing by stage 13. Second, as these genes have coordinated expression with Notch family members, they are highly likely to be colocalized in the same cells. To test these hypotheses, we selected five candidates from the II:a group, for costaining with Delta (FBgn0000463; the ligand of the Notch receptor) by double in situ hybridization. The candidates were selected based on their strength of correlation to the II:a profile, their fold change, and their gene annotation.
Four of the five genes gave specific patterns of expression. We were not able to obtain an in situ hybridization probe for the fifth gene (CG1316). For all four genes tested, their expression initiates early in development (~stage 7), and is dramatically reduced or absent by stage 13 (). There is residual expression in small groups of cells at stage 13 for three genes, for example in the brain (worniu and CG13333), in the foregut (pdm2), and segmentally repeated groups of cells in the ectoderm (CG13333). No expression was observed for CG4440 at stage 13, indicating that this gene is no longer transcribed. Therefore, the temporal window and transient nature of expression of all four genes map to the prediction for class II:a genes.
We next examined the spatial colocalization of these four genes with the Notch pathway, using Delta as a marker. Delta is a membrane-bound ligand for Notch, and is therefore expressed on the ‘Notch signal-sending' cell. While Notch itself is a membrane bound receptor on the ‘Notch-receiving cell'. As Delta is tethered to the membrane, in contrast to other signaling pathways, the ‘Notch signal-sending' cells and ‘Notch-receiving' cells are usually adjacent to each other or within the same field of cells. A colocalization of tissue expression in the Delta-expressing cell or the neighboring Notch-expressing cell would suggest that the transcript may play a role either in the Notch–Delta pathway or in a tightly coordinated parallel pathway. The transcription factors pdm2 and worniu are essential for brain and neural development, but are not known to be linked to either Notch or Delta specifically.
shows in situ hybridization of stage 11–12 embryos, which have peak expression for class II:a genes expression. As the Notch pathway is highly active during this time of embryogenesis, Delta has a very broad expression pattern making it problematic to discern specific colocalization. Given this potential difficulty, we could observe colocalization of worniu and pdm2 in the ventral nerve cord and brain. These genes are colocalized in a subset of neuroblasts indicating specific colocalization in these cells at this stage of development. CG13333, a gene of unknown function, has a broad expression in a number of tissues including the developing foregut, hindgut, and trachea. Again, we observed specific colocalization with Delta in the brain. Interestingly, both CG13333 and the second uncharacterized gene CG4440 are expressed in ectodermal strips which are directly adjacent to the Delta-expressing ectodermal strips. This neighboring expression may represent CG13333 and CG4440 expression in Notch-receiving cells, rather than the Delta-sending cells, which would implicate these genes in Notch-regulated processes in development.