Transcriptional regulation plays a central role in metazoan development by establishing cell-specific patterns of gene expression that represent coordinate responses to extrinsic signals and intrinsic programming [1,2]. Thus, detailed knowledge of the genes that are spatially and temporally coexpressed at the cellular level in a particular developmental context will not only provide insight into the logic of transcriptional networks but also define the downstream effectors of morphogenesis. Given the cellular diversity present in most tissues, it would be ideal to derive the entire genetic program of each individual cell type and to determine the response of each differentially expressed gene to perturbations of the pathways that regulate formation of that tissue. Defining such cell-specific gene expression signatures and mapping the sequential steps involved in their generation are both essential to achieving a systems-level view of development [3,4].
Traditional studies have monitored only one or a few cell type–specific markers at a time, using different genetic backgrounds to perturb the developmental process of interest. In many cases, such approaches have yielded sets of regulatory inputs and responses that provide the conceptual underpinnings for considering development in the broader terms of component interactions and network architecture [5,6]. However, to test the generality of hypotheses derived from the study of small numbers of genes, it is essential to acquire a comprehensive assessment of the gene expression changes occurring in response to a known set of developmental regulators.
Developing an integrated and systematic experimental approach to identify and functionally characterize such genes and their cis-regulatory sequences in a metazoan model organism remains a significant and largely unsolved challenge. In yeast, pooled expression profiles derived for multiple genotypes and chemical treatments have proved extremely valuable for dissecting biological pathways [7]. In principle, it should be possible to generate equally illuminating expression profile compendia for the development of multicellular organisms. Large numbers of datasets have been combined in a few cases for this purpose [8,9], but these studies did not focus on a particular aspect of development. Here, we have used such a comprehensive approach to examine the molecular identities of myoblast subtypes in the Drosophila embryo, yielding new information about the composition of the muscle regulatory network.
Myogenesis initiates with the segregation of two types of myoblasts from the somatic mesoderm: founder cells (FCs) and fusion-competent myoblasts (FCMs) [10]. Each FC possesses a unique identity and seeds the formation of an individual myotube by fusing with the more homogeneous population of FCMs. Of the known early muscle-specific genes, some are specific to only one myoblast type, while others are expressed in both. Many of these genes encode transcription factors that are essential for myoblast specification [11–16]. Intercellular signals act in different combinations to promote the formation of FCs and FCMs [10,17]. This process is best understood for a subset of FCs that express even skipped (eve) [18–21]. Wingless (Wg, a Wnt family member) and Decapentaplegic (Dpp, a member of the bone morphogenetic protein superfamily) first cooperate to render a large domain of mesodermal cells competent to respond to a subsequent inductive signal mediated by two receptor tyrosine kinases (RTKs), the epidermal growth factor (EGF) receptor (EGFR) and the fibroblast growth factor (FGF) receptor (FGFR) encoded by heartless (htl). Localized RTK activation within the competence domain stimulates the Ras pathway and the formation of Eve-expressing equivalence groups [18]. Lateral inhibitory signaling by Notch then allows a single Eve progenitor to emerge from each equivalence group under the continued influence of Ras [19], with the remaining Notch-inhibited cells assuming an FCM identity characterized by expression of lame duck (lmd) [14–16]. Since FCs are derived by the asymmetric division of progenitors [22,23], the Ras pathway favors FC formation, while Notch promotes FCM development from mesodermal equivalence groups.
Integration of the Wg, Dpp, and Ras pathways occurs through the direct convergent regulation of eve by the three corresponding signal-activated transcription factors bound to a specific enhancer in the context of two mesodermal selectors [24–26]. Thus, distinct myoblast identity codes are generated by the combinatorial functions of Wg, Dpp, EGF, FGF, and Notch signals. These signaling codes are in turn mirrored in transcriptional codes that induce the changes in gene expression that are characteristic of individual FCs and FCMs. Collectively, this knowledge provides the logical foundation for genomic and computational investigations of muscle gene transcriptional regulation in the Drosophila embryo.
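The combinatorial logic summarized above for the Eve lineage can be written as a small Boolean sketch. This is an illustrative simplification rather than a model drawn from the studies cited: signals in the embryo are graded and time dependent, and the names `Signals` and `classify_mesodermal_cell`, along with the labels returned for cells outside the competence domain or lacking RTK input, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Hypothetical on/off summary of the inputs one mesodermal cell receives."""
    wg: bool     # Wingless input
    dpp: bool    # Decapentaplegic input
    rtk: bool    # localized EGFR or FGFR (Heartless) activation, i.e. Ras pathway on
    notch: bool  # cell is laterally inhibited by Notch


def classify_mesodermal_cell(s: Signals) -> str:
    """Toy Boolean reading of the signaling sequence described in the text."""
    # Step 1: Wg and Dpp together define the competence domain.
    if not (s.wg and s.dpp):
        # Assumption: fates outside the competence domain are not modeled here.
        return "not competent"
    # Step 2: RTK/Ras activation within the domain forms an equivalence group.
    if not s.rtk:
        # Assumption: competent cells without RTK input are left unassigned.
        return "competent, uninduced"
    # Step 3: Notch-inhibited cells of the group become FCMs (lmd-expressing);
    # the cell that escapes inhibition becomes the Eve progenitor.
    if s.notch:
        return "FCM"
    return "Eve progenitor"


if __name__ == "__main__":
    for cell in (
        Signals(wg=True, dpp=True, rtk=True, notch=False),
        Signals(wg=True, dpp=True, rtk=True, notch=True),
        Signals(wg=True, dpp=False, rtk=True, notch=False),
    ):
        print(cell, "->", classify_mesodermal_cell(cell))
```

Treating each signal as a binary input makes the ordering of decisions explicit (competence, then induction, then lateral inhibition), which is the sense in which a signaling code can be mirrored by a transcriptional code.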
Gene expression profiling of the Drosophila embryonic mesoderm has been undertaken in several prior studies. In one approach, mutations in early dorsoventral patterning genes were used to eliminate or overproduce mesodermal cells, and genes whose expression is enriched in the mesoderm were identified by microarray analysis [14,27]. A modification of this approach, in which the Ras or Notch pathway was constitutively activated in a Toll10b mutant (a genetic background that drastically disrupts gastrulation and converts the entire embryo to mesoderm), led to the identification of a small number of genes that are specific to FCs or FCMs [28]. However, the latter study was limited by several factors, including the complete lack of inductive ectoderm and its differentiated derivatives in Toll10b embryos, the absence of Dpp in these embryos, the disruption of normal cellular interactions within the overproduced mesoderm, independent validation of only a few microarray predictions so that a true-positive detection rate could not be reliably estimated, and the use of a cDNA microarray that represented only 40% of the genes in the entire Drosophila genome. It is likely, therefore, that many more FC and FCM genes remain to be discovered.
To address this gap, we designed a different strategy for analyzing the cell type–specific genetic programs of a complex tissue, one that circumvents the previously encountered difficulties and is more generally applicable. This approach integrates genetic perturbations of development, purification of primary embryonic cells of interest, microarray-based genome-wide transcriptional profiling, statistical meta-analysis of the pooled gene expression datasets, and large-scale validation by in situ hybridization of gene expression patterns predicted by the computational analysis. Applying this strategy, we identified and validated several hundred genes that are uniquely expressed in FCs, FCMs, or both myoblast types. Finally, we used in vivo RNA interference (RNAi) to rapidly assess the myogenic functions of several newly identified myoblast genes. In a separate but complementary effort, information derived from the present studies was applied to a new computational method for analyzing the relative contribution of individual transcription factor binding sites to combinatorial transcriptional codes (A. A. Philippakis, B. Busser, S. S. Gisselbrecht, F. S. He, B. Estrada, A. M. Michelson, and M. L. Bulyk, unpublished data). Overall, the systematic strategy used here provides significant new insights into embryonic myogenesis and represents an integrated experimental framework that can be applied to related investigations in other developmental contexts.