Starting with SAGE-libraries prepared from C. elegans FAC-sorted embryonic intestine cells (8E-16E cell stage), from total embryos and from purified oocytes, and taking advantage of the NextDB in situ hybridization data base, we define sets of genes highly expressed from the zygotic genome, and expressed either exclusively or preferentially in the embryonic intestine or in the intestine of newly hatched larvae; we had previously defined a similarly expressed set of genes from the adult intestine. We show that an extended TGATAA-like sequence is essentially the only candidate for a cis-acting regulatory motif common to intestine genes expressed at all stages. This sequence is a strong ELT-2 binding site and matches the sequence of GATA-like sites found to be important for the expression of every intestinal gene so far analyzed experimentally. We show that the majority of these three sets of highly expressed intestinal-specific/intestinal-enriched genes respond strongly to ectopic expression of ELT-2 within the embryo. By flow-sorting elt-2(null) larvae from elt-2(+) larvae and then preparing Solexa/Illumina-SAGE libraries, we show that the majority of these genes also respond strongly to loss-of-function of ELT-2. To test the consequences of loss of other transcription factors identified in the embryonic intestine, we develop a strain of worms that is RNAi-sensitive only in the intestine; however, we are unable (with one possible exception) to identify any other transcription factor whose intestinal loss-of-function causes a phenotype of comparable severity to the phenotype caused by loss of ELT-2. Overall, our results support a model in which ELT-2 is the predominant transcription factor in the post-specification C. elegans intestine and participates directly in the transcriptional regulation of the majority (> 80%) of intestinal genes. We present evidence that ELT-2 plays a central role in most aspects of C. elegans intestinal physiology: establishing the structure of the enterocyte, regulating enzymes and transporters involved in digestion and nutrition, responding to environmental toxins and pathogenic infections, and regulating the downstream intestinal components of the daf-2/daf-16 pathway influencing aging and longevity.
The intestine represents the entire endoderm of the nematode C. elegans and is clonally derived from a single cell (the E cell) present in the 8-cell embryo (Sulston et al., 1983). The intestine shows a limited degree of spatial patterning but does not partition into distinct cell types. In keeping with the simple cell lineage and anatomy of the intestine, the regulatory pathway that specifies intestine fate and subsequent differentiation also appears relatively simple and straightforward (McGhee, 2007; Maduro, 2008).
Specification of the E cell, the clonal progenitor of the intestine, corresponds to the activation, probably by direct action of the maternally-provided transcription factor SKN-1, of genes encoding the redundant GATA-type transcription factors END-1 and END-3 (Zhu et al., 1997; Zhu et al., 1998; Maduro et al., 2005; Maduro, 2008). Expression of end-1/3 is transient. However, before decaying at ~mid-embryogenesis (Zhu et al., 1997; Zhu et al., 1998; Baugh et al., 2003), END-1/3 activate genes expressed in the early endoderm, as well as genes encoding three additional GATA-type transcription factors: ELT-2, ELT-4 and ELT-7 (C18G1.2) (Fukushige et al., 1998; Zhu et al., 1998; Maduro and Rothman, 2002; Fukushige et al., 2003). Animals that lack elt-2 arrest as newly hatched larvae with malformed intestines (Fukushige et al., 1998). In contrast, elt-7(null); elt-4(null) doubly homozygous animals are essentially wildtype (McGhee et al., 2007). Thus, ELT-2 appears to be the only essential GATA-type transcription factor present in the C. elegans endoderm, following END-1/3 decay. ELT-2 can first be detected when the embryonic endoderm has only two cells (mid-2E cell stage) and, at least in part because of autoregulation, elt-2 expression continues in the intestine throughout the life of the worm (Fukushige et al., 1998; Fukushige et al., 1999).
We have previously characterized genes, identified by Serial Analysis of Gene Expression (SAGE) (Velculescu et al., 1995), that are expressed specifically (or highly preferentially) in the adult intestine (McGhee et al., 2007). Building on two decades of experimental analysis of intestinal promoters, by ourselves and by others, our results suggested that the majority of intestinal genes are directly controlled by ELT-2; (genes encoding ribosomal proteins were noted as possible exceptions (McGhee et al., 2007)). In the present paper, we test this model using sets of (non-ribosomal) genes expressed exclusively (or preferentially) in the embryonic intestine and in the early larval intestine. We ask: (i) does ELT-2 indeed directly control the majority of genes expressed at any stage of the developing intestine, and; (ii) can we identify any other intestinal transcription factor of comparable importance to ELT-2?
How is a developmentally-necessary transcription factor, such as ELT-2, utilized in regulating the many genes that function in the mature terminally-differentiated organ? This is an especially important question to ask about the intestine, which plays such a central role in the diverse metabolic and homeostatic pathways defining C. elegans physiology (Ashrafi et al., 2003; Kniazeva et al., 2004; Van Gilst et al., 2005a; Van Gilst et al., 2005b; Rajagopal et al., 2008). The intestine is also a particularly effective site of action of the DAF-16/FOXO factor, the major downstream effector of the daf-2-insulin/insulin-like growth factor signaling pathway influencing aging and longevity (Libina et al., 2003). As part of the daf-2/daf-16 dauer pathway, signals pass between the intestine and the rest of the animal, presumably to coordinate morphological and physiological responses (Berman and Kenyon, 2006; Rottiers et al., 2006; Gerisch et al., 2007; Murphy et al., 2007). The intestine also plays major roles in the response of C. elegans to environmental stresses, both toxins and infections (Couillault and Ewbank, 2002; An and Blackwell, 2003; Nicholas and Hodgkin, 2004; Schulenburg et al., 2004; An et al., 2005; Inoue et al., 2005; Tullet et al., 2008). There is suggestive evidence that ELT-2 functions in all the above pathways: for example, ELT-2 has been implicated in the direct control of a number of individual genes involved in digestion, assimilation and metabolism (Britton et al., 1997; Fukushige et al., 1998; Fukushige et al., 2005; Oskouian et al., 2005; Romney et al., 2008), response to environmental toxins (Moilanen et al., 1999), response to infection with pathogens (Kerry et al., 2006; Shapira et al., 2006) and even in the control of a noncoding RNA induced by starvation (Hellwig and Bass, 2008). 
With the large data sets of the present paper, we are now in a position to provide a more comprehensive analysis of ELT-2 function within both the developing and the mature intestine. We suggest that ELT-2 plays central and critical roles in establishing the structure and function of the enterocyte, producing enzymes required for digestion, assimilation and metabolic homeostasis, regulating genes that allow the worm to respond to environmental stresses, and regulating the intestinal components of the daf-2/daf-16 aging/longevity/dauer pathway.
We wish to test the hypothesis that the ELT-2 GATA-factor directly regulates transcription of the majority of the genes expressed in the C. elegans intestine, beginning from the mid 2E-cell stage when ELT-2 first appears (Fukushige et al., 1998) and continuing throughout the lifetime of the worm. The major experimental difficulty in testing such an inclusive model is that most genes expressed in the intestine are likely to be expressed in other tissues as well (McGhee et al., 2007). Hence, in ELT-2 gain-of-function or ELT-2 loss-of-function experiments, the overall animal-wide response of such intestinally-expressed but not intestinal-specific genes will appear blunted, possibly to the point of uncertainty. This problem is likely to be more severe when investigating genes expressed in the embryo, because of possible high levels of (uniformly-distributed) maternal transcripts (Baugh et al. (2003) and see below). To avoid such limitations, we identify two sets of genes, one set expressed exclusively/preferentially in the embryonic intestine and the second set expressed exclusively/preferentially in the early larval intestine, much as we had previously identified a (third) set of genes expressed exclusively/preferentially in the adult intestine (McGhee et al., 2007). We then investigate the regulation of these selected gene sets, by searching for common cis-acting regulatory motifs and by measuring their response to ELT-2 gain-of-function or loss-of-function. Finally, we extrapolate from the behaviour of these ~200 intestine-specific/intestine-enriched genes to the behaviour of all genes expressed in the C. elegans post-specification intestine, whether their expression is restricted to the intestine or not.
SAGE libraries were prepared from FAC-sorted embryonic intestine cells (8E-16E cell stage) identified by the fluorescence of GFP expressed under control of the elt-2 promoter (see McKay et al. (2003b)). Companion SAGE libraries were prepared from total embryos at approximately the same developmental stage, as well as from purified fer-1(b232ts) oocytes (Stroeher et al., 1994; Mains and McGhee, 1999). Curated data (WS170) used in the present study are available in Supplementary Table 1; primary data are available at http://elegans.bcgsc.bc.ca. Counting single tags, we identified 5637, 8535 and 7982 individual genes expressed in the embryonic intestine, the total embryo and purified oocytes, respectively; if single tags are ignored, the corresponding numbers of identified genes are 3615, 6672 and 6035. The distribution of transcript classes (KOG categories) is highly similar in all three libraries (Supplementary Figure 1A), transcript frequencies in all three libraries follow an approximate power-law distribution (Supplementary Figure 1B) and there is a large overlap (65-85%) between genes expressed in the oocyte and in the embryonic intestine (as well as between the oocyte and the total embryo; see also Baugh et al. (2003)); there is a lower degree of overlap (~20%) between genes expressed in the embryonic and adult intestine (Supplementary Figure 1C).
To define a set of genes transcribed zygotically, at high levels, and exclusively/preferentially in the embryonic intestine, the following three filters were applied to the SAGE data. A candidate gene had to be: (i) identified at ≥30 tags (per 100,000) in the embryonic intestine library (as well as present in the total embryo library); (ii) identified at <10 tags (per 100,000) in the oocyte library, and; (iii) 2-fold enriched in the embryonic intestine library relative to the total embryo library. These filters are justified at greater length in Supplementary Table 2 and Supplementary Figure 2. The 82 genes in the resulting set are briefly annotated in Table 1; expression patterns for 54 of these genes are available either from the literature or from the NextDB in situ hybridization data base (http://nematode.lab.nig.ac.jp/db2/index.php) and the majority (49/54 = 91%) of these available patterns are consistent with either exclusive or preferential gene expression in the embryonic intestine (independent of expression at other stages); examples are shown in Figure 1A.
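As an illustration only, the three filters can be written as a short Python sketch. The record type, the function name and the example genes below are hypothetical, and two details are assumptions rather than statements from the analysis itself: "present in the total embryo library" is taken to mean a non-zero tag count, and "2-fold enriched" is taken to mean a ratio of at least 2.

```python
from dataclasses import dataclass


@dataclass
class TagCounts:
    """Normalized SAGE tag counts (tags per 100,000) for one gene."""
    gene: str
    intestine: float  # embryonic intestine library
    embryo: float     # total embryo library
    oocyte: float     # purified oocyte library


def passes_filters(g, min_intestine=30.0, max_oocyte=10.0, min_enrichment=2.0):
    """Apply filters (i)-(iii) to one gene; thresholds follow the text."""
    # (i) highly expressed in the embryonic intestine and present in total embryo
    if g.intestine < min_intestine or g.embryo <= 0:
        return False
    # (ii) little or no maternal contribution, judged from the oocyte library
    if g.oocyte >= max_oocyte:
        return False
    # (iii) enriched in the intestine relative to the total embryo
    return g.intestine / g.embryo >= min_enrichment


# Hypothetical example records, not taken from Supplementary Table 1.
candidates = [
    TagCounts("hypothetical-A", intestine=60, embryo=20, oocyte=2),
    TagCounts("hypothetical-B", intestine=60, embryo=20, oocyte=15),  # maternal
    TagCounts("hypothetical-C", intestine=40, embryo=30, oocyte=0),   # not enriched
]
selected = [g.gene for g in candidates if passes_filters(g)]
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the size of the resulting gene set is to each cutoff.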
To identify genes expressed strongly and exclusively in the L1/L2 larval intestine, all in situ hybridization patterns in NextDB were surveyed. Table 2 provides brief annotations for 61 genes whose in situ patterns were judged unambiguous; representative examples are shown in Figure 1B. Although genes in these three highly-expressed sets were identified by virtue of their expression at a particular stage, most of the genes are also expressed at other stages (Supplementary Figure 1C and 3) but at generally lower intensity. Only a minority of genes that are highly expressed at one stage are identified as highly-expressed at other stages: for example, nine genes appear on both embryo and larval highly-expressed lists, eight genes on both larval and adult lists, three genes on both embryo and adult lists, and only one gene (F58G1.4) appears on all three lists.
Computational analysis of co-expressed C. elegans promoters is capable of identifying over-represented sequences that could potentially be common cis-regulatory motifs (Gaudet and Mango, 2002; Gaudet et al., 2004; Pauli et al., 2006; McGhee et al., 2007). Combinations of different computational methods appear to be both more effective and more reliable than any single method (Tompa et al., 2005). Thus, (as previously (McGhee et al., 2007)), we searched for motifs common to intestine promoters using two completely different algorithms: (1) the determinative “oligoanalysis” word-counting program of the RSAT collection (van Helden et al., 1998; van Helden et al., 2000) (hereafter referred to as RSAT) and; (2) the stochastic iterative Gibbs-sampling-based MotifSampler program (Thijs et al., 2001). We analyzed promoters from the three sets of intestine-specific/intestine-enriched genes: embryonic intestine, L1/L2 intestine and adult intestine; (this last set was previously analyzed in McGhee et al. (2007) but has been re-curated to database freeze WS170). The results are shown as “Sequence Logos” (Schneider and Stephens, 1990) in Figure 2A, and can be summarized as follows:
What fraction of intestine-specific/intestine-enriched genes have a high-scoring TGATAA site in their promoter? To address this key question, we used the PFM (Figure 2A) to identify the highest scoring 10-mer on either strand of each promoter from the three intestinal gene sets, as well as promoters in the following control gene sets: three sets of 79 genes selected at random from the C. elegans genome and three sets of genes expressed exclusively/preferentially in hypodermis, muscle and neurons; (gene identifiers are provided in Supplementary Table 4). Figure 2B plots the highest single PFM score found in each promoter as a cumulative distribution function, i.e. the fraction of all scores in the particular set of promoters that lie at or below a particular score. For the PFM shown in Figure 2A, the highest possible score is 0.83 (corresponding to sequence ACTGATAAGA); as argued previously (McGhee et al., 2007), sequences with scores below ~0.7 may not be functional in driving transcription. Figure 2B clearly shows that intestine genes, whether identified in the embryonic, larval or adult intestine, tend to have higher scoring sites in their promoters than do control genes: 80-90% of the promoters in the three intestinal gene sets contain extended TGATAA sites scoring in the range likely to be biologically relevant, compared to 40-50% (essentially genomic background levels) of the promoters from the control gene sets. By a conservative non-parametric rank test (see Methods), the probability that the intestine PFM scores are drawn from the same population as the control PFM scores is < 10⁻¹². Intestine genes have an average of 1.8 such high scoring sites per promoter, compared to an average of 0.7 sites in the control promoters. Pauli et al. (2006) have reported a somewhat lower degree of enrichment of TGATAA sites in the promoters of genes expressed (not necessarily exclusively) in the intestine of L4 larvae.
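The scan described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scoring scheme actually used: the score of a window is taken here to be the mean of the per-position base frequencies, and the 6-column matrix built by `toy_pfm` is a hypothetical stand-in for the 10-column PFM of Figure 2A. All function names are our own.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def toy_pfm(consensus="TGATAA", hit=0.85):
    """Hypothetical PFM: one dict of base frequencies per position."""
    miss = (1.0 - hit) / 3
    return [{b: (hit if b == c else miss) for b in "ACGT"} for c in consensus]


def window_score(window, pfm):
    """Assumed scoring rule: mean frequency of the observed bases."""
    return sum(col.get(base, 0.0) for base, col in zip(window, pfm)) / len(pfm)


def best_site_score(promoter, pfm):
    """Highest window score found on either strand of the promoter."""
    width = len(pfm)
    best = 0.0
    for strand in (promoter, reverse_complement(promoter)):
        for i in range(len(strand) - width + 1):
            best = max(best, window_score(strand[i:i + width], pfm))
    return best


def cumulative_fraction(scores, threshold):
    """Fraction of scores at or below the threshold (one point of the CDF)."""
    return sum(s <= threshold for s in scores) / len(scores)


pfm = toy_pfm()
# TTATCA is the reverse complement of TGATAA, so this site is found on the minus strand.
score = best_site_score("CCTTATCAGG", pfm)
```

Evaluating `cumulative_fraction` over a grid of thresholds for each gene set gives curves of the kind plotted in Figure 2B.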
To summarize the above computational analyses, the extended TGATAA sequence is the only motif identified as significantly over-represented in all three sets of intestine promoters and hence is the only current candidate for a cis-acting regulatory motif common to all intestinally-expressed genes. Although statistical over-representation cannot provide evidence that a motif is functional, such evidence is provided by the numerous experimental studies that have identified extended TGATAA sequences as critical for intestinal gene expression in C. elegans. Indeed, the computationally derived PFM can be used to predict the large majority of the GATA sites that have been identified experimentally (data not shown); (discrepancies mostly correspond to the PFM predicting a high scoring site that was not investigated experimentally).
Up to the ~100-200 cell stage of the C. elegans embryo, ectopic expression of transcriptional activators can drive ectopic expression of lineage-specific differentiation markers (Fukushige et al., 1998; Horner et al., 1998; Kalb et al., 1998; Zhu et al., 1998; Gilleard and McGhee, 2001; Maduro et al., 2001; Fukushige et al., 2003; Fukushige and Krause, 2005; Maduro et al., 2005; Fukushige et al., 2006). We used this experimental system to test whether intestine-specific/intestine-enriched genes respond to ectopic expression of ELT-2 more strongly than do randomly selected genes or genes expressed tissue-specifically but in the hypodermis, muscle or neurons. As additional controls, we tested the response of the same gene sets to ectopic expression of END-1 (one of the two GATA factors associated with endoderm specification (Zhu et al., 1997; Maduro et al., 2005)) or either of the two muscle-associated transcription factors, HLH-1 and PAL-1 (these latter two data sets were described in Fukushige et al. (2006)). Poly-A+ RNA was purified from embryos at 0 and 6 hours following heatshock-induction of the various transcription factors (see Methods) and analyzed by hybridization to Affymetrix microarrays. Figure 3 plots LOG10(Fold-Induction) for the nine sets of genes responding to the four transcription factors. Intestinal genes (Panels A,B,C) clearly respond more strongly to ectopic ELT-2 than to ectopic END-1 (p<0.0001 using a conservative non-parametric rank test) and much more strongly to ELT-2 than to either of the control factors HLH-1 or PAL-1 (p<10⁻¹²). For the set of genes expressed in the embryonic intestine (Panel A), the quantitative responses can be summarized as: ELT-2 (median induction of 7.5-fold; 76% of genes show >2-fold induction); END-1 (median induction of 2.4-fold; 53% of genes show >2-fold induction); HLH-1 (median induction of 1.6-fold; 40% of genes show >2-fold induction); PAL-1 (median induction of 1.0-fold; 32% of genes show >2-fold induction). 
Genes expressed in the embryonic intestine (Panel A) respond more strongly to ectopic ELT-2 than do genes expressed in the larval intestine (Panel B; median induction of 3.5-fold; 65% >2-fold induction) or adult intestine (Panel C; median induction of 1.8-fold; 47% >2-fold induction). Muscle-specific genes (Panel E) do not respond to ELT-2 or END-1 but, as expected, respond to ectopic expression of HLH-1 or PAL-1. Other control genes (hypodermal, neuronal or randomly selected; Panels D, F and G-I respectively) show weak overall responses to any factor.
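The summary statistics quoted for each panel (median fold-induction and the fraction of genes induced more than 2-fold) can be computed along the following lines. This is a sketch only: the function name is hypothetical, the inputs are assumed to be already-normalized, strictly positive microarray signals for the same genes at 0 and 6 hours, and no such preprocessing is shown.

```python
import math
from statistics import median


def summarize_induction(signal_0h, signal_6h, threshold=2.0):
    """Summarize the response of one gene set to one ectopic factor.

    signal_0h and signal_6h are parallel lists of positive expression
    values (one entry per gene) before and after heat-shock induction.
    """
    folds = [after / before for before, after in zip(signal_0h, signal_6h)]
    return {
        "median_fold": median(folds),
        "fraction_induced": sum(f > threshold for f in folds) / len(folds),
        # log10 values of the kind plotted on the axes of Figure 3
        "log10_folds": [math.log10(f) for f in folds],
    }
```

For example, four genes with signals of 100 at 0 hours and 100, 300, 800 and 1500 at 6 hours have fold-inductions of 1, 3, 8 and 15, giving a median of 5.5-fold with three of the four genes induced more than 2-fold.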
We interpret the results of this section as consistent with the hypothesis that ELT-2 directly interacts with the promoters of the majority of the genes in the intestinal-specific/intestinal-enriched sets, in such a way that ELT-2 can drive ectopic expression within the embryo. We discuss below possible explanations for the minority of apparently non-responding genes.
To determine what fraction of intestine genes require ELT-2 for their expression, we constructed a balanced transgenic strain segregating progeny that are either elt-2(+) or elt-2(ca15), a deletion allele removing the entire elt-2 coding region (Fukushige et al., 1998). Strain JM147 elt-2(ca15); [pJM276 (rescuing elt-2(+) genomic plasmid); pRF4 (rol-6); pTG96_2 (sur-5::GFP transcriptional fusion (Yochem et al., 1998))] produces roughly equal numbers of Green (sur-5GFP) larvae that are rescued by the wildtype copy of elt-2 on the transgenic array and Non-Green/elt-2(null) larvae that have lost the rescuing array and that will therefore arrest (Supplementary Figure 4). Embryos were isolated by alkaline-bleach treatment of adult JM147 hermaphrodites, allowed to hatch overnight in the absence of food, and the Green/Rescued and Non-Green/elt-2(null) L1 larvae were separated using a COPAS sorter (see profile in Supplementary Figure 4). Three Solexa/Illumina-SAGE libraries were produced and analyzed: Green/Rescued L1s, Non-Green/elt-2(null) L1s and Wildtype (produced from N2 L1 larvae processed in parallel). The total number of identified tags in each library ranged from 1.5 to 2.5 million; all data are deposited in Supplementary Table 5.
To obtain a quick sense of the validity of the data, we inspected the tag counts for three genes that should respond to ELT-2 loss and for which antibodies are available. (i) Tag counts for the elt-2 gene itself are easily detected in the Green/Rescued library (123 tags per million) but are essentially abolished in the Non-Green/elt-2(null) library (2 tags per million); this residual tag count provides a measure of background uncertainty and could reflect occasional sequencing errors or mis-sorted larvae; (ii) the asp-1 gene encodes the major cathepsin D-like aspartic protease of the larval intestine (Tcherepanova et al., 2000); asp-1 tag counts are 18,704 in the normalized Green/Rescued library, compared to 395 in the Non-Green/elt-2(null) library, a 47-fold reduction; (iii) the ifb-2 gene encodes an intestine-specific intermediate filament (Bossinger et al., 2004); we have shown previously that loss of ELT-2 does not completely remove IFB-2 from the embryonic intestine (Fukushige et al., 1998), most likely because of prior activation by END-1/END-3; tag counts for ifb-2 are 1467 in the Green/Rescued library, compared to 513 in the Non-Green/elt-2(null) library, a 2.9 fold reduction. The relative intensities of antibody staining in the intestines of elt-2(+) and elt-2(null) larvae are consistent with these tag estimates (data not shown).
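The fold reductions quoted above follow directly from the tag counts; the short Python check below simply reproduces that arithmetic. The dictionary layout and function name are illustrative choices.

```python
def fold_reduction(rescued_tags, null_tags):
    """Ratio of tag counts in the Green/Rescued vs Non-Green/elt-2(null) library."""
    return rescued_tags / null_tags


# Tag counts quoted in the text: (Green/Rescued, Non-Green/elt-2(null)).
tag_counts = {
    "elt-2": (123, 2),
    "asp-1": (18704, 395),
    "ifb-2": (1467, 513),
}

reductions = {gene: fold_reduction(*counts) for gene, counts in tag_counts.items()}
# asp-1: 18704 / 395 is about 47.4, i.e. the 47-fold reduction quoted above
# ifb-2: 1467 / 513 is about 2.86, i.e. the 2.9-fold reduction quoted above
```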
The overall elt-2 loss-of-function Solexa/Illumina-SAGE data are displayed as a series of scatter plots in Figure 4, plotting LOG10(tag count) for a particular gene in the Green/Rescued library (y-axis) vs. LOG10(tag count) for the same gene in the Non-Green/elt-2(null) library (x-axis). Figure 4A displays the entire data set, after filtering out genes that show <10 tags in the Wildtype control library. Figure 4B superimposes the data points corresponding to a set of genes selected randomly from the genome, and showing y and x tags in the Green/Rescued and Non-Green/elt-2(null) libraries, respectively. As expected, these points cluster around the scatter plot diagonal, although the displaced position of several outliers might actually reflect loss of ELT-2 (the red circles on Figure 4B depict four genes in the randomly-selected set that happen to be expressed strongly in the intestine). Figure 4C superimposes data points corresponding to genes specifically expressed in hypodermis, muscle and neurons. Although these control data points also cluster around the scatter plot diagonal, they are further displaced from the diagonal than are the randomly selected genes shown in Figure 4B, possibly pointing to real but indirect effects of loss of ELT-2. Figures 4D, 4E and 4F superimpose the data points corresponding to the three sets of genes identified in the embryonic intestine, larval intestine and adult intestine, respectively. The principal result of this section is that genes expressed exclusively/preferentially in the intestine are displaced further from the diagonal than are genes from the control sets, either genes chosen at random (Figure 4B) or genes expressed exclusively/preferentially in non-intestine tissues (Figure 4C). 
As estimated by a conservative non-parametric rank test (see Methods), the probability that the data point positions for intestine and control genes are drawn from the same population ranges from <0.0001 (embryonic intestine genes compared to hypodermal, muscle and neuronal genes) down to <10⁻⁸ (adult intestine genes compared to randomly selected genes). We note that, of the three selected intestinal gene sets, genes expressed in the embryonic intestine have the lowest fractional response to loss of ELT-2 (Figure 4D), consistent with END-1/END-3 providing redundant backup to ELT-2 in the early-to-mid embryo.
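To make the comparison of gene sets concrete, the sketch below shows one way such a rank-based comparison could be set up in Python. It is an assumption-laden stand-in, not the test described in Methods: each gene's displacement from the diagonal is taken to be the log10 ratio of its tag counts (with a pseudocount of 1 to avoid division by zero, our own choice), and the two sets are compared with a Mann-Whitney U statistic using a normal approximation without tie correction.

```python
import math


def log_ratio(rescued_tags, null_tags, pseudocount=1.0):
    """Displacement from the diagonal: log10(Green/Rescued over Non-Green)."""
    return math.log10((rescued_tags + pseudocount) / (null_tags + pseudocount))


def mann_whitney_u(x, y):
    """Return (U statistic for x, approximate two-sided p-value).

    Tied values receive midranks; the p-value uses the normal
    approximation without tie or continuity correction.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))
```

In use, `x` would hold the log ratios for one intestinal gene set and `y` those for a control set; a U value close to `len(x) * len(y)` indicates that nearly every intestinal gene is displaced further above the diagonal than nearly every control gene.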
Two additional aspects of the ELT-2 loss-of-function data support the consistency of our analysis. The first result is that the set of ~400 genes, excluding genes on the intestine-specific/enriched lists, displaced farthest above the scatter plot diagonal (i.e. the ~400 genes that show the largest apparent response to ELT-2 loss-of-function, with median Green/NonGreen tag ratio = 3.4) have a significantly (p<10⁻⁵) higher-scoring TGATAA site in their promoters than do promoters of ~400 control genes that span the scatter plot diagonal (i.e. genes that do not respond to loss of ELT-2, with median Green/NonGreen tag ratio = 1.0). This is the result expected if ELT-2 interacts directly in vivo with the promoters of these displaced/ELT-2-responsive genes. The second result is that the same set of displaced genes shows a significantly greater response to ectopic expression of ELT-2 within the embryo than does the set of genes that span the scatter plot diagonal (p<10⁻⁴).
In summary of this section, the majority of all three sets of highly-expressed intestinal-specific/enriched genes respond strongly to ELT-2 loss. We will discuss below possible explanations for the 10%-20% of genes that do not appear to respond.
Roughly 300 different transcription factors can be identified in the embryonic intestine SAGE library (see Supplementary Table 1); we added to this list a further 18 factors, whose spliced transcripts lack the CATG site needed to produce a SAGE tag. The large majority of these factors show no significant loss-of-function phenotype, either by mutation or by RNAi; (we have paid particular attention to the study of Sonnichsen et al. (2005), in which dsRNA was administered by maternal injection). However, thirty of these transcription factors are associated with severe loss-of-function phenotypes such as embryonic lethality, larval arrest or developmental delay. To determine whether loss of any single one of these 30 factors only in the intestine is sufficient to cause these severe phenotypes, we constructed a strain of worms in which RNAi functions only in the intestine; (see also Qadota et al. (2007)). Strain OLB11 is RNAi-resistant because of a mutation in the rde-1 gene (Tabara et al., 1999) but intestinal RNAi-sensitivity has been reconstituted by an integrated transgenic array in which the rde-1 cDNA is controlled by the elt-2 promoter. Since the elt-2 promoter is activated at the 2E cell stage (Fukushige et al., 1998), the OLB11 endoderm should become RNAi-sensitive one cell cycle after endoderm specification but prior to the majority of intestinal differentiation. Numerous control experiments demonstrate that the OLB11 strain performs as expected (data collected in Supplementary Table 6 and Supplementary Figure 5): developing OLB11 embryos and larvae appear completely sensitive to RNAi performed (by injection into mothers) against genes expressed exclusively or predominantly in the intestine but appear completely resistant to RNAi performed against non-intestinal genes, including maternal-effect lethal genes such as skn-1. 
Immunohistochemistry performed to detect proteins encoded by RNAi targets verifies the expected loss or major reduction of intestinal proteins with little effect on non-intestinal proteins; Figure 5A shows the intestinal loss but hypodermal-and-pharynx retention of the cell adhesion molecule AJM-1; (we note that animals lacking intestinal AJM-1 show no obvious phenotype).
Double-stranded RNA corresponding to each of the 30 selected intestinal transcription factors was injected into hermaphrodites of strain OLB11, as well as into wildtype (N2) controls (and often into the rde-1 RNAi-resistant parent strain). We also injected dsRNA corresponding to pop-1 and sbp-1, both of which are known to be expressed in the intestine but for which no SAGE tags were identified. For each transcription factor, embryonic viability, larval viability and growth rates were measured in the subsequent progeny (Supplementary Table 6); post-hatching growth rates are shown in Figure 5B.
The most severe phenotype was a penetrant (90-100%) L1 larval arrest, observed only for RNAi performed against two transcription factors: elt-2 and sbp-1. sbp-1 (also called lpd-1) is the C. elegans homolog of SREBP1 (sterol response element binding protein), centrally involved in regulating cholesterol and fatty acid metabolism (Horton et al., 2002; McKay et al., 2003a; Eberle et al., 2004; Kniazeva et al., 2004). sbp-1(null) mutants are “pale, skinny, larval-arrested worms that lack fat stores” (McKay et al., 2003a) but without obviously malformed intestines. The next most severe phenotype was slow growth post-hatching, associated with RNAi performed against F57B10.1 (let-607), C16A3.4, and F23B12.7 (see Figure 5B); RNAi against dve-1 caused an impenetrant (~16%) slow growth phenotype. However, in these last four cases, the slow-growing animals eventually develop into fertile adults and the phenotypes are much less severe than the arrest caused by loss of elt-2 or sbp-1. We also note that loss of the intestinal functions of either pop-1 or pha-4 only causes an impenetrant (~10-20%) embryonic arrest and most surviving animals appear to grow normally. Intestinal RNAi performed against the remaining transcription factors shown in Figure 5B produced no obvious phenotype.
To summarize the results of this section, we were able to identify only one transcription factor (besides ELT-2) that is expressed in the embryonic intestine and that shows a severe intestinal loss-of-function phenotype, namely SBP-1. Possible limitations in our search for other intestinal transcription factors will be discussed below.
We have previously shown that ELT-2 is the only GATA-type transcription factor required after the 2E cell stage; worms lacking both of the two other post-specification intestinal GATA factors, ELT-4 and ELT-7, are essentially wildtype (McGhee et al., 2007). It is an interesting question how such “unnecessary” transcription factors contribute to endoderm development. We have been completely unable to detect any phenotype associated with loss of ELT-4 (Fukushige et al., 2003). However, Murray et al. (2008) have recently reported that loss of ELT-7 function causes a modest (~23%) decrease in expression intensity of an endoderm reporter gene, and the phenotype of an elt-7(null); elt-2(null) animal is slightly more severe than the phenotype of an elt-2(null) by itself (unpublished results of K. Strohmaier and J. Rothman, cited in Maduro and Rothman (2002); see also Maduro (2008)). Both of these observations are consistent with ELT-7 providing a minor backup function for ELT-2 but do not challenge our view of ELT-2 preeminence.
Overall, we interpret the present results as supporting a model in which ELT-2 directly regulates the majority (say >80%) of genes expressed in the C. elegans intestine, following the early 2E-cell stage of embryogenesis (McGhee et al., 2007). We focused on the behaviour of ~200 highly-expressed intestinal-specific/intestinal-enriched genes and showed that the majority of these genes' promoters contain a high-scoring extended TGATAA site, known to be a strong ELT-2 binding site; no other candidate for a common cis-acting regulatory sequence could be detected. The majority of these ~200 genes also respond strongly to ectopic expression of ELT-2 within the C. elegans embryo and respond strongly to loss of ELT-2 in L1 larvae. In each of these three assays, 10-20% of the genes did not respond but we are reluctant to conclude that these latter genes are not controlled by ELT-2. Such lack of response could simply reflect limitations of the particular assay: for example, a high scoring TGATAA site could be outside of the limited promoter region scanned, a gene could be repressed rather than activated by ectopic ELT-2, and loss of ELT-2 in the L1 larva could be masked by a slow turnover of transcripts perduring from an earlier phase of END-1/END-3 activation in the embryo.
Can we extrapolate from the behaviour of the ~200 selected highly-expressed intestinal-specific/intestinal-enriched genes to the behaviour of all genes expressed in the C. elegans intestine, including genes expressed at low levels, or genes for which the intestine is only one among several expression sites? First of all, as we had noted previously (McGhee et al., 2007), promoters of intestine-expressed ribosomal genes appear to be depleted of TGATAA sequences, suggesting that they may not fall under ELT-2 control. However, ignoring ribosomal genes, we suggest that ELT-2 is likely to directly control intestinal genes independently of their expression levels: most of the intestinal-specific/intestine-enriched genes for which critical TGATAA sites have been experimentally identified are expressed at low levels and are not in our selected gene sets. Furthermore, there are ample precedents, both in C. elegans and in other organisms, for the experimental decomposition of a widely expressed promoter into a series of independent modular tissue-specific enhancers; we draw attention only to the study of Wenick and Hobert (2004), who defined distinct cis-regulatory regions associated with gene expression in distinct C. elegans neurons, with the overall expression pattern of a gene being the sum of the individual patterns. In other words, we suggest that ELT-2 directly controls expression of the intestinal component of genes expressed widely in the worm. We had previously shown that the promoters of genes expressed both in the intestine and elsewhere in the adult worm are nonetheless enriched in an extended TGATAA site (McGhee et al., 2007). Overall, we suggest that a reasonable default position is that, if a gene is expressed in the post-embryonic intestine, its intestinal expression will be controlled by ELT-2, unless proven otherwise. 
An important test of this view will be whether any gene can be identified that does not depend on a cis-acting TGATAA-like site for its intestinal expression.
Our generally unsuccessful search for essential intestinal transcription factors (other than ELT-2) supports the view that ELT-2 is indeed the predominant transcription factor driving differentiation of the post-specification intestine. However, our search strategy had limitations (beyond the usual difficulty of attempting to prove a negative). For example, an intestinal transcription factor might have no associated SAGE tags because of low expression level, or the loss-of-function phenotype for a particular transcription factor could be masked by redundancy (see, for example, Kirienko et al., 2008). To answer the first objection, we included transcription factors known to be expressed in the intestine but which were not detected by SAGE tags (e.g. pop-1 and sbp-1). To partially counter the second objection, we suggest that the highly efficient response of intestinal genes to ELT-2 loss-of-function (Figure 4) makes it unlikely that any second factor will be redundant with ELT-2. Within these limitations, SBP-1 was the only other transcription factor (besides ELT-2) whose intestinal loss-of-function caused a penetrant larval arrest. Because SBP-1 regulates genes involved in cholesterol and fatty acid homeostasis (McKay et al., 2003a; Kniazeva et al., 2004), it is tempting to label SBP-1 a “specialized metabolic” transcription factor whose phenotype could conceivably be rescued by nutritional supplements, to distinguish it from a “general developmental” transcription factor such as ELT-2, which appears to regulate all classes of intestine genes. In support of this view, we will provide evidence below that ELT-2 is likely to control the sbp-1 gene directly and then cooperate with SBP-1 in a feed-forward loop to control genes associated with lipid metabolism.
Although no severe intestinal loss-of-function phenotype could be detected for the ~300 transcription factors identified in the embryonic intestine (besides ELT-2 and SBP-1), we suggest that many of these factors will act in combination with ELT-2 to regulate subsets of intestinal genes performing particular intestinal functions. Three clear examples of ELT-2 combining with other transcription factors are now available: (i) ELT-2 interacts with LAG-1 on the ref-1 promoter as part of the Notch-signalling pathway controlling intestinal morphogenesis (Neves et al., 2007); (ii) ELT-2 activates the ferritin promoter in combination with a second factor that confers iron responsiveness (Romney et al., 2008); and (iii) ELT-2 activates the vit-2 promoter, combining with the MAB-3 protein to confer sex-specific and tissue-specific expression of vitellogenins (MacMorris et al., 1994; Yi and Zarkower, 1999; V. V. Captan, A. M. Danielson and JDM, unpublished results). Indeed, we suggest that most intestinal genes will be regulated by combinations of ELT-2 and some other transcription factor(s).
Figure 6 summarizes evidence that ELT-2 is directly involved in many, perhaps most, aspects of C. elegans intestinal physiology: (i) structure/function of the enterocyte; (ii) digestion/nutrition; (iii) response to environmental toxins and infections; (iv) aging/longevity and the dauer pathway; and (v) lipid metabolism. Each sector of Figure 6 represents one of these pathways and a small number of candidate ELT-2 target genes have been selected to represent each pathway. To present the evidence for direct ELT-2 control, each gene name is associated with a vector whose entries are: (i) the response of the gene to ELT-2 loss-of-function, i.e. the ratio of tag counts in the Green/Rescued to NonGreen/elt-2(null) SAGE libraries; (ii) fold induction in response to ectopic expression of ELT-2 within the embryo; (iii) a short-hand notation describing the gene's expression pattern (I = intestinal specific; II = mostly intestinal with minor expression elsewhere; III = widespread, including the intestine); and (iv) the number of high-scoring potentially-functional ELT-2 binding sites in the gene's promoter (the expected background number is ~0.7). Brief explanations will be provided for the gene assignments for each pathway, emphasizing potential exceptions (or elaborations) to the proposed predominance of ELT-2.
The intermediate filament protein IFB-2 is a central structural component of the microvillar brush border (Bossinger et al., 2004). HAF-9 is one of several transporter proteins and could function either extra- or intracellularly (Sundaram et al., 2008). VHA-6 is an essential subunit of the ATPase that acidifies either extracellular or intracellular compartments (Oka et al., 2001). The act-5 gene encodes the major actin component of the microvillar brush border (MacQueen et al., 2005). The act-5 gene responds strongly to ELT-2 gain-of-function and has 6 potential ELT-2 binding sites in its promoter; we would have predicted that act-5 transcripts would be decreased in the ELT-2 loss-of-function but, instead, they are significantly increased. Perhaps, in the absence of growth, ELT-2 represses act-5 transcription.
Secreted proteases are the single most prominent class of candidate direct ELT-2 targets: asp-1 (Tcherepanova et al., 2000) and cpr-1 (Britton et al., 1998) are chosen as two of numerous possible examples. The opt-2 gene encodes the major intestinal peptide transporter and presumably functions to assimilate the products of luminal proteolysis (Nehrke, 2003; Meissner et al., 2004). We suggest that ELT-2 directly regulates the genes encoding most of the other enzymes/proteins secreted into the intestinal lumen, such as saposins (e.g. spp-1), lysozymes (e.g. lys-1), and lipases (e.g. ZK6.7), whose collective and probably constitutive function is to digest the bacterial food.
Both Kerry et al. (2006) and Shapira et al. (2006) have proposed that ELT-2 is centrally involved in the innate immune response of worms to external pathogens. Indeed, 16 genes were compiled by Wong et al. (2007) on the basis that they respond to three different pathogens and are expressed in the C. elegans intestine; 12 of these genes could be identified in our L1 Solexa/Illumina-SAGE libraries and responded significantly to loss of ELT-2 (p<0.002). C-type lectins form a prominent part of the C. elegans response to infection (Wong et al., 2007; Schulenburg et al., 2008) and Figure 6 presents evidence that two such genes, clec-63 and clec-85, which are activated in response to a number of different pathogens, also appear to be regulated by ELT-2. Examples of other classes of genes identified in the pathogen response were described as normal components of food processing in the previous section (proteases, saposins, lysozymes and lipases). These same classes of genes are repeatedly identified as major components of the C. elegans response to infection, stress and aging and we suggest that ELT-2 provides the constitutive level of these gene products necessary for normal digestion. Expression of these genes could then be further stimulated under particular conditions.
We suggest that the following candidate ELT-2 targets are involved in protection against toxins in the environment: the mtl-1 metallothionein (Moilanen et al., 1999), cytochrome P450s such as cyp-37A1, and conjugating enzymes such as ugt-22 and gst-5. Blackwell and co-workers have shown that zygotic SKN-1 is expressed mainly in the intestine and is directly involved in controlling genes of the phase II detoxification response (An et al., 2005; Inoue et al., 2005; Tullet et al., 2008). As shown on Figure 6, we raise the possibility that ELT-2 may directly control transcription of at least one of the skn-1 isoforms and then may combine with SKN-1 to control downstream target genes such as gst-5 (Tullet et al., 2008).
The intestine plays a major role in the aging/longevity/dauer pathway, e.g. as a particularly effective site of action of the DAF-16/FOXO factor in reversing the extended lifespans of daf-2 mutants (Libina et al., 2003), and as an important signaling centre (Berman and Kenyon, 2006; Gerisch et al., 2007; Murphy et al., 2007). Murphy et al. (2003) have identified a set of genes referred to as dod = downstream of daf-16, and have identified, besides candidate DAF-16 binding motifs, what they describe as a “new potential regulatory sequence” over-represented in dod gene promoters. The sequence, CTTATCA, is the reverse complement of the highest scoring core-sequence present in intestinal gene promoters (Figure 2A above) and we thus suggest that ELT-2 is the “additional, as yet unidentified, factor” (Murphy et al., 2003) proposed to act in combination with DAF-16 to control downstream genes in the aging/longevity pathway, at least the intestinal component of such expression. A cursory inspection of several dod promoters reveals joint occurrences of DAF-16 sites (TRTTTAC) and high-scoring TGATAA sites, e.g. an overlapping TGATAATGTTTAC in the dod-3 promoter; unfortunately, it is not yet known where most of these genes are expressed. One interesting gene that is known to be expressed in the intestine is daf-36, which encodes an enzyme capable of modifying the sterol ligand of DAF-12 (Rottiers et al., 2006); as shown on Figure 6, we suggest that daf-36 could be a direct ELT-2 target. Also as shown on Figure 6, we raise the possibility that ELT-2 directly regulates the transcription of daf-16 isoforms expressed within the intestine.
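The kind of cursory promoter inspection described above can be illustrated with a short script. This is only a sketch under stated assumptions: the function names `revcomp` and `find_sites` are hypothetical, the motifs are matched as exact IUPAC strings rather than with the position frequency matrix scoring used for the high-scoring sites in this paper, and the only sequence examined is the 13 bp dod-3 fragment quoted in the text.

```python
import re

# IUPAC codes needed for the two motifs in the text (R = A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]"}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, motif):
    """Return sorted (start, strand, matched_text) for exact IUPAC matches.

    start is 0-based on the forward strand. Hypothetical helper: it does
    not score sites, it only reports exact matches on either strand.
    """
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    n, k = len(seq), len(motif)
    for m in pattern.finditer(revcomp(seq)):
        # Convert reverse-strand coordinates back to forward-strand ones.
        hits.append((n - m.start() - k, "-", m.group(1)))
    return sorted(hits)


# Fragment of the dod-3 promoter quoted in the text.
fragment = "TGATAATGTTTAC"
gata_hits = find_sites(fragment, "TGATAA")    # core GATA-type site
daf16_hits = find_sites(fragment, "TRTTTAC")  # DAF-16 site as written in the text

# The dod-promoter sequence CTTATCA read on the other strand.
dod_motif_rc = revcomp("CTTATCA")

print(gata_hits, daf16_hits, dod_motif_rc)
```

In this fragment the six-base TGATAA core and the TGTTTAC match are directly adjacent; they overlap only when the GATA site is taken in its extended form. A motif that is its own reverse complement would be reported on both strands by this helper.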
Budovskaya et al. (2008) have recently reported that a set of age-responsive genes identified by spotted microarrays appears to be significantly enriched in genes expressed in the intestine and that TGATAA-like sites are over-represented in the gene promoters. They propose that the GATA-factor ELT-3 (together with the primarily-hypodermal non-intestinal GATA-factors ELT-5 and ELT-6) forms a transcription circuit that guides C. elegans aging. ELT-3 may well control aging-related genes expressed in the hypodermis and in the pharyngeal-intestinal and rectal-intestinal valve cells (where ELT-3 is expressed). However, using anti-ELT-3 antibodies as well as several different elt-3 transgenic reporter constructs, we found no evidence that ELT-3 is expressed in the intestine, from the embryo up to mature adulthood (Gilleard et al., 1999; Gilleard and McGhee, 2001). Thus, it is unlikely that ELT-3 directly regulates aging/longevity genes acting in the intestine and we suggest that ELT-2 is by far the better candidate.
Three transcription factors important for C. elegans lipid metabolism have been identified: the worm SREBP homolog SBP-1 (McKay et al., 2003a) (see above) and the two nuclear hormone receptors NHR-49 (Van Gilst et al., 2005a; Van Gilst et al., 2005b) and NHR-80 (Brock et al., 2006). Both nhr-80 and sbp-1 are expressed exclusively or highly preferentially in the intestine (Miyabayashi et al., 1999; McKay et al., 2003a; Kniazeva et al., 2004). In contrast, nhr-49 is widely expressed in the worm and the intestine is only one of the major sites of expression (Van Gilst et al., 2005a). Figure 6 provides evidence that ELT-2 directly controls the transcription of sbp-1 and nhr-80, as well as the intestinal component of nhr-49 transcription. As further suggested in Figure 6, ELT-2 may also directly control genes encoding several enzymes of lipid metabolism, possibly in some combinatorial feed-forward loop with NHR-49, NHR-80 and SBP-1.
We note the unexpected and somewhat contradictory behaviour of the elo-5 and elo-6 genes. Both genes are expressed in the intestine (Kniazeva et al., 2004; Pauli et al., 2006; McGhee et al., 2007) and are likely to be direct targets of SBP-1 (Kniazeva et al., 2004). Although both promoters have high-scoring TGATAA sites, neither gene appears to respond to ELT-2 loss-of-function or to ELT-2 gain-of-function (Figure 6). On the other hand, mutation of one of the elo-6 TGATAA sites strongly reduces intestinal expression (Pauli et al., 2006). Perhaps elo-5/elo-6 only respond to ELT-2 at later stages of development. In any case, the elo-5/elo-6 promoters would seem to provide an appropriate experimental system in which the different regulatory pathways could be distinguished.
The present paper has focused on the later steps in intestine development, after endoderm specification and especially after hatching. By incorporating these results, the core regulatory pathway that is required to define the C. elegans endoderm seems firm from start to finish, at least in outline: from the maternal cytoplasm (skn-1 transcripts) through the intermediate zygotically-produced END-1/END-3 GATA factors through the GATA-factor ELT-2, which then directly controls the large majority of the effector genes that provide the structure and function of the intestine. There appear to be few other single transcription factors that are necessary. The worm intestine thus provides a simple, possibly extreme, model of a transcriptional cascade, in which the terminal transcription factor driving development morphs into a factor that plays a central role in organ physiology and hence in the overall physiology of the animal. It will be both interesting and important to determine whether ELT-2 provides an adequate model for the role of the homologous GATA-factors in formation and function of the vertebrate endoderm.
Isolation of elt-2GFP labeled embryonic intestine cells by FACS analysis was performed exactly as described in Etchberger et al. (2007) but using strain JM63 caIs13 [pJM67 (elt-2GFP/lacZ) (Fukushige et al., 1999); pRF4 (rol-6(su1006))]. Unfertilized oocytes were produced from strain fer-1(b232) as previously described (Mains and McGhee, 1999). Production and analysis of LongSAGE libraries were as described previously (McKay et al., 2003b; Siddiqui et al., 2005; Etchberger et al., 2007; Khattra et al., 2007; McGhee et al., 2007), using the same quality filters as in McGhee et al. (2007), but with database freeze WS170. Computational analysis of promoters was as described previously (McGhee et al., 2007); parameter values are collected in Supplementary Table 3.
Embryonic cell fate conversions were initiated as previously described (Fukushige et al., 2006). Two-cell embryos from transgenic strains harboring integrated, heat shock promoter driven cDNA for one of four different transcription factors (ELT-2, END-1, HLH-1, PAL-1) were collected on ice. Embryos were then incubated at room temperature for a time empirically determined to maximize cell fate conversion for each factor: 75 min for ELT-2, 60 min for END-1, 60 min for HLH-1 and 20 min for PAL-1. Embryos were heat shocked at 34°C for 30 min and then incubated at room temperature for six hours prior to collection of RNA (three independent experiments). Individual RNA preparations were used to prepare probes that were independently hybridized to whole genome C. elegans Affymetrix gene expression arrays, as annotated in WormBase Freeze WS170 (Fukushige et al., 2006). Data were collected and subjected to MAS5.0 (Affymetrix) normalization and imported into GeneSpring for further analysis. GeneSpring normalizations set all values less than 0.01 to 0.01, each measurement was divided by the 50th percentile of all measurements in the total sample, and each gene value was divided by the median of raw values for that gene in all samples. Principal component analysis of samples was used to identify and eliminate the most divergent samples for each set; normalized values were averaged for the two samples of each time point and used to calculate “fold-induction”.
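The three normalization steps listed above can be written out explicitly. The following is a minimal sketch and not the GeneSpring implementation: the function name `normalize`, the dictionary layout and the toy signal values are all hypothetical. One point of interpretation is also an assumption: here the per-gene median is taken over the values obtained after the per-sample step, whereas the wording “median of raw values” could also be read as the median of the unscaled signals.

```python
from statistics import median


def normalize(samples, floor=0.01):
    """Apply the three normalization steps described in the text.

    samples: dict mapping sample name -> dict mapping gene -> signal.
    Returns a dict of the same shape holding normalized values.
    Hypothetical helper; not the GeneSpring code.
    """
    # Step 1: set all values below the floor (0.01) to the floor.
    floored = {
        s: {g: max(v, floor) for g, v in genes.items()}
        for s, genes in samples.items()
    }

    # Step 2: divide each measurement by the 50th percentile (median)
    # of all measurements in the same sample.
    per_sample = {}
    for s, genes in floored.items():
        sample_median = median(genes.values())
        per_sample[s] = {g: v / sample_median for g, v in genes.items()}

    # Step 3: divide each gene by the median of that gene across all
    # samples (assumption: computed on the step 2 values).
    gene_names = next(iter(per_sample.values())).keys()
    gene_median = {
        g: median(per_sample[s][g] for s in per_sample) for g in gene_names
    }
    return {
        s: {g: v / gene_median[g] for g, v in genes.items()}
        for s, genes in per_sample.items()
    }


# Toy input with made-up signals for three genes on two arrays.
toy = {
    "heat_shock": {"gene_a": 0.0, "gene_b": 2.0, "gene_c": 4.0},
    "control": {"gene_a": 1.0, "gene_b": 1.0, "gene_c": 1.0},
}
normalized = normalize(toy)
print(normalized)
```

With only two samples, the per-gene median is simply the mean of the two values; real data sets with replicates at several time points would give a more robust per-gene reference.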
A Union Biometrica Biosorter equipped with a 488 nm excitation filter and a 250 μm flow channel was used to separate Green/Rescued from Non-Green/elt-2(null) L1 larvae from the strain JM147. Gravid adults were subjected to alkaline-bleach treatment, and embryos were allowed to hatch overnight in M9 buffer in the absence of food and with vigorous aeration. Larvae were then suspended in M9 and 0.01% Triton X-100 and placed in the Biosorter's sample cup. An initial gate region utilizing two size parameters, Extinction vs. Time of Flight, eliminated debris and selected Green or Non-green L1 larvae (see Supplementary Figure 4). Collection periods alternated between the two classes of larvae to ensure exposure to similar conditions. L1 larvae from N2 wildtype worms were collected in parallel as a control. An aliquot of each population of sorted worms was analyzed by fluorescence microscopy to ensure sort purity. Solexa/Illumina libraries were constructed using a LongSAGE protocol (Siddiqui et al., 2005; Khattra et al., 2007) with modifications to allow for direct sequencing on the Illumina 1G Genome Analyzer. Cluster generation and sequencing were performed on the Illumina cluster station and 1G analyzer (Illumina) following manufacturer's instructions. Sequences were extracted from the resulting image files using the open source Firecrest and Bustard applications (Illumina) on a 32 CPU cluster running Red Hat Enterprise Linux 4 (Red Hat) and Sun Grid Engine 6 (Sun Microsystems). Seventeen bp SAGE tags were extracted from the resulting reads and mapped to the C. elegans genome.
Young adult hermaphrodites were injected with dsRNA (1mg/ml) prepared using genomic sequences corresponding to each selected transcription factor (primer sequences available upon request) and allowed to recover overnight at 20°C. Offspring produced by each surviving worm were collected 20-44 hours following the injection and incubated at 20°C. Viability was measured by counting the number of eggs laid, unhatched embryos and adults produced from each injected worm. Growth rates were determined by collecting ~10-15 RNAi embryos over a 2-3 hour interval and measuring their lengths 24, 48 and 72 hours later (20°C). The strain OLB11 will be described in detail elsewhere (Bossinger, in preparation).
The significance of the different behaviours observed with different gene sets was estimated using the conservative Mann-Whitney rank sum method (Freund, 1962). This test was easily applied to PFM-scores in different sets of promoters (Figure 2B), as well as to “fold-inductions” associated with ectopic expression of the different transcription factors (Figure 3). To compare the response of different sets of genes to ELT-2 loss-of-function (Figure 4), the displacement of each gene from the scatter plot diagonal was calculated as a “probability” using equation (1) of Audic and Claverie (1997), i.e. the probability that a gene had “y” and “x” tags in the Green/Rescued and Non-Green/elt-2(null) libraries, respectively, if the two libraries were identical. This approach incorporates sampling statistics, unlike the imposition of a blanket “n-fold” change in tag counts. The probabilities associated with each gene set were then compared using the Mann-Whitney rank sum test; the actual probability values were not considered.
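For readers who wish to reproduce this type of comparison, the two ingredients can be sketched in a few lines. This is an illustration under stated assumptions, not the code used for the analysis: the function names are hypothetical, the probability is the standard form of equation (1) of Audic and Claverie (1997) for x tags in a library of N1 total tags and y tags in a library of N2 total tags, and the rank sum function returns only the U statistic (no significance value is computed).

```python
from math import exp, lgamma, log


def audic_claverie(x, y, n1, n2):
    """Probability of observing y tags in library 2 given x tags in library 1.

    p(y|x) = (n2/n1)^y * (x+y)! / (x! * y! * (1 + n2/n1)^(x+y+1)),
    where n1 and n2 are the total tag counts of the two libraries.
    Evaluated in log space so that large tag counts do not overflow.
    """
    r = n2 / n1
    log_p = (
        y * log(r)
        + lgamma(x + y + 1)   # log (x+y)!
        - lgamma(x + 1)       # log x!
        - lgamma(y + 1)       # log y!
        - (x + y + 1) * log(1 + r)
    )
    return exp(log_p)


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs with a > b count 1, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical tag counts (x, y) for two small gene sets, with made-up
# library sizes; the per-gene probabilities are then compared by rank.
N1, N2 = 1_000_000, 1_000_000
set_one = [audic_claverie(x, y, N1, N2) for x, y in [(2, 40), (5, 90), (1, 30)]]
set_two = [audic_claverie(x, y, N1, N2) for x, y in [(20, 22), (8, 9), (50, 47)]]
u_stat = mann_whitney_u(set_one, set_two)
print(u_stat)
```

Because only the ranks of the probabilities enter the U statistic, the comparison between gene sets does not depend on the absolute probability values, in line with the procedure described above.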
The authors would like to thank M. Sleumer, M. Bilenky and G. Robertson (Genome Sciences Centre, Vancouver) for programs and for helpful advice on the computational analysis of promoters, Z. Bhao (University of Washington, Seattle) for providing a list of C. elegans transcription factors, and D. Hansen (University of Calgary) for critical reading of the manuscript. This work was supported by an operating grant from the Canadian Institutes of Health Research (to J.D.M.) and from Genome Canada and Genome British Columbia (to D.G.M., M.A.M. and S.J.J.). This research was also supported by the Intramural Research Program of the NIH, National Institute of Diabetes and Digestive and Kidney Diseases. M.A.M. and S.J.J. are scholars of the Michael Smith Research Foundation for Health Research. M.A.M. is a Terry Fox Young Investigator. J.D.M. is a Medical Scientist of the Alberta Heritage Foundation of Medical Research and a Canada Research Chair in Developmental Biology. This work is dedicated to the late Dr. M.G. Persico.