Discovering the structure and dynamics of transcriptional regulatory events in the genome with cellular and temporal resolution is crucial to understanding the regulatory underpinnings of development and disease. We determined the genomic distribution of binding sites for 92 transcription factors (TFs) and regulatory proteins across multiple stages of C. elegans development by performing 241 ChIP-seq experiments. Integrating regulatory binding and cellular-resolution expression data yielded a spatiotemporally-resolved metazoan TF binding map. Using this map, we explore developmental regulatory circuits that encode combinatorial logic at the levels of co-binding and co-expression of TFs, characterizing (1) the genomic coverage and clustering of regulatory binding, (2) the binding preferences of and biological processes regulated by TFs, (3) the global TF co-associations and genomic subdomains that suggest shared patterns of regulation, and (4) key TFs and TF co-associations for fate specification of individual lineages and cell-types.
Transcription Factor; Gene Regulation; ChIP-seq; Cellular Expression; Development
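As a rough illustration of what a pairwise TF co-association score can look like, the sketch below computes, for each pair of factors, the mean fraction of one factor's peaks that overlap the other's. This is not the method used in the study: the scoring rule, the function names, and the factor names and peak coordinates (`TF_A`, `TF_B`, `TF_C`) are all hypothetical and chosen only to keep the example small.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def co_association(peaks_a, peaks_b):
    """Fraction of peaks in A that overlap at least one peak in B."""
    if not peaks_a:
        return 0.0
    hits = sum(any(overlaps(p, q) for q in peaks_b) for p in peaks_a)
    return hits / len(peaks_a)


def pairwise_scores(peaks_by_tf):
    """Symmetric score per TF pair: mean of the two directional fractions."""
    scores = {}
    for tf1, tf2 in combinations(sorted(peaks_by_tf), 2):
        a, b = peaks_by_tf[tf1], peaks_by_tf[tf2]
        scores[(tf1, tf2)] = (co_association(a, b) + co_association(b, a)) / 2
    return scores


# Hypothetical peak calls, for illustration only.
peaks = {
    "TF_A": [("I", 100, 200), ("I", 500, 600)],
    "TF_B": [("I", 150, 250), ("II", 500, 600)],
    "TF_C": [("X", 10, 20)],
}
scores = pairwise_scores(peaks)
for pair, score in scores.items():
    print(pair, round(score, 2))
```

The quadratic overlap test is adequate for a toy example; genome-scale peak sets would normally call for sorted intervals or an interval tree, and for a significance estimate against a background model.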
The Hawaiian strain (CB4856) of Caenorhabditis elegans is one of the most divergent from the canonical laboratory strain N2 and has been widely used in developmental, population, and evolutionary studies. To enhance the utility of the strain, we have generated a draft sequence of the CB4856 genome, exploiting a variety of resources and strategies. When compared against the N2 reference, the CB4856 genome has 327,050 single nucleotide variants (SNVs) and 79,529 insertion–deletion events that result in a total of 3.3 Mb of N2 sequence missing from CB4856 and 1.4 Mb of sequence present in CB4856 but not present in N2. As previously reported, the density of SNVs varies along the chromosomes, with the arms of chromosomes showing greater average variation than the centers. In addition, we find 61 regions totaling 2.8 Mb, distributed across all six chromosomes, which have a greatly elevated SNV density, ranging from 2 to 16% SNVs. A survey of other wild isolates shows that the two alternative haplotypes for each region are widely distributed, suggesting they have been maintained by balancing selection over long evolutionary times. These divergent regions contain an abundance of genes from large rapidly evolving families encoding F-box, MATH, BATH, seven-transmembrane G protein-coupled receptors, and nuclear hormone receptors, suggesting that they provide selective advantages in natural environments. The draft sequence makes available a comprehensive catalog of sequence differences between the CB4856 and N2 strains that will facilitate the molecular dissection of their phenotypic differences. Our work also emphasizes the importance of going beyond simple alignment of reads to a reference genome when assessing differences between genomes.
C. elegans; genome assembly; evolution; variation
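The notion of regions with elevated SNV density can be made concrete with a simple windowed count. The sketch below is only an illustration under stated assumptions: the window size, the function names, and the toy positions are hypothetical, and the 2% cutoff merely mirrors the lower bound quoted in the abstract rather than reproducing the authors' procedure.

```python
def snv_density(positions, chrom_length, window=10_000):
    """Per-window SNV density (SNVs per base) for one chromosome.

    positions: 0-based SNV coordinates; window size is an assumption.
    """
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    densities = []
    for i, count in enumerate(counts):
        start = i * window
        size = min(window, chrom_length - start)  # last window may be short
        densities.append(count / size)
    return densities


def divergent_windows(densities, threshold=0.02):
    """Indices of windows whose SNV density is at or above the threshold."""
    return [i for i, d in enumerate(densities) if d >= threshold]


# Toy chromosome of 3,000 bp with a dense cluster in the first 1,000 bp.
toy_positions = list(range(0, 1000, 25)) + [1500, 2999]
densities = snv_density(toy_positions, chrom_length=3000, window=1000)
divergent = divergent_windows(densities)
print(densities, divergent)
```

Adjacent flagged windows would still need to be merged into regions, and real data would require masking of unalignable sequence before densities are meaningful.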
Despite the large evolutionary distances, metazoan species show remarkable commonalities, which has helped establish fly and worm as model organisms for human biology [1,2]. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. We mapped the genome-wide binding locations of 165 human, 93 worm, and 52 fly transcription-regulatory factors (RFs) generating a total of 1,019 data sets from diverse cell-types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous RF families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development, and disease.
Transcription Factor; Regulatory Information; Gene Regulation; Single Nucleotide Polymorphisms; ChIP-seq
The simple and well-described structure of the C. elegans nervous system offers an unprecedented opportunity to identify the genetic programs that define the connectivity and function of individual neurons and their circuits. A correspondingly precise gene expression map of C. elegans neurons would facilitate the application of genetic methods toward this goal. Here we describe a powerful new approach, SeqCeL (RNA-Seq of C. elegans cells) for producing gene expression profiles of specific larval C. elegans neurons.
Methods and Results
We have exploited available GFP reporter lines for FACS isolation of specific larval C. elegans neurons for RNA-Seq analysis. Our analysis showed that diverse classes of neurons are accessible to this approach. To demonstrate the applicability of this strategy to rare neuron types, we generated RNA-Seq profiles of the NSM serotonergic neurons that occur as a single bilateral pair of cells in the C. elegans pharynx. These data detected >1,000 NSM-enriched transcripts, including the majority of previously known NSM-expressed genes.
This work offers a simple and robust protocol for expression profiling studies of post-embryonic C. elegans neurons and thus provides an important new method for identifying candidate genes for key roles in neuron-specific development and function.
The nematode Caenorhabditis briggsae is an excellent model organism for the comparative analysis of gene function and developmental mechanisms. To study the evolutionary conservation and divergence of genetic pathways mediating vulva formation, we screened for mutations in C. briggsae that cause the egg-laying defective (Egl) phenotype. Here, we report the characterization of 13 genes, including three that are orthologs of Caenorhabditis elegans unc-84 (SUN domain), lin-39 (Dfd/Scr-related homeobox), and lin-11 (LIM homeobox). Based on the morphology and cell fate changes, the mutants were placed into four different categories. Class 1 animals have normal-looking vulva and vulva-uterine connections, indicating defects in other components of the egg-laying system. Class 2 animals frequently lack some or all of the vulval precursor cells (VPCs) due to defects in the migration of P-cell nuclei into the ventral hypodermal region. Class 3 animals show inappropriate fusion of VPCs to the hypodermal syncytium, leading to a reduced number of vulval progeny. Finally, class 4 animals exhibit abnormal vulval invagination and morphology. Interestingly, we did not find mutations that affect VPC induction and fates. Our work is the first study involving the characterization of genes in C. briggsae vulva formation, and it offers a basis for future investigations of these genes in C. elegans.
C. briggsae; C. elegans; vulva; development; cell proliferation; differentiation; morphogenesis; egg-laying defective
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
While Caenorhabditis elegans specifically responds to infection by the up-regulation of certain genes, distinct pathogens trigger the expression of a common set of genes. We applied new methods to conduct a comprehensive and comparative study of the transcriptional response of C. elegans to bacterial and fungal infection. Using tiling arrays and/or RNA-sequencing, we have characterized the genome-wide transcriptional changes that underlie the host's response to infection by three bacterial (Serratia marcescens, Enterococcus faecalis and Photorhabdus luminescens) and two fungal pathogens (Drechmeria coniospora and Harposporium sp.). We developed a flexible tool, the WormBase Converter (available at http://wormbasemanager.sourceforge.net/), to allow cross-study comparisons. The new data sets provided more extensive lists of differentially regulated genes than previous studies. Annotation analysis confirmed that genes commonly up-regulated by bacterial infections are related to stress responses. We found substantial overlaps between the genes regulated upon intestinal infection by the bacterial pathogens and Harposporium, and between those regulated by Harposporium and D. coniospora, which infects the epidermis. Among the fungus-regulated genes, there was a significant bias towards genes that are evolving rapidly and potentially encode small proteins. The results obtained using new methods reveal that the response to infection in C. elegans is determined by the nature of the pathogen, the site of infection and the physiological imbalance provoked by infection. They form the basis for future functional dissection of innate immune signaling. Finally, we also propose alternative methods to identify differentially regulated genes that take into account the greater variability in lowly expressed genes.
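One generic way to account for the greater variability of lowly expressed genes is to standardize each gene's log ratio against genes of similar abundance instead of applying a single fold-change cutoff. The sketch below illustrates that general idea only; it is an assumption that something of this kind is relevant here, it is not the method proposed by the authors, and the function name, bin size, and expression values are hypothetical.

```python
import math
from statistics import mean, stdev


def binned_z_scores(genes, bin_size=4):
    """Standardize log2 ratios within bins of similar average abundance.

    genes: list of (name, control_expr, infected_expr) with values > 0.
    Returns {name: z}. bin_size is an illustrative assumption.
    """
    rows = []
    for name, ctrl, inf in genes:
        log_ratio = math.log2(inf / ctrl)
        abundance = 0.5 * (math.log2(inf) + math.log2(ctrl))
        rows.append((abundance, name, log_ratio))
    rows.sort()  # order genes by average log2 abundance

    z = {}
    for i in range(0, len(rows), bin_size):
        chunk = rows[i:i + bin_size]
        ratios = [r[2] for r in chunk]
        if len(ratios) < 2:
            # A single gene cannot be standardized; report 0.
            for _, name, _ in chunk:
                z[name] = 0.0
            continue
        mu, sd = mean(ratios), stdev(ratios)
        for _, name, lr in chunk:
            z[name] = (lr - mu) / sd if sd > 0 else 0.0
    return z


# Hypothetical values: noisy low-abundance genes, stable high-abundance genes.
toy_genes = [
    ("low1", 1, 4), ("low2", 4, 1), ("low3", 2, 8), ("low4", 8, 2),
    ("high1", 1024, 1024), ("high2", 2048, 2048),
    ("high3", 4096, 4096), ("high4", 1024, 4096),
]
z = binned_z_scores(toy_genes)
print(round(z["low1"], 3), round(z["high4"], 3))
```

In this toy data `low1` and `high4` share the same four-fold change, but `high4` scores higher because its abundance bin is otherwise stable. Real analyses would use much larger bins or a smoothed variance trend, since z-scores from four genes are tightly bounded.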
MicroRNAs (miRNAs) have been found to regulate gene expression across eukaryotic species, but the function of most miRNA genes remains unknown. Here we describe how the analysis of the expression patterns of a well-conserved miRNA gene, mir-57, at cellular resolution for every minute during early development of Caenorhabditis elegans provided key insights into its function. Remarkably, mir-57 expression shows strong positional bias but little tissue specificity, a pattern reminiscent of Hox gene function. Despite the minor defects produced by a loss-of-function mutation, overexpression of mir-57 causes dramatic posterior defects, which also mimic the phenotypes of mutant alleles of a posterior Hox gene, nob-1, an Abd homolog. More importantly, nob-1 expression is found in the same two posterior AB sublineages as those expressing mir-57 but with an earlier onset. Intriguingly, nob-1 functions as an activator for mir-57 expression; it is also a direct target of mir-57. In agreement with this, loss of mir-57 function partially rescues the nob-1 allele defects, indicating a negative feedback regulatory loop between the miRNA and Hox gene to provide positional cues. Given the conservation of the miRNA and Hox gene, the regulatory mechanism might be broadly used across species. The strategy used here to explore mir-57 function provides a path to dissect the regulatory relationship between genes.
miRNAs are small RNAs found in many multicellular species that inhibit gene expression. Many of them play important roles in cancer and cell fate determination, but the function of most miRNAs is uncertain. Using live cell imaging and automated expression analysis, we found that a miRNA gene, mir-57, is expressed in a position-dependent rather than tissue-dependent manner. Hox genes also regulate cell fate patterning along the anterior-posterior (a-p) axis across different tissues. By investigating interactions between genes of these classes expressed in mir-57-expressing cells, we demonstrated by both genetic analysis and gene expression assays that a negative feedback loop between a posterior Hox gene, nob-1, and mir-57 regulates posterior cell fate determination in C. elegans. On the one hand, the Hox gene is required for normal activation of mir-57 expression, and on the other, the Hox gene functions as a direct target of and is repressed by the miRNA. Given the conservation of the two genes, a negative feedback loop between Hox and miRNA genes might be broadly used across species to regulate cell fate along the a-p axis. Detailed expression analysis may provide a general way to dissect the regulatory role of miRNAs.
Despite the successes of genomics, little is known about how genetic information produces complex organisms. A look at the crucial functional elements of fly and worm genomes could change that.
Transcription factors are key components of regulatory networks that control development, as well as the response to environmental stimuli. We have established an experimental pipeline in Caenorhabditis elegans that permits global identification of the binding sites for transcription factors using chromatin immunoprecipitation and deep sequencing. We describe and validate this strategy, and apply it to the transcription factor PHA-4, which plays critical roles in organ development and other cellular processes. We identified thousands of binding sites for PHA-4 during formation of the embryonic pharynx, and also found a role for this factor during the starvation response. Many binding sites were found to shift dramatically between embryos and starved larvae, from developmentally regulated genes to genes involved in metabolism. These results indicate distinct roles for this regulator in two different biological processes and demonstrate the versatility of transcription factors in mediating diverse biological roles.
The C. elegans transcription factor PHA-4 is a member of the highly conserved FOXA family of transcription factors. These factors act as master regulators of organ development by controlling how genes are turned off and on as tissues are formed. Additionally they regulate genes in response to nutrient levels and control both longevity and survival of the organism. However, the extent to which these factors control similar or distinct gene targets for each of these functions is unknown. For this reason, we have used the technique of chromatin immunoprecipitation followed by deep sequencing (ChIP–Seq), to define the target binding sites of PHA-4 on a genome-wide scale, when it is either functioning as an organ identity regulator or in response to environmental stress. Our data clearly demonstrate distinct sets of biologically relevant target genes for the transcription factor PHA-4 under these two different conditions. Not only have we defined PHA-4 targets, but we established an experimental ChIP–Seq pipeline to facilitate the identification of binding sites for many transcription factors in the future.
Comparative genomic analysis of important signaling pathways in C. briggsae and C. elegans reveals both conserved features and differences. To build a framework to address the significance of these features we determined the C. briggsae embryonic cell lineage, using the tools StarryNite and AceTree. We traced both cell divisions and cell positions for all cells through all but the last round of cell division and for selected cells through the final round. We found the lineage to be remarkably similar to that of C. elegans. Not only did the founder cells give rise to similar numbers of progeny, the relative cell division timing and positions were largely maintained. These lineage similarities appear to give rise to similar cell fates as judged both by the positions of lineally-equivalent cells and by the patterns of cell deaths in both species. However, some reproducible differences were seen, e.g., the P4 cell cycle length is more than 40% longer in C. briggsae than in C. elegans (p < 0.01). The extensive conservation of embryonic development between such divergent species suggests that substantial evolutionary distance between these two species has not altered these early developmental cellular events, although the developmental defects of trans-species hybrids suggest that the details of the underlying molecular pathways have diverged sufficiently so as to not be interchangeable.
C. briggsae; C. elegans; embryo; cell lineage; signaling pathway
The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. We have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterized with the aid of the DNA sequence.
We describe a system that permits the automated analysis of reporter gene expression in Caenorhabditis elegans with cellular resolution continuously during embryogenesis and demonstrate its utility by defining the expression patterns of reporters for several embryonically expressed transcription factors. The invariant cell lineage permits the automated alignment of multiple expression profiles, allowing the direct comparison of the expression of different genes' reporters. We have also used the system to monitor perturbations to normal development involving changes both in cell division timing and in cell fate. Systematic application could reveal the gene activity of each cell throughout development.
Nematode.net (www.nematode.net) is a web-accessible resource for investigating gene sequences from nematode genomes. The database is an outgrowth of the parasitic nematode EST project at Washington University’s Genome Sequencing Center (GSC), St Louis. A sister project at the University of Edinburgh and the Sanger Institute is also underway. More than 295 000 ESTs have been generated from >30 nematodes other than Caenorhabditis elegans including key parasites of humans, animals and plants. Nematode.net currently provides NemaGene EST cluster consensus sequence, enhanced online BLAST search tools, functional classifications of cluster sequences and comprehensive information concerning the ongoing generation of nematode genome data. The long-term goal of nematode.net is to provide the scientific community with the highest quality sequence information and tools for studying these diverse species.
In C. elegans, assembly of hypodermal hemidesmosome-like structures called fibrous organelles is temporally and spatially coordinated with the assembly of the muscle contractile apparatus, suggesting that signals are exchanged between these cell types to position fibrous organelles correctly. Myotactin, a protein recognized by monoclonal antibody MH46, is a candidate for such a signaling molecule. The antigen, although expressed by hypodermis, first reflects the pattern of muscle elements and only later reflects the pattern of fibrous organelles. Confocal microscopy shows that in adult worms myotactin and fibrous organelles show coincident localization. Further, cell ablation studies show the bodywall muscle cells are necessary for normal myotactin distribution. To investigate myotactin's role in muscle-hypodermal signaling, we characterized the myotactin locus molecularly and genetically. Myotactin is a novel transmembrane protein of ∼500 kd. The extracellular domain contains at least 32 fibronectin type III repeats and the cytoplasmic domain contains unique sequence. In mutants lacking myotactin, muscle cells detach when embryonic muscle contraction begins. Later in development, fibrous organelles become delocalized and are not restricted to regions of the hypodermis previously contacted by muscle. These results suggest myotactin helps maintain the association between the muscle contractile apparatus and hypodermal fibrous organelles.
Caenorhabditis elegans; cell adhesion; cell ablations; muscle; hemidesmosome-like structures