Advances in microscopy and fluorescent reporters have allowed us to detect the onset of gene expression on a cell-by-cell basis in a systemic fashion. This information, however, is often encoded in large repositories of images, and developing ways to extract this spatiotemporal expression data is a difficult problem that often uses complex domain-specific methods for each individual data set. We present a more unified approach that incorporates general previous information into a hierarchical probabilistic model to extract spatiotemporal gene expression from 4D confocal microscopy images of developing Caenorhabditis elegans embryos. This approach reduces the overall error rate of our automated lineage tracing pipeline by 3.8-fold, allowing us to routinely follow the C. elegans lineage to later stages of development, where individual neuronal subspecification becomes apparent. Unlike previous methods that often use custom approaches that are organism specific, our method uses generalized linear models and extensions of standard reversible jump Markov chain Monte Carlo methods that can be readily extended to other organisms for a variety of biological inference problems relating to cell fate specification. This modeling approach is flexible and provides tractable avenues for incorporating additional previous information into the model for similar difficult high-fidelity/low error tolerance image analysis problems for systematically applied genomic experiments.
C. elegans; cell fate; gene expression; image analysis; lineage
Many animal species use a chromosome-based mechanism of sex determination, which has led to the coordinate evolution of dosage-compensation systems. Dosage compensation not only corrects the imbalance in the number of X chromosomes between the sexes but also is hypothesized to correct dosage imbalance within cells that is due to monoallelic X-linked expression and biallelic autosomal expression, by upregulating X-linked genes twofold (termed ‘Ohno’s hypothesis’). Although this hypothesis is well supported by expression analyses of individual X-linked genes and by microarray-based transcriptome analyses, it was challenged by a recent study using RNA sequencing and proteomics. We obtained new, independent RNA-seq data, measured RNA polymerase distribution and reanalyzed published expression data in mammals, C. elegans and Drosophila. Our analyses, which take into account the skewed gene content of the X chromosome, support the hypothesis of upregulation of expressed X-linked genes to balance expression of the genome.
The nematode Caenorhabditis briggsae is an excellent model organism for the comparative analysis of gene function and developmental mechanisms. To study the evolutionary conservation and divergence of genetic pathways mediating vulva formation, we screened for mutations in C. briggsae that cause the egg-laying defective (Egl) phenotype. Here, we report the characterization of 13 genes, including three that are orthologs of Caenorhabditis elegans unc-84 (SUN domain), lin-39 (Dfd/Scr-related homeobox), and lin-11 (LIM homeobox). Based on the morphology and cell fate changes, the mutants were placed into four different categories. Class 1 animals have normal-looking vulva and vulva-uterine connections, indicating defects in other components of the egg-laying system. Class 2 animals frequently lack some or all of the vulval precursor cells (VPCs) due to defects in the migration of P-cell nuclei into the ventral hypodermal region. Class 3 animals show inappropriate fusion of VPCs to the hypodermal syncytium, leading to a reduced number of vulval progeny. Finally, class 4 animals exhibit abnormal vulval invagination and morphology. Interestingly, we did not find mutations that affect VPC induction and fates. Our work is the first study involving the characterization of genes in C. briggsae vulva formation, and it offers a basis for future investigations of these genes in C. elegans.
C. briggsae; C. elegans; vulva; development; cell proliferation; differentiation; morphogenesis; egg-laying defective
Despite the successes of genomics, little is known about how genetic information produces complex organisms. A look at the crucial functional elements of fly and worm genomes could change that.
Image analysis is an essential component in many biological experiments that study gene expression, cell cycle progression, and protein localization. A protocol for tracking the expression of individual C. elegans genes was developed that collects image samples of a developing embryo by 3-D time lapse microscopy. In this protocol, a program called StarryNite performs the automatic recognition of fluorescently labeled cells and traces their lineage. However, due to the amount of noise present in the data and due to the challenges introduced by increasing number of cells in later stages of development, this program is not error free. In the current version, the error correction (i.e., editing) is performed manually using a graphical interface tool named AceTree, which is specifically developed for this task. For a single experiment, this manual annotation task takes several hours.
In this paper, we reduce the time required to correct errors made by StarryNite. We target one of the most frequent error types (movements annotated as divisions) and train a support vector machine (SVM) classifier to decide whether a division call made by StarryNite is correct or not. We show, via cross-validation experiments on several benchmark data sets, that the SVM successfully identifies this type of error significantly. A new version of StarryNite that includes the trained SVM classifier is available at http://starrynite.sourceforge.net.
We demonstrate the utility of a machine learning approach to error annotation for StarryNite. In the process, we also provide some general methodologies for developing and validating a classifier with respect to a given pattern recognition task.
Comparative genomic analysis of important signaling pathways in C. briggase and C. elegans reveals both conserved features and also differences. To build a framework to address the significance of these features we determined the C. briggsae embryonic cell lineage, using the tools StarryNite and AceTree. We traced both cell divisions and cell positions for all cells through all but the last round of cell division and for selected cells through the final round. We found the lineage to be remarkably similar to that of C. elegans. Not only did the founder cells give rise to similar numbers of progeny, the relative cell division timing and positions were largely maintained. These lineage similarities appear to give rise to similar cell fates as judged both by the positions of lineally-equivalent cells and by the patterns of cell deaths in both species. However, some reproducible differences were seen, e.g., the P4 cell cycle length is more than 40% longer in C. briggsae than that in C. elegans (p < 0.01). The extensive conservation of embryonic development between such divergent species suggests that substantial evolutionary distance between these two species has not altered these early developmental cellular events, although the developmental defects of transpecies hybrids suggest that the details of the underlying molecular pathways have diverged sufficiently so as to not be interchangeable.
C. briggsae; C. elegans; embryo; cell lineage; signaling pathway
As a fundamental process of development, cell proliferation must be coordinated with other processes such as fate differentiation. Through statistical analysis of individual cell cycle lengths of the first eight out of ten rounds of embryonic cell division in C. elegans, we identified synchronous and invariantly ordered divisions that are tightly associated with fate differentiation. Our results suggest a three-tier model for fate control of cell cycle pace: the primary control of cell cycle pace is established by lineage and the founder cell fate, then fine-tuned by tissue and organ differentiation within each lineage, then further modified by individualization of cells as they acquire unique morphological and physiological roles in the variant body plan. We then set out to identify the pace-setting mechanisms in different fates. Our results suggest that ubiquitin-mediated degradation of CDC-25.1 is a rate-determining step for the E (gut) and P3 (muscle and germline) lineages but not others, even though CDC-25.1 and its apparent decay have been detected in all lineages. Our results demonstrate the power of C. elegans embryogenesis as a model to dissect the interaction between differentiation and proliferation, and an effective approach combining genetic and statistical analysis at single-cell resolution.
statistics; single cell; fate differentiation; cdc25; Skp1-related
We describe a system that permits the automated analysis of reporter gene expression in Caenorhabditis elegans with cellular resolution continuously during embryogenesis and demonstrate its utility by defining the expression patterns of reporters for several embryonically expressed transcription factors. The invariant cell lineage permits the automated alignment of multiple expression profiles, allowing the direct comparison of the expression of different genes' reporters. We have also used the system to monitor perturbations to normal development involving changes both in cell division timing and in cell fate. Systematic application could reveal the gene activity of each cell throughout development.
To determine whether the distinctive features of Caenorhabditis elegans chromosomal organization are shared with the C. briggsae genome, we constructed a single nucleotide polymorphism–based genetic map to order and orient the whole genome shotgun assembly along the six C. briggsae chromosomes. Although these species are of the same genus, their most recent common ancestor existed 80–110 million years ago, and thus they are more evolutionarily distant than, for example, human and mouse. We found that, like C. elegans chromosomes, C. briggsae chromosomes exhibit high levels of recombination on the arms along with higher repeat density, a higher fraction of intronic sequence, and a lower fraction of exonic sequence compared with chromosome centers. Despite extensive intrachromosomal rearrangements, 1:1 orthologs tend to remain in the same region of the chromosome, and colinear blocks of orthologs tend to be longer in chromosome centers compared with arms. More strikingly, the two species show an almost complete conservation of synteny, with 1:1 orthologs present on a single chromosome in one species also found on a single chromosome in the other. The conservation of both chromosomal organization and synteny between these two distantly related species suggests roles for chromosome organization in the fitness of an organism that are only poorly understood presently.
The importance of chromosomal organization in the fitness of a species is only poorly understood. The publication of the C. elegans genome sequence in 1998 revealed features of higher level organization that suggested its chromosomes were organized into distinct domains. Chromosome arms were accumulating changes more rapidly than the centers of chromosomes. In this paper, we have compared the organization of the nematode C. briggsae genome with that of C. elegans. By building a genetic map based on DNA variations between two strains of C. briggsae, and by using that map to organize the draft genome sequence of C. briggsae published in 2003, we found the following: (1) Intrachromosomal rearrangements are frequent within and even between arms but are less common within central regions and between arms and centers. (2) Genes have remained overwhelmingly on the same chromosomes. (3) The distinctive features that distinguish C. elegans arms from centers also are seen in C. briggsae chromosomes. The conservation of these features between these two species, despite the approximately 100 million years since their most recent common ancestor, provides clear evidence of the selective advantages of the domain architecture of chromosomes. The continuing association of genes on the same chromosomes suggests that this may also be advantageous.
The conservation of both chromosomal organization and synteny between two distantly related species suggests roles for chromosome organization in the fitness of an organism.
The invariant lineage of the nematode Caenorhabditis elegans has potential as a powerful tool for the description of mutant phenotypes and gene expression patterns. We previously described procedures for the imaging and automatic extraction of the cell lineage from C. elegans embryos. That method uses time-lapse confocal imaging of a strain expressing histone-GFP fusions and a software package, StarryNite, processes the thousands of images and produces output files that describe the location and lineage relationship of each nucleus at each time point.
We have developed a companion software package, AceTree, which links the images and the annotations using tree representations of the lineage. This facilitates curation and editing of the lineage. AceTree also contains powerful visualization and interpretive tools, such as space filling models and tree-based expression patterning, that can be used to extract biological significance from the data.
By pairing a fast lineaging program written in C with a user interface program written in Java we have produced a powerful software suite for exploring embryonic development.
Nematode.net (www.nematode.net) is a web- accessible resource for investigating gene sequences from nematode genomes. The database is an outgrowth of the parasitic nematode EST project at Washington University’s Genome Sequencing Center (GSC), St Louis. A sister project at the University of Edinburgh and the Sanger Institute is also underway. More than 295 000 ESTs have been generated from >30 nematodes other than Caenorhabditis elegans including key parasites of humans, animals and plants. Nematode.net currently provides NemaGene EST cluster consensus sequence, enhanced online BLAST search tools, functional classifications of cluster sequences and comprehensive information concerning the ongoing generation of nematode genome data. The long-term goal of nematode.net is to provide the scientific community with the highest quality sequence information and tools for studying these diverse species.
As an entrée to characterizing plant parasitic nematode genomes, 5,700 expressed sequence tags (ESTs) from the infective second-stage larvae (L2) of the root-knot nematode Meloidogyne incognita have been analyzed. In addition to identifying putative nematode-specific and Tylenchida-specific genes, sequencing revealed previously uncharacterized horizontal gene transfer candidates in Meloidogyne with high identity to rhizobacterial genes.
Plant parasitic nematodes are major pathogens of most crops. Molecular characterization of these species as well as the development of new techniques for control can benefit from genomic approaches. As an entrée to characterizing plant parasitic nematode genomes, we analyzed 5,700 expressed sequence tags (ESTs) from second-stage larvae (L2) of the root-knot nematode Meloidogyne incognita.
From these, 1,625 EST clusters were formed and classified by function using the Gene Ontology (GO) hierarchy and the Kyoto KEGG database. L2 larvae, which represent the infective stage of the life cycle before plant invasion, express a diverse array of ligand-binding proteins and abundant cytoskeletal proteins. L2 are structurally similar to Caenorhabditis elegans dauer larva and the presence of transcripts encoding glyoxylate pathway enzymes in the M. incognita clusters suggests that root-knot nematode larvae metabolize lipid stores while in search of a host. Homology to other species was observed in 79% of translated cluster sequences, with the C. elegans genome providing more information than any other source. In addition to identifying putative nematode-specific and Tylenchida-specific genes, sequencing revealed previously uncharacterized horizontal gene transfer candidates in Meloidogyne with high identity to rhizobacterial genes including homologs of nodL acetyltransferase and novel cellulases.
With sequencing from plant parasitic nematodes accelerating, the approaches to transcript characterization described here can be applied to more extensive datasets and also provide a foundation for more complex genome analyses.
In C. elegans, assembly of hypodermal hemidesmosome-like structures called fibrous organelles is temporally and spatially coordinated with the assembly of the muscle contractile apparatus, suggesting that signals are exchanged between these cell types to position fibrous organelles correctly. Myotactin, a protein recognized by monoclonal antibody MH46, is a candidate for such a signaling molecule. The antigen, although expressed by hypodermis, first reflects the pattern of muscle elements and only later reflects the pattern of fibrous organelles. Confocal microscopy shows that in adult worms myotactin and fibrous organelles show coincident localization. Further, cell ablation studies show the bodywall muscle cells are necessary for normal myotactin distribution. To investigate myotactin's role in muscle-hypodermal signaling, we characterized the myotactin locus molecularly and genetically. Myotactin is a novel transmembrane protein of ∼500 kd. The extracellular domain contains at least 32 fibronectin type III repeats and the cytoplasmic domain contains unique sequence. In mutants lacking myotactin, muscle cells detach when embryonic muscle contraction begins. Later in development, fibrous organelles become delocalized and are not restricted to regions of the hypodermis previously contacted by muscle. These results suggest myotactin helps maintain the association between the muscle contractile apparatus and hypodermal fibrous organelles.
Caenorhabditis elegans; cell adhesion; cell ablations; muscle; hemidesmosome-like structures
While Caenorhabditis elegans specifically responds to infection by the up-regulation of certain genes, distinct pathogens trigger the expression of a common set of genes. We applied new methods to conduct a comprehensive and comparative study of the transcriptional response of C. elegans to bacterial and fungal infection. Using tiling arrays and/or RNA-sequencing, we have characterized the genome-wide transcriptional changes that underlie the host's response to infection by three bacterial (Serratia marcescens, Enterococcus faecalis and otorhabdus luminescens) and two fungal pathogens (Drechmeria coniospora and Harposporium sp.). We developed a flexible tool, the WormBase Converter (available at http://wormbasemanager.sourceforge.net/), to allow cross-study comparisons. The new data sets provided more extensive lists of differentially regulated genes than previous studies. Annotation analysis confirmed that genes commonly up-regulated by bacterial infections are related to stress responses. We found substantial overlaps between the genes regulated upon intestinal infection by the bacterial pathogens and Harposporium, and between those regulated by Harposporium and D. coniospora, which infects the epidermis. Among the fungus-regulated genes, there was a significant bias towards genes that are evolving rapidly and potentially encode small proteins. The results obtained using new methods reveal that the response to infection in C. elegans is determined by the nature of the pathogen, the site of infection and the physiological imbalance provoked by infection. They form the basis for future functional dissection of innate immune signaling. Finally, we also propose alternative methods to identify differentially regulated genes that take into account the greater variability in lowly expressed genes.
MicroRNAs (miRNAs) have been found to regulate gene expression across eukaryotic species, but the function of most miRNA genes remains unknown. Here we describe how the analysis of the expression patterns of a well-conserved miRNA gene, mir-57, at cellular resolution for every minute during early development of Caenorhabditis elegans provided key insights in understanding its function. Remarkably, mir-57 expression shows strong positional bias but little tissue specificity, a pattern reminiscent of Hox gene function. Despite the minor defects produced by a loss of function mutation, overexpression of mir-57 causes dramatic posterior defects, which also mimic the phenotypes of mutant alleles of a posterior Hox gene, nob-1, an Abd homolog. More importantly, nob-1 expression is found in the same two posterior AB sublineages as those expressing mir-57 but with an earlier onset. Intriguingly, nob-1 functions as an activator for mir-57 expression; it is also a direct target of mir-57. In agreement with this, loss of mir-57 function partially rescues the nob-1 allele defects, indicating a negative feedback regulatory loop between the miRNA and Hox gene to provide positional cues. Given the conservation of the miRNA and Hox gene, the regulatory mechanism might be broadly used across species. The strategy used here to explore mir-57 function provides a path to dissect the regulatory relationship between genes.
miRNAs are small RNAs found in many multi-cellular species that inhibit gene expression. Many of them play important roles in cancer and cell fate determination, but the function of most miRNAs is uncertain. Using live cell imaging and automated expression analysis, we found a miRNA gene, mir-57, is expressed in a position rather than tissue dependent way. Hox genes also regulate cell fate patterning along anterior-posterior (a-p) axis across different tissues. By investigating interactions between genes of these classes expressed in mir-57 expressing cells, we demonstrated by both genetic analysis and gene expression assays that a negative feedback loop between a posterior Hox gene, nob-1, and mir-57 regulates posterior cell fate determination in C. elegans. On the one hand, the Hox gene is required for normal activation of mir-57 expression, and on the other, the Hox gene functions as a direct target of and is repressed by the miRNA. Given the conservation of the two genes, a negative feedback loop between Hox and miRNA genes might be broadly used across species to regulate cell fate along the a-p axis. Detailed expression analysis may provide a general way to dissect the regulatory role of miRNAs.
Tiling arrays have been the tool of choice for probing an organism's transcriptome without prior assumptions about the transcribed regions, but RNA-Seq is becoming a viable alternative as the costs of sequencing continue to decrease. Understanding the relative merits of these technologies will help researchers select the appropriate technology for their needs.
Here, we compare these two platforms using a matched sample of poly(A)-enriched RNA isolated from the second larval stage of C. elegans. We find that the raw signals from these two technologies are reasonably well correlated but that RNA-Seq outperforms tiling arrays in several respects, notably in exon boundary detection and dynamic range of expression. By exploring the accuracy of sequencing as a function of depth of coverage, we found that about 4 million reads are required to match the sensitivity of two tiling array replicates. The effects of cross-hybridization were analyzed using a "nearest neighbor" classifier applied to array probes; we describe a method for determining potential "black list" regions whose signals are unreliable. Finally, we propose a strategy for using RNA-Seq data as a gold standard set to calibrate tiling array data. All tiling array and RNA-Seq data sets have been submitted to the modENCODE Data Coordinating Center.
Tiling arrays effectively detect transcript expression levels at a low cost for many species while RNA-Seq provides greater accuracy in several regards. Researchers will need to carefully select the technology appropriate to the biological investigations they are undertaking. It will also be important to reconsider a comparison such as ours as sequencing technologies continue to evolve.
Transcription factors are key components of regulatory networks that control development, as well as the response to environmental stimuli. We have established an experimental pipeline in Caenorhabditis elegans that permits global identification of the binding sites for transcription factors using chromatin immunoprecipitation and deep sequencing. We describe and validate this strategy, and apply it to the transcription factor PHA-4, which plays critical roles in organ development and other cellular processes. We identified thousands of binding sites for PHA-4 during formation of the embryonic pharynx, and also found a role for this factor during the starvation response. Many binding sites were found to shift dramatically between embryos and starved larvae, from developmentally regulated genes to genes involved in metabolism. These results indicate distinct roles for this regulator in two different biological processes and demonstrate the versatility of transcription factors in mediating diverse biological roles.
The C. elegans transcription factor PHA-4 is a member of the highly conserved FOXA family of transcription factors. These factors act as master regulators of organ development by controlling how genes are turned off and on as tissues are formed. Additionally they regulate genes in response to nutrient levels and control both longevity and survival of the organism. However, the extent to which these factors control similar or distinct gene targets for each of these functions is unknown. For this reason, we have used the technique of chromatin immunoprecipitation followed by deep sequencing (ChIP–Seq), to define the target binding sites of PHA-4 on a genome-wide scale, when it is either functioning as an organ identity regulator or in response to environmental stress. Our data clearly demonstrate distinct sets of biologically relevant target genes for the transcription factor PHA-4 under these two different conditions. Not only have we defined PHA-4 targets, but we established an experimental ChIP–Seq pipeline to facilitate the identification of binding sites for many transcription factors in the future.
A codon usage table for 32 nematode species is presented and suggests that total genomic GC content drives codon usage.
Codon usage has direct utility in molecular characterization of species and is also a marker for molecular evolution. To understand codon usage within the diverse phylum Nematoda, we analyzed a total of 265,494 expressed sequence tags (ESTs) from 30 nematode species. The full genomes of Caenorhabditis elegans and C. briggsae were also examined. A total of 25,871,325 codons were analyzed and a comprehensive codon usage table for all species was generated. This is the first codon usage table available for 24 of these organisms.
Codon usage similarity in Nematoda usually persists over the breadth of a genus but then rapidly diminishes even within each clade. Globodera, Meloidogyne, Pristionchus, and Strongyloides have the most highly derived patterns of codon usage. The major factor affecting differences in codon usage between species is the coding sequence GC content, which varies in nematodes from 32% to 51%. Coding GC content (measured as GC3) also explains much of the observed variation in the effective number of codons (R = 0.70), which is a measure of codon bias, and it even accounts for differences in amino acid frequency. Codon usage is also affected by neighboring nucleotides (N1 context). Coding GC content correlates strongly with estimated noncoding genomic GC content (R = 0.92). On examining abundant clusters in five species, candidate optimal codons were identified that may be preferred in highly expressed transcripts.
Evolutionary models indicate that total genomic GC content, probably the product of directional mutation pressure, drives codon usage rather than the converse, a conclusion that is supported by examination of nematode genomes.
Hookworms, infecting over one billion people, are the mostly closely related major human parasites to the model nematode Caenorhabditis elegans. Applying genomics techniques to these species, we analyzed 3,840 and 3,149 genes from Ancylostoma caninum and A. ceylanicum.
Transcripts originated from libraries representing infective L3 larva, stimulated L3, arrested L3, and adults. Most genes are represented in single stages including abundant transcripts like hsp-20 in infective L3 and vit-3 in adults. Over 80% of the genes have homologs in C. elegans, and nearly 30% of these were with observable RNA interference phenotypes. Homologies were identified to nematode-specific and clade V specific gene families. To study the evolution of hookworm genes, 574 A. caninum / A. ceylanicum orthologs were identified, all of which were found to be under purifying selection with distribution ratios of nonsynonymous to synonymous amino acid substitutions similar to that reported for C. elegans / C. briggsae orthologs. The phylogenetic distance between A. caninum and A. ceylanicum is almost identical to that for C. elegans / C. briggsae.
The genes discovered should substantially accelerate research toward better understanding of the parasites' basic biology as well as new therapies including vaccines and novel anthelmintics.
The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found strong evidence for 1,300 new C. elegans genes. In addition, comparisons of the two genomes will help to understand the evolutionary forces that mold nematode genomes.
With the Caenorhabditis briggsae genome now in hand, C. elegans biologists have a powerful new research tool to refine their knowledge of gene function in C. elegans and to study the path of genome evolution