In this paper we report a database and a series of techniques related to the problem of tracking cells, and detecting their divisions, in time-lapse movies of mammalian embryos. Our contributions are: (1) a method for counting embryos in a well, and cropping each individual embryo across frames, to create individual movies for cell tracking; (2) a semi-automated method for cell tracking that works up to the 8-cell stage, along with a software implementation available to the public (this software was used to build the reported database); (3) an algorithm for automatic tracking up to the 4-cell stage, based on histograms of mirror symmetry coefficients captured using wavelets; (4) a cell-tracking database containing 100 annotated examples of mammalian embryos up to the 8-cell stage; (5) statistical analysis of various timing distributions obtained from those examples.
tracking; cell counting; event detection; dynamic programming; time series; embryo development; database
Circadian (~24 hr) rhythms offer one of the best examples of how gene expression is tied to behavior. Circadian pacemaker neurons contain molecular clocks that control ~24 hr rhythms in gene expression that in turn regulate electrical activity rhythms to control behavior.
Here we demonstrate the inverse relationship: there are broad transcriptional changes in Drosophila clock neurons (LNvs) in response to altered electrical activity, including a large set of circadian genes. Hyperexciting LNvs creates a morning-like expression profile for many circadian genes while hyperpolarization leads to an evening-like transcriptional state. The electrical effects robustly persist in per0 mutant LNvs but not in cyc0 mutant LNvs suggesting that neuronal activity interacts with the transcriptional activators of the core circadian clock. Bioinformatic and immunocytochemical analyses suggest that CREB family transcription factors link LNv electrical state to circadian gene expression.
The electrical state of a clock neuron can impose time-of-day information on its transcriptional program. We propose that this acts as an internal zeitgeber to add robustness and precision to circadian behavioral rhythms.
High-content screening for gene profiling has generally been limited to single cells. Here, we explore an alternative approach—profiling gene function by analyzing effects of gene knockdowns on the architecture of a complex tissue in a multicellular organism. We profile 554 essential C. elegans genes by imaging gonad architecture and scoring 94 phenotypic features. To generate a reference for evaluating methods for network construction, genes were manually partitioned into 102 phenotypic classes, predicting functions for uncharacterized genes across diverse cellular processes. Using this classification as a benchmark, we developed a robust computational method for constructing gene networks from high-content profiles based on a network context-dependent measure that ranks the significance of links between genes. Our analysis reveals that multi-parametric profiling in a complex tissue yields functional maps with a resolution similar to genetic interaction-based profiling in unicellular eukaryotes—pinpointing subunits of macromolecular complexes and components functioning in common cellular processes.
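The idea of ranking links by a context-dependent measure can be illustrated with a minimal sketch. The abstract does not specify the measure, so the code below swaps in a CLR-style score as an assumption: each pairwise profile correlation is converted to a z-score against the background of correlations for each of the two genes, and the two z-scores are combined. The gene names and profile values are hypothetical, and with only four genes the background is too small to be meaningful; the example shows the mechanics only.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def context_scores(profiles):
    """Rank gene pairs by a CLR-style score (an assumed stand-in measure)."""
    genes = sorted(profiles)
    # Similarity of every gene to every other gene.
    sim = {g: {h: pearson(profiles[g], profiles[h]) for h in genes if h != g}
           for g in genes}

    def z(g, h):
        # How unusual is the g-h similarity relative to all of g's similarities?
        background = list(sim[g].values())
        sd = pstdev(background)
        return max(0.0, (sim[g][h] - mean(background)) / sd) if sd else 0.0

    links = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            links[(g, h)] = (z(g, h) ** 2 + z(h, g) ** 2) ** 0.5
    return sorted(links.items(), key=lambda kv: -kv[1])

# Hypothetical phenotypic profiles (one score per feature).
EXAMPLE_PROFILES = {
    "gene_a": [1, 2, 3, 4],
    "gene_b": [1, 2, 3, 5],
    "gene_c": [4, 1, 3, 2],
    "gene_d": [2, 4, 1, 3],
}
ranked_links = context_scores(EXAMPLE_PROFILES)
```

Scoring a link against each gene's own background, rather than applying one global correlation cutoff, keeps genes with broadly similar profiles from dominating the network.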
N-Browse is a graphical network browser for the visualization and navigation of heterogeneous molecular interaction data. N-Browse runs as a Java applet in a Web browser, providing highly dynamic and interactive on-demand access to network data available from a remote server. The N-Browse interface is easy to use and accommodates multiple types of functional linkages with associated information, allowing the exploration of many layers of functional information simultaneously. Although created for applications in biology, N-Browse uses a generic database schema that can be adapted to network representations in any knowledge domain. The N-Browse client-server package is freely available for distribution, providing a convenient way for data producers and providers to distribute and offer interactive visualization of network-based data.
network; molecular; interaction; graph; browser; Web-based; client-server system; JAVA; functional genomics; GUI; visualization; database; MySQL
We present a hierarchical principle for object recognition and its application to automatically classify developmental stages of C. elegans animals from a population of mixed stages. The object recognition machine consists of four hierarchical layers, each composed of units upon which evaluation functions output a label score, followed by a grouping mechanism that resolves ambiguities in the score by imposing local consistency constraints. Each layer then outputs groups of units, from which the units of the next layer are derived. Using this hierarchical principle, the machine builds up successively more sophisticated representations of the objects to be classified. The algorithm segments large and small objects, decomposes objects into parts, extracts features from these parts, and classifies them by SVM. We are using this system to analyze phenotypic data from C. elegans high-throughput genetic screens, and our system overcomes a previous bottleneck in image analysis by achieving near real-time scoring of image data. The system is in current use in a functioning C. elegans laboratory and has processed over two hundred thousand images for lab users.
Temperature-sensitive (ts) mutations are mutations that exhibit a mutant phenotype at high or low temperatures and a wild-type phenotype at normal temperature. Temperature-sensitive mutants are valuable tools for geneticists, particularly in the study of essential genes. However, finding ts mutations typically relies on generating and screening many thousands of mutations, which is an expensive and labor-intensive process. Here we describe an in silico method that uses Rosetta and machine learning techniques to predict a highly accurate “top 5” list of ts mutations given the structure of a protein of interest. Rosetta is a protein structure prediction and design code, used here to model and score how proteins accommodate point mutations with side-chain and backbone movements. We show that integrating Rosetta relax-derived features with sequence-based features results in accurate temperature-sensitive mutation predictions.
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
Three-prime untranslated regions (3′UTRs) of metazoan messenger RNAs (mRNAs) contain numerous regulatory elements, yet remain largely uncharacterized. Using polyA capture, 3′ rapid amplification of complementary DNA (cDNA) ends, full-length cDNAs, and RNA-seq, we defined ∼26,000 distinct 3′UTRs in Caenorhabditis elegans for ∼85% of the 18,328 experimentally supported protein-coding genes and revised ∼40% of gene models. Alternative 3′UTR isoforms are frequent, often differentially expressed during development. Average 3′UTR length decreases with animal age. Surprisingly, no polyadenylation signal (PAS) was detected for 13% of polyadenylation sites, predominantly among shorter alternative isoforms. Trans-spliced (versus non–trans-spliced) mRNAs possess longer 3′UTRs and frequently contain no PAS or variant PAS. We identified conserved 3′UTR motifs, isoform-specific predicted microRNA target sites, and polyadenylation of most histone genes. Our data reveal a rich complexity of 3′UTRs, both genome-wide and throughout development.
Cellular responses to carcinogens are typically studied in transformed cell lines, which do not reflect the physiological status of normal tissues. To address this limitation, we have characterized the transcriptional program and cellular responses of human lung WI-38 fibroblasts upon exposure to the ultimate carcinogen benzo[a]pyrene diol epoxide (BPDE). In contrast to observations in cell lines, we find that BPDE treatment induces a strong inflammatory response in these normal fibroblasts. Whole-genome microarrays show induction of numerous inflammatory factors, including genes that encode interleukins (ILs), growth factors and enzymes related to prostaglandin synthesis and signaling. Real-time reverse transcription–polymerase chain reaction and enzyme-linked immunosorbent assay (ELISA) revealed time- and dose-dependent induction of the expression and production of cyclooxygenase 2, prostaglandin E2, IL1B, IL6 and IL8. In parallel, cell cycle progression and DNA repair processes were repressed, but DNA damage signaling was increased via p53-Ser15 phosphorylation and induced expression of GADD45A, CDKN1A, BTG2 and SESN1. Network analysis suggested that activator protein 1 transcription factors may link the cell cycle response and DNA damage signaling with the inflammatory stress response in these cells. We confirmed this hypothesis by showing that p53-dependent signaling through c-jun N-terminal kinase (JNK) led to increased cJun-Ser63 phosphorylation and that inhibition of JNK-mediated cJun activation using p53- or JNK-specific inhibitors significantly reduced IL gene expression and subsequent production of IL8. This is the first demonstration that a strong inflammatory response is triggered in normal fibroblasts by BPDE and that this occurs through coordinated regulation with other cellular processes.
Graphical models of network associations are useful for both visualizing and integrating multiple types of association data. Identifying modules, or groups of functionally related gene products, is an important challenge in analyzing biological networks. However, existing tools to identify modules are insufficient when applied to dense networks of experimentally derived interaction data. To address this problem, we have developed an agglomerative clustering method that is able to identify highly modular sets of gene products within highly interconnected molecular interaction networks.
MINE outperforms MCODE, CFinder, NEMO, SPICi, and MCL in identifying non-exclusive, high modularity clusters when applied to the C. elegans protein-protein interaction network. The algorithm generally achieves superior geometric accuracy and modularity for annotated functional categories. In comparison with the most closely related algorithm, MCODE, the top clusters identified by MINE are consistently of higher density and MINE is less likely to designate overlapping modules as a single unit. MINE offers a high level of granularity with a small number of adjustable parameters, enabling users to fine-tune cluster results for input networks with differing topological properties.
MINE was created in response to the challenge of discovering high quality modules of gene products within highly interconnected biological networks. The algorithm allows a high degree of flexibility and user-customisation of results with few adjustable parameters. MINE outperforms several popular clustering algorithms in identifying modules with high modularity and obtains good overall recall and precision of functional annotations in protein-protein interaction networks from both S. cerevisiae and C. elegans.
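The two quantities used to compare clusters above, density and modularity, can be made concrete with a short sketch. The abstract does not give the exact definitions used to evaluate MINE, so the code assumes the common ones: density as the fraction of possible within-cluster pairs that are connected, and Newman-Girvan modularity for a non-overlapping partition. It is not the MINE algorithm itself, and the node names and edge list are hypothetical.

```python
def cluster_density(edges, cluster):
    """Fraction of possible within-cluster pairs that are connected."""
    nodes = set(cluster)
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(1 for u, v in edges if u in nodes and v in nodes)
    return internal / (n * (n - 1) / 2)

def modularity(edges, clusters):
    """Newman-Girvan modularity Q for a non-overlapping partition.

    Assumes `edges` lists each undirected edge once, with no self-loops.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for cluster in clusters:
        nodes = set(cluster)
        internal = sum(1 for u, v in edges if u in nodes and v in nodes)
        total_degree = sum(degree.get(node, 0) for node in nodes)
        # Observed within-cluster edge fraction minus the expected fraction.
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q

# Hypothetical network: two triangles joined by a single bridge edge.
EXAMPLE_EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
                 ("d", "e"), ("e", "f"), ("d", "f"),
                 ("c", "d")]
triangle_density = cluster_density(EXAMPLE_EDGES, ["a", "b", "c"])
merged_density = cluster_density(EXAMPLE_EDGES, ["a", "b", "c", "d"])
partition_q = modularity(EXAMPLE_EDGES, [["a", "b", "c"], ["d", "e", "f"]])
```

In this toy network each triangle is fully connected, and absorbing a neighboring node lowers the density, which is the kind of difference the density comparison with MCODE refers to. Because MINE reports non-exclusive clusters, a partition-based modularity such as this one would only apply directly to clusters that do not overlap.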
To provide accurate biological hypotheses and inform upon global properties of cellular networks, systematic identification of protein–protein interactions has to meet high-quality standards. We present an expanded Caenorhabditis elegans protein-protein interaction network, or “interactome” map, derived from testing a matrix of ~10,000 × ~10,000 proteins using a highly specific high-throughput yeast two-hybrid system. Through a new empirical quality control framework, we show that the resulting dataset (Worm Interactome 2007 or WI-2007) is similar in quality to low-throughput data curated from the literature. Previous interaction datasets have been filtered and integrated with WI-2007 to generate a high confidence consolidated map (Worm Interactome version 8 or WI8). This work allows us to estimate the size of the worm interactome at ~116,000 interactions. Comparison with other types of functional genomic data shows the complementarity of distinct experimental approaches in predicting different functional relationship features between genes or proteins.
Despite the successes of genomics, little is known about how genetic information produces complex organisms. A look at the crucial functional elements of fly and worm genomes could change that.
Many protein-protein interactions are mediated through independently folding modular domains. Proteome-wide efforts to model protein-protein interaction or “interactome” networks have largely ignored this modular organization of proteins. We developed an experimental strategy to efficiently identify interaction domains and generated a domain-based interactome network for proteins involved in C. elegans early embryonic cell divisions. Minimal interacting regions were identified for over 200 proteins, providing important information on their domain organization. Furthermore, our approach increased the sensitivity of the two-hybrid system, resulting in a more complete interactome network. This interactome modeling strategy revealed new insights into C. elegans centrosome function and is applicable to other biological processes in this and other organisms.
Systematic mapping of genetic-interaction networks will provide an essential foundation for understanding complex genetic disorders, mechanisms of genetic buffering and principles of robustness and evolvability. A recent study of signaling pathways in Caenorhabditis elegans lays the next row of bricks in this foundation.
The actin cytoskeleton plays critical roles in early development in Caenorhabditis elegans. To further understand the complex roles of actin in early embryogenesis we use RNAi and in vivo imaging of filamentous actin (F-actin) dynamics.
Using RNAi, we found processes that are differentially sensitive to levels of actin during early embryogenesis. Mild actin depletion shows defects in cortical ruffling, pseudocleavage, and establishment of polarity, while more severe depletion shows defects in polar body extrusion, cytokinesis, chromosome segregation, and eventually, egg production. These defects indicate that actin is required for proper oocyte development, fertilization, and a wide range of important events during early embryogenesis, including proper chromosome segregation. In vivo visualization of the cortical actin cytoskeleton shows dynamics that parallel but are distinct from the previously described myosin dynamics. Two distinct types of actin organization are observed at the cortex. During asymmetric polarization to the anterior, or the establishment phase (Phase I), actin forms a meshwork of microfilaments and focal accumulations throughout the cortex, while during the anterior maintenance phase (Phase II) it undergoes a morphological transition to asymmetrically localized puncta. The proper asymmetric redistribution is dependent on the PAR proteins, while both asymmetric redistribution and morphological transitions are dependent upon PFN-1 and NMY-2. Just before cytokinesis, actin disappears from most of the cortex and is only found around the presumptive cytokinetic furrow. Finally, we describe dynamic actin-enriched comets in the early embryo.
During early C. elegans embryogenesis, actin plays more roles and its organization is more dynamic than previously described. Morphological transitions of F-actin, from meshwork to puncta, as well as asymmetric redistribution, are regulated by the PAR proteins. Results from this study provide new insights into the cellular and developmental roles of the actin cytoskeleton.
Three-prime untranslated regions (3′UTRs) are widely recognized as important post-transcriptional regulatory regions of mRNAs. RNA-binding proteins and small non-coding RNAs such as microRNAs (miRNAs) bind to functional elements within 3′UTRs to influence mRNA stability, translation and localization. These interactions play many important roles in development, metabolism and disease. However, even in the most well-annotated metazoan genomes, 3′UTRs and their functional elements are not well defined. Comprehensive and accurate genome-wide annotation of 3′UTRs and their functional elements is thus critical. We have developed an open-access database, available at http://www.UTRome.org, to provide a rich and comprehensive resource for 3′UTR biology in the well-characterized, experimentally tractable model system Caenorhabditis elegans. UTRome.org combines data from public repositories and a large-scale effort we are undertaking to characterize 3′UTRs and their functional elements in C. elegans, including 3′UTR sequences, graphical displays, predicted and validated functional elements, secondary structure predictions and detailed data from our cloning pipeline. UTRome.org will grow substantially over time to encompass individual 3′UTR isoforms for the majority of genes, new and revised functional elements, and in vivo data on 3′UTR function as they become available. The UTRome database thus represents a powerful tool to better understand the biology of 3′UTRs.
To initiate studies on how protein-protein interaction (or “interactome”) networks relate to multicellular functions, we have mapped a large fraction of the Caenorhabditis elegans interactome network. Starting with a subset of metazoan-specific proteins, more than 4000 interactions were identified from high-throughput yeast two-hybrid (HT-Y2H) screens. Independent coaffinity purification assays experimentally validated the overall quality of this Y2H data set. Together with already described Y2H interactions and interologs predicted in silico, the current version of the Worm Interactome (WI5) map contains ∼5500 interactions. Topological and biological features of this interactome network, as well as its integration with phenome and transcriptome data sets, lead to numerous biological hypotheses.
MicroRNAs are small noncoding genes that regulate the protein production of genes by binding to partially complementary sites in the mRNAs of targeted genes. Here, using our algorithm PicTar, we exploit cross-species comparisons to predict, on average, 54 targeted genes per microRNA above noise in Drosophila melanogaster. Analysis of the functional annotation of target genes furthermore suggests specific biological functions for many microRNAs. We also predict combinatorial targets for clustered microRNAs and find that some clustered microRNAs are likely to coordinately regulate target genes. Furthermore, we compare microRNA regulation between insects and vertebrates. We find that the widespread extent of gene regulation by microRNAs is comparable between flies and mammals but that certain microRNAs may function in clade-specific modes of gene regulation. One of these microRNAs (miR-210) is predicted to contribute to the regulation of fly oogenesis. We also list specific regulatory relationships that appear to be conserved between flies and mammals. Our findings provide the most extensive microRNA target predictions in Drosophila to date, suggest specific functional roles for most microRNAs, indicate the existence of coordinate gene regulation executed by clustered microRNAs, and shed light on the evolution of microRNA function across large evolutionary distances. All predictions are freely accessible at our searchable Web site http://pictar.bio.nyu.edu.
MicroRNA genes are a recently discovered large class of small noncoding genes. These genes have been shown to regulate the expression of target genes by binding to partially complementary sites in the mRNAs of the targets. To understand microRNA function it is thus important to identify their targets. Here, the authors use their bioinformatic method, PicTar, and cross-species comparisons of several newly sequenced fly species to predict, genome wide, targets of microRNAs in Drosophila. They find that known fly microRNAs control at least 15% of all genes in D. melanogaster. They also show that genomic clusters of microRNAs are likely to coordinately regulate target genes. Analysis of the functional annotation of target genes furthermore suggests specific biological functions for many microRNAs. All predictions are freely accessible at http://pictar.bio.nyu.edu. Finally, Grün et al. compare the function of microRNAs across flies and mammals. They find that (a) the overall extent of microRNA gene regulation is comparable between both clades, (b) the number of targets for a conserved microRNA in flies correlates with the number of targets in mammals, (c) some conserved microRNAs may function in clade-specific modes of gene regulation, and (d) some specific microRNA–target regulatory relationships may be conserved between both clades.
RNA interference (RNAi) is being used in large-scale genomic studies as a rapid way to obtain in vivo functional information associated with specific genes. How best to archive and mine the complex data derived from these studies provides a series of challenges associated with both the methods used to elicit the RNAi response and the functional data gathered. RNAiDB (RNAi Database; http://www.rnai.org) has been created for the archival, distribution and analysis of phenotypic data from large-scale RNAi analyses in Caenorhabditis elegans. The database contains a compendium of publicly available data and provides information on experimental methods and phenotypic results, including raw data in the form of images and streaming time-lapse movies. Phenotypic summaries together with graphical displays of RNAi to gene mappings allow quick intuitive comparison of results from different RNAi assays and visualization of the gene product(s) potentially inhibited by each RNAi experiment based on multiple sequence analysis methods. RNAiDB can be searched using combinatorial queries and using the novel tool PhenoBlast, which ranks genes according to their overall phenotypic similarity. RNAiDB could serve as a model database for distributing and navigating in vivo functional information from large-scale systematic phenotypic analyses in different organisms.
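Ranking genes by overall phenotypic similarity, as PhenoBlast is described as doing, can be illustrated with a minimal sketch. The abstract does not describe PhenoBlast's scoring scheme, so the code assumes a simple stand-in: each gene has a set of observed phenotype terms, and genes are ranked by Jaccard similarity to a query gene. The gene names and phenotype terms are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two phenotype sets (shared terms / all terms)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_by_phenotype(query, profiles):
    """Return (gene, score) pairs for all other genes, most similar first."""
    scores = [(gene, jaccard(profiles[query], phenotypes))
              for gene, phenotypes in profiles.items() if gene != query]
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda gs: (-gs[1], gs[0]))

# Hypothetical RNAi phenotype annotations.
EXAMPLE_PHENOTYPES = {
    "gene-A": {"embryonic lethal", "cytokinesis defect"},
    "gene-B": {"embryonic lethal", "cytokinesis defect", "sterile"},
    "gene-C": {"sterile"},
    "gene-D": {"embryonic lethal"},
}
ranking = rank_by_phenotype("gene-A", EXAMPLE_PHENOTYPES)
```

Here gene-B ranks first because it shares both of the query's phenotypes, gene-D second because it shares one, and gene-C last because it shares none. A production tool would likely weight phenotypes and handle missing observations, which this sketch does not attempt.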