The invariant lineage of the nematode Caenorhabditis elegans has potential as a powerful tool for the description of mutant phenotypes and gene expression patterns. We previously described procedures for the imaging and automatic extraction of the cell lineage from C. elegans embryos. That method combines time-lapse confocal imaging of a strain expressing histone-GFP fusions with a software package, StarryNite, which processes the thousands of resulting images and produces output files that describe the location and lineage relationship of each nucleus at each time point.
We have developed a companion software package, AceTree, which links the images and the annotations using tree representations of the lineage. This facilitates curation and editing of the lineage. AceTree also contains powerful visualization and interpretive tools, such as space filling models and tree-based expression patterning, that can be used to extract biological significance from the data.
By pairing a fast lineaging program written in C with a user interface program written in Java we have produced a powerful software suite for exploring embryonic development.
Comparative genomic analysis of important signaling pathways in C. briggsae and C. elegans reveals both conserved features and differences. To build a framework for addressing the significance of these features, we determined the C. briggsae embryonic cell lineage using the tools StarryNite and AceTree. We traced both cell divisions and cell positions for all cells through all but the last round of cell division, and for selected cells through the final round. We found the lineage to be remarkably similar to that of C. elegans. Not only did the founder cells give rise to similar numbers of progeny, but the relative cell division timing and positions were also largely maintained. These lineage similarities appear to give rise to similar cell fates, as judged both by the positions of lineally equivalent cells and by the patterns of cell deaths in both species. However, some reproducible differences were seen; e.g., the P4 cell cycle is more than 40% longer in C. briggsae than in C. elegans (p < 0.01). The extensive conservation of embryonic development between such divergent species suggests that the substantial evolutionary distance between them has not altered these early developmental cellular events, although the developmental defects of interspecies hybrids suggest that the details of the underlying molecular pathways have diverged enough that they are no longer interchangeable.
C. briggsae; C. elegans; embryo; cell lineage; signaling pathway
Motivation: Deciphering the regulatory and developmental mechanisms of multicellular organisms requires detailed knowledge of gene interactions and gene expression. The availability of large datasets with both spatial and ontological annotation of the spatio-temporal patterns of gene expression in the mouse embryo provides a powerful resource for discovering the biological function of embryo organization. Ontological annotation of gene expression consists of labelling images with terms from the anatomy ontology for mouse development: if a gene is expressed in an anatomical component shown in an image, the image is tagged with the term for that component. Currently, annotation is done manually by domain experts, which is both time-consuming and costly. In addition, the level of detail is variable, and errors inevitably arise from the tedious nature of the task. In this article, we present a new method to automatically identify and annotate gene expression patterns in the mouse embryo with anatomical terms.
Results: The method takes images from in situ hybridization studies and the ontology for the developing mouse embryo, and then combines machine learning and image processing techniques to produce classifiers that automatically identify and annotate gene expression patterns in these images. We evaluate our method on image data from the EURExpress study, where we use it to automatically classify nine anatomical terms: humerus, handplate, fibula, tibia, femur, ribs, petrous part, scapula and head mesenchyme. With few exceptions, the accuracy of our method lies between 70% and 80%. We show that other known methods have lower classification performance than ours. We have investigated the images misclassified by our method and found several cases where the original annotation was not correct. This shows that our method is robust against this kind of noise.
Availability: The annotation result and the experimental dataset in the article can be freely accessed at http://www2.docm.mmu.ac.uk/STAFF/L.Han/geneannotation/.
Supplementary Information: Supplementary data are available at Bioinformatics online.
'Ted' (Trace editor) is a graphical editor for sequence and trace data from automated fluorescence sequencing machines. It provides facilities for viewing sequence and trace data (in top or bottom strand orientation), for editing the base sequence, for automated or manual trimming of the head (vector) and tail (uncertain data) from the sequence, for vertical and horizontal trace scaling, for keeping a history of sequence editing, and for output of the edited sequence. Ted has been used extensively in the C. elegans genome sequencing project, both as a stand-alone program and integrated into the Staden sequence assembly package, and has greatly improved the efficiency and accuracy of sequence editing. It runs in the X Window System environment on Sun workstations and is available from the authors. Ted currently supports sequence and trace data from the ABI 373A and Pharmacia A.L.F. sequencers.
Cellular processes, such as chromosome assembly, segregation and cytokinesis, are inherently dynamic. Time-lapse imaging of living cells, using fluorescently labeled reporter proteins or differential interference contrast (DIC) microscopy, allows examination of the temporal progression of these dynamic events, which would otherwise have to be inferred from analysis of fixed samples [1,2]. Moreover, studying the developmental regulation of cellular processes requires time-lapse experiments on an intact organism during development. The Caenorhabditis elegans embryo is transparent to light and has a rapid, invariant developmental program with a known cell lineage [3], making it an ideal experimental model for studying questions in cell biology [4,5] and development [6-9]. C. elegans is amenable to genetic manipulation by forward genetics (based on random mutagenesis [10,11]) and by reverse genetics that targets specific genes (based on RNA interference (RNAi) and targeted mutagenesis [12-15]). In addition, transgenic animals expressing fluorescently tagged proteins or reporters can be readily created [16,17]. Together, these traits make it easy to identify the genetic pathways regulating fundamental cellular and developmental processes in vivo [18-21]. In this protocol we present methods for live imaging of C. elegans embryos using DIC optics or GFP fluorescence on a compound epifluorescence microscope. We demonstrate how readily available microscopes, typically used for imaging fixed samples, can also be applied to time-lapse analysis by using open-source software to automate the imaging process.
Motivation: Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a Drosophila melanogaster embryo reveals the detailed spatio-temporal pattern of the gene's expression. Many related biological problems, such as the detection of co-expressed genes, co-regulated genes and transcription factor binding motifs, rely heavily on the analysis of these image patterns. To support text-based pattern searching in such studies, the images in the Berkeley Drosophila Genome Project (BDGP) study are manually annotated by domain experts with developmental stage terms and anatomical ontology terms. Because the number of such images is increasing rapidly and human curators inevitably introduce bias into annotations, an automatic method is needed to recognize the developmental stage and annotate anatomical terms.
Results: In this article, we propose a novel computational model for joint stage classification and anatomical term annotation of Drosophila gene expression patterns. We propose a novel Tri-Relational Graph (TG) model that comprises the data graph, the anatomical term graph and the developmental stage term graph, and connects them through two additional graphs, one induced from stage label assignments and one from annotation label assignments. On top of the TG model, we introduce a Preferential Random Walk (PRW) method that jointly recognizes the developmental stage and annotates anatomical terms by exploiting the interrelations between the two tasks. Experimental results on two refined BDGP datasets demonstrate that our joint learning method achieves better prediction results on both tasks than state-of-the-art methods.
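The PRW method itself is the authors' own; as a generic illustration of how a random walk propagates relevance from a query node through a graph, the following sketch computes a standard random walk with restart on a small adjacency matrix. It is not the PRW formulation, and the toy graph, the restart probability alpha = 0.15 and the convergence tolerance are assumptions. In a TG-style setting, the data, term and stage graphs and the two assignment graphs would first be assembled into one larger adjacency matrix.

```python
def random_walk_with_restart(adj, seed, alpha=0.15, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities of a walker restarting at `seed`.

    adj   -- square list-of-lists of non-negative edge weights
    seed  -- index of the query node the walker jumps back to
    alpha -- restart probability (an assumed value)
    Nodes without edges would leak probability mass; this sketch assumes none exist.
    """
    n = len(adj)
    # Column-normalize: trans[i][j] is the probability of stepping from j to i.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
             for i in range(n)]
    restart = [1.0 if i == seed else 0.0 for i in range(n)]
    p = restart[:]
    for _ in range(max_iter):
        # One step of the walk plus a jump back to the seed with probability alpha.
        new_p = [(1 - alpha) * sum(trans[i][j] * p[j] for j in range(n))
                 + alpha * restart[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Toy path graph 0 - 1 - 2 - 3, queried from node 0.
path = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
scores = random_walk_with_restart(path, seed=0)
```

Nodes farther from the seed receive lower scores (node 3 scores lowest here); ranking term or stage nodes by such scores is the basic idea behind using random walks for annotation.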
Fluorescent and bioluminescent time-lapse microscopy approaches have been successfully used to investigate molecular mechanisms underlying the mammalian circadian oscillator at the single-cell level. However, most of the available software and common methods based on intensity-threshold segmentation and frame-to-frame tracking are not applicable in these experiments. This is due to cell movement and to dramatic changes in the level of the fluorescent/bioluminescent reporter protein during the circadian cycle, with the lowest expression level very close to the background intensity. At present, the standard approach to analyzing datasets obtained from time-lapse microscopy is either manual tracking or the application of generic image-processing or dedicated tracking software. To our knowledge, these existing solutions for manual and automatic tracking have strong limitations in tracking individual cells when the cells shift out of the focal plane.
In an attempt to improve the existing methodology for time-lapse tracking of large numbers of moving cells, we have developed a semi-automatic software package. It extracts cell trajectories by tracking the cells' displacements, delineates the cell nucleus or the whole cell, and finally yields measurements of various features, such as reporter protein expression level or cell displacement. As an example, we present here single-cell circadian pattern and motility analysis of NIH3T3 mouse fibroblasts expressing a fluorescent circadian reporter protein. Using the Circadian Gene Express plugin, we performed fast and unbiased analysis of large fluorescence time-lapse microscopy datasets.
Our software solution, Circadian Gene Express (CGE), is easy to use and allows precise, semi-automatic tracking of moving cells over long periods of time. Despite significant circadian variations in protein expression, with extremely low expression levels at the valley phase, CGE allows accurate and efficient recording of a large number of cell parameters, including reporter protein expression level, velocity, direction of movement, and others. CGE proves useful for the analysis of widefield fluorescence microscopy datasets, as well as for bioluminescence imaging. Moreover, it could easily be adapted for confocal image analysis by manually choosing one of the focal planes of each z-stack at each time point of a time series.
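CGE reports velocity and direction of movement for each tracked cell. Once a trajectory is available, these quantities follow from consecutive positions. The sketch below assumes a track stored as (time, x, y) tuples with times in minutes and positions in pixels, and measures heading in degrees counterclockwise from the +x axis; the data layout and function names are hypothetical and are not CGE's API.

```python
import math


def step_motility(track):
    """Speed and heading for each consecutive pair of positions in one track.

    track -- list of (t, x, y) tuples sorted by time (assumed: minutes, pixels)
    Returns a list of (speed in pixels/minute, heading in degrees).
    """
    steps = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dx, dy, dt = x1 - x0, y1 - y0, t1 - t0
        speed = math.hypot(dx, dy) / dt
        # atan2 gives the direction of the displacement; 0 degrees = +x axis.
        heading = math.degrees(math.atan2(dy, dx))
        steps.append((speed, heading))
    return steps


def net_displacement(track):
    """Straight-line distance between the first and last recorded positions."""
    (_, x0, y0), (_, x1, y1) = track[0], track[-1]
    return math.hypot(x1 - x0, y1 - y0)


def mean_speed(track):
    """Total path length divided by total elapsed time."""
    path = sum(math.hypot(x1 - x0, y1 - y0)
               for (_, x0, y0), (_, x1, y1) in zip(track, track[1:]))
    return path / (track[-1][0] - track[0][0])


# A cell that moves 5 pixels in the first 10 minutes, then stays put.
track = [(0, 0.0, 0.0), (10, 3.0, 4.0), (20, 3.0, 4.0)]
# step_motility(track)[0] has speed 0.5 and a heading of about 53.13 degrees.
```

Because image rows usually increase downward, headings computed directly from pixel coordinates are mirrored relative to the usual mathematical convention; flipping the sign of dy is one way to correct for that.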
CGE is a Java plugin for ImageJ; it is freely available at: http://bigwww.epfl.ch/sage/soft/circadian/.
This protocol and the accompanying software program, LEVER, enable quantitative automated analysis of phase contrast time-lapse images of cultured neural stem cells. Images are captured at 5-minute intervals over a period of 5 to 15 days as the cells proliferate and differentiate. LEVER automatically segments and tracks the stem cells in the image sequence and generates their lineage trees. In addition to generating lineage trees capturing the population dynamics of clonal development, LEVER extracts quantitative phenotypic measurements of cell location, shape, movement, and size. When available, the system can include biomolecular markers imaged using fluorescence. LEVER then displays the results to the user for highly efficient inspection and editing to correct any errors in segmentation, tracking or lineaging. To enable high-throughput inspection, LEVER incorporates features for rapid identification of errors and learns from user-supplied corrections to automatically identify and correct related errors.
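LEVER turns tracking results into lineage trees. A minimal sketch of that final step, assuming each track has been reduced to a hypothetical mapping from track ID to parent track ID (None for a founder cell), builds the parent-to-daughter links, renders the tree as indented text and counts the terminal cells of a clone. LEVER's actual data model is not described here.

```python
from collections import defaultdict


def build_lineage(parents):
    """parents -- dict mapping track ID to parent track ID (None for a founder).

    Returns (roots, children), where children maps a track to its daughter tracks.
    """
    children = defaultdict(list)
    roots = []
    for track_id, parent in sorted(parents.items()):
        if parent is None:
            roots.append(track_id)
        else:
            children[parent].append(track_id)
    return roots, children


def render(track_id, children, depth=0):
    """Indented text view of the subtree rooted at track_id."""
    lines = ["  " * depth + str(track_id)]
    for child in children.get(track_id, []):
        lines.extend(render(child, children, depth + 1))
    return lines


def count_leaves(track_id, children):
    """Number of cells at the tips of the clone (tracks with no daughters)."""
    kids = children.get(track_id, [])
    return 1 if not kids else sum(count_leaves(k, children) for k in kids)


# Founder 1 divides into 2 and 3; cell 2 divides again into 4 and 5.
roots, children = build_lineage({1: None, 2: 1, 3: 1, 4: 2, 5: 2})
print("\n".join(render(roots[0], children)))
```

Storing only parent links keeps edits cheap: correcting a tracking error amounts to changing one track's parent and rebuilding the children map.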
stem cell; progenitor cell; clone; lineage; time-lapse recording; image sequence analysis; cell dynamics; lineage editing; stem cell tracking edit
Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way of studying the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were manually annotated with a variable number of anatomical terms from a controlled vocabulary. Because the number of available images is increasing rapidly, it is imperative to design computational methods to automate this task.
We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to utilize the existing information on pattern annotation and annotates images using a model that exploits correlations among terms. The proposed method can annotate images individually or in groups (e.g., according to the developmental stage). In addition, the proposed method can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods.
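The bag-of-words scheme can be pictured as follows: local descriptors from every image in a group are assigned to their nearest codeword, the assignments are counted into one histogram for the whole group, and histograms from different views are concatenated. This is a minimal sketch under stated assumptions: descriptors are already extracted, the codebook is given (in practice it would typically be learned, e.g. by k-means clustering), one codebook is shared by all views, and the view names are hypothetical.

```python
def nearest_codeword(desc, codebook):
    """Index of the codeword closest to desc (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, codebook[k])))


def group_histogram(images, codebook):
    """images -- list of images, each a list of descriptor vectors.

    Returns one L1-normalized histogram pooled over the whole image group.
    """
    counts = [0] * len(codebook)
    for descs in images:
        for desc in descs:
            counts[nearest_codeword(desc, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def multi_view_representation(views, codebook):
    """views -- dict view name -> list of images; concatenates one histogram per view."""
    rep = []
    for name in sorted(views):  # fixed view order so vectors are comparable
        rep.extend(group_histogram(views[name], codebook))
    return rep


codebook = [(0.0, 0.0), (10.0, 10.0)]
lateral = [[(1.0, 1.0), (9.0, 9.0)], [(0.0, 2.0)]]
features = multi_view_representation({"lateral": lateral, "dorsal": [[(10.0, 10.0)]]},
                                     codebook)
```

Pooling counts over the group, rather than per image, is what lets a single annotation set be attached to a variable-sized group of images.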
The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. The integration of existing annotation information from multiple embryonic views improves annotation performance.
A main goal in understanding cell mechanisms is to explain the relationships among genes and related molecular processes through the combined use of technological platforms and bioinformatics analysis. High-throughput platforms, such as microarrays, enable the investigation of the whole genome in a single experiment. There are different kinds of microarray platforms, which produce different types of binary data (images and raw data). Moreover, even a single vendor offers different chips. The analysis of microarray data requires an initial preprocessing phase (i.e. normalization and summarization) of raw data that makes them suitable for use on existing platforms, such as the TIGR M4 Suite. Beyond preprocessing, annotating the data with additional information, such as gene function, is needed to perform more powerful analyses. Raw data preprocessing and annotation are often performed manually and in an error-prone way. Moreover, many available preprocessing tools do not support annotation. Thus, novel, platform-independent, and possibly open-source tools enabling the semi-automatic preprocessing and annotation of microarray data are needed.
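Normalization is one of the preprocessing steps named above. Quantile normalization, a widely used method for Affymetrix arrays (for example as part of RMA), makes the distribution of intensities identical across arrays. The sketch below is a generic illustration of that technique, not the procedure that μ-CS or the Affymetrix libraries apply, and it breaks ties by original order for simplicity.

```python
def quantile_normalize(columns):
    """columns -- list of equally long intensity lists, one per array (chip).

    Each value is replaced by the mean, across arrays, of the values that share
    its rank. Ties are broken by original order (a simplification).
    """
    n = len(columns[0])
    # Row indices of each array, sorted from lowest to highest intensity.
    orders = [sorted(range(n), key=lambda i: col[i]) for col in columns]
    rank_means = [sum(col[order[r]] for col, order in zip(columns, orders)) / len(columns)
                  for r in range(n)]
    result = []
    for order in orders:
        out = [0.0] * n
        for r, i in enumerate(order):
            out[i] = rank_means[r]  # put the rank mean back at the original position
        result.append(out)
    return result


normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After normalization every array contains exactly the same set of values, so differences between arrays are carried only by the ordering of probes, which is the point of the method.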
The paper presents μ-CS (Microarray Cel file Summarizer), a cross-platform tool for the automatic normalization, summarization and annotation of Affymetrix binary data. μ-CS is based on a client-server architecture. The μ-CS client is provided both as a plug-in of the TIGR M4 platform and as a Java standalone tool and enables users to read, preprocess and analyse binary microarray data, avoiding the manual invocation of external tools (e.g. the Affymetrix Power Tools), the manual loading of preprocessing libraries, and the management of intermediate files. The μ-CS server automatically updates the references to the summarization and annotation libraries that are provided to the μ-CS client before the preprocessing. The μ-CS server is based on the web services technology and can be easily extended to support more microarray vendors (e.g. Illumina).
Thus μ-CS users can directly manage binary data without worrying about locating and invoking the proper preprocessing tools and chip-specific libraries. Moreover, users of the μ-CS plugin for TM4 can manage Affymetrix binary files without using external tools, such as APT (Affymetrix Power Tools) and related libraries. Consequently, μ-CS offers four main advantages: (i) it avoids wasting time searching for the correct libraries, (ii) it reduces possible errors in the preprocessing and further analysis phases, e.g. due to the incorrect choice of parameters or the use of old libraries, (iii) it implements the annotation of preprocessed data, and finally, (iv) it may enhance the quality of further analysis since it provides the most up-to-date annotation libraries. The μ-CS client is freely available as a plugin of the TM4 platform as well as a standalone application at the project web site (http://bioingegneria.unicz.it/M-CS).
Apoptotic cells in animals are engulfed by phagocytic cells and subsequently degraded inside phagosomes. To study the mechanisms controlling the degradation of apoptotic cells, we developed time-lapse imaging protocols in developing Caenorhabditis elegans embryos and established the temporal order of multiple events during engulfment and phagosome maturation. These include the sequential enrichment on phagocytic membranes of the phagocytic receptor cell death abnormal 1 (CED-1), the large GTPase dynamin (DYN-1), phosphatidylinositol 3-phosphate (PI(3)P), and the small GTPase RAB-7, as well as the incorporation of endosomes and lysosomes into phagosomes. Two parallel genetic pathways are known to control the engulfment of apoptotic cells in C. elegans. We found that null mutations in each pathway not only delay or block engulfment, but also delay the degradation of engulfed apoptotic cells. One of the pathways, composed of CED-1, the adaptor protein CED-6, and DYN-1, controls the rate of enrichment of PI(3)P and RAB-7 on phagosomal surfaces and the formation of phagolysosomes. We further identified an essential role of RAB-7 in promoting the recruitment and fusion of lysosomes to phagosomes. We propose that RAB-7 functions as a downstream effector of the CED-1 pathway to mediate phagolysosome formation. Our work suggests that phagocytic receptors, which were thought to act specifically in initiating engulfment, also control phagosome maturation through the sequential activation of multiple effectors such as dynamin, PI(3)P, and Rab GTPases.
Cells undergoing programmed cell death, or apoptosis, within an animal are swiftly engulfed by phagocytes and degraded inside phagosomes, vesicles in which the apoptotic cell is bounded by the engulfing cell's membrane. Little is known about how the degradation process is triggered and controlled. We studied the degradation of apoptotic cells during the development of the nematode Caenorhabditis elegans. Aided by a newly developed live-cell imaging technique, we identified multiple cellular events occurring on phagosomal surfaces and traced the initiation signal to CED-1, a phagocytic receptor known to recognize apoptotic cells and to initiate their engulfment. CED-1 activates DYN-1, a large GTPase, which in turn activates downstream events that lead intracellular organelles such as endosomes and lysosomes to deliver to phagosomes various molecules essential for the degradation of apoptotic cells. In addition to establishing a temporal order of events that lead to the degradation of apoptotic cells, the results suggest that phagocytic receptors, in addition to initiating phagocytosis, promote phagosome maturation through the sequential activation of multiple effector molecules.
The authors have identified multiple cellular events leading to the degradation of engulfed apoptotic cells in the nematode C. elegans, and found that CED-1, a phagocytic receptor thought to specifically control apoptotic-cell engulfment, activates a signaling pathway that initiates phagosome maturation.
The Drosophila gene expression pattern images document the spatial and temporal dynamics of gene expression, and they are valuable tools for elucidating gene functions, interactions, and networks during Drosophila embryogenesis. To provide text-based pattern searching, the images in the Berkeley Drosophila Genome Project (BDGP) study are annotated with ontology terms manually by human curators. We present a systematic approach for automating this task, because the number of images needing text descriptions is now rapidly increasing. We consider both improved feature representation and a novel learning formulation to boost annotation performance. For feature representation, we adapt the bag-of-words scheme commonly used in visual recognition problems so that the image group information in the BDGP study is retained. Moreover, images from multiple views can be integrated naturally in this representation. To reduce the quantization error caused by the bag-of-words representation, we propose an improved feature representation scheme based on the sparse learning technique. In designing the learning formulation, we propose a local regularization framework that can incorporate the correlations among terms explicitly. We further show that the resulting optimization problem admits an analytical solution. Experimental results show that the representation based on sparse learning significantly outperforms the bag-of-words representation. Results also show that incorporating the term-term correlations consistently improves annotation performance.
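To illustrate why sparse learning can reduce quantization error: hard bag-of-words assignment replaces each descriptor by a single codeword, whereas sparse coding approximates it as a sparse weighted combination of codewords. The sketch below solves the standard lasso-type coding problem min_x 0.5*||y - Dx||^2 + lam*||x||_1 with the iterative shrinkage-thresholding algorithm (ISTA) and then max-pools the codes of an image group. The choice of ISTA, the value of lam, the iteration count and max pooling are assumptions, not necessarily the authors' formulation.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (the proximal operator of t*|v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(y, codebook, lam=0.1, iterations=200):
    """ISTA for min_x 0.5*||y - sum_k x[k]*codebook[k]||^2 + lam*||x||_1.

    codebook -- list of codewords (the columns of the dictionary D)
    """
    k, d = len(codebook), len(y)
    # Step size 1/L; the squared Frobenius norm of D upper-bounds the Lipschitz
    # constant of the gradient, so this step is safe (if conservative).
    L = sum(c * c for word in codebook for c in word) or 1.0
    x = [0.0] * k
    for _ in range(iterations):
        recon = [sum(x[j] * codebook[j][i] for j in range(k)) for i in range(d)]
        residual = [r - t for r, t in zip(recon, y)]
        grad = [sum(codebook[j][i] * residual[i] for i in range(d)) for j in range(k)]
        x = [soft_threshold(x[j] - grad[j] / L, lam / L) for j in range(k)]
    return x


def max_pool(codes):
    """Group representation: largest absolute coefficient per codeword."""
    return [max(abs(c[j]) for c in codes) for j in range(len(codes[0]))]


codebook = [[1.0, 0.0], [0.0, 1.0]]
code = sparse_code([3.0, 0.5], codebook, lam=1.0)
```

With an identity codebook the solution reduces to soft-thresholding y by lam, so the small second component is zeroed out while the first is shrunk from 3 to 2; with a real, overcomplete codebook several codewords can share the reconstruction of one descriptor.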
gene expression pattern; image annotation; bag-of-words; sparse learning; regularization
While several mouse strains have recently been developed for tracing neural crest or oligodendrocyte lineages, each strain has inherent limitations. The connection between human SOX10 mutations and neural crest cell pathogenesis led us to focus on the Sox10 gene, which is critical for neural crest development. We generated Sox10-Venus BAC transgenic mice to monitor Sox10 expression in both normal development and in pathological processes.
Tissue fluorescence distinguished neural crest progeny cells and oligodendrocytes in the Sox10-Venus mouse embryo. Immunohistochemical analysis confirmed that Venus expression was restricted to cells expressing endogenous Sox10. Time-lapse imaging of various tissues in Sox10-Venus mice demonstrated that Venus expression could be visualized at the single-cell level in vivo due to the intense, focused Venus fluorescence. In the adult Sox10-Venus mouse, several types of mature and immature oligodendrocytes along with Schwann cells were clearly labeled with Venus, both before and after spinal cord injury.
In the newly-developed Sox10-Venus transgenic mouse, Venus fluorescence faithfully mirrors endogenous Sox10 expression and allows for in vivo imaging of live cells at the single-cell level. This Sox10-Venus mouse will thus be a useful tool for studying neural crest cells or oligodendrocytes, both in development and in pathological processes.
The ability to detect nuclei in embryos is essential for studying the development of multicellular organisms. A system of automated nuclear detection has already been tested on a set of four-dimensional (4D) Nomarski differential interference contrast (DIC) microscope images of Caenorhabditis elegans embryos. However, the system needed laborious hand-tuning of its parameters every time a new image set was used. It could not detect nuclei in the process of cell division, and could detect nuclei only from the two- to eight-cell stages.
We developed a system that automates the detection of nuclei in a set of 4D DIC microscope images of C. elegans embryos. Local image entropy is used to produce regions of the images that have the image texture of the nucleus. From these regions, those that actually detect nuclei are manually selected at the first and last time points of the image set, and an object-tracking algorithm then selects regions that detect nuclei in between the first and last time points. The use of local image entropy makes the system applicable to multiple image sets without the need to change its parameter values. The use of an object-tracking algorithm enables the system to detect nuclei in the process of cell division. The system detected nuclei with high sensitivity and specificity from the one- to 24-cell stages.
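The local image entropy step can be sketched as follows: for each pixel, gray levels in a small square window are quantized into bins, and the Shannon entropy of the resulting histogram measures how textured the neighborhood is. The window radius, number of bins and threshold are assumptions rather than the published parameter values, and so is the direction of the threshold: the candidate mask below keeps low-entropy (smoother) pixels, assuming nuclei look smoother than the granular cytoplasm in DIC images.

```python
import math
from collections import Counter


def local_entropy(image, radius=2, bins=16, max_value=256):
    """Shannon entropy (bits) of quantized gray levels around each pixel.

    image -- list of rows of integer gray levels in [0, max_value)
    Windows are (2*radius+1) pixels square and are clipped at the image border.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            counts = Counter()
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    counts[image[yy][xx] * bins // max_value] += 1
            n = sum(counts.values())
            out[y][x] = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return out


def candidate_mask(entropy, threshold):
    """Mark pixels whose neighborhood entropy falls below the threshold."""
    return [[e < threshold for e in row] for row in entropy]


flat = [[100] * 4 for _ in range(4)]                                        # featureless patch
checker = [[255 if (x + y) % 2 else 0 for x in range(3)] for y in range(3)]  # strong texture
```

Because entropy depends only on the histogram of gray levels and not on their absolute values, a map like this is less sensitive to overall brightness differences between image sets, which is consistent with the claim that parameters need not be retuned.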
A combination of local image entropy and an object-tracking algorithm enabled highly objective and productive detection of nuclei in a set of 4D DIC microscope images of C. elegans embryos. The system will facilitate genomic and computational analyses of C. elegans embryos.
The cell biological events that guide early embryonic development occur with great precision within species but can be quite diverse across species. How these cellular processes evolve and which molecular components underlie evolutionary changes is poorly understood. To begin to address these questions, we systematically investigated early embryogenesis, from the one- to the four-cell embryo, in 34 nematode species related to C. elegans. We found 40 cell-biological characters that captured the phenotypic differences between these species. By tracing the evolutionary changes on a molecular phylogeny, we found that these characters evolved multiple times and independently of one another. Strikingly, all these phenotypes are mimicked by single-gene RNAi experiments in C. elegans. We use these comparisons to hypothesize the molecular mechanisms underlying the evolutionary changes. For example, we predict that a cell polarity module was altered during the evolution of the Protorhabditis group and show that PAR-1, a kinase localized asymmetrically in C. elegans early embryos, is symmetrically localized in the one-cell stage of Protorhabditis group species. Our genome-wide approach identifies candidate molecules—and thereby modules—associated with evolutionary changes in cell-biological phenotypes.
Nematoda; C. elegans; embryogenesis; early development; phenotypic analysis; cell polarity; phenotypic plasticity
Hemispheric asymmetry of hippocampal volume is a common finding that has biological relevance, including associations with dementia and cognitive performance. However, a recent study has reported the possibility of systematic error in measurements of hippocampal asymmetry by magnetic resonance volumetry. We manually traced the volumes of the anterior and posterior hippocampus in 40 healthy people to measure systematic error related to image orientation. We found a bias due to the side of the screen on which the hippocampus was viewed, such that hippocampal volume was larger when traced on the left side of the screen than when traced on the right (p = 0.05). However, this bias was smaller than the anatomical right > left asymmetry of the anterior hippocampus. We found right > left asymmetry of hippocampal volume regardless of image presentation (radiological versus neurological). We conclude that manual segmentation protocols can minimize the effect of image orientation in the study of hippocampal volume asymmetry, but our confirmation that such bias exists suggests strategies to avoid it in future studies.
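Hemispheric asymmetry is commonly summarized with an asymmetry index that scales the right-left difference by the mean volume, and a display bias can be estimated from paired tracings of the same structure shown on opposite sides of the screen (for instance, under radiological versus neurological display conventions). The formula and the example volumes below are assumptions for illustration; the study's exact index and statistics are not given here.

```python
from statistics import mean


def asymmetry_index(right, left):
    """(R - L) divided by the mean of R and L; positive means right > left."""
    return (right - left) / ((right + left) / 2)


def screen_side_bias(pairs):
    """Mean paired difference for the same hippocampus traced twice.

    pairs -- (volume traced on the left of the screen,
              volume traced on the right of the screen)
    A positive result means tracings on the left side of the screen were larger.
    """
    return mean(left - right for left, right in pairs)


# Hypothetical volumes in mm^3.
ai = asymmetry_index(right=3100.0, left=2900.0)
bias = screen_side_bias([(1510.0, 1490.0), (1605.0, 1600.0)])
```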
hippocampus; asymmetry; MRI volumetry; segmentation; asymmetry bias
Manual curation of experimental data from the biomedical literature is an expensive and time-consuming endeavor. Nevertheless, most biological knowledge bases still rely heavily on manual curation for data extraction and entry. Text mining software that can semi- or fully automate information retrieval from the literature would thus provide a significant boost to manual curation efforts.
We employ the Textpresso category-based information retrieval and extraction system, developed by WormBase, to explore how Textpresso might improve the efficiency with which we manually curate C. elegans proteins to the Gene Ontology's Cellular Component Ontology. Using a training set of sentences that describe results of localization experiments in the published literature, we generated three new curation task-specific categories (Cellular Components, Assay Terms, and Verbs) containing words and phrases associated with reports of experimentally determined subcellular localization. We compared the results of manual curation to those of Textpresso queries that searched the full text of articles for sentences containing terms from each of the three new categories plus the name of a previously uncurated C. elegans protein, and found that Textpresso searches identified curatable papers with recall and precision rates of 79.1% and 61.8%, respectively (F-score of 69.5%), when compared to manual curation. Within those documents, Textpresso identified relevant sentences with recall and precision rates of 30.3% and 80.1% (F-score of 44.0%). From returned sentences, curators were able to make 66.2% of all possible experimentally supported GO Cellular Component annotations with 97.3% precision (F-score of 78.8%). Measuring the relative efficiencies of Textpresso-based versus manual curation, we find that Textpresso has the potential to increase curation efficiency by at least 8-fold, and perhaps as much as 15-fold, given differences in individual curatorial speed.
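The F-scores quoted above are the harmonic mean of recall and precision, F = 2RP / (R + P). The short check below recomputes two of them from the reported rates; because the inputs are themselves rounded, agreement is only expected to the reported precision.

```python
def f_score(recall, precision):
    """Balanced F-measure: the harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Sentence retrieval within curatable papers (recall 30.3%, precision 80.1%).
print(round(f_score(0.303, 0.801), 3))  # 0.44
# Cellular Component annotations made from returned sentences.
print(round(f_score(0.662, 0.973), 3))  # 0.788
```

The harmonic mean is pulled toward the smaller of the two rates, which is why high sentence-level precision (80.1%) still yields a modest F-score when recall is low.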
Textpresso is an effective tool for improving the efficiency of manual, experimentally based curation. Incorporating a Textpresso-based Cellular Component curation pipeline at WormBase has allowed us to transition from strictly manual curation of this data type to a more efficient pipeline of computer-assisted validation. Continued development of curation task-specific Textpresso categories will provide an invaluable resource for genomics databases that rely heavily on manual curation.
The ever-increasing number of sequenced and annotated genomes has made management of their annotations a significant undertaking, especially for large eukaryotic genomes containing many thousands of genes. Typically, changes in gene and transcript numbers are used to summarize changes from release to release, but these measures say nothing about changes to individual annotations, nor do they provide any means to identify annotations in need of manual review.
In response, we have developed a suite of quantitative measures to better characterize changes to a genome's annotations between releases, and to prioritize problematic annotations for manual review. We have applied these measures to the annotations of five eukaryotic genomes over multiple releases – H. sapiens, M. musculus, D. melanogaster, A. gambiae, and C. elegans.
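The measures themselves are not spelled out here, so the following only illustrates the idea under an assumed nucleotide-level comparison: the bases covered by a transcript's exons in two releases are compared, sensitivity and specificity are averaged into a congruence score, and one minus that score gives a distance from 0 (identical) to 1 (no overlap). Annotations can then be ranked by distance for manual review. Coordinates are treated as inclusive, and the function names and gene IDs are hypothetical.

```python
def covered_bases(exons):
    """Set of genomic positions covered by (start, end) exons, inclusive."""
    bases = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def annotation_distance(old_exons, new_exons):
    """1 - (sensitivity + specificity) / 2 at the nucleotide level."""
    old, new = covered_bases(old_exons), covered_bases(new_exons)
    if not old or not new:
        return 1.0
    shared = len(old & new)
    sensitivity = shared / len(old)  # fraction of the old model retained
    specificity = shared / len(new)  # fraction of the new model already present
    return 1 - (sensitivity + specificity) / 2


def review_queue(releases):
    """releases -- dict gene ID -> (old exons, new exons).

    Returns gene IDs ordered from most to least changed.
    """
    return sorted(releases, key=lambda g: annotation_distance(*releases[g]), reverse=True)


queue = review_queue({"geneA": ([(1, 100)], [(1, 100)]),
                      "geneB": ([(1, 100)], [(1, 50)])})
```

Expanding every exon into a set of positions keeps the sketch short but is memory-hungry for large genomes; interval arithmetic on sorted exon coordinates would scale better.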
Our results provide the first detailed, historical overview of how these genomes' annotations have changed over the years, and demonstrate the usefulness of these measures for genome annotation management.
The developmental transcriptome of the Xenopus laevis intestine, from embryo to adult, reveals insights into the regulation of gut development in all vertebrates.
To adapt to its changing dietary environment, the digestive tract is extensively remodeled from the embryo to the adult during vertebrate development. Xenopus laevis metamorphosis is an excellent model system for studying mammalian gastrointestinal development and is used to determine the genes and signaling programs essential for intestinal development and maturation.
Intestinal metamorphosis can be divided into four distinct developmental time points, and each was analyzed with X. laevis microarrays. Because developmental signaling programs are highly conserved and X. laevis genes are homologous to mammalian genes, annotations and bioinformatics analysis were based on human orthologs. Clustering of the expression patterns revealed co-expressed genes involved in essential cell processes such as apoptosis and proliferation. The two largest gene clusters have, respectively, an expression peak and an expression trough at the climax of metamorphosis. Novel conserved gene ontology categories regulated during this period include transcriptional activity, signal transduction, and metabolic processes. Additionally, we identified larval/embryo- and adult-specific genes. Detailed analysis revealed 17 larval-specific genes that may represent molecular markers for human colonic cancers, while many adult-specific genes are associated with dietary enzymes.
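The clustering method is not specified here. A minimal sketch, assuming z-score normalized profiles over the four time points and k-means with caller-supplied starting centroids, groups genes whose expression peaks (or dips) at the same stage regardless of their absolute level. The example profiles are hypothetical.

```python
from statistics import mean, pstdev


def zscore(profile):
    """Center a gene's profile and scale it to unit (population) variance."""
    mu, sd = mean(profile), pstdev(profile)
    return [(v - mu) / sd if sd else 0.0 for v in profile]


def kmeans(profiles, centroids, iterations=50):
    """Plain k-means; deterministic because starting centroids are supplied."""
    labels = []
    for _ in range(iterations):
        # Assign each profile to the nearest centroid (squared Euclidean distance).
        labels = [min(range(len(centroids)),
                      key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
                  for p in profiles]
        # Move each centroid to the mean of its members; keep it if it has none.
        updated = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            updated.append([mean(col) for col in zip(*members)] if members else c)
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical expression at four time points; the third is the metamorphic climax.
raw = {"peak1": [1, 2, 8, 2], "peak2": [2, 3, 9, 3],
       "trough1": [8, 7, 1, 7], "trough2": [9, 8, 2, 8]}
profiles = [zscore(v) for v in raw.values()]
labels, _ = kmeans(profiles, centroids=[profiles[0], profiles[2]])
```

Normalizing each profile first means that a highly expressed gene and a weakly expressed gene with the same temporal shape land in the same cluster, which matches the goal of finding co-expressed genes rather than equally abundant ones.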
This global developmental expression study provides the first detailed molecular description of intestinal remodeling and maturation during postembryonic development, which should help improve our understanding of intestinal organogenesis and human diseases. This study significantly contributes towards our understanding of the dynamics of molecular regulation during development and tissue renewal, which is important for future basic and clinical research and for medicinal applications.
Using DNA sequences 5′ to open reading frames, we have constructed green fluorescent protein (GFP) fusions and generated spatial and temporal tissue expression profiles for 1,886 specific genes in the nematode Caenorhabditis elegans. This effort encompasses about 10% of all genes identified in this organism. GFP-expressing wild-type animals were analyzed at each stage of development from embryo to adult. We have identified 5′ DNA regions regulating expression at all developmental stages and in 38 different cell and tissue types in this organism. Among the regulatory regions identified are sequences that regulate expression in all cells, in specific tissues, in combinations of tissues, and in single cells. Most of the genes we have examined in C. elegans have human orthologs. All the images and expression pattern data generated by this project are available at WormAtlas (http://gfpweb.aecom.yu.edu/index) and through WormBase (http://www.wormbase.org).
Knowing where a protein is expressed provides an important clue about its potential function. As critical as this information is, we have complete developmental expression profiles for only a small fraction of all genes expressed in any metazoan. Here, we have generated spatial and temporal tissue expression profiles for 10% of all genes in the nematode Caenorhabditis elegans. Worms expressing putative gene regulatory elements fused with green fluorescent protein were analyzed at each stage of development from embryo to adult. Among the regulatory regions identified are sequences that regulate expression in all cells, in specific tissues, in combinations of tissues, and in single cells. Most of the genes we have examined in C. elegans have human orthologs. Our analysis of complex expression patterns for so many genes may not only facilitate functional analysis in C. elegans, but also create a foundation for decoding the informational hierarchies governing gene expression in all organisms.
Using DNA sequences 5′ to open reading frames, the authors construct green fluorescent protein fusions and generate spatial and temporal tissue expression profiles for 10% of all genes in the nematode Caenorhabditis elegans.
Gene expression measurements during the development of the fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns.
Using a semi-supervised approach, constrained clustering with mixture models, we can find clusters of genes exhibiting spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements are taken as primary data for which pairwise constraints are computed in an automated fashion from raw in situ images without the need for manual annotation. We investigate the influence of these pairwise constraints in the clustering and discuss the biological relevance of our results.
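The semi-supervised step can be sketched as follows, under stated assumptions: temporal profiles are points under a spherical Gaussian mixture; in situ images have already been reduced to equal-length binary masks (hypothetical preprocessing); soft must-link weights come from a Jaccard-overlap threshold; and constraints enter the E-step as a mean-field penalty that pulls linked genes toward the same cluster. This is one common way to build a constrained mixture model, not necessarily the model or the constraint computation used in the study, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Overlap of two binary (0/1) image masks of equal length."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def image_constraints(masks: List[Sequence[int]], threshold: float = 0.5) -> List[List[float]]:
    """Soft must-link weights: genes whose in situ masks overlap enough are linked."""
    n = len(masks)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = jaccard(masks[i], masks[j])
            if s >= threshold:
                w[i][j] = w[j][i] = s
    return w


def _log_gauss(x, mu, var):
    """Log density of a spherical Gaussian with variance var in every dimension."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mu))
    return -0.5 * (len(x) * math.log(2 * math.pi * var) + sq / var)


def constrained_em(X, w, k, lam=1.0, iters=30, sweeps=5):
    """Cluster temporal profiles X; the E-step adds lam * sum_j w[i][j] * r[j][c]."""
    n, d = len(X), len(X[0])
    # Deterministic farthest-point initialisation of the k means.
    mus = [list(X[0])]
    while len(mus) < k:
        far = max(X, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in mus))
        mus.append(list(far))
    var = [1.0] * k
    pi = [1.0 / k] * k
    r = [[1.0 / k] * k for _ in range(n)]
    for _ in range(iters):
        # E-step: mean-field fixed-point sweeps over the responsibilities r.
        for _ in range(sweeps):
            new_r = []
            for i in range(n):
                logits = [math.log(pi[c]) + _log_gauss(X[i], mus[c], var[c])
                          + lam * sum(w[i][j] * r[j][c] for j in range(n))
                          for c in range(k)]
                top = max(logits)
                e = [math.exp(v - top) for v in logits]
                total = sum(e)
                new_r.append([v / total for v in e])
            r = new_r
        # M-step: responsibility-weighted proportions, means and variances.
        for c in range(k):
            nk = sum(r[i][c] for i in range(n)) + 1e-12
            pi[c] = nk / n
            mus[c] = [sum(r[i][c] * X[i][t] for i in range(n)) / nk for t in range(d)]
            sq = sum(r[i][c] * sum((X[i][t] - mus[c][t]) ** 2 for t in range(d))
                     for i in range(n))
            var[c] = max(sq / (nk * d), 1e-3)
    return [max(range(k), key=lambda c: r[i][c]) for i in range(n)]
```

The weight `lam` sets how strongly the image-derived constraints override the temporal data: with `lam=0` this is an ordinary Gaussian mixture, while a large `lam` lets a gene whose time course falls between two clusters join the cluster of genes sharing its in situ pattern.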
Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.
The increasing complexity of clinical trials has led to rising costs for the investigators and organizations that author and administer those trials. The process of authoring a clinical trial protocol, the document that specifies the details of the study, is usually a manual task, and thus authors may introduce subtle errors in medical and procedural content. We have created a protocol inspection and critiquing tool (PICASSO) that evaluates the procedural aspects of a clinical trial protocol. To implement this tool, we developed a knowledge base for clinical trials that contains knowledge of the medical domain (diseases, drugs, lab tests, etc.) and of specific requirements for clinical trial protocols (eligibility criteria, patient treatments, and monitoring activities). We also developed a set of constraints, expressed in a formal language, that describe appropriate practices for authoring clinical trials. If a clinical trial designed with PICASSO violates any of these constraints, PICASSO generates a message to the user and a list of inconsistencies for each violated constraint. To test our methodology, we encoded portions of a hypothetical protocol and implemented designs consistent and inconsistent with known clinical trial practice. Our hope is that this methodology will be useful for standardizing new protocols and improving their quality.
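The constraint-checking loop described above can be sketched as follows. This is a minimal, hypothetical illustration: plain Python predicates stand in for the formal constraint language, the knowledge base is a toy dictionary, and the drug names, lab tests and constraint names are invented placeholders rather than PICASSO's actual content.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical knowledge base: lab tests that must be monitored for each drug.
REQUIRED_MONITORING: Dict[str, List[str]] = {
    "drug_a": ["lab_x"],
    "drug_b": ["lab_x", "lab_y"],
}
KNOWN_LAB_TESTS = {"lab_x", "lab_y", "lab_z"}


@dataclass
class Protocol:
    eligibility_tests: List[str] = field(default_factory=list)  # tests named in eligibility criteria
    treatments: List[str] = field(default_factory=list)         # drugs given to patients
    monitored_tests: List[str] = field(default_factory=list)    # tests in the monitoring schedule


Constraint = Callable[[Protocol], List[str]]


def treatments_are_monitored(p: Protocol) -> List[str]:
    """Every drug's required lab tests must appear in the monitoring schedule."""
    return [f"{drug} requires monitoring of {test}"
            for drug in p.treatments
            for test in REQUIRED_MONITORING.get(drug, [])
            if test not in p.monitored_tests]


def eligibility_uses_known_tests(p: Protocol) -> List[str]:
    """Eligibility criteria may only reference lab tests in the knowledge base."""
    return [f"unknown lab test in eligibility criteria: {t}"
            for t in p.eligibility_tests if t not in KNOWN_LAB_TESTS]


CONSTRAINTS: Dict[str, Constraint] = {
    "treatments_are_monitored": treatments_are_monitored,
    "eligibility_uses_known_tests": eligibility_uses_known_tests,
}


def critique(p: Protocol) -> Dict[str, List[str]]:
    """Map each violated constraint to its list of inconsistencies."""
    report: Dict[str, List[str]] = {}
    for name, check in CONSTRAINTS.items():
        issues = check(p)
        if issues:
            report[name] = issues
    return report
```

Keeping each constraint as an independent, named check mirrors the behavior described in the text: a protocol that satisfies every constraint yields an empty report, and each violated constraint yields its own list of inconsistencies.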
The Berkeley Drosophila Genome Project (BDGP) has produced a large number of gene expression patterns, many of which have been annotated textually with anatomical and developmental terms. These terms spatially correspond to local regions of the images; however, they are attached collectively to groups of images, so it is unknown which term is assigned to which region of which image in the group. This complicates the development of computational methods that automate the textual description of the expression patterns contained in each image. In this paper, we show that the underlying nature of this task matches well with a new machine learning framework, Multi-Instance Multi-Label learning (MIML). We propose a new MIML support vector machine to solve the problems that beset the annotation task. An empirical study shows that the proposed method outperforms the state-of-the-art Drosophila gene expression pattern annotation methods.
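The multi-instance, multi-label structure of the annotation task can be illustrated with a small sketch. Assumptions: each image in a group has already been reduced to a feature vector, and each term has a linear scoring model whose weights are hypothetical. The sketch shows only the standard multi-instance prediction rule (a group receives a term if at least one of its images scores positive) and how the highest-scoring image localizes a term; it is not the proposed MIML support vector machine, whose training is what would learn such scores from group-level labels.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Vector = List[float]
LinearModel = Tuple[Vector, float]  # hypothetical (weights, bias) for one annotation term


@dataclass
class Bag:
    images: List[Vector]  # one feature vector per image (instance) in the group
    terms: Set[str]       # terms attached to the whole group (training labels)


def score(model: LinearModel, x: Vector) -> float:
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_terms(bag: Bag, models: Dict[str, LinearModel]) -> Set[str]:
    """Multi-instance rule: the group gets a term if at least one image scores > 0."""
    return {term for term, m in models.items() if max(score(m, x) for x in bag.images) > 0}


def witness(bag: Bag, model: LinearModel) -> int:
    """Index of the image most responsible for a term (the highest-scoring one)."""
    return max(range(len(bag.images)), key=lambda i: score(model, bag.images[i]))
```

The `witness` function addresses the ambiguity described in the text: although terms are attached to a whole group, a trained per-term scorer can point to the individual image that most likely carries each term.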
Summary: Images containing spatial expression patterns illuminate the roles of different genes during embryogenesis. In order to generate initial clues to regulatory interactions, biologists frequently need to know the set of genes expressed at the same time at specific locations in a developing embryo, as well as related research publications. However, text-based mining of image annotations and research articles cannot produce all relevant results, because the primary data are images that exist as graphical objects. We have developed a unique knowledge base (FlyExpress) to facilitate visual mining of images from Drosophila melanogaster embryogenesis. By clicking on specific locations in pictures of fly embryos from different stages of development and different visual projections, users can produce a list of genes and publications instantly. In FlyExpress, each queryable embryo picture is a heat-map that captures the expression patterns of more than 4500 genes and more than 2600 published articles. In addition, one can view spatial patterns for particular genes over time as well as find other genes with similar expression patterns at a given developmental stage. Therefore, FlyExpress is a unique tool for mining spatiotemporal expression patterns in a format readily accessible to the scientific community.
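The click-to-query idea can be sketched with per-gene expression masks. Assumptions: every image has already been aligned to a common embryo outline for one stage and view, each gene's pattern is stored as a set of expressed pixel coordinates, and gene-to-article links are a simple dictionary with placeholder IDs. The function names are hypothetical; this is not FlyExpress's implementation.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) in the aligned embryo image


def build_heat_map(patterns: Dict[str, Set[Pixel]]) -> Counter:
    """Number of genes expressed at each pixel, i.e. the clickable heat-map."""
    heat: Counter = Counter()
    for pixels in patterns.values():
        heat.update(pixels)
    return heat


def genes_at(patterns: Dict[str, Set[Pixel]], pixel: Pixel) -> List[str]:
    """Genes expressed at the clicked location."""
    return sorted(g for g, px in patterns.items() if pixel in px)


def articles_at(patterns: Dict[str, Set[Pixel]], articles: Dict[str, List[str]],
                pixel: Pixel) -> List[str]:
    """Publications linked to any gene expressed at the clicked location."""
    return sorted({a for g in genes_at(patterns, pixel) for a in articles.get(g, [])})


def similar_genes(patterns: Dict[str, Set[Pixel]], gene: str,
                  top: int = 5) -> List[Tuple[str, float]]:
    """Other genes ranked by Jaccard overlap of their expression masks."""
    ref = patterns[gene]
    scores = []
    for other, px in patterns.items():
        if other == gene:
            continue
        union = ref | px
        scores.append((other, len(ref & px) / len(union) if union else 0.0))
    scores.sort(key=lambda s: (-s[1], s[0]))
    return scores[:top]
```

Because each query reduces to a set-membership test per gene, results can be returned immediately, and the same masks support the "find genes with similar patterns at a given stage" feature through set overlap.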
We show that Saccharomyces cerevisiae and Caenorhabditis elegans embryos suffer high lethality at low temperature because of cell cycle errors, and that anoxia-induced suspended animation prevents this lethality by preventing those errors from occurring.
The orderly progression through the cell division cycle is of paramount importance to all organisms, as improper progression through the cycle could result in defects with grave consequences. Previously, our lab has shown that model eukaryotes such as Saccharomyces cerevisiae, Caenorhabditis elegans, and Danio rerio all retain high viability after prolonged arrest in a state of anoxia-induced suspended animation, implying that in such a state, progression through the cell division cycle is reversibly arrested in an orderly manner. Here, we show that S. cerevisiae (both wild-type and several cold-sensitive strains) and C. elegans embryos exhibit a dramatic decrease in viability that is associated with dysregulation of the cell cycle when exposed to low temperatures. Further, we find that when the yeast or worms are first transitioned into a state of anoxia-induced suspended animation before cold exposure, the associated cold-induced viability defects are largely abrogated. We present evidence that by imposing an anoxia-induced reversible arrest of the cell cycle, the cells are prevented from engaging in aberrant cell cycle events in the cold, thus allowing the organisms to avoid the lethality that would have occurred in a cold, oxygenated environment.