The invariant lineage of the nematode Caenorhabditis elegans has potential as a powerful tool for the description of mutant phenotypes and gene expression patterns. We previously described procedures for the imaging and automatic extraction of the cell lineage from C. elegans embryos. That method uses time-lapse confocal imaging of a strain expressing histone-GFP fusions and a software package, StarryNite, that processes the thousands of images and produces output files describing the location and lineage relationship of each nucleus at each time point.
We have developed a companion software package, AceTree, which links the images and the annotations using tree representations of the lineage. This facilitates curation and editing of the lineage. AceTree also contains powerful visualization and interpretive tools, such as space filling models and tree-based expression patterning, that can be used to extract biological significance from the data.
By pairing a fast lineaging program written in C with a user interface program written in Java we have produced a powerful software suite for exploring embryonic development.
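As a rough illustration of how per-time-point nucleus records can be linked into a tree for browsing and editing, the sketch below assumes a hypothetical record layout of (time, name, parent name, position). It is not StarryNite's actual output format or AceTree's internal data model; the names `Cell`, `build_lineage` and `ancestry` are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Cell:
    name: str
    parent: "Cell | None" = None
    children: list = field(default_factory=list)
    # Maps time point -> (x, y, z) position of the nucleus (hypothetical layout).
    positions: dict = field(default_factory=dict)


def build_lineage(records):
    """Link records of the form (time, name, parent_name_or_None, (x, y, z)).

    A parent is assumed to appear at an earlier time point than its daughters.
    """
    cells = {}
    for time, name, parent_name, xyz in sorted(records, key=lambda r: r[0]):
        cell = cells.get(name)
        if cell is None:
            cell = Cell(name)
            cells[name] = cell
            if parent_name is not None:
                parent = cells[parent_name]
                cell.parent = parent
                parent.children.append(cell)
        cell.positions[time] = xyz
    return cells


def ancestry(cell):
    """Return the names from the root of the tree down to the given cell."""
    names = []
    while cell is not None:
        names.append(cell.name)
        cell = cell.parent
    return names[::-1]


# Toy records for the first divisions (positions are made up).
records = [
    (1, "P0", None, (0.0, 0.0, 0.0)),
    (2, "AB", "P0", (-1.0, 0.0, 0.0)),
    (2, "P1", "P0", (1.0, 0.0, 0.0)),
    (3, "ABa", "AB", (-1.5, 0.5, 0.0)),
    (3, "ABp", "AB", (-0.5, -0.5, 0.0)),
    (3, "P1", "P0", (1.2, 0.0, 0.0)),
]
cells = build_lineage(records)
print(ancestry(cells["ABa"]))
```

Keeping positions on the tree nodes means a lineage edit (renaming or re-parenting a cell) and the image annotation it refers to stay in one place, which is the kind of linkage the text describes.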
Comparative genomic analysis of important signaling pathways in C. briggsae and C. elegans reveals both conserved features and differences. To build a framework to address the significance of these features we determined the C. briggsae embryonic cell lineage, using the tools StarryNite and AceTree. We traced both cell divisions and cell positions for all cells through all but the last round of cell division and for selected cells through the final round. We found the lineage to be remarkably similar to that of C. elegans. Not only did the founder cells give rise to similar numbers of progeny, the relative cell division timing and positions were largely maintained. These lineage similarities appear to give rise to similar cell fates as judged both by the positions of lineally-equivalent cells and by the patterns of cell deaths in both species. However, some reproducible differences were seen, e.g., the P4 cell cycle length is more than 40% longer in C. briggsae than that in C. elegans (p < 0.01). The extensive conservation of embryonic development between such divergent species suggests that substantial evolutionary distance between these two species has not altered these early developmental cellular events, although the developmental defects of trans-species hybrids suggest that the details of the underlying molecular pathways have diverged sufficiently so as to not be interchangeable.
C. briggsae; C. elegans; embryo; cell lineage; signaling pathway
Motivation: Deciphering the regulatory and developmental mechanisms for multicellular organisms requires detailed knowledge of gene interactions and gene expressions. The availability of large datasets with both spatial and ontological annotation of the spatio-temporal patterns of gene expression in mouse embryo provides a powerful resource to discover the biological function of embryo organization. Ontological annotation of gene expressions consists of labelling images with terms from the anatomy ontology for mouse development. If the spatial genes of an anatomical component are expressed in an image, the image is then tagged with a term of that anatomical component. The current annotation is done manually by domain experts, which is both time consuming and costly. In addition, the level of detail is variable, and inevitably errors arise from the tedious nature of the task. In this article, we present a new method to automatically identify and annotate gene expression patterns in the mouse embryo with anatomical terms.
Results: The method takes images from in situ hybridization studies and the ontology for the developing mouse embryo; it then combines machine learning and image processing techniques to produce classifiers that automatically identify and annotate gene expression patterns in these images. We evaluate our method on image data from the EURExpress study, where we use it to automatically classify nine anatomical terms: humerus, handplate, fibula, tibia, femur, ribs, petrous part, scapula and head mesenchyme. The accuracy of our method lies between 70% and 80% with few exceptions. We show that other known methods have lower classification performance than ours. We have investigated the images misclassified by our method and found several cases where the original annotation was not correct. This shows our method is robust against this kind of noise.
Availability: The annotation result and the experimental dataset in the article can be freely accessed at http://www2.docm.mmu.ac.uk/STAFF/L.Han/geneannotation/.
Supplementary Information: Supplementary data are available at Bioinformatics online.
Motivation: Advances in high-resolution microscopy have recently made possible the analysis of gene expression at the level of individual cells. The fixed lineage of cells in the adult worm Caenorhabditis elegans makes this organism an ideal model for studying complex biological processes like development and aging. However, annotating individual cells in images of adult C.elegans typically requires expertise and significant manual effort. Automation of this task is therefore critical to enabling high-resolution studies of a large number of genes.
Results: In this article, we describe an automated method for annotating a subset of 154 cells (including various muscle, intestinal and hypodermal cells) in high-resolution images of adult C.elegans. We formulate the task of labeling cells within an image as a combinatorial optimization problem, where the goal is to minimize a scoring function that compares cells in a test input image with cells from a training atlas of manually annotated worms according to various spatial and morphological characteristics. We propose an approach for solving this problem based on reduction to minimum-cost maximum-flow and apply a cross-entropy–based learning algorithm to tune the weights of our scoring function. We achieve 84% median accuracy across a set of 154 cell labels in this highly variable system. These results demonstrate the feasibility of the automatic annotation of microscopy-based images in adult C.elegans.
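The reduction to minimum-cost maximum-flow can be sketched on a toy problem. The example below assumes a simplified setting in which every detected cell receives exactly one label and the scoring function has already been collapsed into a cost matrix; the network construction and scoring terms actually used by the authors are not given in the abstract, so the function name `min_cost_assignment` and the numbers are purely illustrative.

```python
def min_cost_assignment(cost):
    """Assign each row (detected cell) a distinct column (label) at minimum total cost.

    cost[i][j] is a hypothetical dissimilarity between cell i and atlas label j.
    Solved as min-cost max-flow with successive shortest augmenting paths.
    """
    n, m = len(cost), len(cost[0])
    source, sink = n + m, n + m + 1
    # Each edge is [target, residual capacity, cost, index of reverse edge].
    graph = [[] for _ in range(n + m + 2)]

    def add_edge(u, v, cap, c):
        graph[u].append([v, cap, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])

    for i in range(n):
        add_edge(source, i, 1, 0)
        for j in range(m):
            add_edge(i, n + j, 1, cost[i][j])
    for j in range(m):
        add_edge(n + j, sink, 1, 0)

    total = 0
    while True:
        # Bellman-Ford: residual edges may carry negative costs.
        dist = [float("inf")] * len(graph)
        prev = [None] * len(graph)
        dist[source] = 0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if dist[u] == float("inf"):
                    continue
                for k, (v, cap, c, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        prev[v] = (u, k)
                        changed = True
            if not changed:
                break
        if prev[sink] is None:
            break  # no augmenting path left: the flow is maximal
        # Push one unit of flow back along the shortest path.
        v = sink
        while v != source:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= 1
            graph[v][edge[3]][1] += 1
            v = u
        total += dist[sink]

    # A saturated cell-to-label edge means that label was chosen for the cell.
    assignment = {}
    for i in range(n):
        for v, cap, _c, _rev in graph[i]:
            if n <= v < n + m and cap == 0:
                assignment[i] = v - n
    return assignment, total


toy_cost = [[4, 1, 3],
            [2, 0, 5],
            [3, 2, 2]]
assignment, total = min_cost_assignment(toy_cost)
print(assignment, total)
```

Greedy matching would give cell 1 its cheapest label (label 1, cost 0) and end with a total of 6; the flow formulation reroutes that choice and reaches the optimum of 5.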
Motivation: Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a Drosophila melanogaster embryo delivers the detailed spatio-temporal patterns of the gene expression. Many related biological problems such as the detection of co-expressed genes, co-regulated genes and transcription factor binding motifs rely heavily on the analysis of these image patterns. To provide text-based pattern searching that facilitates related biological studies, the images in the Berkeley Drosophila Genome Project (BDGP) study are annotated with developmental stage terms and anatomical ontology terms manually by domain experts. Due to the rapid increase in the number of such images and the inevitably biased annotations by human curators, it is necessary to develop an automatic method to recognize the developmental stage and annotate anatomical terms.
Results: In this article, we propose a novel computational model for joint stage classification and anatomical term annotation of Drosophila gene expression patterns. We propose a novel Tri-Relational Graph (TG) model that comprises the data graph, anatomical term graph and developmental stage term graph, and connects them by two additional graphs induced from stage or annotation label assignments. Upon the TG model, we introduce a Preferential Random Walk (PRW) method to jointly recognize developmental stage and annotate anatomical terms by utilizing the interrelations between the two tasks. The experimental results on two refined BDGP datasets demonstrate that our joint learning method achieves better prediction results on both tasks than the state-of-the-art methods.
Fluorescent and bioluminescent time-lapse microscopy approaches have been successfully used to investigate molecular mechanisms underlying the mammalian circadian oscillator at the single cell level. However, most of the available software and common methods based on intensity-threshold segmentation and frame-to-frame tracking are not applicable in these experiments. This is due to cell movement and dramatic changes in the fluorescent/bioluminescent reporter protein during the circadian cycle, with the lowest expression level very close to the background intensity. At present, the standard approach to analyze data sets obtained from time lapse microscopy is either manual tracking or application of generic image-processing software/dedicated tracking software. To our knowledge, these existing software solutions for manual and automatic tracking have strong limitations in tracking individual cells if their plane shifts.
In an attempt to improve existing methodology of time-lapse tracking of a large number of moving cells, we have developed a semi-automatic software package. It extracts the trajectory of the cells by tracking their displacements, delineates the cell nucleus or whole cell, and finally yields measurements of various features, like reporter protein expression level or cell displacement. As an example, we present here single cell circadian pattern and motility analysis of NIH3T3 mouse fibroblasts expressing a fluorescent circadian reporter protein. Using the Circadian Gene Express plugin, we performed fast and nonbiased analysis of large fluorescent time lapse microscopy datasets.
Our software solution, Circadian Gene Express (CGE), is easy to use and allows precise and semi-automatic tracking of moving cells over longer periods of time. In spite of significant circadian variations in protein expression with extremely low expression levels at the valley phase, CGE allows accurate and efficient recording of a large number of cell parameters, including level of reporter protein expression, velocity, direction of movement, and others. CGE proves to be useful for the analysis of widefield fluorescent microscopy datasets, as well as for bioluminescence imaging. Moreover, it might be easily adaptable for confocal image analysis by manually choosing one of the focal planes of each z-stack of the various time points of a time series.
CGE is a Java plugin for ImageJ; it is freely available at: http://bigwww.epfl.ch/sage/soft/circadian/.
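The motility measurements mentioned above (displacement, velocity, direction of movement) follow directly from a tracked trajectory. The sketch below is a stand-alone Python illustration, not code from the CGE plugin; it assumes a track is a list of (x, y) centroid positions in consecutive frames and that `dt` is the time between frames, and the function name `motion_features` is hypothetical.

```python
import math


def motion_features(track, dt):
    """Per-step displacement, speed and heading for one tracked cell.

    track: list of (x, y) centroids in consecutive frames (assumed layout).
    dt: time elapsed between two frames, in whatever unit the caller uses.
    """
    steps = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        dist = math.hypot(dx, dy)
        steps.append({
            "displacement": dist,
            "speed": dist / dt,
            # Angle of the step relative to the x axis, in degrees.
            "direction_deg": math.degrees(math.atan2(dy, dx)),
        })
    return steps


toy_track = [(0, 0), (3, 4), (3, 10)]
steps = motion_features(toy_track, dt=2)
for step in steps:
    print(step)
```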
This protocol and the accompanying software program called LEVER enable quantitative automated analysis of phase contrast time-lapse images of cultured neural stem cells. Images are captured at 5 min. intervals over a period of 5 to 15 days as the cells proliferate and differentiate. LEVER automatically segments, tracks and generates lineage trees of the stem cells from the image sequence. In addition to generating lineage trees capturing the population dynamics of clonal development, LEVER extracts quantitative phenotypic measurements of cell location, shape, movement, and size. When available, the system can include biomolecular markers imaged using fluorescence. It then displays the results to the user for highly efficient inspection and editing to correct any errors in the segmentation, tracking or lineaging. In order to enable high-throughput inspection, LEVER incorporates features for rapid identification of errors, and learning from user-supplied corrections to automatically identify and correct related errors.
stem cell; progenitor cell; clone; lineage; time-lapse recording; image sequence analysis; cell dynamics; lineage editing; stem cell tracking edit
Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way for studying the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were annotated with a variable number of anatomical terms manually using a controlled vocabulary. Considering that the number of available images is rapidly increasing, it is imperative to design computational methods to automate this task.
We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to utilize the existing information on pattern annotation and annotates images using a model that exploits correlations among terms. The proposed method can annotate images individually or in groups (e.g., according to the developmental stage). In addition, the proposed method can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods.
The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. The integration of existing annotation information from multiple embryonic views improves annotation performance.
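To make the bag-of-words representation concrete, the sketch below builds one histogram for a group of images by assigning each local feature vector to its nearest codebook entry and counting. It assumes the local features and the codebook are already available (for example from a clustering step); the feature extraction, codebook construction and annotation model used by the authors are not described in the abstract, and the names `nearest_codeword` and `bag_of_words` are hypothetical.

```python
from collections import Counter


def nearest_codeword(feature, codebook):
    """Index of the codebook vector closest to the feature (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])),
    )


def bag_of_words(image_group, codebook):
    """Normalized codeword histogram for a group of images.

    image_group: list of images, each given as a list of local feature vectors.
    """
    counts = Counter()
    for features in image_group:
        for feature in features:
            counts[nearest_codeword(feature, codebook)] += 1
    total = sum(counts.values()) or 1  # avoid dividing by zero for an empty group
    return [counts[k] / total for k in range(len(codebook))]


toy_codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
toy_group = [
    [(0.1, 0.0), (0.9, 1.2)],   # features from a first image
    [(4.8, 5.1), (5.2, 4.9)],   # features from a second image of the same group
]
histogram = bag_of_words(toy_group, toy_codebook)
print(histogram)
```

Because the histogram pools features over the whole group, a single fixed-length vector can stand for a set of images that share one set of annotation terms, which is the property the text relies on.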
Cellular processes, such as chromosome assembly, segregation and cytokinesis, are inherently dynamic. Time-lapse imaging of living cells, using fluorescent-labeled reporter proteins or differential interference contrast (DIC) microscopy, allows for the examination of the temporal progression of these dynamic events which is otherwise inferred from analysis of fixed samples [1,2]. Moreover, the study of the developmental regulation of cellular processes necessitates conducting time-lapse experiments on an intact organism during development. The Caenorhabditis elegans embryo is light-transparent and has a rapid, invariant developmental program with a known cell lineage [3], thus providing an ideal experimental model for studying questions in cell biology [4,5] and development [6-9]. C. elegans is amenable to genetic manipulation by forward genetics (based on random mutagenesis [10,11]) and reverse genetics to target specific genes (based on RNAi-mediated interference and targeted mutagenesis [12-15]). In addition, transgenic animals can be readily created to express fluorescently tagged proteins or reporters [16,17]. These traits combine to make it easy to identify the genetic pathways regulating fundamental cellular and developmental processes in vivo [18-21]. In this protocol we present methods for live imaging of C. elegans embryos using DIC optics or GFP fluorescence on a compound epifluorescent microscope. We demonstrate the ease with which readily available microscopes, typically used for fixed sample imaging, can also be applied for time-lapse analysis using open-source software to automate the imaging process.
We have built a digital nuclear atlas of the newly hatched, first larval stage (L1) of the wild type hermaphrodite of C. elegans at single cell resolution from confocal image stacks of 15 individuals. The atlas quantifies the stereotypy of the locations and provides for other statistics on the spatial patterns of the 357 nuclei that could be faithfully segmented and annotated of the 558 present at this developmental stage. Given this atlas we then developed an automated approach to assign cell names to each nucleus in a 3D image of an L1 worm. We achieve 86% accuracy in identifying the 357 nuclei automatically. This computational method is essential for high-throughput single cell analyses of the worm at post-embryonic stages, such as determining the expression of every gene in every cell during development from the L1 onward, or ablating or stimulating cells under computer control in a high-throughput functional screen.
Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experiences have proven that text mining can assist in its many phases, especially, in triage of relevant documents and extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research, which includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as the manual validation of extracted data. Preliminary results are presented for a data set of 2376 full texts from which >4500 gene expression events in cell or anatomical part have been extracted. Validation of half of this data resulted in a precision of ∼50% of the extracted data, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in the named-entity recognition and that a larger and more robust corpus is needed to achieve a better performance for event extraction.
Database URL: http://www.cellfinder.org/
Motivation: The centrosome is a dynamic structure in animal cells that serves as a microtubule organizing center during mitosis and also regulates cell-cycle progression and sets polarity cues. Automated and reliable tracking of centrosomes is essential for genetic screens that study the process of centrosome assembly and maturation in the nematode Caenorhabditis elegans.
Results: We have developed a fully automatic system for tracking and measuring fluorescently labeled centrosomes in 3D time-lapse images of early C.elegans embryos. Using a spinning disc microscope, we monitor the centrosome cycle in living embryos from the 1- up to the 16-cell stage at imaging intervals between 30 and 50 s. After establishing the centrosome trajectories with a novel method involving two layers of inference, we also automatically detect the nuclear envelope breakdown in each cell division and recognize the identities of the centrosomes based on the invariant cell lineage of C.elegans. To date, we have tracked centrosomes in over 500 wild type and mutant embryos with almost no manual correction required.
Availability: The centrosome tracking software along with test data is freely available at http://publications.mpi-cbg.de/itemPublication.html?documentId=4082
Manual chemical data curation from publications is error-prone and time consuming, and it makes up-to-date data sets hard to maintain. Automatic information extraction can be used as a tool to reduce these problems. Since chemical structures are usually described in images, information extraction needs to combine structure image recognition and text mining.
We have developed ChemEx, a chemical information extraction system. ChemEx processes both text and images in publications. The text annotator is able to extract compound, organism, and assay entities from text content, while structure image recognition enables translation of chemical raster images to a machine-readable format. A user can view annotated text along with summarized information on compounds, the organisms that produce those compounds, and assay tests.
ChemEx facilitates and speeds up chemical data curation by extracting compounds, organisms, and assays from a large collection of publications. The software and corpus can be downloaded from http://www.biotec.or.th/isl/ChemEx.
A main goal in understanding cell mechanisms is to explain the relationship among genes and related molecular processes through the combined use of technological platforms and bioinformatics analysis. High throughput platforms, such as microarrays, enable the investigation of the whole genome in a single experiment. There exist different kinds of microarray platforms that produce different types of binary data (images and raw data). Moreover, even for a single vendor, different chips are available. The analysis of microarray data requires an initial preprocessing phase (i.e. normalization and summarization) of raw data that makes them suitable for use on existing platforms, such as the TIGR M4 Suite. Nevertheless, the annotation of data with additional information, such as gene function, is needed to perform more powerful analysis. Raw data preprocessing and annotation are often performed in a manual and error-prone way. Moreover, many available preprocessing tools do not support annotation. Thus novel, platform-independent, and possibly open source tools enabling the semi-automatic preprocessing and annotation of microarray data are needed.
The paper presents μ-CS (Microarray Cel file Summarizer), a cross-platform tool for the automatic normalization, summarization and annotation of Affymetrix binary data. μ-CS is based on a client-server architecture. The μ-CS client is provided both as a plug-in of the TIGR M4 platform and as a Java standalone tool and enables users to read, preprocess and analyse binary microarray data, avoiding the manual invocation of external tools (e.g. the Affymetrix Power Tools), the manual loading of preprocessing libraries, and the management of intermediate files. The μ-CS server automatically updates the references to the summarization and annotation libraries that are provided to the μ-CS client before the preprocessing. The μ-CS server is based on the web services technology and can be easily extended to support more microarray vendors (e.g. Illumina).
Thus μ-CS users can directly manage binary data without worrying about locating and invoking the proper preprocessing tools and chip-specific libraries. Moreover, users of the μ-CS plugin for TM4 can manage Affymetrix binary files without using external tools, such as APT (Affymetrix Power Tools) and related libraries. Consequently, μ-CS offers four main advantages: (i) it avoids wasting time searching for the correct libraries, (ii) it reduces possible errors in the preprocessing and further analysis phases, e.g. due to the incorrect choice of parameters or the use of old libraries, (iii) it implements the annotation of preprocessed data, and finally, (iv) it may enhance the quality of further analysis since it provides the most updated annotation libraries. The μ-CS client is freely available as a plugin of the TM4 platform as well as a standalone application at the project web site (http://bioingegneria.unicz.it/M-CS).
Studies on malaria vector ecology and development/evaluation of vector control strategies often require measures of mosquito life history traits. Assessing the fecundity of malaria vectors can be carried out by counting eggs laid by Anopheles females. However, manually counting the eggs is time consuming, tedious, and error prone.
In this paper we present newly developed software for high precision automatic egg counting. The software, written in the Java programming language, provides a user-friendly interface and a complete online manual. It allows the inspection of results by the operator and includes proper tools for manual corrections. The user can in fact correct any details on the acquired results by a mouse click. Time saving is significant and errors due to loss of concentration are avoided.
The software was tested over 16 randomly chosen images from 2 different experiments. The results show that the proposed automatic method produces results that are close to the ground truth.
The proposed approaches demonstrated a very high level of robustness. The adoption of the proposed software package will save the bench scientist many hours of labor. The software needs no particular configuration and is freely available for download at: http://w3.ualg.pt/~hshah/eggcounter/.
Advances in microscopy and fluorescent reporters have allowed us to detect the onset of gene expression on a cell-by-cell basis in a systemic fashion. This information, however, is often encoded in large repositories of images, and developing ways to extract this spatiotemporal expression data is a difficult problem that often uses complex domain-specific methods for each individual data set. We present a more unified approach that incorporates general previous information into a hierarchical probabilistic model to extract spatiotemporal gene expression from 4D confocal microscopy images of developing Caenorhabditis elegans embryos. This approach reduces the overall error rate of our automated lineage tracing pipeline by 3.8-fold, allowing us to routinely follow the C. elegans lineage to later stages of development, where individual neuronal subspecification becomes apparent. Unlike previous methods that often use custom approaches that are organism specific, our method uses generalized linear models and extensions of standard reversible jump Markov chain Monte Carlo methods that can be readily extended to other organisms for a variety of biological inference problems relating to cell fate specification. This modeling approach is flexible and provides tractable avenues for incorporating additional previous information into the model for similar difficult high-fidelity/low error tolerance image analysis problems for systematically applied genomic experiments.
C. elegans; cell fate; gene expression; image analysis; lineage
Apoptotic cells in animals are engulfed by phagocytic cells and subsequently degraded inside phagosomes. To study the mechanisms controlling the degradation of apoptotic cells, we developed time-lapse imaging protocols in developing Caenorhabditis elegans embryos and established the temporal order of multiple events during engulfment and phagosome maturation. These include sequential enrichment on phagocytic membranes of phagocytic receptor cell death abnormal 1 (CED-1), large GTPase dynamin (DYN-1), phosphatidylinositol 3-phosphate (PI(3)P), and the small GTPase RAB-7, as well as the incorporation of endosomes and lysosomes to phagosomes. Two parallel genetic pathways are known to control the engulfment of apoptotic cells in C. elegans. We found that null mutations in each pathway not only delay or block engulfment, but also delay the degradation of engulfed apoptotic cells. One of the pathways, composed of CED-1, the adaptor protein CED-6, and DYN-1, controls the rate of enrichment of PI(3)P and RAB-7 on phagosomal surfaces and the formation of phagolysosomes. We further identified an essential role of RAB-7 in promoting the recruitment and fusion of lysosomes to phagosomes. We propose that RAB-7 functions as a downstream effector of the CED-1 pathway to mediate phagolysosome formation. Our work suggests that phagocytic receptors, which were thought to act specifically in initiating engulfment, also control phagosome maturation through the sequential activation of multiple effectors such as dynamin, PI(3)P, and Rab GTPases.
Cells undergoing programmed cell death, or apoptosis, within an animal are swiftly engulfed by phagocytes and degraded inside phagosomes, vesicles in which the apoptotic cell is bounded by the engulfing cell's membrane. Little is known about how the degradation process is triggered and controlled. We studied the degradation of apoptotic cells during the development of the nematode Caenorhabditis elegans. Aided by a newly developed live-cell imaging technique, we identified multiple cellular events occurring on phagosomal surfaces and tracked the initiation signal to CED-1, a phagocytic receptor known to recognize apoptotic cells and to initiate their engulfment. CED-1 activates DYN-1, a large GTPase, which further activates downstream events, leading intracellular organelles such as endosomes and lysosomes to deliver to phagosomes various molecules essential for the degradation of apoptotic cells. As well as establishing a temporal order of events that lead to the degradation of apoptotic cells, the results suggest that phagocytic receptors, in addition to initiating phagocytosis, promote phagosome maturation through the sequential activation of multiple effector molecules.
The authors have identified multiple cellular events leading to the degradation of engulfed apoptotic cells in the nematode C. elegans, and found that CED-1, a phagocytic receptor thought to specifically control apoptotic-cell engulfment, activates a signaling pathway that initiates phagosome maturation.
The annotation of newly sequenced bacterial genomes begins with running several automatic analysis methods, with major emphasis on the identification of protein-coding genes. DNA sequences are heterogeneous in local nucleotide composition and this leads sometimes to sequences being annotated as authentic genes when they are not protein-coding genes or are true but uncharacterized protein-coding genes. This first annotation step is generally followed by an expert manual annotation of the predicted genes. The genomic data (sequence and annotations) organized in an appropriate databank file format is subsequently submitted to an entry point of the International Nucleotide Sequence Database. These procedures are inevitably subject to mistakes, and this can lead to unintentional syntactic annotation errors being stored in public databanks. Here, we present a new web program, MICheck (MIcrobial genome Checker), that enables rapid verification of sets of annotated genes and frameshifts in previously published bacterial genomes. The web interface allows one easily to investigate the MICheck results, i.e. inaccurate or missed gene annotations: a graphical representation is drawn, in which the genomic context of a unique coding DNA sequence annotation or a predicted frameshift is given, using information on the coding potential (curves) and annotation of the neighbouring genes. We illustrate some capabilities of the MICheck site through the analysis of 20 bacterial genomes, 9 of which were selected for their ‘Reviewed’ status in the National Center for Biotechnology Information (NCBI) Reference Sequence Project (RefSeq). In the context of the numerous re-annotation projects for microbial genomes, this tool can be seen as a preliminary step before the functional re-annotation step to check quickly for missing or wrongly annotated genes. The MICheck website is accessible at the following address: .
The size of the protein sequence database has been exponentially increasing due to advances in genome sequencing. However, experimentally characterized proteins only constitute a small portion of the database, such that the majority of sequences have been annotated by computational approaches. Current automatic annotation pipelines inevitably introduce errors, making the annotations unreliable. Instead of such error-prone automatic annotations, functional interpretation should rely on annotations of ‘reference proteins’ that have been experimentally characterized or manually curated.
The Seq2Ref server uses BLAST to detect proteins homologous to a query sequence and identifies the reference proteins among them. Seq2Ref then reports publications with experimental characterizations of the identified reference proteins that might be relevant to the query. Furthermore, a plurality-based rating system is developed to evaluate the homologous relationships and rank the reference proteins by their relevance to the query.
The reference proteins detected by our server will lend insight into proteins of unknown function and provide extensive information to develop in-depth understanding of uncharacterized proteins. Seq2Ref is available at: http://prodata.swmed.edu/seq2ref.
Web server; Functional interpretation; Sequence homology; Reference protein; PubMed literature
Important events in embryonic development such as gastrulation, neurulation, and cranial neural crest development occur in ectodermal tissues during vertebrate embryonic development. Although the chicken embryo is a well-established model system in developmental biology, problems of accessibility of the ectoderm for experimental manipulation and an inability to generate gene knockouts previously impeded studies of gene regulation and key processes during chicken gastrulation and neurulation. The technique of in ovo electroporation permits genetic manipulation and provides a powerful animal model. However, the problem of accessibility to the ectoderm in ovo requires an ex ovo whole-embryo culture approach combined with electroporation. This Unit provides convenient and reproducible whole-embryo ex ovo culture and electroporation protocols. These chicken embryo culture protocols can be used not only for gene regulatory experiments, but also for time-lapse imaging of the dynamics of early vertebrate development.
chicken embryo; ex ovo culture; electroporation; morpholino; neural crest cells
Drosophila gene expression pattern images document the spatial and temporal dynamics of gene expression and are valuable tools for elucidating gene functions, interactions, and networks during Drosophila embryogenesis. To provide text-based pattern searching, the images in the Berkeley Drosophila Genome Project (BDGP) study are manually annotated with ontology terms by human curators. Because the number of images needing text descriptions is now rapidly increasing, we present a systematic approach for automating this task. We consider both an improved feature representation and a novel learning formulation to boost the annotation performance. For feature representation, we adapt the bag-of-words scheme commonly used in visual recognition problems so that the image group information in the BDGP study is retained. Moreover, images from multiple views can be integrated naturally in this representation. To reduce the quantization error caused by the bag-of-words representation, we propose an improved feature representation scheme based on the sparse learning technique. In the design of the learning formulation, we propose a local regularization framework that can incorporate the correlations among terms explicitly. We further show that the resulting optimization problem admits an analytical solution. Experimental results show that the representation based on sparse learning significantly outperforms the bag-of-words representation. Results also show that incorporating the term-term correlations consistently improves the annotation performance.
gene expression pattern; image annotation; bag-of-words; sparse learning; regularization
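The quantization error mentioned in the abstract above can be made concrete with a minimal sketch of a plain bag-of-words encoding. This is an illustration only, not the authors' pipeline: the function names, the toy two-word codebook, the Euclidean distance, and the choice to pool all descriptors of one image group into a single histogram are assumptions made for the example.

```python
import math


def nearest_codeword(descriptor, codebook):
    """Return (index, distance) of the codeword closest to the descriptor."""
    distances = [math.dist(descriptor, word) for word in codebook]
    index = min(range(len(codebook)), key=distances.__getitem__)
    return index, distances[index]


def bag_of_words(descriptors, codebook):
    """Encode a pooled set of local descriptors (hypothetically, all
    descriptors from one image group) as a normalized codeword histogram.

    Also returns the mean distance between each descriptor and its assigned
    codeword, i.e. the information lost by hard quantization.
    """
    counts = [0] * len(codebook)
    total_error = 0.0
    for descriptor in descriptors:
        index, distance = nearest_codeword(descriptor, codebook)
        counts[index] += 1
        total_error += distance
    n = len(descriptors)
    return [count / n for count in counts], total_error / n


if __name__ == "__main__":
    toy_codebook = [(0.0, 0.0), (10.0, 0.0)]  # hypothetical visual words
    toy_descriptors = [(1.0, 0.0), (9.0, 0.0), (10.0, 0.0), (0.0, 0.0)]
    histogram, error = bag_of_words(toy_descriptors, toy_codebook)
    print(histogram, error)
```

Each descriptor is forced onto exactly one codeword, so descriptors that lie between codewords contribute a non-zero error. A sparse representation would instead express each descriptor as a weighted combination of a few codewords, which is the motivation the abstract gives for moving beyond hard assignment.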
'Ted' (Trace editor) is a graphical editor for sequence and trace data from automated fluorescence sequencing machines. It provides facilities for viewing sequence and trace data (in top or bottom strand orientation), for editing the base sequence, for automated or manual trimming of the head (vector) and tail (uncertain data) from the sequence, for vertical and horizontal trace scaling, for keeping a history of sequence editing, and for output of the edited sequence. Ted has been used extensively in the C. elegans genome sequencing project, both as a stand-alone program and integrated into the Staden sequence assembly package, and has greatly improved the efficiency and accuracy of sequence editing. It runs in the X Windows environment on Sun workstations and is available from the authors. Ted currently supports sequence and trace data from the ABI 373A and Pharmacia A.L.F. sequencers.
Annotating genes and their products with Gene Ontology codes is an important area of research. One approach is to use the information available about these genes in the biomedical literature. The goal in this paper, based on this approach, is to develop automatic annotation methods that can supplement the expensive manual annotation processes currently in place.
Using a set of Support Vector Machine (SVM) classifiers, we achieved F-scores of 0.49, 0.41 and 0.33 for codes of the molecular function, cellular component and biological process GO hierarchies, respectively. We find that alternative term weighting strategies do not differ from each other in performance and that feature selection strategies reduce performance. The best thresholding strategy is one where a single threshold is picked for each hierarchy. Hierarchy level is important, especially for molecular function and biological process. The cellular component hierarchy stands apart from the other two in many respects. This may be due to fundamental differences in link semantics. This research shows that it is possible to beneficially exploit the hierarchical structures by defining and testing a relaxed criterion for classification correctness. Finally, it is possible to build classifiers for codes with very few associated documents but, as expected, a huge penalty is paid in performance.
The GO annotation problem is complex. Several key observations have been made, for example about topic drift, that may be important to consider in annotation strategies.
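To illustrate how an F-score changes under a relaxed correctness criterion that exploits the hierarchy, the following is a minimal sketch. The abstract does not define the relaxed criterion, so the rule used here (a predicted code also counts as correct when it is a direct parent or child of a true code) is an assumption, as are the function names and the toy code labels.

```python
def is_match(code_a, code_b, parents=None):
    """Exact match, or (only when `parents` is given) a direct parent/child
    relation. `parents` maps a code to the set of its direct parents; this
    relaxed rule is an illustrative assumption, not the paper's definition."""
    if code_a == code_b:
        return True
    if parents is None:
        return False
    return code_b in parents.get(code_a, ()) or code_a in parents.get(code_b, ())


def f_score(true_codes, predicted_codes, parents=None):
    """Harmonic mean of precision and recall over two sets of codes."""
    if not true_codes or not predicted_codes:
        return 0.0
    matched_predictions = sum(
        any(is_match(p, t, parents) for t in true_codes) for p in predicted_codes
    )
    matched_truths = sum(
        any(is_match(t, p, parents) for p in predicted_codes) for t in true_codes
    )
    precision = matched_predictions / len(predicted_codes)
    recall = matched_truths / len(true_codes)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    toy_parents = {"child_code": {"parent_code"}}  # hypothetical hierarchy
    truth, predicted = {"child_code"}, {"parent_code"}
    print(f_score(truth, predicted))               # strict criterion
    print(f_score(truth, predicted, toy_parents))  # relaxed criterion
```

Under the strict criterion a prediction one level away from the true code counts as a complete miss, whereas the relaxed variant credits it, which is one way hierarchy level could affect measured performance.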
Cell number changes during normal development and in disease (e.g., neurodegeneration, cancer). Many genes affect cell number, so functional genetic analysis frequently requires analysis of cell number alterations upon loss-of-function mutations or in gain-of-function experiments. Drosophila is a powerful model organism for investigating in vivo the function of genes involved in development or disease. Image processing and pattern recognition techniques can be used to extract information from microscopy images and automatically quantify distinct cellular features, but these methods are not yet widely used in this model organism. Thus cellular quantification is often carried out manually, which is laborious, tedious, error-prone, or simply not humanly feasible. Here, we present DeadEasy Mito-Glia, an image processing method to automatically count the number of mitotic cells labelled with anti-phospho-histone H3 and of glial cells labelled with anti-Repo in Drosophila embryos. This programme belongs to the DeadEasy suite, for which we have previously developed versions to count apoptotic cells and neuronal nuclei. Having separate programmes is paramount for accuracy. DeadEasy Mito-Glia is very easy to use, fast, objective and very accurate when counting dividing cells and glial cells labelled with a nuclear marker. Although this method has been validated for Drosophila embryos, we provide an interactive window for biologists to easily extend its application to other nuclear markers and other sample types. DeadEasy Mito-Glia is freely available as an ImageJ plug-in. It increases the repertoire of tools for in vivo genetic analysis and will be of interest to a broad community of developmental, cancer and neuro-biologists.
The ability to detect nuclei in embryos is essential for studying the development of multicellular organisms. A system of automated nuclear detection has already been tested on a set of four-dimensional (4D) Nomarski differential interference contrast (DIC) microscope images of Caenorhabditis elegans embryos. However, the system needed laborious hand-tuning of its parameters every time a new image set was used. It could not detect nuclei in the process of cell division, and could detect nuclei only from the two- to eight-cell stages.
We developed a system that automates the detection of nuclei in a set of 4D DIC microscope images of C. elegans embryos. Local image entropy is used to produce regions of the images that have the image texture of the nucleus. From these regions, those that actually detect nuclei are manually selected at the first and last time points of the image set, and an object-tracking algorithm then selects regions that detect nuclei in between the first and last time points. The use of local image entropy makes the system applicable to multiple image sets without the need to change its parameter values. The use of an object-tracking algorithm enables the system to detect nuclei in the process of cell division. The system detected nuclei with high sensitivity and specificity from the one- to 24-cell stages.
A combination of local image entropy and an object-tracking algorithm enabled highly objective and productive detection of nuclei in a set of 4D DIC microscope images of C. elegans embryos. The system will facilitate genomic and computational analyses of C. elegans embryos.
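The local image entropy used above as a texture measure can be sketched as follows. This is a minimal illustration, not the published system: the square window, its radius, the thresholding step, and the assumption that nuclear texture is smoother (lower entropy) than its surroundings are all choices made for the example, and the object-tracking stage is not shown.

```python
import math
from collections import Counter


def local_entropy(image, radius=1):
    """Shannon entropy (in bits) of the grey levels inside a square window
    centred on each pixel. `image` is a list of rows of integer intensities;
    windows are clipped at the image border."""
    rows, cols = len(image), len(image[0])
    entropy_map = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            n = len(window)
            entropy_map[r][c] = sum(
                -(count / n) * math.log2(count / n)
                for count in Counter(window).values()
            )
    return entropy_map


def low_entropy_mask(entropy_map, threshold):
    """Mark pixels whose local entropy falls below a (hypothetical) threshold,
    i.e. candidate smooth-textured regions."""
    return [[value < threshold for value in row] for row in entropy_map]


if __name__ == "__main__":
    toy_image = [
        [3, 7, 3, 7],
        [7, 5, 5, 3],
        [3, 5, 5, 7],
        [7, 3, 7, 3],
    ]
    for row in local_entropy(toy_image, radius=1):
        print([round(value, 2) for value in row])
```

Because entropy depends on the distribution of grey levels in a window rather than on their absolute values, a measure of this kind is less sensitive to overall brightness differences between image sets, which is consistent with the abstract's point that parameter values need not be re-tuned.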