The invariant lineage of the nematode Caenorhabditis elegans has potential as a powerful tool for the description of mutant phenotypes and gene expression patterns. We previously described procedures for the imaging and automatic extraction of the cell lineage from C. elegans embryos. That method uses time-lapse confocal imaging of a strain expressing histone-GFP fusions; a software package, StarryNite, then processes the thousands of images and produces output files that describe the location and lineage relationship of each nucleus at each time point.
We have developed a companion software package, AceTree, which links the images and the annotations using tree representations of the lineage. This facilitates curation and editing of the lineage. AceTree also contains powerful visualization and interpretive tools, such as space filling models and tree-based expression patterning, that can be used to extract biological significance from the data.
By pairing a fast lineaging program written in C with a user interface program written in Java we have produced a powerful software suite for exploring embryonic development.
Comparative genomic analysis of important signaling pathways in C. briggsae and C. elegans reveals both conserved features and differences. To build a framework to address the significance of these features, we determined the C. briggsae embryonic cell lineage using the tools StarryNite and AceTree. We traced both cell divisions and cell positions for all cells through all but the last round of cell division, and for selected cells through the final round. We found the lineage to be remarkably similar to that of C. elegans. Not only did the founder cells give rise to similar numbers of progeny, but the relative cell division timing and positions were also largely maintained. These lineage similarities appear to give rise to similar cell fates, as judged both by the positions of lineally equivalent cells and by the patterns of cell deaths in both species. However, some reproducible differences were seen, e.g., the P4 cell cycle length is more than 40% longer in C. briggsae than in C. elegans (p < 0.01). The extensive conservation of embryonic development between such divergent species suggests that the substantial evolutionary distance between these two species has not altered these early developmental cellular events, although the developmental defects of interspecies hybrids suggest that the details of the underlying molecular pathways have diverged sufficiently so as not to be interchangeable.
C. briggsae; C. elegans; embryo; cell lineage; signaling pathway
Motivation: Deciphering the regulatory and developmental mechanisms of multicellular organisms requires detailed knowledge of gene interactions and gene expression. The availability of large datasets with both spatial and ontological annotation of the spatio-temporal patterns of gene expression in the mouse embryo provides a powerful resource for discovering the biological function of embryo organization. Ontological annotation of gene expression consists of labelling images with terms from the anatomy ontology for mouse development. If a gene is expressed in an anatomical component shown in an image, the image is tagged with the term for that component. The current annotation is done manually by domain experts, which is both time-consuming and costly. In addition, the level of detail is variable, and errors inevitably arise from the tedious nature of the task. In this article, we present a new method to automatically identify and annotate gene expression patterns in the mouse embryo with anatomical terms.
Results: The method takes images from in situ hybridization studies and the ontology for the developing mouse embryo; it then combines machine learning and image processing techniques to produce classifiers that automatically identify and annotate gene expression patterns in these images. We evaluate our method on image data from the EURExpress study, where we use it to automatically classify nine anatomical terms: humerus, handplate, fibula, tibia, femur, ribs, petrous part, scapula and head mesenchyme. The accuracy of our method lies between 70% and 80%, with few exceptions. We show that other known methods have lower classification performance than ours. We have investigated the images misclassified by our method and found several cases where the original annotation was incorrect, showing that our method is robust against this kind of noise.
Availability: The annotation result and the experimental dataset in the article can be freely accessed at http://www2.docm.mmu.ac.uk/STAFF/L.Han/geneannotation/.
Supplementary Information: Supplementary data are available at Bioinformatics online.
Motivation: Advances in high-resolution microscopy have recently made possible the analysis of gene expression at the level of individual cells. The fixed lineage of cells in the adult worm Caenorhabditis elegans makes this organism an ideal model for studying complex biological processes like development and aging. However, annotating individual cells in images of adult C.elegans typically requires expertise and significant manual effort. Automation of this task is therefore critical to enabling high-resolution studies of a large number of genes.
Results: In this article, we describe an automated method for annotating a subset of 154 cells (including various muscle, intestinal and hypodermal cells) in high-resolution images of adult C.elegans. We formulate the task of labeling cells within an image as a combinatorial optimization problem, where the goal is to minimize a scoring function that compares cells in a test input image with cells from a training atlas of manually annotated worms according to various spatial and morphological characteristics. We propose an approach for solving this problem based on reduction to minimum-cost maximum-flow and apply a cross-entropy–based learning algorithm to tune the weights of our scoring function. We achieve 84% median accuracy across a set of 154 cell labels in this highly variable system. These results demonstrate the feasibility of the automatic annotation of microscopy-based images in adult C.elegans.
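The assignment step described above can be illustrated in miniature. The sketch below (with hypothetical cell names and coordinates) solves the same minimum-cost labeling problem by brute force over permutations; the paper's reduction to minimum-cost maximum-flow and its learned multi-feature scoring function are what make the approach tractable at 154 labels.

```python
from itertools import permutations

# Toy atlas: canonical (x, y, z) positions of three hypothetical cells.
atlas = {"BWM-L1": (10.0, 2.0, 1.0),
         "BWM-L2": (12.0, 2.5, 1.1),
         "int-1":  (20.0, 5.0, 0.5)}

# Detected cells in a test image (same space, slightly perturbed).
detected = [(10.3, 1.9, 1.0), (19.6, 5.2, 0.4), (11.8, 2.6, 1.2)]

def cost(p, q):
    """Squared Euclidean distance as a stand-in for the paper's learned
    spatial/morphological scoring function."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(detected, atlas):
    """Exhaustive minimum-cost bipartite assignment. At scale this is
    solved via min-cost max-flow; brute force suffices for a toy case."""
    labels = list(atlas)
    best = min(permutations(labels),
               key=lambda perm: sum(cost(c, atlas[l])
                                    for c, l in zip(detected, perm)))
    return dict(zip(range(len(detected)), best))

print(assign(detected, atlas))  # → {0: 'BWM-L1', 1: 'int-1', 2: 'BWM-L2'}
```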
A main goal in understanding cell mechanisms is to explain the relationships among genes and related molecular processes through the combined use of technological platforms and bioinformatics analysis. High-throughput platforms, such as microarrays, enable the investigation of the whole genome in a single experiment. Different kinds of microarray platforms exist, producing different types of binary data (images and raw data). Moreover, even within a single vendor, different chips are available. The analysis of microarray data requires an initial preprocessing phase (i.e. normalization and summarization) of raw data that makes them suitable for use in existing platforms, such as the TIGR M4 Suite. Nevertheless, the annotation of data with additional information, such as gene function, is needed to perform more powerful analyses. Raw data preprocessing and annotation are often performed in a manual and error-prone way. Moreover, many available preprocessing tools do not support annotation. Thus novel, platform-independent, and possibly open-source tools enabling the semi-automatic preprocessing and annotation of microarray data are needed.
The paper presents μ-CS (Microarray Cel file Summarizer), a cross-platform tool for the automatic normalization, summarization and annotation of Affymetrix binary data. μ-CS is based on a client-server architecture. The μ-CS client is provided both as a plug-in of the TIGR M4 platform and as a Java standalone tool and enables users to read, preprocess and analyse binary microarray data, avoiding the manual invocation of external tools (e.g. the Affymetrix Power Tools), the manual loading of preprocessing libraries, and the management of intermediate files. The μ-CS server automatically updates the references to the summarization and annotation libraries that are provided to the μ-CS client before the preprocessing. The μ-CS server is based on the web services technology and can be easily extended to support more microarray vendors (e.g. Illumina).
Thus μ-CS users can directly manage binary data without worrying about locating and invoking the proper preprocessing tools and chip-specific libraries. Moreover, users of the μ-CS plugin for TM4 can manage Affymetrix binary files without using external tools, such as APT (Affymetrix Power Tools) and related libraries. Consequently, μ-CS offers four main advantages: (i) it avoids wasting time searching for the correct libraries; (ii) it reduces possible errors in the preprocessing and further analysis phases, e.g. due to an incorrect choice of parameters or the use of outdated libraries; (iii) it implements the annotation of preprocessed data; and finally, (iv) it may enhance the quality of further analysis since it provides the most up-to-date annotation libraries. The μ-CS client is freely available as a plugin of the TM4 platform as well as a standalone application at the project web site (http://bioingegneria.unicz.it/M-CS).
Motivation: Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a Drosophila melanogaster embryo delivers the detailed spatio-temporal pattern of that gene's expression. Many related biological problems, such as the detection of co-expressed genes, co-regulated genes and transcription factor binding motifs, rely heavily on the analysis of these image patterns. To provide text-based pattern searching to facilitate related biological studies, the images in the Berkeley Drosophila Genome Project (BDGP) study are annotated manually by domain experts with a developmental stage term and anatomical ontology terms. Due to the rapid increase in the number of such images and the inevitable annotation bias introduced by human curators, it is necessary to develop an automatic method to recognize the developmental stage and annotate anatomical terms.
Results: In this article, we propose a novel computational model for joint stage classification and anatomical term annotation of Drosophila gene expression patterns. We propose a novel Tri-Relational Graph (TG) model that comprises the data graph, anatomical term graph, and developmental stage term graph, and connects them by two additional graphs induced from stage or annotation label assignments. Upon the TG model, we introduce a Preferential Random Walk (PRW) method to jointly recognize developmental stage and annotate anatomical terms by exploiting the interrelations between the two tasks. The experimental results on two refined BDGP datasets demonstrate that our joint learning method achieves superior prediction results on both tasks compared with state-of-the-art methods.
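The random-walk machinery behind PRW can be sketched generically. The toy graph and node names below are illustrative, not from the paper, and the plain random walk with restart shown here omits the preference weighting and the tri-relational structure:

```python
def random_walk_with_restart(adj, restart, alpha=0.15, iters=200):
    """Power iteration for a random walk with restart, the basic
    machinery behind preferential-random-walk label propagation.
    adj: {node: [neighbors]}; restart: seed distribution."""
    nodes = list(adj)
    p = dict(restart)
    for _ in range(iters):
        # Restart mass goes back to the seed nodes each step.
        nxt = {n: alpha * restart.get(n, 0.0) for n in nodes}
        for n in nodes:
            nbrs = adj[n]
            if not nbrs:
                continue
            share = (1 - alpha) * p.get(n, 0.0) / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        p = nxt
    return p

# Tiny hypothetical graph: an image node linked to two term nodes and
# a stage node; "humerus" is better connected to the seed's context.
adj = {
    "img":     ["humerus", "fibula", "stage14"],
    "humerus": ["img", "stage14"],
    "fibula":  ["img"],
    "stage14": ["img", "humerus"],
}
scores = random_walk_with_restart(adj, restart={"img": 1.0})
print(sorted(scores, key=scores.get, reverse=True))
```

The better-connected term node accumulates more stationary probability, which is the intuition PRW exploits when propagating stage and annotation labels jointly.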
Cellular processes, such as chromosome assembly, segregation and cytokinesis, are inherently dynamic. Time-lapse imaging of living cells, using fluorescently labeled reporter proteins or differential interference contrast (DIC) microscopy, allows the examination of the temporal progression of these dynamic events, which is otherwise inferred from analysis of fixed samples [1,2]. Moreover, studying the developmental regulation of cellular processes necessitates conducting time-lapse experiments on an intact organism during development. The Caenorhabditis elegans embryo is light-transparent and has a rapid, invariant developmental program with a known cell lineage [3], thus providing an ideal experimental model for studying questions in cell biology [4,5] and development [6-9]. C. elegans is amenable to genetic manipulation by forward genetics (based on random mutagenesis [10,11]) and reverse genetics targeting specific genes (based on RNAi-mediated interference and targeted mutagenesis [12-15]). In addition, transgenic animals can be readily created to express fluorescently tagged proteins or reporters [16,17]. These traits combine to make it easy to identify the genetic pathways regulating fundamental cellular and developmental processes in vivo [18-21]. In this protocol we present methods for live imaging of C. elegans embryos using DIC optics or GFP fluorescence on a compound epifluorescence microscope. We demonstrate the ease with which readily available microscopes, typically used for fixed-sample imaging, can also be applied to time-lapse analysis using open-source software to automate the imaging process.
The Drosophila gene expression pattern images document the spatial and temporal dynamics of gene expression, and they are valuable tools for explicating gene functions, interactions, and networks during Drosophila embryogenesis. To provide text-based pattern searching, the images in the Berkeley Drosophila Genome Project (BDGP) study are annotated with ontology terms manually by human curators. We present a systematic approach for automating this task, because the number of images needing text descriptions is now rapidly increasing. We consider both improved feature representation and a novel learning formulation to boost the annotation performance. For feature representation, we adapt the bag-of-words scheme commonly used in visual recognition problems so that the image group information in the BDGP study is retained. Moreover, images from multiple views can be integrated naturally in this representation. To reduce the quantization error caused by the bag-of-words representation, we propose an improved feature representation scheme based on the sparse learning technique. In the design of the learning formulation, we propose a local regularization framework that can incorporate the correlations among terms explicitly. We further show that the resulting optimization problem admits an analytical solution. Experimental results show that the representation based on sparse learning significantly outperforms the bag-of-words representation. Results also show that incorporation of the term-term correlations improves the annotation performance consistently.
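For readers unfamiliar with the bag-of-words representation, a minimal hard-assignment sketch (with made-up 2-D descriptors and codebook) looks like this; the paper's sparse-learning variant replaces the nearest-codeword vote with a sparse reconstruction over the codebook to reduce quantization error:

```python
def bag_of_words(features, codebook):
    """Hard-assignment bag-of-words: each local feature votes for its
    nearest codeword; the image becomes the normalized histogram."""
    hist = [0] * len(codebook)
    for f in features:
        # Squared Euclidean distance to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        hist[dists.index(min(dists))] += 1
    total = sum(hist)
    return [h / total for h in hist]

# Hypothetical 2-D local descriptors and a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = [(0.1, -0.1), (0.9, 1.2), (5.2, 4.8), (4.9, 5.1)]
print(bag_of_words(features, codebook))  # → [0.25, 0.25, 0.5]
```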
gene expression pattern; image annotation; bag-of-words; sparse learning; regularization
Fluorescent and bioluminescent time-lapse microscopy approaches have been used successfully to investigate the molecular mechanisms underlying the mammalian circadian oscillator at the single-cell level. However, most of the available software and common methods based on intensity-threshold segmentation and frame-to-frame tracking are not applicable in these experiments. This is due to cell movement and dramatic changes in the fluorescent/bioluminescent reporter protein during the circadian cycle, with the lowest expression level very close to the background intensity. At present, the standard approach to analyzing datasets obtained from time-lapse microscopy is either manual tracking or the application of generic image-processing software or dedicated tracking software. To our knowledge, these existing solutions for manual and automatic tracking have strong limitations in tracking individual cells when the cells shift out of the imaging plane.
In an attempt to improve the existing methodology for time-lapse tracking of a large number of moving cells, we have developed a semi-automatic software package. It extracts the trajectories of the cells by tracking their displacements, delineates the cell nucleus or whole cell, and finally yields measurements of various features, such as reporter protein expression level or cell displacement. As an example, we present here a single-cell circadian pattern and motility analysis of NIH3T3 mouse fibroblasts expressing a fluorescent circadian reporter protein. Using the Circadian Gene Express plugin, we performed fast and unbiased analysis of large fluorescent time-lapse microscopy datasets.
Our software solution, Circadian Gene Express (CGE), is easy to use and allows precise and semi-automatic tracking of moving cells over long periods of time. In spite of significant circadian variations in protein expression, with extremely low expression levels at the valley phase, CGE allows accurate and efficient recording of a large number of cell parameters, including the level of reporter protein expression, velocity, direction of movement, and others. CGE proves to be useful for the analysis of widefield fluorescent microscopy datasets, as well as for bioluminescence imaging. Moreover, it can easily be adapted for confocal image analysis by manually choosing one of the focal planes of each z-stack at the various time points of a time series.
CGE is a Java plugin for ImageJ; it is freely available at: http://bigwww.epfl.ch/sage/soft/circadian/.
This protocol and the accompanying software program called LEVER enable quantitative automated analysis of phase contrast time-lapse images of cultured neural stem cells. Images are captured at 5 min. intervals over a period of 5 to 15 days as the cells proliferate and differentiate. LEVER automatically segments, tracks and generates lineage trees of the stem cells from the image sequence. In addition to generating lineage trees capturing the population dynamics of clonal development, LEVER extracts quantitative phenotypic measurements of cell location, shape, movement, and size. When available, the system can include biomolecular markers imaged using fluorescence. It then displays the results to the user for highly efficient inspection and editing to correct any errors in the segmentation, tracking or lineaging. In order to enable high-throughput inspection, LEVER incorporates features for rapid identification of errors, and learning from user-supplied corrections to automatically identify and correct related errors.
stem cell; progenitor cell; clone; lineage; time-lapse recording; image sequence analysis; cell dynamics; lineage editing; stem cell tracking
Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way to study the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were manually annotated with a variable number of anatomical terms from a controlled vocabulary. Considering that the number of available images is rapidly increasing, it is imperative to design computational methods to automate this task.
We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to utilize the existing information on pattern annotation and annotates images using a model that exploits correlations among terms. The proposed method can annotate images individually or in groups (e.g., according to the developmental stage). In addition, the proposed method can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods.
The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. The integration of existing annotation information from multiple embryonic views improves annotation performance.
Advances in microscopy and fluorescent reporters have allowed us to detect the onset of gene expression on a cell-by-cell basis in a systematic fashion. This information, however, is often encoded in large repositories of images, and developing ways to extract this spatiotemporal expression data is a difficult problem that often uses complex domain-specific methods for each individual dataset. We present a more unified approach that incorporates general prior information into a hierarchical probabilistic model to extract spatiotemporal gene expression from 4D confocal microscopy images of developing Caenorhabditis elegans embryos. This approach reduces the overall error rate of our automated lineage tracing pipeline by 3.8-fold, allowing us to routinely follow the C. elegans lineage to later stages of development, where individual neuronal subspecification becomes apparent. Unlike previous methods, which often rely on custom, organism-specific approaches, our method uses generalized linear models and extensions of standard reversible jump Markov chain Monte Carlo methods that can be readily extended to other organisms for a variety of biological inference problems relating to cell fate specification. This modeling approach is flexible and provides tractable avenues for incorporating additional prior information into the model for similarly difficult high-fidelity/low-error-tolerance image analysis problems in systematically applied genomic experiments.
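As a point of reference, the sketch below shows plain Metropolis-Hastings on a one-parameter toy posterior, the basic sampler that reversible-jump MCMC extends with moves that change model dimension (e.g. proposing to add or remove a candidate nucleus). It is illustrative only and not the paper's model:

```python
import math
import random

def metropolis_hastings(log_post, init, steps=5000, scale=0.5, seed=0):
    """Random-walk Metropolis-Hastings over a single parameter.
    log_post: unnormalized log posterior; returns the full chain."""
    rng = random.Random(seed)
    x, lp = init, log_post(init)
    samples = []
    for _ in range(steps):
        prop = x + rng.gauss(0, scale)       # symmetric proposal
        lp2 = log_post(prop)
        # Accept with probability min(1, exp(lp2 - lp)).
        if rng.random() < math.exp(min(0.0, lp2 - lp)):
            x, lp = prop, lp2
        samples.append(x)
    return samples

# Toy posterior: standard normal log-density (up to a constant).
samples = metropolis_hastings(lambda x: -0.5 * x * x, init=0.0)
print(round(sum(samples) / len(samples), 2))
```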
C. elegans; cell fate; gene expression; image analysis; lineage
Apoptotic cells in animals are engulfed by phagocytic cells and subsequently degraded inside phagosomes. To study the mechanisms controlling the degradation of apoptotic cells, we developed time-lapse imaging protocols in developing Caenorhabditis elegans embryos and established the temporal order of multiple events during engulfment and phagosome maturation. These include sequential enrichment on phagocytic membranes of phagocytic receptor cell death abnormal 1 (CED-1), large GTPase dynamin (DYN-1), phosphatidylinositol 3-phosphate (PI(3)P), and the small GTPase RAB-7, as well as the incorporation of endosomes and lysosomes to phagosomes. Two parallel genetic pathways are known to control the engulfment of apoptotic cells in C. elegans. We found that null mutations in each pathway not only delay or block engulfment, but also delay the degradation of engulfed apoptotic cells. One of the pathways, composed of CED-1, the adaptor protein CED-6, and DYN-1, controls the rate of enrichment of PI(3)P and RAB-7 on phagosomal surfaces and the formation of phagolysosomes. We further identified an essential role of RAB-7 in promoting the recruitment and fusion of lysosomes to phagosomes. We propose that RAB-7 functions as a downstream effector of the CED-1 pathway to mediate phagolysosome formation. Our work suggests that phagocytic receptors, which were thought to act specifically in initiating engulfment, also control phagosome maturation through the sequential activation of multiple effectors such as dynamin, PI(3)P, and Rab GTPases.
Cells undergoing programmed cell death, or apoptosis, within an animal are swiftly engulfed by phagocytes and degraded inside phagosomes, vesicles in which the apoptotic cell is bounded by the engulfing cell's membrane. Little is known about how the degradation process is triggered and controlled. We studied the degradation of apoptotic cells during the development of the nematode Caenorhabditis elegans. Aided by a newly developed live-cell imaging technique, we identified multiple cellular events occurring on phagosomal surfaces and tracked the initiation signal to CED-1, a phagocytic receptor known to recognize apoptotic cells and to initiate their engulfment. CED-1 activates DYN-1, a large GTPase, which further activates downstream events, leading intracellular organelles such as endosomes and lysosomes to deliver to phagosomes various molecules essential for the degradation of apoptotic cells. As well as establishing a temporal order of events that lead to the degradation of apoptotic cells, the results suggest that phagocytic receptors, in addition to initiating phagocytosis, promote phagosome maturation through the sequential activation of multiple effector molecules.
The authors have identified multiple cellular events leading to the degradation of engulfed apoptotic cells in the nematode C. elegans, and found that CED-1, a phagocytic receptor thought to specifically control apoptotic-cell engulfment, activates a signaling pathway that initiates phagosome maturation.
We describe a protocol for fully automated detection and segmentation of asymmetric, presumed excitatory, synapses in serial electron microscopy images of the adult mammalian cerebral cortex, taken with the focused ion beam, scanning electron microscope (FIB/SEM). The procedure is based on interactive machine learning and only requires a few labeled synapses for training. The statistical learning is performed on geometrical features of 3D neighborhoods of each voxel and can fully exploit the high z-resolution of the data. On a quantitative validation dataset of 111 synapses in 409 images of 1948×1342 pixels with manual annotations by three independent experts the error rate of the algorithm was found to be comparable to that of the experts (0.92 recall at 0.89 precision). Our software offers a convenient interface for labeling the training data and the possibility to visualize and proofread the results in 3D. The source code, the test dataset and the ground truth annotation are freely available on the website http://www.ilastik.org/synapse-detection.
We present an interactive program called HistoStitcher© for accurate and rapid reassembly of histology fragments into a pseudo-whole digitized histological section. HistoStitcher© provides both an intuitive graphical interface to assist the operator in stitching adjacent histology fragments by selecting pairs of anatomical landmarks, and a set of computational routines for determining and applying an optimal linear transformation to generate the stitched image. Reconstruction of whole histological sections from images of slides containing smaller fragments is required in applications where preparation of whole sections of large tissue specimens is not feasible or efficient, and such whole mounts are required to facilitate (a) disease annotation and (b) image registration with radiological images. Unlike manual reassembly of image fragments in a general-purpose image editing program (such as Photoshop), HistoStitcher© provides memory-efficient operation on high-resolution digitized histology images and a highly flexible stitching process capable of producing more accurate results in less time. Further, by parameterizing the series of transformations determined by the stitching process, the stitching parameters can be saved, loaded at a later time, refined, reapplied to multi-resolution scans, or quickly transmitted to another site. In this paper, we describe in detail the design of HistoStitcher© and the mathematical routines used for calculating the optimal image transformation, and demonstrate its operation in stitching high-resolution histology quadrants of prostate specimens to form digitally reassembled whole histology sections for 8 different patient studies. To evaluate stitching quality, a 6-point scoring scheme, which assesses the alignment and continuity of anatomical structures important for disease annotation, was employed by three independent expert pathologists.
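The landmark-based transformation step can be illustrated with a simplified variant. The sketch below fits a least-squares rigid (rotation + translation) transform to 2D landmark pairs in pure Python; HistoStitcher's optimal linear transformation may additionally include scaling, and the coordinates here are invented:

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rigid transform mapping landmark pairs src -> dst,
    using the closed form: rotate about the centroids by
    theta = atan2(sum of cross products, sum of dot products)."""
    n = len(src)
    cxs = sum(p[0] for p in src) / n; cys = sum(p[1] for p in src) / n
    cxd = sum(q[0] for q in dst) / n; cyd = sum(q[1] for q in dst) / n
    s_cos = s_sin = 0.0
    for (x, y), (u, v) in zip(src, dst):
        x, y, u, v = x - cxs, y - cys, u - cxd, v - cyd
        s_cos += x * u + y * v   # sum of dot products
        s_sin += x * v - y * u   # sum of cross products
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = cxd - (c * cxs - s * cys)
    ty = cyd - (s * cxs + c * cys)
    return theta, (tx, ty)

# Landmarks on a fragment rotated 90 degrees and shifted by (5, 0).
src = [(0, 0), (1, 0), (0, 1)]
dst = [(5, 0), (5, 1), (4, 0)]
theta, t = fit_rigid_2d(src, dst)
print(round(math.degrees(theta)), [round(v, 6) for v in t])
```

With exact correspondences the fit recovers the 90-degree rotation and (5, 0) shift; with noisy landmarks it returns the least-squares compromise, which is the behavior a stitching tool needs.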
For 6 studies compared with this scheme, reconstructed sections generated via HistoStitcher scored higher than reconstructions generated by an expert pathologist using Photoshop.
whole mount histology; prostate cancer; reconstruction; reassembly; HistoStitcher; digital pathology
Drosophila melanogaster is a major model organism for investigating the function and interconnection of animal genes in the earliest stages of embryogenesis. Today, images capturing Drosophila gene expression patterns are being produced at a higher throughput than ever before. The analysis of spatial patterns of gene expression is most biologically meaningful when images from a similar time point during development are compared. Thus, the critical first step is to determine the developmental stage of an embryo. This information is also needed to observe and analyze expression changes over developmental time. Currently, the developmental stages (times) of embryos in images capturing spatial expression patterns are annotated manually, which is time- and labor-intensive. Embryos are often designated only into stage ranges, making the information on developmental time coarse. This makes downstream analyses inefficient and biological interpretations of similarities and differences in spatial expression patterns challenging, particularly when using automated tools for analyzing expression patterns of large numbers of images.
Results: Here, we present a new computational approach to annotating the developmental stage of Drosophila embryos in gene expression images. In an analysis of 3724 images, the new approach shows high accuracy (79%) in predicting the developmental stage correctly. In addition, it provides a stage score that enables one to annotate each embryo more finely, so that embryos are divided into early and late periods of development within standard stage demarcations. Stage scores for all images containing expression patterns of the same gene enable a direct way to view expression changes over developmental time for any gene. We show that the genome-wide expression maps generated using images from embryos in refined stages illuminate global gene activities and changes much better, and that more refined stage annotations improve our ability to interpret results when expression pattern matches are discovered between genes.
Availability and implementation: The software package is available for download at: http://www.public.asu.edu/∼jye02/Software/Fly-Project/.
Supplementary data are available at Bioinformatics online.
Important events in embryonic development such as gastrulation, neurulation, and cranial neural crest development occur in ectodermal tissues during vertebrate embryonic development. Although the chicken embryo is a well-established model system in developmental biology, problems of accessibility of the ectoderm for experimental manipulation and an inability to generate gene knockouts previously impeded studies of gene regulation and key processes during chicken gastrulation and neurulation. The technique of in ovo electroporation permits genetic manipulation and provides a powerful animal model. However, the problem of accessibility to the ectoderm in ovo requires an ex ovo whole-embryo culture approach combined with electroporation. This Unit provides convenient and reproducible whole-embryo ex ovo culture and electroporation protocols. These chicken embryo culture protocols can be used not only for gene regulatory experiments, but also for time-lapse imaging of the dynamics of early vertebrate development.
chicken embryo; ex ovo culture; electroporation; morpholino; neural crest cells
New genomes are being sequenced at an increasingly rapid rate, far outpacing the rate at which manual gene annotation can be performed. Automated genome annotation is thus necessitated by this growth in genome projects; however, full-fledged annotation systems are usually home-grown and customized to a particular genome. There is thus a renewed need for accurate ab initio gene prediction methods. However, it is apparent that fully ab initio methods fall short of the required level of sensitivity and specificity for a quality annotation. Evidence in the form of expressed sequences gives the single biggest improvement in accuracy when used to inform gene predictions. Here, we present a lightweight pipeline for first-pass gene prediction on newly sequenced genomes. The two main components are ASPic, a program that derives highly accurate, albeit not necessarily complete, EST-based transcript annotations from EST alignments, and GeneID, a standard gene prediction program, which we have modified to take intron annotations as evidence. The intron annotations derived from ASPic's CDS predictions are given to GeneID to constrain the exon-chaining process and produce predictions consistent with the underlying EST alignments. The pipeline was successfully tested on the entire C. elegans genome and the 44 ENCODE human pilot regions.
'Ted' (Trace editor) is a graphical editor for sequence and trace data from automated fluorescence sequencing machines. It provides facilities for viewing sequence and trace data (in top or bottom strand orientation), for editing the base sequence, for automated or manual trimming of the head (vector) and tail (uncertain data) from the sequence, for vertical and horizontal trace scaling, for keeping a history of sequence editing, and for output of the edited sequence. Ted has been used extensively in the C.elegans genome sequencing project, both as a stand-alone program and integrated into the Staden sequence assembly package, and has greatly aided in the efficiency and accuracy of sequence editing. It runs in the X windows environment on Sun workstations and is available from the authors. Ted currently supports sequence and trace data from the ABI 373A and Pharmacia A.L.F. sequencers.
We address performance evaluation practices for developing medical image analysis methods, in particular how to establish and share databases of medical images with verified ground truth and solid evaluation protocols. Such databases support the development of better algorithms, thorough method comparisons, and, consequently, technology transfer from research laboratories to clinical practice. For this purpose, we propose a framework consisting of reusable methods and tools for the laborious task of constructing a benchmark database. We provide a software tool for medical image annotation that helps collect the class label, spatial extent, and expert's confidence for lesions, and a method to appropriately combine the manual segmentations from multiple experts. The tool and all necessary functionality for method evaluation are provided as public software packages. As a case study, we utilized the framework and tools to establish the DiaRetDB1 V2.1 database for benchmarking diabetic retinopathy detection algorithms. The database contains a set of retinal images, ground truth based on information from multiple experts, and a baseline algorithm for the detection of retinopathy lesions.
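One plausible way to combine manual segmentations from multiple experts is a confidence-weighted pixelwise vote; the sketch below assumes binary lesion masks and optional per-expert confidence weights, and is a generic illustration rather than the paper's actual fusion method.

```python
import numpy as np

def fuse_segmentations(masks, confidences=None, threshold=0.5):
    """Combine binary lesion masks from multiple experts.

    masks: list of equally shaped 0/1 arrays, one per expert.
    confidences: optional per-expert weights (e.g. stated confidence);
    defaults to equal weights, i.e. a plain pixelwise majority vote.
    Hypothetical sketch of "appropriately combining" expert annotations.
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in masks])
    if confidences is None:
        confidences = np.ones(len(stack))
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()
    vote = np.tensordot(w, stack, axes=1)  # weighted mean per pixel
    return (vote >= threshold).astype(np.uint8)
```

With equal weights and three experts, a pixel is kept when at least two experts marked it, which matches the usual majority-vote baseline for ground-truth construction.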
Annotating genes and their products with Gene Ontology codes is an important area of research. One approach is to use the information available about these genes in the biomedical literature. Based on this approach, our goal in this paper is to develop automatic annotation methods that can supplement the expensive manual annotation processes currently in place.
Using a set of Support Vector Machine (SVM) classifiers, we were able to achieve F-scores of 0.49, 0.41 and 0.33 for codes of the molecular function, cellular component and biological process GO hierarchies, respectively. We find that alternative term-weighting strategies do not differ from each other in performance and that feature selection strategies reduce performance. The best thresholding strategy is one where a single threshold is picked for each hierarchy. Hierarchy level is important, especially for molecular function and biological process. The cellular component hierarchy stands apart from the other two in many respects, which may be due to fundamental differences in link semantics. This research shows that it is possible to beneficially exploit the hierarchical structures by defining and testing a relaxed criterion for classification correctness. Finally, it is possible to build classifiers for codes with very few associated documents, but as expected a huge penalty is paid in performance.
The GO annotation problem is complex. Several key observations have been made, for example about topic drift, that may be important to consider in annotation strategies.
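The per-hierarchy thresholding strategy reported to work best can be sketched as choosing one decision threshold for all codes in a hierarchy by maximizing micro-averaged F-score on held-out data. The function below is a simplified illustration; the SVM confidence scores and candidate thresholds are assumed inputs, not the paper's actual implementation.

```python
def f_score(tp, fp, fn):
    """Balanced F-measure; defined as 0 when there are no true positives."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_hierarchy_threshold(scores, labels, candidates):
    """Pick a single decision threshold for a whole GO hierarchy.

    scores: classifier confidences for (document, code) pairs.
    labels: matching 0/1 gold annotations.
    Returns the candidate threshold with the highest micro-averaged
    F-score. A hypothetical stand-in for per-hierarchy thresholding.
    """
    def f_at(t):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        return f_score(tp, fp, fn)

    return max(candidates, key=f_at)
```

Tuning one threshold per hierarchy rather than per code pools evidence across codes, which matters most for codes with very few associated documents.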
The developmental transcriptome of the Xenopus laevis intestine, from embryo to adult, reveals insights into the regulation of gut development in all vertebrates.
To adapt to its changing dietary environment, the digestive tract is extensively remodeled from the embryo to the adult during vertebrate development. Xenopus laevis metamorphosis is an excellent model system for studying mammalian gastrointestinal development, and we used it here to determine the genes and signaling programs essential for intestinal development and maturation.
The metamorphosing intestine can be divided into four distinct developmental time points, and these were analyzed with X. laevis microarrays. Because of the high level of conservation in developmental signaling programs and homology to mammalian genes, annotations and bioinformatics analysis were based on human orthologs. Clustering of the expression patterns revealed co-expressed genes involved in essential cell processes such as apoptosis and proliferation. The two largest clusters of genes have expression peaks and troughs, respectively, at the climax of metamorphosis. Novel conserved gene ontology categories regulated during this period include transcriptional activity, signal transduction, and metabolic processes. Additionally, we identified larval/embryo-specific and adult-specific genes. Detailed analysis revealed 17 larval-specific genes that may represent molecular markers for human colonic cancers, while many adult-specific genes are associated with dietary enzymes.
This global developmental expression study provides the first detailed molecular description of intestinal remodeling and maturation during postembryonic development, which should help improve our understanding of intestinal organogenesis and human diseases. This study significantly contributes towards our understanding of the dynamics of molecular regulation during development and tissue renewal, which is important for future basic and clinical research and for medicinal applications.
The ability to detect nuclei in embryos is essential for studying the development of multicellular organisms. A system of automated nuclear detection has already been tested on a set of four-dimensional (4D) Nomarski differential interference contrast (DIC) microscope images of Caenorhabditis elegans embryos. However, the system needed laborious hand-tuning of its parameters every time a new image set was used. It could not detect nuclei in the process of cell division, and could detect nuclei only from the two- to eight-cell stages.
We developed a system that automates the detection of nuclei in a set of 4D DIC microscope images of C. elegans embryos. Local image entropy is used to produce regions of the images that have the image texture of nuclei. From these regions, those that actually correspond to nuclei are manually selected at the first and last time points of the image set, and an object-tracking algorithm then selects regions that detect nuclei between the first and last time points. The use of local image entropy makes the system applicable to multiple image sets without the need to change its parameter values. The use of an object-tracking algorithm enables the system to detect nuclei in the process of cell division. The system detected nuclei with high sensitivity and specificity from the one- to 24-cell stages.
A combination of local image entropy and an object-tracking algorithm enabled highly objective and productive detection of nuclei in a set of 4D DIC microscope images of C. elegans embryos. The system will facilitate genomic and computational analyses of C. elegans embryos.
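The local-image-entropy step can be sketched as computing the Shannon entropy of pixel intensities in a sliding window: nuclear interiors in DIC images have a smoother texture, and hence different local entropy, than the granular cytoplasm. The window size and intensity binning below are illustrative assumptions, not the published parameter values.

```python
import numpy as np

def local_entropy(image, window=5):
    """Shannon entropy (bits) of pixel intensities in a sliding window.

    image: 2D uint8 array (one DIC focal plane at one time point).
    Thresholding the resulting entropy map would yield candidate
    nuclear regions; window size is a hypothetical choice.
    """
    half = window // 2
    padded = np.pad(image, half, mode="reflect")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            patch = padded[i:i + window, j:j + window]
            counts = np.bincount(patch.ravel(), minlength=256)
            p = counts[counts > 0] / patch.size
            out[i, j] = -np.sum(p * np.log2(p))
    return out
```

A perfectly uniform patch has zero entropy, while noisy cytoplasm-like texture scores high, so a single entropy threshold can separate the two without per-image-set retuning, which is the property the system exploits.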
The ever-increasing number of sequenced and annotated genomes has made management of their annotations a significant undertaking, especially for large eukaryotic genomes containing many thousands of genes. Typically, changes in gene and transcript numbers are used to summarize changes from release to release, but these measures say nothing about changes to individual annotations, nor do they provide any means to identify annotations in need of manual review.
In response, we have developed a suite of quantitative measures to better characterize changes to a genome's annotations between releases, and to prioritize problematic annotations for manual review. We have applied these measures to the annotations of five eukaryotic genomes over multiple releases – H. sapiens, M. musculus, D. melanogaster, A. gambiae, and C. elegans.
Our results provide the first detailed, historical overview of how these genomes' annotations have changed over the years, and demonstrate the usefulness of these measures for genome annotation management.
Using DNA sequences 5′ to open reading frames, we have constructed green fluorescent protein (GFP) fusions and generated spatial and temporal tissue expression profiles for 1,886 specific genes in the nematode Caenorhabditis elegans. This effort encompasses about 10% of all genes identified in this organism. GFP-expressing wild-type animals were analyzed at each stage of development from embryo to adult. We have identified 5′ DNA regions regulating expression at all developmental stages and in 38 different cell and tissue types in this organism. Among the regulatory regions identified are sequences that regulate expression in all cells, in specific tissues, in combinations of tissues, and in single cells. Most of the genes we have examined in C. elegans have human orthologs. All the images and expression pattern data generated by this project are available at WormAtlas (http://gfpweb.aecom.yu.edu/index) and through WormBase (http://www.wormbase.org).
Knowing where a protein is expressed provides an important clue about its potential function. As critical as this information is, we have complete developmental expression profiles for only a small fraction of all genes expressed in any metazoan. Here, we have generated spatial and temporal tissue expression profiles for 10% of all genes in the nematode Caenorhabditis elegans. Worms expressing putative gene regulatory elements fused with green fluorescent protein were analyzed at each stage of development from embryo to adult. Among the regulatory regions identified are sequences that regulate expression in all cells, in specific tissues, in combinations of tissues, and in single cells. Most of the genes we have examined in C. elegans have human orthologs. Our analysis of complex expression patterns for so many genes may not only facilitate functional analysis in C. elegans, but also create a foundation for decoding the informational hierarchies governing gene expression in all organisms.
Using DNA sequences 5′ to open reading frames, the authors construct green fluorescent protein fusions and generate spatial and temporal tissue expression profiles for 10% of all genes in the nematode Caenorhabditis elegans.