A gene expression atlas is an essential resource to quantify and understand the multiscale processes of embryogenesis in time and space. The automated reconstruction of a prototypic 4D atlas for vertebrate early embryos, using multicolor fluorescence in situ hybridization with nuclear counterstain, requires dedicated computational strategies. To this end, we designed an original methodological framework implemented in a software tool called Match-IT. With only minimal human supervision, our system is able to gather gene expression patterns observed in different analyzed embryos with phenotypic variability and map them onto a series of common 3D templates over time, creating a 4D atlas. This framework was used to construct an atlas composed of 6 gene expression templates from a cohort of zebrafish early embryos spanning 6 developmental stages from 4 to 6.3 hpf (hours post fertilization). The dataset comprised 53 specimens, 181,415 detected cell nuclei and 98 segmented gene expression patterns observed in 3D for 9 different genes. In addition, an interactive visualization software, Atlas-IT, was developed to inspect, supervise and analyze the atlas. Match-IT and Atlas-IT, including user manuals, representative datasets and video tutorials, are publicly and freely available online. We also propose computational methods and tools for the quantitative assessment of the gene expression templates at the cellular scale, with the identification, visualization and analysis of coexpression patterns, synexpression groups and their dynamics through developmental stages.
We propose a workflow to map the expression domains of multiple genes onto a series of 3D templates, or “atlas”, during early embryogenesis. It was applied to the zebrafish at different stages between 4 and 6.3 hpf, generating 6 templates. Our system overcomes the lack of significant morphological landmarks in early development by relying on the expression of a reference gene (goosecoid, gsc) and nuclear staining to guide the registration of the analyzed genes. The proposed method also successfully maps gene domains from partially imaged embryos, thus allowing greater microscope magnification and cellular resolution. By using the workflow to construct a spatiotemporal database of zebrafish, we opened the way to a systematic analysis of vertebrate embryogenesis. The atlas database, together with the mapping software (Match-IT), a custom-made visualization platform (Atlas-IT), and step-by-step user guides are available from the Supplementary Material. We expect that this will encourage other laboratories to generate, map, visualize and analyze new gene expression datasets.
Motivation: Regulation of gene expression in space and time directs its localization to a specific subset of cells during development. Systematic determination of the spatiotemporal dynamics of gene expression plays an important role in understanding the regulatory networks driving development. An atlas for the gene expression patterns of fruit fly Drosophila melanogaster has been created by whole-mount in situ hybridization, and it documents the dynamic changes of gene expression pattern during Drosophila embryogenesis. The spatial and temporal patterns of gene expression are integrated by anatomical terms from a controlled vocabulary linking together intermediate tissues developed from one another. Currently, the terms are assigned to patterns manually. However, the number of patterns generated by high-throughput in situ hybridization is rapidly increasing. It is, therefore, tempting to approach this problem by employing computational methods.
Results: In this article, we present a novel computational framework for annotating gene expression patterns using a controlled vocabulary. In the currently available high-throughput data, annotation terms are assigned to groups of patterns rather than to individual images. We propose to extract invariant features from images, and construct pyramid match kernels to measure the similarity between sets of patterns. To exploit the complementary information conveyed by different features and incorporate the correlation among patterns sharing common structures, we propose efficient convex formulations to integrate the kernels derived from various features. The proposed framework is evaluated by comparing its annotation with that of human curators, and promising performance in terms of F1 score has been reported.
Supplementary information: Supplementary data are available at Bioinformatics online.
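The pyramid match kernel named in the Results above can be illustrated with a minimal sketch. It assumes scalar (1-D) features and an unnormalized score; the invariant features, pyramid depth and kernel combination actually used in the study are not given in the abstract, so the function names and toy values below are hypothetical.

```python
from collections import Counter

def histogram(points, bin_width):
    """Count how many points fall into each bin of the given width."""
    return Counter(int(p // bin_width) for p in points)

def intersection(h1, h2):
    """Histogram intersection: number of points matched within shared bins."""
    return sum(min(count, h2[b]) for b, count in h1.items())

def pyramid_match(xs, ys, levels=4):
    """Unnormalized pyramid match score between two sets of 1-D features.

    Bin width doubles at each level; matches that first appear at level i
    are weighted by 1 / 2**i, so matches found in finer bins count more.
    """
    score = 0.0
    previous = 0
    for i in range(levels):
        width = 2 ** i
        current = intersection(histogram(xs, width), histogram(ys, width))
        score += (current - previous) / width  # only newly formed matches
        previous = current
    return score

# Toy feature sets (illustrative values only).
xs = [0.5, 3.2, 6.1]
ys = [0.7, 2.9, 7.8]
score_xy = pyramid_match(xs, ys)
score_xx = pyramid_match(xs, xs)
```

Because the score compares unordered sets of features, it can be applied to groups of images of different sizes, which fits the setting in which annotation terms are assigned to groups of patterns rather than to individual images.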
Datasets on gene expression are a valuable source of information about the functional state of an organism. Recently, we have acquired a large dataset on the expression of segmentation genes in the Drosophila blastoderm. To provide efficient access to the data, we have developed the FlyEx database (http://urchin.spbcas.ru/flyex). FlyEx contains 4716 images of 14 segmentation gene expression patterns obtained from 1579 embryos and 9 500 000 quantitative data records. Reference data are available for all segmentation genes in cycles 11–13 and all temporal classes of cycle 14A. FlyEx supports operations on images of gene expression patterns. The database can be used to examine the quality of data, analyze the dynamics of formation of segmentation gene expression domains, and estimate the variability of gene expression patterns. Currently, a user is able to monitor and analyze the dynamics of formation of segmentation gene expression domains over the whole period of segment determination, which amounts to 1.5 h of development. FlyEx supports data downloads and the construction of personal reference datasets, which makes it possible to use and analyze the data more effectively.
Kidney development is based on differential cell type specific expression of a vast number of genes. While multiple critical genes and pathways have been elucidated, a genomewide analysis of gene expression within individual cellular and anatomic structures is lacking. Accomplishing this could provide significant new insights into fundamental developmental mechanisms such as mesenchymal-epithelial transition, inductive signaling, branching morphogenesis and segmentation. We describe here a comprehensive gene expression atlas of the developing mouse kidney based on the isolation of each major compartment by either laser capture microdissection or fluorescent activated cell sorting, followed by microarray profiling. The resulting data agrees with known expression patterns and additional in situ hybridizations. This kidney atlas allows a comprehensive analysis of the progression of gene expression states during nephrogenesis, as well as discovery of novel growth factor-receptor interactions. In addition, the results provide deeper insight into the genetic regulatory mechanisms of kidney development.
Site-specific transcription factors (TFs) bind DNA regulatory elements to control expression of target genes, forming the core of gene regulatory networks. Despite decades of research, most studies focus on only a small number of TFs and the roles of many remain unknown.
We present a systematic characterization of spatiotemporal gene expression patterns for all known or predicted Drosophila TFs throughout embryogenesis, the first such comprehensive study for any metazoan animal. We generated RNA expression patterns for all 708 TFs by in situ hybridization, annotated the patterns using an anatomical controlled vocabulary, and analyzed TF expression in the context of organ system development. Nearly all TFs are expressed during embryogenesis and more than half are specifically expressed in the central nervous system. Compared to other genes, TFs are enriched early in the development of most organ systems, and throughout the development of the nervous system. Of the 535 TFs with spatially restricted expression, 79% are dynamically expressed in multiple organ systems while 21% show single-organ specificity. Of those expressed in multiple organ systems, 77 TFs are restricted to a single organ system either early or late in development. Expression patterns for 354 TFs are characterized for the first time in this study.
We produced a reference TF dataset for the investigation of gene regulatory networks in embryogenesis, and gained insight into the expression dynamics of the full complement of TFs controlling the development of each organ system.
The phosphatase of regenerating liver (PRL) family is classified as class IVa protein tyrosine phosphatases (PTP4A), which remove phosphate groups from phosphorylated tyrosine residues on proteins. PRL phosphatases have been implicated in a number of tumorigenesis and metastasis processes and are highly conserved. However, the understanding of PRL expression profiles during embryonic development is very limited.
In this study, we characterized the comprehensive expression patterns of Drosophila PRL, amphioxus PRL, and zebrafish PRLs during embryonic development by either whole mount immunostaining or in situ hybridization. Our results indicate that Drosophila PRL is mainly enriched in developing mid-guts and central nervous system (CNS) in embryogenesis. In amphioxus, the PRL gene is initially expressed ubiquitously during early embryogenesis, but its expression becomes restricted to the anterior neural tube in the cerebral vesicle. In zebrafish, PRL-1 and PRL-2 share similar expression patterns, most of which are in neuronal lineages. In contrast, the expression of zebrafish PRL-3 is more specific and preferential in muscle.
This study, for the first time, elucidated the embryonic expression pattern of Drosophila, amphioxus, and zebrafish PRL genes. The shared PRL expression pattern in the developing CNS among diverse animals suggests that PRL may play conserved roles in these animals for CNS development.
Phosphatase of regenerating liver; PTP4A; Embryogenesis; Drosophila; Zebrafish; Amphioxus
The manuscript describes the “digital transcriptome atlas” of the developing mouse embryo, a powerful resource to determine co-expression of genes, to identify cell populations and lineages and to identify functional associations between genes relevant to development and disease.
Ascertaining when and where genes are expressed is of crucial importance to understanding or predicting the physiological role of genes and proteins and how they interact to form the complex networks that underlie organ development and function. It is, therefore, crucial to determine on a genome-wide level, the spatio-temporal gene expression profiles at cellular resolution. This information is provided by colorimetric RNA in situ hybridization that can elucidate expression of genes in their native context and does so at cellular resolution. We generated what is to our knowledge the first genome-wide transcriptome atlas by RNA in situ hybridization of an entire mammalian organism, the developing mouse at embryonic day 14.5. This digital transcriptome atlas, the Eurexpress atlas (http://www.eurexpress.org), consists of a searchable database of annotated images that can be interactively viewed. We generated anatomy-based expression profiles for over 18,000 coding genes and over 400 microRNAs. We identified 1,002 tissue-specific genes that are a source of novel tissue-specific markers for 37 different anatomical structures. The quality and the resolution of the data revealed novel molecular domains for several developing structures, such as the telencephalon, a novel organization for the hypothalamus, and insight on the Wnt network involved in renal epithelial differentiation during kidney development. The digital transcriptome atlas is a powerful resource to determine co-expression of genes, to identify cell populations and lineages, and to identify functional associations between genes relevant to development and disease.
In situ hybridization (ISH) can be used to visualize gene expression in cells and tissues in their native context. High-throughput ISH using nonradioactive RNA probes allowed the Eurexpress consortium to generate a comprehensive, interactive, and freely accessible digital gene expression atlas, the Eurexpress transcriptome atlas (http://www.eurexpress.org), of the E14.5 mouse embryo. Expression data for over 15,000 genes were annotated for hundreds of anatomical structures, thus allowing us to systematically identify tissue-specific and tissue-overlapping gene networks. We illustrate the value of the Eurexpress atlas by finding novel regional subdivisions in the developing brain. We also use the transcriptome atlas to allocate specific components of the complex Wnt signaling pathway to kidney development, and we identify regionally expressed genes in liver that may be markers of hematopoietic stem cell differentiation.
Drosophila melanogaster is a major model organism for investigating the function and interconnection of animal genes in the earliest stages of embryogenesis. Today, images capturing Drosophila gene expression patterns are being produced at a higher throughput than ever before. The analysis of spatial patterns of gene expression is most biologically meaningful when images from a similar time point during development are compared. Thus, the critical first step is to determine the developmental stage of an embryo. This information is also needed to observe and analyze expression changes over developmental time. Currently, developmental stages (time) of embryos in images capturing spatial expression patterns are annotated manually, which is time- and labor-intensive. Embryos are often assigned to stage ranges, making the information on developmental time coarse. This makes downstream analyses inefficient and biological interpretations of similarities and differences in spatial expression patterns challenging, particularly when using automated tools for analyzing expression patterns in large numbers of images.
Results: Here, we present a new computational approach to annotate developmental stage for Drosophila embryos in the gene expression images. In an analysis of 3724 images, the new approach shows high accuracy in predicting the developmental stage correctly (79%). In addition, it provides a stage score that enables one to more finely annotate each embryo so that they are divided into early and late periods of development within standard stage demarcations. Stage scores for all images containing expression patterns of the same gene enable a direct way to view expression changes over developmental time for any gene. We show that the genomewide-expression-maps generated using images from embryos in refined stages illuminate global gene activities and changes much better, and more refined stage annotations improve our ability to better interpret results when expression pattern matches are discovered between genes.
Availability and implementation: The software package is available for download at: http://www.public.asu.edu/~jye02/Software/Fly-Project/.
Supplementary data are available at Bioinformatics online.
Massive amounts of image data have been collected and continue to be generated for representing cellular gene expression throughout the mouse brain. Critical to exploiting this key effort of the post-genomic era is the ability to place these data into a common spatial reference that enables rapid interactive queries, analysis, data sharing, and visualization. In this paper, we present a set of automated protocols for generating and annotating gene expression patterns suitable for the establishment of a database. The steps include imaging tissue slices, detecting cellular gene expression levels, spatial registration with an atlas, and textual annotation. Using high-throughput in situ hybridization to generate serial sets of tissues displaying gene expression, this process was applied towards the establishment of a database representing over 200 genes in the postnatal day 7 mouse brain. The data generated using this protocol are now well suited for interactive comparisons, analysis, queries, and visualization.
In situ hybridization; Comparison; Subdivision; Landmarks; Database; Brain atlas; Mice; Rodents
A key early step in embryogenesis is the establishment of the major body axes: the dorsal-ventral (DV) and anterior-posterior (AP) axes. Determination of these axes in some insects requires the function of different sets of signalling pathways for each axis. Patterning across the DV axis requires interaction between the Toll and Dpp/TGF-β pathways, whereas patterning across the AP axis requires gradients of bicoid/orthodenticle proteins and the actions of a hierarchy of gene transcription factors. We examined the expression and function of Toll and Dpp signalling during honeybee embryogenesis to assess the role of these genes in DV patterning.
Pathway components that are required for dorsal specification in Drosophila are expressed in an AP-restricted pattern in the honeybee embryo, including Dpp and its receptor Tkv. Components of the Toll pathway are expressed in a more conserved pattern along the ventral axis of the embryo. Late-stage embryos from RNA interference (RNAi) knockdown of Toll and Dpp pathways had both DV and AP patterning defects, confirmed by staining with Am-sna, Am-zen, Am-eve, and Am-twi at earlier stages. We also identified two orthologues of dorsal in the honeybee genome, with one being expressed during embryogenesis and having a minor role in axis patterning, as determined by RNAi, and the other expressed during oogenesis.
We found that early acting pathways (Toll and Dpp) are involved not only in DV patterning but also AP patterning in honeybee embryogenesis. Changes to the expression patterns and function of these genes may reflect evolutionary changes in the placement of the extra-embryonic membranes during embryogenesis with respect to the AP and DV axes.
Anterior posterior; Apis mellifera; Axis formation; Dorsal ventral; Dpp; Evolution; Honeybee; Toll
High-throughput instruments were recently developed to determine gene expression patterns on tissue sections by RNA in situ hybridization. The resulting images of gene expression patterns, chiefly of E14.5 mouse embryos, are accessible to the public at http://www.genepaint.org. This relational database is searchable for gene identifiers and RNA probe sequences. Moreover, patterns and intensity of expression in ∼100 different embryonic tissues are annotated and can be searched using a standardized catalog of anatomical structures. A virtual microscope tool, the Zoom Image Server, was implemented in GenePaint.org and permits interactive zooming and panning across ∼15 000 high-resolution images.
Motivation: Gene expression patterns can be useful in understanding the structural organization of the brain and the regulatory logic that governs its myriad cell types. A particularly rich source of spatial expression data is the Allen Brain Atlas (ABA), a comprehensive genome-wide in situ hybridization study of the adult mouse brain. Here, we present an open-source program, ALLENMINER, that searches the ABA for genes that are expressed, enriched, patterned or graded in a user-specified region of interest.
Results: Regionally enriched genes identified by ALLENMINER accurately reflect the in situ data (95–99% concordance with manual curation) and compare with regional microarray studies as expected from previous comparisons (61–80% concordance). We demonstrate the utility of ALLENMINER by identifying genes that exhibit patterned expression in the caudoputamen and neocortex. We discuss general characteristics of gene expression in the mouse brain and the potential application of ALLENMINER to design strategies for specific genetic access to brain regions and cell types.
Availability: ALLENMINER is freely available on the Internet at http://research.janelia.org/davis/allenminer.
Supplementary information: Supplementary data are available at Bioinformatics online.
Drosophila melanogaster has been established as a model organism for investigating the developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into the gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on the body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images will impose a serious impediment on the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison.
We present a computational framework to perform anatomical keywords annotation for Drosophila gene expression images. The spatial sparse coding approach is used to represent local patches of images in comparison with the well-known bag-of-words (BoW) method. Three pooling functions including max pooling, average pooling and Sqrt (square root of mean squared statistics) pooling are employed to transform the sparse codes to image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, the undersampling method is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach.
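The three pooling functions named above can be written down directly. The sketch below is a minimal illustration that assumes the sparse codes of one image are stored as a list of rows (one row per local patch, one column per dictionary atom) and that max and average pooling act on absolute values; neither assumption comes from the abstract, and the toy numbers are invented.

```python
import math

def max_pool(codes):
    """Per-atom maximum of absolute code values over all patches."""
    return [max(abs(v) for v in column) for column in zip(*codes)]

def average_pool(codes):
    """Per-atom mean of absolute code values over all patches."""
    return [sum(abs(v) for v in column) / len(column) for column in zip(*codes)]

def sqrt_pool(codes):
    """Per-atom square root of the mean of squared code values."""
    return [math.sqrt(sum(v * v for v in column) / len(column))
            for column in zip(*codes)]

# Toy sparse codes: 4 patches (rows) x 3 dictionary atoms (columns).
codes = [
    [0.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
pooled_max = max_pool(codes)      # [4.0, 3.0, 0.0]
pooled_avg = average_pool(codes)  # [1.0, 1.0, 0.0]
pooled_sqrt = sqrt_pool(codes)    # first entry is sqrt(16 / 4) = 2.0
```

Each function turns a variable number of patch codes into one fixed-length feature vector per image, which is what lets images with different numbers of patches be fed to the same classifier.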
In our experiment, the three pooling functions perform comparably well in feature dimension reduction. The undersampling with majority vote is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding and image-level scheme leads to consistent performance improvement in keywords annotation.
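Undersampling combined with majority vote can be sketched as follows. This is an illustration under stated assumptions, not the classifier used in the study: the base learner is a hypothetical 1-D nearest-centroid rule, and the number of models, the seed and the data are invented for the example.

```python
import random

def train_centroid(pos, neg):
    """Toy 1-D nearest-centroid classifier (a stand-in base learner)."""
    c_pos = sum(pos) / len(pos)
    c_neg = sum(neg) / len(neg)
    return lambda x: abs(x - c_pos) < abs(x - c_neg)

def undersampled_ensemble(pos, neg, n_models=5, seed=0):
    """Train each model on a balanced subset drawn by random undersampling."""
    rng = random.Random(seed)
    k = min(len(pos), len(neg))  # size of the smaller class
    return [train_centroid(rng.sample(pos, k), rng.sample(neg, k))
            for _ in range(n_models)]

def majority_vote(models, x):
    """Predict positive when more than half of the models vote positive."""
    votes = sum(1 for model in models if model(x))
    return votes * 2 > len(models)

# Imbalanced toy data: 3 positives against 30 negatives.
pos = [0.9, 1.0, 1.1]
neg = [i / 30 - 0.5 for i in range(30)]
models = undersampled_ensemble(pos, neg)
label_near_pos = majority_vote(models, 0.95)
label_near_neg = majority_vote(models, -0.2)
```

Every model sees all of the rare positive examples but only a random slice of the negatives, so no single model is dominated by the majority class, and the vote averages out the randomness of each draw.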
Complex spatial and temporal patterns of gene expression underlie embryo differentiation, yet methods do not yet exist for the efficient genome-wide determination of spatial expression patterns during development. In situ imaging of transcripts and proteins is the gold-standard, but it is difficult and time consuming to apply to an entire genome, even when highly automated. Sequencing, in contrast, is fast and genome-wide, but is generally applied to homogenized tissues, thereby discarding spatial information. To take advantage of the efficiency and comprehensiveness of sequencing while retaining spatial information, we cryosectioned individual blastoderm stage Drosophila melanogaster embryos along the anterior-posterior axis and developed methods to reliably sequence the mRNA isolated from each 25 µm slice. The spatial patterns of gene expression we infer closely match patterns previously determined by in situ hybridization and microscopy. We applied this method to generate a genome-wide timecourse of spatial gene expression from shortly after fertilization through gastrulation. We identified numerous genes with spatial patterns that have not yet been described in the several ongoing systematic in situ based projects. This simple experiment demonstrates the potential for combining careful anatomical dissection with high-throughput sequencing to obtain spatially resolved gene expression on a genome-wide scale.
Here, we describe the BioMart interface to the eMouseAtlas gene expression database EMAGE. EMAGE is a spatiotemporal database of in situ gene expression patterns in the developing mouse embryo. BioMart provides a generic web query interface and programmable access using web services. The BioMart interface extends access to EMAGE via a powerful method of structuring complex queries, one with which users may already be familiar from other BioMart implementations. The interface is structured into several data sets providing the user with comprehensive query access to the EMAGE data. The federated nature of BioMart allows scope for integration and cross querying of EMAGE with other similar BioMarts.
Database URL: http://biomart.emouseatlas.org
Transposable elements (TEs) are mobile nucleotide sequences which, through changing position in host genomes, partake in important evolutionary processes. The expression patterns of two TEs, P element transposon and 412 retrotransposon, were investigated during Drosophila melanogaster and D. willistoni embryogenesis, by means of embryo hybridization using riboprobes. Spatiotemporal transcription patterns for both TEs were similar to those of developmental genes. Although the two species shared the same P element transcription pattern, this was not so with 412 retrotransposon. These findings suggest that the regulatory sequences involved in the initial development of Drosophila spp are located in the transposable element sequences, and differences, such as in this case of the 412 retrotransposon, lead to losses or changes in their transcription patterns.
Drosophila; P element; 412; transposable element; embryonic development
The Allen Brain Atlas (ABA) project systematically profiles three-dimensional high-resolution gene expression in postnatal mouse brains for thousands of genes. By unveiling gene behaviors at both the cellular and molecular levels, ABA is becoming a unique and comprehensive neuroscience data source for decoding enigmatic biological processes in the brain. Given the unprecedented volume and complexity of the in situ hybridization image data, data mining in this area is extremely challenging. Currently, the ABA database mainly serves as an online reference for visual inspection of individual genes; the underlying rich information of this large data set is yet to be explored by novel computational tools. In this proof-of-concept study, we studied the hypothesis that genes sharing similar three-dimensional expression profiles in the mouse brain are likely to share similar biological functions.
In order to address the pattern comparison challenge when analyzing the ABA database, we developed a robust image filtering method, dubbed histogram-row-column (HRC) algorithm. We demonstrated how the HRC algorithm offers the sensitivity of identifying a manageable number of gene pairs based on automatic pattern searching from an original large brain image collection. This tool enables us to quickly identify genes of similar in situ hybridization patterns in a semi-automatic fashion and consequently allows us to discover several gene expression patterns with expression neighborhoods containing genes of similar functional categories.
Given a query brain image, HRC is a fully automated algorithm that is able to quickly mine a vast number of brain images and identify a manageable subset of genes that potentially share similar spatial co-distribution patterns for further visual inspection. A three-dimensional in situ hybridization pattern, if statistically significant, could serve as a fingerprint of certain gene function. Databases such as ABA provide a valuable data source for characterizing brain-related gene functions when armed with powerful image querying tools like HRC.
The Allen Brain Atlas, the most comprehensive in situ hybridization database, covers over 21000 genes expressed in the mouse brain. Here we discuss the feasibility of utilizing the ABA in research pertaining to the central regulation of feeding and we define advantages and vulnerabilities associated with the use of the atlas as a guidance tool. We searched for 57 feeding-related genes in the ABA, and of those 42 display distribution consistent with that described in previous reports. Detailed analyses of these 42 genes in the nucleus accumbens, ventral tegmental area, nucleus of the solitary tract, lateral hypothalamus, arcuate, paraventricular, ventromedial and dorsomedial nuclei suggest that molecules involved in feeding stimulation and termination are coexpressed in multiple consumption-related sites. Gene systems linked to energy needs, reward or satiation display a remarkably high level of overlap. This conclusion calls into question the classical concept of brain sites viewed as independent hunger or reward “centers” and favors the theory of a widespread feeding network comprising multiple neuroregulators affecting numerous aspects of consumption.
CNS; food intake; obesity; anorexia; neuropeptides
Mobile technologies provide unique opportunities for ubiquitous distribution of scientific information through user-friendly interfaces. Therefore, we have developed a new FlyExpress mobile application that makes available a growing collection (>100 000) of standardized in situ hybridization images containing spatial patterns of gene expression from Drosophila melanogaster (fruit fly) embryogenesis. Using this application, scientists can visualize and compare expression patterns of >4000 developmentally relevant genes. The FlyExpress app displays the expression patterns of the selected gene for different visual projections (e.g. lateral) and displays them according to their developmental stages, which shows a gene’s progression of spatial expression over developmental time. Ultimately, we envision the use of FlyExpress app in the laboratory where scientists may wish to immediately conduct a visual comparison of a known expression pattern with the one observed on the bench top or to display expression patterns of interest during scientific discussions at large.
Availability: Search “FlyExpress” on the Apple iTunes store
The Drosophila gene expression pattern images document the spatial and temporal dynamics of gene expression and they are valuable tools for explicating the gene functions, interaction, and networks during Drosophila embryogenesis. To provide text-based pattern searching, the images in the Berkeley Drosophila Genome Project (BDGP) study are annotated with ontology terms manually by human curators. We present a systematic approach for automating this task, because the number of images needing text descriptions is now rapidly increasing. We consider both improved feature representation and novel learning formulation to boost the annotation performance. For feature representation, we adapt the bag-of-words scheme commonly used in visual recognition problems so that the image group information in the BDGP study is retained. Moreover, images from multiple views can be integrated naturally in this representation. To reduce the quantization error caused by the bag-of-words representation, we propose an improved feature representation scheme based on the sparse learning technique. In the design of learning formulation, we propose a local regularization framework that can incorporate the correlations among terms explicitly. We further show that the resulting optimization problem admits an analytical solution. Experimental results show that the representation based on sparse learning outperforms the bag-of-words representation significantly. Results also show that incorporation of the term-term correlations improves the annotation performance consistently.
gene expression pattern; image annotation; bag-of-words; sparse learning; regularization
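The bag-of-words representation of an image group described in the abstract above can be sketched as follows. The sketch assumes the codebook (the set of visual words) is already given, whereas in practice it is typically learned by clustering local descriptors; the 2-D descriptors, the function names and the normalization are hypothetical choices for illustration.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )

def bag_of_words(image_group, codebook):
    """Normalized codeword histogram pooled over all images in a group.

    image_group is a list of images, each a list of local descriptors.
    Hard assignment to a single codeword is the source of the
    quantization error that sparse coding is meant to reduce.
    """
    counts = [0] * len(codebook)
    for descriptors in image_group:
        for descriptor in descriptors:
            counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Toy codebook with two visual words and a group of two images.
codebook = [(0.0, 0.0), (1.0, 1.0)]
image_group = [
    [(0.1, 0.0), (0.9, 1.2)],
    [(0.2, 0.1), (0.0, 0.3)],
]
group_histogram = bag_of_words(image_group, codebook)  # [0.75, 0.25]
```

Pooling the histogram over the whole group, rather than per image, keeps the group-level information intact, since the annotation terms are attached to groups of images.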
Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way for studying the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were annotated with a variable number of anatomical terms manually using a controlled vocabulary. Considering that the number of available images is rapidly increasing, it is imperative to design computational methods to automate this task.
We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to utilize the existing information on pattern annotation and annotates images using a model that exploits correlations among terms. The proposed method can annotate images individually or in groups (e.g., according to the developmental stage). In addition, the proposed method can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods.
The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. The integration of existing annotation information from multiple embryonic views improves annotation performance.
Summary: Images containing spatial expression patterns illuminate the roles of different genes during embryogenesis. In order to generate initial clues to regulatory interactions, biologists frequently need to know the set of genes expressed at the same time at specific locations in a developing embryo, as well as related research publications. However, text-based mining of image annotations and research articles cannot produce all relevant results, because the primary data are images that exist as graphical objects. We have developed a unique knowledge base (FlyExpress) to facilitate visual mining of images from Drosophila melanogaster embryogenesis. By clicking on specific locations in pictures of fly embryos from different stages of development and different visual projections, users can produce a list of genes and publications instantly. In FlyExpress, each queryable embryo picture is a heat-map that captures the expression patterns of more than 4500 genes and more than 2600 published articles. In addition, one can view spatial patterns for particular genes over time as well as find other genes with similar expression patterns at a given developmental stage. Therefore, FlyExpress is a unique tool for mining spatiotemporal expression patterns in a format readily accessible to the scientific community.
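The location-based query described in the summary above can be sketched as follows. This is only an illustration of the idea, assuming that each gene's expression pattern is stored as a binary mask on a common embryo grid; it is not the FlyExpress implementation, and the gene labels, grid size and function names are hypothetical.

```python
def genes_at(location, expression_masks):
    """Return the genes whose binary mask is set at (row, col), sorted by name."""
    row, col = location
    return sorted(g for g, mask in expression_masks.items() if mask[row][col])

def heat_map(expression_masks):
    """Count, for every pixel, how many genes are expressed there."""
    masks = list(expression_masks.values())
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[sum(m[r][c] for m in masks) for c in range(cols)] for r in range(rows)]

# Hypothetical 2x3 masks standing in for registered expression patterns.
expression_masks = {
    "gene_a": [[1, 1, 0], [0, 0, 0]],
    "gene_b": [[0, 1, 1], [0, 1, 0]],
    "gene_c": [[0, 0, 0], [1, 1, 0]],
}
hits = genes_at((0, 1), expression_masks)   # genes expressed at the clicked pixel
counts = heat_map(expression_masks)         # per-pixel number of expressed genes
```

In this toy layout, a click on a pixel maps directly to a lookup across all masks, and the per-pixel counts give the heat-map that is displayed to the user.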
The hypothalamus is a central regulator of many behaviors that are essential for survival, such as temperature regulation, food intake and circadian rhythms. However, the molecular pathways that mediate hypothalamic development are largely unknown. To identify genes expressed in developing mouse hypothalamus, we performed microarray analysis at 12 different developmental time points. We then conducted developmental in situ hybridization for 1,045 genes that were dynamically expressed over the course of hypothalamic neurogenesis. We identified markers that stably labeled each major hypothalamic nucleus over the entire course of neurogenesis and constructed a detailed molecular atlas of the developing hypothalamus. As a proof of concept of the utility of these data, we used these markers to analyze the phenotype of mice in which Sonic Hedgehog (Shh) was selectively deleted from hypothalamic neuroepithelium and found that Shh is essential for anterior hypothalamic patterning. Our results serve as a resource for functional investigations of hypothalamic development, connectivity, physiology and dysfunction.
Motivation: Recent advancements in high-throughput imaging have created new large datasets with tens of thousands of gene expression images. Methods for capturing these spatial and/or temporal expression patterns include in situ hybridization or fluorescent reporter constructs or tags, and results are still frequently assessed by subjective qualitative comparisons. In order to deal with available large datasets, fully automated analysis methods must be developed to properly normalize and model spatial expression patterns.
Results: We have developed image segmentation and registration methods to identify and extract spatial gene expression patterns from RNA in situ hybridization experiments of Drosophila embryos. These methods allow us to normalize and extract expression information for 78 621 images from 3724 genes across six time stages. The similarity between gene expression patterns is computed using four scoring metrics: mean squared error, Haar wavelet distance, mutual information and spatial mutual information (SMI). We additionally propose a strategy to calculate the significance of the similarity between two expression images, by generating surrogate datasets with similar spatial expression patterns using a Monte Carlo swap sampler. On data from an early development time stage, we show that SMI provides the most biologically relevant metric of comparison, and that our significance testing generalizes metrics to achieve similar performance. We exemplify the application of spatial metrics on the well-known Drosophila segmentation network.
Availability: A Java webstart application to register and compare patterns, as well as all source code, are available from: http://tools.genome.duke.edu/generegulation/image_analysis/insitu
Supplementary information: Supplementary data are available at Bioinformatics online.
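Two of the similarity metrics named in the Results above, mean squared error and mutual information, can be sketched for a pair of registered expression images. The sketch assumes that both images have been flattened to equal-length lists of discretized intensity levels; the function names and toy data are hypothetical. The Haar wavelet distance, spatial mutual information and the Monte Carlo swap sampler used for significance testing are not reproduced here.

```python
from collections import Counter
from math import log2

def mean_squared_error(a, b):
    """Average squared difference between corresponding pixels."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mutual_information(a, b):
    """Mutual information (in bits) between two discretized images."""
    n = len(a)
    joint = Counter(zip(a, b))   # co-occurrence counts of intensity pairs
    count_a = Counter(a)         # marginal counts for image a
    count_b = Counter(b)         # marginal counts for image b
    return sum(
        (c / n) * log2((c * n) / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )

# Toy flattened images with two intensity levels (assumed data).
pattern = [0, 0, 1, 1]
same = [0, 0, 1, 1]
unrelated = [0, 1, 0, 1]

mse_same = mean_squared_error(pattern, same)
mse_unrelated = mean_squared_error(pattern, unrelated)
mi_same = mutual_information(pattern, same)
mi_unrelated = mutual_information(pattern, unrelated)
```

Mutual information ignores where the pixels are located, which is one motivation for a spatially aware variant such as the SMI reported in the abstract.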
The establishment of a database of gene-expression patterns derived from systematic high-throughput in situ hybridization studies on whole-mount Drosophila embryos vastly increases the breadth and depth that can be reached by developmental genetics.
The establishment of a database of gene-expression patterns derived from systematic high-throughput in situ hybridization studies on whole-mount Drosophila embryos, together with new information on the reannotated Drosophila genome and several recent microarray-based genomic analyses of Drosophila development, vastly increases the breadth and depth that can be reached by developmental genetics.