Motivation: Regulation of gene expression in space and time directs its localization to a specific subset of cells during development. Systematic determination of the spatiotemporal dynamics of gene expression plays an important role in understanding the regulatory networks driving development. An atlas for the gene expression patterns of fruit fly Drosophila melanogaster has been created by whole-mount in situ hybridization, and it documents the dynamic changes of gene expression pattern during Drosophila embryogenesis. The spatial and temporal patterns of gene expression are integrated by anatomical terms from a controlled vocabulary linking together intermediate tissues developed from one another. Currently, the terms are assigned to patterns manually. However, the number of patterns generated by high-throughput in situ hybridization is rapidly increasing. It is, therefore, tempting to approach this problem by employing computational methods.
Results: In this article, we present a novel computational framework for annotating gene expression patterns using a controlled vocabulary. In the currently available high-throughput data, annotation terms are assigned to groups of patterns rather than to individual images. We propose to extract invariant features from images, and construct pyramid match kernels to measure the similarity between sets of patterns. To exploit the complementary information conveyed by different features and incorporate the correlation among patterns sharing common structures, we propose efficient convex formulations to integrate the kernels derived from various features. The proposed framework is evaluated by comparing its annotation with that of human curators, and promising performance in terms of F1 score has been reported.
Supplementary information: Supplementary data are available at Bioinformatics online.
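The set-to-set similarity mentioned in the Results can be illustrated with a small sketch of a pyramid match kernel. The abstract does not give the feature type, the number of pyramid levels, or any normalization, so the bin sizes, the `levels` parameter, and the function names below are illustrative assumptions that follow the commonly described form of the kernel (weighted histogram intersections over increasingly coarse grids), not the authors' implementation.

```python
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def histogram(points: Sequence[Point], bin_size: int) -> Counter:
    """Count how many feature vectors fall into each grid cell of side bin_size."""
    return Counter(tuple(int(c // bin_size) for c in p) for p in points)


def intersection(h1: Counter, h2: Counter) -> int:
    """Histogram intersection: number of features matched within the same cell."""
    return sum(min(n, h2[cell]) for cell, n in h1.items())


def pyramid_match(x: Sequence[Point], y: Sequence[Point], levels: int = 4) -> float:
    """Unnormalized pyramid match score between two sets of feature vectors.

    Level i uses cells of side 2**i (an assumed scale). Matches that first
    appear at level i are weighted by 1 / 2**i, so matches found in finer
    cells count more than matches found only in coarse cells.
    """
    score = 0.0
    previous = 0
    for i in range(levels):
        bin_size = 2 ** i
        current = intersection(histogram(x, bin_size), histogram(y, bin_size))
        score += (current - previous) / bin_size  # new matches at this level
        previous = current
    return score


# Two hypothetical sets of 2-D features, e.g. from two groups of images.
set_a = [(0.5, 0.5), (3.2, 1.1)]
set_b = [(0.6, 0.4), (2.9, 1.0)]
print(pyramid_match(set_a, set_b))
```

Because the score depends on set sizes, a practical use would typically normalize it, for example by the self-similarities of the two sets, before combining kernels from different features.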
Datasets on gene expression are a valuable source of information about the functional state of an organism. Recently, we acquired a large dataset on the expression of segmentation genes in the Drosophila blastoderm. To provide efficient access to the data, we have developed the FlyEx database (http://urchin.spbcas.ru/flyex). FlyEx contains 4716 images of 14 segmentation gene expression patterns obtained from 1579 embryos and 9 500 000 quantitative data records. Reference data are available for all segmentation genes in cycles 11–13 and all temporal classes of cycle 14A. FlyEx supports operations on images of gene expression patterns. The database can be used to examine the quality of data, to analyze the dynamics of formation of segmentation gene expression domains, and to estimate the variability of gene expression patterns. Currently, a user can monitor and analyze the dynamics of formation of segmentation gene expression domains over the whole period of segment determination, which amounts to 1.5 h of development. FlyEx supports data downloads and the construction of personal reference datasets, which makes it possible to use and analyze the data more effectively.
Kidney development is based on differential cell-type-specific expression of a vast number of genes. While multiple critical genes and pathways have been elucidated, a genome-wide analysis of gene expression within individual cellular and anatomic structures is lacking. Accomplishing this could provide significant new insights into fundamental developmental mechanisms such as mesenchymal-epithelial transition, inductive signaling, branching morphogenesis and segmentation. We describe here a comprehensive gene expression atlas of the developing mouse kidney based on the isolation of each major compartment by either laser capture microdissection or fluorescence-activated cell sorting, followed by microarray profiling. The resulting data agree with known expression patterns and additional in situ hybridizations. This kidney atlas allows a comprehensive analysis of the progression of gene expression states during nephrogenesis, as well as discovery of novel growth factor-receptor interactions. In addition, the results provide deeper insight into the genetic regulatory mechanisms of kidney development.
The phosphatase of regenerating liver (PRL) family is classified as class IVa of the protein tyrosine phosphatases (PTP4A), which remove phosphate groups from phosphorylated tyrosine residues on proteins. PRL phosphatases are highly conserved and have been implicated in a number of tumorigenesis and metastasis processes. However, the understanding of PRL expression profiles during embryonic development is very limited.
In this study, we comprehensively characterized the expression patterns of Drosophila PRL, amphioxus PRL, and zebrafish PRLs during embryonic development by either whole-mount immunostaining or in situ hybridization. Our results indicate that Drosophila PRL is mainly enriched in the developing midgut and central nervous system (CNS) during embryogenesis. In amphioxus, the PRL gene is initially expressed ubiquitously during early embryogenesis, but its expression later becomes restricted to the anterior neural tube in the cerebral vesicle. In zebrafish, PRL-1 and PRL-2 share similar expression patterns, mostly in neuronal lineages. In contrast, the expression of zebrafish PRL-3 is more specific and preferential in muscle.
This study, for the first time, elucidated the embryonic expression pattern of Drosophila, amphioxus, and zebrafish PRL genes. The shared PRL expression pattern in the developing CNS among diverse animals suggests that PRL may play conserved roles in these animals for CNS development.
Phosphatase of regenerating liver; PTP4A; Embryogenesis; Drosophila; Zebrafish; Amphioxus
The manuscript describes the “digital transcriptome atlas” of the developing mouse embryo, a powerful resource to determine co-expression of genes, to identify cell populations and lineages and to identify functional associations between genes relevant to development and disease.
Ascertaining when and where genes are expressed is of crucial importance to understanding or predicting the physiological role of genes and proteins and how they interact to form the complex networks that underlie organ development and function. It is therefore crucial to determine, on a genome-wide level, the spatio-temporal gene expression profiles at cellular resolution. This information is provided by colorimetric RNA in situ hybridization that can elucidate expression of genes in their native context and does so at cellular resolution. We generated what is to our knowledge the first genome-wide transcriptome atlas by RNA in situ hybridization of an entire mammalian organism, the developing mouse at embryonic day 14.5. This digital transcriptome atlas, the Eurexpress atlas (http://www.eurexpress.org), consists of a searchable database of annotated images that can be interactively viewed. We generated anatomy-based expression profiles for over 18,000 coding genes and over 400 microRNAs. We identified 1,002 tissue-specific genes that are a source of novel tissue-specific markers for 37 different anatomical structures. The quality and the resolution of the data revealed novel molecular domains for several developing structures, such as the telencephalon, a novel organization for the hypothalamus, and insight into the Wnt network involved in renal epithelial differentiation during kidney development. The digital transcriptome atlas is a powerful resource to determine co-expression of genes, to identify cell populations and lineages, and to identify functional associations between genes relevant to development and disease.
In situ hybridization (ISH) can be used to visualize gene expression in cells and tissues in their native context. High-throughput ISH using nonradioactive RNA probes allowed the Eurexpress consortium to generate a comprehensive, interactive, and freely accessible digital gene expression atlas, the Eurexpress transcriptome atlas (http://www.eurexpress.org), of the E14.5 mouse embryo. Expression data for over 15,000 genes were annotated for hundreds of anatomical structures, thus allowing us to systematically identify tissue-specific and tissue-overlapping gene networks. We illustrate the value of the Eurexpress atlas by finding novel regional subdivisions in the developing brain. We also use the transcriptome atlas to allocate specific components of the complex Wnt signaling pathway to kidney development, and we identify regionally expressed genes in liver that may be markers of hematopoietic stem cell differentiation.
Massive amounts of image data have been collected and continue to be generated for representing cellular gene expression throughout the mouse brain. Critical to exploiting this key effort of the post-genomic era is the ability to place these data into a common spatial reference that enables rapid interactive queries, analysis, data sharing, and visualization. In this paper, we present a set of automated protocols for generating and annotating gene expression patterns suitable for the establishment of a database. The steps include imaging tissue slices, detecting cellular gene expression levels, spatial registration with an atlas, and textual annotation. Using high-throughput in situ hybridization to generate serial sets of tissues displaying gene expression, this process was applied towards the establishment of a database representing over 200 genes in the postnatal day 7 mouse brain. The data generated with this protocol are now well suited for interactive comparisons, analysis, queries, and visualization.
In situ hybridization; Comparison; Subdivision; Landmarks; Database; Brain atlas; Mice; Rodents
High-throughput instruments were recently developed to determine gene expression patterns on tissue sections by RNA in situ hybridization. The resulting images of gene expression patterns, chiefly of E14.5 mouse embryos, are accessible to the public at http://www.genepaint.org. This relational database is searchable for gene identifiers and RNA probe sequences. Moreover, patterns and intensity of expression in ∼100 different embryonic tissues are annotated and can be searched using a standardized catalog of anatomical structures. A virtual microscope tool, the Zoom Image Server, was implemented in GenePaint.org and permits interactive zooming and panning across ∼15 000 high-resolution images.
Motivation: Gene expression patterns can be useful in understanding the structural organization of the brain and the regulatory logic that governs its myriad cell types. A particularly rich source of spatial expression data is the Allen Brain Atlas (ABA), a comprehensive genome-wide in situ hybridization study of the adult mouse brain. Here, we present an open-source program, ALLENMINER, that searches the ABA for genes that are expressed, enriched, patterned or graded in a user-specified region of interest.
Results: Regionally enriched genes identified by ALLENMINER accurately reflect the in situ data (95–99% concordance with manual curation) and compare with regional microarray studies as expected from previous comparisons (61–80% concordance). We demonstrate the utility of ALLENMINER by identifying genes that exhibit patterned expression in the caudoputamen and neocortex. We discuss general characteristics of gene expression in the mouse brain and the potential application of ALLENMINER to design strategies for specific genetic access to brain regions and cell types.
Availability: ALLENMINER is freely available on the Internet at http://research.janelia.org/davis/allenminer.
Supplementary information: Supplementary data are available at Bioinformatics online.
Complex spatial and temporal patterns of gene expression underlie embryo differentiation, yet methods do not yet exist for the efficient genome-wide determination of spatial expression patterns during development. In situ imaging of transcripts and proteins is the gold-standard, but it is difficult and time consuming to apply to an entire genome, even when highly automated. Sequencing, in contrast, is fast and genome-wide, but is generally applied to homogenized tissues, thereby discarding spatial information. To take advantage of the efficiency and comprehensiveness of sequencing while retaining spatial information, we cryosectioned individual blastoderm stage Drosophila melanogaster embryos along the anterior-posterior axis and developed methods to reliably sequence the mRNA isolated from each 25 µm slice. The spatial patterns of gene expression we infer closely match patterns previously determined by in situ hybridization and microscopy. We applied this method to generate a genome-wide timecourse of spatial gene expression from shortly after fertilization through gastrulation. We identified numerous genes with spatial patterns that have not yet been described in the several ongoing systematic in situ based projects. This simple experiment demonstrates the potential for combining careful anatomical dissection with high-throughput sequencing to obtain spatially resolved gene expression on a genome-wide scale.
Here, we describe the BioMart interface to the eMouseAtlas gene expression database EMAGE. EMAGE is a spatiotemporal database of in situ gene expression patterns in the developing mouse embryo. BioMart provides a generic web query interface and programmable access using web services. The BioMart interface extends access to EMAGE via a powerful method of structuring complex queries, one with which users may already be familiar from other BioMart implementations. The interface is structured into several data sets providing the user with comprehensive query access to the EMAGE data. The federated nature of BioMart allows scope for integration and cross querying of EMAGE with other similar BioMarts.
Database URL: http://biomart.emouseatlas.org
Transposable elements (TEs) are mobile nucleotide sequences which, through changing position in host genomes, partake in important evolutionary processes. The expression patterns of two TEs, P element transposon and 412 retrotransposon, were investigated during Drosophila melanogaster and D. willistoni embryogenesis, by means of embryo hybridization using riboprobes. Spatiotemporal transcription patterns for both TEs were similar to those of developmental genes. Although the two species shared the same P element transcription pattern, this was not so with 412 retrotransposon. These findings suggest that the regulatory sequences involved in the initial development of Drosophila spp are located in the transposable element sequences, and differences, such as in this case of the 412 retrotransposon, lead to losses or changes in their transcription patterns.
Drosophila; P element; 412; transposable element; embryonic development
The Allen Brain Atlas (ABA) project systematically profiles three-dimensional high-resolution gene expression in postnatal mouse brains for thousands of genes. By unveiling gene behaviors at both the cellular and molecular levels, ABA is becoming a unique and comprehensive neuroscience data source for decoding enigmatic biological processes in the brain. Given the unprecedented volume and complexity of the in situ hybridization image data, data mining in this area is extremely challenging. Currently, the ABA database mainly serves as an online reference for visual inspection of individual genes; the underlying rich information of this large data set is yet to be explored by novel computational tools. In this proof-of-concept study, we studied the hypothesis that genes sharing similar three-dimensional expression profiles in the mouse brain are likely to share similar biological functions.
In order to address the pattern comparison challenge when analyzing the ABA database, we developed a robust image filtering method, dubbed histogram-row-column (HRC) algorithm. We demonstrated how the HRC algorithm offers the sensitivity of identifying a manageable number of gene pairs based on automatic pattern searching from an original large brain image collection. This tool enables us to quickly identify genes of similar in situ hybridization patterns in a semi-automatic fashion and consequently allows us to discover several gene expression patterns with expression neighborhoods containing genes of similar functional categories.
Given a query brain image, HRC is a fully automated algorithm that can quickly mine a vast number of brain images and identify a manageable subset of genes that potentially share similar spatial co-distribution patterns for further visual inspection. A three-dimensional in situ hybridization pattern, if statistically significant, could serve as a fingerprint of a certain gene function. Databases such as ABA provide a valuable data source for characterizing brain-related gene functions when armed with powerful image querying tools like HRC.
The Allen Brain Atlas, the most comprehensive in situ hybridization database, covers over 21000 genes expressed in the mouse brain. Here we discuss the feasibility of using the ABA in research pertaining to the central regulation of feeding and we define advantages and vulnerabilities associated with the use of the atlas as a guidance tool. We searched for 57 feeding-related genes in the ABA, and of those, 42 display a distribution consistent with that described in previous reports. Detailed analyses of these 42 genes in the nucleus accumbens, ventral tegmental area, nucleus of the solitary tract, lateral hypothalamus, arcuate, paraventricular, ventromedial and dorsomedial nuclei suggest that molecules involved in feeding stimulation and termination are coexpressed in multiple consumption-related sites. Gene systems linked to energy needs, reward or satiation display a remarkably high level of overlap. This conclusion calls into question the classical concept of brain sites viewed as independent hunger or reward “centers” and favors the theory of a widespread feeding network comprising multiple neuroregulators affecting numerous aspects of consumption.
CNS; food intake; obesity; anorexia; neuropeptides
The Drosophila gene expression pattern images document the spatial and temporal dynamics of gene expression, and they are valuable tools for explicating gene functions, interactions, and networks during Drosophila embryogenesis. To provide text-based pattern searching, the images in the Berkeley Drosophila Genome Project (BDGP) study are annotated with ontology terms manually by human curators. We present a systematic approach for automating this task, because the number of images needing text descriptions is now rapidly increasing. We consider both improved feature representation and novel learning formulation to boost the annotation performance. For feature representation, we adapt the bag-of-words scheme commonly used in visual recognition problems so that the image group information in the BDGP study is retained. Moreover, images from multiple views can be integrated naturally in this representation. To reduce the quantization error caused by the bag-of-words representation, we propose an improved feature representation scheme based on the sparse learning technique. In the design of learning formulation, we propose a local regularization framework that can incorporate the correlations among terms explicitly. We further show that the resulting optimization problem admits an analytical solution. Experimental results show that the representation based on sparse learning outperforms the bag-of-words representation significantly. Results also show that incorporation of the term-term correlations improves the annotation performance consistently.
gene expression pattern; image annotation; bag-of-words; sparse learning; regularization
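The bag-of-words representation described above can be sketched as follows. The codebook of "visual words", the descriptor dimensionality, and the function names are hypothetical placeholders; the abstract does not state how the codebook is built or which local features are used, so this shows only the general quantize-and-count idea applied to a group of images.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: math.dist(descriptor, codebook[k]))


def bag_of_words(image_group: Sequence[Sequence[Vector]],
                 codebook: Sequence[Vector]) -> List[float]:
    """Normalized visual-word histogram for a whole group of images.

    Every local descriptor from every image in the group is assigned to its
    nearest visual word, so the group is summarized by one fixed-length
    vector regardless of how many images it contains.
    """
    counts = [0] * len(codebook)
    for descriptors in image_group:
        for d in descriptors:
            counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical 2-word codebook and a group of two images with two descriptors each.
codebook = [(0.0, 0.0), (10.0, 10.0)]
group = [[(1.0, 1.0), (9.0, 9.0)], [(0.0, 2.0), (1.0, 0.0)]]
print(bag_of_words(group, codebook))
```

Hard assignment to a single nearest word is what introduces the quantization error mentioned in the abstract; a sparse-coding variant would instead represent each descriptor as a weighted combination of a few words.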
Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way for studying the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were annotated with a variable number of anatomical terms manually using a controlled vocabulary. Considering that the number of available images is rapidly increasing, it is imperative to design computational methods to automate this task.
We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to utilize the existing information on pattern annotation and annotates images using a model that exploits correlations among terms. The proposed method can annotate images individually or in groups (e.g., according to the developmental stage). In addition, the proposed method can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods.
The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. The integration of existing annotation information from multiple embryonic views improves annotation performance.
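Annotation performance of this kind is usually measured by comparing predicted terms with curator-assigned terms. The abstract above does not name its metric, so the sketch below assumes a micro-averaged F1 score over groups of images (an F1 score is reported in another abstract in this collection); the term names are hypothetical placeholders.

```python
from typing import Sequence, Set


def micro_f1(predicted: Sequence[Set[str]], curated: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 between predicted and curator-assigned term sets.

    Each position corresponds to one image group. True positives, false
    positives, and false negatives are pooled over all groups before
    precision and recall are computed.
    """
    tp = fp = fn = 0
    for pred, gold in zip(predicted, curated):
        tp += len(pred & gold)   # terms both predicted and curated
        fp += len(pred - gold)   # predicted but not curated
        fn += len(gold - pred)   # curated but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical term assignments for two image groups.
predicted = [{"term_a", "term_b"}, {"term_c"}]
curated = [{"term_a"}, {"term_c", "term_d"}]
print(micro_f1(predicted, curated))
```

Micro-averaging lets frequent terms dominate the score; a macro-averaged variant would compute F1 per term and average, which weights rare anatomical terms more heavily.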
Summary: Images containing spatial expression patterns illuminate the roles of different genes during embryogenesis. In order to generate initial clues to regulatory interactions, biologists frequently need to know the set of genes expressed at the same time at specific locations in a developing embryo, as well as related research publications. However, text-based mining of image annotations and research articles cannot produce all relevant results, because the primary data are images that exist as graphical objects. We have developed a unique knowledge base (FlyExpress) to facilitate visual mining of images from Drosophila melanogaster embryogenesis. By clicking on specific locations in pictures of fly embryos from different stages of development and different visual projections, users can produce a list of genes and publications instantly. In FlyExpress, each queryable embryo picture is a heat-map that captures the expression patterns of more than 4500 genes and more than 2600 published articles. In addition, one can view spatial patterns for particular genes over time as well as find other genes with similar expression patterns at a given developmental stage. Therefore, FlyExpress is a unique tool for mining spatiotemporal expression patterns in a format readily accessible to the scientific community.
Motivation: Recent advancements in high-throughput imaging have created new large datasets with tens of thousands of gene expression images. Methods for capturing these spatial and/or temporal expression patterns include in situ hybridization or fluorescent reporter constructs or tags, and results are still frequently assessed by subjective qualitative comparisons. In order to deal with available large datasets, fully automated analysis methods must be developed to properly normalize and model spatial expression patterns.
Results: We have developed image segmentation and registration methods to identify and extract spatial gene expression patterns from RNA in situ hybridization experiments of Drosophila embryos. These methods allow us to normalize and extract expression information for 78 621 images from 3724 genes across six time stages. The similarity between gene expression patterns is computed using four scoring metrics: mean squared error, Haar wavelet distance, mutual information and spatial mutual information (SMI). We additionally propose a strategy to calculate the significance of the similarity between two expression images, by generating surrogate datasets with similar spatial expression patterns using a Monte Carlo swap sampler. On data from an early development time stage, we show that SMI provides the most biologically relevant metric of comparison, and that our significance testing generalizes metrics to achieve similar performance. We exemplify the application of spatial metrics on the well-known Drosophila segmentation network.
Availability: A Java webstart application to register and compare patterns, as well as all source code, are available from: http://tools.genome.duke.edu/generegulation/image_analysis/insitu
Supplementary information: Supplementary data are available at Bioinformatics online.
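Two of the four scoring metrics named in the Results, mean squared error and mutual information, can be sketched for a pair of registered images. This sketch assumes that each image is already flattened to a list of intensities in [0, 1] on a common grid and that mutual information is estimated from a simple joint histogram with an assumed number of bins; the Haar wavelet distance, the spatial mutual information, and the Monte Carlo significance test are not reproduced here.

```python
import math
from collections import Counter
from typing import Sequence


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Average squared intensity difference; lower means more similar."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mutual_information(a: Sequence[float], b: Sequence[float], bins: int = 4) -> float:
    """Mutual information (in bits) between two images after binning intensities.

    Intensities are assumed to lie in [0, 1]; `bins` is an illustrative choice.
    """
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")

    def quantize(v: float) -> int:
        return min(int(v * bins), bins - 1)  # clamp v == 1.0 into the top bin

    qa = [quantize(v) for v in a]
    qb = [quantize(v) for v in b]
    n = len(a)
    pa, pb, pab = Counter(qa), Counter(qb), Counter(zip(qa, qb))
    return sum(
        (c / n) * math.log2((c / n) / ((pa[i] / n) * (pb[j] / n)))
        for (i, j), c in pab.items()
    )


# Hypothetical 4-pixel images.
img1 = [0.0, 0.0, 1.0, 1.0]
img2 = [0.0, 1.0, 0.0, 1.0]
print(mean_squared_error(img1, img2), mutual_information(img1, img2))
```

Unlike mean squared error, mutual information ignores where in the image the co-occurring intensities lie, which is one motivation for spatially aware variants.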
EMAGE (http://genex.hgu.mrc.ac.uk/Emage/database) is a database of in situ gene expression patterns in the developing mouse embryo. Domains of expression from raw data images are spatially integrated into a set of standard 3D virtual mouse embryos at different stages of development, allowing data interrogation by spatial methods. Sites of expression are also described using an anatomy ontology and data can be queried using text-based methods. Here we describe recent enhancements to EMAGE which include advances in spatial search methods including: a refined local spatial similarity search algorithm, a method to allow global spatial comparison of patterns in EMAGE and subsequent hierarchical-clustering, and spatial searches across multiple stages of development. In addition, we have extended data access by the introduction of web services and new HTML-based search interfaces, which allow access to data that has not yet been spatially annotated. We have also started incorporating full 3D images of gene expression that have been generated using optical projection tomography (OPT).
The establishment of a database of gene-expression patterns derived from systematic high-throughput in situ hybridization studies on whole-mount Drosophila embryos vastly increases the breadth and depth that can be reached by developmental genetics.
The establishment of a database of gene-expression patterns derived from systematic high-throughput in situ hybridization studies on whole-mount Drosophila embryos, together with new information on the reannotated Drosophila genome and several recent microarray-based genomic analyses of Drosophila development, vastly increases the breadth and depth that can be reached by developmental genetics.
Motivation: Animal development depends on localized patterns of gene expression. Whole-genome methods permit the global identification of differential expression patterns. However, most gene-expression-clustering methods focus on the analysis of entire expression profiles, rather than temporal segments or time windows.
Results: In the current study, local clustering of temporal time windows was applied to developing embryos of the fruitfly, Drosophila melanogaster. Large-scale developmental events, involving temporal activation of hundreds of genes, were identified as discrete gene clusters. The time-duration analysis revealed six temporal waves of coherent gene expression during Drosophila embryogenesis. The most powerful expression waves preceded major morphogenetic movements, such as germ band elongation and dorsal closure. These waves of gene expression coincide with the inhibition of maternal transcripts during early development, the specification of ectoderm, differentiation of the nervous system, differentiation of the digestive tract, deposition of the larval cuticle and the reorganization of the cytoskeleton during global morphogenetic events. We discuss the implications of these findings with respect to the gene regulatory networks governing Drosophila development.
Availability: Data and software are available from the UC Berkeley web resource http://flydev.berkeley.edu/cgi-bin/GTEM/dmap_dm-ag/index_dmap.htm
Supplementary information: Supplementary data are available at Bioinformatics online.
Morphogenetic events that shape the Drosophila melanogaster embryo are tightly controlled by a genetic program in which specific sets of genes are up-regulated. We used a suppressive subtractive hybridization procedure to identify a group of developmentally regulated genes during early stages of D. melanogaster embryogenesis. We studied the spatiotemporal activity of these genes in five different intervals covering 12 stages of embryogenesis.
Microarrays were constructed to confirm induction of expression and to determine the temporal profile of isolated subtracted cDNAs during embryo development. We identified a set of 118 genes whose expression levels increased significantly in at least one developmental interval compared with a reference interval. Of these genes, 53% had a phenotype and/or molecular function reported in the literature, whereas 47% were essentially uncharacterized. Clustering analysis revealed demarcated transcript groups with maximum gene activity at distinct developmental intervals. In situ hybridization assays were carried out on 23 uncharacterized genes, 15 of which proved to have spatiotemporally restricted expression patterns. Among these 15 uncharacterized genes, 13 were found to encode putative secreted and transmembrane proteins. For three of them we validated our protein sequence predictions by expressing their cDNAs in Drosophila S2R+ cells and analyzed the subcellular distribution of recombinant proteins. We then focused on the functional characterization of the gene CG6234. Inhibition of CG6234 by RNA interference resulted in morphological defects in embryos, suggesting the involvement of this gene in germ band retraction.
Our data have yielded a list of developmentally regulated D. melanogaster genes and their expression profiles during embryogenesis and provide new information on the spatiotemporal expression patterns of several uncharacterized genes. In particular, we recovered a substantial number of unknown genes encoding putative secreted and transmembrane proteins, suggesting new components of signaling pathways that might be incorporated within the existing regulatory networks controlling D. melanogaster embryogenesis. These genes are also good candidates for additional targeted functional analyses similar to those we conducted for CG6234.
See related minireview by Vichas and Zallen:
Understanding the molecular interactions that lead to the establishment of the major body axes during embryogenesis is one of the main goals of developmental biology. Although the past two decades have revolutionized our knowledge about the genetic basis of these patterning processes, the list of genes involved in axis formation is unlikely to be complete. In order to identify new genes involved in the establishment of the dorsoventral (DV) axis during early stages of zebrafish embryonic development, we employed next generation sequencing for full transcriptome analysis of normal embryos and embryos lacking overt DV pattern. A combination of different statistical approaches yielded 41 differentially expressed candidate genes and we confirmed by in situ hybridization the early dorsal expression of 32 genes that are transcribed shortly after the onset of zygotic transcription. Although promoter analysis of the validated genes suggests no general enrichment for the binding sites of early acting transcription factors, most of these genes carry “bivalent” epigenetic histone modifications at the time when zygotic transcription is initiated, suggesting a “poised” transcriptional status. Our results reveal some new candidates of the dorsal gene regulatory network and suggest that a plurality of the earliest upregulated genes on the dorsal side have a role in the modulation of the canonical Wnt pathway.
The mosquito, Anopheles gambiae, is the primary vector of human malaria, a disease responsible for millions of deaths each year. To improve strategies for controlling transmission of the causative parasite, Plasmodium falciparum, we require a thorough understanding of the developmental mechanisms, physiological processes and evolutionary pressures affecting life-history traits in the mosquito. Identifying genes expressed in particular tissues or involved in specific biological processes is an essential part of this process.
In this study, we present transcription profiles for ~82% of annotated Anopheles genes in dissected adult male and female tissues. The sensitivity afforded by examining dissected tissues revealed gene activity in an additional 20% of the genome that goes undetected in whole-animal samples. The somatic and reproductive tissues we examined each displayed patterns of sexually dimorphic and tissue-specific expression. By comparing expression profiles with those of Drosophila melanogaster, we also assessed which genes are well conserved within the Diptera and which are more recently evolved.
Our expression atlas and its associated publicly available database, MozAtlas (http://www.tissue-atlas.org), provide information on the relative strength and specificity of gene expression in several somatic and reproductive tissues isolated from a single strain grown under uniform conditions. The data will serve as a reference for other mosquito researchers by providing a simple method for identifying where genes are expressed in the adult. In addition, our resource will provide insights into the evolutionary diversity of gene expression levels among species.
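As an illustration of what a per-gene "specificity" value can look like, the sketch below computes the tau tissue-specificity index, a commonly used summary that is 0 for uniform expression and 1 for expression confined to a single tissue. This is an assumption made for illustration only: the abstract does not state which specificity measure the database uses, and the function name and the example expression values are hypothetical.

```python
def tau_specificity(levels):
    """Tau index for one gene: 0 = equally expressed everywhere, 1 = one tissue only.

    `levels` holds non-negative expression values, one per tissue.
    Illustrative only; not taken from the atlas described above.
    """
    if len(levels) < 2:
        raise ValueError("need expression values for at least two tissues")
    if any(x < 0 for x in levels):
        raise ValueError("expression values must be non-negative")
    peak = max(levels)
    if peak == 0:
        return 0.0  # gene not detected in any tissue
    # Average shortfall of each tissue relative to the most highly expressing one.
    return sum(1 - x / peak for x in levels) / (len(levels) - 1)


# Hypothetical expression values across four tissues.
uniform_gene = tau_specificity([5.0, 5.0, 5.0, 5.0])
single_tissue_gene = tau_specificity([0.0, 0.0, 8.0, 0.0])
biased_gene = tau_specificity([8.0, 4.0, 4.0, 4.0])
```

A value near 1 flags a candidate tissue-specific gene, while the maximum expression level itself conveys the relative strength.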
The fushi tarazu gene is essential for the establishment of the Drosophila embryonic body plan. When first expressed in early embryogenesis, fushi tarazu mRNA is uniformly distributed over most of the embryo. Subsequently, fushi tarazu mRNA expression rapidly evolves into a pattern of seven stripes that encircle the embryo. The instability of fushi tarazu mRNA is probably crucial for attaining this localized pattern of expression. mRNA stability in transgenic embryos was measured by a new method that does not use drugs or external interference. Experiments using hybrid genes that fuse fushi tarazu sequences to those of the stable ribosomal protein A1 mRNA provide evidence for at least two destabilizing elements in the fushi tarazu mRNA, one located within the 5' one-third of the mRNA and the other near the 3' end (termed FIE3 for ftz instability element 3'). The FIE3 lies within a 201-nucleotide sequence just upstream of the polyadenylation signal and can act autonomously to destabilize a heterologous mRNA. Further deletion constructs identified an essential 68-nucleotide element within the FIE3. Lack of homology between this element and other previously identified destabilization sequences suggests that FIE3 contains a novel RNA destabilization element.
Motivation: Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a Drosophila melanogaster embryo reveals the detailed spatio-temporal pattern of that gene's expression. Many related biological problems, such as the detection of co-expressed genes, co-regulated genes and transcription factor binding motifs, rely heavily on the analysis of these image patterns. To support text-based pattern searching in related biological studies, the images in the Berkeley Drosophila Genome Project (BDGP) study are manually annotated by domain experts with developmental stage terms and anatomical ontology terms. Because the number of such images is increasing rapidly and annotations by human curators are inevitably biased, it is necessary to develop an automatic method to recognize the developmental stage and annotate anatomical terms.
Results: In this article, we propose a novel computational model for joint stage classification and anatomical term annotation of Drosophila gene expression patterns. We propose a novel Tri-Relational Graph (TG) model that comprises the data graph, the anatomical term graph and the developmental stage term graph, and connects them by two additional graphs induced from stage or annotation label assignments. On top of the TG model, we introduce a Preferential Random Walk (PRW) method to jointly recognize developmental stages and annotate anatomical terms by utilizing the interrelations between the two tasks. The experimental results on two refined BDGP datasets demonstrate that our joint learning method achieves better prediction results on both tasks than state-of-the-art methods.
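To make the idea of label propagation over a combined graph concrete, the sketch below runs a generic random walk with restart on a toy graph that mixes image nodes, anatomical term nodes and a stage node. It is not the PRW method itself, whose details are not given in this abstract. The node names, edge weights and the restart parameter alpha are all hypothetical.

```python
def random_walk_with_restart(adjacency, restart, alpha=0.85, tol=1e-12, max_iter=10000):
    """Stationary scores of a walk that follows edges with prob. alpha, else restarts."""
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    if any(s == 0 for s in col_sums):
        raise ValueError("every node needs at least one edge")
    # Column-normalise so each column is a probability distribution over next nodes.
    transition = [[adjacency[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    total = sum(restart)
    r = [x / total for x in restart]
    p = r[:]
    for _ in range(max_iter):
        nxt = [alpha * sum(transition[i][j] * p[j] for j in range(n)) + (1 - alpha) * r[i]
               for i in range(n)]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy graph: two image nodes, two anatomical terms, one stage term.
names = ["image_0", "image_1", "term_A", "term_B", "stage_X"]
edges = [
    (0, 1, 1.0),  # visual similarity between the two images
    (0, 2, 1.0),  # image_0 is already annotated with term_A
    (0, 4, 1.0),  # image_0 is already assigned to stage_X
    (2, 3, 0.5),  # term_A and term_B are related in the term graph
]
size = len(names)
adjacency = [[0.0] * size for _ in range(size)]
for a, b, w in edges:
    adjacency[a][b] = w
    adjacency[b][a] = w  # undirected graph

# Restart at the unannotated image and rank the anatomical terms by score.
scores = random_walk_with_restart(adjacency, restart=[0, 1, 0, 0, 0])
ranked_terms = sorted([2, 3], key=lambda i: scores[i], reverse=True)
print([(names[i], round(scores[i], 4)) for i in ranked_terms])
```

In this toy setup the term directly attached to the similar image outranks the term reachable only through the term graph, which is the kind of cross-graph influence a joint model is meant to exploit.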