We created innovative virtual representation for our large scale Drosophila insitu expression dataset. We aligned an elliptically shaped mesh comprised of small triangular regions to the outline of each embryo. Each triangle defines a unique location in the embryo and comparing corresponding triangles allows easy identification of similar expression patterns.The virtual representation was used to organize the expression landscape at stage 4-6. We identified regions with similar expression in the embryo and clustered genes with similar expression patterns.We created algorithms to mine the dataset for adjacent non-overlapping patterns and anti-correlated patterns. We were able to mine the dataset to identify co-expressed and putative interacting genes.Using co-expression we were able to assign putative functions to unknown genes.
Analyzing both temporal and spatial gene expression is essential for understanding development and regulatory networks of multicellular organisms. Interacting genes are commonly expressed in overlapping or adjacent domains. Thus, gene expression patterns can be used to assign putative gene functions and mined to infer candidates for networks.
We have generated a systematic two-dimensional mRNA expression atlas profiling embryonic development of Drosophila melanogaster (Tomancak et al, 2002, 2007). To date, we have collected over 70 000 images for over 6000 genes. To explore spatial relationships between gene expression patterns, we used a novel computational image-processing approach by converting expression patterns from the images into virtual representations (Figure 1). Using a custom-designed automated pipeline, for each image, we segmented and aligned the outline of the embryo to an elliptically shaped mesh, comprised of 311 small triangular regions each defining a unique location within the embryo. By comparing corresponding triangles, we produced a distance score to identify similar patterns. We generated those triangulated images (TIs) for our entire data set at all developmental stages and demonstrated that this representation can be used as for objective computationally defined description for expression in in situ hybridization images from various sources, including images from the literature.
We used the TIs to conduct a comprehensive analysis of the expression landscape. To this end, we created a novel approach to temporally sort and compact TIs to a non-redundant data set suitable for further computational processing. Although generally applicable for all developmental stages, for this study, we focused on developmental stages 4–6. For this stage range, we reduced the initial set of about 5800 TIs to 553 TIs containing 364 genes. Using this filtered data set, to discover how expression subdivides the embryo into regions, we clustered areas with similar expression and demonstrated that expression patterns divide the early embryo into distinct spatial regions resembling a fate map (Figure 3). To discover the range of unique expression patterns, we used affinity propagation clustering (Frey and Dueck, 2007) to group TIs with similar patterns and identified 39 clusters each representing a distinct pattern class. We integrated the remaining genes into the 39 clusters and studied the distribution of expression patterns and the relationships between the clusters.
The clustered expression patterns were used to identify putative positive and negative regulatory interactions. The similar TIs in each cluster not only grouped already known genes with related functions, but previously undescribed genes. A comparative analysis identified subtle differences between the genes within each expression cluster. To investigate these differences, we developed a novel Markov Random Field (MRF) segmentation algorithm to extract patterns. We then extended the MRF algorithm to detect shared expression boundaries, generate similarity measurements, and discriminate even faint/uncertain patterns between two TIs. This enabled us to identify more subtle partial expression pattern overlaps and adjacent non-overlapping patterns. For example, by conducting this analysis on the cluster containing the gene snail, we identified the previously known huckebein, which restricts snail expression (Reuter and Leptin, 1994), and zfh1, which interacts with tinman (Broihier et al, 1998; Su et al, 1999).
By studying the functions of known genes, we assigned putative developmental roles to each of the 39 clusters. Of the 1800 genes investigated, only half of them had previously assigned functions.
Representing expression patterns with geometric meshes facilitates the analysis of a complex process involving thousands of genes. This approach is complementary to the cellular resolution 3D atlas for the Drosophila embryo (Fowlkes et al, 2008). Our method can be used as a rapid, fully automated, high-throughput approach to obtain a map of co-expression, which will serve to select specific genes for detailed multiplex in-situ hybridization and confocal analysis for a fine-grain atlas. Our data are similar to the data in the literature, and research groups studying reporter constructs, mutant animals, or orthologs can easily produce in situ hybridizations. TIs can be readily created and provide representations that are both comparable to each other and our data set. We have demonstrated that our approach can be used for predicting relationships in regulatory and developmental pathways.
Discovery of temporal and spatial patterns of gene expression is essential for understanding the regulatory networks and development in multicellular organisms. We analyzed the images from our large-scale spatial expression data set of early Drosophila embryonic development and present a comprehensive computational image analysis of the expression landscape. For this study, we created an innovative virtual representation of embryonic expression patterns using an elliptically shaped mesh grid that allows us to make quantitative comparisons of gene expression using a common frame of reference. Demonstrating the power of our approach, we used gene co-expression to identify distinct expression domains in the early embryo; the result is surprisingly similar to the fate map determined using laser ablation. We also used a clustering strategy to find genes with similar patterns and developed new analysis tools to detect variation within consensus patterns, adjacent non-overlapping patterns, and anti-correlated patterns. Of the 1800 genes investigated, only half had previously assigned functions. The known genes suggest developmental roles for the clusters, and identification of related patterns predicts requirements for co-occurring biological functions.