Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms.
The task of deciphering the complex transcriptional regulatory networks controlling development is one of the major current challenges for molecular biology. The problem is difficult, if not impossible, to solve without a detailed knowledge of the spatiotemporal dynamics of gene expression. Thus, to understand development, we need to identify and functionally characterize all players in regulatory networks. Data on gene expression dynamics obtained from whole transcriptome microarray experiments, combined with in situ hybridization mRNA localisation patterns for a subset of genes, may provide a route for predicting the localisation of gene expression for those genes for which in situ data has not been generated, as well as suggesting functional information for uncharacterised genes. Here, we report the development of one of the first methods for predicting the localisation of gene expression during Drosophila embryogenesis from microarray data. Pooling the subset of genes in the fly genome with in situ data to form functional units, localised in space and time for relevant developmental processes, facilitates the statement of a classification problem, which we address with machine-learning methods. Our approach promotes a richer annotation of biological function for genes in the absence of costly and time-consuming experimental analysis.