Related Articles

1.  Prediction of Gene Expression in Embryonic Structures of Drosophila melanogaster 
PLoS Computational Biology  2007;3(7):e144.
Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms.
Author Summary
The task of deciphering the complex transcriptional regulatory networks controlling development is one of the major current challenges for molecular biology. The problem is difficult, if not impossible, to solve without a detailed knowledge of the spatiotemporal dynamics of gene expression. Thus, to understand development, we need to identify and functionally characterize all players in regulatory networks. Data on gene expression dynamics obtained from whole transcriptome microarray experiments, combined with in situ hybridization mRNA localisation patterns for a subset of genes, may provide a route for predicting the localisation of gene expression for those genes for which in situ data has not been generated, as well as suggesting functional information for uncharacterised genes. Here, we report the development of one of the first methods for predicting the localisation of gene expression during Drosophila embryogenesis from microarray data. Pooling the subset of genes in the fly genome with in situ data to form functional units, localised in space and time for relevant developmental processes, facilitates the statement of a classification problem, which we address with machine-learning methods. Our approach promotes a richer annotation of biological function for genes in the absence of costly and time-consuming experimental analysis.
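The classification problem described above can be illustrated with a minimal sketch. The abstract does not name the classifier, so a simple nearest-centroid rule is swapped in here purely for illustration; the tissue labels, profile values, and function names are all hypothetical.

```python
# Minimal sketch (assumption: nearest-centroid stands in for the unspecified
# machine-learning method). Genes with in situ annotations supply labelled
# microarray time-course profiles; unannotated genes are assigned the tissue
# whose mean profile is closest.

def centroid(profiles):
    """Mean expression profile across a list of equal-length time courses."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def train(labelled):
    """Build one centroid per tissue from (profile, tissue) pairs."""
    by_tissue = {}
    for profile, tissue in labelled:
        by_tissue.setdefault(tissue, []).append(profile)
    return {tissue: centroid(p) for tissue, p in by_tissue.items()}

def predict(model, profile):
    """Return the tissue whose centroid has the smallest squared distance."""
    def sq_dist(center):
        return sum((x - c) ** 2 for x, c in zip(profile, center))
    return min(model, key=lambda tissue: sq_dist(model[tissue]))

# Hypothetical training data: three time points per gene.
training = [
    ([0.1, 0.5, 0.9], "muscle"),
    ([0.2, 0.6, 1.0], "muscle"),
    ([0.9, 0.5, 0.1], "CNS"),
    ([1.0, 0.4, 0.2], "CNS"),
]
model = train(training)
prediction = predict(model, [0.0, 0.5, 0.8])  # a gene without in situ data
```

In this toy case the unannotated profile rises over time, like the hypothetical muscle genes, so it is assigned to that tissue.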
PMCID: PMC1924873  PMID: 17658945
2.  Systematic image-driven analysis of the spatial Drosophila embryonic expression landscape 
We created an innovative virtual representation for our large-scale Drosophila in situ expression dataset. We aligned an elliptically shaped mesh comprised of small triangular regions to the outline of each embryo. Each triangle defines a unique location in the embryo, and comparing corresponding triangles allows easy identification of similar expression patterns. The virtual representation was used to organize the expression landscape at stages 4-6. We identified regions with similar expression in the embryo and clustered genes with similar expression patterns. We created algorithms to mine the dataset for adjacent non-overlapping patterns and anti-correlated patterns. We were able to mine the dataset to identify co-expressed and putative interacting genes. Using co-expression, we were able to assign putative functions to unknown genes.
Analyzing both temporal and spatial gene expression is essential for understanding development and regulatory networks of multicellular organisms. Interacting genes are commonly expressed in overlapping or adjacent domains. Thus, gene expression patterns can be used to assign putative gene functions and mined to infer candidates for networks.
We have generated a systematic two-dimensional mRNA expression atlas profiling embryonic development of Drosophila melanogaster (Tomancak et al, 2002, 2007). To date, we have collected over 70 000 images for over 6000 genes. To explore spatial relationships between gene expression patterns, we used a novel computational image-processing approach by converting expression patterns from the images into virtual representations (Figure 1). Using a custom-designed automated pipeline, for each image, we segmented and aligned the outline of the embryo to an elliptically shaped mesh, comprised of 311 small triangular regions each defining a unique location within the embryo. By comparing corresponding triangles, we produced a distance score to identify similar patterns. We generated those triangulated images (TIs) for our entire data set at all developmental stages and demonstrated that this representation can be used as an objective, computationally defined description of expression in in situ hybridization images from various sources, including images from the literature.
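The triangle-by-triangle comparison can be sketched as follows. The abstract states only that corresponding triangles are compared to produce a distance score; the Euclidean metric, the per-triangle intensity encoding, and all names below are assumptions made for illustration.

```python
import math

N_TRIANGLES = 311  # mesh size stated in the abstract

def ti_distance(ti_a, ti_b):
    """Distance between two triangulated images (TIs).

    Each TI is a list with one intensity per mesh triangle. Euclidean
    distance is an assumed metric, not necessarily the authors' score.
    """
    if len(ti_a) != len(ti_b):
        raise ValueError("TIs must be defined on the same mesh")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ti_a, ti_b)))

def most_similar(query, candidates):
    """Name of the candidate TI with the smallest distance to the query."""
    return min(candidates, key=lambda name: ti_distance(query, candidates[name]))

# Hypothetical toy patterns: staining in the first or last 100 triangles.
anterior = [1.0 if i < 100 else 0.0 for i in range(N_TRIANGLES)]
posterior = [1.0 if i >= 211 else 0.0 for i in range(N_TRIANGLES)]
query = [0.9 if i < 100 else 0.1 for i in range(N_TRIANGLES)]

best = most_similar(query, {"anterior": anterior, "posterior": posterior})
```

Because every embryo is mapped onto the same mesh, triangle `i` refers to the same location in every TI, which is what makes an index-wise comparison meaningful.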
We used the TIs to conduct a comprehensive analysis of the expression landscape. To this end, we created a novel approach to temporally sort and compact TIs to a non-redundant data set suitable for further computational processing. Although generally applicable for all developmental stages, for this study, we focused on developmental stages 4–6. For this stage range, we reduced the initial set of about 5800 TIs to 553 TIs containing 364 genes. Using this filtered data set, to discover how expression subdivides the embryo into regions, we clustered areas with similar expression and demonstrated that expression patterns divide the early embryo into distinct spatial regions resembling a fate map (Figure 3). To discover the range of unique expression patterns, we used affinity propagation clustering (Frey and Dueck, 2007) to group TIs with similar patterns and identified 39 clusters each representing a distinct pattern class. We integrated the remaining genes into the 39 clusters and studied the distribution of expression patterns and the relationships between the clusters.
The clustered expression patterns were used to identify putative positive and negative regulatory interactions. The similar TIs in each cluster grouped not only already known genes with related functions, but also previously undescribed genes. A comparative analysis identified subtle differences between the genes within each expression cluster. To investigate these differences, we developed a novel Markov Random Field (MRF) segmentation algorithm to extract patterns. We then extended the MRF algorithm to detect shared expression boundaries, generate similarity measurements, and discriminate even faint/uncertain patterns between two TIs. This enabled us to identify more subtle partial expression pattern overlaps and adjacent non-overlapping patterns. For example, by conducting this analysis on the cluster containing the gene snail, we identified the previously known huckebein, which restricts snail expression (Reuter and Leptin, 1994), and zfh1, which interacts with tinman (Broihier et al, 1998; Su et al, 1999).
By studying the functions of known genes, we assigned putative developmental roles to each of the 39 clusters. Of the 1800 genes investigated, only half of them had previously assigned functions.
Representing expression patterns with geometric meshes facilitates the analysis of a complex process involving thousands of genes. This approach is complementary to the cellular resolution 3D atlas for the Drosophila embryo (Fowlkes et al, 2008). Our method can be used as a rapid, fully automated, high-throughput approach to obtain a map of co-expression, which will serve to select specific genes for detailed multiplex in-situ hybridization and confocal analysis for a fine-grain atlas. Our data are similar to the data in the literature, and research groups studying reporter constructs, mutant animals, or orthologs can easily produce in situ hybridizations. TIs can be readily created and provide representations that are both comparable to each other and our data set. We have demonstrated that our approach can be used for predicting relationships in regulatory and developmental pathways.
Discovery of temporal and spatial patterns of gene expression is essential for understanding the regulatory networks and development in multicellular organisms. We analyzed the images from our large-scale spatial expression data set of early Drosophila embryonic development and present a comprehensive computational image analysis of the expression landscape. For this study, we created an innovative virtual representation of embryonic expression patterns using an elliptically shaped mesh grid that allows us to make quantitative comparisons of gene expression using a common frame of reference. Demonstrating the power of our approach, we used gene co-expression to identify distinct expression domains in the early embryo; the result is surprisingly similar to the fate map determined using laser ablation. We also used a clustering strategy to find genes with similar patterns and developed new analysis tools to detect variation within consensus patterns, adjacent non-overlapping patterns, and anti-correlated patterns. Of the 1800 genes investigated, only half had previously assigned functions. The known genes suggest developmental roles for the clusters, and identification of related patterns predicts requirements for co-occurring biological functions.
PMCID: PMC2824522  PMID: 20087342
biological function; embryo; gene expression; in situ hybridization; Markov Random Field
3.  Genes encoding novel secreted and transmembrane proteins are temporally and spatially regulated during Drosophila melanogaster embryogenesis 
BMC Biology  2009;7:61.
Morphogenetic events that shape the Drosophila melanogaster embryo are tightly controlled by a genetic program in which specific sets of genes are up-regulated. We used a suppressive subtractive hybridization procedure to identify a group of developmentally regulated genes during early stages of D. melanogaster embryogenesis. We studied the spatiotemporal activity of these genes in five different intervals covering 12 stages of embryogenesis.
Microarrays were constructed to confirm induction of expression and to determine the temporal profile of isolated subtracted cDNAs during embryo development. We identified a set of 118 genes whose expression levels increased significantly in at least one developmental interval compared with a reference interval. Of these genes, 53% had a phenotype and/or molecular function reported in the literature, whereas 47% were essentially uncharacterized. Clustering analysis revealed demarcated transcript groups with maximum gene activity at distinct developmental intervals. In situ hybridization assays were carried out on 23 uncharacterized genes, 15 of which proved to have spatiotemporally restricted expression patterns. Among these 15 uncharacterized genes, 13 were found to encode putative secreted and transmembrane proteins. For three of them we validated our protein sequence predictions by expressing their cDNAs in Drosophila S2R+ cells and analyzed the subcellular distribution of recombinant proteins. We then focused on the functional characterization of the gene CG6234. Inhibition of CG6234 by RNA interference resulted in morphological defects in embryos, suggesting the involvement of this gene in germ band retraction.
Our data have yielded a list of developmentally regulated D. melanogaster genes and their expression profiles during embryogenesis and provide new information on the spatiotemporal expression patterns of several uncharacterized genes. In particular, we recovered a substantial number of unknown genes encoding putative secreted and transmembrane proteins, suggesting new components of signaling pathways that might be incorporated within the existing regulatory networks controlling D. melanogaster embryogenesis. These genes are also good candidates for additional targeted functional analyses similar to those we conducted for CG6234.
See related minireview by Vichas and Zallen:
PMCID: PMC2761875  PMID: 19772636
4.  A Digital Framework to Build, Visualize and Analyze a Gene Expression Atlas with Cellular Resolution in Zebrafish Early Embryogenesis 
PLoS Computational Biology  2014;10(6):e1003670.
A gene expression atlas is an essential resource to quantify and understand the multiscale processes of embryogenesis in time and space. The automated reconstruction of a prototypic 4D atlas for vertebrate early embryos, using multicolor fluorescence in situ hybridization with nuclear counterstain, requires dedicated computational strategies. To this goal, we designed an original methodological framework implemented in a software tool called Match-IT. With only minimal human supervision, our system is able to gather gene expression patterns observed in different analyzed embryos with phenotypic variability and map them onto a series of common 3D templates over time, creating a 4D atlas. This framework was used to construct an atlas composed of 6 gene expression templates from a cohort of zebrafish early embryos spanning 6 developmental stages from 4 to 6.3 hpf (hours post fertilization). They included 53 specimens, 181,415 detected cell nuclei and the segmentation of 98 gene expression patterns observed in 3D for 9 different genes. In addition, an interactive visualization software, Atlas-IT, was developed to inspect, supervise and analyze the atlas. Match-IT and Atlas-IT, including user manuals, representative datasets and video tutorials, are publicly and freely available online. We also propose computational methods and tools for the quantitative assessment of the gene expression templates at the cellular scale, with the identification, visualization and analysis of coexpression patterns, synexpression groups and their dynamics through developmental stages.
Author Summary
We propose a workflow to map the expression domains of multiple genes onto a series of 3D templates, or “atlas”, during early embryogenesis. It was applied to the zebrafish at different stages between 4 and 6.3 hpf, generating 6 templates. Our system overcomes the lack of significant morphological landmarks in early development by relying on the expression of a reference gene (goosecoid, gsc) and nuclear staining to guide the registration of the analyzed genes. The proposed method also successfully maps gene domains from partially imaged embryos, thus allowing greater microscope magnification and cellular resolution. By using the workflow to construct a spatiotemporal database of zebrafish, we opened the way to a systematic analysis of vertebrate embryogenesis. The atlas database, together with the mapping software (Match-IT), a custom-made visualization platform (Atlas-IT), and step-by-step user guides are available from the Supplementary Material. We expect that this will encourage other laboratories to generate, map, visualize and analyze new gene expression datasets.
PMCID: PMC4063669  PMID: 24945246
5.  A High-Resolution Anatomical Atlas of the Transcriptome in the Mouse Embryo 
PLoS Biology  2011;9(1):e1000582.
The manuscript describes the “digital transcriptome atlas” of the developing mouse embryo, a powerful resource to determine co-expression of genes, to identify cell populations and lineages and to identify functional associations between genes relevant to development and disease.
Ascertaining when and where genes are expressed is of crucial importance to understanding or predicting the physiological role of genes and proteins and how they interact to form the complex networks that underlie organ development and function. It is, therefore, crucial to determine on a genome-wide level, the spatio-temporal gene expression profiles at cellular resolution. This information is provided by colorimetric RNA in situ hybridization that can elucidate expression of genes in their native context and does so at cellular resolution. We generated what is to our knowledge the first genome-wide transcriptome atlas by RNA in situ hybridization of an entire mammalian organism, the developing mouse at embryonic day 14.5. This digital transcriptome atlas, the Eurexpress atlas, consists of a searchable database of annotated images that can be interactively viewed. We generated anatomy-based expression profiles for over 18,000 coding genes and over 400 microRNAs. We identified 1,002 tissue-specific genes that are a source of novel tissue-specific markers for 37 different anatomical structures. The quality and the resolution of the data revealed novel molecular domains for several developing structures, such as the telencephalon, a novel organization for the hypothalamus, and insight on the Wnt network involved in renal epithelial differentiation during kidney development. The digital transcriptome atlas is a powerful resource to determine co-expression of genes, to identify cell populations and lineages, and to identify functional associations between genes relevant to development and disease.
Author Summary
In situ hybridization (ISH) can be used to visualize gene expression in cells and tissues in their native context. High-throughput ISH using nonradioactive RNA probes allowed the Eurexpress consortium to generate a comprehensive, interactive, and freely accessible digital gene expression atlas, the Eurexpress transcriptome atlas, of the E14.5 mouse embryo. Expression data for over 15,000 genes were annotated for hundreds of anatomical structures, thus allowing us to systematically identify tissue-specific and tissue-overlapping gene networks. We illustrate the value of the Eurexpress atlas by finding novel regional subdivisions in the developing brain. We also use the transcriptome atlas to allocate specific components of the complex Wnt signaling pathway to kidney development, and we identify regionally expressed genes in liver that may be markers of hematopoietic stem cell differentiation.
PMCID: PMC3022534  PMID: 21267068
6.  Regulatory Pathway Analysis by High-Throughput In Situ Hybridization  
PLoS Genetics  2007;3(10):e178.
Automated in situ hybridization enables the construction of comprehensive atlases of gene expression patterns in mammals. Such atlases can become Web-searchable digital expression maps of individual genes and thus offer an entryway to elucidate genetic interactions and signaling pathways. Towards this end, an atlas housing ∼1,000 spatial gene expression patterns of the midgestation mouse embryo was generated. Patterns were textually annotated using a controlled vocabulary comprising >90 anatomical features. Hierarchical clustering of annotations was carried out using distance scores calculated from the similarity between pairs of patterns across all anatomical structures. This process ordered hundreds of complex expression patterns into a matrix that reflects the embryonic architecture and the relatedness of patterns of expression. Clustering yielded 12 distinct groups of expression patterns. Because of the similarity of expression patterns within a group, members of each group may be components of regulatory cascades. We focused on the group containing Pax6, an evolutionarily conserved transcriptional master mediator of development. Seventeen of the 82 genes in this group showed a change of expression in the developing neocortex of Pax6-deficient embryos. Electromobility shift assays were used to test for the presence of Pax6-paired domain binding sites. This led to the identification of 12 genes not previously known as potential targets of Pax6 regulation. These findings suggest that cluster analysis of annotated gene expression patterns obtained by automated in situ hybridization is a novel approach for identifying components of signaling cascades.
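Hierarchical clustering of annotation-based distance scores can be sketched as below. The abstract does not specify the distance or linkage, so Jaccard distance over sets of annotated structures and average linkage are assumptions; the gene names and annotations are hypothetical.

```python
# Minimal sketch: agglomerative clustering of genes by the overlap of their
# anatomical annotations. Jaccard distance and average linkage are assumed.

def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two sets of annotated structures."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def average_linkage(c1, c2, annotations):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(g1, g2) for g1 in c1 for g2 in c2]
    total = sum(jaccard_distance(annotations[g1], annotations[g2])
                for g1, g2 in pairs)
    return total / len(pairs)

def cluster(annotations, n_groups):
    """Merge the closest pair of clusters until n_groups remain."""
    clusters = [[gene] for gene in sorted(annotations)]
    while len(clusters) > n_groups:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], annotations)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical annotations drawn from a controlled vocabulary.
annotations = {
    "geneA": {"neocortex", "eye", "spinal cord"},
    "geneB": {"neocortex", "eye"},
    "geneC": {"kidney", "liver"},
    "geneD": {"kidney"},
}
groups = cluster(annotations, 2)
```

Stopping the merges at a chosen number of groups corresponds to cutting the dendrogram at one level, which is one way a fixed set of groups, such as the 12 reported here, could be obtained.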
Author Summary
Signaling pathways drive biological processes with high specificity. Reductionist approaches such as mutagenesis provide one strategy to identify components of pathways. We used high throughput in situ hybridization to systematically map the spatiotemporal expression pattern of ∼1,000 developmental genes in the mouse embryo. The rich information collectively contained in these patterns was captured in annotation tables that were systematically mined using hierarchical clustering, resulting in 12 groups of genes with related expression patterns. We show that this process generates biologically meaningful, high-content information. The expression pattern of developmental master regulator Pax6 is found in a cluster together with that of 81 other genes. The paired DNA binding domain of Pax6 can bind to regulatory sequences in 14 of the 81 genes. We also found that the expression pattern of all these 14 genes is up- or downregulated in Pax6 mutant mice. These results emphasize that determining the expression pattern of many genes in a systematic way followed by an application of integrative tools leads to the identification of novel candidate components of signaling pathways. More generally, when complemented with appropriate data-mining strategies, transcriptome-scale in situ hybridization can be turned into a powerful instrument for systems biology.
PMCID: PMC2041993  PMID: 17953485
7.  Unexpected Novel Relational Links Uncovered by Extensive Developmental Profiling of Nuclear Receptor Expression 
PLoS Genetics  2007;3(11):e188.
Nuclear receptors (NRs) are transcription factors that are implicated in several biological processes such as embryonic development, homeostasis, and metabolic diseases. To study the role of NRs in development, it is critically important to know when and where individual genes are expressed. Although systematic expression studies using reverse transcriptase PCR and/or DNA microarrays have been performed in classical model systems such as Drosophila and mouse, no systematic atlas describing NR involvement during embryonic development on a global scale has been assembled. Adopting a systems biology approach, we conducted a systematic analysis of the dynamic spatiotemporal expression of all NR genes as well as their main transcriptional coregulators during zebrafish development (101 genes) using whole-mount in situ hybridization. This extensive dataset establishes overlapping expression patterns among NRs and coregulators, indicating hierarchical transcriptional networks. This complete developmental profiling provides an unprecedented examination of expression of NRs during embryogenesis, uncovering their potential function during central nervous system and retina formation. Moreover, our study reveals that tissue specificity of hormone action is conferred more by the receptors than by their coregulators. Finally, further evolutionary analyses of this global resource led us to propose that neofunctionalization of duplicated genes occurs at the levels of both protein sequence and RNA expression patterns. Altogether, this expression database of NRs provides novel routes for leading investigation into the biological function of each individual NR as well as for the study of their combinatorial regulatory circuitry within the superfamily.
Author Summary
NRs are key molecules controlling development, metabolism, and reproduction in metazoans. Since NRs are implicated in many human diseases such as cancer, metabolic syndrome, and hormone resistance, they are important pharmaceutical targets and are under intense scrutiny to better understand their biological functions. In the present study, we determined the expression patterns of all NR genes as well as their main transcriptional coregulators during zebrafish development. We used zebrafish because the transparency of its embryo allows us to perform whole-mount in situ hybridization from early development to late organogenesis. This complete developmental profiling offers an unprecedented view of NR expression during embryogenesis, uncovering their potential function during central nervous system and retina formation. We observed that in contrast to NR genes, only a few coregulators exhibit a restricted expression pattern, suggesting that tissue specificity of hormone action is conferred more by the receptors than by their coregulators. Lastly, by evolutionary analysis of expression pattern divergence of duplicated genes, we observed that neofunctionalization occurs at the levels of both protein sequence and mRNA expression patterns. Taken together, our data provide the starting point for functional analysis of an entire gene family during development and call for the study of the intersection between metabolism and development.
PMCID: PMC2065881  PMID: 17997606
8.  Global analysis of patterns of gene expression during Drosophila embryogenesis 
Genome Biology  2007;8(7):R145.
Embryonic expression patterns for 6,003 (44%) of the 13,659 protein-coding genes identified in the Drosophila melanogaster genome were documented, of which 40% show tissue-restricted expression.
Cell and tissue specific gene expression is a defining feature of embryonic development in multi-cellular organisms. However, the range of gene expression patterns, the extent of the correlation of expression with function, and the classes of genes whose spatial expression are tightly regulated have been unclear due to the lack of an unbiased, genome-wide survey of gene expression patterns.
We determined and documented embryonic expression patterns for 6,003 (44%) of the 13,659 protein-coding genes identified in the Drosophila melanogaster genome with over 70,000 images and controlled vocabulary annotations. Individual expression patterns are extraordinarily diverse, but by supplementing qualitative in situ hybridization data with quantitative microarray time-course data using a hybrid clustering strategy, we identify groups of genes with similar expression. Of 4,496 genes with detectable expression in the embryo, 2,549 (57%) fall into 10 clusters representing broad expression patterns. The remaining 1,947 (43%) genes fall into 29 clusters representing restricted expression, 20% patterned as early as blastoderm, with the majority restricted to differentiated cell types, such as epithelia, nervous system, or muscle. We investigate the relationship between expression clusters and known molecular and cellular-physiological functions.
Nearly 60% of the genes with detectable expression exhibit broad patterns reflecting quantitative rather than qualitative differences between tissues. The other 40% show tissue-restricted expression; the expression patterns of over 1,500 of these genes are documented here for the first time. Within each of these categories, we identified clusters of genes associated with particular cellular and developmental functions.
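The percentages quoted in this abstract follow directly from the reported gene counts. The short check below recomputes them; only the helper name `percent` is introduced here.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)

# Counts taken from the abstract.
documented = percent(6003, 13659)   # genes with documented expression patterns
broad = percent(2549, 4496)         # genes in the 10 broad-expression clusters
restricted = percent(1947, 4496)    # genes in the 29 restricted-expression clusters

# The two cluster categories partition the genes with detectable expression.
partition_ok = (2549 + 1947 == 4496)
```

The rounded values are 44%, 57%, and 43%, so the "nearly 60%" and "other 40%" in the closing paragraph are looser roundings of the same 57%/43% split.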
PMCID: PMC2323238  PMID: 17645804
9.  Automated annotation of developmental stages of Drosophila embryos in images containing spatial patterns of expression 
Bioinformatics  2013;30(2):266-273.
Motivation: Drosophila melanogaster is a major model organism for investigating the function and interconnection of animal genes in the earliest stages of embryogenesis. Today, images capturing Drosophila gene expression patterns are being produced at a higher throughput than ever before. The analysis of spatial patterns of gene expression is most biologically meaningful when images from a similar time point during development are compared. Thus, the critical first step is to determine the developmental stage of an embryo. This information is also needed to observe and analyze expression changes over developmental time. Currently, developmental stages (time) of embryos in images capturing spatial expression pattern are annotated manually, which is time- and labor-intensive. Embryos are often designated into stage ranges, making the information on developmental time coarse. This makes downstream analyses inefficient and biological interpretations of similarities and differences in spatial expression patterns challenging, particularly when using automated tools for analyzing expression patterns of large numbers of images.
Results: Here, we present a new computational approach to annotate developmental stage for Drosophila embryos in the gene expression images. In an analysis of 3724 images, the new approach shows high accuracy in predicting the developmental stage correctly (79%). In addition, it provides a stage score that enables one to more finely annotate each embryo so that they are divided into early and late periods of development within standard stage demarcations. Stage scores for all images containing expression patterns of the same gene enable a direct way to view expression changes over developmental time for any gene. We show that the genomewide-expression-maps generated using images from embryos in refined stages illuminate global gene activities and changes much better, and more refined stage annotations improve our ability to better interpret results when expression pattern matches are discovered between genes.
Availability and implementation: The software package is available for download at:∼jye02/Software/Fly-Project/.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3892688  PMID: 24300439
10.  Automatic Annotation of Spatial Expression Patterns via Sparse Bayesian Factor Models 
PLoS Computational Biology  2011;7(7):e1002098.
Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D–4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, considering that available data are rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach on embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide for a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself, as well as its application for analysis tasks such as automated annotation, can provide insight into biological questions.
Author Summary
High throughput image acquisition is a quickly increasing new source of data for problems in computational biology, such as phenotypic screens. Given the very diverse nature of imaging technology, samples, and biological questions, approaches are oftentimes tailored ad hoc to a specific data set. In particular, the image-based genome scale profiling of gene expression patterns via approaches like in situ hybridization requires the development of accurate and automatic image analysis systems for understanding regulatory networks and development of multicellular organisms. Here, we present a computational method for automated annotation of Drosophila gene expression images. This framework allows us to extract, identify and compare spatial expression patterns, which is essential for higher organisms. Based on a sparse feature extraction technique, we successfully cluster and annotate expression patterns with high reliability, and show that the model represents a “vocabulary” of basic patterns reflecting common function or regulation.
PMCID: PMC3140966  PMID: 21814502
11.  A bag-of-words approach for Drosophila gene expression pattern annotation 
BMC Bioinformatics  2009;10:119.
Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way for studying the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were annotated with a variable number of anatomical terms manually using a controlled vocabulary. Considering that the number of available images is rapidly increasing, it is imperative to design computational methods to automate this task.
We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to utilize the existing information on pattern annotation and annotates images using a model that exploits correlations among terms. The proposed method can annotate images individually or in groups (e.g., according to the developmental stage). In addition, the proposed method can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods.
The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. The integration of existing annotation information from multiple embryonic views improves annotation performance.
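The bag-of-words idea mentioned above can be illustrated with a minimal sketch. This is a generic visual bag-of-words histogram, not the authors' pipeline: the codebook, the two-dimensional "descriptors", and the function names (`nearest_codeword`, `bag_of_words`, `group_bag_of_words`) are all assumptions made for illustration. Each local descriptor is assigned to its nearest codeword, and the descriptors from every image in a group are pooled into one normalized histogram.

```python
from math import dist


def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))


def bag_of_words(descriptors, codebook):
    """Normalized histogram of codeword assignments for a set of local descriptors."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def group_bag_of_words(images, codebook):
    """Pool the descriptors of every image in a group into a single histogram."""
    pooled = [d for image in images for d in image]
    return bag_of_words(pooled, codebook)


# Hypothetical toy data: a 3-word codebook and two images with 2 descriptors each.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_a = [(0.1, 0.0), (0.9, 1.2)]
image_b = [(4.8, 5.1), (5.2, 4.9)]
group_histogram = group_bag_of_words([image_a, image_b], codebook)
print(group_histogram)
```

Because the histogram has a fixed length regardless of how many images or descriptors a group contains, groups of different sizes become directly comparable, which is one plausible reason such a representation suits group-level annotation.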
PMCID: PMC2680406  PMID: 19383139
12.  Automated annotation of Drosophila gene expression patterns using a controlled vocabulary 
Bioinformatics  2008;24(17):1881-1888.
Motivation: Regulation of gene expression in space and time directs its localization to a specific subset of cells during development. Systematic determination of the spatiotemporal dynamics of gene expression plays an important role in understanding the regulatory networks driving development. An atlas for the gene expression patterns of fruit fly Drosophila melanogaster has been created by whole-mount in situ hybridization, and it documents the dynamic changes of gene expression pattern during Drosophila embryogenesis. The spatial and temporal patterns of gene expression are integrated by anatomical terms from a controlled vocabulary linking together intermediate tissues developed from one another. Currently, the terms are assigned to patterns manually. However, the number of patterns generated by high-throughput in situ hybridization is rapidly increasing. It is, therefore, tempting to approach this problem by employing computational methods.
Results: In this article, we present a novel computational framework for annotating gene expression patterns using a controlled vocabulary. In the currently available high-throughput data, annotation terms are assigned to groups of patterns rather than to individual images. We propose to extract invariant features from images, and construct pyramid match kernels to measure the similarity between sets of patterns. To exploit the complementary information conveyed by different features and incorporate the correlation among patterns sharing common structures, we propose efficient convex formulations to integrate the kernels derived from various features. The proposed framework is evaluated by comparing its annotation with that of human curators, and promising performance in terms of F1 score has been reported.
Supplementary information: Supplementary data are available at Bioinformatics online.
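As an illustration of how a pyramid match kernel compares two sets of features, the following is a simplified one-dimensional sketch. It is not the authors' implementation: the scalar features, the choice of bin widths that double at each level, and the function names are assumptions. Matches are counted by histogram intersection at each level, and only the matches that are new at a level contribute, weighted by the inverse of the bin width.

```python
from collections import Counter


def histogram(values, width):
    """Count how many values fall into each bin of the given width."""
    return Counter(int(v // width) for v in values)


def intersection(h1, h2):
    """Histogram intersection: number of matched points across shared bins."""
    return sum(min(count, h2[bin_id]) for bin_id, count in h1.items())


def pyramid_match(xs, ys, levels):
    """Simplified pyramid match score for two sets of scalar features."""
    score = 0.0
    previous = 0
    for level in range(levels):
        width = 2 ** level  # bins double in width at each coarser level
        matched = intersection(histogram(xs, width), histogram(ys, width))
        # Only matches that first appear at this level count, down-weighted
        # because coarser bins imply a looser match.
        score += (matched - previous) / width
        previous = matched
    return score


# Hypothetical feature sets standing in for two groups of patterns.
set_x = [0.5, 3.2, 6.1]
set_y = [0.7, 2.9, 7.5]
similarity = pyramid_match(set_x, set_y, levels=4)
print(similarity)
```

A set compared with itself matches every point at the finest level, so its score equals the set size; less similar sets accumulate their matches at coarser, lower-weighted levels.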
PMCID: PMC2519157  PMID: 18632750
13.  Joint stage recognition and anatomical annotation of drosophila gene expression patterns 
Bioinformatics  2012;28(12):i16-i24.
Motivation: Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a Drosophila melanogaster embryo delivers the detailed spatio-temporal patterns of the gene expression. Many related biological problems such as the detection of co-expressed genes, co-regulated genes and transcription factor binding motifs rely heavily on the analysis of these image patterns. To provide text-based pattern searching that facilitates related biological studies, the images in the Berkeley Drosophila Genome Project (BDGP) study are annotated with developmental stage terms and anatomical ontology terms manually by domain experts. Due to the rapid increase in the number of such images and the inevitable bias in annotations by human curators, it is necessary to develop an automatic method to recognize the developmental stage and annotate anatomical terms.
Results: In this article, we propose a novel computational model for joint stage classification and anatomical term annotation of Drosophila gene expression patterns. We propose a novel Tri-Relational Graph (TG) model that comprises the data graph, the anatomical term graph, and the developmental stage term graph, and connects them by two additional graphs induced from stage or annotation label assignments. Upon the TG model, we introduce a Preferential Random Walk (PRW) method to jointly recognize developmental stage and annotate anatomical terms by utilizing the interrelations between the two tasks. The experimental results on two refined BDGP datasets demonstrate that our joint learning method achieves better prediction results on both tasks than the state-of-the-art methods.
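To give a feel for how a random walk can propagate labels over a graph that mixes images and terms, here is a minimal sketch of a standard random walk with restart. It is a swapped-in, generic technique and not the Preferential Random Walk or the Tri-Relational Graph of the article; the toy graph, the edge weights, the restart probability, and the function name are assumptions for illustration only.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, iterations=200):
    """Visiting probabilities of a walker that jumps back to `seed` with
    probability `restart` at every step (power iteration)."""
    n = len(adjacency)
    row_sums = [sum(row) for row in adjacency]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(iterations):
        nxt = [restart * start[i] for i in range(n)]
        for j in range(n):
            if row_sums[j] == 0:
                # A node without edges sends its probability mass back to the seed.
                nxt[seed] += (1 - restart) * p[j]
                continue
            for i in range(n):
                nxt[i] += (1 - restart) * p[j] * adjacency[j][i] / row_sums[j]
        p = nxt
    return p


# Hypothetical toy graph:
# 0 = query image, 1 = similar annotated image, 2 = term "A" (attached to 1),
# 3 = dissimilar annotated image (weak edge to 0), 4 = term "B" (attached to 3).
graph = [
    [0.0, 1.0, 0.0, 0.1, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
scores = random_walk_with_restart(graph, seed=0)
print(scores)
```

In this toy setting the term reached through the strongly similar image ends up with a higher score than the term reached through the weak edge, which is the intuition behind ranking candidate annotations by walk probability.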
PMCID: PMC3371852  PMID: 22689756
14.  B1 SOX Coordinate Cell Specification with Patterning and Morphogenesis in the Early Zebrafish Embryo 
PLoS Genetics  2010;6(5):e1000936.
The B1 SOX transcription factors SOX1/2/3/19 have been implicated in various processes of early embryogenesis. However, their regulatory functions in stages from the blastula to early neurula remain largely unknown, primarily because loss-of-function studies have not been informative to date. In our present study, we systematically knocked down the B1 sox genes in zebrafish. Only the quadruple knockdown of the four B1 sox genes sox2/3/19a/19b resulted in very severe developmental abnormalities, confirming that the B1 sox genes are functionally redundant. We characterized the sox2/3/19a/19b quadruple knockdown embryos in detail by examining the changes in gene expression through in situ hybridization, RT–PCR, and microarray analyses. Importantly, these phenotypic analyses revealed that the B1 SOX proteins regulate the following distinct processes: (1) early dorsoventral patterning by controlling bmp2b/7; (2) gastrulation movements via the regulation of pcdh18a/18b and wnt11, a non-canonical Wnt ligand gene; (3) neural differentiation by regulating the Hes-class bHLH gene her3 and the proneural-class bHLH genes neurog1 (positively) and ascl1a (negatively), and regional transcription factor genes, e.g., hesx1, zic1, and rx3; and (4) neural patterning by regulating signaling pathway genes, cyp26a1 in RA signaling, oep in Nodal signaling, shh, and mdkb. Chromatin immunoprecipitation analysis of the her3, hesx1, neurog1, pcdh18a, and cyp26a1 genes further suggests a direct regulation of these genes by B1 SOX. We also found an interesting overlap between the early phenotypes of the B1 sox quadruple knockdown embryos and the maternal-zygotic spg embryos that are devoid of pou5f1 activity. 
These findings indicate that the B1 SOX proteins control a wide range of developmental regulators in the early embryo through partnering in part with Pou5f1 and possibly with other factors, and suggest that the B1 sox functions are central to coordinating cell fate specification with patterning and morphogenetic processes occurring in the early embryo.
Author Summary
In the developing embryo, various processes such as cell fate specification, embryo patterning, and morphogenesis take place concurrently. The embryo must control gene expression in order to coordinate these processes and thereby enable the proper organization of its structures. The B1 sox transcription factor genes, exemplified by the “stem cell gene” sox2, are thought to play a key role in these embryonic processes from the blastoderm stage to the neural stage. However, the precise regulatory functions of these genes are largely unknown due to the lack of loss-of-function studies. In our current study, we took advantage of the zebrafish system and successfully depleted B1 sox activity from the early embryo using antisense knockdown technology. This approach enabled us to further uncover the regulatory functions of B1 sox in early embryos. We found that the activity of the B1 sox genes is required for the expression of a wide range of developmental regulators including transcription factors, signaling pathway components, and cell adhesion molecules. These findings suggest that the B1 sox functions are central to coordinating diverse embryonic processes, particularly those that occur during the development of the primordium of the central nervous system.
PMCID: PMC2865518  PMID: 20463883
15.  Alpha-adaptin, a marker for endocytosis, is expressed in complex patterns during Drosophila development. 
Molecular Biology of the Cell  1997;8(8):1391-1403.
A Drosophila cDNA encoding a structural homologue of the mammalian coated vesicle component alpha-adaptin (AP2 adaptor complex) has been cloned and sequenced. The mammalian and invertebrate sequences are highly conserved, especially within the amino terminal region, a domain that mediates interactions with other components within the AP2 complex and with specific receptor tails. Mammalian alpha-adaptins are encoded by two genes; however, Drosophila alpha-adaptin has a single gene locus, within polytene bands 21C2-C3 on the left arm of chromosome 2, closely adjacent to the paired homeobox gene aristaless. There seem to be at least two Drosophila alpha-adaptin transcripts expressed, plausibly by alternative splicing. One of the transcripts is more abundant during early embryogenesis and may be of maternal origin. We have studied the distribution of the alpha-adaptin protein throughout embryogenesis and at the neuromuscular junction of the third instar larva. During cellularization of the blastoderm embryo, the protein is seen between and ahead of the elongating nuclei, and then redistributes to the cell surface during gastrulation. These observations suggest a role for endocytosis in cellularization and are consistent with the finding that dynamin (the shibire gene product), another component of the endocytic mechanism, is required for cellularization. At later stages of embryogenesis, alpha-adaptin is expressed in complex and dynamic patterns. It is strongly induced in elements of the central and peripheral nervous system (e.g., in neuroblasts, the presumptive stomatogastric nervous system, and the lateral chordotonal sense organs), in the Garland cells, the adult midgut precursors, the antenno-maxillary complex, the endoderm, the fat bodies, and the visceral mesoderm. In the larva, alpha-adaptin is localized at the plasma membrane in the synaptic boutons of the neuromuscular junctions.
The cells expressing high levels of alpha-adaptin are known or expected to support high levels of endocytosis; thus, this coated vesicle protein seems to be an excellent marker for endocytic activity. The expression patterns of dynamin, detected in the embryo by in situ hybridization methods, are very similar to those reported here for alpha-adaptin reflecting the likely coordinated expression of endocytic components. Taken together with previous evidence, our results suggest that endosomal vesicle trafficking, membrane recycling, and the regulation of endocytosis play critical roles in the wide range of developmental processes.
PMCID: PMC276164  PMID: 9285813
16.  Spatial gene expression quantification: a tool for analysis of in situ hybridizations in sea anemone Nematostella vectensis 
BMC Research Notes  2012;5:555.
Spatial gene expression quantification is required for modeling gene regulation in developing organisms. The fruit fly Drosophila melanogaster is the model system most widely applied for spatial gene expression analysis due to its unique embryonic properties: the shape does not change significantly during its early cleavage cycles and most genes are differentially expressed along a straight axis. This system of development is quite exceptional in the animal kingdom.
In the sea anemone Nematostella vectensis the embryo changes its shape during early development; there are cell divisions and cell movement, like in most other metazoans. Nematostella is an attractive case study for spatial gene expression since its transparent body wall makes it accessible to various imaging techniques.
Our new quantification method produces standardized gene expression profiles from raw or annotated Nematostella in situ hybridizations by measuring the expression intensity along its cell layer. The procedure is based on digital morphologies derived from high-resolution fluorescence pictures. Additionally, complete descriptions of nonsymmetric expression patterns have been constructed by transforming the gene expression images into a three-dimensional representation.
We created a standard format for gene expression data, which enables quantitative analysis of in situ hybridizations from embryos with various shapes in different developmental stages. The obtained expression profiles are suitable as input for optimization of gene regulatory network models, and for correlation analysis of genes from dissimilar Nematostella morphologies. This approach is potentially applicable to many other metazoan model organisms and may also be suitable for processing data from three-dimensional imaging techniques.
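One way to picture a standardized expression profile is to resample intensities measured along a traced cell layer onto a fixed number of relative positions, so that embryos of different shapes yield profiles of equal length. The sketch below assumes this reading; it is not the published tool. The polyline coordinates, the intensity values, the use of linear interpolation, and the function name are all hypothetical.

```python
from math import dist


def standardized_profile(points, intensities, n_bins=5):
    """Resample intensities measured at `points` along a polyline onto
    `n_bins` evenly spaced relative positions between 0 and 1.
    Assumes n_bins >= 2 and a polyline of non-zero length."""
    # Cumulative arc length along the traced layer.
    arc = [0.0]
    for a, b in zip(points, points[1:]):
        arc.append(arc[-1] + dist(a, b))
    relative = [s / arc[-1] for s in arc]

    profile = []
    j = 0
    for k in range(n_bins):
        t = k / (n_bins - 1)
        # Advance to the segment that contains relative position t.
        while j < len(relative) - 2 and relative[j + 1] < t:
            j += 1
        span = relative[j + 1] - relative[j]
        w = (t - relative[j]) / span if span else 0.0
        # Linear interpolation between the two neighbouring measurements.
        profile.append((1 - w) * intensities[j] + w * intensities[j + 1])
    return profile


# Hypothetical measurements: unevenly spaced points along a straight layer.
layer_points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
layer_intensity = [0.0, 10.0, 30.0, 40.0]
profile = standardized_profile(layer_points, layer_intensity, n_bins=5)
print(profile)
```

Using relative rather than absolute position along the layer is the design choice that makes profiles from differently shaped or sized embryos comparable point by point.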
PMCID: PMC3532226  PMID: 23039089
Nematostella vectensis; Gene expression quantification; Gene network modelling; Embryonic development; Embryo morphology
17.  Search for the genes involved in oocyte maturation and early embryo development in the hen 
BMC Genomics  2008;9:110.
The initial stages of development depend on mRNA and proteins accumulated in the oocyte, and during these stages, certain genes are essential for fertilization, first cleavage and embryonic genome activation. The aim of this study was first to search for avian oocyte-specific genes using in silico and microarray approaches, then to investigate the temporal and spatial dynamics of the expression of some of these genes during follicular maturation and early embryogenesis.
The in silico approach allowed us to identify 18 chicken homologs of mouse potential oocyte genes found by digital differential display. Using the chicken Affymetrix microarray, we identified 461 genes overexpressed in granulosa cells (GCs) and 250 genes overexpressed in the germinal disc (GD) of the hen oocyte. Six genes were identified using both in silico and microarray approaches. Based on GO annotations, GC and GD genes were differentially involved in biological processes, reflecting different physiological destinations of these two cell layers. Finally we studied the spatial and temporal dynamics of the expression of 21 chicken genes. According to their expression patterns all these genes are involved in different stages of final follicular maturation and/or early embryogenesis in the chicken. Among them, 8 genes (btg4, chkmos, wee, zpA, dazL, cvh, zar1 and ktfn) were preferentially expressed in the maturing oocyte and cvh, zar1 and ktfn were also highly expressed in the early embryo.
We showed that in silico and Affymetrix microarray approaches were relevant and complementary in order to find new avian genes potentially involved in oocyte maturation and/or early embryo development, and allowed the discovery of new potential chicken mature oocyte and chicken granulosa cell markers for future studies. Moreover, detailed study of the expression of some of these genes revealed promising candidates for maternal effect genes in the chicken. Finally, the finding concerning the different state of rRNA compared to that of mRNA during the postovulatory period shed light on some mechanisms through which oocyte to embryo transition occurs in the hen.
PMCID: PMC2322995  PMID: 18312645
18.  Image-level and group-level models for Drosophila gene expression pattern annotation 
BMC Bioinformatics  2013;14:350.
Drosophila melanogaster has been established as a model organism for investigating the developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into the gene functions, interactions, and networks. To facilitate pattern recognition and comparison, many web-based resources have been created to conduct comparative analysis based on the body part keywords and the associated images. With the fast accumulation of images from high-throughput techniques, manual inspection of images will impose a serious impediment on the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison.
We present a computational framework to perform anatomical keyword annotation for Drosophila gene expression images. The spatial sparse coding approach is used to represent local patches of images in comparison with the well-known bag-of-words (BoW) method. Three pooling functions including max pooling, average pooling and Sqrt (square root of mean squared statistics) pooling are employed to transform the sparse codes to image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, undersampling is applied together with majority voting. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach.
In our experiment, the three pooling functions perform comparably well in feature dimension reduction. Undersampling with majority voting is shown to be effective in tackling the problem of imbalanced data. Moreover, combining sparse coding and the image-level scheme leads to consistent performance improvement in keyword annotation.
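The three pooling functions and the undersampling-plus-voting idea named above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the sparse codes are tiny hand-written vectors, and the helper names (`balanced_subsets`, `majority_vote`) and the seed are hypothetical. Each pooling function collapses the per-patch codes of one image into a single feature vector, one value per dictionary atom.

```python
import random
from math import sqrt


def max_pool(codes):
    """Largest response per dictionary atom across all patches."""
    return [max(column) for column in zip(*codes)]


def average_pool(codes):
    """Mean response per dictionary atom across all patches."""
    return [sum(column) / len(column) for column in zip(*codes)]


def sqrt_pool(codes):
    """Square root of the mean squared response per dictionary atom."""
    return [sqrt(sum(v * v for v in column) / len(column)) for column in zip(*codes)]


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Undersampling: draw several subsets in which both classes are equally sized."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    return [(rng.sample(positives, k), rng.sample(negatives, k)) for _ in range(n_subsets)]


def majority_vote(votes):
    """Combine binary votes (1 or 0) from classifiers trained on the subsets."""
    return 2 * sum(votes) > len(votes)


# Hypothetical sparse codes: two patches, two dictionary atoms.
patch_codes = [[0.0, 3.0], [4.0, 0.0]]
print(max_pool(patch_codes), average_pool(patch_codes), sqrt_pool(patch_codes))

# Hypothetical imbalanced data: 2 positive and 6 negative sample ids.
subsets = balanced_subsets(["p1", "p2"], ["n1", "n2", "n3", "n4", "n5", "n6"], n_subsets=3)
print(majority_vote([1, 1, 0]))
```

In a full pipeline, one classifier would be trained per balanced subset and `majority_vote` would merge their predictions for each annotation term; the training step is omitted here because the abstract does not specify the classifier.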
PMCID: PMC3924186  PMID: 24299119
19.  An Integrated Strategy for Analyzing the Unique Developmental Programs of Different Myoblast Subtypes 
PLoS Genetics  2006;2(2):e16.
An important but largely unmet challenge in understanding the mechanisms that govern the formation of specific organs is to decipher the complex and dynamic genetic programs exhibited by the diversity of cell types within the tissue of interest. Here, we use an integrated genetic, genomic, and computational strategy to comprehensively determine the molecular identities of distinct myoblast subpopulations within the Drosophila embryonic mesoderm at the time that cell fates are initially specified. A compendium of gene expression profiles was generated for primary mesodermal cells purified by flow cytometry from appropriately staged wild-type embryos and from 12 genotypes in which myogenesis was selectively and predictably perturbed. A statistical meta-analysis of these pooled datasets—based on expected trends in gene expression and on the relative contribution of each genotype to the detection of known muscle genes—provisionally assigned hundreds of differentially expressed genes to particular myoblast subtypes. Whole embryo in situ hybridizations were then used to validate the majority of these predictions, thereby enabling true-positive detection rates to be estimated for the microarray data. This combined analysis reveals that myoblasts exhibit much greater gene expression heterogeneity and overall complexity than was previously appreciated. Moreover, it implicates the involvement of large numbers of uncharacterized, differentially expressed genes in myogenic specification and subsequent morphogenesis. These findings also underscore a requirement for considerable regulatory specificity for generating diverse myoblast identities. Finally, to illustrate how the developmental functions of newly identified myoblast genes can be efficiently surveyed, a rapid RNA interference assay that can be scored in living embryos was developed and applied to selected genes. 
This integrated strategy for examining embryonic gene expression and function provides a substantially expanded framework for further studies of this model developmental system.
Animal development requires cells in complex organs to acquire distinct identities. During the development of the body wall musculature of the fruit fly, a pool of apparently identical cells gives rise to two types of muscle precursors, both of which are required for the appearance of functioning muscles. These identities depend on broad programs of gene expression. The authors attempt to dissect the complements of expressed genes that define these two different cell types by integrating modern methods in genetics, genomics, and informatics. By purifying informative cells from normal embryos and mutants that perturb muscle development, assaying their genomewide gene expression programs, and combining experiments statistically, they have identified fivefold more founder-specific genes than were previously suspected to characterize this cell type. The expression patterns of hundreds of genes were examined in whole embryos to test the statistical predictions, permitting the authors to estimate how many more cell type–specific genes remain to be discovered. Finally, dozens of the genes highlighted by these methods were tested for direct involvement in muscle development, and several new players in this process are reported. The integrated strategy used here can be generalized for studying genetic programs in other complex tissues.
PMCID: PMC1366495  PMID: 16482229
20.  Spatial expression of transcription factors in Drosophila embryonic organ development 
Genome Biology  2013;14(12):R140.
Site-specific transcription factors (TFs) bind DNA regulatory elements to control expression of target genes, forming the core of gene regulatory networks. Despite decades of research, most studies focus on only a small number of TFs and the roles of many remain unknown.
We present a systematic characterization of spatiotemporal gene expression patterns for all known or predicted Drosophila TFs throughout embryogenesis, the first such comprehensive study for any metazoan animal. We generated RNA expression patterns for all 708 TFs by in situ hybridization, annotated the patterns using an anatomical controlled vocabulary, and analyzed TF expression in the context of organ system development. Nearly all TFs are expressed during embryogenesis and more than half are specifically expressed in the central nervous system. Compared to other genes, TFs are enriched early in the development of most organ systems, and throughout the development of the nervous system. Of the 535 TFs with spatially restricted expression, 79% are dynamically expressed in multiple organ systems while 21% show single-organ specificity. Of those expressed in multiple organ systems, 77 TFs are restricted to a single organ system either early or late in development. Expression patterns for 354 TFs are characterized for the first time in this study.
We produced a reference TF dataset for the investigation of gene regulatory networks in embryogenesis, and gained insight into the expression dynamics of the full complement of TFs controlling the development of each organ system.
PMCID: PMC4053779  PMID: 24359758
21.  Transcriptomic analysis highlights epigenetic and transcriptional regulation during zygotic embryo development of Pinus pinaster 
BMC Plant Biology  2013;13:123.
It is during embryogenesis that the plant body plan is established and the meristems responsible for all post-embryonic growth are specified. The molecular mechanisms governing conifer embryogenesis are still largely unknown. Their elucidation may contribute valuable information to clarify if the distinct features of embryo development in angiosperms and gymnosperms result from differential gene regulation. To address this issue, we have performed the first transcriptomic analysis of zygotic embryo development in a conifer species (Pinus pinaster) focusing our study in particular on regulatory genes playing important roles during plant embryo development, namely epigenetic regulators and transcription factors.
Microarray analysis of P. pinaster zygotic embryogenesis was performed at five periods of embryo development from early developing to mature embryos. Our results show that most changes in transcript levels occurred in the first and the last embryo stage-to-stage transitions, namely early to pre-cotyledonary embryo and cotyledonary to mature embryo. An analysis of functional categories for genes that were differentially expressed through embryogenesis highlighted several epigenetic regulation mechanisms. While putative orthologs of transcripts associated with mechanisms that target transposable elements and repetitive sequences were strongly expressed in early embryogenesis, PRC2-mediated repression of genes seemed more relevant during late embryogenesis. On the other hand, functions related to sRNA pathways appeared differentially regulated across all stages of embryo development with a prevalence of miRNA functions in mid to late embryogenesis. Identification of putative transcription factor genes differentially regulated between consecutive embryo stages was strongly suggestive of the relevance of auxin responses and regulation of auxin carriers during early embryogenesis. Such responses could be involved in establishing embryo patterning. Later in development, transcripts with homology to genes acting on modulation of auxin flow and determination of adaxial-abaxial polarity were up-regulated, as were putative orthologs of genes required for meristem formation and function as well as establishment of organ boundaries. Comparative analysis with A. thaliana embryogenesis also highlighted genes involved in auxin-mediated responses, as well as epigenetic regulation, indicating highly correlated transcript profiles between the two species.
This is the first report of a time-course transcriptomic analysis of zygotic embryogenesis in a conifer. Taken together our results show that epigenetic regulation and transcriptional control related to auxin transport and response are critical during early to mid stages of pine embryogenesis and that important events during embryogenesis seem to be coordinated by putative orthologs of major developmental regulators in angiosperms.
PMCID: PMC3844413  PMID: 23987738
Conifer embryogenesis; Epigenetics; Gymnosperm; Transcriptomics; Transcription factor
22.  Inositol-requiring enzyme 1α is required for gut development in Xenopus laevis embryos 
AIM: To investigate the role of inositol-requiring enzyme 1α (IRE1α) in gut development of Xenopus laevis embryos.
METHODS: Xenopus embryos were obtained with in vitro fertilization and cultured in 0.1 × MBSH. One and a half nanograms of IRE1α, 1 ng of IRE1α-GR mRNA, 1 ng of IRE1αΔC-GR mRNA, and 50 ng of IRE1α morpholino oligonucleotide (MO) or XBP1(C)MO were injected into four blastomeres at the 4-cell stage for scoring the phenotype and marker gene analysis. To rescue the effect of IRE1α MO, 1 ng of IRE1α-GR mRNA was co-injected with 50 ng of MO. For the activation of the GR-fusion proteins, dexamethasone was prepared as 5 mmol/L stock solutions in 100% ethanol and applied to the mRNA-injected embryos at desired stages at a concentration of 10 μmol/L in 0.1 × MBSH. Embryos were kept in dexamethasone up to stage 41. Whole-mount in situ hybridization was used to determine specific gene expression, such as IRE1α, IRE1β, Xbra and Xsox17α. IRE1α protein expression during Xenopus embryogenesis was detected by Western blotting.
RESULTS: In the whole-mount in situ hybridization analysis, Xenopus IRE1α and IRE1β showed quite different expression patterns during the tadpole stage. Relatively higher expression of IRE1α was observed in the pancreas, and significant transcription of IRE1β was found in the liver. IRE1α protein could be detected at all developmental stages analyzed, from stage 1 to stage 42. The gain-of-function assay showed that IRE1α mRNA-injected embryos at the tailbud stage were nearly normal, and the expression of the pan-mesodermal marker gene Xbra and the endodermal gene Xsox17α at stage 10.5 was not significantly changed in embryos injected with IRE1α mRNA as compared to uninjected control embryos. At the tadpole stage, the embryos injected with IRE1α-GR mRNA did not display an overt phenotype, such as a gut-coiling defect. The loss-of-function assay demonstrated that the IRE1α MO-injected embryos were morphologically normal before the tailbud stages. We did not observe a significant change of mesodermal and endodermal marker gene expression, while after stage 40, about 80% of the MO-injected embryos exhibited dramatic gut defects in which the guts did not coil, but other structures outside the gastrointestinal tract were relatively normal. To test whether the phenotypes were specifically caused by the knockdown of IRE1α, a rescue experiment was performed by co-injection of IRE1α-GR mRNA with IRE1α MO. The data obtained demonstrated that the gut-coiling defect was rescued. A deletion mutant of IRE1α, consisting of the N-terminal part without the C-terminal kinase and RNase domains and named IRE1αΔC, was constructed to investigate the functional domain of IRE1α. Injection of IRE1αΔC-GR mRNA caused similar morphological alterations with gut malformation by interfering with the function of endogenous xIRE1α. 
To investigate whether the IRE1α/XBP1 pathway was involved in gut development, 50 ng of XBP1 MO was injected, and the results showed that knockdown of XBP1 resulted in similar morphological alterations with a gut-coiling defect at the tadpole stage.
CONCLUSION: IRE1α is not required for germ layer formation but is required for gut development in Xenopus laevis, and it may function via an XBP1-dependent pathway.
PMCID: PMC3548012  PMID: 23345945
Inositol-requiring enzyme 1α; XBP1; Xenopus laevis; Gut; Development
23.  Dual mode of embryonic development is highlighted by expression and function of Nasonia pair-rule genes 
eLife  2014;3:e01440.
Embryonic anterior–posterior patterning is well understood in Drosophila, which uses ‘long germ’ embryogenesis, in which all segments are patterned before cellularization. In contrast, most insects use ‘short germ’ embryogenesis, wherein only head and thorax are patterned in a syncytial environment while the remainder of the embryo is generated after cellularization. We use the wasp Nasonia (Nv) to address how the transition from short to long germ embryogenesis occurred. Maternal and gap gene expression in Nasonia suggest long germ embryogenesis. However, the Nasonia pair-rule genes even-skipped, odd-skipped, runt and hairy are all expressed as early blastoderm pair-rule stripes and late-forming posterior stripes. Knockdown of Nv eve, odd or h causes loss of alternate segments at the anterior and complete loss of abdominal segments. We propose that Nasonia uses a mixed mode of segmentation wherein pair-rule genes pattern the embryo in a manner resembling Drosophila at the anterior and ancestral Tribolium at the posterior.
eLife digest
Networks of genes that work together are widespread in nature. The conservation of individual genes across species and the tendency of their networks to stick together is a sign that they are working efficiently. Furthermore, it is common for existing gene networks to be adapted to perform new tasks, instead of new networks being invented every time a similar but distinct demand arises. One important question is: how can evolution use the same building blocks—such as the genes in a functioning network—in different ways to achieve new outcomes?
The gene network that sets up the ‘body plan’ of insects during development has been well studied, most deeply in the fruit fly, Drosophila. Like all insects, the body of a fruit fly is divided into three main parts—the head, the thorax and the abdomen—and each of these parts is made up of several smaller segments. There is a remarkable diversity of insect body plans in nature, and yet, these seem to arise from the same gene networks in the embryo.
When a Drosophila embryo is growing into a larva, all the different body segments develop at the same time. In most other insects, however, segments of the abdomen emerge later and sequentially during the development process. The ancestors of most insects are also thought to have developed in this way, which is known as ‘short germ embryogenesis’. So how did the so-called ‘long germ embryogenesis’, as observed in Drosophila, evolve from the short germ embryogenesis that is observed in most other insects?
The gene network that controls development includes the ‘pair-rule genes’ that are expressed in a pattern of alternating stripes that wrap around, top to bottom, along most of the length of the embryo. These stripes mark where the edges of each body segment will eventually develop. In fruit flies, this pattern extends along the entire length of the embryo and the stripes all appear at one time. However, in the abdominal region of short germ insects, the pair-rule genes are expressed in waves that pass through the posterior region as it grows, with new segments being added one behind the other.
Now, Rosenberg et al. have attempted to explain how the same genes can be used to direct the segmentation process in such different ways by studying another long germ insect species, the jewel wasp. Analysis of the expression of pair-rule genes in the jewel wasp shows that it uses a mixed strategy to control segmentation. The development of segments at the front of its body is directed in the same way as the fruit fly, with all these segments laid down together. However, the segments at the rear of the body are only patterned later, one after the other, like most other insects.
The work of Rosenberg et al. suggests that the jewel wasp represents an intermediate step between ancestral insects and Drosophila in the evolution of the gene network that patterns the ‘body plan’. Identifying and studying these intermediate forms allows us to understand the ways in which evolution can innovate by building upon what has come before.
PMCID: PMC3941026  PMID: 24599282
Nasonia vitripennis; Tribolium; embryonic patterning; evolution; segmentation; pair-rule genes; D. melanogaster; other
24.  Automatic image analysis for gene expression patterns of fly embryos 
BMC Cell Biology  2007;8(Suppl 1):S7.
Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a D. melanogaster embryo delivers the detailed spatio-temporal pattern of expression of the gene. Many biological problems such as the detection of co-expressed genes, co-regulated genes, and transcription factor binding motifs rely heavily on the analyses of these image patterns. The increasing availability of ISH image data motivates the development of automated computational approaches to the analysis of gene expression patterns.
We have developed algorithms and associated software that extract a feature representation of a gene expression pattern from an ISH image, cluster genes sharing the same spatio-temporal pattern of expression, suggest transcription factor binding (TFB) site motifs for genes that appear to be co-regulated (based on the clustering), and automatically identify the anatomical regions that express a gene given a training set of annotations. In fact, we developed three different feature representations, based on Gaussian Mixture Models (GMM), Principal Component Analysis (PCA), and wavelet functions, each having different merits with respect to the tasks above. For clustering image patterns, we developed a minimum spanning tree method (MSTCUT), and for proposing TFB sites we used standard motif finders on clustered/co-expressed genes with the added twist of requiring conservation across the genomes of 8 related fly species. Lastly, we trained a suite of binary classifiers, one for each anatomical annotation term in a controlled vocabulary or ontology, that operate on the wavelet feature representation. We report the results of applying these methods to the Berkeley Drosophila Genome Project (BDGP) gene expression database.
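The abstract does not describe how MSTCUT works internally, so the following is only a generic illustration of minimum-spanning-tree clustering, not the authors' algorithm: build a spanning tree over feature vectors, cut its heaviest edges, and treat the remaining connected components as clusters. The function names, the Euclidean distance, and the toy points are all assumptions made for this sketch.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, i, j) tuples forming a minimum spanning tree.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_cut_clusters(points, k):
    """Cut the k-1 heaviest MST edges and label the connected components."""
    edges = sorted(mst_edges(points))
    kept = edges[: max(0, len(edges) - (k - 1))]
    root = list(range(len(points)))  # union-find forest

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path compression
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


# Hypothetical 2-D feature vectors: three near the origin, two far away.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
toy_labels = mst_cut_clusters(toy_points, k=2)
```

In practice the points would be feature vectors extracted from ISH images (for example GMM, PCA, or wavelet coefficients), and the cut criterion could differ from the simple "heaviest edges" rule used here.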
Our automatic image analysis methods recapitulate known co-regulated genes and give correct developmental-stage classifications with 99+% accuracy, despite variations in morphology, orientation, and focal plane, suggesting that these techniques form a set of useful tools for the large-scale computational analysis of fly embryonic gene expression patterns.
PMCID: PMC1924512  PMID: 17634097
25.  Zebrafish Pou5f1-dependent transcriptional networks in temporal control of early development 
Time-resolved transcriptome analysis of early pou5f1 mutant zebrafish embryos identified groups of developmental regulators, including SoxB1 genes, that depend on Pou5f1 activity, and a large cluster of differentiation genes which are prematurely expressed. Pou5f1 represses differentiation genes indirectly via activation of germlayer-specific transcriptional repressor genes, including her3, which may mediate in part Pou5f1-dependent repression of neural genes. A dynamic mathematical model is established for Pou5f1 and SoxB1 activity-dependent temporal behaviour of downstream transcriptional regulatory networks. The model predicts that Pou5f1-dependent increase in SoxB1 activity significantly contributes to developmental timing in the early gastrula. Comparison to mouse Pou5f1/Oct4 reveals evolutionarily conserved targets. We show that Pou5f1 developmental function is also conserved by demonstrating rescue of Pou5f1 mutant zebrafish embryos by mouse POU5F1/OCT4.
The transcription factor Pou5f1/Oct4 controls pluripotency of mouse embryonic inner cell mass cells (Nichols et al, 1998), and of mouse and human ES cell lines (Boiani and Scholer, 2005). Although Pou5f1/Oct4-dependent pluripotency transcriptional circuits and many transcriptional targets have been characterized, little is known about the mechanisms by which Pou5f1/Oct4 controls early developmental events. A detailed understanding of Pou5f1/Oct4 functions during mammalian blastocyst and gastrula development as well as studies of the temporal changes in the Pou5f1/Oct4-regulated networks are precluded by the early lineage defects in pou5f1/oct4 mutant mice. To investigate Pou5f1-dependent transcriptional circuits in developmental control, we used the zebrafish (Danio rerio) as a genetic and experimental model representing an earlier state of vertebrate evolution. Zebrafish have one pou5f1/pou2 gene (Takeda et al, 1994) orthologous to the mammalian gene (Niwa et al, 2008; Frankenberg et al, 2009). Both fish and mammalian orthologs are expressed broadly in tissues giving rise to the embryo proper during blastula and early gastrula stages, as well as in the neural plate (Belting et al, 2001; Reim and Brand, 2002; Downs, 2008).
Zebrafish pou5f1 loss-of-function mutant embryos, MZspg (abbreviated ‘MZ’), are completely devoid of maternal and zygotic Pou5f1 activity (Lunde et al, 2004; Reim et al, 2004). MZ embryos have gastrulation abnormalities (Lachnit et al, 2008), dorsoventral patterning defects (Reim and Brand, 2006), and do not develop endoderm (Lunde et al, 2004; Reim et al, 2004). In contrast to Pou5f1/Oct4 mutant mice, which are blocked in development due to loss of inner cell mass, MZ mutant embryos are neither blocked in development nor display a general delay. Therefore, zebrafish present a good model system to identify specific transcriptional targets of Pou5f1 during development.
Our study aims to understand the structure, regulatory logic, and developmental temporal changes in the Pou5f1-dependent transcriptional network in the context of an intact embryo. Therefore, we investigated transcriptome changes in MZ compared with WT zebrafish by microarray analysis at 10 distinct time-points during development, from ovaries to late gastrulation. We identified changes in Pou5f1 target gene expression both with respect to their expression level and temporal behavior. We used correlation analysis to identify clusters of target genes enriched for genes with developmentally regulated expression profiles. This correlation analysis revealed a cluster of genes, which were not activated or were significantly delayed in MZ. Interestingly, there was also a large gene cluster with premature onset of expression in MZ.
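The abstract does not say which correlation measure or clustering procedure was used, so the sketch below is only an illustration under stated assumptions: Pearson correlation between time-course profiles and a simple greedy grouping against seed profiles. The gene names, expression values, and the 0.9 threshold are hypothetical and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def group_by_correlation(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first group whose seed profile
    it correlates with at or above the threshold, else it seeds a new group."""
    groups = []  # list of (seed_name, member_names)
    for name, profile in profiles.items():
        for seed, members in groups:
            if pearson(profiles[seed], profile) >= threshold:
                members.append(name)
                break
        else:
            groups.append((name, [name]))
    return [members for _, members in groups]


# Hypothetical expression profiles over four time points.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],   # rises like geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls over time
}
toy_groups = group_by_correlation(toy_profiles)
```

With real data, each profile would hold one value per sampled time-point (10 in the study), and comparing MZ against WT profiles gene by gene is what would flag delayed or premature onset.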
Several targets activated by Pou5f1 encode known repressors of differentiation (RODs), of which we analyzed her3 in detail. Pou5f1 also activates several SoxB1 group transcription factors, which are known to act together with Pou5f1 in mammalian systems. Among the large group of genes prematurely activated in MZ, many encode developmental regulators of differentiation that normally act during organogenesis (promoters of differentiation, PODs). Our analysis of potential direct transcriptional interactions, by suppressing translation of intermediate zygotic Pou5f1 or SoxB1 targets, enabled us to distinguish Sox-dependent and Sox-independent subgroups of the Pou5f1 transcriptional network. Interestingly, tissue-specific expression of Pou5f1 targets correlated with their regulation by Sox2: Sox-dependent targets were mostly localized to ectoderm and neuroectoderm, whereas Sox-independent targets localized to mesendoderm of the developing zebrafish embryo. Further, SoxB1-independent Pou5f1 targets (for example, foxD3) differ from SoxB1-dependent targets (e.g., her3) in temporal dynamics of expression. Most Sox-independent direct Pou5f1 targets in WT reach maximal expression levels soon after midblastula transition (MBT) at 3–4 h postfertilization (hpf). In contrast, genes depending on both Sox2 and Pou5f1 tend to have a biphasic temporal expression curve or are activated with a >2 h delay after MBT, reaching maximum levels only at 6–7 hpf.
To better understand the impact of our findings on Pou5f1/SoxB1-dependent versus Pou5f1-only regulation on developmental mechanisms, we built a small dynamic network model that links the temporal control of target genes to regulatory principles exerted by Pou5f1 and SoxB1 proteins (Figure 6A). The model is based on ordinary differential equations, and parameters were determined by a fit to the WT and MZ gene expression data. The optimized model highlights two qualitatively different temporal expression modes of Pou5f1 downstream targets: monophasic for targets depending only on Pou5f1 (foxd3), and biphasic for Pou5f1- and SoxB1-dependent targets (sox2 and her3; Figure 6B). To test whether the model is also able to correctly predict a different genetic condition, we simulated the M mutant, which lacks maternal Pou5f1 but is gradually rescued by the paternal pou5f1 contribution after MBT (Figure 6B, blue, dashed curve). The model predicts an overall shift in the developmental program. Most importantly, sox2 and her3 expression is rescued with a delay of about 2 h. The model predictions were checked experimentally by quantitative RT–PCR (Figure 6B, blue dots). Most predictions are in good agreement with the experimental data, for example the delayed rescue of the sox2 and her3 temporal expression profile. With respect to the ‘POD’ nr2f1, the model correctly predicts the efficient downregulation by zygotic targets of Pou5f1 (Figure 6B).
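The published equations and fitted parameters are not given in this abstract. The toy model below is therefore not the authors' model; it is a minimal sketch of how an ordinary differential equation network can make a target that needs both Pou5f1 and SoxB1 switch on later than a target that needs Pou5f1 alone. The equations, the rate constants, and every name in the code are assumptions chosen for illustration, and the sketch shows delayed onset only, not the biphasic curves reported in the study.

```python
def simulate(pou5f1=1.0, hours=8.0, dt=0.01, k=1.0, d=0.5):
    """Forward-Euler integration of a hypothetical three-variable network.

    sox:    activated by Pou5f1                 d(sox)/dt  = k*P     - d*sox
    only_p: Pou5f1-only target                  d(only)/dt = k*P     - d*only
    both:   needs Pou5f1 AND SoxB1 activity     d(both)/dt = k*P*sox - d*both

    Returns a list of (time, sox, only_p, both) tuples.
    """
    sox = only_p = both = 0.0
    steps = int(round(hours / dt))
    trace = []
    for n in range(steps + 1):
        trace.append((n * dt, sox, only_p, both))
        d_sox = k * pou5f1 - d * sox
        d_only = k * pou5f1 - d * only_p
        d_both = k * pou5f1 * sox - d * both
        sox += dt * d_sox
        only_p += dt * d_only
        both += dt * d_both
    return trace


def half_max_time(trace, column):
    """First time at which a variable reaches half of its final value."""
    half = trace[-1][column] / 2.0
    return next(row[0] for row in trace if row[column] >= half)


wt = simulate(pou5f1=1.0)   # wild-type-like: Pou5f1 activity present
mz = simulate(pou5f1=0.0)   # MZ-like: no Pou5f1 activity at all

t_only = half_max_time(wt, 2)   # Pou5f1-only target
t_both = half_max_time(wt, 3)   # Pou5f1- and SoxB1-dependent target
```

In this sketch the doubly dependent target reaches half-maximal expression later than the Pou5f1-only target, because it has to wait for SoxB1 activity to accumulate. Fitting such a model to WT and MZ time courses, as the study did, would replace the arbitrary constants used here.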
We identified an evolutionarily conserved core set of Pou5f1 targets by comparing our gene list with the lists of mouse Pou5f1/Oct4 targets (Loh et al, 2006; Sharov et al, 2008). The evolutionary conservation suggests equivalent Pou5f1 functions during the pregastrulation and gastrulation period of vertebrate embryogenesis. Therefore, we tested whether mouse Pou5f1/Oct4 was able to rescue MZ embryos. Injection of mRNA encoding mouse Pou5f1/Oct4 into MZ embryos (Figure 8A) was able to restore normal zebrafish development to an extent comparable with zebrafish pou5f1/pou2 mRNA (Figure 8B and C). The significant overlap between zebrafish and mammalian Pou5f1 targets, together with the ability of mouse Pou5f1/Oct4 to functionally replace the zebrafish Pou5f1/Pou2 (Figure 8A–C), suggests that the mammalian network may have evolved from a basal situation similar to what is observed in teleosts. We propose models that emphasize the evolution of Pou5f1-dependent transcriptional networks during development of the zebrafish (Figure 8D) and mammals (Figure 8E). Our representation highlights the evolutionarily ancient germlayer-specific subnetworks downstream of Pou5f1, which are presumably used for controlling the timing of differentiation during gastrulation in all vertebrates (Figure 8D and E, black arrows). As the Pou5f1 downstream regulatory nodes revealed in our zebrafish model are likely conserved across vertebrates, we envision that their knowledge will contribute to the effort of directing differentiation of pluripotent stem cells to defined cell fates.
The transcription factor POU5f1/OCT4 controls pluripotency in mammalian ES cells, but little is known about its functions in the early embryo. We used time-resolved transcriptome analysis of zebrafish pou5f1 MZspg mutant embryos to identify genes regulated by Pou5f1. Comparison to mammalian systems defines evolutionarily conserved Pou5f1 targets. Time-series data reveal many Pou5f1 targets with delayed or advanced onset of expression. We identify two Pou5f1-dependent mechanisms controlling developmental timing. First, several Pou5f1 targets are transcriptional repressors, mediating repression of differentiation genes in distinct embryonic compartments. We analyze her3 gene regulation as an example of a repressor in the neural anlagen. Second, the dynamics of SoxB1 group gene expression and Pou5f1-dependent regulation of her3 and foxD3 uncover differential requirements for SoxB1 activity to control temporal dynamics of activation, and spatial distribution of targets in the embryo. We establish a mathematical model of the early Pou5f1 and SoxB1 gene network to demonstrate regulatory characteristics important for developmental timing. The temporospatial structure of the zebrafish Pou5f1 target networks may explain aspects of the evolution of the mammalian stem cell networks.
PMCID: PMC2858445  PMID: 20212526
developmental timing; mathematical modeling; Oct4; transcriptional networks
