1.  Genome-guided transcript assembly from integrative analysis of RNA sequence data 
Nature biotechnology  2014;32(4):341-346.
The identification of full length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in genome annotation pipelines. Here we describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which we call generalized RNA integration tool, or GRIT. By applying GRIT to Drosophila melanogaster short-read RNA-seq, cap analysis of gene expression (CAGE) and poly(A)-site-seq data collected for the modENCODE project, we recover the vast majority of previously annotated transcripts and double the total number of transcripts cataloged. We find that 20% of protein coding genes encode multiple protein-localization signals, and that, in 20 day old adult fly heads, genes with multiple poly-adenylation sites are more common than genes with alternate splicing or alternate promoters. When compared to the most widely used transcript assembly tools, GRIT recovers a larger fraction of annotated transcripts at higher precision. GRIT will enable the automated generation of high-quality genome annotations without necessitating extensive manual annotation.
doi:10.1038/nbt.2850
PMCID: PMC4037530  PMID: 24633242
2.  Automated protein-DNA interaction screening of Drosophila regulatory elements 
Nature methods  2011;8(12):1065-1070.
Drosophila melanogaster has one of the best characterized metazoan genomes in terms of functionally annotated regulatory elements. To explore how these elements contribute to gene regulation in the context of gene regulatory networks, we need convenient tools to identify the proteins that bind to them. Here, we present the development and validation of a highly automated protein-DNA interaction detection method, enabling the high-throughput yeast one-hybrid-based screening of DNA elements versus an array of full-length, sequence-verified clones containing 647 (over 85%) of predicted Drosophila transcription factors (TFs). Using six well-characterized regulatory elements (82 bp to 1 kb), we identified 33 TF-DNA interactions of which 27 are novel. To simultaneously validate these interactions and locate the binding sites of the involved TFs, we implemented a novel microfluidics-based approach that enables us to conduct hundreds of gel shift-like assays at once, thus allowing the retrieval of DNA occupancy data for each TF throughout the respective target DNA elements. Finally, we biologically validate several interactions and specifically identify two novel regulators of sine oculis gene expression and hence eye development.
doi:10.1038/nmeth.1763
PMCID: PMC3929264  PMID: 22037703
3.  An extracellular interactome of Immunoglobulin and LRR proteins reveals receptor-ligand networks 
Cell  2013;154(1):228-239.
Extracellular domains of cell-surface receptors and ligands mediate cell-cell communication, adhesion, and initiation of signaling events, but most existing protein-protein “interactome” datasets lack information for extracellular interactions. We probed interactions between receptor extracellular domains, focusing on the Immunoglobulin Superfamily (IgSF), Fibronectin type-III (FnIII) and Leucine-rich repeat (LRR) families of Drosophila, a set of 202 proteins, many of which are known to be important in neuronal and developmental functions. Out of 20503 candidate protein pairs tested, we observed 106 interactions, 83 of which were previously unknown. We ‘deorphanized’ the 20-member subfamily of defective in proboscis IgSF proteins, showing that they selectively interact with an 11-member subfamily of previously uncharacterized IgSF proteins. Both subfamilies interact with a single common ‘orphan’ LRR protein. We also observed new interactions between Hedgehog and EGFR pathway components. Several of these interactions could be visualized in live-dissected embryos, demonstrating that this approach can identify physiologically relevant receptor-ligand pairs.
doi:10.1016/j.cell.2013.06.006
PMCID: PMC3756661  PMID: 23827685
4.  Spatial expression of transcription factors in Drosophila embryonic organ development 
Genome Biology  2013;14(12):R140.
Background
Site-specific transcription factors (TFs) bind DNA regulatory elements to control expression of target genes, forming the core of gene regulatory networks. Despite decades of research, most studies focus on only a small number of TFs and the roles of many remain unknown.
Results
We present a systematic characterization of spatiotemporal gene expression patterns for all known or predicted Drosophila TFs throughout embryogenesis, the first such comprehensive study for any metazoan animal. We generated RNA expression patterns for all 708 TFs by in situ hybridization, annotated the patterns using an anatomical controlled vocabulary, and analyzed TF expression in the context of organ system development. Nearly all TFs are expressed during embryogenesis and more than half are specifically expressed in the central nervous system. Compared to other genes, TFs are enriched early in the development of most organ systems, and throughout the development of the nervous system. Of the 535 TFs with spatially restricted expression, 79% are dynamically expressed in multiple organ systems while 21% show single-organ specificity. Of those expressed in multiple organ systems, 77 TFs are restricted to a single organ system either early or late in development. Expression patterns for 354 TFs are characterized for the first time in this study.
Conclusions
We produced a reference TF dataset for the investigation of gene regulatory networks in embryogenesis, and gained insight into the expression dynamics of the full complement of TFs controlling the development of each organ system.
doi:10.1186/gb-2013-14-12-r140
PMCID: PMC4053779  PMID: 24359758
6.  Computational Identification of Diverse Mechanisms Underlying Transcription Factor-DNA Occupancy 
PLoS Genetics  2013;9(8):e1003571.
ChIP-based genome-wide assays of transcription factor (TF) occupancy have emerged as a powerful, high-throughput method to understand transcriptional regulation, especially on a global scale. This has led to great interest in the underlying biochemical mechanisms that direct TF-DNA binding, with the ultimate goal of computationally predicting a TF's occupancy profile in any cellular condition. In this study, we examined the influence of various potential determinants of TF-DNA binding on a much larger scale than previously undertaken. We used a thermodynamics-based model of TF-DNA binding, called “STAP,” to analyze 45 TF-ChIP data sets from Drosophila embryonic development. We built a cross-validation framework that compares a baseline model, based on the ChIP'ed (“primary”) TF's motif, to more complex models where binding by secondary TFs is hypothesized to influence the primary TF's occupancy. Candidate interacting TFs were chosen based on RNA-SEQ expression data from the time point of the ChIP experiment. We found widespread evidence of both cooperative and antagonistic effects by secondary TFs, and explicitly quantified these effects. We were able to identify multiple classes of interactions, including (1) long-range interactions between primary and secondary motifs (separated by ≤150 bp), suggestive of indirect effects such as chromatin remodeling, (2) short-range interactions with specific inter-site spacing biases, suggestive of direct physical interactions, and (3) overlapping binding sites suggesting competitive binding. Furthermore, by factoring out the previously reported strong correlation between TF occupancy and DNA accessibility, we were able to categorize the effects into those that are likely to be mediated by the secondary TF's effect on local accessibility and those that utilize accessibility-independent mechanisms.
Finally, we conducted in vitro pull-down assays to test model-based predictions of short-range cooperative interactions, and found that seven of the eight TF pairs tested physically interact and that some of these interactions mediate cooperative binding to DNA.
Author Summary
Chromatin Immunoprecipitation (ChIP)-based genome-wide assays of transcription factor (TF) occupancy have emerged as a powerful, high throughput method to understand transcriptional regulation, especially on a global scale. Here, we utilize 45 ChIP-chip and ChIP-SEQ data sets from Drosophila to explore the underlying mechanisms of TF-DNA binding. For this, we employ a biophysically motivated computational model, in conjunction with over 300 TF motifs (binding specificities) as well as gene expression and DNA accessibility data from different developmental stages in Drosophila embryos. Our findings provide robust statistical evidence of the role played by TF-TF interactions in shaping genome-wide TF-DNA binding profiles, and thus in directing gene regulation. Our method allows us to go beyond simply recognizing the existence of such interactions, to quantifying their effects on TF occupancy. We are able to categorize the probable mechanisms of these effects as involving direct physical interactions versus accessibility-mediated indirect interactions, long-range versus short-range interactions, and cooperative versus antagonistic interactions. Our analysis reveals widespread evidence of combinatorial regulation present in recently generated ChIP data sets, and sets the stage for rich integrative models of the future that will predict cell type-specific TF occupancy values from sequence and expression data.
doi:10.1371/journal.pgen.1003571
PMCID: PMC3731213  PMID: 23935523
7.  Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures 
Nature  2007;450(7167):219-232.
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.
doi:10.1038/nature06340
PMCID: PMC2474711  PMID: 17994088
8.  Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome 
Genome Biology  2002;3(12):research0086.1-86.2.
Background
It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined.
Results
We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis) covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Addition of a third species can reveal functional constraint in otherwise non-significant pairwise exon comparisons. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes relative to genes that have previously been studied. Conserved non-coding sequences (CNCSs) tend to be spatially clustered with conserved spacing between CNCSs, and clusters of CNCSs can be used to predict enhancer sequences.
Conclusions
Our results provide the basis for choosing species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone.
doi:10.1186/gb-2002-3-12-research0086
PMCID: PMC151188  PMID: 12537575
9.  Finishing a whole-genome shotgun: Release 3 of the Drosophila melanogaster euchromatic genome sequence 
Genome Biology  2002;3(12):research0079.1-79.14.
The Drosophila melanogaster genome was the first metazoan genome to be sequenced by whole-genome shotgun. Now, the sequence has been finished in a process designed to close gaps, improve sequence quality and validate the assembly.
Background
The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to the accepted standard for finished sequence? We are now in a position to answer these questions.
Results
Our finishing process was designed to close gaps, improve sequence quality and validate the assembly. Sequence traces derived from the WGS and draft sequencing of individual bacterial artificial chromosomes (BACs) were assembled into BAC-sized segments. These segments were brought to high quality, and then joined to constitute the sequence of each chromosome arm. Overall assembly was verified by comparison to a physical map of fingerprinted BAC clones. In the current version of the 116.9 Mb euchromatic genome, called Release 3, the six euchromatic chromosome arms are represented by 13 scaffolds with a total of 37 sequence gaps. We compared Release 3 to Release 2; in autosomal regions of unique sequence, the error rate of Release 2 was one in 20,000 bp.
Conclusions
The WGS strategy can efficiently produce a high-quality sequence of a metazoan genome while generating the reagents required for sequence finishing. However, the initial method of repeat assembly was flawed. The sequence we report here, Release 3, is a reliable resource for molecular genetic experimentation and computational analysis.
doi:10.1186/gb-2002-3-12-research0079
PMCID: PMC151181  PMID: 12537568
10.  Global Patterns of Tissue-Specific Alternative Polyadenylation in Drosophila 
Cell reports  2012;1(3):277-289.
SUMMARY
We analyzed the usage and consequences of alternative cleavage and polyadenylation (APA) in Drosophila melanogaster by using >1 billion reads of stranded mRNA-seq across a variety of dissected tissues. Beyond demonstrating that a majority of fly transcripts are subject to APA, we observed broad trends for 3′ untranslated region (UTR) shortening in the testis and lengthening in the central nervous system (CNS); the latter included hundreds of unannotated extensions ranging up to 18 kb. Extensive northern analyses validated the accumulation of full-length neural extended transcripts, and in situ hybridization indicated their spatial restriction to the CNS. Genes encoding RNA binding proteins (RBPs) and transcription factors were preferentially subject to 3′ UTR extensions. Motif analysis indicated enrichment of miRNA and RBP sites in the neural extensions, and their termini were enriched in canonical cis elements that promote cleavage and polyadenylation. Altogether, we reveal broad tissue-specific patterns of APA in Drosophila and transcripts with unprecedented 3′ UTR length in the nervous system.
doi:10.1016/j.celrep.2012.01.001
PMCID: PMC3368434  PMID: 22685694
11.  Identification of Functional Elements and Regulatory Circuits by Drosophila modENCODE 
Roy, Sushmita | Ernst, Jason | Kharchenko, Peter V. | Kheradpour, Pouya | Negre, Nicolas | Eaton, Matthew L. | Landolin, Jane M. | Bristow, Christopher A. | Ma, Lijia | Lin, Michael F. | Washietl, Stefan | Arshinoff, Bradley I. | Ay, Ferhat | Meyer, Patrick E. | Robine, Nicolas | Washington, Nicole L. | Di Stefano, Luisa | Berezikov, Eugene | Brown, Christopher D. | Candeias, Rogerio | Carlson, Joseph W. | Carr, Adrian | Jungreis, Irwin | Marbach, Daniel | Sealfon, Rachel | Tolstorukov, Michael Y. | Will, Sebastian | Alekseyenko, Artyom A. | Artieri, Carlo | Booth, Benjamin W. | Brooks, Angela N. | Dai, Qi | Davis, Carrie A. | Duff, Michael O. | Feng, Xin | Gorchakov, Andrey A. | Gu, Tingting | Henikoff, Jorja G. | Kapranov, Philipp | Li, Renhua | MacAlpine, Heather K. | Malone, John | Minoda, Aki | Nordman, Jared | Okamura, Katsutomo | Perry, Marc | Powell, Sara K. | Riddle, Nicole C. | Sakai, Akiko | Samsonova, Anastasia | Sandler, Jeremy E. | Schwartz, Yuri B. | Sher, Noa | Spokony, Rebecca | Sturgill, David | van Baren, Marijke | Wan, Kenneth H. | Yang, Li | Yu, Charles | Feingold, Elise | Good, Peter | Guyer, Mark | Lowdon, Rebecca | Ahmad, Kami | Andrews, Justen | Berger, Bonnie | Brenner, Steven E. | Brent, Michael R. | Cherbas, Lucy | Elgin, Sarah C. R. | Gingeras, Thomas R. | Grossman, Robert | Hoskins, Roger A. | Kaufman, Thomas C. | Kent, William | Kuroda, Mitzi I. | Orr-Weaver, Terry | Perrimon, Norbert | Pirrotta, Vincenzo | Posakony, James W. | Ren, Bing | Russell, Steven | Cherbas, Peter | Graveley, Brenton R. | Lewis, Suzanna | Micklem, Gos | Oliver, Brian | Park, Peter J. | Celniker, Susan E. | Henikoff, Steven | Karpen, Gary H. | Lai, Eric C. | MacAlpine, David M. | Stein, Lincoln D. | White, Kevin P. | Kellis, Manolis
Science (New York, N.Y.)  2010;330(6012):1787-1797.
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
doi:10.1126/science.1198374
PMCID: PMC3192495  PMID: 21177974
12.  The Developmental Transcriptome of Drosophila melanogaster 
Nature  2010;471(7339):473-479.
Drosophila melanogaster is one of the most well-studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons, and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing, and RNA editing directs development of this complex organism. We used RNA-Seq, tiling microarrays, and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. Together, these data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
doi:10.1038/nature09715
PMCID: PMC3075879  PMID: 21179090
13.  Dynamic reprogramming of chromatin accessibility during Drosophila embryo development 
Genome Biology  2011;12(5):R43.
Background
The development of complex organisms is believed to involve progressive restrictions in cellular fate. Understanding the scope and features of chromatin dynamics during embryogenesis, and identifying regulatory elements important for directing developmental processes remain key goals of developmental biology.
Results
We used in vivo DNaseI sensitivity to map the locations of regulatory elements, and explore the changing chromatin landscape during the first 11 hours of Drosophila embryonic development. We identified thousands of conserved, developmentally dynamic, distal DNaseI hypersensitive sites associated with spatial and temporal expression patterning of linked genes and with large regions of chromatin plasticity. We observed a nearly uniform balance between developmentally up- and down-regulated DNaseI hypersensitive sites. Analysis of promoter chromatin architecture revealed a novel role for classical core promoter sequence elements in directing temporally regulated chromatin remodeling. Another unexpected feature of the chromatin landscape was the presence of localized accessibility over many protein-coding regions, subsets of which were developmentally regulated or associated with the transcription of genes with prominent maternal RNA contributions in the blastoderm.
Conclusions
Our results provide a global view of the rich and dynamic chromatin landscape of early animal development, as well as novel insights into the organization of developmentally regulated chromatin features.
doi:10.1186/gb-2011-12-5-r43
PMCID: PMC3219966  PMID: 21569360
14.  Quantitative Analysis of the Drosophila Segmentation Regulatory Network Using Pattern Generating Potentials 
PLoS Biology  2010;8(8):e1000456.
A new computational method uses gene expression databases and transcription factor binding specificities to describe regulatory elements in the Drosophila A/P patterning network in unprecedented detail.
Cis-regulatory modules that drive precise spatial-temporal patterns of gene expression are central to the process of metazoan development. We describe a new computational strategy to annotate genomic sequences based on their “pattern generating potential” and to produce quantitative descriptions of transcriptional regulatory networks at the level of individual protein-module interactions. We use this approach to convert the qualitative understanding of interactions that regulate Drosophila segmentation into a network model in which a confidence value is associated with each transcription factor-module interaction. Sequence information from multiple Drosophila species is integrated with transcription factor binding specificities to determine conserved binding site frequencies across the genome. These binding site profiles are combined with transcription factor expression information to create a model to predict module activity patterns. This model is used to scan genomic sequences for the potential to generate all or part of the expression pattern of a nearby gene, obtained from available gene expression databases. Interactions between individual transcription factors and modules are inferred by a statistical method to quantify a factor's contribution to the module's pattern generating potential. We use these pattern generating potentials to systematically describe the location and function of known and novel cis-regulatory modules in the segmentation network, identifying many examples of modules predicted to have overlapping expression activities. Surprisingly, conserved transcription factor binding site frequencies were as effective as experimental measurements of occupancy in predicting module expression patterns or factor-module interactions. Thus, unlike previous module prediction methods, this method predicts not only the location of modules but also their spatial activity pattern and the factors that directly determine this pattern. 
As databases of transcription factor specificities and in vivo gene expression patterns grow, analysis of pattern generating potentials provides a general method to decode transcriptional regulatory sequences and networks.
Author Summary
The developmental program specifying segmentation along the anterior-posterior axis of the Drosophila embryo is one of the best studied examples of transcriptional regulatory networks. Previous work has identified the location and function of dozens of DNA segments called cis-regulatory “modules” that regulate several genes in precise spatial patterns in the early embryo. In many cases, transcription factors that interact with such modules have also been identified. We present a novel computational framework that turns a qualitative and fragmented understanding of modules and factor-module interactions into a quantitative, systems-level view. The formalism utilizes experimentally characterized binding specificities of transcription factors and gene expression patterns to describe how multiple transcription factors (working as activators or repressors) act together in a module to determine its regulatory activity. This formalism can explain the expression patterns of known modules, infer factor-module interactions and quantify the potential of an arbitrary DNA segment to drive a gene's expression. We have also employed databases of gene expression patterns to find novel modules of the regulatory network. As databases of binding motifs and gene expression patterns grow, this new approach provides a general method to decode transcriptional regulatory sequences and networks.
doi:10.1371/journal.pbio.1000456
PMCID: PMC2923081  PMID: 20808951
15.  Unlocking the secrets of the genome 
Nature  2009;459(7249):927-930.
Despite the successes of genomics, little is known about how genetic information produces complex organisms. A look at the crucial functional elements of fly and worm genomes could change that.
doi:10.1038/459927a
PMCID: PMC2843545  PMID: 19536255
16.  Sequence Finishing and Mapping of Drosophila melanogaster Heterochromatin 
Science (New York, N.Y.)  2007;316(5831):1625-1628.
Genome sequences for most metazoans and plants are incomplete because of the presence of repeated DNA in the heterochromatin. The heterochromatic regions of Drosophila melanogaster contain 20 million bases (Mb) of sequence amenable to mapping, sequence assembly, and finishing. We describe the generation of 15 Mb of finished or improved heterochromatic sequence with the use of available clone resources and assembly methods. We also constructed a bacterial artificial chromosome–based physical map that spans 13 Mb of the pericentromeric heterochromatin and a cytogenetic map that positions 11 Mb in specific chromosomal locations. We have approached a complete assembly and mapping of the nonsatellite component of Drosophila heterochromatin. The strategy we describe is also applicable to generating substantially more information about heterochromatin in other species, including humans.
doi:10.1126/science.1139816
PMCID: PMC2825053  PMID: 17569867
17.  Systematic image-driven analysis of the spatial Drosophila embryonic expression landscape 
We created an innovative virtual representation for our large-scale Drosophila in situ expression dataset. We aligned an elliptically shaped mesh composed of small triangular regions to the outline of each embryo. Each triangle defines a unique location in the embryo, and comparing corresponding triangles allows easy identification of similar expression patterns.
The virtual representation was used to organize the expression landscape at stages 4-6. We identified regions with similar expression in the embryo and clustered genes with similar expression patterns.
We created algorithms to mine the dataset for adjacent non-overlapping patterns and anti-correlated patterns. We were able to mine the dataset to identify co-expressed and putative interacting genes.
Using co-expression, we were able to assign putative functions to unknown genes.
Analyzing both temporal and spatial gene expression is essential for understanding development and regulatory networks of multicellular organisms. Interacting genes are commonly expressed in overlapping or adjacent domains. Thus, gene expression patterns can be used to assign putative gene functions and mined to infer candidates for networks.
We have generated a systematic two-dimensional mRNA expression atlas profiling embryonic development of Drosophila melanogaster (Tomancak et al, 2002, 2007). To date, we have collected over 70 000 images for over 6000 genes. To explore spatial relationships between gene expression patterns, we used a novel computational image-processing approach by converting expression patterns from the images into virtual representations (Figure 1). Using a custom-designed automated pipeline, for each image, we segmented and aligned the outline of the embryo to an elliptically shaped mesh, composed of 311 small triangular regions each defining a unique location within the embryo. By comparing corresponding triangles, we produced a distance score to identify similar patterns. We generated those triangulated images (TIs) for our entire data set at all developmental stages and demonstrated that this representation can be used as an objective, computationally defined description of expression in in situ hybridization images from various sources, including images from the literature.
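The triangle-by-triangle comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes each TI is stored as a flat list of per-triangle expression intensities between 0 and 1, indexed identically across embryos, and it uses a Euclidean distance as a stand-in because the actual distance score is not specified here. The names `ti_distance` and `most_similar` are hypothetical.

```python
from math import sqrt

N_TRIANGLES = 311  # mesh size stated in the text


def ti_distance(ti_a, ti_b):
    """Euclidean distance between two triangulated images (TIs).

    Assumption: each TI is a sequence of per-triangle expression
    intensities, with triangle i covering the same embryo location
    in both TIs. The real distance score may differ.
    """
    if len(ti_a) != len(ti_b):
        raise ValueError("TIs must be defined on the same mesh")
    return sqrt(sum((a - b) ** 2 for a, b in zip(ti_a, ti_b)))


def most_similar(query, candidates):
    """Return candidate names ordered from most to least similar to query."""
    return sorted(candidates, key=lambda name: ti_distance(query, candidates[name]))


# Toy patterns on the 311-triangle mesh (made-up data for illustration).
anterior = [1.0] * 100 + [0.0] * (N_TRIANGLES - 100)
anterior_weak = [0.8] * 100 + [0.0] * (N_TRIANGLES - 100)
posterior = [0.0] * (N_TRIANGLES - 100) + [1.0] * 100

ranking = most_similar(
    anterior, {"anterior_weak": anterior_weak, "posterior": posterior}
)
print(ranking)
```

Because every embryo is mapped onto the same mesh, the comparison reduces to an element-wise operation on fixed-length vectors, which is what makes a large image collection tractable to search.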
We used the TIs to conduct a comprehensive analysis of the expression landscape. To this end, we created a novel approach to temporally sort and compact TIs to a non-redundant data set suitable for further computational processing. Although generally applicable for all developmental stages, for this study, we focused on developmental stages 4–6. For this stage range, we reduced the initial set of about 5800 TIs to 553 TIs containing 364 genes. Using this filtered data set, to discover how expression subdivides the embryo into regions, we clustered areas with similar expression and demonstrated that expression patterns divide the early embryo into distinct spatial regions resembling a fate map (Figure 3). To discover the range of unique expression patterns, we used affinity propagation clustering (Frey and Dueck, 2007) to group TIs with similar patterns and identified 39 clusters each representing a distinct pattern class. We integrated the remaining genes into the 39 clusters and studied the distribution of expression patterns and the relationships between the clusters.
The clustered expression patterns were used to identify putative positive and negative regulatory interactions. The similar TIs in each cluster not only grouped already known genes with related functions, but also previously undescribed genes. A comparative analysis identified subtle differences between the genes within each expression cluster. To investigate these differences, we developed a novel Markov Random Field (MRF) segmentation algorithm to extract patterns. We then extended the MRF algorithm to detect shared expression boundaries, generate similarity measurements, and discriminate even faint/uncertain patterns between two TIs. This enabled us to identify more subtle partial expression pattern overlaps and adjacent non-overlapping patterns. For example, by conducting this analysis on the cluster containing the gene snail, we identified the previously known huckebein, which restricts snail expression (Reuter and Leptin, 1994), and zfh1, which interacts with tinman (Broihier et al, 1998; Su et al, 1999).
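The MRF algorithm is not detailed here, so the following is only a much simpler stand-in that conveys what "overlapping" and "anti-correlated" mean for two TIs. It assumes each TI is a list of per-triangle intensities on a shared mesh; the function names, the 0.5 threshold, and the toy data are all hypothetical choices for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two TIs; values near -1 suggest anti-correlation."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("TIs must be non-empty and defined on the same mesh")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a uniform pattern carries no spatial information
    return cov / sqrt(var_x * var_y)


def overlap_fraction(x, y, threshold=0.5):
    """Share of expressing triangles that both TIs have in common (Jaccard index)."""
    on_x = {i for i, v in enumerate(x) if v >= threshold}
    on_y = {i for i, v in enumerate(y) if v >= threshold}
    union = on_x | on_y
    return len(on_x & on_y) / len(union) if union else 0.0


# Toy example: two complementary patterns on a four-triangle mesh.
gene_a = [1.0, 0.0, 1.0, 0.0]
gene_b = [0.0, 1.0, 0.0, 1.0]
print(pearson(gene_a, gene_b), overlap_fraction(gene_a, gene_b))
```

A pair with zero overlap and strongly negative correlation would be a candidate for a repressive relationship, but unlike the MRF approach this sketch ignores mesh adjacency and cannot handle faint or uncertain staining.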
By studying the functions of known genes, we assigned putative developmental roles to each of the 39 clusters. Of the 1800 genes investigated, only half had previously assigned functions.
Representing expression patterns with geometric meshes facilitates the analysis of a complex process involving thousands of genes. This approach is complementary to the cellular resolution 3D atlas for the Drosophila embryo (Fowlkes et al, 2008). Our method can be used as a rapid, fully automated, high-throughput approach to obtain a map of co-expression, which will serve to select specific genes for detailed multiplex in-situ hybridization and confocal analysis for a fine-grain atlas. Our data are similar to the data in the literature, and research groups studying reporter constructs, mutant animals, or orthologs can easily produce in situ hybridizations. TIs can be readily created and provide representations that are comparable both to each other and to our data set. We have demonstrated that our approach can be used for predicting relationships in regulatory and developmental pathways.
Discovery of temporal and spatial patterns of gene expression is essential for understanding the regulatory networks and development in multicellular organisms. We analyzed the images from our large-scale spatial expression data set of early Drosophila embryonic development and present a comprehensive computational image analysis of the expression landscape. For this study, we created an innovative virtual representation of embryonic expression patterns using an elliptically shaped mesh grid that allows us to make quantitative comparisons of gene expression using a common frame of reference. Demonstrating the power of our approach, we used gene co-expression to identify distinct expression domains in the early embryo; the result is surprisingly similar to the fate map determined using laser ablation. We also used a clustering strategy to find genes with similar patterns and developed new analysis tools to detect variation within consensus patterns, adjacent non-overlapping patterns, and anti-correlated patterns. Of the 1800 genes investigated, only half had previously assigned functions. The known genes suggest developmental roles for the clusters, and identification of related patterns predicts requirements for co-occurring biological functions.
doi:10.1038/msb.2009.102
PMCID: PMC2824522  PMID: 20087342
biological function; embryo; gene expression; in situ hybridization; Markov Random Field
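The abstract above describes comparing expression patterns quantitatively on a common mesh and detecting anti-correlated patterns. As a rough illustration only, and not the authors' method, the sketch below treats each pattern as a list of staining intensities, one per mesh cell, and scores two patterns with a plain Pearson correlation. The function name `pearson`, the six-cell mesh, and the toy intensity values are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length intensity vectors.

    Each vector is assumed to hold one staining intensity per mesh cell,
    with both patterns sampled on the same mesh (an assumption of this sketch).
    """
    if len(a) != len(b) or not a:
        raise ValueError("patterns must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A spatially uniform pattern has no variation to correlate with.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy patterns on a six-cell mesh (0 = no staining, 1 = strong).
anterior = [1.0, 0.8, 0.2, 0.0, 0.0, 0.0]
anterior_like = [0.9, 0.9, 0.1, 0.0, 0.0, 0.0]
posterior = [0.0, 0.0, 0.0, 0.2, 0.8, 1.0]

similar_score = pearson(anterior, anterior_like)   # close to +1
opposite_score = pearson(anterior, posterior)      # negative
```

A score near +1 suggests two genes stain the same region, while a negative score flags patterns that occupy complementary regions. A pairwise matrix of such scores is the kind of input a clustering method such as affinity propagation could take, although the similarity measure actually used in the paper may differ.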
18.  Determination of gene expression patterns using high-throughput RNA in situ hybridization to whole-mount Drosophila embryos 
Nature protocols  2009;4(5):605-618.
We describe a high-throughput protocol for RNA in situ hybridization (ISH) to Drosophila embryos in 96-well format. cDNA or genomic DNA templates are amplified by PCR and then digoxigenin-labeled ribonucleotides are incorporated into anti-sense RNA probes by in vitro transcription. The quality of each probe is evaluated prior to in situ hybridization using an RNA Probe Quantification (dot blot) assay. RNA probes are hybridized to fixed, mixed-staged Drosophila embryos in 96-well plates. The resulting stained embryos can be examined and photographed immediately or stored at 4°C for later analysis. Starting with fixed, staged embryos, the protocol takes 6 days from probe template production through hybridization. Preparation of fixed embryos requires a minimum of two weeks to collect embryos representing all stages. The method has been used to determine the expression patterns of over 6000 genes throughout embryogenesis.
doi:10.1038/nprot.2009.55
PMCID: PMC2780369  PMID: 19360017
19.  Functional Evolution of cis-Regulatory Modules at a Homeotic Gene in Drosophila 
PLoS Genetics  2009;5(11):e1000709.
It is a long-held belief in evolutionary biology that the rate of molecular evolution for a given DNA sequence is inversely related to the level of functional constraint. This belief holds true for the protein-coding homeotic (Hox) genes originally discovered in Drosophila melanogaster. Expression of the Hox genes in Drosophila embryos is essential for body patterning and is controlled by an extensive array of cis-regulatory modules (CRMs). How the regulatory modules functionally evolve in different species is not clear. A comparison of the CRMs for the Abdominal-B gene from different Drosophila species reveals relatively low levels of overall sequence conservation. However, embryonic enhancer CRMs from other Drosophila species direct transgenic reporter gene expression in the same spatial and temporal patterns during development as their D. melanogaster orthologs. Bioinformatic analysis reveals the presence of short conserved sequences within defined CRMs, representing gap and pair-rule transcription factor binding sites. One predicted binding site for the gap transcription factor KRUPPEL in the IAB5 CRM was found to be altered in Superabdominal (Sab) mutations. In Sab mutant flies, the third abdominal segment is transformed into a copy of the fifth abdominal segment. A model for KRUPPEL-mediated repression at this binding site is presented. These findings challenge our current understanding of the relationship between sequence evolution at the molecular level and functional activity of a CRM. While the overall sequence conservation at Drosophila CRMs is not distinctive from neighboring genomic regions, functionally critical transcription factor binding sites within embryonic enhancer CRMs are highly conserved. These results have implications for understanding mechanisms of gene expression during embryonic development, enhancer function, and the molecular evolution of eukaryotic regulatory modules.
Author Summary
The fertilized animal embryo is a mass of uniform cells that becomes a complex, segmented, and highly organized structure of differentiated cells through the process of development. This vital process is controlled by networks of developmental genes interacting with each other on the molecular level. Because these genes are crucial for animal development, they are conserved both in function and at the DNA sequence level in related species. We have examined critical DNA sequence modules which regulate genes that pattern the early embryo in different species of the fruit fly. We found that despite rapid evolution of the DNA sequences, the regulatory sequences from one fruit fly species are able to operate when tested in another fruit fly species. Further analysis reveals that there are sequences within these regulatory DNA modules which are conserved across different species and which are critical for regulatory function. These conserved sequences represent critical binding sites for protein transcription factors. These findings have important implications for our understanding of gene regulation during development and evolution across diverse animal species ranging from the fruit fly to humans.
doi:10.1371/journal.pgen.1000709
PMCID: PMC2763271  PMID: 19893611
20.  Comparative Genomics of the Eukaryotes 
Science (New York, N.Y.)  2000;287(5461):2204-2215.
A comparative analysis of the genomes of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae—and the proteins they are predicted to encode—was undertaken in the context of cellular, developmental, and evolutionary processes. The nonredundant protein sets of flies and worms are similar in size and are only twice that of yeast, but different gene families are expanded in each genome, and the multidomain proteins and signaling pathways of the fly and worm are far more complex than those of yeast. The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease.
PMCID: PMC2754258  PMID: 10731134
21.  Regulation of Early Endosomal Entry by the Drosophila Tumor Suppressors Rabenosyn and Vps45 
Molecular Biology of the Cell  2008;19(10):4167-4176.
The small GTPase Rab5 has emerged as an important regulator of animal development, and it is essential for endocytic trafficking. However, the mechanisms that link Rab5 activation to cargo entry into early endosomes remain unclear. We show here that Drosophila Rabenosyn (Rbsn) is a Rab5 effector that bridges an interaction between Rab5 and the Sec1/Munc18-family protein Vps45, and we further identify the syntaxin Avalanche (Avl) as a target for Vps45 activity. Rbsn and Vps45, like Avl and Rab5, are specifically localized to early endosomes and are required for endocytosis. Ultrastructural analysis of rbsn, Vps45, avl, and Rab5 null mutant cells, which show identical defects, demonstrates that all four proteins are required for vesicle fusion to form early endosomes. These defects lead to loss of epithelial polarity in mutant tissues, which overproliferate to form neoplastic tumors. This work represents the first characterization of a Rab5 effector as a tumor suppressor, and it provides in vivo evidence for a Rbsn–Vps45 complex on early endosomes that links Rab5 to the SNARE fusion machinery.
doi:10.1091/mbc.E08-07-0716
PMCID: PMC2555928  PMID: 18685079
23.  Exploiting position effects and the gypsy retrovirus insulator to engineer precisely expressed transgenes 
Nature genetics  2008;40(4):476-483.
A major obstacle to creating precisely expressed transgenes lies in the epigenetic effects of the host chromatin that surrounds them. Here we present a strategy to overcome this problem, employing a Gal4-inducible luciferase assay to systematically quantify position effects of host chromatin and the ability of insulators to counteract these effects at phiC31 integration loci randomly distributed throughout the Drosophila genome. We identify loci that can be exploited to deliver precise doses of transgene expression to specific tissues. Moreover, we uncover a previously unrecognized property of the gypsy retrovirus insulator to boost gene expression to levels severalfold greater than at most or possibly all un-insulated loci, in every tissue tested. These findings provide the first opportunity to create a battery of transgenes that can be reliably expressed at high levels in virtually any tissue by integration at a single locus, and conversely, to engineer a controlled phenotypic allelic series by exploiting several loci. The generality of our approach makes it adaptable to other model systems to identify and modify loci for optimal transgene expression.
doi:10.1038/ng.101
PMCID: PMC2330261  PMID: 18311141
24.  Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm  
PLoS Biology  2008;6(2):e27.
Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. We used whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior–posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over 40 well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal–ventral patterning genes, whose expression we show to be quantitatively modulated by anterior–posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. 
Together these observations suggest that many of these poorly bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.
Author Summary
One of the largest classes of regulatory proteins in animals, sequence-specific DNA binding transcription factors determine in which cells genes will be expressed and so control the development of an animal from a single cell to a morphologically complex adult. Understanding how this process is coordinated depends on knowing the number and types of genes that each transcription factor binds and regulates. Using immunoprecipitation of in vivo crosslinked chromatin coupled with DNA microarray hybridization (ChIP/chip), we have determined the genomic binding sites in early embryos of six transcription factors that play a crucial role in early development of the fruit fly Drosophila melanogaster. We find that these proteins bind to several thousand genomic regions that lie close to approximately half the protein coding genes. Although this is a much larger number of genes than these factors are generally thought to regulate, we go on to show that whereas the more highly bound genes generally look to be functional targets, many of the genes bound at lower levels do not appear to be regulated by these factors. Our conclusions differ from those of other groups who have not distinguished between different levels of DNA binding in vivo using similar assays and who have generally assumed that all detected binding is functional.
ChIP/chip analysis indicates that sequence-specific transcription factors bind to overlapping sets of thousands of genomic regions in Drosophila embryos, but most regions are bound at low levels and many may not be functional targets of these factors.
doi:10.1371/journal.pbio.0060027
PMCID: PMC2235902  PMID: 18271625
25.  Improved repeat identification and masking in Dipterans 
Gene  2006;389(1):1-9.
doi:10.1016/j.gene.2006.09.011
PMCID: PMC1945102  PMID: 17137733
Heterochromatin; Drosophila; A. gambiae; PILER; transposable element; RepeatRunner
