1.  FlyBase 101 – the basics of navigating FlyBase 
Nucleic Acids Research  2011;40(Database issue):D706-D714.
FlyBase is the leading database and web portal for genetic and genomic information on the fruit fly Drosophila melanogaster and related fly species. Whether you use the fruit fly as an experimental system or want to apply Drosophila biological knowledge to another field of study, FlyBase can help you successfully navigate the wealth of available Drosophila data. Here, we review the FlyBase web site with novice and less-experienced users of FlyBase in mind and point out recent developments stemming from the availability of genome-wide data from the modENCODE project. The first section of this paper explains the organization of the web site and describes the report pages available on FlyBase, focusing on the most popular, the Gene Report. The next section introduces some of the search tools available on FlyBase, in particular, our heavily used and recently redesigned search tool QuickSearch, found on the FlyBase homepage. The final section concerns genomic data, including recent modENCODE data, available through our Genome Browser, GBrowse.
PMCID: PMC3245098  PMID: 22127867
2.  FlyBase 102—advanced approaches to interrogating FlyBase 
Nucleic Acids Research  2013;42(Database issue):D780-D788.
FlyBase is the leading website and database of Drosophila genes and genomes. Whether you are using the fruit fly Drosophila melanogaster as an experimental system or wish to understand Drosophila biological knowledge in relation to human disease or to other model systems, FlyBase can help you successfully find the information you are looking for. Here, we demonstrate some of our more advanced searching systems and highlight some of our new tools for searching the wealth of data on FlyBase. The first section explores gene function in FlyBase, using our TermLink tool to search with Controlled Vocabulary terms and our new RNA-Seq Search tool to search gene expression. The second section of this article describes a few ways to search genomic data in FlyBase, using our BLAST server and the new implementation of GBrowse 2, as well as our new FeatureMapper tool. Finally, we move on to discuss our most powerful search tool, QueryBuilder, before describing pre-computed cuts of the data and how to query the database programmatically.
PMCID: PMC3964969  PMID: 24234449
3.  Opportunities for text mining in the FlyBase genetic literature curation workflow 
FlyBase is the model organism database for Drosophila genetic and genomic information. Over the last 20 years, FlyBase has had to adapt and change to keep abreast of advances in biology and database design. We are continually looking for ways to improve curation efficiency and efficacy. Genetic literature curation focuses on the extraction of genetic entities (e.g. genes, mutant alleles, transgenic constructs) and their associated phenotypes and Gene Ontology terms from the published literature. Over 2000 Drosophila research articles are now published every year. These articles are becoming ever more data-rich and there is a growing need for text mining to shoulder some of the burden of paper triage and data extraction. In this article, we describe our curation workflow, along with some of the problems and bottlenecks therein, and highlight the opportunities for text mining. We do so in the hope of encouraging the BioCreative community to help us to develop effective methods to mine this torrent of information.
PMCID: PMC3500518  PMID: 23160412
4.  FlyBase: improvements to the bibliography 
Nucleic Acids Research  2012;41(Database issue):D751-D757.
An accurate, comprehensive, non-redundant and up-to-date bibliography is a crucial component of any Model Organism Database (MOD). Principally, the bibliography provides a set of references that are specific to the field served by the MOD. Moreover, it serves as a backbone to which all curated biological data can be attributed. Here, we describe the organization and main features of the bibliography in FlyBase, the MOD for Drosophila melanogaster. We present an overview of the current content of the bibliography, the pipeline for identifying and adding new references, the presentation of data within Reference Reports and effective methods for searching and retrieving bibliographic data. We highlight recent improvements in these areas and describe the advantages of using the FlyBase bibliography over alternative literature resources. Although this article is focused on bibliographic data, many of the features and tools described are applicable to browsing and querying other datasets in FlyBase.
PMCID: PMC3531214  PMID: 23125371
5.  FlyTF: improved annotation and enhanced functionality of the Drosophila transcription factor database 
Nucleic Acids Research  2009;38(Database issue):D443-D447.
FlyTF is a database of computationally predicted and/or experimentally verified site-specific transcription factors (TFs) in the fruit fly Drosophila melanogaster. The manual classification of TFs in the initial version of FlyTF that concentrated primarily on the DNA-binding characteristics of the proteins has now been extended to a more fine-grained annotation of both DNA binding and regulatory properties in the new release. Furthermore, experimental evidence from the literature was classified into a defined vocabulary, and in collaboration with FlyBase, translated into Gene Ontology (GO) annotation. While our GO annotations will also be available through FlyBase as they will be incorporated into the genes’ official GO annotation in the future, the entire evidence used for classification including computational predictions and quotes from the literature can be accessed through FlyTF. The FlyTF website now builds upon the InterMine framework, which provides experimental and computational biologists with powerful search and filter functionality, list management tools and access to genomic information associated with the TFs.
PMCID: PMC2808907  PMID: 19884132
6.  tagtog: interactive and text-mining-assisted annotation of gene mentions in PLOS full-text articles 
The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of these increases, FlyBase, a database for Drosophila genomic and genetic information, is constantly exploring new ways to mine the published literature to increase the efficiency and accuracy of manual curation and to automate some aspects, such as triaging and entity extraction. Toward this end, we present the ‘tagtog’ system, a web-based annotation framework that can be used to mark up biological entities (such as genes) and concepts (such as Gene Ontology terms) in full-text articles. tagtog leverages manual user annotation in combination with automatic machine-learned annotation to provide accurate identification of gene symbols and gene names. As part of the BioCreative IV Interactive Annotation Task, FlyBase has used tagtog to identify and extract mentions of Drosophila melanogaster gene symbols and names in full-text biomedical articles from the PLOS stable of journals. We show here the results of three experiments with different sized corpora and assess gene recognition performance and curation speed. We conclude that tagtog-named entity recognition improves with a larger corpus and that tagtog-assisted curation is quicker than manual curation.
PMCID: PMC3978375  PMID: 24715220
7.  The FlyBase database of the Drosophila genome projects and community literature 
Nucleic Acids Research  2003;31(1):172-175.
FlyBase provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. FlyBase has primary responsibility for the continual reannotation of the D. melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. A complete revision of the annotations of the now-finished euchromatic genomic sequence has been completed. There are many points of entry to the genome within FlyBase, most notably through maps, gene products and ontologies, structured phenotypic and gene expression data, and anatomy.
PMCID: PMC165541  PMID: 12519974
8.  The FlyBase database of the Drosophila genome projects and community literature 
Nucleic Acids Research  2002;30(1):106-108.
FlyBase provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. Following on the success of the Drosophila genome project, FlyBase has primary responsibility for the continual reannotation of the D. melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. The current cycle of reannotation focuses on establishing a comprehensive data set of gene models (i.e. transcription units and CDSs). There are many points of entry to the genome within FlyBase, most notably through maps, gene ontologies, structured phenotypic and gene expression data, and anatomy.
PMCID: PMC99082  PMID: 11752267
9.  FlyBase: a Drosophila database. The FlyBase consortium. 
Nucleic Acids Research  1997;25(1):63-66.
FlyBase is a database of genetic and molecular data concerning Drosophila. FlyBase is maintained as a relational database (in Sybase) and is made available as html documents and flat files. The scope of FlyBase includes: genes, alleles (and phenotypes), aberrations, transposons, pointers to sequence data, clones, stock lists, Drosophila workers and bibliographic references. The Encyclopedia of Drosophila is a joint effort between FlyBase and the Berkeley Drosophila Genome Project which integrates FlyBase data with those from the BDGP.
PMCID: PMC146418  PMID: 9045212
10.  FlyBase: a Drosophila database. Flybase Consortium. 
Nucleic Acids Research  1998;26(1):85-88.
FlyBase is a comprehensive database of genetic and molecular data concerning Drosophila. FlyBase is maintained as a relational database (in Sybase) and is made available as html documents and flat files. The scope of FlyBase includes: genes, alleles (with phenotypes), aberrations, transposons, pointers to sequence data, gene products, maps, clones, stock lists, Drosophila workers and bibliographic references.
PMCID: PMC147222  PMID: 9399806
11.  Directly e-mailing authors of newly published papers encourages community curation 
Much of the data within Model Organism Databases (MODs) comes from manual curation of the primary research literature. Given limited funding and an increasing density of published material, a significant challenge facing all MODs is how to efficiently and effectively prioritize the most relevant research papers for detailed curation. Here, we report recent improvements to the triaging process used by FlyBase. We describe an automated method to directly e-mail corresponding authors of new papers, requesting that they list the genes studied and indicate (‘flag’) the types of data described in the paper using an online tool. Based on the author-assigned flags, papers are then prioritized for detailed curation and channelled to appropriate curator teams for full data extraction. The overall response rate has been 44% and the flagging of data types by authors is sufficiently accurate for effective prioritization of papers. In summary, we have established a sustainable community curation program, with the result that FlyBase curators now spend less time triaging and can devote more effort to the specialized task of detailed data extraction.
PMCID: PMC3342516  PMID: 22554788
12.  FlyBase: integration and improvements to query tools 
Nucleic Acids Research  2007;36(Database issue):D588-D593.
FlyBase is the primary resource for molecular and genetic information on the Drosophilidae. The database serves researchers of diverse backgrounds and interests, and offers several different query tools to provide efficient access to the data available and facilitate the discovery of significant relationships within the database. Recently, FlyBase has developed Interactions Browser and enhanced GBrowse, which are graphical query tools, and made improvements to the search tools QuickSearch and QueryBuilder. Furthermore, these search tools have been integrated with Batch Download and new analysis tools through a more flexible search results list, providing powerful ways of exploring the data in FlyBase.
PMCID: PMC2238994  PMID: 18160408
13.  Annotation of the Drosophila melanogaster euchromatic genome: a systematic review 
Genome Biology  2002;3(12):research0083.1-83.22.
The recent completion of the Drosophila melanogaster genomic sequence to high quality and the availability of a greatly expanded set of Drosophila cDNA sequences, aligning to 78% of the predicted euchromatic genes, afforded FlyBase the opportunity to significantly improve genomic annotations. We made the annotation process more rigorous by inspecting each gene visually, utilizing a comprehensive set of curation rules, requiring traceable evidence for each gene model, and comparing each predicted peptide to SWISS-PROT and TrEMBL sequences.
Although the number of predicted protein-coding genes in Drosophila remains essentially unchanged, the revised annotation significantly improves gene models, resulting in structural changes to 85% of the transcripts and 45% of the predicted proteins. We annotated transposable elements and non-protein-coding RNAs as new features, and extended the annotation of untranslated (UTR) sequences and alternative transcripts to include more than 70% and 20% of genes, respectively. Finally, cDNA sequence provided evidence for dicistronic transcripts, neighboring genes with overlapping UTRs on the same DNA sequence strand, alternatively spliced genes that encode distinct, non-overlapping peptides, and numerous nested genes.
Identification of so many unusual gene models not only suggests that some mechanisms for gene regulation are more prevalent than previously believed, but also underscores the complex challenges of eukaryotic gene prediction. At present, experimental data and human curation remain essential to generate high-quality genome annotations.
PMCID: PMC151185  PMID: 12537572
14.  The FlyBase database of the Drosophila Genome Projects and community literature. The FlyBase Consortium. 
Nucleic Acids Research  1999;27(1):85-88.
The FlyBase Drosophila genetics database and the public interfaces of the Berkeley Drosophila Genome Project (BDGP) and European Drosophila Genome Project (EDGP) are in the process of integrating. At present, the data of these projects are available from independent, but hyperlinked, WWW sites (FlyBase URL, http://flybase.; BDGP URL,; EDGP URL, ). Because of the considerable overlap of data classes between the contributions of the Drosophila genome projects and the Drosophila community, the new and enlarged FlyBase consortium views the implementation of a single integrated Drosophila genomics/genetics server as essential to the scientific community. This integration will occur in a stepwise fashion over the next 1-2 years. In this report, the salient features of the current databases and how to interrogate and navigate the extensive data sets are discussed.
PMCID: PMC148103  PMID: 9847148
15.  FlyPhy: a phylogenomic analysis platform for Drosophila genes and gene families 
BMC Bioinformatics  2009;10:123.
The availability of 12 fully sequenced Drosophila species genomes provides an excellent opportunity to explore the evolutionary mechanism, structure and function of gene families in Drosophila. Currently, several important resources, such as FlyBase, FlyMine and DroSpeGe, have been devoted to integrating genetic, genomic, and functional data of Drosophila into a well-organized form. However, all of these resources are gene-centric and lack the information of the gene families in Drosophila.
FlyPhy is a comprehensive phylogenomic analysis platform devoted to analyzing the genes and gene families in Drosophila. Genes were classified into families using a graph-based Markov Clustering algorithm and extensively annotated by a number of bioinformatic tools, such as basic sequence features, functional category, gene ontology terms, domain organization and sequence homolog to other databases. FlyPhy provides a simple and user-friendly web interface to allow users to browse and retrieve the information at multiple levels. An outstanding feature of the FlyPhy is that all the retrieved results can be added to a workset for further data manipulation. For the data stored in the workset, multiple sequence alignment, phylogenetic tree construction and visualization can be easily performed to investigate the sequence variation of each given family and to explore its evolutionary mechanism.
With the above functionalities, FlyPhy will be a useful resource and convenient platform for the Drosophila research community. The FlyPhy is available at .
PMCID: PMC2680407  PMID: 19393099
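The FlyPhy entry above says genes were grouped into families with a graph-based Markov Clustering algorithm. As a rough illustration of that technique only, here is a minimal pure-Python sketch of the expansion and inflation loop on a toy similarity graph. The function names, the inflation value of 2.0, and the six-node graph are assumptions made for this example; they are not taken from FlyPhy, whose actual parameters and input data are not given in the abstract.

```python
def _normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def _matmul(a, b):
    """Plain square-matrix product, enough for small toy graphs."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster the nodes of a weighted graph with a basic MCL loop."""
    n = len(adjacency)
    # Self-loops are a standard MCL preprocessing step.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: random walks spread out
        inflated = _normalize_columns(  # inflation: strong flows are favoured
            [[value ** inflation for value in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = tuple(j for j, value in enumerate(row) if value > 1e-6)
        if members:
            clusters.add(members)
    return sorted(list(c) for c in clusters)


# Hypothetical toy graph: two disconnected triangles of "genes".
toy_graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
families = markov_cluster(toy_graph)  # [[0, 1, 2], [3, 4, 5]]
```

In practice the edge weights would come from pairwise sequence similarity scores, and a sparse-matrix implementation would be needed for genome-scale inputs.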
16.  Flytrap, a database documenting a GFP protein-trap insertion screen in Drosophila melanogaster 
Nucleic Acids Research  2004;32(Database issue):D418-D420.
Flytrap is a web-enabled relational database of transposable element insertions in Drosophila melanogaster. A green fluorescent protein (GFP) artificial exon carried by a transposable P-element is mobilized and inserted into a host gene intron creating a GFP fusion protein. The sequence of the tagged gene is determined by sequencing inverse-PCR products derived from genomic DNA. Flytrap contains two principal data types: micrographs of protein localization and a cellular component ontology, based on rules derived from the Gene Ontology consortium, describing protein localization. Flytrap also has links to gene information contained in Flybase. The system is designed to accept submissions of micrographs and descriptions from any type of tissue (e.g. wing imaginal disk, ovary) and at any stage of development. Insertion lines can be searched using a number of queries, including Berkeley Drosophila Genome Project (BDGP) numbers and protein localization. In addition, Flytrap provides online order forms linked to each insertion line so that users may request any line generated from this project. Flytrap may be accessed from its homepage.
PMCID: PMC308749  PMID: 14681446
17.  Prediction of Gene Expression in Embryonic Structures of Drosophila melanogaster 
PLoS Computational Biology  2007;3(7):e144.
Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms.
Author Summary
The task of deciphering the complex transcriptional regulatory networks controlling development is one of the major current challenges for molecular biology. The problem is difficult, if not impossible, to solve without a detailed knowledge of the spatiotemporal dynamics of gene expression. Thus, to understand development, we need to identify and functionally characterize all players in regulatory networks. Data on gene expression dynamics obtained from whole transcriptome microarray experiments, combined with in situ hybridization mRNA localisation patterns for a subset of genes, may provide a route for predicting the localisation of gene expression for those genes for which in situ data has not been generated, as well as suggesting functional information for uncharacterised genes. Here, we report the development of one of the first methods for predicting the localisation of gene expression during Drosophila embryogenesis from microarray data. Pooling the subset of genes in the fly genome with in situ data to form functional units, localised in space and time for relevant developmental processes, facilitates the statement of a classification problem, which we address with machine-learning methods. Our approach promotes a richer annotation of biological function for genes in the absence of costly and time-consuming experimental analysis.
PMCID: PMC1924873  PMID: 17658945
18.  Grasping at molecular interactions and genetic networks in Drosophila melanogaster using FlyNets, an Internet database. 
Nucleic Acids Research  1999;27(1):89-94.
FlyNets is a WWW database describing molecular interactions (protein-DNA, protein-RNA and protein-protein) in the fly Drosophila melanogaster. It is composed of two parts, as follows. (i) FlyNets-base is a specialized database which focuses on molecular interactions involved in Drosophila development. The information content of FlyNets-base is distributed among several specific lines arranged according to a GenBank-like format and grouped into five thematic zones to improve human readability. The FlyNets database achieves a high level of integration with other databases such as FlyBase, EMBL, GenBank and SWISS-PROT through numerous hyperlinks. (ii) FlyNets-list is a very simple and more general databank, the long-term goal of which is to report on any published molecular interaction occurring in the fly, giving direct web access to corresponding abstracts in Medline and in FlyBase. In the context of genome projects, databases describing molecular interactions and genetic networks will provide a link at the functional level between the genome, the proteome and the transcriptome worlds of different organisms. Interaction databases therefore aim at describing the contents, structure, function and behaviour of what we herein define as the interactome world.
PMCID: PMC148104  PMID: 9847149
19.  The use of semantic similarity measures for optimally integrating heterogeneous Gene Ontology data from large scale annotation pipelines 
Frontiers in Genetics  2014;5:264.
With the advancement of new high throughput sequencing technologies, there has been an increase in the number of genome sequencing projects worldwide, which has yielded complete genome sequences of human, animals and plants. Subsequently, several labs have focused on genome annotation, consisting of assigning functions to gene products, mostly using Gene Ontology (GO) terms. As a consequence, there is an increased heterogeneity in annotations across genomes due to different approaches used by different pipelines to infer these annotations and also due to the nature of the GO structure itself. This makes a curator's task difficult, even if they adhere to the established guidelines for assessing these protein annotations. Here we develop a genome-scale approach for integrating GO annotations from different pipelines using semantic similarity measures. We used this approach to identify inconsistencies and similarities in functional annotations between orthologs of human and Drosophila melanogaster, to assess the quality of GO annotations derived from InterPro2GO mappings compared to manually annotated GO annotations for the Drosophila melanogaster proteome from a FlyBase dataset and human, and to filter GO annotation data for these proteomes. Results obtained indicate that an efficient integration of GO annotations eliminates redundancy up to 27.08 and 22.32% in the Drosophila melanogaster and human GO annotation datasets, respectively. Furthermore, we identified lack of and missing annotations for some orthologs, and annotation mismatches between InterPro2GO and manual pipelines in these two proteomes, thus requiring further curation. This simplifies and facilitates tasks of curators in assessing protein annotations, reduces redundancy and eliminates inconsistencies in large annotation datasets for ease of comparative functional genomics.
PMCID: PMC4123725  PMID: 25147557
functional annotation; Gene Ontology annotation; annotation pipeline; manual annotation; electronic annotation
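The entry above integrates GO annotations using semantic similarity measures, but the abstract does not name the specific measure. As a hedged illustration of the general idea, the sketch below uses a simple ancestor-set Jaccard similarity on a toy ontology, and treats an annotation as redundant when it is a proper ancestor of another term annotated to the same gene product. The term names, the toy parent table, and this definition of redundancy are all assumptions for the example, not the paper's method.

```python
def ancestors(term, parents):
    """Return the term plus all of its ancestors in the ontology DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def jaccard_similarity(term_a, term_b, parents):
    """Shared ancestors divided by all ancestors of the two terms."""
    anc_a = ancestors(term_a, parents)
    anc_b = ancestors(term_b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def remove_redundant(terms, parents):
    """Drop every term that is a proper ancestor of another annotated term."""
    return {
        term for term in terms
        if not any(term != other and term in ancestors(other, parents)
                   for other in terms)
    }


# Hypothetical toy ontology: child term -> list of parent terms.
toy_parents = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
}

sibling_score = jaccard_similarity("A1", "A2", toy_parents)  # 0.5
distant_score = jaccard_similarity("A1", "B", toy_parents)   # 0.25
filtered = remove_redundant({"A", "A1", "B"}, toy_parents)   # {"A1", "B"}
```

Information-content measures such as those of Resnik or Lin weight shared ancestors by how specific they are; the Jaccard form is used here only because it needs no annotation corpus.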
20.  FlyBase: anatomical data, images and queries 
Nucleic Acids Research  2005;34(Database issue):D484-D488.
FlyBase is a database of genetic and genomic data on the model organism Drosophila melanogaster and the entire insect family Drosophilidae. The FlyBase Consortium curates, annotates, integrates and maintains a wide variety of data within this domain. Access to the data is provided through graphical and textual user interfaces tailored to particular types of data. FlyBase data types include maps at the cytological, genetic and sequence levels, genes and alleles including their products, functions, expression patterns, mutant phenotypes and genetic interactions as well as aberrant chromosomes, annotated genomes, genetic stock collections, transposons, transgene constructs and insertions, anatomy and images, bibliographic data, and community contact information.
PMCID: PMC1347431  PMID: 16381917
21.  The Drosophila phenotype ontology 
Phenotype ontologies are queryable classifications of phenotypes. They provide a widely-used means for annotating phenotypes in a form that is human-readable, programmatically accessible and that can be used to group annotations in biologically meaningful ways. Accurate manual annotation requires clear textual definitions for terms. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification. The Drosophila phenotype ontology (DPO) has been used to annotate over 159,000 phenotypes in FlyBase to date, but until recently lacked textual or formal definitions.
We have composed textual definitions for all DPO terms and formal definitions for 77% of them. Formal definitions reference terms from a range of widely-used ontologies including the Phenotype and Trait Ontology (PATO), the Gene Ontology (GO) and the Cell Ontology (CL). We also describe a generally applicable system, devised for the DPO, for recording and reasoning about the timing of death in populations. As a result of the new formalisations, 85% of classifications in the DPO are now inferred rather than asserted, with much of this classification leveraging the structure of the GO. This work has significantly improved the accuracy and completeness of classification and made further development of the DPO more sustainable.
The DPO provides a set of well-defined terms for annotating Drosophila phenotypes and for grouping and querying the resulting annotation sets in biologically meaningful ways. Such queries have already resulted in successful function predictions from phenotype annotation. Moreover, such formalisations make extended queries possible, including cross-species queries via the external ontologies used in formal definitions. The DPO is openly available under an open source license in both OBO and OWL formats. There is good potential for it to be used more broadly by the Drosophila community, which may ultimately result in its extension to cover a broader range of phenotypes.
PMCID: PMC3816596  PMID: 24138933
Drosophila; Phenotype; Ontology; OWL; OBO; Gene ontology; FlyBase
22.  FlyBase: the Drosophila database. The Flybase Consortium. 
Nucleic Acids Research  1996;24(1):53-56.
FlyBase is a database of genetic and molecular data concerning Drosophila. FlyBase is maintained as a relational database (in Sybase). The scope of FlyBase includes: genes, alleles (and phenotypes), aberrations, pointers to sequence data, clones, stock lists, Drosophila workers and bibliographic references. FlyBase is also available on CD-ROM for Macintosh systems (Encyclopaedia of Drosophila).
PMCID: PMC145580  PMID: 8594600
23.  BrainTrap: a database of 3D protein expression patterns in the Drosophila brain 
Protein-trap strains of Drosophila melanogaster provide a very useful tool for examining the 3D-expression patterns of proteins and purification of protein complexes. Here we present BrainTrap, an online database of 3D confocal datasets showing reporter gene expression and protein localization in the adult brain of Drosophila. Full size images throughout the volume of the entire brain can be viewed interactively in a web browser. The database includes searchable annotations linked to the FlyBase Drosophila anatomy ontology. Anatomical search criteria can be specified using automatic completion and a hierarchical browser for the ontology. The provenance of all annotation is retained and the location where the annotator made the conclusion can be highlighted.
PMCID: PMC2911840  PMID: 20624714
24.  FlyNets and GIF-DB, two internet databases for molecular interactions in Drosophila melanogaster. 
Nucleic Acids Research  1998;26(1):89-93.
GIF-DB and FlyNets are two WWW databases describing molecular (protein-DNA, protein-RNA and protein-protein) interactions occurring in the fly Drosophila melanogaster. GIF-DB is a specialised database which focuses on molecular interactions involved in the process of embryonic pattern formation, whereas FlyNets is a new and more general database, the long-term goal of which is to report on any published molecular interaction occurring in the fly. The information content of both databases is distributed in specific lines arranged into an EMBL- (or GenBank-) like format. These databases achieve a high level of integration with other databases such as FlyBase, EMBL, GenBank and SWISS-PROT through numerous hyperlinks. In addition, we also describe SOS-DGDB, a new collection of annotated Drosophila gene sequences, in which binding sites for regulatory proteins are directly visible on the DNA primary sequence and hyperlinked both to GIF-DB and TRANSFAC database entries.
PMCID: PMC147170  PMID: 9399807
25.  Characterizing the developmental transcriptome of the oriental fruit fly, Bactrocera dorsalis (Diptera: Tephritidae) through comparative genomic analysis with Drosophila melanogaster utilizing modENCODE datasets 
BMC Genomics  2014;15(1):942.
The oriental fruit fly, Bactrocera dorsalis, is an important pest of fruit and vegetable crops throughout Asia, and is considered a high risk pest for establishment in the mainland United States. It is a member of the family Tephritidae, which are the most agriculturally important family of flies, and can be considered an out-group to well-studied members of the family Drosophilidae. Despite their importance as pests and their relatedness to Drosophila, little information is present on B. dorsalis transcripts and proteins. The objective of this paper is to comprehensively characterize the transcripts present throughout the life history of B. dorsalis and functionally annotate and analyse these transcripts relative to the presence, expression, and function of orthologous sequences present in Drosophila melanogaster.
We present a detailed transcriptome assembly of B. dorsalis from egg through adult stages containing 20,666 transcripts across 10,799 unigene components. Utilizing data available through Flybase and the modENCODE project, we compared expression patterns of these transcripts to putative orthologs in D. melanogaster in terms of timing, abundance, and function. In addition, temporal expression patterns in B. dorsalis were characterized between stages, to establish the constitutive or stage-specific expression patterns of particular transcripts. A fully annotated transcriptome assembly is made available through NCBI, in addition to corresponding expression data.
Through characterizing the transcriptome of B. dorsalis through its life history and comparing the transcriptome of B. dorsalis to the model organism D. melanogaster, a database has been developed that can be used as the foundation to functional genomic research in Bactrocera flies and help identify orthologous genes between B. dorsalis and D. melanogaster. This data provides the foundation for future functional genomic research that will focus on improving our understanding of the physiology and biology of this species at the molecular level. This knowledge can also be applied towards developing improved methods for control, survey, and eradication of this important pest.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-942) contains supplementary material, which is available to authorized users.
PMCID: PMC4223851  PMID: 25348373

