The Gene Expression Database (GXD) is a community resource of mouse developmental expression information. GXD integrates different types of expression data at the transcript and protein level and captures expression information from many different mouse strains and mutants. GXD places these data in the larger biological context through integration with other Mouse Genome Informatics (MGI) resources and interconnections with many other databases. Web-based query forms support simple or complex searches that take advantage of all these integrated data. The data in GXD are obtained from the literature, from individual laboratories, and from large-scale data providers. All data are annotated and reviewed by GXD curators. Since the last report, the GXD data content has increased significantly, the interface and data displays have been improved, new querying capabilities were implemented, and links to other expression resources were added. GXD is available through the MGI web site (www.informatics.jax.org), or directly at www.informatics.jax.org/expression.shtml.
As a primary mammalian model of human disease, the mouse is used extensively for expression studies to determine the role of genes that function in molecular pathways during developmental and disease processes. With a focus on endogenous gene expression during development, the Gene Expression Database (GXD) collects data from the scientific literature, from individual laboratories, and from large-scale data providers. It makes these data readily available to the research community in a highly curated and integrated format that allows for a large variety of database queries. GXD captures a broad spectrum of assay types, including RNA in situ hybridization, immunohistochemistry, knock-in reporter assays, northern blot, western blot, RT–PCR, RNase protection and S1 nuclease assays. It covers all developmental stages and tissues and includes data from many different mouse strains and mutants, giving researchers a tool to examine the effects of mutations on gene expression. GXD forms an important and integral component of the larger Mouse Genome Informatics (MGI) resource. Therefore, the expression data are fully integrated with mouse genetic, sequence, functional and phenotypic information (1–4). MGI maintains further links to many other resources such as GenBank, gene model resources, Entrez Gene, UniProt, InterPro, Online Mendelian Inheritance in Man (OMIM) and the International Knockout Mouse Consortium (IKMC) among others (5–14). This robust integration puts the expression data in GXD into a much larger biological and analytical context.
Other databases that store mouse expression information have been developed in recent years. They store data from one or two specific assay types and/or focus on specific developmental stages; they are often dedicated to specific data generation projects (15–22). As will be evident from this article, GXD is working with those resources that are complementary to GXD, adding value through data integration and the implementation of new interconnections. Due to its broad scope, its extensive data curation and integration efforts, and the resulting querying capabilities, GXD continues to provide a unique resource to the biomedical research community. GXD is updated daily. GXD and its query interfaces have been described earlier (23–27). Here, we report on our recent progress in terms of data acquisition, and on the implementation of new query and display features.
To present the objectives of GXD more clearly and to make the database more intuitive to use, we redesigned the GXD homepage (www.informatics.jax.org/expression.shtml). The new layout provides clear access to the various query forms, with short descriptors for each form. The ‘Frequently Asked Questions (FAQ’s)’ section provides links to brief on-line tutorials demonstrating how one can search for different types of data in GXD. The ‘GXD Includes’ section provides information about the current data content in GXD, such as the number of genes with annotated expression data, the number of expression results and the number of images in the database. The ‘Gene Expression News’ section informs users when new features, capabilities and data sets become available. A series of tabs at the bottom of the home page provides access to help documentation and data policies, information about GXD and its collaborators, and links to guidelines and tools that help researchers to submit data electronically. That GXD is also an integral component of MGI is made transparent through the use of a central ‘Quick Search’ (see below), a common navigation bar and common drop-menus and tab choices that direct users to various data sets, search forms, tools and other resources. Large icons on the MGI homepage provide visual cues to the various core areas, including expression (GXD).
In the Literature Summary, GXD provides users with a way to quickly determine what mouse developmental expression data are available in the scientific literature. The staff of GXD searches the scientific literature for publications that present endogenous gene expression experiments during mouse development. In a first annotation step for each publication, the genes analyzed, the ages of mice used in the experiment, and the type of experimental assay performed for each gene are recorded and entered into the database. These data are easily searched using the ‘Gene Expression Literature Query Form’. These queries can also include citation (author, journal, year) and abstract information. However, this tool takes users further than a PubMed search because the data in the GXD Literature Summary are based on the curation of the full text of the paper, including supplemental information, and annotations are standardized with regard to gene, age and assay type information. The Literature Summary is comprehensive and up-to-date. It includes all journal articles containing expression data during mouse development from 1993 to the present and all articles from major developmental journals since 1990. Currently, the GXD Literature Summary has 108604 records covering data for 13619 genes from 17521 references.
Beyond summaries, GXD also provides detailed records of experimental expression results. GXD assay records contain the authors’ description of the tissue pattern and strength of expression, translated into standard terminology (see below), the probe or antibody information available, as well as the specimen age, genetic background and preparation (Figure 1). The expression information is recorded in standardized formats by making extensive use of controlled vocabularies and ontologies, thus enabling data integration and complex querying capabilities. To capture the author’s descriptions of where expression was or was not detected, GXD uses a dictionary of anatomical terms that lists the anatomical structures for each developmental stage in a hierarchical fashion. In this way, expression data from assays with differing spatial resolution can be recorded in a consistent manner, and the expression pattern information becomes accessible to hierarchical searches. The developmental part of the anatomical dictionary was established by our collaborators from the EMAGE project (28) and is being extended and refined jointly; the postnatal part was developed by the GXD project (29).
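The hierarchical search described above can be illustrated with a minimal sketch. The structure names, gene names and annotations below are hypothetical, and the sketch ignores developmental stage for brevity; it is not the GXD implementation. It only shows how annotating to a hierarchy lets a query for a broad structure also return results recorded at finer spatial resolution.

```python
from collections import defaultdict

# Hypothetical part-of hierarchy (child structure -> parent structure).
PARENT = {
    "cardiovascular system": "embryo",
    "heart": "cardiovascular system",
    "atrium": "heart",
    "ventricle": "heart",
    "neural tube": "embryo",
}

# Hypothetical annotations: (gene, annotated structure, expression detected?).
ANNOTATIONS = [
    ("GeneA", "atrium", True),
    ("GeneB", "heart", True),
    ("GeneC", "neural tube", True),
    ("GeneD", "ventricle", False),
]


def descendants(structure):
    """Return the structure itself plus every structure beneath it."""
    children = defaultdict(list)
    for child, parent in PARENT.items():
        children[parent].append(child)
    found, stack = set(), [structure]
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(children[node])
    return found


def genes_detected_in(structure):
    """Genes with a 'detected' result in the structure or any substructure."""
    scope = descendants(structure)
    return sorted({gene for gene, annotated, detected in ANNOTATIONS
                   if detected and annotated in scope})


print(genes_detected_in("heart"))
```

In this sketch, a query for "heart" also returns the gene annotated only to "atrium", whereas a query for "atrium" would not return the gene annotated to "heart" as a whole.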
In addition to curating expression data from the literature on a daily basis, GXD continues to work with individual laboratories and large-scale data providers to bring their data sets into the database. Recently acquired large data sets include: an RT–PCR screen of more than 800 genes in pre-implantation embryos (30); an RNA in situ hybridization screen of 293 genes at E7.5 (31); an RNA in situ hybridization screen of more than 200 genes in the developing hypothalamus (32); a set of knock-in lacZ reporter studies from Deltagen Inc.; RNA in situ hybridization data for 745 genes at E14.5 from GenePaint (22); and a data set from the Eurexpress project (www.eurexpress.org), covering RNA in situ hybridization data for 6409 genes at E14.5. In all these cases, GXD staff worked with the data providers to bring the data into a standardized format that could be incorporated into GXD and to resolve issues pertinent to nomenclature, incompleteness, and referential integrity. Upon computational and manual review, the data were bulk-loaded into GXD.
Due to all these efforts, the content of expression data in GXD has increased tremendously in recent years. The database currently holds almost 930000 expression results from 45305 expression assays for 12139 genes. This includes expression data from 1503 different mouse mutants.
An effective way to access the expression information for a single gene is to look up the ‘Expression’ section on the corresponding gene detail page (Figure 2). Gene detail pages can be easily found via the ‘Quick Search’ tool placed at the top of all MGI pages. Because these pages are a central hub within MGI, we have improved the layout of the expression section, implemented a new images summary (see below), and added new gene-specific links to external expression resources, including the Allen Brain Atlas (18), GENSAT (17), GEO (33) and ArrayExpress (34). Future work will include the implementation of links to model organism databases for other species that hold developmental expression data, such as ZFIN (35), GEISHA (36), Xenbase (37) and FlyBase (38).
The query forms available from the GXD homepage, as well as from the MGI search menu, provide more direct and more powerful means to access expression data. The ‘Gene Expression Data Query Form’ includes search fields that allow users to specify genes, tissue, developmental stage and expression assay type, as well as genome coordinates and gene ontology (GO) terms. Users can search for instances where expression was detected or not detected, as well as for expression data in specific mutants. Thus, this form enables both simple and very complex queries. The ‘Expanded Expression Data Query Form’ allows users to find genes that are expressed in one set of tissues and/or developmental stages, but not in others. The ‘Mouse Anatomical Dictionary Browser’ lets users search or browse for specific anatomical structures and the expression data associated with them. The ‘EMAGE Anatomical Section Browser’ uses the Edinburgh Mouse Atlas as a starting point for identifying anatomical structures and expression data associated with them in GXD and EMAGE.
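The kind of differential query supported by the ‘Expanded Expression Data Query Form’ can be thought of as a set operation. The following is a minimal sketch with hypothetical genes and results. It assumes that "not expressed" means an explicit "not detected" result exists for the tissue, which may differ from how the form itself treats missing data, and it omits developmental stage for brevity.

```python
# Hypothetical expression results: (gene, structure, expression detected?).
RESULTS = [
    ("GeneA", "heart", True),
    ("GeneA", "liver", False),
    ("GeneB", "heart", True),
    ("GeneB", "liver", True),
    ("GeneC", "liver", True),
]


def genes_with(results, structure, detected):
    """Genes having at least one result with the given outcome in a structure."""
    return {gene for gene, annotated, outcome in results
            if annotated == structure and outcome == detected}


def expressed_in_but_not_in(results, present, absent):
    """Genes detected in `present` and explicitly not detected in `absent`."""
    return sorted(genes_with(results, present, True)
                  & genes_with(results, absent, False))


print(expressed_in_but_not_in(RESULTS, "heart", "liver"))
```

Intersecting the two sets keeps only genes that satisfy both conditions; a gene with no liver data at all would be excluded under the assumption stated above.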
Searches from the query form as well as the links from the expression section on the gene detail page lead to data summaries listing all assays/results that match the user’s criteria. We have improved the layout of these summaries. They now feature clear ‘data’ links that take the user to the detailed entries, as well as camera icons that indicate if the detailed entries contain links to primary image data (Figures 1 and 2).
Other avenues in MGI can be used to access the expression data as well. We have increased the utility of the Quick Search tool by adding the capability to return expression results if the user enters anatomical terms. We have expanded the Batch Query tool so that it can return GXD expression data for a list of genes or sequence IDs, as well as export expression data as Excel worksheets or tab-delimited text files (Figure 3).
In addition to implementing gene-specific links to the microarray gene expression data at GEO and ArrayExpress, we have, in collaboration with the Mouse Genome Database (MGD) project, developed new tools that allow researchers to combine microarray expression results with data in MGI. Specifically, we have begun to provide up-to-date mappings between microarray probe sets and genes so that users can query MGI by microarray probe set ID to retrieve the associated gene, or download reports of all probe set-to-gene mappings. We have also enhanced the MGI Batch Query tool so that it accepts sets of microarray probe IDs. Therefore, one can easily extract information about genes for sets of microarray probes, including RNA in situ hybridization, immunohistochemistry and RT–PCR expression data, GO annotations, mouse phenotype data and disease data associated with the orthologous human genes. Query results are available as web page displays and can also be downloaded as tab-delimited text and thus fed into other analysis tools (Figure 3).
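As an illustration of how a downloaded probe set-to-gene report might be fed into other analysis tools, the sketch below parses a small tab-delimited table and looks up a batch of probe IDs. The column names, probe IDs and gene names are invented for this example and do not reflect the actual layout of the MGI reports.

```python
import csv
import io

# Hypothetical tab-delimited mapping report (column names are assumptions).
MAPPING_REPORT = (
    "probe_set_id\tgene_symbol\n"
    "probe_0001\tGeneA\n"
    "probe_0002\tGeneB\n"
    "probe_0003\tGeneA\n"
)


def load_mapping(text):
    """Parse tab-delimited text into a probe set ID -> gene symbol dict."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["probe_set_id"]: row["gene_symbol"] for row in reader}


def batch_lookup(mapping, probe_ids):
    """Return (probe ID, gene or None) pairs, preserving the input order."""
    return [(probe_id, mapping.get(probe_id)) for probe_id in probe_ids]


mapping = load_mapping(MAPPING_REPORT)
for probe_id, gene in batch_lookup(mapping, ["probe_0002", "probe_9999"]):
    print(probe_id, gene if gene is not None else "(no mapped gene)")
```

Returning None for unmapped probe IDs, rather than dropping them, keeps the output aligned with the input list so that unmatched probes remain visible.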
GXD is implemented in the Sybase relational database management system. If a user’s queries are beyond the capabilities of our web-based forms, custom SQLs or direct SQL access can be requested via our user support group. In addition, a number of reports for specific data sets, such as a list of all the genes that have annotated expression data, are produced nightly and are available on the website.
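To give a sense of what a report such as the list of all genes with annotated expression data involves, the following sketch runs a query against an in-memory SQLite database. The table name, columns and rows are hypothetical and are not the GXD schema, and SQLite stands in for Sybase only so that the example is self-contained.

```python
import sqlite3

# In-memory database with a hypothetical, simplified results table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expression_result (gene TEXT, structure TEXT, detected INTEGER)"
)
conn.executemany(
    "INSERT INTO expression_result VALUES (?, ?, ?)",
    [
        ("GeneA", "heart", 1),
        ("GeneA", "liver", 0),
        ("GeneB", "heart", 1),
    ],
)


def genes_with_expression_data(connection):
    """Return each gene that has at least one annotated result, sorted by name."""
    rows = connection.execute(
        "SELECT DISTINCT gene FROM expression_result ORDER BY gene"
    ).fetchall()
    return [gene for (gene,) in rows]


print(genes_with_expression_data(conn))
```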
Whenever possible, the standardized text annotations of expression data are linked to the primary image data. There are currently more than 191099 images in GXD. Images are accessible via the ‘Expression’ section on the gene detail page. The recently implemented image summary displays all the images for the corresponding gene as thumbnails (Figure 2). These thumbnails are then linked to the full-sized image page. However, users are not restricted to selecting images on a per gene basis. Because all images are extensively indexed via the text annotations, they are accessible through the different search parameters that GXD provides on its query forms. Query result summaries display camera icons for entries that contain image data. The detailed entries then link directly to primary images obtained from the publication or from the data provider.
As part of our longstanding collaboration with the EMAGE project, which aims to ultimately create a resource that fully combines standardized text based and graphical means for storing and querying expression data (39), GXD makes all its in situ images and pertinent text annotations available to EMAGE. EMAGE then provides 3D spatial mappings of selected in situ data. The image detail pages in GXD now link to entries in EMAGE that show the image spatially mapped onto the 3D atlas (Figure 1).
Large-scale in situ hybridization projects have contributed both data and images to GXD. Two of them, GenePaint and Eurexpress, provide additional images and features for image viewing at their web sites. Therefore, GXD has added links from the image page at GXD to the corresponding image entries at GenePaint and Eurexpress. These successful collaborations illustrate how community resources such as GXD and project-oriented resources such as GenePaint and Eurexpress can work with each other. GXD adds value to the incorporated data by integrating them with other expression data and with all the other types of data stored in MGI and by providing unique and powerful querying capabilities. Further, GXD maintains the data and keeps them current with regard to nomenclature and data connections. Conversely, the GenePaint and Eurexpress data significantly enrich GXD, and the links implemented to GenePaint and Eurexpress provide additional utility such as high-resolution images and zoom functions.
Users can contribute data directly to GXD. Guidelines for electronic submission are available on the web site, and a GXD staff member is available to assist. GXD has developed an Excel-based program, the Gene Expression Notebook (GEN), which provides users with a template for submitting expression data to GXD. A variety of other formats are also accepted. The GEN is also useful in the lab for maintaining expression data and is available for free at http://www.informatics.jax.org/mgihome/GXD/GEN/index.shtml. All data submitted to GXD are given accession IDs that can be referenced in publications and grant applications.
GXD provides users with extensive on-line documentation available by clicking on the question marks available on many pages. In addition, user support staff can be contacted via the ‘Help’ or ‘Contact Us’ links on the website, by email (email@example.com) or by telephone (207-288-6445).
The following citation format is suggested when referring to data from GXD: These data were retrieved from the GXD, and MGI, The Jackson Laboratory, Bar Harbor, Maine, USA (URL: http://www.informatics.jax.org). [Type in date (month, year) when you retrieved the data cited.] To reference the database itself, please cite this article.
National Institutes of Health, Eunice Kennedy Shriver National Institute of Child Health and Human Development grant HD033745. Funding for open access charge: National Institutes of Health grant HD033745.
Conflict of interest statement. None declared.
We would like to thank our colleagues from the other MGI projects for their contributions to the GXD project and to the larger integrated MGI resource.