The laboratory mouse is the premier animal model for studying human biology because all life stages can be accessed experimentally, a completely sequenced reference genome is publicly available and a myriad of genomic tools are available for comparative and experimental research. In the current era of genome-scale, data-driven biomedical research, the integration of genetic, genomic and biological data is essential for realizing the full potential of the mouse as an experimental model. The Mouse Genome Database (MGD; http://www.informatics.jax.org), the community model organism database for the laboratory mouse, is designed to facilitate the use of the laboratory mouse as a model system for understanding human biology and disease. To achieve this goal, MGD integrates genetic and genomic data related to the functional and phenotypic characterization of mouse genes and alleles and serves as a comprehensive catalog of mouse models of human disease. Recent enhancements to MGD include the addition of human ortholog details to mouse Gene Detail pages, the inclusion of microRNA knockouts in MGD’s catalog of alleles and phenotypes, the addition of video clips to phenotype images, access to genotype and phenotype data associated with quantitative trait loci (QTL) and improvements to the layout and display of Gene Ontology annotations.
The laboratory mouse is widely recognized as the premier animal model for investigating genetic and cellular systems relevant to human biology and disease. A large arsenal of experimental genetic tools is available for the mouse, including unique inbred strains, a complete reference genome (and deep-sequencing data for 17 additional inbred lines), extensive genome variation maps (e.g. single nucleotide polymorphisms) and technologies for directly and specifically manipulating the mouse genome. An international effort to knock out all mouse genes has produced an ES cell line resource covering over 18 000 genes (1), and the phenotyping phase has begun (2). New resources for complex trait mapping, including the Collaborative Cross and Diversity Outbred mice, are beginning to emerge (3,4). In human genetics and genomics, exome sequencing and the drive toward ever-cheaper genome sequencing will again change how we approach computational and experimental methods for understanding the biology of the genome. Through comparative genomics, the mouse is essential for the functional analysis and annotation of the rapidly emerging human genomes.
Realizing the full power of the mouse as a model of human biology depends, in part, on integrating the diverse genetic, genomic and phenotypic data for the mouse in ways that promote experimental and translational research. The central objective of the Mouse Genome Database (MGD) is to provide an integrative and comparative bioinformatics resource that supports the effective translation of information from experimental mouse models to uncover the genetic basis of human diseases. MGD is the highly curated community model organism database for the laboratory mouse, providing web and programmatic access to a complete catalog of mouse genes and genome features integrated with functional annotations, a comprehensive catalog of mutant alleles, phenotype annotations, human disease model annotations, variation data and sequence data. MGD went online via the World Wide Web in 1994, unifying and harmonizing several different databases of genetic map and allele information for the laboratory mouse. Since then, MGD has evolved rapidly: the database has been re-tooled and enhanced to accommodate a multitude of new data types, data access tools have been developed and upgraded for an increasingly diverse community of researchers, and new database and software technologies have been adopted as they emerged and matured.
MGD is the central component of a number of coordinated genome informatics projects that are part of the Mouse Genome Informatics (MGI) consortium (http://www.informatics.jax.org). Other database resources available through the MGI web portal include the Gene Expression Database (GXD) (5), the Mouse Tumor Biology Database (6), the Gene Ontology (GO) project (7) and the MouseCyc database of biochemical pathways (8). Taken together, these resources provide a combination of data breadth, depth, integration and quality that exists nowhere else for mouse.
The curation efforts within MGD focus on maintaining a catalog of genes and other genome features, functional annotation of mouse genes using Gene Ontology terms, annotation of phenotypes associated with genotypes using terms from the Mammalian Phenotype Ontology and the association of mouse models with human disease. Data releases for MGD occur weekly. A summary of the database content for MGD is given in Table 1.
A banner displaying information about the human ortholog of each mouse gene was added to the Gene Detail pages in MGD to make it easier to compare gene–disease associations in mouse and human. This human ortholog detail stripe is positioned above the section of the Gene Detail page that describes alleles and phenotypes for the mouse gene (Figure 1). For each human ortholog, the name and location of the human gene are provided and, if relevant, a list of associated diseases according to the Online Mendelian Inheritance in Man (OMIM) resource (9) is displayed. Viewing the human ortholog and alleles/phenotypes sections of the Gene Detail page together helps researchers identify cases where the human gene is associated with a disease and the mouse gene is not (or has yet to be specifically tested as a model) (Figure 1).
By providing information on concordant and discordant instances in which mutations in orthologous genes result in phenotypes that model specific human diseases, MGD can be used to discover potential candidate genes for human diseases that have no known gene associations in human, and to discover mutations in mice that should be examined as new models of human disease. For example, the mouse spermatogenesis associated 16 gene (Spata16; MGI:1918112) is the ortholog of the human SPATA16 gene (HGNC:29935). In humans, mutations in this gene are associated with Spermatogenic Failure 6 (SPGF6; OMIM:102530). In mouse, there are currently three alleles of Spata16; however, all of these mutants exist only in ES cell lines and thus represent potential sources of mouse models for this disease once the ES cells are made into mice and phenotyped. Conversely, one can observe cases where a mouse model has been associated with a human disease but there is not yet evidence associating the human ortholog with that disease. For example, mutations in the mouse cholinergic receptor, muscarinic 3, cardiac gene (Chrm3; MGI:88398) provide a model for human Megacystis–Microcolon–Intestinal Hypoperistalsis Syndrome (OMIM:249210). Thus, study of existing mouse models can facilitate the discovery of candidate disease genes in human.
In some cases, alleles of the mouse gene are associated with human disease phenotypes that differ from the associations reported in OMIM for the human ortholog. For example, the human ortholog (CAV1) of the mouse caveolin 1 gene (Cav1; MGI:102709) is associated with congenital lipodystrophy (OMIM:612526). In mouse, however, Cav1 genotypes are associated with human breast cancer (OMIM:114480) and Alzheimer’s disease (OMIM:104300), but not with lipodystrophy. Similarly, the mouse bicaudal C homolog 1 gene (Bicc1; MGI:1933388) is reported as a model for three human diseases listed in OMIM [Heterotaxy (HTX5), OMIM:270100; Polycystic Kidney Disease 1 (PKD1), OMIM:173900; and autosomal recessive PKD (ARPKD), OMIM:263200], whereas its human ortholog, BICC1 (HGNC:19351), is not associated with any disease according to OMIM.
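The concordant and discordant cases described for Spata16, Chrm3, Cav1 and Bicc1 amount to a comparison of two sets of OMIM IDs per ortholog pair. The sketch below is a hypothetical illustration of that reasoning, not MGD's own logic: the dictionaries hand-copy only the OMIM IDs cited in the text, and the category labels are our own.

```python
# A minimal sketch: compare the OMIM disease IDs linked to a human gene with
# those linked to mouse models of its ortholog. The dictionaries below are
# hand-copied from the examples in this article; they are illustrative only
# and are NOT an MGD data export.

# OMIM IDs associated with the human ortholog (per OMIM)
HUMAN_DISEASES = {
    "Spata16": {"102530"},   # SPATA16 -> SPGF6
    "Chrm3": set(),          # no human association reported yet
    "Cav1": {"612526"},      # CAV1 -> congenital lipodystrophy
    "Bicc1": set(),          # BICC1 -> no OMIM disease
}

# OMIM IDs for which mouse genotypes serve as disease models
MOUSE_MODELS = {
    "Spata16": set(),                         # alleles exist only in ES cells
    "Chrm3": {"249210"},                      # MMIHS
    "Cav1": {"114480", "104300"},             # breast cancer, Alzheimer's disease
    "Bicc1": {"270100", "173900", "263200"},  # HTX5, PKD1, ARPKD
}


def classify(human: set, mouse: set) -> str:
    """Label an ortholog pair by how its human and mouse disease sets overlap."""
    if human and mouse:
        return "concordant" if human & mouse else "discordant"
    if human:
        return "human disease, no mouse model yet"
    if mouse:
        return "mouse model, candidate human disease gene"
    return "no disease association"


for gene in HUMAN_DISEASES:
    print(f"{gene}: {classify(HUMAN_DISEASES[gene], MOUSE_MODELS[gene])}")
```

Run as written, this labels Spata16 as a human disease without a mouse model, Chrm3 and Bicc1 as candidate human disease genes and Cav1 as discordant, mirroring the three situations discussed above.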
In recent years, the importance of small regulatory RNAs, including microRNAs, in post-transcriptional gene regulation has been recognized. Mice carrying targeted mutations in microRNAs are important resources for characterizing the biological functions of these molecules. Several initiatives have been launched to generate ES cell lines and mice with targeted mutations in microRNAs (10,11). MGD has added these emerging microRNA ‘knockouts’ to its comprehensive catalog of alleles and phenotypes in mouse. Details for microRNA alleles include a description of the mutation, links to published references, a description of observed phenotypes if available and links to the International Mouse Strain Resource (12,13) for information on the availability of strains or cell lines that carry a specific microRNA allele. To date, 434 alleles in 284 microRNAs have been entered into MGD. Although many of these mutant alleles are available only as ES cell lines, approximately 170 have been made into live mice. Of these microRNA knockout mice, 67 have phenotype annotations, 5 have no abnormal phenotype and 98 have yet to be phenotyped. As reports appear in the published literature or through large-scale mouse phenotyping projects, the annotations for microRNA knockouts will be updated.
MGD has regularly included still images that illustrate mouse phenotypes associated with alleles and genotypes. Brief video clips of mouse phenotypes have been added recently to provide a new dimension of information on the phenotypic consequences of genomic variants. More than 340 phenotype videos are available in MGD, presented as YouTube® clips embedded in the web pages. These videos were generated by the National Heart, Lung, and Blood Institute’s Bench-to-Bassinet program within the Cardiovascular Development Consortium. The imaging modalities represented include Episcopic Fluorescence Image Capture (EFIC) image stacks, video microscopy, ultrasound imaging and micro-CT scans. If phenotype images or videos for alleles of a specific gene are available, a direct link to them can be found in the Alleles and Phenotypes section of the Gene Detail pages in MGD and on the Phenotype Detail pages for specific mutant alleles. Figure 2 shows a link to the 15 phenotype images associated with alleles of the bicaudal C homolog 1 gene (Bicc1; MGI:1933388); one of the available images for the Bicc1<b2b222Clo> allele is a 2D serial EFIC image stack of the heart in coronal view. Investigators can submit phenotype videos for either existing or new alleles reported in MGD by following the Submit Data link on the MGI home page and the instructions for data file submissions.
MGD staff curate published reports of quantitative trait locus (QTL) mapping experiments and, where possible, translate the mapping data into genome coordinates so that regions of the genome associated with mapped phenotypes can be displayed in a genome context. Reciprocal links have been established between QTL records in MGD and the corresponding records in the QTL Archive (http://www.qtlarchive.org/). The QTL Archive extends the utility of mapped phenotypes in MGD by giving researchers access to the underlying genotype and phenotype data used to map a QTL. Of the 4715 QTL marker records in MGD, over 750 have data available in the QTL Archive.
MGD is one of the founding members of the Gene Ontology Consortium (GOC) (14,15) and contributes substantially to the development of the GO ontologies and of GO community standards for curation of the scientific literature. MGD project curators are responsible for annotating mouse genes and gene products to GO terms.
Improvements in GO knowledge representation and annotation procedures are incorporated into MGD functional annotation workflows as they are developed (see Gene Ontology Consortium (7)). Updated ontologies are loaded into the MGI system, and mouse GO annotations are contributed to the GOC, on a weekly basis. MGD has the community responsibility of providing a non-redundant set of mouse GO annotations to the research community through the MGI database resource and through the GOC annotation repository and database. The UniProt-Gene Ontology Annotation (GOA) project is the other major provider of mouse GO annotations (16).
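The text does not say how redundancy is defined when annotations from MGD and other providers such as GOA are combined. As a purely hypothetical sketch, one could treat two annotations as redundant when they share the same gene, GO term, evidence code and reference, and keep the first one seen; that key, the `Annotation` fields and all IDs below are assumptions for illustration (the evidence codes IDA and IEA are standard GO codes).

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene_id: str    # gene accession ID
    go_id: str      # GO term ID
    evidence: str   # GO evidence code, e.g. "IDA" or "IEA"
    reference: str  # supporting reference ID
    provider: str   # group that contributed the annotation


def merge_non_redundant(*sources):
    """Merge annotation lists, keeping one record per (gene, term, evidence, reference).

    The redundancy key is an assumption made for this sketch. When two
    providers submit the same key, the record from the earlier source wins.
    """
    merged = {}
    for source in sources:
        for ann in source:
            key = (ann.gene_id, ann.go_id, ann.evidence, ann.reference)
            merged.setdefault(key, ann)  # dicts keep insertion order
    return list(merged.values())


# Hypothetical placeholder IDs, not real MGI or GO records
mgd = [Annotation("MGI:TEST1", "GO:TEST1", "IDA", "REF:1", "MGD")]
goa = [
    Annotation("MGI:TEST1", "GO:TEST1", "IDA", "REF:1", "GOA"),  # duplicates MGD's record
    Annotation("MGI:TEST1", "GO:TEST2", "IEA", "REF:2", "GOA"),  # new annotation
]
for ann in merge_non_redundant(mgd, goa):
    print(ann.go_id, ann.evidence, ann.provider)
```

Passing MGD's list first makes its record win for the duplicated key; a real pipeline would need curated rules for which provider takes precedence.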
New visualization paradigms for displaying GO annotations have been implemented in MGD (Figure 3). The text-based summaries of gene/protein function previously displayed have been replaced by shorter summary statements for each gene obtained from NCBI’s RefSeq resource (17). RefSeq summaries include the source of the summarized information as well as the date the information was last updated. When RefSeq statements are not available for the mouse gene, statements pertaining to the orthologous human gene are included. Orthology assertions between mouse and human genes are taken from NCBI’s HomoloGene resource (18). The tabular and graphical options previously offered for displaying GO annotations remain available in MGD.
Most of the data in MGD come from semi-automated curation of the peer-reviewed scientific literature and from collaborative arrangements with large mouse-related data centers, repositories and other informatics resources. MGD also supports electronic data contributions directly from individual researchers. Any type of data that MGD maintains can be submitted as an electronic contribution; common types of submission include mutant and QTL-mapping data. Each electronic submission receives a permanent database accession ID. All data sets are associated with their source, either a publication or an electronic submission reference, and MGD reference pages provide links to the associated data sets. Information about data submission procedures is available online at http://www.informatics.jax.org/submit.shtml.
MGD provides extensive user support through on-line documentation, email and phone access to User Support Staff.
User Support can be accessed by:
Additional outreach and support are provided by a moderated email bulletin board, MGI-LIST (http://www.informatics.jax.org/mgihome/lists/lists.shtml). MGI-LIST is managed by the MGI User Support team and has over 2000 subscribers and an average of 75 posts/discussions per month.
The software, database and hardware components comprising MGD are organized into a front end, where the data are made available to the public, and a back end, where data are loaded, curated and integrated. Most of the components that were previously supported by a Sybase (http://www.sybase.com) relational database management system have been replaced with a combination of PostgreSQL (http://www.postgresql.org) and Solr/Lucene indexes (http://lucene.apache.org/solr). Solr is an enterprise search server built on the Lucene text-searching library. It provides fast, powerful text searching through an application programming interface (API) accessed over the web via HTTP. Components maintained outside the main MGD system include BLAST-searchable databases and genome assemblies, the databases that support the Mouse GBrowse (19) resource and the MGI BioMart (20,21) instance.
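To make the Solr description concrete, the sketch below builds a request for Solr's standard `/select` search handler and parses the JSON response layout Solr returns (`response.numFound`, `response.docs`). The server address, the core name `genes` and the field `symbol` are hypothetical; the article does not say whether MGD exposes its Solr indexes, so no live call is made.

```python
import json
from urllib.parse import urlencode


def build_solr_query(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's standard /select search handler."""
    params = {"q": query, "rows": rows, "wt": "json"}  # wt=json requests a JSON response
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def parse_solr_response(body: str):
    """Return (numFound, docs) from a Solr JSON response body."""
    response = json.loads(body)["response"]
    return response["numFound"], response["docs"]


# Hypothetical server, core and field names for illustration only
url = build_solr_query("http://localhost:8983/solr", "genes", "symbol:Bicc1")
print(url)

# A hand-written response in Solr's JSON layout, used instead of a live call
sample = '{"response": {"numFound": 1, "start": 0, "docs": [{"symbol": "Bicc1"}]}}'
count, docs = parse_solr_response(sample)
print(count, docs[0]["symbol"])
```

Against a real Solr server, the URL could be fetched with `urllib.request.urlopen(url)` and the decoded body passed to `parse_solr_response`.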
There are two primary means by which data are entered into MGD: the editing interface (EI) and automated load programs. The EI is an interactive graphical application. Curators use the EI to enter new data from the literature, to verify the results of automated loads and to correct errors. The automated load programs integrate larger data sets from many sources into the database. Automated loads apply quality control checks and processing algorithms that integrate the bulk of the data automatically and flag issues to be resolved by curators or the data provider. Through these two vehicles, the EI and automated loads, MGD is able to scale and adapt as new data sources for the mouse become available.
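The load pattern described above, in which clean records are integrated automatically while problems are set aside for curators, can be sketched as follows. The record fields (`gene_id`, `value`) and the two checks are hypothetical stand-ins for whatever a given MGD load actually validates; only the MGI IDs come from this article.

```python
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    loaded: list = field(default_factory=list)  # records accepted into the database
    issues: list = field(default_factory=list)  # (line number, problem) pairs for curators


def run_load(records, known_gene_ids):
    """Apply simple QC checks; load clean records and queue the rest for review."""
    result = LoadResult()
    for line_no, rec in enumerate(records, start=1):
        gene_id = rec.get("gene_id", "").strip()
        value = rec.get("value", "").strip()
        if not gene_id or not value:
            result.issues.append((line_no, "missing required field"))
        elif gene_id not in known_gene_ids:
            result.issues.append((line_no, f"unknown gene ID {gene_id}"))
        else:
            result.loaded.append((gene_id, value))
    return result


known = {"MGI:88398", "MGI:102709", "MGI:1933388"}
records = [
    {"gene_id": "MGI:88398", "value": "example value"},
    {"gene_id": "MGI:UNKNOWN", "value": "example value"},  # hypothetical bad ID
    {"gene_id": "", "value": "example value"},             # missing ID
]
result = run_load(records, known)
print(len(result.loaded), result.issues)
```

Keeping the input line number with each issue lets a curator or data provider locate the offending record in the source file.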
Access to information in MGD is provided in several ways to support our diverse community of users, including the web interface, the Batch Query tool, FTP and a web services API.
Interactive web-based interfaces are the primary means of access to MGD. The keyword-based ‘Quick Search’ option on the MGI home page is the most commonly used search tool for single-concept searches. The Batch Query tool (19) is a component of the MGD web interface that enables searches by lists of genes. It can be used as an accession ID translator (e.g. to convert a list of MGI IDs to the corresponding list of EntrezGene IDs) or as a way to retrieve a set of information for a collection of genes/features (e.g. to obtain all the GO annotations for a list of genes). In either case, the input is a user-specified collection of IDs entered in a text field or uploaded as a file. Gene symbols and a wide variety of ID types are accepted, including IDs from MGI, EntrezGene, Ensembl, Havana/Vega, GenBank, RefSeq, UniProt, RefSNP, Affymetrix, GO, etc. Users also select their desired output, including annotations from GO, MP and OMIM; phenotypic alleles; gene expression results (from GXD); or any of the above ID types. The Batch Query maps each input ID to any corresponding genes/features in MGD (there may be more than one) and returns them with the requested data. The Batch Query is fully integrated with the MGD web interface and is called from various pages to generate a user-customizable gene/feature summary. Results are available in HTML, tab-delimited or Excel format.
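The ID-translation behavior of the Batch Query, where one input may match several genes/features and results can be returned as tab-delimited text, can be sketched with a small in-memory index. This is a hypothetical illustration, not the tool's implementation: the MGI ID/symbol pairs are the examples cited in this article, `EXAMPLE:1` is an invented ID, and the column headers are assumptions.

```python
import csv
import io

# Hypothetical lookup table: input term -> matching (MGI ID, symbol) pairs.
# The MGI ID/symbol pairs are the examples cited in this article; "EXAMPLE:1"
# is an invented ID that matches two features, to show the one-to-many case.
ID_INDEX = {
    "MGI:88398": [("MGI:88398", "Chrm3")],
    "MGI:102709": [("MGI:102709", "Cav1")],
    "Spata16": [("MGI:1918112", "Spata16")],
    "EXAMPLE:1": [("MGI:102709", "Cav1"), ("MGI:1933388", "Bicc1")],
}


def batch_translate(terms):
    """Map each input term to every matching feature; keep unmatched terms visible."""
    rows = []
    for raw in terms:
        term = raw.strip()
        matches = ID_INDEX.get(term, [])
        if not matches:
            rows.append((term, "", ""))  # report the miss rather than drop it
            continue
        for mgi_id, symbol in matches:
            rows.append((term, mgi_id, symbol))
    return rows


def to_tsv(rows):
    """Render rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Input", "MGI ID", "Symbol"])
    writer.writerows(rows)
    return buf.getvalue()


print(to_tsv(batch_translate(["Spata16", "EXAMPLE:1", "NOT_AN_ID"])), end="")
```

Emitting one output row per match, rather than one per input, is what keeps the one-to-many case visible in a flat tab-delimited result.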
Other web interfaces to MGD include MouseBLAST for sequence similarity searches against a variety of rodent-relevant sequence databases, Mouse GBrowse for genome-centric browsing and MGI’s BioMart for searches that combine results from MGI and Ensembl.
MGD’s public FTP reports include over 50 flat file reports that are generated weekly. Most external informatics resources that incorporate data from MGD obtain their data from these reports. Custom reports are created upon request.
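External resources that consume the weekly flat-file reports typically parse them as tab-delimited text. The sketch below assumes a report whose first line holds column names; the column names and sample rows are hypothetical (gene names are taken from this article), and each real MGD report defines its own layout.

```python
import csv
import io

# A stand-in for a downloaded report. The column names are hypothetical;
# each real MGD report defines its own columns.
SAMPLE_REPORT = (
    "MGI Accession ID\tMarker Symbol\tMarker Name\n"
    "MGI:1933388\tBicc1\tbicaudal C homolog 1\n"
    "MGI:102709\tCav1\tcaveolin 1\n"
)


def read_report(handle):
    """Parse a tab-delimited report whose first line holds the column names."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_report(io.StringIO(SAMPLE_REPORT))
symbol_to_id = {row["Marker Symbol"]: row["MGI Accession ID"] for row in rows}
print(symbol_to_id)
```

For a downloaded file, the same function works on `open(path, newline="", encoding="utf-8")`; a report without a header line would need explicit `fieldnames` passed to `csv.DictReader`.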
The MGI web services API is a Simple Object Access Protocol (SOAP)-based interface to the database that provides programmatic access with functionality identical to that of the Batch Query tool described above.
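A SOAP client wraps its request in an XML envelope and POSTs it over HTTP. The sketch below builds a SOAP 1.1 envelope carrying a list of IDs and prepares (without sending) the HTTP request. The operation name `batchRequest`, its child elements, the service namespace and the endpoint are all hypothetical placeholders, not the actual MGI web services contract, which would be defined by the service's own WSDL.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:batch"                      # hypothetical service namespace


def build_envelope(ids, return_set):
    """Wrap a list of IDs in a SOAP 1.1 envelope for a hypothetical batch operation."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    request = ET.SubElement(body, f"{{{SERVICE_NS}}}batchRequest")  # hypothetical operation
    for acc_id in ids:
        ET.SubElement(request, f"{{{SERVICE_NS}}}id").text = acc_id
    ET.SubElement(request, f"{{{SERVICE_NS}}}returnSet").text = return_set
    return ET.tostring(envelope, encoding="unicode")


def make_request(endpoint, envelope_xml):
    """Prepare (but do not send) an HTTP POST carrying the SOAP envelope."""
    return urllib.request.Request(
        endpoint,
        data=envelope_xml.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": '""',  # real services may require a specific action URI
        },
        method="POST",
    )


xml_text = build_envelope(["MGI:88398", "MGI:102709"], "symbols")
req = make_request("http://localhost:8080/batch", xml_text)  # hypothetical endpoint
print(req.get_method())
print(xml_text)
```

Sending the prepared request with `urllib.request.urlopen(req)` would only work against a real endpoint whose operation and element names match its WSDL.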
For a general citation of the MGD resource, researchers should cite this article. In addition, the following citation format is suggested when referring to data sets specific to the MGD component of MGI: MGD, MGI, The Jackson Laboratory, Bar Harbor, Maine (URL: http://www.informatics.jax.org). [Type in date (month, year) when you retrieved the data cited.]
National Institutes of Health; National Human Genome Research Institute [HG000330]. Funding for open access charge: Grant funds.
Conflict of interest statement. None declared.
The Mouse Genome Database Group: M.T. Airey, A. Anagnostopoulos, R. Babiuk, R.M. Baldarelli, J.S. Beal, S.M. Bello, N.E. Butler, J. Campbell, L.E. Corbani, H. Dene, H.R. Drabkin, K.L. Forthofer, S.L. Giannatto, M. Knowlton, J.R. Lewis, M. McAndrews, S. McClatchy, D.S. Miers, L. Ni, H. Onda, J.E. Ormsby, J.M. Recla, D.J. Reed, B. Richards-Smith, D.R. Shaw, D. Sitnikov, C.L. Smith, M. Tomczuk, L.L. Washburn, Y. Zhu.