The Mouse Genome Database (MGD) is an integrated database of genetic, genomic and phenotypic data for the laboratory mouse (1–3). MGD is a central component of the Mouse Genome Informatics (MGI) database resource (http://www.informatics.jax.org), the community model organism database for the laboratory mouse. Other MGI data resources integrated with MGD include the Gene Expression Database (GXD) (4), the Mouse Tumor Biology Database (MTB) (5), the Gene Ontology (GO) project (6) and the MouseCyc database of biochemical pathways (7). Data in MGD are updated daily. There are typically four to six major software releases per year to support access to and display of new data types.
The primary data types maintained in MGD include mouse genes and other genome features along with their function and phenotype annotations, associations of genome features with nucleotide and protein sequences, genetic and physical maps, gene families, mutant phenotypes, SNPs and other polymorphisms, animal models of human disease, and mammalian homology. A recent summary of MGD content is shown in .
Summary of MGD data content (10 September 2009)
MGD is the authoritative source for mouse gene, allele and strain nomenclature, Gene Ontology annotations for mouse gene function, and Mammalian Phenotype (MP) Ontology (8) annotations for phenotype associations. MGD is the most comprehensive source of mouse phenotype information and of associations between human diseases and mouse models. MGI curatorial staff acquire data by direct data loads from other databases, by direct submission from researchers and from the published literature. To facilitate data integration, MGI employs recognized standards for genetic nomenclature and functional annotation to describe mouse sequence data, genes, strains, expression data, alleles and phenotypes. All data associations in MGD are supported with evidence and citations.
Researchers can query MGD using keyword searches, vocabulary browsers and advanced web-based query forms. Keyword search supports the wildcard character (i.e. *) for broad searches and quotation marks for exact phrase searches. MGD also provides vocabulary browsers for GO annotations, MP annotations and Human Disease Term annotations to support browsing of the database content. The web-based query forms in MGD allow users to construct queries of differing degrees of specificity. For example, using the Genes and Markers Query form in MGD, a researcher can query broadly for all genes on mouse Chromosome 3, or specifically for genes on Chromosome 3 that are associated with particular phenotypes and/or functions (e.g. show me all genes on mouse Chromosome 3 that are associated with respiratory distress and that have been annotated functionally as being enzymes). The MGI MouseBLAST server allows users to interrogate the MGI database using nucleotide and/or protein sequences. Access to data in MGD is also facilitated by summary data files that are updated nightly and available for download via FTP, and through direct SQL (Structured Query Language) access, for which a user account is required.
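The difference between a broad and a specific query, and between wildcard and quoted-phrase keyword searches, can be illustrated with a minimal Python sketch. This is not MGD's implementation or schema: the `Marker` record, the `query_markers` and `keyword_match` functions, and the toy gene symbols and annotations are all hypothetical, and the wildcard matching relies on Python's `fnmatch` rules (which also treat `?` and `[...]` as special) rather than on whatever matching MGD actually performs.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Marker:
    """Hypothetical, simplified stand-in for a genome feature record."""
    symbol: str
    chromosome: str
    phenotypes: frozenset = field(default_factory=frozenset)
    functions: frozenset = field(default_factory=frozenset)


# Toy records with made-up symbols and annotations; not real MGD data.
MARKERS = [
    Marker("GeneA", "3", frozenset({"respiratory distress"}), frozenset({"enzyme"})),
    Marker("GeneB", "3", frozenset({"respiratory distress"}), frozenset()),
    Marker("GeneC", "3", frozenset(), frozenset({"enzyme"})),
    Marker("GeneD", "7", frozenset({"respiratory distress"}), frozenset({"enzyme"})),
]


def query_markers(markers, chromosome=None, phenotype=None, function=None):
    """Return symbols matching every supplied criterion (None = no constraint)."""
    hits = []
    for m in markers:
        if chromosome is not None and m.chromosome != chromosome:
            continue
        if phenotype is not None and phenotype not in m.phenotypes:
            continue
        if function is not None and function not in m.functions:
            continue
        hits.append(m.symbol)
    return hits


def keyword_match(term, text):
    """Quoted terms match as exact phrases; otherwise * acts as a wildcard per word."""
    text = text.lower()
    term = term.lower()
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        return term[1:-1] in text
    return any(fnmatchcase(word, term) for word in text.split())


# Broad query: every marker on Chromosome 3.
broad = query_markers(MARKERS, chromosome="3")

# Specific query: Chromosome 3, a given phenotype, and a given function.
specific = query_markers(
    MARKERS, chromosome="3", phenotype="respiratory distress", function="enzyme"
)
```

Each additional criterion narrows the result set, which mirrors how filling in more fields on a query form yields a more specific answer; leaving a criterion unset corresponds to leaving that form field blank.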
The staff of MGD collaborates with members of other large genome informatics resources, including NCBI (http://www.ncbi.nlm.nih.gov), Ensembl (http://www.ensembl.org), the UCSC Genome Browser (http://genome.ucsc.edu) and the Vertebrate Genome Annotation (Vega) group (http://vega.sanger.ac.uk/index.html), to maintain a comprehensive catalog of mouse genes and other genome features, and to resolve inconsistencies in the representation of mouse genome features as needed. Biological annotations for mouse genes based on MGD curation are incorporated into scores of external informatics resources and software products.