Nucleic Acids Res. 2009 January; 37(Database issue): D720–D730.
Published online 2008 November 5. doi:  10.1093/nar/gkn778
PMCID: PMC2686531

Mouse Phenome Database

Abstract

The Mouse Phenome Database (MPD; http://www.jax.org/phenome) is an open source, web-based repository of phenotypic and genotypic data on commonly used and genetically diverse inbred strains of mice and their derivatives. MPD is also a facility for query, analysis and in silico hypothesis testing. Currently MPD contains about 1400 phenotypic measurements contributed by research teams worldwide, including phenotypes relevant to human health such as cancer susceptibility, aging, obesity, susceptibility to infectious diseases, atherosclerosis, blood disorders and neurosensory disorders. Electronic access to centralized strain data enables investigators to select optimal strains for many systems-based research applications, including physiological studies, drug and toxicology testing, modeling disease processes and complex trait analysis. The ability to select strains for specific research applications by accessing existing phenotype data can bypass the need to (re)characterize strains, precluding major investments of time and resources. This functionality, in turn, accelerates research and leverages existing community resources. Since our last NAR reporting in 2007, MPD has added more community-contributed data covering more phenotypic domains and implemented several new tools and features, including a new interactive Tool Demo available through the MPD homepage (quick link: http://phenome.jax.org/phenome/trytools).

INTRODUCTION

The laboratory mouse is an invaluable model organism for investigating the genetic basis of human disease. Studies have demonstrated the efficacy of comparative mouse–human genomics to identify novel mechanisms of human disease progression, underscoring the need to make mouse strain data widely available for community access. Using inbred strain data for integrative studies leverages their fixed genotypes and expands their utility to determine molecular relationships between disease and associated risk factors.

The Mouse Phenome Project was launched as an international collaboration to complement the mouse genome sequencing effort and provide a research resource and integral tool for complex trait analysis (1). This powerful approach, termed phenomics, captures complexities of entire biological pathways that are not accessible through conventional approaches. A central database was built to support the Project and provide a repository for the large amounts of data collected. The database, called the Mouse Phenome Database (MPD; www.jax.org/phenome), has been publicly available since 2001 (2). MPD is a grant-supported effort with three full-time staff members headquartered at The Jackson Laboratory (JAX), a non-profit biomedical research institute with a focus on the mouse as a model for understanding human biology and disease (http://www.jax.org).

The Mouse Phenome Project promotes and facilitates strain surveys that follow a set of recommendations proposed by members of the research community to standardize testing across laboratories and over time, and ultimately to maximize data reproducibility and value. A set of diverse inbred mouse strains was carefully chosen for systematic phenotyping to generate the building blocks of the phenome of the laboratory mouse. The Project is open to researchers with expertise in any biomedically-relevant field of study. Strain characteristics data are received from members of the scientific community and added to the MPD standardized framework, providing users a platform for data exploration, analysis and hypothesis testing. Project recommendations, priority strains and data submission guidelines are accessible through the MPD homepage.

The ability of investigators to use MPD to find causal genes and biomarkers of human disease will be significantly enhanced by the capacity to integrate human data with comprehensive information on the laboratory mouse. International efforts are underway to address integration issues for several public mouse resources holding phenotypic data (3), including Europhenome at Harwell (UK), PhenoSITE at Riken (Japan) and MPD. Discussions are in progress to coordinate data formats and reporting standards that ensure interoperability across databases. We have also been involved in the development of minimum information for mouse phenotyping procedures (MIMPP; www.interphenome.org) as part of the larger community-wide effort for minimum information for biological and biomedical investigations (MIBBI) (4), which fosters coordination of minimum information checklists such as minimum information about a microarray experiment (MIAME). These checklists ensure adequate descriptions of the biological material being tested (or used for testing) and the assays employed for measuring biological or behavioral manifestations (traits). Until community standards are in place for reporting phenotypic data, we will continue using the definitions adopted when MPD was launched in 2001 (Table 1).

Table 1.
MPD Definitions

DATA IN MPD

Our last NAR update was in 2007 (5). Most of the discussion points, figures and URLs set forth there are still current. Before presenting our recent updates, we will review some fundamental points about MPD. Every MPD project has a dataset and detailed protocols, health status and environmental parameters of the test animals, and any other information essential to understand and evaluate the data (Table 1). MPD is also a repository for protocol information, where a library of procedures and assays is maintained so that others in the community may benefit from their use. Most phenotypic datasets in MPD are in strain survey format. For example, an expert in lipid metabolism participating in the Project and following Project recommendations might take readings on 10 females and 10 males of 40 strains and submit the individual animal data in a spreadsheet having one row per mouse and multiple columns for various lipid measurements. We would then annotate and format the data to meet MPD standards. Each measurement is classified and integrated in the MPD phenotype category structure. We compute summary statistics, where our unit of analysis is an MPD measurement and our analysis group is strain (by sex); we do not combine male and female data, nor do we combine data from different MPD measurements. Individual animal data and summary statistics are available for downloading, as are protocols and other metadata. To identify possible biological correlations (related phenotypes may indicate common genes or pathways), we further analyze each measurement by regression analysis against every other measurement currently in the database and store the results to support queries based on measurement correlations (see below) (2). In addition to phenotypic data, strain genotypes are collected and stored in MPD so that phenotypic and genotypic data can be juxtaposed, making it easier to determine how allele-specific variations translate to differences in mouse phenotype.
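
The grouping described above can be illustrated with a minimal Python sketch. It is not MPD's code: the row layout, the measurement names (chol, trig), the values and the use of a Pearson correlation on strain means as a stand-in for the regression analysis are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical individual-animal rows: (strain, sex, measurement, value).
# All numbers are made up for illustration.
rows = [
    ("A/J", "f", "chol", 80.0), ("A/J", "f", "chol", 90.0),
    ("C57BL/6J", "f", "chol", 100.0), ("C57BL/6J", "f", "chol", 110.0),
    ("DBA/2J", "f", "chol", 120.0), ("DBA/2J", "f", "chol", 130.0),
    ("A/J", "f", "trig", 50.0), ("A/J", "f", "trig", 60.0),
    ("C57BL/6J", "f", "trig", 70.0), ("C57BL/6J", "f", "trig", 80.0),
    ("DBA/2J", "f", "trig", 90.0), ("DBA/2J", "f", "trig", 100.0),
]

def summarize(rows):
    """Return n, mean and SD for each (measurement, strain, sex) group.

    Sexes and measurements are never pooled, mirroring the text above.
    """
    groups = defaultdict(list)
    for strain, sex, measurement, value in rows:
        groups[(measurement, strain, sex)].append(value)
    summary = {}
    for key, values in groups.items():
        sd = stdev(values) if len(values) > 1 else float("nan")
        summary[key] = {"n": len(values), "mean": mean(values), "sd": sd}
    return summary

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def correlate(summary, meas_a, meas_b, sex):
    """Correlate strain means of two measurements over the strains they share."""
    means_a = {s: v["mean"] for (m, s, x), v in summary.items() if m == meas_a and x == sex}
    means_b = {s: v["mean"] for (m, s, x), v in summary.items() if m == meas_b and x == sex}
    shared = sorted(set(means_a) & set(means_b))
    return pearson_r([means_a[s] for s in shared], [means_b[s] for s in shared])

summary = summarize(rows)
r = correlate(summary, "chol", "trig", "f")
```

Keying every group on measurement, strain and sex makes it structurally impossible to pool males with females or to mix measurements, which is the constraint stated in the text.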

Current contents

At the present time MPD contains around 1400 phenotypic measurements and ~740 million single nucleotide polymorphism (SNP) allele calls. Over 600 strains of mice are represented in MPD where phenotypic and/or genotypic data are available (most of the data are for MPD priority strains and their derivatives). Around 200 people are currently registered as principal investigators of MPD projects (phenotyping and genotyping), representing ~130 institutions in 12 countries, and supported by ~60 funding agencies and research foundations worldwide. Phenotypic measurements are from 75 investigator-contributed projects (~20 other projects are pending), with coverage in a number of important areas (summarized in Table 2). Several large phenotyping initiatives utilize MPD as the official repository for their strain survey data, including the Jackson Aging Center (Nathan Shock Center of Excellence in the Basic Biology of Aging) and the Heart, Lung, Blood and Sleep Disorders Center (NHLBI Program for Genomic Applications) (6).

Table 2.
SNAPSHOT of Selected MPD Content

Phenotypic data currently available can be classified as baseline (72%), longitudinal aging data (14%), or controlled studies of intervention effects (14%) such as administering drugs or high-fat diet, or exposure to toxins or pathogens. Each measurement contains data from multiple strains of mice with as many as 60 strains tested (the average per measurement is 20 strains). Most projects involve both sexes (84%) and use MPD priority strains (82%). The remaining 18% are special strain panels where the progenitors are often MPD priority strains. Analysis tools for phenotypic data are available in the MPD Toolbox depicted in Figure 1. To see how these tools work, see the interactive Tool Demo available through the MPD homepage (quick link: http://phenome.jax.org/phenome/trytools).

Figure 1.
MPD Toolbox. Screenshot of MPD analysis tools, grouped by function: strain profiling (identifying mouse models with specific characteristics), measurement displays, correlations, and other actions. Some of our new tools are featured elsewhere: side-by-side ...

Genomic characterization of mouse strains is currently supported in MPD by way of SNP data. Copy number variant (CNV) data will be added in the future. SNP datasets are supplied by investigators (or institutions) either directly or as freely available data downloads. The MPD SNP collection currently includes 8+ million unique genomic locations for 16 strains in our high-density merged dataset (about 3.5 SNP locations per 1 kb) and lesser amounts of SNP data for approximately 125 additional strains plus 7 recombinant inbred (RI) strain panels. Overall, there are 18 SNP data sources represented in MPD, including SNPs from Broad, Celera, Perlegen (NIEHS), Wellcome Trust, Genomics Institute of the Novartis Research Foundation (GNF), and The Jackson Laboratory. To provide maximal utility for different research applications, MPD consolidates SNPs from multiple sources based on SNP density and the complement of strains assayed. Currently there are five datasets with four degrees of SNP density (from high, as defined above, to very low, ~2000 SNPs across the entire genome). SNP and gene annotations from external resources such as Mouse Genome Informatics (MGI; http://www.informatics.jax.org) (7), NCBI dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP) (8) and Ensembl (http://www.ensembl.org) (9) are part of the merge operation. NCBI dbSNP also provides the service of updating SNP locations when the mouse genome reference assembly is updated, and MPD mirrors these updates when they become available. MPD does not store flanking sequences or other lower-level trace information, but we maintain links to NCBI dbSNP and other resources holding these data. MPD SNP tools for retrieval and analysis are illustrated in Figure 2.

Figure 2.
MPD SNP interface tools for retrieval and filtering SNPs. SNPs may be retrieved by gene symbol or genomic location (left panel), or by more complex criteria. A SNP wizard (top right) has been added to assist users, showing possible options for each retrieval ...

New phenotype strain survey data and functionality

Coat color has been a classic model for many studies in mouse genetics. Since our last NAR update, photographs of 60 strains have been made publicly available (Figure 3), with many strains having a composite of up to four different photos. In addition to coat color, there are new postings of quantitative measurements that can be classified as baseline strain surveys, longitudinal aging data and controlled intervention studies. New data highlights include studies of bone density, chemically-induced tumorigenesis, assisted reproduction, anxiety and exploratory behavior, vision and eye morphology (for example, see Figure 4). In addition to inbred strains, data have been added for chromosome substitution panels and an eight-way F1 cross panel (see a list of selected projects and participants released since our last NAR update in Table 3). Several phenotype analysis tools have been improved or developed for better visualization and pattern recognition (see Figure 5 for examples and details). Of particular note is a new tool that helps users link phenotype and genotype (see ‘Find Genomic Regions’ below).

Figure 3.
Mouse strain coat color and appearance. Sixty strains have been professionally photographed under standardized conditions (lighting, background, etc.). Four strains are shown here to illustrate the wide range of phenotypes found in laboratory strains ...

Figure 4.
Retinal degeneration. Forty inbred strains were examined for eye abnormalities (retina, cornea, lens, iris). Twenty-five percent of the strains exhibit retinal degeneration by 6–7 weeks of age. This study underscores the importance of using strain ...

Figure 5.
New phenotype tools for strain profiling and identifying important new mouse models for research. The Jackson Aging Center is in the process of testing 32 inbred strains for a wide variety of phenotypic traits at 6, 12, 18 and 24 months of age. A new ...

Table 3.
SNAPSHOT of selected MPD projects added since last NAR reporting

The number of MPD measurements has grown substantially, and we do not expect this trend to wane. To improve browsing and search capabilities, we have refined our measurement classification scheme so that measurement listings are more compact and readable. Measurements with common metadata are now grouped; for example, measurements in a time or dose series are listed together, which conserves space and eliminates repeated text (see example in Figure 6). In addition, we have split out ‘intervention’ and ‘age’ from the category hierarchy, which further simplifies the classification scheme and makes lists of measurements easier to browse. In some situations, listing measurements without groupings is helpful, so we have retained this option for users (see Figure 6 comparing these options).

Figure 6.
MPD measurement categories and using metadata to organize displays. When new MPD measurements are accessioned, they are classified based on the trait measured and experimental context. In this example, when a set of data containing three triglyceride ...

New genotype (SNP) data and functionality

We have made various incremental improvements to the MPD SNP interface, such as adding a SNP wizard and offering more flexible polymorphism filtering options. New SNP data from several sources have been added recently, including a 12 000-location set for 43 strains (Merck-Rosetta) (11) and mitochondrial characterization of 22 strains (University of Porto, Portugal) (12). The largest new addition is a dataset from the Center for Genome Dynamics (CGD; http://cgd.jax.org) containing a mixture of actual and mathematically imputed allele calls, covering 7.8+ million genomic locations for 74 strains (13). It was built by merging data from a number of public data sources, applying a hidden Markov model algorithm to impute missing calls, and attaching a confidence level (a probability value) to each imputed call. After importing and processing this dataset, we found that 78% of the SNPs are imputed, and of those, 72% have a confidence level of 0.9 or higher, while 86% have a confidence level of 0.6 or higher. MPD supports queries on this imputed dataset where data are listed based on a specified minimum confidence level threshold, for example ‘show only actual calls’ or ‘show only imputed calls with confidence level of 0.9 or higher’ (the right panel of Figure 2 shows this option).
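
The threshold queries quoted above can be illustrated with a minimal Python sketch. The record layout, field names, function name and example calls are hypothetical; they do not describe MPD's schema or interface. An actual (observed) call is represented here by a confidence of None.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlleleCall:
    location: int                 # genomic coordinate in bp (hypothetical value)
    strain: str
    allele: str
    confidence: Optional[float]   # None = actual call; otherwise imputation confidence

def filter_calls(calls, include_actual=True, min_confidence=None):
    """Select calls by type and confidence.

    Actual calls are kept when include_actual is True. Imputed calls are
    kept only when min_confidence is given and their confidence meets it.
    """
    kept = []
    for call in calls:
        if call.confidence is None:
            if include_actual:
                kept.append(call)
        elif min_confidence is not None and call.confidence >= min_confidence:
            kept.append(call)
    return kept

# Made-up calls at a single location.
calls = [
    AlleleCall(3000000, "A/J", "A", None),
    AlleleCall(3000000, "C57BL/6J", "G", None),
    AlleleCall(3000000, "DBA/2J", "A", 0.95),
    AlleleCall(3000000, "BALB/cJ", "G", 0.70),
    AlleleCall(3000000, "129S1/SvImJ", "A", 0.40),
]

# 'show only actual calls'
actual_only = filter_calls(calls)
# 'show only imputed calls with confidence level of 0.9 or higher'
high_conf_imputed = filter_calls(calls, include_actual=False, min_confidence=0.9)
# actual calls plus imputed calls at 0.6 or higher
actual_plus_imputed = filter_calls(calls, min_confidence=0.6)
```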

A new exploratory SNP-based tool (called ‘Find Genomic Regions’) has been developed based on the concept of identity by descent (IBD), whereby two strains or strain sets can be compared across the entire mouse genome to find regions where the two strain sets differ the most. We make the assumption that phenotypic differences reflect genotypic differences and that differences in a causative element (gene or regulatory region) are present in ancestral variation and are not due to recent mutations. This tool is based on SNP data from several large datasets (Perlegen, Broad, Celera) which together cover 8+ million genomic locations for 16 strains (14–16). This tool can be used in concert with strain survey data to locate genomic regions that may have an effect on a given phenotype (see example in Figure 7). This tool operates not by tabulating individual SNP locations (which would take much too long for a web-based tool) but rather by scanning an intermediate file that has been produced in advance, containing tabulations of strain differences for successive 50 kb windows.
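
The windowing idea can be illustrated with a simplified Python sketch. It is an assumption-laden illustration, not the tool's implementation: the source does not describe the layout of the precomputed file, so this sketch tabulates windows directly for one pair of strain sets, and the SNP records, function names and the rule used to call a SNP 'diverging' are all hypothetical.

```python
from collections import Counter

WINDOW = 50_000  # 50 kb windows, as stated in the text

def diverging(alleles, set_a, set_b):
    """True when set_a shares one allele and set_b shares a different one.

    Strains without a call at this location are ignored (an assumption).
    """
    a = {alleles[s] for s in set_a if s in alleles}
    b = {alleles[s] for s in set_b if s in alleles}
    return len(a) == 1 and len(b) == 1 and a != b

def tabulate_windows(snps, set_a, set_b):
    """Count diverging SNPs per (chromosome, window index)."""
    counts = Counter()
    for chrom, pos, alleles in snps:
        if diverging(alleles, set_a, set_b):
            counts[(chrom, pos // WINDOW)] += 1
    return counts

# Made-up SNP records: (chromosome, position in bp, {strain: allele}).
snps = [
    ("1", 10_000, {"A/J": "A", "C57BL/6J": "G", "DBA/2J": "A"}),
    ("1", 40_000, {"A/J": "C", "C57BL/6J": "T", "DBA/2J": "C"}),
    ("1", 60_000, {"A/J": "G", "C57BL/6J": "G", "DBA/2J": "G"}),
    ("1", 120_000, {"A/J": "T", "C57BL/6J": "C", "DBA/2J": "C"}),
]

counts = tabulate_windows(snps, ["A/J", "DBA/2J"], ["C57BL/6J"])
top_windows = counts.most_common(1)  # windows where the two sets differ most
```

Tabulating once per window and then ranking windows keeps a query independent of the number of individual SNPs, which is the reason the text gives for scanning a file produced in advance.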

Figure 7.
Find Genomic Regions. This new tool is based on the concept of identity by descent (IBD) regarding ancestral inheritance in inbred strains of mice, and on the assumption that phenotypic differences reflect genotypic differences. Therefore, finding regions ...

New QTL analysis archive

At the request of members of the research community, MPD has developed an archive of quantitative trait loci (QTL) analysis datasets. At this writing there are 23 datasets available in a variety of subject areas, many associated with projects that have also contributed inbred strain survey data to MPD. These QTL studies typically involve intercross (F2) or backcross (N2) progeny of strains in the MPD priority list. Currently these data are available in Excel spreadsheets (R/qtl format), where a spreadsheet contains phenotypic measurements for each individual in the population (usually several hundred mice) and their genotypes (typically based on Mit markers). Linkages to MPD phenotype categories are maintained to optimize search capabilities, and links to MGI are maintained for connectivity to other databases. The primary purpose of the QTL archive is to provide a public repository for these datasets so that investigators can easily find and download them for custom analyses, e.g. combined cross analysis to reduce QTL intervals to a more manageable size for subsequent gene testing and validation. We plan to add QTL analysis tools in the near future, including interactive QTL maps.

HIGH LEVEL OVERVIEW OF IMPLEMENTATION

All public access to MPD is via our web site. MPD runs on a Solaris (Unix) computer system and is implemented using an open source software platform that includes relational database, web presentation scripting, and integrated graphical data plotting components. Apache web server software serves our web pages using a CGI method, and web ‘cookies’ are utilized to manage user preferences and item collections. Some custom-written programs in the ‘C’ language are invoked for compute-intensive tasks such as computation of statistics and correlations, and for SNP data display. We have a URL interface that web site developers can use to build links to specific MPD data views (visit our web site and search on ‘URL’).

The database has 70 data tables, including 6 containing mouse biometric data, 30 for SNP data, 17 catalogs and dictionaries, and 8 of various internal and external mappings (our detailed data model can be viewed by visiting our web site and searching on ‘schema’). Data are typically contributed using Excel spreadsheets transmitted as email attachments, and all database updates are made by staff via interactive web tools or direct table updates to our development node. MPD's production node is then refreshed from the development node as needed. There is no situation where the database is directly updated by non-staff users.

AN INVITATION TO INVESTIGATORS AND FUTURE MPD DIRECTIONS

Data in all subject areas with potential relevance to translational research towards improvement of human health are of considerable importance. Although many phenotypic domains are currently represented in MPD, the acquisition of new data is open-ended with the goal of collecting data on a broader scope (and in some cases to a deeper level for phenotypes needing more granularity) as well as collecting data generated from new, more sophisticated phenotyping technologies. To expand the scope and maximize the utility of MPD, members of the global scientific community are invited to contribute their strain survey data or join us in a coordinated effort to seek funding that will support systematic strain surveys. It is this spirit of collaboration that has shaped MPD and made it an important community resource and that will continue to guide the future growth and development of MPD.

Researchers interested in contributing data to MPD or in collaborating on new phenotyping projects should contact us at phenome@jax.org. Data submission guidelines are accessible through the MPD homepage ‘How to contribute data’.

MPD provides user support through online documentation and via email (phenome@jax.org). PHENOME-LIST is a moderated electronic bulletin board (http://phenome.jax.org/phenome/list.html). We welcome user input and suggestions. Our Suggestion Box is accessible from the footer of almost every MPD page, and suggestions or comments can be submitted anonymously.

CITING MPD

For general citation of MPD, this article may be used. In addition, the following citation format may be used when MPD projects are referred to or MPD datasets used: Investigator(s) name (year project posted) Project title. MPD accession number (MPD:XXX). Mouse Phenome Database Web Site, The Jackson Laboratory, Bar Harbor, Maine USA. World Wide Web (URL: http://www.jax.org/phenome, date of download or access). For more information visit our web site and search on ‘citing’.
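
The citation format above can be assembled programmatically, as in the following minimal Python sketch. The function name and argument names are hypothetical, and the example values are placeholders rather than a real project. The sketch reads ‘MPD accession number (MPD:XXX)’ as a field whose value is the accession itself, which is an assumption.

```python
def format_mpd_citation(investigators, year, title, accession, accessed):
    """Build a project citation following the template given in the text."""
    return (
        f"{investigators} ({year}) {title}. {accession}. "
        "Mouse Phenome Database Web Site, The Jackson Laboratory, "
        "Bar Harbor, Maine USA. "
        f"World Wide Web (URL: http://www.jax.org/phenome, {accessed})."
    )

# Placeholder values only; 'MPD:XXX' is kept from the template.
citation = format_mpd_citation(
    investigators="Investigator AB",
    year=2008,
    title="Example project title",
    accession="MPD:XXX",
    accessed="accessed 1 January 2009",
)
```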

FUNDING

The Jackson Laboratory and National Institutes of Health (HG003057, HL66611, AG025707, and MH071984). Funding for open access charge: National Institutes of Health MH071984.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank participating investigators for contributing their data for worldwide access (http://phenome.jax.org/pub-cgi/phenome/mpdcgi?rtn=projects/list). We also thank Dale Begley, Mary Dolan and Debbie Krupke for reviewing this manuscript.

Appendix

MPD data are available through projects funded by 62 funding agencies and research foundations.

NIH:

National Cancer Institute

National Center for Research Resources

National Eye Institute

National Heart, Lung, and Blood Institute

National Institute on Aging

National Institute on Alcohol Abuse and Alcoholism

National Institute of Arthritis and Musculoskeletal and Skin Diseases

National Institute of Child Health & Human Development

National Institute on Deafness and other Communication Disorders

National Institute of Dental and Craniofacial Research

National Institute of Diabetes & Digestive & Kidney Diseases

National Institute on Drug Abuse

National Institute of Environmental Health Sciences

National Institute of General Medical Sciences

National Institute of Mental Health

National Institute of Neurological Disorders and Stroke

American Health Assistance Foundation

American Heart Association

American Liver Foundation

American Physiological Society

Andrew Mellon Foundation

AstraZeneca

Aventis

BD Biosciences

Bristol-Myers Squibb

Burroughs Wellcome Fund

Canadian Institutes for Health Research

Centre National de la Recherche Scientifique (CNRS)

Commonwealth of Pennsylvania Health Research Formula Grant

Council for Nail Research

Department of the Army

Department of Defense

Department of Veterans Affairs

Dermatology Foundation

Deutsche Forschungsgemeinschaft

Ellison Medical Foundation

Fonds pour la Formation de Chercheurs et l'Aide a la Recherche of Quebec

Foundation Fighting Blindness

GlaxoSmithKline

GlaxoWellcome

Hoffmann-LaRoche

Howard Hughes Medical Institute

Integrative Neuroscience Initiative on Alcoholism

The Jackson Laboratory

Japan Heart Foundation

Japanese Ministry of Education, Science, Sport, and Culture

Knoll Pharmaceutical

The March of Dimes

Medical Research Council of Canada

Merck Genome Research Institute

Millennium Pharmaceuticals

Ministere de la Recherche et de la Technologie

National Alopecia Areata Foundation

National Health and Medical Research Council of Australia

National Science Foundation

Natural Sciences and Engineering Research Council of Canada (NSERC)

Novartis

Pfizer

SD Bechtel Foundation

Thyssen Stiftung and the Hebrew University Center for Research on Pain

Wellcome Trust Center for Human Genetics

The Zaffaroni Foundation

REFERENCES

1. Bogue M. Mouse Phenome Project: understanding human biology through mouse genetics and genomics. J. Appl. Physiol. 2003;95:1335–1337. [PubMed]
2. Grubb SC, Churchill GA, Bogue MA. A collaborative database of inbred mouse strain characteristics. Bioinformatics. 2004;20:2857–2859. [PubMed]
3. Mouse Phenotype Database Integration Consortium. Hancock JM, Adams NC, Aidinis V, Blake A, Bogue M, Brown SD, Chesler EJ, Davidson D, Duran C, Eppig JT, et al. Mouse Phenotype Database Integration Consortium: integration [corrected] of mouse phenome data resources. Mamm. Genome. 2007;18:157–163. (Errata in: Mamm. Genome. 2007; 18, 815. Mamm. Genome. 2008; 19, 219–220). [PubMed]
4. Taylor CF, Field D, Sansone SA, Aerts J, Apweiler R, Ashburner M, Ball CA, Binz PA, Bogue M, Booth T, et al. Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat. Biotechnol. 2008;26:889–896. [PMC free article] [PubMed]
5. Bogue MA, Grubb SC, Maddatu TP, Bult CJ. Mouse Phenome Database (MPD). Nucleic Acids Res. 2007;35:D643–D649. [PMC free article] [PubMed]
6. Svenson KL, Von Smith R, Magnani PA, Suetin HR, Paigen B, Naggert JK, Li R, Churchill GA, Peters LL. Multiple trait measurements in 43 inbred mouse strains capture the phenotypic diversity characteristic of human populations. J. Appl. Physiol. 2007;102:2369–2378. [PubMed]
7. Bult CJ, Eppig JT, Kadin JA, Richardson JE, Blake JA, the members of the Mouse Genome Database Group. The Mouse Genome Database (MGD): mouse biology and model systems. Nucleic Acids Res. 2008;36:D724–D728. [PMC free article] [PubMed]
8. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2008;36:D13–D21. [PMC free article] [PubMed]
9. Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, et al. Ensembl 2007. Nucleic Acids Res. 2007;35:D610–D617. [PubMed]
10. Kikkawa Y, Miura I, Takahama S, Wakana S, Yamazaki Y, Moriwaki K, Shiroishi T, Yonekawa H. Microsatellite database for MSM/Ms and JF1/Ms, molossinus-derived inbred strains. Mamm. Genome. 2001;12:750–752. [PubMed]
11. Cervino AC, Li G, Edwards S, Zhu J, Laurie C, Tokiwa G, Lum PY, Wang S, Castellini LW, Lusis AJ, et al. Integrating QTL and high-density SNP analyses in mice to identify Insig2 as a susceptibility gene for plasma cholesterol levels. Genomics. 2005;86:505–517. [PubMed]
12. Goios A, Pereira L, Bogue M, Macaulay V, Amorim A. mtDNA phylogeny and evolution of laboratory mouse strains. Genome Res. 2007;17:293–298. [PubMed]
13. Szatkiewicz JP, Beane GL, Ding Y, Hutchins L, Pardo-Manuel de Villena F, Churchill GA. An imputed genotype resource for the laboratory mouse. Mamm. Genome. 2008;19:199–208. [PMC free article] [PubMed]
14. Frazer KA, Eskin E, Kang HM, Bogue MA, Hinds DA, Beilharz EJ, Gupta RV, Montgomery J, Morenzoni MM, Nilsen GB, et al. A sequence-based variation map of 8.27 million SNPs in inbred mouse strains. Nature. 2007;448:1050–1053. [PubMed]
15. Mural RJ, Adams MD, Myers EW, Smith HO, Miklos GL, Wides R, Halpern A, Li PW, Sutton GG, Nadeau J, et al. A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science. 2002;296:1661–1671. [PubMed]
16. Wade CM, Daly MJ. Genetic variation in laboratory mice. Nat. Genet. 2005;37:1175–1180. [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press