The Mouse Phenome Database (MPD; http://www.jax.org/phenome) is an open source, web-based repository of phenotypic and genotypic data on commonly used and genetically diverse inbred strains of mice and their derivatives. MPD is also a facility for query, analysis and in silico hypothesis testing. Currently MPD contains about 1400 phenotypic measurements contributed by research teams worldwide, including phenotypes relevant to human health such as cancer susceptibility, aging, obesity, susceptibility to infectious diseases, atherosclerosis, blood disorders and neurosensory disorders. Electronic access to centralized strain data enables investigators to select optimal strains for many systems-based research applications, including physiological studies, drug and toxicology testing, modeling disease processes and complex trait analysis. The ability to select strains for specific research applications by accessing existing phenotype data can bypass the need to (re)characterize strains, precluding major investments of time and resources. This functionality, in turn, accelerates research and leverages existing community resources. Since our last NAR reporting in 2007, MPD has added more community-contributed data covering more phenotypic domains and implemented several new tools and features, including a new interactive Tool Demo available through the MPD homepage (quick link: http://phenome.jax.org/phenome/trytools).
The laboratory mouse is an invaluable model organism for investigating the genetic basis of human disease. Studies have demonstrated the efficacy of comparative mouse–human genomics to identify novel mechanisms of human disease progression, underscoring the need to make mouse strain data widely available for community access. Using inbred strain data for integrative studies leverages their fixed genotypes and expands their utility to determine molecular relationships between disease and associated risk factors.
The Mouse Phenome Project was launched as an international collaboration to complement the mouse genome sequencing effort and provide a research resource and integral tool for complex trait analysis (1). This powerful approach, termed phenomics, captures complexities of entire biological pathways that are not accessible through conventional approaches. A central database was built to support the Project and provide a repository for the large amounts of data collected. The database, called the Mouse Phenome Database (MPD; www.jax.org/phenome), has been publicly available since 2001 (2). MPD is a grant-supported effort with three full-time staff members headquartered at The Jackson Laboratory (JAX), a non-profit biomedical research institute with a focus on the mouse as a model for understanding human biology and disease (http://www.jax.org).
The Mouse Phenome Project promotes and facilitates strain surveys that follow a set of recommendations proposed by members of the research community to standardize testing across laboratories and over time, and ultimately to maximize data reproducibility and value. A set of diverse inbred mouse strains was carefully chosen for systematic phenotyping to generate the building blocks of the phenome of the laboratory mouse. The Project is open to researchers with expertise in any biomedically-relevant field of study. Strain characteristics data are received from members of the scientific community and added to the MPD standardized framework, providing users a platform for data exploration, analysis and hypothesis testing. Project recommendations, priority strains and data submission guidelines are accessible through the MPD homepage.
The ability of investigators to use MPD to find causal genes and biomarkers of human disease will be significantly enhanced by the capacity to integrate human data with comprehensive information on the laboratory mouse. International efforts are underway to address integration issues for several public mouse resources holding phenotypic data (3), including EuroPhenome at Harwell (UK), PhenoSITE at RIKEN (Japan) and MPD. Discussions are in progress to coordinate data formats and reporting standards that ensure interoperability across databases. We have also been involved in the development of minimum information for mouse phenotyping procedures (MIMPP; www.interphenome.org) as part of the larger community-wide effort for minimum information for biological and biomedical investigations (MIBBI) (4), which fosters coordination of minimum information checklists such as minimum information about a microarray experiment (MIAME). These checklists ensure adequate descriptions of the biological material being tested (or used for testing) and the assays employed for measuring biological or behavioral manifestations (traits). Until community standards are in place for reporting phenotypic data, we will continue using the definitions adopted when MPD was launched in 2001 (Table 1).
Our last NAR update was in 2007 (5). Most of the discussion points, figures and URLs set forth there are still current. Before presenting our recent updates, we will review some fundamental points about MPD. Every MPD project has a dataset and detailed protocols, health status and environmental parameters of the test animals, and any other information essential to understand and evaluate the data (Table 1). MPD is also a repository for protocol information, where a library of procedures and assays is maintained so that others in the community may benefit from their use. Most phenotypic datasets in MPD are in strain survey format. For example, an expert in lipid metabolism participating in the Project and following Project recommendations might take readings on 10 females and 10 males of 40 strains and submit the individual animal data in a spreadsheet having one row per mouse and multiple columns for various lipid measurements. We would then annotate and format the data to meet MPD standards. Each measurement is classified and integrated in the MPD phenotype category structure. We compute summary statistics, where our unit of analysis is an MPD measurement and our analysis group is strain by sex (we do not combine male and female data, nor do we combine data from different MPD measurements). Individual animal data and summary statistics are available for downloading, as are protocols and other metadata. To identify possible biological correlations (related phenotypes may indicate common genes or pathways), we further analyze each measurement by regression analysis against every other measurement currently in the database and store the results to support queries based on measurement correlations (see below) (2). In addition to phenotypic data, strain genotypes are collected and stored in MPD so that phenotypic and genotypic data can be juxtaposed, making it easier to determine how allele-specific variations translate to differences in mouse phenotype.
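The strain-by-sex summary and the measurement-to-measurement comparison described above can be illustrated with a minimal Python sketch. It is not MPD's implementation: the record layout, the function names `summarize` and `pearson_r`, and the use of a plain Pearson correlation of strain means (in place of whatever regression procedure MPD runs) are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(records):
    """Group individual-animal values by (strain, sex); return n, mean and SD.

    `records` is an iterable of (strain, sex, value) tuples, one per mouse.
    Males and females are never pooled, mirroring the text.
    """
    groups = defaultdict(list)
    for strain, sex, value in records:
        groups[(strain, sex)].append(value)
    return {
        key: {
            "n": len(vals),
            "mean": mean(vals),
            "sd": stdev(vals) if len(vals) > 1 else 0.0,
        }
        for key, vals in groups.items()
    }


def pearson_r(summary_a, summary_b, sex):
    """Correlate the strain means of two measurements within one sex.

    Only strains present in both summaries are used (an assumption).
    """
    strains = sorted(
        strain for (strain, s) in summary_a
        if s == sex and (strain, sex) in summary_b
    )
    if len(strains) < 3:
        raise ValueError("need at least three shared strains")
    xs = [summary_a[(st, sex)]["mean"] for st in strains]
    ys = [summary_b[(st, sex)]["mean"] for st in strains]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("strain means have zero variance")
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Invented values for illustration; not taken from MPD.
    lipid = [("A/J", "f", 10.0), ("A/J", "f", 12.0),
             ("C57BL/6J", "f", 20.0), ("C57BL/6J", "f", 22.0),
             ("DBA/2J", "f", 31.0), ("DBA/2J", "f", 33.0)]
    weight = [(st, sx, v * 0.5 + 3.0) for st, sx, v in lipid]
    print(summarize(lipid))
    print(pearson_r(summarize(lipid), summarize(weight), "f"))
```

Keeping the grouping key as (strain, sex) makes it impossible to pool sexes by accident, which matches the analysis unit stated in the text.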
At the present time MPD contains around 1400 phenotypic measurements and ~740 million single nucleotide polymorphism (SNP) allele calls. Over 600 strains of mice are represented in MPD where phenotypic and/or genotypic data are available (most of the data are for MPD priority strains and their derivatives). Around 200 people are currently registered as principal investigators of MPD projects (phenotyping and genotyping), representing ~130 institutions in 12 countries, and supported by ~60 funding agencies and research foundations worldwide. Phenotypic measurements are from 75 investigator-contributed projects (~20 other projects are pending), with coverage in a number of important areas (summarized in Table 2). Several large phenotyping initiatives utilize MPD as the official repository for their strain survey data, including the Jackson Aging Center (Nathan Shock Center of Excellence in the Basic Biology of Aging) and the Heart, Lung, Blood and Sleep Disorders Center (NHLBI Program for Genomic Applications) (6).
Phenotypic data currently available can be classified as baseline (72%), longitudinal aging data (14%), or controlled studies of intervention effects (14%) such as administering drugs or high-fat diet, or exposure to toxins or pathogens. Each measurement contains data from multiple strains of mice with as many as 60 strains tested (the average per measurement is 20 strains). Most projects involve both sexes (84%) and use MPD priority strains (82%). The remaining 18% are special strain panels where the progenitors are often MPD priority strains. Analysis tools for phenotypic data are available in the MPD Toolbox depicted in Figure 1. To see how these tools work, see the interactive Tool Demo available through the MPD homepage (quick link: http://phenome.jax.org/phenome/trytools).
Genomic characterization of mouse strains is currently supported in MPD by way of SNP data. Copy number variant (CNV) data will be added in the future. SNP datasets are supplied by investigators (or institutions) either directly or as freely available data downloads. The MPD SNP collection currently includes 8+ million unique genomic locations for 16 strains in our high-density merged dataset (about 3.5 SNP locations per 1 kb) and lesser amounts of SNP data for approximately 125 additional strains plus 7 recombinant inbred (RI) strain panels. Overall, there are 18 SNP data sources represented in MPD, including SNPs from Broad, Celera, Perlegen (NIEHS), Wellcome Trust, Genomics Institute of the Novartis Research Foundation (GNF), and The Jackson Laboratory. To provide maximal utility for different research applications, MPD consolidates SNPs from multiple sources based on SNP density and the complement of strains assayed. Currently there are five datasets spanning four degrees of SNP density, from high (as defined above) to very low (~2000 SNPs across the entire genome). SNP and gene annotations from external resources such as Mouse Genome Informatics (MGI; http://www.informatics.jax.org) (7), NCBI dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP) (8) and Ensembl (http://www.ensembl.org) (9) are part of the merge operation. NCBI dbSNP also provides the service of updating SNP locations when the mouse genome reference assembly is updated, and MPD mirrors these updates when they become available. MPD does not store flanking sequences or other lower-level trace information, but we maintain links to NCBI dbSNP and other resources holding these data. MPD SNP tools for retrieval and analysis are illustrated in Figure 2.
Coat color has been a classic model for many studies in mouse genetics. Since our last NAR update, photographs of 60 strains have been made publicly available (Figure 3), with many strains having a composite of up to four different photos. In addition to coat color, there are new postings of quantitative measurements that can be classified as baseline strain surveys, longitudinal aging data and controlled intervention studies. New data highlights include studies of bone density, chemically-induced tumorigenesis, assisted reproduction, anxiety and exploratory behavior, vision and eye morphology (for example, see Figure 4). In addition to inbred strains, data have been added for chromosome substitution panels and an eight-way F1 cross panel (see a list of selected projects and participants released since our last NAR update in Table 3). Several phenotype analysis tools have been improved or developed for better visualization and pattern recognition (see Figure 5 for examples and details). Of particular note is a new tool that helps users link phenotype and genotype (see ‘Find Genomic Regions’ below).
The number of MPD measurements has grown substantially, and we do not expect this trend to wane. To improve browsing and search capabilities, we have refined our measurement classification scheme to present measurement listings in a more compact and readable way by grouping measurements with common metadata. For example, measurements in a time or dose series are grouped together, conserving space and eliminating the redundancy of repeated text (see example in Figure 6). In addition, we have split out ‘intervention’ and ‘age’ from the category hierarchy, which simplifies the classification scheme further and makes lists of measurements easier to browse. In some situations, listing measurements without groupings is helpful, so we have retained this option for users (see Figure 6 comparing these options).
We have made various incremental improvements to the MPD SNP interface, such as adding a SNP wizard interface and offering more flexible polymorphism filtering options. New SNP data from several sources have been added recently, including a 12 000-location set for 43 strains (Merck-Rosetta) (11) and mitochondrial characterization of 22 strains (University of Porto, Portugal) (12). The largest new addition is a dataset from the Center for Genome Dynamics (CGD; http://cgd.jax.org) containing a mixture of actual and mathematically imputed allele calls, covering 7.8+ million genomic locations for 74 strains. It was built by merging data from a number of public data sources, applying a hidden Markov model algorithm to impute missing calls, and attaching a confidence level (a probability value) to each imputed call (13). After importing and processing this dataset, we found that 78% of the SNPs are imputed, and of those, 72% have a confidence level of 0.9 or higher, while 86% have a confidence level of 0.6 or higher. MPD supports queries on this imputed dataset where data are listed based on a specified minimum confidence level threshold, for example ‘show only actual calls’ or ‘show only imputed calls with confidence level of 0.9 or higher’ (the right panel of Figure 2 shows this option).
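The confidence-threshold queries described above amount to a simple filter over allele calls. The following Python sketch shows one way such a filter could look; the `AlleleCall` record, its field names and the convention that an actual (assayed) call carries no confidence value are all hypothetical and are not taken from the MPD schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AlleleCall:
    strain: str
    location: int                 # hypothetical base-pair coordinate
    allele: str
    confidence: Optional[float]   # None marks an actual call (assumption)


def filter_calls(
    calls: Iterable[AlleleCall],
    min_confidence: Optional[float] = None,
    include_actual: bool = True,
    include_imputed: bool = True,
) -> List[AlleleCall]:
    """Keep actual calls and/or imputed calls at or above a threshold."""
    kept = []
    for call in calls:
        if call.confidence is None:
            # Actual calls are never subject to the confidence threshold.
            if include_actual:
                kept.append(call)
        elif include_imputed and (
            min_confidence is None or call.confidence >= min_confidence
        ):
            kept.append(call)
    return kept


if __name__ == "__main__":
    # Invented calls for illustration only.
    calls = [
        AlleleCall("A/J", 100, "G", None),
        AlleleCall("DBA/2J", 100, "T", 0.95),
        AlleleCall("BALB/cJ", 100, "T", 0.70),
    ]
    # 'show only actual calls'
    print(filter_calls(calls, include_imputed=False))
    # 'show only imputed calls with confidence level of 0.9 or higher'
    print(filter_calls(calls, min_confidence=0.9, include_actual=False))
```

Representing an actual call by a missing confidence value, instead of a confidence of 1.0, keeps the two kinds of call distinguishable even when an imputed call happens to be assigned a probability of 1.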
A new exploratory SNP-based tool (called ‘Find Genomic Regions’) has been developed based on the concept of identity by descent (IBD), whereby two strains or strain sets can be compared across the entire mouse genome to find regions where they differ the most. We make the assumption that phenotypic differences reflect genotypic differences and that differences in a causative element (gene or regulatory region) are present in ancestral variation and are not due to recent mutations. This tool is based on SNP data from several large datasets (Perlegen, Broad, Celera) which together cover 8+ million genomic locations for 16 strains (14–16). It can be used in concert with strain survey data to locate genomic regions that may have an effect on a given phenotype (see example in Figure 7). The tool operates not by tabulating individual SNP locations (which would take much too long for a web-based tool) but by scanning an intermediate file, produced in advance, that contains tabulations of strain differences for successive 50 kb windows.
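The windowed tabulation can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual method: the input layout, the function names `window_differences` and `top_windows`, and the scoring rule (count a SNP location only when each strain set is internally uniform and the two sets carry different alleles) are all hypothetical. Only the 50 kb window size comes from the text, and the real intermediate file stores precomputed tabulations whose format is not described here.

```python
from collections import Counter

WINDOW_BP = 50_000  # window size stated in the text


def window_differences(snps, set_a, set_b, window_bp=WINDOW_BP):
    """Count, per window, SNP locations that separate two strain sets.

    `snps` is an iterable of (chromosome, position, {strain: allele}).
    A location is counted only if each set is uniform and the sets differ
    (an assumed scoring rule).
    """
    counts = Counter()
    for chrom, pos, alleles in snps:
        a = {alleles[s] for s in set_a if s in alleles}
        b = {alleles[s] for s in set_b if s in alleles}
        if len(a) == 1 and len(b) == 1 and a != b:
            counts[(chrom, pos // window_bp)] += 1
    return counts


def top_windows(counts, n=5, window_bp=WINDOW_BP):
    """Return the n highest-scoring windows as (chrom, start, end, count).

    `end` is exclusive; ties keep the order in which windows were first seen.
    """
    return [
        (chrom, idx * window_bp, (idx + 1) * window_bp, count)
        for (chrom, idx), count in counts.most_common(n)
    ]


if __name__ == "__main__":
    # Invented genotypes for four placeholder strains.
    snps = [
        ("1", 10_000, {"S1": "G", "S2": "G", "S3": "T", "S4": "T"}),
        ("1", 20_000, {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}),
        ("1", 60_000, {"S1": "G", "S2": "T", "S3": "T", "S4": "T"}),
        ("2", 120_000, {"S1": "C", "S2": "C", "S3": "A", "S4": "A"}),
    ]
    counts = window_differences(snps, ["S1", "S2"], ["S3", "S4"])
    print(top_windows(counts, n=2))
```

Because the per-window counts depend only on the SNP data and the window size, a table of this kind can be built once in advance and scanned quickly at query time, which is the design choice the text describes.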
At the request of members of the research community, MPD has developed an archive of quantitative trait loci (QTL) analysis datasets. At this writing there are 23 datasets available in a variety of subject areas, many associated with projects that have also contributed inbred strain survey data to MPD. These QTL studies typically involve intercross (F2) or backcross (N2) progeny of strains in the MPD priority list. Currently these data are available in Excel spreadsheets (R/qtl format), where a spreadsheet contains phenotypic measurements for each individual in the population (usually several hundred mice) and their genotypes (typically based on Mit markers). Linkages to MPD phenotype categories are maintained to optimize search capabilities, and links to MGI are maintained for connectivity to other databases. The primary purpose of the QTL archive is to provide a public repository for these datasets so that investigators can easily find and download them for custom analyses, e.g. combined cross analysis to reduce QTL intervals to a more manageable size for subsequent gene testing and validation. We plan to add QTL analysis tools in the near future, including interactive QTL maps.
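For readers who want to load an archived dataset for custom analysis, the sketch below parses a cross file in Python. It assumes the spreadsheet has been exported to CSV and follows the layout commonly used for R/qtl comma-delimited files: a header row of phenotype and marker names, a second row giving the chromosome for each marker (blank under phenotype columns), an optional third row of map positions, and then one row per animal. The function name `parse_rqtl_csv`, the returned structure and the example marker names are hypothetical, and individual files in the archive may deviate from this layout.

```python
import csv
import io


def parse_rqtl_csv(text):
    """Split an R/qtl-style CSV into marker metadata and per-animal records."""
    rows = list(csv.reader(io.StringIO(text)))
    header, chrom_row = rows[0], rows[1]
    # Phenotype columns are the ones left blank in the chromosome row.
    pheno_cols = [i for i, c in enumerate(chrom_row) if c.strip() == ""]
    marker_cols = [i for i, c in enumerate(chrom_row) if c.strip() != ""]
    # An optional third row holds map positions; it is detected by blank
    # phenotype cells, so a blank phenotype value in the first animal row
    # would be misread (missing values are assumed to be coded, e.g. "NA").
    has_positions = (
        len(rows) > 2
        and len(pheno_cols) > 0
        and all(rows[2][i].strip() == "" for i in pheno_cols)
    )
    markers = {
        header[i]: {
            "chr": chrom_row[i],
            "pos": float(rows[2][i]) if has_positions else None,
        }
        for i in marker_cols
    }
    first_data = 3 if has_positions else 2
    individuals = [
        {
            "phenotypes": {header[i]: row[i] for i in pheno_cols},
            "genotypes": {header[i]: row[i] for i in marker_cols},
        }
        for row in rows[first_data:]
    ]
    return markers, individuals


if __name__ == "__main__":
    # Invented two-animal example; marker names are illustrative only.
    example = (
        "weight,sex,D1Mit1,D2Mit2\n"
        ",,1,2\n"
        ",,10.5,33.0\n"
        "25.1,f,A,H\n"
        "30.2,m,B,A\n"
    )
    markers, individuals = parse_rqtl_csv(example)
    print(markers)
    print(individuals)
```

Values are left as strings so that genotype codes and missing-value markers pass through unchanged; numeric conversion of phenotypes is left to the downstream analysis.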
All public access to MPD is via our web site. MPD runs on a Solaris (Unix) computer system and is implemented using an open source software platform that includes relational database, web presentation scripting, and integrated graphical data plotting components. Apache web server software serves our web pages using a CGI method, and web ‘cookies’ are utilized to manage user preferences and item collections. Some custom-written programs in the ‘C’ language are invoked for compute-intensive tasks such as computation of statistics and correlations, and for SNP data display. We have a URL interface that web site developers can use to build links to specific MPD data views (visit our web site and search on ‘URL’).
The database has 70 data tables, including 6 containing mouse biometric data, 30 for SNP data, 17 catalogs and dictionaries, and 8 of various internal and external mappings (our detailed data model can be viewed by visiting our web site and searching on ‘schema’). Data are typically contributed using Excel spreadsheets transmitted as email attachments, and all database updates are made by staff via interactive web tools or direct table updates to our development node. MPD's production node is then refreshed from the development node as needed. There is no situation where the database is directly updated by non-staff users.
Data in all subject areas with potential relevance to translational research towards improvement of human health are of considerable importance. Although many phenotypic domains are currently represented in MPD, the acquisition of new data is open-ended with the goal of collecting data on a broader scope (and in some cases to a deeper level for phenotypes needing more granularity) as well as collecting data generated from new, more sophisticated phenotyping technologies. To expand the scope and maximize the utility of MPD, members of the global scientific community are invited to contribute their strain survey data or join us in a coordinated effort to seek funding that will support systematic strain surveys. It is this spirit of collaboration that has shaped MPD and made it an important community resource and that will continue to guide the future growth and development of MPD.
Researchers interested in contributing data to MPD or in collaborating on new phenotyping projects should contact us at email@example.com. Data submission guidelines are accessible through the MPD homepage ‘How to contribute data’.
MPD provides user support through online documentation and via email (phenome/at/jax.org). PHENOME-LIST is a moderated electronic bulletin board (http://phenome.jax.org/phenome/list.html). We welcome user input and suggestions. Our Suggestion Box is accessible from the footer of almost every MPD page. Suggestions or comments can be submitted anonymously.
For general citation of MPD, this article may be used. In addition, the following citation format may be used when MPD projects are referred to or MPD datasets used: Investigator(s) name (year project posted) Project title. MPD accession number (MPD:XXX). Mouse Phenome Database Web Site, The Jackson Laboratory, Bar Harbor, Maine USA. World Wide Web (URL: http://www.jax.org/phenome, date of download or access). For more information visit our web site and search on ‘citing’.
The Jackson Laboratory and National Institutes of Health (HG003057, HL66611, AG025707, and MH071984). Funding for open access charge: National Institutes of Health MH071984.
Conflict of interest statement. None declared.
We thank participating investigators for contributing their data for worldwide access (http://phenome.jax.org/pub-cgi/phenome/mpdcgi?rtn=projects/list). We also thank Dale Begley, Mary Dolan and Debbie Krupke for reviewing this manuscript.
MPD data are available through projects funded by 62 funding agencies and research foundations.
National Cancer Institute
National Center for Research Resources
National Eye Institute
National Heart, Lung, and Blood Institute
National Institute on Aging
National Institute on Alcohol Abuse and Alcoholism
National Institute of Arthritis and Musculoskeletal and Skin Diseases
National Institute of Child Health & Human Development
National Institute on Deafness and other Communication Disorders
National Institute of Dental and Craniofacial Research
National Institute of Diabetes & Digestive & Kidney Diseases
National Institute on Drug Abuse
National Institute of Environmental Health Sciences
National Institute of General Medical Sciences
National Institute of Mental Health
National Institute of Neurological Disorders and Stroke
American Health Assistance Foundation
American Heart Association
American Liver Foundation
American Physiological Society
Andrew Mellon Foundation
Burroughs Wellcome Fund
Canadian Institutes for Health Research
Centre National de la Recherche Scientifique (CNRS)
Commonwealth of Pennsylvania Health Research Formula Grant
Council for Nail Research
Department of the Army
Department of Defense
Department of Veterans Affairs
Ellison Medical Foundation
Fonds pour la Formation de Chercheurs et l'Aide a la Recherche of Quebec
Foundation Fighting Blindness
Howard Hughes Medical Institute
Integrative Neuroscience Initiative on Alcoholism
The Jackson Laboratory
Japan Heart Foundation
Japanese Ministry of Education, Science, Sport, and Culture
The March of Dimes
Medical Research Council of Canada
Merck Genome Research Institute
Ministere de la Recherche et de la Technologie
National Alopecia Areata Foundation
National Health and Medical Research Council of Australia
National Science Foundation
Natural Sciences and Engineering Research Council of Canada (NSERC)
S. D. Bechtel, Jr. Foundation
Thyssen Stiftung and the Hebrew University Center for Research on Pain
Wellcome Trust Center for Human Genetics
The Zaffaroni Foundation