Historically our ability to identify genetic variants underlying complex behavioral traits in mice has been limited by low mapping resolution of conventional mouse crosses. The newly developed Diversity Outbred (DO) population promises to deliver improved resolution that will circumvent costly fine mapping studies. The DO is derived from the same founder strains as the Collaborative Cross (CC), including three wild derived strains. Thus the DO provides more allelic diversity and greater potential for new discovery compared to crosses involving standard mouse strains. We have characterized 283 male and female DO mice using open-field, light-dark box, tail-suspension, and visual-cliff avoidance tests to generate 38 behavioral measures. We identified several quantitative trait loci (QTL) for these traits with support intervals ranging from 1 to 3 Mb in size. These intervals contain relatively few genes (ranging from 5 to 96). For a majority of QTL, using the founder allelic effects together with whole genome sequence data, we could further narrow the positional candidates. Several QTL replicate previously published loci. Novel loci were also identified for anxiety- and activity-related traits. Half of the QTLs are associated with wild-derived alleles, confirming the value to behavioral genetics of added genetic diversity in the DO. In the presence of wild-alleles we sometimes observe behaviors that are qualitatively different from the expected response. Our results demonstrate that high-precision mapping of behavioral traits can be achieved with moderate numbers of DO animals, representing a significant advance in our ability to leverage the mouse as a tool for behavioral genetics.
QTL mapping; complex trait; collaborative cross; mouse populations; fine- mapping; heterogeneous stock
Mouse genetics is a powerful approach for discovering genes and other genome features influencing human pain sensitivity. Genetic mapping studies have historically been limited by low mapping resolution of conventional mouse crosses, resulting in pain-related quantitative trait loci (QTL) spanning several megabases and containing hundreds of candidate genes. The recently developed Diversity Outbred (DO) population is derived from the same eight inbred founder strains as the Collaborative Cross, including three wild-derived strains. DO mice offer increased genetic heterozygosity and allelic diversity compared to crosses involving standard mouse strains. The high rate of recombinatorial precision afforded by DO mice makes them an ideal resource for high-resolution genetic mapping, allowing the circumvention of costly fine-mapping studies. We utilized a cohort of ~300 DO mice to map a 3.8 Mbp QTL on chromosome 8 associated with acute thermal pain sensitivity, which we have tentatively named Tpnr6. We used haplotype block partitioning to narrow Tpnr6 to a width of ~230 Kbp, reducing the number of putative candidate genes from 44 to 3. The plausibility of each candidate gene’s role in pain response was assessed using an integrative bioinformatics approach, combining data related to protein domain, biological annotation, gene expression pattern, and protein functional interaction. Our results reveal a novel, putative role for the protein-coding gene, Hydin, in thermal pain response, possibly through the gene’s role in ciliary motility in the choroid plexus–cerebrospinal fluid system of the brain. Real-time quantitative-PCR analysis showed no expression differences in Hydin transcript levels between pain-sensitive and pain-resistant mice, suggesting that Hydin may influence hot-plate behavior through other biological mechanisms.
The Mouse Genome Database (MGD) (http://www.informatics.jax.org) is the community model organism database resource for the laboratory mouse, a premier animal model for the study of genetic and genomic systems relevant to human biology and disease. MGD maintains a comprehensive catalog of genes, functional RNAs and other genome features as well as heritable phenotypes and quantitative trait loci. The genome feature catalog is generated by the integration of computational and manual genome annotations generated by NCBI, Ensembl and Vega/HAVANA. MGD curates and maintains the comprehensive listing of functional annotations for mouse genes using the Gene Ontology, and MGD curates and integrates comprehensive phenotype annotations including associations of mouse models with human diseases. Recent improvements include integration of the latest mouse genome build (GRCm38), improved access to comparative and functional annotations for mouse genes with expanded representation of comparative vertebrate genomes and new loads of phenotype data from high-throughput phenotyping projects. All MGD resources are freely available to the research community.
The Protein Ontology (PRO; http://proconsortium.org) formally defines protein entities and explicitly represents their major forms and interrelations. Protein entities represented in PRO corresponding to single amino acid chains are categorized by level of specificity into family, gene, sequence and modification metaclasses, and there is a separate metaclass for protein complexes. All metaclasses also have organism-specific derivatives. PRO complements established sequence databases such as UniProtKB, and interoperates with other biomedical and biological ontologies such as the Gene Ontology (GO). PRO relates to UniProtKB in that PRO’s organism-specific classes of proteins encoded by a specific gene correspond to entities documented in UniProtKB entries. PRO relates to the GO in that PRO’s representations of organism-specific protein complexes are subclasses of the organism-agnostic protein complex terms in the GO Cellular Component Ontology. The past few years have seen growth and changes to the PRO, as well as new points of access to the data and new applications of PRO in immunology and proteomics. Here we describe some of these developments.
The Mouse Phenome Database (MPD; phenome.jax.org) was launched in 2001 as the data coordination center for the international Mouse Phenome Project. MPD integrates quantitative phenotype, gene expression and genotype data into a common annotated framework to facilitate query and analysis. MPD contains >3500 phenotype measurements or traits relevant to human health, including cancer, aging, cardiovascular disorders, obesity, infectious disease susceptibility, blood disorders, neurosensory disorders, drug addiction and toxicity. Since our 2012 NAR report, we have added >70 new data sets, including data from Collaborative Cross lines and Diversity Outbred mice. During this time we have completely revamped our homepage, improved search and navigational aspects of the MPD application, developed several web-enabled data analysis and visualization tools, annotated phenotype data to public ontologies, developed an ontology browser and released new single nucleotide polymorphism query functionality with much higher density coverage than before. Here, we summarize recent data acquisitions and describe our latest improvements.
Autism spectrum disorders (ASD) represent a group of developmental disabilities with a strong genetic basis. The laboratory mouse is increasingly used as a model organism for ASD, and MGI, the Mouse Genome Informatics resource, is the primary model organism database for the laboratory mouse. MGI uses the Mammalian Phenotype (MP) ontology to describe mouse models of human diseases. Using bioinformatics tools including Phenologs, MouseNET, and the Ontological Discovery Environment, we tested data associated with MP terms to characterize new gene-phenotype associations related to ASD. Our integrative analysis using these tools identified numerous mouse genotypes that are likely to have previously uncharacterized autistic-like phenotypes. The genes implicated in these mouse models had considerable overlap with a set of over 300 genes recently associated with ASD due to small, rare copy number variation (Pinto D. et al, 2010). Prediction and characterization of autistic mutant mouse alleles assists researchers in studying the complex nature of ASD and provides a generalizable approach to candidate gene prioritization.
Autism spectrum disorders; phenotype ontology; mouse disease models
The laboratory mouse is the premier animal model for studying human biology because all life stages can be accessed experimentally, a completely sequenced reference genome is publicly available and there exists a myriad of genomic tools for comparative and experimental research. In the current era of genome scale, data-driven biomedical research, the integration of genetic, genomic and biological data are essential for realizing the full potential of the mouse as an experimental model. The Mouse Genome Database (MGD; http://www.informatics.jax.org), the community model organism database for the laboratory mouse, is designed to facilitate the use of the laboratory mouse as a model system for understanding human biology and disease. To achieve this goal, MGD integrates genetic and genomic data related to the functional and phenotypic characterization of mouse genes and alleles and serves as a comprehensive catalog for mouse models of human disease. Recent enhancements to MGD include the addition of human ortholog details to mouse Gene Detail pages, the inclusion of microRNA knockouts to MGD’s catalog of alleles and phenotypes, the addition of video clips to phenotype images, providing access to genotype and phenotype data associated with quantitative trait loci (QTL) and improvements to the layout and display of Gene Ontology annotations.
Integrated analyses of functional genomics data have enormous potential for identifying phenotype-associated genes. Tissue-specificity is an important aspect of many genetic diseases, reflecting the potentially different roles of proteins and pathways in diverse cell lineages. Accounting for tissue specificity in global integration of functional genomics data is challenging, as “functionality” and “functional relationships” are often not resolved for specific tissue types. We address this challenge by generating tissue-specific functional networks, which can effectively represent the diversity of protein function for more accurate identification of phenotype-associated genes in the laboratory mouse. Specifically, we created 107 tissue-specific functional relationship networks through integration of genomic data utilizing knowledge of tissue-specific gene expression patterns. Cross-network comparison revealed significantly changed genes enriched for functions related to specific tissue development. We then utilized these tissue-specific networks to predict genes associated with different phenotypes. Our results demonstrate that prediction performance is significantly improved through using the tissue-specific networks as compared to the global functional network. We used a testis-specific functional relationship network to predict genes associated with male fertility and spermatogenesis phenotypes, and experimentally confirmed one top prediction, Mbyl1. We then focused on a less-common genetic disease, ataxia, and identified candidates uniquely predicted by the cerebellum network, which are supported by both literature and experimental evidence. Our systems-level, tissue-specific scheme advances over traditional global integration and analyses and establishes a prototype to address the tissue-specific effects of genetic perturbations, diseases and drugs.
Tissue specificity is an important aspect of many genetic diseases, reflecting the potentially different roles of proteins and pathways in diverse cell lineages. We propose an effective strategy to model tissue-specific functional relationship networks in the laboratory mouse. We integrated large scale genomics datasets as well as low-throughput tissue-specific expression profiles to estimate the probability that two proteins are co-functioning in the tissue under study. These networks can accurately reflect the diversity of protein functions across different organs and tissue compartments. By computationally exploring the tissue-specific networks, we can accurately predict novel phenotype-related gene candidates. We experimentally confirmed a top candidate gene, Mybl1, to affect several male fertility phenotypes, predicted based on male-reproductive system-specific networks and we predicted candidates related to a rare genetic disease ataxia, which are supported by experimental and literature evidence. The above results demonstrate the power of modeling tissue-specific dynamics of co-functionality through computational approaches.
In 2007, the International Knockout Mouse Consortium (IKMC) made the ambitious promise to generate mutations in virtually every protein-coding gene of the mouse genome in a concerted worldwide action. Now, 5 years later, the IKMC members have developed high-throughput gene trapping and, in particular, gene-targeting pipelines and generated more than 17,400 mutant murine embryonic stem (ES) cell clones and more than 1,700 mutant mouse strains, most of them conditional. A common IKMC web portal (www.knockoutmouse.org) has been established, allowing easy access to this unparalleled biological resource. The IKMC materials considerably enhance functional gene annotation of the mammalian genome and will have a major impact on future biomedical research.
The Mouse Phenome Project was launched a decade ago to complement mouse genome sequencing efforts by promoting new phenotyping initiatives under standardized conditions and collecting the data in a central public database, the Mouse Phenome Database (MPD; http://phenome.jax.org). MPD houses a wealth of strain characteristics data to facilitate the use of the laboratory mouse in translational research for human health and disease, helping alleviate problems involving experimentation in humans that cannot be done practically or ethically. Data sets are voluntarily contributed by researchers from a variety of institutions and settings, or in some cases, retrieved by MPD staff from public sources. MPD maintains a growing collection of standardized reference data that assists investigators in selecting mouse strains for research applications; houses treatment/control data for drug studies and other interventions; offers a standardized platform for discovering genotype–phenotype relationships; and provides tools for hypothesis testing. MPD improvements and updates since our last NAR report are presented, including the addition of new tools and features to facilitate navigation and data mining as well as the acquisition of new data (phenotypic, genotypic and gene expression).
The Mouse Genome Database (MGD, http://www.informatics.jax.org) is the international community resource for integrated genetic, genomic and biological data about the laboratory mouse. Data in MGD are obtained through loads from major data providers and experimental consortia, electronic submissions from laboratories and from the biomedical literature. MGD maintains a comprehensive, unified, non-redundant catalog of mouse genome features generated by distilling gene predictions from NCBI, Ensembl and VEGA. MGD serves as the authoritative source for the nomenclature of mouse genes, mutations, alleles and strains. MGD is the primary source for evidence-supported functional annotations for mouse genes and gene products using the Gene Ontology (GO). MGD provides full annotation of phenotypes and human disease associations for mouse models (genotypes) using terms from the Mammalian Phenotype Ontology and disease names from the Online Mendelian Inheritance in Man (OMIM) resource. MGD is freely accessible online through our website, where users can browse and search interactively, access data in bulk using Batch Query or BioMart, download data files or use our web services Application Programming Interface (API). Improvements to MGD include expanded genome feature classifications, inclusion of new mutant allele sets and phenotype associations and extensions of GO to include new relationships and a new stream of annotations via phylogenetic-based approaches.
Representing species-specific proteins and protein complexes in ontologies that are both human- and machine-readable facilitates the retrieval, analysis, and interpretation of genome-scale data sets. Although existing protin-centric informatics resources provide the biomedical research community with well-curated compendia of protein sequence and structure, these resources lack formal ontological representations of the relationships among the proteins themselves. The Protein Ontology (PRO) Consortium is filling this informatics resource gap by developing ontological representations and relationships among proteins and their variants and modified forms. Because proteins are often functional only as members of stable protein complexes, the PRO Consortium, in collaboration with existing protein and pathway databases, has launched a new initiative to implement logical and consistent representation of protein complexes.
We describe here how the PRO Consortium is meeting the challenge of representing species-specific protein complexes, how protein complex representation in PRO supports annotation of protein complexes and comparative biology, and how PRO is being integrated into existing community bioinformatics resources. The PRO resource is accessible at http://pir.georgetown.edu/pro/.
PRO is a unique database resource for species-specific protein complexes. PRO facilitates robust annotation of variations in composition and function contexts for protein complexes within and between species.
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources; and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
The Mouse Genome Database (MGD) is the community model organism database for the laboratory mouse and the authoritative source for phenotype and functional annotations of mouse genes. MGD includes a complete catalog of mouse genes and genome features with integrated access to genetic, genomic and phenotypic information, all serving to further the use of the mouse as a model system for studying human biology and disease. MGD is a major component of the Mouse Genome Informatics (MGI, http://www.informatics.jax.org/) resource. MGD contains standardized descriptions of mouse phenotypes, associations between mouse models and human genetic diseases, extensive integration of DNA and protein sequence data, normalized representation of genome and genome variant information. Data are obtained and integrated via manual curation of the biomedical literature, direct contributions from individual investigators and downloads from major informatics resource centers. MGD collaborates with the bioinformatics community on the development and use of biomedical ontologies such as the Gene Ontology (GO) and the Mammalian Phenotype (MP) Ontology. Major improvements to the Mouse Genome Database include comprehensive update of genetic maps, implementation of new classification terms for genome features, development of a recombinase (cre) portal and inclusion of all alleles generated by the International Knockout Mouse Consortium (IKMC).
The Protein Ontology (PRO) provides a formal, logically-based classification of specific protein classes including structured representations of protein isoforms, variants and modified forms. Initially focused on proteins found in human, mouse and Escherichia coli, PRO now includes representations of protein complexes. The PRO Consortium works in concert with the developers of other biomedical ontologies and protein knowledge bases to provide the ability to formally organize and integrate representations of precise protein forms so as to enhance accessibility to results of protein research. PRO (http://pir.georgetown.edu/pro) is part of the Open Biomedical Ontology Foundry.
The International Knockout Mouse Consortium (IKMC) aims to mutate all protein-coding genes in the mouse using a combination of gene targeting and gene trapping in mouse embryonic stem (ES) cells and to make the generated resources readily available to the research community. The IKMC database and web portal (www.knockoutmouse.org) serves as the central public web site for IKMC data and facilitates the coordination and prioritization of work within the consortium. Researchers can access up-to-date information on IKMC knockout vectors, ES cells and mice for specific genes, and follow links to the respective repositories from which corresponding IKMC products can be ordered. Researchers can also use the web site to nominate genes for targeting, or to indicate that targeting of a gene should receive high priority. The IKMC database provides data to, and features extensive interconnections with, other community databases.
General transcription factor (TFII-I) is a multi-functional transcription factor encoded by the Gtf2i gene, that has been demonstrated to regulate transcription of genes critical for development. Because of the broad range of genes regulated by TFII-I as well as its potential role in a significant neuro-developmental disorder, developing a comprehensive expression profile is critical to the study of this transcription factor. We sought to define the timing and pattern of expression of TFII-I in post-implantation embryos at a time during which many putative TFII-I target genes are expressed.
Antibodies to the N-terminus of TFII-I were used to probe embryonic mouse sections. TFII-I protein is widely expressed in the developing embryo. TFII-I is expressed throughout the period from E8-E16. However, within this period there are striking shifts in localization from cytoplasmic predominant to nuclear. TFII-I expression varies in both a spatial and temporal fashion. There is extensive expression in neural precursors at E8. This expression persists at later stages. TFII-I is expressed in developing lung, heart and gut structures. There is no evidence of isoform specific expression. Available data regarding expression patterns at both an RNA and protein level throughout development are also comprehensively reviewed.
Our immunohistochemical studies of the temporal and spatial expression patterns of TFII-I in mouse embryonic sections are consistent with the hypothesis that hemizygous deletion of GTF2I in individuals with Williams-Beuren Syndrome contributes to the distinct cognitive and physiological symptoms associated with the disorder.
The Jackson Laboratory Colony Management System (JCMS) is a software application for managing data and information related to research mouse colonies, associated biospecimens, and experimental protocols. JCMS runs directly on computers that run one of the PC Windows® operating systems, but can be accessed via web browser interfaces from any computer running a Windows, Macintosh®, or Linux® operating system. JCMS can be configured for a single user or multiple users in small- to medium-size work groups. The target audience for JCMS includes laboratory technicians, animal colony managers, and principal investigators. The application provides operational support for colony management and experimental workflows, sample and data tracking through transaction-based data entry forms, and date-driven work reports. Flexible query forms allow researchers to retrieve database records based on user-defined criteria. Recent advances in handheld computers with integrated barcode readers, middleware technologies, web browsers, and wireless networks add to the utility of JCMS by allowing real-time access to the database from any networked computer.
The Mouse Genome Database (MGD) is a major component of the Mouse Genome Informatics (MGI, http://www.informatics.jax.org/) database resource and serves as the primary community model organism database for the laboratory mouse. MGD is the authoritative source for mouse gene, allele and strain nomenclature and for phenotype and functional annotations of mouse genes. MGD contains comprehensive data and information related to mouse genes and their functions, standardized descriptions of mouse phenotypes, extensive integration of DNA and protein sequence data, normalized representation of genome and genome variant information including comparative data on mammalian genes. Data for MGD are obtained from diverse sources including manual curation of the biomedical literature and direct contributions from individual investigator’s laboratories and major informatics resource centers, such as Ensembl, UniProt and NCBI. MGD collaborates with the bioinformatics community on the development and use of biomedical ontologies such as the Gene Ontology and the Mammalian Phenotype Ontology. Recent improvements in MGD described here includes integration of mouse gene trap allele and sequence data, integration of gene targeting information from the International Knockout Mouse Consortium, deployment of an MGI Biomart, and enhancements to our batch query capability for customized data access and retrieval.
MouseCyc is a database of curated metabolic pathways for the laboratory mouse.
Linking biochemical genetic data to the reference genome for the laboratory mouse is important for comparative physiology and for developing mouse models of human biology and disease. We describe here a new database of curated metabolic pathways for the laboratory mouse called MouseCyc . MouseCyc has been integrated with genetic and genomic data for the laboratory mouse available from the Mouse Genome Informatics database and with pathway data from other organisms, including human.
The laboratory mouse has long been an important tool in the study of the biology and genetics of human cancer. With the advent of genetic engineering techniques, DNA microarray analyses, tissue arrays, and other large-scale, high-throughput data generating methods, the amount of data available for mouse models of cancer is growing exponentially. Tools to integrate, locate and visualize these data are crucial to aid researchers in their investigations. The Mouse Tumor Biology database (http://tumor.informatics.jax.org) seeks to address that need.
A finished clone-based assembly of the mouse genome reveals extensive recent sequence duplication during recent evolution and rodent-specific expansion of certain gene families. Newly assembled duplications contain protein-coding genes that are mostly involved in reproductive function.
The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.
The availability of an accurate genome sequence provides the bedrock upon which modern biomedical research is based. Here we describe a high-quality assembly, Build 36, of the mouse genome. This assembly was put together by aligning overlapping individual clones representing parts of the genome, and it provides a more complete picture than previous assemblies, because it adds much rodent-specific sequence that was previously unavailable. The addition of these sequences provides insight into both the genomic architecture and the gene complement of the mouse. In particular, it highlights recent gene duplications and the expansion of certain gene families during rodent evolution. An improved understanding of the mouse genome and thus mouse biology will enhance the utility of the mouse as a model for human disease.
The Mouse Phenome Database (MPD; http://www.jax.org/phenome) is an open source, web-based repository of phenotypic and genotypic data on commonly used and genetically diverse inbred strains of mice and their derivatives. MPD is also a facility for query, analysis and in silico hypothesis testing. Currently MPD contains about 1400 phenotypic measurements contributed by research teams worldwide, including phenotypes relevant to human health such as cancer susceptibility, aging, obesity, susceptibility to infectious diseases, atherosclerosis, blood disorders and neurosensory disorders. Electronic access to centralized strain data enables investigators to select optimal strains for many systems-based research applications, including physiological studies, drug and toxicology testing, modeling disease processes and complex trait analysis. The ability to select strains for specific research applications by accessing existing phenotype data can bypass the need to (re)characterize strains, precluding major investments of time and resources. This functionality, in turn, accelerates research and leverages existing community resources. Since our last NAR reporting in 2007, MPD has added more community-contributed data covering more phenotypic domains and implemented several new tools and features, including a new interactive Tool Demo available through the MPD homepage (quick link: http://phenome.jax.org/phenome/trytools).
The Mouse Genome Database (MGD, http://www.informatics.jax.org/), integrates genetic, genomic and phenotypic information about the laboratory mouse, a primary animal model for studying human biology and disease. Information in MGD is obtained from diverse sources, including the scientific literature and external databases, such as EntrezGene, UniProt and GenBank. In addition to its extensive collection of phenotypic allele information for mouse genes that is curated from the published biomedical literature and researcher submission, MGI includes a comprehensive representation of mouse genes including sequence, functional (GO) and comparative information. MGD provides a data mining platform that enables the development of translational research hypotheses based on comparative genotype, phenotype and functional analyses. MGI can be accessed by a variety of methods including web-based search forms, a genome sequence browser and downloadable database reports. Programmatic access is available using web services. Recent improvements in MGD described here include the unified mouse gene catalog for NCBI Build 37 of the reference genome assembly, and improved representation of mouse mutants and phenotypes.