The Mouse Tumor Biology Database (MTB) is designed to provide an electronic data storage, search, and analysis system for information on mouse models of human cancer. MTB includes data on tumor frequency and latency, strain, germ line and somatic genetics, pathologic notations, and photomicrographs. MTB collects data from the primary literature, other public databases, and direct submissions from the scientific community. MTB is a community resource that provides integrated access to mouse tumor data from different scientific research areas and facilitates integration of molecular, genetic, and pathologic data. Current status of MTB, search capabilities, data types, and future enhancements are detailed in this article.
In the post-molecular/genomics scientific world, it is increasingly difficult for individual researchers to locate, identify, and synthesize the large and diverse body of scientific data that may be relevant to their research. This challenge is amplified by the advent of large-scale data generating projects, such as the human genome sequencing project,8 the ENCODE project,21 genome-wide association studies (GWAS),14 and the promise of the $1000 genome.15 Concurrent studies in mice have produced a complete reference genome sequence,7 The Mammalian Gene Collection,20 and the RIKEN Mouse Encyclopedia sequence.12 Recent initiatives are leveraging advances in sequencing technology to deeply sequence the genomes of 17 inbred lines of mice (http://www.sanger.ac.uk/resources/mouse/genomes/) and to characterize their transcriptomes.17
The mouse has become the most common and important model system for studying human diseases, and its use has generated large amounts of data, much of it disease related. Factors contributing to the preeminence of mouse models include practicality (economy of maintenance), genetics (the mouse has an exquisite genetic map and a fully sequenced genome), accessibility (the mouse can be studied at all life stages, including embryonic), and genome manipulation tools that are unmatched in any other mammalian system.4,5,22 Mouse models take advantage of the similarity of physiology and genetics of humans and mice, established inbred strains, and a wide array of molecular tools to generate targeted (so-called knockouts) and conditional mutations to simulate human disease states. Many of these model systems are designed to investigate cancer. An indicator of how prevalent cancer models are, and of how much pathological data are derived from them, comes from a simple PubMed search (http://www.ncbi.nlm.nih.gov/sites/entrez) for all references containing the terms ‘mouse’, ‘cancer’, ‘human’ and ‘pathology’. A total of 32,297 scientific articles included the search terms, with annual counts ranging from 2,061 published in 2000 to 6,065 published in 2009. Collating published data on the strains, mutations, nomenclature, and pathological descriptions in the increasing number of mouse model systems, and interpreting these data to discover new insights and enable the design of new experiments, has become virtually impossible on an individual level. The Mouse Tumor Biology Database (MTB) was created to integrate tumor data generated by these models and provide the ability to query and analyze these data.
As such, MTB presents a unique opportunity for pathologists to search for and examine existing primary data and to study mouse tumor genetic and epidemiological data associated with pathology records. It also provides a forum for presenting published and unpublished photomicrographs, complete with detailed annotations, to the scientific community at large.
MTB was first made available to the public in 1998.2,13 The goal of MTB is to provide a centralized electronic resource that collects and integrates the many different types of data obtained from mouse cancer models in an easily searchable database, and to provide analysis tools that allow users to identify existing models and facilitate the development of new models. Data include incidence and latency of mouse tumors, pathology reports and images, and strain and somatic genetics. MTB also includes cytogenetic images showing changes in tumor karyotype, Spectral Karyotyping (SKY), Comparative Genomic Hybridization (CGH), and Quantitative Trait Loci (QTL) data, and will soon include gene expression array data from the Gene Expression Omnibus (GEO) and the ArrayExpress Archive (ArrayExpress). All data are attributed to the original reference, a contributor citation, or the source website. MTB uses multiple controlled vocabularies and standardized nomenclature to allow for integrated searches of data from different sources. Searches of MTB are accomplished using web-based query forms. Each query form uses terms specific for a primary data type in MTB, such as tumor class, mouse strain, genetics, images, reference, and mouse homologs of human genes and associated data. Combined searches that use terms from the strain, genetics, pathology image, and tumor search forms simultaneously are available through the advanced search form.
MTB is updated weekly and includes curated data from the scientific literature, data submissions from cancer researchers, data downloaded from public databases such as Pathbase (http://www.pathbase.net/),18 and health surveillance data from production colonies at The Jackson Laboratory (JAX) and colonies of aging mice from the Jackson Aging Center. MTB is part of the Mouse Genome Informatics (MGI)9 resource and can be accessed from the MGI Website (http://www.informatics.jax.org/). The use of standard gene nomenclature, controlled vocabularies for tumor and anatomical terms, and shared database infrastructure facilitates links between MTB and other MGI databases, including the Mouse Genome Database (MGD),1 the Gene Expression Database (GXD),19 and the Gene Ontology Database (GO).10 MTB provides links to supplementary on-line resources detailed in Table 1. Additional resources, such as Entrez Gene (http://www.ncbi.nlm.nih.gov/gene), Online Mendelian Inheritance in Man (OMIM) (http://www.ncbi.nlm.nih.gov/omim) and Ensembl (http://www.ensembl.org/), can be accessed from reference links to MGI and associated gene detail pages. MTB also maintains a list of over 200 mouse-specific antibodies, with information on how to use them for immunohistochemistry and links to positive control samples and images, available in both HTML and Microsoft® Excel format.16
The largest portion of data in MTB, including pathologic notations and images, comes from expert curation of published scientific literature by the MTB biocurators. MTB works directly with journals to obtain permissions to incorporate images from publications into MTB. Although MTB biocurators have expertise in cancer biology and mouse genetics, they are not trained pathologists. As a result, MTB staff encourage direct submission of annotated data and images from pathologists involved in cancer research. MTB currently includes such data obtained from pathologists at The Jackson Laboratory and from direct submission by cancer pathologists in the greater scientific community. Currently MTB contains 5451 tumor pathology reports containing 4187 photomicrographs submitted by 46 researchers from 32 institutions or obtained from journals. Pathology and image data are presented in MTB as Pathology Records (Fig. 1) attached to specific Tumor Frequencies and can be accessed from the tumor frequency or the originating reference. Pathology-based annotations include information on organ of origin and affected organ, treatments, classification of the tumor, tumor frequency, mouse strain in which the tumor was observed or induced, and relevant genetic mutations (germ-line and somatic). Table 2 shows the current data content of MTB.
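The relationship described above, in which Pathology Records with their images are attached to a Tumor Frequency that carries the strain, organ, treatment, and mutation annotations, can be sketched as a small data model. This is only an illustration: the class names, field names, and sample values below are hypothetical and do not reflect MTB's actual database schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PathologyImage:
    """One photomicrograph with its annotation (hypothetical fields)."""
    caption: str
    contributor: str


@dataclass
class PathologyRecord:
    """A pathology report holding zero or more images."""
    description: str
    images: List[PathologyImage] = field(default_factory=list)


@dataclass
class TumorFrequency:
    """A tumor observation in a strain, with attached pathology records."""
    tumor_classification: str
    organ_of_origin: str
    organ_affected: str
    strain: str
    treatment: Optional[str] = None
    frequency_percent: Optional[float] = None
    mutations: List[str] = field(default_factory=list)  # germ-line and somatic
    pathology_records: List[PathologyRecord] = field(default_factory=list)

    def image_count(self) -> int:
        """Total number of images across all attached pathology records."""
        return sum(len(record.images) for record in self.pathology_records)


# Placeholder values for illustration only; not real MTB data.
example = TumorFrequency(
    tumor_classification="hemangioma",
    organ_of_origin="ovary",
    organ_affected="ovary",
    strain="ExampleStrain",
    pathology_records=[
        PathologyRecord(
            description="placeholder description",
            images=[
                PathologyImage("placeholder caption 1", "contributor A"),
                PathologyImage("placeholder caption 2", "contributor A"),
            ],
        )
    ],
)
print(example.image_count())
```

Keeping images inside the pathology record, and the record inside the tumor frequency, mirrors the access path in the text: a reader reaches the images from the tumor frequency or from the originating reference.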
Pathology descriptions and images in MTB can be accessed using links from the tumor frequency record or by directly searching with the Pathology Image Search Form (Supplementary Data Fig. S1). For example, to search for data on ovarian hemangiomas users would first select hemangioma from the list of tumor types and then select ovary as organ affected (Fig. S1). This search returns 3 pathology reports with 10 thumbnail size images (Supplementary Data Fig. S2). Clicking on an image (arrow in Fig. S2) opens a new window displaying relevant pathology descriptions, a higher resolution version of the image, and tumor, strain, and reference details. In addition, links are provided to other MTB data associated with this photomicrograph (Fig. 2).
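The two-criterion search in this example (tumor type plus organ affected) amounts to filtering reports on both fields and then collecting their images. The following minimal sketch shows that logic over an in-memory list. The function name `search_pathology_reports`, the dictionary keys, and the sample rows are assumptions made for illustration; they are not MTB's interface or data.

```python
from typing import Dict, List

# Placeholder rows for illustration only; these are not real MTB records.
REPORTS: List[Dict] = [
    {"tumor_type": "hemangioma", "organ_affected": "ovary", "images": ["a.jpg", "b.jpg"]},
    {"tumor_type": "hemangioma", "organ_affected": "liver", "images": ["c.jpg"]},
    {"tumor_type": "adenoma", "organ_affected": "ovary", "images": ["d.jpg"]},
]


def search_pathology_reports(reports, tumor_type, organ_affected):
    """Return reports that match both the tumor type and the organ affected."""
    wanted_type = tumor_type.strip().lower()
    wanted_organ = organ_affected.strip().lower()
    return [
        r for r in reports
        if r["tumor_type"].lower() == wanted_type
        and r["organ_affected"].lower() == wanted_organ
    ]


hits = search_pathology_reports(REPORTS, "Hemangioma", "Ovary")
# Flatten the images of all matching reports, as a results page of thumbnails would.
thumbnails = [img for r in hits for img in r["images"]]
print(len(hits), "report(s),", len(thumbnails), "image(s)")
```

Matching on lower-cased terms is a simplification; a system built on controlled vocabularies would more likely compare term identifiers than free text.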
A web-based submission system is available to facilitate direct submission of data and images to MTB. The Pathology Submission Forms allow researchers to create records containing detailed information on strain, genetics, tumor diagnosis, treatment regimens, and pathology image descriptions, and to attach images that complement the descriptions. High quality images can be entered as Zoomify images (http://www.zoomify.com/). Researchers register for an ID and password to create submissions in a private MTB database space. This private space lets submitters save in-progress submissions and return later to expand or correct previously submitted data. Data can be submitted immediately, partially entered and saved to be completed at a later date, or edited after submission to amend or extend an entry. Data are private until released by the submitter. Once submitted, data are reviewed by the MTB staff for obvious inconsistencies, and then released for public view. Large image files, for example whole slide images (WSI) from Aperio® or Hamamatsu® scanners, require uploads by FTP. MTB provides a user support link on the MTB home page that allows users to arrange for alternate data submission methods or provide feedback to curators on existing MTB data records.
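The submission workflow described above can be read as a small state machine: a record starts as a private draft, is submitted for staff review, and becomes publicly visible once released. The sketch below is one possible reading, not MTB's implementation. The state names, the `Submission` class, and the rule that only the owner can see unreleased data are assumptions drawn from the description.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"          # saved in the submitter's private space
    SUBMITTED = "submitted"  # awaiting staff review
    RELEASED = "released"    # visible to the public


class Submission:
    """Hypothetical model of one pathology submission."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.status = Status.DRAFT
        self.fields = {}

    def edit(self, **changes) -> None:
        """Add or amend data; the text allows edits before and after submission."""
        self.fields.update(changes)

    def submit(self) -> None:
        """Move a draft into the review queue."""
        if self.status is not Status.DRAFT:
            raise ValueError("only a draft can be submitted")
        self.status = Status.SUBMITTED

    def release(self, review_passed: bool) -> None:
        """Make the record public if the staff review found no inconsistencies."""
        if self.status is not Status.SUBMITTED:
            raise ValueError("only a submitted record can be released")
        if review_passed:
            self.status = Status.RELEASED

    def visible_to(self, user: str) -> bool:
        """Unreleased data are visible only to the submitter."""
        return self.status is Status.RELEASED or user == self.owner


submission = Submission(owner="submitter_1")
submission.edit(strain="ExampleStrain", diagnosis="placeholder")
private_before = submission.visible_to("another_user")
submission.submit()
submission.release(review_passed=True)
public_after = submission.visible_to("another_user")
print(private_before, public_after)
```

A record that fails review simply stays in the submitted state here, so the submitter can correct it with `edit` before it is reviewed again.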
The availability of MTB as a vehicle to publish pathology image and diagnostic data significantly improves the scientific community’s access to these data. Because of publication costs, scientific journals restrict the number of photomicrographs published, limiting the amount of data publicly available for comparison and interpretation. The mechanisms for querying Supplementary Data for journal articles are severely restricted, and access may be limited and transient, depending on individual journal policies. Submitting unpublished data, or additions to previously published data, to MTB makes significantly more data available in a format that is integrated with other published data and easily searchable. In addition, MTB offers a high degree of flexibility in the types of data that can be associated with the images.
MTB will continue to increase the quantity and types of tumor-associated mouse data it contains and to enhance the representation of mouse tumors as models for human cancer. Currently planned developments include integration of mouse tumor relevant gene expression data from the Gene Expression Omnibus and ArrayExpress, complex trait data from the Collaborative Cross,6 and connecting large scale human cancer genomics data with corresponding mouse cancer models. In addition, new graphical interfaces for viewing and querying tumor-associated genome-wide mutation data, QTLs, and sequence data are being developed.
The explosion of both the volume and variety of data in science today has made the development of new data search and integration tools vitally important. MTB is designed to collect, store, and integrate a wide variety of data on mouse tumor development and human cancer models and make these data freely available to the scientific community in an easily searchable form. A key component is the inclusion of mouse pathology diagnostic descriptions and images and supporting data on tumor incidence, genetics, treatment, and frequency. In conclusion, MTB provides the most comprehensive source of mouse tumor data available, the highest level of data curation, and multiple search forms allowing data queries from many different scientific perspectives; for example, searches using gene, strain or phenotype terms. MTB also serves as a repository for data from the pathology community; for example, MTB stores images from the Jackson Aging Center and Pathbase, and MTB staff are working with external scientists to store details on antibodies used to study mouse cancer models. MTB enables mouse pathologists to present their tumor data to the scientific community in a way that places these data in the wider context of genetic and molecular data. This also enables community access to key pathology data connected to tumor diagnoses and outcomes that are otherwise unavailable for scientific analysis.
Using the Pathology Image Search Form. The example shows how to search for ovarian hemangiomas; arrows indicate selected pull-down menu values.
Pathology Image Search results for ovarian hemangiomas. Clicking on an image (arrow) opens an image detail page.
This work was supported in part by the National Institutes of Health (CA089713, CA034196, HG000330, and RR017436) and The Ellison Medical Foundation. The authors declared that they received no financial support for their research and/or authorship of this article.
Declaration of Conflict of Interest
The authors declared that they had no conflicts of interest with respect to their authorship of this article.