The laboratory mouse has long been an important tool in the study of the biology and genetics of human cancer. With the advent of genetic engineering techniques, DNA microarray analyses, tissue arrays, and other large-scale, high-throughput data generating methods, the amount of data available for mouse models of cancer is growing exponentially. Tools to integrate, locate and visualize these data are crucial to aid researchers in their investigations. The Mouse Tumor Biology database (http://tumor.informatics.jax.org) seeks to address that need.
It has long been acknowledged that genetics has a significant role in the initiation and progression of many forms of cancer1. In the early 1900s scientists recognized that different strains of mice varied in their susceptibilities to specific forms of cancer and that these phenotypes were tied to each strain's distinct genetic background2,3. Inbred strains of mice have served as powerful genetic tools to explore the underlying mechanisms of variable susceptibility4,5. The relatively recent emergence of genetic engineering as a means to manipulate the mouse genome has led to the development of mouse models specifically designed to provide insight into the roles of specific genes and pathways in the development and progression of cancer6,7. Early genetically engineered mouse models, such as transgenic mice that overexpressed oncogenes, proved to be an effective method for generating reproducible neoplasms, but such models did little to recapitulate the complexities of the genetics of cancer development and progression. More recent genetically engineered mouse models use techniques such as conditional, retrovirus-delivered and 'latent' mutant alleles that provide a genetic picture that is more similar to the processes observed in developing human neoplasms8–12. As data continue to be generated from models old and new at an ever-increasing rate (FIG. 1), the need for a repository for this information continues to grow as well. The Mouse Tumor Biology database (MTB) provides a publicly accessible electronic information resource for cancer researchers worldwide to both explore and contribute to this ever-expanding base of cancer biology data13–15.
The MTB database provides web-based access to primary research data on the pathobiology of cancer in genetically defined strains of laboratory mice. Other databases on the pathobiology of cancer that provide information on mouse tumours are available and comparative examples are shown in BOX 1. The MTB project was initiated in 1997 with funding from the National Cancer Institute (NCI). The design and prototype phase of the project followed an initial review of the relevant biomedical literature and interviews with cancer geneticists. An early prototype was presented to the attendees of "The Mouse in Mammary Carcinogenesis Research" conference in October 1997 at The Jackson Laboratory (TJL) and a questionnaire was circulated requesting feedback. The database was first made public online in the autumn of 1998. Since that time the database and web interface have undergone various enhancements to respond to the demands of the cancer genetics research community. Examples of enhancements include visualization tools, a pathology data image submission interface and user help documentation13–15.
The Mouse Tumor Biology (MTB) database provides integrated access to data on the pathobiology and genetics of cancers in mouse models of human disease. All data are attributed to a source, either published or personal submission. Listed below are other resources available online that offer data related to that found in MTB.
The data in the MTB database are accumulated from several sources including the peer-reviewed scientific literature, pathology reports from the TJL mouse colonies and submissions from the cancer research community. All data entered into the database are attributed to a source, either published or personal communication. Data incorporated into MTB are reviewed by scientific curators with expertise in the areas of cancer biology and mouse genetics. The literature curation process begins with the identification of appropriate publications. The entire Mouse Genome Informatics (MGI) curatorial staff routinely screens more than 150 scientific journals that regularly carry articles involving some aspect of mouse biology and genetics for relevant papers. In addition, MTB staff carry out monthly PubMed searches to identify relevant articles published in journals not covered in the primary screening process. Each paper is reviewed by a curator and information relevant to the database is entered. The scope of the data in MTB includes all endogenous tumours in mice, both spontaneous and induced, from genetically defined strains of mice (inbred, hybrid, mutant and genetically engineered strains of mice). These data include standardized tumour names and classifications, pathology reports and images, mouse genetics, genomic changes occurring in the tumour, strain names, tumour frequency and latency, and literature citations. An important part of the curation process is ensuring that data such as genes, alleles and strains all adhere to the accepted nomenclature for those data types16 (BOX 2). This sometimes requires curators to translate published, non-conforming terminology into synonymous approved nomenclature terms.
Adherence to official nomenclature for data types such as genes and strains is vital for clarity of communication. Without adherence to these standards, confusion may easily arise. For example, a researcher may have made a knockout of the ‘p75’ gene. Is this a knockout of nerve growth factor receptor (often referred to as p75)? Or is this a knockout of tumor necrosis factor receptor superfamily, member 1b (also often referred to as p75)? Without contextual clarification as to which gene is meant, the reader may be unable to determine which one was actually targeted.
The Mouse Tumor Biology (MTB) database adheres to the official nomenclature for genes, alleles and strains as determined by the guidelines of the International Committee on Standardized Genetic Nomenclature for Mice16. These data are synchronized with the corresponding data held in the Mouse Genome Database. To facilitate searching on alternate names, MTB has used a system to capture synonyms. For example, ‘cadherin 1’ is the official gene name for a gene commonly known as ‘E-cadherin’. Searching MTB for ‘E-cadherin’ in the ‘Gene/Marker’ field of the ‘Genetics’ search form will return a record for the cadherin 1-knockout strain annotated in MTB.
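The synonym capture described in this box can be illustrated with a minimal sketch. It assumes a small in-memory table; the function names `build_index` and `resolve` are hypothetical and do not describe how MTB is implemented. The gene names come from the examples above, and the official symbols (Cdh1, Ngfr, Tnfrsf1b) are added for illustration.

```python
from collections import defaultdict

# Toy nomenclature table: official symbol -> (official name, known synonyms).
# Illustrative only; this is not an export of MTB or MGI data.
GENES = {
    "Cdh1": ("cadherin 1", ["E-cadherin"]),
    "Ngfr": ("nerve growth factor receptor", ["p75"]),
    "Tnfrsf1b": ("tumor necrosis factor receptor superfamily, member 1b", ["p75"]),
}

def build_index(genes):
    """Map every symbol, name and synonym (lower-cased) to official symbols."""
    index = defaultdict(set)
    for symbol, (name, synonyms) in genes.items():
        for term in (symbol, name, *synonyms):
            index[term.lower()].add(symbol)
    return index

INDEX = build_index(GENES)

def resolve(term):
    """Return the sorted official symbols that a search term may refer to."""
    return sorted(INDEX.get(term.strip().lower(), set()))

print(resolve("E-cadherin"))  # one official symbol
print(resolve("p75"))         # ambiguous: two official symbols
```

Returning a list rather than a single symbol keeps an ambiguous synonym such as ‘p75’ visible to the user instead of silently choosing one gene.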
The use of controlled vocabularies in MTB contributes to the assurance of accuracy and completeness of database searches. For example, a researcher interested in mouse models of lung cancer might turn to PubMed to identify relevant publications on this topic. However, the search terms used and the writing style of different authors will directly affect the information that is returned from a literature search. A recent test search of PubMed for ‘mouse model lung tumor’ and ‘mouse model pulmonary tumor’ returned 2,372 and 1,452 results respectively. Only about 600 citations appeared in both searches. The remainder of the citations returned by one search would be missed by someone searching with the alternate terminology. This situation is avoided in MTB by the use of standardized terminology for cells, tissues and organs. Because of this standardization, searches on the term ‘lung’ will return all results for lesions of the lung regardless of the varying terminology used in published articles to describe them (‘lung tumor’, ‘pulmonary tumor’, ‘lung carcinoma’, ‘pulmonary adenoma’ and so on). MTB curators ensure that synonymous terms are entered into the database in a systematic manner.
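The effect of standardized terminology, and the size of the gap in the PubMed example, can be sketched as follows. The mapping `ORGAN_TERMS` and the function `standard_organ` are hypothetical illustrations rather than MTB's vocabulary, and the overlap of 600 is the approximate figure quoted above.

```python
# Word forms that refer to the same organ (illustrative, not MTB's vocabulary).
ORGAN_TERMS = {"lung": "lung", "pulmonary": "lung"}

def standard_organ(description):
    """Return the standardized organ term for a free-text description, or None."""
    for word in description.lower().split():
        if word in ORGAN_TERMS:
            return ORGAN_TERMS[word]
    return None

published = ["lung tumor", "pulmonary tumor", "lung carcinoma", "pulmonary adenoma"]
# A search on the standardized term finds all four published variants.
lung_hits = [d for d in published if standard_organ(d) == "lung"]

# Inclusion-exclusion on the PubMed counts quoted above (overlap is approximate).
lung_query, pulmonary_query, shared = 2372, 1452, 600
distinct_citations = lung_query + pulmonary_query - shared
missed_by_lung_query = pulmonary_query - shared
missed_by_pulmonary_query = lung_query - shared

print(len(lung_hits), distinct_citations, missed_by_lung_query, missed_by_pulmonary_query)
```

Under these figures, a reader who used only one of the two phrasings would miss roughly 850 or roughly 1,770 citations, depending on the phrasing chosen.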
Annotated histopathology images of tumours from mouse colonies held at TJL, including data from The Jackson Laboratory Aging Center's large-scale study of spontaneous diseases in commonly used inbred strains as they age, are also present in the database. Additionally, submissions of mouse tumour data and photomicrographs of lesions are encouraged from the broader cancer research community, including visiting investigators at TJL. A custom-designed data submission tool is available on the MTB web site for contribution of such data.
The MTB database does not currently include data on tumour cell lines or human tumour xenografts. The project's focus on the importance of genetic background in the aetiology of cancer makes this database a particularly useful resource for scientists studying mouse models of human cancers and its expanding image archive appeals to pathologists. It can also serve as a powerful training tool for the next generation of cancer researchers.
The MTB web site is designed to provide users with a variety of search options that accommodate a broad range of perspectives from within the cancer research community. This design allows users to search the database using a query designed to address their own particular area of interest. The following are examples of some of the types of queries that may be asked of and answered by the database.
First-time users of MTB might not be sure where to start in their exploration of the database. In this case the ‘Quick Organ/Tissue’ search form, located on the MTB home page, provides a simple introduction to the data and the interface. This search form consists of a pick list that contains an extensive vocabulary of mouse anatomical terms. Choosing one of the terms from the list and clicking on the ‘Search’ button will launch a search for records in which the organ of origin of the tumour matches the chosen term, regardless of tumour type. For example, to answer the question “Do mice get tumours of the adrenal gland?” one could first search for ‘adrenal gland’. This returns over 200 records of adrenal gland tumours in mice; some are spontaneous tumours and others were induced by treatment (chemical, hormonal or radiation). These lesions arose in a variety of strain types (inbred, hybrid, fostered, targeted mutant, chimeric and transgenic). Data provided on the results page include tumour name, organ affected, treatment type and agents (if applicable), strain name and type, tumour frequency and metastasis information as well as links to any related images (if available). Clicking on the ‘Summary’ hyperlink for a particular tumour record will display the ‘Tumor Summary’ page, which presents additional information regarding the selected tumour. These data may include information such as reproductive status of the affected mice, tumour frequency details (colony size and number of mice affected), tumour latency, pathology notes, tumour genetics and additional notes as well as links to the appropriate references.
If a researcher is interested primarily in spontaneous tumours in inbred mice, another useful option is the Tumor Frequency Grid (FIG. 2, Tumor Frequency Grid movie in Further information). This is a display and dynamic search tool that can be accessed by clicking on the ‘Tumor Frequency Grid’ hyperlink. The current version of the grid derives from all the current data held in MTB for spontaneous tumours in inbred mice and clicking on one of the data cells will open a ‘Tumor Results’ page as described above. The grid may be particularly useful when taking genetic background into consideration in planning the construction of a new mutant strain, choosing strains for a study of complex traits or designing other experiments.
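A grid of this kind can be thought of as a strain-by-organ pivot of the underlying tumour records. The sketch below is a rough illustration only: the strain names and frequencies are placeholders, and taking the highest reported frequency per cell is an assumption, since the text does not say how the grid summarizes multiple records.

```python
from collections import defaultdict

# (strain, organ, reported frequency in percent); placeholder values, not MTB data.
records = [
    ("StrainA", "lung", 12.0),
    ("StrainA", "lung", 30.0),
    ("StrainA", "liver", 5.0),
    ("StrainB", "lung", 2.0),
]

def build_grid(rows):
    """Pivot records into {strain: {organ: highest reported frequency}}."""
    grid = defaultdict(dict)
    for strain, organ, freq in rows:
        current = grid[strain].get(organ)
        grid[strain][organ] = freq if current is None else max(current, freq)
    return {strain: dict(cells) for strain, cells in grid.items()}

def records_for_cell(rows, strain, organ):
    """Mimic clicking a grid cell: return the records behind that cell."""
    return [r for r in rows if r[0] == strain and r[1] == organ]

grid = build_grid(records)
print(grid["StrainA"]["lung"], len(records_for_cell(records, "StrainA", "lung")))
```

An empty cell (here, liver in StrainB) simply means no record exists for that combination in the toy data; it does not imply that such tumours never occur.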
After familiarizing themselves with the results of the broad search provided by the ‘Quick Organ/Tissue Search’ option, a user may decide they would like to refine their search parameters. The ‘Tumor’ search form allows users to ask more complex questions, such as “Which strains of mice spontaneously develop adrenal gland pheochromocytomas that metastasize?”. This search may be performed by choosing ‘adrenal gland’ from the ‘Organ/Tissue of Origin’ menu as before and then adding ‘pheochromocytoma’ from the ‘Tumor Classification’ menu and ‘None (spontaneous)’ from the ‘Treatment Type’ menu. Finally, the question can be restricted further by checking the box beside ‘Restrict search to metastatic tumors only’. This search will return five records for spontaneous, metastatic pheochromocytomas, all of which arose in targeted mutant mice. As with the other ‘Tumor Results’ pages, the user can view more detailed information for each tumour record by clicking on the ‘Summary’ hyperlink for the record of interest.
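The way each form field narrows the result set can be sketched as a chain of optional filters. The record type `TumorRecord`, the `search` function and the sample records are hypothetical; they mirror the form fields named above but say nothing about MTB's actual schema or contents.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TumorRecord:
    strain: str
    organ: str
    classification: str
    treatment_type: Optional[str]  # None stands for 'None (spontaneous)'
    metastatic: bool

# Placeholder records for illustration; not MTB data.
RECORDS = [
    TumorRecord("StrainA", "adrenal gland", "pheochromocytoma", None, True),
    TumorRecord("StrainB", "adrenal gland", "pheochromocytoma", None, False),
    TumorRecord("StrainC", "adrenal gland", "pheochromocytoma", "chemical", True),
    TumorRecord("StrainD", "adrenal gland", "adenoma", None, False),
    TumorRecord("StrainE", "lung", "adenoma", None, False),
]

def search(records, organ=None, classification=None,
           spontaneous_only=False, metastatic_only=False):
    """Apply each criterion only when the user has set it."""
    hits = []
    for r in records:
        if organ is not None and r.organ != organ:
            continue
        if classification is not None and r.classification != classification:
            continue
        if spontaneous_only and r.treatment_type is not None:
            continue
        if metastatic_only and not r.metastatic:
            continue
        hits.append(r)
    return hits

broad = search(RECORDS, organ="adrenal gland")
refined = search(RECORDS, organ="adrenal gland", classification="pheochromocytoma",
                 spontaneous_only=True, metastatic_only=True)
print(len(broad), [r.strain for r in refined])
```

Leaving a criterion unset reproduces the broad ‘Quick Organ/Tissue’ style of search, while setting all four corresponds to the refined question posed above.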
Many researchers will be primarily interested in a particular strain (either inbred or mutant) rather than in a particular type of tumour. In this case the ‘Strain’ search form is a good starting point. For example, searching for records in which the ‘Strain Name’ contains ‘Kras’ and the ‘Strain Type’ is ‘targeted mutation (knockout)’ returns seven records for strains carrying a knockout allele of Kras, either singly or in combination with some other mutation, on a variety of genetic backgrounds. Clicking on one of the hyperlinked strain names brings the user to the ‘Strain Tumor Overview’ page on which are listed all the tumour types analysed in those particular mice. The default view for this page is the ‘Collapsed View’, which shows data on the tumour type, organ(s) affected and treatment type, and includes a link to additional information on the tumours. There are also links to other resources citing these mice (when applicable). These resources include places where mice may be obtained, such as the JAX® Mice web site and the Mouse Models of Human Cancers Consortium's (MMHCC's) Mouse Repository as well as data resources such as the Mouse Phenome Database (MPD), the Biology of the Mammary Gland Web Site, and Festing's Listing of Inbred Strains of Mice (see URLs in Further information). Clicking on the ‘Strain Tumor Overview Expanded View’ link at the top of the page will expand the provided table to include metastasis data and specific related reference identifiers with hyperlinks to additional reference information. On either of the views, clicking on the hyperlinked numbers in the right-hand column will present the user with a ‘Tumor Summary’ page for that tumour type in those particular mice.
Researchers interested in investigating the role of a specific gene or genes in tumorigenesis can search MTB for specific genes using the ‘Genetics’ search. For example, a researcher studying Cdkn2a may want to explore all of the strains and tumours in MTB that have been reported to harbour a mutation in this gene. By choosing the ‘Genetics’ search form, entering Cdkn2a in the ‘Gene/Marker’ field, and clicking on the ‘Search’ button, the subsequent results page provides the user with data regarding mutant strains in which the mutation involves Cdkn2a (in this case there are both targeted mutant and transgenic strains) as well as data regarding tumours that have been reported to harbour a mutation in the Cdkn2a gene. These somatic mutations are changes such as point mutations, deletions and loss of heterozygosity. The strain section of the results page provides the user with information regarding the mutation type, mouse chromosome, gene symbol and name, and genotype of the mice. The hyperlinked numbers beside each genotype will provide users with a list of all the strains in MTB carrying the associated genotype. The tumour section of the results page provides the user with information regarding mouse chromosome, gene symbol and name, and the type of genetic change observed in the tumour. Hyperlinked numbers beside the type of genetic change will link the user to a page with additional information on the specific mutations as well as on the number and classification of the tumours found to carry those mutations. From there users may click on a link to view details of one or more of those tumour records.
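The two-part result described here, germline mutations carried by strains and somatic changes observed in tumours, can be sketched as a single lookup over two record sets. Everything in the example is a hypothetical illustration: the function `genetics_search`, the placeholder strain and tumour identifiers, and the counts are not drawn from MTB.

```python
from collections import Counter

# (strain, gene symbol, mutation type); placeholder rows, not MTB data.
STRAIN_GENOTYPES = [
    ("StrainA", "Cdkn2a", "targeted mutation"),
    ("StrainB", "Cdkn2a", "transgenic"),
    ("StrainC", "Kras", "targeted mutation"),
]

# (tumour id, gene symbol, somatic change observed in the tumour); placeholders.
TUMOR_CHANGES = [
    ("T1", "Cdkn2a", "deletion"),
    ("T2", "Cdkn2a", "point mutation"),
    ("T3", "Cdkn2a", "deletion"),
    ("T4", "Kras", "point mutation"),
]

def genetics_search(gene):
    """Return strains mutant for a gene and a tally of somatic changes in tumours."""
    wanted = gene.strip().lower()
    strains = [(s, m) for s, sym, m in STRAIN_GENOTYPES if sym.lower() == wanted]
    changes = Counter(c for _, sym, c in TUMOR_CHANGES if sym.lower() == wanted)
    return {"strains": strains, "tumor_changes": dict(changes)}

result = genetics_search("Cdkn2a")
print(result["strains"])
print(result["tumor_changes"])
```

The tally per type of change plays the role of the hyperlinked numbers on the results page, each of which leads to the individual tumour records behind the count.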
One of the sections of the database many users find most valuable is the pathology image archive. Researchers characterizing neoplasms from a newly developed mutant strain may wish to compare the tumours they have observed with other examples of the same tumour type from other strains of mice. This type of data can be accessed using the ‘Pathology Image’ search form. For example, one might want to compare the histopathology of mammary fibroadenomas in mice. A query for ‘mammary gland’ in the ‘Organ/Tissue of Origin’ menu and ‘fibroadenoma’ in the ‘Tumor Classification’ menu returns two pathology reports. Each report lists the tumour name, organ(s) affected, treatment type and agents (if applicable), strain name, sex, reproductive status, tumour frequency, age at necropsy and a short description. Under the report summary are thumbnail images, each with its own caption, written by a pathologist. Clicking on any of the thumbnail images will present the user with an enlarged version of the image along with all the associated data relevant to that image (FIG. 3). Many of the images currently held in MTB are displayed using the dynamic Zoomify viewer (see Zoomify and Zoomify movie URLs in Further information). This viewer allows users to zoom in and out and pan around high-resolution image files so that the image can be viewed in greater detail than would be available from a handful of static JPEG images.
Images of tumours can also be added to MTB. These can be images from a published paper (if allowed by copyright) or they can be additional images collected during a study but not published. To facilitate community contribution of pathology images, MTB provides an online image submission system, accessible through the ‘Submit Pathology Images’ link. Each mouse record in the submission system may be associated with one or more diagnoses, and each diagnosis may be associated with one or more images (FIG. 4). Submissions of photomicrographs of cancer case material to MTB by the scientific community are strongly encouraged.
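The nesting of mouse records, diagnoses and images described above can be expressed as a simple one-to-many data model. The class and field names below are hypothetical and chosen for illustration; they are not taken from the MTB submission system.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Image:
    filename: str
    caption: str = ""

@dataclass
class Diagnosis:
    tumor_name: str
    organ: str
    images: List[Image] = field(default_factory=list)

@dataclass
class MouseRecord:
    strain: str
    sex: str
    diagnoses: List[Diagnosis] = field(default_factory=list)

    def image_count(self) -> int:
        """Total number of images across all diagnoses for this mouse."""
        return sum(len(d.images) for d in self.diagnoses)

def problems(mouse: MouseRecord) -> List[str]:
    """List ways a record falls short of 'one or more' at each level."""
    found = []
    if not mouse.diagnoses:
        found.append("mouse record has no diagnosis")
    for d in mouse.diagnoses:
        if not d.images:
            found.append(f"diagnosis '{d.tumor_name}' has no image")
    return found

# Placeholder submission for illustration.
mouse = MouseRecord("StrainA", "female", [
    Diagnosis("adenocarcinoma", "mammary gland",
              [Image("slide1.jpg", "low power"), Image("slide2.jpg", "high power")]),
    Diagnosis("adenoma", "lung", [Image("slide3.jpg")]),
])
print(mouse.image_count(), problems(mouse))
```

Keeping images attached to a diagnosis, rather than directly to the mouse, lets a single animal with several lesions carry separately captioned images for each.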
Although the search forms already discussed provide the user with multiple avenues to delve into the data in the database, many users may find it most useful to perform a combination search, one that allows the user to specify both strain-specific and tumour-specific search terms. This type of search may be done using the ‘Advanced’ search form. For example, it may be useful to a researcher to search for transgenic mice that were developed on an FVB strain background and identify those mice that develop metastatic mammary adenocarcinomas (FIG. 5). Such a search returns the first 25 records of a total of 92 that are found. It is worth noting that the user may return to the search form and choose the ‘No Limit’ option to retrieve all 92 records. The results returned from this search are formatted in the same way as results from a ‘Tumor’ or ‘Quick Organ/Tissue’ search, and links to images (where available) as well as to the additional data given on the subsequent ‘Tumor Summary’ pages are also displayed.
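The behaviour of the result limit in this example can be sketched in a few lines. The function `limit_results` and the default of 25 are assumptions inferred from the example above, and the 92 matches are synthetic stand-ins rather than MTB records.

```python
from typing import List, Optional, Sequence, Tuple

DEFAULT_LIMIT = 25  # assumed default, inferred from the example in the text

def limit_results(matches: Sequence[str],
                  max_items: Optional[int] = DEFAULT_LIMIT) -> Tuple[List[str], int]:
    """Return the records to display and the total number found.

    Passing max_items=None corresponds to choosing 'No Limit'.
    """
    total = len(matches)
    shown = list(matches) if max_items is None else list(matches[:max_items])
    return shown, total

# Synthetic stand-ins for the 92 matching records in the example above.
matches = [f"record-{i}" for i in range(1, 93)]

first_page, total = limit_results(matches)
everything, _ = limit_results(matches, max_items=None)
print(len(first_page), total, len(everything))
```

Reporting the total alongside the truncated list is what tells the user that returning to the form and choosing ‘No Limit’ is worthwhile.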
As many of the data in MTB are derived from the published literature, searching by reference can be a useful tool. This may be done using the ‘Reference’ search form. For example, searching for the word ‘myoepithelioma’ in the title field returns one result with a hyperlinked ID number that will access a ‘Reference Detail’ page. This page provides the user not only with details regarding the specified reference but also with a summary of the numbers of tumours, pathology images and strains that have been annotated to that reference in MTB as well as links to those data. This page also provides a link to the MGI reference page, from which users may access PubMed and, where available, download the original reference.
After achieving the initial goal of building a prototype mouse tumour bioinformatics resource, MTB subsequently expanded in several areas such as pathology-related data and interfaces, graphic displays and visualization tools. It has also been transitioned to a multi-tier software architecture using Sybase as the relational database management system. Since the public launch of MTB in October 1998, both the amount of data held in the resource and the number of researchers using it have grown at a steady rate. After its first year online MTB held 76 pathology images. This number has recently grown to over 2,600. Over a recent 6-month period the MTB web site averaged nearly 50,000 web hits per month from approximately 1,200 unique IP addresses per month. This level of hits is 11-fold greater than that received in MTB's first 6 months online.
In order to continue to address the needs and interests of the broader cancer research community, MTB will expand to include a number of new data types in the near future. The integration of Quantitative Trait Loci (QTL) data is an area of acknowledged importance. Identification of the cancer susceptibility or resistance genes underlying these QTL in mice is an area of considerable research interest as it will probably lead to identification of human homologues that are influential in human cancers4,17–19. Plans are also underway to incorporate and display various other types of tumour-associated genomic and genetic data, including fluorescence in situ hybridization20,21, spectral karyotyping20,22, comparative genome hybridization23–25 and gene expression array data26,27. Enhanced visualization tools for a variety of data are also under development, including the addition of improved flexibility to the Tumor Frequency Grid. This updated version will allow the user to specify the strains and organs mapped to the Grid and therefore will be configurable by the user to include data on mutant as well as inbred strains. There are also plans underway to link information on the strains in MTB with the International Mouse Strain Resource (see URL in Further information)28, thereby providing users with more comprehensive information on where to obtain a particular mutant strain of mice.
In order to better address the needs and interests of the bioinformatics and computational biology community, MTB will enhance its use of ontologies. MTB currently uses several controlled vocabularies to ensure accuracy and completeness for database searches. The integration and implementation of several standard ontologies would enhance this aspect of MTB and would also facilitate programming access to the resource. Several ontologies are under consideration for integration, including the Gene Ontology29, Cell Type Ontology (see URL in Further information), Mouse Anatomical Dictionary30, Mammalian Phenotype Ontology31, MPath Ontology (Pathbase; see URL in Further information)32 and the ontologies being developed by the MMHCC.
MTB has recently contributed to the wider cancer informatics community through participation in the NCI-initiated cancer Biomedical Informatics Grid (caBIG™)33,34, an initiative to develop an integrated data network of cancer research institutes, resources and data. Members of the MTB curatorial staff collaborate with the staff of one of the caBIG™ resources, the cancer Models (caMOD; see URL in Further information) database35, to assist in their incorporation of official standard gene and mouse strain nomenclature. Additionally, much of the mouse-related data in caMOD was acquired from MTB using exported data reports. Future plans for the MTB project include the development of programmatic access to MTB through an application programming interface to facilitate seamless integration of the data from MTB with caMOD as well as other software systems and databases. The inclusion of mouse model data in caBIG™ is a key component in the area of translational research, aiding in the application of lessons learned in experimental mouse models to clinical research.
For nearly a decade MTB has served as a valuable resource for the scientific community. Curation of the ever-expanding volume of mouse cancer pathobiological data remains a priority, as does the expansion of included data types and improved visualization tools. By focusing both on expanded data coverage and improved query and visualization tools, this resource will continue to assist cancer research efforts worldwide by facilitating the selection of experimental models for cancer research, the evaluation of mouse genetic models of human cancer, the review of patterns of mutations in specific cancers and the identification of genes that are commonly mutated across a range of cancers. Comments and suggestions from the cancer research community are encouraged.
For critical reading of the manuscript we thank K. Mills and B. Tennent of The Jackson Laboratory, Bar Harbor, USA. For sample images in FIG. 4 we thank J. Ward of the National Cancer Institute. The Mouse Tumor Biology Database is supported by grant CA089713 from the National Cancer Institute. Images generated from The Jackson Aging Center projects are supported by a grant from the Ellison Medical Foundation.
DEBRA M. KRUPKE Debra Krupke received her B.A. in Biology from The Johns Hopkins University, Baltimore, Maryland, USA and did her M.S. graduate work in the laboratory of cancer biologist R. Evans at The Jackson Laboratory, Bar Harbor, Maine, USA. Since 1997 she has been working as a member of the Mouse Tumor Biology (MTB) database project, part of the Mouse Genome Informatics group at The Jackson Laboratory. She is currently a Senior Scientific Curator for MTB.
DALE A. BEGLEY Dale Begley received his B.Sc. in Biology and Ph.D. from The University of Michigan, USA studying gene regulation in the laboratory of S. Tsubota. He did his postdoctoral work at the UpJohn Company, Kalamazoo, Michigan, USA, in the laboratory of I. Abraham investigating multidrug resistance in cancer cells. He joined the Mouse Genome Informatics group at The Jackson Laboratory, Bar Harbor, Maine, USA in 1995 and is currently a Senior Scientific Curator for the Mouse Tumor Biology database project.
JOHN P. SUNDBERG John P. Sundberg is a Professor of pathology at The Jackson Laboratory, Bar Harbor, Maine, USA. He is director of the Pathology Program of The Jackson Aging Center as well as a Principal Investigator involved in developing mouse models for skin and adnexa diseases including cancer models. He has been involved with the development of the Mouse Tumor Biology database since its inception.
CAROL J. BULT Carol Bult, Ph.D., is an Associate Professor at The Jackson Laboratory, Bar Harbor, Maine, USA. Her research areas include data integration and developmental cancer genomics. She is a member of the Mouse Genome Informatics database consortium and serves as a Principal Investigator for the Mouse Genome Database, Mouse Tumor Biology Database and MouseCyc Biochemical Pathways database.
JANAN T. EPPIG Janan Eppig, Ph.D., is a Professor at The Jackson Laboratory, Bar Harbor, Maine, USA whose research interests include genome organization, model systems for human disease, bioinformatics resources and semantic standards for data integration. She is the Principal Investigator of the Mouse Genome Database and Mouse Tumor Biology Database and a member of the Mouse Genome Informatics database collaboration. She also heads the International Mouse Strain Resource.
FURTHER INFORMATION
Biology of the Mammary Gland Web Site: http://mammary.nih.gov/
Cancer Biomedical Informatics Grid: https://cabig.nci.nih.gov
Cancer Genome Anatomy Project: http://cgap.nci.nih.gov/
Cancer Models: https://cancermodels.nci.nih.gov/
Cell Type Ontology: http://obofoundry.org/cgi-bin/detail.cgi?id=cell
Festing's Listing of Inbred Strains of Mice: http://www.informatics.jax.org/external/festing/search_form.cgi
International Mouse Strain Resource: http://www.findmice.org/
The Jackson Laboratory: http://www.jax.org/
JAX® Mice Web Site: http://jaxmice.jax.org/
MMHCC Mouse Repository: http://mouse.ncifcrf.gov/
Mouse Genome Informatics: http://www.informatics.jax.org
Mouse Nomenclature Home Page: http://www.informatics.jax.org/nomen/
Mouse Phenome Database: http://www.jax.org/phenome
Mouse Tumor Biology database: http://tumor.informatics.jax.org/
Tumor Frequency Grid movie: http://tumor.informatics.jax.org/media/MTB-TumorFrequencyGrid.mov
UC-Davis Center for Comparative Medicine Image Archive: http://imagearchive.compmed.ucdavis.edu/
Zoomify movie: http://tumor.informatics.jax.org/media/MTB-Zoomify.mov