The Rat Genome Database (RGD, http://rgd.mcw.edu) was developed to provide a core resource for rat researchers combining genetic, genomic, pathway, phenotype and strain information with a focus on disease. RGD users are provided with access to structured and curated data from the molecular level through to the level of the whole organism, including the variations associated with disease phenotypes. To fully support use of the rat as a translational model for biological systems and human disease, RGD continues to curate these datasets while enhancing and developing tools to allow efficient and effective access to the data in a variety of formats including linear genome viewers, pathway diagrams and biological ontologies. To support pathophysiological analysis of data, RGD Disease Portals provide an entryway to integrated gene, QTL and strain data specific to a particular disease. In addition to tool and content development and maintenance, RGD promotes rat research and provides user education by creating and disseminating tutorials on the curated datasets, submission processes, and tools available at RGD. By curating, storing, integrating, visualizing and promoting rat data, RGD ensures that the investment made into rat genomics and genetics can be leveraged by all interested investigators.
The strength of the rat as a model organism lies in its utility in pharmacology,
biochemistry and physiology research. Data resulting from such studies is difficult
to represent in databases and the creation of user-friendly data mining tools has
proved difficult. The Rat Genome Database has developed a comprehensive ontology-based
data structure and annotation system to integrate physiological data along with
environmental and experimental factors, as well as genetic and genomic information.
RGD uses multiple ontologies to integrate complex biological information from the
molecular level to the whole organism, and to develop data mining and presentation
tools. This approach allows RGD to indicate not only the phenotypes seen in a strain
but also the specific values under each diet and atmospheric condition, as well as
gender differences. Harnessing the power of ontologies in this way allows the user
to gather and filter data in a customized fashion, so that a researcher can retrieve
all phenotype readings for which a high hypoxia is a factor. Utilizing the same data
structure for expression data, pathways and biological processes, RGD will provide
a comprehensive research platform which allows users to investigate the conditions
under which biological processes are altered and to elucidate the mechanisms of
It has been four years since the rat genome’s original publication. Five groups are working together to assemble, annotate and release the next version of the genome for this key model system. As the prevailing model for physiology, complex disease and pharmacological studies, there is an acute need for the rat’s genomic resources to keep pace with the rat’s prominence in the laboratory. In this commentary we describe the current status of the rat genome sequence and the plans for its impending ‘upgrade’. We then cover the key online resources providing access to the rat genome, including the new SNP views at Ensembl, NCBI’s RefSeq and Genes databases, UCSC’s Genome Browser and RGD’s disease portals for cardiovascular disease and obesity.
The Rat Genome Database (RGD, http://rgd.mcw.edu) is an NIH-funded project whose stated mission is ‘to collect, consolidate and integrate data generated from ongoing rat genetic and genomic research efforts and make these data widely available to the scientific community’. In a collaboration between the Bioinformatics Research Center at the Medical College of Wisconsin, the Jackson Laboratory and the National Center for Biotechnology Information, RGD has been created to meet these stated aims. The rat is uniquely suited to its role as a model of human disease and the primary focus of RGD is to aid researchers in their study of the rat and in applying their results to studies in a wider context. In support of this we have integrated a large amount of rat genetic and genomic resources in RGD and these are constantly being expanded through ongoing literature and bulk dataset curation. RGD version 2.0, released in June 2001, includes curated data on rat genes, quantitative trait loci (QTL), microsatellite markers and rat strains used in genetic and genomic research. VCMap, a dynamic sequence-based homology tool was introduced, and allows researchers of rat, mouse and human to view mapped genes and sequences and their locations in the other two organisms, an essential tool for comparative genomics. In addition, RGD provides tools for gene prediction, radiation hybrid mapping, polymorphic marker selection and more. Future developments will include the introduction of disease-based curation expanding the curated information to cover popular disease systems studied in the rat. This will be integrated with the emerging rat genomic sequence and annotation pipelines to provide a high-quality disease-centric resource, applicable to human and mouse via comparative tools such as VCMap. RGD has a defined community outreach focus with a Visiting Scientist program and the Rat Community Forum, a web-based forum for rat researchers and others interested in using the rat as an experimental model. Thus, RGD is not only a valuable resource for those working with the rat but also for researchers in other model organisms wishing to harness the existing genetic and physiological data available in the rat to complement their own work.
Simple modular architecture research tool (SMART) is an online tool (http://smart.embl.de/) for the identification and annotation of protein domains. It provides a user-friendly platform for the exploration and comparative study of domain architectures in both proteins and genes. The current release of SMART contains manually curated models for 784 protein domains. Recent developments were focused on further data integration and improving user friendliness. The underlying protein database based on completely sequenced genomes was greatly expanded and now includes 630 species, compared to 191 in the previous release. As an initial step towards integrating information on biological pathways into SMART, our domain annotations were extended with data on metabolic pathways and links to several pathways resources. The interaction network view was completely redesigned and is now available for more than 2 million proteins. In addition to the standard web access to the database, users can now query SMART using distributed annotation system (DAS) or through a simple object access protocol (SOAP) based web service.
The Rat Genome Database (RGD, ) is one of the core resources for rat genomics and recent developments have focused on providing support for disease-based research using the rat model. Recognizing the importance of the rat as a disease model we have employed targeted curation strategies to curate genes, QTL and strain data for neurological and cardiovascular disease areas. This work has centered on rat but also includes data for mouse and human to create ‘disease portals’ that provide a unified view of the genes, QTL and strain models for these diseases across the three species. The disease curation efforts combined with normal curation activities have served to greatly increase the content of the database, particularly for biological information, including gene ontology, disease, pathway and phenotype ontology annotations. In addition to improving the features and database content, community outreach has been expanded to demonstrate how investigators can leverage the resources at RGD to facilitate their research and to elicit suggestions and needs for future developments. We have published a number of papers that provide additional information on the ontology annotations and the tools at RGD for data mining and analysis to better enable researchers to fully utilize the database.
The Rat Genome Database (RGD) (http://rgd.mcw.edu) aims to meet the needs of its community by providing genetic and genomic infrastructure while also annotating the strengths of rat research: biochemistry, nutrition, pharmacology and physiology. Here, we report on RGD's development towards creating a phenome database. Recent developments can be categorized into three groups. (i) Improved data collection and integration to match increased volume and biological scope of research. (ii) Knowledge representation augmented by the implementation of a new ontology and annotation system. (iii) The addition of quantitative trait loci data, from rat, mouse and human to our advanced comparative genomics tools, as well as the creation of new, and enhancement of existing, tools to enable users to efficiently browse and survey research data. The emphasis is on helping researchers find genes responsible for disease through the use of rat models. These improvements, combined with the genomic sequence of the rat, have led to a successful year at RGD with over two million page accesses that represent an over 4-fold increase in a year. Future plans call for increased annotation of biological information on the rat elucidated through its use as a model for human pathobiology. The continued development of toolsets will facilitate integration of these data into the context of rat genomic sequence, as well as allow comparisons of biological and genomic data with the human genomic sequence and of an increasing number of organisms.
The set of interacting molecules collectively referred to as a pathway or network represents a fundamental structural unit, the building block of the larger, highly integrated networks of biological systems. The scientific community's interest in understanding the fine details of how pathways work, communicate with each other and synergize, and how alterations in one or several pathways may converge into a disease phenotype, places heightened demands on pathway data and information providers. To meet such demands, the Rat Genome Database [(RGD) http://rgd.mcw.edu] has adopted a multitiered approach to pathway data acquisition and presentation. Resources and tools are continuously added or expanded to offer more comprehensive pathway data sets as well as enhanced pathway data manipulation, exploration and visualization capabilities. At RGD, users can easily identify genes in pathways, see how pathways relate to each other and visualize pathways in a dynamic and integrated manner. They can access these and other components from several entry points and effortlessly navigate between them and they can download the data of interest. The Pathway Portal resources at RGD are presented, and future directions are discussed.
Database URL: http://rgd.mcw.edu
The importance of genetic laboratory models such as mice and rats becomes evident when there is poor understanding of the nature of human disease. Many rat models for human disease, created over the years by phenotype-driven strategies, now provide a foundation for the identification of their genetic determinants. These models are especially valuable with the emerging need for validation of genes found in genome-wide association studies for complex diseases. The manipulation of the rat genome using engineered zinc-finger nucleases now introduces a key technology for manipulating the rat genome, which is broadly applicable. The ability to generate knockout rat models using zinc-finger nuclease technology will now enable its full emergence as an exceptional physiological and genetic research model.
Publicly accessible DNA databases (genome browsers) are rapidly accelerating post-genomic research (see http://www.genome.ucsc.edu/), with integrated genomic DNA, gene structure, EST/ splicing and cross-species ortholog data. DNA databases have relatively low dimensionality; the genome is a linear code that anchors all associated data. In contrast, RNA expression and protein databases need to be able to handle very high dimensional data, with time, tissue, cell type and genes, as interrelated variables. The high dimensionality of microarray expression profile data, and the lack of a standard experimental platform have complicated the development of web-accessible databases and analytical tools. We have designed and implemented a public resource of expression profile data containing 1024 human, mouse and rat Affymetrix GeneChip expression profiles, generated in the same laboratory, and subject to the same quality and procedural controls (Public Expression Profiling Resource; PEPR). Our Oracle-based PEPR data warehouse includes a novel time series query analysis tool (SGQT), enabling dynamic generation of graphs and spreadsheets showing the action of any transcript of interest over time. In this report, we demonstrate the utility of this tool using a 27 time point, in vivo muscle regeneration series. This data warehouse and associated analysis tools provides access to multidimensional microarray data through web-based interfaces, both for download of all types of raw data for independent analysis, and also for straightforward gene-based queries. Planned implementations of PEPR will include web-based remote entry of projects adhering to quality control and standard operating procedure (QC/SOP) criteria, and automated output of alternative probe set algorithms for each project (see http://microarray.cnmcresearch.org/pgadatatable.asp).
In the last decade, significant progress has been made in expanding the scope and depth of publicly available immunological databases and online analysis resources, which have become an integral part of the repertoire of tools available to the scientific community for basic and applied research. Herein, we present a general overview of different resources and databases currently available. Because of our association with the Immune Epitope Database and Analysis Resource, this resource is reviewed in more detail. Our review includes aspects such as the development of formal ontologies and the type and breadth of analytical tools available to predict epitopes and analyze immune epitope data. A common feature of immunological databases is the requirement to host large amounts of data extracted from disparate sources. Accordingly, we discuss and review processes to curate the immunological literature, as well as examples of how the curated data can be used to generate a meta-analysis of the epitope knowledge currently available for diseases of worldwide concern, such as influenza and malaria. Finally, we review the impact of immunological databases, by analyzing their usage and citations, and by categorizing the type of citations. Taken together, the results highlight the growing impact and utility of immunological databases for the scientific community.
Database; Epitope; Epitope prediction tools; T cell; Antibody
Since several aspects of physiology in rats has evolved to be more similar to humans than that of mice, it is highly desirable to link the rat into the process of annotating the human genome with function. However, the lack of technology for generating defined mutants in the rat genome has hindered the identification of causative relationships between genes and disease phenotypes. As an important step towards this goal, an approach of establishing transposon-mediated insertional mutagenesis in rat spermatogonial stem cells was recently developed. Transposons can be viewed as natural DNA transfer vehicles that, similar to integrating viruses, are capable of efficient genomic insertion. The mobility of transposons can be controlled by conditionally providing the transposase component of the transposition reaction. Thus, a DNA of interest such as a mutagenic gene trap cassette cloned between the inverted repeat sequences of a transposon-based vector can be utilized for stable genomic insertion in a regulated and highly efficient manner. Gene trap transposons integrate into the genome in a random fashion, and those mutagenic insertions that occurred in expressed genes can be selected in vitro based on activation of a reporter. Selected monoclonal as well as polyclonal libraries of gene trap clones are transplanted into the testes of recipient/founder male rats allowing passage of the mutation through the germline to F1 progeny after only a single cross with wild-type females. This paradigm enables a powerful methodological pipeline for forward genetic screens for functional gene annotation in the rat, as well as other vertebrate models. This article provides a detailed description on how to culturerat spermatogonial stem cell lines, their transfection with transposon plasmids, selection of gene trap insertions with antibiotics, transplantation of genetically modified stem cells and genotyping of knockout animals.
The rat genome database RatMap (http://ratmap.org or http://ratmap.gen.gu.se) has been one of the main resources for rat genome information since 1994. The database is maintained by CMB–Genetics at Göteborg University in Sweden and provides information on rat genes, polymorphic rat DNA-markers and rat quantitative trait loci (QTLs), all curated at RatMap. The database is under the supervision of the Rat Gene and Nomenclature Committee (RGNC); thus much attention is paid to rat gene nomenclature. RatMap presents information on rat idiograms, karyotypes and provides a unified presentation of the rat genome sequence and integrated rat linkage maps. A set of tools is also available to facilitate the identification and characterization of rat QTLs, as well as the estimation of exon/intron number and sizes in individual rat genes. Furthermore, comparative gene maps of rat in regard to mouse and human are provided.
The laboratory mouse has become the organism of choice for discovering gene function and unravelling pathogenetic mechanisms of human diseases through the application of various functional genomic approaches. The resulting deluge of data has led to the deployment of numerous online resources and the concomitant need for formalized experimental descriptions, data standardization, database interoperability and integration, a need that has yet to be met. We present here the Mouse Resource Browser (MRB), a database of mouse databases that indexes 217 publicly available mouse resources under 22 categories and uses a standardised database description framework (the CASIMIR DDF) to provide information on their controlled vocabularies (ontologies and minimum information standards), and technical information on programmatic access and data availability. Focusing on interoperability and integration, MRB offers automatic generation of downloadable and re-distributable SOAP application-programming interfaces for resources that provide direct database access. MRB aims to provide useful information to both bench scientists, who can easily navigate and find all mouse related resources in one place, and bioinformaticians, who will be provided with interoperable resources containing data which can be mined and integrated.
Database URL: http://bioit.fleming.gr/mrb
The Mouse Genome Database (MGD) is the community database resource for the laboratory mouse, a key model organism for interpreting the human genome and for understanding human biology and disease (http://www.informatics.jax.org). MGD strives to provide a highly curated, highly integrated information resource that not only includes the consensus view of current knowledge about the mouse, but also provides comparative genomic information particularly for human and rat genomes. MGD includes extensive information about mouse genes, supporting all gene attribute assertions with experimental data, statements of evidence and citation. Detailed information about alleles and mouse mutants includes genotype, molecular variant and phenotype descriptions. Extensive collaboration with other data providers such as NCBI, RIKEN and SWISS-PROT provides standardization of gene:sequence associations and robust interconnections between large information systems based on shared sequence curation. Recent integration of large datasets of mouse full-length cDNAs and radiation-hybrid mapped ESTs, the continued development and use of extensive structured vocabularies and the expansion of the representation of phenotypes highlight this year’s developments.
The mouse has long been an important model for the study of human genetic disease. Through the application of genetic engineering and mutagenesis techniques, the number of unique mutant mouse models and the amount of phenotypic data describing them are growing exponentially. Describing phenotypes of mutant mice in a computationally useful manner that will facilitate data mining is a major challenge for bioinformatics. Here we describe a tool, the Mammalian Phenotype Ontology (MP), for classifying and organizing phenotypic information related to the mouse and other mammalian species. The MP Ontology has been applied to mouse phenotype descriptions in the Mouse Genome Informatics Database (MGI, http://www.informatics.jax.org/), the Rat Genome Database (RGD, http://rgd.mcw.edu), the Online Mendelian Inheritance in Animals (OMIA, http://omia.angis.org.au/) and elsewhere. Use of this ontology allows comparisons of data from diverse sources, can facilitate comparisons across mammalian species, assists in identifying appropriate experimental disease models, and aids in the discovery of candidate disease genes and molecular signaling pathways.
Ontology; Phenotype; Mammal; Annotation; Model System
Genome-oriented plant research delivers rapidly increasing amount of plant genome data. Comprehensive and structured information resources are required to structure and communicate genome and associated analytical data for model organisms as well as for crops. The increase in available plant genomic data enables powerful comparative analysis and integrative approaches. PlantsDB aims to provide data and information resources for individual plant species and in addition to build a platform for integrative and comparative plant genome research. PlantsDB is constituted from genome databases for Arabidopsis, Medicago, Lotus, rice, maize and tomato. Complementary data resources for cis elements, repetive elements and extensive cross-species comparisons are implemented. The PlantsDB portal can be reached at .
The Mouse Genome Database (MGD) forms the core of the Mouse Genome Informatics (MGI) system (http://www.informatics.jax.org), a model organism database resource for the laboratory mouse. MGD provides essential integration of experimental knowledge for the mouse system with information annotated from both literature and online sources. MGD curates and presents consensus and experimental data representations of genotype (sequence) through phenotype information, including highly detailed reports about genes and gene products. Primary foci of integration are through representations of relationships among genes, sequences and phenotypes. MGD collaborates with other bioinformatics groups to curate a definitive set of information about the laboratory mouse and to build and implement the data and semantic standards that are essential for comparative genome analysis. Recent improvements in MGD discussed here include the enhancement of phenotype resources, the re-development of the International Mouse Strain Resource, IMSR, the update of mammalian orthology datasets and the electronic publication of classic books in mouse genetics.
The laboratory rat (Rattus norvegicus) is an important model for human disease, and is extensively used for studying complex traits for example in the physiological and pharmacological fields. To facilitate genetic studies like QTL mapping, genetic makers that can be easily typed, like SNPs, are essential.
A genome-wide set of 820 SNP assays was designed for the KASPar genotyping platform, which uses a technique based on allele specific oligo extension and energy transfer-based detection. SNPs were chosen to be equally spread along all chromosomes except Y and to be polymorphic between Brown Norway and SS or Wistar rat strains based on data from the rat HapMap EU project. This panel was tested on 38 rats of 34 different strains and 3 wild rats to determine the level of polymorphism and to generate a phylogenetic network to show their genetic relationships. As a proof of principle we used this panel to map an obesity trait in Zucker rats and confirmed significant linkage (LOD 122) to chromosome 5: 119–129 Mb, where the leptin receptor gene (Lepr) is located (chr5: 122 Mb).
We provide a fast and cost-effective platform for genome-wide SNP typing, which can be used for first-pass genetic mapping and association studies in a wide variety of rat strains.
The Mouse Genome Database (MGD) integrates genetic and genomic data for the mouse in order to facilitate the use of the mouse as a model system for understanding human biology and disease processes. A core component of the MGD effort is the acquisition and integration of genomic, genetic, functional and phenotypic information about mouse genes and gene products. MGD works within the broader bioinformatics community to define referential and semantic standards to facilitate data exchange between resources including the incorporation of information from the biomedical literature. MGD is also a platform for computational assessment of integrated biological data with the goal of identifying candidate genes associated with complex phenotypes. MGD is web accessible at . Recent improvements in MGD described here include the incorporation of an interactive genome browser, the enhancement of phenotype resources and the further development of functional annotation resources.
Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided.
It has become increasingly evident that the descriptions of many complex diseases are only possible by taking into account multiple influences at different physiological scales. To do this with computational models often requires the integration of several models that have overlapping scales (genes to molecules, molecules to cells, cells to tissues). The Virtual Physiological Rat (VPR) Project, a National Institute of General Medical Sciences (NIGMS) funded National Center of Systems Biology, is tasked with mechanistically describing several complex diseases and is therefore identifying methods to facilitate the process of model integration across physiological scales. In addition, the VPR has a considerable experimental component and the resultant data must be integrated into these composite multiscale models and made available to the research community. A perspective of the current state of the art in model integration and sharing along with archiving of experimental data will be presented here in the context of multiscale physiological models. It was found that current ontological, model and data repository resources and integrative software tools are sufficient to create composite models from separate existing models and the example composite model developed here exhibits emergent behavior not predicted by the separate models.
Semantic annotation; Model merging; Model repositories; Biomedical ontologies; Data dissemination; Model sharing; Mechanistic physiological models; Virtual Physiological Rat
EcoliWiki is the community annotation component of the PortEco (http://porteco.org; formerly EcoliHub) project, an online data resource that integrates information on laboratory strains of Escherichia coli, its phages, plasmids and mobile genetic elements. As one of the early adopters of the wiki approach to model organism databases, EcoliWiki was designed to not only facilitate community-driven sharing of biological knowledge about E. coli as a model organism, but also to be interoperable with other data resources. EcoliWiki content currently covers genes from five laboratory E. coli strains, 21 bacteriophage genomes, F plasmid and eight transposons. EcoliWiki integrates the Mediawiki wiki platform with other open-source software tools and in-house software development to extend how wikis can be used for model organism databases. EcoliWiki can be accessed online at http://ecoliwiki.net.
Integration of metabolic pathways resources and regulatory metabolic network models, and deploying new tools on the integrated platform can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating under a single database environment regulatory metabolic networks and existing models, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks.
PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database.
PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world.
Background: There is an increasing need to integrate phenotype measurement data across studies for both human studies and those involving model organisms. Current practices allow researchers to access only those data involved in a single experiment or multiple experiments utilizing the same protocol. Results: Three ontologies were created: Clinical Measurement Ontology, Measurement Method Ontology and Experimental Condition Ontology. These ontologies provided the framework for integration of rat phenotype data from multiple studies into a single resource as well as facilitated data integration from multiple human epidemiological studies into a centralized repository. Conclusion: An ontology based framework for phenotype measurement data affords the ability to successfully integrate vital phenotype data into critical resources, regardless of underlying technological structures allowing the user to easily query and retrieve data from multiple studies.