CEBS (Chemical Effects in Biological Systems) is an integrated public repository for toxicogenomics data, including study design and timeline, clinical chemistry and histopathology findings, and microarray and proteomics data. CEBS contains data derived from studies of chemicals and of genetic alterations, and is compatible with clinical and environmental studies. CEBS is designed to let the user query the data by study conditions and subject responses and then, having identified an appropriate set of subjects, move to the microarray module of CEBS to carry out gene signature and pathway analysis.
Scope of CEBS: CEBS currently holds 22 studies of rats, four studies of mice and one study of Caenorhabditis elegans. CEBS can also accommodate data from studies of human subjects. Toxicogenomics studies currently in CEBS comprise over 4000 microarray hybridizations and 75 two-dimensional (2D) gel images annotated with protein identifications performed by MALDI and MS/MS. CEBS contains raw microarray data collected in accordance with MIAME guidelines and provides tools for data selection, pre-processing and analysis that produce annotated lists of genes of interest. Clinical chemistry and histopathology findings from over 1500 animals are also included in CEBS.
CEBS/BID: The BID (Biomedical Investigation Database) is another component of the CEBS system. BID is a relational database used to load and curate study data prior to export to CEBS, and also to capture and display novel data types, such as PCR data, or additional fields of interest, including those defined by the HESI Toxicogenomics Committee (in preparation). BID has been shared with Health Canada and the US Environmental Protection Agency. CEBS is available at http://cebs.niehs.nih.gov. BID can be accessed via the user interface at https://dir-apps.niehs.nih.gov/arc/. Information on requesting a copy of BID and on depositing data into CEBS or BID is available at http://www.niehs.nih.gov/cebs-df/.
CEBS (Chemical Effects in Biological Systems) is a public repository for toxicogenomics data developed by the National Center for Toxicogenomics (NCT) within the National Institute of Environmental Health Sciences (NIEHS). Development of CEBS began in 2002 (1) and focused first on capturing microarray and proteomics data. The CEBS SysBio Object Model (2), based on the MIAME (3) and MIAPE (4) standards, was used for this portion of the development. CEBS1 was released in August 2003, and development of CEBS2 followed. The aim of the second stage of development was to integrate study design and toxicological assay data with the ’omics data captured in CEBS. To this end, the CEBS SysTox Object Model (5) and the CEBS Data Dictionary (CEBS-DD) (6) were developed to permit accurate management of study data. CEBS2 was released in November 2006.
As of July 2007, there are 27 toxicogenomics studies in CEBS. A ‘study’ refers to an observational or perturbational experiment carried out over a defined timeline to understand a biological system, address a scientific question and/or generate hypotheses. Of the 27 studies, 22 are of rat, 4 are of mouse and 1 is of Caenorhabditis elegans. Twenty-six of the studies have associated microarray data, and one has proteomics data. Companies that have published data in CEBS include Iconix Biosciences (http://www.iconixpharm.com/), Johnson & Johnson (7,8), Pfizer Inc. (9) and Sankyo Co., Ltd (10). Other data in CEBS have been submitted by researchers at the National Cancer Institute (11,12), the University of Tennessee (13–15), University College London (16,17), the HESI Toxicogenomics Committee (18,19) and the Toxicogenomics Research Consortium (submitted). Additional data have been deposited in CEBS from in-house studies carried out at the NIEHS (20) and at the National Toxicology Program (NTP) (21,22).
CEBS can store data from studies of laboratory animals, cultured cells or humans. Most studies in CEBS contain observations or measurements made on the study subjects and on specimens, such as blood or tissue sections, derived from these subjects. The objective of CEBS is to permit the user to integrate various data types and studies. The CEBS user can select groups of subjects drawn from different studies, based on subject responses or study conditions. Once the subjects are selected, any associated microarray data can be analyzed to produce lists of annotated genes that can shed light on the biological and toxicological processes occurring in the subjects.
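The select-then-analyze pattern described above can be sketched as a filter over subject records pooled from several studies, followed by collecting the hybridizations linked to the chosen subjects. This is a minimal illustration under stated assumptions, not the CEBS query engine: the `Subject` fields, the `select_subjects` and `linked_hybridizations` helpers, and the example records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    """Illustrative subject record; these field names are hypothetical, not the CEBS schema."""
    study_id: str
    subject_id: str
    species: str
    stressor: str  # study condition, e.g. the test agent
    responses: dict = field(default_factory=dict)  # measure name -> value
    hybridization_ids: list = field(default_factory=list)


def select_subjects(subjects, species=None, stressor=None, min_response=None):
    """Select subjects from any study by study conditions and, optionally, a response threshold.

    min_response is an optional (measure, threshold) pair; subjects lacking the measure are excluded.
    """
    selected = []
    for s in subjects:
        if species is not None and s.species != species:
            continue
        if stressor is not None and s.stressor != stressor:
            continue
        if min_response is not None:
            measure, threshold = min_response
            value = s.responses.get(measure)
            if value is None or value < threshold:
                continue
        selected.append(s)
    return selected


def linked_hybridizations(subjects):
    """Collect the microarray hybridizations associated with the selected subjects."""
    return [h for s in subjects for h in s.hybridization_ids]


# Made-up example records drawn from two hypothetical studies.
SUBJECTS = [
    Subject("S1", "r1", "rat", "acetaminophen", {"ALT": 310.0}, ["H1"]),
    Subject("S1", "r2", "rat", "acetaminophen", {"ALT": 45.0}, ["H2"]),
    Subject("S2", "r9", "rat", "acetaminophen", {"ALT": 250.0}, ["H7", "H8"]),
    Subject("S2", "m4", "mouse", "acetaminophen", {"ALT": 400.0}, ["H9"]),
]

if __name__ == "__main__":
    hits = select_subjects(SUBJECTS, species="rat", stressor="acetaminophen",
                           min_response=("ALT", 200.0))
    print(linked_hybridizations(hits))  # ['H1', 'H7', 'H8']
```

Here the rats from two different studies that crossed the (hypothetical) ALT threshold are pooled, and only their hybridizations are passed on for gene-level analysis.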
CEBS is the first public repository designed to integrate toxicological, histopathological and other biological measures with ’omics data. A number of other databases, for instance the Gene Expression Omnibus (GEO) (23,24), capture microarray data and information about the sample treatment. The ArrayExpress database (25) captures observations and measures taken on the study subject concurrently with preparation of the tissue for microarray analysis (26). A distinguishing feature of CEBS is that the data captured from toxicogenomics studies include observations made of the subject throughout the study timeline, potentially both before and after a specimen was taken for toxicological, histopathological or other biological analysis. Because the protocols used in the study and associated analyses are described with controlled vocabularies rather than free text, these data are available for effective filtering and query. These protocols, measures and temporal events are useful in anchoring the transcriptomics or proteomics profile displayed by the specimen within the time- and dose-dependent biological responses seen in the study. Thus CEBS supports phenotypic anchoring (27–31), defined as the linking of microarray or proteomics data with a pathophysiological phenotype.
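Because observations are recorded along the whole study timeline, a specimen's ’omics profile can be anchored to the phenotype measured just before and just after it was collected. The sketch below shows one way to do that lookup with a binary search over study days. It is an illustration only: the `anchor_specimen` function, the (study_day, value) layout and the ALT values are hypothetical and not taken from CEBS.

```python
from bisect import bisect_right


def anchor_specimen(observations, specimen_day):
    """Return the observations bracketing a specimen collection day.

    observations: (study_day, value) tuples for one subject and one measure, sorted by day.
    Returns (before, after): the latest observation on or before specimen_day and the
    earliest one after it; either is None if no such observation exists.
    """
    days = [day for day, _ in observations]
    i = bisect_right(days, specimen_day)  # first index whose day is > specimen_day
    before = observations[i - 1] if i > 0 else None
    after = observations[i] if i < len(observations) else None
    return before, after


# Hypothetical serum ALT values (U/L) for one subject across the study timeline.
ALT_SERIES = [(0, 40.0), (1, 180.0), (3, 310.0), (7, 60.0)]

if __name__ == "__main__":
    # A specimen collected on day 3 is anchored to the peak value and the later recovery.
    print(anchor_specimen(ALT_SERIES, 3))  # ((3, 310.0), (7, 60.0))
```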
CEBS includes both microarray and proteomics data. Microarray data in CEBS include 965 hybridizations to Affymetrix arrays, 924 to Agilent arrays and 1810 to custom-format microarrays. Data can be combined within a given microarray platform for cross-study analysis in CEBS. Thus, data from rats exposed to any of a number of different chemical agents can be selected using a CEBS query tool, and the microarray data can be compared to identify differentially responsive gene products. The proteomics data include downloadable 2D gel images, and the MALDI and MS/MS spectra used to identify peptide spots from the gels. Intensity levels of both identified and unidentified spots are also available in CEBS and can be browsed for association with time- and dose-response.
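The constraint that cross-study analysis happens within one microarray platform can be illustrated by grouping hybridizations by platform and refusing to pool a mixed selection. This is a hedged sketch, not CEBS code: `group_by_platform`, `pool_for_analysis` and the placeholder platform labels are hypothetical.

```python
from collections import defaultdict


def group_by_platform(hybridizations):
    """Group (hybridization_id, study_id, platform) records by array platform."""
    groups = defaultdict(list)
    for hyb_id, study_id, platform in hybridizations:
        groups[platform].append((hyb_id, study_id))
    return dict(groups)


def pool_for_analysis(hybridizations):
    """Return the hybridization IDs to analyze together, refusing mixed platforms.

    In this sketch, raw data from different array designs are never pooled directly.
    """
    platforms = {platform for _, _, platform in hybridizations}
    if len(platforms) != 1:
        raise ValueError(f"expected one platform, found {sorted(platforms)}")
    return [hyb_id for hyb_id, _, _ in hybridizations]


# Hypothetical records from two studies; platform labels are placeholders.
HYBRIDIZATIONS = [
    ("H1", "S1", "platform_A"),
    ("H2", "S1", "platform_B"),
    ("H3", "S2", "platform_A"),
]

if __name__ == "__main__":
    print(group_by_platform(HYBRIDIZATIONS)["platform_A"])  # [('H1', 'S1'), ('H3', 'S2')]
    print(pool_for_analysis([h for h in HYBRIDIZATIONS if h[2] == "platform_A"]))  # ['H1', 'H3']
```

Hybridizations H1 and H3 come from different studies but share a platform, so they can be analyzed together; adding H2 would raise an error.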
While the transcriptomics data in CEBS are captured using MIAME guidelines, at this time there is no widely accepted public standard for the exchange and capture of study design and toxicity assay data. A number of efforts are underway to create such a standard. Recently, a consensus on the minimal information to include was reported (32). In addition, a format for data exchange is being developed by the Standard for Exchange of Non-clinical Data (SEND) Consortium (http://www.cdisc.org/standards/index.html), and an ontology for describing a biomedical investigation, which would include a toxicology study, is under development by the OBI (Ontology for Biomedical Investigations) Working Group (https://wiki.cbil.upenn.edu/obiwiki/index.php/HomePage). CEBS will support these standards as they are developed.
Most institutions engaged in toxicology and toxicogenomics studies use in-house data repositories that are tailored to the study designs and regimens used in the institution [cf. dbZach (33) and EDGE (34)]. In contrast, because it is a public resource, CEBS must be able to manage data from a variety of sources, reflecting a wide range of experimental organisms and study designs. Additionally, CEBS can manage data from experimental animals, from in vitro cells in culture, from human studies and from experiments with model organisms such as C. elegans. Each depositor to CEBS to date has used a different experimental design, reflecting different means of caring for the subjects, different treatment regimens, different measurements and so forth.
The problems associated with managing various data streams have been addressed by the creation of BID, the Biomedical Investigation Database, which is based on the CEBS data dictionary (CEBS-DD). The CEBS-DD describes the incoming data based on alignment with public standards and proprietary data formats. At present, data submissions to CEBS are handled by collaboration between the depositor and the CEBS curation staff, because, to date, each depositor has used a different format. Information about data deposition is available at the CEBS Development Forum (http://www.niehs.nih.gov/research/resources/databases/cebs/forum).
BID is built in Oracle with a ColdFusion interface and, because it is used to load study data into CEBS, it contains essentially the same content as CEBS. In addition, BID can easily be modified to contain additional data fields, as requested by users. Thus, as part of the collaboration with the HESI Toxicogenomics Committee, BID was extended to include PCR data and to capture additional fields describing subject handling during the study. The BID interface was also extended to permit users to query these fields. These data will be available to the public via BID once the Committee releases them. At present, BID is a data management tool that provides data access and download capabilities.
The architecture of CEBS is shown in Figure 1, which displays the relationships between CEBS components: the CEBS SysBio and SysTox Object Models; the Oracle databases handling study design, assay data and metadata for microarray and proteomics data; and the caBIO annotation engine developed by the National Cancer Institute's Center for Bioinformatics (NCICB). Microarray data files are stored within CEBS as netCDF files. Access to the data in CEBS is via the SysTox Browser (http://cebs.niehs.nih.gov/). At the end of 2006, CEBS was moved to the NIEHS from its developmental location at SAIC. Before CEBS was implemented at NIEHS, a series of stress tests was run to determine whether the infrastructure could support the expected workloads and to identify any potential bottlenecks. These tests simulated up to 100 concurrent users in various typical functional scenarios.
The CEBS user can follow various workflows within CEBS. These include: Show All Studies; Search by Study Characteristics; Search by Subject Characteristics; Browse Proteomics Data; Analyze Microarray Data Workflow; Annotate Gene List. Users can also combine elements of these workflows to customize their queries and exploration of the data in CEBS, as diagrammed in Figure 2. Additionally, users can download data and annotation in different formats at various points throughout the workflows.
‘Show All Studies Workflow’ permits the user to see a list of all studies and investigations in CEBS (Figure 3A). An investigation refers to a self-contained scientific enquiry, which can be composed of several studies. A study is, as defined earlier, an observational or perturbational experiment carried out over a defined timeline to understand a biological system, address a scientific question and/or generate hypotheses. The data type(s) associated with each study in CEBS are shown with icons next to the title, and the user can quickly retrieve any data associated with the study using links on the page. The Study Timeline, Study Details and Study Group Grid are also accessible from this page. The Study Timeline provides a graphical representation of the timeline of the study, showing when treatment was applied, when observations and husbandry occurred, and other important events during the study (Figure 3B). The Study Group Grid provides a rapid overview of the study subjects and permits the user to unambiguously identify relevant groups of biological replicates for analysis and comparison (Figure 3C).
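The idea behind the Study Group Grid, identifying groups of biological replicates, can be sketched by keying subjects on the treatment conditions they share and counting replicates per cell. This is an assumption-laden illustration, not the CEBS implementation: it assumes replicates are defined by dose and time point only, and the functions and example design are hypothetical.

```python
from collections import defaultdict


def study_group_grid(subjects):
    """Group subjects into replicate groups keyed by (dose, time point).

    subjects: iterable of (subject_id, dose, time_point) tuples from one study.
    Subjects sharing a dose and a time point are treated as biological replicates.
    """
    grid = defaultdict(list)
    for subject_id, dose, time_point in subjects:
        grid[(dose, time_point)].append(subject_id)
    return {key: sorted(ids) for key, ids in sorted(grid.items())}


def render_counts(grid):
    """Render the number of replicates per cell: one row per dose, one column per time point."""
    doses = sorted({dose for dose, _ in grid})
    times = sorted({time for _, time in grid})
    lines = ["dose/time\t" + "\t".join(str(t) for t in times)]
    for dose in doses:
        counts = [str(len(grid.get((dose, t), []))) for t in times]
        lines.append(f"{dose}\t" + "\t".join(counts))
    return "\n".join(lines)


# Hypothetical design: doses in mg/kg, time points in hours after dosing.
SUBJECTS = [("r1", 0, 6), ("r2", 0, 6), ("r3", 50, 6), ("r4", 50, 6), ("r5", 50, 24)]

if __name__ == "__main__":
    print(render_counts(study_group_grid(SUBJECTS)))
```

An empty cell (count 0) makes a missing control group at a given time point visible at a glance, which is the kind of check the grid is meant to support before comparing groups.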
The architecture of BID is given in Figure 4. BID uses a workflow design, similar to the original MIAME/MAGE design. The BID database dependencies are set up to have the study defined prior to the subjects, and subjects and groups defined prior to specimens, and specimens defined prior to deposition of any associated data. Similarly, the microarray data storage portion of BID has been modified from the ArrayTrack (35) schema to model a hybridization workflow, capturing the links from a biomaterial to RNA to labeled RNA to hybridization to scanned data.
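The load-order dependencies in BID (study before subjects, subjects and groups before specimens, specimens before data, and the biomaterial to RNA to labeled RNA to hybridization to scan chain) can be sketched as a loader that rejects any record whose parent has not been loaded yet. This is a minimal sketch, not BID's schema: the record type names, the `PARENT` table, `Loader` and `first_error` are hypothetical, and it assumes each subject belongs to a group.

```python
class DependencyError(Exception):
    """Raised when a record is loaded before the record it depends on."""


# Hypothetical parent rules modeled on the BID ordering:
# study -> group -> subject -> specimen, then the hybridization chain.
PARENT = {
    "study": None,
    "group": "study",
    "subject": "group",
    "specimen": "subject",
    "rna": "specimen",
    "labeled_rna": "rna",
    "hybridization": "labeled_rna",
    "scan": "hybridization",
}


class Loader:
    def __init__(self):
        self.records = {}  # record id -> record type

    def load(self, record_type, record_id, parent_id=None):
        """Store a record only if its required parent has already been loaded."""
        required = PARENT[record_type]
        if required is not None and self.records.get(parent_id) != required:
            raise DependencyError(
                f"{record_type} {record_id!r} needs an existing {required}, got {parent_id!r}")
        self.records[record_id] = record_type


def first_error(steps):
    """Load (record_type, record_id, parent_id) steps in order; return the first error or None."""
    loader = Loader()
    try:
        for step in steps:
            loader.load(*step)
    except DependencyError as exc:
        return str(exc)
    return None


STEPS = [
    ("study", "st1", None),
    ("group", "g1", "st1"),
    ("subject", "r1", "g1"),
    ("specimen", "liver1", "r1"),
    ("rna", "rna1", "liver1"),
    ("labeled_rna", "lab1", "rna1"),
    ("hybridization", "hyb1", "lab1"),
    ("scan", "scan1", "hyb1"),
]
```

Loading `STEPS` in order succeeds; attempting to load a specimen before its subject exists fails with a message naming the missing parent.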
Most of the work in BID has been in the area of study design and phenotypic data (primarily clinical chemistry and histopathology). Each subject type requires a spectrum of characteristics and protocols specific to that subject type. For example, if the subject is a lab animal, the protocols are husbandry and euthanasia, whereas if the subject is a cell culture, the protocols are culture and harvest. Characteristics collected for a lab animal might include strain, sex, age and gut microflora, while a cell culture might be characterized by cell cycle time, number of passages, ploidy and so on. This database design allows the user to focus quickly on relevant details, both in data deposition and in querying. Similarly, stressor characteristics and protocols are specific to chemical, genetic and environmental stressors. A recent publication describes a checklist for the minimum information needed to interpret and exchange toxicology data, and BID adheres to this standard (32). In addition, the Ontology for Biomedical Investigations (OBI) Working Group (https://wiki.cbil.upenn.edu/obiwiki/index.php/HomePage) is developing an ontology to be used to annotate a biomedical investigation, as would be needed for automatic deposition of data into CEBS.
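The subject-type-specific design can be sketched as a template per subject type that lists its protocols and allowed characteristics, with a check that reports missing protocols and characteristics that do not belong to the type. The template contents follow the examples in the text, but the `SUBJECT_TEMPLATES` structure, the key names and `check_subject` are hypothetical illustrations, not BID tables.

```python
# Hypothetical templates: which protocols and characteristics apply to each subject type.
SUBJECT_TEMPLATES = {
    "lab_animal": {
        "protocols": {"husbandry", "euthanasia"},
        "characteristics": {"strain", "sex", "age", "gut_microflora"},
    },
    "cell_culture": {
        "protocols": {"culture", "harvest"},
        "characteristics": {"cell_cycle_time", "passages", "ploidy"},
    },
}


def check_subject(subject_type, protocols, characteristics):
    """Return a list of problems for a subject record (empty if it fits its template).

    Missing protocols are reported, as are characteristics that do not belong to the type.
    Characteristics are optional, so absent ones are not reported.
    """
    template = SUBJECT_TEMPLATES[subject_type]
    problems = [f"missing protocol: {p}" for p in sorted(template["protocols"] - set(protocols))]
    problems += [f"unexpected characteristic: {c}"
                 for c in sorted(set(characteristics) - template["characteristics"])]
    return problems


if __name__ == "__main__":
    print(check_subject("cell_culture", ["culture"], {"strain": "F344"}))
```

Keying the checks on subject type means a depositor describing a cell culture is never prompted for husbandry details, which mirrors the focus on relevant fields described above.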
CEBS2 and BID have been released and can be accessed at http://cebs.niehs.nih.gov and https://dir-apps.niehs.nih.gov/arc/. Going forward, we hope to integrate the best features of CEBS2 and BID and to concentrate on three new areas as we develop CEBS3: facilitating data loading and exchange, adding novel data modules and enhancing analysis. CEBS3 will be made available to the public after development and testing.
The architecture of CEBS3 will be streamlined to facilitate data entry and retrieval, and the workflow dependencies will become optional, so that data from observational and genetic studies may be more easily entered.
Currently, the biggest obstacle to increasing the content of CEBS is the distinct format used by each depositor, which reflects the lack of a common standard for formatting study data. Efforts to address this need are led by the OBI Working Group and the CDISC/SEND projects. Once the data fields and format are settled, a minimum checklist for a study must be developed. Towards this end, the MIBBI (Minimal Information about Biological and Biomedical Investigations) group has been formed, and a recent publication describing a checklist for a toxicology study has been written (32). Exchange formats for microarray data include the SOFT format (36), MAGE-tab (37) and MAGE-ML (38). BIO-tab is under development at the EBI to exchange functional genomics data (http://nebc.nox.ac.uk/workshops/nebc_ebi/sansonne_biomap_info.pdf). We are developing a format for study data termed SIFT (Simple Investigation Formatted Text), and will collaborate with BIO-tab as it is developed. Ideally, applications will be written to permit a user to create a SIFT file for a study and its associated data, verify the format, validate the contents and then transfer the file to NIEHS for automated loading into CEBS3, relying on current microarray data formats for the associated ’omics data.
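The envisioned verify-then-validate steps for a SIFT file can be sketched for a simple tab-delimited layout. The actual SIFT layout is not described here, so everything below is an assumption: the column names, the rule that dose and time must be non-negative numbers, and the `verify_format`, `validate_contents` and `check_sift` helpers are hypothetical.

```python
import csv
import io

# Hypothetical SIFT columns for illustration; the real SIFT layout is not defined here.
REQUIRED_COLUMNS = ["study_id", "subject_id", "stressor", "dose_mg_per_kg", "time_h"]
NUMERIC_COLUMNS = ["dose_mg_per_kg", "time_h"]


def verify_format(text):
    """Format check: required header columns present and every row as wide as the header."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["empty file"]
    header = rows[0]
    errors = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            errors.append(f"line {line_no}: expected {len(header)} fields, found {len(row)}")
    return errors


def validate_contents(text):
    """Content check: numeric columns parse as non-negative numbers."""
    errors = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for line_no, record in enumerate(reader, start=2):
        for col in NUMERIC_COLUMNS:
            try:
                if float(record[col]) < 0:
                    errors.append(f"line {line_no}: {col} is negative")
            except (KeyError, TypeError, ValueError):
                errors.append(f"line {line_no}: {col} is not a number")
    return errors


def check_sift(text):
    """Run the format check first; validate contents only if the format is sound."""
    return verify_format(text) or validate_contents(text)


HEADER = "\t".join(REQUIRED_COLUMNS) + "\n"

if __name__ == "__main__":
    print(check_sift(HEADER + "S1\tr1\tacetaminophen\t-5\tday1\n"))
```

Separating the two checks lets a depositor fix structural problems before any value-level errors are reported, which matches the verify-then-validate order in the text.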
It is of interest to expand CEBS so that it can be searched on the basis of chemical information. Towards this end, we are collaborating with the NTP to permit views of short-term testing results obtained by the NTP Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM) and of high-throughput screening results, and with the EPA to access a public chemical structure viewer. Additionally, we hope to use the standards developed by the MSI (Metabolomics Standards Initiative) to house experimental metabolomics data.
At the moment, CEBS permits the user to identify genes with significantly altered transcript levels and to combine subjects from different studies if they were tested using the same microarray platform. However, the user must begin each analysis with raw data from the entire microarray. We plan to add analytical tools, for example ANOVA and unsupervised pattern finding, and also to store normalized data values so that the user does not need to re-analyze the array with each query. We anticipate that this will permit integration of data across microarray platforms if the user chooses to do so.
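As an illustration of the planned ANOVA tool, the one-way ANOVA F statistic for a single gene across dose groups can be computed directly from stored normalized values, without reprocessing the raw array. This is a sketch under stated assumptions, not the CEBS analysis module: `one_way_anova_f` and the example log2 values are hypothetical, and a p-value would come from the F distribution, for example via SciPy's `scipy.stats.f_oneway`.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for one gene's normalized values across groups.

    groups: list of lists, e.g. normalized log2 intensities per dose group.
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical normalized log2 values for one gene in three dose groups.
DOSE_GROUPS = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

if __name__ == "__main__":
    print(one_way_anova_f(DOSE_GROUPS))  # (27.0, 2, 6)
```

For these values the group means are 2, 5 and 8, giving a between-group sum of squares of 54 and a within-group sum of squares of 6, so F = (54/2)/(6/6) = 27 on (2, 6) degrees of freedom.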
CEBS is a public repository integrating data describing study timeline and design, histopathological and biological measures, and ’omics data. This permits the user to anchor ’omics data in the unfolding biological response pattern captured in the study data in CEBS. Users can access ’omics data in CEBS either directly or by way of the search and query workflows, using characteristics of studies or subjects to select ’omics data. To illustrate the various options available, we have posted material at the CEBS Development Forum (http://www.niehs.nih.gov/research/resources/databases/cebs/forum) and as Supplementary Data here. CEBS integrates data from a number of contributors, making it possible to combine disparate data and develop comprehensive answers to questions posed of the database. The BID data management tool is used to house data prior to loading into CEBS, and to expand the data management capabilities of the CEBS/BID system by permitting the user to deposit novel data types and attributes. The BID user interface permits users to access the data in BID in a manner analogous to the search capabilities of CEBS.
We anticipate that, with the publication of CEBS2, more data will be contributed, making it possible to identify an ever-increasing number of gene signatures and mechanistic pathways and networks relevant to toxicogenomics. Instructions for CEBS contributors can be found at the CEBS Development Forum (http://www.niehs.nih.gov/research/resources/databases/cebs/forum).
We are grateful for the contributions of Sarah Bittenbender, Mark Jenkins, Tong Li, Scott McCrimmon, Sumeet Muju, Larry Schuler and Rona Zhou from the Science Applications International Corporation contract to the development of CEBS. This research was supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences. Funding to pay the Open Access publication charges for this article was provided by NIEHS Division of Intramural Research.
Conflict of interest statement. None declared.