ArrayTrack™, developed at the FDA's National Center for Toxicological Research (NCTR) over the past seven years, is a comprehensive microarray data management, analysis, and interpretation system. It is currently used within the FDA as a review tool for Voluntary Genomics Data Submission (VGDS). However, several factors necessitate the development of a new system based on ArrayTrack™: (a) it is built on a client-server architecture that requires expensive database software (Oracle; http://www.oracle.com); (b) it is limited to the management and analysis of data from DNA microarray experiments for human, mouse, and rat; (c) it is not a public repository and thus requires significant scale-up to accommodate data from the diverse laboratories involved in environmental health research; and (d) it was originally developed by the US FDA for regulatory purposes and thus incorporates a set of well-established tools that comprise only a small subset of those required to address the broad scope and variation of data sets encountered in environmental health research. Therefore, ebTrack is being developed as an enhancement to ArrayTrack™ to provide the significantly wider set of tools and functionality necessary for environmental bioinformatics and quantitative risk assessment.
The design of ebTrack includes individual modules for the management, analysis, and interpretation of data from single or multiple experiments that span multiple biological scales (transcriptomics, proteomics, and metabonomics); these modules can be used independently or in combination, as desired by the end user. The modular structure allows for the gradual development, implementation, and application of ebTrack as a user-oriented system for integrated systems toxicology studies. Specifically, users of ebTrack will be able to select an analysis method, apply it to the stored "omics" data, and link the analysis results to gene, protein, and pathway information for further data interpretation. Likewise, they will be able to analyze and visualize various "omics" data sets in conjunction with traditional toxicological data, so that all available data can be interpreted together in a systems biology framework.
ebTrack is being developed as a client-server system, using the free, open source PostgreSQL database engine and Java tools for the user interface, analysis, visualization, and web-based deployment. JDBC (Java Database Connectivity) is used for querying the database, since it facilitates cross-platform deployment as well as integration with other databases, including Oracle. Furthermore, the use of Java tools in ebTrack allows for direct integration with other environmental tools such as the Modeling ENvironment for TOtal Risk studies (MENTOR [5]), which provides an open library of computational tools for exposure and dose modeling, and the DOse Response INformation ANalysis system (DORIAN [7]), which is under development as a toolbox for modeling processes in the sequence from dose to adverse health outcome. Both MENTOR and DORIAN are based on Matlab, which provides direct interfaces for invoking functions written in other languages such as C, C++, FORTRAN, Java, and Perl.
A typical deployment of ebTrack involves the following modules: (a) the database module, which stores experiment information (e.g. gene expression microarray data or tandem mass spectrometry data) in accordance with standard protocols; (b) the analysis tools module, which provides tools for analysis, visualization, and knowledge discovery; and (c) the functional data module, which provides relevant information for data interpretation. The overall architecture and design of ebTrack are similar to those of ArrayTrack™, and the design is based on full integration of the above modules to facilitate consistent analysis of diverse toxicogenomic data sets for environmental health risk analysis. These three major modules are described below:
(1) Database Module
This module supports toxicogenomics data in various standard guidelines and formats such as the MIAME (Minimum Information About a Microarray Experiment [8]) and MAGE-ML (Microarray Gene Expression Markup Language [9]) standards. This module is designed to accommodate a wide set of experimental data relevant to environmental toxicology and to facilitate easy exchange of data with other public repositories such as ArrayExpress, Gene Expression Omnibus (GEO), ArrayTrack™, and Chemical Effects in Biological Systems (CEBS) [3
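As a rough illustration of how a database module of this kind might organize experiment, sample, and measurement records, the following minimal sketch builds a small relational schema and runs one query. Everything in it is an assumption made for illustration: the table and column names are hypothetical and are not taken from ebTrack, and an in-memory SQLite database stands in for PostgreSQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical, simplified schema. Names are illustrative only and are not
# the ebTrack schema; SQLite (in memory) stands in for PostgreSQL.
SCHEMA = """
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    title         TEXT NOT NULL,
    organism      TEXT NOT NULL,
    platform      TEXT NOT NULL
);
CREATE TABLE sample (
    sample_id     INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
    treatment     TEXT NOT NULL,
    time_point_h  REAL
);
CREATE TABLE expression_value (
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    gene_id   TEXT NOT NULL,
    value     REAL NOT NULL,
    PRIMARY KEY (sample_id, gene_id)
);
"""


def build_demo_db():
    """Create the schema and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO experiment VALUES (1, 'Demo study', 'mouse', 'microarray')"
    )
    conn.executemany(
        "INSERT INTO sample VALUES (?, 1, ?, ?)",
        [(1, "control", 24.0), (2, "treated", 24.0)],
    )
    conn.executemany(
        "INSERT INTO expression_value VALUES (?, ?, ?)",
        [(1, "GeneA", 7.1), (2, "GeneA", 9.4), (1, "GeneB", 5.0), (2, "GeneB", 5.2)],
    )
    conn.commit()
    return conn


def values_for_gene(conn, gene_id):
    """Return (treatment, value) pairs for one gene, ordered by sample."""
    rows = conn.execute(
        """SELECT s.treatment, v.value
           FROM expression_value v
           JOIN sample s ON s.sample_id = v.sample_id
           WHERE v.gene_id = ?
           ORDER BY s.sample_id""",
        (gene_id,),
    )
    return rows.fetchall()
```

Keeping measurements in a separate table keyed by sample and gene is one common way to let the same structure hold data from different platforms; a MIAME-oriented design would carry considerably more experiment metadata than this sketch does.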
(2) Analysis Tools Module
To meet the variety of analysis needs in environmental health research, ebTrack is designed as an open architecture that incorporates diverse analysis tools from different sources. In addition to the analysis tools supported by ArrayTrack™, this module contains interfaces to public bioinformatics tools and resources such as R/Bioconductor [12]. As an example, the Significance Analysis of Microarrays (SAM [13]), a widely used method for gene expression data analysis, was implemented as an ebTrack module through a customized interface for running SAM in ebTrack. This design also allows straightforward incorporation of other Bioconductor analysis tools into ebTrack while providing a consistent user interface. This module also contains tools for exporting the data in ebTrack to an R environment for analysis with user-selected custom tools in R. Furthermore, interfaces are being developed to use these tools with the MENTOR and DORIAN systems, which provide modules for mechanistically modeling various processes in the source-to-dose-to-outcome continuum (see, e.g., [7
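To make the kind of statistic computed by SAM more concrete, the sketch below calculates a SAM-style relative difference for two groups of samples, d = (mean of treated - mean of control) / (s + s0), where s is the pooled standard error and s0 is a small positive constant that stabilizes genes with very low variance. This is an illustrative simplification, not the ebTrack interface: the function names and the fixed default for `s0` are assumptions, and the full SAM procedure additionally estimates s0 from the data and uses permutations to estimate the false discovery rate.

```python
from math import sqrt
from statistics import mean


def sam_d_statistic(treated, control, s0=0.1):
    """SAM-style relative difference for one gene.

    treated, control: lists of expression values (each group needs at
    least one value and the two groups together at least three).
    s0: fudge factor; fixed here as an assumption, whereas SAM chooses
    it from the data.
    """
    n1, n2 = len(treated), len(control)
    m1, m2 = mean(treated), mean(control)
    # Pooled sum of squared deviations across both groups.
    ss = sum((x - m1) ** 2 for x in treated) + sum((x - m2) ** 2 for x in control)
    s = sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)


def rank_genes(expression, s0=0.1):
    """Rank genes by absolute d statistic, largest first.

    expression: dict mapping gene name -> (treated values, control values).
    """
    scores = {g: sam_d_statistic(t, c, s0) for g, (t, c) in expression.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

In practice the ranked statistics would be compared against their permutation distribution to choose a significance cutoff; that step is omitted here.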
(3) Functional Data Module
Large amounts of public-domain annotation data for different organisms (e.g. human, rat, mouse, canine, and zebrafish) are being compiled through local mirroring of publicly available data; the data are stored in standard formats to facilitate rapid analysis and easy interoperability with other tools and databases. ebTrack also contains provisions for connecting to various proprietary databases for interpreting various "omics" studies. An Environmental Bioinformatics Knowledge Base (ebKB; http://www.ebkb.org), a compendium of computational tools, databases, and literature information, is being developed as part of this effort by the environmental bioinformatics and Computational Toxicology Center (ebCTC; http://ccl.rutgers.edu/ebCTC/publications.html) to support enhanced interpretation of various toxicological and biological data sets in ebTrack.
A workflow involving the important steps for the analysis of microarray data using ebTrack is presented in Figure . This approach has been applied in a case study focusing on gene expression profiles of mouse skin after topical application of a single high dose of sulfur mustard (SM). SM is a chemical warfare agent that can penetrate human skin, causing extensive blistering at the dermal-epidermal junction after a latency period of several hours. Although the toxic effects of SM have been well characterized, the precise mechanisms responsible for SM-induced skin injury are still unknown. In this study, the effects of SM treatment on mouse skin were examined at multiple time points to characterize the extended time response. The study was also used to evaluate the efficacy of candidate compounds for inhibiting the adverse effects of SM. The analysis consisted of (a) identifying a list of differentially expressed genes using the volcano plot (p < 0.05 and fold-change > 1.5) and the SAM algorithm, (b) mapping those genes to pathways in KEGG (Kyoto Encyclopedia of Genes and Genomes [14]) and Ingenuity Pathway Analysis (http://www.ingenuity.com; see, e.g. [15]), and (c) determining significant pathways using Fisher's exact test. The results indicated that cytokine-cytokine receptor interaction, cell adhesion molecules, and hematopoietic cell lineage are common significant pathways in mouse skin treated with SM. Details on the study are available in Gerecke et al. (2008; unpublished manuscript, revisions to the manuscript submitted).
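Steps (a) and (c) of the analysis described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the fold-change cutoff is assumed to apply in both directions (up- and down-regulation), and the pathway test is assumed to be the one-sided form of Fisher's exact test commonly used for over-representation analysis.

```python
from math import comb, log2


def volcano_filter(genes, p_cutoff=0.05, fc_cutoff=1.5):
    """Select genes passing both the p-value and fold-change cutoffs.

    genes: dict mapping gene name -> (fold-change ratio, p-value).
    The fold-change cutoff is applied symmetrically on the log2 scale,
    so a ratio of 1/1.5 counts the same as 1.5 (an assumption).
    """
    threshold = log2(fc_cutoff)
    return [
        g for g, (fc, p) in genes.items()
        if p < p_cutoff and abs(log2(fc)) > threshold
    ]


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for pathway over-representation.

    N: number of genes in the background (e.g. all genes on the array)
    K: number of background genes annotated to the pathway
    n: number of differentially expressed genes
    k: number of differentially expressed genes in the pathway
    Returns P(X >= k) for X following the hypergeometric distribution.
    """
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For instance, `fisher_enrichment_p(k, n, K, N)` would be evaluated once per pathway, with the background size `N` fixed by the platform; when many pathways are tested, the resulting p-values would normally also be adjusted for multiple testing.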
Steps involved in the application of ebTrack for microarray data analysis