Environmental agents are postulated to play a critical role in the etiology of many human diseases (1–4), and chemicals are an important component of the environment. To understand the impact of environmental chemicals on human health, we have developed the Comparative Toxicogenomics Database (CTD; http://ctd.mdibl.org) as a unique tool to provide connections between chemicals, genes/proteins and diseases that may not otherwise be apparent, and to provide the basis for testable hypotheses about the mechanisms underlying the etiology of environmental diseases (5–7).
Several valuable chemical, gene and disease databases currently exist, each with its own advantages. Many public chemical databases, such as PharmGKB (8), DrugBank (9), ChemBank (10) and STITCH (11), focus on drugs and other small molecules, providing an invaluable resource for therapeutic research. Several microarray resources provide varying degrees of data for chemicals, genes and diseases. Chemical Effects in Biological Systems (CEBS) (12) is a public repository and tool for chemically relevant microarray, proteomics, clinical chemistry, hematology and histopathology data. ArrayExpress (13) and Gene Expression Omnibus (GEO) (14) are public repositories for microarray data; although both contain chemically relevant data, such data are not their primary focus. ArrayTrack (15) is an installable application and database for managing and analyzing microarray data. Currently, only users at the US Food and Drug Administration (FDA) may submit data, although non-FDA users have access to ArrayTrack's functionality. ChEBI (16) is an excellent dictionary of chemical entities, but it relies on external links to other databases for information on the biology of those chemicals. PubChem (14) is a repository of chemical substance information, compound structures and biological activities of small molecules, but it does not integrate those data with official gene symbols or disease information. OMIM (17) and HGMD (18), two of the most commonly cited disease databases, annotate genetic diseases but do not provide any associated chemical information. Some gene databases, such as GeneCards (19) and PubGene (20), have recently added gene–chemical associations, but those relationships are established by text-mining algorithms and are not reviewed or validated by professional biocurators. KEGG (21) and Reactome (22) map chemicals, genes and (in the case of KEGG) disease information to pathways, but the pathways and interactions are applied generically to orthologous proteins across all species, and it is not always clear which reference supports a given pathway relationship. CTD is distinct from these databases in three ways: (i) it focuses on environmental chemicals; (ii) it integrates curated and imported data, allowing users to explore connections among chemicals, genes and diseases; and (iii) it functions not only as a repository of information but also as a resource for generating novel hypotheses about environmental diseases and chemical actions.