The Comparative Toxicogenomics Database (CTD) is a public resource that promotes understanding of the interactions of environmental chemicals with gene products and their effects on human health. Biocurators at CTD manually curate a triad of chemical–gene, chemical–disease and gene–disease relationships from the literature. These core data are then integrated to construct chemical–gene–disease networks and to predict many novel relationships using different types of associated data. Since 2009, we have dramatically increased the content of CTD to 1.4 million chemical–gene–disease data points and added many features, statistical analyses and analytical tools. These include GeneComps and ChemComps (to find comparable genes and chemicals that share toxicogenomic profiles), enriched Gene Ontology terms associated with chemicals, statistically ranked chemical–disease inferences, Venn diagram tools to discover overlapping and unique attributes of any set of chemicals, genes or diseases, and enhanced gene pathway data content. Together, this wealth of expanded chemical–gene–disease data continues to help users generate testable hypotheses about the molecular mechanisms of environmental diseases. CTD is freely available at http://ctd.mdibl.org.
The environment is believed to play an important role in the etiology of many human diseases, and chemicals are an important component of the environment (1). The Comparative Toxicogenomics Database (CTD; http://ctd.mdibl.org) is a unique resource that makes connections between chemicals, gene products and diseases that may not otherwise be apparent, and provides the basis for testable hypotheses about the mechanisms underlying the etiology of environmental diseases (2–4).
Several valuable chemical databases currently exist (5). CTD is distinct in three important ways: it focuses on environmental chemicals, it manually curates and then integrates datasets to discover novel connections, and it functions as both a data repository and a tool for generating hypotheses about chemical actions and environmental diseases. The value and utility of CTD are evidenced by its being indexed at numerous other databases, including PubChem (6), PharmGKB (7), UniProt (8), T3DB (9), GAD (10) and ChemID (11), and by the inclusion of CTD’s curated information in other database products, such as STITCH (12), ToppGene (13), PhenoHM (14), Chem2Bio2RDF (15), UCSC Browser (16), WhichGenes (17), ChemSpider (http://www.chemspider.com) and RefGene (http://refgene.com). In addition, CTD datasets have been used by several independent groups for meta-analyses to derive relationships between environmental chemicals and complex human diseases (18–20). Furthermore, CTD will be included as a search option in the suite of integral databases at TOXNET, the National Library of Medicine’s portal for toxicology data (21).
We previously reported on CTD in an introductory article (22); here we report the increased data content and describe several new analytical and visualization tools and other enhancements made to CTD since 2009.
The strength of CTD data still comes from information being derived by professional biocurators who read and manually curate the peer-reviewed scientific literature. This process, albeit time-consuming, ensures that the core (triad) data, which underlie the establishment of novel relationships, are valid and accurate. Our curation paradigm, including methods and sources of controlled vocabularies and ontologies, and our integration strategy used to generate inferred relationships between datasets were described previously in detail (22). To streamline the curation process, CTD recently designed a text-mining application that efficiently prioritizes the vast toxicology literature (23), yet information is still extracted manually from articles. CTD curated data are then combined and integrated with other CTD data as well as annotations from external sources to produce a plethora of novel inferred relationships.
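The integration step can be illustrated with a minimal sketch. It assumes that an inferred chemical–disease relationship arises whenever a curated chemical–gene interaction and a curated gene–disease relationship share the same gene; the full strategy is described in (22), and the function name and toy identifiers below are hypothetical rather than part of CTD.

```python
from collections import defaultdict


def infer_chemical_disease(chem_gene, gene_disease):
    """Return {(chemical, disease): set of linking genes}.

    Assumption: a chemical and a disease are linked by inference when
    they share at least one curated gene.
    """
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    inferred = defaultdict(set)
    for chemical, gene in chem_gene:
        # .get() avoids creating empty entries for genes with no disease
        for disease in diseases_by_gene.get(gene, ()):
            inferred[(chemical, disease)].add(gene)
    return dict(inferred)


# Toy, hypothetical curated pairs (not real CTD records)
chem_gene = [
    ("chemical_A", "GENE1"),
    ("chemical_A", "GENE2"),
    ("chemical_B", "GENE2"),
]
gene_disease = [
    ("GENE1", "disease_X"),
    ("GENE2", "disease_X"),
    ("GENE2", "disease_Y"),
]

inferred = infer_chemical_disease(chem_gene, gene_disease)
for (chemical, disease), genes in sorted(inferred.items()):
    print(chemical, disease, sorted(genes))
```

Keeping the set of linking genes for each inferred pair, rather than only the pair itself, preserves the evidence a user would need to evaluate the hypothesis.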
In July 2010, CTD contained over 240,300 molecular interactions between 5900 unique chemicals and 17,300 gene products, 11,500 direct gene–disease relationships and 8500 direct chemical–disease relationships extracted from over 21,600 publications (Table 1). Integration of these data generates 886,600 inferred gene–disease relationships and 246,600 inferred chemical–disease relationships. In total, ~1.4 million chemical–gene–disease data connections are now available for exploration and analysis, representing a 2.5-fold increase in the content since our original description.
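As a quick check of the total, the sketch below sums the three curated and two inferred counts quoted above. It assumes that the ~1.4 million figure is the sum of these five categories; because the text gives several counts as "over" a rounded value, the result is approximate.

```python
# Counts as quoted in the text (rounded, so the total is approximate)
curated = {
    "chemical-gene interactions": 240_300,
    "gene-disease relationships": 11_500,
    "chemical-disease relationships": 8_500,
}
inferred_counts = {
    "gene-disease relationships": 886_600,
    "chemical-disease relationships": 246_600,
}

total = sum(curated.values()) + sum(inferred_counts.values())
print(f"{total:,}")  # 1,393,500, i.e. ~1.4 million
```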
In addition to the increased content, CTD has also expanded its external data sources. We now include pathway data from Reactome (24), in addition to the previously included KEGG pathways (25). Reactome and KEGG databases annotate genes into pathways, and this information is then integrated with CTD genes. By integrating these chemical–gene data with gene pathway data, novel chemical pathway connections are generated, allowing users to explore pathways that may be influenced by environmental chemicals. Additionally, CTD now includes links to gene pages at PharmGKB (7) and chemical pages at DrugBank (26).
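The chemical–pathway connections described above can be sketched as a join on shared genes across both pathway sources. This is an illustrative sketch only: the function name, data layout and toy identifiers are assumptions, not CTD's implementation or the actual KEGG or Reactome formats.

```python
def chemical_pathways(chem_gene, pathway_sources):
    """Map each chemical to {(source, pathway): set of linking genes}.

    pathway_sources maps a source name to {gene: [pathways]}.
    """
    result = {}
    for chemical, gene in chem_gene:
        for source, annotations in pathway_sources.items():
            for pathway in annotations.get(gene, ()):
                by_pathway = result.setdefault(chemical, {})
                by_pathway.setdefault((source, pathway), set()).add(gene)
    return result


# Toy, hypothetical data (not real annotations)
chem_gene_pairs = [("chemical_A", "GENE1"), ("chemical_A", "GENE2")]
pathway_sources = {
    "KEGG": {"GENE1": ["pathway_P"], "GENE2": ["pathway_P"]},
    "Reactome": {"GENE2": ["pathway_Q"]},
}

pathways = chemical_pathways(chem_gene_pairs, pathway_sources)
for (source, pathway), genes in sorted(pathways["chemical_A"].items()):
    print(source, pathway, sorted(genes))
```

Keying each result by source as well as pathway keeps KEGG and Reactome annotations distinguishable after integration.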
We have enhanced CTD data by adding three new computational features.
To help navigate the 1.4 million chemical–gene–disease data points in CTD, we have created a suite of analytical and visualization tools, accessible from the ‘Tools’ menu bar.
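The Venn-style comparison among these tools, which finds overlapping and unique attributes of a set of chemicals, genes or diseases, amounts to basic set operations. The sketch below shows the two-set case with hypothetical gene lists; it illustrates the idea and is not the tool's implementation.

```python
def venn_regions(a, b):
    """Return the three regions of a two-set Venn diagram."""
    a, b = set(a), set(b)
    return {"only_a": a - b, "shared": a & b, "only_b": b - a}


# Hypothetical genes interacting with two chemicals
genes_chemical_a = ["GENE1", "GENE2", "GENE3"]
genes_chemical_b = ["GENE2", "GENE3", "GENE4"]

regions = venn_regions(genes_chemical_a, genes_chemical_b)
for name, members in regions.items():
    print(name, sorted(members))
```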
CTD has been enhanced with many other features to make the website more user-friendly. The homepage has been redesigned to make navigation easier and more intuitive, and a ‘Downloads’ menu tab allows users to download all of CTD’s dataset files. A ‘Help’ menu tab links to our FAQ (which includes step-by-step instructions on how to perform various queries), tutorial resources (including our Resource Guide for quick reference to CTD), instructions on how to link to CTD and a sign-up page for our email list, which keeps users informed about new features and releases, including which chemicals, genes and diseases were newly curated each month. Users can also request curation of specific papers or underrepresented chemicals using the ‘Contact us’ tab.
Lastly, we have started engaging the scientific community in reviewing the curation at CTD. When available, the email address of the corresponding author from a curated paper is captured by biocurators. After each monthly update, emails are automatically sent to the authors to alert them that their work has been curated and to ask them to review the data. To date, over 3900 authors have been notified for more than 4600 papers. This interaction with and feedback from the research community helps to ensure the high quality of curated data, and introduces CTD to potential new users.
CTD provides detailed information about manually curated chemical–gene interactions, chemical–disease relationships and gene–disease relationships. By integrating these core data with other datasets, CTD helps turn knowledge into discoveries by identifying novel connections between chemicals, genes, diseases, pathways and Gene Ontology (GO) annotations that might not otherwise be apparent using other biological resources.
Here we have highlighted the recent improvements to CTD, including new data content, expanded data sources, enhanced data features and new analytical tools that allow users to perform meta-analyses of the datasets. Users can also search with our query pages that allow a multitude of parameters to be queried simultaneously (e.g. GO annotations, pathway terms, chemical classes, types of interactions and diseases) to ask sophisticated questions such as: which transcription factors have their activity affected by heavy metals, or how does the chemical resveratrol affect the gene p53? Examples of how to run these queries and retrieve the results are provided in the FAQ section of the ‘Help’ menu tab.
In the future, we hope to expand the depth and breadth of the manually curated core data in CTD, include new inference scores for gene–disease predictions (to complement the inference scores for chemical–disease predictions) and provide a new metric (‘DiseaseComps’) to describe diseases related to each other based upon shared toxicogenomic profiles. We also plan to increase the utility of our inferred data for GO terms and pathways.
All of these features continue to make CTD a unique scientific resource that promotes understanding of the effects of environmental chemicals on human health and supports the generation of testable hypotheses about the mechanisms underlying the etiology of environmental diseases.
National Institutes of Health grants, National Institute of Environmental Health Sciences and the National Library of Medicine (R01 ES014065 and R01 ES014065-04S1 to CTD); INBRE program of the National Center for Research Resources (P20 RR016463). Funding for open access charge: NIEHS grant (R01 ES014065).
Conflict of interest statement. None declared.