In CTD, we present associations between chemicals, genes, and proteins based on the published literature, provide a filtered set of relevant references, integrate high-throughput experimental data, curate toxicologically important genes and their proteins, and provide visualization capabilities that facilitate cross-species sequence comparisons of genes and proteins. These comparisons will enhance understanding of the function of these genes and proteins and of the molecular basis of differential susceptibility. CTD centralizes data that are core to toxicogenomics by integrating and manually reviewing diverse molecular, reference, and chemical data.
Although many chemical-gene interactions have yet to be described, the published literature remains the most valuable source of detailed functional data. High-throughput techniques such as microarrays are beginning to provide insight into the range of genes that may be affected by chemical exposures, but these techniques lack specifics about the non-transcriptional mechanisms underlying chemical effects. CTD integrates data from both sources to coordinate this complementary information and enhance understanding of chemical actions.
Genes and proteins function together in complex networks rather than in isolation (Vidal, 2005). Understanding mechanisms of chemical actions requires knowledge of these networks and constituent chemical-gene and protein interactions, which may be direct (e.g., “chemical binds to protein”) or indirect (e.g., “chemical results in activated transcription of a gene” via intermediate events). CTD currently presents associations between groups of chemicals and genes based on information retrieval of reference titles and abstracts as well as MeSH annotations. Although this approach is valuable for creating preliminary correlations, it is limited in that these associations are only inferred and the types of interactions are not specified. To address these limitations, we have begun manually curating specific chemical-gene and protein interactions in diverse species from the published literature. Identifying interactions across species offers advantages for developing network hypotheses, including validating conserved mechanisms of action, identifying species-specific differences in chemical actions, and testing functionality of networks and network components in experimentally tractable systems.
To ensure consistency of literature curation, we developed an “interaction” controlled vocabulary that characterizes common physical, regulatory, and biochemical interactions between chemicals and genes or proteins. This vocabulary comprises 70 terms including “actions” (e.g., “binds to”, “imports”), “operators” that describe the degree of a chemical’s effect (e.g., “increase”), and “qualifiers” that specify the form of the gene or chemical involved in an interaction (e.g., “protein” or “chemical metabolite,” respectively). Curators use this vocabulary to construct detailed annotations of chemical-gene and protein interactions from the literature. The interaction vocabulary was initiated in collaboration with Dr. Andrey Rzhetsky (Columbia University; Rzhetsky et al., 2004) and it continues to be refined.
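The structure of such an annotation can be sketched as a small data type. This is a minimal illustration only, not the CTD schema: the field names, the `Interaction` class, and the placeholder chemical and gene names are assumptions, and the term sets contain only the example terms quoted above rather than the full 70-term vocabulary.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subsets only: the real vocabulary has 70 terms, and only the
# example terms quoted in the text are listed here.
ACTIONS = frozenset({"binds to", "imports"})
OPERATORS = frozenset({"increase"})
GENE_QUALIFIERS = frozenset({"protein"})
CHEMICAL_QUALIFIERS = frozenset({"chemical metabolite"})


def _check_optional(value, allowed, kind):
    """Reject a term that is given but not in the controlled vocabulary."""
    if value is not None and value not in allowed:
        raise ValueError(f"unknown {kind}: {value!r}")


@dataclass(frozen=True)
class Interaction:
    """One curated chemical-gene/protein interaction (hypothetical schema)."""

    chemical: str
    gene: str
    action: str
    operator: Optional[str] = None
    gene_qualifier: Optional[str] = None
    chemical_qualifier: Optional[str] = None

    def __post_init__(self):
        # The action is mandatory; operator and qualifiers are optional.
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")
        _check_optional(self.operator, OPERATORS, "operator")
        _check_optional(self.gene_qualifier, GENE_QUALIFIERS, "gene qualifier")
        _check_optional(
            self.chemical_qualifier, CHEMICAL_QUALIFIERS, "chemical qualifier"
        )


# Placeholder names, not curated data: "chemical binds to protein".
example = Interaction(
    chemical="chemical X",
    gene="GENE_A",
    action="binds to",
    gene_qualifier="protein",
)
```

Validating every term against a fixed set at construction time is one way to enforce the consistency that a controlled vocabulary is meant to provide; free-text annotations would not be checkable in this way.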
To date, we have manually curated over 22,000 interactions involving 2,000 chemicals, 2,300 genes and proteins, and 75 different species from more than 3,000 references. The CTD curation process efficiently adds valuable information about chemical-gene and protein relationships by combining an information retrieval strategy to identify relevant references with manual curation to extract and validate interactions. The importance of using manual curation to validate results from automated information retrieval methods was demonstrated by a comparison of CTD-curated interactions with automated “chemical compound relationships” displayed by GeneCards, a database of human genes (Safran et al., 2002). For example, a GeneCards query for ATP-binding cassette, sub-family B (MDR/TAP), member 11 (ABCB11) on February 10, 2006 listed 8 associated chemicals for this gene. However, manual review of the supporting references confirmed relationships with only 4 of these chemicals. Of the remaining 4, three were invalid, being based solely on co-occurrence of terms in the title or abstract of the references, and one term presented as a chemical was an acronym for a disease associated with ABCB11 mutations. In addition to the 4 valid chemicals in GeneCards, CTD identified an additional 88 chemicals with valid ABCB11 interactions.
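The ABCB11 comparison can be restated as simple arithmetic. The short calculation below uses only the counts reported above; the labels "precision" and "coverage" are introduced here for convenience and are not terms used in the comparison itself.

```python
# Counts reported for the ABCB11 comparison (GeneCards query, February 10, 2006).
listed_by_genecards = 8      # chemicals listed for ABCB11
confirmed = 4                # relationships confirmed by manual review
cooccurrence_only = 3        # invalid: term co-occurrence in title/abstract
disease_acronym = 1          # a disease acronym presented as a chemical
additional_in_ctd = 88       # further valid chemicals identified by CTD

# The three categories account for every listed chemical.
assert confirmed + cooccurrence_only + disease_acronym == listed_by_genecards

# Fraction of the automated associations that survived manual review.
precision = confirmed / listed_by_genecards

# Valid ABCB11 chemicals known after curation, and the share of them
# that the automated list had captured.
ctd_total = confirmed + additional_in_ctd
coverage = confirmed / ctd_total

print(f"precision: {precision:.0%}")   # precision: 50%
print(f"CTD total: {ctd_total}")       # CTD total: 92
print(f"coverage:  {coverage:.1%}")    # coverage:  4.3%
```

For this one gene, then, half of the automated associations were confirmed, and those confirmed associations represent roughly 4% of the chemicals with valid curated ABCB11 interactions.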
Manually curated interactions will be integrated with public CTD data and identified in future releases. This integration will allow visitors to ask mechanistic questions about relationships between chemicals and genes or proteins by querying with numerous parameters (e.g., “tetrachlorodibenzodioxin [chemical term] results in increased transcription [CTD interaction] of which genes?”; “what gene promoters are demethylated [CTD interaction] in response to the green tea extract epigallocatechin gallate [chemical term]?”; “what protein kinases [GO molecular function term] play a role in chemical resistance [CTD interaction term] to tamoxifen [chemical term]?”; “what proteins undergo phosphorylation [CTD interaction term] in response to the adamantane class of compounds [chemical term]?”). Ultimately, these interactions will make substantial contributions to predicting and visualizing complex chemical interaction networks from either a molecular (gene or protein) or chemical perspective. Figure 4 is a high-level schematic illustrating the currently curated 322 interactions between lipopolysaccharides (LPS) and 286 unique genes and proteins. LPS exposure results in activated expression of 152 genes, decreased expression of 143 genes, decreased activity of 4 proteins, increased phosphorylation of 6 proteins, increased secretion of 2 proteins, and localization effects on 4 proteins. A table listing all LPS interactions with genes and proteins curated to date is provided as supplementary data. Extensive integration of additional data with chemicals and genes in CTD will enable visitors to evaluate these interactions in important contexts such as diseases (e.g., can these interactions with specific genes and proteins help explain the correlation between LPS and the onset or severity of asthma?).
Access to and understanding of complex chemical-gene and protein networks will directly support hypothesis-driven research about the mechanisms of chemical actions and have important implications for predicting both past exposures and chemically-induced toxicity or disease.
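As a rough illustration of the kind of mechanistic query described above, the sketch below filters a list of curated records by chemical and interaction type and tallies effects per chemical. It is an assumption-laden toy, not the CTD query interface: the `Record` type, the function names, and the placeholder chemical and gene names are all hypothetical, and the records are invented solely to make the example runnable.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    """A flattened curated interaction (hypothetical layout)."""

    chemical: str
    gene: str
    effect: str  # e.g. "increased transcription"


def genes_for(records: Iterable[Record], chemical: str, effect: str) -> List[str]:
    """Answer '<chemical> results in <effect> of which genes?'."""
    return sorted(
        {r.gene for r in records if r.chemical == chemical and r.effect == effect}
    )


def effect_summary(records: Iterable[Record], chemical: str) -> Counter:
    """Count interactions per effect type for one chemical."""
    return Counter(r.effect for r in records if r.chemical == chemical)


# Placeholder records, not curated data.
records = [
    Record("chemical X", "GENE_A", "increased transcription"),
    Record("chemical X", "GENE_B", "increased transcription"),
    Record("chemical X", "GENE_C", "increased phosphorylation"),
    Record("chemical Y", "GENE_A", "decreased activity"),
]

print(genes_for(records, "chemical X", "increased transcription"))
# ['GENE_A', 'GENE_B']
print(effect_summary(records, "chemical X"))
```

A per-effect tally of this kind is the same shape of summary as the LPS breakdown reported above (expression, activity, phosphorylation, secretion, localization), computed here over toy data.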
Figure 4. Chemical-Gene Interaction Network Schematic. Cross-species interactions between chemicals and genes and proteins are curated manually from the literature for CTD. These data will be important for building complex interaction networks.