Gene-environment interactions contribute to complex disease development. The environmental contribution, in particular low-level and prevalent environmental exposures, may constitute much of the risk and contribute substantially to disease. Systematic risk evaluation of the majority of human chemical exposures has not been conducted and is a goal of regulatory agencies in the U.S. and worldwide. With the recent recognition that toxicological approaches more predictive of effects in humans are required for risk assessment, in vitro human cell line data as well as animal data are being used to identify toxicity mechanisms that can be translated into biomarkers relevant to human exposure studies. In this review, we discuss how data from toxicogenomic studies of exposed human populations can inform risk assessment by generating biomarkers of exposure, early effect, and/or susceptibility, elucidating mechanisms of action underlying exposure-related disease, and detecting response at low doses. Good experimental design incorporating precise, individual exposure measurements, phenotypic anchors (pre-disease or traditional toxicological markers), and a range of relevant exposure levels is necessary. Further, toxicogenomic studies need to be designed with sufficient power to detect true effects of the exposure. As more studies are performed and incorporated into databases such as the Comparative Toxicogenomics Database (CTD) and Chemical Effects in Biological Systems (CEBS), data can be mined for classification of newly tested chemicals (hazard identification) and for investigating dose-response and the inter-relationships among genes, environment and disease in a systems biology approach (risk characterization).
Human disease is thought to arise when the normal physiological state of an individual, determined by the unique genetic background (genome), is perturbed by the exposome, a term describing all exposures from conception onwards. Such perturbations can be assessed by measuring the components of the responsome (transcriptome, proteome, miRNome, methylome) using toxicogenomic technologies. The variability of the human genome and of the exposomes encountered leads to a wide range of possible outcomes in a population. Thus, gene-environment interactions contribute to disease development and progression across the life stages, especially in susceptible individuals. Comprehensive analysis of the genome, exposome and responsome is necessary to elucidate these processes.
Recently, genome-wide association studies (GWAS) have identified several disease risk alleles, inherited variations or polymorphisms in gene sequences. Results from GWAS show that many common variants, each of small, additive effect, probably contribute to complex disease risk. The increased resolution of genetic endpoints through the inclusion of copy-number variation (CNV) in GWAS or the application of massively parallel sequencing may further inform the genetic contribution to disease (genome). Based on current data, however, it appears that the environmental contribution, comprising in part the “exposome” representing all exposures from conception onwards, may constitute the majority of the risk of chronic disease. A strong environmental effect on disease development is supported by the finding that disease risk in migrant populations for atherosclerotic disease [5–6] and cancer shifts towards that of the population of the adoptive country.
Low-level and prevalent environmental exposures may contribute substantially to disease [8–10]. The development of high-resolution technologies to assess exposures in the environment and in individuals is urgently needed to further understand such links [1,10–13]. Adductomics is one such approach: the “omic”-level measurement, by analytical techniques such as mass spectrometry (MS), of protein and DNA adducts, which are compounds formed by covalent reactions between blood proteins (typically hemoglobin and albumin) or DNA and the chemicals (or their metabolites) to which an individual has been exposed. Other promising approaches include microfluidics, nanotechnologies and MS [15–18].
Toxicity data on the more than 100,000 chemicals marketed in the U.S. and Europe are extremely limited [19–21]. Risk characterization of the total burden of environmental exposures and mixtures thereof, using updated toxicogenomic approaches, should greatly inform the mechanisms underlying chemically-induced complex disease. Here, we review the application of toxicogenomic studies to evaluate the responsome at the molecular (DNA, RNA and protein) level in exposed human populations, to develop a better understanding of the gene-environment interactions underlying disease.
The conventional health risk assessment paradigm for chemical exposures comprises four major steps: exposure assessment, hazard identification, dose-response assessment, and risk characterization. The approach has traditionally relied largely on animal toxicity studies, with extrapolation to adverse human health responses derived by applying “uncertainty factors” that account for uncertainties associated with species extrapolation (animal to human), dose extrapolation (high doses in animal studies to low-dose human exposures) and prediction of risk to susceptible populations. Data from animal tests are often poor predictors of real-world human effects, the most famous recent example probably being the TGN 1412 clinical trial. Many drug candidates are abandoned due to non-predicted human effects in clinical trials. While improved animal models such as “humanized” mice have the potential to overcome some of the uncertainties associated with extrapolation to humans and to address human susceptibility, it is increasingly being recognized that alternative approaches predictive of effects in humans are required [26–28]. In 2007, the U.S. National Research Council (NRC) reviewed existing strategies and developed a long-range vision for toxicity testing and risk assessment employing updated toxicological methodologies such as toxicogenomics and in vitro and high-throughput systems to facilitate the screening of the large numbers of chemicals in commercial use [26–27]. A similar approach was taken in Europe. A number of challenges remain, including the requirement for massive advances in computational biology to extrapolate from in vitro multi-tissue effects to effects on organs and whole humans.
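The uncertainty-factor arithmetic described above can be sketched in a few lines. This is a minimal illustration only: the function name, the starting value of 50 mg/kg/day and the tenfold value assigned to each factor are assumptions chosen for the example, not values from any particular assessment.

```python
from math import prod


def human_reference_value(point_of_departure, uncertainty_factors):
    """Divide an animal-derived point of departure by the product of
    the uncertainty factors to obtain a human reference value.

    point_of_departure: dose from the animal study (e.g. mg/kg/day).
    uncertainty_factors: mapping of factor name -> factor value (>= 1).
    """
    if point_of_departure <= 0:
        raise ValueError("point_of_departure must be positive")
    factors = list(uncertainty_factors.values())
    if any(f < 1 for f in factors):
        raise ValueError("uncertainty factors must be >= 1")
    return point_of_departure / prod(factors)


# Hypothetical example: the three sources of uncertainty named in the
# text, each given an assumed tenfold factor.
example_factors = {
    "species_extrapolation": 10,      # animal to human
    "dose_extrapolation": 10,         # high dose to low dose
    "susceptible_populations": 10,    # variability among humans
}
reference_value = human_reference_value(50.0, example_factors)
print(f"Reference value: {reference_value} mg/kg/day")
```

Because the factors multiply, three tenfold factors lower the animal-derived value by three orders of magnitude, which illustrates how strongly the result depends on assumptions rather than on human data.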
A more reductionist version of this approach was recently proposed by the U.S. Environmental Protection Agency (EPA) and is based on 2007 recommendations by the National Academies of Sciences (NAS; Toxicity testing in the 21st century) and the hypothesis that the ability of chemicals to induce perturbations in a finite number of toxicity pathways (e.g. oxidative stress response) could be queried using methodologies such as in vitro assays and toxicogenomics [26,31]. The strategy focuses on the measurement of perturbations in baseline biological processes elicited by environmentally relevant exposure levels that may trigger toxicity pathways leading to adverse health outcomes. Characterization of the relevant toxicity pathways and the identification of biomarkers of key event parameters that can be monitored in human studies of chemical exposure are required. The combination of these data with distributional data on population characteristics of exposure and dose (magnitude, frequency, and duration) would provide a scientifically based approach for reducing the uncertainties associated with current risk assessments. It is proposed that existing human data from epidemiological studies and clinical trials be used, retrospectively and prospectively, to demonstrate that the approach successfully and adequately predicts human toxicological responses. It is also envisioned that GWAS data will provide additional support for the pathway-based models.
The newer approaches described above focus mainly on identifying mechanisms of toxicity in animal studies or in vitro, and subsequently translating these findings into biomarkers that can be applied to human exposure studies. A complementary approach is to perform toxicogenomic studies of human populations with well-characterized exposures, in order to directly determine biomarkers of exposure and early effect, and assess dose-response, as outlined in Figure 1. Transcriptomics, proteomics, and epigenomics can each provide a “molecular signature” or “fingerprint” of exposure or early effect, which can be compared with the profiles associated with known hazards, e.g. carcinogens, to inform hazard identification. Examination of impacted gene functions and pathways may enhance our understanding of the mechanisms by which chemicals contribute to disease (risk characterization). The incorporation of sensitive, updated measures of exposure assessment, e.g. adductomics, would allow assessment of dose-response at environmentally relevant exposure levels. These omic signatures and ultimately, risk, induced by exposure, are determined by the unique genomic composition of each individual and biomarkers of susceptibility can be determined through genomic analyses. Thus, adductomics, transcriptomics, proteomics, and epigenomics can characterize the exposome, responsome and (early) outcome of each individual in the context of underlying susceptibility (genomics), facilitating the examination of gene-environment interactions. Correlation of toxicogenomic data with phenotypic endpoints such as traditional toxicological or clinical endpoints or pre-disease states (phenotypic anchors) could help to predict outcome thereby greatly improving the rigorous application of this approach to risk assessment. Few human toxicogenomic studies to date have incorporated phenotypic anchors or assessed dose-response to chemical exposures, particularly at low levels of environmental exposure.
Beyond the scope of this review is a discussion of metabolomics, the measurement of the full complement of endogenous metabolites in a cell, tissue or biofluid by techniques such as MS. Metabolomics provides a direct “functional readout of the physiological state” of an organism, and metabolite profiles vary with genotype, diet, and gut microbial composition. Specific profiles have been associated with risk factors for cardiovascular disease and with nicotine consumption, and comprise potential biomarkers of pathophysiology. The detection of xenobiotic metabolites can reflect internal dose. This review focuses on transcriptomics, proteomics, and epigenomics.
The transcriptome is measured by global gene expression profiling using microarray analysis, or, more recently, by next-generation sequencing technologies . Microarray technology and its potential application has matured in part through the efforts of the FDA-led MicroArray Quality Control (MAQC) consortium, a widespread collaboration conceived to broadly address performance, quality, and data analysis issues related to the use of DNA microarrays  and the development of Minimum Information About a Microarray Experiment (MIAME) standards . As a result, good concordance has been reported among platforms, allowing the comparison of data among different studies, laboratories and technologies .
The human peripheral blood (PB) transcriptome is dynamic, responding to environmental factors including stress , exercise [40–41], diet  and lifestyle , though remaining stable over time in the individual . In the case of lifestyle, the broadest environmental factor studied, different lifestyles were characterized by the expression of one third of the leukocyte transcriptome, including various classes of immune response genes that influence susceptibility to respiratory and inflammatory disease .
Environmental exposure to chemicals also modifies the human transcriptome, although studies examining the impact of such exposures on global gene expression in human populations are currently limited. They include populations exposed to benzene [45–46], dioxin, arsenic [48–50], metal fumes, and complex environmental exposures such as cigarette smoke (CS) [52–55] and diesel exhaust, summarized in Table 1. Each of these studies identified potential biomarkers of exposure and/or early effect. The genes altered by these exposures represent a diversity of mechanisms including systemic effects on inflammation, which may underlie the development of associated diseases. In the study of CS-associated alterations in gene expression, signatures that could distinguish smokers from nonsmokers were identified in two studies [52–53]. A third study identified signatures of current and past exposure to CS and distinguished mechanisms associated with chronic and acute exposure.
One of the challenges of human toxicogenomic studies is to address variability arising from differences across life stages that potentially influence or interact with toxicity pathways. In studies examining exposure to air pollution, several differentially expressed genes were identified in the blood cells of children from urban regions of the Czech Republic compared with rural regions  and the effects on children and adults at the transcriptional level differed . However, the data were based on general measurements of air exposures from monitoring stations in these studies while precise individual exposures may be more appropriate to detect robust changes.
The CS- and air pollution studies correlated gene expression with traditional toxicological endpoints such as micronuclei frequencies and DNA adduct formation [57,59]. Below, we discuss human transcriptomic studies of benzene and arsenic, which serve as examples of toxicogenomic studies using clinical endpoints as phenotypic anchors that also assess response at environmentally relevant doses.
Benzene is an established cause of acute myeloid leukemia (AML), myelodysplastic syndromes (MDS), and probably lymphocytic leukemias and non-Hodgkin lymphoma (NHL) in humans [60–63]. In a study with accurate, individual exposure measurements in Tianjin, China, we found evidence of hematotoxicity in workers exposed to varying levels of benzene (n=250) compared with non-exposed controls (n=140). A significant decrease in almost all blood cell counts, including white blood cells (WBC), granulocytes, lymphocytes and platelets, was observed in exposed workers, even at exposures below 1 ppm (n=109), the current occupational standard in the U.S. The demonstration of hematotoxicity in exposed workers represents a phenotypic anchor, providing context for analysis of toxicogenomic effects, which is particularly useful given the long latencies of AML and NHL.
In a study of global gene expression and high-dose occupational benzene exposure in peripheral blood mononuclear cells (PBMC), we identified CXCL16, ZNF331, JUN, and PF4 as potential biomarkers of early response to benzene exposure in 6 exposed-control pairs. A later study, using 2 different microarray platforms (Affymetrix and Illumina), confirmed altered expression of these 4 genes, and revealed impacts on apoptosis and lipid metabolism in 8 individuals exposed to >10 ppm benzene compared with 8 unexposed controls. More recently, we have shown, in an expanded study of 125 factory workers, that low-dose benzene exposure (<1 ppm, n=59) is associated with widespread subtle, yet highly significant, perturbation of the expression of more than 2500 genes [65–66]. Further, the study revealed potential biomarkers and pathways impacted by benzene exposure across a range of exposure levels as well as biomarkers and pathways uniquely impacted at low levels of benzene exposure. Many of the altered genes were involved in apoptosis and in immune and inflammatory responses.
Response pathways implicated in mouse bone marrow and stem cells exposed to very high levels of benzene (100–300 ppm), including p53 response, DNA repair and cell cycle arrest [67–68], were not confirmed in human chronic exposure studies, although apoptosis was impacted in both human and animal studies. The differences in transcriptome response between human and animal studies could reflect different mechanisms of action of benzene although differences in exposure intensity and time, and tissues analyzed may also be contributing factors.
Chronic exposure to the carcinogen arsenic is associated with cancers of the lung, bladder, kidney and liver and with nonmelanoma skin cancer [70–72]. Exposure to inorganic arsenic alters the expression of genes involved in arsenic metabolism, stress response, damage response and apoptosis, cell cycling, cell signaling and growth factor signaling, as recently reviewed by Ghosh and colleagues. However, the degree of individual susceptibility to arsenic-induced effects varies among populations from different parts of the world exposed to comparable levels of arsenic in drinking water. Only a minor percentage of exposed individuals within a population develop arsenic-induced premalignant skin lesions, an early manifestation and hallmark of arsenic toxicity that may indicate increased future risk of arsenic-related cancer. Although the molecular basis of arsenic-induced skin lesions and their progression to cancer is poorly understood, such lesions serve as a potential phenotypic anchor of arsenic toxicity in exposed humans.
A microarray-based gene expression study was conducted among individuals chronically exposed to arsenic in Bangladesh to assess whether arsenical skin lesion status and arsenic exposure level were associated with differential gene expression patterns . Mean (SD) well-water levels of arsenic in the Bangladesh study were 342.7 (258.1) μg/L for the group with skin lesions (n=11) and 39.6 (48.5) μg/L for the group without skin lesions (n = 5) . Genes involved in RNA metabolism, hydrolase activity, ribonucleoprotein complex, translation, cellular protein catabolism, amino acid activation, transport and transporter activity, signal transduction through the interleukin (IL)-1 receptor, and glycoprotein metabolism were found to be differentially expressed in peripheral blood lymphocytes (PBL) of exposed individuals with arsenical skin lesions compared with exposed individuals without such lesions. Dose-dependent analysis of exposure was not possible in the study because of the wide variation in exposure levels and limited sampling data.
A study of PBL global gene expression in populations exposed above and below the drinking water standard of 10 μg/L was conducted in New Hampshire, US, where 40% of the population consumes drinking water from unregulated private wells. The drinking-water arsenic levels of the higher-exposed group (n = 11) averaged 32 μg/L (range: 10.4–74.7 μg/L), whereas the levels for the low-exposure group (n = 10) averaged 0.7 μg/L (range: 0.007–5.3 μg/L). The most significant pathways in the higher-exposed group were involved in defense and immune response, including inhibitory killer cell immunoglobulin-like receptors with roles in both innate and adaptive immune response. Cell growth, apoptosis, cell cycle regulation, and the T-cell receptor signaling pathway were also impacted. Differential expression of transcripts involved in diabetes was observed at high exposure. Arsenic exposure has been associated with increased diabetes mellitus-related mortality in several populations, including the U.S. [76–77].
Differential expression of genes involved in the nervous system and other aspects of development supports associations between arsenic exposure and fetal and early childhood effects [78–80]. Early childhood arsenic exposure increases subsequent mortality in young adults from both malignant and nonmalignant lung disease, and increases childhood liver cancer mortality. The latency for arsenic-induced bladder cancers may exceed 50 years. It has been suggested that intrauterine or early childhood exposure to arsenic induces changes that become apparent much later in life, probably through epigenetic effects, endocrine effects, immune suppression, neurotoxicity and interference with fetal programming. Examination of the gene expression profiles of a population of newborns whose mothers were exposed to varying levels of arsenic during pregnancy revealed a systemic inflammatory response and increased NF-κB signaling. Additionally, a network of 11 transcripts was identified which could predict arsenic exposure in newborns with 83% accuracy. However, as the unexposed newborns were from two different regions of Thailand, one urban and one rural, it is not possible to conclusively associate these changes with arsenic exposure.
More proximal to phenotype than the transcriptome, the human proteome may better reflect molecular and cellular processes. However, analysis of the total protein output encoded by the genome using proteomics techniques such as MS and antibody arrays is more challenging and less amenable to high-throughput application, due to differences in protein properties, location and abundance. In order to reduce the complexity of the proteome, protein fractionation and depletion of high-abundance proteins such as albumin must be performed prior to analysis. Differences in sampling, collection, handling and storage can impact the observable proteome from serum and plasma, two readily available and commonly tested biofluids, underscoring the importance of standardized protocols across studies and laboratories [89–90]. The Minimum Information About a Proteomics Experiment (MIAPE) guidelines, developed by the Human Proteome Organization’s Proteomics Standards Initiative, encourage the standardized collection, integration, storage and dissemination of proteomics data, and include guidance modules for reporting the use of techniques such as gel electrophoresis and MS [91–92].
Few proteomic studies of human-exposed populations have been conducted. A recent study examining the impact of cigarette smoke on the airway epithelial proteome of 5 current smokers compared with 5 never smokers, using 1D-PAGE coupled with LC-MS/MS, identified 23 proteins that differed between never and current smokers and confirmed the smoking-related changes of PLUNC, P4HB1, and uteroglobin protein levels by Western blotting . The study also demonstrated a strong correlation between protein and transcript detection within the same samples. Other such studies include those by our group and others of populations exposed to benzene , and arsenic [94–96].
The plasma proteomes of fifty workers reportedly exposed to benzene in solvents at a printing company and 38 matched unexposed controls were analyzed by two-dimensional electrophoresis (2-DE)  and significant differences in the resulting protein profiles were found using matrix-assisted laser desorption ionization/time of flight (MALDI-TOF) MS and Western blot. Although up-regulation of T cell receptor beta chain, FK506-binding protein and matrix metalloproteinase-13 was seen in the printing workers, limited exposure information on benzene levels in the solvents used precluded a true association with benzene exposure.
Three proteins were found to be consistently down-regulated in benzene-exposed compared with control subjects in two sequential studies of shoe factory workers with well-characterized benzene exposures using surface enhanced laser desorption/ionisation (SELDI-TOF), a combined MS and array-based technology . The proteins were highly inversely correlated with individual estimates of benzene exposure (r > 0.75). Two of the proteins were subsequently identified as platelet factor 4 (PF4) and connective tissue activating peptide (CTAP)-III, both members of the CXC-chemokine family. As well as representing potential biomarkers of benzene exposure, the biological roles of these proteins [98–100] support the current understanding of the toxic effects of benzene including immunosuppression and toxicity to hematopoietic progenitors.
We analyzed the urinary proteomes of human populations exposed to arsenic in Nevada and Chile in order to elucidate the mechanisms underlying arsenic-associated kidney and bladder cancers, and to identify biomarkers of exposure and early effect. Decreased expression of human β-defensin-1 (HBD-1) peptides was found in the urine of men from Nevada with high arsenic exposure, and the finding was replicated in a second, independent arsenic-exposed population from Chile. HBD-1 is a peptide with well-known antimicrobial effects and lesser-known cytotoxic and chemotactic properties [102–103], and it may function as a tumor suppressor in urological cancers.
The differential expression of 20 proteins in the plasma of arsenic-exposed individuals from Bangladesh was reported. Similarly, five discriminatory protein peaks were identified using SELDI-TOF in the serum proteomic profiles of forty-six male smelter workers with combined exposure to lead and arsenic compared with forty-five age-matched male office workers. However, in both of these studies the affected proteins have not been identified or validated.
The epigenome is dynamic and is thought to be influenced by environmental factors throughout life [104–107]. Epigenetic modifications, such as DNA methylation and histone modifications, may represent more stable fingerprints of exposure than altered gene or protein expression. Further, interindividual differences in the epigenetic state could also affect susceptibility to xenobiotics and associated disease risk. A role for miRNAs in mediating the response to environmental exposures has been demonstrated by a study showing that smoking induces gene expression changes in the human airway epithelium, with some genes modulated by miRNA. DNA methylation levels and miRNA profiles are amenable to investigation in a high-throughput manner by array- and sequencing-based methods.
Aberrant gene promoter methylation is a common event in cancer [114–116] and other diseases . A recent study of DNA methylation in lung cancer arising in tobacco smokers and alcohol drinkers revealed evidence of gene-specific and sex-specific differences in methylation patterns . Cancer-related methylation changes have been reported in cancer-free individuals and potentially associated with lifestyle factors . Expression profiling analyses have also revealed potential characteristic miRNA signatures in certain human cancers [119–122] and other diseases [123–124].
Studies of epigenetic alterations in populations at increased risk of disease through exposure to chemicals are necessary to determine whether such alterations are involved in the causal pathways of disease development.
A study of epigenetic changes induced by low-level exposure to benzene in healthy subjects including gas station attendants and traffic police officers, revealed significant hypermethylation in p15 with increasing airborne benzene levels . While this is the first human study to show DNA methylation changes induced by low-level carcinogen exposure, the magnitude of altered methylation was small and the benzene exposures were very low (~22 ppb) and potentially confounded by other exposures and lifestyle factors.
We conducted a pilot study analyzing the DNA methylation profiles of over 800 genes in the buffy coat DNA of 6 workers (2 male, 4 female) exposed to benzene and 4 unexposed controls (2 male, 2 female), using array technology . Preliminary data showed gender-specific methylation patterns, as expected, and revealed altered methylation induced by benzene at many CpG sites. Decreased methylation of RUNX3 (AML2), a gene whose altered expression has been associated with myeloproliferative disorders  and increased methylation of MSH3, a critical gene in the maintenance of genome integrity, and Sema3C, a secreted guidance protein implicated in tumorigenesis , was also found. We also reported that benzene exposure altered miRNA expression in exposed workers . We are currently expanding these studies.
Arsenic exposure has been shown to alter the DNA methylation status of multiple gene promoters in humans. Hypermethylation of the p53 gene promoter was observed in arsenic-exposed people compared to control subjects and of the p53 and p16 genes in arsenic-induced skin cancer patients compared to subjects having skin cancer unrelated to arsenic . Arsenic exposure has also been shown to induce death-associated protein kinase (DAPK) promoter hypermethylation in a human uroepithelial cell line  and in human urothelial carcinoma  and RASSF1A and PRSS3 promoter hypermethylation in advanced human bladder cancer .
It has been proposed that the mode of action of arsenic is similar to folate deficiency . In support of this, Kelsey and colleagues showed that treatment with arsenic and folate deprivation in vitro produced similarly altered miRNA expression profiles  and confirmed the altered expression of hsa-miR-222 in human subjects with low dietary folate levels. This study shows that miRNA expression profiles altered by environmental carcinogen exposures may be associated with the process of carcinogenesis.
It should be clear from the studies described above that good experimental design incorporating precise, individual exposure measurements, phenotypic anchors (pre-disease state or traditional toxicological markers), and a range of relevant exposure doses can increase the power of human population toxicogenomic studies to generate biomarkers of exposure and/or early effect, elucidate modes of action underlying associated disease and detect effects at low doses. These findings can inform risk assessment. Recently, a committee of the NAS, the “Science and Decisions Committee”, found “substantial deficiencies” in the current approaches to the treatment of uncertainty and variability in quantitative risk assessment of both cancer and noncancer outcomes and offered a new framework for risk-based decision making. One of the recommendations was the harmonization of cancer and noncancer risk assessment. Human toxicogenomic data, being unbiased, can potentially generate biomarkers and inform mechanisms underlying a wide range of human disease. The framework differentiates individual from population risk, with probabilistic characterization of the latter informed by formal systematic assessment of human heterogeneity with respect to susceptibility (genetics, age, life stage), co- and background exposures, as well as mechanisms of action. Toxicogenomic endpoints reflect gene-environment interaction and, in a sufficiently large and diverse population, could potentially be used to evaluate human heterogeneity. For example, data could be evaluated in subgroups of susceptible individuals carrying candidate or known susceptibility genes. Further, as discussed earlier, adductomics can provide a measure of internal dose reflecting gene-environment interaction.
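To make the point about study power concrete, the following sketch estimates how many subjects per group a two-group comparison would need to detect a given standardized difference in one transcript, and how that number grows when thousands of transcripts are tested at once. It is a rough illustration under stated assumptions: a two-sided comparison with the normal approximation, equal group sizes, and a Bonferroni correction (many studies control the false discovery rate instead). The function name, the effect size of 0.5 standard deviations and the figure of 20,000 transcripts are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80, n_tests=1):
    """Approximate subjects needed per group for a two-sided,
    two-sample comparison of means (normal approximation).

    effect_size: difference in means divided by the common SD.
    n_tests: number of endpoints tested; alpha is divided by this
             number (Bonferroni correction, an assumption here).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    standard_normal = NormalDist()
    alpha_per_test = alpha / n_tests
    z_alpha = standard_normal.inv_cdf(1 - alpha_per_test / 2)
    z_beta = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# One pre-specified biomarker versus a hypothetical 20,000-transcript array.
single_endpoint = samples_per_group(0.5)
array_wide = samples_per_group(0.5, n_tests=20000)
print("Per-group n, single endpoint:", single_endpoint)
print("Per-group n, 20,000 endpoints (Bonferroni):", array_wide)
```

The comparison suggests why small exposed-control studies can detect only large expression changes once multiple testing is accounted for, and why precise exposure measurement, which reduces noise and so raises the standardized effect size, is an efficient way to gain power.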
Current methods used in hazard identification of e.g. carcinogens, include the 2-year rodent carcinogenicity bioassay, which assesses the risk of cancer development in animals , and the short-term in vitro genotoxicity testing battery which assesses a chemical’s ability to cause genetic damage in cells predictive of cancer . Both of these approaches have limited predictive potential for carcinogenesis in humans and fail to address non-genotoxic effects. In order to achieve better cancer predictive potential, toxicogenomic studies of exposed cell lines and animals have been explored as an alternative approach to hazard identification of different classes of carcinogens, through determination of generic molecular pathway responses, as reviewed . Gene sets have been identified that could discriminate classes of chemicals, e.g. carcinogens and non-carcinogens [138–141], and genotoxic vs non-genotoxic carcinogens , in vitro and in animal models.
The paucity of toxicogenomic data from exposed human populations currently precludes the identification of similar human gene sets. However, initiatives such as the Comparative Toxicogenomics Database (CTD) and Chemical Effects in Biological Systems (CEBS) have been developed to store current and future human toxicogenomic datasets and facilitate studies of the inter-relationships among genes, environment and disease. Such databases also provide a framework for the classification of new chemicals based on comparison of transcriptomic, proteomic and epigenomic profiles. Aside from the toxicogenomic databases, most recently published transcriptomic data are also publicly available through the Gene Expression Omnibus (GEO) Database, www.ncbi.nlm.nih.gov/geo.
Among the limitations of current risk assessment approaches are difficulties in extrapolating from acute, high-dose exposures in animals to environmentally relevant chronic exposures in humans and in assessing dose-response in the appropriate dose range. The NAS Science and Decisions Committee recommended three conceptual models for estimating low-dose risk. However, concerns have been raised regarding the use of human variability modeling (model 2) to set standards for human exposures to toxic chemicals. Human toxicogenomic studies can be designed to measure effects at low-dose exposures but have mainly addressed exposures at the upper end of typical ranges of human exposure and have often lacked precise, individual estimates of exposure. Biomarkers of internal dose, such as specific protein adducts [14,146–147], which can account for inter-individual differences in metabolism, have rarely been applied. Dose-response, an important criterion of risk assessment, has not been examined in the majority of human population studies. The prevailing notion that genotoxic agents have a dose-response curve that is linear in the low-dose region without a threshold, while the dose-response curves for non-genotoxic agents have a threshold, has been disputed. Further, it has been argued that the study of endpoints in humans exposed at low levels may provide the empirical data necessary to clarify the shape of the population dose-response curve. With good study design and precise measurements of exposure, toxicogenomic studies have the potential to detect effects across a range of environmentally relevant low-dose exposures in humans [66,75].
In a study of benzene exposures and metabolite levels among 263 non-smoking women, we recently showed that two metabolic pathways with different affinities exist for high- and low-dose benzene. Statistical evidence from the study strongly suggests that a currently uncharacterized high-affinity pathway is largely responsible for the metabolism of benzene at sub-part-per-million air concentrations. The finding implies that the risk of leukemia associated with benzene in the general population could be substantially greater than is currently thought. A differential effect of low-dose exposure to benzene is further supported by our finding of unique gene and pathway effects through transcriptomic analysis [65–66].
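The consequence of two pathways with different affinities can be illustrated with a minimal sketch. It assumes, purely for illustration, that each pathway follows Michaelis-Menten saturation kinetics; the functional form, the function names and every parameter value below are hypothetical and are not taken from the benzene study. Under these assumptions, the amount of metabolite produced per unit of exposure is highest at low air concentrations, where the high-affinity pathway dominates.

```python
# Illustrative only: a two-pathway saturable-metabolism sketch.
# The Michaelis-Menten form and all parameter values are ASSUMPTIONS
# chosen for illustration, not estimates from the cited benzene study.

def pathway_rate(exposure_ppm, vmax, km):
    """Saturable (Michaelis-Menten-type) rate for one metabolic pathway."""
    return vmax * exposure_ppm / (km + exposure_ppm)

# Hypothetical parameters: a high-affinity (small Km), low-capacity pathway
# and a low-affinity (large Km), high-capacity pathway.
HIGH_AFFINITY = {"vmax": 1.0, "km": 0.1}
LOW_AFFINITY = {"vmax": 20.0, "km": 50.0}

def total_metabolite(exposure_ppm):
    """Metabolite production summed over both pathways."""
    return (pathway_rate(exposure_ppm, **HIGH_AFFINITY)
            + pathway_rate(exposure_ppm, **LOW_AFFINITY))

def dose_specific_yield(exposure_ppm):
    """Metabolite produced per ppm of exposure."""
    return total_metabolite(exposure_ppm) / exposure_ppm

def high_affinity_fraction(exposure_ppm):
    """Share of total metabolism carried by the high-affinity pathway."""
    return pathway_rate(exposure_ppm, **HIGH_AFFINITY) / total_metabolite(exposure_ppm)

if __name__ == "__main__":
    for ppm in (0.01, 0.1, 1.0, 10.0):
        print(ppm, round(dose_specific_yield(ppm), 3),
              round(high_affinity_fraction(ppm), 3))
```

In this sketch, linear extrapolation from high-exposure data would underestimate metabolite production per unit of exposure at sub-ppm levels, which is the qualitative point made in the text.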
Toxicogenomic studies in animals have informed the modes of action underlying toxic effects by chemical class. For example, in a study examining male rats exposed for up to 14 days at doses previously shown to induce hepatic tumors in long-term cancer bioassays, DNA damage and cell cycle progression characterized four genotoxic hepatocarcinogens, while oxidative stress or a regeneration response characterized nongenotoxic carcinogens. As discussed earlier, toxic effects in animals often do not predict those in humans. Therefore, the direct identification of key mechanisms underlying human toxicity may be more informative and can certainly complement the animal and in vitro approaches. As discussed above, many of the human toxicogenomic transcriptome studies identified alterations in immune and inflammatory processes as well as in apoptosis and the cell cycle. As more studies are performed on different types of chemical exposure, the CTD and CEBS databases provide a framework in which to investigate mechanisms of action underlying toxicity. Toxicogenomic profiling of exposed individuals with pre-disease states predictive of future disease, such as arsenical skin lesions, can increase the ability to identify disease-causal mechanisms.
While individual toxicogenomic datasets can provide valuable information, systems biology approaches may be necessary to clarify the molecular and cellular networks impacted by exposure, and thus identify all potential mechanisms of action. Assessment of a single epigenetic modification such as DNA methylation may not have a predicted phenotypic effect or inform the causal pathway of disease, as multiple mechanisms are required to coordinately regulate transcriptional status. Altered transcription, in turn, may not be reflected in protein levels. According to systems theory, whereas individual genes or environmental factors may be key elements in a complex disease process, the phenotype is ultimately determined by the modulation of underlying pathways. Systems-based approaches provide a holistic view of interactions at the molecular, pathway and organism level, with connectivity described by networks. Systems approaches have been proposed for deriving networks informing risk assessment, disease development and gene-environment interaction. Ultimately, these networks could define the continuum from baseline biological perturbation induced by environmental exposures through pre-clinical and clinical disease, at multiple levels.
The application of a systems approach to risk assessment has been proposed in which molecular networks are constructed from omics data at different levels of the system. Through biological interpretation and in vitro and in vivo data, key event networks, with nodes representing toxicity pathways, are abstracted from the molecular network. In this scenario, mechanisms of action for an environmental factor would represent perturbations of the “normal” state and allow predictions of adverse outcomes to be made. Outcomes are driven at the individual level by the genetic, epigenetic and exposure profile and at the population level by common genetics, lifestyle and environment. This approach could be informative for disease-associated exposures, such as benzene and arsenic, for which underlying disease mechanisms are not understood despite knowledge of multiple modes of action. The systems approach would facilitate examination of the interactions among multiple modes of action and their variability with life stage, genetic background and dose.
Network-based approaches have also been applied to understand disease processes. Despite the large number of mutations, epigenetic alterations and gene expression perturbations catalogued for human disease [153–156], a relatively small number of pathways may ultimately underlie disease. Pedersen-Bjergaard characterized 8 different genetic pathways for the development of acute myeloid leukemia (AML) and showed that de novo and therapy-related AML can be considered biologically as the same disease. Similarly, a core set of 12 signaling pathways and processes has been identified in pancreatic cancer. A novel framework for the identification of disease-specific protein biomarkers, integrating biofluid proteomes and inter-disease relationships using a network paradigm, was recently described. From a blood plasma biomarker network of 136 diseases and 1,028 detectable blood plasma proteins and a urinary biomarker network of 127 diseases and 577 urine proteins, it was shown that the majority (>80%) of putative protein biomarkers are linked to multiple disease conditions, with few associated with a single disease.
A network-based gene-environment-disease approach recently identified key regulatory pathways that integrate genetic and environmental modulators of disease. In that study, a network of complex diseases and environmental factors was derived through the identification of key molecular pathways associated with both genetic and environmental effects, based on information in the Genetic Association database and the CTD. The analysis identified natural and synthetic retinoids, antipsychotic medications, omega 3 fatty acids, and pyrethroid pesticides as potential environmental modulators of metabolic syndrome phenotypes through PPAR and adipocytokine signaling, and organophosphate pesticides as potential environmental modulators of neuropsychiatric phenotypes. Intersection of the pathways most often enriched in genetic association studies and in environmental factor research suggests that retinol metabolism, Jak-STAT signaling, Toll-like receptor signaling, and adipocytokine signaling are critical to complex disease progression.
A new research discipline, systems epidemiology, has been proposed that would use a novel “globolomic” design of prospective cancer epidemiology studies, together with data obtained through omic technologies and systems approaches, to assess cancer risk in an integrated manner. This approach would consider the complexity of the multistage carcinogenic process, the latency time, and the changing lifestyle of the cohort members, integrating environmental information with data spanning multiple levels of the biological scale. Challenges remain in the design of human toxicogenomic and, ultimately, systems epidemiology studies.
A large number of toxicogenomic endpoints are generated from individual studies, e.g. gene expression data for ~21,000 genes, DNA methylation data for multiple CpG sites per gene, and ~1 million SNPs. Given the degree of human heterogeneity and the large numbers of potential biomarkers examined, the so-called curse of dimensionality means that toxicogenomic studies need to be designed with sufficient power (relatively large sample sizes) to detect effects of the exposure under examination and to allow for analysis of the interrelationship among different toxicogenomic endpoints in the systems biology approach. As the dimension becomes larger, the challenge becomes more profound: studies that do not properly account for it run a high risk of false positive findings.
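The link between the number of endpoints tested and the sample size required can be made concrete with a rough calculation. The sketch below is an illustration under stated assumptions rather than a design recommendation: it uses the standard normal-approximation formula for comparing two group means, applies a Bonferroni correction across all tests, and takes a hypothetical standardized effect size of 0.5. The function names are illustrative.

```python
# Rough sketch: how multiple testing inflates the required sample size.
# ASSUMPTIONS: two-group comparison, normal approximation, Bonferroni
# correction, and a hypothetical standardized effect size.
from math import ceil
from statistics import NormalDist

def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives if every null hypothesis is true
    and each test is run at level alpha without correction."""
    return n_tests * alpha

def per_group_n(effect_size, n_tests=1, alpha=0.05, power=0.8):
    """Approximate subjects per group for a two-sided two-sample comparison,
    with the family-wise error rate held at alpha by Bonferroni correction."""
    z = NormalDist().inv_cdf
    alpha_per_test = alpha / n_tests          # Bonferroni-adjusted level
    z_alpha = z(1 - alpha_per_test / 2)       # two-sided critical value
    z_beta = z(power)                         # quantile for the target power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

if __name__ == "__main__":
    print(expected_false_positives(21000))    # uncorrected testing of ~21,000 genes
    print(per_group_n(0.5))                   # a single pre-specified endpoint
    print(per_group_n(0.5, n_tests=21000))    # transcriptome-wide testing
```

Under these assumptions, testing ~21,000 genes rather than one endpoint raises the required group size several-fold. Less conservative error criteria, such as the false discovery rate, reduce but do not remove this penalty.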
Epidemiologic studies generally adjust for confounding by age, smoking, and gender. Because of the likely synergistic effects of complex mixtures, overlap in toxic mechanisms, and interaction with non-chemical stressors, studies should also adjust for past exposure to the substance under examination and for current co-exposures that are potential confounders. Other confounding may be more difficult to control for. Diet modulates the human blood transcriptome, and even under similar dietary conditions, variability in the gut microbiome influences host metabolism, physiology and gene expression. Stress, exercise [40–41], and lifestyle also modulate the human transcriptome. It has been postulated that distal environmental conditions, such as in utero or early childhood exposures, can influence an individual’s response to a later exposure. Cumulative damage, such as genetic mutations or epigenetic alterations, could increase the risk of disease even at low exposures, particularly for diseases occurring later in life. This is supported by the finding of a greater effect of environmental tobacco smoke (ETS) among smokers compared to never-smokers in a large prospective study of respiratory cancer and chronic obstructive pulmonary disease.
A major goal of human toxicogenomic studies is the examination of effects at low doses. Precise, individual exposure assessment and measures of internal dose covering a range of doses, including low/environmental levels, are necessary. Typically, dose-response has not been incorporated into the design of such studies. Our recent study of transcriptomic profiles associated with benzene exposure, which incorporated precise, individual exposure measurements and examined a range of doses, revealed potential biomarkers and pathways uniquely impacted at low-dose benzene exposure [65–66].
Given the high-dimensional nature of toxicogenomic data, standardization of data analysis is desirable. One of the main current approaches is to examine the association of variables (e.g., gene expression) with exposures one at a time, limiting the number of false positives by controlling an error rate, ranging from the conservative family-wise error rate (FWER) to the more lenient false discovery rate (FDR). Many techniques have been proposed, for instance permutation-based methods as well as the commonly used Benjamini and Hochberg method for controlling FDR. We have focused on re-sampling based multiple testing methods that can gain efficiency by using knowledge of the marginal distribution of the test statistics [165–166]. We also believe there is great promise in using semiparametric models developed for causal inference as tools for biomarker discovery. In addition to looking on a gene-by-gene basis, one can gain power and possibly aid interpretation by looking for common patterns among sets of genes, for instance by use of clustering algorithms or so-called gene-set enrichment analysis (GSEA), as well as by looking for gene ontologies with over-represented, differentially expressed genes [171–172].
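The difference between FWER and FDR control can be seen in a small worked example. The sketch below implements the Benjamini and Hochberg step-up procedure and, for contrast, the Bonferroni correction in plain Python; the p-values are made up for illustration and the function names are hypothetical. It is not the re-sampling based approach referred to above.

```python
# Minimal sketch of two multiple-testing corrections.
# The p-values used here are MADE UP for illustration.

def bonferroni(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate).

    Sort the p-values, find the largest rank k with p_(k) <= (k / m) * q,
    and reject the k hypotheses with the smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

if __name__ == "__main__":
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    print(sum(bonferroni(p)))          # hypotheses rejected under FWER control
    print(sum(benjamini_hochberg(p)))  # hypotheses rejected under FDR control
```

For these ten p-values, Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, illustrating why FDR control is described as the more lenient criterion.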
The goals of deriving information pertinent to hazard identification (chemical classification) and risk characterization (mechanism of action) from human toxicogenomic data are predicated on an expansion of current toxicogenomic databases. This is challenging given that human population studies are expensive to undertake with the power required for systems biology approaches. Characterization of the “normal” human blood transcriptome, methylome, and proteome is also necessary. The NIH Roadmap Initiative, established in 2007, aims to develop comprehensive reference epigenome maps. As more toxicogenomic studies with robust exposure assessment are performed, the range of “normal” profiles will be delineated in parallel.
Human toxicogenomic studies typically analyze effects in readily available biofluids such as blood and urine. It is uncertain whether such tissues are good surrogates of all potential target tissues such as bladder, kidney, and lung. Toxicogenomic profiles from disease-relevant tissues, such as exfoliated bladder cells in the investigation of bladder cancer or bronchial airway epithelial cells in lung cancer, as was recently done using miRNA profiling for cigarette smoke exposure, may be necessary. Further, analyzing changes in blood generates an average of the responses of all cell populations therein and may mask effects on cellular subtypes. Optimal sample processing for all toxicogenomic endpoints is challenging.
A final challenge is the interpretation of the findings of human toxicogenomic studies. In order to make causal inferences, true effects need to be distinguished from adaptive responses in the context of appropriate phenotypic anchors. It is unclear whether the demonstration of key events (critical perturbations) predictive of health endpoints (e.g. cancer) is necessary, or whether perturbations of baseline biological processes sufficient to induce a substantial cellular-level response (e.g. stress response) provide an adequate endpoint for risk assessment.
With appropriate study design and sufficient power, data from toxicogenomic studies of exposed human populations can inform risk assessment by generating biomarkers of exposure and/or early effect, elucidating mechanisms of action underlying exposure-related disease, and detecting effects at low doses. The incorporation of precise, individual exposure measurements, phenotypic anchors (pre-disease or traditional toxicological markers), and a range of relevant exposure doses is necessary. As more studies are performed and incorporated into databases such as CTD and CEBS, data can be mined for classification of newly tested chemicals (hazard identification) and for investigating the inter-relationship among genes, environment and disease in a systems biology approach (risk characterization). Efforts are underway to address the challenges to this approach, including improvements in exposure assessment; accounting for past and current exposures and other confounding factors, particularly at low doses; consideration of the effect of lifestage and the contribution of cumulative exposures; development of the powerful bioinformatic approaches required for systems biology analyses; and standardization of toxicogenomic statistical analyses. The size of the studies required for sufficient power, and the associated cost, need to be addressed through prioritization and availability of funding.
We thank Drs. Kate Guyton and Babasaheb Sonawane, at U.S. EPA, for their critical comments and suggestions. This work was supported by NIH grants RO1ES06721 and P42ES04705.
Conflict of Interest Statement
Dr. Smith has received consulting and expert testimony fees from lawyers representing both plaintiffs and defendants in cases involving claims related to exposure to benzene.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Cliona M. McHale, School of Public Health, Division of Environmental Health Sciences, University of California, Berkeley, CA 94720.
Luoping Zhang, School of Public Health, Division of Environmental Health Sciences, University of California, Berkeley, CA 94720.
Alan E. Hubbard, School of Public Health, Division of Biostatistics, University of California, Berkeley, CA 94720.
Martyn T. Smith, School of Public Health, Division of Environmental Health Sciences, University of California, Berkeley, CA 94720.