We have described a prototype Environment-Wide Association Study (EWAS), applied it to Type 2 Diabetes (T2D), validated many of our significant findings across independent cohorts, and confirmed some of them through the literature. This pilot study was made possible by the multiple cohorts present in the nationally representative NHANES dataset. We rediscovered factors, such as carotenes and PCBs, with previously known associations with T2D. Unexpectedly, we found that higher levels of γ-tocopherol were associated with a higher likelihood of T2D, independent of dietary intake. Of the components of vitamin E, γ-tocopherol is the most abundant form in the US diet [36], and accounts for up to 50% of the total vitamin E in human muscle and adipose tissue [37], two known insulin-target tissues. As γ-tocopherol has previously been suggested as a preventive agent against colon cancer [38], any potential adverse metabolic effects of this vitamin should be studied closely.
Another novel finding was the significant association between heptachlor epoxide levels and T2D. Heptachlor is a pesticide; most uses of heptachlor were discontinued in the late 1980s [39]. The main source of heptachlor and its breakdown product, heptachlor epoxide, is food, but heptachlor epoxide is persistent in the environment and can even be passed in breast milk [40]. While a significant association with T2D has been reported among thirty thousand pesticide applicators who used heptachlor [41], to our knowledge, this broad association between heptachlor epoxide and T2D in the general public, as surveyed by NHANES, is novel.
While this study successfully demonstrates a prototype EWAS for T2D, this methodology can be reconfigured to measure the relationship between environmental factors and other disorders, such as obesity, lipid level abnormalities, hypertension, and/or cardiovascular disease. Methodologically, the EWAS takes inspiration from GWAS, which have been used to assess the correlation between genome-wide variability and disease.
Like GWAS, the utility of EWAS lies in two types of hypothesis generation. First, the EWAS framework can be used to propose targets for further study. For example, many factors are correlated; some are similar structurally, such as the isomers of β-carotene, or co-occur in the environment, such as the PCBs and organochlorine pesticides. As we extend the GWAS analogy, these and other environmental factors could be said to be in “linkage disequilibrium” with each other. Just as is done for preliminary GWAS findings, EWAS findings can and should be used to identify further factors that may be in “disequilibrium,” for further detailed measurement and causal identification.
We acknowledge that the measurement of 266 environmental factors is hardly a comprehensive study of the environment, but this is still a greater number of factors than the 30 microsatellite markers [42] or 100 single nucleotide polymorphisms (SNPs) measured in some of the earliest implementations of GWAS [43]. We suggest that measurement technologies for the environment can and will improve in resolution as novel associations are made, even with the few measurements available in these prototype studies. Measurement of the panel of environmental factors used here, most of which is performed by mass spectrometry, currently costs an estimated $40,000 per individual [44], or close to the current pricing for whole-genome sequencing.
Another type of hypothesis we may generate concerns the complex causes of disease. For example, we can now use an EWAS to hypothesize about “gene-environment” interactions and their relation to disease etiology. A future study addressing gene-environment interactions might be designed as a combination of a GWAS and an EWAS, in which genetic variability is assessed simultaneously with key environmental factors. While marginally more resource intensive, this type of study design could perhaps facilitate an explanation of disease causation that has eluded genome-wide scans and, in addition, provide more accurate estimates of attributable risk.
The EWAS allows for comprehensive and systematic analysis of the effects of the environment in association with disease on a broad scale. While many investigators have already utilized the NHANES to address the effect of a limited number of factors on disease, these studies do not provide a global view of these associations [45], [46]. Further, while arriving at similar results, the previous studies use differing definitions of T2D status (medical questionnaire) and differing exposure coding (discretization or log transformation), and they lack methods for multiple comparison control [47]–[49]. It is the well-established toolkit of the GWAS that has provided us with methods to overcome these limitations and enabled us to postulate about environment-wide association with disease.
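To make the notion of multiple comparison control concrete, the following is a minimal sketch of one widely used procedure, the Benjamini-Hochberg step-up rule for the false discovery rate. It is an illustration only: this section does not state which correction procedure was applied in the study, and the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha.

    Illustrative Benjamini-Hochberg step-up rule; not necessarily the
    procedure used in the study discussed here.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, one per tested environmental factor.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, alpha=0.05)
```

With several hundred factors tested at once, a rule of this kind limits the expected proportion of false positives among the reported associations, rather than testing each factor at an unadjusted threshold.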
Limitations of this study remind us that measuring environment-wide aspects in relation to phenotypic states such as disease will be a difficult undertaking [50]. Unlike genetic loci, the environment is boundless. While the NHANES provides a large number of factors to study, a comprehensive assessment will require precise definition over a broader dimension (more factors). While laboratory measurements are collected during a baseline fasting state for all participants in NHANES, we will still have to account for the dynamic and heterogeneous nature of different exposures and their associated responses by taking replicate measurements at different physiological states. Further, this study utilizes cross-sectional data and can only show correlation between exposure and disease prevalence. To ascertain causality, we would need to perform prospective EWAS over the life course, consider incident cases, and/or consider randomization methods [51] as additional validation. Due to the number of hypotheses generated, we would need to integrate more evidence from large-scale collaborative studies in order to confirm (or refute) etiological aspects of these factors while being as comprehensive as possible in the observation of potential confounding variables. For example, additional factors such as behavior (food consumption, drug use, and/or exercise patterns), geographic location, and occupation must also be ascertained to account for associated risk factors and reverse causality.
While GWAS has allowed us to find novel variants associated with T2D of possible mechanistic importance and provided a model for a comprehensive study of the environment described here, associated variants have had only moderate effect sizes to date. Most of the risk loci identified with GWAS have small individual odds ratios, generally less than 1.3 [52]–[54], and the highest has been reported to be 1.71, belonging to a variant in the TCF7L2 gene [55], [56]. Albeit from different populations and analytical scenarios, the effect sizes of our validated environmental factors on T2D were comparable to the highest odds ratios seen in GWAS.
However, the correlated and dynamic nature of a multitude of environmental factors will hinder causal inference to a greater degree than in GWAS [50]. Similar biases nevertheless influence GWAS interpretation. For example, the statistical association of a variant of FTO with T2D was nullified by accounting for BMI [57]. Despite these hindrances, we view EWAS, like GWAS, as a step towards learning about a component that plays a large role in complex disease.
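As an illustration of how accounting for a third variable can nullify an association, the sketch below compares a crude odds ratio with a Mantel-Haenszel odds ratio pooled across strata. All counts are hypothetical and chosen only to make the effect visible; they are not taken from the cited FTO study, and stratification is one of several possible adjustment methods, not necessarily the one used there.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over an iterable of (a, b, c, d) tables."""
    tables = list(strata)
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical counts: "exposed" = carriers of a variant, "cases" = T2D.
# Within each BMI stratum the variant is unrelated to T2D (odds ratio of 1),
# but carriers are concentrated in the higher-BMI stratum, where T2D is common.
strata = {
    "lower BMI": (10, 90, 40, 360),
    "higher BMI": (120, 280, 30, 70),
}

# Collapse the strata into a single table to obtain the crude odds ratio.
pooled = tuple(sum(table[i] for table in strata.values()) for i in range(4))
crude_or = odds_ratio(*pooled)                      # ignores BMI
adjusted_or = mantel_haenszel_or(strata.values())   # stratified by BMI
```

In this constructed example the crude odds ratio is well above 1, while the stratified estimate equals 1, because the apparent association arises entirely from the uneven distribution of carriers across BMI strata.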
It is imperative not only for epidemiologists and geneticists but also for physicians and their patients to understand how multiple environmental factors may influence disease in a systematic fashion. Individuals are already demanding information regarding their “body burdens”, or the number and amount of chemicals present in their system, as evidenced by the “Human Toxome Project” [44], [58]. We must learn how all these factors might contribute to disease in the context of other common risk factors to inform our health care practitioners and individuals appropriately. We must conduct our analyses in a non-selective fashion.
In conclusion, the EWAS is a promising way to search for and consider potential environmental factors associated with disease or other clinical phenotypes. These results demand a rethinking and restructuring of studies that examine disease in the genomics context. The time is ripe to usher in “enviromics” [59], the study of a wide array of environmental factors in relation to health and biology.