The process we describe in this section has two components. The first uses DAGs as a visual tool to explore a range of possible interaction scenarios. The second uses DAGs as a formal tool to describe the dependence among the variables in the problem. These two components go hand in hand: intuition about the problem will generally guide the first, whilst the second will reflect information in the observed data as well as considerations about what is biologically plausible. In a subsequent step, which is beyond the scope of this paper, the quantitative effects of the interaction can be estimated. How this step is carried out will depend both on the nature of the data available and, crucially, on the model for interaction. We assume an additive interaction model for simplicity; however, the DAGs work equally well with a multiplicative model, as they describe associations rather than their exact mathematical form.
We consider first the case study of gene-environment interactions (GEI) involving risk of Parkinson's disease, the DJ-1 gene and exposure to paraquat described above. To do this we use simplified versions of models proposed by Khoury et al. [55] and Ottman [56]. Subsequently, we consider fruit fly experiments where the associations between Parkinson's, DJ-1 and paraquat have been ascertained, and we present this as the ideal situation to make causal inference. The approach we are proposing can also be used to tackle a range of other complex problems.
In order to look at possible GEI scenarios we need to introduce some simple notation:
gene: DJ-1 = d* variant (deletion as in the Dutch families or inactivity as in the Italian families); DJ-1 = d wild type
pesticides: P = p* exposed; P = p unexposed
disease: Y = 1 with Parkinson's disease; Y = 0 without Parkinson's disease
The crux of this approach is the introduction of an interaction variable I, which is determined by the values of the genetic and environmental exposure variables. In simple terms, it acts like a switch: it is turned "on" when its parents take on certain values and "off" when they take on others (a parent of a variable X is a variable with an edge pointing into X, and X is then a child of that variable). In the current context, it is typically the presence of the genetic exposure (i.e., the genetic variant) and/or the environmental exposure leading to an increase in disease risk that turns the interaction "on". Thus, in addition to the above variables, we also define:
interaction: I = 1 ("on") if there is an interaction and I = 0 ("off") if there isn't. The exact nature of the interaction depends on the contexts sketched below.
For the sake of simplicity, we assume that I is a deterministic variable. What we mean by this is that, unlike the other variables in the problem, I is not random: once the values of its parents are known, so is the value of I. This might be considered unduly restrictive if there are other potential parents in the interaction which are suspected but unobserved. It is possible in these cases to view I as a random variable, where its variability is associated with that of the unobserved interactant. However, in this paper we focus on the simplest case and thus we make the following assumption:
1. DJ-1 and P are the only parents of the interaction variable I.
Another assumption that is generally plausible, provided that the exposure does not modify the genetic structure (e.g., the exposure does not cause somatic mutations), is that:
2. There is no a priori association between the gene and the external exposure; this is represented by the absence of a directed edge between DJ-1 and P in the DAGs below.
Generally, this is a plausible assumption provided that the exposure does not modify the genetic structure [57]. In this specific example, the assumption is likely to hold. With other environmental exposures, however, it may not. For example, some lifestyle factors may be associated with genotypes predisposing to (or causing) Parkinson's disease, as the dopaminergic system is involved in reward mechanisms and is hypothesized to influence some seeking behaviours and addictions (e.g., smoking or alcohol drinking) [58].
The idea of I as a variable to represent interaction is similar to the sufficient component cause (SCC) variables in VanderWeele and Robins [59]. We feel, however, that our approach presents a few advantages over the SCC framework. As we do not need to incorporate all the sufficient causes (we are not using a causal DAG), the structure of our DAGs is less cumbersome. Also, although for the sake of simplicity we have defined I in terms of binary exposures, it can easily be extended to multi-valued or continuous exposures. The DAG in Figure shows a complex situation we can imagine, given assumptions 1 and 2, in which there is confounding between the exposure to paraquat and the disease outcome (Cp) as well as confounding between DJ-1 and the disease (Cd), and no other variables are postulated. Confounding between exposure to paraquat and the disease might arise, for example, because people exposed to paraquat may also be more likely to smoke, a factor that is negatively associated with the risk of Parkinson's disease [60]. Confounding between DJ-1 and the disease might be due to the involvement of the dopamine-mediated reward system [58]. Any observational study (that is, any study of these issues in humans) is unlikely to observe all potential confounders. Nevertheless, to simplify our model, we also assume that:
3. There are no further confounders between either the gene and the outcome or the exposure and the outcome. This is represented by the absence of additional variables and corresponding directed edges in the DAGs below.
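As an illustration of how these structural assumptions can be written down and checked mechanically, the following minimal sketch encodes a DAG as a mapping from each node to its set of parents. The node names, the helper functions and the particular set of edges into Y are assumptions made for illustration (the edges into Y differ between the scenarios considered in this section); only assumptions 1 to 3 are taken from the text.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Each node maps to the set of its parents (nodes with an edge pointing into it).
# Hypothetical encoding: the edges into Y shown here are one possible choice.
dag = {
    "DJ1": set(),
    "P": set(),
    "I": {"DJ1", "P"},
    "Y": {"DJ1", "P", "I"},
}


def is_acyclic(parents):
    """Return True if the graph contains no directed cycle."""
    try:
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def satisfies_assumptions(parents):
    """Check assumptions 1 to 3 on a parent mapping."""
    # Assumption 1: DJ-1 and P are the only parents of I.
    only_gene_and_exposure = parents["I"] == {"DJ1", "P"}
    # Assumption 2: no directed edge between DJ-1 and P, in either direction.
    no_gene_exposure_edge = (
        "P" not in parents["DJ1"] and "DJ1" not in parents["P"]
    )
    # Assumption 3: no further (confounding) variables in the graph.
    no_extra_nodes = set(parents) == {"DJ1", "P", "I", "Y"}
    return only_gene_and_exposure and no_gene_exposure_edge and no_extra_nodes
```

Calling `is_acyclic(dag)` and `satisfies_assumptions(dag)` on the mapping above returns `True` in both cases; adding an edge from DJ-1 into P, or an extra confounder node such as Cp, makes the second check fail.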
Now we turn our attention to the case study, evaluating the plausibility of a few different GEI scenarios. As mentioned above, these are loosely based on Khoury et al. [55]. For each of the models that we consider below, we present a more formal description in Appendix 4.
Model I
Both exposure and genotype are required to increase risk, as in Figure . Here, if I is "on", there is an association between the disease and both the genetic exposure and the environmental exposure to pesticides, which arises only when both are present. If, on the other hand, I is "off", there is no association; in other words, Parkinson's is associated with DJ-1 and paraquat exposure only through the interaction itself. This is an extreme form of interaction that is unlikely to occur in the pathogenesis of common diseases. Does this model describe the relationship between DJ-1, exposure to pesticides and Parkinson's disease? For this to be the case, all the Dutch and Italian families with the variant DJ-1 and Parkinson's would also have to have been exposed to pesticides. Further, the incidence of Parkinson's amongst the families with the gene variant would have to be the same on average as that of those without the gene variant (if unexposed to pesticides). Similarly, in the absence of the DJ-1 variant, those exposed to pesticides would have to have the same incidence as those not exposed. This is clearly not the case.
Model II
The exposure to pesticides increases the risk of disease but the presence of the gene variant alone does not, although the variant further increases the risk of disease in the exposed population (Figure ). In this model, I is switched on and off by P. When P = p* (exposure to pesticides), I = 1, indicating that the interaction is switched "on" and the presence of the DJ-1 variant influences the risk of Parkinson's. When P = p, I = 0, and whether DJ-1 is the variant or wild-type form makes no difference to the outcome Y. It is possible that in some cases exposure to P is protective; i.e., I would take the opposite value of P in a binary situation. In more complex situations, the effect of P might be such that only certain values of P result in interactions, and in these cases the values of I and P would not be the same. In this instance, Y depends directly on exposure P; however, Y depends on DJ-1 only through the interaction, that is, only when the exposure is present (P = p*).
This model is also not a plausible description of the relationship between the three variables based on the evidence at hand, as it would mean that all the families with the variant and Parkinson's would have to also have been exposed to pesticides.
Model III
Exposure to pesticides exacerbates the effect of the gene variant but has no effect on persons with the normal genotype. In this model, I is switched on and off by DJ-1. This model does not provide a plausible explanation of the available evidence either (Figure ).
Model IV
The environmental exposure and the gene variant each have some effect of their own, but together each further modifies the effect of the other. Here I is a function of both P and DJ-1 and is defined as follows: I is "on" if and only if both P and DJ-1 are "on"; otherwise I is "off". There are also direct associations between P and Y and between DJ-1 and Y other than through I; this indicates that there are effects of P on Y irrespective of DJ-1, and effects of DJ-1 on Y irrespective of P. From the data we cannot distinguish between DAGs A and B in Figure .
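The four scenarios can be contrasted in a short sketch that treats I as a deterministic switch and computes a disease risk on the additive scale. The function names and the three risk increments are hypothetical, chosen only to reproduce the qualitative pattern described for each model; the sketch does not attempt to represent the two variants A and B of Model IV, and it is not a substitute for the formal description in Appendix 4.

```python
BASELINE = 0.01  # hypothetical risk with the wild-type gene and no exposure
MAIN = 0.02      # hypothetical increment for a direct effect of one factor
JOINT = 0.05     # hypothetical extra increment when the interaction acts


def switch(model, variant, exposed):
    """Deterministic interaction variable I (1 = "on", 0 = "off")."""
    if model in ("I", "IV"):
        return int(variant and exposed)  # on only when both are present
    if model == "II":
        return int(exposed)              # switched on and off by P
    if model == "III":
        return int(variant)              # switched on and off by DJ-1
    raise ValueError(f"unknown model: {model}")


def risk(model, variant, exposed):
    """Illustrative additive risk of disease under each scenario."""
    i = switch(model, variant, exposed)
    d, p = int(variant), int(exposed)
    if model == "I":
        # No effect of either factor except through the interaction.
        return BASELINE + JOINT * i
    if model == "II":
        # P acts directly; DJ-1 matters only when I is on.
        return BASELINE + MAIN * p + JOINT * i * d
    if model == "III":
        # DJ-1 acts directly; P matters only when I is on.
        return BASELINE + MAIN * d + JOINT * i * p
    # Model IV: both factors act directly, plus an extra joint effect.
    return BASELINE + MAIN * d + MAIN * p + JOINT * i
```

For example, `risk("II", True, False)` equals `risk("II", False, False)`, reflecting that under Model II the variant alone does not raise the risk, whereas under Model IV each factor raises the risk on its own and the two together raise it further.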
A core issue with these models is that I is essentially unobservable in humans living under normal conditions; these biological interactions can only be tested in animal experiments. Thus, in humans we cannot tell the two DAGs above apart without further information (VanderWeele and Robins [61] provide some tests to determine which individuals present Y only when the interaction I is "on", provided there is no unmeasured confounding). In order to tell them apart fully, an experiment can be conducted or the relative risks can be compared (see Appendix 1).
In light of the evidence on Parkinson's disease, we have to favour one of the two type IV models over the other three, as it would appear that both the genetic and the environmental exposure have separate (independent) effects on the risk of Parkinson's. However, from the data on humans we cannot distinguish between the two "type IV" models until we run a study to determine the presence of an interaction. In the case of the Drosophila experiments (see section below) the interaction model on the left-hand side provides a better explanation, as flies with the mutation that have been exposed show greater sensitivity to pesticides than those that do not have the mutation.
The example we have shown illustrates, we think, a common situation concerning the interaction between metabolic genes and environmental exposures (e.g. arylamines and NAT2, PAH and GSTM1 and many others) but has the peculiarity that experiments in Drosophila have been done (see below).
Experimental evidence: the case of the Drosophila
The DAGs above alone cannot be directly used for causal inference unless additional assumptions are made or experiments conducted. The reason is the limited information on potential confounders (and intermediate variables, etc.) that can influence the relationship between the three observed variables. For the sake of making the DAGs clear, we have assumed that there are no confounders; however this is unlikely to be the case in practice as Parkinson's is a multifactorial disease. The method we have proposed can however be extended to include confounders and intermediate variables.
In the case of Drosophila the situation is simpler. Meulener et al. [49] show that both exposure to pesticides and the mutation of DJ-1 may be associated with increased risk of neural degeneration. Further, the combination of the two has also been demonstrated to aggravate the condition, as the flies which had the DJ-1 gene knocked out exhibited a ten-fold increase in sensitivity to paraquat (which would indicate a supra-multiplicative interaction).
As in this case both the genetic make-up and the exposure status of the flies have been intervened upon under controlled conditions, we can make causal inferences based on these data by introducing randomisation variables into our DAG. The DAG in Figure is an augmented DAG [38] that includes randomisation variables Rp and Rd. These tell us whether P or DJ-1 is being randomised and allow us to make inferences about interventions and, hence, causality using DAGs. For a more detailed discussion see Appendix 2.
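To see why intervening on the exposure licenses causal statements, consider the following sketch. It computes, by exact enumeration, the risk difference for exposure under an observational regime and under a randomised regime. The binary confounder C, the `randomised` flag (playing the role of Rp) and all probabilities are hypothetical and chosen only for illustration; they are not taken from the Drosophila data.

```python
from itertools import product

P_CONFOUNDER = 0.5  # hypothetical prevalence of a binary confounder C


def p_exposed(c, randomised):
    """Probability of exposure given C under each regime."""
    if randomised:
        return 0.5                # exposure assigned independently of C
    return 0.8 if c else 0.2      # observational: C influences exposure


def p_outcome(p, c):
    """Hypothetical outcome risk; the effect of exposure itself is 0.1."""
    return 0.1 + 0.1 * p + 0.2 * c


def risk_difference(randomised):
    """Risk in the exposed minus risk in the unexposed, ignoring C."""
    joint = {}  # (p, c) -> joint probability of exposure status and C
    for p, c in product((0, 1), repeat=2):
        pc = P_CONFOUNDER if c else 1 - P_CONFOUNDER
        pe = p_exposed(c, randomised)
        joint[(p, c)] = pc * (pe if p else 1 - pe)

    def risk(p):
        total = sum(joint[(p, c)] for c in (0, 1))
        return sum(joint[(p, c)] * p_outcome(p, c) for c in (0, 1)) / total

    return risk(1) - risk(0)
```

With these numbers the randomised regime recovers the built-in effect of 0.1, whereas the observational regime yields a larger crude difference of 0.22, because the confounder is unevenly distributed between exposed and unexposed groups.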
The DAG in Figure implies that, for the Drosophila at least, we can state that exposure to pesticides causes an increased risk of neural damage, as does the presence of the mutated DJ-1 gene. Also, as the combined presence of the mutation and paraquat further increases the risk of neural damage, we can infer the presence of an interaction. It should be noted that DAGs do not specify or constrain the model of statistical interaction, which can follow either an additive or a multiplicative null hypothesis model.
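The distinction between the two null models can be made concrete with a small sketch that takes the four risks defined by gene status and exposure status and returns one measure of departure on each scale. The function name and the example risks are hypothetical; the two formulas are the standard interaction contrast and the ratio of the joint relative risk to the product of the individual relative risks.

```python
def interaction_measures(r00, r10, r01, r11):
    """Departure from the additive and multiplicative null models.

    r_dp is the disease risk with gene status d and exposure status p
    (1 = variant or exposed, 0 = wild type or unexposed).
    """
    # Equals 0 when the joint effect is the sum of the separate effects.
    additive = r11 - r10 - r01 + r00
    # Equals 1 when the joint relative risk is the product of the
    # separate relative risks: (r11/r00) / ((r10/r00) * (r01/r00)).
    multiplicative = (r11 * r00) / (r10 * r01)
    return additive, multiplicative


# Hypothetical risks, for illustration only.
additive, multiplicative = interaction_measures(0.01, 0.02, 0.03, 0.06)
```

For these illustrative risks the additive contrast is positive (about 0.02) while the multiplicative ratio is 1 up to floating-point rounding, so the same data show an interaction on one scale and none on the other. A ratio above 1 would correspond to a supra-multiplicative interaction.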
In the case of humans, we cannot assume such randomisation variables exist (except in Mendelian randomisation which, however, applies to gene variants only, and not to exposure); thus, we cannot expand the DAG in Figure . On the other hand, etiologic factors and clinical phenotypes are usually more diverse in human diseases than in animal models; inferences to human diseases from relatively simple animal experiments have well known limitations. An avenue for progress lies in integrating DAGs with the inductive reasoning implicit in Hill's guidelines.
Application of causal guidelines to DJ-1 and exposure to paraquat for Parkinson's disease
Following the DAG approach, we established the relationship between genes and some environmental exposures in promoting Parkinson's disease, and we proposed different interaction models between DJ-1, pesticides and Parkinson's disease. In order to apply Hill's causal guidelines to the DAGs we are going to work with (Figure ), we need to label each of the edges. Throughout the rest of this section we use the following labels:
• The edge between DJ-1 and Parkinson's disease is referred to as [edge 1],
• The edge between exposure to pesticides and Parkinson's disease is referred to as [edge 2],
• The interaction between DJ-1 and the exposure to pesticides in causing Parkinson's disease is called [edge 3].
Hill's guidelines are discussed in a slightly different order than in the original version and statistical significance is omitted because it refers to the contingent evaluation of each study and does not require a specific discussion in relation to genomics.
(a) Strength of association. DJ-1 has been seen to be lacking in Dutch families with Parkinson's disease, and to be functionally inactive because of a point mutation in the Italian families studied by Bonifati et al. [51]. The deletion showed complete cosegregation with the disease allele in the Dutch family [51]; in the Italian family, too, the homozygous mutation showed complete cosegregation with the disease haplotype, and was absent from large numbers of control chromosomes [62]. Although the function of the DJ-1 protein is unknown, these data suggest a strong association between the DJ-1 gene and the occurrence of Parkinson's disease in certain families [edge 1]. Establishing the strength of the association between specific environmental factors and a disease is far more complicated, mainly because of difficulties with the quality of exposure assessment, the latency period, and body concentrations during the lifecourse. A meta-analysis of the association between pesticides and Parkinson's disease indicates that both pesticide exposure in general and selective exposure to paraquat seem to be associated with Parkinson's disease, with odds ratios ranging from 1.25 (95% CI: 0.34-4.36) to 3.22 (95% CI: 2.41-4.31) [53] [edge 2]. With respect to the interaction parameter, there is as yet no epidemiological study that has tested whether there is an interaction between DJ-1 and pesticides; thus neither the existence nor the strength of such an association is known. However, knockout models of Drosophila melanogaster (fruit fly) lacking DJ-1 function display a marked and selective sensitivity to the environmental oxidative insults exerted by both paraquat and rotenone [49], suggesting an interaction between these toxicants and the DJ-1 genotype [edge 3] in animal models and, consequently, that the interaction between the chemicals and DJ-1 is biologically plausible in humans. (As can be seen, Hill's criteria often "interact", i.e., they are often related to each other; in this paragraph the strength of association is related to the biological plausibility.)
(b) Consistency of the association. After the first variants described, different variants of the DJ-1 gene associated with the same Parkinson's disease phenotype have been found in patients of Ashkenazi Jewish and Afro-Caribbean origins [63,64] [edge 1]. The association of paraquat and rotenone with Parkinson's disease is more consistent in animals (in which these two toxicants are often used to produce animal models of the disease) [54] than in humans. In environmental epidemiological studies in humans, the association has been found to be broadly consistent across studies, although some associations did not reach statistical significance, mainly due to limited sample size. In a study in Taiwan, where paraquat is routinely used in rice fields, a strong association between paraquat exposure and Parkinson's disease was found; the hazard increased by more than six times in subjects exposed for more than 20 years [64]. A dose-response curve with length of exposure was also observed in plantation workers in Hawaii [65] and British Columbia [66]. In a population-based case-control study in Calgary, occupational herbicide use was the only significant predictor of Parkinson's disease in multivariable analysis [67]. However, in another population-based case-control study in Washington, the odds ratio of 1.67 did not reach statistical significance (95% CI: 0.22-12.76) [68] [edge 2]. There is as yet no evidence from human studies to confirm the consistency of GEIs in the causation of Parkinson's disease [edge 3]. Furthermore, genes other than DJ-1 may be involved in the etiopathogenic process, and so may be exposures other than pesticides, and other GEIs. Since environmental conditions vary substantially across the globe, and the role of one gene, one exposure or one GEI is often dependent on other genes, exposures and GEIs, lack of consistency is to be expected in studies conducted in different settings, and in particular when studies focus only on a few GEIs and overlook other interactions.
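The link between sample size and statistical significance mentioned above can be illustrated with a short sketch that computes an odds ratio and an approximate 95% confidence interval from a 2x2 table using the log-based (Woolf) method. The function name and the counts are hypothetical, and the cited studies may have used other estimation methods.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a: exposed cases,   b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: the same odds ratio at two different sample sizes.
small_study = odds_ratio_ci(10, 20, 5, 20)
large_study = odds_ratio_ci(100, 200, 50, 200)
```

Both hypothetical tables give an odds ratio of 2, but the interval for the small table includes 1 whereas the interval for the table ten times larger does not, which is the pattern expected when an association fails to reach significance mainly because of limited sample size.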
(c) Specificity of the association. The specificity of the association between DJ-1 gene mutations and Parkinson's disease [edge 1] will be clearer once the data on the pathological features of the DJ-1 patients become available (see Appendix 3). Chronic systemic exposure to rotenone has been demonstrated to cause highly selective nigrostriatal dopaminergic degeneration associated with characteristic movement disorders in rats [54] [edge 2]. Similarly, paraquat caused a significant loss of nigral dopaminergic neurons in mice compared to controls [69] [edge 2]. Once an appropriate epidemiological study aimed at studying GEIs in this context is set up, results from the pathological analysis of the sample subjects will help to answer important questions regarding the aetiological pathway of the disease [edge 3].
(d) Temporality. This criterion does not apply directly to genotype, as it is determined at conception and remains constant over time (see Appendix 1) [edge 1]. However, temporality is crucial if we go beyond genetic effects and consider epigenetic mechanisms; e.g., gene regulation by environmental factors [14,16-18]. This problem goes beyond the present contribution, but is worth mentioning. Concerning pesticides, temporality might be a concern given that all studies on GEI in Parkinson's disease are case-control studies, which are particularly prone to selection bias, disease progression bias, and so-called "reverse causality" [3,70,71]. In this case, while it is unlikely that suffering from Parkinson's disease would have influenced past exposure to pesticides or their metabolism, it could have influenced recall. The observed dose-response relationship, with 20 years of exposure required [53], favours the existence of a true association, and is compatible with disease characteristics of neurodegeneration, making the temporality pattern suggestive of a causal role [edge 2].
(e) Biological gradient. This criterion does not apply since we are dealing with a recessive model of inheritance. Nonetheless, a co-dominant model should not be completely ruled out, as a careful neurological evaluation of heterozygote subjects might point out some sub-clinical changes [edge 1]. A dose-response relationship between toxicant exposure and neural loss in animal experiments has been observed [72]. In addition, several studies observed a positive correlation with duration of exposure to, and high dose of, herbicides and insecticides in humans [53] [edge 2].
(f) Biological plausibility. Biological plausibility of the DJ-1 mutation awaits the discovery and characterisation of the encoded protein [edge 1]; the capability of some toxicants to induce a progressive cellular loss in the substantia nigra and to be responsible for a progressive clinical syndrome with an intervening latent period has been hypothesized [54] [edge 2]. It is, therefore, plausible that these two factors may interact during the course of life, producing Parkinson's symptoms in genetically susceptible individuals [edge 3].
(g) Coherence with previous knowledge. Confirmation of the presence of different mutations on the same DJ-1 gene in families with other background origins but manifesting the same symptoms supports the involvement of the gene in the disease [63,64] [edge 1]. A role of herbicides in neurodegeneration has also been studied with generally confirmatory results [edge 2].
All these considerations taken together suggest that there may be a potential interaction between exposure to certain pesticides and the DJ-1 mutation in the risk of developing Parkinson's disease. However, as no studies on humans have yet been specifically conducted to investigate this issue, we can use the evidence only as a reason to further explore this interaction, perhaps by conducting a more targeted study. As mentioned, it is likely that other factors (both genetic and environmental) also contribute to the final development of the disease.
In the example above we have shown that the DAG approach can be complemented by the use of Hill's guidelines when no experimental evidence can be brought to bear on a particular gene-environment interaction.