PLoS Genet. 2017 November; 13(11): e1007081.
Published online 2017 November 17. doi:  10.1371/journal.pgen.1007081
PMCID: PMC5711033

Orienting the causal relationship between imprecisely measured traits using GWAS summary data

Gibran Hemani* (Conceptualization, Data curation, Formal analysis, Methodology, Software, Writing – original draft, Writing – review & editing), Kate Tilling (Methodology, Writing – original draft, Writing – review & editing), and George Davey Smith (Conceptualization, Funding acquisition, Methodology, Writing – original draft, Writing – review & editing)
Jun Li, Editor


Inference about the causal structure that induces correlations between two traits can be achieved by combining genetic associations with a mediation-based approach, as is done in the causal inference test (CIT). However, we show that measurement error in the phenotypes can lead to the CIT inferring the wrong causal direction, and that increasing sample sizes has the adverse effect of increasing confidence in the wrong answer. This problem is likely to be general to other mediation-based approaches. Here we introduce an extension to Mendelian randomisation, a method that uses genetic associations in an instrumentation framework, that enables inference of the causal direction between traits, with some advantages. First, it can be performed using only summary level data from genome-wide association studies; second, it is less susceptible to bias in the presence of measurement error or unmeasured confounding. We apply the method to infer the causal direction between DNA methylation and gene expression levels. Our results demonstrate that, in general, DNA methylation is more likely to be the causal factor, but this result is highly susceptible to bias induced by systematic differences in measurement error between the platforms, and by horizontal pleiotropy. We emphasise that, where possible, implementing MR and appropriate sensitivity analyses alongside other approaches such as CIT is important to triangulate reliable conclusions about causality.

Author summary

Understanding the causal relationships between pairs of traits is crucial for unravelling the causes of disease. To this end, results from genome-wide association studies are valuable because if a trait is known to be influenced by a genetic variant then this knowledge can be used to test the trait’s causal influences on other traits and diseases. Here we discuss scenarios where the nature of the genetic association with the causal trait can lead existing causal inference methods to give the wrong direction of causality. We introduce a new method that can be applied to summary level data and is potentially less susceptible to problems such as measurement error, and apply it to evaluate the causal relationships between DNA methylation levels and gene expression. While our results show that DNA methylation is more likely to be the causal factor, we point out that it is crucial to acknowledge that systematic differences in measurement error between the platforms could influence such conclusions.


Observational measures of the human phenome are growing ever more abundant, but using these data to make causal inference is notoriously susceptible to many pitfalls, with basic regression-based techniques unable to distinguish a true causal association from reverse causation or confounding [1–3]. In response to this, the use of genetic associations to instrument traits has emerged as a technique for improving the reliability of causal inference in observational data, and with the coincident rise in genome-wide association studies it is now a prominent tool that is applied in several different guises [3–6]. However, shifting from observational associations to instrumentation does require more (often untestable) assumptions, and potential pitfalls remain. One that is often neglected is the influence of non-differential measurement error on the reliability of causal inference.

Measurement error is the difference between the measured value of a quantity and its true value. This study focuses specifically on non-differential measurement error where all strata of a measured variable have the same error rate, which can manifest as changes in scale or measurement imprecision (noise). Such variability can arise through a whole plethora of mechanisms, which are often specific to the study design and difficult to avoid [7, 8]. Array technology is now commonly used to obtain high throughput phenotyping at low cost, but comes with the problem of having imperfect resolution, for instance methylation levels as measured by the Illumina450k chip are prone to have some amount of noise around the true value due to imperfect sensitivity [9, 10]. Relatedly, if the measurement of biological interest is the methylation level in a T cell, then measurement error of this value can be introduced by using methylation levels from whole blood samples because the measured value will be an assay of many cell types [11].

Measurement error will of course arise in other types of data too. For example when measuring BMI one is typically interested in using this as a proxy for adiposity, but it is clear that the correlation between BMI and underlying adiposity is not perfect [12], leading to the problem that phenotypes may be imprecisely defined. A similar problem of biological misspecification is unavoidable in disease diagnosis, and measuring behaviour such as smoking or diet is notoriously difficult to do accurately. Measurement error can also be introduced after the data have been collected, for example the transformation of non-normal data for the purpose of statistical analysis will lead to a new variable that will typically incur both changes in scale and imprecision (noise) compared to the original variable. The sources of measurement error are not limited to this list [8], and its impact has been explored in the epidemiological literature extensively [13, 14]. Given the near-ubiquitous presence of measurement error in phenomic data it is vital to understand its impact on the tools we use for causal inference.

An established study design that can provide information about causality is randomisation. Given the hypothesis that trait A (henceforth referred to as the exposure) is causally related to trait B (henceforth referred to as the outcome), randomisation can be employed to assess the causal nature of the association by randomly splitting the sample into two groups, subjecting one group to the exposure and treating the other as a control. The association between the exposure and the outcome in this setting provides a robust estimate of the causal relationship. This provides the theoretical basis behind randomised control trials, but in practice randomisation is often difficult or impossible to implement in an experimental context due to cost, scale or inability to manipulate the exposure. The principle, however, can be employed in extant observational data through the use of genetic variants associated with the exposure (instruments), where the inheritance of an allele serves as a random lifetime allocation of differential exposure levels [15, 16]. Two statistical approaches to exploiting the properties of genetic instruments are widely used: mediation-based approaches and Mendelian randomisation (MR).

Mediation-based approaches employ genetic instruments (typically single nucleotide polymorphisms, SNPs) to orient the causal direction between the exposure and the outcome. If a SNP is associated with an exposure, and the exposure is associated with some outcome, then it logically follows that in this simple three-variable scenario the estimated direct influence of the SNP on the outcome will be zero when conditioning on the exposure. Here, the exposure completely mediates the association between the SNP and the outcome, providing information about the causal influence of the exposure on the outcome. This forms the basis of a number of methods such as genetical genomics [17], the regression-based causal inference test (CIT) [4, 18], a structural equation modelling (SEM) implementation in the NEO software [5], and various other methods including Bayesian approaches [6]. They have been employed by a number of recent publications that make causal inferences in large scale ‘omics datasets [6, 19–23].

MR can be applied to the same data—phenotypic measures of the exposure and the outcome variables and a genetic instrument for the exposure—but the genetic instrument is employed in a subtly different manner. Here the SNP is used as a surrogate for the exposure. Assuming the SNP associates with the outcome only through the exposure, the causal effect of the exposure on the outcome can be estimated by scaling the association between the SNP and the outcome by the association between the SNP and the exposure. Though difficult to test empirically, this assumption can be relaxed in various ways when multiple instruments are available for a putative exposure [24, 25] and a number of sensitivity tests are now available to improve reliability [26]. Additionally, if valid genetic instruments are known for both traits of interest then MR can be performed in both directions (bi-directional MR), testing the influence of one trait on the other and vice versa, to infer the causal direction between the two phenotypes [27, 28].
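The scaling step described above is often called the Wald ratio. The following is a minimal sketch of it for summary-level data, not the implementation used in this study. The function name, argument names and numeric inputs are hypothetical, and the standard error is a first-order approximation that ignores uncertainty in the SNP-exposure estimate.

```python
def wald_ratio(beta_gx, beta_gy, se_gy):
    """Wald ratio estimate of the effect of x on y from SNP-level summary data.

    beta_gx: estimated effect of the SNP on the exposure
    beta_gy: estimated effect of the SNP on the outcome
    se_gy:   standard error of beta_gy
    The returned standard error is a first-order approximation that
    treats beta_gx as known without error.
    """
    if beta_gx == 0:
        raise ValueError("the SNP-exposure effect must be non-zero")
    estimate = beta_gy / beta_gx
    std_error = se_gy / abs(beta_gx)
    return estimate, std_error


# Hypothetical summary statistics, for illustration only
estimate, std_error = wald_ratio(beta_gx=0.20, beta_gy=0.05, se_gy=0.01)
```

Because the two inputs are separate regression coefficients, they may in principle come from different samples, which is what makes the two-sample setting possible.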

By utilising genetic instruments in different ways, mediation-based analysis and MR models have properties that confer some advantages and some disadvantages for reliable causal inference. In the CIT framework (described fully in the Methods) for example, the test statistic is different if you test for the exposure causing the outcome or the outcome causing the exposure, allowing the researcher to infer the direction of causality between two variables by performing the test in both directions and choosing the model with the strongest evidence. The CIT also has the valuable property of being able to distinguish between several putative causal graphs that link the traits with the SNP (Fig 1). Such is not the case for MR, where in order to infer the direction of causality between two traits the instrument must have its most proximal link with the exposure, associating with the outcome only through the exposure.

Fig 1
Gene expression levels (blue blocks) and DNA methylation levels (green triangles) may be correlated but the causal structure is unknown.

Assuming biological knowledge of genetic associations can be problematic because if there exists a putative association between two variables, with the SNP being robustly associated with each, it can be difficult to determine which of the two variables is subject to the primary effect of the SNP (i.e. for which of the two variables is the SNP a valid instrument? Fig 1). By definition, we expect that if the association is causal then a SNP for the exposure will be associated with the outcome, such that if the researcher erroneously uses the SNP as an instrument for the outcome then they are likely to see an apparently robust causal association of outcome on exposure. Genome-wide association studies (GWASs) that identify genetic associations for complex traits are, by design, hypothesis free and agnostic of genomic function, and it often takes years of follow up studies to understand the biological nature of a putative GWAS hit [29]. If multiple instruments are available for an hypothesised exposure, which is increasingly typical for complex traits that are analysed in large GWAS consortia, then techniques can be applied to mitigate these issues [16]. But these techniques cannot always be applied in the case of determining causal directions between ’omic measures where typically only one cis-acting SNP is known. For example if a DNA methylation probe is associated with expression of an adjacent gene, then is a cis-acting SNP an instrument for the DNA methylation level, or the gene expression level (Fig 1)?

MR has some important advantages over the mediation-based approaches. First, the mediation-based approaches require that the exposure, outcome and instrumental variables are all measured in the same data, whereas recent extensions to MR circumvent this requirement, allowing causal inference to be drawn when exposure variables and outcome variables are measured in different samples [30]. This has the crucial advantage of improving statistical power by allowing analysis in much larger sample sizes, and dramatically expands the breadth of possible phenotypic relationships that can be evaluated [26]. Second, the mediation-based approach of adjusting the outcome for the exposure to nullify the association between the SNP and the outcome is affected by unmeasured confounding of the exposure and outcome. This is because adjusting the outcome by the exposure induces a collider effect between the SNP and outcome [31], and in order to fully abrogate this association one must also adjust for all (hidden or otherwise) confounders. MR does not suffer from this problem because it does not test for association through adjustment. Third, when MR assumptions are satisfied the method is robust to there being measurement error in the exposure variable [32]. Indeed instrumental variable (IV) analysis was in part initially introduced as a correction for measurement error in the exposure [33], whereas it has been noted that both classic mediation-based analyses [13, 14, 34, 35] and mediation-based methods that use instrumental variables [36, 37] are prone to be unreliable in its presence.
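The second point can be illustrated with a small simulation. This is a sketch under assumed parameter values (unit-variance noise, a confounder u with unit effects, a SNP with allele frequency 0.5), none of which come from the study. Under these assumptions the covariance between the SNP and the exposure-adjusted outcome is expected to be about -0.2 even though x fully mediates the SNP effect, while the ratio estimate stays near the simulated causal effect of 0.5.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def simulate_confounded(n=20000, beta_g=1.0, beta_x=0.5, seed=2):
    """x causes y, and an unmeasured confounder u affects both."""
    rng = random.Random(seed)
    # SNP coded 0/1/2, allele frequency 0.5 (arbitrary choice)
    g = [sum(rng.random() < 0.5 for _ in range(2)) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [beta_g * gi + ui + rng.gauss(0, 1) for gi, ui in zip(g, u)]
    y = [beta_x * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return g, x, y


g, x, y = simulate_confounded()

# Mediation-style adjustment: regress y on x, then test g against the residual
slope = cov(x, y) / cov(x, x)
resid = [yi - slope * xi for xi, yi in zip(x, y)]
adjusted_cov = cov(g, resid)   # non-zero because of the collider effect

# MR-style ratio estimate, which involves no adjustment for x
wald = cov(g, y) / cov(g, x)
```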

Using theory and simulations we show how non-differential measurement error in phenotypes can lead to unreliable causal inference in the mediation-based CIT method. Though we only examine the CIT method in detail, we believe that attempting to adjust for mediating variables to make causal inference is susceptible to problems, which can be generalised to other mediation-based methods. We then present an extension to MR that allows researchers to ascertain the causal direction of an association even when the biology of the instruments is not fully understood, and also a metric to evaluate the sensitivity of the result of this extension to measurement error. Finally, to demonstrate the potential impact of measurement error we apply this method to infer the direction of causation between DNA methylation levels and gene expression levels. Our analyses highlight that because these different causal inference techniques have varying strengths and weaknesses, triangulation of evidence from as many sources as possible should be practiced in causal inference [38].


We model a system whereby some exposure x has a causal influence βx on an outcome y such that

$$y = \alpha_x + \beta_x x + \epsilon_x$$

In addition, the exposure is influenced by a SNP g with an effect of βg such that

$$x = \alpha_g + \beta_g g + \epsilon_g$$

The α* terms represent intercepts, and henceforth can be ignored. The ϵ* terms denote random error, assumed independently and normally distributed with mean zero. Mediation-based analyses that test whether x causally relates to y rely on evaluating whether the influence of g on y can be accounted for by conditioning on x, such that

$$\operatorname{cov}(g, y - \hat{y}) = 0$$

where ŷ = β̂x x and, assuming no intercept, y - ŷ = ϵx. MR analysis estimates the causal influence of x on y by using the instrument as a proxy for x, such that

$$\beta_{MR} = \frac{\operatorname{cov}(g, y)}{\operatorname{cov}(g, x)}$$

where βMR ≠ 0 denotes the existence of causality, and βMR is an estimate of the causal effect.
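As an illustration, the two quantities can be computed on data simulated from the model. This is a sketch only: the sample size, effect sizes, allele frequency and unit error variances are assumed values chosen for the example, and intercepts are set to zero.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def simulate(n=20000, beta_g=0.5, beta_x=0.3, seed=1):
    """Draw g, x and y from the two structural equations (no measurement error)."""
    rng = random.Random(seed)
    # SNP coded 0/1/2, allele frequency 0.5 (arbitrary choice)
    g = [sum(rng.random() < 0.5 for _ in range(2)) for _ in range(n)]
    x = [beta_g * gi + rng.gauss(0, 1) for gi in g]
    y = [beta_x * xi + rng.gauss(0, 1) for xi in x]
    return g, x, y


g, x, y = simulate()

# MR: ratio of the SNP-outcome and SNP-exposure covariances
beta_mr = cov(g, y) / cov(g, x)

# Mediation: covariance between g and the residual of y after regressing on x
beta_hat_x = cov(x, y) / cov(x, x)
resid = [yi - beta_hat_x * xi for xi, yi in zip(x, y)]
mediation_cov = cov(g, resid)
```

With no measurement error and no confounding, beta_mr should lie close to the simulated effect of 0.3 and mediation_cov close to zero, so both approaches agree in this idealised case.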

Measurement error of an exposure can be modeled as a transformation of the true value (x) that leads to the observed value, xo = f(x). For example, following Pierce and VanderWeele [32] we can define

$$f(x) = \alpha_{mx} + \beta_{mx} x + \epsilon_{mx}$$

where αmx and βmx influence the error in the measurement of x by altering its scale, and ϵmx represents the imprecision (or noise) in the measurement of x. Measurement imprecision can represent imprecise measurement due to limits on sensitivity of measuring equipment, or arise because of phenotypes being imprecisely defined. The same model of measurement error can be applied to the outcome variable y.
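A minimal sketch of this measurement model follows. The helper names `measure` and `corr` and all parameter values are hypothetical. It shows that a change of scale alone leaves the correlation between the true and observed values at 1, whereas added noise reduces it.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def measure(values, alpha_m=0.0, beta_m=1.0, noise_sd=0.0, seed=0):
    """Observed value = alpha_m + beta_m * true value + Gaussian noise."""
    rng = random.Random(seed)
    return [alpha_m + beta_m * v + rng.gauss(0.0, noise_sd) for v in values]


rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(20000)]   # true values, variance 1

x_rescaled = measure(x, alpha_m=5.0, beta_m=2.0)             # scale change only
x_noisy = measure(x, alpha_m=5.0, beta_m=2.0, noise_sd=2.0)  # scale change plus noise

rho_rescaled = corr(x, x_rescaled)
rho_noisy = corr(x, x_noisy)
```

In this example the noise variance equals the scaled signal variance (both 4), so the expected correlation for the noisy measure is sqrt(4 / (4 + 4)), roughly 0.71.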

In this study we assume there is no measurement error in the SNP. Common genetic variants are typically less susceptible to measurement error due to strict quality control procedures prior to genome wide association studies. Any non-differential measurement error that might be present (either because the SNP is poorly typed or because the SNP is not in complete linkage disequilibrium with the causal variant) will reduce power in MR but will not incur bias [3, 13, 32]. We also assume that measurement error in the exposure and the outcome are uncorrelated.


Mediation-based causal inference under measurement error

In the causal inference test (CIT), the 4th condition (see Methods) employs mediation for causal inference, and can be expressed as cov(g, y - ŷ) = 0, where ŷ = α̂x + β̂x xo. When measurement error in scale and imprecision is introduced, such that yo is the measured value of y, it can be shown using basic covariance properties (S1 Text) that

$$\operatorname{cov}(g, y_o - \hat{y}_o) = \beta_{my} \beta_g \beta_x \operatorname{var}(g) \, (1 - D)$$

where

$$D = \frac{\beta_{mx}^2 \operatorname{var}(x)}{\beta_{mx}^2 \operatorname{var}(x) + \operatorname{var}(\epsilon_{mx})}$$

Thus an observational study will find cov(g, yo - ŷo) = 0 when the true model is causal only when D = 1. Therefore, if there is any measurement error that incurs imprecision in x (i.e. var(ϵmx) ≠ 0) then there will remain an association between g and yo|xo, which is in violation of the 4th condition of the CIT. Note that scale transformation of x or y without any incurred imprecision is insufficient to lead to a violation of the test statistic assumptions, and henceforth mention of measurement error will relate to imprecision unless otherwise stated.
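The dependence of the residual covariance on imprecision in x can be checked numerically. The sketch below assumes the linear measurement model with independent Gaussian noise and no confounding. The function name and all parameter values are illustrative, and the predicted value is computed from the covariance algebra of that model rather than taken from S1 Text.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def residual_cov(noise_sd_x, n=40000, beta_g=0.5, beta_x=0.5,
                 beta_mx=1.0, beta_my=1.0, noise_sd_y=0.0, seed=4):
    """Return (observed, predicted) cov(g, yo - yo_hat) for one simulated data set."""
    rng = random.Random(seed)
    g = [sum(rng.random() < 0.5 for _ in range(2)) for _ in range(n)]
    x = [beta_g * gi + rng.gauss(0, 1) for gi in g]
    y = [beta_x * xi + rng.gauss(0, 1) for xi in x]
    # Observed traits: rescaled true values plus independent noise
    xo = [beta_mx * xi + rng.gauss(0, noise_sd_x) for xi in x]
    yo = [beta_my * yi + rng.gauss(0, noise_sd_y) for yi in y]

    slope = cov(xo, yo) / cov(xo, xo)
    resid = [b - slope * a for a, b in zip(xo, yo)]
    observed = cov(g, resid)

    # D is the share of var(xo) attributable to the true x
    signal = beta_mx ** 2 * cov(x, x)
    d_ratio = signal / (signal + noise_sd_x ** 2)
    predicted = beta_my * beta_g * beta_x * cov(g, g) * (1 - d_ratio)
    return observed, predicted


obs_noisy, pred_noisy = residual_cov(noise_sd_x=1.0)   # D < 1
obs_clean, pred_clean = residual_cov(noise_sd_x=0.0)   # D = 1
```

With imprecision in x the residual covariance stays clearly away from zero, and it would remain so at any sample size, which is why larger samples strengthen rather than correct the violation.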

We performed simulations to verify that this problem does arise using the CIT method. Fig 2 shows that when there is no measurement error in the exposure or outcome variables (ρx,xo = ρy,yo = 1) the CIT is reliable in identifying the correct causal direction. However, as measurement error increases in the exposure variable, eventually the CIT is more likely to infer a robust causal association in the wrong direction. Also of concern here is that increasing sample size does not solve the issue, indeed it only strengthens the apparent evidence for the incorrect inference.

Fig 2
The CIT was performed on simulated variables where the exposure influenced the outcome and the exposure was instrumented by a SNP.

Using MR Steiger to infer the direction of causality

If we do not know whether the SNP g has a primary influence on x or y then CIT can attempt to infer the causal direction. Though bi-directional MR can be used to orient causal directions [27], this requires knowledge of a valid instrument for each trait, and we were motivated to develop the MR Steiger method that could operate on summary data to orient the direction of causality using the same conditions as the CIT, where the underlying biology of a single SNP is not fully understood. We go on to explore the scenarios in which the method is likely to return the correct or incorrect causal directions.
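The core comparison can be sketched as follows. This is an assumption-laden illustration, not the test as implemented for this study. It takes the SNP-trait correlations as given, assumes they were estimated in two independent samples, and compares them with a Fisher z test for independent correlations. The function names and example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z(r):
    """Fisher z-transformation of a correlation coefficient."""
    if not -1 < r < 1:
        raise ValueError("correlation must lie strictly between -1 and 1")
    return 0.5 * math.log((1 + r) / (1 - r))


def orient(r_gx, n_x, r_gy, n_y):
    """Compare |cor(g, x)| with |cor(g, y)| from two independent samples.

    Returns the inferred direction, the z statistic and a two-sided p-value.
    A positive z means the SNP explains more variance in x than in y.
    """
    diff = fisher_z(abs(r_gx)) - fisher_z(abs(r_gy))
    std_error = math.sqrt(1 / (n_x - 3) + 1 / (n_y - 3))
    z = diff / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    direction = "x -> y" if z > 0 else "y -> x"
    return direction, z, p_value


# Hypothetical correlations and sample sizes, for illustration only
direction, z, p_value = orient(r_gx=0.3, n_x=1000, r_gy=0.1, n_y=1000)
```

When both correlations come from the same individuals they are not independent, and a test for dependent correlations would be the more appropriate choice.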

We performed simulations to compare the power and type 1 error rates of MR and CIT in detecting a causal association between simulated variables under different levels of imprecision simulated in the exposure. Comparing the performance of methods with different sets of assumptions can be difficult, but a basic comparison is shown in Fig 3. We observe that the CIT is more conservative under the null model of no association owing to the omnibus test statistic comprising several statistical tests. The FDR using a p-value threshold of 0.05 appears to be close to zero, whereas for the MR Steiger method the FDR is around 0.05. Using the same p-value thresholds to declare significance in the non-null simulations, the general trend is that, as measurement error in the exposure increases, the power of the CIT declines more steeply than that of the MR Steiger method.

Fig 3
Outcomes were simulated to be unrelated to the exposure (bottom plot, showing false positive rates on the y-axis) or causally influenced by the exposure (top plot, showing true positive rates on the y-axis) with varying degrees of measurement imprecision ...

For a particular association, it is of interest to identify the range of possible measurement error values for which the method will give results that agree or disagree with the empirically inferred causal direction (Fig 4a, S2 Text). This metric can be used to evaluate the reliability of the MR Steiger test.

Fig 4
a) We can predict the values the MR Steiger test would take (z-axis) for different potential values of measurement error (x and y axes), drawn here as the blue surface. When ρg,y > ρg,x, as denoted by the range of values where ...

We show that in the presence of measurement imprecision, d = ρx,xo − ρx,y ρy,yo (S2 Text) determines the range of parameters around which the MR Steiger test is liable to provide the wrong direction of causality (i.e. if d > 0 then the MR Steiger test is likely to be correct about the causal direction). Fig 4b shows that when there is no measurement error in x, the MR Steiger test is unlikely to infer the wrong direction of causality even if there is measurement error in y. It also shows that in most cases where x is measured with error, especially when the causal effect between x and y is not very large, the sensitivity of the MR Steiger test to measurement error is relatively low.
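The reasoning behind d can be sketched in a few lines. Under the causal model with independent measurement errors, the observed SNP-exposure correlation is ρg,x ρx,xo and the observed SNP-outcome correlation is ρg,x ρx,y ρy,yo, so comparing their magnitudes reduces to the sign of ρx,xo − ρx,y ρy,yo. The function name and example values below are hypothetical, and the absolute value of ρx,y is an assumption made here so that negative causal effects are handled.

```python
def sensitivity_d(rho_x_xo, rho_x_y, rho_y_yo):
    """d > 0: the SNP explains more variance in observed x than in observed y."""
    return rho_x_xo - abs(rho_x_y) * rho_y_yo


# Exposure measured perfectly, outcome measured with noise: direction is safe
d_safe = sensitivity_d(rho_x_xo=1.0, rho_x_y=0.5, rho_y_yo=0.8)

# Noisy exposure, strong causal effect, precise outcome: direction may flip
d_risky = sensitivity_d(rho_x_xo=0.3, rho_x_y=0.8, rho_y_yo=1.0)
```

Since a correlation cannot exceed 1 in magnitude, d is never negative when ρx,xo = 1, which matches the observation that error in y alone is unlikely to reverse the inferred direction.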

Unmeasured confounding between the exposure and outcome can also give rise to problems with the MR Steiger approach (S3 Text). The relationship between unmeasured confounding and causal orientation is complex across the parameter space of possible confounding values (S2 Fig). Based on the range of parameter values that we explored, when the magnitude of the observational variance explained between the exposure and the outcome is below 0.2 the MR Steiger method is unlikely to return the incorrect causal direction due to unmeasured confounding.

Comparison of CIT and MR Steiger for obtaining the correct direction of causality

We used simulations to explore the performance of the MR Steiger approach in comparison to CIT for different levels of measurement error. The performance was compared in terms of the rate at which evidence of a causal relationship is obtained for the correct direction of causality, and the rate at which evidence of a causal relationship is obtained where the reported direction of causality is incorrect. Simulations were performed for two models, one for a “causal model” where there was a causal relationship between x and y; and one for a “non-causal model” where x and y were not causally related, but had a confounded association induced by the SNP g influencing x and y independently.

Fig 5a shows that, for the “causal model”, the MR analysis is indeed liable to infer the wrong direction of causality when d < 0, and that this erroneous result is more likely to occur with increasing sample size. However, the CIT is in general more prone to reporting a robust causal association for the wrong direction of causality. When d > 0 we find that in most cases the MR Steiger method has greater power to obtain evidence for causality than CIT, and always obtains the correct direction of causality. The CIT, unlike the MR Steiger test, is able to distinguish the “non-causal model” from the “causal model” (Methods, Fig 5b), but it is evident that measurement error will often lead the CIT to identify the causal model as true, when in fact the underlying model is this non-causal model.

Fig 5
a) Outcome y was simulated to be caused by exposure x as shown in the graph, with varying degrees of measurement error applied to both. CIT and MR were used to infer evidence for causality between the exposure and outcome, and to infer the direction of ...

The causal relationship between gene expression and DNA methylation levels

We used the MR Steiger test to infer the direction of causality between DNA methylation and gene expression levels across 458 putative associations. We found that causality commonly runs in both directions across these associations (Fig 6a), but assuming no or equal measurement error, DNA methylation levels were the predominant causal factor (p = 1.3 × 10^−5). The median reliability (R) of the 458 tests was 3.92 (5%-95% quantiles 1.08–37.11). We then went on to predict the causal directions of the associations for varying levels of systematic measurement error for the different platforms. Fig 6a shows that the conclusions about the direction of causality between DNA methylation and gene expression are very sensitive to measurement error. We made a strong assumption that either methylation influenced gene expression or vice versa, but it is certainly possible that the SNP is solely or additionally influencing some other trait that confounds the association between gene expression and DNA methylation.

Fig 6
Using 458 putative associations between DNA methylation and gene expression we used the MR Steiger test to infer the direction of causality between them.

We performed two sample MR [30] for each association in the direction of causality inferred by the Steiger test. We observed that the sign of the MR estimate was generally in the same direction as the Pearson correlation coefficient reported by Shakhbazov et al. [39] (Fig 6b). There was a moderate correlation between the absolute magnitudes of the causal correlation and the observational Pearson correlation (r = 0.45). Together these inferences suggest that even in estimating associations between ‘omic’ variables, which are considered to be low level phenotypes, it is important to use causal inference methods over observational associations to infer causal effect sizes.

We also observed that for associations where methylation caused gene expression the causal effect was more likely to be negative than for the associations where gene expression caused methylation (OR = 0.61 (95% CI 0.36–1.03), Fig 6c), suggesting that reducing methylation levels at a controlling CpG typically leads to increased gene expression levels, consistent with expectation [40].


Researchers are often confronted with the problem of making causal inferences using a statistical framework on observational data. In the epidemiological literature issues of measurement error in mediation analysis are relatively well explored [41]. Our analysis extends this to related methods such as CIT that are used in predominantly ’omic data. These methods are indeed susceptible to the same problem as standard mediation based analysis, and specifically we show that as measurement error in the (true) exposure variable increases, CIT is likely to have reduced statistical power, and liable to infer the wrong direction of causality. We also demonstrate that, though unintuitive, increasing sample size does not resolve the issue, rather it leads to more extreme p-values for the model that predicts the wrong direction of causality.

Under many circumstances a practical solution to this problem is to use Mendelian randomisation instead of methods such as the CIT or similar that are based on mediation. Inferring the existence of causality using Mendelian randomisation is robust in the face of measurement error and, if the researcher has knowledge about the biology of the instrument being used in the analysis, can offer a direct solution to the issues that the CIT faces. This assumption is often reasonable, for example SNPs are commonly used as instruments when they are found in genes with known biological relevance for the trait of interest. But on many occasions, especially in the realm of ’omic data, this is not the case, and methods based on mediation have been valuable in order to be able to both ascertain if there is a causal association and to infer the direction of causality. Here we have described a simple extension to MR which can be used as an alternative to or in conjunction with mediation based methods. We show that this method is still susceptible to measurement error, but because it has different properties to the CIT it offers two main advantages. First, it uses a formal statistical framework to test for the reliability of the assumed direction of causality. Second, after testing in a comprehensive range of scenarios the MR based approach is less likely to infer the wrong direction of causality compared to CIT, while substantially improving power over CIT in the cases where d > 0.

We demonstrate this new method by evaluating the causal relationships of 458 known associations between DNA methylation and gene expression levels using summary level data. The inferred causal direction is heavily influenced by how much measurement error is present in the different assaying platforms. For example, if DNA methylation measures typically have lower or equal measurement error compared to gene expression measures then our analysis suggests that DNA methylation levels would be more often the causal factor in the association. Indeed, previous studies which have evaluated measurement error in these platforms do support this position [42, 43], though making strong conclusions for this analysis is difficult because measurement error is likely to be study specific. We also haven’t accounted for the influence of winner’s curse, which can inflate estimates of the variance explained by SNPs, with higher inflation expected amongst lower powered studies. Using p-values for genetic associations from replication studies will mitigate this problem.

In our simulations we focused on the simple case of a single instrument in a single sample setting with a view to making a fair comparison between MR and the various mediation-based methods available. However, if there is only a single instrument it is difficult to distinguish between the two competing models of g instrumenting a trait which causes another trait, and g having pleiotropic effects on both traits independently [44]. Under certain conditions of measurement error the CIT can distinguish these models. We also note that it is straightforward to extend the MR Steiger approach to multiple instruments, requiring only that the total variance explained by all instruments be calculated under the assumption that they are independent. Multiple instruments can indeed help to distinguish between the causal and pleiotropic models, for example by evaluating the proportionality of the SNP-exposure and SNP-outcome effects [16]. Additionally, if there is at least one instrument for each trait then bi-directional MR can offer solutions to inferring the causal direction [16, 28, 45]. We restricted the simulations to evaluating the causal inference between quantitative traits, but it is possible that the analysis could be extended to binary traits by using the genetic variance explained on the liability scale, taking into account the population prevalence [46]. However, our analysis goes beyond many previous explorations of measurement error by assessing the impacts of both imprecision (noise) and linear transformations of the true variable on causal inference.
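The multiple-instrument extension mentioned above can be sketched as follows, assuming independent SNPs so that the variance explained is additive. The function names and correlation values are hypothetical, and no significance test is included.

```python
def total_variance_explained(r_values):
    """Sum of squared SNP-trait correlations, assuming the SNPs are independent."""
    return sum(r * r for r in r_values)


def orient_multi(r_gx, r_gy):
    """Compare the total variance the same set of SNPs explains in x and in y."""
    r2_x = total_variance_explained(r_gx)
    r2_y = total_variance_explained(r_gy)
    direction = "x -> y" if r2_x > r2_y else "y -> x"
    return direction, r2_x, r2_y


# Hypothetical correlations of two SNPs with each trait, for illustration only
direction, r2_x, r2_y = orient_multi(r_gx=[0.10, 0.20], r_gy=[0.05, 0.08])
```

If the SNPs are in linkage disequilibrium the squared correlations are not additive, and the sum would overstate the variance explained.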

Our new method attempts to infer causal directions under the assumption that horizontal pleiotropy (the influence of the instrument on the outcome through a mechanism other than the exposure) is not present. Recent method developments in MR [24, 25] have focused on accounting for the issues that horizontal pleiotropy can introduce when multiple instruments are available, but how they perform in the presence of measurement error remains to be explored. An important advantage that MR confers over most mediation based analysis is that it can be performed in two samples, which can considerably improve power and expand the scope of analysis. However, whether there is a substantive difference in two sample MR versus one sample MR in how measurement error has an effect is not yet fully understood. We have also assumed no measurement error in the genetic instrument, which is not unreasonable given the strict QC protocols that ensure high quality genotype data is available to most studies. We have restricted the scope to only exploring non-differential measurement error and avoided the complications incurred if measurement error in the exposure and outcome is correlated. We have also not addressed other issues pertaining to instrumental variables which are relevant to the question of instrument-exposure specification. One such problem is exposure misspecification, for example an instrument could associate with several closely related putative exposures, with only one of them actually having a causal effect on the outcome. This problem has been shown to be the case for SNPs influencing different lipid fractions, for example [47, 48].

Mediation-based network approaches that go beyond analyses of two variables are very well established [37] and have a number of extensions that make them valuable tools, including for example network construction. But because they are predicated on the basic underlying principles of mediation they are liable to suffer from the same issues of measurement error. Recent advances in MR methodology, for example applying MR to genetical genomics [49], multivariate MR [48] and mediation through MR [50–52] may offer more robust alternatives for these more complicated problems.

The overarching result from our simulations is that, regardless of the method used, inferring the causal direction using an instrument of unknown biology is highly sensitive to measurement error. With the presence of measurement error near ubiquitous in most observational data, and our ability to measure it limited, we argue that it needs to be central to any consideration of approaches which are used in an attempt to strengthen causal inference, and any putative results should be accompanied by appropriate sensitivity analysis that assesses their robustness under varying levels of measurement error.


CIT test

Here we describe how the CIT method [4] is implemented in the R package R/cit [18]. Assume an exposure x is instrumented by a SNP g, and the exposure x causes an outcome y, as described above. The following tests are then performed:

  1. H0: cov(g, y) = 0; H1: cov(g, y) ≠ 0; the SNP associates with the outcome
  2. H0: cov(g, x|y) = 0; H1: cov(g, x|y) ≠ 0; the SNP associates with the exposure conditional on the outcome
  3. H0: cov(x, y|g) = 0; H1: cov(x, y|g) ≠ 0; the exposure associates with the outcome conditional on the SNP
  4. H0: cov(g, y|x) ≠ 0; H1: cov(g, y|x) = 0; the SNP is independent of the outcome conditional on the exposure

The term in the 4th test can be rewritten as $\mathrm{cov}(g, y \mid x) = \mathrm{cov}(g, y - \hat{y})$, where $y - \hat{y} = y - (\hat{\alpha}_g + \hat{\beta}_g x)$ is the residual of y after adjusting for x, and x is assumed to mediate the association between the SNP and the outcome. The condition in the 4th test is formulated as an equivalence testing problem that is estimated using simulations, comparing the estimate from the data against empirically obtained estimates for simulated variables where the independence model is true (full details are given in [4]). We note here that this approach is liable to fail, even when there is a true causal relationship, when confounders of the exposure and outcome are present, as these will induce collider bias.

If all four tests reject the null hypothesis then it is inferred that x causes y. The CIT measures the strength of causality by generating an omnibus p-value, pCIT, which is simply the largest (least extreme) p-value of the four tests, the intuition being that causal inference is only as strong as the weakest link in the chain of tests.

Now we describe how we used the CIT method in our simulations. The cit.cp function was used to obtain an omnibus p-value. To infer the direction of causality using the CIT method, an omnibus p-value was generated by CIT for each of two tests: pCIT,xy for the direction of x causing y (Model 1), and pCIT,yx for the direction of y causing x (Model 2). The results from these two tests can then be used in combination to infer the existence and direction of causality. For some significance threshold α there are four possible outcomes from these two tests, and their interpretations are as follows:

  • If pCIT,xy < α and pCIT,yx > α then Model 1 is accepted
  • If pCIT,xy > α and pCIT,yx < α then Model 2 is accepted
  • If pCIT,xy > α and pCIT,yx > α then there is no evidence for a causal relationship
  • If pCIT,xy < α and pCIT,yx < α then there is potential confounding (S1 Fig) and no call is made.

For the purposes of compiling simulation results we use an arbitrary α = 0.05 value, though we stress that for real analyses it is not good practice to rely on p-values for making causal inference, nor is it reliable to depend on arbitrary significance thresholds [53].
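The decision rules above can be sketched in a few lines of code. The following Python fragment is illustrative only and is not part of R/cit: the function names cit_omnibus and cit_call are hypothetical, and a p-value exactly equal to α is treated as not significant, which is an assumption because the rules above leave that case unspecified.

```python
def cit_omnibus(component_p_values):
    """Omnibus CIT p-value: the largest (least extreme) of the four test p-values."""
    p = list(component_p_values)
    if len(p) != 4:
        raise ValueError("expected the p-values of the four CIT component tests")
    return max(p)


def cit_call(p_cit_xy, p_cit_yx, alpha=0.05):
    """Combine the omnibus p-values for both directions into a single call.

    Assumption: p == alpha counts as not significant.
    """
    sig_xy = p_cit_xy < alpha
    sig_yx = p_cit_yx < alpha
    if sig_xy and not sig_yx:
        return "Model 1: x causes y"
    if sig_yx and not sig_xy:
        return "Model 2: y causes x"
    if sig_xy and sig_yx:
        return "no call: potential confounding"
    return "no evidence for a causal relationship"


# Illustrative (made-up) p-values for the four component tests in each direction.
p_xy = cit_omnibus([0.001, 0.004, 0.0002, 0.03])
p_yx = cit_omnibus([0.001, 0.40, 0.0002, 0.65])
print(p_xy, p_yx, cit_call(p_xy, p_yx))
```

Taking the maximum makes the call only as strong as the weakest of the four tests, which is the intuition given for pCIT above.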

MR causal test

Two stage least squares (2SLS) is a commonly used technique for performing MR when the exposure, outcome and instrument data are all available in the same sample. A p-value for this test, pMR, was obtained using the systemfit function in the R package R/systemfit [54]. Note that the value of pMR is identical when using the same genetic variant to instrument the influence of the exposure x on the outcome y, or erroneously, instrumenting the outcome y on the exposure x.

The method that we will now describe is designed to distinguish between two models, x → y or y → x. Unlike the CIT framework, this approach cannot infer if the true model is x ← g → y. We also assume all genetic effects are additive.

To infer the direction of causality it is desirable to know which of the variables, x or y, is being directly influenced by the instrument g. This can be achieved by assessing which of the two variables has the biggest absolute correlation with g (S2 Text), formalised by testing for a difference in the correlations ρgx and ρgy using Steiger’s Z-test for correlated correlations within a population [55]. It is calculated as


where Fisher’s z-transformation is used to obtain $Z_{g*} = \frac{1}{2}\ln\left(\frac{1+\rho_{g*}}{1-\rho_{g*}}\right)$,






The Z value is interpreted such that Z > 0 indicates x → y and Z < 0 indicates y → x, and a p-value, pSteiger, is generated from the Z value to indicate the probability of obtaining a difference between correlations ρgx and ρgy at least as large as the one observed, under the null hypothesis that both correlations are identical.

The existence of causality and its direction is inferred based on combining information from the MR analysis and the Steiger test. The MR analysis indicates whether there is a potential causal relationship (pMR), and the Steiger test indicates the direction (sign(Z)) of the causal relationship and the confidence of the direction (pSteiger). For the purposes of compiling simulation results, these can be combined using an arbitrary α = 0.05 value:

  • If pSteiger < α and pMR < α and Z > 0 then a causal association for the correct model is accepted, x → y
  • If pSteiger < α and pMR < α and Z < 0 then a causal association for the incorrect model is accepted, y → x
  • Otherwise, if pSteiger > α or pMR > α, neither model is accepted
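As a sketch of how these three rules combine, the Python function below returns the accepted model for a given set of results. The function name mr_steiger_call is hypothetical, and p-values exactly equal to α (or Z exactly equal to 0) are treated as leading to no call, which is an assumption.

```python
def mr_steiger_call(p_steiger, p_mr, z, alpha=0.05):
    """Combine the MR p-value with the Steiger Z and p-value into a call.

    Both tests must be significant; the sign of Z then gives the direction.
    Assumption: p == alpha or z == 0 results in neither model being accepted.
    """
    if p_steiger < alpha and p_mr < alpha:
        if z > 0:
            return "x -> y"
        if z < 0:
            return "y -> x"
    return "neither model accepted"


# Illustrative (made-up) results.
print(mr_steiger_call(p_steiger=0.001, p_mr=0.0004, z=3.2))
print(mr_steiger_call(p_steiger=0.001, p_mr=0.0004, z=-3.2))
print(mr_steiger_call(p_steiger=0.30, p_mr=0.0004, z=1.0))
```

Note that pMR alone carries no directional information, as described above, so the direction comes entirely from the sign of Z.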

Note that the same correlation test approach can be applied to a two-sample MR setting. Two-sample MR refers to the case where the SNP-exposure association and SNP-outcome association are calculated in different samples (e.g. from publicly available summary statistics [26, 30]). Here the Steiger test of two independent correlations can be applied.


An advantage of using the Steiger test in the two sample context is that it can compare correlations in independent samples where sample sizes are different. Steiger test statistics were calculated using the r.test function in the R package R/psych [56].
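A minimal sketch of a test for two independent correlations is given below, assuming the standard Fisher z approach in which the difference between the transformed correlations is divided by sqrt(1/(n1 − 3) + 1/(n2 − 3)). It is not the R/psych code used for the analyses; the Python function names are hypothetical, and absolute correlations are compared so that Z > 0 corresponds to x → y.

```python
import math


def fisher_z(r):
    """Fisher's z-transformation of a correlation, 0.5 * ln((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def steiger_independent(r_gx, n_x, r_gy, n_y):
    """Z-test for a difference between two correlations from independent samples.

    r_gx, n_x: SNP-exposure correlation and its sample size.
    r_gy, n_y: SNP-outcome correlation and its sample size.
    Returns (z, two-sided p-value).
    """
    if n_x <= 3 or n_y <= 3:
        raise ValueError("each sample size must exceed 3")
    diff = fisher_z(abs(r_gx)) - fisher_z(abs(r_gy))
    se = math.sqrt(1.0 / (n_x - 3) + 1.0 / (n_y - 3))
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p


# Illustrative (made-up) values; the two sample sizes may differ.
z, p = steiger_independent(r_gx=0.30, n_x=1000, r_gy=0.10, n_y=5000)
print(round(z, 2), p)
```

Because each correlation contributes its own 1/(n − 3) term to the variance, the sample sizes of the exposure and outcome studies do not need to match.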

The Steiger test assumes that there is a causal relationship between the two variables, and that the SNP is a valid instrument for one of them. However it is liable to give incorrect causal directions under some other circumstances. First, some levels of horizontal pleiotropy, where the SNP influences the outcome through some pathway other than the exposure, could induce problems because this is a means by which the instrument is invalid. Second, some differential values of measurement error between the exposure and the outcome could lead to incorrect inference of the causal direction (S2 Text). Third, some levels of unmeasured confounding between the exposure and the outcome could lead to inference of the wrong causal direction (S3 Text).

Causal direction sensitivity analysis for measurement error

The Steiger test for inferring if x → y is based on evaluating ρgx > ρgy. However, ρgx (or ρgy) is underestimated if x (or y) is measured imprecisely. If, for example, x has lower measurement precision than y then we might empirically obtain ρg,xo < ρg,yo because ρg,xo could be underestimated more than ρg,yo.
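The attenuation can be illustrated with a short calculation. The sketch below assumes classical non-differential error, xo = x + ε with ε independent of g and x, in which case the correlation with g is scaled by the square root of the reliability var(x) / (var(x) + var(ε)). The function name and all numerical values are hypothetical.

```python
import math


def attenuated_correlation(rho_true, var_true, var_noise):
    """Correlation of g with a noisily measured trait.

    Assumes classical measurement error: observed = true + noise, with the
    noise independent of both g and the true trait value.
    """
    reliability = var_true / (var_true + var_noise)
    return rho_true * math.sqrt(reliability)


# Made-up example in which x causes y, so that rho_gy = rho_gx * rho_xy.
rho_gx = 0.30
rho_xy = 0.80
rho_gy = rho_gx * rho_xy  # 0.24: smaller than rho_gx, as expected under x -> y

# x is measured with noise variance equal to its true variance; y is measured exactly.
rho_g_xo = attenuated_correlation(rho_gx, var_true=1.0, var_noise=1.0)
rho_g_yo = attenuated_correlation(rho_gy, var_true=1.0, var_noise=0.0)

print(round(rho_g_xo, 3), round(rho_g_yo, 3))
print("observed ordering reversed:", rho_g_xo < rho_g_yo)
```

In this example the true ordering ρgx > ρgy is reversed in the observed data, which is the situation in which comparing correlations would point to the wrong causal direction.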

As we show in S2 Text it is possible to infer the bounds of measurement error on xo or yo given known genetic associations. The maximum measurement imprecision of xo is ρg,xo, because it is known that at least that much of the variance has been explained in xo by g. The minimum is 0, denoting perfectly measured trait values (the same logic applies to yo). It is possible to simulate what the inferred causal direction would be for all values within these bounds.

To evaluate the reliability, R, of the inferred causal direction with respect to potential measurement error in x and y we need to predict the values of ρgy − ρgx for those values of measurement error. We offer two tools with which to do this. First, the user can provide values of measurement error for x and y and obtain a revised inference of the causal direction. Second, we integrate over the entire range of ρgy − ρgx values for possible measurement error values, assuming that any measurement error value is equally likely. Across all possible values of measurement error in x and y we find the volume that agrees with the inferred direction of causality and the volume that disagrees with the inferred direction of causality, and take the ratio of these two values. A ratio R = 1 indicates that the inferred causal direction is highly sensitive to measurement error, because equal weight of the measurement error parameter space supports each direction of causality. In general, the R value denotes that the inferred direction of causality is R times more likely to be the empirical result than the opposite direction (S2 Text).
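The second tool can be approximated numerically. The sketch below is not the implementation described in S2 Text; it assumes that the true correlations are ρgx = ρg,xo / ρx,xo and ρgy = ρg,yo / ρy,yo, that ρx,xo and ρy,yo are each uniformly distributed between the observed correlation and 1, and it uses a simple midpoint rule on a grid. The function name and the grid size are hypothetical.

```python
import math


def steiger_sensitivity_ratio(r_g_xo, r_g_yo, steps=200):
    """Approximate the sensitivity ratio R on a grid of measurement error values.

    For each pair (rho_x_xo, rho_y_yo) the implied true correlations are
    compared, and the volume of their difference is accumulated separately
    for the regions favouring x -> y and y -> x.
    Returns (inferred direction, R).
    """
    a, b = abs(r_g_xo), abs(r_g_yo)
    if not (0.0 < a < 1.0 and 0.0 < b < 1.0):
        raise ValueError("observed correlations must be non-zero and below 1")
    hx = (1.0 - a) / steps
    hy = (1.0 - b) / steps
    vol_xy = 0.0  # volume where rho_gx > rho_gy (supports x -> y)
    vol_yx = 0.0  # volume where rho_gy > rho_gx (supports y -> x)
    for i in range(steps):
        rho_gx = a / (a + (i + 0.5) * hx)
        for j in range(steps):
            rho_gy = b / (b + (j + 0.5) * hy)
            d = rho_gx - rho_gy
            if d > 0:
                vol_xy += d * hx * hy
            else:
                vol_yx += -d * hx * hy
    if a > b:
        inferred, agree, disagree = "x -> y", vol_xy, vol_yx
    else:
        inferred, agree, disagree = "y -> x", vol_yx, vol_xy
    ratio = agree / disagree if disagree > 0 else math.inf
    return inferred, ratio


# Illustrative (made-up) observed correlations.
direction, R = steiger_sensitivity_ratio(0.5, 0.1)
print(direction, round(R, 2))
```

When the two observed correlations are equal the two volumes are symmetric and the ratio is 1, matching the interpretation of R = 1 given above.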


Simulations were conducted by creating variables of sample size n for the exposure x, the measured values of the exposure xo, the outcome y, the measured values of the outcome yo and the instrument g. One of two models is simulated, the “causal model” where x causes y and g is an instrument for x; or the “non-causal model” where g influences a confounder u which in turn causes both x and y. Here x and y are correlated but not causally related. Each variable in the causal model was simulated such that:


where non-differential measurement error is represented by a noise (measurement imprecision) term $\epsilon_{m*} \sim N(0, \sigma_{m*}^2)$, and measurement bias terms $\alpha_{m*}$ and $\beta_{m*}$ for the exposure variable x and the outcome variable y. Note that following the first section of the Results we no longer include the bias terms for simplicity. We have formulated the non-causal model as:


All α values were set to 0, and β values set to 1. Normally distributed values of ϵ* were generated such that


giving a total of 432 combinations of parameters. Simulations using each of these sets of variables were performed 100 times, and the CIT and MR methods were applied to each in order to evaluate the causal association of the simulated variables. Similar patterns of results were obtained for different values of cor(g, x).
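A single replicate of the causal model can be sketched as follows. This is not the simulation code used for the results: the parameter values are illustrative rather than taken from the 432 combinations, the genotype is drawn as the sum of two alleles with frequency 0.5, and the function names are hypothetical. Measurement bias is omitted (intercept 0, slope 1), so only imprecision is modelled.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def simulate_causal_model(n, beta_gx, beta_xy, sd_noise_x, sd_noise_y, seed=1):
    """Simulate g -> x -> y, plus noisy measurements xo and yo of x and y."""
    rng = random.Random(seed)
    g = [rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n)]  # 0, 1 or 2
    x = [beta_gx * gi + rng.gauss(0.0, 1.0) for gi in g]
    y = [beta_xy * xi + rng.gauss(0.0, 1.0) for xi in x]
    xo = [xi + rng.gauss(0.0, sd_noise_x) for xi in x]  # measured exposure
    yo = [yi + rng.gauss(0.0, sd_noise_y) for yi in y]  # measured outcome
    return g, x, y, xo, yo


# Compare precise and imprecise measurement of the exposure.
for sd_noise_x in (0.0, 2.0):
    g, x, y, xo, yo = simulate_causal_model(
        n=10000, beta_gx=0.5, beta_xy=1.0, sd_noise_x=sd_noise_x, sd_noise_y=0.0
    )
    r_g_xo = abs(pearson(g, xo))
    r_g_yo = abs(pearson(g, yo))
    call = "x -> y" if r_g_xo > r_g_yo else "y -> x"
    print(sd_noise_x, round(r_g_xo, 3), round(r_g_yo, 3), call)
```

With these illustrative settings the comparison of correlations favours x → y when the exposure is measured precisely, and favours the wrong direction once enough noise is added to xo.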

Applied example using two sample MR

Two sample MR [30] was performed using summary statistics for genetic influences on gene expression and DNA methylation. To do this we obtained a list of 458 gene expression—DNA methylation associations as reported in Shakhbazov et al [39]. These were filtered to be located on the same chromosome, have robust correlations after correcting for multiple testing, and to share a SNP that had a robust cis-acting effect on both the DNA methylation probe and the gene expression probe. Because only summary statistics were available (effect, standard error, effect allele, sample size, p-values) for the instrumental SNP on the methylation and gene expression levels, the Steiger test of two independent correlations was used to infer the direction of causality for each of the associations. The Wald ratio test was then used to estimate the causal effect size for the estimated direction for each association.
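The summary-data calculations described above can be sketched as follows. The Wald ratio is the SNP-outcome effect divided by the SNP-exposure effect; the standard error shown is the common first-order approximation that ignores uncertainty in the SNP-exposure effect, which is an assumption here rather than a statement of what was used. The conversion from an effect estimate and standard error to a correlation assumes a simple linear regression of a quantitative trait on a single SNP without covariates. Function names and input values are hypothetical.

```python
import math


def wald_ratio(beta_gx, beta_gy, se_gy):
    """Wald ratio estimate of the effect of x on y from a single SNP.

    Returns (estimate, approximate standard error). The standard error is a
    first-order approximation that treats beta_gx as known without error.
    """
    if beta_gx == 0:
        raise ValueError("the SNP-exposure effect must be non-zero")
    return beta_gy / beta_gx, se_gy / abs(beta_gx)


def correlation_from_summary(beta, se, n):
    """Correlation between a SNP and a trait implied by summary statistics.

    Uses r = t / sqrt(t^2 + n - 2) with t = beta / se, which holds for a
    simple linear regression on one SNP with no covariates.
    """
    t = beta / se
    return t / math.sqrt(t * t + n - 2)


# Illustrative (made-up) summary statistics for one SNP.
r_g_meth = correlation_from_summary(beta=0.40, se=0.05, n=600)
r_g_expr = correlation_from_summary(beta=0.15, se=0.05, n=600)
estimate, se = wald_ratio(beta_gx=0.40, beta_gy=0.15, se_gy=0.05)
print(round(r_g_meth, 3), round(r_g_expr, 3), round(estimate, 3), round(se, 3))
```

The two correlations obtained this way, with their sample sizes, are the inputs needed for a test of two independent correlations.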

All analysis was performed using the R programming language [57]; the code is made available online and is implemented in the MR-Base platform [26].

Supporting information

S1 Text

The influence of measurement error in the exposure on mediation-based estimates.


S2 Text

Sensitivity analysis for measurement error on the MR Steiger test.


S3 Text

The influence of unmeasured confounding on the inference of causal directions.


S1 Fig

Influence of confounding on CIT.

Illustrative simulations (n = 5000) showing the results from CIT analysis under a model of confounding. Here, the phenotypes x and y are not causally related, but there is a genetic effect and a confounder both influencing each phenotype. Each point represents a single simulation. Where power is high (when the absolute values of the x and y axes are large) the CIT returns a significant result (p < 0.01) when testing the causal effect of x on y, and when testing the causal effect of y on x.


S2 Fig

Influence of confounding on MR Steiger.

Graph representing the unmeasured confounding parameters that will lead to the MR Steiger test returning the wrong causal direction. Columns of boxes represent different signed values of the observational variance explained between x and y ($R_{xy}^2$).


Funding Statement

This work was supported by the UK Medical Research Council (MC_UU_12013/1 and MC_UU_12013/9). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Availability

All scripts and data used to perform these analyses are available without restriction. The summary data used in the main analysis were obtained from a previously published study (Shakhbazov et al 2016) and are available for download.


1. Phillips AN, Davey Smith G. How independent are “independent” effects? relative risk estimation when correlated exposures are measured imprecisely. Journal of Clinical Epidemiology. Pergamon; 1991;44: 1223–1231. doi: 10.1016/0895-4356(91)90155-3 [PubMed]
2. Davey Smith G, Ebrahim S. Data dredging, bias, or confounding. BMJ. 2002;325: 1437–8. doi: 10.1136/bmj.325.7378.1437 [PMC free article] [PubMed]
3. Davey Smith G, Ebrahim S. Mendelian randomization: prospects, potentials, and limitations. International journal of epidemiology. 2004;33: 30–42. doi: 10.1093/ije/dyh132 [PubMed]
4. Millstein J, Zhang B, Zhu J, Schadt EE. Disentangling molecular relationships with a causal inference test. BMC genetics. 2009;10: 23 doi: 10.1186/1471-2156-10-23 [PMC free article] [PubMed]
5. Aten JE, Fuller TF, Lusis AJ, Horvath S. Using genetic markers to orient the edges in quantitative trait networks: the NEO software. BMC systems biology. 2008;2: 34 doi: 10.1186/1752-0509-2-34 [PMC free article] [PubMed]
6. Waszak SM, Delaneau O, Gschwind AR, Kilpinen H, Raghav SK, Witwicki RM, et al. Variation and genetic control of chromatin architecture in humans. Cell. Elsevier Inc. 2015;162: 1039–1050. [PubMed]
7. Houle D, Pélabon C, Wagner G, Hansen T. Measurement and meaning in biology. The Quarterly Review of Biology. 2011;86: 3–34. [PubMed]
8. Hernán MA, Cole SR. Invited Commentary: Causal diagrams and measurement bias. American journal of epidemiology. 2009;170: 959–62; discussion 963–4. doi: 10.1093/aje/kwp293 [PMC free article] [PubMed]
9. Harper KN, Peters BA, Gamble MV. Batch effects and pathway analysis: two potential perils in cancer studies involving DNA methylation array analysis. Cancer epidemiology, biomarkers & prevention: a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology. 2013;22: 1052–60. doi: 10.1158/1055-9965.EPI-13-0114 [PMC free article] [PubMed]
10. Chen Y, Lemire M, Choufani S, Butcher DT, Grafodatskaya D, Zanke BW, et al. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics: official journal of the DNA Methylation Society. 2013;8: 203–9. doi: 10.4161/epi.23470 [PMC free article] [PubMed]
11. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC bioinformatics. 2012;13: 86 doi: 10.1186/1471-2105-13-86 [PMC free article] [PubMed]
12. Ahima RS, Lazar MA. Physiology. The health risk of obesity–better metrics imperative. Science (New York, NY). American Association for the Advancement of Science; 2013;341: 856–8. doi: 10.1126/science.1241244 [PubMed]
13. Cessie S le, Debeij J, Rosendaal FR, Cannegieter SC, Vandenbroucke JP. Quantification of bias in direct effects estimates due to different types of measurement error in the mediator. Epidemiology (Cambridge, Mass). 2012;23: 551–60. doi: 10.1097/EDE.0b013e318254f5de [PubMed]
14. Blakely T, McKenzie S, Carter K. Misclassification of the mediator matters when estimating indirect effects. Journal of epidemiology and community health. 2013;67: 458–66. doi: 10.1136/jech-2012-201813 [PubMed]
15. Davey Smith G, Ebrahim S. ’Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology. 2003;32: 1–22. doi: 10.1093/ije/dyg070 [PubMed]
16. Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Human molecular genetics. Oxford Univ Press; 2014;23: R89–R98. doi: 10.1093/hmg/ddu328 [PMC free article] [PubMed]
17. Schadt EE, Lamb J, Yang X, Zhu J, Edwards S, GuhaThakurta D, et al. An integrative genomics approach to infer causal associations between gene expression and disease. Nature Genetics. 2005;37: 710–717. doi: 10.1038/ng1589 [PMC free article] [PubMed]
18. Millstein J. cit: Causal Inference Test. R package version 1.9 [Internet]. 2016.
19. Koestler DC, Chalise P, Cicek MS, Cunningham JM, Armasu S, Larson MC, et al. Integrative genomic analysis identifies epigenetic marks that mediate genetic risk for epithelial ovarian cancer. BMC medical genomics. BMC Medical Genomics; 2014;7: 8 doi: 10.1186/1755-8794-7-8 [PMC free article] [PubMed]
20. Liu Y, Aryee MJ, Padyukov L, Fallin MD, Hesselberg E, Runarsson A, et al. Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nature biotechnology. 2013;31: 142–7. doi: 10.1038/nbt.2487 [PMC free article] [PubMed]
21. Yuan W, Xia Y, Bell CG, Yet I, Ferreira T, Ward KJ, et al. An integrated epigenomic analysis for type 2 diabetes susceptibility loci in monozygotic twins. Nature communications. 2014;5: 5719 doi: 10.1038/ncomms6719 [PMC free article] [PubMed]
22. Tang Y, Axelsson AS, Spégel P, Andersson LE, Mulder H, Groop LC, et al. Genotype-based treatment of type 2 diabetes with an α2A-adrenergic receptor antagonist. Science translational medicine. 2014;6: 257ra139 doi: 10.1126/scitranslmed.3009934 [PubMed]
23. Hong X, Hao K, Ladd-Acosta C, Hansen KD, Tsai H-J, Liu X, et al. Genome-wide association study identifies peanut allergy-specific loci and evidence of epigenetic mediation in US children. Nature communications. 2015;6: 6304 doi: 10.1038/ncomms7304 [PMC free article] [PubMed]
24. Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. International Journal of Epidemiology. 2015;44: 512–25. doi: 10.1093/ije/dyv080 [PMC free article] [PubMed]
25. Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent Estimation in Mendelian Randomization with Some Invalid Instruments Using a Weighted Median Estimator. Genetic Epidemiology. 2016;40: 304–314. doi: 10.1002/gepi.21965 [PMC free article] [PubMed]
26. Hemani G, Zheng J, Wade KH, Laurin C, Elsworth B, Burgess S, et al. MR-Base: a platform for systematic causal inference across the phenome using billions of genetic associations. BioRxiv. 2016;10.1101/07.
27. Timpson NJ, Nordestgaard BG, Harbord RM, Zacho J, Frayling TM, Tybjærg-Hansen A, et al. C-reactive protein levels and body mass index: elucidating direction of causation through reciprocal Mendelian randomization. International journal of obesity (2005). 2011;35: 300–8. doi: 10.1038/ijo.2010.137 [PMC free article] [PubMed]
28. Richmond RC, Davey Smith G, Ness AR, Hoed M den, McMahon G, Timpson NJ. Assessing Causality in the Association between Child Adiposity and Physical Activity Levels: A Mendelian Randomization Analysis. Ludwig DS, editor. PLoS Medicine. 2014;11: e1001618 doi: 10.1371/journal.pmed.1001618 [PMC free article] [PubMed]
29. Claussnitzer M, Dankel SN, Kim K-H, Quon G, Meuleman W, Haugen C, et al. FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. The New England journal of medicine. 2015;373: 895–907. doi: 10.1056/NEJMoa1502214 [PMC free article] [PubMed]
30. Pierce BL, Burgess S. Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators. American journal of epidemiology. 2013;178: 1177–84. doi: 10.1093/aje/kwt084 [PMC free article] [PubMed]
31. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology (Cambridge, Mass). 2004;15: 615–25. doi: 10.1097/01.ede.0000135174.63482.43 [PubMed]
32. Pierce BL, VanderWeele TJ. The effect of non-differential measurement error on bias, precision and power in Mendelian randomization studies. International Journal of Epidemiology. Oxford University Press; 2012;41: 1383–1393. doi: 10.1093/ije/dys141 [PubMed]
33. Ashenfelter O, Krueger AB. Estimates of the Economic Return to Schooling from a New Sample of Twins. The American Economic Review. 1994;84: 1157–1173.
34. Nagarajan R, Scutari M. Impact of noise on molecular network inference. PloS one. 2013;8: e80735 doi: 10.1371/journal.pone.0080735 [PMC free article] [PubMed]
35. Shpitser I, VanderWeele T, Robins J. On the validity of covariate adjustment for estimating causal effects. Proceedings of the Twenty Sixth Conference on Uncertainty in Artificial Intelligence (UAI-10). 2010; 527–536.
36. Wang L, Michoel T. Detection of regulator genes and eQTLs in gene networks. arXiv. 2015;arXiv:1512.
37. Lagani V, Triantafillou S, Ball G, Tegner J, Tsamardinos I. Probabilistic Computational Causal Discovery for Systems Biology. Uncertainty in biology: A computational modeling approach. Springer; 2015. p. 47.
38. Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. International Journal of Epidemiology. 2016;45: 1866–1886. doi: 10.1093/ije/dyw314 [PubMed]
39. Shakhbazov K, Powell JE, Hemani G, Henders AK, Martin NG, Visscher PM, et al. Shared genetic control of expression and methylation in peripheral blood. BMC genomics. BioMed Central; 2016;17: 278 doi: 10.1186/s12864-016-2498-4 [PMC free article] [PubMed]
40. Bird A. DNA methylation patterns and epigenetic memory. Genes & development. 2002;16: 6–21. doi: 10.1101/gad.947102 [PubMed]
41. Cole DA, Preacher KJ. Manifest Variable Path Analysis: Potentially Serious and Misleading Consequences Due to Uncorrected Measurement Error. Psychological Methods. 2014;19: 300–315. doi: 10.1037/a0033805 [PubMed]
42. Bose M, Wu C, Pankow JS, Demerath EW, Bressler J, Fornage M, et al. Evaluation of microarray-based DNA methylation measurement using technical replicates: the Atherosclerosis Risk In Communities (ARIC) Study. BMC Bioinformatics. 2014;15: 312 doi: 10.1186/1471-2105-15-312 [PMC free article] [PubMed]
43. Bryant PA, Smyth GK, Robins-Browne R, Curtis N, Novak J, Sladek R, et al. Technical Variability Is Greater than Biological Variability in a Microarray Experiment but Both Are Outweighed by Changes Induced by Stimulation. Khodursky AB, editor. PLoS ONE. Public Library of Science; 2011;6: e19556 doi: 10.1371/journal.pone.0019556 [PMC free article] [PubMed]
44. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nature Genetics. Nature Research; 2016;48: 481–487. doi: 10.1038/ng.3538 [PubMed]
45. Mancuso N, Shi H, Goddard P, Kichaev G, Gusev A, Pasaniuc B. Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits. The American Journal of Human Genetics. 2017;100: 473–487. doi: 10.1016/j.ajhg.2017.01.031 [PubMed]
46. Lee SH, Wray NR. Novel genetic analysis for case-control genome-wide association studies: quantification of power and genomic prediction accuracy. PLoS One. 2013;8: e71494 doi: 10.1371/journal.pone.0071494 [PMC free article] [PubMed]
47. Do R, Willer CJ, Schmidt EM, Sengupta S, Gao C, Peloso GM, et al. Common variants associated with plasma triglycerides and risk for coronary artery disease. Nature Genetics. Nature Research; 2013;45: 1345–1352. doi: 10.1038/ng.2795 [PMC free article] [PubMed]
48. Burgess S, Freitag DF, Khan H, Gorman DN, Thompson SG. Using multivariable Mendelian randomization to disentangle the causal effects of lipid fractions. PloS one. Public Library of Science; 2014;9: e108891 doi: 10.1371/journal.pone.0108891 [PMC free article] [PubMed]
49. Relton CL, Davey Smith G. Two-step epigenetic Mendelian randomization: a strategy for establishing the causal role of epigenetic processes in pathways to disease. International journal of epidemiology. 2012;41: 161–76. doi: 10.1093/ije/dyr233 [PMC free article] [PubMed]
50. Varbo A, Benn M, Smith GD, Timpson NJ, Tybjaerg-Hansen A, Nordestgaard BG. Remnant cholesterol, low-density lipoprotein cholesterol, and blood pressure as mediators from obesity to ischemic heart disease. Circulation research. 2015;116: 665–73. doi: 10.1161/CIRCRESAHA.116.304846 [PubMed]
51. Burgess S, Daniel RM, Butterworth AS, Thompson SG. Network Mendelian randomization: using genetic variants as instrumental variables to investigate mediation in causal pathways. International journal of epidemiology. 2015;44: 484–95. doi: 10.1093/ije/dyu176 [PMC free article] [PubMed]
52. Richmond RC, Hemani G, Tilling K, Davey Smith G, Relton CL. Challenges and novel approaches for investigating molecular mediation. Human molecular genetics. Oxford University Press; 2016;25: R149–R156. doi: 10.1093/hmg/ddw197 [PMC free article] [PubMed]
53. Sterne JAC, Smith GD. Sifting the evidence—what’s wrong with significance tests? BMJ. 2001;322: 226–231. doi: 10.1136/bmj.322.7280.226 [PMC free article] [PubMed]
54. Henningsen A, Hamann JD. systemfit: A Package for Estimating Systems of Simultaneous Equations in R. Journal of Statistical Software. 2007;23: 1–40.
55. Steiger JH. Tests for comparing elements of a correlation matrix. Psychological Bulletin. 1980;87: 245–251. doi: 10.1037/0033-2909.87.2.245
56. Revelle W. psych: Procedures for Psychological, Psychometric, and Personality Research [Internet]. Evanston, Illinois: Northwestern University; 2015.
57. R Core Team. R: A Language and Environment for Statistical Computing [Internet]. Vienna, Austria: R Foundation for Statistical Computing; 2015.

Articles from PLoS Genetics are provided here courtesy of Public Library of Science