We outline the idea now known as “Mendelian randomisation” using the example provided by Katan [10] in his early description of the concept in 1986, although the first implementation of this basic idea in an epidemiological setting under the flag of “Mendelian randomisation” was more recent [11]. Details of the derivation of the approach and its nomenclature are provided in a recent review [12].
In the mid-1980s, there was considerable debate over the hypothesis that low serum cholesterol levels might directly increase the risk of cancer. Alternative explanations for the observed association were that cholesterol levels were lowered by the presence of latent tumours in future cancer patients (reverse causation), or that both cancer risk and cholesterol levels might be affected by confounding factors like diet and smoking. The observation that individuals with abetalipoproteinaemia, and hence negligible levels of serum cholesterol, did not seem to be predisposed to cancer led Katan to the idea of finding a larger group of individuals genetically inclined towards lower cholesterol levels. The apolipoprotein E (ApoE) gene was known to affect serum cholesterol, the ApoE2 variant being associated with lower levels. Katan's idea was that many individuals will carry the ApoE2 variant and thus will naturally have lower cholesterol levels from birth. Crucially, since genes are randomly assigned during meiosis (which gives rise to the name “Mendelian randomisation”), these ApoE2 carriers will not be systematically different from carriers of the other ApoE alleles in any other respect, and in consequence there should be no confounding. Only if low serum cholesterol is really causal for the disease should cancer patients have more ApoE2 alleles than controls. Otherwise the distribution of ApoE alleles should be similar in both groups. This can be easily checked from the observed distributions.
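As an illustration of how such a comparison of allele distributions might be carried out, the sketch below applies a Pearson chi-square test to a 2×2 table of allele counts in cases and controls. The counts are invented purely for illustration and are not taken from any study, the function name `allele_chi_square` is hypothetical, and counting alleles rather than individuals assumes that the two alleles carried by a person can be treated as independent observations.

```python
import math


def allele_chi_square(case_e2, case_other, control_e2, control_other):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table."""
    table = [[case_e2, case_other], [control_e2, control_other]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts (NOT real data): ApoE2 versus all other ApoE alleles.
chi2, p_value = allele_chi_square(case_e2=120, case_other=880,
                                  control_e2=80, control_other=920)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Under Katan's argument, an excess of ApoE2 alleles among cases would be consistent with a causal role for low cholesterol, whereas similar distributions in the two groups would not.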
Katan's reasoning corresponds exactly to what is known as an instrumental variable method in econometrics [13–16]. The genetic variant acts as a so-called instrumental variable (or instrument) and helps to disentangle the confounded causal relationship between intermediate phenotype and disease. Once this theoretical connection had been made, epidemiologists were able to learn from and adapt the methods that were so well known in econometrics [7].
The three key assumptions for Katan's idea to work, and hence for a genetic variant to qualify as an instrumental variable, are illustrated graphically in the accompanying figure and interpreted as follows.
- The genetic variant is unrelated to (independent of) the typical confounding factors, i.e., the graph has no arrow (in either direction) connecting ApoE with the confounders.
- The genetic variant is (reliably) associated with the exposure, i.e., there is an arrow connecting ApoE to serum cholesterol and we can accurately quantify the relationship this represents.
- Conditional on exposure status (cholesterol level) and on the confounders (were they observable), the genetic variant is independent of the outcome, i.e., ApoE does not provide any additional information for the prediction of cancer once these two variables are known. An equivalent way of expressing this, which is less precise but perhaps more intuitive, is to say that genotype has no direct effect on disease (no single arrow between ApoE and cancer) and no mediated effect other than through the exposure of interest (no other routes in the graph between ApoE and cancer).
The ApoE Genotype as an Instrumental Variable in a Mendelian Randomisation Application
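The three conditions listed above can also be written compactly as (conditional) independence statements. The notation is introduced here only for illustration and is an assumption of this sketch: G denotes the genotype (ApoE), X the exposure (serum cholesterol), Y the outcome (cancer), and U the unobserved confounders.

$$
G \perp\!\!\!\perp U, \qquad G \not\perp\!\!\!\perp X, \qquad Y \perp\!\!\!\perp G \mid (X, U)
$$

Read from left to right, these correspond to the first, second, and third assumptions respectively.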
Note that these assumptions have to be justified from background knowledge of the underlying biology. Neither the first nor the third assumption can be tested statistically since they depend on the confounding factors, which, by definition, are unobserved. The first assumption requires a reasonable belief that the genetic variant is unaffected by the sort of confounding that might generally be expected of such an exposure–disease relationship. Fortunately, the very basis of Mendelian randomisation rests on the knowledge that alleles are randomly assigned from parental alleles at meiosis (see above), and this implies that, across the population, genetic effects are relatively robust, although not immune, to confounding [7]. Furthermore, the type of information needed to explore this assumption is often available in practice, as it is usually well-studied genetic variants that are proposed as instruments. The third assumption demands a comprehensive understanding of the underlying biological and clinical science, and may appropriately be considered in a sensitivity analysis. Unlike the first and third, the second assumption can formally be tested using the observed data, and the method works better the stronger the association between gene and exposure.
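One way the second assumption might be examined in practice is to regress the exposure on the genotype and inspect the F-statistic for the slope; larger values indicate a stronger gene–exposure association. The sketch below uses simulated data only. The allele frequency, effect sizes, sample size, and function names are all hypothetical choices made for illustration.

```python
import random


def first_stage_f(genotype, exposure):
    """F-statistic for the slope in a simple regression of exposure on genotype."""
    n = len(genotype)
    mean_g = sum(genotype) / n
    mean_x = sum(exposure) / n
    sgg = sum((g - mean_g) ** 2 for g in genotype)
    sgx = sum((g - mean_g) * (x - mean_x) for g, x in zip(genotype, exposure))
    slope = sgx / sgg
    intercept = mean_x - slope * mean_g
    rss = sum((x - intercept - slope * g) ** 2 for g, x in zip(genotype, exposure))
    residual_var = rss / (n - 2)
    # With a single regressor, F equals the squared t-statistic of the slope.
    return slope ** 2 * sgg / residual_var


def simulate(effect_per_allele, n=2000, allele_freq=0.3, seed=1):
    """Simulate genotypes (0, 1 or 2 copies of an allele) and a continuous exposure."""
    rng = random.Random(seed)
    genotype = [sum(rng.random() < allele_freq for _ in range(2)) for _ in range(n)]
    exposure = [effect_per_allele * g + rng.gauss(0, 1) for g in genotype]
    return genotype, exposure


f_strong = first_stage_f(*simulate(effect_per_allele=0.5))
f_weak = first_stage_f(*simulate(effect_per_allele=0.02))
print(f"F (strong instrument) = {f_strong:.1f}")
print(f"F (weak instrument)   = {f_weak:.1f}")
```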
If the three assumptions seem reasonable (i.e., the graph in the figure is believable), then it can be shown that, as Katan originally hypothesised, a simple statistical test of association between the ApoE genotype and cancer amounts to a test for a causal effect of cholesterol levels on cancer [19].
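The logic of this test can be illustrated with a small simulation. The sketch below is not the analysis of any real study: it assumes a continuous outcome rather than a binary disease, linear effects, and arbitrary parameter values, and all names are hypothetical. A confounder `u` affects both exposure and outcome, while the genotype `g` affects the outcome only through the exposure, so the three assumptions hold by construction.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


def simulate(causal_effect, n=20000, allele_freq=0.3, seed=42):
    """Simulate genotype, exposure and outcome under the three assumptions."""
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        # Genotype: number of variant alleles, assigned independently of u.
        g = (rng.random() < allele_freq) + (rng.random() < allele_freq)
        u = rng.gauss(0, 1)                            # unobserved confounder
        x = 0.5 * g + u + rng.gauss(0, 1)              # exposure depends on g and u
        y = causal_effect * x + u + rng.gauss(0, 1)    # no direct g -> y path
        genotype.append(g)
        exposure.append(x)
        outcome.append(y)
    return genotype, exposure, outcome


for effect in (0.0, 0.5):
    g, x, y = simulate(causal_effect=effect)
    print(f"true effect {effect}: "
          f"exposure-outcome slope = {ols_slope(x, y):.2f}, "
          f"genotype-outcome slope = {ols_slope(g, y):.2f}")
```

When the true effect is zero, the naive exposure–outcome slope is still clearly non-zero because of confounding, whereas the genotype–outcome slope stays close to zero; it departs from zero only when the exposure really is causal.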
The idea of using a gene as an instrument to test for a causal effect of an intermediate phenotype on a disease has been used for a range of other traits, some of which are summarised in the accompanying table [9]. For example, raised plasma fibrinogen levels have been associated with an elevated risk of coronary heart disease (CHD) in large-scale prospective studies, prompting suggestions that methods to reduce fibrinogen levels should be sought [29]. If the fibrinogen–CHD relationship were causal, then such interventions could have considerable clinical and public health benefits. However, interventions to lower plasma fibrinogen levels would not be warranted if the association were explained by confounding or reverse causation. Doubts about a causal link between fibrinogen and CHD have been raised by evidence that the association is considerably attenuated by adjustment for smoking, body mass index, and plasma apolipoprotein B/A1, and that there are many known correlates of fibrinogen, only some of which are typically measured and adjusted for in individual studies [30]. Furthermore, bezafibrate was found to reduce plasma fibrinogen in randomised controlled trials, but it did not have a greater effect on CHD risk than could already be explained by its cholesterol-lowering effect [31].
Examples of Mendelian Randomisation Studies
Additional light can be cast on this relationship from relevant genetic studies. A recent large meta-analysis of genetic association studies of fibrinogen promoter region polymorphisms (G-455) showed that there was a mean increase in fibrinogen of 0.12 g/l (95% confidence interval [CI] 0.09 to 0.14) per copy of the A allele. However, these same alleles were not associated with CHD risk: the odds ratio per allele was 0.98 (95% CI 0.92 to 1.04) [21]. Since the 95% confidence interval includes the null hypothesis value of 1, we cannot reject the null hypothesis at the 5% level, and we therefore conclude that the data provide little or no evidence for a causal effect of fibrinogen on CHD. This could be due to random error or lack of power of the statistical test, which is a problem with genetic association studies when relatively small effects are being sought. The findings are also consistent with the hypothesis that the associations shown previously in observational studies are partially or wholly explained by reverse causation or confounding. Of course, as with any test, the fact that an exposure appears to be non-causal does not necessarily mean that it is not clinically useful. Clearly, it would be dangerous to stop investigating the role of fibrinogen in CHD risk because of such an outcome. What is implied, however, is that more investigation is required before making any great investment in intervening on fibrinogen levels.
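The confidence-interval reasoning for the fibrinogen alleles can be reproduced with a short calculation. The sketch below takes the reported odds ratio and its 95% CI, assumes (as is conventional, but not stated in the source) that the interval is symmetric on the log scale under a normal approximation, and recovers an approximate standard error, z-statistic, and two-sided p-value. The function name is hypothetical.

```python
import math


def p_value_from_ratio_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate SE, z and two-sided p-value for a ratio measure and its 95% CI.

    Assumes the interval was built as exp(log(estimate) +/- z_crit * SE).
    """
    log_estimate = math.log(estimate)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_estimate / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Per-allele odds ratio for CHD and its 95% CI, as quoted in the text.
se, z, p = p_value_from_ratio_ci(estimate=0.98, lower=0.92, upper=1.04)
print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, p = {p:.2f}")
```

The resulting p-value is well above 0.05, in line with the observation that the interval includes 1.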
Mendelian randomisation can also be applied when the exposure of interest is a modifiable behaviour rather than an intermediate phenotype. For example, Chen et al. [9] consider the causal effect of alcohol intake on blood pressure. An RCT would be problematic here, and measurement of alcohol intake is prone to error. Hence, observational data have to be considered in a setting where the causal relationship of interest is known to be heavily confounded. In some populations, a particular variant (*2) of the ALDH2 gene is quite common. The *2 variant is associated with accumulation of acetaldehyde, and therefore unpleasant symptoms, after drinking alcohol. Carriers of this variant tend to limit their alcohol consumption, and alleles at the ALDH2 locus can hence be used as a surrogate or proxy for alcohol intake [9]. Based on this assumption, a Mendelian randomisation meta-analysis approach, combining evidence from several studies, indicated that previous observational evidence on the beneficial effects of moderate drinking on blood pressure was possibly misleading. Exploring biological complexity is another important application of the method, although we have not focused on this aspect here. For example, Li et al. [32] use a Mendelian randomisation approach to infer parts of biological causal pathways.