Vitamin D can be ingested or synthesized in the skin from inactive precursors through the action of UV sunlight. Its active form, 1,25-dihydroxyvitamin D (1,25(OH)2D), is produced after two hydroxylation steps, in the liver and in the kidneys. The prevalence of vitamin D deficiency in Scotland is high because of its high northern latitude, frequently cloudy weather (lack of sunlight impairs vitamin D synthesis during the winter months), indoor-oriented lifestyles and poor diet; routine vitamin D and calcium supplementation is therefore recommended for the housebound (>65 years old). In a recent study of over 2000 healthy individuals living in Scotland, we found that 77.5% were vitamin D deficient. Although the Reference Nutrient Intake (RNI) of vitamin D set by the Scientific Advisory Committee on Nutrition in Scotland for people over 65 years old is 10 µg per day, the recommended daily allowances (RDA) proposed by different research groups and institutions vary considerably.
Vitamin D has been considered relevant to skeletal disease and calcium metabolism, but there is growing evidence that vitamin D deficiency might also be a risk factor for cancer and for cardiovascular, metabolic, infectious and autoimmune diseases. In particular, vitamin D may affect colorectal cancer (CRC) risk by binding to the vitamin D receptor (VDR), thereby influencing cell proliferation, differentiation, apoptosis and angiogenesis, or by affecting insulin resistance. Results from case-control and cohort studies are inconclusive overall, but cohort studies that measured 25-hydroxy-vitamin D (25-OHD) in blood or serum are more consistent and indicate an inverse association with CRC.
Establishing causal relationships between environmental exposures and common diseases using conventional observational study methods is problematic because of unresolved confounding, reverse causation and selection bias. The theory underpinning the Mendelian randomization (MR) approach rests on the random assortment of alleles at gamete formation, which is analogous to a randomized controlled trial in which people are randomly allocated to therapeutic interventions. An MR study is built on three relationships (genotype–intermediate phenotype, intermediate phenotype–disease, and genotype–disease) and can be used to identify causal environmental risk factors while avoiding several of the potential problems of observational epidemiology. The MR approach can also strengthen causal conclusions by limiting reverse causation (biological, through exposure assignment, or due to reporting bias), selection bias and regression dilution bias. The directed acyclic graph below illustrates how this concept is applied to inform causal inference.
Directed acyclic graph (DAG) showing the instrumental variable assumptions underpinning our Mendelian randomization study (the instrument is not allowed to have a direct effect on the outcome, hence this line is dashed).
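The logic of the DAG can be illustrated with a small simulation. The sketch below is not taken from the study: the effect sizes, allele frequency, sample size and variable names are all assumptions chosen for illustration, and the outcome is continuous for simplicity. It generates a genotype G, an exposure X and an outcome Y that share an unmeasured confounder U, then compares the confounded observational slope with the ratio of the genotype–outcome and genotype–exposure slopes.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope from regressing y on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, true_effect=-0.5, seed=1):
    """Simulate genotype G, exposure X and outcome Y with an unmeasured confounder U.

    All coefficients are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        # Allele count (0, 1 or 2) with an assumed allele frequency of 0.3.
        gi = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        ui = rng.gauss(0, 1)  # unmeasured confounder
        xi = 0.8 * gi + ui + rng.gauss(0, 1)  # exposure depends on G and U
        # Outcome depends on X and U only: there is no direct G -> Y path.
        yi = true_effect * xi + 1.5 * ui + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate()

# Conventional observational estimate: distorted by the confounder U.
observational = ols_slope(x, y)

# Ratio (Wald) estimate: genotype-outcome slope over genotype-exposure slope.
wald_ratio = ols_slope(g, y) / ols_slope(g, x)

print(f"observational slope: {observational:.2f}")
print(f"ratio estimate:      {wald_ratio:.2f} (simulated true effect -0.5)")
```

Under these assumed parameters the confounder is strong enough to reverse the sign of the observational slope, whereas the ratio estimate stays close to the simulated effect because G is independent of U.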
The analytic approach employed here for MR is the instrumental variable (IV) model, in which the genetic variant is treated as an instrument that is assumed to be associated with the disease only through its association with the intermediate phenotype. This first requires identifying one or more genetic variants (typically single nucleotide polymorphisms, SNPs), known from published data to be associated with the phenotype, to serve as the IV. The three key assumptions underlying the MR approach are: a) the genotype is associated with the phenotype; b) the genotype is independent of measured and unmeasured confounders; and c) the effect of genotype on outcome is mediated only through the intermediate phenotype (no pleiotropy).
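These three assumptions can be written compactly. The notation is introduced here for illustration and is not taken from the source: G denotes the genotype, X the intermediate phenotype (here plasma 25-OHD), Y the outcome (CRC) and U the set of confounders. For a single instrument and linear relationships, the assumptions lead to the familiar ratio estimator shown last; this is a sketch of the general idea, not necessarily the estimator used in the analysis.

```latex
\begin{align*}
&\text{(a) relevance:}             && \operatorname{Cov}(G, X) \neq 0 \\
&\text{(b) independence:}          && G \perp\!\!\!\perp U \\
&\text{(c) exclusion restriction:} && G \perp\!\!\!\perp Y \mid X, U
\end{align*}
\[
\hat{\beta}_{\mathrm{IV}} = \frac{\hat{\beta}_{GY}}{\hat{\beta}_{GX}}
\]
```

Here \hat{\beta}_{GY} and \hat{\beta}_{GX} are the estimated genotype–disease and genotype–intermediate phenotype associations, so the ratio combines two of the three relationships on which an MR study is based to recover the third.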
In this study, we set out to evaluate the relationship between CRC, plasma 25-OHD levels and genotype at four genetic loci that tag genes involved in vitamin D metabolism (Table S1) and that have previously been shown to be associated with plasma vitamin D levels in a pooled meta-analysis of genome-wide association studies. To estimate whether there is a causal relationship between plasma 25-OHD and CRC risk, we applied the control function IV estimator.
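As a rough illustration of what a control function estimator does with a binary outcome, the sketch below regresses the exposure on the instrument, then includes the first-stage residual alongside the exposure in a logistic model for the outcome. It is a minimal sketch on simulated data, not the authors' implementation: the data-generating values, the single instrument and the helper names (`first_stage_residuals`, `logistic_fit`, `solve`) are all assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def first_stage_residuals(g, x):
    """Stage 1: linear regression of exposure x on instrument g; return residuals."""
    n = len(g)
    mg, mx = sum(g) / n, sum(x) / n
    slope = sum((a - mg) * (b - mx) for a, b in zip(g, x)) / sum((a - mg) ** 2 for a in g)
    intercept = mx - slope * mg
    return [b - (intercept + slope * a) for a, b in zip(g, x)]


def logistic_fit(rows, y, iters=10):
    """Logistic regression by Newton-Raphson; each row starts with 1.0 for the intercept."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, r)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    hess[i][j] += w * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


# Simulated data; every coefficient below is a hypothetical illustration value.
rng = random.Random(7)
TRUE_LOG_OR = -0.5
g, x, y = [], [], []
for _ in range(10000):
    gi = int(rng.random() < 0.3) + int(rng.random() < 0.3)  # allele count
    ui = rng.gauss(0, 1)                                     # unmeasured confounder
    xi = 0.8 * gi + ui + rng.gauss(0, 1)                     # exposure
    yi = 1 if rng.random() < sigmoid(TRUE_LOG_OR * xi + 1.5 * ui) else 0
    g.append(gi)
    x.append(xi)
    y.append(yi)

resid = first_stage_residuals(g, x)

# Naive logistic regression of outcome on exposure (confounded).
naive_log_or = logistic_fit([[1.0, xi] for xi in x], y)[1]

# Stage 2: the first-stage residual acts as the control function.
cf_log_or = logistic_fit([[1.0, xi, vi] for xi, vi in zip(x, resid)], y)[1]

print(f"naive log odds ratio:            {naive_log_or:.2f}")
print(f"control function log odds ratio: {cf_log_or:.2f}")
```

Two caveats apply to a sketch like this. Because odds ratios are non-collapsible, the second-stage coefficient is not expected to equal the simulated conditional log odds ratio exactly. In addition, second-stage standard errors would ignore the uncertainty in the first-stage fit, so in practice they are usually corrected, for example by bootstrapping both stages together.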