|Home | About | Journals | Submit | Contact Us | Français|
Both genetic and environmental factors contribute to the aetiology of multiple sclerosis (MS). More than 50 genomic regions have been associated with MS susceptibility and vitamin D status also influences the risk of this complex disease. However, how these factors interact in disease causation is unclear. We aimed to investigate the relationship between vitamin D receptor (VDR) binding in lymphoblastoid cell lines (LCLs), chromatin states in LCLs and MS-associated genomic regions. Using the Genomic Hyperbrowser, we found that VDR-binding regions overlapped with active regulatory regions [active promoter (AP) and strong enhancer (SE)] in LCLs more than expected by chance [45.3-fold enrichment for SE (P < 2.0e−05) and 63.41-fold enrichment for AP (P < 2.0e−05)]. Approximately 77% of VDR regions were covered by either AP or SE elements. The overlap between VDR binding and regulatory elements was significantly greater in LCLs than in non-immune cells (P < 2.0e−05). VDR binding also occurred within MS regions more than expected by chance (3.7-fold enrichment, P < 2.0e−05). Furthermore, regions of joint overlap SE-VDR and AP-VDR were even more enriched within MS regions and near to several disease-associated genes. These findings provide relevant insights into how vitamin D influences the immune system and the risk of MS through VDR interactions with the chromatin state inside MS regions. Furthermore, the data provide additional evidence for an important role played by B cells in MS. Further analyses in other immune cell types and functional studies are warranted to fully elucidate the role of vitamin D in the immune system.
It is well known that genetic factors influence susceptibility to the complex disease multiple sclerosis (MS) (1). Large population-based studies investigating the familial aggregation of the disease have shown that the risk of MS is increased among biological relatives of patients and positively correlates with the degree of kinship (2,3). These findings provided the rationale for linkage and association studies which identified the main genetic locus for MS within the major histocompatibility complex (MHC) class II region (4–7). However, genetic risk factors for MS are not limited to the MHC and the development of genome-wide association study (GWAS) approaches has provided further insights into the genetic architecture of the disease (8,9). Recently, The International Multiple Sclerosis Genetics Consortium and the Wellcome Trust Case Control Consortium 2 performed a GWAS including more than 9000 MS patients and found evidence for over 50 genomic regions associated to MS (10). The discovery of associated genes can greatly aid our knowledge of disease aetiology and pathogenesis.
The chromatin landscape of a cell is specific to each cell type and, among other roles, determines which regions of the genome are accessible to the binding of transcription factors and whether transcription occurs or is repressed. A recent study mapped a number of chromatin marks across nine cell types to systematically characterize regulatory elements and their cell-type specificities (11). We have recently shown that MS-associated genomic regions overlap with strong enhancer (SE), active promoter (AP) and strongly transcribed regions in lymphoblastoid cell lines (LCLs), more than expected by chance and more so than in non-immune cell types (12). This is relevant since genetic variants associated with a certain disease are likely located within genomic regions which are active in the causative cell type(s) and thus these results further highlighted the primary immunological nature of MS (13).
Like all complex traits, genes are not the only determinant of MS susceptibility. Several lines of evidence support a role for vitamin D deficiency in influencing the risk of MS. Both vitamin D intake and vitamin D levels are inversely associated with the risk of MS later in life (14,15). Furthermore, rare loss-of-function variants in the CYP27B1 gene (which encodes the enzyme which activates vitamin D) increase the MS risk (16). How vitamin D is involved in MS aetiology is unclear, but recent findings strongly support the presence of a gene–environment interaction for vitamin D in MS. Vitamin D acts in the genome via its cognate nuclear receptor, the vitamin D receptor (VDR). Using chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) in LCLs, our group has shown the presence of 2776 different vitamin D responsive elements throughout the genome bound by the VDR. Notably, genetic loci associated with MS were substantially enriched for VDR-binding sites (17).
However, the VDR-MS enrichment analysis was performed using old data on MS-associated genomic regions and needs to be updated in light of the latest GWAS findings (10). Furthermore, the relation between VDR binding and chromatin states has never been explored. In the present study, our aim was to investigate how genomic regions with VDR binding in LCLs (VDR regions) (17), chromatin states in LCLs (11) and MS-associated genomic regions (MS regions) (10) are related to each other. We performed this analysis using the Genomic Hyperbrowser (http://hyperbrowser.uio.no/hb/), a powerful and robust bioinformatic tool which is able to investigate genomic relations genome wide as well as at local scales (18).
Our first aim was to assess in which chromatin states in LCLs VDR binding after vitamin D stimulation was more likely to occur. Chromatin states analysed were AP, weak promoter (WP), poised promoter (PP), SE, weak enhancer (WE), polycomb repressed (PR), heterochromatic (H), insulator (I), strongly transcribed (ST), weakly transcribed (WT) and repetitive/CNV (Rep/CNV) (11).
The position of VDR regions was fixed, while chromatin intervals were randomized as described in the Materials and Methods. On the global scale, the overlap between VDR regions and chromatin states was higher than expected by chance in 5 out of the 11 chromatin states analysed (SE, WE, AP, WP and I). However, the overlap appeared particularly strong in chromatin states of high regulatory activity (SE and AP). Approximately 44 and 33% of VDR regions were in fact covered by SE and AP, respectively. The enrichment values of VDR regions within these chromatin states were 45.26 for SE (P < 2.0e−05) and 63.41 for AP (P < 2.0e−05). Furthermore, SE and AP were the only chromatin states in which the overlap was significantly higher than expected in all chromosome arms (41 out of 41) (Table 1).
Similarly, when VDR-binding regions before vitamin D stimulation were used for the analysis, both SE and AP elements overlapped with VDR more than expected by chance (P < 2.0e−05 for both). However, the extent of SE-VDR and AP-VDR overlaps was decreased and increased, respectively (SE-VDR: enrichment = 15.5, proportion of VDR covered = 21%; AP-VDR: enrichment = 119.6, proportion of VDR covered = 48%). Vitamin D stimulation substantially increases the number of VDR-binding sites across the genome (17). Furthermore, studies have shown that other nuclear receptors, such as the glucocorticoid receptor, bind primarily to already accessible chromatin regions after stimulation (19). For these reasons, we decided to use VDR-binding data after vitamin D stimulation for all the following analyses.
On its own, the presence of significant overlap between VDR regions and certain chromatin states in LCLs does not necessarily imply biological significance. Both VDR regions and chromatin intervals could just be more likely to be near commonly transcribed genes and a similar degree of overlap could be present between VDR regions and chromatin states of other cell types. To exclude this hypothesis, we tested whether the previously identified LCL chromatin states with significant overlap (SE, WE, AP, WP and I) co-localized with VDR regions more than the same chromatin states from three additional cell types (hepatocytes, fibroblasts and keratinocytes). We found that LCL-specific SE and AP elements overlapped with VDR regions more than SE and AP elements from control cell types (P < 2.0e−05 for all comparisons, Table 2).
By simply looking at the proportion of the genome which is covered individually by two sets of genomic intervals (tracks), one can calculate the expected base pair overlap (e.g. if two tracks cover 10% of the genome each, the expected overlap would be 1%). However, this assumption is valid only if the two tracks are independent from each other, which is not always the case. For example, tracks could be preferentially located within the same regions of the genome and this could considerably increase their expected base pair overlap. This appears to be the case for both VDR-SE and VDR-AP relationships as suggested by the strong correlations observed between the proportional coverage of SE or AP versus VDR regions within each cytoband (Fig. 1).
In order to understand whether the strong overlap observed between VDR regions and SE and AP is not only due to shared preference for the same genomic regions but also due to a tendency towards detailed base pair overlap, we calculated and compared the following values: (i) the expected VDR-SE and VDR-AP base pair overlaps if all these elements were independent (independency of tracks); (ii) the proportional coverage of VDR, SE and AP within each cytoband; (iii) the expected VDR-SE and VDR-AP base pair overlaps after fixing the proportional coverage of VDR, SE and AP elements per cytoband (so that each of them is required to occur with the same base pair coverage within each cytoband as observed); (iv) the actual VDR-SE and VDR-AP observed base pair overlaps. All values are shown in Table 3.
When compared with independency of tracks, fixing cytoband preferences increased the expected overlap by ~2-fold (from 0.0011 to 0.0023% for VDR-SE and from 0.0005 to 0.0011% for VDR-AP). This further indicates the strong bias of VDR, SE and AP elements towards occurring within the same regions of the genome. However, the actual observed VDR-SE and VDR-AP overlaps were a further 13 and 21 times greater than the expected overlap after fixing cytoband preferences (P < 0.001, for both VDR-SE and VDR-AP). This shows that both the VDR-SE and VDR-AP overlaps are not only due to shared preference for the same cytobands but also due to a more detailed overlap at the base pair level. In addition, we found that regions of overlap between VDR and either SE or AP were strikingly enriched for differentially expressed genes after vitamin D stimulation (enrichment = 8.99, P < 2.0e−05).
We next investigated whether VDR regions were more likely than chance to occur within MS-associated regions. We have previously observed this using 3-year-old data on MS-associated regions (17). In the updated analysis, out of 58 MS-associated regions, 35 (60.3%) had VDR binding inside (Table 4). The positions of MS regions were kept fixed, while VDR intervals were randomized as described in the Materials and Methods. The observed base pair overlap was considerably higher than expected by chance (P < 2.0e−05), with a fold enrichment of 3.67.
However, both VDR-binding sites and MS regions could have a tendency to be located near genes. This shared relation could itself lead to a higher overlap between VDR and MS regions than expected by independency of these tracks. As expected, when position relative to genes was taken into account in the null model, the expected number of VDR regions falling inside MS regions increased from 30 to 45 (see Materials and Methods for details). However, the observed number of VDR regions falling inside MS regions was 2.2-fold higher than expected (99 vs. 45) (P < 0.001).
Given the significant co-occurrence of VDR-SE and VDR-AP at the genome-wide level and the observed overlap between VDR and MS regions, we investigated the extent of the overlap between VDR and SE and between VDR and AP within MS regions. Similar to how we analysed the genome-wide VDR-SE and VDR-AP overlaps, we calculated and compared the following values: (i) the expected VDR-SE and VDR-AP base pair overlaps within MS regions if all these elements were independent; (ii) the proportional coverage of VDR, SE and AP within each MS region; (iii) the expected VDR-SE and VDR-AP base pair overlaps after fixing the proportional coverage of VDR, SE and AP elements per MS region; (iv) the actual VDR-SE and VDR-AP observed base pair overlaps within MS regions. All values are shown in Table 5.
When compared with independency of tracks, fixing proportional coverage of MS regions slightly increased the expected overlap (from 0.0133 to 0.0204% for VDR-SE and from 0.0055 to 0.0079% for VDR-AP). The actual observed VDR-SE and VDR-AP overlaps were a further 4.75 and 14.93 times higher than the expected overlap after fixing for MS regions (P < 0.001, for both VDR-SE and VDR-AP). Overlap between VDR and either SE or AP elements was present in several MS-associated regions and near a number of putative candidate genes, including SP140, CD86, IL22RA2, CXCR5, TNFRSF1A, ZFP36L1, BATF, IRF8, CD40 and TNFRSF6B in addition to a region on chromosome 6 with no candidate genes (Figs 2 and and33).
We have seen how VDR regions tend to overlap with MS regions, SE and AP elements. Furthermore, MS regions also overlap with SE and AP intervals more than expected by chance (12). A genome-wide plot showing their occurrence and relations built using the software Circos is presented in Figure 4 (20). It appears therefore more appropriate to look at the relations between these genomic elements using a three-way rather than two-way (pair-wise) approach. We can do this in terms of probability calculus by considering the proportion of the genome covered by a certain track A as a probability ‘P(A)’. The observed overlap between two tracks A and B can therefore be written as ‘P(A&B)=P(A)*P(B | A)’ (i.e. the proportion of the genome covered by both A and B is equal to the proportion covered by A, multiplied by the proportion of A covered by B). Extending this to three tracks, the observed overlap between A, B and C can be written as ‘P(A&B&C) = P(A)*P(B | A)*P(C | B,A)’.
As for two-way relations, the expected overlap between three tracks A, B and C given independence would simply be the product of the coverage proportions of each track ‘P(A)*P(B)*P(C)’. One can then compare the observed ‘P(A&B&C)’ against the expected ‘P(A)*P(B)*P(C)’ value under independency of tracks. It is also possible to keep some partial dependencies, e.g. fixing the observed overlap between A and B throughout the genome to get another expected overlap value which takes into account the overlap between A and B ‘P(A&B)*P(C) = P(C)*P(A)*P(B | A)’. By comparing the observed overlap ‘P(A&B&C)’ against the expected ‘P(A&B)*P(C)’, one can see if there is any overlap preference between the three tracks beyond what is expected from the pair-wise overlap between A and B and the proportion of the genome covered by C.
We performed these calculations considering SE-VDR-MS as well as AP-VDR-MS relations and the results are shown in Table 6. The observed overlaps ‘P(SE&VDR&MS)’ and ‘P(AP&VDR&MS)’ were considerably higher than any of the expected overlap values both when tracks were considered independent and when pair-wise overlaps were taken into account. Similar results were also obtained when fixing the cytoband preference for the corresponding values of individual or pair-wise overlap coverage.
On average, VDR regions were more likely to be located in the upstream region with a particularly strong preference for transcription start sites (TSS). The lowest VDR coverage was observed within genes. As expected, AP elements were also preferentially located next to TSS. Accordingly, AP-VDR overlaps tended to be located next to TSS. SE regions were located both inside and outside genes, with particularly high peaks before and after TSS and depleted in TSS itself (Fig. 5). The overlap between SE and VDR appeared particularly frequent in downstream regions.
We also looked at how VDR, SE and AP elements tended to be distributed within MS regions (Fig. 6). Interestingly, VDR coverage appeared to be particularly low in the centre of MS regions. SE elements were clearly more likely to be located at the centre of MS regions, while the position of AP elements as well as that of AP-VDR and SE-VDR overlaps did not follow any particular distribution.
Assessing whether two or three sets of genomic intervals overlap may appear like a simple question to answer. However, performing the analysis without bias is more complicated. For example, one needs to distinguish between whether the overlap is due to shared preference for the same genomic regions or due to detailed overlap at the base pair level. Furthermore, two different tracks could share a similar tendency to be located in proximity to a third set of genomic intervals (for example genes), and when this happens the effect of these confounding tracks needs to be taken into account. We were able to overcome these issues using the Genomic Hyperbrowser (18).
We recently showed that MS regions overlapped with SE and AP elements of LCLs more than expected by chance and more than in non-immune cell types (12). We followed up these findings and investigated how VDR binding is involved in the relation between MS regions and chromatin states. We found that:
Since chromatin data from vitamin D-stimulated cells were not available, the correct interpretation of our results is that VDR binds to genomic regions that have a regulatory function. This does not imply a causative relation between VDR binding and the chromatin profile of a cell. Future studies should test this hypothesis comparing VDR and chromatin data before and after vitamin D stimulation. However, taken together, these findings have a number of important implications. First, they suggest that VDR binding influences or interacts with the chomatin profile of a cell. The regulatory role played by vitamin D at the genomic level likely underlies its modulatory actions in the immune system. Both the VDR and the CYP27B1 enzyme are expressed in an exceptionally wide range of immune cells, including B cells, but also Th1 and Th17 subsets and FOXP3+ regulatory T cells (21). In B cells, vitamin D is able to modulate cell proliferation, induce apoptosis and inhibit plasma cell differentiation and immunoglobulin secretion (22).
Secondly, the fact that VDR-SE and VDR-AP overlaps are seen not only at the genome-wide level, but also inside genomic regions associated with the MS risk provides a potential biological explanation for the epidemiological observations linking vitamin D deficiency to both risk and clinical course of MS (15,16,23,24).
Finally, these data provide insights into the cell types that are primarily driving the onset of MS. It would be logical to hypothesize that genetic and environmental risk factors would have a direct influence on the cell type triggering the abnormal immune response in MS patients. Therefore, the observation that both genomic regions associated with MS- and VDR-binding intervals overlap with genomic regions which are active and play a regulatory role in B cell lines provide further support for an important role for this cell type in the pathogenesis of this devastating disease (25). The presence of oligoclonal bands in the cerebrospinal fluid represents the most consistent immunological finding in MS patients and B cell abnormalities influence both conversion to clinically definite MS, MRI activity, onset of relapses and disease progression (26–32). Furthermore, clinical trials have shown that antibody-mediated depletion of CD20+ B cells is highly effective in diminishing MRI activity and onset of clinical relapses (33,34). Unfortunately, a similarly detailed chromatin profile of T cells is not yet available and we could not perform a direct comparison between B and T cells.
One possible confounding factor is that both chromatin and VDR-binding data come from B cells immortalized via Epstein Barr virus (EBV). While we acknowledge that the EBV-mediated immortalization process could influence both chromatin profile and VDR binding, this could itself be interesting for MS since EBV infection is another environmental risk factor which is strongly implicated in the aetiology of this complex disease (35–37). It would therefore be particularly interesting to investigate how EBV infection can change the chromatin features of infected B cells, and whether this modifies their relation with MS-associated regions and VDR binding.
To conclude, genomic regions with VDR binding in LCLs substantially overlap with promoter and enhancer elements that are active in LCLs. This is also true in MS-associated regions and is therefore relevant for MS aetiology. Further, similar analyses in other immunological cell types and functional studies will be required to fully understand how vitamin D can influence the chromatin landscape and gene expression outside and inside genomic regions which play a role in the aetiology of this devastating disease.
Genetic variants associated with the MS risk were obtained from the recent GWAS performed by the International Multiple Sclerosis Genetics Consortium (IMSGC) and the Wellcome Trust Case Control Consortium 2 (WTCCC2) (10). Since VDR binding could regulate gene expression at several kilobase of distance, we decided to define MS regions as genomic intervals of 0.25 cM centred on the lead associated SNP ± 100 kb each side. The chromatin profiles of B-lymphoblastoid cells (GM12878), hepatocellular carcinoma cells (HepG2), normal lung fibroblasts (NHLF) and normal epidermal keratinocytes (NHEK) were obtained from the ENCODE project (11). Briefly, chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) and expression data were used to identify different classes of chromatin states: AP, WP, PP, SE, WE, PR, H, I, ST, WT and Rep/CNV (11). Genomic regions with VDR binding and differentially expressed genes in LCLs before and after stimulation for 36 h with 0.1 mm calcitriol were identified as previously described (17).
To assess statistical significance, we defined a null model for which the location of track A intervals varied randomly, while preserving the empirical segment and inter-segment length distribution of track A intervals. Track B intervals were fixed. The number of overlapping base pairs between the two tracks was calculated for the real data, as well as for 50 000 Monte Carlo samples from the null model. The P-value was calculated in the usual way, i.e. as the proportion of Monte Carlo samples being equal to or more extreme than the observed overlap. The analyses were performed on both the global (whole genome) and chromosome arms scale. For the local analysis of chromosome arms, P-values were deemed significant at a false discovery rate of 10% (38).
The enrichment of track A intervals inside track B was defined as the ratio of the proportion of track B intervals covered by track A intervals, to the proportion of non-track B intervals covered by track A intervals. When assessing shared preference of bins versus detailed base pair overlap and three-way relations, a factor of observed divided by expected base pair overlap was used.
To evaluate the overlap of VDR regions and chromatin states in LCLs relative to that in other cell lines, we created case–control tracks for each chromatin state by removing all parts of chromatin intervals that overlapped between LCLs and control cells and marking the remaining intervals as case (LCL specific intervals) and control (other cell type specific intervals). P-values were computed by a Monte Carlo procedure, in which the case–control labels of chromatin intervals were randomly permuted. The observed base pair overlap between case intervals and VDR regions were compared against the corresponding distribution for 50 000 Monte Carlo samples in the usual way. The fold enrichment difference in overlap between LCLs and control cells was calculated as the ratio between the proportions of case and control intervals that overlapped with VDR regions.
To control for a possibly confounding effect of gene proximity to the VDR-MS relation, we assessed VDR-MS overlap conditioning on their common relation to genes, using the approach to handle confounder tracks previously described (18). Briefly, we estimated the occurrence frequency of observed VDR binding in exponentially increasing ranges of distance to nearest gene, and under the null model sampled VDR binding according to a non-homogeneous Poisson process with intensity given by the estimated occurrence frequency at a given distance (range) from the nearest gene. MS regions were kept at their observed locations.
To compare the distribution of VDR, SE and AP regions relative to gene position, we divided the genome into regions around each Ensembl gene in a way that these extended Ensembl gene regions exactly covered the whole genome. Overlapping genes were clustered and each nucleotide of the genome assigned to the closest gene giving as many gene-associated regions as there are Ensembl genes (after clustering). Each gene-associated region was then divided into three sub-regions: the gene in the middle, upstream regions and downstream region at the sides. Gene, upstream and downstream regions were further divided into 20 equally sized intervals and average coverage of VDR, SE and AP elements in each of these intervals was computed. To compare the distribution of VDR, SE and AP elements within MS regions, we symmetrically extended the central interval of 0.25 cM centred on the lead associated SNP ± 100 kb each side, divided each MS region in 20 equally sized intervals and computed average VDR, SE and AP coverage.
Conflicts of Interest statement. G.G. serves on scientific advisory boards for Merck Serono and Biogen Idec and Vertex Pharmaceuticals; served on the editorial board of Multiple Sclerosis; has received speaker honoraria from Bayer Schering Pharma, Merck Serono, Biogen Idec, Pfizer Inc., Teva Pharmaceutical Industries Ltd.—sanofiaventis, Vertex Pharmaceuticals, Genzyme Corporation, Ironwood and Novartis; has served as a consultant for Bayer Schering Pharma, Biogen Idec, GlaxoSmithKline, Merck Serono, Protein Discovery Laboratories, Teva Pharmaceutical Industries Ltd.—sanofi-aventis, UCB, Vertex Pharmaceuticals, GW Pharma, Novartis and FivePrime; serves on the speakers bureau for Merck Serono; and has received research support from Bayer Schering Pharma, Biogen Idec, Merck Serono, Novartis, UCB, Merz Pharmaceuticals, LLC, Teva Pharmaceutical Industries Ltd.—sanofi-aventis, GW Pharma and Ironwood. G.C.E. serves on the editorial boards of the International Multiple Sclerosis Journal and Multiple Sclerosis and as Section Editor for BMC Medical Genetics; has received funding for travel or speaker honoraria from Bayer Schering Pharma, sanofi-aventis, Roche and UCB; has served as a consultant to Biopartners, Bayer Schering Pharma, Howrey LLP, Heron Health, and Eli Lilly and Company; and receives research support from Bayer Schering Pharma, the Multiple Sclerosis Society of the UK, and the Multiple Sclerosis Society of Canada Scientific Research Foundation. S.V.R. reports no competing interest. All authors have no relationships with companies that might have an interest in the submitted work in the previous 3 years; their spouses, partners or children have no financial relationships that may be relevant to the submitted work; and have no non-financial interests that may be relevant to the submitted work.
This work was funded by a research fellowship FISM-Fondazione Italiana Sclerosi Multipla-Cod. (2010/B/5 to G.D.); the FUGE2 program of the Research Council of Norway to G.K.S.; a research fellowship by the National Council for Science and Technology (CONACyT), Mexico and the Multiple Sclerosis Society of Great Britain and Northern Ireland (211990 and 915/09 to A.J.B.T.); the Medical Research Council (grant number G0801976); and the Wellcome Trust 090532/Z/09/Z. The study sponsors had no role in the design and conduct of the study; collection, management, analysis and interpretation of the data; and preparation, review or approval of the manuscript. All authors state that this research was carried out independently of the influence of funding bodies. Funding to pay the Open Access publication charges for this article was provided by the Wellcome Trust (090532/Z/09/Z).