|Home | About | Journals | Submit | Contact Us | Français|
Multiple sclerosis (MS) is characterized as an autoimmune demyelinating disease. Numerous family studies have confirmed a strong genetic component underlying its etiology. After several decades of frustrating research, the advent and application of affordable genotyping of dense SNP maps in large datasets has ushered in a new era in which rapid progress is being made in our understanding of the genetics underlying many complex traits. For MS, one of the first discoveries to emerge in this new era was the association with rs6897932[T244I] in the interleukin-7 receptor alpha chain (IL7RA) gene (Gregory et al. 2007; International Multiple Sclerosis Genetics Consortium 2007; Lundmark 2007), a discovery that was accompanied by functional data that suggest this variant is likely to be causative rather than a surrogate proxy (Gregory et al. 2007). We hypothesized that variations in other genes functionally related to IL7RA might also influence MS. We investigated this hypothesis by examining genes in the extended biological pathway related to IL7RA to identify novel associations. We identified 73 genes with putative functional relationships to IL7RA and subsequently genotyped 7,865 SNPs in and around these genes using an Illumina Infinium BeadChip assay. Using 2,961 case-control dataset, two of the gene regions examined, IL7 and SOCS1, had significantly associated single-nucleotide polymorphisms (SNPs) that further replicated in an independent case-control dataset (4,831 samples) with joint p-values as high as 8.29×10-6 and 3.48×10-7, respectively, exceeding the threshold for experiment-wise significance. Our results also implicate two additional novel gene regions that are likely to be associated with MS: PRKCE with p-values reaching 3.47×10-4 and BCL2 with p-values reaching 4.32×10-4. The TYK2 gene, which also emerged in our analysis, has recently been associated with MS (Ban et al. 2009). These results help to further delineate the genetic architecture of MS and validate our pathway approach as an effective method to identify novel associations in a complex disease.
Multiple sclerosis (MS) is an idiopathic autoimmune demyelinating disease that strikes in early to mid-adult life and affects women more often than men. MS, like most common diseases, has several complicating characteristics including clinical and genetic heterogeneity, incomplete penetrance, fluctuations over time, polygenic inheritance, and substantial environmental risk factors (Compston and Coles 2002). Numerous family and twin studies have shown that there is a strong genetic component underlying the etiology of MS. The prevalence of the disease among first-degree relatives of affected individuals is substantially increased over the prevalence in the general population, and twin studies suggest an overall heritability of ~50% (Ebers, Sadovnick, and Risch 1995; Hauser and Goodkin 1998; Mumford et al. 1994; Robertson et al. 1996; Sadovnick et al. 1993; Sadovnick and Ebers 1995; Sadovnick et al. 1996).
In the 1970s the Major Histocompatibility Complex (MHC) association with MS was discovered (Bertrams and Kuwert 1972; Naito et al. 1972). However, linkage screens failed to consistently identify any regions of linkage outside the MHC (Ebers et al. 1996; Haines et al. 1996; Kuokkanen et al. 1997; Sawcer et al. 1996; Transatlantic Multiple Sclerosis Genetics Cooperative 2001; International Multiple Sclerosis Genetics Consortium 2005). Furthermore, although hundreds of candidate genes have been tested, none had gained consensus as carrying risk for MS (Hirschhorn et al. 2002). This slow progress made it clear that more powerful approaches were needed in the genetic analysis of MS. Genomic convergence was applied to explicitly identify candidate genes with the best genetic and biological relevance (Hauser et al. 2003). In 2007, more than three decades after HLA was first identified, the interleukin-7 receptor alpha chain (IL7RA or CD127) was identified as the first non-MHC MS susceptibility locus using this approach (Gregory et al. 2007) and simultaneously confirmed in an accompanying report (Lundmark 2007).
The associated SNP was also identified in the first genome-wide association study (GWAS) in MS (International Multiple Sclerosis Genetics Consortium 2007) and in the non-synonymous SNP (nsSNP) screen performed by the Wellcome Trust Case Control Consortium (WTCCC) (Wellcome Trust Case Control Consortium 2007). The original GWAS study also identified interleukin-2 receptor α gene (IL2RA or CD25) as an MS risk gene, and highlighted the likely role of additional loci including CLEC16A and CD58. The effects of these additional genes were replicated in multiple independent datasets confirming their status as MS associated loci. In addition, other genes were identified and replicated as MS risk genes, including TYK2, CD226, TNFRSF1A (or CD120a), IRF8, and CD6 (Australia and New Zealand Multiple Sclerosis Genetics Consortium 2009; Ban et al. 2009; Baranzini, Wang, and Gibson 2009; De Jager et al. 2009; International Multiple Sclerosis Genetics Consortium 2009a; Kumpfel et al. 2008).
Rs6897932, a non-synonymous coding SNP in IL7RA, plays a functional role in MS, as the major allele (C) increases skipping of exon 6 and thus increases the soluble form of IL7RA. IL7RA is a likely functional candidate for this complex disease because it is involved in the development, maturation, and homeostasis of T and B cells. It is involved in two signaling pathways: IL7 and thymic stromal lymphopoietin (TSLP) (Gregory et al. 2007).
IL7 is an essential cytokine for lymphocyte development. Its receptor consists of two components: IL7RA and IL2RG (gamma c or CD132); the latter also heterodimerizes with receptors for other interleukins, including IL2, IL4, IL9, IL15, and IL21. Both components are essential for the biological effects of IL7 and are also required for the downstream signaling cascade, which involves the activation of Janus kinases (Jaks) and signal transducer activators of transcription (STATs). The downstream signaling cascades are involved in various cell processes including development of B and T cells, regulation of cell proliferation, regulation of survival for progenitor and mature lymphoid cells, and gene transcription following an immune response (Hofmeister et al. 1999; Jiang et al. 2005). We hypothesized that genes in the IL7RA pathway may have independent or interactive effects with the risk SNP in IL7RA (Gregory et al. 2007).
Other complex diseases, including schizophrenia and Crohn's disease, have employed the approach of examining the extended pathways identified through known susceptibility genes to identify new disease loci (Hennah et al. 2007; McGovern et al. 2009). With the prior knowledge that IL7RA is associated, it is plausible that this pathway (IL7/IL7RA) is involved in the autoimmune aspects of the disease. We designed our study to investigate the role of genes in the IL7RA pathway in MS susceptibility both for independent and interactive gene effects. We selected genes as part of this pathway if they were directly involved in the downstream signaling cascade or were known to be influenced by varying IL7 concentrations.
After all quality control was completed, the discovery dataset consisted of 7,637 SNPs genotyped in 1,343 cases and 1,379 controls. The results using the Armitage trend test found 578/7,637 nominally significant SNP results with estimated genotype relative risks (GRRs) of 1.11-1.64 (over 75% have GRRs<1.20). None of these SNPs met a Bonferroni experiment-wise correction, although one SNP in IL7 (rs2587156) came close (p-value= 2.46 × 10-5). There are seventeen SNPs of particular interest (p-values<1 × 10-3, Table I) (the full set of significant SNPs is given in Suppl. Table 1), representing five gene regions. The genotypic and haplotypic analyses did not yield any significant novel regions that were not detected using the Armitage trend test. Likewise, the Multi-factor dimensionality reduction (MDR) results did not display any significant 2-way or 3-way interactions. We selected 31 SNPs including the seventeen top SNPs and fourteen additional SNPs with suggestive p-values (p-values<1×10-2, Supplementary Table 1) from these five gene regions. We chose to focus on those most significant results (p-values <1×10-3) based on what we could reasonably follow up on in a replication study. To get better coverage of those five regions of particular interest, we included additional SNPs giving us a total of 31 SNPs.
Following QC analyses, 24/31 SNPs were analyzed in the independent validation dataset of 2,111 cases and 2,037 controls. We observed 11/24 significant SNPs (p-values<0.05) with estimated GRRs of 1.11-1.25 (Table II). In addition, we merged the two independent datasets together to obtain the overall Armitage p-values for all 3,454 cases and 3,416 controls. The SNPs on chromosome 16 flanking SOCS1 were the most significant having an estimated GRR of 1.25; the most significant p-value (3.48×10-7) was with SNP rs441349. IL7 also had significant SNPs as well with estimated GRRs of 1.14 and 1.35 (Table III) with p-values of 6.92×10-4 and 8.29×10-6 for SNPs rs1054283 and rs2587156, respectively. Because of gender ratios in the replication dataset were different from the discovery set, we also examined the stratified dataset to confirm that gender is not a confounding variable. We saw no differences in the gender specific results (Table IV).
The high risk allele frequency of the associated IL7RA variant meant that our analysis based on stratifying according to this locus had only limited power. It was thus not surprising that no novel finding emerged. The HLA stratified data was more interesting. There were 2,584/6,870 individuals with at least one risk allele (“A” allele for rs3135388) (IMSGC 2007). The Armitage trend test for those individuals without a risk allele showed IL7 had more significant p-values than in the overall dataset. For those individuals with the risk allele (at least one copy), SNPs in the flanking region of SOCS1 yielded the most significant p-values for the Armitage trend test (Table V). However, a test of heterogeneity was only significant for the SNPs in IL7 (p-values=0.03-0.04).
Through our analysis of the genes implicated as part of the IL7R pathway, we identified two novel gene regions, IL7 and SOCS1, replicated these in an independent dataset, and have highly significant overall results. Finding association with the biologically relevant ligand (IL7) of IL7R indicates that the known biological activity of these genes is relevant to MS, as opposed to some perhaps as yet unknown pleiotropic effect of these genes. The IL7/IL7R complex along with IL2RG is vital for the development, maturation, and survival of T cells, which is theorized to be where the breakdown in the body's immune response occurs. This breakdown could, therefore, play a role in the adverse autoimmunity found in MS patients (Goverman 2009).
Interestingly, SOCS1 is located on chromosome 16 slightly downstream from a recently confirmed MS susceptibility locus, CLEC16A. After examining the linkage disequilibrium (LD) architecture of this region, SOCS1 and CLEC16A appear to be two independent MS disease loci (the r2 values between the CLEC16A SNP (rs7184083) and the most significant SNPs in the SOCS1 region are 0.057 (rs1640923) and 0.049 (rs12922733)). The LD block containing CLEC16A is separated from the more proximal LD block that contains the SOCS1 gene region that contains our significant SNPs. Having an associated SNP in the flanking region of a gene involved in suppression of cytokine signaling (i.e. SOCS1 functions as a suppressor of cytokine signaling) suggests that this flanking region could be a regulatory element. Having multiple associated genes in close proximity may suggest that these genes could be co-regulated. These observations warrant further investigation of this region via functional studies.
We also identified three additional strong candidate genes that have biological relevance to the etiology of MS, including Protein kinase C epsilon (PRKCE), B-cell lymphoma 2 (BCL2), and tyrosine kinase 2 (Tyk2). PRKCE is involved in neuron channel activation and apoptosis. BCL2 is involved in activation of apoptosis. Tyk2 transmits cytokine signals by phosphorylating receptor subunits (Jiang et al. 2005; Shirai, Adachi, and Saito 2008). Through independent efforts, we have already identified Tyk2 as a susceptibility gene in MS. Its re-emergence here thus provides a valuable positive control and further support for the importance of Tyk2 (Ban et al. 2009).
The results indicate that the multiple genetic variations in the IL7/IL7RA pathway are important in MS pathogenesis, and by implication, that this pathway plays an important biological role in the etiology of MS. Furthermore our data support the approach of using candidate pathways as an effective way of identifying additional MS susceptibility genes. Examining the extended pathways of previously identified associated genes has recently proven useful in the identification of new disease loci (Hennah et al. 2007; McGovern et al. 2009). Therefore, this method of screening the biologically interacting genes of confirmed disease genes may help to elucidate additional aspects of the underlying genetic architecture of numerous complex diseases.
These data have also identified several genes that have modest significance and should be examined further. Some of the genes mentioned previously (i.e. PRKCE and BCL2) should be investigated in independent datasets. They were not strongly replicated, but had several SNPs with nominally significant p-values in the discovery dataset and trending favorably in the validation dataset. Thus, additional examination of these genes is necessary. Additionally, other genes (i.e. PRKCA, Stat5a/b, IL12A/B, T-BET, BAX, and BIM) also had several SNPs with interesting trends that were not followed up in this study. All of these genes have biological plausibility and need further exploration.
The HLA-DR*1501 allele identified decades ago confers a much larger effect on MS susceptibility (odds ratio (OR) ~3) than the effects of recently identified associations. Both the IL7RA and the IL2RA alleles have much smaller effects (OR~1.2 and OR~1.25, respectively) (Gregory et al. 2007; International Multiple Sclerosis Genetics Consortium 2007). It is possible that common alleles with smaller effects could explain a significant proportion of the genetic component of MS (Baranzini et al. 2009; Sawcer 2008; Wang et al. 2005). In this study of the IL7/IL7RA pathway, the nominally significant SNPs all have estimated GRRs of 1.11-1.64, and over 75% have GRRs<1.20. Using the pathway approach in conjunction with these large datasets and dense SNP genotyping, the etiology of MS is finally starting to be dissected. By building on the knowledge of these smaller gene effects, functional studies can help us understand the underlying mechanism of these genes and their relationships to the pathogenesis of MS.
Samples for the discovery screen came from both the United States and the United Kingdom. Through the International Multiple Sclerosis Genetics Consortium (IMSGC) we compiled a total of 2,961 samples (1,479 cases and 1,482 controls). The McDonald criteria were used to provide a positive diagnosis for MS (McDonald et al. 2001). Healthy unrelated controls were selected from the same US sites and from the 1958 Birth Cohort in the UK. Controls were selected to match the appropriate ratio of female to male cases. Controls were also age-matched to the cases when possible. Subjects were chosen with all four main types of MS and varying disability. In the UK subset, those with clinically isolated syndrome (CIS) were excluded from this study; however, about 7% of the US subset had CIS at their examination dates, as the US experience indicates that these individuals virtually always develop MS. All participants self-reported as non-Hispanic whites. The table of the demographics of these samples can be found elsewhere (Table 4 of International Multiple Sclerosis Genetics Consortium 2009b).
Samples for the replication dataset also were obtained from the United States and the United Kingdom through the IMSGC. The same criteria were applied for diagnosis (McDonald et al. 2001). The controls were collected from the same ascertainment sites as the cases with the UK controls coming from the 1958 Birth Cohort. Like the discovery dataset, the UK cases with CIS were removed; whereas, CIS cases only made up 4% of the US cases. All samples were self-reported non-Hispanic whites. This dataset consisted of 4,831 samples (2397 cases and 2434 controls) (Table VI). Genotype concordance tests were performed to exclude any possible duplicate samples, thereby ensuring that the two datasets (discovery dataset and the replication dataset) are independent.
We identified 73 genes demonstrating putative biological involvement within the IL7RA pathway (Table VII) by performing a PubMed search (Ansel et al. 2006; Fry and Mackall 2005; Gregory et al. 2007; Hofmeister et al. 1999; Jiang et al. 2004; Jiang et al. 2005; Kisseleva et al. 2002; Motsinger et al. 2007; O'Shea 2004). All known SNPs within the genomic sequence of each gene and 50 kb flanking upstream and downstream of the gene were identified using dbSNP version 127 (for complete coordinates of these genes see Suppl. Table 2). The flanking regions of each gene were included to thoroughly examine the gene and any possible regulatory regions surrounding the gene.
We used the Illumina Infinium 60K BeadChip assay as our genotyping platform (Steemers et al. 2006) for the initial screening experiment. Both case and control samples were included on each BeadChip in a randomized order. With additional collaborators from the International Multiple Sclerosis Genetics Consortium (IMSGC), we aggregated a total of 48,767 SNPs for multiple projects into a single 60K BeadChip design. This study selected and used 7,865 SNPs for the 73 gene regions. These SNPs represented 16% of the total 60K BeadChip. The criteria for picking SNPs were: minor allele frequencies greater than 5%, validated SNPs (from dbSNP), and Illumina Infinium score greater than 0.60 (proprietary score used to determine the overall success of the assay). The Illumina protocol involves amplification of genomic DNA, fragmentation of this DNA, hybridization of the DNA on the BeadChip, extension on the BeadChip, and imaging to read the chip (Steemers et al. 2006).
For the validation experiment, the Sequenom MassARRAY iPLEX platform was used. The Sequenom protocol involves a multiplex PCR reaction preceding a single base primer extension reaction. The individual SNPs are identified by using Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI TOF MS). The protocol for this platform is described in detail elsewhere (Gabriel, Ziaugra, and Tabbaa 2009). In total, 90 SNPs were genotyped across 3 Sequenom pools for the aggregated multiple IMSGC projects. This study selected 31 SNPs across five gene regions. The SNPs chosen for replication were those with the highest Armitage p-values and were derived from five biologically relevant gene regions.
Upon completion of genotyping, QC processes were run for the entire BeadChip to guarantee the accuracy of the genotyped dataset. The discovery dataset contained 2,961 individuals (1,479 cases and 1,482 controls). Using the Whole-genome Association Study Pipeline (WASP) (Sexton et al. 2007), we tested for SNP and sample genotyping efficiency, allele frequencies, gender errors, and Hardy-Weinberg equilibrium (HWE). To ensure thorough cleaning of the data, this process was performed recursively.
Overall, individuals were dropped based on low genotyping efficiency (<95%), on gender discrepancies (X-chromosome heterozygosity scores), on apparent relatedness (using IBS statistics in Graphical Relationship Representation), and on population stratification (using Structure). In total, 239 individuals were eliminated: 92 based on low genotyping efficiency, 68 based on gender discrepancies and/or apparent relatedness, and 79 based on population stratification. Over 800 X-chromosome SNPs were on the BeadChip and could be used to determine possible gender discrepancies. Population stratification was investigated, and samples were dropped if the probability was <0.90 of being from European descent.
SNPs were eliminated based on low genotyping efficiency (<95%), on the absence of a minor allele in our dataset (monomorphic), and on the deviation from HWE (<0.0001). Seventy-three SNPs were out of HWE in the overall dataset but were in HWE (>0.05) in either the cases alone or the controls alone and were subsequently retained for analysis.
After application of stringent QC, we eliminated ~8% of the samples and ~4% of the SNPs. The estimate of residual genotypic error (based on X-chromosome heterozygosity in males) was 1.87×10-3. Overall, QC was completed yielding 2,722 individuals (1,343 cases and 1,379 controls) and 46,874 SNPs. This QC process was the same for this experiment and for an extensive follow-up of GWAS data from the IMSGC (International Multiple Sclerosis Genetics Consortium 2009b). This project included 7,637 SNPs (16.3% of the total BeadChip SNPs passing QC).
Similar QC processes were applied to the 31 SNPs in the replication phase. The replication dataset contained 4,831 individuals (2,398 cases and 2,433 controls). Using the Whole-genome Association Study Pipeline (WASP), we tested for marker and sample genotyping efficiency, allele frequencies, and Hardy-Weinberg equilibrium (HWE). Overall, 474 individuals were eliminated based on low genotyping efficiency (<85%) or the individuals self-reporting as other than non-Hispanic Caucasians. In addition, 219 individuals were eliminated due to overlap with the discovery dataset. No X-chromosome SNPs were run so gender discrepancies could not be addressed, and there were not enough SNPs genotyped to look at sample relatedness. Markers were eliminated based on low genotyping efficiency (<95%) and deviation from HWE (<0.0001). Four SNPs had genotyping efficiencies lower than 95%, and three SNPs were out of HWE. The QC process eliminated ~14.3% of the samples and ~22.5% of the SNPs. Overall, QC was completed yielding 4,148 individuals (2,111 cases and 2,037 controls) and 24 SNPs.
Armitage trend tests and genotypic chi square analyses were performed using WASP software for the minor allele. In addition, haplotype blocks were identified and examined for haplotypic associations using HaploView (Barrett et al. 2005) and haplo.stats from the R project (http://www.r-project.org/). The Armitage trend test and genotypic chi square analyses were performed for all 7,637 SNPs on all individuals using an additive scoring model for the minor allele. The haplotypic analysis was performed on the “tagging SNPs” that tag (using r2 value=0.80) only the 578 nominally significant (p-values <0.05) SNPs from the Armitage trend test on all individuals. Rs3135388, a SNP that is highly predictive of the HLA-DRB1*1501 allele, was included on the chip for purposes of secondary analyses such as stratification and interactive effects (International Multiple Sclerosis Genetics Consortium 2007). We also included the rs6897932 SNP in IL7RA. Stratification by this SNP was used to explore possible interactions as previously suggested (Gregory et al. 2007). The replication dataset had an unequally distributed sex ratio (Table VI); thus, to ensure that there was not a possible confounder, we stratified our dataset by gender to see if there were differences in the results.
We report nominal p-values (p-values<0.05) (unless otherwise indicated). A stringent Bonferroni corrected p-value is 2.0×10-5 calculated using ~2,500 independent SNPs (given the linkage disequilibrium in each gene region using an r2 value=0.80). MDR was used to look for gene-gene interactions within the pathway (Ritchie et al. 2001). We have adequate power (>80%) in the discovery dataset to see odds ratios of 1.3 and larger.
Supplementary Table 1. Complete list of 578 significant SNPs (pvalue<0.05) in the discovery dataset
Supplementary Table 2. List of Genes from IL7/IL7R Pathway Included in the 60K BeadChip using dbSNP version 127.
This work was supported by a grant from the NIH R01NS32830. JLM was partially supported by a National MS Society (NMSS) Research Award (RG 4201-A-1). The International MS Genetics Consortium is supported by R01NS049477. PLD is a Harry Weaver Neuroscience Scholar Award of the National MS Society (NMSS); he is also a William C. Fowler Scholar in Multiple Sclerosis Research and is supported by an NINDS K08 grant, NS46341. LP was a fellow of the National MS Society, USA (FG-1665-A-1) during these studies. DAH is a Jacob Javits Scholar of the NIH. This work was also supported by the Wellcome Trust (084702/Z/08/Z), the Medical Research Council (G0700061) and the Cambridge NIHR Biomedical Research Centre. We also acknowledge use of genotype data from the British 1958 Birth Cohort DNA collection, funded by the Medical Research Council grant G0000934 and the Wellcome Trust grant 068545/Z/02. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We acknowledge use of genotype data from the British 1958 Birth Cohort DNA collection, and thank the Accelerated Cure Project for its work in collecting samples from subjects with MS and for making these samples available to IMSGC investigators. We also thank the following clinicians for contributing to sample collection efforts: Accelerated Cure project – Drs. Elliot Frohman, Benjamin Greenberg, Peter Riskind, Saud Sadiq, Ben Thrower, and Tim Vollmer; Washington University -Drs. B.J. Parks and R.T. Naismith. Finally, we thank the Brigham & Women's Hospital PhenoGenetic Project for providing DNA samples from healthy subjects that were used in the Stage 2 follow-up effort of this study. We acknowledge the work done by the Center for Genome Technology Genotyping Core within the Miami Institute for Human Genomics (University of Miami); specifically Ashley Anderson and Luis Espinosa for aid in sample processing and genotyping. We also thank the Computational Genomics Core within the Center for Human Genetics Research (Vanderbilt University); specifically Justin Giles, Yuki Bradford, and David Sexton for their support in data processing.
Author contributions: RLZ, JLM, and JLH designed the study and wrote the manuscript. RLZ, JLM, and JLH performed analytical work. RLZ performed the quality control analysis. IK, PLW, and CA generated and processed genotype data for analysis. PLD, MB, SP, RB, SR, NTA, LP, WLM, DP, DE, AHC, BC, JDR, LFB, AJI, AC, DAH, SLH, JRO, SJS and MPV contributed to DNA sample collection, genetic data, discussion, and edits to the manuscript. All authors have read and contributed to the manuscript.