Multiple sclerosis (OMIM 126200) is a common disease of the central nervous system in which the interplay between inflammatory and neurodegenerative processes typically results in intermittent neurological disturbance followed by progressive accumulation of disability.1 Epidemiological studies have shown that genetic factors are primarily responsible for the substantially increased frequency of the disease seen in the relatives of affected individuals;2,3 and systematic attempts to identify linkage in multiplex families have confirmed that variation within the Major Histocompatibility Complex (MHC) exerts the greatest individual effect on risk.4 Modestly powered Genome-Wide Association Studies (GWAS)5-10 have enabled more than 20 additional risk loci to be identified and have shown that multiple variants exerting modest individual effects play a key role in disease susceptibility.11 Most of the genetic architecture underlying susceptibility to the disease remains to be defined and is anticipated to require the analysis of sample sizes that are beyond the numbers currently available to individual research groups. In a collaborative GWAS involving 9772 cases of European descent collected by 23 research groups working in 15 different countries, we have replicated almost all of the previously suggested associations and identified at least a further 29 novel susceptibility loci. Within the MHC we have refined the identity of the DRB1 risk alleles and confirmed that variation in the HLA-A gene underlies the independent protective effect attributable to the Class I region. Immunologically relevant genes are significantly over-represented amongst those mapping close to the identified loci and particularly implicate T helper cell differentiation in the pathogenesis of multiple sclerosis.
We performed a large GWAS as part of the Wellcome Trust Case Control Consortium 2 (WTCCC2) project. Cases were recruited through the International Multiple Sclerosis Genetics Consortium (IMSGC) and compared with the WTCCC2 common control set12,13 supplemented by data from the control arms of existing GWAS. We introduced a number of novel quality control (QC) methods for processing these datasets (see Supplementary Information), which ultimately provided reliable information from 9772 cases and 17376 controls (see Figure 1A). Following single nucleotide polymorphism (SNP) based QC, data from 441547 autosomal SNPs, common to all internally and externally generated datasets, were available for analysis.
The multi-population nature of our study (Figure 1 A and B) afforded an opportunity to assess various published approaches for controlling the potential confounding effects of population structure, several of which (in the event) proved unhelpful (see Supplementary Information). Whilst not common in primary GWAS undertaken to date, the challenge of combining data across populations, in contexts where not all case samples have controls available from the same population (thus precluding standard meta-analytical techniques), may become more routine as study sizes increase.
We attempted analyses of the non-United Kingdom (UK) data with the now widespread technique of using principal components as covariates to correct for structure. However, even use of all seven top principal components which captured genome-wide effects in our data resulted in an unacceptably high genomic inflation: for example, the genomic control factor (λ)14 was λ = 1.2. We tried to reduce the genomic inflation by discarding the case samples that seemed least well matched to control sets. Removal of half the available cases in this fashion reduced λ only to 1.1. In another approach to handling structure, statistical clustering algorithms were successful in identifying subgroups of the data within which cases and controls appeared well-matched for ancestry (see Supplementary Figure S17). However, tests within these subgroups combined via fixed-effects meta-analysis also yielded unacceptably high genomic inflation (λ > 1.4) in an analysis with seven matched sub-groups of cases and controls. Finally, we applied a novel variance components method (similar to Kang et al.15), separately to the UK and non-UK datasets, that explicitly accounts for correlations among the phenotypes of individuals resulting from relatedness, allowing us to deal successfully with all sources of structure in our samples (see Supplementary Information for details of the linear mixed model we used). For example, the genomic inflation was reduced to λ = 0.995 in the UK and 1.016 in the non-UK data (see also Supplementary Information). After fixed-effects meta-analysis of the results from the UK and non-UK datasets, the inflation factor was λ = 1.045. We adopted this approach for all subsequent non-MHC association analyses.
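For readers unfamiliar with the genomic control factor, the following minimal sketch shows one common way λ is computed: the median of the observed 1-degree-of-freedom chi-square statistics divided by the median expected under the null (approximately 0.4549). The function name and the use of per-SNP z-scores as input are illustrative assumptions; the exact procedure used in this study is described in the Supplementary Information.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (null expectation).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(z_scores):
    """Return lambda: median observed 1-df chi-square over its null median.

    z_scores: per-SNP association z-statistics (hypothetical input format).
    """
    chi2 = [z * z for z in z_scores]  # a squared z-score is a 1-df chi-square
    return median(chi2) / CHI2_1DF_MEDIAN
```

A value close to 1 indicates that the bulk of test statistics follow the null distribution; values well above 1 suggest confounding such as uncorrected population structure.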
Outside the MHC we identified 95 distinct regions having at least one SNP associated with multiple sclerosis at pGWAS < 1×10^−4.5; in six of these 95 regions conditional analysis revealed an additional SNP showing association to the same locus (one locus containing two such SNPs). In total we took all 102 SNPs forward to replication, which we performed using data from previously reported multiple sclerosis GWAS8,9 and the iControl database (excluding any WTCCC controls previously used in these studies). In total, the replication analysis included data from 4218 cases and 7296 controls. These were considered in six independent strata after which results were combined through a fixed-effects meta-analysis. For 98 of the 102 SNPs, the same allele was over-represented in cases compared to controls. Twenty-three of the 26 previously known or strongly suggested multiple sclerosis associated loci were replicated in our primary GWAS with pGWAS < 1×10^−3. Our GWAS and replication also revealed another 29 novel associated regions (defined as having pGWAS < 1×10^−4.5, one-sided pReplication < 0.05, and pCombined < 5×10^−8), and a further 5 regions with strong evidence for association (with pGWAS < 1×10^−4.5, one-sided pReplication < 0.05, and pCombined < 5×10^−7). In one previously reported locus and two novel loci, additional SNPs were identified as being conditionally important in explaining risk. Just over one third of the identified loci overlap with regions already confirmed as associated with at least one other autoimmune disease (according to the GWAS catalog, http://www.genome.gov/gwastudies/). Results both for the previously established and novel loci are shown in Figure 2 and Supplementary Tables 1-3; and details of all 102 SNPs taken to replication are available in the Supplementary Data file.
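The combination of strata through fixed-effects meta-analysis can be sketched as follows. The text does not specify the weighting scheme, so this sketch assumes the standard inverse-variance form applied to per-stratum log odds ratios and their standard errors; the function name and return format are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted combination of per-stratum effect estimates.

    betas: per-stratum log odds ratios; ses: their standard errors.
    Returns (combined beta, combined SE, z-score, two-sided p-value).
    """
    weights = [1.0 / (se * se) for se in ses]  # precision of each stratum
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_two_sided
```

Under this scheme, strata with smaller standard errors (typically the larger cohorts) contribute more to the combined estimate.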
In order to assess objectively the collective evidence across the associated regions for particular classes of genes, we performed statistical analyses to look for enrichment of genes with similar function. We first identified the nearest gene to the lead SNP in each of the 52 regions of association and used the Gene Ontology (GO) database16 to define sets of functionally related genes (GO terms). We then tested whether the set of nearest-genes was enriched for particular GO terms using Fisher’s exact test. The GO terms having the most significant enrichment include genes linked to lymphocyte function (p = 3.2×10^−11, OR = 35.96) and in particular those with a role in T cell activation and proliferation (p = 1.85×10^−9, OR = 40.85). These are representative of a larger group associated with various components of the GO ‘immune system process’ (p = 8.6×10^−11, OR = 9.12). A similar analysis based on all genes in or near association regions showed similar enrichment, as did independent analyses based on nearest-gene or all genes in our next tier of signals, the 42 regions taken to replication but not meeting the thresholds above for association (see Supplementary file). Although GO immune system genes only account for 7% of human genes, in 30% of our association regions the nearest gene to the lead SNP is an immune system gene. As an illustration, Figure 3 shows a schematic of genes involved in the T helper cell differentiation pathway; a striking number show strong evidence for association with multiple sclerosis, particularly those acting as cell surface receptors. We infer from this pathway analysis of our GWAS signals that specific classes of immune system genes are especially important in the pathogenesis of multiple sclerosis.
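The enrichment test for a single GO term can be illustrated with a small sketch of Fisher's exact test on a 2×2 table of genes. The one-sided (over-representation) form and the simple cross-product odds ratio used here are assumptions, since the text does not state which variants were applied, and the function name and argument layout are hypothetical.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """One-sided Fisher's exact test for over-representation in a 2x2 table.

    a: nearest genes in the GO term      b: nearest genes not in the term
    c: background genes in the term      d: background genes not in the term
    Returns (p-value, sample odds ratio).
    """
    n_region = a + b
    n_term = a + c
    n_total = a + b + c + d
    denom = comb(n_total, n_region)
    # Hypergeometric probability of observing a or more term genes by chance.
    p = sum(
        comb(n_term, k) * comb(n_total - n_term, n_region - k)
        for k in range(a, min(n_region, n_term) + 1)
    ) / denom
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return p, odds_ratio
```

Here the "background" counts refer to all annotated genes that are not the nearest gene in any associated region.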
Our screen not only implicates a multitude of genes coding for cytokine pathway (CXCR5, IL2RA, IL7R, IL7, IL12RB1, IL22RA2, IL12A, IL12B, IRF8, TNFRSF1A, TNFRSF14, TNFSF14), co-stimulatory (CD37, CD40, CD58, CD80, CD86, CLECL1) and signal transduction (CBLB, GPR65, MALT1, RGS1, STAT3, TAGAP, TYK2) molecules of immunological relevance, but also relates to previously reported environmental risk factors such as Vitamin D9,17 (CYP27B1, CYP24A1) and therapies for multiple sclerosis including Natalizumab18 (VCAM1) and Daclizumab19 (IL2RA). There is a relative absence of genes relevant to potential pathways for neurodegeneration independent of inflammation (GALC, KIF21B).
To refine our understanding of the MHC associations in multiple sclerosis we imputed classical Human Leukocyte Antigen (HLA) types at six loci (A, B, C, DQA1, DQB1 and DRB1)20 and analysed these alongside the SNPs (see Supplementary Information for validation; at alleles responsible for the major signals described below, estimated specificity was at least 0.99 and sensitivity was at least 0.98, except for DRB1*13:03, where it was 0.88). Primary discovery was focused on the UK cohort with candidate signals being validated through support from additional case-control cohorts. Because of the extensive linkage disequilibrium within the MHC, we identified associated alleles in a stepwise manner, selecting the most strongly associated to include in a general model, in turn, if pUK < 10^−4 and pCombined < 10^−9 (Supplementary Information). At each stage we explored possible interactions and departures from the simple model in which risk increases multiplicatively with each additional copy of the relevant allele (additive increase on the log-odds scale) within the logistic risk framework.
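The simple model referred to above, in which each additional copy of an allele adds a fixed amount to the log-odds of disease, can be sketched as follows. The baseline log-odds and per-allele odds ratio in the usage note are illustrative values only, not estimates from this study, and the function names are hypothetical.

```python
import math


def disease_probability(copies, baseline_log_odds, log_or_per_allele):
    """Logistic model: log-odds rise additively with allele copies (0, 1 or 2)."""
    log_odds = baseline_log_odds + log_or_per_allele * copies
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)
```

With an illustrative per-allele odds ratio of 3, the odds for carriers of two copies are 9 times those of non-carriers, which is the multiplicative increase in risk implied by additivity on the log-odds scale. A departure from this model would appear as a two-copy odds ratio that differs from the square of the one-copy odds ratio.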
By this approach we found that DRB1*15:01 has the strongest association with multiple sclerosis amongst all classical and SNP alleles, with a consistent effect between cohorts (p < 1×10^−320: Figure 4A). The data are consistent with an additive effect on the log-odds scale for each additional allele. Conditioning on DRB1*15:01, we confirmed the presence of a protective Class I allele and identified the signal as being driven by HLA-A*02:01 (as previously suggested),21 with a consistent effect size across cohorts (p = 9.1×10^−23: Figure 4A). Again, we found no strong evidence for departure from additivity on the log-odds scale or statistical interaction with DRB1*15:01. Conditioning on both DRB1*15:01 and A*02:01 revealed additional risk associated with the strongly linked alleles DRB1*03:01 and DQB1*02:01 (p = 3.6×10^−10: Figure 4A; note that we cannot separate these alleles but for simplicity refer only to DRB1*03:01 below). Further conditioning identified an additional DRB1 risk allele DRB1*13:03 (p = 1.3×10^−11: Figure 4A). Although no other classical alleles meet the above criteria, we did observe several SNPs providing independent signals, the strongest coming from rs9277535_G (combined OR 1.28, p = 2.2×10^−22), an allele known to be in linkage disequilibrium with DPB1*03:01 (r^2 = 0.37).22
Analysis of the MHC SNP data using a genealogical method (GENECLUSTER)23 offers an alternate means of relating our results to classical HLA alleles that provides additional insight into the underlying genetic architecture (see Supplementary Information). Figure 4B shows genealogical trees relating the classical alleles at DRB1 and HLA-A, together with the estimated evolutionary position of the mutations predicted by GENECLUSTER as most completely modelling the association. At HLA-DRB1, three mutations are predicted, each of which implicates a clade of haplotypes carrying particular DRB1 alleles. All of the DRB1 alleles we have shown to be independently associated are included in these clades, each corresponding to a particular mutation. In addition, the analysis also explains why those haplotypes carrying the *08:01 allele have previously been shown to increase risk24,25 since they carry the same mutation as those bearing *13:03. At HLA-A, the predicted protective mutation is also concordant with our regression analysis of classical alleles in implicating *02:01 but, in addition, predicts that *68:01, *02:05, and *02:06 carry the same protective allele. All of these secondary predictions (increased risk from DRB1*08:01 and protection from HLA-A*68:01, *02:05, and *02:06) are supported in our regression analysis of classical alleles but the power to detect them in the primary analyses is limited because each allele occurs at a very low frequency.
We found no evidence for genetic associations with clinical course, severity of disease or month of birth, and no evidence of interaction with gender or DRB1*15:01 in any part of the genome (see Supplementary Information). However, analysis with respect to age at onset replicated the previously suggested association with the DRB1*15:01 allele.26 Although no other part of the genome contained individual SNPs showing strong evidence for association, risk alleles determining susceptibility are collectively more closely associated with age at onset than expected by chance, suggesting that individual genetic susceptibility is inversely correlated with age at onset.
Our GWAS - large for any complex trait having a prevalence of 1:1000 and involving diverse populations of European descent - has identified 29 novel susceptibility loci. Four mutations, one from Class I and three from Class II, with effects modelled in a simple multiplicative manner within and across loci are sufficient to account for most of the risk attributable to the MHC (see Supplementary file). Although our data do not address the issue of which components within the nervous system are initially damaged by the inflammatory response, the over-representation of genes that influence T cell maturation provides independent and compelling evidence that the critical disease mechanisms primarily involve immune dysregulation.
More generally, our study reinforces the view that the GWAS design, combined with very large experimental sample sizes and careful statistical analysis, provides valuable insights into the genetic architecture of common complex diseases. Here, this approach has identified many associated genetic variants close to genes, which are both individually interesting and collectively illuminate the roles of key biological pathways. It also provides indirect evidence that many more common variants of small effect contribute to genetic susceptibility for multiple sclerosis. Simple models, in which the previously-known and newly-identified variants affect risk multiplicatively, both within and across loci, explain a meaningful proportion (~20%, see Supplementary Information) of genetic risk for the disease. Important challenges lie ahead, in understanding overlap between the genetic basis for susceptibility in the context of different autoimmune diseases, and in uncovering the functional mechanisms underlying these associations.
Details of case ascertainment, processing and genotyping, together with sample and genotyping quality control are provided in the Supplementary Information. Statistical methods developed for testing the reliability of externally generated data sets, detecting samples with non-European ancestry, correcting for structure, classical HLA imputation and meta-analysis are also outlined in the Supplementary Information. Results for all scans and all reported loci are described in detail in the Supplementary Information.
The principal funding for this study was provided by the Wellcome Trust (085475/B/08/Z, 085475/Z/08/Z, 075491/Z/04/Z and 068545/Z/02). The work was also supported by National Institutes of Health (AI076544, NS032830, NS049477, NS19142, NS049510, NS26799, NS43559, NS067305, CA104021, RR020092, RR024992 and K23N/S048869), US National Multiple Sclerosis Society (RG 4201-A-1), Nancy Davis Foundation, Cambridge NIHR Biomedical Research Centre, UK Medical Research Council (G0700061, G0000934), Multiple Sclerosis Society of Great Britain and Northern Ireland (898/08), Wolfson Royal Society Merit Award, Peter Doherty fellowship, Lagrange Fellowship, Harry Weaver Neuroscience Scholarships, Australian National Health and Medical Research Council (NHMRC), Australian Research Council Linkage Program Grant, JHH Charitable Trust Fund, Multiple Sclerosis Research Australia, Health Research Council New Zealand, National MS Society of New Zealand, Wetenschappelijk Onderzoek Multiple Sclerose, Bayer Chair on Fundamental Genetic Research regarding the Neuroimmunological aspects of Multiple Sclerosis, Biogen Idec Chair Translational Research in Multiple Sclerosis, FWO-Vlaanderen, Belgian Neurological Society, Danish Multiple Sclerosis Society, Neuropromise EU grant (LSHM-CT-2005-018637), Center of Excellence for Disease Genetics of the Academy of Finland, Sigrid Juselius Foundation, Helsinki University Central Hospital Research Foundation, Bundesministerium für Bildung und Technologie (KKNMS consortium Control MS), Deutsche Forschungsgemeinschaft, Institut National de la Santé et de la Recherche Médicale (INSERM), Association pour la Recherche sur la Sclérose En Plaques (ARSEP), Association Française contre les Myopathies (AFM), Italian Foundation for Multiple Sclerosis (2002/R/40, 2005/R/10, 2008/R/11 and 2008/R/15), Italian Ministry of Health (grant Giovani Ricercatori 2007 - D.lgs 502/92), Regione Piemonte (grants 2003, 2004, 2008, 2009), CRT Foundation, Turin, Moorfields / UCL 
Institute of Ophthalmology NIHR Biomedical Research Centre, Norwegian MS Register and Biobank, Research Council of Norway, South-Eastern and Western Norway regional Health Authorities, Ullevål University Hospital Scientific Advisory Council, Haukeland University Hospital, Amici Centro Sclerosi Multipla del San Raffaele (ACESM), Association of British Neurologists, Spanish Ministry of Health (FISPI060117), Bibbi and Niels Jensens Foundation, Montel Williams Foundation, Hjärnfonden and Swedish Medical Research Council (8691), Stockholm County Council (562183), Swedish Council for Working Life and Social Research, Gemeinnützige Hertie Stiftung, Northern California Kaiser Permanente members and Polpharma Foundation, and Washington University Institute of Clinical and Translational Sciences - Brain, Behavioral and Performance Unit.
We acknowledge use of data from the British 1958 Birth Cohort, the UK National Blood Service, the popgen biobank, the KORA and MONICA Augsburg studies, the Accelerated Cure Project, the Brigham & Women’s Hospital PhenoGenetic Project, the Swedish CAD project, the Norwegian Bone Marrow Donor Registry, the Children’s Hospital of Philadelphia (CHOP), the Swedish Breast Cancer study, BRC-REFGENSEP (Pitié-Salpêtrière Centre d’Investigation Clinique (CIC) and Généthon) and HYPERGENES (HEALTH-F4-2007-201550). These projects received support from the German Ministry of Education and Research, the Helmholtz Zentrum München–National Research Center, the German National Genome Research Network (NGFN), the LMUinnovativ, the Knut and Alice Wallenberg Foundation, the Center for Applied Genomics from the Children’s Hospital of Philadelphia Development Award, the Agency for Science & Technology and Research of Singapore and the Susan G Komen Breast Cancer Foundation.
We thank S. Bertrand, J. Bryant, S.L. Clark, L. Collimedaglia, G. Coniglio, J.S. Conquer, B. Colombo, T. Dibling, G. Eckstein, J.C. Eldred, G. Fischer, S. Gamble, P. Gregersen, R. Guerrero, C. Hind, P. Lichtner, L. Moiola, H. Mousavi, R. Naismith, R.J. Parks, R. Pearson, V. Pilato, M. Radaelli, E. Scarpini, C.R. Stribling, T. Strom, S. Taylor, D. Vukcevic and A. Wilk for their help and support.
Detailed acknowledgements are available in the supplementary information file.
This manuscript is dedicated to the memory of Leena Peltonen, a member of both the IMSGC and WTCCC2, in recognition of her many contributions to, and her leadership in, human genetics.