1John P. Hussman Institute for Human Genomics, University of Miami, Miller School of Medicine, Miami, FL 33136, USA.
2Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
3Program in NeuroPsychiatric Genomics, Department of Neurology, Center for Neurologic Diseases, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
4Program in Medical and Population Genetics, Broad Institute of Harvard University and Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
5Department of Clinical Neurosciences, Addenbrooke's Hospital, University of Cambridge, Box 165, Hills Road, Cambridge CB2 2QQ, UK.
6Rush Alzheimer Disease Center and Department of Neurological Sciences, Rush University, Chicago, IL 60612, USA.
7Department of Neurology, Washington University Saint Louis, Saint Louis, MO 63110, USA.
8Department of Social Medicine, University of Bristol, Bristol BS8 2BN, UK.
9Division of Community Health Sciences, St George's, University of London, London SW17 0RE, UK.
10Department of Neurology, School of Medicine, University of California, San Francisco, CA 94143-0435, USA.
11Laboratory in Genetics and Genomic Medicine of Inflammation, Université de Montréal and the Montreal Heart Institute, Montreal, Quebec H1T 1C8, Canada.
12Division of Epidemiology, School of Public Health, University of California, Berkeley CA 94720-7356, USA.
13Harvard NeuroDiscovery Center, Harvard Medical School, Boston, MA 02155, USA.
14Department of Neurology, Yale University School of Medicine, New Haven, CT 06520-8018, USA.
15Institute for Human Genetics, School of Medicine, University of California, San Francisco, CA 94143-0435, USA.
†These individuals contributed equally to this work.
Genome-wide association studies (GWASs) have proven highly effective, identifying hundreds of associations across numerous complex diseases. These studies typically test hundreds of thousands of variants and identify hundreds of potential associations. However, to date, follow-up efforts have generally concentrated on only the few most significant initial associations, leaving the majority of true associations in any GWAS without replication. Here, we present a substantially more comprehensive follow-up of the first genome-wide association screen performed in multiple sclerosis (MS), a complex genetic disease characterized by central nervous system inflammation. In an independent set of 1343 MS cases and 1379 controls, we genotyped approximately 30 000 single-nucleotide polymorphisms (SNPs) that had demonstrated mild-to-moderate levels of significance (P ≤ 0.10) in the initial GWAS. We further replicated several of the most significant findings in another independent data set of 2164 MS cases and 2016 controls. We find considerable evidence for a number of novel susceptibility loci, including KIF21B [rs12122721, combined P = 6.56 × 10−10, odds ratio (OR) = 1.22] and TMEM39A (rs1132200, combined P = 3.09 × 10−8, OR = 1.24), both of which meet genome-wide significance. Both loci were overlooked in the initial replication, despite being among the top 3000 (~1%) SNP hits in the original screen.
Multiple sclerosis (MS, MIM 126200) is an inflammatory, demyelinating disease of the central nervous system (CNS), thought to be mediated by an autoimmune process. It affects over 2 million individuals worldwide. The disease is characterized by mononuclear cell infiltration of the CNS accompanied by demyelination, leading to a spectrum of symptoms and degrees of disability among affected individuals. MS is most common in young adults and affects women two to three times more frequently than men. Family and twin studies have long provided evidence for a strong genetic component in the etiology of MS. Until recently, the major histocompatibility complex (MHC) was the only universally accepted genetic locus associated with MS.
In 2007, we reported the first genome-wide association study (GWAS) of MS susceptibility. In this GWAS, we screened 931 trio families (an affected individual and both parents) with 334 923 single-nucleotide polymorphisms (SNPs) and followed up 110 of the most promising associations in additional cases (n = 2322), controls (n = 5418) and trio families (n = 609). This first-pass follow-up identified three strongly associated SNPs outside the MHC: rs6897932 in the interleukin-7 receptor α gene (IL7RA), and rs12722489 and rs2104286 in the interleukin-2 receptor α gene (IL2RA) (1). These associations were replicated by a number of groups (2–5) and further refined in subsequent analyses (6). The GWAS also identified highly suggestive associations with variants in CLEC16A and CD58, both of which have since been confirmed, along with other genes identified through additional MS GWAS and restricted follow-up efforts (e.g. TNFRSF1A, IRF8, CD6, TYK2, CD226 and CYP27B1) (7–14). These genes are now the focus of multiple ongoing studies to confirm and understand their potential involvement in MS susceptibility.
Statistically, we would expect the pool of moderately significant GWAS results to be enriched for genuine associations: a single GWAS has limited power to detect variants of modest effect, so many true associations reach only moderate significance rather than the top of the ranking. To test more comprehensively for additional MS-associated loci, we examined, in an independent data set, approximately 30 000 SNPs whose association P-values in the original IMSGC GWAS were ≤0.10.
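The enrichment argument can be illustrated with a small simulation. The sketch below is purely hypothetical: it assumes 100 000 null SNPs and 200 truly associated SNPs whose z-statistics are shifted by 2 standard deviations. None of these numbers come from the IMSGC data; they only show why a P ≤ 0.10 pool should contain a higher fraction of true signals than the screen as a whole.

```python
import math
import random


def two_sided_p(z):
    """Two-sided P-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_null=100_000, n_true=200, shift=2.0, cutoff=0.10, seed=1):
    """Return (fraction of true SNPs overall, fraction among SNPs with P <= cutoff).

    Hypothetical model: null SNPs have z ~ N(0, 1); truly associated SNPs
    have z ~ N(shift, 1), i.e. a modest effect that a single study detects
    only some of the time.
    """
    rng = random.Random(seed)
    kept_null = sum(two_sided_p(rng.gauss(0.0, 1.0)) <= cutoff for _ in range(n_null))
    kept_true = sum(two_sided_p(rng.gauss(shift, 1.0)) <= cutoff for _ in range(n_true))
    overall = n_true / (n_null + n_true)
    among_kept = kept_true / (kept_null + kept_true)
    return overall, among_kept


overall, among_kept = simulate()
print(f"true SNPs overall: {overall:.4f}; among P <= 0.10: {among_kept:.4f}")
```

Under these made-up settings about 64% of true SNPs pass P ≤ 0.10 in expectation, but only 10% of null SNPs do, so the retained pool is several-fold enriched; the exact figures depend entirely on the assumed inputs.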
In Stage 1, we obtained genotype data on 30 915 SNPs in 1488 cases and 3710 controls. After extensive quality control (QC), 29 561 SNPs in 1343 cases and 3577 controls were available for testing association with MS. This data set gave us a maximum of 80% power to detect a risk odds ratio (OR) of 1.25 at a type I error rate of 0.001 (Supplementary Material, Fig. S1) (15). Outside the MHC (i.e. 29–34 Mb on chromosome 6), 85 SNPs showed high levels of significance (P ≤ 0.001) (Table 1). Detailed analysis of SNPs within the broader MHC is the focus of a separate parallel project. Because the SNPs selected for Stage 1 were chosen without regard to linkage disequilibrium (LD), a number of SNPs with P-values ≤0.0001 are in relatively strong LD with one another; the significant SNPs therefore do not represent 85 independent loci. As expected, a number of Stage 1 top hits lie in previously identified MS genes, including CLEC16A (1,7), CD58 (1,8), IRF8 (11) and MMEL1 (M.B., unpublished data). As is typical of replication studies, previous top hits shifted in ranking in the follow-up experiment. Our study is no exception: the association P-values for arguably the two most notable genes, IL2RA (rs2104286, P = 1.89 × 10−2) and IL7RA (rs6897932, P = 1.03 × 10−2), fall short of our arbitrary P-value cutoff (P ≤ 1.0 × 10−3) for inclusion in Table 1 (see Supplementary Material, Table S1 for results of the remaining 29 447 SNPs analyzed in Stage 1).
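The power statement depends on allele frequency and on the genetic model assumed. As a rough illustration only, the sketch below uses a normal approximation to a two-sided allelic test under an assumed multiplicative model and Hardy-Weinberg equilibrium, with the Stage 1 sample sizes (1343 cases, 3577 controls). The grid of risk-allele frequencies is an assumption, and this is not the method of reference (15), so it need not reproduce the 80% figure.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases, n_controls, p0, odds_ratio, alpha):
    """Approximate power of a two-sided allelic case/control test.

    Assumes a multiplicative (allelic) model, Hardy-Weinberg equilibrium and a
    normal approximation for the difference in risk-allele frequency.
    p0 is the risk-allele frequency in controls.
    """
    norm = NormalDist()
    # Risk-allele frequency in cases implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    a1, a0 = 2 * n_cases, 2 * n_controls  # number of alleles in each group
    pooled = (a1 * p1 + a0 * p0) / (a1 + a0)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / a1 + 1 / a0))
    se_alt = math.sqrt(p1 * (1 - p1) / a1 + p0 * (1 - p0) / a0)
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # The chance of rejecting in the wrong direction is ignored (negligible).
    return norm.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


# Stage 1 sizes from the text; the frequency grid is an arbitrary assumption.
for p0 in (0.05, 0.10, 0.20, 0.30, 0.50):
    power = allelic_power(1343, 3577, p0, odds_ratio=1.25, alpha=0.001)
    print(f"control risk-allele frequency {p0:.2f}: power {power:.2f}")
```

Power rises steeply with allele frequency for a fixed OR, which is why a single "maximum" power figure has to be read against the frequency at which it is attained.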
Following our analysis of Stage 1, we chose a smaller subset of SNPs for further replication in an independent data set. Table 2 presents the results for the 19 SNPs genotyped (Sequenom MassARRAY iPLEX) and analyzed in Stage 2 (20 SNPs were genotyped, and 1 failed QC). Eight of these SNPs showed further evidence of replication (P ≤ 0.05, with an OR in a consistent direction) in this independent data set. A combined analysis of these 19 SNPs using data from the original screen and from both Stage 1 and Stage 2 included 931 trios, 3507 cases and 8024 controls. Five SNPs meet a conservative estimate of genome-wide significance using a Bonferroni correction (P-value cutoff 1.49 × 10−7) that treats the 334 923 SNPs tested in the original GWAS screen as independent tests (Table 2). Furthermore, four of these five SNPs were significant in each of the independent data sets. These four SNPs lie within or near KIF21B (chromosome 1), TMEM39A (chromosome 3), and C16orf75 and PRM1 (both on chromosome 16). However, the two chromosome 16 SNPs near C16orf75 and PRM1 (rs12922090 and rs243315) are in very strong LD (D′ = 0.99, r² = 0.82) and so most likely reflect a single locus. We also performed conditional logistic regression on these 19 SNPs, conditioning on the HLA-DRB1*1501 tag SNP (rs3135388); interestingly, the three chromosome 16 SNPs (rs12922090, rs243315 and rs12927773) show slightly stronger significance in the HLA-conditional analysis (Table 2).
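D′ and r² can both be computed from two-locus haplotype frequencies. The sketch below takes those frequencies as known (with unphased genotypes they would first have to be estimated), and the example frequencies are hypothetical, not the chromosome 16 data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2.
    p_a, p_b: frequencies of alleles A and B.
    Haplotype frequencies are taken as given; with unphased genotypes they
    would first have to be estimated (e.g. by an EM algorithm).
    """
    d = p_ab - p_a * p_b
    # D is normalized by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical frequencies, not the chromosome 16 data.
print(ld_stats(p_ab=0.28, p_a=0.30, p_b=0.33))
```

Here |D′| ≈ 0.90 while r² ≈ 0.71: r² is also reduced when the two SNPs have different allele frequencies, which is how a pair can show D′ close to 1 alongside a noticeably lower r².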
We find considerable evidence for several new MS susceptibility loci, including KIF21B (rs12122721, combined P = 6.56 × 10−10, OR = 0.82), TMEM39A (rs1132200, combined P = 3.09 × 10−8, OR = 0.80) and PRM1 (rs243315, combined P = 1.07 × 10−7, OR = 0.83). All three showed moderate-to-strong significance in each stage of our analyses and meet genome-wide significance under a stringent Bonferroni correction. The ORs below 1 presumably refer to the alternative allele: their reciprocals (1/0.82 ≈ 1.22; 1/0.80 ≈ 1.25) agree, up to rounding, with the ORs of 1.22 and 1.24 given in the abstract.
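The genome-wide threshold is simple arithmetic. The family-wise error rate α = 0.05 used below is implied by the stated cutoff of 1.49 × 10−7 rather than stated explicitly; the sketch checks the three combined P-values quoted above against it.

```python
# Bonferroni threshold: family-wise alpha divided by the number of tests,
# treating the 334 923 SNPs of the original screen as independent.
ALPHA = 0.05  # implied by the 1.49e-7 cutoff in the text
N_TESTS = 334_923
threshold = ALPHA / N_TESTS

combined_p = {
    "KIF21B (rs12122721)": 6.56e-10,
    "TMEM39A (rs1132200)": 3.09e-8,
    "PRM1 (rs243315)": 1.07e-7,
}

print(f"Bonferroni threshold: {threshold:.3g}")
for locus, p in combined_p.items():
    print(f"{locus}: P = {p:.3g}, genome-wide significant: {p <= threshold}")
```

Because neighbouring SNPs are correlated, the effective number of independent tests is smaller than 334 923, so this threshold errs on the conservative side.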
We have identified novel loci for MS through more detailed examination of results from a large first-generation GWAS. Interestingly, in the original GWAS, the SNPs in KIF21B, TMEM39A and PRM1, although relatively significant in the more powerful case/control analysis [Cochran–Mantel–Haenszel (CMH) P-value ranks between 0.3 and 4.2%], failed to rise to the top of the more limited family-based analysis [the most significant SNP (rs12122721) had a transmission disequilibrium test (TDT) P-value rank of 28.9%] (Tables 1 and 3). Furthermore, these SNPs were among the top-ranked results (CMH P-value ranks between 0.4 and 2.1%) in a recent meta-analysis of three GWASs (11) (Table 3). These results indicate that additional true susceptibility loci are likely to be buried beneath the top association results from GWAS (and even from meta-analyses of GWAS) and subsequently overlooked in the rush to follow up the top hits. Testing only the ‘top hits’ often reflects the limited resources remaining after such a massive initial screening experiment. Our data suggest that a more comprehensive follow-up is essential for identifying all loci contributing to the genetic load of a given complex disease.
Among the top Stage 1 results (P ≤ 0.001), the average original GWAS P-value rank was approximately 40 000 for the CMH test (most significant SNP ranked 177; least significant ranked 319 841) and approximately 69 000 for the TDT (most significant SNP ranked 194; least significant ranked 308 800). Approximately one-third (29/85) of the most significant non-MHC SNPs in Stage 1 (Table 1) had original GWAS P-values <0.10 in both the TDT and CMH tests, and only two of these further replicated in Stage 2 (rs11583328 and rs10469900) (Table 2). We extended this examination by ranking the three SNPs meeting genome-wide significance (within or near KIF21B, TMEM39A and PRM1) alongside the most significant SNPs at loci from the original GWAS (or, for IL2RA, rs2104286, which has been identified as the primary association (6)) and at other subsequently identified MS susceptibility loci supported with varying levels of confidence. We also examined the ranks of these SNPs in a recent meta-analysis (Table 3). The original P-values of the three newly identified loci were similar to those of the other confirmed loci. Furthermore, each of these SNPs was mildly to moderately significant in the meta-analysis, but, as in the initial GWAS follow-up, these loci fall far enough from the top that they would not be selected for a limited follow-up. It follows that other yet-to-be-confirmed loci may lie within this same range of the data. It is also noteworthy that the CMH test was more robust than the TDT in identifying all of these loci in the original screen, which may in part reflect the gain in power from the additional samples used in the CMH analysis.
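The rank percentages used in this discussion are ranks divided by the number of SNPs tested. A small sketch of that bookkeeping, assuming 334 923 tests as in the original screen; the toy P-value list is hypothetical.

```python
from bisect import bisect_left


def rank_of(p_value, sorted_p_values):
    """1-based rank of p_value among ascending P-values (ties share the best rank)."""
    return bisect_left(sorted_p_values, p_value) + 1


def rank_percent(rank, n_tests):
    """Rank expressed as a percentage of all tests (smaller = more significant)."""
    return 100.0 * rank / n_tests


N_GWAS = 334_923  # SNPs tested in the original screen
for rank in (177, 3000, 319_841):  # ranks quoted in the text
    print(f"rank {rank}: top {rank_percent(rank, N_GWAS):.2f}%")

toy = sorted([0.2, 1e-4, 0.03, 0.5, 0.003])  # hypothetical P-values
print(rank_of(0.03, toy))
```

For example, rank 3000 corresponds to roughly the top 0.9% of the screen, consistent with the "~1%" quoted in the abstract.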
The new MS loci identified in this study are functionally interesting. KIF21B is a plus end-directed kinesin-like protein (KLP) involved in neuronal transport. It is distinctive in being enriched in dendrites relative to the cell body, in contrast to other plus end-directed KLPs, which are enriched in axons (16). KIF21B is also expressed in a variety of immune cells. Although KIF21B has not been functionally associated with neurodegeneration or inflammation, the nature and role of its protein in neurons make a biologic role for this gene in MS plausible. Recently, another kinesin superfamily member (KIF1B) was reported to be associated with MS (17); however, IMSGC efforts have failed to confirm this association (IMSGC, unpublished data). KIF21B is among the first genes identified via association studies with the potential for a direct neurodegenerative role in MS pathology.
Little is known about TMEM39A (transmembrane protein 39A). The associated SNP (rs1132200) causes a non-synonymous amino acid change (alanine/threonine) at position 487 of the protein. Although this variant may have a functional effect relevant to MS, the biologic role this gene might play in disease susceptibility remains unknown.
PRM1 (protamine 1) encodes a DNA-binding protein expressed in the nucleus of sperm. The strongest association in this region is with rs243315, which lies 5′ of PRM1; however, several SNPs across this region of chromosome 16 show mild-to-moderate levels of significance among the top hits (rs12922090, rs243315 and rs12927773) (Table 2). This region is >100 kb from CLEC16A, and there is little-to-no LD between these SNPs and any SNP within CLEC16A. There is, however, a very nearby candidate gene, SOCS1 (suppressor of cytokine signaling 1), which lies in the same region of strong LD as these SNPs and could harbor the true association. Additional work is needed to pinpoint this association, and this is the focus of ongoing laboratory efforts.
Through this comprehensive follow-up approach, we have identified a number of additional MS susceptibility loci and highlighted further loci that may yet prove to be involved in MS. Ultimately, fine mapping and functional studies will be required to understand the consequences of the associations detected in this experiment.
Study participants were ascertained at two sites in the USA [Brigham and Women's Hospital in Boston (BWH) and the University of California at San Francisco (UCSF)] and at one site in the UK [University of Cambridge (CMS)]. All affected individuals met the McDonald criteria for a diagnosis of MS (18). Unrelated controls were obtained from these US sites and from the British 1958 Birth Cohort Study and were selected to approximately match the cases for sex and age. This sample set contained 2961 individuals (1479 cases and 1482 controls) for genotyping. Additional control data were available for 2198 samples from the National Institute of Mental Health (NIMH) and the Wellcome Trust Case Control Consortium (WTCCC). Data from these additional controls had previously been analyzed for the 110 SNPs selected for replication in the original GWAS (1). Apart from a small set of overlapping SNPs (95 of the 110 SNPs from the replication phase of the original GWAS were genotyped and analyzed in this study), these control data are completely independent of previous association testing in these MS samples. All samples used in the Stage 1 analysis came from participants self-reporting as non-Hispanic white (Table 4).
Cases and controls genotyped for Stage 2 were drawn from an entirely independent replication set (11). This data set comprises an additional 2164 cases and 2016 controls from the sites listed above as well as from other collaborative efforts. The same inclusion criteria were applied to these cases and controls as in Stage 1 (Table 5).
Approval for these studies was granted by the appropriate institutional review boards. All studies were performed after informed consent was obtained from all participants.
We used the Illumina iSelect Custom BeadChip platform to genotype a more extensive list of top hits from the GWAS experiment (19). This experiment was performed in parallel with several other projects organized through the IMSGC to make the best use of samples and resources. This strategy allowed us to use the maximum number of bead types (60 800) available on the iSelect platform (depending on the chemistry used to assay a particular SNP, there may be one or two bead types per SNP). The SNPs selected for Stage 1 satisfied two criteria: (i) a P-value ≤0.10 in either the TDT or the CMH test in the original GWAS screen; (ii) an Infinium score >0.60 (a proprietary score used by Illumina to estimate the likelihood that an assay will generate accurate and reliable results). In the original GWAS, a total of 62 488 SNPs had a P-value ≤0.10 in either the TDT or CMH test; of these, 33 484 had an Infinium score >0.60. These 33 484 SNPs were selected for Stage 1 of our replication effort, along with an additional 19 318 SNPs for other parallel IMSGC projects, giving a total of 52 801 SNPs (60 800 bead types). After Illumina's manufacturing and internal QC procedures were complete, 48 767 SNPs (~92% of the total requested) were arrayed on each BeadChip for genotyping, including 30 915 of the Stage 1 follow-up SNPs (~49% of the 62 488 SNPs meeting the P-value criterion) (Table 6). A total of 29 561 of these 30 915 SNPs were ultimately analyzed after QC. At an LD threshold of r² = 0.80, these 29 561 SNPs capture 60.1% of all 62 488 SNPs with a P-value ≤0.10 in the original GWAS and 92% of all SNPs with a P-value ≤0.05 in the original screen.
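The LD-based coverage figures can be computed by asking, for each target SNP, whether it was genotyped itself or is tagged by a genotyped SNP. The sketch below assumes a tagging rule of r² ≥ 0.80 and a precomputed table of pairwise r² values (in practice estimated from reference haplotypes, which is an assumption here); the SNP IDs and r² values are hypothetical.

```python
def ld_coverage(targets, genotyped, r2, threshold=0.8):
    """Fraction of target SNPs captured by the genotyped set.

    A target counts as captured if it was genotyped itself or has
    r^2 >= threshold with at least one genotyped SNP. `r2` maps a
    frozenset of two SNP IDs to their pairwise r^2 (missing pairs = 0).
    """
    genotyped = set(genotyped)
    captured = 0
    for snp in targets:
        if snp in genotyped or any(
            r2.get(frozenset((snp, g)), 0.0) >= threshold for g in genotyped
        ):
            captured += 1
    return captured / len(targets)


# Hypothetical IDs and r^2 values, for illustration only.
targets = ["s1", "s2", "s3", "s4", "s5"]
genotyped = ["s1", "s4"]
r2 = {
    frozenset(("s1", "s2")): 0.92,  # s2 is tagged by s1
    frozenset(("s3", "s4")): 0.55,  # below threshold: s3 is not captured
    frozenset(("s4", "s5")): 0.80,  # exactly at threshold: s5 is captured
}
print(ld_coverage(targets, genotyped, r2))
```

The linear scan over genotyped SNPs is fine for a sketch; at genome scale one would index r² pairs by target SNP instead.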
Supplementary Material, Fig. S2 summarizes the SNPs analyzed in Stage 1 relative to their significance in the original GWAS, as well as the SNPs subsequently chosen for and analyzed in Stage 2.
We followed the Illumina Infinium protocol for genotyping the DNA samples. In brief, this involved amplification and subsequent fragmentation of genomic DNA, hybridization of the fragmented DNA to the BeadChip, an extension step and, finally, imaging to read the chip (19). We genotyped an initial data set of 2961 individuals (1479 cases and 1482 controls), distributing DNA samples across BeadChips (12 samples per BeadChip) so that cases and controls from each of the different ascertainment sites were represented on every chip, to minimize experimental bias in genotyping performance.
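One simple way to obtain the chip layout described above is stratified round-robin assignment. This is a hypothetical sketch, not the procedure actually used: samples are grouped by (site, status), the groups are interleaved, and the interleaved list is dealt out in blocks of 12. Site labels in the example are placeholders.

```python
from collections import defaultdict
from itertools import zip_longest


def assign_to_chips(samples, per_chip=12):
    """Spread samples over BeadChips so each chip mixes sites and case/control status.

    samples: list of (sample_id, site, status) tuples. Samples are grouped into
    (site, status) strata, the strata are interleaved round-robin, and chips
    are filled in order from the interleaved list.
    """
    strata = defaultdict(list)
    for sample in samples:
        strata[(sample[1], sample[2])].append(sample)
    interleaved = [
        s for group in zip_longest(*strata.values()) for s in group if s is not None
    ]
    return [interleaved[i:i + per_chip] for i in range(0, len(interleaved), per_chip)]


# Toy cohort: 24 samples in each of 6 site-by-status strata (hypothetical).
samples = [
    (f"{site}-{status}-{i}", site, status)
    for site in ("A", "B", "C")
    for status in ("case", "control")
    for i in range(24)
]
chips = assign_to_chips(samples)
print(len(chips), sorted({(s[1], s[2]) for s in chips[0]}))
```

With unequal strata, the last chips fill up with whichever strata remain, so a real layout would also shuffle samples within strata and check the final chips by hand.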
Following the analysis for Stage 1, there were 85 SNPs outside the MHC region with association P-values ≤0.001; of these, we genotyped 20 SNPs for our Stage 2 follow-up in a second data set, independent of both Stage 1 and the original GWAS. Five criteria were used to select SNPs for Stage 2 genotyping: (i) Stage 1 P-value ≤0.001; (ii) location within or near known genes; (iii) exclusion of SNPs in the MHC (within 29–34 Mb on chromosome 6); (iv) exclusion of SNPs overlapping previously identified MS genes or examined as part of the initial GWAS replication effort; (v) exclusion of SNPs being analyzed in other parallel projects using this common data set. We chose 21 SNPs that met these criteria; however, one SNP (rs9855065) failed the assay design process. We used the Sequenom MassARRAY iPLEX platform for this genotyping. The Sequenom protocol involves a multiplex PCR followed by a single-base primer extension reaction. Alleles at individual SNPs are then distinguished by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (20).
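The five selection criteria translate directly into a filter. In this sketch the `Snp` record, the gene annotation and the exclusion sets are hypothetical stand-ins; only the thresholds (Stage 1 P ≤ 0.001; chromosome 6 at 29–34 Mb for the MHC) and the list of previously identified MS genes come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# MHC window as defined in the text: 29-34 Mb on chromosome 6.
MHC_CHROM, MHC_START, MHC_END = "6", 29_000_000, 34_000_000


@dataclass
class Snp:
    rsid: str
    chrom: str
    pos: int                      # base-pair position (genome build not specified)
    p_stage1: float
    nearest_gene: Optional[str]   # None if not within or near a known gene


def select_stage2(snps, known_ms_genes, excluded_rsids):
    """Apply the five Stage 2 selection criteria described in the text."""
    selected = []
    for s in snps:
        in_mhc = s.chrom == MHC_CHROM and MHC_START <= s.pos <= MHC_END
        if (
            s.p_stage1 <= 0.001                       # (i) Stage 1 significance
            and s.nearest_gene is not None            # (ii) within or near a gene
            and not in_mhc                            # (iii) outside the MHC
            and s.nearest_gene not in known_ms_genes  # (iv) not a known MS gene
            and s.rsid not in excluded_rsids          # (iv)/(v) prior or parallel work
        ):
            selected.append(s)
    return selected


snps = [
    Snp("snp1", "1", 1_000_000, 2e-4, "GENE_A"),     # passes all criteria
    Snp("snp2", "6", 31_000_000, 1e-6, "GENE_B"),    # inside the MHC window
    Snp("snp3", "16", 11_000_000, 5e-4, "CLEC16A"),  # previously identified MS gene
    Snp("snp4", "3", 2_000_000, 0.004, "GENE_C"),    # Stage 1 P too large
    Snp("snp5", "2", 5_000_000, 8e-4, None),         # not near a known gene
    Snp("snp6", "5", 7_000_000, 3e-4, "GENE_D"),     # used by a parallel project
]
known = {"CLEC16A", "CD58", "IRF8", "MMEL1", "IL2RA", "IL7RA"}
print([s.rsid for s in select_stage2(snps, known, excluded_rsids={"snp6"})])
```

Criterion (ii) is the least mechanical in practice, since "near" a gene needs a distance rule and an annotation source, both of which are left to the caller here.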
We initially performed a thorough series of QC procedures, which are described in the Supplementary Material. After stringent QC, and finding no significant population differences in this data set, we chose to analyze it as one uniform sample collection (Supplementary Material, Fig. S3). The Stage 1 test for association used logistic regression as implemented in PLINK, with the first two principal components (PCA1 and PCA2) as covariates to correct for differential genotyping bias (21). This additive model tests for a linear trend in the log-odds of disease with the number of copies of an allele at a single locus. The analysis included GWAS data from the 2198 NIMH and WTCCC controls used in the original GWAS replication, in addition to the newly genotyped data set of 1343 cases and 1379 controls. After removing SNPs in the MHC (i.e. 29–34 Mb on chromosome 6), the genomic inflation factor (GIF) was 1.16 (Supplementary Material, Fig. S4). This is larger than the GIF of the original GWAS (1.05) and is likely due to the preferential selection of SNPs with small P-values: because these SNPs were chosen for their significance in the original screen, they are enriched for true associations, which shifts the distribution of test statistics upward. In addition to the standard logistic regression, a conditional logistic regression analysis was performed, conditioning on the HLA-DRB1*1501 tag SNP (rs3135388). Genotypes for rs3135388 had previously been imputed for the NIMH and WTCCC controls, as this SNP was not genotyped on the Affymetrix 500K chip.
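The genomic inflation factor is conventionally computed as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). A minimal sketch, assuming two-sided P-values from 1-df tests as input; it is not PLINK's code.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square: P(Z^2 <= m) = 0.5  =>  m = (Phi^-1(0.75))^2, about 0.455.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided 1-df P-values.

    Each P-value is converted back to a 1-df chi-square statistic; lambda is
    the median statistic divided by its expected median under the null.
    P-values must lie in (0, 1].
    """
    norm = NormalDist()
    chi2 = [norm.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced P-values mimic a well-calibrated null scan (lambda close to 1).
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_like), 3))
```

Because λ depends only on the median, a set of SNPs enriched for true signals, such as the preselected Stage 1 panel, can show λ > 1 without any population stratification.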
PLINK was also used for the Stage 2 replication analysis, in which logistic regression was used to test the 19 SNPs for association in the 4180 independent replication samples that passed QC. For a joint analysis of the Stage 1 and Stage 2 data sets together with the original GWAS screen (931 trios and 2431 controls), we used the UNPHASED software (22). A joint analysis conditioning on the HLA tag SNP (rs3135388) was also performed in UNPHASED.
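For readers unfamiliar with the additive (allele-dosage) logistic model used in both stages, a minimal pure-Python sketch follows. It fits the model by Newton-Raphson and reports a per-allele OR with a Wald P-value, optionally adjusting for covariates such as principal components. It works under simplifying assumptions (no missing genotypes, a fixed iteration count, no convergence diagnostics), is not PLINK's implementation, and does not reproduce the UNPHASED joint analysis; the genotype counts in the example are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def logistic_additive(dosage, status, covariates=None, iterations=25):
    """Fit logit P(case) = b0 + b1 * dosage + covariate terms by Newton-Raphson.

    dosage: allele counts (0, 1, 2); status: 1 = case, 0 = control;
    covariates: optional per-sample lists (e.g. principal components).
    Returns (OR per allele, two-sided Wald P-value for b1).
    """
    if covariates is None:
        covariates = [[] for _ in dosage]
    x = [[1.0, float(g)] + [float(c) for c in cov] for g, cov in zip(dosage, covariates)]
    k = len(x[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information matrix
        for xi, yi in zip(x, status):
            eta = sum(coef * v for coef, v in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for a in range(k):
                grad[a] += (yi - mu) * xi[a]
                for b in range(k):
                    info[a][b] += w * xi[a] * xi[b]
        cov_beta = invert(info)
        step = [sum(cov_beta[a][b] * grad[b] for b in range(k)) for a in range(k)]
        beta = [coef + s for coef, s in zip(beta, step)]
    # After convergence the last step is negligible, so cov_beta approximates
    # the inverse information at the final estimate.
    z = beta[1] / math.sqrt(cov_beta[1][1])
    return math.exp(beta[1]), math.erfc(abs(z) / math.sqrt(2))


def expand(counts, label):
    """Turn {dosage: count} into (dosage, status) rows."""
    return [(g, label) for g, n in counts.items() for _ in range(n)]


# Toy genotype counts (hypothetical): cases carry the allele more often.
data = expand({0: 50, 1: 40, 2: 10}, 0) + expand({0: 30, 1: 50, 2: 20}, 1)
dosage = [g for g, _ in data]
status = [y for _, y in data]
pcs = [[(i % 7) / 7.0] for i in range(len(data))]  # stand-in covariate
print(logistic_additive(dosage, status))
print(logistic_additive(dosage, status, covariates=pcs))
```

A conditional analysis of the kind described in the text would simply add the tag-SNP dosage (here, rs3135388) as one more covariate column.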
The International MS Genetics Consortium is supported by grants, societies, foundations and a number of individual donors. P.L.D. is a Harry Weaver Neuroscience Scholar Awardee of the National MS Society (NMSS); he is also a William C. Fowler Scholar in Multiple Sclerosis Research. D.A.H. is a Jacob Javits Scholar of the NIH. We acknowledge the use of genotype data from the British 1958 Birth Cohort DNA collection, funded by the Medical Research Council grant G0000934 and the Wellcome Trust grant 068545/Z/02. This work was supported by the National Institutes of Health (R01NS049477 to the IMSGC, R01NS32830 to J.L.H., NS46341 to P.L.D.); the National Multiple Sclerosis Society (NMSS) (AP3758-A16 to the IMSGC, RG4201-A-1 to J.L.M., RG4198-A-1); the Wellcome Trust (084702/Z/08/Z); the Medical Research Council (G0000934); a number of individual donors; and the Cambridge NIHR Biomedical Research Centre. Funding to pay the open access charge was provided by the IMSGC.
We thank the Accelerated Cure Project for its work in collecting samples from subjects with MS and for making these samples available to IMSGC investigators. We also thank the following clinicians for contributing to sample collection efforts: Accelerated Cure Project—Drs Elliot Frohman, Benjamin Greenberg, Peter Riskind, Saud Sadiq, Ben Thrower and Tim Vollmer; Washington University—Drs B.J. Parks and R.T. Naismith. We thank the Brigham and Women's Hospital PhenoGenetic Project for providing DNA samples from healthy subjects that were used in the Stage 2 follow-up effort of this study. We acknowledge the work done by the Biorepository and the Center for Genome Technology within the John P. Hussman Institute for Human Genomics (University of Miami); specifically, Sandra West for aid in specimen management and both Ashley Anderson and Luis Espinosa for aid in sample processing and genotyping. We thank the Computational Genomics Core within the Center for Human Genetics Research (Vanderbilt University); specifically, Justin Giles, Yuki Bradford and David Sexton for their support in data processing. We also thank Joanne Wang for meta-analysis data management.
Conflict of Interest statement. None declared.