The present study comprises seven data sets of non-overlapping case and control subjects of European descent. Five strata (IMSGC-US, IMSGC-UK, GeneMSA-US, GeneMSA-NL, and GeneMSA-CH) were taken from a previously published meta-analysis.3
Details on these data sets can be found elsewhere.2–3
The sixth stratum (BWH/TT) is based on data from our previous study3 enriched with an additional 1453 MS cases and 2176 controls, with all samples genotyped on the Affymetrix GeneChip 6.0 platform. Finally, we added a seventh stratum (ANZ) from a recently described genome-wide study of 1618 cases and 1988 controls.5
Further information on these two strata and on the quality control applied can be found in the Supplementary material. A summary table (Characteristics of the meta-analyzed datasets) lists the subject collections assembled for this meta-analysis. All subjects had either 1) a diagnosis of MS according to the McDonald criteria6 or 2) a diagnosis of clinically isolated demyelinating syndrome (CIS), in which an individual has had one episode of inflammatory demyelination and harbors two or more T2-hyperintense lesions in the brain or spinal cord. Most CIS subjects go on to have a second episode of inflammatory demyelination, which results in a diagnosis of MS. An earlier study found no differences in the distribution of susceptibility alleles between CIS and MS subjects, suggesting that their genetic architecture is similar.7
Table: Characteristics of the meta-analyzed datasets
We used EIGENSOFT to remove outliers in terms of genetic ancestry and to calculate the top ten eigenvectors of the genotype data within each stratum.8
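EIGENSOFT (smartpca) is a compiled program. The pure-Python sketch below only illustrates the idea behind the ancestry step. Each SNP is standardized, and the leading eigenvectors of the individual-by-individual covariance matrix are found by power iteration with deflation. Individuals lying more than a chosen number of standard deviations from the mean on any eigenvector are then dropped, and the procedure repeats. The function names are hypothetical and the per-SNP scaling is simplified. The defaults (10 eigenvectors, 6 SD, 5 iterations) are intended to resemble smartpca's documented outlier settings; treat them as assumptions to verify against the EIGENSOFT documentation. This is not the exact procedure used in the study.

```python
import math
import random


def standardize(genotypes):
    """Center each SNP (0/1/2 allele counts) and scale by sqrt(p(1-p)).

    genotypes: list of individuals, each a list of allele counts per SNP.
    Simplification (assumption): p is the sample allele frequency; SNPs with
    p in {0, 1} carry no information and are dropped.
    """
    n, m = len(genotypes), len(genotypes[0])
    columns = []
    for j in range(m):
        col = [row[j] for row in genotypes]
        mean = sum(col) / n
        p = mean / 2.0
        scale = math.sqrt(p * (1.0 - p))
        if scale > 0:
            columns.append([(g - mean) / scale for g in col])
    return [[col[i] for col in columns] for i in range(n)]


def top_eigenvectors(x, k, iters=300, seed=1):
    """Leading eigenvectors of X X^T / m by power iteration with deflation."""
    n = len(x)
    m = len(x[0]) if n else 0
    if m == 0:
        return []
    cov = [[sum(a * b for a, b in zip(x[i], x[j])) / m for j in range(n)] for i in range(n)]
    trace = sum(cov[i][i] for i in range(n))
    rng = random.Random(seed)
    vecs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(c * b for c, b in zip(row, v)) for row in cov]
            for u in vecs:  # project out eigenvectors already found
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm <= 1e-10 * trace:
                return vecs  # no variance left in the remaining directions
            v = [a / norm for a in w]
        vecs.append(v)
    return vecs


def remove_ancestry_outliers(genotypes, k=10, sigma=6.0, max_iter=5):
    """Return indices of individuals kept after iterative outlier removal."""
    keep = list(range(len(genotypes)))
    for _ in range(max_iter):
        vecs = top_eigenvectors(standardize([genotypes[i] for i in keep]), k)
        flagged = set()
        for v in vecs:
            mean = sum(v) / len(v)
            sd = math.sqrt(sum((a - mean) ** 2 for a in v) / len(v))
            flagged.update(i for i, a in enumerate(v) if sd > 0 and abs(a - mean) > sigma * sd)
        if not flagged:
            break
        keep = [ind for pos, ind in enumerate(keep) if pos not in flagged]
    return keep
```

The eigenvectors retained after outlier removal are what enter the association model as covariates.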
The seven data sets were genotyped on different platforms. To maximize genome-wide coverage, we used the imputation algorithm implemented in MACH to obtain genotype dosages for 2.5 million SNPs across the genome in all data sets.9
Imputation based on linkage disequilibrium (LD) patterns observed in a representative European population sample in HapMap is a widely used approach to increase the power of GWAS and facilitate in silico
After imputation, we excluded, within each stratum, all SNPs with an imputation quality score less than or equal to 0.10 or a minor allele frequency (MAF) < 0.01. For each stratum, we tested the imputed dosages for association with case-control status using logistic regression, including the first ten eigenvectors as covariates to correct for population stratification. For each SNP, the dosage is the imputed number of copies of the coded allele carried by an individual. It ranges from 0 to 2 on a continuous scale and thus incorporates information about imputation uncertainty. Under a per-allele model, we calculated the odds ratio (OR), its standard error (SE), and the p-value. To evaluate the robustness of the observed distribution of the test statistic, we inspected the quantile-quantile (Q-Q) plot and calculated the genomic inflation factor (λGC).
To correct for any residual, unexplained inflation of the test statistic, we multiplied the SEs by the square root of λGC.
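The inflation check and the SE correction take only a few lines. The sketch below assumes per-SNP log-ORs and SEs from one stratum. It computes λGC as the median 1-df Wald chi-square divided by the median of the χ² distribution with 1 df (about 0.455). It then rescales the SEs by √λGC, which divides every chi-square statistic by λGC. The function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

# Median of a chi-square distribution with 1 df: (z_0.75)^2, about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(log_ors, ses):
    """lambda_GC from 1-df Wald statistics (log-OR / SE)^2."""
    chi2 = [(b / s) ** 2 for b, s in zip(log_ors, ses)]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def inflate_ses(ses, lambda_gc):
    """Multiply each SE by sqrt(lambda_GC), which divides each chi-square by lambda_GC."""
    factor = math.sqrt(lambda_gc)
    return [s * factor for s in ses]
```

After the correction, recomputing λGC on the same estimates gives 1 by construction.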
Finally, we repeated the same analyses with adjustment for sex and report these sensitivity analyses in the Supplementary material.
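The following is a from-scratch sketch (not PLINK's code) of what the per-stratum test computes: a per-allele logistic regression fitted by Newton-Raphson. The imputed dosage is the predictor of interest. Covariates such as the ten eigenvectors, plus sex in the sensitivity analysis, are passed per subject. The function returns the OR, the SE of the log-OR, and a two-sided Wald p-value. The function names are hypothetical; real analyses would rely on PLINK or an established statistics library.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse of a small, well-conditioned square matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def logistic_dosage_test(status, dosage, covariates, max_iter=50, tol=1e-10):
    """Per-allele logistic regression of case status (1/0) on dosage (0..2).

    covariates: one list per subject (e.g. top eigenvectors, sex); may be empty lists.
    Returns (OR, SE of log-OR, two-sided Wald p-value) for the dosage term.
    """
    X = [[1.0, float(d)] + [float(c) for c in cov] for d, cov in zip(dosage, covariates)]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        score = [0.0] * k                     # gradient X'(y - mu)
        for row, y in zip(X, status):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            w = mu * (1.0 - mu)
            for i in range(k):
                score[i] += (y - mu) * row[i]
                for j in range(k):
                    info[i][j] += w * row[i] * row[j]
        vcov = invert(info)
        step = [sum(vcov[i][j] * score[j] for j in range(k)) for i in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    se = math.sqrt(vcov[1][1])
    p = math.erfc(abs(beta[1] / se) / math.sqrt(2.0))
    return math.exp(beta[1]), se, p
```

With a 0/1 dosage and no covariates, this reproduces the familiar 2×2-table results. The OR is (a·d)/(b·c) and the SE of the log-OR is √(1/a + 1/b + 1/c + 1/d), which makes a convenient sanity check.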
After ensuring consistent strand orientation of the alleles across all strata, we meta-analyzed the ORs with their corrected SEs using inverse-variance weighting under a fixed-effects model. We calculated λGC for the genome-wide meta-analysis results to evaluate their robustness. In a secondary sensitivity analysis, we used a random-effects model for the meta-analysis, which allows for between-study heterogeneity. We used Cochran’s Q to test for statistical heterogeneity and I², with its 95% confidence interval, to quantify the inconsistency of effects across strata.12, 13
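The sketch below shows the meta-analytic step. It assumes effects are combined on the log-OR scale, the usual convention for inverse-variance weighting of ORs. It covers fixed-effects pooling, Cochran’s Q with its χ² p-value, I², and a random-effects estimate. The text does not name a between-study variance estimator, so DerSimonian-Laird is an assumption here. The confidence interval for I² is omitted, and the names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square with integer df (closed-form recurrence)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        sf, k = math.exp(-x / 2.0), 2
    else:
        sf, k = math.erfc(math.sqrt(x / 2.0)), 1
    while k < df:  # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        sf += math.exp((k / 2.0) * math.log(x / 2.0) - x / 2.0 - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return sf


def normal_p(beta, se):
    """Two-sided p-value for a Wald z statistic."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


def meta_analyze(log_ors, ses):
    """Inverse-variance fixed- and random-effects meta-analysis of per-stratum log-ORs."""
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)
    beta_fe = sum(wi * b for wi, b in zip(w, log_ors)) / sw
    se_fe = math.sqrt(1.0 / sw)
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird estimate of the between-study variance tau^2 (assumption)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    beta_re = sum(wi * b for wi, b in zip(w_re, log_ors)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {
        "or_fixed": math.exp(beta_fe), "se_fixed": se_fe, "p_fixed": normal_p(beta_fe, se_fe),
        "q": q, "p_q": chi2_sf(q, df) if df > 0 else 1.0, "i2": i2, "tau2": tau2,
        "or_random": math.exp(beta_re), "se_random": se_re, "p_random": normal_p(beta_re, se_re),
    }
```

When the strata agree (Q ≤ df), τ² is 0 and the random-effects estimate coincides with the fixed-effects one. Otherwise, the random-effects SE is wider.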
We performed LD pruning (r² > 0.5) among correlated SNPs to identify the most statistically significant SNP in each region of strong LD. For non-MHC loci, we performed conditional analysis using forward stepwise logistic regression on the most statistically significant SNPs (P < 10⁻⁶) within 2 Mb of the index SNP (the SNP with the lowest p-value) at each locus.
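The pruning step can be read as greedy clumping. Take the most significant remaining SNP as an index, discard SNPs in strong LD with it, and repeat. The sketch below assumes pairwise r² values are already available. The second helper collects the SNPs that would be offered to a forward stepwise conditional model at a non-MHC locus; the stepwise refitting itself is not shown. Reading "within 2 Mb" as ±2 Mb around the index SNP is our assumption, and all names are hypothetical.

```python
def ld_prune(pvalues, r2, threshold=0.5):
    """Greedy pruning: keep the most significant SNP, drop SNPs with r^2 > threshold to it, repeat.

    pvalues: {snp: p}; r2: {(snp_a, snp_b): r^2} for correlated pairs (either order);
    pairs missing from r2 are treated as uncorrelated (assumption).
    """
    def pair_r2(a, b):
        return r2.get((a, b), r2.get((b, a), 0.0))

    remaining = sorted(pvalues, key=pvalues.get)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [s for s in remaining if pair_r2(best, s) <= threshold]
    return kept


def conditioning_candidates(index, snps, window=2_000_000, p_max=1e-6):
    """SNPs (other than the index SNP) with P < p_max within `window` bp of the index SNP.

    index and each entry of snps: (name, chromosome, position, p).
    """
    name0, chrom0, pos0, _ = index
    return [name for name, chrom, pos, p in snps
            if name != name0 and chrom == chrom0 and abs(pos - pos0) <= window and p < p_max]
```

Because pruning is greedy, a SNP correlated only with an already-discarded SNP survives as a separate signal.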
We pre-defined genome-wide significance for our meta-analysis as P < 5×10⁻⁸. At this type I error rate, and under a fixed-effects model, we have more than 80% power to detect an OR of 1.15 for a risk allele with a minor allele frequency of 0.4. Cochran’s Q test was considered statistically significant at P < 0.10. For the analyses, we used PLINK (v1.07).14
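A power statement of this kind can be approximated with a normal approximation to the per-allele test. This section does not give the total numbers of cases and controls, so the sketch takes them as inputs rather than reproducing the 80% figure. It assumes Hardy-Weinberg equilibrium and treats `maf` as the risk-allele frequency in controls. It also ignores imputation uncertainty and the λGC correction, both of which reduce effective power. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases, n_controls, maf, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided per-allele test (normal approximation).

    Assumptions: Hardy-Weinberg equilibrium, `maf` is the risk-allele frequency
    in controls, and the log-OR is estimated from a 2 x 2 allele-count table.
    """
    p0 = maf
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)  # risk-allele frequency in cases
    var = (1.0 / (2 * n_cases * p1) + 1.0 / (2 * n_cases * (1 - p1))
           + 1.0 / (2 * n_controls * p0) + 1.0 / (2 * n_controls * (1 - p0)))
    shift = abs(math.log(odds_ratio)) / math.sqrt(var)
    std = NormalDist()
    z = std.inv_cdf(1.0 - alpha / 2.0)
    return std.cdf(shift - z) + std.cdf(-shift - z)
```

For example, `allelic_power(10_000, 10_000, 0.4, 1.15)` evaluates power for a hypothetical 10,000 cases and 10,000 controls. At OR = 1 the function returns α, as a power function should.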
To leverage the rapidly growing list of susceptibility loci associated with inflammatory diseases, we tested all known SNP associations with Crohn’s disease (CD), ulcerative colitis (UC), celiac disease (CE), type 1 diabetes (T1D), rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), and psoriasis (PS) for a role in MS. To identify these bona fide associations, we searched the online NHGRI catalog (www.genome.gov/26525384) and PubMed for GWAS, meta-analyses of GWAS, and follow-up studies that reported a non-MHC SNP associated with any of these diseases at genome-wide significance (P < 5×10⁻⁸). For each of these SNPs, we tested for replication of an effect in MS at three p-value thresholds (0.05, 10⁻³, and 10⁻⁴), allowing for heterogeneity in the direction of effect. For comparison, we also list the genome-wide significant non-MHC SNPs associated with type 2 diabetes (T2D),15 lipid traits (LI),17 and myocardial infarction (MI)18–20 as negative controls, because we do not expect these traits to have an etiologic relationship with MS.
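The replication check amounts to counting catalog SNPs whose two-sided MS p-value falls below each threshold. Using two-sided p-values means an effect in either direction counts. The sketch below also reports the count expected by chance (n × threshold) and a binomial upper-tail probability. The binomial comparison is our illustrative addition, not necessarily the comparison reported in the study, and it treats the SNPs as independent. The names are hypothetical.

```python
import math


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def replication_summary(ms_pvalues, thresholds=(0.05, 1e-3, 1e-4)):
    """For each threshold: (threshold, observed hits, expected hits under the null, binomial P).

    ms_pvalues: two-sided MS meta-analysis p-values for the catalog SNPs, so an
    effect in either direction counts as a hit. Treating SNPs as independent is an assumption.
    """
    n = len(ms_pvalues)
    summary = []
    for t in thresholds:
        hits = sum(1 for p in ms_pvalues if p < t)
        summary.append((t, hits, n * t, binom_sf(hits, n, t)))
    return summary
```

The same summary applied to the T2D, lipid, and MI SNPs gives the negative-control comparison.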
For all SNPs that reached P < 10⁻⁶, we also performed meta-analyses under recessive and dominant models. We used the imputed posterior probabilities of the three genotypes (AA, AB, BB) to calculate the corresponding dominant and recessive dosages for each individual at each SNP. With these dosages, we calculated the per-stratum ORs and corrected SEs and meta-analyzed them to obtain the overall ORs and the corresponding p-values.
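The three dosage types follow directly from the genotype posteriors. The sketch below assumes B is the coded allele. It assumes all three probabilities are available; if an imputation output stores only two, the third is 1 minus their sum. The function name is hypothetical.

```python
def genotype_dosages(p_aa, p_ab, p_bb):
    """Additive, dominant and recessive dosages for one individual at one SNP.

    p_aa, p_ab, p_bb: imputation posterior probabilities; B is the coded allele (assumption).
    """
    if abs(p_aa + p_ab + p_bb - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    additive = p_ab + 2.0 * p_bb   # expected number of B alleles (0..2)
    dominant = p_ab + p_bb         # P(at least one B allele) (0..1)
    recessive = p_bb               # P(two B alleles) (0..1)
    return additive, dominant, recessive
```

The dominant or recessive dosage then takes the place of the additive dosage in the per-stratum logistic regression. The resulting ORs are corrected and meta-analyzed as before.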
For the newly identified susceptibility loci, we tested the hypothesis that the identified SNPs affect the expression levels of nearby genes (within 1 Mb upstream or downstream of the SNP). We collected RNA expression data with the Affymetrix Human Genome U133 Plus 2.0 array from peripheral blood mononuclear cells (PBMCs) of 228 subjects with relapsing-remitting (RR) MS or CIS. These data were collected between July 2002 and October 2007 as part of the Comprehensive Longitudinal Investigation of MS at the Brigham and Women’s Hospital.21,22
We regressed gene expression on imputed SNP dosage, adjusting for treatment. Probes that passed our quality control criteria (n = 20,517) were used in subsequent analyses. In an exploratory analysis, we performed an eQTL analysis of all probes for each newly identified locus. We then examined the tail of the distribution of results, i.e., probes that reached nominal significance (p < 0.05), using the Ingenuity Pathway Analysis (IPA) software. IPA maps probe IDs to its knowledge base and identifies the canonical pathways and networks most significantly overrepresented in a given gene list, compared with all genes on the Human Genome U133 Plus 2.0 array. The canonical pathway analysis tool identified the pathways from the IPA library of canonical pathways that were most significant for the dataset, based on dataset genes associated with a canonical pathway in the Ingenuity Pathways Knowledge Base. Similarly, the software used the input gene expression data to generate networks. Molecules of interest that interact with other molecules in the Ingenuity Knowledge Base were identified as Network Eligible Molecules, which served as “seeds” for generating networks.
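The cis-eQTL step can be sketched as follows. First select probes within ±1 Mb of the SNP. Then regress expression on dosage with treatment as a covariate. The text does not say how treatment was coded, so the sketch assumes a single categorical variable and handles it by centering within treatment groups. By the Frisch-Waugh-Lovell theorem, this gives the same slope and residuals as a model with one intercept per treatment group. The names are hypothetical. The p-value uses a normal approximation to the t distribution, which is reasonable for n = 228 but anticonservative in small samples. The IPA pathway step is proprietary and is not reproduced.

```python
import math
from collections import defaultdict


def cis_probes(snp_chrom, snp_pos, probes, window=1_000_000):
    """Probe IDs whose annotated gene span overlaps [snp_pos - window, snp_pos + window].

    probes: list of (probe_id, chromosome, gene_start, gene_end); using the gene span
    rather than the probe position is an assumption.
    """
    lo, hi = snp_pos - window, snp_pos + window
    return [pid for pid, chrom, start, end in probes
            if chrom == snp_chrom and start <= hi and end >= lo]


def center_within_groups(values, groups):
    """Subtract each group's mean, i.e. remove a categorical covariate's effect."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def eqtl_test(expression, dosage, treatment):
    """OLS slope of expression on dosage, adjusting for treatment group.

    Returns (slope, SE, two-sided p); p uses a normal approximation to the t distribution.
    """
    y = center_within_groups(expression, treatment)
    x = center_within_groups(dosage, treatment)
    sxx = sum(a * a for a in x)
    slope = sum(a * b for a, b in zip(x, y)) / sxx
    resid = [b - slope * a for a, b in zip(x, y)]
    df = len(y) - len(set(treatment)) - 1  # intercept per group + slope
    se = math.sqrt(sum(r * r for r in resid) / df / sxx)
    p = math.erfc(abs(slope / se) / math.sqrt(2.0))
    return slope, se, p
```

Running `eqtl_test` for every probe returned by `cis_probes` yields the per-locus cis results. Running it over all probes yields the exploratory genome-wide list.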