The cause of sporadic amyotrophic lateral sclerosis (ALS) is largely unknown, but genetic factors are thought to play a significant role in determining susceptibility to motor neuron degeneration. To identify genetic variants altering risk of ALS, we undertook a two-stage genome-wide association study (GWAS): we followed our initial GWAS of 545 066 SNPs in 553 individuals with ALS and 2338 controls by testing the 7600 most associated SNPs from the first stage in three independent cohorts consisting of 2160 cases and 3008 controls. None of the SNPs selected for replication exceeded the Bonferroni threshold for significance. The two most significantly associated SNPs, rs2708909 and rs2708851 [odds ratio (OR) = 1.17 and 1.18, and P-values = 6.98 × 10−7 and 1.16 × 10−6], were located on chromosome 7p12.3 within a 175 kb linkage disequilibrium block containing the SUNC1, HUS1 and C7orf57 genes. These associations did not achieve genome-wide significance in the original cohort and failed to replicate in an additional independent cohort of 989 US cases and 327 controls (OR = 1.18 and 1.19, P-values = 0.08 and 0.06, respectively). Thus, we chose to cautiously interpret our data as hypothesis-generating requiring additional confirmation, especially as all previously reported loci for ALS have failed to replicate successfully. Indeed, the three loci (FGGY, ITPR2 and DPP6) identified in previous GWAS of sporadic ALS were not significantly associated with disease in our study. Our findings suggest that ALS is more genetically and clinically heterogeneous than previously recognized. Genotype data from our study have been made available online to facilitate such future endeavors.
Amyotrophic lateral sclerosis (ALS) is a rare and devastating neurodegenerative disease that predominantly affects motor neurons leading to progressive paralysis, and ultimately death from respiratory failure within 3–5 years of symptom onset. Approximately 5% of ALS is familial in nature, whereas the remaining 95% occurs sporadically throughout the community (1). Although the genetic causes of many monogenic familial forms of ALS have been described (2), the etiology of sporadic ALS is largely unknown. Familial aggregation studies, twin studies and epidemiological observations have suggested a substantial genetic contribution to disease risk (3,4). Recently, genome-wide association studies (GWAS) have putatively identified variants with moderate effects on the risk of developing ALS in the 1p32.1 region (FGGY) (5), in the 12p11 region (ITPR2) (6) and in the 7q36.2 region (DPP6) (7,8). However, these loci require replication in independent cohorts to confirm disease association, and, at most, account for only a fraction of the elevated risk of developing ALS, suggesting that additional genetic factors exist.
We conducted a two-stage GWAS to search for common variants with moderate risk (9,10). For the first stage, we used 555 352 SNPs that extract information on 91% of common autosomal SNPs identified in the European populations based on the HapMap data [CEU, r2 > 0.8, minor allele frequency (MAF) > 5%] (10,11). These SNPs were genotyped in two independent cohorts of European origin consisting of 553 ALS cases and 2338 controls. For the second stage, we analyzed the 7600 SNPs that were most associated with the altered risk of disease in the initial genome-wide scan in an additional 2160 cases and 3008 controls. The large number of SNPs and samples genotyped in the second stage provided sufficient power to follow up on regions with moderate association in the initial genome-wide scan (threshold P-value for follow-up study <0.005).
We conducted the initial genome-wide scan in a case-control cohort of 553 ALS cases and 2338 neurologically normal controls of European ancestry. In the second stage, we genotyped 7600 of the most associated SNPs from the first stage in three additional replication cohorts totaling 2160 ALS cases, and compared this with data for the same 7600 SNPs in three control cohorts totaling 3008 samples. After quality control procedures, 6758 SNPs were available for analysis in a final combined stage 1 and stage 2 cohort of 2289 cases and 4532 controls. These SNPs covered 3152 distinct chromosomal regions defined by a maximal distance between two SNPs of <100 kb; 1745 regions contained only one SNP and 40 regions contained 10 or more SNPs. Of these regions, 94 had at least one SNP with an observed P-value <10−3 (Fig. 1).
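The region definition used above can be sketched in a few lines. This is a minimal illustration rather than the pipeline used in the study: the function name `group_into_regions`, the tuple layout and the toy coordinates are assumptions, and the phrase "maximal distance between two SNPs" is read here as the gap between adjacent SNPs on the same chromosome.

```python
from itertools import groupby

def group_into_regions(snps, max_gap=100_000):
    """Group (chromosome, position) pairs into regions.

    A new region starts whenever the chromosome changes or the gap to the
    previous SNP is max_gap base pairs or more (assumed reading of the text).
    """
    regions = []
    ordered = sorted(snps)
    for chrom, chrom_snps in groupby(ordered, key=lambda s: s[0]):
        current = []
        for snp in chrom_snps:
            if current and snp[1] - current[-1][1] >= max_gap:
                regions.append(current)
                current = []
            current.append(snp)
        regions.append(current)
    return regions

# Hypothetical coordinates, for illustration only.
toy = [("7", 1_000), ("7", 50_000), ("7", 400_000), ("12", 5_000)]
regions = group_into_regions(toy)
singletons = sum(1 for r in regions if len(r) == 1)
```

Counting regions by size in this way would give the kind of summary reported in the text (regions with a single SNP, regions with 10 or more SNPs).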
None of the SNPs tested in this study clearly achieved genome-wide significance after correction for multiple testing (see Supplementary Material, Table S3 for association results of all 6758 tested SNPs). The SNPs with the lowest P-values identified by our two-stage GWAS were located on chromosome 7p12.3 (Table 1), a region which has not been previously linked to the pathogenesis of ALS. The SNPs were located within a 175 kb linkage disequilibrium (LD) block containing three genes, SUNC1, HUS1 and C7orf57. The strongest signal was observed for rs2708909 located in the third intron of the gene SUNC1 (P-value = 6.98 × 10−7 in combined analysis, NM_152782.3), which encodes the ‘SAD1 and UNC84 domain containing 1’ protein. The second SNP, rs2708851, was in near-complete LD with rs2708909 (r2 = 0.97; Fig. 2), and was located 22 kb upstream of SUNC1 within intron 4 of C7orf57 (NM_001100159.1). These SNPs did not exceed the threshold for genome-wide significance in the overall cohort, were only marginally associated with ALS risk when analyzed in the individual North American and Italian cohorts (P-values for rs2708909 = 5.40 × 10−5 and 0.0006, respectively), and were not associated in the German dataset (P-value for rs2708909 = 0.503), probably reflecting the smaller size of this cohort.
To further test the association with increased risk of disease, we genotyped rs2708851 and rs2708909 in a dataset of 989 US cases with ALS or other motor neuron diseases (MNDs) and 327 neurologically normal US controls, all of whom were of non-Hispanic Caucasian ethnicity and had previously served in the US military (12,13). This sample set represented an independent sample, as none of these samples were included in the initial genome-wide stage or in the replication stage. Rs2708909 and rs2708851 failed to reach significance [P-value = 0.08 for rs2708909 and 0.06 for rs2708851 based on a logistic regression model correcting for age at onset and gender, odds ratio (OR) = 1.176 and 1.189, respectively], though the sample size was underpowered to detect moderate effect alleles (power to detect OR of 1.17 for a MAF of 0.45 = 41.1% at P-value of 0.05). The results were very similar when only patients diagnosed with definite or probable ALS were analyzed as cases (P-value for rs2708909 = 0.09; P-value for rs2708851 = 0.06). Furthermore, no evidence for association with the previously implicated SNPs in ITPR2 and DPP6 was found in this dataset (rs109260404: P-value = 0.62; rs2306677: P-value = 0.77) (14).
Rs2708909 and rs2708851 lie within a 175 kb region of LD (multiallelic D′>0.8) on chromosome 7p12.3. Using our stage 1 datasets, we found that the HapMap CEU, the US and the Italian populations share an almost identical haplotype structure across this region (Supplementary Material, Fig. S2), and determined that seven SNPs (rs6955251, rs2686821, rs2686831, rs2708909, rs2708851, rs2307252 and rs2708912) account for 85% of the variation across the 175 kb region at an r2 > 0.5. The first five of these markers had been genotyped as part of our stage 1 and stage 2 datasets. To investigate whether other SNPs in the same region were more significantly associated with altered risk of developing ALS, we analyzed genotype data for the two additional SNPs rs2307252 and rs2708912 for all stage 1 cases and controls (based on previous whole genome data, n = 2521), stage 2 controls (based on previous whole genome data, n = 2548) and 216 stage 2 US cases (based on additional sequencing data). Neither rs2307252 nor rs2708912 achieved genome-wide significance (P-values = 0.47 and 0.16) based on this cohort of 753 cases and 4532 controls. Next, we applied imputation to our stage 1 data using MACH version 1.0, but none of the untyped SNPs in the region of 7p12.3 provided stronger evidence of association compared with rs2708909 and rs2708851 (Fig. 2).
Our two-stage GWAS identified several additional loci with P-values less than 10−3 representing hypotheses that may merit additional follow-up studies (Supplementary Material, Table S4) (9,10). The three loci (FGGY, ITPR2 and DPP6) identified in previous GWAS of sporadic ALS (5–8) did not alter risk of developing disease in either the combined case-control cohort or in the three individual populations examined in our study (Table 2).
Raw sample-level genotype data from the initial GWAS study (North American ALS cases, North American controls, Italian ALS cases and Italian controls from the Piemonte/Turin region) are available for download through the dbGAP portal (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000006.v1.p1). Individual genotype data for the CGEMS dataset are available for registered users through the CGEMS portal (http://cgems.cancer.gov/data.), whereas individual genotype data for the KORA cohort may be requested (http://epi.gsf.de/kora-gen/index_e.php).
Here we present the results of our two-stage GWAS involving 2713 cases and 5346 controls. This analysis was corrected based on the 531 661 autosomal SNPs genotyped in the initial whole genome scan, rather than the smaller number of SNPs followed up in the replication stage, as we wanted to be as conservative as possible. Our study represents the largest GWAS published to date in ALS, and the first to be sufficiently powered to reliably detect moderately associated SNPs in this relatively rare, fatal neurodegenerative disease.
Our study did not identify any SNPs that clearly exceeded the standard threshold for genome-wide significance (i.e. <10−7), and there is little or no overlap with the results of previously published studies (5–8). This contrasts with GWAS in other neurological diseases such as multiple sclerosis, where the most associated SNP in the HLA-DRA locus had a P-value < 10−80 (15). One possible explanation for this lack of success may be that ALS is a more genetically and clinically heterogeneous disease than previously appreciated, which would significantly limit the power of GWAS. The identification of multiple different familial ALS genes, each involving disparate biological pathways, supports this notion (16). Alternatively, causative genetic factors may increase the risk of motor neuron degeneration by only a small amount (i.e. OR < 1.2) meaning that even larger GWAS involving 5000–10 000 will be required to reliably identify loci (17). Finally, we compensated for the relatively small number of cases in the first stage of our study by selecting several thousand SNPs for detailed follow-up. Although this approach is likely to be adequate for identifying alleles of moderate effects (i.e. OR of > 1.4), mild effect alleles could easily have failed to reach the threshold for inclusion in the replication stage.
Our study also failed to replicate the three loci (FGGY, ITPR2 and DPP6) that had been previously published as being associated with increased risk of sporadic ALS (5–8). This finding agrees with data from the National Registry of Veterans with ALS, which also failed to replicate these loci in a cohort of 989 cases and 327 controls (14). There are several possible explanations for this finding. First, the lack of replication of these loci in the current study may be explained by the small number of SNPs selected from the initial genome-wide scans of the Dutch and TGEN studies for follow-up to confirm disease association. In the Dutch study of 461 cases and 450 controls, the 200 most associated SNPs were brought forward to the replication stages (6), whereas the TGEN study used a DNA pooling methodology involving 386 North American sporadic ALS cases and 542 controls to select 192 SNPs for individual-level genotyping (5). The several hundred thousand tests performed as part of any GWAS make it more likely that the most associated SNPs in the initial genome-wide scan represent false positive associations arising by chance (‘winner's curse’). Indeed, previous two-stage GWAS studies have repeatedly shown that truly causative SNPs are often not ranked in the top 1000 SNPs in the initial genome-wide scan (10), which led us to select a large number of SNPs for replication in our stage 2 analysis. Another possible explanation for the lack of replication of the FGGY, ITPR2 and DPP6 loci is that the initial Dutch and TGEN studies identified markers that are not in strong LD with the causal variant, leading to a false refutation in our study that was based on different populations (9).
The chromosome 7 risk variants putatively identified as hypotheses by our study were not associated with disease when analyzed in the individual German population included in the study. Although population-to-population variation in causative genes has been postulated for ALS (18,19), our findings are more likely to reflect the smaller number of the samples from the individual populations included in the study, and the consequent loss in power to detect moderate effect loci: the smallest German cohort of 549 cases and 484 controls had only 16.4% power to detect the SUNC1 locus (assuming an OR of 1.18 and a MAF of 0.45), whereas the larger North American dataset of 3727 samples had 55.1% power under the same parameters. Indeed, the putative association of rs2708909 and rs2708851 with ALS is only apparent in the combined analysis of 2289 cases and 4532 controls (power to detect SUNC1 locus = 94.8%), emphasizing the necessity of using several thousand samples to detect variants that only moderately increase risk of developing sporadic ALS (20).
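The dependence of power on sample size discussed above can be illustrated with a rough calculation. The sketch below uses a normal approximation for a two-sided allelic test; it is not the method used in the study (which reports power under a log-additive model), so its output will only approximate the reported percentages. The function name `approx_power` and the choice to treat the MAF as the control allele frequency are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_cases, n_controls, odds_ratio, maf, alpha):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    norm = NormalDist()
    p_ctrl = maf  # assumption: MAF refers to the control allele frequency
    odds = odds_ratio * p_ctrl / (1 - p_ctrl)
    p_case = odds / (1 + odds)
    # Each individual contributes two alleles.
    se = sqrt(p_case * (1 - p_case) / (2 * n_cases)
              + p_ctrl * (1 - p_ctrl) / (2 * n_controls))
    z_effect = (p_case - p_ctrl) / se
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)

# Sample sizes, OR, MAF and alpha taken from the text.
german = approx_power(549, 484, 1.18, 0.45, 0.005)
combined = approx_power(2289, 4532, 1.18, 0.45, 0.005)
```

Under these assumptions the German cohort alone has low power while the combined cohort has high power, which is the qualitative pattern the paragraph describes.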
Even if we assume that the chromosome 7 variants are truly associated with ALS, we are left with the problem of determining which gene within this LD block is responsible for increased risk of disease. The location of the variant with the most significant P-value within the intron of SUNC1 would suggest that this gene is the most likely candidate. Indeed, SUNC1 encodes a 40.5 kDa nuclear envelope protein ‘Sad1 and UNC84 domain containing 1’ (21), and mutations in nuclear envelope proteins underlie a variety of neuromuscular diseases including Charcot-Marie-Tooth disease, type 2B1 (22) and spastin-associated hereditary spastic paraplegia (23). However, these biological hypotheses should be interpreted cautiously. Although the gene lying closest to an associated SNP is generally considered to be the prime suspect in disease pathogenesis, a number of alternative pathogenic mechanisms must be considered. Our own studies have shown that the associated SNP may ‘tag’ the true causative variant residing many kilobases distant in another gene; the associated SNP could affect expression of cis genes up to 100 kb distant, or could act in trans to alter gene expression on other chromosomes (24); alternatively, the SNP could alter the function or tissue-specific expression of a previously unidentified microRNA or genetic element. Furthermore, despite the large number of samples analyzed in our study, replication of the locus in independent cohorts remains a necessity (9). The two SNPs reported in the current study did not achieve significant association with disease in a separate cohort of 221 Irish ALS cases and 211 neurologically normal controls, though the small size of this cohort precludes firm conclusions being drawn from these data (the Irish data were not included in the current study as necessary covariates were not available from the investigators associated with the study) (7).
Public release of raw genotype data is helpful in this regard, as it reduces the expense of future whole genome association studies and allows researchers to have greater confidence in the results of their association studies by increasing sample size and power to accurately detect causative loci (25). Our initial public release of data established a powerful, unique resource for the ALS research community (25), and this data has been incorporated into all other ALS GWAS published to date (5–8). Coincident with publication, we have augmented this initiative with data from all 2713 ALS cases genotyped in the current study.
In summary, we present the results of our two-stage genome-wide association study in a large cohort of sporadic ALS patients. None of the studied loci clearly achieved genome-wide level of significance, and none of the previously published loci were significantly associated with disease in our study. Though the data supporting an association of the chromosome 7p12.3 variants are suggestive, we chose to interpret these results cautiously as loci previously reported to be associated with increased risk of developing ALS have uniformly failed to replicate (26). Thus, these variants should be considered hypothesis-generating and require additional replication to confirm or refute the association. The current lack of success of GWAS in sporadic ALS may indicate that the disease is more heterogeneous than previously recognized, and highlights the fact that even larger sample numbers will be required to definitively dissect the genetics underlying this fatal neurodegenerative disease.
We used HumanHap550 version one BeadChips (Illumina Inc., San Diego, CA, USA) to genotype 555 352 SNPs in (i) 276 North American ALS cases and 828 North American neurologically normal control samples obtained from the NINDS Neurogenetics repository at the Coriell Institute for Medical Research (NJ, USA) and (ii) 277 Italian ALS cases and 263 Italian control samples collected within the Turin/Piemonte area of Northern Italy. The initial part of the North American scan has been previously published, and the raw genotype data made publicly available (25). We also genotyped 561 466 SNPs in an additional cohort of 1247 control samples obtained from the InChianti study, a representative population-based cohort of older persons living in the Chianti geographic area (Tuscany, Central Italy) (27), using the HumanHap550 version three BeadChip (28). Analysis was confined to the 545 066 SNPs that are common to versions one and three HumanHap550 BeadChips. Demographics and clinical features of the case and control cohorts are shown in Supplementary Material, Table S1. All patients included in the initial and follow-up stages of the study had been diagnosed with probable, clinically probable, laboratory supported or definite sporadic ALS according to the El Escorial criteria (29). Only unrelated, non-Hispanic, white Caucasian individuals of European descent were included in the study.
Standard quality control procedures (e.g. exclusion of samples with low call rates, mismatch between gender according to phenotype data and gender defined by genotype, non-European ancestry, cryptic relatedness and incomplete phenotype data; and exclusion of SNPs with low call rates, Hardy-Weinberg equilibrium P-value < 0.001 and non-random missingness) were applied to the data (Supplementary Material, Fig. S1). After applying these filters, the cohorts included in the initial genome-wide scans consisted of (i) 271 North American sporadic ALS cases and 794 North American control individuals and (ii) 266 Italian sporadic ALS cases and 1190 Italian control samples. A total of 474 554 SNPs were available for association testing in the North American cohort, and 466 131 SNPs in the Italian cohort. Q-Q plots were prepared using R version 2.6.1 (2007, The R Foundation for Statistical Computing) based on genomically controlled association data (Fig. 3). Genomic inflation factor λ for the US cohort was 1.002 and for the Italian cohort was 1.147. The most significantly associated SNPs identified in the genome-wide scans of the North American and Italian cohorts are listed in Supplementary Material, Table S2.
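For readers unfamiliar with the genomic inflation factor λ, a minimal sketch of one common estimator follows. The text does not state which estimator was used, so the median-based definition here (median observed 1-df chi-square statistic divided by its expected median under the null) is an assumption, as are the function name `genomic_inflation` and the toy P-values.

```python
from statistics import NormalDist, median

def genomic_inflation(p_values):
    """Median-based genomic inflation factor for 1-df association tests."""
    norm = NormalDist()
    # Convert each two-sided P-value back to a 1-df chi-square statistic.
    chi2 = [norm.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    # Median of the 1-df chi-square distribution under the null (about 0.455).
    expected_median = norm.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median

# Hypothetical P-values evenly spread over (0, 1), as expected under the null.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_like)
```

A value near 1 indicates little inflation of test statistics; genomic control then typically divides each chi-square statistic by λ before P-values are recomputed.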
A total of 7600 SNPs were selected on the basis of single-SNP association tests in the initial genome-wide scans. Of these, 2533 were selected as they were the most significantly associated SNPs in the North American cohort based on the Cochran-Armitage test; 2533 were the most significant SNPs in the Italian cohort and 2534 SNPs were chosen based on their disease association in the combined Italian and North American GWAS cohorts. These SNPs were genotyped using a custom-designed iSelect Infinium assay (Illumina) in three cohorts of sporadic ALS patients of European background totaling 2160 cases. The case dataset was compared with data for the same SNPs obtained from 3008 neurologically normal control individuals. The cases consisted of 963 North American sporadic ALS patients, 631 Italian sporadic ALS patients and 566 German individuals diagnosed with sporadic ALS. Population control data was obtained from: the Cancer Genetic Markers of Susceptibility study (CGEMS; n = 2243 North American control samples) (30); European Network for Genetic-Epidemiological Studies (HYPERGENES, n = 275 Italian control samples) and the Cooperative Health Research in the Region of Augsburg study (KORA; n = 490 German control samples) (31). These studies were approved by the appropriate institutional review boards.
After applying quality control filters to the replication data, 6758 SNPs were suitable for analysis in the replication cohort consisting of 1752 cases and 2548 controls (Supplementary Material, Fig. S1). Association analysis was carried out in two ways: for each population separately, and for the stage 1 and stage 2 replication data combined (32). The combined stage 1 and stage 2 samples consisted of 2289 ALS cases and 4532 controls, which yielded 94.8% power to detect loci with an OR of 1.17 and a MAF of 0.45 under the log-additive model assuming a two-sided α of 0.005 (threshold P-value for selection of SNPs for follow-up). The PLINK toolset (33) was used to test for association using logistic regression, adjusted for age, gender and population. This approach retains power to detect recessive or overdominant alleles at the cost of a small decrease in power relative to the Cochran-Armitage trend test (34) for the detection of alleles with multiplicative effect (10). Bonferroni correction for multiple testing yielded a threshold P-value of 10−7 (α = 0.05/531 661 autosomal SNPs genotyped in stage 1) (35).
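The Bonferroni arithmetic above is simple enough to check directly. The sketch below divides α = 0.05 by the 531 661 stage 1 autosomal SNPs, which gives roughly 9.4 × 10−8 (rounded to 10−7 in the text), and compares the strongest combined-analysis P-value against it. The helper name `passes_bonferroni` is hypothetical.

```python
ALPHA = 0.05
N_TESTS = 531_661  # autosomal SNPs genotyped in stage 1 (from the text)

threshold = ALPHA / N_TESTS

def passes_bonferroni(p_value, threshold=threshold):
    """True if a P-value is below the Bonferroni-corrected threshold."""
    return p_value < threshold

top_hit = 6.98e-7  # combined-analysis P-value reported for rs2708909
significant = passes_bonferroni(top_hit)
```

Correcting for all stage 1 SNPs rather than only the 6758 followed up is the conservative choice described in the discussion.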
Detailed descriptions of sample collection methodology and the quality control filters used in this study are available in Supplementary Material, Methods.
Rs2708851 and rs2708909 were genotyped by TaqMan assays (ABI) in an additional dataset of 989 non-Hispanic Caucasian individuals diagnosed with ALS or other MNDs and 327 neurologically normal control samples. The case samples were collected by the National Registry of Veterans with ALS (12), and the control samples by the GENEVA study (13). Of these 989 patients, 663 (67%) were diagnosed with definite or probable ALS by El Escorial criteria, 79 (8%) had possible ALS, 158 (16%) had progressive muscular atrophy and the remaining 89 (9%) had primary lateral sclerosis or progressive bulbar palsy. These samples had not been genotyped in either the initial genome-wide stage or in the replication stage.
Haploview v4.1 was used for assessment of LD (36). Haplotype blocks were defined using the Haploview v4.1 solid spine of LD method (D′ > 0.8). SNPs selected to fine-map the haplotype block on chromosome 7p12.3 were genotyped by sequence analysis. PCR products were amplified using primers designed with Primer3-web v0.3.0 (http://frodo.wi.mit.edu) and FastStart PCR MasterMix polymerase (Roche Diagnostics Corp., IN, USA), sequenced using BigDye terminator v3.1 sequencing chemistry, and run on an ABI3730xl (Applied Biosystems, CA, USA) genetic analyzer as per manufacturer's instructions. The sequences were analyzed with Sequencher software, version 4.2 (Genecodes, VA, USA). MACH version 1.0 software (http://www.sph.umich.edu/csg/abecasis/MACH/download/) (37) was used to estimate haplotypes, and map crossover and error rates using 100 iterations of the Markov chain Monte Carlo algorithm in subsets of 250 random samples from the stage 1 US and Italian cohorts. These estimates were combined with haplotypes from the HapMap CEU samples to infer genotypes in the region of interest for both cohorts using maximum likelihood estimates of genotypes present in HapMap samples (www.hapmap.org, release 21), but not in the Illumina data. After genotype imputation, the maximum likelihood genotypes for the stage 1 US and Italian cohorts were merged. Analyses were rerun, excluding imputed SNPs with r2 values <0.30 between imputed and known genotypes, and posterior probability averages <0.80 for the most likely genotype imputed.
This work was supported by the Intramural Research Program of the National Institute on Aging (project Z01 AG000949-02), the National Institute of Neurological Disorders and Stroke, and the National Institute of Mental Health; the ALS Association; the Packard Center for ALS Research at Johns Hopkins; Istituto Superiore di Sanità (grant number 2005-10 to FL); PF A15 Regione Piemonte (to GR); and the Medical Research Council (to EF and DK). The MONICA/KORA Augsburg studies were financed by the Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany and supported by grants from the German Federal Ministry of Education and Research (BMBF). Part of this work was financed by the German National Genome Research Network (NGFN). The KORA research was supported within the Munich Center of Health Sciences (MC Health) as part of LMUinnovativ. We gratefully acknowledge support for the GENEVA study (‘Genes and Environmental Exposures in Veterans with ALS’) from the National Institutes of Health (grant number ES013244) and the ALS Association (grant number 1230). The National Registry of Veterans with ALS (grant number CSP #500A) and its DNA bank (grant number CSP #478) were supported by the Department of Veterans Affairs Cooperative Studies Program (CSP).
We would like to thank Katrina Gwinn, Larry Refolo and Roderick Corriveau of Coriell, The Italian Football League (FIGC), and Augustin Luna. DNA panels from the NINDS Human Genetics Resource Center DNA and Cell Line Repository (http://ccr.coriell.org/ninds) were used in this study, as well as clinical data. The submitters that contributed samples are acknowledged in detailed descriptions of each panel: NDPT002, NDPT006, NDPT008, NDPT011 to NDPT013, NDPT019 to NDPT030. We also thank the ALS Research Group and the patients who submitted their samples to the NINDS Repository and to other investigators.
Conflict of Interest statement. L.B. is Vice-President for Research, ALS Association and J.R. is Medical Director of the Packard Center for ALS Research. Other authors report no conflict of interest.