With a sample size of 1194 (575 ALS cases and 621 controls) and genotypes across more than 317,000 SNPs, we set out to investigate the role of copy number variation in ALS using a widely available CNV calling algorithm. We used two different approaches to collate and test association of the raw CNV calls with ALS and compared our results with those of previous studies.
Although none of the loci identified as being associated with ALS contained, or were close to, any genes previously reported as plausible ALS candidate genes, several potential new genes of interest were highlighted. The region on chromosome 16 encompasses several genes that are reasonable candidates for ALS susceptibility. These have functions in calcium signalling (CACNA1H), axonal transport (MAPK8IP3), nerve growth factors (IGFALS), angiogenesis (BAIAP3), and the ubiquitin proteasome system (UBE2I, SPSB3). The CPLX1 gene on chromosome 4 encodes a protein involved in synaptic vesicle exocytosis. The PPP1R13B gene on chromosome 14 encodes an apoptosis-stimulating protein. Perhaps the most interesting potential ALS candidate of the genes identified here is EEF1D, eukaryotic translation elongation factor 1 delta, on chromosome 8, which encodes a subunit of the elongation factor-1 complex, responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome. This process has been directly implicated in genetic studies of motor neuron diseases (GARS genes), and the RNA processing pathway in general is involved in amyotrophic lateral sclerosis, as evidenced by ALS-linked mutations identified in the SETX and FUS genes and the association of ALS with variants in the ANG gene.
Although we found few differences between the numbers and sizes of ALS-specific and control-specific regions overall, we identified five regions that were ALS-specific and achieved significance (before multiple testing correction). Many of the genes in these regions were also associated when the raw CNV calls were tested using a gene-based approach, including EEF1D. Although we found the copy number variation at a chromosome 11 region to be ALS-specific in our sample population, CNVs at this locus in healthy individuals have been reported previously and are recorded in the Database of Genomic Variants, suggesting that this region may not be ALS-specific.
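To make concrete what significance before multiple testing correction can mean for a region seen only in cases, the following is a minimal sketch. The text does not say which test was applied to ALS-specific regions, so a one-sided Fisher's exact test on carrier counts is assumed here. The group sizes (575 cases, 621 controls) are taken from the text; the carrier count of six and the function name `fisher_one_sided` are hypothetical.

```python
from math import comb


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test (assumed test, not confirmed by the text).

    Returns the probability of seeing at least `case_carriers` carriers among
    the cases if carriers were distributed at random between the two groups.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = comb(total, carriers)
    upper = min(carriers, n_cases)
    p = 0.0
    for k in range(case_carriers, upper + 1):
        # Hypergeometric probability of exactly k carriers among the cases.
        p += comb(n_cases, k) * comb(n_controls, carriers - k) / denom
    return p


# Hypothetical region carried by 6 cases and 0 controls.
p_region = fisher_one_sided(6, 575, 0, 621)
print(f"nominal p = {p_region:.4f}")
```

Under these assumed counts the nominal p-value falls below 0.05, but a value of this size would not be expected to survive correction across the many regions tested, which is why such regions are reported here with caution.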
There was a small amount of overlap between the results from this study and those of a previous study investigating the role of CNV in ALS risk, but the majority of the findings of the previous study were not replicated in our data. Cronin et al. found a significant difference between the median size of heterozygous deletions in cases and controls. In this study, the median size of heterozygous deletions was larger in cases than in controls, but this difference was not significant. When our data were filtered using an alternative set of filters that were closer to those implemented by Cronin et al. (see Appendix S1), heterozygous deletions were again larger in cases than in controls, but the difference did not reach significance (p<0.05). Significant (P<0.01) differences between cases and controls in the mean number of duplications per individual and in the median size of duplications were found using both our original filters and those similar to Cronin et al.
Our primary analysis was undertaken using a region-based approach, which aimed to reduce the impact of errors in boundary estimation. Four different reciprocal overlap thresholds were applied to the data to merge CNV calls into CNV regions. Of the 38 loci showing association with ALS when a >70% reciprocal overlap was used, all but 3 were also associated with ALS (nominal P<0.05) when a different reciprocal overlap threshold was used (i.e. any overlap, >50% overlap or 100% overlap). However, the level of significance achieved varied for most regions according to the overlap threshold used; if validation were to be undertaken for a limited number of regions, the choice of such regions would depend on the reciprocal overlap threshold used. Interestingly, many of the regions identified using the region-based approach were also represented amongst the hits from the gene-based analysis. The potentially interesting genes within the chromosome 16 region and CPLX1 were not significantly associated with ALS under the gene-based approach. However, EEF1D and PPP1R13B were significantly associated with ALS using both approaches.
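The effect of the reciprocal overlap threshold on how calls are grouped can be illustrated with a short sketch. This is not the merging procedure used in the study, whose details are not given in this section. It assumes a simple greedy rule in which a call joins the first existing region whose current span it overlaps reciprocally above the threshold, and the region is then extended to the union. The function names and the example coordinates are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Overlap length divided by the length of the longer (start, end) interval.

    A value above t means the overlap covers more than t of both intervals.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[1] - a[0], b[1] - b[0])


def merge_calls(calls, threshold, inclusive=False):
    """Greedily merge CNV calls into regions (assumed rule, see lead-in).

    Returns a list of (start, end, n_calls) tuples. Use inclusive=True with
    threshold=1.0 to require identical boundaries (100% overlap).
    """
    regions = []
    for start, end in sorted(calls):
        for region in regions:
            ro = reciprocal_overlap((start, end), (region[0], region[1]))
            if ro > threshold or (inclusive and ro >= threshold):
                # Extend the region to the union and count the call.
                region[0] = min(region[0], start)
                region[1] = max(region[1], end)
                region[2] += 1
                break
        else:
            regions.append([start, end, 1])
    return [tuple(r) for r in regions]


# Hypothetical calls on one chromosome, in base pairs.
calls = [(100, 200), (110, 210), (150, 400), (500, 600)]
for label, t, incl in [("any", 0.0, False), (">50%", 0.5, False),
                       (">70%", 0.7, False), ("100%", 1.0, True)]:
    print(label, merge_calls(calls, t, incl))
```

With these example calls, "any overlap" collapses the first three calls into one region, whereas the stricter thresholds split them. The number of carriers per region therefore changes with the threshold, and so does the significance of any association test applied to it.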
No evidence of any gene set being overrepresented amongst those affected by ALS-specific copy number variation was found by GRAIL analysis. GRAIL analysis identified five significant genes amongst those affected by control-specific copy number variation, including IGLL1, which is involved in B cell biology and has been associated with agammaglobulinemia. No significantly associated gene sets were identified using GSA, although several involving cell death, muscle organ development and central nervous system development were amongst those showing nominal significance (uncorrected p<0.05). A result that is significant in a CNV genome-wide association study but not in GSA might still be important if it perturbs a pathway. For example, a deletion upstream of the IRGM gene affects the expression of IRGM, but the gene itself is not believed to be copy number variable. In this case, GSA using CNV association data would not report the IRGM-related pathways.
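The limitation described above follows from how CNVs are usually assigned to genes. The sketch below assumes that a gene is counted as affected only when the CNV overlaps the gene's own span, which is an assumption about the annotation step rather than a description of the GSA software. The gene name, all coordinates and the optional `flank` parameter are hypothetical.

```python
def genes_hit(cnv, genes, flank=0):
    """Names of genes whose span, widened by `flank` bp each side, overlaps the CNV.

    `cnv` is a (start, end) tuple; `genes` maps a gene name to (start, end).
    """
    c_start, c_end = cnv
    return [name for name, (g_start, g_end) in genes.items()
            if c_start < g_end + flank and g_start - flank < c_end]


# Hypothetical gene and deletions on the same chromosome.
genes = {"GENE_A": (50_000, 60_000)}
upstream_deletion = (25_000, 45_000)   # lies entirely upstream of GENE_A
genic_deletion = (55_000, 70_000)      # overlaps GENE_A itself

print(genes_hit(upstream_deletion, genes))                # no gene assigned
print(genes_hit(genic_deletion, genes))                   # GENE_A assigned
print(genes_hit(upstream_deletion, genes, flank=10_000))  # GENE_A assigned
```

Under strict overlap the upstream deletion maps to no gene and so contributes nothing to any gene set. Adding a flanking window would capture it, at the cost of assigning some CNVs to genes they do not affect.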
Previous studies have found limited evidence for a role of CNV in sporadic ALS, hypothesising that multiple rare variants are more likely to contribute to risk than common variants. In this study, although nine copy number variable regions found in more than 5% of the study population showed association with ALS, seven of these were found to map close to the centromeres and may be artefactual. In addition, the regions on chromosomes 4, 8, 14 and 16 which are reported here as being associated with ALS and containing potential ALS candidates all lie within the telomeric chromosome band and so must also be treated with caution. The telomeric and centromeric regions may be more prone to false CNV calls than other regions of the genome, in part due to the lower density coverage of older genotyping arrays in these regions. Findings based on newer generation SNP arrays (which also include non-SNP copy number variation probes) have shown strong evidence of common CNVs in these regions, suggesting that there may be true copy number variants in these regions, although caution must be taken when studying these regions on older platforms. The gene-based approach also reported genes from within some of the region-based approach findings, including the regions on chromosomes 8 and 14 which contain potential ALS candidate genes. The chromosome 11 region at which ALS-specific copy number variation was observed was reported using both approaches (with p<0.01 for each gene using the gene-based approach and p = 0.025 using the region-based approach, although this region also lies within 1.5 Mb of the telomere and therefore must be treated with caution). Evidence of association in regions for which copy number variation has previously been characterised by McCarroll and colleagues suggests that a role of common copy number variation in ALS risk should not be ruled out. It should be noted that the HumanHap300 chip has low (or no) coverage of regions known to contain common copy number variants, and further studies involving larger sample sizes and using denser SNP arrays or SNP-CNV hybrid arrays should be undertaken to further investigate the role of CNV in ALS and provide replication of the findings presented here.
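The caution applied to regions near chromosome ends can be expressed as a simple flag. The sketch below uses the 1.5 Mb distance mentioned above as the margin. The chromosome length and region coordinates are hypothetical placeholders, and in practice the length would come from the genome build used for the array annotation.

```python
TELOMERE_MARGIN = 1_500_000  # 1.5 Mb, the distance mentioned in the text


def near_telomere(start, end, chrom_length, margin=TELOMERE_MARGIN):
    """True if any part of the region lies within `margin` bp of either chromosome end."""
    return start < margin or end > chrom_length - margin


# Hypothetical chromosome length and regions, in base pairs.
chrom_length = 100_000_000
regions = {
    "p-arm end": (500_000, 700_000),
    "interstitial": (40_000_000, 40_100_000),
    "q-arm end": (99_000_000, 99_200_000),
}
for name, (start, end) in regions.items():
    print(name, near_telomere(start, end, chrom_length))
```

A flag of this kind does not remove a region from the analysis; it marks associations that need replication on a platform with denser coverage of chromosome ends.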