We assessed the role of rare copy number variants (CNVs) in Alzheimer's disease (AD) using intensity data from 3260 AD cases and 1290 age-matched controls from the genome-wide association study (GWAS) conducted by the Genetic and Environmental Risk for Alzheimer's disease Consortium (GERAD). We did not observe a significant excess of rare CNVs in cases, although we did identify duplications overlapping APP and CR1 which may be pathogenic. We looked for an excess of CNVs in loci which have been highlighted in previous AD CNV studies, but did not replicate previous findings. Through pathway analyses, we observed suggestive evidence for biological overlap between single nucleotide polymorphisms and CNVs in AD susceptibility. We also identified that our sample of elderly controls harbours significantly fewer deletions >1 Mb than younger control sets in previous CNV studies on schizophrenia and bipolar disorder (P = 8.9 × 10⁻⁴ and 0.024, respectively), raising the possibility that healthy elderly individuals have a reduced rate of large deletions. Thus, in contrast to diseases such as schizophrenia, autism and attention deficit/hyperactivity disorder, CNVs do not appear to make a significant contribution to the development of AD.
Alzheimer's disease (AD) is the most common form of dementia with a prevalence of ~1% in western populations at the age of 65, rising to 25–35% in those over 85 (1). While AD is genetically complex, it is also highly heritable, with recent estimates of heritability ranging from 58 to 79% (1). Neuropathologically, the disease is characterized by extracellular senile plaques containing β-amyloid (Aβ), intracellular neurofibrillary tangles containing hyperphosphorylated tau protein and loss of synapses (2).
Mutations of the genes APP, PSEN1 and PSEN2 cause rare Mendelian forms of the disease, usually with early onset. Until recently, the only unequivocal susceptibility gene for the late-onset form of the disease was APOE (3). Although several candidate gene studies had previously shown suggestive evidence of association [e.g. Bertram et al. (4)], within the past 3 years genome-wide association studies (GWAS) of AD have identified nine genome-wide significant susceptibility loci [see Hollingworth et al. for a review (5)]. These are CLU, PICALM, CR1, BIN1, MS4A, ABCA7, CD33, EPHA1 and CD2AP (6–10). It has been estimated that common variants within GWAS conducted by the Genetic and Environmental Risk for Alzheimer's disease (GERAD) consortium (8) account for ~24% of the estimated heritability of AD (Lee et al., submitted for publication) and so other sources of genetic variation that contribute to the disease remain to be identified.
Structural variation, including copy number variants (CNVs), may account for some of the unexplained heritability. A number of rare CNV loci have been implicated in brain disorders (11) and several specific CNVs have been identified that increase the risk for neurodevelopmental disorders such as schizophrenia, autism and mental retardation (12–15). To date, there have been four published genome-wide case–control association studies to assess the contribution of CNVs in late-onset AD, three of which were conducted on Caucasian samples (16–18) and one on a sample of Caribbean Hispanic origin (19). None of these studies found a global excess of CNVs in AD cases; however, Heinzen et al. reported a rare duplication in the schizophrenia and epilepsy risk region at 15q13.3, affecting the CHRNA7 gene, with 2% of their cases and 0.3% of their controls having the duplication (P = 0.053) (16). Swaminathan et al. carried out a CNV analysis in participants of the Alzheimer's Disease Neuroimaging Initiative (ADNI) study (17), and found a significant excess of CNVs overlapping the genes CSMD1 and HNRNPCL1 in AD cases, but these findings were not significant after correction for multiple testing. They also identified CNVs overlapping the genes NRXN1, ERBB4, ATXN1, HLA-DPB1, RELN, CHRFAM7A, DOPEY2 and GSTT1 in AD cases but not in controls, although the excess in these loci was not significant. These findings were subsequently confirmed by the authors through CNV analysis of participants in the National Institute of Aging-LOAD/National Cell Repository for AD (NIA-LOAD/NCRAD) Family Study (18). This study also reported an excess of CNVs overlapping the gene IMMP2L in 1.6% of their AD cases with no CNVs identified in their controls, although this finding was not significant (uncorrected P = 0.059). Furthermore, the CNV study by Ghani et al. (19) on an AD data set of Caribbean Hispanic origin found nominal association with a duplication on chromosome 15q11.2 (chr15: 20.3 Mb–20.65 Mb).
Although the loci highlighted by these studies are not significant, they warrant further investigation in larger data sets. These studies are relatively small (n < 1200) in comparison with the more recent collaborative AD GWAS which have been conducted (6–10), and are unlikely to have the power required to identify a significant association of CNVs in AD.
We aimed to identify CNVs contributing to AD development by undertaking a powerful analysis of association using CNV data on 3260 AD cases and 1290 age-matched controls. These samples had been genotyped on Illumina 610-quad chip arrays (Illumina, Inc.) as part of a GWAS conducted by the GERAD consortium (8).
Unless otherwise stated, analyses were restricted to the 2690 CNVs in 3260 AD cases and 1290 controls that passed QC, had a frequency of <1% in each sample set, were >100 kb in length, were called with at least 20 probes and were validated using z-score analysis.
The rates of rare CNVs of different sizes and the corresponding P-values when AD cases were compared with controls are shown in Supplementary Material, Table S1. When examining the rates of all CNVs >100 kb, we observed a nominally significant excess of deletions in the controls rather than cases (case–control ratio = 0.90, P = 0.0332), but this excess does not remain significant (adjusted P = 0.30) after Bonferroni correction for multiple testing for different size ranges and types of CNVs (deletions/duplications). No statistically significant differences between cases and controls were observed in CNVs >500 kb. Very large deletions (>1 Mb) are likely to be the most pathogenic class of CNV (20–23) and an excess of deletions >1 Mb was observed in cases (case–control ratio = 4.19, P = 0.023). This excess is not significant when corrected for multiple testing (adjusted P = 0.21). However, we did observe that the rate of these rare larger deletions is very low in our elderly control population compared with younger control sets studied previously. For example, the International Schizophrenia Consortium (ISC) study (20) identified rare deletions >1 Mb at a rate of 1.2% in 3181 controls and a study by Grozeva et al. (24) identified such CNVs at a rate of 0.7% in 2806 population controls. We observed such CNVs at a rate of only 0.16% in our 1290 controls, which is a significantly lower rate than the ISC study (P = 8.9 × 10⁻⁴, χ² = 11.04, 1 df) and the study by Grozeva et al. (P = 0.024, χ² = 5.1, 1 df).
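The comparison of large-deletion rates between control sets can be illustrated with a Pearson χ² test on a 2 × 2 table. The following is a minimal sketch, not the authors' code: the carrier counts are back-calculated from the rounded percentages quoted above (an assumption), and the test is run without continuity correction (also an assumption).

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

def carriers(rate_percent, n):
    """Hypothetical helper: approximate carrier count from a rounded percentage."""
    return round(rate_percent / 100 * n)

own = carriers(0.16, 1290)       # elderly controls in this study
isc = carriers(1.2, 3181)        # ISC controls
grozeva = carriers(0.7, 2806)    # Grozeva et al. population controls

chi2_isc, p_isc = chi2_2x2(own, 1290 - own, isc, 3181 - isc)
chi2_groz, p_groz = chi2_2x2(own, 1290 - own, grozeva, 2806 - grozeva)
print(f"vs ISC:     chi2 = {chi2_isc:.2f}, P = {p_isc:.2g}")
print(f"vs Grozeva: chi2 = {chi2_groz:.2f}, P = {p_groz:.2g}")
```

Because the published percentages are rounded, the statistics obtained this way should land near, but not necessarily exactly on, the reported values.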
As the global burden analysis highlighted a significant excess of deletions >100 kb in controls compared with cases, we carried out regional analysis of the whole genome to identify any specific regions which harboured an excess of deletions in controls. We identified four regions which showed an excess of deletions in controls, but these regions did not remain significant after correction for multiple testing (corrected P > 0.4).
We also carried out regional analysis to identify any regions which may harbour an excess of CNVs in cases, but no regions showed significant association (uncorrected P > 0.05).
We sought to replicate the findings of the four previous AD CNV studies (16–19). As the CNVs in these studies were not filtered for frequency, we carried out this analysis on the 7718 CNVs present before filtering for 1% frequency, and all CNVs identified in these regions were validated using z-score analysis, as was done for the rare CNVs. As shown in Table 1, we did not replicate any of the findings of the previous studies. Out of the five loci in which we also identified CNVs, four loci had a greater rate of CNVs in our controls (ERBB4: case–control ratio = 0.38; CSMD1: case–control ratio = 0.48; CHRNA7: case–control ratio = 0.95; DOPEY2: case–control ratio = 0.65). The only previously reported region in which we observed an excess of CNVs in AD cases was the 15q11.2 duplication locus identified by Ghani et al. (19). We found CNVs overlapping this region in 0.52% of our cases and 0.23% of our controls (case–control ratio = 2.26); however, the excess in cases was not significant (uncorrected P = 0.22).
We sought to identify CNVs which overlapped the known AD risk genes, regardless of their frequency. We used the data set of 7718 CNVs present before filtering for <1% frequency, and all CNVs were validated using z-score analysis. Table 2 shows the results of the analysis of CNVs that overlapped genes known to contribute to AD. APOE, PSEN1, PSEN2, BIN1, PICALM, CLU, the MS4A gene cluster (chr11: 59.5–60.35 Mb, NCBI b36), EPHA1, CD33 and CD2AP were not overlapped by any CNVs and as such are not shown in the table. In total, three rare CNVs overlapping the genes APP (one duplication) and CR1 (two duplications) were identified in cases. Although no CNVs overlapping these genes were identified in controls, the higher rates of CNVs in cases were not significant (P > 0.596). Figure 1a shows the position of the duplication that overlaps the APP locus. This duplication is over 5 Mb and overlaps another 13 genes as well as the entire length of APP. This duplication was identified in a post-mortem sample from an individual whose exact age of onset was unknown but who was a patient in a residential nursing home by 57 years of age and who had severe dementia (25). This individual is also part of the sample analysed by McNaughton et al. (25). That study aimed to identify duplications at the APP and PRNP loci using a three-tiered screening assay which included exonic real-time quantitative PCR, fluorescent microsatellite quantitative PCR and Illumina arrays, resulting in the identification of an APP duplication in the same individual, thus confirming our CNV call. The duplications overlapping the CR1 gene are shown in Figure 1b, and both of the individuals with these duplications have late-onset forms of AD (onset ages of 66 and 67 years).
We analysed CNVs that were present in regions of the genome that are regarded as harbouring potentially pathogenic CNVs that increase the risk of schizophrenia, autism, mental retardation and/or epilepsy (14), as shown in Table 3. None of these regions showed a significant excess of CNVs in our AD cases, although the relatively small number of cases in our study almost precludes the detection of statistically significant differences for the very rare CNVs, in the region of 1:5000 controls (1q21.1, NRXN1, 15q13.3, 16p11.2 and 17p12) or even lower rates (3q29 and 22q11.2). The most common of the pathogenic CNVs (deletions at 15q11.2 and duplications at 16p13.1) were found in our cases at rates slightly lower than those in previous control populations (26), and were lower than the rates in our own controls. We identified higher rates of deletions at the 1q21.1 locus, duplications at the 16p11.2 locus and deletions at the 17p12 locus in our cases than have been observed in previous control populations, but these rates are still far lower than those observed in schizophrenia case populations and are more similar to rates reported among controls (26).
We performed pathway analysis by testing 1833 gene sets for an excess of genes hit by CNVs in cases. We identified 113 pathways which were significantly enriched (P < 0.05). None of these pathways survived correction for multiple testing of pathways at a false discovery rate of less than 0.05 (all q-values were >0.15). We also tested these pathways for enrichment of gene hits in deletions and duplications separately. One hundred and seventy pathways were significantly enriched for case gene hits in deletions, but none survived multiple testing correction (q-values > 0.08). Likewise, of the 97 pathways significantly enriched for case hits in duplications, none survived multiple testing correction (q-values > 0.45).
We have previously conducted pathway analysis on the genome-wide single nucleotide polymorphism (SNP) data for these sample sets and found a number of pathways involved in cholesterol metabolism and the immune system to be significantly enriched (27). To test for overlap between the pathways identified through SNPs or CNVs, we ran the ALIGATOR algorithm (28) on the GERAD GWAS data, using more up-to-date pathway definitions than those used in (27). We restricted these analyses to pathways that were enriched for genes hit by all CNVs, deletions and duplications in cases. Of the pathways that were nominally significantly enriched (P < 0.05) for genes hit by deletions (>100 kb) in cases, 15 were enriched for SNP signal in GERAD at P < 0.05 and 6 at P < 0.01. These numbers are significantly larger than expected by chance (P = 0.034 and P = 0.015, respectively), suggesting some biological mechanisms for AD susceptibility acting through both SNPs and CNVs. The significant pathways are listed in Supplementary Material, Table S2, and include pathways related to lipid/cholesterol homoeostasis and cell signalling. No significant overlap was observed for pathways enriched for genes hit in cases by duplications or CNVs in general.
We conducted a large-scale study of rare CNVs in AD. Overall, there was no excess of CNVs in cases compared with controls, which is consistent with the other, smaller genome-wide studies of CNVs in AD published so far (16–19). When the CNVs were divided by size range and type (deletions and duplications), there was a nominally significant excess of deletions >1 Mb in cases, but this finding did not remain significant after correction for multiple testing.
This is in contrast with the findings of CNV studies of neurodevelopmental disorders. For example, the ISC (20) identified a significant excess of rare CNVs >100 kb in cases with schizophrenia (case–control ratio = 1.15) and this excess was more pronounced for deletions >500 kb (case–control ratio = 1.67). Williams et al. (29) also found a significant excess of CNVs >500 kb in cases with attention deficit/hyperactivity disorder (ADHD) (case–control ratio = 2.09). We observed an excess of deletions >100 kb in controls, but this was not significant after correction for multiple testing. This is a similar finding to that of Swaminathan et al. in their recent genome-wide study of CNVs in AD, where they observed a trend towards a reduced rate of both deletions and duplications in their AD cases (18). However, much higher CNV rates overall were observed in that study (CNVs per person = 9.3) as they did not filter the data for rare CNVs >100 kb when carrying out this analysis.
We investigated potentially interesting loci in which previous AD CNV studies identified an excess of CNVs in AD cases (16–19). Although we observed an excess of CNVs in our cases in the 15q11.2 region identified by Ghani et al. (19), this excess did not reach significance in our study. The rate of CNVs in this region in their AD cases of Caribbean Hispanic origin is five times what we have observed in our cases. This difference may be due to population differences in CNV rates at this locus. We identified CNVs in four other loci which had been highlighted in the two studies by Swaminathan et al. (17,18), but we observed a higher rate of CNVs in our controls than in our cases. This discrepancy may be due to the small control sample sizes of the previous two studies (combined n = 339). This present study consists of nearly four times as many controls and so has greater power to detect more CNVs in these regions.
We investigated if any known AD risk genes were intersected by CNVs. APP was overlapped by a duplication identified in an individual with early-onset AD, but no CNVs overlapped this gene in controls. This duplication was independently identified and validated in this same sample by another group that specifically focused on the APP gene (25). A number of previous studies have also identified duplications at the APP locus in early-onset AD cases (15,30–33). We also identified duplications of CR1 in two individuals with late-onset AD. One of these duplications overlaps the low-copy repeat-associated CNV in CR1. This is particularly interesting as Brouwers et al. have shown that duplication of this intragenic CNV in the CR1-S isoform of the gene increases risk for AD, possibly by increasing the number of C3b/C4b-binding sites (34).
Our pathway analysis did not show a significant enrichment of any biological pathways after correction for multiple testing. However, we have shown that a number of pathways were enriched in both the SNP data and in deletions, more so than would be expected by chance. This suggests that some biological mechanisms for AD susceptibility may act through both SNPs and CNVs (specifically, deletions). The significant pathways include lipid/cholesterol homoeostasis, as well as cell signalling. Cholesterol homoeostasis is of particular interest in AD as a number of AD risk genes are thought to be involved in lipid metabolism; APOE and CLU are involved in the formation and transport of lipoprotein particles, both systemically and in the brain (1,5) and ABCA7 is involved in the release of cholesterol and phospholipids from cells to lipoprotein particles (35). BIN1 and PICALM may also have roles in the internalization and transport of lipids through receptor-mediated endocytosis (1,5).
We observed a low rate of very large (>1 Mb) deletions in our elderly control population compared with younger control sets used in the CNV studies conducted by the ISC (20) and Grozeva et al. (24). Although different arrays were used in each of these analyses, deletions >1 Mb in size are the most reliable to call, allowing for fairly confident comparisons between studies. This raises the question of whether healthy elderly individuals have a reduced rate of large deletions, in other words, do deletions >1 Mb cause other general health problems that reduce life expectancy? This finding needs to be replicated in additional samples of elderly individuals before conclusions can be drawn.
In summary, we did not find a global excess of rare and large CNVs in AD cases and we did not replicate findings for an excess of CNVs in loci highlighted by previous AD CNV studies. Furthermore, we did not find an excess of CNVs overlapping AD candidate genes in cases, but did identify duplications overlapping APP and CR1 which may be pathogenic. We have also shown potential biological overlap between the involvement of SNPs and CNVs in AD susceptibility, centred on lipid/cholesterol metabolism. We also found a lower rate of large, rare deletions in our elderly controls than has been observed in other control sets, raising the possibility that this class of CNVs not only increases the rate of various neurodevelopmental disorders, but might also reduce life expectancy in general. Thus, in contrast to diseases such as schizophrenia, autism and ADHD, CNVs do not appear to make a significant contribution to the development of AD.
The sample used in this study consisted of 4112 cases and 1602 elderly screened controls. All samples were genotyped on Illumina 610-quad chip arrays as part of a GWAS of AD conducted by the GERAD consortium, as previously described by Harold et al. (8).
All AD cases met criteria for either probable [NINCDS-ADRDA (36), DSM-IV] or definite [CERAD (37)] AD. Controls were screened for dementia using the MMSE or ADAS-cog, and were determined to be free from dementia at neuropathological examination or had a Braak score of 2.5 or lower. Any controls with a known history of psychiatric illness were excluded. All individuals included in these analyses provided informed consent to take part in genetic association studies. Control samples were prepared in the same way as case samples at each collection centre to minimize sample differences. Participants were recruited from the UK and Ireland (2774 cases and 1165 controls) by the Medical Research Council (MRC) Genetic Resource for AD (Cardiff University; Institute of Psychiatry, London; Cambridge University; Trinity College Dublin), the Alzheimer's Research UK (ARUK) Collaboration [University of Nottingham; University of Manchester; University of Southampton; University of Bristol; Queen's University Belfast; the Oxford Project to Investigate Memory and Ageing (OPTIMA), Oxford University]; MRC PRION Unit, University College London and London and the South East Region AD project (LASER-AD), University College London. Individuals from Germany (680 cases and 137 controls) were recruited by the Competence Network of Dementia and Department of Psychiatry, University of Bonn. Individuals from the USA (658 cases and 300 controls) were recruited by the National Institute of Mental Health (NIMH) AD Genetics Initiative and Washington University, St Louis, USA. Cases or controls described here have not been part of previous publications on CNVs.
Genotyping was performed using Illumina 610-quad arrays at the Sanger Institute, UK. 200 ng of input DNA per sample were used and prepared for genotyping using the Illumina Infinium system following the manufacturer's protocols. The Log R Ratios (LRR) and B allele frequency (BAF) values for each sample were calculated from the signal intensity files by Illumina BeadStudio v3.2. Initially, we observed strong batch effects in the data set, manifested by large differences in the mean number of CNVs (both overall and >100 kb in size) produced on samples from different plates. These increased numbers of CNVs indicate the presence of false-positive calls in such plates. To correct the batch effects, we clustered SNPs in BeadStudio on a plate by plate basis, rather than for the whole sample before exporting the data. The exported data were then used to generate CNV calls using the PennCNV software (27 August 2009 version), applying the GC-model wave adjustment (38). This software detects CNVs using a hidden Markov model-based approach.
Case and control samples were subject to the same stringent QC filters in order to prevent any DNA quality differences between case and control samples affecting the results. After minimizing batch effects in the sample, four plates (consisting of 179 cases and 42 controls) appeared to be outliers from the distribution of the mean number of CNVs and were excluded. The remainder of the samples were excluded from the analyses if they had SNP call rates <98%, Log R Ratio standard deviations >0.3, B allele frequency drift >0.0002, a wave factor out of the range −0.04 to 0.04 or a BAF median out of the range 0.45–0.55. We also filtered out individual samples that had >70 CNVs, which were outliers from the distribution. See Supplementary Material, Table S3 for a breakdown of the sample exclusions. This resulted in a final sample set of 3260 cases (average age at interview = 77.67, SD = 9.13; average age at onset = 72.91, SD = 8.49) and 1290 controls (average age at interview = 76.36, SD = 6.91).
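The sample-level QC thresholds listed above can be summarized as a single predicate. This is a minimal sketch, assuming a sample is excluded as soon as it fails any one criterion; the `SampleQC` record and its field names are hypothetical and do not correspond to PennCNV output columns.

```python
from dataclasses import dataclass

@dataclass
class SampleQC:
    """Hypothetical per-sample QC summary; field names are illustrative only."""
    sample_id: str
    call_rate: float    # SNP call rate as a fraction (0-1)
    lrr_sd: float       # standard deviation of Log R Ratio
    baf_drift: float    # B allele frequency drift
    wave_factor: float  # GC wave factor
    baf_median: float   # median B allele frequency
    n_cnvs: int         # number of CNV calls in the sample

def passes_sample_qc(s: SampleQC) -> bool:
    """Return True if the sample meets every threshold quoted in the text."""
    return (
        s.call_rate >= 0.98
        and s.lrr_sd <= 0.3
        and s.baf_drift <= 0.0002
        and -0.04 <= s.wave_factor <= 0.04
        and 0.45 <= s.baf_median <= 0.55
        and s.n_cnvs <= 70
    )

good = SampleQC("S1", 0.995, 0.18, 0.0001, 0.01, 0.50, 25)
noisy = SampleQC("S2", 0.995, 0.35, 0.0001, 0.01, 0.50, 25)  # LRR SD too high
print(passes_sample_qc(good), passes_sample_qc(noisy))
```

The plate-level exclusion of outlier batches described in the text is a separate step and is not modelled here.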
The PennCNV algorithm identified a total of 165 361 CNVs. To reduce false positives, we set a relatively high CNV quality threshold: we excluded CNVs if they included fewer than 20 SNPs and if they had a low SNP density (>15 kb per SNP). We also chose to limit the analysis to CNVs >100 kb in size, as small CNVs are more difficult to call reliably. These cut-offs have been used in a number of papers (20,24,39). We excluded any CNVs if >50% of their length overlapped regions of segmental duplications, as defined by the ‘Segmental Dups’ track in UCSC (http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=164967197&c=chr2&g=genomicSuperDups) (40,41).
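The CNV-level filters described above can be expressed as a short function. The sketch below assumes a CNV is dropped if it fails any one filter and that the segmental duplication intervals do not overlap one another; the `CNV` record and the interval format are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CNV:
    """Hypothetical CNV call; coordinates are in base pairs."""
    chrom: str
    start: int
    end: int
    n_snps: int
    kind: str  # "del" or "dup"

def segdup_fraction(cnv, segdups):
    """Fraction of the CNV covered by segmental duplications.

    segdups is a list of (chrom, start, end) tuples, assumed non-overlapping.
    """
    covered = 0
    for chrom, start, end in segdups:
        if chrom == cnv.chrom:
            covered += max(0, min(cnv.end, end) - max(cnv.start, start))
    return covered / (cnv.end - cnv.start)

def keep_cnv(cnv, segdups):
    """Apply the thresholds quoted in the text to one CNV call."""
    length = cnv.end - cnv.start
    if cnv.n_snps < 20:                       # too few probes
        return False
    if length / cnv.n_snps > 15_000:          # low SNP density (>15 kb per SNP)
        return False
    if length <= 100_000:                     # keep only CNVs >100 kb
        return False
    if segdup_fraction(cnv, segdups) > 0.5:   # mostly segmental duplication
        return False
    return True

dups = [("chr2", 1_050_000, 1_170_000)]
print(keep_cnv(CNV("chr2", 1_000_000, 1_200_000, 25, "del"), dups))  # 60% overlap
print(keep_cnv(CNV("chr3", 1_000_000, 1_200_000, 25, "del"), dups))  # no overlap
```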
We then joined any CNVs that appeared to be artificially split by the PennCNV algorithm. The rationale we used involved joining two CNVs if the length of the sequence between them was <50% of the length of the larger CNV. After QC, 7718 CNVs remained.
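The joining rule can be sketched as a single left-to-right pass. This assumes the input holds calls from one individual, one chromosome and one CNV type, and that a merged segment is then compared with the next call using its new, longer length; the text does not specify these details, so they are assumptions.

```python
def join_split_cnvs(segments):
    """Merge adjacent calls whose gap is <50% of the length of the larger call.

    segments: list of (start, end) tuples in base pairs for a single
    individual, chromosome and CNV type (an assumption of this sketch).
    """
    merged = []
    for start, end in sorted(segments):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = start - prev_end
            larger = max(prev_end - prev_start, end - start)
            if gap < 0.5 * larger:
                # Treat the two calls as one artificially split CNV
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged

# A 20 kb gap next to a 100 kb call is joined; a 200 kb gap is not.
print(join_split_cnvs([(0, 100_000), (120_000, 200_000)]))
print(join_split_cnvs([(0, 100_000), (300_000, 350_000)]))
```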
Consistent with previous CNV studies (20,24,39), we analysed only rare CNVs (<1%) due to the problems of reliably calling more common CNVs with standard SNP arrays (42). Research on neurodevelopmental disorders is also justified in analysing only rare CNVs, as selection against pathogenic CNVs keeps them at low rates (43). Selection pressure might not be relevant in a disorder with such a late age at onset, like AD, but we decided that the technical problems associated with the calling of common CNVs still justified their exclusion. In addition, common CNVs have been shown to be in linkage disequilibrium with common SNPs, and therefore the signals would have been detected by GWAS (42). We filtered out the common CNVs present in >1% of each sample using PLINK v1.06 (44), leaving 3593 CNVs.
To evaluate the remaining CNVs, we used a slight modification of the algorithm MeZOD reported by McCarthy et al. (13). This method is discussed in more detail in a recent article by Kirov et al. (39). Briefly, the median z-score outlier method is a three-stage process: (i) the signal from each probe on an individual array is assigned a z-score based upon the distribution of all probe signals on that array (individual-wise standardization); (ii) each resulting z-score for each probe from (i) is assigned a new z-score based upon the distribution of all individual z-scores for that probe (probe-wise standardization); (iii) the median of the z-scores for all probes within a region of interest from (ii) is calculated and displayed as a histogram. Outlier detection is performed by visual inspection of the histogram, and outliers represent CNVs in the particular region of interest as shown in Supplementary Material, Figure S1. Software to perform this analysis and visualize the results can be obtained from: http://x004.psycm.uwcm.ac.uk/~dobril/z_scores_cnvs. We produced histograms of the z-scores produced by each CNV (deletions and duplications), and selected cut-offs in order to remove CNVs which did not appear to be true outliers, as shown in Supplementary Material, Figure S2. For deletions, CNVs were considered to be real if they had a z-score ≤ −5, and those with a z-score > −4 were excluded. Deletions with z-scores between −4 and −5 were manually inspected. Duplications were considered to be real if they had a z-score ≥ 4 and were excluded if they had a z-score <3. Duplications with z-scores between 3 and 4 were manually inspected. This filtering resulted in the exclusion of 317 deletions and 586 duplications, leaving 1220 deletions and 1470 duplications. The mean size of these CNVs was 324.8 kb in cases and 308.9 kb in controls.
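The three-stage standardization and the z-score cut-offs can be sketched in plain Python. This is an illustration of the procedure as described in the text, not the published MeZOD software: function names are hypothetical, population standard deviations are used (an assumption), and the histogram-based visual inspection is replaced by a simple three-way label.

```python
import statistics

def standardise(values):
    """Convert a list of numbers to z-scores (population SD; assumes SD > 0)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def median_z_scores(intensity, region_probes):
    """intensity[i][j] is the signal of individual i at probe j.

    Returns one median z-score per individual for the probes in region_probes.
    """
    # (i) individual-wise standardization across all probes on each array
    by_individual = [standardise(row) for row in intensity]
    # (ii) probe-wise standardization across all individuals
    n_probes = len(intensity[0])
    by_probe = [standardise([row[j] for row in by_individual])
                for j in range(n_probes)]
    # (iii) median over the probes in the region of interest
    return [statistics.median([by_probe[j][i] for j in region_probes])
            for i in range(len(intensity))]

def classify(z, kind):
    """Apply the cut-offs quoted in the text to a median z-score."""
    if kind == "del":
        if z <= -5:
            return "accept"
        return "exclude" if z > -4 else "inspect"
    if kind == "dup":
        if z >= 4:
            return "accept"
        return "exclude" if z < 3 else "inspect"
    raise ValueError(f"unknown CNV type: {kind}")

print(classify(-5.2, "del"), classify(-4.5, "del"), classify(3.5, "dup"))
```

With only a handful of individuals the probe-wise z-score of a single outlier is bounded, so the cut-offs above are only meaningful for cohorts of the size analysed in the study.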
To compare the global CNV burden between cases and controls, a Poisson regression was fitted on the number of CNVs per individual in cases versus controls. Ten principal components from Eigenstrat analysis of SNP data as well as country of origin (i.e. UK and Ireland, USA or Germany) were used as covariates in this analysis to control for systematic differences between centres as well as ethnic differences. Significant loci were identified using the ‘segment group’ function in PLINK v1.06 (44). Association analyses of these loci, and of CNVs overlapping AD-associated genes and regions highlighted by previous studies of AD CNVs, were also carried out using PLINK v1.06 (44). P-values are two-tailed, based on comparing the number of CNVs per individual in cases and controls with the use of 10 000 permutations. The genomic coordinates used in this study are based on the March 2006 human genome sequence assembly (UCSC hg18, National Centre for Biotechnology Information build 36).
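The idea behind a two-tailed permutation P-value for the number of CNVs per individual can be sketched as follows. This is not PLINK's implementation: the test statistic (difference in mean CNV count), the add-one correction and the fixed random seed are all assumptions made for illustration, and no covariates are included.

```python
import random

def permutation_p(case_counts, control_counts, n_perm=10_000, seed=0):
    """Two-tailed permutation P-value for the difference in mean CNV count."""
    rng = random.Random(seed)
    n_case = len(case_counts)
    pooled = list(case_counts) + list(control_counts)
    n_ctrl = len(pooled) - n_case

    def mean_diff(values):
        return sum(values[:n_case]) / n_case - sum(values[n_case:]) / n_ctrl

    observed = abs(mean_diff(pooled))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        if abs(mean_diff(pooled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Toy data: CNV counts per individual (hypothetical values)
cases = [1, 0, 2, 1, 0, 1, 3, 0, 1, 1]
controls = [0, 1, 1, 0, 2, 0, 1, 1, 0, 1]
print(permutation_p(cases, controls, n_perm=2000))
```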
CNVs in previously implicated loci in other neuropsychiatric conditions were included if they spanned at least 50% of that region. For deletions in NRXN1, we used the criterion employed in previous reviews that found the strongest associations for deletions: >100 kb and disrupting exons (45,46).
The gene sets used in our pathway analyses came from three sources and have been previously described: (i) Gene Ontology (GO) (47), (ii) Kyoto Encyclopedia of Genes and Genomes pathways (KEGG) (ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/hsa_pathway.list) and (iii) the canonical pathways set from the Molecular Signatures Database (MSigDB) (48). Pathways were required to contain between 3 and 1500 genes to be included in the analysis, giving a total of 10 086 pathways. We did not exclude CNVs which were only identified in controls in this analysis as we assumed incomplete penetrance of these CNVs. Gene sets were tested for an excess of genes hit by large, rare CNVs in cases by fitting the following logistic model to the combined set of case and control CNVs:
logit(pr(case)) = CNV size + Total number of annotated genes hit outside the gene set + number of genes hit in the gene set
The change in deviance was then compared between this and the model:
logit(pr(case)) = CNV size + Total number of annotated genes hit outside the gene set
A one-sided test for an excess of genes hit by case CNVs was performed. This overcomes biases relating to gene and CNV size (49). The comparison of case to control CNVs allows for the possibility of non-random CNV location unrelated to disease (i.e. CNVs being more likely to occur in certain specific locations of the genome in both cases and controls). The inclusion of the CNV size in the regression allows for the possibility of case CNVs being larger than control CNVs (and thus likely to hit more genes, regardless of function). Inclusion of the total number of genes hit outside the gene set in the regression corrects for case CNVs hitting more genes overall (regardless of function) than control CNVs. Analysis was restricted to gene sets containing at least 10 CNV hits in total (case and control combined), since pathways with a large number of gene hits are more likely to be biologically meaningful. This resulted in a total of 1833 gene sets being analysed. Correction of the enrichment P-values for each gene set for the multiple testing of gene sets was carried out by calculating q-values (50)—these are equivalent to the minimum value of the false discovery rate at which the gene set would be counted as significant (51).
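The multiple-testing correction across gene sets can be illustrated with Benjamini-Hochberg adjusted P-values. This is a simplified stand-in for the q-value method cited above: the two coincide only under the assumption that the proportion of true null hypotheses is 1, which is made here for brevity. The function name and the example P-values are hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    Each value is the smallest false discovery rate at which the
    corresponding test would be declared significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical enrichment P-values for four gene sets
example = [0.01, 0.04, 0.03, 0.005]
print(bh_q_values(example))
```

When many gene sets are tested, a nominal P < 0.05 can correspond to a q-value well above 0.05, which is the situation described for the pathway results.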
Conflict of Interest statement. M.O. has received funding from GlaxoSmithKline plc and holds patents. J.W. has received lecture fees from Eli Lilly and company and Eisai Ltd and holds patents.
The work was made possible by the generous participation of the control subjects, the subjects with Alzheimer's disease, and their families. This work was supported by funding from the following organisations: the Wellcome Trust (grant number GR082604MA); Medical Research Council (grant number G0300429); Alzheimer's Research UK; Welsh Assembly Government; Alzheimer's Society; Ulster Garden Villages, Northern Ireland R&D Office; Royal College of Physicians/Dunhill Medical Trust; Mercer's Institute for Research on Ageing; Bristol Research into Alzheimer's and Care of the Elderly (BRACE); Charles Wolfson Charitable Trust; NIH (grant number PO1-AG026276, PO1-AG03991, RO1-AG16208, P50-AG05681); NIA; Barnes Jewish Foundation; Charles and Joanne Knight Alzheimer's Research Initiative of the Washington University Alzheimer's Disease Research Centre; the UCLH/UCL Biomedical Centre; Lundbeck SA; German Federal Ministry of Education and Research (BMBF): Kompetenznetz Demenzen (grant number 01GI0420); Bundesministerium für Bildung und Forschung and Competence Network Dementia (CND) Förderkennzeichen (grant number 01GI0102, 01GI0711). Funding to pay the Open Access publication charges for this article was provided by Medical Research Council – Identifying Genetic Risk for Late-onset Alzheimer's Disease: The GERAD Consortium.