The Alzheimer Disease Genetics Consortium (ADGC) performed a genome-wide association study (GWAS) of late-onset Alzheimer disease (LOAD) using a three-stage design consisting of a discovery stage (Stage 1) and two replication stages (Stages 2 and 3). Both joint analysis and meta-analysis approaches were used. We obtained genome-wide significant results at MS4A4A [rs4938933; Stages 1+2, meta-analysis (PM) = 1.7 × 10−9, joint analysis (PJ) = 1.7 × 10−9; Stages 1–3, PM = 8.2 × 10−12], CD2AP (rs9349407; Stages 1–3, PM = 8.6 × 10−9), EPHA1 (rs11767557; Stages 1–3, PM = 6.0 × 10−10), and CD33 (rs3865444; Stages 1–3, PM = 1.6 × 10−9). We confirmed that CR1 (rs6701713; PM = 4.6 × 10−10, PJ = 5.2 × 10−11), CLU (rs1532278; PM = 8.3 × 10−8, PJ = 1.9 × 10−8), BIN1 (rs7561528; PM = 4.0 × 10−14, PJ = 5.2 × 10−14), and PICALM (rs561655; PM = 7.0 × 10−11, PJ = 1.0 × 10−10), but not EXOC3L2, are LOAD risk loci1–3.
Alzheimer Disease (AD) is a neurodegenerative disorder affecting more than 13% of individuals aged 65 years and older and 30%–50% aged 80 years and older4–5. Early work identified mutations in APP, PSEN1, and PSEN2 that cause early-onset autosomal dominant AD6–9 and variants in APOE that affect LOAD susceptibility10. A recent GWAS identified CR1, CLU, PICALM, and BIN1 as LOAD susceptibility loci1–3. However, because LOAD heritability estimates are high (h2 ≈ 60–80%)11, much of the genetic contribution remains unknown.
To identify genetic variants associated with risk for AD, the ADGC assembled a discovery dataset [Stage 1; 8,309 LOAD cases, 7,366 cognitively normal controls (CNEs)] using data from eight cohorts and a ninth newly assembled cohort from the 29 NIA-funded Alzheimer Disease Centers (ADCs) (Supplementary Tables 1 and 2, Supplementary Note) with data coordinated by the National Alzheimer Coordinating Center (NACC) and samples coordinated by the National Cell Repository for Alzheimer Disease (NCRAD). For the Stage 2 replication, we used four additional datasets and additional samples from the ADCs (3,531 LOAD cases, 3,565 CNEs). The Stage 3 replication used the results of association analyses provided by three other consortia (Hollingworth et al.12; 7,650 LOAD cases, 25,839 mixed-age controls). For Stages 1 and 2, we used both a meta-analysis (M) approach that integrates results from association analyses of individual datasets and a joint analysis (J) approach in which genotype data from each study are pooled. The latter method has improved power over meta-analysis in the absence of between-study heterogeneity13 and allows more direct correction for confounding sampling bias14. We were limited to meta-analysis for Stage 3.
Because cohorts were genotyped using different platforms, we used imputation to generate a common set of 2,324,889 SNPs. We applied uniform stringent quality control measures to all datasets to remove low-quality and redundant samples and problematic SNPs (Supplementary Tables 3, 4, and Online Methods). We performed association analysis assuming an additive model on the log odds ratio scale with adjustment for population substructure using logistic regression for case-control data and generalized estimating equations (GEE) with a logistic model for family data. Results from individual datasets were combined in the meta-analysis using the inverse variance method, applying a genomic control to each dataset. The joint analysis was performed using GEE and incorporated terms to adjust for population substructure and site-specific effects (Online Methods). For both approaches, we also examined an extended model of covariate adjustment that adjusted for age (age at onset or death in cases; age at exam or death in controls), sex, and number of APOE ε4 alleles (0, 1, or 2). Genomic inflation factors (λ) for both the discovery meta-analysis and the joint analysis and extended models were less than 1.05, indicating that there was not substantial inflation of the test statistics (Supplementary Table 3, Supplementary Figure 1). Association findings from meta-analysis and joint analysis were comparable.
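The combination step described above (genomic control applied to each dataset, then inverse variance weighting) can be illustrated with a minimal sketch. This is not the consortium's pipeline, which is described in the Online Methods; the function names and the input values are hypothetical, and the sketch assumes a fixed-effects model with per-dataset log odds ratios and standard errors already in hand.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (~0.4549).
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(z_scores):
    """Lambda: median observed chi-square divided by its null expectation."""
    chi2 = [z * z for z in z_scores]
    return median(chi2) / CHI2_1DF_MEDIAN


def gc_corrected_se(se, lam):
    """Inflate a standard error by sqrt(lambda); lambda below 1 is treated as 1."""
    return se * math.sqrt(max(lam, 1.0))


def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse variance meta-analysis of per-dataset log odds ratios."""
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, z, p


if __name__ == "__main__":
    # Hypothetical per-dataset estimates for one SNP (not values from this study).
    betas = [-0.12, -0.14, -0.10]
    ses = [0.04, 0.05, 0.06]
    lambdas = [1.02, 1.00, 1.04]  # hypothetical per-dataset inflation factors
    adjusted = [gc_corrected_se(s, lam) for s, lam in zip(ses, lambdas)]
    beta, se, z, p = inverse_variance_meta(betas, adjusted)
    print(f"pooled OR = {math.exp(beta):.3f}, z = {z:.2f}, P = {p:.2e}")
```

Datasets with smaller standard errors receive larger weights, so the pooled estimate is dominated by the largest cohorts; inflating each standard error by the square root of λ before pooling makes the combined test slightly more conservative.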
In Stage 1, the strongest signal was from the APOE region (e.g., rs4420638, PM = 1.1 × 10−266, PJ = 1.3 × 10−253; Supplementary Table 5). Excluding the APOE region, SNPs at nine distinct loci yielded a PM or PJ ≤ 10−6 (Table 1; all SNPs with P < 10−4 are in Supplementary Table 5). SNPs from these nine loci were carried forward to Stage 2. Five of these had not previously been associated with LOAD at a genome-wide significance level of P ≤ 5.0 × 10−8 (MS4A, EPHA1, CD33, ARID5B, and CD2AP). Because Hollingworth et al.12 identified SNPs at ABCA7 as a novel LOAD locus, we included ABCA7 region SNPs in Stage 2 and provided the results to Hollingworth et al.12. For all loci in Table 1, we did not detect evidence for effect heterogeneity (Supplementary Fig. 2). One novel locus (MS4A) was significant in the Stage 1+2 analysis. Four other loci approached but did not reach genome-wide significance in the Stage 1+2 analyses and were carried forward to Stage 3. For three of these (CD33, EPHA1, and CD2AP), Stage 3 analysis strengthened evidence for association. However, Stages 2 and 3 results did not support Stage 1 results for ARID5B (Table 2).
Stage 1+2 analysis identified the MS4A gene cluster as a novel LOAD locus (PM = 1.7 × 10−9, PJ = 1.7 × 10−9)(Table 1, Fig. 1A). The minor allele (MAF = 0.39) was protective with identical odds ratios (ORs) from both meta-analysis and joint analysis (ORM and ORJ = 0.88, 95% CI: 0.85–0.92). In the Stage 1+2 analysis, other SNPs gave smaller P values when compared to discovery SNP rs4938933, with the most significant SNP being rs4939338 (PM = 2.6 × 10−11, PJ = 4.6 × 10−11; ORM and ORJ = 0.87, 95% CI: 0.84–0.91) (Supplementary Table 5). In the accompanying manuscript12, genome-wide significant results were also obtained at the MS4A locus (rs670139, PM = 5.0 × 10−12) using an independent sample. In a combined analysis of ADGC results and those from Hollingworth et al.12, the evidence for this locus at rs4938933 increased to PM = 8.2 × 10−12 (Table 3: ORM = 0.89, 95% CI: 0.87–0.92; Fig. 1A).
SNPs in the CD2AP locus also met our Stage 1 criteria for additional analysis (Fig. 1B). Stage 2 data modestly strengthened this association, but the results did not reach genome-wide significance. Stage 3 analysis yielded a genome-wide significance result for rs9349407 (PM = 8.6 × 10−9), identifying CD2AP as a novel LOAD locus. The minor allele (MAF = 0.27) at this SNP increased risk for LOAD (ORM = 1.11, 95% CI: 1.07–1.15) (Table 2, Fig. 1B).
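The relationship between a reported odds ratio, its 95% confidence interval, and the corresponding P value can be sketched as follows. This is an illustrative back-calculation assuming a Wald test on the log odds ratio scale with a normal approximation; the function name is hypothetical, and because published ORs and CIs are rounded to two decimals the recovered P value is only approximate.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate beta, SE, z and a two-sided P value from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    # The CI spans beta +/- z_crit * SE on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


if __name__ == "__main__":
    # Values reported in the text for rs9349407 (CD2AP), Stages 1-3 meta-analysis.
    beta, se, z, p = p_from_or_ci(1.11, 1.07, 1.15)
    print(f"beta = {beta:.4f}, SE = {se:.4f}, z = {z:.2f}, P = {p:.1e}")
```

For the CD2AP values the sketch gives a P value on the order of 10−8, the same order of magnitude as the reported PM; exact agreement is not expected given the rounding of the inputs.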
Another locus studied further in Stages 2 and 3 centered on EPHA1. Previous work provided suggestive evidence that this is a LOAD risk locus, although the associations did not reach genome-wide significance (P = 1.7 × 10−6)2. Here, results from Stages 1 and 2 for SNP rs11767557, located in the promoter region of EPHA1, reached genome-wide significance in the joint analysis. The addition of Stage 3 results increased evidence for association (PM = 6.0 × 10−10, Table 2, Fig. 1C). The minor allele (MAF = 0.19) for this SNP is protective (ORM = 0.90, 95% CI: 0.86–0.93). We observed no evidence for heterogeneity at this locus (Supplementary Fig. 2D, heterogeneity P = 0.58).
In Stages 1 and 2, strong evidence for association was also obtained for SNPs in CD33, a gene located approximately 6 Mb from APOE, but the results did not reach genome-wide significance. The addition of Stage 3 data confirmed that CD33 is a LOAD risk locus (rs3865444; Stages 1–3, PM = 1.6 × 10−9). The minor allele (MAF = 0.30) is protective (ORM = 0.91, 95% CI: 0.88–0.93; Tables 1 and 2, Fig. 1D). A single SNP (rs3826656) in the 5′ region of CD33 was previously reported, using a family-based approach, as an AD-related locus at genome-wide significance (P = 6.6 × 10−6)15. We were unable to replicate this finding (PM = 0.73; PJ = 0.39, Stage 1 analysis for rs3826656). Although rs3826656 is only 1,348 bp from our top SNP (rs3865444), these two sites display only weak LD (r2 = 0.13).
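The r2 statistic quoted above measures the squared correlation between alleles at two sites, which is why two SNPs can be physically close yet carry largely independent association signals. A minimal sketch of the standard definition follows, assuming phased haplotype counts are available; the function name and the example counts are hypothetical and are not derived from the study data.

```python
def ld_r_squared(n_ab, n_aB, n_Ab, n_AB):
    """r^2 between two biallelic sites from counts of the four haplotypes.

    n_ab: haplotypes carrying allele a at site 1 and allele b at site 2, and so on.
    """
    total = n_ab + n_aB + n_Ab + n_AB
    p_ab = n_ab / total
    p_a = (n_ab + n_aB) / total  # frequency of allele a at site 1
    p_b = (n_ab + n_Ab) / total  # frequency of allele b at site 2
    d = p_ab - p_a * p_b         # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical haplotype counts for two nearby SNPs.
    print(f"r^2 = {ld_r_squared(40, 20, 30, 110):.3f}")
```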
Hollingworth et al.12 report highly significant evidence for the association of an ABCA7 SNP rs3764650 with LOAD (PM = 4.5 × 10−17) that included data from our study. In our Stage 1+2 analysis, we obtained suggestive evidence for association with ABCA7 SNP rs3752246 (PM = 5.8 × 10−7 and PJ = 5.0 × 10−7), which is a missense variant (G1527A) that may alter the function of the ABCA7 protein (see Supplementary Table 6 for functional SNPs in LD with SNPs yielding PM or PJ < 10−4).
Our Stage 1+2 analyses also confirmed the association of previously reported loci (BIN1, CR1, CLU, and PICALM) with LOAD (Table 1). For each locus, supporting evidence was P ≤ 5.0 × 10−8 in one or both types of analysis.
We also examined SNPs with statistically significant GWAS results reported by others (GAB2 (ref. 16), PCDH11X (ref. 17), GOLM1 (ref. 18), and MTHFD1L (ref. 19); Supplementary Table 7). Stage 1 data were used except for PCDH11X, where Stage 1+2 data were used because Affymetrix platforms do not contain the appropriate SNP. Only SNPs in the APOE, CR1, PICALM, and BIN1 loci demonstrated P < 10−6. For MTHFD1L (ref. 19), at rs11754661 (previously reported P = 4.7 × 10−8) we obtained modest independent association evidence (ORM = 1.16, 95% CI: 1.04–1.29, PM = 0.006; ORJ = 1.19, 95% CI: 1.08–1.32, PJ = 7.5 × 10−4). For the remaining sites, only nominal evidence (P < 0.05) or no evidence was obtained. For the GAB2 locus16 at rs10793294 (previously reported P = 1.60 × 10−7), we obtained nominally significant results (PM = 0.017; PJ = 0.029). The association for rs5984894 in the PCDH11X locus17 (previously reported P = 3.9 × 10−12) did not replicate (PM = 0.89, PJ = 0.26). Likewise, findings at GOLM1 (ref. 18) for rs10868366 (previously reported P = 2.40 × 10−4) did not replicate (PM = 0.71; PJ = 0.62). Another gene consistently implicated in LOAD is SORL1 (ref. 20), where at rs3781835 (previously reported P = 0.006) we obtained modest evidence for association (ORM = 0.72, 95% CI: 0.60–0.86, PM = 2.9 × 10−4; ORJ = 0.78, 95% CI: 0.59–0.86; PJ = 3.8 × 10−4).
We examined the influence of the APOE ε4 allele on the loci in Table 1, both by stratifying on APOE ε4 allele carrier status and by testing for interactions with it. After adjustment, all loci had effect sizes similar to those in the unadjusted analyses, with some showing a modest reduction in statistical significance. We previously reported evidence for a PICALM-APOE interaction21 using a dataset that largely overlaps with the Stage 1 dataset used here. However, using the Stage 1+2 data, we do not replicate this finding or see evidence of SNP-APOE interactions with Table 1 loci (data not shown).
Previous work reported an association between LOAD and chromosome 19 SNP rs597668, located 7.2 kb proximal to EXOC3L2 and 296 kb distal to APOE (ref. 2). While we did observe a signal for this SNP (Stage 1, PM = 1.5 × 10−9; PJ = 7.7 × 10−10) and other SNPs in the EXOC3L2-MARK4 region, evidence was completely extinguished for all SNPs after adjustment for APOE (Online Methods, Supplementary Table 8), suggesting that the signal in this region is from APOE.
Our observation of genome-wide significant associations at MS4A4A, CD2AP, EPHA1, and CD33 extends our understanding of the genetic architecture of LOAD and confirms the emerging consensus that common genetic variation plays a significant role in the etiology of LOAD. With our findings and those by Hollingworth et al.12, there are now ten LOAD susceptibility loci (APOE, CR1, CLU, PICALM, BIN1, EPHA1, MS4A, CD33, CD2AP, and ABCA7). Examining the amount of genetic effect attributable to these candidate genes, the most strongly associated SNPs at each locus other than APOE demonstrated population attributable fractions (PAFs) between 2.72% and 5.97% (Supplementary Table 9), with a cumulative PAF for non-APOE loci estimated to be as much as 35%; however, these estimates may vary widely between studies22, and the actual effect sizes are likely to be much smaller than those estimated here because of the ‘winner’s curse’. In addition, the results do not account for interaction among loci and are not derived from appropriate population-based samples.
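For readers unfamiliar with population attributable fractions, the sketch below shows one common way such quantities are computed. It is an assumption, not the method of this study (which is given in the Online Methods): it uses Levin's formula with the odds ratio standing in for the relative risk and the risk-allele frequency standing in for exposure prevalence, and it combines loci under independence with no interaction. The function names are hypothetical.

```python
def levin_paf(exposure_freq, odds_ratio):
    """Levin's attributable fraction, with the OR approximating the relative risk.

    For a protective allele, pass the complementary allele's frequency and 1/OR
    so that odds_ratio >= 1.
    """
    if odds_ratio < 1.0:
        raise ValueError("re-orient to the risk allele so that odds_ratio >= 1")
    excess = exposure_freq * (odds_ratio - 1.0)
    return excess / (1.0 + excess)


def cumulative_paf(pafs):
    """Combine per-locus PAFs assuming independent, non-interacting loci."""
    remaining = 1.0
    for paf in pafs:
        remaining *= 1.0 - paf
    return 1.0 - remaining


if __name__ == "__main__":
    # Inputs taken from the text for rs9349407 (CD2AP): MAF 0.27, OR 1.11.
    print(f"PAF = {levin_paf(0.27, 1.11):.4f}")
    # Hypothetical per-locus PAFs, to show how the combination behaves.
    print(f"cumulative PAF = {cumulative_paf([0.03, 0.04, 0.05]):.4f}")
```

Because the cumulative value multiplies the unattributed fractions, it is always smaller than the simple sum of the per-locus PAFs, and it inherits any upward bias in the individual odds ratios.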
A recent review of GWAS (ref. 23) noted that risk alleles with small effect sizes (0.80 < OR < 1.2) likely exist for complex diseases such as LOAD but remain undetected, even with thousands of samples, because of insufficient power24. Our discovery dataset (Stage 1; 8,309 cases and 7,366 controls) was well-powered to detect associations exceeding the statistical significance threshold of P < 10−6 (Supplementary Table 9). If there are many loci of more modest effects, some, but not all, will likely be detected in any one study. This likely explains the genome-wide statistical significance for the ABCA7 locus in the accompanying manuscript12, which reaches only modest statistical significance in our dataset (rs3752246; PM = 1.0 × 10−5, PJ = 1.9 × 10−5). Finding additional LOAD loci will require larger studies with increased depth of genotyping to test for the effects of both common and rare variants.
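The dependence of power on effect size can be illustrated with a rough calculation. The sketch below is not the power analysis reported in the supplementary material; it assumes an allelic (2 × 2) test with a normal approximation, equal allele frequency in cases and controls when computing the standard error, and a hypothetical allele frequency of 0.30. The function name is likewise hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(odds_ratio, allele_freq, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allelic test on the log odds ratio scale."""
    beta = abs(math.log(odds_ratio))
    pq = allele_freq * (1.0 - allele_freq)
    # Variance of the log OR from a 2 x 2 allele-count table with 2N alleles per group.
    se = math.sqrt(1.0 / (2 * n_cases * pq) + 1.0 / (2 * n_controls * pq))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # The opposite rejection tail is negligible and is ignored here.
    return NormalDist().cdf(beta / se - z_crit)


if __name__ == "__main__":
    # Stage 1 sample sizes from the text; allele frequency 0.30 is an assumption.
    for odds_ratio in (1.05, 1.10, 1.15, 1.20):
        power = approx_power(odds_ratio, 0.30, 8309, 7366, alpha=1e-6)
        print(f"OR = {odds_ratio:.2f}: approximate power = {power:.2f}")
```

Under these assumptions power rises steeply between OR = 1.10 and OR = 1.20, which is consistent with the point made above that loci of modest effect will be detected in some studies and missed in others.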
The National Institutes of Health, National Institute on Aging (NIH-NIA) supported this work through the following grants: ADGC, U01 AG032984, RC2 AG036528; NACC, U01 AG016976; NCRAD, U24 AG021886; NIA LOAD, U24 AG026395, U24 AG026390; Boston University, P30 AG013846, R01 HG02213, K24 AG027841, U01 AG10483, R01 CA129769, R01 MH080295, R01 AG009029, R01 AG017173, R01 AG025259; Columbia University, P50 AG008702, R37 AG015473; Duke University, P30 AG028377; Emory University, AG025688; Indiana University, P30 AG10133; Johns Hopkins University, P50 AG005146, R01 AG020688; Massachusetts General Hospital, P50 AG005134; Mayo Clinic, P50 AG016574; Mount Sinai School of Medicine, P50 AG005138, P01 AG002219; New York University, P30 AG08051, MO1RR00096, and UL1 RR029893; Northwestern University, P30 AG013854; Oregon Health & Science University, P30 AG008017, R01 AG026916; Rush University, P30 AG010161, R01 AG019085, R01 AG15819, R01 AG17917, R01 AG30146; University of Alabama at Birmingham, P50 AG016582, UL1RR02777; University of Arizona/TGEN, P30 AG019610, R01 AG031581, R01 NS059873; University of California, Davis, P30 AG010129; University of California, Irvine, P50 AG016573, P50, P50 AG016575, P50 AG016576, P50 AG016577; University of California, Los Angeles, P50 AG016570; University of California, San Diego, P50 AG005131; University of California, San Francisco, P50 AG023501, P01 AG019724; University of Kentucky, P30 AG028383; University of Michigan, P50 AG008671; University of Pennsylvania, P30 AG010124; University of Pittsburgh, P50 AG005133, AG030653; University of Southern California, P50 AG005142; University of Texas Southwestern, P30 AG012300; University of Miami, R01 AG027944, AG010491, AG027944, AG021547, AG019757; University of Washington, P50 AG005136, UO1 AG06781, UO1 HG004610; Vanderbilt University, R01 AG019085; and Washington University, P50 AG005681, P01 AG03991. 
ADNI. Funding for ADNI is through the Northern California Institute for Research and Education by grants from Abbott, AstraZeneca AB, Bayer Schering Pharma AG, Bristol-Myers Squibb, Eisai Global Clinical Development, Elan Corporation, Genentech, GE Healthcare, GlaxoSmithKline, Innogenetics, Johnson and Johnson, Eli Lilly and Co., Medpace, Inc., Merck and Co., Inc., Novartis AG, Pfizer Inc, F. Hoffmann-La Roche, Schering-Plough, Synarc, Inc., Alzheimer's Association, Alzheimer's Drug Discovery Foundation, the Dana Foundation, and by the National Institute of Biomedical Imaging and Bioengineering and NIA grants U01 AG024904, RC2 AG036535, K01 AG030514. We thank Creighton Phelps, Marcelle Morrison-Bogorad, and Marilyn Miller from NIA who are ex-officio ADGC members. Support was also from the Alzheimer’s Association (LAF, IIRG-08-89720; MP-V, IIRG-05-14147) and the Veterans Affairs Administration. P.S.G.-H. is supported by Wellcome Trust, Howard Hughes Medical Institute, and the Canadian Institute of Health Research.
The Alzheimer Disease Genetics Consortium (ADGC), http://alois.med.upenn.edu/adgc/about/overview.html; ADNI database, (www.loni.ucla.edu/ADNI); ADNI investigators, http://www.loni.ucla.edu/ADNI/Collaboration/ADNI_Manuscript_Citations.pdf; APOE Genotyping kit from TIB MOLBIOL, http://www.roche-as.es/logs/LightMix%C2%AE_40-0445-16_ApoE-112-158_V080904.pdf; PLINK, http://pngu.mgh.harvard.edu/~purcell/plink/; PREST, http://utstat.toronto.edu/sun/Software/Prest/; MACH, http://www.sph.umich.edu/csg/abecasis/mach/; EIGENSTRAT, http://genepath.med.harvard.edu/~reich/EIGENSTRAT.htm; The R Project for Statistical Computing, http://www.r-project.org/; Package GWAF in R, http://cran.r-project.org/web/packages/GWAF/index.html; Package gee in R, http://cran.r-project.org/web/packages/gee/index.html; UCSC Genome Browser, http://genome.ucsc.edu/; METAL, http://www.sph.umich.edu/csg/abecasis/Metal/; FUGUE, http://www.sph.umich.edu/csg/abecasis/fugue/.
Author Contributions
Sample collection, phenotyping, and data management: J.D.Buxbaum, G.P.J., P.K.C., E.B.L., T.D.B., B.F.B., N.R.G., P.L.D., D.E., J.A.Schneider, M.M.C., N.E., S.G.Y., C.C., J.S.K.K., P.N., P.K., J.H., M.J.H., A.J.M., M.M.B., F.Y.D., C.T.B., R.C.G., E.R., P.S.G.-H., S.E.A., R.B., T.B., E.H.B., J.D.Bowen, A.B., J.R.B., N.J.C., C.S.C., S.L.C., H.C.C., D.G.C., J.C., C.W.C., J.L.C., C.D., S.T.D., R.D.-A., M.D., D.W.D., W.G.E., K.M.F., K.B.F., M.R.F., S.F., M.P.F., D.R.G., M.Ganguli, M.Gearing, D.H.G., B.Ghetti, J.R.G., S.G., B.Giordani, J.G., J.H.G., R.L.H., L.E.H., E.H., L.S.H., C.M.H., B.T.H., G.A.J., L.-W.J., N.J., J.K., A.K., J.A.K., R.K., E.H.K., N.W.K., J.J.L., A.I.L., A.P.L., O.L.L., W.J.M., D.C.Marson, F.M., D.C.Mash, E.M., W.C.M., S.M.M., A.N.M., A.C.M., M.M., B.L.M., C.A.M., J.W.M., J.E.P., D.P.P., E.P., R.C.P., W.W.P., J.F.Q., M.R., B.R., J.M.R., E.D.R., R.N.R., M.S., L.S.S., W.S., M.L.S., M.A.S., C.D.S., J.A.Sonnen, S.S., R.A.S., R.E.T., J.Q.T., J.C.T., V.M.V., H.V.V., J.P.V., S.W., K.A.W., J.W., R.L.W., L.B.C., B.A.D., D.Beekly, M.I.K., A.J.S., E.M.R., D.A.B., A.M.G., W.A.K., T.M.F., J.L.H., R.M., M.A.P., L.A.F.
Study management and coordination: L.B.C., D.Beekly, D.A.B., J.C.M., T.J.M., A.M.G., D.Blacker, D.W.T., H.H., W.A.K., T.M.F., J.L.H., R.M., M.A.P., L.A.F., G.D.S.
Statistical methods and analysis: A.C.N., G.J., G.W.B., L.-S.W., B.N.V., J.B., P.J.G., R.M.C., R.A.R., M.A.S., K.L.L., E.R.M., J.L.H., M.A.P., L.A.F.
Interpretation of results: A.C.N., G.J., G.W.B., L.-S.W., B.N.V., J.B., P.J.G., R.A.R., M.A.S., K.L.L., E.R.M., M.I.K., A.J.S., E.M.R., D.A.B., J.C.M., T.J.M., A.M.G., D.Blacker, D.W.T., H.H., W.A.K., T.M.F., J.L.H., R.M., M.A.P., L.A.F., G.D.S.
Manuscript writing group: A.C.N., G.J., G.W.B., L.-S.W., B.N.V., J.B., P.J.G., J.L.H., R.M., M.A.P., L.A.F., G.D.S.
Study design: D.A.B., J.C.M., T.J.M., A.M.G., D.Blacker, D.W.T., H.H., W.A.K., T.M.F., J.L.H., R.M., M.A.P., L.A.F., G.D.S.
Competing Financial Interests
T.D.B. received licensing fees from and is on the speaker's bureau of Athena Diagnostics, Inc. M.R.F. receives research funding from Bristol-Myers Squibb Company, Danone Research, Elan Pharmaceuticals, Inc., Eli Lilly and Company, Novartis Pharmaceuticals Corporation, OctaPharma AG, Pfizer Inc., and Sonexa Therapeutics, Inc.; receives honoraria as scientific consultant from Accera, Inc., Astellas Pharma US Inc., Baxter, Bayer Pharmaceuticals Corporation, Bristol-Myers Squibb, Eisai Medical Research, Inc., GE Healthcare, Medavante, Medivation, Inc., Merck & Co., Inc., Novartis Pharmaceuticals Corp., Pfizer, Inc., Prana Biotechnology Ltd., QR Pharma, Inc., The sanofi-aventis Group, and Toyama Chemical Co., Ltd.; and is a speaker for Eisai Medical Research, Inc., Forest Laboratories, Pfizer Inc., and Novartis Pharmaceuticals Corporation. A.M.G. has research funding from AstraZeneca, Pfizer, and Genentech, and has received remuneration for giving talks at Pfizer and Genentech. R.C.P. is on the Safety Monitoring Committee of Pfizer, Inc. (Wyeth) and a consultant to the Safety Monitoring Committee at Janssen Alzheimer's Immunotherapy Program (Elan), to Elan Pharmaceuticals, and to GE Healthcare. R.E.T. is a consultant to Eisai, Japan, in the area of Alzheimer's genetics and a shareholder in, and consultant to, Pathway Genomics, Inc., San Diego, CA.