Analysis of the NIA-LOAD/NCRAD sample indicates that unraveling susceptibility to LOAD is complex even when individuals from genetically loaded multiplex families are included. As in other studies, support for association between LOAD and SNPs near APOE was strong. By taking advantage of this association, we identified a potential novel locus, CUGBP2, on chromosome 10p14, with genome-wide significant evidence of association within the highest-risk APOE ε4/ε4 stratum and replication in an independent sample. We also found support for association with recently reported SNPs in CLU and BIN1 and, to a lesser extent, in CR1. However, the strong APOE association also introduced a source of structure into the sample, with effects that were detectable through standard evaluation of analysis results. Our results provide strong evidence that this correlation with APOE explains the association in this sample with some, but not all, previously noted SNPs, including those in PICALM and near EXOC3L2 (a recently proposed association); SNPs at both loci have significantly different allele frequencies in AD cases who are carriers vs. non-carriers of the APOE ε4 allele.
Detection of true risk loci in a GWAS of LOAD requires careful attention to potential sampling biases [87]. Large samples such as ours are necessary for detecting modest associations, but such samples usually involve multiple collection sites, introducing the potential for confounding or other complications. Consistent with this, across our participating sites we found variability in the numbers of cases and controls and in the fractions of identifiable underlying ethnic subgroups, as well as differences among subgroups in APOE genotype frequencies and differences in APOE genotype distributions as a function of an indicator of genetic differentiation. None of this is surprising, given the history of US colonization and immigration coupled with differentiation among European populations [81], [88]. Other large samples in Europe and elsewhere are likely to have similar issues, as suggested by the genome-wide inflation factors reported by recent studies [42], [43], which were higher than those in our study. When confounding or structure is present, appropriate accommodation for it can both protect against false positive associations and increase power to detect associations that are confined to a subset of the sample, as we demonstrated in our investigations of the influence of APOE on our results. We also found that common methods, including principal components adjustment [64] and genomic control [74], failed to provide the necessary correction for APOE-induced associations. Together these observations have important implications for the interpretation of results from other large combined samples.
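For readers less familiar with the genome-wide inflation factors and genomic control mentioned above, the sketch below shows the usual genomic-control calculation: λ is the median of the observed 1-df χ² statistics divided by the median expected under the null (about 0.455), and each statistic is then divided by λ. This is a minimal illustration in standard-library Python; the function names and the toy p-values are hypothetical and do not come from our analysis.

```python
from statistics import NormalDist, median

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1

# Null median of a 1-df chi-square statistic: the squared upper-quartile
# point of the standard normal (approximately 0.455).
CHI2_1DF_NULL_MEDIAN = STD_NORMAL.inv_cdf(0.75) ** 2


def pvalue_to_chi2(p: float) -> float:
    """Convert a two-sided p-value (0 < p <= 1) to a 1-df chi-square statistic."""
    # Using the lower tail (p / 2) avoids rounding 1 - p / 2 to 1.0 for tiny p.
    z = STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def chi2_to_pvalue(chi2: float) -> float:
    """Convert a 1-df chi-square statistic back to a two-sided p-value."""
    return 2.0 * STD_NORMAL.cdf(-(chi2 ** 0.5))


def genomic_inflation(pvalues: list[float]) -> float:
    """Genomic-control lambda: median observed chi-square over its null median."""
    return median(pvalue_to_chi2(p) for p in pvalues) / CHI2_1DF_NULL_MEDIAN


def genomic_control(pvalues: list[float]) -> list[float]:
    """Divide every chi-square by lambda (floored at 1) and re-derive p-values."""
    lam = max(genomic_inflation(pvalues), 1.0)
    return [chi2_to_pvalue(pvalue_to_chi2(p) / lam) for p in pvalues]


if __name__ == "__main__":
    # Hypothetical null p-values spread evenly over (0, 1).
    null_p = [(i + 0.5) / 1000 for i in range(1000)]
    # Inflate every statistic by 20% to mimic uncorrected sample structure.
    inflated_p = [chi2_to_pvalue(1.2 * pvalue_to_chi2(p)) for p in null_p]
    print(f"lambda (null):     {genomic_inflation(null_p):.3f}")
    print(f"lambda (inflated): {genomic_inflation(inflated_p):.3f}")
```

Because a single λ rescales every statistic by the same factor, it is not designed to remove associations induced at particular loci; this is one plausible reason why genomic control alone did not correct the APOE-induced associations described above.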
Accommodation for APOE genotype was key to obtaining appropriate genomic control in our sample. Incorporating individual APOE genotypes, as opposed to the more typical indicator of presence or absence of ε4, gave the closest approximation to a uniform distribution of p-values over a wide range of the test results. This likely reduced false positive association results, since such control must be achieved before evidence of association is accepted. Adjustment for APOE genotype affected not only our genome-wide results but also the support for some SNP associations from previous studies. For the SNPs that were most sensitive to APOE adjustment, allele frequencies among cases differed as a function of APOE genotype, suggesting a relatively simple diagnostic for which SNPs require adjustment for APOE as part of the analysis. For such SNPs, full adjustment for APOE genotype may be critical for genomic control, in part because of allele frequency differences among populations [82], [89]. These differences could lead to structure in the ascertained sample through variability in disease risk or survival among underlying subpopulations, as seen across the subpopulations identified in this sample. Such structure may thus represent a corollary to confounding through ascertainment of cases, possibly related to the effects discussed by Voight and Pritchard [90]. Alternatively, it may represent statistical interaction resulting from population stratification, which can create mild linkage disequilibrium (LD) among many markers on different chromosomes, with the strongest such LD occurring between loci with the largest frequency differences across populations. Such genome-wide effects of population stratification have recently been demonstrated both in simulated data and in breast cancer, where SNPs in LCT showed association, detectable in cases, with SNPs across the genome, along with a similar genome-wide shift in the distribution of p-values [91]. Similar adjustment for loci with strong effects may also be important in other diseases that have such strong risk loci.
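The point that stratification can create mild LD between unlinked markers, strongest where allele frequencies differ most across populations, follows from a standard two-population mixture calculation. If a sample mixes subpopulations in proportions m and 1 − m, each in linkage equilibrium, with allele frequencies p₁, p₂ at one locus and q₁, q₂ at another, the pooled LD coefficient is D = m(1 − m)(p₁ − p₂)(q₁ − q₂). The Python sketch below computes D directly from pooled gamete frequencies and checks it against this closed form; the mixing proportion and allele frequencies are illustrative assumptions, not estimates from our sample.

```python
def _pooled_frequencies(m, p1, p2, q1, q2):
    """Pooled two-locus gamete and allele frequencies in a two-subpopulation mixture."""
    # Each subpopulation is in linkage equilibrium, so P(AB) = P(A) * P(B) within it.
    gamete_ab = m * p1 * q1 + (1 - m) * p2 * q2
    p_a = m * p1 + (1 - m) * p2
    p_b = m * q1 + (1 - m) * q2
    return gamete_ab, p_a, p_b


def mixture_ld(m: float, p1: float, p2: float, q1: float, q2: float) -> float:
    """LD coefficient D = P(AB) - P(A)P(B) in the pooled sample.

    m       fraction of the sample drawn from subpopulation 1
    p1, p2  frequency of allele A (locus 1) in subpopulations 1 and 2
    q1, q2  frequency of allele B (locus 2) in subpopulations 1 and 2
    """
    gamete_ab, p_a, p_b = _pooled_frequencies(m, p1, p2, q1, q2)
    return gamete_ab - p_a * p_b


def mixture_ld_closed_form(m, p1, p2, q1, q2):
    """Algebraic simplification of mixture_ld."""
    return m * (1 - m) * (p1 - p2) * (q1 - q2)


def mixture_r2(m, p1, p2, q1, q2):
    """Squared allelic correlation r^2 = D^2 / (pA(1-pA) pB(1-pB)) in the pooled sample."""
    _, p_a, p_b = _pooled_frequencies(m, p1, p2, q1, q2)
    d = mixture_ld(m, p1, p2, q1, q2)
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


if __name__ == "__main__":
    # Hypothetical: locus 1 differs strongly between subpopulations (0.10 vs 0.25);
    # the second, unlinked locus differs by increasing amounts.
    m = 0.5
    for q1, q2 in [(0.50, 0.50), (0.45, 0.55), (0.30, 0.70)]:
        d = mixture_ld(m, 0.10, 0.25, q1, q2)
        r2 = mixture_r2(m, 0.10, 0.25, q1, q2)
        print(f"q1={q1:.2f} q2={q2:.2f}  D={d:+.5f}  r2={r2:.6f}")
```

The product form makes the text's claim explicit: D vanishes if either locus has equal frequencies in the two subpopulations and grows with the product of the two frequency differences.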
Stratification on APOE genotype did facilitate the identification of a novel region with genome-wide significant evidence for association on chromosome 10p14, which replicated in a second sample consisting of three additional cohorts. This region was identified only in the APOE ε4/ε4 stratum or in a logistic analysis that contrasted ε4 and ε3 homozygotes in a model with an interaction term with APOE. Because ε4 homozygotes are relatively infrequent, these results will need to be investigated further in other large data sets to determine their importance. Data sets that consist of high-risk families, such as our sample and the NIMH AD sample [92], may be preferable for such analyses, since this type of ascertainment may have contributed to the detection of this locus by yielding a relatively high fraction of APOE ε4 homozygotes. It is also worth noting that an earlier linkage analysis of a subset of the families used here, based on the Illumina 6K mapping panel, obtained lod scores for rs1537626 of 2.35 in the whole sample and 1.6 in an analysis that retained only APOE ε4-positive cases. This SNP is within 10 cM of rs201119 [93]. However, rs1537626 was not on the marker panel used here, nor was rs201119 on the earlier 6K panel, preventing further comparison of results. It is also possible that analysis within the high-risk APOE ε4/ε4 genotype class improved detection of this region in the current study by increasing the within-genotype penetrance, possibly through effects on age at onset. If so, this would be similar to the strategy of identifying risk or age-at-onset modifier loci on the background of a single early-onset AD mutation [94]–[96]. The implicated region on chromosome 10p14 contains the genes CUGBP2 and PITRM1. CUGBP2 has one isoform that is expressed predominantly in neurons, and experimental evidence suggests its involvement in apoptosis in the hippocampus [97]; both observations are consistent with a role in the pathogenesis of Alzheimer's disease. PITRM1 can degrade amyloid β (derived from the amyloid β A4 precursor protein, APP) when it accumulates in mitochondria [98].
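The stratified approach described above, testing a SNP separately within each APOE genotype class rather than only by ε4 carrier status, can be illustrated with per-stratum allelic odds ratios. The Python sketch below is a simplified stand-in and not the logistic interaction model used in our analyses: it assumes allele counts have already been tallied for cases and controls within each APOE genotype, uses Woolf's standard error for the log odds ratio with a 0.5 cell correction, and all names and counts are hypothetical.

```python
import math
from statistics import NormalDist

# The six APOE genotypes formed by the e2, e3 and e4 alleles.
APOE_GENOTYPES = ("e2/e2", "e2/e3", "e2/e4", "e3/e3", "e3/e4", "e4/e4")

# Allele counts for one SNP: (case_alt, case_ref, control_alt, control_ref).
Table = tuple[int, int, int, int]
Result = tuple[float, float, float, float]  # (OR, CI low, CI high, p)


def allelic_or(table: Table) -> Result:
    """Allelic odds ratio with 95% Wald CI (Woolf) and two-sided p-value.

    Adds 0.5 to every cell (Haldane-Anscombe correction) so that sparse
    strata, such as e4/e4 in many samples, do not produce log(0).
    """
    a, b, c, d = (x + 0.5 for x in table)
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    p = 2 * NormalDist().cdf(-abs(log_or) / se)
    return math.exp(log_or), math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se), p


def by_genotype(counts: dict[str, Table]) -> dict[str, Result]:
    """One test per full APOE genotype stratum, in a fixed genotype order."""
    return {g: allelic_or(counts[g]) for g in APOE_GENOTYPES if g in counts}


def by_e4_carrier(counts: dict[str, Table]) -> dict[str, Result]:
    """The coarser alternative: pool strata by e4 carrier status before testing."""
    pooled = {"e4+": [0, 0, 0, 0], "e4-": [0, 0, 0, 0]}
    for genotype, table in counts.items():
        key = "e4+" if "e4" in genotype else "e4-"
        for i, x in enumerate(table):
            pooled[key][i] += x
    return {k: allelic_or(tuple(v)) for k, v in pooled.items()}


if __name__ == "__main__":
    # Hypothetical allele counts for one SNP; not data from this study.
    counts = {
        "e3/e3": (300, 700, 420, 980),
        "e3/e4": (260, 540, 150, 350),
        "e4/e4": (90, 110, 10, 30),
    }
    for label, (or_, lo, hi, p) in {**by_genotype(counts), **by_e4_carrier(counts)}.items():
        print(f"{label:6s} OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f}) p={p:.2g}")
```

Pooling ε3/ε4 with ε4/ε4 in the carrier analysis can dilute an effect confined to ε4 homozygotes, which is the kind of signal the genotype-specific strata are meant to expose.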
Our results both support and call into question recently proposed associations with SNPs in several genes [42]–[44]. Evidence for association with SNPs previously reported in each of BIN1, CLU, and CR1 was relatively robust to APOE adjustment within this European-American sample, and evidence for BIN1 and CR1 was also obtained in an analysis that conditioned on ethnic background. Recent reports by others that include portions of the sample used here also report evidence for association with PICALM [99], [100], but they do not report the results of quality-control analyses that would allow evaluation of the adequacy of correction for confounding. In our analyses, with correction for sources of confounding, evidence for association with SNPs in PICALM and EXOC3L2 was much less convincing than for the other three loci because of its exquisite sensitivity to APOE adjustment. One interpretation of this sensitivity, offered in an analysis that includes a subset of the current sample [99], is that the statistical interaction with APOE indicates biological interaction. However, the differences in SNP allele frequencies across APOE strata within cases that we showed here, together with evidence of population stratification, raise concerns that the original associations for these latter SNPs may reflect confounding or other aspects of sample or population structure. This could include linkage disequilibrium with APOE, even for unlinked markers. Further investigation in genetically more diverse populations will still be necessary to clarify the role even of SNPs with positive evidence for association, because shared history can lead to spurious replication in samples drawn from the same population [80].
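The case-only check referred to above, comparing a SNP's allele frequency between APOE ε4 carrier and non-carrier cases, can be sketched as a simple allelic χ² test. This minimal Python version assumes genotypes coded as 0, 1 or 2 copies of the tested allele, treats the two alleles in each person as independent (Hardy-Weinberg equilibrium), and ignores relatedness among family members, which an analysis of a family-based sample such as ours would need to handle; all names and data are hypothetical.

```python
from statistics import NormalDist


def allele_counts(genotypes: list[int]) -> tuple[int, int]:
    """Tested and other allele counts from genotypes coded 0, 1 or 2."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    tested = sum(genotypes)
    return tested, 2 * len(genotypes) - tested


def pearson_chi2_2x2(a: float, b: float, c: float, d: float) -> float:
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: carries no information about a difference
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / denom


def case_only_apoe_check(carrier_cases: list[int], noncarrier_cases: list[int]):
    """Compare a SNP's allele frequency between e4-carrier and non-carrier cases.

    Returns (freq_in_carriers, freq_in_noncarriers, chi2, p). Under the
    diagnostic suggested in the text, a small p flags a SNP whose
    case-control association may need full APOE genotype adjustment.
    """
    a, b = allele_counts(carrier_cases)
    c, d = allele_counts(noncarrier_cases)
    chi2 = pearson_chi2_2x2(a, b, c, d)
    p = 2 * NormalDist().cdf(-(chi2 ** 0.5))
    return a / (a + b), c / (c + d), chi2, p


if __name__ == "__main__":
    # Hypothetical genotypes for one SNP among cases; not data from this study.
    carriers = [0] * 300 + [1] * 160 + [2] * 40
    noncarriers = [0] * 240 + [1] * 220 + [2] * 40
    print(case_only_apoe_check(carriers, noncarriers))
```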
The results presented here and in other GWAS reports of LOAD underscore the view that such studies do not necessarily identify the specific genetic alterations contributing to disease risk. Rather, they are useful for identifying genes or gene pathways involved in disease pathogenesis or risk. In that sense, GWAS is a method of screening the genome for genes that may also contain rare variants. While the large number of subjects in current GWAS provides a benefit in terms of perceived statistical power, it comes at a price. For example, despite the very low p-values that represent genome-wide statistical significance, the effect sizes in most recent GWAS of LOAD are small. It has also been suggested that significance thresholds should vary with sample size to balance power against the false-discovery rate [101], with very large studies requiring more stringent thresholds. This means that subtle differences in the genetic architecture of either the cases or the controls become more important as sample sizes increase. In this situation, some of the “significant” differences in allele frequency may represent differences in ancestral origin rather than genotype-phenotype associations with disease, and would be unlikely to lead to further biological insight. As we have shown here, genetic variability exists within European-American groups and can affect analyses of association. Going forward, GWAS of LOAD should take greater care to control for population stratification and APOE genotype before drawing firm conclusions about associations. In this sense, bigger studies of LOAD, or of other diseases with similarly influential risk loci, may not always be better if the increases in sample size result in added data structure or confounding.
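To make concrete why subtle ancestry-related frequency differences matter more as samples grow, the sketch below uses the fact that, for fixed case and control allele frequencies, the allelic Pearson χ² statistic grows linearly with sample size. It computes the smallest number of individuals per group at which a purely ancestry-driven frequency difference would cross the conventional genome-wide threshold of 5 × 10⁻⁸. This is an idealized Python illustration that assumes equal numbers of cases and controls, allele frequencies observed without sampling error, and independent alleles within individuals; it does not implement the sample-size-dependent thresholds proposed in [101], and the frequencies are hypothetical.

```python
import math
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def expected_allelic_chi2(n_per_group: int, f_cases: float, f_controls: float) -> float:
    """Pearson 1-df chi-square if observed allele frequencies equal f_cases and f_controls.

    Assumes n_per_group cases and n_per_group controls, two alleles each.
    """
    alleles = 2 * n_per_group
    a, c = alleles * f_cases, alleles * f_controls  # counts of the tested allele
    b, d = alleles - a, alleles - c
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def allelic_pvalue(chi2: float) -> float:
    """Two-sided p-value for a 1-df chi-square statistic."""
    return 2 * NormalDist().cdf(-math.sqrt(chi2))


def min_n_for_significance(f_cases: float, f_controls: float,
                           alpha: float = GENOME_WIDE_ALPHA) -> int:
    """Smallest n per group at which the expected statistic reaches alpha.

    Scaling every table cell by k scales the Pearson statistic by k, so the
    statistic is linear in n and the threshold can be solved for directly.
    """
    if f_cases == f_controls:
        raise ValueError("no frequency difference: the statistic never grows")
    chi2_needed = NormalDist().inv_cdf(alpha / 2) ** 2
    return math.ceil(chi2_needed / expected_allelic_chi2(1, f_cases, f_controls))


if __name__ == "__main__":
    # Hypothetical ancestry-driven differences with no link to disease.
    for f_cases in (0.51, 0.52, 0.55):
        n = min_n_for_significance(f_cases, 0.50)
        print(f"{f_cases - 0.50:.2f} frequency difference -> significant at n = {n} per group")
```

Under these assumptions, a fixed frequency difference that is invisible in a small study becomes “genome-wide significant” once enough individuals are added, whether or not it has anything to do with disease.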