Alzheimer’s disease (AD) is the most common form of progressive dementia in the elderly. It is a neurodegenerative disorder characterized by the neuropathologic findings of intracellular neurofibrillary tangles and extracellular amyloid plaques that accumulate in vulnerable brain regions. AD etiology has been studied by many groups, but since the discovery of the APOE ε4 allele, no further genes have been mapped conclusively to the late-onset form of the disease. In this study, we examined genetic association with late-onset Alzheimer’s susceptibility in 738 Caucasian families with 4704 individuals and an independent case-control dataset with 296 unrelated cases and 566 unrelated controls exploring 11 candidate genes with 47 SNPs common to both samples. In addition to tests for main effects and haplotype analyses, the Multifactor Dimensionality Reduction Pedigree Disequilibrium Test (MDR-PDT) was used to search for single-locus effects as well as 2-locus and 3-locus gene-gene interactions associated with AD in the family data. We observed significant haplotype effects in ACE in both family and case-control samples using standard and cladistic haplotype models. ACE was also part of significant 2-locus and 3-locus MDR-PDT joint effects models with Alpha-2-Macroglobulin (A2M), which mediates the clearance of Aβ, and Leucine-Rich Repeat Transmembrane 3 (LRRTM3), a nested gene in Alpha-3 Catenin (CTNNA3) which binds Presenilin 1. This result did not replicate in the case-control sample, and may not be a true positive. These genes are related to amyloid beta clearance; thus this constellation of effects might constitute an axis of susceptibility for late-onset AD. The consistent ACE haplotype result between independent data sets of families and unrelated cases and controls is strong evidence in favor of ACE as a susceptibility locus for AD, and replicates results from several other studies in a very large sample.
Alzheimer’s disease (AD) (OMIM 104300, 104310) is the most common form of progressive dementia in the elderly. More than 4 million Americans are afflicted with this debilitating disorder and many studies have been conducted to elucidate an etiology [Martin et al., 2005]. The discovery of the APOE ε4 risk factor demonstrated that genetic analysis can be successful in complex disease research. However, between 42% and 68% of cases do not carry the ε4 allele [Henderson et al., 1995; Lucotte et al., 1994; Ritchie et al., 1996; Hardy et al., 2004]. Additional environmental and genetic factors likely play a role in Alzheimer’s susceptibility. Some of these putative factors are explored in the current study. Genes known to interact with presenilins, amyloid beta (Aβ) clearance, and cardiovascular disease are evaluated due to their known biological relevance.
Amyloid precursor proteins and presenilins influence autosomal dominant, early-onset disease due to altered Amyloid Protein Precursor processing, leading to Aβ deposition [Goate, 2006; Hardy, 1997; Levy-Lahad et al., 1995; Rogaev et al., 1995; Sherrington et al., 1995]. These mutations have not been shown to influence late-onset susceptibility, which is far more prevalent. They do, however, provide potential insight into the pathophysiology of the disorder.
Fourteen years after the discovery of APOE, single-locus approaches by many groups have not discovered any additional candidates consistently associating with late-onset AD (LOAD). The ε4 allele of APOE causes increased risk for AD, while the ε2 allele is protective [Chartier-Harlin et al., 1994; Corder et al., 1993]. The mechanism by which APOE ε4 influences risk of AD is unknown, but is likely related to Aβ processing [Bales et al., 1999]. This inability to unravel the mechanism underlying the trait, given steadily increasing ascertainment and genotyping capability, illustrates the difficulty of finding LOAD genes.
A gene that has been carefully studied for association with LOAD with inconsistent results is angiotensin converting enzyme (ACE) [Alvarez et al., 1999a; Kehoe et al., 1999; Scacchi et al., 1998; Chapman et al., 1998]. Large meta-analyses of ACE markers support the hypothesis that ACE is a susceptibility locus for AD [Bertram et al., 2007a; Lehmann et al., 2005]. ACE functions in several biological systems that may be related to AD, such as the renin-angiotensin system regulating salt homeostasis [Reid, 1992] and Aβ degradation pathways [Hemming and Selkoe, 2005; Hu et al., 2001].
Increased ACE activity and expression have also been associated with AD patients, and ACE isoform I accumulates perivascularly in cases of severe cerebral amyloid angiopathy [Miners et al., 2008a]. The N-terminal catalytic domain of ACE has recently been indicated as primarily responsible for Aβ degradation [Oba et al., 2005]. Other studies have suggested the C-terminal domain also participates in Aβ catabolism [Sun et al., 2008]. Additionally, species of Aβ that are associated with aging are selectively degraded by ACE [Toropygin et al., 2008], and ACE has been demonstrated to cleave the putatively pathogenic form of Aβ-42 to the more benign Aβ-40 [Zou et al., 2007]. The statistical and molecular evidence for ACE as an AD locus has motivated the discussion of ACE inhibitors as therapeutics for AD in the literature [Kehoe and Wilcock, 2007; Miners et al., 2008b; Nalivaeva et al., 2008], as well as studies evaluating ACE inhibitors in vivo in both animal models of Aβ deposition [Eckman et al., 2006; Hemming and Selkoe, 2005; Hemming et al., 2007] and clinical trials for the effect of ACE inhibitors on dementia and AD [Hanon and Forette, 2004; Khachaturian et al., 2006; Ohrui et al., 2004; Tzourio et al., 2003].
Some other genes where previous associations have been observed but inconsistently replicated are alpha-2-macroglobulin (A2M) [Blacker et al., 1998; Rogaeva et al., 1999; Alvarez et al., 1999b; Blennow et al., 2000] and alpha-T-catenin (CTNNA3) [Bertram et al., 2007b; Busby et al., 2004; Li et al., 2008; Martin et al., 2005], and a nested gene in CTNNA3 leucine-rich repeat transmembrane protein 3 (LRRTM3) [Ertekin-Taner et al., 2003; Martin et al., 2005]. A2M has protease inhibitor activity [Bergqvist and Nilsson, 1979], and mediates the clearance of Aβ deposits. CTNNA3 binds to beta catenin which then interacts with presenilin 1, which has been associated with early-onset AD [Sherrington et al., 1995]. There have also been previous reports of linkage to late-onset AD at the CTNNA3/LRRTM3 locus in addition to CTNNA3 association with Aβ-42 levels in late-onset AD families [Ertekin-Taner et al., 2000; Ertekin-Taner et al., 2003].
ACE, A2M and LRRTM3 were also found as the best multilocus model by Multifactor Dimensionality Reduction Pedigree Disequilibrium Test (MDR-PDT) analysis of our family sample. Previous analyses of the family samples presented here, restricted only to CTNNA3/LRRTM3 and APOE, detected a multilocus model with a synergistic effect [Martin et al., 2005; Martin et al., 2006]. Here we expand this analysis to several more candidate genes, including some previously studied by our group for main effects in samples with various degrees of overlap with these, such as A2M [Rogaeva et al., 1999], LRRTM3/CTNNA3 [Liang et al., 2007; Martin et al., 2005; Martin et al., 2006], and LRP1 [Scott et al., 1998].
Undiscovered gene-gene interactions associating with LOAD could explain why the search for LOAD loci since the APOE discovery has been relatively fruitless [Ioannidis, 2007]. Such interactive effects can exist without the presence of substantial main effects, making detection with single-locus analysis unlikely [Hirschhorn et al., 2002]. Methods for the analyses of large interaction search spaces are now available and were applied here for family and case-control data [Martin et al., 2006; Ritchie et al., 2001; Ritchie et al., 2003]. Due to strong biological and epidemiological evidence of a genetic etiology for LOAD but lack of consistent single-locus findings, LOAD would appear to be an ideal trait to begin a search for epistasis among past candidates.
The goal of this study is to explore effects explaining LOAD through single-locus analysis, haplotypes, and epistatic gene-gene interactions among candidate gene SNPs.
The data for this study consisted of genotypes in both a family sample and an independent case-control sample. The family sample contains 738 families consisting of 4704 individuals collected through three ascertainment groups: the Collaborative Alzheimer Project (CAP: The Joseph and Kathleen Bryan ADRC and the Center for Human Genetics at Duke University, the Center for Human Genetics Research at Vanderbilt University Medical Center, and the University of California at Los Angeles Neuro-psychiatric Institute); National Institutes of Mental Health (NIMH); and the National Cell Repository for AD at Indiana University Medical Center (IU). The family sample is described in Table 1. The singleton dataset contains 158 families with one sampled affected family member and any number of unaffected siblings. The multiplex dataset contains 580 families with at least two sampled affected family members.
All affected individuals met the NINDS/ADRDA criteria for probable or definite AD. Unaffected relatives from the CAP and NIMH sites showed no signs of dementia upon examination. Unaffected individuals from IU were classified based on self report. The mean (SD) age at onset (AAO) in affected individuals was 72.31 (9.09) years, and the mean (SD) age at examination (AAE) was 74.82 (11.02) years. For more information on the family sample, see [Martin et al., 2005].
The case-control dataset consisted of 296 unrelated cases and 566 unrelated controls independent of the family data. The mean (SD) age at exam was 79.02 (6.76) years for cases and 73.63 (6.30) years for controls. The mean (SD) age of onset for cases was 71.78 (7.82) years. The ages of onset in cases and the ages at exam in controls were not significantly different. Unrelated cases were determined to be affected by examination based on the same criteria as the cases in the family data. Priority for selection was given to cases where age of onset was known, Parkinson’s disease (PD) was not present, depression status was known, and documentation proving AD was available. Unaffected controls required unaffected status confirmed by examination, no first-degree relatives with AD, no PD, otherwise no dementia, and adequate DNA for genotyping. Unrelated cases and controls were collected at the Center for Human Genetics at Duke University and the Center for Human Genetics Research at Vanderbilt University Medical Center. Also ascertained in the case-control data was hypertension status, which was measured by survey as having ever been diagnosed with hypertension.
The list of SNPs selected for this study is shown in Table 2. The rationale for including each gene in the list of candidates is detailed in Table 3. The SNPs were designed to be genotyped on the Applied Biosystems, Taqman 7900HT allelic discrimination system and were either custom (Assay by Design) or inventoried (Assay on Demand) assays. All genotyping reactions were run according to the standard genotyping methods as outlined by Applied Biosystems protocols and were performed on 3ng of genomic DNA per reaction. All SNPs were held to a minimum genotyping efficiency of 95%. Quality control was performed on the SNPs by using matched pairs of quality control samples placed within and between the 384 well plates. Laboratory technicians were blinded to the matching pattern, affection status, and pedigree information. In the family sample 48 SNPs were genotyped, including A2M SNP rs1800433, which was not genotyped in the case-control data. In the case-control sample, 55 SNPs were genotyped, including SNPs not genotyped in the family data in AGT, NCSTN and A2MP. These differences are detailed in Table 2.
To examine association between alleles and genotypes and AD in the family data, the Pedigree Disequilibrium Test (PDT) and genotype Pedigree Disequilibrium Test (genoPDT) sum statistics were used [Martin et al., 2000; Martin et al., 2003a]. The PDT statistics measure transmission of alleles or genotypes to affected offspring from informative pedigrees, testing for excessive transmission of particular alleles or genotypes.
An informative pedigree is either a nuclear family with at least one affected child, both parents genotyped at the locus and at least one heterozygous parent, a discordant sibling pair (DSP) with different genotypes at the locus with or without parental genotypes, or an extended pedigree with at least one informative nuclear family or DSP. Most information about association with the trait in these data comes from DSPs due to the late onset of AD.
The MDR-PDT is a within-family measure of multilocus association between genotypes and phenotype [Martin et al., 2006]. The genoPDT statistic functions within the framework of the Multifactor Dimensionality Reduction (MDR) algorithm [Hahn et al., 2003; Ritchie et al., 2001; Ritchie et al., 2003] by establishing which multilocus genotypes are positively associated with the outcome of interest. Positive values of the genoPDT test statistic classify multilocus genotypes as high-risk, creating a binary variable useful for summarizing the association at a multilocus model, and retaining the useful property of robustness to population stratification.
To examine whether the best model of a given order that is observed by MDR-PDT is a real signal or is the result of sampling error, a permutation test is conducted. The permutation test consists of randomizing status for offspring, holding the proportion of affected individuals constant within sibships across permutations, calculating the statistic, and repeating many times to estimate the distribution of the null hypothesis. The test based on the permutation procedure should have the correct type I error, even for sparse data. This validity is due to all contingency table cells from each permutation containing the same number of observations.
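The permutation scheme described above can be illustrated with a minimal Python sketch. It is not the MDR-PDT implementation: the statistic used here is a hypothetical placeholder (an allele-count excess in affected siblings) rather than the genoPDT statistic, the data are invented, and the function names and the (hits + 1) / (n_perm + 1) p-value estimator are assumptions made for illustration. The only feature taken from the text is that status is shuffled within sibships, so the number of affected individuals in each sibship is unchanged.

```python
import random

def permute_within_sibships(statuses_by_sibship, rng):
    """Shuffle 0/1 affection status within each sibship.

    Shuffling inside a sibship keeps its number of affected members fixed.
    """
    permuted = []
    for statuses in statuses_by_sibship:
        shuffled = list(statuses)
        rng.shuffle(shuffled)
        permuted.append(shuffled)
    return permuted

def toy_statistic(genotypes_by_sibship, statuses_by_sibship):
    """Placeholder statistic (NOT the genoPDT): allele-count excess in affected sibs."""
    total = 0
    for genotypes, statuses in zip(genotypes_by_sibship, statuses_by_sibship):
        for g, s in zip(genotypes, statuses):
            total += g if s == 1 else -g
    return total

def permutation_p_value(genotypes_by_sibship, statuses_by_sibship,
                        n_perm=1000, seed=0):
    """Estimate how often a permuted statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = toy_statistic(genotypes_by_sibship, statuses_by_sibship)
    hits = 0
    for _ in range(n_perm):
        permuted = permute_within_sibships(statuses_by_sibship, rng)
        if toy_statistic(genotypes_by_sibship, permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Invented example: minor-allele counts and status for three sibships.
genotypes = [[2, 0], [1, 0, 0], [2, 1]]
statuses = [[1, 0], [1, 0, 0], [1, 0]]
p = permutation_p_value(genotypes, statuses)
```

In a search over many candidate models, each permutation would normally record the best statistic across all models rather than the statistic of one fixed model; the sketch omits that step.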
For the MDR-PDT analyses, tag SNPs in the family data were chosen using Tagger, a function within the Haploview software package [Barrett et al., 2005; Gabriel et al., 2002], to remove redundant variables, which reduce the power of MDR-PDT. An r² threshold of 0.8 and LOD of 3 were used to choose tag SNPs in order to eliminate nearby markers with very similar information and maximize power for MDR-PDT analysis. The abridged data contained 32 of the original 47 markers.
For single-locus effect size evaluation, the referent group was the major allele homozygote. The adjustment of Siegmund et al. [Siegmund et al., 2000] was implemented to correct confidence intervals for familial correlation in regions of linkage.
The Association in the Presence of Linkage (APL) statistic [Chung et al., 2006; Martin et al., 2003b] was employed to measure haplotype associations in family data. APL measures the difference in the number of copies of an allele or haplotype in affected offspring from the expected number of copies under the null hypothesis of no association conditional on parental genotypes. APL uses nuclear families with at least one affected offspring. When parental genotypes are missing, they are inferred using the expected probabilities of consistent parental mating types. APL correctly adjusts for correlated transmissions to multiple affected siblings by estimating IBD probabilities. The probability IBD 0, 1, 2 and the haplotype frequency are estimated by EM algorithm [Clark, 1990; Excoffier and Slatkin, 1995; Long et al., 1995].
To estimate the variance of the APL statistic, a bootstrapping approach is used [Chung et al., 2006]. Bootstrap samples are taken with replacement across families, forming same-size pseudosamples consisting of replicates of some families and missing others at random. The variance of the APL statistic calculated for all pseudosamples is the estimated sampling variance for the statistic. This variance can be used to test the null hypothesis of no association allowing for the presence of linkage.
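The resampling step can be sketched as follows. This is a generic family-level bootstrap under stated assumptions, not the APL software: each family is represented by a single hypothetical number, the statistic is passed in as an arbitrary function, and the names and default values are illustrative only.

```python
import random
import statistics

def bootstrap_variance(families, statistic, n_boot=500, seed=0):
    """Variance of `statistic` over pseudosamples of whole families.

    Each pseudosample draws len(families) families with replacement, so some
    families appear more than once and others are left out.
    """
    rng = random.Random(seed)
    n = len(families)
    values = []
    for _ in range(n_boot):
        pseudosample = [families[rng.randrange(n)] for _ in range(n)]
        values.append(statistic(pseudosample))
    return statistics.variance(values)

# Invented per-family contributions to some statistic.
family_scores = [0.4, -0.1, 0.7, 0.0, 0.3, -0.2, 0.5]
var_hat = bootstrap_variance(family_scores, statistic=sum)
```

Resampling whole families rather than individuals keeps within-family correlation intact in every pseudosample, which is the reason the unit of resampling is the family.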
To test for association of sex with genotypes in the case-control data chi-squared or Fisher’s exact tests of differences between frequencies of alleles and genotypes between sexes were performed in controls at each marker. This test should detect where sampling error has distorted the distribution of alleles or genotypes by sex at autosomal markers. Since there is a difference in prevalence by sex in AD, such a scenario in the data could cause confounding. If the genotype frequency tests were significant at the 0.05 level, then sex-stratified chi-squared or Fisher’s exact tests of Hardy-Weinberg Equilibrium (HWE) and association with disease at alleles and genotypes were performed.
Sex stratification, single site allele and genotype frequency and association, and HWE analyses in controls were performed using Powermarker statistical software [Zaykin et al., 2002]. Where the number of observations for a cell from the 3×2 table stratifying the data by genotype and status was five or less, Fisher’s exact test was used to assess HWE and association with LOAD.
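As an illustration of the HWE check, the following sketch computes the one-degree-of-freedom chi-squared goodness-of-fit test from genotype counts. It covers only the chi-squared case, not Fisher’s exact test, and it is not the Powermarker implementation; the function name and the example counts are hypothetical.

```python
import math

def hwe_chi_squared(n_aa, n_ab, n_bb):
    """Chi-squared test of Hardy-Weinberg proportions from genotype counts.

    Returns (chi2, p_value) with one degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

chi2, p_value = hwe_chi_squared(25, 50, 25)
```

The sketch assumes both alleles are observed; a monomorphic marker would give zero expected counts and must be excluded before calling it.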
Multifactor dimensionality reduction (MDR) [Hahn et al., 2003; Ritchie et al., 2001; Ritchie et al., 2003] was used to search for interactions in the case-control data. MDR exhaustively screens all possible interactions and ranks results by the signal detected by balanced accuracy and cross-validation consistency in case-control data to find models with the most potential to be real interactions. MDR has performed well across many genetic simulation scenarios where purely epistatic relationships existed between status and a set of variables with an absence of main effects [Ritchie et al., 2003].
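The core classification step of MDR for one candidate set of loci can be sketched as below. This is a simplified illustration, not the MDR software: it omits cross-validation and the exhaustive search over locus combinations, it labels a multilocus genotype high-risk when its case:control ratio exceeds the overall ratio (an assumed threshold), and the function name and data are invented.

```python
from collections import Counter

def mdr_balanced_accuracy(genotypes, status, loci):
    """Label multilocus genotypes high-risk and score the resulting classifier.

    genotypes: one tuple of genotype codes per individual.
    status: 1 for cases, 0 for controls (both groups must be non-empty).
    loci: indices of the markers forming the candidate model.
    """
    cases, controls = Counter(), Counter()
    for row, s in zip(genotypes, status):
        key = tuple(row[i] for i in loci)
        (cases if s == 1 else controls)[key] += 1
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    # High-risk: case:control ratio in the cell above the overall ratio.
    high_risk = {
        key for key in set(cases) | set(controls)
        if cases[key] * n_controls > controls[key] * n_cases
    }
    sensitivity = sum(cases[k] for k in high_risk) / n_cases
    specificity = sum(c for k, c in controls.items() if k not in high_risk) / n_controls
    return high_risk, (sensitivity + specificity) / 2

# Invented purely epistatic pattern: status depends on both loci jointly.
geno = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
stat = [0, 1, 1, 0] * 5
cells, acc_pair = mdr_balanced_accuracy(geno, stat, loci=(0, 1))
_, acc_single = mdr_balanced_accuracy(geno, stat, loci=(0,))
```

In this toy pattern the 2-locus model classifies perfectly while either locus alone is uninformative, which is the situation without main effects that the paragraph above describes.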
To estimate single-locus effect sizes in parallel with the family data, effect sizes in case-control data were estimated using logistic regression using the major allele homozygote as the referent group [Stata Corp, 2005].
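For a single genotype compared against the referent genotype with no covariates, the logistic regression odds ratio reduces to the cross-product ratio of a 2×2 table. The sketch below shows that special case with a Wald 95% confidence interval; the counts are invented, the function name is hypothetical, and the adjusted analyses in this study would require a full regression fit rather than this shortcut.

```python
import math

def odds_ratio_wald_ci(case_exposed, control_exposed,
                       case_referent, control_referent, z=1.96):
    """Unadjusted odds ratio versus the referent genotype with a Wald CI."""
    odds_ratio = (case_exposed * control_referent) / (control_exposed * case_referent)
    se_log_or = math.sqrt(1 / case_exposed + 1 / control_exposed
                          + 1 / case_referent + 1 / control_referent)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Invented counts: heterozygotes versus major allele homozygotes.
or_hat, ci_low, ci_high = odds_ratio_wald_ci(20, 10, 10, 20)
```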
Haplotype analyses for case-control data were performed using the haplo.cc and haplo.glm functions in Haplo.Stats [Schaid et al., 2002]. A 3-marker sliding window was run to identify associations among correlated sets of markers. Full haplotypes were tested and haplotype exposure odds ratios were estimated using the most frequent haplotype as the referent group.
The website SNPer [Riva and Kohane, 2004] and Entrez PubMed were used to collect information on candidate genes and genotyped markers. Online Inheritance in Man (OMIM) [OMIM, 2008] was used to collect information about the phenotype and candidate genes. The Alzgene database at www.Alzgene.org [Bertram et al., 2007a] was also used to collect information about AD association studies.
Multiple testing was accounted for depending on the type of analysis. MDR and MDR-PDT both inherently correct for the search conducted with permutation testing. Multiple tests of main effects were corrected using Nyholt’s method [Nyholt, 2004] with the modification of [Li and Ji, 2005]. The effective number of tests for the 47 markers which were in both datasets was 28.9 for the founders from family data and 29.4 for the controls from the case-control data, showing the similarity of correlation among these independent samples. There were 7 more tests in the case-control data than in the family sample, and the effective number of independent tests considering all markers for case-control and family samples was 34.6 and 29.9, respectively. These effective numbers of tests for each dataset lead to thresholds for significance of p≤0.0015 for the case-control data and p≤0.0017 for the family data. To reject the null for any test from either sample, the threshold is p≤0.00079.
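The reported thresholds are consistent with a Sidak correction at an experiment-wise α of 0.05 applied to the effective numbers of tests. The sketch below reproduces that arithmetic; the value of α and the use of the summed count (34.6 + 29.9 = 64.5) for the combined threshold are assumptions inferred from the numbers, not statements from the text.

```python
def sidak_threshold(effective_tests, alpha=0.05):
    """Per-test significance level giving experiment-wise error `alpha`."""
    return 1.0 - (1.0 - alpha) ** (1.0 / effective_tests)

case_control = sidak_threshold(34.6)      # about 0.0015
family = sidak_threshold(29.9)            # about 0.0017
combined = sidak_threshold(34.6 + 29.9)   # about 0.00079
```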
For purposes of assessing significance where two tests have been performed for the same null hypothesis in independent samples, we used Fisher’s method [Fisher, 1950] to merge p-values from the same SNP in different samples. We then compared the merged p-value to the threshold for significance given the effective number of independent tests established by SNPSpD. This threshold is determined by the Sidak correction for multiple tests [Sidak, 1967] which is slightly more liberal than the Bonferroni correction but provides the exact correction necessary to return the experiment-wise error rate to the desired level.
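Fisher’s method combines k independent p-values through the statistic X = −2 Σ ln p, which follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a minimal standard-library version that uses the closed-form tail probability available for even degrees of freedom; the function name and the example p-values are illustrative and are not results from this study.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Uses the closed-form chi-squared survival function for 2k degrees of
    freedom: exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)   # equals X / 2
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total

# Two independent tests of the same null hypothesis (invented values).
merged = fisher_combined_p([0.05, 0.05])
```

For two p-values the result simplifies to p1·p2·(1 − ln(p1·p2)), so two tests each at 0.05 merge to roughly 0.017.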
Clade-based haplotype analysis was conducted using markers rs4291 and rs4343 as suggested in Katzov et al [Katzov et al., 2004]. These markers denote ancestral haplotype clades A, B and C which have been previously associated with circulating levels of ACE [Keavney et al., 1998; Rieder et al., 1999; Soubrier et al., 2002]. A straightforward cladistic model of the ACE locus was proposed by Farrall et al., in which the variation in the gene could be captured using only a few markers [Farrall et al., 1999]. We analyzed these markers for haplotype association with LOAD.
For the family dataset, single-locus associations were examined with allele and genotype PDT statistics. These results are presented in Table 4. Seventeen SNPs from the family data yielded a p-value less than 0.05 at tests of either alleles or genotypes. Of these, only APOE is statistically significant after accounting for multiple tests. The PZP SNP rs12230214 (C/G), a nonsynonymous L/V change located in exon 11 was nominally associated with LOAD at genotypes (allele p = 0.18, genotype p = 0.05). Two LRP1 SNPs, rs9669595 (A/G), located in intron 65 (allele p = 0.02, genotype p = 0.04), and rs7956957 (G/C), located in intron 78 (allele p = 0.08, genotype p = 0.02) were nominally associated with LOAD. Two CTNNA3 intron 13 SNPs, rs7911820 (G/T) (allele p = 0.02, genotype p = 0.03) and rs7074454 (C/T) (allele p = 0.01, genotype p = 0.02) and 1 intron 14 SNP, rs12357560 (T/C) (allele p = 0.06, genotype p = 0.03) were nominally associated. One LRRTM3 intron 7 SNP, rs1925617 (T/G) was nominally associated with LOAD (allele p = 0.65, genotype p = 0.01). One NCSTN intron 2 SNP, rs2038781 (G/C) was nominally associated (allele p = 0.05, genotype p = 0.04). rs3832852 (ins/del), a 5-base insertion in A2M that spans the upstream splice site for exon 18 (allele p = 0.01, genotype p = 0.01) showed a nominal association. The APOE allele ε4 was highly significantly associated with AD in both allele and genotype tests (allele p < 0.001, genotype p < 0.001). The remaining seven markers nominally significantly associated with disease at alleles and genotypes were all found in ACE. 
The ACE markers were: rs4291 (A/T), 239 base pairs upstream of exon 1 (allele p = 0.02, genotype p = 0.01); rs4295 (G/C), an intron 2 marker (allele p = 0.07, genotype p = 0.03); rs4311 (C/T), an intron 9 marker (allele p = 0.1, genotype p = 0.03); rs4646994 (del/ins), a 287bp indel in intron 16 (allele p = 0.02, genotype p = 0.07); rs4343 (A/G), a synonymous coding SNP in exon 16 (allele p = 0.01, genotype p = 0.03); rs4353 (A/G), a marker in intron 19 (allele p = 0.04, genotype p = 0.04); and rs4978 (C/T), a synonymous coding SNP in exon 23 (allele p = 0.01, genotype p = 0.01).
Conditional logistic regression was run to estimate the effect sizes observed in the family sample among those markers that were nominally significant at either alleles or genotypes in families or case-control samples. Of note are effect size estimates for markers that were at least nominally significantly associated in the case-control sample (Table 4, and described below). These estimates attempt to remedy the bias encountered when effect size estimation and association detection are performed on the same data. The major allele homozygote was used as the referent group for these analyses. These results are detailed in Figure 2. APOE had a very strong effect in these data for the ε4 homozygote (OR = 31.1 95% CI = 7.37–130) and the ε4 heterozygote (OR = 4.57 95% CI = 3.28–6.57). Other than APOE, seven nominally significant single-locus genotype effects in five genes were observed in the family data. The PZP marker rs12230214 (OR = 1.42, 95% CI = 1.03–1.95) had a nominally significant effect for the CG heterozygote. Two CTNNA3 markers showed a nominally significant effect: rs12357560 (OR 1.37, 95% CI = 1–1.87) for the TC heterozygote and rs7074454 (OR 0.69, 95% CI = 0.48–0.99) for the TC heterozygote. The LRRTM3 marker rs1925617 (OR 0.619, 95% CI = 0.44–0.87) had a nominally significant effect estimated for the TG heterozygote. The A2M marker rs3832852 (OR = 1.81, 95% CI = 1.25–2.64) had a nominally significant effect estimate for the splice site deletion heterozygote. Two markers in ACE had nominally significant effect estimates. They were rs4291 (OR = 0.48, 95% CI = 0.21–1.0) for the A allele homozygote and (OR = 0.64, 95% CI = 0.47–0.88) for the AT heterozygote, and rs4295 (OR = 0.62, 95% CI = 0.45–0.85) for the GC heterozygote.
The results of tests at single loci from the case-control data are in Table 4. Three markers in 2 genes significantly deviated from HWE in controls. One was the PZP marker rs12230214, minor allele frequency (MAF): 0.28 (p = 0.01). The AGT markers rs5050, MAF: 0.16 (p = 0.04) and rs4762, MAF: 0.123 (p = 0.01) also significantly deviated from HWE.
Allele and genotype frequency differences among controls between males and females were significant at 4 markers in 4 genes. These tests were conducted to make observations regarding potential confounding by sex where sampling error had caused association of autosomal alleles and genotypes with sex in controls. Such spurious associations in the data might lead to confounding since there is an association between sex and AD.
The PZP marker rs12230214 (allele p = 0.01, genotype p = 0.05), LRP1 marker rs1800127 (allele p = 0.03, genotype p = 0.03), LRRTM3 marker rs942780 (allele p = 0.01, genotype p = 0.02), and A2M marker rs3832852 (allele p = 0.01, genotype p = 0.02) had significantly different frequencies by sex in controls at both alleles and genotypes. Each of these markers was tested separately in males and females for HWE and allele and genotype frequency differences between cases and controls. Among these tests, significant deviations from HWE were found in control females for PZP marker rs12230214 (p = 0.02).
Nominally significant single-locus differences in allele or genotype frequency between cases and controls were observed at 3 markers in 2 genes. One marker in ACE was nominally significantly associated with disease at alleles: rs4343 (A/G), MAF: 0.46, a synonymous SNP in exon 16 (allele p = 0.05, genotype p = 0.14). Two markers in CTNNA3 were nominally significantly associated with disease: rs6480140 (A/C), MAF: 0.37, a SNP in intron 14 (allele p = 0.61, genotype p = 0.01), and rs997225 (A/G), a SNP in intron 10 (allele p = 0.03, genotype p = 0.08). The APOE marker, MAF: 0.09 (allele p < 0.001, genotype p < 0.001), was statistically significantly associated with disease.
Merged p-values for the family and case-control single-locus tests were analyzed using Fisher’s method [Fisher, 1950]. This approach allows for the evidence against the null hypothesis across tests to be combined into a single statistic for each null hypothesis. Global p-values were used from the family-based tests on genotypes. The results of this analysis are presented in Supplementary Table 1. Several markers in ACE were nominally significant at alleles and trending at genotypes. CTNNA3 marker rs7074454 was also nominally significant at alleles and trending at genotypes. A2M SNP rs3832852 was nominally significant at alleles and genotypes. Both of these SNPs had effect size estimates which indicated opposite risk alleles (Figure 2, Figure 3). Again, only APOE survived a correction for multiple tests.
To estimate the effect of each significant finding from family or case-control data, odds ratios and 95% confidence intervals using the major allele homozygote as the referent group were estimated in the case-control data using logistic regression from the STATA statistical software package (STATA). Since the markers associating with LOAD did not significantly differ in genotype frequency by sex, no adjustment for confounding by sex was performed. Also, since no difference was detected between age of onset in cases and age at exam in controls, no adjustment for age was performed. These results are presented in Figure 3. Of note in these results are those loci which demonstrated nominal association in the family sample. Statistically significant effects were detected at APOE ε4 homozygotes (OR 16.1 95% CI = 8.6–30.2) and APOE ε4 heterozygotes (OR 4.55 95% CI = 3.28–6.29). Nominally significant effects were estimated at the CTNNA3 SNP rs997225 GA heterozygote (OR 1.39 95% CI = 1.02–1.89) and ACE SNP rs4343 for the minor allele homozygote (OR 1.49 95% CI = 1.0–2.23). Markers in ACE were also assessed for effect size adjusted for hypertension status. No regression term for any marker was statistically significant in that analysis, but the OR point estimates did not change, indicating that hypertension was not a confounder for those variables.
Haplotype analysis was performed across all candidate markers in pairwise LD as defined by a D’ of 0.95 or greater in the family data with APL using a 3-locus sliding window. These tests identified overlapping 3-locus haplotypes in the ACE gene that were significantly associated with AD in the family data set. Results of this procedure are in Table 5a. These results suggest a consistent signal of association with disease on a common haplotype background throughout these ACE markers. This signal is from a chromosome containing an array of minor alleles at each of these markers. This diffuse association signal is detectable at each individual marker, but this phenomenon is also observed in the case-control data, which makes the family result worthy of note. Also, the p-values observed at these overlapping 3-locus haplotypes are smaller than those for most of the single-locus statistics.
In the family data, the ACE gene contained several significant markers and overlapping associated haplotypes. No other regions in the family sample contained significant haplotypes. To follow up this observation, and to validate the haplotype findings in ACE from the family data, the Haplo.Stats software package was used to estimate haplotype frequencies and test haplotype associations in ACE in case-control data. Haplo.Stats uses an EM algorithm to estimate haplotype frequencies from unphased genotype data. The results of both sets of tests in family and case-control data are presented in Tables 5a and 5b. A sliding window scan of the markers in ACE, analogous to that performed in the family data, was conducted among markers in strong LD (r² > 0.9, D’ > 0.95) in both datasets. This scan yielded an odds ratio and 95% confidence interval for all haplotypes versus the most common haplotype, and a chi-squared test for haplotype frequency differences between cases and controls. Every 3-locus haplotype in ACE between rs4311 and rs4978 had a chi-squared p-value < 0.05. The 2-locus haplotype including rs4291 and rs4295 was not significant in either dataset. The 6-locus haplotype including rs4311, rs4329, rs4646994, rs4343, rs4353, and rs4978 had an OR estimate very close to 1.2 and 95% confidence intervals at approximately 1.0–1.5, which was very similar to those estimates for the 3-locus sliding window through that region. This indicates that chromosomes in this area of the gene tend to be either all major or minor alleles with little recombination in two primary haplotypes.
Clade-based haplotype analysis was conducted using markers rs4291 and rs4343 as suggested in Katzov et al [Katzov et al., 2004]. We applied this model to our samples and further evaluated haplotype association to LOAD, observing similar results as the previous study using this approach [Katzov et al., 2004]. For exposure to the 2-marker haplotype corresponding with clade A (frequency: 0.35, A-A) versus clades B (frequency: 0.46, T-G) and C (frequency: 0.18, A-G), the p-value in the family sample was statistically significant at p = 0.0004, and for the case-control data, p = 0.029 (OR = 1.3, 95% CI = 1.06–1.54).
Having explored single-locus main effects at alleles, genotypes, and haplotypes, we began a search for multi-locus signals significantly associated with disease using the MDR-PDT in family data and MDR in case-control data. The MDR-PDT models are presented in Table 6 and Figures 4a and 4b. MDR and MDR-PDT were run with all markers, and every model including the APOE marker was highly significant by the permutation test. Since the strength of the APOE signal obscured other potentially interesting multilocus models, the APOE locus was excluded from the search and a subset of tag SNPs was chosen from the data. Haplotype tag SNPs were chosen using the Tagger function of the Haploview software (r² = 0.8, LOD = 3), and the best models were found by MDR and MDR-PDT. The best 2-locus and 3-locus models from the full data without APOE contained the same markers as those chosen from the tag SNP data for the MDR-PDT. This indicated that MDR-PDT detected the signal at these models in the full data, but that the known loss of power with increasing numbers of markers caused the failure to reject the null hypothesis there. No MDR model was significant by the permutation test. The best MDR model was a 3-locus model including LRP1 SNP rs1800165, PZP SNP rs3213831, and PZP SNP rs10842971 (CVC 2/5, PE 43.41, p-value = 0.34). Two significant signals were found by MDR-PDT. The 2-locus model included rs1925617 in LRRTM3 and rs4295 in ACE (MDR-PDT statistic p < 0.001). The 3-locus model included rs1925617 in LRRTM3, rs4291 in ACE, and rs1800433 in A2M (MDR-PDT statistic p < 0.001).
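The core of the case-control MDR procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the software used in this study: it computes only the training classification error of each candidate model and omits the cross-validation (which yields the cross-validation consistency and prediction error reported above) and the permutation test. The family-based MDR-PDT, which replaces case and control counts with a PDT statistic, is not shown. The function names and the 0/1/2 genotype coding are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def mdr_error(genotypes, status, loci):
    """Training classification error of one MDR model.

    genotypes: list of per-individual genotype tuples (0/1/2 per SNP)
    status:    list of 1 (case) or 0 (control), same order as genotypes;
               must contain at least one control
    loci:      tuple of SNP column indices that define the model
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio

    # Count cases and controls in each multilocus genotype cell.
    cells = defaultdict(lambda: [0, 0])
    for g, s in zip(genotypes, status):
        key = tuple(g[i] for i in loci)
        cells[key][0 if s == 1 else 1] += 1

    # A cell is "high risk" if its case:control ratio meets the threshold;
    # controls in high-risk cells and cases in low-risk cells are errors.
    errors = 0
    for cases, controls in cells.values():
        high_risk = cases >= threshold * controls
        errors += controls if high_risk else cases
    return errors / len(status)

def best_model(genotypes, status, k):
    """Exhaustively search all k-locus models; return (loci, error)."""
    n_snps = len(genotypes[0])
    candidates = combinations(range(n_snps), k)
    best = min(candidates, key=lambda loci: mdr_error(genotypes, status, loci))
    return best, mdr_error(genotypes, status, best)
```

Because the number of candidate models grows combinatorially with the number of markers, reducing the marker set to tag SNPs, as described above, shrinks the search space considerably.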
These results highlight the ACE gene as a risk factor in LOAD. In both family and case-control samples, significant associations were observed when considering ACE haplotypes. Notably, in the case-control samples only one single-locus test, the test of allelic frequency differences at rs4343, was marginally significant, but the haplotype tests on specific, overlapping sets of alleles were nominally significant. This result at rs4343 is consistent with previous work in ACE [Katzov et al., 2004; Kehoe et al., 2003]. The p-values from the family data haplotype analysis were also smaller than those from the single-locus analysis, suggesting that the variation influencing LOAD was captured more efficiently when the entire region was considered. When using markers that denote ancestral clades in European populations, the association signals were even stronger, which is consistent with the results of Katzov et al [Katzov et al., 2004]. This finding strongly supports the existence of a genetic background at the ACE locus in Caucasians of European descent that is associated with LOAD.
Evidence for ACE association was found across many studies in recent meta-analyses of association results [Bertram et al., 2007a; Lehmann et al., 2005]. Haplotype associations have also been previously observed in ACE for AD in five independent case-control samples [Kehoe et al., 2003], including a large Swedish sample [Katzov et al., 2004], and in an inbred Israeli Arab sample [Meng et al., 2006]. In the Meng et al paper, the haplotype distribution was quite different from that observed in our samples and the alleles in the associated haplotype were opposite to those reported here. Our results agree with the Kehoe et al study, with regard to both approximate effect size and haplotypes, and with the Katzov et al study, with regard to A clade association. Additionally, cladistic analysis of haplotypes yielded statistically significant results compared with the nominally significant analysis of larger numbers of SNPs. This result further supports the use of ancestral clades in the ACE locus to evaluate associations with traits in populations of European descent. We are unaware of any study in the literature, other than meta-analyses, that evaluates ACE association with LOAD in a larger combined sample.
ACE plasma concentrations have been shown to be increased in persons bearing a 289bp deletion in intron 16 of the gene [Rigat et al., 1990]. This ACE I/D has also been shown to be a risk locus for cardiovascular disease [Hessner et al., 2001; Malik et al., 1997], which may share some common etiological factors with AD [Breteler et al., 1994; Hofman et al., 1997]. The ACE intron 16 I/D has been previously reported to associate with AD in Caucasians [Alvarez et al., 1999a; Kehoe et al., 1999; Kolsch et al., 2005; Mattila et al., 2000], Japanese [Hu et al., 1999], and Chinese samples [Cheng et al., 2002; Wang et al., 2006; Yang et al., 2000]. This association has a plausible biological explanation, since ACE degrades Aβ peptide in vitro [Hu et al., 2001; Hemming and Selkoe, 2005; Oba et al., 2005; Sun et al., 2008; Toropygin et al., 2008], and the insertion allele, which is on the A clade haplotype [Farrall et al., 1999], is associated with decreased plasma levels of the ACE protein [Hemming and Selkoe, 2005; Rigat et al., 1990; Tiret et al., 1992]. The biological connection between ACE and AD has been explored with clinical trials that provided evidence favoring use of ACE inhibitors to treat AD [Hanon and Forette, 2004] or dementia [Tzourio et al., 2003], and other studies which observed no benefit of ACE inhibition, but did observe significant benefit of hypertension therapy in a large prospective study [Khachaturian et al., 2006]. These studies are reviewed in [Kehoe and Wilcock, 2007]. Additionally, in vivo studies in mouse models have demonstrated that ACE inhibitors can improve cognitive performance and reduce amyloid protein levels [Wang et al., 2007; Zou et al., 2007].
Overall, association results in the ACE gene for Caucasians have been inconsistent in past studies. In 31 case-control association studies in Caucasians with markers in ACE reviewed by [Bertram et al., 2007a], there were 12 positive findings, 15 negative findings and 4 trends suggesting association between ACE and AD. The sample sizes were larger in studies where positive associations were observed (1-sided t-test p-value for cases = 0.04, p-value for controls = 0.03, p-value for overall sample size = 0.02), suggesting that differential power might explain some of the previous inconsistency. A2M associations have also been inconsistent: Alzgene.org lists six positive associations, two trends and 33 negative associations in case-control Caucasian samples for markers in that gene. However, in family data there were three positive associations and one trend. Most of those studies were performed on rs3832852, where we observed inconsistent results for allele effects. Sample sizes did not differ significantly across study outcomes. For CTNNA3, there were seven negative associations and one positive association in case-control samples, but in family samples three of four studies found variants associating with AD. Evidence exists for associations at each of these genes, and interactions among them may explain some of the previous inconsistency [Ioannidis, 2007].
Interactions are likely relevant to the genetic epidemiology of LOAD, and the MDR-PDT provides a means of searching for such effects in family data. MDR, the analogous approach in case-control data, has been used to detect genetic interactions contributing to risk in several diseases, including sporadic breast cancer [Ritchie et al., 2001], essential hypertension [Moore and Williams, 2002; Williams et al., 2004], atrial fibrillation [Tsai et al., 2004], type II diabetes [Cho et al., 2004], coronary artery calcification [Bastone et al., 2004], myocardial infarction [Coffey et al., 2004], schizophrenia [Qin et al., 2005], and amyloid polyneuropathy [Soares et al., 2005]. MDR-PDT has been shown in simulation to have better power than MDR when families are large, as in these data [Martin et al., 2006].
The multilocus model presented here suggests that there may be a functional axis of effects predicting LOAD in these data. It is also possible that this is an artifact arising from the presence of main effects at some of these loci. The non-replication in the case-control dataset casts further doubt on the strength of this model. The non-replication may instead be a type II error, caused by the smaller case-control sample or by the similarity between ages of onset in cases and ages at examination in controls, which reduces power because controls were not old enough to be certain not to develop LOAD in the future. We caution that these results should be considered preliminary with regard to the presence of effect modification among these loci. Recent simulation experiments show that fitting regression models to evaluate effect modification in the same data in which MDR or MDR-PDT models were found is extremely unreliable (unpublished results). Independent samples should be obtained to test properly for interactions after such a search, or a valid test procedure for the null hypothesis of no interaction should be developed. These candidates are also all related to Aβ clearance, providing a biological rationale for this multilocus model.
Future investigations into the mechanism underlying late-onset AD should include further study of genes in the Aβ degradation pathway as well as functional studies targeting potential molecular etiologies of LOAD involving ACE. Plasma levels of ACE and putative downstream targets relevant to AD should be included in future study designs to allow observations on coordinate regulation, feedback systems, and continuous measurements that further explain this pattern of association. The presence of A2M and CTNNA3/LRRTM3 SNPs in the MDR-PDT model also points to Aβ accumulation as a factor predicting late-onset AD, as these genes all relate to Aβ clearance. It may be that we have already discovered the main causes of AD in the constellation of weak main effects that have been observed to date. The attributable risk of the ACE haplotype alone explains about 16.6% of late-onset AD cases among those exposed to the haplotype. For the entire Caucasian population, the population attributable risk (PAR) for the haplotype explains about 8%, or 320,000 late-onset Alzheimer’s cases. This contrasts with the single-locus PAR of 35% reported by [Kehoe et al., 2003] for rs4343, and the large effect size in ACE reported by [Meng et al., 2006] in an inbred population. When we apply the same dominant model used in the Kehoe paper to our case-control data, the PAR is 21%. For the effect size and exposure rate estimate from our family data, the PAR is 28% for rs4343. The A allele of rs4343 is also reported as associated in the Kehoe et al haplotype, in the Katzov cladistic model, and in our data. The Alzgene meta-analysis for this SNP also shows a significant effect at this locus [Bertram et al., 2007a]. This concordance of statistical and laboratory results not only implicates ACE variation as contributing modest risk, but also supports the biological hypothesis regarding plasma concentrations of ACE and their relationship to Aβ concentrations.
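The attributable-risk figures above can be related to standard epidemiological formulas. The source does not state which formulas were used. The sketch below assumes the attributable fraction among the exposed, (RR − 1)/RR, and Levin's formula for the population attributable risk, with the odds ratio substituted for the relative risk (RR) as an approximation. The function names are hypothetical, and the exposure prevalence must be supplied by the user; none is taken from the study data.

```python
def attributable_fraction_exposed(relative_risk):
    """Fraction of cases among the exposed attributable to the exposure.

    Computed as (RR - 1) / RR. An odds ratio may be passed in place of RR
    as an approximation (an assumption of this sketch).
    """
    if relative_risk <= 0:
        raise ValueError("relative_risk must be positive")
    return (relative_risk - 1.0) / relative_risk

def population_attributable_risk(exposure_prevalence, relative_risk):
    """Levin's population attributable risk.

    PAR = p (RR - 1) / (1 + p (RR - 1)), where p is the proportion of the
    population exposed (here, carrying the risk haplotype).
    """
    if not 0.0 <= exposure_prevalence <= 1.0:
        raise ValueError("exposure_prevalence must lie in [0, 1]")
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)
```

With the odds ratio of roughly 1.2 reported for the ACE haplotype, the first formula gives about 0.167, in line with the 16.6% quoted above. The PAR additionally depends on the exposure prevalence, which explains why the estimates differ between the haplotype and the single-locus dominant models.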
This work was supported by National Institutes of Health grant AG20135. We would also like to thank Dr. Chun Li for providing feedback and Dana Hancock for providing the code for the conditional logistic regression in SAS.