Apolipoprotein E (APOE) ε4 alleles increase the risk for late-onset Alzheimer disease (LOAD) and decrease the age of onset. Recently, sequencing the APOE region in a small sample of LOAD subjects identified a variable length poly-T repeat sequence in the nearby gene, TOMM40, which may affect age of onset. We genotyped the TOMM40 poly-T repeat using a novel statistical approach to refine the identification of allele length in 892 LOAD subjects and evaluated its effects on age of onset. Because psychosis in LOAD is a heritable phenotype which has shown conflicting associations with APOE genotype, we also evaluated the association of poly-T repeat length with psychosis. Poly-T repeat lengths had a trimodal distribution which differed between APOE genotype groups. After accounting for APOE ε4 there was no association of poly-T repeat length with age of onset. Neither APOE ε4 nor poly-T repeat length was associated with psychosis. Our findings do not support the association of poly-T repeat length with age of onset in LOAD. The clinical implications of this repeat length polymorphism remain to be elucidated.
Late-onset Alzheimer disease (LOAD) is a neurodegenerative illness with substantial heritability (Gatz et al, 1997). Similarly, and not surprisingly, as the risk of LOAD increases in a highly age-dependent manner (Ferri et al, 2005), the age of onset of LOAD is also heritable (Pedersen et al, 2001). The gene with the most strongly established relationship to LOAD risk is apolipoprotein E (APOE), with increased risk of LOAD found in individuals carrying one or two copies of the ε4 allele (Farrer et al, 1997). The primary effect of APOE ε4 alleles on LOAD risk appears to be mediated via lowering the age of onset of Alzheimer disease, with a reduction of up to 7–9 years for each ε4 allele (Reitz and Mayeux, 2009).
The APOE ε4 allele has also been extensively examined for association with the presence of the psychotic phenotype of LOAD (LOAD+Psychosis, LOAD+P) (Sweet et al, 2003). LOAD+P is heritable (Bacanu et al, 2005; Sweet et al, 2010), and identifies a subgroup of LOAD subjects with more severe cognitive impairment and more rapid cognitive decline (Emanuel et al, 2011; Ropacki and Jeste, 2005). Unlike age of onset, the association of APOE ε4 allele with psychosis has revealed inconsistent findings, with slightly more negative than positive studies, and some studies showing evidence for a protective effect (DeMichele-Sweet and Sweet, 2010). Such a pattern could result solely from Type I error due to small cohorts with varying approaches to clinical characterization and analysis. However, a variable pattern of association can also arise due to a causal association with genetic variation in linkage disequilibrium with the APOE ε4 allele.
APOE ε4 is defined by a two-SNP haplotype in APOE exon 4. SNPs rs429358 and rs7412 each code for either arginine (C) or cysteine (T). APOE ε4 alleles are the CC haplotype (with TT and TC defining ε2 and ε3 alleles, respectively). Recent investigations fine-mapping the region within and surrounding APOE on chromosome 19 identified a set of SNPs within the nearby gene, TOMM40, in linkage disequilibrium with the APOE ε4 allele (Yu et al, 2007) and affecting APOE expression (Bekris et al, 2010). This finding, in part, motivated an effort to sequence the APOE and TOMM40 region in subjects with AD to identify possible causal variants within the linked region (Roses et al, 2010). Sequencing identified a variable length poly-T repeat sequence in intron 6 of TOMM40 that was in linkage disequilibrium with APOE ε4. Individuals with APOE ε3/ε4 genotype and long poly-T repeats (defined as ≥ 27) had significantly lower age of onset of LOAD than individuals with APOE ε3/ε4 genotype and short repeats (Roses et al, 2010).
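The haplotype definition above can be written as a small lookup. This is an illustrative sketch only: it assumes phased haplotypes are available and that alleles are listed in the order (rs429358, rs7412). The function names `apoe_allele` and `count_e4` are hypothetical and not part of the study's pipeline.

```python
# Haplotype lookup as stated in the text; key order is (rs429358, rs7412).
# Assumption: input haplotypes are already phased.
APOE_HAPLOTYPES = {
    ("C", "C"): "e4",
    ("T", "T"): "e2",
    ("T", "C"): "e3",
}


def apoe_allele(rs429358: str, rs7412: str) -> str:
    """Return the APOE allele name for one phased two-SNP haplotype."""
    key = (rs429358.upper(), rs7412.upper())
    if key not in APOE_HAPLOTYPES:
        # The fourth combination is not described in the text, so it is rejected.
        raise ValueError(f"haplotype {key} is not defined in the text")
    return APOE_HAPLOTYPES[key]


def count_e4(haplotype_pair) -> int:
    """Count e4 alleles (0, 1 or 2) carried by one subject."""
    return sum(apoe_allele(a, b) == "e4" for a, b in haplotype_pair)
```

For example, a subject carrying the haplotypes `("T", "C")` and `("C", "C")` has genotype ε3/ε4 and an ε4 count of 1, which is the predictor used in the models described later.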
Ultimately, reconciling the independent effects of APOE ε4 and TOMM40 repeat length polymorphism on age of onset of LOAD will require concurrent genotyping of large numbers of subjects. To address this goal, we developed an approach to high throughput genotyping of the TOMM40 poly-T repeat length polymorphism by starting with PCR to generate an initial estimate of allele sizes and then refining these estimates with a statistical model. We evaluated the independent and joint effects of these genetic variants in a large population of 892 Caucasian individuals with LOAD, examining both the age of onset and LOAD+P phenotypes.
A total of 892 Caucasian, non-Latino subjects with a final diagnosis of possible or probable AD (McKhann et al, 1984), all evaluated at the University of Pittsburgh Alzheimer Disease Research Center (ADRC), were included. All subjects were assessed as described previously (DeMichele-Sweet et al, 2011). All data collected in this study were obtained with protocols approved by the Institutional Review Board of the University of Pittsburgh.
Psychosis was evaluated with the CERAD behavioral rating scale (Tariot et al, 1995), as described previously (DeMichele-Sweet et al, 2011). Subjects were characterized as having no psychotic symptoms, a single psychotic symptom at only one time point, or multiple/recurrent psychotic symptoms, reflecting the increasing genetic loading associated with this hierarchy (Bacanu et al, 2005;Sweet et al, 2010). Finally, because the occurrence of psychosis is less frequent in the early stages of AD, subjects were required to have a Mini Mental State Exam (Folstein et al, 1975) score ≤ 20 in order to be classified as having LOAD without psychosis.
The TOMM40 polymorphic repeat was genotyped by PCR amplification with forward primer 5′-VIC-GAGATGGGGTCTCACTATG-3′ and reverse primer 5′-GTACAGGCCACAATGTG-3′, with an initial 3 minute denaturation at 95°C, followed by 35 cycles of denaturation at 95°C for 30 sec, annealing at 56°C for 30 sec, and extension at 72°C for 30 sec. PCR was carried out in a final volume of 10 µl containing 10 pM primers, 200 µM dNTPs, 2 mM MgCl2 and 1 U of Taq polymerase (Invitrogen). Fragments were resolved on an ABI 3730 automatic fragment analyzer, with a LIZ500 size standard, and fragment sizes were initially estimated using the output from GeneMapper v4.0 software (Applied Biosystems Inc.).
The TOMM40 DNA sequence has an intronic poly-T (multiple thymine base pairs) that is highly variable at a population level. Our first statistical goal was to estimate the counts of T for each of the pair of alleles carried by subjects using the intensity signals obtained from the GeneMapper readout, where intensity (I) is some function of the number of times a particular length was replicated in the PCR process. The expected pattern for our method of measuring alleles would be to observe, for each allele, a maximum intensity of signal near the true count of T (N[T]), but distributed continuously with error around the true value, and PCR-based stutter around that peak that decays in intensity with increasing (integer) distance from N[T]. Thus, for an individual with two distinctly different poly-T alleles, a reasonable preliminary estimate can be obtained as the length associated with the maximum intensity of the two distinct peaks in an individual’s GeneMapper readout (Figure 1). Note in Figure 1 the PCR stutter. Measurement error is illustrated by Figure 2. If an individual is homozygous for poly-T alleles, then the global maximum is a good estimator for both alleles.
Because of the PCR stutter and continuous measurements, we smoothed the sizing trace from each individual as follows. (1) Lay down a series of overlapping bins of size 1 on the size range 337–364 (min-max observed), which when accounting for the 320 bp flanking region maps onto 17 to 44 T repeats. Bins were separated by 0.1, such that the first bin was 337–338, the second bin was 337.1–338.1, and so on. (2) Find the set of bins, starting points separated by count 0.1, which contain the maximum intensity signal in those bins. This establishes a grid of bins. (3) If the alleles are well separated (≥ 2T), find the pair of bins with maximum intensity mass. A maximum is defined in the context of surrounding bins so that to the left of the maximum the bin mass is increasing and to the right it is decreasing (Figure 1, in which the dots represent the maxima I for each bin). If there were more than 2 maxima, the largest two were chosen as the alleles and the rest assumed to be stutter artifact (Figure 1). If there were only 2 maxima, and the average intensity of the smaller maximum was I < 300, then this maximum was again assumed to be a stutter artifact and the genotype assumed to be homozygous. Likewise, if there was only one maximum, the genotype was again assumed to be homozygous. (4) Round the length or lengths to integer value.
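The smoothing steps above can be sketched in code. This is one possible reading, not the authors' implementation, and it rests on several assumptions: the trace is a list of (size, intensity) points, the "grid" in step (2) is the set of non-overlapping unit bins whose 0.1-step offset captures the largest mean intensity, and a peak's location is the intensity-weighted mean size within its bin. The function names are hypothetical. Closely spaced alleles (fewer than 2 T apart) are not handled, and the rule applying the 300 cutoff to the smaller peak is extended here to cases with more than two maxima.

```python
from statistics import mean

FLANK_BP = 320                     # flanking region length stated in the text
SIZE_MIN, SIZE_MAX = 337.0, 364.0  # observed size range stated in the text
STUTTER_CUTOFF = 300               # intensity threshold stated in the text


def unit_bins(trace, offset):
    """Summarise consecutive unit-wide bins starting at SIZE_MIN + offset.

    Returns (location, mean intensity) per bin; the location is the
    intensity-weighted mean size, or the bin centre if the bin is empty.
    """
    bins = []
    start = SIZE_MIN + offset
    while start + 1.0 <= SIZE_MAX + 1e-9:
        pts = [(s, i) for s, i in trace if start <= s < start + 1.0]
        weight = sum(i for _, i in pts)
        if weight > 0:
            loc = sum(s * i for s, i in pts) / weight
            bins.append((loc, mean(i for _, i in pts)))
        else:
            bins.append((start + 0.5, 0.0))
        start += 1.0
    return bins


def choose_grid(trace):
    """Pick the offset (0.0, 0.1, ..., 0.9) whose grid holds the largest bin mean."""
    grids = [unit_bins(trace, k / 10) for k in range(10)]
    return max(grids, key=lambda g: max(m for _, m in g))


def local_maxima(bins):
    """Bins whose mean intensity exceeds both neighbours."""
    peaks = []
    for j, (loc, m) in enumerate(bins):
        left = bins[j - 1][1] if j > 0 else float("-inf")
        right = bins[j + 1][1] if j < len(bins) - 1 else float("-inf")
        if m > left and m > right:
            peaks.append((loc, m))
    return peaks


def call_genotype(trace):
    """Preliminary genotype as a sorted pair of poly-T counts."""
    peaks = sorted(local_maxima(choose_grid(trace)), key=lambda p: p[1], reverse=True)[:2]
    if not peaks:
        raise ValueError("no peak found in trace")
    if len(peaks) == 2 and peaks[1][1] < STUTTER_CUTOFF:
        peaks = peaks[:1]          # weak second peak is treated as stutter
    lengths = [round(loc - FLANK_BP) for loc, _ in peaks]
    if len(lengths) == 1:
        lengths *= 2               # single peak: call homozygous
    return tuple(sorted(lengths))
```

With a toy trace such as `[(337.15, 2000), (338.15, 600), (353.05, 500), (354.05, 1500), (355.05, 400)]` (made-up values), the two dominant peaks map to a preliminary genotype of 17 and 34 T repeats, and the smaller neighbouring signals are ignored as stutter.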
To assess measurement error, four samples showing different genotypes were measured five times. For each measurement we determined maxima as described above and determined the standard deviation of the maxima for each allele (Figure 2). We then evaluated whether the standard error was a function of repeat lengths, and could find no significant relationship. Counter to intuition, the error decreased slightly but non-significantly with increasing count of T. Thus we assumed the standard deviation of measurement is constant and equals the average over all repeated measured alleles, specifically 0.24. Note that roughly 95% of all maxima should be within rounding error of their true allele size.
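The "roughly 95%" figure can be checked with a short calculation. This sketch assumes measurement error is normally distributed around the true allele size (the same normality assumption the model uses later) and that "within rounding error" means within 0.5 of the true integer length. The function name is hypothetical.

```python
import math


def prob_within_rounding(sd: float, half_width: float = 0.5) -> float:
    """P(|error| < half_width) for zero-mean normal error with standard deviation sd."""
    return math.erf(half_width / (sd * math.sqrt(2)))


# Standard deviation of measurement reported in the text.
p = prob_within_rounding(0.24)
print(f"{p:.2f}")
```

Under these assumptions the probability comes out at roughly 0.96, in line with the statement in the text.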
To further refine allele calls, we next applied least squares methods to the data. This approach can be broken down into four steps in which two key assumptions are made. Our first assumption is that the intensity obtained from the GeneMapper readout reflects the probability of the true sizes of the pair of alleles (Figure 1). Using the grid of bins established previously, we assume the relative intensity mass (relative to total intensity) in each bin is equivalent to the probability of an allele’s true size. Our second assumption is that the distribution of probabilities of obtaining any lengths given a selected “true mean” is approximately normal. For the standard deviation of these normal distributions we use the standard error obtained from our measurement error analyses.
The steps in the least squares method follow: (1) Under the first assumption, create a probability mass function from the GeneMapper readout that yields discrete probabilities for each allele length which, when summed, is equal to 1. We will call this the stutter probability mass function (PMF). (2) Under the second assumption, find the probability mass function for each proposed possible true genotype, which also sums to 1. We will call these the normal PMFs. (3) Find the proposed genotype that minimizes the residual sum of squares of the two PMFs.
In the first step, in order to obtain the stutter PMF, we first found the sum of all the averaged binned intensities, and divided each binned intensity average by the total. This yielded the probabilities for each discrete integer value for length, and forms the stutter PMF (Figure 1).
In the second step, we first found all plausible pairs of alleles that could be the true genotype of the individual. Based on measurement error, these correspond to one repeat length to the left and right of each allele in the proposed genotypes defined by the identified maxima. All possible pairwise combinations of these alleles – following the rule of gametes – comprise the set of proposed genotypes. For each proposed genotype, the normal distribution for each allele follows from its location (mean), and the standard deviation of measurement error. The height of the normal curve at each discrete length was treated as reflective of the probability of the allele’s true size. We then found the sum of all the heights of the discrete integer lengths from 0 to 45, and divided each height by this sum. This yielded probabilities for each discrete integer value for length and forms the normal PMFs.
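A minimal sketch of this second step follows. It assumes that each proposed genotype takes one allele from the neighbourhood of each identified maximum (our reading of "the rule of gametes") and that the two alleles contribute equally to the normal PMF. Neither detail is spelled out in the text, and the function names are hypothetical.

```python
import math
from itertools import product

SD = 0.24               # standard deviation of measurement reported in the text
LENGTHS = range(0, 46)  # discrete integer lengths 0 to 45, as in the text


def neighbours(n):
    """An allele plus one repeat length to its left and right."""
    return (n - 1, n, n + 1)


def candidate_genotypes(peak_a, peak_b):
    """Unordered pairs formed by one candidate allele per identified maximum."""
    pairs = product(neighbours(peak_a), neighbours(peak_b))
    return sorted({tuple(sorted(g)) for g in pairs})


def normal_height(x, mean, sd=SD):
    """Height of the normal curve with the given mean and sd at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def normal_pmf(genotype, sd=SD):
    """Normal-curve heights at each integer length, rescaled to sum to 1."""
    heights = [sum(normal_height(n, a, sd) for a in genotype) for n in LENGTHS]
    total = sum(heights)
    return [h / total for h in heights]
```

For maxima at 17 and 34 this produces nine proposed genotypes. Because the standard deviation is small relative to one repeat, each normal PMF places nearly all of its mass on the two proposed allele lengths.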
In our third step, we compare the two PMFs. Denote the stutter PMF as f(z), and the normal PMF as f(z*). Find the genotype that minimizes the residual sums of squares (RSS).
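The comparison can be sketched as follows, taking the RSS to be the sum over integer lengths of the squared difference between the stutter PMF and a proposed genotype's normal PMF. That form is our reading of the text, not a formula quoted from it. The function names are hypothetical, and the demo uses made-up intensities with simple two-point PMFs standing in for the normal PMFs.

```python
def to_pmf(binned_intensity):
    """Rescale binned intensities so they sum to 1 (the stutter PMF)."""
    total = sum(binned_intensity)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [v / total for v in binned_intensity]


def rss(observed, proposed):
    """Residual sum of squares between two PMFs on the same integer grid."""
    return sum((o - p) ** 2 for o, p in zip(observed, proposed))


def best_genotype(stutter_pmf, candidate_pmfs):
    """Return the proposed genotype whose PMF is closest to the stutter PMF."""
    return min(candidate_pmfs, key=lambda g: rss(stutter_pmf, candidate_pmfs[g]))


# Demo with made-up intensities indexed by poly-T length 0..45.
intensity = [0.0] * 46
intensity[16], intensity[17] = 160.0, 900.0
intensity[33], intensity[34], intensity[35] = 140.0, 700.0, 100.0
stutter = to_pmf(intensity)


def two_point_pmf(genotype):
    """Stand-in for a normal PMF: half the mass on each proposed allele."""
    pmf = [0.0] * 46
    for allele in genotype:
        pmf[allele] += 0.5
    return pmf


proposed = [(16, 34), (17, 33), (17, 34), (17, 35), (18, 34)]
candidates = {g: two_point_pmf(g) for g in proposed}
print(best_genotype(stutter, candidates))
```

In this toy case the stutter mass on neighbouring lengths is small compared with the main peaks, so the genotype matching the two maxima gives the lowest RSS.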
We used the expectation-maximization algorithm (Supplement) to arrive at the final genotype calls.
In our modeling we had two response variables, age of onset of LOAD measured in years and LOAD+P (multiple/recurrent psychotic events, N=295 versus no psychotic events, N=324). Individuals with just one psychotic event were not included in the analyses. Predictor variables were the count of APOE ε4 alleles and summary variables derived from the complex poly-T allele distribution. Specifically, to encode the poly-T genotype we followed Roses et al. (Roses et al, 2010) by creating a binary variable that had a division between small and long repeats (long > 27), referred to as the Roses encoding. For a richer predictor we defined an ordinal factor based on the poly-T allele distribution (Table 2). We analyzed each response variable by first constructing marginal models with each single predictor variable, and then constructing joint models using combinations of predictor variables that always included the APOE ε4 allele count. For age of onset, we used linear regression models and, for psychosis risk, we used conditional logistic regression models. All analyses were implemented in R.
Demographics and clinical characteristics for the 892 participants with APOE genotypes are shown in Table 1. All subjects had late-onset AD (age of onset ≥ 60); the vast majority (91.3%) was diagnosed with Probable AD.
The frequency distribution of repeat lengths over all the subjects as a whole (Figure 3) has several peaks in the distribution, consistent with published estimates (Roses et al, 2010). These peaks occur at approximately 17, 31, and 37, with diminishing frequencies of counts on either side of the peaks. The distribution is easily categorized to small, medium and long lengths of repeats. Based on these natural separations, we defined our poly-T variable with small repeats as being those of length ≥ 17 but <25, medium repeats as those ≥ 25 but <34 repeats, and long repeats as those repeats ≥ 34.
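The three-level categorization can be expressed directly from the cut points given above. The sketch below is illustrative: the function names are hypothetical, and rejecting lengths below 17 is our choice, since the text defines no category for them.

```python
def poly_t_category(length: int) -> str:
    """Classify one poly-T allele using the cut points stated in the text."""
    if length < 17:
        # No category is defined below 17 in the text, so this is rejected.
        raise ValueError("length below the defined range")
    if length < 25:
        return "small"
    if length < 34:
        return "medium"
    return "long"


def count_category(genotype, category: str) -> int:
    """Number of alleles (0, 1 or 2) in a genotype falling in the given category."""
    return sum(poly_t_category(n) == category for n in genotype)
```

For instance, a genotype of 17 and 31 repeats carries one small and one medium allele, so its count of medium lengths is 1.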
The distribution of repeat lengths of the ε3/ε3 carriers peaks at two locations, the most common allele length of 17 and the other at allele length of 37, with respective relative frequencies of approximately 0.59 and 0.20. The ε3/ε4 carrier distribution of repeat lengths peaks in three places: 17, 31, and 37, with respective relative frequencies of 0.35, 0.26, and 0.13. For our ε4/ε4 carrier distribution, the peak counts of Ts fall at repeats 30 and 31, with approximate relative frequencies 0.53 and 0.30, respectively (Figure 4).
For age of onset, we fitted models with one predictor variable at a time and found that the count of APOE ε4 alleles in genotype, the Roses encoding of poly-T repeats, and the count of medium lengths in our ordinal factor variable were significant (Table 2). We then created joint models for age of onset in which the count of APOE ε4 alleles was always in the models. After accounting for the predictive value of APOE ε4 allele counts, neither of the poly-T predictors was significant (Table 3).
To predict LOAD+P, we again constructed marginal models with each predictor variable. Neither the Roses encoding nor the count of medium repeats in the ordinal factor variable was significant (Table 2). No association was noted for APOE ε4 (Table 2). The joint logistic models supported the findings of no association with psychosis risk (Table 3).
Finally, to mimic the analysis in Roses et al (Roses et al, 2010), we analyzed data restricted to carriers of the APOE ε3/ε4 genotype. We fit models to predict age of onset and LOAD+P with each independent variable, but none of the results were significant (Table 4). Therefore, we conclude that only the APOE polymorphism has an impact on age of onset and there is no association between APOE regional variation and LOAD+P.
We did not find that the TOMM40 poly-T repeat length polymorphism is associated with age of onset in LOAD independently of the effect of APOE ε4. The lack of independent association was consistent whether categorizing the TOMM40 poly-T repeat length polymorphism along the lines of its apparent trimodal distribution, or categorizing it as originally defined (Roses et al, 2010). Similarly, we found no association when confining our analysis to the additional impact of long TOMM40 poly-Ts in subjects with the APOE ε3/ε4 genotype. The reason for the discrepancy between the current findings and those reported by Roses et al (Roses et al, 2010) is not known, but could include differences within the two subject pools in the nature of linkage disequilibrium between APOE and the poly-T repeat length polymorphism. Nevertheless the large size of our sample, including 405 APOE ε3/ε4 subjects in contrast to 34 APOE ε3/ε4 subjects in the original report, enhances confidence in the current findings.
We had hypothesized that conflicting findings regarding the association of APOE with LOAD+P could result from linkage disequilibrium with the intron 6 TOMM40 poly-T repeat sequence, with the latter providing the true association with LOAD+P. The current findings do not support this hypothesis. Our findings do not preclude the possibility that other nearby genetic variation may contribute to the inconsistent associations between LOAD+P and APOE. Alternatively, the conflicting prior reports may have resulted due to differences in subject populations, sample sizes, definitions of LOAD+P, analytic approaches, or the observation that small, but real effects often show inconsistent results in studies with small sample sizes. In that regard, the current findings in a homogeneously assessed large cohort provide substantial evidence against any true association of APOE ε4 with LOAD+P. Similarly, we recently evaluated the association of APOE with LOAD+P in the Uniform Data Set collected by the National Alzheimer’s Coordinating Center, comprising 2317 individuals with LOAD, 802 (34.6%) with psychosis, the largest cohort examined to date (DeMichele-Sweet et al, 2011). Once again APOE ε4 was not associated with LOAD+P.
Accurate assessment of the number of Ts in a poly-T repeat of this size is challenging using standard molecular technology. Nonetheless we obtained very good precision and good accuracy by PCR amplification followed by size estimation from an ABI 3730 automatic fragment analyzer. Based on results from repeated assessment of the same samples we showed that 90–95% of these length estimates were within rounding error of their true size. To make these estimates even more accurate we built a statistical model that essentially shrank the measurement error. In this way we developed a cost-effective and accurate means of determining poly-T genotypes at the TOMM40 locus. Though our findings cast doubt on the importance of assessing the intron 6 poly-T genotype in TOMM40 for LOAD genetics, our method should have straightforward application to poly-T repeat sequences at other loci.
This work was supported by National Institute on Aging (NIA) [Grants AG 027224, AG030653, and AG 05133].
The supplement contains the EM algorithm.
The authors have no conflict of interest to report. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.