The thesis that genetic predisposition to complex diseases results from small genetic effects at multiple loci has now been tested in a very large number of genome-wide association studies (GWAS) with varying degrees of success [4]. GWAS test up to a million relatively common SNPs, individually, for association with a phenotype. This large number of association tests increases the rate of false discoveries, and each test incurs a statistical correction penalty, which limits the power to detect significant associations. These studies have reported, with a couple of notable exceptions, that common SNPs impart only a small risk of disease, and that the association signal may not be readily reproduced in independent cohorts. Certainly, to date, there are no GWAS results with the robust effect size and reproducible significance of the association of the TOMM40-APOE-APOC1 region with risk of LOAD. This risk is attributed to the ε4 allele of APOE.
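The correction penalty mentioned above can be made concrete with a small calculation. The sketch below assumes a Bonferroni-style correction and a family-wise error rate of 0.05; both are illustrative assumptions, since the text does not say which correction any particular study applied, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value threshold that holds the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false positives without correction, if every null is true."""
    return alpha * n_tests


n_snps = 1_000_000  # "up to a million" SNPs, as stated in the text
alpha = 0.05        # assumed family-wise error rate (not taken from the text)

print(f"per-SNP threshold: {bonferroni_threshold(alpha, n_snps):.1e}")
print(f"uncorrected expected false positives: {expected_false_positives(alpha, n_snps):.0f}")
```

Under these assumptions a single SNP must reach roughly 5 × 10⁻⁸ to be declared significant, which is why modest effects need very large cohorts to be detected.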
Even now, with several thousand patients and unaffected subjects tested in GWAS of LOAD, other association intervals show only borderline statistical significance and relatively low effect sizes. See, for example, the figure below, which presents the results of a recent GWAS of LOAD [2]. The associations between SNPs within the APOE LD region and disease are highly significant (P values less than 10⁻³⁷), with odds ratios ranging from 0.6 to >2. The associations of other loci with this disease are markedly less significant (P values less than 10⁻⁸), with odds ratios much closer to 1. While the large size of GWAS typically allows detection of these modest association signals, and although many of the recent LOAD studies have received attention in the public press, the utility of these modest associations is not clear.
Example of GWAS results for LOAD
All nine published LOAD GWAS demonstrated a highly significant association between SNPs in the APOE LD region and LOAD [2]. It should be emphasized, however, that the specific polymorphisms that identify the ε3 and ε4 alleles of APOE are not included on the platforms used in these GWAS. Nevertheless, eight of the studies assumed that the highly significant SNP association signals from the “APOE” region were due to the robust disease risk signal previously described for APOE4. Only one of these GWAS referred to other potential candidate genes within the APOE LD region [12].
The discovery that the TOMM40 gene, encoding an essential mitochondrial protein [translocase of the outer mitochondrial membrane], was adjacent to and in strong LD with APOE spurred a series of important biological studies. An apoE4 peptide was shown to interact with mitochondria, specifically with the outer membrane [14], and apoE or apoE fragments were shown to induce mitochondrial toxicity and reduce mitochondrial motility [15]. It has also been demonstrated that amyloid beta precursor protein (APP) interacts with mitochondrial translocase-containing complexes [17] and accumulates in the outer and inner mitochondrial import channels, leading to mitotoxicity [18]. In the brains of LOAD patients, the amount of APP caught in the channels varied with disease severity and patient genotype, with APOE3/4 patients having the highest amount of mitochondrial APP accumulation [18]. It is possible that apoE isoforms influence the targeting of APP to mitochondria, perhaps via differential binding between APP/apoE isomers [19] or differential interaction between apoE truncated peptide fragment isoforms and mitochondria. The molecular and genetic studies that implicated mitochondrial dysfunction in LOAD pathogenesis were complemented by positron emission tomography (PET) studies that demonstrated reduced glucose utilization in the brains of Alzheimer's disease patients or in cognitively normal people who were carriers of APOE4. There is also an APOE4 dose-dependent decrease in glucose consumption in affected brain regions, and a lack of efficacy of a PPAR agonist for the treatment of LOAD in carriers of APOE4.
Since coincidences do not generally occur in nature, the biological data suggest that the large, highly significant and reproducible disease susceptibility signal associated with the APOE LD region is due to contributions from both the TOMM40 and APOE genes to the disease mechanism or pathway [12].
Phylogenetic strategies are widely used outside of human genetics to analyze the evolution of simpler organisms with smaller genomes and shorter generation times, such as viruses and bacteria [28]. Practical uses of phylogenetics in human medicine include studying the evolution of the influenza virus to generate the annual vaccines used to prevent infection, and elucidating the evolution of drug resistance in the human immunodeficiency virus [31]. Phylogenetic approaches for analyzing human genomic DNA sequences reveal the spatial relationships of genetic variants that occur on chromosomes with different mutation histories. In the special case of a region of high LD (or low recombination), the mutations stay connected during evolution. This pattern of co-inheritance can provide a view of the order (or timing) of each mutation in a region of interest.
A phylogenetic approach to ordering mutations that occurred over evolutionary time can show how mutations, rare and common, relate to each other; these relationships are presented in the form of a phylogenetic tree. The tree, or phylogeny, illustrated in the figure below is more appropriately called a genealogy, since it is constructed from DNA sequences from a single species, human: sequences of the APOE LD region from LOAD patients and controls. Each node in the tree represents the ancestral sequence of the two lineages that branch from it. At the tips of this tree are clusters of highly similar sequences from the APOE LD region of all of the sampled chromosomes.
Phylogenetic tree that resolved with strong bootstrap support and without prior consideration of APOE allele for a 10 Kb region in the TOMM40-APOE LD block
Events such as mutation and recombination introduce diversity in each chromosome, and each of these events is subsequently inherited by chromosomes in that lineage. If a mutation causes susceptibility to a disease, then this variant will tend to be clustered into one lineage on a phylogenetic tree. It then follows that chromosomes of affected and non-affected individuals will be non-randomly distributed on the phylogeny [35]. If a phylogeny is developed from a genomic region that is known to be associated with disease, then each cluster of related haplotypes on the tree can be tested for association with a phenotype of interest, and the variants that distinguish each cluster can be further analyzed. This clustering technique ensures that infrequent or rare mutations are not lost to the analysis. In a GWAS, where a single SNP is used to tag a complex LD region that may be more appropriately represented by groups of evolutionarily-related haplotypes of variants, the genetic locus may appear to impart only a small genetic effect if the effects of different haplotypes essentially cancel each other.
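The cancellation argument can be illustrated numerically. The sketch below uses entirely hypothetical chromosome counts (not data from any study) in which one tag-SNP allele is carried by two clades, one enriched in cases and one enriched in controls; `odds_ratio` is an illustrative helper, not part of any published pipeline.

```python
def odds_ratio(case_carriers, control_carriers, case_reference, control_reference):
    """Odds ratio of disease for carrier chromosomes relative to reference chromosomes."""
    return (case_carriers * control_reference) / (control_carriers * case_reference)


# Hypothetical chromosome counts as (cases, controls); invented for illustration only.
reference = (100, 100)  # chromosomes that lack the tag-SNP allele
clade_a = (60, 30)      # tag-allele haplotypes in a clade enriched in cases
clade_b = (30, 60)      # tag-allele haplotypes in a clade enriched in controls

# Testing each clade separately recovers two opposite effects.
or_a = odds_ratio(*clade_a, *reference)
or_b = odds_ratio(*clade_b, *reference)

# Testing the tag SNP alone pools both clades into a single carrier group.
pooled = (clade_a[0] + clade_b[0], clade_a[1] + clade_b[1])
or_pooled = odds_ratio(*pooled, *reference)

print(f"clade A: {or_a:.2f}, clade B: {or_b:.2f}, tag SNP (pooled): {or_pooled:.2f}")
```

With these invented counts the two clades have odds ratios of 2.0 and 0.5, yet the pooled tag SNP has an odds ratio of 1.0, so a single-SNP test would see no effect at all. Real data would rarely cancel this exactly, and a formal test of each clade would also need a significance measure, which this sketch omits.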
Molecular evolutionary analysis techniques, and specifically the development of phylogenetic trees, are being used with increasing frequency in a variety of different applications. Public domain computer programs are readily available to produce phylogenetic trees from DNA and protein sequence data. The availability of extensive genetic variation data makes it possible to study evolution that occurs over shorter time scales, particularly to test for selective pressures in recent human history. A very good book, entitled Phylogenetic Trees Made Easy: A How-to Manual [36], describes in practical terms the steps for producing and interpreting phylogenetic trees.
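For readers who have not built a tree before, the following is a minimal sketch of one simple distance-based method, UPGMA clustering on Hamming distances between aligned haplotypes. It is a teaching example only: the source does not state which tree-building method or software was used for the APOE-TOMM40 analysis, the four haplotype strings are invented, and the sketch omits branch lengths and bootstrap support.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def upgma(seqs: dict) -> str:
    """Cluster sequences by UPGMA and return the topology as a Newick string."""
    # Each cluster is keyed by its Newick label and stores its number of leaves.
    sizes = {name: 1 for name in seqs}
    dist = {
        frozenset((a, b)): float(hamming(seqs[a], seqs[b]))
        for a, b in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        # Pick the closest pair; ties are broken alphabetically for determinism.
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        a, b = sorted(pair)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # Distance to every other cluster is the size-weighted average.
        new_dist = {
            c: (size_a * dist[frozenset((a, c))] + size_b * dist[frozenset((b, c))])
            / (size_a + size_b)
            for c in sizes
        }
        dist = {p: d for p, d in dist.items() if a not in p and b not in p}
        for c, d in new_dist.items():
            dist[frozenset((merged, c))] = d
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Invented aligned haplotypes, for illustration only.
haplotypes = {
    "h1": "AACCGGTT",
    "h2": "AACCGGTA",
    "h3": "TTCCGGAA",
    "h4": "TTCCGGAC",
}
tree = upgma(haplotypes)
print(tree)
```

In this toy input, h1 and h2 differ at one site, as do h3 and h4, so the two pairs form separate clades. Dedicated phylogenetics programs use more appropriate evolutionary models and report statistical support for each branch.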
The phylogenetic tree from the APOE-TOMM40 region was published in our recent paper entitled “A TOMM40 variable length polymorphism determines the age of late-onset Alzheimer’s disease”. For the reader interested in LOAD but uninitiated in phylogenetics, it may be useful to walk through the tree.
Multiple mutations occurred within the TOMM40-APOE locus during the course of human evolution. Some of these mutations created branching points in the Caucasian phylogenetic tree, whereas others occurred on the branches. Together or independently, these mutations may be associated with disease phenotype. A number of mutations apparently occurred independently on chromosomes that carry APOE3 or APOE4 and, because there has been limited recombination in this region over the course of evolutionary history represented by the Caucasians that we sampled, each mutational event was preserved in the context of the APOE allele in which it occurred.
One of the polymorphisms in this region, a poly-T repeat at rs10524523 (“523”), appears to have mutated more than once in Caucasian history (see the five boxes in the phylogenetic tree). Based on the number of T residues, three categories (alleles) of repeat length are described: short [<20], long [>20], or very long [>30]. Each allele is represented by a range of poly-T lengths, but the means of the length categories are significantly different.
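The three length categories can be written as a simple classification rule. The sketch below is illustrative: the text gives the cut-offs only as <20, >20 and >30, so the handling of the boundary values (exactly 20 and exactly 30 T residues are treated here as long) is an assumption, and the function names and the S/L/VL labels are hypothetical shorthand.

```python
def classify_poly_t(n_t: int) -> str:
    """Assign a "523" poly-T repeat length to a category: S, L or VL."""
    if n_t < 1:
        raise ValueError("repeat length must be a positive integer")
    if n_t < 20:
        return "S"   # short
    if n_t > 30:
        return "VL"  # very long
    return "L"       # long; lengths of exactly 20 and 30 fall here by assumption


def genotype(len_chr1: int, len_chr2: int) -> str:
    """Combine the categories of both chromosomes into an ordered genotype label."""
    order = {"S": 0, "L": 1, "VL": 2}
    alleles = sorted(
        (classify_poly_t(len_chr1), classify_poly_t(len_chr2)),
        key=order.__getitem__,
    )
    return "/".join(alleles)


# Example lengths are invented for illustration.
print(genotype(34, 16))
print(genotype(25, 22))
```

Sorting the two alleles gives each genotype a single label regardless of which chromosome is listed first, which is convenient when age of onset is compared across genotype groups.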
Poly-T repeats linked to APOE3 and APOE4
The major findings of our phylogenetic analysis of LOAD were: 1) that APOE4 was almost always linked to a long poly-T repeat; 2) that the ε3 allele of APOE was linked to either a short poly-T or a very long poly-T repeat; and 3) that linkage to a short or very long poly-T for an APOE3 carrier resulted in later or earlier onset of LOAD, respectively, regardless of risk imparted by APOE4.
In virtually every publication of age of LOAD onset for APOE3/4 patients, the average age of onset is reported to be about 75 years. Potentially, a very long poly-T repeat in linkage with APOE3 may equal, or even exceed, the disease risk associated with APOE4. If this is true, then individuals with the APOE3/3 genotype and two copies of the very long “523” polymorphism may be at risk of developing LOAD at a very early age, perhaps <70 years. Patients who develop LOAD in their 50s, but do not carry the rare autosomal dominant mutations that cause early-onset Alzheimer's disease, may be homozygous for the APOE3-very long “523” haplotype. As an interesting note, in the second cohort that we examined, we observed two rare APOE4 haplotypes in clade B that were linked to short “523” alleles, and we speculate that these might represent the ancestral scenario. The intriguing observation was that the age of disease onset for both of these people was 78 years, which is similar to the age of onset for individuals who possess the short-APOE3/long-APOE4 genotype.
The poly-T variant was discovered because we deep sequenced the APOE LD region. Deep sequencing was essential for accurately measuring the length of the poly-T allele in each instance and also for unequivocally determining the link between the poly-T polymorphism and the APOE allele. The discovery of the “523” poly-T variant suggests that susceptibility to LOAD behaves as an autosomal co-dominant trait with the age of onset phenotype determined by the combined effects of each allele at the rs10524523 (“523”) poly-T locus [3]. An interesting question is, then, which is the engine of the LOAD endophenotype (age of onset)? Is it the APOE allele, as was previously assumed, or the “523” poly-T length?
Beyond LOAD, a more general hypothesis is that, in the evolutionary history of other complex diseases, multiple evolutionarily-linked variants within a genomic interval, which are not necessarily in high LD [nor acting as coincident genetic loci], may determine risk of disease. In other words, haplotypes of evolutionarily-related variants may associate more significantly with the disease phenotype than individual, common SNPs that represent the average effect of a genomic interval. The approach that we have adopted in studying other complex diseases is to use phylogenetic analysis to fine-map regions that are reproducibly flagged by GWAS or linkage studies but which impart only a small effect or narrowly miss genome-wide significance. In the case of LOAD, the phenotype (age of disease onset) was highly associated with multiple polymorphisms that arose over the course of human evolution at a single site (rs10524523) in the APOE LD region. However, phylogenetic approaches may also provide a powerful mechanism for capturing the effects of disease-susceptibility variants that occur at multiple loci within a larger, disease-associated region.