I coauthored a recently published research paper demonstrating that a variable-length poly-T polymorphism in the TOMM40 (Translocase of the Outer Mitochondrial Membrane 40 homolog (yeast)) gene, which lies adjacent to APOE on chromosome 19, accounts for the age of onset distribution of a complex disease, late-onset Alzheimer's disease (LOAD).1 These new data explain the average age of disease onset for patients with the APOE4/4 genotype and distinguish two forms of TOMM40 poly-T polymorphism linked to APOE, each associated with a different age of onset distribution.2 When linked to APOE3, longer TOMM40 poly-T repeats (19–39 nucleotides) at the rs10524523 (523) locus are associated with earlier onset, and shorter TOMM40 alleles (11–16 nucleotides) with later onset. The data suggest that the poly-T alleles are co-dominant: the age of onset phenotype is determined by both inherited alleles, but with variable expressivity. Additional data will further refine the relationship between poly-T allele length and age of disease onset and determine whether that relationship is linear.
Genetics has progressed remarkably fast and has probably affected diagnosis and treatment in neurology as dramatically as in any other area of medicine except, perhaps, oncology. In neurology, however, new drug treatments are harder to discover and take longer to develop.3 Much of the genetics has only recently been learned, creating a long list of metabolic and structural mutations that seems to justify the “splitters” in the old wars of “lumping or splitting” similar clinical presentations. Several genetic mutations that explain the majority of early-onset Alzheimer's disease (EOAD) have been discovered.4 Estimates of the genetic load for LOAD that is due to APOE4 range from 20% to 70%, with most investigators agreeing on about 50%.4
We have used a phylogenetic approach to examine the region around the APOE gene and have made two related discoveries. First, age of onset for this late-onset, complex disease is determined by a traditional form of Mendelian inheritance at the poly-T locus in TOMM40. Second, the complexity of the age of onset phenotype is introduced by each polymorphic poly-T allele rather than by a myriad of small effects from many other genes. The phylogenetic methodology was critical for identifying the age of onset-associated polymorphism. The TOMM40 gene is located next to the APOE gene in a region of strong linkage disequilibrium (LD). This means that few recombination events have occurred between the two genes during human evolution, so specific variants in each gene have been co-inherited. However, additional mutations have accumulated within the LD region containing both TOMM40 and APOE, providing genetic heterogeneity. The phylogenetic experiments and analyses that led to the discovery of the role of the TOMM40 523 polymorphism are detailed elsewhere.1 The purpose of this review is to explain the methodology, data, and clinical implications for physicians and clinical scientists.
In this review, I will try to explain these highly technical and complicated data to my fellow neurologists who probably have never been exposed to phylogenetics. I apologize to those who have greater understanding of the discipline.
Figure 1 shows the location of many SNPs and at least four poly-T tracts in TOMM40. Located in intron 6 is 523, the variable poly-T repeat that is associated with age of LOAD onset; the bottom of this illustration shows a histogram of the frequency of the different-length poly-T alleles (in numbers of T residues) in a case/control population. Also indicated in this cartoon is rs8106922 (922), the SNP that, in our analyses, anchors the split between two groups of evolutionarily related sequences, which are grouped into the branches (clades) designated A and B (figure 2, left and right). The initial search for a potentially functional location in the LD region that comprises APOE, TOMM40 and APOC1 was guided by previous biological data.5–7 The goal was to see whether one of the clades was enriched for LOAD cases. Within this LD region, polymorphisms within a specific 10 kb segment of TOMM40 supported the formation of a statistically robust phylogenetic tree that enriched LOAD cases into one of the clades. Although the cohort used for the analysis shown in Figure 2 had a 2:1 ratio of patients to controls, the ratio increased to 2·7:1 in clade A and fell to 1·7:1 in clade B (Figure 2, left panel). No other region that we analyzed both supported a robust phylogenetic tree structure and enriched cases into one branch. When we examined the distribution of APOE genotypes across the clades, we found that almost all the APOE4/4 patients resolved to clade A, while APOE3/4 and APOE3/3 patients were present in both clades. This is where the data stood in 2007; these are the data that were presented at the IX International Meeting on Human Genome Variation and Complex Genome Analysis (September 7, 2007). There was a strong hint that something was going on in this region, which happened to lie in the intronic sequence of TOMM40, not APOE.
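To make the enrichment arithmetic concrete, here is a minimal Python sketch, assuming each sequence on the tree is labelled with its carrier's case/control status and assigned to exactly one clade. The function names and toy counts are hypothetical; they are not the study's data or software.

```python
from collections import Counter, defaultdict

def case_control_ratio(statuses):
    """Cases per control among a list of 'case'/'control' labels."""
    counts = Counter(statuses)
    if counts["control"] == 0:
        raise ValueError("ratio is undefined with no controls")
    return counts["case"] / counts["control"]

def ratios_by_clade(sequences):
    """sequences: iterable of (clade, status) pairs, one per sequence on the tree."""
    by_clade = defaultdict(list)
    for clade, status in sequences:
        by_clade[clade].append(status)
    return {clade: case_control_ratio(s) for clade, s in by_clade.items()}

# Toy example (not study data): 6 case and 2 control sequences in clade A,
# 3 case and 2 control sequences in clade B.
toy = ([("A", "case")] * 6 + [("A", "control")] * 2
       + [("B", "case")] * 3 + [("B", "control")] * 2)
print(ratios_by_clade(toy))                      # {'A': 3.0, 'B': 1.5}
print(case_control_ratio([s for _, s in toy]))   # 2.25
```

A clade is enriched when its ratio exceeds the ratio for the whole cohort (2·25:1 in this toy example), just as clade A's 2·7:1 exceeds the cohort's 2:1.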
Figure 2 illustrates the data from the second of two independent case/control series that we have analyzed.1 The samples for this analysis were provided by the Arizona Alzheimer's Disease Research Center. There were 105 individuals, or 210 copies of chromosome 19, in the analysis, with almost two AD patients for each cognitively normal control matched for APOE genotype, age, and gender. The polymorphisms analyzed were identified by deep Sanger sequencing of the region of interest from all copies of chromosome 19 in the analysis. The data in the right and left panels of figure 2 are identical, but with different information superimposed for explanatory purposes.
From left to right on figure 2, beginning with the 922 SNP, the vertical lines represent successive mutation events in the parent sequence that create branch points in the tree. SNP 922 most effectively distinguishes the two main branches of the phylogenetic tree: the A allele is shared by all sequences in one major branch and the T allele by all sequences in the other. This is an “unrooted” tree; it has not been rooted by including an obvious ancestral sequence, which would help to determine the true chronology of mutational events. Therefore, in this article, terms that imply timing refer to the apparent order of events according to this tree structure, not to the absolute timing of mutational events. This simplifies the discussion. Even without the chronology, however, comparing divergent clades that are enriched for a phenotypic variable, such as age of onset of AD, provides an opportunity to screen the clades for mutations related to that disease. Each horizontal line in this figure indicates a sequence; a “stack” of horizontal lines indicates a sequence that is shared by multiple individuals. Each terminal clade on this tree appears to have a different mutational history. The circles on figure 2 (left panel) enclose mutations that introduce sequence diversity and also divide the population into related but distinct clades. The terminal clades are indicated by the boxes in figure 2 (right panel). This figure illustrates that mutations occur on chromosomes in the context of earlier mutations, giving rise to sequence diversity even within regions of strong LD. Identifying a collection of phased mutations or polymorphisms within a region of strong LD, as represented in figure 2, typically exceeds the capacity of genome-wide association studies (GWAS). In addition, because many of these mutations are uncommon, they are not adequately assayed in GWAS.
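The top-level split and the “stacks” can be sketched in Python, assuming phased haplotypes encoded as equal-length strings with one allele character per polymorphism. This is not phylogenetic tree inference; it only partitions sequences by the allele at a chosen anchor position (playing the role of 922) and counts identical sequences. The function name, encoding and toy haplotypes are hypothetical.

```python
from collections import Counter, defaultdict

def split_and_stack(haplotypes, anchor_index):
    """Group haplotypes by the allele at an anchor position, then
    collapse identical haplotypes into stacks with copy counts.

    haplotypes: equal-length strings, one allele character per polymorphism.
    Returns {anchor allele: Counter(haplotype -> number of copies)}.
    """
    groups = defaultdict(Counter)
    for hap in haplotypes:
        groups[hap[anchor_index]][hap] += 1
    return dict(groups)

# Toy haplotypes (hypothetical); position 0 stands in for the anchor SNP.
toy = ["AGC", "AGC", "AGT", "TCC", "TCC", "TCC"]
for allele, stacks in split_and_stack(toy, 0).items():
    print(allele, dict(stacks))
# A {'AGC': 2, 'AGT': 1}
# T {'TCC': 3}
```

A real analysis would go on to build the tree within each branch; this sketch only shows how shared sequences collapse into stacks once the sequences are phased.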
As described earlier, clade A on the phylogenetic tree constructed from this region of TOMM40 was enriched for LOAD patients. Further analysis showed that this enrichment was due at least in part to an unequal distribution of subjects with the APOE4/4 genotype into clade A: 24% of the genotypes in clade A were APOE4/4, while APOE4/4 represented only 3% of clade B. The APOE3/4 genotype was evenly distributed between the two clades and APOE3/3 was more prevalent in clade B.
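The clade percentages above are simple within-clade proportions. A minimal sketch, assuming one APOE genotype label per sequence in a clade (a hypothetical encoding; the labels below are toy values, not study data):

```python
from collections import Counter

def genotype_fractions(genotypes):
    """Fraction of each APOE genotype label among the sequences in one clade."""
    counts = Counter(genotypes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty clade")
    return {g: n / total for g, n in counts.items()}

# Toy clade (hypothetical labels, not study data).
clade = ["E4/E4"] + ["E3/E4"] * 2 + ["E3/E3"]
print(genotype_fractions(clade))  # {'E4/E4': 0.25, 'E3/E4': 0.5, 'E3/E3': 0.25}
```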
We attempted to develop a complex algorithm, based on the SNPs that differentiated clades A and B, that would accurately predict LOAD risk. This proved difficult because some mutations were shared by multiple terminal clades, although on different sequence backgrounds. The task became much simpler when we identified the polymorphisms that differentiated all of the stacks of common haplotypes, illustrated by the various boxes in figure 2 (right panel). We were surprised to discover that each of these boxes was distinguished by the length of a polymorphic poly-T locus, 523. While the poly-T length is approximately the same within each box, the average poly-T length differs significantly among the boxes. When we next asked on which APOE allelic background each poly-T mutation occurred, we saw for the first time why APOE4 had appeared to be the risk gene: APOE4 was virtually always linked to a long poly-T variant. Short poly-T variants were virtually always on APOE3 strands. However, some APOE3 alleles were linked to long poly-T variants.
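Asking which APOE allele each poly-T variant sits on amounts to a cross-tabulation over phased chromosomes. A minimal sketch, assuming each chromosome has been summarized as an (APOE allele, 523 length class) pair; the function names and toy data are hypothetical.

```python
from collections import Counter

def linkage_table(chromosomes):
    """Count (APOE allele, 523 class) pairs, one pair per phased chromosome."""
    return Counter(chromosomes)

def fraction_linked(table, apoe_allele, poly_t_class):
    """Fraction of chromosomes carrying apoe_allele that also carry poly_t_class."""
    total = sum(n for (apoe, _), n in table.items() if apoe == apoe_allele)
    if total == 0:
        raise ValueError(f"no chromosomes carry {apoe_allele}")
    return table[(apoe_allele, poly_t_class)] / total

# Toy data (not study data): 4 APOE4 chromosomes, 6 APOE3 chromosomes.
toy = [("APOE4", "long")] * 4 + [("APOE3", "short")] * 3 + [("APOE3", "long")] * 3
table = linkage_table(toy)
print(fraction_linked(table, "APOE4", "long"))   # 1.0
print(fraction_linked(table, "APOE3", "short"))  # 0.5
```

The same table, keyed on (clade, 523 class) instead, would express how short and long repeats segregate between clades A and B.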
Figure 3 shows histograms of the lengths of the 523 mutations on each APOE backbone for all individuals (both AD patients and controls) in this experiment. Looking first at the APOE4/4 genotype group, all the poly-T lengths were greater than 19 T nucleotides, and were typically 22 – 29 T nucleotides long. Looking next at the APOE3/3 individuals, there were clearly two separate poly-T length groups in relatively similar proportions. The short poly-T lengths were 11 – 16 T nucleotides and the longer poly-T lengths were 29 – 39 T nucleotides. Thus APOE3 alleles can now be characterized on the basis of being connected to either a long or short poly-T.
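The length ranges reported here suggest a simple rule for labelling 523 alleles. The sketch below uses the ranges given in the text (short 11–16 T, long 19–39 T); treating lengths outside both ranges, such as 17–18 T, as unassigned is an assumption, not a published cut-off, and the function name is hypothetical.

```python
def classify_523(n_t):
    """Label a 523 poly-T allele by its number of T residues.

    Uses the ranges reported in the text: short = 11-16 T, long = 19-39 T.
    Anything else (e.g. 17-18 T) is 'unassigned'; this is an assumption,
    not a published cut-off.
    """
    if 11 <= n_t <= 16:
        return "short"
    if 19 <= n_t <= 39:
        return "long"
    return "unassigned"

print([classify_523(n) for n in (14, 17, 25, 35)])
# ['short', 'unassigned', 'long', 'long']
```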
Further analyses demonstrated that short 523 repeats always segregated to clade B, while long 523 repeats, whether linked to APOE3 or APOE4, segregated to clade A. APOE4 occurred in clade A 98% of the time. Thus, examining the precise distribution of poly-T lengths in geographically defined population studies of thousands of individuals will be much more informative than examining APOE status.
After confirming the phylogenetic structure of this region of TOMM40 and the distribution of APOE and 523 genotypes on the tree, we asked whether the 523 genotype influenced the central LOAD endophenotype, age of disease onset. We addressed this question in a separate population of APOE3/4 LOAD patients by asking whether mean age of onset differed between carriers of the APOE3-long 523 haplotype and carriers of the APOE3-short 523 haplotype. The DNA and age of onset data for these well-characterized patients were obtained from the Duke Bryan Alzheimer Disease Research Center. All of these patients carried one APOE4-long 523 haplotype; those who also carried a short 523 allele (5 subjects) had an average age of onset of 78 years (Figure 4). Those who carried a second long 523 allele (29 subjects) had an average age of onset of 70 years, which is very similar to the age, first determined by Corder et al. and verified over the last 17 years, at which 50% of APOE4/4 individuals have diagnosed disease.2 Analysis of age of onset in a limited number of APOE3/3 patients suggests that patients carrying long 523 alleles have earlier onset of cognitive impairment or LOAD, consistent with the observations in APOE3/4 patients. The data discussed here come from case/control cohorts rather than an epidemiologic study, so conclusions about population risk will have to wait for analyses of prospective, observational series of aged subjects. Several prospective studies of LOAD have been ongoing for 4–18 years. We plan to genotype the 523 and APOE loci for incident LOAD cases from these studies, as well as for all individuals who remain free of disease according to accepted neuropsychological criteria.
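The comparison above is a difference in group means. A minimal sketch, assuming each patient record holds the class of the second 523 allele and the age at onset; the function name is hypothetical and the records are invented for illustration. With groups as small as 5 subjects, a formal statistical test and confidence intervals would be needed before drawing conclusions.

```python
from statistics import fmean

def mean_onset_by_group(patients):
    """patients: iterable of (second_523_class, age_at_onset) tuples."""
    groups = {}
    for cls, age in patients:
        groups.setdefault(cls, []).append(age)
    return {cls: fmean(ages) for cls, ages in groups.items()}

# Invented records for illustration only (not study data).
toy = [("short", 72), ("short", 76), ("long", 65), ("long", 67), ("long", 69)]
print(mean_onset_by_group(toy))  # {'short': 74.0, 'long': 67.0}
```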
The phylogenetic analysis of the APOE LD region indicates that each of the two chromosome copies in every person can add great phenotypic heterogeneity, even within a relatively homogeneous group such as Caucasians. In the case of LOAD, this phenotypic complexity is expressed as variation in age of onset, which obscures the Mendelian inheritance of the 523 locus so that LOAD presents as a late-onset, complex disease. The variable repeat inherited from each parent provides the phenotypic complexity measured by age of onset.
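Under a co-dominant model, both inherited alleles contribute, so heterozygotes form their own genotype class. A minimal sketch, assuming each allele has already been labelled “short” or “long”; the function name is hypothetical.

```python
def genotype_class(allele1, allele2):
    """Unordered 523 genotype from two allele classes ('short' or 'long').

    Under a co-dominant model, short/long heterozygotes are kept as their
    own class instead of being merged into a single 'carrier' category.
    """
    order = {"short": 0, "long": 1}
    first, second = sorted((allele1, allele2), key=order.__getitem__)
    return f"{first}/{second}"

print(genotype_class("long", "short"))  # short/long
print(genotype_class("long", "long"))   # long/long
```

Keeping three genotype classes, rather than a carrier/non-carrier flag, is what lets each parent's allele show up separately in the age of onset phenotype.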
These new data raise a new question: Is APOE the risk gene, or is the disease-causing locus the 523 poly-T variant in TOMM40? Genetic parsimony and Occam's razor would favor 523 as the “real gene”, with APOE4 having been identified because it is almost always linked to a long 523 poly-T allele. The genetic inheritance of the disease now seems clearer than its biology.
For a drug discovery program, gene identification alone is not enough; what is critical for LOAD is to understand the pathogenic mechanisms and triggering events. We know that apoE protein interacts with the mitochondrial outer membrane and that different apoE isoforms interact differently with neuronal mitochondria, with differential effects on their dynamics and function that lead to differences in neurite outgrowth.8–11 There is, therefore, likely also a role for APOE in disease pathogenesis. Both apoE and Tom40 proteins are implicated in mitochondrial toxicity early in the proposed pathogenic process of AD, and perturbation of their normal activities can explain the formation of amyloid plaques and other toxic protein aggregates. The so-called “amyloid cascade” exists and may exacerbate pathogenesis once initiated. However, early toxicity is likely initiated by apoE-Tom40 protein interactions that cause the release of cytochrome c from damaged mitochondria, with subsequent apoptosis and initiation of the amyloid cascade.12,13 Targeting the long-term poisoning of mitochondria and increased apoptosis would seem to be a good way to delay the age of onset and interrupt the course of AD: the best place to turn off the water in a cascade is at the source.
The variable-length poly-T polymorphism appears to be stably inherited through human evolution rather than representing a site of sporadic, recurrent, contemporary mutation. The phylogenetic approach identifies stable length mutations that occur on different evolutionary mutational backgrounds. The situation is therefore unlike the dominant triplet repeat diseases with variable expressivity, such as myotonic muscular dystrophy, in which the number of repeats is unstable across successive generations. It would thus seem that the 523 polymorphism could be used in a predictive test for LOAD age of onset. Based on an individual's age (between 60 and 87 years) and 523 genotype, we propose that a risk estimate (high or low) can be made for the onset of mild cognitive impairment symptoms and conversion to AD over the next 5–7 years (i.e., not lifetime risk).
Replication and validation of the relationship between 523 and age of LOAD onset are necessary before the marker is used in clinical practice. For that reason, we have designed a prospective clinical study that combines validation of the genetic marker with assessment of the efficacy of a safe drug for delaying disease onset (www.opalstudy.org). A number of considerations regarding this clinical study have been discussed with scientists at the US Food and Drug Administration via the Voluntary Exploratory Data Submission mechanism. Discussions with pharmaceutical companies have already been initiated by Zinfandel Pharmaceuticals, Inc., a “virtual” drug development company organized to design, accelerate, and manage the clinical trial. During the course of this prospective study, the poly-T assay will be developed into an FDA-qualified test. Even with the improved estimate of age of onset risk provided by the genetic marker, a study of this nature will take 5 or more years to complete.
While we have chosen not to commercialize this test for clinical use until after validation, the test will be available for sponsored academic research. In parallel, we will negotiate licenses for commercial studies, as the marker will have applications in stratifying results from ongoing clinical trials of AD interventions as well as in prospectively stratifying new studies. Given the range of APOE allele frequencies across ethnic groups, different ethnic groups are likely to have different allele frequencies for the poly-T variant as well. Ethnic diversity is not a problem if anticipated, but primary phylogenetic studies will be necessary to describe any differences in the evolution of the LD region and to calculate the allele frequencies.
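Allele frequencies in a new population would be estimated by counting alleles across diploid genotypes, separately for each group studied. A minimal sketch, assuming each individual's 523 genotype is recorded as a pair of allele labels; the function name, encoding and toy data are hypothetical.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Estimate allele frequencies from diploid genotypes.

    genotypes: iterable of (allele1, allele2) pairs, one per individual;
    each individual contributes two alleles to the count.
    """
    counts = Counter()
    for a1, a2 in genotypes:
        counts[a1] += 1
        counts[a2] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: n / total for allele, n in counts.items()}

# Toy population (hypothetical): one short/long and one long/long individual.
print(allele_frequencies([("short", "long"), ("long", "long")]))
# {'short': 0.25, 'long': 0.75}
```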
I anticipate, especially based on my experience in the early 1990s with the spirited debate around the diagnostic use of APOE genotyping in AD, that there will be differences of opinion on various ethical, legal and social issues (ELSI). I have therefore already spent more than a year consulting with a panel of external ELSI experts from around the world, who helped to develop the clinical plan.
There are many unanswered questions that will be addressed in the coming years and one of these refers to the genetics of other so-called complex diseases: It is reasonable to consider that other small structural polymorphisms, not detected by GWAS, could be responsible for the variable expression of other complex diseases. These loci could be defined using targeted phylogenetic strategies. The work described in this review demonstrates the complexity of the genotype-phenotype relationship, and is a reminder that we are just beginning our journey to understand the human genome.
The work described in this article would not have been possible without the generous contribution of DNA samples from the Netherlands Brain Bank (under the direction of Dr. Rivka Ravid), the Banner Sun Health Research Institute (under the direction of Dr. Thomas Beach), the Arizona Alzheimer's Disease Core Center (Arizona ADCC) and the Joseph and Kathleen Bryan Alzheimer's Disease Research Center (Bryan ADRC). Work at the Arizona ADCC was supported in part by grants from the National Institute on Aging (NIA) to Dr. Eric Reiman (P30 AG019610 and R01 AG031581); grants from the National Institute of Neurological Disorders and Stroke (R01 NS059873) and Science Foundation Arizona to Dr. Matthew Huentelman; the Arizona Alzheimer's Consortium, and the State of Arizona. The work at the Joseph and Kathleen Bryan ADRC was supported in part by a NIA grant to Dr. Kathleen Welsh-Bohmer (P30 AG028377). Dr. Roses is supported in part by a grant from the NIA (1RC1 AG035635-01).
Financial disclosure: Dr. Roses is the President of three companies filed as S-Corporations in the state of North Carolina: Cabernet Pharmaceuticals, Inc. is a pipeline pharmacogenetic consultation and project management company that has other pharmaceutical companies as clients; Shiraz Pharmaceuticals, Inc. is focused on the commercialization of diagnostics, including companion diagnostics, for universities, pharmaceutical companies, and biotechnology companies; Zinfandel Pharmaceuticals is the sponsor of OPAL (Opportunity to Prevent Alzheimer's Disease), which is a combined clinical validation of a diagnostic and a pharmacogenetics-assisted delay of onset clinical trial.
These companies are independent of Duke University, but the diagnostic intellectual property generated by Dr. Roses or his team is intended to be treated as Deane Duke Discovery Institute property once there is an established commercial value. There are no other potential sources of financial conflict. Dr. Roses had full access to all data that contributed to the work described.