It is hoped that an understanding of the genetic basis of Parkinson’s disease (PD) will lead to an appreciation of the molecular pathogenesis of disease, which in turn will highlight potential points of therapeutic intervention. It is also hoped that such an understanding will allow identification of individuals at risk for disease prior to the onset of motor symptoms.
A large amount of work has already been performed in the identification of genetic risk factors for PD, and some of this work, particularly those efforts that focus on genes implicated in monogenic forms of PD, has been successful, although hard won. A new era of gene discovery has begun with the application of genome wide association studies, which promise to facilitate the identification of common genetic risk loci for complex genetic diseases. This is the first of several high throughput technologies that promise to shed light on the (likely) myriad genetic factors involved in this complex, late-onset neurodegenerative disorder.
The exact pathobiologic basis for the progressive cellular and system dysfunction observed in Parkinson’s disease (PD) has remained largely elusive since the recognition of this disorder as a distinct entity. While palliative treatments exist for this disease, it is hoped that understanding the molecular events that lead to PD will facilitate the creation of an etiologic based therapy that will halt or reverse the disease process. This hope is a primary motivation underlying work aimed at understanding the genetic basis of PD and related disorders.
Monogenic forms of parkinsonism with a clear Mendelian pattern of inheritance have been the most accessible in terms of the identification of genetic causes and contributors. This work has led to the discovery of mutations in SNCA, PARK2, PINK1, PARK7 and LRRK2 as causes of primary parkinsonism and/or PD. These findings have been central to much of the investigation into the molecular underpinnings of PD. While the identification of these loci has been important, mutation of these genes is responsible for a relatively small proportion of PD cases.
In parallel to work on monogenic PD a large amount of research has focused on identifying genetic variability that confers risk for, rather than causes, PD. This work aims not only to add insight into the molecular pathogenesis of PD, but also to create a risk profile for disease in the general population. For the most part risk variant identification is based on the tenet of the common-disease common-variant hypothesis. This theory operates on the premise that common genetic variants underlie susceptibility for common diseases such as PD. The common-disease common-variant hypothesis is an idea that is the basis for the vast majority of genetic case control association studies and the impetus for initiatives such as the International Human Haplotype Map Project (www.hapmap.org). There has been significant contention over the common disease common variant hypothesis with substantive support for the idea that rare mutations underlie the etiology of complex disorders. While the common disease common variant and rare variant hypotheses are often proposed as opposing theories they are not mutually exclusive; however, the current accessible technologies do not readily allow the investigation of rare mutations as a cause of common disease in a complete way.
In this review we will discuss the genetic risk loci discovered thus far in PD; we will limit our discussion to those loci for which a genetic variant or variants have been proposed, rather than extant loci for which no ‘gene’ has been identified. We will extend our discussion to include a brief consideration of genome wide association studies in PD.
In general, within the field of disease genetics the search for common genetic risk variants has been difficult and has had a low hit rate. There are of course several reasons for this failure, particularly in a disease such as PD. Most prominent is the size of effect and thus the number of samples required to detect an effect. Many studies were predicated on the idea that risk variants with effect sizes comparable to APOE ε4 in Alzheimer’s disease may exist for PD; thus these studies generally included sample sizes only in the low hundreds. Early genome wide association data show quite convincingly that, in terms of common genetic risk factors (allele frequency >10%), there are no risk loci with an odds ratio greater than two in the North American White population. It is likely that sample sizes of more than a thousand are required to detect genetic variants that exert effects below this magnitude. The second limitation of such studies is that, because they are largely low-throughput in nature, typing usually only one gene and one or a few variants, the prior odds of selecting the correct gene and the correct variant to type were very small (i.e. if there are 10 genes that exert a measurable risk in PD and 20 SNPs, the odds are against an investigator choosing the correct gene, out of 25,000 in the genome, or the correct SNPs, out of 2 million in the genome).
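The prior-odds argument can be made concrete with a short calculation. The sketch below is illustrative only: it takes the hypothetical figures from the example above (10 risk genes among 25,000, 20 risk SNPs among 2 million) and assumes that candidates are picked uniformly at random, which real candidate-gene studies did not do. The function name `chance_of_hit` is introduced here for illustration.

```python
from fractions import Fraction

# Figures taken from the hypothetical example in the text.
GENES_IN_GENOME = 25_000
SNPS_IN_GENOME = 2_000_000
TRUE_RISK_GENES = 10
TRUE_RISK_SNPS = 20


def chance_of_hit(true_items: int, pool: int, picks: int = 1) -> float:
    """Probability that at least one of `picks` distinct random choices
    from `pool` is one of the `true_items` (assumes uniform random choice)."""
    miss = Fraction(1)
    for i in range(picks):
        # Probability that pick i also misses, given all earlier picks missed.
        miss *= Fraction(pool - true_items - i, pool - i)
    return float(1 - miss)


gene_hit = chance_of_hit(TRUE_RISK_GENES, GENES_IN_GENOME)
snp_hit = chance_of_hit(TRUE_RISK_SNPS, SNPS_IN_GENOME)

print(f"one randomly chosen gene is a risk gene: {gene_hit:.4%}")
print(f"one randomly chosen SNP is a risk SNP:   {snp_hit:.4%}")
```

Under these assumptions a single randomly chosen gene has a 1 in 2,500 chance of being a true risk gene, and a single randomly chosen SNP a 1 in 100,000 chance. Prior biological or genetic evidence raises these odds, which is the point made in the following paragraph.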
Thus the majority of previous studies were not only unlikely to have selected a genuine risk locus or variant for interrogation but further were likely not powered to detect risk effects, should they exist. The exception to this work is that which has centered on exhaustive analysis of genes already implicated in familial forms of related disorders, emphasizing that candidate analysis in the absence of prior genetic evidence implicating the locus in disease, is likely to fail. Such analyses provide two of the most convincing sets of genetic association data for PD, implicating the genes SNCA and MAPT as risk loci.
Genetic variability at SNCA is arguably the most reliable association of a common genetic risk locus with PD identified to date. The impetus for examination of this locus came from the identification of SNCA mutations as a cause of a rare familial form of PD. Soon after the identification of the first disease causing mutation in SNCA, the protein product of this gene, α-synuclein, was shown to be a major constituent of the pathologic hallmark inclusion of PD, the Lewy body. The relevance of studying rare familial forms of PD to understanding the common non-familial form of this disease had remained in question; however, these two findings elegantly linked these disease entities. Shortly after this work Krüger and colleagues reported an association between common genetic variability in SNCA and risk for PD, specifically an association with a dinucleotide repeat approximately 10kb 5’ to the translation start site of SNCA. This allele, called REP or NACP REP in the literature, is an imperfect dinucleotide repeat believed to reside in or close to a regulatory region upstream of SNCA. Variability at and proximal to REP has been examined in a large number of studies [2–9], and most recently a meta-analysis of published studies combined with novel data revealed a consistent association between risk for disease and the longer REP allele. Experimental in vitro evidence suggests that this allele is associated with increased expression of SNCA. From an etiologic perspective this observation fits well with the discovery that multiplication of the SNCA locus, which leads to increased levels of the wild type α-synuclein, causes familial PD [11, 12].
Further, the multiplication mutations, which to date comprise both triplication and duplication events, appear to have a dose dependent effect on disease severity: triplications, which double the genomic copy number of SNCA, result in disease with an onset in the 4th decade of life, whereas duplications, which increase SNCA load by 50%, result in disease onset in the 5th or 6th decade of life. Given this finding it is quite reasonable to suppose that common genetic variability at SNCA, which may increase expression by a small amount, is a risk factor for the late onset sporadic form of the disease. In addition to the work characterizing the role of the REP alleles in risk for PD, more complete analyses have shown an association between genetic variability at other parts of SNCA and risk for disease, in particular variability in the 3’ half of the gene [13–15]. Although this work does not resolve the issue of which are the pathobiologically relevant variants, it does suggest that investigation of REP alleles alone may miss, or underestimate, the contribution of genetic variability at SNCA to sporadic PD.
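The dosage arithmetic behind these figures can be sketched as follows. This is a minimal illustration that assumes a heterozygous carrier, i.e. one chromosome carries the multiplied locus and the other carries a single normal copy; the function names are introduced here for illustration.

```python
NORMAL_COPIES = 2  # one SNCA copy on each chromosome of a diploid genome


def total_copies(copies_on_mutant_chromosome: int) -> int:
    """Total SNCA copies in a heterozygous carrier (assumption: the other
    chromosome carries exactly one normal copy)."""
    return copies_on_mutant_chromosome + 1


def relative_dosage(copies_on_mutant_chromosome: int) -> float:
    """Gene dosage relative to a non-carrier with two copies."""
    return total_copies(copies_on_mutant_chromosome) / NORMAL_COPIES


for name, copies in [("duplication", 2), ("triplication", 3)]:
    print(f"{name}: {total_copies(copies)} copies, "
          f"{relative_dosage(copies):.1f}x normal dosage")
```

A duplication gives three copies in total (1.5 times normal, a 50% increase), and a triplication gives four (twice normal), matching the figures quoted above.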
There are six major brain isoforms of the microtubule associated protein tau (hereafter called tau), generated by alternative splicing of exons 2, 3 and 10 of the gene MAPT. Alternative splicing of exon 10 results in tau with 3 or 4 microtubule binding repeats (3 repeat tau or 4 repeat tau). Tau is a major protein component of neurofibrillary tangles, a hallmark lesion of Alzheimer’s disease; in these lesions tau appears to be deposited as hyperphosphorylated insoluble filaments. Tau deposition is a hallmark of several other neurodegenerative disorders, including Pick’s disease (OMIM #257220), argyrophilic grain disease (OMIM #172700), corticobasal degeneration, progressive supranuclear palsy (PSP; OMIM #601104) and frontotemporal dementia (FTD; OMIM #600274).
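Three alternatively spliced exons could in principle give eight combinations, so it is worth spelling out why only six isoforms arise. The sketch below enumerates the combinations under one assumption not stated in the text: that exon 3 is only included when exon 2 is also included. The `0N/1N/2N` and `3R/4R` labels follow the naming commonly used for tau isoforms and are likewise not taken from this review.

```python
from itertools import product


def tau_isoforms() -> list[str]:
    """Enumerate tau isoforms from inclusion or exclusion of exons 2, 3, 10."""
    isoforms = []
    for exon2, exon3, exon10 in product([False, True], repeat=3):
        if exon3 and not exon2:
            # Assumption: exon 3 is not spliced in without exon 2.
            continue
        n_inserts = int(exon2) + int(exon3)   # N-terminal inserts: 0, 1 or 2
        repeats = 4 if exon10 else 3          # exon 10 adds the fourth repeat
        isoforms.append(f"{n_inserts}N{repeats}R")
    return sorted(isoforms)


print(tau_isoforms())
```

Of the eight possible combinations, the two that include exon 3 without exon 2 are excluded, leaving three 3 repeat and three 4 repeat isoforms.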
A link between the MAPT locus and disease was first reported in 1997 by Conrad and colleagues, who showed an association between a repeat polymorphism close to the gene and PSP, indicating that variability at this locus is a risk factor for this disorder. In 1998 definitive evidence linking MAPT to neurodegenerative disease was provided when Hutton and colleagues identified mutations in this gene as a cause of frontotemporal dementia with parkinsonism linked to chromosome 17 (FTDP17). To date more than 35 mutations associated with this disease have been identified at the MAPT locus. Many of these mutations are predicted to alter the alternative splicing of MAPT, changing the ratio of 3 repeat tau to 4 repeat tau [17, 19].
In addition to rare causal mutations, common variability in MAPT has been linked to several diseases; notably robust association between MAPT and risk for PSP, AD and most recently PD has been reported. From a genomic perspective the architecture of the MAPT locus is unusual; the gene sits within a large block approximately 1.6 million base pairs long that shows reduced recombination and thus high levels of linkage disequilibrium. This appears to be a result of a common genomic inversion in the Caucasian population; this inversion inhibits recombination between genomic fragments that are in the opposite orientation.
This phenomenon results in two common Caucasian haplotype groups across this locus; often termed H1 and H2. Association between MAPT H1 sub-haplotypes and risk for PD has been tested by many groups [20–24], and the results in general show a consistent association with disease, the H1 haplotype conferring a risk with an odds ratio of approximately 1.3 (for summary statistics see the PD Gene database http://www.pdgene.org/meta.asp?geneID=14). Evidence is also mounting that the MAPT risk alleles for these disorders are associated with increased MAPT expression; either in total or specific to four-repeat tau splice variants (i.e. those containing exon 10) [25, 26]. Most recently Tobin and colleagues  have shown association between PD risk and a sub-haplotype of H1; these authors then extended this work to show over-expression of 4 repeat tau in the brains of PD patients.
From a pathological standpoint the relationship between tau and PD remains enigmatic: in general the brains of PD patients do not show abundant tau positive neuropathology; however the data supporting genetic association between MAPT and risk for PD continues to grow and is certainly one of the more robust findings in the field of risk variants in neurogenetics.
Glucocerebrosidase is a lysosomal enzyme that hydrolyses the beta-glycosidic linkage of glucosylceramide, a ubiquitous sphingolipid present in the plasma membrane of mammalian cells, yielding ceramide and glucose. The human GBA gene is located on chromosome 1q21 and comprises 11 exons and 10 introns spanning over 7kb.
A 5.5kb pseudogene, which shares over 96% homology with GBA, is located just 16kb downstream of the functional gene. The difference in size between the two is due to several Alu insertions in intronic regions of GBA. The lack of functionality of the pseudogene is attributed to two exonic deletions: a 4bp deletion in exon 4 and a 55bp deletion in exon 9. The pseudogene is absent in non-primate species, and it has been suggested that the duplication event that led to the pseudogene occurred about 40 million years ago. Interestingly, it has been shown that the orangutan does not carry a pseudogene but instead two functional genes, hence potentially four copies of GBA.
Mutations in GBA are the cause of a recessive lysosomal storage disorder – Gaucher disease. Patients with Gaucher disease present macrophages enlarged with deposits of glucosylceramide, suggesting that mutations in GBA act in a loss-of-function fashion . Over 200 mutations have been described in GBA, including point mutations, deletions and recombinant alleles derived from the pseudogene sequence. It has been estimated that approximately 20% of the pathogenic mutations in GBA are caused by recombination or gene conversion between the two genes. Although mutations are distributed over the entire GBA coding region, pathogenic mutations seem to cluster in the carboxyl-terminal region, which encodes the catalytic domain .
The phenotypes of Gaucher and Parkinson’s diseases do not overlap significantly, but the first indication of a relationship between the two came from clinical case reports. These described patients with Gaucher disease who developed early-onset, treatment-refractory parkinsonism.
The first report of an increased frequency of mutations in GBA in Parkinson’s disease patients was published online in 2003 . Here, the authors screened 57 brain samples from subjects with PD and 44 brain samples from adult subjects without a diagnosis of PD. Mutations in GBA were identified in 14% of the PD samples, and no mutations were found in the control samples. The percentage found in PD patients was of particular relevance, given that the carrier frequency for Gaucher disease-causing alleles is estimated at 0.006.
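To give a sense of why this percentage stands out against the population carrier frequency, the sketch below computes a simple binomial tail probability. It rests on several assumptions that are not in the original report: that 14% of 57 samples corresponds to 8 carriers, that each sample is an independent draw from a population with carrier frequency 0.006, and that a brain-bank series behaves like a random sample. It is an illustration of the scale of the enrichment, not a reanalysis.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_PD_SAMPLES = 57          # from the text
CARRIER_FREQUENCY = 0.006  # population estimate quoted in the text

# Assumption: 14% of 57 samples is taken to mean 8 carriers.
observed_carriers = round(0.14 * N_PD_SAMPLES)

tail = binomial_tail(N_PD_SAMPLES, observed_carriers, CARRIER_FREQUENCY)
expected = N_PD_SAMPLES * CARRIER_FREQUENCY

print(f"expected carriers at population frequency: {expected:.2f}")
print(f"P(at least {observed_carriers} carriers by chance): {tail:.2e}")
```

Under these assumptions fewer than one carrier would be expected among 57 samples, so observing around eight would be extremely unlikely by chance alone.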
In 2004 Aharon-Peretz J et al. reported a screening of 99 Ashkenazi PD patients, 74 Ashkenazi Alzheimer’s disease patients and 1543 healthy Ashkenazi Jews for six GBA mutations considered to be the most common causes of Gaucher disease among Ashkenazi Jews. Strikingly, 31% of PD patients had one or two mutant alleles, compared with only 6% of controls. In addition, among the PD patients, carriers of GBA mutations had a significantly earlier age of onset than non-carriers.
The following year, Clark LN et al. presented a report on 160 Ashkenazi Jewish probands with Parkinson’s disease and 92 clinically evaluated, age-matched controls of Jewish ancestry. Subjects were screened only for the N370S mutation, which was the most frequent variant in the previous Ashkenazi Jewish study. Seventeen probands (10.7%) were identified with mutations compared to 4.3% of controls; however, these results did not reach statistical significance. Sato et al. screened for seven of the most common variants in GBA in a series of 88 unrelated Caucasian subjects of Canadian origin, selected for early age of onset and/or positive family history; a group of 122 healthy controls was also screened. Mutations were enriched in the PD group when compared with the controls (5.6% vs 0.8%; p=0.048).
In a smaller series of cases and controls collected in Venezuela (33 PD samples, 31 controls), Eblan M. et al. screened the entire coding region of the GBA gene and described an increase in mutation frequency among the early-onset PD samples when compared to the controls (12% vs 3.2%). Toft M. et al. published a report on the screening of two variants in GBA in individuals from Norway. This was the first report on northern European subjects. The authors screened 311 PD patients and 474 healthy controls for the two common mutations N370S and L444P. They did not find an increased frequency of mutations in PD samples when compared to controls; however, the frequency of the mutations was surprisingly high when compared to estimates in white individuals (1.7% vs 0.6%). Although no statistically significant association was obtained, the fact that only two variants were screened, in a population not previously studied, may account for the lack of association.
Several other studies reported positive associations of GBA mutations with PD, and one with Lewy body disorders. The most recent studies, which performed complete screening of the gene, all found an association with PD in their study populations. Clark, L. et al. performed a study with two subsets of PD patients: one with Jewish ancestry and another without Jewish background. Controls were also selected to match each of these groups. The frequency of GBA mutations was consistently greater in PD samples when compared to controls, particularly if only early-onset PD samples were considered.
Our group recently published a report on a cohort of Portuguese samples, where the enrichment of mutations among PD samples is also clear. This is the first report to obtain a clear statistically significant association in a population other than Ashkenazi Jewish. However, this same population has been shown to present an increased frequency of the G2019S mutation in the gene LRRK2, a mutation known to be enriched within the Ashkenazi Jewish population.
An interesting result regarding GBA, and lysosomal enzymes in general, comes from the report of Balducci C. et al. who tested PD patients and controls for activity of lysosomal hydrolases in the CSF. GBA activity was significantly reduced in the CSF of PD patients, as would be expected by a loss-of-function model of mutations in these samples .
Given all these results it seems clear that mutations in GBA are a risk factor for the development of PD, particularly early-onset PD. The mechanism by which mutations exert their effect and act as a risk factor is not yet understood, but several tentative explanations have been provided in the literature, relating to decrease in lysosomal function or involvement of the ubiquitin proteasome system.
Mutations in LRRK2 were identified as a cause of PD in 2004 by us and others [42, 43]. A single mutation, G2019S, is a relatively common cause of PD in Caucasian populations, underlying approximately 2% of sporadic PD cases in North America and Northern Europe and 5% of cases with a positive family history for disease [44–46]. This mutation is more common in populations from Portugal, of Ashkenazi Jewish origin and of North African Arab origin, underlying 8%, 21% and 41% of disease in these populations respectively [47–49]. The G2019S variant does not, however, occur at appreciable frequency in control cohorts from these populations, so for the purpose of this article we will not designate this as a susceptibility variant. Two variants reported from Asian populations, however, do appear to be true risk variants for PD. The first, G2385R, was initially described in a Taiwanese family. Assessment of this variant in large Asian populations showed association with risk for disease in Taiwanese [51–53], Japanese, Hong Kong Chinese and mainland Chinese [56, 57] populations. In general this work showed that the risk variant, 2385R, is present in PD populations at a frequency of ~10%, whereas it is found in only 0.5%–5% of controls. Taking a fairly conservative view, these results suggest that carrying the risk allele imparts a two-fold increase in an individual’s risk of Parkinson’s disease. Given that this association appears robust across Asian populations, this risk allele is an underlying factor in a very large number of PD cases worldwide. More recently a second LRRK2 risk allele, also identified within Asian PD populations, has been described [58, 59].
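The two-fold figure can be checked against the quoted frequencies with a simple odds ratio calculation. The sketch below assumes that the percentages given are carrier frequencies in cases and controls (the text does not specify whether they are carrier or allele frequencies) and uses only the rounded values quoted above, so it should be read as an approximation.

```python
def odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds ratio for carrying a variant, from its frequency in cases
    and in controls (both given as proportions between 0 and 1)."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


CASE_FREQUENCY = 0.10                    # ~10% of PD patients, from the text
CONTROL_FREQUENCIES = (0.005, 0.05)      # 0.5%-5% of controls, from the text

for control_frequency in CONTROL_FREQUENCIES:
    ratio = odds_ratio(CASE_FREQUENCY, control_frequency)
    print(f"controls {control_frequency:.1%}: odds ratio {ratio:.1f}")
```

Using the highest reported control frequency (5%) gives an odds ratio of just over two, which is the conservative end of the range; lower control frequencies imply considerably larger estimates.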
Variants in the gene OMI/HTRA2 (OMIM #606441) were recently associated with an increased risk for PD . This gene encodes a serine-protease with proapoptotic activity containing a mitochondrial targeting sequence at its N-terminal region. Several lines of evidence in the literature support a role for OMI/HTRA2 in neurodegeneration, the first of which was produced by Gray and colleagues  when they showed that Omi/HtrA2 interacts with presenilin-1, which is encoded by a gene known to be involved in Alzheimer’s disease. Moreover beta-amyloid, which plays a pivotal role in the pathogenesis of Alzheimer’s, was shown to be cleaved by Omi/HtrA2 . Mouse models also provided support for the involvement of this protein in neurodegeneration: a mutation in the protease domain of Omi/HtrA2 was found as the genetic cause underlying disease in the mnd2 mutant mouse  and the knockout mouse showed loss of neurons in the striatum concomitant with parkinsonian features .
The description of mutations associated with PD came from the work of Strauss and colleagues in 2005. They screened a large cohort of 518 German PD patients and 370 healthy control individuals for mutations in OMI/HTRA2. One variant (p.G399S) was found only in PD patients (n=4). The other variant (p.A141S) was significantly overrepresented in the PD group (p=0.039), suggesting that it could act as a risk factor for PD. In vitro studies of both variants provided evidence of functional effects. Additionally, Omi/HtrA2 was detected in Lewy bodies in brain tissue from PD patients.
Recently, Simon-Sanchez and Singleton presented a thorough analysis of the coding region of OMI/HTRA2 in a case-control study, which comprised a large cohort of PD patients (n=644) and neurologically normal controls (n=828). The mutation initially thought to be pathogenic was found at the same frequency in PD samples and controls (0.77% and 0.72% respectively), indicating that it is not disease causing, but probably a rare variant in the German population. Similarly, no association with PD was found for the p.A141S variant. The evidence for the involvement of Omi/HtrA2 in neurodegenerative diseases is quite compelling at this point, but the genetic basis for this involvement is still very much open to debate.
Mutation of the gene PARK2 was the second genetic cause of parkinsonism identified. Mutation of this gene was found to cause an autosomal recessive juvenile form of parkinsonism. As with many autosomal recessive diseases, most proven pathogenic mutations are loss of function variants, involving large structural genomic disruption of the coding region of the gene or premature termination of the transcript; in addition, several missense mutations have been identified with varying levels of proof of association with disease. More controversial still is the role of single heterozygous mutations as risk factors for late onset typical PD. Much of this work has been driven by observations in family based studies where later onset typical PD was seen in relatives of patients with young onset parkinsonism; analysis of these cases revealed homozygous or compound heterozygous PARK2 mutations in the young-onset cases and single PARK2 mutations in the later onset affected relatives. These studies have been criticized because of the significant confound of ascertainment bias, i.e. the families that tend to be collected and analyzed are those with many affected family members. Further support for the role of single PARK2 mutations as a risk factor for disease is the observation that heterozygous mutation carriers display dopamine reuptake deficiency in FDOPA PET analysis. The case for heterozygous PARK2 mutations as risk loci, based on these observations and on data from case control analyses examining this issue, has been hindered by the lack of studies that have performed full sequence and gene dosage analysis of PARK2 in large groups of cases and controls. Only a small number of studies have taken this approach, and so far those data do not strongly support a large role for these mutations in typical PD, identifying pathogenic mutations in the heterozygous state in both cases and controls.
This issue warrants a large scale sequencing effort, however, it is likely that several thousand cases and controls will need to be fully sequenced to finally prove or disprove this putative association.
Genome wide association studies (GWAS) were a much-anticipated technology, and the application of this approach is expected to facilitate major inroads into our understanding of the genetic basis of disease . The basic tenet underlying GWAS is the common disease common variant hypothesis. A growing number of GWAS are being published and these are proving a valuable approach in understanding the genetics of complex disease. Two studies have been published thus far in PD; the success of these studies was limited somewhat by the sample size used, a point that is illustrated by their failure to identify SNCA or MAPT as potential risk loci. However, given the increasing investment in this technology  it is probable that several laboratories around the world are investing in large-scale GWAS in PD and that the next 2 years will see the identification of novel risk loci for this disease.
Because GWAS require large sample series, they necessitate inter-laboratory collaborations and large consortia; individual investigators are, by and large, unable to accumulate large enough series. Clearly the formation of such collaborations is a good thing for research: they facilitate communication between scientists, maximize the chances of finding positive signals and engender trust between research groups that allows collaboration outside the immediate aims of the collaborative framework. This latter point is particularly critical; as technologies improve, GWAS will be expanded to include better genomic coverage (currently even the most dense SNP platforms probably capture only ~70% of common variation) in more samples.
The most immediate challenge following GWAS will be in understanding the pathobiological consequences of identified risk variants. These will not easily be amenable to traditional disease model approaches used now in cell biological and transgenic research; the primary limitation being that the biological effects of risk variants are likely extremely subtle. Parsimony would suggest that the majority of identified risk variants will be non-coding, in all likelihood exerting an effect through expression, either modulating constitutive levels, expression level in response to a stimulus, sub-cellular expression and/or splicing. The easiest of these to catalog are alterations in expression and indeed some effort has gone toward characterizing in a genome wide manner, the effects of individual genetic variants on expression of proximal (cis) and distal (trans) transcripts . The creation of standard genotype-expression transcript maps will be a critical step in understanding the effects of disease associated genetic variants, and there have already been moves to create such a resource (http://nihroadmap.nih.gov/GTEx/).
While GWAS are providing a unique set of insights into complex diseases, they are only the first of many burgeoning technologies that will impact our understanding of biology and disease. Most anticipated of these is cost-effective genome wide resequencing; the launch of this type of work is a goal of the 1000 genomes project (http://www.1000genomes.org/page.php), which aims to catalog rare and common human variation by sequencing the genomes of at least 1000 individuals from around the world. Currently this is a huge endeavor, and genome resequencing for case-control analysis is cost- and time-prohibitive; however, this is likely to change over the next 2–5 years. The type of next generation sequencing employed for this research will not only facilitate genome resequencing but also allow us to perform analyses that were previously impractical, including genome wide assays of DNA methylation, analysis of histone modification, identification of transcription factor binding sites, full transcriptome sequencing and identification of allelic imbalance in expression. Each of these approaches provides revolutionizing data in its own right; however, the true power of such data will become evident as we integrate these datasets to garner a systems based understanding of biological and disease processes.
In summary, there are several common risk loci unequivocally associated with risk for PD; in each instance these genes were originally implicated in the disease process by studying families with disease. The advent and application of novel technologies promise to define other common genetic variants that exert risk for disease, to help in the identification of rare risk variants and to facilitate the understanding of the pathobiological consequences of genetic variants linked with disease.