Mutations in the leucine-rich repeat kinase 2 (LRRK2) gene together represent the most common genetic determinant of Parkinson's disease (PD) identified to date. The vast majority of patients with LRRK2-related PD reported in the literature carry one of three pathogenic substitutions: G2019S, R1441C, or R1441G. While G2019S and R1441C are geographically widespread, R1441G is most prevalent in the Basque Country and is rare outside of Northern Spain. We sought to better understand the processes that have shaped the current distribution of R1441G. We performed a haplotype analysis of 29 unrelated PD patients heterozygous for R1441G and 85 wild-type controls using 20 markers that spanned 15.1 Mb across the LRRK2 region. Nine of the patients were of Basque origin and 20 were non-Basques. We inferred haplotypes using a Bayesian approach and used a maximum-likelihood method to estimate the age of the most recent common ancestor. Significant but incomplete allele sharing was observed over a distance of 6.0 Mb, and a single, rare ten-marker haplotype 5.8 Mb in length was seen in all mutation carriers. We estimate that the most recent common ancestor lived 1,350 (95% CI, 1,020–1,740) years ago, in approximately the seventh century. We hypothesize that R1441G originated in the Basque population and that the mutation then dispersed through short-range gene flow that was largely limited to nearby regions of Spain.
Parkinson's disease (PD; MIM 168600) is the second most common neurodegenerative disorder and affects approximately 1–2% of the population over 60 years of age. It is characterized clinically by bradykinesia, resting tremor, rigidity, and postural instability, and pathologically by loss of dopamine neurons in the substantia nigra and Lewy body formation [1–3]. Only one in five PD patients reports a family history of the disease, and in most instances PD is thought to result from a complex interaction between genetic and environmental factors [4, 5]. However, studies of rare multigenerational pedigrees in which PD segregates in a Mendelian pattern have yielded five “causal” genes: PARK2 (MIM 600116 and 602544), PINK1 (MIM 605909), PARK7 (MIM 606324), SNCA (MIM 163890), and LRRK2 (MIM 609007). Of these five genes, mutations in LRRK2 are the most prevalent in PD patients of European origin.
The majority of patients with LRRK2-related PD reported in the literature carry pathogenic variants within one of two mutational hotspots: codon 1441 in exon 31 (R1441C/G/H) and codon 2019 in exon 41 (G2019S). R1441C, R1441H, and G2019S have each arisen from at least three separate founding events and are widely geographically distributed [7–10]. All three mutations occur in Asians and in multiple European subpopulations. In contrast, R1441G is largely limited to Northern Spain. Originally discovered in four families in the Basque Country, R1441G is found in approximately 20% of Basque patients with familial PD. It has also been identified at lower frequencies in patients from nearby provinces of Spain who did not report Basque ancestry [13, 14]. Previous work suggests that these patients might share the same background haplotype, but interpretation of these data is limited by a lack of overlap in the markers analyzed across studies [11, 12, 14, 15]. In this study, we sought to further explore whether PD patients who carry R1441G share a common founder and, if so, to estimate the age of the founding event.
The study population comprised 29 unrelated PD patients who carried R1441G (Table 1), nine relatives of the patients, and 85 healthy mutation-negative controls from Northern Spain. Limited haplotype data on 14 of these patients have been published elsewhere [12, 14]. Twenty-eight of the patients were recruited from hospitals in three neighboring regions of Northern Spain (Asturias, n=15; Cantabria, n=1; Basque Country, n=12), and one (a Hispanic patient who did not report Basque ancestry) was ascertained from a movement disorder clinic in North America. All patients met UK Parkinson's Disease Society Brain Bank clinical diagnostic criteria for PD. The nine relatives came from three families (PJ68, FAM6, and FAM8); eight were affected and one was unaffected.
In Spain, an individual carries a surname from each parent throughout life, and the name does not change with marriage. Thus, a great deal of information about recent ancestry can be gained simply by examining an individual's name. PD patients were classified as “Basque” if they had one or two Basque surnames and as “non-Basque” if neither surname was of Basque origin. Among the 29 PD patients included in the study, nine were categorized as Basque, and of these, four (FAM1, FAM3, FAM5, and FAM9) had two Basque surnames.
The controls were derived from two sources. Forty-two of the controls were autochthonous individuals from the Basque Country whose parents each had two Basque surnames (referred to hereafter as “Basque Controls”; mean age, 36.1±10.8 years; age range, 28–73 years; male, 47.6%). The remaining controls (n=43) were blood donors at one of two hospitals in Asturias, Spain (mean age, 40.9±11.5 years; age range, 20–62 years; male, 71.4%). Ancestral classification by surname was not possible for these individuals because data were collected anonymously.
The study was approved by the local ethics authorities at each institution and written informed consent was obtained from all participants.
We selected a total of 20 markers (15 microsatellites and five single nucleotide polymorphisms [SNPs]) spanning a distance of 15.1 Mb across the LRRK2 region for genotyping in all study participants. We began with a set of 15 markers which have been used in several previous haplotype analyses of LRRK2 [11, 12, 14, 15]. We then added two microsatellites (D12S345 and D12S1713) and a SNP (rs1511547) chosen from the MAP-O-MAT database (http://compgen.rutgers.edu/mapomat/) to fill large gaps between markers. In genotyping rs1511547 (by sequencing) we identified two novel SNPs (rs55917927 and rs56260627) for which the minor allele was rare (frequency <13%) among controls but present in all patients carrying R1441G. These two potentially informative SNPs were also added to the marker set.
SNP genotyping was performed by sequencing with the Big-Dye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA). Microsatellites were amplified by PCR using fluorescently labelled forward primers. Genotypes were determined using an ABI PRISM 3130 Genetic Analyzer and GeneMapper 4.0 software (Applied Biosystems). Centre d'Etude du Polymorphisme Humain (CEPH) samples 1331-01, 1331-02, 1347-13, 1362-14, 1413-18, and 1416-12 (Coriell Cell Repositories, Camden, NJ, USA) were used as a reference for microsatellite allele size determination. All PCR primer sequences and assay conditions are available on request.
We used PHASE software v2.1.1 (http://www.stat.washington.edu/stephens/software.html) to infer haplotypes when marker phase could not be resolved from pedigree data. We estimated the age of the most recent common ancestor of individuals who carried R1441G using the program Estiage. This maximum-likelihood algorithm uses the recombination fraction between the mutation and each marker, the frequency of the shared allele at each marker, and the position of the first marker in each direction that is no longer shared to calculate the number of generations (with 95% CI) that have elapsed since the most recent common ancestor introduced the mutation into the population. We defined a marker as shared if a single allele was present on all disease haplotypes, or on at least half of them at a significantly greater frequency (Fisher's exact test) than among the 170 inferred control haplotypes (with all other alleles collapsed into a single bin). Genetic map positions for each marker were derived from the linkage-mapping server MAP-O-MAT, and physical positions were taken from the National Center for Biotechnology Information (NCBI) human genome assembly Build 36.
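The shared-marker rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the one-sided alternative and the 0.05 threshold are assumptions, since the text specifies Fisher's exact test but not its sidedness or significance level. Each disease haplotype and each control haplotype contributes one allele, and every allele other than the candidate is collapsed into a single "other" bin, giving a 2×2 table.

```python
from math import comb
from typing import Sequence


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for enrichment of an allele in disease haplotypes.

    2x2 table (all non-candidate alleles collapsed into one bin):
                     candidate  other
        disease          a        b
        control          c        d
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    n1, n2 = a + b, c + d  # disease and control haplotype totals
    k = a + c              # total copies of the candidate allele
    # Sum probabilities of all tables at least as extreme as the observed one.
    # math.comb returns 0 when the lower argument exceeds the upper one.
    numer = sum(comb(n1, x) * comb(n2, k - x) for x in range(a, min(n1, k) + 1))
    return numer / comb(n1 + n2, k)


def is_shared(disease: Sequence[str], controls: Sequence[str], allele: str,
              alpha: float = 0.05) -> bool:
    """Hypothetical shared-marker rule: the allele is on every disease haplotype, or on
    at least half of them at a significantly higher frequency than in controls."""
    a = sum(x == allele for x in disease)
    b = len(disease) - a
    c = sum(x == allele for x in controls)
    d = len(controls) - c
    if b == 0:                # present on all disease haplotypes
        return True
    if 2 * a < len(disease):  # fewer than half of disease haplotypes carry it
        return False
    return fisher_greater(a, b, c, d) < alpha


if __name__ == "__main__":
    # Made-up allele calls for illustration only (not study data).
    disease = ["152"] * 20 + ["148"] * 9     # 29 disease haplotypes
    controls = ["152"] * 30 + ["148"] * 140  # 170 control haplotypes
    print(is_shared(disease, controls, "152"))
```

SciPy's `scipy.stats.fisher_exact` with `alternative="greater"` computes the same one-sided p-value; the standard-library version simply avoids the dependency.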
All 29 R1441G carriers shared a ten-marker background haplotype, bounded by D12S2080 and D12S2519, which spanned a distance of 5.8 Mb across the LRRK2 region (Fig. 1). The haplotype included alleles at two markers, rs55917927 and rs34591826 (M1646T), that were present at a frequency of only 3.0% and 0.6%, respectively, in control subjects. This haplotype was not observed among 170 control chromosomes. This strongly suggests that the mutation carriers in our study population descended from a single founder.
The telomeric boundary of the haplotype shared by all mutation carriers was delimited by a single subject (PE139). In this individual, both alleles at D12S2520 and D12S2521 were divergent from the ones shared in all other carriers (Fig. 1). This suggests a previous recombination event between D12S2519 and D12S2520 for the disease chromosome carried by PE139, rather than recurrent mutation at two consecutive markers or an error in assigning phase. Among the disease chromosomes analyzed in the study, incomplete but significant allele sharing was seen across an interval of 6.0 Mb, from D12S2080 to D12S1048 (Fig. 1).
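Recombination boundaries like the one described above (the divergence of PE139's disease chromosome at D12S2520 and D12S2521) are the raw input to the age estimate. The sketch below is a simplified maximum-likelihood model in the spirit of the Estiage description given earlier, not the Estiage program itself, and it omits any refinements that program may include. It assumes that crossovers on the two sides of the mutation are independent, that markers beyond a crossover carry the shared allele by chance at its population frequency and independently of one another, and that marker mutation can be ignored. It scores a grid of ages and reports an approximate 95% interval as every age whose log-likelihood lies within 1.92 units of the maximum, which is also an assumption. All names (`SideObs`, `side_prob`, `estimate_age`) are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

# One side of one disease chromosome: recombination fractions from the mutation to each
# marker (ordered outward, increasing), population frequency of the shared allele at each
# marker, and the 0-based index of the first unshared marker (None if all are shared).
SideObs = Tuple[Sequence[float], Sequence[float], Optional[int]]


def _chance_run(freqs: Sequence[float], start: int, end: int) -> float:
    """Probability that 1-based markers start..end all match the shared allele by chance."""
    return math.prod(freqs[j - 1] for j in range(start, end + 1))


def side_prob(g: float, obs: SideObs) -> float:
    """Probability of the observed sharing pattern on one side after g generations."""
    thetas, freqs, first_unshared = obs
    K = len(thetas)
    # S[j] = probability that the ancestral segment still extends past marker j; S[0] = 1.
    S = [1.0] + [(1.0 - t) ** g for t in thetas]
    if first_unshared is None:
        # No crossover before the last marker, or a crossover in interval b (between
        # markers b and b+1) followed by chance matches at every later marker.
        return S[K] + sum((S[b] - S[b + 1]) * _chance_run(freqs, b + 1, K) for b in range(K))
    k = first_unshared + 1  # 1-based number of the first unshared marker
    # Crossover in some interval b < k, chance matches at b+1..k-1, mismatch at k.
    return (1.0 - freqs[k - 1]) * sum(
        (S[b] - S[b + 1]) * _chance_run(freqs, b + 1, k - 1) for b in range(k))


def log_likelihood(g: float, observations: Sequence[SideObs]) -> float:
    total = 0.0
    for obs in observations:
        p = side_prob(g, obs)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def estimate_age(observations: Sequence[SideObs], grid: Optional[Sequence[float]] = None):
    """Grid-search MLE of the age in generations, with an approximate 95% interval."""
    if grid is None:
        grid = [x / 10 for x in range(10, 3001)]  # 1.0 to 300.0 generations
    lls = [log_likelihood(g, observations) for g in grid]
    best = max(range(len(grid)), key=lls.__getitem__)
    cutoff = lls[best] - 1.92  # half of the 95% chi-square quantile with 1 df (3.84)
    inside = [g for g, ll in zip(grid, lls) if ll >= cutoff]
    return grid[best], (min(inside), max(inside))
```

Each carrier contributes two `SideObs` tuples, one per side of the mutation; the interval is truncated if it runs past the ends of the grid.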
Because many of the markers in our dataset were tightly linked, genetic distances between them could not be accurately estimated from available genetic maps. Instead, we used the genetic length and physical distance between D12S2080 and D12S1301 to calculate an average of 0.30 cM/Mb across the entire region and then computed recombination fractions using the Kosambi mapping function, as previously described. We then estimated the age of the most recent common ancestor, designating D12S345 and D12S1301 as the first unshared markers in each direction. This yielded an age estimate of 45 (95% CI, 34–58) generations. Using the most widely published intergenerational interval, 25 years, our data indicate that the patients in our study shared a common ancestor 1,125 (95% CI, 850–1,450) years ago. However, several recent studies have concluded that 30 years is a better approximation of generation time for the period of human history in question [21–23]. Substituting a 30-year interval increases the age of the founding event to 1,350 (95% CI, 1,020–1,740) years ago. Using only Basque controls in the analysis yielded a very similar age estimate of 48 (95% CI, 37–63) generations. Excluding the subject (PE139) with the shortest shared haplotype also had little effect on the estimated age (45 generations; 95% CI, 35–58).
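The distance-to-recombination conversion and the generation-to-year arithmetic in this paragraph can be checked with a short sketch. It assumes, as the text does, a uniform 0.30 cM/Mb rate, and uses the inverse of the Kosambi map function, θ = ½ tanh(2d) with d in Morgans; the function names are hypothetical. The 5.7 Mb input is the distance between D12S2080 and R1441G quoted later in the text.

```python
import math

CM_PER_MB = 0.30  # regional average rate used in the text


def kosambi_theta(distance_mb: float, cm_per_mb: float = CM_PER_MB) -> float:
    """Recombination fraction for a physical distance, via the inverse Kosambi map function."""
    d_morgans = distance_mb * cm_per_mb / 100.0  # Mb -> cM -> Morgans
    return 0.5 * math.tanh(2.0 * d_morgans)


def generations_to_years(generations: int, years_per_generation: int) -> int:
    """Convert an age in generations to years for a fixed intergenerational interval."""
    return generations * years_per_generation


if __name__ == "__main__":
    print(f"theta(5.7 Mb) = {kosambi_theta(5.7):.4f}")  # D12S2080 to R1441G
    estimate = (45, 34, 58)  # point estimate, lower and upper 95% CI, in generations
    for interval in (25, 30):
        print(interval, [generations_to_years(g, interval) for g in estimate])
```

With a 25-year interval this reproduces 1,125 (850–1,450) years, and with 30 years 1,350 (1,020–1,740) years. At distances this short, θ is only slightly smaller than the map distance d.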
R1441G displays a frequency gradient that peaks in the Basque Country (Fig. 2), and it appears to be very rare outside of Northern Spain. Among several thousand PD patients from elsewhere in Europe and North America who have been screened for R1441G, only one (patient F67) has been found to carry the mutation [13, 16, 24–28]. Based on this pattern of distribution and the results of the present study, we hypothesize that R1441G originated in the Basque population and that a founding event occurred in approximately the seventh century. The mutation would then have dispersed through short-range gene flow, largely limited to nearby regions. This scenario is consistent with current knowledge of the origin and history of the Basque population. The Basques are unusual among modern inhabitants of Europe in that they are thought to constitute a “relic” population descended from Paleolithic Europeans and have remained relatively genetically isolated [29, 30]. This view is supported by archaeological findings, linguistic studies, and analyses of both classic genetic and molecular markers [31–33]. However, examination of Y-chromosome markers also indicates that modest levels of gene flow have occurred between the Basques and neighboring populations (e.g., Catalans) over the past two millennia. In addition, the distribution of mutations for other diseases believed to have originated among the Basques is similar to that seen for R1441G. These include the CAPN3 2362AG→TCATCT mutation for limb-girdle muscular dystrophy type 2A (estimated to have arisen in the sixth to eighth century) and PRNP D178N, which results in fatal familial insomnia.
In contrast to R1441G, the LRRK2 G2019S mutation has frequently been observed across Europe, the Middle East, North Africa, and the Americas [12, 13, 37–43]. Most G2019S carriers share a common ancestor who is estimated to have lived approximately 2,250 years ago and who likely originated in the Middle East. While the spread of R1441G was probably slowed by the geographic and cultural boundaries surrounding the Basque region, G2019S became widely dispersed, perhaps as a result of the large-scale migrations of the Jewish Diaspora.
Paisan-Ruiz and colleagues performed a haplotype analysis of the LRRK2 region in four extended Basque PD pedigrees with R1441G using a large set of SNP and microsatellite markers. The authors later genotyped a subset of these markers (11 SNPs) that spanned 2.0 Mb across the region in 17 singleton PD cases (16 Basque and one non-Basque) who carried the mutation. In both studies, they concluded that all mutation carriers shared a common founder, but they did not calculate an age for the founding event. In the four Basque families and in the singleton group, complete allele sharing was observed over distances of 1.0 Mb and 170 kb, respectively. In our sample of 29 carriers, we observed complete allele sharing over a much larger distance of 5.8 Mb. Although the authors did not formally test for significant but incomplete allele sharing, only nine of the 17 singleton cases shared the same allele at a marker (rs10876410) located 1.3 Mb upstream of the mutation. In contrast, in our sample we observed complete allele sharing at rs10876410 and at four markers further upstream, including D12S2080, located 5.7 Mb from R1441G (Fig. 1). The shorter haplotypes observed by Paisan-Ruiz and colleagues imply that the founding event for R1441G might be substantially older than the estimate derived from our dataset. The reasons for the differences between our data and theirs are not entirely clear, and direct comparisons are difficult because only one marker (rs10876410) overlapped between the studies. One possible reason is the nature of the markers used. While microsatellites are on average more informative, SNPs have much lower and less variable mutation rates. Thus, within the same region, shared haplotypes based solely on microsatellites might be expected to be shorter than those based on SNPs alone. However, large differences in haplotype size remain even if one considers only the SNP markers in each dataset.
Another consideration is whether the subjects in one or more studies might actually have arisen from multiple founders. In our sample, we believe that this is highly unlikely because all mutation carriers were heterozygous for rs34591826 (located 9.7 kb downstream of R1441G) and the minor allele frequency for this SNP was only 0.6% in controls (Fig. 1). The minor allele of rs34591826 was also included within the disease haplotype in the four large Basque families. However, in the 17 singleton PD cases rs34591826 was not genotyped, and the four SNPs (rs4768224, rs12423567, rs1820544, and rs10784616) that constituted the core haplotype shared by all subjects in that study were less informative. The shared alleles for these four SNPs had frequencies ranging from 0.28 to 0.70 in the HapMap CEU sample (http://www.hapmap.org). Thus, there is somewhat less certainty that all of the individuals in this singleton PD sample did indeed descend from a common ancestor, particularly because the analysis was conducted with phase-unknown data and statistical methods were not used to infer phase. Finally, genotyping error is another possible explanation for the differences between our data and those of Paisan-Ruiz and colleagues.
Our study also had some limitations. Because DNA from family members was available for only three mutation carriers, we used Bayesian methods to reconstruct haplotypes in most instances. This has the potential to introduce additional uncertainty into our estimate of the age of the founding event, which is not accounted for by the maximum-likelihood methods we used. Also, the use of surnames to categorize mutation carriers as Basque or non-Basque is subject to misclassification, and the ancestors of some subjects may have come from both groups.
Our data provide further empirical evidence that the Basques have remained genetically isolated over the past one to two millennia. Our findings also lend further support to the idea that important genetic determinants for PD can be highly population-specific. Finally, there is little information available on the prevalence of R1441G in Central and South America where the majority of intercontinental Basque migration has occurred. We are addressing this issue in ongoing studies of the LRRK2 gene in PD cohorts from across these regions.
We dedicate this paper to our dear colleague, Dr. Luis M. Guisasola, a superb clinician, researcher, and educator, who recently passed away. We thank the individuals who participated in the study. This work was supported by the Basque Government and University of the Basque Country (grant S-PE07UN44, M.M.P. and M.C.G-F.); the NIH (NINDS, K08 NS044138, C.P.Z.); the Department of Veterans Affairs (Merit Review Award, C.P.Z.); the Parkinson's Disease Foundation (Fellowship Award, I.F.M.); the Spanish Fondo de Investigacion Sanitaria (grant FIS PI070014, J.I; grant 05/008, V.A.); and the Veterans Integrated Service Network 20 Geriatric, Mental Illness, and Parkinson's Disease Research, Education, and Clinical Centers.
Ignacio F. Mata, Geriatric Research Education and Clinical Center S-182, Veterans Affairs Puget Sound Health Care System, 1660 South Columbian Way, Seattle, WA 98108, USA. Department of Neurology, University of Washington, Seattle, WA, USA.
Carolyn M. Hutter, Department of Epidemiology, University of Washington, Seattle, WA, USA.
María C. González-Fernández, Servicio General de Investigación Genómica: Banco de ADN, Universidad del País Vasco, Vitoria-Gasteiz, Spain.
Marian M. de Pancorbo, Servicio General de Investigación Genómica: Banco de ADN, Universidad del País Vasco, Vitoria-Gasteiz, Spain.
Elena Lezcano, Unidad de trastornos del movimiento, Hospital de Cruces, Baracaldo, Spain.
Cecilia Huerta, Genética Molecular-Instituto de Investigacion Nefrológica, Hospital Universitario Central de Asturias, Oviedo, Spain.
Marta Blazquez, Servicio de Neurología, Hospital Universitario Central de Asturias, Oviedo, Spain.
Renee Ribacoba, Servicio de Neurología, Hospital Alvarez-Buylla, Mieres, Spain.
Luis M. Guisasola, Servicio de Neurología, Hospital Universitario Central de Asturias, Oviedo, Spain.
Carlos Salvador, Servicio de Neurología, Hospital Universitario Central de Asturias, Oviedo, Spain.
Juan C. Gómez-Esteban, Unidad de trastornos del movimiento, Hospital de Cruces, Baracaldo, Spain.
Juan J. Zarranz, Unidad de trastornos del movimiento, Hospital de Cruces, Baracaldo, Spain.
Jon Infante, Servicio de Neurología, Hospital Universitario “Marqués de Valdecilla”, Universidad de Cantabria, Santander, Spain.
Joseph Jankovic, Department of Neurology, Baylor College of Medicine, Houston, TX, USA.
Hao Deng, Department of Neurology, Baylor College of Medicine, Houston, TX, USA. Center for Experimental Medicine, The Third Xiangya Hospital, Central South University, Changsha, China.
Karen L. Edwards, Department of Epidemiology, University of Washington, Seattle, WA, USA.
Victoria Alvarez, Genética Molecular-Instituto de Investigacion Nefrológica, Hospital Universitario Central de Asturias, Oviedo, Spain.
Cyrus P. Zabetian, Geriatric Research Education and Clinical Center S-182, Veterans Affairs Puget Sound Health Care System, 1660 South Columbian Way, Seattle, WA 98108, USA; Email: zabetian/at/u.washington.edu. Department of Neurology, University of Washington, Seattle, WA, USA.