The phylogenetic relationships of the rs9786140 [M412], rs9786194 [M415], rs17842518 and rs17250114 SNPs26 were initially investigated using DHPLC in a geographically diverse set of 30 haplogroup R samples. During this process, two new markers [M478, M479] were detected in the flanking regions of rs17842518 and rs17250114, respectively, and confirmed by direct sequencing. In addition, markers L11, L23, S116, M520 and M529 were evaluated in our data set based on unsolicited insights from the genetic genealogy community. The phylogenetic relationships of the Y-chromosome haplogroup R components studied are presented in Figure 1a. It is important to recognize that ascertainment bias is likely responsible for the currently observed disproportionate branching patterns.5
All branches are shown with common marker names and labeled according to standard YCC nomenclature guidelines.29
Henceforth, for brevity and clarity, the haplogroups are referred to in the text by their defining mutations rather than the cumbersome YCC labels. Y-chromosome genotype data available for 10355 DNA samples were used to evaluate the diversification of 2193 haplogroup R-M343 samples, the majority (n=2043) of which were derived for the M269 mutation. All of the haplogroup R-M207 chromosomes studied were derived for either the R1-M173 or R2-M479 markers, ie no R-M207* chromosomes were detected in our sample (the star * symbol here refers to the unresolved status in the phylogeny beyond the given marker). However, we cannot rule out the possible existence of such lineages, as our study lacks coverage in Central Asia and India. It should be noted that some previous studies on India have reported the presence of R-M207* at frequencies of ~1–3%.11, 32
The frequencies of basal haplogroup R1a-M420* and various haplogroups associated with R2-M479 and R1b-M343 elements surveyed in populations from the West, North, East, Central and Southeast regions of Europe, the Circum-Uralic regions, the Caucasus and Near/Middle East, Turkey and Pakistan are presented in Supplementary Table S4.
Figure 1 (a) Phylogenetic relationships of haplogroup R binary polymorphisms studied. The names of six polymorphisms whose phylogenetic positions were determined in representative-derived samples, but not surveyed in the entire sample collection are indicated ...
Although the frequency of R1 lineages is currently the highest in Europe, the phylogeographic argument for their origin outside Europe, likely somewhere in West Asia, arises from the geographic distribution of the primary splits in the R1 phylogeny: at least three basic R-M207-derived haplogroups – R1a-M420* and R2 – occur mostly outside Europe. shows approximate locations of the 118 populations studied and proportional sample sizes. As the intensity of sampling is thin relative to the expanse of West Asia, the spatial-frequency surfaces for this region should be viewed as preliminary. Of the total of 193 R1b-M73 chromosomes detected, all except two found in Russians occurred outside Europe, in the Caucasus, Turkey, the Circum-Uralic and North Pakistan regions (), in contrast to its considerably more widespread companion R1b-M269 clade (). With the exception of rare incidences of R1b-V88 in Corsica, Sardinia13 and Southern France (Supplementary Table S4), there is nearly mutually exclusive patterning of V88 across trans-Saharan Africa vs the prominence of P297-related varieties widespread across the Caucasus, Circum-Uralic regions, Anatolia and Europe. The detection of V88 in Iran, Palestine and especially the Dead Sea, Jordan (Supplementary Table S4) provides an insight into the back-to-Africa migration route.
The frequency data for 13 major R1b1-P297 components with minimum frequency ≥10% were used to create spatial distribution maps (), whereas the phylogenetic relationships of the haplogroups are shown in Figure 1a. Besides the obvious differences in the geographic spreads of the M73, M269 and V88 branches that stem from the R1b-M343 node as noted above, there are also apparent geographic patterns in the downstream branches, between markers M412 and M222 (). Although it is likely that additional sub-haplogroups within the more numerous L23*(xM412) assemblage currently remain hidden, it is instructive that these chromosomes often exceed 10% frequency in the Caucasus, Turkey and some SE Europe and Circum-Uralic populations (Supplementary Table S4; ), whereas they typically display frequencies ≤5% in Western Europe (except for an instance of 27% in Switzerland's Upper Rhone Valley), in contrast to the prominent spread of derived M412 varieties in West Europe ().
Major R1b Founder Effect in West Europe
R1b-M412 appears to be the most common Y-chromosome haplogroup in Western Europe (>70%), while being virtually absent in the Near East, the Caucasus and West Asia (). Recent founder effects could explain why the M412-L11 assemblage of chromosomes is abundant and restricted to Western parts of Europe ().
Examples of additional founder effects and subsequent demographic expansions are evident among the more prominent L11-related S116 () and U106 () components, which are generally distributed west and east of the Rhine river basin, respectively. Within the three major sub-haplogroups of the S116 assemblage, further geographic localization is evident. Specifically, S116*(xU152, M529) occurrence is maximal in Iberia (), whereas the U152 branch is most frequent (20–44%) in Switzerland, Italy, France and Western Poland, with additional instances exceeding 15% in some regions of England and Germany (). Finally, the M529 clade is highest (25–50%) in England and Ireland (), with the M222 sub-clade () mainly restricted to Ireland.
As the methodology assumes one founder, the expansion times will be inflated if multiple founders or recurrent gene flows have occurred. Thus, these estimates should be viewed as the upper bounds of dispersal times. A total of 1029 chromosomes were included in the Y-STR-based coalescent analysis involving components of the R1b-M343-affiliated phylogeny. The coalescent estimate for the Y-STR network tree of 245 M269*+L23(xM412) chromosomes is 10270±1680 years Before Present (BP). This estimate approximates the median TMRCA dates (8.5–12.5k years) of the M269 clade across Europe based on alternative demographic inference methodology.33 Our estimate of 8870±1708 years BP, based on 757 M412 chromosomes, suggests that the M412 lineage evolved in Europe soon after the arrival of an L23* ancestor. The coalescent times for 11 sub-haplogroups averaged across populations in which the sample size was 5 are presented in Supplementary Table S2. Notable are the equivalent expansion times for all S116-related (n=481; Td=8630±1529 years BP) and U106-related (n=239; Td=8742±1551 years BP) lineages.
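The effect of the single-founder assumption can be illustrated with a minimal sketch. This section does not spell out the estimator behind Td, so the code assumes one common approach: the average squared difference (ASD) in repeat counts between each chromosome and a founder haplotype, taken here as the per-locus median, divided by a per-locus mutation rate. The function name `asd_age`, the default rate of 6.9e-4 per locus per 25-year generation and the toy haplotypes are all illustrative assumptions, not values or methods confirmed by the text.

```python
from math import sqrt
from statistics import mean, median, stdev


def asd_age(haplotypes, mu_per_gen=6.9e-4, years_per_gen=25.0):
    """Return (age, standard error) in years from Y-STR repeat counts.

    haplotypes: list of equal-length lists of repeat counts, one per chromosome.
    The founder is approximated by the per-locus median (single-founder
    assumption). mu_per_gen and years_per_gen are illustrative defaults.
    """
    n_loci = len(haplotypes[0])
    per_locus_asd = []
    for j in range(n_loci):
        alleles = [h[j] for h in haplotypes]
        founder_allele = median(alleles)
        per_locus_asd.append(mean((a - founder_allele) ** 2 for a in alleles))
    generations = mean(per_locus_asd) / mu_per_gen
    # Standard error taken across loci (needs at least two loci).
    se_generations = stdev(per_locus_asd) / sqrt(n_loci) / mu_per_gen
    return generations * years_per_gen, se_generations * years_per_gen


# Toy data (hypothetical): two loci, one founder lineage...
one_founder = [[14, 12], [14, 12], [15, 12], [13, 12]]
# ...and the same lineage pooled with a second, divergent founder lineage.
two_founders = one_founder + [[18, 12], [18, 12], [19, 12], [17, 12]]

age_single, _ = asd_age(one_founder)
age_pooled, _ = asd_age(two_founders)
print(age_single < age_pooled)  # pooling founders inflates the estimate
```

Under these assumptions, pooling chromosomes that descend from two distinct founders raises the ASD and therefore the estimated age, which is the reason such estimates are read as upper bounds.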
Archeologically, there are two attested phases regarding the geographic spread of the Linearbandkeramik (LBK). The first phase extended to the upper Danube river near Munich. The second phase extended further to the Paris basin.34 Furthermore, there is evidence of several post-LBK Neolithic expansions, ca 6000 years BP, from the Paris basin region toward Northern Italy, Southern France and Iberia, characterized by the Chasseen horizon,35, 36 as well as to England.37
We examined the regional geographic patterns of the S116, U106, U152 and M529 haplogroups more quantitatively within particular distance classes by spatial autocorrelation analysis. All four sub-clades displayed clinal distributions of frequency variation (Supplementary Figure 1).
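Spatial autocorrelation within distance classes is often summarized with Moran's I. The text does not name the statistic used, so the following is a sketch under that assumption. The function `morans_i`, the binary weighting scheme and the toy transect are hypothetical and serve only to show how a correlogram is assembled.

```python
def morans_i(values, dist, d_min, d_max):
    """Moran's I for pairs whose distance falls in [d_min, d_max).

    values: per-population haplogroup frequencies.
    dist:   symmetric matrix (list of lists) of pairwise distances.
    Binary weights: 1 for pairs inside the distance class, 0 otherwise.
    Returns None when the class is empty or the values do not vary.
    """
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    cross, weight_sum = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i != j and d_min <= dist[i][j] < d_max:
                cross += dev[i] * dev[j]
                weight_sum += 1
    total_sq = sum(d * d for d in dev)
    if weight_sum == 0 or total_sq == 0:
        return None
    return (n / weight_sum) * cross / total_sq


# Toy transect (hypothetical): four populations 1000 km apart with a
# steadily rising frequency, i.e. a simple cline.
freqs = [0.1, 0.2, 0.3, 0.4]
positions = [0, 1000, 2000, 3000]
dist = [[abs(a - b) for b in positions] for a in positions]

# One Moran's I value per distance class: 500-1500, 1500-2500, 2500-3500 km.
correlogram = [morans_i(freqs, dist, lo, lo + 1000) for lo in (500, 1500, 2500)]
```

A correlogram that falls from positive values in the shortest distance class to negative values in the longest one is the pattern usually read as a cline.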
We investigated the association of Td for the S116 assemblage with the great circle distance from both Paris and Munich, and of Td for U106 with the distance from Warsaw, with these cities serving as representations of the transition to agriculture in the North-Central European plain. Figure 2 shows that S116 Td decreases with distance from Paris (r=−0.51, P<0.025, n=16, one-tailed Pearson's) and with distance from Munich (r=−0.49, P<0.05, n=16, one-tailed Pearson's). There was no significant correlation of U106 Td with distance from Warsaw (r=−0.40, ns). As Td estimates are sensitive to outliers, we also calculated the correlations between the mean Y-STR variance and distance from Paris and Munich. Both correlations remained significant. It is important to recognize that we used regression analyses to identify the approximate geographic source of S116 diversity as it spreads outward, and not to chronologically date the spreading events, as multiple S116 lineages were likely involved.
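The two ingredients of this analysis, a great circle distance and a Pearson correlation, can be sketched as follows. The haversine formula, the Earth radius of 6371 km and the city coordinates are assumptions made for illustration; the text does not state how distances were computed. The helper `t_statistic` converts a reported r and n into the t value (n−2 degrees of freedom) that underlies a one-tailed test, so the reported significance levels can be checked against a standard t table.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great circle distance between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Approximate city coordinates (assumed, in degrees).
PARIS = (48.8566, 2.3522)
MUNICH = (48.1351, 11.5820)
paris_to_munich = great_circle_km(*PARIS, *MUNICH)

# t values implied by the correlations reported in the text (n=16, df=14).
t_paris = t_statistic(-0.51, 16)
t_munich = t_statistic(-0.49, 16)
```

With 14 degrees of freedom, the one-tailed critical t values are 1.761 at the 0.05 level and 2.145 at the 0.025 level. The Paris value exceeds 2.145 in magnitude and the Munich value falls between the two, consistent with the reported P<0.025 and P<0.05.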
Figure 2 Regression plots of coalescent times for S116 lineages vs distance from (a) Paris and (b) Munich. Population codes: France (fra); Germany (ger); England (eng); Switzerland (swz); Netherlands (net); Ireland (ire); Denmark (den); Italy (ita); Slovakia (slk); ...
We conducted principal components analysis to investigate affinities of haplogroup R1b fractions among different populations based on the frequency distributions of M269*, L23, M412*, L11*, U106, S116*, U152 and M529 with respect to total M269. Figure 3a shows the contributions of the sub-haplogroups to the first two principal components. The first principal component separates L23 from M412 and its sub-clades, whereas the second differentiates the sub-haplogroups within M412. Figure 3b shows Western Europeans clustering in an approximately congruent manner with geography (according to the frequencies of M412 sub-clades) on the left, with Central and Eastern Europeans in the middle and a group of populations from the Balkans, Turkey, the Caucasus and the Circum-Uralic region on the right, separated by a high frequency of L23.
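The structure of such an analysis can be sketched with a small, dependency-free principal components routine. Everything here is an assumption for illustration: the function names, the use of power iteration on the covariance matrix, and the toy frequencies (three sub-haplogroups expressed as fractions of total M269 in four hypothetical populations). It is not the software or data used in the study.

```python
from math import sqrt


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _leading_eigenpair(m, n_iter=500):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    p = len(m)
    # Non-uniform start: when rows are fractions summing to 1, the all-ones
    # direction carries zero variance and would collapse to the zero vector.
    v = [float(j + 1) for j in range(p)]
    norm = sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = _matvec(m, v)
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, _matvec(m, v)))
    return eigenvalue, v


def pca_two_components(rows):
    """Return (loadings, variances, scores) for the first two components."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    components, variances = [], []
    for _ in range(2):
        lam, v = _leading_eigenpair(cov)
        components.append(v)
        variances.append(lam)
        # Deflate: remove the component just found before extracting the next.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)]
               for a in range(p)]
    scores = [[sum(c[j] * v[j] for j in range(p)) for v in components]
              for c in centered]
    return components, variances, scores


# Toy fractions of total M269 (hypothetical). Columns: L23, U106, S116.
toy = [
    [0.10, 0.20, 0.70],  # "western" population A
    [0.10, 0.60, 0.30],  # "western" population B
    [0.90, 0.05, 0.05],  # "eastern" population C
    [0.80, 0.10, 0.10],  # "eastern" population D
]
components, variances, scores = pca_two_components(toy)
```

In this toy case the first component loads L23 against U106 and S116, and the second loads U106 against S116, which mirrors the qualitative pattern described in the text. The signs of the loadings are arbitrary; only their opposition is meaningful.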
Figure 3 Principal component analysis by haplogroup R1b sub-clades: (a) M269*, L23, M412*, L11*, U106, S116*, U152 and M529 sub-haplogroups with respect to total M269, and (b) by collapsing the 118 populations into 34 regionally ...