The region under study herein is shown in . The map shows the Levantine countries and regions (modern borders), including Lebanon, Syria, Akka and Jordan.
The Y-chromosomal haplogroup (hg) distribution in the Levantine population was compared with that of the surrounding Middle Eastern and North African countries using 884 newly collected samples in addition to our existing Middle Eastern population database (Table S1 and Table S2). As previously reported (Zalloua et al., 2008b), the most frequent haplogroups in the Levant are J1, J2, R1b, E1b1b1 and I (Figure S1), and these are shown in Figures 2, although the statistical analyses, such as Nei diversity and AIDA, were performed on all the haplogroups found in each sample, as shown in . We first consider these distributions individually.
The frequency of haplogroup J1 peaked in the Arabian Peninsula (Yemen, UAE, and Kuwait) and decreased beyond the Middle East and North Africa. J1 frequencies in Syria, Akka and Jordan were more comparable to those in Lebanon than to those in the remaining Arab countries (58.3% in Qatar and 72.5% in Yemen). Hg J2, in contrast, was present at its highest frequency in the Lebanese population (29.4%) and was significantly more frequent there than in the remaining Levantine regions (p < 0.05). As previously reported (Zalloua et al., 2008a), it decreases towards the west in North African countries and towards the east in the Arabian Peninsula (29.4% in Lebanon compared to 7.6% in Egypt and 8.3% in Kuwait).
The frequencies of the R1b and I haplogroups (Rootsi et al., 2004) peaked in Europe and the gradient faded beyond the Levant. R1b showed some variability in the Levant (4.5% to 9%), had minimal presence in Qatar (1.4%) and was absent from the Yemen sample. Iraq and Kuwait showed significantly higher frequencies of R1b (10.8% and 9.5% respectively), which may be explained by the strong historical Ottoman influence (Al-Zahery et al., 2003).
Finally, E1b1b1 (previously E3b) showed the highest concentrations in North African and Berber-speaking populations (Egypt, Morocco and Tunisia) (Cruciani et al., 2002; Bosch et al., 2001). It showed significant variability in frequency among the Levantine regions (16.2% in Lebanon, 12% in Syria, 26.4% in Akka and 23% in Jordan) (pairwise comparisons: p-value Lebanon vs. Syria = 0.015, Lebanon vs. Akka = 0.009 and Lebanon vs. Jordan = 0.007); however, E1b1b1 frequencies in the Levant as a whole were significantly lower than those in North African countries (42.7% in Egypt, 51.3% in Tunisia and 52.5% in Morocco) (p-values for pairwise comparisons with Lebanon all < 0.001).
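Pairwise comparisons of haplogroup frequencies such as those quoted above can be illustrated with a two-proportion test. The text does not state which test was used or the per-region sample sizes, so the following is only a minimal sketch under assumptions: it uses a pooled two-sided z-test, and the carrier counts and sample sizes are hypothetical values chosen to match 16.2% and 26.4%.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (not from the study): 81 of 500 carriers (16.2%)
# in one region versus 132 of 500 carriers (26.4%) in another.
z, p = two_proportion_z_test(81, 500, 132, 500)
print(f"z = {z:.2f}, p = {p:.2g}")
```

With small expected counts an exact test would be a more cautious choice than this normal approximation.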
We next investigated the overall Y-chromosomal genetic structure of the region. In an autocorrelation (AIDA) analysis, the autocorrelation index II decreased from positive to negative values with increasing geographical distance, demonstrating an underlying clinal pattern: nearby populations tend to be similar (positively correlated), while distant populations tend to be dissimilar (negatively correlated). SAMOVA, however, invariably distinguished additional single samples as the number of groups specified was increased, revealing a lack of distinct clusters of geographically contiguous samples, perhaps reflecting the sampling strategy, which provided multiple Levantine samples and a diverse set of more distant ones (Table S4).
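The clinal reading of the autocorrelation index can be illustrated with a simplified Moran's-I-style statistic computed per distance class. This is a sketch in the spirit of AIDA's II, not the AIDA program itself: it works on haplogroup frequency vectors rather than individual haplotypes, uses planar distances, and all populations, coordinates and frequencies below are hypothetical.

```python
import math


def autocorrelation_index(freqs, coords, lo, hi):
    """Moran's-I-style index over population pairs with lo <= distance < hi."""
    n = len(freqs)
    k = len(freqs[0])
    mean = [sum(f[h] for f in freqs) / n for h in range(k)]
    # Deviation of each population's frequencies from the overall mean.
    dev = [[f[h] - mean[h] for h in range(k)] for f in freqs]
    num = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if lo <= d < hi:
                num += sum(a * b for a, b in zip(dev[i], dev[j]))
                pairs += 1
    den = sum(a * a for row in dev for a in row)
    if pairs == 0 or den == 0:
        return float("nan")
    return (n * num) / (pairs * den)


# Hypothetical data: four populations on a line, two haplogroups each.
coords = [(0, 0), (1, 0), (10, 0), (11, 0)]
freqs = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.3, 0.7]]
near = autocorrelation_index(freqs, coords, 0, 2)   # short-distance class
far = autocorrelation_index(freqs, coords, 2, 20)   # long-distance class
print(near, far)
```

In this toy cline the short-distance class gives a positive index and the long-distance class a negative one, which is the pattern described above.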
Analysis of binary marker data in Middle-Eastern and North African populations
To examine the haplogroup distribution further, we performed a PCA on the frequencies of the nine haplogroups listed in , with any additional rare haplogroups combined into a single ‘others’ category. We included the Levantine regions plus samples from Egypt, Morocco, Tunisia, Cyprus, Turkey, Iran, Iraq, Jordan, Qatar, UAE, and Yemen. The percentages of variance associated with each principal component are shown in (lower right panel). PC1 captures 61.3% of the variation, followed by a substantially smaller 27.1% for PC2, with PC3 and PC4 explaining 6.4% and 2.2% respectively. The remaining PCs together carry 3.0% of the variation (Table S5).
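The variance percentages quoted for each principal component correspond to each eigenvalue of the covariance matrix divided by the sum of all eigenvalues. The short sketch below shows that calculation and checks that the reported percentages account for all of the variation. The function name and the example eigenvalues are illustrative assumptions; only the percentages in `reported` come from the text.

```python
def explained_variance_percent(eigenvalues):
    """Percentage of total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


# Hypothetical eigenvalues, for illustration only.
example = explained_variance_percent([3.0, 1.0])

# Percentages reported in the text for PC1 to PC4 and the remaining PCs combined.
reported = {"PC1": 61.3, "PC2": 27.1, "PC3": 6.4, "PC4": 2.2, "remaining": 3.0}
total_reported = sum(reported.values())
first_two = reported["PC1"] + reported["PC2"]
print(round(total_reported, 1), round(first_two, 1))
```

The first two components together carry 88.4% of the variation, which is why the later discussion of large-scale structure leans on PC1 and PC2.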
PC1 increases with decreasing E1b1b and increasing J1 (Table S5). It showed the largest variation across North Africa, reflecting the high frequency of E1b1b across this region, together with the more localized distribution of J1 in the east. The PC1 scores place the Levantine sites close to each other and close to Cyprus and Egypt in the span from Morocco in the west to Qatar and Yemen in the east. Within this group, SC, LC, and SIS show a reduced J1 score relative to inland Levantine regions. PC2 increases with increasing J2, and decreasing J1 and E1b1b (Table S5). Its distribution identified a gradient in the south to north direction through the Levant, placing Africa in the southern portion of the Levant along with UAE, SIE, PS and Qatar. However, the localized J1 and J2 gradients show increasing values for more coastal sites LC, SC, SIN, LI, and SIC compared to PS and SIE. PC3 increases with decreasing J2 and increasing L (Table S5). Almost all of the regions appeared similar to each other, including all the North African samples, except for Iran and SIE. PC4 increases with increasing R1b and G, and decreasing R1a and J2 (Table S5). This principal component shows the largest spread among Levantine sites. In this case, SC, LC, LI, and SIE show larger values, while SIC, SES, PS, and SIN show smaller ones.
The principal components capturing the largest variations primarily establish the Levant within the context of the larger-scale Neolithic signal across North Africa, as well as variations between Iran and Iraq, and the Arabian Peninsula. PC4 shows the strongest signal differentiating among Levantine sites.
An MDS analysis of Y-STR-based genetic distances (RST; 2D MDS, Stress = 0.08178 and RSQ = 0.98172) showed general features similar to those of the two leading principal components. Levantine populations were mostly clustered, while North African populations were progressively more distinct as the distance west increased. Two exceptions were SIE, which is, however, also divergent in PC3, and the similarity between Jordan and Cyprus, which was not observed in any of the PCs.
A higher-resolution contour map of the haplogroup frequency distribution among Levantine cities (Table S3b and S3c) revealed opposing coast/inland gradients for J1 and J2. We then regrouped the Levant into northern and southern regions (as shown in ) and into three regions going from west to east (coast, inland and further inland regions). J1 frequencies, but not J2 frequencies, differed significantly along the south-to-north axis of the Levant. More strikingly, going from coastal to inland regions in the Levant, there was a significant increase in J1 frequencies (19.8% to 48.2%, p < 0.001) compared to a significant decrease in J2 frequencies (26.8% to 3.4%, p < 0.001). This steep difference between the coast and inland regions was particularly remarkable considering the small geographical area under consideration.
Coastal/inland and north/south classifications of haplogroup frequencies in the Levant
Application of the Nei diversity estimator to haplogroup frequency data from the individual Levantine populations showed a minimum value of 0.669 for SIE. Taking 1/(1 − H) as a rough effective number of haplogroups, this suggests that this population is dominated by roughly 3 haplogroups. Haplogroups J1 and L are the most frequent, with a number of other rarer haplogroups. The rest of the entries in show diversities in the 0.7 to 0.8 range, suggesting a rough average number of dominating haplogroups in the range of 3.3 to 5. shows that these populations are dominated by two or three haplogroups, but with slightly lower relative frequencies than in the SIE population, and with higher frequencies among the remaining haplogroups. The coastal and near-coastal groups SC+LC+PS and SIN+SIC+LI+SIS show similar Nei diversity values, while the far-inland SIE shows the greatest reduction in Nei diversity.
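The link between the diversity values and the "number of dominating haplogroups" can be made explicit with a short sketch. It assumes the commonly used unbiased form of Nei's estimator, H = n/(n − 1) · (1 − Σp²), and the relation 1/(1 − H) for the effective number of haplogroups, which reproduces the figures quoted above. The text does not spell out either formula, and the haplogroup counts in the example are hypothetical.

```python
def nei_diversity(counts):
    """Unbiased Nei diversity from per-haplogroup sample counts."""
    n = sum(counts)
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq)


def effective_number(h):
    """Rough number of equally frequent haplogroups giving diversity h."""
    return 1.0 / (1.0 - h)


# Diversity values quoted in the text.
for h in (0.669, 0.7, 0.8):
    print(h, round(effective_number(h), 1))

# Hypothetical haplogroup counts for one population (not from the study).
h_example = nei_diversity([50, 30, 10, 5, 5])
print(round(h_example, 3))
```

A diversity of 0.669 corresponds to about 3.0 effective haplogroups, and the 0.7 to 0.8 range to about 3.3 to 5.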
Diversity values were further investigated by including both haplogroup and STR haplotype, and varying the number of STR loci used. The number of SNP+STR types ranged from 257 for 2 STR loci to 1286 for 10 STR loci. For such large numbers of SNP+STR types, the probability that any two chromosomes drawn from the population share the same type is expected to be small. For all samples, the estimated probability that two chromosomes would share the same SNP+STR type was roughly twice as high for inland samples as for coastal samples, indicating consistently lower inland diversity. Further, the differences between inland and coastal diversities were much larger than the corresponding estimated standard errors. Estimation of the effective population size using BATWING, however, led to similar numbers for the two regions: ~1,200 (95% CI 840 to 1550) near the coast and ~1,230 (95% CI 934 to 1534) inland.
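The probability that two chromosomes share the same SNP+STR type can be estimated directly from the typed samples. The sketch below uses the unbiased estimator Σ c(c − 1) / (n(n − 1)), which equals one minus the unbiased Nei diversity. Whether the authors used exactly this estimator is an assumption, and the haplogroup labels and STR allele tuples below are hypothetical.

```python
from collections import Counter


def match_probability(types):
    """Probability that two distinct sampled chromosomes share a type."""
    counts = Counter(types)
    n = len(types)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# Hypothetical samples: each chromosome is (haplogroup, STR alleles).
coastal = [
    ("J2", (14, 23)), ("J2", (14, 24)), ("J1", (13, 23)),
    ("R1b", (14, 24)), ("J2", (14, 23)), ("E1b1b1", (13, 24)),
]
inland = [
    ("J1", (13, 23)), ("J1", (13, 23)), ("J1", (13, 23)),
    ("J1", (13, 24)), ("J1", (13, 24)), ("J2", (14, 23)),
]
p_coastal = match_probability(coastal)
p_inland = match_probability(inland)
print(p_coastal, p_inland)
```

A higher match probability, as for the inland toy sample here, corresponds to lower diversity.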
We sought, through STR network analysis, to assess whether the observed geographic distribution of each haplogroup was reflected in geographic variations of STR haplotype distributions. The J1 and J2 sister clades showed a clearly non-uniform geographic distribution of STR haplotypes and few instances of haplotype sharing across geographic regions. Consistent with previous analyses, coastal Levantine regions were well represented in the J2 network. Some evidence of sharing with Jordan was also apparent. The J1 network was dominated by inland Levantine samples (mainly Jordan and inland Lebanon and Syria). The R1b network showed much less geographic correlation, possibly because most of the R1b chromosomes have entered the region recently. In fact, without extra-Levantine representation of R1b to establish context, it is difficult to identify where these R1b chromosomes originated. Finally, E1b1b showed a clear demarcation between the Levantine STR haplotypes and North African STR haplotypes, with a lower diversity among North African STR haplotypes than among Levantine STR haplotypes. In total, 91 STR haplotypes belonged to the E1b1b1 haplogroup within the Levantine population compared to 60 STR haplotypes within the North African population (data not shown).