Loci comparison
The Mediterranean area is a key region for the study of human genetic differentiation, as it represents a natural geographic boundary between Europe and Africa, with people of different cultural backgrounds located on both sides. From a genetic point of view, North Africans are closer to Southern Europeans than to sub-Saharan Africans [47], but different markers reveal heterogeneous results regarding the amount of genetic differentiation or gene flow between the northern and southern shores of the Mediterranean Sea [25-35, 37, 38, 40]. In this study, we focused on the region of the Strait of Gibraltar, encompassing the Iberian Peninsula and Western Maghreb, to try to understand the genetic patterns exhibited by different classical and molecular loci in relation to both natural selection - in particular that affecting the HLA locus - and demographic history. To this end, we chose 7 independent loci (ABO, RH, GM, MNSs, HLA-DRB1, mtDNA and the Y chromosome) evolving under different evolutionary forces and all tested in a representative number of populations located on both sides of the Strait.
After applying an appropriate re-sampling procedure to the observed data to overcome the problem of data heterogeneity among the different loci, we first estimated several statistics describing genetic diversity in the region under study (Table and Figure ). Three loci - ABO, MNSs, and HLA-DRB1 - showed a very low level of differentiation among populations, with (almost) no difference across Gibraltar compared to what was observed on both sides of the Strait. The first two loci code for blood group antigens usually typed by serological methods. ABO exhibits very homogeneous frequencies of its classical A, B and O alleles at the worldwide scale, except in Amerindians, where O is almost fixed in most populations [47]. The global distribution of MNSs haplotype frequencies is more heterogeneous but, as for ABO, the patterns are not easily interpreted in relation to the geographic distribution of human populations. As for many other blood groups, the molecular bases of ABO and MNSs have been clarified recently, and previous suspicions of natural selection acting on these systems [48] have been confirmed by population genetics analyses: in fact, ABO is one of the most polymorphic genes in humans [8]. Although neutrality tests performed on classical ABO frequencies do not allow us to infer any kind of selection (this study), this polymorphism shows clear evidence of balancing selection at the molecular level, in particular within the O null-allele class [8, 10]. Interestingly, only 0.02% of the genetic variance within the O alleles is accounted for by differences among populations [8], which is less than the 7-9% estimated for the HLA-B, -C and -DRB1 loci [49] and much less than the 15% average estimated for other classical and DNA polymorphisms [50-52]. Balancing selection is thus also compatible with the observed apportionment of ABO genetic diversity.
The MN polymorphism is determined by the glycophorin A (GYPA) gene, for which significant departures from neutral expectations towards an excess of heterozygotes have been confirmed at the population level both on allele frequencies (Additional file 1: Supplemental Table S1) and on molecular data [43] (on the other hand, the Ss polymorphism defining GYPB shows almost no deviation from neutrality according to our tests). These results argue in favour of the "decoy hypothesis", whereby GYPA receptors, the most abundant on the erythrocyte surface, would attract pathogens and prevent them from affecting more vital tissues [43]. However, the rapid evolution of human glycophorins may also have been driven by P. falciparum ("evasion hypothesis"), as both GYPA and GYPB are receptors of this malaria parasite [44]. In the present study, the low proportion of genetic variance due to differences between North-Western Africa and South-Western Europe is almost as extreme for MNSs as for HLA-DRB1 (Table and Figure ). However, the estimated selection coefficient is much higher for HLA-DRB1 (s = 2.2%, CI 95% [0.7-5.5%]) than for MNSs (s = 0.2%, CI 95% [0.0-9.1%]), and is not significantly different from zero for the latter (Figure ). Also, we found s = 0.0, CI 95% [0.0-0.3%] for ABO. Therefore, our study suggests that natural selection had a significant influence on the evolution of HLA-DRB1 but is not - or is no longer - detectable at the other two loci. Other studies failed to demonstrate the consequence of balancing selection on HLA genetic patterns despite clear evidence of deviation from neutrality [1]. This is probably because natural selection is weak on this gene (e.g. s = 2.2% for HLA-DRB1 compared to 10-20% for G6PD/A- [53] and 4-9% for HbC [54], two cases of selection linked to malaria) and would only be detectable by exploring regions where gene flow is reduced (such as across geographic barriers) and where differences with neutral markers would be unambiguous. In addition, natural selection may have operated at unequal intensities in different environments (e.g. in regions characterized by different levels of pathogen richness or prevalence of specific diseases), leaving heterogeneous signals in the gene pool of human populations. We tested this latter hypothesis on our HLA-DRB1 data by estimating s independently in NWA and SWE. Interestingly, we found that s was higher in NWA (1.9%, CI 95% [0.3%-6.2%]) than in SWE (0.7%, CI 95% [0.0%-3.0%]), where it is not significantly different from zero (Figure ). These results suggest that the two regions may have undergone different environmental histories, which is a reasonable hypothesis over the 20,000-year period chosen for our simulations, during which important climatic variation occurred (the beginning of this period corresponds to the last glacial maximum, or LGM). This opens new perspectives for the study of human genetic history, in which the genetic patterns of partially selected polymorphisms like HLA would be explored in relation to environmental factors varying in space and time, in addition to other parameters.
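To give a feel for what a selection coefficient of this magnitude implies, the following minimal sketch iterates a deterministic two-allele model of symmetric overdominance. It is an illustration only and not the simulation framework used in this study: the fitness scheme (heterozygotes 1, both homozygotes 1 - s), the reduction to two alleles, the starting frequencies and the 25-year generation time (800 generations for 20,000 years) are all assumptions, and drift and migration are ignored.

```python
def next_freq(p, s):
    """One generation of symmetric overdominance for two alleles.

    Assumed fitnesses: heterozygote 1, both homozygotes 1 - s.
    """
    q = 1.0 - p
    mean_fitness = 1.0 - s * (p * p + q * q)
    return p * (1.0 - s * p) / mean_fitness


def freq_after(p0, s, generations):
    """Allele frequency after a number of generations, starting from p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s)
    return p


def freq_gap_after(p_a, p_b, s, generations):
    """Absolute frequency difference between two isolated populations."""
    return abs(freq_after(p_a, s, generations) - freq_after(p_b, s, generations))


if __name__ == "__main__":
    generations = 800  # assumption: 20,000 years at 25 years per generation
    for s in (0.0, 0.002, 0.022):
        gap = freq_gap_after(0.2, 0.8, s, generations)
        print(f"s = {s:.3f}: frequency gap after {generations} generations = {gap:.4f}")
```

Under these assumptions, two populations starting at frequencies 0.2 and 0.8 converge towards the same equilibrium (0.5) when s = 2.2%, whereas their difference is unchanged when s = 0. This is the qualitative sense in which balancing selection can lower differentiation between populations.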
In sharp contrast with ABO, MNSs and HLA-DRB1, the level of genetic differentiation among populations appears to be particularly high across the Strait of Gibraltar for the Y chromosome. Y-chromosome markers are known to discriminate populations and groups of populations much more than other polymorphisms, with a global variation among populations of 33-39% [55, 56]. Although the estimates of gene flow (Nm) on each side of the Strait of Gibraltar or across it are remarkably similar for the STR and SNP datasets, the genetic differentiation (FCT) between NWA and SWE measured with SNPs is more than twice that measured with STRs. This result, which was reproduced in an analysis of a smaller dataset that included exactly the same individuals tested for SNPs and STRs (26 samples, total n = 1552 Y chromosomes, FCT of 44.2% and 26.6% for SNPs and STRs, respectively), is independent of the pattern of differentiation among populations: population pairwise FSTs (RSTs for STRs) were indeed highly correlated (r = 0.987). Genetic differentiation measured by STRs could be lowered by the specific mutation process driving the evolution of microsatellite loci, which can produce alleles identical in state but not identical by descent, thereby erasing the effect of genetic drift [57-60]. However, Rousset [61] has shown that homoplasy has no simple effect on FST, because this measure is affected not only by the mutation rate at microsatellite loci but also by the mutational model governing them. On the other hand, a recent study that compared large-scale SNP and STR genotyping in the Human Genome Diversity Panel (HGDP) concluded that SNP-based FSTs could be inflated by ascertainment bias [62]. It thus seems plausible that a combination of factors, i.e. ascertainment bias in Y-chromosome SNPs and homoplasic effects in Y-chromosome STRs, is acting here to make estimates of population subdivision diverge. Note also that we had difficulty reproducing by simulation some characteristics of both Y-chromosome SNP and STR datasets, i.e. the very high variance of genetic diversity between samples for STRs (see the standard deviation of the gene diversity, sd H, in Additional file 1: Supplemental Figure S8) and the very high genetic differentiation between continents for SNPs (see Dinter in Additional file 1: Supplemental Figure S9). This discrepancy between observed and simulated statistics could be due either to an overrepresentation of frequent mutations in the SNP dataset [63] or to a choice of very polymorphic STRs (i.e. for forensic purposes), a kind of ascertainment bias that we are not able to reproduce by simulation.
Whatever the nature of the markers used, the remarkable finding of higher levels of continental subdivision associated with the Y chromosome than with other polymorphisms could be due to the fact that haploid components of the genome are more influenced by genetic drift and selection than diploid genes, owing to their smaller effective population size [64]. However, a very different pattern (i.e. only a weak genetic barrier at the Strait of Gibraltar) is observed in this study for mtDNA, which is also haploid, thus arguing for a higher female effective population size [56]. The peculiar behaviour of the Y chromosome could then indicate some sex-specific history of migration in the Mediterranean area, with a major demographic effect of males in both Europe and North Africa, at least during the Neolithic [65, 66], and significant female gene flow across the Strait of Gibraltar. Therefore, although contradictory results have been obtained elsewhere between observed mtDNA/Y-chromosome diversity patterns and their expectations based on patrilocality and matrilocality [56, 67], a higher level of female migration, such as that observed at a global scale [68], is here evidenced for the first time across a sea barrier. Finally, besides sex differences in migration rates, another possible explanatory factor for a higher female than male effective population size, which is now receiving more attention, is a higher variance in reproductive success for males than for females [69, 70]. The hypotheses given above to explain the Y-chromosome results are of course not mutually exclusive.
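The effect of reproductive variance on effective size can be made concrete with a short sketch. It relies on a standard textbook result for a haploid, uniparentally inherited locus in a population of constant size, Ne = (N - 1) / Vk, where Vk is the variance in the number of transmitting offspring per parent. The function names and the offspring counts are hypothetical, and nothing here is taken from the analyses of this study.

```python
def offspring_variance(counts):
    """Population variance of the number of transmitting offspring per parent."""
    n = len(counts)
    mean = sum(counts) / n
    return sum((k - mean) ** 2 for k in counts) / n


def haploid_effective_size(counts):
    """Effective size of a haploid locus, Ne = (N - 1) / Vk.

    Assumes a constant population size, i.e. the counts must sum to the
    number of parents (a mean of one offspring per parent).
    """
    n = len(counts)
    if n < 2 or sum(counts) != n:
        raise ValueError("counts must describe a population of constant size")
    vk = offspring_variance(counts)
    if vk == 0:
        return float("inf")  # no reproductive variance: no drift in this model
    return (n - 1) / vk


if __name__ == "__main__":
    # Hypothetical offspring counts for 8 parents of each sex.
    fairly_even = [1, 1, 0, 2, 1, 1, 2, 0]
    highly_skewed = [0, 0, 0, 4, 0, 0, 4, 0]
    print("Ne, fairly even success:", haploid_effective_size(fairly_even))
    print("Ne, highly skewed success:", haploid_effective_size(highly_skewed))
```

With the same number of parents, the more skewed distribution of reproductive success gives the smaller effective size, which is the mechanism invoked above for a lower male than female effective population size.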
RH, GM and mtDNA exhibit similar, intermediate proportions of genetic variation across the Strait of Gibraltar, compared to ABO, MNSs, and HLA-DRB1, on the one hand, and the Y chromosome, on the other. We thus consider that they are closer to an average for neutral markers, with a significant FCT between 2.2 and 4.7% across the Strait, and a genetic variation (FSC) of 1.2 to 2.2% on both sides of the Strait (Table and Figure ). This result is particularly relevant because similar values are found for two nuclear loci (RH and GM, described by frequency data) and one sex-specific molecular marker (mtDNA, described by DNA sequences), which are a priori difficult to compare.
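As a reading aid, the two fixation indices quoted above can be combined into an overall differentiation among populations using the standard hierarchical identity (1 - FST) = (1 - FSC)(1 - FCT). The sketch below applies it to the bounds given in the text; pairing the two lower bounds and the two upper bounds is an assumption made for illustration, since the ranges summarise several loci.

```python
def overall_fst(fct, fsc):
    """Overall FST implied by FCT (among groups) and FSC (within groups)."""
    return 1.0 - (1.0 - fct) * (1.0 - fsc)


# Bounds quoted in the text for RH, GM and mtDNA (paired by assumption).
low = overall_fst(fct=0.022, fsc=0.012)
high = overall_fst(fct=0.047, fsc=0.022)

if __name__ == "__main__":
    print(f"Implied overall FST between {low:.1%} and {high:.1%}")
```

With these inputs the implied overall FST lies between about 3.4% and 6.8%, slightly less than the simple sum of the two components because the identity is multiplicative.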
Ancient genetic pattern
Because demography acts simultaneously on the whole genome (contrary to selection, which acts locally), we used 4 loci (RH, GM, mtDNA and the Y chromosome) to infer the demographic scenario that best fits the current genetic structure around the Strait of Gibraltar (Western Mediterranean). Our simulations show without ambiguity that the genetic pattern observed in the Western Mediterranean was mostly established in pre-Neolithic times. Indeed, the most probable scenario (P) involves gene flow over the last 20,000 years, not only between populations located on each side of the Strait but also across the Strait. The time elapsed since the Neolithic transition was too short to allow the current genetic structure to emerge during this period. This is revealed by the very low probability associated with scenario N compared to all other scenarios involving gene flow during the Palaeolithic (Figure ). This result is compatible with the fact that the gene pool of South-Western Europe (in particular the Iberian Peninsula) has been only weakly modified by the Neolithic transition [71] and that the genetic impact of the Neolithic transition in North Africa has been limited to eastern regions according to classical genetic markers [25], although the picture is less clear for mtDNA [72]. The notable exception is the Y chromosome, for which a non-negligible proportion of simulations starting in the Neolithic period give compatible results. It has been suggested that the Y-chromosome genetic structure observed in both North Africa [65] and Europe [66] is mainly the result of early food-producing societies, which matches our observations rather well. However, we cannot be conclusive about the scenario that best fits Y-chromosome diversity because scenarios N or P may be alternatively preferred depending on marker type (SNPs or STRs, Figure ) and deme size (Additional file 1: Supplemental Figure S10). Moreover, as already stated above, our simulations of Y-chromosome data failed to reproduce the actual data with as much accuracy as they did for the other genetic systems.
It is relatively surprising that we do not obtain a better fit to the observed data when considering the Neolithic transition and the Arabian conquests in addition to the gene flow occurring in the Palaeolithic era (scenarios PN and PNI, Figure ). The first obvious explanation is that our simple models for the Neolithic transition and Arabian conquests do not capture the principal features of those two events. Alternatively, recent demographic events may not have substantially disturbed the genetic pattern established during the Palaeolithic, which seems compatible with recent theoretical studies suggesting a strong inertia of local gene pools [73]. In any case, our results support the view that the genetic impact of the Arabian conquest has been limited in the Maghreb, particularly in Morocco, and was even smaller in the Iberian Peninsula, which was invaded mostly by Maghreb Berbers under Arab leadership [30]. More refined modelling would be necessary to better study the impact of those events on the genetic structure.
Gene flow on both sides of the Strait of Gibraltar
Our results show that gene flow between populations either within South-Western Europe or within North-Western Africa is not particularly reduced. We compensated for the relative lack of precision of the point estimates obtained individually with each marker by using multi-locus analyses. Nmintra is thus estimated to lie between 43.6 and 97 in our study. This estimate is lower than the estimate of 164 +/- 21 obtained for a worldwide STR dataset [74] but is concordant with another estimate obtained for post-Neolithic populations from mtDNA (> 20 [75]). Under our model, Nmintra represents a rough estimate of the mean gene flow between populations in the studied area since the Last Glacial Maximum (~20,000 years ago). This rough estimate takes into account neither the variation of Nm over time nor its value at specific periods, such as after the Neolithic transition.
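For intuition about the scale of Nm, the classical equilibrium approximation of Wright's island model relates it to FST as FST = 1 / (1 + c Nm), with c = 4 for a diploid autosomal locus and c = 2 for a haploid locus. The sketch below implements this conversion in both directions. It is not the inference procedure used in this study, whose estimates come from simulations, and the assumptions of the island model (many demes of equal size, symmetric migration, equilibrium) are unlikely to hold exactly, so the numbers are indicative only. The function names are hypothetical.

```python
def island_fst(nm, c=4):
    """Equilibrium FST under the island model for a given Nm.

    c = 4 for a diploid autosomal locus, c = 2 for a haploid locus.
    """
    if nm < 0:
        raise ValueError("Nm must be non-negative")
    return 1.0 / (1.0 + c * nm)


def island_nm(fst, c=4):
    """Nm implied by an equilibrium FST under the island model."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / c


if __name__ == "__main__":
    # Nm values quoted in the text, converted for illustration only.
    for nm in (2.0, 4.2, 43.6, 64.7, 97.0):
        print(f"Nm = {nm:5.1f}: diploid FST = {island_fst(nm):.4f}, "
              f"haploid FST = {island_fst(nm, c=2):.4f}")
```

The conversion makes the nonlinearity explicit: FST falls quickly as Nm rises above a few migrants per generation, so wide intervals on large Nm values correspond to small differences in expected differentiation.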
Unfortunately, we did not obtain very precise estimates for the other demographic parameters, notably Nminter, which measures gene flow across the Strait of Gibraltar. We estimated Nminter to lie between 4.2 and 64.7, which is reduced with respect to the mean gene flow between populations located on each side of the Strait (within South-Western Europe and within North-Western Africa), but the overall reduction is not as strong as the one estimated for the Y chromosome (Nminter = 2). This rough estimate confirms that the Strait of Gibraltar does not constitute as strong a barrier as suggested by Y-chromosome data. For comparison, Nm estimated on the basis of mtDNA for present-day hunter-gatherer populations is smaller than 5 [75]. Substantial gene flow across the Strait is not particularly surprising considering that its width has never exceeded 12 kilometres (its present-day value). Our main explanation for the wide interval obtained for the estimate of Nminter is that our model lacks certain features that may have affected the level of gene flow between populations across the Strait of Gibraltar:
i) migrations may have been periodic rather than continuous over time. One can imagine that gene flow across the Strait resulted from the movement of groups of individuals at several periods of time, owing for example to climatic changes (rises and falls in sea level) or to cultural reasons;
ii) the Mediterranean Sea had a profound impact on exchanges between populations located around it [24, 76], but its exact role as a vector of or barrier to migration may have varied over time and between regions. In particular, the Mediterranean Sea may have promoted east-west migration along its coasts, but its influence on north-south migration is uncertain. Our model of constant gene flow may be too simplistic to capture the impact of maritime movements over the Mediterranean Sea;
iii) very different migration patterns for males and females across the Strait may also contribute to blurring the signal.