As described previously, total frequencies for haplogroup H decline toward both the East and the South (Table ). Haplogroup H represents 44% of the mtDNA variation in the Iberian Peninsula, but only 22% in the Near East. Likewise, its frequency still reaches 25% in North Africa, but drops to only 9% in the Arabian Peninsula. The distribution of haplogroup H subclades also differs markedly among regions. Subhaplogroups H1 and H3 are the dominant subgroups in the Iberian Peninsula (45% and 16%, respectively) and North Africa (42% and 13%, respectively), whereas unclassified H haplotypes (H*) account for 40–50% of the H diversity in the Arabian Peninsula and the Near East. Furthermore, while H1 (12%) is still the most frequent subgroup in the Near East, followed by H5 (8%), the modal subclades in the Arabian Peninsula are H2a1a (18%) and H6b (14%). Pairwise FST distances based on subhaplogroup frequencies display high heterogeneity among the main regions (Table ). However, the level of statistical significance between the Iberian Peninsula and North Africa (p < 0.05) is lower than that for any other pairwise comparison (p < 0.001). In addition, within North African populations, the Tunisians, Tunisian Berbers and Moroccan Berbers are different from the Saharans and Moroccan Arabs, while the last two are comparatively less different from the Iberian Peninsula. The relative proximity of the Iberian Peninsula to the westernmost North African populations is graphically reflected in Figure . It is evident that Tunisians and Berbers are closest to the Near East and the Arabian Peninsula. A principal component analysis (PCA) points to subhaplogroups H1 and H3 as being primarily responsible for the Iberian-Moroccan-Saharan connection, whereas H4, H5, H7, H8 and H11 testify to the Near Eastern influence (data not shown).
Similarly, haplotype-based FST distances show a strong influence of the Iberian Peninsula on the Western Moroccan and Saharan North African populations, and indicate that Tunisians are comparatively the most strongly influenced by the Near East (Table and Figure ). Globally, North Africa shares a similar number of haplotypes with the Iberian Peninsula as with the Near East (Table ). However, a detailed analysis of the ratios between haplotypic identities relating each North African population to the Iberian Peninsula or the Near East confirms that the Western populations, comprising Moroccan Arabs, Saharans and Mauritanians, are the most notably influenced by the Iberian Peninsula, whereas the Tunisian Berbers, Tunisians, and Moroccan Berbers have received relatively more gene flow from the Near East (Table ). At this point, it is noteworthy that all the Arabian Peninsula haplotypes shared with North Africa are a subset of those shared by the latter with the Near East, pointing to a minor direct input of the Arabian Peninsula into the North African populations. Haplogroup (Table ) and haplotype (Table ) genetic diversities demonstrate that the Northwestern African populations (Moroccan Arabs and Saharans) are genetically less diverse than the more central Tunisians and Berbers, a fact that could be explained by a stronger Near East influence on the latter populations. Although global haplogroup and haplotypic diversities are not statistically different among regions (Table and ), the European subgroup H1 appears to be significantly more diverse in the Near East (87 ± 5) than in the Iberian Peninsula (75 ± 3) or North Africa (67 ± 6). Moreover, the genetic diversity for the Western European subgroup H3, which is absent in the Near East, is also higher in North Africa (74 ± 9) than in the Iberian Peninsula (65 ± 6).
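To make the notion of haplotype (or haplogroup) diversity concrete, the following is a minimal sketch of one common estimator, Nei's unbiased gene diversity, H = n/(n − 1) · (1 − Σ p_i²), where n is the sample size and p_i the frequency of the i-th haplotype. That this estimator, and a percentage scale, underlie the values quoted above is an assumption; the function name `gene_diversity` and the toy sample are hypothetical and do not come from the study's data.

```python
from collections import Counter


def gene_diversity(haplotypes):
    """Nei's unbiased gene diversity for a list of haplotype labels.

    Assumption: this estimator is used for illustration only; the text
    does not state which diversity formula was applied.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two sampled sequences are required")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability of drawing
    # the same haplotype twice, with replacement).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # The n / (n - 1) factor corrects for finite sample size.
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical toy sample: four sequences, three distinct haplotypes.
toy_sample = ["hapA", "hapA", "hapB", "hapC"]
toy_diversity = gene_diversity(toy_sample)
```

Under this estimator a sample in which every sequence is different has diversity 1, and a sample with a single haplotype has diversity 0, so a higher value in one region means that two randomly drawn sequences there are more likely to differ.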
Transformation of molecular genetic diversities into coalescence ages gives 18,345 ± 4,051, 14,201 ± 2,984, and 11,366 ± 2,354 years for H1 in the Near East, the Iberian Peninsula and North Africa, respectively. On the other hand, the coalescence ages for H3 in the Iberian Peninsula (10,342 ± 2,634 years) and North Africa (10,866 ± 4,107 years) are similar. However, only the H1 ages in the Near East and North Africa are statistically different from each other.
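The text does not state which test was used to compare coalescence ages. One reading that reproduces the reported outcome is to treat two ages as different when their age ± error intervals do not overlap. The sketch below applies that criterion to the values quoted above; the criterion itself, and the names `interval` and `overlaps`, are assumptions made for illustration, not the authors' stated method.

```python
from itertools import combinations

# Coalescence ages and errors (years) as quoted in the text.
h1_ages = {
    "Near East": (18345, 4051),
    "Iberian Peninsula": (14201, 2984),
    "North Africa": (11366, 2354),
}
h3_ages = {
    "Iberian Peninsula": (10342, 2634),
    "North Africa": (10866, 4107),
}


def interval(age, error):
    """Return the (lower, upper) bounds of age +/- error."""
    return (age - error, age + error)


def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def non_overlapping_pairs(ages):
    """List region pairs whose age +/- error intervals do not overlap.

    Assumption: non-overlap is used here as a stand-in for
    'statistically different'; the source does not name its test.
    """
    result = []
    for (r1, v1), (r2, v2) in combinations(ages.items(), 2):
        if not overlaps(interval(*v1), interval(*v2)):
            result.append((r1, r2))
    return result


h1_different = non_overlapping_pairs(h1_ages)
h3_different = non_overlapping_pairs(h3_ages)
```

With these inputs only the Near East and North Africa H1 intervals (14,294–22,396 versus 9,012–13,720 years) fail to overlap, which matches the statement in the text. A formal two-sample test on the same figures could give a different verdict, so this should be read as a consistency check rather than a significance test.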
Distribution of subhaplogroup H frequencies (%) in the studied populations.
FST (by 1,000) based on subhaplogroup, above the diagonal, and haplotype, below the diagonal, frequencies.
Graphical relationships among the studied populations. Codes are as in Table 1. MDS plots based on FST haplogroup (a) and haplotypic (b) frequency distances.
Population and regional haplotypic composition.
The relative affinities among regions are based on subhaplogroup frequencies, which do not take into account differences between haplotypes assorted in the same subgroup, or on haplotypic matches, whose identity is based only on partial HVSI sequences. In addition, it has to be taken into account that half of the H lineages detected in North Africa are not shared with other regions and that this percentage is even greater in the putative source regions of the Near East (70%) and the Iberian Peninsula (76%). These facts point to a higher differentiation among regions and between populations than that observed previously. Indeed, complete or nearly complete sequencing of some apparently identical samples indicates that the real genetic heterogeneity among regions is greater than that estimated above (Figure ). To begin with, the HVSI motif 16093 – 16189 that characterizes subgroup H1f was found in an individual (Mor 2047) from Morocco (Figure ), also in an H1 background. This subgroup is particularly abundant in, and mainly restricted to, Finland and the surrounding populations [36]. At first sight, this coincidence would seem to point to a new link between North European and North African populations like that found previously for U5b1b [26]. However, in this case, further analysis of the coding region in the North African sample revealed a lack of the three coding region mutations that additionally characterize the Finnish H1f subgroup [38] (Figure ). This lack of identity between haplotypes assorted in the same subgroup and sharing the same or a similar HVSI motif can be extended to other cases. For instance, there is a group of H sequences that shares the 16145 – 16222 HVSI motif, consistently found in Northwestern Africa, the Sahara and several Western Sahelian populations [15]. The complete sequencing of a Mauritanian sample (Mau 2027) allowed the assignment of this type to subhaplogroup H1 (Figure ). A direct connection of this motif with a German sequence was previously suggested [15]. However, the additional presence of transitions 16304 and 456 in the HVSI and HVSII regions, respectively, in that German haplotype [43] indicated that it should be classified as belonging to the H5 instead of the H1 subgroup, which does not support a direct link between these regions. In contrast, the two 16145 – 16222 haplotypes sporadically detected in the Iberian Peninsula [[44] and unpublished results] belonged to the North African subgroup, as they shared the coding 10257 mutation, in addition to the H1 diagnostic transition 3010, with the completely sequenced Mauritanian sample (Figure ). It seems that the 10257 transition defines a new subgroup within H1. This fact points to a possible, although not recent, North African demic influence on the Iberian genetic pool. Another interesting group of sequences belonging to the H1 subgroup in North Africa is that characterized by the 16172 – 16311 motif, which we [15] and others [19] have found mainly in Saharan samples. Haplotypes with, or including, this HVSI motif have also been detected in European [45] and in Asian [50] samples, but not yet in the Iberian Peninsula (see Additional file 1). However, the likelihood of direct phylogenetic links among such distant regions is very low, because all of those individuals further classified in both regions belong to the H5 subgroup or the HV haplogroup [48] in Europe, or to the HV or the R2 haplogroups [53] in the Middle East, which strongly points to yet another case of HVSI convergence in distinct coding region backgrounds. In addition to the CRS, the 16189 and the 16311 HVSI motifs are quite abundant in North Africa (see Additional file 1). However, when these samples were screened for the coding region positions observed in completely sequenced European or Middle Eastern individuals that held the same HVSI motifs (Figure ), none of these positions appeared in the North African samples. This lack of homogeneity again strongly points to their different monophyletic coding backgrounds, in spite of their HVSI matches, a fact repeatedly found in other studies [38]. Indeed, in this study, there are also instances of molecular convergence in the coding region. Sequences How 73H and Jor 843 share the 12236 transition, although they belong to the H* and H5 subgroups, respectively (Figure ). The 12358 transition presents another such case, as it is shared by four sequences (Her 127, Ach 28, MM H2, and Mau 2027) belonging to different H subgroups (Figure ).
Figure 2. Phylogenetic tree of complete (continuous branches) or nearly complete (discontinuous branches) haplogroup H mtDNA sequences. Numbers along links refer to nucleotide transitions. "A" and "T" indicate transversions; "d" deletions and "i" insertions. Recurrent ...