Analysis of mtDNA from 266 Malagasy individuals (Table ) is broadly consistent with previous genetic studies [15]. We see a combination of Southeast Asian and African lineages that are likely to trace back to the initial settlement of the island around the 7th century AD. However, our results based on complete mitochondrial genomes also revealed the presence of five novel mtDNA lineages that cluster into a previously uncharacterized clade whose geographic distribution seems to be restricted to the island of Madagascar (Additional files 1). The age estimates for this clade and its main sub-branches are shown in Table .
Inferred frequencies of mtDNA haplogroups in three Malagasy populations: Merina, Vezo, and Mikea.
Molecular dates estimated for the TMRCA and founder of Malagasy haplogroups M23, M23a and M23b from coding region information.
Of the five novel lineages, one was found among the Mikea hunter-gatherers (at a frequency <1%) and four among the Vezo fishermen (at a frequency of ~4%) (Table , highlighted). Comparative phylogenetic analysis of worldwide mtDNA genomes confirmed the clustering of these five lineages into a deep-rooted branch within macrohaplogroup M, which we name M23. This new branch carries all the diagnostic polymorphisms of macrohaplogroup M as well as a substantial series of mutations that separates it from the root of macrohaplogroup M (Figure ). Haplogroup M23 is characterized by 11 coding region mutations (viz. 2706-8360-9438-9545-10142-10295-11569-11899-12279-12618-15025) and 8 control region mutations (viz. 152-195-204-417-533-16263-16311-16519) (Figure ). The absence of a polymorphism at 2706 in the five M23 carriers indicates a back mutation, and we consider this position another basal polymorphism of haplogroup M23. Haplogroup M23 splits into two branches, M23a and M23b. The former is represented by one individual from the Vezo group, whilst the latter, defined by a substitution at np 8188, encompasses all the remaining Vezo lineages and is also present in the Mikea.
Figure 1 Phylogenetic tree constructed from complete mtDNA sequences for five Malagasy individuals. M and V represent the Mikea and Vezo populations, respectively. Mutations were scored relative to the rCRS. Numbers along links refer to nucleotide positions.
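The coding region motif described above can be turned into a simple screening rule. The following is a minimal sketch, not the authors' pipeline: it assumes each sample is represented only by the set of nucleotide positions at which it differs from the rCRS (alleles are ignored), and the function name `classify_m23` and its return labels are hypothetical. Because the text describes position 2706 as a back mutation, the sketch expects M23 carriers to show no difference from the rCRS at that position. The defining mutations of M23a are not listed in the text, so the sketch only separates M23b (np 8188) from other M23 sequences.

```python
# Coding region positions listed in the text for M23 (scored relative to the rCRS).
M23_CODING_PRESENT = frozenset(
    {8360, 9438, 9545, 10142, 10295, 11569, 11899, 12279, 12618, 15025}
)
# Back mutation: M23 carriers are expected to show no polymorphism here.
M23_CODING_ABSENT = frozenset({2706})
# Substitution that defines the M23b sub-branch according to the text.
M23B_MARKER = 8188


def classify_m23(variant_positions):
    """Classify a sample from its set of variant positions (hypothetical helper).

    Returns "M23b", "M23 (not M23b)", or None if the coding motif is not matched.
    Assumption: positions alone are compared; alleles are not checked.
    """
    variants = set(variant_positions)
    if not M23_CODING_PRESENT <= variants:
        return None  # at least one diagnostic position is missing
    if variants & M23_CODING_ABSENT:
        return None  # a polymorphism at 2706 contradicts the back mutation
    return "M23b" if M23B_MARKER in variants else "M23 (not M23b)"


if __name__ == "__main__":
    # Hypothetical sample carrying the full motif plus the M23b marker.
    sample = set(M23_CODING_PRESENT) | {8188}
    print(classify_m23(sample))
```

A position-only comparison of this kind is only a first filter; any candidate it flags would still need to be placed on the phylogeny, since several of the control region positions are recurrent.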
So far, we have found no convincing association of M23 with any known M branches. None of the diagnostic coding region mutations of M23 overlaps with the diagnostic markers of other M haplogroups that emerge directly from the root of macrohaplogroup M (see van Oven and Kayser [28]). While some control region mutations (152, 195, 16263, 16311 and 16519) are shared with other deep-rooted M haplogroups (e.g., M1, M28, M29 and M46), these positions are known to be recurrent and cannot safely be considered as linking M23 to other haplogroups within macrohaplogroup M. This confirms the robustness of our phylogenetic reconstruction, and the basal position of M23 within M.
More detailed examination of the phylogeny, geographic distribution, and molecular dating of the M23 lineage reveals three further key points:
(1) As noted before, the position of M23 at the root of macrohaplogroup M indicates that M23 is a deep branch of the human mtDNA phylogeny. The length of the M23 branch suggests either strong genetic drift effects or that this cluster may encompass further branches yet to be identified. Indeed, a relatively small proportion of mtDNA variation has been surveyed in the putative areas of origin of M23. Therefore, more extensive sampling is needed to refine the overall geographic distribution and branching structure of this clade. However, the fact that this clade has no specific link to other known branches within macrohaplogroup M suggests a deep-rooted ancestry, possibly tracing back to the out-of-Africa event. Such a deep root is also shared with many other lineages that emerged independently from the root of macrohaplogroup M. These lineages are especially prevalent in South Asia [2]. This general pattern has been interpreted as supporting the view of a rapid dispersal of modern humans at the time of the out-of-Africa exodus, followed by a long period of isolation resulting in non-overlapping distributions of derived M haplogroups in relict or isolated populations/regions along the dispersal route. Thus, our results suggest that the Mikea hunter-gatherers and Vezo fishermen of Madagascar descend, if only in very small part (≤4%), from one such deep-rooted, isolated population.
(2) M23 lineages have an extremely restricted geographic distribution. A survey of all complete mtDNA sequences reported in the literature (>6,700 sequences; http://www.phylotree.org/) could not detect M23 sequences anywhere outside Madagascar. Moreover, the screening of control region polymorphisms that are diagnostic for M23 against a larger global panel of mtDNA control region variants confirmed that the M23 control region motif is indeed rare, as only four individuals shared the 13 control region diagnostic mutations for M23 (Additional file 1). Although comparative analysis based only on the first hypervariable sequence (HVS1) reveals a few more individuals that share the four HVS1 mutations of M23 (16223, 16263, 16311 and 16519), these nucleotide positions are known to be fast-mutating and recurrent, and consequently cannot be considered diagnostic of haplogroup M23 (Additional files 1). Interestingly, three of the four individuals sharing the 13 control region mutations for M23 are African Americans who are likely to trace their ancestry to sub-Saharan Africans, although no M23 carriers have been detected on mainland Africa itself (Additional files 2). The fourth individual is from the Arabian Peninsula (Dubai, United Arab Emirates), a region of Southwest Asia that has a long history of interactions with Africa, probably dating back to the dispersal of modern humans along the southern dispersal route [3]. The modern population of Dubai has a genetic composition strongly influenced by female-mediated gene flow from sub-Saharan Africa, as well as migration from South Asian populations [32], which have the highest observed levels of basal M lineages [2]. Although we have only detected four individuals potentially affiliated with M23, they are likely to descend from an African and/or Southwest Asian source, again placing the origin of M23 somewhere between these two regions. Unfortunately, lacking genealogical records for these four individuals, we cannot confirm their maternal African origin, and without additional mtDNA coding region information, the link with African populations remains highly speculative. However, if confirmed, this finding would suggest that the origin and dispersal of M23 lineages is restricted to the circum-Arabia/northwestern Indian Ocean regions.
(3) Despite the limitations of molecular dating [34], the estimated founder age of macrohaplogroup M using the M23 branch considered alone is 62-73 kyr (95% confidence interval, 44-94 kyr) (Table ). This conforms to the revised age estimate of macrohaplogroup M [35], and is slightly older than the proposed date for the dispersal of anatomically modern humans from Africa, as well as the population expansion accompanying it [2]. The time to the most recent common ancestor (TMRCA) of M23 has been estimated at 9.4 kyr (95% confidence interval: 1.9-17 kyr) using a recently improved control region mutation rate [35] (Table ), in broad agreement with dates obtained using previous coding region mutation rates (Table ). Considering that a demographic expansion may predate a geographic one, it is worth noting that the lower age estimates of M23, and especially of its subclade M23b, fall clearly within the Holocene (1.7-3.9 kyr; 95% confidence interval, 0-8.2 kyr). Although this is broadly consistent with a late Holocene date for the initial settlement of Madagascar [14] and the concomitant demographic/geographic expansions, the large confidence intervals add uncertainty to the dispersal date of M23 and leave open the possibility that this rare lineage may represent an early pre-Austronesian expansion into Madagascar.
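This section does not state which estimator produced the ages quoted in point (3). As an illustration only, one widely used approach for mtDNA clades is the rho statistic: the average number of mutations separating the sampled sequences from the ancestral node, multiplied by a calibration expressed in years per mutation. The sketch below assumes that approach; the function name `rho_age`, the per-sequence mutation counts, and the calibration value are all hypothetical placeholders and are not taken from the paper or its tables.

```python
def rho_age(mutation_counts, years_per_mutation):
    """Return (rho, age in years) for a clade under the rho approach.

    mutation_counts: number of mutations between the ancestral node and each
        sampled sequence (hypothetical values in the example below).
    years_per_mutation: calibration for the region analysed (an assumption
        supplied by the caller, not a value from the text).
    """
    if not mutation_counts:
        raise ValueError("at least one sequence is required")
    rho = sum(mutation_counts) / len(mutation_counts)
    return rho, rho * years_per_mutation


if __name__ == "__main__":
    # Five sequences with made-up mutation counts and a made-up calibration.
    rho, age = rho_age([2, 1, 1, 1, 0], years_per_mutation=3000.0)
    print(rho, age)
```

With only five sequences, a point estimate of this kind is dominated by sampling error, which is consistent with the wide confidence intervals reported above.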
The presence of the M23 clade among the Malagasy Vezo fishermen and Mikea hunter-gatherers provides additional mtDNA evidence upon which a better picture of the colonization of Madagascar can be built. However, open questions remain, including the geographic origin of M23, and the time and mode of its spread into Madagascar. These outstanding issues can only be partially investigated with the currently available data. The M23 lineage has not been observed in any of the putative parental populations of the Malagasy (Africans and Island Southeast Asians), suggesting either its absence from these populations, or that it is so exceedingly rare there that it has not yet been detected [17] (Additional files 1). Indeed, relative to their genetic diversity, Africans and Southeast Asians have not been widely sampled, although Borneo (the likely source of the Austronesian expansion into Madagascar) has been relatively well surveyed, and a high number of published mtDNA sequences (n = 157) is currently available from this area [38]. Nonetheless, M23 lineages have not been identified in this region. Even if M23 is as rare in Borneo as it is in Madagascar (1.9%), the probability of it being detected there is high: P(M23 | n, freq) = 0.95. However, the extreme population structure of Indonesia [41] may mean that M23 is restricted to populations that have not yet been sampled sufficiently, or at all.
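The text does not spell out how the detection probability was calculated. A simple model that reproduces the quoted value treats the n = 157 Borneo sequences as independent draws from a population in which M23 has frequency 0.019, so that the probability of seeing at least one carrier is 1 - (1 - freq)^n. The sketch below implements that assumed model; the function name `detection_probability` is illustrative.

```python
def detection_probability(freq, n):
    """Probability of observing at least one carrier among n independent samples.

    Assumes simple random sampling from a single, unstructured population,
    which is an idealisation (the text itself notes strong population
    structure in Indonesia).
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - freq) ** n


if __name__ == "__main__":
    # Values from the text: frequency in Madagascar 1.9%, n = 157 Borneo sequences.
    p = detection_probability(0.019, 157)
    print(round(p, 2))  # 0.95
```

The independence assumption is the weak point of this calculation: if M23 were confined to a few unsampled subpopulations, the effective frequency among the sampled groups would be lower and the true detection probability correspondingly smaller.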
An alternative hypothesis is that the M23 motif developed in situ in Madagascar, either completely or partially. If this is the case, a pre-M23 lineage should have evolved more or less in isolation within the founder population that later participated in the colonization of Madagascar.
The identification of four individuals of African and Southwest Asian origin who share the 13 diagnostic control region mutations for M23 points to these regions as potential sources for M23. Whilst the data do not allow us to make clear phylogeographic inferences regarding the origin of M23, our results may provide some evidence of ancient contacts across the Indian Ocean involving Africa, Madagascar and South Asia. The deep-rooted topology of M23 and its age estimate, coupled with its very restricted distribution within Madagascar, make it unlikely that the lineage reached the island through recent contacts, and are more in agreement with the patterns of human contact across the Arabian Sea and the Indian Ocean that pre-dated the Austronesian expansion into Madagascar [24].
Whilst more extensive screening of the putative parental populations in Africa and South Asia will help to ascertain the geographic origin and distribution of M23, our initial examination of Malagasy mtDNA diversity suggests that the origin of M23 lineages may be found in the circum-Arabia/northwestern Indian Ocean regions and that their arrival in Madagascar may pre-date the Austronesian settlement of the island. This lends support to oral tribal traditions stressing the earlier presence of non-Malagasy speakers (e.g. Vazimba; [23]) and re-emphasizes the importance and complexity of the circum-Arabia and Indian Ocean corridor since the late Pleistocene.