In this study we defined one novel haplogroup, M41, and revised the classification of haplogroups M3, M18, and M31. The remaining haplogroups were also classified into subhaplogroups (M3a, M4a, M6b, M33a, M34a, M37a and M40a) by including our complete sequence information (Fig. and also see Additional file 1). The geographical distribution of these samples, with their haplogroup affiliations, is given in Fig. and the populations and their linguistic affiliations are described in Table .
The 'autochthonous' haplogroups of Indian macrohaplogroup M.
Geographical distribution of completely sequenced (mtDNA) samples used for the reconstruction of Indian M phylogeny. (*) Random sample from Karnataka state.
Samples that were sequenced for the complete mitochondrial DNA and their geographical location, linguistic affiliations and haplogroups assigned.
We have revised the classification of haplogroup M3, which was previously characterized by a coding region mutation at 4580 and two control region substitutions at 482 and 16126. In our survey of >5000 samples across India, we found a considerable number of samples that carry mutations at nps 482 and 16126 but lack the mutation at 4580 (our unpublished data). This suggests that 482 and 16126 are the basal mutations of this haplogroup, and that 4580 might have originated later and represents subhaplogroup M3a. Haplogroup M18 was previously characterized only by the HVS I mutation (16318T), but we have now defined this haplogroup by two coding region mutations (12498 and 15942) and an additional control region mutation (194) (Fig. and also see Additional file 1).
Further, we have defined several subhaplogroups based on the mutations shared between our own data and those of Sun et al. [11] [see Additional file 1]. Subhaplogroup M3a is defined by a coding region mutation at 4580, and M4a by two coding (6620 and 7859) and three control region (152, 16145 and 16261) mutations. Subhaplogroup M6b is defined by two coding region mutations (3486 and 5585), M33a by two coding region substitutions (8562 and 15908), and M34a by six coding region (3447-8404-10361-11992-12311-14094) and three control region (146, 16095 and 16359) mutations. M37a is defined by a single coding (7853) and two control region (151-152) mutations, and M40a by one coding (13542) and three control region (200, 16179 and 16294) mutations. Interestingly, haplogroups M4, M18, M30, M37 and M38 share a common coding region mutation (12007) arising from the root of haplogroup M (superhaplogroup M4'30) and were later differentiated by coding and control region mutations (Fig. and also see Additional file 1). We have defined one novel haplogroup, designated M41, by six coding (870-6297-12398-12469-13656-15601) and three control region (375, 16327 and 16330) mutations, together with the T159 sequence of Sun et al. [11] (Fig. and also see Additional file 1). M41 has also been tentatively classified into three subhaplogroups: M41a, M41b and M41c. One complete sequence, O9, from the Andamanese-specific haplogroup M31a has also been included here; it defines subhaplogroup M31a1 by a single coding (13710) and a single control region (200) substitution. Our reanalysis of this lineage suggests two clear-cut, population-specific subclades of M31a in these islands. The M31a1-specific mutations 200 and 13710 are present exclusively in the Onge and Jarwa populations, while 9617, which defines the M31a2 clade, is present only in Greater Andamanese individuals [see Additional file 1]. All the coding region diagnostic mutations of the macrohaplogroup M subhaplogroups are listed in Table .
Coding region diagnostic mutations of Indian (mtDNA) M subhaplogroups.
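The coding region motifs listed in the text can be read as a simple lookup: a sample is a candidate for a subhaplogroup when it carries every position in that subhaplogroup's motif. The sketch below is only an illustration under stated assumptions. The names `CODING_MOTIFS` and `matching_subhaplogroups` are hypothetical, only nucleotide positions are compared (the specific base changes are not modelled), control region mutations are ignored, and the parent haplogroup background is not checked, so single-position motifs such as 4580 or 7853 would need that context in any real assignment.

```python
# Coding-region motifs as quoted in the text (positions only).
# Hypothetical helper names; not the authors' classification procedure.
CODING_MOTIFS = {
    "M3a": {4580},
    "M4a": {6620, 7859},
    "M6b": {3486, 5585},
    "M33a": {8562, 15908},
    "M34a": {3447, 8404, 10361, 11992, 12311, 14094},
    "M37a": {7853},
    "M40a": {13542},
    "M41": {870, 6297, 12398, 12469, 13656, 15601},
}


def matching_subhaplogroups(variant_positions):
    """Return the labels whose complete coding-region motif occurs in the sample.

    variant_positions: iterable of nucleotide positions at which the sample
    differs from the reference sequence.
    """
    observed = set(variant_positions)
    return sorted(
        label for label, motif in CODING_MOTIFS.items() if motif <= observed
    )


if __name__ == "__main__":
    # Made-up sample carrying 12007 (M4'30) plus the two M4a coding positions.
    print(matching_subhaplogroups([12007, 6620, 7859]))
```

A partial motif does not count as a match in this sketch: a sample carrying 6620 without 7859 returns no label.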
We have also calculated age estimates for all M branches (Fig. ) using two approaches: the mutation rate calibration of Mishmar et al. [16], which has recently been applied in most mtDNA studies [1], and ρ (the average distance to a specified founder haplotype) with a mutation rate of one transition per 20,180 years between nps 16090–16365 [18]. Standard errors for the coalescence times were calculated following Saillard et al. [19]. We found some conflict between the ages estimated by the two methods. Since the complete sequences do not reflect the actual population size and geographical distribution, the ρ-based method [18] has been used for coalescence time estimation. The detailed list of coalescence times is given in Additional file 2.
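As a rough illustration of the ρ-based dating described above, the sketch below computes ρ as the mean number of mutations separating each sample from the founder haplotype and converts it to years at one transition per 20,180 years. The standard error uses the form commonly attributed to Saillard et al., σ² = Σ(nᵢ²·lᵢ)/n², where lᵢ is the number of mutations on branch i and nᵢ the number of samples descending from it. That formula is an assumption here, not taken from this text. The function name `rho_age` and the toy tree are hypothetical and do not correspond to any haplogroup in this study.

```python
import math

# Rate quoted in the text for transitions between nps 16090-16365.
YEARS_PER_TRANSITION = 20_180


def rho_age(branches, n_samples):
    """Estimate coalescence age and its standard error from a rooted tree.

    branches: list of (mutations_on_branch, n_descendant_samples) pairs,
              one pair per branch of the tree below the founder.
    n_samples: total number of sampled sequences in the clade.
    Returns (age_in_years, standard_error_in_years).
    """
    # rho: each mutation is counted once for every sample that inherits it.
    rho = sum(m * d for m, d in branches) / n_samples
    # Assumed Saillard-style error: sqrt(sum(l_i * n_i^2)) / n.
    sigma = math.sqrt(sum(m * d * d for m, d in branches)) / n_samples
    return rho * YEARS_PER_TRANSITION, sigma * YEARS_PER_TRANSITION


if __name__ == "__main__":
    # Toy tree with 4 samples: one branch with 1 mutation shared by 2 samples,
    # two private branches with 1 and 2 mutations, and one sample at the root.
    age, se = rho_age([(1, 2), (1, 1), (2, 1)], n_samples=4)
    print(f"age = {age:.0f} years, SE = {se:.0f} years")
```

Because ρ averages over the sampled sequences only, the estimate depends on how well the sample represents the clade. This is the concern the text raises about complete sequences not reflecting population size and geographical distribution.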
It is interesting to note that most of the new M lineages are deep rooting and most likely arose in situ in the Indian subcontinent just after the arrival of anatomically modern humans (Fig. ). As shown in the figure, all the autochthonous lineages under analysis emerge directly from the root of macrohaplogroup M. No intermediate lineage is shared by any two haplogroups, except for haplogroup M4'30 (Fig. ). The star-like and non-overlapping pattern (Fig. ) indicates that all the lineages originated independently from the root of macrohaplogroup M, thus supporting a rapid dispersal of modern humans along the Asian coast after they left Africa, followed by a long period of isolation [2