Analysis of short stretches of the mtDNA HVSI and HVSII regions has significantly aided in discriminating some of the M lineages. To understand the migration routes of diverse Indian populations, more control region sequences are being generated without much support from coding region sites, resulting in an increasing number of conflicts in the classification of the M lineages. We report a phylogenetic tree constructed from whole-genome sequences of twenty-three Indian and one Ethiopian M lineage genomes to resolve some of the anomalies arising from recurrent mutations in the control region.
Control region sequences have revealed an array of M lineages in India [12]; nevertheless, complete mtDNA sequencing suggests that most of these lineages arose as limited offshoots of the main M trunk. The newly constructed M phylogeny displays differences in the branching patterns of lineages, with the total number of substitution sites varying even within a lineage. Substantial variation in branch length within a lineage indicates further branching that could probably be delineated once additional data are generated. The M2 lineage and its sub-lineages, more specifically M2a, have been well described in the Indian population. Nevertheless, complete sequencing of M2a revealed a novel site, T9758C, which is characterized as a diagnostic marker for the M2a sub-lineage in addition to the previously reported G5252A and A8396G transitions [15]. This finding reinforces the importance of sequencing a large number of individuals belonging to a lineage in order to describe a detailed phylogeny. The study was unable to trace any specific marker for the M2b sub-lineage. In the absence of any lineage-specific marker for M2b to date, it might be suggested that this sub-lineage is not a distinct clade of the M2 lineage but an M2 with an additional HVSI motif (G16274A, T16357C). However, more M2b genomes need to be completely sequenced before reaching this conclusion. Although HVSI sequences are not very reliable for constructing phylogenies, this motif can differentiate individuals carrying only one or both mutations and in turn resolve the phylogeny to its finer sub-lineages. M2 is the oldest M lineage found in India, with an estimated age of approximately 50,000 YBP based on coding region motifs alone, as opposed to the expansion date of 60,000–75,000 years calculated from control region sequence information [15].
Although the G12007A substitution has previously been identified in haplogroups other than the M lineages [29], this study presents a novel lineage, M30, defined to include mitochondrial genomes possessing the G12007A substitution. Establishing the M30a sub-lineage, rooted at T195A and G15431A, may help in further classifying M* samples that have yet to be identified owing to the absence of any characteristic HVSI motif.
Mitochondrial genomes possessing the 16223, 16300 motif appear to form a promising new sub-lineage arising from M30. Additional complete mtDNA sequencing of similar sub-types may help in defining this branch precisely. The M30 lineage is younger than the M2 lineage, with an expansion age of approximately 33,000 YBP calculated from its coding region sequence information.
An important contribution of this study is the placement of the M18, M6 and previously defined M3 and M4 lineages in the M phylogeny. In the absence of a coding region marker for the M18 lineage [20], the G12007A substitution provides a stable root for M18, which is otherwise defined only on the basis of the HVSI motif A16318T. The recognition of the M31 lineage, with a basal A5319G transition, further reduces the number of branches arising from the trunk of the M lineage. Since M6 is already well characterized, we propose that it remain a sub-lineage of M31. However, it is essential that the previously defined M3 and M4 lineages be completely removed from the phylogeny. Furthermore, the newly defined M4a lineage [20] might in fact be a sub-lineage or an independent lineage by itself.
The M phylogenetic tree has largely aided in resolving the position of the M5 lineage. Until recently, the G16129A transition, along with the basal motif of M, was used to characterize this lineage [34]; it is currently described by the coding region mutation T12477C [20]. The phylogenetic tree constructed from complete mtDNA genome sequences supports our finding that at least two sub-lineages arise from M5 that share a transversion at site C10986A and may or may not possess the T12477C transition. The presence of the T12477C transition in only one of the two M5 mtDNA genomes sharing an identical HVSI motif, C16223T and G16129A, further substantiates the importance of coding region markers in precisely identifying mitochondrial phylogenies. Although the G16048A HVSI motif has not previously been included under M5 owing to the absence of T12477C, this study places it under the M5 lineage. However, before the G16048A, G16129A and C16223T cluster is defined, more samples carrying this HVSI motif must be completely sequenced. The age of the M5 lineage is estimated at 34,095 ± 6,425 YBP, indicating that M5 and its sister lineages M30 and M31 probably branched from the M haplogroup around the same time.
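Lineage ages of the form "mean ± error" reported above are typically obtained by converting an average number of coding region substitutions into years. The text does not state which estimator or calibration was used, so the following is only a minimal sketch under stated assumptions: it uses the rho statistic (mean number of substitutions from the lineage root to each sampled genome), a star-like approximation for its standard error, and an assumed rate of one coding region substitution per 5,140 years. The function name `rho_age` and its inputs are hypothetical.

```python
import math

# Assumption: one coding-region substitution per 5,140 years.
# The source does not state the rate; substitute the calibration actually used.
YEARS_PER_SUBSTITUTION = 5140


def rho_age(substitution_counts, years_per_substitution=YEARS_PER_SUBSTITUTION):
    """Return (age, standard_error) in years for one lineage.

    substitution_counts: for each sampled genome, the number of coding-region
    substitutions separating it from the root of the lineage.
    """
    n = len(substitution_counts)
    if n == 0:
        raise ValueError("at least one genome is required")
    # rho: average distance (in substitutions) from the root to the samples.
    rho = sum(substitution_counts) / n
    # Star-like simplification: every substitution lies on a branch leading
    # to exactly one sample, so sigma^2 = rho / n. A real tree with shared
    # internal branches needs the full branch-weighted formula instead.
    sigma = math.sqrt(rho / n)
    return rho * years_per_substitution, sigma * years_per_substitution
```

Because the standard error shrinks only with the square root of the sample size, lineages represented by two or three complete genomes will carry wide error bounds, which is consistent with the call above for additional complete sequences.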
The newly defined M25 lineage does not share mutation sites with any other lineage and arose independently from the M trunk with the G15928A and T16304C substitutions.
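The diagnostic substitutions named in this discussion can be collected into a lookup table to show how marker-based assignment works in practice. The sketch below is illustrative only and is not the procedure used to build the phylogeny: the table layout and the function `classify` are hypothetical, the input is assumed to be a genome already known to belong to macrohaplogroup M, and only markers stated in the text are included.

```python
# Defining substitutions as named in the text. Sub-lineages list the marker
# of their parent lineage as well, so that they are matched as more specific.
DIAGNOSTIC_MARKERS = {
    "M2a": {"G5252A", "A8396G", "T9758C"},
    "M5": {"C10986A"},
    "M18": {"G12007A", "A16318T"},
    "M25": {"G15928A", "T16304C"},
    "M30": {"G12007A"},
    "M30a": {"G12007A", "T195A", "G15431A"},
    "M31": {"A5319G"},
}


def classify(mutations):
    """Return the most specific lineage(s) whose markers are all present.

    mutations: iterable of substitution labels such as "G12007A".
    Genomes matching no marker set are reported as unresolved "M*".
    """
    observed = set(mutations)
    # Keep every lineage whose full marker set is found in the sample.
    matches = {
        name: markers
        for name, markers in DIAGNOSTIC_MARKERS.items()
        if markers <= observed
    }
    # Drop a lineage when a matched sub-lineage has a strictly larger set.
    specific = [
        name
        for name, markers in matches.items()
        if not any(markers < other for other in matches.values())
    ]
    return sorted(specific) or ["M*"]
```

Under this table a genome carrying C10986A is assigned to M5 whether or not it also carries T12477C, mirroring the observation above that only one of the two M5 genomes possesses that transition.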
The moderately high frequency of the C16223T, T16325C HVSI motif in Indian samples suggests a potential new lineage, which might be described more accurately once additional genomes possessing this motif are fully sequenced. In the absence of this sequence information, no attempt was made to classify this sequence type into a lineage, and it is therefore designated M*. The other M* lineages, bearing the control region motifs C16223T, T16126C and C16223T, C16251T, C16267T, could not be resolved further for similar reasons.
Although this study presents only a preliminary view of the M phylogeny, the emerging data may be highly useful in resolving the long-standing debate on the Asian origin of the M macrohaplogroup. Since the M macrohaplogroup is derived from L3, which has its roots in East Africa, the presence of M1 in Ethiopia is believed to further substantiate an African origin of M. A similar hypothesis had been drawn for the U6 lineage, which is autochthonous to North Africa, although the U haplogroup displays its maximum diversity in the Near East [35]. The authors of that study prepared an in-depth phylogeography of U6 to infer a back migration of this lineage from West Asia to North Africa. In the absence of a detailed M1 phylogeny, we have focused on M2 to estimate whether the split of M from L3 took place in Africa or Asia. Interestingly, a single M2 genome differs in its coding region from the root of M at ten sites, compared with M1, which possesses only four substitutions. In addition, sub-lineages of M2, M5, M30 and M31 show long branch lengths, highlighting the deep roots of these lineages. Considering the antiquity of M2 and other East Asian specific M lineages [33], the Ethiopian M1 lineage is a relatively newer branch. Our study of M1 and M2 mitochondrial genomes clearly establishes the Asian origin of the M macrohaplogroup, followed by a back migration to Africa. We further suggest that as more M1 mtDNA genomes are sequenced, this lineage might find its root in one of the peripheral branches of the Asian M lineage.