Distribution and diversity of Y-chromosome
The diversity statistic, h, based on the frequency of different haplogroups, ranges from 77% in the Maram to 86.2% in the Pnar among the Khasi-Khmuic Austro-Asiatic groups, whereas it is 77.5% in the Tibeto-Burman Garo. For Y-STR haplotypes, it ranges from 96.1% in Nongtrai to 99.9% in Khynriam among the Khasi-Khmuic populations and is 99.3% in the Garo. Of the 26 potential haplogroups defined by the markers used in this study, a total of 12 were found in these populations. O-M95, with a frequency ranging from 17% in War-Khasi to 42% in War-Jaintia, was the most common haplogroup in all the Austro-Asiatic populations, followed by the undifferentiated O-M122 (ranging from 11% in Nongtrai to 34% in Bhoi), whereas in the Tibeto-Burman Garo, O-M134 and undifferentiated O-M122 (23% and 17%, respectively) were the most common haplogroups. H-M69 and its subclade H-M82, which are reported at high frequency in most Indo-European populations, are present at an average frequency of only 3% among the populations studied here.
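The diversity values quoted above can be illustrated with a short sketch. It assumes that h is Nei's unbiased gene diversity, h = n/(n-1) × (1 - Σ p_i²), which the text does not state explicitly; the function name and the sample counts are hypothetical and are not data from this study.

```python
from collections import Counter


def gene_diversity(labels):
    """Unbiased gene diversity h = n/(n-1) * (1 - sum(p_i^2)).

    `labels` holds one haplogroup (or haplotype) label per sampled chromosome.
    Assumption: this is the estimator behind the reported h values.
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    counts = Counter(labels)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample of 10 chromosomes (counts invented for illustration).
sample = ["O-M95"] * 4 + ["O-M122"] * 3 + ["O-M134"] * 2 + ["H-M82"]
h = gene_diversity(sample)
print(f"h = {100 * h:.1f}%")
```

The same function applies to Y-STR or HVS-I haplotypes: because nearly every chromosome then carries a different label, h approaches 100%, which matches the pattern of higher haplotype than haplogroup diversity reported here.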
Genetic Diversity (in percentage) based on Y-Chromosome and mtDNA analysis of populations from Meghalaya
Rooted maximum-parsimony tree of Y-chromosome haplogroups defined by binary markers along with their frequency in Nine Meghalayan Populations.
Population structure based on Y-chromosome
Pairwise FST distances were computed from the haplogroup frequencies of Austro-Asiatic (Khasi from Northeast India and others) and neighboring non-Austro-Asiatic populations and subjected to multidimensional scaling (MDS); the resulting two-dimensional plot is shown in the accompanying figure. A good fit between the two-dimensional MDS plot and the source data (pairwise FST values) was obtained (stress value of 18%). Broadly speaking, most of the Austro-Asiatic populations, including all three linguistic sub-families of Austro-Asiatic, i.e., the Mundari, Khasi-Khmuic and Mon-Khmer tribes, are placed in the upper right quadrant irrespective of their geographic affiliations; Nicobarese, Ho, Santhal, She and Zhuang are somewhat removed from the others. On the other hand, most of the Tibeto-Burman populations are differentiated from the Austro-Asiatic populations and from the Indo-European populations (clustered in the lower right quadrant) on the first and second dimensions, respectively. The Khasi-Khmuic populations, which form a compact cluster near the centroid, do not cluster with the Tibeto-Burman populations of Northeast India, barring the Garo of Meghalaya, who are geographically contiguous and intermarry with them. Overall, populations of the same linguistic family tend to cluster together, with a few exceptions such as the Austro-Asiatic Lodha, which is placed among the Indo-European populations.
Plot on the first two dimensions derived from the multidimensional scaling of the pairwise FST distances of the populations based on Y-haplogroups.
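The stress value measures how closely the inter-point distances of a two-dimensional configuration reproduce the input distance matrix. The sketch below computes Kruskal's stress-1 under a metric simplification in which the input distances are used directly as disparities. The source does not say which stress formula or software was used, so this is an assumption; the function name, coordinates and distances are hypothetical.

```python
import math


def kruskal_stress1(dissim, coords):
    """Stress-1 = sqrt( sum (d_ij - delta_ij)^2 / sum d_ij^2 ).

    d_ij     : Euclidean distance between points i and j in the configuration
    delta_ij : input dissimilarity (e.g. a pairwise FST), used here as the
               disparity without any monotone regression (an assumption)
    """
    n = len(coords)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            numerator += (d - dissim[i][j]) ** 2
            denominator += d ** 2
    return math.sqrt(numerator / denominator)


# Hypothetical 3-population example: the configuration has distances 3, 4, 5,
# while the input matrix asks for 3, 4, 6, so only one pair is misfit.
coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
dissim = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 6.0],
    [4.0, 6.0, 0.0],
]
stress = kruskal_stress1(dissim, coords)
print(f"stress = {100 * stress:.1f}%")
```

A stress of zero means the plot reproduces every pairwise distance exactly; larger values mean that more of the distance structure is lost in the projection to two dimensions.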
The analysis of molecular variance (AMOVA) yielded significant but low FST values for both Y-SNPs (0.02) and STRs (0.02), suggesting a probable recent differentiation of the Khasi-Khmuic populations. For Y-SNPs, the among-group differentiation between the Khasi and Southeast Asian Austro-Asiatic populations is low (0.03) and non-significant, whereas it is relatively high and significant between the Khasi and Mundari populations (0.08). On the other hand, the FCT value between the Khasi-Khmuic and Indian Tibeto-Burman populations is very high and significant (0.30), while that between the Khasi-Khmuic and Southeast Asian Tibeto-Burman populations is relatively low and non-significant (0.03). Although there is virtually no difference in haplogroup composition between the Tibeto-Burman Garo from Meghalaya and the Southeast Asian Tibeto-Burman populations, as suggested by the FCT (−0.01627), the FCT between the Garo and the other Indian Tibeto-Burman populations is surprisingly high (0.17975).
Analysis of Molecular Variance using Y-SNPs/STRs between groups of populations categorized on the basis of geography and languages
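To give a feel for what a fixation index of 0.02 versus 0.30 means, the following sketch computes a simple frequency-based index, (H_T - H_S) / H_T, from haplogroup frequencies. This is a simplified Nei-style estimator with equal population weights, not the hierarchical AMOVA used in the study, and it will not reproduce the reported FST or FCT values; the function name and frequencies are hypothetical.

```python
def fst_from_frequencies(pops):
    """Return (H_T - H_S) / H_T for a list of populations.

    pops : list of dicts mapping haplogroup -> frequency (each sums to 1).
    H_S  : mean within-population diversity, 1 - sum(p^2)
    H_T  : diversity of the pooled (equally weighted) mean frequencies
    Assumption: equal weights and no sample-size correction.
    """
    k = len(pops)
    groups = set().union(*pops)
    h_s = sum(1.0 - sum(p ** 2 for p in pop.values()) for pop in pops) / k
    mean_freq = {g: sum(pop.get(g, 0.0) for pop in pops) / k for g in groups}
    h_t = 1.0 - sum(p ** 2 for p in mean_freq.values())
    if h_t == 0.0:
        return 0.0  # a single shared haplogroup: no differentiation
    return (h_t - h_s) / h_t


# Hypothetical frequencies, invented for illustration only.
similar = [{"O-M95": 0.5, "O-M122": 0.5}, {"O-M95": 0.5, "O-M122": 0.5}]
divergent = [{"O-M95": 0.8, "O-M122": 0.2}, {"O-M95": 0.2, "O-M122": 0.8}]
print(round(fst_from_frequencies(similar), 3))
print(round(fst_from_frequencies(divergent), 3))
```

Identical frequency profiles give an index of zero, while strongly contrasting profiles push it toward one. AMOVA additionally partitions the variance among groups (FCT), among populations within groups, and within populations.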
Profile of new mtDNA haplogroups
Based on hypervariable segment I (HVS-I) and the known coding-region SNPs, most individuals could be assigned to specific haplogroups/lineages. However, many individuals could still not be assigned to any existing lineage. Based on their HVS-I motifs, we grouped these samples into 6 broad clades and resequenced the complete mtDNA of 1-2 samples from each clade to assign them to known or new haplogroups. We also resequenced the complete mtDNA of the samples falling in haplogroup B, as none of the defining mutations for the subhaplogroups of B were found. The analysis of complete mtDNA suggests the presence of four new haplogroups, which we have designated M48, M49, M50 and B7. None of the coding-region motifs of M48, except for 6336, which defines M30a, has been reported previously; we therefore assign all these samples to a new lineage. While the average frequency of M48 is 11% among the Austro-Asiatic Khasi groups, ranging from zero in War-Jaintia to as high as 26% in Lyngngam, it is present at a frequency of 4% among the Garo. Although haplogroups M49 and M50 are found at an average frequency of about 3% each in the Khasi populations, they were not detected in the Garo or in some of the Khasi subgroups. A subset of the mutations in our B-haplogroup samples, at 150-9452-12950-13928C, has been reported in one Han Chinese sample (SD10313), which also falls in undifferentiated haplogroup B. We propose to name this lineage haplogroup B7, including the Han Chinese sample.
Phylogenetic tree of new haplogroups based on full mtDNA along with the TMRCA and associated 95% Confidence Interval.
In addition to these four new haplogroups, we propose two new sub-haplogroups: M33b within M33 and M31c within M31. The samples falling in M31c have all the defining mutations of M31 but do not share any of the coding-region motifs with either M31a, which has been reported in the Andamanese of the Andaman and Nicobar Islands and in other tribal populations of India, or M31b, found in the Rajbanshis (SW1) of Northeast India. Therefore, we propose a new sub-haplogroup, M31c. While this sub-haplogroup is absent in the Garo, it is found at an average frequency of ~5% in the Austro-Asiatic Khasi populations, with a maximum frequency of ~17% among the Bhoi. The M33b samples carry the mutations that define M33 and also share mutations at positions 1719-3221-16293-16324 with the Rajbanshi sample (SW23), which is now re-designated as M33b. With the exception of the Pnar (~22%), the frequency of M33b is low, and it is otherwise found only in the Lyngngam, Khynriam and Garo (~2, ~3 and ~3%, respectively). On the other hand, M33a, which was found at extremely high frequency in the Garo (~55%) and at an average frequency of ~5% in the Khasi-Khmuic populations, has also been reported in the Brahmins of Uttar Pradesh, India, and in two populations of South India. It is interesting to note that all the M33a samples of this study, except one Khynriam sample, form a single sublineage defined by the 16316 HVS-I motif, which distinguishes it from the M33a lineages found in other parts of India.
Distribution and diversity of mtDNA haplotypes/lineages
In the 444 samples representing the 8 Khasi-Khmuic Austro-Asiatic tribes and the Tibeto-Burman Garo, a total of 117 distinct HVS-I haplotypes were found. Among these, 67 haplotypes are unique, each represented by a single individual. Of the remainder, 37 are shared by at least two different tribes, of which only 10 are shared between the Garo and the Khasi subtribes. Based on the phylogenetic analysis of mtDNA control- and coding-region SNPs, 37 distinct haplogroups and subhaplogroups were observed among the studied populations. Only ~6% and 0.5% of the samples remained unclassified within M and R, respectively. Among the Austro-Asiatic Khasi, ~80% of the variation is accounted for by a set of 10 haplogroups (M*, M4a, M9a, M31c, M33a, M33b, M48, MD, MD4 and U2), whereas in the Garo a subset of only 3 haplogroups (M*, M33a and U2) accounts for ~80% of the total sample. However, these 3 haplogroups account for only ~18% of the sampled individuals from the neighbouring Austro-Asiatic Khasi populations.
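The haplotype counts above (distinct, unique, and shared between tribes) follow from a simple tally over (population, haplotype) pairs. The sketch below shows one way such a tally could be made; the function name, the haplotype labels H1 to H4 and the sample assignments are hypothetical and do not come from the study's data.

```python
from collections import defaultdict


def summarize_haplotypes(samples):
    """Tally haplotypes from (population, haplotype) pairs.

    Returns (distinct, singletons, shared), where
      distinct   = number of different haplotypes observed
      singletons = haplotypes carried by exactly one individual
      shared     = haplotypes observed in at least two populations
    """
    carriers = defaultdict(int)      # haplotype -> number of individuals
    populations = defaultdict(set)   # haplotype -> populations carrying it
    for population, haplotype in samples:
        carriers[haplotype] += 1
        populations[haplotype].add(population)
    distinct = len(carriers)
    singletons = sum(1 for c in carriers.values() if c == 1)
    shared = sum(1 for p in populations.values() if len(p) >= 2)
    return distinct, singletons, shared


# Hypothetical samples: population names are from the text, haplotypes invented.
samples = [
    ("Garo", "H1"), ("Pnar", "H1"),    # shared between two tribes
    ("Bhoi", "H2"), ("Bhoi", "H2"),    # repeated within one tribe only
    ("Maram", "H3"),                   # singleton
    ("Khynriam", "H4"),                # singleton
]
print(summarize_haplotypes(samples))
```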
Tree Drawn from a Median-Joining Network of mtDNA Haplogroups Observed in Nine Meghalayan Populations.
The mtDNA haplogroup diversity among the Austro-Asiatic groups is low and ranges from 83.1% in War-Jaintia to 93.6% in Bhoi, whereas in the Garo the diversity is extremely low (66.9%). Similarly, the haplotype diversity for the Austro-Asiatic groups ranges from 86.8% in War-Jaintia to 96.1% in Khynriam, whereas in the Garo it is 68.1%.
Population relationships based on mtDNA haplogroups
The two-dimensional plot from the multidimensional scaling of the genetic distance matrix of the 40 populations, including the 8 Khasi subtribes and the Garo of the present study and other relevant populations from South and Southeast Asia, is shown in the accompanying figure. The plot depicts the Tibeto-Burman Garo, the Austro-Asiatic Nicobarese (a Mon-Khmer population) and the Sakai as extreme outliers. As expected, the Mundari Austro-Asiatic populations, with predominantly South Asian mtDNA haplogroups, are placed as outliers, aligning with the two Indian Indo-European populations at the extreme right corner of the plot. Although the Khasi-Khmuic Austro-Asiatic populations, except for Nongtrai and Lyngngam, form a constellation just left of the centroid, this constellation also includes other populations such as the Han, Lisu and Bai. The Southeast Asian Tibeto-Burman populations are scattered along the first axis. Similarly, the Indian Tibeto-Burmans do not form their own cluster. Overall, the three sub-families of Austro-Asiatic populations do not form a homogeneous cluster, unlike in the case of the Y-chromosome.
Plot on the first two dimensions derived from the multidimensional scaling of the pairwise FST distances of the populations based on mtDNA haplogroups.
Although the AMOVA yields a low FST value (0.05), and hence low differentiation, among the Khasi-Khmuic populations, differentiation between them and the Garo is quite high (0.12). The differentiation of the Khasi-Khmuic tribes from the Southeast Asian Austro-Asiatic populations is moderate (0.05) but much higher from the Mundari populations (0.12). Surprisingly, the Tibeto-Burman Garo of Meghalaya show a high degree of differentiation from the other Tibeto-Burman populations of India (0.17) as well as of Southeast Asia (0.13).
Analysis of Molecular Variance using mtDNA haplogroups between groups of populations categorized on the basis of geography and languages
Time to Most Recent Common Ancestor (TMRCA)
The TMRCA was calculated from the mtDNA coding region (nucleotide positions 577-16023) with an average sequence evolution rate of (1.26 ± 0.08) × 10⁻⁸ base substitutions per nucleotide per year. The TMRCA of the haplogroups based on the full mtDNA sequence suggests a younger age for the Khasi/Northeast Indian haplogroup M (41,000 YBP) than that obtained in other studies for the Indian M haplogroup (54,000 YBP). This is because of the very low age contribution from the M48 haplogroup. Reanalyzing the data after removing M48 increases the age to ~50,000 YBP, which is close to what has been obtained in the other studies. The TMRCAs of haplogroups M31 and M33 are ~40,000 YBP and ~50,000 YBP, respectively, suggesting that M33, like M31, is an archaic lineage. The age of B7 suggests that this haplogroup originated ~28,000 YBP in East Asia, where all the other sub-haplogroups of B have been hypothesized to have originated.
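The rate and coding-region span quoted above imply a fixed number of years per coding-region substitution, which is how a mutational distance is converted into an age. The sketch below performs that conversion, assuming the age is estimated as rho (the average number of coding-region substitutions from the ancestral node to the sampled sequences) multiplied by the years per substitution. The text does not state the estimator, so rho and the example value of 8 are assumptions for illustration, and the ± 0.08 uncertainty on the rate is ignored.

```python
RATE = 1.26e-8           # substitutions per nucleotide per year (from the text)
START, END = 577, 16023  # coding-region span used in the text


def years_per_substitution(rate=RATE, start=START, end=END):
    """Years needed, on average, for one substitution in the whole region."""
    length = end - start + 1  # 15,447 nucleotides, both ends inclusive
    return 1.0 / (rate * length)


def tmrca_from_rho(rho):
    """Convert a mean mutational distance (rho) into an age in years.

    Assumption: a rho-type estimator; rho is not reported in the text.
    """
    return rho * years_per_substitution()


per_sub = years_per_substitution()
print(f"one substitution per ~{per_sub:,.0f} years")
# Hypothetical rho of 8 substitutions, chosen only for illustration.
print(f"rho = 8 -> ~{tmrca_from_rho(8):,.0f} years")
```

Under these assumptions, one coding-region substitution corresponds to roughly 5,100 years, so an age near 41,000 YBP reflects an average of about eight substitutions from the root, and a clade with very few derived mutations, as described for M48, pulls the average down.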