The West Eurasian component
This component comprises the N haplogroups I and W, and the R haplogroups R0, H, T2 and U (xU2a,b,c) and is almost absent in the Tharus (only one H and one T2 mtDNAs from Chitwan). In contrast, it reaches a high frequency (25.0%) in New Delhi, where most of the haplogroups of this component are found, and is also common in Indians from Terai (12.5%) and Andhra Pradesh (10.3%). However, in spite of the similar frequencies, the two latter populations are remarkably different in their composition: Hgs I, U1 and T2 characterize the Terai Hindus, whereas Hgs U2e, U5a1 and U9a the Andhra Pradesh tribals.
Among the West Eurasian U sub-clades, particularly interesting are U7 and U9. In the New Delhi sample, U7 shows a frequency (10.4%) that is quite similar to that of Iran (9.4%) and close to its peak (12.3%) in the West Indian state of Gujarat [12]. U9 is a rare haplogroup previously observed in Pakistan [42], Yemen and Ethiopia [23]. Interestingly, the U9 mtDNA that we found in Andhra Pradesh, together with an Ethiopian mtDNA, defines the new U9a sub-group (Figure ), thus confirming the ancient genetic links between East Africa, Southwest Asia and India.
Although the West Eurasian component is probably primarily related to migrations during the Holocene period, the exact source and time of such migrations are difficult to establish [12].
The new haplogroups M51 were detected in the eastern part of the Indian subcontinent, while M53 seems to belong to the West Indian area. As for the new sub-clades of previously described haplogroups, M4c, linking one Tharu of Chitwan with one Indian from Andhra Pradesh [30], could be typical of tribal groups, and M43a is observed at the Indian border with Nepal. Sub-clade M5a1 characterizes peoples from North India (New Delhi and Uttar Pradesh [30]), whereas M5a2 is present in Southern India [28]. Both haplogroups M33 and M35 show many inner branches, but while M35 is spread within the Indian subcontinent, relating the Tharu groups and the Hindus from New Delhi with populations of South India, M33 is also found elsewhere. Indeed, its sub-clade M33a includes one Egyptian mtDNA, thus connecting the Indian subcontinent with North Africa, whereas M33b, described in Western Bengalese [30] and in the Indian region of Meghalaya [31], has been observed in Eastern Tharus. Therefore, it may represent a clade of the Northeast Indian subcontinent.
Of particular interest is the detection of haplogroups M21 and M31 (two subjects each) among the central Tharus. The Tharu M21 sequence (Figure ) shares nine mutations with one of the three M21 lineages found in all Orang Asli groups of Malaysia [24] and in other groups from Southeast Asia [44], belonging to the sub-group M21b. The Tharu M31 sequence, together with one Meghalaya mtDNA [31], clusters with one West Bengal Rajbhansi [21] and defines a sub-group of M31b. This subclade, together with M31a2 of the tribal Lodha, Lambadi and Chenchu populations, represents the Indian counterparts of the M31a1 Andaman lineages [27], further supporting a common ancestry of the peoples of the Indian subcontinent and of the Bay of Bengal islands.
As for the R haplogroups, R7 and R30 are of particular interest. Very informative for the structure and for the age evaluation of haplogroup R7 is the Andhra Pradesh sequence #56 (Figure ), which defines an extremely deep branch of R7 in India. This branch shares with the root of the phylogeny of Chaubey et al. [54] only the mutations 13105 and 16319; in addition, it does not display the 16260 and 16261 mutations characterizing the R7a and R7b branches observed in different R samples from Indian groups [11] and, interestingly, in one R7 Tutsi from Rwanda (unpublished data). Two Tharu mtDNAs, one from Chitwan and one from Eastern Terai, belong to the R30 haplogroup. The first is closely related to two Indian sequences, one from Andhra Pradesh and the other from Uttar Pradesh, and helps to define a sub-clade of R30a [54]. The second joins a Punjab sequence [54] with a Japanese deep lineage [22], indicating an ancient link between India and Japan. A more recent connection with Japan is, in turn, revealed by the F1d haplogroup, which shows a tight linkage between an Eastern Tharu sequence and two Japanese mtDNAs. Another noteworthy connection with outside areas is evidenced by the U9 haplogroup that, being shared by an Ethiopian and an Andhra Pradesh mtDNA, reveals a link between Ethiopia and India that is not recent.
Even if the PC analysis of mtDNA haplogroup frequencies observed in the present study, compared with those of relevant populations, accounts for only about a quarter of the variance, four main clusters are defined: West Eurasian [12], Indian area [12], East Asian [58], and Southeast Asian [44] (Figure ). The first two are well distinguished from the others by the first PC, which points to a separation between the West and the East Eurasian gene pools; the second PC then distinguishes West Eurasians from Indians and East Asians from Southeast Asians. Tharu groups are located in the middle of the area between the clusters but, while the central groups are closer to East Asians, Eastern Tharus turned out to be closer to the Indians. Other samples from the border between India and Nepal, such as those from Uttar Pradesh, remain inside the Indian cluster (including the group Th-Up, composed of marginal "Hinduized" Tharus [12]). As for Indians, they all group together, in agreement with a deep (Late Pleistocene) common maternal ancestry of caste and tribal populations [11], perhaps due to some accepted practices (such as the anuloma) that allow a woman of a lower social level to enter a higher level by marriage [55].
Figure 7 Principal component analysis of mtDNA haplogroup frequencies. Comparison samples from Western Eurasia (Iran): Irn-W, Irn-E, Irn-C, Irn-SW, Irn-SE; Indian subcontinent: AP, Andhra Pradesh; WB-1, Castes from Bengal; WB-2, Kurmis from West Bengal.
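To make concrete what it means for a principal component to account for a given share of the variance, the following is a minimal sketch rather than the procedure used in this study. It assumes a table with one row per population and one column per haplogroup frequency, and it extracts the first PC by power iteration on the covariance matrix. The function name `first_pc_share` and the frequency values are hypothetical and serve only as an illustration.

```python
import math


def first_pc_share(rows, iters=200):
    """Fraction of total variance captured by the first principal component."""
    n, m = len(rows), len(rows[0])
    # Centre each haplogroup column on its mean across populations
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix between haplogroup columns
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    total = sum(cov[j][j] for j in range(m))  # total variance (trace)
    # Power iteration. The start vector is deliberately not constant: rows of
    # frequencies sum to 1, so a constant vector lies in the null space.
    v = [float(j + 1) for j in range(m)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break
        v = [t / norm for t in w]
    # Rayleigh quotient gives the leading eigenvalue (variance along PC1)
    cv = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
    eigenvalue = sum(v[a] * cv[a] for a in range(m)) / sum(t * t for t in v)
    return eigenvalue / total


# Hypothetical haplogroup frequencies (rows: populations, columns: haplogroups)
freqs = [
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.10, 0.30, 0.60],
    [0.15, 0.25, 0.60],
]
print(round(first_pc_share(freqs), 3))
```

Power iteration is used here only to keep the sketch free of external dependencies; it can fail to converge to the leading component if the start vector happens to be orthogonal to it, so a standard eigendecomposition routine would be the usual choice in practice.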
The phylogeny and frequencies of the 28 Y-chromosome haplogroups observed in the present study are shown in Figure .
Two new variants are reported. The first, M481, defines the new haplogroup F5 and consists of a C→T transition at np 163 within the STS containing the P36 mutation [62]. The second, Tdel, was first noticed in haplogroup O2-P31 while typing the P31 marker and was confirmed by sequencing. It is due to a T deletion in the 6T stretch starting at np 127, adjacent to the P31 T to C transition [63]. The T deletion, not found in the other examined Hg O derivatives, is always present in our O2 samples (all tribals; four of the Eastern Tharus and one from Andhra Pradesh). Taking into account that this haplogroup is often recognized through markers other than P31 and that in other studies where P31 was examined [64] a technique not detecting Tdel was employed, additional DHPLC/sequencing analyses of P31 chromosomes are necessary to evaluate the extent to which the two mutations occur together. It is worth noting that these samples were also all positive for the PK4 marker recently observed in four Pakistani Pathans [36]. Another variation, consisting of an A to G transition at np 147, was observed in two H-M82 samples while sequencing the M89 marker. This mutation, which was not found either in H-M69* or in H2-APT chromosomes, characterizes the H1 subgroup but, because it was impossible to type all the M82 samples, as well as any M370* and M52* Y chromosomes, we cannot at present define the precise phylogenetic position of this novel transition inside the sub-haplogroup.
On the basis of known or supposed haplogroup origin [11], three main components (East Asian, West Eurasian and Indian) can also be identified for the Y chromosome. The incidence of the various components in each population is depicted in the histograms of Figure .
The East Asian component, made up of haplogroups C(xC5), D, N, O3, Q, and K*, and mainly represented by Hg O3, is, on the whole, much more frequent among Tharus (39.8%) than among Indians (7.7%). The high Tharu frequency, mostly accounted for by the subgroup O3-M117 (83.8%), shows a wide range in the three groups, with significant differences between Th-CI vs both Th-CII (P < 0.02) and Th-E (P = 0.001). Among the less represented East Asian markers of interest is Hg D, which is very frequent in Tibet, absent in other Nepalese populations [37] but present in six Central Tharus: as D1-M15 in two Th-CI subjects and as D*-M174 in four Th-CII subjects. The latter, by showing the DYS392 -7 repeat allele that characterizes the D3-P47 chromosomes [37], could belong to the recently identified Hg D3* [73]. In addition, two other haplogroups were encountered: K-M9* in a single Eastern Tharu and Q1-P36 in two Tharus-CII. Hg Q, which is present in Tibetans, was seen in only one sample from Kathmandu [37]. In Indians, the very scarce East Asian component was represented by three Hg O3 (each belonging to a different sub-haplogroup and to a different Indian sample), one C3-M217 in Terai (previously observed only in a few Kathmandu and Tibetan samples [37]), two N1-LLY22g* (one in Terai and one in New Delhi), and three Q1-P36 in New Delhi. Only three East Asian haplogroups, Q1-P36, O3-M134* and O3-M117, are shared between Tharus and Indians.
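The text does not state which test produced the P values for the between-group frequency comparisons. As a hedged illustration only, one common choice for small samples is Fisher's exact test on a 2x2 table of carriers and non-carriers. The sketch below implements the two-sided test with the standard library; the function name `fisher_exact_two_sided` and the counts in the usage example are hypothetical and are not taken from this study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    a, b: carriers and non-carriers of a haplogroup in group 1
    c, d: carriers and non-carriers of the same haplogroup in group 2
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x carriers in group 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: 12 carriers out of 30 in one group, 3 out of 30 in another
p_value = fisher_exact_two_sided(12, 18, 3, 27)
print(f"P = {p_value:.4f}")
```

With samples as small as some of the sub-groups discussed here, an exact test of this kind avoids the large-sample approximation underlying a chi-square test.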
The West Eurasian component, represented by haplogroups E, G, and J, shows a higher incidence among Tharus (15.9%) than among Indians (7.7%). With the exception of three E3-M35* Eastern Tharus and two G-M201 (one in New Delhi and the other in Andhra Pradesh), the main part of this component is accounted for by haplogroup J (Tharus 14.0%, Indians 5.8%), present only as J2, namely J2-M410* and J2-M241*. Whereas the latter haplogroup is shared by all Indian and Tharu samples, J2-M410* was found in all Tharu samples but in only one Hindu of New Delhi, where one sample of its derivative J2-M68 was also present. If one considers the total frequency of this component in each sub-group, among Indians the highest value is observed in the Hindus of New Delhi (10%), and, among Tharus, in the group of Eastern Terai (30%). It is noteworthy that the frequency in Eastern Tharus is about three times higher than that in the other two Tharu samples (P ~ 0.03 vs Th-CI and 0.02 vs Th-CII). This component may reflect several events of gene flow from the Early Holocene to the present, passing through Neolithic farmers.
The Indian subcontinent component includes lineages of haplogroups C, F, H, L, O and R; among Indians it ranges from 80% in the New Delhi sample to 85% in Terai and 90% in Andhra Pradesh. Among Tharus, with the exception of an incidence of ~32% in the Th-CI group, it reaches values around 50% in the other two groups. Hgs H and R are the most frequent haplogroups of this component. Hg H (Tharus: 25.7%; Indians: 18.3%) is represented by five sub-groups: H-M69*, H1-M52*, H1-M370*, H1-M82* and H2-APT. Whereas H-M69* was detected at similar frequencies (mean 8.8%) in all the Tharu sub-groups and in two Indians of Andhra Pradesh (6.9%), H1-M82* was seen in all Tharu and Indian samples. By contrast, H1-M52* (2.0%) and H1-M370* (6.1%) were seen only in the New Delhi Hindus, and H2-APT (11.7%) only in the Tharus-CII.
Hg R, besides a single R* from New Delhi, was detected in all groups as R1a1-M17* and R2-M124 with important differences between Tharus (13.5%) and Indians (52.9%), mainly due to R1-M17* (8.8% vs 41.3%). Within the two populations, significant differences were also observed: the Tharu-CII sample differs from the Eastern one (3.9% vs 16.2%, P ~ 0.05); the Hindus from Terai (69.2%) appear very distant from both the New Delhi Hindus (34.7%, P < 0.01) and the Andhra Pradesh tribals (27.6%, P ~ 0.005). However, this important difference could be, at least partially, influenced by the genetic background of the sample that in recent times moved from India to Nepal after malaria eradication.
The Indian component can be resolved into the most likely endogenous (local) haplogroups (C5, F*, H, and the two new F5-M481 and O2a1a-Tdel) and the inter-regional ones (L, R1 and R2). In the first group we have included the lineage HgO2-P31-Tdel found in the tribals of both Eastern Tharu and AP Indian samples. The T deletion further characterizes the HgO2-M95 clade, which is considered a genetic footprint of the earliest Palaeolithic Austro-Asiatic settlers in the Indian subcontinent [14], and also an autochthonous Indian Austro-Asiatic population marker [72]. The remaining endogenous haplogroups include haplogroup C5-M356, shared between Indians and Tharus (two in the Terai Hindus and one in the Tharus-CII), and haplogroup F-M89* and its new derivative F5-M481, both considered as tribal markers and observed in Andhra Pradesh (10.3%). As for the inter-regional haplogroups L-M20, R1-M17 and R2-M124, within India they display a considerable frequency and a high haplotype-associated microsatellite variance. However, whereas this observation is considered indicative of a local origin for the subgroup L1-M76 of L-M20 and for R2-M124, which show lower frequencies outside this region, the situation is more complex for R1-M17, as is the position of L-M20*. Actually, the high frequency of the R1-M17 haplogroup found in the Central Eurasian territory, together with its gradient of diffusion that was associated with the Indo-European expansion [74], would leave some uncertainty about its geographic origin. However, the high microsatellite variation supports an ancient presence of the M17 marker in the Indian subcontinent, dated in our samples to over 14 ky [see Additional file 3], as suggested by Kivisild et al. [11] and supported by Sengupta et al. [15] and Thanseem et al. [71], who consider the Indo-European M17 only a contribution to a local, pre-existing Early Holocene Indian M17. Thus, it is reasonable to assume that even this inter-regional haplogroup has ancient relationships with the Indian area. Interestingly, the M17 Y-chromosomes of the Indian subcontinent differ from those of Central Eurasia in that they are virtually all 49a,f/TaqI Ht 11 [77].
As to the rare haplogroup L-M20*, it was present in two individuals of the New Delhi sample. Only one of these Y-chromosomes could be analyzed for the microsatellites and compared in a network with seven other available L-M20* samples of Turkish and Italian origin (unpublished data), showing that it was very distant from the others.
Age estimates of the main haplogroups with some comparative data [15] are reported in Additional file 3. Although age estimates deserve caution, particularly when samples are small and standard errors large, a good general agreement between the two datasets is observed. As for haplogroup H1-M82*, not reported by Sengupta et al. [15], its age is very similar in all groups, with variance (0.093–0.110) lower than that (0.19) previously observed in some Indian groups [11]. Special attention is deserved by haplogroups J2-M410* and R1-M17*, showing variances very different in the various Tharu and Indian sub-groups and the highest values in the Eastern Tharus and tribals of Andhra Pradesh. Interesting is also Hg R2-M124 for which the Tharu total variance rises to 0.271, a value obtained by adding just two samples from the other Tharu groups to six homogeneous Th-CII samples (variance 0.033), thus stressing again the Tharu heterogeneity.
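The effect described for R2-M124, where a few divergent chromosomes sharply raise the variance of an otherwise homogeneous set, can be illustrated with a small sketch. It assumes, as one common convention and not necessarily the estimator used in this study, that microsatellite variance is the population variance of repeat counts at each locus, averaged over loci. The function name `mean_str_variance` and all repeat counts are hypothetical.

```python
from statistics import pvariance


def mean_str_variance(haplotypes):
    """Average, over loci, of the variance in repeat number across haplotypes.

    haplotypes: list of tuples, one tuple of repeat counts per chromosome,
    with loci in the same order in every tuple.
    """
    loci = list(zip(*haplotypes))  # transpose: one tuple of counts per locus
    return sum(pvariance(locus) for locus in loci) / len(loci)


# Hypothetical repeat counts at three loci
homogeneous = [(14, 12, 23)] * 6                    # six identical haplotypes
mixed = homogeneous + [(16, 12, 23), (14, 14, 23)]  # plus two divergent ones

print(mean_str_variance(homogeneous))
print(round(mean_str_variance(mixed), 3))
```

Because the variance is driven by squared deviations, two chromosomes that differ by a couple of repeats at a single locus each are enough to move the average well away from zero, which is why pooled estimates from heterogeneous groups need to be read with care.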
The PC analyses of the haplogroup frequencies, which were performed with the Nepalese and Tibetan data of Gayden et al. [37] and the Indian caste and tribal groups of Sengupta et al. [15], are illustrated in Figure . In both plots, a cluster of tribals, including Tharus and the Indians from Andhra Pradesh, is evident and separated from the caste groups. As for the Nepalese populations, all are very distant from Tibetans. Tharus, with the Eastern group always in a peripheral position, cluster together in the same quadrant of the plot, distinct from those occupied by the other three Nepalese groups.
Figure 8 Principal component analysis of Y-chromosome haplogroup frequencies. (a) Comparison with Nepalese and Tibetan groups; (b) Comparison with some Indian caste and tribal groups, where our data have been normalized to the Sengupta level of resolution.