Using a geographically well-defined sample of caste populations from Tamil Nadu, India, this study arrives at many conclusions similar to those from our previous studies of caste populations from Andhra Pradesh, India [13, 20, 30]. In both cases, there is extensive sharing of Y and mtDNA haplogroups among castes, and the overall level of inter-caste differentiation is low. This finding is consistent with many other studies of genetic structure and gene flow patterns among caste populations [6, 32, 33, 40].
Paternally inherited Y-chromosome SNPs show that caste populations have greater affinity to a sample of Europeans than to a sample of eastern Asians. Maternally inherited mtDNA polymorphisms show the opposite pattern: castes, regardless of rank, have higher affinity to eastern Asians than to Europeans. These patterns were present in samples from both geographical locations, suggesting that South Indian paternal lineages have been more substantially influenced by western or central Eurasians than South Indian maternal lineages have. Compared with our previous study of Andhra castes [13], direct haplogroup sharing between Tamil castes and our sample of Europeans is more limited, suggesting a potentially greater time depth for the development of these patterns. More extensive sampling will be required to resolve this difference.
Y-chromosome data show that Tamil castes of different rank have differential affinities to our sample of Europeans, with upper castes demonstrating greater affinity than lower castes. Genetic distances are weakly correlated with caste rank distances, and correlations from Y-chromosome data are stronger than those based on mtDNA or autosomal data. This pattern argues for a differential contribution of male lineages to castes of different rank and limited male mobility between castes in South India.
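A correlation between a genetic distance matrix and a caste rank distance matrix is commonly assessed with a Mantel-type permutation test, because the pairwise entries of a distance matrix are not independent. The sketch below illustrates that general idea only; this section does not state which procedure was used, and the populations, ranks, and distance values are hypothetical placeholders rather than data from the study.

```python
import random
from itertools import combinations

# Hypothetical ranks for five populations (1 = upper, 3 = lower); not study data.
ranks = [1, 1, 2, 3, 3]

# Hypothetical symmetric genetic distance matrix; illustrative values only.
gen = [
    [0.000, 0.010, 0.020, 0.035, 0.040],
    [0.010, 0.000, 0.018, 0.030, 0.038],
    [0.020, 0.018, 0.000, 0.022, 0.025],
    [0.035, 0.030, 0.022, 0.000, 0.012],
    [0.040, 0.038, 0.025, 0.012, 0.000],
]

# Rank distance between two populations = absolute difference in rank.
rank_dist = [[abs(a - b) for b in ranks] for a in ranks]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def upper_triangle(m, order):
    """Off-diagonal entries of m after relabeling populations by `order`."""
    idx = range(len(order))
    return [m[order[i]][order[j]] for i, j in combinations(idx, 2)]


def mantel(a, b, n_perm=999, seed=0):
    """One-sided Mantel test: permute the labels of matrix a, keep b fixed."""
    rng = random.Random(seed)
    ident = list(range(len(a)))
    y = upper_triangle(b, ident)
    r_obs = pearson(upper_triangle(a, ident), y)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        perm = ident[:]
        rng.shuffle(perm)
        if pearson(upper_triangle(a, perm), y) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)


r_obs, p_value = mantel(gen, rank_dist)
print(f"Mantel r = {r_obs:.3f}, one-sided p = {p_value:.3f}")
```

Rows and columns are permuted together, which preserves the internal structure of each matrix while breaking any association between them. Running the same test separately on Y-chromosome, mtDNA, and autosomal distance matrices would allow the kind of comparison of correlation strength described above.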
An interesting difference between the data sets from Andhra Pradesh and Tamil Nadu is also observed. For the former sample, inter-caste distance based on mtDNA polymorphisms (HVS1 sequence) demonstrated a strong relationship to caste rank, while distances based on Y-chromosome data did not. This was interpreted as evidence of historical upward female mobility in the caste system [30]. (We note, however, that the primary reason for a lack of correlation between Y-chromosome distances and caste rank was close affinity between the upper-caste Brahmin and lower-caste Relli samples [20].) In contrast, the Tamil Nadu samples show a higher correlation between Y-chromosome distances and caste rank than between mtDNA distances and caste rank. This difference likely reflects differential apportioning of individuals as the caste system originated or subsequent differences in male-female mobility patterns.
Recently, several studies have underscored the complexity of Y-chromosome variation in Indian populations. Sahoo et al. (2006) presented evidence that the R1a haplogroup has attained high frequencies and high diversity in northern India, central Asia, and eastern Europe. They also reported high frequencies of Y-chromosome haplogroup H in caste and tribal populations and provided compelling evidence for an origin of haplogroup H in South India. Upon further analysis, their data show that, as in our study, the frequency of haplogroup R lineages is higher in Brahmins (upper rank) than in lower-rank castes (0.53 vs. 0.41), while the frequency of H lineages is lower in Brahmins than in lower castes (0.15 vs. 0.34).
In a study of broadly distributed Indo-European and Dravidian castes, Sengupta et al. (2006) suggested that the majority of Indian Y-chromosome haplogroups are at least 10,000 to 15,000 years old as gauged by Y-chromosome microsatellite diversity, thus predating the origin of the caste system. The antiquity and complex geographic distribution of the R1a1 and R2 haplogroups led these authors to conclude that the majority of the subcontinent's Y-chromosomes arrived in or before the early Holocene (10,000 years ago) rather than in a later Indo-European expansion. Likewise, and concordant with other studies of tribal Indian populations [5], we observe Y-chromosome R1a1 lineages in the South Indian tribal Irula (unpublished data), a population substantially differentiated from South Indian castes [18].
An examination of the R and H haplogroup frequencies of Indo-European-speaking castes reported by Sengupta et al. (2006) shows that, as in our study, R haplogroup frequencies in upper castes exceed those of middle and lower castes (0.62, 0.35, and 0.38, respectively), while H haplogroup frequencies are lowest in upper castes (0.14), intermediate in middle castes (0.38), and highest in lower castes (0.44). For Dravidian castes, R (0.62) was more frequent than H (0.14) in upper castes, while R and H had similar (within 6%) frequencies in middle and lower castes.
A recent analysis of caste and tribal populations from eastern India (Orissa) demonstrated Indo-European influences on paternal caste lineages [41]. Brahmins showed high Y-chromosome affinity to eastern Europeans (M17, haplogroup R1a1). In contrast, maternal mtDNA polymorphisms revealed primarily Indian-specific lineages. Taken together, our studies and at least three other studies of Y-chromosome lineages in Indian castes demonstrate that upper castes show genetic affinity to populations residing north and northwest of the Indian subcontinent. This affinity appears, in part, to result from varying frequencies of Y-chromosome R lineages and older South Asian lineages such as F* and H.
Indian mtDNA lineages demonstrate high diversity, suggesting that a majority of Indian maternal lineages are also relatively old and likely predate historically documented expansion events [38, 42]. Older, deep-rooting mitochondrial lineages belonging to the N macrolineage are prevalent in western Eurasia and are distributed in a west-to-east cline, with high frequencies in Anatolia and Iran and moderate frequencies in Pakistan and northwestern India [43]. In this study we observe higher frequencies of basal U lineages in upper castes than in lower castes. Higher-resolution haplogroup results, however, show little evidence of between-caste differences. This may indicate differences in founding populations. More likely, though, it suggests ancient migration and integration of various U haplogroups into different pre-caste populations with subsequent, non-uniform lineage sorting and differentiation over time. In contrast, and consistent with early human expansion across South Asia, the predominantly Asian M clade mitochondrial haplogroups account for more than half of all Indian mitochondrial lineages and reach their highest frequencies in lower caste and tribal groups [6, 13].
While Y-chromosome and mtDNA polymorphisms yield valuable information, it must be borne in mind that they each represent a single linkage group. Estimates based on these systems are thus subject to a high level of stochastic variability [44, 45]. In addition, the Y-chromosome and mtDNA may both have been affected by natural selection [46, 47], which can further complicate the interpretation of population history. Coalescence dates based on these systems must also be viewed with appropriate caution, in part because of their large confidence intervals. More importantly, a coalescence date is not necessarily a reliable indicator of the founding date of a population [45] because these dates are affected by the size of the founder population and by subsequent gene flow patterns. To gain a more complete and reliable portrait of population history, multiple, independent autosomal polymorphisms should also be examined.
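The stochastic variability of a single linkage group can be illustrated with a small simulation. The sketch below assumes the standard neutral coalescent with constant population size, which is a textbook simplification chosen here for illustration and not a model fitted in this study. It compares the spread of the time to the most recent common ancestor (TMRCA) estimated from one locus with the spread of the average over 45 independent loci; the sample size of 20 lineages and the replicate count are arbitrary choices.

```python
import random
import statistics


def tmrca(n, rng):
    """One draw of the TMRCA for n lineages, in coalescent time units.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate k*(k-1)/2 (standard neutral coalescent).
    """
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)
    return t


def mean_tmrca(n, loci, rng):
    """Average TMRCA over `loci` independent (unlinked) loci."""
    return sum(tmrca(n, rng) for _ in range(loci)) / loci


rng = random.Random(1)
n_lineages = 20      # arbitrary illustrative sample size
replicates = 2000    # number of simulated data sets per scenario

single = [mean_tmrca(n_lineages, 1, rng) for _ in range(replicates)]
multi = [mean_tmrca(n_lineages, 45, rng) for _ in range(replicates)]

sd_single = statistics.stdev(single)
sd_multi = statistics.stdev(multi)
expected = 2 * (1 - 1 / n_lineages)  # theoretical mean TMRCA in these units

print(f"expected mean TMRCA: {expected:.2f}")
print(f"1 locus:  mean {statistics.mean(single):.2f}, sd {sd_single:.2f}")
print(f"45 loci:  mean {statistics.mean(multi):.2f}, sd {sd_multi:.2f}")
```

Under these assumptions, both scenarios have the same expected value, but the single-locus estimate varies far more from replicate to replicate. Averaging over L independent loci reduces the standard deviation by a factor of roughly the square root of L, which is the statistical motivation for examining many unlinked autosomal markers.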
Our analysis of 45 unlinked autosomal STRs reveals that in Tamil Nadu, genetic distances between castes are positively correlated with caste rank. A similar pattern was detected in upper, middle, and lower rank castes of Andhra Pradesh using these STRs [20] and Alu and L1 insertion polymorphisms [13]. An analysis of the Kallar, Vanniyar, and Pallar castes, which also reside in Tamil Nadu, showed that upper – lower caste distance estimates (0.0553) exceeded those for upper – middle castes (0.0329) and middle – lower castes (0.0515) [40]. Majumder et al. [37, 48] presented Y-chromosome, mtDNA, and autosomal data from several caste populations in Uttar Pradesh. Subsequent analysis indicated that caste rank was correlated with genetic distance for all three types of systems [20]. Similar correlations have been observed in a number of other studies of Indian populations [31, 33, 49]. A relatively greater affinity between upper-caste populations and Europeans has been observed for autosomal polymorphisms in our Andhra Pradesh and Tamil Nadu samples and in a number of other analyses of autosomal data [6, 50, 51].
Although significant correlations between caste rank and genetic distances are apparent, model-based clustering algorithms did not detect structure within the Tamil or Andhra populations. We suggest that this finding results not only from the low amount of differentiation between all caste groups but also from a lack of sufficient power in 45 unlinked STRs to detect high-resolution population structure. With ~250 K SNPs typed in a subset of the Andhra upper and Andhra lower castes, individuals can be clustered into these population groups using genotype information alone [52]. Likewise, using > 950 K SNPs, the Tamil upper and Tamil lower castes demonstrate group-specific clustering by principal component analysis (unpublished data).
Considering the complex history of Indian populations, it is not surprising that some studies demonstrate an association between caste rank and genetic distance, whereas others do not. A recent study of 15 geographically dispersed Indian populations residing in the United States using 1200 markers found little evidence for caste or geographic structure [53]. However, sampling strategy (relocated vs. in situ) or other factors, such as a very wide geographic dispersion of the study populations, may confound correlations if they exist. Admixture and gene flow can also vary substantially between caste populations in the various regions of India. Linguistic differences may influence the genetic structure of local caste populations [34]. The linguistically different NTS Upper caste Brahmins showed several differences in comparison to the other Tamil castes in this analysis. Yet, because Indian populations show only a small amount of genetic differentiation [17, 53], a large number of autosomal loci will be necessary for adequate power to detect consistent patterns of variation if they are present [54, 55]. Ancestry-informative autosomal polymorphisms, high-density genotyping, and extensive population sampling will provide better resolution of the relationships between Indian and other Eurasian populations.
The results presented here underscore the complexity of the Indian caste system. Although other interpretations may be possible, our data are consistent with a model in which nomadic populations from northwest and central Eurasia intercalated over millennia into an already complex, genetically diverse set of subcontinental populations. As these populations grew, mixed, and expanded, a system of social stratification likely developed in situ, spreading to the Indo-Gangetic plain and then southward over the Deccan plateau. A strong patrilineal social structure, accompanied by a developing practice of caste endogamy, may have contributed to an asymmetric apportioning of Y-chromosome, autosomal, and, to a lesser extent, mtDNA lineages. Remnants of these patterns can still be detected in some of the inhabitants of peninsular South India.