Samples were typed with 131 binary markers and 16 STRs from the Y chromosome. In all, 27 binary markers were polymorphic, and we identified 18 haplogroups (Figure 1). All 18 haplogroups were found in the Indian sample group and 13 were present in the Jaunpur samples (see the table of haplogroup frequencies and genetic diversities; full data are in Supplementary Table 1). The three most frequent haplogroups accounted for 52% (45/86) of the Indian Y chromosomes but for 76% (108/146) of the Jaunpur Y chromosomes. Of these haplogroups, R1a1 and R2 were present in all castes. R1a1 was the most common haplogroup among the Indian caste samples (22%), especially among the Brahmins (35%), while R2 was the most common in the Jaunpur district (36%) and represented 87% of Jaunpur Kshatriya Y lineages. H1a* was the third most common haplogroup, present in 16% of the Indian samples and equally distributed among all castes except the Vaishyas. In the Jaunpur district it represented 20% of the samples and was particularly frequent among Jaunpur Brahmins (65%). L1 was also well represented among the Indian samples, with a frequency of 10%, but was relatively rare in Jaunpur (2%). These haplogroup frequencies are in good agreement with previous studies of caste populations (Sengupta et al. 2006) and suggest that our samples are representative of the larger caste population.
Figure 1 Y-chromosomal phylogeny of binary markers. The markers screened in this study are shown on each branch, and the haplogroup or paragroup (marked *) at the end. The haplogroups found in this survey are indicated in red. The pie charts are drawn with an …
Haplogroup frequencies and haplogroup and haplotype genetic diversities
Overall, the Jaunpur castes showed a marked reduction in genetic diversity compared with the rest of India. However, this reduction was not equally distributed among the castes; it was restricted to the Brahmins and Kshatriyas. Genetic diversity (Nei 1987) based on haplogroup frequencies ranged from 0.84 to 0.91 (standard error (SE) < 0.08) in the Indian castes, and similar values were found among the Vaishyas, Sudras and Panchamas from the Jaunpur district. In contrast, it was 0.54 (SE ±0.1) among the Jaunpur Brahmins and only 0.23 (SE ±0.08) in the Jaunpur Kshatriyas (see the table of haplogroup frequencies and genetic diversities). The microsatellite analysis showed a similar pattern. From the 227 samples we identified 156 microsatellite haplotypes, 81 in each sample group. Only 6 haplotypes were shared between the two sample groups, and none was shared between haplogroups. Of the 156 haplotypes, 133 (85%) were individual-specific, while the rest were shared by 2 to 27 individuals. As expected for these fast-mutating markers, haplotype diversity was high in the Indian populations, with values between 0.97 and 1 (SE < 0.06) among the different castes. A similar range was seen among the Vaishyas, Sudras and Panchamas from the Jaunpur district, but diversity decreased to 0.83 (SE ±0.06) in the Jaunpur Brahmins and to 0.73 (SE ±0.07) in the Jaunpur Kshatriyas. These last two values are particularly striking, since they are the lowest ever described using such a large number of microsatellites (Qamar et al. 2002; Xue et al. 2006; Zerjal et al. 2002).
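The diversity values above follow Nei (1987). As a minimal sketch, assuming the standard unbiased estimator h = n/(n−1)(1 − Σ p_i²) together with its usual sampling-variance formula, the calculation can be written as below. The function name `gene_diversity` and the toy labels are hypothetical, and this is not necessarily the software used for the published values. Passing STR haplotype tuples instead of haplogroup names gives haplotype diversity.

```python
from collections import Counter
from math import sqrt


def gene_diversity(labels):
    """Nei's unbiased gene diversity and its standard error (illustrative sketch).

    `labels` holds one haplogroup name (or one STR haplotype tuple)
    per sampled Y chromosome.
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    freqs = [count / n for count in Counter(labels).values()]
    sum_p2 = sum(p ** 2 for p in freqs)
    sum_p3 = sum(p ** 3 for p in freqs)
    h = n / (n - 1) * (1.0 - sum_p2)
    # Sampling variance of h (Nei 1987); clipped at 0 to absorb rounding error.
    var = 2.0 / (n * (n - 1)) * (
        2.0 * (n - 2) * (sum_p3 - sum_p2 ** 2) + sum_p2 - sum_p2 ** 2
    )
    return h, sqrt(max(var, 0.0))


# Toy data (hypothetical, not from this study): 6 R2, 2 H1a, 2 L1 chromosomes.
h, se = gene_diversity(["R2"] * 6 + ["H1a"] * 2 + ["L1"] * 2)
print(f"h = {h:.3f}, SE = {se:.3f}")  # h = 0.622, SE = 0.138
```

A sample dominated by one lineage drives Σ p_i² towards 1 and h towards 0, which is the pattern described for the Jaunpur Kshatriyas.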
To quantify the patterns of genetic variation among castes further, we calculated summary statistics and applied other methods of analysis to reveal features of the structure within each population, concentrating on microsatellite data because they are free from ascertainment bias (see the table of population statistics based on microsatellite data and the pairwise-difference, network and MDS figures). Low-diversity castes showed low θk values (less than 10), indicating a smaller male effective population size (Helgason et al. 2000b). The θk statistic also allows us to investigate the effect of the small sample sizes on our conclusions (Helgason et al. 2000a). We calculated the expected number of haplotypes after sampling to saturation (the point at which increasing the sample size by 10 is expected to add fewer than one new haplotype). The result shows that sampling is incomplete, even in Jaunpur. Nevertheless, for a lower caste such as the Panchamas the expected number of haplotypes is similar (to within a factor of two) in Jaunpur and the rest of India, whereas for an upper caste such as the Brahmins it differs by a factor of 18. A concordant result was obtained when we used BATWING (Wilson et al. 2003), a Bayesian program that models both mutational and demographic processes, to estimate the effective population sizes of the different castes as described (Xue et al. 2006). With this method, the Brahmins and the Kshatriyas from Jaunpur again showed the smallest effective population sizes. To measure the mutational distance between all pairs of haplotypes within castes, we calculated Average Squared Distance (ASD) values. Most castes showed ASD values greater than 145; the Brahmins and the Kshatriyas from Jaunpur were exceptions, with values of 92 and 30, respectively. Cumulative numbers of pairwise differences were used to compare the amount of genetic variation between haplotypes across castes: the Brahmins and the Kshatriyas from Jaunpur showed outstandingly low numbers of pairwise differences compared with the rest of the castes (Figure 2).
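The θk, saturation, ASD and pairwise-difference statistics described above can be sketched as follows. This is an illustrative reading of the methods, not the authors' code. θk is obtained by solving the Ewens expectation E[k] = Σ_{i=0}^{n−1} θ/(θ+i) for the observed number k of distinct haplotypes among n chromosomes. "Saturation" is taken as the smallest sample size at which 10 more chromosomes are expected to add fewer than one new haplotype. As an assumption of this sketch, squared repeat differences are summed across loci for ASD. All function names are hypothetical, and haplotypes are tuples of repeat counts.

```python
from collections import Counter
from itertools import combinations


def expected_k(theta, n):
    """Ewens expectation of the number of distinct haplotypes in n chromosomes."""
    return sum(theta / (theta + i) for i in range(n))


def estimate_theta_k(k, n, lo=1e-9, hi=1e6, rel_tol=1e-10):
    """Solve expected_k(theta, n) == k for theta by bisection.

    expected_k increases monotonically with theta, so bisection converges;
    theta_k is finite and positive only when 1 < k < n.
    """
    if not 1 < k < n:
        raise ValueError("theta_k requires 1 < k < n")
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if expected_k(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def saturation_haplotypes(theta, step=10, max_n=10**7):
    """Return (n, expected haplotypes) at the first n where `step` more
    chromosomes are expected to add fewer than one new haplotype, i.e.
    expected_k(theta, n + step) - expected_k(theta, n) < 1."""
    for n in range(1, max_n):
        gain = sum(theta / (theta + i) for i in range(n, n + step))
        if gain < 1.0:
            return n, expected_k(theta, n)
    raise RuntimeError("no saturation point below max_n")


def average_squared_distance(haplotypes):
    """Mean over all haplotype pairs of squared repeat differences summed across loci."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum((a - b) ** 2 for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


def pairwise_difference_counts(haplotypes):
    """How many haplotype pairs differ at 0, 1, 2, ... loci (basis for a cumulative plot)."""
    return Counter(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in combinations(haplotypes, 2)
    )


# Hypothetical caste: 12 distinct haplotypes among 40 men.
theta = estimate_theta_k(k=12, n=40)
n_sat, k_sat = saturation_haplotypes(theta)
toy = [(10, 12), (11, 12), (10, 14)]  # toy two-locus haplotypes
print(average_squared_distance(toy))  # 10/3
print(pairwise_difference_counts(toy))  # Counter({1: 2, 2: 1})
```

Comparing `k_sat` between two samples of the same caste is one way to express the "factor of two" versus "factor of 18" contrast in the text, independent of the sample sizes actually collected.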
Population Statistics based on Microsatellite data
Figure 2 Cumulative pairwise differences between microsatellite haplotypes within each caste. The caste samples are indicated by the same colours as in the other figures: from Jaunpur, Brahmins (J-B), Kshatriyas (J-K), Vaishyas (J-V), Sudras (J-S) and Panchamas (J-P); …
Figure 3 Median-joining network for all the microsatellite haplotypes found in this study. Binary marker data are also included. Circles represent haplotypes, with size proportional to the number of chromosomes; the color scheme is the same as in …
MDS analysis of population pairwise RST values based on microsatellite haplotypes. The color scheme and abbreviations are the same as in the preceding figures. The stress value was 0.17.
To visualize the haplotype variation within each caste, we constructed a median-joining network from combined binary and microsatellite data using the program Network (Bandelt et al. 1999) (Figure 3). Two star-shaped clusters stand out. The most striking lies among haplogroup R2 chromosomes (M124-derived; lower left corner of the network). In this cluster, the central haplotype is present at high frequency (12% of the entire sample set) and consists mostly (24/27) of Jaunpur Kshatriya samples. The surrounding haplotypes (one step away) also belong mainly to this population, while the more distant haplotypes come from a variety of castes from both Jaunpur and India. The second cluster lies in haplogroup H1* (derived for M82; lower right corner of the network) and consists almost entirely of Jaunpur Brahmin chromosomes (13/15), distributed among three haplotypes. These star-like patterns are informative because, for neutrally evolving loci such as those on the Y chromosome, they are the genetic signature of population bottlenecks or founder events followed by drift and population expansion (Xue et al. 2005; Zerjal et al. 2002; Zerjal et al. 2003). It thus appears that such events occurred in the Jaunpur district, but only in the Brahmins and Kshatriyas. Interestingly, most of the samples in the Kshatriya cluster belong to the Chandel community. An oral tradition in this community holds that its members are descendants of a king named Ajhuraj who moved into the Jaunpur area about 400 years ago, providing possible historical support for the genetic evidence of a founder event in this caste. Because this haplotype cluster is highly specific to the Jaunpur Kshatriyas, its presence in other castes from the same area points to likely gene flow out of this caste. We estimated the time to the most recent common ancestor (TMRCA; mean ± SD) of the Kshatriya cluster using Network with the 'evolutionary' mutation rate proposed by Zhivotovsky et al. (2004). Since the shape of the genealogy is not known, we calculated minimum and maximum numbers of meioses, corresponding to late or early branching. Assuming that each chromosome from this lineage found in another caste represented a single gene-flow event, we estimated the rate of Y-chromosomal movement out of the Kshatriyas at 0.7% per generation (0.104%–1.48%), somewhat lower than previously proposed rates (Wooding et al. 2004).
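A TMRCA for a star-like cluster is commonly estimated with the ρ statistic: the mean number of mutational steps from each sampled chromosome to the founder (central) haplotype, divided by the mutation rate per haplotype. The sketch below illustrates that idea under stated assumptions, not necessarily the exact computation performed by Network. It assumes single-step mutations, a common rate for all loci, and a default per-locus rate of 6.9 × 10⁻⁴ per 25-year generation (our reading of the Zhivotovsky et al. 2004 'evolutionary' rate). It omits the standard deviation, the minimum/maximum-meiosis bounds and the gene-flow estimate described above. The function name `rho_tmrca` is hypothetical.

```python
def rho_tmrca(haplotypes, founder, mu_per_locus=6.9e-4, years_per_generation=25.0):
    """TMRCA of a star-like cluster from the rho statistic (illustrative sketch).

    rho is the mean number of repeat steps between each sampled haplotype and
    the founder (central) haplotype, summed over loci, assuming single-step
    mutations. Dividing by the per-haplotype mutation rate (per-locus rate x
    number of loci) gives generations. The default per-locus rate is an
    assumption of this sketch.
    """
    n_loci = len(founder)
    if any(len(h) != n_loci for h in haplotypes):
        raise ValueError("all haplotypes must be typed at the same loci as the founder")
    steps = [sum(abs(a - f) for a, f in zip(h, founder)) for h in haplotypes]
    rho = sum(steps) / len(steps)
    generations = rho / (mu_per_locus * n_loci)
    return rho, generations, generations * years_per_generation


# Toy cluster (hypothetical): founder plus one-step derivatives at two loci.
founder = (10, 12)
cluster = [(10, 12), (10, 12), (11, 12), (10, 13)]
rho, gens, years = rho_tmrca(cluster, founder, mu_per_locus=0.001)
print(rho, round(gens), round(years))  # 0.5 250 6250
```

Because ρ counts steps from the founder, a cluster in which most chromosomes still carry the central haplotype (as in the Kshatriya cluster) yields a small ρ and hence a recent TMRCA.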
Microsatellite haplotype frequencies and their molecular distances were used to partition the molecular variance within and between castes (Analysis of Molecular Variance, AMOVA; Excoffier et al. 1992) and to calculate population pairwise genetic distances (RST; Slatkin 1995). When all castes were pooled in one group, a substantial fraction of the variability (14.5%, P < 0.0001) was due to differences among castes. This value is higher than those described in previous studies using smaller numbers of Y binary markers (Basu et al. 2003), suggesting stronger genetic differentiation among the populations examined here. However, the among-caste variation dropped to 1.2% (P > 0.05) when the Brahmins and the Kshatriyas from Jaunpur were removed from the calculation. This analysis leads to two important conclusions: first, most of the genetic structuring is restricted to the Jaunpur upper castes (Brahmins and Kshatriyas); second, the other castes are not significantly different from one another by this measure. We displayed the pairwise RST values obtained from haplotype data as a multidimensional scaling (MDS) plot. The Brahmins and the Kshatriyas from Jaunpur appear as outliers, while the rest of the castes are quite tightly grouped. Within this group, however, no clustering is evident, either by caste or between the Jaunpur and the Indian samples. We performed the same analysis of molecular variance using haplogroup frequencies and pairwise ΦST values. The results (not shown) were similar to those obtained with the microsatellite data, showing that consistent conclusions can be obtained with different markers.
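The among-caste percentage can be illustrated with a one-level AMOVA in which the distance between two haplotypes is the sum over loci of squared repeat-number differences. With this distance the among-group component is an RST-type statistic, and restricting the input to two castes gives a pairwise value of the kind plotted in the MDS analysis. The sketch below simplifies under those assumptions (complete data, significance from a label-permutation test). It is not the software used in the study, and all function names are hypothetical.

```python
import random
from itertools import combinations


def squared_distance(h1, h2):
    """Sum over loci of squared repeat-number differences (an R_ST-type distance)."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def ssd(haplotypes):
    """Sum of squared deviations: squared distances over unordered pairs, divided by n."""
    return sum(squared_distance(a, b) for a, b in combinations(haplotypes, 2)) / len(haplotypes)


def amova_one_level(groups):
    """One-level AMOVA. `groups` maps a caste label to a list of STR haplotypes.

    Returns (sigma2_among, sigma2_within, phi), where 100 * phi is the
    percentage of variation among groups; phi can be negative when groups
    are less differentiated than expected by chance.
    """
    pops = [list(h) for h in groups.values()]
    n_total, n_pops = sum(len(p) for p in pops), len(pops)
    ssd_total = ssd([h for p in pops for h in p])
    ssd_within = sum(ssd(p) for p in pops)
    msd_among = (ssd_total - ssd_within) / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    n_c = (n_total - sum(len(p) ** 2 for p in pops) / n_total) / (n_pops - 1)
    sigma2_among = (msd_among - msd_within) / n_c
    return sigma2_among, msd_within, sigma2_among / (sigma2_among + msd_within)


def permutation_p(groups, n_perm=999, seed=1):
    """P value for phi by randomly reassigning haplotypes to groups of the same sizes."""
    rng = random.Random(seed)
    sizes = [len(h) for h in groups.values()]
    pooled = [h for hs in groups.values() for h in hs]
    observed = amova_one_level(groups)[2]
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = {}, 0
        for i, size in enumerate(sizes):
            shuffled[i] = pooled[start:start + size]
            start += size
        if amova_one_level(shuffled)[2] >= observed:
            hits += 1
    return hits / (n_perm + 1)


# Hypothetical one-locus data: two perfectly separated groups.
castes = {"A": [(10,), (10,)], "B": [(12,), (12,)]}
print(amova_one_level(castes))  # (2.0, 0.0, 1.0)
```

Dropping two strongly differentiated groups from `groups` and rerunning `amova_one_level` mirrors the comparison in the text, where the among-caste share fell from 14.5% to 1.2% once the Jaunpur Brahmins and Kshatriyas were excluded.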