Twenty-six Y-chromosomal short tandem repeat (STR) loci were amplified in a sample of 769 unrelated males from Nepal, using two multiplex polymerase chain reaction (PCR) assays. The 26 loci gave a discriminating power of 0.997, with 59% unique haplotypes, and the highest frequency haplotype occurring 12 times. We identified novel alleles at four loci, microvariants at a further two, and nine examples of amelogenin-Y deletions (1.2%). Comparison with a similarly sized Bhutanese sample typed with the same markers suggested histories of isolation and drift, with drift having a greater effect in Bhutan. Extended (11-locus) haplotypes for the Nepalese samples have been submitted to the Y-STR Haplotype Reference Database (YHRD).
The analysis of multiple Y-chromosomal short tandem repeats (STRs) provides informative male-specific DNA profiles in forensic analysis. As well as possessing high discriminating power in distinguishing individuals, haplotypes defined by STRs can provide information about likely geographical origin, since they are often concentrated in particular populations or regions.
Population databases of Y haplotypes are increasing in size and coverage, greatly contributing to the utility of Y-chromosomal analysis in forensic casework. In this study we describe alleles at 26 Y-STRs, and properties of the haplotypes they define, in a large sample of a previously unrepresented population, that of Nepal in the Himalayas. Eleven-locus haplotypes have been submitted to the Y-STR Haplotype Reference Database (YHRD), and full data are available from the authors on request. Our report follows guidelines for the publication of population data.
Sampling and Y-chromosomal analysis of 769 Nepalese males was undertaken as part of a larger collaborative project investigating genetic diversity in Himalayan populations within the framework of their cultural and linguistic diversity. Here we describe our initial findings with Y-STRs, treating the Nepalese sample as a single population; future publications will explore genetic relationships between subpopulations of the Himalayas. The sample represents 15 distinct ethnolinguistic groups widely distributed throughout Nepal, with ~75% of sampled individuals speaking languages belonging to the Tibeto-Burman family, and the remainder speaking Indo-European languages.
In this study, we employ the same set of Y-STRs as that used recently to analyse 856 Bhutanese males. This allows us to carry out a preliminary comparison of diversity and haplotype sharing between these two Himalayan samples.
Seven hundred and sixty-nine Nepalese males provided blood samples with informed consent, and DNA was extracted as described previously. DNA samples from collections of the authors, including Y Chromosome Consortium (YCC) cell lines, were used as haplotype reference materials.
Two PCR multiplexes (a 20plex and a partially overlapping 14plex) were used to type 26 Y-STRs, as follows: DYS19, DYS385a/b, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS425, DYS426, DYS434, DYS435, DYS436, DYS437, DYS438, DYS439, DYS447, DYS448, DYS460, DYS461, DYS462, YCAIIa/b, and Y-GATA-H4.1. The eleven Y-STR markers in the European ‘extended haplotype’ (http://www.yhrd.org/) are indicated in bold. The 14plex includes the amelogenin sex test. Full details of the protocol are given by Parkin et al.
Allele nomenclature (explained fully in Parkin et al.) was according to Butler et al. and Bosch et al., with the exception of DYS439, DYS448 and Y-GATA-H4.1, where nomenclature was changed for compatibility with ISFG recommendations. Compared to Butler et al., seven repeats were subtracted from DYS439, three subtracted from DYS448, and eight added to Y-GATA-H4.1.
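The nomenclature adjustments described above amount to fixed per-locus offsets, which can be sketched as follows. This is a minimal illustration only: the function name `to_isfg`, the dictionary name and the example profile are hypothetical, and only the three offsets are taken from the text.

```python
# Offsets in repeat units, as stated in the text (Butler et al. scheme -> scheme used here).
ISFG_OFFSETS = {"DYS439": -7, "DYS448": -3, "Y-GATA-H4.1": +8}

def to_isfg(locus, butler_allele):
    """Convert an integer allele call from the Butler et al. scheme.

    Loci without an entry in ISFG_OFFSETS are returned unchanged.
    """
    return butler_allele + ISFG_OFFSETS.get(locus, 0)

# Hypothetical profile, invented for illustration (not data from this study).
profile = {"DYS19": 15, "DYS439": 19, "DYS448": 23, "Y-GATA-H4.1": 4}
converted = {locus: to_isfg(locus, allele) for locus, allele in profile.items()}
print(converted)
```

Microvariant (partial) alleles are not handled by this sketch and would need a representation other than plain integers.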
Tables 1 and 2 show the allele frequency distributions for all the Y-STRs studied. Diversities of individual STRs are comparable with those observed in a recently studied Bhutanese sample: DYS385 (when considered as a genotype, Table 2) is the most diverse marker within the Y-STR set, with a gene diversity (h) of 0.915, and the most polymorphic single-locus marker is DYS439 (h = 0.726).
Previously unreported alleles (defined with reference to Butler, Parkin et al. and STRBase, http://www.cstl.nist.gov/biotech/strbase/index.htm) were found at four loci, as follows: DYS426 (allele 13), DYS437 (allele 11), DYS439 (allele 15), DYS447 (alleles 17, 18 and 19).
‘Null’ alleles or multiple peaks were reproducibly obtained at a number of loci. For DYS448, three individuals carried null alleles, while one carried both alleles 20 and 21. For DYS461, one individual carried both alleles 13 and 14. As observed previously, DYS425 exhibits a relatively high frequency of various nulls and duplications.
Microvariants (partial alleles) were observed at two loci (Tables 1 and 2) and confirmed in uniplex assays after initial detection in multiplexes. Those at DYS385 were not investigated further, but those at DYS447 were analysed by sequencing, and shown to result from a deletion of 1 bp within the pentanucleotide repeat array.
Nine chromosomes showed absence of the amelogenin Y (AMELY) peak in electropherograms. Analysis of sequence-tagged sites revealed that these chromosomes carry interstitial deletions of Yp including the AMELY locus (data not shown); none showed null Y-STR alleles, however, which is consistent with the size and location of known AMELY deletions with respect to the position of Y-STR loci. A previous study has found AMELY deletions at a frequency of ~2% in India, so our finding of deletions at 1.2% frequency in Nepal is not unexpected; in contrast, however, none were found in our previous study of Bhutan. These AMELY deletion chromosomes form part of a large set that is currently being characterised, and will be described fully elsewhere.
Haplotype diversity (equivalent to power of discrimination, PD) was calculated, omitting chromosomes carrying null alleles and duplications. This provided a sample size of 741. For the full set of 26 Y-STRs, there are 437 unique haplotypes (59.0%), and PD is 0.9970. The corresponding values for the 20plex, extended (11-locus) haplotype and minimal (9-locus) haplotype are shown in Fig. 1.
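The calculation described above can be sketched in a few lines. The text does not state which estimator was used; the sketch assumes Nei's unbiased estimator, n/(n-1) × (1 - Σp²), which is the usual choice for haplotype diversity. The function name, the encoding of null alleles as `None` and of duplications as nested tuples, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def haplotype_summary(haplotypes):
    """Return (n, number of unique haplotypes, haplotype diversity).

    Each haplotype is a tuple of allele calls. None marks a null allele and
    a nested tuple marks a duplication (both encodings are assumptions).
    """
    # Omit chromosomes carrying null alleles or duplications, as in the text.
    clean = [h for h in haplotypes
             if all(a is not None and not isinstance(a, tuple) for a in h)]
    n = len(clean)
    counts = Counter(clean)
    # "Unique" haplotypes are those observed exactly once.
    unique = sum(1 for c in counts.values() if c == 1)
    # Assumed estimator: n/(n-1) * (1 - sum of squared haplotype frequencies).
    diversity = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    return n, unique, diversity

# Toy three-locus data, invented for illustration (not data from this study).
toy = [(14, 11, 13), (14, 11, 13), (15, 11, 13), (14, 12, 13),
       (14, None, 13), (15, (20, 21), 13)]
n, unique, h = haplotype_summary(toy)
print(n, unique, round(h, 3))
```

In the toy data two chromosomes are excluded, leaving four, of which two carry haplotypes seen only once. Applied to the real dataset, the same counting would correspond to the 437 unique haplotypes among 741 chromosomes reported above.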
Fig. 1 also shows the distribution of haplotypes present more than once in the dataset. Despite the large number of loci used here, in the 741 males one 26-locus haplotype is shared by 12 individuals (Fig. 1a), and a further 13 haplotypes are shared by between 5 and 9 individuals; notably, all these common haplotypes are restricted to particular subpopulations, illustrating the influence of drift. Reduction to 11-locus extended haplotypes allows a global search within the YHRD (release 18): this fails to find matches for three of the six most common Nepalese extended haplotypes (frequency ≥10), consistent with isolation and drift.
The availability of large Y-STR haplotype datasets on Nepalese and Bhutanese samples allows us to make comparisons between the frequencies and distributions of alleles and haplotypes in these two Himalayan populations.
Allele distributions at individual loci are similar between the Nepalese and Bhutanese samples, but this gives little information about population relationships. Particular rare and distinctive alleles may carry more information, because they probably reflect identity-by-descent: a good example of this is the sharing of microvariants at DYS447, but apart from this there is little evidence for specific inter-population sharing.
Comparison of haplotype distributions reveals a striking difference between the two populations. The proportion of unique haplotypes in the Nepalese sample is significantly greater than that in the Bhutanese, for all four haplotype resolutions considered (Fig. 2). For example, for the extended haplotype there are 41.8% (±1.8%) unique haplotypes in Nepal, but only 23.3% (±1.5%) in Bhutan. This is explained by the presence of several common haplotypes at high frequency in Bhutan: in the Nepalese dataset, the most common extended haplotypes are each present in 13 individuals, while in the Bhutanese there are haplotypes present in 15, 16, 24 (two instances) and 27 individuals.
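The text does not say how the ± values were derived. As a rough check, and assuming they are binomial standard errors, the Nepalese figure can be reproduced as below. The sample size of 741 is itself an assumption, since the number of chromosomes retained at the extended-haplotype resolution is not stated; the function name is hypothetical.

```python
import math

def proportion_se(p, n):
    """Binomial standard error of a proportion p estimated from n chromosomes."""
    return math.sqrt(p * (1 - p) / n)

# Nepal: 41.8% unique extended haplotypes; n = 741 is assumed (see lead-in).
se_nepal = proportion_se(0.418, 741)
print(f"{100 * se_nepal:.1f}%")  # prints 1.8%
```

Using the full sample of 769 instead of 741 gives the same value after rounding to one decimal place, so the check is not sensitive to that assumption.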
There are no 26-locus haplotypes shared between Nepal and Bhutan, indicating an absence of very recent gene flow. However, forty extended haplotypes are shared between the two samples, and their relationships (omitting the bilocal marker DYS385) are illustrated in a median-joining network in Fig. 3. Most of them fall into one large cluster, with haplotypes linked by single mutational steps, probably representing a common Y-SNP haplogroup. Other shared haplotypes are more widely spread, and may represent several different haplogroups.
To ask if these shared extended haplotypes are more generally common and widespread, we sought matches for the six most predominant examples (combined frequency >10) within the YHRD. Three of the six haplotypes find a total of six exact matches, all within populations originating from China or the Indian subcontinent. We also find a total of 30 one-step mutational neighbours for five of the six haplotypes, all of Asian origin. One haplotype finds neither exact matches nor one-step neighbours. Thus, the common haplotypes shared between Nepal and Bhutan are Asian-specific, but not generally frequent.
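The notions of shared haplotypes and one-step mutational neighbours used above can be illustrated with a small local sketch. It does not reproduce the YHRD search itself: the function names and toy data are hypothetical, and a one-step neighbour is assumed here to mean a haplotype differing by a single repeat unit at exactly one locus.

```python
def is_one_step_neighbour(h1, h2):
    """True if two haplotypes differ by one repeat unit at exactly one locus."""
    diffs = [abs(a - b) for a, b in zip(h1, h2) if a != b]
    return len(diffs) == 1 and diffs[0] == 1

def shared_haplotypes(pop_a, pop_b):
    """Haplotypes present in both samples (exact matches at every locus)."""
    return set(pop_a) & set(pop_b)

def one_step_neighbours(query, database):
    """All haplotypes in `database` that are one-step neighbours of `query`."""
    return [h for h in database if is_one_step_neighbour(query, h)]

# Toy three-locus haplotypes, invented for illustration (not data from this study).
nepal = [(14, 11, 13), (15, 11, 13), (14, 12, 14)]
bhutan = [(14, 11, 13), (14, 12, 13), (16, 11, 13)]

print(shared_haplotypes(nepal, bhutan))
print(one_step_neighbours((14, 11, 13), bhutan))
```

In the toy data, one haplotype is shared exactly, and the query has a single one-step neighbour in the second sample; the haplotype differing by two repeats at one locus is not counted.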
Our study emphasises the discriminating power of high-resolution Y-STR typing, and provides the first substantial dataset on a Nepalese sample. The comparison of Nepalese and Bhutanese datasets reveals an interesting overall picture of isolation and drift within these Himalayan populations, with drift having a greater effect in Bhutan than Nepal. Haplotype sharing provides evidence of some gene flow between Nepal and Bhutan, or possibly of gene flow into both from some other population. Further light will be thrown on these relationships when Y-SNP data become available.
We thank all DNA donors, and the many organisations and volunteers of indigenous language communities in Nepal who gave us their assistance. The research was conducted in association with the Centre for Nepal and Asian Studies (CNAS) at Kirtipur under the Bilateral Agreement for Academic Cooperation between Tribhuvan University and Leiden University. As part of the European Science Foundation EUROCORES Programme OMLL, this work was supported by the Arts and Humanities Research Council and the EC Sixth Framework Programme under Contract no. ERAS-CT-2003-980409. T.K., G.L.v.D. and P.dK. were supported by funds from the Netherlands Organisation for Scientific Research (NWO grant number 231-70-001). M.A.J. was supported by a Wellcome Trust Senior Fellowship in Basic Biomedical Science (grant no. 057559).