To evaluate a novel triplex polymerase chain reaction (PCR) assay for the analysis of polymorphic Y-chromosomal short tandem repeat (Y-STR) loci.
A total of 14 Y-STR loci were analyzed. Allele frequencies for 3 tetrameric Y-STR loci (DYS449, DYS456, and DYS458) and for the extended haplotype loci typed with the Y-PLEX™ 12 system were investigated in a sample of 50 unrelated healthy Czech male donors. We computed the relevant intra-population statistical parameters (gene diversity, average gene diversity over loci, and mean number of pairwise differences) and compared our sample set with other Central European populations using RST pairwise genetic distances.
We focused on comparing the genetic diversity of the Y-STR extended haplotype loci with that of the 3 additional loci, and on the benefit of using DYS449, DYS456, and DYS458 in forensic and population genetics applications. Total gene diversity in our sample set was 0.998367 when all 14 loci were used. DYS449 showed very high genetic diversity (0.876735), surpassing even that of DYS385a/b (0.819592). Population comparison showed no difference between the Czech, Bavarian, Austrian, and Saxon sample sets. A minor difference was found between the Czech and Polish sample sets.
Typing of 3 Y-chromosomal microsatellite polymorphisms may provide a useful complement to already established sets of Y-STRs.
DNA typing using a number of polymorphic short tandem repeats on the human Y chromosome (Y-STRs) has already become a broadly applied approach in areas such as forensic genetics and paternity testing (1). Moreover, amplifying multiple STRs in a single polymerase chain reaction (PCR) provides a very efficient and reliable genotyping tool. To date, 219 Y-STRs have been described (2), most of which are polymorphic. In forensic genetics applications, Y-STRs are useful for discriminating paternal lineages rather than for individual identification. In combination with biallelic polymorphisms, Y-STRs are also applied in population genetic studies.
The main aim of this study was to design a triplex PCR assay that allows fragment analysis of products labeled with a single fluorescent dye. The loci DYS449, DYS456, and DYS458 were chosen for their reported high diversity in Euro-American populations (3) and for their absence from broadly used commercial forensic kits (PowerPlex® Y System [Promega, Madison, WI, USA], Mentype® Argus Y-12QS [Biotype, Dresden, Germany]), although DYS456 and DYS458 (but not DYS449) are included in the widely used AmpFℓSTR® Yfiler® PCR Amplification Kit (Applied Biosystems, Foster City, CA, USA) (4). DYS449 and DYS456 have also been used, together with 25 other Y-STR loci, in a major population study (5). Here we report allele frequency data and basic intra-population diversity indices for the 3 Y-STRs, as well as for the 11 other Y-STR loci of the extended haplotype set, analyzed in a Czech population sample.
DNA samples were obtained from 50 unrelated healthy Czech Caucasian male participants living in or around Prague (Czech Republic). All samples were part of the internal DNA depository of the Department of Anthropology and Human Genetics, Faculty of Science, Charles University in Prague, and all procedures were conducted in accordance with institutional ethical guidelines. The 50 samples were collected from military conscripts on a voluntary basis during 1995-1998 in Central Bohemia and Prague. The data were strictly anonymous; therefore, the DNA samples could not, under any circumstances, be linked to their donors.
DNA was isolated from peripheral whole blood using the salting-out protocol of Miller (6). The samples were typed for 2 sets of Y-STR markers. The first set consisted of the 11 loci included in the Y-PLEX™ 12 system (ReliaGene Technologies, Inc., New Orleans, LA, USA): the 9 minimal haplotype (minHT) loci DYS19, DYS385a/b (counted as 2 loci), DYS389I, DYS389II, DYS390, DYS391, DYS392, and DYS393, and 2 other polymorphic loci, DYS438 and DYS439. All these loci are recommended for the European extended Y-STR haplotype (exHT) (7). The second set, developed by our team (8), comprises the 3 loci DYS449, DYS456, and DYS458. Primer sequences and other details for these loci are shown in Table 1; forward primers were labeled with 6-FAM (6-carboxyfluorescein). The triplex PCR conditions were as follows: 3.0 mM MgCl2; 1.4× PCR buffer (70 mM KCl, 14 mM Tris-HCl); 250 µM dNTPs; 0.15 U Taq polymerase (5 U/µL stock; TaKaRa Bio, Inc., Otsu, Japan); and 3 primer pairs (each at a total concentration of 200 µM). The total reaction volume was 10 µL, with 5-20 ng of template DNA per reaction (quantified by spectrophotometry). Initial denaturation at 95°C for 6 minutes was followed by 35 cycles of 95°C for 60 seconds, 64°C for 60 seconds, and 72°C for 60 seconds. The final elongation step was extended by an additional 10 minutes (Table 1).
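The reagent arithmetic behind these conditions can be sketched as a small master-mix calculator based on C1·V1 = C2·V2. This is a hypothetical helper, not part of the authors' protocol: the final concentrations, the 0.15 U Taq dose, the 5 U/µL Taq stock, and the 10 µL reaction volume come from the text above, whereas the stock concentrations of buffer (10×), MgCl2 (25 mM), and dNTP mix (10 mM) are assumptions to be replaced by the actual reagents. Primers, template DNA, and water are deliberately left out.

```python
# Hypothetical master-mix calculator for the triplex PCR described above.
# Final concentrations and the Taq dose come from the protocol; the stock
# concentrations are ASSUMPTIONS and must be replaced by the real reagents.

REACTION_VOLUME_UL = 10.0

# name: (final concentration, assumed stock concentration); units match within each pair
COMPONENTS = {
    "PCR buffer (x)": (1.4, 10.0),        # assumed 10x buffer stock
    "MgCl2 (mM)": (3.0, 25.0),            # assumed 25 mM stock
    "dNTP mix (uM)": (250.0, 10000.0),    # assumed 10 mM stock
}
TAQ_UNITS_PER_REACTION = 0.15  # U per reaction (protocol)
TAQ_STOCK_U_PER_UL = 5.0       # U/uL (protocol)


def stock_volume(final_conc, stock_conc, reaction_volume=REACTION_VOLUME_UL):
    """Volume of stock (uL) per reaction, from C1 * V1 = C2 * V2."""
    return final_conc * reaction_volume / stock_conc


def master_mix(n_reactions, overage=0.1):
    """Stock volumes (uL) for n_reactions plus a fractional pipetting overage.

    Primers, template DNA, and water are not included.
    """
    scale = n_reactions * (1.0 + overage)
    mix = {name: stock_volume(final, stock) * scale
           for name, (final, stock) in COMPONENTS.items()}
    mix["Taq (5 U/uL)"] = TAQ_UNITS_PER_REACTION / TAQ_STOCK_U_PER_UL * scale
    return mix


if __name__ == "__main__":
    for name, vol in master_mix(50).items():
        print(f"{name:>16}: {vol:6.2f} uL")
```

Under these assumed stocks, a single reaction needs 1.4 µL buffer, 1.2 µL MgCl2, 0.25 µL dNTP mix, and 0.03 µL Taq; the last volume is too small to pipette reliably, which is one reason to prepare a master mix for many reactions at once.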
PCR products were genotyped at the DNA Sequencing Laboratory, Service Center of the Faculty of Science, Charles University in Prague, on an ABI PRISM® 3100-Avant Genetic Analyzer and analyzed with Genotyper 3.7 software. Only the initial genotyping of DYS449, DYS456, and DYS458 was performed by GENERI BIOTECH, s.r.o. (Hradec Králové, Czech Republic). No allelic ladders were available for any of the 3 markers; therefore, the electropherograms were read manually. Negative controls (female DNA) were included in all reactions.
Alleles at all loci were designated according to the recommendations of the International Society of Forensic Genetics (9). Total gene (haplotype) diversity was calculated according to Nei (10). Average gene diversity over loci (heterozygosity), mean number of pairwise differences (the average number of loci at which two haplotypes differ), and RST pairwise genetic distances were computed using Arlequin software, version 3.1 (11).
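Nei's unbiased gene diversity is h = n/(n − 1) · (1 − Σ p_i²), where p_i is the frequency of the i-th allele (or haplotype) among n samples. A minimal Python sketch, assuming allele calls are given as a plain list for one locus (the function name is illustrative, not Arlequin's):

```python
from collections import Counter


def nei_gene_diversity(observations):
    """Nei's unbiased gene (haplotype) diversity: n/(n-1) * (1 - sum(p_i^2)).

    `observations` is a list of alleles at one locus, or of haplotype tuples
    when computing haplotype diversity; identical entries are counted together.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    counts = Counter(observations).values()
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_p2)


if __name__ == "__main__":
    # Toy allele calls at one locus (not study data): counts 2, 1, 1.
    print(nei_gene_diversity([14, 14, 15, 16]))
```

For the toy input, Σ p_i² = 0.375 and h = 4/3 · 0.625 ≈ 0.8333. Working backwards with n = 50, the reported DYS449 diversity of 0.876735 corresponds to Σ p_i² = 0.1408.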
Table 2 shows the haplotypes of all 50 analyzed samples; the data are also available in the www.yhrd.org database (release 26). The allelic ranges of DYS449, DYS456, and DYS458 and the allele frequencies at each locus are given in Table 3.
As the main aim of this study was to assess the advantages of using the 3 additional Y-STR loci, we compared 3 test configurations: 1) extended haplotype loci only (11 loci), 2) extended haplotype loci plus the novel triplex loci (ie, DYS449, DYS456, and DYS458; 14 loci), and 3) novel triplex loci only (3 loci). Table 4 shows the number of unique haplotypes discriminated by each configuration and the corresponding total gene diversity values.
Remarkably, the highest average gene diversity across loci (0.798367) was found in the configuration using only the novel triplex loci, reflecting the high discrimination capacity of these 3 loci. The mean number of pairwise differences, in contrast, increases with the number of loci included, so this configuration yielded the lowest value of all 3 (Table 4). Total gene diversity, or "virtual heterozygosity", was very high in all 3 configurations despite the limited sample size; to allow a finer comparison of the benefit of adding Y-STR loci, we report values rounded to 6 decimal places. The highest gene diversity (0.998367) was obtained using all available loci (extended haplotype loci plus novel triplex loci). With this configuration we distinguished 48 unique paternal lineages (haplotypes) among the 50 samples (discriminating capacity, ie, the number of distinct haplotypes divided by the number of samples, 0.96). This is a considerably better result than those obtained with the extended haplotype loci alone (42 unique haplotypes out of 50 samples; discriminating capacity 0.84) or with DYS449, DYS456, and DYS458 alone (40 out of 50; discriminating capacity 0.80).
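The two haplotype-level statistics above can be sketched in a few lines: discriminating capacity is the number of distinct haplotypes divided by the number of samples, and total gene diversity is Nei's unbiased estimator applied to whole haplotypes. The function names are illustrative, and the haplotypes are replaced by stand-in integer labels rather than the real Table 2 profiles. With 50 samples in 48 distinct haplotypes, only two partitions are possible (46 singletons plus 2 pairs, or 47 singletons plus 1 triplet), and the sketch shows that only the first reproduces the reported 0.998367.

```python
from collections import Counter


def discrimination_capacity(haplotypes):
    """Number of distinct haplotypes divided by the number of samples."""
    return len(set(haplotypes)) / len(haplotypes)


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity, n/(n-1) * (1 - sum(p_i^2))."""
    n = len(haplotypes)
    sum_p2 = sum((c / n) ** 2 for c in Counter(haplotypes).values())
    return n / (n - 1) * (1.0 - sum_p2)


# Stand-in labels (not real profiles): each integer represents one haplotype.
pairs_partition = list(range(46)) + [46, 46, 47, 47]    # 46 singletons + 2 pairs
triplet_partition = list(range(47)) + [47, 47, 47]      # 47 singletons + 1 triplet

if __name__ == "__main__":
    for label, data in [("46 singletons + 2 pairs", pairs_partition),
                        ("47 singletons + 1 triplet", triplet_partition)]:
        print(label, round(discrimination_capacity(data), 2),
              round(haplotype_diversity(data), 6))
```

Both partitions give a discriminating capacity of 0.96, but their diversities differ (0.998367 vs 0.997551), which illustrates why gene diversity carries more information than the haplotype count alone.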
The average gene diversity across loci revealed a remarkable feature: the loci of the new triplex (configuration using only the novel triplex loci) had a much higher average gene diversity than the extended haplotype loci, whether alone or combined with the 3 new loci. DYS449 was the most diverse locus (Table 5); in our data set it was even more polymorphic than DYS385a/b, the traditional "champion" of gene diversity. Together with the high diversities of DYS456 and DYS458, this produced an average gene diversity of 0.798367 for the novel triplex. We must acknowledge, however, that in a larger data set the diversity of the duplicated locus DYS385a/b would very probably rise noticeably and raise the average gene diversity in the 2 configurations that include it (extended haplotype loci alone, and extended haplotype loci plus the novel triplex loci). Gene diversity values above 0.82 have been reported for DYS385 (3); in this respect, our data set is unusual in showing a lower diversity (0.819592) at this locus. DYS449 has been reported as an extremely diverse locus not only in European populations (12) but also in an Asian population (South Korea) (13).
Values of the mean number of pairwise differences were as expected. Because this statistic equals the sum of the per-locus gene diversities, it depends strongly on the number of loci; the small number of loci in the novel triplex (and their similar allelic ranges) therefore explains its lower score (2.395102) compared with the configurations using the extended haplotype loci alone or combined with the novel triplex loci.
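The link between the two triplex summary statistics can be checked directly from the per-locus gene diversities reported in this study. Averaged over all pairs of samples, the number of loci at which two haplotypes differ equals the sum of the unbiased per-locus diversities, so the mean number of pairwise differences grows with the number of loci, while the average gene diversity is that sum divided by the number of loci. A minimal sketch with illustrative function names:

```python
# Per-locus gene diversities reported for the novel triplex in this study.
TRIPLEX_DIVERSITY = {"DYS449": 0.876735, "DYS456": 0.795918, "DYS458": 0.722449}


def mean_pairwise_differences(per_locus_diversity):
    """Average number of loci at which two haplotypes differ: the sum of the
    per-locus unbiased gene diversities (no missing data assumed)."""
    return sum(per_locus_diversity.values())


def average_gene_diversity(per_locus_diversity):
    """Average gene diversity over loci: the mean of the per-locus values."""
    return mean_pairwise_differences(per_locus_diversity) / len(per_locus_diversity)


if __name__ == "__main__":
    print(round(mean_pairwise_differences(TRIPLEX_DIVERSITY), 6))  # 2.395102
    print(round(average_gene_diversity(TRIPLEX_DIVERSITY), 6))     # 0.798367
```

Both outputs match the values given in the text (2.395102 and 0.798367). Adding loci always increases the first statistic but can lower the second, which is why the triplex alone scores lowest on one and highest on the other.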
Inter-population comparison using RST pairwise genetic distances is a common method for assessing population (dis)similarity. We compared data from 5 populations: our data set and 4 neighboring Central European populations, namely Germans from Bavaria, Germans from Saxony, Austrians from the Innsbruck area, and Poles from the Warsaw region. Because of inconsistencies among the data from different sources, we could use only 12 of the 14 loci presented here (ie, DYS449 and DYS458 were omitted). Table 6 shows that the analyzed Czechs are very closely related to all neighboring populations. The only significant difference was found between Czechs and Poles, and only at the 5% significance level; at the 1% level, all pairwise genetic distances were non-significant. The closer affinity of the Czech population to the Austrian and German populations is not surprising, given the well-known historical connections between Czechs from Bohemia, Austrians, and Germans from Saxony and Bavaria.
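A pairwise RST can be sketched as an AMOVA-style fixation index in which the distance between two haplotypes is the sum over loci of squared differences in repeat number, with significance assessed by permuting individuals between the two samples. This is an illustrative sketch, not Arlequin's implementation: the function names are hypothetical, haplotypes are assumed to be tuples of integer repeat counts with no missing data or microvariants, and Arlequin may differ in details such as the handling of DYS385a/b and DYS389I/II.

```python
import random
from itertools import combinations


def sq_repeat_distance(h1, h2):
    """Squared difference in repeat number, summed over loci."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def ssd(haplotypes):
    """AMOVA sum of squared deviations: sum of pairwise distances / n."""
    pair_sum = sum(sq_repeat_distance(a, b) for a, b in combinations(haplotypes, 2))
    return pair_sum / len(haplotypes)


def pairwise_rst(pop1, pop2):
    """RST between two samples of haplotypes (tuples of repeat numbers)."""
    n1, n2 = len(pop1), len(pop2)
    n = n1 + n2
    ssd_within = ssd(pop1) + ssd(pop2)
    ssd_among = ssd(list(pop1) + list(pop2)) - ssd_within
    n0 = n - (n1 ** 2 + n2 ** 2) / n          # two populations, so P - 1 = 1
    var_within = ssd_within / (n - 2)
    var_among = (ssd_among - var_within) / n0
    total = var_among + var_within
    return 0.0 if total == 0 else var_among / total


def permutation_p_value(pop1, pop2, n_perm=1000, seed=1):
    """Share of random reassignments of individuals with RST >= observed."""
    rng = random.Random(seed)
    observed = pairwise_rst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if pairwise_rst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy one-locus haplotypes, not study data.
    a = [(10,), (10,), (11,)]
    b = [(20,), (20,), (21,)]
    print(round(pairwise_rst(a, b), 4), permutation_p_value(a, b))
```

Negative estimates are possible when two samples are more similar than expected by chance; such values are usually read as no differentiation. A p-value below 0.05 but above 0.01 corresponds to the Czech-Polish situation described above.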
Our foremost finding is the high gene diversity of DYS449, DYS456, and DYS458 (0.876735, 0.795918, and 0.722449, respectively), which makes these loci promising markers for forensic genetic analyses. Of the 3, DYS449 showed the highest genetic diversity. However, we could not include this marker in the population comparison because of the lack of population data for this polymorphism. Figure 1 compares the frequency spectra of DYS449 alleles from several other studies. The high diversity of DYS449 was already reported by Redd (3), who found a gene diversity of 0.812. In all subsequent studies, DYS449 was described as one of the most diverse loci: Butler (5) reported a diversity of 0.8318 in a pooled sample of Caucasian, African American, and Hispanic populations; Park (13) found a diversity of 0.8433 in Koreans; and Rodig (12) reported 0.8574 for a European and Turkish sample set. Despite our small sample size, the frequency spectrum of DYS449 in the Czech population was similar to that reported by Rodig (12), with the highest frequencies for alleles 28-32. In several subpopulations of their pooled sample set (Germans from Munich and Germans from Berlin), Rodig (12) reported higher gene diversity for DYS449 than for DYS385a/b, as we observed in our data set.
Interestingly, we also detected the microvariant 31.2 in our data set. Similar microvariants have been documented in Koreans (13) and in Germans from Berlin (12). DYS449.2 microvariants have been shown to mark an important substructure of Y-chromosomal phylogenetic lineages from Cameroon (16), whereas the DYS449.2 microvariants observed in European haplogroups (R1a, R1b) all arose from independent de novo deletions in the repeat motif and do not indicate a shared substructure among the chromosomes carrying them.
We believe that enlarging our sample would not change the DYS449 results remarkably, apart from the possible detection of some rare alleles (such as alleles 25 and 26). More importantly, we might detect further copies of allele 24, which has not yet been found in Western European populations and thus could be restricted to Eastern European populations.
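Whether a rare allele is observed at all depends strongly on sample size. Assuming random sampling of unrelated Y chromosomes, the probability of seeing at least one copy of an allele with population frequency p among n samples is 1 − (1 − p)^n. The sketch below uses a purely hypothetical frequency of 1%, not an estimate for any DYS449 allele in the Czech population.

```python
def prob_at_least_one(freq, n_samples):
    """P(at least one copy of an allele with population frequency `freq`
    among n_samples unrelated Y chromosomes), assuming random sampling."""
    return 1.0 - (1.0 - freq) ** n_samples


def samples_needed(freq, target=0.95):
    """Smallest sample size giving at least `target` probability of detection."""
    n = 1
    while prob_at_least_one(freq, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    hypothetical_freq = 0.01  # illustrative value only
    print(round(prob_at_least_one(hypothetical_freq, 50), 3))
    print(samples_needed(hypothetical_freq))
```

With 50 samples, an allele at 1% frequency would be missed more often than not (detection probability of about 0.39), and roughly 300 samples would be needed for a 95% chance of observing it.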
To further explore the potential of the new Y-STR loci, we performed a haplogroup prediction analysis using the Haplogroup Predictor software tool provided by Whit Athey (http://www.hprg.com/hapest5/) (17,18). This tool predicts the haplogroup of a Y chromosome from its Y-STR haplotype; prediction probabilities >99% are not unusual when many Y-STR polymorphisms are used. Table 7 shows the results of the analysis. The distribution of Y-chromosome haplogroups in the Czech population has already been described to some extent by others (19-23), and a more detailed study has been performed by Luca (24).
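The general idea behind allele-frequency-based haplogroup predictors such as Athey's can be sketched as a naive Bayes classifier: each candidate haplogroup's prior is multiplied by the frequency of the observed allele at every locus within that haplogroup, and the results are normalized into posterior probabilities. This is not the tool's actual code. The haplogroup labels, frequency tables, priors, and the floor value for unseen alleles below are all hypothetical, invented only to make the example run; a real predictor relies on large reference data sets.

```python
import math

# HYPOTHETICAL allele-frequency tables: {haplogroup: {locus: {allele: freq}}}.
FREQS = {
    "HgX": {"DYS19": {14: 0.7, 15: 0.2, 16: 0.1}, "DYS390": {24: 0.6, 25: 0.4}},
    "HgY": {"DYS19": {14: 0.1, 15: 0.3, 16: 0.6}, "DYS390": {24: 0.2, 25: 0.8}},
}
PRIORS = {"HgX": 0.5, "HgY": 0.5}  # hypothetical prior haplogroup frequencies
FLOOR = 0.001                      # assumed frequency for alleles absent from a table


def predict_haplogroup(haplotype, freqs=FREQS, priors=PRIORS):
    """Posterior P(haplogroup | haplotype), treating loci as independent
    given the haplogroup. Every locus in `haplotype` must appear in `freqs`."""
    log_scores = {}
    for hg, table in freqs.items():
        score = math.log(priors[hg])
        for locus, allele in haplotype.items():
            score += math.log(table[locus].get(allele, FLOOR))
        log_scores[hg] = score
    top = max(log_scores.values())  # subtract the maximum for numerical stability
    weights = {hg: math.exp(s - top) for hg, s in log_scores.items()}
    total = sum(weights.values())
    return {hg: w / total for hg, w in weights.items()}


if __name__ == "__main__":
    print(predict_haplogroup({"DYS19": 14, "DYS390": 24}))
```

For this toy haplotype the unnormalized scores are 0.21 (HgX) and 0.01 (HgY), giving a posterior of about 0.95 for HgX. Each additional informative locus multiplies in another likelihood ratio, which is why predictions become more confident as more Y-STRs are typed.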
These findings make the 3 presented Y-STRs (DYS449, DYS456, DYS458), loci with reported high gene diversity in Euro-American populations, a useful complement to the microsatellite markers on the Y chromosome already in use. Even the novel triplex alone, with a gene diversity of 0.991020 in our population sample, has great potential to discriminate between paternal lineages. For applications requiring high discrimination power, such as human population genetics and especially forensic genetics, we suggest including the loci of our Y-STR triplex together with other commonly used Y-STR polymorphisms (ie, the extended haplotype loci), since this adds discriminating power at only a slight increase in cost. The haplotypes obtained in this study were uploaded to the www.yhrd.org database (Y Chromosome Haplotype Reference Database, release 26) (24) and are fully available for direct forensic comparisons and calculations, as well as for research in human population genetics.
This research was supported by grant LN00B107 of the Ministry of Education, Youth, and Sports of the Czech Republic to the European Centre for Medical Informatics, Statistics, and Epidemiology (EuroMISE Centre), Institute of Computer Science, Academy of Sciences of the Czech Republic, D.V. i. E.E. was also supported by grant MSM 0021620843 of the Ministry of Education, Youth, and Sports of the Czech Republic. The authors appreciate helpful comments from Jack Freeman. We are also grateful to the 3 anonymous reviewers for their helpful comments on an early version of the manuscript.