We analysed 67 short tandem repeat polymorphisms from the non-recombining part of the Y-chromosome (Y-STRs), including 49 rarely-studied simple single-copy (ss)Y-STRs and 18 widely-used Y-STRs, in 590 males from 51 populations belonging to 8 worldwide regions (HGDP-CEPH panel). Although autosomal DNA profiling provided no evidence for close relationship, we found 18 Y-STR haplotypes (defined by 67 Y-STRs) that were shared by two to five men in 13 worldwide populations, revealing high and widespread levels of cryptic male relatedness. Maximal (95.9%) haplotype resolution was achieved with the best 25 out of 67 Y-STRs in the global dataset, and with the best 3-16 markers in regional datasets (89.6-100% resolution). From the 49 rarely-studied ssY-STRs, the 25 most informative markers were sufficient to reach the highest possible male lineage differentiation in the global (92.2% resolution), and 3-15 markers in the regional datasets (85.4-100%). Considerably lower haplotype resolutions were obtained with the three commonly-used Y-STR sets (Minimal Haplotype, PowerPlex Y®, and AmpFlSTR® Yfiler®). Six ssY-STRs (DYS481, DYS533, DYS549, DYS570, DYS576 and DYS643) were most informative to supplement the existing Y-STR kits for increasing haplotype resolution, or – together with additional ssY-STRs - as a new set for maximizing male lineage differentiation. Mutation rates of the 49 ssY-STRs were estimated from 403 meiotic transfers in deep-rooted pedigrees, and ranged from ~4.8×10−4 for 31 ssY-STRs with no mutations observed to 1.3×10−2 and 1.5×10−2 for DYS570 and DYS576, respectively, the latter representing the highest mutation rates reported for human Y-STRs so far. Our findings thus demonstrate that ssY-STRs are useful for maximizing global and regional resolution of male lineages, either as a new set, or when added to commonly-used Y-STR sets, and support their application to forensic, genealogical and anthropological studies.
Human Y-chromosomal short tandem repeat polymorphisms (Y-STRs) or microsatellites, usually in combination as haplotypes, are widely used to resolve and relate male lineages in forensic, genealogical, evolutionary and anthropological studies [1-4]. Haplotype resolution is central to most applications of Y-STRs and depends not only on the number of markers used but also on their independent mutability. Differences in Y-STR diversity and allelic spectra between geographic regions are well known [5,6], and can sometimes be used to infer the geographic region of paternal ancestry of a person [7,8]. However, they can also result in differences in haplotype resolution between loci and geographic regions. Also, at a more local level, higher frequencies of indistinguishable Y-chromosomes can sometimes be found due to members of the same male lineage living in the same geographic region, a phenomenon usually referred to as (male) population substructure. It has been observed that 7-16 highly polymorphic Y-STRs are insufficient for differentiating male lineages when applied to populations that underwent a strong (male) bottleneck in their history: for example, identical Y-STR haplotypes were found in two populations from Pakistan with a frequency of 14% (16 Y-STRs), in Finns at 13% (16 Y-STRs), or even over entire geographic regions such as Polynesia at 16% (7 Y-STRs). Reduced Y-STR diversity leading to a large number of indistinguishable Y-STR haplotypes can also be caused by cultural effects such as patrilocal residence pattern and polygyny as previously observed in New Guinea, or by strongly biased male expansion due to male occupation history and privilege as in Central and Eastern Asia [14,15]. The resulting lack of resolution can be especially problematic when a limited number of Y-STRs is used in forensic applications of male lineage identification.
Therefore, more Y-STRs than included in the three commonly-used sets (the 9 Y-STRs comprising the so-called Minimal Haplotype, the 12 Y-STRs included in the PowerPlex Y® System [Promega], or the 17 Y-STRs from the AmpFlSTR® Yfiler® PCR Amplification Kit [Applied Biosystems]) are needed for improving the resolution of male lineage differentiation in particular populations, and also for differentiating male relatives in any population. A large number of additional Y-STRs (166) have been described previously; however, population-genetic data are still scarce for most of these additional markers. In this study, we analysed 67 Y-STRs in 590 unrelated males from 51 globally distributed populations covering eight geographic regions from all inhabited continents except Australia (the HGDP-CEPH panel). These 67 Y-STRs comprise 18 previously-used Y-STRs including all of those that are part of commercially-available Y-STR kits, as well as 49 additional Y-STRs described recently [16,18]. From the latter, we chose simple loci that exist in a single copy on the non-recombining part of the human Y-chromosome and contain only one uninterrupted variable stretch of repeats to avoid the problems of length homoplasy (as with complex Y-STRs) and allele-locus assignment (as with multi-copy Y-STRs). Simple Y-STRs also have a great advantage over complex markers due to a more direct relationship between mutation rate and length variation, which is relevant in evolutionary studies. Previous analysis suggests that ssY-STRs may lead to more precise time estimates when applied to male lineage dating in anthropological and evolutionary studies due to a higher correlation between repeat count and repeat variance compared with complex Y-STRs.
Single-copy Y-STRs do not suffer from the problem of equivocal allele-locus assignment usually associated with multi-copy Y-STRs, which may result in an underestimation of the haplotype resolution, and additionally can cause problems in correctly inferring the number of males who contributed to a crime scene sample in forensic studies.
Here, we investigate male lineage differentiation, both on a global level and regional levels, considering a total of 67 Y-STRs as well as a set of 49 rarely-studied simple single-copy Y-STRs (ssY-STRs) alone. In addition, we studied improvements of global and regional haplotype resolution by adding the most informative ssY-STRs to the three commonly-used sets of Y-STRs. Finally, we estimated mutation rates for all 49 ssY-STRs by analyzing deep-rooted pedigrees to understand the basis of their value in resolving male lineages and to stimulate future uses of these markers in forensic, genealogical, and anthropological studies where the knowledge of mutation rates is crucial.
DNA samples from the Human Genome Diversity Panel (HGDP) were provided by the Centre d’Etude du Polymorphisme Humain (CEPH). Samples previously identified, on the basis of 783 autosomal STRs, as identical, as first- or second-degree biological relatives, as origin mix-ups or as duplicates were excluded from the analysis, and only males from the H952 set were used. Altogether 590 males from 51 populations of 8 worldwide geographic regions were studied: 163 from East Asia (18 populations), 84 from Europe (8), 48 from the Middle East (3), 20 from North Africa (1), 16 from Oceania (2), 20 from the Americas (5), 162 from South and West Asia (8), and 77 from Sub-Saharan Africa (6). In addition, DNA samples from 104 members of 28 deep-rooted pedigrees from Canada, Germany, and China, covering altogether 403 meiotic transfers, were analysed.
We genotyped 49 simple single-copy Y-STRs ascertained from 166 recently described Y-STRs (see Supplementary Table S1 or Table 3 for the list of all markers) using 15 small multiplex reactions as described elsewhere, with the following modifications. PCRs on HGDP-CEPH samples were carried out in 384-well microtiter plates using 10 μl volumes containing 1× PCR buffer including 1.5 mM MgCl2 (Applied Biosystems Inc., Foster City, CA), 1 mM dNTPs (Roche Diagnostics GmbH, Mannheim, Germany), 0.25 units of AmpliTaqGold DNA polymerase (Applied Biosystems), 0.25–1 ng DNA and the primer pairs and concentrations described in Supplementary Table S1, using a GeneAmp PCR System 9700 machine (Applied Biosystems). Initial denaturation was at 95°C for 15 min, followed by 20 cycles of touchdown PCR (94°C for 30 s, 70°C for 45 s, 72°C for 1 min, with a 1°C decrease in annealing temperature every cycle), then 15 cycles of standard PCR (94°C for 30 s, 50°C for 45 s, 72°C for 1 min), finishing with extension at 60°C for 45 min and storage at 4°C. PCRs of deep-rooted pedigree samples were carried out in 96-well microtiter plates in 20 μl volumes using an MJ Research PTC-200 (Bio-Rad, Hercules, CA). PCR fragment lengths were analyzed by mixing 1 μl of PCR product with 9.7 μl Hi–Di Formamide (Applied Biosystems) and 0.3 μl CXR 60–400 bases size marker (Promega, Madison, WI) and running on 36 cm×50 μm capillaries containing POP-4 polymer (Applied Biosystems) on an ABI Prism 3100 Genetic Analyzer (Applied Biosystems). Capillary electrophoresis was carried out at 1 kV for 22 s followed by 15 kV for 25 min with a run temperature of 60°C. Allele sizes were measured using GeneMapper v3.7 software (Applied Biosystems).
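The annealing-temperature profile described above can be written out cycle by cycle. The following is a minimal sketch, assuming that the first touchdown cycle anneals at 70°C and that each later touchdown cycle is 1°C cooler; the function name and its parameters are hypothetical and are not taken from any instrument software.

```python
def touchdown_schedule(start_c=70, step_c=1, touchdown_cycles=20,
                       standard_c=50, standard_cycles=15):
    """Return the annealing temperature (deg C) for every PCR cycle.

    Assumption: cycle 1 anneals at start_c and each later touchdown
    cycle is step_c degrees cooler than the one before it.
    """
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    standard = [standard_c] * standard_cycles
    return touchdown + standard


schedule = touchdown_schedule()
print(len(schedule))   # total number of cycles
print(schedule[:3])    # first touchdown cycles
print(schedule[19:21]) # last touchdown cycle, first standard cycle
```

Under this reading, the touchdown phase ends at 51°C and the standard phase continues at 50°C, so the annealing temperature falls steadily without a jump between the two phases.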
In addition, 18 previously used Y-STRs (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a/b, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, Y-GATA-H4, and DYS388) were genotyped. All except DYS388 were typed using the AmpFlSTR® Yfiler® PCR amplification kit (Applied Biosystems) according to the instructions provided by the manufacturer. DYS388 was genotyped as described elsewhere. Furthermore, 15 autosomal STRs were genotyped in the HGDP-CEPH samples using the PowerPlex 16 kit (Promega) according to the instructions provided by the manufacturer.
Haplotype resolution was estimated by dividing the number of haplotypes identified in the sample by the total number of samples per global or regional dataset. Evaluation of Y-STR loci according to their contribution towards increasing haplotype resolution was achieved by applying a hill-climbing approach. The marker with the highest variability was selected first. The next marker was then chosen according to the maximal increase in the resolution of the haplotype containing the already-chosen markers and the additional marker. To reduce computing time, individuals that were already distinguished by the chosen markers were excluded from the next calculation. In this way, we obtained the minimal number of markers that provided the maximal increase in haplotype resolution in a given set of individuals. Haplotype resolution was preferred for ranking the Y-STR loci according to their ability to maximize male lineage differentiation, as it is a more direct measure for resolving haplotypes than the often-used haplotype diversity. However, we would like to point out that simple haplotype resolution as estimated here is sensitive to sample size effects, which should be considered when interpreting large values from small sample sizes. Mutation rates were estimated using a Bayesian-based approach as described in detail elsewhere. In brief, a hierarchical binomial model was used. Three different MCMC chains were run in parallel for each mutation rate estimation in order to improve chain mixing. Each MCMC chain consisted of 100,000 runs but only the last 50,000 were retained. Of the 50,000 retained simulations per parameter estimation, a thinning interval of 15 was applied to reduce the autocorrelation between adjacent steps. Therefore, a final set of 10,000 simulations was used.
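The haplotype resolution measure and the hill-climbing marker selection described above can be sketched as follows. This is an illustrative sketch only, not the software used in the study: the function names and the toy genotypes are hypothetical, the first marker is taken to be the one with the most distinct alleles (one possible reading of "highest variability"), ties are broken alphabetically, and the speed-up of dropping already-distinguished individuals is omitted.

```python
from typing import Dict, List, Sequence, Tuple

# marker name -> allele (repeat count) of each individual, same order in all
Genotypes = Dict[str, Sequence[int]]


def resolution(data: Genotypes, markers: List[str], n: int) -> float:
    """Distinct haplotypes over `markers` divided by the number of samples."""
    profiles = {tuple(data[m][i] for m in markers) for i in range(n)}
    return len(profiles) / n


def greedy_marker_selection(data: Genotypes) -> Tuple[List[str], float]:
    """Add, one at a time, the marker that raises resolution the most."""
    n = len(next(iter(data.values())))
    chosen: List[str] = []
    best = 0.0
    remaining = sorted(data)  # alphabetical order makes tie-breaking stable
    while remaining:
        scored = [(resolution(data, chosen + [m], n), m) for m in remaining]
        top_res, top_marker = max(scored, key=lambda pair: pair[0])
        if top_res <= best:  # no remaining marker improves the resolution
            break
        chosen.append(top_marker)
        remaining.remove(top_marker)
        best = top_res
    return chosen, best


# Hypothetical toy data: four males typed at three made-up markers.
toy = {
    "M1": [10, 10, 11, 11],
    "M2": [14, 15, 14, 15],
    "M3": [20, 20, 20, 21],
}
chosen, res = greedy_marker_selection(toy)
print(chosen, res)  # ['M1', 'M2'] 1.0
```

In the toy data, two markers already separate all four males, so the third marker is never added. As with any greedy search, the selected set is not guaranteed to be the smallest possible one.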
Male lineage differentiation was investigated in 590 HGDP-CEPH samples from the H952 set using 67 Y-STRs, which identified 563 different haplotypes. Notably, we found 47 males who were involved in the sharing of 20 haplotypes representing 16 pairs, two trios, one quartet and one quintet of individuals (Table 1). All males who shared a complete 67-locus Y-STR haplotype were sampled from the same respective population. This finding is remarkable because, based on the previous analysis of 783 autosomal STRs, it was concluded that the H952 set contains no pairs of relatives closer than first cousins, with possible exceptions in the Karitiana and Surui from Brazil. We additionally tested for close relationships between the individuals sharing a complete 67-locus Y-STR haplotype by inspecting the data of 15 autosomal STRs usually applied to forensic human identity testing (PowerPlex 16 System, Promega) that were originally generated for other purposes (PdK in preparation). We found that males who share an identical Y-chromosome differed in their autosomal STR profile by 19 (63.3%) to 28 (93.3%) of the 30 possible alleles (with the exception of the Karitiana and Surui, described below). This autosomal DNA finding excludes close biological relationships and is in agreement with Rosenberg’s conclusions. However, two Karitiana males differed by only 50% of the 30 autosomal STR alleles and three Surui males by only 46.7% or 50%, respectively, indicating that a somewhat close relationship cannot be excluded in these cases, also in agreement with Rosenberg; thus, one of the Karitiana and two of the Surui were excluded from further analyses (Table 1). This left a total of 587 males in the global dataset, of which 24 could not be differentiated by the 67 Y-STRs analyzed.
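The counts in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that the Karitiana pair and the Surui trio are among the 20 shared haplotypes and that excluding one and two of their members, respectively, leaves each of those two haplotypes represented by a single male; the variable names are illustrative only.

```python
# group size -> number of shared haplotypes of that size (from the text)
groups = {2: 16, 3: 2, 4: 1, 5: 1}

males_sharing = sum(size * count for size, count in groups.items())
shared_haplotypes = sum(groups.values())

total_males = 590
# every shared haplotype counts once, every other male has a unique one
distinct_haplotypes = total_males - males_sharing + shared_haplotypes

# Assumption: the three excluded males remove no haplotype from the data.
males_after_exclusion = total_males - 3
undifferentiated = males_after_exclusion - distinct_haplotypes
resolution_pct = round(100 * distinct_haplotypes / males_after_exclusion, 1)

print(males_sharing, shared_haplotypes, distinct_haplotypes)
print(males_after_exclusion, undifferentiated, resolution_pct)
```

Under these assumptions the figures reproduce the 563 haplotypes, the 587 remaining males and the 24 undifferentiated males given in the text, together with the 95.9% maximal global resolution.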
Our Y-chromosomal and autosomal STR data, together with Rosenberg’s extensive autosomal STR data, imply that substantial male relatedness exists in the respective populations at a level more distant than first cousins. Male-line common ancestors in these populations must have lived sufficiently long ago for the observed autosomal STR diversity to have accumulated, largely by random assortment and recombination, but not so long ago that Y-STR mutations would have accumulated. It also demonstrates that males who are not closely related based on autosomal DNA evidence may nevertheless be difficult to differentiate by Y-chromosomal DNA analysis, even when large numbers of Y-STRs are used. The occurrence of this phenomenon in 13 of the 51 populations from five of the eight continental regions studied (East Asia, Middle East, Oceania, South Asia and sub-Saharan Africa) indicates that this may not be a rare phenomenon, at least in anthropological samples, i.e. samples from indigenous populations of usually small population size. Additional effects that could have caused the observed Y-chromosome sharing may include strong male bottlenecks, preferential mating and polygyny, patrilocality and strongly biased male expansion due to male occupation history and privilege.
We investigated the maximal and highest possible differentiation of the 587 HGDP males that could be obtained, both on the global and various regional scales, and the minimal number of Y-STRs needed for this (Table 2, Figures 1 and 2, Supplementary Table S2). We considered five marker sets: i) all 67 Y-STRs analysed, ii) 49 rarely-studied ssY-STRs, and for comparative reasons the three commonly-used Y-STR sets iii) 17 Y-STRs included in the AmpFlSTR® Yfiler® PCR Amplification Kit (Applied Biosystems) (Yfiler), iv) 12 Y-STRs included in the PowerPlex Y® System (Promega) (PPY), and v) 9 Y-STRs comprising the so-called Minimal Haplotype (MH). In the analysis of all 67 Y-STRs, the best 25 markers were sufficient to obtain the highest possible haplotype resolution of 95.9% in the global dataset (Table 2), which also marked the maximal resolution in this dataset (see above for why this is lower than 100%). From the 49 ssY-STRs, the best 25 markers were enough to obtain the highest possible haplotype resolution of 92.2% in the global dataset (Table 2), 3.7% lower than the maximal value. In comparison, the highest possible haplotype resolution obtainable from the global sample set using Yfiler markers, which was already reached with 15 of the 17 markers (only one of GATAH4, DYS438 and DYS391 was needed to improve resolution), was 90.5%. In particular, this value is 1.7% lower than the resolution obtained with the most informative 25 ssY-STRs, and 5.4% lower than the maximal resolution in the global dataset obtained with the best 25 of all 67 markers (although only about half of the number of markers were used with Yfiler). This shows that the male lineage differentiation obtained with the commonly-used Yfiler kit is fairly high, but also that it does not always reflect the maximum resolution, and can be further increased using additional Y-STRs (see below).
However, considerably lower resolutions were obtained with the PPY Y-STRs at 84.7% (for which all 12 markers were needed), which is 11.2% lower than maximum in the global dataset, and especially with the MH Y-STRs at 81.3% (for which all 9 markers were needed), which is 14.6% lower than the maximal resolution in the global dataset. This may be expected from the smaller number of markers involved, but does clearly illustrate the limitation of both marker sets, in particular of the Minimal Haplotype traditionally used in forensic and anthropological studies, and the need of supplementation with additional Y-STRs (see below).
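The deficits quoted for the global dataset follow directly from the reported resolutions. A short check, using only the percentages given in the text (the dictionary keys are informal labels, not names from the study):

```python
max_resolution = 95.9  # best 25 of all 67 Y-STRs, global dataset (%)

# highest possible global resolution per marker set (%), from the text
global_resolution = {
    "ssY-STRs (best 25 of 49)": 92.2,
    "Yfiler (15 of 17)": 90.5,
    "PPY (all 12)": 84.7,
    "MH (all 9)": 81.3,
}

# percentage points lost relative to the maximal resolution
deficit = {name: round(max_resolution - value, 1)
           for name, value in global_resolution.items()}

for name, lost in deficit.items():
    print(f"{name}: {lost} percentage points below the maximum")
```

Note that these differences are percentage points rather than relative percentages, which is how the text uses them.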
The separate analyses of the eight geographic regions (Figures 1 and 2, Supplementary Table S2) using the complete set of 67 Y-STRs revealed maximal haplotype resolutions ranging from 89.6% (with 11 markers) in the Middle East to 100% in Europe, North Africa and Native Americans (with eight, seven and five markers, respectively). Complete individualization in the latter three regions was observed because they did not include males with identical 67-locus Y-STR haplotypes (or potentially closely related males were excluded on the basis of autosomal DNA profiling as with Native Americans, see above). For the regional analysis using the 49 ssY-STRs, the highest possible resolution ranged from 85.4% (with 11 markers) in the Middle East, to 100% in Europe (with 11 markers). The Yfiler analysis provided highest possible regional haplotype resolutions ranging from 79.2% in the Middle East (with eight markers) to 100% in North Africa (with seven markers). As expected, and also seen in the global dataset, male differentiation was more limited when using the PPY Y-STRs (68.8% with seven markers in the Middle East to 96.4% in Europe with 10 markers) and even more so with the MH Y-STRs (from 66.6% in the Middle East with six markers to 88.1% in Europe with all nine markers), clearly demonstrating the limitations of these marker sets for male lineage differentiation in regional analyses in addition to global analyses as described above, and the need for supplementing with additional Y-STR markers (see below).
Figure 3 illustrates the loss of resolution with the various Y-STR marker sets in relation to the maximal male lineage differentiation as obtained with the most informative 25 of all 67 Y-STRs studied. Loss of resolution obtained with the minimal set of ssY-STRs ranged from zero (three markers) for Oceania to 11.8% (three markers) for Native Americans; from zero in Oceania (three markers) and North Africa (seven markers) to 10.4% in the Middle East (eight markers) for the minimal set of the Yfiler markers; from zero in Oceania (three markers) to 20.8% in the Middle East (seven markers) for the minimal set of the PPY markers; and from 6.3% in Oceania (three markers) to 25% in North Africa (six markers) for the minimal set of the MH markers. We found that the highest possible haplotype resolution usually decreased with decreasing numbers of markers initially considered in the marker sets (Figure 1), and, consequently, loss of resolution increased (Figure 3), both in the global as well as the regional datasets. This may be expected as more markers considered initially provide more possibilities to find the minimal number with maximal resolution, but again demonstrates the limitation of the commonly-used Y-STR sets. Exceptions were Oceania, where the highest possible haplotype resolution was obtained with all minimal marker sets (except for MH); North Africa as well as Native Americans, where the highest possible haplotype resolution provided by the minimal set of Yfiler markers was higher than that obtained with the minimal set of ssY-STRs; and again Native Americans, where the same maximal haplotype resolution was obtained with the minimal sets of PPY and MH markers. We also found that the minimal number of Y-STRs needed to reach highest possible haplotype resolution usually decreased with decreasing numbers of markers initially included in the marker sets, although some exceptions were observed (Figure 2).
Most notable were the findings in Oceania, where only three Y-STRs from all five sets (although not always the same ones) were sufficient to provide maximal possible haplotype resolution, which additionally was identical for four of the five sets. This can be explained by strong bottleneck / founder effects in the regional population history and can be additionally influenced by the low sample size of 16 individuals from the two populations analysed.
Furthermore, we investigated how many and which of the Y-STRs from the five marker sets were most informative for maximizing male lineage differentiation in the global and regional datasets (Table 2, Figure 2, Supplementary Table S2). From the 67 Y-STRs, three (Oceania) to 26 (South Asia) were needed to reach maximal resolution in the regional datasets, compared with the 25 best markers that provided maximal resolution in the global dataset. Highest possible differentiation was reached with three (Oceania) to 15 (South Asia) ssY-STR markers compared with the 25 best ssY-STRs that achieved highest possible resolution in the global dataset. Of the 17 Yfiler markers, between three (Oceania) and 12 (South Asia) were needed for the highest possible resolution (15 markers were needed in the global analysis), as well as between three (Oceania) and 10 (South Asia) for the 12 PPY markers (all 12 markers were also needed in the global analysis), and between three (Oceania) and all nine (Europe) for the MH markers (all nine markers were also needed in the global analysis).
The results from the global analysis suggest that three rarely-studied ssY-STRs (DYS570, DYS576, and DYS481) partly in addition to two commonly-used Y-STRs (DYS385, DYS458) contributed most to haplotype resolution (Table 2). However, the regional analyses revealed more heterogeneity in the most resolving markers (see Supplementary Table S2), as may be expected. The contribution of DYS385 may be overestimated since the two loci were not separated experimentally; instead, allele-locus assignment was performed artificially by assigning the smaller of the two alleles to one locus and the larger to the other, which has been shown to not always represent the true assignment when experimentally separating both alleles. Moreover, two other ssY-STRs, DYS549 and DYS533 (also DYS643 to a certain extent), scored high in the haplotype resolution ranking list of the 49 ssY-STRs, but somewhat less so in the 67 Y-STR analyses (Table 2). The three most informative ssY-STRs DYS570, DYS576, and DYS481 as identified here in the worldwide analysis were also highlighted previously as three of the five Y-STRs that comprised the smallest set of loci in supplementing the nine MH Y-STRs leading to 100% resolution in six regional German, one Dutch and one Turkish population sample, respectively. Four highly resolving ssY-STRs from our study, DYS481, DYS570, DYS576 and DYS549, were among the 14 Y-STRs that provided 99.5% and 99.7% resolution, respectively, in a European American and an African American population sample. Not surprisingly, four Y-STRs that showed the highest degree of variation in a previous study of 76 worldwide individuals (DYS481, DYS570, DYS576, and DYS643) also showed the highest values in maximizing global haplotype resolution in our study (perhaps somewhat less so for DYS643, see below for the limited value of this marker in supplementing commonly-used Y-STR sets).
Finally, we extracted the minimal set of ssY-STRs that provided highest possible haplotype resolution when supplementing the three commonly-used Y-STR sets: Yfiler, PPY and MH. This revealed the most informative ssY-STRs that can be used on top of the commonly-used Y-STR sets to further increase the resolution of male lineage differentiation in forensic or anthropological cases where males in question could not be differentiated with the conventional Y-STRs. In the global dataset, we achieved the overall maximal haplotype resolution of 95.9% with 13 ssY-STRs on top of the Yfiler markers, an increase of 5.4% (Figures 4 and 5, Table 2). For the PPY and the MH analyses, 17 and 20 ssY-STRs, respectively, were needed to reach the highest possible resolution of 94.5% (which marks 98.5% of the overall maximal resolution), representing an increase of 9.9 and 13.3% respectively (Figures 4 and 5, Table 2). In the regional analyses, too, supplementing with ssY-STRs considerably increased the haplotype resolution achieved by all three commonly-used Y-STR sets (except where the maximal value was already reached by the original marker sets), in most cases with only a few ssY-STR markers (Figures 4 and 5, Supplementary Table S2). Gain of resolution ranged from 1.2% in Europe to 10.4% in the Middle East for the Yfiler supplementary analysis; from 3.6% in Europe to 20.8% in the Middle East for the PPY supplementary analysis; and from 5.9% in Native Americans to 22.9% in the Middle East for the MH supplementary analysis. The number of ssY-STRs required to reach the highest possible haplotype resolution on top of the three commonly-used Y-STR sets ranged from one in Europe to eight in South Asia supplementing Yfiler, from one in Native Americans to nine in South Asia supplementing PPY, and from one in Oceania and Native Americans to 11 in South Asia supplementing MH (Figure 5).
The most informative ssY-STR marker to supplement commonly-used Y-STR sets for maximizing male lineage differentiation was DYS576 followed by DYS549, DYS533 and DYS570 in the global analyses of all three commonly-used marker sets (Table 2), whereas other ssY-STRs were more informative in some of the various regional analyses (Supplementary Table S2).
To understand the discriminatory power of ssY-STRs and to stimulate their application to forensic and evolutionary studies, we estimated the mutation rates of the 49 rarely-studied ssY-STRs for which no mutation data were previously available. For this purpose, 104 members of 28 deep-rooted pedigrees covering 403 meiotic transfers per locus were genotyped. Bayesian-based estimates of median locus-specific mutation rates ranged from 4.8×10−4 (95% credible interval: 2×10−5-3×10−3) for 31 loci where no mutation was observed, to 1.3×10−2 (95% CI: 4.8×10−3-2.7×10−2) for DYS570 (7/403) and 1.5×10−2 (95% CI: 6.2×10−3-2.9×10−2) for DYS576 (8/403) (Table 3). The median mutation rate across all 49 ssY-STRs was 7.9×10−4 (95% CI: 2.8×10−4-1.5×10−3), which is somewhat lower than the median mutation rate of 2.2×10−3 (95% CI: 1.9×10−3-2.6×10−3) previously estimated across 16 Yfiler Y-STRs in a comprehensive family-based study based on more than 135,000 meiotic transfers. This difference may simply reflect stochastic fluctuation caused by the small number of meiotic transfers covered in this preliminary mutation analysis. Alternatively, this finding may be influenced by the ascertainment of the ssY-STRs used here, which was less biased towards highly-variable ones than for the commonly-used Y-STRs. More pedigree and/or family data are needed to establish a reliable estimate of the mutation rates and characteristics of these ssY-STRs. However, the mutation rates estimated here for DYS570 and DYS576 at >1% are the highest reported for any Y-STR locus studied thus far, and confirmation in a large number of father-son pairs is currently underway (M.K. et al. in preparation).
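For orientation, the raw observed proportions behind the two highest estimates can be computed directly from the counts given above. This is plain arithmetic and not the hierarchical Bayesian model used in the study, so the values are not expected to equal the reported posterior medians; the variable names are illustrative.

```python
transfers = 403  # meiotic transfers per locus (from the text)

# observed mutations per locus (from the text)
observed = {"DYS570": 7, "DYS576": 8}

# posterior medians reported in the text, for comparison only
reported_median = {"DYS570": 1.3e-2, "DYS576": 1.5e-2}

for locus, mutations in observed.items():
    raw = mutations / transfers
    print(f"{locus}: raw proportion {raw:.4f}, "
          f"reported median {reported_median[locus]:.4f}")
```

The raw proportions (about 1.7×10−2 and 2.0×10−2) are somewhat higher than the reported medians, which is consistent with a hierarchical model that shares information across loci, although the source does not state this explicitly.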
Notably, the three markers with the highest mutation rate estimates here, DYS570, DYS576 and DYS481 (0.4×10−2; 95% CI: 1×10−3-1.4×10−2), were also the three ssY-STRs that contributed most to the haplotype resolution in the global and some regional analyses (Table 2, Supplementary Table S2); additionally, DYS576 (and to a lesser extent also DYS570) was most informative in supplementing commonly-used Y-STR sets in global analyses (Table 2, Supplementary Table S2). Also notable, although no mutation was observed among the 403 meiotic transfers considered here for DYS643, this locus was found to be similarly polymorphic to DYS576, DYS570, and DYS481 in a previous study of 76 worldwide individuals, and also ranked fourth in the list of the 25 most informative ssY-STRs (but did not appear as highly informative when supplementing commonly-used Y-STR sets, see Table 1).
Simple single-copy Y-STRs have several advantages since their simple repetitive structure and unique copy number make fragment length analysis and allele-locus assignment unequivocal. The ability to provide very high resolution male lineage differentiation when used as new sets, or to markedly improve the haplotype resolution achieved by the three commonly-used Y-STR sets when used as supplementary sets, provides clear arguments in favour of using ssY-STRs in forensic and anthropological studies. This is especially so when performing Y-STR typing for forensic applications in populations with reduced Y-chromosome diversity, or for male relative differentiation in any population. Our results provide recommendations on how to supplement commonly-used Y-STR marker sets with additional ssY-STRs for maximizing male lineage differentiation, with DYS576, DYS570, DYS549 and DYS533 being suggested as the most promising markers. Alternatively, our study may lead to the application of new sets of ssY-STRs avoiding complications introduced by complex and multi-copy Y-STRs, in particular the best 25 ssY-STR markers identified here that achieved 96.1% of the maximal haplotype resolution in the global dataset, with DYS570, DYS576, DYS481, DYS549, DYS533, and DYS643 recommended as the most promising markers for this. The mutation data of the ssY-STRs generated here suggest that many of the ssY-STRs studied may mutate somewhat more slowly than most of the currently-used Y-STRs, providing advantages for paternity testing and anthropological studies. However, two ssY-STRs (DYS570 and DYS576) appear to have a higher than usual mutation rate, providing advantages in resolving male lineages including differentiation of male relatives in forensic and anthropological studies.
We thank the original donors and CEPH for providing the HGDP DNA samples. We are grateful to the contributing members of the deep-rooted pedigrees. We thank Si-Keun Lim for providing information on genotyping assays before publication. Kaye Ballantyne is acknowledged for useful comments on the manuscript. AW was supported by the Deutsche Forschungsgemeinschaft (SFB 680 to MK), QY by a Joint Project from the Natural Scientific Foundation of China and the Royal Society, and YX and CTS by The Wellcome Trust. This study was supported by funds from the Netherlands Forensic Institute (to MK) and received additional support by a grant from the Netherlands Genomics Initiative / Netherlands Organization for Scientific Research (NWO) within the framework of the Forensic Genomics Consortium Netherlands (to MK and PdK).