Human Y-chromosomal short tandem repeat polymorphisms (Y-STRs), or microsatellites, usually combined into haplotypes, are widely used to resolve and relate male lineages in forensic, genealogical, evolutionary and anthropological studies [1]. Haplotype resolution is central to most applications of Y-STRs and depends not only on the number of markers used but also on their independent mutability. Differences in Y-STR diversity and allelic spectra between geographic regions are well known [5], and can sometimes be used to infer the geographic region of a person's paternal ancestry [7]. However, these differences can also lead to differences in haplotype resolution between loci and between geographic regions [9]. At a more local level, higher frequencies of indistinguishable Y-chromosomes can sometimes be found because members of the same male lineage live in the same geographic region, a phenomenon usually referred to as (male) population substructure. It has been observed that 7 to 16 highly polymorphic Y-STRs are insufficient for differentiating male lineages in populations that underwent a strong (male) bottleneck in their history: for example, identical Y-STR haplotypes were found in two populations from Pakistan at a frequency of 14% (16 Y-STRs) [10], in Finns at 13% (16 Y-STRs) [11], and even across entire geographic regions such as Polynesia at 16% (7 Y-STRs) [12]. Reduced Y-STR diversity leading to a large number of indistinguishable Y-STR haplotypes can also be caused by cultural effects such as patrilocal residence patterns and polygyny, as previously observed in New Guinea [13], or by strongly biased male expansion due to male occupation history and privilege, as in Central and Eastern Asia [14]. The resulting lack of resolution can be especially problematic when a limited number of Y-STRs is used in forensic applications of male lineage identification.
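To make the notion of haplotype resolution concrete, the following minimal Python sketch computes three common summary statistics from a list of haplotypes: discrimination capacity (distinct haplotypes divided by sample size), Nei's unbiased haplotype diversity, and the fraction of males whose haplotype is shared with at least one other sampled male. The function name `haplotype_resolution` and the toy repeat counts are illustrative assumptions, and these are not necessarily the exact measures behind the percentages cited above.

```python
from collections import Counter


def haplotype_resolution(haplotypes):
    """Summarise how well a marker set separates the sampled males.

    `haplotypes` is a list of tuples, one tuple of repeat counts per male.
    Returns (discrimination_capacity, haplotype_diversity, shared_fraction).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)

    # Discrimination capacity: number of distinct haplotypes / sample size.
    discrimination_capacity = len(counts) / n

    # Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    haplotype_diversity = n / (n - 1) * (1 - sum_sq)

    # Fraction of males whose haplotype also occurs in another sampled male.
    shared_fraction = sum(c for c in counts.values() if c > 1) / n

    return discrimination_capacity, haplotype_diversity, shared_fraction


# Toy data (hypothetical repeat counts at three unnamed loci).
sample = [
    (14, 13, 30),
    (14, 13, 30),  # indistinguishable from the first male
    (15, 12, 29),
    (16, 13, 31),
]
dc, hd, shared = haplotype_resolution(sample)
print(dc, round(hd, 4), shared)
```

In this toy sample, two of the four males carry the same haplotype, so adding loci that separate them would raise all three measures of resolution.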
Therefore, more Y-STRs than are included in the three commonly used sets (the 9 Y-STRs comprising the so-called Minimal Haplotype, the 12 Y-STRs included in the PowerPlex Y® System [Promega], or the 17 Y-STRs from the AmpFlSTR® Yfiler® PCR Amplification Kit [Applied Biosystems]) are needed to improve the resolution of male lineage differentiation in particular populations, and also to differentiate male relatives in any population. A large number of additional Y-STRs (166) have been described previously [16]; however, population-genetic data are still scarce for most of these additional markers. In this study, we analysed 67 Y-STRs in 590 unrelated males from 51 globally distributed populations covering eight geographic regions from all inhabited continents except Australia (the HGDP-CEPH panel [17]). These 67 Y-STRs comprise 18 previously used Y-STRs, including all of those that are part of commercially available Y-STR kits, as well as 49 additional Y-STRs described recently [16]. From the latter, we chose simple loci that exist in a single copy on the non-recombining part of the human Y-chromosome and contain only one uninterrupted variable stretch of repeats, to avoid the problems of length homoplasy (as with complex Y-STRs) and allele-locus assignment (as with multi-copy Y-STRs). Simple Y-STRs also have a great advantage over complex markers because of a more direct relationship between mutation rate and length variation [16], which is relevant in evolutionary studies. Previous analysis suggests that simple single-copy Y-STRs may lead to more precise time estimates when applied to male lineage dating in anthropological and evolutionary studies, owing to a higher correlation between repeat count and repeat variance compared with complex Y-STRs [18]. Single-copy Y-STRs do not suffer from the equivocal allele-locus assignment usually associated with multi-copy Y-STRs, which may result in an underestimation of haplotype resolution and can additionally cause problems in correctly inferring the number of males who contributed to a crime scene sample in forensic studies.
Here, we investigate male lineage differentiation at both global and regional levels, considering the total of 67 Y-STRs as well as the set of 49 rarely studied simple single-copy Y-STRs (ssY-STRs) alone. In addition, we study how much global and regional haplotype resolution improves when the most informative ssY-STRs are added to the three commonly used sets of Y-STRs. Finally, we estimate mutation rates for all 49 ssY-STRs by analysing deep-rooted pedigrees, to understand the basis of their value in resolving male lineages and to stimulate future use of these markers in forensic, genealogical, and anthropological studies where knowledge of mutation rates is crucial.
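The idea of adding the most informative markers to an existing set can be illustrated with a small greedy sketch in Python: starting from a base set of loci, it repeatedly adds the candidate locus that yields the largest number of distinct haplotypes. This section does not state how marker informativeness is ranked in the study, so the greedy criterion, the function names `count_distinct` and `greedy_add`, and the locus labels and repeat counts below are all hypothetical.

```python
def count_distinct(profiles, loci):
    """Number of distinct haplotypes when only the given loci are typed."""
    return len({tuple(p[locus] for locus in loci) for p in profiles})


def greedy_add(profiles, base_loci, candidate_loci, k):
    """Greedily pick up to k candidate loci that best increase resolution.

    `profiles` is a list of dicts mapping locus label -> repeat count.
    Stops early once no remaining candidate separates any more haplotypes.
    """
    chosen = list(base_loci)
    remaining = list(candidate_loci)
    added = []
    for _ in range(min(k, len(remaining))):
        current = count_distinct(profiles, chosen)
        # Candidate giving the most distinct haplotypes with the current set.
        best = max(remaining, key=lambda loc: count_distinct(profiles, chosen + [loc]))
        if count_distinct(profiles, chosen + [best]) == current:
            break  # no further gain in resolution
        chosen.append(best)
        remaining.remove(best)
        added.append(best)
    return added


# Toy data: one base locus and two candidate loci (hypothetical labels).
profiles = [
    {"base": 14, "x": 10, "y": 20},
    {"base": 14, "x": 10, "y": 21},
    {"base": 14, "x": 11, "y": 22},
    {"base": 15, "x": 11, "y": 22},
]
picked = greedy_add(profiles, ["base"], ["x", "y"], k=2)
print(picked)
```

In this toy example the base locus alone separates two haplotypes; adding `y` separates all four males, after which `x` contributes nothing further and is not selected.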