Most of the STR markers used in population and evolutionary studies of the human Y chromosome have been tri- or tetranucleotide repeats (e.g. in the Applied Biosystems AmpFlSTR® Yfiler™ Kit and the PowerPlex® Y System). Given the lower mutation rates of tri- and tetranucleotide STRs compared to dinucleotide loci, it is theoretically plausible that penta- and hexanucleotide repeats evolve at a lower rate than tri- and tetranucleotide repeats, although still much faster than SNPs. They should therefore prove to be an attractive class of STR markers for Y chromosome population studies and forensic relationship testing.
If a population is at mutation-drift equilibrium, the variance at an STR locus is proportional to the (effective) mutation rate [17]. In equilibrium, the variance ratio between penta/hexa and tri/tetra STRs multiplied by the mutation rate of tri- and tetranucleotide markers would give the mutation rate of penta- and hexanucleotide STRs. However, variation within any haplogroup in any human population is far from equilibrium. A quantity that can represent the effective mutation rate among the penta- and hexanucleotide markers studied is the within-population, within-haplogroup STR variation averaged across various populations and haplogroups. Bearing this in mind, it is important to use as much data as possible in order to capture the full range of Y-STR variation. For this reason, we included 115 samples from the R1 clade, comprising two common haplogroups showing opposite clinal patterns [26], [27] in Europe (R1a and R1b1b2) and one rare haplogroup that has apparently gone through bottlenecks and/or founder effects (R1b1b1). Both the average repeat variance and the average diversity vary considerably between different data sets and haplogroups within our data; studies with larger data sets would therefore improve on our results. Nevertheless, this study shows consistent average repeat variance and diversity ratios of approximately 0.5 between penta/hexa and tri/tetra markers, which allows us to estimate that the average mutation rate of penta- and hexanucleotide STRs is around half that of tri- and tetranucleotide STRs. The major contributors to this difference are penta- and tetranucleotide markers; we cannot draw any conclusions from hexa- and trinucleotide markers because too few loci of these types were available. Overall, we notice a trend that STRs with a larger repeat unit exhibit lower variation.
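The scaling argument in this paragraph can be illustrated with a minimal Python sketch. It assumes the sample variance of repeat counts as the per-locus repeat variance and Nei's unbiased gene diversity as the diversity measure; neither definition is spelled out in this section, and the function names and toy repeat counts are hypothetical. Only the tri/tetra rate of 6.9×10⁻⁴ per 25 years [17] is taken from the text.

```python
from statistics import mean, variance

TRI_TETRA_RATE = 6.9e-4  # per locus per 25 years, value quoted in the text [17]

def repeat_variance(repeat_counts):
    """Sample variance of repeat counts at one locus (assumed estimator)."""
    return variance(repeat_counts)

def gene_diversity(repeat_counts):
    """Nei's unbiased gene diversity (assumed definition of 'diversity')."""
    n = len(repeat_counts)
    freqs = [repeat_counts.count(allele) / n for allele in set(repeat_counts)]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))

def scaled_rate(penta_hexa_loci, tri_tetra_loci, tri_tetra_rate=TRI_TETRA_RATE):
    """Return (variance ratio, implied penta/hexa mutation rate).

    Each argument is a list of loci; each locus is a list of repeat counts
    observed within one haplogroup of one population.
    """
    ratio = (mean(repeat_variance(locus) for locus in penta_hexa_loci)
             / mean(repeat_variance(locus) for locus in tri_tetra_loci))
    return ratio, ratio * tri_tetra_rate

# Toy repeat counts, invented for illustration only.
penta_hexa = [[4, 6, 5, 5]]
tri_tetra = [[10, 12, 10, 12]]
ratio, rate = scaled_rate(penta_hexa, tri_tetra)
print(f"variance ratio = {ratio:.2f}, implied rate = {rate:.2e} per 25 years")
```

In practice the ratio would be averaged over several populations and haplogroups, as described above, rather than computed from a single sample set.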
Since repeat complexity and repeat count (in the case of complex STRs, the repeat count of the longest homogeneous array) have also been reported to influence STR marker variation [7], we analysed our markers according to these features in order to ascertain whether the difference observed between tri/tetra and penta/hexa marker variation was indeed due to repeat unit size. Based on the limited number of markers included in the present study, repeat variance and diversity averaged across simple versus complex repeats (disregarding repeat unit size) showed hardly any difference, whereas repeat count did seem to affect marker variation, especially repeat variance (higher repeat variance corresponding to higher repeat count), the latter observation confirming previous results
[7]. Our data set and that of [7] are not readily comparable: the latter has a large number of loci and a small number of samples, whereas ours has a small number of loci and a larger number of samples. We therefore cannot state definitively whether STR marker variation depends on repeat unit size, repeat count, or both. However, sequence composition is unlikely to account for the observed difference, since neither Student's nor Welch's t test showed any significant difference in the sequence composition of penta/hexa versus tri/tetra markers (calculating the proportions of the nucleotides in the repeats and considering that A = T and G = C; p > 0.2 for each test).
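The composition comparison can be sketched as follows. This is a minimal Python illustration, not the analysis pipeline used in the study: the helper names are hypothetical, the repeat motifs are invented examples rather than the motifs of the loci analysed, and treating A = T and G = C is implemented here as the combined A+T fraction of each motif. Converting the statistics to p-values requires the t distribution and is omitted.

```python
from math import sqrt
from statistics import mean, variance

def at_fraction(motif):
    """Fraction of A or T bases in a repeat motif (A and T pooled, G and C pooled)."""
    return sum(base in "AT" for base in motif.upper()) / len(motif)

def student_t(a, b):
    """Student's two-sample t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """Welch's t statistic (unequal variances) and Welch-Satterthwaite df."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Invented example motifs, for illustration only.
tri_tetra = [at_fraction(m) for m in ["GATA", "TCTA", "TAT", "AGAT"]]
penta_hexa = [at_fraction(m) for m in ["TTTTC", "AAAAT", "TTCTC", "CTTTTT"]]

print("Student:", student_t(tri_tetra, penta_hexa))
print("Welch:  ", welch_t(tri_tetra, penta_hexa))
```

Running both tests guards against the case where the two marker classes differ in the spread of their composition values, which would violate the equal-variance assumption of Student's test.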
In order to compare age estimates based on tri- and tetranucleotide versus penta- and hexanucleotide markers, coalescence ages of Y chromosome haplogroups were calculated based on both the tri/tetra and the penta/hexa STR results, using the previously estimated mutation rate of 6.9×10⁻⁴ per 25 years [17] for the tri/tetra markers and a two times lower mutation rate of 3.45×10⁻⁴ per 25 years for the penta/hexa markers. For our calculations, different sample sets representing various Y chromosome clades were assembled to compare the age estimates of tri/tetra or penta/hexa STRs to SNP-based estimates
[24]. The results show that in most cases, coalescence age estimates based on the tri/tetra and penta/hexa marker clocks are comparable, although the error margins are rather wide. While within the R clade the SNP-based age estimate is, as expected, lower than the STR-based estimates, it is greater than the STR-based estimates for the older clades K, F, and CF. This indicates STR locus saturation, which seems to occur more rapidly in the case of tri- and tetranucleotide markers (the age estimate for the CF clade based on tri/tetra marker results is 42,200 years, considerably lower than the estimate of 64,700 years based on penta/hexa marker results and the estimate of 68,900 years based on SNP marker results [24]). On the whole, absolute age estimates vary considerably and are therefore rather unreliable, while relative age estimates show patterns more consistent with the relative age distribution of SNP-defined haplogroups.
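How a mutation rate expressed per 25 years translates into an age can be shown with a short Python sketch. The estimator is an assumption, since this section does not state it: age is taken as the average squared difference in repeat count from a founder haplotype (here approximated by the per-locus median), averaged over loci, divided by the mutation rate and multiplied by 25 years. The function name and the toy haplotypes are hypothetical; the two rates are those given in the text.

```python
from statistics import mean, median

TRI_TETRA_RATE = 6.9e-4    # per locus per 25 years [17]
PENTA_HEXA_RATE = 3.45e-4  # half the tri/tetra rate, as used in the text
YEARS_PER_STEP = 25

def coalescence_age(haplotypes, rate):
    """Estimate a coalescence age in years (assumed estimator, see lead-in).

    haplotypes: list of haplotypes, each a list of repeat counts in a fixed
    locus order. The founder allele at each locus is approximated by the median.
    """
    loci = list(zip(*haplotypes))  # transpose to one tuple of counts per locus
    asd = mean(
        mean((count - median(locus)) ** 2 for count in locus)
        for locus in loci
    )
    return asd / rate * YEARS_PER_STEP

# Toy haplotypes (two loci, four samples), invented for illustration only.
haplotypes = [[10, 5], [11, 5], [10, 5], [9, 5]]
print(round(coalescence_age(haplotypes, TRI_TETRA_RATE)))
```

Because the age scales inversely with the rate, the same observed variance dated with the penta/hexa rate yields twice the age obtained with the tri/tetra rate.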
The penta- and hexanucleotide markers analysed were relatively more clock-like in their behaviour (α = 0.5–1.7) than the tri- or tetranucleotide loci in their variance time series. DYS392, Y PENTA 1, and DYS437 were not variable enough to be informative within a time frame of 20,000 years, particularly considering our limited sample sizes; on the other hand, DYS456, DYS458, and DYS391 appeared to be quickly saturated. The generally clock-like behaviour of penta- and hexanucleotide markers underlines their applicability in evolutionary studies.
Based on our results, penta- and hexanucleotide STR markers surpass tri- and tetranucleotide markers in the ability to distinguish Y chromosome haplogroups without SNP data. Their ability to group samples according to their haplogroups is confirmed by the results of the combined Fisher test showing significant differences in repeat score distributions of penta/hexa loci between different haplogroups. Although the establishment of reliable phylogenetic relations requires additional SNP marker data, STRs can be used to distinguish Y chromosome haplogroups and, in some cases, subdivisions within haplogroups, as we show in this study for R1a and R1b1b. Our findings show that in some cases, samples can be accurately assigned to Y chromosome haplogroups based solely on Y-STRs, corroborating the conclusion of a recent study [9].
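One way to combine per-locus evidence into a single test is Fisher's method for combining p-values, sketched below in Python. That the "combined Fisher test" refers to this method is an assumption, as are the function name and the example p-values, which are invented. The statistic −2 Σ ln pᵢ follows a chi-square distribution with 2k degrees of freedom under the null hypothesis, and for even degrees of freedom the tail probability has a closed form.

```python
from math import exp, log

def fisher_combined(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (chi-square statistic, combined p-value). The statistic has
    2k degrees of freedom for k p-values; for even df the chi-square tail
    probability is exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    statistic = -2 * sum(log(p) for p in p_values)
    half = statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, exp(-half) * total

# Hypothetical per-locus p-values for differences between two haplogroups.
per_locus_p = [0.04, 0.20, 0.01, 0.15]
statistic, combined_p = fisher_combined(per_locus_p)
print(f"chi-square = {statistic:.2f}, combined p = {combined_p:.4f}")
```

The method assumes that the per-locus tests are independent, which is only an approximation for loci on the non-recombining part of the Y chromosome.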
In conclusion, our results indicate that STRs with larger repeat units evolve at a lower rate. This must be taken into account when estimating STR mutation rates. Together with the slower locus saturation and the generally clock-like behaviour exhibited by the penta- and hexanucleotide markers analysed in this study, it makes STRs with longer repeat units well suited to population and evolutionary studies, perhaps even more so than their counterparts with shorter repeat units.