1.  The phylogenetic and geographic structure of Y-chromosome haplogroup R1a 
R1a-M420 is one of the most widely spread Y-chromosome haplogroups; however, its substructure within Europe and Asia has remained poorly characterized. Using a panel of 16 244 male subjects from 126 populations sampled across Eurasia, we identified 2923 R1a-M420 Y-chromosomes and analyzed them to a highly granular phylogeographic resolution. Whole Y-chromosome sequence analysis of eight R1a and five R1b individuals suggests a divergence time of ∼25 000 (95% CI: 21 300–29 000) years ago and a coalescence time within R1a-M417 of ∼5800 (95% CI: 4800–6800) years. The spatial frequency distributions of R1a sub-haplogroups conclusively indicate two major groups, one found primarily in Europe and the other confined to Central and South Asia. Beyond the major European versus Asian dichotomy, we describe several younger sub-haplogroups. Based on spatial distributions and diversity patterns within the R1a-M420 clade, particularly rare basal branches detected primarily within Iran and eastern Turkey, we conclude that the initial episodes of haplogroup R1a diversification likely occurred in the vicinity of present-day Iran.
2.  Regional Differences in the Accumulation of SNPs on the Male-Specific Portion of the Human Y Chromosome Replicate Autosomal Patterns: Implications for Genetic Dating 
PLoS ONE  2015;10(7):e0134646.
Factors affecting the rate and pattern of the mutational process are being identified for human autosomes, but the same relationships for the male-specific portion of the Y chromosome (MSY) are not established. We considered 3,390 mutations occurring in 19 sequence bins identified by sequencing 1.5 Mb of the MSY from each of 104 present-day chromosomes. The occurrence of mutations was not proportional to the number of sequenced bases in each bin, with a 2-fold variation. The regression of the number of mutations per unit sequence against a number of indicators of the genomic features of each bin revealed the same fundamental patterns as in the autosomes. By considering the sequences of the same region from two precisely dated ancient specimens, we obtained a calibrated region-specific substitution rate of 0.716 × 10⁻⁹/site/year. Despite its lack of recombination and other peculiar features, the MSY then resembles the autosomes in displaying a marked regional heterogeneity of the mutation rate. An immediate implication is that a given figure for the substitution rate only makes sense if bound to a specific DNA region. By strictly applying this principle we obtained an unbiased estimate of the antiquity of lineages relevant to the genetic history of the human Y chromosome. In particular, the two deepest nodes of the tree highlight the survival, in Central-Western Africa, of lineages whose coalescence (291 ky, 95% C.I. 253–343) predates the emergence of anatomically modern features in the fossil record.
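The link between a region-specific substitution rate and a lineage age can be illustrated with a minimal strict-clock sketch. It assumes that a lineage of age T accumulates about rate × sites × T substitutions, and that an ancient specimen's lineage falls short of present-day lineages by the substitutions expected over the specimen's age. The rate and the 1.5 Mb region size are taken from the abstract above; the function names and the strict-clock simplification are assumptions made for illustration and are not the authors' dating procedure.

```python
# Values quoted in the abstract above.
RATE_PER_SITE_PER_YEAR = 0.716e-9
SEQUENCED_SITES = 1.5e6  # 1.5 Mb of the MSY


def calibrate_rate(missing_mutations, sites, specimen_age_years):
    """Rate implied by a dated ancient lineage (hypothetical helper).

    `missing_mutations` is how many fewer substitutions the ancient lineage
    carries than present-day lineages since their shared ancestor.
    """
    return missing_mutations / (sites * specimen_age_years)


def expected_mutations(age_years, rate=RATE_PER_SITE_PER_YEAR,
                       sites=SEQUENCED_SITES):
    """Substitutions expected on one lineage of the given age (strict clock)."""
    return rate * sites * age_years


def age_from_mutations(n_mutations, rate=RATE_PER_SITE_PER_YEAR,
                       sites=SEQUENCED_SITES):
    """Invert the clock: lineage age implied by a substitution count."""
    return n_mutations / (rate * sites)


if __name__ == "__main__":
    # Substitutions expected over the sequenced region for the 291 ky node.
    print(expected_mutations(291_000))
    # The same count dated with a rate twice as high gives half the age,
    # which is why a rate is only meaningful for the region it was fitted on.
    n = expected_mutations(291_000)
    print(age_from_mutations(n, rate=2 * RATE_PER_SITE_PER_YEAR))
```

Because the age scales inversely with the rate, applying a rate calibrated on one region to sequence from another region with a different mutability would bias every date by the ratio of the two rates.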
3.  Distinguishing the co-ancestries of haplogroup G Y-chromosomes in the populations of Europe and the Caucasus 
European Journal of Human Genetics  2012;20(12):1275-1282.
Haplogroup G, together with J2 clades, has been associated with the spread of agriculture, especially in the European context. However, interpretations based on simple haplogroup frequency clines do not recognize underlying patterns of genetic diversification. Although progress has recently been made in resolving the haplogroup G phylogeny, a comprehensive survey of the geographic distribution patterns of the significant sub-clades of this haplogroup has not yet been conducted. Here we present the haplogroup frequency distribution and STR variation of 16 informative G sub-clades by evaluating 1472 haplogroup G chromosomes belonging to 98 populations ranging from Europe to Pakistan. Although no basal G-M201* chromosomes were detected in our data set, the homeland of this haplogroup has been estimated to be somewhere near eastern Anatolia, Armenia or western Iran, the only areas characterized by the co-presence of deep basal branches as well as the occurrence of high sub-haplogroup diversity. The P303 SNP defines the most frequent and widespread G sub-haplogroup. However, its sub-clades have more localized distributions, with the U1-defined branch largely restricted to the Near/Middle East and the Caucasus, whereas L497 lineages essentially occur in Europe, where they likely originated. In contrast, the only U1 representative in Europe is the G-M527 lineage, whose distribution pattern is consistent with regions of Greek colonization. No clinal patterns were detected, suggesting that the distributions are rather indicative of isolation by distance and demographic complexities.
Y-chromosome; haplogroup G; human evolution; population genetics
4.  Afghan Hindu Kush: Where Eurasian Sub-Continent Gene Flows Converge 
PLoS ONE  2013;8(10):e76748.
Although Afghanistan is located at the crossroads of Asia, the genetics of its populations has been largely overlooked. The country is currently inhabited by five major ethnic populations: Pashtun, Tajik, Hazara, Uzbek and Turkmen. Here we present autosomal data from a subset of our samples, together with mitochondrial and Y-chromosome data from over 500 Afghan samples among these five ethnic groups. These Afghan data were supplemented with the same Y-chromosome analyses of samples from Iran, Kyrgyzstan, Mongolia and updated Pakistani samples (HGDP-CEPH). The data presented here were integrated into existing knowledge of pan-Eurasian genetic diversity. The pattern of genetic variation, revealed by structure-like and Principal Component analyses and Analysis of Molecular Variance, indicates that the people of Afghanistan are made up of a mosaic of components representing various geographic regions of Eurasian ancestry. The absence of a major Central Asian-specific component indicates that the Hindu Kush, like the gene pool of Central Asian populations in general, is a confluence of gene flows rather than a source of distinctly autochthonous populations that have arisen in situ: a conclusion that is reinforced by the phylogeography of both haploid loci.
5.  Molecular Dissection of the Basal Clades in the Human Y Chromosome Phylogenetic Tree 
PLoS ONE  2012;7(11):e49170.
One hundred and forty-six previously detected mutations were more precisely positioned in the human Y chromosome phylogeny by the analysis of 51 representative Y chromosome haplogroups and the use of 59 mutations from literature. Twenty-two new mutations were also described and incorporated in the revised phylogeny. This analysis made it possible to identify new haplogroups and to resolve a deep trifurcation within haplogroup B2. Our data provide a highly resolved branching in the African-specific portion of the Y tree and support the hypothesis of an origin in the north-western quadrant of the African continent for the human MSY diversity.
6.  A major Y-chromosome haplogroup R1b Holocene era founder effect in Central and Western Europe 
The phylogenetic relationships of numerous branches within the core Y-chromosome haplogroup R-M207 support a West Asian origin of haplogroup R1b, its initial differentiation there followed by a rapid spread of one of its sub-clades carrying the M269 mutation to Europe. Here, we present phylogeographically resolved data for 2043 M269-derived Y-chromosomes from 118 West Asian and European populations assessed for the M412 SNP that largely separates the majority of Central and West European R1b lineages from those observed in Eastern Europe, the Circum-Uralic region, the Near East, the Caucasus and Pakistan. Within the M412 dichotomy, the major S116 sub-clade shows a frequency peak in the upper Danube basin and Paris area with declining frequency toward Italy, Iberia, Southern France and the British Isles. Although this frequency pattern closely approximates the spread of the Linearbandkeramik (LBK) Neolithic culture, an advent leading to a number of pre-historic cultural developments during the past ≤10 thousand years, more complex pre-Neolithic scenarios remain possible for the L23(xM412) components in Southeast Europe and elsewhere.
Y-chromosome; haplogroup R1b; human evolution; population genetics
8.  Separating the post-Glacial coancestry of European and Asian Y chromosomes within haplogroup R1a 
Human Y-chromosome haplogroup structure is largely circumscribed by continental boundaries. One notable exception to this general pattern is the young haplogroup R1a that exhibits post-Glacial coalescent times and relates the paternal ancestry of more than 10% of men in a wide geographic area extending from South Asia to Central East Europe and South Siberia. Its origin and dispersal patterns are poorly understood as no marker has yet been described that would distinguish European R1a chromosomes from Asian. Here we present frequency and haplotype diversity estimates for more than 2000 R1a chromosomes assessed for several newly discovered SNP markers that introduce the onset of informative R1a subdivisions by geography. Marker M434 has a low frequency and a late origin in West Asia bearing witness to recent gene flow over the Arabian Sea. Conversely, marker M458 has a significant frequency in Europe, exceeding 30% in its core area in Eastern Europe and comprising up to 70% of all M17 chromosomes present there. The diversity and frequency profiles of M458 suggest its origin during the early Holocene and a subsequent expansion likely related to a number of prehistoric cultural developments in the region. Its primary frequency and diversity distribution correlates well with some of the major Central and East European river basins where settled farming was established before its spread further eastward. Importantly, the virtual absence of M458 chromosomes outside Europe speaks against substantial patrilineal gene flow from East Europe to Asia, including to India, at least since the mid-Holocene.
Y chromosome; haplogroup R1a; human evolution; population genetics
9.  The coming of the Greeks to Provence and Corsica: Y-chromosome models of archaic Greek colonization of the western Mediterranean 
The process of Greek colonization of the central and western Mediterranean during the Archaic and Classical Eras has been understudied from the perspective of population genetics. To investigate the Y chromosomal demography of Greek colonization in the western Mediterranean, Y-chromosome data consisting of 29 YSNPs and 37 YSTRs were compared from 51 subjects from Provence, 58 subjects from Smyrna and 31 subjects whose paternal ancestry derives from Asia Minor Phokaia, the ancestral embarkation port to the 6th century BCE Greek colonies of Massalia (Marseilles) and Alalie (Aleria, Corsica).
19% of the Phokaian and 12% of the Smyrnian representatives were derived for haplogroup E-V13, characteristic of the Greek and Balkan mainland, while 4% of the Provencal, 4.6% of East Corsican and 1.6% of West Corsican samples were derived for E-V13. An admixture analysis estimated that 17% of the Y-chromosomes of Provence may be attributed to Greek colonization. Using the following putative Neolithic Anatolian lineages: J2a-DYS445 = 6, G2a-M406 and J2a1b1-M92, the data predict a 0% Neolithic contribution to Provence from Anatolia. Estimates of colonial Greek vs. indigenous Celto-Ligurian demography predict a maximum of a 10% Greek contribution, suggesting a Greek male elite-dominant input into the Iron Age Provence population.
Given the origin of viniculture in Provence is ascribed to Massalia, these results suggest that E-V13 may trace the demographic and socio-cultural impact of Greek colonization in Mediterranean Europe, a contribution that appears to be considerably larger than that of a Neolithic pioneer colonization.
10.  The emergence of Y-chromosome haplogroup J1e among Arabic-speaking populations 
Haplogroup J1 is a prevalent Y-chromosome lineage within the Near East. We report the frequency and YSTR diversity data for its major sub-clade (J1e). The overall expansion time estimated from 453 chromosomes is 10 000 years. Moreover, the previously described J1 (DYS388=13) chromosomes, frequently found in the Caucasus and eastern Anatolian populations, were ancestral to J1e and displayed an expansion time of 9000 years. For J1e, the Zagros/Taurus mountain region displays the highest haplotype diversity, although the J1e frequency increases toward the peripheral Arabian Peninsula. The southerly pattern of decreasing expansion time estimates is consistent with the serial drift and founder effect processes. The first such migration is predicted to have occurred at the onset of the Neolithic, and accordingly J1e parallels the establishment of rain-fed agriculture and semi-nomadic herders throughout the Fertile Crescent. Subsequently, J1e lineages might have been involved in episodes of the expansion of pastoralists into arid habitats coinciding with the spread of Arabic and other Semitic-speaking populations.
Y-chromosome haplogroup J1e; Neolithic; Arabic languages; pastoralism
11.  Y-chromosome Short Tandem Repeat Intermediate Variant Alleles DYS392.2, DYS449.2, and DYS385.2 Delineate New Phylogenetic Substructure in Human Y-chromosome Haplogroup Tree 
Croatian Medical Journal  2009;50(3):239-249.
Aim: To determine the human Y-chromosome haplogroup backgrounds of intermediate-sized variant alleles displayed by short tandem repeat (STR) loci DYS392, DYS449, and DYS385, and to evaluate the potential of each intermediate variant to elucidate new phylogenetic substructure within the human Y-chromosome haplogroup tree.
Methods: Molecular characterization of lineages was achieved using a combination of Y-chromosome haplogroup defining binary polymorphisms and up to 37 short tandem repeat loci. DNA sequencing and median-joining network analyses were used to evaluate Y-chromosome lineages displaying intermediate variant alleles.
Results: We show that DYS392.2 occurs on a single haplogroup background, specifically I1*-M253, and likely represents a new phylogenetic subdivision in this European haplogroup. Intermediate variants DYS449.2 and DYS385.2 both occur on multiple haplogroup backgrounds, and when evaluated within specific haplogroup contexts, delineate new phylogenetic substructure, with DYS449.2 being informative within haplogroup A-P97 and DYS385.2 in haplogroups D-M145, E1b1a-M2, and R1b*-M343. Sequence analysis of variant alleles observed within the various haplogroup backgrounds showed that the nature of the intermediate variant differed, confirming the mutations arose independently.
Conclusion: Y-chromosome short tandem repeat intermediate variant alleles, while relatively rare, typically occur on multiple haplogroup backgrounds. This distribution indicates that such mutations arise at a rate generally intermediate to those of binary markers and Y-STR loci. As a result, intermediate-sized Y-STR variants can reveal phylogenetic substructure within the Y-chromosome phylogeny not currently detected by either binary or Y-STR markers alone, but only when such variants are evaluated within a haplogroup context.
13.  Y-chromosomal evidence of the cultural diffusion of agriculture in southeast Europe 
The debate concerning the mechanisms underlying the prehistoric spread of farming to Southeast Europe is framed around the opposing roles of population movement and cultural diffusion. To investigate the possible involvement of local people during the transition to agriculture in the Balkans, we analysed patterns of Y-chromosome diversity in 1206 subjects from 17 population samples, mainly from Southeast Europe. Evidence from three Y-chromosome lineages, I-M423, E-V13 and J-M241, makes it possible to distinguish between Holocene Mesolithic forager and subsequent Neolithic range expansions from the eastern Sahara and the Near East, respectively. In particular, whereas the Balkan microsatellite variation associated with J-M241 correlates with the Neolithic period, that related to E-V13 and I-M423 Balkan Y chromosomes is consistent with a late Mesolithic time frame. In addition, the low frequency and variance associated with I-M423 and E-V13 in Anatolia and the Middle East support a European Mesolithic origin of these two clades. Thus, these Balkan Mesolithic foragers, with their own autochthonous genetic signatures, were destined to become the earliest to adopt farming when it was subsequently introduced by a cadre of migrating farmers from the Near East. These initial local converted farmers became the principal agents spreading this economy, using maritime leapfrog colonization strategies in the Adriatic and transmitting the Neolithic cultural package to other adjacent Mesolithic populations. The ensuing range expansions of E-V13 and I-M423 parallel in space and time the diffusion of Neolithic Impressed Ware, thereby supporting a case of cultural diffusion using genetic evidence.
Balkan Neolithic; farming transition; peopling of Europe; Y-chromosome haplogroups
14.  Y-chromosome Short Tandem Repeat DYS458.2 Non-consensus Alleles Occur Independently in Both Binary Haplogroups J1-M267 and R1b3-M405 
Croatian Medical Journal  2007;48(4):450-459.
Aim: To determine the human Y-chromosome haplogroup backgrounds of non-consensus DYS458.2 short tandem repeat alleles and evaluate their phylogenetic substructure and frequency in representative samples from the Middle East, Europe, and Pakistan.
Methods: Molecular characterization of lineages was achieved using a combination of Y-chromosome haplogroup defining binary polymorphisms and up to 37 short tandem repeat loci, including DYS388, to construct haplotypes. DNA sequencing of the DYS458 locus and median-joining network analyses were used to evaluate Y-chromosome lineages displaying the DYS458.2 motif.
Results: We showed that the DYS458.2 allelic innovation arose independently on at least two distinctive binary haplogroup backgrounds and possibly a third as well. The partial allele length pattern was fixed in all haplogroup J1 chromosomes examined, including its known rare sub-haplogroups. Within the alternative R1b3 associated M405 defined sub-haplogroup, both DYS458.0 and DYS458.2 allele classes occurred. A single chromosome was also allocated to the R1b3-M269*(xM405) classification. The physical position of the partial insertion/deletion occurrence within the normal tetramer tract differed distinctly in each haplogroup context.
Conclusion: While unusual DYS458.2 alleles are informative, additional information for other linked polymorphic loci is required when using such non-conforming alleles to infer haplogroup background and common ancestry.
15.  Inference of ancestry: constructing hierarchical reference populations and assigning unknown individuals 
Human Genomics  2006;2(4):212-235.
The ability to infer personal genetic ancestry is being increasingly utilised in certain medical and forensic situations. Herein, the unsupervised Bayesian clustering algorithm, structure, is employed to analyse 377 autosomal short tandem repeats typed on 1,056 individuals from the Centre d'Etude du Polymorphisme Humain Human Diversity Panel. Individuals of known geographical origin were hierarchically classified into a framework of increasingly homogeneous clusters to serve as reference populations into which individuals of unknown ancestry can be assigned. The groupings were characterised by the geographical affinities of cluster members and the accuracy of these procedures was verified using several genetic indices. Fine-scale substructure was detectable beyond the broad population level classifications that previously have been explored in this dataset. Metrics indicated that within certain lines, the strongest structuring signals were detected at the leaves of the hierarchy where lineage-specific groupings were identified. The accuracy of unknown assignment was assessed at each level of the hierarchy using a 'leave one out' strategy in which each individual was stripped of cluster membership and then re-assigned using the supervised Bayesian clustering algorithm implemented in GeneClass2. Although most clusters at all levels of resolution experienced highly accurate assignment, a decline was observed in the finer levels due to the mixed membership characteristics of some individuals. The parameters defined by this study allowed for assignment of unknown individuals to genetically defined clusters with measured likelihood. Shared ancestry data can then be inferred for the unknown individual.
population genetics; human population structure; clustering; Bayesian inference; short tandem repeats (STRs)
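The 'leave one out' check described in the abstract above can be sketched in a few lines. This is a deliberately simplified stand-in, not the GeneClass2 algorithm: it assumes haploid multilocus genotypes, estimates allele frequencies per cluster with an additive pseudocount, and assigns each individual to the cluster with the highest log-likelihood. All function names, the pseudocount, and the toy data are hypothetical.

```python
import math


def log_likelihood(individual, members, n_alleles, pseudocount=1.0):
    """Log-probability of a genotype under a cluster's smoothed allele frequencies."""
    total = 0.0
    for locus, allele in enumerate(individual):
        count = sum(1 for m in members if m[locus] == allele)
        freq = (count + pseudocount) / (len(members) + pseudocount * n_alleles)
        total += math.log(freq)
    return total


def assign(individual, clusters, n_alleles):
    """Name of the reference cluster with the highest log-likelihood."""
    return max(clusters,
               key=lambda name: log_likelihood(individual, clusters[name], n_alleles))


def leave_one_out_accuracy(clusters, n_alleles):
    """Strip each individual from its cluster, re-assign it, and score the result."""
    correct = total = 0
    for name, members in clusters.items():
        for i, individual in enumerate(members):
            reduced = dict(clusters)
            reduced[name] = members[:i] + members[i + 1:]  # remove the test individual
            correct += assign(individual, reduced, n_alleles) == name
            total += 1
    return correct / total


if __name__ == "__main__":
    # Toy reference clusters: three loci, two possible alleles (0 or 1) per locus.
    toy_clusters = {
        "A": [(0, 0, 0), (0, 0, 1), (0, 0, 0)],
        "B": [(1, 1, 1), (1, 1, 0), (1, 1, 1)],
    }
    print(leave_one_out_accuracy(toy_clusters, n_alleles=2))
```

Removing the individual before estimating frequencies matters: otherwise its own alleles inflate the likelihood of its home cluster and the accuracy estimate is optimistic, most visibly in small clusters at the finer levels of a hierarchy.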
16.  Phylogenetic applications of whole Y-chromosome sequences and the Near Eastern origin of Ashkenazi Levites 
Nature Communications  2013;4:2928.
Previous Y-chromosome studies have demonstrated that Ashkenazi Levites, members of a paternally inherited Jewish priestly caste, display a distinctive founder event within R1a, the most prevalent Y-chromosome haplogroup in Eastern Europe. Here we report the analysis of 16 whole R1 sequences and show that a set of 19 unique nucleotide substitutions defines the Ashkenazi R1a lineage. While our survey of one of these, M582, in 2,834 R1a samples reveals its absence in 922 Eastern Europeans, we show it is present in all sampled R1a Ashkenazi Levites, as well as in 33.8% of other R1a Ashkenazi Jewish males and 5.9% of 303 R1a Near Eastern males, where it shows considerably higher diversity. Moreover, the M582 lineage also occurs at low frequencies in non-Ashkenazi Jewish populations. In contrast to the previously suggested Eastern European origin for Ashkenazi Levites, the current data are indicative of a geographic source of the Levite founder lineage in the Near East and its likely presence among pre-Diaspora Hebrews.
Population genetics studies continue to debate whether Ashkenazi Levites originated in Europe or the Near East. Here, Rootsi et al. use whole Y-chromosome DNA sequences to unravel the phylogenetic origin of the Ashkenazi Levite and suggest an origin for the Levite founder lineage in the Near East.
