This study describes for the first time the Y-chromosome diversity of the main ethnic groups in Afghanistan. We have explored the genetic composition of modern Afghans and correlated their genetic diversity with well-established historical events and movements of neighboring populations. The data strongly suggest that continuous migrations and movements through Central Asia since at least the Holocene have created population structures that today are highly correlated with ethnicity in Afghanistan.
A previous study on Pakistan that included ethnic groups also present in Afghanistan (Baluch, Hazara, Pashtun) showed that Y-chromosome variation was structured by geography and not by ethnic affiliation. With the exception of the Hazara, all ethnic groups in Pakistan were shown to have similar Y-chromosome diversity, to cluster with South Asians, and to be close to Middle Eastern males. A Y-chromosome study
on populations from Turkmenistan, Uzbekistan, Kazakhstan, Kyrgyzstan, and Tajikistan found that there is greater diversity among populations that share the same ethnic group than among the ethnic groups themselves. These observations support a common genetic ancestry hypothesis for these populations irrespective of ethnicity. We have also found substantial differences among the various groups of Afghanistan. The inter-ethnic comparisons, however, could not be tested in this study since information on tribe and clan affiliation was not available. The high genetic diversity observed among Afghanistan's groups has also been observed in other populations of Central Asia. This diversity is possibly due to the strategic location of this region and its uniquely harsh geography of mountains, deserts, and steppes, which could have facilitated the establishment of social organizations within expanding populations and helped maintain genetic boundaries among groups that have developed over time into distinct ethnicities.
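As an illustration of how a diversity comparison of this kind can be quantified, the following minimal sketch computes Nei's unbiased haplotype diversity, H = n/(n-1) × (1 - Σ p_i²), from a list of haplotype labels. The discussion above does not state which estimator was used, so the choice of estimator, the function name, and the toy labels are assumptions made purely for illustration.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for one population sample.

    `haplotypes` is a list of hashable haplotype labels, one per individual.
    The labels used in the example below are hypothetical, not study data.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are needed")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability of drawing the same
    # haplotype twice with replacement).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # The n / (n - 1) factor corrects for the finite sample size.
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Hypothetical sample: two haplotypes, two carriers each.
    print(haplotype_diversity(["A", "A", "B", "B"]))
```

Under this estimator, a sample in which every individual carries a different haplotype gives H = 1, and a sample with a single haplotype gives H = 0.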
The RM networks of the major common haplogroups show that the flow of paternal lineages among the various ethnic groups is very limited, which is consistent with the high level of endogamy practiced by these groups. Similar Y-chromosome results have been previously reported among the Central Asian ethnic groups, but with less pronounced genetic differentiation in maternal lineages, most likely the result of endogamous practices that were tolerant of the assimilation of foreign females.
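A much cruder proxy for limited lineage flow than a network analysis is the proportion of individuals who carry a haplotype found in both of two groups. The sketch below is not the RM network method used in this study; the function name and the toy haplotype labels are hypothetical and serve only to show how such a sharing statistic could be computed.

```python
def shared_haplotype_fraction(group_a, group_b):
    """Fraction of individuals (both groups pooled) whose haplotype
    is observed in both groups.

    Each group is a list of hashable haplotype labels, one per individual.
    Low values are consistent with little exchange of lineages.
    """
    total = len(group_a) + len(group_b)
    if total == 0:
        raise ValueError("both groups are empty")
    # Haplotypes present in both groups.
    shared = set(group_a) & set(group_b)
    # Count carriers of shared haplotypes in each group.
    carriers = sum(1 for h in group_a if h in shared)
    carriers += sum(1 for h in group_b if h in shared)
    return carriers / total


if __name__ == "__main__":
    # Hypothetical labels; only "h2" occurs in both groups.
    group_1 = ["h1", "h1", "h2"]
    group_2 = ["h2", "h3", "h3"]
    print(shared_haplotype_fraction(group_1, group_2))
```

A statistic like this ignores how closely related the non-identical haplotypes are, which is exactly the information a network analysis retains.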
The prevailing Y-chromosome lineage in Pashtun and Tajik (R1a1a-M17) has the highest observed diversity among populations of the Indus Valley. R1a1a-M17 diversity declines toward the Pontic-Caspian steppe, where the mid-Holocene R1a1a7-M458 sublineage is dominant. R1a1a7-M458 was absent in Afghanistan, suggesting that, contrary to what was previously thought, R1a1a-M17 does not support expansions from the Pontic Steppe bringing the Indo-European languages to Central Asia and India.
MDS and Barrier analyses have identified a significant affinity between Pashtun, Tajik, North Indian, and West Indian populations, creating an Afghan-Indian population structure that excludes the Hazaras, Uzbeks, and the South Indian Dravidian speakers. In addition, gene flow to Afghanistan from India, marked by the Indian lineages L-M20, H-M69, and R2a-M124, also seems to mostly involve Pashtuns and Tajiks. This genetic affinity and gene flow suggest interactions that could have existed since at least the establishment of the region's first civilizations at the Indus Valley and the Bactria-Margiana Archaeological Complex.
Furthermore, BATWING results indicate that the Afghan populations split from Iranians, Indians, and East Europeans at about 10.6 kya (95% CI 7,100–15,825), which marks the start of the Neolithic revolution and the establishment of the farming communities. In addition, Pashtun split first from the rest of the Afghans around 4.7 kya (95% CI 2,775–7,725), a date marked by the rise of the Bronze Age civilizations of the region. These dates suggest that the differentiation of the social systems in Afghanistan could have been driven by the emergence of the first urban civilizations. The dates suggested by BATWING should be treated with care, however, since BATWING does not model gene flow or the differential assimilation of incoming migrations, and these events could alter the time of split. Nevertheless, it was previously shown that topologies and times of splits in the modal trees generated by BATWING are insensitive to in-migration, which leaves BATWING timing results insusceptible to the in-migrations and invasions that might be expected to reduce the times of split. On the other hand, the times of population splits for BATWING's modal trees are very susceptible to subsequent migration between those populations. This means that the two major waves of splitting could have occurred earlier, but since RM networks of the major haplogroups show limited gene flow between the ethnic groups and since the population structure suggested by MDS and Barrier correlates populations from the historically connected
Bronze Age sites with Pashtun and Tajik, the splits in Afghan populations that BATWING places at 4.7 kya (95% CI 2,775–7,725) are very probable. A previous study by Heyer et al. conducted in Central Asia has also estimated dates for the emergence of ethnic groups that are significantly older than what is known historically. This suggests that the ethnic groups could have resulted from a fusion of different populations, or that ethnicities were established from already structured populations.
BATWING's hypotheses model mutations and coalescent events, reflecting ancestral structures from which the current populations have emerged. Later expansions into the region would have assimilated the ancestral population, giving the Afghans a genetic makeup distinct from that of the expanding source populations even though they share general genetic features. This is evident in the Afghan Hazara and Afghan Uzbek, who have always been associated with expanding Mongols and Turco-Mongols. Although we have found that at least a third to a half of their chromosomes are of East Asian origin, PCA places them between the East Asian and the Caucasus/Middle East/Europe clusters.
Historical expansions and invasions appear to have made differential contributions to shaping Afghanistan's population structures. We have found limited genetic evidence of expansions previously thought to have left specific imprints in current populations.
The E1b1b1-M35 lineages in some Pakistani Pashtun were previously traced to a Greek origin brought by Alexander's invasions. However, the RM network of E1b1b1-M35 shows that Afghanistan's lineages are correlated with Middle Easterners and Iranians but not with populations from the Balkans.
The Islamic invasion in the 7th century CE left an immense cultural impact on the region, with reports of Arabs settling in Afghanistan and mixing with the local population. However, the genetic signal of this expansion is not clearly evident: some Middle Eastern lineages such as E1b1b1-M35 are present in Afghanistan, but the most prevalent lineage among Arabs (J1-M267) was found in only one Afghan subject. In addition, the three Afghans who identified their ethnicity as Arab had lineages autochthonous to India.
We also note that three Hazara subjects belonged to haplogroup B-M60, which is very rare outside Africa. The RM network shows that the subjects had a recent founding ancestor from East Africa, whose lineage could have been brought to Afghanistan through the slave trade. This shows that the genetic ethnic boundaries have been selectively permeable; however, the history of the rules of assimilation in this region over time is not yet clearly understood.
Language adoption and spread in Afghanistan also seem to have been a complex process. The Afghan genetic structure tends to correlate Hazara and Uzbek, whose languages belong to two different language families. The Hazara language, like those of the Pashtun and Tajik, belongs to the Indo-Iranian group of the Indo-European family, while the Uzbek language is in the Turkic family. The form of Turkic spoken by the Uzbek appears to be a direct descendant of an extinct Turkic language that developed in the 15th century CE. It appears that the predominant genetic component shared by Uzbek and Hazara split more than 1 ky before this date. Therefore, it is possible that language differences in Afghanistan reflect a more recent cultural shift.
In conclusion, Y-chromosome diversity in Afghanistan reveals major differences among its ethnic groups. However, we have found that all Afghans largely share a heritage from a common ancestral population that emerged during the Neolithic revolution and remained unstructured until 4.7 kya (95% CI 2,775–7,725). The first genetic structures between the different social systems began to form during the Bronze Age, accompanied or driven by the formation of the first civilizations in the region. Later migrations and invasions into the region have been differentially assimilated by the ethnic groups, increasing inter-population genetic differences and giving the Afghans a unique genetic diversity in Central Asia.