To determine the genetic affinity between the Daic populations and the Western Austronesians, we typed twenty single nucleotide polymorphisms (SNPs) and seven short tandem repeats (STRs) in the non-recombining region of 1,509 Y chromosomes sampled from 30 Daic populations, 23 ISEA populations, and 11 Taiwan aboriginal populations (see Figure for locations of the populations and Table for population information). Almost all of the Daic populations in China and all of the Taiwan aboriginal populations were sampled in this study.
Classification, population, and location information of the populations sampled in this study
In addition, principal component (PC) analysis of 134 East Asian populations encompassing all linguistic groups in East and Southeast Asia was performed using the frequencies of haplogroups defined by SNPs. The result showed that Daic populations are closer to the Western Austronesian groups than any other East and Southeast Asian populations are (Figure 2), indicating a strong genetic affinity between Daic speakers and Western Austronesians. The separation of the Daic-ISEA-Taiwan cluster from the other ethnic groups is attributable to PC2 rather than to PC1, and O1a* is the haplogroup that shows the strongest correlation with PC2 (r2 = -0.875, P; see Additional file 1 for details). Furthermore, O1a-M119 is the dominating haplogroup in Taiwan aborigines (average 77%) ranging from 54% to 100% (Table , sum of O1a* and O1a2). This lineage is also highly prevalent in Daic speakers (20.5%) and in ISEA (21.2%), but not in the other East Asians (< 5%) [23]. Therefore, O1a-M119 is expected to provide much information for delineating the relationship between the Daic and Western Austronesians.
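The PC analysis described above can be sketched as follows. This is a minimal illustration, assuming a populations-by-haplogroups frequency matrix; the frequencies, the function names `pca_scores` and `haplogroup_pc_correlation`, and the choice of an SVD-based PCA are assumptions for illustration, not the procedure or data used in this study.

```python
import numpy as np

def pca_scores(freqs):
    """PCA of a populations x haplogroups frequency matrix via SVD."""
    X = np.asarray(freqs, dtype=float)
    centred = X - X.mean(axis=0)          # centre each haplogroup column
    U, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = U * s                        # population coordinates on each PC
    explained = s**2 / np.sum(s**2)       # share of variance per PC
    return scores, explained

def haplogroup_pc_correlation(freqs, scores, pc):
    """Pearson correlation of each haplogroup's frequency with one PC."""
    X = np.asarray(freqs, dtype=float)
    return np.array([np.corrcoef(X[:, j], scores[:, pc])[0, 1]
                     for j in range(X.shape[1])])

# Hypothetical frequencies (rows: populations; columns: O1a*, O2a, O3, other).
freqs = np.array([
    [0.75, 0.05, 0.10, 0.10],
    [0.25, 0.30, 0.30, 0.15],
    [0.20, 0.25, 0.35, 0.20],
    [0.03, 0.10, 0.60, 0.27],
    [0.02, 0.45, 0.30, 0.23],
])
scores, explained = pca_scores(freqs)
r_pc2 = haplogroup_pc_correlation(freqs, scores, pc=1)  # index 1 is PC2
```

The sign of each principal component is arbitrary, so only the magnitude of such a correlation is informative about which haplogroup drives a given axis.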
Figure 2. Principal component plot of Y-SNP. (A) PC plot of all the population samples. DC (green stars) is closest to MP (purple crosses) and TA (blue crosses). All of the other groups including ST, HM, AA, and AT (red spots including triangles, squares and diamonds) ...
Y-SNP haplogroup frequencies of the newly studied samples (%)
The PC plot in Figure 2 indicates that some Daic populations are close to the Sino-Tibetan cluster. Daic and Sino-Tibetan populations may share a common ancestry, which could account for their genetic resemblance. An alternative explanation is that Daic populations in mainland East Asia have been genetically influenced by Han Chinese, with whom they have coexisted as neighbors for around 2,500 years. Admixture analysis can estimate the proportions of assumed Daic or Han ancestry in the present Daic populations, and Daic populations isolated from Han Chinese can serve as the parental population in this analysis. Aboriginal populations on Hainan Island (Hlai, Jiamao, and Cun) and Taiwan Island are assumed to have been relatively isolated, as their cultures were little influenced by outside cultures from the mainland. Therefore, the genetic structures of these island aborigines might be the closest to that of ancestral Daic [35].
To estimate the assumed genetic influence of Han Chinese on the mainland Daic, we applied the Y-SNP data of mainland Daic, Hainan aborigines, Taiwan aborigines, and Han Chinese [34] to our admixture analysis. For this analysis, we set the latter three pooled populations as the parental populations of mainland Daic. Our results show that the genetic contribution of the Hainan aborigines is very high (2.145 ± 0.927), while those of the Han Chinese (-0.314 ± 0.422) and Taiwan aborigines (-0.831 ± 0.662) are hardly detected. The negative values estimated by the ADMIX program suggest that these two groups made no detectable contribution to the present Daic populations. This result indicates that the paternal lineages of Daic populations are relatively undisturbed, and that the genetic affinity between Daic and Western Austronesian populations has hardly been influenced by population admixture.
The ISEA populations may also be admixed. In our study, we assumed that the ISEA were formed by admixture of three potential parental populations: Daic populations, Taiwan aborigines, and the indigenous populations of the Sunda Islands, who are similar to Papuans. We performed an admixture analysis on the Indonesians, and included data of the Papuans from the literature [36] as one of the parental population structures in the analysis. Our analysis showed the following admixture proportions: Daic (0.713 ± 0.124), Taiwan (0.143 ± 0.125), and Papuans (0.144 ± 0.050), indicating that Daic ancestry makes the dominant contribution to the Indonesians. There is some uncertainty in these data, as our assumption that the ISEA population is an admixture cannot be tested.
As the haplogroup O1a* is the most characteristic haplogroup of the Daic and Western Austronesian populations, we estimated pairwise genetic divergence between Daic, Indonesians, and Taiwan aborigines using seven STRs carried by O1a* individuals (see Table for genetic distances and Additional file 2 for STR raw data). Our study shows that the divergence between Taiwan aborigines and Indonesians is the largest, about three times that between the Daic group and Taiwan aborigines. The divergence between the Daic group and Indonesians is comparable to that between the Daic group and Taiwan aborigines. These findings indicate that the Indonesians and Taiwan aborigines are each genetically closer to the Daic group than the two Western Austronesian groups are to each other. Furthermore, the diversity based on the seven STRs carried by O1a* individuals is higher in the Daic speakers than in Indonesians and Taiwan aborigines (Table ). The population with the highest diversity is not always the oldest, because high diversity can also result from admixture with neighbouring populations. However, the high diversity of the O1a* haplogroup in the Daic speakers most likely reflects the greater age of the population, as this haplogroup is almost absent in the neighbouring populations, so admixture could not have added diversity. Taking the results of diversity and divergence together, the Daic population group is likely the ancestral group from which the Indonesians and Taiwan aborigines derived separately in paternal lineages. Other haplogroups of Y chromosomes (e.g. O3-M122, O2a-M95) displayed a pattern similar to that of O1a*, showing that the Daic group is genetically closer to Indonesians and Taiwan aborigines than these latter two groups are to each other (Table ). Interestingly, O2a may be traced even further to Austro-Asiatic populations, as suggested by a recent study [38].
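The two quantities compared above, within-group STR diversity and between-group divergence, can be sketched as follows. The text does not state which statistics were used, so this example assumes Nei's unbiased haplotype diversity and the average squared difference (ASD) in repeat counts; the function names and the seven-locus haplotypes are hypothetical.

```python
from collections import Counter
from itertools import product

def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

def average_squared_distance(pop_a, pop_b):
    """Mean squared difference in repeat number, per locus, between two groups."""
    n_loci = len(pop_a[0])
    total = 0
    for a, b in product(pop_a, pop_b):
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return total / (len(pop_a) * len(pop_b) * n_loci)

# Hypothetical 7-STR haplotypes (repeat counts), one tuple per individual.
group_x = [(13, 12, 24, 10, 13, 14, 11),
           (13, 12, 25, 10, 13, 14, 11),
           (14, 12, 24, 10, 12, 14, 11),
           (13, 13, 24, 11, 13, 14, 11)]
group_y = [(13, 12, 24, 10, 13, 14, 11),
           (13, 12, 24, 10, 13, 14, 11),
           (13, 12, 24, 10, 13, 15, 11)]

div_x = haplotype_diversity(group_x)
div_y = haplotype_diversity(group_y)
asd_xy = average_squared_distance(group_x, group_y)
```

Under these assumptions, a source group would be expected to show higher diversity than the groups derived from it, and two derived groups would be more distant from each other than either is from the source.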
Y-STR diversity of O1a, O2a, and O3 haplogroups
A median-joining network was constructed based on 7-STR haplotypes of O1a* individuals in the three ethnic groups (Figure 3). If the THH of ISEA is true, i.e., if ISEA primarily derived from Taiwan aborigines, one would expect ISEA lineages and Taiwan aboriginal lineages to be shared and/or connected in the network. In Figure 3, Daic lineages (green nodes) constitute the center of the network. All ISEA lineages (yellow nodes) and Taiwan aboriginal lineages (blue nodes) are either shared with or connected to one of the Daic lineages, directly or indirectly. In contrast, only one of the Taiwan aboriginal lineages is shared with or connected to the ISEA lineages. These observations suggest that ISEA did not derive directly from Taiwan aborigines but that the ISEA and Taiwan aborigines derived from the Daic independently of each other.
Figure 3. Haplotype network of Y-STRs of haplogroup O1a* individuals. As the original network was too complicated to display, we present here the shortest tree of the largest possibility reduced from the network (this function is available in the recent versions ...
We further examined the Daic lineages that are connected to ISEA lineages in the network. Interestingly, most of the Daic haplotypes connecting to the ISEA are either from Hainan Island or from Guangxi, which lies to the northwest of Hainan (green nodes with dark green frames in Figure 3). These Hainan and Guangxi populations are located around the Gulf of Tonkin. In particular, Cham, a Malayo-Polynesian population in South Vietnam, as well as Tsat in Hainan, which is a subgroup of Cham [11], were found to connect Daic and Indonesians in the network. Therefore, we hypothesized that the ISEA likely originated in the area around the Gulf of Tonkin, and migrated southward through the Indochina Peninsula to the Malay Peninsula before they spread to most of the islands of the Pacific Ocean and the Indian Ocean.
The age of the O1a* haplogroup was estimated from the network. The total age is 33765 ± 5221 years, which corresponds to the last Ice Age. The age of all the Daic samples in the network is 33193 ± 5577 years, close to the age of O1a*. It is not easy to estimate the real age of the Taiwan clusters, as they overlap with the Daic haplotypes to a large extent. This overlap also indicates multiple migrations from Daic populations to Taiwan aborigines. We estimated the age of the Taiwan cluster on the left side of the network to be 14659 ± 3110 years. The estimated age of all the Taiwan samples is 21268 ± 3148 years. Interestingly, this latter age is close to the age of the oldest human remains found in Taiwan, those of the Chochen. Therefore, we conclude that the migration of O1a* individuals from the mainland to Taiwan Island occurred during the Palaeolithic Age.
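One common way to date a cluster in an STR haplotype network is the rho statistic: the average number of mutational steps separating the sampled haplotypes from an assumed founder, divided by the mutation rate. The text does not specify the estimator or the rate used, so the sketch below is an assumption-laden illustration: it approximates network path length by the sum of absolute repeat differences (stepwise mutation), and the founder haplotype, mutation rate, generation time, and function names are all hypothetical.

```python
def rho(haplotypes, founder):
    """Average number of single-repeat steps from each haplotype to the founder."""
    steps = [sum(abs(x - f) for x, f in zip(h, founder)) for h in haplotypes]
    return sum(steps) / len(steps)

def age_in_years(haplotypes, founder, rate_per_locus_per_generation,
                 years_per_generation):
    """Convert rho into years: rho / (rate * number of loci) generations."""
    n_loci = len(founder)
    generations = rho(haplotypes, founder) / (rate_per_locus_per_generation * n_loci)
    return generations * years_per_generation

# Hypothetical two-locus example: 0, 1 and 2 steps from the founder, so rho = 1.
founder = (10, 10)
sample = [(10, 10), (11, 10), (10, 12)]
rho_value = rho(sample, founder)

# Illustrative rate and generation time only; not the values used in the study.
age = age_in_years(sample, founder,
                   rate_per_locus_per_generation=0.001,
                   years_per_generation=25)
```

Because the estimate scales inversely with the assumed mutation rate, any uncertainty in that rate carries directly into the age, in addition to the sampling error reported with each estimate.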
Because two fairly specific clusters of ISEA haplotypes can be observed in the network, we performed time estimates for both clusters. The age of the left ISEA cluster in the network is 9895 ± 2393 years, whereas that of the right cluster is 25880 ± 7137 years. The linguistic estimate for the origin of the Malayo-Polynesian, around 5000–6000 years ago [16], is younger than our estimates. Moreover, little overlap between Daic haplotypes and ISEA haplotypes is observed in the network, which indicates that bottleneck effects might have formed the two ISEA clusters during the emigration of ISEA populations out of the ancestral Daic populations. Geographically, the bottleneck might be the narrow seashore of Vietnam. Therefore, the O1a* haplogroup was most probably introduced into ISEA populations during the origin of the Malayo-Polynesians more than 7500 years ago. However, the possibility of recent migrations of O1a individuals into ISEA cannot be ignored, because the genetic time estimate is not precise enough to eliminate it.
The Express Train Hypothesis has two distinct aspects: 1) the origin of the migrations, i.e., the Taiwan Homeland Hypothesis (THH), and 2) the mode of migrations, i.e., a rapid dispersal starting from Indonesia. In this study, we examined the THH in Western Austronesians by including the Daic speakers and ISEA, both of which are largely missing from previous studies. We show that Taiwan is not likely the homeland of Indonesian ISEA, at least not for the major paternal lineages. Although both Taiwan aborigines and Indonesian ISEA derived from the Daic, their departures occurred separately, suggesting that the major paternal lineages of Western Austronesian populations are not monophyletic.
Interestingly, the spread of the domestic pig in the Southeast Asia archipelago and the Pacific took place in almost the same way as that of Western Austronesian populations suggested by our study. The pigs in Taiwan and in regions as far as Micronesia came directly from the mainland of East Asia, while those in the Southeast Asian archipelago and Polynesia came from the Indochina Peninsula. It is assumed that the domestic pig was introduced by human populations during early migrations, which would imply that humans also entered the Southeast Asia archipelago and the Pacific by two different routes [41].
In fact, our observations are consistent with a monophyletic Austro-Tai super-phylum which contains Daic speakers, Malayo-Polynesians, and Taiwan aborigines [5]. The observations presented in this study demonstrate that it is necessary to include Daic populations and ISEA in studies of Austronesian origins. Without these groups, Polynesians and Taiwan aborigines would have appeared most similar to each other, leading to the conclusion that all the Austronesians originated in Taiwan.
Our results suggest that the Gulf of Tonkin is more likely the homeland of the paternal lineages of ISEA. Due to the complex nature of population migrations from Eastern Indonesia to the Pacific Islands [23], and the pronounced genetic division between Eastern and Western Austronesians [27], we opted not to include Polynesian data in our analysis. Instead, we only analyzed Western Austronesians. The absence of O1a-M119 in Polynesian populations is intriguing and it cannot be simply explained by invoking the bottleneck effect [21] given that a great deal of diversity of Y chromosome haplotypes has been observed in Polynesians [23].
Consistent with our findings for paternal lineages, mitochondrial DNA studies on populations from Peninsular Malaysia also suggest an ancestry of aboriginal Malays in Indochina around the time of the Last Glacial Maximum [48]. This ancestry subsequently dispersed through the Malay Peninsula into island Southeast Asia [48]. The ISEA mtDNA studies also indicated that if an Austronesian migration from Taiwan did take place, it was demographically minor [49].
Most of our conclusions are based on the analysis of O1a*, which is only a fraction of the Y-chromosome lineages found in these populations. The frequency of this group of lineages is remarkable in Taiwanese populations, but it is not so dramatic in Malayo-Polynesians or Daic populations. It is possible that some population events involved other Y-chromosome lineages. It is also plausible that minor parts of the paternal lineages have different origins, such as aboriginal populations of Indonesia prior to the formation of Austronesian, or that more recent migrations from South Asia took place [29]. The genetic relationships among East and Southeast Asians are much more complicated than expected.