In contrast with the popular belief that the ancestral Native American pool in Cuba was totally erased by the massive arrival of Europeans and African slaves and by centuries of admixture, and despite the absence of distinct Native American ethnic groups in Cuba, the present results demonstrate the persistence of a substantial Native American component in the maternal gene pool. An unexpectedly high proportion of Native American mtDNA lineages has been described previously in other American populations that also experienced dramatic demographic changes in colonial times, such as Puerto Rico [
29,
30](Martinez-Cruzado et al. 2001)(Martinez-Cruzado et al. 2005), Brazil [
31] and Mexico [
32]. The Native American component inferred here for Cuba is higher than estimates based on nuclear markers (<5%) [
33]. In addition, the frequency of Native American mtDNA lineages in Cuba is higher than in the English-speaking Caribbean countries (5.4%) [
34] as well as in Afro-American populations from Central and South America such as the Garifuna from Honduras (15.9%) and the Chocó people from Colombia (16.3%) [
35] (see Additional file
3). In these cases, the indigenous population was even more dramatically replaced by African slaves and to a certain extent by Europeans. Our results differ from a previous independent study carried out in the Cuban province of Pinar del Rio [
36], whose authors estimated that 50% of maternal lineages in this province were of European, 46% of African, and at most 4% of Native American origin. Our results indicate that the Native American mtDNA haplogroup patterns are statistically homogeneous across the island, with the maternal Native American substrate exceeding 25% in all provinces. Specifically, in Pinar del Rio we detected 33% Native American maternal lineages, a figure that contrasts significantly (χ² = 12.32; 2 d.f., P = 0.002) with the 4% found in the study by Torroni [
36]. This difference highlights the risk of population stratification, which can easily arise in case-control disease association studies, for example, and increase the false positive rate [
26,
37]. Forensic genetic studies are also sensitive to population stratification. This is especially true in pseudo-ethnic groups such as the 'Hispanics', a term firmly established in American societies, and in particular, in the USA [
38]. Although our recruitment scheme was designed to capture a representative sample of the Cuban population, and although blood donations were not rewarded, our sample could still include hidden socioeconomic biases that would distort ancestry estimates.
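As a quick consistency check on the Pinar del Rio comparison reported above, the P value can be recovered from the quoted statistic alone. The sketch below relies only on the fact that a chi-square distribution with exactly 2 degrees of freedom has the closed-form upper tail exp(-x/2); the function name `chi2_sf_2df` is ours, introduced purely for illustration, and the underlying contingency table is not reproduced here.

```python
import math


def chi2_sf_2df(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 2 d.f.

    With 2 degrees of freedom the chi-square distribution is an exponential
    distribution with mean 2, so P(X > x) = exp(-x / 2).
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.exp(-statistic / 2.0)


# Statistic quoted in the text for Pinar del Rio (2 d.f.).
p_value = chi2_sf_2df(12.32)
print(f"P = {p_value:.4f}")  # prints "P = 0.0021"
```

The result rounds to the reported P = 0.002. The shortcut applies only to 2 degrees of freedom; other cases need a general chi-square survival function.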
The origin of Native Americans in the Caribbean, such as Ciboneys and Tainos, is a controversial issue. The present mtDNA Native American haplogroup frequencies in contemporary Cuba differ significantly (
P < 0.0001) from the haplogroup composition observed in ancient DNA samples from Ciboneys and Tainos. Fifteen ancient specimens of Ciboneys from Cuba have been analyzed [
6] and classified into haplogroups C1 (nine individuals), D1 (five individuals), and A2 (one individual), while, in a different study [
5], 24 samples of extinct Tainos from the Dominican Republic, on the neighboring island of Hispaniola, were analyzed and classified as C1 (18 individuals) and D1 (6 individuals). Neither haplogroup A2 nor haplogroup B2 was observed in the Taino samples. According to Lalueza-Fox et al. [
6], the scarcity of haplogroup A2 and the predominance of lineages C1 and D1 in the Caribbean point towards South America as the origin of both the Tainos and the Ciboneys. However, an argument based only on average continental haplogroup frequencies can be misleading, since haplogroup frequencies vary substantially among present-day Native American populations within North, Central, and South America alike. A process of progressive island colonization from the Orinoco Valley and/or from the Yucatan provides ample opportunity for genetic drift to act. Intensive episodes of genetic drift are in fact the rule rather than the exception in other Native American populations. Over half of the Cuban sequences belonging to haplogroups C1 and D1 described in the present study have already been described in ancient DNA studies [
5,
6] in a total of 39 individuals (24 Tainos and 15 Ciboneys). However, these sequences are common in Native American populations (from both North and South America) and many represent founding lineages in the continent. In contrast to the hypothesis by Lalueza-Fox et al. (2003), our data suggest that both North and South America could have contributed to the original gene pool of Cuban Native Americans. We anticipate an even more complex scenario in which other Native American people arriving from different continental locations in the post-colonization period could have contributed to the already admixed population. In fact, importation of Native Americans from Central and North America has already been reported [
39]. In addition, sampling effects consisting merely of close maternal relatedness among the individuals analyzed could have contributed to distorting the haplogroup patterns observed in ancient Tainos. This hypothesis would also explain why the predominance of the C1 and D1 haplogroups in these pre-Columbian samples is observed neither in present-day Cuba nor in Puerto Rico [
29,
30], where the Tainos were also the native inhabitants before the arrival of Europeans.
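To make the sampling argument above concrete, the following sketch tabulates the ancient-DNA haplogroup counts quoted in this section and asks how common an unobserved haplogroup could still be. The counts are those given in the text; the helper names `proportions` and `upper_bound_if_unseen` are ours. The bound assumes independently sampled individuals, which is an idealization rather than a property of the original studies.

```python
from collections import Counter

# Haplogroup counts exactly as quoted in the text for the ancient samples.
ciboney = Counter({"C1": 9, "D1": 5, "A2": 1})   # 15 individuals
taino = Counter({"C1": 18, "D1": 6})             # 24 individuals


def proportions(counts):
    """Convert raw haplogroup counts into relative frequencies."""
    total = sum(counts.values())
    return {haplogroup: n / total for haplogroup, n in counts.items()}


def upper_bound_if_unseen(sample_size, confidence=0.95):
    """Largest true frequency still compatible with observing zero copies.

    Assumes independent draws: solves (1 - f) ** sample_size = 1 - confidence.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / sample_size)


pooled = ciboney + taino
for label, counts in (("Ciboney", ciboney), ("Taino", taino), ("Pooled", pooled)):
    freqs = proportions(counts)
    summary = ", ".join(f"{hg} {freq:.1%}" for hg, freq in sorted(freqs.items()))
    print(f"{label} (n={sum(counts.values())}): {summary}")

bound = upper_bound_if_unseen(sum(taino.values()))
print(f"A2 unseen in 24 Tainos: true frequency could still be up to {bound:.1%}")
```

Under these assumptions, the absence of A2 among 24 Tainos is compatible with a true frequency of roughly 12% at the 95% level. If the sampled individuals were maternally related, the effective sample size would be smaller and the bound would be wider still.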
Although the hypothesized Southern origin of the Native American Cuban people as coming from the Orinoco Valley has been historically favored [
2-
4], our results indicate that a substantial genetic input from Central and North America (e.g. the Yucatan or Florida peninsulas) cannot be ruled out. Because haplogroup frequencies are vulnerable to genetic drift, the phylogeographic information provided by the sequences is a necessary complementary tool for locating the origin of the Cuban sample within the American continent. The comparison of the Cuban sequences to a dataset of more than 4,000 sequences covering the entire continent suggests a multiregional origin within America, since the number of matches was similar for North, Central and South America. We are also aware that phylogeographic information is still of limited use because the HVS-I segment of the mtDNA control region is scarcely informative for Native American lineages. Higher molecular resolution, based on the analysis of complete genomes (or high-throughput SNP scans of the mtDNA coding region), can help refine phylogeographic inferences [
40,
41].
Besides the presence of a maternal Native American substrate in Cuba, the present results show a strong sexual asymmetry between European males and non-European females. In contrast to the 33% Native American presence in the female lineages, no Native American fraction was found in the Y-chromosome haplogroups. This result is in agreement with historical documentation, which records the high prevalence of Native American-'white mestizos' in the first generations after the conquest. The vast majority of European settlers were men, and mating between Spanish men and Native American women was not uncommon during the first generations of settlers [
7]. Similar sex biases between Native American and European founders have been previously described in Brazil [
31,
42,
43] and Colombia [
44]. Regarding the African component, the strong bias between the mtDNA and Y-chromosome haplogroup frequencies is also noticeable. While African lineages constitute 45% of the total maternal lineages, they are present in only 18% of the Cuban Y-chromosomes. Although very large numbers of African slaves were carried to Cuba, the African-born slave population suffered extremely high mortality and had an unfavorable sex ratio. The 'mulattos' were considered inferior in Cuban society from the beginning of the slave trade [
7], so mating between African men and European women was strongly discouraged. In contrast, mating between European masters and African slave women was more common.