As an alternative to self-identified “racial” and “ethnic” categories, it has been proposed that genetically determined ancestry may be more accurate for biomedical and/or clinical research [27]. However, ancestry must be defined clearly, because it can be described on several levels: biogeographical (e.g. African vs. Asian), geographical (e.g. south-east Asian vs. northern European), geopolitical (e.g. Cambodian vs. Swedish), or cultural (e.g. Jewish vs. Berber). Furthermore, ancestry can be self-identified, identified by an observer, or estimated from genetic data. In addition, ancestry can be traced to one source or to multiple sources.
Some reports have suggested that self-identified race/ethnicity correlates with clustering of genetic ancestral groups [31]. However, self-identified ancestry is a less accurate estimate of genetic ancestry than genetic testing, and its accuracy likely varies by population [27,32–33]. Genetic estimates of ancestry become especially relevant for populations that have undergone recent admixture, where distortions in the relationship between genetic and self-assessed ancestry have been described. For example, a recent study in the Southwest U.S. reported that 85% of Hispanics underestimated their Native American admixture proportions, while most Native Americans systematically underestimated their European ancestry [34].
The genetic ancestry (admixture) of an individual or population can be estimated using most genetic polymorphisms. However, ancestry informative markers (AIMs) are routinely used because the number of markers required to estimate ancestry is inversely proportional to the informativeness of the markers. AIMs are genetic markers, usually SNPs, that exhibit large allele frequency differences between parental populations (e.g. African vs. European). Using highly informative AIMs means that fewer markers are required to obtain robust ancestry estimates, which also lowers genotyping costs. One measure of the ancestry informativeness of a polymorphism is delta (δ), the absolute difference in allele frequency between two ancestral populations. A δ value of 1 implies complete ancestry informativeness and a δ value of 0 implies none. Most markers are informative for only one pair of ancestral populations, while some are informative for more than one pair; in general, a δ value > 0.5 is considered highly informative for ancestry [35]. Although δ is the most obvious measure of ancestry informativeness, other measures such as F_ST, I(n), and Fisher’s information content have also been used [36–38]. In general, F_ST and I(n) rank markers slightly more accurately than δ [37].
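As a rough illustration of how markers might be ranked, the sketch below computes δ and a simple two-population F_ST for biallelic markers. The marker names and allele frequencies are invented for illustration. The F_ST formula is the basic heterozygosity-based form with equal population weights and no sampling correction, which is an assumption here rather than the estimator used in the cited studies. The 0.5 cutoff is the δ threshold mentioned above.

```python
def delta(p1, p2):
    """Absolute allele frequency difference between two ancestral populations."""
    return abs(p1 - p2)


def fst_two_pops(p1, p2):
    """Simple F_ST for one biallelic marker: (H_T - H_S) / H_T.

    Assumes equal weighting of the two populations and ignores
    sampling error (an illustrative simplification).
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # marker is fixed for the same allele in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def rank_markers(freqs):
    """Return (name, delta, fst) tuples sorted by delta, most informative first."""
    rows = [(name, delta(p1, p2), fst_two_pops(p1, p2))
            for name, (p1, p2) in freqs.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def highly_informative(freqs, threshold=0.5):
    """Names of markers whose delta exceeds the threshold."""
    return [name for name, d, _ in rank_markers(freqs) if d > threshold]


# Hypothetical allele frequencies: (population 1, population 2)
markers = {
    "marker_A": (0.95, 0.05),
    "marker_B": (0.60, 0.40),
    "marker_C": (0.10, 0.80),
}

if __name__ == "__main__":
    for name, d, fst in rank_markers(markers):
        print(f"{name}: delta={d:.2f}, F_ST={fst:.2f}")
    print("Highly informative:", highly_informative(markers))
```

For two equally weighted populations this F_ST reduces to (p1 − p2)² / (4·p̄·(1 − p̄)), so it depends on the mean allele frequency as well as on δ.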
Several statistical methods have been proposed to estimate individual admixture proportions, based on maximum-likelihood, Bayesian, and principal component (PC) approaches. According to some studies, the choice of method has a relatively small impact on the accuracy of individual admixture estimates [39–40]. By far the most important factors in determining accuracy appear to be the number of markers used and their informativeness. Although PC-based methods have been used in population genetics for decades, they have recently been applied more widely to large-scale dense genotype datasets, especially those in which variation in ancestry may be difficult to ascertain through other methods [41–43].
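To make the maximum-likelihood idea concrete, here is a minimal sketch for a single individual under a two-population model. It assumes that the parental allele frequencies are known, that loci are independent, and that each genotype (0, 1, or 2 copies of a reference allele) follows a binomial distribution with allele frequency q = m·p1 + (1 − m)·p2, where m is the proportion of ancestry from population 1. The function names, the grid search, and the toy data are illustrative choices, not the procedure of any cited method.

```python
import math


def log_likelihood(m, genotypes, freqs_pop1, freqs_pop2, eps=1e-9):
    """Log-likelihood of admixture proportion m (constant terms dropped)."""
    total = 0.0
    for g, p1, p2 in zip(genotypes, freqs_pop1, freqs_pop2):
        # Expected allele frequency in an individual with ancestry m from pop 1
        q = m * p1 + (1 - m) * p2
        q = min(max(q, eps), 1 - eps)  # keep log() finite at the boundaries
        total += g * math.log(q) + (2 - g) * math.log(1 - q)
    return total


def estimate_admixture(genotypes, freqs_pop1, freqs_pop2, steps=1000):
    """Grid-search maximum-likelihood estimate of m on [0, 1]."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(
        candidates,
        key=lambda m: log_likelihood(m, genotypes, freqs_pop1, freqs_pop2),
    )


if __name__ == "__main__":
    # Hypothetical data: four fully informative markers (delta = 1)
    freqs_pop1 = [1.0, 1.0, 1.0, 1.0]
    freqs_pop2 = [0.0, 0.0, 0.0, 0.0]
    genotypes = [2, 1, 1, 0]  # copies of the reference allele per locus
    m_hat = estimate_admixture(genotypes, freqs_pop1, freqs_pop2)
    print(f"Estimated ancestry from population 1: {m_hat:.3f}")
```

A grid search is used only because it is easy to verify by eye; with more than two parental populations or many loci, an iterative optimizer such as an EM algorithm would be the more usual choice.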
In the U.S., a significant proportion of the population is admixed. Categorical classifications are therefore likely to misrepresent the rich genetic variation that exists within these populations. For example, in the 2000 U.S. Census, 48% of Hispanics self-identified as White, 2% as African/African American, 1% as American Indian, and 42% as “Some Other Race” [44]. As illustrated in , we can see an example of genetic ancestry estimated in Puerto Ricans, who are considered to be a Latino ethnic group. In this case, individuals identified themselves and their four grandparents as “pure” Puerto Ricans. In contrast with this homogeneity in self-identification, there is remarkable genetic heterogeneity between individuals in the contributions of the different ancestral groups.
Given the continuum of African ancestry in African Americans (), it is surprising that remnants of the “One-Drop Rule” still persist in the eyes of most Americans. The “One-Drop Rule” defines a person as African American with as little as a single drop of “African blood”, regardless of the origin of his or her other ancestors [47]. This rule was historically implemented as a way to enlarge the slave population with the children of slave holders, and it was maintained in the Jim Crow era to preserve the status quo of social groups. From a social perspective, the “One-Drop Rule” has encouraged racism but has also brought together the African American community. Recently, Barack Obama was elected president of the U.S., a historic event. Although half of his ancestry is European, the media and general public opinion have “unambiguously” classified him as the first African American president. From a genetic point of view, there is no scientific justification for classifying such a diverse population as a single, homogeneous group. From a social point of view, there is likely a “threshold” of ancestry above which all members of the population are classified within the category (e.g. President Obama). This social classification is likely to be contextual and specific to population and era. Measurement of genetic ancestry is the only method available so far to estimate the degree of African ancestry among African Americans, as family genealogy and questionnaire data are not reliable predictors [33]. We must also be mindful that genetically determined ancestry does not capture the social and cultural determinants that contribute to an individual’s affiliation with a particular racial or ethnic group.