In this study, we genotyped a panel of 107 AIMs to investigate the degree of west African ancestry among AAs and Nigerians and administered a questionnaire to explore the variables used by each cohort in constructing racial identity. We found that while nearly all self-identified AAs had a majority of west African ancestry, AAs had significantly more European admixture and greater admixture variability than the Nigerians. Self-report of a high degree of African ancestry in a three-generation family tree did not accurately predict the degree of west African ancestry calculated from our AIMs. Analysis of questionnaire responses revealed that no simple question proxy effectively estimates the degree of west African ancestry among US-born AAs. However, relative degree of west African ancestry could be effectively determined using both MLE and PCA. The results of our study thus suggest that while self-identified race could identify a cohort of individuals with a high degree (>80%) of west African ancestry, an admixture-matched case-control design may be more accurate and objective for conducting genetic association studies in admixed populations.
There are conflicting reports in the literature regarding the ability of self-identified race to serve as an accurate predictor of population clusters. In our study, self-reported race generally accorded well with inferred genetic population cluster. The mean west African ancestry among the AA participants was 83%, and only one of the participants in the US-born AA cohort did not have a majority of west African ancestry on genotype analysis. Thus, based on our genotyping data, if members of the AA cohort were assigned to one of the five major population genetic clusters (African, Caucasian, Pacific Islander, East Asian, and Native American, as defined by Risch, 2002)(7), all but one of the participants would be classified together in the African cluster. (Of note, our study is limited by the relatively small sample sizes of both cohorts, which may lead to skewing of both mean ancestry estimates and collected questionnaire information. Thus, the population level inferences should be taken with caution.) In contrast, in a previous study, Wilson et al. used microsatellite markers to analyze individuals from eight populations and observed that genetically inferred clusters corresponded poorly to commonly used racial labels(11). However, this study used far fewer markers and also classified Ethiopians as Blacks and New Guineans as Asians, while more recent population studies suggest that the genetic ancestries of these groups are European and Pacific Islander, respectively(7). Other studies have found that given sufficient numbers of markers and sample sizes, self-defined race may correspond well with inferred genetic clusters. Rosenberg et al. tested the correspondence of predefined population groups with those inferred from individual multilocus genotypes and found general agreement between the genetic and predefined populations(9). Tang et al. studied 3,636 subjects participating in the Family Blood Pressure Program who identified themselves as belonging to one of four racial groups (white, African American, East Asian, and Hispanic). Subsequent genetic cluster analysis using microsatellite markers produced four major clusters with near-perfect correspondence with the four racial categories(10).
The AA cohort in our study had a mean of 15% European admixture, which is consistent with previous reports of a range of 7-23% European admixture among US AAs(14-16). Of note, the estimates of 4% European and 1% Native American ancestry in the Nigerian population are likely due to bias in ML estimates caused by the limited number of markers. We found that among participants, there was a significantly higher proportion of admixture and higher variability in admixture proportions in the US-born AA cohort compared to a population that emigrated from Africa (i.e., Nigerians). The significant variation in individual ancestry estimates among the AA cohort suggests that this group, like the CHS AA cohort(15), represents a diverse population consisting of several subpopulations. For participation in the AA cohort, subjects identified both parents as AAs who were born in the US. Although data regarding grandparental race were not used to screen study participation, these data were collected through a three-generation family tree during administration of the questionnaire. In this study population, all AA subjects reported that the race of at least three of their four grandparents was consistent with African ancestry. Individuals and society have historically classified children of mixed race ancestry as AA, even when one parent is Caucasian, Asian, or Native American. For AAs, this is a remnant of the “Jim Crow” laws and the “One Drop” rule or “Rule of Hypodescent”. Thus, identification as AA would still occur in cases where the parents and grandparents were of mixed race ancestry. This could also contribute to the greater European admixture and greater admixture variability seen in the AA cohort.
The two cohorts were found to differ significantly in income bracket and education level, raising the possibility of a confounding relationship between socioeconomic status (SES) and degree of west African ancestry. In fact, others have found significant interactions between SES, genetic ancestry, and disease outcome(22). In our study, however, analyses of income and education within each cohort suggested that SES is unlikely to represent an important bias in our study population, as neither income nor education significantly correlated with degree of west African ancestry within either group (data not shown). SES is a complex construct, operates on multiple levels, and may be time dependent(23). In our study, income and education within each cohort were not significant confounders. However, it is possible that there were unmeasured confounders for which no amount of correction would control. Therefore, generalizations cannot be made regarding the relationship between ancestry and SES, and the potential confounding effect of SES must be addressed specifically in individual studies.
The limited ability of self-reported race to effectively reveal population substructure was also seen in a recent study that compared population structure inferred from individual ancestry estimates with self-reported race(24). In a case-control study of early-onset lung cancer, Barnholtz-Sloan et al. reported that the frequency of the drug-metabolizing gene GSTM1 null “risk” genotype varied both by individual European ancestry and by case-control status within self-reported race, particularly among the AA study participants. Furthermore, they found that genetic risk models that adjusted for European ancestry provided a better fit for this relationship between GSTM1 genotype and lung cancer risk compared with the model that adjusted for self-reported race. The results of this and other studies suggest that the likelihood of identifying disease-susceptibility loci will be lower in studies that rely on less accurate measures of population stratification (e.g., self-reported race)(25). Thus, genetic classifications of ancestry may provide a more objective and accurate method of defining homogeneous populations which can be used to investigate specific population-disease associations.
Because of cost and feasibility issues that may discourage the incorporation of admixture testing in the design of both preclinical and clinical studies, we developed a questionnaire to search for questions or combinations of questions that may reliably serve as a proxy for west African ancestry. We are not aware of any previous reports that have investigated relationships between factors used by individuals in constructing racial identity and individual ancestry estimates as determined through genotyping. When the entire dataset (i.e., two cohorts) was examined as a whole, several questionnaire items were found to have a significant association with percentage west African ancestry. Many of these items appear to be related to characteristics of an immigrant population (e.g., Nigerians), such as birthplace, self-described nationality, language spoken at home, number of family generations living in the US, self-described ethnicity, and estimation of importance of one's success in school for his/her community. However, when the AA cohort was examined separately, no question or set of questions significantly predicted degree of west African ancestry, as determined both by univariate analysis and factor analysis of survey items. Self-reported ancestry using a three-generation family tree also could not accurately predict degree of west African ancestry. Although reported grandparent race was highly sensitive for ancestry, it was not specific. Since all participants in our study reported that at least three grandparents were of a race consistent with African ancestry, this information could not distinguish those who actually had a high degree of African ancestry. The lack of specificity of reported grandparent race is likely due to the imprecision of racial categories.
Our family tree analysis was limited by the relatively similar background of our study participants; for example, all AA participants indicated that three or all of their grandparents were of a race consistent with African ancestry. Thus, studying a population with a greater degree of admixture may be more appropriate for investigating the utility of a three-generation family tree in predicting degree of African ancestry. A recent study by Burnett et al., however, suggests that self-reported ancestry may have poor reliability(26). In this study, Burnett et al. prospectively asked siblings to list the countries of origin of both parents. Participants in this study were recruited at the Mayo Clinic and were primarily Caucasian. Nevertheless, Burnett et al. found that only 49% of sibling pairs agreed completely on the countries of origin of both parents and this agreement only increased to 68% when named countries were postcoded into six population genetic clusters (Eurasia, East Asia, Oceania, America, Africa, and the Kalash group of Pakistan).
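The point made above, that reported grandparent race was sensitive but not specific, can be illustrated with a short calculation. The sketch below is only an illustration: the ancestry values are invented, and the 80% cutoff for "high" west African ancestry is an assumption borrowed from the threshold mentioned earlier in this discussion, not a criterion the study applied to the family tree data. It shows that when every participant screens positive on the family tree, sensitivity is 1 while specificity falls to 0.

```python
def sensitivity_specificity(test_positive, truly_positive):
    """Return (sensitivity, specificity) for paired boolean sequences.

    Either value is None when its denominator is zero.
    """
    pairs = list(zip(test_positive, truly_positive))
    tp = sum(1 for t, a in pairs if t and a)          # true positives
    fn = sum(1 for t, a in pairs if not t and a)      # false negatives
    tn = sum(1 for t, a in pairs if not t and not a)  # true negatives
    fp = sum(1 for t, a in pairs if t and not a)      # false positives
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


# Hypothetical genotype-based ancestry proportions (NOT study data).
ancestry = [0.95, 0.88, 0.83, 0.72, 0.61, 0.45]

# Assumed cutoff for "high" west African ancestry (illustrative only).
HIGH_ANCESTRY = 0.80
truly_high = [a >= HIGH_ANCESTRY for a in ancestry]

# Every participant reports at least three grandparents of a race
# consistent with African ancestry, so the "test" is positive for all.
reported_positive = [True] * len(ancestry)

sensitivity, specificity = sensitivity_specificity(reported_positive, truly_high)
print(f"sensitivity={sensitivity}, specificity={specificity}")
```

Because the reported variable takes the same value for everyone, it carries no information for separating participants, whatever cutoff is chosen.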
Applying PCA to the AIMs genotypes of our study population, we were unable to identify a principal component that could be used to order participants by relative degree of west African ancestry and that could be compared with percentage west African ancestry calculated using MLE. However, PCA of the study participants' genotype frequencies was able to identify a cluster of subjects with highly similar SNP distributions. The majority of subjects (both AA and Nigerian) with a high percentage of MLE-calculated west African ancestry were included in the first principal component, indicating an overall concordance between individual ancestry estimates calculated with MLE and subject groupings with PCA. A few subjects included in the first principal component had a lower percentage west African ancestry calculated with MLE. We suspect this difference may result from methodological differences and our use of point estimates rather than confidence intervals for estimated ancestry in our MLE calculations.
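To make the distinction between a point estimate and an interval concrete, the following is a minimal sketch of maximum likelihood estimation of an individual's ancestry proportion. It is a deliberately simplified two-population model (west African versus European) with a grid search, not the procedure used in this study. The allele frequencies, genotypes, function names, and the use of a 1.92 log-likelihood drop as an approximate 95% interval are all assumptions made for illustration.

```python
import math


def log_likelihood(m, genotypes, freq_afr, freq_eur):
    """Log-likelihood (up to a constant) of ancestry proportion m.

    genotypes[i] is the count (0, 1, or 2) of the reference allele at
    marker i; freq_afr[i] and freq_eur[i] are that allele's frequencies
    in the two assumed parental populations.
    """
    total = 0.0
    for g, pa, pe in zip(genotypes, freq_afr, freq_eur):
        # Expected allele frequency for an individual with proportion m
        # of west African ancestry under a simple mixture model.
        p = m * pa + (1.0 - m) * pe
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


def mle_with_interval(genotypes, freq_afr, freq_eur, steps=1000, drop=1.92):
    """Grid-search MLE of m plus a likelihood-based interval.

    The interval contains every grid value whose log-likelihood is
    within `drop` of the maximum (1.92 approximates a 95% interval).
    """
    grid = [i / steps for i in range(steps + 1)]
    lls = [log_likelihood(m, genotypes, freq_afr, freq_eur) for m in grid]
    best = max(lls)
    m_hat = grid[lls.index(best)]
    inside = [m for m, ll in zip(grid, lls) if ll >= best - drop]
    return m_hat, min(inside), max(inside)


# Hypothetical marker panel and genotypes (NOT study data).
freq_afr = [0.90, 0.85, 0.10, 0.95, 0.20, 0.80, 0.15, 0.90]
freq_eur = [0.10, 0.20, 0.90, 0.10, 0.85, 0.15, 0.90, 0.20]
genotypes = [2, 2, 0, 2, 1, 2, 0, 1]

m_hat, lo, hi = mle_with_interval(genotypes, freq_afr, freq_eur)
print(f"8 markers:  m_hat={m_hat:.3f}, interval=({lo:.3f}, {hi:.3f})")

# Repeating the same panel ten times mimics a larger marker set and
# narrows the interval around the same point estimate.
m_hat_big, lo_big, hi_big = mle_with_interval(
    genotypes * 10, freq_afr * 10, freq_eur * 10
)
print(f"80 markers: m_hat={m_hat_big:.3f}, interval=({lo_big:.3f}, {hi_big:.3f})")
```

Under these assumptions, two individuals with different point estimates can have overlapping intervals when few markers are typed. This is one way a grouping method and an MLE point estimate could appear to disagree.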
We began this study to investigate the degree of admixture among self-reported AAs following our previous studies of breast cancer tumor biology in AAs(3, 4, 27). Genetic heterogeneity in study subjects could impair the ability of a study to detect true biological differences between racially-defined, apparently uniform groups. We have found that genetic ancestry proportions can vary significantly within groups of individuals who would self-identify as the same racial group. Our work suggests that to maximize the predictive value of clinical inferences from genome-wide association studies, one must consider within- as well as between-population association. Thus, while self-identified race can identify a cohort of individuals with a high degree of African ancestry, admixture-matched case-control studies will be more effective in studying differences in disease incidence and outcomes in specific racial populations.