Ancestry informative markers (AIMs) measure genetic admixtures within an individual beyond self-reported racial/ethnic (SRR) groups. Here, we used genetically determined ancestry (GDA) across SRR groups and examined associations between GDA and both HIV-1 RNA and CD4+ counts in HIV-positive children in the US.
Forty-one AIMs, developed to distinguish 7 continental regions, were detected by real-time PCR in 994 HIV-positive, antiretroviral-naïve children. GDA was estimated by comparing each individual’s genotypes to allele frequencies found in a large set of reference individuals originating from global populations using STRUCTURE. The means of GDA were calculated for each category of SRR. Linear regression was used to model CD4+ count and log10 RNA as functions of GDA, adjusting for SRR and age.
Subjects were 61% Black, 25% Hispanic, 13% White and 1.3% Unknown. The mean age was 2.3 years (45% male), mean CD4+ count 981 cells/mm3, and mean log10 RNA 5.11. Marked heterogeneity was found for all SRR groups, with high admixture for Hispanics. In adjusted linear regression models, subjects with 100% European ancestry were estimated to have log10 RNA levels 0.33 higher (95% CI: (0.03, 0.62), p = 0.028) and CD4+ counts 253 cells/mm3 lower (95% CI: (−517, 11), p = 0.06) than subjects with 100% African ancestry.
Marked continental admixture was found among this cohort of HIV-infected children from the US. GDA contributed to differences in RNA and CD4+ counts beyond SRR, and should be considered when outcomes associated with HIV infection are likely to have a genetic component.
Numerous studies have demonstrated that race/ethnicity can be an important factor in determining health and risk for specific diseases1–4. However, although most studies have used self-reported race/ethnicity (SRR), it is known that this approach to determining ancestry is often inaccurate, does not reveal the extent of genetic admixture, and can mask the importance of race/ethnicity as a risk for disease or complications associated with specific drugs or drug combinations5–8. To measure the extent of admixture within a given individual, panels of ancestry informative markers (AIMs) are being used to characterize ancestry. Genetically determined ancestry (GDA) estimates are measurements, based on genomic variants, that are predictive of the regions (or ancestry) from which individuals inherited their genetic alleles. Due to population stratification, the allele frequency for some single nucleotide polymorphisms (SNPs) can be extremely high within some regional populations but low among others9. These differences can be extensive, permitting the identification of panels of SNPs that identify continental ancestry9. Additionally, these same regional patterns suggest that AIMs may also be correlated with other SNPs that are predictive of, or causal for, a specific clinical phenotype10. Therefore, AIMs have been found to be useful for disease risk assessment and for control of confounding that may be due to population stratification.
As a result of the human diaspora that has occurred in the last 400–600 years, some regions in the world have populations with high admixtures of continental ancestry9. The result is that self-reported race and ethnicity groups can include a mixture of genetic backgrounds. Self-reported race and ethnicity can also reflect the historical and/or social context of the group with which the person identifies3. This implies that self-reported race measures a mixture of genetics and social factors leaving self-reported race as a poor measurement for the genetic composition of an individual.
Although human immunodeficiency virus type-1 (HIV) can infect all populations, in the United States and other countries, specific groups are at increased risk of infection. However, even among these risk groups, despite recurrent high risk exposures, some people do not become infected, while others who become infected may have substantially different rates of progression of HIV-associated diseases. Host genetic factors have been identified as important determinants for both risk of HIV infection and rate of disease progression11–14. Although racial and ethnic minorities, particularly those of African-American and Hispanic ancestry, are disproportionately infected with HIV, few studies have examined the impact of ancestry on HIV disease, and none to our knowledge in children. In a study of 91 HIV-infected adults, it was shown that the association with the CYP2B6 metabolizer phenotype and virologic response to an NNRTI-based (efavirenz or nevirapine) antiretroviral regimen was confounded by GDA and that self-reported race/ethnicity was insufficient15. A more nuanced conclusion was reached in a paper from the Multicenter AIDS Cohort that described dyslipidemia in 1,779 HIV-1 infected men16. They found a significant interaction between GDA and HIV/HAART status for all lipids tested, a low concordance between self-reported race and GDA in admixed populations, and better performance of GDA relative to self-reported race in statistical models. However, they still concluded that self-reported race remained a good clinical surrogate for GDA. In an additional study of 310 North American HIV-infected participants, the hazard ratios for the time to virologic suppression when comparing the CCR5-2459 GA and AA alleles to the AA alleles were stronger among participants with higher African ancestry, but no associations were found with GDA and the time to virologic suppression17.
In the research reported here, we have used a novel 41-SNP panel of AIMs18 to identify continental origin and admixture among HIV-infected children in the U.S. who predate the availability of effective combination antiretroviral therapy (cART). We have investigated the relationship between genetically determined ancestry (GDA) and self-reported race (SRR). Additionally, we have examined the associations of continental ancestry on CD4+ and HIV plasma RNA in children prior to the initiation of cART.
The analysis participants were children who participated in the Pediatric AIDS Clinical Trial Group (PACTG) protocols P152 (n=431)19 and P300 (n=563)20. These trials were U.S.-based, prospective, randomized, double-blind, placebo-controlled, multicenter protocols that assessed the efficacy of combination nucleoside reverse transcriptase inhibitor (NRTI) treatment regimens prior to the availability of effective cART. To be included in one of these trials a child needed to have symptomatic HIV infection, be between 3 months and 18 years of age for P15221, and between 42 days and 15 years of age for P300. For both protocols the children had to meet criteria for a diagnosis of HIV infection under the Centers for Disease Control (CDC) classification system in use at the time the protocols accrued. The vast majority of these children were HIV-infected through mother-to-child transmission, as this cohort predates the routine administration of antiretrovirals to pregnant women.
A recently described highly informative panel of 41 AIMs was used to determine continental ancestry18. Each of the 41 SNPs was detected using real-time PCR on DNA specimens obtained from peripheral blood mononuclear cells of 994 HIV-infected children in P152 and P300.
The continental origin of analysis participants was estimated by comparing each child’s genotypes to allele frequencies found in a large set of 3517 reference individuals originating from 107 populations around the world18. Reference populations were grouped into the 7 world regions: Europe, Africa, America, Central/South Asia, South/West Asia, East Asia and Oceania. Population structure and ancestry estimates were obtained in a trained clustering analysis using STRUCTURE22,23. Five independent runs were performed at K=7, using 20,000 burn-in cycles and 20,000 MCMC replications under the admixture model, including prior population information of the reference set. Allele frequencies were updated using only individuals with population information at a migration prior of 0.05. Uniform priors were used for the degree of admixture (“infer α” option) and for the allele frequency (λ = 1 option). All other parameters were set at default. Continental ancestry calling was performed by assigning the predominant continental origin to each subject.
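The ancestry-calling step described above amounts to picking the region with the largest estimated membership proportion. The following is a minimal sketch of that rule, not the authors' code: the function name, the dictionary layout, the rounding tolerance, and the example proportions are all assumptions made for illustration.

```python
# The seven world regions named in the text.
REGIONS = ("Europe", "Africa", "America", "Central/South Asia",
           "South/West Asia", "East Asia", "Oceania")


def call_predominant_ancestry(membership, tolerance=0.01):
    """Return the region with the largest estimated membership proportion.

    `membership` maps region name -> proportion for one individual
    (hypothetical layout; e.g. one row of clustering output).
    `tolerance` allows for rounding in the reported proportions.
    """
    missing = [r for r in REGIONS if r not in membership]
    if missing:
        raise ValueError(f"missing regions: {missing}")
    total = sum(membership[r] for r in REGIONS)
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"proportions sum to {total}, expected about 1")
    # On an exact tie, max() keeps the first region in REGIONS order.
    return max(REGIONS, key=lambda r: membership[r])


# Made-up individual for illustration only; not data from the study.
example = {"Europe": 0.20, "Africa": 0.62, "America": 0.15,
           "Central/South Asia": 0.01, "South/West Asia": 0.01,
           "East Asia": 0.005, "Oceania": 0.005}
print(call_predominant_ancestry(example))  # Africa
```

A hard call of this kind discards the admixture information, which is why the regression analyses use the proportions themselves rather than the assigned region.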
Three-way admixture of the analysis children determined to be from Africa, Europe, or America was further estimated using STRUCTURE with prior population information of reference populations from Africa (N=761), Europe (N=1011), and America (N=407) under an admixture model with correlated allele frequencies at K=3 groups and reported as percent GDA.
CD4+ lymphocyte count, CD4+ lymphocyte percentage, and HIV plasma RNA were used as the outcomes for this study. These were measured at entry prior to initiation of therapy. P152 used the NASBA HIV-1 RNA QT Amplification System21 and P300 used the Roche Amplicor quantitative RNA PCR assay20 to measure HIV plasma RNA.
Baseline characteristics are presented as percentages, means and standard deviations as appropriate. Linear regression with a robust variance estimator24 was used to measure the associations between GDA and CD4+ lymphocyte counts, CD4+ lymphocyte percent and log10 HIV RNA. Adjustment variables were all selected a priori and included age, weight-for-age z-score, study (P152/P300) and, where appropriate, self-reported race/ethnicity. Genetically determined ancestry was used as a proportion in the regression analyses so that the regression slopes are interpreted for a 100% change in GDA. This parameterization was used so that a direct comparison to self-reported race could be made. Because GDA totaled 100%, the region not in the model (e.g., African GDA) is interpreted as the reference group. In total, we considered six separate regression models to estimate the adjusted associations. The first model (Model 1) was used to estimate the effects of GDA without adjustment for SRR. The second model (Model 2) was used to estimate the effects of SRR without adjustment for GDA. The third model (Model 3) included both GDA and SRR so that GDA and SRR are adjusted for each other. The remaining models (Model 4 through Model 6) included GDA after subsetting based on SRR. All confidence intervals (CI) are 95% CI, and p-values < 0.05 were considered to be statistically significant. R version 2.15.1 and SAS version 9.2 were used for the analyses.
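The reference-group parameterization described above can be illustrated with a short sketch. It assumes a three-region GDA (African, European, Americas) that sums to 1, omits the African proportion from the design row, and ignores the adjustment covariates for brevity. The function names and the coefficient values are hypothetical and are not estimates from this study.

```python
def design_row(p_african, p_european, p_american):
    """Build [intercept, European, Americas] for one child.

    The African proportion is left out of the row, so African ancestry
    acts as the reference group (the proportions must sum to 1).
    """
    total = p_african + p_european + p_american
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"GDA proportions sum to {total}, expected 1")
    return [1.0, p_european, p_american]


def predicted_mean(coefs, row):
    """Linear predictor: sum of coefficient * covariate."""
    return sum(b * x for b, x in zip(coefs, row))


# Hypothetical coefficients: intercept, European slope, Americas slope.
coefs = [5.0, 0.2, -0.3]

all_african = design_row(1.0, 0.0, 0.0)
all_european = design_row(0.0, 1.0, 0.0)

# The difference between a 100% European and a 100% African child
# equals the European slope, which is what makes the slope comparable
# to a contrast between two self-reported race categories.
diff = predicted_mean(coefs, all_european) - predicted_mean(coefs, all_african)
print(round(diff, 2))  # 0.2
```

Under this coding, a child with 100% African GDA has only the intercept in the linear predictor, and each slope is read as the mean difference between 100% of the named ancestry and 100% African ancestry.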
Of the 994 participants for whom the complete panel of 41 AIMs was determined, 61% self-reported as Black, 25% self-reported as Hispanic, 13% self-reported as White, and 1% reported as other races or without a self-reported race (Table 1). Fifty-five percent were female; the average age was 3.8 years; and the average weight-for-age z-score was −0.66. The mean CD4+ lymphocyte count was 981 cells/mm3; the mean CD4+ percent was 24%; and the mean plasma log10 HIV RNA was 5.11. Of the 994 subjects with AIMs measurements, 826 (168 with missing data) had HIV-RNA data, and 987 (7 with missing data) had CD4+ counts and percentages. Missing HIV-RNA data were due to limited availability of specimens.
We first examined the association of SRR with GDA for each subject. Because the continental ancestry of the vast majority of subjects clustered in three regions, Africa, Europe and the Americas, these continents were used to describe the GDA in these analyses. As seen in Figure 1, histograms of GDA for the three regions by self-reported race illustrate the relative skewness (departure from symmetry), kurtosis (degree of “peakedness”) and extensive variability among self-reported racial groups. All of the histograms display strong skewness, with the possible exception of the European GDA for those who self-report as Hispanic. Histograms with the most kurtosis include the Americas GDA for those who self-report as White or Black, and the Africa GDA for those who self-report as White. Figure 2 displays the mean GDA by self-reported race and the overall continental ancestry distribution for the entire cohort. For those who self-reported as Black, the mean GDA was 74% for African ancestry, 17% for European ancestry and 9% for Native Americas ancestry. For those who self-reported as White, the mean GDA was 14% for African ancestry, 76% for European ancestry and 10% for Native Americas ancestry. For self-reported Hispanics, the mean GDA was 25% for African ancestry, 53% for European ancestry and 22% for Native Americas ancestry.
Because viral load is an important indicator of HIV replication and a predictor of disease progression, we examined the role of genetic ancestry in determining the quantity of virus detected in the plasma of subjects prior to their initiating antiretroviral therapy. In our initial analyses, which pooled all self-reported racial/ethnic groups, we used regression to compare pre-treatment HIV RNA in subjects with European or Americas ancestry to that in subjects with African ancestry (Table 2). In this analysis (Model 1), a higher percentage of European ancestry was associated with higher log10 RNA relative to children with more African ancestry (mean change in log10 RNA for a 100% change in GDA (slope) = 0.18, CI (0.01, 0.36), p-value = 0.041). In the same analysis, children with a higher percentage of Native American ancestry had a lower log10 viral load than those with more African ancestry, although the difference was not statistically significant (slope = −0.29, CI (−0.79, 0.20), p-value = 0.25). Similarly, children with a higher percentage of Native American ancestry had a marginally significant lower log10 viral load when compared to those with more European ancestry (slope = −0.47, CI (−1.00, 0.06), p-value = 0.080). The higher log10 RNA for those with more European ancestry held up after controlling for self-reported race (slope = 0.33, CI (0.03, 0.62), p-value = 0.028) (Model 3), and the directionality of the estimated slopes was similar after subsetting on self-reported race (Models 4 through 6). When the cohort was divided into subsets based on self-reported race, there was only one statistically significant result (self-reported Blacks: slope = 0.62, CI (0.18, 1.05), p-value = 0.006). The interaction test for GDA and SRR was statistically significant (Model 3 plus an interaction term, p-value = 0.039), implying that the mean change of log10 HIV RNA as a function of GDA differed by SRR.
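The arithmetic behind these slopes can be checked with a few lines of code. The sketch below uses only the point estimates quoted above; the helper names are hypothetical, and the rescaling assumes the linear relationship that the regression model itself assumes. It shows that the Native American versus European slope is the difference of the two Model 1 slopes against the African reference, and what a slope per 100% change in GDA implies for a smaller change in ancestry or on the fold-change scale.

```python
# Model 1 slopes quoted in the text (African ancestry as the reference).
SLOPE_EUROPEAN = 0.18
SLOPE_AMERICAS = -0.29
# Model 3 European slope quoted in the text (adjusted for SRR).
SLOPE_EUROPEAN_ADJUSTED = 0.33


def contrast(slope_a, slope_b):
    """Slope for region A versus region B, both measured against the
    same omitted reference region."""
    return slope_a - slope_b


def rescale(slope, percentage_points):
    """Expected change for a shift of `percentage_points` in GDA, given
    a slope expressed per 100% change (assumes linearity)."""
    return slope * percentage_points / 100.0


def fold_change(log10_difference):
    """Convert a difference in log10 RNA into a ratio of RNA levels."""
    return 10 ** log10_difference


# Native American vs European ancestry, from the two Model 1 slopes.
print(round(contrast(SLOPE_AMERICAS, SLOPE_EUROPEAN), 2))  # -0.47
# Ten percentage points more European (less African) ancestry.
print(round(rescale(SLOPE_EUROPEAN_ADJUSTED, 10), 3))      # 0.033
# 100% European vs 100% African ancestry, on the ratio scale.
print(round(fold_change(SLOPE_EUROPEAN_ADJUSTED), 2))      # 2.14
```

The confidence interval for a contrast cannot be recovered the same way, because it depends on the covariance between the two slope estimates, which is not reported in the text.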
Model 2 and Model 3 (Table 2) contain comparisons of self-reported race for the mean log10 HIV RNA without and with adjustment for GDA, respectively. When comparing those who self-report as White to those who self-report as Black, there was a marginally significant result (slope = 0.13, CI (−0.01, 0.28), p-value = 0.065) (Model 2). This estimate is similar to the Europe estimate from Model 1. After controlling for GDA, no significant association was identified (Model 3, with self-report as Black as the SRR reference), indicating that continental ancestry was a stronger predictor of viral set point than self-reported race/ethnicity.
We performed similar regression analyses to those described above for the association of continental ancestry with CD4+ count and percentage (Table 2). When controlling for GDA (Model 3), in the analyses of self-reported race, those who self-identified as White had a higher CD4+ count compared to those who self-identified as Black (slope = 243, CI: (28, 448), p-value = 0.025). Similarly, when CD4+ percentage was used in the analyses, subjects who self-reported as White had on average a higher CD4+ percentage than those who self-identified as Black (slope = 3.5, CI: (0.2, 6.7), p-value = 0.039). In adjusted linear regression models, subjects with 100% European ancestry were estimated to have a CD4+ count 253 cells/mm3 lower (95% CI: (−517, 11), p-value = 0.06) than subjects with 100% African ancestry. The SRR estimates change between Model 2 and Model 3 because GDA and SRR are correlated.
When we adjusted for host genetic factors that were previously found to be related to HIV RNA, CD4+ count and CD4+ percentage25 using the same participants, the estimated regression slopes for the GDA were similar (Supplemental Table 1).
To our knowledge, this is the first report that has examined the importance of continental ancestry in a cohort of HIV-infected children from the U.S. Our findings indicate that AIMs provide information above and beyond self-reported race and ethnicity, and demonstrate that there is considerable ancestry variability within self-reported race for these U.S.-based studies of HIV-infected children. Associations were identified between GDA and HIV disease severity markers, such as HIV RNA, CD4+ counts and CD4+ percent, with effects remaining after adjusting for self-reported race; these associations were strongest among those who self-reported as Black. Additionally, the estimated associations between self-reported race and HIV RNA and CD4+ were stronger when adjusting for GDA. This implies that GDA may be a confounder for the socioeconomic effect of being a member of different racial groups, and that without adjustment for continental ancestry this socioeconomic association could not be fully estimated. This argues for inclusion of AIMs in adjusted analyses when there is either a suspected strong genetic effect or a suspected strong socioeconomic effect on the outcome of interest.
Additionally, AIMs can be used to minimize bias associated with population stratification in case-control association studies of genetic markers26,27. Clinically, AIMs may be useful in disease classification and identification of genetic risk. For example, a study of European Americans would have misidentified genetic variants in the LCT and IRF4 genes as associated with rheumatoid arthritis without accounting for continental ancestry28. Moreover, in certain situations, differences may exist even within continental ancestry populations. For example, Menotti et al observed that, when examining the risk for coronary heart disease, applying a model of northern Europeans to southern Europeans overestimated the absolute risk and vice versa29. Because we also found similar results when controlling for some important genetic predictors, it is likely that additional genetic markers are related to CD4+ and HIV RNA, and are correlated with AIMs. This premise is supported by our findings that the associations with CD4+ and HIV RNA remain after adjustment for the genetic markers that we previously found associated with CD4+ and HIV RNA.
The previously reported confounding effect of GDA with the CYP2B6 metabolizer phenotype and virologic response to an NNRTI-based (efavirenz or nevirapine) antiretroviral regimen15 would not have played a role in our findings since HIV RNA and CD4+ were measured before study participants received any antiretrovirals. However, the association that we found with HIV RNA supports the plausibility that GDA is a confounder when investigating a virologic response to an NNRTI-based antiretroviral regimen. In addition, the reported interaction with GDA and lipid levels16 poses an interesting question for pediatric research, particularly given that an increase in lipid levels has been reported in children30,31. This remains an area of open research. Lastly, the reported interaction with the CCR5-2459 genotype and GDA on the time to viral suppression17 might be congruent with our findings since it would be expected that there is a relationship with the interacting variables and the outcome under study32; nevertheless, the association between viral suppression and GDA was not found to be statistically significant in the Cheruvu 2014 study. However, there are some important differences between the P152/P300 cohort examined in this study and these other HIV reports. P152/P300 consisted of children while the other studies included adults; therefore, extrapolation may not be valid. In addition, we did not study the time to virologic suppression; rather, we studied pre-ART plasma HIV RNA, CD4+ counts and CD4+ percentages.
There are a few limitations to our study. The P152/P300 protocols did not collect information on socioeconomic status and thus, we were not able to more finely describe effects within varying socioeconomic levels. Also, since the P152/P300 studies were conducted in the 1990s it is possible that infants were born to women more likely to have difficulty with substance abuse than observed in more contemporary HIV-infected pregnant women33. Finally, children in this study had a median age of 3.77 years, and did not have access to cART. It would be unusual in the US for a child to reach this median age without having initiated antiretroviral therapy. Thus, the observed results might not generalize to children who have early access to cART.
In summary, we have found through the identification of continental ancestry that the population of children with HIV infection within the U.S. has considerable ancestral heterogeneity, and that self-reported race/ethnicity is often not truly reflective of a child’s genetic background. Moreover, identification of continental ancestry provides additional information with regard to HIV RNA and CD4+ cell count and percent beyond what is observed with self-reported race/ethnicity. Therefore, it is possible that many studies in the HIV literature that have included ancestry in the analysis based on self-reported race could have resulted in misleading conclusions. The utilization of AIMs to identify continental ancestry should be considered when outcomes associated with HIV infection are likely to have a genetic component.
This research was supported by R01 NS077874 (SAS), R01 MH085608 (CNM, AXM), R03 MH103995 (CNM, AXM), and UM1AI068616 (SSB, MF, MQ, and TF).
Overall support for the International Maternal Pediatric Adolescent AIDS Clinical Trials Group (IMPAACT) was provided by the National Institute of Allergy and Infectious Diseases (NIAID) of the National Institutes of Health (NIH) under Award Numbers UM1AI068632 (IMPAACT LOC), UM1AI068616 (IMPAACT SDMC) and UM1AI106716 (IMPAACT LC), with co-funding from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) and the National Institute of Mental Health (NIMH). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
Conflicts of Interest: None to declare