A mean of 9–10 years of human immunodeficiency virus type 1 (HIV-1) infection elapses before clinical AIDS develops in untreated persons, but this rate of disease progression varies substantially among individuals. To investigate host genetic determinants of the rate of progression to clinical AIDS, we performed a multistage genomewide association study.
The discovery stage comprised 156 individuals from the Multicenter AIDS Cohort Study, enriched with rapid and long-term nonprogressors to increase statistical power. This was followed by replication tests of putatively associated genotypes in an independent population of 590 HIV-1–infected seroconverters.
Significant associations with delayed AIDS progression were observed in a haplotype located at 1q41, 36 kb upstream of PROX1 on chromosome 1 (relative hazard ratio, 0.69; Fisher’s combined P = 6.23 × 10⁻⁷). This association was replicated further in an analysis stratified by transmission mode, with the effect consistent in sexual or mucosal and parenteral transmission (relative hazard ratios, 0.72 and 0.63, respectively; combined P = 1.63 × 10⁻⁶).
This study identified and replicated a locus upstream of PROX1 that is associated with delayed progression to clinical AIDS. PROX1 is a negative regulator of interferon-γ expression in T cells and also mitigates the advancement of vascular neoplasms, such as Kaposi sarcoma, a common AIDS-defining malignancy. This study adds to the cumulative polygenic host component that effectively regulates the progression to clinical AIDS among HIV-1–infected individuals, raising prospects for potential new avenues for therapy and improvements in AIDS prognosis.
Polymorphisms in numerous human genes have been reported to confer differential susceptibility to human immunodeficiency virus (HIV) infection and rates of progression to AIDS [1, 2]. Genes encoding the major HIV-1 coreceptor chemokine (C-C motif) receptor 5 (CCR5), its ligands, and HLA class I genes are well-documented and consistently replicated AIDS restriction genes. Homozygosity for the CCR5 Δ32 allele provides near absolute protection against HIV-1 infection, whereas Δ32 heterozygosity delays progression to clinical AIDS. Certain HLA-B alleles are variously associated with increased or decreased rates of progression. For example, B*5701 is strongly and consistently associated with slower disease progression [4, 5] and elite viral control.
Recent genomewide association studies (GWAS) involving HIV-infected individuals that use Illumina genotyping platforms have confirmed the strong associations between variation in HLA genes and the surrogate markers of plasma viral load and CD4+ T cell count [7–9]. However, the statistical noise introduced by a large number of tests generally requires P values ≤5 × 10⁻⁸ for genomewide significance; in GWAS, true-positive signals with P values above this threshold cannot be distinguished from false-positive signals by purely statistical methods. This was observed in the failure of early HIV-related GWAS to detect previously identified AIDS restriction genes, such as CCR5 Δ32 or RANTES (CCL5). However, the effects of CCR5 Δ32 on disease progression were confirmed in a meta-analysis. Approaches using replication or meta-analysis of GWAS results can help overcome these problems.
Although HIV-1 load is a robust prognostic marker for clinical disease progression, studies show that viral load explains <50% of the variation in time from primary infection to the development of clinical AIDS [12–14]. Therefore, it is important to assess the direct host genetic contribution to the actual clinical end point of HIV infection: AIDS or AIDS-related death. To address the potential differences between the virologic and clinical end points, we used a 2-stage strategy to identify host common genetic polymorphisms associated with variation among HIV-infected individuals in their rate of progression to clinical AIDS (Centers for Disease Control and Prevention [CDC] 1987 definition). First, we conducted a GWAS involving a population of HIV-1–infected men from the Multicenter AIDS Cohort Study (MACS) who were chosen to be enriched with participants representing the extreme ends of phenotypic distribution rates of HIV-1 disease progression to clinical AIDS: rapid progressors and long-term nonprogressors. A focus on extreme phenotypes improves our power to detect differences between these readily discernible groups [15, 16], as seen elsewhere with AIDS phenotypes [17, 18]. Second, we selected the top-ranking single-nucleotide polymorphisms (SNPs) from the initial GWAS for replication tests in an independent cohort of 590 HIV-1 seroconverters. Third, we stratified the replication cohort by transmission mode (sexual or parenteral) and tested for consistent effects across these distinct populations.
In stage 1 of the study (discovery stage), we conducted a GWAS involving HIV-1–infected unrelated men who have sex with men from the MACS, a longitudinal prospective cohort conducted since 1984 in 4 US cities: Baltimore, Chicago, Los Angeles, and Pittsburgh. A total of 6973 men have been enrolled. From April 1984 through March 1985, 4954 men were enrolled; 668 more men were enrolled from April 1987 through September 1991. A third enrollment of 1351 men occurred from October 2001 through August 2003. The overwhelming majority, if not all, of the participants in our study were infected with HIV-1 subtype B, and they were eligible for inclusion if they were naive to highly active antiretroviral therapy (HAART) or treated only with zidovudine monotherapy.
We attempted to select equal numbers of individuals in 3 distinct categories of AIDS-free interval: 51 rapid progressors, 57 moderate progressors, and 48 long-term nonprogressors (Table 1). Study participants were chosen to be enriched with those who had HIV-1 disease progression rates from the extreme ends of this phenotypic distribution (rapid progressors and long-term nonprogressors), because inclusion of extreme participants has been shown to increase power in genetic association analyses [15–18]. Rapid progressors were seroconverters for whom the interval from the estimated date of seroconversion to the date of the first clinical AIDS diagnosis or death due to an AIDS-related disease was <5 years; it is estimated that ~10% of the MACS participants are rapid progressors. Moderate progressors were seroconverters for whom this interval was close to the median AIDS-free interval of 9.2 years. Long-term nonprogressors included seroconverters and those already infected at entry in the cohort. These individuals had no 2 consecutive CD4+ T cell counts <500 cells/mm³, no AIDS diagnosis, and no HAART use for ≥14.8 years after seroconversion (or after enrollment, if they were seropositive at entry). Dates of seroconversion were estimated as the midpoint between the last seronegative visit and the first seropositive visit; only seroconverters with <1 year between these visits were included in our study. For long-term nonprogressors, the end point was defined as the date of the last follow-up visit or the date of first HAART treatment, if applicable. Approximately 10% of the MACS participants are long-term nonprogressors. These phenotype criteria yielded 194 individuals for genotyping.
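The midpoint estimate of the seroconversion date and the rapid-progressor rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names are hypothetical, a year is taken as 365.25 days, and only the two rules given numerically in the text (<1 year between visits, <5 years to AIDS or AIDS-related death) are encoded.

```python
from datetime import date
from typing import Optional

DAYS_PER_YEAR = 365.25  # assumption: average year length used for intervals


def estimate_seroconversion(last_negative: date, first_positive: date) -> Optional[date]:
    """Midpoint between the last seronegative and first seropositive visits.

    Returns None when the visits are 1 year or more apart, mirroring the
    stage 1 inclusion rule (<1 year between visits).
    """
    gap = first_positive - last_negative
    if gap.days < 0:
        raise ValueError("first_positive must not precede last_negative")
    if gap.days / DAYS_PER_YEAR >= 1.0:
        return None
    # date + timedelta keeps whole days only, so half days are truncated.
    return last_negative + gap / 2


def aids_free_years(seroconversion: date, end_point: date) -> float:
    """Interval in years from estimated seroconversion to the end point."""
    return (end_point - seroconversion).days / DAYS_PER_YEAR


def is_rapid_progressor(seroconversion: date, aids_or_death: date) -> bool:
    """Rapid progression as defined in the text: interval of <5 years."""
    return aids_free_years(seroconversion, aids_or_death) < 5.0


if __name__ == "__main__":
    sc = estimate_seroconversion(date(1990, 1, 1), date(1990, 7, 1))
    print(sc, is_rapid_progressor(sc, date(1993, 4, 1)))
```

The moderate-progressor and long-term nonprogressor categories are left out on purpose, because the text does not state how close to 9.2 years an interval had to be, and the nonprogressor rule also depends on CD4+ T cell counts and treatment history.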
For the stage 1 discovery analysis, we successfully genotyped 189 individuals with use of the Affymetrix GeneChip Human Mapping 500K Array Set (http://www.affymetrix.com); the other 5 individuals either did not pass the minimal call rate threshold (>95%) or were found to have potential sample errors in subsequent quality control. DNA was obtained from peripheral blood mononuclear cells (PBMCs) for 118 individuals and from lymphoblastoid cell lines for 71 individuals with limited PBMC availability. Genotypic fidelity between PBMC and lymphoblastoid cell line genotypes was validated using paired genotypic samples from 16 individuals. SNP genotypes were called using the Affymetrix BRLMM algorithm; the mean call rate for all SNPs was 98.5%. For SNP quality assurance, we repeated the genotyping of 471,000 SNPs for 151 individuals with use of the Affymetrix Genome-Wide Human SNP Array (version 6.0; J.L.T., in preparation) in a different laboratory (Laboratory of Genomic Diversity, National Cancer Institute) and achieved >99.5% genotyping concordance between the genotype calls for the 2 laboratories (unfiltered for SNP call rate; 71,121,000 genotypes were compared).
The stage 1 discovery study population included individuals of 4 self-reported ethnicities: white, non-Hispanic; black, non-Hispanic; white, Hispanic; and Asian or Pacific Islander. To confirm ancestries, we combined our 189 sample genotypes with 30 HapMap reference samples that were also genotyped on Affymetrix 500K arrays, including 10 individuals from each of the following populations: Yoruba in Ibadan, Nigeria; Japanese in Tokyo, Japan, and Chinese in Beijing, China; and Utah residents with ancestry from northern and western Europe. Using this combined data set, and after removing regions of known high linkage disequilibrium, we estimated identity-by-state pairwise distances with the genomewide association software PLINK (version 1.07); we used these estimates for multidimensional scaling analysis. Among the 189 MACS participants analyzed, self-reported ethnicities were consistent with observed HapMap ancestry in all but 2 instances. To avoid spurious associations resulting from population stratification, we restricted subsequent genomewide association analyses to 156 individuals whose MACS samples were grouped with the HapMap European American population (51 rapid progressors, 57 moderate progressors, and 48 long-term nonprogressors) (Table 2). In this population, we also corrected for potential population stratification with use of a modified Eigenstrat method; 14 significant principal component axes were identified and included as covariates in the regression models described below. Q-Q plots testing the normality of the P value distribution after SNP filtering and Eigenstrat correction showed no significant deviations from what would be expected with a null hypothesis (λ = 1.0056; λ = 1.0 is expected with a null hypothesis), indicating little effect of population stratification.
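The inflation factor λ quoted above is conventionally computed as the median of the observed 1-df chi-square statistics divided by the median expected under the null hypothesis (about 0.455). The sketch below assumes that conventional definition, because the text does not say how λ was derived; the function names are illustrative and only the Python standard library is used.

```python
from statistics import NormalDist, median
from typing import Iterable

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
_CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_1df_from_p(p: float) -> float:
    """Convert a two-sided P value to the equivalent 1-df chi-square statistic."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P values must lie in (0, 1]")
    # The lower-tail quantile is used because it stays accurate for tiny P values.
    return _STD_NORMAL.inv_cdf(p / 2.0) ** 2


def genomic_inflation(p_values: Iterable[float]) -> float:
    """Genomic inflation factor: median observed statistic / null median."""
    stats = [chi2_1df_from_p(p) for p in p_values]
    return median(stats) / _CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced P values stand in for a well-calibrated scan.
    uniform = [(i + 0.5) / 1001 for i in range(1001)]
    print(round(genomic_inflation(uniform), 4))
```

A value near 1, such as the 1.0056 reported here, indicates that the bulk of the test statistics follows the null distribution; values well above 1 would suggest residual stratification or other systematic bias.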
We tested for associations between individual SNP genotypes and phenotypes for time to clinical AIDS by using 2 approaches, both implemented with PLINK software. First, we used logistic regression and an additive genetic model and included as covariates age at seroconversion and the 14 significant principal component axes identified by the Eigenstrat method. In this categorical approach, phenotypes were coded as follows: rapid progression (3), moderate progression (2), long-term nonprogression (1). Next, we used quantitative progression phenotypes (log₁₀ transformed) in a linear regression model (with the same genetic model and 15 covariates). Results were overlapping for both analyses; only the results for logistic regression (categorical progression phenotypes) are shown. To avoid excessive artifacts of small samples and for quality control screening of genotypes before testing, we excluded all SNPs with minor allele frequencies of <5% (n = 113,205), all SNPs that deviated from Hardy-Weinberg equilibrium at a significance level of P < .001 (n = 4459), and all SNPs that yielded unambiguous genotype calls in <95% of samples (n = 54,739). These quality control measures yielded 345,926 SNPs. We located SNPs in a gene or gene region with use of the Ensembl database, implemented with WGA-Viewer software (version 1.25); SNPs were mapped to 5′ upstream, 5′ untranslated, coding, intronic, 3′ untranslated, or 3′ downstream gene regions, as well as intergenic regions. The Affymetrix signal intensity plots for all top-ranking SNPs were examined to confirm genotype calls. We show P values corrected using the false discovery rate procedure.
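The three SNP exclusion criteria above (minor allele frequency <5%, Hardy-Weinberg P < .001, and genotype call rate <95%) can be expressed as a small per-SNP filter. The sketch is illustrative only: `GenotypeCounts` is a hypothetical container, and the Hardy-Weinberg check uses a 1-df chi-square approximation as an assumption. The study ran its filters in PLINK, whose Hardy-Weinberg test may differ from this approximation.

```python
import math
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Genotype counts for one biallelic SNP (hypothetical container)."""
    hom_major: int
    het: int
    hom_minor: int
    missing: int

    @property
    def called(self) -> int:
        return self.hom_major + self.het + self.hom_minor


def call_rate(g: GenotypeCounts) -> float:
    """Fraction of samples with an unambiguous genotype call."""
    return g.called / (g.called + g.missing)


def minor_allele_frequency(g: GenotypeCounts) -> float:
    """Frequency of the rarer allele among called genotypes."""
    freq = (2 * g.hom_minor + g.het) / (2 * g.called)
    return min(freq, 1.0 - freq)


def hwe_chi2_p(g: GenotypeCounts) -> float:
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = g.called
    p = (2 * g.hom_major + g.het) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (g.hom_major, g.het, g.hom_minor)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a 1-df chi-square: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(g: GenotypeCounts, min_maf: float = 0.05,
              min_call_rate: float = 0.95, hwe_alpha: float = 0.001) -> bool:
    """Apply the three thresholds quoted in the text to one SNP."""
    if g.called == 0:
        return False
    return (minor_allele_frequency(g) >= min_maf
            and call_rate(g) >= min_call_rate
            and hwe_chi2_p(g) >= hwe_alpha)


if __name__ == "__main__":
    print(passes_qc(GenotypeCounts(100, 50, 6, 0)))
```

Because a SNP can fail more than one criterion, the per-criterion exclusion counts need not add up to the total number of SNPs removed.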
From the stage 1 analysis, we selected the 25 top-ranking SNPs with P values of <1 × 10⁻⁵ and q values <.70, representing 15 loci when SNPs with r² values >0.90 are grouped, to test for replication in an independent cohort of 590 seroconverters. These stage 2 participants were enrolled in 5 natural history cohorts of patients with HIV infection or AIDS: AIDS Link to the Intravenous Experience (n = 13), MACS (n = 291, excluding the 156 MACS individuals used in our stage 1 GWAS), the San Francisco City Clinic Cohort Study (n = 76), and the Multicenter Hemophilia Cohort Study (n = 169). Details of these cohorts have been described elsewhere. The date of seroconversion after study enrollment was estimated as the midpoint between the last seronegative and the first seropositive HIV-1 antibody test result; only individuals of European ancestry with <2 years between the 2 tests were included. The censoring date was either the date of the last follow-up visit or 31 December 1995, the date HAART became the standard of care, to avoid potential confounding by virus-suppressive therapy. Genotyping was performed using the Affymetrix Genome-Wide Human SNP Array, version 6.0.
In the stage 2 analysis, we analyzed the 25 top-ranking stage 1 SNPs for genotypic association with progression to clinical AIDS (CDC 1987 definition), time to AIDS-related death, and time to CD4+ T cell count <200 cells/mm³. Cox model P values were subjected to a Bonferroni correction for 15 independent tests, representing all SNPs tested for replication and pruned for linkage disequilibrium at r² > 0.9 (25 SNPs in 15 loci). Next, we stratified the replication cohort by viral transmission mode and tested the significant associations in each resulting population. The 2 discrete populations represented sexual transmission (men who have sex with men; n = 405) and parenteral transmission (hemophiliacs and injection drug users; n = 182).
Our discovery-stage genomewide association analysis of 345,926 validated SNPs with rate of progression to clinical AIDS resulted in 25 SNPs with statistically significant associations at P < 1 × 10⁻⁵ and corrected P values of <.70 (q values, corrected for false discovery rate) (Table 3). These SNPs were tested for replication in an independent cohort of 590 seroconverters. In this replication population, results with an additive (codominant) genetic model revealed a single haplotype segment including 3 linked SNPs (r² > 0.9) 36 kb upstream of PROX1 on chromosome 1, with minor alleles that were associated with delayed progression to AIDS diagnosis as well as AIDS-related death (Figure 1) after Bonferroni correction (rs17762192; discovery-stage, P = 7.13 × 10⁻⁵; replication-stage, P = 4.8 × 10⁻⁴ and P = 7.2 × 10⁻³ [corrected]; relative hazard ratio, 0.69; Fisher’s combined P = 6.23 × 10⁻⁷) (Table 2). Effects were consistent in the replication analysis stratified by HIV-1 transmission mode, in which equivalent relative hazard ratios for AIDS diagnosis were found in the sexual and parenteral transmission populations (relative hazard ratios, 0.72 for sexual and 0.63 for parenteral transmission; Fisher’s combined P = 1.63 × 10⁻⁶) (Table 2).
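The corrected and combined P values quoted for rs17762192 can be checked with a few lines of arithmetic. The sketch below assumes the standard forms of both procedures, which the text names but does not spell out: Bonferroni correction multiplies the P value by the number of independent tests (15 here), and Fisher's method refers −2 Σ ln P to a chi-square distribution with 2k degrees of freedom for k combined P values. The function names are illustrative.

```python
import math
from typing import Sequence


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p * n_tests)


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Fisher's combined P value for independent tests.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose upper tail has the closed form
    exp(-X/2) * sum_{i<k} (X/2)**i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one P value is required")
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 0.0
    for i in range(k):
        total += term
        term *= half_stat / (i + 1)
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    discovery_p, replication_p = 7.13e-5, 4.8e-4  # values reported in the text
    print(bonferroni(replication_p, 15))          # about 7.2e-03
    print(fisher_combined_p([discovery_p, replication_p]))  # about 6.2e-07
```

With the rounded inputs from the text, the combined value comes out near 6.2 × 10⁻⁷, in line with the reported 6.23 × 10⁻⁷, and the Bonferroni-corrected replication value matches the reported 7.2 × 10⁻³.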
The 3 associated SNPs were examined for linkage disequilibrium in relation to PROX1. There was a strong backbone of linkage around the associated SNPs, and for 2 SNPs, the linkage disequilibrium extended, albeit weakly, into the PROX1 coding region (Figure 2, which appears only in the electronic version of the Journal). We are presently dissecting the haplotype structure of the region, which includes genotyping additional SNPs and resequencing the region, to further refine the association with progression to clinical AIDS.
Our stage 1 genomewide association analysis provides an opportunity to replicate previously identified candidate AIDS restriction genes in these cohorts. In the analysis of progression to clinical AIDS, we found clear associations for SNPs within 50 kb of HLA-B (the most significant was rs16899646; P = 1.33 × 10⁻⁵). Likewise, we found associations for SNPs within or in linkage disequilibrium with SNPs within the following genes, reported elsewhere to influence AIDS progression: CCR2/CCR5 (rs916093; P = 1.8 × 10⁻³) and CXCR1 (Table 4).
The associations of RNF39 and ZNRD1 with disease progression reported by Fellay et al were not replicated in our stage 1 analysis (assessing the SNPs in strong linkage disequilibrium with those identified by Fellay and colleagues). The discrepancies between these 2 progression GWAS may reflect small sample sizes (n = 486 in the study by Fellay et al and n = 156 in the present study) or the differences in disease progression phenotypes. Fellay et al used time from seroconversion to the start of antiretroviral treatment or time to the predicted or observed first CD4+ T cell count <350 cells/mm³, whereas both stage 1 (discovery) and stage 2 (replication) of our study tracked time from seroconversion to a clinical AIDS diagnosis or AIDS-related death.
Because recent HIV-related GWAS have focused on viral load phenotypes [7–9], we sought to replicate these associations with use of viral load measurements for the 156 individuals from stage 1. Set point viral load was calculated according to the methods of Fellay et al as the mean of log₁₀-transformed viral load (measured in viral RNA copies per milliliter) from 6 months to 3 years after the first visit with a positive HIV-1 antibody test result, including only visits before any antiretroviral therapy. To avoid measurements obtained during the initial peak viremia or during the accelerating phase of chronic disease characterized by increasing viral load, we removed all measurements >0.5 log₁₀ higher or lower than the mean viral load during visits from 6 months to 3 years after seroconversion, according to the methods of Fellay et al.
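The set point calculation described above can be sketched as a two-step filter followed by a mean. This is a simplified reading rather than the published procedure: the function name and input layout are hypothetical, times are taken as years since the first seropositive visit, the caller is assumed to have already dropped visits after the start of antiretroviral therapy, and the outlier rule is applied in a single pass.

```python
import math
from statistics import mean
from typing import Optional, Sequence, Tuple


def set_point_viral_load(
    measurements: Sequence[Tuple[float, float]],
    max_deviation: float = 0.5,
) -> Optional[float]:
    """Mean log10 viral load between 6 months and 3 years.

    measurements: (years since first seropositive visit, RNA copies/mL)
    pairs from pre-therapy visits only.
    Returns None when no usable measurement falls in the window.
    """
    window = [
        math.log10(copies)
        for years, copies in measurements
        if 0.5 <= years <= 3.0 and copies > 0
    ]
    if not window:
        return None
    center = mean(window)
    # Drop values more than max_deviation log10 above or below the window mean.
    kept = [value for value in window if abs(value - center) <= max_deviation]
    return mean(kept) if kept else None


if __name__ == "__main__":
    visits = [(0.2, 1e6), (0.7, 1e4), (1.2, 1e4), (1.8, 1e4), (2.5, 1e4), (2.9, 1e6)]
    print(set_point_viral_load(visits))
```

In this example the early peak at 0.2 years falls outside the window, and the late rise at 2.9 years lies more than 0.5 log₁₀ above the window mean, so only the four stable measurements contribute to the set point.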
We tested for associations between SNP genotypes and viral load phenotypes with use of the aforementioned SNP filtering and linear regression methods. Fellay et al examined viral RNA set point in 486 European individuals and found 2 SNPs strongly associated with set point HIV-1 load. One polymorphism was found in a nonsynonymous coding nucleotide of HLA complex P5 (HCP5); this SNP is in high linkage disequilibrium with HLA-B*5701 on chromosome 6. The other SNP was found in the 5′ region of HLA-C. Although there was minimal overlap in SNPs tested between our genotyping platform (Affymetrix 500K) and that used by Fellay and colleagues (Illumina 550K), certain relationships were replicated (Table 5) and reaffirmed the dominant role of the HLA in controlling viral load. For example, SNP rs2248462, intergenic to HCP5 on chromosome 6 and associated with viral set point by Fellay and colleagues (P = 3.61 × 10⁻⁶), was also significantly associated with viral set point in our study (P = 4.15 × 10⁻⁴; corrected, P = 1.16 × 10⁻²); this SNP is in strong linkage disequilibrium (r² = 1) with 2 other associated SNPs from our study: rs2516422 (P = 5.12 × 10⁻⁴) and rs2395034 (P = 5.13 × 10⁻⁴). The minor alleles of these 3 SNPs were associated with decreased viral set point. SNP rs9348876, 7.8 kb upstream of AIF1 on chromosome 6, was significantly associated with viral set point both by Fellay et al (P = 4.09 × 10⁻⁵) and in our data set (P = 1.3 × 10⁻³; corrected, P = 3.75 × 10⁻²).
In our stage 1 GWAS of viral set point, we observed significant associations with HIV-1 load for several SNPs in the HLA region, as reported elsewhere. This portion of our study confirms the dominant role of HLA genes in controlling viral load.
In our 2-stage association study of host genetic polymorphisms and rate of progression to clinical AIDS, we found SNPs near the transcription factor PROX1 that were confirmed in replication tests, with use of a larger and independent set of seroconverters. Human PROX1 is involved in biologic functions closely tied to HIV infection, most notably as a negative regulator of interferon (IFN) γ expression in T cells. IFN-γ plays an important role in HIV disease progression through its activity as a regulatory cytokine and inflammatory effector. Furthermore, PROX1 encodes a transcription factor whose differential expression has been shown to mediate the progression of Kaposi sarcoma, a common sequela of HIV-1 infection. The exact role of this locus in HIV disease progression is unclear, although the regulation of IFN-γ in T cells by PROX1 presents a possible mechanism of action. In our stage 1 population, the association between this locus and disease progression is not explained by an association with Kaposi sarcoma as the AIDS-defining illness. Additional studies will be necessary to determine the association between these SNPs and differential levels of expression of PROX1 and/or IFN-γ.
These data suggest that, beyond the major role of HLA in viral control, a cumulative polygenic host component may be involved in the regulation of rate of progression to clinical AIDS. Our results should prove to be valuable in informing larger host genetics studies of HIV-1 and progression to AIDS and in guiding future replication and follow-up studies in other cohorts and populations.
We thank Rob Hall, Trevor King, and Roger Bumgarner at the Center for Array Technologies, University of Washington; Janet Schollenberger at the Center for the Analysis of MACS Data, Johns Hopkins University; and all the individuals who participated in the cohorts used for this study.
Financial support: National Institutes of Health (NIH; R37 AI47734 to J.I.M. and T32 AI07140 to J.T.H.) and University of Washington Center for AIDS Research Genomics Core (P30 AI27757 to J.T.H. and J.I.M.). The Multicenter AIDS Cohort Study is funded by the National Institute of Allergy and Infectious Diseases, with additional supplemental funding from the National Cancer Institute (NCI) and the National Heart, Lung and Blood Institute (UO1 AI35042, 5MO1 RR00722 [GCRC], UO1 AI35043, UO1 AI37984, UO1 AI35039, UO1 AI35040, UO1 AI37613, and UO1 AI35041); partial support was provided by the Intramural Research Program, NCI, NIH. The AIDS Link to the Intravenous Experience study was supported by the National Institute on Drug Abuse (R01-DA04334 and R01–12586). The San Francisco City Clinic Cohort Study was supported by the Centers for Disease Control and Prevention (U64/CCU900523–08).
Presented in part: 16th Conference on Retroviruses and Opportunistic Infections, Montreal, Quebec, February 2009 (poster 544).
Potential conflicts of interest: none reported.