Background. Host genetic variation influences human immunodeficiency virus (HIV) infection and progression to AIDS. Here we used clinically well-characterized subjects from 5 pretreatment HIV/AIDS cohorts for a genome-wide association study to identify gene associations with rate of AIDS progression.
Methods. European American HIV seroconverters (n = 755) were interrogated for single-nucleotide polymorphisms (SNPs) (n = 700,022) associated with progression to AIDS 1987 (Cox proportional hazards regression analysis, co-dominant model).
Results. Association with slower progression was observed for SNPs in the gene PARD3B. One of these, rs11884476, reached genome-wide significance (relative hazard = 0.3; P = 3.370 × 10−9) after statistical correction for 700,022 SNPs and contributes 4.52% of the overall variance in AIDS progression in this study. Nine of the top-ranked SNPs define a PARD3B haplotype that also displays significant association with progression to AIDS (hazard ratio, 0.3; P = 3.220 × 10−8). One of these SNPs, rs10185378, is a predicted exonic splicing enhancer; significant alteration in the expression profile of PARD3B splicing transcripts was observed in B cell lines with alternate rs10185378 genotypes. This SNP was typed in European cohorts of rapid progressors and was found to be protective for AIDS 1993 definition (odds ratio, 0.43; P = .025).
Conclusions. These observations suggest a potential unsuspected pathway of host genetic influence on the dynamics of AIDS progression.
AIDS has taken over 30 million lives in the 30 years since it first emerged as a cluster of rare infections and cancers among men who have sex with men (MSM) and recipients of human immunodeficiency virus (HIV)–contaminated blood products in the early 1980s. HIV has evaded most public health prevention campaigns, and efforts to develop a vaccine have been largely unsuccessful. Highly active antiretroviral therapy (HAART) has been developed, which delays progressive immune deficiency and improves survival. However, HAART does not clear HIV infection, can be difficult to tolerate, and occasionally elicits serious adverse reactions. Genetic association studies of candidate genes have revealed several common genetic variants that modulate AIDS progression, including CCR5-Δ32, HLA-homozygosity, HLA-B27 and -B57, IL10-5'A, MYH9, and KIR variants [2–6]. One of these AIDS Restriction Genes (CCR5-Δ32) encouraged the pharmaceutical development of HIV entry inhibitors, a new class of anti-HIV drugs [7–9], and, in one case, was employed in a successful bone marrow transplantation to a patient with AIDS who remained clear of HIV-1 for >2 years.
There are specific challenges to genome-wide association studies with susceptibility and resistance to infectious diseases generally and HIV/AIDS associations in particular. Because genetic variation is not causative, but rather modulates the host's response to an infectious agent, effects are expected to be moderate. In addition, operative protective or susceptible alleles may be rare in a population. Finally, few clinically well-defined pretreatment HIV/AIDS cohorts exist, and clinical outcomes and progression measures are varied. Therefore, power for any one test is limited. Nevertheless, a number of genome-wide association studies (GWAS) have been applied with some success to AIDS cohorts [11–15]. Genome-wide significant findings (P < 5 × 10−8) have emphasized the importance of the human leukocyte antigen (HLA) region in viral load (VL) set point and progression. There have also been a few intriguing suggestions of new associations with AIDS progression that approach but do not achieve a strict Bonferroni correction for significance (with P values in the range of 10−5 to 10−7), replicate in additional cohorts, and have as-yet-unexplored biological or functional links to AIDS. Most of these studies have been limited to small numbers of patients (ranging from <50 to a few hundred) with extreme and well-defined categorical progression phenotypes (for instance, rapid and slow progressors [14–18]). To target as-yet-undiscovered associations, we have interrogated a different and larger set of SNPs (those present on the Affymetrix 6.0 genotyping array) and have taken advantage of the detailed clinical data from 6 prospective American AIDS cohorts, previously used for discovery of 32 candidate gene associations with HIV acquisition, AIDS progression, AIDS defining condition, or HAART efficacy [3, 19]. 
Subjects (a total of 1526 European American [EA] and 437 African American [AA] study participants; supplemental Table 1) were genotyped for 934,968 SNP variants represented on the Affymetrix 6.0 genotyping array; after quality control (QC) filters that screen for SNP veracity and utility, 700,022 SNPs were analyzed. Here, we present association with progression to AIDS 1987 (Centers for Disease Control and Prevention [CDC] definition), a complex but clinically relevant phenotype, in all of the successfully genotyped EA seroconverters (755 total subjects).
After correction for multiple SNPs, we found a statistically significant signal for a single SNP within the PARD3B gene region on Chromosome 2. Additional linked SNPs in the region approached genome-wide significance (P values of 10−5 to 10−8) and define a protective haplotype (odds ratio [OR], 0.3; P = 3.2 × 10−8). One of these SNPs was assessed for functional consequence by quantification of PARD3B transcripts (spliced and unspliced variants) in lymphoblastoid cell lines with alternative PARD3B SNP genotypes and was interrogated in additional cohorts for association with AIDS (1993 definition).
EA and AA subjects were selected for GWAS from 6 cohorts on the basis of the clinical relevance and availability of high-quality, high-molecular-weight DNA: Multicenter AIDS Cohort Study (MACS, principal investigator [PI] J. Phair), Multicenter Hemophilia Cohort Study (MHCS, PI J. Goedert), D.C. Gays (DCG, PI J. Goedert), Hemophilia Growth and Development Study (HGDS, PI E. Gomperts), San Francisco City Clinic Study (SFCC, PI S. Buchbinder), and AIDS linked to Intravenous Experience (ALIVE, PI G. Kirk). Seroconverters, defined as subjects who entered their respective studies with a seronegative HIV status and had an interval of ≤3 years between their last negative and first positive HIV test result, were available from 5 cohorts (EA: 405 from MACS, 222 from MHCS, 40 from DCG, 75 from SFCC, and 13 from ALIVE; AA: 39 from MACS, 9 from MHCS, 2 from DCG, 6 from SFCC, and 226 from ALIVE); seroconversion date was set as the midpoint between last negative and first positive HIV test result.
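The seroconverter definition and midpoint rule described above can be sketched as follows. This is a minimal illustration, not the cohorts' actual data-processing code: the function name, the 365.25-day year, and the rounding of odd intervals down to whole days are assumptions.

```python
from datetime import date, timedelta
from typing import Optional

# "<=3 years" expressed in days; the 365.25-day year is an assumed convention.
MAX_INTERVAL_DAYS = 3 * 365.25


def seroconversion_date(last_negative: date, first_positive: date) -> Optional[date]:
    """Return the midpoint between the two tests, or None if the subject
    does not meet the seroconverter definition (interval > 3 years)."""
    interval_days = (first_positive - last_negative).days
    if interval_days < 0 or interval_days > MAX_INTERVAL_DAYS:
        return None
    # Odd intervals are rounded down to a whole day (assumption).
    return last_negative + timedelta(days=interval_days // 2)
```

For example, a subject whose last negative test was on 1 January 1985 and whose first positive test was on 11 January 1985 would be assigned a seroconversion date of 6 January 1985, whereas a 5-year gap between tests would return `None`.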
DNA was extracted from lymphoblast cell lines and run on the Affymetrix genome-wide human SNP array 6.0 genotyping platform. DNA (325 ng/sample) was prepared for both StyI and NspI restriction enzyme digestion for this assay, an increase of 125 ng over Affymetrix recommendations, which greatly increased our success rate. The remainder of the assay was performed as per manufacturer's instructions.
First-pass QC of samples was performed in Genotyping Console 3.0.2. All samples that passed contrast QC (CQC) (CQC>.4) were genotyped for 934,968 SNPs using Affymetrix Power Tools (apt-probeset-genotype 1.10.0). Samples were genotyped by batch (200–500 samples), and individuals with <90% call rates were excluded from further analysis. Four controls were run a total of 39 times, and 145 duplicate samples and 8 Centre d'Etude du Polymorphisme Humain (CEPH) parent-offspring trios were run within and between plates and verified for concordance across runs (concordances ≥99%). Once acceptable genotyping standards were met, Affymetrix genotypes were compared with previously produced Taqman, Illumina, and/or Perlegen genotypes at a mean of 85 (range, 10–142) sites per sample. Samples with <90% concordance with their previous genotypes were discarded. Affymetrix sex calls (based on Y chromosome markers, X chromosome heterozygosity, and the intensity of the sex chromosome invariant probes) were compared with clinical data files, and inconsistent individuals were discarded.
PLINK was used to assess cryptic identity and cryptic relatedness and to verify sex calls. Identical subjects were investigated and discarded (if clinical data did not match) or were reduced to a single genotype file (if clinical data, including birth date, matched). Related subjects were flagged.
Individuals that passed subject QC were re-clustered and interrogated for all 934,968 SNPs as a single batch. SNPs were then filtered on the basis of multiple criteria. Autosomal perfect-match SNPs supported by Affymetrix (n = 868,157) were evaluated for Mendelian inheritance in CEPH trios (389 dropped), SNP call rates of >95% (39,220 dropped), Hardy-Weinberg equilibrium (HWE; P > .001; 6512 dropped), and minor allele frequency >0.01 (121,403 dropped), resulting in a total of 700,022 SNPs for association analyses in EAs.
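As an illustration of the per-SNP filters listed above (call rate >95%, HWE P > .001, and minor allele frequency >0.01), the following sketch applies them to genotype counts for a single SNP. It is not the pipeline used in this study: the `SnpCounts` container and function names are hypothetical, and the HWE P value is computed with a 1-degree-of-freedom chi-square approximation rather than whatever test the original software used.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one SNP (hypothetical container)."""
    n_aa: int       # homozygous for allele A
    n_ab: int       # heterozygous
    n_bb: int       # homozygous for allele B
    n_missing: int  # samples with no genotype call


def call_rate(s: SnpCounts) -> float:
    called = s.n_aa + s.n_ab + s.n_bb
    total = called + s.n_missing
    return called / total if total else 0.0


def minor_allele_frequency(s: SnpCounts) -> float:
    n = s.n_aa + s.n_ab + s.n_bb
    if n == 0:
        return 0.0
    p = (2 * s.n_aa + s.n_ab) / (2 * n)
    return min(p, 1.0 - p)


def hwe_p_value(s: SnpCounts) -> float:
    """Chi-square (1 df) approximation to the Hardy-Weinberg test."""
    n = s.n_aa + s.n_ab + s.n_bb
    if n == 0:
        return 1.0
    p = (2 * s.n_aa + s.n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (s.n_aa, s.n_ab, s.n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(s: SnpCounts) -> bool:
    """Apply the three thresholds quoted in the text."""
    return (call_rate(s) > 0.95
            and hwe_p_value(s) > 0.001
            and minor_allele_frequency(s) > 0.01)
```

A SNP with counts 810/180/10 and no missing calls sits exactly at Hardy-Weinberg proportions with a minor allele frequency of 0.1 and passes; the same SNP with 100 missing calls fails on call rate.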
Population structure was assessed using the Principal Components Analysis (PCA) module of Eigensoft software. EA and AA subjects segregated with no discrepancies between clinical and genetic assignment. Within EA subjects, the top 10 most significant eigenvectors were calculated, and the top 2 were plotted (data not shown). The distribution of EA study subjects along the top 2 eigenvectors demonstrated a topology that mirrored that seen in several other studies of the European population structure, including a cline that probably reflects Northern European to Southern European descent and a cluster that most likely represents Ashkenazi Jews in our cohorts. AIDS outcomes were not significantly associated with population substructure. We also corrected for population structure with the genomic control method. The λ value was 1.01, and a λ correction did not alter the association results significantly. Both the PCA and genomic control analyses suggested that population stratification was not a confounding factor in our HIV progression analyses.
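For readers unfamiliar with genomic control, the λ value reported above is conventionally the median of the observed association test statistics divided by the median of a 1-degree-of-freedom chi-square distribution (approximately 0.455). The sketch below shows that standard calculation; it assumes the per-SNP statistics are already available as chi-square values, and the function names are illustrative rather than taken from the software used in this study.

```python
import statistics
from typing import List, Sequence

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats: Sequence[float]) -> float:
    """Lambda: observed median statistic over the expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats: Sequence[float]) -> List[float]:
    """Divide each statistic by lambda; lambda below 1 is not applied."""
    lam = max(genomic_inflation_factor(chi2_stats), 1.0)
    return [x / lam for x in chi2_stats]
```

A λ close to 1, such as the 1.01 reported here, leaves the statistics essentially unchanged, which is consistent with the statement that the correction did not materially alter the results.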
All SNPs that contributed to population structure (P < 10−7) were flagged in subsequent tests (displayed as an “X” in the Manhattan plot; Figure 1B). For all SNPs that indicated a potential association, the first 2 eigenvectors were used as covariates in the applicable tests, and P values were adjusted accordingly. None of the SNPs presented in this manuscript demonstrated significant contribution to population structure, and correction for structure did not alter the P values significantly.
Cox model analysis was performed for all SNPs passing the QC filters, using an additive model taking the number of copies of the rare allele, 0, 1 or 2, as the explanatory variable. The analysis used PROC PHREG in Statistical Analysis Software (SAS); the significance was evaluated using the log likelihood test. We performed a strict Bonferroni correction for the 700,022 SNPs used in the GWAS [28–30]. All SNPs identified as interesting by statistical analysis were examined visually for quality assurance. SNP rs11884476 was further investigated to calculate explained fraction (EF) [3, 31, 32] using the SAS-based program, SUREV. Haplotypes were inferred by implementation of an expectation-maximization algorithm.
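The Bonferroni correction amounts to dividing the family-wise significance level by the number of SNPs tested. The short sketch below works through that arithmetic; the family-wise level of 0.05 is an assumption (the text does not state it), and the function names are illustrative.

```python
N_SNPS = 700_022  # SNPs passing QC, from the text
ALPHA = 0.05      # family-wise error rate (assumed, not stated in the text)


def bonferroni_threshold(alpha: float = ALPHA, n_tests: int = N_SNPS) -> float:
    """Per-test P value threshold under a strict Bonferroni correction."""
    return alpha / n_tests


def is_genome_wide_significant(p_value: float,
                               alpha: float = ALPHA,
                               n_tests: int = N_SNPS) -> bool:
    return p_value < bonferroni_threshold(alpha, n_tests)
```

Under these assumptions the per-SNP threshold is roughly 7.1 × 10−8, so the P value of 3.370 × 10−9 reported for rs11884476 passes, whereas a P value of 1.9 × 10−6 does not.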
Both rapid progressors (AIDS, by the CDC 1993 definition, within 3 years of seroconversion) and slow progressors (CD4+ cell count >500 cells/mL and no clinical symptoms of AIDS at least 8 years after seroconversion) from the French GRIV cohort were typed using the Illumina HumanHap300 genotyping beadchips [14, 15], as were rapid progressors and controls in the Amsterdam Cohort Study (ACS). A total of 376 Dutch and 695 French general population seronegative controls were typed on the same platform. One SNP in the PARD3B region of significance was the same on both Illumina and Affymetrix platforms (rs10185378) and was tested as a candidate SNP in a categorical case-control analysis in a combined set of GRIV and ACS rapid progressors (n = 123) versus HIV-negative control subjects of similar ethnicity (n = 1071). Eigenstrat, implemented in Eigensoft, was used to identify outliers in the population, who were subsequently removed from the analysis. The first 2 eigenvectors were used as covariates to correct for any remaining population structure. An attempt was made to impute the other 8 SNPs presented here using IMPUTE 2. However, these SNPs could not be imputed with high confidence in all of the data; 8%–26% of the samples had <90% confidence for genotype calls, with most SNPs missing ~20% of the calls. Minor alleles were absent from most of the imputed data, so statistics could not be reliably performed for these SNPs.
RNA was extracted from lymphoblastoid cell lines of individuals with alternative PARD3B genotypes using the RNeasy Mini kit (Qiagen) and quantified using a NanoDrop spectrophotometer (NanoDrop Technologies). First-strand complementary DNA synthesis was performed using SuperScript III (Life Technologies), according to the manufacturer's instructions. Gene expression was measured by quantitative reverse-transcription polymerase chain reaction (PCR) on a LightCycler 480 (Roche Applied Science). Primers were designed to span the junction between PARD3B exons 19 and 20 (isoforms B and C) or exons 19 and 21 (isoform A) and included 1 forward primer (5′-AAACATGGTGGCCTGAGAGA-3′) and 2 reverse isoform-specific primers (isoform A: 5′-CGTAGGACGGCCACTTTC-3′, isoforms B and C: 5′-TGATGTTTTGCTCCAATCCTT-3′). The FAM-labeled probe used to detect both PARD3B isoforms was Universal ProbeLibrary probe 78 (Roche Applied Science). Quantification of PARD3B expression was performed in multiplex reactions with the human β-actin reference gene assay (Roche Applied Science). Each reaction contained 1× LightCycler Master, 100 nM each probe, and 200 nM each primer, in a final volume of 10 μL. Reactions were performed in triplicate. PCR conditions were 5 min at 95°C, followed by 45 cycles of 10 s at 95°C, 20 s at 60°C, and 1 s at 72°C. PARD3B isoform expression was normalized to β-actin expression for each reaction, and samples were excluded if the coefficient of variation was >.04 for the triplicate. The relative effect of genotype at rs10185378 on differential splicing was tested by linear regression in R, version 2.11.1, using the functions lm and estimable (the latter from the gmodels library).
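The normalization and triplicate exclusion rule can be sketched as below. This is only one plausible reading of the text: it assumes a 2^(−ΔCt) normalization with 100% amplification efficiency and assumes the coefficient of variation is computed on the normalized values, neither of which is specified in the Methods. All names are hypothetical.

```python
import statistics
from typing import List, Tuple

CV_CUTOFF = 0.04  # exclusion threshold quoted in the text


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Target expression relative to the reference gene, 2^(-delta Ct).
    Assumes 100% amplification efficiency (not stated in the text)."""
    return 2.0 ** -(ct_target - ct_reference)


def summarize_triplicate(ct_pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (mean normalized expression, coefficient of variation) for
    a list of (target Ct, beta-actin Ct) pairs from replicate reactions."""
    values = [relative_expression(t, r) for t, r in ct_pairs]
    mean = statistics.mean(values)
    cv = statistics.stdev(values) / mean
    return mean, cv


def keep_sample(ct_pairs: List[Tuple[float, float]]) -> bool:
    """Retain the sample only if the triplicate CV does not exceed the cutoff."""
    _, cv = summarize_triplicate(ct_pairs)
    return cv <= CV_CUTOFF
```

With this reading, three identical replicates (target Ct 25, reference Ct 20) give a normalized value of 2^−5 with zero variation and are kept, whereas replicates whose target Ct values differ by a full cycle are excluded.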
We present data from 700,022 SNPs that passed QC filters (see Methods) in the 755 EA seroconverters from 5 HIV/AIDS natural history cohorts. Of 803 EA seroconverters from 5 cohorts, 24 failed Affymetrix QC or the initial call rate cutoff of 90%, 5 were excluded for inconsistencies with their clinical file, and 15 were excluded due to sample mix-up (cryptic identity or discordance with previous genotypes). Once the remaining 759 subjects were re-clustered and recalled, all but 4 had call rates of >95%, with an average call rate of 98.8%. When those 4 subjects and SNPs failing filters were removed from the analysis, leaving the 755 seroconverters and 700,022 SNPs considered here, average call rate increased to 99.3%.
A test for progression to the 1987 definition of AIDS using Cox model survival analysis revealed departure of multiple SNPs from expectation (Figures 1A and 1B), including 1 SNP, rs11884476 on chromosome 2, position 206,026,838, that reached genome-wide significance (P = 3.370 × 10−9; Figure 1B). The minor allele had a low frequency in our population (0.063) and was protective, with a relative hazard (RH) of 0.3 (Figure 2A). When the study cohort was subdivided on the basis of risk group, MSM and hemophiliacs displayed parallel significant associations (for MSM, RH = 0.43 [P = .0024]; for hemophiliacs, RH = 0.3 [P = .0012]; Figures 2B and 2C).
The protective SNP is located near exon 20 of the PARD3B gene region, which is a homolog of a cell polarity gene identified in Caenorhabditis elegans that is associated with tight junctions in C. elegans and human cells. To quantify the influence of the PARD3B genotype on AIDS progression, we calculated the EF of progression to AIDS 1987 [3, 31, 32] for PARD3B rs11884476 along with 13 genetic factors previously identified as affecting AIDS progression (Table 1). When compared with the 13 previously described AIDS restriction genes re-assessed in this group of subjects, PARD3B factor rs11884476 had the largest EF, 4.52%. The overall EF for all 14 factors considered together was 15.6%.
Eight additional SNPs clustered around the exon 20 region of PARD3B were among the top 10 hits, with P ≤ 1.9 × 10−6 and a protective effect (Table 2; Figures 1B and 3A). Additionally, a total of 14 PARD3B SNPs fell within the lowest 1% of GWAS P values (data not shown). The 9 highest-ranking PARD3B SNPs, including the top 5 hits overall, were all included in a single protective haplotype within PARD3B (P = 3.2 × 10−8; hazard ratio [HR], 0.29; Figure 3A). These signals did not replicate in a cohort of 282 AA seroconverters.
Only 1 SNP (rs10185378) in the protective linkage group was in the coding region: a missense change (C to T) in exon 20 that results in a substitution of isoleucine for threonine. This SNP has also been genotyped in 2 well-defined cohorts of rapid progressors, from the GRIV cohort [14, 15] and the Amsterdam Cohort Study (n = 84 and n = 39, respectively). We examined the effect of this SNP in an analysis of rapid progressors compared with uninfected controls from both cohorts and found that it had a significant protective influence (P = .025; odds ratio [OR], 0.47; co-dominant model), consistent with our findings (Figure 3). In all cohorts (GRIV, ACS, and the discovery NCI-LGD combined cohorts), not a single rapid progressor was homozygous for the minor allele, and the overall frequency of this allele within rapid progressors was half that of the general population (data not shown). Although this amino acid change is not predicted to have a profound effect on protein function, this SNP is within a predicted splice site enhancer for exon 20.
To explore this potential functional basis of PARD3B-based association, we developed a real-time PCR assay to detect expression of alternative splice forms in lymphoblastoid cell lines (LCLs) with alternative PARD3B genotypes for the codon-altering SNP, rs10185378 (Figure 4A). PARD3B produces 3 alternatively spliced reference sequences; 2 of these, isoforms B and C, have an exon (exon 20) that is absent in isoform A (Figure 4B). When we compared RNA expression of these isoforms in LCLs bearing different PARD3B genotypes (CC, n = 12; CT, n = 8; TT, n = 9), we observed an increased average expression of isoform A and a decreased expression of isoforms B and C in cell lines homozygous (TT) for the minor (protective) allele (Figure 4B). There is a significant shift in relative expression of the 2 splice forms (P = .006; co-dominant model; Figure 4C). These data lend in vitro support to the in silico prediction that this enhancer influences PARD3B splicing.
The advance from candidate gene studies to genome-wide association studies for AIDS, as for other diseases, has the notable advantage of being able to reveal previously unsuspected genetic influences. The GWAS results presented here implicate rare genetic variants in the PARD3B gene as conferring slower progression to clinical AIDS among HIV-infected EAs.
PARD3B is a homolog of a C. elegans cell polarity determinant that localizes to tight junctions in human epithelial cells. It may be relevant that human PARD3B gene products interact with several members of the SMAD (mothers against decapentaplegic homolog) family. SMAD proteins are signal transducers and transcriptional modulators that mediate multiple signaling pathways and are involved in a range of biological activities, including cell growth, apoptosis, morphogenesis, development, and immune responses. Of particular interest are the interactions of PARD3B encoded proteins with SMAD2, SMAD3, SMAD4, and SMAD7, all of which interact directly with HIV-1 [39–43]. SMAD3 and 4 also interact with TGFβ to regulate Tat-mediated transcription of HIV-1 LTR. Individually, SMAD3 increases viral promoter activity, whereas SMAD4 decreases it. SMAD4 also suppresses activation by SMAD3. These proteins have similar effects on Tat-mediated transcription of CCL2 (formerly MCP-1). SMAD2 and 7 are involved in HIV-1 glycoprotein 120-induced tubular epithelial cell apoptosis. SMAD7 helps prevent apoptosis, whereas SMAD2 is involved in downstream signaling.
PARD3B is a large gene (with 23 exons coding for 1205 amino acids and spanning 1.07 Mbp) located on chromosome 2. One of the exons (exon 20) is alternately spliced. Three splice variants of the PARD3B product have been characterized and shown to have differences in their protein binding properties. The functional significance of these alternative forms is unknown, but they may also have different expression profiles. A SNP in the cluster of association, rs10185378, which introduces a missense mutation in exon 20, also eliminates a predicted splice enhancer for this exon. We tested this same SNP for transcript production using a group of 29 lymphoblastoid cell lines of alternate genotypes and observed a significant increase in messenger RNA levels of the PARD3B isoform lacking exon 20 relative to forms with exon 20 with the rare allele (TT) at rs10185378 (Figure 4). It is by no means clear that this is the causative polymorphism; additional studies into PARD3B function with respect to HIV-host interactions are indicated.
As with any association result that may produce false-positive results, replication in independent cohorts is essential. Although results here are significant for this association, several factors make independent replication particularly important. The fact that the gene identified falls in an unexpected category demands additional evidence for the association to be validated. By the same token, if confirmed, the identification of PARD3B would have the potential to reveal an entirely new avenue of HIV cellular activity, with potential new targets for therapy, and could lead to a new understanding of the process of AIDS pathogenesis. The corroborative association of PARD3B-SNP rs10185378 in the European GRIV and ACS rapid progression cohorts provides an initial replication for this association (Figure 3B). An EA seroconverter cohort with AIDS 1987 progression would be a more valuable, though unavailable, comparison, because the signal in our cohort clearly associates with this particular phenotype, is not evident when comparing rapid and slow progressors from our cohorts for CD4+ cell count <200 cells/mL, and does not achieve genome-wide significance for AIDS 1993 definition (Figure 5A). Therefore, it is not surprising that our SNP of highest significance did not replicate for CD4+ cell count <300 cells/mL or AIDS 1993 in a dichotomous analysis of the GRIV rapid versus slow progressors (data not shown). In addition, allele frequency and haplotype structure in African populations are quite different from those in EA populations, which might account for the failure to replicate this signal in a small (n = 282) AA seroconverter replication cohort.
Although this GWAS demonstrates the strength of this approach for detecting unsuspected AIDS restriction genes, it also illustrates some weaknesses. Although we observed significant unadjusted P values for many of the previously validated ARGs, none achieved genome-wide significance in the present GWAS, nor did we detect additional recently reported associations from other HIV/AIDS progression GWAS [11–14] with this phenotype (Figure 5B). Deviations in SNP choice and region coverage from platform to platform account for some of the disparity between results from similar studies (as demonstrated by our difficulty in using Illumina HumanHap300 SNPs to impute Affymetrix 6.0 SNPs with high confidence in our PARD3B region of significance; see Methods), as do alternative choices in disease phenotype tests and genetic models. The AIDS progression end point phenotype examined here represents a proven, well-powered analysis for progression to clinical AIDS. However, it is a complex phenotype, and further exploration of PARD3B association with multiple sequelae is now clearly indicated. A more detailed exploration of genetic associations with all of the clinical data collected on these AIDS cohorts will ultimately allow a much broader view of multiple AIDS phenomena, albeit with statistical penalties for multiple test approaches (J. A. Lautenberger, J. L. Troyer, and S. J. O'Brien, unpublished data). As the interpretation of the full range of data necessarily involves subjective judgment, with a corresponding loss of statistical rigor, it seems useful to extract from the full dataset the statistically straightforward analysis presented here, which implicates a strong new AIDS progression genetic association.
The Intramural Research Program of the National Institutes of Health, National Cancer Institute, Center for Cancer Research; the National Cancer Institute, National Institutes of Health, under contract HHSN26120080001E. The Hemophilia Growth and Development Study is funded by the National Institutes of Health, National Institute of Child Health and Human Development, 1 R01 HD41224. Funding for genetic studies on the ACS was provided by the Netherlands Organization for Scientific Research (TOP, registration number 9120.6046).
We gratefully acknowledge and assign collaborative credit to the investigators who have assembled the AIDS study cohorts: Multicenter AIDS Cohort Study (MACS, PI J. Bream), Multicenter Hemophilia Cohort Study (MHCS, PI J. Goedert), D.C. Gays (DCG, PI J. Goedert), Hemophilia Growth and Development Study (HGDS, PI E. Gomperts), San Francisco City Clinic Study (SFCC, PI S. Buchbinder), and AIDS linked to Intravenous Experience (ALIVE, PI G. Kirk). We also acknowledge the excellent work of Michelle Hall, Mary McNally, Lisa Maslan, David Wells, and Jamie Troxler, who worked tirelessly to produce the genotypes, and the LGD student interns who provided sample preparation and technical assistance. This study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government.