Seven genome-wide association studies (GWAS) have been published in AIDS and only associations in the HLA region on chromosome 6 and CXCR6 have passed genome-wide significance.
We reanalyzed the data from three previously published GWAS, targeting specifically low frequency SNPs (minor allele frequency (MAF)<5%). Two groups composed of 365 slow progressors (SP) and 147 rapid progressors (RP) from Europe and the US were compared with a control group of 1394 seronegative individuals using Eigenstrat corrections.
Of the 8584 SNPs with MAF<5% in cases and controls (Bonferroni threshold=5.8×10−6), four SNPs showed statistical evidence of association with the SP phenotype. The best result was for HCP5 rs2395029 (p=8.54×10−15, OR=3.41) in the HLA locus, in partial linkage disequilibrium with two additional chromosome 6 associations in C6orf48 (p=3.03×10−10, OR=2.9) and NOTCH4 (p=9.08×10−07, OR=2.32). The fourth association corresponded to rs2072255 located in RICH2 (p=3.30×10−06, OR=0.43) on chromosome 17. Using HCP5 rs2395029 as a covariate, the C6orf48 and NOTCH4 signals disappeared, but the RICH2 signal remained significant.
Besides the already known chromosome 6 associations, the analysis of low frequency SNPs brought up a new association in the RICH2 gene. Interestingly, RICH2 interacts with BST-2 known to be a major restriction factor for HIV-1 infection. Our study has thus identified a new candidate gene for AIDS molecular etiology and confirms the interest of singling out low frequency SNPs in order to exploit GWAS data.
In recent years, the development of new DNA technologies has allowed the successful completion of genome-wide association studies (GWAS), and multiple genetic associations have been identified in several diseases such as Celiac disease1, Schizophrenia2, or Type 2 diabetes3. In AIDS, several GWAS have been published since 20074–10 and associations passing genome-wide significance have been found solely in chromosome 6 in the region of HLA (HCP54–6, HLA-C4, 5, 11) and in the CXCR6 gene12. The design of genotyping chips tends to rely mainly on common variants with minor allele frequencies (MAF) > 5%. Moreover, the power to detect low allele frequency SNP associations in AIDS is weakened for two complementary reasons: 1. a lower frequency means fewer carriers and thus weaker p values, and 2. AIDS-related genomic cohorts have enrolled fewer patients than cohorts for other pathologies such as chronic kidney disease13 or Type 2 diabetes3. However, low frequency SNPs are predicted to have the potential for greater functional consequences than common alleles and may contribute strongly to genetic susceptibility to common diseases14–16; thus, they constitute very good candidates for genetic association studies. We therefore decided to re-analyze the genome-wide data obtained from the GRIV cohort on slow progression6 and on rapid progression7, by focusing specifically on low frequency SNPs (MAF < 5%). In order to increase the power of the study, we also included in our analysis rapid and slow progressors from the Dutch ACS cohort11 and from the American MACS156 group8.
Slow Progressors and Rapid Progressors were gathered from 3 HIV cohorts:
The GRIV (Genomics of Resistance to Immunodeficiency Virus) cohort, established in 1995 in France, is a collection of DNA samples to identify host genes associated with slow progression and with rapid progression to AIDS17–20. Only Caucasians of European descent living in France were eligible for enrolment to reduce confounding by population substructure. These criteria limit the influence of ethnic and environmental factors (all subjects live in a similar environment and are infected by HIV-1 subtype B strains) and put an emphasis on the genetic make-up of each individual in the determination of rapid (RP) or slow progression (SP) to AIDS. The RP and the SP were included on the basis of the main clinical outcomes, CD4 T-cell count and time to disease progression. SP were defined as HIV-1 infected individuals who had remained asymptomatic for more than 8 years, without treatment and with a CD4 T-cell count above 500/mm3. The SP group (n=270) was composed of 200 males and 70 females aged at inclusion from 19 to 62 (mean=35). Rapid progressors (RP) were stringently defined as having two or more CD4 T-cell counts below 300/mm3 less than 3 years after the last seronegative testing. The RP group (n=84) was composed of 72 males and 12 females aged at inclusion from 21 to 55 (median=32). DNA was obtained from fresh peripheral blood mononuclear cells or from EBV-transformed cell lines. All patients provided written informed consent before enrolment in the GRIV genetic association study.
The ACS (Amsterdam Cohort Studies) cohort is composed of 316 HIV-1 homosexual men and 100 HIV-1 drug users. This cohort was established to follow the course of HIV-1 infection using various endpoints related to HIV-1 infection and AIDS. The ACS participants were described in detail previously11.
Since CD4 T-cell count was assessed during routine clinical follow-up, we could extract the ACS SP and RP patients meeting the GRIV criteria (SP: n=36, RP: n=41). The SP and RP status was easily determined among seroconverter subjects since the date of seroconversion was known. We could also extract SP from seroprevalent subjects when the duration of seropositivity was known to be longer than 8 years.
The MACS156 study comprises a subset of 156 HIV-1 homosexual men enrolled in the MACS (MultiCenter AIDS Cohort Study) cohort, a prospective cohort originally established to investigate the natural history of HIV infection21. This subset of MACS European American participants was chosen to be enriched with the extreme AIDS progression phenotypes8. The MACS156 participants were described in detail previously8.
Since CD4 T-cell count was assessed during routine clinical follow-up, we could extract the MACS SP and RP meeting the GRIV definition (SP: n=59, RP: n=22). As with the ACS cohort, the SP and the RP were selected from seroconverter subjects. SP were also selected from seroprevalent subjects.
Three Caucasian control groups from France, The Netherlands, and the USA were merged and used as a control group.
The Data from an Epidemiological Study on Insulin Resistance Syndrome (D.E.S.I.R) program was a 9-year follow-up study designed to clarify the development of the insulin resistance syndrome. Subjects were recruited from 1994 to 1996 from volunteers insured by the French social security system, which offers periodic health examinations free of charge22. This control group comprised 694 participants both non obese and normoglycemic of the D.E.S.I.R. trial, all French and HIV-1 seronegative. It was composed of 281 males and 413 females aged from 30 to 64 years.
The CTR-ACS control group corresponds to 376 Dutch subjects genotyped with HumanHap300 BeadChips23.
The Illumina-CTR control group corresponds to 324 Caucasian subjects genotyped with HumanHap300 BeadChips from the Illumina Genotyping Control Database (www.illumina.com).
The GRIV cohort, the ACS cohort, and the control groups were genotyped using the Illumina Infinium II HumanHap300 BeadChips (Illumina, San Diego, CA, USA). The genotyping quality was assessed for each group using the BeadStudio software (version 3.1, Illumina). SNPs with missing data >2%, with a minor allele frequency <1%, or deviating from Hardy-Weinberg equilibrium in the control groups (p<1.0×10−3) were removed during these quality control steps6, 7, 11. The MACS156 group genotype data were obtained with the Affymetrix GeneChip Human Mapping 500K Array (Affymetrix, Santa Clara, CA, USA). Different quality control filters were applied to ensure reliable genotyping data8. For all the cohorts, we removed outliers exhibiting non-Caucasian ancestry, cohort by cohort, using the Eigenstrat method24.
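The three SNP-level filters quoted above can be sketched as follows. This is only an illustration under stated assumptions: the study applied its filters with BeadStudio, the text does not say which Hardy-Weinberg test was used (a 1-df Pearson chi-square is assumed here), and the genotype coding (0, 1 or 2 copies of the minor allele, `None` for a missing call) and all function names are hypothetical.

```python
import math

# Thresholds quoted in the text; names and genotype coding are hypothetical.
MAX_MISSING = 0.02   # remove SNPs with more than 2% missing genotypes
MIN_MAF = 0.01       # remove SNPs with a minor allele frequency below 1%
HWE_ALPHA = 1.0e-3   # remove SNPs with HWE p < 1.0e-3 in the controls


def missing_rate(genotypes):
    """Fraction of missing calls (None) for one SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_freq(genotypes):
    """MAF from genotypes coded as 0/1/2 copies of one allele."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def hwe_pvalue(genotypes):
    """1-df Pearson chi-square test of Hardy-Weinberg proportions (assumed test)."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    q = (observed[1] + 2 * observed[2]) / (2 * n)
    p = 1 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(all_genotypes, control_genotypes):
    """Apply the three filters; HWE is checked in the controls only."""
    return (missing_rate(all_genotypes) <= MAX_MISSING
            and minor_allele_freq(all_genotypes) >= MIN_MAF
            and hwe_pvalue(control_genotypes) >= HWE_ALPHA)
```

In practice such filters are usually run with dedicated genotype QC software; the sketch only makes the order and meaning of the thresholds explicit.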
We considered two pooled case groups (SP from GRIV, ACS, and MACS156 on the one hand, RP from GRIV, ACS, and MACS156 on the other hand) and the pooled control groups (D.E.S.I.R., CTR-ACS, Illumina-CTR). We retained the 8584 SNPs exhibiting a MAF < 5% for the SP-CTR comparison (Bonferroni 5.8×10−6), and 10295 SNPs exhibiting a MAF < 5% for the RP-CTR comparison (Bonferroni 4.8×10−6). It was important to choose SNPs with low frequency in either group since we were looking for factors either promoting progression (low MAF in SP compared to CTR or low MAF in CTR compared to RP) or preventing progression (low MAF in RP compared to CTR or low MAF in CTR compared to SP). The choice to screen specifically low frequency SNPs stems from two complementary reasons: 1. a biological reason: HIV-1 infection is a multi-factorial disease with several genetic factors impacting disease. Several groups have pointed out that most signals involved in complex diseases should lie in the low frequency variant spectrum14–16. Indeed, this observation was confirmed in AIDS since the main signal found up to now was in HCP5 with a low frequency variant4–6, 9. 2. a statistical fact: in our case-control configuration, a low frequency in either the CTR or the CASE group systematically means a weaker p value for a given odds ratio (OR, which measures the real biological impact of the SNP). For instance, for an OR of 0.5 in the dominant mode, the p values obtained are: 0.02 with a SP MAF of 2% and a CTR MAF=3.9%, 1.5×10−5 with a SP MAF of 5% and a CTR MAF=9.5%, 9.98×10−8 with a SP MAF of 10% and CTR MAF=18.2%. Moreover, SNPs with low MAF are under-represented on the Illumina genotyping chips used compared to SNPs with higher MAF (data not shown).
Overall, for a biological effect measured by a given OR, SNP associations are thus more difficult to identify in the low frequency spectrum since they exhibit weaker p values by nature and since they are under-represented and thus artificially penalized through global Bonferroni corrections.
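The two statistical points made above (the Bonferroni threshold and the loss of significance at low MAF for a fixed effect size) can be illustrated with a minimal sketch. Several assumptions are made here and are not taken from the study: carrier frequencies are derived from the MAF under Hardy-Weinberg proportions, expected rather than observed counts are used, and a Pearson chi-square test on the 2×2 carrier table replaces the logistic regression actually used. The text also does not say whether the OR of 0.5 is defined on alleles or on carriers, so the function reports the carrier odds ratio. The sketch is therefore expected to show the trend, not to reproduce the quoted p values.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold for n_tests comparisons."""
    return alpha / n_tests


def carrier_frequency(maf):
    """Frequency of subjects carrying at least one minor allele (HWE assumed)."""
    return 1 - (1 - maf) ** 2


def dominant_test(maf_case, maf_ctrl, n_case, n_ctrl):
    """Carrier odds ratio and 1-df chi-square p value from expected counts."""
    a = carrier_frequency(maf_case) * n_case   # carriers among cases
    b = n_case - a                             # non-carriers among cases
    c = carrier_frequency(maf_ctrl) * n_ctrl   # carriers among controls
    d = n_ctrl - c                             # non-carriers among controls
    n = n_case + n_ctrl
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    return odds_ratio, math.erfc(math.sqrt(chi2 / 2))


# MAF pairs quoted in the text (SP, CTR) with the study sample sizes.
scenarios = [(0.02, 0.039), (0.05, 0.095), (0.10, 0.182)]
results = [dominant_test(sp, ctr, 365, 1394) for sp, ctr in scenarios]
for (sp, ctr), (odds_ratio, p) in zip(scenarios, results):
    print(f"SP MAF={sp:.3f} CTR MAF={ctr:.3f} carrier OR={odds_ratio:.2f} p={p:.2e}")
print(f"Bonferroni threshold for 8584 SNPs: {bonferroni_threshold(8584):.2e}")
```

For a roughly constant odds ratio, the p value shrinks as the MAF grows, which is the reason low frequency SNPs are penalized when a single Bonferroni threshold is applied to all SNPs on a chip.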
We performed a case-control analysis by comparing either the SP group (n=365) or the RP group (n=147), consisting of GRIV, ACS, and MACS patients, with the control group consisting of D.E.S.I.R., CTR-ACS, and Illumina control individuals (n=1394). The statistical analysis was performed by logistic regression (with SNPtest software25) in the dominant mode, taking into account stratification by adding the first two Eigenstrat PC axes, computed with EIGENSOFT, as covariates. Testing for association under the dominant model was appropriate since we lacked power to test for associations under the recessive model and, additionally, in this context the dominant model is identical to the additive model. For each SNP passing the Bonferroni threshold, we recomputed the regression by adding the HCP5 SNP (rs2395029) as a covariate to check for non-independence due to linkage disequilibrium (with SNPtest software25).
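A dominant-mode logistic regression of this kind can be sketched in plain Python with Newton-Raphson iterations. This is not the SNPtest implementation; the function names are hypothetical, and the example counts (33 carriers among 365 SP, 263 carriers among 1394 controls) are back-calculated here from the carrier percentages reported for rs2072255, so they are an assumption. Covariates such as the two principal-component scores or the HCP5 genotype would simply be appended to each subject's row; they are omitted because the individual-level values are not available in the text.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(rows, y, n_iter=25):
    """Newton-Raphson fit; rows hold covariates (no intercept), y is 1=case, 0=control."""
    design = [[1.0] + list(r) for r in rows]   # prepend the intercept column
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(design, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += p * (1 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def dominant(genotype):
    """Dominant coding: 1 if at least one copy of the minor allele, else 0."""
    return 1.0 if genotype >= 1 else 0.0


# Assumed counts, back-calculated from the reported carrier percentages.
genotypes = [1] * 33 + [0] * 332 + [1] * 263 + [0] * 1131
status = [1] * 365 + [0] * 1394
rows = [[dominant(g)] for g in genotypes]
beta = fit_logistic(rows, status)
print(f"carrier odds ratio = {math.exp(beta[1]):.2f}")
```

With a single binary covariate, the fitted coefficient reduces to the log of the odds ratio of the 2×2 carrier table, which gives a quick consistency check before covariates are added.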
Using SNPtest Impute software25, it was possible to impute untyped SNPs of the MACS156 study subjects, absent from the Affymetrix GeneChip Human Mapping 500K Array (Affymetrix, Santa Clara, CA, USA) and present in the Illumina HumanHap300 BeadChips (Illumina, San Diego, CA, USA). They were imputed using the HapMap release 21 phased data for the European population (CEU) as the reference panel (http://www.hapmap.org). Only the SNPs imputed with high reliability (imputation quality score25 P>0.9) were retained.
After comparing the total SP group (combined from GRIV, ACS, and MACS156) with the total control group (combined from D.E.S.I.R., CTR-ACS, and Illumina-CTR), we also performed an individual analysis of each group (GRIV, ACS, and MACS156) for the four SNPs passing the Bonferroni threshold. For GRIV, we checked the result obtained in our previous GWAS6. For ACS, we tested the association between the four SNPs and time to AIDS93 by linear regression. For the MACS156 group, we also tested the association between the SNPs and time to AIDS93 by linear regression.
For each SNP exhibiting a significant association, we looked for the other SNPs in linkage disequilibrium (r2≥0.9) in the HapMap population of Western European ancestry (CEU, HapMap data Release 21a/phase II January 2007, on NCBI B35 assembly, dbSNP125, http://www.hapmap.org) in order to identify the genes possibly tracked by the SNP associations. A SNP was assigned to a gene if it was located within the gene or in the 2kb flanking regions (potential regulatory sequence), otherwise it was considered intergenic.
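The two rules described above (the r2≥0.9 proxy criterion and the 2 kb flanking window for gene assignment) can be sketched as follows. This is an illustration only: it assumes phased haplotypes coded as pairs of 0/1 alleles, as would be available from a phased reference panel, and the function names, gene name, and coordinates in the usage example are hypothetical.

```python
def r_squared(haplotypes):
    """LD r2 between two biallelic SNPs from phased haplotypes [(a, b), ...]."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


def is_proxy(haplotypes, threshold=0.9):
    """True if the two SNPs satisfy the r2 >= 0.9 criterion used in the text."""
    return r_squared(haplotypes) >= threshold


def assign_gene(position, genes, flank=2000):
    """Return the first gene whose span, extended by 2 kb on each side, contains the SNP."""
    for name, start, end in genes:
        if start - flank <= position <= end + flank:
            return name
    return "intergenic"


# Hypothetical gene coordinates, for illustration only.
genes = [("GENE_X", 10_000, 20_000)]
print(assign_gene(8_500, genes))   # inside the 2 kb upstream flank
print(assign_gene(7_000, genes))   # outside the flank
```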
To further explore the associations observed, we tried to identify putative modifications in mRNA expression as shown in Genevar26 and Dixon27 databases, in splicing (FastSNP28, http://fastsnp.ibms.sinica.edu.tw/pages/input_CandidateGeneSearch.jsp/), in polyadenylation (polyAH, http://linux1.softberry.com/berry.phtml?topic=polyah&group=programs&subgroup=promoter and polyApred, http://www.imtech.res.in/raghava/polyapred/submission.html), or in transcription factor binding sites (SignalScan, http://www-bimas.cit.nih.gov/molbio/signal/, TESS, http://www.cbil.upenn.edu/cgi-bin/tess/tess?RQ=WELCOME, and TFSearch, http://www.cbrc.jp/research/db/TFSEARCH.html, derived from TRANSFAC database). We used the Genecards database to look for the tissues and organs expressing the proteins (GeneCards29, http://www.genecards.org).
For the 8584 SNPs with a MAF < 5%, we tested a total of 365 SP patients and compared them with a control cohort of 1394 seronegative individuals (from France, the Netherlands, and the US, see methods). Four signals passed the Bonferroni threshold in the dominant mode, three in chromosome 6 and one in chromosome 17 (Table 1). Not unexpectedly, the best result was obtained for the well-replicated HCP5 rs2395029 (p=8.54×10−15). The two other associations found in chromosome 6 were for rs9368699 in C6orf48 (p=3.03×10−10) and rs8192591 in NOTCH4 (p=9.08×10−07). These SNPs are in partial linkage disequilibrium with HCP5-rs2395029 (r2=0.68 and r2=0.57, respectively). The fourth association corresponded to the chromosome 17 SNP rs2072255 located in the RICH2 gene (p=3.30×10−6). This SNP is in full LD (r2=1) with rs2072254 corresponding to a synonymous change (Gly188Gly) of RICH2 (Figure 1 and see map, Supplemental Digital Content 1). The four SNPs corresponded to the higher end of the MAF distribution of the SNPs studied (see histogram, Supplemental Digital Content 2), which is logical since larger numbers of carriers lead to stronger p values.
In order to evaluate the role of LD in these associations, we recomputed the p-values using the HCP5 SNP (rs2395029) as covariate. The two chromosome 6 SNPs were not significant in the adjusted analysis (p=0.78 for rs9368699 in C6orf48, p=0.31 for rs8192591 in NOTCH4), but the association remained statistically robust for the chromosome 17 SNP after controlling for the HCP5 SNP (p=1.82×10−6 for rs2072255 in RICH2). In line with this computation, we found that the rs2072255 frequency was not significantly different between the SP elite controllers12 and the SP non-elite controllers12 (p=0.7).
Carriers of the rs2072255-A allele represented 9.04% of the SP group (Table 1), 18.87% of the control group, and 17.1% of the RP group (this excludes the hypothesis of an effect on infection). Interestingly, these frequencies were consistent within each of the three SP groups as well as within each of the three control groups (Figure 2). Moreover, the positive signal for association of rs2072255 was confirmed in the individual cohorts GRIV, ACS, and MACS156: the comparison of the SP with D.E.S.I.R. controls in GRIV led to p=8.1×10−5, the analysis of ACS by linear regression led to p=0.05, and the analysis of the MACS156 group by linear regression also led to p=0.05. A table summarizing the results in the different cohorts is provided in Supplemental Digital Content 3. Finally, the rs2072254 RICH2 exonic SNP (in LD with rs2072255) is located in a splicing site according to FastSNP28 (Figure 1).
When comparing the 10295 selected SNPs between the 147 RP patients and the 1394 seronegative controls, no signal passed the Bonferroni threshold.
We decided to reanalyze previous GWAS data on AIDS cohorts by focusing specifically on low frequency SNPs (MAF<5%). For that, we combined rapid and slow progressors from three international cohorts from France (GRIV), Netherlands (ACS), and US (MACS156 study) totalling 365 SP and 147 RP, who were compared with 1394 controls (seronegative individuals). No association was found when comparing the RP group with the CTR group. This was not a surprise since the RP group comprised only 147 individuals and for a MAF of 5% in the CTR, one needed to get a quite strong biological effect (OR>2.8) to pass the Bonferroni threshold. Four SNPs passed the Bonferroni threshold when comparing the SP group with the CTR group. Among them, three are in chromosome 6 and were previously found significant by several studies: rs2395029 in HCP54–6, 8, 9, 11, rs9368699 in C6orf485, 6, rs8192591 in NOTCH45. NOTCH4 is an interesting candidate gene due to its role in immune regulation and the NOTCH4 rs8192591 corresponds to a non-synonymous Gly534Ser protein variant. This association was found independent from the HCP5 rs2395029 by Fellay et al.5, however the signal disappeared in our own study when using HCP5 rs2395029 as covariate. A possible explanation for this discrepancy could be the use of viral load as an endpoint in the study by Fellay et al.5 while we used here a progression phenotype.
The fourth signal identified in the present study is new and corresponds to rs2072255 in the chromosome 17 RICH2 gene. The RICH2 gene encodes a Rho-type GTPase activator composed of 818 amino acids (89.247 kDa) with an intracellular localisation. It is expressed highly in the brain, and at a basal level in several tissues, notably in the lymph nodes29. A recent study has shown that RICH2 is a part of the physical link between BST-2 and the actin cytoskeleton and prevents the internalisation of BST-230. RICH2 could thus contribute to the externalisation of BST-2, which prevents HIV-1 virion budding and release31. This counteracts HIV-1 Vpu, known to favor internalization and degradation of BST-231. The rs2072255-A RICH2 allele favors progression to AIDS since 18.87% of the CTR carry the variant while only 9.04% of the SP carry it (Table 1, Figure 2). Interestingly, rs2072255 is in total LD with the SNP rs2072254 located in a splicing site of RICH2 as suggested by FastSNP (Figure 1). If the rs2072254-G minor allele alters mRNA splicing, it could lead to a down-modulation of RICH2 and thus explain a diminished effect of BST-2 against HIV-1 production.
The identification of three chromosome 6 signals already confirmed by several studies shows the relevance of targeting specifically low frequency SNPs in GWAS. The genetic and biological data regarding the RICH2 signal are also quite compelling and provide a new relevant candidate gene to explore the molecular etiology of HIV-1 pathogenesis. Further genetic and experimental studies will be needed to confirm and understand the effect of RICH2 in AIDS pathogenesis.
Sources of support: This work was supported by Agence Nationale de Recherche sur le SIDA (ANRS), Sidaction, Fondation de France, Innovation 2007 program of Conservatoire National des Arts et Métiers (CNAM), AIDS Cancer Vaccine Development Foundation, Neovacs SA, Vaxconsulting. Sophie Limou benefits from a fellowship from the French Ministry of Education, Technology and Research and Sigrid Le Clerc benefits from a fellowship of ANRS. The Amsterdam Cohort Studies on HIV infection and AIDS, a collaboration between the Amsterdam Health Service, the Academic Medical Center of the University of Amsterdam, Sanquin Research, and the University Medical Center Utrecht, are part of the Netherlands HIV Monitoring Foundation and financially supported by the Netherlands National Institute for Public Health and the Environment. The authors acknowledge funding from the Netherlands Organization for Scientific Research (TOP, registration number 9120.6046). The MACS is funded by the National Institute of Allergy and Infectious Diseases, with additional supplemental funding from the National Cancer Institute. UO1-AI-35042, UL1-RR025005 (GCRC), UO1-AI-35043, UO1-AI-35039, UO1-AI-35040, UO1-AI-35041. This project has been funded in part with federal funds from the National Institutes of Health, under contract HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. This Research was supported [in part] by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research.
The authors are grateful to all the patients and medical staff who have kindly collaborated with the GRIV project. Data in this manuscript were collected by the Multicenter AIDS Cohort Study (MACS) with centers (Principal Investigators) at The Johns Hopkins Bloomberg School of Public Health (Joseph B. Margolick, Lisa P. Jacobson), Howard Brown Health Center, Feinberg School of Medicine, Northwestern University, and Cook County Bureau of Health Services (John P. Phair, Steven M. Wolinsky), University of California, Los Angeles (Roger Detels), and University of Pittsburgh (Charles R. Rinaldo). Website located at http://www.statepi.jhsph.edu/macs/macs.html. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.