The search for novel host genetic factors that influence HIV pathogenesis has focused on a restricted and largely overlapping set of phenotypes. These phenotypes center on three clinically relevant end points and measurements, which are summarized below. It follows that the susceptibility loci identified thus far participate in innate and adaptive immunity. All of the studies reviewed below are of high quality in terms of study design, genomic data collection, and data analysis. Moreover, a compelling mechanistic rationale exists for each gene identified.
HIV RNA viral load at set point refers to the steady-state level of viral replication attained after the acute phase of the initial HIV infection. Although the set point is challenging to define given the typical follow-up schedule in most cohort studies (i.e., biannual visits) and the exclusion of many HIV-infected individuals who did not meet the inclusion criteria, the concordance among GWAS findings to date is striking (Table ). Specifically, three loci have mapped to the major histocompatibility complex on chromosome 6 and have been verified in every cohort examined for viral load at set point to date. Human leukocyte antigen (HLA) complex P5 (HCP5), HLA class B (HLA-B), and HLA-C harbor protective alleles that were associated with lower viral load at set point [6]. The high degree of correlation (termed linkage disequilibrium [LD]) between HCP5 and HLA-B made their independent associations with viral load at set point difficult to disentangle, because few cases carry the rare recombination events between these loci. However, the association originally mapped to the HCP5 locus was subsequently dissected [6••]. These analyses suggest that the HCP5 locus is associated with higher viral load at set point and that the HLA-B locus (primarily HLA-B*57) is responsible for the protective effects detected by the single nucleotide polymorphism (SNP) located in HCP5 [15].
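To make the notion of LD concrete, the following minimal Python sketch computes the standard pairwise LD statistics (D, |D'|, and r²) for two biallelic loci. The function name and the haplotype and allele frequencies are hypothetical and chosen only for illustration; they are not taken from the studies reviewed here.

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    Returns (D, |D'|, r2).
    """
    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # D_max is the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r2 is the squared correlation between alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = (d * d) / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: a rare allele at one locus that almost always
# travels on the same haplotype as a rare allele at a neighboring locus.
d, d_prime, r2 = ld_stats(p_ab=0.05, p_a=0.05, p_b=0.06)
print(f"D={d:.4f}  |D'|={d_prime:.3f}  r2={r2:.3f}")
```

When r² is close to 1, a SNP at one locus is a near-perfect proxy for the allele at the other, which is why only the rare recombinant haplotypes can separate the two signals.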
Genome-wide searches for HIV-related traits
Disease progression, defined as the time from seroconversion until the point at which immunosuppression occurs (i.e., a CD4+ T-cell count less than 350 cells/mm³ or initiation of highly active antiretroviral therapy [HAART]), is a clinical end point of considerable interest. The extremes of the distribution of disease progression, rapid progressors (RP) and long-term non-progressors (LTNP), have been the focus of several genome searches. The loci previously associated with the related phenotype, viral load at set point (HCP5, HLA-B, and HLA-C), as well as variation in the zinc ribbon domain containing 1 (ZNRD1) gene, are associated with disease progression [6].
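As an illustration of how such a time-to-event phenotype might be derived from cohort visit records, the sketch below takes progression as the earlier of the first CD4+ count below 350 cells/mm³ and the start of HAART. The function name, record layout, and example dates are hypothetical, and individual cohorts may define the end point differently.

```python
from datetime import date


def days_to_progression(seroconversion, visits, haart_start=None, cd4_cutoff=350):
    """Days from seroconversion to the first progression event.

    visits: list of (visit_date, cd4_count) tuples
    haart_start: date HAART was initiated, or None if never started
    Returns None when no event is observed (a censored individual).
    """
    # Dates of visits, on or after seroconversion, with CD4+ below the cutoff.
    event_dates = [d for d, cd4 in visits if d >= seroconversion and cd4 < cd4_cutoff]

    # Treatment initiation also counts as an event under this assumed definition.
    if haart_start is not None:
        event_dates.append(haart_start)

    if not event_dates:
        return None
    return (min(event_dates) - seroconversion).days


# Hypothetical individual seen at roughly six-month intervals.
visits = [(date(2000, 7, 1), 600), (date(2001, 1, 1), 340)]
print(days_to_progression(date(2000, 1, 1), visits))
```

Because visits are widely spaced, the event date is known only to the nearest visit, which is one reason these phenotypes are difficult to define precisely.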
Three studies that employed unique designs, described below, identified additional novel loci for disease progression. The first study sought to refine the LTNP phenotype by excluding elite controllers. Elite controllers differ from LTNPs in that they suppress RNA viral load to levels below the limit of detection. Exclusion of elite controllers in a GWAS for LTNP uncovered an additional risk locus, C-X-C chemokine receptor type 6 (CXCR6), which was validated in several cohorts [11•]. The second study sought to capture the entire spectrum of progression by initially screening three subgroups (i.e., RP, moderate progressors, and LTNP), followed by replication in a larger cohort [8]. Variation in the prospero homeobox 1 (PROX1) gene was associated with slower progression. The third study first performed a two-stage linkage analysis in two family-based cohorts of macaques and then replicated an association signal detected on the X chromosome with viral load at set point and disease progression in a cohort of HIV-infected individuals [14]. The association signal mapped to an intergenic SNP located between the gene encoding ribosomal protein S6 kinase alpha-6 (RPS6KA6) and the gene encoding cylicin-1 (CYLC1). Subsequent validation in a larger sample may allow the gene underlying this association to be definitively identified.
Whereas the association signals with LTNP show considerable overlap with loci detected using disease progression [6] as the phenotype, analysis of RP yielded unique loci [10] that have proven difficult to replicate in other cohorts. This inability to replicate may be due in part to the under-representation of RP in most cohorts. However, the possibility that RP and LTNP harbor risk alleles that are unique to each tail of the distribution cannot be discounted. The minor alleles of SNPs mapping to the genes encoding protein arginine methyltransferase 6 (PRMT6), sex-determining region Y-box 5 (SOX5), and transforming growth factor, beta receptor associated protein 1 (TGFBRAP1) were depleted in RP. The risk allele mapping to the retinoid X receptor gamma (RXRG) gene was enriched in RP.
The majority of genome searches performed to date have focused on European-descent populations [5]. This approach is reasonable, as it reflects the demographics of the epidemic when the cohorts analyzed thus far were initiated. The recent development of methods to account for more complex population substructure has paved the way for the examination of populations with more diverse ancestry, such as Africans [9] and African Americans [13]. Although the first GWAS for viral load at set point in African Americans, reported recently, failed to identify risk loci that exceeded the significance thresholds required of genome-wide searches, the associations with HCP5 and HLA-C were validated [13]. Examination of a completely different phenotype, maternal-to-child transmission in a cohort of HIV-serodiscordant children of HIV-infected mothers from Malawi, yielded several positional candidate genes. However, none exceeded the a priori significance thresholds. Further examination of these suggestive association signals may provide insights into the host genomic influence on the vertical transmission of HIV. Both studies suggest that novel phenotypes may reveal additional genes that influence other facets of HIV transmission and pathogenesis.
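The significance thresholds required of genome-wide searches stem from the very large number of SNPs tested. A minimal sketch of the usual Bonferroni reasoning follows. The number of tests, the suggestive cutoff of 1e-5, and the function names are assumptions made for illustration; the thresholds actually applied in the studies above are not stated here.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def classify_signal(p_value, n_tests, alpha=0.05, suggestive=1e-5):
    """Label an association p-value relative to a Bonferroni-corrected threshold.

    The 'suggestive' cutoff is a commonly used convention, assumed here.
    """
    if p_value < bonferroni_threshold(alpha, n_tests):
        return "genome-wide significant"
    if p_value < suggestive:
        return "suggestive"
    return "not significant"


# Assuming roughly one million independent tests, the corrected threshold
# is on the order of 5e-8.
n_tests = 1_000_000
print(bonferroni_threshold(0.05, n_tests))
for p in (1e-9, 1e-6, 1e-2):
    print(p, classify_signal(p, n_tests))
```

Under this scheme, a signal can fall short of genome-wide significance in a modestly sized cohort and still be worth following up, as with the suggestive signals described above.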
To date, two genome searches have pursued novel HIV traits. The first examined not only circulating RNA viral load but also viral DNA, which serves as an estimate of the HIV viral reservoir [5]. In addition to verification of the previous associations with HCP5 and HLA-C, two additional associations with both lower RNA and DNA viral load were identified. The first was in the syndecan 2 (SYND2) gene, and the second was with an intergenic SNP flanked by two positional candidate genes: DEAH (Asp-Glu-Ala-His) box polypeptide 40 (DDX40) and the human homolog of yippee-like 2 (YPEL2) [16]. Future validation efforts may be able to identify which of the two genes (DDX40 or YPEL2) underlies this latter association signal.