Human immunodeficiency virus type 1 (HIV-1) variants in brain primarily use CCR5 for entry into macrophages and microglia, but dual-tropic (R5X4) HIV-1 has been detected in brain and cerebrospinal fluid (CSF) of some patients with HIV-associated dementia (HAD). Here, we sequenced the gp120 coding region of nine full-length dual-tropic (R5X4) env genes cloned directly from autopsy brain and spleen tissue from an AIDS patient with severe HAD. We then compiled a dataset of 30 unique clade B R5X4 Env V3 sequences from this subject and 16 additional patients (n=4 brain and 26 lymphoid/blood) and used it to compare the ability of six bioinformatic algorithms to correctly predict CXCR4 usage in R5X4 Envs. Only one program (SVMgeno2pheno) correctly predicted the ability of R5X4 Envs in this dataset to use CXCR4 with 90% accuracy (n=27/30 predicted to use CXCR4). The PSSMSINSI, Random Forest, and SVMgenomiac programs and the commonly used charge rule correctly predicted CXCR4 usage with >50% accuracy (22/30, 16/30, 19/30, and 25/30, respectively), while the PSSMX4R5 matrix and “11/25” rule correctly predicted CXCR4 usage in <50% of the R5X4 Envs (10/30 and 13/30, respectively). Two positions in the V3 loop (19 and 32) influenced coreceptor usage predictions of nine R5X4 Envs from patient MACS1 and a total of 12 Envs from the dataset (40% of unique V3 sequences). These results demonstrate that most predictive algorithms underestimate the frequency of R5X4 HIV-1 in brain and other tissues. SVMgeno2pheno is the most accurate predictor of CXCR4 usage by R5X4 HIV-1.
Human immunodeficiency virus type 1 (HIV-1) infects macrophages and microglia in the central nervous system (CNS) and causes HIV-associated dementia (HAD) or mild neurocognitive impairment in 10–20% of AIDS patients. HIV-1 variants in brain are genetically distinct from those in lymphoid tissues and other organs, and specific sequences in the envelope glycoprotein (Env) coding region of gp160 have been associated with brain compartmentalization.1 HIV-1 tropism is influenced by the interaction of Env with CD4 and a coreceptor, typically CCR5 or CXCR4. CCR5 (R5) is the primary coreceptor for HIV-1 infection of macrophages and microglia. Several studies identified HIV-1 brain or CSF isolates capable of mediating entry using CXCR4 (X4 isolates) or both CCR5 and CXCR4 (R5X4 or dual-tropic isolates).1–4 However, the frequency of X4 or R5X4 strains in the brain of AIDS patients is unknown.
The third hypervariable loop of Env gp120 (V3), a disulfide-linked loop of approximately 35 amino acids, makes direct contact with the coreceptor and is the primary determinant for R5 or X4 tropism.5 Bioinformatic algorithms that use the V3 sequence to predict HIV-1 coreceptor usage have been developed as a timely and cost-effective alternative to traditional phenotypic assays.6–10 However, database sequence sets are heavily dependent on R5 sequences and may underestimate X4 usage and dual-tropism. Furthermore, the ability of bioinformatic algorithms to reliably predict X4 usage by R5X4 isolates has not been addressed. Here, we cloned and sequenced nine full-length R5X4 HIV-1 Envs from autopsy brain and spleen tissues from an AIDS patient with severe HAD. We then added these sequences to a larger dataset of unique clade B R5X4 V3 sequences and compared the ability of freely available bioinformatic algorithms to accurately predict X4 usage.
MACS1, a male homosexual patient in the Chicago component of the Multicenter AIDS Cohort Study with no history of antiretroviral therapy, had severe HAD and a CD4+ T cell count of 2 cells/μl at the time of death.3,4 Analysis of CCR5 alleles by polymerase chain reaction (PCR) demonstrated that the patient was homozygous wild type for CCR5. At autopsy, sections through the frontal and parietal cortex showed pathology consistent with HIV encephalitis (i.e., multiple microscopic foci of necrosis and focal perivascular lesions throughout the white matter occasionally associated with multinucleated giant cells). HIV vacuolar myelopathy and leukoencephalopathy were unusually advanced within the brain stem and cerebellum. We previously isolated four R5X4 HIV-1 viruses from brain (br) and spleen (spln) tissue from this patient [MACS1-br (pbmc), MACS1-br (mdm), MACS1-spln (pbmc), and MACS1-spln (mdm)]. These viruses were isolated from cultures with CD8-depleted peripheral blood mononuclear cells (PBMC) or monocyte-derived macrophages (MDM) as indicated. These R5X4 isolates replicated efficiently in MDM and microglia and induced syncytia formation in >90% of cells by day 10 postinfection. The brain- and spleen-derived isolates entered macrophages and microglia primarily via CXCR4 and induced neuronal apoptosis in primary brain cultures, suggesting that R5X4 variants may be pathogenic in the CNS.3,4
To investigate the frequency of R5X4 variants in tissues from patient MACS1, env genes were amplified from genomic DNA isolated from autopsy brain and spleen tissues and cloned into the pCR3.1 expression plasmid as described.4,11 A single-round infection assay screen yielded 28 clones that encoded functional Envs (n=10 brain and 18 spleen clones). Ten Envs from this set (n=5 brain and 5 spleen) were selected for sequencing and further analysis. Expression and processing of 9/10 Envs on 293T cells were verified via Western blotting with antibodies directed against gp120 (goat anti-gp120 from the National Institutes of Health AIDS Research and Reference Reagent Program) (data not shown).2,4,11 Coreceptor usage was investigated using a cell-cell fusion assay as previously described (Table 1).4,11 We previously showed that CCR5 and CXCR4 usage determined in this cell-cell fusion assay correlates well with coreceptor usage determined in viral infection assays.2,3,12,13 The well-characterized ADA (R5), 89.6 (R5X4), and HxB2 (X4) Envs were used as controls. Nine of 10 MACS1 Envs tested (n=5/5 brain and 4/5 spleen) were equally capable of using CCR5 and CXCR4 for fusion in CD4-expressing cells (Table 1). None of the Envs showed a reduced dependence on CD4 levels in the cell-cell fusion assay (data not shown). One Env (spleen-derived clone sp7a-14) was nonfunctional based on cell-cell fusion assays using either CCR5 or CXCR4; this Env contains an amino acid variant (R507) at the gp120/gp41 interface, disrupting the REKR motif required for furin cleavage, which is critical for HIV fusion.14 Thus, Env sp7a-14 is probably nonfunctional due to loss of gp160 cleavage.
Analysis of amino acid sequences revealed that nine envs (four brain and five spleen) encode a full-length gp120 protein. One brain-derived env clone (br6b-9) has a short N-terminal truncation with the sequence initiating at the second methionine (position 26) due to a frameshift at position 15 in the N-terminus; this truncation did not affect Env function in the cell-cell fusion assay. Phylogenetic analysis of gp120 nucleotide sequences confirmed distinct compartmentalization of brain- and spleen-derived Env clones (Fig. 1). Phylogenetic analysis of V1V2 amino acid sequences showed tight clustering of brain V1V2 sequences and separation of brain- and spleen-derived Env V1V2 sequences (Fig. 1). MACS1 brain-derived Env clone br6b-8 was more closely related to V1V2 sequences derived from spleen than from brain.
The gp120 V3 loop contains important determinants of coreceptor usage and syncytium induction in MT-2 cells.5 Consequently, patterns of V3 amino acid variation are frequently used to predict coreceptor usage of primary HIV strains.6–10,15,16 To determine the ability of freely available bioinformatic algorithms to predict X4 usage in R5X4 Envs, we compiled a dataset containing 30 unique clade B R5X4 V3 sequences (27 unique V3 sequences from 16 patients in 9 published studies and 3 unique V3 sequences from patient MACS1; n=4 brain- and 26 lymphoid/blood-derived sequences) (Fig. 2).2,12,13,17–22 Envs in this dataset were cloned and sequenced directly from tissue or from low passage isolates, and coreceptor usage of the clones was experimentally determined using viral infection assays12,17,20–22 or a combination of viral infection assays and cell-cell fusion assays.2,13 We then used this dataset to determine the ability of six bioinformatic algorithms to accurately predict X4 usage by R5X4 Envs (Fig. 2).
Two simple and commonly used prediction methods are the “11/25” rule and the charge rule. In the “11/25” rule, the presence of a positively charged amino acid at either position 11 or 25 of the V3 loop predicts that the virus can use X4 to mediate entry.6–9,15,16 This prediction method is significantly less accurate for X4 than for strictly R5-tropic viruses, with <50% and >90% accuracy, respectively.6,9,16 The charge rule states that an increase in the net charge of the V3 loop (>3) is strongly associated with CXCR4 usage.10 The position-specific scoring matrix (PSSM) detects nonrandom distributions of V3 amino acids at adjacent sites associated with an empirically determined group of sequences.8,15 Two separate matrices are available for clade B Envs at http://ubik.microslu.washington.edu/computing/pssm. PSSMX4R5 bases predictions on sequences of known coreceptor usage phenotype. PSSMSINSI bases predictions on known syncytium-inducing phenotypes on the MT-2 cell line.8 Random forest (http://yjxy.ujs.edu.cn/R5-X4 pred.rar) evaluates the relative importance of 37 features of the V3 loop including amino acid variation at each position, net charge, and polarity.10 Finally, two versions of the support vector machine (SVM) algorithm can be used for coreceptor phenotype predictions.9 SVMgenomiac (http://genomiac2.ucsd.edu:8080/wetcat/v3.html) outputs a categorical score (CCR5 or CXCR4) using a dataset aligned to a standard amino acid sequence. SVMgeno2pheno (http://coreceptor.bioinf.mpi-sb.mpg.de/cgi-bin/coreceptor.pl) similarly outputs a categorical score based on alignment of V3 nucleotide sequences. The accuracy of these methods in predicting coreceptor usage of R5X4 Envs has not been reported.
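The two simple rules described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in any of the cited tools. It assumes an ungapped 35-residue V3 loop so that positions 11 and 25 map directly onto string indices, counts only arginine and lysine as positive and aspartate and glutamate as negative, and reads the charge rule as "net charge greater than 3". The function names, the threshold parameter, and the example sequence (a V3 sequence resembling the clade B consensus, not taken from the dataset) are all assumptions introduced here.

```python
# Illustrative sketch of the "11/25" rule and the charge rule.
# Assumptions: ungapped 35-residue V3 loop; R/K positive, D/E negative.
POSITIVE = frozenset("RK")
NEGATIVE = frozenset("DE")
V3_LENGTH = 35


def _check(v3: str) -> None:
    """Reject sequences that would need alignment before positions are valid."""
    if len(v3) != V3_LENGTH:
        raise ValueError(f"expected {V3_LENGTH} residues, got {len(v3)}")


def rule_11_25(v3: str) -> bool:
    """Predict CXCR4 usage if position 11 or 25 (1-based) is R or K."""
    _check(v3)
    return v3[10] in POSITIVE or v3[24] in POSITIVE


def net_charge(v3: str) -> int:
    """Count positive residues minus negative residues across the loop."""
    return sum(aa in POSITIVE for aa in v3) - sum(aa in NEGATIVE for aa in v3)


def charge_rule(v3: str, threshold: int = 3) -> bool:
    """Predict CXCR4 usage if the net charge exceeds the threshold."""
    _check(v3)
    return net_charge(v3) > threshold


# Hypothetical inputs for illustration only (not sequences from the dataset).
EXAMPLE_V3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
EXAMPLE_S11R = EXAMPLE_V3[:10] + "R" + EXAMPLE_V3[11:]  # serine 11 -> arginine

if __name__ == "__main__":
    for name, seq in (("example", EXAMPLE_V3), ("example S11R", EXAMPLE_S11R)):
        print(name, rule_11_25(seq), net_charge(seq), charge_rule(seq))
```

Real V3 sequences often carry insertions or deletions, so a practical implementation would align each sequence to a reference before reading positions 11 and 25; the length check above simply refuses such inputs.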
Coreceptor usage predictions for the R5X4 V3 sequence dataset using the “11/25” rule were comparable to those reported for X4-tropic V3 sequences, with an accuracy of 43.3% (13/30 Envs). Including position 24 in this rule (“11/24/25” rule) is reported to increase the accuracy of prediction of X4-tropic sequences.6 However, including position 24 did not affect the prediction accuracy of this R5X4 dataset. The Cardozo et al. model was constructed using strictly R5- or X4-tropic Envs. However, R5X4 Envs contain V3 loops that can adopt conformations capable of interacting with either CCR5 or CXCR4, and may therefore contain surface patches that do not conform to the static models of the V3 structure in this proposed model.6 A second commonly used coreceptor-prediction method, the charge rule, predicted that 83.3% of the V3 dataset could use CXCR4 for entry (25/30 Envs). Of the two PSSM matrices, the PSSMX4R5 matrix correctly predicted X4 usage for 10/30 V3 sequences (33.3% accuracy). PSSMX4R5 predicted that 3/3 MACS1 unique V3 sequences would be strictly R5-tropic (0% accuracy). The PSSMSINSI matrix correctly predicted X4 usage in 22/30 V3 sequences, thereby increasing the accuracy of prediction for our dataset of R5X4 Envs from 33.3% to 73.3%. X4 usage was correctly predicted for 2/3 unique MACS1 V3 sequences (66.7% accuracy). The random forest program predicted X4 usage for 16/30 Envs, with an accuracy of 53.3%, and predicted that 3/3 unique MACS1 Envs were strictly R5-tropic (0% accuracy). SVMgenomiac correctly predicted that 19/30 Envs could use CXCR4 to mediate entry, with an accuracy of 63.3%. As with PSSMSINSI, X4 usage was correctly predicted for 2/3 unique MACS1 V3 sequences (66.7% accuracy). Finally, SVMgeno2pheno predicted that 27/30 R5X4 Envs could use X4 for entry, with an accuracy of 90.0% (specificity, defined here as the rate of false positives, was set at 10%). Prediction of X4 usage for the unique MACS1 Envs was 3/3 (100%).
X4 usage predictions by SVMgeno2pheno were most accurate when the specificity setting (false-positive rate) was at 5–10%. Increasing the stringency of predictions by decreasing this rate to <5% resulted in a concurrent decrease in prediction accuracy, but this decrease could be overcome by inputting clinical data (CCR5 genotype and CD4+ counts).
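The per-method accuracies reported above follow directly from the counts of R5X4 Envs predicted to use CXCR4. Because every sequence in the dataset is experimentally R5X4, accuracy here is simply the fraction of the 30 V3 sequences called CXCR4-using. The short Python sketch below recomputes and ranks those percentages; the counts are taken from the text, while the variable and function names are introduced here for illustration.

```python
# Recompute the reported accuracies from the counts given in the text.
TOTAL_ENVS = 30

# Number of R5X4 V3 sequences each method predicted to use CXCR4.
PREDICTED_X4 = {
    "11/25 rule": 13,
    "charge rule": 25,
    "PSSMX4R5": 10,
    "PSSMSINSI": 22,
    "Random forest": 16,
    "SVMgenomiac": 19,
    "SVMgeno2pheno": 27,
}


def percent(correct: int, total: int) -> float:
    """Return correct/total as a percentage rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


ACCURACY = {name: percent(n, TOTAL_ENVS) for name, n in PREDICTED_X4.items()}

# Rank methods from most to least accurate on this dataset.
RANKED = sorted(ACCURACY.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for name, value in RANKED:
        print(f"{name:15s} {PREDICTED_X4[name]:2d}/{TOTAL_ENVS}  {value:5.1f}%")
```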
No single variant appears to affect the accuracy of coreceptor prediction in every genetic background. However, by altering specific amino acids in the V3 sequences entered into the bioinformatic programs, we identified several positions in our R5X4 dataset that may contribute to the accuracy of coreceptor prediction. Both the X4R5 and SINSI PSSM matrices incorrectly predict that MACS1 brain Env clone br6b-8 is R5-tropic. Env br6b-8 contains the consensus alanine at position 19 in the V3 loop, while the other eight MACS1 Env clones contain a valine at this position. In our dataset of 30 unique V3 sequences, changing valine to alanine at position 19 also changes the PSSM prediction from X4- to R5-tropic in nine additional Envs. These results suggest that an alanine-to-valine change at position 19 may be associated with a change from R5- to X4-tropism in sequences from the training sets of both matrices. Another interesting finding from the PSSM matrix predictions is that a glutamine-to-lysine variant at position 32 in the V3 loop changed the prediction for our set of MACS1 Env clones from R5-tropic to X4-tropic. This trend also applies to additional V3 sequences from our dataset that contain the lysine variant at position 32. These results suggest that including V3 amino acid sequences from more X4 or R5X4 Envs in algorithm training sets will further increase the prediction accuracy of these matrices.
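The substitution analysis described above amounts to editing one V3 position in silico and asking whether a predictor's call changes. The Python sketch below shows that workflow under stated assumptions. The PSSM matrices themselves are not reproduced here, so a simple net-charge predictor (net charge greater than 3, counting only R/K and D/E) stands in for them purely as a placeholder. The helper names and the example sequence (resembling the clade B consensus, not taken from the dataset) are hypothetical.

```python
from typing import Callable, Dict, List


def substitute(v3: str, position: int, residue: str) -> str:
    """Return v3 with the residue at a 1-based position replaced."""
    if not 1 <= position <= len(v3):
        raise ValueError(f"position {position} is outside the sequence")
    return v3[: position - 1] + residue + v3[position:]


def flipped_by(
    sequences: Dict[str, str],
    position: int,
    residue: str,
    predictor: Callable[[str], bool],
) -> List[str]:
    """List the names of sequences whose prediction changes after substitution."""
    return [
        name
        for name, seq in sequences.items()
        if predictor(seq) != predictor(substitute(seq, position, residue))
    ]


def charge_predictor(v3: str) -> bool:
    """Placeholder predictor (NOT a PSSM): CXCR4 usage if net charge > 3."""
    positive = sum(aa in "RK" for aa in v3)
    negative = sum(aa in "DE" for aa in v3)
    return positive - negative > 3


# Hypothetical input for illustration only (not a sequence from the dataset).
EXAMPLES = {"example": "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"}

if __name__ == "__main__":
    print("Q32K flips:", flipped_by(EXAMPLES, 32, "K", charge_predictor))
    print("A19V flips:", flipped_by(EXAMPLES, 19, "V", charge_predictor))
```

With this placeholder, a glutamine-to-lysine change at position 32 flips the call because it adds a positive charge, whereas an alanine-to-valine change at position 19 does not. Reproducing the position 19 effect reported above would require passing an actual PSSM scoring function as `predictor`.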
In summary, we identified R5X4 HIV-1 in brain from a patient with severe HAD. Comparison of the ability of bioinformatic algorithms to correctly predict X4 usage by R5X4 Envs in our dataset showed that the frequency of R5X4 HIV-1 is underestimated by most commonly used predictive algorithms. SVMgeno2pheno is the most accurate predictor of CXCR4 usage by R5X4 HIV-1 in brain and other tissues. Bioinformatic prediction tools provide a convenient method to screen for coreceptor usage, an issue of increasing importance for clinicians considering the use of CCR5 antagonists in HIV-infected patients. It will therefore be important for future studies to include more X4 and R5X4 sequences in algorithm training sets during the development of prediction tools, in order to better define the patterns of amino acid variation that contribute to inaccurate predictions.
Sequences reported here were assigned GenBank accession numbers EU401895-EU401904.
We thank Mark Jensen for helpful discussions. This work was supported by NIH Grants NS37277 and MH83588. M.M. was supported in part by NIH fellowship 1F31NS060611-01. P.R.G. is the recipient of an Australian National Health and Medical Research Council (NHMRC) R. Douglas Wright Biomedical Career Development Award and was supported, in part, by a grant from the Australian NHMRC (433915). Core facilities were supported by Harvard Medical School Center for AIDS Research (CFAR) and DFCI/Harvard Cancer Center grants.