To identify a pre-HAART gene expression signature in peripheral blood mononuclear cells (PBMCs) predictive of CD4+ T-cell recovery during HAART in HIV-infected individuals.
This retrospective study evaluated PBMC gene expression in 24 recently HIV-infected individuals before the initiation of HAART to identify genes whose expression is predictive of CD4+ T-cell recovery after 48 weeks of HAART.
The change in CD4+ T-cell count (ΔCD4) over the 48-week study period was calculated for each of the 24 participants. Twelve participants were assigned to the ‘good’ (ΔCD4 ≥ 200 cells/μl) and 12 to the ‘poor’ (ΔCD4 < 200 cells/μl) CD4+ T-cell recovery group. Gene expression profiling of the entire transcriptome using Illumina BeadChips was performed with PBMC samples obtained before HAART. Gene expression classifiers capable of predicting CD4+ T-cell recovery group (good vs. poor), as well as the specific ΔCD4 value, at week 48 were constructed using methods of Class Prediction.
The expression of 40 genes in PBMC samples taken before HAART predicted CD4+ T-cell recovery group (good vs. poor) at week 48 with 100% accuracy. The expression of 22 genes predicted a specific ΔCD4 value for each HIV-infected individual that correlated well with actual values (R = 0.82). Predicted ΔCD4 values were also used to assign individuals to good vs. poor CD4+ T-cell recovery groups with 79% accuracy.
Gene expression profiles in PBMCs can be used as biomarkers to successfully predict disease outcomes among HIV-infected individuals treated with HAART.
HIV-infected individuals who successfully suppress HIV replication while receiving HAART but only minimally increase their peripheral blood CD4+ T-cell counts are characterized as having poor immune recovery. Multiple definitions of poor immune recovery exist. These include failure within the first year of HAART to increase CD4+ T-cell counts above a certain threshold (e.g., 200 cells/μl), or to increase them by more than 50, 100, or 200 cells/μl relative to counts taken before HAART initiation (ΔCD4) [1–5]. Poor CD4+ T-cell recovery is a common and significant health problem for HIV-infected patients, affecting close to a third of patients in some cohorts [1,2]. Piketty et al. demonstrated that the relative risk of clinical progression (AIDS-defining event or death) was 13.3-fold higher for patients with ΔCD4 less than 100 cells/μl than for patients with increases above this threshold. Multiple factors have been associated with poor CD4+ T-cell recovery during HAART. These include increasing age, hepatitis C virus co-infection, HAART regimen, persistent low-level virus replication, decreased CD4+ T-cell proliferation, and host genetic variation.
A variety of disease states including viral infections [6–9], malignancies [10–12], and mental disorders [13,14] can modulate gene expression in peripheral blood mononuclear cells (PBMCs), as these cells circulate systemically. Therefore, PBMC gene expression profiling has been used to construct classifiers capable of diagnosing disease states, predicting disease outcomes and determining patient response to drug therapy [15,16]. Advantages of using PBMCs include their accessibility, ease of isolation from whole blood, and large yields per patient for subsequent RNA extraction.
Vahey et al. previously analyzed gene expression in PBMC samples taken from 48 HIV-infected patients electing to discontinue HAART in the AIDS Clinical Trials Group (ACTG) Study A5170. Good outcome (N = 24) was defined as a decline of less than 20% in the CD4+ T-cell count over the 24 weeks following HAART discontinuation, and poor outcome (N = 24) as a decline of more than 20%. Prediction Analysis of Microarrays in R was used to identify 53 genes whose expression at HAART discontinuation predicted with 81% accuracy which patients would later progress to the good vs. poor outcome at week 24. However, interrupting HAART is not a viable therapeutic approach for HIV-infected patients, because it results in significant virus rebound [18,19] and increased morbidity and mortality. Of greater clinical relevance would be the ability to predict, before HAART initiation, which HIV-infected patients will later exhibit poor CD4+ T-cell recovery. To this end, we analyzed gene expression in PBMC samples from 24 HIV-infected patients in the Acute Infection and Early Disease Research Program (AIEDRP) and constructed gene expression classifiers capable of predicting the extent of CD4+ T-cell recovery.
HIV-infected male participants in the San Diego AIEDRP cohort were retrospectively selected for this study. The study period was delimited by CD4+ T-cell counts taken before the start of HAART and after 48 weeks of HAART, which were used to calculate ΔCD4. From a total of 328 patients, 98 were selected who met all of the following criteria:

- They were HAART-naive before enrollment.
- They continuously adhered to HAART.
- They achieved (<50 HIV RNA copies/ml) and maintained (no subsequent concurrent measurements of >200 HIV RNA copies/ml) complete viral suppression during the study period.
- They had viable stored PBMC samples for microarray analysis taken within 2 weeks of the start of HAART.

To focus on HIV-infected participants with lower CD4+ T-cell counts before HAART, 24 participants with baseline counts below 500 CD4+ T cells/μl were selected for gene expression analysis. All but one of the study participants were treated continuously over the study period with a protease inhibitor (PI)-based or non-nucleoside reverse transcriptase inhibitor (NNRTI)-based HAART regimen. The remaining patient started therapy with a triple nucleoside reverse transcriptase inhibitor regimen (abacavir/lamivudine/zidovudine) but was switched to an NNRTI-based regimen during the study period. Using the ΔCD4 threshold proposed by Haas et al., 12 patients were assigned to the good (ΔCD4 ≥ 200 cells/μl) and 12 to the poor (ΔCD4 < 200 cells/μl) CD4+ T-cell recovery group.
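The baseline cut-off and the grouping rule described above can be expressed in a few lines. The following is a minimal Python sketch. It assumes that each participant record carries hypothetical fields for the pre-HAART and week-48 CD4+ T-cell counts. It also assumes that the other selection criteria (HAART-naive status, adherence, viral suppression and sample availability) have already been applied. The record type and function names are illustrative and not part of the study.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    pid: str              # hypothetical participant identifier
    cd4_baseline: float   # CD4+ T cells/ul before HAART
    cd4_week48: float     # CD4+ T cells/ul after 48 weeks of HAART


def delta_cd4(p: Participant) -> float:
    """Change in CD4+ T-cell count over the 48-week study period."""
    return p.cd4_week48 - p.cd4_baseline


def recovery_group(p: Participant, threshold: float = 200.0) -> str:
    """'good' if Delta-CD4 >= threshold (cells/ul), otherwise 'poor'."""
    return "good" if delta_cd4(p) >= threshold else "poor"


def select_and_group(participants, max_baseline=500.0, threshold=200.0):
    """Keep participants with baseline counts below max_baseline and map
    each retained id to (Delta-CD4, recovery group)."""
    selected = [p for p in participants if p.cd4_baseline < max_baseline]
    return {p.pid: (delta_cd4(p), recovery_group(p, threshold)) for p in selected}
```

Note that a participant whose ΔCD4 is exactly 200 cells/μl falls into the good group, because the text defines that group as ΔCD4 ≥ 200 cells/μl.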
Viable PBMCs from the 24 participants were obtained by rapidly thawing cryopreserved samples at 37°C. RNA was isolated from PBMC samples using RNeasy Mini Kits (QIAGEN, Germantown, Maryland, USA). RNA quality was assessed by calculating an RNA integrity number (RIN) with the 2100 Bioanalyzer (Agilent Technologies, Santa Clara, California, USA). RNA from all 24 samples was of sufficient quality (mean RIN 8.15 ± 0.93) for microarray gene expression analysis and was used to generate cDNA. Biotin-labeled cRNA was generated from the cDNA and hybridized to HumanWG-6 v3 Expression BeadChips (Illumina, San Diego, California, USA) for expression analysis of 48 803 transcripts. Raw gene expression data were log2 transformed and robust spline normalized using the Bioconductor package lumi in R (version 2.8.0). Microarray data quality was confirmed with MA-plots constructed using the affyPLM package. Genes whose expression was not detected in any of the samples were removed from further analysis. Class discovery analysis, which clusters samples by gene expression in an unsupervised manner, identified batch effects, and these were removed using ComBat. Gene expression data are available at the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE19087.
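Two of these preprocessing steps are simple to sketch: log2 transformation, and removal of genes not detected in any sample. The NumPy version below makes several assumptions that the text does not state. It takes genes-by-samples arrays of raw intensities and Illumina-style detection p-values. It uses a detection cutoff of 0.01 and adds an offset of 1 before taking logs. Robust spline normalization (lumi) and ComBat are not reproduced here. The hypothetical `center_batches` function is a deliberately simpler, location-only batch adjustment. It only illustrates what a batch correction does. ComBat additionally adjusts batch-specific scale and shrinks its estimates with empirical Bayes.

```python
import numpy as np


def log2_and_filter(raw, detection_p, p_cutoff=0.01, offset=1.0):
    """raw, detection_p: genes x samples arrays.

    Drops genes not detected (p < p_cutoff) in any sample, then log2
    transforms the remaining intensities. p_cutoff and offset are
    assumed values, not taken from the study."""
    detected_any = (detection_p < p_cutoff).any(axis=1)   # one flag per gene
    logged = np.log2(raw[detected_any] + offset)
    return logged, detected_any


def center_batches(expr, batches):
    """Location-only batch adjustment (NOT ComBat): shift each batch so its
    per-gene mean equals the per-gene mean over all samples."""
    expr = np.array(expr, dtype=float)          # work on a copy
    batches = np.asarray(batches)
    grand_mean = expr.mean(axis=1, keepdims=True)
    for b in np.unique(batches):
        cols = batches == b
        expr[:, cols] += grand_mean - expr[:, cols].mean(axis=1, keepdims=True)
    return expr
```

If batch membership were confounded with recovery group, any batch correction of this kind could also remove genuine biological signal. This is one reason batch effects are best identified first, as the class discovery analysis above does.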
Supervised Class Prediction methods in BRB-ArrayTools were used to construct gene expression classifiers that assign HIV-infected patients to a discrete CD4+ T-cell recovery group (good vs. poor). Classifiers composed of different numbers of genes were constructed by Recursive Feature Elimination (RFE). Their ability to predict CD4+ T-cell recovery was assessed in a leave-one-out cross-validation (LOOCV) approach with several multivariate classification methods [diagonal linear discriminant analysis, support vector machines (SVMs), and compound covariate, nearest neighbor, and centroid predictors].
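A key methodological point in this design is that gene selection must be repeated inside every LOOCV fold. Otherwise the held-out sample influences which genes are chosen, and the estimated accuracy is optimistically biased. The NumPy sketch below illustrates this under stated assumptions. It is not the BRB-ArrayTools implementation. For each fold, RFE ranks genes by the absolute weights of a ridge-regularized least-squares linear classifier. This classifier is substituted for the linear SVM to avoid extra dependencies. At each round, RFE drops the half of the remaining genes with the smallest weights. The function names and the parameters `lam` and `step` are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Ridge-regularized least-squares linear classifier, labels y in {-1, +1}.

    Uses the dual form w = Xc^T (Xc Xc^T + lam*I)^-1 (y - mean(y)), which only
    needs an n x n solve, so it stays cheap when genes far outnumber samples."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    n = X.shape[0]
    dual = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), y - y_mean)
    w = Xc.T @ dual
    b = y_mean - x_mean @ w
    return w, b


def rfe(X, y, n_keep, lam=1.0, step=0.5):
    """Recursive feature elimination: refit, drop the genes with the smallest
    |weight|, repeat until n_keep genes remain. Returns column indices."""
    active = np.arange(X.shape[1])
    while active.size > n_keep:
        w, _ = ridge_fit(X[:, active], y, lam)
        n_drop = max(1, min(active.size - n_keep, int(active.size * step)))
        weakest = np.argsort(np.abs(w))[:n_drop]    # positions within `active`
        active = np.delete(active, weakest)
    return active


def loocv_accuracy(X, y, n_keep, lam=1.0):
    """Leave-one-out accuracy with gene selection redone inside every fold."""
    n = X.shape[0]
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        genes = rfe(X[train], y[train], n_keep, lam)   # held-out sample unseen
        w, b = ridge_fit(X[train][:, genes], y[train], lam)
        predicted = 1 if X[i, genes] @ w + b >= 0 else -1
        correct += int(predicted == y[i])
    return correct / n
```

With 24 participants, each refit in this sketch reduces to a 23 × 23 linear solve, however many genes remain. Scanning `n_keep` over a grid mirrors the construction of classifiers composed of different numbers of genes.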
The least angle regression (LAR) plugin in BRB-ArrayTools was used to implement the LASSO algorithm, which fits a linear model predicting a continuous response variable (ΔCD4) from gene expression data within a cross-validated framework. LASSO penalizes the sum of the absolute values of the coefficients. This penalty lets it avoid the overfitting characteristic of least squares linear regression when the number of genes is large relative to the number of samples. For more information on these statistical methods, please refer to the BRB-ArrayTools manual.
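LASSO minimizes the squared prediction error plus a penalty proportional to the sum of the absolute coefficients. This penalty drives most gene coefficients exactly to zero. The study computed the LASSO through least angle regression. The NumPy sketch below instead uses cyclic coordinate descent, a different solver for the same objective, at a single fixed penalty `alpha`. It wraps the fit in LOOCV to produce one out-of-sample ΔCD4 prediction per participant. The penalty value, the function names and the omission of penalty tuning are assumptions of this sketch.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, alpha, n_iter=1000, tol=1e-6):
    """Minimize (1 / (2n)) * ||y - b - X w||^2 + alpha * ||w||_1 by cyclic
    coordinate descent. X: samples x genes; y: Delta-CD4 values."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean                      # centering leaves the intercept unpenalized
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(p)
    resid = y - y_mean                   # residual when all coefficients are zero
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                 # constant gene: coefficient stays zero
            old = w[j]
            # covariance of gene j with the partial residual (gene j added back)
            rho = Xc[:, j] @ resid / n + col_sq[j] * old
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            delta = w[j] - old
            if delta != 0.0:
                resid = resid - delta * Xc[:, j]
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - x_mean @ w
    return w, intercept


def loocv_predictions(X, y, alpha):
    """One out-of-sample prediction per participant: refit without that
    participant, then predict the held-out Delta-CD4."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        w, b = lasso_cd(X[train], y[train], alpha)
        preds[i] = X[i] @ w + b
    return preds
```

In practice the penalty would itself be chosen by cross-validation, and the BRB-ArrayTools procedure may differ from this sketch in that respect. The genes with nonzero coefficients in the full-data fit form the final model, analogous to the 22-gene model reported in the results.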
The HIV-infected participants in the good (ΔCD4 ≥ 200 cells/μl) and poor (ΔCD4 < 200 cells/μl) CD4+ T-cell recovery groups were compared for differences in demographic, virological, and immunological data (Table 1). Participants in the good recovery group had significantly lower CD4+ T-cell counts and higher viral loads before the start of HAART. None of the other parameters differed significantly between recovery groups.
Microarray gene expression data for the entire transcriptome were generated for each participant using PBMC samples taken before HAART. Classifiers built from different numbers of genes were evaluated with different multivariate classification methods in a LOOCV approach. Each was assessed for its ability to predict whether a participant would progress to the good or poor CD4+ T-cell recovery group at week 48. Classification accuracy first reached 100% with the expression of 40 genes and the SVM classification method (Fig. 1a). Descriptions of these 40 genes are presented in Supplementary Table 1. No other multivariate classification method attained 100% accuracy in assigning patients to CD4+ T-cell recovery groups.
In addition to predicting CD4+ T-cell recovery group, pre-HAART gene expression was used to predict the specific ΔCD4 value for each participant at week 48. The final LASSO model contained 22 genes whose expression predicted ΔCD4 values in a LOOCV approach. These predictions correlated with actual values with R = 0.82 by Pearson's correlation analysis (Fig. 1b). Descriptions of these 22 genes are presented in Supplementary Table 2.
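Pearson's correlation coefficient, used here to compare LOOCV-predicted and actual ΔCD4 values, is the covariance of the two series divided by the product of their standard deviations. The following is a dependency-free Python sketch. The function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Because r measures only linear association, a high r does not by itself mean that predicted ΔCD4 values fall on the correct side of a given group threshold. That property has to be checked separately.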
Gene expression in PBMCs is typically used to predict which of two discrete classes a patient will progress to. Accordingly, we used Class Prediction methods to identify 40 genes whose expression predicted with 100% accuracy whether an HIV-infected individual would progress to the ‘good’ or ‘poor’ CD4+ T-cell recovery group after 48 weeks of HAART (Fig. 1a). However, the recovery groups defined by a ΔCD4 threshold of 200 cells/μl differed in baseline CD4+ T-cell count and viral load (Table 1). It is therefore unclear whether this classifier is truly prognostic for recovery group or merely diagnostic of those baseline differences. Class Prediction analysis was therefore repeated using a ΔCD4 threshold of 100 cells/μl to define good (N = 17) and poor (N = 7) recovery groups, because baseline characteristics did not differ between groups at this threshold. Despite the unbalanced group sizes, 10 genes were identified that predicted recovery group with 92% accuracy (data not shown). This result indicates that gene expression before HAART initiation does have value for predicting recovery group.
Predicting the specific ΔCD4 value for each participant overcomes the limitations of imposing a dichotomous threshold on a continuous variable to define outcome groups. It thereby also sidesteps any significant baseline differences in clinical data between such groups. The LASSO algorithm selected a final model of 22 genes whose expression levels predicted ΔCD4 values that correlated well (R = 0.82) with actual ΔCD4 values in a LOOCV approach (Fig. 1b). Predicted ΔCD4 values assigned HIV-infected participants to good and poor recovery groups at the 200 cells/μl threshold with 79% accuracy. At ΔCD4 thresholds of 100 and 300 cells/μl, the classification accuracies were 75 and 88%, respectively (data not shown).
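Scoring continuous predictions against a group threshold amounts to checking, for each participant, whether the predicted and actual ΔCD4 values fall on the same side of that threshold. The following is a minimal Python sketch. The function name and the toy values are hypothetical and are not study data.

```python
def dichotomized_accuracy(predicted, actual, threshold=200.0):
    """Fraction of participants whose predicted Delta-CD4 falls on the same
    side of `threshold` (cells/ul) as their actual Delta-CD4. Values at or
    above the threshold count as 'good' recovery, as in the text."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    agree = sum((p >= threshold) == (a >= threshold) for p, a in zip(predicted, actual))
    return agree / len(actual)


# Hypothetical toy values, not study data
predicted = [250, 180, 90, 320]
actual = [230, 210, 60, 400]
for t in (100, 200, 300):
    print(t, dichotomized_accuracy(predicted, actual, t))
# 100 1.0
# 200 0.75
# 300 1.0
```

With 24 participants, the reported accuracies of 79, 75 and 88% are consistent with 19, 18 and 21 correctly assigned individuals, respectively.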
The mechanistic implications for CD4+ T-cell recovery of the genes identified in this study are unclear (Supplementary Tables 1 and 2). Genes were selected based on their expression in PBMC samples and their ability to predict CD4+ T-cell recovery. Gene expression should be analyzed in the CD4+ T-cell subset, as it is directly related to the disease phenotype, in order to identify genes that drive CD4+ T-cell recovery. Genes differentially expressed in this subset between recovery groups (good vs. poor) following 48 weeks of HAART should be mapped to biological pathways and gene ontologies to elucidate the mechanism of CD4+ T-cell recovery.
In the future, gene expression classifiers may be developed into bench-top assays for use in the HIV clinic to identify patients at risk of poor CD4+ T-cell recovery. Before clinical use, the accuracy of the classifiers constructed in this study must be internally validated by analyzing a greater number of HIV-infected individuals in the AIEDRP cohort. The classifiers must also be externally validated in an unrelated HIV-infected cohort to rule out participant selection or demographic biases in the San Diego AIEDRP cohort. In summary, these results demonstrate the utility of gene expression data from HAART-naive HIV-infected individuals for predicting the future course of disease.
Gene expression data were generated with funding from a Developmental Grant from the Center for AIDS Research (CFAR) at the University of California San Diego (UCSD). This work was performed with the support of the Genomics Core at the UCSD CFAR, the San Diego Veterans Medical Research Foundation, National Institutes of Health research grants (AI69432, AI043638, MH62512, MH083552, AI077304, AI36214, AI047745, AI007384 and AI74621), and a research grant from the California HIV/AIDS Research Program (RN07-SD-702). Microarray hybridization and scanning were performed at the UCSD Biomedical Genomics (BIOGEM) core facility with the help of Dr Gary Hardiman (Director) and James Sprague. Class Prediction analysis was performed using BRB-ArrayTools developed by Dr Richard Simon and the BRB-ArrayTools Development Team. We would like to thank the reviewers of this manuscript and Dr Sanjay Mehta, whose comments resulted in a more coherent presentation of our results.
C.H.W. conceived the study, analyzed patient and microarray data, wrote the original manuscript, and addressed the reviewers' comments. N.B.B. analyzed microarray data and generated the figures presented in the manuscript. P.D. and S.E.R. extracted and assessed the quality of RNA from PBMC samples. Y.Z. implemented the LASSO algorithm for the prediction of ΔCD4 from gene expression data. M.G. and J.L. advised on experimental design. J.P. performed statistical analyses. M.G., D.D.R., D.S., and S.J.L. provided access to blinded patient clinical data and aided in the interpretation of gene expression data.
Although not a direct conflict of interest, C.H.W., M.G., D.D.R., D.S., and S.J.L. have received research support from Pfizer Inc. In addition, S.J.L. has served on the clinical advisory board for Monogram Biosciences Inc. and received research support from Merck Laboratories. D.D.R. is also a consultant for Theraclone, Myriad, Bristol-Myers Squibb, Anadys Pharmaceuticals Inc., Gilead Sciences, Hoffmann-La Roche Inc., Merck and Co. Inc., Monogram Biosciences, Biota, Chimerix, Idenix and Gen-Probe. J.L. is employed by Illumina Inc., manufacturer of the microarray platform used in this study. This does not represent a conflict of interest, because the platform was selected before his inclusion on the project and he did not perform any data analysis.