To evaluate a model for predicting time to AIDS or death among HIV-infected persons initiating highly active antiretroviral therapy (HAART).
The model was constructed from 1,891 HAART initiators in the Collaborations in HIV Outcomes Research/US (CHORUS) cohort. The model's predictive ability was assessed using internal bootstrap validation techniques and data from 716 HAART initiators at Johns Hopkins HIV Clinical Cohort (JHHCC) in whom HIV disease was, in general, more advanced.
The estimated concordance statistic was 0.625 with the bootstrap method and 0.632 in JHHCC. Mean predicted and observed 3-year AIDS-free survival for JHHCC was 0.76 and 0.73 (95% CI 0.69−0.77), respectively; mean predicted and observed 5-year AIDS-free survival was 0.69 and 0.57 (95% CI 0.52−0.62). Sensitivity analyses showed that the discrepancy between predicted and observed AIDS-free survival after 3 years could be due to differences in lost-to-follow-up rates between cohorts.
The model was fair at using baseline characteristics to order patients’ risk of disease progression, but did not accurately predict AIDS-free survival >3 years after HAART initiation. Different variable definitions, patient characteristics, and loss-to-follow-up highlight the challenges of using data from one cohort to predict AIDS-free survival in an independent cohort.
Observational cohort studies have been vital to our understanding of the natural history of HIV disease progression, and in the identification of factors associated with prognosis after antiretroviral therapy initiation [1-2]. Recognizing the importance of such studies, the National Institute of Allergy and Infectious Diseases at the U.S. National Institutes of Health has sponsored the International Epidemiologic Databases to Evaluate AIDS (IeDEA), a global consortium to combine data and answer questions about the HIV epidemic (www.iedea-hiv.org) [3-4].
Combining data from heterogeneous sites or applying knowledge learned from one site to another can be challenging, as patient characteristics, clinical practice patterns, and even HIV-1 strains can differ between cohorts. Nevertheless, patient characteristics, especially immunologic status, at the initiation of highly active antiretroviral therapy (HAART) have been shown to be associated with subsequent AIDS defining events (ADE) and death [2,5-11].
Using data from the Collaborations in HIV Outcomes Research/US (CHORUS) cohort, we observed that percent (%CD4) and absolute (aCD4) CD4+ lymphocytes at HAART initiation, prior exposure to antiretroviral therapy (ART), and probable route of HIV acquisition (injection drug use [IDU] or non-IDU) were associated with subsequent progression to ADE/death in a multivariable model. Using these patient characteristics, as well as HIV-1 RNA at HAART initiation and demographic features, we created a prognostic model for ADE/death in HIV-infected persons initiating HAART. A calculator that uses this model to predict ADE-free survival probabilities up to seven years after HAART initiation is available at http://biostat.mc.vanderbilt.edu/HIVSurvivalPrediction.
The goal of this study was to assess how well our model predicted ADE/death in an independent cohort. First, we used re-sampling techniques to internally estimate our model's predictive ability when applied to independent data. Second, we applied our model to a cohort of HIV-infected persons initiating HAART in Baltimore, MD, the Johns Hopkins HIV Clinical Cohort (JHHCC) Adult HIV Clinic. The JHHCC was specifically chosen in order to evaluate the predictive model on a cohort different from CHORUS in terms of demographics, common routes of infection, and level of immunosuppression at HAART initiation. Finally, we explored possible reasons for the discrepancies between predicted and observed ADE/deaths in the JHHCC.
Both the CHORUS cohort and the JHHCC have been described in detail previously [13-14]. Briefly, our study included persons from both cohorts who were ≥ 18 years of age and initiated HAART between August 1, 1997 and January 1, 2005, and followed them through August 1, 2005. HAART was defined as at least two nucleoside reverse transcriptase inhibitors (NRTIs) in combination with at least one protease inhibitor (PI) and/or non-NRTI (NNRTI), or at least three NRTIs. To be included in this study, persons had to remain on their first HAART regimen for at least 30 days, have “baseline” (i.e., at time of HAART initiation) demographic data, aCD4, %CD4, and HIV-1 RNA values available within 180 days prior to or 7 days after the date of first HAART, and have a baseline HIV-1 RNA ≥ 5000 copies/mL. HAART was available for virtually all patients in care at all sites during the study period. The study of both cohorts was approved by local Institutional Review Boards, all persons had provided prior written informed consent for use of clinical data, and only aggregated non-identifiable patient data were used in the analysis.
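The inclusion rules above can be read as a simple predicate. The following is a minimal sketch, assuming a hypothetical record layout (the `Candidate` fields and function names are illustrative and not taken from either cohort's database); the calendar window for HAART initiation is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-patient record; field names are assumptions."""
    age_years: float
    n_nrti: int                      # NRTIs in the first regimen
    n_pi: int                        # protease inhibitors in the first regimen
    n_nnrti: int                     # NNRTIs in the first regimen
    days_on_first_regimen: int
    baseline_rna_copies_per_ml: float
    lab_day_offset: int              # lab date minus HAART start, in days


def is_haart(n_nrti: int, n_pi: int, n_nnrti: int) -> bool:
    """>=2 NRTIs plus >=1 PI and/or NNRTI, or >=3 NRTIs."""
    two_nrti_plus_third_agent = n_nrti >= 2 and (n_pi + n_nnrti) >= 1
    triple_nrti = n_nrti >= 3
    return two_nrti_plus_third_agent or triple_nrti


def is_eligible(c: Candidate) -> bool:
    """Apply the study's stated inclusion criteria (date window not checked)."""
    return (
        c.age_years >= 18
        and is_haart(c.n_nrti, c.n_pi, c.n_nnrti)
        and c.days_on_first_regimen >= 30
        and -180 <= c.lab_day_offset <= 7      # 180 days before to 7 days after
        and c.baseline_rna_copies_per_ml >= 5000
    )
```

For example, a regimen of two NRTIs alone would not count as HAART under this reading, while three NRTIs would.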
A study event was defined as the first new AIDS defining event (ADE) or death after initiation of the first HAART regimen. ADEs were based on 1993 CDC classification criteria, excluding diagnoses based on aCD4 < 200 cells/mm3. Causes of death in CHORUS and in the JHHCC were determined and categorized as AIDS-related, not AIDS-related, or unknown using pre-specified criteria. At the JHHCC, deaths were determined by ongoing medical record review, and annual surveillance using the national social security index and the National Death Index. Persons who did not have an event were followed until the end of the study period, or until they were lost to follow-up, defined as the day of withdrawal from their cohort or the last clinic visit date for persons who had no encounter for nine months (CHORUS) or 18 months (JHHCC). Analyses were also performed, and are presented in Section 3.3, using the CHORUS definition of lost to follow-up for both cohorts. Secondary analyses that added 4.5 months of follow-up time to those lost to follow-up in both cohorts yielded similar results and are not presented.
Our prediction model was originally developed to measure the association between %CD4 and ADE/death after adjusting for aCD4 and other potential confounders. A Cox proportional hazards model was fit to the CHORUS data including the predictor variables baseline %CD4, aCD4, log10-transformed plasma HIV-1 RNA, age, sex, race (non-Hispanic white or non-white by self report), prior ART use, and probable route of HIV infection (IDU vs. other routes). These variables were chosen for inclusion in the model a priori, and were included in all candidate models regardless of their statistical significance. Percentage CD4 and aCD4 were square-root transformed to normalize the data. To avoid assuming linearity in the hazard, continuous predictors (%CD4, aCD4, HIV-1 RNA, and age) were expanded by fitting restricted cubic splines. For candidate models, each continuous predictor was assigned the same number of knots (0, 3, 4, or 5). The final model used 3 knots (the optimal number based on the Akaike information criterion; knots were located at the default values, the 10th, 50th, and 90th percentiles). An interaction term between %CD4 and aCD4 was also included in the model. Based on evidence of a possible violation of the proportional hazards assumption, the models were stratified on age quintiles, leaving age in the model as a continuous predictor to account for possible residual information [12,18-20]. All analyses were performed using R statistical software (Version 2.3.1; available at http://www.r-project.org).
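To illustrate the spline expansion, here is a minimal Python sketch of a restricted cubic spline basis using the commonly used truncated-power parameterization, which is linear beyond the outer knots. It is an assumption that this matches the parameterization of the R routines used for the analysis; the function names are illustrative only.

```python
import statistics


def default_knots(values):
    """Knots at the 10th, 50th, and 90th percentiles (3-knot case)."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return [deciles[0], deciles[4], deciles[8]]


def rcs_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)] for k >= 3 knots.

    Each nonlinear term is zero below the first knot and linear above the
    last knot; terms are scaled by (last knot - first knot)^2.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube_plus(u):
        return u ** 3 if u > 0 else 0.0

    scale = (t[-1] - t[0]) ** 2
    gap = t[-1] - t[-2]
    terms = [x]
    for j in range(k - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / gap
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / gap
        )
        terms.append(term / scale)
    return terms
```

With 3 knots, each continuous predictor contributes two columns to the design matrix: the variable itself and one nonlinear term.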
The predictive ability of the model was assessed both internally and externally using the following measures chosen a priori:
In addition, to further assess our model's predictive ability in the JHHCC, predictive accuracy was computed by comparing the mean predicted survival for the entire JHHCC over the follow-up period with the Kaplan-Meier ADE-free survival curve.
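As a sketch of this comparison, the code below computes a Kaplan-Meier curve from follow-up times and event indicators and subtracts the observed estimate at a chosen horizon from the mean predicted survival. The function names and the toy data in the usage note are illustrative assumptions, not the study's code or data.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is True for an ADE/death and False for a censored record.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_leaving
    return curve


def survival_at(curve, horizon):
    """Step-function lookup of the Kaplan-Meier estimate at `horizon`."""
    s = 1.0
    for t, surv in curve:
        if t > horizon:
            break
        s = surv
    return s


def calibration_gap(predicted, times, events, horizon):
    """Mean predicted survival minus the Kaplan-Meier estimate at `horizon`."""
    mean_predicted = sum(predicted) / len(predicted)
    return mean_predicted - survival_at(kaplan_meier(times, events), horizon)
```

A positive gap means the model predicts better event-free survival than was observed at that horizon.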
Measures of a model's predictive ability produced by applying the original model to the original data tend to be over-optimistic, i.e. to predict that the model will fit better than it actually does in independent data. The validity of a model applied to independent data can be assessed fairly using re-sampling techniques (bootstrapping). From the original 1891 rows (patients) in the CHORUS dataset, 1891 rows were randomly sampled with replacement to create a bootstrapped dataset. Model selection techniques identical to those used to create the original model were used to fit a new model to the bootstrapped data. The entire process was repeated to produce a total of 1000 bootstrap replications. Details are described elsewhere.
In each bootstrap replication, our measures of the model's predictive ability (measures 1, 2, and 3, discussed above) were computed using the bootstrap model applied to the bootstrapped data. Then the same measures were computed using the bootstrap model applied to the original data. The optimism of each measure was estimated as the average, across bootstrap replications, of the difference between its value when the bootstrap model was applied to the bootstrapped data and its value when the same model was applied to the original data. The original model's predictive ability when applied to independent data was then estimated by subtracting a measure's estimated optimism from the original model's measure of predictive ability.
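The optimism correction can be sketched generically. In the Python example below, `fit` and `measure` are placeholders for the full model-selection procedure and for a predictive-ability measure; the toy `fit_line` and `r_squared` helpers use a straight-line fit and R-squared purely as stand-ins for the Cox model and its measures, and the sign convention (bootstrap-data value minus original-data value) is an assumption.

```python
import random


def optimism_corrected(data, fit, measure, n_boot=1000, seed=0):
    """Return (corrected measure, estimated optimism)."""
    rng = random.Random(seed)
    apparent = measure(fit(data), data)          # original model, original data
    total = 0.0
    for _ in range(n_boot):
        boot = [rng.choice(data) for _ in data]  # resample rows with replacement
        model = fit(boot)                        # repeat the whole fitting procedure
        total += measure(model, boot) - measure(model, data)
    optimism = total / n_boot
    return apparent - optimism, optimism


def fit_line(data):
    """Toy stand-in model: least-squares line through (x, y) pairs."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx


def r_squared(model, data):
    """Toy stand-in measure: R-squared of `model` evaluated on `data`."""
    slope, intercept = model
    my = sum(y for _, y in data) / len(data)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in data)
    sst = sum((y - my) ** 2 for _, y in data)
    return 1.0 - sse / sst if sst > 0 else 0.0
```

The key design point is that `fit` must repeat every data-driven modeling decision inside each replication; otherwise the optimism estimate is too small.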
We tested whether the c-statistic indicated better than chance concordance by performing a permutation test: Each person's follow-up time and event indicator were randomly shuffled and assigned as a pair to another person's predictors. The model was fit following all model selection procedures described above to the permuted data and the c-statistic was computed. This process was repeated 1000 times, and the corresponding p-value was the proportion of permutations resulting in a c-statistic greater than the c-statistic from the original data.
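A minimal Python sketch of this permutation test follows. The `c_index` function is a simple pairwise concordance for right-censored data that ignores pairs with tied times (a simplification), and `fit_and_score` is a hypothetical callable standing in for refitting the model to each permuted dataset and returning its c-statistic.

```python
import random


def c_index(risk, times, events):
    """Fraction of usable pairs in which the higher-risk person fails first.

    A pair (i, j) is usable when i had an event strictly before j's
    follow-up time; tied risk scores count as half concordant.
    """
    concordant = 0.0
    usable = 0
    n = len(risk)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def permutation_p_value(predictors, times, events, fit_and_score,
                        n_perm=1000, seed=0):
    """Shuffle (time, event) pairs across people and re-score each time."""
    rng = random.Random(seed)
    observed = fit_and_score(predictors, times, events)
    outcomes = list(zip(times, events))   # keep each time with its indicator
    n_greater = 0
    for _ in range(n_perm):
        rng.shuffle(outcomes)
        perm_times = [t for t, _ in outcomes]
        perm_events = [e for _, e in outcomes]
        if fit_and_score(predictors, perm_times, perm_events) > observed:
            n_greater += 1
    return observed, n_greater / n_perm
```

In a toy setting with a single predictor used directly as the risk score, `fit_and_score` can simply be `c_index`; in the study it would wrap the entire model selection and fitting procedure.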
Sensitivity analyses were performed by creating a new model from the CHORUS data in which each person lost to follow-up was simulated to live or have an event (ADE or death), based on a fixed event probability chosen to be the same for all patients lost to follow-up (i.e., not conditional on patient characteristics). If the patient was randomly selected to have an event, the time until their event was then randomly drawn from a uniform distribution between 0 and 1 year after the date of their last visit. The prognostic model was then re-fit to this new CHORUS data and applied to predict mean ADE-free survival in the JHHCC. This process was repeated 100 times, and the average of the 100 repetitions was reported as the mean predicted ADE-free survival for the JHHCC based on the assigned probability of having an ADE/dying if lost to follow-up.
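The imputation step of this sensitivity analysis can be sketched as follows. The record layout (dictionaries with `time` in years, `event`, and `ltfu` keys) is hypothetical, persons not selected to have an event are assumed to remain censored at their last visit, and `summarize` is a placeholder for re-fitting the prognostic model and predicting mean ADE-free survival in the external cohort.

```python
import random


def impute_lost_to_follow_up(records, event_prob, rng):
    """Return a copy of `records` with simulated events for those lost.

    Each person lost to follow-up has an event with probability
    `event_prob`; the event time is drawn uniformly between 0 and 1 year
    after the last visit.
    """
    imputed = []
    for rec in records:
        new = dict(rec)
        if rec["ltfu"] and not rec["event"] and rng.random() < event_prob:
            new["event"] = True
            new["time"] = rec["time"] + rng.uniform(0.0, 1.0)
        imputed.append(new)
    return imputed


def average_over_repetitions(records, event_prob, summarize,
                             n_rep=100, seed=0):
    """Average `summarize(imputed data)` over repeated random imputations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        total += summarize(impute_lost_to_follow_up(records, event_prob, rng))
    return total / n_rep
```

Running this over a grid of `event_prob` values gives one averaged prediction per assumed probability, which is how the results of such an analysis are typically tabulated.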
In the CHORUS data, the proportion of all patient pairs in which the ordering of predictions and outcomes were concordant was c=0.678. Based on the bootstrap, the expected concordance when applying the model to independent data was c=0.625. This suggests that the model is substantially better than chance at using patient characteristics to order their clinical prognosis (p<0.001).
Figure 1a demonstrates the predictive accuracy of the model. CHORUS participants were divided into 5 categories based on their predicted ADE-free survival, and the figure plots mean predicted 3-year ADE-free survival probability versus the Kaplan-Meier estimate (and 95% confidence interval) for each group. If there were perfect calibration, then the two values would be equal (i.e., fall on the dashed line). The observed ADE-free survivals for given predicted ADE-free survival probabilities when the model is applied to independent data based on the bootstrap are shown as ‘X’ in the plot. Based on the bootstrap, for individuals with high or low predicted ADE-free survival probabilities, the model is expected to over-estimate or under-estimate, respectively, the true ADE-free survival probability. This suggests that there is some over-fitting, which is further supported by a shrinkage estimate based on the bootstrap of 0.60.
To summarize, based on bootstrap methods of validation, the prognosis model is expected to have fair agreement with observed ADE-free survival when applied to independent data, but is expected to slightly over-estimate ADE-free survival probability for individuals predicted to have a high ADE-free survival probability and to underestimate ADE-free survival for those predicted to have a low ADE-free survival probability.
We next applied our model to predict ADE-free survival probabilities for persons initiating HAART in the JHHCC. Table 1 shows characteristics of CHORUS and the JHHCC. The JHHCC included fewer persons than CHORUS (716 versus 1891), and had shorter follow-up time (median of 2.7 years versus 4.6 years; p<0.0001). Persons initiating HAART at JHHCC had more advanced HIV disease as indicated by lower aCD4 (median 133 vs. 240) and %CD4 (median of 12 vs. 16), and higher plasma HIV-1 RNA (median log10 copies/mL=4.9 vs. 4.7) (p<0.0001 for each). Persons in the JHHCC were also more likely to be female (35% vs. 10%), non-white (82% vs. 28%), ART naïve before initiating HAART (60% vs. 47%), and to have IDU as a potential route of infection (46% vs. 6%) (p<0.0001 for each).
ADE-free survival in the JHHCC was lower than that observed in CHORUS: 38% of HAART initiators had an event during study follow-up, compared to 25% in CHORUS. Interestingly, the percentage of individuals with an ADE was fairly similar between cohorts (15% in CHORUS vs. 12% in JHHCC). However, 26% of individuals in JHHCC died, compared to 10% in CHORUS. It is worth noting that the causes of death were similar between the two cohorts; in CHORUS, 52% of deaths were AIDS related, 22% not AIDS related, and 26% unknown; whereas in the JHHCC, 53% of deaths were AIDS related, 20% were not AIDS related, and 26% were unknown.
We evaluated our model's predictive ability on the JHHCC by computing the same statistics discussed above. The estimated concordance was 0.632, indicating that our model correctly ordered HAART initiators according to their ADE-free survival 63.2% of the time. This concordance was very similar to that predicted based on the bootstrap validation (c=0.625).
Figure 1b shows the mean predicted 3-year ADE-free survival based on the CHORUS model applied to the JHHCC data, plotted against the actual 3-year ADE-free survival in the JHHCC (Kaplan-Meier estimates and 95% confidence intervals). This plot is very similar to what was expected based on the bootstrap validation (Figure 1a), with those predicted to have very low and very high 3-year ADE-free survival tending to have slightly higher and slightly lower observed ADE-free survival, respectively. Shrinkage in the JHHCC was estimated as 0.58, similar to the bootstrap validation estimate, 0.60, and consistent with Figures 1a and 1b.
Figure 2 shows the estimated ADE-free survival for CHORUS and for JHHCC compared with the mean predicted ADE-free survival in the two cohorts. The Kaplan-Meier estimate of ADE-free survival probability in the JHHCC was greater than predicted during the first year after HAART initiation, and then much less than predicted more than 3−4 years after HAART initiation. In the JHHCC, the Kaplan-Meier estimates (95% CI) for ADE-free survival at 3, 4, and 5 years were 0.73 (0.69, 0.77), 0.64 (0.60, 0.68), and 0.57 (0.52, 0.62), respectively, whereas the mean predicted ADE-free survival probabilities at those times were 0.76, 0.72, and 0.69. Kaplan-Meier estimates beyond 5 years of follow-up in the JHHCC were quite low, but based on few patients.
To summarize, as expected based on our internal validation, our prognostic model had fair concordance with observed ADE-free survival when applied to independent data from JHHCC. The model slightly over-estimated the 3-year ADE-free survival probability for those predicted to have a high ADE-free survival based on their characteristics at HAART initiation, and slightly under-estimated the 3-year ADE-free survival for those predicted to have low ADE-free survival. However, Figure 2 demonstrates that our model was poor at predicting ADE-free survival for patients 4 or more years after HAART initiation.
We explored reasons why the predicted and observed ADE-free survival differed >3 years after HAART initiation in the JHHCC. First, we added new variables to our predictive model and re-fit it to the CHORUS cohort, in hopes that the inclusion of these additional variables might result in a predictive model that would remove some of the difference between the mean predicted and observed ADE-free survival when applied to the JHHCC. A model which included first HAART regimen in addition to all other covariates did not improve the fit (Table 2).
In a recent study, the ART-CC found that the incidence of death for IDUs was roughly constant over 6 years, whereas the incidence of death for non-IDUs dropped sharply during the first year and then remained much lower than that of IDUs. Therefore, we hypothesized that the difference in ADE-free survival between the two cohorts could be due to different incidence rates (baseline hazards) for IDUs and non-IDUs, particularly because 46% of the JHHCC were IDUs versus 6% of CHORUS. Although IDU was included and found to be a significant risk factor in the original model fit to the CHORUS data, we did not stratify by (fit a separate baseline hazard for) probable route of infection because there was little evidence to suggest that the IDU hazard ratio changed with time (p=0.6). However, we considered the possibility that this result was due to insufficient power and re-fit the model to CHORUS data, stratifying by IDU status in addition to age categories. This did not account for the difference in observed and predicted ADE-free survival (Table 2). As seen in Figure 3, although the incidence rates were clearly non-proportional between cohorts, they were roughly parallel for IDUs and non-IDUs within each cohort.
We also hypothesized that differences in adherence rates could explain the difference between ADE-free survival in CHORUS and JHHCC. Note that a measure of adherence does not belong in a prognostic model, as adherence is unknown at the time of HAART initiation. This model was for exploratory purposes; including additional baseline covariates which help predict adherence would be appropriate for our prognostic model. Adherence, measured as the number of undetectable (<400 copies/mL) viral load measurements divided by the total number of viral load measurements after HAART initiation, was associated with prognosis in CHORUS (HR=0.20 comparing 82% and 3% of viral loads suppressed [third and first quartiles], 95% CI=0.16−0.25). However, the model which included adherence did not account for the difference between predicted and observed ADE-free survival in the JHHCC (Table 2), and in fact made the difference greater, as patients at CHORUS tended to have a higher percentage of detectable viral loads post-HAART (Table 1).
The largest discrepancy between CHORUS and the JHHCC was the difference in lost to follow-up rates: 23% in CHORUS and only 5% in JHHCC. This difference appeared to be due in part to different definitions of lost to follow-up: 9 months without a visit for CHORUS vs. 18 months for JHHCC. If we used the 9-month lost to follow-up definition for JHHCC, then 22 individuals who died more than 9 months after their last visit were recorded as lost to follow-up. This slightly increased the ADE-free survival estimates for the JHHCC, but the ADE-free survival estimates still remained lower than the mean predicted ADE-free survival probabilities (Table 2).
It is possible that many of those lost to follow-up in the CHORUS cohort actually progressed to AIDS or died. However, based on a multivariable logistic regression model, those lost to follow-up in CHORUS tended to be male, white, and have higher aCD4 counts (P<0.05 for all), characteristics associated with longer ADE-free survival. We were able to more closely examine those lost to follow-up while in care at the Comprehensive Care Center (CCC), Nashville, TN. CCC patients accounted for 38% (716/1891) of the CHORUS population. Of these patients, 13% (96/716) were lost to follow-up, which is lower than the 29% (344/1175) observed among non-CCC sites in CHORUS. Using a social security index death search, we found that 7 of our 96 lost to follow-up patients had died, but in all cases the date of death was more than 9 months after the date of their last visit. Only two persons had dates of death less than 18 months after their last visit. We also discovered that 2 persons who had originally been censored at the study freeze date actually died 67 and 110 days before the close of the study. Therefore, from our further inspection of CCC data, we were only able to find two additional deaths. We were unable to obtain additional data on potential ADEs for those lost to follow-up.
A sensitivity analysis was performed to investigate the impact of under-ascertainment of ADE or death for those lost to follow-up. In order for the mean predicted ADE-free survival to be consistent (lie within the 95% confidence intervals) with the observed ADE-free survival at years 3, 4, and 5 using the corrected data (censoring the 22 JHHCC deaths as lost to follow-up and including the 2 additional CHORUS deaths), at least 50% of those lost to follow-up in CHORUS must have had an ADE or died within one year of their last visit (Table 2). In order to predict the same ADE-free survival as observed at 5 years, 90% of those lost to follow-up in CHORUS must have had an ADE or died within one year of their last visit.
Using re-sampling techniques and an independent dataset from JHHCC, we have described the prognostic ability of a model to predict ADE-free survival based on patient characteristics at initiation of HAART. In short, our model was fair at ordering patients’ risk of disease progression based on characteristics at HAART initiation, but was not good at predicting ADE-free survival more than 3 years after HAART initiation in an independent cohort with different characteristics.
Several measures of the model's predictive ability were remarkably consistent between those estimated internally using the bootstrap and those seen in the JHHCC data. Bootstrapping estimated that the proportion of all patient pairs in which the predictions and outcomes would be concordant would be approximately 63%; the concordance measure of our model in the JHHCC was 63%. Bootstrapping estimated that those patients predicted by the model to have low 3 year ADE-free survival probability would tend to do a little better than predicted, with an estimated shrinkage coefficient of 0.60. This was also seen in the JHHCC data; the estimated shrinkage was 0.58.
However, the model was poor at predicting the ADE-free survival probability beyond 3 years after HAART initiation. It is expected that predictive accuracy deteriorates with time, and the JHHCC survival estimates 5 or more years after HAART initiation were based on a relatively small number of patients. Nonetheless, the difference between cohorts is still notable. Those in the JHHCC were much more likely to die than predicted by our model. This discrepancy does not appear to be due to differing initial HAART regimens, subsequent adherence rates, or the high number of IDUs in the JHHCC. The JHHCC tended to have more advanced HIV disease at initiation of HAART, as measured by lower aCD4, lower %CD4, and higher HIV-1 RNA. Though these covariates were included in the prediction model, perhaps the CHORUS dataset from which the model was created did not include enough patients with similar characteristics, and thus did not create a good model for patients in the more advanced stages of HIV disease who initiate HAART, and/or for IDUs and non-whites. Although we believe standards of care were similar between sites, unmeasured differences could have influenced results. Between-site differences in the rate of disease progression could also be due to other unmeasured factors such as social deprivation and community-level mortality rates.
It seems probable that the discrepancy between predicted and observed ADE-free survival in the JHHCC was at least in part due to differences in lost-to-follow-up rates. Some of the discrepancy in rates of loss to follow-up was due to differences between the cohorts’ standard definitions for loss to follow-up (9 and 18 months without a visit for CHORUS and JHHCC, respectively). Definitions of loss to follow-up vary substantially between cohorts, and this is an important source of heterogeneity that should be considered when taking results from one cohort and then applying them to another. However, when using a common definition for loss to follow-up (9 months without a visit) the rate of loss to follow-up was still substantially higher in CHORUS than JHHCC, and approximately 50% of those lost would have had to have had an ADE or died in order for the predicted ADE-free survival estimates (derived from CHORUS) to be consistent with the observed estimates in JHHCC.
There are other models for predicting ADE-free survival, most notably the model from the ART-CC [2, 25]. Our models are similar in concept but constructed differently. ART-CC used parametric survival models (Weibull), whereas we used semi-parametric models (Cox); ART-CC grouped continuous predictors into categories, whereas we employed restricted cubic splines; ART-CC removed covariates that were poor predictors of progression from the model, whereas we left them in; and ART-CC chose a final prognostic model using a leave-one-out cross-validation system, whereas we used a model previously chosen based on Akaike's information criterion (no cross-validation) to evaluate the association between %CD4 and ADE/death. There are advantages and disadvantages to the different modeling approaches with regard to model flexibility, efficiency, parsimony, and clinical interpretation, which have been discussed elsewhere [18-19,26-28]. We were unable to perform a head-to-head comparison of the models as one of the key predictors from ART-CC, clinical stage at HAART initiation, was not available. However, comparing the measures of prognostic accuracy for our model with those reported by the ART-CC, their model appeared to perform better. Their estimated c-statistic applied to independent data was 0.73 compared to ours of 0.63, perhaps reflecting the importance of including clinical disease stage; their model showed little evidence of over-fitting, whereas ours did, perhaps due to the overuse of spline terms and implying we should have fit a more parsimonious model; and the mean predicted ADE-free survival for the ART-CC model applied to CASCADE data was quite close to the Kaplan-Meier estimates throughout the 3 years of follow-up. Importantly, it was for greater than 3 years of follow-up that our model had especially poor prediction when applied to JHHCC data.
Perhaps the most important information gleaned from this study was simply that HIV cohorts can differ greatly, and possibly in ways that are not measured by the several variables that are typically considered prognostic. This study highlighted some of the challenges of creating a model from one cohort and trying to predict observations in another. We believe that the difference between predicted and observed ADE-free survival >3 years after HAART initiation in the JHHCC was primarily due to differences between the CHORUS and JHHCC that no reasonable model would be able to overcome. These differences included substantially different loss-to-follow-up rates, patient demographics, and levels of immunosuppression at HAART initiation. As we continue to combine and apply information across cohorts, such as has been done in the ART-CC and is now being done in IeDEA, this cross-cohort heterogeneity needs to be recognized.
Financial support: Vanderbilt-Meharry Center for AIDS Research (NIH program 930 AI54999) and National Institutes of Health (grant K23 AT002508-01 to T.H., grant K24 A1065298 to T.R.S., and grants K24 DA00432 and R01 DA11602 to R.D.M.)
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.