Targeted screening remains an important approach to human immunodeficiency virus (HIV) testing. The authors aimed to derive and validate an instrument to accurately identify patients at risk for HIV infection, using patient data from a metropolitan sexually transmitted disease clinic in Denver, Colorado (1996–2008). With multivariable logistic regression, they developed a risk score from 48 candidate variables using newly identified HIV infection as the outcome. Validation was performed using an independent population from an urban emergency department in Cincinnati, Ohio. The derivation sample included 92,635 patients; 504 (0.54%) were diagnosed with HIV infection. The validation sample included 22,983 patients; 168 (0.73%) were diagnosed with HIV infection. The final score included age, gender, race/ethnicity, sex with a male, vaginal intercourse, receptive anal intercourse, injection drug use, and past HIV testing, and values ranged from −14 to +81. For persons with scores of <20, 20–29, 30–39, 40–49, and ≥50, HIV prevalences were 0.31% (95% confidence interval (CI): 0.20, 0.45) (n = 27/8,782), 0.41% (95% CI: 0.29, 0.57) (n = 36/8,677), 0.99% (95% CI: 0.63, 1.47) (n = 24/2,431), 1.59% (95% CI: 1.02, 2.36) (n = 24/1,505), and 3.59% (95% CI: 2.73, 4.63) (n = 57/1,588), respectively. The risk score accurately categorizes patients into groups with increasing probabilities of HIV infection.
In 2006, the Centers for Disease Control and Prevention (CDC) published revised recommendations for performing human immunodeficiency virus (HIV) testing in health-care settings in the United States, specifically suggesting the use of routine (nontargeted) opt-out HIV screening in all health-care settings where the undiagnosed prevalence is greater than or equal to 0.1% (1). The rationale for this approach included identifying more patients with HIV infection and identifying them earlier in the course of their disease, thus helping to mitigate individual morbidity and transmission of the virus (2). Unfortunately, nontargeted HIV screening has not been widely adopted in clinical practice (3–5), partly because it is operationally difficult (6) and costly (7, 8) and requires screening of a large number of persons in order to identify a modest number of newly diagnosed patients, in both high-prevalence and low-prevalence settings (7, 9).
In 2007, the World Health Organization and the US Preventive Services Task Force published recommendations for performing HIV testing in health-care settings, but contrary to the CDC recommendations, they endorsed targeted screening as the primary means of HIV testing (10, 11). These recommendations were put forth, in part, because of the paucity of data to support nontargeted screening in settings where the epidemic remains low-level or concentrated, including North America, Europe, and segments of other continents where a wide variation in the epidemiologic profile of HIV still exists (10, 12).
Although the concept of targeted HIV screening has existed for approximately 25 years and risk characteristics have been widely studied (13–15), specific targeted screening approaches remain largely undefined (16–19). Specifically, it remains unclear which criteria should be used to target patients, what the relative strengths of their associations are with HIV infection, and how they may be combined and incorporated into clinical practice to target an unselected population. A critical first step, however, is rigorous assessment of patient characteristics and their associations with HIV diagnosis. It would be useful to develop a tool that could be used to objectively estimate a patient’s risk of having undiagnosed HIV infection. As such, our goal in this study was to systematically evaluate a large number of characteristics and to derive and validate a clinically meaningful prediction tool with which to accurately categorize patients into risk groups for undiagnosed HIV infection.
We conducted an analysis of a large, prospectively collected data set from the Denver Metro Health Clinic, a sexually transmitted disease (STD) clinic administered by Denver Public Health in Denver, Colorado. The clinic is the largest of its kind in the Rocky Mountain region and serves over 10,000 patients annually, with an undiagnosed HIV prevalence of approximately 0.5%.
Consecutive patients aged 13 years or older who came to the STD clinic between January 1, 1996, and December 31, 2008, were included in the derivation sample. All patients visiting the STD clinic for outpatient HIV testing underwent structured health and behavioral screening as part of traditional HIV prevention counseling. Information on patient characteristics was collected by trained clinic staff using a computerized data collection system, allowing for real-time data entry.
As part of routine assessment, the following data were collected from each patient prior to testing: 1) demographic characteristics (age, gender, and race/ethnicity); 2) symptoms (rash, pruritus, genital discharge, and dysuria); 3) history of sexually transmitted infections (gonorrhea, chlamydia, herpes simplex, syphilis, and genital warts); 4) sexual history (lifetime number of partners and numbers of partners in the previous month and the previous 4 months); 5) specific sexual practices (vaginal intercourse, insertive or receptive oral intercourse, insertive or receptive anal intercourse, and condom use); 6) gender(s) of sexual contacts; 7) previous HIV testing history; and 8) other risk factors associated with the transmission of HIV infection (injection drug use, prostitution, or sexual contact with a prostitute or someone who injects drugs or is infected with HIV). All of these variables were available for analysis and served as candidate predictor variables during the development of the risk model. All patients underwent either conventional or rapid HIV testing, and all who tested preliminarily positive completed Western blot confirmatory testing. Confirmed HIV infection served as the outcome, or dependent variable.
Our modeling approach proceeded in a systematic fashion and included several sequential steps that are detailed in the Web Appendix (http://aje.oxfordjournals.org/) (20–22). We used multivariable logistic regression to develop the prediction model. Bivariate statistical testing or automated variable selection techniques were specifically not used to drive selection of variables in the model (23, 24). Instead, variables were included in the model using several sequential manual steps based on our knowledge of the epidemiology of HIV and known or hypothesized associations between patient characteristics and HIV infection. Variables were selected for inclusion based on their statistical associations, their influence on the regression coefficients of other included variables, and Akaike’s Information Criterion as a global test of model goodness of fit.
The first step consisted of including only patient demographic characteristics in the model. Because age was a continuous variable and its relation with HIV infection was expected to be nonlinear, we used fractional polynomials to model its relation with the outcome. Fractional polynomials evaluate first- and second-degree transformations drawn from a limited set of candidate power transformations of a variable relative to the outcome. From these sets, a “best”-fit transformation is identified. This approach is considered superior to modeling continuous data as linear or arbitrarily identifying cutpoints within the variable, because it accounts for potential nonlinear relations between the predictor and the outcome (25, 26). Using our results, we graphed the relation between age and HIV diagnosis to identify potential inflection points at which the probability of HIV diagnosis changed. As a result of this analysis, we categorized age into the following 4 mutually exclusive groups: <22 or >60 years, 22–25 or 55–60 years, 26–32 or 47–54 years, and 33–46 years.
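The age grouping described above can be expressed as a simple lookup. The following Python sketch is illustrative only: the function name `age_group` and the string labels are assumptions introduced here, ages are assumed to be recorded in whole years, and the sketch does not reproduce the fractional polynomial analysis that led to these cutpoints.

```python
def age_group(age_years: int) -> str:
    """Map an age in whole years to one of the 4 age groups named in the text."""
    if age_years < 13:
        # The derivation sample was limited to patients aged 13 years or older.
        raise ValueError("age must be 13 years or older")
    if 33 <= age_years <= 46:
        return "33-46"
    if 26 <= age_years <= 32 or 47 <= age_years <= 54:
        return "26-32 or 47-54"
    if 22 <= age_years <= 25 or 55 <= age_years <= 60:
        return "22-25 or 55-60"
    # Everything else is younger than 22 or older than 60.
    return "<22 or >60"


if __name__ == "__main__":
    for age in (18, 24, 30, 40, 50, 58, 70):
        print(age, age_group(age))
```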
Subsequent steps included the addition of variables related to sexual history, sexual orientation, sexual practices, other risk behaviors, history of sexually transmitted infections, and clinical symptoms. These variables were entered manually into the model in sequential groups. Combinations of variables were also evaluated (e.g., insertive and receptive oral intercourse combined as insertive or receptive oral intercourse). A full model was developed to include all statistically significant variables defined by a P value of 0.05 or less. Interaction terms were not evaluated for inclusion in the model given the large number of candidate variables and the complexity of developing and interpreting a model with interaction terms.
The internal validity of the model was assessed using 10-fold cross-validation. Calibration was assessed by graphically comparing predicted HIV prevalence with observed HIV prevalence, drawing a linear regression line through the points, and calculating its slope and R². Discrimination was assessed by constructing a receiver operating characteristic (ROC) curve and calculating the area under the curve (27).
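As a minimal sketch of the two performance measures named above, the following Python code computes an area under the ROC curve and the slope and R² of a calibration line. It is not the authors' code (the analyses were run in SAS, Stata, or SPSS). The function names and the toy data are assumptions made for illustration, and the area is computed with the pairwise (Mann-Whitney) formulation, which is equivalent to the area under the empirical ROC curve but slow for large samples.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def calibration_line(predicted: Sequence[float],
                     observed: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of observed on predicted; returns (slope, intercept, R^2)."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    print(roc_auc([10, 20, 35, 50, 60], [0, 0, 1, 0, 1]))
    print(calibration_line([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))
```

A slope near 1 and an R² near 1 indicate that predicted and observed prevalences agree closely across groups; an area of 0.5 indicates no discrimination and 1.0 indicates perfect discrimination.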
We then pruned the full model to develop the simplest model without losing its predictive ability. We retained the demographic variables and individually removed other risk score variables, beginning with those that had the weakest associations with HIV infection. Cross-validation was performed after removal of each variable, and new calibration and ROC curves were generated and compared with those of the preceding model. This pruning approach continued until the simplest model was created while ensuring maximal calibration and discrimination. A risk score was then created by multiplying the final model’s regression coefficients by 10 and rounding them to the nearest integer (28). The range of the risk scores was then examined to identify cutpoints for grouping patients into statistically unique categories based on HIV prevalence. Proportions are reported with 95% confidence intervals.
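The conversion from regression coefficients to integer points can be sketched as follows. The coefficient values and factor names below are hypothetical placeholders, since the fitted coefficients (Table 2) are not reproduced in this text; only the rule itself (multiply by 10, round to the nearest integer, then sum the points for the characteristics present) comes from the description above.

```python
from typing import Dict, Iterable


def coefficients_to_points(coefficients: Dict[str, float]) -> Dict[str, int]:
    """Multiply each regression coefficient by 10 and round to the nearest integer.

    Note: Python's round() sends exact halves to the nearest even integer;
    the text does not say how such ties were handled.
    """
    return {name: round(10 * beta) for name, beta in coefficients.items()}


def total_score(points: Dict[str, int], present: Iterable[str]) -> int:
    """Sum the points for the characteristics a patient reports."""
    return sum(points[name] for name in present)


if __name__ == "__main__":
    # Hypothetical coefficients, NOT the values from the published model.
    example_coefficients = {
        "hypothetical_factor_a": 2.13,
        "hypothetical_factor_b": -0.96,
        "hypothetical_factor_c": 0.44,
    }
    points = coefficients_to_points(example_coefficients)
    print(points)
    print(total_score(points, ["hypothetical_factor_a", "hypothetical_factor_b"]))
```

Negative coefficients yield negative points, which is how a composite score can fall below zero.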
We estimated the necessary sample size for the derivation of the risk model to minimize the possibility of overfitting the multivariable logistic regression model. As a general rule, multivariable logistic regression analyses require an event (i.e., HIV infection)-to-predictor ratio of 10:1 (29). We evaluated approximately 50 candidate predictor variables, so we estimated that we would require at least 500 newly diagnosed HIV-infected patients in the derivation sample. Given an estimated prevalence of HIV infection from the STD clinic of 0.5%, we estimated that we would require approximately 100,000 patients in total for the derivation part of this study.
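The arithmetic behind this estimate is short enough to write out. The sketch below uses the figures given in the paragraph (about 50 candidate predictors, 10 events per predictor, and an estimated prevalence of 0.5%); the function name is an assumption made for illustration.

```python
import math
from fractions import Fraction
from typing import Tuple


def required_sample_size(n_predictors: int,
                         events_per_predictor: int,
                         prevalence: float) -> Tuple[int, int]:
    """Return (events needed, total patients needed) for an events-per-predictor rule."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    events = n_predictors * events_per_predictor
    # Use an exact fraction so that 500 / 0.005 is not distorted by float rounding.
    exact_prevalence = Fraction(prevalence).limit_denominator(10**6)
    patients = math.ceil(events / exact_prevalence)
    return events, patients


if __name__ == "__main__":
    events, patients = required_sample_size(50, 10, 0.005)
    print(events, patients)  # 500 events, 100000 patients
```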
The derived risk score was then externally tested using observations from January 1, 1998, through June 30, 2010, from the emergency department at the University of Cincinnati Medical Center in Cincinnati, Ohio. The University of Cincinnati Medical Center is a 665-bed tertiary-care facility and level 1 trauma center. There are approximately 90,000 adult visits to the emergency department annually. A targeted emergency department HIV screening program, in continuous operation since 1998, is an adjunct clinical program of the emergency department and is staffed by trained counselors who provide testing (using conventional HIV enzyme immunoassay with confirmatory Western blot) and comprehensive prevention counseling (30). Patients are identified for testing based on review of triage notations, electronic medical records, or referral by emergency department staff. The primary means of selection include identification of risk characteristics, clinician concern, and patient request. Patients are also referred when identified as having signs or symptoms suggestive of HIV infection. Any of a broad list of criteria may prompt testing, but patients are not systematically assessed for all possible risk indicators. Those who consent undergo structured and comprehensive risk assessment in conjunction with prevention counseling. The undiagnosed HIV infection prevalence in this setting is approximately 0.7%.
The risk score was applied to each patient included in this validation sample. Patients were then grouped into the unique risk categories identified during model development, and the proportions within each group are reported as percentages with 95% confidence intervals. Similar to our approach to internal validation, calibration was assessed by graphically comparing predicted HIV prevalence with observed HIV prevalence, by generating a linear regression line, and by calculating its slope and R². Similarly, discrimination was assessed by constructing an ROC curve and calculating the area under the curve.
All data were managed using Microsoft Access (Microsoft Corporation, Redmond, Washington), and statistical analyses were performed using SAS, version 9.2 (SAS Institute, Inc., Cary, North Carolina), Stata, version 10.1 (Stata Corporation, College Station, Texas), or SPSS, version 18 (SPSS Inc., Chicago, Illinois). Fractional polynomials, fitted with the “fracpoly” command in Stata, were used to model the relations between continuous and ordinal variables (e.g., age and lifetime number of sexual partners) and the outcome (25, 31). Although the extent of missing data was small and likely not influential relative to the estimates from the regression analysis, we used a Markov chain Monte Carlo approach to multiple imputation as a means to provide unbiased estimates of regression coefficients in the final model and to confirm their stability between the complete-case analysis and the analysis where imputation was used (32, 33). In addition, an unconditional bootstrapping approach was used to estimate 95% confidence intervals for the regression coefficients of the final model (34). This study was approved by the institutional review boards from each institution.
The derivation sample consisted of 92,635 patients. Of these, 504 (0.54%) were diagnosed with HIV infection during the clinical encounter. The median age was 27 years (interquartile range, 22–35; range, 13–87), and 62% were male. In terms of race/ethnicity, 42% were white, 29% were Hispanic, 25% were black, and 4% represented another race or ethnicity.
Table 1 shows the distribution of all candidate predictor variables by HIV serostatus. Patients infected with HIV were generally older and more likely to be male, to be white, to have had a larger number of sexual partners, to have had sex with males, and to have participated in insertive or receptive oral or anal intercourse. Patients with HIV infection were also more likely to have injected drugs, been tested previously for HIV infection, used condoms, had a previous diagnosis of gonorrhea or syphilis, and served as a prostitute or had sex with an HIV-infected partner or someone who injected drugs. Genital rash was the only symptom that was more common among patients infected with HIV.
The following variables were independently associated with HIV diagnosis and were included in the full risk model: age, gender, race/ethnicity, sex with a male, sex with a female in the previous year, sex with an HIV-infected partner, vaginal intercourse, insertive anal intercourse, receptive anal intercourse, oral intercourse, injection drug use, prostitution in the previous year, prior HIV testing, and history of syphilis, genital rash, or genital discharge. The slope of this model was 0.82, its R² was 0.88, and the area under the ROC curve was 0.86 (95% confidence interval (CI): 0.84, 0.88).
Table 2 shows the results of the final pruned risk model. This model included age (categorized as <22 or >60 years, 22–25 or 55–60 years, 26–32 or 47–54 years, or 33–46 years); gender (male or female); race/ethnicity (black, Hispanic, white, or other); sexual practices (sex with a male, receptive anal intercourse, and vaginal intercourse); and other risk behaviors (injection drug use or past HIV testing), with a composite score ranging from −14 to +81. This model demonstrated outstanding calibration (calibration regression slope of 0.95 and an R² of 0.94) and discrimination (area under the ROC curve = 0.85, 95% CI: 0.83, 0.88).
All patients were then assigned scores according to the risk model and categorized into the following 5 mutually exclusive groups: <20 (very low risk); 20–29 (low risk); 30–39 (moderate risk); 40–49 (high risk); and ≥50 (very high risk). The HIV prevalences within these groups were 0.12% (95% CI: 0.08, 0.15) (n = 54/46,627), 0.24% (95% CI: 0.19, 0.30) (n = 78/32,446), 0.67% (95% CI: 0.48, 0.91) (n = 40/5,965), 2.28% (95% CI: 1.77, 2.89) (n = 66/2,897), and 5.66% (95% CI: 5.02, 6.36) (n = 266/4,700), respectively (Figure 1).
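The grouping and the per-group prevalences above can be sketched in a few lines of Python. The score cutpoints, group labels, and counts are taken from the paragraph above. The confidence interval method is an assumption: the text does not state which method was used, so this sketch uses the Wilson score interval, and its limits may differ slightly from the published ones.

```python
import math
from typing import Tuple


def risk_group(score: int) -> str:
    """Assign one of the 5 mutually exclusive risk groups from a risk score."""
    if score < 20:
        return "very low risk"
    if score < 30:
        return "low risk"
    if score < 40:
        return "moderate risk"
    if score < 50:
        return "high risk"
    return "very high risk"


def wilson_interval(events: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (an assumed method, see lead-in)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("require 0 <= events <= n and n > 0")
    p = events / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # (score band, HIV diagnoses, patients) for the derivation sample, from the text.
    derivation = [("<20", 54, 46627), ("20-29", 78, 32446), ("30-39", 40, 5965),
                  ("40-49", 66, 2897), (">=50", 266, 4700)]
    for band, cases, patients in derivation:
        low, high = wilson_interval(cases, patients)
        print(f"{band}: {100 * cases / patients:.2f}% "
              f"({100 * low:.2f}, {100 * high:.2f})")
```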
The top 3 risk groups (i.e., patients scoring 30 points or greater) represented 73.8% (95% CI: 69.7, 77.6) (n = 372/504) of all patients diagnosed with HIV infection yet only 14.6% (95% CI: 14.4, 14.9) (n = 13,562/92,635) of the total derivation sample; the top 2 risk groups (i.e., patients scoring 40 points or greater) represented 65.9% (95% CI: 61.6, 70.0) (n = 332/504) of those diagnosed with HIV infection yet only 8.2% (95% CI: 8.0, 8.4) (n = 7,597/92,635) of the sample; and the top risk group (i.e., patients scoring 50 points or greater) represented 52.7% (95% CI: 48.3, 57.2) (n = 266/504) of those diagnosed with HIV infection yet only 5.1% (95% CI: 4.9, 5.2) (n = 4,700/92,635) of the sample.
The validation sample consisted of 22,983 patients, of whom 168 (0.73%) were identified with HIV infection. When categorized into the same 5 risk groups, HIV prevalence was 0.31% (95% CI: 0.20, 0.45) (n = 27/8,782) for persons with a score of <20, 0.41% (95% CI: 0.29, 0.57) (n = 36/8,677) for those with a score of 20–29, 0.99% (95% CI: 0.63, 1.47) (n = 24/2,431) for those with a score of 30–39, 1.59% (95% CI: 1.02, 2.36) (n = 24/1,505) for those with a score of 40–49, and 3.59% (95% CI: 2.73, 4.63) (n = 57/1,588) for those with a score of ≥50 (Figure 1).
In this sample, the top 3 risk groups represented 62.5% (95% CI: 54.7, 69.8) (n = 105/168) of all patients diagnosed with HIV infection yet only 24.0% (95% CI: 23.5, 24.6) (n = 5,524/22,983) of the sample; the top 2 risk groups represented 48.2% (95% CI: 40.5, 56.0) (n = 81/168) of those diagnosed with HIV infection yet only 13.5% (95% CI: 13.0, 13.9) (n = 3,093/22,983) of the sample; and the top group represented 33.9% (95% CI: 26.8, 41.6) (n = 57/168) of those diagnosed with HIV infection yet only 6.9% (95% CI: 6.6, 7.2) (n = 1,588/22,983) of the sample.
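The shares quoted above follow directly from the per-group counts reported for the validation sample, which sum to 168 diagnoses among 22,983 patients. The sketch below recomputes them; the function and variable names are assumptions made for illustration.

```python
from typing import List, Tuple

# (score band, HIV diagnoses, patients) for the validation sample, lowest band first.
VALIDATION_COUNTS: List[Tuple[str, int, int]] = [
    ("<20", 27, 8782),
    ("20-29", 36, 8677),
    ("30-39", 24, 2431),
    ("40-49", 24, 1505),
    (">=50", 57, 1588),
]


def top_group_capture(counts: List[Tuple[str, int, int]], k: int) -> Tuple[float, float]:
    """Return (share of all diagnoses, share of all patients) in the top k score bands."""
    if not 1 <= k <= len(counts):
        raise ValueError("k must be between 1 and the number of bands")
    total_cases = sum(cases for _, cases, _ in counts)
    total_patients = sum(patients for _, _, patients in counts)
    top = counts[-k:]
    top_cases = sum(cases for _, cases, _ in top)
    top_patients = sum(patients for _, _, patients in top)
    return top_cases / total_cases, top_patients / total_patients


if __name__ == "__main__":
    for k in (3, 2, 1):
        case_share, patient_share = top_group_capture(VALIDATION_COUNTS, k)
        print(f"top {k}: {100 * case_share:.1f}% of diagnoses, "
              f"{100 * patient_share:.1f}% of patients")
```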
Risk score variables for both the derivation sample and the validation sample are shown in Table 3. The calibration regression slope for the validation sample was 1.07 and its R² was 0.98 (Figure 2A). The area under the ROC curve for the validation sample was 0.75 (95% CI: 0.70, 0.78) (Figure 2B).
To our knowledge, this is the first study to empirically derive and externally validate an HIV risk prediction tool for estimating the prevalence of undiagnosed HIV infection among patients seeking medical care. Prior work has been conducted to evaluate risk and to improve targeting of certain patients, but such work has focused either on specific clinical settings or on specific patient populations, and to our knowledge such practices have not been further evaluated or widely used (16, 19). Our risk score categorizes patients, based on a limited number of variables, into groups with increasing HIV prevalence, and its use has the potential to improve the effectiveness and efficiency of HIV screening across settings where HIV testing is performed by accurately identifying patients at risk and helping to ensure that they are targeted in a systematic, high-yield fashion. This approach may serve best in nontraditional clinical care settings such as emergency departments, urgent care centers, or general practices where implementation of nontargeted screening is most challenging.
Current CDC recommendations for HIV testing call for nontargeted opt-out screening in most health-care settings (1). Other advisory groups, including the World Health Organization and the US Preventive Services Task Force, have tempered their recommendations in support of targeted strategies (10, 11). While the ability to screen every patient for HIV infection would clearly yield the largest numbers of diagnoses and have the greatest impact on mitigating the epidemic, we believe abandoning targeted HIV screening is premature, especially in light of ongoing barriers related to implementation of nontargeted HIV screening (2). In addition, prior work by our team and by others evaluating nontargeted HIV screening in real-world clinical settings demonstrates that this method is costly and probably fails to identify a large number of persons with HIV infection (7, 9, 35, 36).
Traditional targeted screening methods are thought to be ineffective, and arguments to abandon this approach remain active because it is thought that clinicians are too busy to perform risk assessments and patients are unwilling or unable to provide accurate risk information. However, previous research indicates that most patients are willing to discuss and disclose risk behaviors when asked (18, 37, 38), while novel approaches for screening, including use of computerized kiosks and health educators (39–41), have already improved the fidelity of assessing HIV risk. Additionally, traditional risk assessments have relied on subjective assessment of risk by clinicians and have not been systematically applied, something the risk score may obviate.
The risk score may also help prioritize HIV testing and prevention resources in the United States and other countries where the epidemic is concentrated, and where clinicians and public health leaders must make important decisions about where to focus HIV testing efforts. The risk score may be administered using paper forms, although integration into an electronic medical care system will likely be optimal. The latter approach would allow for real-time calculation of a patient’s score and provide the opportunity to limit the number of questions asked, depending on initial responses to questions.
Although the fundamental premise of the risk score is to identify those most at risk, it may also be used to identify patients at low risk, thus limiting the number of tests performed where the yield of new diagnoses is low; this approach also would reduce the likelihood of false-positive tests encountered during large-scale screening (36), especially when performed in low-prevalence settings. Furthermore, the risk score may have the added benefit of informing clinicians and patients about the risk, allowing both to share decision-making when considering testing for HIV infection or when establishing a risk reduction plan. Decision and cost-effectiveness modeling and application of the risk score in clinical practice will help define its optimal use, which is likely to vary depending on the setting and its underlying HIV prevalence (6).
The risk score includes only 8 variables and was shown to have excellent test characteristics. The model included 3 demographic variables and 5 risk behavior variables and reflects national demographic and risk behavior estimates from the CDC. The estimated numbers of HIV and acquired immunodeficiency syndrome cases are highest among persons aged 20–49 years, with the peak number of cases occurring between ages 35 and 44 years (42). We used a nonlinear approach to model the relation between age and HIV infection, with findings consistent with those reported nationally. Additionally, the incidence of HIV infection is estimated to be highest among men and those who are black or Hispanic, all findings confirmed by our model (42). Finally, risky behaviors, including sex with a male (regardless of whether the person being tested was male or female), anal intercourse, and injection drug use, were also associated with HIV infection, whereas vaginal intercourse and past HIV testing were negatively associated with HIV infection.
The risk score includes patient demographic characteristics, which are relatively easy to ascertain and are generally collected as part of standard practice where HIV testing is performed, but it also includes 5 potentially sensitive risk-based characteristics that may be variably disclosed by patients. In particular, variable disclosure of “sex with a male” if the patient is male, “receptive anal intercourse,” and “injection drug use” may limit the acceptability of routinely implementing this instrument. Prior to including risk-based characteristics in the model, we developed a demographic-only model, which did not perform well (data not shown). Our results highlight the importance of ascertaining at least some behavioral characteristics to estimate a patient’s risk of HIV infection.
Although the risk score was developed and tested using 2 large and distinct patient samples, its predictive accuracy differed slightly between them and thus may differ in other communities or clinical settings. In addition, because HIV testing was optional in both clinical settings, selection bias may have been introduced. The generalizability of the risk score is supported, however, by the fact that the model’s calibration was similar in the 2 study groups. Because this study was not powered to discriminate statistically between the 5 risk groups defined during derivation and because there was a slight loss in discrimination between the derivation and validation cohorts, additional validation may be warranted. Future investigation may require the model’s refinement prior to clinical implementation. Although it is believed that risk for acquiring HIV infection may change over time, the face validity of the risk score in relation to the epidemiology of HIV infection in the United States suggests that the shift in risk is relatively slow. Regardless, prior to or during the initial phases of clinical implementation, the ability of the model to risk-stratify patients should be assessed and adjusted as necessary, including recalibration to maximize the predictive accuracy of the score in each unique setting.
In summary, we derived and validated a risk score that categorized patients into groups with increasing probabilities of HIV infection. The risk score may help clinicians identify patients for directed HIV testing or help public health leaders focus limited resources to improve and streamline approaches to HIV screening.
Author affiliations: Department of Emergency Medicine, Denver Health Medical Center, Denver, Colorado (Jason S. Haukoos, Emily Hopkins, Brooke Bender, Lynsay A. MacLaren, Richard L. Byyny); Department of Emergency Medicine, School of Medicine, University of Colorado, Aurora, Colorado (Jason S. Haukoos, Comilla Sasson, Richard L. Byyny); Department of Epidemiology, Colorado School of Public Health, Aurora, Colorado (Jason S. Haukoos); Department of Emergency Medicine, University of Cincinnati Medical Center, Cincinnati, Ohio (Michael S. Lyons, Christopher J. Lindsell); Department of Emergency Medicine, Johns Hopkins Hospital, Baltimore, Maryland (Richard E. Rothman, Yu-Hsiang Hsieh); Denver Public Health, Denver Health Medical Center, Denver, Colorado (Mark W. Thrun); and Department of Medicine, School of Medicine, University of Colorado, Aurora, Colorado (Mark W. Thrun).
This research was supported, in part, by an Independent Scientist Award (K02 HS017526) from the Agency for Healthcare Research and Quality to Dr. Haukoos.
Conflict of interest: none declared.