The World Health Organization (WHO) recommendation for regular tuberculosis (TB) screening of HIV-positive individuals with Xpert MTB/RIF as the first diagnostic test has major resource implications.
To develop a diagnostic prediction model for TB, for symptomatic adults attending for routine HIV care, to prioritise TB investigation.
Cohort study exploring a TB testing algorithm.
HIV clinics, South Africa.
Representative sample of adult HIV clinic attendees; data from participants reporting ≥1 symptom on the WHO screening tool were split 50:50 to derive, then internally validate, a prediction model.
TB, defined as “confirmed” if Xpert MTB/RIF, line probe assay or M. tuberculosis culture were positive; and “clinical” if TB treatment started without microbiological confirmation, within six months of enrolment.
Overall, 79/2602 (3.0%) participants on ART fulfilled TB case definitions, compared to 65/906 (7.2%) pre-ART. Among 1133/3508 (32.3%) participants screening positive on the WHO tool, 1048 met inclusion criteria for this analysis: 52/515 (10.1%) in the derivation and 58/533 (10.9%) in the validation dataset had TB. Our final model comprised ART status (on ART > 3 months vs. pre-ART or ART < 3 months); body mass index (continuous); CD4 (continuous); number of WHO symptoms (1 vs. >1 symptom). We converted this to a clinical score, using clinically-relevant CD4 and BMI categories. A cut-off score of ≥3 identified those with TB with sensitivity and specificity of 91.8% and 34.3% respectively. If investigation was prioritised for individuals with score of ≥3, 68% (717/1048) symptomatic individuals would be tested, among whom the prevalence of TB would be 14.1% (101/717); 32% (331/1048) of tests would be avoided, but 3% (9/331) with TB would be missed amongst those not tested.
Our clinical score may help prioritise TB investigation among symptomatic individuals.
The World Health Organization (WHO) recommends, as part of activities to address the vast global burden of HIV-related tuberculosis (TB), regular screening for active TB of all people living with HIV (PLHIV) followed by Xpert MTB/RIF (Cepheid, Sunnyvale, CA) as the primary diagnostic test.  The recommended TB screening tool, which comprises any one of current cough, fever, weight loss or night sweats (subsequently referred to as the WHO tool), was developed for use in resource limited settings.  This simple tool, which was designed to rule out TB prior to the provision of isoniazid preventive therapy (IPT) to PLHIV, maximises sensitivity (78.9%) and negative predictive value (97.7% at TB prevalence of 5% in PLHIV), but has low specificity (49.6%) and positive predictive value (8% at TB prevalence of 5% in PLHIV).  South Africa, which is home to the world’s largest HIV epidemic  and where 62% of individuals with TB are also HIV-positive,  has rolled out Xpert as the initial diagnostic test for all individuals with symptoms suggesting TB.  Regular TB screening of PLHIV with a tool that generates large numbers of patients requiring further investigation, of whom only a small proportion will have TB, combined with a diagnostic test that is currently far more expensive than smear microscopy, poses a huge challenge in resource constrained settings. In these settings prioritising testing for those at greatest risk of TB will help preserve resources.
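The dependence of predictive values on prevalence can be checked with Bayes' rule. The following is a minimal sketch, not part of the original analysis; the function name `predictive_values` is illustrative, and the inputs are the rounded sensitivity, specificity and prevalence quoted above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded WHO tool figures quoted in the text, at 5% TB prevalence.
ppv, npv = predictive_values(sensitivity=0.789, specificity=0.496, prevalence=0.05)
print(f"PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these rounded inputs the sketch gives values close to, though not identical with, the published 8% and 97.7%; the small difference presumably reflects rounding of the inputs.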
Multivariable prediction models estimate the probability that an individual either has or will develop a particular condition. These models are increasingly abundant in the literature, with variable quality of construction as well as reporting, as highlighted by the recent TRIPOD statement which presents a recommended reporting framework. [6, 7] Clinical scoring algorithms have been developed for PLHIV with symptoms suggestive of TB to prioritise investigation for those with greatest probability of having TB prior to antiretroviral therapy (ART) initiation,  and improve case finding,  but these algorithms have not been validated or applied to patients on ART.
The aim of our study was to develop a score, comprising elements readily available in primary care, to predict probability of TB in adults attending for routine HIV care screened for TB and found WHO tool positive. This score was used to develop a simple tool to help health care workers in resource limited settings decide whom to prioritise for TB investigation.
We used data collected for “Xpert for people attending HIV/AIDS care: test or review?” (XPHACTOR), a prospective cohort study evaluating a risk-based algorithm to prioritise Xpert MTB/RIF testing amongst adults attending for routine HIV care in South Africa, to develop and validate our clinical score. Fig 1 depicts XPHACTOR study flow.
We enrolled a systematic sample of adults (aged ≥18 years) attending two hospital-based and two community health centre (CHC) clinics in Gauteng province, South Africa, for HIV care, irrespective of presence of symptoms suggestive of TB. Patients taking anti-tuberculosis treatment within the previous 3 months were excluded. Patients were enrolled into three groups: “on antiretroviral therapy (ART)” (currently taking or ART-experienced) group; “pre-ART” (in HIV care but not yet taking ART) group; and “HIV Testing and Counselling (HTC)” (newly-diagnosed HIV-positive). We recruited to the on ART group from hospital clinics because their patient population solely comprised those ART-experienced; and pre-ART and HTC groups were recruited from CHC clinics. At the time of the study, ART eligibility comprised CD4 ≤350 cells/mm3 or WHO clinical stage ≥3.
At enrolment, research staff administered a standardised questionnaire incorporating the WHO TB screening tool (any of current cough, fever, night sweats or unintentional weight loss), measured height, weight and mid-upper arm circumference (MUAC), and recorded the most recent clinic CD4 cell count. Further investigation was prioritised according to the XPHACTOR algorithm with an immediate spot sputum sample sent for Xpert MTB/RIF for (i) all assigned “high priority” (any of: current cough, fever ≥ 3 weeks, body mass index [BMI] <18.5 kg/m2, CD4 <100 ×10⁶/l, measured weight loss ≥10% in preceding 6 months, or other feature raising high clinical suspicion of TB); (ii) those in pre-ART group with CD4 <200 ×10⁶/l at enrolment; and (iii) all in HTC group at enrolment, the latter two categories (who were recruited for XPHACTOR substudies) because of a priori high risk of active TB. For all other participants a spot sputum sample was frozen at -80°C within 24 hours, for testing with Xpert at the end of the study.
Participants were reviewed monthly up to three months, with a repeat WHO symptom screen and a spot sputum requested for Xpert MTB/RIF if “high priority” by the study algorithm at that visit. The exception was participants in the “on ART” group who were asymptomatic at enrolment: they were telephoned at 1 and 2 months to update locator information but were not asked about TB symptoms. At the 3-month visit sputum (induced if necessary) and blood were collected for mycobacterial culture on liquid media (Bactec MGIT 960 and 9240 systems) from all study participants. We allowed a broad window period around the scheduled 3-month visit, up to around six months, in order to maximise study follow-up.
Participants who submitted an Xpert sample were reviewed and if Xpert-positive, TB treatment was initiated; if negative, further investigation in accordance with national guidelines was facilitated (chest radiograph, sputum culture and trial of antibiotics).
Results of all investigations were fed back to clinic staff, who were responsible for management decisions. Clinic medical records were reviewed at the end of the study to ascertain any additional relevant investigations and/or TB diagnoses. Deaths were identified through reports from participant-nominated contacts, clinic staff, and by accessing the Department of Home Affairs vital statistics database using participants’ South African identification (ID) numbers.
We restricted our analysis to all XPHACTOR participants who were WHO tool positive at enrolment and established in care (i.e. not newly testing HIV positive); and excluded those taking isoniazid preventive therapy (IPT) at enrolment, as those on IPT were likely to have recently undergone investigation for TB, and hence were effectively “pre-screened” for TB.
To be deemed clinically useful a prediction model should demonstrate accurate prediction of the outcome in data other than that in which the model was developed. We developed our prediction model using part of our dataset, and undertook internal validation of model performance using the remainder of the dataset.  Enrolment to XPHACTOR was staggered by site, commencing with hospital clinics; hence the dataset was stratified by site and split 50:50 by median date of enrolment within site. Data from the earlier half were used to derive our prediction model (derivation dataset), and from the latter half for validation (validation dataset). Data were analysed using Stata 14 (Stata Corporation, College Station, TX, USA), as detailed below.
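The splitting rule can be sketched as follows. This is an illustration only: the analysis itself was done in Stata, the field names `site` and `enrolled` are hypothetical, and assigning participants enrolled exactly on the median date to the derivation half is an assumption, as the text does not say how ties were handled.

```python
from datetime import date
from statistics import median


def split_by_site_median(records):
    """Split records into (derivation, validation) by median enrolment date within site.

    Each record is a dict with hypothetical keys 'site' and 'enrolled' (a date).
    Assumption: records enrolled on or before the site median go to derivation.
    """
    by_site = {}
    for record in records:
        by_site.setdefault(record["site"], []).append(record)

    derivation, validation = [], []
    for site_records in by_site.values():
        # Dates are compared as ordinals so that a median can be computed.
        cutoff = median(r["enrolled"].toordinal() for r in site_records)
        for r in site_records:
            if r["enrolled"].toordinal() <= cutoff:
                derivation.append(r)
            else:
                validation.append(r)
    return derivation, validation


# Made-up example: two sites with staggered enrolment.
example = [
    {"site": "hospital", "enrolled": date(2012, 9, 1)},
    {"site": "hospital", "enrolled": date(2012, 12, 1)},
    {"site": "hospital", "enrolled": date(2013, 3, 1)},
    {"site": "hospital", "enrolled": date(2013, 6, 1)},
    {"site": "chc", "enrolled": date(2013, 5, 1)},
    {"site": "chc", "enrolled": date(2013, 8, 1)},
    {"site": "chc", "enrolled": date(2013, 11, 1)},
    {"site": "chc", "enrolled": date(2014, 2, 1)},
]
derivation, validation = split_by_site_median(example)
print(len(derivation), len(validation))
```

Splitting within site keeps each clinic represented in both halves, even though sites began enrolling at different times.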
Our outcome was confirmed or clinical TB versus “not TB”, ascertained within 6 months of enrolment to XPHACTOR, as defined below.
“Confirmed” TB was defined as a positive result on i) Xpert MTB/RIF or ii) line probe assay (LPA) performed on smear-positive or cultured isolate (GenoType MTBDRplus, Hain Lifesciences) or iii) M. tuberculosis (Mtb) culture, from any sample (including stored sputum and those requested by the health care provider) collected within six months of XPHACTOR enrolment. Individuals who started TB treatment within six months of enrolment (including those with treatment starts reported in the context of a separate verbal autopsy sub-study), in the absence of microbiological confirmation, were assigned “clinical” TB. This was based on the assumption that an HIV-positive adult with a positive bacteriological test result or starting TB treatment within six months after enrolment likely had active TB at enrolment, supported by data from Zimbabwe which estimated the mean duration of smear-positivity prior to TB diagnosis amongst HIV-positive adults to be 18–33 weeks. 
“Not TB” was defined as fulfilling all of the following: absence of criteria for confirmed or clinical TB; and alive at least 3 months after enrolment. Participants who did not fulfil the case definitions for TB or “not TB” were deemed to have an unclassifiable outcome and excluded from the analyses.
Pulmonary and extrapulmonary TB were classified in accordance with WHO definitions. 
There is no consensus on the best method for selecting candidate variables, but suggested approaches include using literature review, clinical knowledge and studying the distribution of predictors in the study data. [6, 13, 14] It is recommended, to ensure predictive accuracy, that the total number of candidate predictors is limited so that there are at least 10 outcomes for each candidate predictor studied. [6, 13] We considered predictors from data collected at enrolment to XPHACTOR known to be associated with prevalent and/or incident TB amongst PLHIV: age, sex, previous TB treatment, smoking, alcohol use, history of ART, duration on ART, previous IPT, previous cotrimoxazole preventive therapy (CPT), presence of individual WHO tool symptoms, duration of WHO tool symptoms, BMI, MUAC, CD4 count, haemoglobin, and viral load. [15–25] History of mining,  health care work,  and incarceration,  although established risk factors for TB were not considered as <10% participants fell into each category. The following variables were also excluded: MUAC, measured weight loss, haemoglobin and viral load, due to >20% missing data; and previous IPT, as there was only one outcome amongst participants with previous IPT.
A priori we combined history of ART with duration on ART to generate “ART status” categorised as: pre-ART or on ART <3 months vs. on ART for >3 months, as amongst patients on ART, duration of <3 months is a predictor for prevalent TB.  A priori we considered ART status, CD4 cell count, and BMI for our adjusted model, and used univariable screening to select additional candidate predictors with P-value (p)<0.25.
We undertook multivariable logistic regression of candidate predictors, sequentially removing the variable with the largest Wald p-value >0.05 (stepwise backward elimination), to generate our final model.  A complete-case analysis was undertaken, excluding participants with missing information relating to any of the candidate predictors. A model that categorised the number of WHO symptoms as 1 vs. >1 symptom (model A) was compared with one that included individual WHO tool symptoms (model B), aiming to select the simplest and most practical model to implement in primary care. We also considered a model without CD4 count for settings where this might not be easily available.
Transformations of continuous variables (BMI and CD4) were assessed using fractional polynomials. In our final selected model we tested for interactions between remaining variables and “ART status”.
We assessed model calibration, the agreement between probability of TB predicted by the model and observed probability of TB within quantiles of predicted risk, graphically in a calibration plot; and statistically using the Hosmer-Lemeshow test. We assumed p<0.05 from the Hosmer-Lemeshow test as indicating lack of model fit (poor calibration), although the test has limited statistical power to detect poor calibration unless the sample size is large and the outcome frequent.  We assessed discrimination, the ability of our model to differentiate patients with TB vs. those without, using the area under the receiver-operating characteristic curve (AUROC). AUROC 0.7 to 0.79, 0.8–0.89, ≥0.9 are respectively considered acceptable, excellent and outstanding discrimination. 
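For illustration, the AUROC can be computed as the proportion of (TB, not TB) pairs in which the participant with TB has the higher predicted risk, with ties counting as one half. The sketch below is not the Stata routine used in the study; the function names are illustrative, and the label "below acceptable" for AUROC <0.7 is an assumption, since only the three upper bands are defined above.

```python
def auroc(predicted, outcome):
    """Area under the ROC curve from predicted risks and binary outcomes (1 = TB)."""
    cases = [p for p, y in zip(predicted, outcome) if y == 1]
    non_cases = [p for p, y in zip(predicted, outcome) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one participant with and one without the outcome")
    concordant = 0.0
    for case_risk in cases:
        for non_case_risk in non_cases:
            if case_risk > non_case_risk:
                concordant += 1.0
            elif case_risk == non_case_risk:
                concordant += 0.5  # ties count as half a concordant pair
    return concordant / (len(cases) * len(non_cases))


def discrimination_label(value):
    """Map an AUROC to the bands quoted in the text."""
    if value >= 0.9:
        return "outstanding"
    if value >= 0.8:
        return "excellent"
    if value >= 0.7:
        return "acceptable"
    return "below acceptable"  # assumed label; not defined in the text


# Made-up predicted risks and outcomes, for illustration only.
area = auroc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
print(area, discrimination_label(area))
```

The pairwise approach is quadratic in sample size, which is acceptable for datasets of a few hundred participants but would be replaced by a rank-based calculation for larger ones.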
Continuous variables in the final model were categorised in a clinically meaningful manner based on their functional form, and each beta coefficient from this logistic regression model was divided by the smallest coefficient and rounded to the nearest integer to assign points to each variable. The total number of points was summed for each participant to calculate the clinical score.
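The points assignment can be sketched as below. The coefficients shown are hypothetical and are not those of Table 3; the sketch also assumes that "smallest coefficient" means the smallest non-zero coefficient in absolute value, with reference categories (coefficient 0) scoring 0 points.

```python
def assign_points(betas):
    """Divide each coefficient by the smallest non-zero one and round to an integer."""
    smallest = min(abs(b) for b in betas.values() if b != 0)
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return {name: round(b / smallest) for name, b in betas.items()}


def clinical_score(points, present):
    """Sum the points for the categories that apply to one participant."""
    return sum(points[name] for name in present)


# Hypothetical coefficients, for illustration only (NOT the study's estimates).
hypothetical_betas = {
    "on_art_over_3_months": 0.0,  # reference category
    "pre_art_or_art_under_3_months": 0.9,
    "bmi_under_18_5": 1.5,
    "more_than_one_symptom": 0.5,
}
points = assign_points(hypothetical_betas)
print(points)
print(clinical_score(points, ["pre_art_or_art_under_3_months", "more_than_one_symptom"]))
```

Scaling by the smallest coefficient means the weakest predictor scores 1 point, and every other predictor scores in proportion to its log odds ratio.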
We used the beta coefficients and intercept from the final regression models (before and after categorisation of continuous variables) generated from the derivation dataset to calculate the risk score for each participant in our validation dataset. We converted the risk score into predicted risk using $\text{predicted risk} = 1/(1 + e^{-\text{risk score}})$, and assessed performance of the regression model in the validation dataset by evaluating calibration and discrimination.
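A minimal sketch of this calculation follows. The risk score is the linear predictor (intercept plus the sum of coefficient × predictor value), and the logistic function converts it to a probability. The function names, intercept and coefficients are hypothetical and do not come from the fitted model.

```python
import math


def risk_score(intercept, betas, values):
    """Linear predictor: intercept plus sum of coefficient * predictor value."""
    return intercept + sum(betas[name] * values[name] for name in betas)


def predicted_risk(score):
    """Inverse logit: convert a risk score (log odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical intercept and coefficients, for illustration only.
betas = {"pre_art_or_art_under_3_months": 0.9, "more_than_one_symptom": 0.5}
participant = {"pre_art_or_art_under_3_months": 1, "more_than_one_symptom": 1}
score = risk_score(intercept=-3.0, betas=betas, values=participant)
print(f"risk score {score:.2f}, predicted risk {predicted_risk(score):.3f}")
```

Applying coefficients estimated in the derivation dataset, unchanged, to participants in the validation dataset is what makes the subsequent calibration and discrimination checks a test of the model rather than a refit.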
The study was approved by the ethics committees at the University of the Witwatersrand, University of Cape Town, and the London School of Hygiene & Tropical Medicine. All consenting participants gave written consent or, for illiterate participants, witnessed verbal consent. For illiterate participants, there was an impartial witness present during the consenting process, who then signed the relevant witness section of the consent form. All ethics committees approved the consent form, including the section on the use of witnessed oral consent for illiterate participants, at the beginning of the study. Principles expressed in the Declaration of Helsinki were followed in the conduct of this research.
We enrolled 3508 participants established in care (i.e. not newly testing HIV positive) to XPHACTOR. Overall, among patients taking ART, 783/2602 (30.1%) reported one or more symptom in the WHO tool and 79/2602 (3.0%) had TB. Among pre-ART patients 350/906 (38.6%) reported ≥1 symptom and 65/906 (7.2%) had TB. For this analysis, 2418/3508 were excluded because WHO tool negative (2227) or on IPT at enrolment (191), and a further 25 participants were excluded because of “unclassifiable” outcome leaving 1065 who were WHO tool positive and eligible for our analysis (Fig 2). We undertook a complete-case analysis and therefore excluded a further 17 participants with missing candidate predictor data (S1 Table), leaving 1048 for our analysis.
Table 1 compares the characteristics of participants in the derivation and validation datasets. There were 515 participants in the derivation dataset, enrolled between September 2012 and September 2013, amongst whom 52 (10.1%) participants fulfilled case definitions for TB (36 confirmed, 16 clinical). In the validation dataset there were 533 participants enrolled between May 2013 and March 2014, amongst whom 58 (10.9%) participants fulfilled case definitions for TB (39 confirmed, 19 clinical). The proportion with pulmonary vs. extrapulmonary disease in derivation vs. validation datasets amongst those with confirmed TB was pulmonary (35/36 vs. 37/39) and extrapulmonary (1/36 vs. 2/39); and amongst those with clinical TB was pulmonary (9/16 vs. 7/19), extrapulmonary (4/16 vs. 5/19) and not recorded (3/16 vs. 7/19).  The median time from enrolment to earliest of positive TB test or date TB treatment was started amongst all participants diagnosed with TB (derivation and validation datasets combined) was 7 days (IQR 0, 63), with 90% of diagnoses made within 120 days of enrolment.
In derivation and validation datasets, median age was 41 years, 72% were in the on ART group, most participants were female (67% vs. 71%), and the most common WHO tool symptoms reported at enrolment were cough (59% vs. 66%) and weight loss (46% vs. 42%; Table 1). At enrolment median CD4 was greater in derivation compared with validation dataset (378 vs. 334 cells/mm3), and median BMI was similar (24 kg/m2). Participants in the derivation dataset were more likely to report previous IPT than those in the validation dataset (9.9% vs. 3.6%).
Table 2 summarises the candidate predictors considered for model A, which categorised number of WHO symptoms reported (1 symptom vs. > 1 symptom), and the final multivariable model. We excluded age, alcohol status and previous history of TB as p>0.25 in univariable analysis. Our final model (model A) comprised: ART status (on ART > 3 months = 0 v pre-ART or ART <3 months = 1); BMI (continuous, linear); CD4 (continuous, linear); number of WHO symptoms (1 symptom = 0 v >1 symptom = 1). A linear relationship with log odds of the outcome was found to be adequate using fractional polynomials for both BMI and CD4 count. No evidence was found for statistical interactions between ART status and CD4 count, BMI, or number of WHO symptoms (Wald p-value ≥0.9). This model had Hosmer-Lemeshow statistic p = 0.65 and AUROC 0.79 (95% confidence intervals [CI] 0.73–0.86) indicating statistically adequate calibration and discrimination in the derivation dataset (Fig 3, S2 Table). In a sensitivity analysis where we excluded all clinical TB and used a gold standard of bacteriologically-confirmed TB, we obtained the same final multivariable model (S3 Table).
Univariable screening to select candidate predictors may result in the rejection of important predictors. [6, 13] When we repeated our multivariable analysis without univariable screening, and included all candidate predictors considered for model A, using stepwise backward elimination we obtained the same final model.
The risk score and predicted risk were calculated for the validation dataset using model A and showed that calibration and discrimination were adequate (Hosmer-Lemeshow p = 0.31 [S2 Table], AUROC 0.75 [95% CI 0.68–0.82]), though the calibration plot demonstrates over-prediction at higher deciles of risk (Fig 3).
S4 Table presents an alternative multivariable model developed using individual WHO tool symptoms rather than total number of symptoms (model B). Model B comprised ART status, BMI, cough, night sweats and unintentional weight loss. In the derivation dataset there was evidence that presence of cough was modified by ART status (p = 0.03 for interaction term); and the model had adequate calibration and discrimination (Hosmer-Lemeshow statistic p = 0.81, AUROC 0.82 [95% CI 0.76–0.88]). In the validation dataset this model had poor calibration (Hosmer-Lemeshow statistic p = 0.01) although discrimination was acceptable (AUROC 0.75 [95% CI 0.69–0.82]).
We repeated our multivariable analysis using all candidate predictors considered for model A removing CD4 count, for use in a setting where CD4 count is not easily obtainable. This model containing ART status, BMI and number of WHO symptoms (data not shown), performed adequately in the derivation dataset (Hosmer-Lemeshow statistic p = 0.54, AUROC 0.77 [95% CI 0.71–0.84]). In the validation dataset this model had poor calibration (Hosmer-Lemeshow statistic p = 0.02) although discrimination was acceptable (AUROC 0.70 [95% CI 0.63–0.77]).
We selected model A as our final model to develop the risk score because it was simpler and performed better in the validation dataset.
We used WHO BMI categorisation of <18.5 kg/m2 as underweight, 18.5–24.9 kg/m2 as normal weight, and ≥ 25 kg/m2 as overweight. CD4 count was categorised as <200 cells/mm3, 200–349 cells/mm3 and ≥ 350 cells/mm3 to reflect clinically relevant cut-offs and the skewed CD4 count distribution amongst HIV-infected patients with TB.  The multivariable model with categorisation of these continuous variables in the derivation dataset is presented in Table 3. This model had statistically adequate discrimination in both derivation (AUROC 0.79 [95% CI 0.73, 0.86]) and validation datasets (AUROC 0.72 [95% CI 0.65, 0.79]). The Hosmer-Lemeshow statistic p-value was 0.89 in the derivation dataset but 0.02 in the validation dataset indicating poor calibration in the validation dataset.
The clinical score for each predictor was generated and the possible range for the total score was 0 to 16 (Table 3).
Table 4 shows the percentage of patients diagnosed with TB at each value of clinical score in derivation and validation datasets, and S1 Fig boxplot illustrates the distribution of clinical score, stratified by dataset, amongst those diagnosed with TB vs. those not diagnosed with TB.
Fig 4 shows the performance of the clinical score at different cut-offs, in terms of sensitivity, specificity, negative predictive value and AUROC in the entire dataset. A clinical score cut-off of ≥3 to trigger TB investigation had sensitivity of 91.8% (95% CI 85, 96.2), specificity 34.3% (95% CI 31.3, 37.5), negative predictive value 97.3% (95% CI 94.9, 98.7) and AUROC 63.1% (95% CI 60.1, 66.1). Increasing the cut-off to ≥7, where sensitivity and specificity were closest, offered the best discrimination (AUROC 70.1% [95% CI 65.8, 75.1]) and improved specificity to 73.7% (95% CI 70.7, 76.5). However, sensitivity was only 67.3% (95% CI 57.7, 75.9), although negative predictive value was maintained at 95% (95% CI 93.2, 96.5).
We selected a cut-off of clinical score ≥ 3 to trigger TB investigation as we deemed that in this population, in order to avoid missing TB diagnoses, maintaining a higher sensitivity was more important than optimising discrimination. Investigating patients who had a clinical score of ≥ 3 would have resulted in no further investigation of 30% (155/515) patients in the derivation dataset, and missed 6% (3/52) of TB diagnoses. The same cut-off for investigation in the validation dataset would have resulted in no further investigation of 33% (176/533) patients and missed 10% (6/58) of TB diagnoses. Amongst the nine patients with clinical score < 3 and TB diagnosed (4 confirmed TB, 5 clinical TB), all had been on ART for ≥ 3 months and all reported only one symptom, which was cough; median BMI was 24.3 kg/m2 (range 20.3–30.8) and median CD4 was 429 cells/mm3 (range 241–1183).
Fig 5 presents a proforma of how this scoring system could be used in practice, using combined data from the derivation and validation datasets (N = 1048) to show the prevalence of TB by clinical score group. If investigation was prioritised for individuals with a score of ≥3, 68% (717/1048) of symptomatic individuals would be tested, among whom the prevalence of TB would be 14.1% (101/717). This strategy would avoid 32% (331/1048) of tests, at the cost of missing 8% (9/110) of individuals with TB, equivalent to a TB prevalence of 3% (9/331) amongst those not tested.
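The figures above follow from a 2×2 table of score group against TB status. The sketch below rebuilds that table from the counts reported in the text (717 tested of whom 101 had TB; 331 not tested of whom 9 had TB); the function name `screening_metrics` is illustrative.

```python
def screening_metrics(tp, fn, fp, tn):
    """Summarise a 2x2 table of score group (>= cut-off vs < cut-off) against TB status."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # TB prevalence among those tested
        "npv": tn / (tn + fn),
        "tests_avoided": (fn + tn) / total,
    }


# Counts reported for the combined dataset at a cut-off of >= 3.
tested, tb_among_tested = 717, 101
not_tested, tb_among_not_tested = 331, 9

metrics = screening_metrics(
    tp=tb_among_tested,
    fn=tb_among_not_tested,
    fp=tested - tb_among_tested,
    tn=not_tested - tb_among_not_tested,
)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Run on these counts, the sketch reproduces the reported sensitivity (91.8%), specificity (34.3%), negative predictive value (97.3%) and prevalence among those tested (14.1%); the proportion of tests avoided is 31.6%, reported as 32%.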
Our study is the first to derive and internally validate a clinical score for patients attending for routine HIV care, both ART-experienced and pre-ART, for use as a second step after TB screening with the WHO tool. The score is designed to assist health care workers in resource limited settings to identify whom to prioritise for TB investigation. Our score uses elements which should be readily available at any level of health care and is simple to use, highlighting to less experienced clinicians those at greatest risk of TB, and providing a useful tool for other cadres of health care worker. In our study population, not investigating those who have a clinical score <3, amongst whom the prevalence of TB is 3% (9/331), would avoid investigation of 32% (331/1048) of those reporting WHO symptom(s), whilst missing only 8% (9/110) of TB diagnoses. We hypothesise that the WHO tool positive patients with clinical score <3 who had TB diagnoses missed were more likely to have less advanced disease and more favourable prognosis. This is suggested by their clinical characteristics (all on ART with normal weight and CD4 count >240 cells/mm3), and consistent with findings from other studies. [32–34] In the broader context of the original XPHACTOR study population, not investigating the 2227 who were WHO tool negative at enrolment would have missed the 28 TB diagnoses in this group (Fig 2). The overall risk of TB in those who were WHO tool negative or had clinical score <3 was 1% (37/2558), and the two step strategy (WHO tool followed by clinical score) would have avoided investigating 78% (2558/3275) of clinic attendees (Fig 2). The TB diagnoses missed using this strategy, 27% (37/138) of TB diagnoses, were mainly amongst those who were WHO tool negative.
Our clinical scoring system compares favourably, in terms of simplicity and ability to identify patients with lowest prevalence of TB, with that derived by Balcha et al, also as a second step after WHO symptom screen, for ART naïve patients attending for HIV care in Ethiopia.  In this smaller and as yet unvalidated (internally or externally) study, amongst 569 WHO tool positive patients, a more complex score which included Karnofsky status, MUAC, peripheral lymphadenopathy and anaemia, using a cut off of ≥2 was able to avoid investigation of 45% (255/569) of whom 8% (20/255) had culture confirmed TB. Rudolf et al derived TBscore II from a population in Bissau who were seeking care for symptoms suggestive of TB, of whom only 164 were HIV-positive.  Their score is also more complex than our score, incorporating physical signs in addition to symptoms and also requires both internal and external validation.
The majority of our study population were established on ART in contrast to those in the meta-analysis which derived the WHO tool who were largely pre-ART,  and the populations used to derive clinical scores by Balcha  and Rudolf.  Thus our study addresses a key question concerning operationalisation of TB screening among the increasingly large population of adults on ART. Our study participants were established in HIV care and thus likely had had previous screening, which is known to reduce sensitivity of the WHO tool for bacteriologically confirmed TB,  as also is ART use.  Rangaka et al evaluated the utility of the WHO tool to rule out TB prior to IPT in a population similar to ours, i.e. both pre-ART and on ART although duration on ART was shorter (median 12 months), against a gold standard of culture-confirmed TB.  Their study suggested that amongst those on ART addition of BMI and CD4 to the tool could be considered, but recommended sputum culture first for all prior to IPT.  We ensured that in our clinical score we included BMI, CD4 and a measure of ART status which incorporated duration on ART, and believe that our score therefore will prove useful for all patients screened for TB during routine HIV care. Our score obviates the need for a separate tool for those pre-ART vs. on ART, although people with newly diagnosed HIV have such a high TB prevalence that investigation for all may be justified. 
In contrast with other studies deriving clinical algorithms or evaluating performance of the WHO rule, [2, 8, 29, 33, 35] our case definition for TB included clinical TB. This reflects the real life scenario of high TB burden resource-limited settings, and is a strength of our study. Most of our TB diagnoses were bacteriologically-confirmed pulmonary TB, which is what the WHO tool was largely designed to rule out prior to provision of IPT.  In sensitivity analysis restricted to bacteriologically-confirmed TB we obtained the same final multivariable model (S3 Table).
We assumed that all participants starting TB treatment, or with a bacteriologically confirmed sample collected, within six months of enrolment were likely to have had active TB at enrolment. We based this decision on data from a community survey and TB notification data in Zimbabwe, estimating a mean duration of smear-positivity prior to TB diagnosis amongst HIV-positive adults of 18–33 weeks. In fact, 90% of our study participants who started TB treatment commenced within four months of enrolment. The interquartile range for time from enrolment to TB diagnosis was shorter in the derivation than in the validation dataset (0–31 vs. 0–83 days), which may reflect implementation of a substudy later in the course of XPHACTOR evaluating causes for persistent TB symptoms in patients without TB diagnosis by the 3-month visit. There were 47 participants (with 10 TB diagnoses) in this substudy in the validation dataset compared with 7 in the derivation dataset (with 1 TB diagnosis). We undertook the majority of our case note reviews towards the end of the XPHACTOR study and this is reflected in the longer duration of follow-up in the derivation vs. validation dataset, which may have resulted in ascertainment bias in terms of TB diagnoses made in the derivation dataset, although the total number of TB diagnoses was similar in both groups. Differences between the derivation and validation datasets represent a strength in terms of evaluating our predictive model, as non-random splitting, which reduces the similarity of the two datasets, is preferred for internal validation.
We developed our score in accordance with TRIPOD recommendations  and internal validation of our final multivariable model (model A) demonstrated adequate calibration and discrimination in our validation dataset. The multivariable model resulting from our categorisation of BMI and CD4 in a clinically meaningful manner also showed acceptable and clinically useful discrimination in the validation dataset. Our model requires external validation in order to confirm that it predicts well in individuals outside of our dataset  and, following this, impact studies to assess patient outcomes and cost effectiveness of this strategy.  Assuming external validity, our suggested threshold for investigation (clinical score ≥3) could be varied depending on available resources. We have suggestions for updating our prediction model, which we were unable to evaluate due to insufficient data: MUAC, which is simpler to measure than BMI; haemoglobin, because anaemia is a strong independent predictor of TB amongst those poised to initiate ART;  and viral load.  Recent WHO guidelines recommend ART initiation for all PLHIV at any CD4 count, and suggest that in settings where viral load monitoring can be assured, CD4 counts for monitoring purposes may be reduced or stopped.  CD4 count itself is not always easily available, and in these settings viral load monitoring is also unlikely to be easily available, but given this new guidance models without CD4 should be considered.
Strengths of our study include systematic evaluation of a representative sample of adults attending for routine HIV care who underwent rigorous assessment for TB and longitudinal follow-up which minimised the number of TB diagnoses missed, and model development and validation in accordance with TRIPOD guidelines. 
We have developed and internally validated a simple clinical score comprising ART status, BMI, CD4 count and number of WHO symptoms, for patients attending for routine HIV care in resource limited settings. Our score is designed to identify, amongst those reporting WHO tool symptom(s), who should be prioritised for TB investigation. Our findings are highly relevant given the national roll out of Xpert MTB/RIF in South Africa.
We thank the study participants; the nursing and medical staff of Chris Hani Baragwanath and Mamelodi hospitals, Ramokonopi and Jabulani Dumane community health clinics, South Africa; the staff of National Health Laboratory Services, South Africa; and the staff of Aurum Institute for their essential contributions to this study.
GC received funding for the study from the Bill and Melinda Gates Foundation, grant number OPP1034523 (Churchyard). URL: http://www.gatesfoundation.org. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The XPHACTOR Clinical Score dataset has been uploaded to the LSHTM Data Compass repository at https://doi.org/10.17037/DATA.204 and is made available on request. Our study participants consented to the use of information collected from the XPHACTOR study for HIV and TB related research. Users of our data need to agree to this condition in order to fulfil the study's ethical obligations to research participants. The study team wish to avoid unnecessary barriers to access and will seek to respond to data requests as quickly as possible.