Febrile neutropaenia is a frequently occurring and occasionally life-threatening complication of treatment for childhood cancer, yet many children are aggressively over-treated. We aimed to undertake a systematic review and meta-analysis to summarise evidence on the discriminatory ability and predictive accuracy of clinical decision rules (CDR) of risk stratification in febrile neutropaenic episodes.
The review was conducted in accordance with Centre for Reviews and Dissemination methods, using random effects models to undertake meta-analysis. It was registered with the HTA Registry of systematic reviews, CRD32009100453.
We found 20 studies describing 16 different CDR assessed in 8388 episodes of FNP. No study compared different approaches and only one CDR had been subject to testing across multiple datasets. This review cannot conclude that any system is more effective or reliable than any other.
To maximise the value of the information already collected by these and other cohorts of children with febrile neutropaenia, an individual-patient-data (IPD) meta-analysis is required to develop and test new and existing CDR to improve stratification and optimise therapy.
Children undergoing treatment for malignancy have an excellent chance of survival, with overall rates approaching 75%.1 In most cases, children who die following treatment for cancer do so of their disease, but despite huge improvements in supportive care, around 16% of deaths within 5 years of diagnosis are due to the complications of therapy.2,3 One such life-threatening complication in immunocompromised children remains infection, frequently presenting as the occurrence of fever with neutropaenia.4 A robust risk stratification model which reliably predicted those children at very low risk of having a significant infection could result in reduced intensity and/or duration of hospitalised antibiotic therapy. Those at high risk of complications could be targeted for more aggressive management. At present there are many differing policies for the management of febrile neutropaenia in paediatric practice5,6 with lack of agreement about how and which clinical decision rules (CDR), if any, are used.
A clinical decision rule is a tool designed to be used at the bedside to assist clinical decision making.7 These rules should be validated by assessing them on a separate population, both to test how well the rule differentiates the risk groups (discriminatory ability) and to determine the absolute estimates of risk within these groups (predictive accuracy).
In adult oncology practice the Multinational Association for Supportive Care in Cancer (MASCC) risk index8 provides a CDR to identify patients at low risk of serious medical complications during febrile neutropaenia. The factors identified include ‘young age’ (<60 years) and the absence of chronic obstructive airways disease, among other features specific to the disease type and presentation at each episode, and the index has been used as the basis for the out-patient management of fever in low-risk neutropaenic adult patients.9 The MASCC rule is of very limited applicability in children: it did not include children in its derivation, the age criterion is non-discriminatory, and chronic airways disease is extremely rare in this group. Accordingly, studies on children and young people require separate, detailed examination.
This systematic review aimed to identify, critically appraise and synthesise evidence on the discriminatory ability and predictive accuracy of existing CDRs in febrile neutropaenic episodes in children and young people undergoing treatment for malignant disease.
The review was conducted in accordance with ‘Systematic reviews: CRD’s guidance for undertaking reviews in health care’10 and registered on the HTA Registry of systematic reviews: CRD32009100453. It sought studies which aimed to derive or validate a CDR in children or young people (aged 0–18 years) presenting with febrile neutropaenia. Both prospective and retrospective cohorts were included, but those using a case–control (‘two-gate’) approach were excluded as these have been previously shown to exaggerate diagnostic accuracy estimates.11 Studies exclusively addressing the prediction of radiologically confirmed pneumonia are subject to a separate review [in submission to Arch Dis Child – Still under consideration].
An electronic search strategy (see Web Appendix 1) was developed which examined the following databases from their inception to February 2009:
Reference lists of relevant systematic reviews and included articles were reviewed for further relevant articles. Published and unpublished studies were sought and no language restrictions applied. Non-English language studies were translated. Two reviewers independently screened the title and abstract of studies for inclusion, and then the full text of retrieved articles. Disagreements were resolved by consensus.
The validity of each study was assessed using 11 of the 14 questions from the QUADAS assessment tool for diagnostic accuracy studies.12 The QUADAS tool was adapted specifically for the review,13 omitting questions on ‘time between index and reference test’, ‘intermediate results’ and ‘explanation of withdrawals’ (see Web Appendix 2). The CDR and reference tests are necessarily related, and the design of a CDR means that ‘intermediate’ results are included in any analysis. The issue of incomplete data was addressed in the analysis of the method of derivation or validation, and as such was not included as a quality criterion.
Data were extracted by one reviewer and checked by the other. The data extracted included age and sex distribution of the included participants, geographical location of the study and participant inclusion/exclusion criteria. The performance of the CDR was extracted as a 2 × k table (where k refers to the number of strata described). Also extracted were the methods used to derive the CDR (where applicable), the variables considered, the methods of statistical analysis, and the approach to multiple episodes in individual patients and to missing data.
Quantitative synthesis was undertaken for studies which tested the same CDR and, where appropriate, sources of heterogeneity were investigated.
For dichotomous test data, analyses were attempted with a bivariate model (using ‘metandi’ in STATA 10).14 For tests with very small numbers of studies to pool (n < 4), fitting a bivariate model is problematic as the procedure frequently fails to converge. In these cases, a univariate approach was used (pooling sensitivity and specificity separately).15
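One common way to pool a single proportion, such as sensitivity, on its own is a DerSimonian-Laird random-effects model on the logit scale. The Python sketch below illustrates that approach under stated assumptions; it is not a reproduction of the Stata analysis or of the method in the cited reference. The function name, the 0.5 continuity correction and the example counts are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple


def pool_logit_proportions(counts: List[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Pool proportions given as (events, total) pairs, one pair per study.

    Uses DerSimonian-Laird random effects on the logit scale and returns
    (pooled proportion, lower 95% limit, upper 95% limit).
    """
    logits, variances = [], []
    for events, total in counts:
        # Assumption: add 0.5 to both cells in every study so the logit
        # stays finite even when a study has 0% or 100% events.
        e = events + 0.5
        n = (total - events) + 0.5
        logits.append(math.log(e / n))
        variances.append(1.0 / e + 1.0 / n)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))

    # Method-of-moments estimate of the between-study variance tau^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(logits) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights, pooled logit and its standard error.
    w_re = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def inv_logit(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))

    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)


# Hypothetical sensitivity data: (true positives, all episodes with the outcome).
sensitivity_counts = [(18, 20), (25, 32), (9, 14)]
pooled, lower, upper = pool_logit_proportions(sensitivity_counts)
print(f"pooled sensitivity {pooled:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Specificity would be pooled in the same way from (true negatives, all episodes without the outcome). Pooling the two measures separately ignores the correlation between them, which is the trade-off accepted when a bivariate model cannot be fitted.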
For tests where three-level (low, medium and high risk) results were produced, an approach based on a previous meta-analysis of three-level CDR results was used.16 This random-effects meta-analysis, undertaken using WinBUGS 1.4.3,17 estimated the proportions of individuals classified as low, medium or high risk in the bacteraemic and non-bacteraemic groups. As an extension to this method, bivariate random effects were applied to the calculation of each proportion. Data from studies which used a similar rule but provided only two of the risk categories (i.e. low versus medium–high) were also included in this analysis.18 These proportions were used to calculate likelihood ratios (LR) for each risk category and the corresponding 95% credible intervals (CrI).
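As a simple illustration of the final step, the likelihood ratio for a risk category is the proportion of episodes with the outcome that fall in that category divided by the proportion of episodes without the outcome that fall in it. The Python sketch below computes this from raw counts in a single 2 × k table. It is not the Bayesian random-effects model described above and gives no interval estimates; the function name and the counts are hypothetical.

```python
from typing import Dict, Tuple


def stratum_likelihood_ratios(table: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Compute a likelihood ratio for each stratum of a 2 x k table.

    `table` maps a stratum name to a pair of counts:
    (episodes with the outcome, episodes without the outcome).
    Assumes every stratum has at least one episode without the outcome,
    so zero cells are not handled here.
    """
    total_with = sum(with_outcome for with_outcome, _ in table.values())
    total_without = sum(without_outcome for _, without_outcome in table.values())

    ratios = {}
    for stratum, (with_outcome, without_outcome) in table.items():
        # Proportion of outcome-positive episodes classified into this stratum.
        p_given_outcome = with_outcome / total_with
        # Proportion of outcome-negative episodes classified into this stratum.
        p_given_no_outcome = without_outcome / total_without
        ratios[stratum] = p_given_outcome / p_given_no_outcome
    return ratios


# Hypothetical three-level table: (bacteraemic, non-bacteraemic) episodes.
example_table = {"low": (5, 120), "medium": (15, 60), "high": (30, 20)}
for stratum, lr in stratum_likelihood_ratios(example_table).items():
    print(f"LR [{stratum}] = {lr:.2f}")
```

A ratio below 1 means the category makes the outcome less likely than the baseline, and a ratio above 1 means it makes the outcome more likely.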
Heterogeneity between study results was explored through consideration of study populations, study design, predictor variables assessed and outcomes chosen, although the small number of studies in each category limited this approach. Sensitivity analysis was undertaken by comparing results when the original (derivation) dataset was included and excluded.
For those areas where a quantitative synthesis was not possible, a narrative approach was used.
Twenty-one articles reporting on 20 studies19–38 were eligible for inclusion in the review (see Fig. 1). The studies included patients from 1 month to 23 years old, with a wide range of malignancies, and a total of 7840 episodes of FNP describing 11 outcomes, summarised in 5 clusters: death, critical care requirement, serious medical complication, significant bacterial infection and bacteraemia (see Table 1). Eight of these studies were prospective,25,26,29,30,32,33,36,37 11 were retrospective19–23,27,28,31,34,35,38 and 1 was a retrospective analysis of prospectively collected data.24
The studies varied in quality. Potential biases due to threats to independent outcome assessment were present in some studies (see Web Appendix 3). The applicability of the studies to specific populations also varied (see Table 1). Thirteen definitions of febrile neutropaenia were used, with 12 definitions of fever and 4 of neutropaenia. However, all definitions are clinically similar, with any variation at the ‘lowest risk’ part of the spectrum of classification.
The 16 reports of attempts to derive a CDR varied in the populations included, the predictor variables and the adverse outcomes they reported. The model-building technique, the reporting and handling of missing data and multiple-episode data and the use and categorisation of continuous and categorical variables were also assessed. Details are available in Web Appendices 4 and 5.
The CDR had diverse test performance (see Table 2 for detail). This heterogeneity has largely been explored using a narrative structure, as pooling across all the studies was not possible due to the varied rules, outcomes and populations studied. It was examined by analysis of the tabulated CDR performance data and graphically with plots of sensitivity and specificity (Web Fig. 1 for unpooled studies and Figs. 2 and 3 for pooled studies).
Meta-analysis of studies which used identical CDR was undertaken in two cases: the ‘Rackoff rule’36 to examine bacteraemia,22,26,28,34,36 and the ‘Santolaya rule’ for serious infectious complications.32,33
The ‘Rackoff rule’ discriminates between three groups of individuals at low, moderate and high risk of bacteraemia. A study which reported ‘microbiologically documented infection’ rather than the narrower ‘bacteraemia’ appeared as a significant outlier (see Fig. 2a).34 Exclusion of this study led to a more Normal distribution of the posterior probability plots. A sensitivity analysis excluding the initial rule derivation study demonstrated poorer discriminatory ability (see Fig. 2b for a ‘best estimate’ summary): LR [low] = 0.22 (95% CrI 0.03–1.85), LR [medium] = 0.79 (95% CrI 0.12–2.06) and LR [high] = 3.41 (95% CrI 0.24–18.7). The probability of bacteraemia in each of these groups will vary with the baseline chance of bacteraemia. Using a 22% overall prevalence of bacteraemia (the average proportion over the included studies which report these data), the predictive values are: low risk = 6% (95% CrI 1–34%), medium risk = 18% (95% CrI 3–37%) and high risk = 49% (95% CrI 6–84%).
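The conversion from a likelihood ratio and a baseline prevalence to a post-test probability follows Bayes' theorem in odds form: post-test odds = pre-test odds × LR. The Python sketch below applies this to the point estimates reported for the ‘Rackoff rule’ at a prevalence of 22%. The function name is hypothetical, and the sketch handles point estimates only, not the credible intervals.

```python
def post_test_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability and a likelihood ratio to a post-test probability."""
    pre_test_odds = prevalence / (1.0 - prevalence)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Point estimates of the likelihood ratios reported in the text.
rackoff_lrs = {"low": 0.22, "medium": 0.79, "high": 3.41}
prevalence = 0.22  # average prevalence of bacteraemia reported in the text

for group, lr in rackoff_lrs.items():
    probability = post_test_probability(prevalence, lr)
    print(f"{group} risk: {100 * probability:.0f}%")
```

Re-running the same function with a different prevalence shows how the predictive values shift with the baseline chance of bacteraemia in a given centre.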
The ‘Santolaya rule’ showed a moderate ability to differentiate between low- and high-risk groups considering the outcome of ‘invasive bacterial infection’. The derivation sample performed marginally less effectively than the validation set. The pooled estimate of test accuracy is LR [low] = 0.17 (95% CI 0.12–0.23) and LR [high] = 2.87 (95% CI 2.43–3.38), see Fig. 3. Using the average ‘invasive bacterial infection’ rate of 47%, this gives a probability of ‘invasive bacterial infection’ of 13% (95% CI 9–13%) in the low group and 72% (95% CI 68–75%) in the high group. The two studies examining this rule are from the same research group (although in a multi-centre study environment) and the rule has not been subject to further validation.
Assessments of potential sources of heterogeneity showed that derivation studies generally had better accuracy than validation studies. The outcome studied also appeared to alter rule performance, but the heterogeneity of rules and populations makes this difficult to examine clearly. Those CDRs developed in a population where the highest risk patients are excluded (e.g. bone marrow transplant recipients) did not seem to differ from rules developed without these exclusions. All these analyses are confounded by the correlation of location, population, outcome and rule. For example, the Santolaya studies took place in Chile, excluded bone marrow transplant (BMT) recipients, looked at a broad definition of infectious complications and developed a 5-item rule, whereas the Rackoff model was developed in the United States, did not clearly exclude any patient group, primarily examined bacteraemia and produced a rule based on a single haematological parameter and temperature.
Examination of the detailed content of all the proposed rules shows they address four major domains (Web Appendix 6). The first can be considered stable patient-related factors, including age and the underlying disease. The second group reflects treatment: the presence of a central venous catheter and the type of, or duration since, last chemotherapy. The third group reflects episode-specific clinical features, such as maximum temperature, the patient’s blood pressure or clinical features of infection. The final group contains episode-specific laboratory test values. These are various markers of bone marrow function where, with one exception,23 each rule uses a single item which reflects one of the three major cellular components (haemoglobin, platelets, or leucocytes or a subset of them), and serum inflammatory markers (C-reactive protein). An exploratory analysis of the individual features common across predictive studies shows that age, malignant disease state, clinical assessments of circulatory and respiratory compromise, higher temperatures and bone marrow suppression all have some explanatory power.
This is to our knowledge the first systematic review and meta-analysis of risk prediction rules in paediatric febrile neutropaenia. It describes 20 studies producing 16 separate models, assessing a variety of outcomes, with individual differences in definitions, covering five main categories: death, critical care requirement, serious medical complication, significant bacterial infection and bacteraemia. Despite the inclusion of nearly 8000 episodes of FNP, this review cannot conclude that any system is more effective or reliable than any other.
A clinical decision rule for febrile neutropaenic episodes can be broadly considered to have two uses. Primarily it is to decide if the risk of an episode is ‘low enough’ to allow reduced intensity therapy (e.g. outpatient management), but at the opposite end of the risk scale, a CDR may be helpful to direct increasingly close observation and more aggressive management. The patients at ‘high risk’ do not have such clear management options: there are no effective truly prophylactic measures to prevent sepsis syndrome but early recognition may prevent progression to septic shock.39
The majority of CDR in this review focus upon defining a group at ‘low risk’ of complications. Two rules in particular have been subject to greater verification; other rules show promise and have clinical/physiological similarities, but have had less validation.
The performance of only one rule could be reasonably assessed across multiple datasets: that of absolute monocyte count and temperature criteria proposed by Rackoff36 to exclude bacteraemia. This CDR, tested in 1171 episodes over five datasets, in three different groups across time and in different centres, has the greatest strength of evidence. The most appropriate pooled estimate of the rule’s effectiveness shows limited discriminatory ability: LR [low] = 0.22 (95% CrI 0.03–1.85), LR [medium] = 0.79 (95% CrI 0.12–2.06) and LR [high] = 3.41 (95% CrI 0.24–18.7). The marked uncertainty in these estimates is best demonstrated by the post-test probabilities of bacteraemia: low risk = 6% (95% CrI 1–34%), medium risk = 18% (95% CrI 3–37%) and high risk = 49% (95% CrI 6–84%).
Of the other rules, the Santolaya model33 shows a moderate ability to differentiate between groups at low and high risk of ‘serious infection’, but again with marked uncertainty (LR [low] = 0.17 (95% CI 0.12–0.23) and LR [high] = 2.87 (95% CI 2.43–3.38); post-test probabilities of 13% (95% CI 9–13%) in the low risk group and 72% (95% CI 68–75%) in the high risk group). The rule has been developed and tested in Chile, which may limit its applicability in Western Europe and North America. The proportion of patients with bacteraemia (~25%) is similar to the other studies in this review, but the broad definition of adverse medical outcomes, found in ~50% of cases, does not have a direct comparator among the Western European/North American studies reviewed, so no accurate conclusion can be reached. This reflects an uncertainty in the selection of the desired and measurable outcome. An ideal study would consider not just death, critical care requirement, serious medical complication, significant bacterial infection or bacteraemia but ‘an absence of adverse consequences’.
Any adaptation or development of a new rule should primarily look to assess the variables shown in the many CDR reviewed to be of predictive value, over those found purely by ‘p-value’ sampling of bivariable testing. In addition to reducing random error, building upwards from the simple clinical variables of age, disease and basic clinical examination will ensure any complex tests add significant value to the fundamentals of patient assessment.
An analysis of the techniques used to build the CDR was incorporated into this review. The studies are spread across a number of years, and during that time there have been significant methodological developments and technological improvements which have brought previously complex computation within the reach of many health researchers. However, a series of previously described methodological problems with diagnostic/prognostic model papers were present in this review. These included: small event-per-variable ratios, leading to models more likely to be overfitted to their original dataset and disappointing in clinical practice40; overestimation of accuracy from derivation studies; failure to examine for non-linear relationships, which may misjudge a predictor as unimportant41 (for example, there are plausible reasons to assume that patient age may have a non-linear ‘U’-shaped relationship with infection and outcome,42 as may time from chemotherapy); use of data-driven stepwise variable selection and cutpoint determination techniques, which may give spurious results43,44; premature categorisation of continuous data; lack of examination of missing data; and suboptimal examination of clustered data.
This review has demonstrated a wide range of rules for the prediction of adverse outcomes during episodes of febrile neutropaenia in children. None of the rules identified has been subject to the extensive geographical and temporal discriminatory validity assessments that mark the highest quality CDRs, and many potential difficulties with model building have been identified. Practical application of many of these CDR within an in-patient environment is likely to be safe but without further research uncertainty will remain as to the efficiency of the CDR in use. To provide this information and maximise the value of the information already collected by these and other cohorts of children with febrile neutropaenia, an individual-patient-data (IPD) meta-analysis is being undertaken to develop and test new or existing prediction models and provide a firmer basis for stratified treatment trials in this common and occasionally fatal complication of therapy.45
R.S.P. conceived the idea and co-ordinated and led the review, developed the protocol, undertook screening and data extraction, synthesis and drafted the manuscript. He has had full access to the data and has final responsibility for the paper and the decision to submit. R.W. developed the protocol, undertook screening and data extraction and read and modified the manuscript. L.A.S. developed the protocol, reviewed the results and read and modified the manuscript. A.J.S. developed and checked the synthesis and read and modified the manuscript.
R.S.P. is supported by an MRC Research Training Fellowship G0800472, which also supported R.W. for this review. A.J.S. and L.A.S. received no external funding for their work in this study. The funder had no role in the design or conduct of the study nor the production of, or decision to submit, this manuscript.
The authors wish to acknowledge Lindsey Myers and Melissa Hardman (CRD) for their search support.
Appendix A. Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ejca.2010.05.024.