Study identification and characteristics
Ninety seven potentially relevant studies were identified and retrieved. Two of these studies were in press at the time of data extraction and have since been published.w11 w13 Forty four studies were excluded because they were reviews or commentaries, 12 did not study croup, nine had inadequate randomisation strategies, four were retrospective studies, two had no control group, one had no outcome of interest, and one was a duplicate publication. Therefore, 24 studies were included (references and full details of these studies can be found in table A on the BMJ’s website). The weighted κ score between two reviewers was 0.89, indicating substantial agreement.
Twenty two of the included studies had been published in English, one in French, and one in Spanish. Dexamethasone was evaluated in 17 trials, budesonide in nine, and methylprednisolone in three; some studies examined more than one drug. Five of the trials compared active treatments; 19 were placebo controlled. The mean age of the children in the different studies ranged from 13 months to 45 months; the minimum age was 4 months and the maximum was 12 years. Fourteen trials were conducted on inpatients, and 10 were conducted on outpatients. Studies tended to be small, with a median of 40 (interquartile range 36 to 60) participants. Pooled baseline rates are reported using fixed effect models.
Quality assessment of trials
The intraclass correlation between two reviewers was 0.63 for the Jadad scale, 0.98 for allocation concealment, and 1.0 for sponsorship, indicating at least substantial agreement in all cases. The median Jadad score was 3 (interquartile range 2.75 to 4), that is, 60% (55% to 80%) of the maximum score for quality of reporting. Allocation concealment was adequate in 11 (46%) of the studies, inadequate in one (4%), and unclear in 12 (50%). Pharmaceutical sponsorship was identified in three (13%) studies, support from other sources was reported in three (13%), and the source of support was not mentioned in 18 (75%). Overall, the quality of studies was better than has been observed for other diseases.9,20,21
Croup score
The most frequently used outcome, reported in 13 studies, was the clinical croup score based on the 17 point ordinal scale developed by Westley.19 Other scoring systems, none of which has been validated, were used in five studies; in six studies no clinical score was reported.
The improvement in the Westley croup score at 6 hours was 2.8 (95% confidence interval 2.2 to 3.5) for dexamethasone or budesonide versus 1.0 (0.3 to 1.7) for placebo. The difference in improvement in the Westley score between treatment arms at 6 hours was 1.6 (1.1 to 2.2). The pooled standardised effect size was 1 (0.6 to 1.5) at 6 hours and 1 (0.4 to 1.6) at 12 hours. From our data, a standardised effect size of 1.2 (0.7 to 1.7) corresponded with an improvement of 1.6 (1.1 to 2.2) in the Westley score (fig ) (see appendix 2 on the BMJ’s website for a list of included trials). The change was not significant at 24 hours; however, fewer patients were evaluated at 24 hours, so the lack of significance may reflect a lack of statistical power. The magnitude of change of −1 is similar to that seen at earlier evaluation points, but the 95% confidence interval crosses 0. A decrease in effect size of 1 from baseline is thought to be a clinically important change.
At 6 hours, the difference in risk was 15% (95% confidence interval 2% to 28%) with a number needed to treat of 7 (4 to 50). The baseline rate of clinical improvement was 41% (32% to 50%). At 12 hours the risk difference was 21% (9% to 33%) with a number needed to treat of 5 (1 to 11). The baseline rate of clinical improvement was 68% (58% to 77%). At 24 hours, the risk difference was 12% (3% to 22%) and the number needed to treat was 8 (5 to 33). The baseline rate of clinical improvement was 83% (75% to 91%). Although not all studies contributing to the effect size expressed their results as improved versus not improved, the degree of benefit of a number needed to treat of 5 to 7 patients (at different assessment times) would be sufficient to support the use of glucocorticoids over placebo.
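The numbers needed to treat quoted above are the reciprocals of the corresponding risk differences. A minimal sketch in Python, assuming rounding to the nearest whole patient (the rounding rule and the function name are illustrative assumptions, not taken from the study protocol), reproduces the 6 hour and 24 hour figures:

```python
def nnt_from_risk_difference(rd, lower, upper):
    """Convert a risk difference and its 95% CI (as proportions) to an NNT.

    Returns (nnt, nnt_lower, nnt_upper). Assumes the interval excludes 0;
    otherwise the reciprocal is undefined at 0 and the NNT interval is
    not a single continuous range.
    """
    if lower <= 0 <= upper:
        raise ValueError("confidence interval includes 0; NNT interval is unbounded")
    point = round(1 / abs(rd))
    # The larger risk difference gives the smaller NNT, so sort the bounds.
    bounds = sorted((1 / abs(lower), 1 / abs(upper)))
    return point, round(bounds[0]), round(bounds[1])


# Risk differences reported in the text, expressed as proportions.
six_hours = nnt_from_risk_difference(0.15, 0.02, 0.28)
twenty_four_hours = nnt_from_risk_difference(0.12, 0.03, 0.22)
print(six_hours)          # (7, 4, 50)
print(twenty_four_hours)  # (8, 5, 33)
```

The wide upper bound at 6 hours (50) reflects how close the lower limit of the risk difference (2%) lies to 0.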
Additional interventions
There was no significant increase in the use of antibiotics among those treated with glucocorticoids as compared with those treated with placebo when expressed as the difference in risk. This was consistent for the dexamethasone group (4%, −20% to 27%) and the budesonide group (−2%, −17% to 13%). There was a significant decrease noted in the use of adrenaline in the glucocorticoid groups with a difference in risk of −9% (−16% to −2%) in the budesonide group (number needed to treat 10; baseline rate 16%) and −12% (−20% to −4%) in the dexamethasone group (number needed to treat 8; baseline rate 23%). There was no significant impact on the use of supplemental glucocorticoids among either those treated with dexamethasone (4%, −4% to 13%) or those treated with budesonide (−15%, −32% to 2%).
When any glucocorticoid was compared with placebo (11 studies, 1150 patients) there was no significant difference in the rate of intubation or tracheotomy (−2%, −14% to 10%; baseline rate 3.2%, 2.9% to 3.5%).
Hospitalisation
Overall, a significantly shorter time was spent in accident and emergency when children were treated with a glucocorticoid as compared with placebo (5 studies, 596 patients); the weighted mean difference was −11 (−18 to −4) hours. For inpatients, the difference was −16 (−31 to −1) hours.
There was a non-significant decrease of −16% (−39% to 6%) in the rate of hospitalisation for patients treated with budesonide versus patients treated with placebo (baseline rate 32%, 24% to 39%). This was also true for patients treated with dexamethasone as compared with patients treated with placebo (−2%, −31% to 5%) or if any glucocorticoid was compared with placebo (−14%, −12% to 5%). The more conservative random effects model was used to derive the overall estimate of the difference in hospitalisation rates because there was significant heterogeneity between studies. If the fixed effects estimate was used instead, there were significantly fewer hospital admissions among patients treated with budesonide than among those treated with placebo (−15%, −20% to −10%).
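The contrast between the fixed effects and random effects estimates can be illustrated with a short sketch. The text does not state which random effects estimator was used; the DerSimonian-Laird method below is an assumption chosen because it is widely used, and the per-study risk differences and variances are hypothetical values, not data from the included trials.

```python
import math


def pool(effects, variances):
    """Inverse variance pooling under fixed and random effects models.

    Uses the DerSimonian-Laird estimate of between-study variance (tau2).
    Returns a dict with each pooled estimate and its 95% confidence interval.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed effects estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random effects weights add tau2 to every within-study variance.
    w_re = [1 / (v + tau2) for v in variances]
    random = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_fixed = math.sqrt(1 / sum(w))
    se_random = math.sqrt(1 / sum(w_re))
    return {
        "tau2": tau2,
        "fixed": (fixed, fixed - 1.96 * se_fixed, fixed + 1.96 * se_fixed),
        "random": (random, random - 1.96 * se_random, random + 1.96 * se_random),
    }


# Hypothetical, heterogeneous risk differences (proportions) and variances.
result = pool([-0.30, -0.05, -0.20, 0.02], [0.002, 0.003, 0.004, 0.0025])
print(result["fixed"])
print(result["random"])
```

When studies disagree more than chance alone would predict, the between-study variance is positive and the random effects interval widens, which is how an estimate can be significant under one model and not under the other.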
Sensitivity and subgroup analyses
The sensitivity analysis showed that the method of scoring the severity of croup was important (fig ). An effect size of −1.2 (−1.7 to −0.7) was identified when the Westley croup score was used (9 studies, 569 patients) as compared with an effect size that was 50% smaller, and no longer significant, when other croup scores were used (4 studies, 497 patients; −0.6, −1.5 to 0.3). The Westley score is the only method that has undergone validation and reliability testing and been shown to be sensitive to important changes in a patient’s clinical status. The smaller treatment effect noted with non-Westley scores could be the result of lower sensitivity to change or perhaps of a greater degree of variability caused by low reliability.
We were unable to compare the route of administration of glucocorticoids in a meaningful way because of the lack of standardisation of scores between studies. The quality weighting of the effect size did not change the estimate or the width of the 95% confidence interval; this is in part explained by the high methodological quality of the studies. The estimate derived from studies in which allocation was adequately concealed was −1.2 (−1.9 to −0.5) and for the studies in which it was inadequately concealed or in which it was unclear was −0.9 (−1.4 to −0.3). These differences are probably not clinically or statistically significant.
Publication bias
We identified a marked publication bias, and there is also the possibility that small studies that showed that glucocorticoids had no effect were suppressed from publication. There was a significant correlation between treatment effect and sample size (for example, rank correlation test P=0.013; graphical method P=0.004). The Dear-Begg estimate of this correlation was 0.29. Pooled effect size at 6 hours calculated using the simple graphical method was −1.1 (−1.5 to −0.8); with the selection model it was −1.2 (−2.4 to −0.01); and with the trim and fill method it was −0.2 (−0.8 to 0.4). The trim and fill method suggested that seven small trials were suppressed because their results were not significant.
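The idea behind a rank correlation between treatment effect and sample size can be sketched as follows. This is a simplified illustration using Kendall's tau on hypothetical data; the formal tests named above additionally standardise the effects and derive a P value, which the sketch omits. The function name and the sample values are assumptions.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / all pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count towards neither total.
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical trials: the smallest studies show the largest (most negative) effects.
sample_sizes = [20, 30, 40, 60, 120]
effect_sizes = [-2.0, -1.6, -1.3, -0.9, -0.5]
print(kendall_tau(sample_sizes, effect_sizes))  # 1.0
```

A correlation far from 0 means that effect size tracks study size, the pattern expected if small trials without a significant result are missing from the published literature.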