We report the first attempt to evaluate the reproducibility of the decision rule based on PCT and ureteral dilation proposed by our group [14]. In the derivation study, a significant relationship was found between VUR ≥3 and the rule (P <0.0001), including for the rounded rule and the rule based on PCT only. These significant relationships were not reproduced in the validation set (P >0.1). The 47% specificity (95% CI, 42–51) of the rule for the prediction of VUR ≥3 was confirmed (46%; 95% CI, 41–52), but the sensitivity was not: 60% (95% CI, 50–76) in the validation set vs. 86% (95% CI, 74–93) in the derivation population. The comparison between validation and derivation populations gave similar results for the rounded rule and the rule based on PCT only. Applying the rule to the validation set, we would not have prescribed cystography for 16 patients with VUR ≥3 (15 without ureteral dilation), and would thus have missed the diagnosis in 34% of the patients with VUR ≥3. In the derivation study, the rule had missed only 9 (16%) of the children with VUR ≥3 [14].
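For readers who wish to recompute this kind of comparison from a 2×2 table, the following is a minimal sketch of how sensitivity, specificity and an approximate 95% confidence interval can be obtained. The counts are hypothetical and are not the study data, the function names are illustrative, and the Wilson score interval is an assumption: the method used to compute the intervals reported above is not stated here.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts, chosen only for illustration (NOT the study data):
# "positive" means the rule recommends cystography, "disease" means VUR >= 3.
tp, fn, tn, fp = 30, 20, 200, 250

sens, spec = sensitivity_specificity(tp, fn, tn, fp)
sens_lo, sens_hi = wilson_interval(tp, tp + fn)
spec_lo, spec_hi = wilson_interval(tn, tn + fp)

print(f"sensitivity {sens:.0%} (95% CI, {sens_lo:.0%} to {sens_hi:.0%})")
print(f"specificity {spec:.0%} (95% CI, {spec_lo:.0%} to {spec_hi:.0%})")
```

With only a few dozen children with high-grade VUR, the interval around the sensitivity is much wider than the interval around the specificity, which is worth keeping in mind when comparing derivation and validation estimates.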
The first issue to address when investigating a decrease in the predictive ability of a rule is the difference between the derivation and validation populations, which could explain why a decision rule cannot be transferred across those sets of patients [21]. In the present case, the populations were not significantly different (P >0.05) for the classic parameters: gender and prevalence of all-grade and high-grade VUR. Nevertheless, the validation population had significantly lower levels of inflammatory biomarkers: CRP was lower in all children, and PCT was also lower in children with VUR ≥3. This result means that the entire distribution of PCT values was shifted towards lower values for children with high-grade VUR in the validation population; it could explain why 15 (94%) of the 16 patients with VUR ≥3 who were missed by the rule belonged to the rule branch of “patients without ureteral dilation and PCT <0.63 ng/mL”. As the weight of the rule is mainly carried by PCT, a lower distribution of PCT values might have a major influence on the results of the rule in the validation set; this is illustrated by the fact that 15 (94%) of the 16 patients missed by the rule were not rescued by the ureteral dilation criterion. This hypothesis may also explain why even the rule based on PCT alone failed to reproduce the 85% sensitivity, even though this result had previously been validated in two multicentre cohort studies [16],
[17]. This significant difference in the distributions of inflammatory biomarkers between the validation and derivation sets could be due to samples being collected at different time points during the course of the UTI. We were not able to verify this hypothesis, because the exact time of inflammatory marker measurement relative to the onset of fever was determined by each centre, and these data were not collected. Further study of this time interval would be necessary to improve this rule and to implement it safely. These differences in the distributions of inflammatory parameters could not have been predicted before the validation study, because all centres were European and applied the same standard procedures to diagnose and treat children with UTI. Moreover, there was also a trend (P = 0.08) towards a difference between the derivation and validation sets in the number of children with VUR <3 who had ureteral dilation on renal US. This finding may also have added to the differences between the two populations concerning the key variables of the rule. Furthermore, we acknowledge that the rule included ureteral dilation, a renal US criterion whose inter-operator variability has not been measured in a multi-case multi-reader study, even though it was found to be the best renal US criterion to predict high-grade VUR [18]. This weakness of the rule needs to be evaluated and corrected.
The second issue concerning the failure of the validation to reproduce the derivation results is the limitations of the external validation study. The validation was a secondary analysis of previously published prospective cohort studies, as was the derivation study. Because our group performed a systematic review and meta-analysis on PCT in UTI in children, we gathered the published data worldwide on children with PCT and VUR data [20]. The derivation study was based on the initially published cohort studies, while the validation study was based on the later ones. We did not, however, believe that the structure of these studies would affect the quality of the data or introduce a bias.
The use of sterile bags for urine collection in non-toilet-trained children in half of the centres might have introduced a selection bias, because this technique is less specific than suprapubic aspiration or urethral catheterization [1]. However, the number of children whose urine was collected by sterile bags did not differ significantly from that in the derivation population: 238 (48%) children vs. 214 (52%), P = 0.3. Interestingly, the specificities of the rules were significantly higher in the subgroup of children whose urine specimens were collected properly than in the whole population. This result can be explained by the lower specificity of sterile bags for UTI diagnosis compared with the recommended techniques. Indeed, because VUR is known to be a risk factor for UTI, the use of sterile bags might have increased the number of children with VUR <3 more than the number with VUR ≥3. In the same manner, because PCT is positively correlated with the presence and severity of UTI [20], and because the weight of the rule was carried more by PCT than by ureteral dilation [14], the use of sterile bags may have increased the number of children with a negative result more than the number with a positive one. Nevertheless, some centres (e.g. Padova, Italy) included only children with two consecutive positive urine cultures, in order to decrease the likelihood of false-positive results due to bag urine collection. The combined increases (in the number of children with VUR <3 and in the number of children with PCT <0.5 ng/mL when sterile bags are used to collect urine) resulted in an underestimate of the specificity in children whose urine was collected by sterile bags, and thus account for the significant difference in specificities. They did not, however, explain the failure to reproduce the sensitivity of the rule, nor the loss of a significant relationship between VUR ≥3 and the rules. To summarize, the validation study may have had some limitations, but none of them appeared to explain fully the failure to validate the decision rule externally.
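The comparison of sterile-bag use between the two populations reported above (238 vs. 214 children) can be sketched as a two-sided test of two proportions. The sketch below uses a pooled z-test, which is an assumption, as the test actually used is not stated here. The denominators (496 and 412) are also assumptions, back-calculated from the rounded percentages (48% and 52%), and the function name is illustrative.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided pooled z-test for the difference between two proportions.

    Returns the z statistic and the two-sided P value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Numerators are taken from the text; denominators are ASSUMED (see above).
bags_validation, n_validation = 238, 496
bags_derivation, n_derivation = 214, 412

z_stat, p_value = two_proportion_z_test(
    bags_validation, n_validation, bags_derivation, n_derivation
)
print(f"z = {z_stat:.2f}, two-sided P = {p_value:.2f}")
```

Under these assumed denominators the test gives a non-significant difference, in line with the reported result; a chi-square test with continuity correction would give a slightly larger P value.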
The validation of the decision rule based on PCT and ureteral dilation on renal US confirmed the specificity of the rule, but showed a loss of sensitivity, which led to a missed diagnosis in 34% of children with VUR ≥3. The fact that the rule performed better in the derivation population than in the validation population was predictable, according to the Evidence Based Working Group [21]. However, the decrease in sensitivity might also be primarily due to differences in PCT distributions between the derivation and validation sets, and was thus unpredictable prior to the study. The rule may therefore need further refinement, particularly regarding the timing of PCT measurement and the inter-observer variability of ureteral dilation. Furthermore, the outcome should also be reconsidered and changed to a composite outcome including high-grade VUR and renal scars, which are precisely the cause of the kidney injuries leading to future complications, and whose prevention is the real goal of any nephroprotection strategy, including VUR treatment.