In this large, prospective, multi-center study of children with suspected appendicitis, our previously derived low risk prediction rule maintained high sensitivity and modest specificity in a validation cohort. Furthermore, we refined our low risk rule in order to improve test sensitivity. These low risk rules identify children with suspected appendicitis at low but not zero risk for appendicitis.
Our study adds to a growing literature on the use of clinical prediction rules for managing patients in the emergency department.17, 21
As in prior studies, our goal was to identify patients at low risk for illness in order to reduce reliance on diagnostic imaging and inefficient care delivery. As our study confirms, clinicians rely heavily on CT to diagnose and manage children with acute abdominal pain.10
The potential benefit of our clinical prediction rule lies in its ability to stratify patients, identifying those at low risk for appendicitis.
Several previous investigators have developed clinical prediction rules or scores for the diagnosis of appendicitis.24–27
The Alvarado and Samuel scores are the most commonly cited; although the original studies reported excellent test performance, external validation by independent investigators yielded conflicting results.28–30
Notably, both scoring systems were intended to identify patients with appendicitis rather than a low risk group.24
Compared with these prior scores, the advantages of our prediction rule include its simplicity, its external validation in a large sample across multiple PEDs, and its ability to identify a low risk cohort more accurately. Lastly, a decision tree format may be easier for clinicians to remember and use than a numerical score.
Although the sensitivity of our validated low risk prediction rule was high, the NPV was lower than in the derivation study (98% derivation vs. 92.7% validation). As a result, 42 children (4.5% of patients with appendicitis) were misclassified as not having appendicitis. This rate of misclassification may concern clinicians given the potential medical and legal consequences of missed appendicitis. We anticipated this issue and therefore refined our rule with the goal of improving sensitivity and NPV. The refined prediction rule has somewhat higher sensitivity and NPV (98.1% and 95.3%, respectively), but lower specificity and PPV. Furthermore, the refined rule would still miss some cases of appendicitis (19 patients). Consequently, either rule may be appropriate for identifying a low risk population (risk of appendicitis: 7.3% validated rule, 4.8% refined rule) whom clinicians may choose to observe for progression of abdominal symptoms. Ultrasound and/or surgical consultation may also be viable alternatives. Given the high rate of negative appendectomies in the low risk cohort (> 30%) compared with the overall study cohort (8.5%), surgeons should be cautious about operating on low risk patients. Ultimately, our prediction rules may be best suited for integration into an appendicitis care algorithm to help stratify risk and guide clinical management (e.g., observation with serial examination for low risk patients).
It is important to consider the potential use of our low risk prediction rules in relation to the performance of CT. Although CT has demonstrated a sensitivity of 94% [95% CI 92–97%] and a specificity of 95% [95% CI 94–97%] for appendicitis, the PPV of CT will be lower when it is used in populations with a low prevalence of appendicitis.9
In addition, the NPV of CT is not 100%.19
In our present study, if clinicians had acted upon CT results in isolation, 20 patients would have had missed appendicitis (inappropriately discharged home) and 27 patients would have had negative appendectomies (data available upon request). These results support concerns raised by several investigators that the excessive use of CT may lead to unnecessary operations, delays in care and increased costs.31–33
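The dependence of predictive values on prevalence noted above can be illustrated with a short calculation. The following is a minimal sketch, not part of the study analysis: it applies Bayes' theorem to the CT sensitivity (94%) and specificity (95%) cited above. The function name and the three prevalence values are assumptions chosen for illustration; 4.8% and 7.3% correspond to the appendicitis risks in the low risk groups, and 30% approximates the lower bound of the appendicitis rate in the study cohorts.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# CT point estimates quoted in the text (confidence intervals ignored here).
CT_SENSITIVITY = 0.94
CT_SPECIFICITY = 0.95

# Illustrative prevalences (assumed for this sketch, see lead-in).
for prevalence in (0.048, 0.073, 0.30):
    ppv, npv = predictive_values(CT_SENSITIVITY, CT_SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions, the PPV of CT falls as prevalence falls while the NPV rises, which is the pattern described above for low-prevalence populations.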
Physicians may have concerns regarding the reliability of the clinical variables included in our prediction rules. Through the course of our study, we collected data on the inter-rater reliability of clinical history and physical examination findings, the results of which have been presented previously.20
The presence of nausea had a kappa of 0.44 [95% CI 0.37–0.52], maximal tenderness in the RLQ 0.45 [95% CI 0.36–0.54], and pain with walking 0.54 [95% CI 0.45–0.63], indicating moderate reliability for all three parameters.
Ultimately, the clinical utility of our prediction rules is in their ability to provide a quantitative assessment of risk for appendicitis. In this study we elected to stratify patients as either “low risk” or “not low risk” for appendicitis. In this scheme, patients identified as “low risk” had a risk of appendicitis of 7.3% (validated rule) or 4.8% (refined rule). However, by observing how patients flow within the decision trees, specific risks for appendicitis can be determined depending on a patient’s particular signs and symptoms (range of 4–12% for the various terminal nodes). As electronic health record-based clinical decision support becomes more common within EDs, the ability to calculate an appendicitis risk may allow physicians to tailor management based on their own risk tolerance and availability of diagnostic imaging and surgical resources.
Our study had the following limitations. Enrollment of patients varied considerably by site. To assess for enrollment bias, we conducted random medical record audits, which revealed that missed patients were similar to those enrolled. Although we enrolled pediatric patients from numerous geographical regions, enrollment occurred exclusively in PEDs. Therefore, our results may not generalize to other settings. Our clinical prediction rule was developed and validated in cohorts where the rate of appendicitis was quite high (> 30%). Use of the rule in an urgent care or clinic setting, where the rate of appendicitis is lower, might result in a higher NPV but a lower PPV. We collected clinical parameters only at the time of enrollment; thus, a patient's examination findings may have changed prior to final disposition. Although we made every attempt to follow up patients discharged from the PED, we cannot exclude the possibility that some underwent appendectomies at alternative facilities. Lastly, we stress that our study was not an implementation study; clinicians should understand the potential risks and benefits of using the validated rule before formal implementation, and of using the refined rule before external validation.