To validate and refine a clinical prediction rule to identify which children with acute abdominal pain are at low risk for appendicitis (Low Risk Appendicitis Rule).
Prospective, multi-center cross-sectional study.
Ten pediatric hospital emergency departments.
Children 3–18 years old who presented with suspected appendicitis from March 2009 – April 2010.
The test performance of the Low Risk Appendicitis Rule.
Among 2625 patients enrolled, 1018 (38.8%; 95% confidence interval [CI] 36.9% – 40.7%) had appendicitis. Validation of the rule resulted in a sensitivity of 95.5% (95% CI 93.9 – 96.7%), specificity of 36.3% (33.9 – 38.9%) and negative predictive value (NPV) of 92.7% (90.1 – 94.6%). Theoretical application would have identified 573 (24%) as low risk, misclassifying 42 patients (4.5%; 95% CI 3.4% – 6.1%) with appendicitis. We refined the prediction rule, resulting in a model that identified patients at low risk if: a) absolute neutrophil count (ANC) ≤ 6.75 × 10³/µL and no maximal tenderness in right lower quadrant (RLQ) or b) ANC ≤ 6.75 × 10³/µL, maximal tenderness in the RLQ but no abdominal pain with walking/jumping or coughing. This refined rule had a sensitivity of 98.1% (97.0 – 98.9%), specificity of 23.7% (21.7 – 25.9%) and NPV of 95.3% (92.3 – 97.0%).
We have validated and refined a simple clinical prediction rule for pediatric appendicitis. For patients identified as low risk, clinicians should consider alternative strategies such as observation or ultrasound, rather than proceed to immediate imaging with CT.
Appendicitis is the most common surgical emergency in children and acute abdominal pain accounts for 5–10% of all pediatric emergency department (PED) visits.1–3 The diagnosis of appendicitis can be difficult, with many children misdiagnosed on initial presentation.4 Furthermore, negative appendectomy and perforation rates remain high, indicating a need to re-evaluate the diagnostic assessment for this disease.5–8
Computed tomography (CT) has high sensitivity and specificity for appendicitis and is heavily relied upon in the evaluation of patients with possible appendicitis.9 However, despite dramatic increases in CT utilization, substantial improvements in patient outcomes have not been realized.5, 10–13 This discrepancy is potentially the result of over-utilization of CT, which is problematic because it results in unnecessary exposure to ionizing radiation, prolonged ED visits, and increased costs.6, 13–14
Prior studies have described substantial variability in the evaluation and management of children with suspected appendicitis.10, 15 Standardizing the approach to patients with suspected appendicitis through clinical prediction rules could reduce variability and reliance on CT, thus promoting the delivery of efficient, safe and cost-effective health care.16 Clinical prediction rules can be utilized to risk stratify patients, allowing for tailored management based upon patients’ risks for disease.17
In 2005, our research team published a low risk clinical prediction rule for pediatric appendicitis.18 Single-center internal validation revealed a sensitivity and negative predictive value (NPV) of 98% [95% CI 89–100%] and 98% [95% CI 85–100%], respectively.18 Hypothetical application of the rule could have led to a 20% reduction in CT utilization. Prior to implementation, independent validation of this rule is important. The objective of the current study was to validate and potentially refine our clinical prediction rule in a multi-center cohort of children with suspected appendicitis.
We performed a prospective, cross-sectional study of children with suspected appendicitis at 10 PEDs that are members of the Pediatric Emergency Medicine Collaborative Research Committee (PEM-CRC) of the American Academy of Pediatrics. The PEM-CRC reviewed and approved the final study protocol. The study was approved by each participating site’s Institutional Review Board (IRB) and data user agreements were formalized between sites and the central data center. Seven IRBs granted a waiver of written informed consent/assent and instead allowed for verbal consent. At the three remaining sites, written consent from the guardians and assent from children seven years of age and older were obtained.
Children 3 to 18 years presenting to the PED with acute abdominal pain of less than 96 hours duration and who were being evaluated for suspected appendicitis were approached for enrollment. We defined “suspected appendicitis” patients as those for whom the treating physician obtained blood tests, radiological studies [CT and/or ultrasound (US)] or a surgical consultation for the purpose of diagnosing appendicitis. Radiological studies or surgical consultations were obtained at the discretion of the treating physician. We excluded patients with: pregnancy, prior abdominal surgery (e.g. gastrostomy tube, abdominal hernia repair), chronic abdominal illness or pain (e.g. inflammatory bowel disease, chronic pancreatitis, chronic/recurrent appendicitis), sickle cell anemia, cystic fibrosis, or a medical condition affecting the provider’s ability to obtain an accurate history. We also excluded patients who had radiologic studies (CT or US) of the abdomen performed prior to PED arrival or a history of abdominal trauma within 7 days of the PED evaluation.
Prior to initiation, site principal investigators (PIs) received standardized training that included a detailed manual of operations and instructions on the proper completion of case report forms (CRFs). Site PIs subsequently conducted group and one-on-one instructional sessions with clinicians who worked in their respective PEDs.
A PEM attending or fellow physician completed a standardized history and physical examination on a structured CRF. A resident physician, nurse practitioner or physician assistant was allowed to complete the CRF with attending oversight. A subset of participants had a separate, independent assessment performed by a second clinician within 30–60 minutes of the first evaluation. Clinicians completed CRFs prior to knowledge of CT or US results.
CRFs were completed on paper and subsequently entered into Adobe Pro (Adobe Systems, San Jose, CA) for electronic transfer to the central data management warehouse through an electronic CRF (Teleforms™). Quality-assurance practices at the data warehouse included surveillance for missing and duplicate data. We determined capture rate by reviewing the PED visit, admission, pathology and radiology logs for 2 random days of each study month. Two sites were able to perform active surveillance (daily data capture monitoring). We compared demographic, clinical and outcome data between enrolled and missed patients to detect possible enrollment bias.
The primary outcome was the performance of the clinical prediction rule to identify children at low risk for appendicitis. Patient disposition was based upon physician discretion. Among patients undergoing surgery, we determined the presence of appendicitis from the attending pathologist’s written report. Appendiceal perforation was determined from the attending surgeon’s written operative report. A priori, we standardized the terminology to code pathology and operative reports.
For patients discharged from the PED, we conducted telephone follow-up within 2 weeks to determine resolution of signs and symptoms, visits to other sites of care, and need for surgery. If we were unable to contact the guardian, we reviewed the medical record for 90 days after the index PED visit to determine if the patient underwent a CT, US, or operation at that facility.
The previously published low risk prediction rule consisted of the following variables: ANC ≤ 6.75 × 10³/µL, absence of nausea, and absence of maximal tenderness in the RLQ of the abdomen. On the CRFs, clinicians had the option of coding the presence of nausea as “yes”, “no” or “don’t know” and maximal tenderness in the RLQ as “yes”, “no” or “unsure”. Responses of “don’t know” or “unsure” were analyzed as if the patient had the finding. We excluded patients if any of the prediction rule components were missing. A sensitivity analysis was performed to determine the effect on test performance of recoding “don’t know/unsure” findings as present, absent or missing. We calculated performance of the rule as sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). We assessed the accuracy of the low risk rule based on whether patients were identified as “low risk” in either of the terminal decision tree nodes (as analyzed in the original study).18
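The response coding described above can be illustrated with a minimal Python sketch. It is not the study's analysis code; the function names (`code_finding`, `has_complete_data`) and the handling of unrecognized entries are assumptions made for illustration only.

```python
from typing import Optional

# Per the text, "don't know"/"unsure" were analyzed as if the finding
# were present in the primary validation analysis.
PRESENT = {"yes", "don't know", "unsure"}


def code_finding(response: Optional[str]) -> Optional[bool]:
    """Map a CRF response to present (True), absent (False) or missing (None)."""
    if response is None:
        return None
    # Normalize case, whitespace and typographic apostrophes.
    value = response.strip().lower().replace("\u2019", "'")
    if value in PRESENT:
        return True
    if value == "no":
        return False
    return None  # assumption: unrecognized entries are treated as missing


def has_complete_data(anc, nausea, max_tenderness_rlq) -> bool:
    """A patient is analyzable only if no rule component is missing."""
    coded = (code_finding(nausea), code_finding(max_tenderness_rlq))
    return anc is not None and None not in coded
```

The sensitivity analysis mentioned above would correspond to re-running the evaluation with "don't know"/"unsure" mapped to `False` or `None` instead of `True`.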
We anticipated that our validated prediction rule might have diminished performance; thus, a priori, we planned to refine the rule. We conducted binary recursive partitioning analyses (CART 6.0, Salford Systems, San Diego, CA) to refine our prediction rule, in order to create models that had higher sensitivity (> 95%) without affecting specificity (25–35%). We aimed to create rules for which the risk of appendicitis in the low risk group was less than or, at minimum, similar to the approximately 6–7.5% false negative rate of CT.9, 19 We entered variables into the model that were included in our original study as well as any patient history and physical examination variables that had at least moderate inter-rater reliability (kappa > 0.4).20 The following variables were entered: duration of abdominal pain, nausea, emesis, history of focal RLQ pain, presence of abdominal tenderness, maximal tenderness in the RLQ, abdominal pain with walking, abdominal pain on the right side with walking, the ANC and WBC (using both continuous and categorical cut-points). We identified the categorical cut-points through the use of univariate recursive partitioning. For this analysis, responses that were marked “unsure” or “don’t know” were coded as missing data. We used the Gini splitting method for classification trees and internally validated the results of our refined model using ten-fold cross validation. To create the models, we varied costs to always favor not missing a case of appendicitis rather than diagnosing appendicitis in a child that did not have the illness.
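To illustrate how a Gini-based split selects a cut-point for a continuous variable such as the ANC, the following Python sketch computes the weighted Gini impurity for each candidate threshold. It is a simplified illustration, not the CART 6.0 procedure used in the study: it ignores the misclassification costs described above, and the data are made up.

```python
def gini_impurity(labels):
    """Gini impurity of binary labels (1 = appendicitis, 0 = no appendicitis)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_threshold(values, labels):
    """Return the candidate cut-point with the lowest weighted impurity."""
    # Splitting at the maximum value would leave the right side empty.
    candidates = sorted(set(values))[:-1]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))


# Hypothetical ANC values (x10^3/uL) and outcomes; NOT study data.
anc = [3.1, 4.5, 5.9, 6.2, 8.4, 9.7, 12.3, 14.0]
appendicitis = [0, 0, 0, 0, 1, 1, 1, 1]
cut = best_threshold(anc, appendicitis)
```

In this toy sample the two groups separate perfectly, so the chosen cut-point yields an impurity of zero; real data would not separate this cleanly.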
Patients were enrolled in 10 PEDs with broad US geographic distribution from March 2009 through April 2010. We removed data from one site prior to analysis as its capture rate was below 40%. Therefore, the study cohort comprised 2625 patients across the remaining 9 sites, representing 71% of eligible patients. Enrollment by site ranged from 223 to 473 patients and the capture rate varied from 48% – 96%. A total of 1018 (38.8%; 95% confidence interval [CI] 36.9% – 40.7%) patients were diagnosed with appendicitis, of whom 275 (27%; 95% CI 24.4% – 30.0%) had a perforated appendix. Of those undergoing an operation, no evidence of appendicitis by pathology was found in 95 patients (negative appendectomy rate = 8.5%; 95% CI 7.0 – 10.3%). We completed telephone follow-up on 88% of patients discharged from the PED. None of the 186 patients lost to telephone follow-up had evidence of an appendectomy via review of the medical record (Figure 1).
The mean age of enrolled patients was 10.8 [SD ± 3.8] years and 51% were male. The most common diagnoses among patients who did not undergo an appendectomy included: non-specific abdominal pain (42%), gastroenteritis (14%), and constipation (12%). Clinicians obtained a CT in 55%, an ultrasound in 37% and both studies in 9% of patients. In total, 2116 (81%) patients underwent diagnostic imaging. Missed patients were similar to enrolled, with a mean age 11.6 years [SD ± 3.6], 53% of male sex, a 42% rate of appendicitis, and 29% having perforated appendicitis (Table 1). Among missed patients, clinicians used US more frequently (68%), CT less frequently (44%), and there was a higher rate of using CT or US (93%).
Complete data for rule performance was available for 2390 patients (91%). The most common reason for exclusion from analysis was the absence of a white blood cell count (188 patients). The test characteristics of validation are provided in Table 2; we include the test characteristics of the derivation sample from our prior published study for comparison.
Theoretical application of the low risk rule is presented in Figure 2. A sensitivity analysis revealed no significant change in test performance based on coding of unsure/don’t know responses (data available upon request). In total, 573 patients (24% of those with complete data) were identified as low risk; 64 (11%) underwent an operation for presumed appendicitis, of whom 42 had pathology-proven appendicitis and 22 had negative appendectomies. In addition, 296 (52%) underwent a CT, 241 (42%) an US and, in total, 465 (81%) had either a CT or US performed. Application of the low risk rule would have theoretically prevented 22 unnecessary operations and 465 (24%) diagnostic imaging studies, but would have missed 42 (4.5%; 95% CI 3.4% – 6.1%) patients who were ultimately diagnosed with appendicitis. In Table 3, we present the clinical characteristics of the 42 patients with appendicitis who were misclassified by the prediction rule.
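The low-risk group figures reported above can be cross-checked with simple arithmetic. The short Python sketch below uses only counts stated in the text; the variable names are illustrative.

```python
# Counts reported in the text for the validated rule.
complete_data = 2390            # patients with complete rule data
low_risk = 573                  # classified as low risk
missed = 42                     # low-risk patients with proven appendicitis
operated = 64                   # low-risk patients who underwent an operation
negative_appendectomies = 22    # operations without appendicitis on pathology

low_risk_fraction = low_risk / complete_data          # share classified low risk
npv = (low_risk - missed) / low_risk                  # negative predictive value
risk_in_low_risk_group = missed / low_risk            # equals 1 - NPV
negative_appendectomy_rate = negative_appendectomies / operated

print(f"Low risk: {low_risk_fraction:.1%}")
print(f"NPV: {npv:.1%}")
print(f"Risk of appendicitis if low risk: {risk_in_low_risk_group:.1%}")
print(f"Negative appendectomy rate in low-risk group: {negative_appendectomy_rate:.1%}")
```

These values agree with the reported NPV of 92.7% and with the 7.3% risk of appendicitis in the low-risk group.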
The refined model identified patients as low risk for appendicitis through use of the following: a) ANC ≤ 6.75 × 10³/µL and no maximal tenderness in the right lower quadrant (RLQ) or b) ANC ≤ 6.75 × 10³/µL, maximal tenderness in the RLQ but no abdominal pain with walking/jumping or coughing (Figure 3). Test characteristics of the refined model are presented in Table 4. Of the 400 patients identified as low risk, 27 (6.7%) underwent an operation, 19 of whom had appendicitis. Additionally, of these 400 patients, clinicians obtained CT or US in 301 (75%), including 180 patients (45%) who had a CT.
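The refined rule can be written as a short decision function. The Python sketch below follows the two branches exactly as stated in the text; the function and parameter names are illustrative, and the sketch assumes all three inputs are known (in the refinement analysis, "unsure" responses were coded as missing). It is an illustration of the logic, not a clinical tool.

```python
ANC_CUTOFF = 6.75  # x10^3/uL, threshold stated in the refined rule


def is_low_risk_refined(anc: float,
                        max_tenderness_rlq: bool,
                        pain_with_walking_jumping_or_coughing: bool) -> bool:
    """Return True if the patient meets the refined low-risk criteria."""
    if anc > ANC_CUTOFF:
        return False  # elevated ANC: not low risk under either branch
    if not max_tenderness_rlq:
        return True   # branch (a): low ANC, no maximal RLQ tenderness
    # branch (b): low ANC with maximal RLQ tenderness is low risk only
    # when there is no pain with walking/jumping or coughing
    return not pain_with_walking_jumping_or_coughing
```

For example, a child with an ANC of 5.0 × 10³/µL, maximal RLQ tenderness and no pain with walking, jumping or coughing would be classified as low risk under branch (b).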
In this large, prospective, multi-center study of children with suspected appendicitis, our previously derived low risk prediction rule maintained high sensitivity and modest specificity in a validation cohort. Furthermore, we refined our low risk rule in order to improve test sensitivity. These low risk rules identify children with suspected appendicitis at low but not zero risk for appendicitis.
Our study adds to a growing literature on the use of clinical prediction rules for managing patients in the emergency department.17, 21–23 Similar to prior studies, our goal was to identify patients at low risk for illness in order to reduce reliance on diagnostic imaging and to reduce inefficient care delivery. As our study confirms, CT is heavily relied upon to diagnose and manage children with acute abdominal pain.10 The potential benefit of our clinical prediction rule lies in its ability to stratify patients, identifying those at low risk for appendicitis.
Several previous investigators have developed clinical prediction rules or scores for the diagnosis of appendicitis.24–27 The Alvarado and Samuel scores are the most commonly cited, and although the original studies noted excellent test performance, external validation by independent investigators revealed conflicting results.28–30 It should be noted that both scoring systems were intended to identify patients with appendicitis rather than identify a low risk group.24 Compared to these prior scores, advantages of our prediction rule include its simplicity, external validation in a large sample across multiple PEDs, and ability to more accurately identify a low risk cohort. Lastly, a decision tree format may be easier than a numerical based score for clinicians to remember and use.
Although the sensitivity of our validated low risk prediction rule was high, the NPV was lower than in the derivation study (98% derivation vs. 92.7% validation). As a result, 42 children (4.5% of patients with appendicitis) were misclassified as not having appendicitis. This rate of misclassification may concern clinicians given the potential medical and legal consequences associated with missed appendicitis. We anticipated this issue and thus refined our rule with the goal of improving the sensitivity and NPV. Our refined prediction rule provides sensitivity and NPV which are somewhat higher (98.1% and 95.3%, respectively), but the specificity and PPV of the rule do diminish. Furthermore, the refined rule would still miss some cases of appendicitis (19 patients). Consequently, either rule may be appropriate to identify a low risk population (risk of appendicitis: 7.3% validated rule, 4.8% refined rule) whom clinicians may choose to observe for progression of abdominal symptoms. The use of ultrasound and/or surgical consultation may also be viable alternatives. Given the high rate of negative appendectomies in the low risk cohort (> 30%) as compared to the overall study cohort (8.5%), it would be prudent for surgeons to be cautious operating on low risk patients. Ultimately, our prediction rules may be best suited for integration into an appendicitis care algorithm to help stratify risk and guide clinical management (e.g. observation with serial examination for low risk patients).
It is important to consider the potential use of our low risk prediction rules in relation to the performance of CT. Although CT has demonstrated a sensitivity of 94% [95% CI 92–97%] and a specificity of 95% [95% CI 94–97%] for appendicitis, the PPV of CT will be lower when it is used in populations with a low prevalence of appendicitis.9 In addition, the NPV of CT is not 100%. 19 In our present study, if clinicians had acted upon CT results in isolation, 20 patients would have had missed appendicitis (inappropriately discharged home) and 27 patients would have had negative appendectomies (data available upon request). These results support concerns raised by several investigators that the excessive use of CT may lead to unnecessary operations, delays in care and increased costs.31–33
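The dependence of predictive values on prevalence follows directly from Bayes' theorem. The Python sketch below uses the CT sensitivity (94%) and specificity (95%) cited above and compares the prevalence observed in this cohort (38.8%) with a lower prevalence of 5%; the 5% figure is a hypothetical value chosen only for illustration.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence                # true positives per patient tested
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    fn = (1 - sensitivity) * prevalence          # false negatives
    tn = specificity * (1 - prevalence)          # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Prevalence observed in this study cohort.
ppv_study, npv_study = predictive_values(0.94, 0.95, 0.388)

# Hypothetical low-prevalence population (assumed 5%).
ppv_low, npv_low = predictive_values(0.94, 0.95, 0.05)

print(f"Prevalence 38.8%: PPV {ppv_study:.1%}, NPV {npv_study:.1%}")
print(f"Prevalence  5.0%: PPV {ppv_low:.1%}, NPV {npv_low:.1%}")
```

Under these assumptions the PPV falls from roughly 92% to roughly 50% as prevalence drops, while the NPV rises, which is the pattern described in the text.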
Physicians may have concerns regarding the reliability of the clinical variables included in our prediction rules. Through the course of our study, we collected data on the inter-rater reliability of clinical history and physical examination findings, the results of which have been presented previously.20 The presence of nausea had a kappa of 0.44 [95% CI 0.37–0.52], maximal tenderness in the RLQ 0.45 [95% CI 0.36–0.54], and pain with walking 0.54 [95% CI 0.45–0.63], indicating moderate reliability for all three parameters.
Ultimately, the clinical utility of our prediction rules is in their ability to provide a quantitative assessment of risk for appendicitis. In this study we elected to stratify patients as either “low risk” or “not low risk” for appendicitis. In this scheme, patients identified as “low risk” had a risk of appendicitis of 7.3% (validated rule) or 4.8% (refined rule). However, by observing how patients flow within the decision trees, specific risks for appendicitis can be determined depending on a patient’s particular signs and symptoms (range of 4–12% for the various terminal nodes). As electronic health record-based clinical decision support becomes more common within EDs, the ability to calculate an appendicitis risk may allow physicians to tailor management based on their own risk tolerance and availability of diagnostic imaging and surgical resources.
Our study had the following limitations. Enrollment of patients varied considerably by site. To assess for enrollment bias, we conducted random medical record audits which revealed that missed patients were similar to those enrolled. Although we enrolled pediatric patients from numerous geographical regions, enrollment occurred exclusively in PEDs. Therefore, our results may not generalize to other settings. Our clinical prediction rule was developed and validated in cohorts where the rate of appendicitis was quite high (> 30%). Use of the rule in an urgent care or clinic setting, where the rate of appendicitis is lower, might result in a higher NPV but lower PPV. We collected clinical parameters only at the time of enrollment, thus the patients’ exam may have changed prior to final disposition. Although we made every attempt to follow up patients discharged from the PED, we cannot exclude the possibility that some underwent appendectomies at alternative facilities. Lastly, we stress that our study was not an implementation study; clinicians should understand the potential risks and benefits of using the validated rule, and the refined rule should undergo external validation before formal implementation.
We validated and refined a clinical prediction rule for pediatric appendicitis, identifying a population of children with suspected appendicitis who are at low but not zero risk for appendicitis. If applied, clinicians will need to balance the risks of missing a case of appendicitis with the increased risk of negative appendectomies and the potential long-term risks associated with exposure to ionizing radiation. Clinicians should consider alternative strategies such as observation or ultrasound for patients identified as low risk rather than proceeding to immediate imaging with CT.
The PEM-CRC data center is supported in part by the Center for Clinical Effectiveness at Baylor College of Medicine/Texas Children's Hospital. We thank all of the clinicians who enrolled patients into this study and the research coordinators who greatly facilitated study completion.
Supported by Grant Number UL1 RR024156 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH) and NIH Roadmap for Medical Research. In addition, Anupam Kharbanda received salary support through the Empire Clinical Research Program (New York State).
The funding agencies took no part in data analysis, interpretation or manuscript preparation. No person received any honorarium or other payment to produce this manuscript. This manuscript was written by ABK and all authors take full responsibility for the integrity of the data and the accuracy of data analysis.
Presented in part at the annual meeting of the Pediatric Academic Societies, Denver, Colorado, May 2011.
Conflict of Interest
None of the authors have any financial disclosures or conflicts of interest to report.