Appendicitis is a common disease requiring surgery. Bedside ultrasound (BUS) is a core technique for emergency medicine (EM). The Alvarado score is a well-studied diagnostic tool for appendicitis. This study aimed to investigate the relationship between patients’ symptoms, Alvarado score and ultrasound (US) findings, as performed by emergency physicians (EPs) and radiologists, of patients with suspected appendicitis.
Three EM specialists underwent the BUS course and core course for appendicitis assessment. Patients suspected of having appendicitis were selected and their Alvarado and modified (m) Alvarado scores calculated. The specialists performed the BUS. Then, patients were given a formal US and surgery consultation if necessary. Preliminary diagnoses, admission or discharge from the emergency department (ED) and final diagnosis were documented. The patients were also followed up after discharge from the hospital.
The rule-out cut-off values were 2 for the Alvarado score and 3 for the mAlvarado score. At these cut-offs, the sensitivity of both scores was 100%. Each score was used to rule out appendicitis. The results of EP-performed BUS were as follows: accuracy 70%, sensitivity 0.733, specificity 0.673, +LR 2.24, and −LR 0.40. Radiologists were better than EPs at ruling in appendicitis, whereas radiologists and EPs were equally strong at ruling out appendicitis by US. When US was combined with Alvarado and mAlvarado scores, EP US+Alvarado/mAlvarado scores <3 and radiology US+Alvarado/mAlvarado scores <4 perfectly ruled out appendicitis.
BUS performed by EPs is moderately useful in detecting appendicitis. Combined with scoring systems, BUS may be a perfect tool for rule-out decisions in EDs.
Appendicitis is the most common cause of acute abdominal pain requiring surgical treatment in patients less than 50 years old, with a peak incidence in the second and third decades. Although emergency physicians (EPs) may easily diagnose acute appendicitis that presents in a typical fashion, typical presentations are the exception, not the rule. Atypical presentations are commonly misdiagnosed, resulting in increased morbidity, mortality and potential litigation. Emergency ultrasound (EUS) continues to develop and is now a core technique in emergency medicine (EM). Currently, there are 11 core EUS applications, and each application is covered in the literature. The six initially established applications are: (1) focused assessment with sonography for trauma (FAST) examination; (2) abdominal aortic aneurysm; (3) emergency echocardiography; (4) pregnancy; (5) hepatobiliary ultrasound; and (6) renal ultrasound. The five recently added applications are: (1) deep venous thrombosis; (2) thoracic ultrasound; (3) musculoskeletal ultrasound; (4) ocular ultrasound; and (5) procedural ultrasound. The American College of Emergency Physicians’ (ACEP) 2008 revision of its Emergency Ultrasound Guidelines Policy Statement updates the original 2001 policy statement and details how EUS has expanded and where it stands today. The utility of clinician-performed ultrasonography (US) for suspected appendicitis is unclear.[2–5] Published data indicate that US has a high specificity for ruling in the diagnosis of appendicitis, with variable sensitivity for ruling it out. The Alvarado score is a well-tested and widely published 10-point clinical scoring system. An Alvarado score over 6 has been recommended as the threshold for a diagnosis of appendicitis warranting appendectomy (Table 1).
In this study, we aimed to investigate the relationship between patient symptoms, Alvarado score and US findings of patients suspected of having the diagnosis of acute appendicitis when EPs and radiologists performed US. In addition, this study also tested the performance characteristics of each of these diagnostics separately, as well as in combination with each other.
The ethics committee of our tertiary care university teaching hospital approved the study protocol. Three randomly selected emergency medicine (EM) specialists, who were not experienced in bedside ultrasound (BUS) detection of appendicitis, each underwent a one-day introductory course. The topics of the course included ultrasound for trauma, intrauterine pregnancy, abdominal aortic aneurysm, cardiac ultrasound, biliary ultrasound, urinary tract, deep venous thrombosis, musculoskeletal ultrasound, thoracic ultrasound, ocular ultrasound, and procedural guidance. After this course, they took a second, six-hour core course on appendicitis assessment given by an experienced radiologist. During this course, they underwent hands-on training on 25 patients in order to learn to detect appendicitis. These courses were prepared under the guidance of the International Federation for Emergency Medicine’s Point-of-Care Ultrasound (PoCUS) Curriculum Guidelines. Each working shift was arranged to include one physician from the US group. The patients were diagnosed as having appendicitis via US performed by EPs based on the following findings: an anteroposterior appendix diameter over 6 mm, a non-compressible and aperistaltic appendix, periappendiceal anechoic fluid collection, an appendiceal wall thickness over 2 mm, the presence of an appendicolith, and the presence of the ultrasonographic McBurney sign. The findings of the formal US were recorded in a report by radiologists who were blinded to the study protocol, and if necessary the radiologists consulted a surgeon who was also blinded to the study protocol. This was a limited ultrasound (US) examination, and no attempt was made to identify other abdominal pathologies.
Between January 1 and March 31, 2015, patients with acute abdominal pain were screened for the study in the emergency department (ED). Adult patients with acute abdominal pain referred to the ED were asked to provide informed consent for participation in the study. Patients aged 18 and above who were admitted to the ED with abdominal pain suggesting suspected appendicitis (as determined by another ED-attending EP who was blinded to the study protocol after history taking and physical examination) were eligible for inclusion in the study and their Alvarado scores or modified Alvarado scores were calculated as described in the literature.[10–12] After calculation of their scores, PoCUS, as performed by the EPs, was used to screen all enrolled patients with suspected appendicitis.
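The score calculation mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the standard published Alvarado items and weights (Table 1 of this paper is not reproduced here) and assuming the commonly cited modification that drops the left-shift item, giving a 9-point scale. The names `Findings`, `alvarado_score` and `modified_alvarado_score` are hypothetical, and each item is taken as an already-judged yes/no finding so that no laboratory or temperature threshold has to be assumed.

```python
from dataclasses import dataclass, asdict

# Item weights of the standard published Alvarado score (assumption:
# these match Table 1, which is not reproduced in this text).
WEIGHTS = {
    "migration_of_pain": 1,
    "anorexia": 1,
    "nausea_or_vomiting": 1,
    "rlq_tenderness": 2,
    "rebound_pain": 1,
    "elevated_temperature": 1,
    "leukocytosis": 2,
    "left_shift": 1,
}


@dataclass
class Findings:
    """Yes/no clinical findings for one patient (hypothetical container)."""
    migration_of_pain: bool = False
    anorexia: bool = False
    nausea_or_vomiting: bool = False
    rlq_tenderness: bool = False
    rebound_pain: bool = False
    elevated_temperature: bool = False
    leukocytosis: bool = False
    left_shift: bool = False


def alvarado_score(findings: Findings) -> int:
    """Sum the weights of all positive items (range 0-10)."""
    present = asdict(findings)
    return sum(weight for item, weight in WEIGHTS.items() if present[item])


def modified_alvarado_score(findings: Findings) -> int:
    """Same sum without the left-shift item (range 0-9); assumed variant."""
    present = asdict(findings)
    return sum(
        weight
        for item, weight in WEIGHTS.items()
        if item != "left_shift" and present[item]
    )


if __name__ == "__main__":
    patient = Findings(rlq_tenderness=True, leukocytosis=True, anorexia=True)
    print(alvarado_score(patient), modified_alvarado_score(patient))
```

Keeping the weights in one table makes it easy to check them against the score definition actually used in a given study before relying on the totals.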
The exclusion criteria were as follows: age less than 18 years, previous appendectomy, pregnancy, inability to be followed up by phone, low PoCUS image quality, frank peritonitis, hypotension, and neurological deficits interfering with the ability to localize abdominal pain. Finally, 100 patients were enrolled in the study. After history taking and physical examination of the patients, the EM physicians performed US using a Mindray model M7® ultrasound machine with a 5–10 MHz linear probe (Mindray® Bio-Medical Co., Shenzhen, China). B-mode dynamic views of the appendices were recorded. This procedure required 5.5 minutes on average. Each in-group physician documented preliminary diagnoses, admission or discharge from the ED, and final diagnosis based on the pathological specimen after surgery. The patients were also followed up by phone to identify one-week and one-month mortality after discharge from the hospital.
The pathological and clinical results of operations and outpatient follow-up were evaluated to make a final diagnosis of either appendicitis or another condition. This diagnosis was taken as the gold standard against which radiology US, EM US and Alvarado scores were compared to evaluate diagnostic utility and accuracy. The variables were expressed as mean±standard deviation with their confidence intervals. Statistical analysis was performed using MedCalc statistical software version 15.2.2 (MedCalc Software bvba, Ostend, Belgium; http://www.medcalc.org; 2015). Receiver operating characteristic (ROC) curves were analyzed in MedCalc using the method of DeLong et al (1988). Clinical utility estimators were calculated using an online calculator (Richard Lowry, Professor of Psychology Emeritus, Vassar College; available at vassarstats.net). Concordance (agreement) and correlation analyses were performed using Cohen’s weighted kappa (κ) statistic for the physicians in each group. The sample size was calculated according to a preliminary study conducted in our institution. The primary outcome was the correlation between the EP diagnosis at admission and the final diagnosis at admission or discharge. We estimated that we would achieve at least a correlation of 0.5 with a power of 0.80 and a type I error rate of 0.05. The calculated sample size was 29 for the two-tailed correlation. Three patients were excluded from the study because of follow-up failure or poor image quality.
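The sample size of 29 can be checked with a short calculation. The sketch below assumes the usual Fisher z-transformation approximation for a two-tailed test of a correlation coefficient. The paper does not state which formula was used, so this is one plausible route to the reported figure, not a description of the authors' procedure. The function name `sample_size_for_correlation` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_correlation(r: float, alpha: float = 0.05,
                                power: float = 0.80) -> float:
    """Approximate n needed to detect a correlation r (two-tailed test).

    Uses n = ((z_alpha/2 + z_beta) / C)^2 + 3, where C is the Fisher
    z-transform of r. Returns the unrounded value.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = standard_normal.inv_cdf(power)           # quantile for desired power
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))      # Fisher z-transform of r
    return ((z_alpha + z_beta) / fisher_z) ** 2 + 3


if __name__ == "__main__":
    n = sample_size_for_correlation(0.5, alpha=0.05, power=0.80)
    print(f"unrounded n = {n:.2f}, rounded n = {round(n)}")
```

With r = 0.5, a type I error rate of 0.05 and a power of 0.80, the unrounded value lies just above 29, so rounding to the nearest integer reproduces the reported 29, whereas a strict round-up convention would give 30.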
Before enrollment, the three EPs evaluated 16 patients, and their findings were compared. The intraclass correlation coefficient (ICC) for absolute agreement (n=14) was 0.91 (95%CI 0.80–0.97) for the preliminary diagnosis of appendicitis.
The mean age of the 43 (43%) male patients was 33.58±15.78 years (95%CI 28.73–38.44) and that of the 57 (57%) female patients was 32.30±13.56 years (95%CI 28.70–35.90). There was no significant difference in mean age between the two sexes (P=0.663).
Appendicitis was diagnosed according to the Alvarado and mAlvarado scores (Table 2). All of the appendicitis cases had a score of 3 and above for the mAlvarado and 2 and above for the Alvarado score, and these values were taken as the rule-out cut-off values for each score (sensitivity 100%). ROC analyses were performed to determine the areas under the curve (AUCs), a measure of accuracy, of the Alvarado and mAlvarado scores, and both scores’ abilities to discriminate appendicitis from other diagnoses were compared. The accuracy (AUC) of the Alvarado and mAlvarado scores was 0.698±0.053 (95%CI 0.598–0.786; P=0.0002) and 0.686±0.053 (95%CI 0.586–0.776; P=0.0004), respectively, without any statistically significant difference (pairwise comparison of ROC curves; P=0.4161). The diagnostic utility of the above-mentioned cut-off values (mAlvarado <3 and Alvarado <2) to rule out appendicitis was exactly the same for each score (Table 3).
We also analyzed each component of the Alvarado and mAlvarado scores, as well as the US, for their diagnostic utility and correlation with the final diagnosis of appendicitis. The highest correlation coefficients for the diagnosis of appendicitis were found for the following variables: presence of an appendicular diameter >6 mm, presence of an appendicular wall thickness >2 mm, presence of compressibility of the appendix, presence of periappendiceal fluid, and presence of the sonographic McBurney sign. The clinical utility of these variables is shown in Table 4. The presence and absence of an appendiceal diameter >6 mm, wall thickness >2 mm, and periappendiceal fluid are consistently reported together in US examinations, which is the reason for the exact same correlation coefficients and clinical utility estimators. However, according to the likelihood ratios (LRs) of those variables, none of them is powerful enough alone to rule in or out the diagnosis of appendicitis. On the other hand, their rule-in capacity is higher than that of either of the Alvarado scores (+ LR of 1.05) since these scores are designed for their rule-out capabilities.
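The point about likelihood ratios can be made concrete by converting a pre-test probability into a post-test probability through odds. The sketch below uses the LR values quoted in this paper (+LR 1.05 for the scores, +LR 2.24 and −LR 0.40 for EP-performed BUS) together with a purely hypothetical pre-test probability of 0.50, chosen only for illustration. The function name `post_test_probability` is also hypothetical.

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


if __name__ == "__main__":
    pre_test = 0.50  # hypothetical value, not taken from the study
    for label, lr in [("score +LR", 1.05),
                      ("EP BUS +LR", 2.24),
                      ("EP BUS -LR", 0.40)]:
        post = post_test_probability(pre_test, lr)
        print(f"{label} = {lr}: {pre_test:.2f} -> {post:.3f}")
```

An LR close to 1 leaves the probability almost unchanged, which is why a +LR of 1.05 has essentially no rule-in value, while LRs of the size reported for BUS shift the probability only moderately.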
The diagnostic accuracy of US performed by the EPs and the radiologists is compared in Table 5. In 98 of 100 patients, US was performed by both an EP and a radiologist. The US diagnoses of the EPs and the radiologists agreed with each other in only 65.3% of cases. Expressed as proportions of positive US results, the false positive (FP) rate of the radiologists was 3/30 (10%) and that of the EPs was 18/52 (34.6%), and the true positive (TP) rates were 90% and 65.4%, respectively. Expressed as proportions of negative US results, the true negative (TN) rate of the radiologists was 51/68 (75%) and that of the EPs was 36/48 (75%).
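The counts quoted above are enough to rebuild a 2×2 table for each group and derive the usual test characteristics. The sketch below assumes that the denominators 30 and 52 are the numbers of positive US results and that 68 and 48 are the numbers of negative US results, so the remaining cells follow by subtraction (radiologists: TP 27, FN 17; EPs: TP 34, FN 12). This reading is an assumption, and figures derived from it need not match every value reported elsewhere in the paper. The function name `diagnostic_metrics` is hypothetical.

```python
import math


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard test characteristics from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Guard against division by zero when a test is perfectly specific
        # or has zero specificity.
        "lr_pos": sensitivity / (1 - specificity) if specificity < 1 else math.inf,
        "lr_neg": (1 - sensitivity) / specificity if specificity > 0 else math.inf,
    }


# Cells derived from the quoted counts (assumed reading, see lead-in).
ep = diagnostic_metrics(tp=52 - 18, fp=18, tn=36, fn=48 - 36)
radiology = diagnostic_metrics(tp=30 - 3, fp=3, tn=51, fn=68 - 51)

if __name__ == "__main__":
    for name, metrics in [("EP", ep), ("Radiology", radiology)]:
        summary = ", ".join(f"{key}={value:.3f}" for key, value in metrics.items())
        print(f"{name}: {summary}")
```

Under this reading the two groups have the same negative predictive value and similar negative LRs, while the radiologists have a much higher specificity and positive LR.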
The diagnostic workup of patients with lower-right-quadrant pain who present to the ED often involves a combined team approach by the ED, radiology and surgery. Although it has been shown in the radiology literature that the use of US and computed tomography (CT) has improved the diagnostic performance of physicians, these study modalities are time-consuming, delay the diagnosis and final disposition, and in the case of CT, the patient is exposed to ionising radiation.[14–20]
Appendicitis is diagnosed using US by demonstrating the lack of compressibility of a non-peristalsing tubular structure found in the lower-right quadrant that measures more than 6 mm in diameter (Figure 1). Depending on the patient’s body habitus, it may be necessary to use constant pressure in the lower-right quadrant with a transducer to compress subcutaneous fat and displace loops of the bowel.
Apart from individual case reports, to date there have been four published clinical trials on EP-performed BUS for the diagnosis of appendicitis.[2,21–23] Chen et al found that BUS had a sensitivity of 96.4% and a specificity of 67.6% for the diagnosis of appendicitis, compared to a sensitivity of 86.2% and a specificity of 37% based on surgeons’ clinical judgment. However, the prevalence of appendicitis was 75% in their study and all physician sonographers had extensive BUS experience, reflecting a setting atypical for most EDs. Fox et al published two studies on the topic. Their first study was a retrospective registry review, which revealed that EPs without focused training on the use of BUS to diagnose appendicitis had a sensitivity of 39% and a specificity of 90%. This was followed by a prospective study (in which all physician investigators received standardised training), which concluded that BUS was 65% sensitive and 90% specific in diagnosing appendicitis. The main difference between our study and theirs was that we investigated the ICCs of the three EPs in the study group for absolute agreement and showed that the performance characteristics of all EPs were similar to each other. Also, we combined the scoring systems with the results of the BUS to increase the diagnostic performance of the EPs. In multivariate logistic regression, the BUS findings associated with appendicitis were an appendix diameter >6 mm and an appendix wall thickness >2 mm. This is largely in line with the current radiology literature. Je et al determined that the optimal appendix diameter and wall thickness cut-off values for the diagnosis of pediatric appendicitis were 5.7 mm and 2.2 mm, respectively. In another study, Van Randen et al found thickened appendix (>6 mm), transducer tenderness and periappendiceal fat infiltration to be significant variables predicting ultrasound diagnostic accuracy.
Our BUS had a lower sensitivity and specificity than those generally reported in the radiology literature.[25–27] We also had a significant number of false positive BUS studies. We speculate that this might be related to the limited application-specific training and experience of our sonographers. Appendiceal sonography can be hard to master, given the difficulty in visualizing the uninflamed appendix, frequent anatomical variation, common interference from the surrounding structures and mimicry from other intra-abdominal pathologies.
According to the TP rate and positive LR values, radiologists are better than EPs at ruling in the diagnosis of appendicitis with US and approach perfect specificity in doing so. On the other hand, according to the TN rate and negative LR values, radiologists and EPs can be regarded as equally strong at ruling out appendicitis with US, and this rule-out strength may be considered moderate.
When EP and radiology physician-performed US is combined with Alvarado and mAlvarado scores for their ruling-out capabilities, EP US + Alvarado/mAlvarado scores <3 and radiology US + Alvarado/mAlvarado scores <4 perfectly rule out the presence of appendicitis with a sensitivity of 100%, a negative LR value of 0 and a negative predictive value of 100%. However, this combination is not efficient since only 45%–55% (positive predictive value) of the patients proved to have appendicitis as a final diagnosis. Nonetheless, more prospective validation studies must be performed on different patient populations to confirm the score’s external validity before it can be recommended for widespread use.
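The three rule-out figures quoted above are not independent: they all follow from having no false negatives. A short derivation using the standard definitions (TP, FP, TN, FN are the cells of the 2×2 table, and the test is assumed to have at least one true positive and one true negative):

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN} = 1
  \;\Longrightarrow\; FN = 0, \\
LR^{-} &= \frac{1 - \text{Sensitivity}}{\text{Specificity}}
  = \frac{0}{\text{Specificity}} = 0, \\
NPV &= \frac{TN}{TN + FN} = \frac{TN}{TN} = 1, \\
PPV &= \frac{TP}{TP + FP} \quad \text{(depends on } FP \text{, not on } FN\text{)}.
\end{align*}
```

Because the positive predictive value depends only on the false positives, a perfect rule-out combination can still have a positive predictive value of 45%–55%, as reported here.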
In addition to more in-depth education and hands-on experience prior to implementation of the appendix BUS protocol, we would recommend a low threshold for confirmatory studies on inconclusive or difficult bedside studies based on our anecdotal experience.
A major limitation of the study was the convenience sampling of the subjects, leading to selection bias. Randomized controlled trials must be conducted to overcome this bias in the future. Our sample size was relatively small, leading to wide confidence intervals in some of our calculated test characteristics. Future large-scale studies would be necessary to confirm our findings.
In conclusion, BUS performed by EPs with limited training is moderately useful for the diagnosis of appendicitis. However, EPs may rule out appendicitis by using US as efficiently as radiologists. In addition, a combined model with scoring systems may be a perfect tool for making rule-out decisions in EDs. Future trials based on our results may include the derivation of a ’BUS and Alvarado score’ comprising the previously mentioned components, possibly leading to better accuracy than can be achieved by BUS alone.
Ethical approval: The ethics committee of our tertiary care university government teaching hospital approved the study protocol.
Conflicts of interest: The authors declare that there are no conflicts of interest related to the publication of this paper.
Contributors: Ünlüer EE proposed the study, analysed the data and wrote the first draft. All authors contributed to the design and interpretation of the study and to further drafts.