Our results demonstrate that clinical examinations by PEM physicians for the diagnosis and management of SSTIs are unreliable. Similar results have been reported in other studies of bedside clinical examination. In a study evaluating adult patients with sore throats, physicians showed only slight to moderate agreement on physical examination findings.8
Only fair agreement was found when 2 attending physicians examined adult patients for conjunctival pallor.18
When emergency physicians evaluated adult patients with ankle injuries, the reliability of several physical examination components incorporated in the decision to obtain radiographs was poor to fair.13
A study comparing physical examinations of patients with suspected appendicitis by senior surgical residents and PEM physicians showed slight to moderate agreement, depending on the examination component.11
Finally, in a study of children with abdominal pain, the reliability between PEM attending physicians and surgical residents was poor to moderate.9
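For reference, the qualitative agreement labels used in these comparisons (and in our own results) conventionally map onto ranges of the κ statistic; assuming the widely cited Landis and Koch benchmarks apply to the studies discussed here, the correspondence is approximately:

\[
\kappa \le 0 \text{ (poor)},\;
0.01\text{–}0.20 \text{ (slight)},\;
0.21\text{–}0.40 \text{ (fair)},\;
0.41\text{–}0.60 \text{ (moderate)},\;
0.61\text{–}0.80 \text{ (substantial)},\;
0.81\text{–}1.00 \text{ (almost perfect)}
\]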
When we stratified our results, we found little to no improvement in agreement among our selected covariates. We theorized that the clinical evaluation of younger children might be more variable; although physicians were statistically more likely to agree for older children, the clinical significance of this finding is questionable and should be pursued further. The level of training has been shown to be a factor in reliability and correlates with how much specific attention raters pay to relevant cues and how much interest they actually have in the activity being assessed.15
However, we did not demonstrate that 2 physicians who were more experienced, according to our definition, were more reliable than pairs who were less experienced or of mixed experience levels. It is possible that greater experience does not lead to more-consistent examination results in the case of SSTIs, which suggests that examination of these lesions is difficult for all practitioners and that a more-objective means of assessment may be required.
Possible explanations for the lack of reliability we observed include the absence of consistent clinical criteria and indications for diagnosing and treating these lesions, as well as the subjectivity inherent in interpreting clinical examination findings. Therefore, more-standardized, more-objective approaches, such as focused education on the examination of SSTIs and bedside imaging studies such as clinician-performed ultrasonography for this indication, should be investigated as a means to improve reliability.
There are several limitations to this study. Although the 2 examiners were independent and blinded to each other's opinions, we cannot exclude the possibility that the study physician obtained clues to the treating physician's opinion, such as topical anesthesia applied to the area, a nurse preparing for intravenous sedation, or parents discussing the treatment plan. However, any such clues would serve to increase agreement; therefore, our findings would represent an overestimation of the true agreement. We also cannot exclude the possibility that the second physician performed a less-careful history and physical examination than the treating physician, because he or she might have been less concerned than the treating physician with making an accurate diagnosis. This would lead to an underestimation of the true κ, although we expect the impact to be minimal. We considered standardizing the history and physical examination to overcome this limitation, but we wanted our study to reflect true practice conditions.

In addition, we considered multiple lesions on a single patient to be independent, which might inflate the κ statistic if physicians were more willing to assign a similar diagnosis and plan to a second or third lesion. However, our results were the same when we randomly selected 1 lesion per patient. When we stratified our analysis according to physician experience, we determined a priori that ≥3 years of practice after fellowship training would define experience. It is possible that this cutoff does not represent enough experience to have an effect on agreement. Also, non-PEM physicians were included in the group of physicians with <3 years since PEM training, and some of those physicians might have had more experience in terms of number of years of practice. However, we think that fellowship training represents a different level of training and experience, compared with the training and experience of those without subspecialty training, regardless of years of practice. If some of these physicians were misclassified, it is not clear how the misclassification would have affected our results.

Because this was a single-center study, it is possible that our results cannot be generalized to other practice settings; however, the large number of patients and physicians who participated in the study, each with individual practice patterns, might serve to mitigate this limitation. Moreover, because we enrolled a convenience sample of patients during times when a research associate and a study physician were available, there might be a selection bias regarding the patients who were enrolled in the study. However, when we evaluated this by comparing demographic characteristics of missed and enrolled patients, the 2 populations were similar. In addition, it is unlikely that patients who were missed would represent a different population with a different disease process. Finally, in the case of some labile or dynamic disease processes, such as abdominal pain, it is possible that patients' examination findings change between examiners; therefore, the κ statistic may be underestimated. SSTIs typically are stable within the window of our examinations, however, and our estimate of reliability should not have been affected.
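To make explicit why the factors above bias the estimate in the directions stated, it may help to recall the form of the agreement measure; assuming the standard unweighted Cohen κ, a reasonable sketch is:

\[
\kappa = \frac{p_o - p_e}{1 - p_e}
\]

where \(p_o\) is the observed proportion of agreement between the 2 examiners and \(p_e\) is the agreement expected by chance from each examiner's marginal rates. Any factor that artificially raises \(p_o\), such as clues to the treating physician's plan, inflates κ, whereas true changes in the patient between examinations lower \(p_o\) and deflate it.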