Opinions differ in clinical practice regarding what constitutes a congruent hip, particularly in a severely dysplastic hip. This discrepancy may result in markedly different recommendations for the type of surgical treatment. Also, as the literature reports the outcomes of PAO based on preoperative congruency, it is important to have reliable, reproducible measures of congruency to render the results of these studies meaningful [9]. We have noted differences in opinion regarding what constitutes a congruent joint, particularly in cases of severe dysplasia. A different assessment of congruency could substantially affect the approach to treatment. Of the various reported classifications of hip congruency [6], we selected three that are commonly used, are easily measured in a clinical setting, do not require any specific imaging software, and, thus, are of potential practical value. We performed this study to measure the intraobserver and interobserver reliabilities of these commonly used measures of congruency.
Our investigation has several limitations. First, we evaluated only radiographs. The observers had no knowledge of the patient’s history or physical examination findings and could not contextualize the radiographs. Second, measurements of interrater reliability from the second reading may have limited validity owing to practice effects and potential bias, although reviewers were asked to avoid discussing the study with other practitioners between the two readings. Third, although the radiographs were taken at one center, we noted some variability in positioning of the affected leg. In particular, positioning for the von Rosen view can be limited by the patient’s symptoms and restricted ROM. For this reason, all observers were given the AP and von Rosen views with which to rate the hip congruency. Fourth, some of the radiographs were of hips of patients with severe dysplasia (Fig. 3). These three congruency measures may have better intrarater and interrater reliabilities if applied to a patient population with more subtle findings of hip dysplasia. In some cases, improved agreement was noted in patients with minimal dysplasia (Fig. 4). Finally, our observers had no specific training in using the classification systems of Yasunaga et al. and Okano et al. It is possible that with training the reliability would have been higher. Nevertheless, we provided the reviewers with the information currently available in the literature.
Fig. 3A–B These are representative (A) AP and (B) von Rosen views of a hip with severe dysplasia. Subjectively, four raters judged this hip to be incongruent, and two thought the hip was congruent. Interestingly, all six raters believed this represented a poorly ...
Fig. 4A–B These are representative (A) AP and (B) von Rosen views of a hip with mild dysplasia. All four staff agreed that this hip was excellent, good, and congruent. The two fellows rated the hip as good or poor with the criteria of Yasunaga et al. and Okano ...
We presumed there would be good intrarater reliability for the three methods. We found low combined intraobserver reliability for the classifications of Okano et al. and Yasunaga et al. When evaluating the reviewers independently, two attending surgeons had much higher intrarater reliability for the classification of Yasunaga et al. For the most part, our reviewers had difficulty duplicating their results 1 month apart for either classification system. To our knowledge, there are no previously published studies regarding the intraobserver reliability of the classification systems of Okano et al. and Yasunaga et al. The combined intraobserver reliability for the subjective criteria was high at 0.74. Even if raters do not agree among themselves on a congruent hip, they consistently recognize what they personally consider to be a congruent joint. Clohisy et al. reported on intrarater reliability of a subjective measure of congruency [3]. They found a combined intrarater reliability of 0.50. Thus, subjective opinion appears to produce the highest intrarater reliability when compared with other measures.
We also presumed there would be low interrater reliability for measurements of hip congruency. We found low interobserver reliability for all three methods when used to measure congruency in hips with a spectrum of hip dysplasia. This was true for the subgroup of pediatric orthopaedic fellows and the attending orthopaedic surgeons. Clohisy et al. had similar findings in their study on interobserver reliability for various hip measures [3]. They rated congruency using a subjective yes/no criterion with an AP view of the pelvis [3]. This method is similar to our subjective criteria. They found the kappa coefficient for the congruency rating was poor at 0.29. We had similar results with an interrater reliability of 0.21 using subjective criteria for congruency. As our method is based only on the qualitative judgment of whether the arc of the acetabulum matches the arc of the femoral head, it is understandable that there would be differences in opinion, resulting in a low kappa score. Okano et al. and Yasunaga et al. provided more detailed descriptions of congruency using three- and four-part classification systems [9]. In our literature review, we did not find any previous reports of the interrater reliability for the classification of Yasunaga et al. As part of a larger study, Okano et al. rated 20 hips using their method and reported an excellent interrater kappa value of 0.92 [9]. To our knowledge, this has not been reproduced in other studies. Our overall interrater kappa using the classification of Okano et al. was 0.25, reflecting low agreement in our patient population. In contrast to our reviewers, Okano et al. [10] likely are familiar with their classification system and are better able to produce similar results, and thus high kappa scores. Alternatively, the hips in our series might have had more severe dysplasia than those in the series by Okano et al., rendering their classification system less reliable among our reviewers.
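For readers less familiar with the kappa statistic discussed above, the following is a minimal sketch of how a chance-corrected agreement coefficient can be computed. It assumes Cohen's kappa for a single pair of raters, which may differ from the multirater statistic used in this or the cited studies. The function name `cohen_kappa` and the ratings are hypothetical and are not data from this study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each rated the same hips.

    Illustrative only: assumes one categorical rating per hip per rater.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of hips.")

    n = len(ratings_a)

    # Observed agreement: fraction of hips given the same rating by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Expected chance agreement, from each rater's marginal category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is undefined.
        raise ValueError("Kappa is undefined when chance agreement equals 1.")

    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of 10 hips (not study data).
rater_a = ["congruent"] * 4 + ["incongruent"] * 4 + ["congruent", "incongruent"]
rater_b = ["congruent", "congruent", "incongruent", "congruent",
           "incongruent", "congruent", "incongruent", "incongruent",
           "congruent", "congruent"]

print(round(cohen_kappa(rater_a, rater_b), 2))
```

In this hypothetical example the two raters agree on 7 of 10 hips, but half of that agreement would be expected by chance alone, so the kappa is approximately 0.4. This illustrates why raw percentage agreement can look acceptable while the kappa remains low.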
There has been increasing interest in the role of hip congruency as a surgical indication and as a prognostic factor for results after acetabular osteotomy. Traditionally, congruency has been considered a prerequisite for reconstructive osteotomy. Our observations suggest practitioners may have their own subjective understanding of what constitutes a congruent hip. Only subjective opinion was a reproducible measure of congruency for the individual surgeon, with good intrarater reliability. However, other commonly used measures of congruency have low intraobserver reliability, and all three methods have low interobserver reliability. Additional studies with more specific guidelines are needed to validate the current measures of congruency. Alternatively, a new radiologic rating of congruency with greater reproducibility among practitioners may aid in refining operative indications and understanding postoperative outcomes for osteotomies in the context of severe hip dysplasia.