The reliability and validity of the Hartofilakidis et al. classification system in adults with congenital hip disease (CHD) were examined. The radiographs of 102 adult patients (158 hips) with CHD were independently assessed by three senior surgeons. Interobserver variability was assessed by examining the agreement between the three raters while validity of the classification system was assessed by examining the agreement between the assessment by either one of the three raters and the intraoperative finding (reference standard). The interobserver agreement between the three observers was high ranging from 0.720 to 0.854 (substantial to excellent) while the agreement of the preoperative prediction with the intraoperative findings was 87.4% (K=0.823, excellent agreement). The Hartofilakidis et al. classification system reliably predicts from preoperative pelvis radiographs the bone deficiencies encountered during the operation.
In this work we set out to test the reliability and validity of the Hartofilakidis classification in the sequelae of congenital hip disease. The radiographs of 102 adults (158 hips) with congenital lesions were analysed independently by three senior surgeons, with interobserver and intraobserver validation. The interobserver results were highly significant, ranging from 0.720 to 0.854, and the agreement of the preoperative predictions with the intraoperative findings was excellent at 87.4% (K=0.823). The Hartofilakidis classification is a fully valid system that allows the lesions found intraoperatively to be predicted preoperatively.
For the classification of congenital hip disease (CHD) in adults various systems have been proposed [1, 12, 16, 17]. The Hartofilakidis et al. classification system, initially published in 1988, relies on intraoperative findings to describe the hip pathology encountered during the operation [1, 2]. The major distinguishing feature is the description of acetabular deformity (Table 1). This classification system encompasses three types of deformity in the adult hip, i.e. dysplasia, low dislocation and high dislocation (Fig. 1). The demographics and clinical presentation vary between the various CHD types. Patients with dysplasia become symptomatic later in adult life, the anatomical distortion of the femur and the acetabulum is milder, the operation is simpler and the results of total hip replacement (THR) are generally better compared to the more severe forms of the disease. In 20–50% of adults with hip osteoarthritis the underlying problem is hip dysplasia.
Ideally, a classification system in addition to being reliable should validly predict the intraoperatively anticipated structural bone deformities or abnormalities and aid treatment planning. It should also include all types of deformity, be simple, easy to memorise and accurate. Reliability is essentially the extent of the agreement between repeated measurements, and validity is the extent to which a method of measurement provides a true assessment of that which it purports to measure [3, 5].
The purpose of this study was to examine the interobserver reliability and the validity of the Hartofilakidis et al. classification of CHD in adults comparing radiographic morphology with intraoperative findings.
The anteroposterior pelvis radiographs of 102 patients with hip osteoarthritis secondary to CHD were examined. The radiographs were obtained from the senior author’s (GH) database. The total number of hips examined was 158. The study was approved by the Institutional Review Board.
In all cases the morphology of the hip joint and the pathology of hip deformity were assessed and recorded at the time of surgery by the senior author (GH). There were 22 cases (13.9%) with dysplasia, 70 cases (44.3%) with low dislocation and 66 cases (41.8%) with high dislocation.
Each radiograph was independently assessed by three experienced senior hip surgeons from different Universities of Greece. The observers were not involved in the selection of the radiographs and had no knowledge of the name, age and sex of each patient. All observers received a detailed description and a diagrammatic explanation of the Hartofilakidis et al. classification system along with a CD containing all radiographs. Interobserver testing was carried out in a blinded fashion. Another examiner then compared these results with the intraoperative findings recorded by the senior author. Interobserver reliability was assessed by examining the agreement between the three raters. The validity of the classification system was assessed by examining the agreement between the assessment by each one of the three raters and the intraoperative findings, which were used as a reference standard (Table 2).
The intraclass correlation coefficient is often used as an index of reliability in a measurement study. In these studies, there are N observations made on each of K individuals. These individuals represent a factor observed at random. This design arises when K subjects are each rated by N raters.
The intraclass correlation coefficient may be thought of as the correlation between any two observations made on the same subject. When this correlation is high, the observations on a subject tend to match, and the measurement reliability is ‘high’.
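As an illustration of the design just described (K subjects, each rated by N raters), the following is a minimal sketch of a one-way random-effects intraclass correlation computed from the usual ANOVA mean squares. It is not the analysis performed in this study: the function name `icc_oneway`, the numeric coding of the three deformity types as 1 to 3, and the example scores are all assumptions made for illustration.

```python
def icc_oneway(scores):
    """One-way random-effects ICC for K subjects each scored by N raters.

    scores: list of K rows, each a list of N numeric ratings.
    """
    k = len(scores)          # number of subjects (K in the text)
    n = len(scores[0])       # ratings per subject (N in the text)
    grand_mean = sum(sum(row) for row in scores) / (k * n)
    row_means = [sum(row) / n for row in scores]

    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = n * sum((m - grand_mean) ** 2 for m in row_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (k * (n - 1))

    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Hypothetical scores, NOT study data: 1 = dysplasia, 2 = low, 3 = high dislocation.
example_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [2, 2, 3], [1, 2, 1], [3, 3, 3]]
print(round(icc_oneway(example_scores), 3))
```

When every rater gives the same score to each subject, the within-subject mean square is zero and the coefficient equals 1, which matches the reading of a high value as high measurement reliability.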
Sample size calculation was based on the primary outcome with the aim of showing a reliability that was at least substantial (kappa>0.7); the power was set to 90%, α=0.05, and β=0.10. Using this approach, a sample size of n=three observers and K=20 X-rays per observer was calculated for each group.
Assessment of interobserver consistency was accomplished using two parameters: the proportion of agreement and the kappa coefficient as proposed by Fleiss.
The observed proportion of agreement is the percentage of instances in which the observers agreed. The kappa coefficient involves adjustment of the observed proportion of agreement by correction for the proportion of agreement which arises due to chance.
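The two parameters can be sketched as follows, assuming that "kappa as proposed by Fleiss" refers to the multi-rater Fleiss kappa (an assumption; pairwise comparisons would use Cohen's kappa instead). The function name `fleiss_kappa`, the category labels and the example ratings are hypothetical and are not data from this study.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Return (observed agreement, Fleiss kappa).

    ratings: one list per subject, holding one category label per rater.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(row) for row in ratings]  # raters per category, per subject

    # Proportion of agreeing rater pairs for each subject, then averaged.
    per_subject = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Agreement expected by chance from the overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_subjects * n_raters) for cat in categories
    ]
    p_chance = sum(p ** 2 for p in proportions)

    return p_observed, (p_observed - p_chance) / (1 - p_chance)


# Hypothetical ratings, NOT study data: D = dysplasia, L = low, H = high dislocation.
example_ratings = [
    ["D", "D", "D"], ["L", "L", "L"], ["H", "H", "H"],
    ["L", "L", "H"], ["D", "L", "D"], ["H", "H", "H"],
]
p_obs, kappa = fleiss_kappa(example_ratings, ["D", "L", "H"])
print(round(p_obs, 3), round(kappa, 3))
```

In this sketch the observed proportion of agreement is higher than kappa, because kappa subtracts the agreement that the raters' overall category frequencies would produce by chance alone.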
Each observer’s agreement with the gold standard method was examined using Cohen’s quadratic weighted kappa (K) coefficient. Interobserver agreement was assessed by calculating kappa coefficients for every possible pair of observers. Kappa is the chance-corrected proportional agreement, and possible values range from +1 (perfect agreement) via 0 (no agreement above that expected by chance) to -1 (complete disagreement). Interpretation of the data was performed according to Landis and Koch. An agreement is graded as slight (Κ=0–0.2), fair (Κ=0.21–0.40), moderate (Κ=0.41–0.60), substantial (Κ=0.61–0.80) and almost perfect (Κ=0.81–1). The observed proportion of agreement with the gold standard method among the observers was compared with the chi-square test. Pearson’s chi-square test was also used to compare the distribution of the three types of hip deformity between the right and the left hip. The level of significance was p< 0.05. The statistical analysis was performed using SPSS version 13.00 statistical package.
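A minimal sketch of the quadratic weighted kappa between one observer and the reference standard, together with the Landis and Koch grading, is given below. The study itself used SPSS; the function names `weighted_kappa` and `landis_koch`, the ordering of the categories and the example classifications are assumptions for illustration only, and the "poor" label for negative values comes from Landis and Koch rather than from the text above.

```python
def weighted_kappa(rater, reference, categories):
    """Cohen's quadratic weighted kappa for two ratings of the same subjects.

    categories must be listed in order of severity (ordinal scale).
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater)

    # Joint proportions: rows = rater's category, columns = reference category.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater, reference):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            w = ((i - j) / (k - 1)) ** 2  # quadratic disagreement weight
            weighted_obs += w * observed[i][j]
            weighted_exp += w * row[i] * col[j]
    return 1 - weighted_obs / weighted_exp


def landis_koch(kappa):
    """Grade a kappa value using the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical classifications, NOT study data (D < L < H in severity).
reference = ["D", "D", "L", "L", "H", "H"]
observer = ["D", "L", "L", "L", "H", "H"]
value = weighted_kappa(observer, reference, ["D", "L", "H"])
print(round(value, 3), landis_koch(value))
```

With quadratic weights a disagreement between dysplasia and high dislocation is penalised four times as heavily as a disagreement between adjacent types, which suits an ordinal scale of increasing deformity.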
The agreement between the gold standard method and the three observers was excellent. The overall percentage of agreement of all three observers compared with the intraoperative judgment was 87.4% (K=0.823, excellent agreement). The agreement for observer 1 was 85.7% (K=0.8, excellent agreement, p<0.0005), for observer 2 87.7% (K=0.831, excellent agreement, p<0.0005) and for observer 3 88.7% (K=0.839, excellent agreement, p<0.0005).
There was no statistically significant difference among the three observers concerning the percent of agreement with the gold standard method (Table 3).
The percentage of agreement between observer 1 and observer 2 was 80.3% (K=0.728, substantial agreement), between observer 1 and observer 3 it was 87.2% (K=0.816, excellent agreement) and between observer 2 and 3 it was 81.7% (K=0.740, substantial agreement) (Table 4).
Several classification systems have been used to describe the different types of CHD in adults [1–4]. However, the reliability of those systems has not explicitly been the focus of attention and their validity has not, to our knowledge, been reported.
This study evaluated the interobserver reliability of the Hartofilakidis et al. classification system using preoperative anteroposterior pelvis radiographs and examined its validity using the intraoperative findings as the gold standard method.
Reliability of a classification system depends on the consistency of measurements or observations and has to do with the quality of measurement or observation. It also describes the extent of the agreement between repeated measurements. A reliable classification system classifies a disease or fracture consistently but it does not necessarily reveal what in reality is happening. A valid classification system reveals the true underlying pathology of the disease or fracture [3, 5].
Validity is the best approximation to the “truth” providing a true assessment of that which it purports to measure or describe.
Reliability and validity are not independent but they are related to each other. A method may be reliable if it measures something consistently but is valid only if the result of measurement approximates the true value. Validity implies reliability but not vice versa. Reliability is a necessary but not sufficient condition for validity. A classification system may be reliable but not valid, but it cannot be valid without being reliable.
Not every commonly used classification system in orthopaedics is reliable or reproducible. Substantial variation within the ratings of several radiograph reviewers may be noted. The accuracy of any classification system, which can be considered as a measuring instrument, is described by estimating the reliability and the validity of the data. The quality of a measurement or observation can be described with the determination of reliability and validity. The more valid and reproducible a classification system is, the better the communication between clinicians and researchers, the treatment planning and the evaluation of the results of any given therapeutic intervention.
Preoperative radiographs are used as a projection of the pathology or the expected bone deficiencies encountered during hip surgery. The surgeon should be able to recognise on preoperative radiographs the anatomical abnormalities which may be encountered during the operation. This facilitates preoperative planning and assists in effectively dealing with the technical difficulties of the operation.
In our study there was significant agreement between the preoperative raters’ assessments (reliability) and the intraoperative assessment (validity) with weighted kappa values of >0.75. These results may be partially due to the fact that the reviewers, though from different institutions, were familiar with the Hartofilakidis et al. classification having performed many THRs for CHD. This may be considered as a limitation of the study; however, we assumed that the more experienced the observers the better would be the understanding of the definitions of a classification system and the prediction of the expected bone deficiencies found intraoperatively.
In tentative cases the underlying anatomy of the hip joint can be evaluated with three-dimensional (3-D) computed tomography (CT). With exclusion of the femoral head from the final image the deficiencies of the acetabulum can be readily appraised (Figs. 2 and 3).
The inter- and intraobserver reliabilities of the Crowe and Hartofilakidis classifications were recently examined by Decking et al. In this paper the radiographs of 51 patients (62 hips with CHD) were included. According to those authors both systems can be recommended because of their high reliability; however, to our knowledge, validation of any CHD classification system has not yet been performed. This is because classifications such as that of Crowe et al. cannot be validated: they are based not on the pathology of the acetabulum or the femur but rather on arbitrary assumptions or measurements performed on radiographs. The Hartofilakidis et al. classification is based on the pathology of the acetabulum as determined intraoperatively.
This study evaluated the validity of the Hartofilakidis et al. classification using as a gold standard the intraoperative findings. There was a high correlation between the preoperative radiographs and the intraoperative findings among all participants in this study. The intraoperative findings in contrast to the radiographic findings are considered soft data. Observer bias to the validation process has been kept to a minimum since all intraoperative observations were made and recorded by the senior author.
A classification is purported to measure some future performance. The success of THR in CHD, as judged by the rate of revision surgery, depends on the severity of the anatomical distortion of the acetabulum which is explicitly described by the Hartofilakidis et al. classification. Survival of the THR in patients with CHD depends on the type of the acetabular deformity [8, 11, 14, 15], acetabular component loosening being the weak link.
The Hartofilakidis et al. classification system has been shown to be reliable as well as valid in predicting the acetabular abnormalities encountered during THR. It provides a reliable estimate of the acetabular bone loss in adults with hip osteoarthritis secondary to CHD. With this classification the structural alterations of the acetabulum could be reliably predicted using standard anteroposterior pelvis radiographs.
Knowledge of the structural deformities of the acetabulum in CHD in adults and of the segmental roof or wall defects anticipated during surgery, as predicted by the Hartofilakidis classification, improves preoperative planning and may improve cup placement, anticipating an increase in the endurance and longevity of the component.
The reliability and validity of the Hartofilakidis et al. classification system relate to its ability to consistently predict the pathoanatomical changes of the acetabulum and thus to apprise the surgeon of the different defect types anticipated during surgery.
We wish to thank Antonis Galanos, M.Sc. for providing help with the statistical analysis.