The overall interobserver reliability of this fracture classification is better than most agreement values reported for fracture classification systems in adults. According to Landis and Koch, a kappa value of 0.66 is rated as substantial agreement [32]. It is reasonable to believe that the reliability of the fracture classification will improve in the clinical setting, where information about the patient is available.
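The Landis and Koch rating mentioned above can be written as a small lookup. The following is only an illustrative sketch: the function name `landis_koch_label` is hypothetical, and the band limits are the commonly cited ones from Landis and Koch (below 0 poor, 0.00-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, 0.81-1.00 almost perfect).

```python
def landis_koch_label(kappa: float) -> str:
    """Return the Landis and Koch agreement label for a kappa value.

    Hypothetical helper for illustration; not part of the study's analysis.
    """
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"  # Unreachable given the range check above.


# The overall kappa reported in this study.
print(landis_koch_label(0.66))  # substantial
```

These labels are conventional cut-offs rather than statistical thresholds, so they are best read as a rough guide.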
The number of categories will affect the reliability of any classification. This is obvious if we think of a classification with only one category, in which all raters necessarily agree. Adult fracture classifications often have many categories because of the various fracture patterns that can occur in brittle bone, such as intraarticular involvement and comminution. For example, the AO classification of distal radius fractures in adults has 3 types, 9 groups and 27 subgroups, and its reliability has been reported to be less than satisfactory by several authors [6] (Table ). However, intraarticular fractures and severe comminution are rare features of pediatric fractures. It is therefore possible to reduce the number of categories and increase the reliability of the classification without loss of prognostic value. For example, the Gartland classification of supracondylar humerus fractures in children has only three categories and has one of the highest reported kappa values for interobserver reliability [34].
There are very few classifications for pediatric fractures compared with the vast array of classifications that exist for fractures in adults. The Arbeitsgemeinschaft für Osteosynthesefragen (AO) has recently proposed a comprehensive fracture classification system for pediatric fractures [35]. This classification contains categories for fracture types that are unique to pediatric bone, such as bowing fractures and growth plate injuries. However, it does not distinguish between the buckle (torus) and the greenstick fracture of the distal radius. It is generally agreed that these two common pediatric fracture types are different entities that behave differently and need different treatment and follow-up [18]. In addition, the AO group has added ligamentous avulsion injuries of the wrist as a separate category. This is a very rare injury in children and was not identified when the AO classification system for pediatric fractures was validated [38].
The AO group reported a kappa value of 0.70 for metaphyseal fractures of the distal radius [38]. However, in that study there were only two categories: complete fractures, and buckle and greenstick fractures categorized together. Epiphyseal fractures were analyzed separately. The authors defined the correct classification as that chosen by most raters, and then excluded the epiphyseal fractures when analyzing the reliability for metaphyseal fractures. This raises a few concerns. A fracture classification should include all possible fracture categories for the bone in question (here, the distal radius). When confronted with an injured wrist, the clinician does not know whether the physis is involved before the radiological examination. Raters often disagree on whether the fracture involves the physis. This is certainly the case in our study, as is demonstrated in Figure . If we excluded all the growth plate injuries as defined by most raters, there would still be raters who would categorize some of the remaining fractures as physeal. Furthermore, buckle and greenstick fractures should be managed differently [18]. If these fractures are placed in the same category, the classification will not offer helpful guidance to the clinician. In addition, it is important that the sample is representative of the study population, since the kappa statistic varies with the prevalence of the categories under study [39]. When the number of categories is reduced by excluding one type of fracture, the prevalence of the different fracture types in the sample will differ from that in the population at risk, and the kappa statistic will change accordingly. It is therefore essential that the included fractures come from an unfiltered consecutive series, so that the sample is representative of the population. This is especially important when examining the reliability of classifying distal radius fractures in children, since the distribution of categories is highly uneven, with buckle fractures representing the majority of cases.
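The dependence of kappa on category prevalence can be illustrated with a short calculation. The sketch below uses Cohen's kappa for two raters as a simplified stand-in (the statistic used for the multiple raters in this study may differ), and both contingency tables are hypothetical numbers chosen for illustration, not study data. In each table the raters agree on 90 of 100 fractures, yet kappa differs markedly because the expected chance agreement rises when one category dominates.

```python
def cohen_kappa(table):
    """Cohen's kappa for two raters from a square contingency table.

    table[i][j] is the number of cases that rater A placed in category i
    and rater B placed in category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of cases on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the raters' marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical tables: 100 fractures, two categories, 90% raw agreement each.
balanced = [[45, 5], [5, 45]]  # both categories equally common
skewed = [[85, 5], [5, 5]]     # one category dominates, as buckle fractures do

print(f"{cohen_kappa(balanced):.2f}")  # 0.80
print(f"{cohen_kappa(skewed):.2f}")    # 0.44
```

With identical raw agreement, the skewed table gives a much lower kappa, which is why removing a fracture category from the sample can shift the statistic.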
Our results demonstrate that the reliability of the fracture classification depends not only on the number of categories and the prevalence of the categories in the study population, but also on the experience of the raters. Ideally, a classification system should be simple and independent of the experience of the rater. However, the effect of experience on reliability has previously been described for other classification systems [2]. The effect of experience on this particular classification is noteworthy, since these fractures are considered benign and are generally treated by the youngest doctors. The best result at the first reading in our study was achieved by the orthopedic consultants. It is worth noting that two experienced consultants had lower intraobserver agreement than the senior registrars. The senior registrars have several years of experience in fracture management and are involved in fracture classification on a daily basis. At our institution the consultants are in general not involved in the management of distal radius fractures in children, except occasionally while on call. It seems that both daily fracture management and general experience in orthopedics enhance the reliability.
Stable distal radius fractures in children are extensively monitored with both clinical and radiological follow-ups [18]. In this series of 105 consecutive fractures, 65 fractures were by consensus defined as buckle fractures. These stable fractures were given a total of 72 clinical follow-up examinations and 34 further radiological examinations. These could have been avoided with more focus on the fracture classification and better supervision. The junior registrars had a statistically significantly lower kappa value for interobserver reliability than the more experienced raters. They placed fewer fractures in the buckle group, and rated more fractures as greenstick or physeal injuries. This generated more unnecessary follow-ups, but did not risk any adverse outcome. We conclude that junior registrars overdiagnose, and safeguard themselves by placing more fractures in categories that merit a follow-up. We encourage the junior registrars to ask for a second opinion. In this way we can avoid an appointment in an overbooked fracture clinic, the child can stay in school, and the parents do not have to take time off work to take the child to hospital.
Limitations of the study
All raters in this study were selected from one institution. This may limit the generalizability of the results to other institutions, thus reducing the external validity of the study. The type and amount of instruction each rater received prior to enrollment in the study is unknown. However, at our institution no systematic instruction in classification is given to doctors treating these fractures, and we have no reason to believe that this is different at other institutions. Only one of the consultants was trained in pediatric orthopedics, and thus the findings may not be relevant for institutions where specialists in pediatric orthopedics are involved in outpatient fracture treatment.