This study tested the reliability of a 5-point ordinal scale used to grade the severity of degenerative changes of zygapophyseal (Z) joints on standard radiographs.
Modifications were made to Kellgren’s grading system to improve agreement for grading the severity of osteoarthritic changes in lumbar Z joints. These included adding 1 grade of no degeneration, multiple radiographic views, and structured examiner training. Thirty packets of radiographic files were obtained, which included representation of all 5 grades including no degeneration (0) and Kellgren’s 4-point (1 to 4) joint degeneration classification criteria. Radiographs were digitized to create a radiographic atlas that was given to examiners for individual study and blinded evaluation sessions. Intra-rater and inter-rater agreement was determined by weighted kappa (κw) from the examination of 79 Z joints (25 packets).
Using the modified scale and following training, examiners demonstrated a moderate to substantial level of inter-rater agreement (κw = 0.57, 0.60, and 0.68). Intra-rater agreement was moderate (κw = 0.42 and 0.54).
The modified Kellgren 5-point grading system provides acceptable intra- and inter-rater reliability when examiners are adequately trained. This grading system may be a useful method for future investigations assessing radiographic osteoarthritis of the Z joints.
Low back pain is a substantial contributor to global disability.1 One key source of low back pain is the lumbar zygapophyseal (Z) joint (facet joint).2–9 Several etiologies have been hypothesized for Z joint mediated pain, including osteoarthritis.9–14 Induction of degenerative changes in the nociceptor innervated Z joint (e.g., via inflammation) enhances the transmission of pain-related information15 and nociceptive behaviors as shown in preclinical osteoarthritis pain models.16–19 Although recent clinical studies suggest that older patients with severe Z joint degeneration more frequently report low back pain,20 the relationship between Z joint osteoarthritis and low back pain remains unclear,21,22 possibly due to confounding patient subpopulations.20 Thus, further work in both preclinical models and clinical subjects is needed to better understand this complex relationship. To begin to address this relationship we wanted to first determine if we could reliably grade the severity of degenerative changes of Z joints observed on radiographs of human subjects using a modified grading system. To date there are no grading scales using radiographs of the lumbar spine Z joint to assess the severity of osteoarthritis that are considered reliable for outcomes based research.23 If reliable, this scale could be a useful method to help determine the relationship between Z joint osteoarthritic changes and low back pain.
Degenerative changes in Z joints observed with radiographs are consistent with those found in other synovial joints. These changes include apophysial hypertrophy, subchondral sclerosis, osteophytosis, joint space narrowing, and joint surface irregularity.24–30 Evidence of Z joint degeneration can be visualized on standard radiographs,27–30 and grading systems that include many of these changes have been described.23,31,32 Current recommendations hold that, before a grading scale for lumbar Z joint degeneration on radiographs is implemented in outcomes research, the scale should have 3 to 5 grades (starting with 0 for no degeneration) and demonstrate a reliability coefficient (unweighted kappa, weighted kappa, or intraclass correlation) over 0.40.23 Grading degenerative changes is inherently difficult for many reasons, and to date no grading system for assessing the lumbar Z joints with standard radiographs has been found to reach this level of reliability.23 Kellgren and colleagues31,32 created radiographic grading classification systems for degenerative changes of many synovial joints, including the Z joints. These Kellgren systems could be advantageous over other radiographic grading scales assessing Z joint degeneration, which have fewer grades, lack categories that clearly describe the degenerative changes, and have levels of agreement below the threshold of acceptability.23,33 Previous reliability studies showed that when used with only a lateral (LAT) lumbar radiographic view, the Kellgren and Lawrence31 5-point scale was not reliable for assessing lumbar Z joint degenerative changes. Kellgren32,34 later developed a more detailed 4-point classification scale (removing no degeneration as grade 0) for the grading of Z joints in the cervical spine.
Modifications of this scale to a 5-point classification by including a grade of no degeneration using only a LAT view of the cervical spine was at the threshold of acceptable reliability for outcomes research;23,34 however, the authors suggested that additional views may improve agreement when assessing degenerative changes of the Z joint.34 This is in agreement with previous studies, wherein investigators who performed radiographic studies of Z joints emphasized the need for multiple views of the assessed region.21,26,29,33,35
The most useful radiographic view for the visualization of the lumbar Z joints is the oblique (OBL) due to the joint orientation.29,30,33 Lumbar OBL views improve visualization of the joint space and assessment of osteoarthritic changes.29,30,33 Herein, we evaluated the reliability of a modified version of Kellgren’s detailed 4 grade system of cervical Z joint degenerative changes by using anterior-posterior (AP), LAT, and OBL lumbar radiographs. Because most of the radiologic findings associated with degeneration of the cervical Z joints also apply to the lumbar region,30 we hypothesized that modification of Kellgren’s methods by adding a grade of no degeneration (0) to create 5 grades, plus the addition of multiple views, and trained examiners would result in acceptable reliability when applied to the lumbar spine.
The purpose of this study was to test the reliability of a 5-point ordinal scale that grades the severity of degenerative changes of Z joints on standard radiographs. If reliable, these methods could be used to assess Z joint degeneration for research and clinical purposes.
The National University of Health Sciences Institutional Review Board approved this project. Packets of radiographs were obtained from a review of the file database from the Department of Diagnostic Imaging over the previous 4 years. Selection of the radiographs was carried out by the primary investigator (JL) and the Director of Diagnostic Imaging (DDI, JR), neither of whom was an examiner in the reliability studies. Attempts were made to include all degenerative grade classifications of Kellgren’s criteria (Figure 1). Radiographs were excluded based on poor technical factors, anatomic anomalies, and overlying pathology that hindered visualization of the articular processes or Z joints. Each radiograph could contain up to 4 Z joints to grade (left and right L4-L5 and left and right L5-S1), and different grades could exist at different Z joints. To ensure patient privacy and blinded evaluation, the identification plates were covered and the radiographs were housed in blank jacket covers. In compliance with HIPAA and to prevent examiner bias, the patient’s clinical condition was withheld.
Using these selection criteria, evaluation packets were compiled for the reliability study. Each packet contained 3 to 5 radiographic views from the same patient: anterior-posterior (AP) (including AP lumbosacral angulated spot for L5-S1 vertebral levels), lateral (LAT), and 1 oblique (OBL) for each side being assessed. For example, if the left L4-L5 Z joint only was assessed, the packet would include 3 views: AP, LAT, and left OBL. If the left L4-L5 and right L5-S1 Z joints were assessed on the same patient, then the following 5 views would be included in the packet: AP, AP Spot (for L5-S1), LAT, left OBL and right OBL. Twenty-five packets with 79 Z joints were used for the examination portion of the study. The mean age of the patients was 56.2 years (18 males and 7 females). An additional 5 packets were used for training the examiners. A radiographic atlas was compiled from the training packets with representation of every grade. This atlas was provided to the examiners to use for review and as a resource during examination sessions.
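The packet composition rule described above can be sketched in a few lines of Python. This is only an illustration of one reading of the text: the function name `packet_views` and the (side, level) encoding of a joint are hypothetical, and the assumption that the AP spot view is included only when an L5-S1 joint is assessed is inferred from the two worked examples in the paragraph.

```python
def packet_views(joints):
    """Return the radiographic views for one packet.

    `joints` is an iterable of (side, level) pairs, for example
    ("left", "L4-L5"). The encoding is hypothetical, chosen for this sketch.
    """
    joints = list(joints)
    views = ["AP"]
    # Assumption: the angulated AP lumbosacral spot view is added only
    # when at least one L5-S1 joint is being graded.
    if any(level == "L5-S1" for _, level in joints):
        views.append("AP Spot")
    views.append("LAT")
    # One oblique view for each side being assessed, left before right.
    for side in ("left", "right"):
        if any(s == side for s, _ in joints):
            views.append(f"{side} OBL")
    return views


print(packet_views([("left", "L4-L5")]))
print(packet_views([("left", "L4-L5"), ("right", "L5-S1")]))
```

Under these assumptions the two calls reproduce the 3-view and 5-view examples given in the text.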
Three examiners were used in the study: a chiropractic radiologist and 2 chiropractic radiology residents. The 2 residents (examiners 1 and 2) were the “primary examiners” and examiner 3 was used as an expert for comparison to the residents’ grading.
The examiners received instruction in the modified Kellgren’s 5-point classification system (Table 1) through 3 training sessions in a 1-week period. The training sessions were led by the DDI using the 5 training packets. Examiners viewed the training radiographs with the DDI and received the radiographic atlas for further independent study. Upon completion of training sessions the examiners scheduled times to grade the radiographs for the study. Each examiner received the packets in the same order and graded degenerative changes independently. Examiners were allowed to refer to the atlas during the grading session. Examiners were blinded to others’ results.
Examiners 1 and 2 performed a second examination of all radiographs. The 2 grading sessions were separated by an interval of 3 weeks. During the 3-week interval, these examiners were encouraged to implement the grading criteria during their daily radiographic reading of clinic patients. The examiners were also given 2 review sessions by the DDI prior to grading the films for the second time. Following the first grading session, the packets were randomized for the second grading session using a random number generator.
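The re-ordering of packets between sessions could be reproduced along the following lines. This is a minimal sketch and not the procedure used in the study: the text states only that a random number generator was used, so the function name `randomized_order`, the packet numbering, and the seed value are all assumptions made for illustration.

```python
import random


def randomized_order(packet_ids, seed):
    """Return a shuffled copy of the packet identifiers.

    A fixed seed (hypothetical here) makes the order reproducible, so the
    same presentation order can be documented and audited later.
    """
    rng = random.Random(seed)   # independent generator, does not touch global state
    order = list(packet_ids)    # copy so the caller's sequence is left unchanged
    rng.shuffle(order)
    return order


# Hypothetical packet numbers 1 to 25 for the second grading session.
session_2_order = randomized_order(range(1, 26), seed=2024)
print(session_2_order)
```

Shuffling a copy keeps the session 1 order intact, which matters when grades from the two sessions are later paired joint by joint for the intra-rater comparison.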
Data collected from the first and second grading sessions were analyzed for inter-rater reliability for examiners 1 and 2. Data from the second session were compared to the first grading session for intra-rater reliability for examiners 1 and 2. The single grading session performed by examiner 3 was analyzed for inter-rater reliability with both primary examiners for sessions 1 and 2.
The data were analyzed descriptively to calculate percent agreement between examiners and sessions. Inferential analysis was performed to calculate weighted kappa (κw) scores36 across the 5 grade categories using MedCalc software. Agreement was interpreted as follows: ≤ 0 = poor, 0.01-0.20 = slight, 0.21-0.40 = fair, 0.41-0.60 = moderate, 0.61-0.80 = substantial, 0.81-1.0 = almost perfect agreement. κw provides weighted values to account for the degree of disagreement between 2 observers and is the preferred method for evaluating ordinal data for reliability,36 as previously reported in imaging reliability studies of Z joint osteoarthritis grading scales.23,37
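As a rough illustration of the statistic described above, the following Python sketch computes Cohen's weighted kappa for two lists of grades coded 0 to 4 and maps the result onto the agreement bands listed in the text. It is not the MedCalc implementation used in the study. The function names are hypothetical, and the default of linear disagreement weights is an assumption, since the text does not state which weighting scheme was applied.

```python
def weighted_kappa(grades_a, grades_b, n_categories=5, weights="linear"):
    """Cohen's weighted kappa for two raters grading the same joints.

    Grades are integers 0 .. n_categories - 1. `weights` is "linear"
    (assumed default) or "quadratic".
    """
    if len(grades_a) != len(grades_b) or len(grades_a) == 0:
        raise ValueError("both raters must grade the same non-empty set of joints")
    n = len(grades_a)
    k = n_categories
    # Observed proportion of joints in each (rater A grade, rater B grade) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(grades_a, grades_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    power = 1 if weights == "linear" else 2
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # 0 on the diagonal, 1 at the extremes
            observed_disagreement += w * observed[i][j]
            expected_disagreement += w * row[i] * col[j]
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use a single grade")
    return 1.0 - observed_disagreement / expected_disagreement


def interpret_kappa(kappa):
    """Map a kappa value onto the agreement bands listed in the text."""
    if kappa <= 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Made-up grades for eight joints, for illustration only.
rater_1 = [0, 1, 2, 3, 4, 2, 1, 0]
rater_2 = [0, 2, 2, 1, 4, 3, 1, 0]
kw = weighted_kappa(rater_1, rater_2)
print(round(kw, 2), interpret_kappa(kw))
```

Because the weights grow with the distance between grades, a 1-grade disagreement is penalized less than a 3-grade disagreement, which is why a weighted statistic suits an ordinal scale better than unweighted kappa.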
Table 2 summarizes the results of grading the 79 Z joints. The level of inter-rater reliability for (complete) agreement between the primary examiners was moderate (κw = 0.60, 49.4%, 39 joints). Agreement within 1 grade difference was 89.9%, an additional 40.5% (30 additional joints). These scores are above the recommended threshold for acceptable level of agreement (κw > 0.40) for grading lumbar facet degeneration with standard radiographs.23
Intra-rater agreement was evaluated for examiners 1 (κw = 0.42) and 2 (κw = 0.54). Examiner 1 had 39.0% agreement (31 joints) and an additional 40.5% agreement with 1 grade difference (32 joints; agreement within 1 grade = 79.7%). Examiner 2 had 33.8% agreement (27 joints) and an additional 52.5% agreement with 1 grade of difference (42 joints; agreement within 1 grade = 86.3%).
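The descriptive agreement figures reported here (complete agreement, agreement with 1 grade of difference, and their sum) can be derived from two lists of grades as in the sketch below. The function name `agreement_summary` and the example grades are hypothetical and are not study data. The same function applies to two examiners in one session or to one examiner across two sessions.

```python
def agreement_summary(grades_a, grades_b):
    """Percent complete agreement and agreement within 1 grade.

    `grades_a` and `grades_b` are equal-length lists of integer grades
    for the same joints, in the same order.
    """
    if len(grades_a) != len(grades_b) or len(grades_a) == 0:
        raise ValueError("both lists must cover the same non-empty set of joints")
    n = len(grades_a)
    exact = sum(1 for a, b in zip(grades_a, grades_b) if a == b)
    one_off = sum(1 for a, b in zip(grades_a, grades_b) if abs(a - b) == 1)
    return {
        "exact_pct": 100.0 * exact / n,
        "one_grade_pct": 100.0 * one_off / n,
        "within_one_pct": 100.0 * (exact + one_off) / n,
    }


# Made-up grades for eight joints, for illustration only.
first_reading = [0, 1, 2, 3, 4, 2, 1, 0]
second_reading = [0, 2, 2, 1, 4, 3, 1, 0]
print(agreement_summary(first_reading, second_reading))
```

Percent agreement does not correct for chance, so it is best read alongside κw rather than in place of it.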
Inter-rater reliability was also evaluated for both sessions for examiners 1 and 2 with examiner 3. In session 1, examiner 1 demonstrated fair agreement (κw = 0.37, 27.8% agreement, 61.9% within 1 grade) as did examiner 2 (κw = 0.39, 16.4% agreement, 69.5% within 1 grade). Session 2 revealed moderate agreement for examiner 1 (κw = 0.57, 32.9% agreement, 94.9% within 1 grade) and substantial agreement for examiner 2 (κw = 0.68, 50.6% agreement, 94.9% within 1 grade).
We assessed the reliability of a modified Z joint osteoarthritis scale described by Kellgren32 when applied to lumbar spine radiographs. The modifications included adding a grade of 0 (no degeneration), additional views for adequate visualization of the Z joint, and formalized training sessions with compilation of an atlas for independent study and review during the grading sessions. Inter-rater reliability between primary examiners and an expert examiner (i.e., 1 and 3, 2 and 3) from session 1 to session 2 demonstrated a large increase in percent agreement within 1 grade and substantial agreement for session 2 (mean κw = 0.63). This suggests that examiners 1 and 2 were making more informed decisions with increased training and experience. The inter-rater reliability scores from session 2 and between examiners 1 and 2 were above the recommended threshold of acceptability for grading lumbar Z joint degenerative changes from radiographs.23 Based upon these findings we consider this method of grading Z joint degeneration to have acceptable reliability for clinical research.23,36 However, this level of agreement is at the threshold of acceptability, and reliably grading Z joint osteoarthritic changes using radiographs is difficult;23 consequently, we would also recommend that for outcomes based research, grading be performed independently by 2 trained examiners who then come to a consensus grade on any scoring disagreements.
The increase in inter-rater agreement with examiner 3 was accompanied by diminished intra-rater reliabilities, as would be anticipated from the “continued learning” of the 2 primary examiners. Consequently, intra-rater agreement for the examiners between the 2 sessions was lower than inter-rater agreement. One reason for this could be the level of training of the 2 primary examiners. An a priori assumption is that an experienced radiologist will evaluate radiographs more consistently and with more accuracy than those with less training and experience. Studies have also demonstrated that training can improve the performance of examiners.38,39 We are convinced that our examiners demonstrated improvement in performance, as evidenced by comparing the inter-rater reliability between the third (experienced) examiner and the 2 primary examiners for the first (κw = 0.37, 0.39) and the second sessions (κw = 0.57, 0.68). This may be explained by the additional training before the second session. Possible extensions of these data could be to test the hypothesis that additional training would further improve inter-rater reliability. Even with these considerations, the intra-rater reliability scores, like the inter-rater scores, were above the recommended threshold of acceptability for grading degeneration of lumbar Z joints from standard radiographs.23
Although we found modifications of the Kellgren methods (i.e., using a 5-point scale, additional radiographic views, and focused training of examiners) provided adequate reliability for outcomes based research, there are limitations to this approach. Clearly, a reliability study cannot demonstrate validity of a scale, thus further studies (e.g., determining the relationship between the severity of radiographic and histopathological osteoarthritic changes in animal studies and cadaveric studies) are required to address this issue. Although using radiographs is acceptable in delineating the presence from the absence of degenerative change,33 radiographic assessment of Z joints underestimates the severity of degenerative changes compared to assessment using advanced imaging techniques (i.e., computerized tomography, CT, and magnetic resonance imaging, MRI).33,40–42 Thus, important extensions of this work are to examine reliability of modifications to ordinal grading scales intended for advanced imaging techniques, which may improve reported reliability scores23,37,43–45 and subsequent clinical studies.
The clinical feasibility of our approach is also somewhat limited. Recent guidelines have concluded that the routine use of all diagnostic imaging is not indicated for the treatment of non-specific low back pain without certain red flags.46–48 This limits the clinical use of the scoring system reported here as lumbar radiographs are only recommended for patients when there is a suspicion of serious disease (e.g., cancer, infection, immunosuppression), they are surgical candidates or have had prior lumbar surgery, and they present “with one or more of the following: low-velocity trauma, osteoporosis, focal and/or progressive deficit, prolonged symptom duration [no improvement after 6 weeks of treatment], age > 70 years.”48 Notably, the age recommendation (> 70 years) suggests some potential clinical use of this scale as there is evidence that supports a relationship between the severity and extent of Z joint degeneration and reports of low back pain in older individuals (mean age 67),20 which is the age group reported to have the highest prevalence of chronic low back pain that is relieved by Z joint nerve blocks.49 Another limitation to consider before implementing this modified scale clinically is the risk of additional radiation exposure from adding OBL views. Although this is an important consideration, the use of an alternative imaging modality for assessing osteoarthritis such as CT has a higher radiation dose (6 versus 1.5 millisieverts) and is more expensive (at least 6 times) than standard radiographs.50,51 This scale may have clinical importance if a relationship between Z joint degeneration and low back pain is clarified, a hypothesis which requires more study. Considering these current limitations, we cannot recommend the general use of this modified scale for clinical practice. 
This, in addition to the specialized training and multiple examiners needed for acceptable reliability, supports the use of this approach primarily for outcomes based research as a method for examining the role of Z joint degeneration in subpopulations of low back pain patients.
We have previously performed preclinical histopathological animal studies evaluating the effects of spinal hypomobility on degenerative changes in rat spinal joints.52 Future studies are planned to compare degenerative changes found in human Z joints using the methods described in this study with microscopic and radiographic findings of Z joints using our spinal osteoarthritis animal model.52 Initial work to test the reliability of applying this method to our preclinical model of Z joint osteoarthritis would be the first step to assess the actual degeneration indicated by radiographic findings. Other ongoing mechanistic studies are assessing the effects of spinal manipulation on the time profile of degenerative changes in the same animal model. Radiographic findings from these animals could then be used to begin the process of translating findings from the animal studies to humans with similar levels of degeneration.
A reliable assessment of degenerative changes is useful in understanding the mechanism of action and clinical effects of spinal manipulation and how these are affected by the severity of Z joint degeneration. Cramer et al. demonstrated that the Z joints gap with chiropractic adjustment in healthy53,54 and low back pain55 subjects, which helps to provide understanding of a potential mechanism of this therapeutic intervention. Future studies evaluating the mechanism and clinical effects of spinal manipulation could use similar approaches for grading degenerative change used in this study, and/or apply similar approaches to advanced imaging (e.g. CT, MRI). Such methods could help assess the relationship between Z joint degeneration and Z joint gapping following manipulation and/or assess the relationship between Z joint degeneration and patient response (i.e., reduction of pain and improvement of function) to manipulation.
Z joint osteoarthritis is now considered a disease involving the whole joint, including the articular cartilage, subchondral bone, ligaments, capsule (i.e., external and synovial membrane), periarticular paraspinal muscles, and soft tissues;10,56 many of these structures are not optimally observed with radiography. Thus, further considerations should also include advanced and emerging alternative imaging approaches such as CT, MRI, and diagnostic ultrasound10,57 that enable visualization and assessment of a broader spectrum of osteoarthritic changes to these Z joint related structures. Such imaging modalities provide intriguing options that could bypass some limitations of radiography41 and allow for a better assessment of the relationship between Z joint osteoarthritis and low back pain.
In summary, future work should examine if modifications of grading scales to assess the severity of lumbar Z joint degeneration that are designed for more sensitive advanced imaging modalities can also provide better reliability. These key methodological studies will help to develop a critical foundation for assessing the severity of Z joint osteoarthritis that may allow for a better understanding of the relationship between Z joint degeneration and low back pain.
This study suggests that the modified Kellgren grading scale provides acceptable reliability and a more specific assessment of Z joint degeneration when appropriate radiographic views are included and multiple adequately trained examiners come to a consensus. These methods are likely most useful for future investigations assessing the relative degeneration of the Z joints in preclinical and clinical low back pain studies.
Funding for this project was provided by the National Institutes of Health/National Center for Complementary and Alternative Medicine (grant # 3R01AT000123, parent grant # 2R01AT000123).
We gratefully acknowledge Matthew Budavich, BA for support with reference management.
CONFLICTS OF INTEREST
No conflicts of interest were reported.
Contributorship:
Concept development (provided idea for the research): JWL, GDC, JAR, EEL, JPDS, KL
Design (planned the methods to generate the results): JWL, GDC, JAR, EEL, JPDS, KL
Supervision (provided oversight, responsible for organization and implementation, writing of the manuscript): JWL, GDC, JAR, EEL, JPDS, KL
Data collection/processing (responsible for experiments, patient management, organization, or reporting data): JWL, GDC, JAR, EEL, JPDS, KL
Analysis/interpretation (responsible for statistical analysis, evaluation, and presentation of the results): JWL, GDC, JAR, EEL, JPDS, KL
Literature search (performed the literature search): JWL, GDC, JAR, EEL, JPDS, KL
Writing (responsible for writing a substantive part of the manuscript): JWL, GDC, JAR, EEL, JPDS, KL
Critical review (revised manuscript for intellectual content, this does not relate to spelling and grammar checking): JWL, GDC, JAR, EEL, JPDS, KL.
Joshua W. Little, Assistant Professor, Center for Anatomical Science and Education (CASE), Department of Surgery, Saint Louis University School of Medicine.
Thomas J. Grieve, Instructor, Department of Clinical Sciences, National University of Health Sciences.
Gregory D. Cramer, Professor and Dean of Research, Research Department, National University of Health Sciences.
Jeffrey A. Rich, Radiologist, Northwestern Health Sciences University.
Evelyn E. Laptook, Assistant Professor, National University of Health Sciences.
Joseph P.D. Stiefel, President, National University of Health Sciences.
Kathleen Linaker, Executive Director, Chiropractic Programs, Department of Chiropractic, D’Youville College.