A prerequisite for testing the usefulness of a classification system for LBP is that examiners can reliably classify the proposed LBP problems. In the current study, two experienced physical therapists, trained in a standardized examination for classifying LBP based on the MSI model, were able to determine the LBP classifications for a sample of people with non-specific LBP with substantial agreement.29
The current study extends our previous findings25, 28 by demonstrating that, with training, a physical therapist not involved in the development of the examination can reliably classify people with LBP. These findings also demonstrate that substantial reliability can be attained using a test-retest design instead of a simultaneous observation design as used previously.25
Such findings extend the generalizability of our initial reliability testing and suggest that, given appropriate training, other experienced physical therapists could use the examination reliably. Training in this study included self-study of a procedure manual, supervised practice of examination procedures and application of the classification rules to 10 individuals, as well as discussion. We believe, however, that learning the specific classification rules was key to attaining a high level of reliability.
We demonstrated reliability values similar to those reported for use of the MK system to classify LBP.37, 38
Two recent studies assessing the inter-tester reliability37, 38 of the MK system involved physical therapists who had received extensive training and reported a minimum of 5 years of clinical experience using the MK system. All therapists were credentialed in the examination procedures of the MK system. The first study, by Kilpikoski et al,38 used test-retest methods similar to those of our study and reported that 2 physical therapists obtained an overall percent agreement for LBP classification of 95% and a kappa value of 0.6. Razmjou et al37 used simultaneous testing to assess the reliability of 2 physical therapists and reported agreement of 93% and a kappa value of 0.7.
The reliability reported in the current study is better than or similar to that reported for physical therapists using the TBC system. Direct comparison of the current results to those of the TBC system is not possible, however, due to differences in therapist characteristics.39, 40
Heis et al39 examined the reliability of four experienced therapists who were newly trained to apply the TBC classification system. Data from one therapist were not included in the final analysis due to low agreement with the other therapists in the study. The authors reported that the agreement of the three remaining therapists was 55% with a kappa value of 0.46. Fritz et al40 also used a test-retest design to examine the reliability of therapists classifying with the TBC system. The therapists had an average of 5.5 years (range, 6 months to 15 years) of experience using the TBC system. Agreement of the seven therapists in making a classification decision was 65% with a kappa value of 0.56. Therapist training was not described. The accompanying table summarizes the results of studies of the reliability of different cohorts of therapists classifying with the three classification systems (MK, TBC, and MSI).
Inter-tester reliability of impairment-based classification systems.
We have now reported on the inter-tester reliability of physical therapists classifying LBP in two independent samples.25, 28
The methods in the current study were more rigorous than those of our prior work, yet the agreement obtained in the current study was higher than that obtained in our previous study.25, 28
The improvement in agreement is likely due to the therapists having more explicit rules for classifying than those available for our original reliability study (Appendix A). The prior study was the first attempt to test the measurement properties of the test items used in the examination, and the primary goal of the study was to examine the ability of therapists to make reliable judgments about individual items from the examination. The rules provided for classifying during the original study were more general than our present rules, and during training less emphasis was placed on learning and applying the rules for assigning an LBP classification than on making judgments about individual test items.25, 28
Information obtained from our original reliability study25, 28 and subsequent studies18–20, 22, 23, 32 has allowed us to develop more specific guidelines for making judgments during individual test items, and to develop more detailed rules for classification. Clarification of criteria for judgments during the examination and development of more specific classification rules likely contributed to our improved therapist agreement in the current study.
Currently, to assign a classification with the MSI system, symptoms must be either produced or increased with some test items during the examination. One subject reported no change in symptoms during either examination. A second subject reported no change in symptoms during the first examination and reported one test as symptom-provoking during the second examination. Following the rules for classifying, neither therapist assigned a classification to the first subject. For the second subject, the first therapist did not assign a classification, while the second therapist was able to assign one. Thus, a limitation of our current criteria for classification is that an examiner may not be able to classify subjects with a low level of symptom irritability during the examination. After the reliability analysis was completed, the charts for these two subjects were examined. In the instances where a classification was not assigned, each therapist recorded what she believed the patient’s LBP classification would be based solely on judgments of signs with tests of movement and alignment across the examination. For both subjects, the therapists agreed on the classification, even though few or no tests evoked symptoms. The criterion of symptom reproduction during the examination, therefore, may represent a limitation in the classification rules. Based on this example from the current study, the classification rules could possibly be modified to permit classification based on the signs during tests of movement and alignment across the examination in the absence of symptom production.
We did not achieve perfect agreement in classifying the LBP problems present in our sample. The therapists disagreed on the classification of five subjects. To determine the nature of our disagreements, we reviewed the data from the examination forms of the subjects for whom there was disagreement. The first disagreement is described in the previous paragraph. Two additional classification disagreements were due to the therapists’ interpretation of symptoms during individual examination items. Specifically, two subjects described “pressure” in their low back region with a number of the items. One therapist interpreted the “pressure” as the subjects’ symptoms; the other therapist did not. Thus, the classification disagreement in these two cases was a result of the therapists’ interpretation of symptom behavior. In one subject, the therapists did not agree on the patient’s symptom report on a number of test items. The differences could have been a result of subject variability or misinterpretation of symptoms by the examining therapist. For the fifth subject, inspection of the examination data showed that the therapists agreed on the responses to individual items across the examination. One therapist, however, chose a classification inconsistent with the rules. Thus, one therapist misapplied the rules to classify the subject’s LBP problem.
We consider the use of a test-retest design in this study to add to the strength of our findings. In our previous work,25, 28 both therapists were present during the assessment of each patient. One therapist performed the examination, while the second therapist observed. The simultaneous observation method used in the previous study was intended to remove any variability in patient status or in therapist methods that could affect the results of a test-retest study design. Since the prior study was our first attempt to examine any of the measurement properties of the examination and classification system, the primary question we asked was whether the therapists could make the same judgments when they saw and heard the same responses. The use of the simultaneous observation method, therefore, could have positively affected our prior inter-tester agreement.26
In the current study, each therapist performed the examination independently. Despite possible variability in patient status and variability in methods between the two therapists, our inter-tester reliability was substantial.29
The current study has limitations. First, the extent to which the current sample is representative of all individuals with LBP is not known. Our subjects were recruited from the community through advertisements, flyers placed in physician and physical therapy clinics, and a university web-based volunteer registry. The subjects in our sample, therefore, may not represent all individuals with LBP who would present to a medical facility for treatment. The subjects, however, had Oswestry scores, pain location, and pain severity similar to those of patients who are typically referred to our clinical setting. In addition, our subjects had chronic, recurrent LBP and minimal disability as indexed by scores on the Oswestry Disability Index. We do not know whether we would obtain similar results in subjects with an acute onset of LBP or with higher levels of disability. Future studies are needed to assess the use of the examination in subjects with acute LBP or higher levels of LBP-related disability.
A second potential limitation is the truncated distribution of the LBP classifications identified in the study sample. No subjects in the current sample were classified as lumbar flexion or lumbar extension. Such a finding might also suggest that our study population may not represent all patients with LBP. We do know, based on prior data22, 41 as well as those of others,42 that the prevalence of lumbar flexion and lumbar extension problems appears to be lower than that of the other proposed classifications. Although the percent agreement was 83%, the skewed distribution of subjects across categories may have attenuated the kappa value.
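The way a skewed distribution can lower kappa without changing percent agreement can be illustrated with a small computation. The following Python sketch computes percent agreement and an unweighted two-rater Cohen's kappa for two hypothetical sets of paired ratings that share the same 80% agreement but differ in how evenly subjects are spread across categories. The category labels, counts, and function names are illustrative assumptions and are not data from this study; the choice of unweighted Cohen's kappa is also an assumption made for illustration.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of subjects given the same category by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Agreement expected by chance, from each rater's marginal proportions.
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def paired_ratings(both_a, both_b, a_then_b, b_then_a):
    """Build two rating lists from the four cells of a 2x2 agreement table."""
    rater_1 = ["A"] * both_a + ["B"] * both_b + ["A"] * a_then_b + ["B"] * b_then_a
    rater_2 = ["A"] * both_a + ["B"] * both_b + ["B"] * a_then_b + ["A"] * b_then_a
    return rater_1, rater_2


# Hypothetical data: 20 subjects, 16 agreements in both scenarios.
balanced = paired_ratings(both_a=8, both_b=8, a_then_b=2, b_then_a=2)
skewed = paired_ratings(both_a=15, both_b=1, a_then_b=2, b_then_a=2)

for label, (r1, r2) in (("balanced", balanced), ("skewed", skewed)):
    print(label, round(percent_agreement(r1, r2), 2), round(cohen_kappa(r1, r2), 2))
# balanced: agreement 0.8, kappa approximately 0.60
# skewed:   agreement 0.8, kappa approximately 0.22
```

In this hypothetical example, concentrating most subjects in one category raises the agreement expected by chance, so the same observed agreement yields a lower kappa.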
A third potential limitation is that both examinations for each subject were performed within the same day. We chose to perform both examinations on the same day, however, to ensure stability of subject responses. Stability of the subjects’ behavior is an important assumption of a test-retest design so that any differences between test sessions are due to variability in therapist methods and not a result of true change in the subject over time.43
We also examined people within the same day to make participation more feasible for subjects. A potential disadvantage of repeated testing in the same day is that subjects’ symptoms could be increased during the second examination compared to the first examination. Any differences in subjects between the two sessions, however, did not substantially affect our reliability, as evidenced by the kappa value obtained (kappa = 0.75).
Finally, the generalizability of our findings to other examiners may still be somewhat limited. Both therapists were experienced in treating musculoskeletal pain problems and had practiced applying the concepts of the MSI model for LBP to patients. The first author had used the examination and treatment principles in her clinical practice for 7 years. The second author had primary responsibility for developing the examination and used the procedures extensively in prior studies. We do not know whether we would find similar reliability in examiners with less clinical experience or less experience applying the principles of the model that is the basis for the MSI classification system. Our primary purpose with the current study, however, was to examine how reliably therapists could classify when we used a more rigorous study design (test-retest design) and when someone who was not involved in the original development of the examination was tested. The current study suggests that the reliability of classifying people with LBP under more stringent conditions is actually better than that attained in our earlier reliability study. An appropriate follow-up to the current work would be to examine the inter-tester reliability of novice but trained examiners. Such work is currently underway. After a two-day instructional course, 13 examiners with no to moderate experience with the MSI classification system classified written cases of data from people with LBP. Agreement among therapists was excellent, with an overall kappa of 0.81 (CI: 0.78–0.83, p<0.01) (unpublished data).