The responsiveness of an instrument measuring health-related quality of life is an important indication of its construct validity. The SRS-22 Patient Questionnaire has become the most widely used patient-reported outcome instrument in the clinical evaluation of patients with idiopathic scoliosis. The responsiveness of the SRS-22 following surgical treatment in patients with idiopathic scoliosis has not been fully assessed. The aim of this study is to evaluate this property by calculating the minimal important differences (MIDs) of the SRS-22 Questionnaire. The study included 91 patients with idiopathic scoliosis (77 females and 14 males) who underwent surgical treatment; mean age at the time of surgery was 18.1 years. Patients completed the SRS-22 questionnaire before surgery and at a follow-up visit (mean follow-up, 45.6 months). At follow-up, patients rated their overall situation relative to before surgery on a four-point Likert scale: 1, Worse; 2, Same; 3, Better; 4, Much Better. This rating represented the global perceived effect (GPE) and served as the anchor criterion for calculating the MID. MIDs were calculated using two approaches. The anchor-based MID (MID-A) was defined as the mean preoperative/follow-up difference in SRS-22 scores in the group of patients who stated they were much better than before surgery (GPE = 4). Using the same anchor criterion, the optimal cut-off value for identifying patients who had clearly improved was determined on a receiver operating characteristic (ROC) curve. In addition, the distribution-based MID (MID-D) was calculated by the standard error of measurement method. The MID-As found for the different subscales and the sum score were: pain 0.6, function 0.3, image 1.3, mental health 0.3, average sum score 0.6, and raw sum score 13.1. The cut-off values on the ROC curve were: pain 0.2, function 0.0, image 1.6, mental health 0.4, average sum score 0.4, and raw sum score 10.
The MID-Ds were: pain 0.6, function 0.8, image 0.5, mental health 0.4, average sum score 0.5, and raw sum score 6.8. As expected, the MID values differed according to the calculation method used. Because the MID-As for the function and mental health subscales are below the measurement error of the instrument, it seems preferable to use the MID-D values for determining subscale changes. If the purpose is to analyze sum score changes (either the raw or average values), the MID-A is preferable because it includes the patient’s evaluation of the results of surgery.
The SRS-22 Patient Questionnaire has become the most widely used patient-reported outcome (PRO) instrument for evaluating individuals with idiopathic scoliosis. It has been properly validated in both adolescents [2, 4] and adults , and adaptations in several languages are now available [1, 6, 12, 13, 19, 22]. To determine the validity of the SRS-22, the internal consistency, test–retest reliability, factorial analysis, floor and ceiling effects, and convergent-discriminant validity have been tested [4, 10, 14].
Responsiveness is another important property related to the construct validity of the instrument. The simplest way to assess this characteristic is to determine whether clinically relevant changes are associated with statistically significant differences in the scale scores over time. Along these lines, Asher et al.  analyzed the responsiveness of the SRS-22 questionnaire in 58 patients who underwent surgery for idiopathic scoliosis. Patients were assessed at 3, 6, 12 (only 38 patients), and 24 (only 19 patients) months. Statistically significant differences were found in the pain, function, and image subscales, and in the total score. There were no significant changes in the mental health subscale. Bridwell et al.  analyzed a series of 56 adult patients 2 years after the surgical intervention. These authors found a significant change in all the SRS-22 subscales and in the total score. In addition, the change in the SRS-22 Questionnaire was greater than that found for the Oswestry Disability Index and the scores of the SF-12 components.
Nonetheless, to interpret the effect of a treatment, it is necessary not only to determine whether there is a statistically significant difference but also to assess how relevant the effects are for the patients [16, 26]. For this reason, it is preferable to evaluate an instrument’s responsiveness by determining the relationship between the score changes and the patient’s self-reported clinical changes over time . This approach leads to determination of the minimal important difference (MID), that is, the smallest difference in the score of the outcome instrument that informed patients perceive as important [11, 15, 16].
Currently, there is no consensus as to the best method for determining the MID. For a specific PRO, it is recommended that a range of MIDs be obtained with different methods . Two approaches are most commonly used for this purpose: anchor-based methods and distribution-based methods. Anchor-based methods compare the change in the PRO being examined with another measure of change. Most commonly, patients are asked to rate the global perceived effect (GPE) of the intervention. From these ratings, the MID is usually calculated as the mean change in scores among patients who have clearly improved. This MID is also termed the minimal clinically important difference . Another anchor-based method is to determine the MID that best differentiates between patients who have clearly improved and the remaining patients. To this end, the optimal cut-off point is calculated on a receiver operating characteristic (ROC) curve.
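A minimal Python sketch of the mean-change anchor-based MID as described above: score changes are grouped by GPE level, and the MID-A is taken as the mean change in the GPE = 4 ("much better") group. The function names and the five patients' scores are hypothetical illustrations, not data from this study.

```python
from statistics import mean


def mean_change_by_gpe(pre, post, gpe):
    """Mean (follow-up minus preoperative) score change for each GPE level present."""
    if not (len(pre) == len(post) == len(gpe)):
        raise ValueError("pre, post and gpe must have the same length")
    groups = {}
    for before, after, level in zip(pre, post, gpe):
        groups.setdefault(level, []).append(after - before)
    return {level: mean(changes) for level, changes in sorted(groups.items())}


def anchor_based_mid(pre, post, gpe, anchor_level=4):
    """MID-A: mean change among patients at the anchor level (GPE = 4, "much better")."""
    by_group = mean_change_by_gpe(pre, post, gpe)
    if anchor_level not in by_group:
        raise ValueError("no patients at the anchor level")
    return by_group[anchor_level]


# Hypothetical raw sum scores (range 20-100) for five patients, not study data.
pre = [62, 70, 58, 75, 66]
post = [78, 80, 63, 74, 79]
gpe = [4, 4, 3, 2, 4]
print(mean_change_by_gpe(pre, post, gpe))  # {2: -1, 3: 5, 4: 13}
print(anchor_based_mid(pre, post, gpe))    # 13
```

Returning the mean change for every GPE level, not only the anchor group, makes it easy to check that the changes rise across GPE categories before settling on the anchor group.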
The distribution-based methods are derived from the concept that the MID can be estimated from the distribution of the scores . One commonly used method is calculation of the standard error of measurement (SEM), that is, the measurement error inherent to the instrument. When calculating the SEM, an estimator of reliability is included (e.g., the intraclass correlation coefficient, ICC) . This type of MID is also known as the minimal detectable change .
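This section does not state which standard deviation, which reliability estimate, or which multiplier were used for the MID-D; those details are not reproduced here. As an assumption, the minimal Python sketch below uses the widely cited forms SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM, where the factor 1.96 corresponds to the 95% certainty level mentioned for the MID-D. The function names, scores, and ICC are hypothetical.

```python
import math
from statistics import stdev


def standard_error_of_measurement(scores, icc):
    """SEM = SD * sqrt(1 - ICC), with SD the sample standard deviation of the scores."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return stdev(scores) * math.sqrt(1.0 - icc)


def minimal_detectable_change(sem, z=1.96):
    """MDC = z * sqrt(2) * SEM; sqrt(2) reflects error in both the pre and the post score."""
    return z * math.sqrt(2.0) * sem


# Hypothetical subscale scores and ICC, not values from this study.
sem = standard_error_of_measurement([2.0, 3.0, 4.0], icc=0.84)
print(round(sem, 3))                             # 0.4
print(round(minimal_detectable_change(sem), 3))  # 1.109
```

Some studies use 1.96 × SEM without the √2 factor, which gives a smaller threshold; the sketch does not establish which variant underlies the MID-D values reported in this paper.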
The aim of this study is to calculate the MID of the SRS-22 Questionnaire with the use of two anchor-based methods (mean score change and optimal cut-off value) using the GPE as the external criterion for change, and with a distribution-based method, calculation of the SEM.
Patients were enrolled at the two participating centers. The eligibility criteria were the following: a diagnosis of idiopathic scoliosis and scheduled surgical treatment to correct the deformity, age 10 to 40 years, a suitable radiologic study, and completion of the SRS-22 questionnaire. Between July 2001 and July 2006, 97 patients from the two centers were included in the study. At least 2 years after surgery, patients were contacted for a radiological and clinical review for the study.
The SRS-22 contains 22 questions covering 5 domains: function/activity 5 items; pain 5 items; self-perceived image 5 items; mental health 5 items; and satisfaction with treatment 2 items. Each item is scored from 1 (worst) to 5 (best). Each domain has a total sum score ranging from 5 to 25, except for satisfaction, which ranges from 2 to 10. The sum of the first 4 domains gives a maximum subtotal of 100, and when the satisfaction domain is included, the maximum total is 110. In the present paper, results are expressed as the mean (total sum of the domain divided by the number of items answered) for each domain and the subtotal score. Because the aim of this study is to analyze the MIDs of the different subscales, data on satisfaction had to be excluded, as most patients did not answer the questions about satisfaction before surgery. Thus, for the present study, the terms sum score and total score refer to the scores for the group of four scales, excluding the satisfaction scale. The total score was presented in three ways: the raw sum score without satisfaction (possible range 20–100), the average sum score (raw score/20), and the raw score percentage of improvement. The questionnaire used was the revised version, which includes the modification of question 18 [5, 7].
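The scoring rules above can be sketched in Python as follows. The input layout (five item responses per domain, keyed by hypothetical domain names) is an assumption, and so is requiring all 20 items for the raw sum score, since missing-item handling for the sum is not described. The section also does not define the percentage of improvement; change relative to the baseline raw sum score is assumed here.

```python
# Domains entering the sum score; satisfaction is excluded, as in this study.
DOMAINS = ("function", "pain", "image", "mental_health")


def domain_score(responses):
    """Domain mean: sum of answered items (1-5) divided by the number answered."""
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no items answered")
    if any(not 1 <= r <= 5 for r in answered):
        raise ValueError("item responses must lie between 1 and 5")
    return sum(answered) / len(answered)


def raw_sum_score(items_by_domain):
    """Raw sum of the 20 items of the four domains (range 20-100).

    Assumption: every item must be answered.
    """
    total = 0
    for domain in DOMAINS:
        items = items_by_domain[domain]
        if len(items) != 5 or any(r is None or not 1 <= r <= 5 for r in items):
            raise ValueError(f"{domain}: expected 5 answered items scored 1-5")
        total += sum(items)
    return total


def average_sum_score(raw):
    """Average sum score = raw sum score / 20."""
    return raw / 20


def percent_improvement(pre_raw, post_raw):
    """Assumed definition: change as a percentage of the baseline raw sum score."""
    return 100.0 * (post_raw - pre_raw) / pre_raw


# Hypothetical item responses for one patient, not study data.
patient = {
    "function": [4, 4, 5, 4, 3],
    "pain": [3, 3, 4, 3, 2],
    "image": [2, 3, 3, 2, 3],
    "mental_health": [4, 4, 4, 3, 4],
}
raw = raw_sum_score(patient)
print(raw, average_sum_score(raw))             # 67 3.35
print(domain_score(patient["image"]))          # 2.6
print(round(percent_improvement(raw, 80), 1))  # 19.4
```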
The assessment of GPE was performed at the follow-up visit. Patients were asked to rate their overall situation in relation to before surgery with a 4-point Likert scale: 1, Worse; 2, Same; 3, Better; or 4, Much Better.
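MIDs were determined using three different methods: (1) the anchor-based MID (MID-A), defined as the mean preoperative/follow-up difference in SRS-22 scores among patients who rated themselves much better than before surgery (GPE = 4); (2) the optimal cut-off value on a receiver operating characteristic (ROC) curve for identifying patients who had clearly improved (GPE = 4 vs. GPE < 4); and (3) the distribution-based MID (MID-D), calculated by the standard error of measurement (SEM) method.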
Of the 97 patients included in the study, 4 were lost to follow-up and 2 refused to participate in the follow-up visit. Therefore, the analysis is based on 91 patients (77 females and 14 males; 93.8% response rate) with a mean age at the time of surgery of 18.1 years (range 10–38 years). The mean follow-up time was 45.6 months (range 24–87 months). All patients were diagnosed with idiopathic scoliosis and were scheduled to receive surgical treatment. The curve pattern was determined according to the Lenke classification : 32 cases were type 1, 17 type 2, 10 type 3, 5 type 4, 10 type 5, and 17 type 6. The mean magnitude of the upper thoracic curve was 49.8°, the main thoracic curve 61.5°, and the thoracolumbar/lumbar curve 60.8°. Surgery consisted of posterior spinal fusion and instrumentation in 74 cases, anterior spinal fusion and instrumentation in 8 cases, and combined anterior and posterior spinal fusion and instrumentation in 9 cases. At the follow-up visit, the average magnitudes of the upper thoracic, main thoracic, and thoracolumbar/lumbar curves were 32.8°, 29.4°, and 30.1°, respectively.
The scores for the SRS-22 scales before surgery and at follow-up are shown in Table 1. A statistically significant improvement was seen in the sum score and in the pain, image, and mental health scales, whereas the improvement in the function scale was not significant. The mean percentage improvement of the raw sum score was 13.9% (range −27.6 to 75.4%).
In the follow-up interview, 47 patients (51.6%) considered they were much better than before surgery (GPE = 4), 37 considered they were better (GPE = 3), 5 stated they were the same (GPE = 2), and 2 stated they were worse (GPE = 1). The validity of the scale assessing the global perceived effect was supported by the correlation between the GPE ratings and the preoperative/follow-up difference in the SRS-22 raw sum score (r = 0.4, P = 0.0001).
The mean SRS-22 scale scores and sum score (raw and average) for the group with an evident improvement (GPE = 4) and for the remaining patients (GPE < 4) are shown in Table 2. In addition, the percentage of improvement of the raw sum score is reported for each group. The mean score changes in the GPE = 4 group were significantly greater (t test, P < 0.05) than those in the GPE < 4 group for all the subscales and the sum score. The mean score change of the GPE = 4 group represents the minimal important difference using GPE as the anchor criterion (MID-A). The SEM was calculated using the method described in “Methods”, and the minimal important difference was determined with the distribution-based method (MID-D).
The MID-A and MID-D, as well as the optimal cut-off points of the ROC curves with their respective sensitivity, specificity, and area under the curve (AUC), are shown in Table 3. The AUC summarizes how well the score change discriminates between the two groups of patients (GPE = 4 vs. GPE < 4): it is the probability that a randomly selected patient from the GPE = 4 group has a larger score change than a randomly selected patient from the GPE < 4 group. As a rough guide, an AUC of 0.7–0.8 is considered acceptable, whereas an AUC of 0.8–0.9 is considered excellent .
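The probabilistic interpretation of the AUC can be computed directly by comparing every pair of patients across the two groups. A minimal Python sketch, with a hypothetical function name and made-up score changes; the pairwise count is O(n1 × n2), which is unproblematic for a sample of 91 patients.

```python
def empirical_auc(improved_changes, other_changes):
    """Empirical AUC: share of (improved, other) patient pairs in which the
    improved patient has the larger score change; ties count as one half.
    This equals the Mann-Whitney U statistic divided by n1 * n2."""
    if not improved_changes or not other_changes:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for a in improved_changes:
        for b in other_changes:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(improved_changes) * len(other_changes))


# Hypothetical raw sum score changes, not study data.
gpe4 = [16, 10, 13]       # GPE = 4
gpe_below4 = [5, -1, 10]  # GPE < 4
print(round(empirical_auc(gpe4, gpe_below4), 3))  # 0.944
```

Counting ties as one half is the usual convention for the empirical AUC; with ordinal questionnaire scores, ties between groups are common, so the convention matters.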
This study provides data on the various MIDs for the subscales and sum score of the SRS-22 patient questionnaire in a group of patients who underwent surgery for idiopathic scoliosis. The sample is heterogeneous for age, with both adolescents and adults included. Nevertheless, the version of the questionnaire used has shown similar validity for the age range (10–40 years) included in the analysis [5, 7].
As was expected, the MID values differed substantially depending on the calculation method used. This has also been observed in other outcome instruments commonly applied in the evaluation of low back pain [17, 20, 25]. There is currently no consensus regarding the recommended methods for determining the MID .
Calculation by an anchor-based method is simple, and the result is easy to interpret. The MID-A (or minimal clinically important difference) represents the change in the scale score of a group of patients selected according to their response on the GPE rating. However, some researchers question the use of global ratings. One important criticism is that the validity of the GPE scale is unknown; in our case, however, several findings support the scale’s validity. First, the changes in the various subscales differed significantly between the two groups of patients (GPE = 4 vs. GPE < 4). Second, there was a statistically significant correlation between the GPE rating and the score changes. This correlation is generally considered necessary [11, 15, 16, 23] to accept the validity of the rating score. Moreover, because of this association, the GPE can serve as a fast, valid tool for evaluating patients in daily practice. This has also been seen in other PROs used to assess low back pain . To assess the GPE, we used a scale with only four possible answers: much better, better, same, and worse. There is no consensus regarding how many categories a GPE rating scale should have. What is clear, however, is that the choice of the patient group used as the external criterion for calculating the MID is arbitrary. The greater the number of levels, the smaller the difference between adjacent levels and the smaller the MID will be; hence, there is a risk that it will not exceed the inherent error of the measuring method. Although some experts have recommended the use of scales with seven levels , we opted for a scale with four levels, as has been applied in other studies investigating the outcome of spinal surgery .
Calculation of the MID-A in the present study was based on the score change in patients who considered their status “much better” than before surgery. This group was chosen after analyzing the SRS-22 score changes in each of the groups defined by the GPE. Taking the raw sum score as an example, we found that the mean change was 13 in the “much better” group, 4.8 in the “better” group, −0.8 in the “same” group, and −12 in the “worse” group. It is therefore evident that patients in the “much better” group perceived an obvious improvement in their condition after the intervention, those in the “better” group perceived only a slight improvement, and those in the other groups did not consider themselves better. Thus, the MID-A represents a change from the patient’s viewpoint, which is what PROs attempt to capture. Despite the potential problems with this method (recall bias, concern about anchor validity, etc.), it is the one recommended by most experts [23, 24].
The optimal cut-off point is another anchor-based method. Based on an external anchor criterion (GPE category in this case), analysis of the ROC curve determines a value that allows differentiation between patients who have improved and those who have not. In addition, data are obtained on the accuracy (area under the curve), sensitivity, and specificity of this cut-off value.
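The section does not name the criterion used to select the optimal point on the ROC curve. Youden's index (sensitivity + specificity − 1), a common choice, is assumed in the minimal Python sketch below, which also returns the sensitivity and specificity of the chosen cut-off. The function names and data are hypothetical.

```python
def roc_points(changes, improved):
    """(cut-off, sensitivity, specificity) for each observed change value,
    classifying a patient as improved when change >= cut-off."""
    if len(changes) != len(improved):
        raise ValueError("changes and improved must have the same length")
    n_pos = sum(1 for y in improved if y)
    n_neg = len(improved) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both improved and non-improved patients")
    points = []
    for cut in sorted(set(changes)):
        tp = sum(1 for x, y in zip(changes, improved) if y and x >= cut)
        tn = sum(1 for x, y in zip(changes, improved) if not y and x < cut)
        points.append((cut, tp / n_pos, tn / n_neg))
    return points


def optimal_cutoff(changes, improved):
    """Point maximizing Youden's J = sensitivity + specificity - 1;
    the lowest such cut-off wins ties."""
    return max(roc_points(changes, improved), key=lambda p: p[1] + p[2] - 1)


# Hypothetical raw sum score changes; True marks GPE = 4.
changes = [16, 12, 13, 8, 5, -1, 10]
improved = [True, True, True, True, False, False, False]
print(optimal_cutoff(changes, improved))  # (12, 0.75, 1.0)
```

Because `max` returns the first maximal element and the candidate cut-offs are sorted in ascending order, ties in Youden's J resolve to the lowest cut-off; other tie-breaking rules are equally defensible.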
The MID-D (or minimal detectable change), described as the minimal amount of change that is not attributable to the noise or imprecision of the measurement instrument, is a more conservative method that is particularly useful in questionnaires such as the SRS-22, in which the difference between the various responses is subtle . One advantage of the MID-D is that it is easier to generate than anchor-based data. The biggest criticism of this method is that it is based purely on statistical calculations and does not take into account information provided by the patients. The MID-D, based on calculation of the SEM, represents the minimal score change above which a change can be considered, with 95% confidence, not to be due to measurement error. It provides useful information for assessing MID values obtained with anchor-based methods because it shows how far the MID lies from the measurement error .
We found that the MID-As of the function and mental health subscales were below the MID-D, whereas the MID-As of the image and pain subscales, as well as the sum score (both raw and average), were greater than the respective MID-Ds. In clinical research, when the MID-A value is lower than the MID-D, the validity of the instrument may be questioned because it cannot be guaranteed that the change observed is not due to measurement error. In contrast, when the MID-A is greater than the MID-D, the MID-A is the value of choice because the change is, in all probability, real and perceived by both the patient and the instrument.
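The comparison described above reduces to a simple check, sketched here in Python with the MID-A and MID-D values reported in this study. The dictionary layout and function name are hypothetical. With the rounded published values, the MID-A and MID-D for pain coincide (0.6), so a strict comparison does not flag pain; resolving that case would require the unrounded values.

```python
# (MID-A, MID-D) pairs as reported in this study, at the published precision.
REPORTED_MIDS = {
    "pain": (0.6, 0.6),
    "function": (0.3, 0.8),
    "image": (1.3, 0.5),
    "mental health": (0.3, 0.4),
    "average sum score": (0.6, 0.5),
    "raw sum score": (13.1, 6.8),
}


def below_measurement_error(mids):
    """Scales whose anchor-based MID is smaller than the distribution-based MID."""
    return [scale for scale, (mid_a, mid_d) in mids.items() if mid_a < mid_d]


print(below_measurement_error(REPORTED_MIDS))  # ['function', 'mental health']
```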
Using the optimal cut-off point method, we obtained lower MIDs (except in the image subscale) that were below the level of the measurement error. Moreover, the sensitivity and specificity were low, the curve did not have a smooth profile, and the area under the curve was far from excellent. We also calculated the MID-A of the percentage of improvement of the raw sum score (range, 13.2–20.6%). It has been suggested that the percentage of improvement can be useful in instruments (such as the SRS-22) that present a marked ceiling effect, that is, those in which patients with high baseline scores have little room to improve . The MIDs were analyzed only for improvement. An analysis of clinically relevant worsening was not considered feasible because of the small number of patients involved (only two considered themselves worse after surgery).
Because there is no consensus as to the optimal method for calculating the MID for a specific PRO instrument in a specific patient population, a range of MID values must inevitably be managed. In the case of the SRS-22, the situation is complex because the questionnaire yields five scores (the sum score and four subscales), each with its own MID. The investigator must decide which MID is the most suitable in keeping with the objectives of the clinical research. An MID value can be used to dichotomize a group of patients under study. Dichotomization makes the results easier for clinicians to understand because they are often unaware of the clinical significance of a specific raw score or score change . Dichotomization of patients according to an MID value (e.g., the difference of mean values or the percentage of patients exceeding this score) is useful for designing clinical studies (sample size calculation) and interpreting the results, and is the method recommended by regulatory agencies, such as the Food and Drug Administration .
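Dichotomization by an MID can be sketched in Python as the share of patients whose change reaches the threshold. The function name and patient scores are hypothetical; counting a change equal to the MID as a response (`>=`) is an assumption, since some analyses require the change to exceed it. The two thresholds used are the raw sum score MID-A and MID-D reported in this study, which illustrates how strongly the chosen MID drives the responder rate.

```python
def responder_rate(pre, post, mid):
    """Share of patients whose change (post - pre) reaches at least `mid`."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and of equal length")
    responders = sum(1 for before, after in zip(pre, post) if after - before >= mid)
    return responders / len(pre)


# Hypothetical raw sum scores, not study data.
pre = [62, 70, 58, 75, 66]
post = [78, 80, 63, 74, 79]
print(responder_rate(pre, post, 13.1))  # 0.2 (raw sum score MID-A)
print(responder_rate(pre, post, 6.8))   # 0.6 (raw sum score MID-D)
```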
To assess the results of a specific treatment, the criteria for success/failure should be defined a priori, and it is crucial to know the MIDs of the measurement instrument for this task. Although it is not within the aim of the present study, the data presented allow a preliminary analysis of the effects of surgery on the clinical status of patients assessed with the SRS-22. The results show that patients had a clinically significant improvement only in pain, self-perceived body image, and overall quality of life. These data should be interpreted in light of the characteristics of the study group. The study patients had moderately severe scoliosis (Cobb angle around 60°), in which the main clinical problem is the trunk deformity. Our data confirm a generalized clinical impression: in this range of curve magnitude, the impact of scoliosis falls on body image and pain, and this has repercussions on the patient’s general quality of life. It is therefore understandable that no significant changes occurred in function or mental health in this group of patients. Thus, the lack of responsiveness of the SRS-22 in these areas may correspond to a clinical reality: at baseline, before surgery, these patients have high scores in these domains, leaving little margin for improvement. It would be interesting to assess the responsiveness of the instrument in groups of patients with more severe baseline status and lower subscale scores before surgery.
In the case of the SRS-22 and in accordance with our results, we recommend using the MID-D when the aim is to analyze each of the scales of the instrument (Pain 0.6; Function 0.8; Image 0.5; Mental Health 0.4; Raw-Sum 6.8; Average-Sum 0.5) because some subscales show an MID-A below the measurement error. Since the MID-D does not take into account the clinical importance of the change, it should be accompanied by data on the patients’ satisfaction with the treatment . This is not a problem because the SRS-22 has a valid and reliable satisfaction scale . However, if the interest is focused on the patient’s response, we suggest using only the MID-A of the sum score (Raw-sum 13.1; Average-sum 0.6), which is higher than the MID-D. Thus, the SRS-22 would be presented with a single value, as occurs with other PRO instruments, such as the Oswestry Disability Index and the Roland–Morris Disability Questionnaire. The cut-off points obtained from the ROC curve seem to be of little use in the SRS-22 because they are below the measurement error, and the sensitivity and specificity are low.
To our knowledge, this is the first study that has provided data on the MIDs of the SRS-22 Patient Questionnaire. In all probability, future studies in this line will yield values somewhat different from the MIDs reported herein. As Revicki et al.  have suggested for other PRO instruments, it is likely that a single MID value will not be established for the SRS-22 until a systematic review involving several studies is available.
The authors want to acknowledge Celine Cavallo for translating and editing the paper.