Using health-related quality-of-life measures for patient management requires knowing what changes in scores require clinical attention. We estimated changes on the European Organization for Research and Treatment of Cancer Quality-of-Life-Questionnaire-Core-30 (EORTC-QLQ-C30) representing important changes by comparing to patient-reported changes in supportive care needs.
This secondary analysis used data from 193 newly-diagnosed cancer patients (63% breast, 37% colorectal; mean age 60 years; 20% male) from 28 Canadian surgical practices. Participants completed the Supportive Care Needs Survey-Short Form-34 (SCNS-SF34) and EORTC-QLQ-C30 at baseline, 3 weeks, and 8 weeks. We calculated mean changes in EORTC-QLQ-C30 scores associated with improvement, worsening, and no-change in supportive care needs based on the SCNS-SF34. Mean changes in the EORTC-QLQ-C30 scores associated with the SCNS-SF34 improved and worsened categories were used to estimate clinically important changes, and the ‘no change’ category to estimate insignificant changes.
EORTC-QLQ-C30 score changes ranged from 6 to 32 points for patients reporting improved supportive care needs; statistically significant changes were 10-32 points. EORTC-QLQ-C30 score changes ranged from 21-point worsening to 21-point improvement for patients reporting worsening supportive care needs; statistically significant changes were 9-21 points in the hypothesized direction and a 21-point statistically significant change in the opposite direction. EORTC-QLQ-C30 score changes ranged from a 1-point worsening to 16-point improvement for patients reporting stable supportive care needs.
These data suggest 10-point EORTC-QLQ-C30 score changes represent changes in supportive care needs. When using the EORTC-QLQ-C30 in clinical practice, scores changing ≥10 points should be highlighted for clinical attention.
One of the important applications of patient-reported outcome (PRO) measures is in routine clinical practice to inform patient management [1-12]. For this purpose, patients complete PROs and the resulting scores are provided to clinicians who may then use them to help identify and address issues in patients’ functioning and well-being. However, this requires an understanding of what scores require clinical attention, including those that are poor in absolute terms and those representing an important worsening from a previous assessment. Scores that are poor in absolute terms are identified using a cut-off threshold at a single timepoint (e.g., all fatigue scores >30); score changes are determined by differences in scores at two timepoints (e.g., a fatigue score that worsened from 5 points at Time 1 to 20 points at Time 2, a change of 15 points that may be important even though it does not reach the absolute threshold of 30).
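The distinction between an absolute cut-off and an important change can be illustrated with a minimal sketch. The function name, the default cut-off of 30 (taken from the fatigue example above), and the default change threshold of 10 points (taken from the conclusion of the abstract) are illustrative assumptions, not part of any published scoring rule.

```python
def flag_score(previous, current, absolute_cutoff=30.0,
               change_threshold=10.0, higher_is_worse=True):
    """Return the reasons (if any) a score might merit clinical attention.

    Hypothetical helper: the defaults are illustrative assumptions only.
    Symptom scales: higher is worse. Functional scales: lower is worse.
    """
    if higher_is_worse:
        poor_in_absolute_terms = current > absolute_cutoff
        worsening = current - previous
    else:
        poor_in_absolute_terms = current < absolute_cutoff
        worsening = previous - current

    reasons = []
    if poor_in_absolute_terms:
        reasons.append("absolute cut-off")
    if worsening >= change_threshold:
        reasons.append("important change")
    return reasons


# The fatigue example from the text: 5 points at Time 1, 20 points at Time 2.
print(flag_score(previous=5, current=20))
```

In this example the score is flagged for its 15-point worsening even though it stays below the absolute cut-off of 30.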
PRO scores representing potential problems, either in absolute terms or an important change, can be brought to the attention of clinicians to help inform their patient management. Notably, patients are also frequently able to access their PRO results and assess their progress. Research has demonstrated that highlighting the PRO scores that may require clinical attention on the results reports can help clinicians and patients apply the PRO results to improving the patient’s care. However, for many PRO measures, the absolute cut-off scores and important change scores are unknown. Research is needed to identify absolute cut-off scores and important score changes to inform interpretation of the PRO score reports, thereby improving the value of using PROs in clinical practice.
In previous research, we used a needs assessment to identify absolute cut-off scores associated with patient-reported unmet needs [15-16]. We first conducted an analysis using data from 117 breast, prostate, and lung cancer patients in the United States who completed both the European Organization for Research and Treatment of Cancer Quality-of-Life-Questionnaire-Core-30 (EORTC-QLQ-C30) and the Supportive Care Needs Survey-Short Form-34 (SCNS-SF34) at a single point in time. These data were from a study that examined the health-related quality of life (HRQOL) and supportive care needs of cancer patients, and those two measures were selected based on preliminary research that indicated they were most relevant for clinical practice applications. The study data were later used for a secondary data analysis to identify cut-off scores on the EORTC-QLQ-C30 associated with patient-reported unmet needs. Specifically, we examined the content of the SCNS-SF34 to identify items potentially associated with EORTC-QLQ-C30 domains and tested these relationships using receiver-operating characteristic (ROC) curve analysis. There were six EORTC-QLQ-C30 domains (physical function, role function, emotional function, pain, fatigue, global health/QOL) associated with SCNS-SF34 items as established by areas under the ROC curve (AUC) ≥0.70. For these six domains, we identified cut-off scores on the EORTC-QLQ-C30 associated with unmet needs based on the SCNS-SF34 with sensitivity ≥0.85 and specificity ≥0.53. We later replicated the analyses in a sample of 408 Japanese breast cancer patients who completed the Japanese versions of the EORTC-QLQ-C30 and SCNS-SF34. The results of the Japanese data analysis were substantially similar to those from the original study, in terms of the EORTC-QLQ-C30 domains significantly associated with SCNS-SF34 items, and the sensitivity and specificity associated with the various cut-offs.
Together, these two analyses provide useful information regarding absolute cut-off scores on the EORTC-QLQ-C30 to serve as a threshold for identifying patients with unmet needs and potentially requiring clinical attention.
However, because the previous datasets were cross-sectional, it was not possible to explore the changes in EORTC-QLQ-C30 scores that are associated with changes in patients’ supportive care needs. In this analysis, we used data from a longitudinal study in which cancer patients completed the EORTC-QLQ-C30 and SCNS-SF34 at multiple time points. The purpose was to estimate score changes on the EORTC-QLQ-C30 representing changes in supportive care needs, and thus requiring clinical attention. Specifically, mean changes in the EORTC-QLQ-C30 scores associated with the SCNS-SF34 improved and worsened categories were used to estimate clinically important changes, and the ‘no change’ category was used to indicate insignificant changes. The results of this analysis were intended to facilitate the use of the EORTC-QLQ-C30 in clinical practice by identifying score changes that should be highlighted for clinical attention.
This was a secondary analysis using data collected from a cluster randomized controlled trial (RCT) evaluating a community-based nursing-led coordination of care intervention. The original RCT was conducted in Toronto, Canada, and recruited newly diagnosed breast and colorectal cancer patients within 7 days of their surgery at 28 participating surgical clinics. Eligibility criteria included no previous or concomitant malignancies (except non-melanoma skin cancer or carcinoma in situ of the cervix), legally able to provide informed consent, 18 years of age or older, able to speak and read English, and residing in Toronto, ON.
Trained interviewers collected patients’ QLQ-C30 and SCNS-SF34 questionnaires via telephone at baseline (2-7 days post-discharge from surgery), 3 weeks (2-3 weeks post-baseline), and 8 weeks (8-10 weeks post-baseline), along with other PRO measures. The data collection timepoints were based on the intervention being evaluated in the RCT and the estimated care trajectories for patients following cancer surgery. Specifically, the intent was to capture patient PROs following recovery (4 weeks after discharge) but before their formal entry into the cancer care system to begin treatment (10 weeks). Respondents were provided with a copy of the instrument to follow during the telephone interview. The interviewer read the instructions on the survey, explained the response options, and repeated the response options as necessary during the interview. The original study found no significant effect of the intervention on PRO scores.
The EORTC-QLQ-C30 is a multi-dimensional HRQOL measure designed for use in cancer patients. It includes five functional measures (physical, role, emotional, social, cognitive), eight symptoms (fatigue, pain, nausea/vomiting, appetite loss, constipation, diarrhea, insomnia, dyspnea), as well as global health/QOL and financial impact. Most items use a 4-point scale from ‘not at all’ to ‘very much’ and a one-week recall period. Raw scores are transformed to a 0-100 scale, with higher scores representing better functioning/QOL and greater symptom burden. The QLQ-C30 is used widely as both an outcome measure in clinical studies and a PRO in clinical practice [3, 5, 10, 20], and studies that have specifically investigated the appropriateness of questionnaires for clinical practice have supported its use [17, 21].
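The text above does not spell out the transformation to the 0-100 scale. The sketch below assumes the linear transformation described in the EORTC scoring manual (raw score = mean of the scale's items; functional scales are reversed so that higher is better). The function name and arguments are hypothetical, and the manual should be consulted for item-to-scale assignments and missing-data rules.

```python
def qlq_c30_scale_score(item_responses, scale_type, item_range=3):
    """Linearly transform item responses to a 0-100 scale score.

    Assumed convention (EORTC scoring manual, as understood here):
      raw score RS = mean of the item responses
      functional scales:        (1 - (RS - 1) / range) * 100
      symptom / global scales:  ((RS - 1) / range) * 100
    item_range is max minus min response: 3 for 1-4 items,
    6 for the two 1-7 global health/QOL items.
    """
    raw_score = sum(item_responses) / len(item_responses)
    if scale_type == "functional":
        return (1 - (raw_score - 1) / item_range) * 100
    if scale_type in ("symptom", "global"):
        return ((raw_score - 1) / item_range) * 100
    raise ValueError("scale_type must be 'functional', 'symptom', or 'global'")


# Hypothetical responses: two symptom items answered 2 and 3 on the 1-4 scale.
print(qlq_c30_scale_score([2, 3], "symptom"))
```

Under this convention a patient answering ‘not at all’ (1) to every item scores 100 on a functional scale and 0 on a symptom scale, matching the direction of interpretation described above.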
The SCNS-SF34 addresses unmet needs in five domains: physical and daily living, psychological, health system and information, patient care and support, and sexual [22-23]. For each item, respondents use a five-point scale: 1=not applicable (meaning they did not experience the issue), 2=satisfied (meaning the issue applies to them but is being adequately addressed), 3=low unmet need, 4=moderate unmet need, and 5=high unmet need. The instructions provide detailed explanations regarding each response category and explicitly label responses of “not applicable” and “satisfied” as “no need” and low, moderate, or high as “some need.” The instructions and questionnaire can be downloaded from . The recall period used in the RCT was ‘since surgery’ for baseline, or since the last survey for the week 3 and week 8 assessments.
The original study was approved by Hamilton Health Sciences Research Ethics Board, and the Johns Hopkins team was provided with a de-identified dataset. Thus, the Johns Hopkins School of Medicine Institutional Review Board deemed the current analysis as exempt.
The sample demographics were summarized using descriptive statistics. As a preliminary step to our main analysis, we first used the baseline data from the Canadian sample to replicate the analyses conducted in the original cross-sectional US and Japanese samples examining absolute cut-off scores [15-16]. This preliminary analysis was conducted to confirm that the associations between SCNS-SF34 items and EORTC-QLQ-C30 absolute cut-off scores previously established [15-16] were also present in this Canadian sample, thereby supporting our approach of using analogous methods to examine changes in scores over time – the novel aspect of the present analysis. The details of the absolute cut-off score analyses have been described [15-16]. Briefly, we calculated the AUC for each QLQ-C30 domain and potentially related SCNS-SF34 items and domains. Based on our previous analyses [15-16], we hypothesized that these six QLQ-C30 domains would be associated with these SCNS-SF34 items with AUC ≥0.70: physical function with ‘work around the home’; role function with ‘work around the home’; emotional function with ‘feelings of sadness’, global health/QOL with ‘feeling unwell a lot of the time’; pain with ‘pain’; and fatigue with ‘lack of energy/tiredness’. We calculated the associated sensitivity and specificity for various cut-offs and compared them qualitatively with our previous findings [15-16].
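As a rough illustration of the preliminary analysis, the sketch below computes an AUC and the sensitivity and specificity of one cut-off for a functional domain. It assumes that "unmet need" means a response of 3-5 on the SCNS-SF34 item, that lower functional scores are worse, and that a patient screens positive when the score falls below the cut-off; the function names and the data are hypothetical and do not reproduce the study's results.

```python
def auc(scores_with_need, scores_without_need):
    """AUC as the probability that a patient with unmet need has a lower
    (worse) functional score than a patient without; ties count one half."""
    wins = 0.0
    for s_need in scores_with_need:
        for s_none in scores_without_need:
            if s_need < s_none:
                wins += 1.0
            elif s_need == s_none:
                wins += 0.5
    return wins / (len(scores_with_need) * len(scores_without_need))


def sensitivity_specificity(scores_with_need, scores_without_need, cutoff):
    """Screen positive when the score is below the cut-off (assumed rule)."""
    true_positives = sum(s < cutoff for s in scores_with_need)
    true_negatives = sum(s >= cutoff for s in scores_without_need)
    return (true_positives / len(scores_with_need),
            true_negatives / len(scores_without_need))


# Hypothetical physical-function scores, split by SCNS-SF34 need status.
with_need = [40, 50, 60, 70]
without_need = [60, 80, 90, 100]

print(auc(with_need, without_need))
print(sensitivity_specificity(with_need, without_need, cutoff=75))
```

For symptom domains such as pain and fatigue, where higher scores are worse, the comparisons would be reversed.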
We then proceeded with the analysis exploring changes in QLQ-C30 scores associated with changes in patient-reported supportive care needs, as measured by the SCNS-SF34. We first calculated the number of observations for each potential change in SCNS-SF34 responses for the relevant items. For example, observations that were 1=not applicable at baseline and 3=low unmet need at week 3 were counted as worsening because the patient went from no unmet need to some unmet need. As another example, observations that were 4=moderate unmet need at baseline and 2=satisfied at week 8 were counted as improvement because the patient went from some unmet need to no unmet need. This was done for changes between each pair of timepoints separately: baseline vs. week 3, baseline vs. week 8, and week 3 vs. week 8.
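The categorization of SCNS-SF34 response changes can be sketched as follows. The function and label names are hypothetical. Changes within "some unmet need" are left uncategorized, following the description accompanying Figure 1; treating moves between 1=not applicable and 2=satisfied as uncategorized as well is an assumption, since the text does not assign them to any category.

```python
from collections import Counter
from itertools import combinations

NO_NEED = {1, 2}       # 1=not applicable, 2=satisfied
SOME_NEED = {3, 4, 5}  # low, moderate, or high unmet need

TIMEPOINTS = ["baseline", "week3", "week8"]


def categorize_change(time1, time2):
    """Classify one pair of SCNS-SF34 item responses."""
    if time1 in SOME_NEED and time2 in NO_NEED:
        return "improved"
    if time1 in NO_NEED and time2 in SOME_NEED:
        return "worsened"
    if time1 == time2 and time1 in NO_NEED:
        return "unchanged"
    return "not analyzed"  # e.g., 5 -> 3, or (by assumption) 1 -> 2


def tabulate_changes(responses):
    """Count categories over each pair of timepoints for one patient/item.

    responses: dict mapping timepoint name to response (missing = skipped).
    """
    counts = Counter()
    for t1, t2 in combinations(TIMEPOINTS, 2):
        r1, r2 = responses.get(t1), responses.get(t2)
        if r1 is None or r2 is None:
            continue
        counts[categorize_change(r1, r2)] += 1
    return counts


# Hypothetical patient: moderate unmet need at baseline, satisfied afterwards.
print(tabulate_changes({"baseline": 4, "week3": 2, "week8": 2}))
```

This hypothetical patient contributes two "improved" observations (baseline vs. week 3, baseline vs. week 8) and one "unchanged" observation (week 3 vs. week 8), which is why the later analysis has to account for multiple changes from the same patient.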
Figure 1 depicts changes on SCNS-SF34 items categorized as improvement, worsening, or unchanged. Because of sparse data for categories 3=low unmet need, 4=moderate unmet need, and 5=high unmet need, those three categories were combined into a single category “some unmet need.” Because of this merging across categories, changes within “some unmet need” (e.g., response of 5=high need at Time 1 and 3=low need at Time 2) were not considered in this analysis because they did not represent a change between “no need” and “some need,” nor did they represent no-change. Therefore, responses of 3=low need, 4=moderate need, or 5=high need at Time 1 and responses of 1=not applicable or 2=satisfied at Time 2 were categorized as improvement. The opposite changes were categorized as worsening (responses of 1 or 2 at Time 1 and 3, 4, or 5 at Time 2). The unchanged category included responses of 1=not applicable at both Time 1 and Time 2, and 2=satisfied at both Time 1 and Time 2. We combined the results across all three timepoints (e.g., observations of changes using baseline as Time 1 and week 3 as Time 2 were combined with observations of changes using week 3 as Time 1 and week 8 as Time 2). For each category (improved, worsening, unchanged), we calculated the mean changes in QLQ-C30 domain scores using intercept-only generalized estimating equation (GEE) linear regression models, to account for correlation among changes from the same patient. Effect sizes were calculated as the model estimate divided by the standard error (equivalent to the z-score). We did not adjust for multiple comparisons, and include p≤0.05 to represent statistical significance for descriptive purposes only.
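To make the effect-size calculation concrete, the sketch below estimates a mean change and a cluster-robust standard error, then divides one by the other. It assumes an independence working correlation and no small-sample correction, in which case the intercept-only GEE estimate is the overall mean and the robust (sandwich) variance is the sum of squared per-patient residual totals divided by the squared number of observations. The working correlation actually used in the analysis is not stated, so this is an illustration rather than a reproduction; the function name and data are hypothetical.

```python
import math
from collections import defaultdict


def mean_change_with_robust_se(changes):
    """changes: list of (patient_id, score_change) pairs.

    Returns (mean, robust_se, z), where z = mean / robust_se is the
    effect size as defined in the text (estimate over standard error).
    """
    n_obs = len(changes)
    mean = sum(change for _, change in changes) / n_obs

    # Sum residuals within each patient so that correlated changes from
    # the same patient are not treated as independent observations.
    residual_totals = defaultdict(float)
    for patient_id, change in changes:
        residual_totals[patient_id] += change - mean

    variance = sum(total ** 2 for total in residual_totals.values()) / n_obs ** 2
    robust_se = math.sqrt(variance)
    return mean, robust_se, mean / robust_se


# Hypothetical QLQ-C30 changes; patient "a" contributes two observations.
data = [("a", 10.0), ("a", 20.0), ("b", 0.0), ("c", 10.0)]
mean, se, z = mean_change_with_robust_se(data)
print(round(mean, 2), round(se, 2), round(z, 2))
```

A dedicated GEE implementation in a statistics package would normally be used in practice; the hand calculation is shown only to clarify what the ratio of estimate to standard error represents.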
A total of 193 patients participated in the RCT. The mean age was 60 years (range: 22 to 88), and 20% were male. The majority (63%) had breast cancer, and the remainder (37%) colorectal cancer. Over half (57%) had a college degree, and 62% were married. Of the 193 patients who completed the baseline assessment, 186 (96%) completed the week 3 assessment, and 179 (93%) completed the week 8 assessment.
The preliminary analyses comparing the results of the Canadian baseline data to the previous results regarding absolute cut-off scores in the US and Japanese samples produced generally similar results, though there were some differences of note. First, three of the six QLQ-C30 domains that were associated with SCNS-SF34 items in the original and Japanese analyses with AUC≥0.70 met the 0.70 threshold in these Canadian data: emotional function, pain, and global health/QOL (Table 1). For physical function and role function, however, the highest AUC was 0.69, and for fatigue, the highest AUC was 0.68. All of the QLQ-C30 domains that had AUCs<0.70 in our previous analyses [15-16] had AUCs<0.70 in this analysis.
Second, the SCNS-SF34 items found to have the strongest association with the QLQ-C30 domains were not always the same in the Canadian data as they had been in both the US and Japanese data, though most of these differences were minor. The association with physical function was higher for the SCNS-SF34 item ‘feeling unwell a lot of the time’ (AUC=0.69) than for ‘work around the home’ (AUC=0.67). The association with emotional function was higher for the SCNS-SF34 item ‘feeling down or depressed’ (AUC=0.79) than for ‘feelings of sadness’ (AUC=0.76). The difference for role function was somewhat greater, with the SCNS-SF34 item ‘not being able to do the things you used to do’ having an AUC=0.69 versus ‘work around the home’ with an AUC=0.58. Because of these differences, when calculating the sensitivity and specificity of the various QLQ-C30 cut-off scores, we investigated both the item with the strongest association from the previous analyses [15-16] and the item with the strongest association in the Canadian data (Table 2). The results were not substantially different using the SCNS-SF34 item with the strongest association in the Canadian data, compared to the SCNS-SF34 item with the strongest association from the US and Japanese analyses. Therefore, to maintain consistency across studies, we used the SCNS-SF34 item previously established as associated with the QLQ-C30 domain [15-16] in the remaining analyses.
Table 3 presents the changes in QLQ-C30 scores associated with improvement, worsening, and no-change on the associated SCNS-SF34 items. These change scores provide estimates of changes on these QLQ-C30 domains that represent a change in patients’ unmet needs, and therefore provide interpretation guidance for future applications using the QLQ-C30 in clinical practice.
To summarize, 12 QLQ-C30 change scores were associated with improvements on the SCNS-SF34 [i.e., changes in the 6 QLQ-C30 domains for SCNS-SF34 items that went from (1) ‘some unmet need’ to ‘not applicable’ or (2) ‘some unmet need’ to ‘satisfied’]. Across these 12 changes, the number of observations ranged from 25 to 74. The QLQ-C30 mean improvement ranged from 6 to 32 points, with 11 changes reaching statistical significance (range 10-32 points). The absolute value of the effect sizes ranged from 1.70 to 7.90. In general, the changes in QLQ-C30 scores for patients who improved from ‘some unmet need’ to ‘not applicable’ were larger than for those who improved from ‘some unmet need’ to ‘satisfied,’ suggesting that there is a bigger change between not having the issue at all versus having the issue adequately addressed.
There were fewer observations for the 12 EORTC-QLQ-C30 change scores associated with SCNS-SF34 items categorized as worsened [i.e., changes in the 6 EORTC-QLQ-C30 domains for SCNS-SF34 items that went from (1) ‘not applicable’ to ‘some unmet need’ or (2) ‘satisfied’ to ‘some unmet need’]. Sample sizes ranged from 8 to 66. While the pattern of EORTC-QLQ-C30 mean changes from ‘not applicable’ to ‘some unmet need’ was generally as hypothesized, the pattern from ‘satisfied’ to ‘some unmet need’ was less consistent, with 3 domains’ mean changes representing improvement rather than worsening. Overall, the EORTC-QLQ-C30 mean changes ranged from a 21-point worsening to a 21-point improvement. Four changes were statistically significant in the hypothesized direction (range 9-21 points), and one in the opposite direction (21 points). Effect sizes for associations in the hypothesized direction went as high as 3.70, but we also found effect sizes as high as 3.60 in the opposite direction.
The sample sizes for the 12 EORTC-QLQ-C30 change scores associated with SCNS-SF34 items that were unchanged were much larger, ranging from 49 to 217. Specifically, these 12 QLQ-C30 changes were calculated for SCNS-SF34 items that were (1) ‘not applicable’ at both timepoints or (2) ‘satisfied’ at both timepoints on the 6 EORTC-QLQ-C30 domains (physical, role, emotional, global health/QOL, pain, fatigue). Even though the magnitude of the mean QLQ-C30 change scores tended to be smaller for these categories, ranging from a 1-point worsening to 16-point improvement, 10 of the changes were statistically significant. Effect sizes ranged from 0.50 to 5.80 in absolute value.
An important enabler of using PROs in clinical practice is guidance on how to interpret questionnaire scores, and in particular, identifying scores requiring attention [25-26]. In previous research, we were able to identify absolute cut-off scores on the QLQ-C30 associated with patient-reported unmet needs [15-16]. This information is useful, as a prior analysis demonstrated that the QLQ-C30 domains that were the poorest in absolute terms were most likely to predict the issues bothering patients the most. It may also be useful to identify important changes in QLQ-C30 scores, and the results of the present analysis provide estimates of QLQ-C30 score changes associated with changes in patients’ needs. These estimates of meaningful change can be used in practice to highlight domains on the QLQ-C30 that may require clinical attention.
The approach taken in this analysis is similar to, but distinct from, analyses aiming to identify a minimally important difference (MID). This analysis provides estimates of clinically important changes representing changes in patients’ supportive care needs. It should be noted that the estimates for important changes were derived from group-level means, and that an individual patient’s perception of change may differ. Practically speaking, these estimates are intended to identify potential concerns for the clinician and patient to discuss, rather than to make a definitive diagnosis. Further, these estimates do not necessarily reflect the smallest difference that would be considered important and suggest the need for a change in management (i.e., the MID), nor do they distinguish between minimal, moderate, and maximal important differences. Nevertheless, it is instructive to compare our findings to other research that has investigated interpretation of changes on the EORTC-QLQ-C30. Osoba et al. and King proposed categorizing changes on the QLQ-C30 of 5-10 points as small differences, 10-20 points as moderate differences, and greater than 20-point differences as large, with a 10-point change considered meaningful. The findings of our study largely confirm these estimates.
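The size categories cited above can be expressed as a small lookup. The bands overlap at their boundaries as quoted, so the sketch assumes that a change of exactly 10 points counts as moderate (consistent with a 10-point change being considered meaningful) and that exactly 20 points counts as moderate; the label for changes under 5 points and the function name are also assumptions.

```python
def magnitude_band(change):
    """Map a QLQ-C30 score change to the cited size categories.

    Boundary handling is an assumption: 10 and 20 are treated as moderate.
    The sign is ignored; only the size of the change is classified.
    """
    size = abs(change)
    if size > 20:
        return "large"
    if size >= 10:
        return "moderate"
    if size >= 5:
        return "small"
    return "below small band"


# The smallest and largest mean improvements reported in this analysis.
print(magnitude_band(6), magnitude_band(32))
```

Applied to the improvement results, the 6-point mean change falls in the small band and the 32-point mean change in the large band.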
For improvement, only one mean change was less than 10 points, seven mean changes were between 10 and 20 points, and four were higher than 20 points. The 11 changes that reached statistical significance ranged from 10 to 32 points. This suggests that the results from our novel approach of estimating changes in QLQ-C30 scores based on changes in supportive care needs are consistent with previous research aimed at estimating minimally important differences, at least for estimates of improvement.
The results of our analysis of score worsening are more difficult to interpret. Of the 12 SCNS-SF34 change scores categorized as worsened, three mean changes in the hypothesized direction were less than 10 points, three were between 10 and 19 points, and two were 21 points. The four changes that were statistically significant in the hypothesized direction ranged from 9 to 21 points. However, four of the mean changes occurring in the physical function, role function, and pain domains were actually improvements, ranging from 0.5 to 21 points, with the 21-point change reaching statistical significance. These findings, contrary to our hypotheses, are consistent with the phenomenon of response shift.
Response shift is defined as a change in “internal standards, values, or conceptualizations” [31, p1115], leading to patients shifting how they would have responded retrospectively. Thus, a patient who initially rated their health as good, who then worsened but changed their valuation of health states (i.e., had a response shift), might now report their health as very good. In support of this explanation, Kvam et al. found evidence of response shift on the QLQ-C30 in multiple myeloma patients. In the Kvam study, patients who worsened retrospectively reported better HRQOL at baseline for pain, fatigue, and physical function, but improving patients only demonstrated response shift on global QOL. The Kvam findings, in combination with the results from this study, suggest patients who are worsening may have shifted their standards or valuation of their previous HRQOL, leading to counter-intuitive results when compared with current reports on certain domains, particularly physical function and pain. What is interesting in our study is that the SCNS-SF34 and QLQ-C30 were both self-reported measures collected simultaneously. Why would patients report worsening on the needs assessment but improvements on an HRQOL questionnaire? This question should be explored further to determine whether needs assessments may be less subject to response shift, and if so, why.
Another reason the results of our analysis would not be expected to establish ‘minimally’ important differences is because all of the unmet need categories (low, moderate, high) were collapsed into ‘some unmet need.’ There were not enough observations that changed between low and moderate and between moderate and high to examine changes of a single category. Further research using larger databases may be able to examine these smaller changes to provide a more refined estimation of the smallest difference representing a change in unmet needs.
The RCT that provided the data for this analysis assessed baseline after surgery, so we do not have scores for before surgery. However, because this analysis compared changes in EORTC-QLQ-C30 scores with changes in SCNS-SF34 scores, regardless of when the changes occurred, it is not relevant whether the changes in scores occurred before or after surgery. We also combined the mean changes across all three time points (e.g., baseline to 3 weeks and baseline to 8 weeks). It may be instructive to examine the changes for each time point separately using larger datasets. Because the sample comprised 63% patients with breast cancer, the results here are most reflective of these patients, and there may be differences in the findings by cancer type. However, due to small sample sizes, we were unable to explore these differences fully.
Finally, the statistical significance of our results should be interpreted with caution. Due to differing sample sizes and associated power, some changes were small but statistically significant while others were large but statistically insignificant. Nevertheless, the magnitudes of the changes that we observed and effect sizes are generally informative. In summary, the results from this study add to the body of literature regarding the clinical significance of changes in QLQ-C30 scores that may require clinical attention. Our findings support changes of 10 points or greater as being clinically meaningful, particularly for improvements. The findings regarding worsening also suggest 10 points as being clinically meaningful, but due to the evidence of possible response shift, these findings should be interpreted with greater caution. Notably, the approach used here would be expected to provide larger than the ‘minimal’ changes representing a change in unmet needs, giving us confidence that these changes are important. Combining these results with previous research in this area contributes to the interpretability of the QLQ-C30, thereby facilitating its use in clinical practice to improve patient care.
This analysis was funded by the American Cancer Society (# MRSG-08-011-01-CPPB). The original data collection was supported by the Canadian Health Services Research Foundation, Ontario Ministry of Health and Long-term Care. Drs. Snyder and Carducci are members of the Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins (P30 CA 006973).
CONFLICT OF INTEREST STATEMENT The authors report no conflict of interest.