HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
 
Respir Med. Author manuscript; available in PMC 2010 April 19.
Published in final edited form as:
PMCID: PMC2856332
NIHMSID: NIHMS188878

The SF-36 and SGRQ: validity and first look at minimum important differences in IPF

Abstract

Rationale:

Health-related quality of life (HRQL) is an important outcome in drug trials. Little is known about how the Short Form-36 (SF-36) and Saint George's Respiratory Questionnaire (SGRQ) perform in idiopathic pulmonary fibrosis (IPF).

Objectives:

To examine the validity of the SF-36 and SGRQ and to determine scores from each that would constitute a minimum important difference (MID).

Methods:

We analyzed data from a recently completed trial that enrolled subjects with well-defined IPF who completed the SF-36, SGRQ, and Baseline/Transition Dyspnea Index at baseline and six months. We compared mean changes in HRQL scores between groups of subjects whose disease severity changed over six months according to clinical anchors (FVC, DLCO, and dyspnea). We estimated the MID for each domain by using both anchor- and distribution-based approaches.

Main results:

Results supported the validity of the SF-36 and SGRQ for use in longitudinal studies. Mean changes in domain scores differed significantly between subjects whose clinical status improved and those whose clinical status declined according to the anchors. MID estimates ranged from 2-4 points for the SF-36 and from 5-8 points for the SGRQ.

Conclusion:

In IPF, the SF-36 and SGRQ possess reasonable validity for differentiating subjects whose disease severity changes over time. More studies are needed to continue the validation process, to refine estimates of the MIDs for the SF-36 or SGRQ, and to determine if a disease-specific instrument will perform better than either of these.

Keywords: interstitial lung disease, pulmonary fibrosis, quality of life, validity, minimum important difference

Introduction

Idiopathic pulmonary fibrosis (IPF) is a progressive interstitial lung disease (ILD) without effective therapy. Patients with IPF have impaired health-related quality of life (HRQL) in nearly every domain,1 and dyspnea is one strong driver of that impairment.2

By quantifying patients' perceptions,3 HRQL instruments capture information that physiologic or radiologic measures do not. Thus, investigators view HRQL as an important outcome to use when attempting to determine the effectiveness of a particular intervention. In patients with IPF, the Short Form-36 (SF-36) and Saint George's Respiratory Questionnaire (SGRQ) yield scores reflecting impaired HRQL, and at single time points, their scores correlate with clinical measures of IPF severity.4 In IPF, what is unknown about either of them is whether they are responsive to underlying change in status and whether they can discriminate between patients whose status over time improves, remains unchanged, or declines. Also lacking for IPF is a basic understanding of how to interpret changes in HRQL scores. Finally, the minimum score change considered clinically important (i.e., the minimum important difference or MID) is known for the SF-36 and SGRQ for certain conditions, but not for IPF.

The overarching goal of this study was to advance understanding and improve interpretation of SF-36 and SGRQ scores in IPF. The main hypotheses were that scores would decline in subjects whose disease progressed; that both the SF-36 and SGRQ could discriminate patients who improve, remain stable, or decline over time; and that we could use both anchor- and distribution-based methods to establish MID estimates for these instruments in patients with IPF.

Methods

Overview

We used data from a recently completed trial (the Bosentan Use in ILD-1 or BUILD-1)5 for this retrospective analysis. Details of the BUILD-1 study have been described previously.5 Briefly, subjects had very well-defined IPF according to accepted consensus guidelines.6,7 The SF-36 version 1®, SGRQ, and Baseline/Transition Dyspnea Index (BDI/TDI) were administered at baseline, six months, and twelve months. We used baseline and six-month data for our study because this provided us the greatest number of data points with which to perform our analyses.

Assessment tools

The SF-36 is a generic questionnaire with 36 items that measure functional health and well-being.8 It comprises eight domains and two psychometrically-established summary components, each derived from four domain scores. Domain and summary component scores range from 0-100; higher scores correspond to better health status or well-being. For each domain and summary component, as endorsed by SF-36 developers, we used scoring algorithms to generate linear T-score transformations (http://gim.med.ucla.edu/FacultyPages/Hays/util.htm; last accessed August 1, 2008). Such transformations place scores on scales with mean scores equal to 50 (and standard deviations of 10). The SGRQ is a self-administered, obstructive lung disease-specific questionnaire with 50 items comprising three domains, each scored from 0-100, with higher scores corresponding to worse HRQL.9 The BDI has three domains.10 The TDI is a follow-up questionnaire that asks respondents to rate (from ‘major deterioration’ = −3 to ‘major improvement’ = +3) how dyspnea has changed over time for each BDI domain; thus, scores for the TDI range from −9 (largest deterioration) to +9 (largest improvement).
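As a rough illustration of the scoring conventions described above, the following sketch applies a linear T-score transformation and sums TDI domain ratings. The function names are ours, and the reference mean and standard deviation passed to `t_score` are hypothetical inputs, not the published SF-36 norms or the scoring algorithms the authors used.

```python
def t_score(raw, ref_mean, ref_sd):
    """Rescale a raw score so the reference population has mean 50 and SD 10."""
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    return 50.0 + 10.0 * (raw - ref_mean) / ref_sd


def tdi_total(domain_ratings):
    """Sum the three TDI domain ratings, each from -3 to +3 (total -9 to +9)."""
    if len(domain_ratings) != 3:
        raise ValueError("the TDI has three domains")
    if any(r < -3 or r > 3 for r in domain_ratings):
        raise ValueError("each rating must lie between -3 and +3")
    return sum(domain_ratings)


# Hypothetical reference values, for illustration only.
print(t_score(90.0, ref_mean=70.0, ref_sd=20.0))
print(tdi_total([-1, 0, -2]))
```

A score one reference standard deviation above the reference mean maps to 60 on this scale, which is what makes T-scores comparable across domains.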

Statistical Analysis

We used baseline values to calculate mean scores, standard deviations, standard errors of measurement (SEM), and internal consistency reliability (Cronbach's alpha11) coefficients for each instrument. Next, we applied the methods of Kosinski and colleagues12 and their use of known-groups validity13 to examine relationships between either SF-36 or SGRQ scores and FVC, DLCO, and dyspnea, which we will hereafter refer to as anchors. Excluded from our analyses were subjects whose FVC, DLCO, TDI, or entire HRQL questionnaires were missing at either baseline or six months.
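The two reliability quantities named above can be sketched with the standard textbook formulas: Cronbach's alpha from item and total-score variances, and the SEM as the standard deviation times the square root of one minus the reliability. This is a minimal sketch with hypothetical responses; the study itself used SAS, and the function names here are ours.

```python
import math
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha; `responses` holds one list of item scores per respondent."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [statistics.variance([r[i] for r in responses]) for i in range(k)]
    total_var = statistics.variance([sum(r) for r in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability); clamped so rounding cannot go negative."""
    return sd * math.sqrt(max(0.0, 1.0 - reliability))


# Hypothetical item responses (rows are respondents), for illustration only.
example = [[3, 4, 3], [2, 2, 3], [5, 4, 4], [1, 2, 1]]
alpha = cronbach_alpha(example)
print(alpha, standard_error_of_measurement(10.0, alpha))
```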

We began the analyses by calculating anchor change scores. For FVC, we categorized subjects as “unchanged” if the difference in the raw FVC value at month six was within 7% (inclusive) of the baseline value, as “changed minimally” if the difference at month six was between 7 and 12% (exclusive) of baseline, and as “changed more than minimally” if the difference was ≥ 12%. We used the widely accepted cut-off value of 15% to represent a significant difference from baseline in DLCO; we elected not to parse DLCO into more categories because of the greater statistical “noise” in DLCO as compared with FVC, and it is far less clear to us what the range for a minimum change in DLCO should be. Thus, we did not use DLCO as an anchor in the MID analyses (see below). We used TDI scores as an anchor because dyspnea has been shown to be a strong influence on HRQL in patients with IPF,2 and attempts have been made to define the MID for the TDI (at least in populations other than IPF14).
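The FVC categorization rule above can be expressed as a small function. This is a sketch of the stated thresholds only (within 7% inclusive, between 7% and 12%, and 12% or more); the function name and the direction labels are our own choices.

```python
def fvc_change_category(baseline, month6):
    """Classify the six-month change in raw FVC relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline FVC must be positive")
    pct = 100.0 * (month6 - baseline) / baseline
    size = abs(pct)
    if size <= 7.0:  # within 7% of baseline, inclusive
        return "unchanged"
    direction = "improved" if pct > 0 else "declined"
    if size < 12.0:  # strictly between 7% and 12%
        return f"{direction} minimally"
    return f"{direction} more than minimally"  # 12% or more


# Hypothetical FVC values in liters, for illustration only.
print(fvc_change_category(3.00, 2.70))
```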

Next, we calculated mean SF-36 and SGRQ scores for subjects within each anchor change category. We used ANOVA—one for each HRQL domain—to compare contrasts in mean changes in SF-36 or SGRQ domain scores across anchor change categories. These models generated F-statistics; a larger F-statistic connotes a domain that yields a larger separation between mean HRQL scores across anchor change categories and/or a smaller within group variance.
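To make the interpretation of the F-statistic concrete, here is a minimal one-way ANOVA sketch: the ratio of between-group to within-group mean squares. The data are hypothetical, and this plain-Python version is not the SAS procedure the authors ran, nor does it compute the contrasts they report.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F-statistic for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("within-group variance is zero")
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical HRQL change scores for declined, unchanged, and improved groups.
print(one_way_anova_f([[-6.0, -4.0, -5.0], [0.0, 1.0, -1.0], [4.0, 6.0, 5.0]]))
```

Widening the gaps between group means or tightening the scores within each group both raise F, which is the sense in which a larger F marks a domain that separates the anchor categories better.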

We used Pearson product-moment correlation coefficients to examine relationships between anchors and HRQL scores. To derive MID estimates for domains from each instrument, we used the effect size (ES) and the 1-SEM criterion15,16 as distribution-based approaches. Although there is no consensus about how or even whether17 the ES should be used in the estimation of MIDs, some investigators consider 0.5 to correspond to the MID,18,19 and that is what we used here. In the first anchor-based approach, we used linear regression to examine the relationship between change in HRQL (dependent variable) and change in the anchor—FVC or TDI score (independent variables).20 We derived a point estimate for the MID by plugging into these equations values representing a minimal change (e.g., 10% for raw FVC—roughly the midpoint of our minimum change range of 7-12%—and one point for the TDI) in the independent variable. In the second anchor-based approach, we calculated the weighted average of mean change scores for each HRQL domain for subjects who changed (either improved or declined) minimally according to the FVC and TDI anchors. All analyses were performed with SAS version 9.1.3 (SAS Institute Inc., Cary, NC), and p-values < .05 were considered statistically significant.
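The four MID estimators described above can be sketched as follows. Several details are assumptions on our part rather than the authors' documented procedure: the regression prediction includes the fitted intercept, the weighted average uses group sizes as weights, and declines are folded in by absolute value. All numbers are hypothetical.

```python
import math


def half_sd_mid(baseline_sd):
    """Distribution-based estimate: an effect size of 0.5, i.e. half the baseline SD."""
    return 0.5 * baseline_sd


def one_sem_mid(baseline_sd, reliability):
    """Distribution-based estimate: one standard error of measurement."""
    return baseline_sd * math.sqrt(max(0.0, 1.0 - reliability))


def regression_mid(anchor_changes, hrql_changes, minimal_anchor_change):
    """Anchor-based: least-squares fit, evaluated at a minimal anchor change."""
    n = len(anchor_changes)
    if n < 2 or n != len(hrql_changes):
        raise ValueError("need two or more paired observations")
    mx = sum(anchor_changes) / n
    my = sum(hrql_changes) / n
    sxx = sum((x - mx) ** 2 for x in anchor_changes)
    if sxx == 0:
        raise ValueError("anchor changes have zero variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(anchor_changes, hrql_changes))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept + slope * minimal_anchor_change  # assumption: intercept kept


def weighted_mean_change(groups):
    """Anchor-based: size-weighted mean of absolute mean changes.

    `groups` is a list of (mean_change, n_subjects) pairs for the subjects who
    improved or declined minimally. Weighting by n is our assumption.
    """
    total_n = sum(n for _, n in groups)
    if total_n <= 0:
        raise ValueError("need at least one subject")
    return sum(abs(mean) * n for mean, n in groups) / total_n


# Hypothetical inputs: percent FVC change paired with HRQL change.
print(regression_mid([-12.0, -5.0, 0.0, 6.0, 11.0], [-5.0, -2.0, 0.5, 2.0, 4.0], 10.0))
print(weighted_mean_change([(3.0, 10), (-5.0, 30)]))
```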

Results

Subjects

Demographics, baseline values for FVC and DLCO, and proportions of subjects who changed according to the anchors are found in Table 1. Table 2 displays baseline data for the SF-36 and SGRQ.

Table 1. Characteristics of subjects in BUILD-1
Table 2. Baseline data for the SF-36 and SGRQ

Changes in SF-36 and SGRQ

Except for the Symptoms domain for DLCO, mean change scores from each SGRQ domain differed significantly between categories of change in each of the three anchors. Findings were similar for certain SF-36 domains (Table located in online supplement).

The FVC anchor

For the SF-36, the Physical Functioning and Social Functioning domains along with the Physical Component Summary score (PCS) were most useful (i.e., valid) to discriminate between all categories of change in the FVC anchor (Figure 1). The Role Emotional domain (RE) discriminated best between the subset of subjects whose FVC either improved or declined minimally: the difference in RE change scores between subjects in whom FVC improved by 7-12% and those in whom FVC declined by 7-12% was 1.1 standard deviation units (i.e., the difference between an increase of 10.6 points for subjects with FVC improvement 7-12% and a decline of 5.3 points for subjects with FVC decline 7-12%, divided by the baseline standard deviation for RE: [10.6 − (−5.3)]/14.2). For the SGRQ, the Impact domain discriminated best between all categories of change in the FVC anchor as well as between subjects whose FVC either improved or declined minimally: the difference in Impact change scores between subjects in whom FVC improved by 7-12% and those in whom FVC declined by 7-12% was 0.7 standard deviation units (SDU).
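Written out, the standardized difference for the RE domain uses only the two group mean changes and the baseline standard deviation given in the text. The symbols are our notation, not the authors':

```latex
\[
\text{SDU}_{\text{RE}}
  = \frac{\bar{\Delta}_{\text{improved}} - \bar{\Delta}_{\text{declined}}}{SD_{\text{baseline}}}
  = \frac{10.6 - (-5.3)}{14.2}
  = \frac{15.9}{14.2}
  \approx 1.1
\]
```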

Figure 1. Changes in SF-36 scores stratified on changes in FVC%

The DLCO anchor

For the SF-36, the PCS and RE domains discriminated best between all categories of change in the DLCO anchor (Figure 2A). The difference in RE change scores between subjects in whom DLCO improved by > 15% and those in whom DLCO declined by > 15% was 0.9 SDU. For the SGRQ, the Impact domain discriminated best between categories of change in the DLCO anchor (Figure 2B). The difference in Impact change scores between subjects in whom DLCO improved by > 15% and those in whom DLCO declined by > 15% was 0.8 SDU.

Figure 2A. Changes in SF-36 scores stratified on changes in DLCO%
Figure 2B. Changes in SGRQ scores stratified on changes in DLCO%

The TDI anchor

For the SF-36, the Vitality (VT) and PCS domains discriminated best between all categories of change in the TDI anchor. Because of low numbers of subjects with TDI scores of 1 or −1, for this analysis, we elected to compare differences in HRQL change scores between subjects with TDI scores of 2 and those with TDI scores of −2. The VT domain remained most useful to discriminate between subjects whose TDI either improved or declined by 2 points: the difference in VT change scores between subjects in whom TDI improved by 2 and those in whom TDI declined by 2 points was 1.1 SDU. The SGRQ Impact domain discriminated best between all categories of change in the TDI anchor (Figure 3). For the SGRQ, the Symptoms domain discriminated best between subjects whose TDI either improved or declined by 2 points: the difference in Symptoms change scores between subjects in whom TDI improved by 2 and those in whom TDI declined by 2 points was 0.9 SDU.

Figure 3. Changes in SGRQ scores stratified on TDI

MID analyses

Correlations between the two anchors used in these analyses and HRQL scores are presented in Table 3. For the SF-36, distribution-based MID estimates were greater than anchor-based estimates (Table 4). For a given domain, the 1-SEM and 0.5ES estimates were fairly similar. On balance, minimally important changes in FVC corresponded to slightly higher MID estimates than did minimally important changes in the TDI anchor. Means of MID estimates for the SF-36 ranged from 2 for the GH domain to 4 for a number of domains. As for the SF-36, for the SGRQ, distribution-based MID estimates were greater than anchor-based estimates. Grand means of MID estimates for SGRQ domains ranged from 5 for the Activity domain to 8 for the Symptoms domain.

Table 3. Coefficients and p values for correlations between HRQL domains and anchors.
Table 4. Distribution- and anchor-based estimates for MIDs of the SF-36 and SGRQ.

Discussion

We performed the first systematic examination of the longitudinal performance of the SF-36 and SGRQ in patients with IPF. We found subjects whose clinical status changed most had the greatest changes (in the appropriate direction) in SF-36 and SGRQ scores; subjects whose clinical status did not change had essentially no change in HRQL scores; and subjects whose clinical status changed minimally had minimal changes in HRQL scores. We also derived the first MID estimates for the SF-36 and SGRQ in IPF.

There are no data on the longitudinal performance characteristics of the SGRQ in IPF. In the only longitudinal study to examine the SF-36 in IPF,21 Tomioka and colleagues showed that certain domains discriminated between subjects whose clinical status had changed according to pulmonary physiology or peripheral oxygenation. They did not estimate MIDs for SF-36 domains.

Validation is a process involving testing multiple hypotheses about an instrument to determine whether it “behaves” as expected of one designed to measure HRQL,22 and whether its scores can be used confidently (e.g., to determine whether a therapeutic intervention is beneficial). Our results support the validity of the SF-36 and SGRQ for longitudinal use in IPF and allow us to apply meaning to changes in SF-36 and SGRQ scores. For example, a group of IPF patients whose SF-36 PCS score (which assesses physical health) drops by three points is likely to have an FVC decline of at least 12% and worsening dyspnea (three-point decline in TDI).

Discriminating between subjects who improve or decline—an attribute some label as discriminant validity—is key to the usefulness of any HRQL instrument. That all domain scores did not change to the same degree (or at all) for certain anchors is not unexpected and does not detract from the usefulness of an instrument. As demonstrated by higher F-statistics, the SGRQ Impacts domain best discriminated between change categories in each of the three anchors. Among SF-36 scales, the PCS best discriminated between change categories in two of the three anchors. This is not surprising, given the greater impairment in physical domains in IPF and that the PCS integrates the four SF-36 physical health domains.

The recently modified definition of MID is that it is the smallest difference in a score that informed patients or proxies perceive as important, either beneficial or harmful, and which would lead the patient or clinician to consider a change in management.20 There is no one correct way to estimate the MID; it should be done using multiple methods.17 There are no published MID estimates for the SF-36 for IPF or even, to our knowledge, for COPD. Examining the results of a study by Kosinski and colleagues,12 in which MIDs for the SF-36 were derived in subjects with rheumatoid arthritis, gives some perspective to our SF-36 MID estimates: after converting estimates from their study to norm-based scores, we found our estimates to be very similar. Their MID estimate for the PF domain was 3 points versus 3 from this study; for the other scales the comparison was RP 5 vs. 4, BP 5 vs. 3, GH 1 vs. 2, VT 4 vs. 3, SF 4 vs. 4, RE 5 vs. 4, MH 5 vs. 3, PCS 3 vs. 3, and MCS 4 vs. 3. These similarities are not surprising: one expects that a generic instrument (like the SF-36) would behave similarly, no matter the population.

For the SGRQ, our MID estimates were greater than its widely accepted MID of four points, an estimate derived in patients with obstructive diseases by using expert opinion and anchor-based approaches.23 The divergence likely reflects differences in IPF vs. COPD and the differing behavior of the SGRQ in each. Recall, the SGRQ is obstructive disease-specific, and certain items tap constructs (e.g., wheezing) not pertinent to IPF patients. Published distribution-based MID estimates for the SGRQ vary widely, ranging from 1.3 to 8.4 units.23 Our distribution-based estimates ranged from 6-13.

We chose FVC, DLCO, and dyspnea as anchors because, in patients with IPF, each is key to tracking clinical status, and they are commonly used trial outcomes. We considered a 7-12% change in raw FVC as minimally important, because this range covers both 7% (recently shown to carry prognostic significance in IPF24,25) and 10% (a common endpoint in clinical trials); this gave us a reasonable range around the globally accepted 10% value. In populations other than IPF, a one-unit change in TDI is the MID,14 so we used it here.

The primary limitation of this study is the relatively small number of subjects whose pulmonary physiology changed over time, which left us with imprecise MID estimates. Unfortunately, patient-report global change scores (where a subject rates his overall HRQL at present in relation to baseline, often on a 7-choice Likert scale) were not collected in the BUILD-1 trial; if they had been collected, such scores could have been used as an anchor. Some investigators argue that global change scores make the best anchors.17 The inclusion criterion that subjects' baseline six-minute walk distance (6MWD) had to be between 150-499 meters means the results of our analyses may not be translatable to all IPF patients (e.g., those in the end stages of the disease who are unable to walk 150 meters in six minutes). The strength of our study is that it yielded the first-ever estimates of MIDs for the SF-36 and SGRQ in IPF, results that could be useful for guiding future research. In future IPF studies, investigators should perform confirmatory assessments of validity, responsiveness, and MIDs for the SF-36 and SGRQ (or any other instrument). Until a disease-specific instrument is developed and tested, investigators can confidently administer either or both the SF-36 and SGRQ in their studies and pay close attention to domains that have been shown to be useful.

In sum, we examined the SF-36 and SGRQ in a longitudinal IPF study and found them to perform reasonably well. Each possessed validity for discriminating subjects whose disease status changed by differing degrees over time. We derived the first estimates of the MIDs for these two instruments in IPF. More studies are needed to refine these estimates and further advance our understanding of the behavior of these instruments in IPF.

Supplementary Material

Online supplement

Acknowledgments

The authors wish to acknowledge the efforts of all the personnel at Actelion Pharmaceuticals (trial sponsor) and of the investigators involved in the BUILD-1 trial: Ishaar Ben-Dov, Charles Chan, Jean-Francois Cordier, James Dauber, Joao De Andrade, Adaani Frost, Thomas Geiser, Marilyn Glassberg, Jeffrey Golden, Gary Hunninghake, Sanjay Kalra, Lisa Lancaster, Robert Levy, Fernando Martinez, Keith Meyer, Joachim Mueller-Quernheim, Paul Noble, Christophe Pison, Charles Poirier, Milton Rossman, Paola Rottoli, Gerd Staehler, Domonique Valeyre, Athol Wells, Gordon Yung and David Zisman. We also wish to thank Dr. Diane Fairclough for her comments on a prior version of this manuscript and Dr. Ron Hays for his availability to answer questions pertaining to the MID.

Footnotes

Study conceptualization and design: Swigris, Wamboldt

Data collection: Swigris, Brown, Behr, du Bois, King, Raghu and the BUILD-1 investigators

Statistical analyses: Swigris, Wamboldt

Manuscript preparation and final approval: Swigris, Brown, Behr, du Bois, King, Raghu

The work in this manuscript is the original work of the stated authors. None of the authors has any real or potential conflicts with information in this manuscript.

References

1. Swigris JJ, Gould MK, Wilson SR. Health-related quality of life among patients with idiopathic pulmonary fibrosis. Chest. 2005;127:284–94.
2. Nishiyama O, Taniguchi H, Kondoh Y, et al. Health-related quality of life in patients with idiopathic pulmonary fibrosis. What is the main contributing factor? Respir Med. 2005;99:408–414.
3. Abrams D. Analysis of a life-satisfaction index. J Gerontol. 1976;24:470.
4. Swigris JJ, Kuschner WG, Jacobs SS, Wilson SR, Gould MK. Health-related quality of life in patients with idiopathic pulmonary fibrosis: a systematic review. Thorax. 2005;60:588–94.
5. King TE, Jr., Behr J, Brown KK, et al. BUILD-1: a randomized placebo-controlled trial of bosentan in idiopathic pulmonary fibrosis. Am J Respir Crit Care Med. 2008;177:75–81.
6. Joint Statement of the American Thoracic Society and the European Respiratory Society. Idiopathic pulmonary fibrosis: diagnosis and treatment. Am J Respir Crit Care Med. 2000;161:646–664.
7. Joint Statement of the American Thoracic Society and European Respiratory Society. American Thoracic Society/European Respiratory Society international multidisciplinary consensus classification of the idiopathic interstitial pneumonias. Am J Respir Crit Care Med. 2002;165:277–304.
8. Ware J, Jr., Sherbourne C. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30:473–483.
9. Jones P, Quirk F, Baveystock C. The St. George's Respiratory Questionnaire. Respir Med. 1991;85:25–31.
10. Mahler DA, Weinberg DH, Wells CK, Feinstein AR. The measurement of dyspnea. Contents, interobserver agreement, and physiologic correlates of two new clinical indexes. Chest. 1984;85:751–8.
11. Cronbach L. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–334.
12. Kosinski M, Zhao SZ, Dedhiya S, Osterhaus JT, Ware JE, Jr. Determining minimally important changes in generic and disease-specific health-related quality of life questionnaires in clinical trials of rheumatoid arthritis. Arthritis Rheum. 2000;43:1478–87.
13. Kerlinger F. Foundations of behavioral research. Holt, Rinehart, and Winston; New York: 1973.
14. Witek TJ, Jr., Mahler DA. Minimal important difference of the transition dyspnoea index in a multinational clinical trial. Eur Respir J. 2003;21:267–72.
15. Wyrwich K, Nienaber N, Tierney W, Wolinsky F. Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life. Med Care. 1999;37:469–478.
16. Wyrwich K, Tierney W, Wolinsky F. Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life. J Clin Epidemiol. 1999;52:861–873.
17. Hays RD, Farivar SS, Liu H. Approaches and recommendations for estimating minimally important differences for health-related quality of life measures. COPD. 2005;2:63–7.
18. Walters SJ, Brazier JE. What is the relationship between the minimally important difference and health state utility values? The case of the SF-6D. Health Qual Life Outcomes. 2003;1:4.
19. Norman GR, Sloan JA, Wyrwich KW. Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care. 2003;41:582–92.
20. Puhan MA, Frey M, Buchi S, Schunemann HJ. The minimal important difference of the hospital anxiety and depression scale in patients with chronic obstructive pulmonary disease. Health Qual Life Outcomes. 2008;6:46.
21. Tomioka H, Imanaka K, Hashimoto K, Iwasaki H. Health-related quality of life in patients with idiopathic pulmonary fibrosis--cross-sectional and longitudinal study. Intern Med. 2007;46:1533–42.
22. Jones PW. Health status measurement in chronic obstructive pulmonary disease. Thorax. 2001;56:880–7.
23. Jones PW. St. George's Respiratory Questionnaire: MCID. COPD. 2005;2:75–9.
24. Du Bois RM, Albera C, Costabel U, et al. Categorical declines in percent predicted forced vital capacity are associated with a graded risk of death in patients with idiopathic pulmonary fibrosis. Chest. 2008;134:S20003.
25. Zappala C, Latsi P, Nicholson AC, Wells AU. Marginal declines in FVC levels are associated with increased mortality in idiopathic pulmonary fibrosis. Thorax. 2007;175:A143.