To compare psychometric functioning of the Fatigue Severity Scale (FSS) and the Modified Fatigue Impact Scale (MFIS) in a community sample of persons with multiple sclerosis (MS).
A self-report survey including the FSS, MFIS, demographic and other health measures was completed by 1271 individuals with MS. Analyses evaluated the reliability and validity of the scales, assessed their dimensional structures, and estimated levels of floor and ceiling effects. Item response theory (IRT) was used to evaluate the precision of the MFIS and FSS at different levels of fatigue.
Participants had a mean score on the FSS of 5.1 and of 44.2 on the MFIS. Cronbach’s alpha values for FSS and MFIS were all 0.93 or greater. Known-groups and discriminant validity of MFIS and FSS scores were supported by the analyses. The MFIS had low floor and ceiling effects, while the FSS had low floor and moderate ceiling effects. Unidimensionality was supported for both scales. IRT analyses indicate the FSS is less precise in measuring both low and high levels of fatigue compared to the MFIS.
Researchers and clinicians interested in measuring physical aspects of fatigue in samples whose fatigue ranges from mild to moderate can choose either instrument. For those interested in measuring both physical and cognitive aspects of fatigue, and whose sample is expected to have higher levels of fatigue, the MFIS is a better choice even though it is longer. IRT analyses suggest both scales could be shortened without a significant loss of precision.
Fatigue has been defined as a “subjective lack of physical and/or mental energy that is perceived by the individual or caregiver to interfere with usual and desired activities,”(Multiple Sclerosis Council for Clinical Practice Guidelines [MSCCPG], 1998) and is among the most common and disabling symptoms reported by patients with multiple sclerosis (MS). Estimates of the prevalence of fatigue in persons with multiple sclerosis (MS) range from 53 to 92% (Branas, Jordan, Fry-Smith, Burls, & Hyde, 2000). A literature review of indexed studies measuring fatigue in MS between 2005 and 2010 identified 32 different self-report measures of fatigue. Of these, the most commonly used scales were the Fatigue Severity Scale (FSS) (Krupp, LaRocca, Muir-Nash, & Steinberg, 1989) and the Modified Fatigue Impact Scale (MFIS) (MSCCPG, 1998). The purpose of this study was to compare psychometric functioning of the FSS and MFIS in a sample of persons with MS. Specifically, we evaluated the reliability and validity of the scales, examined their dimensional structures, and assessed levels of floor and ceiling effects. In addition to the traditional psychometric analyses, we used an item response theory approach to evaluate the precision of the MFIS and FSS at different levels of fatigue (e.g., high versus low fatigue), a question that cannot be answered within the framework of Classical Test Theory (CTT).
Research participants were recruited through the Greater Washington chapter of the USA National Multiple Sclerosis Society (NMSS), which serves 23 counties in Washington State. Letters of invitation were sent to 7806 persons on the NMSS mailing list. Of the 1629 who responded to invitation letters, 1597 met eligibility criteria of being at least 18 years of age and reported having been diagnosed with MS. Eligible individuals were either mailed a self-report paper survey (n=1368) or directed to an online version of the same survey (n=229). Of these, 1271 individuals responded (80%). A short, anonymous demographics survey was sent to non-responders of the mailing list to assess possible recruitment bias. Responses received from 1046 non-responders indicated that 13% did not have MS despite being listed as persons with MS on the mailing list, and 34% did not recall receiving the initial survey invitation. Overall, the 1271 individuals who completed the study survey were similar on demographic variables to the non-responders except they were more educated (84% reported some college or more education compared to 72% of non-responders; chi2=30.7, p<0.001), slightly younger (53% of responders were 51 or older compared to 71% of non-responders; chi2=70.3, p<0.001), and had shorter mean disease duration (M=13, SD=10) than non-responders (M= 17, SD=12) [t(2041)=8.00, p<0.0001]. The study was approved by the Human Subjects Division at the University of Washington, and all participants provided written informed consent.
The FSS is comprised of nine items scored from 1 to 7 (1 = completely disagree, 7 = completely agree) (Krupp, et al., 1989). Scale scores are the mean of item scores, with lower scores indicating less fatigue. Respondents to the FSS are asked to consider the past week when choosing their answers. Items were chosen based on their ability to identify common features of fatigue in patients with MS and systemic lupus erythematosus. The item content of the FSS primarily focuses on the characteristics of fatigue (e.g., exercise brings on my fatigue, fatigue interferes with certain duties and responsibilities). All but one of the FSS items target physical aspects of fatigue. The remaining item is related to cognitive aspects of fatigue (motivation is lower when fatigued).
The MFIS is a shortened version of the Fatigue Impact Scale (Fisk et al., 1994) that contains 21 of the original 40 items. The measure was developed from interviews of individuals living with MS about how fatigue affects their daily activities and other life areas (Guidelines, 1998). In 1998, the Multiple Sclerosis Council for Clinical Practice Guidelines recommended the MFIS for use in clinical practice and research. The 21 items of the MFIS are scored from 0 to 4 (0 = Never, 1 = Rarely, 2 = Sometimes, 3 = Often, 4 = Almost always), and respondents are asked to consider their experiences with fatigue during the past four weeks. It can also be divided into three subscales: cognitive (10 items), physical (9 items) and psychosocial fatigue (2 items). The MFIS item content emphasizes symptoms of fatigue (e.g., muscles felt weak, needed to rest more often, clumsy).
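The scoring rules for the two scales can be sketched as follows. The FSS score is the mean of the nine item responses, as described above. Summing the 21 MFIS items into a 0 to 84 total is an assumption: it is consistent with the item range and with the sample mean reported for the MFIS, but the summation rule is not spelled out in this text. The function names are illustrative only.

```python
from statistics import mean


def score_fss(items):
    """FSS scale score: mean of nine items, each rated 1 to 7."""
    if len(items) != 9 or any(not 1 <= x <= 7 for x in items):
        raise ValueError("FSS expects nine responses in the range 1-7")
    return mean(items)


def score_mfis_total(items):
    """MFIS total score: sum of 21 items, each rated 0 to 4.

    Sum scoring is an assumption made for this sketch.
    """
    if len(items) != 21 or any(not 0 <= x <= 4 for x in items):
        raise ValueError("MFIS expects 21 responses in the range 0-4")
    return sum(items)
```

Under these rules the FSS score ranges from 1 to 7 and the MFIS total from 0 to 84, with higher values indicating more fatigue on both scales.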
The survey included measures of pain interference [Brief Pain Inventory (BPI)] (Cleeland & Ryan, 1994), pain severity and impact [Pain Impact Questionnaire (PIQ-6)] (Becker, Schwartz, Saris-Baglama, Kosinski, & Bjorner, 2007), anxiety [7-item Hospital Anxiety and Depression Scale (HADS)] (Zigmond & Snaith, 1983) and depression [Patient Health Questionnaire (PHQ-9)] (Kroenke, Spitzer, & Williams, 2001). A question about MS clinical course [self-report graphical image questionnaire] (Bamer, Cetin, Amtmann, Bowen, & Johnson, 2007) was included, as was the mobility subscale of the self-report version of the Expanded Disability Status Scale (EDSS) (Bowen, Gibbons, Gianas, & Kraft, 2001). Responses to the mobility subscale were used to categorize individuals into three groups: minimal (≤4.0), intermediate (4.5-6.5), and advanced (≥7.0) mobility impairment (Bowen, et al., 2001). Questions about demographics (e.g., age, gender), socioeconomic variables (e.g., employment) and the duration of MS also were included.
To evaluate floor and ceiling effects, the percentages of respondents with the lowest and highest possible scale scores were calculated for each of the fatigue scales.
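The floor and ceiling calculation amounts to counting respondents at the scale minimum and maximum. A minimal sketch, with a hypothetical function name and made-up scores rather than study data:

```python
def floor_ceiling_percent(scores, min_score, max_score):
    """Return (floor %, ceiling %) for a list of scale scores.

    floor   = share of respondents at the lowest possible score
    ceiling = share of respondents at the highest possible score
    """
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    floor = 100.0 * sum(s == min_score for s in scores) / n
    ceiling = 100.0 * sum(s == max_score for s in scores) / n
    return floor, ceiling


# Hypothetical MFIS-style totals on a 0-84 range (illustration only).
example_scores = [0, 12, 30, 44, 44, 51, 60, 72, 84, 84]
floor_pct, ceiling_pct = floor_ceiling_percent(example_scores, 0, 84)
```

For the ten illustrative scores, one respondent sits at the floor and two at the ceiling, so the function returns 10% and 20%.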
Classical reliability analyses included estimation of internal consistency (Cronbach’s alpha) and item-to-scale correlations for the FSS, for the total MFIS score, and for two of the MFIS subscales (MFIS-Cognitive, MFIS-Physical). Because the MFIS-Psychosocial subscale contains only two items, these statistics were not calculated for it. Values of Cronbach’s alpha between 0.7 and 0.9 are considered optimal for internal consistency (item homogeneity), whereas values over 0.9 indicate item redundancy (Boyle, 1991). Item-to-scale correlations greater than 0.40 are typically interpreted as evidence of scale reliability (Everitt, 2002, p. 208). Correlations were estimated using Spearman rank correlation coefficients.
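For reference, Cronbach’s alpha compares the sum of the item variances with the variance of the total score. The sketch below uses the standard formula with sample variances. The function name and the small data set are illustrative, not taken from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table.

    responses: list of rows, one row of item scores per respondent.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items and equal-length rows")
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from three people to two items.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

When every item ranks respondents identically, alpha reaches its maximum of 1. This is why values above 0.9 are read as a sign that some items are redundant.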
The construct validity of the FSS and MFIS was assessed by examining associations among different measures. We hypothesized that scores on the FSS and MFIS would be moderately to highly correlated. The highest correlation was expected between FSS scores and MFIS-physical scores, because both target physical aspects of fatigue. Weaker associations were expected between fatigue scores and scores on measures assessing other health domains, including pain (BPI, PIQ-6), anxiety (HADS), and depression (PHQ-9). Correlations were estimated using Spearman rank correlation coefficients.
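A Spearman rank correlation is a Pearson correlation computed on ranks, with tied values given their average rank. A minimal pure-Python sketch follows. The helper names are illustrative, and in practice a statistics package would normally be used.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the coefficient suits ordinal questionnaire scores such as those from the FSS and MFIS.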
Known-groups validity was assessed by evaluating whether scores distinguished among subgroups that theoretically should differ in mean scores. The MS literature supports the hypothesis that individuals with greater mobility impairment experience greater fatigue (Hadjimichael, 2008). One-way analysis of variance (ANOVA) was used to test the hypothesis that scores for the fatigue scales would differ by EDSS category, with higher fatigue reported by participants with higher (i.e., worse) EDSS scores.
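The one-way ANOVA F statistic is the ratio of between-group to within-group mean squares. The sketch below computes F and its degrees of freedom for any number of groups. The function name and the three small groups are hypothetical, and the p-value, which requires the F distribution, is deliberately left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical fatigue scores for minimal, intermediate and advanced groups.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

With three mobility groups the between-group degrees of freedom are always 2, and the within-group degrees of freedom equal the number of respondents minus 3.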
To evaluate the dimensional structure of FSS and MFIS responses, we used confirmatory factor analysis (CFA) to fit a model in which all items loaded on a single factor (unidimensional model). Unidimensionality is inherently assumed when summary scores are obtained using all items of the scale. These analyses were conducted using Mplus 5.21 (Muthen & Muthen, 2009). Fit of the unidimensional model was evaluated based on the comparative fit index (CFI) (Hu & Bentler, 1999). A CFI of 0.90 or greater has been suggested as a criterion for acceptable fit (Hu & Bentler, 1999).
When the criterion for acceptable fit with a unidimensional model is not met, an alternative factor model is McDonald’s bifactor model (Reise, Morizot, & Hays, 2007). In this model, all items load on a single, general factor. In addition, subsets of items that are expected to load on sub-dimensions (group factors) are identified either empirically or theoretically. Because the MFIS has subscales, we fit a bifactor model in which all items loaded on general fatigue, cognitive items loaded on one group factor, and all other MFIS items loaded on a second group factor. The two psychosocial items were grouped with the physical items because their content was more similar to the physical items than to the cognitive items, and this grouping was also supported by an exploratory factor analysis. Modeling the MFIS data using a bifactor model allowed us to estimate the amount of variance accounted for by the subscales compared to the variance accounted for by the overall fatigue factor.
The bifactor analyses also served a second purpose. One of the assumptions of Item Response Theory (IRT) analyses is unidimensionality. When factor loadings on the general factor are greater than 0.30 and the general factor accounts for more variance than do the group factors, unidimensional IRT models can be reliably applied (McDonald, 1981).
FSS and MFIS item responses were modeled using the graded response model (GRM) (Samejima, 1969), a model appropriate for items with more than two response options. Based on this model we calculated “information” for each scale and subscale. Information is the IRT counterpart of reliability estimates in classical methods. Its chief advantage is that values are estimated for every level of the trait being measured. CTT reliability statistics, by contrast, generate a single value for an entire scale, which obscures the fact that a scale typically measures different levels of a trait with different levels of precision. Scale information was plotted along with the distributions of MFIS and FSS scores. This graphical display provides a picture of a scale’s relative precision within the study sample, and we have included reference lines to indicate where the scales measure with reliability greater than 0.80 or 0.90.
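To make the link between information and reliability concrete, the sketch below implements the graded response model for one item and converts information to reliability. Two assumptions are made here that the text does not state: the logistic form of the GRM without a scaling constant, and the common convention reliability = 1 - 1/information, which holds when the trait is scaled to unit variance. Under that convention the 0.80 and 0.90 reference lines correspond to information values of 5 and 10. The item parameters in the example are hypothetical.

```python
import math


def _cumulative(theta, a, thresholds):
    """P(response >= category k), padded with 1.0 and 0.0 at the ends."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Probability of each response category under the graded response model.

    a: item discrimination; thresholds: increasing category boundaries.
    """
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP/dtheta)^2 / P."""
    cum = _cumulative(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 0:
            info += dp * dp / p
    return info


def reliability_from_information(info):
    """Assumed convention: reliability = 1 - 1/information (unit trait variance)."""
    return 1.0 - 1.0 / info


def information_for_reliability(reliability):
    """Information needed to reach a target reliability under that convention."""
    return 1.0 / (1.0 - reliability)


# Hypothetical 5-category item, evaluated at an average trait level (theta = 0).
probs = grm_category_probs(0.0, a=2.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
item_info = grm_item_information(0.0, a=2.0, thresholds=[-1.5, -0.5, 0.5, 1.5])
```

Scale information at a given trait level is the sum of the item information values. A scale therefore falls below a reliability reference line wherever that sum drops under the corresponding information value.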
Of the 1271 individuals participating in the study, 80% (n=992) were women, most were either married or living with a significant other (n=867; 70.3%), and 36.2% (n=447) reported being employed 20 or more hours a week. Participants had a mean age of 50.7 (SD=11.6; range 18-88) and mean disease duration of 13.2 (SD=10.1; range 0-60) years. The most common type of MS reported was relapsing remitting (n=700; 58.5%). Based on the mobility subscale of the EDSS, severity of MS was categorized as minimal for 32.4% of the sample (EDSS≤4.0), intermediate for 47.9% (EDSS 4.5 - 6.5) and advanced for 19.7% (EDSS≥7.0). The sample was similar to MS community samples in published studies with the exception of our sample having a higher proportion of women (81% versus 64% (Kos et al., 2005) and 81% versus 70.4% (Mills, Young, Nicholas, Pallant, & Tennant, 2009)). Demographic information and disease characteristics are displayed in Table 1.
Participants had a mean score on the FSS of 5.1 (SD=1.5) and of 44.2 (SD=18.2) on the MFIS. Mean scores on all variables included in the analyses are listed in Table 2.
We calculated the percentage of respondents with the lowest (floor effect) or the highest (ceiling effect) possible scores on the FSS and MFIS measures (see Table 2). The FSS had a low floor effect (0.9%) but a higher ceiling effect (6.8%). The floor effect for MFIS-Total scores (1.1%) was comparable to that of the FSS, but the ceiling effect (0.7%) was much smaller than that of the FSS. As expected with a two-item subscale, the MFIS-psychosocial subscale had the largest floor (7.4%) and ceiling (9.0%) effects.
Cronbach’s alpha values for the FSS scale, the MFIS subscales, and the MFIS total scores were all 0.93 or greater (see Table 3). This suggests some redundancy in item content. Item-to-scale correlations also were high for the FSS and for the MFIS scale and MFIS subscales.
The patterns of correlations between MFIS subscale scores and MFIS and FSS total scores were in the hypothesized direction and consistent with hypothesized magnitude. The FSS, which chiefly targets physical fatigue, had the highest correlation with the MFIS-physical (rho=0.77) and the lowest correlation with MFIS-cognitive (rho=0.55) (see Table 4).
Estimated associations between the MFIS and FSS scores and scores on other health constructs supported the discriminant validity of MFIS and FSS scores. Correlations between fatigue scores and HADS-anxiety scores were lower than correlations with scores on measures of pain and depression (see Table 4).
Known-groups validity also was supported. Individuals with less mobility impairment (EDSS ≤4.0) reported significantly less fatigue compared to those with more severe mobility impairment (EDSS≥7.0). There were statistically significant differences among the EDSS groups for the FSS [F (2,1218)=118.9, p<0.0001], MFIS-cognitive [F (2,1224)=41.2, p<0.0001], MFIS-physical [F (2,1221)=234.2, p<0.0001], MFIS-psychosocial [F (2,1229)=130.9, p<0.0001], and MFIS-total [F (2,1215)=138.7, p<0.0001]. Post hoc comparisons found statistically significant differences between scores for participants with mild symptoms compared to participants with either moderate or severe symptoms (see Table 5). The MFIS-psychosocial scores were significantly different for respondents with moderate versus severe mobility impairment. All other fatigue scale and subscale differences were non-significant between those with moderate versus severe mobility impairment (see Table 5).
The CFA results supported the unidimensionality of the FSS but not the MFIS. The CFI for FSS was 0.97 [N=1241, df=5, χ2=13511], well above the recommended threshold of 0.90. However, the CFI for MFIS was 0.84 [N=1240, df=8, χ2=15036], below this threshold.
Because the unidimensional model did not fit the MFIS data well, a bifactor model was fitted in which all items loaded on a single general factor, the cognitive items loaded on a group factor, and the physical and psychosocial items loaded on a second group factor. The general factor accounted for much more of both the total variance (52%) and the common variance (70%) in scores than did either group factor. The cognitive and physical/psychosocial group factors accounted for 5% and 18% of the total variance, respectively; they accounted for 6% and 24% of the common variance, respectively.
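The two sets of percentages are linked by simple arithmetic: a factor’s share of the common variance is its share of the total variance divided by the total variance explained by all factors together. The sketch below applies this to the total-variance figures reported above. It assumes the factors are orthogonal, as is standard for a bifactor model, and the function name is illustrative.

```python
def common_variance_shares(total_variance):
    """Convert each factor's share of total variance into its share of the
    common variance (variance explained by all factors combined)."""
    common = sum(total_variance.values())
    return {name: share / common for name, share in total_variance.items()}


# Total-variance proportions reported for the MFIS bifactor model.
shares = common_variance_shares(
    {"general": 0.52, "cognitive": 0.05, "physical_psychosocial": 0.18}
)
general_dominates = shares["general"] > max(
    shares["cognitive"], shares["physical_psychosocial"]
)
```

Recomputed this way, the shares are roughly 69%, 7% and 24%, close to the reported 70%, 6% and 24%. The small differences presumably reflect rounding of the published total-variance percentages.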
After calibrating FSS and MFIS scores to separate graded response models, we calculated the amount of information provided by each scale and subscale. The resulting functions were plotted against FSS and MFIS total scores observed in the current sample. The scores are displayed in histograms below each graph in Figures 1a and 1b. Also included in the graph are reference lines for reliability estimates of 0.80 and 0.90. As the figure shows, the FSS provides substantial precision in measuring middle levels of fatigue, but was less precise in measuring both low and high levels of fatigue. We calculated the percentages of individuals who were measured with reliability less than each of the two reference reliability standards of 0.80 and 0.90. A total of 107 individuals (8.7%) were measured with <0.80 reliability. Most of these (n=96; 7.8%) were persons with high levels of fatigue (ceiling effect). A total of 189 (21.4%) were measured with <0.90 reliability; most of these (n=166; 13.6%) at the ceiling of the scale.
Figure 1b plots the information for the MFIS-Total score and all subscales. As the figure shows, compared to the FSS scores, the MFIS-Total score provided substantially more precision at the “tails” of the score distribution. As with the FSS, we evaluated percentages of individuals who were measured with reliability less than 0.80 and 0.90. A total of 13 individuals (1.1%) were measured with <0.80 reliability, all of whom were at the high end of the scale. A total of 32 individuals (2.6%) were measured with <0.90 reliability, the majority of whom were at the high end of the scale (n=22; 1.8%). MFIS scores were much less negatively skewed than FSS scores. A total of 164 (13.4%) subjects had FSS theta values greater than 1.0 (indicating very high levels of fatigue), but the FSS provides relatively small amounts of information at these levels of fatigue. In contrast, the MFIS measures with adequate precision at all levels of fatigue represented in the sample.
The objective of this study was to use modern measurement methods to further examine the psychometric properties of two fatigue scales commonly used in MS research and to assist researchers with the selection of study instruments that best meet the needs of their study. Results suggest that researchers interested in measuring physical fatigue of samples whose fatigue ranges from mild to moderate can choose either instrument. For those interested in measuring both physical and mental fatigue and whose sample is expected to have high levels of fatigue we recommend using MFIS.
The mean FSS score in this study (5.1; SD=1.5) is similar to studies by Valko et al. (Valko, Bassetti, Bloch, Held, & Baumann, 2008) and Krupp et al. (Krupp, et al., 1989), which reported 4.7 (1.6) and 4.8 (1.3) respectively. Scores on the MFIS in this study also were similar to those obtained in other studies. The MFIS median value in the current study was 45.0. Other studies have reported median values ranging from 33.0 (Kos, et al., 2005; Tellez et al., 2005) to 45.0 (Kos, et al., 2005).
In the study sample, very few participants had scores at the floor or ceiling of either fatigue scale. Typically floor and ceiling effects are considered problematic when more than 15% of the sample has either the lowest or highest score possible (Terwee et al., 2007). Neither the FSS nor the MFIS had ceiling and floor effects of this magnitude in this study. The MFIS-psychosocial had the most subjects (9.0%) at the ceiling (worst possible score).
Reliability was evaluated using both CTT and IRT methods. The CTT analyses included estimation of internal consistency and item-to-scale correlations. Cronbach’s alpha values for the FSS and MFIS were all 0.93 or greater, above the 0.9 level taken to indicate item redundancy, which suggests an opportunity to shorten the scales. The item-to-scale correlations were all greater than the criterion of 0.40, providing evidence of item homogeneity for the MFIS subscales and MFIS total score. In addition to Cronbach’s alpha, this study also used test information obtained from an IRT analysis to examine the precision of the MFIS and FSS along the whole fatigue continuum. The MFIS appears to measure with greater precision than the FSS at higher levels of fatigue.
Construct validity was assessed by comparing the associations among fatigue subscale scores and total scores (convergent validity) as well as with other health constructs (discriminant validity), including pain interference, anxiety, and depression. Correlations between FSS scores and both MFIS-physical and MFIS-cognitive scores were found to be similar to the values obtained by Tellez et al. (2005) in a previous study, such that there was a greater association between FSS and the MFIS-physical scores than between FSS and the MFIS-cognitive scores. Furthermore, results from this study are consistent with the finding of Tellez et al. (2005) that MFIS scores are more highly correlated with depression scores than are FSS scores. A similar pattern was observed in relation to scores for other health concepts, i.e., lower correlations were observed between FSS scores compared to the MFIS total and domain scores.
Known-groups validity was supported in this study by observing higher fatigue scores (higher levels of fatigue) in subjects with moderate to severe MS symptoms in all the FSS and MFIS scores. This finding is consistent with the MS fatigue literature (Hadjimichael, 2008). The MFIS-psychosocial was the only domain where a significant difference was observed between participants reporting moderate (EDSS 4.5 – 6.5) and severe (EDSS ≥ 7.0) symptoms, which is surprising because there are only two items in this domain. Future studies should evaluate whether this result is replicated in other samples.
Testing the assumption of unidimensionality required for interpreting the summary score and fitting an IRT model also provides evidence related to construct validity. The degree of unidimensionality was found sufficient for fitting an IRT model for the FSS and MFIS. Two previously reported analyses with MS samples both supported unidimensionality (Hagell et al., 2006; Mills, et al., 2009). Strict unidimensionality is desirable for applications of IRT; however, it is well recognized that data from psychological measures are rarely (if ever) strictly unidimensional. In fact, to represent complex constructs adequately some multidimensionality may be necessary (Reise et al., 2010). The issue is what degree of multidimensionality, and of the resulting distortion in parameter estimates, can be tolerated. Published studies suggest that IRT scores are fairly robust to dimensionality violations (Camilli et al., 1995; Dorans & Kingston, 1985).
In the current study, IRT was used to examine psychometric properties of scales. If the purpose were to develop IRT-based scoring for the MFIS, for instance for a computerized adaptive testing application, parameter bias caused by unmodeled multidimensionality might be of more concern. However, in this study context, and because the bifactor analysis results supported sufficient unidimensionality for fitting an IRT model, the more multidimensional structure of the MFIS compared to the FSS is viewed more as a strength of the scale than a concern.
The results of this study highlight important differences between the FSS and MFIS. First, the FSS is shorter but measures primarily physical fatigue and does not measure with adequate precision at higher levels of fatigue which are often reported by individuals with MS. The FSS also utilizes a one week recall period, which one study suggests may be less accurate in measuring mean levels of fatigue than the four week recall period used by the MFIS (Broderick et al., 2008). MFIS is longer and measures both physical and cognitive fatigue with adequate precision along the whole continuum of fatigue commonly reported by people with MS. Therefore, in studies that include people with high levels of fatigue and where cognitive fatigue is of interest, it is preferable to administer MFIS even though it is longer than FSS.
In considering the results from this study, it is important to consider its strengths and limitations. This study included a large sample (n=1271) of individuals with MS living in the community. Limitations of this study include the cross-sectional nature of the data that does not allow for evaluation of responsiveness, an important aspect of psychometric functioning. In addition, the response rate to the initial invitation was low, and no effort was made to recruit a sample representative of individuals living with MS in the United States; therefore, it would be helpful if the study was replicated with different MS samples.
Our analyses (both CTT and IRT) suggested some item redundancy in both scales. In addition, IRT methods can be used to assess bias in responses to the items (referred to as differential item functioning). Longitudinal studies that administer these measures at baseline and after an effective treatment could be used to evaluate the degree to which scores on the measures detect change over time. Additional work may be needed to establish how much change in summary scores constitutes a clinically meaningful change.
The application of modern measurement methods can be used to improve the psychometric properties of both fatigue scales. In addition, fatigue instruments developed using modern psychometric theory are now publicly available (Cella et al., 2010) that allow for computerized adaptive testing and development of short instruments targeted to certain populations or certain levels of fatigue. These instruments lower respondent burden while estimating scores with a high level of precision along the entire fatigue continuum.
The contents of this manuscript were developed under grants from the Department of Education, NIDRR grant numbers H133B031129 & H133B080025, and the National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health grant number 5U01AR052171. However, these contents do not necessarily represent the policy of the Department of Education, and you should not assume endorsement by the Federal Government.
Publisher's Disclaimer: The following manuscript is the final accepted manuscript. It has not been subjected to the final copyediting, fact-checking, and proofreading required for formal publication. It is not the definitive, publisher-authenticated version. The American Psychological Association and its Council of Editors disclaim any responsibility or liabilities for errors or omissions of this manuscript version, any version derived from this manuscript by NIH, or other third parties. The published version is available at www.apa.org/pubs/journals/rep
Dagmar Amtmann, University of Washington, Rehabilitation Medicine, Box 354237, 4907 25th Ave NE, Seattle, WA 98105, Phone: 206 543-4741, Fax: 206 685-3244. Email: dagmara/at/u.washington.edu.
Alyssa M. Bamer, University of Washington, Rehabilitation Medicine, Box 356490, Seattle, WA 98195, Phone: 303 953-8085, Fax: 206 685-3244. Email: adigiaco/at/u.washington.edu.
Vanessa Noonan, University of Washington, Rehabilitation Medicine, Box 356490, Seattle, WA 98195, Phone: 604-707-2126 Fax: 604-707-2121. Email: Vanessa.Noonan/at/vch.ca.
Nina Lang, University of Washington, Rehabilitation Medicine, Box 356490, Seattle, WA 98195, Phone: 206-221-2414, Fax: 206 685-3244. Email: ninaclaire/at/gmail.com.
Jiseon Kim, University of Washington, Rehabilitation Medicine, Box 356490, Seattle, WA 98195, Phone: 512 299-5991. Email: jiseonk/at/u.washington.edu.
Karon F. Cook, Northwestern University, Feinberg School of Medicine, 710 N. Lake Shore Dr. Suite 729, Chicago, IL 60611, Phone: 713 291-3918. Email: karon.cook/at/northwestern.edu.