Few studies have explored issues of sensitivity and specificity for using the fatigue construct to identify patients meeting chronic fatigue syndrome (CFS) criteria. In this article, we examine the sensitivity and specificity of several fatigue scales that have attempted to define severe fatigue within CFS. Using Receiver Operating Characteristic (ROC) curve analysis, we found most scales and sub-scales had significant specificity problems, sensitivity problems, or both. However, the post-exertional subscale of the ME/CFS Fatigue Types Questionnaire (Jason, Jessen, et al., 2009) was the most promising in terms of specificity and sensitivity. Among the more traditional fatigue scales, Krupp, LaRocca, Muir-Nash, and Steinberg’s (1989) Fatigue Severity Scale had the best ability to differentiate CFS from healthy controls. Selecting questions, scales and cutoff points to measure fatigue must be done with extreme care in order to successfully identify CFS cases.
There have been relatively few studies assessing the sensitivity and specificity of fatigue scales which are frequently used to identify individuals with chronic fatigue syndrome (CFS) and differentiate them from healthy controls. The present investigation consists of two distinct studies, both of which employ samples of individuals with CFS and controls to assess the effectiveness of several well-known fatigue instruments in discriminating between these two groups by utilizing Receiver Operating Characteristic (ROC) curve analyses. The Fukuda et al. (1994) CFS case definition is the currently accepted case definition internationally, although there is no available “gold standard” to assess fatigue severity. This case definition requires an individual to experience six or more months of persisting or recurring chronic fatigue and the co-occurrence of four of eight additional core symptoms. However, these Fukuda et al. requirements have been criticized as lacking operational definitions and guidelines for accurate identification of CFS cases (Jason, King, et al., 1999; Reeves et al., 2003). For example, these criteria do not specify how to assess fatigue severity or the presence of persisting or recurring fatigue for a period of 6 or more months. Partially in response to these problems with operationalizing the Fukuda et al. definition, the Centers for Disease Control and Prevention (CDC) developed an empiric case definition for CFS that involves assessment of symptoms, disability, and fatigue with standardized instruments and specific cutoff points (Reeves et al., 2005).
The CDC’s empiric CFS case definition (Reeves et al., 2005) assesses fatigue using the Multidimensional Fatigue Inventory (MFI) (Smets, Garssen, Bonke, & DeHaes, 1995), a 20-item instrument consisting of several subscales including general fatigue and reduced activity. Reeves et al. define severe fatigue as a score of greater than or equal to 13 on the MFI general fatigue subscale or greater than or equal to 10 on the MFI reduced activity subscale. However, in one study of three groups with CFS, the mean MFI general fatigue scores ranged from 18.3 to 18.8 (Tiersky, Matheis, DeLuca, Lange, & Natelson, 2003), clearly higher than the Reeves et al. recommended cutoff of 13. In addition, reduced activity scale items refer to issues that a person with depression might easily endorse. For example, a person would meet the fatigue criterion if they stated that the following two items were entirely true: “I get little done,” and “I think I do very little in a day.” It is likely that most individuals with major depressive disorder (MDD) would meet this reduced activity criterion. Jason, Najar, Porter, and Reh (2009) found that 38% of those with a diagnosis of MDD were misclassified as having CFS using the CDC empiric CFS case definition (Reeves et al.). With the Reeves et al. empiric case definition criteria, the estimated rates of CFS in the US have increased to 2.54% (Reeves et al., 2007), rates that are about ten times higher than prior CDC prevalence estimates (Reyes et al., 2003) and estimates of other investigators (Jason, Richman, et al., 1999). It is plausible that this inflated CFS prevalence estimate in the U.S. is due to an inappropriate broadening of the case definition, and this might very well be related to specificity problems in the measures and cutoffs selected for this case definition.
There have been several reviews of fatigue instruments (Friedberg & Jason, 1998; 2002), some examining scales measuring fatigue intensity alone and others which integrate dimensions of fatigue intensity and functional outcomes associated with fatigue. As an example of a measure of fatigue intensity alone, Chalder et al.’s (1993) Fatigue Scale is a 14-item verbal rating measure that has strong internal consistency. Using an ROC curve analysis, Jason et al. (1997) found this scale was able to discriminate a CFS sample from a healthy control sample; however, it was not possible to differentiate the CFS sample from a lupus or multiple sclerosis (MS) sample. The Fatigue Scale was also not able to distinguish between CFS and primary depression (Friedberg & Jason, 2002), which is a critical diagnostic issue in CFS. In addition, Goudsmit, Stouten, and Howes (2008) found there was a marked overlap in fatigue scores within a CFS sample between those patients who rated themselves as moderately ill and those who rated themselves as severely ill.
An example of a well known fatigue/function measure is the Fatigue Severity Scale (FSS) (Krupp, LaRocca, Muir-Nash, & Steinberg, 1989). The FSS is composed of nine items, and in the initial validation study, internal consistency was high for specific illness groups (MS and lupus) and healthy controls. Studies of individuals with CFS revealed that FSS scores were significantly higher for individuals with CFS, as compared to individuals with MS or primary depression (Pepper, Krupp, Friedberg, Doscher, & Coyle, 1993). Taylor, Jason, and Torres (2000) compared the FSS with the Fatigue Scale (Chalder et al., 1993) and found for a CFS-like group, the FSS was more closely associated than the Fatigue Scale with severity ratings for the eight core CFS symptoms (Fukuda et al., 1994) and a number of functional outcomes. A ceiling effect in the FSS may limit its utility to assess severe fatigue-related disability, and Stouten (2005) has warned that many fatigue scales do not accurately represent the severe fatigue that is uniquely characteristic of CFS.
Other fatigue scales have also been developed for assessing fatigue severity and related functioning (Fukuda et al., 2008). For example, the 54-item Profile of Fatigue-Related Symptoms (PFRS) was developed to measure symptomatology specifically related to CFS (Arroll & Senior, 2009; Ray, Weir, Phillips, & Cullen, 1992). Each item on the PFRS lists a symptom typical of CFS and respondents are asked to indicate how intensely they have experienced that symptom over the past week. Ray et al. found the following four factors emerged from the PFRS: Emotional Distress, Fatigue, Cognitive Difficulty, and Somatic Symptoms. However, fatigue is a multifaceted construct, and a closer examination of the PFRS Fatigue factor items suggests that many different fatigue states are subsumed in this subscale. For example, items such as “The slightest exercise making you physically tired” and “muscles feeling weak after slight exercise” denote post-exertional fatigue, which relates to the unusual fatigue or malaise experienced by individuals with CFS following exertion. Yet other items within the subscale are more closely related to the experience of energy depletion, such as “Feeling physically drained” and “Not having the physical energy to do anything,” indicating inadequate energy reserves which are unrelated to exercise or effort. Consequently, fatigue instruments might need to be developed that delineate finer shades of meaning in the context of the physical fatigue experienced by individuals with CFS.
Recently, Jason, Jessen, et al. (2009) developed the ME/CFS Fatigue Types Questionnaire (MFTQ), a 22-item scale designed to measure the duration, severity and frequency of different fatigue-related sensations and symptoms. Fatigue items encompassed the following dimensions: lack of energy resources needed for daily functioning, over-stimulation of the mind or body without the available energy to act out the mental or physiological excited state, exhaustion or interruption related to everyday cognitive processes, tiredness that is associated with physical symptoms commonly seen in cases of influenza, and abnormal exhaustion following physical activity. Several fatigue factors emerged for individuals with CFS (Post-Exertional, Wired, Brain Fog, Energy, and Flu-Like fatigue), but only one factor emerged for a group of healthy controls. The five-factor structure confirmed in the CFS sample suggests that the symptom of fatigue in this illness is a multi-dimensional entity that is distinct from the generalized form of fatigue experienced by healthy individuals. The MFTQ appears to be a reliable and valid measure of fatigue types in individuals with CFS, but these findings need to be replicated in other CFS samples.
The MFTQ has been compared to other fatigue scales, and some intriguing findings have emerged. For example, Jason, Jessen, et al. (2009) found a significant correlation between the MFTQ’s Post-Exertional subscale and Ray et al.’s (1992) PFRS Emotional Distress factor for the control group, but the correlation between these two scales was not significant for the CFS group. These correlations suggest that for the general population, symptoms of post-exertional malaise are significantly related to emotional distress, whereas when people with CFS report symptoms of post-exertional malaise, the symptoms are independent of emotional distress. It is possible that because healthy individuals experience this relationship between emotional distress and post-exertional malaise, they might also believe these two domains are connected for themselves and, by inference, for patients with CFS, when in fact Jason, Jessen, et al.’s findings do not support this.
If fatigue scales are to be recommended for use in the diagnosis of CFS, as has been proposed by Reeves et al. (2005), then it is critically important to assess the effectiveness of these scales in differentiating between those with and without CFS. The current investigation sought to assess this critical issue by examining two distinct samples. Study 1 compares a CFS sample with a control group consisting of healthy college students and adults on the following scales: the MFTQ, FSS, the Fatigue Scale, and the PFRS. Study 2 compares a different CFS sample with a control group consisting of adults with a documented diagnosis of MDD on the following scales: the MFI, the Medical Outcomes Study Short Form-36 Health Survey (SF-36; a widely used measure of disability in CFS research), and the CDC Symptom Inventory, the scale recommended by Reeves et al. (2005) to assess the severity of the core symptoms of CFS. It was hypothesized that the MFTQ (Jason, Jessen, et al., 2009) would have better sensitivity and specificity than the other commonly used fatigue scales in differentiating between individuals with CFS and controls, as this scale was specifically developed for use with CFS samples and has characteristics that differentiate various aspects of fatigue unique to this illness. The implications of appropriate and inappropriate levels of sensitivity and specificity of these scales are far reaching with regard to obtaining well-characterized research samples. Ultimately, inadequate sensitivity and specificity lead to more ambiguity about the nature of CFS and further stigmatization of individuals disabled by this chronic illness.
The CFS group comprised 130 participants who indicated that they had CFS. These participants were initially contacted and recruited through several chronic fatigue syndrome support groups, conferences, and newsletters. The control group consisted of participants recruited via convenience sampling who were either enrolled at a large Midwestern university (N = 167) or were healthy participants (N = 31) referred by the participants in the CFS group (e.g., a healthy family member or friend) (Jason, Jessen, et al., 2009).
The ME/CFS Fatigue Types Questionnaire (MFTQ) is comprised of 22 items assessing Post-Exertional, Wired, Brain Fog, Energy, and Flu-Like fatigue (Jason, Jessen, et al., 2009). Post-Exertional fatigue was defined as abnormal exhaustion following a bout of physical activity (e.g., “Physically drained after mild activity”).
Wired fatigue was considered an over stimulation of the mind or body without the available energy to act out the mental or physiological excited state (e.g., “Body feels over-stimulated when very tired”). Brain Fog fatigue constituted the exhaustion or interruption related to everyday cognitive processes, such as memory recall, speech, or information processing (e.g., “Thinking is hard work and muddy”). Energy fatigue was defined as a lack of energy resources needed for daily functioning (e.g., “Do not have energy to do anything”). Finally, Flu-Like fatigue was the tiredness that is associated with physical symptoms commonly seen in cases of influenza (e.g., “Flu-like symptoms, such as sinus pain, etc.”). Respondents were asked to provide details that described the onset, frequency, and severity of each statement as it relates to the participants’ experience. The onset was identified by month/season and year, the frequency of each fatigue symptom was determined by a 5-point Likert-type scale format (i.e., Never, Seldom, Often, Usually, Always), and severity was rated on a symptom rating scale from 1 to 100 with a higher score indicating that the symptom is more of a problem for the participant. A composite score for each item was then calculated by multiplying the symptom rating score by the frequency score, with a possible range 0 to 400. Each scale score is an average of item composite scores for that scale.
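The composite scoring described above can be sketched in a few lines. This is an illustration only, not the authors' scoring code: the numeric coding of the frequency scale (Never = 0 through Always = 4) is an assumption inferred from the stated 0 to 400 range, and the function names and example responses are hypothetical.

```python
from statistics import mean

# ASSUMPTION: numeric codes for the 5-point frequency scale. The text does not
# state them; 0-4 is chosen because it reproduces the reported 0-400 range.
FREQUENCY_CODES = {"Never": 0, "Seldom": 1, "Often": 2, "Usually": 3, "Always": 4}


def item_composite(frequency: str, severity: float) -> float:
    """Composite for one item: severity rating (1-100) times frequency code."""
    if not 1 <= severity <= 100:
        raise ValueError("severity must be between 1 and 100")
    return severity * FREQUENCY_CODES[frequency]


def subscale_score(items) -> float:
    """Scale score: the average of the item composites for that scale."""
    return mean(item_composite(freq, sev) for freq, sev in items)


# Hypothetical responses to three items of one subscale.
example_items = [("Always", 90), ("Usually", 80), ("Often", 50)]
post_exertional = subscale_score(example_items)
print(round(post_exertional, 2))
```

Under this assumed coding, an item answered "Never" contributes 0 regardless of its severity rating, which is how the lower bound of the range arises.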
For the CFS sample, five factors emerged, but for the control sample, only one general factor emerged (Jason, Jessen, et al., 2009). Using the CFS sample, the alpha coefficients (Cronbach, 1951) for Post-Exertional, Wired, Brain Fog, Energy, and Flu-Like fatigue were .89, .83, .85, .76, and .84, respectively. For the control group, the Cronbach’s alpha was .92. These values demonstrate high internal consistency for the MFTQ on both samples.
The Fatigue Severity Scale (FSS) is a Likert scale consisting of nine items that assess fatigue severity and functionality (Krupp et al., 1989). Items are rated on a scale of 1 to 7 according to their level of agreement with a given statement, and include such statements as “I am easily fatigued” or “Fatigue interferes with carrying out certain duties and responsibilities.” Values for each item are averaged for a composite score, with higher scores indicating higher levels of impairment as a result of fatigue. In the original scale development study, Krupp et al. demonstrated that the FSS had sufficient internal consistency (Cronbach’s alpha was between .81 and .89 for the study groups), test-retest reliability, and concurrent validity. Krupp, Jandorf, Coyle, and Mendelson (1993) used this scale with 72 patients with CFS, and the mean fatigue score was 6.1 with a standard deviation of .8.
The Fatigue Scale, developed by Chalder et al. (1993), is an 11 item scale intended to measure the severity of fatigue-related symptoms experienced by individuals with ME/CFS. Responses to items are measured using a Likert-style format with four possible response choices related to symptom frequency (0 = less than usual, 1 = no more than usual, 2 = worse than usual, 3 = much worse than usual). The scores are then summed and a higher score indicates more severe fatigue-related symptomatology. The ‘Physical Fatigue’ items include questions such as “Do you have problems with tiredness?” or “Do you lack energy?” The remaining items constitute a ‘Mental Fatigue’ factor with questions such as “Do you have difficulty concentrating?” or “Do you make slips of the tongue when speaking?” The Total scale demonstrated sufficient internal consistency with alpha coefficients of .89 (Chalder et al.).
Ray et al. (1992) created the 54-item Profile of Fatigue-Related Symptoms (PFRS) in order to measure symptomatology specifically related to CFS. Each item lists a symptom typical of CFS and respondents are asked to indicate how intensely they have experienced that symptom over the past week. Responses are given in a seven point Likert-scale format ranging from 0 (not at all) through 3 (moderately) to 6 (extremely). Average item scores are then computed for four separate factors: Emotional Distress, Fatigue, Cognitive Difficulty, and Somatic Symptoms. Scale reliability was assessed for each factor, with alpha coefficients ranging from .88 to .96.
The statistical software package used for data analysis was PASW (formerly SPSS) for Windows, version 17.0. A ROC curve analysis (Hanley & McNeil, 1982) was used to evaluate the ability of the fatigue scales to discriminate between CFS and non-CFS individuals. The ROC curve graphically represents the probability of true positive results in diagnosis as a function of the probability of false positive results of this test. Sensitivity is defined as the probability that the test correctly classifies a CFS subject as positive. A true positive is defined as a participant who scores positive on the fatigue test for CFS and actually has the illness, whereas a false positive occurs when a participant whose fatigue tests positive for CFS does not have the illness. Specificity involves a test correctly classifying a non-ill participant as negative. A true negative is defined as a participant who tests negative on the fatigue test for CFS and does not have the illness, whereas a false negative is defined as a participant who tests negative on the fatigue test for CFS and actually has the illness.
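The four outcome types defined above translate directly into counts. The following minimal sketch (with hypothetical function and variable names, and made-up data) shows how sensitivity and specificity follow from those counts; it is not the PASW procedure used in the study.

```python
def sensitivity_specificity(has_cfs, tests_positive):
    """Return (sensitivity, specificity) from paired true status and test result."""
    tp = fn = tn = fp = 0
    for ill, positive in zip(has_cfs, tests_positive):
        if ill and positive:
            tp += 1      # true positive: ill and flagged by the fatigue test
        elif ill:
            fn += 1      # false negative: ill but not flagged
        elif positive:
            fp += 1      # false positive: not ill but flagged
        else:
            tn += 1      # true negative: not ill and not flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Made-up example: 4 participants with CFS, 6 without.
status = [True] * 4 + [False] * 6
result = [True, True, True, False, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(status, result)
print(sens, round(spec, 3))
```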
Of the 130 participants who were enrolled in the study with possible CFS, eight were excluded from the analysis because they indicated they were never officially diagnosed with CFS. Of the remaining 122 participants, 22 were excluded because they did not provide information on one or more of the MFTQ subscales (i.e., they had missing data on all items for that subscale). The final sample thus consisted of 100 individuals who reported being diagnosed by a physician with CFS and had full information for each of the fatigue scales. The control group consisted of 198 individuals. There were significant differences between the CFS and control samples for all variables, including age, gender, race, Hispanic origin, marital status, educational attainment, and work status. Socio-demographic information can be found in Jason, Jessen, et al. (2009) and Jason et al. (in press).
An ROC analysis is produced by plotting the sensitivity versus 1 - specificity for all cutoff points of the fatigue scales. The area under the ROC curve (AUC) is an indicator of the discriminatory ability of the scale: a straight line (area = 0.5) means that the scale is doing no better than chance in classifying CFS and non-CFS, while a perfect scale would have a ROC curve with an area of 1. The AUC is a summary measure that essentially averages diagnostic accuracy across the spectrum of test values. The informative AUC ranges from 0.5 to 1.0, and not from 0.0 to 1.0 as would the area under a probability distribution curve. An AUC of .99 means that 99% of the time a randomly selected individual from the CFS group will more adequately fulfill the fatigue criterion than a randomly selected individual from the control group. A test needs an AUC threshold of between 90-100% to have diagnostic meaning, and 95% or above to be considered a good diagnostic tool (Zweig & Campbell, 1993; Zou, O’Malley, & Mauri, 2007).
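The probabilistic reading of the AUC given above can be computed directly by comparing every CFS score with every control score. The sketch below assumes that higher scores indicate more fatigue and counts ties as half; the function name and the scores are hypothetical, and statistical packages typically obtain the same quantity from the ROC curve itself.

```python
def auc_by_pairs(case_scores, control_scores):
    """Fraction of (case, control) pairs in which the case scores higher.

    Ties count as half a win, which makes this equal to the area under
    the empirical ROC curve.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up fatigue scores for illustration.
cfs = [6.1, 5.5, 4.0]
controls = [3.0, 4.0, 2.5, 5.0]
area = auc_by_pairs(cfs, controls)
print(area)
```

A scale that cannot separate the groups at all yields a value near 0.5, matching the chance line described above.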
Table 1 presents data on the AUC, asymptotic standard error, and asymptotic 95% Confidence Intervals. The PFRS Fatigue subscale (Ray et al., 1992), the Fatigue Scale (Chalder et al., 1993), the FSS (Krupp et al., 1989) and the MFTQ’s Energy, Brain Fog, and Post-Exertional fatigue scales (Jason, Jessen, et al., 2009) had an AUC at or above the 90% threshold.
At the 90% sensitivity level for the Fatigue Scale (with a score ≥ 14.50), the PFRS Fatigue scale (a score ≥ 1.88), the MFTQ Energy scale (a score ≥ 18.75), MFTQ Brain Fog scale (a score ≥ 24.17), and the FSS (a score ≥ 4.95), we found a specificity of .61, .62, .71, .73, and .84, respectively. In other words, even if these scales or subscales are able to identify 90% of those individuals with CFS, the tests are less accurate for correctly classifying a non-ill subject as negative (although among these scales, the FSS has the best specificity). However, with 90% sensitivity, the MFTQ Post-Exertional fatigue (a score ≥ 36.07) had a specificity of .93. Furthermore, even at the 95% level of sensitivity for Post-Exertional fatigue (a score ≥ 21.07), the specificity was .86. Clearly, the MFTQ Post-Exertional fatigue scale identifies the highest percentage of true negatives.
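Reading specificity off at a fixed sensitivity level amounts to choosing the highest cutoff that still captures the target share of CFS cases and then checking how many controls fall below it. The sketch below is one simple way to do this under stated assumptions: candidate cutoffs are taken from the observed scores (statistical packages often use midpoints between adjacent scores instead, which may explain cutoffs such as 4.95), and all names and data are hypothetical.

```python
def specificity_at_sensitivity(case_scores, control_scores, target=0.90):
    """Return (cutoff, sensitivity, specificity) for the highest cutoff whose
    sensitivity meets the target, treating score >= cutoff as test-positive."""
    candidates = sorted(set(case_scores) | set(control_scores), reverse=True)
    for cutoff in candidates:
        sens = sum(s >= cutoff for s in case_scores) / len(case_scores)
        if sens >= target:
            spec = sum(s < cutoff for s in control_scores) / len(control_scores)
            return cutoff, sens, spec
    raise ValueError("target sensitivity must be at most 1.0")


# Made-up integer scores for 10 cases and 10 controls.
cases = [2, 5, 6, 6, 7, 7, 7, 7, 7, 7]
controls = [1, 1, 2, 2, 3, 3, 4, 5, 6, 7]
cutoff, sens, spec = specificity_at_sensitivity(cases, controls, target=0.90)
print(cutoff, sens, spec)
```

Raising the target sensitivity can only push the cutoff down, so specificity stays the same or falls, which is the trade-off visible in the reported .93 versus .86 for Post-Exertional fatigue.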
Jason, Najar, et al. (2009) investigated the new Reeves et al. (2005) CFS empiric case definition with 27 participants with a diagnosis of CFS and 37 participants with a diagnosis of a major depressive disorder (MDD). All participants completed two subscales of the Multidimensional Fatigue Inventory (MFI). We investigated whether the fatigue criterion recommended by Reeves et al. to identify severe fatigue in CFS samples had adequate sensitivity and specificity in a sample of those with either CFS or MDD. Although the analyses of this study focus on the fatigue measure (MFI), we also collected information on other criteria used in the Reeves et al. empiric case definition including measures of symptoms (The CDC Symptom Inventory; Wagner et al., 2005) and disability (The Medical Outcomes Study Short Form-36 Health Survey; Ware, Snow, & Kosinski, 2000). As this information was available, we decided to evaluate the sensitivity and specificity of all three criteria (fatigue, symptoms, and disability), which are now used to diagnose CFS using the Reeves et al. empiric CFS case definition.
We recruited a total of 64 individuals, 27 with CFS and 37 with MDD. We obtained our sample of participants with CFS from two sources: local CFS support groups in Chicago and a previous research study conducted at DePaul University. To be included in the study, participants were required to have been diagnosed with current CFS, using the Fukuda et al. (1994) diagnostic criteria, by a certified physician. We solicited 37 participants with a diagnosis of MDD to participate in this study. We found participants from three sources: local chapters of the Depression and Bipolar Support Alliance group in Chicago; Craigslist, a free, community-moderated local classified ads forum; and online depression support groups. To be included in the study, all participants were required to have been diagnosed with MDD by a licensed psychologist or psychiatrist. We carefully screened participants to ensure that those in the MDD group did not have CFS as defined by the Fukuda et al. criteria (see Jason, Najar, et al., 2009 for more details).
Participants were separated into three groups: those 27 previously diagnosed with CFS (Jason, Najar, et al., 2009) and who met the new empiric CDC case definition of CFS (Reeves et al., 2005), those 14 from the MDD group meeting the new empiric CDC case definition of CFS criteria (MDD/CFS), and those 23 from the MDD group not meeting the new empiric CDC criteria for CFS (MDD). The Jason, Najar, et al. study suggested that the MDD/CFS group did not have CFS, but had MDD, and yet the Reeves et al. case definition inappropriately classified them as having CFS. Socio-demographic data compared across all three groups of participants indicated a significant age effect, with the average age for the CFS group significantly older than the MDD/CFS group. Furthermore, there were also significant differences between groups with regard to work status, with more individuals in the CFS group on disability compared with the MDD/CFS group (for more details, see Jason, Najar, et al.).
The Multidimensional Fatigue Inventory (MFI) is a 20-item self-report instrument consisting of five scales: general fatigue, physical fatigue, reduced activity, reduced motivation, and mental fatigue (Smets et al., 1995). Each scale contains four items rated from zero to five with the scale score of zero meaning completely true and the scale score of five meaning no, not true. Reeves et al. (2005) employed the MFI to measure severe fatigue, and to do this, they used only two of the five subscales: general fatigue and reduced activity. Using the CDC empiric case definition standards, severe fatigue was defined as greater than or equal to 13 on general fatigue or greater than or equal to 10 on reduced activity.
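The severe fatigue rule just described is a simple either/or test on two subscale scores. A minimal sketch, with a hypothetical function name and made-up scores, makes the logic explicit:

```python
# Cutoffs as stated for the CDC empiric case definition (Reeves et al., 2005).
GENERAL_FATIGUE_CUTOFF = 13
REDUCED_ACTIVITY_CUTOFF = 10


def meets_mfi_fatigue_criterion(general_fatigue: float, reduced_activity: float) -> bool:
    """True when either MFI subscale reaches its cutoff."""
    return (general_fatigue >= GENERAL_FATIGUE_CUTOFF
            or reduced_activity >= REDUCED_ACTIVITY_CUTOFF)


# Made-up respondents: (general fatigue, reduced activity).
respondents = [(13, 4), (12, 9), (5, 10)]
flags = [meets_mfi_fatigue_criterion(g, r) for g, r in respondents]
print(flags)
```

Because a respondent qualifies on either subscale, the combined rule flags at least as many people as either cutoff alone, so it can raise sensitivity only at the cost of equal or lower specificity.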
The Medical Outcomes Study Short Form-36 Health Survey (SF-36) is a 36-item instrument composed of multi-item scales that assess functional impairment in eight areas: limits in physical activities (physical functioning), limits in one’s usual role activities due to physical health (role physical), limits in one’s usual role activities due to emotional health (role emotional), bodily pain, general health perceptions (general health), vitality (energy and fatigue), social functioning, and general mental health (Ware et al., 2000). Scores in each area reflect ability to function and higher values indicate better functioning. Studies have demonstrated high reliability and validity for this instrument in a wide variety of patient populations (Stewart et al., 1989). Based on the CDC empiric case definition (Reeves et al., 2005), the SF-36 was used to assess disability. According to Reeves et al., significant reductions in occupational, educational, social, or recreational activities were defined as scores lower than the 25th percentile on physical functioning (less than or equal to 70), or role physical functioning (less than or equal to 50), or social functioning (less than or equal to 75), or role emotional (less than or equal to 66.7). A person would meet the disability criterion for the empiric CFS case definition by showing impairment in only one of these four areas (Reeves et al., 2005).
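The disability criterion can be expressed as a check of four subscale scores against the cutoffs quoted above. The sketch below is illustrative; the dictionary keys and the example scores are hypothetical names chosen for this example.

```python
# Cutoffs as quoted in the text; a score at or below the cutoff counts as impaired.
SF36_CUTOFFS = {
    "physical_functioning": 70,
    "role_physical": 50,
    "social_functioning": 75,
    "role_emotional": 66.7,
}


def meets_disability_criterion(scores: dict) -> bool:
    """True when at least one of the four subscales is at or below its cutoff."""
    return any(scores[name] <= cutoff for name, cutoff in SF36_CUTOFFS.items())


# Made-up respondent impaired only on role emotional.
only_role_emotional = {
    "physical_functioning": 95,
    "role_physical": 100,
    "social_functioning": 87.5,
    "role_emotional": 33.3,
}
# Made-up respondent with no impairment on any of the four subscales.
unimpaired = {
    "physical_functioning": 95,
    "role_physical": 100,
    "social_functioning": 100,
    "role_emotional": 100,
}
print(meets_disability_criterion(only_role_emotional), meets_disability_criterion(unimpaired))
```

The first example shows why the rule is permissive: impairment in role emotional alone, which is common in depression, is enough to satisfy it.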
The CDC Symptom Inventory assesses information about the presence, frequency, and intensity of 19 fatigue-related symptoms during the past month (Wagner et al., 2005). For each of the eight Fukuda et al. (1994) symptoms, participants were asked to report the frequency (1 = a little of the time, 2 = some of the time, 3 = most of the time, 4 = all of the time) and severity (the ratings were transformed to the following scale: 0 = symptom not reported, 1 = mild, 2.5 = moderate, 4 = severe). The frequency and severity scores were multiplied for each of the eight critical Fukuda et al. symptoms and were then summed. Subjects having four or more symptoms and scoring greater than or equal to 25 on this instrument would meet the symptom criteria according to the CDC empiric case definition (Reeves et al., 2005).
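The Symptom Inventory scoring rule described above can be sketched as follows. Two points are assumptions made for this illustration rather than details given in the text: a symptom is counted as present when its transformed severity is greater than 0, and an unreported symptom is entered with frequency 0. The function name and example responses are hypothetical.

```python
# Transformed severity values as given in the text.
SEVERITY = {"not reported": 0.0, "mild": 1.0, "moderate": 2.5, "severe": 4.0}


def symptom_inventory(responses):
    """Score the eight symptoms.

    responses: list of (frequency 0-4, severity label) pairs, one per symptom.
    Returns (total score, number of symptoms present, criterion met).
    """
    total = sum(freq * SEVERITY[sev] for freq, sev in responses)
    # ASSUMPTION: a symptom is "present" when its severity is above 0.
    present = sum(SEVERITY[sev] > 0 for _, sev in responses)
    return total, present, (present >= 4 and total >= 25)


# Made-up respondent with four symptoms, some of them frequent and severe.
few_but_severe = [(4, "severe"), (3, "moderate"), (2, "moderate"), (1, "mild")] \
    + [(0, "not reported")] * 4
# Made-up respondent with all eight symptoms, each mild and occasional.
many_but_mild = [(2, "mild")] * 8

print(symptom_inventory(few_but_severe))
print(symptom_inventory(many_but_mild))
```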
We analyzed the data comparing the CFS group to the MDD group, the CFS group to the MDD/CFS group, and the CFS group to the combined MDD and MDD/CFS groups. As all three analyses provided similar outcomes, we present data on the CFS group versus the combined MDD and MDD/CFS groups (see Table 2). For the MFI General Fatigue scale, when we compared the CFS versus MDD combined group, the ROC resulted in an AUC of .63. For the MFI Reduced Activity scale, the ROC resulted in an AUC of .56. For the Reeves et al. (2005) criteria for CFS, the MFI General Fatigue score of 13 or above had a sensitivity of .85 and a specificity of .35, whereas the MFI Reduced Activity score of 10 or greater had a sensitivity of .82 and a specificity of .24. To meet the Reeves et al. fatigue criterion for CFS, a person would need to meet either the General Fatigue cutoff score or the Reduced Activity cutoff score, and with this criterion, the AUC was .54 (with an overall sensitivity of 1.0 and a specificity of .08, meaning that all CFS cases would be identified but almost none of the negative cases would be correctly classified). Taken together, the AUC findings and the sensitivity and specificity outcomes indicate that the fatigue criterion of the Reeves et al. empiric CFS case definition would not be considered a good diagnostic tool for selecting CFS cases among a sample of CFS and MDD cases.
Table 2 also presents data on the AUCs, SEs, and 95% CIs for the symptom (SI) and disability (SF-36) measures. For the Reeves et al. (2005) empiric CFS criteria to be met, a person would need to meet criteria on the MFI fatigue scale (as indicated above), score greater than or equal to 25 on the SI, and score below the 25th percentile for one or more of the four SF-36 subscales in Table 2 (cutoff scores are provided in Table 2). The final row in Table 2 indicates whether a person meets the Reeves et al. empiric criteria for all three measures: the MFI, the SI, and the SF-36. Using the cutoff scores as required by the Reeves et al. criteria, the MFI fatigue criterion and the SF-36 disability criterion each select all CFS cases but identify almost none of the true negatives. The symptom measure (SI) is the exception: it identifies .93 of the CFS cases and about half of the negative cases. These results indicate that the Reeves et al. empiric CFS case definition would not be considered a good diagnostic tool for selecting CFS cases among a sample of CFS and MDD cases.
The study’s primary finding is that many fatigue scales do not have both the sensitivity to identify those with CFS and the specificity to correctly exclude those without CFS. Study 1 compared those with CFS and healthy controls, and while several fatigue scales, such as the PFRS Fatigue subscale (Ray et al., 1992), the Fatigue Scale (Chalder et al., 1993), the FSS (Krupp et al., 1989), and the MFTQ’s Energy and Brain Fog subscales (Jason, Jessen, et al., 2009), had adequate sensitivity, the scales did not identify an adequate percentage of true negatives. In contrast, the MFTQ’s Post-Exertional fatigue subscale (Jason, Jessen, et al., 2009) had adequate sensitivity and specificity. For a low prevalence illness such as CFS, it is critical for a fatigue scale to be able to accurately identify both positive and negative cases.
Each of the studies had controls that represented different populations. If one is trying to use fatigue scales to identify people who might have CFS and the comparison is a healthy population, as in Study 1, several fatigue scales would identify 90% or more of CFS cases, but these fatigue tests are less accurate for correctly classifying a non-ill subject as not having CFS. Study 2 dealt with efforts to discriminate CFS from a population that had a psychiatric illness. For such samples, Study 2 indicated that the MFI (Smets et al., 1995) does not have adequate sensitivity and specificity for identifying fatigue among CFS cases from a sample containing depressed individuals.
The scale with the best sensitivity and specificity was the Post-Exertional subscale of the MFTQ (Jason, Jessen, et al., 2009). Post-Exertional fatigue is defined as abnormal exhaustion following a bout of physical activity (e.g., “Physically drained after mild activity”). Clearly, while almost all of the CFS participants did experience Post-Exertional fatigue, some of the participants did not experience high levels of fatigue. In other words, some patients with CFS are not chronically fatigued, but they have a problem of endurance or stamina, and lengthy times to recover following minimal degrees of activity (Hyde, 1999). Such individuals with CFS might not register impaired scores on overall measures of fatigue intensity. A person who participates in very little activity (possibly to minimize CFS symptoms) when compared to his or her same-age peers, and becomes exhausted upon minimal exertion, should not be excluded from a CFS diagnosis. Additionally, the Fukuda et al. (1994) requirement of at least 24 hours of post-exertional fatigue may be too stringent to capture this phenomenon in individuals with CFS who severely limit their activities.
According to Bayes’ theorem (Jaynes, 2003), even if a screening test was 95% accurate, most individuals who screen positive would not actually have the illness, given the low prevalence rate of CFS (about 4 in a thousand; Jason, Richman, et al., 1999). To increase sensitivity, Reeves et al. (2007) recommend screening for at least one of the CFS defining symptoms (fatigue, cognitive impairment, unrefreshing sleep, muscle pain, joint pain, sore throat, tender lymph nodes, or headache) for ≥ 1 month. It is interesting that the symptom with the most sensitivity and specificity, post-exertional fatigue, was not part of these CFS defining symptoms.
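The effect of low prevalence on screening can be made concrete with a short calculation. The sketch below is illustrative only: it assumes that "95% accurate" means a sensitivity and a specificity of 0.95 each (the text does not define the term), and it uses the prevalence of about 4 in 1,000 cited above. The function name is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(ill | positive test) using Bayes' theorem."""
    true_pos = sensitivity * prevalence                   # ill and test positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # not ill but test positive
    return true_pos / (true_pos + false_pos)


# Assumption: "95% accurate" is read as sensitivity = specificity = 0.95.
# The prevalence of about 4 per 1,000 is the figure cited in the text.
ppv = positive_predictive_value(sensitivity=0.95, specificity=0.95, prevalence=0.004)
print(f"Positive predictive value: {ppv:.3f}")  # prints "Positive predictive value: 0.071"
```

Under these assumptions, only about 7% of people who screen positive would actually have CFS, which is why specificity matters so much for a low prevalence illness.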
In Study 1, we found that Krupp et al.’s (1989) FSS score of > 4.95 would select 90% of CFS cases and would correctly identify negative cases 84% of the time. In Study 1, this scale had a better ability to detect cases and non-cases than either the Fatigue Scale (Chalder et al., 1993) or the Profile of Fatigue-Related Symptoms (Ray et al., 1992). Therefore, if one wanted to use a fatigue scale to identify cases of fatigue for a CFS diagnosis, there is some justification for using a score of > 4.8 on the FSS. However, for those individuals who have low stamina and endurance, but currently have less fatigue because they are severely limiting their daily activities, an argument could also be made that they might satisfy the fatigue criterion by evidencing post-exertional fatigue.
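As a rough illustration of how a cutoff score translates into sensitivity and specificity, the sketch below applies a threshold to two small sets of scores. The scores are invented for illustration and are not study data; the function name and the assumed score range of 1 to 7 are likewise assumptions.

```python
def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Classify scores above the cutoff as positive and return (sensitivity, specificity)."""
    true_pos = sum(score > cutoff for score in case_scores)      # cases flagged as positive
    true_neg = sum(score <= cutoff for score in control_scores)  # controls flagged as negative
    return true_pos / len(case_scores), true_neg / len(control_scores)


# Hypothetical scores on an assumed 1-7 range; NOT data from either study.
cfs_scores = [6.3, 5.9, 6.8, 4.2, 5.5, 6.1, 5.0, 6.6, 5.7, 6.9]
control_scores = [2.1, 3.4, 5.2, 2.8, 4.0, 3.1, 1.9, 5.6, 2.5, 3.7]

sens, spec = sensitivity_specificity(cfs_scores, control_scores, cutoff=4.95)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the cutoff in this toy example would trade sensitivity for specificity, which is the trade-off a ROC curve summarizes across all possible cutoffs.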
Specification of the criterion for fatigue is important because it influences who is selected for both epidemiologic studies and treatment trials. Reeves et al. (2005) are currently using the MFI fatigue scale (Smets et al., 1995) to identify whether participants meet the fatigue criterion for their CFS empiric case definition. Although Study 2 found that the MFI did identify all CFS cases, this instrument was not able to successfully identify those who did not have CFS. Similar problems occurred when the SF-36 was used to measure substantial reductions in functioning.
Identification of CFS samples using methods with optimal sensitivity and specificity will allow for a comparison of findings across different studies, as well as enhance the capability of researchers to identify biological markers for this illness. Substantial implications arise for the CFS patient community if CFS research is improved through the consistent utilization of empirically-derived screening instruments. For example, patients with CFS often report dissatisfaction with treatments they receive, which is in part due to the poor understanding of the etiology of this disabling condition (Ax, Gregg, & Jones, 1997; Deale & Wessely, 2001). If research is conducted on more homogeneous samples and key biological markers are found across studies, then more appropriate treatment options can be developed for CFS. Furthermore, if greater credibility is given to CFS as a medical illness among medical professionals, patients may garner improved support from their health care providers (Ax et al., 1997).
Although standardized fatigue scales are particularly useful in research settings, findings from this study may also have implications for clinical practice. Due to ambiguities in case definitions of CFS and because diagnoses are given after the exclusion of other fatigue-inducing illnesses, care providers are faced with a significant challenge in diagnosing this illness. Oftentimes, patients with CFS are inaccurately given psychiatric diagnoses, such as depression (Deale & Wessely, 2000), and misdiagnosis has consequences for the diagnosis and treatment of CFS. The administration of a brief questionnaire, such as the Post-Exertional subscale of the MFTQ, may assist physicians in identifying key features of CFS symptomatology. Assessing the hallmark symptom of post-exertional malaise will assist in the differential diagnosis of CFS and psychiatric disorders, as this symptom is more common and severe among patients with CFS than those with MDD (Hawk, Jason, & Torres-Harding, 2006).
One limitation of Study 1 was that the control group was derived from a convenience sample of students and healthy controls referred by the participants with CFS (e.g., parents, other family members). Moreover, the CFS group self-reported their diagnosis, and the participants were not asked for any documentation regarding the diagnosis. Similarly, participants in Study 2 self-reported a previous diagnosis of CFS by a physician, and both the CFS and MDD groups were recruited via convenience sampling. Other limitations of Study 2 include the lack of a healthy control group, small sample sizes, and the focus on only one psychiatric condition.
As indicated in this study, definitions of fatigue might need to include specific guidelines pertaining to the importance of symptom severity in the diagnostic procedure (Cantwell, 1996). Given the high variability in symptom severity among persons with fatigue, standardized procedures need to be employed for determining whether or not a particular symptom is severe enough to qualify as meeting the fatigue criterion for CFS. There is a need for more basic research investigating the specificity and sensitivity of different fatigue criteria being used to diagnose CFS. Determining whether participants meet criteria for symptoms or for substantial reductions in activities also requires more standardization of procedures. Future research is clearly needed to identify instruments that differentiate CFS from other chronic illnesses (Jason et al., 1997), and to assess the specificity and sensitivity of these measures.
The authors appreciate the financial assistance provided by the National Institute of Allergy and Infectious Diseases (grant number AI055735). The authors thank Steven Miller for his help in data analysis.
1. The five MFTQ Post-Exertional items with the highest factor loadings were: dead, heavy feeling after starting to exercise; next day soreness or fatigue after non-strenuous everyday activities; mentally tired after the slightest effort; physically drained or sick after mild activity; and minimum exercise makes you physically tired. Given the importance of this subscale, in addition to a total score, we decided to investigate how many items might be needed to differentiate CFS from controls. We scored an item as being a positive indicator of CFS if it had a frequency of “often” or more and a severity rating of 50 or higher on a 100 point severity scale. We created a variable that indicated how many of these Post-Exertional items (0 to 5) a person endorsed, and then used this variable in a ROC analysis. The ROC resulted in an AUC of .94 (SE = .02, 95% CI, .90 to .97). The highest sensitivity and specificity occurred for one or more items (.91 and .90, respectively). We therefore recommend that a person with one or more of these symptoms be considered to meet criteria for Post-Exertional fatigue.
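The item-scoring rule described in this note can be sketched in a few lines. This is a minimal illustration, not the authors' scoring code: the frequency labels other than “often” and their ordering are assumptions, as are the function names and the example responses.

```python
# Assumed ordering of frequency responses; only "often" is named in the text,
# the other labels are hypothetical placeholders.
FREQUENCY_RANK = {"never": 0, "seldom": 1, "sometimes": 2, "often": 3, "always": 4}


def item_is_positive(frequency, severity):
    """An item counts if frequency is "often" or more and severity is 50 or higher (0-100)."""
    return FREQUENCY_RANK[frequency] >= FREQUENCY_RANK["often"] and severity >= 50


def count_positive_items(responses):
    """Count positive items among (frequency, severity) pairs for the five items."""
    return sum(item_is_positive(freq, sev) for freq, sev in responses)


def meets_post_exertional_criterion(responses, min_items=1):
    """Apply the recommended threshold of one or more positive items."""
    return count_positive_items(responses) >= min_items


# Hypothetical responses to the five Post-Exertional items.
example = [("often", 60), ("sometimes", 80), ("always", 40), ("never", 0), ("seldom", 10)]
print(count_positive_items(example), meets_post_exertional_criterion(example))
```

In this example only the first item satisfies both the frequency and the severity condition, so the count is 1 and the criterion is met.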
2. The scale we used had five choices, and we needed to convert the ratings to a four point scale in order to conform to Wagner et al.’s (2005) severity scaling system.
Publisher's Disclaimer: The published version of this manuscript (located at the link listed above) contains formatting errors within the tables. The formatting is correct in this version of the manuscript and an error notice and correction will be published in the Calls & Announcements section of the Spring 2011 issue of Disability Studies Quarterly