

HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal.
J Rheumatol. Author manuscript; available in PMC 2012 August 6.
Published in final edited form as:
PMCID: PMC3412585

Content and Criterion Validity of The Preliminary Core Dataset for Clinical Trials in Fibromyalgia Syndrome

Ernest H Choy, MD, Director, Lesley M Arnold, MD, Dan Clauw, MD, Professor of Medicine and Psychiatry, Leslie Crofford, Gloria W. Singletary Professor of Internal Medicine, Chief, Jennifer M Glass, Research Assistant Professor, Lee Simon, MD, Associate Clinical Professor of Medicine, Susan A Martin, Vibeke Strand, MD, Clinical Professor of Medicine, Adj., David A Williams, PhD, Professor Anesthesiology, Internal Medicine, Psychiatry and Psychology, and Philip Mease, MD, Seattle Rheumatology Associates, Chief


Increasing research interest and emerging new therapies for fibromyalgia (FM) have created a need for consensus on a core set of outcome measures that should be assessed and reported in all clinical trials, to facilitate interpretation of the data and understanding of the disease. This aligns with the key objective of the Outcome Measures in Rheumatology (OMERACT) initiative: to improve outcome measurement through a data-driven, interactive consensus process. Through patient focus groups and Delphi processes, working groups at previous OMERACT meetings identified potential domains for inclusion in the core data set. A systematic review has shown that instruments measuring these domains are available and at least moderately sensitive to change, and most of these instruments have been validated in multiple languages. This pooled analysis aimed to develop the core data set by analysing data from 10 randomised controlled trials (RCTs) in FM. The results support inclusion of the following in the core data set: pain, tenderness, fatigue, sleep, patient global assessment and multi-dimensional function/health-related quality of life. Construct validity was demonstrated, with outcome instruments showing convergent and divergent validity. Content and criterion validity were confirmed by multivariate analyses showing R square values between 0.4 and 0.6; lower R square values were associated with studies in which one or more domains were not assessed. The core data set was supported by high consensus among attendees at OMERACT 9. Establishing an international standard for RCTs in FM should facilitate future meta-analyses and indirect comparisons.

Keywords: Fibromyalgia, OMERACT, outcome measures, clinical trials, core data set


Fibromyalgia (FM) is a common condition affecting approximately 2% of the population1. It is characterised by chronic widespread pain with increased sensitivity to pressure-elicited pain. The 1990 American College of Rheumatology (ACR) classification criteria stipulated chronic widespread pain for at least three months and the presence of at least 11 of 18 tender points2. Direct and indirect medical costs associated with FM are high3, although a positive diagnosis can reduce healthcare utilisation4. Aside from pain, FM is associated with many symptoms, including fatigue, depression, anxiety and poor sleep quality. Many clinical trials have been conducted in FM; however, variation in outcome measurement methodology has made statistical comparison and pooling of results difficult.

The Outcome Measures in Rheumatology (OMERACT) initiative5 has helped to resolve the problem of outcome measurement variability in rheumatic diseases such as rheumatoid and psoriatic arthritis by establishing core data sets that should be collected and reported in randomised controlled trials (RCTs). OMERACT offers guidance in selecting core data set domains. Application of the OMERACT filters (truth, discrimination and feasibility) allows an iterative process that continually refines the field's ability to assess relevant aspects of disease or syndrome measurement (domains) with precision.

Previous work based on patient focus groups and Delphi exercises has established a list of potential core data set domains for trials in FM. Remarkable consensus regarding the relevant domains for FM is supported by a Delphi exercise amongst clinicians/researchers, patient focus groups6, a Delphi exercise conducted in patients with FM7 and voting at OMERACT 7 and 88. Each of these studies provided empirical support for the selection of outcome domains that should be considered for inclusion in the core data set9. From this work, the relevant domains for FM appear to be (1) pain, (2) patient global, (3) fatigue, (4) health-related quality of life, (5) multi-dimensional function, (6) sleep, (7) depression, (8) physical function, (9) tenderness, (10) dyscognition (cognitive dysfunction) and (11) anxiety. The Delphi processes and patient focus groups helped to support the face validity (one aspect of truth) of these potential domains. The feasibility and discriminatory power of specific instruments used to assess these domains were the topic of a separate systematic review of RCTs in FM10. That review found that instruments assessing these domains were available, were at least moderately responsive to change (effect size of at least 0.4) and were feasible for use in trials of FM, with the exception of dyscognition. However, most outcome measures used in RCTs of FM have been adopted from other diseases (e.g., the Beck Depression Inventory11, used to evaluate depression). The valid use of these “adopted” questionnaires in FM therefore requires additional support in some cases. The objective of the current study was to examine some of the psychometric properties of existing outcome measures used in trials of FM. This information will help to evaluate the valid use of these “adopted” measures in the context of FM and to establish a core set of domains for investigation in FM RCTs.


Data and analysis

On behalf of the steering committee, the co-chairmen (PM and EC) approached four pharmaceutical companies (Forest Laboratories, Jazz Pharmaceuticals, Eli Lilly and Pfizer) for access to de-identified data from each of their large RCTs in FM, in order to evaluate the measurement characteristics of the instruments each had used for domain assessment. Data from 10 RCTs of four compounds being investigated for the treatment of FM were included: milnacipran, duloxetine, pregabalin and sodium oxybate. Milnacipran and duloxetine are serotonin and norepinephrine reuptake inhibitors, while pregabalin is an alpha-2-delta calcium channel ligand. Duloxetine and pregabalin are both licensed in the USA for the management of FM, and an application for this indication has been filed for milnacipran. Duloxetine is also approved for the treatment of depression and of the pain of diabetic peripheral neuropathy. Pregabalin is also approved for the treatment of the pain of peripheral neuropathy and as an adjunct in the treatment of seizure disorders. Sodium oxybate, the sodium salt of gamma-hydroxybutyrate, is a CNS depressant and sleep modifier licensed for the treatment of cataplexy and excessive daytime sleepiness in narcolepsy. Because FM is a poly-symptomatic condition, we included trials of different medications, reasoning that medications acting on different pathways may have dissimilar effects on individual domains.

Data from RCTs of the same medication were pooled for analysis. For reasons of commercial sensitivity, the medications are coded A, B, C and D. For each outcome measure, change scores were calculated between baseline and the primary endpoint of each trial.
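The change-score and pooling steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the per-patient list layout and the function names (`change_scores`, `effect_size`, `pool_by_compound`) are hypothetical, change is assumed to be endpoint minus baseline, and the effect size is assumed to be mean change divided by the standard deviation of baseline scores, one common definition that the text does not specify.

```python
from statistics import mean, stdev


def change_scores(baseline, endpoint):
    """Per-patient change from baseline to the primary endpoint (endpoint minus baseline, assumed)."""
    return [after - before for before, after in zip(baseline, endpoint)]


def effect_size(baseline, endpoint):
    """Mean change divided by the sample SD of baseline scores (one common definition, assumed here)."""
    return mean(change_scores(baseline, endpoint)) / stdev(baseline)


def pool_by_compound(trials):
    """Concatenate change scores from all trials of the same coded compound.

    trials: iterable of (compound_code, baseline_list, endpoint_list) tuples,
    a hypothetical layout chosen for illustration.
    """
    pooled = {}
    for code, baseline, endpoint in trials:
        pooled.setdefault(code, []).extend(change_scores(baseline, endpoint))
    return pooled


# Hypothetical pain scores for two small trials of compound "A".
trials = [("A", [5, 6], [3, 6]), ("A", [4], [2])]
print(pool_by_compound(trials))  # {'A': [-2, 0, -2]}
```

For a scale on which lower scores mean less severe symptoms, a negative effect size indicates improvement; for example, `effect_size([7, 6, 8], [4, 5, 5])` is about -2.33.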

Mapping of Outcome Measures to Domains

All the outcome measures used in the clinical trials were mapped onto one or more of the following domains: pain, patient global, fatigue, health-related quality of life (HRQOL), multi-dimensional function, sleep, depression, physical function, tenderness, dyscognition and anxiety. For outcome measures that span multiple domains, such as the Medical Outcomes Study Short Form 36 (SF-36)12, the individual domain and summary component scores were mapped and included in the analyses separately.

Support for Construct Validity

Construct validity refers to the cumulative evidence supporting whether a given scale or instrument actually assesses the construct it purports to measure. Because almost all the instruments used in RCTs of FM were developed and validated in other medical conditions, it cannot be assumed that these “adopted” instruments measure the signs and symptoms of fibromyalgia with the same measurement characteristics as in the conditions for which they were originally designed. For example, a fatigue scale developed and validated in sports medicine may not measure the same type of fatigue that affects individuals with FM. Thus, despite the common label “fatigue,” evidence would be needed to support a claim that the instrument assesses the same fatigue construct in both populations.

Construct validity for measurement in FM, i.e., whether an instrument really measures what it is supposed to measure, has not been established, with the exception of the SF-3613. An example is the Medical Outcomes Study Sleep Questionnaire14, a validated questionnaire developed to assess sleep in patients with sleep disorders. It has been used in a number of RCTs in FM, but its validity and performance in FM have not been examined, so the appropriateness of its continued use for the sleep domain in FM studies needs to be evaluated.

The construct validity of the instruments was assessed by examining the convergent and divergent relationships between similar and dissimilar instruments. Instruments measuring similar constructs would be expected to have the strongest relationships (positive or negative, depending on the direction of the scales), and instruments measuring unrelated constructs would be expected to show weaker relationships. For this study, a correlation matrix containing all the outcome measures used with a given compound was constructed, giving four matrices in total. Pearson or Spearman correlation coefficients were used depending on the statistical distribution of each instrument. The mean correlation coefficient among outcome measures mapping to the same domain (intra-domain correlation coefficient) was used as an indicator of convergent validity, and the mean correlation coefficient between outcome measures of different domains (inter-domain correlation coefficient) as an indicator of divergent validity.
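The intra- and inter-domain summaries described above can be sketched as follows. This is a hypothetical illustration, not the study's code: it assumes one list of change scores per instrument (patients in the same order), a dictionary mapping each instrument to a single domain, and that absolute correlation values are averaged so that scales scored in opposite directions are comparable; the paper does not state these details. The instrument names are invented, and Spearman's coefficient is computed as Pearson's coefficient on average ranks.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def domain_correlations(scores, domain_of, method=pearson):
    """Mean |r| within each domain (convergent) and across domains (divergent).

    scores: instrument name -> change scores (same patient order for all).
    domain_of: instrument name -> domain name (hypothetical mapping).
    """
    intra, inter = {}, {}
    for a, b in combinations(scores, 2):
        r = abs(method(scores[a], scores[b]))
        da, db = domain_of[a], domain_of[b]
        if da == db:
            intra.setdefault(da, []).append(r)
        else:
            # A cross-domain pair counts towards the divergent mean of both domains.
            inter.setdefault(da, []).append(r)
            inter.setdefault(db, []).append(r)
    return ({d: mean(v) for d, v in intra.items()},
            {d: mean(v) for d, v in inter.items()})


# Hypothetical change scores for five patients.
scores = {
    "pain_vas": [1, 2, 3, 4, 5],
    "pain_nrs": [2, 4, 6, 8, 10],
    "fatigue_scale": [2, 1, 4, 3, 5],
}
domain_of = {"pain_vas": "pain", "pain_nrs": "pain", "fatigue_scale": "fatigue"}
intra, inter = domain_correlations(scores, domain_of)
print(intra, inter)  # {'pain': 1.0} {'pain': 0.8, 'fatigue': 0.8}
```

In this toy example the pain instruments agree perfectly with each other (intra-domain mean 1.0) and less with the fatigue instrument (inter-domain mean 0.8), the pattern read as convergent and divergent validity. A domain measured by a single instrument, like `fatigue` here, has no intra-domain value, which mirrors why such domains could not be assessed in the trials. Passing `method=spearman` switches to rank correlation.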

Support for the Content Validity of the Domains for the Core Dataset

Content validity refers to the extent to which a single measure or group of measures is able to capture the relevant facets of a given condition. For this study, the content coverage of the consensually derived domains was examined by multivariate analysis. Patient global impression of change (PGIC) was used as a surrogate for overall improvement and as the dependent variable in multivariate regression analyses. The overall R square values from these analyses were used to judge how adequately the domains and their associated instruments captured overall improvement in these RCTs of FM. For each regression equation, the instrument from each domain with the highest univariate correlation with PGIC was included as an independent variable.
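A minimal sketch of this content-validity step is shown below, assuming one list of change scores per instrument and a PGIC change value per patient; the data, instrument names and function names are hypothetical, and the study's actual software is not described. The sketch keeps, for each domain, the instrument with the largest absolute univariate correlation with PGIC, then fits an ordinary least-squares regression with an intercept through the normal equations and reports R square. A real analysis would normally use an established statistics package rather than a hand-written solver.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_predictors(scores, domain_of, pgic):
    """Per domain, keep the instrument with the highest |r| against PGIC."""
    best = {}
    for name, values in scores.items():
        r = abs(pearson(values, pgic))
        domain = domain_of[name]
        if domain not in best or r > best[domain][1]:
            best[domain] = (name, r)
    return sorted(name for name, _ in best.values())


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors, y):
    """OLS with an intercept; returns the coefficient of determination."""
    rows = [[1.0] + list(xs) for xs in zip(*predictors)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    my = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical change scores for six patients.
scores = {
    "pain_vas": [1, 2, 3, 4, 5, 6],
    "pain_alt": [1, 3, 2, 5, 4, 6],
    "fatigue_scale": [2, 1, 4, 3, 6, 5],
}
domain_of = {"pain_vas": "pain", "pain_alt": "pain", "fatigue_scale": "fatigue"}
pgic = [4, 5, 10, 11, 16, 17]

chosen = select_predictors(scores, domain_of, pgic)  # ['fatigue_scale', 'pain_vas']
r2 = r_squared([scores[name] for name in chosen], pgic)
```

In these invented data PGIC is an exact linear combination of the two selected instruments, so `r2` equals 1 up to rounding; dropping one of them from the predictor list lowers R square, which is the effect of an unassessed domain that the results below describe.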


The domains and the outcome measures used to index them in the 10 RCTs are listed in Table 1. The instruments mapped to the HRQoL and multi-dimensional function domains, such as the SF-36 and the Fibromyalgia Impact Questionnaire (FIQ)15, were almost identical; in trials of one medication, the EuroQol was also used. Given this large overlap, HRQoL and Multidimensional Function were merged into a single domain: Multidimensional Function.

Table 1
List of Outcome Measures Used in Clinical Trials.

Not all the domains were measured in all RCTs. The number of domains and instruments used in the trials of the four medications is given in Table 2. While some domains were assessed in all trials (e.g., pain, fatigue), others were assessed less consistently (e.g., stiffness, tenderness), and some appeared in the evaluation of only one compound (e.g., dyscognition).

Table 2
The Number of Domains and Instruments Used in Clinical Trials of the 4 Different Medications.

Construct Convergence and Divergence

Mean intra-domain correlation coefficients were greater than mean inter-domain correlation coefficients for pain, tenderness, fatigue and depression; instruments assessing these domains therefore demonstrated convergent and divergent validity (Table 3). For multi-dimensional function and sleep, the difference between mean intra- and inter-domain correlation coefficients was small. For multi-dimensional function, this was expected given the breadth of the construct. For sleep, the lack of separation could reflect either a failure of treatments to improve sleep or limitations of the instruments themselves, which may not assess the facets of sleep that matter to individuals with FM. For example, the MOS sleep scale assesses snoring (r = 0.02) and waking up with shortness of breath (r = 0.18), items that may be relevant to some sleep disorders but are less relevant in FM. Thus sleep, despite clinical anecdote and consensus regarding its relevance to FM, did not correlate highly with PGIC and was relatively insensitive to change. In some studies, a patient global rating of sleep quality on a Likert scale was also used; it showed a moderate correlation with PGIC (r = 0.4), as did the MOS sleep disturbance subscale (r = 0.4). These data suggest that, for some instruments “adopted” from other medical conditions, subscales may be preferable to overall indices.

Table 3
Convergent and Divergent Relationships.

Originally, measures of tenderness, namely tender point count (TPC) and dolorimetry, were mapped onto the pain domain. However, the correlation between these tenderness measures and self-reported pain scales was at best moderate (r ≤ 0.4), whereas the correlation between TPC and dolorimetry was higher (r = 0.59). This suggests that tenderness and self-reported spontaneous pain may not measure the same construct in FM and should be treated as separate domains.

In summary, the instruments used in these RCTs to measure patient self-reported pain, fatigue, depression, physical function and multi-dimensional function showed evidence supporting their construct validity for use in clinical trials of FM. For sleep, the sleep disturbance subscale, but not the overall index, was supported. For tenderness, support could only be demonstrated in the trials of one medication, in which tenderness was assessed by more than one method. For stiffness, dyscognition and anxiety, convergent and divergent validity could not be determined because each of these domains was measured by only one instrument in these trials.

Content Validity of the Domains for the Core Dataset

Univariate analysis showed that the correlations of the instruments in each domain with PGIC were moderate to high (Table 4). For depression, however, the mean correlation coefficient with PGIC was less than 0.5. In all of these clinical trials, patients with severe co-morbid depression were excluded, and patients with moderate depression were also excluded in the trials of three of the compounds. Consequently, baseline depression scores were low, leaving little room for improvement and reducing the effect size of the change scores.

Table 4
Mean Correlation Coefficient Between Instruments of Each Domain with Patient Global Impression of Change.

Multivariate analyses showed moderate to high R square values (0.4-0.67), which were related to the number of domains assessed. In studies in which some of the potential domains (such as tenderness) were not assessed, the R square value was lower, suggesting that omitting key domains reduces coverage of the content relevant to FM.

Regression analyses retained pain, fatigue, physical function, multi-dimensional function and depression in all RCTs of all four compounds. Tenderness was retained in all the trials of the three compounds in which it was assessed, further supporting its inclusion as a separate domain in the core data set. Sleep was retained in two of the three possible clinical trial groups. Stiffness was retained in two of four groups, and dyscognition was not retained in these regression analyses.


Data from this study and previous consensus exercises support including pain, fatigue, physical function and multi-dimensional function as domains in a core data set for clinical trials in FM. Although “adopted” from other medical conditions, the instruments measuring these domains largely demonstrate characteristics supporting face, construct, content and criterion validity in FM. A previous study has also shown that these instruments are at least moderately sensitive to change10.

Depression is a common symptom in FM and is rated as important by both patients and clinicians. This analysis showed that the correlation between depression and PGIC is only moderate. The main reason is likely that the exclusion of patients with moderate and severe depression from most clinical trials results in low baseline depression scores, so it is unlikely that any instrument would demonstrate a large effect size. Given that this exclusion criterion is common in FM RCTs, it seems unnecessary to stipulate the inclusion of depression in the core data set. Nonetheless, assessing depression is likely to be helpful in many clinical trials of FM.

“Unrefreshed” sleep is common and thought to be pathogenically important in FM. Moldofsky et al. showed that symptoms similar to FM could be induced in healthy volunteers by disturbing the quality of their sleep16. Both patients and clinicians agree on its importance. However, clinical trials to date have used instruments that were not developed in patients with FM and may not assess the sleep problems specific to these patients. Using total indices as the sole sleep endpoints may therefore not be ideal. Our data suggest that using the sleep disturbance subscale of the MOS sleep scale would have improved the convergent and divergent characteristics of this measure. Since sleep was retained in the regression analyses in all but one group, there is a strong argument for including some element of sleep in the core data set.

The results of the current study suggest that tenderness should be included as a domain separate from patient self-reported pain. This is pathophysiologically plausible, since the two may involve different pathways, and it mirrors the need to assess both patient-reported pain and tender joint count in rheumatoid and psoriatic arthritis. Although tender point count and dolorimetry have deficiencies, such as significant inter-observer variation, and may not be perfect tools, they are feasible, and the current analyses showed that they contribute significantly to content validity when added to the core data set. We therefore conclude that tenderness should be included in the core data set.

For anxiety, stiffness and dyscognition, there are currently insufficient data from available clinical trials to support their inclusion in the core data set. Researchers interested in these outcomes should include them in their assessments, but it is not justifiable to stipulate their assessment in all clinical trials of FM.

The results of this study were presented at the OMERACT 9 FM module and were the basis, along with review of the previous clinician/researcher and patient Delphi exercises, outcome measures, and disease state discussion, for development of consensus on a core domain construct for fibromyalgia (Figure 1)17.

Figure 1
Domains for Fibromyalgia12


We thank Nooshine Dayani, Qu Peng and Robert Palmer from Forest Laboratories Inc., Chinglin Lai, Yanping Zheng and Diane Guinta from Jazz Pharmaceuticals Inc., Daniel Kajdasz and Amy Chappell from Eli Lilly & Co., and Gergana Zlateva and Emir Birol from Pfizer Inc. for their support.


The Academic Department of Rheumatology is supported by an Integrated Clinical and Academic Centre grant from the Arthritis Research Campaign, UK. Dr. Williams’ participation was supported in part by Grant Number U01AR55069 from NIAMS/NIH. Dr Arnold's participation was supported in part by Grant Number R01AR053207 from NIAMS/NIH.

Contributor Information

Ernest H Choy, Sir Alfred Baring Garrod Clinical Trials Unit, Academic Department Rheumatology, King's College London, London, UK.

Lesley M Arnold, Department of Psychiatry, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA.

Dan Clauw, University of Michigan, Ann Arbor, Michigan, USA. dclauw@med.umich.edu

Leslie Crofford, Division of Rheumatology & Women's Health, University of Kentucky, Lexington, Kentucky, USA. ljcrof2@email.uky.edu

Jennifer M Glass, Research Center for Group Dynamics, Department of Psychiatry, Division of Substance Abuse, University of Michigan, Ann Arbor, Michigan, USA. jglass@umich.edu

Lee Simon, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, USA. lsimon@bidmc.harvard.edu

Susan A Martin, RTI-Health Solutions, Ann Arbor, Michigan, USA. smartin@rti.org

Vibeke Strand, Division of Immunology/Rheumatology, Stanford University, Portola Valley, California, USA. Vstrand@aol.com

David A Williams, University of Michigan, Ann Arbor, Michigan, USA. daveawms@umich.edu

Philip Mease, Division of Rheumatology Research, Swedish Medical Center, Clinical Professor of Rheumatology, University of Washington, Seattle, Washington, USA. pmease@u.washington.edu


1. Wolfe F, Ross K, Anderson J, Russell IJ, Hebert L. The prevalence and characteristics of fibromyalgia in the general population. Arthritis Rheum. 1995;38:19–28.
2. Wolfe F, Smythe HA, Yunus MB, Bennett RM, Bombardier C, Goldenberg DL, et al. The American College of Rheumatology 1990 criteria for the classification of fibromyalgia: Report of the multicenter criteria committee. Arthritis Rheum. 1990;33:160–72.
3. Boonen A, van den Heuvel R, van Tubergen A, Goossens M, Severens J, van der Heijde, et al. Large differences in cost of illness and wellbeing between patients with fibromyalgia, chronic low back pain, or ankylosing spondylitis. Ann Rheum Dis. 2005;64:396–402.
4. Hughes G, Martinez C, Myon E. The impact of a diagnosis of fibromyalgia on health care resource use by primary care patients in the UK: An observational study based on clinical practice. Arthritis Rheum. 2006;54:177–83.
5. Tugwell P, Boers M, Brooks P, Simons L, Strand V, Idzerda L. OMERACT: An international initiative to improve outcome measurement in rheumatology. Trials. 2007;8:38–43.
6. Arnold LM, Crofford LJ, Mease PJ, Burgess SM, Palmer SC, Abetz L, et al. Patient perspectives on the impact of fibromyalgia. Patient Educ Couns. 2008 Epub.
7. Mease PJ, Arnold LM, Crofford LJ, Williams DA, Russell IJ, Humphrey L, et al. Identifying the clinical domains of fibromyalgia: Contributions from clinical and patients delphi exercises. Arthritis Care Res. 2008;59:952–960.
8. Mease P, Arnold LM, Bennett R, Boonen A, Buskila D, Carville S, et al. Fibromyalgia syndrome. J Rheumatol. 2007;34:1415–25.
9. Mease PJ, Clauw DJ, Arnold LM, Goldenberg DL, Witter J, Williams FD, et al. Fibromyalgia syndrome. J Rheumatol. 2005;32:2270–7.
10. Carville SF, Choy EH. Systematic review of discriminating power of outcome measures used in clinical trials of fibromyalgia. J Rheumatol. 2008;35:2094–105.
11. Beck AT, Ward CH, Mendelson M, Mock J, Erbaugh J. An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561–71.
12. Stewart AL, Hays RD, Ware JE Jr. The MOS short-form general health survey. Reliability and validity in a patient population. Med Care. 1988;26:724–35.
13. Gendreau M, Williams DA, Strand V. Validation and minimum clinically important differences in SF-36 in randomized controlled trials of fibromyalgia. 2008. In press.
14. Stewart AL, Ware JE, Brook RH, Davies AR. Physical health in terms of functioning. II. The RAND Corporation; Santa Monica (CA): 1978. Conceptualization and measurement of health for adults in the Health Insurance Study. pp. 236–359.
15. Burckhardt CS, Clark SR, Bennett RM. The fibromyalgia impact questionnaire: development and validation. J Rheumatol. 1991;18:728–33.
16. Moldofsky H, Scarisbrick P, England R, Smythe HA. Musculoskeletal symptoms and non-REM sleep disturbance in patients with “fibrositis syndrome” and healthy subjects. Psychosom Med. 1975;37:341–345.
17. Mease P, Arnold LM, Choy EH, et al. Fibromyalgia syndrome. J Rheumatol. 2009 In press.