To evaluate clinical validity, including responsiveness, of PROMIS® Pain Interference (PROMIS-PI) and Pain Behavior (PROMIS-PB) T-scores.
Data were aggregated from longitudinal studies of cancer, chronic low back pain (cLBP), rheumatoid arthritis (RA), chronic obstructive pulmonary disease (COPD), and major depressive disorder (MDD). Linear mixed-effects models were used to compare baseline score differences and score changes over time. We calculated standardized response means (SRMs) for subgroups defined by self-reported change in general health and pain.
1357 individuals participated at baseline and 1225 at follow-up. Hypotheses of significant change in PROMIS-PI and PROMIS-PB scores were supported in the intervention groups (cLBP and MDD). Differences in baseline scores for COPD-exacerbators compared to stable-COPD patients were in the hypothesized direction but were not statistically significant. Subgroups reporting better health showed corresponding negative SRM values supporting responsiveness of T-scores to improvement. Responsiveness to decrements was supported in some but not all clinical groups and varied by anchor. More congruent values were obtained when using a pain-specific anchor.
This study provides evidence that PROMIS-PI and –PB scores are sensitive to changes in pain in studies of interventions expected to impact pain. The results inform estimation of meaningful change and support power analyses for comparative effectiveness research.
Approximately 100 million people in the United States experience chronic pain, and pain costs up to $635 billion annually in treatment and lost productivity. To develop effective treatments for pain and to evaluate treatment effectiveness, researchers and clinicians need psychometrically sound and clinically valid instruments for measuring different aspects of pain. Quality criteria for the measurement properties of health status measures have been developed that can inform selection of outcome measures in clinical research. Critical features include a measure’s reliability, reproducibility, and validity evidence (including responsiveness). Recent guidelines for clinical pain research underscore the importance of these psychometric characteristics and emphasize the need for comprehensive health assessment involving multiple facets of patient health status.
Pain is a multidimensional construct. Two important aspects of pain are pain interference and pain behavior. Pain interference is the degree to which pain interferes with an individual’s daily activities, and it is increasingly recognized as an important facet of patients' pain experiences. Pain interference has been recommended as a core outcome in clinical trials of pain treatments.  Pain behavior, defined as behavior that typically indicates to others that an individual is experiencing pain, [4–6] can include both verbal (e.g., asking for help, sighing, moaning) and non-verbal (e.g., grimacing, resting, guarding) behaviors. Pain behaviors can be protective by eliciting assistance or support after a precipitating event (e.g. trauma, surgery), but when they are maintained beyond rehabilitation and recovery, pain behaviors can contribute to subsequent physical and psychosocial disability,  making them useful targets for behavioral interventions. [6, 8]
The NIH-funded Patient Reported Outcomes Measurement Information System (PROMIS®) has developed a family of instruments that can be used to measure different aspects of physical, mental, and social health.  More information on the development, validation, and implementation of all PROMIS measures can be found at www.nihpromis.org, and options regarding fixed-length and customized short forms and CAT administration can be found at www.assessmentcenter.net. PROMIS Pain Interference (PROMIS-PI)  and Pain Behavior (PROMIS-PB)  item banks were developed using modern psychometric methods, [12, 13] and the psychometric properties of these measures have been previously evaluated in a large cross-sectional sample that included both healthy people and people with various chronic conditions. [10, 11] PROMIS-PI measures also have been evaluated in ambulatory cancer care,  individuals with disabilities,  inflammatory bowel disease,  and arthritis,  among others. [18–20] The PROMIS-PB also has been evaluated in a range of clinical populations, but in fewer than the PROMIS-PI. [20, 21]
Though PROMIS pain measures have potential advantages ensuing from the advanced methods with which they were developed and are administered and scored, they are relatively new compared to other, more extensively evaluated pain measures such as Medical Outcomes Short Form Bodily Pain scale,  Brief Pain Inventory,  and the 3-item pain scale (PEG). If the PROMIS measures are to be considered for clinical trials and comparative effectiveness research, the clinical validity of the scores in a wide variety of clinical contexts and for different purposes needs to be established. The current paper reports psychometric evaluations of PROMIS-PI and –PB scores in the context of five longitudinal studies of participants with one of five chronic conditions: chronic low back pain (cLBP), cancer, chronic obstructive pulmonary disease (COPD), major depressive disorder (MDD), and rheumatoid arthritis (RA).
Significant changes in PROMIS-PI and –PB scores were hypothesized for those receiving intervention (cLBP and MDD).  We also expected that baseline scores would be higher for patients with COPD who had experienced an exacerbation compared to scores of those with stable COPD. We expected standardized response means (SRMs) to distinguish among participants by self-reported clinical status (worse, about same, better).
Data for this study were collected in longitudinal validation studies conducted by PROMIS investigators with samples of participants with cLBP, cancer, COPD, MDD, and RA. Time between baseline and follow-up varied by clinical sample (i.e., cLBP: 3 months, cancer: 2 months, COPD: 3 months, MDD: 3 months, RA: 12 months).
Two PROMIS pain measures were used in the current study: PROMIS-PI and PROMIS-PB. Like other PROMIS measures, these measures are based on banks of items calibrated using the graded response model, which estimates item location (severity) and discrimination (ability to distinguish among people with different levels of the pain outcome). PROMIS item banks were developed using qualitative and quantitative methods. [10–12, 27–29] The items have a 7-day time-frame.
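To illustrate how location and discrimination parameters shape responses under a graded response model, the following is a minimal sketch rather than the PROMIS calibration itself. The function name `grm_category_probs` and all parameter values are hypothetical and chosen only for illustration; the logistic form shown is the standard Samejima formulation, and the thresholds are assumed to be in increasing order.

```python
import math

def grm_category_probs(theta, discrimination, thresholds):
    """Probability of each response category under a graded response model.

    theta          -- person's level on the latent pain outcome
    discrimination -- item slope (how sharply the item separates people)
    thresholds     -- increasing item location (severity) parameters,
                      one fewer than the number of response categories
    """
    # Cumulative probability of responding in each category or higher.
    cumulative = [
        1.0 / (1.0 + math.exp(-discrimination * (theta - b)))
        for b in thresholds
    ]
    # Pad so that P(lowest category or higher) = 1 and P(above highest) = 0.
    bounds = [1.0] + cumulative + [0.0]
    # Category probability is the difference of adjacent cumulative curves.
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]

# Hypothetical five-category item (values are NOT from the PROMIS item banks).
probs = grm_category_probs(theta=0.5, discrimination=2.0,
                           thresholds=[-1.0, 0.0, 1.0, 2.0])
for category, p in enumerate(probs, start=1):
    print(f"category {category}: {p:.3f}")
```

A higher discrimination value makes the category curves steeper, so the item separates people near its thresholds more sharply.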
The use of item banks for measures allows the development of short forms or more flexible administration using computer adaptive testing (CAT), a tailored approach in which the items administered are selected based on individuals’ responses to previous items.  Scores from short forms and those generated using CAT are on a common mathematical metric. For the current study, CAT was used in the cLBP, COPD, and MDD cohorts. PROMIS short forms (Version 1) were used to assess PROMIS-PI and -PB in the arthritis and cancer cohorts. Full PROMIS-PI and -PB item banks are reported in the online appendix.
In addition to standard clinical and demographic descriptors, participants also rated single items related to general health and pain, which served as anchors for subgroup comparisons between those classified as “better”, “about the same” and “worse”.
Detailed descriptions of recruitment, eligibility criteria, and treatments are available in the accompanying introductory article in this issue.  Briefly, participants with cLBP were recruited from the University of Washington Spine Center in Seattle and local recruitment sites. All had cLBP for at least six weeks and received a spinal injection. Participants with MDD were recruited from outpatient treatment clinics at Western Psychiatric Institute and Clinic, Pittsburgh, PA and its affiliates and received treatment in the form of antidepressants, psychotherapy, or both. Participants with COPD had a 10 pack/year history of smoking (i.e., packs per day multiplied by number of years) and met the Global Initiative for Chronic Obstructive Lung Disease clinical criteria for COPD.  They were recruited from multiple participating institutions including the University of North Carolina, North Shore University Health System, The University of Pittsburgh, and Duke University. Participants with COPD-related exacerbations at baseline were included and compared to those with stable COPD who had been exacerbation-free for 2 or more months prior to enrollment. Participants in the cancer study were recruited from North Shore University Health System in Chicago, IL and were administered PROMIS measures in an observational study setting with multiple heterogeneous treatment modalities. Participants were enrolled irrespective of treatment status (i.e., before or after starting a treatment regimen). The RA study also was observational. Participants were recruited from multiple sources including the Arthritis, Rheumatism, and Aging Medical Information System (ARAMIS) and the Stanford Rheumatoid Arthritis Registry. Administration of PROMIS measures was intended to evaluate longitudinal changes in this clinical population, and even though RA is known to worsen over time, all participants received routine clinical care that at times included intervention.
Like the cohort with cancer, some participants with RA were expected to improve while others were expected to deteriorate due to the heterogeneity of clinical characteristics and treatment status. No specific hypotheses were developed for the RA and cancer samples, but responsiveness was calculated based on change status at follow-up.
The PROMIS-PI was administered at baseline and follow-up in studies of patients with cLBP, cancer, COPD, MDD, and RA. The PROMIS-PB was administered only in the studies of patients with cLBP, COPD, and MDD (baseline and follow-up). We used linear mixed effects models to estimate average change in scores over time. To account for similarity of repeated measurements within individuals, the models were estimated with random subject effects. [32, 33] Risk of bias from problematic missing data was examined in each clinical sample by assessing relationships between scores on baseline measures and attrition. This risk was found to be minimal, and the data were considered missing at random (MAR) for all analyses. This is advantageous because, when data are MAR, single-time-point data can be included in model estimation. [34, 35] Least square means, standard errors, and 95% confidence intervals were derived from the model.
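The exact model specification is not given in this article. As a hedged sketch, a two-occasion linear mixed-effects model with a random subject intercept can be written as follows, where the symbols are assumptions introduced here for illustration: y_ij is the T-score of participant i at occasion j, t_ij is 0 at baseline and 1 at follow-up, and u_i is the random subject effect.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 t_{ij} + u_i + \varepsilon_{ij}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
\end{aligned}
```

Under this form, \beta_0 is the mean baseline score and \beta_1 is the average change from baseline to follow-up. A participant observed only at baseline still contributes to the estimate of \beta_0 and the variance components, which is why single-time-point data can be retained under the MAR assumption.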
We stratified each clinical sample into subgroups based on change in self-reported health status using general health anchors and, additionally, using pain anchors. These change anchors were based either on self-reported magnitude of change or on calculated changes in self-reported general health or overall pain scores. Details of how individuals were classified as “better”, “same” or “worse” are reported in Table 1. Standardized response means (SRM; ratio of mean change to the standard deviation of that change) were estimated. We judged an absolute SRM value of 0.30 or greater (|SRM| ≥ 0.30) to indicate responsiveness. [36–38] Data management and preparation were carried out using SAS 9.3 for Windows (Copyright 2002–2010 SAS Institute Inc.), and all statistical analyses were carried out using STATA/IC 12.1 (Copyright 1985–2011 Stata Corp LP).
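The SRM definition above (mean change divided by the standard deviation of that change) can be sketched in a few lines. This is an illustration only, not the authors' SAS or Stata code: the function names and the scores are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
import statistics

def standardized_response_mean(baseline, followup):
    """Mean of paired change scores divided by the SD of those changes."""
    changes = [after - before for before, after in zip(baseline, followup)]
    # statistics.stdev uses the sample (n - 1) denominator; this is an assumption.
    return statistics.mean(changes) / statistics.stdev(changes)

def is_responsive(srm, threshold=0.30):
    """Apply the |SRM| >= 0.30 criterion described in the text."""
    return abs(srm) >= threshold

# Hypothetical T-scores for a small "better" subgroup (not study data).
baseline_scores = [60, 58, 62, 55, 59]
followup_scores = [55, 57, 56, 54, 53]

srm = standardized_response_mean(baseline_scores, followup_scores)
print(f"SRM = {srm:.2f}, responsive: {is_responsive(srm)}")
```

Because higher PROMIS-PI and PROMIS-PB scores indicate more pain interference or pain behavior, a negative SRM corresponds to improvement and a positive SRM to worsening.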
The demographic and clinical profiles of each cohort are detailed in the accompanying introductory article in this issue.  Pooling across all studies, most participants were non-Hispanic whites (82%), and roughly half were 60 years old or older (54%). Most participants were female (68%), with the exception of the sample with COPD (42%). The majority of the cohort with stable COPD (n=79) was white (72%), male (56%), and had a Medical Research Council (MRC) Breathlessness Rating of 1 or 2 (56%). The majority of the cohort with COPD-related exacerbations (n=46) was white (73%), male (61%), 50 or more years of age (91%), and had an MRC Breathlessness Rating of 3 or higher (63%). The majority of the cLBP cohort (n=218) was white (84%), female (56%), 50 or more years of age (62%) with largely moderate to severe back pain (74% ≥8 on 0–10 numeric pain scale for worst back pain). For this cohort, spinal injections were administered an average of 3.9 days after baseline assessment (sd = 6.6). The majority of the MDD cohort (n=196) was white (78%), female (74%), 18–49 years of age (52%) and had a Center for Epidemiologic Studies-Depression score of 22 or greater (73%).  The majority of the cancer cohort (n=310) was white (81%), female (61%), 50 or more years of age (76%), and had an Eastern Cooperative Oncology Group Performance Status Rating (ECOG-PSR) of 0 or 1 (77%). The majority of the rheumatoid arthritis sample (n=521) was white (88%), female (81%), 50 or more years of age (88%) and had a HAQ Disability Index of 0–1 (57%). From the combined sample of 1370 participants, fewer than 1% (n=13) did not have sufficient data available to score PROMIS-PI and -PB measures.
COPD participants defined as exacerbators had mean PROMIS-PI scores of 59.9 compared to a mean of 57.4 for stable participants at baseline. This difference, though in the expected direction, was not statistically significant (p = 0.23). The mean PROMIS-PB score for exacerbators was 55.0 compared to 53.7 for those classified as stable. This difference also was non-significant (p = 0.50).
Baseline, follow-up, and change scores from the mixed effects models are presented in Table 2. Least squares means by clinical group are presented in Figure 1. The largest changes in PROMIS-PI and -PB scores were observed for the cohort with cLBP. These changes, both in magnitude and direction (lower scores at follow-up) were consistent with expectations. Pre- to post-treatment change scores for both PROMIS-PI (Δ =−5.4, 95%CI: −6.6 to −4.3, p<0.001) and PROMIS-PB (Δ=−3.2, 95%CI: −4.0 to −2.4, p<0.001) were statistically significant. PROMIS-PI and -PB score changes were smaller but statistically significant for participants undergoing treatment for depression (Δ=−1.5 [95%CI: −2.8 to −0.2], p=0.027 and Δ=−1.8 [95%CI: −3.1 to −0.5], p=0.005, respectively).
The smallest changes in PROMIS-PI scores were observed for participants with RA; the magnitude of change (Δ=0.1) was not statistically different from zero. Changes in PROMIS-PI scores for the cohort with cancer were also small but were statistically significant (Δ=−1.1, 95%CI: −2.1 to −0.2, p=0.023). Both of these studies were observational in nature and PRO assessment was not tied to treatment status. Accordingly, separate estimates are presented in Figure 1 for individuals with RA reporting improved (baseline mean=58.4 [sd=7.6]; followup mean=55.6 [sd=7.4]; Δ= −2.8; p<0.001) or worsening (baseline mean=53.1 [sd=8.7]; followup mean=55.3 [sd=8.7]; Δ=2.2; p=0.005) general health over time and for those with cancer reporting improved (baseline mean=52.8 [sd=8.5]; followup mean=49.2 [sd=9.1]; Δ=−3.6; p<0.001) or worsening (baseline mean=50.7 [sd=8.7]; followup mean=51.3 [sd=9.7]; Δ =0.6; NS) general health over time.
Based on our criterion reference of |SRM| ≥ 0.30, PROMIS-PI and PROMIS-PB measures were responsive to improvement, but there were substantial differences by clinical anchor (general health vs. pain-specific; Tables 3 and 4). With the exception of values for those with cancer and those with COPD exacerbation, SRM values for improvement were larger when a pain-specific anchor was used. On the whole, PROMIS-PI scores were more responsive to improvement than were PROMIS-PB scores; the only exception was in the study of individuals with depression.
The PROMIS-PI and –PB scores proved less responsive to decrements in pain scores when estimated based on general health anchors. In fact, when general health anchors were used to classify individuals as “worse”, resulting SRM values often were in the wrong direction (negative value indicating improvement in scores). SRM values for “worse” based on pain-specific anchors were more consistent with expectations. A salient example of this finding can be observed in Table 3. PROMIS-PI scores for those with cLBP defined as “worse” had an SRM of 0.44 when calculated based on the pain-specific anchor. In contrast, when using the general health anchor, the estimate was -0.47, a moderately large SRM value, but in the wrong direction. This reversal of direction in some SRM values anchored to worse general health was observed in other studies reported in this issue; that is, some groups defined by general health anchors as “worse” actually had score improvements. Also of note in the results are the relatively large magnitudes of some SRM values for those classified as being the “same” using the general health anchor (e.g., in cLBP, PROMIS-PI SRM = −0.58 and PROMIS-PB SRM = −0.59).
The PROMIS-PI and –PB item banks were developed using state-of-the-art techniques including extensive qualitative evaluations and modern psychometric methods. [12, 13] The psychometric properties of these measures have been previously evaluated in a large cross-sectional community sample and in a number of other clinical samples. [10, 11, 14] The findings reported here extend the body of knowledge about how PROMIS-PI and –PB scores function, particularly with respect to changes over time.
Some of our a priori hypotheses were upheld and some were not. We expected baseline PROMIS-PI and –PB scores to be higher, on average, in persons with COPD exacerbation compared to those defined as stable, but the observed differences were non-significant. This may have been due, in part, to low statistical power; this cohort was the smallest in the study. However, even if the differences had been statistically significant, this would not have altered the fact that the differences in PROMIS-PI and –PB scores were small in magnitude: differences of 2.2 and 1.2, respectively. In our a priori search for clinical comparisons by which to evaluate the PROMIS-PI and –PB scores, we found it intuitively attractive to compare by COPD severity. However, though pain is prevalent in COPD (estimates range from 32% to 60%),  it is not used for diagnosing the condition, classifying its severity, or defining an exacerbation. It is possible to have an exacerbation without an increase in pain.
A priori hypotheses regarding changes in intervention groups’ scores were upheld. We expected and found statistically significant changes in PROMIS-PI and –PB scores for those receiving intervention for cLBP and for MDD. These changes were substantially larger for the cLBP group than for the MDD group. SRM values for clinically anchored groups supported the responsiveness of PROMIS-PI and -PB scores to change. Of note, however, were substantial differences in SRM values for “worse” when the sample was classified based on a general health anchor. Not only were SRM values generally lower when based on the general health anchor, in some cases they were counter-intuitive, which calls into question the appropriateness of using a general health anchor. It also underscores the impact that choice of anchor has on psychometric assessments. Further, though anchoring change estimates on patients’ perceptions of change is appealing intuitively, it has limits. Retrospective global ratings of change have been criticized because of their vulnerability to response bias,  and these ratings have been found to be more strongly associated with current status than with change in status. [43, 44]
This study had additional limitations. As stated in the introductory paper,  the decision to conduct the analyses reported in this issue was opportunistic. Studies were conducted around the same time using many of the same measures, providing the opportunity to evaluate PROMIS measures by domain across several clinical populations. However, there were methodological differences across studies (e.g., differences in wording of anchors), and this was a limitation. Another limitation is more subtle. Most of the authors of this study were involved in the original work in developing the PROMIS-PI and –PB item banks. We recognize that this “pride of ownership” could influence how results are presented and interpreted.
Despite these limitations, the results of this study add to a growing body of evidence supporting the usefulness of PROMIS-PI and –PB scores in pain studies. Scores were responsive to interventions, especially to an intervention that specifically targeted pain. Psychometric validation, however, is never a completed task, and evaluations of the PROMIS-PI and –PB scores are warranted in other clinical samples and contexts. Of particular interest would be intervention trials in which responsiveness of PROMIS-PI and –PB scores are compared to that of other pain measures. Also useful would be studies that relate scores to other benchmarks of interest to pain clinicians and researchers.
PROMIS® was funded with cooperative agreements from the National Institutes of Health (NIH) Common Fund Initiative (Northwestern University, PI: David Cella, PhD, U54AR057951, U01AR052177, R01CA60068; Northwestern University, PI: Richard C. Gershon, PhD, U54AR057943; American Institutes for Research, PI: Susan (San) D. Keller, PhD, U54AR057926; State University of New York, Stony Brook, PIs: Joan E. Broderick, PhD and Arthur A. Stone, PhD, U01AR057948, U01AR052170; University of Washington, Seattle, PIs: Heidi M. Crane, MD, MPH, Paul K. Crane, MD, MPH, and Donald L. Patrick, PhD, U01AR057954; University of Washington, Seattle, PI: Dagmar Amtmann, PhD, U01AR052171; University of North Carolina, Chapel Hill, PI: Harry A. Guess, MD, PhD (deceased), Darren A. DeWalt, MD, MPH, U01AR052181; Children’s Hospital of Philadelphia, PI: Christopher B. Forrest, MD, PhD, U01AR057956; Stanford University, PI: James F. Fries, MD, U01AR052158; Boston University, PIs: Alan Jette, PT, PhD, Stephen M. Haley, PhD (deceased), and David Scott Tulsky, PhD (University of Michigan, Ann Arbor), U01AR057929; University of California, Los Angeles, PIs: Dinesh Khanna, MD (University of Michigan, Ann Arbor) and Brennan Spiegel, MD, MSHS, U01AR057936; University of Pittsburgh, PI: Paul A. Pilkonis, PhD, U01AR052155; Georgetown University, PIs: Carol. M. Moinpour, PhD (Fred Hutchinson Cancer Research Center, Seattle) and Arnold L. Potosky, PhD, U01AR057971; Children’s Hospital Medical Center, Cincinnati, PI: Esi M. Morgan DeWitt, MD, MSCE, U01AR057940; University of Maryland, Baltimore, PI: Lisa M. Shulman, MD, U01AR057967; and Duke University, PI: Kevin P. Weinfurt, PhD, U01AR052186). 
NIH Science Officers on this project have included Deborah Ader, PhD, Vanessa Ameen, MD (deceased), Susan Czajkowski, PhD, Basil Eldadah, MD, PhD, Lawrence Fine, MD, DrPH, Lawrence Fox, MD, PhD, Lynne Haverkos, MD, MPH, Thomas Hilton, PhD, Laura Lee Johnson, PhD, Michael Kozak, PhD, Peter Lyster, PhD, Donald Mattison, MD, Claudia Moy, PhD, Louis Quatrano, PhD, Bryce Reeve, PhD, William Riley, PhD, Peter Scheidt, MD, Ashley Wilder Smith, PhD, MPH, Susana Serrate-Sztein, MD, William Phillip Tonkins, DrPH, Ellen Werner, PhD, Tisha Wiley, PhD, and James Witter, MD, PhD. The contents of this article use data developed under PROMIS. These contents do not necessarily represent an endorsement by the US Federal Government or PROMIS. See www.nihpromis.org for additional information on the PROMIS® initiative.
CONFLICT OF INTEREST
Robert L. Askew: None
Karon F. Cook is an unpaid officer of the PROMIS Health Organization
Dennis A. Revicki: None
David Cella is an unpaid member of the board of directors and officer of the PROMIS Health Organization
Dagmar Amtmann: None