This study examined the measurement invariance of responses to the patient-reported outcomes measurement information system (PROMIS) pain interference (PI) item bank. The original PROMIS calibration sample (Wave I) was augmented with a sample of persons recruited from the American Chronic Pain Association (ACPA) to increase the number of participants reporting higher levels of pain. Establishing measurement invariance of an item bank is essential for the valid interpretation of group differences in the latent concept being measured.
Multi-group confirmatory factor analysis (MG-CFA) was used to evaluate successive levels of measurement invariance: configural, metric, and scalar invariance.
Support was found for configural and metric invariance of the PROMIS-PI, but not for scalar invariance.
Based on our results of MG-CFA, we recommend retaining the original parameter estimates obtained by combining the community sample of Wave I and ACPA participants. Future studies should extend this study by examining measurement equivalence in an item response theory framework such as differential item functioning analysis.
Pain is one of the major distressing symptoms experienced by patients with numerous chronic and acute conditions, and pain interference is an important aspect of the pain experience. Pain interference was among the first outcomes targeted in the National Institutes of Health’s (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS). As defined in the PROMIS domain framework, pain interference refers to “consequences of pain on relevant aspects of persons’ lives and may include impact on social, cognitive, emotional, physical, and recreational activity as well as sleep and enjoyment of life.”
All PROMIS measures, including the PROMIS pain interference (PROMIS-PI) measure, were developed as banks of items calibrated to an item response theory (IRT) model. Data for the development of the first set of PROMIS measures were obtained by administering candidate items to a large sample composed predominantly of an Internet panel of community participants (Wave I). The Wave I sample was quite large, but included few individuals with higher levels of pain. With IRT models, precise estimates of item parameters require not just large overall sample sizes, but adequate sample sizes in every response category. To increase the number of observations in response categories indicating higher pain, the PROMIS Wave I sample was combined with data from the American Chronic Pain Association (ACPA) [2, 5–8].
Combining these data solved one challenge, but created a potential methodological issue. Whereas the ACPA data came from a clinical sample, the Wave I sample was predominantly drawn from the community, and a large portion of its participants were healthy. Researchers may be concerned that score differences observed between subgroups are due to problems with the measurement instrument rather than true differences in the trait being measured. This concern can be addressed by testing for measurement invariance. Measurement invariance means that the same construct is measured in the same way across groups. Combining the Wave I and ACPA samples is appropriate only if the PROMIS-PI items are measurement invariant in the two groups. The purpose of this study was to examine measurement invariance of the PROMIS-PI using multi-group confirmatory factor analysis (MG-CFA).
The PROMIS Wave I data included 19,601 participants recruited from YouGovPolimetrix and 1,532 participants recruited from research sites associated with the PROMIS network. Two data collection designs were utilized. In the “full bank” design, all 56 candidate PI items were administered. In the “block administration” design, participants answered only 7-item subsets [2, 6–8]. The current study included only respondents from the full bank design.
Before starting the ACPA data collection, nine items were removed from the candidate PROMIS-PI bank based on initial psychometric analyses following Wave I data collection and secondary review by content experts. Of the nine items removed, five were removed because of poor fit, three items were removed because they did not specifically mention pain, and one item was removed because of poor correlation with other items in the bank. This left a revised candidate item bank with 47 items (See Appendix). These were administered to ACPA participants who were 21 years of age or older and had at least one chronic pain condition for at least 3 months prior to the survey.
Note that only 41 items of the PROMIS-PI were analyzed across the two samples, Wave I and ACPA, for the current study, because only 41 items were calibrated into the PROMIS-PI item bank based on the results of psychometric analyses and a secondary review by content experts. Furthermore, this study included only participants who had no missing responses on these 41 items. Thus, a total of 754 PROMIS Wave I and 807 ACPA participants were included in the current study.
The first and weakest level of measurement invariance is configural invariance. Configural invariance requires that the same pattern of item-factor loadings exists across the groups being compared, that is, the same items have nonzero loadings on the same factors. The next level, metric invariance, additionally requires that factor loadings are not statistically significantly different across groups. Scalar invariance [9, 11] requires configural and metric invariance and, additionally, invariant item intercepts across groups.
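The three nested levels can be summarized in conventional factor-analytic notation. The symbols below are assumptions introduced here for illustration and do not appear in the original article; the sketch uses a linear one-factor model in groups g = 1, 2, whereas with ordinal items the item intercepts are typically replaced by response thresholds.

```latex
% x_{ig}: response to item i in group g;  \xi_g: latent pain interference
% \tau_{ig}: item intercept;  \lambda_{ig}: factor loading;  \delta_{ig}: residual
\begin{align*}
\text{Model in group } g:\quad & x_{ig} = \tau_{ig} + \lambda_{ig}\,\xi_g + \delta_{ig} \\
\text{Configural:}\quad & \text{the same items have nonzero } \lambda_{ig} \text{ in both groups} \\
\text{Metric:}\quad & \lambda_{i1} = \lambda_{i2} \quad \text{for all items } i \\
\text{Scalar:}\quad & \lambda_{i1} = \lambda_{i2} \ \text{ and } \ \tau_{i1} = \tau_{i2} \quad \text{for all items } i
\end{align*}
```

Each level keeps the constraints of the previous one and adds its own, which is why the models can be compared as a nested sequence.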
To test measurement invariance using MG-CFA, Mplus 6.1 software was used to estimate each model with weighted least squares mean and variance adjusted (WLSMV) estimation. Goodness of fit was evaluated using χ2, Comparative Fit Index (CFI), Tucker–Lewis Index (TLI), and root mean square error of approximation (RMSEA) [15, 16]. CFI and TLI values above 0.95 are preferable, and RMSEA values of less than 0.08 are considered to indicate fair fit. In the MG-CFA approach, the fit of a baseline model is compared to the fit of increasingly constrained models. Typically, the χ2 difference test is used to compare the fit of two nested models [17, 19, 20]. When the χ2 difference is not statistically significant, the researcher has evidence supporting the less parameterized (i.e., more constrained) model. Like the model fit χ2 test statistic, the χ2 difference test is sensitive to sample size. To account for this, we used an alpha level of 0.05 and also calculated Cheung and Rensvold’s ΔCFI index. A difference of less than 0.01 in the ΔCFI index supports the less parameterized model [21, 22]. Model fit was only compared when both of the models of interest individually fit the data.
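The comparison procedure described above can be sketched as a small decision rule. This is a minimal illustration, not the authors' code: the class and function names are hypothetical, individual fit is judged here on CFI and TLI only with 0.95 treated as an inclusive minimum (both assumptions), and the example inputs are the CFI and TLI values reported for the 35-item models in this study.

```python
from dataclasses import dataclass

CFI_TLI_CUTOFF = 0.95    # assumption: treated as an inclusive minimum
DELTA_CFI_CUTOFF = 0.01  # a CFI drop below this supports the constrained model


@dataclass
class Fit:
    """CFI and TLI of one fitted model (hypothetical container)."""
    cfi: float
    tli: float


def fits(model: Fit) -> bool:
    """Individual fit check based on CFI and TLI only (an assumption)."""
    return model.cfi >= CFI_TLI_CUTOFF and model.tli >= CFI_TLI_CUTOFF


def compare(less_constrained: Fit, more_constrained: Fit) -> str:
    """Apply the delta-CFI rule, but only if both models fit individually."""
    if not (fits(less_constrained) and fits(more_constrained)):
        return "not compared"
    delta_cfi = less_constrained.cfi - more_constrained.cfi
    if delta_cfi < DELTA_CFI_CUTOFF:
        return "supports more constrained model"
    return "rejects more constrained model"


# CFI/TLI values as reported for the 35-item models in this study.
configural = Fit(cfi=0.96, tli=0.95)
metric = Fit(cfi=0.96, tli=0.95)
scalar = Fit(cfi=0.89, tli=0.90)

metric_result = compare(configural, metric)
scalar_result = compare(metric, scalar)
print("configural vs. metric:", metric_result)
print("metric vs. scalar:", scalar_result)
```

The χ2 difference test is left out of the sketch because, under WLSMV estimation, it is not a simple subtraction of the two χ2 values and is normally obtained from the estimation software.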
All 41 items administered to the Wave I and ACPA samples were rated on a 5-point scale ranging from 1 to 5. One item (PI9) was dropped because the two groups (ACPA and PROMIS Wave I) used a different number of response categories on it, whereas MG-CFA requires that each item have the same number of observed response categories in both groups. ACPA participants used only four response categories because nobody endorsed no interference in response to “How much did pain interfere with your day to day activities?” PROMIS Wave I participants used all five response categories. Thus, the choice was either to collapse the first and second response categories for the PROMIS Wave I sample or to drop PI9 from the analyses. We chose to drop the item rather than recode the PROMIS Wave I responses.
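A check of this kind can be sketched in a few lines. The function name and the toy responses below are hypothetical and only illustrate the idea of comparing the response categories actually observed in each group; they are not the study data.

```python
def items_with_mismatched_categories(group_a, group_b):
    """Return the items whose sets of observed response categories differ.

    Each argument maps an item name to the list of responses in one group.
    """
    mismatched = []
    for item in sorted(group_a.keys() & group_b.keys()):
        if set(group_a[item]) != set(group_b[item]):
            mismatched.append(item)
    return mismatched


# Hypothetical toy responses: in the second group nobody chose category 1
# on PI9, so only four categories are observed for that item.
wave1 = {"PI9": [1, 2, 3, 4, 5, 2], "ITEM_X": [1, 2, 3, 4, 5]}
acpa = {"PI9": [2, 3, 4, 5, 5, 3], "ITEM_X": [5, 4, 3, 2, 1]}

flagged = items_with_mismatched_categories(wave1, acpa)
print(flagged)  # ['PI9']
```

An item flagged this way must either be recoded so that both groups share the same categories or be removed before the multi-group model is fitted.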
The initial configural invariance model run with the remaining 40 items had unsatisfactory fit: χ2 (1,500, N = 1,561) = 22,919.14, p < .01, CFI = 0.90, TLI = 0.90, RMSEA = 0.135 (from 0.134 to 0.137). To improve model fit, we examined modification indices and residual correlations. The modification indices suggested adding correlated residuals to improve the model fit. However, doing so resulted in a non-positive definite latent variable covariance matrix in our study. Moreover, the large modification indices suggested local dependence between items. Instead of modifying the model by adding correlated residuals, we examined the residual correlations with absolute values greater than 0.20 (suggesting local dependence). Local independence means that after controlling for the trait level (i.e., pain interference), the response to any item is unrelated to the response to any other item. Local dependence suggests that item responses are linked, that is, that the items are redundant. After considering the modification indices, the non-positive definite latent variable covariance matrix, and the residual correlations, we decided to eliminate the following five items: (1) PI11 “How often did you feel emotionally tense because of your pain?”, (2) PI16 “How often did pain make you feel depressed?”, (3) PI42 “How often did pain prevent you from standing for more than one hour?”, (4) PI47 “How often did pain prevent you from standing for more than 30 min?”, and (5) PI55 “How often did pain prevent you from sitting for more than one hour?”. Thus, our final measurement invariance tests included only 35 PROMIS-PI items. Figure 1 illustrates a schematic flow of our item analysis for the current study.
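The residual-correlation screen described above can be sketched as follows. The function name, item labels, and matrix values are made up for illustration and are not results from this study; only the 0.20 cutoff comes from the text.

```python
def flag_local_dependence(items, residual_corr, cutoff=0.20):
    """Return item pairs whose absolute residual correlation exceeds cutoff."""
    flagged = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):  # upper triangle only
            r = residual_corr[i][j]
            if abs(r) > cutoff:
                flagged.append((items[i], items[j], r))
    return flagged


# Hypothetical residual correlation matrix for three placeholder items.
items = ["A", "B", "C"]
residual_corr = [
    [0.00, 0.31, 0.12],
    [0.31, 0.00, -0.25],
    [0.12, -0.25, 0.00],
]

pairs = flag_local_dependence(items, residual_corr)
print(pairs)  # [('A', 'B', 0.31), ('B', 'C', -0.25)]
```

Flagged pairs are candidates for review; which item of a pair to drop remains a substantive judgment, as in the study.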
A total of 754 PROMIS Wave I (Men = 344 and Women = 410) and 807 ACPA (Men = 150 and Women = 654, missing = 3) participants were included in the current study (demographics in Table 1). The two datasets (ACPA and Wave I) differed significantly on age, t (1,554) = 3.627, p < .001, gender, χ2 (1, N = 1,558) = 130.67, p < .001, ethnicity, χ2 (1, N = 1,548) = 63.96, p < .001, marital status, χ2 (2, N = 1,444) = 22.91, p < .001, and education, χ2 (4, N = 1,558) = 49.77, p < .001. Furthermore, an item specifically asking respondents to report current chronic conditions was only administered to the ACPA sample. The most frequently endorsed current chronic pain conditions were lower back pain, neck (or shoulder) pain, and other neuropathic pain (nerve damage) (See Table 2).
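As a worked check, the gender comparison can be recomputed from the counts given above. The sketch below assumes an uncorrected Pearson χ2 test on the 2 × 2 table of sample by gender, with the three ACPA participants of unknown gender excluded; the function name is hypothetical.

```python
def pearson_chi_square(table):
    """Uncorrected Pearson chi-square for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, n


# Rows: Wave I, ACPA; columns: men, women (counts from the text).
gender_by_sample = [[344, 410], [150, 654]]

chi2, df, n = pearson_chi_square(gender_by_sample)
print(f"chi2({df}, N = {n}) = {chi2:.2f}")  # chi2(1, N = 1558) = 130.67
```

The result agrees with the reported gender statistic, which suggests that no continuity correction was applied.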
The CFA model run with the combined samples confirmed one latent factor: χ2 (560, N = 1,561) = 8,795.562, p < .01, CFI = 0.991, TLI = 0.991, RMSEA = 0.097 (from 0.095 to 0.099).
A configural invariance model (i.e., no across-group equality constraints on any parameters) was tested across the two samples. The results supported configural invariance between the PROMIS and ACPA samples: χ2 (1,120, N = 1,561) = 10,481.76, p < .01, CFI = 0.96, TLI = 0.95, RMSEA = 0.103 (from 0.102 to 0.105) (See Table 3).
The metric invariance model (i.e., equality constraints on unstandardized item-factor loadings across groups) also had good fit: χ2 (1,155, N = 1,561) = 10,539.40, p < .01, CFI = 0.96, TLI = 0.95, RMSEA = 0.102 (from 0.100 to 0.104). When we compared the fit of the configural model (i.e., the same pattern of factor loadings across groups) and the metric model (i.e., equal unstandardized factor loading values across groups), the χ2 difference test was statistically significant: Δχ2 (Δdf = 35) = 1,422.39, p < .01. This statistically significant change in χ2 suggests that some relationships among variables differed between the PROMIS and ACPA samples. As noted above, the χ2 difference test is sensitive to large sample sizes, so researchers have recommended the CFI difference test for testing measurement invariance in large samples. Unlike the χ2 difference test, the CFI difference test supported metric invariance (ΔCFI = 0.00) (See Table 3).
Next, we examined the PROMIS-PI for scalar invariance. The equivalence of thresholds across groups was not supported: χ2 (1,295, N = 1,561) = 24,484.92, p < .01, CFI = 0.89, TLI = 0.90, RMSEA = 0.151 (from 0.150 to 0.153). Since the scalar invariance model did not fit the data, model fit was not compared to test measurement invariance (See Table 3).
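The RMSEA point estimates reported for these models can be approximately recovered from the χ2 values, degrees of freedom, and sample size. The formula below is an assumption about how the estimate is computed (N rather than N − 1 in the denominator, and a √G factor for G groups); it is not stated in the article, and other variants exist.

```python
import math


def rmsea(chi2, df, n, groups=1):
    """RMSEA point estimate under the assumed formula
    sqrt(G) * sqrt(max(chi2 - df, 0) / (df * N))."""
    return math.sqrt(groups) * math.sqrt(max(chi2 - df, 0.0) / (df * n))


# (label, chi2, df, number of groups, RMSEA reported in the text)
models = [
    ("one factor, combined sample", 8795.562, 560, 1, 0.097),
    ("configural", 10481.76, 1120, 2, 0.103),
    ("metric", 10539.40, 1155, 2, 0.102),
    ("scalar", 24484.92, 1295, 2, 0.151),
]

N = 1561
estimates = {}
for label, chi2, df, groups, reported in models:
    estimates[label] = rmsea(chi2, df, N, groups)
    print(f"{label}: computed {estimates[label]:.3f}, reported {reported:.3f}")
```

Under these assumptions the computed values agree with the reported ones to within rounding, and all of them exceed the 0.08 cutoff for fair fit mentioned in the methods.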
We examined the measurement invariance of PROMIS-PI items across two qualitatively different samples using MG-CFA methods. There is currently no consensus regarding the level of invariance necessary before one can confidently compare scores across groups. Horn and McArdle require metric invariance; Reise and Widaman require only partial-loading invariance (i.e., partial metric invariance); and Chen, Sousa, and West require scalar invariance. Our analyses supported measurement invariance at the metric level, but not at the scalar level.
Had the PROMIS-PI been found to lack configural or metric invariance, a case could be made for re-calibrating the item bank or dropping items that function differently in the two groups. We found, however, that the PROMIS-PI met all but the strictest form of recommended measurement invariance for the comparison of scores across groups. This means that the instrument measures the same construct in both populations and the scores can be used to measure both healthy and clinical samples (such as those with chronic pain). Based on these results, we recommend using the original parameter estimates obtained from the combined sample of Wave I and ACPA participants. For clinicians, this finding means that the instrument can be scored and used as originally published.
The results of the study also suggest that the PROMIS pain interference bank includes items that are locally dependent. Local dependence results in biased parameter estimation [27, 28]. Thus, our results suggest that the PROMIS network should evaluate and address local dependency in the pain interference bank.
Future studies should extend our analyses by testing measurement invariance using an IRT framework. In IRT, lack of measurement equivalence occurs at the item level and is referred to as differential item functioning (DIF). Comparison of results based on MG-CFA used in this study and results based on IRT methods would further extend our understanding of the level of measurement invariance in the PROMIS-PI.
The project described was supported by Award Number 3U01AR052177-06S1 from the National Institute of Arthritis and Musculoskeletal and Skin Diseases. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of Arthritis and Musculoskeletal and Skin Diseases or the National Institutes of Health.
Jiseon Kim, Department of Rehabilitation Medicine, University of Washington, Seattle, WA 98195, USA.
Hyewon Chung, Department of Education, Chungnam National University, 99, Daehak-ro, Yuseong-gu, Daejeon 305-764, Korea; Email: hyewonchung7@gmail.com.
Dagmar Amtmann, Department of Rehabilitation Medicine, University of Washington, Seattle, WA 98195, USA.
Dennis A. Revicki, Center for Health Outcomes Research, United BioSource Corporation, 7101 Wisconsin Ave., Suite 600, Bethesda, MD 20814, USA.
Karon F. Cook, Department of Medical Social Sciences, Northwestern University, Feinberg School of Medicine, 625 N. Michigan Ave., Suite 2700, Chicago, IL 60611, USA.