To develop a social health measurement framework, to test items in diverse populations and to develop item response theory (IRT) item banks.
A literature review guided framework development of Social Function and Social Relationships sub-domains. Items were revised based on patient feedback, and Social Function items were field-tested. Analyses included exploratory factor analysis (EFA), confirmatory factor analysis (CFA), two-parameter IRT modeling and evaluation of differential item functioning (DIF).
The analytic sample included 956 general population respondents who answered 56 Ability to Participate and 56 Satisfaction with Participation items. EFA and CFA identified three Ability to Participate sub-domains. However, because of positive and negative wording, and content redundancy, many items did not fit the IRT model, so item banks do not yet exist. EFA, CFA and IRT identified two preliminary Satisfaction item banks. One item exhibited trivial age DIF.
After extensive item preparation and review, EFA-, CFA- and IRT-guided item banks help provide increased measurement precision and flexibility. Two Satisfaction short forms are available for use in research and clinical practice. This initial validation study resulted in revised item pools that are currently undergoing testing in new clinical samples and populations.
Patient-reported outcomes (PRO) are increasingly incorporated into clinical trials and clinical practice. A new approach to PRO measurement is the development of item banks that contain numerous questions representative of a common trait. Items in a well-constructed bank cover the entire continuum and are calibrated on the same measurement scale, thus simplifying scoring and interpretation. With a calibrated bank, the questions can be used to create fixed-length test instruments and computerized adaptive tests (CATs) that minimize respondent burden [1-4]. The primary objective of the Patient-Reported Outcomes Measurement Information System (PROMIS; http://www.nihpromis.org) is to develop and validate numerous item banks.
PROMIS began in 2004 as an NIH roadmap initiative that includes research scientists, clinicians and psychometric measurement experts at the NIH, at six primary research sites and at a statistical coordinating center. PROMIS item banks measure key symptoms and health concepts applicable to a range of chronic health conditions. Calibrated item banks will enable measures of PRO that are efficiently administered, reliable, valid and easily interpretable. The purpose of this paper is to describe the development and initial validation of the Social Health item banks.
PROMIS developed a domain map (framework) to portray the item bank structure. Starting with three overall domains of physical, mental and social health, PROMIS workgroups were formed to define, develop and test multiple sub-domains.
Past work has recognized the importance of social determinants of health such as social status, social networks and social support [7-9]. The study of social health as an outcome, that is, a factor that would reflect measurable change in response to interventions or changes in health status, has received limited attention. With increasing focus on understanding the entire picture of health and the full impact of disease on people’s lives, there is a need to measure social health as an outcome, distinct from physical and mental health.
Theories of social health have used different conceptual models, based in different disciplines, posing a special challenge in defining sub-domains. Primary components include social role participation, social network quality, social integration and interpersonal communication [10-16]. Other conceptualizations are based on interpersonal attributes independent of particular roles. Researchers have also studied social support and social ties, for example, marital status, frequency of contacts with friends and relatives, and religious organization membership [7, 18].
In general, the division of social health into sub-domains appears to depend on the purpose of a given clinical or research program. Available measures reflect these varying conceptual divisions. The goals of the PROMIS Social Health Workgroup were to develop a unified framework for conceptualizing social health and to create item banks that would reflect the experience of healthy people, as well as those with a range of medical and mental health conditions. This paper describes the processes of framework development, item development and testing, and psychometric analysis for the Social Health item banks.
Based on PROMIS goals and an extensive literature review, the Social Health Workgroup recognized that extant frameworks included two primary sub-domains: Social Function and Social Relationships (Fig. 1). Social Function often included some distinction between capability and satisfaction, and social relationships included concepts of social support and isolation. The workgroup focused first on Social Function and agreed to address Social Relationships (including possible expansion of its sub-domains) in a future initiative. We developed items covering four contexts: family, friends, work and leisure. In previous cancer research, we developed and tested social health items. Those results provided preliminary support for two Social Function sub-domains: Ability to Participate and Satisfaction with Participation, including an empirical construct hierarchy that was generally consistent with clinical expectations. For example, limitations in leisure activities were more common than limitations related to family/friends. The Social Health Workgroup adopted this structure for PROMIS.
Each PROMIS workgroup performed a qualitative item review process that included identification of existing items, development of new items, item revision, readability levels, focus group exploration of domain coverage, cognitive interviews on individual items and final revision before field-testing. Multilingual translation experts reviewed items to facilitate future translations. Additional details about qualitative item review are reported elsewhere. This process produced 56 Ability and 56 Satisfaction items. The Social Health Workgroup wished to evaluate whether positive endorsement of capability could be scaled alongside negative endorsement of limitation and to evaluate the value of asking about (positive) satisfaction as well as (negative) disappointment or bother. As a result, the item pools included multiple examples of minor variations on similar themes.
Items were administered to general population and clinical samples. The general population sample consisted of panel members of YouGovPolimetrix (www.polimetrix.com), an Internet polling organization with a registry of more than one million respondents. Target accrual percentages were established for the general population sample by gender (50% women), age (20% in each of five age groups), race (10–15% African-American), ethnicity (10–15% Hispanic) and education (25% high school or less). All participants received a small incentive ranging from $10 to $50; one site provided a token incentive less than $10 in value.
Panel respondents were assigned to complete items in either a block testing or full-bank format. For block testing, overlapping blocks of seven items were formed within each of 14 PROMIS item pools. For the full-bank format, respondents completed all items from two PROMIS item pools. All clinical samples were assigned to block testing. Each item was administered to at least 900 general population participants and 500 clinical sample participants (arthritis, cancer, chronic obstructive pulmonary disease, heart disease, psychiatric conditions and spinal cord injury). Respondents also answered sociodemographic and clinical questions.
Respondents in full-bank testing completed 56 Ability and 56 Satisfaction items. They also completed 10 PROMIS global health items and 15 items from “legacy” (i.e., widely used and accepted) instruments to evaluate criterion validity and provide a link to prior work. Nine items were used from the SF-36 (version 2, acute timeframe) and six from the Functional Assessment of Cancer Therapy-General Population (FACT-GP, version 4). The Ability and Satisfaction item subsets were counterbalanced, positively and negatively worded item sets were grouped together, and items were administered in random order. In block testing, the 7-item blocks were randomly ordered. These respondents also completed the global health, sociodemographic and clinical questions, but not the legacy instruments. All data were collected by computer with secure servers, and only one item at a time was displayed on the screen. Respondents could skip items and go back to change a response.
Analyses followed the PROMIS guidelines. The goals were to develop unidimensional item sets that fit a two-parameter item response theory (IRT) model and did not exhibit differential item functioning (measurement bias) across gender, age and education. Analyses were conducted separately for the Social Function sub-domains (Ability and Satisfaction).
Only data from the full-bank (general population) testing were used in the analyses reported here. Our goal was to evaluate dimensionality and create item banks, tasks most effectively done when all items are administered to all people, which was not the case for the clinical samples. Of the 956 respondents taking the full-bank format, we excluded respondents who answered fewer than half of the items (n = 94 for Ability; n = 104 for Satisfaction). We also excluded respondents who had an average response time of less than one second per item or had a response time of less than a half second for each of 10 consecutive items (n = 84 for Ability; n = 84 for Satisfaction). This left 778 respondents (81%) available for the Ability analyses and 768 (80%) for Satisfaction. Data quality was assessed to identify out-of-range and missing values and to assure that negative items were reversed for scoring. Preliminary analyses were conducted to identify unused or sparsely used categories, to examine whether the average measures in response categories increased monotonically and to evaluate internal consistency reliability. A corrected item-total score correlation above 0.30 was required in order to retain that item for further analysis.
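The two exclusion rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of None for a skipped item and the per-item timing list are hypothetical and are not taken from the PROMIS data system.

```python
def too_fast(times, min_mean=1.0, fast=0.5, run=10):
    """Flag implausibly fast responding.

    times: per-item response times in seconds, in administration order
    (hypothetical data layout). A respondent is flagged when the mean time
    is under `min_mean` seconds or when `run` consecutive items were each
    answered in under `fast` seconds.
    """
    if not times:
        return False
    if sum(times) / len(times) < min_mean:
        return True
    streak = 0
    for t in times:
        streak = streak + 1 if t < fast else 0
        if streak >= run:
            return True
    return False


def keep_respondent(responses, times, n_items=56):
    """Apply both exclusion rules; None marks a skipped item (assumption)."""
    answered = sum(r is not None for r in responses)
    if answered < n_items / 2:  # answered fewer than half of the items
        return False
    return not too_fast(times)


# Sample sizes reported in the text, recomputed from the exclusion counts.
ability_n = 956 - 94 - 84
satisfaction_n = 956 - 104 - 84
```

With the counts reported above, ability_n is 778 and satisfaction_n is 768, matching the analytic samples.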
The analytic sample was randomly split in half for use in either exploratory factor analysis (EFA) or a subsequent confirmatory factor analysis (CFA). For EFA, polychoric correlations were entered into Mplus and analyzed using an unweighted least squares estimation procedure [26-28]. Factors were identified by eigenvalues greater than 1.0 and examination of scree plots. Items loading 0.40 or above on a factor were examined to describe the factor. For the subsequent CFA, polychoric correlations were entered into Mplus and analyzed using a weighted least squares estimation procedure. A value greater than 0.95 on the comparative-fit index (CFI) was considered evidence of good model fit; a value greater than 0.90 was considered acceptable. The results of one-factor models were examined; acceptable model fit provided some support for unidimensionality (local dependence was also examined; see below). A bifactor model was also used to confirm unidimensionality. We considered the data to be “essentially unidimensional” if fit was acceptable for a model with one general and several specific orthogonal factors. Finally, local dependence between item pairs was defined as a residual correlation greater than 0.20.
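The CFI and local dependence cutoffs stated above amount to simple decision rules. The following Python sketch encodes them; the function names, the label used for values at or below 0.90 and the dictionary layout for residual correlations are illustrative assumptions, not part of Mplus or the PROMIS analysis plan.

```python
def classify_cfi(cfi):
    """Map a comparative-fit index to the labels used in the text."""
    if cfi > 0.95:
        return "good"
    if cfi > 0.90:
        return "acceptable"
    return "below acceptable"  # label chosen here for illustration only


def locally_dependent_pairs(residuals, cutoff=0.20):
    """Return item pairs whose residual correlation exceeds the cutoff.

    residuals: dict mapping (item_a, item_b) to a residual correlation
    (hypothetical layout).
    """
    return sorted(pair for pair, r in residuals.items() if r > cutoff)
```

For example, classify_cfi(0.951) returns "good" and classify_cfi(0.908) returns "acceptable", which is how the corresponding CFA results are described in this paper.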
We used the MULTILOG graded response model to estimate item parameters and evaluate model fit [32, 33]. In this two-parameter logistic IRT model, item responses are used to estimate the “measure” (theta, i.e., the person’s transformed level on the latent trait). The two parameters are item difficulty, which represents the item’s location on the latent trait, and item slope, which indicates how well the item discriminates (distinguishes) between person differences across the latent trait. We used item characteristic curves to examine the distribution of responses across categories, item thresholds to examine the range of Ability (or Satisfaction) being measured by these items, slopes to identify items with poor discrimination and the test information function to estimate where theta estimates had the most precision. Model fit was assessed with likelihood-based chi-squared statistics (S-X²) [35, 36].
For all unidimensional item sets, we examined uniform and non-uniform DIF using IRTLRDIF. Uniform DIF detects differences across the entire theta range, whereas non-uniform DIF detects differences in only a segment of the theta range. We compared hierarchically nested IRT models; specifically, one model that fully constrained parameters to be equal between two groups was compared to other models that allowed parameters to be freely estimated. Three group comparisons were evaluated: by gender, by age (<65 vs. ≥65) and by education (high school/GED or less vs. higher education).
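To make the roles of slope and thresholds concrete, the following Python sketch computes category probabilities under the standard logistic form of the graded response model. It is not MULTILOG code, and MULTILOG's internal parameterization may differ in detail. The function name and the item parameters in the example are hypothetical.

```python
import math


def grm_category_probs(theta, slope, thresholds):
    """Category probabilities for one graded response item.

    thresholds must be in ascending order; an item with K response
    categories has K - 1 thresholds. The cumulative probability of
    responding in category k or higher is a logistic function of
    slope * (theta - threshold_k); category probabilities are the
    differences between adjacent cumulative probabilities.
    """
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-slope * (theta - b))) for b in thresholds]
        + [0.0]
    )
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical 5-category item evaluated at an average trait level (theta = 0).
probs = grm_category_probs(theta=0.0, slope=2.5, thresholds=[-1.5, -0.5, 0.5, 1.5])
```

A larger slope makes the cumulative curves steeper, so the item separates respondents more sharply near its thresholds; the thresholds determine where along theta each category becomes likely.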
Fixed-item short forms were created to provide an alternative where computers may not be available. The criteria for item inclusion were content representativeness (inclusion of items from each context), maximized range of difficulty (inclusion of items across the calibration range) and acceptable discrimination levels (inclusion of items that distinguish between people across the latent trait). Raw scores were calculated and transformed to T-scores (mean = 50; standard deviation = 10). Descriptive statistics were calculated for the sample as a whole and for gender, age and education subgroups. Spearman correlation coefficients were calculated to evaluate the association between short form scores and legacy measures (SF-36 role physical, role emotional and social functioning subscales, and FACT-G functional well-being subscale).
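The T-score metric mentioned above is a linear rescaling of a standardized score. The Python sketch below shows only that rescaling, assuming a score already expressed in standard deviation units relative to the reference sample; it does not reproduce the raw-score conversion used for the short forms, and the function names are hypothetical.

```python
def theta_to_t(theta):
    """Rescale a standardized score to the T-score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * theta


def t_to_theta(t_score):
    """Inverse of theta_to_t."""
    return (t_score - 50.0) / 10.0
```

Under this rescaling, a person one and a half standard deviations below the reference mean has a T-score of 35.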
We identified and reviewed 1,781 Social Function items; 112 items were retained and edited. New items were written to fill content gaps. This process produced 56 items for Ability to Participate and 56 for Satisfaction with Participation within four contexts (family, friends, work and leisure). All items were written as statements, using a 7-day reporting period. The Ability items used a 5-point frequency rating scale (Never, Rarely, Sometimes, Often and Always) and the Satisfaction items used a 5-point intensity rating scale (Not at all, A little bit, Somewhat, Quite a bit and Very much). These scales were chosen to best measure the defined latent traits.
Table 1 summarizes characteristics of the general population participants in the analytic datasets. The proportions of Hispanic and African-American respondents, and those with lower education, were slightly lower than the target proportions.
Response distributions were somewhat negatively skewed, that is, relatively fewer respondents reported poor social function. Few category inversions (disordered measures across response categories) were found and, where they occurred, they were at the bottom of the scale where frequencies were small. The relatively sparse categories and inversions did not suggest a problem with respondent use of the rating scale. Reliability coefficients were high (>0.98), and item-total correlations were acceptable (0.65–0.85 for Ability; 0.47–0.82 for Satisfaction).
The EFA for Ability to Participate began with 56 items. Seven items loaded weakly across factors, and the remaining items loaded on three factors: 21 social activity items (family, friends, leisure and community activities); 16 social roles and responsibilities items (work and family responsibilities); and 12 social activity limitations items (limitations in family, friends and leisure activities; Table 2). Good or acceptable CFA model fit was found for each subset; specifically, the CFI for both social activities and social roles was 0.951, and the CFI for social activity limitations was 0.908. After deleting two “visiting relatives” items because of local dependence and poor discrimination, model fit improved for social activity limitations (CFI = 0.952). A bifactor model (one general and three specific factors) demonstrated acceptable fit (CFI = 0.912), suggesting that it might be possible to model the 49 Ability to Participate items as an “essentially unidimensional” construct.
In the Satisfaction EFA, we started with 56 items and deleted 30 items worded in terms of bother or disappointment. The introduction of both positively and negatively worded items, and the inclusion of nearly redundant content varying only by modifiers (e.g., disappointed; bothered), produced EFA results that reflected several small factors and a clear division between positive and negative items. Most of the negative items initially loaded on a separate factor; in subsequent analyses, they exhibited local dependence along predicted lines and poor discrimination. The remaining 26 positive items loaded on two factors: 14 satisfaction with participation in social roles items and 12 satisfaction with participation in discretionary activities items (Table 2). Good CFA model fit was found for these two factors (CFI = 0.959 and 0.968, respectively). The bifactor model (one general and two specific factors) demonstrated acceptable fit (CFI = 0.931), and there was no local dependence.
All categories were used by the respondents, but with sparse coverage at the bottom of the hierarchy. When modeling all 49 Ability to Participate items together, the threshold range (−1.54 to 1.54) was narrower than the score measure range (−3.39 to 2.31), slopes were acceptable for all items (range: 1.84 to 4.52), and examination of the test information function showed that theta estimates were most precise in the middle score range. However, 23 items misfit the unidimensional IRT model (P < 0.05 for the S-X² statistics). When modeling the three subsets separately, 7 of the 21 social activity items, 3 of the 16 social role items and 7 of the 12 social activity limitations items showed significant misfit. We concluded that a coherent, interpretable item bank could not be constructed from the tested Ability to Participate item pool. Instead, results were referred back to the PROMIS Social Health Workgroup for their use in refining the item wording and concepts to be measured in a second-round effort (ongoing as of 2010).
For Satisfaction with Participation, the threshold range (−1.22 to 1.15) was narrower than the score measure range (−2.68 to 2.02), and slopes were acceptable for all 26 items (range: 2.66 to 4.46). However, 23 of 26 items misfit the unidimensional IRT model. When modeling the two subsets separately (satisfaction with participation in social roles and satisfaction with participation in discretionary activities), there was no item misfit. Item wording and calibration statistics are presented in Table 3.
For the two Satisfaction with Participation sub-domains, no items exhibited non-uniform DIF, and one (“satisfied with ability to do things for fun at home”) exhibited a trivial level of uniform age DIF (chi-square = 5.642; P = 0.018).
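The reported P value can be checked from the chi-square statistic. The Python sketch below assumes one degree of freedom, which is an assumption made here because it reproduces the reported value; the degrees of freedom are not stated in the text. For one degree of freedom the upper-tail probability has a closed form in terms of the complementary error function, so only the standard library is needed. The function names are hypothetical.

```python
import math


def lr_statistic(loglik_constrained, loglik_free):
    """Likelihood ratio statistic for two nested models."""
    return -2.0 * (loglik_constrained - loglik_free)


def chi2_upper_tail_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)), valid for df = 1 only.
    """
    return math.erfc(math.sqrt(x / 2.0))


# Chi-square value reported for the item with uniform age DIF.
p_value = chi2_upper_tail_df1(5.642)
```

Rounded to three decimals, p_value is 0.018, in agreement with the value reported for the age DIF test.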
Two seven-item short forms were developed (available at http://www.assessmentcenter.net/ac1 and shown in bold in Table 3). Table 4 displays descriptive statistics for the whole sample and for gender, age and education subgroups used in DIF analyses. A higher score represents higher satisfaction. Correlations between short form scores and SF-36 and FACT-G legacy subscales were moderate to high (0.43–0.74), suggesting good evidence of criterion validity.
Figure 2 provides the test information function for the two Satisfaction banks and short forms. Considering 30 as an arbitrary threshold for reliable measurement, which corresponds to a standard error of 0.41, these preliminary item banks provide reliable measurement across a range of ~3 standard deviation units; the short forms provide reliable measurement across ~2.5 standard deviation units. Accurate (reliable) measurement spans both sides of the average (T-score = 50) but covers relatively more of the impaired (poorer social health) side of the average for the general US population.
PROMIS aims to create measures that are valid, easily administered and useful in behavioral, epidemiological, clinical and health services research. Outreach to end users requires a transparent description of the item bank development methods. Our goal here was to describe in detail the processes used to develop Social Health item banks, with a focus on quantitative methods that complement the qualitative methods described elsewhere [20, 21].
PROMIS seeks to build measurement validity throughout all stages of development. Content validity was sought by examining previous models and by collecting patient experiences about their social well-being and limitations [20, 21]. Sub-domain structures emerged from qualitative data, expert consensus and quantitative results of extensive field tests. Although Ability to Participate and its three subdomains were essentially unidimensional (based on EFA and CFA), they did not fit the IRT model, when examined either separately or together. That is, while the content of the items was designed to measure the same construct, the observed data were not consistent with model expectations. As a result, we committed to developing calibrated Ability to Participate item banks in a supplemental PROMIS project that is underway.
Two Satisfaction with Participation sub-domains (Social Roles and Discretionary Activities) met the requirements of unidimensionality and IRT model fit, and small preliminary item banks were created. Of note, results from oncology data were found to be consistent with clinical experiences, that is, individuals tend to experience limitations in leisure before reaching a higher level of social function limitation that extends to the areas of work and family. This confluence of quantitative results and clinical expectations begins to build support for the clinical importance of the social health measures. Additionally, quantitative analyses presented here indicated that items performed uniformly across gender, age and education.
By assessing social health via well-constructed item banks, the entire continuum of the domain can be accurately measured, with floor and ceiling effects minimized. Using the information in the item bank library, researchers can create CATs or short form measures tailored to their populations of interest. Scores on these tailored measures can then be easily compared because the items have been mapped onto a common metric. Likewise, by promoting a comprehensive domain map that is congruent with prevalent health frameworks, PROMIS seeks to provide a common model to be used across lines of PRO research.
Progress on the development and validation of PROMIS Social Health item banks has been significant, yet incremental. Unlike physical and mental health, both of which have long traditions of measurement with hundreds of instruments, social health, particularly as an outcome measure, has historically suffered neglect. Two small Satisfaction with Participation banks (Social Roles and Discretionary Activities) were successfully developed; others in Ability to Participate did not materialize in the first attempt. However, results suggest possible explanations that have been carried forward to planning subsequent work. First, in our attempt to explore alternative phrasing of items with very similar content, we introduced confusion of concepts within the bank and item local dependence. The repetitive nature of the items (with and without modifiers; positive and negative wording) may also have led to more missing data. Second, it is challenging to distinguish where social health variables lie along an outcome continuum. Not all health-relevant social variables are necessarily health outcomes, per se, and deciding which ones are may be subject to disagreement and somewhat dependent on the study context.
A related conceptual complexity is that social health outcomes are often influenced by factors other than health, for example, attitudes, personality and finances. Consequently, the degree to which physical or mental health are causes of changes in social functioning may be difficult to determine. Even if respondents were asked whether health factors played a role (e.g., “due to my illness, I could not…”), it may not be possible to provide a meaningful answer. All of these considerations complicate determining when social functioning represents a health outcome. By using these social health item banks, researchers will be able to examine associations between sociodemographic, clinical and behavioral factors and patient-reported outcomes.
Another limitation of our study is that we did not include any clinical samples in the calibration analyses. PROMIS items are intended to reflect the general population across its full functional range. Without oversampling the extremes of the continuum, however, the number of cases representing the most impaired end of the distribution was too few to serve as a basis for item calibration within that extreme range. That is, item responses were probably more skewed than they would have been had we been able to include a sample whose social functioning was more likely to be negatively affected by medical or mental health conditions. Our study sample, although drawn from a registry of more than one million people, may also not be fully representative of the U.S. general population in that it consists of people who have agreed to participate in this online registry. However, the degree of skew in the data was not so extreme as to adversely affect the analyses. In these IRT analyses, marginal maximum likelihood (MML) estimation assumes the normal distribution of theta in the population. However, when the assessment is of sufficient length, the effect of departure from normality (skewness) on the parameter estimation is negligible.
We considered the conceptual issues discussed here and elsewhere as we worked toward a domain framework that includes both Social Function and Social Relationships (Fig. 1). At the broadest level, social health includes health outcomes and social processes (e.g., social support) that play an important role in influencing health. In some contexts, processes such as social support can be outcomes of interventions such as family therapy or forms of psychotherapy. For the PROMIS initiative, we began by focusing on the Social Function sub-domain, which is most commonly measured as a health outcome. Our initial conceptualization identified two broad categories of Social Function outcomes, Ability to Participate and Satisfaction with Participation; both encompass the four contexts of family, friends, work and leisure activities.
Field test results produced two item banks for Satisfaction with Participation: Social Roles and Discretionary Activities (Table 3). However, Ability to Participate items failed to produce a coherent item bank. The implications for future research are that sub-domains of Social Function may need to be much more narrowly defined. We are currently revising the Social Function item pools and creating item pools for the Social Relationships sub-domains (Fig. 1). We also plan to field test these revised item pools in various clinical samples. Doing so will remedy one of the limitations of the current study, namely, field test results based on only general population data. Future analyses will investigate whether the enhanced item pools produce factor structures in line with our conceptual model and result in item banks that can be used across chronic illnesses.
The Patient-Reported Outcomes Measurement Information System (PROMIS) is a National Institutes of Health (NIH) Roadmap initiative to develop a computerized system measuring patient-reported outcomes in respondents with a wide range of chronic diseases and demographic characteristics. PROMIS was funded by cooperative agreements to a Statistical Coordinating Center (Northwestern University, PI: David Cella, PhD, U01AR52177) and six Primary Research Sites (Duke University, PI: Kevin Weinfurt, PhD, U01AR52186; University of North Carolina, PI: Darren DeWalt, MD, MPH, U01AR52181; University of Pittsburgh, PI: Paul A. Pilkonis, PhD, U01AR52155; Stanford University, PI: James Fries, MD, U01AR52158; Stony Brook University, PI: Arthur Stone, PhD, U01AR52170; and University of Washington, PI: Dagmar Amtmann, PhD, U01AR52171). NIH Science Officers on this project have included Deborah Ader, PhD, Susan Czajkowski, PhD, Lawrence Fine, MD, DrPH, Laura Lee Johnson, PhD, Louis Quatrano, PhD, Bryce Reeve, PhD, William Riley, PhD, Susana Serrate-Sztein, PhD, and James Witter, MD, PhD. This manuscript was reviewed by the PROMIS Publications Subcommittee prior to external peer review. The authors thank Ron Hays, PhD, and Paul Pilkonis, PhD, for helpful suggestions on the final version of the manuscript, and Jacquelyn George for assistance with research coordination. See the web site at www.nihpromis.org for additional information on the PROMIS cooperative group. Presented in part at the International Symposium on Measurement of Participation in Rehabilitation Research, Toronto, Ontario, Canada, October 14–15, 2008.
Elizabeth A. Hahn, Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University, 710 N. Lake Shore Dr., Room 725, Chicago, IL 60611, USA.
Robert F. DeVellis, Department of Health Behavior and Health Education, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Rita K. Bode, Department of Physical Medicine and Rehabilitation, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
Sofia F. Garcia, Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University, 710 N. Lake Shore Dr., Room 725, Chicago, IL 60611, USA.
Liana D. Castel, Institute for Medicine and Public Health and Vanderbilt Epidemiology Center, Vanderbilt University Medical Center, Nashville, TN, USA.
Susan V. Eisen, Department of Health Policy and Management, Boston University School of Public Health and Center for Health Quality, Outcomes & Economic Research, ENRM Veterans Hospital, Bedford, MA, USA.
Hayden B. Bosworth, Center for Health Services Research, Durham VAMC; Departments of Medicine, Psychiatry, and Nursing, Duke University Medical Center, Durham, NC, USA.
Allen W. Heinemann, Department of Physical Medicine and Rehabilitation, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA.
Nan Rothrock, Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University, 710 N. Lake Shore Dr., Room 725, Chicago, IL 60611, USA.
David Cella, Department of Medical Social Sciences, Feinberg School of Medicine, Northwestern University, 710 N. Lake Shore Dr., Room 725, Chicago, IL 60611, USA.