Content validity of patient-reported outcomes (PROs) is evaluated primarily during item development, but subsequent psychometric analyses, particularly for item-response theory (IRT)-derived scales, often result in considerable item pruning and potential loss of content. After selecting items for the PROMIS banks based on psychometric and content considerations, we invited external content expert reviews of the degree to which the initial domain names and definitions represented the calibrated item bank content.
A minimum of four content experts reviewed each item bank and recommended a domain name and definition based on item content. Domain names and definitions then were revealed to the experts who rated how well these names and definitions fit the bank content and provided recommendations for definition revisions.
These reviews indicated that the PROMIS domain names and definitions remained generally representative of bank content following item pruning, but modifications to two domain names and minor to moderate revisions of all domain definitions were needed to optimize fit with the item bank content.
This reevaluation of domain names and definitions following psychometric item pruning, although not previously documented in the literature, appears to be an important procedure for refining conceptual frameworks and further supporting content validity.
Patient-reported outcome (PRO) measurement development has benefited from recent efforts to outline best practices for establishing content validity. These best practices for determining the extent to which an instrument sufficiently represents all facets of the relevant constructs have emphasized the importance of developing a conceptual model that clearly defines the constructs of interest [1,2] and utilizing patient input in the development of item content and the conceptual model [1,3,4]. Best practices also have been offered on the use of qualitative research methodology to obtain patient input for the evaluation of content validity.
The Patient-Reported Outcomes Measurement Information System (PROMIS) developed item banks consistent with current content validity guidance, including conceptual model development and inclusion of the patient perspective. The PROMIS domain framework and definitions resulted from extensive literature review, archival data analyses, and a modified Delphi process with content experts [6,7]. Patient feedback from numerous focus groups refined these conceptual definitions, generated content for new items, and documented saturation of the construct [8,9]. Cognitive interviews solicited feedback on item clarity. Item pools were tested in a large sample of general population respondents, augmented by clinical samples, and item response theory (IRT) methodology was used to select and calibrate items for the PROMIS item banks [11-14].
Item pruning, the elimination of items based on psychometric considerations, is inherent in the development of IRT-based item banks. Items may be removed for a variety of psychometric concerns including local dependence (correlated residuals), differential item functioning (DIF), inadequate unidimensionality, lack of monotonicity, and poor IRT model fit [11,15]. For example, 56 items from the PROMIS depression item pool were tested, but only half (28 items) were retained. Although item pruning is consistent with IRT test development, it is also a potential threat to the content validity of the resulting item bank since item content generated from patient and expert feedback may be lost as items with poor psychometric properties are removed. During the PROMIS item bank selection process, domain content experts worked with psychometricians to minimize loss of representative item content due to psychometric concerns. A few items with less than optimal psychometric properties were retained because they uniquely covered a relevant facet of the domain, but a substantial number of items performed too poorly to be retained and were excluded from the calibrated item banks.
To address the effects of content loss from eliminated items on content validity, the ISPOR PRO Task Force has recommended that patient interviews or focus groups be used to determine the importance of omitted versus retained items. However, given the comprehensive initial item pool development process in PROMIS [7-8], new items generated to replace omitted content deemed important by this approach likely would have psychometric limitations similar to those of the omitted items. Even assuming that new items with potentially better psychometric properties could be generated, the testing and calibration of new items with the existing items in a large and diverse sample of respondents would take considerable time and resources. In the interim, the initial domain names and definitions may not accurately represent the currently retained items, thus potentially misleading users as to the content represented by the current item banks. Therefore, we decided to first revise the domain names and definitions to better represent the retained item banks.
Consistent with the ISPOR Task Force recommendation, PRO guidance regarding content validity has focused predominantly on the initial item development phase and on developing items that cover the facets or attributes of the conceptual definition [3-5]. The iterative development of PROs, however, includes modifying not only the item content consistent with the concept, but also the concept consistent with the psychometric findings. The Food and Drug Administration (FDA) PRO development model indicates that psychometric findings may lead to revision of the concept, and the Mayo/FDA PRO Consensus Meeting Group elaborated on this process of concept refinement, indicating that empiric evidence from psychometric analyses should be used to modify the conceptual framework. This complementary and iterative process of conceptual refinement includes not only generating items that cover all of the purported facets of the concept, but also using the psychometric data to revise the concept consistent with the retained items. Therefore, we report in this study a procedure for revising domain definitions consistent with the retained item content by asking content experts to review the item banks and make recommendations about revising the PROMIS domain names and definitions based on the items retained in these banks.
PROMIS domain groups (physical function, pain, fatigue, emotional distress, social health, sleep) identified content experts with experience developing and validating instruments in the domain or conducting clinical research in which the domain is a primary outcome. To participate, experts could not be supported by the PROMIS cooperative agreement, but could have served as consultants to the project in a limited capacity. We identified potential content experts from a number of sources, including developers of legacy scales from whom we had asked permission to use their scales for item pool development and/or concurrent validity testing. Approximately eight experts for each PROMIS domain were identified with the goal of at least four per domain completing the review. This expert feedback was considered exempt from Institutional Review Board review.
The identified content experts were contacted by email, informed of the study purpose, and asked to review the attached item banks and provide online feedback. Nonrespondents were recontacted after approximately one month. As needed, additional experts were contacted until a minimum of four experts in each domain provided responses. Expert reviews were performed independently of one another.
To balance burden and utilize expertise across similar areas, content experts reviewed either one large item bank or multiple smaller item banks as follows:
|Bank A:||Physical Function (124 items)|
|Bank B:||Fatigue (95 items)|
|Banks C & D:||Pain Behavior (39 items), Pain Impact (41 items)|
|Banks E, F, & G:||Depression (28 items), Anxiety (29 items), Anger (28 items)|
|Banks H & I:||Satisfaction with Participation in Social Roles (14 items), Satisfaction with Participation in Discretionary Social Activities (12 items)|
|Banks J & K:||Sleep Disturbance (27 items), Wake Disturbance (16 items)|
Experts received and reviewed only the banks assigned to them, but all experts also reviewed and provided feedback on the 10 PROMIS Global Health items.
Prior to revealing the existing domain name and definition, experts were asked, based on the item bank content alone (i.e. blind feedback), to (a) provide a 1-4 word name for the bank and (b) describe and define in a few sentences what the item bank measures. The domain names and definitions were then revealed, and experts were asked to respond to the following questions.
All experts then provided feedback on the PROMIS Global Health items.
Experts completed a short sociodemographic questionnaire including information on English language background and years of experience in the domain area. Each expert received a $200 honorarium for participation. No further feedback was obtained.
PROMIS Domain Group and Steering Committee Review Procedures
Representatives from each domain group reviewed and summarized the expert feedback for their respective PROMIS domain groups to consider in revising domain names or definitions. Domain groups considered the quantitative and qualitative feedback subjectively (i.e. no predetermined criteria for revision) and revised domain names and definitions. These revised definitions were presented to the PROMIS Steering Committee for discussion and approval. Definitions were further revised by study investigators (WR, NR) for format and content consistency across domains, and were then reviewed and approved by the respective domain chairs and by the PROMIS Steering Committee.
During the expert review process, the social domain group received supplemental funding to further develop and test social domain items. As a result, the social domain deferred any name or definition revisions until further item bank development was completed. Therefore, social domain expert feedback is included only for the global items.
Thirty-five participants, 23 males and 12 females, provided expert review. One was Hispanic and one was African-American. Twenty-eight were Ph.D.s and seven were M.D.s. Participants had a mean of 27 (SD = 9) years of experience in the domain area. Thirty-four of the 35 indicated English as their first language (see Table 1). Eighty-three percent either had no prior contact with PROMIS (21/35) or had only contributed legacy items or scales to PROMIS (8/35). The remaining 6 experts had served as consultants to PROMIS in some limited capacity.
Blinded to the “physical function” domain name, three of the six expert reviewers provided “physical function”, two provided “physical activity”, and one provided “PROMIS Health Assessment Questionnaire” as the domain name. After unblinding, five indicated that the name reflected the item bank “very much” (1 rated “quite a bit”). Blinded definitions for this bank included, “Capacity to do a large number of physical activities,” “wide range of usual daily physical activities, exercise, and household chores,” and “measures the participant’s current ability to perform particular tasks that involve use of limbs or core and coordinated movements.” Two of the experts indicated that there was item content not adequately reflected in the physical function definition, with one commenting that the bank was missing goal attainment scaling. Although asked how the definition could be expanded to fully capture the item bank content, the experts in all domains often responded instead with how the item content could be expanded to fully capture their definition of the domain. None of the respondents reported that the physical function definition needed to be narrowed to accurately reflect the bank content. (For initial definitions reviewed by the content experts, see Table 3).
Of the six experts, three provided “fatigue”, two provided fatigue plus descriptors (e.g. “fatigue assessment scale”, “fatigue frequency and severity”), and one provided “PROMIS Item Bank B” as the domain name. After unblinding, four of the six indicated that the name reflected the item bank “very much” (1 “quite a bit”, 1 “somewhat”). Blinded definitions included, “assessment of symptoms of subjective fatigue and excessive tiredness,” “fatigue severity and fatigue interference”, and “supposed to assess fatigue but confuses tiredness, exhaustion, and sleepiness in this construct.” One expert indicated that the item content was not adequately reflected in the fatigue definition, but repeated the concern noted above about the definition being too broad. One respondent noted that the fatigue definition needed to be narrowed to accurately reflect the bank content, specifically noting that duration of fatigue is not as well represented in the bank as frequency of fatigue.
The four experts provided “pain behavior,” “pain responses scale,” “pain effects,” and “pain-related affective distress” as the domain name. After unblinding, two of the four indicated that the name reflected the item bank “very much” (1 “quite a bit”, 1 “a little bit”). Blinded definitions included, “how an individual with pain responds to pain,” “observable behavior associated with pain,” “pain-related affective distress,” and “broad set of behaviors that patients may express when in pain.” Two of the experts indicated that the item content was not adequately reflected in the pain behavior definition. One noted that “pain behavior” is not familiar to many pain specialists, and another indicated that the definition needed to represent a balance of expressing, avoiding, minimizing, and reducing pain. None of the respondents reported that the pain behavior definition needed to be narrowed. One expert recommended less prominence of “pain behavior as communication” since communication is not always understood broadly to include inadvertent, unintentional, and/or unrecognized communication.
Of the four experts, three provided “pain interference”, and one provided “pain-related interference with functioning” as the domain name. After unblinding, one of the four indicated that the name reflected the item bank “very much” (3 “quite a bit”). Blinded definitions included, “various aspects of pain-related interference with functioning,” “degree to which pain interferes with various aspects of a patients [sic] life,” “how pain interferes with various activities and states,” “pain interference with daily activities.” One expert indicated that the item content was not adequately reflected in the definition but stated that “the definition seems broad enough.” Two respondents indicated that the pain impact definition needed to be narrowed to accurately reflect the content of the bank, noting only one sleep item in the bank.
Of the nine experts, five provided “depression” as the domain name. The others provided “dysphoric mood”, “components of depression”, “depression symptoms”, and “depression, discouragement, demoralization.” After unblinding, five indicated that the name reflected the item bank “very much” (2 “quite a bit”, 2 “somewhat”). Blinded definitions included, “depressive symptoms and mood but without vegetative symptoms,” “symptoms of depressed mood and the trait of negative affectivity,” and “physical, emotional, and social aspects of depression with emphasis on feeling states, withdrawal from others, and pessimism about the future.” Four experts indicated that the item content was not adequately reflected in the definition, noting that the bank emphasized negative affect much more than positive affect and that the definition could better reflect this emphasis. Two respondents indicated that the definition needed to be narrowed to accurately reflect the content of the bank, with one noting that there was only one item related to indecisiveness despite a reference to “information-processing deficits” in the definition.
Of the nine experts, six provided “anxiety” in some derivation (e.g. “general fear and anxiety”, “anxiety and fear,” “anxiety symptoms”) as the domain name. Three respondents included “panic” or “phobic” in the anxiety name provided. After unblinding, five indicated that the name reflected the item bank “very much” (3 “quite a bit”, 1 “somewhat”). Blinded definitions included, “anxiety, negative emotions, worry, and panic/fear,” “emotional/affective and physical components of anxiety,” and “anxious/fearful mood and somatic symptoms of anxious arousal.” Seven indicated that the item content was not adequately reflected in the definition, and comments focused primarily on the bank having only one behavioral avoidance item (“I avoided public places and activities”). Two respondents indicated that the definition needed to be narrowed to accurately reflect bank content, and reiterated the underrepresentation of behavioral avoidance.
All nine respondents provided some derivation of “anger” (e.g. “anger and irritability”, “anger and hostility”, “anger and frustration tolerance”) as the domain name. After unblinding, eight indicated that the name reflected the item bank “very much” (1 “quite a bit”). Blinded definitions included, “basic emotion of anger/hostility,” “tendency toward angry, frustrative affect”, and “perceptions of anger, irritation, resentment, and frustration.” One expert indicated that the item content was not adequately reflected in the definition, and that angry behavior appeared underemphasized relative to covert anger. None of the respondents reported that the definition needed to be narrowed.
Of the five experts, three provided some derivation of “insomnia or sleep disturbance,” and two provided “sleep quality” as the domain name. After unblinding, one of the five indicated that the name reflected the item bank “very much” (3 “quite a bit”, 1 “somewhat”). Blinded definitions included “all aspects of insomnia symptoms including concern about sleep and cognitive arousal,” “nature, extent, and severity of difficulties sleeping at night with an attempt to capture the psychological underpinnings of these difficulties,” and “sleep quality – the cognitive, emotional, and restorative aspects of the sleep experience.” None of the experts reported that the item content was not adequately reflected in the definition or that the definition needed to be narrowed to accurately reflect item content.
Of the five experts, three provided some derivation of “sleep-related daytime functioning,” and two provided “daytime impairment” or “daytime functioning” as the domain name. After unblinding, one of the five indicated that the name reflected the item bank “very much” (2 “quite a bit”, 2 “somewhat”). Blinded definitions included “negative consequence of disturbed sleep focusing on attention, cognition, and mood,” “impact of the loss of sleep and disturbed sleep on the ability to conduct daily activities and mental health,” and “nature, extent, and severity of daytime functioning that may be impaired following a night with sleep difficulties.” Two indicated that the item content was not adequately reflected in the definition, but the qualitative comments reflected item content that they believed should be in the item bank such as physical performance and interpersonal relationships. None of the respondents indicated that the definition needed to be narrowed.
Ten responded “Quite a bit” and 22 responded “Very much” (Mean rating = 4.7, SD = 0.5) for how well the Global Health definition reflected the item content. Most comments were positive (e.g. “Fine as is”, “The name and definition are excellent”, “Seems quite on target and germane”). One expert suggested that “overall health” or “overall health and well-being” might be a better name, and one expert suggested that the definition should more clearly indicate that some of the mental and social items do not exclusively relate to the health impact. Two experts noted that spirituality or spiritual health was missing from the content of the global items.
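The reported summary statistics can be checked directly from the response counts. The sketch below assumes a 5-point scale on which “Quite a bit” is coded 4 and “Very much” is coded 5; this numeric coding is an assumption, chosen because it is consistent with the reported mean, and the sample (n - 1) form of the standard deviation is likewise assumed.

```python
from statistics import mean, stdev

# Assumed numeric coding of the response options (not stated in the text).
QUITE_A_BIT, VERY_MUCH = 4, 5

# Response counts reported for the Global Health definition.
ratings = [QUITE_A_BIT] * 10 + [VERY_MUCH] * 22

global_mean = mean(ratings)   # 150 / 32
global_sd = stdev(ratings)    # sample SD (n - 1 denominator), an assumption

print(f"n = {len(ratings)}, mean = {global_mean:.1f}, SD = {global_sd:.1f}")
```

Under these assumptions the 32 responses reproduce the reported mean of 4.7 and SD of 0.5 after rounding to one decimal place.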
Ratings of name and definition fit by domains are summarized in Table 2.
Consideration of the expert feedback by the PROMIS domain groups and Steering Committee (SC) resulted in two domain name changes. The sleep domain group shared expert reviewer concerns with the name “Wake Disturbance”, and changed the domain name to “Sleep-related Impairment.” The pain domain group and the PROMIS SC believe that “Pain Impact” is a broader and more appropriate name for this domain, but the name was changed to “Pain Interference” in response to expert feedback and the acknowledgment that most of the pain research community associates “pain interference” with the content reflected in this bank.
All other PROMIS domain names remained unchanged, although there was considerable debate about changing “depression” to “depressive symptoms” since items related to somatic or vegetative symptoms had been removed from the bank due to poor psychometric fit. The primary concern was that the domain name “depression” might imply that the bank is measuring depression as a psychiatric diagnosis. Technically, “depression” is not the formal name of any psychiatric diagnosis, and in common usage the term “depression” describes both clinical and subclinical states of sadness and dysphoria. Based on these considerations, the PROMIS SC decided to retain the domain name “depression” and address the absence of somatic or vegetative symptom content in the domain definition.
PROMIS domain groups revised their domain definitions based on expert feedback. Definitions were further revised for consistency and were approved by the domain chairs and PROMIS SC. The initial and revised definitions are provided in Table 3.
To identify potential effects of psychometric pruning on content validity, external experts provided feedback on the congruence of the PROMIS domain names and definitions with the item bank content. This feedback indicated that the domain names and definitions remained generally representative of bank content following psychometric pruning, but that minor to moderate domain name and definition revisions were necessary to better represent item bank content, including specifying underrepresented or missing content that experts expected to be present. These findings illustrate that the review and revision of domain names and definitions following psychometric pruning and item calibration is an important step in supporting the content validity of PROs. Although the item development process is the primary source of content validity evidence, subsequent item pruning can result in loss of item content in the resulting bank. Therefore, modifying the conceptual definition to be consistent with the resulting bank content appears to be an important additional component in establishing content validity, particularly for IRT-derived banks that involve considerable item pruning. Recent PRO guidance includes this conceptual model modification step, but we believe this is the first documentation of a standardized procedure for modifying domain names and definitions of IRT-derived item banks to ensure they accurately reflect the item bank content.
Instead of modifying domain definitions to match item bank content, we could have modified item bank content to match domain definitions. The latter approach is the accepted standard during item development, and the qualitative responses of some experts in this study suggested that they would have preferred to modify item content even when the prescribed task was to modify the definition. Patient and expert feedback on the importance of the omitted items to the domain of interest could have been obtained as per ISPOR PRO Task Force recommendations; however, as noted earlier, this approach is only the first step in a time- and labor-intensive process of adding new item content to an existing item bank. In the interim, we chose to solicit feedback from content experts to ensure that the domain names and definitions clearly conveyed the content of the banks to clinical researchers and practitioners. Refining the item content based on the conceptual definition and refining the conceptual definition based on psychometric findings are complementary and iterative.
When major deviations from the hypothesized conceptual model are found, it may be appropriate to focus first on generating and testing additional item content before attempting to revise the conceptual definitions to match the retained items. In our case, the factor structure of the PROMIS social domain items did not fit well with our hypothesized conceptual framework, so instead of revising the concept, we chose first to generate and test additional items in this domain. For most item bank development, however, seeking expert feedback to revise the conceptual definitions following item banking can ensure optimal fit between the domain definition and retained item bank content.
Several improvements to this domain name and definition review procedure should be considered. First, a small percentage of participants had prior experience with the PROMIS initiative, and including only “independent” experts could minimize response bias. However, even those without prior PROMIS experience were likely exposed to PROMIS information. Therefore, it is unclear how many participants were truly “blind” to the current domain names and definitions, but obtaining domain name and definition input before revealing them appears to have reduced their influence given the range of responses provided. Second, we arbitrarily set a minimum of four expert responses per domain area, but the number of expert responses required for this task is unclear. Setting the number based on a saturation threshold similar to that used in PRO patient focus group and interview procedures may have produced clearer direction for revisions. Third, in addition to asking experts about revising definitions based on item content, we also could have asked about important content omitted from the bank, thus providing direction for future additional item development. Fourth, to obtain feedback efficiently on 11 item banks and the global items, we chose online feedback, but interviews or other interactive methods (e.g. focus groups) could provide richer and more detailed feedback. Finally, in the absence of prior literature, we chose a subjective appraisal of the expert feedback to determine if and how domain names and definitions should be revised. Using the rating information presented in this study, future research may be able to set a priori criteria for determining if a domain name or definition should be revised.
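As one illustration of what an a priori criterion might look like, the sketch below applies a purely hypothetical rule to the name-fit counts reported in the Results: a domain name is flagged for review when fewer than half of the experts rated its fit “very much”. The threshold, the function name `flag_for_review`, and the data layout are assumptions made for illustration, not criteria used or endorsed in this study.

```python
# Name-fit rating counts per bank, as reported in the Results, in the order
# (very much, quite a bit, somewhat, a little bit).
NAME_FIT = {
    "Physical Function": (5, 1, 0, 0),
    "Fatigue": (4, 1, 1, 0),
    "Pain Behavior": (2, 1, 0, 1),
    "Pain Impact": (1, 3, 0, 0),
    "Depression": (5, 2, 2, 0),
    "Anxiety": (5, 3, 1, 0),
    "Anger": (8, 1, 0, 0),
    "Sleep Disturbance": (1, 3, 1, 0),
    "Wake Disturbance": (1, 2, 2, 0),
}


def flag_for_review(counts, min_share=0.5):
    """Hypothetical rule: flag when the share of 'very much' ratings is below min_share."""
    very_much = counts[0]
    return very_much / sum(counts) < min_share


# Banks whose names would be flagged under the assumed threshold.
flagged = [bank for bank, counts in NAME_FIT.items() if flag_for_review(counts)]
print(flagged)
```

Under this assumed threshold the rule flags Pain Impact, Sleep Disturbance, and Wake Disturbance. Only the first and last of these names were actually changed, which suggests that any such threshold would need empirical justification and would likely have to be combined with the qualitative comments.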
Content expert feedback resulted in improved PROMIS domain names and definitions that more closely match the item content of the calibrated item banks; however, these revised conceptual definitions are likely narrower than the conceptual definitions of some researchers, clinicians, and patients. Consistent with an iterative PRO development model [2-4, 18], the mismatch of a resulting item bank to the initial conceptual definition is an opportunity to better define the content measured by a psychometrically sound item bank and to revise the conceptual framework based on the empiric data. The combination of psychometric pruning of item banks and content expert feedback to revise the names and definitions of these banks provides the basis for iterative conceptual model refinement, narrowing some conceptual definitions while illuminating attributes or facets that might be better conceptualized as a related but separate domain.
The Patient-Reported Outcomes Measurement Information System (PROMIS) is a National Institutes of Health (NIH) Roadmap initiative to develop a computerized system measuring patient-reported outcomes in respondents with a wide range of chronic diseases and demographic characteristics. PROMIS was funded by cooperative agreements to a Statistical Coordinating Center (Northwestern University, PI: David Cella, PhD, U01AR52177) and six Primary Research Sites (Duke University, PI: Kevin Weinfurt, PhD, U01AR52186; University of North Carolina, PI: Darren DeWalt, MD, MPH, U01AR52181; University of Pittsburgh, PI: Paul A. Pilkonis, PhD, U01AR52155; Stanford University, PI: James Fries, MD, U01AR52158; Stony Brook University, PI: Arthur Stone, PhD, U01AR52170; and University of Washington, PI: Dagmar Amtmann, PhD, U01AR52171). NIH Science Officers on this project have included Deborah Ader, PhD, Susan Czajkowski, PhD, Lawrence Fine, MD, DrPH, Laura Lee Johnson, PhD, Louis Quatrano, PhD, Bryce Reeve, PhD, William Riley, PhD, Susana Serrate-Sztein, MD, and James Witter, MD, PhD. This manuscript was reviewed by the PROMIS Publications Subcommittee prior to external peer review. See the web site at www.nihpromis.org for additional information on the PROMIS cooperative group.
William T. Riley, National Heart Lung and Blood Institute.
Nan Rothrock, Northwestern University.
Bonnie Bruce, Stanford University.
Christopher Christodolou, Stony Brook University.
Karon Cook, University of Washington.
Elizabeth A. Hahn, Northwestern University.
David Cella, Northwestern University.