The Patient Reported Outcomes Measurement Information System (PROMIS) aims to develop patient-reported outcome (PRO) instruments for use in clinical research. The PROMIS pediatrics (ages 8–17) project focuses on the development of PROs across several health domains (physical function, pain, fatigue, emotional distress, social role relationships, and asthma symptoms). The objective of the present study is to report on the psychometric properties of the PROMIS Pediatric Anger Scale.
Participants (n=759) were recruited in public school settings and in hospital-based outpatient and subspecialty pediatric clinics. The anger items (k=10) were administered on one test form. A hierarchical confirmatory factor analysis (CFA) was conducted to evaluate scale dimensionality and local dependence. Item response theory (IRT) analyses were then used to finalize the item scale and short form.
CFA confirmed that the anger items are representative of a unidimensional scale, and items with local dependence were removed, resulting in a six-item short form. The IRT-scaled score for each summed score, and each score’s conditional standard error, were calculated for the new six-item PROMIS Pediatric Anger Scale.
This study provides initial calibrations of the anger items and creates the PROMIS Pediatric Anger Scale, version 1.0.
The Patient Reported Outcomes Measurement Information System (PROMIS) project, a National Institutes of Health (NIH) Roadmap for Medical Research initiative, was developed to advance the science and application of patient-reported outcomes (PROs) among patients with chronic diseases. One primary goal of the PROMIS initiative is to develop a set of patient-reported items for use in clinical research. The development process used modern psychometric methods, including item response theory (IRT), to analyze and select the most informative items.
The PROMIS Pediatric project focused on the development of PROs to assess quality of life across several generic health domains for youth ages 8–17 years. These domains are important across a variety of pediatric chronic illnesses, and include physical function, pain, fatigue, emotional distress, and social function [3, 11, 45–48].
Emotional distress commonly refers to unpleasant feelings or emotions that are experienced privately and are, therefore, good candidates for assessment as PROs. Emotional distress among children is composed in part of feelings of anxiety, depression, and anger. Several studies have shown each of these three components of emotional distress (anxiety, depression, and anger) to be a unidimensional construct. Previously, we reported on the psychometric properties of the NIH PROMIS Pediatric Depressive Symptoms and Anxiety Scales. The anger domain of emotional distress is the focus of the present report.
Three modalities of anger have been recognized: cognitive (appraisals), somatic-affective (tension and agitation), and behavioral (withdrawal and antagonism). The external expression of anger can be found in facial expressions, body language, physiological responses, and at times in acts of aggression. The PROMIS pediatric item scale for anger focuses on the behavioral component, including angry moods (e.g., irritability and reactivity) and aggression (verbal and physical).
PROMIS Pediatric items across domains were developed using a strategic item generation methodology adopted by the PROMIS Network. Six phases of item development were implemented: identification of existing items, item classification and selection, item review and revision, focus group input on domain coverage, cognitive interviews with individual items, and final revision before field testing [3, 6, 7, 8]. Items that successfully passed this screening process were sent to field testing (n=10 anger items).
Only a limited number of generic self-report health-related quality of life (HRQOL) instruments exist for use in pediatric populations, and while most attempt to measure at least some aspect of emotional distress, they do not typically include an anger-specific domain [9, 10]. PROMIS psychometric analyses focus on determining scale dimensionality and detecting sources of local dependence (LD) using CFA methods, and on selecting final items and testing for differential item functioning (DIF) using IRT analyses. The primary objective of the present paper is to describe the IRT analyses of the PROMIS pediatric anger items and the measurement properties of the new PROMIS Pediatric Anger Scale that resulted from these analyses, including investigations of scale dimensionality, item fit, sources of local dependence, and DIF.
Participants from North Carolina and Texas were recruited in hospital-based outpatient general pediatrics and subspecialty clinics and in public school settings between January 2007 and May 2008. To be eligible to participate in the large-scale testing survey, subjects were required to meet the following inclusion criteria: aged 8 to 17 years; able to speak and read English; and able to see and interact with a computer screen, keyboard, and mouse. Parental informed consent and minor assent were obtained for all children taking the survey. The study was approved by the institutional review boards of the participating institutions. A more detailed description of the survey methods and the study population has been published previously.
The PROMIS anger items were administered to 759 respondents. The sampling plan was developed to collect responses to candidate items from all of the targeted PROMIS domains and accommodated multiple objectives: (1) to confirm the factor structure of the domains; (2) to evaluate items for LD and DIF; and (3) to calibrate the items for each domain using IRT. The sampling plan is described in more detail elsewhere.
All of the anger items had a 7-day recall period and used standardized 5-point response options (never, almost never, sometimes, often, almost always). Table 2 shows the anger items administered during the testing.
Data analysis followed the sequence of procedures presented by Reeve et al. in their description of plans for psychometric evaluation and calibration of HRQOL items for PROMIS. First, traditional descriptive statistics were computed as a check on data entry and validity, and to verify that no item had an empty (zero-frequency) response category. These statistics included the frequencies and proportions in each item response category and the correlation of each item score with the total summed score.
Second, to determine the extent to which the anger items measure a construct distinct from those measured by the other emotional distress items, and as a check on the unidimensionality of the anger subset, the dimensionality of individual differences on all of the emotional distress items (designed to measure anger, anxiety, and depressive symptoms) was examined using a hierarchical confirmatory factor analysis (CFA) of the inter-item polychoric correlation matrix. These analyses were performed using the “weighted least squares with robust standard errors, mean- and variance-adjusted” (WLSMV) algorithm as implemented in the software Mplus. Respondents with missing item responses were set aside for this analysis (“listwise deletion”). Additional factors fitted over and above those indicated by the design of the questionnaire, and residual correlations significantly greater than zero, served as indices of LD for pairs or small numbers of items that violated the local independence assumption of unidimensional IRT. If a pair of items exhibited LD, one item from the pair was set aside.
Third, within the sets of items for which unidimensionality had been confirmed using CFA, the items were “calibrated” by fitting Samejima’s Graded Response Model [15, 16] using the software Multilog. For five-category items, this model characterizes each item with a slope or discrimination parameter (a), which reflects the degree of association of the item responses with the latent construct being measured, and four threshold parameters (b1–b4), each of which indicates the level of anger at which a response in a particular category or higher has a 50% probability. This model has been selected for the NIH PROMIS scales. The goodness of fit of the IRT model to the data was examined using Orlando and Thissen’s [18, 19] S-X² statistic as generalized by Bjorner et al. for polytomous response data. Because S-X² is a goodness-of-fit statistic, a nonsignificant value indicates adequate fit of the model to the data; significant values call for close examination of the tables of response frequencies, classified according to summed scores on the other items, to identify the source of misfit. For the IRT item calibration, and for the IRT DIF analysis in the fourth step (below), missing item responses were treated as missing at random.
Fourth, the possibility of DIF was investigated for each item using the IRT-LR DIF detection procedure as implemented in the software IRTLRDIF. DIF indicates that the relation of the item responses with the latent variable being measured differs between two (most often demographic) groups. Such a difference implies that some other factor, related to group membership but different from the construct being measured, influenced the item responses, violating the IRT assumption of unidimensionality. In the present data, the only background variable that divides the sample into two groups large enough to examine DIF is gender, so the DIF analysis compared responses from boys and girls. In addition, some DIF analyses compared younger (ages 8–12) and older (ages 13–17) children. IRT-LR DIF detection provides a χ²-distributed test statistic; again, a nonsignificant value is the desirable outcome, indicating a lack of detectable DIF. We used the Benjamini-Hochberg [23, 24] procedure with α = 0.05 to control for the multiplicity of comparisons involved in checking each item for DIF, and graphical methods, as suggested by Steinberg and Thissen, to evaluate effect size when DIF was detected.
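As a rough illustration of this first step, the Python sketch below tabulates category frequencies and proportions for each item, flags empty (zero-frequency) categories, and computes each item's correlation with the total summed score. It assumes responses are coded 0–4 with `np.nan` for missing values and uses complete cases for the item-total correlations; the function name `descriptive_checks` is hypothetical, and this is not the authors' code.

```python
import numpy as np

def descriptive_checks(responses, n_categories=5):
    """Hypothetical helper: category counts, proportions, empty cells, item-total r.

    responses: (n_respondents, n_items) array of scores 0..n_categories-1,
    with np.nan marking a missing response.
    """
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    freqs = np.zeros((n_items, n_categories), dtype=int)
    for j in range(n_items):
        col = responses[:, j]
        observed = col[~np.isnan(col)].astype(int)        # drop missing responses
        freqs[j] = np.bincount(observed, minlength=n_categories)
    props = freqs / freqs.sum(axis=1, keepdims=True)
    # (item, category) pairs with zero observed responses
    empty = [(j, k) for j in range(n_items) for k in range(n_categories) if freqs[j, k] == 0]
    # Correlation of each item with the total summed score, on complete cases only
    complete = responses[~np.isnan(responses).any(axis=1)]
    total = complete.sum(axis=1)
    item_total = np.array([np.corrcoef(complete[:, j], total)[0, 1] for j in range(n_items)])
    return freqs, props, empty, item_total

# Tiny made-up data set: 5 respondents, 3 items, one missing response
toy = np.array([[0, 0, 0], [1, 1, 2], [2, 2, 2], [3, 4, 4], [4, 3, np.nan]])
freqs, props, empty, item_total = descriptive_checks(toy)
print(freqs)
print("empty categories:", empty)
print("item-total r:", np.round(item_total, 2))
```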
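The graded response model can be written as a set of cumulative logistic curves, P(X ≥ k | θ) = 1 / (1 + exp(−a(θ − b_k))), with each category probability obtained as the difference between adjacent cumulative curves. The minimal NumPy sketch below computes category probabilities and the expected item score for one five-category item. It is not the Multilog implementation, and the parameter values are hypothetical.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Samejima graded response model: P(X = k | theta) for k = 0..len(b).

    theta: latent trait value(s); a: slope; b: increasing thresholds b_1..b_{K-1}.
    Returns an array of shape (len(theta), len(b) + 1).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    # Cumulative curves P(X >= k | theta) for k = 1..K-1
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    n = theta.size
    # Pad with P(X >= 0) = 1 and P(X >= K) = 0, then difference adjacent curves
    bounded = np.hstack([np.ones((n, 1)), p_star, np.zeros((n, 1))])
    return bounded[:, :-1] - bounded[:, 1:]

def grm_expected_score(theta, a, b):
    """Expected item score: sum over k of k * P(X = k | theta)."""
    probs = grm_category_probs(theta, a, b)
    return probs @ np.arange(probs.shape[1])

# Hypothetical item (never=0 ... almost always=4): slope 1.5, four thresholds
a_hyp, b_hyp = 1.5, [-0.5, 0.3, 1.2, 2.0]
probs = grm_category_probs([-1.0, 0.0, 1.0, 2.0], a_hyp, b_hyp)
print(np.round(probs, 3))
print(grm_expected_score([-1.0, 0.0, 1.0, 2.0], a_hyp, b_hyp))
```

At θ equal to a threshold b_k, the probability of responding in category k or higher is exactly 0.5, which is why the thresholds are read as locations on the anger continuum.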
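The Benjamini-Hochberg step-up rule can be sketched in a few lines of Python: sort the m p-values, find the largest rank i with p_(i) ≤ iα/m, and reject the hypotheses with the i smallest p-values. The function name and the p-values below are hypothetical and are not the study's DIF results.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array: True where H0 (no DIF) is rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices that sort p ascending
    critical = alpha * np.arange(1, m + 1) / m     # i * alpha / m for rank i = 1..m
    passes = p[order] <= critical
    reject = np.zeros(m, dtype=bool)
    if passes.any():
        # Step-up rule: largest rank i with p_(i) <= i * alpha / m,
        # then reject the hypotheses with the i smallest p-values.
        i_max = np.nonzero(passes)[0].max()
        reject[order[: i_max + 1]] = True
    return reject

# Hypothetical DIF p-values for six items
p_dif = [0.004, 0.300, 0.012, 0.650, 0.048, 0.210]
print(benjamini_hochberg(p_dif))
```

With these made-up values, only the items with p = 0.004 and p = 0.012 are flagged; p = 0.048 is not, even though it falls below the nominal 0.05 level.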
Fifth, after the final item pools were selected, confirmatory factor analysis (CFA) of the interitem polychoric correlation matrix among the remaining, selected items was used to ensure that the latent variables underlying the item responses for the anger items were unidimensional in the final item pools. These analyses were performed using the WLSMV algorithm as implemented in the software Mplus . Respondents with missing item responses were set aside for this analysis (“listwise deletion”). An additional three-factor correlated simple-structure CFA model was used to estimate the “disattenuated” correlations among the latent variables for Anger, Anxiety, and Depressive Symptoms.
Finally, IRT scores for the scales are based on the graded response model (GRM) parameters after the scales are assembled. All IRT-based scores are relative to some reference group; in this case, the reference group is the subset of the sample from the NC site. While IRT-scaled scores may be based either on item response patterns or on summed scores, we expect that scale scores based on summed scores will be used most often; score translation tables for that purpose are provided in the Appendix.
The anger items were among a set of emotional distress items (which also included depressive symptoms and anxiety items) completed by 759 respondents aged 8 to 17 years. Fifty-nine percent of respondents were 8 to 12 years old, and 60% were Caucasian. Nineteen percent of the sample was of Hispanic ethnicity, and approximately 21% of the children participating in the survey had a chronic illness diagnosis during the past 6 months (Table 1). The vast majority of the adults providing informed consent for the children were parents, and 27% of the adults providing consent had a high school education or less.
A CFA model was fit to the depressive symptoms, anxiety, and anger items. The augmented bifactor model contains factor loadings on the general factor for all items, group-specific loadings for each domain (anger, anxiety, or depressive symptoms), and a set of loadings or residual correlations that identify sources of local dependence (Table 2). This model serves two purposes: 1) it establishes whether anger represents a separate latent construct of individual differences, or whether variation among the item responses reflects a single negative affect dimension; and 2) by identifying LD in the CFA, it allows item calibrations to be conducted with unidimensional subsets of items. Indices of goodness of fit, as suggested by Reeve, indicate that the augmented bifactor model fits the data well, χ²(119) = 358, CFI = 0.935, TLI = 0.983, RMSEA = 0.060.
The large, non-zero loadings on the anger-specific factor indicate that the covariation among anger item responses is distinct from the covariation among anxiety and depression items. The bifactor model also identified subsets of locally dependent anger items. The subfactor labeled “Triplet 1” contains three items similar in wording and content. Two additional “doublets” are modeled with residual correlations. Taken together, these findings indicate that a unidimensional anger scale can be constructed, possibly by setting aside items that exhibit LD.
To avoid calibrating items with known dependencies, two separate calibrations were completed. Each calibration contained the non-LD items along with a single item from the triplet (the item “I was so angry I felt like breaking things” was set aside from the triplet prior to calibration). This procedure resulted in two sets of item parameters for each non-LD item, and for each such item we conservatively selected the set of parameters with the lower slope (a). The values of item parameters, item fit statistics (S-X²), and LR-DIF statistics for the nine items are listed in Table 3 in order of the magnitude of the slope parameter; the generally best items appear towards the top of the table. The S-X² values reach significance for several of the items. Careful examination of the tables of response frequencies classified according to summed scores on the other items reveals that these significant statistics are due entirely to deviations in a very small number of cells, without any pattern suggesting global misfit of the item response model. Examples include slightly more observed “0” responses than expected when the sum of the other items’ scores is zero, which commonly occurs when respondents choose the same response for all items somewhat more often than the IRT model would predict, or a randomly located cell with four observed responses and an expected value close to 1.0. The test statistic is sensitive to such features of the data, although they are not meaningfully interpretable.
Table 3 contains the six items that comprise the anger scale. Final items were selected by setting aside the less discriminating item from each locally dependent pair. To validate these steps, a one-factor CFA was fit to the six-item scale. This model fit the data well, indicating that the six items are acceptably fitted by a unidimensional model, χ²(8) = 39, CFI = 0.979, TLI = 0.981, RMSEA = 0.074. Item-total correlations for the six-item anger scale ranged from r = .49 to .59.
The six-item anger scale contains two items with significant gender DIF after the Benjamini-Hochberg correction for multiplicity: “I was so angry I felt like throwing something” had higher scores than expected for boys (i.e., boys were more likely to endorse this item than mean and variance differences between genders would anticipate), and “I felt upset” had higher (conditional) scores for girls. Figure 1 plots the expected score curves for boys and girls using item parameters for the four non-DIF items and the gender-specific parameters for the two DIF items. The figure illustrates the degree to which the DIF in these two items counterbalances (i.e., DIF cancellation). In addition to gender DIF, we considered DIF between younger (ages 8–12) and older (ages 13–17) children and identified a single item, “I felt fed up”, which exhibited DIF after Benjamini-Hochberg correction. The “significant” DIF was largely a-DIF (DIF in the slope parameter), with the discrimination parameter estimated to be 1.72 for the older children as opposed to 0.99 for the younger children. Neither estimate differed sufficiently from the common estimate of 1.31 to justify exclusion of the item.
After the final PROMIS pediatric Anger, Anxiety, and Depressive Symptoms Scales were constructed, a correlated simple-structure CFA model was fitted to the final item sets for the three scales simultaneously to estimate the correlations among the latent variables. The correlations of Anger with Anxiety and with Depressive Symptoms were 0.66 and 0.77, respectively; the correlation between Anxiety and Depressive Symptoms was 0.84.
Figure 2 shows the test information function for the six-item PROMIS Pediatric Anger Scale on a T-score scale with a mean of 50 and standard deviation of 10 (the standard metric for PROMIS scales). Test information is an indicator of score precision: the standard error of measurement is approximately the inverse of the square root of the information (SE = 1/√I). A standard error of 0.45 on the standardized scale, or 4.5 on the T-score metric, corresponds to an information value of nearly 5 and hence a reliability coefficient of approximately 0.8. Anger scores between approximately 40 and 80 on the T-score metric have information values greater than 5 (Figure 2) and standard errors less than 4.5 (Table A1). This indicates that for the approximately 84% of respondents in the general population with anger scores above 40 on the T-score scale, the IRT standard errors correspond to those that would be obtained with a scale with a reliability of 0.8. The Appendix lists the six-item PROMIS Pediatric Anger Scale together with the IRT-scaled score for each summed score and each score’s conditional standard error. The items and score translation table are available at www.nihpromis.org.
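Summed-score translation tables of this kind are commonly computed with the Lord-Wingersky recursion, which builds the likelihood of each summed score at each quadrature point; combined with a prior for the reference group, this gives EAP[θ | summed score] and its posterior standard deviation. The Python sketch below assumes a standard normal prior (the reference group on the z metric), a fixed 81-point grid from −4 to 4, and hypothetical item parameters; it illustrates the general approach and is not the PROMIS scoring software.

```python
import numpy as np

def grm_probs(theta, a, b):
    """GRM category probabilities, shape (len(theta), len(b) + 1)."""
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - np.asarray(b, dtype=float)[None, :])))
    n = theta.size
    bounded = np.hstack([np.ones((n, 1)), p_star, np.zeros((n, 1))])
    return bounded[:, :-1] - bounded[:, 1:]

def summed_score_table(items, n_quad=81):
    """EAP[theta | summed score] and posterior SD, reported on the T-score metric.

    items: list of (a, b) tuples; a standard normal prior is assumed.
    """
    theta = np.linspace(-4.0, 4.0, n_quad)
    prior = np.exp(-0.5 * theta ** 2)
    prior /= prior.sum()
    # Lord-Wingersky recursion: lik[s, q] = P(summed score = s | theta_q)
    lik = np.ones((1, n_quad))
    for a, b in items:
        p = grm_probs(theta, a, b).T              # (n_categories, n_quad)
        n_rows = lik.shape[0]
        new = np.zeros((n_rows + p.shape[0] - 1, n_quad))
        for k in range(p.shape[0]):
            new[k:k + n_rows] += lik * p[k]       # a response in category k adds k to the sum
        lik = new
    post = lik * prior                            # unnormalized posterior for each summed score
    norm = post.sum(axis=1)
    eap = (post * theta).sum(axis=1) / norm
    psd = np.sqrt((post * (theta - eap[:, None]) ** 2).sum(axis=1) / norm)
    return 50.0 + 10.0 * eap, 10.0 * psd

# Hypothetical parameters for six five-category items (NOT the published calibrations)
example_items = [(2.0, [-0.5, 0.4, 1.3, 2.2])] * 3 + [(1.4, [-0.2, 0.7, 1.6, 2.5])] * 3
t_scores, t_se = summed_score_table(example_items)
for s, (t, se) in enumerate(zip(t_scores, t_se)):
    print(f"summed score {s:2d}: T = {t:5.1f}, SE = {se:4.1f}")
```

Because the recursion only needs the parameters of the items actually administered, the same function can score any subset of the items on the common metric.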
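Why can a single cell drive significance? S-X² is a Pearson-type statistic: summed over summed-score groups and response categories, each cell contributes (O − E)²/E in count form. The Python sketch below uses a made-up 2 × 2 excerpt, not the study's tables, and it does not compute model-based expected frequencies (which S-X² derives from the IRT model); it only shows that one cell with 4 observed responses and an expected count of 1.0 contributes (4 − 1)²/1 = 9, more than the 3.84 critical value of χ² with one degree of freedom on its own.

```python
import numpy as np

def pearson_cell_contributions(observed, expected):
    """Per-cell Pearson contributions (O - E)^2 / E for a table of counts."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return (observed - expected) ** 2 / expected

# Hypothetical excerpt: rows = summed-score groups, columns = response categories
observed = np.array([[10, 5], [4, 20]])
expected = np.array([[10.5, 4.5], [1.0, 23.0]])
contrib = pearson_cell_contributions(observed, expected)
print(np.round(contrib, 3))
print(contrib[1, 0] / contrib.sum())  # share of the statistic from the single (4 vs 1.0) cell
```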
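The logic of Figure 1 can be sketched by computing expected summed-score curves (test characteristic curves) for each group under the graded response model. In the hypothetical Python example below, all parameter values are invented, and the DIF is contrived so that two items with equal slopes shift in opposite directions by the same amount; the cancellation is therefore exact up to floating-point rounding, whereas in the study it is only approximate.

```python
import numpy as np

def expected_item_score(theta, a, b):
    """GRM expected item score, using E[X] = sum_{k>=1} P(X >= k | theta)."""
    theta = np.asarray(theta, dtype=float)
    b = np.asarray(b, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    return p_star.sum(axis=1)

def expected_summed_score(theta, items):
    """Test characteristic curve: sum of expected item scores over (a, b) items."""
    return sum(expected_item_score(theta, a, b) for a, b in items)

theta = np.linspace(-3.0, 3.0, 61)
# Hypothetical non-DIF items (not the published calibrations)
common = [(1.8, [-0.4, 0.5, 1.4, 2.3]), (1.6, [-0.2, 0.6, 1.5, 2.4]),
          (1.4, [0.0, 0.8, 1.7, 2.6]), (1.2, [-0.6, 0.3, 1.2, 2.1])]
base = np.array([0.0, 0.8, 1.6, 2.4])
shift = 0.3
# Contrived DIF: item A has lower thresholds for boys, item B equally lower for girls
boys = common + [(1.5, base - shift), (1.5, base)]
girls = common + [(1.5, base), (1.5, base - shift)]

item_gap = expected_item_score(theta, 1.5, base - shift) - expected_item_score(theta, 1.5, base)
total_gap = expected_summed_score(theta, boys) - expected_summed_score(theta, girls)
print(f"largest item-level gap: {item_gap.max():.3f}")
print(f"largest summed-score gap: {np.abs(total_gap).max():.2e}")
```

Each DIF item on its own shifts the expected score, but the two shifts offset in the summed score, which is also why dropping only one of the pair could introduce bias.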
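The arithmetic in the preceding paragraph can be checked with a few lines of Python. The sketch assumes the usual approximations I = 1/SE² and reliability ≈ 1 − SE²/σ² with a population standard deviation of 1 on the standardized metric (10 on the T metric), and a normal score distribution for the share of respondents above T = 40; the helper names are hypothetical.

```python
from statistics import NormalDist

def information_from_se(se):
    """Test information implied by a standard error: I = 1 / SE^2."""
    return 1.0 / se ** 2

def reliability_from_se(se, sd=1.0):
    """Approximate reliability 1 - SE^2 / variance, assuming population SD `sd`."""
    return 1.0 - (se / sd) ** 2

se_z = 0.45                                        # standard error on the standardized metric
info = information_from_se(se_z)                   # about 4.94, i.e. "nearly 5"
rel_z = reliability_from_se(se_z)                  # about 0.80
rel_t = reliability_from_se(10 * se_z, sd=10.0)    # same value on the T metric (SE 4.5, SD 10)
share_above_40 = 1.0 - NormalDist(mu=50, sigma=10).cdf(40)  # about 0.84
print(f"I = {info:.2f}, reliability = {rel_z:.3f}, share above T=40 = {share_above_40:.3f}")
```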
This study describes the development of the new PROMIS Pediatric Anger Scale based on IRT analyses of scale dimensionality, item local dependence, and differential item functioning. After scale dimensionality was determined, items with LD were identified and removed, resulting in the final PROMIS Pediatric Anger Scale, which allows a variety of scoring options that can be tailored to meet the objectives of most clinical research studies.
Two items that exhibited DIF between boys and girls (“I was so angry I felt like throwing something”, which had higher scores for boys, and “I felt upset”, which had higher scores for girls) were included in the final short form. Used together, the DIF in these two items counterbalances almost exactly (see Figure 1): the expected summed scores for boys and girls at any level of underlying latent anger are nearly identical. We do not recommend constructing an even shorter form that includes only one of these two items but not the other, as such a form may be biased between boys and girls. However, when both items are used, they increase measurement precision over what it would be if both were omitted.
In addition to investigating gender DIF, we considered DIF between younger (ages 8–12) and older (ages 13–17) children and identified a single item, “I felt fed up”, which exhibited DIF after Benjamini-Hochberg correction. This DIF was mainly due to differences in the IRT discrimination parameter between age groups. Using a common discrimination parameter will lead to overestimation of test precision in the 8–12-year age group. The problem is minor, but it can be avoided by excluding the item. The six-item scale is available from the NIH PROMIS Assessment Center at www.nihpromis.org, and this site allows researchers to exclude items from the scale.
Several generic self-report HRQOL instruments exist for use in pediatric populations, and most attempt to measure at least some aspect of emotional distress. However, these instruments typically do not have an anger-specific domain. The anger measures that do exist typically are not child self-reports, were developed with classical test theory rather than taking advantage of IRT analysis in the scale development process, or both [41, 42, 43, 44]. PROMIS psychometric analyses focus on determining scale dimensionality and detecting sources of LD, and on final item selection using IRT analyses. Like PROMIS, two of the newer instruments, KIDSCREEN and PedsQL, used qualitative research methods to incorporate the child’s perspective during the development process [30, 31].
One major challenge prior to applying IRT models to the measurement of emotional distress is resolving issues of dimensionality. Conventional wisdom is that emotional distress scales are less likely to fit unidimensional models. Items are often sampled from multiple domains (e.g., mood, behavior, somatic symptoms) in order to capture a comprehensive set of indicators of the latent construct. Hence, it is common to observe higher correlations within domains than are expected under the conditional independence assumption of unidimensional IRT models. One of the initial steps for this project was to develop multidimensional conceptual frameworks informed by previous empirical (e.g., factor analytic) and theoretical work, and to determine the level of resolution at which unidimensional scales could be derived from the domains [3, 6, 7, 8]. Three constructs of emotional distress were conceptualized: depressive symptoms, anxiety, and anger. These findings of unidimensionality are consistent with a recent meta-analysis and other published studies [35, 36, 37, 38, 39, 40].
The same study population was used to test all of the PROMIS Pediatric items. Hence, we did not specifically sample the entire range of the anger latent trait, which may be a limitation. Instead, we enrolled a large, diverse sample of children from community and clinical settings, and we anticipate that we have good coverage across most of the important range of the trait. Future studies should evaluate these items specifically in children recruited from behavioral or anger management programs. The PROMIS pediatric item scale for anger focuses on angry moods and aggression. Other scales also focus on these components but may contain additional subdomains, such as social skills with peers and authority figures [41–44].
The PROMIS scales provide separate scores for depressive symptoms, anger, and anxiety. The PedsQL Emotional Functioning Scale also includes items that indicate depression, anxiety, and anger, while the KIDSCREEN Moods and Emotions scale largely measures depressive symptoms, with one item that may indicate anxiety. It remains a question for future validity studies to determine the usefulness of separate scores for depressive symptoms, anger, and anxiety: though these constructs are highly correlated, they may be differentially predictive, or differentially responsive to a particular treatment. In addition, there may be gender differences in these relationships. The separate scores of the PROMIS pediatric emotional distress measures permit the study of those questions.
Using IRT analysis to identify final items offers more flexibility to future users of these items. This approach allows researchers to select the most useful items for their study design. We propose a six-item anger scale; however, a smaller subset of items can also be used and scored on the same metric as the full set.
The PROMIS pediatric PROs were developed to provide accurate and efficient assessment of important domains of HRQOL for children, including anger. This study provides initial calibrations of the PROMIS pediatric anger items and creates the corresponding PROMIS Pediatric Anger Scale, version 1.0.
We would like to acknowledge the contribution of Harry A. Guess, MD, PhD to the conceptualization and operationalization of this research prior to his death.
This work was funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant 1U01AR052181-01, and by SBIR contract HHSN-2612007-00013C with the National Cancer Institute of the National Institutes of Health. Information on the Patient-Reported Outcomes Measurement Information System (PROMIS) can be found at http://nihroadmap.nih.gov/ and http://www.nihpromis.org.
Listed below are the item stems for the six-item PROMIS Pediatric Anger Scale. All items use a 7-day recall period (the preface is “In the past seven days”), and a 5-point response scale with the options never (0), almost never (1), sometimes (2), often (3) and almost always (4).
The summed-score to scale-score translation for this short form is given in Table A1.