Recently, the National Institutes of Health Roadmap for Medical Research initiative led a large-scale effort to develop the Patient-Reported Outcomes Measurement Information System (PROMIS). PROMIS’s main goal was to develop a set of item banks and computerized adaptive tests for the clinical research community. Asthma, as the most common chronic childhood disease, was chosen for a disease-specific pediatric item bank.
The primary objective of this research is to present the details of the psychometric analyses of the asthma domain items.
Item response theory (IRT) analyses were conducted on a bank of 34 asthma items. Test forms containing PROMIS Pediatric Asthma domain items were completed by 622 children ages 8 to 17. Items were subsequently evaluated for local dependence, scale dimensionality, and differential item functioning.
A 17-item pool and an 8-item short form for the new PROMIS Pediatric Asthma Impact Scale (PAIS) were generated using IRT. The recommended 8-item short form contains the item set that provides the maximum test information at the mean (50) on the T-score metric. If more score precision is required, the complete 17-item pool is recommended and may be used in toto or as the basis of a computerized adaptive test (CAT). A shorter test form can also be created and scored on the same scale.
The present study presents the PROMIS Pediatric Asthma Impact Scale (PAIS) developed with IRT, and provides the initial calibration data for the items.
Asthma is the most common chronic childhood disease, and it imposes a significant negative impact on the health, quality of life, and finances of those affected. In 2007, over 9.5 million children (13%) had ever been diagnosed with asthma, and almost 6.2 million (9%) currently had asthma (1). Using recent estimates of pediatric asthma medical expenditures ($1004.60 per child) in 2007 U.S. dollars (2), the annual direct medical cost for pediatric asthma is approximately $6.2 billion. Including adults with asthma increases the cost estimate to $37.2 billion in 2008 U.S. dollars, representing a substantial proportion of healthcare costs in the United States.
With the large burden of asthmatic disease in the population comes the need to evaluate the effectiveness and potential adverse effects of pharmacologic agents, therapies, interventions, and disease management strategies. Functions of assessment and monitoring are closely linked to the concepts of severity, control, and responsiveness to treatment (3). Notably, in the 2009 National Heart, Lung and Blood Institute (NHLBI) Strategic Plan for Pediatric Respiratory Disease Research Report, one of the seven priority areas for research was “improved assessment, diagnosis, and treatment of pediatric respiratory diseases” (4). Better treatment and management lead to an improved quality of life for children and adults with asthma (5).
New asthma treatments and interventions are evaluated using asthma quality of life scales and questionnaires. Over the past several decades, researchers began formal development of asthma-specific quality of life scales and questionnaires such as those by Juniper et al. (6), Creer et al. (7), and Hyland et al. (8). In subsequent years, the number of asthma symptom and quality of life scales expanded considerably. These included the Pediatric Quality of Life Inventory Generic Core Scales and Asthma Module (PedsQL) (9, 10), the Pediatric Asthma Diary (11–13), the DISABKIDS Asthma Module (14, 15), the Marks Asthma Quality of Life Questionnaire (16), Merck’s Asthma Therapy Assessment Questionnaire (17), and the Integrated Therapeutics Group Child Asthma Short Form (18). The most recent asthma questionnaires to be developed focus specifically on asthma control: the Childhood Asthma Control Test (19) and the Test for Respiratory and Asthma Control in Kids (TRACK) (20).
All of the aforementioned asthma-specific quality of life scales were developed using classical test theory. In classical test theory, the item and scale statistics apply only to the specific group of subjects who took the test (21). Thus, the scales need to be validated across a range of different populations before their results can be generalized.
Asthma scales/questionnaires that have been developed with classical test theory often have gaps in their ability to measure the full spectrum of disease (22, 23). In contrast, with item response theory (IRT)-calibrated items, one can construct a measure that is useful across the full continuum of disease. Adams et al. (22) and Ahmed et al. (23) were some of the first researchers to apply IRT to evaluating asthma-specific quality of life questions. Both identified gaps in existing scales/measurements. Adams et al. found that the Marks Asthma Quality of Life Questionnaire (AQLQ) showed smaller differences in scores at both the lower and upper ends of the scale than in the middle, whereas Ahmed et al. observed substantial ceiling effects in the 30 Second Asthma Test. Thus these asthma scales do not span the entire continuum of disease and make it difficult to evaluate the effects of new treatments and interventions in children with the most severe disease.
The Patient-Reported Outcomes Measurement Information System (PROMIS), part of the National Institutes of Health Roadmap Initiative, was designed to develop better measures of patient-reported outcomes such as pain, fatigue, and physical functioning. PROMIS includes the development of domain item banks and computerized adaptive tests based on IRT for the clinical research community (23). The PROMIS Pediatric project focused on the development of self-report PRO item banks across several health domains for youth ages 8 to 17 years. The primary focus was on the measurement of generic health domains that are important across a variety of illnesses (24). Asthma was chosen for a disease-specific item bank to measure symptoms and impacts of disease by patient self-report.
We aimed to identify a set of items that span the entire asthma severity/control continuum and are valid in a wide range of subpopulations. The goal of the present paper is to describe the details of the IRT analyses of the asthma domain items. Lastly, we present the 17-item pool and the 8-item short form that resulted from the IRT analyses.
Whereas other PROMIS scales were designed to measure pain, fatigue, and emotional, physical, and social function, the asthma item bank focused on asthma-specific symptoms and impacts, as described in the NHLBI Asthma Guidelines ERP-3 (3). For the development of the asthma impact items, we adopted the following asthma domain definition: “Asthma causes several symptoms for children that are not addressed in the generic item banks which include cough, wheeze, shortness of breath, and avoidance of triggers. Asthma is also associated with impacts such as missing school or activities with other children. The PROMIS pediatric asthma item bank focuses on symptoms specific to asthma” (Irwin et al., submitted).
The PROMIS Pediatric Asthma item banks were created using a strategic item generation methodology developed by the PROMIS Network (Irwin et al., submitted) (25). Six phases of item development were implemented: identification of existing items, item classification and selection, item review and revision, focus group input on domain coverage, cognitive interviews with individual items, and final revision before field testing. Identification of items refers to the systematic search for existing items in currently available pediatric scales.
The systematic search was utilized to identify an initial item pool of 169 asthma items. These included items collected, with permission from the authors, from the following scales: the Pediatric Quality of Life, Generic and Asthma Module v 3.0 (PedsQL) (9, 10), the Pediatric Asthma Diary (11–13), and the DISABKIDS Asthma Module (14, 15). Additional items were written by the authors to fill content areas identified by patients, parents, or the research team that were not previously covered. Focus group input and cognitive interview results were published previously (26, 27). Items successfully screened through the process were sent to field testing. The final PROMIS pediatric asthma item set contained 34 asthma items (26).
Participants were recruited in hospital-based outpatient general pediatrics and subspecialty clinics and in public school settings between January 2007 and May 2008 in North Carolina and Texas. Parental informed consent and minor assent were obtained for all children taking the survey.
The 34 PROMIS asthma items were administered along with the items from the PedsQL Asthma Module and the DISABKIDS Asthma Module (26). Children (N = 622) completed one of the two testing forms (Asthma Form 1, Asthma Form 2). The details of the sampling plan are described elsewhere (Irwin et al., submitted). Most (32 of 34) asthma items had a 7-day recall period and used standardized 5-point response options (never, almost never, sometimes, often, almost always). For the remaining 2 items, participants reported the number of days (0 through 7).
As outlined in Reeve et al. (28), psychometric evaluation was completed sequentially using the steps outlined below. First, to verify coding and see that there were no empty response categories for any item, descriptive statistics were computed as a preliminary check on the validity of the data. Included in these checks were tables of marginal frequencies of item responses and the correlations of item scores with the total summed score.
Second, confirmatory factor analysis (CFA) of the interitem polychoric correlation matrix was conducted to verify, prior to IRT calibration, that the latent variable, asthma impact, was unidimensional. However, because the items were primarily selected from existing scales, there was a potential for content-specific factors. In other words, because there could be considerable overlap in item content, responses to subsets of the items may be more related than expected based on the relationship with the general factor, asthma impact. Thus, in addition to a single-factor model, fitting additional factors, and/or residual correlations, served as an indication of local dependence (LD) for pairs or small numbers of items, violating the local independence assumption of unidimensional IRT (29). Items were set aside from subsets that exhibited LD. These analyses were performed using the mean and variance adjusted weighted least squares algorithm (WLSMV) as implemented in the computer program Mplus (30).
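The local dependence screen described above can be illustrated with a small sketch. Under a one-factor model with standardized loadings, the model-implied correlation between two items is the product of their loadings, so an observed polychoric correlation that greatly exceeds this product flags a candidate LD pair. The loadings and correlation matrix below are hypothetical illustrations, not the published PAIS estimates, and the 0.2 cutoff is an arbitrary screening value:

```python
import numpy as np

# Hypothetical standardized loadings from a one-factor model and an
# observed (polychoric) interitem correlation matrix; illustrative only.
loadings = np.array([0.80, 0.75, 0.70, 0.85])
observed = np.array([
    [1.00, 0.60, 0.56, 0.68],
    [0.60, 1.00, 0.78, 0.64],   # items 2 and 3 correlate more strongly than
    [0.56, 0.78, 1.00, 0.60],   # the one-factor model implies -> possible LD
    [0.68, 0.64, 0.60, 1.00],
])

# Implied correlation between items i and j is loadings[i] * loadings[j];
# large positive residuals flag locally dependent pairs.
implied = np.outer(loadings, loadings)
residual = observed - implied

flagged = [(i, j)
           for i in range(len(loadings))
           for j in range(i + 1, len(loadings))
           if residual[i, j] > 0.2]
print(flagged)  # -> [(1, 2)]
```

A flagged pair would then be handled as in the analysis above: model a residual correlation, or retain only one item of the pair.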
Third, based on the CFA findings, locally independent items were calibrated using Samejima’s graded response model (GRM; 1, 31, 32) in the software Multilog (33). The GRM estimates a slope or discrimination parameter (a), reflecting the degree of association of the item responses with the latent construct being measured, and four threshold parameters (bk) (for five response option items) that indicate the level of asthma impacts at which a response in a given category or higher becomes probable. The fit of the IRT model to the data was examined using the S-X2 statistic (34), as generalized by Bjorner et al. (35). As a goodness of fit statistic, a nonsignificant S-X2 value suggests adequate fit of the model to the data.
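The GRM described above models the probability of responding in each category as the difference between adjacent boundary curves. A minimal sketch, using illustrative parameter values rather than any Table 3 estimates:

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Samejima graded-response model: probabilities of the five response
    categories for an item with slope a and four thresholds b, at latent
    trait level theta. Parameter values here are illustrative only."""
    b = np.asarray(b, dtype=float)
    # Boundary curves P*(k) = P(response in category k or higher);
    # pad with 1 and 0 for the lowest and above-highest boundaries.
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    p_star = np.concatenate(([1.0], p_star, [0.0]))
    # Category probabilities are differences of adjacent boundaries.
    return p_star[:-1] - p_star[1:]

# At theta = 0 with symmetric thresholds, the middle category is most likely.
probs = grm_category_probs(theta=0.0, a=2.0, b=[-1.5, -0.5, 0.5, 1.5])
print(probs.round(3))
```

Raising the slope a concentrates the probability mass, which is why high-slope items in Table 3 are the best indicators of asthma impact.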
Fourth, differential item functioning (DIF) was investigated between males and females using the IRT-LR (Likelihood Ratio) DIF detection procedure (36) as implemented in the software IRTLRDIF (37). In this case, the presence of DIF indicates that the relation of item responses with the latent variable differs between boys and girls. Such a difference suggests that some other factor, related to gender but different from the construct being measured, influences item responses, which is a violation of the assumption of unidimensionality. Here again, a nonsignificant χ2 indicates a lack of DIF. DIF detection requires many tests of significance, so the Benjamini-Hochberg procedure (38, 39) was used to control for multiple comparisons.
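The Benjamini-Hochberg procedure used here controls the false discovery rate across the many DIF significance tests: p-values are ranked, and the largest rank k with p(k) ≤ (k/m)·α determines which hypotheses are rejected. A self-contained sketch (the p-values are made up for illustration, not the study's DIF statistics):

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a parallel list of booleans: True where the hypothesis is
    rejected while controlling the false discovery rate at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k = rank
    # Reject the k smallest p-values.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject

# Illustrative p-values: the two smallest survive the correction.
print(benjamini_hochberg([0.003, 0.04, 0.20, 0.01, 0.65]))
# -> [True, False, False, True, False]
```

Note that a p-value can be rejected even if it exceeds its own threshold, provided a larger-ranked p-value falls below its threshold; this is the step-up character of the procedure.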
Finally, though IRT scale scores may be computed from either item response patterns or summed scores, we expect scale scores for summed scores to be used more often. The Appendix provides a score translation table to be used for this purpose (40). The IRT scale scores reported here use the North Carolina sample as the reference group.
Test forms containing PROMIS pediatric asthma items were completed by a total of 622 respondents. Table 1 describes participant characteristics by asthma form (Irwin et al., submitted); note the analyses conducted with Forms 1 and 2 are conducted in exactly the same manner. The sample completing the asthma forms was about 44% female, and 63% of the children were between the ages of 8 and 12 years old. Forty-six percent were Caucasian, 35% Black, 7% multiracial, and 12% other races (Asian/Pacific Islanders, Native Americans, and other races). Fourteen percent of the sample was of Hispanic ethnicity. The vast majority of the adults providing informed consent for the children were parents (92%) or grandparents (4%) of the child. The educational attainment of these parents or guardians ranged from less than high school (8%) to advanced degree (9%), with 22% reporting a college degree, 37% some college, and 24% a high school diploma. Approximately 28% of the children participating in the survey had a chronic illness diagnosis during the past 6 months. The children participating in the survey had a range of asthma severity and asthma control, with approximately half reporting symptoms characteristic of children with mild persistent asthma or more severe asthma. Fifty percent (Asthma Form 1) and 54% (Asthma Form 2) of the children reported having asthma symptoms 3 or more days in the past 2 weeks, with a range of 0 to 14 symptom days. Similarly, at least half of the sample (50% completing Form 1, 54% completing Form 2) reported nocturnal asthma symptoms 2 or more nights in the past 2 weeks. The majority of the children (55% completing Asthma Form 1, 58% completing Asthma Form 2) reported using rescue medication 2 or more times in the past 2 weeks, with a range of 0 to 42 times.
Table 2 provides the factor loadings and residual correlations from the CFA model. Prior exploratory and confirmatory factor analytic models were needed to arrive at a model that is both substantively interpretable and fits the data closely. For this model, local dependence, or nuisance multidimensionality, is indicated by a series of additional factors (i.e., “Play,” “Attacks,” and “Scared”), where “Play,” for example, is represented by a collection of items relating to difficulty playing due to asthma symptoms. In other cases, pairs of items were locally dependent and were modeled with residual correlations (e.g., “I went to the emergency room for my asthma” and “I went to the hospital for my asthma” almost amount to asking the same question twice). Indices of goodness of fit, as suggested by Reeve et al. (28), indicate that the model fits the data well, χ2(155) = 625, CFI (Comparative Fit Index) = 0.917, TLI (Tucker-Lewis Index) = 0.986, RMSEA (Root Mean Square Error of Approximation) = 0.070.
To ensure that the final selection of items was unidimensional, a team of experts reviewing CFA results selected one item to represent each locally dependent subset. In many instances the selected item had the highest factor loading among the subset of items; in other instances the selected item was substantively important to the final scale (e.g., “I had asthma attacks” and “It was hard for me to play sports or exercise because of my asthma”). The remaining items were set aside, leaving 19 items that were calibrated using the GRM. Table 3 contains the values of the item parameter estimates, item fit statistics (S-X2), and LR-DIF statistics for the items comprising the final pool. In Table 3, the items are sorted based on the magnitude of the slope parameter, so the generally best indicators of asthma impact are near the top.
After using the Benjamini-Hochberg correction for multiplicity on both fit and DIF statistics, no items exhibited significant lack of fit, and only one item, “My asthma was really bad,” was set aside for significant DIF (χ2(5) = 18.3, p = .003). The item “I went to the emergency room for my asthma” was removed due to its poor discrimination (a = 0.88). The remaining 17 items in Table 3 comprise the asthma item pool.
Figure 1 provides test information functions for the asthma item pool and five potential short forms on a T-score scale with a mean of 50 and standard deviation of 10 (on which all PROMIS scales are reported). Test information, the expected value of the inverse of the squared standard error of measurement, indicates the precision of scores on a scaled metric. A standard error of measurement of approximately 0.32 (on a standardized metric, or 3.2 on a T-score metric) is associated with a test information value of 10 and hence a reliability coefficient of approximately 0.90. Figure 1 contains five potential 8-item short forms that provide test information greater than 10 for a range of scores between, approximately, 30 and 70 on the T-score scale. The recommended 8-item short form in the Appendix is the item set that provides the maximum test information at the mean (50) on the T-score metric. However, if more score precision is required (or “broader” precision), the complete 17-item pool is contained in Table 3 and may be used to compute IRT response pattern scores or IRT-scaled scores for summed scores, or a computerized adaptive test (CAT) may be administered using all 17 items as the pool. Conversely, if a researcher needs a shorter assessment and can sacrifice precision, a shorter scale can be created and scores may be generated on the T-score metric.
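The arithmetic linking test information, the standard error of measurement, and reliability in the paragraph above can be verified in a few lines. The only inputs are the stated information value of 10 and the T-score standard deviation of 10:

```python
import math

# Test information I(theta) determines the standard error of measurement,
# SE = 1 / sqrt(I); on a standardized metric with unit variance,
# reliability ~= 1 - SE**2.
information = 10.0
se_standardized = 1.0 / math.sqrt(information)   # ~0.32
se_t_score = 10.0 * se_standardized              # ~3.2 on the T-score metric
reliability = 1.0 - se_standardized ** 2         # 0.90
print(round(se_standardized, 2), round(se_t_score, 1), round(reliability, 2))
# -> 0.32 3.2 0.9
```

This is why the short forms in Figure 1 are judged by whether their information functions stay above 10 across the 30 to 70 T-score range: inside that range, scores have reliability of approximately 0.90 or better.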
Figure 1 also serves as a simulated CAT, such that separate test information functions are computed from the 8 items that provide the most information at five possible score locations (30, 40, 50, 60, and 70 on the T-score metric). That is, the items used to generate the test information function at each of the five score locations are those that perfect adaptation would select for an individual at those score levels. To consider the usefulness of CAT given these items, one may compare both the range of score precision and the magnitude of score precision across the separate potential short forms. Because the items generally discriminate in the same score range, there is little precision gained by choosing among the five potential short forms. Additionally, relatively minor gains in reliability (approximately 0.03) were noted when comparing test information functions for the full item pool compared to the potential short forms. The PROMIS Assessment Center, available at www.nihpromis.org, contains the item pool and is capable of administering these items as a CAT.
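The perfect-adaptation selection described above amounts to ranking items by their Fisher information at a target score level and keeping the top 8. A sketch using the GRM item information formula, I(θ) = Σk (dPk/dθ)² / Pk; the item parameters below are randomly generated placeholders, not the Table 3 calibrations:

```python
import numpy as np

def grm_item_information(theta, a, b):
    """Fisher information of one graded-response item at theta,
    I = sum_k (dP_k/dtheta)**2 / P_k, via the boundary curves P*_k."""
    b = np.asarray(b, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    p_star = np.concatenate(([1.0], p_star, [0.0]))   # pad boundaries
    d_star = a * p_star * (1.0 - p_star)              # dP*/dtheta (0 at pads)
    probs = p_star[:-1] - p_star[1:]
    derivs = d_star[:-1] - d_star[1:]
    return float(np.sum(derivs ** 2 / probs))

# Hypothetical 17-item pool: (slope, four sorted thresholds) per item.
rng = np.random.default_rng(0)
items = [(a, np.sort(rng.uniform(-2.0, 2.0, 4)))
         for a in rng.uniform(1.0, 3.0, 17)]

theta = 0.0   # latent score corresponding to 50 on the T-score metric
info = [grm_item_information(theta, a, b) for a, b in items]
best8 = sorted(range(len(items)), key=lambda i: -info[i])[:8]
```

Repeating the selection at θ values corresponding to T-scores of 30, 40, 50, 60, and 70 yields the five simulated-CAT short forms compared in Figure 1.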
We have described the development of the new NIH PROMIS Pediatric Asthma Impact Scale (PAIS) using item response theory, including the investigations of scale dimensionality, local dependence, and differential item functioning. After eliminating asthma items exhibiting local dependence or differential item functioning, we present the final 17-item asthma pool, as well as the recommended 8-item short form (Appendix 1).
Existing Asthma Quality of Life instruments/scales have covered a variety of asthma-related domains: symptoms (daytime, nocturnal), emotion functioning/psychosocial health, physical activities/functional limitations, worry/stigma, independence, environment, communication, and treatment (9, 10, 13, 15, 17–20, 41–46). Our scale, based on the PROMIS asthma domain definition, includes symptoms, as well as asthma-specific impact on functional limitations, physical activities, emotional functioning, and psychosocial health.
Our Pediatric Asthma Impact Scale (PAIS) takes full advantage of IRT analyses in the scale development process. As noted earlier, previous asthma quality of life scales used classical test theory. Several key assumptions in classical test theory do not always hold in practice. These include (1) each item contributes equally to the final score, (2) all items have equal variance, (3) each item is measured on the same interval scale, and (4) the error of measurement is the same at the ends of the scale as it is in the middle (21). Because the PAIS is IRT based, its scoring does not rely on these assumptions.
There is evidence that previously developed asthma quality of life scales discriminate less well among people with more severe impairment in their quality of life, as well as among those with milder limitations (22, 23). As Ahmed et al. illustrate with IRT analyses of the 30-Second Asthma Test, the items could only distinguish between high and low levels of control, with no levels in between, whereas an ideal scale would have items that assess distinct levels of asthma impact equally spaced apart on the IRT scale. The PAIS items, developed with IRT, have this characteristic.
Furthermore, the PAIS provides assessment of asthma “impact” with high precision using a relatively small set of items through the application of computerized adaptive testing (23). As noted by Varni et al. (submitted) an additional potential advantage of utilizing IRT-developed scales includes allowing greater flexibility for researchers in selecting items that are the most meaningful for their study design and hypotheses.
The present study presents the PROMIS Pediatric Asthma Impact Scale (PAIS) developed with IRT, and provides the initial calibration data for the items. Future work will include testing of the scale’s reliability and validity, as well as responsiveness to clinical change.
Listed below are the item stems for the recommended 8-item short form for the PROMIS Pediatric Asthma Impact Scale. All items use a 7-day recall period (the preface is “In the past 7 days”), and a 5-point response scale with the options never (0), almost never (1), sometimes (2), often (3) and almost always (4).
I had trouble breathing because of my asthma.
My asthma bothered me.
I felt wheezy because of my asthma.
It was hard to take a deep breath because of my asthma.
It was hard for me to play sports or exercise because of my asthma.
I had trouble sleeping at night because of my asthma.
My chest felt tight because of my asthma.
I felt scared that I might have trouble breathing because of my asthma.
The summed score to scale score translation for this short form is in Table A1.
Summed score | Scale score | SD
Note. Scale scores are on a T-score scale; the values of SD are reported as conditional standard errors of measurement. Summed scores are the sum of each item score (0–4) for all 8 items; the scale scores represent the corresponding IRT score for each summed score.
Declaration of Interest
The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.