The National Institutes of Health's Patient-Reported Outcomes Measurement Information System (PROMIS) has developed several scales measuring symptoms and function for use by the clinical research community. One advantage of PROMIS is the ability to link other scales to the PROMIS metric.
The objectives of this research are to provide evidence of validity for one of the PROMIS measures, the Pediatric Asthma Impact Scale (PAIS), and to link the PedsQL™ Asthma Symptoms Scale with the metric of the PAIS.
Descriptive statistics were computed describing the relationships among scores on the PAIS, the PedsQL™ Asthma Symptoms, Treatment, Worry, and Communication Scales, and the DISABKIDS Asthma Impact and Worry Scales for approximately 300 children ages 8–17. A novel linkage method based on item response theory (IRT), calibrated projection, was used to link scores on the PedsQL™ Asthma Symptoms Scale with the metric of the PAIS.
The PAIS exhibited strong convergent validity with the PedsQL™ Asthma Symptoms Scale, and less strong relations with the other five scales. The linkage system uses scores on the PedsQL™ Asthma Symptoms Scale to produce relatively precise score estimates on the metric of the PAIS.
Results of this study provide evidence for the validity of the PAIS, and a method to use scores on the PedsQL™ Asthma Symptoms Scale to estimate scores on the metric of the PAIS, in partial fulfillment of the PROMIS goal to provide a lingua franca for health-related quality of life.
The National Institutes of Health Roadmap for Medical Research initiative included a large-scale effort to develop the Patient-Reported Outcomes Measurement Information System (PROMIS). One goal of PROMIS was to provide a common set of health-related quality of life (HRQoL) scales and metrics for the clinical research community [1, 2]. To accomplish part of that goal, the PROMIS pediatric project developed patient-reported outcome (PRO) item banks across several health domains for youth aged 8–17. These included scales for measurement of generic health domains that are important across a variety of illnesses, including physical function, pain, fatigue, emotional distress, and social function [3–6]. In addition, one disease-specific item bank was developed for children with asthma, which was chosen because it is the most common chronic childhood disease, and PRO measurement is an essential component in evaluating outcomes for children with asthma [8–10].
The PROMIS scales all provide measurement with metrics defined using item response theory (IRT). A goal of PROMIS has been that those scales, and their metrics, will become a lingua franca (bridge or common language) for HRQoL measurement in clinical research. Although that goal may be accomplished by widespread adoption of the PROMIS scales themselves, even more use of the PROMIS metrics can be obtained if scores on suitable existing scales can be “linked” with the PROMIS scales, to provide score estimates on the new metrics.
This research describes the relations of scores on one of the PROMIS measures, the Pediatric Asthma Impact Scale (PAIS), with scores from two previously existing measures of HRQoL for children with asthma (the PedsQL™ 3.0 and DISABKIDS Asthma Modules), to provide evidence of convergent and divergent validity for the PAIS. Further, the PedsQL™ Asthma Symptoms Scale is sufficiently strongly related to the PAIS that it can be linked with the metric of the PAIS.
The field-test item set for the PROMIS PAIS item bank was created using a strategic item generation methodology developed by the PROMIS Network. Some questionnaire items were developed as variants of items on existing scales measuring asthma-related HRQoL, while others were generated de novo. Focus groups and cognitive interviews [13, 14] were used to refine item wording and content. Items successfully screened through the focus groups and cognitive interviews were retained for field testing. The final PROMIS pediatric asthma field-test item set contained 34 items. The response scale for each item is “never” (0), “almost never” (1), “sometimes” (2), “often” (3), and “almost always” (4).
A series of statistical analyses were performed to confirm the unidimensionality of the measure, to identify and set aside items that exhibited local dependence (LD) or differential item functioning (DIF), and finally to calibrate the remaining items onto an IRT scale using Samejima's graded item response model [15, 16]. The final PAIS item pool comprises seventeen items, with item parameters for IRT scoring, and a scoring table for an eight-item short form. The PAIS is also available from the PROMIS Assessment Center, which contains the item pool and allows the user to select and administer the 17-item pool, an 8-item short form, or a computerized adaptive test (CAT).
The PedsQL™ 3.0 Asthma Module comprises 4 scales: Asthma Symptoms (11 items), Treatment Problems (11 items), Worry (3 items), and Communication (3 items). The response scale for each item is the same as that for the PAIS: “never” (0), “almost never” (1), “sometimes” (2), “often” (3), and “almost always” (4). For conventional scoring of the PedsQL™ 3.0 Asthma Module, the responses are transformed to 100, 75, 50, 25, and 0, respectively, resulting in a scale range of 0–100, with higher scores indicating better HRQoL. Scores are computed as the mean of non-missing items, unless more than 50% of the responses for a scale or subscale are missing, in which case there is no score.
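The conventional scoring rule just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the publisher's scoring software; the function name `pedsql_scale_score` and the use of `None` for a missing response are assumptions made for the example.

```python
from typing import Optional, Sequence


def pedsql_scale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Score one PedsQL scale from raw 0-4 responses; None marks a missing item."""
    answered = [r for r in responses if r is not None]
    n_missing = len(responses) - len(answered)
    # No score when the scale is empty or more than 50% of its items are missing.
    if not responses or n_missing > 0.5 * len(responses):
        return None
    # Reverse and rescale: 0 -> 100, 1 -> 75, 2 -> 50, 3 -> 25, 4 -> 0.
    transformed = [100 - 25 * r for r in answered]
    # The scale score is the mean of the non-missing transformed items.
    return sum(transformed) / len(transformed)
```

For example, a three-item Worry scale answered "never", missing, "almost always" would be scored from the two available items.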
The DISABKIDS Asthma Module includes 2 scales: Impact (6 items) and Worry (5 items). The response scale is “never” (0), “seldom” (1), “quite often” (2), “very often” (3), and “always” (4). Scores on the DISABKIDS Asthma Module are the average of each scale's reversed responses multiplied by 25, yielding a score range of 0–100, with higher scores indicating better HRQoL. There is no score if responses are missing for more than one of the items on a scale.
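A comparable sketch for the DISABKIDS rule follows. It assumes that "reversed" means replacing a response r with 4 − r, and that the average is taken over the non-missing items; the function name and the `None` convention for missing data are hypothetical.

```python
from typing import Optional, Sequence


def disabkids_scale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Score one DISABKIDS asthma scale from raw 0-4 responses (None = missing)."""
    answered = [r for r in responses if r is not None]
    # No score when nothing was answered or more than one item is missing.
    if not answered or len(responses) - len(answered) > 1:
        return None
    # Reverse each response (4 - r), average, and multiply by 25 for a 0-100 range.
    return 25.0 * sum(4 - r for r in answered) / len(answered)
```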
The overall data collection design for calibration of the PROMIS pediatric measures has been described elsewhere. Participants were recruited in hospital-based outpatient general pediatrics and subspecialty clinics and in public school settings between January 2007 and May 2008 in North Carolina and Texas. Parental informed consent and minor assent were obtained for all children taking the survey.
The 34 PROMIS field-test asthma items were administered along with the items measuring other domains (e.g., emotional distress, fatigue) from the PROMIS pediatric item tryout set, as well as from the PedsQL™ 3.0 Asthma Module and the DISABKIDS Asthma Module. Children (N = 622) completed one of two testing forms (Form Asthma 1, Form Asthma 2). Form Asthma 1 included the 34 PROMIS field-test asthma items and items measuring other PROMIS domains (the latter are not considered further here). Form Asthma 2 included the same 34 asthma items and the PedsQL and DISABKIDS asthma modules. The seventeen asthma items that were ultimately retained for the PAIS item pool had a 7-day recall period. The details of the sampling plan are described elsewhere.
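Scores for the PedsQL and DISABKIDS scales were computed according to published procedures for each scale. The items of the PAIS have been calibrated using Samejima's graded IRT model [15, 16], which describes the probability of each item response as a function of a set of item parameters (slopes a and intercepts c), and θ, the latent variable measured by the scale, as follows: The conditional probability of response u = 0, 1, …, 4 is

$$T(u \mid \theta) = T^{*}(u \mid \theta) - T^{*}(u + 1 \mid \theta),$$

in which $T^{*}(u \mid \theta)$ is a curve tracing the probability of a response in category u or higher:

$$T^{*}(u \mid \theta) = \frac{1}{1 + \exp\left[-(a\theta + c_u)\right]}$$

for u = 1, 2, …, m − 1, and

$$T^{*}(0 \mid \theta) = 1, \qquad T^{*}(m \mid \theta) = 0,$$

where m = 5 is the number of response categories.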
For calibration of the PAIS, θ is unidimensional. Values of the item parameters (as and cs) are tabulated in the original description of the PAIS. For use in this research, expected a posteriori (EAP) estimates for response patterns were computed for each respondent. Response-pattern EAP estimates may be computed in the presence of any pattern of missing item responses, so they are available for all respondents.
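To make the scoring step concrete, the following is a minimal sketch of response-pattern EAP estimation under a unidimensional graded model in slope-intercept form, using a standard normal prior and a fixed quadrature grid. The grid settings and the item parameters in the usage note are assumptions for illustration; they are not the published PAIS parameters, and the authors' software may differ in detail.

```python
import math


def graded_probs(theta, a, cs):
    """Category probabilities for one item; cs are intercepts c_1..c_{m-1} (decreasing)."""
    # Cumulative curves: probability of responding in category u or higher,
    # bounded by 1 for u = 0 and by 0 for u = m.
    cum = [1.0] + [1.0 / (1.0 + math.exp(-(a * theta + c))) for c in cs] + [0.0]
    return [cum[u] - cum[u + 1] for u in range(len(cs) + 1)]


def eap(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """Posterior mean and SD of theta for one response pattern (None = missing)."""
    grid = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    posterior = []
    for t in grid:
        weight = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for u, (a, cs) in zip(responses, items):
            if u is not None:  # missing items contribute nothing to the likelihood
                weight *= graded_probs(t, a, cs)[u]
        posterior.append(weight)
    total = sum(posterior)
    mean = sum(t * w for t, w in zip(grid, posterior)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, posterior)) / total
    return mean, math.sqrt(var)
```

With hypothetical items such as `[(2.0, [2.0, 0.5, -0.5, -2.0])] * 3`, calling `eap([4, None, 3], items)` returns an estimate and its posterior standard deviation even though one response is missing, which is why EAP scores are available for every respondent.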
Descriptive statistics were computed for the scores on all seven asthma scales using all available data.
Holland provides a modern framework for test score linking; he writes that “linking refers to the general class of transformations between the scores from one test and those of another,… linking methods can be divided into three basic categories called predicting, scale aligning, and equating”. For linking scores from disparate tests, such as the PAIS and the PedsQL Asthma Symptoms Scale (hereafter the PedsQL Symptoms Scale), only predicting scores on the PAIS scale (from the PedsQL Symptoms Scale), and aligning the two scales, are viable candidates.
A commonly used method of scale aligning has been calibration, which uses IRT to place the items from each of two scales on the same metric. After that is done, standard computation of IRT scaled scores from any subset of the items (which includes all of the items on one scale, or the other) yields comparable scores. However, calibration has heretofore been limited to situations in which a unidimensional IRT model is suitable for the aggregate set of items from both scales—that is, both scales measure exactly the same construct.
For two scales that measure different (if highly related) constructs, predicting scores on one scale from those of the other has been the norm. Such predictions are based on regression models, but often the regression model is elaborated to attend to the fact that the prediction is not so much a point as it is a distribution. When the entire conditional (predicted) distribution is considered, predicting scores on one scale from the score on the other measure is called projection. Usually projection has been based on variations of standard regression models, which consider the values of the predictor variable(s) fixed.
Because the greatest threat to the validity of test linkage that is readily detectable empirically is a lack of invariance of the relation between the two scales across subgroups of the population, Dorans and Holland recommend the use of a root mean squared deviation (RMSD) statistic to check invariance of subgroup differences between scales to be linked. Dorans and Holland found values of RMSD of 1–8% to be associated with linkages that have proved useful; those were associated with test scores correlated 0.83–0.92. In this research, the RMSD statistics were computed for each scale for the differences between boys and girls, and between younger and older children.
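As a rough illustration of what such an invariance check computes, the sketch below implements one simplified form of the RMSD, the version usually given for parallel-linear linking, in which each subgroup's standardized mean deviation on one scale is compared with its standardized mean deviation on the other. The text does not show the exact computation the authors used, so the formula, the function name, and the example numbers are all assumptions. Both scales must be oriented in the same direction before the comparison.

```python
import math


def rmsd_parallel_linear(weights, means_x, sd_x, means_y, sd_y):
    """RMSD of standardized subgroup mean deviations between scales X and Y.

    weights: subgroup proportions (summing to 1)
    means_x, means_y: subgroup means on each scale, oriented the same way
    sd_x, sd_y: total-group standard deviations on each scale
    """
    # Total-group means as weighted averages of the subgroup means.
    total_x = sum(w * m for w, m in zip(weights, means_x))
    total_y = sum(w * m for w, m in zip(weights, means_y))
    squared = 0.0
    for w, mx, my in zip(weights, means_x, means_y):
        z_x = (mx - total_x) / sd_x  # standardized deviation of subgroup on X
        z_y = (my - total_y) / sd_y  # standardized deviation of subgroup on Y
        squared += w * (z_y - z_x) ** 2
    return math.sqrt(squared)
```

With two equal-sized subgroups whose means are 48 and 52 on one scale and 49 and 51 on the other (both with a total standard deviation of 10), the function returns 0.1; multiplying by 100 gives the percentage form used in the text.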
Calibrated projection is a new statistical procedure that uses IRT to link two measures, without considering the scores on the predictor scale to be fixed, and without the demand of conventional calibration that the two are measures of the same construct. In calibrated projection, the multidimensional version of Samejima's graded model is fitted to the item responses from the two measures: θ1 represents the underlying construct measured by the PAIS, with estimated slopes a1 for each of the PAIS items and fixed values of 0.0 for the items of the PedsQL Symptoms Scale. θ2 represents the underlying construct measured by the PedsQL Symptoms Scale. The correlation between θ1 and θ2 is estimated simultaneously with the as and cs for the items on the two scales together. The model may also include additional latent variables, if required to fit LD that may arise between pairs of items within or across scales.
Subsequently, the MIRT model may be used to provide IRT scaled score estimates on the scale of one measure, using only the item responses from the other measure. Figure 1 illustrates calibrated projection: The x-axis variable is θ2, the underlying construct measured by the PedsQL Symptoms Scale, and the y-axis variable is θ1, the underlying construct measured by the PAIS. The two latent variables are highly correlated, as indicated by the density ellipses around the regression line. Given the item responses on the PedsQL Symptoms Scale, IRT methods may be used to compute the implied distribution on θ2; two of those are shown along the x-axis in Fig. 1, for summed scores of 13 and 44 on the PedsQL Symptoms Scale. The estimated relation between θ1 and θ2 may then be used to project those distributions onto the y-axis, to yield the implied distributions on θ1, the PAIS construct, given those scores on the PedsQL Symptoms Scale.
The means of the implied distributions on the θ dimensions are the IRT-based scaled scores, and the standard deviations of those distributions are reported as the standard errors of those scores. The projection links the scales in the sense that each score on the PedsQL Symptoms Scale yields a score on the PAIS metric.
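The projection step can be illustrated with a short calculation. The sketch assumes that θ1 and θ2 are standard bivariate normal with correlation ρ and that the predictor items carry no information about θ1 beyond what they say about θ2 (their θ1 slopes are fixed at zero). Under those assumptions θ1 given θ2 is normal with mean ρθ2 and variance 1 − ρ², so the mean and standard deviation of the implied distribution on θ1 follow from the laws of total expectation and total variance. The function name and the example inputs are hypothetical.

```python
import math


def project_to_theta1(mean_theta2, sd_theta2, rho):
    """Project a posterior on theta2 (mean, SD) onto theta1 via the latent regression."""
    # E[theta1] = rho * E[theta2]
    mean_theta1 = rho * mean_theta2
    # Var[theta1] = rho^2 * Var[theta2] + (1 - rho^2): the second term is the
    # residual variance of the regression of theta1 on theta2.
    var_theta1 = rho ** 2 * sd_theta2 ** 2 + (1.0 - rho ** 2)
    return mean_theta1, math.sqrt(var_theta1)
```

The residual term explains why a projected score can never be more precise than the latent regression allows: even a perfectly measured θ2 leaves a standard error of sqrt(1 − ρ²) on the θ1 scale.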
The two test forms containing PROMIS pediatric asthma items were completed by a diverse sample of 622 respondents. Demographic statistics for the entire PROMIS pediatric calibration sample have been described elsewhere, as have those for the asthma form respondents. Specifically for the asthma samples, the children participating in the survey had a range of asthma severity and asthma control, with approximately half reporting symptoms characteristic of children with mild persistent asthma or more severe asthma.
Descriptive statistics for scores on the PAIS, the PedsQL Asthma Symptoms, Treatment, Worry, and Communication Scales, and the DISABKIDS Asthma Impact and Worry Scales are provided in Table 1. For the PAIS the sample size is 622, because the IRT response pattern EAP estimates could be computed for all respondents. For the other scales, the sample size varies around 300: approximately half the sample responded to the form that included those legacy scales, and a few respondents had missing scores on the PedsQL or DISABKIDS scales.
The mean and standard deviation of the PAIS scores are close to the standardizing values of 50 and 10, because the original reference sample makes up the majority of the current data. The means and standard deviations for the PedsQL scales are close to those reported for the standardization sample, with the possible exception of the Worry scale (mean 70.7 in this sample, and 76.3 in the standardization sample). Coefficient α reliability values are higher for the PedsQL scales in this sample (Table 1) than they were in the original reference sample (symptoms = 0.85; treatment = 0.58; worry = 0.72; communication = 0.70). The values of coefficient α for the DISABKIDS scales are consistent with the range of values observed across several countries (impact range = 0.72–0.91; worry range = 0.61–0.88).
All correlations of the PAIS scores with those from the other scales are negative, because the PAIS is scored with “asthma impact” (low HRQoL) high, while the other scales are scored so that high HRQoL corresponds with high scores. The PAIS scores are correlated −0.85 with PedsQL Symptoms Scale scores; the disattenuated value of the correlation is −0.94, suggesting that the PAIS and the PedsQL Symptoms Scale measure nearly the same underlying construct, so linkage between the two scales may be useful. The correlations of the PAIS and PedsQL Symptoms Scale scores with the scores on the remaining five scales are very similar to each other, providing further evidence that those two scales measure approximately the same construct. Those correlations are generally lower, as are the correlations among the remaining five scales, suggesting that distinct aspects of HRQoL are measured by each of the remaining scales, and providing evidence of divergent validity for the PAIS and the PedsQL Symptoms Scale.
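The disattenuated correlation mentioned above is the observed correlation divided by the square root of the product of the two reliabilities. A minimal sketch follows; Table 1 is not reproduced here, so the reliability values in the usage note are hypothetical and were chosen only to show how an observed −0.85 can correspond to roughly −0.94.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y):
    """Correct an observed correlation for unreliability in both measures."""
    return r_observed / math.sqrt(reliability_x * reliability_y)
```

For example, `disattenuate(-0.85, 0.90, 0.91)` gives approximately −0.939.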
The rightmost columns of Table 1 show the standardized mean differences (effect sizes) for girls vs. boys, and younger (ages 8–11) vs. older (ages 12–17) respondents for each scale. Ignoring the (necessarily) opposite signs of the differences, those values are especially similar for the PAIS and PedsQL Symptoms Scale, again suggesting that linkage of the two metrics may be viable. RMSD values are 6% for the sex-related difference and 0% for the age difference, within the range of values for linkages between educational measures considered plausible.
Calibrated projection was used to link scores on the PedsQL Symptoms Scale to the IRT metric of the PAIS. The projection is based on a MIRT model with item parameters tabulated in Table 2 (for the discrimination (a) parameters) and 3 (for the intercept (c) parameters). In this model, θ1 represents the underlying construct measured by the PAIS, with estimated slopes a1 for each of the PAIS items and corresponding fixed values of 0.0 for the items of the PedsQL Symptoms Scale. θ2 represents the underlying construct measured by the PedsQL Symptoms Scale. The item parameters are tabulated on the conventional IRT scale for which the latent variables are standard (multivariate) normal; the estimated correlation between θ1 and θ2 is 0.96. The model also includes six additional uncorrelated latent variables, θ3 to θ8; each of those latent variables has non-zero (equal) slopes (a) for a particular pair of items that are nearly identical on the PAIS and the PedsQL Symptoms Scale. Those pairs of items arose from the fact that six PAIS items were derived by rewording corresponding PedsQL Symptoms Scale items. Some degree of local dependence (LD) is induced by those nearly identical item pairs, and the additional factors fit that LD so that the model is not misspecified. All of the a parameter estimates in Table 2 exceed six times their standard errors, indicating that the corresponding relationships differ significantly from zero.
In Table 3 the items are sorted within each of the two scales by their degree of severity. For the graded item response model, an overall index of severity is the parameter b, which is the average of the item's threshold parameters b_u = −c_u/a.
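The severity index can be computed directly from a slope and its intercepts, as in the sketch below. The item parameters used in the usage note are hypothetical, since Tables 2 and 3 are not reproduced in this text.

```python
def thresholds(a, cs):
    """Threshold parameters b_u = -c_u / a for one graded item."""
    return [-c / a for c in cs]


def severity(a, cs):
    """Overall severity index: the average of the item's thresholds."""
    b = thresholds(a, cs)
    return sum(b) / len(b)
```

An item with slope 2.0 and intercepts 0, −1, −2, −3 has thresholds 0, 0.5, 1.0, 1.5 and a severity of 0.75, so it would sort after an item whose thresholds are centered on zero.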
An alternative procedure using joint unidimensional calibration of the two scales was considered and rejected. The unidimensional model replaced the first two factors in the model described above with one-dimensional θ for both the PAIS and the PedsQL Symptoms Scale together. The likelihood ratio test for the difference in fit between the unidimensional model and the two-dimensional model was χ2(1) = 50.9, P < 0.0001, rejecting the unidimensional model (or joint calibration as a strategy).
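For readers who want to check the reported test, a likelihood ratio statistic is twice the difference in log-likelihood between the more general and the restricted model, referred to a chi-square distribution. For one degree of freedom the upper-tail probability has a closed form in terms of the complementary error function, so the sketch below needs only the standard library. The log-likelihoods themselves are not reported in the text, so any values passed to `lr_statistic` are hypothetical.

```python
import math


def lr_statistic(loglik_restricted, loglik_general):
    """Likelihood ratio statistic: 2 * (logL_general - logL_restricted)."""
    return 2.0 * (loglik_general - loglik_restricted)


def chi2_pvalue_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    # If Z is standard normal, Z^2 is chi-square(1), so
    # P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


p_value = chi2_pvalue_df1(50.9)  # far below the 0.0001 bound reported in the text
```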
Using the MIRT graded model parameter estimates in Tables 2 and 3 for the items of the PedsQL Symptoms Scale, expected a posteriori (EAP) values of θ1, the PAIS latent variable, were computed for each summed score on the PedsQL Symptoms Scale. These values are in Table 4, converted to the PAIS reporting scale (with a mean of 50 and standard deviation of 10 in the original PAIS calibration sample). Table 4 may be used as a scoring table to convert scores on the PedsQL Symptoms Scale into scores on the PAIS metric.
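The construction of a scoring table of this kind can be sketched as follows. The example is a simplification under stated assumptions, not the authors' implementation: it ignores the additional LD factors, treats θ1 and θ2 as standard bivariate normal, builds the likelihood of each summed score with a Lord-Wingersky style recursion, projects the resulting θ2 posterior onto θ1 through the latent regression, and converts to a reporting scale with T = 50 + 10θ. The item parameters are hypothetical; only the correlation of 0.96 and the eleven five-category items come from the text.

```python
import math


def graded_probs(theta, a, cs):
    """Graded-model category probabilities for one item (intercepts cs decreasing)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-(a * theta + c))) for c in cs] + [0.0]
    return [cum[u] - cum[u + 1] for u in range(len(cs) + 1)]


def summed_score_likelihoods(theta, items):
    """P(summed score = s | theta) for s = 0..max, built up one item at a time."""
    like = [1.0]
    for a, cs in items:
        probs = graded_probs(theta, a, cs)
        new = [0.0] * (len(like) + len(probs) - 1)
        for s, ls in enumerate(like):
            for u, pu in enumerate(probs):
                new[s + u] += ls * pu
        like = new
    return like


def projected_score_table(items, rho, n_points=81, lo=-4.0, hi=4.0):
    """Rows of (summed score, projected T-score, standard error) on the theta1 metric."""
    grid = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    prior = [math.exp(-0.5 * t * t) for t in grid]  # unnormalised N(0, 1)
    likes = [summed_score_likelihoods(t, items) for t in grid]
    table = []
    for s in range(len(likes[0])):
        post = [likes[k][s] * prior[k] for k in range(n_points)]
        total = sum(post)
        m2 = sum(t * w for t, w in zip(grid, post)) / total
        v2 = sum((t - m2) ** 2 * w for t, w in zip(grid, post)) / total
        m1 = rho * m2                          # projected mean on theta1
        v1 = rho ** 2 * v2 + (1.0 - rho ** 2)  # projected variance on theta1
        table.append((s, 50.0 + 10.0 * m1, 10.0 * math.sqrt(v1)))
    return table


# Hypothetical parameters: eleven identical five-category items.
HYPOTHETICAL_ITEMS = [(2.0, [2.0, 0.5, -1.0, -2.5])] * 11
table = projected_score_table(HYPOTHETICAL_ITEMS, rho=0.96)
```

Each row pairs a raw summed score (0 to 44 for eleven items scored 0 to 4) with an estimated score on the target metric and its standard error, which is the form a lookup table such as Table 4 takes.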
These results represent evidence of convergent and divergent validity for the PAIS: its scores measure very nearly the same construct as that measured by the PedsQL Symptoms Scale, and that construct is relatively distinct from those measured by the PedsQL Asthma Treatment, Worry, and Communication scales, and the DISABKIDS Asthma Worry and Impact scales (although the PAIS construct is closer to the latter).
Calibrated projection has been used to provide linkage of scores on the PedsQL Symptoms Scale with the metric of the PAIS. This linkage takes into account the slight difference between the constructs measured by the two scales, as well as LD that arises from six pairs of items with very similar members on both scales.
Linkage of the PedsQL Symptoms Scale with the PAIS does not reduce the usefulness of the PAIS. For research in which the PAIS measures the intended construct, the PAIS short form provides more precise measurement with eight items than does the PedsQL eleven-item Symptoms Scale; in addition, the PAIS comprises a bank of IRT-calibrated items that may be used to construct other forms or in computerized adaptive testing. So the PAIS itself would be preferred for new research involving its construct.
However, there is a large body of research that has already used the PedsQL Symptoms Scale; that research can be retro-scored onto the PAIS metric for comparability. In addition, some research may have use for the entire four-scale PedsQL Asthma Module; in such contexts the PedsQL Asthma Module may be a more useful choice, with linkage used to provide scores comparable to those obtained with the PAIS in other research.
Test linking has not received extensive treatment in the health outcomes literature. Among health outcomes measures, researchers have linked alternate forms of similar measures of functional health status and self-regulation. Others have attempted linkage of disparate health outcomes measures [28–30]. Those earlier attempts at linking demonstrate the power of IRT, but they also raise questions about when and what to link in measures of health outcomes (in particular, the feasibility and utility of linking apparently dissimilar instruments). Calibrated projection, as used in this study, makes fewer demands on the data (i.e., it does not require unidimensionality across scales). In addition, this research used RMSD statistics to evaluate the viability of linkage, something that has not previously been done with health outcomes scales.
Calibrated projection is not a panacea that permits any scale to be linked with any other. A limitation of linking is that it only makes sense between scales that measure very nearly the same construct. In this research, only one of the legacy scales was linked to the metric of the PAIS; the other five scales appear to measure constructs too different from that of the PAIS to link. An additional limitation of calibrated projection is that it requires item-level data on both scales from the same sample of respondents. Other linkage methods can use data collected with other designs, such as equivalent groups, but those procedures do not permit the validity of the linkage to be empirically checked. Further, projected scores from some other scale are less precise—that is, they have larger standard errors—than scores on the primary scale.
Overall this research provides results illustrating the usefulness of both the PAIS in particular and the PROMIS approach to integrated measurement of HRQoL in general. We anticipate that calibrated projection may be useful to link other legacy scales to the PROMIS metrics as well, permitting health outcomes research to integrate findings from earlier research that used previously existing scales with newer research that uses the more modern scales developed by the PROMIS network.
We would like to acknowledge the contribution of Harry A. Guess, MD, PhD to the conceptualization and operationalization of this research prior to his death. We are grateful to Li Cai for the theoretical development and implementation in software of the two-tier methods for item parameter estimation and the computation of scaled scores, and for his advice on their use in this project. This work was funded by the National Institutes of Health through the NIH Roadmap for Medical Research, Grant 1U01AR052181-01, and by SBIR contract HHSN-2612007-00013C with the National Cancer Institute of the National Institutes of Health. Information on the Patient-Reported Outcomes Measurement Information System (PROMIS) can be found at http://nihroadmap.nih.gov/ and http://www.nihpromis.org.
Conflict of interest: Dr. Varni holds the copyright and the trademark for the PedsQL™ and receives financial compensation from the Mapi Research Trust, which is a nonprofit research institute that charges distribution fees to for-profit companies that use the Pediatric Quality of Life Inventory™.
David Thissen, Department of Psychology, CB# 3270, Davie Hall, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
James W. Varni, Department of Pediatrics, College of Medicine, Texas A&M University, College Station, TX, USA; Department of Landscape Architecture and Urban Planning, College of Architecture, Texas A&M University, College Station, TX, USA.
Brian D. Stucky, Department of Psychology, CB# 3270, Davie Hall, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Yang Liu, Department of Psychology, CB# 3270, Davie Hall, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Debra E. Irwin, Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
Darren A. DeWalt, Division of General Medicine and Clinical Epidemiology and Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.