The aims of the study were to (1) build new item banks for a revised version of the Pediatric Evaluation of Disability Inventory (PEDI) with four content domains: Daily Activities, Mobility, Social/Cognitive, and Responsibility; and (2) use post-hoc simulations based on the combined normative and disability calibration samples to assess the accuracy and precision of the PEDI computerized adaptive tests (PEDI-CAT) in comparison to the administration of all items.
Parents of typically developing children (n=2,205) and parents of children with disabilities (n=703) between the ages of 0 and 21 years, stratified by age and gender, participated by responding to PEDI-CAT surveys through an existing Internet Opt-in Survey Panel in the USA and by computer tablets in clinical sites.
Confirmatory factor analyses supported four unidimensional content domains. Scores using the real data post-hoc demonstrated excellent accuracy (ICCs ≥0.95) with the full item banks. Simulations using item parameter estimates demonstrated relatively small bias in the 10- and 15-item CAT versions; error was generally higher at the scale extremes.
These results suggest the PEDI-CAT can be an accurate and precise assessment of children’s daily functioning at all functional levels.
The Pediatric Evaluation of Disability Inventory (PEDI) was originally developed to provide a parent/clinician-reported functional test that was normed on children up to the age of 7 ½ years.1 The PEDI has a long history of application in developmental medicine, serving as a functional outcome measure for clinical research and practice.2 The original PEDI was a fixed-format test in which all items needed to be administered in order to derive a score. In addition to some finding the administration of all PEDI items potentially burdensome, the restriction of norms to only young children also limited its use.3 The purpose of this article is to report on the revision of the PEDI into a series of computer-adaptive tests (CAT) that cover the 0 to 21 year age range.
CATs are being promoted as the next logical step to more efficient health outcome assessments.4 To employ a CAT, Item Response Theory (IRT) modeling is used to express the association between an individual’s response to an item and the outcome domain. IRT measurement models are a class of statistical procedures used to develop measurement scales. The measurement scales are comprised of items with a known relationship between item responses and positions on an underlying functional continuum. Using this approach, probabilities of children scoring a particular response on an item at various functional ability levels can be modeled. These probability estimates are used to determine the child’s most likely position along the functional dimension. When assumptions of a particular IRT model are met, estimates of a child’s functional ability do not strictly depend on a particular fixed set of items. This scaling feature allows one to compare persons along a functional outcome dimension even if they have not completed the identical set of functional items. CAT also optimizes the items administered, providing items most likely to yield the greatest information for score estimation. CATs will generally require fewer items than a comparable length fixed form instrument to achieve similar precision, although the gains in efficiency may not be uniform throughout the full scale.5
Early work has highlighted the possible benefits of using CAT in the assessment of children’s functioning, including improved efficiency with minimal loss of precision.6–9 We have shown that the original PEDI can be successfully engineered into efficient CAT programs.8, 10 The North American Shriners Orthopedic Hospitals have recently developed a series of parent- and child-reported CAT measures of physical functioning and participation for their network with promising results.11–15
The aims of this project were to build and test new PEDI-CAT item banks for an age range of 0 to 21 years and use simulations based on the combined normative and disability calibration samples to assess the accuracy and precision of the PEDI-CAT.
The targeted normative population of interest for the PEDI-CAT was civilian households in the contiguous United States with children under 21 years of age. Normative data were collected through the internet. An online panel from YouGovPolimetrix (Palo Alto, CA; www.polimetrix.com) was the sample source for a nationally representative sample of 2,205 parents. YouGovPolimetrix operates a panel of over one million respondents who have provided them with their names, street addresses, email addresses, and other information, and who regularly participate in online surveys. Panel members receive modest compensation when they participate in online surveys. Panel members are restricted to respondents with fluency in English.
Quota sampling based on age was used to ensure that sufficient cases of typically developing children and youth were collected within each of the PEDI age-based strata (100 cases in each of the 21 age strata starting with 0–1 and stopping with 20–21). Within each age group, equal proportions of gender were selected and efforts were made to ensure that subjects were representative of the racial/ethnic and socio-economic distribution of the US population according to the Year 2000 Census Bureau data. (See Table 1 for sample characteristics). Eligibility for participation was determined by a series of parent-reported screening items to ensure the child was developing typically. For example, any child with a disability or chronic condition, or receiving specialized services, was considered ineligible for the normative sample. Once eligibility was determined, participation and consent were obtained.
Data for the vast majority of the disability sample (n= 617) were also collected through the internet by YouGovPolimetrix. Eligibility for the disability sample was determined by screening questions which identified children and youth with a physical, cognitive, or behavioral disability. We supplemented the disability sample by collecting parent-reported mobility domain data at two clinical sites that served a wide age range (n= 86) in order to add children with more severe physical disabilities. (See Table 2). Both clinical sites collected parent-reported data from an internet system. The study received approval from each site’s governing institutional review board, and each parent respondent provided informed consent as well as Health Insurance Portability and Accountability Act authorization before participation.
The initial item pools for the PEDI-CAT were developed through a comprehensive review of existing pediatric measures, the published literature on the functional outcomes of children and youth in hospital-based and community settings, and user feedback since the original PEDI’s publication in 1992.16 Feedback regarding content coverage, content relevance, and item clarity was compiled through focus groups, expert reviews and cognitive testing. Details regarding the qualitative procedures used to develop the PEDI-CAT item pool are summarized in Dumas et al.17 The calibration item pool was finally condensed to a total of 298 items (76 Daily Activities, 105 Mobility, 64 Social/Cognitive and 53 Responsibility).
The PEDI-CAT comprises a set of functional activities that are likely to be experienced by children and youth within the context of their daily lives. Functional activity is multidimensional; thus, the PEDI-CAT comprises four independent content domains. Daily Activities is the ability of a child to carry out daily living skills such as eating, dressing, and grooming activities. The Daily Activities domain also includes items related to household maintenance and the operation of electronic devices. Mobility is the ability of a child to move in different environments such as in the home (getting in and out of own bed) or in the community (getting on and off a public bus or school bus). Mobility items range from foundational motor skills of rolling over and sitting unsupported to more physically challenging skills of jumping, running, or carrying heavy objects. Social/Cognitive is the ability to function safely and in effective social exchanges. Social/Cognitive items address communication, interaction, safety, behavior, play, attention, and problem-solving. Responsibility is the extent to which a young person is managing life tasks (e.g. fixing a meal, planning and following a weekly schedule) which are important for the transition to adulthood and independent living. See Appendix for sample items in each domain.
The calibration item pool consisted of 76 Daily Activities, 105 Mobility, 64 Social/Cognitive and 53 Responsibility items. Most parents completed the PEDI-CAT items over the Internet. In some cases (n = 86), parents completed the assessment during their child’s medical or therapy session(s) at one of the two pediatric clinical sites that serve children and youth ages birth to 21 years. IRT analysis does not require complete data on all participants as long as everyone in the sample has answered a common core set of items. Therefore a series of 12 parallel forms were developed so that no one participant responded to more than 175 items. Three quarters of the sample answered items from each domain. A unique set of cases (n=512, 25% of sample) completed all items from each domain. The software program provided introductory information and instructions. Filter questions were included to ensure that parents were not asked questions that did not pertain to their child. For example, if parents indicated their child used a walker, but not a wheelchair, subsequent questions pertaining to the use of a wheelchair were not administered.
Item response theory methods were used to refine the item pool for each PEDI-CAT dimension. Assumptions that are checked prior to IRT modeling include unidimensionality (items measure a single trait), local independence (item responses are unrelated to one another once the trait level is taken into account), and stability or invariance of item parameters across groups (e.g. between normative and disability samples). Violations of these assumptions create sub-optimal modeling of the data and may restrict the accuracy of estimated scores. Our intent was to develop a combined scale for the normative and clinical samples, if possible, so that both groups would be scored on the same metric.
We examined the unidimensionality of all four PEDI-CAT domains by confirmatory factor analysis (CFA) and removed items when necessary over a series of iterative analyses until satisfactory overall fit was achieved. CFA model fit was assessed by multiple fit indexes, including the Comparative Fit Index (CFI), Tucker–Lewis Index (TLI) and Root Mean Square Error of Approximation (RMSEA). CFI and TLI compare the model to a baseline null model; values typically range from 0 to 1, and values of 0.95 or higher suggest acceptable fit. RMSEA assesses misfit per degree of freedom; values less than 0.08 indicate acceptable fit.18 We evaluated local independence by inspecting the residual correlations between items using Mplus19 software.
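As an illustration of how these fit criteria can be applied, the following minimal Python sketch computes CFI, TLI and RMSEA from model and baseline chi-square statistics using the standard textbook formulas. The function names and the numerical inputs are hypothetical and are not taken from the study. Software packages differ in details (for example, some use N rather than N - 1 in the RMSEA denominator, and estimators for ordinal data adjust the chi-square), so this is a sketch under stated assumptions rather than a reproduction of the study's output.

```python
import math


def fit_indices(chi2_model, df_model, chi2_base, df_base, n):
    """Return (CFI, TLI, RMSEA) from model and baseline (null) chi-squares."""
    # Non-centrality estimates, floored at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom
    # TLI compares chi-square/df ratios of the baseline and fitted models.
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)
    # RMSEA: misfit per degree of freedom, scaled by sample size (N - 1 form).
    rmsea = math.sqrt(d_model / (df_model * (n - 1)))
    return cfi, tli, rmsea


def acceptable_fit(cfi, tli, rmsea):
    """Apply the cut-offs quoted in the text: CFI/TLI >= 0.95, RMSEA < 0.08."""
    return cfi >= 0.95 and tli >= 0.95 and rmsea < 0.08


# Hypothetical inputs, for illustration only.
cfi, tli, rmsea = fit_indices(chi2_model=300.0, df_model=100,
                              chi2_base=5000.0, df_base=120, n=500)
ok = acceptable_fit(cfi, tli, rmsea)
print(round(cfi, 3), round(tli, 3), round(rmsea, 3), ok)
```

With these made-up inputs all three indexes fall on the acceptable side of the cut-offs.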
We developed item parameters using the two-parameter logistic Graded Response Model (GRM) with PARSCALE.20 To test for item fit, we used likelihood ratio chi-square statistics to test each item based on the comparison of the expected and observed values across the distribution of the latent variable; a p value less than 0.05 indicated misfit.20 We also used the Residual-Plot program21 to assess item fit as it provided both item and category fit plots. We identified item misfit based on whether both the residual plots and chi-square statistics exceeded published standards. We examined Differential Item Functioning (DIF) based on the logistic regression model,22 which determines whether there are significant differences in item calibrations between samples. We were particularly interested in DIF between the normative and disability samples.
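To make the graded response model concrete, the sketch below computes category response probabilities for a single item under Samejima's GRM. The discrimination `a`, the thresholds and the function name are illustrative assumptions, not PEDI-CAT calibrations. The plain logistic metric is used (no 1.7 scaling constant), which may differ from the PARSCALE settings used in the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta      : person location on the latent (logit) scale
    a          : item discrimination
    thresholds : ordered category boundaries b_1 < b_2 < ... < b_m
    Returns a list of m + 1 probabilities, one per response category.
    """
    # Cumulative probabilities P(X >= k); by definition P(X >= 0) = 1
    # and P(X >= m + 1) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # Each category probability is the difference of adjacent cumulatives.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical four-category item evaluated at theta = 0.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.2])
print([round(p, 3) for p in probs])
```

The probabilities sum to one for any `theta`, and the most likely category shifts upward as `theta` increases, which is the relationship between response and functional level that the calibration estimates.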
Simulations were used to assess the accuracy and precision of the PEDI-CAT in comparison to the administration of all items in the domain banks. In simulations using real data, we estimated the subjects’ scores based on the response patterns from the calibration data using the Health and Disability Research Institute (HDRI) software.23 As items were selected for administration in the simulation, responses were taken from the actual data set. After each response, an estimated score based on all administered items to that point in the simulation and the associated standard error was calculated. The selection of the next item was based on the item that could provide the most information at the estimated score. For these simulations, we established specific stop-rules based on the number of items (5, 10, or 15-item versions for each PEDI-CAT domain). For each subject this procedure produced one simulated record of responses for each 5-, 10-, and 15-item CAT version.
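The real-data simulation loop described above can be sketched as follows. This is a minimal illustration, not the HDRI software: the item bank, the stored responses and all function names are hypothetical, and the score estimator (expected a posteriori with a standard normal prior on a fixed grid) is an assumption, since the text does not state which estimator was used.

```python
import math

GRID = [-4.0 + 0.1 * i for i in range(81)]  # quadrature points (logits)


def cum_probs(theta, a, bs):
    """Cumulative GRM probabilities P(X >= k), padded with 1 and 0."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]


def cat_probs(theta, a, bs):
    c = cum_probs(theta, a, bs)
    return [c[k] - c[k + 1] for k in range(len(c) - 1)]


def item_info(theta, a, bs):
    """Fisher information of a GRM item: sum over categories of P'^2 / P."""
    c = cum_probs(theta, a, bs)
    info = 0.0
    for k in range(len(c) - 1):
        p = c[k] - c[k + 1]
        dp = a * (c[k] * (1 - c[k]) - c[k + 1] * (1 - c[k + 1]))
        if p > 1e-12:
            info += dp * dp / p
    return info


def eap(bank, answered):
    """Posterior mean and SD of theta given the responses so far."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for item, x in answered.items():
            a, bs = bank[item]
            w *= cat_probs(t, a, bs)[x]
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(bank, responses, n_items):
    """Fixed-length CAT: pick the most informative unused item each step."""
    answered = {}
    theta, se = eap(bank, answered)
    while len(answered) < n_items:
        remaining = [i for i in bank if i not in answered and i in responses]
        if not remaining:
            break
        nxt = max(remaining, key=lambda i: item_info(theta, *bank[i]))
        answered[nxt] = responses[nxt]  # response taken from stored data
        theta, se = eap(bank, answered)
    return theta, se, list(answered)


# Hypothetical bank: item -> (discrimination, thresholds); categories 0..3.
bank = {
    "item_a": (1.8, [-2.0, -1.0, 0.0]),
    "item_b": (2.2, [-0.5, 0.3, 1.0]),
    "item_c": (1.2, [0.5, 1.5, 2.5]),
    "item_d": (2.0, [-3.0, -2.5, -1.5]),
    "item_e": (1.5, [1.0, 2.0, 3.0]),
}
responses = {"item_a": 3, "item_b": 2, "item_c": 1, "item_d": 3, "item_e": 0}
theta, se, order = run_cat(bank, responses, n_items=3)
print(order, round(theta, 2), round(se, 2))
```

Changing `n_items` to 5, 10 or 15 against a realistically sized bank gives the fixed-length stop rules contrasted in the study. A precision-based stop rule would instead test `se` against a target inside the loop.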
In a second series of simulations to evaluate the performance of the PEDI-CAT, Monte Carlo CAT simulations were conducted based only on the item parameters. For each logit point from −4 to +4 at 0.5 intervals, we simulated 100 subjects’ response patterns to the item bank. We converted the IRT logit metric to the more conventional PEDI-CAT scoring metric of 20–80. As in the real data simulations, we contrasted 5-, 10-, and 15-item stop rule versions of the PEDI-CAT. Using the full-bank score as the reference, we chose the following as evaluation criteria: the average standard error (level of measurement precision, defined as the reciprocal of the square root of the information function), bias (difference between the scores estimated from the CAT and the full item bank), absolute bias (absolute difference between the scores estimated from the CAT and the full item bank), and root mean square error (RMSE) (square root of the mean squared difference between the scores estimated from the CAT and the full item bank).24–26 In addition, we provided an assessment of RMSE for the three CAT versions and the full item bank at selected intervals along each of the PEDI-CAT scales. We consider RMSE of <0.30 as the criterion for precise measurement based on a 20–80 score metric.5
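The evaluation criteria can be written out as a short Python sketch. The score values below are invented for illustration and the function names are hypothetical. The standard error is computed with the usual IRT relation SE = 1 / sqrt(information), and the scores are assumed to be already on the 20–80 metric, because the conversion from logits is not specified in the text.

```python
import math


def evaluate(cat_scores, full_scores):
    """Bias, absolute bias and RMSE of CAT scores against full-bank scores."""
    diffs = [c - f for c, f in zip(cat_scores, full_scores)]
    n = len(diffs)
    return {
        "bias": sum(diffs) / n,                          # signed mean difference
        "abs_bias": sum(abs(d) for d in diffs) / n,      # mean absolute difference
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
    }


def standard_error(information):
    """Standard error of a score estimate from the test information."""
    return 1.0 / math.sqrt(information)


# Hypothetical scores for four simulees at one scale location.
cat_scores = [50.2, 49.7, 51.0, 48.9]
full_scores = [50.0, 50.0, 50.5, 49.5]
metrics = evaluate(cat_scores, full_scores)
precise = metrics["rmse"] < 0.30  # criterion quoted in the text
print(metrics, precise, standard_error(4.0))
```

Bias can be close to zero while absolute bias and RMSE remain sizeable, because positive and negative differences cancel in the signed mean. This is why the three criteria are reported together.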
We found sound evidence for the unidimensionality of each of the PEDI-CAT domains. (See Table 3). When the items of each domain were modeled as a single factor, each PEDI-CAT domain had high fit indices (CFI all >.95; TLI all >.99), small error estimates (RMSEA between .05 and .08), and accounted for more than 80% of the total variance. In addition, the ratios of the first to second factor eigenvalues were very high in all four domains, again indicating that each domain had one dominant factor.
Although the CFA results were favorable, some item-level irregularities were noted and addressed. Twenty-two items across the four domains (Daily Activities-8; Mobility-8; Social/Cognitive-4; Responsibility-2) were removed due to excessive local dependence (item residual correlations >.2), poor fit (likelihood ratio chi-square p <.05), or significant DIF across samples. We did retain 16 items (Daily Activities-2; Mobility-9; Social/Cognitive-3; Responsibility-2) with either poor fit or DIF, primarily based on content expert feedback. Removal of these items would have created unacceptable floor effects and removed key content. The final PEDI-CAT item banks consisted of 68 Daily Activities items, 97 Mobility items, 60 Social/Cognitive items and 51 Responsibility items. All subsequent CAT analyses were conducted with the normative and clinical samples combined.
Based on the real data simulations, correlations of the domain scores from the full item banks and the three CAT versions were very high. See Table 4. All correlations were 0.95 or above, including the 5-item CAT versions.
Using simulations based only on the item parameter estimates with 100 random replications, we calculated the average standard error (SE), bias, absolute bias and RMSE across selected scoring intervals for the PEDI-CAT 20–80 scoring metric. The 15-item CAT versions all have lower SE differences from the full item banks than the 10- or 5-item CAT versions, with the 10-item CAT in the middle. We note that the Responsibility scale has generally greater standard errors than the other three PEDI-CAT scales. There are fairly large differences in bias and absolute bias between the 5-item CAT versions and the 10- and 15-item versions, with the 15-item version always the closest to the full item bank values.
Figure 1 illustrates that PEDI-CAT measurement precision is dependent upon the particular domain and score estimate. In general, measurement precision is much better in the mid-range of each scale than in the extremes. Using the RMSE value of <0.30 as a general criterion for precise measurement, all 5-item CAT versions have relatively high error across all domains at the floor and ceiling extremes. This pattern also occurs in the scoring intervals for the Responsibility scale. RMSEs for all the CAT versions and the full item banks are extremely small in the mid-ranges of the scales, indicating smaller measurement bias in the mid-ranges of the scales than at the extremes.
All PEDI-CAT domains represented unidimensional constructs. Collectively they provide the foundation for a broad set of item banks. Up to now, pediatric clinical programs have had to use multiple assessments that change as the child gets older. A measure such as the PEDI-CAT is needed so that one instrument can be used across all ages and levels of disability. This feature would allow assessment of outcomes for research and program evaluation purposes over the full span of childhood and adolescence as well as across populations using a common metric.
Item banks should include items that fit closely with the domain and are spread evenly throughout the range of function. We did allow some items with DIF or fit problems across samples (less than 6% of the total items) to enter into the item bank. This decision was made based on content importance and was applied particularly in the Daily Activities and Mobility scales. This decision will be examined over time to see how frequently the CATs are presenting these items.
We found very high correlations between the full item bank scores and the PEDI-CAT scores for each of the 5-, 10- and 15-item versions. This is a fairly minimal requirement; these relationships are likely an over-estimate because we are using the same responses for the calibrations and the CAT scoring estimates. As in other studies,8, 10 the results suggest that although scores from the 15-item CAT are closer to the full item bank scores in all instances, the differences in correlations between the 10- and 5-item CATs and the full item banks are relatively small. Similarly, the difference in bias (scaled score in 20–80 metric) between the scores from the 15-item CAT and the full item set is very small; bias increases slightly with the 10-item CATs, and then gets larger with the 5-item CATs.
We also examined conditional precision (Figure 1) as measured by the root mean squared error (RMSE) for the CAT strategies at selected scale locations (100 simulees per location). The 10- and 15-item CAT versions yielded excellent measurement precision throughout most of the content range, although the 5-item CAT had difficulty at the extremes. Measurement in the mid-range of the scales, regardless of the number of CAT items, was very precise.
Findings suggest that the 15-item PEDI-CAT can be used as an accurate measure of function in clinical outcome measurement and clinical trials, reducing the burden typically placed on both parent respondents and research protocols when full item banks are administered. Other work by our group has shown that the mean time to complete 15 items in each domain (60 items total) is around 12.6 minutes.27 Additionally, the negligible amount of bias and the levels of precision in child estimates obtained from the 10-item CATs suggest that they will provide clinicians with a sound measure of function to inform intervention planning with a significant savings of time. Although we did not examine responsiveness to change in the current study, we believe that the PEDI-CAT should have sensitivity to change that is at least as good as the original PEDI,28 given the results of analyses to date.
This report describes the development of a CAT approach to measuring functional performance with the new PEDI-CAT. The four domains of the PEDI-CAT demonstrated good unidimensionality and IRT fit. All CATs were accurate, showed small bias except for the 5-item PEDI-CAT, and provided extremely good measurement in the middle of the range. The PEDI-CAT is a broad instrument that covers a wide age range and meets precision requirements for research and clinical practice uses for children with functional difficulties affecting daily routines, mobility and social/cognitive skills. Future work is needed to examine additional validity aspects of the PEDI-CAT, such as responsiveness and feasibility of use by parents with more limited English skills.
Supported by: NIH/NICHD/NCMRR grants R42HD052318 (STTR Phase II award) and K02 HD45354-01 (an Independent Scientist Award to Dr. Haley).
|Domain||Content Areas||Sample Item||Response Scale|
|Daily Activities (68 items)||Eating & Mealtime||Inserts a straw into a juice box||Please choose which response below best describes your child’s ability in the following:|
|Getting Dressed||Puts on winter, sport, or work gloves|
|Keeping Clean||Puts toothpaste on brush and brushes teeth thoroughly|
|Home Tasks||Opens door lock using key|
|Mobility (97 items)||Basic Movement & Transfers||When lying on back, turns head to both sides|
|Standing & Walking||Walks while wearing a light backpack|
|Steps & Inclines||Goes up and down an escalator|
|Running & Playing||Pulls self out of swimming pool not using ladder|
|Wheelchair||Goes up and down ramp with wheelchair|
|Social/Cognitive (60 items)||Interaction||Greets new people appropriately when introduced|
|Communication||Writes short notes or sends text messages or email|
|Everyday Cognition||Recognizes his/her printed name|
|Self Management||When upset, responds without punching, hitting, or biting|
|Responsibility (51 items)||Organization & Planning||Keeping personal electronic devices in working order (e.g., cell phone, computer). Includes: Having devices charged and available when needed; Updating software||How much responsibility does your child take for the following activities?|
|Taking Care of Daily Needs||Buying clothing at a store, from a catalog or online. Includes: Purchasing clothing, including outerwear and undergarments|
|Health Management||Following health and medical treatment requirements. Includes: Taking prescribed medication as directed; Following dietary restrictions; Adhering to exercise or other treatment routines|
|Staying Safe||Using the internet safely. Includes: Recognizing scams and inappropriate approaches from strangers; Avoiding posting inappropriate images; Evaluating safety of files before downloading|
Conflict of Interest Statement: Dr. Haley and Mr. Moed own founders stock in CRECare, LLC, which distributes the PEDI-CAT products.
Stephen M. Haley, Health & Disability Research Inst., School of Public Health, Boston University, Boston, MA, USA.
Wendy J. Coster, Department of Occupational Therapy, Sargent College of Health & Rehabilitation Sciences, Boston University, Boston, MA, USA.
Helene M. Dumas, Research Center for Children with Special Health Care Needs, Franciscan Hospital for Children, Boston, MA, USA.
Maria A. Fragala-Pinkham, Research Center for Children with Special Health Care Needs, Franciscan Hospital for Children, Boston, MA, USA.
Jessica Kramer, Occupational Therapy, Boston University Sargent College of Health & Rehabilitation Sciences.
Pengsheng Ni, Health & Disability Research Inst., Boston University School of Public Health.
Feng Tian, Health & Disability Research Inst., Boston University School of Public Health.
Ying-Chia Kao, Department of Occupational Therapy, Boston University Sargent College of Health & Rehabilitation Sciences.
Rich Moed, CREcare LLC, Newburyport, MA, USA.
Larry H. Ludlow, Department of Educational Research, Measurement and Evaluation, Boston College Lynch School of Education, Chestnut Hill, MA, USA.