In analyzing data from a larger study, we noticed significant disagreement between results of 2 commonly used developmental screening tools (Parents’ Evaluation of Developmental Status [PEDS; parent concern questionnaire] and Ages & Stages Questionnaires [ASQ; parent report of developmental skills]) delivered to children at the same visit in primary care. The screens have favorable reported psychometric properties and can be efficient to use in practice; however, there is little comparative information about the relative performance of these tools in primary care. We sought to describe the agreement between the 2 screens in this setting.
Parents of 60 children aged 9 to 31 months completed PEDS and ASQ screens at the same visit. Concordance (PEDS and ASQ results agree) and discordance (results differ) for the 2 screens were determined.
The mean age of children was 17.6 months, 77% received Medicaid, and 50% of parents had a high school education or less. Overall, 37% failed the PEDS and 27% failed the ASQ. Thirty-one children (52%) passed both screens, 9 (15%) failed both, and 20 (33%) failed 1 but not the other (13 failed only PEDS and 7 failed only ASQ). Agreement between the 2 screening tests was only fair (κ = 0.24) and was not statistically different from agreement expected by chance.
There was substantial discordance between PEDS and ASQ developmental screens. Although these are preliminary data, clinicians need to be aware that in implementing revised American Academy of Pediatrics screening guidelines, the choice of screening instrument may affect which children are likely to be identified for additional evaluation.
PEDS and ASQ are coming into increasing use and have favorable reported psychometric properties; however, little information is available about how these tools perform in actual practice settings or how they compare with each other.
Significant discordance between the results of PEDS and ASQ screens was found. This has potentially important implications for providers as they comply with revised AAP developmental screening recommendations and choose developmental screening tools for their practice.
Developmental delay is an important problem affecting 10% to 15% of young children,1,2 with significantly higher rates among children who live in poverty.3,4 Early detection and intervention for developmental conditions such as autism, speech and language disorders, and cognitive disabilities have been shown to improve long-term academic and behavioral outcomes for affected children5–7; however, many children are not identified until school age, thereby missing treatments that are known to improve outcomes.8,9
In 2006, the American Academy of Pediatrics (AAP) issued a policy statement recommending systematic developmental screening in primary care, using a validated tool, at 9, 18, and 30 months of age.10 Compared with traditional provider-administered screens, newer parent-completed screening questionnaires with favorable reported psychometric properties address a major barrier to screening: provider time.11,12 Two such screens, Parents’ Evaluation of Developmental Status (PEDS)13 and Ages & Stages Questionnaires (ASQ),14 are emerging as the tools of choice in many practices.15–17 On the basis of their quality and usability in practice, they are on a short list of recommended instruments18,19; however, data on the performance of developmental screens are based largely on standardization samples.20–22 Although some participants in standardization studies were recruited from primary care, screens and gold standard measures were delivered in research settings.21,23 Few studies have examined the performance of these screens in general primary care settings15–17,24 or compared them with each other.
PEDS and ASQ represent different approaches to gathering parent observations on children's development to identify children who are at risk for developmental delays: asking parents about developmental concerns (PEDS) versus the child's specific skills (ASQ; Table 1). Both are general developmental screens rather than condition-specific tools, with adequate reported sensitivity and specificity, although these are somewhat more favorable for ASQ.10 Both are intended to identify a similar group of children at high risk for developmental problems. Recommendations for the choice of tool have focused on provider preference. We are aware of only 1 study that has compared the performance of 2 parent-completed general developmental screening tools head-to-head in the primary care setting.24
In this study, we report on the level of agreement between PEDS and ASQ screeners in a sample of children whose parents completed both at the same visit. The data are part of a larger study that was not originally intended to answer this question.25 During data analysis, we were surprised to find that PEDS and ASQ screening results were not consistent for many participants. Given that the 2 tests have similar psychometric properties and are intended to identify a similar group of children, we had expected to find good agreement between the 2 on classifications of children's development. We believe that it is important to document and describe these differences, to provide information to clinicians, and to spark additional research in this area of practice.
Participants were 6 primary care pediatricians who took part in a larger study of the effect of introducing a screening tool on parent–provider communication about child development.25 Providers (4 female and 2 male) were at an academically affiliated, community hospital–based practice in Northeast Ohio that serves mainly urban, low-income families. Providers had been in practice an average of 12.0 years (SD: 7.9 years) and had a mean age of 42.2 years (SD: 6.5 years).
Parents of children 9 to 31 months of age were recruited at well-child visits. In the overall study, 89 parents (82% of those potentially eligible) agreed to participate.25 Twenty-nine of the 89 participants, those enrolled in the initial “usual care” phase that did not include use of PEDS, were excluded from this analysis because they lacked relevant screening data. For this study, 10 parent–child pairs per participating physician (n = 60) were included.
Research staff approached parents in the waiting room to describe the study, review inclusion criteria, and obtain informed consent. Parents/legal guardians who spoke English were included. Children were excluded when they had a previous diagnosis of developmental delay or known developmental condition, were enrolled in Early Intervention, or were born >8 weeks preterm. Parents received a $25 gift certificate as compensation for their time.
Providers participated in a 1-hour training session on the use and interpretation of PEDS13 and used PEDS clinically for at least 2 half-day sessions before study recruitment. For each physician, 5 visits with different parent–child dyads were sampled by using PEDS alone (group A; n = 30), and 5 visits were sampled by using a video followed by PEDS (group B; n = 30). The video described developmental skills that are expected for most children of the child's age and encouraged parents to raise questions with the provider.
To address literacy barriers, research staff offered to read study questionnaires aloud to all parents. Parents completed a demographic questionnaire and PEDS before the visit. Providers reviewed PEDS before the visit. Parents completed ASQ immediately after the visit, in a secondary reception area in the practice. Because of concern about parent and child fatigue at the end of the visit, research staff read ASQ questions aloud with parents. As described in ASQ administration instructions,14 simple toys needed to answer questionnaire items were available for the parent to use, if needed. ASQ was scored by study staff, and results were shared with physicians and parents by letter.
After each visit, physicians completed a 1-page checkbox form assessing the child's developmental status (no concern versus concern for delay) in multiple areas (gross motor, fine motor, expressive language, receptive language, cognitive, and social skills) and behavioral concerns. Physicians were aware of PEDS but not ASQ results in making clinical assessments.
PEDS13,21 is a validated 10-item questionnaire that elicits parental concerns in multiple developmental areas and takes 2 to 5 minutes to complete.13 According to manual instructions, PEDS was scored as positive (failed) when parents expressed ≥1 predictive/significant concern and as negative (passed) when parents expressed no predictive/significant concerns. PEDS has moderate sensitivity (79%) and specificity (80%)21 and performs well compared with other developmental screening tools.10,26,27 Providers did not correct for prematurity in scoring PEDS. After data collection, we compared PEDS scoring for chronological and corrected age and found no difference in score in any case.
Although we understand that, in practice, some clinicians use ≥2 predictive concerns as the cutoff for a failed screen on PEDS, we scored PEDS per manual instructions, with a cutoff of ≥1 predictive concern. (If clinicians take additional action only when there are ≥2 predictive concerns, they would miss approximately half of children with developmental delays identifiable by PEDS, significantly reducing its sensitivity.)
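The PEDS scoring rule used here reduces to a threshold on the number of predictive/significant concerns. The following is a minimal Python sketch. It assumes the parent's responses have already been coded into a count of predictive concerns; that coding follows the PEDS manual and is not modeled here. The names `peds_failed` and `cutoff` are hypothetical and are not part of any PEDS software.

```python
def peds_failed(predictive_concerns: int, cutoff: int = 1) -> bool:
    """Return True when a PEDS screen counts as positive (failed).

    predictive_concerns: number of predictive/significant concerns the parent
        expressed (assumed to be coded beforehand per the PEDS manual).
    cutoff: minimum number of such concerns for a failed screen; 1 is the
        manual rule used in this study, 2 is the alternative clinical rule.
    """
    if predictive_concerns < 0:
        raise ValueError("concern count cannot be negative")
    return predictive_concerns >= cutoff


# A child with exactly 1 predictive concern fails under the manual rule but
# passes under the >=2 alternative, which is how that rule lowers sensitivity.
print(peds_failed(1))            # True
print(peds_failed(1, cutoff=2))  # False
```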
ASQ, 2nd edition,14,23 is a series of 19 age-based, parent-completed questionnaires. Each consists of 30 questions about children's current skills in 5 areas of development and yields a pass/fail result. (The third edition of ASQ was published in June 2009.) The questionnaire takes 10 to 15 minutes for parents to complete. ASQ has moderate to good sensitivity (0.70–0.90) and specificity (0.76–0.91).14,22 The cutoff for a positive screen (2 SD below the mean on ASQ) is set at 1.5 SD below the mean when compared with a professionally administered standardized test.14 We also understand that, when scores are below but near the cutoff point, some clinicians consider an ASQ screen failed only when 2 domains (rather than 1) fall below the cutoff. This practice can reduce the sensitivity of the tool; we scored ASQ as recommended in the manual.
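The ASQ pass/fail rule can be sketched in the same way. A child fails when at least one domain score falls below that domain's cutoff for the age form used. This Python sketch makes several simplifying assumptions. Domain totals are already computed. The cutoff values are hypothetical placeholders, because real cutoffs come from the ASQ manual and differ by form. The `min_failed_domains=2` option is a simplified stand-in for the clinical alternative and ignores its "near the cutoff" qualification. The function and argument names are illustrative only.

```python
from typing import Mapping

# The five ASQ developmental areas.
ASQ_DOMAINS = ("communication", "gross_motor", "fine_motor",
               "problem_solving", "personal_social")


def asq_failed(scores: Mapping[str, float], cutoffs: Mapping[str, float],
               min_failed_domains: int = 1) -> bool:
    """Return True when an ASQ screen counts as positive (failed).

    scores: total score per domain for one child (assumed already summed).
    cutoffs: cutoff per domain for the age form used (real values must come
        from the ASQ manual; none are supplied here).
    min_failed_domains: 1 for the manual rule used in this study, 2 for the
        simplified alternative rule.
    """
    failed = [d for d in ASQ_DOMAINS if scores[d] < cutoffs[d]]
    return len(failed) >= min_failed_domains


# Hypothetical placeholder cutoffs and scores, for illustration only.
cutoffs = {d: 30.0 for d in ASQ_DOMAINS}
scores = {"communication": 20.0, "gross_motor": 50.0, "fine_motor": 45.0,
          "problem_solving": 40.0, "personal_social": 55.0}
print(asq_failed(scores, cutoffs))                        # True
print(asq_failed(scores, cutoffs, min_failed_domains=2))  # False
```

In this example, 1 failed domain (communication) is enough to fail the screen under the manual rule but not under the 2-domain alternative.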
The ASQ form closest to the child's age was selected according to manual instructions.14 Corrected age was used for children who were born 4 to 8 weeks preterm by parents’ report: weeks preterm were multiplied by 7 to determine days of prematurity. This number was subtracted from the child's chronological age, and the appropriate ASQ form was selected on the basis of corrected age. No correction was made for children who were born 1 to 3 weeks early. The conduct of this study was approved by institutional review boards at University Hospitals of Cleveland and MetroHealth Medical Center (Cleveland, OH) and Boston Medical Center (Boston, MA).
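The age-correction rule for ASQ form selection can be expressed as 2 small functions. This Python sketch makes the following assumptions:
- Ages are given in days.
- Days are converted to months with an average month length of 30.44 days.
- The form whose nominal age in months is nearest is chosen.

The manual's actual age windows are not reproduced here, and the list of form ages in the example is a hypothetical subset. The function names are illustrative.

```python
from typing import Sequence


def corrected_age_days(chronological_age_days: int, weeks_preterm: int) -> int:
    """Age used to select the ASQ form, following the rule described above.

    4 to 8 weeks preterm: subtract weeks_preterm * 7 days.
    0 to 3 weeks early: no correction.
    >8 weeks preterm: excluded from the study, so rejected here.
    """
    if weeks_preterm < 0:
        raise ValueError("weeks_preterm cannot be negative")
    if weeks_preterm > 8:
        raise ValueError("children born >8 weeks preterm were excluded")
    if weeks_preterm >= 4:
        return chronological_age_days - weeks_preterm * 7
    return chronological_age_days


def closest_form(age_days: int, form_ages_months: Sequence[int],
                 days_per_month: float = 30.44) -> int:
    """Pick the form whose nominal age in months is closest to the child's age.

    days_per_month is an assumed conversion factor (average month length).
    Ties go to the earlier form in form_ages_months.
    """
    age_months = age_days / days_per_month
    return min(form_ages_months, key=lambda m: abs(m - age_months))


# A 365-day-old child born 6 weeks preterm: 365 - 42 = 323 days (about 10.6
# months), so the 10-month form is chosen from this hypothetical subset.
age = corrected_age_days(365, 6)
print(age, closest_form(age, [8, 10, 12, 14]))  # 323 10
```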
Demographic characteristics of groups A and B were compared by using t tests for continuous variables and Pearson's χ2 tests for categorical variables. We used Pearson's χ2 test to determine whether the proportions of failed PEDS and ASQ screens were similar in the 2 groups. There was no difference in the proportion of failed screens between groups (P = .59 for PEDS; P = .56 for ASQ), nor was there a difference between groups in the agreement of the 2 screens (21 [70%] of 30 children in group A passed both or failed both screens [ie, had concordant screening results] compared with 19 [63%] of 30 in group B); data were therefore combined (N = 60 overall).
We used McNemar's test for dependent proportions to determine whether the proportion of children who failed each test, PEDS and ASQ, was similar. Finally, we used Cohen's κ for interrater agreement to determine whether the agreement between the 2 tests was greater than that expected by chance.
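Both statistics can be computed directly from the paired 2 × 2 table of pass/fail results. This Python sketch uses only the standard library and implements 2 procedures:
- the continuity-corrected form of McNemar's test (1 df chi-square);
- Cohen's κ for 2 binary ratings.

Applied to the counts in the Results, this variant of McNemar's test gives P ≈ .26. The exact binomial version gives a similar value. The source does not state which variant was used. The standard error of κ (and its P value) is not computed here, because several SE formulas are in use. The function names are illustrative.

```python
import math
from typing import Tuple


def mcnemar_continuity(b: int, c: int) -> Tuple[float, float]:
    """McNemar's test with continuity correction for paired binary outcomes.

    b, c: the two discordant cell counts (e.g., failed PEDS only and
    failed ASQ only). Returns (chi-square statistic, two-sided P value).
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; test is undefined")
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def cohens_kappa_2x2(both_fail: int, fail_1_only: int, fail_2_only: int,
                     both_pass: int) -> Tuple[float, float, float]:
    """Cohen's kappa for two pass/fail tests applied to the same children.

    Returns (observed agreement, chance-expected agreement, kappa).
    """
    n = both_fail + fail_1_only + fail_2_only + both_pass
    p_obs = (both_fail + both_pass) / n
    fail_1 = (both_fail + fail_1_only) / n  # marginal fail rate, test 1
    fail_2 = (both_fail + fail_2_only) / n  # marginal fail rate, test 2
    # Chance agreement: both fail by chance plus both pass by chance.
    p_exp = fail_1 * fail_2 + (1 - fail_1) * (1 - fail_2)
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return p_obs, p_exp, kappa


# Counts from the Results: 9 failed both, 13 failed PEDS only,
# 7 failed ASQ only, 31 passed both.
stat, p = mcnemar_continuity(13, 7)
p_obs, p_exp, kappa = cohens_kappa_2x2(9, 13, 7, 31)
print(f"McNemar chi2 = {stat:.2f}, P = {p:.2f}")  # McNemar chi2 = 1.25, P = 0.26
print(f"p_o = {p_obs:.2f}, p_e = {p_exp:.2f}, kappa = {kappa:.2f}")  # p_o = 0.67, p_e = 0.56, kappa = 0.24
```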
Because comparing agreement between the 2 screens was not the goal of the original study, we did not conduct a power calculation to determine sample size; however, post hoc calculations were conducted. The assumed scenario for each calculation was 80% power, a type I error of 5%, and 80% of children passing both screens. Under that scenario, the required samples of participants who completed both tests were:
- 22 participants to test the hypothesis of substantial agreement (Cohen's κ = 0.6);
- 33 participants for a test of good agreement (κ = 0.5);
- 53 participants for a test of moderate agreement (κ = 0.4).

Fewer participants would be required if the proportions passing were lower, as in our sample (63% passed PEDS and 73% passed ASQ). Our sample size of 60 is therefore sufficient for this analysis.
There were no significant differences in distribution of demographic characteristics or rate of discordant screens between the 2 groups, so combined data are presented. The mean age of children was 17.6 months (SD: 6.1 months); 42% were female, and 77% were Medicaid insured. Ninety-five percent of parents were mothers; mean parent age was 26.5 years (SD: 5.6 years); 43% were black, 45% were white, and 12% were of other race. Eight percent were of Hispanic/Latino ethnicity. Half of parents had a high school education or less, and 33% were married.
PEDS identified 37% of children (22 of 60) as being at increased risk for developmental problems, whereas ASQ identified 27% (16 of 60; Table 2). Physicians indicated developmental concerns in 22% of cases (13 of 60). The proportions of children identified by the 2 tests were not statistically different (McNemar's test: P = .26). Overall, 31 (52%) children passed both tests, and 9 (15%) failed both. Twenty (33%) failed 1 but not the other. On the basis of the proportion of children identified by each test, the agreement between the 2 screening tests expected by chance was 56%. The observed agreement between the 2 tests was 67%, with a Cohen's κ of 0.24 (SE: 0.13; P = .06), indicating only fair agreement (κ = 0.21–0.40) between the 2 tests.28 On the basis of the P value associated with the κ statistic, the agreement between the 2 tests was not statistically different from that expected by chance.
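The 56% chance-expected agreement and the κ value above follow from the 2 × 2 table of results (31 passed both, 9 failed both, 13 failed PEDS only, 7 failed ASQ only). In the worked check below, $p_o$ is the observed agreement. $p_e$ is the agreement expected by chance, computed from the marginal pass rates (38 of 60 for PEDS, 44 of 60 for ASQ) and fail rates (22 of 60 for PEDS, 16 of 60 for ASQ):

$$
\begin{aligned}
p_o &= \frac{31 + 9}{60} = \frac{40}{60} \approx 0.667 \\
p_e &= \underbrace{\frac{38}{60}\cdot\frac{44}{60}}_{\text{both pass}} + \underbrace{\frac{22}{60}\cdot\frac{16}{60}}_{\text{both fail}} = \frac{1672 + 352}{3600} = \frac{2024}{3600} \approx 0.562 \\
\kappa &= \frac{p_o - p_e}{1 - p_e} \approx \frac{0.667 - 0.562}{1 - 0.562} \approx 0.24
\end{aligned}
$$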
Twenty (33%) of 60 children had discordant screening results. Among the 29 children with at least 1 positive screen (PEDS, ASQ, or both), 9 screened positive on both and 20 (69% of the 29) had discordant screening results. Because the specificity of PEDS for children with a single predictive concern is limited, the author of PEDS recommends using a secondary screen in this case; data on the success of this approach are not provided.21 With ASQ results considered such a secondary screen, 10 of 16 children who failed PEDS with only 1 predictive concern passed ASQ, but 3 of 6 children with ≥2 predictive concerns on PEDS also passed ASQ (Table 3).
Including physicians’ rating of development (which incorporated review of PEDS) did not seem to affect the concordance between PEDS and ASQ. Physician ratings were discordant with PEDS in 13 of 60 cases (for these 13 children, 10 passed the physician rating and ASQ but failed PEDS; 1 passed the physician rating but failed ASQ and PEDS; and 2 failed the physician rating and ASQ but passed PEDS). Physician ratings were discordant with ASQ in 9 of 60 cases (for these 9 children, 5 passed the physician rating and PEDS but failed ASQ, 1 passed the physician rating but failed PEDS and ASQ, and 3 failed the physician rating and PEDS but passed ASQ [Table 4]). Domain-specific information for the 29 children who failed at least 1 screen is presented in Table 5. No single pattern emerges, although differences in the area of language/communication skills are noted in many of the discordant results.
In this study, children from mainly low-income backgrounds were screened by using both PEDS and ASQ developmental screens in the primary care setting. Although the study was not originally designed to examine agreement between the screens, we found discordant results in 1 of every 3 children tested (20 of 60) and believe that it is important to bring this issue to the attention of clinicians. Among specific developmental domains, ratings of language/communication skills differed most often between the screens. If these results are replicated in a larger study, then they would have important implications as clinicians adopt such instruments to screen children in their practice in accordance with AAP guidelines.10 PEDS and ASQ represent different approaches to gathering parent observations on children's development, for the purpose of identifying children who are at risk for developmental delays. Both, however, are general developmental screening tools that are intended to identify a similar group of children at high risk for developmental problems.
Our results raise concerns that these 2 tools, which are among the best available19,29 and are increasingly being adopted in practice,15,17,24 may identify different children. Discordance in the classification of children's developmental status might be attributable to the different format of the questionnaires (eliciting concerns versus inquiring about skills), or the tools may function differently in different populations. An important issue for the field is that standardization of screens, by and large, did not occur in actual primary care settings: some children in standardization samples were recruited from primary care, but administration of the tools occurred in research settings.21,23 In a study of a middle-class Canadian population conducted in primary care, Rydz et al24 found an unexpectedly high rate of failed screening tests (40%) using ASQ, raising concerns that such tools may not function as well in actual practice settings as in validation studies. Concerns were raised about this study, however, including use of abridged ASQ administration procedures, selection bias, and a time lag between screening and assessment with the gold standard measure.30
A number of clinicians and reviewers have commented specifically on the choice of cutoff for PEDS scoring in our study, indicating that they use a cutoff of ≥2 predictive concerns as a failed screen in clinical practice. We clarify that sensitivity and specificity data for PEDS are based on identifying children with ≥1 predictive concern.21 Although the PEDS scoring algorithm recommends referral for children with ≥2 predictive concerns, action is also recommended for children with 1 such concern: additional screening in the office or referral for additional screening. We are not aware of data on the success of this approach. If clinicians take action only when there are ≥2 predictive concerns on PEDS, however, then half of children with developmental delays that are identifiable with the tool at a given visit will be missed, dramatically reducing the sensitivity of the screen. Similarly, it has been suggested that providers clinically consider an ASQ screen “failed” only when a child fails 2 of 5 domains (rather than 1) with scores below but near the cutoff. This approach creates the same problem as the “alternative” PEDS scoring described previously: it reduces the sensitivity of the tool.
These results do not negate the importance of conducting formal developmental screening by using validated tools in primary care. Previous studies demonstrated significant underdetection of developmental delays when screening tools were not used in practice31–33; however, reliance on a single screening tool may not be sufficient to detect delays.
This study has a number of limitations. The original study was not specifically designed to compare the tools. The sample size was small, although sufficiently large to identify a statistically significant result, with sufficient power to detect moderate to substantial agreement between the 2 screens on the basis of post hoc power calculations. PEDS was administered before the visit, whereas ASQ was administered after the visit. This difference may have introduced a systematic bias, because providers were aware of PEDS results and met with the parent before ASQ was completed; however, we would have expected this potential source of bias to increase the concordance between the tools. Furthermore, because PEDS elicits concerns whereas ASQ inquires about specific skills, it seems less likely that ASQ would be influenced by PEDS than PEDS by ASQ. The study design did not include a diagnostic evaluation (gold standard measure of child development), so it is not possible to know which tool was more accurate in identifying children with developmental delays in this population. Although we do not know which of the 2 screens was more accurate, these results mean that, in clinical practice, different children might be referred for additional evaluation, depending on the choice of screen.
As pediatric primary care providers comply with updated recommendations for systematic developmental screening of all children,10 it will be important to conduct research on larger, diverse samples with commonly used screening tools in actual primary care settings. This research should include studies that directly compare the performance of screening tools with a randomly assigned order of completion.34 Such studies will lead to a better understanding of the screens’ use, limitations, and relative performance. The process from screening and case-finding to referral and eventual treatment is extremely complex.35 Additional research to assess the process from screening to treatment for developmental delays and conditions is necessary for a better understanding of the impact of revised AAP developmental screening recommendations in the primary care setting.
This study was supported by National Institutes of Health/National Institute of Child Health and Human Development grant K23 HD04773.
We thank the parents and pediatricians who participated in the study and generously contributed their time. We thank the office staff and nurses in the practice. Special thanks go to David Roberts, MD, Robert Needlman, MD, and Judy Elardo for assistance in planning the original study and Shayna Soenksen, MS, in formatting the manuscript.
The views in this article are those of the authors and do not necessarily represent the views of the National Institutes of Health/Eunice Kennedy Shriver National Institute of Child Health and Human Development.
FINANCIAL DISCLOSURE: The authors have indicated they have no financial relationships relevant to this article to disclose.