The potential social health items were administered to 3,048 children, and each item was administered to at least 754 respondents. The sample was diverse and 51.8% female. Participants reported the following racial groups: 60% white, 21.1% black, 5.6% multiracial, 10.1% other races (Asian, Pacific Islander, Native American, other), and 3.2% race not reported; in addition, 17.5% reported Hispanic or Latino ethnicity. Approximately 23% had a chronic medical condition within the past 6 months. The sample was split roughly evenly between younger and older children, with 53% aged 8–12 years and 47% aged 13–17 years (see the table of study participant characteristics).
Study Participant Characteristics
The data from forms 3 and 4 were used for the initial EFAs because those forms included most of the potential social health items. These EFAs yielded two findings: (a) the “social function” and “sociability” items did not resolve into distinct factors (clusters of items representing dimensions of individual differences), and (b) the item sets were factorially complex. Guided by significant changes in the goodness-of-fit χ² statistic and its associated root mean square error of approximation (RMSEA), the analysts extracted and rotated five factors for the form 3 data and four factors for the form 4 data. Inspection of the items with large loadings on each factor revealed three factors that appeared on both forms: the first had large loadings for the positively worded items, the second for the negatively worded items, and the third for items that asked about relationships with adults (parents, teachers) rather than peers. In addition to these three common factors, each form also had factors defined by smaller numbers of items that either reflected closely related content (e.g., “teased other kids” and “was mean to other people”) or shared wording (e.g., items that included the phrase “got along”).
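The decision about how many factors to extract rested on changes in χ² as factors were added. A minimal sketch of that comparison follows, assuming maximum-likelihood χ² values, for which the difference between nested models is itself approximately χ²-distributed (robust or WLS-type estimators require an adjusted difference test instead). The χ² values in the usage lines are hypothetical, and `chi2_sf`, `efa_df`, and `chi2_difference_test` are illustrative helper names, not functions from any package.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Computed as the regularized upper incomplete gamma function Q(df/2, x/2):
    a power series for small arguments, a Lentz continued fraction otherwise.
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower regularized gamma P(a, z); Q = 1 - P.
        term = total = 1.0 / a
        denom = a
        for _ in range(10_000):
            denom += 1.0
            term *= z / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, z) (modified Lentz method).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def efa_df(n_items: int, n_factors: int) -> int:
    """Degrees of freedom of an exploratory factor model with n_factors factors."""
    return ((n_items - n_factors) ** 2 - (n_items + n_factors)) // 2


def chi2_difference_test(chi2_fewer, df_fewer, chi2_more, df_more):
    """Compare a model with fewer factors against a nested model with more factors."""
    delta_chi2 = chi2_fewer - chi2_more
    delta_df = df_fewer - df_more
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical values for a 20-item form, 4 versus 5 factors (not from the study).
d_chi2, d_df, p = chi2_difference_test(410.0, efa_df(20, 4), 330.0, efa_df(20, 5))
print(f"delta chi2 = {d_chi2:.1f} on {d_df} df, p = {p:.2g}")
```

A significant drop in χ² when a factor is added suggests the extra factor captures real covariation; in practice this is weighed together with RMSEA and interpretability, as described above.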
The analysts abandoned the distinction between “social function” and “sociability” because the data did not support those as dimensions of individual differences. Instead, they used the results of the EFAs of the form 3 and 4 data to guide the development of CFA models for all four forms. As an illustration of these analyses, the table of factor loading estimates for form 3 shows the factor loadings and residual correlations for the four-factor model (with five residual correlations) that was ultimately fitted to the form 3 data. The model includes a general factor (which would be the only dimension if the items measured a single construct) and three additional orthogonal factors: one for the positively worded items, one for the negatively worded items, and one for items that involved adults or family. These three factors indicate that there are dimensions of individual differences related to endorsement of positively worded items, endorsement of negatively worded items, and relations with adults. This four-factor model is nearly a bifactor model; it would be one if each item had a nonzero loading on only one factor in addition to the general factor. Because the items on the “Adult” factor and the residual correlations violate that rule, it is instead an augmented bifactor, or two-tier, model.
Factor Loading Estimates for a CFA Model for the Items on Form 3
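To make the two-tier structure concrete, the sketch below computes the item correlations implied by a standardized model with a general factor, orthogonal second-tier factors, and residual correlations, and checks the bifactor rule stated above. All loadings are hypothetical (they are not the form 3 estimates), and `implied_correlations` and `is_bifactor` are illustrative names.

```python
import math
from typing import Dict, List, Tuple


def implied_correlations(
    loadings: List[List[float]],
    residual_corr: Dict[Tuple[int, int], float],
) -> List[List[float]]:
    """Model-implied item correlations for a standardized model with orthogonal factors.

    loadings[i][f] is item i's standardized loading on factor f (column 0 = general).
    residual_corr maps an item pair (i, j) to the correlation between their residuals.
    """
    n = len(loadings)
    # Unique (residual) variance of each item: 1 minus its communality.
    uniq = [1.0 - sum(l * l for l in row) for row in loadings]
    if any(u <= 0.0 for u in uniq):
        raise ValueError("an item's communality is >= 1; loadings are inadmissible")
    r = [[1.0 if i == j else sum(x * y for x, y in zip(loadings[i], loadings[j]))
          for j in range(n)] for i in range(n)]
    for (i, j), rho in residual_corr.items():
        # Convert the residual correlation to a covariance in the standardized metric.
        cov = rho * math.sqrt(uniq[i] * uniq[j])
        r[i][j] += cov
        r[j][i] += cov
    return r


def is_bifactor(loadings: List[List[float]], residual_corr: Dict[Tuple[int, int], float]) -> bool:
    """True if every item loads on at most one factor besides the general one
    and there are no residual correlations."""
    at_most_one_specific = all(sum(1 for l in row[1:] if l != 0.0) <= 1 for row in loadings)
    return at_most_one_specific and not residual_corr


# Hypothetical loadings; columns: General, Positive, Negative, Adult.
loadings = [
    [0.7, 0.3, 0.0, 0.0],  # positively worded peer item
    [0.6, 0.4, 0.0, 0.0],  # positively worded peer item
    [0.5, 0.0, 0.5, 0.0],  # negatively worded peer item
    [0.6, 0.3, 0.0, 0.4],  # positively worded item about adults
]
residual_corr = {(0, 1): 0.2}  # one locally dependent pair
r = implied_correlations(loadings, residual_corr)
print(is_bifactor(loadings, residual_corr))  # False: item 3 and the residual correlation break the rule
```

Because the factors are orthogonal, each implied correlation is simply the sum of products of the two items' loadings on shared factors, plus any residual covariance for a locally dependent pair.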
The form 3 model also includes significant residual correlations between five locally dependent pairs of items. Some of these pairs have similar content (“kids wanted to be with me” and “I did things with other kids,” or “kids made fun of me” and “kids were mean to me”); others share wording (“getting along”). This model fit the form 3 data reasonably well: the goodness-of-fit χ² was 338 on 113 df, with RMSEA = 0.05, CFI = 0.95, and TLI = 0.98. The latter three values are at levels considered to reflect satisfactory fit. Similar models generally fit the data for the other three forms. The model for form 1 included a general factor, positive and negative second-tier factors, and three doublet residual correlations; χ² = 71 on 32 df, RMSEA = 0.04, CFI = 0.97, TLI = 0.98 (Table A1 in the online data supplement). The model for form 2 included a general factor, two second-tier factors for items about interactions with peers (largely negatively phrased) and with adults (largely positively phrased), and three doublet residual correlations; χ² = 112 on 42 df, RMSEA = 0.05, CFI = 0.97, TLI = 0.98 (Table A2 in the online data supplement). The model for form 4 was considerably more complex, with a general factor, four second-tier factors (for positively worded items, negatively worded items, items about interactions with friends, and items reflecting meanness or bullying), and three doublet residual correlations; χ² = 194 on 85 df, RMSEA = 0.05, CFI = 0.97, TLI = 0.99 (Table A3 in the online data supplement).
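The RMSEA, CFI, and TLI values reported above are functions of the model χ², its df, the sample size, and (for CFI and TLI) the χ² of a baseline independence model. A minimal sketch of the conventional point-estimate formulas follows. The per-form sample size and baseline χ² are not reported here, so the sample size of 800 below is a hypothetical value; software that uses robust estimators, or N rather than N - 1, may produce somewhat different numbers, so this is not an attempt to reproduce the reported values.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of the root mean square error of approximation."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0.0 else 1.0 - d_model / d_base


def tli(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis index (non-normed fit index)."""
    base_ratio = chi2_base / df_base
    return (base_ratio - chi2 / df) / (base_ratio - 1.0)


# Form 3 chi-square and df from the text, with a HYPOTHETICAL sample size of 800.
print(round(rmsea(338.0, 113, n=800), 3))
```

RMSEA penalizes misfit per degree of freedom and per respondent, whereas CFI and TLI express improvement over the baseline model; this is why the three indices can disagree slightly for the same model.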
Using these results as a guide, the pediatric scale development group rethought the measurement goal and concluded that the test item pool would support the development of a scale focused on the quality of peer relationships, using largely positively worded items. Although it appeared there could be another dimension tapping children’s interaction with adults, there were not sufficient items in the pool to make a separate scale for that construct. For the most part, negatively worded items appeared to measure individual differences in the propensity to endorse negative statements; as that propensity was not the target domain, the negatively worded items were removed from the bank.
Based on this redefinition of the measurement goal, 24 items were selected as the potential new item pool. All of these items involve interactions with peers, and almost all are positively worded. Another set of CFA models was fitted to the data for these items (four items on form 1, five on form 2, eight on form 3, and seven on form 4). Unidimensional CFA models, with error covariances for two locally dependent (LD) pairs of items on form 3 and one LD pair on form 4, fit the data reasonably well. For form 1, χ² = 7.8 on 2 df, RMSEA = 0.064, CFI = 0.97, TLI = 0.98; for form 2, χ² = 7.2 on 5 df, RMSEA = 0.025, CFI = 0.99, TLI = 0.99; for form 3, χ² = 37.1 on 15 df, RMSEA = 0.044, CFI = 0.99, TLI = 0.99; and for form 4, χ² = 76.1 on 10 df, RMSEA = 0.098, CFI = 0.97, TLI = 0.97. Before IRT calibration, four additional items were withdrawn because they were very similar to items on other forms. Given the multiform design, LD could not be detected statistically for items on different forms, so each pair of items judged likely to be locally dependent was reduced to a single item.
The remaining 20 items (with the stems listed in the item bank table) were calibrated with the graded response model (GRM). To avoid potential influence of LD on the item parameter estimates, the parameters for the items on forms 3 and 4 were estimated twice: once including only one item from each pair that the CFA had indicated to be locally dependent, and a second time including the other item from each pair. The items were checked for differential item functioning (DIF) between boys and girls; three items exhibited significant gender DIF after the Benjamini-Hochberg correction for multiplicity. After the size of the DIF was examined for each of these items, two were removed from the final item pool: “I could talk with my classmates” and “I was able to stand up for myself with other kids my age” (both with higher scores for boys). For the third item, “Other kids wanted to be my friend,” the DIF was primarily attributable to the item being slightly more discriminating for girls than for boys; this nonuniform DIF amounted to less than one point at every location on the scale. Given this relatively small DIF and the usefulness of the item, it was retained. Only one item (“I spent time with my friends”) exhibited significant DIF between younger and older children, and the effect size was very small (a fraction of a point at all levels of the latent variable), so that item was also retained.
Items in the Peer Relationships Item Bank, With GRM Parameter Estimates and Goodness of Fit Statistics, Along With Items Removed Because of LD or DIF
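For readers less familiar with the graded response model, the sketch below computes Samejima's GRM category probabilities for one item with five response alternatives, and the expected item score in two groups whose slopes differ, which is the pattern described above for nonuniform DIF. It assumes the logistic metric without the 1.7 scaling constant, categories scored 0-4, and hypothetical parameter values; none of these numbers come from the item bank table.

```python
import math
from typing import List, Sequence


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Samejima's graded response model: P(X = k | theta) for k = 0..m.

    thresholds b_1 < ... < b_m locate the boundaries between adjacent categories.
    """
    # Boundary curves P*(X >= k): 1 for k = 0, logistic for k = 1..m, 0 beyond m.
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Expected item score (categories scored 0..m) at theta."""
    return sum(k * p for k, p in enumerate(grm_category_probs(theta, a, thresholds)))


# Hypothetical item with five response alternatives (four thresholds).
a, b = 2.0, [-2.0, -1.0, 0.0, 1.0]
probs = grm_category_probs(0.0, a, b)

# Nonuniform DIF: same thresholds but a different slope in each group (hypothetical values).
grid = [-3.0 + 0.1 * i for i in range(61)]
max_gap = max(abs(expected_score(t, 2.2, b) - expected_score(t, 1.8, b)) for t in grid)
print([round(p, 3) for p in probs], round(max_gap, 3))
```

Scanning the gap between the two groups' expected-score curves over a grid of latent trait values is one common way to gauge whether a statistically significant DIF effect is large enough to matter in practice.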
The less discriminating item from each of the three LD pairs identified in the CFA was also removed, leaving the 15-item PROMIS Pediatric Peer Relationships item pool shown in the item bank table. That table also gives the GRM item parameters, along with the SS-χ² item-level diagnostic statistics and their associated df and p values. Two of the SS-χ² statistics were significant after Benjamini-Hochberg correction for multiplicity; however, examination of the underlying frequency tables suggested that those χ² values were large not because of poor model fit but because of confluences of small observed and expected values. Because this is common in the large contingency tables (summed score by five response alternatives) on which those statistics are based, those items were retained.
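The Benjamini-Hochberg procedure used for both the DIF screen and the SS-χ² item-fit statistics controls the false discovery rate rather than the familywise error rate. A minimal sketch of the step-up rule, assuming a target false discovery rate q = 0.05; the p values in the usage line are hypothetical, and `benjamini_hochberg` is an illustrative name.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return a reject/retain flag for each hypothesis under the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Largest rank k (1-based) whose ordered p value satisfies p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max, even if its own p value
    # exceeds its individual threshold (this is what makes the procedure step-up).
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.02, 0.03, 0.035, 0.9]))  # [True, True, True, False]
```

In the example, 0.02 exceeds its own rank-1 threshold (0.0125) but is still rejected because the rank-3 value, 0.035, passes its threshold of 0.0375.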
A suggested eight-item short form was selected by maximizing information over the latent trait, using the item information curves, while considering item content to ensure that the items represent several facets of social health. The short form items are marked with “x” in the “SF” column of the item bank table. The online data supplement provides a table of the IRT scaled scores that correspond to each summed score on the short form (Table A4 in the online data supplement).
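A summed-score-to-scale-score table such as Table A4 is commonly built with the Lord-Wingersky recursion, which gives the likelihood of each summed score at each θ, followed by an expected a posteriori (EAP) estimate for each summed score. The sketch below shows one standard way to do this, not necessarily the authors' exact procedure; it assumes a standard normal prior, GRM items scored 0-4, a rectangular quadrature grid, and hypothetical item parameters. The helper names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Item = Tuple[float, Sequence[float]]  # (slope a, ordered thresholds b_1..b_m)


def grm_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """GRM category probabilities P(X = k | theta), k = 0..m."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def summed_score_likelihood(theta: float, items: Sequence[Item]) -> List[float]:
    """Lord-Wingersky recursion: P(summed score = s | theta) for every s."""
    lik = [1.0]  # before any items, the summed score is 0 with probability 1
    for a, thresholds in items:
        probs = grm_probs(theta, a, thresholds)
        new = [0.0] * (len(lik) + len(probs) - 1)
        for s, l_s in enumerate(lik):
            for k, p_k in enumerate(probs):
                new[s + k] += l_s * p_k
        lik = new
    return lik


def summed_score_to_t(items: Sequence[Item], n_nodes: int = 81, lo: float = -5.0,
                      hi: float = 5.0) -> List[Tuple[int, float, float]]:
    """(summed score, EAP T score, posterior SD in T units), standard normal prior."""
    nodes = [lo + (hi - lo) * i / (n_nodes - 1) for i in range(n_nodes)]
    prior = [math.exp(-0.5 * t * t) for t in nodes]  # unnormalized N(0, 1) weights
    liks = [summed_score_likelihood(t, items) for t in nodes]
    table = []
    for s in range(len(liks[0])):
        post = [w * lik[s] for w, lik in zip(prior, liks)]
        total = sum(post)
        mean = sum(t * p for t, p in zip(nodes, post)) / total
        var = sum((t - mean) ** 2 * p for t, p in zip(nodes, post)) / total
        table.append((s, 50.0 + 10.0 * mean, 10.0 * math.sqrt(var)))
    return table


# Eight hypothetical items, each with five response categories scored 0-4.
items = [(1.5 + 0.1 * j, [-2.5 + 0.1 * j, -1.5, -0.5, 0.5]) for j in range(8)]
for s, t, se in summed_score_to_t(items)[::8]:
    print(f"summed score {s:2d}: T = {t:5.1f} (SD {se:4.1f})")
```

The recursion adds one item at a time, convolving the current summed-score distribution with that item's category probabilities, so its cost grows only linearly with the number of items.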
The figure that follows shows the total information curve for the 15-item pool (solid curve) and the information curve for the 8-item short form (dashed curve), plotted against the T-score scale (mean 50, standard deviation 10) that is standard for PROMIS measures. For the full pool, measurement precision is good from the lower range of peer relationships up to nearly one standard deviation above the mean; precision is somewhat lower for the short form.
The total information curve for the 15-item pool (solid curve) and the information curve for the 8-item short form (dashed curve).