

Value Health. Author manuscript; available in PMC 2012 September 1.
Published in final edited form as:
PMCID: PMC3173710

Differential item functioning in quality of life measure between children with and without special health care needs



Few studies have considered the effect of differential item functioning (DIF) on health-related quality of life (HRQOL) comparisons between ill and healthy children. The objective of this study was to assess DIF and compare HRQOL between children with special health care needs (CSHCN) and children without needs.


Data were collected from 1195 families of children enrolled in Florida’s public insurance programs. HRQOL was measured using physical, emotional, social, and school functioning of the PedsQL. We identified CSHCN using the CSHCN Screener and assessed DIF related to CSHCN using a multiple group-multiple indicator-multiple cause (MG-MIMIC) method. We assessed the impact of DIF by examining expected item/test scores and item/test information function. We tested the discrepancy between underlying HRQOL scores of both groups before and after DIF calibration (allowing parameters of DIF items to be different and DIF-free items to be the same across both groups).


Two items (25%) of physical functioning and three items (60%) of school functioning were identified with non-uniform DIF, and two items (40%) of social functioning were identified with uniform DIF. Expected item/test scores and item/test information functions suggest the impact of DIF is minimal. Before DIF calibration, HRQOL in CSHCN was more impaired than in children without needs (effect size −1.04, −0.74, −0.96, and −0.98 for physical, emotional, social, and school functioning, respectively). After DIF calibration, the discrepancy increased slightly.


Although 30% of items in the PedsQL were identified with DIF related to CSHCN and children without needs, the impact of DIF is minimal.

Keywords: Children, differential item functioning, health-related quality of life, item response theory


The prevalence of children with chronic conditions is estimated to be between 11% and 31%, and this percentage has continued to increase over the last several decades [1]. Recognizing that children with different chronic conditions may share similarities in physical, psychological, social, economic, and rehabilitation outcomes, the Maternal and Child Health Bureau (MCHB) of the US Department of Health and Human Services developed the concept of “children with special health care needs” (CSHCN) [2,3]. MCHB believes screening for CSHCN has important implications for health policy planning because it considers different chronic conditions as a whole and measures the impact resulting from the health conditions, including dependency on medication, assistive devices, and/or medical care [4]. Although most CSHCN have two or more chronic conditions, some CSHCN are at risk and do not necessarily have any chronic conditions. Allergies, asthma, attention-deficit disorder, and emotional problems represent the most common conditions experienced by CSHCN [5].

Daily functioning and health-related quality of life (HRQOL), under the rubric of patient-reported outcomes (PROs), are of great concern to children with special health needs or chronic conditions, and to their parents and physicians [6–8]. The International Classification of Functioning, Disability, and Health (ICF) developed by the World Health Organization provides a comprehensive framework to assess the relationship among an individual’s function, activities, and participation while also considering environmental and personal factors that influence an individual’s overall health [9]. Because daily functioning and HRQOL can be considered as an individual's perceptions of health and health-related domains of well-being, the ICF framework also serves as a foundation for the design of HRQOL measurement.

In the past 20 years, several pediatric HRQOL questionnaires and survey instruments have been developed to measure children’s HRQOL [10]. Many studies reported that children with chronic conditions had significantly impaired HRQOL [11–15], whereas other studies found no associations [16–19]. Yet, limited evidence is available with respect to disparity in HRQOL between CSHCN and children without needs. Because HRQOL is a subjective concept, psychosocial factors such as adaptive style after the illness may contribute to different conceptualization of the items in HRQOL measures [18,20]. From a measurement perspective, it is important to assess whether items of the instrument operate equivalently between CSHCN and children without needs. Without demonstrating measurement equivalence, comparisons of HRQOL between different pediatric groups can be misleading because we cannot tell whether the disparity in HRQOL among the groups reflects an unbiased measurement or an artifact of the measurement process, such as different interpretations of the items.

Differential item functioning (DIF) analysis is an item-level psychometric method to investigate measurement equivalence between the groups by exploring whether the likelihood of responding to an item between the groups is the same or not conditioning on the same level of the underlying HRQOL [21,22]. Theoretically, if the underlying HRQOL is the same for a child with special health care needs and a child without needs, we expect both children should have the same probability of responding to a particular category of an item (e.g., “never” have a problem with walking more than one block). A DIF phenomenon exists when this assumption is not held. This may lead to overestimating or underestimating the HRQOL score of a child, thus mistakenly classifying a child to different levels of health status.

Several psychometric methods have been proposed for DIF analysis. Teresi classified these methods into nonparametric and parametric methods [21]. Nonparametric methods include Mantel-Haenszel and simultaneous item bias test (SIBTEST). Parametric methods, which are frequently used in HRQOL research, include the item response theory-likelihood ratio (IRT-LR) method, ordinal logistic regression (OLR) method, multiple indicator-multiple cause (MIMIC) method, and differential functioning of items and tests (DFIT). The MIMIC method, which is regarded as a special case of confirmatory factor analysis (CFA), has received more attention in recent studies [23–25]. This is because the MIMIC method can model item response function and group difference in underlying HRQOL simultaneously. Importantly, the MIMIC method can incorporate additional background variables in DIF analysis, making it possible to compare the mean latent scores of HRQOL among different groups fairly and meaningfully. A simulation study suggests that the sample size required for obtaining adequate power and accurate parameter estimation is smaller for the MIMIC method than the IRT-LR method [26].

Several pediatric HRQOL studies have assessed DIF related to health conditions [13,27,28] or sociodemographic variables [13,29,30] based on the IRT-LR and OLR methods. For example, Langer and colleagues assessed DIF in social functioning of the PedsQL 4.0 between children with and without chronic conditions [28]. In that study, the IRT-LR method was used, and DIF was considered evident when the difficulty and/or discrimination parameters of the graded response model (i.e., a two-parameter IRT model) differed between both groups. Ravens-Sieberer and colleagues assessed DIF in different domains of the KIDSCREEN-52 by groups of age, gender, and health status, respectively [13]. In that study, the partial credit model (i.e., a one-parameter IRT model) was used to assess uniform DIF alone. Erhart and colleagues assessed DIF in domains of the KIDSCREEN-52 between children with cerebral palsy and children in the general population [27]. In that study, three nested logistic regression models were used to identify DIF based on the change of log likelihood and relevant parameters of different nested models. Most of the previous studies suggest that few items were flagged with DIF. Nevertheless, the MIMIC method has never been used in pediatric HRQOL research.

The main purpose of this study was to test DIF for the PedsQL 4.0 between CSHCN and children without needs. In particular, we tested uniform and non-uniform DIF based on a multiple group-MIMIC (MG-MIMIC) method. Data were collected from families of children who were enrolled in Florida’s Medicaid and State Children’s Health Insurance Program (SCHIP). We assessed DIF related to CSHCN and children without needs in four domains (physical, emotional, social, and school functioning) of HRQOL. The parent-proxy form of the PedsQL was administered because the parental ratings of children’s health outcomes dominantly determine the use of pediatric health services. We assessed the impact of DIF by examining the expected item/test scores and item/test information functioning between both groups. We also examined the change in domain scores of individual children before and after accounting for DIF items in the score calculation (i.e., DIF calibration; see DIF methodology in Methods), and tested the discrepancy in the domain scores between both groups before and after DIF calibration.


Study population

This is a cross-sectional study using data collected from two sources: 1) the 2006 annual evaluation of the Florida KidCare Program, which is comprised of Medicaid and the Title XXI State Children’s Health Insurance Program (SCHIP), and 2) the 2006 satisfaction survey of the Florida Children’s Medical Services Network (CMSN), which is the State Title V CSHCN Program. All children in the CMSN sample were also enrolled in Medicaid.

Data collection

We conducted telephone surveys between September 2006 and December 2006 for the KidCare program evaluation and between December 2006 and March 2007 for the CMSN satisfaction survey. Parents whose children were enrolled in the KidCare or the CMSN for six months or longer were randomly selected to participate. We sent an introductory letter to the parents (N=3285 from the KidCare and N=1280 from the CMSN) to explain the purpose of the surveys. For those parents who agreed to participate, we sent an informed consent form for their signatures and set up separate dates and times for interviews.

The response rates were 50% in the KidCare evaluation (N=1642) and 49% in the CMSN survey (N=627), which are similar to other studies conducted with families that are publicly insured [31,32]. Among 2269 subjects who completed the surveys, 524 subjects were excluded (95 missed the entire HRQOL section and 429 missed one or more of the PedsQL domain scores), leaving 1745 subjects for further analyses (N=1290 from the KidCare and N=455 from the CMSN).


PedsQL 4.0

We used the parent-proxy form of the PedsQL 4.0 to measure children’s HRQOL. This study focuses on three age-specific versions (5–7, 8–12, and 13–18 years), each with minor modifications in the wording based on children’s ages [33,34]. The PedsQL is comprised of 23 items covering four domains: physical, emotional, social, and school functioning. A five-point response scale is utilized (from “never” a problem to “almost always” a problem). A domain-specific latent score is calculated based on the MG-MIMIC model (see DIF methodology), with a mean of 50 and standard deviation of 10. Higher scores indicate better HRQOL.

CSHCN Screener

We used the CSHCN Screener to assess children’s special health care needs [4]. CSHCN is defined as children who “have or are at increased risk for a chronic physical, developmental, behavioral or emotional condition, and who also require health and related services of a type or amount beyond that required by children generally” [3]. This Screener specifically asks parents whether a child: 1) needs or uses prescription medicines; 2) has above-routine need for medical, mental health, or educational services; 3) is limited or prevented in any way in his or her ability to do things that most children of the same age can do; 4) needs or uses specialized therapies such as speech, occupational, and physical therapies; and 5) needs or receives treatment or counseling for an emotional, behavioral, or developmental problem.

If parents answer yes to any of the 5 questions, they are asked up to two follow-up questions to determine whether the consequence is attributable to a medical, behavioral, or other health condition lasting or expected to last at least 12 months. Only those who provide positive responses to one or more question sequences and each of the associated follow-up questions are classified as having a special health care need [4].

Propensity score approach

We used a propensity score approach to explicitly balance the distribution of sociodemographic factors between both groups to address the confounding issue raised by these factors in the DIF assessment and comparisons of HRQOL scores between both groups [35]. We estimated the propensity score of each child being in the group of CSHCN versus the group of “without needs” based on the covariates which were not balanced between both groups, including parent’s age and education background, and child’s age and gender (p < 0.05). We matched the study samples in both groups using the nearest neighbor matching method without replacement (i.e., logit of the propensity score within a caliper equal to 0.25 times the pooled standard deviation of the logit) [35,36]. As a result, we excluded 300 subjects with missing covariates for the propensity score estimation and 250 mismatched subjects. This procedure retains 1195 subjects for the final DIF analyses.
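The matching rule described above can be illustrated with a minimal sketch. It assumes the propensity scores have already been estimated, uses a greedy one-to-one search, and takes the pooled standard deviation to be the square root of the average of the two group variances of the logit. The function names, the greedy ordering, and the toy scores are illustrative assumptions, not the procedure actually run in the study.

```python
import math
import statistics


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def caliper_match(treated, control, caliper_sd=0.25):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping subject id -> propensity score.
    Returns a list of (treated_id, control_id) pairs whose logits differ
    by no more than caliper_sd times the pooled SD of the logit.
    """
    t_logit = {i: logit(p) for i, p in treated.items()}
    c_logit = {i: logit(p) for i, p in control.items()}
    # Assumed definition of the pooled SD: root of the mean group variance.
    pooled_sd = math.sqrt(
        (statistics.variance(list(t_logit.values()))
         + statistics.variance(list(c_logit.values()))) / 2.0
    )
    caliper = caliper_sd * pooled_sd
    available = dict(c_logit)
    pairs = []
    for t_id, t_val in t_logit.items():
        if not available:
            break
        # Closest remaining control on the logit scale.
        c_id = min(available, key=lambda k: abs(available[k] - t_val))
        if abs(available[c_id] - t_val) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement
    return pairs


# Hypothetical propensity scores, for illustration only.
cshcn = {"t1": 0.60, "t2": 0.90}
no_needs = {"c1": 0.59, "c2": 0.30, "c3": 0.65}
matched = caliper_match(cshcn, no_needs)
```

In this toy input, subject t2 has no control within the caliper and is dropped, which mirrors how mismatched subjects are excluded before the DIF analyses.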

Dimensionality assessment

We conducted dimensionality assessment for each domain of the PedsQL prior to DIF analysis. We hypothesized that the potential source of DIF is the different perceptions of HRQOL by parents of CSHCN and parents of children without needs. It is likely that an item (e.g., participating in sports or exercise) designed to measure a specific functioning (e.g., physical) is essentially measuring multiple concepts of HRQOL (e.g., physical, social, and school). This would violate the unidimensionality assumption of instrument design, and may be more pronounced among CSHCN, in part due to the perceived dependency across different domains and the adaptive style after the illness. We assessed the dimensionality of each domain using a regular confirmatory factor analysis (CFA). If the unidimensional structure was not satisfied, a bi-factor model (BFM) comprised of a general factor and four group factors (i.e., four functioning domains) was used to help improve the model fit [37].

DIF methodology

DIF occurs when an item performs differently between the groups given the same level of underlying HRQOL. In this study, we used an MG-MIMIC method to identify DIF associated with CSHCN and children without needs by incorporating additional background variables (i.e., children’s age and gender as well as parents’ age, race/ethnicity, and educational background) into the analysis [38]. Serial tests of nested models, beginning with the most constrained model, sequentially relaxing cross-group equality constraints on the parameters, and ending with the least constrained model, are performed to detect uniform and non-uniform DIF. Uniform DIF is captured by the discrepancy in thresholds of a categorical item between both groups (e.g., CSHCN and no needs), and non-uniform DIF is captured by the discrepancy in the loadings of an item on underlying HRQOL between both groups. Group heterogeneity in HRQOL is indicated by the discrepancy in the mean of the underlying scores calculated by the model. The MG-MIMIC method specifically takes into account the purification of anchor items in DIF assessment. Anchor items are the items whose parameters are invariant between both groups.
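The distinction between uniform and non-uniform DIF can be written compactly in one common notation for categorical factor models. The symbols below (group index g, loading λ, thresholds τ, latent HRQOL η) are introduced here for illustration and are not taken from the original text.

```latex
% Latent response for child i on item j in group g
y^{*}_{ij} = \lambda_{j}^{(g)} \, \eta_{i} + \varepsilon_{ij},
\qquad
y_{ij} = k \iff \tau_{j,k}^{(g)} < y^{*}_{ij} \le \tau_{j,k+1}^{(g)}

% No DIF on item j: same loading and thresholds in both groups
\lambda_{j}^{(1)} = \lambda_{j}^{(2)}, \qquad
\tau_{j,k}^{(1)} = \tau_{j,k}^{(2)} \ \text{for all } k

% Uniform DIF: equal loadings, at least one unequal threshold
\lambda_{j}^{(1)} = \lambda_{j}^{(2)}, \qquad
\tau_{j,k}^{(1)} \neq \tau_{j,k}^{(2)} \ \text{for some } k

% Non-uniform DIF: unequal loadings
\lambda_{j}^{(1)} \neq \lambda_{j}^{(2)}
```

Under this reading, the group difference in underlying HRQOL is carried by the difference in the means of η between groups, separately from any item-level discrepancy.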

Technically, the model building procedures for detecting DIF are the same as for a single group-CFA with covariates. The procedures of DIF analysis are iterative and inclusive of the following steps:

  1. Estimate a baseline model which is fully invariant in factor loadings, thresholds, and residual variances of the items, variance of latent traits, and scaling factors. The only parameters allowed to differ between both groups are the means of the latent variables, which allows estimating the group differences in underlying HRQOL;
  2. Examine the model modification indices (MIs) for the baseline model and identify the modification that would result in the largest improvement in model fit based on factor loadings and thresholds of items;
  3. Use the DIFFTEST procedure [24] to fit a model that relaxes the constraint on factor loadings relative to the baseline model;
  4. Use the DIFFTEST procedure [24] to fit a model that relaxes the constraint identified in item thresholds relative to the baseline model;
  5. Compare the chi-square values from the DIFFTEST procedure for these two modifications to identify the largest one, and if it is significant, accept that modification and reject the other (note, the model accepted in Step 5 becomes the new baseline model);
  6. Estimate this new baseline model, examine the MIs, and repeat Steps 2 through 6 until no significant model modifications are identified.
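The control flow of the steps above can be sketched as a forward search that frees one equality constraint at a time. This is a schematic only: the model fitting and chi-square difference testing done in Mplus are replaced by a hypothetical callable, `chi2_gain`, and the toy chi-square values are invented for illustration. The sketch also simplifies the procedure by comparing all candidate constraints at once rather than shortlisting one loading and one threshold from the modification indices, and it assumes each freed constraint costs one degree of freedom.

```python
CHI2_CRIT_DF1 = 3.841  # 0.05 critical value of a chi-square with 1 df


def stepwise_dif_search(candidates, chi2_gain, crit=CHI2_CRIT_DF1):
    """Free cross-group constraints one at a time until none is significant.

    candidates: iterable of (item, kind), kind in {"loading", "threshold"}.
    chi2_gain(freed, candidate): hypothetical callable returning the
        chi-square difference from freeing `candidate`, given the tuple of
        constraints already freed (the current baseline model).
    Returns the list of freed constraints in the order they were accepted.
    """
    freed = []
    remaining = list(candidates)
    while remaining:
        gains = {c: chi2_gain(tuple(freed), c) for c in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < crit:
            break  # no significant modification left
        freed.append(best)      # accepted model becomes the new baseline
        remaining.remove(best)
    return freed


def toy_gain(freed, candidate):
    """Made-up chi-square differences; a real run would refit the model."""
    table = {
        ("item1", "loading"): 12.4,
        ("item15", "threshold"): 9.7,
        ("item3", "loading"): 1.2,
    }
    return table[candidate]


toy_candidates = [("item3", "loading"), ("item15", "threshold"), ("item1", "loading")]
flagged = stepwise_dif_search(toy_candidates, toy_gain)
```

Items that are never freed in this loop play the role of anchor items, since their parameters remain constrained to be equal across groups.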

DIF magnitude and impact

In addition to testing DIF statistically, we examined the magnitude of DIF visually by plotting the expected item score function (a subject’s expected response to an item across the underlying HRQOL continuum) and item information function (measurement precision of an item across the underlying HRQOL continuum) between both groups. We also plotted the expected test score function and test information function to investigate the magnitude of DIF at the aggregate (i.e., domain) level.
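As an illustration of the two curves being plotted, the sketch below computes an expected item score and an item information value under a logistic graded response model. This is an assumption made for simplicity: the study estimated loadings and thresholds in a categorical factor model, so the parameterization here (discrimination `a`, ordered thresholds) and the example values are hypothetical stand-ins.

```python
import math


def _cumulative(theta, a, thresholds):
    """P(response >= k) for each category boundary, padded with 1 and 0."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def category_probs(theta, a, thresholds):
    """Probability of each response category (len(thresholds) + 1 values)."""
    cum = _cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def expected_item_score(theta, a, thresholds):
    """Expected response (categories scored 0, 1, 2, ...) at a given theta."""
    return sum(k * p for k, p in enumerate(category_probs(theta, a, thresholds)))


def item_information(theta, a, thresholds):
    """Item information: sum over categories of (dP/dtheta)^2 / P."""
    cum = _cumulative(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        # Derivative of a logistic cumulative curve is a * P * (1 - P).
        dp = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p > 0.0:
            info += dp * dp / p
    return info


# Hypothetical thresholds for a five-category item.
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]
score_low_a = expected_item_score(0.0, 1.0, THRESHOLDS)
info_low_a = item_information(0.0, 1.0, THRESHOLDS)
info_high_a = item_information(0.0, 2.0, THRESHOLDS)
```

Evaluating these functions over a grid of theta values with group-specific parameters gives one curve per group; summing them over the items of a domain gives the corresponding test-level curves.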

We assessed the impact of DIF on the change of domain scores for each child before and after DIF calibration. We examined whether the score change was above two points (equivalent to 0.2 standard deviations (SDs), or 0.2 unit of effect size, given the score SD of 10) as the evidence of minimally important change. Further, we tested the discrepancy in the underlying domain scores between both groups, and compared the discrepancy before and after the DIF calibration. The criteria < 0.2, 0.2–0.49, 0.5–0.79, and ≥ 0.8 were used to indicate negligible, small, moderate, and large difference, respectively [39].
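The thresholds above translate directly into a small helper. The sketch assumes latent scores scaled to an SD of 10, as stated for the PedsQL domain scores, so that a two-point change corresponds to 0.2 SD; the function names are illustrative.

```python
def effect_size_label(d):
    """Classify an effect size by absolute magnitude using the stated cutoffs."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


def points_to_sd_units(points, sd=10.0):
    """Convert a change in score points to SD units (SD of 10 assumed)."""
    return points / sd


# Effect sizes reported before DIF calibration.
labels = {
    "physical": effect_size_label(-1.04),
    "emotional": effect_size_label(-0.74),
    "social": effect_size_label(-0.96),
    "school": effect_size_label(-0.98),
}
```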

We performed dimensionality assessment and DIF analyses using Mplus 5.21 [40], and conducted the rest of the analyses using STATA 9.0 [41].


Characteristics of study sample

Table 1 shows the characteristics of the 1195 children included in DIF analyses. The mean age was 10 years (SD: 4.1) and 52% were boys. About half (48%) of the children were classified as CSHCN. The rate of CSHCN in this study was higher than the national average (13–20%) [42] because our samples were collected from children enrolled in Medicaid and SCHIP, who usually have more health-related problems than the general pediatric population [43].

Table 1
Characteristics of study sample (N=1,195)

Dimensionality assessment

Table 2 shows the dimensionality assessment of the PedsQL domains using CFA and BFM. The Comparative Fit Index (CFI) > 0.95, Tucker-Lewis Index (TLI) > 0.95, and Root Mean Square Error of Approximation (RMSEA) < 0.06 were used to evaluate the model fit [44]. Findings suggest the domain of emotional functioning (no DIF) performed the best in fitting unidimensionality. In contrast, the domains of physical, social, and school functioning did not meet the assumption of unidimensionality. The use of BFM to account for the residual correlation among different items across four domains demonstrates a marginally acceptable model fit.

Table 2
Dimensionality assessment

Mean item scores

Table 3 shows the raw scores of individual items (column 3), and the differences between CSHCN and children without needs (column 4) derived from the final model. CSHCN were more impaired in HRQOL measured by individual items than children without needs (p < 0.001). The effect sizes for the differences in item scores between both groups were moderate to large in physical, social, and school functioning, and small to moderate in emotional functioning.

Table 3
Item parameter of CSHCN and healthy children using the PedsQL

DIF assessment

Table 3 shows the results of DIF tests (columns 5–6) and measurement properties of each item, including factor loading and threshold parameters (columns 7 and 8–11, respectively). If a specific item was flagged with DIF, different item parameters for CSHCN and children without needs were reported. Overall, seven out of 23 items (30.4%) were flagged with DIF. Among the DIF items, two items (#1 and #6) were associated with physical functioning, two items (#15 and #18) with social functioning, and three items (#19, #22 and #23) with school functioning. No DIF was identified in emotional functioning. All DIF items in physical and school functioning were non-uniform (#1, #6, #19, #22 and #23), whereas all DIF items in social functioning were uniform (#15 and #18).

For physical functioning, compared to children without needs, the factor loadings of item #1 (“walking more than one block”) and item #6 (“doing chores around the house”) were greater for CSHCN. Standard errors of the factor loadings on these two items were smaller for CSHCN than for children without needs. This implies that these two items discriminate better, and more reliably, between children below and above a certain level of underlying physical functioning among CSHCN than among children without needs.

For social functioning, the threshold parameters of item #15 (“other children/teens not wanting to be his or her friend”) were the same among CSHCN and children without needs, except the fourth threshold (most difficult), where the value was smaller for CSHCN than for children without needs. This implies that, given a higher level of underlying social functioning, parents of CSHCN were likely to report fewer problems in social skills than parents of children without needs. Threshold parameters of item #18 (“keeping up when playing with other children/teens”) were the same among CSHCN and children without needs, except the third threshold, where the value was smaller for CSHCN than for children without needs.

For school functioning, compared to children without needs, factor loadings of item #19 (“paying attention in class”) were greater for CSHCN, and standard errors of the factor loadings on this item were smaller for CSHCN than for children without needs. This implies that this item discriminates better, and more reliably, between children below and above a certain level of underlying school functioning among CSHCN than among children without needs. In contrast, compared to children without needs, factor loadings of item #22 (“missing school because of not feeling well”) and item #23 (“missing school to go to the doctor/hospital”) were smaller for CSHCN, but standard errors of the factor loadings on these items were comparable between both groups.

Magnitude of DIF

Figure 1 shows the expected item score function (left panel) and item information function (right panel) of DIF items in both groups. The solid and dotted lines represent CSHCN and children without needs, respectively. Overall, the expected item score function, conditioning on the same level of underlying HRQOL, differed slightly between both groups. For non-uniform items (#1, #6, #19, #22, and #23), the expected item score functions of both groups crossed at a certain level of the underlying HRQOL continuum. Specifically, compared to children without needs, the expected item score function of items #1, #6, and #19 for CSHCN was smaller at lower levels of underlying functioning and larger at higher levels. In contrast, compared to children without needs, the expected item score function of items #22 and #23 for CSHCN was larger at lower levels of underlying functioning and smaller at higher levels.

Figure 1
Expected item score function and item information function of DIF items. Solid line: CSHCN; dotted line: children without needs.

Interestingly, compared to children without needs, CSHCN possessed greater item information function on items #1, #6, and #19, but smaller on items #22 and #23. This is in line with the discrepancy in factor loadings of these items between both groups, where a higher factor loading contributes to greater item information function. The discrepancy in item information between both groups was smaller for uniform DIF (items #15 and #18) compared to non-uniform DIF.

Figure 2 shows the expected test score function (left panel) and test information function (right panel) of each domain in both groups. The solid and dotted lines represent CSHCN and children without needs, respectively. The discrepancy in the expected test score and test information function was negligible between both groups.

Figure 2
Expected test score function and test information function. Solid line: CSHCN; dotted line: children without needs.

Impact of DIF

Table 4 shows the HRQOL scores of CSHCN and children without needs before and after DIF calibration. Before DIF calibration, the underlying HRQOL of CSHCN was impaired significantly in all domains compared to children without needs (p < 0.001). The discrepancy was large in physical, social, and school functioning (effect sizes: −1.04, −0.96, and −0.98, respectively), and moderate in emotional functioning (effect size: −0.74). After DIF calibration, the underlying HRQOL of CSHCN was also impaired in all domains compared to children without needs (p < 0.001). The magnitudes were increased slightly, however, with effect sizes −1.08, −1.05, and −1.03 for physical, social, and school functioning, respectively.

Table 4
Underlying HRQOL scores between CSHCN and children without needs before and after DIF calibration

After DIF calibration, very few children had domain scores that changed by more than two points in either direction. For physical functioning, 3.1% of children without needs increased their scores by two points. For school functioning, 1.3% of CSHCN and 5.3% of children without needs increased their scores by two points, whereas 3.6% of CSHCN decreased their scores by two points.


In contrast to previous studies which assessed DIF associated with children diagnosed with chronic conditions [27,28], this study focuses on DIF associated with CSHCN. Specifically, we applied an MG-MIMIC method to identify uniform and non-uniform DIF. We found that CSHCN were associated with impaired HRQOL in raw item scores compared to children without needs. Seven items (30% of the total items in the PedsQL) were flagged with DIF; among the DIF items, five were non-uniform and two were uniform. For non-uniform DIF, factor loadings of the items differed between both groups in either direction, suggesting the DIF items differed in their ability to distinguish children below and above a certain level of underlying HRQOL between both groups. For uniform DIF, some threshold parameters of the items were smaller for CSHCN than for children without needs, suggesting that, given a certain level of underlying HRQOL, parents of CSHCN were likely to report fewer problems in performing daily functioning than parents of children without needs. The DIF findings, especially those of the non-uniform type, suggest that the effects of DIF cancel out at the test level, as indicated by the expected test score function and test information function, where the discrepancy between both groups was negligible.

The impact of DIF on the score calculation of individual children and on the comparisons of HRQOL between both groups is also minimal. After DIF calibration, only a small number of CSHCN and children without needs changed domain scores by two points. After calibrating DIF items, the disparity in HRQOL scores on the affected domains increased slightly between both groups.

Comparing this study with a previous study that assessed DIF only in social functioning of the PedsQL shows that two DIF items were flagged in both studies and two additional DIF items were identified in the previous study [28]. Fifty percent of DIF items in the previous study were uniform and the remaining 50% were non-uniform; however, all DIF items in social functioning in this study were uniform. This difference may be in part related to the different ways of classifying children’s health status and the different methods of identifying DIF in the two studies. This study classified children by a non-categorical approach based on the CSHCN Screener, focused on parent-proxy report of child’s HRQOL, and identified DIF by an MG-MIMIC method, whereas the previous study used a categorical/diagnosis-based approach, child self-report of HRQOL, and an IRT-LR method.

The DIF findings can be interpreted from a cognitive and psychosocial point of view [45]. Compared to parents of children without needs, parents of CSHCN may change their internal standard (e.g., altering their expectations) when assessing their children’s daily functioning, such as assuming their children are not responsible for fulfilling all physical activities and social roles [46–48]. After the illness journey and treatment, parents of CSHCN may be resilient to the uncertainty associated with illness and hold a positive outlook related to their ability to function with the illness, including being more positive, having a deeper appreciation for life, letting go of worry, and living for today [49]. One qualitative study found that children with asthma tend to use different strategies to normalize their daily life, such as acknowledging the existence of asthma, minimizing the impact of asthma on health, emphasizing their ability to manage asthma, and making adaptations in daily life [50]. This resiliency finding emphasizes that psychosocial adjustment and adaptive coping processes may enable children with chronic conditions to maintain or improve their personal growth and HRQOL, despite a progressive, disabling disease [51].

From a measurement perspective, DIF findings indicate a violation of the unidimensional assumption of the IRT [52]. In our study, the unsatisfactory fit of unidimensionality is salient in social and school functioning, which corresponds to more prominent DIF findings. This suggests that an intercorrelated multidimensional phenomenon may exist in pediatric HRQOL measurement because, for growth and developmental reasons, a child’s performance on social and school functioning will rely on his/her physical and emotional functioning. Our analyses suggest that the correlations among the four domains of the PedsQL were between 0.46 and 0.60. When the data are multidimensional and a unidimensional model is applied, it is likely that item parameters will be distorted due to local dependencies caused by secondary factors and over- or under-representation of specific content domains. In this regard, the use of a bi-factor model is helpful, especially if we conceptualize HRQOL as a dominant general factor and specific domains as group factors. A general factor represents a common trait which explains intercorrelations among items due to shared item contents. Group factors attempt to capture the item covariation that is independent of the covariation due to the general factor. This study echoes a previous study suggesting that if dimensions are moderately to highly correlated (≥ 0.4), the bi-factor representation will be a useful alternative [37].

A purification process for estimating the underlying HRQOL scores and the balance of covariates between groups are critical, but less discussed issues in DIF research. Purification is an iterative process using a two-step approach to estimate the underlying HRQOL scores (i.e., assessing DIF and then recalculating the underlying scores by accounting for DIF). Several studies found that identifying DIF in the first stage may change the DIF status in the second stage, and ignoring a purification process will lead to over- or underestimating the number of DIF items [21,22]. Conducting a purification process is important because DIF tests emphasize the control for the underlying HRQOL in the modeling. Unfortunately, many previous DIF studies used DIF-free items in the recalculation of the underlying scores [53]. The underlying HRQOL scores calculated by MIMIC method are purified given the fact that the parameters detecting and controlling for DIF are estimated iteratively during model building.

Covariate imbalance may confound DIF detection and the subsequent comparisons of HRQOL between the groups. For example, suppose parents in the CSHCN group are younger than parents of children without needs, and parental age is assumed to influence the conception of children’s HRQOL. Without balancing parental age across the groups, the DIF findings related to CSHCN may be confounded by parental age. This issue, however, has not been explicitly addressed in DIF algorithms, and procedures for balancing covariates must be incorporated into DIF tests. Although a single group-MIMIC method can incorporate covariates in DIF tests, this method detects only uniform DIF [23]. Our study is among the very few studies [25,54] that demonstrate the usefulness of MG-MIMIC methods for investigating both uniform and non-uniform DIF. In addition, the propensity score approach demonstrated in this study, which selects matched subjects from both groups, is a feasible method for DIF tests.
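
As a rough illustration of selecting matched subjects on a propensity score, the sketch below performs greedy 1:1 nearest-neighbor matching within a caliper. It assumes the propensity scores (the estimated probability of being in the CSHCN group given covariates such as parental age) have already been computed elsewhere. The matching rule, the caliper value, and all identifiers are assumptions for illustration and are not taken from this study.

```python
def greedy_caliper_match(ps_case, ps_control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    ps_case, ps_control: dicts mapping a subject id to a propensity score.
    Returns a list of (case_id, control_id) pairs whose scores differ by
    no more than `caliper`.
    """
    available = dict(ps_control)  # controls not yet matched
    pairs = []
    # Process cases in order of increasing propensity score.
    for case_id, p in sorted(ps_case.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control on the propensity score.
        best = min(available, key=lambda c: abs(available[c] - p))
        if abs(available[best] - p) <= caliper:
            pairs.append((case_id, best))
            del available[best]   # each control is used at most once
    return pairs


# Hypothetical scores for three CSHCN cases and four comparison children.
cases = {"a": 0.30, "b": 0.62, "c": 0.95}
controls = {"x": 0.31, "y": 0.60, "z": 0.50, "w": 0.10}
print(greedy_caliper_match(cases, controls))  # prints [('a', 'x'), ('b', 'y')]
```

Case "c" stays unmatched because no control lies within the caliper, which is the usual trade-off of caliper matching: better covariate balance at the cost of a smaller analytic sample.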

DIF investigation provides important insights for instrument refinement and psychological research. If we believe DIF represents biased items, instrument developers can use DIF information to guide item modification. By contrast, if we believe DIF findings reflect psychosocial adjustment or a true change in item meaning, item calibration between the groups of interest would be the best strategy. With item calibration, item parameters are estimated separately for the subgroups, and these group-specific parameter estimates are then used to estimate the underlying HRQOL. Nevertheless, further research is encouraged to use cognitive interviewing techniques to investigate the psychological mechanisms behind the DIF findings, especially how the DIF items are interpreted by CSHCN and children without needs themselves, and by parents of both groups of children [55].
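
To make group-specific item parameters concrete, the sketch below computes the expected score of a single item under a graded response model for two groups. The model choice and all parameter values are hypothetical and serve only to show how uniform DIF (shifted thresholds) and non-uniform DIF (a different slope) change the expected item score at the same underlying HRQOL level.

```python
import math


def grm_expected_score(theta, a, thresholds):
    """Expected item score under a graded response model.

    theta: underlying trait level; a: item slope (discrimination);
    thresholds: increasing b_1..b_{K-1} for categories scored 0..K-1.
    """
    # Cumulative probabilities P(X >= k) for k = 1..K-1.
    cumulative = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # For scores 0..K-1, E[X] equals the sum of P(X >= k) over k.
    return sum(cumulative)


# Hypothetical parameters for one item, calibrated separately by group.
reference = {"a": 1.0, "thresholds": [-1.0, 0.0, 1.0]}
uniform_dif = {"a": 1.0, "thresholds": [-0.5, 0.5, 1.5]}      # thresholds shifted
non_uniform_dif = {"a": 2.0, "thresholds": [-1.0, 0.0, 1.0]}  # slope differs

for theta in (-1.0, 0.0, 1.0):
    print(theta,
          round(grm_expected_score(theta, **reference), 3),
          round(grm_expected_score(theta, **uniform_dif), 3),
          round(grm_expected_score(theta, **non_uniform_dif), 3))
```

With shifted thresholds, the expected score is lower at every trait level. With a different slope, the gap between groups changes sign across the trait range, which is why non-uniform DIF cannot be summarized by a single shift.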

Some potential limitations merit attention. First, this study is restricted to children from low-income families enrolled in Florida KidCare, which threatens external validity and limits the generalizability of the findings to the general pediatric population. This low-income population, however, is important to assess because these children are at greater risk of chronic and life-limiting conditions due to their poor socioeconomic circumstances [56,57]. Second, this study used parents’ ratings of their child’s HRQOL, which may differ from the child’s or adolescent’s own ratings [38,58]. Synthesized evidence suggests that agreement between parent and child ratings of the child’s HRQOL is greater on observable domains (e.g., physical functioning) than on unobservable domains (e.g., emotional, social, and school functioning) [58]. Further studies based on children’s own ratings are needed to determine whether the DIF found with parents’ ratings can be replicated. Third, this study did not collect or explicitly control for parents’ psychological variables such as depression, which may confound DIF detection if these variables are not balanced between the groups. This threatens the internal validity of the DIF findings. It is possible that parents’ ratings of pediatric HRQOL were influenced by their own mental health [59].

In summary, although 30% of items in the PedsQL were flagged with DIF related to CSHCN and children without needs, the impact of DIF was negligible. Nevertheless, DIF assessment is useful for refining HRQOL instruments and investigating psychological adjustment associated with CSHCN. For comparing HRQOL, researchers should conduct DIF tests, calibrate DIF items, and then compare group differences.


Source of financial support: This study was supported in part by grants K23 HD057146 from the NIH/NICHD (ICH) and U01 AR052181-06 from the NIH/NIAMS (ICH).




1. van der Lee JH, Mokkink LB, Grootenhuis MA, Heymans HS, Offringa M. Definitions and measurement of chronic health conditions in childhood: A systematic review. JAMA. 2007;297:2741–2751. [PubMed]
2. Stein RE, Jessop DJ. What diagnosis does not tell: The case for a noncategorical approach to chronic illness in childhood. Soc Sci Med. 1989;29:769–778. [PubMed]
3. McPherson M, Arango P, Fox H, et al. A new definition of children with special health care needs. Pediatrics. 1998;102:137–140. [PubMed]
4. Bethell CD, Read D, Stein RE, Blumberg SJ, Wells N, Newacheck PW. Identifying children with special health care needs: Development and evaluation of a short screening instrument. Ambul Pediatr. 2002;2:38–48. [PubMed]
5. Child and Adolescent Health Measurement Initiative. [Accessed on December 29, 2010];National survey of children with special health care needs: 2005/2006 NS-CSHCN condition-specific profile. Available from:
6. Eiser C, Morse R. A review of measures of quality of life for children with chronic illness. Arch Dis Child. 2001;84:205–211. [PMC free article] [PubMed]
7. Detmar SB, Muller MJ, Schornagel JH, et al. Health-related quality-of-life assessments and patient-physician communication: A randomized controlled trial. JAMA. 2002;288:3027–3034. [PubMed]
8. Varni JW, Burwinkle TM, Lane MM. Health-related quality of life measurement in pediatric clinical practice: An appraisal and precept for future research and application. Health Qual Life Outcomes. 2005;3:34. [PMC free article] [PubMed]
9. Stucki G, Cieza A, Ewert T, et al. Application of the international classification of functioning, disability and health (ICF) in clinical practice. Disabil Rehabil. 2002;24:281–282. [PubMed]
10. Solans M, Pane S, Estrada MD, et al. Health-related quality of life measurement in children and adolescents: A systematic review of generic and disease-specific instruments. Value Health. 2008;11:742–764. [PubMed]
11. Varni JW, Seid M, Smith Knight T, et al. The PedsQL in pediatric rheumatology: Reliability, validity, and responsiveness of the pediatric quality of life inventory generic core scales and rheumatology module. Arthritis Rheum. 2002;46:714–725. [PubMed]
12. Varni JW, Burwinkle TM, Seid M, et al. The PedsQL 4.0 as a pediatric population health measure: Feasibility, reliability, and validity. Ambul Pediatr. 2003;3:329–341. [PubMed]
13. Ravens-Sieberer U, Gosch A, Rajmil L, et al. The KIDSCREEN-52 quality of life measure for children and adolescents: Psychometric results from a cross-cultural survey in 13 European countries. Value Health. 2008;11:645–658. [PubMed]
14. Varni JW, Limbers CA, Burwinkle TM. Impaired health-related quality of life in children and adolescents with chronic conditions: A comparative analysis of 10 disease clusters and 33 disease categories/severities utilizing the PedsQL 4.0 generic core scales. Health Qual Life Outcomes. 2007;5:43. [PMC free article] [PubMed]
15. Varni JW, Limbers C, Burwinkle TM. Literature review: Health-related quality of life measurement in pediatric oncology: Hearing the voices of the children. J Pediatr Psychol. 2007;32:1151–1163. [PubMed]
16. Soliday E, Kool E, Lande MB. Psychosocial adjustment in children with kidney disease. J Pediatr Psychol. 2000;25:93–103. [PubMed]
17. Stam H, Grootenhuis MA, Caron HN, Last BF. Quality of life and current coping in young adult survivors of childhood cancer: Positive expectations about the further course of the disease were correlated with better quality of life. Psychooncology. 2006;15:31–43. [PubMed]
18. Phipps S. Adaptive style in children with cancer: Implications for a positive psychology approach. J Pediatr Psychol. 2007;32:1055–1066. [PubMed]
19. Maurice-Stam H, Oort FJ, Last BF, et al. Longitudinal assessment of health-related quality of life in preschool children with non-CNS cancer after the end of successful treatment. Pediatr Blood Cancer. 2008;50:1047–1051. [PubMed]
20. Phipps S, Larson S, Long A, Rai SN. Adaptive style and symptoms of posttraumatic stress in children with cancer and their parents. J Pediatr Psychol. 2006;31:298–309. [PubMed]
21. Teresi JA. Different approaches to differential item functioning in health applications: Advantages, disadvantages and some neglected topics. Med Care. 2006;44:S152–S170. [PubMed]
22. Teresi JA, Fleishman JA. Differential item functioning and health assessment. Qual Life Res. 2007;16 Suppl. 1:33–42. [PubMed]
23. Jones RN. Identification of measurement differences between English and Spanish language versions of the mini-mental state examination: Detecting differential item functioning using MIMIC modeling. Med Care. 2006;44:S124–S133. [PubMed]
24. Yang FM, Tommet D, Jones RN. Disparities in self-reported geriatric depressive symptoms due to sociodemographic differences: An extension of the bi-factor item response theory model for use in differential item functioning. J Psychiatr Res. 2009;43:1025–1035. [PMC free article] [PubMed]
25. Carle AC. Mitigating systematic measurement error in comparative effectiveness research in heterogeneous populations. Med Care. 2010;48:S68–S74. [PubMed]
26. Woods CM. Evaluation of MIMIC-model methods for DIF testing with comparison to two-group analysis. Multivar Behav Res. 2009;44:1–27.
27. Erhart M, Ravens-Sieberer U, Dickinson HO, et al. Rasch measurement properties of the KIDSCREEN quality of life instrument in children with cerebral palsy and differential item functioning between children with and without cerebral palsy. Value Health. 2009;12:782–792. [PubMed]
28. Langer MM, Hill CD, Thissen D, et al. Item response theory detected differential item functioning between healthy and ill children in quality-of-life measures. J Clin Epidemiol. 2008;61:268–276. [PMC free article] [PubMed]
29. Traebert J, Page LA, Thomson WM, Locker D. Differential item functioning related to ethnicity in an oral health-related quality of life measure. Int J Paediatr Dent. 2010;20:435–441. [PubMed]
30. Tsutsumi A, Iwata N, Watanabe N, et al. Application of item response theory to achieve cross-cultural comparability of occupational stress measurement. Int J Methods Psychiatr Res. 2009;18:58–67. [PubMed]
31. Anarella J, Roohan P, Balistreri E, Gesten F. A survey of medicaid recipients with asthma: Perceptions of self-management, access, and care. Chest. 2004;125:1359–1367. [PubMed]
32. Dick AW, Brach C, Allison RA, et al. SCHIP's impact in three states: How do the most vulnerable children fare? Health Aff (Millwood). 2004:23–75. [PubMed]
33. Varni JW, Seid M, Rode CA. The PedsQL: Measurement model for the pediatric quality of life inventory. Med Care. 1999;37:126–139. [PubMed]
34. Varni JW, Seid M, Kurtin PS. PedsQL 4.0: Reliability and validity of the pediatric quality of life inventory version 4.0 generic core scales in healthy and patient populations. Med Care. 2001;39:800–812. [PubMed]
35. Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat. 1985;39:33–38.
36. D'Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med. 1998;17:2265–2281. [PubMed]
37. Reise SP, Morizot J, Hays RD. The role of the bifactor model in resolving dimensionality issues in health outcomes measures. Qual Life Res. 2007;16 Suppl. 1:19–31. [PubMed]
38. Huang IC, Shenkman EA, Leite W, et al. Agreement was not found in adolescents' quality of life rated by parents and adolescents. J Clin Epidemiol. 2009;62:337–346. [PMC free article] [PubMed]
39. Cohen J. Statistical power analysis for the behavioral sciences. Hillsdale, N.J: L. Erlbaum Associates; 1988.
40. du Toit M. IRT from SSI. Lincolnwood, IL: Scientific Software International, Inc; 2003.
41. StataCorp. Stata Statistical Software: Release 9.0. College Station, Texas: Stata Corporation; 2005.
42. Bethell CD, Read D, Blumberg SJ, Newacheck PW. What is the prevalence of children with special health care needs? Toward an understanding of variations in findings and methods across three national surveys. Matern Child Health J. 2008;12:1–14. [PubMed]
43. Todd J, Armon C, Griggs A, et al. Increased rates of morbidity, mortality, and charges for hospitalized children with public or no health insurance as compared with children with private insurance in Colorado and the United States. Pediatrics. 2006;118:577–585. [PubMed]
44. Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct Equ Modeling. 1999;6:1–55.
45. Bloem EF, van Zuuren FJ, Koeneman MA, et al. Clarifying quality of life assessment: Do theoretical models capture the underlying cognitive processes? Qual Life Res. 2008;17:1093–1102. [PubMed]
46. Walker LS, Zeman JL. Parental response to child illness behavior. J Pediatr Psychol. 1992;17:49–71. [PubMed]
47. Walker LS, Garber J, Greene JW. Psychosocial correlates of recurrent childhood pain: A comparison of pediatric patients with recurrent abdominal pain, organic illness, and psychiatric disorders. J Abnorm Psychol. 1993;102:248–258. [PubMed]
48. Walker LS, Garber J, Van Slyke DA. Do parents excuse the misbehavior of children with physical or emotional symptoms? An investigation of the pediatric sick role. J Pediatr Psychol. 1995;20:329–345. [PubMed]
49. Parry C. Embracing uncertainty: An exploration of the experiences of childhood cancer survivors. Qual Health Res. 2003;13:227–246. [PubMed]
50. Protudjer JL, Kozyrskyj AL, Becker AB, Marchessault G. Normalization strategies of children with asthma. Qual Health Res. 2009;19:94–104. [PubMed]
51. Schwartz CE, Sprangers MA. Methodological approaches for assessing response shift in longitudinal health-related quality-of-life research. Soc Sci Med. 1999;48:1531–1548. [PubMed]
52. Roussos L, Stout W. A multidimensionality-based DIF analysis paradigm. Appl Psych Meas. 1996;20:355–371.
53. Holland PW, Thayer DT. Differential item performance and the Mantel-Haenszel procedure. In: Wainer H, Braun HI, editors. Test Validity. Hillsdale, N.J: L. Erlbaum Associates; 1988.
54. Jones RN. Racial bias in the assessment of cognitive functioning of older adults. Aging Ment Health. 2003;7:83–102. [PubMed]
55. Ercikan K, Arim R, Law D, et al. Application of think aloud protocols for examining and confirming sources of differential item functioning identified by expert reviews. Educ Meas: Issues Pract. 2010;29:24–35.
56. Stein RE, Shenkman E, Wegener DH, Silver EJ. Health of children in title XXI: Should we worry? Pediatrics. 2003;112:e112–e118. [PubMed]
57. Szilagyi PG, Shenkman E, Brach C, et al. Children with special health care needs enrolled in the state children’s health insurance program (SCHIP): Patient characteristics and health care needs. Pediatrics. 2003;112:e508. [PubMed]
58. Eiser C, Morse R. Can parents rate their child's health-related quality of life? Results of a systematic review. Qual Life Res. 2001;10:347–357. [PubMed]
59. Waters E, Doyle J, Wolfe R, et al. Influence of parental gender and self-reported health and illness on parent-reported child health. Pediatrics. 2000;106:1422–1428. [PubMed]