About 500,000 children are coping with life-threatening conditions (LTC) in the United States every year. Service programs such as integrated pediatric palliative care may improve health-related quality of life (HRQOL), which is a major concern for these children and their families. However, evidence is limited about which HRQOL instruments are appropriate for this population. This study aims to evaluate the psychometric properties of a generic HRQOL instrument, the Pediatric Quality of Life Inventory (PedsQL) 4.0, for children with LTC. The parent proxy-report was used. We conducted telephone interviews with 257 parents whose children had LTC and were enrolled in Medicaid. We used standard psychometric methods to validate the PedsQL: scale reliability, item-domain convergent/discriminant validity, and known-groups validity. We also conducted Rasch analysis to assess construct validity. Results suggest that the PedsQL did not demonstrate valid psychometric properties for measuring HRQOL in this population. Rasch analysis suggests that the contents of the items in all domains did not appropriately cover the latent HRQOL of children with LTC. We document several methodological challenges in using a generic instrument to measure HRQOL and propose a new framework to improve HRQOL measures for children with LTC. The strategies include revising the content of existing items, designing new items, adding important themes (e.g., financial challenge), and applying computerized adaptive testing to better select appropriate items for individual children with LTC.
In the United States (U.S.) about 500,000 children are coping with life-threatening conditions (LTC) every year (Himelstein et al. 2004; Hoyert et al. 2006; Hynson et al. 2003). The U.S. Institute of Medicine (IOM) defines children with LTC as those who 1) carry a substantial probability of death in childhood, although treatment may succeed in curing the condition or substantially prolonging life, and 2) are perceived as potentially having fatal outcomes (Field et al. 2003). Physical and psychosocial impairments and health-related quality of life (HRQOL) become great concerns of families and their children who have LTC (Macdonald and Callery 2008; Rahi et al. 2004).
HRQOL instruments are either condition-specific or generic. Because the disease characteristics of children with LTC are heterogeneous, generic instruments are preferred for evaluation of service programs. Generic HRQOL instruments assess basic functioning for physical, emotional, and social health. To our knowledge, studies on children with chronic conditions frequently used the Pediatric Quality of Life Inventory Generic Core Scale (PedsQL) to measure the child’s HRQOL (Varni et al. 1999).
Although the PedsQL has been validated in children who have specific chronic illnesses (e.g., Varni et al. 2007a, 2007b), its usefulness for children with LTC has not been determined. Only one study used the PedsQL to measure HRQOL for children enrolled in the Seattle Pediatric Palliative Care Project (Hays et al. 2006). However, this study was conducted based on only 41 subjects and used descriptive analyses to validate the PedsQL. It is critical to validate psychometric properties of a HRQOL instrument such as PedsQL in children with LTC before its extensive use.
The aim of this study was to evaluate the psychometric properties of the PedsQL using a sample of parents/caregivers (the term “parents” is used throughout this paper) whose children had LTC. We used the following psychometric methods to validate the PedsQL: scale reliability, item-domain convergent/discriminant validity, and known-groups validity. In addition, we performed Rasch methodology to demonstrate the construct validity, by examining whether the PedsQL can appropriately capture the levels of latent HRQOL of children with LTC.
More recently, the IOM and the American Academy of Pediatrics (AAP) called for the integration of pediatric palliative care into ongoing medical management from the time of diagnosis to the end of life (AAP 2000; Field et al. 2003). Information derived from this study would be important for exploring the challenges in measuring HRQOL for this population and suggesting potential solutions that would be useful for those interested in assessing pediatric palliative service programs.
This study is part of a large project which aims to investigate the association of fatigue, HRQOL, and shared decision-making among children with LTC. Using the enrollment file of the Florida Medicaid Program, we identified children who were enrolled in Medicaid between April 2006 and March 2007 and met the Florida Medicaid guidelines for integrated pediatric palliative care admission (N=1,251) (Knapp et al. 2008). Children aged 2 to 18 years were eligible for study enrollment. We sent a primer letter to a random sample of 936 parents whose children met our selection criteria, followed by a telephone contact conducted between November 2007 and April 2008. Among the 936 parents, we were able to reach 447 parents who had valid contact information: 190 parents refused to take part in the telephone interviews and 257 parents agreed to participate (a response rate of 57.5%). A sample of 255 parents who completed the HRQOL survey was used in the psychometric analyses. The University of Florida’s Institutional Review Board approved this study.
We used the PedsQL Core 4.0 to measure the child’s HRQOL. The PedsQL is a generic instrument which is comprised of 21/23 items (see below) to capture four domains of HRQOL: physical functioning (8 items), emotional functioning (5 items), social functioning (5 items), and school functioning (3/5 items) (Varni et al. 1999). The PedsQL has four age-specific versions (2–4, 5–7, 8–12, and 13–18 years), each with minor modifications in the wording of items based on the children’s ages. The school functioning includes 3 items for the group of 2–4 years and 5 items for the group of 5–18 years. In this study, we adopted the parent proxy-report of their child rather than the child’s self-report because our previous study suggests that many children with LTC were too sick to answer the survey (Knapp et al. 2008).
The parent was asked how much of a problem a specific function had been for their child in the past 4 weeks. The response to each item is on a five-point Likert scale: “never”, “almost never”, “sometimes”, “often”, and “almost always” having a problem. Following the user’s guide, we imputed a subject’s missing item value in a specific domain using the mean score of the remaining non-missing items in that domain. If more than 50% of the items in a domain were missing, the domain score for that subject was not computed (Varni et al. 1999). We calculated the domain score by summing the item scores of the corresponding domain and dividing the sum by the number of items used in the domain. We then linearly transformed the domain scores to a 0–100 scale, where 0 represents the lowest HRQOL and 100 the highest HRQOL.
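The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the instrument's official scoring code: the function name `domain_score` is hypothetical, and we assume items are coded 0 (“never”) to 4 (“almost always”) with `None` marking a missing response.

```python
from typing import Optional, Sequence


def domain_score(items: Sequence[Optional[int]]) -> Optional[float]:
    """Score one domain on a 0-100 scale (higher = better HRQOL).

    Assumption: each item is coded 0 ("never") to 4 ("almost always"),
    and None marks a missing response.
    """
    answered = [x for x in items if x is not None]
    # Do not score the domain if more than 50% of its items are missing.
    if len(items) == 0 or (len(items) - len(answered)) / len(items) > 0.5:
        return None
    # Replacing each missing item with the mean of the answered items
    # leaves the domain mean unchanged, so we average the answered items.
    mean_raw = sum(answered) / len(answered)
    # Linear transform: raw 0 -> 100 (best), raw 4 -> 0 (worst).
    return 100.0 - 25.0 * mean_raw


# Example: a five-item domain with one missing response.
print(domain_score([1, 0, None, 2, 1]))
```

Because mean imputation within a domain does not change the domain mean, the sketch skips the explicit imputation step and averages the answered items directly.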
We used the Clinical Risk Groups (CRGs) system (Neff et al. 2002) to classify health status of individual children for the purpose of validating the PedsQL. The CRGs query over 2,000 diagnoses and procedures from the health services claim and encounter data. We assigned each child into one of five mutually exclusive CRG categories associated with health status. The five categories are 1) non-significant, non-acute including children whose underlying chronic condition was not recorded in the claims data but were seen for routine care or whose primary expenditures were pharmacy services, 2) significant acute conditions including children with acute illnesses that could be precursors to or place the child at risk for developing a chronic disease, 3) minor chronic conditions including children with illnesses that can usually be managed effectively with few complications, 4) moderate chronic conditions including children with illnesses that are variable in their severity and progression, can be complicated, and require extensive care, and 5) major chronic conditions including children with illnesses that are serious, and often result in progressive deterioration, debility, or death. We also collected demographic data including the child’s and the parent’s age, gender, race/ethnicity, and the parent’s educational background and marital status.
We conducted descriptive analyses and sophisticated psychometric analyses to test validity of the PedsQL for children with LTC. Descriptive analyses include the estimations of mean, median, and standard deviation of the item and domain scores. To better understand the degree of impairment on HRQOL among children with LTC, we compared HRQOL scores of the present population to a reference group which was derived from our previous study (Huang et al. 2009b). The reference group is based on 1,745 representative children in Florida who were between 2 and 18 years old and enrolled in the Florida Medicaid and the Title XXI State Children’s Health Insurance Program (SCHIP). Health status for children in the reference group varies, from healthy children to children with special health care needs, including acute and complex chronic conditions (Huang et al. 2009b). We compared the difference in the mean scores of the two groups using t-tests based on a pooled standard deviation (SD) of the two groups.
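As an illustration of a two-sample t-test based on a pooled SD, the following sketch computes the statistic from raw scores. It assumes raw domain scores are available for both groups; the function name `pooled_t` is hypothetical and is not taken from the original analysis.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Two-sample t statistic using a pooled standard deviation.

    Returns (t, pooled_sd, degrees_of_freedom).
    """
    n1, n2 = len(group_a), len(group_b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group_a)
                  + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    # Standard error of the difference in means under equal variances.
    se_diff = pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    t = (mean(group_a) - mean(group_b)) / se_diff
    return t, pooled_sd, n1 + n2 - 2


# Made-up scores for illustration only.
t, sd, df = pooled_t([55.0, 60.0, 65.0], [70.0, 80.0, 90.0])
print(round(t, 3), round(sd, 3), df)
```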
We conducted the following psychometric analyses to validate the PedsQL: scale reliability, item-domain convergent/discriminant validity, known-groups validity, and construct validity (Fayers and Machin 2007).
We calculated Cronbach’s alpha coefficients to assess scale reliability (also known as internal consistency). Cronbach’s alpha indicates the degree to which items of the same domain yield consistent results. Alpha coefficients above 0.7 are considered as acceptable for the purpose of group comparisons (Fayers and Machin 2007).
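For readers less familiar with the statistic, Cronbach’s alpha can be computed from the item variances and the variance of the summed score. The sketch below is a generic illustration with made-up data; the function name `cronbach_alpha` is hypothetical and complete responses (no missing items) are assumed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item-score lists.

    Assumes every respondent answered every item in the domain.
    """
    k = len(rows[0])  # number of items in the domain
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed domain score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


# Made-up responses for three respondents on two items.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))
```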
Item-convergent/discriminant validity measures whether the designed items capture the concept the corresponding domains intend to measure. We calculated Pearson’s correlation coefficient to assess item-domain convergent/discriminant validity, where convergent validity examines the correlation between the score of a specific item with the score of the corresponding domain (with overlap adjustment). In contrast, discriminant validity examines the correlation of the score of a specific item with the score of the other domains. To satisfy convergent validity, we expect to observe moderate (r=0.30–0.49) or strong (r≥0.5) correlation coefficients; for discriminant validity, we expect to observe small (r=0.10–0.29) or negligible correlation coefficients (r<0.1) (Cohen 1988).
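The overlap adjustment means that an item is removed from its own domain score before the two are correlated, so that the item is not correlated with itself. A minimal sketch follows; the function names are hypothetical, and summed scores are used in place of the 0–100 transformed scores, which does not affect Pearson’s r when no items are missing.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_domain_correlations(own_items, other_domain_scores, item_index):
    """Return (convergent, discriminant) correlations for one item.

    own_items: per-respondent item scores for the item's own domain.
    other_domain_scores: per-respondent score on a different domain.
    """
    item = [row[item_index] for row in own_items]
    # Overlap adjustment: own-domain score with the item itself removed.
    rest = [sum(row) - row[item_index] for row in own_items]
    convergent = pearson_r(item, rest)
    discriminant = pearson_r(item, other_domain_scores)
    return convergent, discriminant


# Made-up data: four respondents, a three-item domain, and one other domain.
own = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 4]]
other = [2.0, 1.0, 2.5, 1.5]
print(item_domain_correlations(own, other, item_index=0))
```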
We define known-groups validity as the extent to which the HRQOL instrument can discriminate between various groups of health conditions such as disease severity (Fayers and Machin 2007). We compared the mean differences in HRQOL scores between children with moderate and severe conditions measured by the CRGs. We conducted linear regression analyses to test the mean difference between the two groups with and without account for the influence of the child’s age and the parent’s race/ethnicity and educational background. We expect to observe greater HRQOL for children with moderate conditions compared to children with severe conditions.
We define construct validity as the extent to which the observed relationship (or hierarchy) between the designated items captures the construct of HRQOL we want to measure. We used Rasch methods to analyze construct validity of the PedsQL. Unlike the aforementioned analytic methods, which are based on classical test theory (CTT), Rasch methods focus on item-level analysis and use mathematical models to express a subject’s response to an item as a function of the subject’s latent HRQOL trait (e.g., physical functioning) and item characteristics (i.e., item difficulty) (Smith and Smith 2004). We defined item difficulty as how easy or difficult it is for a subject to endorse an item for the attribute the item intends to measure. Rasch analysis calibrates the item difficulty and a person’s latent HRQOL on the same continuum (or metric). That is, under the same domain, items were displayed on the right panel of the latent continuum, from bottom to top representing the easiest to most difficult items. Simultaneously, persons were displayed on the left panel of the latent continuum, from bottom to top representing persons’ worst to best HRQOL. Unlike traditional ordinal scales, Rasch analysis converts item difficulty and person ability into the same unit (i.e., logit) for item- and person-level comparisons. The difficulty value is the location of an item on the latent continuum at which individuals with that level of the latent HRQOL trait have a 50% probability of endorsing the item (Smith and Smith 2004).
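The 50% property can be illustrated with the simplest (dichotomous) form of the Rasch model, in which the probability of endorsing an item depends only on the difference between the person’s latent trait and the item difficulty, both in logits. This is a simplified sketch: the PedsQL items have five response categories, so an actual analysis would use a polytomous extension, which the text does not specify. The function name `rasch_probability` is hypothetical.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of endorsing an item under the dichotomous Rasch model.

    theta: person's latent trait level (logits).
    difficulty: item difficulty (logits).
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# When the person's level equals the item difficulty, the probability is 0.5.
print(rasch_probability(1.2, 1.2))
# A person located above the item is more likely than not to endorse it.
print(rasch_probability(2.0, 0.0) > 0.5)
```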
We conducted separate Rasch analyses using 255 LTC children and 255 healthy children to assess the hierarchy of item difficulty on the LTC children and the healthy children (the reference group). If the item hierarchy of the LTC group is different from the healthy group, this implies poor construct validity for the LTC group. We randomly selected 255 healthy children from 794 healthy children of the 1,745 representative children in Florida (see reference group described in the section of Psychometric Analyses). We identified healthy children based on the category 1 of the CRGs (see CRGs described in the section of Measures). Children’s age and gender as well as parents’ age and educational background were not statistically different between children with LTC and healthy children.
Table 1 shows the characteristics of the study sample. Mean age of the parents was 43.1 years old (standard deviation 11.4) and mean age of the children was 11.4 years old (standard deviation 5.2). Fifty-four percent of the parents were male and the majority of respondents had the educational background of high school or below (58%). Race/ethnicity of the parents was 41% White, non-Hispanic, 24% Black, non-Hispanic, and 30% Hispanic. Nearly half of the respondents were married (49%). Seventy-six percent of the children were classified by the CRGs as having major chronic conditions.
Table 2 shows the distribution of the PedsQL domain scores. Of note, the column “reference group” in Table 2 represents the scores derived from our previous study (Huang et al. 2009b). Across all domains of the PedsQL, children with LTC scored significantly lower than the reference group (p<0.001). The scores of all domains were slightly skewed to the left (i.e., negative skew). Floor effects (defined as the percentage of subjects who report the lowest score of the measure, which represents the most impaired HRQOL) were slightly larger in school functioning (5.3%) compared to other domains. Ceiling effects (defined as the percentage of subjects who report the highest score of the measure, which represents the greatest HRQOL) were slightly larger in the domains of emotional (10.2%) and social functioning (9.7%) compared to other domains.
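Under the definitions given above, floor and ceiling effects are simply the percentages of subjects at the lowest and highest possible scores. A minimal sketch with made-up scores follows; the function name `floor_ceiling` is hypothetical.

```python
def floor_ceiling(scores, lowest=0.0, highest=100.0):
    """Return (floor %, ceiling %) for a list of 0-100 domain scores."""
    n = len(scores)
    # Percentage of subjects at the lowest possible score (most impaired).
    floor_pct = 100.0 * sum(s == lowest for s in scores) / n
    # Percentage of subjects at the highest possible score (best HRQOL).
    ceiling_pct = 100.0 * sum(s == highest for s in scores) / n
    return floor_pct, ceiling_pct


# Made-up domain scores for four subjects.
print(floor_ceiling([0.0, 50.0, 100.0, 100.0]))
```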
At the item level (not shown), the item scores were not normally distributed. A greater portion of parents reported that their children “never” had a problem in the items of physical (19–48%), emotional (24–43%), social (21–48%), and school (14–32%) functioning. Parents reported that the content of several items was not applicable to their children, particularly related to physical functioning (1–15%) and school functioning (3–8%). We used a follow-up question to determine why the item was not applicable and parents reported that their children used a wheelchair, remained in bed, or were otherwise unable to walk or speak.
As shown in Table 3 (column 1), internal consistency reliability of the PedsQL was largely acceptable. Cronbach’s alpha coefficients were greater than 0.7 in the domains of physical, emotional, and school (2–4 years) functioning, and marginally acceptable in the domains of social and school (5–18 years) functioning.
Table 3 (columns 2 and 3) shows the results of item-domain convergent/discriminant validity. Pearson’s correlation coefficients suggest that convergent/discriminant validity of the PedsQL was not satisfied. Correlations between the scores for a specific item and its own domain (with overlap adjustment) were not significantly larger than the correlations between the scores for a specific item with other domains. In addition, the magnitudes of the correlations for some items with other domains were moderate to strong, rather than small or negligible, suggesting poor discriminant validity. For example, correlations between item scores and the domain score of physical functioning (0.37–0.74) were not discernable compared to the correlations between item scores of physical functioning with domain scores of emotional, social, and school functioning (0.16–0.45).
Table 4 shows that known-groups validity of the PedsQL was not satisfied. Using the CRGs as the known groups, children with more severe conditions did not have more impaired parent-reported HRQOL than children with less severe conditions across the domains of physical, social, and school functioning. This finding contradicts our hypothesis and suggests poor known-groups validity.
Table 5 and Figs. 1, 2, 3, and 4 show the construct validity of the PedsQL. We conducted Rasch analyses based on 255 children with LTC and 255 healthy children derived from our previous study (Huang et al. 2009b), which comprised the 1,745 children from the general population enrolled in the Florida Medicaid Program. The figures demonstrate that the hierarchical relationships associated with item difficulty among the items were different between children with LTC and healthy children. This finding held across the four individual domains of the PedsQL. For children with LTC, for example, the more difficult items of physical functioning were lifting something heavy, followed by walking more than one block and running. The less difficult items were low energy level, followed by taking a bath oneself and doing chores (Fig. 1, right panel). For healthy children, however, the more difficult items of physical functioning were having aches, followed by doing chores and low energy level. The less difficult items were walking more than one block, followed by taking a bath oneself and lifting something heavy (Fig. 1, left panel).
Figures 1, 2, 3, and 4 also serve as a visual aid to determine whether the spread of the item difficulty was wide enough to cover the latent HRQOL of the two populations of children. The findings suggest that the difficulties of the items across the four domains may be more appropriately dispersed for the latent HRQOL of children with LTC compared to healthy children. Taking physical functioning as an example, the person mean index, defined as the difference between the mean of item scores (anchored to 0) and the mean of children’s person scores, was smaller in children with LTC than in healthy children (0.02 and 2.04 logits, respectively; Table 5). The mean of healthy children’s HRQOL is much higher than the mean of the item difficulty. This reflects the fact that, compared to children with LTC, the latent HRQOL of the healthy children was excellent but was beyond the items’ capability to measure.
Two other indexes are also useful to measure construct validity of the PedsQL: person separation and item separation (Table 5). Person separation index indicates how well the PedsQL items separate the sample into statistically distinct levels of ability. Similarly, item separation index indicates how well the sample separates the items into different levels of difficulty. Higher separation (larger than 2.0) suggests a measurement scale covering a sufficient range of the construct being measured (Smith and Smith 2004). The item separations were larger than 2.0 in all domains of the PedsQL (except for social functioning) among healthy children. In contrast, the item separations were larger than 2.0 in only social and school functioning among children with LTC. Although person separation was smaller than 2.0 on all domains, this value was larger among children with LTC compared to healthy children.
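As a rough illustration of how a separation index is commonly defined in Rasch software, the sketch below computes separation as the ratio of the error-adjusted ("true") standard deviation of the measures to their root mean square error. This definition, the use of the population variance, and the function name `separation_index` are assumptions for illustration; the exact computation used to produce Table 5 is not stated in the text.

```python
import math
from statistics import pvariance


def separation_index(measures, standard_errors):
    """Return (separation, reliability) for person or item measures in logits.

    Assumed definition: separation = true SD / RMSE, where the true variance
    is the observed variance minus the mean squared standard error.
    """
    observed_var = pvariance(measures)
    mse = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    # Guard against a negative estimate when error exceeds observed spread.
    true_var = max(observed_var - mse, 0.0)
    separation = math.sqrt(true_var / mse)
    reliability = separation ** 2 / (1.0 + separation ** 2)
    return separation, reliability


# Made-up item measures (logits) and their standard errors.
sep, rel = separation_index([-2.0, 0.0, 2.0], [0.5, 0.5, 0.5])
print(round(sep, 2), round(rel, 3))
```

Under this definition, a separation above 2.0 corresponds to a reliability above 0.8.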
This study investigated psychometric properties of the PedsQL within a population of children who had LTC and were eligible for an integrated pediatric palliative care program. We used standard psychometric tests to assess scale reliability, item-domain convergent/discriminant validity, and known-group validity. We performed Rasch analysis to assess construct validity. We found that HRQOL of children with LTC was significantly impaired across all domains compared to a reference group derived from our previous study (Huang et al. 2009b) which is comprised of 1,745 children enrolled in Florida Medicaid Program.
We also found that the use of a generic instrument, the PedsQL, produced smaller ceiling effects on domain scores among children with LTC compared to healthy children. However, a large proportion of parents still reported that their children never had a problem in performing daily functioning, specifically in the physical, emotional, and social domains. This appears to be a threat to the construct validity of the PedsQL. Our Rasch analysis further identified a substantial difference in the hierarchy of item difficulties between children with LTC and healthy children. This may be in part due to the fact that parents of children with LTC and parents of healthy children used different internal standards to perceive the meaning of items within the same domain, leading to a change in the relative importance of the items (Breetvelt and Van Dam 1991). Although this study cannot determine how parents were truly interpreting the items, future cognitive interviews with parents to investigate their interpretations of the items will provide better answers. Of note, the comparison of the hierarchy of item difficulties between children with LTC and healthy children was not based on the same calibrated metric. Further studies need to conduct differential item functioning (DIF) or measurement invariance tests (Reeve et al. 2007) to better detect whether the measurement construct of the PedsQL is the same between children with LTC and healthy children.
We found that item-domain convergent/discriminant validity of the PedsQL was not satisfied. This suggests that the items originally designed to measure a specific concept of HRQOL are not appropriate for this population. It is likely that items of, for example, physical functioning may measure the concept of multiple domains of physical, emotional and social functioning. In addition, the known-groups validity associated with disease severity was not satisfied, where the most severely ill group measured by the CRGs did not demonstrate impaired HRQOL compared to less severely ill counterparts. We however cannot rule out the possibility that the use of the CRGs to validate the PedsQL is less appropriate for this population because data gathered for generating the CRGs is based on a one-year window prior to the HRQOL survey. For children with LTC whose diseases progress rapidly, there may be a time lag between health status determined by the CRGs and genuine health status during the survey. The unexpected findings may also relate to parental traumatic growth and adaptation regarding their children’s functioning in the face of a LTC. Parents may be resilient to the uncertainty associated with illness, and may hold a positive outlook related to their ability to function with the LTC (Maurice-Stam et al. 2008; Parry 2003).
Measuring HRQOL using generic instruments is a challenging endeavor for children with LTC. On the one hand, some items in the existing generic HRQOL instruments may not be appropriate for some of the children; on the other hand, some important domains are not included in the existing instruments. Our Rasch analysis suggests that the contents of the items in all domains may not appropriately cover the latent HRQOL of children with LTC given the unsatisfactory findings of person separation and/or item separation. As shown in Figs. 1, 2, 3, and 4, the difficulties of the items in each domain were clustered in the middle range of the latent continuum of HRQOL, which is narrower among children with LTC compared to that among healthy children. This implies we may need to include some items on the high end and low end of the scale to better discriminate HRQOL of children with LTC who have very high or very low HRQOL. In addition, to resolve the ceiling effect of the PedsQL when it is applied to healthy children, more challenging items may need to be added to the measurement.
Consistent with the Rasch findings, our open-ended inquiry also reported some parents thought that the content of several items was not applicable to their children, particularly in the domains of physical functioning. About 15% of parents reported that the “problem walking more than one block” and “problem with running” were not applicable because their children were in bed or used a wheelchair. This finding casts a doubt on the use of the static measurement module which requires all children with LTC to answer all items of the same instrument regardless of their applicability.
It is time to develop a new measurement framework and use a comprehensive approach to measure HRQOL for children with LTC. We suggest using the following strategies to improve existing HRQOL measurements. In addition to standard domains of the HRQOL, such as physical, psychological and social functioning, the new framework should account for condition-specific themes which are important to this group of children, such as fatigue, pain and symptoms, ancillary health states (e.g., vision, hearing, speech, etc.), social discrimination/stigma, resilience, financial impact (including concern of health insurance), and family functioning and family cohesion (Emerson 2003; Goldman et al. 2006). As previously described, many items in the PedsQL are not equally useful for all children because the contents of some items (e.g., walking more than one block) are not possible for some children with LTC to accomplish. We may add some condition-specific items, such as “moving more than one block using equipment such as a wheelchair” to better discriminate children with different levels of disability. Adding condition-specific items into the existing generic items can increase the instrument’s capability to cover the lower end of latent HRQOL for severely ill children.
Recognizing the heterogeneous disease characteristics of children with LTC, the use of static modules will increase measurement errors and decrease precision. In this regard, the use of dynamic methods, such as computerized adaptive tests (CAT), to select items appropriate for an individual might better assess their HRQOL (Hays et al. 2000). The Patient-Reported Outcomes Measurement Information System (PROMIS), initiated by the U.S. National Institutes of Health (NIH), is a useful framework to measure HRQOL. The PROMIS develops item banks of HRQOL which can cover a broad range of latent HRQOL and allows the administration of items using a CAT module (Cella et al. 2007; Reeve et al. 2007). The domains of the pediatric PROMIS are: physical functioning (mobility), physical functioning (upper extremity), fatigue, pain impact, anxiety, depression, anger, and peer relationships. We should, however, bear in mind that the contents of the PROMIS items are generic rather than condition-specific.
Several study limitations merit attention. First, the response rate for the survey was 57.5%. While this response rate is consistent with other surveys conducted with Medicaid eligible pediatric populations (Huang et al. 2009a; 2009b), there may be inherent differences between responders and non-responders. Second, the generalizability of our findings to the entire population of children with LTC is limited because the underlying characteristics of parents of Medicaid children in this study may be different from parents of children from other socioeconomic backgrounds. Third, we assessed HRQOL of children with LTC based on a parent proxy-report rather than a child self-report. Our previous study suggests that the discrepancy in HRQOL between parent proxy-report and a child self-report was significant (Huang et al. 2009a). Ideally, it is better to collect a child’s perception about the impact of his/her diseases on daily functioning. Fourth, we chose the CRGs as our known-groups, which queries diagnostic and procedure codes to classify children into health status categories. Inherent biases may exist in the CRGs in that they do not account for gaps in coverage and provider miscoding that may have occurred. Finally, we collected and validated cross-sectional data which does not take into account changes in the children’s HRQOL associated with their disease progression. It is important to longitudinally validate HRQOL instruments, known as responsiveness to change, over their disease trajectories within this population.
We conclude that the usefulness of a generic HRQOL instrument for children with LTC is limited. It is important to develop a new framework and use a compelling approach to measure HRQOL for this vulnerable population of children.
This study is supported in part by 1 K23 HD057146-01 from the U.S. National Institutes of Health (ICH).
This study was presented in part at the 2009 International Society for Child Indicators on November 5, 2009, Australia.
I-Chan Huang, Departments of Health Outcomes and Policy, and the Institute for Child Health Policy, University of Florida, 1329 SW 16th Street, Room 5277, Gainesville, FL 32608, USA.
Pey-Shan Wen, Departments of Health Outcomes and Policy, and the Institute for Child Health Policy, University of Florida, 1329 SW 16th Street, Room 5130, Gainesville, FL 32608, USA.
Dennis A. Revicki, Center for Health Outcomes Research, United BioSource Corporation, 7501 Wisconsin Avenue, Suite 705, Bethesda, MD 20814, USA.
Elizabeth A. Shenkman, Departments of Health Outcomes and Policy, and the Institute for Child Health Policy, University of Florida, 1329 SW 16th Street, Room 5235, Gainesville, FL 32608, USA.