Objective: To conduct a multimethod psychometric evaluation to refine the Children's Somatization Inventory (CSI) and to investigate its dimensionality.
Method: The CSI was administered to 876 pediatric patients with chronic abdominal pain at their initial visit to a pediatric gastroenterology clinic. Tools from three psychometric models identified items that most effectively measured the construct of somatization and examined its dimensionality.
Results: Eleven statistically weak items were identified and removed, creating a 24-item CSI (CSI-24). The CSI-24 showed good psychometrics according to the three measurement models and correlated .99 with the original CSI. The CSI-24 has one dominant general factor but is not strictly unidimensional.
Conclusions: The CSI-24 is a reliable and psychometrically sound refinement of the original CSI. Findings are consistent with the view that somatization has a strong general factor that represents a continuum of symptom reporting, as well as minor components that represent specific symptom clusters in youth with chronic abdominal pain.
Although somatic symptoms are common in the general population (Pennebaker, 1982), individuals who frequently experience medically unexplained somatic symptoms may be classified from a psychiatric perspective as having a somatoform disorder (e.g., pain disorder, somatization disorder) and from a biomedical perspective as having a functional somatic symptom or syndrome (e.g., recurrent abdominal pain, fibromyalgia, irritable bowel syndrome) (Campo & Fritsch, 1994). We developed the Children's Somatization Inventory (CSI; Walker, Garber, & Greene, 1991) to assess 35 symptoms (e.g., headache, nausea, heart racing) that often, but not necessarily, occur in the absence of identified disease. The CSI has been used in numerous studies of pediatric patients, including patients with chronic abdominal pain (Dorn et al., 2003; Mulvaney, Lambert, Garber, & Walker, 2006; Robins, Smith, & Proujansky, 2002; Walker, Garber, & Greene, 1993; Walker et al., 1991), headache (Smith, Martin-Herz, Womack, & McMahon, 1999; White, Alday, & Spirito, 2001), chronic fatigue syndrome (Garralda, Rangel, Levin, Roberts, & Ukoumunne, 1999; Smith, Martin-Herz, Womack, & Marsigan, 2003; van de Putte, Engelbert, Kuis, Kimpen, & Uiterwaal, 2006), factitious illness by proxy (Berg & Jones, 1999), and chronic unexplained pain (Konijnenberg et al., 2005, 2006). The instrument also has been used to assess symptom reporting in community samples of children receiving routine medical care (Rocha & Prkachin, 2007; Tsao & Zeltzer, 2003) and children undergoing routine immunization inoculations (Rocha, Prkachin, Beaumont, Hardy, & Zumbo, 2003).
Our initial development and validation of the CSI was based on a sample of pediatric patients with frequent abdominal pain (Walker et al., 1991). Evidence of the instrument's reliability and validity in nonclinical community samples was subsequently demonstrated both in our work in the United States (Garber, Walker, & Zeman, 1991) and in work by investigators in the Ukraine (Bromet et al., 2000; Litcher et al., 2001) and the Netherlands (Meesters, Muris, Ghys, Reumerman, & Rooijmans, 2003; Muris, Vlaeyen, & Meesters, 2001). All these validation studies noted that some items on the CSI are rarely endorsed and have low item-total correlations. This observation is not surprising given that the CSI includes items from the symptom criteria that define somatization disorder in adults [Diagnostic and statistical manual of mental disorders, third edition, revised (DSM-III-R); American Psychiatric Association, 1987], and several of these symptoms (e.g., memory loss, pain with urination) are rarely experienced by children. Thus, the first aim of the current study was to identify items that could be dropped from the CSI to make the instrument more appropriate for youth while maintaining its measurement properties.
Different approaches to scoring the CSI are evident in the literature and reflect different perspectives regarding the dimensionality of somatization. The creation of total scores by summing responses to all 35 CSI items is consistent with a view of somatization as a single dimension of somatic concern. In contrast, creation of subscales by summing responses to items that reflect symptom categories [e.g., gastrointestinal (GI), conversion] defined by the criteria for somatization disorder is consistent with a view of somatization as multidimensional. The empirical literature with adults suggests that the construct of somatization includes a large general factor that represents a general tendency to report somatic symptoms (Liu, Clark, & Eaton, 1997). However, little is known about the dimensionality of somatization in children and adolescents. Exploratory factor analysis (EFA) of the CSI (e.g., Garber et al., 1991; Litcher et al., 2001; Meesters et al., 2003) has yielded inconsistent factors, and no studies have applied confirmatory factor analysis (CFA) to evaluate whether CSI data support a general somatization factor rather than, or in addition to, symptom clusters. Such an investigation has important implications both for the conceptualization of somatization and for the diagnostic criteria for somatization disorder in children and adolescents. Thus, the second aim of the current study was to examine the dimensionality of the CSI.
Our analytic plan used tools from the three main approaches to psychometrics, namely classical test theory (CTT; Cronbach, 1951; Cronbach & Shavelson, 2004), item response theory (IRT) using Rasch modeling (Linacre, 2006b; Rasch, 1960/1980), and EFA and CFA. Because some items on the CSI are rarely endorsed, we expected to identify items with low variance that could be dropped without reducing alpha reliability. We also expected to identify items that contributed little to distinguishing among respondents. Finally, because somatization has been conceptualized both as trait-like somatic distress and as clusters of symptoms associated with different organ systems (Robbins, Kirmayer, & Hemami, 1997), we expected that EFA and CFA would yield evidence of a single large dimension, somatization, and several smaller unique factors defined by items comprising various bodily systems.
The CSI data for the current study were collected from several cohorts of pediatric patients that were not included in the initial instrument development sample (Walker et al., 1991) and have been described in detail elsewhere (Lipani & Walker, 2006; Walker, Garber, Smith, Van Slyke, & Claar, 2001; Walker et al., 2006). All participants were recruited during the period from 1993 through 2005 as consecutive new patients referred to the Pediatric Gastroenterology Clinic at Vanderbilt University Medical Center for evaluation of unexplained chronic abdominal pain. Additional eligibility requirements for study participants included being between the ages of 8 and 18 years, living with parent(s) or parent figure, speaking English, having no preexisting diagnosis of organic disease (e.g., Crohn's disease, pancreatitis, ulcerative colitis) that would explain abdominal pain, having no other chronic illness (e.g., diabetes), and having no developmental disability (e.g., mental retardation or autism). Approximately 6% of patients screened for eligibility were excluded because of previously diagnosed chronic illness (e.g., diabetes) or developmental disability.
The sample used in the current study comprised 876 children (mean age = 11.66, SD = 2.47); 59% were female. The majority of the participants were Caucasian (87.8%), 4.1% were African American, and 3.1% were of other ethnicities (5% were missing this information); these percentages reflect the ethnic composition of patients evaluated in the clinic. Duration of abdominal pain ranged from 1 month to 15 years (M = 19 months, SD = 24 months). The majority of the children's mothers and fathers (80%) had completed a high school education, with the average educational level being equivalent to some college or technical school.
Parental consent and child assent were obtained upon the families’ arrival for their clinic visit, prior to the child's medical evaluation. Parents completed a questionnaire pertaining to child and family demographics. The child protocol was administered in a private room by an interviewer who read the items aloud and asked children to indicate their answers on a printed response sheet; this procedure was used to maximize similarity between administration of the questionnaire at the clinic and at a subsequent telephone follow-up (not reported here) in which items were read to children and they selected responses from a response sheet that had been mailed to them. Families were compensated for their participation. The study was approved by the institutional review board.
Children completed the CSI (Walker et al., 1991), a questionnaire that assesses the perceived severity of 35 nonspecific somatic symptoms. The CSI includes items from the symptom criteria for somatization disorder as defined by the DSM-III-R (American Psychiatric Association, 1987), items from the Somatization factor of the Hopkins Symptom Checklist (HSCL; Derogatis, Lipman, Rickels, Uhlenhuth, & Covi, 1974), and an additional symptom—constipation—that is common in functional GI disorders. Examples of CSI items include headaches, low energy, dizziness, and chest pain. One symptom derived from the DSM criteria, “pain in the genitals,” was removed from an earlier version of the CSI because it was not endorsed by study participants. The stem for symptom report on the CSI is the same as that in the HSCL, namely, “How much were you bothered by (symptom)?” (Derogatis, Lipman, & Covi, 1973; Derogatis et al., 1974). The standard time period for symptom report on the CSI is 2 weeks. The response format is a 5-point scale ranging from “not at all” (0) to “a whole lot” (4). Total CSI scores, obtained by summing all item ratings, can range from 0 to 140.
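The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of summing item ratings; the function name `csi_total` and the example respondents are hypothetical and are not part of the published instrument or the study data.

```python
def csi_total(responses, n_items=35):
    """Sum item ratings (each 0-4) into a CSI total score."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} item ratings, got {len(responses)}")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each rating must be an integer from 0 to 4")
    return sum(responses)


# Hypothetical respondents, not study data.
print(csi_total([0] * 35))  # lowest possible total: 0
print(csi_total([4] * 35))  # highest possible total: 35 * 4 = 140
```

Passing `n_items=24` applies the same rule to a 24-item form, for which the maximum total would be 96.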
Parents completed a 10-item demographic questionnaire regarding the child's ethnicity, family constellation, and caregivers’ occupations and educational levels.
In this section, we refer to the original CSI with 35 items as CSI-35 and to the presently revised CSI with 24 items as the CSI-24. The CSI-35 data were 99.9% complete, with 97.6% of the cases having no missing items. With <5% of cases having any missing data, almost any reasonable method of dealing with missing data can be used (Harrell, 2001). We replaced missing data by single imputation using the expectation maximization (EM) algorithm (Rubin, 1991), filling in the missing item scores (0.1% of all item responses, occurring in 2.4% of cases) based on each patient's nonmissing responses (Little & Rubin, 1987). An advantage of EM imputation is that it preserves the variance, which may be reduced by mean imputation. In this case, the average item mean and standard deviation were the same both before and after imputation (M = 0.68, SD = 0.88).
The 876 participating patients were split into two groups by an SAS-generated random number to create a learning sample and a cross-validation sample. This split allowed us to freely experiment with the learning sample (n = 417), revise the CSI-35, and then examine the revision's psychometric properties with the fresh cases in the cross-validation sample (n = 459).
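A random split of this kind can be sketched as follows. The study used an SAS-generated random number; the Python version below is a stand-in, and the seed, the 0.5 threshold, and the function name `split_sample` are assumptions made for illustration. Assigning each case independently, as done here, generally produces groups of unequal size.

```python
import random


def split_sample(case_ids, seed=0):
    """Assign each case to a learning or cross-validation group at random."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    learning, validation = [], []
    for cid in case_ids:
        # One independent random draw per case decides its group.
        if rng.random() < 0.5:
            learning.append(cid)
        else:
            validation.append(cid)
    return learning, validation


# Hypothetical case identifiers 1..876.
learning, validation = split_sample(range(1, 877))
print(len(learning), len(validation))
```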
First, we examined the psychometric properties of items from the CSI-35 in the learning sample (Table I). According to CTT, good items should be free of ceilings or floors that limit their variance and produce excessive skew and kurtosis, and they should have high item-total correlations to give the test high alpha reliability. Thus, the CTT criteria for item evaluation were descriptive statistics (i.e., mean, SD, skew, and kurtosis) and item-total correlations. Rasch (1960/1980) modeling is a one-parameter member of the IRT family (Embretson, 1996; Lord, 1986) and focuses on test construction with practical applications (Bond & Fox, 2001; Edelen & Reeve, 2007). While two- or three-parameter IRT models are available, the Rasch IRT model was preferable because it can be scored by a clinician using a simple total score or a measure score from a table without the use of specialized scoring software. The Rasch model evaluates test items and people on a single latent trait interval scale. Good items show a logistic ogive curve in which the probability of endorsing an item increases with the strength of the latent trait (i.e., somatization) in the person and decreases as the difficulty of the item increases. For example, a child low in somatization might report the occasional headache (an “easy” item), but not deafness, seizures, and blindness (“difficult” items). Items that fit the Rasch model have good ability to distinguish people on the strength of their trait, as shown by acceptable infit and outfit mean squares. Therefore, in addition to CTT criteria, we used Rasch infit and outfit indices as criteria for identifying CSI items with atypical response patterns indicative of poor measurement. “Infit” measures unexpected responses to items with a latent trait level close to the person's symptom level. “Outfit” measures unexpected responses to items with a latent trait level markedly different from the person's symptom level (Linacre & Wright, 1999).
Popular criteria favor fit indices that lie between 0.5 and 1.5 (Linacre, 2006a) or 0.7 and 1.3 (Bond & Fox, 2001).
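To make the logistic relation concrete, the sketch below implements the dichotomous form of the Rasch model, in which the probability of endorsing an item depends only on the difference between the person's trait level and the item's difficulty (both in logits). This is a simplification: the CSI uses a 5-point response format, so the actual analysis would rely on a polytomous extension that is not specified here. The item difficulties and person levels shown are invented for illustration.

```python
import math


def rasch_probability(theta, difficulty):
    """Dichotomous Rasch model: P(endorse) = 1 / (1 + exp(-(theta - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Hypothetical logit values, not estimates from the study.
easy_item, hard_item = -1.0, 3.0       # e.g., a common vs. a rarely endorsed symptom
low_person, high_person = -0.5, 2.0    # low vs. high somatization

for label, theta in (("low", low_person), ("high", high_person)):
    print(label,
          round(rasch_probability(theta, easy_item), 2),
          round(rasch_probability(theta, hard_item), 2))
```

For a fixed item the probability rises with the person's trait level, and for a fixed person it falls as item difficulty rises; when trait level equals difficulty the probability is exactly 0.5.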
Response alternatives for the CSI items range from 0 to 4, and many items had means close to the floor (Table I). For example, the item “blindness” (23) had 96% zeros and a mean of 0.07, indicating that the item “blindness” contributed very little information about most patients because it is so rarely endorsed. To visually identify items in Table I with flooring, we underlined means below 0.30. Items with the lowest means generally had problems on multiple criteria. For example, the item “blindness” had an SD less than half that of the higher items, a skew over 7, a remarkable kurtosis of 57, a below-average item-total correlation, and unacceptable Rasch infit. Thus, the multiple criteria from CTT and Rasch gave a consistent message that the item “blindness” does not measure effectively. Moreover, the underlined items in Table I reveal a general pattern of measurement problems for items with means near the floor. Many items close to the floor had weaker psychometric properties as shown by multiple underscores. These floored items require time and effort on the patients’ part to complete, but they add little to the CSI's ability to measure. Therefore, we dropped the 10 lowest scoring items (indicated by • in Table I), namely those with means <0.30 on a 0–4 scale. These items all had warning flags on other criteria as well, such as item-total correlations or infit. An additional item, “lump in throat” (10), showed unacceptable outfit to the Rasch measurement model and a lackluster item-total correlation <0.30, so it too was dropped, leaving a revised CSI with 24 items (i.e., CSI-24). Our goal in dropping the statistically weaker items was to reduce the test length without reducing its reliability. Next we compared the reliability of the CSI-35 and the CSI-24.
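The CTT screening described above can be sketched as follows. The code computes the descriptive statistics named in the text and flags items whose mean falls below the 0.30 floor criterion. Several details are assumptions rather than facts from the study: the moment-based formulas for skew and excess kurtosis, the use of a corrected item-total correlation (item versus the sum of the remaining items), and all example data.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def item_summary(item, floor_cutoff=0.30):
    """Descriptive statistics for one item's ratings (a list of 0-4 values)."""
    n = len(item)
    m = mean(item)
    dev = [x - m for x in item]
    m2 = sum(d ** 2 for d in dev) / n  # second central moment
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    return {
        "mean": m,
        "sd": math.sqrt(m2 * n / (n - 1)),                          # sample SD
        "skew": m3 / m2 ** 1.5 if m2 > 0 else float("nan"),
        "kurtosis": m4 / m2 ** 2 - 3 if m2 > 0 else float("nan"),   # excess kurtosis
        "pct_zero": 100 * item.count(0) / n,
        "floored": m < floor_cutoff,                                # mean below 0.30
    }


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(data, j):
    """Correlate item j with the sum of all other items (rows = respondents)."""
    item = [row[j] for row in data]
    rest = [sum(row) - row[j] for row in data]
    return pearson(item, rest)


# Hypothetical rarely endorsed item: 24 zeros and a single rating of 2.
rare = item_summary([0] * 24 + [2])
print(rare["floored"], rare["pct_zero"])

# Hypothetical 5 respondents x 3 items.
toy = [[0, 0, 1], [1, 1, 1], [2, 2, 3], [3, 3, 2], [4, 4, 4]]
print(round(corrected_item_total(toy, 0), 2))
```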
The internal consistency reliability of the CSI-35 in the learning sample was good, with Cronbach's alpha = .90. The Rasch person separation reliability, an index of measure sensitivity that evaluates how effectively items separate respondents, was .84, indicating that the CSI-35 has a good ability to distinguish between individuals who are higher or lower on somatization. The Rasch item separation reliability was .99, suggesting that the CSI-35 contains a wide range of item difficulties to measure patients who are low or high on somatization. In the learning sample, the reliability of the 24-item CSI was very close to that of the 35-item form, Cronbach's alpha = .88 (down from .90). Rasch person separation reliability of the CSI-24 remained .84, and the Rasch item separation reliability remained .99. These results suggest that, in the learning sample, the 24-item version was essentially as reliable as the 35-item version.
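For readers who want to reproduce an alpha comparison of this kind on their own data, Cronbach's alpha can be computed from the item variances and the variance of the total score: alpha = k/(k - 1) × (1 - sum of item variances / variance of totals). The sketch below uses the standard formula with sample variances; the toy data are hypothetical and far smaller than any real sample.

```python
from statistics import variance


def cronbach_alpha(data):
    """Cronbach's alpha for a respondents-by-items table (list of rows)."""
    k = len(data[0])
    item_vars = [variance([row[j] for row in data]) for j in range(k)]
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5 respondents x 3 items.
toy = [[0, 0, 1], [1, 1, 1], [2, 2, 3], [3, 3, 2], [4, 4, 4]]
print(round(cronbach_alpha(toy), 2))
```

Running the function once on all items and once on the retained subset of columns gives the kind of before-and-after comparison reported above.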
Having revised the CSI in the learning sample (n = 417), we next evaluated its psychometric properties in the cross-validation sample (n = 459). According to substantive theory, the CSI should reflect an underlying trait of somatization but should not be exclusively unidimensional because somatization can be expressed in many distinct bodily systems (e.g., nervous, musculoskeletal, circulatory, GI). A child who has functional GI symptoms will not necessarily have musculoskeletal symptoms. For this reason, we expected the symptoms on the CSI to have more uniqueness than a strictly unidimensional test.
To describe the dimensionality of the CSI, we began with a principal component analysis (PCA). The goal of the PCA was to describe the variance of the sample as parsimoniously as possible. While there has been debate about the merits of PCA and EFA (Costello & Osborne, 2005), PCA is common in the medical literature and, according to a widely used text, is the better choice for a unique mathematical solution that represents an empirical summary of the data set (Tabachnick & Fidell, 2001).
The scree plot of principal components in Fig. 1 suggests that a single large component explains almost 30% of the total variance, and a second independent component explains 8% of the variance. For reference, a parallel factor analysis of random numbers is shown by the dashed line (O’Connor, 2000). The third and later components can be ignored as they are similar to chance eigenvalues. To understand the second eigenvalue, we examined its PCA loadings. These loadings were relatively low: Only one item (loose bowel movements) had a loading >0.40. Five additional items (constipation, food intolerance, nausea, bloating, and stomach pain) had loadings between .25 and .40. These items represent common symptoms in the pediatric gastroenterology clinic population from which our sample was drawn.
Next, we conducted a CFA of a one-factor measurement model. The CFA, run with Mplus (Muthen & Muthen, 2003), suggested that the CSI-24 does not fit a single-factor model. The comparative fit index (CFI) was low at 0.74 (a good CFI is >0.95; Hu & Bentler, 1999). The root mean square error of approximation (RMSEA) was 0.07 (a good fit is <0.05, Steiger, 2000; or <0.06, Tabachnick & Fidell, 2001). These results are congruent with those obtained with the PCA and suggest that the CSI-24 items have uniqueness in addition to the somatization trait that they all share.
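The two fit indices can be computed from model and baseline chi-square statistics with the textbook formulas sketched below. The chi-square values in the example are hypothetical (the study's chi-square statistics are not reported here), and software packages differ in small details such as using N or N - 1 in the RMSEA denominator, so this is an approximation of what a CFA program reports rather than a reproduction of it. The cutoffs are the ones quoted in the text.

```python
import math


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index from model and baseline (null) chi-squares."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model, 0.0)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def rmsea(chi2_model, df_model, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


def meets_cutoffs(cfi_value, rmsea_value, cfi_min=0.95, rmsea_max=0.06):
    """Apply the cutoffs quoted in the text (RMSEA < 0.06 is the looser one)."""
    return cfi_value > cfi_min and rmsea_value < rmsea_max


# Hypothetical chi-square values chosen for easy arithmetic.
print(round(cfi(150.0, 50, 1100.0, 100), 2))   # 1 - 100/1000 = 0.9
print(round(rmsea(150.0, 50, 401), 3))         # sqrt(100 / (50 * 400))

# The reported one-factor results fail both cutoffs.
print(meets_cutoffs(0.74, 0.07))               # False
```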
We evaluated the CSI-24 in the cross-validation sample using 10 criteria from CTT, factor analysis, and Rasch modeling, as shown in Table II. Columns 1–2 show CTT results (Cronbach's alpha and the average inter-item correlation), columns 3–6 show factor analysis, and columns 7–10 show Rasch modeling results. Below, we interpret the criteria in Table II in order from left to right.
(1) The internal consistency reliability of the CSI-24 was good.
(2) The average correlation between items was moderate. This moderate correlation suggests that the items have uniqueness in addition to their shared construct of somatization.
(3) In a PCA, the second component (GI symptoms) was small, suggesting that it would be difficult to build a robust second factor.
(4) In the CFA, standardized factor loadings were all positive, ranging from medium to high, suggesting that there was a strong first factor that all items measure.
(5–6) In the CFA, the CSI-24 showed a poor fit to a one-factor model, suggesting that it is not strictly unidimensional.
(7–8) Rasch fit criteria were almost always satisfactory, suggesting that the items fit the Guttman–Rasch model.
(9) The CSI-24's sensitivity to differences between people was good.
(10) Items covered a wide range of difficulty values.
A useful feature of IRT models is the ability to assess the extent to which items are targeted to a given sample. Tests can be constructed to assess individuals across a wide range of a trait or can be targeted to assess individuals in a particular range of a trait (e.g., those with clinically significant levels of a trait). Item information curves show that items with low measure scores evaluate healthier subjects with the least error, whereas items with high measure scores are well targeted toward more severe clinical cases. Most of the CSI items had “high-end” measure scores and would thus yield more accurate information about the most severe cases of somatization. This targeting of items toward the high end fits a measure that is used to identify individuals who may have clinically significant somatization.
Finally, we compared CSI-35 scores with CSI-24 scores in the cross-validation sample. The correlation between total scores was very high (r = .99, p < .001), indicating that the CSI-24 is a refinement of the original CSI-35 rather than a distinct new instrument.
Scores on the CSI-24 were somewhat higher for older children (r = .19); the effect size for age was between “small” and “medium” according to Cohen (1992). Scores also were higher for females, t(830) = 3.52, p = .0005; the effect size for gender was “small,” d = (mean difference)/SDpooled = 0.25.
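As a rough check on a standardized mean difference, Cohen's d can be recovered from an independent-samples t statistic and the two group sizes: d = t × sqrt(1/n1 + 1/n2). The group sizes used below are assumptions: the degrees of freedom imply 832 respondents, and the split of 491 girls and 341 boys is inferred from the 59% female figure for the full sample rather than taken from the analysis itself.

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Cohen's d implied by an independent-samples t test with pooled SD."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


# t and df are from the text; the group sizes are assumed (see above).
d = cohens_d_from_t(3.52, 491, 341)
print(round(d, 2))  # 0.25
```

With equal group sizes the same t statistic gives 2t/sqrt(df), which is about 0.24, so the result is not sensitive to the assumed split.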
Administration and scoring instructions for the CSI-24 are presented in Appendix I. Researchers who desire equal-interval scores may prefer to use the Rasch measure scores rather than CSI total sum scores. The Rasch measure scores corresponding to each CSI total score are provided in Appendix II. A figure illustrating the relation between CSI raw total scores and Rasch measure scores is presented in Appendix III. In the midrange, raw total scores and Rasch measure scores are closely related, but at the extremes, the nonlinear Rasch measure scores give a more equal-interval measurement.
Some investigators have modified the CSI by rescoring the 5-point response format into a dichotomous score and calculating a total score as the number of symptoms endorsed (Garber et al., 1991; Garralda et al., 1999; Konijnenberg et al., 2005; Litcher et al., 2001; van de Putte et al., 2006). According to Cohen (1983), dichotomous scoring throws information away, reducing statistical power. We investigated the impact of dichotomous scoring of the CSI by applying three cutpoints to our data. Cutpoint A dichotomized responses as 0, 1 versus 2, 3, 4; cutpoint B dichotomized responses as 0, 1, 2 versus 3, 4; and cutpoint C dichotomized responses as 0, 1, 2, 3 versus 4. Continuous CSI-24 scores correlated r = .96 with cutpoint A scores, r = .92 with cutpoint B scores, and r = .74 with cutpoint C scores. Cronbach's alpha was reduced from .88 (continuous scores) to .84 (cutpoint A), .80 (cutpoint B), or .68 (cutpoint C). Cutpoints A and B thus do only slight harm, but cutpoint C seriously reduces reliability. In addition to the loss of reliability, dichotomous scoring makes cross-site comparisons of means problematic and therefore is not recommended.
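The three recoding schemes can be sketched as follows. Each cutpoint is expressed as the lowest rating that counts as "symptom present," and the dichotomous total is the number of items at or above that rating. The example ratings are hypothetical.

```python
# Lowest rating counted as "symptom present" under each cutpoint.
CUTPOINTS = {"A": 2, "B": 3, "C": 4}


def symptom_count(ratings, cutpoint):
    """Number of items rated at or above the cutpoint's threshold."""
    threshold = CUTPOINTS[cutpoint]
    return sum(1 for r in ratings if r >= threshold)


# Hypothetical ratings for one respondent on eight items.
ratings = [0, 1, 1, 2, 2, 3, 4, 0]
print(sum(ratings))  # continuous total: 13
for name in ("A", "B", "C"):
    print(name, symptom_count(ratings, name))  # A 4, B 2, C 1
```

The example shows how much detail each recoding discards: a continuous total of 13 collapses to 4, 2, or 1 depending on the cutpoint, and ratings of 1 never contribute at all.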
We conducted a multimethod psychometric evaluation to refine the 35-item CSI and to examine the dimensionality of somatization. Using CSI data from 876 pediatric patients with chronic abdominal pain, we identified 11 statistically weak CSI items in a randomly selected learning sample. These items were removed, resulting in the 24-item CSI. An evaluation of the CSI-24 in a cross-validation sample showed that it has good psychometric properties that meet the requirements of CTT, Rasch measurement models, and CFA. The CSI-24 correlates .99 with the CSI-35 and has better psychometric properties, less respondent burden, and items that are more appropriate for youth. Thus, the CSI-24 is preferable to the original 35-item measure unless the specific items deleted from the original measure are of particular interest.
EFA and CFA examined the dimensionality of the CSI-24. A large factor representing the presence of multiple symptoms explained almost 30% of the total variance. All CSI items had positive standardized loadings of medium to high magnitude on this factor, indicating that one dominant general factor underlies symptom reporting associated with multiple organ systems. The second factor (GI symptoms) was quite weak and may be unreliable. These findings are similar to those of Liu and colleagues (1997) who examined the latent structure of somatization symptoms in a population sample of several thousand adults and found that all items loaded strongly on a stable general factor, although smaller factors also were evident. Our findings suggest that somatization in children also has a strong general factor that represents a continuum of symptom reporting and may have minor components that represent specific symptom clusters. Recent prospective studies provide evidence of the continuity of the general component of somatization from childhood through adolescence and young adulthood (Dhossche, Ferdinand, van der Ende, & Verhulst, 2001; Mulvaney et al., 2006; Steinhausen & Winkler Metzke, 2007). This study contributes further evidence suggesting that a trait-like component of somatization persists across development even as specific symptoms may change.
CSI scores describe the subjective severity of somatic symptoms regardless of etiology. Appropriate medical evaluation is necessary to rule out disease. Similarly, the presence of a somatoform disorder cannot be assumed without appropriate psychiatric evaluation. The CSI should not be used to make individual diagnoses of somatization disorder but must be combined with other medical and psychiatric examinations to identify possible explanations for the symptoms endorsed. However, the CSI may be useful in tracking somatic symptoms over time or monitoring treatment response in patients whose clinical evaluation yields the diagnosis of a functional somatic syndrome or somatoform disorder (cf. Gledhill & Garralda, 2006).
The CSI-24 differs from the 8-item somatic complaints scale of the Child Behavior Checklist (Achenbach, 1991) in that it includes a broad range of symptoms that represent the criteria for various somatoform disorders, as well as symptoms representing the criteria for functional somatic syndromes such as irritable bowel syndrome, fibromyalgia, and chronic fatigue syndrome. Therefore, the CSI-24 will be useful in advancing research on both a trait-like construct of somatization and functional syndromes associated with specific clusters of medically unexplained symptoms.
The present study was based on a predominantly Caucasian clinical sample and may not be representative of somatization in other ethnic groups or in nonclinical community populations. We selected a sample of patients with a primary complaint of chronic abdominal pain because this patient population was used in the original CSI validation study (Walker et al., 1991) and, moreover, abdominal pain is the most common unexplained somatic complaint of childhood and is frequently associated with a variety of other unexplained somatic symptoms. Interestingly, in addition to the major component of somatization, our CSI data showed a minor component characterized by GI symptoms. It is possible that factor analysis of CSI data from patients with other primary complaints (e.g., chest pain, headaches) would yield a minor component consisting of symptoms from the corresponding organ system. Additional research is needed to evaluate whether the factor structure of the CSI in our sample is reproduced in various clinical and community populations.
Recent theoretical and empirical literature highlights an ongoing debate regarding competing approaches to the classification of medically unexplained symptoms (e.g., Hiller, 2006; Kroenke, 2006; Sharpe, Mayou, & Walker, 2006) and whether specific symptom clusters represent one or several syndromes (Aggarwal, McBeth, Zakrzewska, Lunt, & Macfarlane, 2006; Ciccone & Natelson, 2003; Moss-Morris & Spence, 2006). To date, this literature has focused on adult populations. The CSI-24 is a psychometrically sound instrument that can be used to acquire pediatric data relevant to the classification of these symptoms. For example, the CSI-24 can be used to assess whether other pediatric somatic syndromes (e.g., fibromyalgia, chronic fatigue syndrome) provide further evidence of a general symptom reporting factor across multiple organ systems. These data will be important in evaluating the extent to which pediatric somatic syndromes represent discrete entities or are variants of a more general tendency to report somatic symptoms.
(R01 HD23264 and P30 HD15052); National Institute of Mental Health (T32 MH18921, partial to Vanderbilt University).
Conflicts of interest: None declared.
| Sum score | Rasch measure score | Sum score | Rasch measure score | Sum score | Rasch measure score |