This study explored the application of latent variable measurement models to the Social Anhedonia Scale (SAS; Eckblad, Chapman, Chapman, & Mishlove, 1982), a widely used and influential measure in schizophrenia-related research. Specifically, we applied unidimensional and bifactor item response theory (IRT) models to data from a community sample of young adults (n = 2,227). Ordinal factor analyses revealed that identifying a coherent latent structure in the 40-item SAS data was challenging due to: a) the presence of multiple small content clusters (e.g., doublets), b) modest relations between those clusters which, in turn, implies a general factor of only modest strength, c) items that shared little variance with the majority of items, and d) cross-loadings in bifactor solutions. Consequently, we conclude that SAS responses cannot be modeled accurately by either unidimensional or bifactor IRT models. Although the application of a bifactor model to a reduced 17-item set met with better success, significant psychometric and substantive problems remained. Results highlight the challenges of applying latent variable models to scales that were not originally designed to fit these models.
The Social Anhedonia Scale (SAS; Eckblad, Chapman, Chapman, & Mishlove, 1982) is a 40-item dichotomously scored self-report measure used frequently in psychiatric and general population research. Patients with schizophrenia consistently report substantial elevations on this scale and higher levels of social anhedonia are related to worse functioning and outcome (Horan et al., 2006, 2008). Moreover, behavioral genetic studies indicate the presence of elevated SAS scores among unaffected biological relatives of those with schizophrenia (Horan et al., 2006, 2008). There is also extensive evidence from non-clinical samples that elevated SAS scores are an indicator of vulnerability for the development of schizophrenia. For example, in cross-sectional studies, healthy individuals with elevated SAS scores demonstrate neurocognitive, physiological, and psychological abnormalities similar (though attenuated) to those observed in schizophrenia. In prospective studies, individuals with elevated SAS scores demonstrate significantly higher risk for later development of schizophrenia-related disorders (Mishlove & Chapman, 1985; Kwapil, 1998; Gooding, Tallent, & Matts, 2005, 2007). Thus, a large and diverse body of research based on the SAS has provided important insights into vulnerability to schizophrenia-related psychopathology.
Although the SAS has clearly been an important and highly influential research instrument, it was developed nearly 30 years ago and fundamental questions about its applicability in contemporary psychopathology research remain unaddressed. Specifically, what exactly is measured by the SAS or, equivalently, what is the latent dimensional structure of the SAS – does it assess:
The main objective of this research is to evaluate the latent structure of the SAS to determine the major source(s) of variance that affect scores on this measure. Stated differently, the objective of this study is to explore the degree to which SAS responses are consistent with the above described models.1 To address this question, we apply unidimensional and bifactor item response theory (IRT; Embretson & Reise, 2000) models to data collected from a community sample (n = 2,227) of young adults.
However, this project is not merely a technical exercise in latent variable model fitting nor will we engage in a “fit index contest” to determine the “best” model. Instead, our analyses are designed to discover how and why SAS item responses, considering all 40 items, conform or fail to conform to alternative structural representations both statistically and substantively. Our initial hope is that the data are consistent with either Model A or Model B. In turn, our motivation for applying an IRT model rests on our interest in using it to: a) expand the range of questions researchers can ask using SAS responses (e.g., questions regarding differential item functioning), and b) to make SAS responses more amenable for use in other latent variable statistical procedures such as structural equations modeling (SEM) and latent growth curve (LGC) modeling. Although our focus here is on the SAS, similar conceptual and methodological issues apply to many commonly used questionnaires in personality and psychopathology research that were developed prior to the advent of latent variable models.
In what follows, we first describe the SAS and previous factor analytic investigations of this scale. Second, we review the application of unidimensional and bifactor IRT models and describe key model assumptions. Third, using SAS data collected from a large, community-based sample, we report a sequence of analyses that assess the latent structure of the SAS.
The SAS is a 40-item dichotomously scored self-report measure designed to assess individual differences in the tendency to engage in, and derive positive emotions from, social relations. The scale was developed in the 1970s and 80s as one of a set of self-report measures to assess traits associated with schizotypy or psychosis-proneness (Chapman et al., 1976; Chapman et al., 1994; Eckblad & Chapman, 1983; Mishlove & Chapman, 1985). The item content of the SAS was designed to reflect popular theories of anhedonia and schizotypy (e.g., Meehl, 1962; Rado, 1960). The SAS was not developed through factor analytic methods and no attempt was made to construct a measure that would produce responses that are consistent with a latent variable model, either unidimensional or multidimensional. Rather, among other criteria (e.g., low endorsement rate (because social anhedonia is assumed to be rare), low correlation with social desirability measures), items were retained for the SAS that correlated reasonably well with other candidate SAS items, but were not too highly correlated with measures of other schizotypy related constructs.
The content for each of the 40 SAS items is shown in Table 1. Inspection of the item content suggests a number of psychologically rich and important themes. Despite this content heterogeneity, most studies that included the SAS have scored the instrument as representing a single construct and report high internal consistency reliability (e.g., coefficient alpha > .80; Fonseca-Pedrero et al., 2009; Horan et al., 2008). Of course, this high internal consistency is purchased at the price of a lengthy 40-item questionnaire, implying a relatively modest average inter-item correlation, which in turn suggests a weak general trait underlying the items. Moreover, inspection of Table 1 reveals that the SAS contains several item clusters (e.g., #19, #37) that are arguably the same question presented twice in slightly different forms, thus further inflating the internal consistency reliability.
Although there are several factor analytic investigations of schizotypy related measures (e.g., Kwapil, Barrantes-Vidal, & Silvia, 2008), there are surprisingly few exploratory or confirmatory factor analyses of SAS responses. Ostensibly, the SAS was designed to measure a single construct and one would therefore expect to find that a single latent dimension would reproduce the item inter-correlations well. This is exactly what Fonseca-Pedrero et al. (2009) reported in a confirmatory factor analysis of a translated version of the SAS in a Spanish college student sample. Specifically, in a one-factor model, these researchers found: RMSEA=.067 and CFI=.92, indicating an acceptable recovery of the original tetrachoric correlation matrix.
However, in reference to Fonseca-Pedrero et al. (2009), we note that: a) an acceptable fit to a one-factor model does not imply a “strong” single dimension or that the loadings are estimated correctly (i.e., unbiased by correlated residuals), b) alternative multidimensional models or modification indices were not reported, and c) the authors reported very high loadings for many items (i.e., six loadings > .70, and one loading was .90) and an exceptionally high coefficient alpha (α =.95). In short, these latter findings suggest that SAS responses in this Spanish sample were more internally consistent than is typically found in U.S. samples.
Despite this finding suggesting a single common factor, several researchers have suggested that the item content of the SAS may tap into multiple distinguishable domains. For example, some have suggested that the SAS includes items that are conceptually unrelated to anhedonia (e.g., Lewandowski et al., 2006 – negative affect, emotion dysregulation). Others have attempted to select homogeneous subsets of items on a rational face-valid basis, including items described as measuring “social aversion” or “disinterest” (Rector et al., 2005; Brown et al., 2007; Granholm et al., 2009). However, we are aware of only one study that empirically evaluated whether the SAS has a multidimensional structure.
In a large college student sample, Blanchard et al. (2000) used principal components analysis followed by direct oblimin rotation to argue for a four component solution. The basis for selecting four components was the size of the eigenvalues and the fact that solutions of higher dimensionality resulted in uninterpretable findings (e.g., factors that were defined by two-item doublets). The four components were labeled: a) lack of importance of close friends (#19, #37, #33), b) lack of involvement with others (#36, #4, #32), c) preference for being alone (#34R, #6, #8R), and d) lack of emotional awareness (#1R, #22R, #39R). Modest correlations were found among the four estimated components ranging from .30 to .40. Thus, the Blanchard et al. study provides evidence that multiple factors can be extracted from SAS item responses, and that these dimensions are correlated. In turn, such findings suggest the presence of a general (or second-order) dimension of weak to modest strength as well.
In summary, although the SAS was initially developed such that scores on the measure reflect variation on a single unitary construct, empirical research that has evaluated the structure of this widely used measure is scant. While the available evidence does suggest the presence of a single common factor (Model A), it also suggests the presence of both a general factor and psychologically meaningful facets that contribute distinguishable sources of variance (Model B). In our subsequent analyses, we attempt to empirically sort out these sources of variance and explore the extent to which they represent either mere nuisance (and thus an acceptable unidimensional IRT model) or important sources of variance that prevent unidimensional measurement (and thus a bifactor representation may be appropriate). Prior to performing these analyses, we briefly review IRT models and associated assumptions.
For dichotomously scored items, the fundamental unit of an IRT model is an item response curve (IRC) that mathematically relates individual differences on a continuous latent trait (θ) with the probability of endorsing an item (xi) in the keyed direction. By far the most commonly applied model to describe the IRC for non-cognitive measures is the two-parameter logistic model (2PL) defined in Equation 1: P(xi = 1 | θ) = 1 / (1 + exp[−1.7αi(θ − βi)]).
In Equation 1, individual differences are represented by θ – a latent variable expressed on a z-score scale with mean zero and variance one, and each item is characterized by two parameters: α is an item discrimination parameter and β is an item location parameter. The 1.7 in Equation 1 is a scaling factor that brings the logistic curve into close agreement with the normal-ogive model; in turn, this facilitates translation between IRT and factor analytic models. Item discrimination parameters are analogous to factor loadings – items with higher discriminations are better able to differentiate between individuals. The item location parameter corresponds to the location on the latent trait scale where the probability of endorsing an item is .50 and reflects where on the latent trait an item is most discriminating (i.e., psychometrically informative).
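For readers who prefer a computational statement of the model, the IRC of Equation 1 can be sketched in a few lines of Python. The parameter values below are hypothetical, chosen only to illustrate the curve's behavior:

```python
import math

def p_endorse(theta, a, b):
    """2PL item response curve (Equation 1): probability of endorsing
    the item given trait level theta, discrimination a (alpha), and
    location b (beta). The 1.7 scaling factor aligns the logistic
    curve with the normal ogive."""
    return 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))

# At theta equal to the item's location, endorsement probability is .50;
# a highly discriminating item separates nearby trait levels sharply.
p_mid = p_endorse(theta=1.0, a=2.0, b=1.0)    # = 0.50 by definition of b
p_low = p_endorse(theta=-1.0, a=2.0, b=1.0)   # near zero for low-trait respondents
```

Note how increasing a steepens the curve at b, which is why discrimination is analogous to a factor loading.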
The model in Equation 1 is a “unidimensional” model (Model A; Figure 1) because it contains only a single variable to reflect the individual differences measured by an item set. Thus, a critical assumption is that a single common factor explains the item inter-correlations (i.e., common variance). When item response data are not unidimensional, forcing data into a unidimensional model leads to distorted item parameter estimates (Steinberg & Thissen, 1996). The most likely form of this distortion is that item discrimination parameter estimates are too high and do not properly represent the relation between item responses and the common target latent trait. Moreover, the latent variable may not validly represent the target variable of interest, but rather will reflect a weighted composite of the multiple common dimensions influencing the items.
Despite the above warnings, researchers have long recognized that few psychological measures produce item responses that are strictly unidimensional. In fact, it is arguable that many important measures are constructed with built-in multidimensionality caused by clusters of items that reflect different aspects of a trait (Chen, West, & Sousa, 2006, p. 189; Hull, Lehn, & Tedlie, 1991, p. 922). Consequently, much research has been devoted to studying the robustness of IRT parameter estimates under different degrees of unidimensionality violation.
Stemming from these robustness studies (e.g., Drasgow & Parsons, 1983), researchers have concluded that to apply models like Equation 1, the data need to be “unidimensional enough” so that the item parameter estimates properly reflect the latent trait held in common among the items and are not biased by additional common dimensions caused by clusters of items with similar content (i.e., correlated errors in SEM, or, local independence violations in IRT). How to define and evaluate “unidimensional enough” has been much debated and a wealth of statistical tests and rules-of-thumb have been proposed. Nevertheless, there is a consensus in the field that models like Equation 1 are applicable if there is a “strong” general factor (and thus multidimensionality due to content parcels is likely “mere nuisance”).
When a unidimensional model fails to sufficiently account for the data, alternative structural representations (e.g., Model B) must be found or the idea of fitting any latent trait model must be dropped. In this study, we investigate the applicability of a multidimensional IRT model based on a bifactor structural representation (Model B, Figure 2; Holzinger & Swineford, 1939; Schmid & Leiman, 1957). A bifactor model is a latent structure where each item loads on a general factor. This general factor reflects what is common among the items and represents the individual differences on the target dimension a researcher is most interested in (i.e., social anhedonia). In addition, a bifactor structure specifies two or more orthogonal “group” factors. These group factors represent common factors measured by the items (e.g., preference for being alone; lack of interest in close friendships) that potentially explain item response variance not accounted for by the general factor (social anhedonia). Group factors are assumed to be uncorrelated because: a) it is assumed that once the general factor is controlled for, group factors are unrelated, and equivalently, b) the presence of correlated group factors would suggest the presence of additional general factors.
In what we will refer to as a “restricted” bifactor IRT model (Gibbons & Hedeker, 1992) each item discriminates on a single general factor (G), and at most on one additional orthogonal group factor (p' = relevant group factor). In this case, Equation 1 becomes:
The bifactor model in Equation 2 assumes that the items all measure a common latent trait (i.e., social anhedonia), but that the variance of each item is also influenced by a second common factor caused by clusters (or parcels) of items tapping similar aspects of the trait. Thus, a chief virtue of the bifactor model is that it allows researchers to retain a goal of measuring a single common latent trait of social anhedonia but also attempts to model, and thus control for, the variance that arises due to additional common factors. In other words, the bifactor model allows the researcher to explore the degree to which items reflect a common target trait as well as a subdomain.
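In code, the restricted model simply replaces the single term in the exponent of Equation 1 with a linear combination of the general and group traits. The sketch below uses an intercept (d) rather than a location parameter, a common convention in multidimensional IRT; the symbols and values are ours, chosen for illustration:

```python
import math

def p_endorse_bifactor(theta_g, theta_s, a_g, a_s, d):
    """Restricted bifactor 2PL (sketch): the item discriminates on the
    general factor (a_g) and on at most one group factor (a_s).
    Intercept parameterization (d) assumed here for illustration."""
    z = 1.7 * (a_g * theta_g + a_s * theta_s - d)
    return 1.0 / (1.0 + math.exp(-z))

# An item that mostly reflects the general trait (social anhedonia),
# with a smaller contribution from its content cluster:
p = p_endorse_bifactor(theta_g=1.0, theta_s=0.5, a_g=1.5, a_s=0.4, d=1.5)
```

Because the group factors are orthogonal to the general factor, a_g isolates the item's relation to the target trait while a_s absorbs the cluster-specific variance.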
The bifactor IRT model also has restrictive assumptions that need to be met in order for group factors to be identified, substantively interpretable, and have parameters that are properly estimated. For example, for a group factor to be identified, there must be at least three items that load only on that group factor. More importantly, although items displaying cross-loadings on the group factors are allowable in exploratory solutions (see below), such items lead to distorted and untrustworthy item parameter estimates in restricted bifactor solutions. Stated differently, a restricted bifactor IRT model (Equation 2) demands not only that the data be multidimensional, but also that the multidimensionality be well structured (i.e., each item measures a general trait and one and only one subtrait).
Thus, even if SAS responses are deemed inappropriate for a unidimensional model, there is no guarantee that a bifactor model will provide an acceptable alternative representation. To the degree that neither the unidimensional nor a bifactor model holds, SAS responses must be considered un-modelable (Model C, no coherent latent structure) and we would lose our ability to implement any advantage IRT modeling or other latent variable modeling techniques have to offer. This is a serious concern with the SAS because the instrument was not developed through factor analytic techniques and it was never designed with unidimensional or bifactor latent variable models in mind.
The present research utilized data from 2,227 subjects from the Maryland Longitudinal Study of Schizotypy (MLSS; Blanchard et al., in press). The MLSS is based on a community sample recruited using random-digit-dial methods. Commercially available databases were used to select those neighborhoods that contained residential housing within the recruitment area. The MLSS contracted with a University-affiliated survey research center to identify 18-year-olds from within a 20-mile radius of the College Park campus.
This recruitment area allowed us to identify individuals from a wide range of urban and suburban settings including racially diverse populations within a commuting distance from the University lab where direct assessments would be conducted. Initial screening for participation involved the identification of households with an 18-year-old willing to complete a brief screening questionnaire. Screening occurred in two waves in 2001 and 2002. The initial mailed screening consisted of a “Feelings and Preferences Scale” that included intermixed items from the 40-item Revised Social Anhedonia Scale (Eckblad, Chapman, Chapman, & Mishlove, 1982). An Infrequency Scale (Chapman & Chapman, 1983) was included to identify invalid responding and individuals who endorsed three or more items in the unexpected direction were excluded from the study.
The sample was 55.8% female and 44.2% male. Ethnicity was as follows: 41.3% Caucasian, 34.8% African American, 8.4% Asian, 10.1% Hispanic, 4.8% Other (0.6% refused to identify race). Regarding education, 50.9% of the sample was taking at least some college courses (ranging from part-time enrollment in community college to full-time college), 38% was still in high school, 7.8% was not currently in school, 3.2% was in some other educational setting (e.g., technical or trade school), and 0.1% refused to provide this information.
Raw SAS scores had a mean of 10.58 and standard deviation of 6.1, and were positively skewed: skewness = 0.91, kurtosis = 3.87. When computed based on Pearson correlations, coefficient alpha was .84 with an average item inter-correlation of .12. Item descriptive statistics are shown in Table 2. One notable feature in Table 2 is that several items have very low item-test correlations (e.g., #3R, #9R, #15R, and #31). This finding is consistent with Blanchard et al. (2000) who also found that many SAS items did not load meaningfully on any of their four components. It is also consistent with Fonseca-Pedrero et al. (2009) who identified several items with loadings below .30 in a unidimensional solution.
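The relation noted earlier between scale length, average inter-item correlation, and alpha can be checked directly: standardized alpha depends only on the number of items k and the average inter-item correlation. The sketch below is our computation, not the authors', and it recovers the reported value:

```python
def standardized_alpha(k, r_bar):
    """Standardized coefficient alpha from the number of items (k) and
    the average inter-item correlation (r_bar):
    k * r_bar / (1 + (k - 1) * r_bar), the Spearman-Brown
    'stepped-up' reliability of k parallel items."""
    return k * r_bar / (1 + (k - 1) * r_bar)

alpha_full = standardized_alpha(40, 0.12)   # ~ .84, matching the reported alpha
```

The same formula shows why a 40-item scale attains alpha = .84 from an average inter-item correlation of only .12: scale length, not item homogeneity, carries most of the reliability.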
In what follows, we conduct exploratory and confirmatory factor analytic models, including unidimensional (Model A) and bifactor models (Model B). Because of the well known problems of factor analyzing item-level data using Pearson correlations (Bernstein & Teng, 1989), we used methods appropriate for ordinal data. Our primary data analytic tool was weighted least squares with mean and variance adjustment using MPLUS (Muthén & Muthén, 2007). The use of factor analysis for ordinal variables to fit and evaluate an IRT model is justified due to their equivalence (e.g., see Takane & de Leeuw, 1987; Wirth & Edwards, 2007). In fact, programs such as MPLUS (Muthén & Muthén, 2007), TESTFACT (Bock et al., 2003), and NOHARM (Fraser, 1988) routinely provide results in both IRT (slope and location) and factor analytic (loading and threshold) parameters. Below, results will be reported using mostly factor analytic terminology (e.g., loadings) because we believe readers are relatively more familiar with this framework.
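The equivalence can be stated compactly: under the normal-ogive model, a loading λ and threshold τ from the ordinal factor analysis convert to a discrimination a = λ/√(1 − λ²) and a location b = τ/λ. A minimal sketch with hypothetical values:

```python
import math

def loading_to_irt(lam, tau):
    """Convert an ordinal factor loading (lam) and threshold (tau) to
    normal-ogive IRT parameters: discrimination a and location b.
    These are the standard equivalence relations underlying programs
    that report both parameterizations."""
    a = lam / math.sqrt(1.0 - lam ** 2)   # discrimination (slope)
    b = tau / lam                          # location
    return a, b

# A strong item (loading .80) with a positive threshold: discriminating,
# but endorsed mainly by respondents above the trait mean.
a, b = loading_to_irt(0.80, 1.0)
```

This one-to-one mapping is why the analyses below can move freely between loadings and discriminations.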
In order to gain better insight into any possible content clustering on the SAS, we conducted exploratory factor analyses prior to applying unidimensional and bifactor models. First, we performed a parallel analysis which tends to over-estimate the number of meaningful latent factors (Wood, Tataryn, & Gorsuch, 1996). The parallel analysis suggested that there were seven non-random factors in the data. The first seven eigenvalues from a tetrachoric correlation matrix were: 10.73, 2.75, 2.34, 1.61, 1.41, 1.22, and 1.15; while the first seven from the parallel analysis were: 1.26, 1.23, 1.21, 1.19, 1.18, 1.16, and 1.15. We then followed up by extracting seven factors (minres) and rotating to an oblique solution (direct oblimin). Although these factors were poorly defined (i.e., only two or three items with salient loadings per factor), we found them informative in regard to identifying clusters of items assessing highly similar psychological themes.
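To illustrate the logic of parallel analysis (observed eigenvalues are retained only while they exceed those of same-sized random data), here is a small sketch using simulated two-factor data. It uses Pearson correlations for simplicity, whereas the analyses above were based on tetrachorics:

```python
import numpy as np

def parallel_analysis(data, n_reps=100, seed=0):
    """Horn's parallel analysis (sketch): count observed eigenvalues
    that exceed the mean eigenvalues of random normal data of the
    same dimensions."""
    rng = np.random.default_rng(seed)
    n, k = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand_mean = np.zeros(k)
    for _ in range(n_reps):
        sim = rng.standard_normal((n, k))
        rand_mean += np.sort(np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False)))[::-1]
    rand_mean /= n_reps
    return int(np.sum(obs > rand_mean))

# Simulated data with two uncorrelated factors, four marker items each:
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 2))
loadings = np.array([[0.8, 0.0]] * 4 + [[0.0, 0.8]] * 4)
items = factors @ loadings.T + 0.6 * rng.standard_normal((500, 8))
n_factors = parallel_analysis(items)
```

With this strong simple structure the procedure recovers the two built-in factors; with weakly defined clusters, as in the SAS data, it can flag many small non-random dimensions.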
In Table 3 we display the items with the highest three loadings on each factor. The majority of items have their primary loading on the first three rotated factors (preference for solitude, socially aloof, and friends not important). The remaining factors contain small clusters of items with similar content. We also display the five items that did not load > .20 on any factor; it is reasonable to conclude that those items have little to do with what is being assessed by the majority of the items. In subsequent analyses, we will be monitoring the effects of these content clusters carefully. A critical question is whether the major item clusters (i.e., items loading on the first three factors), as well as the minor clusters, are ignorable “mere nuisance” or instead reflect meaningful common sources of variance that vitiate our attempt to fit a unidimensional model.
In the second column of Table 4 we display the factor loading estimates, extracting a single factor using MPLUS. The factor loadings are wide ranging, with a few items having exceptionally high loadings (e.g., #37, #19). Around half (19/40) the items have loadings greater than .50 (bolded in Table 4) and thus these appear to be the “best” markers of the dimension. As shown in the last two columns of Table 4, the bolded items are mostly drawn from the first three factors in the seven factor solution displayed in Table 3 (5, 6, 6, and 2 items from factors 1 through 4, respectively). Thus, there is considerable content redundancy among these items, which inquire about a preference for solitude, a lack of interest in others, or a lack of close relationships. Interestingly, items that inquire about experiencing pleasure in interpersonal activities (#15R, #38, #39R, #40R) – a presumably core feature of social anhedonia – are among the lowest loading items.
In the next two columns are the estimated IRT location and discrimination parameters (Equation 1), respectively. Notice that the relative sizes of the discriminations are perfectly related to the sizes of the factor loadings. Also, most items have large positive location parameters indicating that an individual must be relatively high on the trait in order to endorse the item content. Finally, the non-bolded items in Table 4 have discrimination parameters that are so low as to make them nearly worthless as trait indicators (especially item #3). For example, consider that raw scores based on just the 19 most discriminating items would have coefficient alpha = .82 (versus .84 for the 40-item scale). Thus, little precision would be lost by just scoring these 19 most discriminating items.
Before taking the item parameter estimates in Table 4 seriously, we must determine whether the data are sufficiently unidimensional. As reported above, the first seven eigenvalues from a tetrachoric correlation matrix are: 10.73, 2.75, 2.34, 1.61, 1.41, 1.22, and 1.15. Judging by the first eigenvalue, there is a weak to modest general factor in the data (explaining only 27% of total item variance: 10.73/40). Moreover, the 1st to 2nd eigenvalue ratio of 3.90 is only modestly above the 3 to 1 criterion commonly cited in evaluating whether data are sufficiently unidimensional (Embretson & Reise, 2000). Finally, when we fit a confirmatory unidimensional model using MPLUS, CFI=.88 and RMSEA=.05. The former value is slightly below the commonly used rule-of-thumb of CFI > .90, and the latter indicates a reasonable model.
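The two heuristics invoked here are direct computations on the eigenvalues reported in the text:

```python
# First seven eigenvalues of the tetrachoric matrix, as reported above
eigs = [10.73, 2.75, 2.34, 1.61, 1.41, 1.22, 1.15]
n_items = 40

# Strength of the general factor: share of total item variance
# accounted for by the first eigenvalue
share_first = eigs[0] / n_items          # about .27

# First-to-second eigenvalue ratio, judged against the 3:1
# rule-of-thumb (Embretson & Reise, 2000)
ratio = eigs[0] / eigs[1]                # about 3.9
passes_ratio_rule = ratio > 3.0
```

Both indices sit near their conventional cutoffs, which is exactly why they do not settle the unidimensionality question on their own.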
Judging by the above values, some scholars may conclude that the SAS data are acceptable for a unidimensional IRT model and that the parameter estimates of such models should be considered valid and used in applications. However, given: a) the weakness of a single trait running through the items, b) the results of the parallel analysis that suggested seven factors, and c) that the most discriminating items are predominantly drawn from two large content clusters, we remain cautious in our assessment. Specifically, we are concerned that the item parameter estimates in the unidimensional model may be distorted and not accurately reflect the common target dimension of social anhedonia. We will return to consider this issue more fully after examining the bifactor results in the next section.
Just as one might conduct an exploratory analysis prior to fitting a confirmatory model, in the bifactor case it is important to conduct an exploratory bifactor model (where items are free to load on any group factor) before fitting a more restricted bifactor model (where items are forced to load on one and only one group factor). Although standard statistical software programs do not provide “bifactor rotations”, a researcher can easily conduct a Schmid-Leiman (SL; Schmid & Leiman, 1957) orthogonalization prior to fitting a restricted bifactor model. In the present study, these analyses were performed on a tetrachoric correlation matrix using the SCHMID command in the PSYCH library (Revelle, 2009) in the R statistical package. Simply stated, a SL is conducted as follows. First, a correlated traits (i.e., oblique rotation) factor analysis is performed specifying a given number of "primary" factors. Second, the correlation matrix among the primary factors is in turn factored extracting a single second-order factor. Finally, the SL transformation is performed as follows. An item's loading on the general factor is found by multiplying the item's loading on the primary factor by the primary factor's loading on the second-order factor. An item's loading on the group factor is found by multiplying the item's loading on its primary factor by the square root of one minus the squared loading of that primary factor on the second-order factor. In other words, the item's loading is multiplied by that part of the primary factor that is not explained by the general factor.
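The transformation just described reduces to two lines of algebra. The sketch below is our illustration with hypothetical loadings (the analyses in this paper used the SCHMID command in the R PSYCH library); it assumes an oblique solution whose primary factors load on a single second-order factor:

```python
import numpy as np

def schmid_leiman(primary, second_order):
    """Schmid-Leiman orthogonalization, following the steps in the text.
    primary: (items x factors) oblique pattern matrix.
    second_order: loadings of each primary factor on the single
    second-order factor.
    Returns (general_loadings, group_loadings)."""
    L = np.asarray(primary, dtype=float)
    g = np.asarray(second_order, dtype=float)
    general = L @ g                        # loading on the general factor
    group = L * np.sqrt(1.0 - g ** 2)      # residualized group loadings
    return general, group

# Two primary factors, each loading .6 on the second-order factor
# (implying an inter-factor correlation of .36):
L = np.array([[0.7, 0.0],
              [0.6, 0.0],
              [0.0, 0.8],
              [0.0, 0.5]])
gen, grp = schmid_leiman(L, np.array([0.6, 0.6]))
```

A useful check is that each item's communality is preserved: the squared general loading plus the squared group loadings equal the communality of the oblique solution.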
We conducted SL orthogonalizations, extracting one general and two, three, and four group factors. These results were inspected for: a) identification (are there at least three items with simple loadings for each group factor?), b) substantive interpretability, and c) cross-loadings (which suggest the item is a blend of two group factors and are problematic for restricted bifactor models). Most importantly, based on the SL results, we identified which group factor an item loaded highest on. We then estimated restricted bifactor models using MPLUS based on specifying that each item loads on the general factor and one group factor.
Unfortunately, our results indicated that none of the two, three, or four group factor restricted bifactor models appeared promising for IRT application. There were several notable problems. First, in running MPLUS we ran into estimation problems, especially in the two group factor model. For example, when an item’s loading on a group factor was low, we obtained zero or negative parameter estimates. Beyond technical problems, we also ran into problems in identifying the group dimensions. For example, in the four group factor model, there were not enough items with relatively large and simple loadings to uniquely identify two of the four group factors. Yet, the most daunting problem was the presence of cross-loading items. Specifically, when an item cross-loads in the SL solution (i.e., loads on more than one group factor), not only does that suggest model mis-specification, but forcing that item into a restricted bifactor inflates the item’s loading on the general factor and deflates its loading on a group factor.
To illustrate the problem, in the second column of Table 5 we re-display the factor loadings from a unidimensional model. In the next set of columns are the SL results for the three group factor model. This three group factor model is displayed because it was the least problematic to identify and estimate, the most interpretable, and displayed a good fit. Notice that the loadings on the general factor in the SL solution are lower than in the unidimensional solution. This illustrates that: a) loadings in a unidimensional solution are inflated due to multidimensionality and, b) the SL structure nicely accounts for that bias by shifting the multidimensionality over to the group factors. Also, note that in the SL around a dozen items have cross-loadings, especially for group factor three.
Finally, in the next set of columns are the results of the restricted ("confirmatory") bifactor model with three group factors. As noted above, each item was assigned to a group factor based on its highest loading in the SL. The general factor in the restricted bifactor accounts for around 22 percent of the variance and the three group factors represent 3.5, 5.5, and 4.2 percent (around 13%), respectively (unexplained variance is around 65%). Substantively, we interpret the three group factors to represent: a) lack of interest in others, b) lack of warm attachments, and c) preference for solitude. The statistical fit of this three-group factor model is very good: CFI=.97 and RMSEA=.03. However, a good fit indicates only that the loadings recover the original tetrachoric correlation matrix closely and not that the parameters are correctly estimated. In fact, a major cause for concern with this model is the potential distortion in the parameter estimates caused by forcing cross-loading items into a highly constrained model.
For example, unlike the SL results, in the restricted model several items load higher on the general factor than their loadings in the unidimensional (or SL) solution – implying that controlling for multidimensionality via the restricted bifactor makes the item a better measure of the common trait! By inspecting the SL results, it is clear that this phenomenon occurs when items have significant cross-loadings. For example, item #19 has loadings of .28, .32, and .10 on the three orthogonal group factors in the SL. When this high communality item is forced into a one general and a single group factor model, the restricted bifactor model assigns that item’s common variance mostly to the general factor.
In short, the constrained bifactor model does not (mathematically) know what to do with an item that has high communality and cross-loadings on two or more dimensions. Thus, when a restricted bifactor model is estimated, the model treats the item as a strong measure of the general factor. In turn, if the loading on the general factor is biased high, there is less communality left for the item to load highly on a group factor. Most importantly for practice, not only are the item parameters wrong, but an item can (artificially) look like a great indicator of the general trait and a poor indicator of a group factor (e.g., see items #37 and #11 in Table 5).
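This redistribution of variance can be demonstrated with a toy least-squares calculation. Using hypothetical loadings (not SAS estimates), we fit a cross-loading item's model-implied correlations under the restriction that it may load on only one group factor; the general loading absorbs the disallowed cross-loading:

```python
import numpy as np

# Why forcing a cross-loading item onto a single group factor inflates
# its general loading: a least-squares sketch with hypothetical loadings.

# True orthogonal bifactor loadings for 9 "clean" items (3 per group):
# each loads .5 on the general factor and .5 on its own group factor.
g = np.full(9, 0.5)
s = np.zeros((9, 3))
for k in range(3):
    s[3*k:3*k + 3, k] = 0.5

# One cross-loading item: general .45, and .30 on BOTH group 1 and group 2.
true_item = np.array([0.45, 0.30, 0.30, 0.0])

# Its model-implied correlations with the 9 clean items (orthogonal factors).
b = g * true_item[0] + s @ true_item[1:]

# Restricted model: the item may load only on the general factor and
# group 1. Solve for those two loadings by least squares.
A = np.column_stack([g, s[:, 0]])
est, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.round(est, 3))
```

Here the constrained general loading (.60) exceeds its true value (.45), while the constrained group loading (.15) falls below its true value (.30) – exactly the bias pattern described above.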
In sum, our conclusion is that despite the good statistical fit, we simply cannot trust the parameters of this or any of the other restricted bifactor models considered. Furthermore, the bifactor analyses indicate that the factor loadings in the unidimensional analyses conducted in the prior section are artificially inflated by multidimensionality, which biases the IRT parameter estimates high. Thus, we conclude that the 40-item SAS does not conform well to either Model A or Model B, and therefore does not have a coherent latent structure – i.e., Model C.
Although our attempts to fit restricted latent variable models to the SAS were not successful, it is reasonable to consider whether a subset of items could be found that does provide a satisfactory structure. Recall that in the previous unidimensional model analyses, we identified a set of 19 items that were relatively more discriminating and we noted that: a) these items come mostly from the first three factors in the seven factor solution (see Table 3), and b) the remaining 21 items could be deleted from the scale without much loss in internal consistency.
Only two items from Table 3 factor four appear in the top 19 – a doublet about close friends in high school and current close friends – so, to evaluate a reduced item set for unidimensionality, we ignore those two items and consider only the 17 most discriminating items (see Table 6). Based on tetrachoric correlations, the average inter-item correlation is r=.37 and coefficient alpha is .91. Based on Pearson correlations, the average inter-item correlation is r=.18 and coefficient alpha is .79. The first five eigenvalues are: 6.95, 1.50, 1.37, 0.92, and 0.80. Thus, 41% of the variance is explained by the first factor, and the ratio of the first to second eigenvalue is 4.63, slightly above the recommended 3:1 ratio. The fit of these data to a unidimensional model is acceptable: CFI=.93, RMSEA=.066. Given these values, we can conclude that a one-factor model reasonably recovers the original correlation matrix. However, can we also conclude that the data are unidimensional enough for a unidimensional IRT model? That is, do the item parameters reflect the common trait underlying the items?
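The summary statistics used here are simple functions of the correlation matrix. A minimal sketch, using a toy 6-item matrix with a constant inter-item correlation of .37 rather than the actual 17-item data:

```python
import numpy as np

# Average inter-item correlation, coefficient alpha, and the ratio of
# the first to second eigenvalue, computed from a correlation matrix.
# Toy 6-item matrix with a constant correlation of .37 (illustrative).
k = 6
R = np.full((k, k), 0.37)
np.fill_diagonal(R, 1.0)

avg_r = (R.sum() - k) / (k * (k - 1))
alpha = (k / (k - 1)) * (1 - k / R.sum())      # alpha from a correlation matrix
eigs = np.sort(np.linalg.eigvalsh(R))[::-1]
ratio = eigs[0] / eigs[1]

print(round(avg_r, 2), round(alpha, 2), round(ratio, 2))
```

With only 6 items and r=.37 this toy alpha is about .78; plugging k=17 and the same average correlation into the same formula gives alpha of about .91, matching the tetrachoric-based value reported above.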
In our view, although the eigenvalues and the fit indices are arguably reasonable, there are potential flies in the ointment. Consider the following. First, the average tetrachoric correlation among the 17 most discriminating items is r=.37. However, items #19 and #37 correlate at r=.73, and the correlations among items #6, #8, and #34R range from r=.60 to r=.70. Such high correlations suggest correlated errors (local dependence, in IRT jargon) and may well bias what is being measured in a unidimensional solution. In other words, the item discrimination parameters change if certain sets of items are removed prior to calibration – a clear violation of the IRT invariance assumption. Second, as displayed in Table 6, we can unsurprisingly extract three interpretable factors (see columns F1 to F3). This is not necessarily a problem except that the correlations among those factors are only modest, ranging from .32 to .52 – arguably not strong enough to suggest a single common factor.
Third, when we fit an SL exploratory bifactor model with three group factors, the items do not load as highly on the general factor as they do in the unidimensional solution (see columns GSL1 to GSL3 in Table 6). This is evidence that the item discriminations are biased high in the unidimensional solution. Moreover, when we computed coefficient omega hierarchical for the general factor (Zinbarg, Revelle, Yovel, & Li, 2005), its value was only .68. There are two ways of viewing this index (see also Gustafsson and Aberg-Bengtsson (2010) and citations therein for further discussion). First, it indicates that 68% of the variance in raw scale scores is explained by variation on the general factor. Second, the difference between coefficient omega hierarchical (.68) and alpha (.91) indicates the degree to which the latter index is inflated by multidimensionality. In this case, it is clear that the 17-item SAS does not provide a precise measure of a single common social anhedonia dimension.
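Coefficient omega hierarchical is computed directly from an orthogonal bifactor loading matrix. A minimal sketch with hypothetical loadings (not the Table 6 estimates):

```python
import numpy as np

# Coefficient omega-hierarchical from orthogonal bifactor loadings:
# the share of total-score variance due to the general factor alone.
# Hypothetical loadings for 6 items and 2 group factors.
general = np.array([0.60, 0.55, 0.50, 0.45, 0.50, 0.55])
groups = np.array([
    [0.40, 0.00],
    [0.35, 0.00],
    [0.30, 0.00],
    [0.00, 0.45],
    [0.00, 0.40],
    [0.00, 0.35],
])

uniq = 1.0 - general**2 - (groups**2).sum(axis=1)   # item uniquenesses
total_var = general.sum()**2 + (groups.sum(axis=0)**2).sum() + uniq.sum()

omega_h = general.sum()**2 / total_var              # general factor only
omega = (total_var - uniq.sum()) / total_var        # all common factors

print(round(omega_h, 3), round(omega, 3))
```

The gap between omega and omega hierarchical in a sketch like this quantifies how much of the total-score reliability is owed to group factors rather than to the single intended trait.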
Although fitting and interpreting a unidimensional model for this reduced item set is highly questionable, a bifactor model is more plausible. Table 6 also displays the results of a confirmatory bifactor model estimated for the 17-item reduced set. For this model, each item was forced to load on one and only one group factor. The fit of this model is excellent, CFI=0.96, RMSEA=0.036. Moreover, all items have reasonable loadings on the general factor and, in turn, these loadings are generally lower relative to their values in the unidimensional solution. This signals that the bifactor model is indeed controlling for the confounding effects of local dependence violations (multidimensionality) by specifying group factors.
Despite these positive bifactor model characteristics, substantial psychometric and substantive concerns persist. Particular psychometric concerns include: a) the group factors (GB1 to GB3) are ill-defined (small loadings), especially group factors two and three, and b) there is some model misspecification due to forcing item #28 to load on group factor one, and forcing the cross-loading items (#20 and #10) to load on only a single group factor. Substantively, although the model attempts to control for this fact, the 17-item instrument still contains clusters of items that are essentially the same question asked in slightly different ways.
Our analyses revealed that the latent structure of the SAS data was challenging to model due to: a) the presence of two to three relatively large and multiple small content clusters (e.g., doublets, triplets), b) modest relations between those clusters, which imply a common latent trait that is, at best, only modest in strength, c) many items (nearly half) that shared little variance with the majority of items, and d) cross-loadings in bifactor solutions. As a consequence, we conclude that responses to the 40-item SAS do not conform well to either unidimensional or bifactor models. In either model, we are not convinced that the item parameters are invariant or reflect the true relation between the items and the target latent trait. Rather, we believe that the parameters are non-trivially affected by multidimensionality (in the unidimensional model) and cross-loading items (in the bifactor model). Thus, we conclude that neither model could serve as a meaningful foundation for IRT applications of the SAS such as linking, the assessment of differential item functioning, or computerized adaptive testing.
Our attempts to fit restricted latent variable models to a subset of SAS items were more promising. Although analysis of a 17-item subset revealed that a unidimensional solution was implausible due to non-trivial multidimensionality, a three group factor (preference for solitude, socially aloof, and close friends not valued) bifactor model resulted in an excellent statistical fit while simultaneously controlling for the biasing effects of multidimensionality. A bifactor representation of this reduced 17-item scale thus appears substantially more plausible than any of the models we attempted to fit to the full 40-item scale. Nevertheless, even in this reduced item set there were non-trivial problems concerning the identification and specification of group factors, as well as substantial content redundancy among the items. Thus, any attempts to model this subset of SAS items would need to proceed cautiously due to potentially biased parameter estimates.
To be clear in our conclusions, we are not arguing that scores on the SAS have no meaning or value in research. Clearly, high scores yield crucial information about an individual, and both cross-sectional and longitudinal research demonstrates that these scores are associated with important non-test criteria and life outcomes (e.g., Blanchard et al., in press). However, we conclude that the SAS is not an ideal modern research instrument. In addition to its poor suitability for use in contemporary IRT and other latent variable techniques, it is difficult to substantively interpret scores on the SAS in any research context. Internal consistency estimates for the SAS are substantially inflated by multidimensionality and by repetition of items that ask essentially the same question in slightly different ways. Scores on the 40-item SAS clearly do not reflect a single unitary psychological construct; instead, variability on the scale predominantly reflects two (see Table 3) unbalanced content clusters – preference for solitude and indifference to others – as well as several poorly identified item clusters. Notably, neither of the main sources of variability is strongly associated with items directly related to the experience of pleasant emotions from interpersonal activities, which is a core defining feature of social anhedonia. Thus, the complex structure of the SAS complicates our understanding of the meaning of scores on this scale.
Rather than trying to "clean up" the SAS by deleting items, a useful, albeit time- and resource-intensive, future direction would be to develop new scales that more precisely define and assess social anhedonia. The current results provide some empirically-based guidance for developing new items to be evaluated. These analyses uncovered up to seven psychologically meaningful content domains that may be worth exploring in the development of new scales. In addition, the IRT analyses identified many items with relatively low discrimination. This indicates that individuals may endorse such items for reasons that are not directly related to the core construct(s) of interest. For example, individuals may endorse items due to factors such as anxiety or suspiciousness rather than a lack of interest in or capacity to experience pleasure from interactions with others. Careful wording of items to avoid such “secondary” sources of contamination would be an important contribution. Ideally, new scale development should be guided by a clearly defined measurement model (Borsboom, 2005) and an iterative scale development and validation approach (Clark & Watson, 1995).
Alternatively, researchers might consider using other relevant instruments that were developed through an iterative factor analytic framework and have gone through rigorous psychometric evaluation. For example, the Social Closeness Scale from the Multidimensional Personality Questionnaire (Tellegen & Waller, 2008) appears to tap several of the key content domains assessed by the SAS. This 22-item self-report measure is a subscale of a higher-order positive affectivity dimension and includes item content assessing: a) sociability versus a preference for solitude, b) valuing close ties with friends, c) being affectionate and warm versus cold and distant, and d) turning to others for support. This measure has previously been modeled under a unidimensional IRT framework (Reise & Waller, 1990). Although no SEM model fit results were reported, those authors reported good IRT item fit statistics, a first-to-second eigenvalue ratio of 5.23 (8.78/1.68), and high correlations between parameter estimates across two large samples. Moreover, the Social Closeness scale does not appear to have the same problem with doublets and triplets that is endemic to the SAS.
In closing, we note that this research attempted to fit a modern measurement theory – IRT – to an instrument that was never designed to be consistent with such a framework. The SAS is certainly not unique in this regard as many commonly used scales in psychopathology and personality research were developed prior to the advent of latent variable modeling techniques. Our primary motive for applying an IRT model is that we recognize that modern researchers are asking ever more ambitious (and expensive) questions relating environmental, neurobiological, and genetic variation with psychological variation. In our view, a measure with a clear, identified, and interpretable latent structure (i.e., a measure that would fit an IRT model) provides a more solid foundation for meaningful and replicable research to flourish. In addition, establishing a coherent latent structure greatly expands the types of research questions that can be addressed using a particular instrument. For example, questions about change over time or differential item functioning are more readily addressed through latent variable approaches.
Traditional scale development techniques reward developers for item redundancy through inflated alpha coefficients. Moreover, structural models (e.g., bifactor) and statistical indices (e.g., coefficient omega) that force a scale developer to address the issue of how well a single common dimension is being assessed have historically played no role in scale development. For these reasons, we believe that if a researcher wants a latent variable measurement model then it is probably necessary to start with that model in mind when designing the scale. Attempts to fit older scales, particularly those that were not developed using techniques such as confirmatory factor analysis, into new measurement model frameworks are likely ill-advised.
This work was supported by the National Institute of Mental Health (Blanchard: R01MH51240, K02MH079231, R01MH082839; Horan: R01MH82782).
1A model with two or more correlated factors is also a well-known and plausible latent structure for the SAS. This model proposes that subsets of items are indicative of distinct dimensions and, in turn, that these dimensions are correlated. However, this correlated-traits model does not directly model a common latent variable of social anhedonia that runs among the items. Instead, such a model needs to be extended to represent social anhedonia as a second-order factor that explains the correlations among primary traits. Because second-order models, which are nested under confirmatory bifactor models, do not directly model the relation between items and a second-order dimension, they will not be considered further in the present paper.
2First, note that this is a “compensatory” model – high levels on either the general or group latent trait increase the probability of endorsing the item. Second, the γi in Equation 2 is simply a multidimensional intercept parameter. Unlike the location parameter in unidimensional IRT models, the multidimensional intercept has no simple interpretation.
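A minimal sketch of this compensatory form (the slopes and intercept are hypothetical, and `p_endorse` is an illustrative helper, not the software used in this study):

```python
import numpy as np

# Compensatory multidimensional 2PL item response function: a high level
# on EITHER latent trait raises the endorsement probability. Hypothetical
# slopes and intercept, for illustration only.
def p_endorse(theta, a, gamma):
    """P(item = 1) given latent trait vector theta, slopes a, intercept gamma."""
    return 1.0 / (1.0 + np.exp(-(a @ theta + gamma)))

a = np.array([1.2, 0.8])        # slopes on the general and group traits
gamma = -0.5                    # multidimensional intercept

low_both = p_endorse(np.array([-1.0, -1.0]), a, gamma)
high_first = p_endorse(np.array([1.5, -1.0]), a, gamma)
high_second = p_endorse(np.array([-1.0, 1.5]), a, gamma)

print(round(low_both, 3), round(high_first, 3), round(high_second, 3))
```

Raising either trait alone raises the endorsement probability relative to being low on both, which is what makes the model compensatory.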
3This model was identified by specifying that each of the primary dimensions is equally related to a higher-order dimension.
4Models up to seven group factors were also investigated, but these solutions tended to have group factors that were not well defined (e.g., only one or two items loading highly on a factor).