Progress in the development of new pharmacological and psychosocial treatments for the negative symptoms of schizophrenia is impeded by limitations of available assessment instruments. The multi-site Collaboration to Advance Negative Symptom Assessment in Schizophrenia (CANSAS) was established to develop and validate a new clinical rating scale using a transparent, iterative, and data-driven process. The Clinical Assessment Interview for Negative Symptoms (CAINS) was designed to address limitations of existing measures and to assess consensus-based sub-domains, including asociality, avolition, anhedonia, affective blunting, and alogia. The structure and psychometric properties of the CAINS were evaluated in a sample of 281 outpatients with schizophrenia or schizoaffective disorder at four sites. Converging structural analyses indicated that the scale comprised two moderately correlated factors: one reflecting experiential impairments (diminished motivation and enjoyment of social, vocational, and recreational activities) and one reflecting expressive impairments (diminished non-verbal and verbal communication). Item-level analyses revealed generally good distributional properties, inter-rater agreement, discriminating anchor points, and preliminary convergent and discriminant validity. Results indicate that the CAINS is a promising new measure for quantifying negative symptoms in clinical neuroscience and treatment studies. Results also guided item modification or deletion, and the reliability and validity of the revised, shorter version of the CAINS are being evaluated in the final phase of the CANSAS project.
Negative symptoms substantially impede functional recovery for people with schizophrenia (Kirkpatrick et al., 2006). Despite their clinical significance, current treatments do not adequately address negative symptoms: no medication yet has a specific indication for negative symptoms, and psychosocial interventions show similarly limited benefits (Leucht, 1999, Montgomery and Zwieten-Boot, 2006). To address this critical treatment need, the NIMH-Negative Symptom Consensus Development Conference (Kirkpatrick et al., 2006) recommended the development of a new negative symptom measure that can be productively used in pharmacological trials. The Collaboration to Advance Negative Symptom Assessment in Schizophrenia (CANSAS) was established to develop and validate a “next-generation” clinical rating scale by following a data-driven, iterative, and transparent process (Blanchard et al., in press). This report describes the psychometric evaluation of a beta version of the Clinical Assessment Interview for Negative Symptoms (CAINS) in a large outpatient sample. This effort differs from previous scale development projects in that it includes a large and diverse patient population and adopts a comprehensive empirical approach to item generation, selection, and retention.
The CAINS was designed to address limitations of existing instruments (Blanchard et al., 2011, Horan et al., 2006) and to assess the five consensus negative symptom sub-domains (Kirkpatrick et al., 2006). Ratings of asociality, avolition, and anhedonia are based on interviewees’ reports of their subjective experiences of motivation and emotion, as well as on the frequency of actual engagement in relevant activities. Asociality assesses the degree to which close social bonds are valued and desired, and the frequency of social interactions. Avolition assesses level of interest and motivation, and the initiation and persistence of behavior. Anhedonia assesses the experience and frequency of both consummatory and anticipatory pleasure. The final two domains, blunted affect and alogia, are rated from behavior observed throughout the interview. Blunted affect ratings also draw on prompts designed to elicit positive and negative emotions, and alogia ratings include measures of speech output.
A feasibility study of an early version of the scale (Forbes et al., 2010) demonstrated good internal consistency and inter-rater agreement, and very good convergent and discriminant validity with other symptom and functional outcome measures. However, several areas needed refinement, including skewed distributions and restricted range for anhedonia items, low inter-item correlations within the asociality and avolition domains, marginal inter-rater agreement for alogia and avolition items, and difficulty distinguishing among anchor points for several items.
We report here a comprehensive psychometric evaluation of the revised CAINS in a large, diverse sample of outpatients with schizophrenia or schizoaffective disorder. Outpatients were studied because clinically stable patients are recommended for negative symptom treatment development studies (Kirkpatrick et al., 2006, Laughren and Levin, 2006). The first goal was to examine the scale’s latent structure through a series of complementary structural analyses. Because the items were designed to comprehensively cover five consensus-based content domains, clarifying their underlying structure is critical for optimal assessment of negative symptoms (Blanchard and Cohen, 2006, Blanchard et al., 2011). This was followed by a series of scale development analyses, including analyses of item- and scale-level characteristics, within- and between-site inter-rater agreement, and a preliminary analysis of convergent and discriminant validity. The overarching goal was to refine the CAINS for use in the final scale development phase of the CANSAS project.
Participants were 281 people with schizophrenia (n=223) or schizoaffective disorder (n=58), ages 18–60, recruited from outpatient clinics at the four CANSAS sites (UCLA, UC-Berkeley, University of Pennsylvania, and the University of Maryland). Patients met diagnostic criteria based on the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID; First et al., 1996). Exclusion criteria were an episode of major depression or mania within the last month; substance dependence in the last six months; substance abuse in the last month; IQ < 70; history of head injury or neurological disorder; and insufficient English fluency. All participants provided written informed consent before participation.
A two-day training workshop preceded study initiation. Training included review of the manual, didactic presentations, independent ratings of videotaped interviews using the CAINS and the other study scales, and discussions. Raters were credentialed on all study instruments after practice and confirmation of competency on videotaped and in-person interviews. Procedures were identical at all sites and were approved by each site’s institutional review board. A detailed manual for training and supervision was developed and revised throughout the course of the study.
Following revisions from the feasibility study (Forbes et al., 2010), the CAINS-beta 2 included 23 items covering (1) Asociality (3 items covering family, romantic relationships, and friendships), (2) Avolition (4 items covering social interactions, vocational activities, recreational activities, and self-care), (3) Anhedonia (9 items covering frequency, intensity, and expected pleasure in social, physical, and recreational activities), (4) Blunted Affect (5 items covering facial, vocal, and gestural expression as well as eye contact and spontaneous movement), and (5) Alogia (2 items covering quantity of speech and spontaneous elaboration). As discussed elsewhere (Blanchard et al., 2011), the first three domains are assessed based on patients’ reports of motivation, interest, and emotional experience, as well as patients’ reports of actual engagement in relevant social, vocational, and recreational activities. The items in these areas do not focus exclusively on level of functional attainment because poor functioning may reflect factors that are unrelated to negative symptoms (e.g., paranoia, anxiety, skill deficits, lack of opportunity). Thus, the items were designed to measure more closely the constructs that are central to negative symptoms (i.e., deficits in interest, motivation, and affiliative desire).
The final two domains are based on behavioral observations during the interview. We opted to be over-inclusive with respect to the number of items, recognizing that our systematic data analytic approach to scale development would result in a smaller yet psychometrically sound instrument. All items were rated on a 0 to 4 scale, with higher scores reflecting greater psychopathology. The time period covered by the interview was the past seven days; the expected pleasure items assessed anticipated future pleasure without a specified time period.
Three additional measures were included to characterize the sample and for use in preliminary convergent and discriminant validity analyses: the 24-item version of the Brief Psychiatric Rating Scale (BPRS; Overall and Gorham, 1962), scored on Positive, Negative, Depression-Anxiety, and Agitation subscales (Kopelowicz et al., 2008); the Calgary Depression Scale for Schizophrenia (CDSS; Addington et al., 1996, Addington et al., 1990), which evaluates depressive symptoms; and the Wechsler Test of Adult Reading (WTAR; Wechsler, 2001), which provides a reliable estimate of full-scale IQ.
The CAINS, BPRS, CDSS, and WTAR were administered in a fixed order. CAINS assessments were videotaped for supervision and evaluation of inter-rater agreement. A random subset of 10 CAINS videos from each site (40 interviews in total) was independently rated by two raters at each of the four sites, providing a common set of interviews for evaluating both within-site and between-site agreement.
Complementary classical test theory (CTT) and item response theory (IRT; Embretson and Reise, 2000, Reise et al., 2005) analyses evaluated the latent structure of the scale, item response characteristics, inter-rater agreement, and preliminary convergent and discriminant validity. Although CAINS items were written to over-inclusively sample from five consensus-based negative symptom domains, it was not our intention to adhere to an a priori factor structure. Instead, decisions about the ultimate structure and content of the scale were based on converging structural analyses of the overall scale and multiple item-level analyses.
We adopted several guiding principles for making data-driven decisions about retaining, modifying, or deleting items. Items that failed one or more of the following criteria were considered for deletion: (1) fit with the factor structure of the scale, (2) within- and between-site inter-rater agreement, (3) minimal redundancy with other items, (4) adequate item-total correlation, (5) acceptable skew, and (6) convergent and discriminant validity.
Data analyses thus included multiple procedures. The latent structure of the scale was evaluated using four complementary approaches. Our primary analytic approach was exploratory factor analysis using principal axis extraction with promax rotation. Complementary structural analyses, including cluster analysis (Aldenderfer and Blashfield, 1985), Bass-Ackwards analysis (Goldberg, 2002), and Mokken scaling analysis (Sijtsma and Molenaar, 2002), were conducted to test whether this initial structure replicated. Once the general structure of the CAINS was established, additional analyses evaluated the performance of its constituent items: (a) item-level descriptive statistics, including skewness and item-total correlations; (b) inter-rater agreement, indexed by intraclass correlation coefficients (ICCs; two-way mixed model, absolute agreement type (20)) at both the within- and between-site levels; (c) correlational analyses of convergent validity, testing whether the CAINS subscales correlated significantly with the BPRS negative symptom subscale, and of discriminant validity, testing whether correlations between the CAINS subscales and measures of positive and affective symptoms were small, as expected; and (d) IRT analyses to assess whether each of the five response options within each item was functioning adequately, and to guide item revisions and further item deletions.
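To make the agreement index in (b) concrete, the sketch below computes an absolute-agreement intraclass correlation from a two-way ANOVA decomposition of an interviews-by-raters table. It assumes the single-measure form (often written ICC(A,1)); the text does not state whether single- or average-measure coefficients were reported. Under the two-way model, the single-measure absolute-agreement coefficient uses the same formula whether raters are treated as fixed (mixed model) or random. The function name `icc_absolute_agreement` is hypothetical, and this is an illustration, not the software the authors used.

```python
def icc_absolute_agreement(ratings):
    """Single-measure, absolute-agreement ICC under a two-way model.

    ratings: one row per interview (subject), one column per rater;
    every row must have the same number of raters k >= 2.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Rater (column) variance counts against agreement
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

For example, `icc_absolute_agreement([[0, 1], [1, 2], [2, 3], [3, 4]])`, where one rater always scores one point higher, returns 10/13 (about .77), whereas a consistency-type ICC for the same data would be 1.0. Systematic rater offsets lower absolute agreement, which is what makes it a stringent index for item-level comparisons across raters and sites.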
Demographic and clinical information (Table 1) indicates that this chronically ill sample was diverse in gender and ethnicity, supporting the applicability of the findings to diverse populations. The sample was characterized by generally high levels of functional disability: the majority of participants were unmarried, unemployed, and receiving disability benefits. Clinical symptom ratings on the BPRS and CDSS for this outpatient sample fell in the low to moderate range. Estimated IQ was in the low average range.
The scree plot for the factor analysis suggested a two- or three-factor solution. Overall, the analyses favored a two-factor solution; Table 2 presents the factor loadings for this solution. Factor 1 comprised the items from the asociality, avolition, and anhedonia domains, and Factor 2 comprised the items from the affective blunting and alogia domains. The two factors were moderately correlated (r = .39, p < .05). Thus, the two-factor solution broadly distinguished between Experience items (i.e., requiring participants to report on their internal experience of motivation, emotion, or closeness) and Expression items (i.e., based on observations of behavior). Items that did not load above .40 on either factor included Avolition: self-care and Blunted Affect: eye contact. The three-factor solution was more difficult to interpret, because the third-factor items had highly variable loadings and were skewed in the nonpathological direction (Table 3).¹
We conducted additional hierarchical structural analyses, including cluster analysis, Bass-Ackwards analysis, and Mokken scaling analysis (Supplemental On-line Materials). Converging findings confirmed that the CAINS items do not form a unidimensional composite. Instead, the items fall into two homogeneous content groupings: expressive and experiential. A possible third grouping consisted of the anhedonia intensity items together with the eye contact and self-care items. Because these items underperformed in the other analyses, they were ultimately dropped.
Table 3 presents descriptive information for the CAINS items. Several items had a skew > 1.0, and many of these, including the anhedonia intensity items, also had low loadings on the two factors from the structural analyses. Several items had low item-total correlations, including eye contact, spontaneous movement, and self-care.
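The two item-level descriptives used here can be sketched with the Python standard library. Two assumptions are labeled because the text does not specify them: skewness is the simple moment-based coefficient g1 (many statistical packages report a small-sample-adjusted version, so values will differ slightly), and the item-total correlation is the "corrected" form, which removes the item from the total before correlating. All function names are hypothetical.

```python
from statistics import mean, pstdev


def skewness(xs):
    """Moment-based skewness g1 (no small-sample adjustment)."""
    m = mean(xs)
    sd = pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def corrected_item_total(data, item):
    """Correlate one item with the sum of all *other* items.

    data: one row per respondent, each row a list of item scores.
    """
    item_scores = [row[item] for row in data]
    rest_totals = [sum(row) - row[item] for row in data]
    return pearson_r(item_scores, rest_totals)
```

With 0 coding an absent symptom, a positive skew above 1.0 means ratings pile up at the non-pathological end; for instance, `skewness([0, 0, 0, 0, 4])` is about 1.5.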
We examined inter-item correlations to identify redundancy (On-Line Supplemental Material). The two alogia items were highly inter-correlated (r = .88), indicating redundancy, consistent with the structural analyses. The social avolition item was more strongly correlated with the asociality items than with the other avolition items, suggesting that motivation for social interactions fits better with items assessing social relationships. The physical anhedonia frequency item was somewhat redundant with the recreational anhedonia frequency item (r = .56). Consistent with the factor analysis, the anhedonia intensity items were not strongly inter-correlated (all but two correlations were below .35).
Inter-rater agreement results are presented in the final columns of Table 3. Although agreement statistics are typically reported at the scale level, we conducted more stringent item-level analyses to rigorously assess item performance and inform decisions about item retention or revision. Both within- and between-site agreement were quite good for the Experience items but lower for the Expression items.
Several items failed to meet one or more of the criteria we established for item retention: (a) factor loading > .45 on one of the two factors; (b) inter-rater agreement > .70; (c) no correlation ≥ .55 with any other item; (d) item-to-scale-total correlation ≥ .35; and (e) skew < 1.0. We thus deleted Avolition: self-care (criteria a, d); Blunted Affect: eye contact (a, b, d); Blunted Affect: spontaneous movement (d, e); all Anhedonia intensity items (a, d, e); Anhedonia: physical frequency (c); Alogia: spontaneous elaboration (c); and Avolition: social (c). The revised scale included four Expression items and seven Experience items.
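Criteria (a) through (e) can be read as a simple screen over per-item statistics. The sketch below reports which criteria an item fails; the `ItemStats` fields, the function name, and the test values are hypothetical and are not taken from Table 3, and criterion (c) is encoded under the assumption that it refers to an item's largest correlation with any other item. It is only a screen: final decisions also involved judgment, for example retaining quantity of speech while deleting spontaneous elaboration even though both members of that redundant pair fail criterion (c).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ItemStats:
    name: str
    max_loading: float       # largest loading on either of the two factors
    icc: float               # inter-rater agreement for the item
    max_inter_item_r: float  # largest correlation with any other item
    item_total_r: float      # item-to-scale-total correlation
    skew: float              # positive = piled up at 0 (non-pathological)


def failed_criteria(s: ItemStats) -> List[str]:
    """Return the letters of the retention criteria the item fails."""
    fails = []
    if not s.max_loading > 0.45:       # (a) factor loading
        fails.append("a")
    if not s.icc > 0.70:               # (b) inter-rater agreement
        fails.append("b")
    if not s.max_inter_item_r < 0.55:  # (c) redundancy
        fails.append("c")
    if not s.item_total_r >= 0.35:     # (d) item-total correlation
        fails.append("d")
    if not s.skew < 1.0:               # (e) skew, as stated in the text
        fails.append("e")
    return fails
```

An item with a low loading, a low item-total correlation, and strong positive skew would be flagged with `["a", "d", "e"]`, the same pattern the text reports for the anhedonia intensity items.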
Internal consistency (coefficient alpha) was .84 for the four-item Expression subscale, .77 for the seven-item Experience subscale, and .80 for all items combined. The correlation between these subscales was modest (r = .30, p < .001). Within- and between-site agreement for both subscales was above .70 (Experience subscale: within = .94 and between = .92; Expression subscale: within = .77 and between = .74). The items included in these two subscales were examined in subsequent validity and IRT analyses.
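Coefficient alpha follows the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score), where k is the number of items. A minimal standard-library sketch, with a hypothetical function name:

```python
from statistics import pvariance


def cronbach_alpha(data):
    """Coefficient alpha for a respondents-by-items table.

    data: one row per respondent, each row a list of k item scores.
    Population variances are used throughout; the ratio is unchanged
    if sample variances are used consistently instead.
    """
    k = len(data[0])
    item_vars = [pvariance([row[j] for row in data]) for j in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Because alpha depends on both the number of items and their inter-correlation, the two subscale values can be compared roughly by treating them as standardized alpha, k * r / (1 + (k - 1) * r): alpha = .84 with four items corresponds to an average inter-item correlation of about .57, while .77 with seven items corresponds to about .32. On that approximation, the Expression items are considerably more inter-correlated than the Experience items.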
As shown in Table 4, the Expression subscale and total scores converged strongly with BPRS negative symptom ratings; this relatively strong association is not surprising, since the BPRS negative symptom subscale focuses on observable behaviors. The correlation between the Experience subscale and BPRS negative symptoms, although significant, was small.
The CAINS subscales demonstrated good discriminant validity. The CAINS total score was not correlated with depression, positive symptoms, or IQ. The CAINS Expression subscale was only modestly correlated with IQ (r = .19) and was not correlated with any other non-negative symptom measure. The CAINS Experience subscale was not correlated with IQ and had only small correlations with positive symptoms, depression (as rated on the CDSS but not the BPRS), and agitation (all rs < .15).
Regarding sex and ethnicity/race, men had higher scores than women on the Expression subscale, t (279) = 2.00, p = .05, but men and women did not differ on the Experience subscale, t (279) = 1.47, p > .05. There were no effects for ethnicity or race.
IRT modeling was used to determine how clearly and meaningfully the anchor point categories (i.e., ratings of 0–4) for the retained items differentiated among individuals.² For the Expression subscale, all items were highly discriminating and, although the five anchor points were fairly discriminating, the spread among some anchor points was less than ideal. For example, ratings of 2 and 3 each covered only a very narrow range of the latent trait, whereas ratings of 1 and 4 tended to cover a relatively wide range of Expression symptoms (particularly for the Alogia: quantity of speech item). The Experience subscale items were generally less discriminating among people than the Expression items, suggesting that these items tap a relatively broader underlying construct. In general, the five anchor points were discriminating, suggesting that the response categories provide useful distinctions. However, on some items ratings of 2 and, to a lesser extent, 3 did not uniquely identify individuals on Experience symptoms. Thus, IRT analyses suggested that most anchor points for Expression and Experience items were performing reasonably well, but also provided clear guidance for revisions to better demarcate thresholds between several rating category boundaries.
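The anchor-point analysis rests on the graded response model (Samejima, 1969) described in footnote 2. In that model, an item with slope a and ordered thresholds b1 < b2 < b3 < b4 gives the probability of a rating at or above category j as a logistic function of a * (theta - b_j), and the probability of each single category is the difference between adjacent cumulative probabilities. The sketch below computes these category probabilities for one item. The function names and example parameter values are hypothetical, and it uses the plain logistic form without any scaling constant, so estimates from a particular program should be checked against that program's parameterization.

```python
import math


def _logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: latent trait value; a: slope (discrimination);
    thresholds: increasing between-category thresholds b_1..b_m.
    Returns m + 1 probabilities for ratings 0..m.
    """
    # Cumulative probability of a rating in category j or higher;
    # P(>= 0) is 1 and P(>= m + 1) is 0 by definition.
    cum = [1.0] + [_logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[j] - cum[j + 1] for j in range(len(thresholds) + 1)]
```

Evaluating the function over a grid of theta values traces the category response curves. When two adjacent thresholds lie close together, the curve for the category between them stays low everywhere, so that rating identifies only a narrow band of the latent trait; this is the pattern described above for ratings of 2 and 3.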
The goal of this collaborative project is to develop and validate a state-of-the-art clinical rating scale for negative symptoms so that the next generation of pharmacological treatments may have clearer and more potent treatment targets. Analyses of this large, diverse outpatient sample, which showed considerable functional disability, indicated that the CAINS is a very promising measure for future treatment studies in schizophrenia. Converging structural analyses indicated two underlying factors reflecting experiential and expressive negative symptoms. Item-level analyses revealed good distributional properties, inter-rater agreement, initial convergent and discriminant validity, and anchor point categories that discriminated among individuals. The results guided our final revisions of the CAINS, which we describe in more detail below.
Across multiple converging analytic approaches, a two-factor model described the structure of the CAINS more parsimoniously than the five original consensus-based negative symptom sub-domains. The first factor reflects experiential aspects of negative symptoms and includes diminished motivation for and engagement in pleasurable social, vocational, and recreational activities. The second factor reflects expressive aspects of negative symptoms and includes diminished verbal and non-verbal communicative output. The CAINS factor-derived scales showed clear convergent validity with another assessment of negative symptoms and were largely independent of non-negative symptom domains, including depression and positive symptoms. This two-factor conceptualization of moderately correlated experiential and expressive negative symptoms comports with recent conceptual and empirical reviews of older interview-based measures of negative symptoms (Blanchard and Cohen, 2006, Kimhy et al., 2006, Messinger et al., 2011). Further research will determine whether these factors reflect distinct neurobiological processes. Nevertheless, the convergence of our results with prior findings provides a compelling basis for organizing the CAINS into two core dimensions and crafting subscales to optimally assess each.
Based on the a priori empirical principles that guided the data analyses, the CAINS has been shortened and revised. Within the Experience domain, the most substantial changes involved anhedonia-related items. Structural and item-level analyses indicated that these items did not cohere well with other Experience items and showed substantial skew and restricted range in the non-pathological direction. Although we adopted a Likert-scale approach to rating anhedonia intensity to address similar problems in our feasibility study (Forbes et al., 2010), this approach was not successful. It is unclear whether the relatively non-pathological intensity scores found across studies reflect the absence of a pleasure intensity deficit, consistent with other research (Cohen and Minor, 2008, Kring and Moran, 2008), or instead the limited sensitivity of our interview-based assessment. Either way, our findings across two studies revealed psychometric limitations in rating anhedonia intensity that would substantially limit the ability of these items to detect improvement in a clinical trial. Thus, the anhedonia intensity items were removed from the CAINS.
Anhedonia items based on the frequency of pleasurable experiences fared relatively better in the analyses, and thus these items were retained following revisions. First, we omitted the physical pleasure frequency item because it substantially overlapped with the recreational pleasure frequency item and, relative to the other anhedonia frequency items, showed the lowest inter-rater agreement, greater skew toward non-pathological ratings, and was judged to be relatively less clinically important. Second, following from the IRT analyses, we adjusted the anchors for social and recreational pleasure frequency to more clearly demarcate among rating anchor points. Third, we added a provisional vocational frequency item to provide uniformity across the social, vocational, and recreational domains. Finally, we added three new provisional frequency items for expected pleasure to test the utility of assessing expected pleasure based on frequency, rather than intensity.
Results also guided changes involving avolition and asociality items. First, although self-care is included in several negative symptom scales and is widely regarded as clinically important, we removed the self-care item due to its very low coherence with other Experience items and its skew toward non-pathological ratings. Second, we removed the social avolition item due to its empirical and conceptual overlap with the three asociality items, and we revised the asociality items so that motivation for relationships was more clearly part of each item. Finally, IRT analyses led to modifications of probe questions and anchor points to help raters distinguish among rating points for the remaining avolition and asociality items.
Regarding the Expression items, results revealed redundancy and a lack of coherence for some items, as well as concerns about inter-rater agreement and the demarcation among rating anchor points. Given the high inter-correlation between the two alogia items, we removed the spontaneous elaboration item and retained quantity of speech, primarily because raters found the latter simpler to use. Although eye contact and spontaneous movement are included in other scales, they showed poor coherence with other Expression items and relatively low inter-rater agreement, and were thus removed. To address the lower inter-rater agreement, limited range, weak demarcation among rating points, and limited user-friendliness of these items, we revised all remaining items to refine and simplify their rating anchor points. Finally, we added a provisional item intended to make the format of the verbal expression (i.e., prosody) rating more consistent with the other Expression items.
Based on these results, we now have a substantially shorter and reorganized version of the CAINS. As shown in Table 5, the scale is now organized into two subscales that differ in content and in the sources of information used to rate them. The Experience subscale consists of seven items (plus four provisional items) assessing experienced motivation and pleasure, as well as behavioral engagement in social, vocational, and recreational activities. The Expression subscale consists of four items (plus one provisional item) tapping verbal and non-verbal emotionally expressive behaviors. Based on the data collected to date, the CAINS approach to rating negative symptoms shows promising psychometric properties, inter-rater agreement, and convergent and discriminant validity.
In the final phase of the CANSAS project, the reliability and validity of the revised CAINS will be evaluated in another large sample. Test-retest reliability will be assessed to ensure that the scale shows sufficient stability for use as an endpoint in clinical trials. The convergent and discriminant validity of the CAINS will also be comprehensively evaluated. For example, independent ratings on the widely used Scale for the Assessment of Negative Symptoms (Andreasen, 1983) will be conducted by raters who are blind to CAINS ratings. Participants will also be assessed on psychiatric symptoms, functional capacity, functional attainment, neurocognition, and alternative measures of emotion and motivation. The final version of the CAINS will be a major step forward in the assessment of negative symptoms, thus meeting the original charge of the consensus development conference and setting the stage for the next generation of pharmacological treatment advances. The development process of the CAINS is unique with respect to its sample size, its a priori empirically driven approach, and its comprehensive assessment of psychometric properties and validity.
This approach will prove fruitful for attaining the ultimate goal, which is to stimulate novel pharmacological and psychosocial treatments and new research into the underlying causes of negative symptoms. Efforts to develop neuroscience-based accounts of schizophrenia (e.g., Barch and Dowd, 2010, Gur et al., 2007, Juckel et al., 2006, Ochsner, 2008) that integrate psychological, neurobehavioral, neuroimaging, and genomic data hinge on the integrity and reliability of clinical phenotypic data. The CAINS can contribute to treatment development by providing an outcome measure for interventions aimed at ameliorating impaired emotion processing and social cognition in schizophrenia (Carter et al., 2009, Green et al., 2008). Functional neuroimaging studies examining the neural circuitry underlying motivated behavior and affective processes would be complemented by the CAINS, which can help establish associations between activation abnormalities and deficits in specific symptom domains.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
¹ In the three-factor solution, Factor 2 was identical to Factor 2 of the two-factor solution. Factor 1 included 11 of the 16 items from Factor 1 of the two-factor solution, and the remaining five items, all assessing anhedonia intensity, formed a third factor. Loadings on the third factor varied: recreation past-week intensity (.75) and expected intensity (.72), physical expected intensity (.59) and past-week intensity (.38), and social expected intensity (.34).
² For each subscale, we fit a separate graded response model (GRM; Samejima, 1969) using MULTILOG (Thissen, 2003), which produces estimates of a slope parameter (i.e., how well the item discriminates among individuals) and four between-category threshold parameters (i.e., the points on the latent trait continuum at which an individual has a 50% probability of responding 0 versus 1, 2, 3, or 4; 0 or 1 versus 2, 3, or 4; 0, 1, or 2 versus 3 or 4; and 0, 1, 2, or 3 versus 4). These parameter estimates were then transformed into category response curves (CRCs), which display how the probability of responding in a particular category changes as a function of the latent variable (Supplemental Analyses).
William P. Horan, VA Greater Los Angeles Healthcare System, University of California, Los Angeles.
Ann M. Kring, University of California, Berkeley, CA.
Raquel E. Gur, University of Pennsylvania.
Steven P. Reise, University of California, Los Angeles.
Jack J. Blanchard, University of Maryland, College Park.