The primary purpose of this study was to demonstrate that most of the items of the PANSS are Very Good or Good at assessing overall illness severity throughout the spectrum of increasing levels of severity. A second purpose was to create an abbreviated version of the PANSS using a nonparametric IRT in the TestGraf software. Shortened versions of this standardized and widely disseminated scale provide an interesting avenue, which could be more fully explored before investing resources in the development of completely new instruments.
Our results confirmed that a majority of PANSS items (63.33%; 19 out of 30 items) are either Very Good or Good at assessing overall illness severity. Our results agree with those of Santor and colleagues [8], who conducted the first IRT analysis of the PANSS. Not surprisingly, our nonparametric IRT analysis showed that the Negative Symptom items (particularly N1, N2, N3, N4, N6, and N7) had good discriminative properties across almost the entire range of severity (i.e., increases in symptom intensity correspond to increases in illness severity), and it is these items that most closely approximate the "ideal" item illustrated in Figure . In addition, items of the Positive Symptoms subscale, P1, P3, P5, and to a lesser degree P2, P4, and P6, also showed good approximation to the "ideal" item presented in Figure . For these items, the probability of rating a particular option (level of severity) corresponded to a relatively well defined and narrow range of severity.
In contrast, as demonstrated by Santor and colleagues [8], many items (P7, N5, N7, G1, G5, G6, G9, G10, G11, G12, G13, and G15) demonstrate problematic features, and some fundamental issues remain with regard to the use of the PANSS total score as a measure of overall level of psychopathological severity in schizophrenia. Several items from the General Psychopathology subscale failed to show good discriminative properties across the range of severity assessed in the present study. Of the 16 items of the General Psychopathology subscale, only seven (43.75%) were found to be either Very Good or Good and were retained in the Mini-PANSS. For example, for item G3 (Guilt Feelings), OCCs were flat (not peaked) across almost the entire severity range and were dominated by a single response option throughout most of the distribution of scores. One may argue that this is a result of the severity of the patient population used for this study; however, the levels of psychopathology in this study ranged from the lowest levels of severity (a total PANSS score of 32) to very high levels of severity (a total PANSS score of 161).
A consistent observation across all items was that very extreme symptomatology (option 7) was rarely rated. Additionally, Santor and colleagues [8] and Obermeier and colleagues [36] recommended rescaling the PANSS options, as option 7 is rarely endorsed and some options have ambiguous definitions. For example, on item P1, patients scoring at the highest range of the Positive Symptoms total score were far more likely to score a 5 or 6 on this item, suggesting that option 7 was underutilized. Additionally, a large number of items showed an overlap in OCCs for options 3 and 4 (some examples include G2, G3, and G12). These results were not unexpected, because the definition of option 3 includes "little interference with patient's daily functioning," whereas option 4 "represents a serious problem but occurs occasionally" (Kay et al., 1987). This phrasing appears to create greater overlap, as the term "little interference" can be difficult to differentiate from "occurs occasionally." Results also demonstrate overlap between a number of adjacent OCCs. In particular, items P7, N5, G1, G3, G10, G11, and G12 display significant overlap between most options, suggesting that these levels of severity are poorly differentiated. Also, results show that some items are predominantly rated at higher levels of severity and do not span the entire continuum of expected scores. For example, G2, G5, and G16 have OCCs starting from expected scores on the General Psychopathology subscale of approximately 25.
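The OCCs discussed above are nonparametric estimates of the probability of each rating option as a function of severity. As a rough illustration only (this is not the TestGraf implementation), the following Python sketch estimates such curves by Gaussian kernel smoothing of option indicators against the subscale total score. The function name, the bandwidth rule, the use of raw subscale totals as the severity proxy, and the simulated data are all assumptions introduced here for illustration.

```python
import numpy as np

def option_characteristic_curves(item_scores, subscale_totals,
                                 options=range(1, 8), grid_size=51,
                                 bandwidth=None):
    """Kernel-smoothed estimate of P(option | severity) for one item.

    Hypothetical helper: severity is proxied by the raw subscale total.
    """
    item_scores = np.asarray(item_scores)
    totals = np.asarray(subscale_totals, dtype=float)
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumption, not a tuned value).
        bandwidth = 1.1 * totals.std() * len(totals) ** -0.2
    grid = np.linspace(totals.min(), totals.max(), grid_size)
    # weights[g, j]: Gaussian kernel weight of patient j at grid point g.
    weights = np.exp(-0.5 * ((grid[:, None] - totals[None, :]) / bandwidth) ** 2)
    weights /= weights.sum(axis=1, keepdims=True)
    # Smooth the 0/1 indicator "patient was rated option k" for each option.
    curves = {k: weights @ (item_scores == k).astype(float) for k in options}
    return grid, curves

# Simulated data only: 500 patients, one 7-option item tied to the subscale total.
rng = np.random.default_rng(0)
totals = rng.integers(7, 50, size=500)
item = np.clip(np.rint(totals / 7 + rng.normal(0, 0.7, size=500)), 1, 7).astype(int)
grid, curves = option_characteristic_curves(item, totals)
```

With curves of this kind, a flat curve or two adjacent curves that peak over the same severity range would correspond to the poorly differentiated options described above.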
It is noteworthy that results of the current investigation offered a high degree of agreement with other psychometric research of psychopathology in schizophrenia. Specifically, like the present IRT, previous psychometric investigations have indicated that PANSS items P7, N5, G1, G5, G10, G11, G12, G15 either do not discriminate well in terms of assessing overall severity or do not reflect dimensional individual differences between patients with schizophrenia [2, 8, 37]. Also, like the present IRT, previous psychometric investigations have indicated that PANSS items of N1, N2, N3, and N4 discriminate well and reflect dimensional individual differences [19, 37, 38]. The present results have implications for psychopathology measurement and clinical assessment. Researchers and clinicians evaluating psychopathology in schizophrenia using the 30-item PANSS may choose to focus only on items that performed well in IRT analyses.
The effectiveness of item options has a direct bearing on the effectiveness of their respective item and, therefore, on the effectiveness of the Positive, Negative, and General Psychopathology subscales. In this case, the Negative Symptom subscale was found to provide maximum information at the low and high ends of the construct. The low standard error of estimate supports the conclusion that these items form a well-defined subscale. Similar observations were noted for the Positive Symptom subscale, with test information of 0.10 or better for the lower 10% and upper 5% of the severity range. The General Psychopathology subscale had the lowest test information of the three subscales, ranging from 0.04 to 0.09 across severity levels. Additionally, the standard error of estimate for the General Psychopathology subscale increased progressively from 1.0 at the lower end of the trait level up to 6.0 at the higher end of the severity level, indicating increased errors of measurement at higher levels of the severity continuum. These subscale performance results are similar to those found by Santor and colleagues [8], who observed better subscale performance for the Positive and Negative subscales than for the General Psychopathology subscale. It appears, then, that the Positive and Negative subscale scores reflect the overall severity spectrum more appropriately than the total PANSS score. The use of the Positive and Negative subscales independently from the rest of the scale is seen at times in clinical trials, considering that these two symptom domains are key components of the disease [2] and are primarily targeted in drug development.
Although the PANSS was originally designed with three subscales (Positive, Negative, and General Psychopathology), studies examining the internal structure of the scale [39] have all identified the same two underlying factors, a positive and a negative factor. Other factors have varied and included Disorganized, Excitement, Hostility, Dysphoric, Catatonic and many more [2]. Given that OCCs depend on how symptom severity is defined, the appropriateness of modelling items via their subscale scores, rather than a total PANSS score, was confirmed by conducting PCA on each subscale to assess unidimensionality. The PCA of the General Psychopathology subscale did not assume unidimensionality, which supports to some extent the common practice in clinical trials of examining the Positive and Negative subscales independently from the rest of the scale, since these symptoms are considered a key component of the disease [2] and are symptom clusters that are primarily targeted in drug development.
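To make the PCA-based check of unidimensionality more concrete, the sketch below computes the eigenvalues of a subscale's item correlation matrix and two commonly inspected summaries: the share of variance carried by the first component and the ratio of the first to the second eigenvalue. It is a minimal illustration under stated assumptions, not the analysis performed in this study; the function name, the choice of summaries, and the simulated one-factor data are hypothetical, and no decision threshold is implied.

```python
import numpy as np

def pca_eigen_summary(item_matrix):
    """Eigenvalue summary of the item correlation matrix (patients x items).

    Hypothetical helper for illustration only.
    """
    corr = np.corrcoef(np.asarray(item_matrix, dtype=float), rowvar=False)
    # eigvalsh returns eigenvalues in ascending order; reverse to descending.
    eigvals = np.linalg.eigvalsh(corr)[::-1]
    return {
        "eigenvalues": eigvals,
        "first_component_share": eigvals[0] / eigvals.sum(),
        "first_to_second_ratio": eigvals[0] / eigvals[1],
    }

# Simulated data only: 1000 patients, 7 items driven by a single latent trait.
rng = np.random.default_rng(1)
latent = rng.normal(size=(1000, 1))
items = latent + rng.normal(size=(1000, 7))
summary = pca_eigen_summary(items)
```

A dominant first eigenvalue is usually read as support for a single underlying dimension, whereas several eigenvalues of comparable size would suggest that a subscale mixes more than one symptom domain.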
The results of the nonparametric IRT provided valuable information regarding whether each item on the PANSS subscales was useful in the assessment of the overall severity of schizophrenia and in scale construction. In addition, they allowed us to select the PANSS items having utility across a broad range of illness severity and to include them in a shortened version of the scale (termed the Mini-PANSS). The similarities and differences between the 30-item PANSS and the Mini-PANSS were examined with a series of descriptive analyses, which showed high correlations between subscale and total scores. Results of the PCA of the Mini-PANSS assumed dimensionality for all three of the subscales. We deleted those PANSS items that did not appear to contribute significantly to the symptom structure of schizophrenia based on their option curves. Exclusion of these less specific items (P7, N5, G1, G2, G3, G5, G10, G11, G12, G15 and G16) resulted in high internal reliability between PANSS 30-item subscales and Mini-PANSS subscales, indicating that omission of these items in future clinical trials is not likely to significantly alter the PANSS subscales. The performance of the Mini-PANSS relative to the original was demonstrated by significant correlations and good reliability between the respective subscales, and the mean score differences between the interpolated scores and the actual PANSS scores showed little bias in the linking methods used.
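The comparison of full and shortened subscales described above can be sketched in a few lines of Python. The text does not specify which reliability coefficient was used, so Cronbach's alpha is assumed here purely for illustration; the function names and the simulated data are hypothetical. The retained column indices correspond to a seven-item Negative subscale with N5 removed, following the exclusion list given above.

```python
import numpy as np

def cronbach_alpha(item_matrix):
    """Cronbach's alpha for a patients x items score matrix."""
    x = np.asarray(item_matrix, dtype=float)
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)

def short_vs_full_correlation(item_matrix, kept_columns):
    """Pearson correlation between full and shortened summed scores."""
    x = np.asarray(item_matrix, dtype=float)
    full = x.sum(axis=1)
    short = x[:, kept_columns].sum(axis=1)
    return np.corrcoef(full, short)[0, 1]

# Simulated data only: 1000 patients, 7 items sharing one latent trait.
rng = np.random.default_rng(2)
latent = rng.normal(size=(1000, 1))
negative_items = latent + rng.normal(size=(1000, 7))
kept = [0, 1, 2, 3, 5, 6]  # N1-N4, N6, N7 (N5 dropped)

alpha_full = cronbach_alpha(negative_items)
alpha_short = cronbach_alpha(negative_items[:, kept])
r_short_full = short_vs_full_correlation(negative_items, kept)
```

Because the shortened score is a part of the full score, the part-whole correlation is inflated by construction, so it is best read alongside the reliability of each version rather than on its own.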
This study illustrates a method of calibrating scales on the summed-score scale using an IRT approach. This method has been used in previous studies as the basis for the computation of IRT scaled scores for each summed score [16, 40, 41]. Although one may argue that some loss of information follows from the simplification of scoring from response patterns to summed scores, that loss of information is small and the corresponding change in the reported standard error would often not result in a visible change in the number of decimals usually reported.
We also developed a summed-score linking method to enable the transformation of the Mini-PANSS scores for each of the subscales to the subscale scores of the full PANSS. This linking method allows data scored with the Mini-PANSS to be transformed to the scale of the full PANSS, so that results from studies using the two versions can be compared, or data from a study using the Mini-PANSS can be expressed as full PANSS scores. Future studies may benefit from incorporating a shortened version of the PANSS based on the items that performed as Very Good and Good in the IRT analyses. For example, a meaningfully abbreviated measure could serve as a screening instrument, increase rater reliability of assessment in research settings, and offer an objective approach to measuring psychopathology in primary care and other clinical settings.
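As a simplified illustration of what a summed-score linking table does, the sketch below maps each shortened-subscale score to the mean full-subscale score observed at that value in a calibration sample, and linearly interpolates for scores that were not observed. This is a deliberately basic stand-in and not the IRT-based linking method used in this study; the function names and the simulated ratings are assumptions.

```python
import numpy as np

def build_linking_table(mini_scores, full_scores):
    """Mean full-subscale score at each observed shortened-subscale score."""
    mini = np.asarray(mini_scores)
    full = np.asarray(full_scores, dtype=float)
    observed = np.unique(mini)  # sorted unique shortened scores
    means = np.array([full[mini == s].mean() for s in observed])
    return observed, means

def link_to_full(mini_scores, table):
    """Map shortened scores to the full scale by linear interpolation.

    Scores outside the calibrated range are clamped to the end values.
    """
    observed, means = table
    return np.interp(mini_scores, observed, means)

# Simulated data only: 800 patients rated 1-7 on a 7-item subscale.
rng = np.random.default_rng(3)
latent = rng.normal(size=(800, 1))
ratings = np.clip(np.rint(3 + 1.2 * latent + rng.normal(size=(800, 7))), 1, 7)
full = ratings.sum(axis=1)
mini = ratings[:, [0, 1, 2, 3, 5, 6]].sum(axis=1)  # one item dropped

table = build_linking_table(mini, full)
linked = link_to_full(mini, table)
bias = float(np.mean(linked - full))  # mean difference, linked minus actual
```

In the calibration sample the mean difference is zero by construction, so a meaningful check of bias would apply the table to an independent sample.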
Limitations
First, despite its advantage as a shorter instrument, the Mini-PANSS should not be considered a replacement for the original scale. The decision to produce a short IRT-based form of the PANSS could be seen as a loss of the multidimensional construct. The PANSS dimensions of Anxiety/Depression, Excitement/Hostility, and Cognition are not fully represented in the Mini-PANSS. Even if a theoretical criterion was applied to select, among the most effective items, the different items that would eventually form a Mini-PANSS, one would need to re-examine these items from a theoretical perspective. Furthermore, there are still no definitive criteria to establish whether measures developed from IRT are theoretically and empirically superior to instruments developed with CTT.
Second, the present sample was based on patients included in clinical trials according to specific inclusion and exclusion criteria. It may therefore not accurately represent all patients with schizophrenia encountered in clinical practice, and the results may not be generalizable. Because of the large number of sites and investigators, interrater reliability across sites may not have been consistently optimal.
Third, our examination of OCCs showed that options in some items (e.g., item N5) were problematic, and that option 7 was rarely used at all levels of psychopathology. This may reflect the fact that patients included in clinical trials usually do not present with extreme levels of psychopathology, since such patients could not be recruited and adequately consented at extreme levels of item severity. On the other hand, some adjustments may be necessary; for example, option 7 could be reformulated (e.g., combining options 6 and 7), and the effectiveness of these modifications will have to be empirically tested.
Fourth, Cella and Chang [42] warned of the possible limitations of using IRT methods in the evaluation of health measures, since IRT methods were originally developed for use with a fairly homogeneous educational assessment population. When these methods are applied to more heterogeneous clinical populations, there may be limitations in obtaining item-free estimates of sample latent traits. Cella and Chang [42] also remarked that the context, selection and sequence of items, considering both item diversity and clinical diversity, may produce sample-dependent item difficulty estimates and, therefore, unreliable item-dependent estimates of patients' severity of illness. The continuous monitoring of item calibrations involved in the process of item banking will help to resolve these uncertainties.
Finally, the full range of psychometric properties of the Mini-PANSS needs to be carefully studied before this new scale can be clinically used. We are presently planning to test these properties. For example, further examination of validity, reliability, sensitivity, specificity, schizophrenic categories and assessment of cut-off scores for the Mini-PANSS can be examined in a clinical trial framework.