Establishing the minimum clinically important difference (MCID) in the Positive and Negative Syndrome Scale (PANSS) is important to the interpretation of the research and clinical work conducted with this scale.
This study employed both anchor-based and distributive methods to estimate the MCID for the PANSS using data from the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) Schizophrenia trial, a large, multicenter trial in patients with schizophrenia. Data from 1442 individuals linked PANSS scores with both clinician and patient ratings on the Clinical Global Impressions Scale (CGI) using an equipercentile method. The data were also used to investigate the magnitude of the standard error of measurement (SEM), offering another estimate of the MCID.
Cross-sectional clinician-rated CGI scores of 1 through 7 linked to PANSS scores of 32.4, 42.2, 57.5, 74.5, 93.0, 110.9, and 131.0, respectively. The MCID for PANSS scores using this scale equaled a 15.3 point (34.0%) change from baseline. A 1.96 SEM on the PANSS corresponded to a 16.5 point (36.2%) change from baseline. The MCID for a subsample with above-median baseline PANSS scores was 38% higher than for a sample with lower baseline scores. With the patient-rated CGI as the anchor, PANSS scores were higher for CGI scores of 1 through 4 and the MCID was lower, 11.2 points (24.6%).
MCID estimates from a longer-term effectiveness trial were consistent with previous efforts from shorter-term efficacy trials. MCID estimates can help clinicians and researchers design future studies and interpret treatment change in future research and clinical work.
The Positive and Negative Syndrome Scale (PANSS) is the most widely used standardized instrument for assessing symptom severity in schizophrenia.1 It has been used as an outcome measure in a multitude of treatment efficacy studies and is increasingly used in clinical practice.2–4 One drawback of the PANSS and other instruments based on summary rating scores is the lack of a gold standard with which to interpret results.5 Clinicians must rely on experience with individual patients and populations to interpret PANSS scores and the clinical significance of various degrees of change.
The concept of the Minimum Clinically Important Difference (MCID) has emerged as a way of giving clinical relevance to changes in standardized instrument scores when there is no gold standard of meaningful change. The MCID has been defined by Jaeschke, Singer and Guyatt (1989), page 408, as “the smallest difference in a score in the domain of interest which patients [or providers] perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a meaningful change in the patient’s management.”6 This concept is important both in clinical practice and in clinical trials, especially non-inferiority trials, in which a treatment must demonstrate that it is no worse than a comparison treatment by a margin greater than the MCID. The MCID is additionally important for determining whether small statistically significant differences in measurement scores in studies with large sample sizes are great enough to be considered clinically meaningful.7
A number of techniques have emerged to estimate the MCID, which fall into two categories termed anchor-based and distribution-based methods.8 Both approaches are used to estimate the change in a standardized instrument score associated with clinically important change. Anchor-based methods use a measure with established or face-value clinical meaning, such as the Clinical Global Impressions Scale-Severity (CGI-S),9 to anchor scores on the measure of interest.5 Distribution-based methods generally use the statistical characteristics of the sample, such as the standard deviation, to separate “signal” from “noise”.5 These methods describe observed differences or changes in scale scores but do not provide information about what size of change is clinically important, and should ideally be linked to a clinical measure of the MCID.10
Recent studies using anchor-based methods linking PANSS change to the CGI Improvement scale have estimated that between a 16% and 24% change in PANSS corresponds to the MCID for minimal clinical improvement.11–15 These studies have evaluated diverse populations. However, no study has yet used a distributive method to estimate the MCID of the PANSS or used a patient-rated measure of illness severity to anchor PANSS scores.
The Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) is one of the largest and longest schizophrenia trials conducted to date comparing the effectiveness of multiple antipsychotic treatments using broad inclusion criteria in a variety of treatment settings. The current effort seeks to use this sample to estimate the MCID of the PANSS using both anchor-based and distributive techniques in addition to using both patient- and rater-reported clinical global impressions of severity.
CATIE was designed to compare the effectiveness and cost-effectiveness of available second generation antipsychotics in a large, National Institute of Mental Health–funded, randomized double-blind trial at 57 US sites, including both academic and community providers. Participants were 18 to 65 years of age with a diagnosis of schizophrenia. Those diagnosed with schizoaffective disorder or a cognitive disorder, or who had experienced only one schizophrenic episode, were excluded. Details of the study design have been presented elsewhere.16,17 The study sample (n=1442) included individuals with data from baseline and follow-up time points.
The PANSS yields a total symptom score based on 30 items, each rated from one to seven (range=30–210). Higher scores indicate more severe symptoms.1 This study utilized PANSS results from baseline and 1, 3, and 6 month follow-up assessments. Raters were required to undergo both initial and ongoing certification in the PANSS to ensure high interrater reliability.18
The CGI-S is a widely used measure of global illness severity scored on a seven-point scale. The clinician is asked to rate the individual based on his or her “total clinical experience with the given population.” The following scores can be given: 1=normal, not at all ill, 2=borderline mentally ill, 3=mildly ill, 4=moderately ill, 5=markedly ill, 6=severely ill, and 7=among the most extremely ill patients.9 The CATIE trial also included a measure of global illness severity similar to the CGI but scored by the patient on a seven-point scale (CGI-P). The patient was asked, “On a scale of ‘1’ to ‘7’, where ‘1’ is not at all ill, and ‘7’ is the worst that your illness has ever been, how would you rate the severity of your schizophrenia symptoms?”
This effort used both an anchor-based and distributive technique to estimate the MCID of the PANSS.
Equipercentile linking techniques were used to compare scores on the CGI-S/CGI-P and PANSS scales, following a method used by Leucht et al.14 and described by Kolen and Brennan.19 This technique functionally maps scores between the two different, but correlated, scales by linking scores on both measures at the same percentile rank.20 For the purposes of this study, equipercentile linking is preferable because it links the CGI-S/CGI-P scales to the PANSS scale and back in an equivalent manner; a regression would produce different comparisons depending on which scale was treated as the independent variable. Linkings were computed using a process developed by Albano21 in the R Environment for Statistical Computing. In a cross-sectional analysis, CGI-S and CGI-P scores were initially mapped to PANSS scores using equipercentile linking techniques for values at baseline, 1, 3, and 6 months, as well as from data pooled across all time points. The reductions in CGI-S and CGI-P scores were then linked to corresponding changes in PANSS scores and to the percent change from baseline in PANSS scores at 1, 3, and 6 months and across the pooled data. The percent change from baseline in PANSS scores was calculated by first subtracting the 30 baseline points, which correspond to the lowest score of one point on each of the 30 PANSS questions, thus establishing a valid 0 score at the bottom of the scale. In a secondary analysis, the population was stratified by the median baseline PANSS score and the linkings were repeated for those with “high” and “low” baseline PANSS scores.
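The core of the linking step can be illustrated with a minimal sketch (this is an illustration only, not the authors' analysis code, which used Albano's equate process in R with smoothing; the function name and toy data are ours): a score on the anchor scale is assigned its percentile rank, and the target-scale score at that same percentile is returned.

```python
import numpy as np

def equipercentile_link(anchor_scores, target_scores, anchor_value):
    """Map a value on the anchor scale to the target scale by matching
    percentile ranks (simplified; practical applications presmooth the
    score distributions)."""
    anchor = np.sort(np.asarray(anchor_scores, dtype=float))
    target = np.asarray(target_scores, dtype=float)
    # Percentile rank of the anchor value, using the midpoint convention
    # for ties (half of the tied observations count as "below").
    below = np.sum(anchor < anchor_value)
    equal = np.sum(anchor == anchor_value)
    pr = (below + equal / 2.0) / anchor.size
    # Target-scale score at the same percentile rank.
    return float(np.quantile(target, pr))

# Toy data: a 5-point anchor scale and a broader target scale.
cgi_like = [1, 2, 3, 4, 5]
panss_like = [10, 20, 30, 40, 50]
linked = equipercentile_link(cgi_like, panss_like, 3)  # midpoint maps to midpoint
```

Because the mapping is defined through percentile ranks rather than a fitted regression, linking in one direction and back yields the same correspondence, which is the symmetry property the paragraph above describes.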
The distribution-based method estimates the MCID by comparing the observed change in PANSS to the variability in the PANSS, calculated in this study as the Standard Error of Measurement (SEM). The SEM is a measure of the variability in PANSS scores reflecting the test-retest reliability of the scale and is considered to be a characteristic of the measure and not of the sample.10 The formula for the SEM is SEM = δ√(1 − r), where δ is the standard deviation (SD) and r is the reliability as measured by the intraclass correlation coefficient. Previous efforts have indicated that values between 1 and 2.3 SEM approximate the MCID.5,10,23 In order to calculate the SD and reliability of the PANSS in the CATIE trial, a subset of the population with stable symptomatology over the first month was chosen by identifying individuals whose CGI-S score did not change from baseline to one month, a method similar to that used by Duru and Fantino.24 The SD of the PANSS scores for this population at baseline was used for the SEM calculation. The intraclass correlation coefficient was calculated using a two-way mixed model of PANSS scores at baseline and 1 month.
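As a sketch, the SEM-based threshold follows directly from this formula. The SD (17.4) and ICC (0.77) used below are the values reported for the stable subsample in the Results; small differences from the published 8.4- and 16.5-point figures reflect rounding of intermediate values.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

sd, icc = 17.4, 0.77          # SD and ICC of the stable CATIE subsample
one_sem = sem(sd, icc)        # about 8.3 PANSS points
threshold = 1.96 * one_sem    # about 16.4 points, the conservative estimate
```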
The numbers of individuals with PANSS, CGI-S, and CGI-P data are reported in Table 1 with mean scores and SDs. The number of individuals declined by 32% between baseline and 6 months due to study dropouts. Scores on all three measures decreased from baseline to 6 months: by a total of 16.5% on the PANSS, 10.0% on the CGI-S, and 8.6% on the CGI-P.
Figure 1 depicts the cross-sectional analysis linking CGI-S scores of 1 through 7 to PANSS values of 32.4, 42.2, 57.5, 74.5, 93.0, 110.9, and 131.0, respectively, pooled across the four time periods (baseline and months 1, 3, and 6). CGI-P scores linked to PANSS values of 45.2, 57.4, 67.4, 78.0, 90.0, 101.8, and 114.4, pooled across the same four time periods, reflecting patient judgments of their illness severity (Figure 1). PANSS total scores linked to CGI-P scores were generally greater than those linked to CGI-S scores for less severe ratings (1–4). PANSS scores linked to CGI-P levels decreased over time for lower CGI-P values (Figure 3), but not when linked to CGI-S levels (data not shown).
Figure 2 depicts the change from baseline in CGI-S scores linked to the absolute change from baseline in PANSS scores. For this analysis, we assumed the MCID of the PANSS to be the change linked to a one-point change in the CGI-S score. Using data pooled from all time points, a one-point improvement in the CGI-S linked to a 15.3 point (34.0%) decrease in PANSS total score from baseline. The MCID for improvement in the PANSS was slightly lower at one month, 14.7 points (32.1%), than at three months, 15.0 points (35.3%), or six months, 16.4 points (35.8%). When the change in PANSS scores was linked to the CGI-P (Figures 2 and 3), the MCID for improvement was somewhat smaller, 11.2 points (24.6%) using pooled data, and showed a similar increase over the course of the study.
Figure 4 depicts the linking of CGI-S and CGI-P scores with change in PANSS scores stratified by the median baseline PANSS. In those with lower baseline PANSS scores, CGI-S and CGI-P scores linked to lower PANSS scores compared to those with higher baseline PANSS scores. On the whole, this discrepancy in stratified scores was larger using the CGI-P than the CGI-S. Similarly, the MCID using the CGI-S was lower in those with lower baseline PANSS scores, 11.6 points, compared to those with higher baseline PANSS scores, 18.7 points. A similar difference was observed using the CGI-P as the anchor, but both values were lower than those using the CGI-S.
The sub-population in which there was no change in the CGI-S between baseline and 1 month consisted of 707 individuals with a mean PANSS score of 76.0 (SD=17.4). The reliability of the PANSS, calculated as the intra-class correlation between PANSS scores at baseline and at 1 month, was 0.77, which yields 1 SEM = 8.4 PANSS points (an 18.4% decrease from baseline) and 1.96 SEM = 16.5 PANSS points (a 36.2% decrease from baseline).
This study estimated the MCID for the PANSS, a widely used measure of symptomatology in schizophrenia. The study used a large sample of participants in the CATIE Schizophrenia trial, which employed open inclusion criteria, allowing for what is likely a generalizable population with chronic schizophrenia. In addition, varying analytic methods were used, including a novel anchor-based approach as well as a distributive technique. We estimated an MCID for the PANSS of 15.3 points, or 34.0% from baseline, by linking PANSS and CGI-S scores. This estimate corresponds to the more conservative distributive estimate of 1.96 SEM, which yielded a similar MCID estimate of 16.5 points. When patient-rated CGI-P values were used as an anchor, lower self-rated scores linked to higher PANSS scores compared to clinician-rated global severity. As a result, the MCID estimated via linking with patient-reported severity was slightly lower than that estimated by clinician report. In addition, the MCID estimate varied according to baseline psychopathology, with a 38% difference in the MCID between those with higher and lower baseline PANSS scores. These results suggest that for patients in the CATIE trial, where mean PANSS scores changed on average at most 14.8% over the follow-up period, symptom change may not have reached a clinically meaningful level on average using a clinician-anchored or distributive estimation of the MCID, although many patients achieved meaningful gains.
This study used both anchor-based and distributive methods, each with its own conceptual underpinnings, to develop similar estimates of the MCID. This provides additional evidence for the validity of previous MCID estimates. Studies evaluating different distributive methods have suggested that the SEM is more concordant with clinically meaningful change than other distributive methods.25,26 The use of the SEM also mitigates the problems associated with the absence of adequate psychometric investigations into the CGI.14,27 However, there has been some disagreement as to how many SEMs an individual must change in order for that change to be considered clinically important.23 Several groups have suggested that one SEM corresponds to the MCID on measures of health-related quality of life,10,28,29 while other studies favor a value of 1.96 SEM.23,30 It must be remembered that change exceeding the SEM is change that cannot be attributed to measurement error alone;31 the application of the SEM to an estimation of the MCID is theoretical and should be corroborated with clinical evaluation. The current results indicate that the conservative value of 1.96 SEM may best approximate the MCID. This corresponds to the value representing a 95% confidence interval proposed by McHorney and Tarlov.25 In addition, similar studies using the CGI as an anchor to estimate the MCID have used the CGI improvement scale,11,12,14,32 in which clinicians are asked to rate the change in symptoms compared to baseline. Our use of serial CGI-S measurements avoids the recall bias that potentially confounds the use of that scale.
Several prior studies using multiple populations have attempted to anchor PANSS scores to a measure of global clinical change. A summary of these studies is reported in Table 2. The studies vary in sample size, in the severity and chronicity of symptoms, and in methods for evaluating clinical change. A majority of these studies report the MCID as the percent reduction from baseline in the PANSS,11,12,14,15,32 while a fraction12,32 also report the absolute value of the change. Understanding change as a fraction of the whole may be easier than considering the absolute value of a change on an arbitrary scale, which may justify these authors’ choice to report the MCID as a percent. However, since the PANSS is a set of 30 questions scored from 1 to 7, the PANSS total score has a minimum value of 30 points, which makes the calculation of percent change in the PANSS score problematic: 30 points must be subtracted from the baseline value before the percent change can be calculated. In a review of the methods used in the prior efforts at calculating an MCID for the PANSS,11,12,14,15,32 none specifically mention this issue or indicate that they accounted for the 30 minimum points. Therefore, we feel that comparing the absolute values from these studies is more appropriate.
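The floor correction described above can be sketched as follows (a minimal illustration; the function name is ours): percent change is computed against the baseline score minus the 30-point floor, rather than against the raw baseline.

```python
def panss_percent_change(baseline, followup):
    """Percent change from baseline on the PANSS, after subtracting the
    30-point floor (30 items, each scored a minimum of 1)."""
    if baseline <= 30:
        raise ValueError("baseline must exceed the 30-point floor")
    return 100.0 * (baseline - followup) / (baseline - 30)

# A 15-point drop from a baseline of 75 is a 33.3% improvement on the
# 0-based scale, not the 20% a naive calculation against 75 would give.
change = panss_percent_change(75, 60)
```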
The current results compare most closely with those reported by Leucht et al. (2005 and 2006), who described an absolute value for the MCID of 15 points in a pooled sample from seven pharmacologic trials of patients with “florid psychosis” in which the mean (SD) baseline PANSS was 94 (19).14,32 A more recent study by Schennach-Wolff et al. found a much lower estimate of approximately 5 points.12 Those authors suggest that the more generalizable population generated by their open inclusion criteria, probably consisting of individuals with more chronic disease, may account for the difference. However, they do not report baseline PANSS values and were unable to quantify chronicity in the population, so comparison with the CATIE sample is difficult. CATIE used open inclusion criteria to develop a representative sample that included both inpatients and outpatients at multiple sites and in multiple treatment settings, in addition to providing rigorous training and testing of raters to ensure reliability of PANSS ratings. In doing so, this study may provide a more representative estimate of the MCID for the PANSS in patients with chronic schizophrenia.
Several authors have suggested that the MCID for the PANSS varies depending on the severity and chronicity of symptoms, and two efforts, along with the current analysis, have stratified samples on baseline PANSS scores to evaluate this.12,32 In the current analysis, we found a 38% difference between the MCIDs of the samples with more and less psychopathology at baseline, while this difference was 43% in Leucht et al. (2006) and 85% in Schennach-Wolff et al. These results demonstrate that baseline psychopathology greatly affects the MCID of the PANSS and that both clinicians and researchers must account for this in their use of the MCID.
To our knowledge, this is the first study to evaluate the MCID from the patient’s perspective. The CGI-S is anchored to the clinician’s “total clinical experience with the given population”9 while the CGI-P is anchored to the patient’s knowledge of their past illness severity. While this distinction somewhat lessens the value of comparing the MCIDs derived from the two scales, we feel that the data show, in general terms, that patients can tolerate more symptomatology per unit of subjectively assessed severity and that less change in symptomatology is associated with each subjectively detectable unit change in severity. These results indicate that the current experience of symptoms is less clinically meaningful to patients than to formally trained raters, but that patients experience the change in symptoms as more clinically meaningful. The former may provide evidence for a decrease in insight in those with schizophrenia or minimization of symptoms, as has been previously reported.33 In addition, Cramer et al.15 pointed out that patients with schizophrenia may have a difficult time perceiving change due to changes in cognition or the presence of positive symptoms. Our results provide contradictory evidence in that patients identified less symptomatic change as clinically significant compared to trained raters. However, linking with CGI-P scores displayed more change over time (Figure 3) than linking with CGI-S scores (data not shown). CGI-P levels linked to less severe PANSS scores over time, and the MCID based on CGI-P linking increased over time, trends which have also been seen in work by other authors.12,14,15,32 These findings may indicate that the reliability of CGI-P ratings is lower than that of CGI-S ratings; however, the extent to which this phenomenon is driven by regression to the mean is unknown, especially given that linked CGI-S scores in this study did not show much change over time.
Several limitations of this study must be addressed. There was a high rate of treatment discontinuation in CATIE because of its relatively long duration potentially introducing some attrition bias.17 In the order of measures for the CATIE trial, the PANSS was administered prior to the CGI-S and both measures were most often completed by the same rater at each assessment. This procedure may bias CGI-S ratings in an unknown manner if the rater is aware of the PANSS score. Future investigations should replicate findings using independent raters for the PANSS and CGI, preferably with the clinician managing treatment completing the CGI. In addition, this study used a self-assessed version of the CGI-S, which has not been validated especially in the light of possible problems with patient-report data from individuals with schizophrenia as discussed above. However, the availability of this measure did allow us to investigate differences in patient and independent rater assessed MCID which has not been examined in previous studies. Finally, the methods used to estimate the MCID in this study evaluate a sample as a whole, and caution must be used when applying the MCID results to individual patients as might be attempted in measurement guided treatment initiatives.34
In conclusion, we estimated the MCID for the PANSS using a more representative sample than previous studies and novel analytic techniques. We estimate an MCID of approximately 15 points or 34% of the baseline value of the 0-based PANSS in the CATIE sample. Our estimates also varied considerably when stratified by baseline psychopathology and when anchored to a patient reported illness severity measure. We found that patients with schizophrenia perceive their symptoms as less severe and judge smaller symptom changes as likely to be clinically important compared to formally trained raters. Our results may give clinicians and researchers a greater understanding of a commonly used measure of schizophrenia symptomatology especially when the instrument is used as an outcome measure. These findings may allow a more informed assessment of the meaning of change in both research reports and clinical practice.
The authors would like to thank Larry Price, Ph.D., Professor of Psychometrics & Statistics, Director of Faculty Research, Texas State University, San Marcos, Texas and Rolf Engel, Ph.D., Ludwig Maximillian’s University, Munich, Germany for their assistance on equipercentile matching. The authors would also like to thank Elina Stefanovics, Ph.D, VA New England Mental Illness, Research, Education and Clinical Center, VA Connecticut Health Care System, West Haven, Connecticut for her help in data management and analysis.
Financial and Material Support
This analysis was supported by the New England Mental Illness Research and Education Center. The funding source had no role in the design, analysis or interpretation of data or in the preparation of the report or decision to publish.
Drs. Price, Engel and Stefanovics declare no conflicts of interest in relation to this research.