The current study aimed to investigate the validity of CGI-I and CGI-S dif as outcome measures in clinical trials. More precisely, we examined whether CGI-I or CGI-S dif is the more appropriate measure. In addition, we investigated whether therapists' CGI ratings correspond to the patients' own view of their condition.
The results of the present study showed that, in terms of effect sizes, CGI-I yielded relatively high change scores compared to the difference score CGI-S dif. To rate a patient's condition on the CGI-I, clinicians first have to remember the patient's condition at admission and then contrast it with the patient's present condition. By contrast, CGI-S only requires a representation of the patient's current condition. Thus, the current results might be interpreted as suggesting that CGI-I is more prone to the well-known effects of hindsight memory distortion [e.g., [26]]: when using CGI-I at discharge, therapists, teams and patients might have been inclined to retrospectively recall the patient's condition at admission as more impaired than it really was according to CGI-S adm, and thus to rate the change in condition as more pronounced. If this were the case, it would, in our view, threaten the validity of CGI-I as an outcome measure in clinical trials. However, additional research directly addressing the role of memory effects in CGI-I ratings is needed before a definite conclusion on this issue is possible.
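To make the contrast between the two change measures concrete, the following minimal sketch shows how each could be turned into a per-patient change score and a standardized effect size. It rests on assumptions that are not taken from the study: CGI-I is recoded as 4 minus the rating (so that positive values mean improvement), the effect size is the mean change divided by its standard deviation, and all function names and ratings are hypothetical.

```python
from statistics import mean, stdev


def cgi_s_dif(cgi_s_adm, cgi_s_dis):
    """Difference score: severity at admission minus severity at discharge."""
    return [adm - dis for adm, dis in zip(cgi_s_adm, cgi_s_dis)]


def cgi_i_change(cgi_i):
    """Recode CGI-I (1 = very much improved, 4 = no change, 7 = very much
    worse) so that positive values indicate improvement (assumed recoding)."""
    return [4 - rating for rating in cgi_i]


def standardized_mean_change(scores):
    """Mean change divided by the standard deviation of the change scores
    (one common effect size definition, assumed here)."""
    return mean(scores) / stdev(scores)


# Hypothetical ratings for six patients, for illustration only.
adm = [5, 5, 6, 4, 5, 6]   # CGI-S at admission
dis = [4, 3, 5, 3, 4, 4]   # CGI-S at discharge
imp = [2, 2, 3, 2, 1, 2]   # CGI-I at discharge

es_s_dif = standardized_mean_change(cgi_s_dif(adm, dis))
es_i = standardized_mean_change(cgi_i_change(imp))
```

If raters recalled the admission status as worse than it was rated at the time, one would expect the recoded CGI-I scores to exceed the corresponding CGI-S dif scores, which is the pattern discussed above.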
The congruence of ratings from the three perspectives on CGI-I was moderate to good and much better than the congruence of ratings on CGI-S. Moreover, while congruence between the single therapists and the teams was moderate to good, patients gave divergent ratings, especially on CGI-S dif. Overall, patients provided the most conservative ratings of change on both CGI-I and CGI-S dif. At the same time, patients' ratings on both CGI-I and CGI-S dif correlated most strongly with BDI dif, while correlations with BDI for the other two perspectives were virtually zero. One might object that the validity of a self-reported CGI rating is itself doubtful because the CGI was not originally designed as a self-rated scale, so that low correlations with the self-reported CGI would be only a weak criterion for validity. However, self-reported CGI ratings correlated significantly with BDI, and the validity of the BDI as an instrument for assessing depression severity has been shown in numerous studies [for some recent examples see e.g., [27]]. These results suggest that CGI ratings made by the treating therapist or obtained through a consensus process in the team of therapists, regardless of whether CGI-I or CGI-S dif is concerned, do not fully represent the patient's view of the severity of his or her impairment.
So which global CGI measure should be used as an outcome measure, CGI-I or CGI-S dif? The results of the present study do not support a definite recommendation, since no strong evidence for the validity of either CGI-I or CGI-S dif could be found. In our view, the overall picture of results could be interpreted as being slightly in favour of CGI-I, but additional research is clearly needed.
As already noted, there were no substantial differences between therapists' and teams' ratings. One potential explanation is that in our study the therapist who provided the single rating was also a member of the team of therapists and might have influenced the consensus rating in his or her favoured direction. Nevertheless, at least under the conditions described, our results suggest that, in contrast to Kadouri et al. [18], a consensus rating following a Delphi process does not necessarily change the reliability or validity of the rating.
Several limitations of the current study should be noted. The sample size was rather small, so the reported results should be interpreted with care. In addition, only patients suffering from MDD were assessed, which limits the generalizability of the results to other patient groups. Because the length of the current depressive episode could not be determined from the study data, it cannot be ruled out that the length of the depressive episode or chronicity influenced the results. Furthermore, since neither the CGI nor the BDI was applied to a random sample of the adult population, the rather low to moderate ICCs found in the present study might simply reflect the fact that only a very homogeneous sample, consisting of patients who had been hospitalized for MDD, was investigated. Replication studies, ideally with larger and more heterogeneous samples, are warranted.
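The point about sample homogeneity can be illustrated with a small sketch: with the same amount of disagreement between raters, an intraclass correlation drops when patients differ little from one another. The code below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) from the standard ANOVA mean squares. Which ICC variant the study used is not stated in this section, so this choice is an assumption, and the ratings are hypothetical toy data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per patient, each row holding one
    score per rater.
    """
    n = len(ratings)        # number of patients
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way ANOVA decomposition.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical ratings by two raters who always differ by one scale point.
# Heterogeneous sample: patients span most of the 7-point scale.
wide = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
# Homogeneous sample: all patients cluster within a narrow severity range.
narrow = [(4, 5), (5, 4), (4, 5), (5, 4), (5, 6), (6, 5)]

icc_wide = icc_2_1(wide)
icc_narrow = icc_2_1(narrow)
```

In this toy data the raters disagree by exactly one point for every patient in both samples, yet the ICC for the homogeneous sample is far lower than for the heterogeneous one, because the between-patient variance that the ICC relates agreement to is much smaller.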
The only criteria available for the validation of the CGI in this study were self-reported data (BDI and patients' ratings on the CGI). However, the most valid procedure for diagnosing a depressive disorder is a structured diagnostic interview based on DSM-IV [29] or ICD-10 [30] criteria and conducted by a clinical expert. Thus, future studies should incorporate interview-based assessments at discharge to replicate the present findings.
The reported findings were not collected in a clinical trial, which is one of the main areas of application for the CGI. In clinical trials, clinicians are usually blinded to the study condition to which a patient is assigned, e.g., treatment vs. placebo. Thus, they do not know whether rating the patient as much improved would support the aim of the study. In this study, by contrast, clinicians both treated and rated the patients. Clinicians might therefore have been inclined to assign relatively high change scores. However, they also knew that the study did not aim at evaluating therapy effects, so we expect the effect of such demand characteristics in our data to be rather small. Nevertheless, future research should investigate whether our results can be replicated in a blinded setting.