This study revealed that PHQ-9 has high sensitivity and a high negative predictive value in a psychiatric specialty clinic, as it does in primary care facilities and other specialty clinics [1], so PHQ-9 is useful for screening for a current major depressive episode. However, its low specificity and low positive predictive value in a psychiatric specialty clinic do not support the use of PHQ-9 for diagnostic purposes, in contrast to primary care facilities and other specialty clinics, for which high specificity and a high positive predictive value were reported [1]. These findings held for both the diagnostic algorithm threshold and the summary score threshold (PHQ-9 ≥10), which earlier studies recommended for diagnostic and screening purposes [1].
Because specificity was low for both the diagnostic algorithm threshold and the summary score threshold (PHQ-9 ≥10), users of PHQ-9 should be alert to false-positive cases. For a current major depressive episode, the false-positive cases had diagnoses of schizophrenia, panic disorder, adjustment disorder, eating disorders, dementia, and insomnia. Compared with primary care and other specialty clinics, clinics specializing in psychiatry see more patients with various psychiatric disorders who present with depressed mood and other symptoms, which produces more false-positive results.
PHQ-9 is used primarily to screen for a major depressive episode rather than major depressive disorder, because the DSM-IV-TR diagnosis of major depressive disorder requires several exclusion criteria, such as the absence of a manic or hypomanic episode, and PHQ-9 does not include such exclusion items [4]. If PHQ-9 is used to screen for major depressive disorder, a significant number of patients with bipolar disorder will inevitably be misclassified as having major depressive disorder, because a major depressive episode is also part of bipolar disorder. Originally, Kroenke et al. noted that before making a final diagnosis, the clinician is expected to rule out physical causes of depression, normal bereavement, and a history of a manic episode [6]. This caution must be considered especially in a psychiatric specialty clinic, where bipolar disorder is much more prevalent. For this reason, we compared the operational characteristics of PHQ-9 against “current major depressive episode” and “current major depressive episode with major depressive disorder”. Our analysis of the validity of PHQ-9 for screening for a current major depressive episode with major depressive disorder (Table ) revealed that the positive predictive value decreased by about 10% compared with screening for a current major depressive episode. The failure of PHQ-9 to identify bipolar disorder was a main reason for the increased false positives. Further diagnostic workup for past manic or hypomanic episodes, or the combined use of other screening tools for these episodes, can resolve this major disadvantage of PHQ-9.
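The arithmetic behind this drop in positive predictive value can be sketched as follows. This is a minimal illustration, not the analysis performed in this study: the counts are hypothetical and were chosen only to make the calculation easy to follow, and the function name `ppv` is our own. Screen-positive patients with bipolar disorder count as true positives when the target is a current major depressive episode, but become false positives when the target is a current major depressive episode with major depressive disorder.

```python
def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: share of screen-positives who have the target condition."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts of PHQ-9 screen-positive patients (NOT data from this study).
screen_pos_mdd = 60         # current episode, major depressive disorder
screen_pos_bipolar = 10     # current episode, bipolar disorder
screen_pos_no_episode = 30  # no current major depressive episode

# Target: current major depressive episode.
# Bipolar patients in a depressive episode are true positives.
ppv_episode = ppv(screen_pos_mdd + screen_pos_bipolar, screen_pos_no_episode)

# Target: current major depressive episode with major depressive disorder.
# The same bipolar patients are now false positives.
ppv_mdd = ppv(screen_pos_mdd, screen_pos_no_episode + screen_pos_bipolar)

print(f"PPV, episode target: {ppv_episode:.2f}")
print(f"PPV, MDD target:     {ppv_mdd:.2f}")
```

With these illustrative counts, the positive predictive value is 0.70 for the episode target and 0.60 for the major depressive disorder target. The screen-positive group is identical in both cases; only the reference diagnosis changes.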
One might expect the diagnostic algorithm threshold to have better specificity than the summary score threshold, because the diagnostic algorithm closely mimics the DSM-IV-TR diagnostic criteria for a major depressive episode. Nevertheless, contrary to this expectation, the results of this study and of previous studies [3] showed no marked difference in operational characteristics between the two thresholds. In this study and in a previous study of primary care [4], the summary score threshold (PHQ-9 ≥10) had slightly higher sensitivity and negative predictive value, but slightly lower specificity and positive predictive value, than the diagnostic algorithm. The ROC analysis of the cut-off point (threshold) of the PHQ-9 summary score against “current major depressive episode” showed that, in a psychiatric specialty clinic, the optimal cut-off was 13/14, which gave a sensitivity of 0.86 and a specificity of 0.67, comparable to those of the diagnostic algorithm. Table shows that high specificity (>90%) was reached with a cut-off score of 21/22 or higher, but a high positive predictive value was not reached with any cut-off score. The salient implication is that PHQ-9 used in a psychiatric specialty clinic might be suitable for screening for a major depressive episode with the optimal summary score cut-off of 13/14, but not for diagnostic purposes. The summary score threshold, with different cut-off points for specific purposes, might be preferable to the diagnostic algorithm.
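For readers unfamiliar with how an optimal cut-off is read from a ROC analysis, the following is a minimal sketch. The text above does not state which optimality criterion was applied, so the sketch assumes Youden's J (sensitivity + specificity − 1), which is one common choice. The scores and diagnoses are toy values constructed by hand, the function names are our own, and the resulting cut-off has no bearing on the result reported here. A cut-off written as 13/14 corresponds to treating a summary score of 14 or higher as screen-positive.

```python
def sens_spec(scores, has_episode, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as screen-positive.

    Assumes the sample contains at least one patient with and one without
    the condition; otherwise a division by zero would occur.
    """
    tp = fn = tn = fp = 0
    for score, ill in zip(scores, has_episode):
        positive = score >= cutoff
        if ill and positive:
            tp += 1
        elif ill:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores, has_episode, max_score=27):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    The PHQ-9 summary score ranges from 0 to 27; ties keep the lowest cut-off.
    """
    best = None
    for cutoff in range(1, max_score + 1):
        sens, spec = sens_spec(scores, has_episode, cutoff)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Toy data, constructed by hand for illustration only (NOT data from this study).
scores = [14, 15, 17, 19, 21, 24, 9, 2, 4, 6, 8, 10, 13, 16]
has_episode = [True] * 7 + [False] * 7

cutoff, sens, spec = youden_cutoff(scores, has_episode)
print(f"screen-positive if score >= {cutoff}: "
      f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Other criteria, such as the point closest to the upper-left corner of the ROC curve or a cut-off fixed by a minimum required specificity, can select a different threshold from the same data. This is one reason why different cut-off points may suit different purposes.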
In patients with major depressive disorder, PHQ-9 summary scores correlated moderately and positively with measures of the severity of depressive symptoms (HDRS and MADRS scores), and negatively with the overall level of psychological, social, and occupational functioning (GAF scores). Consistent with our results, an earlier report from primary care described PHQ-9 scores as correlating linearly with measures of quality of life, self-reported disability days, clinical visits, and self-reported difficulties related to symptoms [6]. In patients with depressive disorder in primary care facilities, PHQ-9 scores correlated moderately with the 17-item HDRS scores [13]. In this study, PHQ-9 scores correlated most strongly with the MADRS, which is related to the core concept of depression and which showed about twice the precision of the 17-item HDRS in estimating depression of average severity [14]. Therefore, PHQ-9 scores might reflect the core symptoms of major depressive disorder, as inferred from its items, which correspond to the DSM-IV criteria items. To date, the correlation of PHQ-9 scores with the standard rating scales of depression, especially the MADRS, has not been reported in psychiatric patients. In addition to screening, PHQ-9 might be useful for measuring the severity of major depressive disorder.
All subjects in this study were psychiatric patients of a university hospital in Japan that provides primary and secondary services. These patients might have more complicated backgrounds than patients in other psychiatric clinics. Accordingly, these findings might not generalize to other populations, which is one limitation of this study.