Br J Gen Pract. 2007 August 1; 57(541): 650–652.
PMCID: PMC2099671

Diagnosing depression in primary care using self-completed instruments: UK validation of PHQ–9 and CORE–OM

Simon Gilbody, MD, DPhil, MRCPsych, Professor of Psychological Medicine & Health Services Research and David Richards, PhD, Professor in Mental Health
Department of Health Sciences, University of York, York
Michael Barkham, PhD, Professor of Clinical and Counselling Psychology


There is increased emphasis on routine assessment of depression in primary care. This report is the first UK validation of two self-completed measures: the Patient Health Questionnaire (PHQ–9) and the Clinical Outcomes in Routine Evaluation – Outcome Measure (CORE–OM). Optimum cut-off points were established against a diagnostic gold standard in 96 patients. PHQ–9 sensitivity = 91.7% (95% confidence interval [CI] = 77.5 to 98.3%) and specificity 78.3% (95% CI = 65.8 to 87.9%). CORE–OM sensitivity = 91.7% (95% CI = 77.5 to 98.2%) and specificity = 76.7% (95% CI = 64.0 to 86.6%). Brief self-rated questionnaires are as good as clinician-administered instruments in detecting depression in UK primary care.

Keywords: depression, diagnosis, primary care, screening


Recent guidelines, such as those issued by the National Institute for Health and Clinical Excellence,1 recommend standardised instruments to improve the recognition and management of depression. The role of standardised assessment of depression is emphasised in the Quality and Outcomes Framework,2 where routine assessment in established depression, and depression case-finding in diabetes and ischaemic heart disease, are rewarded. Two specific instruments have been proposed:3,4 the Patient Health Questionnaire (PHQ–9)5 and the Clinical Outcomes in Routine Evaluation — Outcome Measure (CORE–OM).6 The PHQ–9 is a self-administered nine-item depression-specific questionnaire developed in the US.5 The CORE–OM is a longer 34-item generic instrument developed in the UK, which measures common mental health problems (including four items tapping depression), functional capacity, and risk. Both of these have clear advantages over other instruments, in that they are self-completed and are freely available to end-users. However, there is no published UK primary care validation study of these instruments assessed against a diagnostic gold standard of depression.


A randomised trial of collaborative care for depression was conducted in a UK primary care setting, in which both the PHQ–9 and CORE–OM5,6 were administered at 3 months' follow-up. At the same follow-up, a trained interviewer carried out a diagnostic interview using the Structured Clinical Interview for DSM (SCID),7 without foreknowledge of PHQ–9 or CORE–OM scores. One hundred and fourteen patients were recruited, of whom 96 were followed up and received a diagnostic interview, PHQ–9, and CORE–OM (22 males, 74 females; mean age 42.5 years, standard deviation 13.6 years).

Sensitivity, specificity, and likelihood ratios were calculated at various PHQ–9 cut-off scores (including ≥10, as recommended in US primary care5) and for the CORE–OM-clinical and depression scores.6 Receiver operator characteristic (ROC) analysis was also conducted.


Thirty-six of 96 patients were diagnosed as having major depressive disorder on the SCID. At the recommended PHQ–9 cut off of ≥10, 33 of these 36 depressed patients screened positive and 47 of the 60 SCID non-depressed patients scored below the cut off, giving PHQ–9 sensitivity of 91.7% (95% CI = 77.5 to 98.3%), specificity 78.3% (95% CI = 65.8 to 87.9%), positive likelihood ratio 4.2 (95% CI = 2.6 to 6.9), and negative likelihood ratio of 0.11 (95% CI = 0.04 to 0.32). Increasing the cut-off point of the PHQ–9 to ≥12 improved the specificity slightly, without compromising sensitivity (Table 1). For the CORE–OM-clinical score the optimum cut-off point was ≥13, giving the same sensitivity of 91.7% (95% CI = 77.5 to 98.2%), specificity 76.7% (95% CI = 64.0 to 86.6%), positive likelihood ratio 3.9 (95% CI = 2.5 to 6.3), and negative likelihood ratio of 0.10 (95% CI = 0.04 to 0.32). Using the depression subscale (CORE–OM-D) improved the psychometric properties marginally (optimum cut off ≥17: sensitivity = 94.4%, 95% CI = 81.3 to 99.3%; specificity = 78.3%, 95% CI = 65.8 to 87.9%).
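
The PHQ–9 figures at the ≥10 cut off can be reproduced from a 2x2 table of screening result against SCID diagnosis. The sketch below is illustrative only. The cell counts (33 true positives, 3 false negatives, 47 true negatives, 13 false positives) are inferred here from the reported percentages and group sizes of 36 and 60, and the function name is hypothetical. Confidence intervals are not computed.

```python
def diagnostic_accuracy(tp, fn, tn, fp):
    """Return sensitivity, specificity and likelihood ratios from 2x2 counts."""
    sensitivity = tp / (tp + fn)   # proportion of SCID-depressed who screen positive
    specificity = tn / (tn + fp)   # proportion of SCID non-depressed who screen negative
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Counts inferred from the reported results (an assumption, not copied from Table 1).
phq9 = diagnostic_accuracy(tp=33, fn=3, tn=47, fp=13)
for name, value in phq9.items():
    print(f"{name}: {value:.3f}")
```

After rounding, these values agree with the reported sensitivity of 91.7%, specificity of 78.3%, positive likelihood ratio of 4.2, and negative likelihood ratio of 0.11.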

Table 1. Sensitivity, specificity, and likelihood ratios at various cut-off points of the PHQ–9 and CORE–OM.

ROC curve analysis indicated that the PHQ–9 and CORE–OM-clinical performed well: PHQ–9 area under the curve (AUC) = 0.94 (95% CI = 0.89 to 0.98); CORE–OM-clinical AUC = 0.92 (95% CI = 0.87 to 0.97); CORE–OM-D AUC = 0.92 (95% CI = 0.86 to 0.97) (Figure 1 and Table 1; see Supplementary Table 1 for detailed psychometric properties, including CIs). At cut offs of ≥10 and ≥13 respectively, PHQ–9 and CORE–OM-clinical delivered almost identical sensitivity and specificity.
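
For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen depressed patient scores higher on the questionnaire than a randomly chosen non-depressed patient, with ties counted as one half. The sketch below illustrates that definition. The scores are invented toy values, not study data, and the function name is hypothetical; it is not a reconstruction of the analysis reported here.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve as the proportion of case/control pairs
    in which the case scores higher (ties count as half a win)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy questionnaire scores for illustration only (not taken from the study).
depressed = [12, 15, 9]
not_depressed = [3, 9, 11]
print(round(auc(depressed, not_depressed), 3))
```

An AUC of 0.5 corresponds to chance discrimination and 1.0 to perfect separation of the two groups.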

Figure 1
Receiver Operator Characteristic (ROC) curve of CORE–OM (Clinical Outcomes in Routine Evaluation − Outcome Measure) depression scores; CORE–OM clinical scores; and Patient Health Questionnaire (PHQ–9) scores in the presence ...

By applying Bayes' theorem8 to these performance data, at a commonly encountered baseline prevalence of 10% for depressive disorders, a positive screen of ≥10 on the PHQ–9 will increase the post-test probability of depression from 10 to 32% (95% CI = 22 to 43%). At a 20% prevalence of depression, typical of that encountered among those with chronic diseases such as diabetes, a positive screen on the PHQ–9 will increase the post-test probability from 20 to 51% (95% CI = 39 to 63%).
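
The post-test probabilities follow from the odds form of Bayes' theorem: pre-test odds multiplied by the likelihood ratio give post-test odds. The sketch below works through this using the reported positive likelihood ratio of 4.2. The function name is hypothetical, and the confidence intervals around the post-test probabilities are not reproduced.

```python
def post_test_probability(prevalence, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Positive likelihood ratio reported for the PHQ-9 at a cut off of >=10.
LR_POSITIVE = 4.2

for prevalence in (0.10, 0.20):
    probability = post_test_probability(prevalence, LR_POSITIVE)
    print(f"prevalence {prevalence:.0%} -> post-test probability {probability:.0%}")
```

At prevalences of 10% and 20% this gives post-test probabilities of about 32% and 51%, matching the figures above.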


This is the first validation of the PHQ–9 and CORE–OM against a diagnostic gold standard for depression in a UK primary care population. There are relative merits to both instruments: the PHQ–9 can be self-completed in less than 2 minutes, while the CORE–OM measures a greater range of mental health problems in addition to depression, and assesses functional capacity and risk. At cut-off points of ≥10 for the PHQ–9 and ≥13 for the CORE–OM, both instruments perform as well as, if not better than, other longer and clinician-completed instruments.9 As this is a relatively small cross-sectional study, replication of these findings would be helpful.

How this fits in

There is an increased emphasis on the recognition and management of depression in the UK under the Quality and Outcomes Framework. PHQ–9 and CORE–OM are brief self-completed instruments, which have been recommended as screening and assessment tools. There is no published report from UK primary care on the performance of the PHQ–9 or CORE–OM as assessment tools for depression. The PHQ–9 and CORE–OM perform as well in UK primary care as clinician-rated instruments. GPs may consider using either the PHQ–9 or CORE–OM alongside enhancements of care for depression.

While these performance characteristics are impressive according to accepted criteria,10 it is unlikely that case-finding instruments, by themselves, will improve the quality and outcome of primary care for depression.1,11

We are grateful to the Medical Research Council who funded the initial trial from which these data are drawn, and to our co-investigators of the MRC Collaborative care platform trial group.


Ethics committee

Not applicable

Competing interests

Michael Barkham was involved in the development of the CORE–OM instrument, but does not gain financially from its use. The CORE–OM instrument is free within the NHS.


