The purpose of this study was to examine the performance characteristics and validity of the Patient Health Questionnaire - 9 item (PHQ-9) as a screening tool for depression among adolescents.
The PHQ-9 was completed by 442 youth (13-17 years) who were enrolled in a large healthcare delivery system and participating in a study on depression outcomes. Criterion validity and performance characteristics were assessed against an independent structured mental health interview (the Diagnostic Interview Schedule for Children, DISC-IV). Construct validity was tested by examining associations between the PHQ-9 and a self-report measure of functional impairment, as well as parental reports of child psychosocial impairment and internalizing symptoms.
A PHQ-9 score ≥11 had a sensitivity of 89.5% and specificity of 77.5% for detecting youth meeting DSM-IV criteria for major depression on the DISC-IV. On ROC analysis the PHQ-9 had an area under the curve of 0.88 (95% CI = 0.82 to 0.94) and the cut point of 11 was optimal for maximizing sensitivity without loss of specificity. Increasing PHQ-9 scores were significantly correlated with increasing levels of functional impairment, as well as parental report of internalizing symptoms and psychosocial problems.
Although the optimal cut point is higher among adolescents, the sensitivity and specificity of the PHQ-9 are similar to those of adult populations. The brief nature and ease of scoring of this instrument make this tool an excellent choice for providers and researchers seeking to implement depression screening in primary care settings.
In response to the growing evidence for effective treatments for depression among adolescents, the US Preventive Services Task Force now recommends screening for depression among adolescents in primary care settings.1 However, a recent systematic review identified only five studies with adequate psychometric data on sensitivity and specificity of depression screening instruments in primary care.2 Two of the studies evaluated the same instrument, so four screening instruments were tested. Each of the four instruments has potential limitations for application in primary care, including the cost of administering a patented instrument,3 the length of the screening instrument,4, 5 algorithm-based scoring rather than symptom scoring,6 and limited ability to serve as a stand-alone screener for depression.7
Ideal screening instruments are brief, easy to understand for patients, simple to score, available without cost, and have strong performance characteristics. The Patient Health Questionnaire 9-item (PHQ-9) depression screener was developed for administration among adults in primary care settings. It has been shown to have good diagnostic validity and comparable sensitivity and specificity to other longer measures of depression.8, 9 In addition, because the instrument is based on DSM-IV criteria, the same 9-items are used in adults to establish probable depressive disorder diagnoses as well as to grade depressive symptom severity.
Despite wide use in adult populations, the PHQ-9 has not been validated in adolescent populations. In this paper, we evaluate the operating characteristics (sensitivity and specificity) of the PHQ-9 as a diagnostic instrument for depressive disorders among adolescents and the construct validity of the PHQ-9 as a depression severity measure in relation to functional status and parental assessment of child symptoms.
The AdoleSCent Health (ASC) Study was developed by a multidisciplinary team at the University of Washington and the Group Health (GH) Research Institute. The main purposes of the ASC Study were to evaluate the performance of depression screening tools and to describe the characteristics of adolescents who would most benefit from exposure to evidence-based interventions for depression in primary care settings. All procedures were approved by the GH Human Subjects Protection Committee.
Between September 2007 and June 2008, study staff randomly selected 4,000 enrollees, ages 13-17, who had seen a provider in a GH healthcare facility at least once in the prior 12 months. GH is a consumer-governed non-profit healthcare organization that serves over 630,000 residents of Washington and Idaho. The parents/guardians of selected enrollees received an invitation letter, a consent form, and a brief survey for their child. Parents were asked to sign the consent form and give the survey to their child to complete in a private place. The child received a $2 pre-incentive. Completion of the survey was taken as a form of assent by the child. Parents of youth who did not respond received a second mailing and follow-up phone calls.
The brief survey consisted of 10 items about age, gender, weight, height, sedentary behaviors, overall health, functional impairment, and depressive symptoms (the Patient Health Questionnaire 2-item Depression Scale (PHQ-2)). The PHQ-2 includes the first two items from the PHQ-9 and asks respondents to rate the frequency (0 = not at all; 3 = nearly every day) with which they have had a) depressed mood and b) lack of pleasure in usual activities in the past 2 weeks. In a prior publication using these same data, we found that a score of ≥3 on the PHQ-2 has a sensitivity of 74% and a specificity of 75% for detecting major depression among adolescents.10
A subset of youth (n = 499) were invited to participate in the follow-up phone interview study, during which in-depth information was obtained on depressive symptoms, functional impairment, and health behaviors. Youth with a PHQ-2 ≥ 3 (n = 271) and a sample of youth with a PHQ-2 ≤ 2 (n = 228) who were frequency matched for age and gender were invited to participate. Youth completing the follow-up interview received $20. Consent for the phone survey was obtained from both the parent and the child.
The child phone interview included the Patient Health Questionnaire 9-item (PHQ-9) screener and the Diagnostic Interview Schedule for Children depression modules (DISC-IV). The PHQ-9 was completed prior to other depression and mental health measures.
The PHQ-9 is a self-administered version of the depression portion of the PRIME-MD interview,11 which uses DSM-IV criteria to assess for mental disorders in primary care.8 It can be scored to provide a dichotomous diagnosis of probable major depression and to grade symptom severity via a continuous score. The PHQ-9 has been found to have high sensitivity (73%) and high specificity (98%) for the diagnosis of major depression in adult populations.8, 11 Among adults, scores on the PHQ-9 have also been used to define severity for probable diagnoses in the following manner: a score of 5-9 is considered minimal depression, 10-14 is considered mild major, 15-19 is moderate major, and ≥20 is severe major.8 The PHQ-9 also has a functional impairment question (item 10) that asks how much the symptoms they endorse in the first 9 items interfere with daily functioning.
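The continuous scoring described above can be sketched in a few lines. This is a minimal illustration, assuming each of the nine symptom items is coded 0 to 3 (the same response scale given for the PHQ-2 items); the function names and the label used for totals below 5 are hypothetical, and the band names simply repeat the adult categories listed in this section.

```python
from typing import Sequence


def phq9_total(items: Sequence[int]) -> int:
    """Sum the nine symptom items (item 10, functional impairment, is not part of the total)."""
    if len(items) != 9:
        raise ValueError("expected exactly 9 item responses")
    if any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError("each response must be 0, 1, 2, or 3")
    return sum(items)


def adult_severity_band(total: int) -> str:
    """Map a total score to the adult severity bands quoted in the text."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 totals range from 0 to 27")
    if total >= 20:
        return "severe major"
    if total >= 15:
        return "moderate major"
    if total >= 10:
        return "mild major"
    if total >= 5:
        return "minimal"
    return "below lowest band"  # hypothetical label; the text names no band under 5


responses = [3, 3, 2, 2, 1, 1, 0, 0, 0]
total = phq9_total(responses)
print(total, adult_severity_band(total))
```

These are the adult thresholds; the adolescent analysis reported later in this paper uses a different cut point.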
The DISC-IV is a reliable and valid structured interview designed for lay interviewers, which includes algorithms to diagnose DSM-IV disorders in children and adolescents.12 Telephone versions of structured psychiatric interviews have been found to have a high correlation with in-person interviews.13, 14 In order to decrease patient burden, only the depression modules (major depression and dysthymia) were used. All interviewers received 12 hours of classroom and hands-on training and additional project-specific training on the DISC-IV.
The Columbia Impairment Scale (CIS) was used to assess functional impairment.15 The 13-item CIS scale measures adolescent impairment in school, family, and peer relationships and has been shown to correlate with the clinician-rated Children’s Global Assessment Scale.15
To assess for anxiety symptoms, youth were asked to complete the brief 5-item version of the Screen for Child Anxiety Related Emotional Disorders (SCARED).16 Using a cut-off of 3 or greater, the brief SCARED has been shown to have a sensitivity of 74% and a specificity of 73% for discriminating youth with clinically significant anxiety from those without anxiety.16
Parents were asked to complete the Brief Pediatric Symptom Checklist (PSC-17). The internalizing component of the PSC-17 (at a cut-point of ≥5) has a sensitivity of 73% and specificity of 74% for detecting youth with a depressive disorder.17 The externalizing component (at a cut point of ≥7) has a sensitivity of 62% and a specificity of 89% for detecting youth who met criteria for an externalizing disorder.17
Descriptive statistics were completed for the full sample and stratified by depression status. Three categories of depression status were used based on algorithms from the DISC-IV: major depression, “intermediate depression”, or no depressive disorder.12 Youth with “intermediate depression” reported at least 3 of the 9 symptom criteria for major depression but did not meet diagnostic criteria for major depression in the past year. Chi-square analyses and F-test analyses were used to compare categorical and continuous variables, respectively, among depressed and non-depressed individuals based on PHQ-9 score. The area under the ROC curve was calculated as a quantification of the sensitivity and specificity of the ability of the self-report questionnaires to classify youth into the past month major depression category based on the DISC-IV. Results were interpreted based on standards that have been set for interpreting the area under the curve.18
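For readers less familiar with the area under the ROC curve, the sketch below shows one standard way to compute it: as the probability that a randomly chosen depressed youth has a higher questionnaire score than a randomly chosen non-depressed youth, with ties counted as one half. This rank-based form is an assumption chosen for illustration; the text does not state which software or method was used, and the scores below are made up.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # case outranks non-case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical PHQ-9 totals, not study data.
depressed = [15, 12, 9]
not_depressed = [3, 9, 6, 1]
auc = roc_auc(depressed, not_depressed)
print(round(auc, 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.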
Of 3,775 eligible youth, 2,291 (60.7%) completed the brief survey (Figure 1). Twelve percent of youth (N=281) screened positive for possible depression with a PHQ-2 ≥ 3. A total of 499 youth were invited to participate in the full baseline assessment; for 444 of them (89%), consent was obtained and both the parent and the child completed the baseline survey. Two youth who met DSM-IV criteria for bereavement on the DISC-IV were removed from the analytic sample, resulting in a final sample of 442 youth.
The mean age of participants was 15.3 years (SD = 1.2) and 60% of subjects were female. The sample was predominantly White (71%) with the next largest minority groups being Asian (10%) and Black (9.6%). Seventy-eight percent came from two parent households and 87% of youth came from households where at least one parent had at least some college. The median neighborhood household income for participants was $57,442 (SD = $18,293). Among the 242 youth who had a positive PHQ-2 on initial screening, 112 were still positive on a PHQ-2 two weeks later and 101 had a PHQ-9 score of 11 or higher. Among the 202 that were negative on the screening PHQ-2, 194 were still negative on the PHQ-2 two weeks later and 190 had a PHQ-9 score less than 11.
Table 1 shows the distribution of PHQ-9 scores by depression status on the DISC-IV. Categories on the PHQ-9 (minimal, mild, moderate, moderately severe, and severe) were based on severity thresholds established by the original authors of the PHQ-9,8 with the exception of the use of 11 rather than 10 as the lower threshold for the moderate category, based on the results of our ROC analyses. Youth meeting criteria for major depression had significantly higher total scores on the PHQ-9 and were more likely to be in the “moderate” to “severe” categories of impairment on the PHQ-9 compared to the other two diagnostic groups (Chi-square (8) = 105.06, p < .0001). Youth with “intermediate depression” on the DISC-IV were most likely to be in the “mild” to “moderate” categories of impairment on the PHQ-9. Youth with no depression diagnosis were most likely to report minimal symptoms. The mean PHQ-9 scores also decreased significantly and linearly, from a high of 15.5 (SD = 5.6) for those with major depressive disorder to 10.7 (SD = 3.9) for those with “intermediate depression” and 6.1 (SD = 5.1) for youth with no depressive disorder (F(2,439) = 48.08, p < .0001).
Table 2 shows the test characteristics of the PHQ-9 using the DISC-IV as a gold standard. The optimal cut-point for maximizing sensitivity of the PHQ-9 without loss of specificity was a score of 11 or greater. At this cut-point, the PHQ-9 had a sensitivity of 89.5% and a specificity of 77.5% for detecting youth with major depression on the DISC-IV. The positive predictive value was 15.2% for detecting major depression on the DISC-IV and the negative predictive value was 99.4%. On ROC analysis (Figure 2) the area under the curve for detecting major depression was 0.88 (0.82 – 0.94).
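The four test characteristics reported above all derive from a two-by-two table of screening result against diagnostic interview. The sketch below shows the arithmetic. The cell counts are illustrative values chosen by the editor so that the ratios land on the reported percentages; they are an assumption and are not taken from Table 2.

```python
def screening_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute test characteristics from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # cases that screen positive
        "specificity": tn / (tn + fp),  # non-cases that screen negative
        "ppv": tp / (tp + fp),          # screen positives that are cases
        "npv": tn / (tn + fn),          # screen negatives that are non-cases
    }


# Illustrative counts only (assumed, not the study's table).
m = screening_metrics(tp=17, fn=2, fp=95, tn=328)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

With few true cases relative to non-cases, even a modest false positive rate produces many more false positives than true positives, which is why a high sensitivity and a low positive predictive value can coexist.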
We also assessed the PHQ-9 in our sample using the algorithmic scoring protocol for probable major depression (presence of depression and/or anhedonia at least “more than half the days” and a minimum of 5 total symptoms occurring at least “more than half the days” (with the exception of suicide which is positive with any endorsement)). Using this scoring protocol, the sensitivity for detecting youth with Major Depression on the DISC-IV was 57.9% and the specificity was 90.3%.
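The algorithmic protocol described above can be expressed as a short rule. The sketch assumes the usual response coding (0 = not at all, 1 = several days, 2 = more than half the days, 3 = nearly every day), that the first two items are the depressed mood and anhedonia items, and that item 9 is the suicide item; the function name is hypothetical.

```python
from typing import Sequence


def phq9_algorithm_positive(items: Sequence[int]) -> bool:
    """Return True when the nine responses meet the algorithmic rule for probable major depression."""
    if len(items) != 9 or any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError("expected 9 responses coded 0-3")
    # At least one cardinal symptom (first two items) at "more than half the days" or above.
    cardinal = items[0] >= 2 or items[1] >= 2
    # Items 1-8 count when rated 2 or 3; the suicide item counts with any endorsement.
    symptom_count = sum(1 for r in items[:8] if r >= 2) + (1 if items[8] >= 1 else 0)
    return cardinal and symptom_count >= 5


print(phq9_algorithm_positive([2, 2, 2, 2, 0, 0, 0, 0, 1]))  # five symptoms incl. suicide item
print(phq9_algorithm_positive([2, 2, 2, 2, 0, 0, 0, 0, 0]))  # only four symptoms
print(phq9_algorithm_positive([1, 1, 3, 3, 3, 3, 3, 0, 0]))  # no cardinal symptom
```

Note that the first example is positive under this rule with a summed score of only 9, so the algorithmic and summed-score approaches can classify the same respondent differently.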
Table 3 shows the relationship between PHQ-9 scores and each of our measures of impairment. Scores on the Columbia Impairment Scale (CIS), mean depressive symptom related difficulty (item 10 on the PHQ-9), parental report of internalizing symptoms (PSC-17 internalizing scale), and overall psychosocial impairment (total PSC-17 score on the parent version) increased in a linear fashion such that youth with higher PHQ-9 scores also exhibited higher scores (indicating more impairment) on each of these measures (p <0.0001 for all measures).
The false positive rate, calculated as 1-specificity, was approximately 22.5% when using the DISC-IV as a gold standard. To better understand the characteristics of youth with false positives, we examined the association between having a false positive PHQ-9 and having “intermediate depression” or a positive screening test for another disorder (anxiety or externalizing disorder). Among adolescents with PHQ-9 scores ≥11 but no DISC-IV diagnosis for major depression (N = 95), 29.3% had “intermediate depression” on the DISC-IV, 16.5% had major depression in the past year but not in the prior month, 23.9% had elevated externalizing disorder symptoms (PSC-17 externalizing scale score ≥7), and 56.8% had clinically significant anxiety symptoms (SCARED score ≥3). Taken together, 82.2% of the false positive group had at least one of the four indications we examined: 45.3% had one, 31.6% had two, and 5.3% had three of the four indications.
The US Preventive Services Task Force advises primary care clinicians to screen adolescents for depression provided there is a system of care to confirm diagnosis and initiate treatment.1 To implement this recommendation, providers need screening tools that can be easily implemented in pediatric primary care settings. Although it has been extensively tested among adults, this is the first study to examine the test characteristics of the PHQ-9 in an adolescent population. We found that, when compared to a structured diagnostic interview, the PHQ-9 had high sensitivity (89.5%) and good specificity (77.5%) for detecting major depression among adolescents and on ROC analysis had an area under the curve of 0.88, putting this screening tool in a “good” range.18 The sensitivity and specificity of the PHQ-9 are in a similar range to those of other depression screening tools that have been tested among adolescents in primary care: BDI (Sensitivity – 91%, Specificity – 91%)3, PHQ-A (Sensitivity – 73%, Specificity – 94%)6, and Short MFQ19 (Sensitivity – 80%, Specificity – 81%)20, and the PHQ-9 performs better than physician interview following targeted training (Sensitivity – 43% and Specificity – 87%).5
In adult samples, a PHQ-9 score of 10 or higher is recommended to identify individuals with likely depression. Based on our findings, we would recommend using a cut-point of 11 or higher to indicate the need for further evaluation for depression. However, providers may reasonably choose an alternate cut point. For example, even though it may result in a higher rate of false positives, clinics where both adolescents and adults are seen might choose to use a cut-off of 10 in order to simplify procedures for providers.
In a prior study using this sample, we found that the PHQ-2, which contains the first two items of the PHQ-9, has a sensitivity of 74% and a specificity of 75% for detecting major depression among adolescents.10 Clinics wishing to minimize respondent burden could start with the PHQ-2 and administer the full PHQ-9 only to those with a score of 3 or higher on the PHQ-2. The benefit of adding the PHQ-9 in this protocol is that it provides more information on individual depressive symptoms, has better specificity for major depression than the PHQ-2, and includes a question about suicide, an important cause of mortality among adolescents.21 It is important to note, however, in using the PHQ-9 that youth do not need to be depressed to be suicidal. Any positive indication of suicidality (a score of 1 or higher on item 9 of the PHQ-9) should be taken seriously and followed up on by providers regardless of total PHQ-9 score.
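The two-stage protocol suggested above could be sketched as follows. This is an illustration under stated assumptions rather than a clinical tool: the function name and the keys of the returned dictionary are hypothetical, items are assumed to be coded 0 to 3, and the cut points (3 on the PHQ-2, 11 on the PHQ-9) are the ones discussed in this paper and are left adjustable.

```python
from typing import Optional, Sequence


def two_stage_screen(
    phq2: Sequence[int],
    phq9: Optional[Sequence[int]] = None,
    phq2_cutoff: int = 3,
    phq9_cutoff: int = 11,
) -> dict:
    """Apply the PHQ-2 first, then the PHQ-9 only for PHQ-2 positives."""
    result = {
        "administer_phq9": False,
        "further_evaluation": False,
        "suicide_follow_up": False,
    }
    if sum(phq2) < phq2_cutoff:
        return result                      # PHQ-2 negative: stop here
    result["administer_phq9"] = True
    if phq9 is None:
        return result                      # PHQ-9 not yet collected
    result["further_evaluation"] = sum(phq9) >= phq9_cutoff
    # Any endorsement of item 9 is flagged regardless of the total score.
    result["suicide_follow_up"] = phq9[8] >= 1
    return result


print(two_stage_screen([1, 1]))
print(two_stage_screen([2, 2], [2, 2, 0, 0, 0, 0, 0, 0, 1]))
```

In the second call the total is 7, below the cut point, yet the suicide item is still flagged, which mirrors the point that suicidality should be followed up regardless of total score.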
Compared to the findings in adults, the sensitivity of the PHQ-9 is higher but the specificity is lower in the adolescent population. This suggests that when used as a screening tool, the PHQ-9 is less likely to miss youth with major depression but there is a higher false positive rate in adolescent populations. The higher false positive rate in adolescent populations may be a result of a high rate of subthreshold depressive symptoms and adjustment disorders, as well as a significant overlap of symptoms between mental health disorders among this age group. Of the youth who were in the false positive category, 82% had an indication of a mental health concern including meeting criteria for “intermediate depression” on the DISC-IV, having depression in the past year but not in the past month, having high levels of externalizing behavior and/or having high levels of anxiety symptoms suggesting the need for further monitoring.
An additional difference between the adult and the youth DSM-IV criteria for major depressive disorder is that youth may meet the diagnostic criteria by presenting with irritability rather than depressed mood. The PHQ-9 does not include an item about irritability and, to allow for the use of a single form for settings where both adolescents and adults are seen, we chose not to change the wording of the PHQ-9. As we did not add an irritability item, we are not able to determine how it may have modified the performance of the PHQ-9. The DISC-IV does include an irritability item and some of the discrepancy between these two instruments may relate to this difference.
This study has the following limitations. First, this study was conducted in an insured population of adolescents in the Pacific Northwest and may not be generalizable to all adolescent populations. Second, the response rate to our initial brief screen was 60% and we may have had some selection bias regarding youth who participated in the study. Although we were very encouraged by the 89% participation rate in the follow-up interview study, it is possible that youth who chose not to participate were different from those who did. Third, since we oversampled youth with elevated PHQ-2 scores, the prevalence of depression in our study sample may be higher than would be seen if conducting screening in the primary care clinic. The positive and negative predictive values are influenced by underlying population prevalence and may differ in a general primary care sample. Additionally, the PHQ-9 was administered via a phone interview, which may have resulted in different responses than if it had been self-administered. Finally, the DISC-IV asks questions about a one-month time period while the PHQ-9 asks about the prior 2 weeks. Some of the lack of sensitivity and specificity may be due to these time window differences.
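The dependence of predictive values on prevalence noted above follows directly from Bayes' rule. The sketch below holds sensitivity and specificity at the values reported in this paper and varies prevalence; the prevalence values are hypothetical and chosen only to show the direction of the effect.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence                # true positives per screened youth
    fn = (1 - sensitivity) * prevalence          # false negatives
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    tn = specificity * (1 - prevalence)          # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical prevalence values for illustration.
for prev in (0.02, 0.05, 0.10):
    ppv, npv = predictive_values(0.895, 0.775, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions the positive predictive value changes substantially across the range, while the negative predictive value stays high throughout.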
Despite these limitations, the PHQ-9 is a promising screening tool for use among adolescents. It is brief, easy for patients to understand, simple to score, and available without cost. An additional major advantage of the PHQ-9 is that many primary care providers are already using it for the adult population and thus have familiarity with administration and scoring. It performs well in this age group and will be particularly useful for providers or researchers who want to conduct rapid screening in primary care settings or as part of research protocols.
This work was supported by grants from the Group Health Community Foundation Child and Adolescent Grant Program, the University of Washington Royalty Research Fund, a Seattle Children’s Hospital Steering Committee Award and through a K23 award for Dr. Richardson from the NIMH (5K23 MH069814-01A1).