To examine the validity of the Patient Health Questionnaire - 2 (PHQ-2), a 2-item depression screening scale among adolescents.
After completing a brief depression screen, 499 youth (13–17 years) who were enrolled in an integrated healthcare system were invited to participate in a full assessment including a longer depression screening scale (the Patient Health Questionnaire, PHQ-9) and a structured mental health interview (the Diagnostic Interview Schedule for Children, DISC-IV). Eighty-nine percent (N=444) completed the assessment. Criterion validity and construct validity were tested by examining associations between the PHQ-2 and other measures of depression and functional impairment.
A PHQ-2 score ≥3 had a sensitivity of 74% and specificity of 75% for detecting youth meeting DSM-IV criteria for major depression on the DISC-IV, and a sensitivity of 96% and a specificity of 82% for detecting youth who met criteria for probable major depression on the PHQ-9. On ROC analysis the PHQ-2 had an area under the curve of 0.84 (95% CI = 0.75 to 0.92) and the cut point of 3 was optimal for maximizing sensitivity without loss of specificity for detecting major depression. Youth with a PHQ-2 ≥3 had significantly higher functional impairment scores and significantly higher scores for parent-reported internalizing problems than youth with scores <3.
The PHQ-2 has good sensitivity and specificity for detecting major depression. These properties coupled with the brief nature of the instrument make this tool very promising as a first step for screening for adolescent depression in primary care.
By age 18, 20% of youth have experienced at least one episode of major depression.1 Depressed youth are at increased risk for suicide, school failure, substance abuse, nicotine dependence, early pregnancy, and social isolation.2–4
In the United States, fewer than half of youth who meet criteria for mental health disorders receive treatment for these disorders.5–7 Younger age of disease onset has been shown to predict longer delays in mental health treatment, with most adolescents not receiving any treatment until early adulthood.8 The delay in diagnosis and treatment of mental health disorders in adolescents and an inadequate supply of child mental health specialists have led to increasing focus on screening for depression and improving the quality of depression treatment in pediatric primary care settings.9–15
In response to the growing evidence for effective treatments for depression, the US Preventive Services Task Force now recommends screening for depression among adolescents.16 However, a recent meta-analysis identified only five studies with adequate psychometric data for screening in adolescents in primary care.17 Each of these studies used different instruments and none of the studies examined very brief screening questionnaires for depression (i.e. 2–3 questions). Brief screens are important in the primary care setting given the time constraints of busy practices and the need to screen for many health risk behaviors, not just depression.
The Patient Health Questionnaire 2-item depression screener is one of the most commonly used brief screens with adult populations. It has been shown to have good diagnostic validity among multiple large samples of adult primary care patients and comparable sensitivity and specificity to other longer measures of depression.18, 19 It is often used as a first step in depression screening, to identify individuals who require further evaluation with the remainder of the PHQ-9 questions and a clinical interview.
In this paper, we evaluate the criterion and construct validity of the PHQ-2 as a screening tool for depressive disorders among adolescents.
The AdoleSCent Health (ASC) Study was developed by a multidisciplinary team at the University of Washington and the Group Health (GH) Research Institute. The main purposes of ASC Study were to evaluate the performance of depression screening tools and to describe the clinical characteristics of adolescents who would most benefit from exposure to evidence-based interventions for depression in primary care settings. All study procedures were approved by the GH institutional review board.
Between September 2007 and June 2008, study staff randomly selected 4,000 enrollees, ages 13–17, who had seen a provider in a GH facility at least once in the prior 12 months. The parents/guardians of selected enrollees received an invitation letter, a consent form, and a brief survey (10 items) for their child. Parents were asked to sign the consent form and give it and the survey to their child to complete in a private place. The child received a $2 pre-incentive. Completion of the survey was taken as a form of assent by the child, and a phone number for questions was included on all study materials. Parents of youth who did not respond received a second mailing and follow-up phone calls.
The brief survey consisted of 10 items about age, gender, weight, height, sedentary behaviors, overall health, functional impairment, and depressive symptoms. The Patient Health Questionnaire 2-item Depression Scale (PHQ-2) was administered for the first time as a part of this brief questionnaire and was used to determine who would be invited for a follow-up interview. The PHQ-2 asks respondents to rate on a Likert scale the frequency (0 = not at all; 3 = nearly every day) with which they have had a) depressed mood and b) lack of pleasure in usual activities in the past 2 weeks. Scores range from 0–6. A score of ≥3 on the PHQ-2 has been found in adults to have the highest sensitivity and specificity and area under the curve in ROC analysis for a diagnosis of major depression based on structured psychiatric interview.18
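The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function and parameter names are hypothetical and are not part of the PHQ-2 instrument or of the study's analysis code.

```python
def phq2_score(depressed_mood: int, anhedonia: int) -> int:
    """Sum the two PHQ-2 items, each rated 0 (not at all) to 3 (nearly every day)."""
    for item in (depressed_mood, anhedonia):
        if item not in (0, 1, 2, 3):
            raise ValueError("each PHQ-2 item must be an integer from 0 to 3")
    return depressed_mood + anhedonia  # total ranges from 0 to 6


def phq2_positive(score: int, cut_point: int = 3) -> bool:
    """Flag a positive screen; the default cut point of 3 follows the text."""
    return score >= cut_point


# Hypothetical respondent: depressed mood rated 2, lack of pleasure rated 1.
example_score = phq2_score(2, 1)
example_flag = phq2_positive(example_score)
```

Keeping the cut point as a parameter makes it easy to compare alternative thresholds, such as ≥2, without changing the scoring function.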
A subset of youth (n = 499) were invited to participate in the follow-up phone interview study, during which more in-depth information was obtained on depressive symptoms, functional impairment, and health behaviors. Youth with higher PHQ-2 scores on screening were oversampled such that most youth with a PHQ-2 ≥ 3 (n = 271) and a sample of youth with a PHQ-2 ≤ 2 (n = 228) who were frequency matched for age and gender were invited to participate. Youth completing the follow-up interview were mailed $20. Consent for the phone survey was obtained from both the parent and the child.
The child phone interview included the Patient Health Questionnaire 9-item (PHQ-9) screener and the Diagnostic Interview Schedule for Children depression modules (DISC-IV). The PHQ-2 questions are included as the first two questions of the PHQ-9. In our phone interview, the PHQ-9 was completed prior to any other depression or mental health measures. We used the PHQ-2 from the phone interview in all analyses in this paper to eliminate time between assessments as a reason for disagreement in screening and interview results.
The PHQ-9 is a self-administered version of the depression portion of the PRIME-MD interview,20 which uses DSM-IV criteria to assess for mental disorders in primary care,21 and it was used as a gold standard in two of the main primary care-based adolescent screening studies.22, 23 It can be scored to provide a dichotomous diagnosis of probable major depression and to grade symptom severity via a continuous score. The PHQ-9 has been found to have high sensitivity (73%) and high specificity (98%) for the diagnosis of major depression in adult populations.20, 21
The DISC-IV is a reliable and valid structured interview designed for lay interviewers, which includes algorithms to diagnose DSM-IV disorders in children and adolescents.18 Telephone versions of structured psychiatric interviews in both adults19 and youth20 have been found to have a high correlation with in-person interviews. In order to decrease patient burden, only the depression modules (major depression and dysthymia) were used. All interviewers received 12 hours of classroom and hands-on training and additional project-specific training on the DISC-IV.
The Columbia Impairment Scale (CIS) was used to assess functional impairment.24 The 13-item CIS scale measures adolescent impairment in many domains including school, family, and peer relationships and has been shown to correlate with the clinician-rated Children’s Global Assessment Scale.25
To assess for anxiety symptoms, youth were asked to complete the brief 5-item version of the Screen for Child Anxiety Related Emotional Disorders (SCARED).26 Using a cut-off of 3 or greater, the brief SCARED has been shown to have a sensitivity of 74% and a specificity of 73% for discriminating clinically significant anxiety from non-anxiety compared to an interview administered by trained clinicians.26
To evaluate for parent-reported child internalizing symptoms and psychosocial function, parents were asked to complete the Brief Pediatric Symptom Checklist (PSC-17). The internalizing component of the PSC-17 (at a cut-point of ≥5) has a sensitivity of 73% and specificity of 74% for detecting youth with a depressive disorder on structured diagnostic interview.27
Descriptive statistics were completed for the full sample and stratified by depression status. Three categories of depression status were used based on algorithms from the DISC-IV: major depression, intermediate depression, or no depressive disorder. Youth were classified as having “intermediate depression” if they reported at least 3 or more of the 9 symptom criteria for major depression with or without impairment due to depression but the diagnostic criteria for major depression were not met. Chi-square analyses and t-test analyses were used to compare categorical and continuous variables, respectively, among depressed and non-depressed individuals based on PHQ-2 score. The individual items in the PHQ-2 as well as the overall PHQ-2 score were examined by depressive status category. Subsequently, ROC analyses were performed for the PHQ-2 using major depression on the DISC-IV as the gold standard.
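As a rough illustration of the ROC analysis mentioned above, the sketch below computes sensitivity and specificity at each possible PHQ-2 cut point and approximates the area under the curve with the trapezoid rule. The data are invented toy values, not study data, and the function names are hypothetical; the statistical software actually used in the study is not stated in the text.

```python
def roc_points(scores, has_depression, cut_points=range(0, 8)):
    """Return (cut_point, sensitivity, specificity) for the rule 'score >= cut_point'."""
    cases = [s for s, d in zip(scores, has_depression) if d]
    noncases = [s for s, d in zip(scores, has_depression) if not d]
    points = []
    for c in cut_points:
        sensitivity = sum(s >= c for s in cases) / len(cases)
        specificity = sum(s < c for s in noncases) / len(noncases)
        points.append((c, sensitivity, specificity))
    return points


def auc_trapezoid(points):
    """Area under the curve of sensitivity against (1 - specificity)."""
    xy = sorted((1 - spec, sens) for _, sens, spec in points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(xy, xy[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): PHQ-2 scores and gold-standard diagnoses.
toy_scores = [0, 1, 1, 2, 3, 4, 5, 6]
toy_diagnosis = [False, False, False, False, True, False, True, True]

points = roc_points(toy_scores, toy_diagnosis)
auc = auc_trapezoid(points)
```

Cut points 0 and 7 are included so that the curve runs from the corner where every youth screens positive to the corner where none does.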
Given the brevity and ease of use of the PHQ-9, primary care physicians are probably more likely to use it than the DISC-IV as their diagnostic instrument. ROC analyses were therefore also conducted using the PHQ-9 diagnosis of probable major depression (the presence of 5 or more symptoms of depression occurring on “more than half the days” in the prior week with at least 1 cardinal symptom) as a gold standard.
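The PHQ-9 rule for probable major depression, as worded above, might be expressed as follows. This sketch makes two assumptions: that all nine items use the same 0–3 response coding described for the PHQ-2, so that a response of 2 or 3 means “more than half the days” or more, and that the cardinal symptoms are the first two items (the PHQ-2 questions). It implements only the rule as stated in the text; the function name is hypothetical.

```python
from typing import Sequence


def phq9_probable_major_depression(items: Sequence[int]) -> bool:
    """Apply the rule from the text to nine item responses coded 0-3."""
    if len(items) != 9:
        raise ValueError("the PHQ-9 has exactly nine items")
    # Assumption: a response of 2 ("more than half the days") or 3 counts as present.
    present = [response >= 2 for response in items]
    # Assumption: items 1 and 2 (depressed mood, lack of pleasure) are cardinal.
    has_cardinal_symptom = present[0] or present[1]
    return sum(present) >= 5 and has_cardinal_symptom


# Hypothetical response patterns.
meets_rule = phq9_probable_major_depression([2, 2, 2, 2, 2, 0, 0, 0, 0])
no_cardinal = phq9_probable_major_depression([0, 0, 2, 2, 2, 2, 2, 0, 0])
```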
Finally, to better understand the symptoms of individuals with a “false positive” result (i.e., PHQ-2 score of ≥3 but no diagnosis of major depression on the DISC-IV), we examined whether the “false positive cases” met our screening criteria for intermediate depression on the DISC-IV, or probable anxiety and externalizing disorders based on cutoffs on the SCARED and the PSC-17, respectively.
Of 3,775 eligible youth, 2,291 (60.7%) completed the brief survey (Figure 1). Twelve percent of youth (N=281) screened positive for possible depression with a PHQ-2 ≥ 3 and 88% screened negative for depressive symptoms. A total of 499 youth were invited to participate in the full baseline assessment; for 444 of them (89%), consent was obtained and both the parent and the child completed the baseline survey. Two youth who met DSM-IV criteria for bereavement were removed from the analytic sample, resulting in a final sample of 442 youth for the current analysis.
Study participants were predominantly female (60%), white (71%) and from urban regions (83%). The mean age of participants was 15.3 years (SD = 1.2 years). The median household income for neighborhoods in which subjects lived was $57,442 (SD = $18,293) and 86% of youth had one or more parents who had at least some exposure to higher education. Seven percent of youth were enrolled in a public assistance insurance plan.
Table 1 shows the distribution of scores for each of the individual PHQ-2 items as well as for the full PHQ-2 score by depression status on DISC-IV (major depression, intermediate depression, or no depression). Youth meeting criteria for major depression had significantly higher total scores on the PHQ-2. Youth with “intermediate depression” on the DISC-IV were most likely to report symptoms several days but not nearly every day. Youth with no depression diagnosis were most likely to report no symptoms. When the sensitivity and specificity of each individual item in the PHQ-2 were examined, neither individual item performed better than the two combined. This is supported by the significantly greater area under the curve for the PHQ-2 in comparison with either item [Chi-square (df = 2) = 39.97, p < .001].
Table 2 shows the test characteristics of the PHQ-2 using the PHQ-9 and DISC-IV as gold standards. The optimal cut-point for maximizing sensitivity of the PHQ-2 without loss of specificity was a score of 3 or greater. At this cut-point, the PHQ-2 had a sensitivity of 96.2% for detecting youth with probable major depression by PHQ-9 criteria and of 73.7% for detecting youth with major depression on the DISC-IV. The specificity was 82.3% for detecting youth with probable major depression on the PHQ-9 and 75.2% for detecting youth with major depression on the DISC-IV. The positive predictive value was 42% for detecting probable major depression on the PHQ-9 and 11.8% for the DISC-IV. On ROC analysis (Figure 2) the area under the curve for detecting major depression was 0.84 (0.75 – 0.92) using the DISC-IV diagnosis as a gold standard and 0.95 (0.93–0.97) using the PHQ-9 diagnosis as a gold standard.
Youth with a PHQ-2 of ≥3 compared to those with <3 had significantly higher scores for functional impairment as measured by the Columbia Impairment Scale, as well as parent-reported psychosocial impairment as measured by the Pediatric Symptom Checklist (Table 3). Additionally, parental reports of internalizing symptoms were significantly higher for this group.
Item 9 of the PHQ-9 inquires how often the respondent is “thinking that you would be better off dead or that you want to hurt yourself in some way.” Sixteen youth indicated that they had these thoughts “more than half the days” or “nearly every day.” Of these 16 youth, 13 (81%) had a PHQ-2 score of ≥3, while three did not.
The false positive rate, calculated as 1 − specificity, was approximately 25% when using the DISC-IV as a gold standard and 18% when using the PHQ-9 as a gold standard. To better understand the characteristics of youth with false positives, we examined the association between having a false positive PHQ-2 and having intermediate depression or a positive screening test for another disorder (anxiety or externalizing disorder). Among adolescents with PHQ-2 scores ≥3 but no DISC-IV diagnosis of major depression (N = 105), 23.3% had intermediate depression on the DISC-IV, 14.7% had major depression in the past year but not in the prior month, 26.2% had elevated externalizing disorder symptoms (PSC-17 externalizing scale score ≥7), and 55.2% had clinically significant anxiety symptoms (SCARED score ≥3). Taken together, 76.2% of the false positive group had at least one of the four indications we examined: 39.0% had one, 32.4% had two, and 4.8% had three of the four indications.
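The arithmetic in this paragraph can be checked directly from the reported percentages. The snippet below is a simple consistency check rather than a reanalysis; the variable names are illustrative.

```python
def false_positive_rate(specificity: float) -> float:
    """False positive rate is the complement of specificity."""
    return 1.0 - specificity


# Specificities reported in the Results for a PHQ-2 cut point of >= 3.
fpr_disc_pct = round(100 * false_positive_rate(0.752), 1)  # DISC-IV standard
fpr_phq9_pct = round(100 * false_positive_rate(0.823), 1)  # PHQ-9 standard

# Reported share of the false positive group by number of indications met.
pct_by_indication_count = {1: 39.0, 2: 32.4, 3: 4.8}
pct_with_any_indication = round(sum(pct_by_indication_count.values()), 1)
```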
The US Preventive Services Task Force now advises primary care clinicians to screen adolescents for depression provided there is a system of care to confirm diagnosis and initiate treatment.16 To effectively implement broad-based screening in pediatric settings, brief tools are needed. The PHQ-2 is well suited as a first-line screening tool for depression as it is brief, easy to score, and available without cost. When the PHQ-9 is used as the comparison standard, the sensitivity and specificity of the PHQ-2 are similar to what has been found with the Beck Depression Inventory for Primary Care22 and the Adolescent Version of the Patient Health Questionnaire (PHQ-A).23 Thus, based on our evaluation, the PHQ-2 has sufficient sensitivity and specificity to be used as a first-line screening tool for adolescents in primary care settings.
Compared to the DISC-IV gold standard, the PHQ-2 has a lower specificity than has been found among adults, 75% versus 92%;18 thus 25% of youth without disease would have false positive results compared to only 8% of adults screened. In part, this lower specificity may be due to the high degree of comorbidity and symptom overlap among youth. In our study, we found that 76% of the false positive group had elevated depressive symptoms that were under the threshold for a major depression diagnosis, had met the cut-off for major depression in the past year but had since improved, or had screening scores suggestive of externalizing or anxiety disorders. Depressive symptoms exist on a continuum, and youth with elevated depressive symptoms are at increased risk for the later development of depression.28, 29 In addition, youth who have had one major depressive episode have a high likelihood of relapse or recurrence.30 Finally, youth who meet criteria for externalizing and anxiety disorders are at increased risk for the development of major depression and may benefit from monitoring of depressive symptoms over time.30 Thus, this study indicates that youth with false positive PHQ-2 scores are likely to be at risk for subsequent major depressive episodes and might benefit from further monitoring.
An additional difference between the adult and the youth DSM-IV criteria for major depressive disorder is that youth may meet the diagnostic criteria by presenting with irritability rather than depressed mood. We chose not to change the wording of the PHQ-2 to allow for the use of a single form for settings where both adolescents and adults are seen. As we did not include this item, we are not able to determine how it may have modified the performance of the PHQ-2.
Based on our findings, we would recommend using a cut-point of 3 or higher to indicate the need for further evaluation for depression, particularly if requiring a full diagnostic assessment on all positive youth. However, providers should be aware that a negative screen does not rule out disorder. The sensitivity of 74% implies that 26% of youth with a depressive disorder would be missed using this cut point. If providers are concerned about missing depressed youth, they might reasonably choose to use a cut-point of ≥2 to maximize sensitivity and then follow this with a longer screening instrument (such as the PHQ-9 or the Beck Depression Inventory22) to improve specificity and reduce the number of youth requiring full assessment. If opting for the second approach, it would be wise to also screen for anxiety given the high rate of elevated anxiety symptoms in youth with false positive PHQ-2 screening results. Additionally, in our study, approximately 20% of youth with suicidal ideation identified by their responses on the PHQ-9 would have been missed by screening with the PHQ-2 alone. Any screening procedures relying on the PHQ-2 might also include a question about suicidality.
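The proportions of missed youth cited in this paragraph follow from the reported sensitivity and from the counts given for PHQ-9 item 9 in the Results. A brief check, with illustrative variable names:

```python
# Sensitivity of PHQ-2 >= 3 against DISC-IV major depression, as reported.
sensitivity = 0.74
missed_depression_pct = round(100 * (1 - sensitivity))  # cases screened negative

# Youth endorsing PHQ-9 item 9 on "more than half the days" or more.
ideation_total = 16
ideation_screen_positive = 13  # PHQ-2 >= 3
ideation_missed = ideation_total - ideation_screen_positive

ideation_missed_pct = 100 * ideation_missed / ideation_total
```

Here 3 of 16 youth with suicidal ideation (about 19%) had a PHQ-2 below 3, which the text rounds to approximately 20%.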
This study has four main limitations. First, this study was conducted in an insured population of adolescents in the Pacific Northwest and may not be generalizable to all adolescent populations. Second, the PHQ-2 measure used in this study was a part of the PHQ-9. Thus, one would expect a high degree of correlation between the two sets of results, and the sensitivity and specificity may be slightly lower if the measures were administered separately. Although this is a limitation, we do feel that our usage parallels the common clinical and research practice of administering a screening measure and immediately following it with a more definitive measure for those who screen positive. Third, the response rate to our initial brief screen was 60%, and there may have been some selection bias regarding which youth participated in the study. We were very encouraged by the 89% participation rate in the follow-up interview study; however, it is possible that youth who chose not to participate were different from those who did. Finally, the DISC-IV asks questions about a 1-month time period while the PHQ-2 asks about the prior 2 weeks. Some of the lack of sensitivity and specificity may be due to these time window differences.
Despite these limitations, the PHQ-2 is a promising screening tool for use among adolescents. It performs well in this age group and will be particularly useful for providers or researchers who want to conduct a quick initial depression screening in primary care settings or as part of web-based or survey screening for health risks.
This work was supported by grants from the Group Health Community Foundation Child and Adolescent Grant Program, the University of Washington Royalty Research Fund, a Seattle Children’s Hospital Steering Committee Award and through a K23 award for Dr. Richardson from the NIMH (5K23 MH069814-01A1).