Requiring a formal diagnostic depression interview for diagnosis and for randomization to treatment groups created several problems, including lower-than-expected depression rates, missed depression interviews, selection bias, and inconsistent PHQ-9/SCID results.
The 8.9% SCID-based depression diagnosis rate seen here was much lower than expected; in fact, our significantly higher 22% rate of positive PHQ-9 scores over 9 months more closely approximated the 22% one-year prevalence of PPD (major depression) cited in the meta-analysis by Gaynes et al.1
This notable difference in diagnostic rates raises the question: is the SCID less palatable or convenient for new mothers than the PHQ-9, or is the difference due to a gap in predictive values of the PHQ-9 vs. SCID?
The theory that the SCID might be less convenient or comfortable for mothers than the PHQ-9 is supported by the finding that 15% of participants could not be reached for the telephone-based SCID interview, even after multiple attempts. It is quite possible that this group of SCID non-completers included missed PPD cases. For example, based on our overall 8.9% SCID-positive rate, we would have expected approximately 7 of our 75 SCID non-completers to be SCID-positive, had they been interviewed. This estimate is also supported by the fact that 10 of our SCID non-completers had a positive PHQ-9 at the time contact was attempted. It is important to note that SCID non-completers (vs. completers) were younger, poorer, and more likely to be single and Black, so it is possible that the SCID requirement produced selection bias in the diagnosis and randomization of women to treatment groups, which eventually resulted in treatment disparities.
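The estimate of missed cases above can be reproduced with a short calculation. This is only an illustrative sketch: the function name is hypothetical, and it assumes that SCID non-completers would have had the same 8.9% SCID-positive rate as the overall sample, an assumption that the demographic differences noted above may not support.

```python
def expected_missed_cases(non_completers: int, positive_rate: float) -> float:
    """Expected number of SCID-positive women among non-completers.

    Assumes (hypothetically) that non-completers share the overall
    SCID-positive rate observed in the study sample.
    """
    return non_completers * positive_rate


# Figures reported in the text: 75 SCID non-completers, 8.9% SCID-positive rate.
expected = expected_missed_cases(75, 0.089)
print(f"Expected missed SCID-positive cases: about {round(expected)}")
```

Under this assumption the result rounds to 7, which is lower than the 10 non-completers who had a positive PHQ-9 when contact was attempted.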
A number of prior studies that have used diagnostic depression interviews have not specified rates of missed depression interviews.19–23
However, other investigators who have included this information report SCID interview non-completion rates of 66% to 74%,24,25 indicating that this has also been a problem elsewhere.
Another concern was the inconsistency between participants’ SCID and PHQ-9 results. For example, 19 women with very high PHQ-9 scores (15–27, representing moderately severe to severe depression) were either not recognized as depressed by the SCID interview, or the SCID affirmation of depression occurred months later. Conversely, 11 women with positive SCIDs had a negative PHQ-9. Possible reasons for our observed PHQ-9/SCID discrepancies include: inaccuracy of the PHQ-9 or SCID (one would expect greater accuracy with the SCID, our gold standard), the presence of depressive symptoms caused by other mental conditions (e.g., baby “blues,” bipolar disorder, subsyndromal depression, or grief), disparate timing of survey and interview (in this study, a mean of 7 days, with a maximum of 2 weeks), differences in length of time over which symptoms were assessed (2 weeks for PHQ-9, 1 month for SCID), interviewer technique, and mothers’ level of comfort with a particular diagnostic tool or method.
It is interesting that over 90% of PHQ-9-positive women indicated that they had some degree of functional impairment, which speaks to the face validity of the PHQ-9 in this sample. It would be helpful if future studies examined the validity of the PHQ-9 plus the “difficulty” question for identifying PPD in other populations. If the “difficulty” question were found to increase the accuracy, or at least the clinical utility, of the PHQ-9 in other populations, it might be used more routinely to help identify women with false positive PHQ-9 results – women who may be less likely to benefit from depression treatment.
Strengths of this study include its sample size, relative ethnic diversity, primary care base, longitudinal nature, and use of a repeated measures design in assessing PPD with the PHQ-9 and SCID. The study also has weaknesses: though its sample was drawn from 7 family medicine and pediatric clinics, it is not demographically representative of the US population, and its modest response rate (33%) may have contributed to this problem. Although our SCID interviewers were carefully trained and had ongoing weekly supervision to encourage diagnostic consistency, we did not perform formal inter-rater reliability testing. Additional weaknesses are our use of only a single measure of function, and of only the depression component of the SCID, which limited our diagnostic capabilities. Finally, this study does not definitively compare and validate the SCID and PHQ-9, and it is likely that the use of the PHQ-9 for diagnostic purposes would result in some false positives or misdiagnoses, which would need to be sorted out by primary care or mental health providers to avoid mistreatment. Despite these shortcomings, the study provides preliminary findings to help researchers and clinicians weigh certain risks and benefits of using a DSM-IV-based depression interview. Additional research is needed to further evaluate and compare these tools for identifying PPD.
In conclusion, our results show that the requirement of a diagnostic interview in PPD research can be problematic, as some individuals cannot be reached for an interview, resulting in missed opportunities for diagnosis, selection bias, and possible treatment disparities. In contrast, a depression survey, though perhaps less accurate, would be easier, more cost-effective, and more inclusive. Based on these results, if a positive depression diagnosis were required to initiate some form of coordinated care or increased access to other resources, exclusive use of the SCID for diagnosis would disproportionately penalize those who need this help most: unmarried, racial minority, less educated, and more impoverished women. These potential problems should be considered when deciding whether to use a formal DSM-IV-based interview to identify depression in research.