J Am Board Fam Med. Author manuscript; available in PMC 2011 May 19.
Published in final edited form as:
PMCID: PMC3097469

Problems Encountered with Using a Diagnostic Depression Interview in a Postpartum Depression Trial

Dwenda Gjerdingen, MD, MS, Patricia McGovern, PhD, MPH, and Bruce Center, PhD



DSM-IV based depression interviews, valued for their diagnostic accuracy, are often considered to be essential for depression treatment trials. However, this requirement can be problematic, due to participant burden. The purpose of this article is to describe our experience with the depression component of the SCID interview (Structured Clinical Interview for DSM-IV) in a postpartum depression (PPD) treatment trial.


In this prospective cohort study of 506 mothers of infants from 7 primary care clinics, participants were asked to complete the depression module of the SCID interview soon after enrollment, and the PHQ-9 (9-item Patient Health Questionnaire) depression survey at 0–1, 2, 4, 6, and 9 months postpartum.


Forty-five (8.9%) women had a positive SCID interview and 112 (22.1%) had a positive PHQ-9 over 0–9 months postpartum. Problems encountered in using the SCID depression interview included: 1) lower than expected SCID-based rates of depression diagnosis (8.9%); 2) SCID non-completion by 75 (14.8%) women; SCID non-completers (vs. completers) were younger, poorer, less educated, and more likely to be single and Black (vs. Caucasian); and 3) inconsistent SCID/PHQ-9 results: 19 women with moderately severe to severe PHQ-9 score elevations (≥15) had negative SCIDs; all of these were functionally impaired. Over 90% of PHQ-9 positive women reported some degree of impairment from their depressive symptoms.


The requirement of a diagnostic depression interview resulted in selection bias and missed opportunities for depression diagnosis – problems that detract from the interview’s key strength, its diagnostic accuracy. These problems should be considered when electing to use a DSM-IV-based depression interview in research.

Major depressive disorder affects up to 22% of mothers in the year after delivery, according to best estimates from a recent meta-analysis.1 The Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) states that the postpartum onset specifier can be applied to major depression if the onset occurs within 4 weeks of delivery.2 However, some experts believe that PPD may also begin later, and that it often lasts for several months.3,4 Therefore, for purposes of this study, we define postpartum depression (PPD) as major depressive disorder identified over the course of our study, from 0–9 months postpartum. Given the relatively high prevalence and duration of this serious disorder, and the fact that PPD affects not only the mother but also the infant and other family members, ongoing PPD research is sorely needed.

Fundamental to depression research is the proper identification of potential participants with major depression. DSM-IV based depression interviews have long been considered the gold standard for depression diagnosis in research and continue to be commonly used as such.5–8 The SCID (Structured Clinical Interview for DSM-IV)9 is a widely used DSM-IV based diagnostic interview, and telephone administration of the SCID has been found to be acceptable to patients,10 and to have 97.6% agreement with in-person administration for diagnosing depression (50.0% positive agreement and 97.5% negative agreement).11 While having such a gold standard is necessary when one is validating depression screens, this requirement may be counterproductive for some depression treatment trials.

Potential problems that may result from requiring a formal depression interview for PPD (and other depression) treatment trials include: increased study costs, need for personnel who are trained in administering the interview, and missed cases, either because participants cannot be reached for the interview, or because the interview itself may not accurately elicit depressive symptoms in some patients. For example, a recent study of 168 Melbourne aged-care residents with normal cognitive function found that the point prevalence of major depressive disorder rose from 16% (with the SCID interview alone) to 22% by including an informant clinical interview in the diagnostic procedure. Overall, 27% of depressed residents failed to disclose symptoms in the clinical interview. It was concluded that individual interviews may be insufficient to detect depression among older adults.12

Although mothers of infants are not elderly, they may have other characteristics that would make a formal depression interview an imperfect or cumbersome diagnostic tool. For example, they have round-the-clock care-giving responsibilities for their dependent infants, so their schedules may be erratic, making it difficult to find uninterrupted time for an interview. They may also be threatened by the idea of verbally disclosing their depressive symptoms, for fear that they will be seen as unfit mothers and their infants will be taken away. Previous research has demonstrated this fear to be a barrier to postpartum depression diagnosis and/or treatment.13,14 Indeed, even apart from such fears, some individuals may find it easier to disclose personal feelings on a written survey than verbally, as suggested by a maternal depression screening study where higher rates of positive screens were seen with a paper-based screen than with an interview-based screen (22.9% vs 5.7%).15

The present study was part of a randomized controlled trial testing the benefit of collaborative stepped care on postpartum depression outcomes. Participants completed interval PHQ-9 depression surveys (9-item Patient Health Questionnaire)16 and SCID interviews, and a positive SCID was required for mothers’ randomization to treatment groups. We selected the PHQ-9 as an alternative depression measure in this study because it includes the diagnostic criteria for depression16 and therefore might be considered a valid diagnostic tool for future PPD studies if validity data supported this use. As reported previously, the sensitivity and specificity of the PHQ-9 in this postpartum sample, using a PHQ-9 score cut-off of ≥10 and the SCID as the criterion standard, were 82% and 84% respectively.17

The purpose of this study was to relate the authors’ experience with the SCID interview in a primary care-based sample of postpartum women. Specifically, we sought to compare rates of positive SCIDs and PHQ-9s, determine the frequency of missed SCID interviews and the demographic characteristics of those with missed SCID interviews, and investigate inconsistencies between SCID and PHQ-9 results. Also, we evaluated positive PHQ-9 scores against the PHQ-9 follow-up question that asks how difficult the respondent’s depressive symptoms make it to function.


Participants and Procedures

This prospective cohort study, conducted within the context of a randomized controlled trial to test the benefit of collaborative stepped care for postpartum depression, was approved by the University of Minnesota and North Memorial Hospital Institutional Review Boards. Participants were recruited during their infants’ initial well-child visits at a participating clinic from October 1, 2005, through September 30, 2006 (exception: 20 women affiliated with one of the clinics were enrolled during their maternity hospital stay). Participating clinics included 4 family medicine and 3 pediatric clinics, all located in the Minneapolis/St. Paul, Minnesota metropolitan area. Inclusion criteria were: English literate mother (this was evaluated by telephone in questionable cases, and telephone surveys were offered to English-speaking women who preferred this option), ≥12 years of age, with newborn infant (0–1 month of age) who was registered at a participating clinic.

During the infant’s initial well-child visit, mothers were informed of the study and received a consent form and initial survey which could be completed at the time of the visit or later and returned by mail. Participants were given 2-, 4-, and 6-month follow-up surveys at subsequent well-child visits (or alternatively, they completed telephone or mailed surveys), and they received the final 9-month survey by mail.

Mothers were also asked to complete the depression module of the Structured Clinical Interview for DSM-IV (SCID) by telephone within 2 weeks of the initial survey, and again later if a previously non-depressed woman had a positive depression screen. The SCID interview served as our reference standard for the diagnosis of major depressive disorder, and was conducted by 3 trained psychology doctoral students whose training consisted of observing SCID training tapes and completing 5 practice tapes under the supervision of an experienced doctoral-level clinical psychologist. Interviewers also had ongoing weekly group supervision throughout the study to foster consistency and to address any assessment questions or uncertainties. SCID interviewers attempted to contact participants as soon as possible after they completed their initial survey (2 week maximum interval), and several attempts were made to call difficult-to-reach mothers. Only participants with positive SCIDs were formally diagnosed as depressed and randomized for the treatment trial component of the study.


Survey measures included: 1) demographic characteristics (initial survey), including age, level of education, marital status, number of children, race/ethnicity, total family income, and health insurance; 2) PHQ-9 depression survey, included in all surveys, positive if score ≥10;16 3) the PHQ-9 follow-up “difficulty” question, which assesses functional impairment by asking how difficult the depressive symptoms made it for the woman to do her work, take care of things at home, or get along with other people (responses: not at all difficult, somewhat difficult, very difficult, extremely difficult); and 4) the telephone administered depression module of the SCID interview.
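The PHQ-9 scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual scoring code: the function names are hypothetical, the ≥10 cut-off and the 15–19 and 20–27 bands come from this article, and the lower bands (0–4, 5–9, 10–14) are assumed from the standard PHQ-9 scoring described by Kroenke et al.16

```python
from typing import Sequence

# Cut-off used in this study for a positive PHQ-9 screen.
POSITIVE_CUTOFF = 10


def phq9_total(item_scores: Sequence[int]) -> int:
    """Sum the nine PHQ-9 items, each scored 0-3, giving a total of 0-27."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each PHQ-9 item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def is_positive(total: int) -> bool:
    """A screen is positive when the total meets the study cut-off (>= 10)."""
    return total >= POSITIVE_CUTOFF


def phq9_severity(total: int) -> str:
    """Map a total score to a severity band (lower bands assumed, see lead-in)."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    if total >= 20:
        return "severe"
    if total >= 15:
        return "moderately severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "minimal"
```

For example, a respondent who scores 2 on every item has a total of 18, which is a positive screen in the moderately severe band. The separate "difficulty" question is not part of the total and would be recorded alongside it.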

Statistical Analyses

Descriptive analyses assessed: participants’ characteristics, numbers of women with positive PHQ-9s and SCIDs over the 9-month course of the study, numbers of women who could not be reached for a SCID interview, inconsistent PHQ-9/SCID results, and record of positive PHQ-9 results prior to SCID-based depression diagnoses.

Bivariate analyses (chi-square and t-tests) were used to compare PHQ-9 positive vs. negative women in their responses to the “difficulty” question, and SCID-completers vs. non-completers on PHQ-9 scores and demographic characteristics.
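For readers unfamiliar with the chi-square comparison named above, the following Python sketch shows how a Pearson chi-square test on a 2×2 table can be computed. It is an illustration only: the function name is hypothetical, no continuity correction is applied (the article does not say whether one was used), and the counts in the usage note are invented for demonstration and are not study data.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value).
    """
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must have a nonzero total")

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

With invented counts such as `chi_square_2x2(20, 10, 10, 20)`, the rows could stand for SCID completers and non-completers and the columns for two levels of a demographic characteristic; a small p-value would suggest the two groups differ on that characteristic.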


Participants’ Characteristics

A total of 506 women with infants participated in the study, which represents a response rate of approximately 33% (506 participants/1556 eligible women), with non-responses being due to refusals and mothers either ignoring or not being offered an enrollment form.17

A majority of participants were white, married, and employed (Table 1). However, compared to U.S. norms,18 our sample had a smaller proportion of Whites (67.0% vs. 79.6%), and a larger proportion of Blacks (17.6% vs 12.9%), Asians (6.7% vs. 4.6%), women with 4-year degrees (52.2% vs. 24.4%), and family incomes below the poverty threshold (27.3% vs. 13%). Sixty-seven percent of participants were recruited from pediatric clinics and 33% from family medicine clinics. Thirty-four (6.7%) participants dropped out before completing the final survey. More detail about demographic characteristics was provided in a previous publication.17

Table 1
Participants’ Demographic Characteristics (total n = 506)

Women with Positive PHQ-9 and SCID Results

Over the 0–9 month postpartum course observed here, 112 (22.1%) women had a positive PHQ-9 (score ≥10), and 45 (8.9%) had a positive SCID interview (Table 2). The SCID interview was conducted an average of 7 days after the initial PHQ-9.

Table 2
Numbers (%) of Women with Positive PHQ-9 and SCID by Interval.

Missed SCID Interviews

A total of 75 (14.8%) women could not be reached for a SCID depression interview: 68 could not be reached initially at 0–1 month postpartum, and an additional 7 could not be reached subsequently at the time of a positive follow-up PHQ-9 screen. Of the 68 women who could not be reached initially for a SCID interview, 10 (14.7%) had a positive PHQ-9 at the time the call was attempted.

When initial SCID non-completers were compared to completers, SCID non-completers were found to be younger, less educated, more often single and Black, had lower incomes, and were more likely to be on medical assistance (Table 3).

Table 3
Differences in Characteristics between Women Who Completed SCID Interview at 0–1 Months, and Those Who Did Not, As Determined By T-Tests and Chi-Square Tests

Inconsistent SCID/PHQ-9 Results

Nineteen women with PHQ-9 scores of 15 or greater (consistent with moderately severe to severe depression) did not receive a SCID-based depression diagnosis, either because they could not be reached for an interview (n = 5) or they had a negative SCID interview (n = 14). One of these women had a PHQ-9 score of 25 with suicidal ideation at the time of a negative SCID (she was called to confirm her depressive symptoms and safety, and was advised to seek immediate help). On the other hand, 11 women with positive SCIDs had negative PHQ-9 scores at the time.

Of the 25 women who became SCID-positive after the initial 0–1 month interval, 9 (36%) had had positive PHQ-9 results at an earlier interval, when the SCID was either negative or not done. Two of the 9 women had had prior PHQ-9 scores that were consistent with moderately severe depression (score range 15–19), and 2 had had prior scores consistent with severe depression (score range 20–27).

Functional Impairment

Women with positive PHQ-9 scores differed significantly from women with negative PHQ-9 scores on their responses to the question, “How difficult have these problems (depressive symptoms) made it for you to do your work, take care of things at home, or get along with other people?” At each of the study intervals, over 90% of PHQ-9 positive women, compared to 35%–46% PHQ-9 negative women, indicated that their depressive symptoms made it somewhat to extremely difficult to function (p < 0.001; Table 4).

Table 4
Responses of Participants to the “Difficulty” Question by Survey Interval: Number of Women with Positive [vs. Negative] PHQ-9 Scores

All of the 19 women with PHQ-9 scores of ≥15 (consistent with moderately severe or severe depression) and negative SCIDs reported that their depressive symptoms made it somewhat to extremely difficult to function: 10 women found it somewhat difficult, 8 very difficult, and 1 extremely difficult to function.


Our requirement of a formal depression diagnostic interview for depression diagnosis and randomization to treatment groups resulted in problems, including lower than expected depression rates, missed depression interviews, selection bias, and inconsistent PHQ-9/SCID results.

The 8.9% SCID-based depression diagnosis rate seen here was much lower than expected, and in fact, our significantly higher 22% rate of positive PHQ-9 scores over 9 months more closely approximated the 22% one-year prevalence of PPD (major depression) cited in the meta-analysis by Gaynes et al.1 This notable difference in diagnostic rates raises the question: is the SCID less palatable or convenient for new mothers than the PHQ-9, or is the difference due to a gap in predictive values of the PHQ-9 vs. SCID?

In support of the theory that the SCID might be less convenient or comfortable for mothers than the PHQ-9 is the finding that 15% of participants could not be reached for the telephone-based SCID interview, even after multiple attempts. It is very possible that this group of SCID-non-completers included missed PPD cases. For example, based on our overall 8.9% SCID-positive rate, we would have expected approximately 7 of our 75 SCID non-completers to be SCID-positive, had they been interviewed. This estimate is also supported by the fact that 10 of our SCID non-completers had a positive PHQ-9 at the time contact was attempted. It is important to note that SCID non-completers (vs. completers) were younger, poorer, and more likely to be single and Black, so it is possible that the SCID requirement produced selection bias in the diagnosis and randomization of women to treatment groups, which eventually resulted in treatment disparities.
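The estimate of roughly 7 missed cases is simple arithmetic, sketched here in Python using only counts reported in this article. The variable names are illustrative, and the calculation assumes that SCID non-completers would have had the same SCID-positive rate as the sample overall.

```python
# Counts reported in the article.
n_participants = 506       # total women enrolled
n_scid_positive = 45       # women with a positive SCID over 0-9 months
n_noncompleters = 75       # women who could not be reached for a SCID

# Overall SCID-positive rate (about 8.9%).
scid_positive_rate = n_scid_positive / n_participants

# Expected positives among non-completers, assuming the same rate applies.
expected_missed = n_noncompleters * scid_positive_rate

print(f"SCID-positive rate: {scid_positive_rate:.1%}")
print(f"Expected missed cases: {expected_missed:.1f}")
```

The result, a little under 7, rounds to the figure quoted above.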

A number of prior studies that have used diagnostic depression interviews have not specified rates of missed depression interviews.19–23 However, other investigators who have included this information report SCID interview non-completion rates of 66% to 74%,24,25 indicating that this has also been a problem elsewhere.

Another concern was the inconsistency between participants’ SCID and PHQ-9 results. For example, 19 women with very high PHQ-9 scores (15–27, representing moderately severe to severe depression) were either not recognized as depressed by the SCID interview, or the SCID affirmation of depression occurred months later. Conversely, 11 women with positive SCIDs had a negative PHQ-9. Possible reasons for our observed PHQ-9/SCID discrepancies include: inaccuracy of the PHQ-9 or SCID (one would expect greater accuracy with the SCID, our gold standard), the presence of depressive symptoms caused by other mental conditions (e.g., baby “blues,” bipolar disorder, subsyndromal depression, or grief), disparate timing of survey and interview (in this study, a mean of 7 days, with a maximum of 2 weeks), differences in length of time over which symptoms were assessed (2 weeks for PHQ-9, 1 month for SCID), interviewer technique, and mothers’ level of comfort with a particular diagnostic tool or method.

It is interesting that over 90% of PHQ-9-positive women indicated that they had some degree of functional impairment, which speaks to the face validity of the PHQ-9 in this sample. It would be helpful if future studies addressed/confirmed the validity of the PHQ-9 plus the “difficulty” question for identifying PPD in other populations. If the “difficulty” question were found to increase the accuracy, or at least the clinical utility of the PHQ-9 in other populations, it might be used more routinely to help sort out women with false positive PHQ-9 results – women who may be less likely to benefit from depression treatment.

Strengths of this study include its sample size, relative ethnic diversity, primary care base, longitudinal nature, and use of a repeated measures design in assessing PPD with the PHQ-9 and SCID. The study also has weaknesses: though its sample was drawn from 7 family medicine and pediatric clinics, it is not demographically representative of the US population, and its modest response rate (33%) may have contributed to this problem. Although our SCID interviewers were carefully trained and had ongoing weekly supervision to encourage diagnostic consistency, we did not perform formal inter-rater reliability testing. Additional weaknesses are our use of only a single measure of function, and only the depression component of the SCID, which limited our diagnostic capabilities. Finally, this study does not definitively compare and validate the SCID and PHQ-9, and it is likely that the use of the PHQ-9 for diagnostic purposes would result in some false positives or misdiagnoses which would need to be sorted out by primary care or mental health providers to avoid mistreatment. Despite these shortcomings, the study provides preliminary findings to help researchers and clinicians weigh certain risks and benefits of using a DSM-IV-based depression interview. Additional research is needed to further evaluate and compare these tools for identifying PPD.

In conclusion, our results show that the requirement of a diagnostic interview in PPD research can be problematic, as some individuals cannot be reached for an interview, resulting in missed opportunities for diagnosis, selection bias, and possible treatment disparities. In contrast, a depression survey, though perhaps less accurate, would be easier, more cost effective, and more inclusive. Based on these results, if a positive depression diagnosis were required to initiate some form of coordinated care or increased access to other resources, exclusive use of the SCID for diagnosis would disproportionately penalize those who need this help most: the unmarried, racial minorities, less educated, and more impoverished women. These potential problems should be considered when a decision is being made about whether to use a formal DSM-IV based interview to identify depression in research.


This study was funded by the National Institute of Mental Health (R34 MH072925). The contents of this paper are the responsibility of the authors and do not necessarily represent the official views of the National Institute of Mental Health.


The authors have no conflicts of interest.

Contributor Information

Dwenda Gjerdingen, Department of Family Medicine & Community Health, University of Minnesota.

Patricia McGovern, School of Public Health, University of Minnesota.

Bruce Center, Department of Educational Psychology, University of Minnesota.


1. Gaynes BN, Gavin N, Meltzer-Brody S, Lohr KN, Swinson T, Gartlehner G, Brody S, Miller WC. Perinatal depression: prevalence, screening accuracy, and screening outcomes. Evidence Report/Technology Assessment No. 119. AHRQ Publication No. 05-E006-2. Rockville, MD: Agency for Healthcare Research and Quality; Feb, 2005. [PubMed]
2. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. 4th ed. Washington, DC: American Psychiatric Association; 2000. (DSM-IV)
3. Gale S, Harlow BL. Postpartum mood disorders: a review of clinical and epidemiological factors. J Psychosom Obstet Gynecol. 2003;24:257–266. [PubMed]
4. Wisner KL, Parry BL, Piontek CM. Postpartum depression. N Engl J Med. 2002;347(3):194–199. [PubMed]
5. Lowe B, Kroenke K, Herzog W, Grafe K. Measuring depression outcome with a brief self-report instrument: sensitivity to change of the Patient Health Questionnaire (PHQ-9). Journal of Affective Disorders. 2004;81(1):61–66. [PubMed]
6. Zubaran C, Foresti K, Schumacher MV, Amoretti AL, Mullet LC, Thorell MR, White G, Madi JM. Validation of a screening instrument for postpartum depression in Southern Brazil. Journal of Psychosomatic Obstetrics & Gynecology. 2009;30(4):244–54. [PubMed]
7. Lee DT, Yip AS, Chiu HF, Leung TY, Chung TK. Screening for postnatal depression: are specific instruments mandatory? Journal of Affective Disorders. 2001;63(1–3):233–8. [PubMed]
8. Bernstein IH, Wendt B, Nasar SJ, Rush AJ. Screening for major depression in private practice. Journal of Psychiatric Practice. 2009;15(2):87–94. [PMC free article] [PubMed]
9. First MB, Spitzer RL, Gibbon M, Williams JB. Structured Clinical Interview for DSM-IV Axis I Disorders. Clinical version, administration booklet. New York: Biometrics Research Department, New York State Psychiatric Institute; 1997.
10. Allen K, Cull A, Sharpe M. Diagnosing major depression in medical outpatients: acceptability of telephone interviews. J Psychosom Res. 2003;55:385–387. [PubMed]
11. Cacciola JS, Alterman AI, Rutherford MJ, McKay JR, May DJ. Comparability of telephone and in-person Structured Clinical Interview for DSM-III-R (SCID) diagnoses. Sage Social Science Collections. 1999;6(3):235–242. [PubMed]
12. Davison TE, McCabe MP, Mellor D. An examination of the “gold Standard” diagnosis of major depression in aged-care settings. Am J Geriatr Psychiatry. 2009;17:359–367. [PubMed]
13. Heneghan AM, Mercer MB, DeLeone NL. Will mothers discuss parenting stress and depressive symptoms with their child’s pediatrician? Pediatrics. 2004;113(3):460–467. [PubMed]
14. Gjerdingen D, Crow S, McGovern P, Miner M, Center B. Stepped care treatment of postpartum depression: impact on treatment, health, and work outcomes. J Am Board Fam Med. 2009;22:473–482. [PMC free article] [PubMed]
15. Olson AL, Dietrich AJ, Prazar G, Hurley J, Tuddenham A, Hedberg V, Naspinsky DA. Two approaches to maternal depression screening during well child visits. Developmental and Behavioral Pediatrics. 2005;26(3):169–176. [PubMed]
16. Kroenke K, Spitzer RL, Williams JBW. Validity of a brief depression severity measure. J Gen Intern Med. 2001;16:606–613. [PMC free article] [PubMed]
17. Gjerdingen D, Crow S, McGovern P, Miner M, Center B. Postpartum depression screening at well-child visits: validity of a 2-question screen and the PHQ-9. Ann Fam Med. 2009;7(1):63–70. [PubMed]
18. U.S. Census Bureau. State and County QuickFacts. Downloaded on 10-27-10 from:
19. Blumenthal JA, Babyak MA, Doraiswamy M, Watkins L, Hoffman BM, Barbour KA, Herman S, Craighead WE, Brosse AL, Waugh R, Hinderliter A, Sherwood A. Exercise and pharmacotherapy in the treatment of major depressive disorder. Psychosomatic Medicine. 2007;69:587–596. [PMC free article] [PubMed]
20. Kammerer M, Marks M, Pinard C, Taylor A, Von Castelberg B, Kunzli H, Glover V. Symptoms associated with the DSM IV diagnosis of depression in pregnancy and post partum. Arch Womens Ment Health. 2009;12:135–141. [PubMed]
21. Lee DT, Yip AS, Chiu HF, Chung TK. Screening for postnatal depression using the double-test strategy. Psychosomatic Medicine. 2000;62(2):258–63. [PubMed]
22. Cooper PJ, Murray L. The impact of psychological treatments of postnatal depression on maternal mood and infant development. In: Murray L, Cooper PJ, editors. Postpartum Depression and Child Development. London: Guilford Press; 1997.
23. Muzik M, Klier CM, Rosenblum KL, Holzinger A, Umek W, Katschnig H. Are commonly used self-report inventories suitable for screening postpartum depression and anxiety disorders? Acta Psychiatrica Scandinavica. 2000;102(1):71–3. [PubMed]
24. Lee DTS, Yip ASK, Chan SSM, Tsui MHY, Wong WS, Chung TKH. Postdelivery screening for postpartum depression. Psychosomatic Medicine. 2003;65:357–361. [PubMed]
25. Wittkampf K, van Ravesteijn H, Baas K, vande Hoogen H, Schene A, Bindels P, Lucassen P, van de Lisdonk E, van Weert H. The accuracy of Patient Health Questionnaire-9 in detecting depression and measuring depression severity in high-risk groups in primary care. General Hospital Psychiatry. 2009:451–459. [PubMed]