The goal was to describe the accuracy of the Edinburgh Postnatal Depression Scale (EPDS), Beck Depression Inventory II (BDI-II), and Postpartum Depression Screening Scale (PDSS) in identifying major depressive disorder (MDD) or minor depressive disorder (MnDD) in low-income, urban mothers attending well childcare (WCC) visits during the postpartum year.
Mothers (N=198) attending WCC visits with their infants 0 to 14 months of age completed a psychiatric diagnostic interview (standard method) and 3 screening tools. The sensitivity and specificity of each screening tool were calculated in comparison with diagnoses of MDD or MDD/MnDD. Receiver operating characteristic curves were calculated and the areas under the curves for each tool were compared to assess accuracy for the entire sample (representing the postpartum year) and sub-samples (representing early, middle and late postpartum time frames). Optimal cut-points were calculated.
At some point between 2 weeks and 14 months postpartum, 56% of mothers met criteria for either MDD (37%) or MnDD (19%). When used as continuous measures, all scales performed equally well (areas under the curves of ≥ 0.8). With traditional cut-points, the measures did not perform at the expected levels of sensitivity and specificity. Optimal cut-points for the BDI-II (≥14 for MDD, ≥11 for MDD/MnDD) and EPDS (≥9 for MDD, ≥7 for MDD/MnDD) were lower than currently recommended. For the PDSS, the optimal cut-point was consistent with current guidelines for MDD (≥80) but higher than recommended for MDD/MnDD (≥77).
Large proportions of low-income, urban mothers attending WCC visits experience MDD or MnDD during the postpartum year. The EPDS, BDI-II and PDSS have high accuracy in identifying depression but cutoff points may need to be altered to more accurately identify depression in urban, low-income mothers.
Postpartum depression affects an average of 1 out of every 7 new mothers in the United States1 with rates as high as 1 out of 4 among poor and minority women.2–4 Multiple long-term negative effects for mothers and infants are well described.5–8 Recent efforts have focused on improving identification of women with postpartum depression.9–11 To increase the potential for early intervention, primary care providers, including pediatricians, are encouraged to screen mothers.10–14 However, practitioners are unsure which instruments to use and whether one is preferable.
Pediatric practitioners must have confidence that the tools accurately identify depression among the women in their diverse practices. Several studies have assessed the accuracy of screening tools in identifying postpartum depression, but they have several limitations.1 Most did not include significant numbers of low-income or minority women, who have higher rates of postpartum depression. Also, most assessed the tools’ accuracy in the early postpartum period (between 2 and 12 weeks). Since depression can occur at any time in the postpartum year15, 16 and some providers screen mothers throughout the year11, evaluation of the tools’ accuracy at different points throughout the year is critical. Finally, despite endorsement of depression screening in primary care settings17 and established feasibility of postpartum depression screening in pediatric clinics,10, 11, 18–20 few accuracy studies have been conducted in primary care settings, including pediatrics.21, 22
To address these limitations, we conducted a cross-sectional study among a cohort of low-income mothers attending WCC visits at a pediatric clinic. The study was designed to establish the sensitivity, specificity and operating characteristics of three depression screening tools for low-income urban women during the postpartum year.
The study was approved by the University of Rochester Research Subjects Review Board. All participants provided written informed consent.
Between April 1, 2003 and August 31, 2005, a convenience sample of mothers (N=647) who were ≥18 years of age, had infants ≤14 months of age, and were attending a WCC visit during the postpartum year at the Strong Pediatric Practice at Golisano Children’s Hospital was invited to complete a demographic questionnaire and the Center for Epidemiologic Studies Depression Scale (CES-D)23, 24 and to return for a diagnostic interview. Eight were ineligible due to maternal age (<18), language barriers, or previous participation in the study (completion of the diagnostic interview earlier or with a previous infant). Of 639 eligible women, 217 (34%) refused but provided non-identifiable demographic information, and 422 (66%) provided written informed consent and completed the demographic questionnaire and CES-D. Of the 422, 28 refused further participation, 9 were excluded, and 198 completed the psychiatric diagnostic interview (Structured Clinical Interview for DSM-IV, or SCID25). (Figure 1)
Forty-nine percent (N=187) of eligible women who agreed to complete the SCID did not return. Difficulties with retention, similar to those described by other investigators26, were recognized and addressed early.27 To improve follow-up, we 1) offered an immediate SCID, 2) provided an appointment card at the time of consent, 3) sent a confirmation letter, and 4) placed a confirmation phone call 12–48 hours before the appointment. If a subject did not return, we attempted to call and reschedule at her convenience. Subjects received between 1 and 9 calls (mean 4.1, SD 2.06) and were rescheduled a maximum of 3 times. We also followed up in person at the next WCC visit. Subjects received $40 for participation in the SCID.
Because of the cross-sectional study design, infant age at the time of the maternal interview was used to assign subjects to a postpartum group to assess the screening tools’ accuracy at different points in the postpartum year: 2 weeks to 4 months (Early), >4 to 8 months (Middle), and >8 to 14 months (Late). The groups were chosen to assess the utility of the tools throughout the year, and because the group timeframes coincide with at least 2 WCC visits.
Center for Epidemiologic Studies Depression Scale (CES-D) at initial recruitment: The CES-D is a 20-item self-report measure that has been used to screen for postpartum depression.23, 24, 28 It was used for two primary purposes. First, to assess for potential bias due to depression, an initial depression measure that was not the focus of the accuracy study was needed to compare the women who did and did not complete the SCID. Second, the sample size for estimating the ROC curves was based on the assumption of roughly equal numbers of depressed and non-depressed subjects. By administering the CES-D, we could monitor the distribution of women with high (≥16) and low (<16) CES-D scores. Interviewers were blind to all screening tools’ scores (including the CES-D).
To determine the minimum group size for comparing ROC curves across the postpartum groups, we conducted a power analysis using Power Analysis and Sample Size software (Hintze J. PASS. NCSS, LLC, Kaysville, Utah; 2008). Assuming 50% depressed subjects, the analysis indicated that a sample size of 60 per group was sufficient to provide 80% power to detect a difference in AUC of 0.13 to 0.16, depending on the true AUCs.
Demographic information included maternal race, ethnicity, age, marital status, number of children, insurance status and education.
The tools were placed in random order in sealed envelopes to ensure they were not answered in a biased fashion.
The Structured Clinical Interview for DSM-IV (SCID) is a semi-structured interview developed to assess 33 DSM Axis I diagnoses in adults.25, 42 It is considered the “gold standard” to characterize study samples in terms of current psychiatric diagnoses. In this study it was used to establish DSM IV Axis I diagnoses including MDD and MnDD, dysthymia, bipolar, substance use, anxiety, and psychotic disorders. It was administered by a trained rater and reviewed by a psychiatrist (LHC), two psychologists (NLT, SG) and trained raters (HW, EW) to confirm the diagnostic decision. Consensus team members were blind to the screening tool scores.
To compare characteristics of mothers who did and did not complete the SCID, we used t-tests and Wilcoxon rank-sum (Mann-Whitney) tests for continuous demographic variables and chi-square tests for categorical variables.
To assess the accuracy of the EPDS, BDI-II, and PDSS, receiver operating characteristic (ROC) curves were computed for each tool, for the whole sample as well as for each postpartum group. At each possible threshold observed in the sample, we computed estimates of the corresponding sensitivity, specificity and positive predictive value (PPV). The ROC curve plots the sensitivity of a measure on the Y-axis against (1-specificity) on the X-axis and depicts the overall accuracy of a test. The area under the ROC curve (AUC) is the most important summary index of the ROC curve. An ROC curve with AUC > 0.5 suggests the test is informative in that it is better than classifying subjects randomly. An ROC curve with AUC > 0.8 is generally considered to indicate an accurate test. The closer the curve is to the upper-left corner (point [0,1]), the larger the AUC and the more accurate the test.43
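As an illustration of the quantities described above, the following minimal Python sketch computes sensitivity, specificity and PPV at a given cut-point, the empirical ROC points, and the empirical AUC. The scores and diagnoses are hypothetical values chosen only for illustration (they are not study data), the function names are our own, and the sketch is not the software used in the study.

```python
from typing import List, Tuple

def threshold_metrics(scores: List[float], labels: List[int], cut: float) -> Tuple[float, float, float]:
    """Sensitivity, specificity and PPV when 'score >= cut' counts as a positive screen.
    labels: 1 = depressed by diagnostic interview, 0 = not depressed."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sens, spec, ppv

def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Empirical ROC points (1 - specificity, sensitivity), one per observed score."""
    points = []
    for cut in sorted(set(scores)):
        sens, spec, _ = threshold_metrics(scores, labels, cut)
        points.append((1 - spec, sens))
    return points

def empirical_auc(scores: List[float], labels: List[int]) -> float:
    """Empirical AUC: proportion of (depressed, non-depressed) pairs in which the
    depressed subject has the higher score, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical screening scores and interview diagnoses (illustration only).
scores = [3, 5, 6, 8, 9, 10, 12, 13, 15, 18]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

sens, spec, ppv = threshold_metrics(scores, labels, cut=10)
auc = empirical_auc(scores, labels)
print(f"cut >= 10: sensitivity={sens:.2f}, specificity={spec:.2f}, PPV={ppv:.2f}")
print(f"empirical AUC = {auc:.2f}")
```

Note that PPV, unlike sensitivity and specificity, depends on the proportion of depressed subjects in the sample.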
For each of the empirical ROC curve estimates (based on the empirical estimates of the sensitivities and specificities at the observed test levels), the empirical AUC and the associated standard error were estimated. Because each subject completed each tool, the tools’ results are correlated. Methods developed by DeLong and colleagues44, which account for such within-subject correlations, were used to compare the accuracies of the screening tools, both over the entire sample and within each postpartum group.
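To make the idea of comparing correlated AUCs concrete, the sketch below implements a simplified two-tool version of the DeLong approach in plain Python. It is a hedged illustration only: the study compared three tools with a chi-square statistic, whereas this sketch compares two tools with a z statistic; the data are hypothetical, and the function names are our own rather than those of any statistical package.

```python
import math

def _psi(x, y):
    # Pairwise kernel: 1 if the depressed subject scores higher, 0.5 for a tie, else 0.
    return 1.0 if x > y else 0.5 if x == y else 0.0

def _components(pos, neg):
    # Structural components: one value per depressed (v10) and per non-depressed (v01) subject.
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def _cov(a, b):
    # Sample covariance with an (n - 1) denominator.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def delong_two_tests(scores_a, scores_b, labels):
    """Compare the AUCs of two tools completed by the same subjects.
    Returns (auc_a, auc_b, z, two_sided_p)."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    v10a, v01a = _components([scores_a[i] for i in pos_idx], [scores_a[i] for i in neg_idx])
    v10b, v01b = _components([scores_b[i] for i in pos_idx], [scores_b[i] for i in neg_idx])
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    var_a = _cov(v10a, v10a) / m + _cov(v01a, v01a) / n
    var_b = _cov(v10b, v10b) / m + _cov(v01b, v01b) / n
    cov_ab = _cov(v10a, v10b) / m + _cov(v01a, v01b) / n  # within-subject correlation term
    var_diff = var_a + var_b - 2 * cov_ab
    if var_diff <= 0:
        # Degenerate case (e.g., identical score rankings): the test is undefined.
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p

# Hypothetical scores from two tools for the same ten subjects (illustration only).
labels   = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
scores_a = [3, 5, 6, 8, 9, 10, 12, 13, 15, 18]
scores_b = [40, 55, 70, 60, 52, 85, 66, 58, 90, 95]

auc_a, auc_b, z, p = delong_two_tests(scores_a, scores_b, labels)
print(f"AUC A = {auc_a:.2f}, AUC B = {auc_b:.2f}, z = {z:.3f}, p = {p:.3f}")
```

The covariance term is what distinguishes this comparison from a test that treats the two AUCs as coming from independent samples.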
The AUCs of each tool were compared across the postpartum groups to assess differential accuracy across the groups. Optimal cut-points for the screening tools were recommended based on the empirical ROC curves. Since sensitivity and specificity estimates change in opposite directions when the cut-point varies, a good choice should balance sensitivity and specificity while keeping the corresponding point on the ROC curve as close to the upper-left corner as possible. We present the optimal cut-points computed using the criterion that minimizes the Euclidean distance from (sensitivity, specificity) to the point (1,1) in the X-Y plane.45
There were no statistically significant differences in number of children (p = 0.34) or level of education (p = 0.27) between the women who did (N=422) and did not (N=217) agree to participate. There were statistically significant differences in age (p = 0.005), race (p = 0.004), marital status (p = 0.004) and insurance type (p = 0.009) between these groups. Older women, Hispanic women, married women, and women who had private insurance were more likely to refuse.
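The cut-point criterion described in the Methods (minimizing the Euclidean distance from (sensitivity, specificity) to (1,1)) can be sketched as follows. This is a minimal illustration under stated assumptions: the scores and diagnoses are hypothetical, a score at or above the cut-point counts as a positive screen, and the function names are our own.

```python
import math

def sens_spec(scores, labels, cut):
    """Sensitivity and specificity when 'score >= cut' counts as a positive screen."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    sens = sum(s >= cut for s in pos) / len(pos)
    spec = sum(s < cut for s in neg) / len(neg)
    return sens, spec

def optimal_cut_point(scores, labels):
    """Return (cut, distance) for the observed score that minimizes the Euclidean
    distance from (sensitivity, specificity) to the ideal point (1, 1).
    Ties are resolved in favor of the lowest cut-point."""
    best_cut, best_dist = None, math.inf
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, cut)
        dist = math.hypot(1 - sens, 1 - spec)
        if dist < best_dist:
            best_cut, best_dist = cut, dist
    return best_cut, best_dist

# Hypothetical screening scores and interview diagnoses (illustration only).
scores = [3, 5, 6, 8, 9, 10, 12, 13, 15, 18]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

best_cut, best_dist = optimal_cut_point(scores, labels)
print(f"optimal cut-point: >= {best_cut} (distance {best_dist:.3f})")
```

This criterion weights sensitivity and specificity equally; a clinic that prioritizes not missing cases might reasonably prefer a lower cut-point than the one it selects.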
Of the 422 women who consented to participate, 385 (91%) agreed to complete the SCID but 49% (N=187) did not return. (Figure 1) There were no statistically significant differences between women who did (N=198) and did not (N=224) complete the SCID with regard to maternal age, education, or number of children, mean CES-D scores or proportion of high CES-D scores, but Hispanic women, married women and women with private insurance were less likely to complete the SCID. (Table 1)
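The participant flow reported above can be checked with simple arithmetic. The short Python sketch below uses only the counts stated in the text; the variable names are our own.

```python
# Counts as reported in the text.
invited = 647
ineligible = 8
refused = 217
refused_further = 28
excluded = 9
completed_scid = 198

eligible = invited - ineligible                            # 639
consented = eligible - refused                             # 422
agreed_to_scid = consented - refused_further - excluded    # 385
did_not_return = agreed_to_scid - completed_scid           # 187
did_not_complete = consented - completed_scid              # 224

def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

print(f"refused: {pct(refused, eligible)}% of eligible")
print(f"consented: {pct(consented, eligible)}% of eligible")
print(f"agreed to SCID: {pct(agreed_to_scid, consented)}% of consented")
print(f"did not return: {pct(did_not_return, agreed_to_scid)}% of those who agreed")
```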
Approximately equal numbers of women were recruited into each postpartum group. (Table 2) There were no statistically significant differences in the percentage with MDD or MDD/MnDD among the groups. All groups had rates exceeding 50% for MDD or MnDD.
When evaluated for the entire sample (N=198), each tool performed well for MDD and MDD/MnDD with AUCs of 0.8 or higher. (Figures 2 and 3) The AUCs for the BDI-II, EPDS and PDSS were 0.84 (0.78–0.89), 0.86 (0.81–0.91), and 0.83 (0.79–0.89), respectively, for MDD and 0.89 (0.84–0.93), 0.87 (0.82–0.92), and 0.83 (0.78–0.89), respectively, for MDD/MnDD. There were no statistically significant differences in the AUCs between the tools (MDD: chi-square = 1.96, p = 0.38; MDD/MnDD: chi-square = 5.64, p = 0.06), although there was a trend toward significance for MDD/MnDD.
The AUC of the tools were compared to each other within each postpartum group. No statistically significant differences were found between the tools for either MDD or MDD/MnDD within any postpartum group. (Table 3)
To assess potential differences in a tool’s accuracy related to a postpartum period, the AUCs were calculated for each tool for each postpartum group (Table 3) and compared across the groups. When the AUC of each tool was compared across groups, no statistically significant differences were found. In the Late Group, for MDD, no tool reached an AUC of 0.8.
We assessed the sensitivity and specificity to estimate the optimal cut-point for each screening instrument and compared these to published cut-points. For the BDI-II and EPDS, the optimal cut-points for MDD or MDD/MnDD were lower than published guidelines.32, 33, 46 (Table 4) For the PDSS, the optimal cut-point for MDD/MnDD (≥77) is within the range published as consistent with significant symptoms; however, it is 17 points greater than recommended for Depressive Disorder NOS (or MnDD) (≥60).41 The cut-point for MDD (≥80) is consistent with published recommendations.41
Optimal cut-points for each postpartum group were within 0–3 points of the optimal cut-points for the whole sample. (Table 4)
Our study is the first to describe the prevalence of MDD and MnDD using a diagnostic interview among a population of mostly low-income, black young mothers attending WCC visits in an urban pediatric clinic. It is also the first to describe the accuracy of depression screening tools among this understudied population of mothers.
The finding that more than half (56%) of these new mothers met criteria for MDD or MnDD was unexpected. Many studies cite high rates of depressive symptoms with screening tools2, 3, 11, but none has quantified the prevalence of depression by diagnostic interview in this disadvantaged population. The unexpectedly high rate may be due to selection bias. The study participants may have self-identified as needing assistance with their depression and therefore were more likely to meet diagnostic criteria for MDD/MnDD than non-participants. A second possibility is that, based on the differences in sociodemographics between participants and non-participants, the final sample may have been the most economically and socially disadvantaged and therefore at greatest risk for depression. Because of the potential sample biases we cannot generalize the high prevalence to the larger clinic population, but we can underscore the need for a group of mothers with high levels of depression to be identified and referred for care.
Another finding is that the proportion of depressed women was essentially equivalent during any 4 month infant age range in the postpartum year. Because of the cross-sectional study design, we cannot accurately identify when incident or recurrent cases occurred. However, the finding, which is similar to previous findings 15, supports the practice of screening at early and late first year WCC visits.
Our findings suggest that the BDI-II, EPDS and PDSS are equally accurate in identifying depression in low-income, black mothers during the postpartum year. The performance of the tools did show some minor variability at different postpartum time points but did not reach a level of statistical significance. Therefore, the findings suggest that pediatric practitioners can be confident using these tools at any first year WCC visit. These findings are similar to those of a study conducted in Pittsburgh with the PDSS-Short Form (PDSS-SF), Patient Health Questionnaire (PHQ-9) and EPDS in which the AUCs for the continuous scales did not show any significant differences. 21
While providers may be reassured that these tools are accurate at detecting depression, presumably they use published cut-points to guide their clinical evaluations and referrals. Our findings suggest that in this population, using the established cut-points for the BDI-II and EPDS may lead a clinician to fail to identify many women with depression. Other studies found similar suboptimal performance by screening tools at traditional cut-points.21 While replication of our findings in other settings with similar populations is required to make final recommendations for changing cut-points, pediatric practitioners who use the EPDS or BDI-II should be aware that traditional cut-points may not be as accurate as previously thought, and they may consider decreasing the cut-points for optimal performance. (Table 4) Scores within 2–3 points below traditional cut-points may indicate a need for further evaluation. Studies from different countries and conducted with different ethnic populations have indicated a range of optimal cut-points.34–36, 47–49 If providers use the PDSS, our findings support the recommended cut-point (80) for detection of MDD. However, the PDSS appears to overestimate the number of women with MDD/MnDD with a cut-point of 60, thereby potentially unnecessarily labeling women as depressed. Using a higher cut-point may decrease unnecessary referrals. As with any screening tool, clinical evaluation of the specific situation is necessary.
Reasons for the lower optimal cut-points for the BDI-II and the EPDS are not clear. This high risk population may have higher rates of co-morbid medical or mental health concerns that may influence the optimal cut-points. Anxiety, alone, cannot be the explanation as the EPDS has an anxiety subscale as does the PDSS but the optimal cut-points are in opposite directions. Because the BDI-II relies more heavily on somatic symptoms, it might be expected to over-estimate the number of depressed women. Our findings are the reverse. Further exploration of the underlying mechanisms for the different optimal cut-points is indicated.
With the finding that all three tools perform equally well in a low-income black population of new mothers, providers must consider the advantages and disadvantages of each tool. The EPDS is a short screening tool, easy to complete, free to providers, has been used in multiple ethnic and socioeconomic groups and settings and is available in multiple languages. The PDSS allows clinicians to target interventions or referrals as it identifies multiple domains and it is available in Spanish. The disadvantages of the PDSS are its length and cost per use. Advantages of the BDI-II are that many providers are comfortable with its use, it is available in Spanish, and it has been used with adolescents and minority populations. The disadvantages are that it focuses on somatic symptoms that may overlap with normal postpartum adaptation, must be purchased and is not traditionally used with a dichotomous cut-point structure. Providers will need to take all this information into consideration when choosing the right screening tool for their clinic and population.
The sample - urban, low-income, black mothers - is a primary strength of this study as it represents a large number of women in the US about whom little is known. Second, the population was recruited from a pediatric clinic which is important when considering the prevalence of depression among mothers presenting to WCC visits. Third, the sampling size and strategy allowed for sufficient sample sizes to test the accuracy of the tools in the postpartum year and within time periods corresponding to WCC visits, as demonstrated by the relatively tight 95% confidence intervals around the estimated AUCs of the three instruments. The sufficient numbers of depressed and non-depressed women, and the use of a diagnostic interview, allowed us to address prior studies’ limitations.
This study also had limitations. By purposefully sampling from one urban academic medical center clinic that serves a low-income, high risk population, the findings cannot be generalized to more ethnically or socioeconomically diverse populations or other types of pediatric primary care settings. Replication in other sites and types of clinics as well as among ethnically diverse populations is warranted. Another limitation is the cross-sectional study design. Validation of the tools with a longitudinal prospective study would help to determine the tools’ accuracy at repeated visits. Finally, the large proportion of women lost to follow-up limited our ability to determine diagnoses or test the screening tools in women who may represent a slightly different population based on the differences in demographic characteristics. Future studies should attempt to obtain broader representation of this portion of the population.
Depression is highly prevalent among low-income, black mothers attending WCC visits during the postpartum year and can be accurately identified by screening them with the EPDS, PDSS or BDI-II. Depending on the clinical population and screening tool, pediatric practitioners may need to alter the cut-point to more effectively identify those who could benefit from referral and treatment.
Funding: This study was funded by a grant from the National Institute of Mental Health, Award Number K23 MH64476 (Chaudron). Dr Wisner’s work on this study was supported in part by National Institute of Mental Health R01 MH071825 and 2 R01 MH 057102.
We would like to thank the women who participated in this study. We would like to acknowledge the members of our consensus group, Stephanie Gamble, PhD (SG), Nancy Talbot, PhD (NT), Holly Wadkins (HW), Erin Ward (EW).
Dr. Katherine Wisner is on the Advisory Board for Eli Lilly Corp and received a donation of active and placebo transdermal estradiol patches for an NIMH funded study from Novartis (novogyne).
Linda Chaudron, MD, MS; Peter Szilagyi, MD, MPH; Wan Tang, PhD; Elizabeth Anson, MS; Nancy Talbot, PhD; Holly Wadkins, MS; and Xin Tu, PhD, have no disclosures.