The objective of the current study was to delineate the optimal cutpoints for depression rating scales during pregnancy and the postpartum period and to assess the perinatal factors influencing these scores. Women participating in prospective investigations of maternal mental illness were enrolled prior to 28 weeks gestation and followed through 6 months postpartum. At each visit, subjects completed self-rated depression scales, the Edinburgh Postnatal Depression Scale (EPDS) and Beck Depression Inventory (BDI), and clinician-rated scales, the Hamilton Rating Scale for Depression (HRSD17 and HRSD21). These scores were compared with the SCID Mood Module determination of whether diagnostic criteria for a major depressive episode (MDE) were fulfilled during 6 perinatal windows: preconception; first trimester; second trimester; third trimester; early postpartum; and later postpartum. Optimal cutpoints were determined by maximizing the sum of each scale’s sensitivity and specificity. Stratified ROC analyses assessed the impact of previous pregnancy and of initial versus follow-up visits. A total of 534 women encompassing 640 pregnancies and 4025 follow-up visits were included. ROC analysis demonstrated that all 4 scales were highly predictive of MDE. The AUCs ranged from 0.857 to 0.971 and were all highly significant (p<0.0001). Optimal cutpoints were higher at initial visits and for multigravidas and demonstrated more variability for the self-rated scales. These data indicate that both clinician-rated and self-rated scales can be effective tools in identifying perinatal episodes of major depression. However, the results also suggest that prior childbirth experiences and the use of scales longitudinally across the perinatal period influence optimal cutpoints.
Maternal depression during pregnancy and the postpartum period, i.e. perinatal depression, is a common problem that has been the focus of extensive investigation. Studies examining the prevalence of perinatal depression have demonstrated considerable variability that is a consequence, at least in part, of the assessment method used to identify the presence of depression, the timing of the assessment, and population characteristics (Gaynes et al 2005). Authors of one review recommended that more precise determinants of the occurrence of perinatal depression are needed to estimate disease burden more accurately (Gaynes et al 2005).
Depressive symptoms are common in pregnancy with most studies reporting rates comparable to non-gravid women (Cutrona 1983; Kumar & Robson 1984; Watson et al 1984; Gotlib et al 1989; O’Hara et al 1991). A meta-analysis of depression during pregnancy (Bennett et al 2004), utilizing data encompassing 19,284 gravidas from 21 studies in which depression was assessed by a structured clinical interview or self-rated scale such as the Beck Depression Inventory (BDI) (Beck et al 1961), or the Edinburgh Postnatal Depression Scale (EPDS) (Cox et al 1987), estimated the prevalence of depression as 7.4% in the first trimester, 12.8% in the second trimester, and 12.0% in the third trimester. However, the data were inadequate to render conclusions regarding comparative risk between trimesters. Furthermore, the authors reported that the BDI produced significantly higher prevalence estimates, whereas EPDS estimates were statistically equivalent to those of structured clinical interviews.
Depression during the postpartum period has also garnered considerable attention. An earlier meta-analysis by O’Hara and Swain (1996), encompassing 12,810 postpartum women from 59 studies utilizing a clinical interview or self-report scale, estimated the prevalence of depression in the postpartum period at 13.0%. Similar to the pregnancy data, self-report measures yielded higher estimates of postpartum depression than clinician-administered assessments. The postpartum timing of the assessment did not significantly affect the prevalence estimates in this meta-analysis. A review of the prevalence studies found that 7.1% may experience a major depressive episode (MDE) during the first 3 months postpartum (Gavin et al 2005). Despite the historical assumption of increased vulnerability to depression in the postpartum period, the literature has not definitively demonstrated an increased risk (Gavin et al 2005). In contrast, a recent large-scale epidemiological study provided evidence of increased risk for major depression in the postpartum period compared with non-pregnant/non-postpartum women (adjusted odds ratio: 1.52; 95% CI: 1.07–2.15) (Vesga-Lopez et al 2008). Moreover, women are more likely to require psychiatric admission for depression during the postpartum period than outside the puerperium (Kendell et al 1987; Munk-Olsen et al 2006).
Numerous scales have been developed for identifying postpartum depression or risk factors for the development of postpartum depression (Beck 1995; Ferguson et al 2002; Morris-Ruth et al 2003; Perfetti et al 2004; Austin et al 2005). The EPDS has emerged as a well-validated and widely utilized instrument for postpartum depression screening and detection. Conversely, validated tools to assess depression during pregnancy are lacking (Gaynes et al 2005). By default, the EPDS, developed for postpartum use, has been increasingly used to identify depression during pregnancy (Adouard et al 2005; Thoppil et al 2005; Felice et al 2006) and to screen for those at risk for developing depression during pregnancy (Evans et al 2001; Rubertsson et al 2005; Gordon et al 2006). Beyond this ad hoc use of the EPDS, no scale exists to identify major depressive disorder during pregnancy. Moreover, only one screening test, an unvalidated scale consisting of only two items, has been developed specifically for depression in pregnancy (Campagne 2004). Our group, in collaboration (Altshuler et al 2008), recently completed an individual item analysis of the 28-item Hamilton Rating Scale for Depression (HRSD) compared to the SCID Mood Module to identify the items most predictive of an accurate identification of an episode of major depression across all trimesters of pregnancy. The seven items most predictive of the presence of depression were tested as a screening tool for depression during pregnancy (Altshuler et al 2008).
The urgent need to identify reliable instruments for detecting perinatal depression is underscored by several considerations: 1) numerous reports describe adverse obstetrical, neonatal, and developmental outcomes in association with maternal stress, depressive symptoms, and episodes of major depression during the perinatal period (Paton et al 1977; Zuckerman & Bresnahan 1991; Sheer et al 1992; Hedegaard et al 1993; Pritchard & Teo 1994; Orr & Miller 1995; Chung et al 2001; Andersson et al 2004; Mancuso et al 2004; Dayan et al 2006; Diego et al 2006; Neggers et al 2006); 2) accurate diagnosis of a MDE during the peripartum is complicated by the fact that purportedly normal perinatal symptoms (e.g., fatigue, sleep disturbance, appetite and weight changes, diminished libido) potentially overlap with the neurovegetative symptoms comprising part of the diagnostic criteria for major depression; 3) lower estimates of maternal mental illness during pregnancy may be in part secondary to limited recognition (Vesga-Lopez et al 2008); and 4) validated assessment tools are a requisite step in the design and completion of much needed controlled treatment studies during the perinatal period.
The overall aim of the current study was to provide clinicians and researchers alike with information regarding the sensitivity and specificity of commonly used depression rating scales during pregnancy and the postpartum period. The specific objectives of the study were: 1) to identify optimal cutpoints (maximizing the summation of sensitivity and specificity) for commonly used depression rating scales during each trimester of pregnancy and the postpartum period; 2) to determine whether previous pregnancy and childbirth experience influences the performance of the rating instruments; and 3) to determine whether repeated administration of a depression rating scale over the course of pregnancy and the postpartum period is associated with learning effects that alter the optimal cutpoints for the rating scales. With respect to these objectives, our a priori hypotheses were: 1) that the performance of the scales including optimal cutpoints would be altered during pregnancy, particularly during the third trimester when many of the physical symptoms of pregnancy most closely mirror the neurovegetative symptoms of depression; 2) that multigravid women (having previously experienced the physical sequelae of gestation) would be more likely to report physical symptoms of depression on a depression rating scale than primigravid women producing higher cutpoints on the scales during pregnancy; and 3) that optimal cutpoint scores would be impacted by repeated administration of both clinician-administered and self-rated depression scales.
The study was conducted at the Women’s Mental Health Program (WMHP) at the Emory University School of Medicine. Women with a lifetime history of mental illness participating in one of two prospective longitudinal perinatal investigations of the pharmacokinetics of psychotropic medications and/or maternal stress (P50 MH 68036; P50 MH 77928) were screened for inclusion in the current analysis. The schedule and methods for assessing maternal depression were identical in the two studies. Participants were enrolled no later than week 28 of gestation and evaluated at 4–6 week intervals across pregnancy and through 26 weeks postpartum. At each visit, subjects completed the self-rated BDI and EPDS. In addition, a research interviewer masked to treatment status administered the Structured Interview Guide (Williams 1988) for the Hamilton Rating Scale for Depression (Hamilton 1960) to obtain 17-item (HRSD17) and 21-item (HRSD21) scores and the Mood Module of the Structured Clinical Interview for DSM-IV Axis I Disorders (First et al 2002). To ensure consistent administration of the clinician-rated instruments, research interviewers were trained to use a “rate as you see” approach when scoring items, eschewing any subjective judgment as to whether symptoms were due to depression or pregnancy/postpartum. Quarterly inter-rater reliability assessments were conducted throughout the course of both investigations to ensure maintenance of kappa statistics ≥ 0.8 on all clinician-administered instruments. All scales were coded with a HIPAA compliant identifier and entered into a centralized database. Subjects were included in the current analysis if they had two or more perinatal visits during which the SCID Mood Module and one or more of the depression rating scales were completed. The investigation was carried out in accordance with the latest version of the Declaration of Helsinki. The study was reviewed and approved by the Emory University Institutional Review Board. 
Informed consent of the participants was obtained after the nature of the procedures had been fully explained.
Each visit was assigned to one of 6 distinct perinatal epochs including: 1) preconception; 2) 1st trimester (0–12 weeks gestation); 3) 2nd trimester (13–24 weeks gestation); 4) 3rd trimester (25 weeks gestation to delivery); 5) early postpartum (0–6 weeks); and 6) late postpartum (7–26 weeks). A completed SCID Mood Module plus one or more of the depression symptom scales was necessary for a visit to qualify for inclusion. At each visit, the presence (or absence) of a MDE was determined by the SCID Mood Module.
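The epoch definitions above can be expressed as a simple lookup. The following is a minimal sketch only, not the study's actual data-processing code: the function name `assign_epoch`, the `phase` labels, and the use of whole completed weeks are illustrative assumptions.

```python
from typing import Optional


def assign_epoch(phase: str, weeks: Optional[int] = None) -> str:
    """Map a visit to one of the 6 perinatal epochs described in the text.

    `phase` is a hypothetical label: "preconception", "pregnancy", or
    "postpartum". `weeks` is completed weeks of gestation (pregnancy)
    or completed weeks since delivery (postpartum); whole weeks are an
    assumption made for this sketch.
    """
    if phase == "preconception":
        return "preconception"
    if weeks is None or weeks < 0:
        raise ValueError("weeks must be a non-negative number")
    if phase == "pregnancy":
        if weeks <= 12:
            return "1st trimester"   # 0-12 weeks gestation
        if weeks <= 24:
            return "2nd trimester"   # 13-24 weeks gestation
        return "3rd trimester"       # 25 weeks gestation to delivery
    if phase == "postpartum":
        if weeks <= 6:
            return "early postpartum"  # 0-6 weeks
        if weeks <= 26:
            return "late postpartum"   # 7-26 weeks
        raise ValueError("visit falls after the 26-week follow-up window")
    raise ValueError(f"unknown phase: {phase!r}")
```

A visit at 25 weeks gestation would fall in the 3rd trimester under this mapping, and a visit 7 weeks after delivery in the late postpartum epoch.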
To assess the diagnostic accuracy of the symptom scales within each perinatal epoch and the overall perinatal period, the receiver operating characteristic (ROC) curve analysis proposed by Obuchowski (1997), which accounts for correlation due to repeatedly measured rating scales from each subject, was used. A cutpoint score for a scale was defined as the score at or above which scores are considered consistent with the presence of a MDE; the optimal cutpoint score for each scale was then defined as the score at which the sum of the scale’s sensitivity and specificity was maximized within each perinatal epoch. Stratified ROC analyses were conducted to examine the impact of primigravid vs. multigravid pregnancies and first visit vs. follow-up visits on the accuracy of depression scales for identifying an active MDE. Typically, the diagnostic accuracy of a symptom scale is considered poor when ROC AUC < 0.70, fair when 0.70 ≤ ROC AUC < 0.80, good when 0.80 ≤ ROC AUC < 0.90, and excellent when 0.90 ≤ ROC AUC (Tape 2008). Wald tests were used to compare ROC AUCs. All statistical tests were two-tailed and conducted at a significance level of 0.05.
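The cutpoint rule and the AUC bands described above can be illustrated with a short sketch. It makes several assumptions and is not the analysis code used in the study: the function names are hypothetical, ties are broken in favor of the lowest cutpoint (the text does not state a tie rule), and the sketch ignores the within-subject correlation that the Obuchowski (1997) method accounts for.

```python
from typing import Sequence, Tuple


def optimal_cutpoint(scores: Sequence[float],
                     has_mde: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (cutpoint, sensitivity, specificity) maximizing sens + spec.

    A score >= cutpoint is treated as consistent with a MDE, as in the text.
    Assumption: ties are resolved in favor of the lowest cutpoint.
    """
    n_pos = sum(1 for d in has_mde if d)
    n_neg = len(has_mde) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one visit with and one without a MDE")
    best = None
    for c in sorted(set(scores)):
        true_pos = sum(1 for s, d in zip(scores, has_mde) if d and s >= c)
        true_neg = sum(1 for s, d in zip(scores, has_mde) if not d and s < c)
        sens, spec = true_pos / n_pos, true_neg / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (c, sens, spec)
    return best


def auc_band(auc: float) -> str:
    """Label an AUC using the bands quoted in the text (Tape 2008)."""
    if auc < 0.70:
        return "poor"
    if auc < 0.80:
        return "fair"
    if auc < 0.90:
        return "good"
    return "excellent"


# Hypothetical toy data for illustration only (not study data).
toy_scores = [3, 5, 8, 10, 12, 14, 15, 18, 20, 22]
toy_mde = [False, False, False, False, True, False, True, True, True, True]
print(optimal_cutpoint(toy_scores, toy_mde))
```

In practice this would be applied separately to the visits within each perinatal epoch, which is why the optimal cutpoint can differ from one epoch to the next.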
A total of 708 women were enrolled in longitudinal perinatal investigations of the pharmacokinetics of psychotropic medications and/or maternal stress. One hundred and seventy four women were excluded secondary to missing data – most commonly, lack of adequate follow-up visit data. Table 1 summarizes the characteristics of the entire study population, the subpopulation included in the analysis and the subpopulation excluded from the analysis due to missing data. The included group is demographically more diverse than the excluded group; specifically, the included group has higher minority representation, includes more unmarried subjects, has more subjects who have completed no more than a high school education, and has a higher proportion of unplanned pregnancies.
Five hundred and thirty-four participants (75.4% of all enrollees), encompassing 640 pregnancies with 4025 visits, were included in the current analysis. The sample was very homogeneous: 33.1 ± 5.1 years of age, predominantly Caucasian (85.8%), married or cohabitating at study entry (85.4%), and with a high school education or greater (92.0%). Because the gestational age at study entry varied (preconception through ≤ 28 weeks), the number of subjects within each perinatal epoch varied.
ROC analysis demonstrated that all symptom scales (BDI, EPDS, HRSD17, HRSD21) were highly predictive of MDE (cf. Table 2). The ROC AUCs, ranging from 0.855 to 0.971, were all statistically significant (p<0.0001). For all 4 scales, the peak ROC AUC was observed during the preconception period. As hypothesized, the ROC AUC was lowest during the third trimester for the BDI, HRSD17, and HRSD21. The AUC was lowest for the EPDS during the early postpartum. ROC AUCs for the EPDS suggest that it achieved excellent diagnostic validity (i.e., AUC ≥ 0.90) in 3 of the 6 epochs. The BDI achieved excellent diagnostic validity in 2 of 6 epochs, and the two HRSD scores each achieved excellent diagnostic validity in 1 of 6 epochs. The remainder of the AUCs for all 4 scales in all epochs were in the good range (0.80 ≤ AUC < 0.90). The EPDS (preconception, first trimester, third trimester) and BDI (second trimester, early postpartum, late postpartum) each achieved the largest ROC AUC among all scales in 3 of the 6 epochs.
To compare AUCs within each epoch, the analysis was also conducted using “complete cases” only, i.e., limited to visits in which all four depression scales were completed, and the results were similar. During the 2nd trimester, BDI had the largest AUC (0.908), which was significantly larger (p=0.02) than the smallest AUC (0.852, of HRSD17). For other epochs, there was no such significant difference between the largest and smallest AUC (data not shown).
Despite the consistently good to excellent diagnostic validity of all four scales across the 6 epochs, there was considerable variability in the optimal cutpoint scores for the self-rated instruments. The EPDS produced the greatest variation with the optimal cutpoint ranging from a low of 9 during the second trimester to a peak of 18 during preconception. Optimal cutpoints for the BDI ranged from a low of 12 during the third trimester to a high of 17 during preconception. Optimal cutpoints for the HRSD17 varied by only 1 point (from 14 to 15) and for the HRSD21 by only 2 points (from 14 to 16). Interestingly, the optimal cutpoint scores were highest during the preconception period for all four scales.
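As a quick arithmetic check on the variability described above, the span between the lowest and highest optimal cutpoint can be computed per scale. This is only an illustrative sketch; the dictionary names are hypothetical, and the values are the cutpoint bounds quoted in the preceding paragraph.

```python
# (lowest, highest) optimal cutpoint across the 6 epochs, as reported in the text.
cutpoint_bounds = {
    "EPDS": (9, 18),
    "BDI": (12, 17),
    "HRSD17": (14, 15),
    "HRSD21": (14, 16),
}

# Range in points for each scale: a smaller value means a more stable cutpoint.
cutpoint_range = {scale: high - low
                  for scale, (low, high) in cutpoint_bounds.items()}

for scale, points in cutpoint_range.items():
    print(f"{scale}: {points}-point range")
```

The self-rated scales span 9 points (EPDS) and 5 points (BDI), against 1 to 2 points for the clinician-rated HRSD scores.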
As noted above, there were 640 pregnancies completed by the 534 subjects included in the current analysis. Among these 640 pregnancies, there were 161 primigravid pregnancies, 465 multigravid pregnancies and 14 pregnancies with missing gravidity/parity data. Demographic analysis indicates that the primigravid group was younger (31.5 vs. 34.4 years, p<.0001) and less likely to be married (80.1% vs. 89.5%, p=.0006) than the multigravid group. There were no other significant demographic differences between the groups (data not shown).
For all four depression scales, across all pregnancy stages, the overall ROC AUC, optimal cutpoint, and summation of sensitivity and specificity for primigravid pregnancies are very similar to those observed for multigravid pregnancies, suggesting that gravidity status had no discernible impact on the global performance of the scales (cf. Table 3). However, inspection of the third trimester results demonstrates that cutpoints are higher for multigravid pregnancies for all 4 scales. In addition, the sensitivity of the scales corresponding to the optimal cutpoint is lower for multigravidas than primigravidas during the third trimester for all scales but the EPDS. Furthermore, within the multigravid group, the sensitivity of the HRSD17, HRSD21, and BDI at the optimal cutpoints reaches its nadir during the third trimester.
Among the 534 subjects included in the current analysis, 178 subjects had previously participated in other WMHP studies, and 356 completed their first research encounter at the WMHP during the current study. The comparison of first versus follow-up visits is limited to these 356 participants who completed 356 first visits and 2474 follow-up visits. Because all subjects were enrolled prior to 28 weeks gestation, none of the first visits occurred during the postpartum period. Consequently, the postpartum epochs were also excluded from the subsequent visits stratum to eliminate any confounding effect of the disparate perinatal epoch. There were no significant demographic differences between the 356 subjects included in this phase of the analysis and the 178 excluded except that the inclusion group was less likely to be Caucasian (83.2% vs. 91.0%, p= .004), reflecting the growing minority participation in WMHP research in recent years; and that the inclusion group was less likely to have a planned pregnancy (63.7% vs. 77.3%, p=.003).
ROC analysis indicated that all 4 scales performed in the good to excellent range at both initial visits and follow-up visits (cf. Table 4). The summation of sensitivity and specificity for all 4 scales was also consistent across the perinatal epochs within both strata. However, the optimal cutpoints for the HRSD17 and HRSD21 were very consistent between first and follow-up visits, whereas optimal cutpoints on the self-report instruments were generally 4–6 points higher for the first visit than for the follow-up visits, both within each perinatal epoch and overall across the preconception and three trimester periods (cf. Table 4).
It is reassuring that ROC analysis across all phases of the analysis indicates that all 4 instruments consistently performed in the good to excellent range (Tape 2008). The overall optimal cutpoints for each scale are consistent with previous reports. Moreover, the sensitivities and specificities of the 4 instruments were also very similar. It might, therefore, be inferred that any of the 4 depression rating scales evaluated in the current study is a suitable candidate to identify episodes of major depression during pregnancy and the postpartum period. Given the ease of administration of self-report measures in both the clinical and research settings, it could be argued that there is no justification for using the labor-intensive HRSD over the self-report BDI and EPDS. However, the stability of the cutpoint is a key consideration when screening a broad perinatal sample and conducting longitudinal follow-up. The clinician-administered HRSD provided more stable cutpoints (1–2 point range) across the perinatal epochs compared to the BDI (5 point range) and EPDS (9 point range). Consequently, the present data suggest that the HRSD may be preferred when conducting longitudinal studies across pregnancy and the postpartum period, but the less costly BDI and EPDS may be preferred for cross-sectional studies.
Consistent with our previous experience (Altshuler et al 2008), the comparative performance of the HRSD17 and HRSD21 indicates that items 18–21 can be eliminated from the perinatal administration of the HRSD with little or no impact on the performance of the scale. Inclusion of items 18–21 elevated the optimal cutpoint within each perinatal epoch by only 0–1 points and produced no significant improvement in the ROC AUC, sensitivity, or specificity of the HRSD during pregnancy and the postpartum period.
It is noteworthy that the specificities of the scales were uniformly lower during pregnancy and the early postpartum period than during the preconception epoch. Late postpartum specificities were generally intermediate between preconception and pregnancy/early postpartum values. Consistent with our a priori hypothesis, we suspect this may be a consequence of the overlap between physical symptoms of pregnancy and the neurovegetative symptoms of depression. In contrast, although we had anticipated that the symptomatic overlap of pregnancy and depression would elevate the optimal cutpoints during pregnancy, the current results found the pregnancy cutpoints to be the same as or lower than the preconception cutpoints. This suggests that women incorporate their own opinions regarding the etiology of the symptoms during both interview and completion of self-rated scales. An extended item analysis may clarify the role that the neurovegetative symptom items play in the perinatal performance of these rating scales.
The primigravida versus multigravida comparison provides additional insights on this issue. As noted above, there is no evidence from the current study to indicate that gravidity status alters the overall performance of these scales across the entirety of pregnancy and the postpartum period; however, the current data do suggest that multigravidas rate the instruments differently during the third trimester. Whereas there was no evident difference in the specificity of the scales in the third trimester, the cutpoints are consistently higher among multigravidas during the third trimester, and the scale sensitivity is lower among multigravidas for 3 of the 4 scales. We suspect that this confluence of third trimester results among multigravidas (i.e., higher cutpoint, lower sensitivity, unchanged specificity) is a consequence of multigravidas being more ready than primigravidas to report physical discomforts on a depression rating scale, particularly when such symptoms are incongruent with prior pregnancies. Inclusion of the physical symptoms by the multigravidas would tend to elevate the scores. With the higher overall scores, higher cutpoints would be needed to forestall, at least in part, the resultant decline in sensitivity.
Finally, the analysis of first visit versus follow-up visit performance of the scales is generally reassuring, but raises concerns regarding the lack of cutpoint stability of the self-report depression measures. Overall first visit versus follow-up visit cutpoints for the HRSD only varied by 0–1 points, but the BDI and EPDS first versus follow-up visit cutpoints differed by 6 and 4 points, respectively. This apparent post-first visit learning effect has significant implications for the use of these self-report depression rating scales in longitudinal studies across the perinatal period, again suggesting that the HRSD may be preferred for longitudinal investigations.
The study is limited by the homogeneous clinical population that was able to complete participation in a longitudinal investigation, which limits the generalizability of these results to community-based perinatal samples where self-rated instruments are likely to be employed as screening tools. Notably, the group of women excluded from the present analyses was more demographically homogeneous than the included group. While this finding runs counter to most longitudinal studies, it may reflect the benefits garnered by study participants with respect to frequent contact providing additional education and support. Indeed, the greater heterogeneity of the inclusion group potentially enhances the generalizability of the study results and increases support for further research in perinatal maternal mental illness with respect to the benefit of repeated professional contact. Similarly, the study assesses the validity of multiple scales in identifying episodes of depression that fulfill SCID diagnostic criteria. Limiting the analysis to episodes meeting diagnostic criteria fails to assess the utility of the scales in appropriately identifying subsyndromal symptoms warranting clinical attention.
Investigations in perinatal psychiatry continue to refine study parameters and methodology. As noted in our previous investigation, retrospective maternal reports are inadequate proxies for categorizing maternal depression during pregnancy and exposure to non-psychotropic medications (Newport et al 2008). The current study sought to define optimal cutpoint scores and factors that influence these scores. All scales demonstrated good to excellent ROC AUCs in identification of a MDE as defined by SCID criteria across pregnancy and the postpartum period. The impact of gravidity, first visit completion versus follow-up visits, and variability of optimal cutpoint scores across the perinatal period need to be considered in the application of these scales in future longitudinal and treatment investigations.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.