It is reassuring that the ROC results across all phases of the analysis indicate that all 4 instruments consistently performed in the good to excellent range (Tape 2008). The overall optimal cutpoints for each scale are consistent with previous reports. Moreover, the sensitivities and specificities of the 4 instruments were very similar. It might, therefore, be inferred that any of the 4 depression rating scales evaluated in the current study is a suitable candidate to identify episodes of major depression during pregnancy and the postpartum period. Given the ease of administration of self-report measures in both clinical and research settings, it could be argued that there is no justification for using the labor-intensive HRSD over the self-report BDI and EPDS. However, the stability of the cutpoint is a key consideration when screening a broad perinatal sample and conducting longitudinal follow-up. The clinician-administered HRSD provided more stable cutpoints (1–2 point range) across the perinatal epochs than the BDI (5 point range) and EPDS (9 point range). Consequently, the present data suggest that the HRSD may be preferred when conducting longitudinal studies across pregnancy and the postpartum period, but the less costly BDI and EPDS may be preferred for cross-sectional studies.
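To make the notion of cutpoint stability concrete, the following minimal sketch derives an optimal cutpoint within each epoch and then reports the spread of those cutpoints across epochs. It assumes that the optimal cutpoint is the score maximizing Youden's J (sensitivity + specificity − 1), a common criterion that is not stated in this section and may differ from the one used in the analysis. The function names and the toy scores are hypothetical and are not study data.

```python
# Illustrative sketch only: toy scores, hypothetical names, and an assumed
# selection criterion (Youden's J). None of this is taken from the study data.

def sens_spec(scores, has_mde, cutpoint):
    """Sensitivity and specificity when score >= cutpoint is a positive screen.

    Assumes the sample contains at least one case with and one without an MDE.
    """
    pairs = list(zip(scores, has_mde))
    tp = sum(1 for s, d in pairs if d and s >= cutpoint)
    fn = sum(1 for s, d in pairs if d and s < cutpoint)
    tn = sum(1 for s, d in pairs if not d and s < cutpoint)
    fp = sum(1 for s, d in pairs if not d and s >= cutpoint)
    return tp / (tp + fn), tn / (tn + fp)


def optimal_cutpoint(scores, has_mde):
    """Observed score maximizing Youden's J; the lowest score wins ties."""
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(scores, has_mde, cut)
        j = sens + spec - 1
        if j > best_j:  # strict comparison keeps the lowest tied cutpoint
            best_cut, best_j = cut, j
    return best_cut


def cutpoint_range(epochs):
    """Spread (max minus min) of the optimal cutpoints across epochs."""
    cuts = [optimal_cutpoint(scores, has_mde) for scores, has_mde in epochs.values()]
    return max(cuts) - min(cuts)


# Toy data: (scale scores, SCID-style diagnosis flags) for two made-up epochs.
toy_epochs = {
    "epoch_a": ([4, 6, 8, 10, 12, 14, 16, 18],
                [False, False, False, False, True, True, True, True]),
    "epoch_b": ([4, 6, 8, 10, 12, 14, 16, 18],
                [False, False, False, True, False, True, True, True]),
}

for name, (scores, has_mde) in toy_epochs.items():
    print(name, "optimal cutpoint:", optimal_cutpoint(scores, has_mde))
print("cutpoint range across epochs:", cutpoint_range(toy_epochs))
```

Under this criterion, a scale whose per-epoch cutpoints span a narrow range can be applied with a single threshold throughout a longitudinal study, whereas a wide range implies that a fixed threshold will be miscalibrated in at least some epochs.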
Consistent with our previous experience (Altshuler et al 2008), the comparative performance of the HRSD17 indicates that items 18–21 can be eliminated from the perinatal administration of the HRSD with little or no impact on the performance of the scale. Inclusion of items 18–21 elevated the optimal cutpoint within each perinatal epoch by only 0–1 points and produced no significant improvement in the ROC AUC, sensitivity, or specificity of the HRSD during pregnancy and the postpartum period.
It is noteworthy that the specificities of the scales were uniformly lower during pregnancy and the early postpartum period than during the preconception epoch. Late postpartum specificities were generally intermediate between preconception and pregnancy/early postpartum values. Consistent with our a priori hypothesis, we suspect this may be a consequence of the overlap between physical symptoms of pregnancy and the neurovegetative symptoms of depression. In contrast, although we had anticipated that the symptomatic overlap of pregnancy and depression would elevate the optimal cutpoints during pregnancy, the pregnancy cutpoints in the current results were the same as or lower than the preconception cutpoints. This suggests that women incorporate their own opinions regarding the etiology of their symptoms both during interview and when completing self-rated scales. An extended item analysis may clarify the role that the neurovegetative symptom items play in the perinatal performance of these rating scales.
The primigravida versus multigravida comparison provides additional insights on this issue. As noted above, there is no evidence from the current study to indicate that gravidity status alters the overall performance of these scales across the entirety of pregnancy and the postpartum period; however, the current data do suggest that multigravidas rate the instruments differently during the third trimester. Whereas there was no evident difference in the specificity of the scales in the third trimester, the cutpoints were consistently higher among multigravidas during the third trimester, and the scale sensitivity was lower among multigravidas for 3 of the 4 scales. We suspect that this confluence of third trimester results among multigravidas (i.e., higher cutpoint, lower sensitivity, unchanged specificity) is a consequence of multigravidas being more ready than primigravidas to report physical discomforts on a depression rating scale, particularly when such symptoms are incongruent with prior pregnancies. Inclusion of the physical symptoms by the multigravidas would tend to elevate the scores. With the higher overall scores, higher cutpoints would be needed to forestall, at least in part, the resultant decline in sensitivity.
Finally, the analysis of first visit versus follow-up visit performance of the scales is generally reassuring, but raises concerns regarding the lack of cutpoint stability of the self-report depression measures. Overall first visit versus follow-up visit cutpoints for the HRSD only varied by 0–1 points, but the BDI and EPDS first versus follow-up visit cutpoints differed by 6 and 4 points, respectively. This apparent post-first visit learning effect has significant implications for the use of these self-report depression rating scales in longitudinal studies across the perinatal period, again suggesting that the HRSD may be preferred for longitudinal investigations.
The study is limited by the homogeneous clinical population that was able to complete participation in a longitudinal investigation, thereby limiting the generalizability of these results to community-based perinatal samples where self-rated instruments are likely to be employed as screening tools. Notably, the group of women excluded from the present analyses was more demographically homogeneous than the included group. While this counterintuitive finding contrasts with most longitudinal studies, it may reflect the benefits garnered by study participants with respect to frequent contact providing additional education and support. Indeed, the greater heterogeneity of the inclusion group potentially enhances the generalizability of the study results and increases support for further research in perinatal maternal mental illness with respect to the benefit of repeated professional contact. Similarly, the study assesses the validity of multiple scales in identifying episodes of depression that fulfill SCID diagnostic criteria. Limiting the analysis to episodes meeting diagnostic criteria fails to assess the utility of the scales in appropriately identifying subsyndromal symptoms warranting clinical attention.
Investigations in perinatal psychiatry continue to refine study parameters and methodology. As noted in our previous investigation, retrospective maternal reports are inadequate proxies for categorizing maternal depression during pregnancy and exposure to non-psychotropic medications (Newport et al 2008). The current study sought to define optimal cutpoint scores and factors that influence these scores. All scales demonstrated good to excellent ROC AUCs in identification of an MDE as defined by SCID criteria across pregnancy and the postpartum period. The impact of gravidity, first visit completion versus follow-up visits, and variability of optimal cutpoint scores across the perinatal period need to be considered in the application of these scales in future longitudinal and treatment investigations.