Objective To evaluate the cost effectiveness of routine screening for postnatal depression in primary care.
Design Cost effectiveness analysis with a decision model of alternative methods of screening for depression, including standardised postnatal depression and generic depression instruments. The performance of screening instruments was derived from a systematic review and bivariate meta-analysis at a range of instrument cut points; estimates of other relevant parameters were derived from literature sources and relevant databases. A decision tree considered the full treatment pathway from the possible onset of postnatal depression through identification, treatment, and possible relapse.
Setting Primary care.
Participants A hypothetical population of women assessed for postnatal depression either via routine care only or supplemented by use of formal identification methods six weeks postnatally, as recommended in recent guidelines.
Main outcome measures Costs expressed in 2006-7 prices and impact on health outcomes expressed in terms of quality adjusted life years (QALYs). The time horizon of the analysis was one year.
Results The routine application of either postnatal or general depression questionnaires did not seem to be cost effective compared with routine care only. The Edinburgh postnatal depression scale (at a cut point of 16) had an incremental cost effectiveness ratio (ICER) of £41103 (€45398, $67130) per QALY compared with routine care only. The ICER for all other strategies ranged from £49928 to £272463 per QALY versus routine care only, while the probability that no formal identification strategy was cost effective was 88% (59%) at a cost effectiveness threshold of £20000 (£30000) per QALY. While sensitivity analysis indicated that the cost of managing incorrectly identified depression (false positive result) was an important driver of the model, formal identification approaches did not seem to be cost effective at any feasible estimate of this cost.
Conclusions Formal identification methods for postnatal depression do not seem to represent value for money for the NHS. The major determinant of cost effectiveness seems to be the potential additional costs of managing women incorrectly diagnosed as depressed. Formal identification methods for postnatal depression do not currently satisfy the National Screening Committee’s criteria for the adoption of a screening strategy as part of national health policy.
Depression accounts for the greatest burden among all mental health problems and by 2020 is expected to become the second most common general health problem.1 Postnatal depression is an important category of depression, with over 11% of women experiencing major or minor postnatal depression six weeks postnatally.2 There is now considerable evidence to show that postnatal depression has a substantial impact on the mother and her partner,3 the family,4 mother-baby interactions,5 and the longer term emotional and cognitive development of the baby,6 especially when depression occurs in the first year of life.7
Though clinically and cost effective treatments are available,8 9 fewer than half of cases of postnatal depression are detected in routine clinical practice.9 10 Formal strategies for screening and case identification (such as standardised postnatal questionnaires, standardised generic questionnaires for depression, and prenatal screening for known risk factors for postnatal depression) have been advocated but are controversial.11 12 The National Screening Committee has clear criteria that must be satisfied before the adoption of formal screening strategies.13 These criteria consist of 23 items relating to the condition, the test, the treatment, and the proposed screening programme.14 In particular, screening strategies are assessed to ensure that the screening does more good than harm at a reasonable cost. When these criteria were previously applied to formal screening strategies for postnatal depression, there was insufficient clinical and economic evidence to support their implementation.12 Nevertheless, these strategies are widely used in current practice, with particular focus on the Edinburgh postnatal depression scale (EPDS).8
Furthermore, recent clinical guidelines issued by the National Institute for Health and Clinical Excellence (NICE) recommend the use of brief case finding questions to identify possible postnatal depression (box 1), with the use of self report measures such as the Edinburgh postnatal depression scale, the hospital anxiety and depression scale (HADS), or the patient health questionnaire (PHQ-9) as part of subsequent assessment or for routine monitoring.8 A specific recommendation was made for the use of brief generic case finding questions (the Whooley questions) that had previously been validated in older men but not in postnatal women.8 This guidance, however, did not formally consider the cost effectiveness of such strategies. In view of the uncertainty surrounding this issue, the United Kingdom National Institute for Health Research (NIHR)-Health Technology Assessment Programme prioritised a review of the clinical and cost effectiveness of formal identification methods for postnatal depression in primary care. A full technical report is published elsewhere.15 Here we provide a summary of the cost effectiveness analysis and policy implications of this policy driven evidence review.
Clinical guidance issued by NICE in 2007 recommends that healthcare professionals ask two questions at a woman’s first contact with primary care, again at her booking visit, and again postnatally (usually at 4-6 weeks and 3-4 months):
A third question should be considered if the woman answers “yes” to either of the initial questions:
We carried out several formal systematic reviews of identification methods and their corresponding performance and developed an economic model to evaluate the costs and health outcomes associated with each. This model took the form of a decision tree and considered the full treatment pathway from the possible onset of postnatal depression through identification, treatment, and possible relapse (figure).
The model evaluated a hypothetical population of women managed in primary care six weeks postnatally. We assumed that a proportion of the women were depressed but that only some had been identified as such through routine care. At this time a formal identification method could be administered to the entire population of postnatal women. As screening and case finding instruments are imperfect diagnostic instruments, this might identify some depressed women not detected through routine care, but it might also incorrectly identify some women who were not depressed. Women were assumed to have a diagnosis of depression if they were positively identified by either the formal identification method (when applicable) or routine care. We assumed that those with a diagnosis of depression were offered additional treatment (box 2) while all others received usual postnatal care. All women were followed up for a year. To facilitate the modelling of the postnatal period, we assumed that women who were not depressed six weeks postnatally remained so until the model end point.
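The first branch of the decision tree can be sketched as a split of the cohort into diagnostic groups. This is a minimal illustration, not the authors' model. The function and class names are ours. It rests on two simplifying assumptions of our own: routine care never labels a non-depressed woman as depressed, and among depressed women the formal method and routine care detect cases independently. Only the 11% prevalence comes from the text. The accuracy values and the routine detection rate are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    true_pos: float   # depressed and diagnosed (by the formal method or routine care)
    false_pos: float  # not depressed but flagged by the formal method
    false_neg: float  # depressed but missed by both routes
    true_neg: float   # not depressed and not flagged


def split_cohort(prevalence, sens, spec, p_routine, screen=True):
    """Proportions of women in each diagnostic group six weeks postnatally.

    Hypothetical simplifications: routine care produces no false positives,
    and the two detection routes act independently among depressed women.
    """
    if not screen:
        sens, spec = 0.0, 1.0  # routine care only: no formal method applied
    p_detected = 1 - (1 - sens) * (1 - p_routine)  # positive by either route
    tp = prevalence * p_detected
    fp = (1 - prevalence) * (1 - spec)
    return Cohort(tp, fp, prevalence - tp, (1 - prevalence) - fp)


# Prevalence from the text (about 11%); other inputs are hypothetical
routine_only = split_cohort(0.11, 0.80, 0.90, 0.40, screen=False)
screened = split_cohort(0.11, 0.80, 0.90, 0.40)
print(routine_only)
print(screened)
```

Screening moves some women from the false negative group into the true positive group. It also creates a false positive group that routine care alone does not produce, and the text later identifies that group as the main cost driver.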
The recent NICE clinical guidance presented a decision analytical model that assessed the relative cost effectiveness of two alternative psychological treatments for women with mild to moderate depression in the postnatal period: structured psychological therapy (as exemplified by manualised cognitive behavioural therapy delivered by a clinical psychologist) and non-directive counselling (for example, that based on Rogerian principles and delivered by a primary care practice counsellor); full descriptions of these treatments are given in the guidance.8 We reconstructed this model with updated parameter inputs and carried out probabilistic Monte Carlo simulation over 10000 iterations. By adopting a conventional threshold of willingness to pay of £20000-30000 per quality adjusted life year (QALY), we found that structured psychological therapy is a cost effective treatment for mild to moderate postnatal depression, with an incremental cost effectiveness ratio (ICER) of £17481 per QALY versus no treatment. The ICER compares the additional costs that one strategy incurs over another with the corresponding additional health benefits.16 Non-directive counselling was not cost effective, with an ICER of £66275 per QALY versus structured psychological therapy.
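The ICER defined above can be written in a few lines of code. The sketch below also applies the threshold decision rule of £20000 or £30000 per QALY used throughout this paper. The function names are our own and the inputs are purely hypothetical. The sketch assumes that the new strategy is more effective than its comparator. A strategy that costs more and yields no more QALYs is dominated and has no meaningful ICER.

```python
def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Extra cost per extra QALY of a new strategy over its comparator."""
    extra_qalys = qaly_new - qaly_old
    if extra_qalys <= 0:
        raise ValueError("new strategy must yield more QALYs than the comparator")
    return (cost_new - cost_old) / extra_qalys


def cost_effective(ratio, threshold=20_000):
    """Conventional decision rule: adopt if the ICER does not exceed the threshold."""
    return ratio <= threshold


# Hypothetical per-woman costs (£) and QALYs, not figures from this study
ratio = icer(cost_new=900, qaly_new=0.80, cost_old=600, qaly_old=0.78)
print(round(ratio), cost_effective(ratio), cost_effective(ratio, 30_000))
```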
Women with a diagnosis of postnatal depression in our model were therefore assumed to be offered structured psychological therapy as detailed in the NICE report. This included eight 50 minute sessions with a clinical psychologist, three 10 minute appointments with her general practitioner, four 45 minute home visits from a health visitor (a qualified nurse specially trained in health assessment and promotion in the community), and a single one hour home visit from a community psychiatric nurse. As in the NICE model, each woman would either continue or discontinue treatment and would either respond to treatment or not. If the woman responded then there was a possibility of relapse. If the woman did not respond then she remained depressed.
Women with an incorrect diagnosis were assumed not to receive any of the sessions with the clinical psychologist but did receive the supportive care. Women with undiagnosed depression were assumed to return to their general practitioner during the follow-up period, at which time there was a possibility of receiving a diagnosis and being offered treatment; meanwhile there was a possibility of these women recovering under usual postnatal care. Full details are provided in the technical report.15
The analysis was conducted from the perspective of the NHS and personal social services, with costs expressed in 2006-7 prices and health outcomes expressed in quality adjusted life years (QALYs). We did not apply discounting because the time horizon of the analysis was only one year. The model was probabilistic: input parameters were entered into the model as probability distributions to reflect uncertainty surrounding the mean estimates. We used Monte Carlo simulation (over 10000 iterations) to propagate this uncertainty throughout the model and sensitivity analysis to explore the impact of alternative model assumptions.
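Probabilistic analysis can be illustrated with a toy output: the expected cost per woman screened of managing false positives, (1 - prevalence) x (1 - specificity) x cost per false positive. The sketch below draws each input from a distribution on every one of 10000 iterations and then summarises the spread of the result. The beta and gamma distributions and their parameters are our own hypothetical choices. They are centred on the 11% prevalence and the £414 base case cost quoted in the text but are not the model's actual inputs.

```python
import random
import statistics


def run_psa(model, samplers, n_iter=10_000, seed=1):
    """Monte Carlo propagation: redraw every input, re-evaluate the model."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        params = {name: draw(rng) for name, draw in samplers.items()}
        results.append(model(**params))
    return results


def fp_cost_per_woman(prevalence, spec, fp_cost):
    """Expected cost (£) of managing false positives per woman screened."""
    return (1 - prevalence) * (1 - spec) * fp_cost


# Hypothetical input distributions, for illustration only
samplers = {
    "prevalence": lambda rng: rng.betavariate(11, 89),      # mean 0.11
    "spec": lambda rng: rng.betavariate(90, 10),            # mean 0.90
    "fp_cost": lambda rng: rng.gammavariate(16, 414 / 16),  # mean £414
}

draws = run_psa(fp_cost_per_woman, samplers)
mean = statistics.fmean(draws)
cuts = statistics.quantiles(draws, n=40)  # cuts[0] = 2.5th, cuts[-1] = 97.5th percentile
print(f"mean £{mean:.2f}, 95% interval £{cuts[0]:.2f} to £{cuts[-1]:.2f}")
```

Beta distributions suit probabilities because they are bounded by 0 and 1. Gamma distributions suit costs because they are non-negative and right skewed.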
To specify the range of validated case finding and screening techniques and to establish their diagnostic performance characteristics, we undertook a comprehensive search across 20 electronic databases, supplemented by forward citation searching of key literature, personal communication with authors, and scrutiny of reference lists.15 We identified numerous generic and specific screening strategies for postnatal depression, 14 of which had been validated among women during pregnancy or the postnatal period. By far the most commonly used identification method was the Edinburgh postnatal depression scale, a specially developed standardised postnatal questionnaire. Other methods identified included standardised generic questionnaires such as the Beck depression inventory. Both of these methods generate a score on a particular scale: when a patient scores at or above a particular cut point, clinically significant depression is assumed to be present.15 The performance (sensitivity and specificity) of each method depends critically on the cut point chosen (table 1).
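The cut point trade-off can be made concrete by computing sensitivity and specificity from a 2x2 classification of individual scores against a reference diagnosis. The ten scores and diagnoses below are invented for illustration and do not come from any validation study. The function name is also ours.

```python
def accuracy_at_cut(scores, depressed, cut):
    """(sensitivity, specificity) when a score at or above `cut` counts as positive."""
    tp = sum(1 for s, d in zip(scores, depressed) if d and s >= cut)
    fn = sum(1 for s, d in zip(scores, depressed) if d and s < cut)
    fp = sum(1 for s, d in zip(scores, depressed) if not d and s >= cut)
    tn = sum(1 for s, d in zip(scores, depressed) if not d and s < cut)
    return tp / (tp + fn), tn / (tn + fp)


# Invented questionnaire scores and reference-standard diagnoses for ten women
scores = [3, 8, 11, 14, 17, 5, 12, 19, 9, 2]
depressed = [False, False, True, True, True, False, False, True, True, False]

for cut in (10, 13):
    sens, spec = accuracy_at_cut(scores, depressed, cut)
    print(f"cut point {cut}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy data, raising the cut point from 10 to 13 lowers sensitivity from 0.80 to 0.60 and raises specificity from 0.80 to 1.00. Each cut point is therefore a different strategy in the model.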
We used a bivariate meta-analysis to derive the sensitivity and specificity of the alternative formal identification methods. This meta-analysis formed part of a wider portfolio of research that assessed the accuracy, acceptability, clinical effectiveness, and cost effectiveness of formal identification methods for postnatal depression.15 The bivariate model was fitted with a generalised linear mixed model approach to the bivariate meta-analysis of sensitivity and specificity.17 This approach uses the exact binomial distribution to describe variability of sensitivity and specificity within a study rather than the normal approximation as originally proposed.18
For “major or minor” depression there were sufficient data to pool the Edinburgh postnatal depression scale at various cut points (7-16 inclusive) and the Beck depression inventory at cut point 10. We did not consider the Whooley questions in the base case because of lack of data available to pool estimates as part of the bivariate meta-analysis and concerns as to the absence of data in postnatal women. For the purposes of the probabilistic model, we modelled the sensitivity and specificity on the log odds scale using normal distributions. The correlation between sensitivity and specificity was reflected in the probabilistic analysis using the Cholesky decomposition of the covariance matrix.19
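The following is a minimal sketch of correlated sampling on the log odds scale, using only the Python standard library. For a 2x2 covariance matrix, the lower-triangular Cholesky factor has a closed form. Two independent standard normal draws can therefore be turned into correlated logit sensitivity and logit specificity, which are then back-transformed to probabilities. The pooled means, standard errors, and correlation below are hypothetical placeholders, not estimates from the meta-analysis, and the function names are ours.

```python
import math
import random


def expit(x):
    """Inverse logit: log odds to probability."""
    return 1 / (1 + math.exp(-x))


def logit(p):
    """Probability to log odds."""
    return math.log(p / (1 - p))


def draw_sens_spec(mu, sd, rho, rng):
    """One correlated draw of (sensitivity, specificity).

    mu, sd: means and standard errors on the logit scale.
    rho: correlation between logit sensitivity and logit specificity.
    """
    # Cholesky factor L of [[sd0^2, rho*sd0*sd1], [rho*sd0*sd1, sd1^2]]
    l11 = sd[0]
    l21 = rho * sd[1]
    l22 = sd[1] * math.sqrt(1 - rho ** 2)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    logit_sens = mu[0] + l11 * z1
    logit_spec = mu[1] + l21 * z1 + l22 * z2
    return expit(logit_sens), expit(logit_spec)


# Hypothetical pooled estimates for one instrument at one cut point
mu = (logit(0.80), logit(0.90))  # mean logit sensitivity, logit specificity
sd = (0.30, 0.20)                # standard errors on the logit scale
rho = -0.5                       # correlation between the two logits

rng = random.Random(2009)
draws = [draw_sens_spec(mu, sd, rho, rng) for _ in range(20_000)]
```

Sampling on the logit scale keeps every draw strictly between 0 and 1. The shared z1 term is what induces the correlation.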
We used an estimate of the prevalence of depression among postnatal women six weeks postnatally from a previous systematic review.2 In the absence of specific data on postnatal depression, we derived the estimate of the probability that depression is detected via routine care at six weeks postnatally from a UK follow-up study of detection rates for depression and anxiety in primary care.20 The estimate of the probability that postnatal depression is detected when a woman with undiagnosed depression returns to her general practitioner during the follow-up period was also derived from this study. The robustness of our model to uncertainty around the estimate of prospective detection rates was formally examined within our probabilistic sensitivity analysis.
Estimates of the probabilities that women discontinue treatment, do not respond to treatment, or relapse after responding to treatment (see box 2) were adopted from data used in the most recent evidence synthesis undertaken within the recent NICE clinical guideline.8
Table 2 summarises the model’s parameters.
Critical to the cost effectiveness of screening and case finding for postnatal depression is the impact of early identification and management on a woman’s quality of life. To estimate QALYs, we assessed the quality of life associated with different health states in terms of preference weights based on published evidence.21 In the absence of published weights specific for postnatal depression, we assumed that affected women experience the quality of life associated with “moderate depression,” while women without depression (including those in remission) experience the quality of life associated with “remission.”
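Over the one year horizon, QALYs are the sum of each health state's preference weight multiplied by the time spent in that state. The utility values below are hypothetical placeholders, because the published weights are not reproduced in this section. The durations are also invented. The point is only that earlier recovery converts weeks at the "moderate depression" weight into weeks at the "remission" weight.

```python
def qalys(episodes):
    """Sum of utility x duration over consecutive (utility, weeks) health states."""
    return sum(utility * weeks / 52 for utility, weeks in episodes)


# Hypothetical preference weights, NOT the published values used in the model
U_DEPRESSED = 0.60  # "moderate depression"
U_REMISSION = 0.85  # "remission"

# Invented one year (52 week) trajectories for a depressed woman
early_recovery = [(U_DEPRESSED, 12), (U_REMISSION, 40)]
late_recovery = [(U_DEPRESSED, 30), (U_REMISSION, 22)]

gain = qalys(early_recovery) - qalys(late_recovery)
print(f"QALY gain from recovering 18 weeks earlier: {gain:.4f}")
```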
The costs associated with formal identification methods are also critical to their cost effectiveness. These include the cost of administering the method, the cost of any subsequent treatment, and the costs associated with incorrect diagnosis. We established estimates of time and cost from published data or from expert opinion.
Estimates for NHS unit costs were derived from national reference costs.22 In common with the assumptions used with the NICE clinical guideline, structured psychological therapy consisted of eight 50 minute sessions with a clinical psychologist (£446.67) plus the following supportive care: three 10 minute appointments with a general practitioner (£76.50), four 45 minute home visits from a health visitor (£278.20), and a single one hour home visit from a community psychiatric nurse (£59.30).
Estimated administration costs were £7.57 (five minutes of a health visitor’s time at £91 an hour) for the Edinburgh postnatal depression scale and £8.59 (five minutes at £91 an hour plus the $2 licence fee) for the Beck depression inventory.23
Women with undiagnosed depression were assumed to make an additional 10 minute visit to their general practitioner during the follow-up period (£25.50). We assumed that women without depression who were wrongly identified as depressed would not receive any of the sessions with the clinical psychologist but would receive all the supportive care (total cost £414); this assumption was returned to in a sensitivity analysis.
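The unit costs above can be checked with simple arithmetic. The variable names in this sketch are ours, but all figures come from the text. It confirms three things: the three supportive care items sum to the £414 base case cost of a false positive; a full course of structured psychological therapy therefore costs about £860.67; and one 10 minute general practitioner appointment is a third of £76.50, which matches the £25.50 figure.

```python
# Unit costs (£, 2006-7 prices) as given in the text
psychologist = 446.67  # eight 50 minute sessions
supportive_care = {
    "GP, three 10 minute appointments": 76.50,
    "health visitor, four 45 minute home visits": 278.20,
    "CPN, one 1 hour home visit": 59.30,
}

supportive_total = sum(supportive_care.values())  # base case cost of a false positive
therapy_total = psychologist + supportive_total   # full structured psychological therapy
gp_visit = supportive_care["GP, three 10 minute appointments"] / 3

print(f"supportive care £{supportive_total:.2f}")
print(f"structured psychological therapy £{therapy_total:.2f}")
print(f"single 10 minute GP appointment £{gp_visit:.2f}")
```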
We identified several important drivers of the model: the cost incurred by and approach to managing those wrongly identified as depressed (false positives); and the decision to include or exclude formal identification strategies not evaluated in the bivariate meta-analysis because of insufficient data. The impact of these drivers on the results of the model was explored through sensitivity analysis with a series of alternative scenarios.
Scenario 1—The cost incurred by managing women with a false positive diagnosis is a particularly important driver of the model that is subject to considerable uncertainty. The base case estimate of this cost (£414) was calculated on the assumption that women wrongly identified as depressed would receive all of the supportive care associated with structured psychological therapy before being identified as not depressed. As this estimate might well be conservative, we considered an alternative scenario whereby this cost was assumed to be that of a single consultation with a general practitioner (£25.50), on the optimistic assumption that women identified as depressed by a formal identification method would be referred to a general practitioner, who would then immediately make the correct diagnosis.
Scenario 2—We also considered the impact of using a complementary ideal identification test, such as the structured clinical interview for DSM-IV axis I disorders (SCID), for those identified as depressed by a formal identification method. We assumed that this interview took 30 minutes of a health visitor’s time to administer (£45.50) and had a sensitivity and specificity of 100%; this mitigated the cost of managing false positives.
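A confirmatory test changes both accuracy and cost. Under the common assumption that the two tests err independently given a woman's true status (our simplification), sequential testing multiplies the sensitivities and combines the specificities as shown below. With a perfect confirmatory interview, net specificity becomes 1, which is how this scenario removes false positives. Only women flagged by the first instrument incur the £45.50 interview cost. The function names and the first-test accuracy values are hypothetical.

```python
def series_accuracy(sens1, spec1, sens2, spec2):
    """Net (sensitivity, specificity) when only first-test positives get test 2.

    Assumes the two tests err independently given true status.
    """
    return sens1 * sens2, 1 - (1 - spec1) * (1 - spec2)


def confirmation_cost(prevalence, sens1, spec1, cost_confirm):
    """Expected confirmatory-test cost per woman screened (£)."""
    p_flagged = prevalence * sens1 + (1 - prevalence) * (1 - spec1)
    return p_flagged * cost_confirm


# Hypothetical first-test accuracy; perfect confirmatory interview as in the scenario
print(series_accuracy(0.80, 0.90, 1.0, 1.0))
print(round(confirmation_cost(0.11, 0.80, 0.90, 45.50), 2))
```

The trade-off is visible in the two outputs. The confirmatory step adds a cost for every flagged woman, but it avoids the much larger cost of managing false positives.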
Scenario 3—We evaluated the impact of considering the Whooley questions as an alternative identification method, given its recommendation in the recent NICE clinical guidance and its subsequent policy relevance. This was not considered in the base case because of the lack of data available to pool estimates as part of the bivariate meta-analysis and concerns as to the absence of validation data in a postnatal population. Given these limited data, the Whooley questions were assumed to have the sensitivity (0.96, 95% confidence interval 0.86 to 0.99) and specificity (0.89, 0.87 to 0.91) reported by Arroll et al in 200524 and were assumed to take a health visitor one minute to administer (£1.52).
We have presented our results in two ways: firstly, as the mean costs and QALYs of each formal identification method over the one year time horizon, with methods compared using incremental cost effectiveness ratios (see box 2); and, secondly, as decision uncertainty, expressed as the probability that each formal identification method is the most cost effective strategy at a given cost effectiveness threshold.
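The probability that a strategy is the most cost effective is usually estimated from Monte Carlo output. Each iteration's costs and QALYs are converted into net monetary benefit (threshold x QALYs - costs), and the analyst counts how often each strategy comes out on top. The sketch below does this for two strategies over four invented iterations. A real analysis would pass the 10000 simulated values per strategy. The function name and the data are hypothetical.

```python
def prob_most_cost_effective(costs, qalys, threshold):
    """Share of iterations in which each strategy has the highest net monetary benefit.

    costs, qalys: dict mapping strategy name -> list of per-iteration values.
    """
    names = list(costs)
    n_iter = len(costs[names[0]])
    wins = dict.fromkeys(names, 0)
    for i in range(n_iter):
        # net monetary benefit = threshold x QALYs - costs
        best = max(names, key=lambda s: threshold * qalys[s][i] - costs[s][i])
        wins[best] += 1
    return {s: w / n_iter for s, w in wins.items()}


# Four invented iterations for two strategies (per-woman £ and QALYs)
cost_draws = {"routine care": [100, 110, 90, 105], "screening": [150, 140, 160, 130]}
qaly_draws = {"routine care": [0.800, 0.790, 0.810, 0.800],
              "screening": [0.802, 0.795, 0.811, 0.803]}

for threshold in (20_000, 30_000):
    print(threshold, prob_most_cost_effective(cost_draws, qaly_draws, threshold))
```

Repeating this calculation over a range of thresholds gives a cost effectiveness acceptability curve.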
Table 3 summarises the results of the base case analysis. No formal identification method seemed to be cost effective under a conventional willingness to pay threshold of £20000-30000 per QALY,25 with the strategy of routine care the most likely to be cost effective. Adopting the Edinburgh postnatal depression scale with a cut point of 16 was the least costly and least effective formal identification method, with an estimated incremental cost effectiveness ratio of £41103 per additional QALY compared with routine care alone. The incremental cost effectiveness ratio for all other screening strategies ranged from £49928 to £272463 per QALY versus routine care only. The costs of managing those wrongly diagnosed as depressed represented a key driver of these results; as such, strategies adopting formal identification methods with higher specificity were associated with more favourable incremental cost effectiveness ratios.
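When there are many strategies, ICERs are usually reported along an efficiency frontier. First, strategies are ordered by cost. Those that cost more for no extra QALYs (strongly dominated) are dropped. Those whose ICER exceeds that of a more effective option (extendedly dominated) are also dropped. Each remaining strategy is then compared with the next cheaper one. The function below is a sketch of that standard procedure, not the authors' code. The strategy names, costs, and QALYs are hypothetical.

```python
def incremental_analysis(strategies):
    """strategies: list of (name, cost, qaly). Returns [(name, icer)] on the frontier."""
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))
    frontier = []
    for name, cost, qaly in ordered:
        if frontier and qaly <= frontier[-1][2]:
            continue  # strongly dominated: costs at least as much, no more QALYs
        frontier.append((name, cost, qaly))
        # extended dominance: drop the middle option while ICERs fail to increase
        while len(frontier) >= 3:
            (_, c1, q1), (_, c2, q2), (_, c3, q3) = frontier[-3:]
            if (c3 - c2) / (q3 - q2) < (c2 - c1) / (q2 - q1):
                del frontier[-2]
            else:
                break
    icers = [(frontier[0][0], None)]  # cheapest option has no comparator
    for (_, c0, q0), (name, c1, q1) in zip(frontier, frontier[1:]):
        icers.append((name, (c1 - c0) / (q1 - q0)))
    return icers


strategies = [
    ("routine care", 500, 0.8000),
    ("A", 560, 0.8010),  # extendedly dominated
    ("B", 600, 0.8005),  # strongly dominated by A
    ("C", 640, 0.8030),
    ("D", 700, 0.8032),
]
for name, ratio in incremental_analysis(strategies):
    print(name, "-" if ratio is None else round(ratio))
```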
Table 4 summarises the results of the sensitivity analyses. The assumption of a lower cost associated with managing those wrongly diagnosed as depressed had two important consequences: the incremental cost effectiveness ratios of strategies adopting identification methods became more favourable; and the ranking of non-dominated strategies became less dependent on the specificity of the identification method adopted. When the cost of managing those wrongly diagnosed was assumed to be £25.50, using the Edinburgh postnatal depression scale with a cut point of 10 seemed borderline cost effective with an incremental cost effectiveness ratio of £29186 per additional QALY.
Under base case assumptions, adoption of the structured clinical interview as a confirmatory test proved to be cost saving compared with the equivalent strategy without the interview, although no strategy subsequently proved to be cost effective based on a threshold of £20000-30000 per QALY. Use of the Edinburgh postnatal depression scale with a cut point of 13 with confirmatory structured clinical interview had the lowest reported incremental cost effectiveness ratio of £33776 per QALY compared with routine care.
When we considered the Whooley questions as an alternative identification method they proved not to be cost effective, with an incremental cost effectiveness ratio of £46538 per QALY versus the strategy of using the Edinburgh postnatal depression scale with a cut point of 16 (itself not cost effective compared with routine care only).
The use of formal identification methods for detecting postnatal depression does not seem to represent value for money for the NHS. Postnatal depression is an important clinical, economic, and social problem which is under-recognised and for which effective treatments are available. Decisions to screen, however, have attracted considerable controversy, and such policy decisions should be informed by systematic consideration of the clinical and economic evidence. In the absence of prospective economic evidence collected within randomised trials, decision models remain the most useful method to inform practice and policy decisions and to identify areas of uncertainty and priorities for further research. We used decision modelling to address this problem and built on a recent and comprehensive diagnostic meta-analysis of the performance of screening instruments.
Our conclusions regarding the lack of cost effectiveness are primarily driven by the costs of managing false positives—that is, women with a misdiagnosis of depression at a one off screen who do not subsequently turn out to have postnatal depression. When this cost is relatively high, the specificity of a formal identification method is an important contributor to its cost effectiveness; as this cost falls, specificity becomes less important relative to sensitivity. In the absence of reliable data surrounding this cost, we used a conservative approach in our base case analysis: compared with routine care the most cost effective formal identification method (Edinburgh postnatal depression scale with a cut point of 16) had an incremental cost effectiveness ratio of £41103 per QALY. This ratio is well above a conventional threshold of willingness to pay of £20000-30000 per QALY. Furthermore, even when we adopted a particularly optimistic estimate of this cost (that of a single appointment with a general practitioner) in a sensitivity analysis, compared with routine care the most cost effective formal identification method (Edinburgh postnatal depression scale with a cut point of 10) had an incremental cost effectiveness ratio of £29186 per QALY, falling only slightly under the upper limit of the £20000-30000 per QALY cost effectiveness threshold. As such, our main finding seems to be robust to plausible estimates of the cost of managing false positives. Nevertheless, a definitive answer as to the cost effectiveness of formal identification methods requires further evidence around this particular cost.
A secondary finding is that adopting the structured clinical interview as a confirmatory test for those positively identified by a formal identification method proved to be cost saving compared with the equivalent strategy without such an interview; however, no such strategy subsequently proved to be cost effective compared with routine care only. This suggests that future research into the cost associated with managing false positives should also consider alternative approaches to the management of those positively identified by a formal identification method.
We evaluated the relative performance of each identification method using bivariate meta-analysis, with the data used to inform the meta-analysis derived from several systematic searches of the literature. We used probabilistic techniques to propagate parameter uncertainty throughout the model. The model is compatible with the most recent best practice guidelines from NICE for the methods of technology appraisal.25
The analysis was conducted from the perspective of the NHS and personal social services and the model focused on the costs and health outcomes associated solely with the mother; no account was taken of the potential impact that successful identification and subsequent management might have had on other family members or the infant(s). These aspects, and a wider societal perspective, were not considered because of the lack of reliable evidence on the wider impact of case identification or treatment strategies.
There were limited published data available for estimating particular parameters in the model, including the probability that postnatal depression was identified via routine care at six weeks, the risk of relapse, and the utility weights. As a result, we derived the estimates used in the model from studies of general depressed populations (that is, not postnatal women), which represents a serious limitation of the model, particularly with respect to the utility estimates adopted. Further research specifically into the health related quality of life of women with postnatal depression would be valuable for future studies. The Edinburgh postnatal depression scale was the only identification strategy for which there were sufficient data at more than one cut point to be able to combine results and produce pooled summary estimates of sensitivity and specificity; as such, the performance of other identification strategies could not be assessed at multiple cut points. In addition, there were insufficient data to carry out subgroup analyses on different populations, such as women of different ages and women with or without complications after birth.
A further issue is the degree to which the QALY is an appropriate measure of health outcome. While the QALY is used throughout the literature on evaluation of health economics, it might be an insensitive measure of outcomes in mental health care.26 In the absence of a suitable alternative, we used the QALY to ensure comparability between the interventions considered here and those outside mental health; the potential insensitivity of the QALY in this context, however, should be considered in the interpretation of the results.
Finally, there were moderate to high levels of heterogeneity between studies across all cut points of the instruments considered. Consequently, we used random effects meta-analysis to incorporate the additional uncertainty caused by that heterogeneity in the test performance results for each instrument. It has recently been argued that conventional random effects approaches might underestimate uncertainty by not considering the whole distribution of effects, and that this additional uncertainty can be reflected by using a prediction interval.27 In the context of decision models, analysts could either use the predictive distribution of a future treatment effect or assume that future implementation will result in a distribution of treatment effects.28 The importance of this issue depends on the sources of heterogeneity in the evidence and their relation to a future implementation, both of which remain uncertain here given the limited existing evidence base for several of the instruments. Importantly from a policy perspective, the decision model presented here is linear with respect to the parameter inputs and the outputs used to establish the incremental cost effectiveness ratios, and hence our conclusions would not be altered by using the predictive distribution.
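For reference, one commonly used approximate 95% prediction interval for the effect in a new setting is shown below. It applies to a random effects meta-analysis of k studies, with the pooled estimate and its standard error on the logit scale (for sensitivity or specificity). The between-study variance is denoted tau squared, and t is the 97.5th percentile of a t distribution with k - 2 degrees of freedom. The interval limits are back-transformed to probabilities afterwards. This is a general formula, not one reported in this paper.

```latex
\hat{\mu} \pm t_{k-2,\,0.975}\,\sqrt{\hat{\tau}^{2} + \widehat{\mathrm{SE}}(\hat{\mu})^{2}}
```

Because the between-study variance enters under the square root, the interval widens as heterogeneity grows. A confidence interval for the mean alone would miss this.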
The results suggest that application of the recent NICE guidance (which recommends the use of the Whooley questions with an additional help question, see box 1) and widespread current practice (which focuses on the routine or ad hoc administration of the Edinburgh postnatal depression scale) do not result in value for money for the NHS. Formal methods of screening or case identification do not seem to satisfy the National Screening Committee’s criteria for the adoption of a screening strategy as part of national health policy as their adoption does not represent a “favourable ratio of costs to benefits” according to conventional NHS cost effectiveness thresholds.
Contributors: MP and SP were responsible for the economic review, including developing the decision model and reporting the results of the cost effectiveness analysis. CH and SG were responsible for the clinical review, including conducting and reporting the results of the bivariate meta-analysis, and also contributed to the development of the decision model. All authors contributed to the writing of this paper. SG was chief investigator for the NIHR Health Technology Assessment of Screening for Post Natal Depression (HTA project grant 05/39/06), which included this bivariate meta-analysis and decision model. SG is guarantor for this publication.
Funding: This study was funded by the NIHR Health Technology Assessment (HTA) Programme (HTA project grant 05/39/06). The funding source had no role in study design, conduct, data collection, data analysis, or data interpretation or in the decision to submit the manuscript for publication.
Competing interests: None declared.
Ethical approval: Not required.
Cite this as: BMJ 2009;339:b5203