Screening and case-finding have been proposed as a simple, quick and cheap method to improve the quality of care for depression. We sought to establish the effectiveness of screening in improving the recognition of depression, the management of depression and the outcomes of patients with depression.
We performed a Cochrane systematic review of randomized controlled trials conducted in nonmental health settings that included case-finding or screening instruments for depression. We conducted a meta-analysis and explored heterogeneity using meta-regression techniques.
Sixteen studies with 7576 patients met our inclusion criteria. We found that the use of screening or case-finding instruments was associated with a modest increase in the recognition of depression by clinicians (relative risk [RR] 1.27, 95% confidence interval [CI] 1.02 to 1.59). Questionnaires, when administered to all patients and the results given to clinicians irrespective of baseline score, had no impact on recognition (RR 1.03, 95% CI 0.85 to 1.24). Screening or case finding increased the use of any intervention by a relative risk of 1.30 (95% CI 0.97 to 1.76). There was no evidence of influence on the prescription of antidepressant medications (RR 1.20, 95% CI 0.87 to 1.66). Seven studies provided data on outcomes of depression, and no evidence of an effect was found (standardized mean difference –0.02, 95% CI –0.25 to 0.20).
If used alone, case-finding or screening questionnaires for depression appear to have little or no impact on the detection and management of depression by clinicians. Recommendations to adopt screening strategies using standardized questionnaires without organizational enhancements are not justified.
Depression affects 5%–10% of people,1 but about half of these cases are missed in primary care2 and in general hospital settings.3 The use of screening and case-finding instruments to improve the quality of care for depression has been supported by recommendations from the Canadian Task Force on Preventive Health Care,4 the US Preventive Services Task Force5 and the UK National Institute of Clinical Excellence.6 The potential for these instruments to improve the ability of nonspecialists to recognize and manage depression is substantial but cannot be assumed. Even simple quality-improvement strategies must be supported by evidence of clinical benefit.7
Previous systematic reviews in this area have produced seemingly conflicting results. One review of the use of screening without further enhancement of care reported no overall benefit;8 however, there were substantial differences between the included studies and some suggestion that isolated screening may be effective in certain circumstances. A review by the US Preventive Services Task Force supported screening9 and recommended its use in conjunction with additional enhancement of care. Recommendations by the UK National Institute of Clinical Excellence6 stated that screening should be offered for populations at increased risk of depression, but the research evidence used to support this recommendation was not clear. A more recent review showed that enhanced care for depression (collaborative care) is effective in improving the outcome of depression10 but that the use of screening as a component of multifaceted quality enhancement was not a necessary condition for improved outcomes.
In many health care systems, the use of screening questionnaires in primary care without additional enhancement of care has become the most commonly used quality-improvement strategy for depression care.7 We conducted a systematic review to determine the specific clinical effectiveness of screening and case-finding instruments without additional enhancement of care in improving the recognition, management and outcome of depression. In particular, we sought to distinguish studies that evaluated screening alone from those that included other elements to improve the organization and delivery of care. We also sought to examine whether the effectiveness of screening alone might vary according to circumstance and patient population.
We conducted this review according to the methods recommended by the Cochrane Collaboration.11
We searched the following databases without language restrictions from inception to December 2007: MEDLINE; EMBASE; CINAHL; PsycLIT; EconLIT; BNI/RCN; Cochrane Database of Systematic Reviews; the Trials Register of the Cochrane Depression, Anxiety and Neurosis Group; Cochrane Library; NHS Economic Evaluations Database; and the Database of Reviews of Effectiveness. A detailed example of the strategy used to search MEDLINE is available online (Appendix 1, www.cmaj.ca/cgi/content/full/178/8/997/DC2).
We included randomized controlled trials that investigated screening and case-finding instruments among patients in nonpsychiatric settings (e.g., general hospital or primary care) and that compared the introduction of a routine form of screening or case-finding instrument with usual or routine care. The active intervention involved the addition of standardized depression screening or an outcome-assessment instrument to routine care with information from the assessment given to the clinician. We excluded studies with substantial enhancements in the process of care12,13 (e.g., case managers, nursing interventions, collaborative care) because these interventions consist of enhanced care of which screening is only 1 element.12
The outcomes of interest were the rates of detection or recognition of depression by the clinician indicated by a clear entry in the medical record and the rates of intervention for depression, such as initiation of pharmacological or psychosocial intervention or active referral to specialist care for depression. We also included outcomes of depression, which were classified as short term (< 6 months), medium term (6–12 months) and long term (> 12 months).
All included studies were scrutinized independently by 2 researchers (S.G. with either A.H. or T.S.). We assessed the methodologic quality of the included studies with reference to the method of randomization and allocation concealment.11 In addition, we examined clustering and “unit-of-analysis error” for studies that randomized clinicians or practices rather than individual patients.14 Study inclusion or exclusion, quality assessment and data extraction were first performed by 1 reviewer (S.G.) and independently checked by a second reviewer (A.H. or T.S.). Differences of opinion were resolved by a third reviewer.
We performed a meta-analysis15 of risk ratios for dichotomous variables (recognition and treatment of depression) using a random-effects analysis. To analyze outcomes of depression, we calculated standardized and weighted mean differences for continuous variables, and we transformed dichotomous variables to a standardized mean difference.16 We assessed between-study heterogeneity using the I2 statistic.17 As a guide, an I2 value of 25% is considered low, 50% is considered moderate and 75% is considered high for the likelihood of heterogeneity.17
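To make the pooling step concrete, the following is a minimal sketch of a random-effects meta-analysis of risk ratios together with the I2 statistic. It assumes the DerSimonian-Laird estimator of between-study variance and is not the STATA "metan" routine used in this review. The trial counts and the function names `log_rr` and `random_effects` are hypothetical and serve only as illustration.

```python
import math

def log_rr(events_t, n_t, events_c, n_c):
    """Log risk ratio and its variance for one trial (hypothetical 2x2 counts)."""
    rr = (events_t / n_t) / (events_c / n_c)
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return math.log(rr), var

def random_effects(effects, variances):
    """DerSimonian-Laird pooling: returns pooled effect, its SE and I2 (percent)."""
    w = [1 / v for v in variances]                      # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    w_star = [1 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, i2

# Made-up trials: (events, total) in the screening arm, then in the control arm.
trials = [(30, 100, 20, 100), (45, 150, 40, 150), (12, 60, 5, 60)]
ys, vs = zip(*(log_rr(*t) for t in trials))
pooled, se, i2 = random_effects(ys, vs)
rr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
print(f"RR {rr:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, I2 = {i2:.0f}%")
```

Pooling is done on the log scale and the result is exponentiated, which is why the confidence interval for a risk ratio is asymmetric around the point estimate.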
If substantial between-study heterogeneity was found (I2 > 50%), we examined the following likely sources of clinical heterogeneity that had been defined a priori: clinical setting (general hospital v. primary care); patient population (studies that randomized all patients irrespective of their score on a depression questionnaire [unselected patients] v. those that randomized only preselected patients who exceeded a prestated cut-off point on a questionnaire [high-risk patients]); type of instrument (depression-specific measures [e.g., Beck Depression Inventory]18 v. mixed anxiety and depression measures [e.g., General Health Questionnaire]19 or depression-specific measures embedded within a range of psychiatric subscales [e.g., Primary Care Evaluation of Mental Disorders]).20
For studies with substantial heterogeneity, we explored the causes using meta-regression.21 The amount of heterogeneity explained by these a priori causes was examined by reductions in the I2 inconsistency statistic.
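As a rough illustration of how a "ratio of RR" arises from meta-regression, the sketch below regresses study-level log risk ratios on a single binary covariate (for example, high-risk v. unselected patients) using inverse-variance weights. This is a simplified fixed-effect version: the "metareg" command used in this review also estimates a residual between-study variance, which is omitted here. All data values and the function name `meta_regress` are hypothetical.

```python
import math

def meta_regress(log_rrs, variances, covariate):
    """Weighted regression of log RR on one covariate.

    Returns the slope (log ratio of RRs), its standard error and the
    residual I2 (percent) after accounting for the covariate.
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, log_rrs)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, log_rrs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se = math.sqrt(1 / sxx)
    # Residual heterogeneity: Q around the fitted line, with k - 2 degrees of freedom.
    q_res = sum(wi * (yi - intercept - slope * xi) ** 2
                for wi, xi, yi in zip(w, covariate, log_rrs))
    df = len(log_rrs) - 2
    i2_res = max(0.0, (q_res - df) / q_res) * 100 if q_res > 0 else 0.0
    return slope, se, i2_res

# Made-up study-level data: 1 = high-risk patients, 0 = unselected patients.
log_rrs = [0.05, -0.02, 0.45, 0.60]
variances = [0.02, 0.03, 0.05, 0.04]
high_risk = [0, 0, 1, 1]
slope, se, i2_res = meta_regress(log_rrs, variances, high_risk)
low, high = math.exp(slope - 1.96 * se), math.exp(slope + 1.96 * se)
print(f"ratio of RR {math.exp(slope):.2f} (95% CI {low:.2f} to {high:.2f}), "
      f"residual I2 = {i2_res:.0f}%")
```

Comparing the residual I2 with the I2 from the unadjusted pooled analysis gives the reduction in inconsistency attributable to the covariate.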
Studies that randomized by cluster (clinician or practice) but failed to incorporate this clustering into their analysis (unit-of-analysis error) were re-analyzed by use of an appropriate method14,22 (incorporating an intra-class correlation coefficient of 0.02).23
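One common way to correct a unit-of-analysis error is to shrink the trial's sample size by a design effect before pooling. The sketch below assumes this approximate approach and uses the intra-class correlation coefficient of 0.02 stated above; the mean cluster size and the counts are hypothetical, and the exact method applied in this review may differ in detail.

```python
def design_effect(mean_cluster_size, icc=0.02):
    """Variance inflation caused by randomizing clusters rather than patients."""
    return 1 + (mean_cluster_size - 1) * icc

def effective_counts(events, total, mean_cluster_size, icc=0.02):
    """Scale events and sample size down to their effective values."""
    de = design_effect(mean_cluster_size, icc)
    return events / de, total / de

# Hypothetical arm: 150 patients, 30 recognized cases, about 26 patients per clinician.
eff_events, eff_total = effective_counts(30, 150, mean_cluster_size=26)
print(f"design effect {design_effect(26):.2f}, "
      f"effective events {eff_events:.1f} of {eff_total:.1f}")
```

The risk in each arm is unchanged by this adjustment, but the smaller effective sample widens the confidence interval and reduces the weight the trial receives in the meta-analysis.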
We performed all analyses using STATA (version 8) with the “metan” and “metareg” series of commands.
Of the 11 389 records identified, we selected 59 for further scrutiny. Of these, 16 met our full inclusion criteria (Figure 1; Appendix 2, available online at www.cmaj.ca/cgi/content/full/178/8/997/DC2).24–39 The majority of studies were excluded because they involved a substantial enhancement of care (over and above screening) or were nonrandomized. Appendix 3 presents the excluded studies and the reasons for exclusion (available online at www.cmaj.ca/cgi/content/full/178/8/997/DC2).
Of the included studies, 12 were conducted in primary care settings,24–27,29–31,33,34,37–39 2 in general hospital outpatient settings,28,32 1 in an emergency department35 and 1 in an inpatient setting with elderly patients.36 The sample sizes were between 51 and 2209, and 4 studies26,27,36,38 included a power calculation. Twelve of the studies were performed in the United States.
We identified 2 distinct populations of randomized patients: 9 studies included unselected patients,26–30,32,35,36,38 and 7 studies included high-risk patients.24,25,31,33,34,37,39 Two studies included a greater proportion of elderly patients or were specifically targeted to elderly patients.33,37
The interventions evaluated in these studies involved the feedback of test results to the clinician, generally in the form of a sheet containing summary scores and an explanation of the importance of high scores in terms of the likely presence of a psychological disorder. The control condition was generally the same case-finding instrument administered to the patient but without the results being given to the clinician. Nine studies used depression-specific instruments24,25,27,32–34,37–39 and 7 used less specific measures in which depression was one of a number of psychiatric disorders covered.26,28–31,35,36
All of the included studies were described as randomized; however, few studies gave specific details of either the method of randomization or concealment of allocation. One study used a clustered design37 but failed to account for clustering in the analysis of results, making it susceptible to unit-of-analysis error.40
Eleven studies presented data on the effect of screening or case-finding instruments on the recognition of depression.25–29,32–35,37,38 The rates of recognition were largely established by researchers scrutinizing medical records to see if the physician had made an entry about depression (Appendix 2, available online at www.cmaj.ca/cgi/content/full/178/8/997/DC2).
Visual inspection of the forest plots (Figure 2) and statistical testing demonstrated moderate heterogeneity between studies (I2 = 69%). We performed a random-effects pooling of the results, which showed that screening and case-finding instruments had a borderline positive impact on the rate of recognition of depression by clinicians (11 studies; relative risk [RR] 1.27, 95% CI 1.02 to 1.59). Exploration of the possible sources of heterogeneity showed that the most plausible explanation was the method of scoring and patient randomization (Figure 2). Selection and randomization of patients according to pre-existing scores above a cut-off value (high-risk patients) produced a larger effect size (ratio of RR 1.67, 95% CI 0.89 to 3.16) (Table 1). For unselected patients, screening and case-finding instruments had no effect on depression recognition (RR 1.03, 95% CI 0.85 to 1.24).
We found that the size of the effect reported by studies that used depression-specific rating scales was larger than that reported by studies that used more broadly defined or mixed measures of depression or anxiety, although this was of borderline significance (ratio of RR 0.59, 95% CI 0.33 to 1.04, p = 0.06). The overall effect size was similar in general hospital (RR 1.38, 95% CI 0.79 to 2.43) and primary care settings (RR 1.30, 95% CI 0.99 to 1.70, p = 0.89).
Ten studies presented data on the impact of screening or case finding on the management of depression.24,25,27,28,31–33,35–37 Among these studies, there was a borderline significant difference between the intervention and control groups for any intervention for depression (RR 1.30, 95% CI 0.97 to 1.76, I2 = 81%) (Figure 3). Dividing the studies according to the method of selection reduced the overall level of between-study heterogeneity only partially (from 81% to 61%), and this was not a significant predictor of variation. Studies that randomized high-risk patients showed a larger effect size than studies that randomized unselected patients (meta-regression ratio of RR 1.37, 95% CI 0.64 to 2.94), although this difference was not significant (p = 0.37) (Table 1).
There was no significant effect of screening on the prescription of antidepressants (RR 1.20, 95% CI 0.87 to 1.66, I2 = 83%). There was a larger effect size among studies that randomized high-risk patients compared with those that randomized unselected patients, although this was nonsignificant (meta-regression ratio of RR 1.64, 95% CI 0.55 to 4.89, p = 0.29).
The use of a depression-specific instrument was not related to the rate of intervention for depression (p = 0.29) or to the prescription of antidepressants (p = 0.29). The effect of the intervention was also unrelated to study setting, with feedback being equally ineffective in primary care and general hospital settings (p = 0.58).
Effect of screening or case-finding on depression outcomes
Seven studies24,25,27,30,31,37,38 reported data on the impact of screening or case finding on the outcome of depression over time. Of these studies, 5 provided sufficient data to be pooled24,25,31,37,38 (Appendix 2, available online at www.cmaj.ca/cgi/content/full/178/8/997/DC2). There was no overall impact of screening on depression outcomes (standardized mean difference –0.02, 95% CI –0.25 to 0.20, I2 = 31%) (Figure 4). There was low between-study heterogeneity, which we did not explore.
We found no substantial effect of screening or case-finding instruments on the overall recognition rates of depression, the management of depression by clinicians or on depression outcomes. These findings were true for both primary care and general hospital settings.
The finding that routinely administered screening or case-finding instruments for depression have little impact on the recognition of depression is a robust finding based on several large-scale studies. In a subset of studies that used the more complex 2-stage screening and feedback methods, there was some evidence of improved recognition. A further finding from our exploration of between-study heterogeneity is that depression-specific instruments seem to influence clinicians to a greater extent than less specific instruments, such as the General Health Questionnaire, that measure both depression and anxiety. It would seem that when information is specific and requires little additional computation on the part of the clinician, they may more readily integrate this information into their clinical decision-making process. Our finding that high-risk screening strategies might be more effective than unselected strategies might reflect an implicit decision-making process among clinicians, whereby they are more likely to act on the basis of information when there is strong positive likelihood that the information predicts the presence of a disorder. Among previously unselected patients, the prevalence of depression will be low (< 10%) and the post-test probability will be less than 50%, meaning that a positive screening test will be wrong more often than it is right. In this sense, our findings may reflect the Bayesian processes inherent in many clinical decisions.41 These are areas that deserve further research and may point the way to finding an effective role for screening instruments in nonspecialist settings.
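The Bayesian argument above can be checked with a short calculation of post-test probability (positive predictive value). The sensitivity and specificity used below are hypothetical values for a generic questionnaire, not figures from the included studies, and `post_test_probability` is an illustrative name.

```python
def post_test_probability(prevalence, sensitivity, specificity):
    """Probability of depression given a positive screen (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Assumed test characteristics (hypothetical): sensitivity 0.80, specificity 0.85.
for prevalence in (0.05, 0.10, 0.40):
    ppv = post_test_probability(prevalence, 0.80, 0.85)
    print(f"prevalence {prevalence:.0%}: post-test probability {ppv:.0%}")
```

Under these assumed test characteristics, a positive result at a prevalence of 10% or less leaves the post-test probability below 50%, whereas the same result in a population with a much higher pre-test probability is more likely to be right than wrong.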
Despite our best efforts in summarizing these data, there are several limitations largely related to the primary studies included. First, most studies did not report adequate concealment of allocation or the method of randomization; thus, we could not determine the susceptibility to bias.42 Second, we were unable to account for some of the substantial heterogeneity that remained between studies. Further research should seek to identify other sources of clinical heterogeneity. Lastly, we urge caution in drawing firm conclusions from any suggestive findings based on our exploratory meta-regression analysis, since this involves making observational comparisons within randomized studies and the power of causal inference is therefore reduced.21 These results should be considered hypothesis generating, and further randomized trials are needed to test the robustness of these findings.
Previously, researchers have sought to apply systematic review methods to establish the effectiveness of screening for depression,8,9 but with seemingly contradictory results.43 The results presented in this review should be considered alongside those of a 2002 report by the US Preventive Services Task Force9 and an updated report by the Canadian Task Force on Preventive Health Care based on the US review.4 An Australian “review of reviews” by Hickie and colleagues in 2002 placed great emphasis on the results of the review by the US Preventive Services Task Force, but they did not include subsequent research or reviews included in our review.43 Thus, to explore the reasons for this apparent divergence in results, we need to compare our methods and results with those of the US Preventive Services Task Force review.9 We believe our findings are largely consistent with the reviews by the US Preventive Services Task Force and the Canadian Task Force on Preventive Health Care; however, the following differences should be noted. First, our review updates the reviews by the US Preventive Services Task Force and the Canadian Task Force on Preventive Health Care, and it includes 3 studies that were published after the other 2 reviews24,26,35 and 3 studies that were not included in the US Preventive Services Task Force review.28,29,36 Second, although the results were broadly similar in both settings, our review focuses on screening in any setting, compared with the reviews by the US Preventive Services Task Force and Canadian Task Force on Preventive Health Care, which focused on primary care alone. Third, we excluded 1 study that had been included in both the US Preventive Services Task Force and Canadian Task Force on Preventive Health Care reviews because it did not meet our inclusion criteria.44
The most notable difference is that our review focuses on screening strategies alone and does not include studies in which screening was embedded within wider enhanced-care programs. The results of our review are, therefore, only relevant to stand-alone screening programs, for which we found clear evidence of limited or no benefit. The set of interventions reviewed by the US Preventive Services Task Force and Canadian Task Force on Preventive Health Care included those in which screening was included as a part of enhanced patient care and clinician support (collaborative care and quality-improvement strategies).45–47 Of particular importance was the inclusion of a large US study, the Partners in Care study,47 whose results were strongly in favour of the active intervention (screening with collaborative care), which included face-to-face clinician education; computerized decision support; individualized treatment algorithms; psychotherapy or drug treatment; active support by a case manager; and regular consultation with a specialized mental health clinician (psychologist or psychiatrist). This study accounted for between 30% and 47% of the weighted information in the meta-analyses produced by the US Preventive Services Task Force.9
Complex enhanced collaborative care for depression improves the outcomes of depression, and these packages have been comprehensively reviewed elsewhere.10,13,48 We do not question the effectiveness of these strategies, but it remains unclear whether screening is a necessary component of enhanced collaborative care for depression. From a recent review of the necessary components of collaborative care,10 several other factors emerged as potentially necessary (and statistically significant) components including the use of coordinated patient follow-up, case managers with a mental health background and regular supervision of case managers.10 Thus, the previous reviews by the US Preventive Services Task Force and Canadian Task Force on Preventive Health Care, which mixed studies of enhanced care (some including screening) and screening alone, do not provide reliable evidence on the effectiveness of screening. Many studies that include complex enhancements of care have not used screening as a recruitment strategy, but they have also reported positive results.49–51 Further trials in this area should compare the relative effectiveness of enhancements of care with and without screening.
Our review complements those by the US Preventive Services Task Force and Canadian Task Force on Preventive Health Care, and it enhances our understanding of prior work and the available and more recent evidence. It helps us to better understand the role of screening in general and confirms that screening without other systematic changes to improve depression management is unlikely to improve outcomes. This is of particular importance to policy makers who may have ignored the key recommendations of the US Preventive Services Task Force and Canadian Task Force on Preventive Health Care in the provision of additional management strategies for depression. In practice, screening strategies have often either been recommended for populations at high risk for depression, such as those with chronic illness, or adopted alone and without further enhancements of care.7
See related article page 1023
An earlier version of this review was published in the Cochrane Library in 2005 [Gilbody S, House AO, Sheldon TA. Screening and case finding instruments for depression. 2005;(4):CD002792]. The current paper represents an updated version that incorporates more recent trials, analysis and interpretation.
A French version of this abstract is available at www.cmaj.ca/cgi/content/full/178/8/997/DC1
This article has been peer reviewed.
Contributors: Simon Gilbody was the lead researcher and study guarantor. He, along with Trevor Sheldon and Allan House, contributed substantially to the study conception and design, the acquisition of data, and the analysis and interpretation of data. Each of the authors contributed to the drafting and revision of the manuscript and gave final approval of the version to be published.
Competing interests: None declared.
Correspondence to: Dr. Simon Gilbody, Professor of Psychological Medicine and Health Services Research, Department of Health Sciences, Hull-York Medical School, University of York, YO10 5DD UK; fax 44 1904 321320; sg519/at/york.ac.uk