Screening for emotional distress is becoming increasingly common in cancer care. This systematic review examines the psychometric properties of the existing tools used to screen patients for emotional distress, with the goal of encouraging screening programs to use standardized tools that have strong psychometrics. Systematic searches of MEDLINE and PsycINFO databases for English-language studies in cancer patients were performed using a uniform set of key words (eg, depression, anxiety, screening, validation, and scale), and the retrieved studies were independently evaluated by two reviewers. Evaluation criteria included the number of validation studies, the number of participants, generalizability, reliability, the quality of the criterion measure, sensitivity, and specificity. The literature search yielded 106 validation studies that described a total of 33 screening measures. Many generic and cancer-specific scales satisfied a fairly high threshold of quality in terms of their psychometric properties and generalizability. Among the ultrashort measures (ie, those containing one to four items), the Combined Depression Questions performed best in patients receiving palliative care. Among the short measures (ie, those containing five to 20 items), the Center for Epidemiologic Studies–Depression Scale and the Hospital Anxiety and Depression Scale demonstrated adequate psychometric properties. Among the long measures (ie, those containing 21–50 items), the Beck Depression Inventory and the General Health Questionnaire–28 met all evaluation criteria. The PsychoSocial Screen for Cancer, the Questionnaire on Stress in Cancer Patients–Revised, and the Rotterdam Symptom Checklist are long measures that can also be recommended for routine screening. In addition, other measures may be considered for specific indications or disease types.
Some measures, particularly newly developed cancer-specific scales, require further validation against structured clinical interviews (the criterion standard for validation measures) before they can be recommended.
Transient mood disturbances occur frequently among cancer patients during the disease trajectory, and depression often persists in these patients (1). Consequently, psychosocial counseling has become an integral part of cancer care, and several meta-analyses support its efficacy (2–4). More specifically, behavioral interventions (5–7) and supportive–expressive group therapy (8,9) are effective in reducing emotional distress in cancer patients. These treatments work best for patients with pronounced clinical symptoms of emotional distress (10). To maximize the use of limited treatment resources and provide equitable access to mental health services, emotionally distressed cancer patients need to be reliably identified. Traditionally, referrals for mental health services are either self-initiated or based on physician judgment. However, the concordance rates between patients’ self-report and physicians’ clinical impressions are low, thus identifying a need for standardized validated tools for measuring emotional distress (11,12). The so-called criterion standard clinical assessment interviews for emotional distress, whether standardized (eg, the Composite International Diagnostic Interview [CIDI]) or structured (eg, the Structured Clinical Interview for DSM IV Axis I Disorders [SCID-I]), are time consuming for both the patient and the clinical staff who administer them and are, therefore, costly; their routine implementation in busy clinics is unlikely. Furthermore, patients who are receiving palliative care may not be physically able to complete lengthy diagnostic interviews. Thus, relatively brief but validated questionnaires would seem to be the tools of choice for routine screening of cancer patients’ emotional distress. Brief self-reports are easy to administer, inexpensive (some are even free), and, if properly validated, can help identify those patients most in need of professional mental health support.
A distinct advantage of systematic screening of cancer patients for emotional distress is that it is likely to promote equal access to psychological services, whereas a system that is based only on physician- or patient-initiated referrals might overlook a substantial proportion of emotionally distressed patients who are in need of supportive treatment. Furthermore, systematic screening allows mental health staff to forecast their workload. To date, however, only a minority of cancer centers in the United States (13), the United Kingdom (14), and Canada (15) have implemented emotional distress screening of patients with standardized tools. Time constraints of health professionals and insufficient knowledge about the appropriate screening tool may partially account for the infrequent use of high-quality screening instruments in cancer care settings. The widely acknowledged shortage of professional staff for treatment follow-through suggests a need for screening tools with high sensitivity and high specificity that ensure that all patients in need of psychological support are identified. We posit that the choice of a screening tool ought to consider the psychometric properties of the instrument, with special emphasis on its sensitivity and specificity, the treatment environment, and the patient's disease stage.
Psychological measures (which in this review are referred to as scales, tools, instruments, and measures) come in varying lengths and formats. One important distinguishing feature for various scales is their length, which is defined by the number of questions or test items they contain; the term “screening tool” usually refers to particularly short tests. Longer tests cost more money to administer but are sometimes needed to reach acceptable levels of reliability and validity. The advantages and disadvantages of screening tools of varying lengths are summarized in Table 1. We created the following length categories according to the number of items in a measure: ultrashort (one to four items), short (five to 20 items), and long (21–50 items); these cut points were chosen arbitrarily before data extraction and review. Ultrashort measures are typically limited to one psychological domain, such as depression or anxiety, and are the easiest to implement in routine care settings; however, they may not be appropriate for use in research settings. Their brevity presents a potential economic advantage because fewer staff resources are required for their administration and scoring. As one meta-analysis (16) demonstrated, ultrashort screening tools can possess adequate sensitivity to identify distressed patients but lack the specificity to rule out those patients who were wrongly identified as distressed (ie, false positives). Test instruments that contain more than four items can assess more aspects of emotional impairment and may possess superior psychometric properties. The trade-off is that routine use of longer tools, particularly their scoring and interpretation, places more of a burden on staff time. However, the availability of touch screen computer–based assessments can eliminate this disadvantage because the computer program can automatically score the assessment tool and generate a report.
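The length categories defined above can be expressed as a simple lookup. The following is a minimal illustrative sketch, not part of the review's methods; the function name is hypothetical, and the item counts in the usage loop are those stated elsewhere in this review.

```python
def length_category(n_items: int) -> str:
    """Return the review's length category for a measure with n_items items.

    Cut points follow the text: ultrashort (1-4), short (5-20), long (21-50).
    """
    if n_items < 1:
        raise ValueError("a measure needs at least one item")
    if n_items <= 4:
        return "ultrashort"
    if n_items <= 20:
        return "short"
    if n_items <= 50:
        return "long"
    # Measures with more than 50 items fall outside the review's inclusion criteria.
    raise ValueError("measures with more than 50 items were not included")


# Item counts as given in the text of this review.
for name, n in [("Distress Thermometer", 1), ("HADS", 14), ("CES-D", 20), ("BDI", 21)]:
    print(name, length_category(n))
```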
Included in this review are both newly developed and well-established distress screening tools that have been validated in patients with cancer. To the best of our knowledge, this systematic review is the most comprehensive review of screening instruments for emotional distress in cancer patients to date. In this review, we define distress as a state of negative affect that is suggestive of affective disorders (ie, minor or major depressive disorder and dysthymia), anxiety disorders, and adjustment disorders (depressive, anxious, or mixed). Measures of related domains (eg, physical symptom distress, lack of social support, quality of life, and patient needs) were excluded.
The data extraction and study review processes were performed according to the guidelines for systematic reviews of diagnostic tests in cancer (17). We searched MEDLINE (1966 to August 2008) and PsycINFO (1872 to August 2008) databases for English-language studies in cancer patients by using the following search terms: (cancer OR screening OR instrument OR measure OR questionnaire OR validation) AND (distress OR depression OR anxiety OR adjustment disorder OR negative affect OR psychological). After eliminating the duplicate studies, the titles and abstracts of the remaining studies were reviewed independently by two authors (A. Vodermaier and C. Siu) (Figure 1). These authors also reviewed the full-length article for all studies that were retained, and their interrater reliability was computed as a kappa coefficient (κ = .86). Disagreements about whether or not studies met the inclusion criteria were resolved by seeking additional input from the second author (W. Linden). The first author (A. Vodermaier) performed a detailed assessment of the included studies and identified additional validation studies via cross-referencing.
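For readers unfamiliar with the kappa coefficient, the sketch below shows how Cohen's kappa can be computed from two reviewers' include/exclude decisions. It assumes the standard two-rater formula; the decision lists are hypothetical and do not reproduce the data behind the reported κ = .86.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of studies")
    n = len(rater_a)
    # Observed proportion of studies on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    labels = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(lab) / n) * (rater_b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical decisions for ten abstracts (not data from this review).
reviewer_1 = ["include"] * 4 + ["exclude"] * 6
reviewer_2 = ["include"] * 3 + ["exclude"] * 7
print(round(cohen_kappa(reviewer_1, reviewer_2), 2))
```

In this made-up example the reviewers disagree on one abstract out of ten, so raw agreement is .90, but kappa is lower because part of that agreement would be expected by chance.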
A study was included in this review if it attempted to validate a newly developed cancer-specific questionnaire (either interviewer administered or standardized self-administered) or reported on an existing generic measure that had also been validated in a sample of cancer patients. The measure could not exceed 50 items and must have been published in a peer-reviewed English-language journal. We focused on published peer-reviewed studies because we expected them to be the most methodologically rigorous, thus yielding the strongest conclusions with regard to recommendations about tool choice.
Studies included in this review were evaluated on the basis of the following criteria: the number of validation studies identified, the number of participants across studies, generalizability across cancer types and/or disease stages, reliability, type of the criterion measure (in which structured clinical interviews such as Composite International Diagnostic Interview or SCID represent the criterion standard), and validity. When information on sensitivity, specificity, positive predictive value, or negative predictive value was partly missing but could be computed on the basis of other data presented, we completed these computations.
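The kind of computation referred to above can be illustrated with a 2 × 2 table of screening result by criterion diagnosis. This is a minimal sketch using the standard definitions; the class name and the counts are hypothetical, and the review does not specify exactly which reported data were used to derive missing values in each study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of screening result versus criterion diagnosis."""
    tp: int  # screen positive, criterion positive
    fp: int  # screen positive, criterion negative
    fn: int  # screen negative, criterion positive
    tn: int  # screen negative, criterion negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule when only rates are reported."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical study: 200 patients, 50 with a criterion diagnosis.
table = TwoByTwo(tp=40, fp=30, fn=10, tn=120)
print(table.sensitivity(), table.specificity(), table.ppv(), table.npv())
```

The second function shows that the predictive values can also be recovered when a study reports sensitivity, specificity, and prevalence but not the raw counts.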
Using the recommendations of statistical experts (18), we required an internal consistency of .8 or higher for a screening instrument to warrant a designation of high quality. Internal consistency was usually reported in the included studies as the Cronbach alpha estimate (19) or as the Spearman–Brown rho coefficient (19). Reliability data were available for the generic scales included in this systematic review. For newly developed cancer-specific scales whose psychometric properties have not yet been established, internal consistency must be reported before their reliability as a screening tool can be judged adequate. Unless subscale reliabilities were specifically reported, the Cronbach alpha or Spearman–Brown rho represents the internal consistency for the entire scale. Test–retest reliability was considered a less important index of reliability than the scale's internal consistency because mood in cancer patients is known to be unstable and a function of where the patient is in the illness trajectory (20). Information on test–retest reliability or sensitivity to change was included in the description of studies when it was available.
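As an illustration of the internal consistency criterion, the following sketch computes the Cronbach alpha from item-level scores using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), and compares it with the .8 threshold. The response matrix is hypothetical, and the function and constant names are assumptions made for this example.

```python
from statistics import variance

HIGH_QUALITY_ALPHA = 0.8  # threshold for high quality used in this review


def cronbach_alpha(responses):
    """Cronbach alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of the total (summed) score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores of four respondents on a two-item scale.
responses = [[1, 1], [2, 3], [3, 2], [4, 4]]
alpha = cronbach_alpha(responses)
print(round(alpha, 3), alpha >= HIGH_QUALITY_ALPHA)
```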
We assumed that the typical screening measures included in this systematic review already have face and content validity. Therefore, this review focused on information about concurrent, construct, and discriminant validity. Concurrent validity is the extent to which a test agrees with other tests of the same target variable (eg, a new anxiety test compared with established anxiety tests). Construct validity is the agreement between a theoretical concept and a specific measure. For example, a researcher developing a depression scale will first make a concerted effort to define depression so that the new test actually captures the target variable of depression. Regarding quality of validation, we posit that the most important criterion is whether or not a screening tool has empirically validated cutoffs based on clearly identified sensitivity and specificity data. Hence, in this review, we placed the greatest emphasis on the results of receiver operating characteristic (ROC) analyses that provided empirically justified cutoffs for clinical decision making (ie, discriminant validity). The ROC curve is a graphical plot of the sensitivity vs 1 minus the specificity that provides information needed for choosing a useful cutoff. For this review, a tool was considered to have high validity if the average of its sensitivity and specificity estimates was .80 or higher. We searched for evidence of predictive validity in particular but could not find any study that was suitable to be included in the review.
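The logic of an ROC analysis and of the .80 validity criterion can be sketched as follows. The scores and diagnoses are hypothetical, the function names are assumptions, and the cutoff is selected here with the Youden index (sensitivity + specificity − 1), which is one common convention; the validation studies summarized in this review may have used other selection rules.

```python
def roc_points(scores, is_case):
    """Return (cutoff, sensitivity, specificity) for every observed score.

    A patient screens positive when the score is at or above the cutoff.
    """
    cases = [s for s, c in zip(scores, is_case) if c]
    noncases = [s for s, c in zip(scores, is_case) if not c]
    if not cases or not noncases:
        raise ValueError("need at least one case and one noncase")
    points = []
    for cutoff in sorted(set(scores)):
        sensitivity = sum(s >= cutoff for s in cases) / len(cases)
        specificity = sum(s < cutoff for s in noncases) / len(noncases)
        points.append((cutoff, sensitivity, specificity))
    return points


def best_cutoff(points):
    """Pick the point with the largest Youden index (an assumed convention)."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


def has_high_validity(sensitivity, specificity, threshold=0.80):
    """Criterion from the text: mean of sensitivity and specificity >= .80."""
    return (sensitivity + specificity) / 2 >= threshold


# Hypothetical 0-10 screening scores: 8 criterion-positive, 10 criterion-negative.
scores = [4, 5, 6, 7, 8, 3, 9, 5, 0, 1, 2, 2, 3, 1, 4, 5, 0, 3]
is_case = [True] * 8 + [False] * 10

cutoff, sens, spec = best_cutoff(roc_points(scores, is_case))
print(cutoff, sens, spec, has_high_validity(sens, spec))
```

Plotting 1 − specificity against sensitivity for each returned point gives the ROC curve described above.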
Our evaluation of the validation studies used decision rules that are summarized in Table 2. The results of individual studies were averaged for each measure, with each study weighted by its number of participants, to assess overall reliability, type of criterion measure, and validity. Reliability, type of criterion measure, and validity were rated as high, moderate, or low. These three ratings were condensed into a five-level overall judgment (excellent, good, moderate, fair, or poor) according to the decision rules described in Table 2. The overall judgment was “poor” if any of the three criteria was rated as low, reliability was not reported, no ROC analysis data were available, or the number of participants in a validation attempt was below a threshold of 100 when self-report scales were used as the criterion measure or below 50 when structured clinical interviews were used. Given that generalizability is not of general importance for screening tool choice, this criterion did not influence the overall judgment.
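The weighting and the conditions for a “poor” judgment described above can be sketched in code. This example covers only the rules stated in the text; the full mapping to the other four levels depends on Table 2, which is not reproduced here. All names and example values are hypothetical.

```python
def weighted_mean(values, n_participants):
    """Average study results, weighting each study by its sample size."""
    if len(values) != len(n_participants) or not values:
        raise ValueError("need one sample size per study result")
    total = sum(n_participants)
    if total <= 0:
        raise ValueError("total number of participants must be positive")
    return sum(v * n for v, n in zip(values, n_participants)) / total


def is_poor(reliability, criterion, validity, has_roc_data, n, criterion_is_interview):
    """Apply the stated conditions for an overall judgment of 'poor'.

    Ratings are 'high', 'moderate', or 'low'; reliability is None if unreported.
    """
    minimum_n = 50 if criterion_is_interview else 100
    return (
        reliability is None
        or "low" in (reliability, criterion, validity)
        or not has_roc_data
        or n < minimum_n
    )


# Hypothetical sensitivities from two studies of the same measure.
print(weighted_mean([0.90, 0.70], [300, 100]))
# Sample of 80 is adequate against an interview but not against a self-report scale.
print(is_poor("high", "high", "moderate", True, 80, criterion_is_interview=True))
print(is_poor("high", "moderate", "moderate", True, 80, criterion_is_interview=False))
```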
The literature search identified 2747 publications. A total of 1416 studies remained after duplicate studies were removed. The decision steps are detailed in Figure 1. Data extraction and additional articles found via checks of cross-references resulted in 106 validation studies that described a total of 33 measures.
Table 3 provides the summary judgments for the screening tools based on the predefined evaluation criteria. The key data for each study were extracted and are presented in Tables 4, 5, and 6. Table 4 presents the validation studies on questionnaires that contain one to four items, Table 5 describes those containing five to 20 items, and Table 6 covers those with 21–50 items. When a non–English-speaking country is noted in the “sample” column, it refers to a version of the scale that was translated according to standard forward and backward translation procedures [except in one study (125), where this procedure was not used]. The Brief Symptom Inventory–18 (BSI-18) (127), the BSI-53 Global Severity Index (128), the Center for Epidemiologic Studies–Depression Scale (CES-D) (129), the General Health Questionnaire–12 (GHQ-12) (130), the Hospital Anxiety and Depression Scale (HADS) (131), the Patient Health Questionnaire–9 (PHQ-9) (132), the Symptom Checklist-90–Revised (133), and the State–Trait Anxiety Inventory–trait version (134) were used as criterion measures; the BSI-18, CES-D, and HADS were also used as screening tools.
A total of 29 studies examined the use of ultrashort screening instruments (Table 4). The majority of these ultrashort measures were validated for use in patients with advanced cancer.
The single-item question “Are you anxious?” (21) was studied as a screening tool for emotional distress in palliative care patients and showed insufficient specificity to rule out nonanxious patients. The anxiety subscale of the HADS (131) was used as the criterion measure.
The Brief Case Find Depression is a four-item scale that was validated against the Primary Care Evaluation of Mental Disorders in a small sample of cancer patients (22). Its interrater reliability was low. The measure had moderate specificity and performed worse than the HADS and the Beck Depression Inventory (BDI) (135) in ruling out nondepressed patients.
Lengthy questionnaires may be especially burdensome for patients in palliative care. For this reason, several studies have tested single questions from diagnostic interviews against structured clinical interviews as a screening method for depressive disorders in palliative care patients. Altogether, seven studies (21,23–28) examined the psychometric properties of single screening questions; four of these studies (23,24,26,27) tested the single question against a structured clinical assessment of the diagnosis. Three studies (23,26,28) also examined the combination of the two screening questions (hereafter referred to as the Combined Depression Questions) that represent the first and second diagnostic criteria for a depressive disorder. The first criterion question, “Are you depressed?”, yielded perfect sensitivity and specificity to detect any kind of depressive disorder and outperformed the BDI and the visual analog scales in one study (23). However, in several other studies, it had low sensitivity to detect any affective disorder (24–27), whereas its sensitivity to detect a major depressive disorder was high across all studies. The second criterion question, “Have you lost interest?”, showed the same pattern as the first in that it was much less sensitive in detecting minor disorders, such as adjustment disorder, than in detecting major depressive disorder (26). The combined screening questions did not increase the specificity compared with each individual question but increased the sensitivity (26,28).
An alternative screening tool, the one-question interview, was developed by Akizuki et al. (29) and asked patients to “Please grade your mood during the past week by assigning it a score from 0 to 100, with a score of 100 representing your usual relaxed mood. A score of 60 is considered the passing grade.” The measure had comparable psychometric properties to the HADS (131) and the National Comprehensive Cancer Network Distress Thermometer (DT) (136), but its criterion validity was low.
The National Comprehensive Cancer Network DT was introduced more than a decade ago (136) and measures overall emotional distress with one item on an 11-point rating scale (from 0 = no distress to 10 = extreme distress). Although domain-specific distress can be measured with a complementary problem list that asks whether problems exist in practical, familial, emotional, physical, or spiritual domains, most of the studies included in this review provided psychometric information only on the DT itself. Altogether, 15 validation studies (29–43) examined the DT. Of these, eight studies (33,34,36,38–40,42,43) used the HADS as the criterion measure, four studies (31,35,37,41) used exclusively other distress or depression scales, and two studies (29,32) relied on clinical diagnosis to assess the validity of the DT. The DT scale was tested in populations of cancer patients with mixed diagnoses and disease stages, breast cancer patients, and patients awaiting bone marrow transplantation.
Two studies (31,43) provided information on the internal consistency of the problem list. Its overall reliability was good but was insufficient for some of the subscales. Sensitivity to change has been shown in one study (40): changes in DT scores at 4 and 8 weeks were comparable to changes in the criterion measures’ scores. However, the interrater reliability, that is, the congruence of patient self-report compared with nurses’ judgments, tested in one study (29), was moderate. Nurses seemed to underestimate the actual distress of the patients (29). Taken together, criterion measures were weak to moderate, and most studies demonstrated moderate specificity for the DT.
The optimal cutoff for identifying clinically significant distress in most studies was defined as 4 or 5, depending on the diagnostic criteria or the validation measures used. Compared with nondistressed patients, distressed patients reported more problems on the problem list (34,35), had lower Eastern Cooperative Oncology Group performance status (34,35), and were more likely to be female (34,38).
Several modifications and extensions of the DT have been developed, including two-item screening tools that combine the DT with an impact thermometer, which asks patients about the impact of distress on their daily life activity (32), and with a mood thermometer (33). Both alternatives have been tested in comparison with the DT and demonstrate better psychometric properties.
Two studies (21,44) examined subscales of the Edmonton Symptom Assessment System (137) that measure anxiety and depressive symptoms in comparison with the HADS. The Edmonton Symptom Assessment System was developed to assess symptom distress in palliative care patients. The scale demonstrated moderate validity as a screening tool for emotional distress in palliative care patients.
Six studies (23,45–49) examined the validity of visual analog scales that were derived from the Memorial Pain Assessment Card mood subscale (138) as screening tools for emotional distress in various populations of cancer patients. One study (46) reported a moderate correlation between patients’ self-reported distress and the distress levels rated by their physicians. Another study (49) that compared several screening instruments with structured clinical interviews provided evidence that visual analog scales performed worse than other screening measures.
Most of the screening measures that have been validated for use in cancer patients have between five and 20 items. Altogether, 72 studies described 15 screening instruments of this length (Table 5).
The BDI–Short Form is a widely used depression scale that consists of 13 items (139). Two studies (23,50) examined the psychometric properties of this scale in populations of patients with advanced cancer. The BDI–Short Form demonstrated low interrater reliability and moderate specificity.
The BSI-18 is a self-report scale that was designed to assess clinically relevant psychological symptoms (127). The scale was tested against its two long forms, the BSI-53 Global Severity Index (128) and the Symptom Checklist-90–Revised (133). With these criterion measures, the BSI-18 demonstrated excellent reliability and validity in a large mixed sample of cancer patients with a sensitivity and specificity of .91 and .93, respectively (52), and in adult survivors of childhood cancer with a sensitivity and specificity of .97 and .85, respectively (53). Internal consistency was high for the anxiety and depression subscales (54,55). Results of a factor analysis confirmed the scale's three-factor structure (ie, depression, anxiety, and somatization) (55).
The CES-D (129) is a 20-item depression measure that has been validated in mixed samples of cancer patients and reference groups of healthy control subjects (56–59). Results from factor analyses (57) suggested that the negative affect subscale of the CES-D was a better measure of depression than the CES-D total score. The CES-D demonstrated good internal consistency (56,57,59). Two studies (58,59) provided information on the scale's sensitivity and specificity and revealed that it has very good psychometric properties.
The Edinburgh Postnatal Depression Scale, a 10-item scale that was initially developed to screen for postpartum depression in new mothers, measures guilt, worthlessness, and hopelessness (140), which are symptoms that may also discriminate between depressed and nondepressed patients with advanced cancer. This scale was examined as a screening tool for depression in patients with advanced cancer (25,51,60,61) and tested against a structured clinical interview as the criterion (25,51,60). The sensitivity and specificity of the Edinburgh Postnatal Depression Scale were adequate, and it performed better than the HADS in this population. The Edinburgh Postnatal Depression Scale also demonstrated good internal consistency and interrater reliability (25,60,61). A short form of the Edinburgh Postnatal Depression Scale, the six-item Brief Edinburgh Depression Scale, had psychometric properties that were comparable to those of the original scale (51).
The GHQ-12 (130) was tested as a screening tool for psychological distress in two studies (62,63) and compared with the HADS. Both studies demonstrated that the psychometric properties of the GHQ-12 were adequate but inferior to those of the HADS in samples of patients with advanced cancer.
The HADS is a 14-item questionnaire that assesses anxiety and depressive symptoms in medical settings (131). A total of 41 of the identified validation studies of screening tools used to detect psychological distress in cancer patients were conducted on the HADS or compared its psychometric properties with other scales (22,26,29,32,33,46,49,50,58,62–93). Ten studies (33,66,67,73,79,81,86–89) tested whether or not the known two-factor structure of the HADS (which corresponds to the anxiety and depression subscales of the questionnaire) could be replicated in samples of cancer patients. Most of those studies (33,66,73,81,87–89) did replicate the two-factor structure of the HADS in cancer patients. Two studies (67,86) yielded a three-factor solution and one study (78) a four-factor solution. Smith et al. (81) demonstrated in a very large sample of cancer patients that the two-factor structure was stable across subsamples that were stratified by age, sex, and disease stage. The internal consistency of each subscale and of the total scale was shown to be adequate (33,66,67,72,73,78,82,86,88), and the scale was shown to be sensitive to change (72) in cancer patients.
Twenty-six studies (22,26,49,50,58,62–65,68–70,72–81,83,89,91,93) examined the discriminant validity of the HADS by comparing it with structured clinical assessments such as the SCID, Present State Examination, Clinical Interview Schedule, Clinical Interview Schedule–Revised, Psychiatric Assessment Schedule, Monash Interview for Liaison Psychiatry, Schedule for Affective Disorders and Schizophrenia, Schedule for Clinical Assessment in Neuropsychiatry, Composite International Diagnostic Interview, Primary Care Evaluation of Mental Disorders, and the Diagnostic Interview Schedule. Ten studies (49,58,62,65,70,72,73,75,83,91) showed that the screening performance of the HADS was high, 14 studies (22,26,63,64,68,69,74,76–79,81,89,93) showed moderate performance, and two studies (50,80) reported low screening performance.
One study (69) reported that the HADS performed better in patients who were disease free or who had stable disease than in patients in acute treatment or with advanced disease. The HADS failed as a screening instrument in patients newly diagnosed with breast cancer (80).
In some studies (65,74,93), the anxiety subscale of the HADS performed better than the depression subscale. Other studies demonstrated that the HADS total score had psychometric properties that were comparable (65) or superior (49) to those of the anxiety or depression subscales. We were disconcerted to find that cutoffs for distinguishing anxious or depressed patients from nonanxious or nondepressed patients differed widely across studies and that this variability had not been justified. The cutoffs for the HADS total score ranged from 8 to 22 and for the subscale scores from 5 to 11.
The Hornheide Questionnaire Short Form is a nine-item questionnaire that was validated in 122 German patients with head and skin cancer following surgery and had high internal consistency (α = .81) (141). One study (49) compared different screening measures in a sample of German patients with laryngeal cancer and found that the psychometric properties of the Hornheide Questionnaire Short Form and of the other instruments were inferior to those of the HADS.
The Impact of Event Scale was originally developed as an instrument to measure posttraumatic stress and is a 15-item scale that is widely used to assess emotional distress in cancer patients (142). One study (94) examined the discriminant validity of the Impact of Event Scale to detect adjustment disorder in patients undergoing bone marrow transplantation and found that this scale had inadequate specificity for use as a screening tool in this population. Other studies (95,96) did not provide further evidence for recommending the Impact of Event Scale as a distress screening tool in cancer patients.
The Memorial Anxiety Scale for Prostate Cancer is an 18-item scale that was developed for use in prostate cancer patients and consists of three subscales: prostate cancer anxiety, prostate-specific antigen anxiety, and fear of recurrence (97). Except for the prostate-specific antigen anxiety subscale, the Memorial Anxiety Scale for Prostate Cancer has good internal consistency. Preliminary results of the scale's validity have been reported (97,98), but clinical cutoffs have yet to be established. The Memorial Anxiety Scale for Prostate Cancer was also validated for use in men undergoing prostate biopsy (99).
The Psychological Distress Inventory (100) is a 13-item scale that was developed to measure distress in Italian breast cancer patients. Its reliability and validity indices are good (79,100). The discriminant validity of the Psychological Distress Inventory was tested against a structured clinical interview as the criterion, and cutoffs of 28 (79) and 29 (100) have been considered clinically significant. However, its use is limited to Italian-speaking patients.
The Patient Health Questionnaire–9 (PHQ-9) measures depressive symptoms according to Diagnostic and Statistical Manual of Mental Disorders–Fourth Edition (143) criteria. The PHQ-9 was validated in a large sample of primary care and obstetrics and gynecology patients and found to have strong psychometric properties (132). The PHQ-9 also demonstrated adequate reliability as well as concurrent and divergent validity in a small study of head and neck cancer patients (101) and in a study that used a touch screen computerized version of the questionnaire (102). However, information on the scale's sensitivity and specificity with regard to clinical decision making in cancer patients is lacking.
The Post Traumatic Stress Disorder Checklist-Civilian Version (144) was tested as a measure of posttraumatic stress in breast cancer patients (103) and in survivors of bone marrow transplantation (104,105). The latter two studies (104,105) examined the scale's construct validity and demonstrated that it had high reliability and a four-factor structure. In a sample of breast cancer patients, the measure showed moderate sensitivity but high specificity to detect posttraumatic stress disorder (103).
One study evaluated the Profile of Mood States–Linear Analog Self-Assessment (106) as a screening instrument in cancer patients with mixed diagnoses and for patients with different stages of disease and compared it with the original Profile of Mood States and the Symptom Checklist-90–Revised. The measure demonstrated sensitivity to change and concurrent validity. However, not enough data are available on its psychometric properties to recommend its use in clinical decision making.
The Zung Self-Rating Depression Scale is a 20-item questionnaire that evaluates depression (145). Six studies (94,107–111) reported information on the scale's psychometric properties in cancer patients. A 13-item short form of this scale is highly correlated (r = .92) with the long form (110). Although the scale (long form) had high reliability, it demonstrated low concordance rates with physician ratings of depression (107) and moderate validity (94,110) when used for cancer patients. Also, the short-form scale was found to have inadequate sensitivity compared with the long-form scale (110).
Nine scales, each with more than 20 items, were identified for screening cancer patients (Table 6).
One small study of cancer patients (59) examined the psychometric properties of the Beck Anxiety Inventory (146). The study provided evidence that the Beck Anxiety Inventory can be a valid measure to screen cancer patients for emotional distress, but there is not enough validation information available to justify a recommendation at this time.
Five studies (22,58,59,71,112) examined the psychometric properties of the 21-item BDI (135), and all but one (112) provided data from ROC analyses. One study (22) showed that the scale possessed low sensitivity, whereas the other studies demonstrated that it had excellent sensitivity and specificity to detect any depressive disorder.
The Distress Inventory for Cancer (113) was developed for use in head and neck cancer patients. To our knowledge, the only information available to date is on the scale's construct validity, and more studies on the scale's discriminant validity are necessary before a recommendation is possible.
The GHQ-28 (130) was tested as a screening tool for psychological distress in two studies (69,114), where it demonstrated high sensitivity and specificity to detect cancer patients with psychiatric symptoms.
The Mood Evaluation Questionnaire (147) is a 23-item measure that demonstrated excellent internal consistency but only moderate agreement with SCID interview data (115). Its discriminant validity was adequate (115). The Mood Evaluation Questionnaire has been used for repeated assessments in patients with advanced cancer (116).
One study (117) provided information about the construct validity of the Profile of Mood States–Short Form (148) for patients awaiting bone marrow transplantation. A factor analysis identified six factors that provided evidence for construct validity. The internal consistency of the subscales was high, with Cronbach alpha values that ranged from .78 to .90 (117). To date, there is insufficient information on this scale's validity to make recommendations for its implementation in routine screening.
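For readers less familiar with the internal consistency statistic cited above, the following is a minimal sketch of how Cronbach's alpha is computed from item-level responses. The function name and the response data are hypothetical illustrations and are not taken from any of the reviewed studies.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha, where items[i][j] is respondent j's answer to item i."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total scale score for each respondent (sum across items).
    totals = [sum(responses) for responses in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: 3 items answered by 5 respondents.
example_items = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 4, 5],
    [2, 2, 4, 5, 5],
]
alpha = cronbach_alpha(example_items)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items of a subscale vary together, which is the sense in which the reported range of .78 to .90 is described as high.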
The 21-item Psychosocial Screen for Cancer was developed in mixed samples of cancer patients, and its psychometrics are good (118,119). The scale assesses six domains: depressive symptoms, anxiety symptoms, quality of life (global), quality of life (number of days impaired), perceived social support, and social support desired. The anxiety and depression subscales were highly sensitive and specific when compared with the HADS. In addition, normative data exist that compare different samples of cancer patients with healthy control subjects and with a control group of persons with a chronic disease other than cancer (119). Specificity data suggest the use of a cutoff of 11 for screening of an anxiety or depressive disorder and a cutoff of 8 for screening of anxiety and depressive symptoms.
The Questionnaire on Stress in Cancer Patients–Revised is a 23-item validated scale that was developed in a large sample of German patients with diverse cancer diagnoses (120,121). It consists of five subscales that measure psychosomatic symptoms, anxiety, information gaps, impairments in everyday life, and social distress. The scale is highly sensitive and moderately specific in detecting anxiety and depressive symptoms compared with the HADS. However, its use is limited to German-speaking patients because, to our knowledge, no psychometric information exists on its translation into English (121).
The Rotterdam Symptom Checklist (RSCL) is a 30-item questionnaire that has been used extensively in clinical trials (122). Although some studies showed a four- (122,123) or five-factor structure of this scale (124), a two-factor psychological and composite somatic structure has also been suggested (122–126). The psychological subscale demonstrated stability across subsamples as well as high internal consistency (122,124). Three studies provided information from ROC analyses: Two studies (65,69) reported that the RSCL had moderate psychometric properties for use as a screening tool, and one (74) found that the RSCL failed as a screening tool because of its low sensitivity. The RSCL was superior to the HADS in two studies (65,69) for samples of patients with progressive disease. Three studies reported on the psychometric properties of non-English [French (126), Italian (123), and Spanish (125)] versions of the questionnaire and showed results congruent with the original report, thus providing evidence for its use in cross-cultural settings. One study (149) reported only on an extension of the physical symptom scales of the RSCL and, therefore, was not included in this systematic review.
We have provided extensive details on tool psychometrics, as well as details on types of tools and extent of validation, to guide clinicians’ own choice of an assessment instrument for routine emotional distress screening. Making recommendations about which screening tools should be used depends on the context in which tools are going to be implemented and the intended objectives that may vary across settings and users. The following recommendations were based on composite quality criteria that we defined using transparent decision rules (Table 2).
Among ultrashort measures, the two-item combination depression questions had the best psychometric properties. The widely used DT had been subjected to the most validation studies on the largest patient samples but was not validated against a structured clinical interview with established sufficient psychometrics. For the DT, the sensitivity and specificity findings were lower than 80% in about half and two-thirds, respectively, of the validation studies. However, some evidence suggests that modifications of the DT, such as the Mood Thermometer (33), or expansions, such as the Impact Thermometer (32), may represent improvements over the original scale.
Our findings regarding ultrashort measures differ in part from the results of other meta-analyses and reviews on screening tool validity. Meta-analyses (16,150) as well as studies in primary care (151,152) have demonstrated a lack of specificity in ultrashort measures (including the DT) for identifying depression. However, our results reveal that this criticism does not apply to the combination depression questions as these were found to demonstrate high specificity.
When it comes to ultrashort measures, patients have reported that a single-item interview format did not accurately describe or capture their mood (38,116). In line with these findings, Ohno et al. (153) reported that 65% of patients responded to the question “Are you depressed or not?” with “neither,” which indicates their uncertainty when rating emotional distress with such a simple question, even though their HADS scores suggested that they had clinical depression. Furthermore, agreement between ultrashort and longer measures in identifying distressed patients detected by structured clinical interviews was poor (115). Problems with determining the face validity of single-item measures as well as patients’ difficulty with scaling on single-item screening tools could explain these discrepant findings. Consequently, further comparison studies investigating tools of different lengths should be conducted.
Among the short measures, we can recommend the CES-D as a screening tool for depression because it met all criteria for quality. The most extensive validation existed for the HADS, and this was the case across disease types and stages as well as across languages and cultures. The scale has been extensively tested against criterion standards.
Note that many other tools relied on the HADS for discriminant validation. Studies that compared the discriminant validity of the HADS against other scales found that the HADS was superior (26,49,58,62,63) or equivalent (65,69) to other measures. As for whether to use the total score or the subscale scores of the HADS, several studies showed that the total score was superior in nonpsychiatric patients (49,65,154).
The BSI-18 and the GHQ-12 are short measures that also demonstrated good psychometric properties. Nevertheless, ROC analyses of the BSI-18 were based on comparisons of short form with the long form of the same instrument and do not, therefore, represent independent validation (52,55). In addition, the GHQ-12 consistently performed worse than the HADS (62,63). Nonetheless, both scales have also been used as criterion measures in validation attempts of other scales.
The Post Traumatic Stress Disorder Checklist-Civilian Version, the Psychological Distress Inventory, and the Hornheide Questionnaire Short Form are short measures that demonstrated adequate psychometric properties. However, their use to date is limited to specific cancer types or language applications. For patients receiving palliative care, the Edinburgh Postnatal Depression Scale or its short form, the six-item Brief Edinburgh Depression Scale, demonstrated adequate psychometric properties. Because of the strong psychometric properties of the PHQ-9 in large samples of primary care and obstetrics and gynecology patients (132), this scale deserves further empirical evaluation of its value for distress screening of cancer patients.
Among the long measures, the BDI and the GHQ-28 met all quality criteria. The Psychosocial Screen for Cancer has not been validated against a structured clinical interview but otherwise met all criteria. In addition, the Psychosocial Screen for Cancer provides information on the social support that a patient desired and actually received, which may also guide decision making in psycho-oncological follow-up. The Questionnaire on Stress in Cancer Patients–Revised was validated in a large sample of cancer patients and provided good psychometric properties. The existing English version of the scale, therefore, deserves recommendation as a screening tool for emotional distress in cancer patients. Finally, the RSCL is a long measure that demonstrated adequate psychometric properties for distress screening.
Cancer-specific tools may provide more relevant information than generic scales on patients with a specific type of cancer; however, some of these tools, such as the Memorial Anxiety Scale for Prostate Cancer (97), require additional validation. Furthermore, the routine use of cancer-specific tools is particularly likely to be implemented in specialized centers such as those that treat breast or prostate cancer patients. Facilities that treat patients with a broader disease spectrum may benefit most from a screening tool that can be applied to a mixed patient population, such as well-established scales including the BDI, the CES-D, the GHQ-28, or the HADS. Furthermore, the use of a scale that assesses anxiety as well as depressive symptoms, such as the BSI-18, GHQ-28, the HADS, the Psychosocial Screen for Cancer, or the RSCL, may prevent anxiety disorders from being overlooked within a routine screening program.
We argue that, depending on the physical condition of the patients and the treatment setting, relatively short tools should be used for the screening of palliative care patients or patients who are undergoing strenuous treatment. Furthermore, the use of shorter tools for routine screening in an inpatient setting is easier to justify and implement. By contrast, patients who have completed treatment, have follow-up appointments, or are attending rehabilitative care may have more physical resources (eg, compared with patients under chemotherapy treatment or palliative care patients) and more time to complete longer questionnaires. Moreover, cancer patients who are undergoing treatment may require immediate psychological support, whereas cancer survivors may need to adapt to the disease in the long term. For the latter patients, a more extensive psychological assessment seems to be needed.
Although single-item interviews may have a useful role in assessing distress in palliative care patients by minimizing patient burden, it is also true that somewhat longer scales may have higher content validity and may be better suited for longitudinal assessments. Future research should compare the accuracy and appropriateness of tools of differing lengths in specific treatment settings.
Choosing a tool for routine screening of cancer patients requires a trade-off between a measure with adequate psychometric properties and one with a reasonable length. It has been shown that computerized versions of screening instruments that use touch screen technology can be used successfully, including by older patients (155). The use of fully computerized touch screen and autoscoring technology minimizes the workload of oncology treatment personnel, further reduces costs, and ensures continuous and standardized administration of the instrument.
The usefulness of a screening program for emotional distress can be evaluated according to whether or not screened patients accept referral to a mental health professional. Shimizu et al. (156) found that neither patient demographic variables nor the level of physical functioning, disease stage, or treatment status was associated with acceptance of a referral by the patient, whereas level of distress was, thus providing evidence that screening for emotional distress can result in enhanced utilization of psychological treatment. Compared with structured clinical interviews, distress screening instruments tend to overestimate the prevalence rates of depressive disorders in cancer patients (116). In this regard, measures that have superior psychometric properties may, therefore, reduce the workload of psycho-oncology staff and allow for the accurate forecasting of resource needs. When clinic staff, alone or in cooperation with researchers, want to undertake distress tracking over time to assess treatment outcomes and/or learn more about adjustment processes longitudinally, then ultrashort screening tools tend to fall short because they lack a range of scores. Only the longer versions of measures can accomplish such objectives.
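The link between a tool's psychometric properties and staff workload can be illustrated with a short calculation based on Bayes' rule. This is a minimal sketch: the function names are hypothetical, and the prevalence, sensitivity, and specificity values are assumed for illustration only and are not results from the reviewed studies.

```python
def screen_positive_rate(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Fraction of all screened patients who score above the cutoff."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives + false_positives


def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Fraction of screen-positive patients who truly have the disorder."""
    true_positives = sensitivity * prevalence
    return true_positives / screen_positive_rate(prevalence, sensitivity, specificity)


# Assumed values: 20% true prevalence, sensitivity fixed at 0.80.
for specificity in (0.80, 0.90):
    rate = screen_positive_rate(0.20, 0.80, specificity)
    ppv = positive_predictive_value(0.20, 0.80, specificity)
    print(f"specificity={specificity:.2f}: screen-positive rate={rate:.2f}, PPV={ppv:.2f}")
```

Under these assumed values, the screen-positive rate exceeds the true prevalence in both cases, and the more specific tool flags fewer patients while a larger share of those flagged are true cases. That is the mechanism by which stronger psychometric properties can reduce follow-up workload.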
Several limitations of this systematic review must be noted. Some validation studies or measures could have been overlooked because only peer-reviewed articles were included in this review. On the other hand, the scientific accuracy of such studies or measures would have remained unclear because of their lack of peer review. Furthermore, we only included validation studies that provided information on construct validity, discriminant validity, and/or concurrent validity for at least one additional measure, and we excluded feasibility studies that only reported on the measure itself or on a translation of the measure. Many of the included studies reported on only limited aspects of validation. Of these, several described results of factor analyses, as well as subscale and total scale reliabilities, whereas others provided data from ROC analyses without information on reliability. Also, many included studies did not provide sufficient descriptive statistics to allow us to compute missing indices of sensitivity, specificity, positive predictive values, and negative predictive values. Consequently, the conclusions we draw in this review depend on the information given in the original reports. However, the strength of a systematic review is that it provides a broader scope than meta-analyses, which typically combine studies of varying types and consequently provide only summary statistics. Hence, this systematic review is, to our knowledge, the most comprehensive review to date that addresses a broad range of screening tools, varying types of cancers, and disease stages.
In conclusion, several generic and newly developed cancer-specific instruments meet high-quality criteria for use in emotional distress screening of cancer patients. Many general emotional distress screening tools focus on depression. Nonetheless, highly prevalent transient anxiety or mixed emotional disorders that occur during the cancer diagnosis and treatment trajectory deserve the attention of clinicians. Hence, the exclusive use of a depression scale may overlook other disorders (eg, anxiety disorders). Consequently, a scale that measures mixed emotional states rather than depression only has clear merit for clinical practice.
Apart from purely psychometric considerations, large-scale implementation of screening for emotional distress may not occur if a given test has to be purchased for each use. This factor alone may have an impact on the choice of a screening tool, given that some well-validated screening tools have to be purchased for every use, whereas others are available at no cost. Another useful criterion for deciding which tool to use is the treatment setting. For example, treatment centers that specialize in breast or prostate cancer may prefer to use disease-specific measures.
In terms of actual decision making, it is important to recognize that a measure's sensitivity and specificity are a function of the cutoff that is used to distinguish anxious or depressed patients from nonanxious or nondepressed patients. Higher cutoffs improve the measure's specificity, typically at the expense of its sensitivity, and treatment facilities can decide upfront, by consciously choosing a specific cutoff, the amount of psychological and psychotherapeutic follow-up treatment they are willing to or can provide. Given that we were able to find a large number of well-executed validation studies on distress screening tools, we raise the question of whether the development of additional tools should be discouraged at this time to avoid redundancy. However, it may be worthwhile to pursue further validation of tools that have good psychometric properties but have not yet been validated against criterion standards.
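The dependence of sensitivity and specificity on the chosen cutoff can be made concrete with a small calculation. The following is a minimal sketch: the function name, the questionnaire scores, and the criterion diagnoses are hypothetical, and a screen is assumed to be positive when the score is at or above the cutoff.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], is_case: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when score >= cutoff counts as screen positive.

    is_case holds the criterion-standard result (e.g., a structured clinical interview).
    """
    pairs = list(zip(scores, is_case))
    true_pos = sum(1 for s, case in pairs if case and s >= cutoff)
    false_neg = sum(1 for s, case in pairs if case and s < cutoff)
    true_neg = sum(1 for s, case in pairs if not case and s < cutoff)
    false_pos = sum(1 for s, case in pairs if not case and s >= cutoff)
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


# Hypothetical scores for 10 patients and their criterion diagnoses.
scores = [3, 5, 7, 8, 9, 11, 12, 14, 15, 18]
is_case = [False, False, False, True, False, False, True, True, True, True]

for cutoff in (8, 11, 13):
    sens, spec = sensitivity_specificity(scores, is_case, cutoff)
    print(f"cutoff >= {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this invented sample, raising the cutoff from 8 to 13 moves specificity from 0.60 to 1.00 while sensitivity falls from 1.00 to 0.60, so a facility with limited follow-up capacity might favor the higher cutoff and accept more missed cases.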
Worthy of note is an ongoing National Institutes of Health project—the Patient-Reported Outcomes Measurement Information System network (http://www.nihpromis.org/default.aspx)—to improve measures of patient-reported outcomes. A number of tools for the assessment of emotional distress in patients with chronic diseases are in the process of being developed within this network that may be useful as potential screening tools for emotional distress in cancer patients in the future.
Empirical findings published to date do not allow us to judge the predictive validity of screening tools for emotional distress. Nonetheless, the screening tools recommended here are effective for routine screening of emotional distress based on their high sensitivity and specificity. However, further information is needed about how screening affects long-term outcomes and patient quality of life.
Canadian Institutes of Health Research (team grant #AQC83559).
The study sponsor did not have any role in the design of the study; the collection, analysis, and interpretation of the data; the writing of the manuscript; or the decision to submit the manuscript for publication.
We are very grateful for the constructive feedback of Paul Jacobsen, PhD, and Alex Mitchell, MD, on an earlier version of this article. We are also grateful to Roanne Millman for supporting language and final editing of the article.