The 17-item HAM-D measures a set of symptoms with face validity in major depression, including anxiety, sleep problems, impact on work and activities and hypochondriasis. Although the clinician-rated HAM-D17 and the longer 21-, 24- and 29-item versions have wide acceptance in research settings for measuring efficacy outcomes, the tool has been criticized for its inadequate reliability, lack of internal and external validity and overemphasis on somatic complaints.5,9 Other observer tools, such as the 10-item Montgomery–Asberg Depression Rating Scale (MADRS), are also available and may offer improved validity.10 However, none of these rating instruments are popular in the clinical setting. This is primarily because of the length of time required to administer the interview, the lack of training for clinicians and the uncertain value of a given severity score and change across time for different populations.
The briefer unidimensional versions of the HAM-D17, which assess “core depressive symptoms” commonly reported in clinical practice (e.g., the Bech Melancholia Scale, Maier and Phillip Severity Subscale and the Gibbons Global Depression Severity Scale),5,6,7 share considerable symptom overlap in that they all include items 1, 2, 7 and 10. The items in the Toronto HAM-D7, selected on the basis of their frequency of occurrence at baseline and their sensitivity to change with treatment, also included items 1, 2, 7 and 10.
These brief scales have been shown to correlate with the HAM-D17 assessment of both severity of symptoms and sensitivity to change over time. A study of 164 depressed outpatients with and without atypical features demonstrated that the Bech HAM-D6 was as sensitive to symptom changes as the 17-, 21- and 24-item versions of the scale.11 Furthermore, the different versions of the HAM-D were strongly correlated with each other at baseline and endpoint in both depression subtypes. It was concluded that the 6-item version of the HAM-D allowed the assessment of severity of depression with comparable sensitivity to the standard and more elaborate versions of the same scale. Hooper and Bakish12 compared the sensitivity of the HAM-D6 with the HAM-D17 and the MADRS in a retrospective analysis of 4 clinical trials (3 double-blinded, 1 open study) comprising 143 outpatients receiving treatment for major depressive disorder, with or without melancholia and/or dysthymic disorder. The briefer version strongly correlated with the longer version at baseline and termination. The HAM-D6, HAM-D17 and MADRS demonstrated equal sensitivity to change over the course of treatment, both in the full sample and in the dysthymic and melancholic subgroups. The ability of the shorter version to show comparable results supports the assertion that the HAM-D6 measures “core” features of depression.
Faries et al13 conducted 2 meta-analyses (n = 2899) to compare the sensitivity of the multidimensional HAM-D17 with the unidimensional briefer scales (Bech,5 Maier7 and Gibbons6) for detecting treatment differences. In both meta-analyses, the unidimensional core subscales outperformed the HAM-D17 at detecting treatment differences. With the improved responsiveness and increased effect size, studies based on these subscales would require one-third fewer subjects to detect drug treatment differences. The HAM-D6 appears to be as (or more) sensitive to change during treatment as the HAM-D17 and the MADRS.
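The link between a larger effect size and a smaller required sample can be made concrete with the standard normal-approximation formula for comparing two group means, in which the number of subjects per arm scales with the inverse square of the standardized effect size. The sketch below is illustrative only: the function names, the default alpha and power, and the effect sizes are assumptions, not values taken from the meta-analyses.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects per arm for a two-sided comparison of two means.

    Uses the normal approximation n = 2 * ((z_alpha + z_beta) / d) ** 2,
    where d is the standardized effect size (an assumed, hypothetical value).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def relative_sample_size(effect_ratio):
    """Fraction of the original sample needed when the effect size
    is multiplied by effect_ratio (sample size scales as 1 / d**2)."""
    return 1.0 / effect_ratio ** 2


# Hypothetical illustration: an effect size larger by a factor of sqrt(1.5)
# (about 22%) corresponds to needing two-thirds of the original sample.
fraction_needed = relative_sample_size(math.sqrt(1.5))
```

Under these assumptions, a saving of one-third of subjects corresponds to an effect size roughly 22% larger on the subscale than on the full scale.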
One potential limitation of the shorter form is that, statistically, the presence of fewer items typically results in lower reliability. However, our data indicate that the shorter forms have comparable reliability estimates to the HAM-D17. In addition, all of these shortened versions have been extracted from the same parent HAM-D17. Development of the original scale was guided by clinical experience and logic rather than by empirical testing and re-evaluation.6 It is confounded by extraneous items that do not reflect severity of depression; it is vulnerable to the influence of antidepressant side effects, and the clinical value of the total score is not clear.6,12 Moreover, the HAM-D7 was not validated in patients with known concurrent medical disorders. It is well established that many people with depression in primary care settings present with multiple medical conditions and somatic complaints. The HAM-D7 includes 2 items that assess somatic symptoms (somatic anxiety, energy). It behooves the clinician to ascertain if somatic symptoms are part of a confluence of depressive symptoms or due to a general medical condition; this scale does not replace everyday clinical decision making.
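The statistical expectation noted at the start of the preceding paragraph, that fewer items typically yield lower reliability, is usually expressed with the Spearman-Brown prophecy formula. The sketch below is a minimal illustration; the reliability value of 0.80 is hypothetical and is not an estimate from this study. The formula assumes that all items are parallel measures of the same construct, an assumption that would not hold if the removed items are extraneous to depression severity, which is one way a shorter form could retain comparable reliability.

```python
def spearman_brown(reliability, length_ratio):
    """Predicted reliability after changing test length.

    reliability:  reliability of the original scale (hypothetical input)
    length_ratio: new number of items divided by original number of items
    Assumes all items are parallel measures of one construct.
    """
    return (length_ratio * reliability) / (1 + (length_ratio - 1) * reliability)


# Hypothetical example: shortening a 17-item scale with reliability 0.80
# to 7 items, under the parallel-items assumption.
predicted = spearman_brown(0.80, 7 / 17)
```

With these assumed inputs the predicted reliability falls to about 0.62, which shows the size of the drop that would be expected if the 10 removed items contributed as much to the construct as the 7 retained ones.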
The question is, does a shortened version of a flawed scale have clinical utility? A prospectively designed study to investigate factors that are indicative of the severity of depression and are sensitive to change with antidepressant therapy would be ideal. A prospective study to validate the Toronto HAM-D7 in general practice is planned.
The clinical utility of the shorter version is increased by the determination that a score of approximately 3 or less is comparable to a HAM-D17 score of less than 8, which is considered a full remission. A cut-off score for “response” was not derived, because it is not considered an acceptable endpoint in clinical practice. A caution is that the cut-off scores derived in this study were based on discriminant function analysis, which employs an algorithm that maximizes a balance between sensitivity (in this instance, correctly identifying the presence of remission) and specificity (correctly identifying the absence of remission). Different cut-off scores might be applied if the clinician is more concerned about misidentifying a patient who is not in remission as being in remission (undertreating), even at the expense of misidentifying a patient who is in remission as not being in remission (overtreating).
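The trade-off between sensitivity and specificity at different cut-off scores can be illustrated with a short calculation. This is a sketch under stated assumptions: the paired scores are invented for illustration and are not data from this study, the function name is hypothetical, and remission on the reference scale is taken as a HAM-D17 score of 7 or less (that is, less than 8).

```python
# Invented (HAM-D7 score, HAM-D17 score) pairs, for illustration only.
HYPOTHETICAL_PAIRS = [
    (1, 4), (2, 6), (3, 7), (4, 6),      # in remission on the HAM-D17
    (3, 9), (5, 12), (6, 14), (8, 20),   # not in remission on the HAM-D17
]


def remission_accuracy(pairs, cutoff, reference_max=7):
    """Return (sensitivity, specificity) of 'short score <= cutoff'
    for detecting remission defined as 'full score <= reference_max'."""
    tp = fp = tn = fn = 0
    for short_score, full_score in pairs:
        predicted = short_score <= cutoff
        actual = full_score <= reference_max
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1   # called remitted but is not: risk of undertreating
        elif actual:
            fn += 1   # remitted but not detected: risk of overtreating
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Compare a stricter, the proposed, and a more lenient cut-off.
results = {c: remission_accuracy(HYPOTHETICAL_PAIRS, c) for c in (2, 3, 4)}
```

In this invented sample, lowering the cut-off from 3 to 2 removes the false classification of remission but misses half of the true remitters, whereas raising it to 4 detects every remitter without removing that false classification. A clinician mainly concerned with undertreatment would therefore favour the stricter cut-off.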
Another caution is that the items that compose the HAM-D7 were derived from a single sample and, therefore, need to be replicated in other samples before widespread use, especially in instances where important clinical decisions are to be made. Similarly, the cut-off score proposed to detect full remission was derived using discriminant function analysis (DFA) in this sample only. As DFA procedures capitalize on “chance” effects, the cut-off score derived in this sample must be replicated before widespread use in either clinical or research settings. Pending replication and cross-validation of these items and the cut-off score for determining full remission, the HAM-D7 may have a role in clinical practice and antidepressant trials.