The Mini Mental State Examination (MMSE) is frequently used to assess cognition in studies of late-life depression (LLD). However, its sensitivity and specificity in this population are largely unknown. We undertook an analysis of subjects with LLD and hypothesized that: (1) at the traditional cutoff of 24, the MMSE would have low sensitivity in the detection of cognitive impairment; (2) increasing the cutoff score would improve this sensitivity at the expense of a minimal reduction in specificity.
We analyzed the MMSE scores of 447 non-demented subjects with LLD using the Dementia Rating Scale (DRS) as the gold standard for cognitive function.
Using the DRS raw total cutoff of 132 as the “gold standard”, the MMSE at a cutoff of 24 has a sensitivity of 8.0% and a specificity of 99.4% in detecting “cognitively impaired” depressed elders. A receiver operating characteristic curve demonstrates that with an MMSE cutoff of 27 instead of 24, its sensitivity more than quadruples and increases to 37.5% while its specificity decreases minimally from 99.4% to 91.3%.
In our sample almost all of those classified as cognitively impaired by the DRS are mislabelled as “cognitively intact” by the MMSE. By using a higher cutoff score, the sensitivity can be increased with a minimal reduction in specificity. Our findings have significant implications for those who study or treat persons with LLD or other neuropsychiatric disorders.
The Mini Mental State Examination (MMSE) (Folstein et al., 1975) is widely used by clinicians and researchers in the treatment or study of late-life depression (LLD). This is due in part to its psychometric properties such as excellent interrater reliability (Anthony et al., 1982; Folstein et al., 1975; O’Connor et al., 1989) and good evidence of criterion and construct validity for dementia (Anthony et al., 1982; O’Connor et al., 1989). In addition, most clinicians are very familiar with this instrument and many use it to estimate global cognitive performance or to screen for acute cognitive impairment. In clinical trials, it is often used to exclude subjects with cognitive impairment that may interfere with treatment or assessment of response. However, some trials allow cognitive impairment, particularly those addressing the effects of cognitive impairment on treatment response or vice versa (Butters et al., 2000; Potter et al., 2004). Traditionally, an MMSE cutoff score of 24 (Lezak et al., 2004) has been used in clinical settings to detect cognitive impairment. Similarly, cutoff scores of 24 or 25 and below have been used to exclude subjects from participating in LLD pharmacological (Rapaport et al., 2003; Schneider et al., 2003; Tollefson et al., 1995) and psychological (Lincoln and Flannaghan, 2003; Rokke et al., 2000) treatment studies. Some studies have suggested that when these cutoff scores are used, the MMSE is neither sensitive nor specific in the detection of cognitive impairment (Anthony et al., 1982; O’Connor et al., 1989). Thus, we conducted an analysis of MMSE scores in a large sample of patients with LLD who were categorized as being cognitively impaired or intact using a more comprehensive instrument, the Dementia Rating Scale 2 (DRS) (Jurica et al., 2001; Mattis, 1973; Vitaliano et al., 1984).
We hypothesize that: (1) at the traditional cutoff of 24, the MMSE has low sensitivity in the detection of cognitively impaired subjects with LLD; (2) increasing the cutoff score will significantly improve this sensitivity at the expense of a minimal reduction in specificity.
We report on 447 non-demented depressed older subjects who were recruited in four clinical trials: PAROX, LLTSD, MTLD-II, and SWELL. Details on each study can be found elsewhere (Butters et al., 2000; Mulsant et al., 2001; Reynolds et al., 2005, 2006; Saghafi et al., 2007; Szanto et al., 2003). Briefly, 32 subjects, age 60 and above, were participants in the double-blind randomized comparison of nortriptyline and paroxetine in late-life depression (PAROX) (Mulsant et al., 2001). They were recruited between June 1996 and December 1998 from in- and outpatient geriatric units of the Western Psychiatric Institute and Clinic (WPIC), a teaching hospital that provides psychiatric care to a large urban and rural catchment area in Southwestern Pennsylvania. Similarly, 49 subjects, age 60 and above, were participants in the late-life total sleep deprivation (LLTSD) (Reynolds et al., 2005) and were recruited between November 1998 and August 2003 at WPIC; 132 subjects age 70 and above, were participants in the maintenance therapies in late-life depression II (MTLD-II) (Reynolds et al., 2006) and were recruited between March 1999 and February 2003 at WPIC; finally, 234 subjects, age 60 and above, were recruited in the Geriatric Depression: Getting Better, Getting Well study (SWELL) between June 1, 2004 and May 1, 2006 in primary care practices and at WPIC (Saghafi et al., 2007).
All of the assessments included in this analysis were performed prior to the initiation of study medications or sleep deprivation. All subjects received a comprehensive evaluation performed by a multidisciplinary geropsychiatric team. This diagnostic evaluation included a psychiatric history and mental status examination, a social and medical history, a physical examination, structural magnetic resonance imaging scans (for some depressed subjects) and laboratory tests. The assessment also included the Structured Clinical Interview for DSM-IV Axis I Disorders, Patient Edition (SCID-IV) (First et al., 2002). On the basis of all information available, consensus Axis I diagnoses were established according to DSM-IV criteria by faculty psychiatrists and research staff.
In all four studies, subjects met criteria for a major depressive episode based on the SCID-IV and had a baseline rating of 15 or higher on the 17-item Hamilton Depression Rating Scale (HDRS-17) (Mulsant et al., 1994). The HDRS-17 was administered at baseline for 286 subjects, 1–7 days prior to baseline for 148 subjects, 8–14 days prior to baseline for 8 subjects, 15–23 days prior to baseline for 3 subjects, and 1–4 days after baseline for 2 subjects.
Subjects who enrolled in these studies presented with a range of clinical and cognitive function. However, subjects were excluded if they had a previous diagnosis of dementia from a qualified healthcare provider. Furthermore, none met DSM-IV criteria for dementia based on the comprehensive evaluation described above. Subjects were also excluded if they had a baseline MMSE score lower than 17 (LLTSD, MTLD-II, and SWELL) or 18 (PAROX). For this analysis we excluded the one subject (from SWELL) who had a baseline MMSE score of 18. Finally, subjects were excluded if they had any of the following: a lifetime diagnosis of bipolar disorder, schizophrenia, schizoaffective disorder, or other psychotic disorders; or a history of alcohol or drug abuse within the past 12 months. All subjects provided written informed consent.
All subjects were administered the MMSE and the DRS at baseline and on the same day by trained clinicians under the supervision of one neuropsychologist (M.A.B.) at one site (WPIC). The MMSE is a well-known and widely used brief mental status test of cognitive function. The test consists of 13 items assessing orientation to place and time, learning and memory, construction ability, attention, and calculation. The possible range of scores is 0–30. The intraclass correlation coefficient (ICC) for the MMSE was calculated across the raters and was 0.98.
The DRS is a more extended screening instrument, designed to assess cognitive functioning in dementia. It includes screening items that were used on all subjects. It consists of 36 items and yields a range of scores from 0 to 144. It assesses function in several cognitive domains: the Attention subscale consists of repeating digit strings, following one and two-step commands, and counting target letters embedded in a random array of letters; the Conceptualization subscale requires generating abstract concepts common to series of two items presented verbally and three items presented visually; the Construction subscale consists of copying designs and name writing; the Initiation/Perseveration subscale consists of naming supermarket items, repeating series of rhymes, performing double-alternating hand movements, and copying rows of alternating symbols; and the Memory subscale comprises orientation items, delayed recall of two sentences and recognition memory for series of word pairs and design pairs. The ICC for the DRS was calculated across the raters and was 0.997.
First, we calculated the correlation coefficients between the MMSE and the DRS total raw scores. Next, subjects were classified as either cognitively intact or impaired based on their DRS and MMSE scores: cutoff scores of 132 (total DRS scores) and 24 (total MMSE scores) were selected, and we calculated the kappa statistic to assess the agreement between the DRS and MMSE when classifying subjects as “cognitively impaired” based on these cutoff scores. We then recalculated the kappa statistic using different MMSE cutoff scores. Using the DRS as the “gold standard” to define cognitive status, we also calculated the MMSE sensitivity (i.e., 1 – false negative rate), specificity (i.e., 1 – false positive rate), and positive and negative predictive values for classifying subjects as cognitively impaired. Finally, we constructed a receiver operating characteristic (ROC) curve for several MMSE cutoff scores using the DRS to define cognitive status and we plotted the sensitivities and specificities of the MMSE at these different cutoff scores. A total raw DRS cutoff score of 132 was selected as the “gold standard” because it corresponds to 1.5 SD below the mean of an age- and education-equated control group (Butters et al., 2000) and a cutoff 1.5 SD below the mean optimizes the balance between sensitivity and specificity (Heaton et al., 1991).
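The cutoff sweep behind a ROC curve of this kind can be illustrated with a minimal sketch. It assumes that a subject is flagged by the MMSE when the score falls below the cutoff and is treated as truly impaired when the DRS total falls below 132. The function name `sweep_mmse_cutoffs` and the eight toy scores are hypothetical and are not study data.

```python
from typing import List, Tuple

DRS_CUTOFF = 132  # from the text: a DRS total below 132 defines "impaired"


def sweep_mmse_cutoffs(
    mmse: List[int], drs: List[int], cutoffs: List[int]
) -> List[Tuple[int, float, float]]:
    """Return (cutoff, sensitivity, specificity) for each MMSE cutoff.

    Assumption: test-positive means MMSE < cutoff; truly impaired means
    DRS < DRS_CUTOFF.
    """
    results = []
    for cutoff in cutoffs:
        tp = fp = fn = tn = 0
        for m, d in zip(mmse, drs):
            impaired = d < DRS_CUTOFF
            flagged = m < cutoff
            if impaired and flagged:
                tp += 1
            elif impaired:
                fn += 1
            elif flagged:
                fp += 1
            else:
                tn += 1
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        results.append((cutoff, sensitivity, specificity))
    return results


# Hypothetical scores for eight subjects (illustration only, not study data).
toy_mmse = [30, 29, 28, 27, 26, 25, 23, 22]
toy_drs = [140, 138, 130, 135, 128, 131, 125, 133]

roc_points = sweep_mmse_cutoffs(toy_mmse, toy_drs, [24, 25, 26, 27, 28])
for cutoff, sens, spec in roc_points:
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Plotting sensitivity against 1 – specificity for each cutoff gives the points of the ROC curve.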
The demographic and clinical characteristics of the subjects are summarized in Table 1. For the whole group, the mean (SD) age of subjects was 73.3 (7.5) years. Mean (SD) education level was 13.5 (2.9) years. The majority of subjects were Caucasian (90.8%) and female (67.3%).
The mean (SD) scores were 28.2 (1.9) for the MMSE, and 134.5 (6.5) for the DRS and they were significantly correlated (r = 0.552, p < 0.0001) (see Fig. 1).
Eleven (2.5%) subjects scored less than 24 on the MMSE and were classified as being cognitively impaired compared to 112 (25.1%) with a DRS score of less than 132. The kappa statistic for the overall agreement between the MMSE and DRS in classifying subjects as cognitively intact or impaired using these cutoff scores is 0.11. Of the 105 (23.5%) subjects who were classified differently by the two scales, all but two were classified as cognitively impaired by the DRS and cognitively intact by the MMSE (see Fig. 1). Using different MMSE cutoff scores, the kappa statistic peaks (k = 0.33) at a cutoff score of 27 (see Table 2).
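The agreement figure can be approximately reproduced from the counts given above. The sketch below reconstructs the 2 × 2 table under the assumption that the two exceptions among the 105 discordant subjects were impaired on the MMSE and intact on the DRS. The function `cohen_kappa` and the variable names are illustrative, not the authors' code.

```python
def cohen_kappa(both_impaired: int, mmse_only: int,
                drs_only: int, both_intact: int) -> float:
    """Cohen's kappa for two binary classifications of the same subjects."""
    n = both_impaired + mmse_only + drs_only + both_intact
    observed = (both_impaired + both_intact) / n
    mmse_impaired = both_impaired + mmse_only
    drs_impaired = both_impaired + drs_only
    # Agreement expected by chance from the marginal totals.
    expected = (
        mmse_impaired * drs_impaired
        + (n - mmse_impaired) * (n - drs_impaired)
    ) / (n * n)
    return (observed - expected) / (1 - expected)


# Counts taken from the text.
N = 447
MMSE_IMPAIRED = 11   # MMSE < 24
DRS_IMPAIRED = 112   # DRS < 132
DISCORDANT = 105     # classified differently by the two scales

# Assumption: "all but two" means 2 subjects were impaired on the MMSE only.
MMSE_ONLY = 2
DRS_ONLY = DISCORDANT - MMSE_ONLY
BOTH_IMPAIRED = MMSE_IMPAIRED - MMSE_ONLY
BOTH_INTACT = N - BOTH_IMPAIRED - MMSE_ONLY - DRS_ONLY

kappa = cohen_kappa(BOTH_IMPAIRED, MMSE_ONLY, DRS_ONLY, BOTH_INTACT)
print(f"kappa = {kappa:.2f}")
```

Under these assumptions the reconstructed table is internally consistent (9 + 103 = 112 subjects impaired on the DRS), and the kappa rounds to the reported 0.11.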
Using the DRS total scores and a cutoff of 132 as the “gold standard” to define cognitively impaired subjects, the MMSE at a cutoff of 24 has a low sensitivity (8.0%), but a high specificity (99.4%). Correspondingly, the false negative and positive rates are 92.0% and 0.6%, respectively. The positive and negative predictive values are 81.8% and 73.4%, respectively (see Fig. 1 and Fig. 2, and Table 3).
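As a worked check, these rates can be recomputed from a 2 × 2 table reconstructed from the counts reported in the text (11 subjects below the MMSE cutoff, 112 below the DRS cutoff, 105 discordant, two of them impaired on the MMSE only). The cell counts are therefore an assumption, not values taken from Table 3, and the function name `diagnostic_metrics` is illustrative.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard screening metrics from a 2 x 2 table (DRS as reference)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Reconstructed cell counts (assumption, derived from the reported totals).
TP = 9     # impaired on both the DRS and the MMSE
FP = 2     # impaired on the MMSE only
FN = 103   # impaired on the DRS only
TN = 333   # intact on both

metrics = diagnostic_metrics(TP, FP, FN, TN)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these counts the sensitivity, specificity, false negative and false positive rates, and positive predictive value round to the figures quoted above. The negative predictive value comes out near 76.4% rather than 73.4%, so either the reconstructed counts or the quoted figure should be checked against Table 3.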
To further evaluate the MMSE at different cutoff scores with respect to the DRS, we constructed a ROC curve using a DRS total score of 132 as the threshold for being “truly” cognitively impaired (see Fig. 3).
The ROC curve slopes off at an MMSE cutoff of 27. At this cutoff the MMSE sensitivity increases from 8.0% to 37.5%. This increase in sensitivity is in contrast to a minimal decrease in specificity from 99.4% to 91.3%. Finally, the area under the curve (AUC) for the ROC curve scores is 0.729 (Fig. 3).
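One common way to summarize such a trade-off is Youden's J statistic (sensitivity + specificity – 1). This index is not part of the analysis reported here. The sketch below simply applies it to the two operating points quoted in the text, and the variable names are illustrative.

```python
# Reported (sensitivity, specificity) at the two MMSE cutoffs in the text.
reported = {
    24: (0.080, 0.994),
    27: (0.375, 0.913),
}


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J: 0 for a chance-level test, 1 for a perfect test."""
    return sensitivity + specificity - 1


j_by_cutoff = {cutoff: youden_j(*point) for cutoff, point in reported.items()}

# How much sensitivity is gained, and how much specificity is given up.
sensitivity_ratio = reported[27][0] / reported[24][0]
specificity_drop = reported[24][1] - reported[27][1]

for cutoff, j in j_by_cutoff.items():
    print(f"cutoff {cutoff}: J = {j:.3f}")
print(f"sensitivity ratio (27 vs 24): {sensitivity_ratio:.2f}")
print(f"specificity drop: {100 * specificity_drop:.1f} percentage points")
```

On these two points J rises from about 0.074 to about 0.288, sensitivity increases roughly 4.7-fold, and specificity falls by about 8.1 percentage points.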
In a study group of 447 older adults presenting for the treatment of a major depressive episode, the MMSE and DRS scores were significantly correlated. However, the strength of agreement between the DRS and the MMSE at different cutoff scores in classifying subjects as cognitively impaired or intact ranged from slight to fair (Landis and Koch, 1977). The strongest agreement was achieved using an MMSE cutoff score of 27. Against the DRS, the typical MMSE cutoff score of 24 results in a low sensitivity when identifying depressed elders who are cognitively impaired (i.e., most older depressed patients identified as being cognitively impaired when assessed with the DRS are falsely classified as being cognitively intact by the MMSE). A ROC analysis suggests the use of the cutoff score of 27 (i.e., only patients with an MMSE score of 27 or higher should be considered cognitively intact).
To our knowledge, this is the first published analysis of the relationship between MMSE and DRS scores in patients with LLD. These results should be interpreted in the context of similar analyses conducted in samples of healthy or demented subjects (Bobholz and Brandt, 1993; Chan and Siu, 2005; Freidl et al., 1996, 2002; Salmon et al., 1990). Strong correlations between the MMSE and DRS have been reported in patients with Alzheimer’s disease (Pearson r = 0.82) (Salmon et al., 1990), suspected dementia of different etiologies (Pearson r = 0.78) (Bobholz and Brandt, 1993), or various types of dementias (Pearson r = 0.80) (Chan and Siu, 2005). In contrast, a weak correlation has been reported in healthy older subjects (Pearson r = 0.26, p < 0.001) (Freidl et al., 1996). We found a moderate correlation in patients with LLD. The higher correlation observed in patients with dementia may be due to the high correlation between the three MMSE items that are most impaired in these patients (orientation to time and place, and word recall) (Benedict and Brandt, 1992) and the memory subscale of the DRS (Bobholz and Brandt, 1993). In contrast, the most consistently reported deficits in LLD are those in executive function (Butters et al., 2004), which is adequately assessed by the DRS but not by the MMSE. The correlation between the DRS and MMSE is higher in LLD than in healthy controls. This suggests that, despite the lack of a clinically evident cognitive deficit, the cognitive profiles of healthy subjects are more variable than those of subjects with LLD. With relatively more homogeneity, the chances of overlap are greater between the different cognitive domains deemed impaired by the DRS and MMSE, resulting in a higher correlation. Alternatively, a stronger correlation in a clinical population than in healthy controls could be due to the wider distribution of data in a clinical population.
The ceiling effect observed with the MMSE and the DRS in healthy controls limits the spread of data points that covary and therefore results in a weak correlation.
Because of the high rate of clinician familiarity with the MMSE, it is used as a screening instrument in many LLD clinical trials as well as clinical settings. Clinical trials that rely only on the MMSE are likely to include a large number of cognitively impaired subjects: with a cutoff score of 24, the MMSE has poor sensitivity, with more than 90% of cognitively impaired subjects mislabelled as being cognitively intact. This number can be reduced if a cutoff score of 27 instead of 24 is used (i.e., if those with an MMSE score of 26 and lower, instead of 23 and lower, are considered impaired). This reduction in the false negative rate comes at the expense of a minimal reduction in specificity (which remains above 90%). These findings have several implications for clinical trials. Older depressed subjects who are mislabelled as “cognitively intact” may not respond to psychotherapy because of a cognitive impairment that was missed using a relatively low MMSE cutoff score. A similar cognitive impairment may result in poor compliance and response to pharmacotherapy.
This study has several limitations: first, the subjects had a relatively high level of education. Highly educated patients who score in the normal range of the MMSE may actually have significant cognitive impairment or even dementia (Tombaugh and McIntyre, 1992). Irrespective of their state of depression, this high level of education could have contributed to the low sensitivity of the MMSE as opposed to the DRS. This limitation should also be considered when our findings are applied to depressed elders who may not be as highly educated. A second limitation to the generalizability of our findings is the fairly high baseline MMSE scores. While these high scores could be due to the exclusion of subjects with clinical evidence of dementia, it could also be a corollary to the high educational level. A third limitation is the fact that our sample is overwhelmingly Caucasian. Thus, a different relationship could exist between the MMSE and DRS among patients who are non-Caucasian. Finally, the data have been aggregated across several treatment studies. Notwithstanding that these studies were comparable on most inclusion and exclusion criteria, and that the cognitive data used in this analysis were collected at baseline prior to the study-specific intervention, the following two differences are to be noted: these studies were conducted at different times with a decade separating the first from the last study and the age inclusion criterion was 10 years older in MTLD-II than in the other studies.
In conclusion, researchers who need to screen out cognitively impaired subjects for LLD clinical trials should not rely solely on the MMSE and they should consider using a more comprehensive (and sensitive) instrument. If the MMSE is the sole screening instrument, a cutoff score of 27 should be used to rule out cognitively impaired subjects. Similarly, in clinical settings, patients with LLD and an MMSE score of 26 or lower should undergo a more comprehensive assessment and their cognition should be monitored. This is of particular importance given that, contrary to many clinicians’ expectation, cognitive impairment in older patients with depression (when assessed with the DRS) has been shown not to resolve with successful treatment (Bhalla et al., 2006; Butters et al., 2000).
We want to thank the patients who participated in the studies reported on in this manuscript and their families.
Role of funding source
Funding for this study was provided by the NIMH: US PHS grants MH 37869, MH 43832, MH 69430, MH 71944, and MH 72947; and the John A Hartford Center of Excellence in Geriatric Psychiatry at Pittsburgh. The NIMH and the John A Hartford Center of Excellence in Geriatric Psychiatry had no further role in study design; in the collection, analysis and interpretation of data; in the writing of the report; and in the decision to submit the paper for publication.
Mr. Bensasi and Ms. Zmuda contributed to the retrieval of the data. Dr. Butters and Ms. Zmuda supervised the administration of the cognitive tests and Dr. Butters contributed to the design of this analysis. Ms. Houck and Lotz contributed to the data analysis. Ms. Miranda and Dr. Rajji wrote the first draft of this manuscript. Dr. Mulsant contributed to the design and the writing of this report. Dr. Reynolds was the principal investigator in the studies reported on. All authors contributed to and have approved the final manuscript.
✩Supported in part by US PHS Grants MH 37869, MH 43832, MH 69430, MH 71944, and MH 72947; and the John A Hartford Center of Excellence in Geriatric Psychiatry at Pittsburgh.
Conflict of interest statement
The authors have no potential conflict of interest related to the topic of this manuscript to disclose.