The merit of screening for dementia and cognitive impairment has been the subject of recent debate. One of the main limitations in this regard is the lack of robust evidence to support the many screening tests available. Although plentiful in number, few such instruments have been well validated in the populations for which they are intended to be used. In addition, it is likely that “one size does not fit all” in cognitive screening, leading to the development of many specialised tests for particular types of impairment. In this review, we sought to ascertain the number of screening tools currently available, and to examine the evidence for their validity in detecting different diagnoses in a variety of populations. A further consideration was whether each screen elicited indices of a range of cognitive, affective and functional domains or abilities, as such information is a valuable adjunct to simple cut‐off scores. Thirty‐nine screens were identified and discussed with reference to three purposes: brief assessment in the doctor's office; large scale community screening programmes; and identifying profiles of impairment across different cognitive, psychiatric and functional domains/abilities, to guide differential diagnosis and further assessment. A small number of screens rated highly for both validity and content. This review is intended to serve as an evaluative resource, to guide clinicians and researchers in choosing among the wide range of screens which are currently available.
As our population grows older, the issue of screening for dementia and cognitive impairment will become increasingly important. It is now well accepted that the incidence of dementia is on the rise (eg, it has been forecast that the annual number of new cases of Alzheimer's disease (AD) in the US will double by the year 2050).1 Improvements in survival rates following stroke mean that we will see an increase in vascular and post‐stroke dementias: approximately 30% of stroke patients go on to develop a progressive dementia.2,3 At present, a typical UK primary care general practitioner (GP) with a list of 2000 patients sees one or two new cases of dementia each year, and has 10 or 12 existing cases.4 Significant increases in these numbers in the coming decades will present a challenge to GPs, many of whom have difficulty diagnosing dementia at current levels.5 A review by the US Preventive Services Task Force6 reported that between 50% and 66% of patients found to have dementia in primary care samples had no such diagnosis in their medical notes. The authors suggested that “new screening in primary care practice could therefore potentially double the number of patients who receive a diagnosis of dementia”. However, the authors did not advocate global screening for dementia, because of potential adverse factors such as distress caused by false positive results, a paucity of efficacious treatment options and a lack of evidence that early detection significantly improves outcomes for patients. On the other hand, any benefit that can be gained from AD treatment medications is usually most apparent in the earlier stages, and early detection also allows for more careful planning of financial and support systems when the patient is in a position to make their wishes known.7 A consensus is emerging that (in primary care settings at least) screening should be applied to patients aged over 75 years, and to younger patients when there is reason to suspect cognitive impairment.8
A key point in the screening debate is the suitability of currently available screening instruments: few screens have been validated in the populations for which they are intended to be used, many have low accuracy for mild levels of impairment, and there are often demographic biases in score distributions. Although “no single instrument for cognitive screening is suitable for global use”,8 clinician surveys indicate that the Mini‐Mental State Examination (MMSE)9 is overwhelmingly ubiquitous in practice.10 Boustani et al6 recommend further research into alternative brief screening tools before routine screening can be advocated unreservedly.
It could be said that the basic purpose of cognitive screening tests is to indicate likelihood of genuine cognitive impairment, inferred from the relationship of the patient's score to reference norms. A very impaired score (along with supporting history) may lead a physician to make a diagnosis without further investigation; a borderline score may prompt referral for specialist assessment (eg, at a memory clinic), where available. The success of a particular screening tool for this purpose will lie in its statistical robustness—ideally, high sensitivity and specificity along with a high positive predictive value in a population with a relevant base rate of impairment. Sensitivity refers to the proportion of people who have an impairment who are classified by the screen as impaired; specificity refers to the proportion of people who do not have an impairment who are classified by the screen as unimpaired; positive predictive value refers to the proportion of people who are classified by the screen as impaired who really are impaired (this statistic is not always reported in validation papers). Time pressure in the clinical consultation means that this robustness must be achievable in the minimum time possible, using an instrument which is easy to administer. This imperative has led to the development of extremely brief one or two task screens, with an emphasis on predictive performance, often in narrowly delineated patient groups.
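The three statistics defined above can be illustrated with a short worked sketch. The counts and rates below are hypothetical and are not taken from any screen in this review; the function names are ours. The point is only to show that positive predictive value depends on the base rate of impairment, even when sensitivity and specificity are held fixed.

```python
def screen_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and PPV from a 2x2 table of counts.

    tp: impaired and screened positive     fn: impaired but screened negative
    fp: unimpaired but screened positive   tn: unimpaired and screened negative
    """
    sensitivity = tp / (tp + fn)   # proportion of impaired people flagged
    specificity = tn / (tn + fp)   # proportion of unimpaired people cleared
    ppv = tp / (tp + fp)           # proportion of flagged people truly impaired
    return sensitivity, specificity, ppv


def ppv_from_rates(sensitivity, specificity, base_rate):
    """PPV from sensitivity, specificity and base rate (Bayes' rule)."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical sample: 1000 people, 10% base rate of impairment.
    se, sp, ppv = screen_metrics(tp=90, fp=90, fn=10, tn=810)
    print(round(se, 2), round(sp, 2), round(ppv, 2))

    # Same hypothetical screen applied where half of those seen are impaired.
    print(round(ppv_from_rates(0.9, 0.9, 0.50), 2))
```

In this hypothetical case a screen with 90% sensitivity and 90% specificity has a positive predictive value of only 0.50 at a 10% base rate, rising to 0.90 at a 50% base rate. This is why validation in a population with a relevant base rate matters.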
While the benefits of the statistical probability–calculation approach are clear (maximising detection while minimising unnecessary investigation of healthy people), several drawbacks are also apparent. Many screening tests overemphasise memory dysfunction, the hallmark of AD, to the neglect of other domains such as language, praxis or executive function, which may be the earliest features of vascular or other non‐Alzheimer dementias.11 Indeed, a review by the American Academy of Neurology12 recommended that memory dysfunction should not be a required part of the diagnosis of dementia. A recent debate in the literature has focused on what has been termed the “alzheimerisation” of dementia,13 and the influence of this on screening tests may mean that important signs of other types of cognitive impairment are missed. A second problem is the emphasis on cut‐off scores rather than profiles of impairment: with some exceptions, most screens produce a single score which is then compared with a standard cut‐point. This runs counter to the preferred practice of most clinicians, who tend to arrive at a diagnosis by an iterative process of creating, rejecting and refining hypotheses over a period of time.14 This would be better served by a symptom oriented approach to assessment,15 where the qualitative information elicited by a screen was at least as important as a numeric score. Screens which elicit information about a wide range of domains (cognitive, functional and affective) would also find use in many settings apart from the doctor's surgery; neuropsychologists, in particular, also use screens, but more to guide further assessment than to examine statistical risk of a diagnosis. The ideal screen would be both statistically robust and qualitatively rich, allowing referring clinicians to better describe a patient's symptom profile, and lending itself to use in a wider range of settings.
While the cognitive screen is not intended to be a substitute for a full neuropsychological assessment (and each has a complementary but distinct role), it should still be possible to obtain indices of key cognitive domains in a brief consultation. Neuropsychological testing has consistently shown that subtypes of dementia are characterised by different patterns of impairment. AD is characterised by impairment of episodic memory (verbal and non‐verbal) at the initial stage, followed by dysfunction in judgement and abstract reasoning, visual construction, verbal fluency and naming.16 Patients with vascular dementia tend to be significantly more impaired than patients with AD on tests of executive function such as verbal fluency, and their level of memory impairment is usually less severe.17 In frontotemporal dementia, letter fluency and executive function are usually worse than in AD, while memory performance is usually better.18 Lewy body dementia is characterised by dysfunction in attention, visuospatial tasks, letter fluency, mental tracking and abstract reasoning.19 Sensitivity to all types of dementia would undoubtedly increase if a screening instrument covered the cognitive domains known to be impaired in the various types of dementia, rather than having a restricted focus on memory impairment. Based on established neuropsychological profiles in different dementias, there are six core domains or abilities that should be covered by a comprehensive screening instrument: attention/working memory, new verbal learning and recall, expressive language, visual construction, executive function and abstract reasoning.
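The aims of this review were to identify currently available cognitive screening tests and consider their suitability for three main purposes: brief assessment in the doctor's office; large scale community screening programmes; and identifying profiles of impairment across different cognitive, psychiatric and functional domains/abilities, to guide differential diagnosis and further assessment.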
This paper is intended to serve both as an information resource about available screens and as an evaluative critique of those screens for the purposes described above.
Screens were identified by searching electronic databases (Entrez‐PubMed, CINAHL, PsycINFO and IngentaConnect), using combinations of the following terms: “dementia”, “Alzheimer”, “cognitive impairment”, “post stroke”, “screen”, “primary care” and “community”. Individual test names were also used as search terms. In addition, the reference lists of papers yielded were manually searched.
Screening tests were included if they were designed to screen for cognitive impairment or had been used for that purpose, had an administration time of less than 20 min and were available in English. Screens could be administered directly to patients, or be partly or fully informant rated. Individual papers relating to each screen were included if they: were the original paper presenting the content of the screen; presented data relating to the screening aspects of the test (as opposed to non‐relevant aspects such as factor structure); presented data relating to the performance of the test as it stands alone (ie, validity statistics based on scores from combined sources (screen test plus functional status, for example) were not considered). Validity studies must also have employed acceptable “gold standard” diagnostic criteria (ie, based on international diagnostic guidelines or clinical judgement following a full assessment battery); the use of another screening test as the gold standard was not acceptable. Further criteria were applied when selecting screens for inclusion in tables 2 and 3, and are detailed in the relevant parts of the results section below.
Data were extracted for each screen by one author (BC), according to a semi‐structured pro‐forma encompassing reliability statistics, sample types, validity statistics by type of diagnosis and pertinent comments or criticisms contained in individual papers. A list of cognitive, psychiatric and functional domains/abilities covered by each test was made independently by two of the authors (BC and BO'N), and a final list was agreed upon by consensus, with a third author (JJE) consulted if necessary.
Thirty‐nine tests were identified which met selection criteria. A further three tests did not meet the criteria: the Community Screening Interview for Dementia (CSI ‘D')20 and the Cambridge Cognitive Examination (CAMCOG),21 as each takes more than 20 min to administer, and the Mental Alternation Test,22 as it has not been standardised against an acceptable gold standard. Table 1 presents the names, source references, administration times and reliability coefficients (where published) of the 39 tests included. Of individual published papers pertaining to the 39 screens examined, a total of eight were excluded because of use of an inappropriate gold standard, or because other information was added to the screen score when calculating validity statistics, or because they did not directly examine screen validity. The remainder are referenced in tables 2 and 3 below.
Table 2 shows how a selected subset of the tests performed on the key characteristics deemed important for brief assessment in the doctor's office (ie, good sensitivity/specificity balance, validation in samples from varied sources (especially primary care or community) with varied illness aetiologies and brevity). Coverage of the six key cognitive abilities is also detailed. Two of the six identified key cognitive abilities were attention/working memory and executive function; these were represented by digit span/other mental tracking and verbal fluency, respectively. Tests which are wholly informant rated were excluded from table 2, as proxy rating is often not feasible or optimal in the medical consultation setting, because of the absence of an informant, concerns about confidentiality on the part of the patient or inability of informants to give a reliable history. Telephone administered screens were also excluded. Of the remainder, tests were selected for inclusion in table 2 if their reported sensitivity and specificity were high (above 85%) for all dementia types together or for more than one particular subtype alone, and/or they covered at least three key domains. Cross comparison of the validity and sample source columns gives an indication of the varied performance of screens in different samples.
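The selection rule for table 2 can be written out as a small filter. This is only a sketch of our reading of the rule: the class, field names and example screens are hypothetical, "and/or" is interpreted as a logical or, and "above 85%" is applied to both sensitivity and specificity.

```python
from dataclasses import dataclass, field

# The six key abilities named in the introduction.
KEY_ABILITIES = frozenset({
    "attention/working memory",
    "new verbal learning and recall",
    "expressive language",
    "visual construction",
    "executive function",
    "abstract reasoning",
})


@dataclass
class Screen:  # hypothetical record, for illustration only
    name: str
    informant_only: bool = False
    telephone: bool = False
    # diagnosis label -> (sensitivity, specificity), as proportions
    validity: dict = field(default_factory=dict)
    abilities: frozenset = frozenset()


def high_validity(screen, threshold=0.85):
    """True if sensitivity and specificity both exceed the threshold for
    all dementia types together, or for more than one subtype alone."""
    def ok(pair):
        sensitivity, specificity = pair
        return sensitivity > threshold and specificity > threshold

    if "all dementia" in screen.validity and ok(screen.validity["all dementia"]):
        return True
    good_subtypes = [label for label, pair in screen.validity.items()
                     if label != "all dementia" and ok(pair)]
    return len(good_subtypes) > 1


def eligible_for_table2(screen):
    """Apply the exclusions first, then the validity and/or coverage rule."""
    if screen.informant_only or screen.telephone:
        return False
    covered = len(screen.abilities & KEY_ABILITIES)
    return high_validity(screen) or covered >= 3


if __name__ == "__main__":
    # Entirely made-up screens, used only to exercise the rule.
    a = Screen("Screen A", validity={"all dementia": (0.90, 0.88)})
    b = Screen("Screen B", informant_only=True,
               validity={"all dementia": (0.95, 0.95)})
    for s in (a, b):
        print(s.name, eligible_for_table2(s))
```

Under this reading, a directly administered screen qualifies either on validity or on breadth of coverage, whereas a wholly informant rated or telephone administered screen is excluded regardless of its statistics.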
As table 2 shows, the 3MS and the CASI both perform well in this context, eliciting information about key cognitive abilities, with robust validity in non‐selected samples. The MMSE is probably the most widely utilised screen, although it does not cover all key abilities. The ACE‐R and DemTect are included in table 2 as potentially useful in community/primary care samples, although they have not yet been validated in this way.
Criticisms have been noted in the literature regarding the characteristics of some of these tests (eg, age and/or education biases (MMSE,6 GPCOG75), low specificity in some unselected samples (CASI),76 wide variation in scores without any accompanying clinical change (3MS)25 and low positive predictive value (AMT)).77
Table 3 shows tests suitable for large scale community screening programmes (ie, those which can be administered indirectly (eg, by telephone or by informant proxy) and which have a good sensitivity/specificity balance (>85%) for all dementia types together or for more than one particular subtype alone). Some of the shorter tests described in table 2 could also be considered for this purpose, if the screening campaign is to be conducted directly (eg, the CAST is a self‐report paper and pencil test and could be administered by post).
Shortcomings are evident for some of these tests, or the ways in which they have been validated (eg, variable test–retest reliability across individual items (IQCODE‐SF)48; gold standard diagnosis based on case note review (MCAS)49; and non‐comparable assessment procedures for patients (informant rated) and controls (self‐rated; SMQ)).60
Tables 4 and 5 describe the content of all tests according to a comprehensive checklist of abilities/domains assessed. Wholly informant rated tests and those administered by telephone are not included in these tables, as they are unlikely to be used alone to guide subsequent direct testing of a patient. Two of the tests reviewed (CDT and VFC) are listed as cognitive abilities in and of themselves, due to the range of impairments that might contribute to poor performance, and a lack of clear consensus on the sub‐abilities tapped. The “number transcoding” task has also been treated as an ability in itself, for the same reasons. The ability named “verbal fluency” refers specifically to timed assessment of word fluency for letters or categories, rather than the broader definition sometimes used in the literature, which may refer to conversational fluency, etc.
A test was considered to cover a named ability if a subscore was assigned to an item which assessed that ability (eg, while “receptive language” is certainly an element of any verbally administered test, it was only marked as present on the table if it independently received a score which contributed to the total). Direct assessment was also required (ie, testing of verbal recall by presentation of new information in the assessment session rather than by indirect self‐report). The six key abilities discussed above are shown in table 4; these are encompassed in the nine columns in table 4 (digit span or other mental tracking; verbal fluency; reasoning/judgment; expressive language; visual construction; immediate free verbal recall or delayed free verbal recall or cued verbal recall). Recognition memory is not included in the key abilities as it may be intact in dementia, despite impairment of free recall. It should also be noted that the length of the delay in delayed recall tasks varies across screens. However, all delayed recall responses are elicited after an intervening period of at least a couple of minutes, during which an unrelated task is performed, precluding rehearsal. It seems likely that longer delays would be more sensitive to impairment, although it is unclear at what point in time information passes from the episodic buffer to long term memory.109
The 3MS and CASI are the only tests which cover all six key abilities. Where tests cover four or five of the six abilities (ACE‐R, SASSI, MMSE, NCSE, STMS), reasoning/judgment and verbal fluency are most frequently absent. Memory is not directly assessed by several tests, even those covering a number of other abilities (CAST, SPMSQ). The CAST does, however, elicit indirect information about memory problems in its self‐report questionnaire section. Conversely, other screens test memory and no other ability (3WR, HVLT, MIS); these are included in the present review because they have been used in the literature as screening tools, yet they would clearly have less utility in guiding other specific areas for further, broad ranging cognitive testing.
Other screens were included in this review but fit less easily with the three main uses described above. Some tests were designed for very specific purposes (eg, the R‐CAMCOG for screening post‐CVA cognitive impairment, and the ABCS for mild cognitive impairment). These tests have been developed to overcome the limitations of existing screens in particular patient groups, and reflect the point that “one size fits all” screens can be of limited clinical utility in specialist settings.
The aims of this paper were to identify and evaluate available screening instruments for cognitive impairment. Screens have been presented in tables according to different purposes, forming a quick reference resource to assist clinicians and researchers in making choices. We have also evaluated screens according to the three main purposes outlined, and have drawn attention to criticisms in the literature.
The first purpose for which we considered the screens was as brief assessment tools in the clinical setting, particularly in primary care. This is probably the most common way in which screens are used, and is the focus of policy consensus statements,6,7,8 which have highlighted the dearth of evidence in favour of routine screening. It is clear from the present review that few of the 39 tests identified have been validated in the types of unselected primary care or community based samples which would be representative of target populations for screening efforts. It is interesting that the screens which rate highest with regard to validation methods and statistics, as well as coverage of key cognitive abilities, are those which expand on the content of the MMSE and from which an MMSE score can be derived (3MS, CASI, SASSI). The ACE‐R also expands on the MMSE but has yet to be validated in non‐specialist settings. It is likely that these screens will prove easily acceptable to clinicians already familiar with the MMSE. There does not appear to be a direct relationship between number of key cognitive abilities covered and the validity statistics; however, the usefulness of having broader coverage lies more in the qualitative information it adds to the basic score. Despite an understandable drive towards ultra‐brief tests which can be used in a typically time constrained GP consultation, an administration time of more than 10 min appears to be an unavoidable cost of achieving sufficiently robust statistical performance while covering key domains.
The second purpose considered was large scale community screening programmes. Informant rated scales, or assessments of patients which can be carried out by telephone or post, formed the main focus of this section. However, some community screening initiatives (eg, memory awareness days in clinics or community centres) could be conducted face to face using the shorter of the instruments detailed in table 2. Of the informant scales, the IQCODE (in its original and abbreviated versions) is the most widely used, although it has variable performance across reported studies. The SMQ shows promise as a brief and accurate screen, meriting further study.
Coverage of various cognitive, psychiatric and functional abilities/domains was examined for all 39 screens. Tests varied in coverage from single domain tasks to wide ranging mini‐batteries. Clearly, if the clinician's aim is to elicit useful qualitative and quantitative information about the profile of a patient's presenting symptoms, then wider ranging screens will be of greater value. Secondary or tertiary care clinicians (working in psychiatric, neuropsychological or neurological settings, for example) are likely to be more concerned with differential diagnosis or with further investigation of mild or unusual presentations, situations in which clinical judgement will take precedence over composite scores and cut‐points; scales with broader coverage will therefore be sought in preference to brief assessment tools. It is possible to achieve a balance between these different uses, however, as with scales such as the 3MS, CASI, SASSI and ACE‐R.
An important point to note is that although a number of screens have been validated in particular subtypes of dementia, this does not mean that they are necessarily useful for differential diagnosis. Most sensitivity and specificity statistics for the various subtypes of dementia were calculated against normal controls, rather than other types of dementia. This means that a screen which is particularly good at picking up AD, for example, will not in fact be useful clinically unless it is also good at picking up non‐AD impairments. An effective screen is one which can firstly identify impairment of any aetiology, and secondly provide an indication as to the most likely aetiology in a particular case. For the former aim, it matters most that a screen has demonstrated good validity in samples of mixed aetiology to detect any type of impairment (ie, the “all dementia” column in tables 2 and 3); for the latter, it matters not that a screen can distinguish AD from normal controls (for example), but that it can distinguish AD from non‐AD aetiologies. The ACE‐R is notable for having been specifically validated with differential diagnosis in mind: the patient's individual profile across cognitive domains can be used to estimate the likelihood that their impairment is due to AD versus frontotemporal dementia, providing a valuable adjunct to their simple overall score. This further underscores the importance of covering a wide range of cognitive abilities when designing a screen (and, as mentioned in the introduction, fits better with the preferred working methods of most clinicians). Until other screens are also examined for effectiveness in distinguishing between different aetiologies, anything other than the “all dementia” calculations is clinically redundant.
If one considers the commonest application of screening (ie, brief direct assessment of patients, with the aim of firstly identifying any impairment and secondly providing an indication of the cause of that impairment), then the screens which are likely to be most useful are those which have good sensitivity and specificity for all dementia types in unselected populations, and which elicit information about key cognitive abilities, which can then be compared with neuropsychological profiles in different types of dementia. Table 2 shows that the most promising candidates are the 3MS, CASI, MMSE, SASSI, STMS and ACE‐R. The STMS is notably shorter than the others and so may appeal to the most time pressed clinicians. The 3MS and CASI are the only screens which have been validated in community samples and which cover all the key cognitive abilities, and so are good candidates for those with more time available (although note the shortcomings mentioned in the text accompanying table 2 above). The ACE‐R has not yet been validated in community samples, but its focus on differential diagnosis profiles may be particularly useful for clinicians in secondary/tertiary practice, to guide further investigations.
The specific criticisms described in the results section regarding some of the screens are indicative of common shortcomings in test validation research. Few screens have been validated in unselected samples, and those that have are frequently subject to differential gold standard procedures for patient and control groups. It is rare for all participants who screen negative in large community samples to undergo the same type of confirmatory assessment as those with positive screens. This leads to verification bias, whereby sensitivity calculations are overestimated and specificity underestimated.110 Applicability to real life situations is further compromised by restrictive sample recruitment criteria which often exclude those with a history of substance use, neurological and psychiatric disorder, head injury and other common comorbidity. In addition, as table 1 shows, many authors have not published reliability statistics for their screens. Adequate reliability (internal, test–retest and inter‐rater) is a prerequisite for robust validity, and should be evaluated and reported routinely. These factors should be borne in mind when evaluating all of the screens described here.
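The direction of verification bias can be seen in a simple arithmetic sketch. The numbers are hypothetical, and the model is a deliberately simplified assumption of ours: every screen positive receives the gold standard assessment, only a random fraction of screen negatives does, and validity is then calculated naively from the verified participants alone.

```python
def naive_estimates(n_impaired, n_unimpaired, sensitivity, specificity,
                    frac_negatives_verified):
    """Sensitivity and specificity as they would be calculated from
    verified participants only, under partial verification.

    Assumes all screen positives are verified and a random fraction
    of screen negatives is verified (a simplifying assumption).
    """
    # Expected cell counts in the full sample.
    tp = sensitivity * n_impaired
    fn = (1 - sensitivity) * n_impaired
    tn = specificity * n_unimpaired
    fp = (1 - specificity) * n_unimpaired

    # Only some screen negatives reach the gold standard assessment.
    fn_verified = fn * frac_negatives_verified
    tn_verified = tn * frac_negatives_verified

    naive_sensitivity = tp / (tp + fn_verified)
    naive_specificity = tn_verified / (tn_verified + fp)
    return naive_sensitivity, naive_specificity


if __name__ == "__main__":
    # Hypothetical community sample: 100 impaired, 900 unimpaired;
    # true sensitivity 0.80, true specificity 0.90;
    # one in four screen negatives is verified.
    se, sp = naive_estimates(100, 900, 0.80, 0.90, 0.25)
    print(round(se, 3), round(sp, 3))
```

In this example the naive sensitivity is about 0.94 against a true value of 0.80, and the naive specificity is about 0.69 against a true value of 0.90. When every screen negative is verified (a fraction of 1.0), the naive figures return to the true values.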
In our endeavour to present a comprehensive overview of as many screens as possible, it was not feasible to conduct a fully rigorous quality rating of each study from which we extracted the data presented here. We have, however, applied inclusion criteria as described in the methods section, and have noted critical points regarding certain screens and studies. This review is intended to serve as a resource and starting point from which interested readers can further investigate particular screens for their own requirements.
In consideration of the various purposes for which cognitive impairment screens can be used, it is almost certainly futile to attempt to develop screens that fit all needs. Out of 39 screens identified, we have emphasised a small subset that, in our opinion, have particular strengths, but ultimately there is no such thing as the perfect screen for all purposes. Clinicians should move away from the tendency to become over reliant on one screen (usually the MMSE), and take advantage of the continually evolving (and dauntingly extensive) range of more specialised tools for different situations. This task would be made easier if researchers were to focus on refining and adapting existing screens, with closer consideration of the theoretical basis of symptom profiles in different diagnoses, and specific examination of differential diagnosis within impaired samples. Regardless of policy positions on the merits or otherwise of routine cognitive screening, there is a wealth of potential benefit in the thoughtful application of existing screens in clinical practice.
AD - Alzheimer's disease
CAMCOG - Cambridge Cognitive Examination
GP - general practitioner
MMSE - Mini‐Mental State Examination
Competing interests: None.