Dementia of the Alzheimer’s type (DAT) is a major public health threat in developed countries where longevity has been extended to the eighth decade of life. Estimates of prevalence and incidence of DAT vary with what is measured, be it change from a baseline cognitive state or a clinical diagnostic endpoint, such as Alzheimer’s disease. Judgment of what is psychometrically “normal” at the age of 80 years implicitly condones a decline from what is normal at the age of 30. However, because cognitive aging is very heterogeneous, it is reasonable to ask “Is ‘normal for age’ good enough to screen for DAT or its earlier precursors of cognitive impairment?” Cost containment and accessibility of ascertainment methods are enhanced by well-validated and reliable methods such as screening for cognitive impairment by telephone interviews. However, focused assessment of episodic memory, the key symptom associated with DAT, might be more effective at distinguishing normal from abnormal cognitive aging trajectories. Alternatively, the futuristic “Smart Home,” outfitted with unobtrusive sensors and data storage devices, permits the moment-to-moment recording of activities so that changes that constitute risk for DAT can be identified before the emergence of symptoms.
With the extension of longevity in the United States and elsewhere, dementia with aging has become a major public health threat. In epidemiological studies of dementia and precursor conditions, what is measured has a large effect on reports of prevalence and incidence. What is measured also affects the cost of the research, especially with regard to case ascertainment. This article, which complements the one by Weir et al, focuses on the choice of desired endpoints of ascertainment and how to measure cognition.
Specifically, Evans (section 2) weighs the arguments both for assessing cognitive change against a personal baseline and for identifying markers of clinical diagnoses such as Alzheimer’s disease (AD) or mild cognitive impairment (MCI). Because dementia of the Alzheimer’s type (DAT) is not an acute event and pathophysiological changes in the brain can precede overt symptoms by many years [2–9], it would be preferable to detect accelerated cognitive decline before a frank diagnosis so that preventative measures can be taken. However, clinical diagnoses are practical and easier to conceptualize in the healthcare setting. Grodstein (section 3) points out that cost containment and accessibility are enhanced by well-validated and reliable telephone interviews that provide a global cognitive screen. Loewenstein (section 4), by contrast, sees value in the focused assessment of episodic memory and its components, which targets the key symptom associated with AD. This approach, especially if tests are shortened, might prove to be more effective than other cognitive measurements to distinguish normal from diseased aging. Kaye (section 5) provides a view toward the future fueled by technological advances, as he espouses a model of a scalable “Smart Home.” Unobtrusive sensors and data storage devices permit the moment-to-moment recording of activities so that changes that might constitute a risk for cognitive decline can be identified before the emergence of symptoms. Finally, Weintraub (section 6) provides a synthesis of all of this material and suggests that our views of what is “normal for age” are challenged as we struggle to face the pending healthcare crisis of the next several decades.
This section focuses on using directly measured individual cognitive decline as a study outcome, and a contrast is made with using AD and MCI diagnostic endpoints as research study outcomes. Both approaches are complementary, and neither is inherently superior to the other. Rather, the two approaches are fully compatible; each has strengths, and coordinated use of both may be preferable to relying on either one alone. A central goal of research in this area is to facilitate prevention by identifying at-risk subjects and potential risk factors earlier when the disease process is more susceptible to intervention. Focus on the intermediate syndrome of MCI is of undoubted value in this regard. Rigorous study of the entire continuum of directly measured individual cognitive decline may well be a logical next step.
Perhaps the greatest conceptual advantage of directly measuring individual decline in cognitive function as an outcome is the potential to identify risk factors early, facilitating prevention of progression to MCI and AD. As the goal of preventing fully manifest AD is increasingly seen as vitally important, the need to recognize the early, or preclinical, stages has been strongly emphasized [10,11]. Other conceptual advantages are (1) the greater validity of assessing decline in cognition directly by measuring it repeatedly rather than by attempting to infer cognitive decline indirectly from a single measurement without previous baseline, and (2) the potential reduction in bias. Practical advantages include greater statistical power and reduced cost of data collection.
In studies of cognitive impairment or dementia, the object of interest is each subject’s loss of cognition from a former state, and not that subject’s level of cognition as compared with an absolute, “normative” value. Accurate inference of the degree of cognitive loss from measurement of cognition at a single point (cross-sectional assessment) is challenging because cognitive abilities vary greatly among individuals and there is great random variation in measurement of cognition. Even assessment for incident dementia is typically cross-sectional because it usually consists of making a diagnosis of dementia among subjects who previously did not meet criteria for the condition. Direct measurement of individual change in cognition for each subject is preferable, but requires that cognitive function be assessed sequentially at a minimum of three time points and that appropriate statistical models be used to adequately identify change from an initial cognitive level. Typically, mixed models (random-effects models) are used.
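As a minimal sketch of the kind of random-effects model meant here, a linear mixed model for the cognitive score of subject $i$ at visit $j$ can be written as follows. The notation is an assumption introduced for illustration and is not taken from the studies cited: $y_{ij}$ is the observed score, $t_{ij}$ the time since baseline, $\beta_0$ and $\beta_1$ the population-average initial level and rate of change, and $b_{0i}$, $b_{1i}$ the subject-specific deviations from them.

```latex
y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \varepsilon_{ij},
\qquad
(b_{0i}, b_{1i})^{\top} \sim \mathcal{N}(\mathbf{0}, \mathbf{G}),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
```

Under this sketch, the individual rate of decline is $\beta_1 + b_{1i}$, and a putative risk factor would typically enter as an additional term multiplying $t_{ij}$. With only two observations per subject taken at common times, the variance of the individual slopes cannot be separated from the measurement error variance $\sigma^2$, which is one way to see why at least three time points are called for.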
Another conceptual advantage of directly measured individual change in cognitive function over a diagnostic endpoint as an outcome is its potential lack of bias. This advantage may be especially important in assessing whether apparent individual characteristics such as race/ethnicity or gender are related to dementia or cognition. Examiners cannot be effectively masked to these characteristics, which are typically obvious. The possibility of subtle, but important examiner bias in reaching a diagnosis is exceedingly difficult to eliminate. It is unlikely to be substantially reduced by providing information from the examiner to a panel of expert clinicians who actually assign a diagnosis on the basis of this information. The possibility of such bias affecting the analytic result is especially strong when the characteristic of interest coincides with a difference in average lifelong level of cognitive performance. For example, meaningful assessment of the possibility of differences in risk for dementia or cognitive loss between those belonging to African American race/ethnicity and Americans belonging to white race/ethnicity has been challenging because African Americans usually score lower on tests of cognition, most likely for cultural and educational reasons [13–17]. Race/ethnicity, education, and other factors distorting measurement of cross-sectional cognitive performance most likely do not affect directly measured individual change in cognitive function at older ages. In older age, most educational and cultural factors that affect measurement of cognition are fairly stable, and for each person, the factors distorting measurement of cognition at one time point are likely to be similar to those affecting subsequent time points. To the extent that the distorting factors are constant over time, they will not affect the measurement of individual changes in performance between these points.
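The cancellation argument can be made explicit with a short worked equation. The symbols are introduced here purely for illustration: $x_{it}$ is the observed score of person $i$ at time $t$, $c_{it}$ the true cognitive level, $d_i$ a person-specific distortion (for example, from education or cultural factors) assumed constant over the follow-up period, and $e_{it}$ random measurement error.

```latex
x_{it} = c_{it} + d_i + e_{it}
\quad\Longrightarrow\quad
x_{i2} - x_{i1} = (c_{i2} - c_{i1}) + (e_{i2} - e_{i1})
```

The distortion term $d_i$ drops out of the difference, whereas it remains in any single cross-sectional score $x_{it}$. If the distortion itself changed between assessments, it would no longer cancel, which is why the argument rests on these factors being fairly stable in older age.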
Greater statistical power is achievable because analyses of directly measured change in cognitive function can include the entire spectrum of cognition. All subjects across this spectrum with the requisite measurements will have a value for cognitive decline. However, with studies of clinical syndromes, such as AD or MCI, the limiting factor determining statistical power is typically the smaller number of subjects developing these conditions. Further, in most studies, use of directly measured individual decline in cognitive function as an outcome results in reduced costs. Sequential administration of cognitive tests by trained technicians as required to measure this outcome typically is less costly as compared with clinical evaluation by skilled clinicians, often specialists, as required for diagnosis of the clinical syndromes of AD and MCI.
Conversely, using the clinical syndromes of AD and MCI as outcomes also has advantages. Because many criteria for AD diagnosis include functional loss, a sense of the burden of the disease is typically inherent in the disease definition. In diagnosing AD, a clinician judges that the subjects in the study have functional impairment from the condition, although the level of impairment required for diagnosis may vary among different examiners. An advantage is that incorporating functional impairment emphasizes the public health burden of AD. A disadvantage is that it becomes difficult to rigorously study functional impairment as an AD consequence in cases when it is inherent in the disease definition. From the perspective of AD consequences, including functional impairment as a disease criterion is a self-fulfilling prophecy; one must have the consequence of functional impairment to have a clinical diagnosis of probable AD. Further, communications to general clinical audiences are facilitated if they can be summarized in terms of particular clinical syndromes. Change in cognitive function is difficult to summarize easily in terms that are clearly understandable intuitively. Decline in cognition also does not provide specificity as to the conditions responsible for the decline; the extent to which the observed decline in cognition is because of AD, vascular dementia, another condition, or a combination of conditions is not certain.
These approaches to studying decline in cognition and the conditions responsible seem strongly complementary. The major advantages of using directly measured individual change in cognitive function as an outcome in studies of cognitive loss or dementia—especially reduced bias, direct assessment of cognitive decline rather than inference of decline from a single measurement, and increased statistical power—are essentially most applicable to analytic studies, those evaluating the relations between a putative risk factor and cognitive decline. In contrast, the advantages of studying the clinical syndromes of AD and MCI, especially ease of communication, apply most directly to either more descriptive studies, such as those seeking to measure the prevalence or incidence of AD or MCI in a population or healthcare setting, or to studies of the consequences of AD or MCI such as behaviors or healthcare utilization.
Conclusions regarding either risk factors or consequences are more secure when the results of the two approaches coincide in direction and, to the extent it can be accurately assessed, magnitude. However, if the results found with one approach differ substantially from those found with the other, an explanation must be sought and may well be informative.
At a population level, average differences in cognition or cognitive change among individuals are often fairly subtle, so a large sample size is needed to estimate differences precisely. This section addresses the measurement of cognitive function in very large populations, at very low cost. The main focus is the telephone cognitive battery used for repeated measures of cognitive function in approximately 20,000 subjects of the Nurses’ Health Study (NHS). The NHS cognitive assessment battery will be described briefly and its properties considered. Finally, the strengths and weaknesses of using a telephone instrument to measure cognition will be weighed. The NHS battery is made up of tests of global function, episodic memory, category fluency, working memory, and attention. Specifically, the battery includes the Telephone Interview for Cognitive Status (TICS), an adapted telephone version of the Mini-Mental State Examination. Delayed recall of the TICS 10-word list, which constitutes the TICS-modified, is also tested. In addition, the battery incorporates immediate and delayed recall trials of the East Boston Memory Test to assess paragraph recall. Thus, there are several measures of verbal anterograde memory impairment, which is a strong predictor of AD. Category fluency is tested by having the subject name as many animals as possible in one minute. An oral adaptation of the Trail Making Test, Parts A and B, which has been well-validated and compares favorably with in-person administration of the original Trail Making Test, is administered. Finally, subjects are given the Digit Span Backwards Test. With the inclusion of the Trail Making Test, the NHS battery covers six of the seven domains and/or functions (excluding naming) in the Uniform Dataset of the National Institute on Aging Alzheimer Disease Centers’ neuropsychological battery.
The NHS battery was validated against a more extensive battery used in a sample of 61 subjects from the Religious Orders Study (ROS) who received both batteries. The ROS battery, administered in person, comprises 21 cognitive tests. For each participant, a global score (average performance across all the tests) was obtained for each battery. The two global scores had a correlation value of 0.81. This finding suggests that cognitive testing by telephone captures cognitive performance in a manner similar to face-to-face testing, and that a short battery can effectively assess overall cognition.
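The comparison of global scores can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analysis used in the validation study: it assumes that each test is standardized across participants before averaging (the text says only that performance was averaged across tests), and every score and test name below is hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one test's raw scores across participants."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def global_scores(battery):
    """Average each participant's standardized scores across all tests.

    `battery` maps a test name to a list of raw scores, one per participant,
    in the same participant order for every test.
    """
    standardized = [z_scores(scores) for scores in battery.values()]
    return [mean(per_person) for per_person in zip(*standardized)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical raw scores for five participants (not study data).
telephone = {"tics": [30, 34, 28, 36, 32], "animal_fluency": [14, 19, 12, 20, 15]}
in_person = {"test_a": [10, 13, 9, 15, 12], "test_b": [40, 47, 38, 52, 44]}

r = pearson_r(global_scores(telephone), global_scores(in_person))
print(f"correlation between global scores: {r:.2f}")
```

Standardizing before averaging keeps a test with a wide raw-score range from dominating the global score; whether the original analysis did this is not stated in the text.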
A clinical validation study compared findings from the NHS battery with the clinical diagnosis of dementia (F.D. Grodstein, unpublished, 2005). Cognitive function was assessed by telephone in approximately 80 women. Later, informant interviews were conducted using the telephone Dementia Questionnaire. With information from the informant interviews, a neurologist from the Massachusetts Alzheimer Disease Research Center determined dementia status in the 80 women. Those who scored <31 points on the TICS had an eightfold increased risk of dementia diagnosis. For scores on the verbal memory tests, there was a 12-fold increased risk of dementia diagnosis for women with a poor episodic-memory score as compared with those who performed well. All of these findings were statistically significant. This study is a strong clinical validation of the effectiveness of telephone tests in evaluating cognition.
Finally, perhaps the most important validation of telephone cognitive testing is the ability to detect established risk factors for cognitive impairment or decline. One of the most established risk factors for cognitive impairment is apolipoprotein E genotype. Approximately 4000 women in the NHS were genotyped. Cognitive impairment was defined as a score falling within the worst 10% of the distribution of cognitive scores. For women with an ε4 allele, the odds ratio was 1.5, or a 50% higher risk of cognitive impairment. For those who were homozygous for ε4, the odds ratio was 2.6, with a highly significant trend of increased risk of cognitive impairment with increased number of ε4 alleles. These results are similar to those that would be expected in any study that assessed cognition in person. To further test the ability of the battery to measure specific cognitive domains, the relationship between the ε4 allele and verbal memory was considered. Mean scores across four tests of verbal memory in women who had an ε4 allele were compared with those from women who did not; women with an ε4 allele scored significantly worse than those with no ε4 allele, by 0.15 standard units. For comparison, ROS investigators conducted a similar analysis using their verbal memory tests from an in-person interview. They found a mean difference of −0.1 standard units in verbal memory score for women with versus those without an ε4 allele, almost identical to the NHS findings. Again, this suggests that the NHS telephone cognitive battery and in-person cognitive assessments yield equivalent findings for risk factor relationships.
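The two quantities used in this analysis, a cut-off marking the worst 10% of scores and an odds ratio comparing carriers with noncarriers, can be sketched as follows. This is a crude, unadjusted calculation for illustration only: published odds ratios are usually adjusted for covariates, and the function names and all counts below are hypothetical rather than NHS data.

```python
def impairment_cutoff(scores, fraction=0.10):
    """Return the score at or below which a participant is in the worst `fraction`.

    Lower scores are taken to mean worse performance. With tied scores at the
    cut-off, slightly more than `fraction` of participants may be flagged.
    """
    ordered = sorted(scores)
    k = max(1, int(len(ordered) * fraction))
    return ordered[k - 1]


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Unadjusted odds ratio from the four cells of a 2x2 table."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed


# Hypothetical scores 1..100: the worst 10% are those scoring 10 or lower.
cutoff = impairment_cutoff(list(range(1, 101)))

# Hypothetical counts: impaired / not impaired among carriers and noncarriers.
crude_or = odds_ratio(exposed_cases=20, exposed_noncases=80,
                      unexposed_cases=10, unexposed_noncases=90)
print(f"cut-off: {cutoff}, crude odds ratio: {crude_or:.2f}")
```

Because impairment is defined relative to the distribution of scores within the cohort, the cut-off depends on who is in the sample; the odds ratio then compares groups within that same cohort.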
Other groups have also addressed the utility of a brief telephone cognitive battery for assessing cognition. For example, Wilson and Bennett considered one that tested episodic memory, working memory, semantic memory, and global cognitive function in 996 participants of the ROS. They found very strong relationships between apolipoprotein E genotype and each cognitive system, and between each of these cognitive systems and AD pathology. This study also demonstrated the validity of a brief cognitive battery that can be administered by telephone.
Although validity is critical, test–retest reliability is important as well. The TICS was administered twice to the same group of 50 women in the NHS, within a 31-day interval. The two sets of scores had a correlation of 0.7, which is within acceptable standards for reliability measures in cognitive tests. This reliability test was performed in generally high-functioning subjects; therefore, separate data were not available for those with MCI or dementia. However, Brandt and Folstein have reported test–retest reliability of 0.97 for the TICS in 100 subjects with AD.
Because administering cognitive tests to large groups of subjects requires many telephone interviewers, inter-interviewer scoring reliability was also tested. Among a large team of telephone interviewers, there can be some level of judgment in deciding how to score a given response within a cognitive test, and it is important that responses are scored similarly regardless of the interviewer. In one study (F.D. Grodstein, unpublished, 1998), 10 telephone interviewers scored the same set of five cognitive interviews. A very high correlation of test scores across interviewers was found, >0.95 for each cognitive test. One important aspect of achieving high inter-interviewer reliability is to maintain strict and conservative scoring rules, diminishing the possibility of subjective judgment in the scoring.
In conclusion, an important advantage of telephone cognitive testing is its ease of administration and low cost. Further benefits include high participation because participants do not need to travel or to set aside significant amounts of time to participate. High participation enhances the validity of results, and also increases generalizability, such that larger segments of the population can participate instead of only a select group that is willing or able to participate in extensive in-person examinations.
There are disadvantages as well. The telephone method does not allow for certain types of cognitive tests. For example, visuospatial perception and naming, both vulnerable to early changes associated with DAT, cannot be tested. In addition, a potential disadvantage of telephoning is the possibility of hearing impairment interfering with performance, especially in older subjects and especially in men, in whom hearing impairment can be more common than in women. Some encouraging results were obtained in a pilot study comparing telephone cognitive tests in men aged ≥85 years from the Health Professionals Follow-up Study with scores obtained by separate but similar cohorts on tests administered in person (F.D. Grodstein, unpublished, 2002). First, there was 94% participation in the cognitive assessment, which is extremely high for this age group. Second, regarding the validity of performance, scores on the telephone tests were similar to those obtained from in-person testing of the oldest old. For example, on the test of category fluency (animal naming), the mean score among the men from the Health Professionals Follow-up Study was 15.6, as compared with the mean of 15.1 for the Consortium to Establish a Registry for Alzheimer’s Disease neuropsychological battery norm for the oldest-old men and women. In the Digit Span Backwards Test, the mean was 5.5 in the Health Professionals Follow-up Study, versus 5.6 in men and women from the 90+ Study.
The similarity of results between these men who were given a telephone cognitive battery and comparable subjects who were given in-person testing not only suggests that hearing ability may not meaningfully interfere with performance on the telephone, but also indicates that other issues (e.g., writing down word lists, background distractions) are not important limitations because one would not expect to find equivalent performance if there were substantial problems with administration of the telephone tests. In addition, in this same pilot study from the Health Professionals Follow-up Study, hearing ability was evaluated; as expected, 59% of participants reported at least some hearing problems. However, this did not affect participation, nor did it seem to affect performance. Interviewers asked each participant about their hearing ability and adjusted the volume of their voices on the basis of the subjects’ needs; importantly, 11 of 15 items in the telephone cognitive battery can be repeated if the participant has difficulty in hearing the question. On the TICS, the mean score was 31.2 among the men with hearing problems and 31.4 among those without; on category fluency, the mean was 15.8 among those with hearing problems and 15.2 among those without; on the 10-word delayed recall, the mean was 3.3 among the men with hearing problems and 3.5 among those without; and on the East Boston Memory Test, the mean was 8.7 among the men with hearing problems and 8.2 among those without. Therefore, these data do not indicate that hearing ability severely influences results on a telephone cognitive battery.
In brief, a telephone instrument to measure cognitive function can provide meaningful information regarding cognitive status in large populations, at low cost and with good validity and test-retest reliability.
Researchers interested in the assessment of cognitive impairment in the elderly population face substantial challenges. There is considerable interest in using the most sophisticated methods to investigate subtle neuropsychological impairments associated with the earliest stages of neurodegenerative disease. However, at the same time, the complexity of current research questions and the need to capture many different aspects of disease, from clinical signs to biomarkers, have led to pressure to use shorter cognitive batteries to allow for the inclusion of additional types of measures. In practical terms, this has often meant the simple adaptation of many instruments that have long been used in dementia research, without evidence that they have the requisite psychometric properties or provide adequate sensitivity and/or specificity for conditions such as MCI or even more subtle forms of cognitive impairment.
Neuropsychological performance for any individual is determined by a complex array of variables including, but not limited to, cognitive reserve, premorbid strengths and weaknesses, engagement in testing, attentional capacity, sensory acuity, and fatigue. Because any single cognitive measure can be affected by any one or a combination of these factors, neuropsychologists often examine an individual’s pattern of performance across several different indices, comparing that performance with normative data, rather than relying on a single test in isolation. Current trends, however, are to use limited numbers of measures and to define cut-offs for performance as the number of standard deviations by which an individual’s score falls below a reference point set for people of similar age, education, and cultural background.
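A cut-off of this kind can be sketched as follows. The normative values, the band labels, and the 1.5 standard deviation threshold are all assumptions chosen for illustration; they are not taken from any published norms or from the studies discussed here.

```python
# Hypothetical reference values keyed by (age band, education band).
NORMS = {
    ("70-79", "12+ years"): {"mean": 10.0, "sd": 2.0},
    ("80-89", "12+ years"): {"mean": 8.5, "sd": 2.5},
}


def z_score(raw, age_band, education_band, norms=NORMS):
    """Express a raw score in standard deviations from the matched reference mean."""
    ref = norms[(age_band, education_band)]
    return (raw - ref["mean"]) / ref["sd"]


def is_below_cutoff(raw, age_band, education_band, sd_cutoff=1.5, norms=NORMS):
    """Flag a score that falls at least `sd_cutoff` SDs below the reference mean."""
    return z_score(raw, age_band, education_band, norms) <= -sd_cutoff


# The same raw score is classified differently against different reference groups.
same_raw_score = 7
flag_younger = is_below_cutoff(same_raw_score, "70-79", "12+ years")
flag_older = is_below_cutoff(same_raw_score, "80-89", "12+ years")
print(flag_younger, flag_older)
```

The example makes the dependence on the reference group concrete: a score judged impaired against one age band can be judged unremarkable against an older one, even though the person's decline from their own earlier level is not measured in either case.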
The most sophisticated clinical and epidemiological studies of cognitive impairment in the elderly population assess different aspects of neuropsychological function at baseline and ensure that their measurements have sufficient range to track changes in different cognitive domains longitudinally. The optimal cognitive battery may, at minimum, include tests of (1) learning and retentive memory, (2) executive function, (3) language, and (4) visuospatial skills. It is also desirable to have measures of both attention and processing speed because these are often impaired by a variety of brain disorders and, thus, serve as a more general marker of impairment.
Because memory dysfunction is a hallmark feature of AD and is present in many neurodegenerative disorders, this section will focus on selecting memory tests for studies of MCI and the trade-offs between efficiency and comprehensiveness of a cognitive battery.
Many neuropsychologists favor verbal list learning tasks for clinical work, longitudinal investigations of aging, clinical trials, and epidemiological studies. The advantages of this type of memory paradigm are that (1) there is more than one presentation of the to-be-remembered stimuli, thus minimizing the effect of poor attention on performance; (2) learning curves over repeated trials can be examined; (3) the processes of encoding, acquisition, and storage of to-be-remembered information as well as specific types of memory can be distinguished; and (4) many measures also have parallel and alternate forms.
In choosing a list learning test, the number of to-be-remembered items should be sufficient to be challenging for high-functioning subjects (12 to 16 items is optimal). In populations where there may be cultural and/or language issues, alternative list learning tasks may be beneficial, for example, tasks that use actual objects and are made more challenging by requiring the subject to alternate between learning trials and distractor tasks.
There are many other memory tasks that are also well justified in the cognitive assessment of older adults including, but not limited to, paired associates learning, immediate and delayed recall of simple and increasingly complex geometric designs, spatial object location, face–name association, and paragraph recall.
For neuropsychologists, the most widely used paragraph recall test is the Logical Memory subtest on the Wechsler Memory Scale [37,38], which assesses immediate and delayed recall for two passages, each containing 25 elements. In one newer version of the test, the second passage is repeated to assess learning over more than one trial. Although Logical Memory takes no longer than 10 minutes to administer, the pressures to abbreviate the testing have been sufficiently powerful that a standard in the field today is to use a single paragraph to assess memory function. In fact, delayed recall for a single story passage is used in several significant national clinical research initiatives as the only memory measure in addition to a Clinical Dementia Rating clinical score to derive a diagnosis of amnestic MCI. However, a potential problem with this approach is that factors such as auditory difficulties or inattention can greatly affect performance on this cognitive test. Conversely, persons with seemingly intact story memory, especially those with above-average premorbid function, might still have underlying difficulties with other aspects of memory. The solution to this problem is to have tests that are not as prone to attentional effects, have repeated trials, and are more sensitive to early, mild impairment. It is likely that these issues will only become more acute as investigators attempt to identify MCI at even earlier stages.
In their proposed revision of research criteria for AD, Dubois et al contend that paradigms that increase encoding specificity at acquisition and that assess failure to benefit from cueing at recall are superior to episodic memory tests using free recall alone. Buschke et al, using the Selective Reminding procedure, first observed that probing with the same semantic cues used for learning and retrieval was superior to testing free recall alone and was also superior to paired associate learning and Logical Memory in distinguishing mild dementia from normal aging. More recently, Buschke developed the Memory Capacity Test, which requires an individual to learn 16 items from different semantic categories with cued recall and then to learn 16 new items using the same semantic categories with cued recall. Cued recall for the second list and 30-minute delayed recall of the second list of the Memory Capacity Test were particularly useful in identifying amyloid-positive normal elderly individuals with high cognitive reserve who were at greater risk for AD because of their increased amyloid load [41,42].
Loewenstein et al have focused on determining the extent to which vulnerability to semantic interference may identify cases of early AD. The modified three-trial Fuld Object Memory Evaluation paradigm was extended by having subjects recall a second list of items that are all semantically similar to the original to-be-remembered targets (e.g., ring versus bracelet, key versus lock). Reduced recall for the second list as compared with the first list was thought to occur because of competition from the previously presented targets on the first list (proactive interference), whereas reduced recall for the first list after recall of the second list was thought to be related to retroactive interference. The Semantic Interference Test (SIT) evidenced high sensitivity and specificity in distinguishing normal elderly subjects from those with MCI and early dementia. Moreover, vulnerability to proactive interference was most associated with those subjects with MCI who progressed to dementia over a period of 2 to 3 years. Recently, the SIT has been validated for use in epidemiological investigations.
Studies by Buschke et al and Loewenstein et al focus on semantic information processing deficits that may be specific for early AD and support the need to develop memory tests that optimize attention to the to-be-remembered stimuli and emphasize lack of encoding specificity.
For epidemiological studies, briefer memory tests may be required. The Florida Brief Memory Screen (FBMS), which takes approximately 3 to 4 minutes to administer, was recently developed to identify those in need of further evaluation. The primary objective is for the subject to attend to and register the to-be-remembered targets. Free recall is then assessed. The FBMS has been found to be highly reliable and to correctly classify 100% of patients with AD, 82.6% of individuals with amnestic MCI, and 87.5% of normal elderly controls without cognitive impairment. Importantly, the FBMS scores were generally independent of age, education, and primary language (English versus Spanish). Although preliminary, these results suggest that a one-trial memory measure may have potential as a screening measure for amnestic MCI and dementia. For the aforementioned reasons, the FBMS is not viewed as being sufficiently broad or having an adequate number of trials to be used as a stand-alone memory measure, any more than the three words of the Mini-Mental State Examination or the five words of the Montreal Cognitive Assessment Screen would serve on their own to assess memory.
As tests are developed to assess cognitive impairment in its earliest stages, increased sensitivity may come at the cost of decreased specificity. Because false-positive and false-negative classification errors are related to underlying base rates of true impairment in a particular population, this trade-off needs to be evaluated with regard to the research questions being posed and the costs associated with misidentification.
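The dependence of classification errors on base rates follows from Bayes' rule and can be sketched in a few lines. The sensitivity and specificity below simply reuse the 82.6% and 87.5% classification rates quoted above for amnestic MCI and normal controls, and the two base rates are hypothetical values standing in for a community sample and a memory clinic; none of this is a reanalysis of the cited data.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) for a test applied where `base_rate` of people are impaired."""
    true_pos = sensitivity * base_rate
    false_neg = (1 - sensitivity) * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    true_neg = specificity * (1 - base_rate)
    ppv = true_pos / (true_pos + false_pos)   # P(impaired | positive screen)
    npv = true_neg / (true_neg + false_neg)   # P(not impaired | negative screen)
    return ppv, npv


# Same test characteristics, two hypothetical base rates of true impairment.
ppv_low, npv_low = predictive_values(0.826, 0.875, base_rate=0.05)
ppv_high, npv_high = predictive_values(0.826, 0.875, base_rate=0.30)
print(f"base rate 5%:  PPV={ppv_low:.2f}, NPV={npv_low:.2f}")
print(f"base rate 30%: PPV={ppv_high:.2f}, NPV={npv_high:.2f}")
```

With these illustrative inputs, the positive predictive value is about 0.26 at a 5% base rate and about 0.74 at a 30% base rate, so the same screen yields mostly false positives in a low-prevalence population and mostly true positives in a high-prevalence one.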
Although there are many pressures to abbreviate memory measures, recent work with cued recall and semantic interference paradigms suggests that assessments may have to become even more detailed and comprehensive to capture subtle cognitive changes. A new era has commenced, where it is possible to obtain imaging and other biomarkers that point to cerebral compromise years before clinical symptoms are manifested. Instead of a focus on how to reduce existing tests, there needs to be continued development of measures that (1) can use the person as his/her own control in evaluating decrements in performance, (2) reveal vulnerabilities in the cognitive system that cannot be masked by high education and cognitive reserve, and (3) have acceptable levels of specificity. The dementia field deserves no less.
A major challenge in clinical research is the identification of several kinds of events, such as those that are either rare, irregularly occurring, and evanescent (e.g., falls, episodic sleepiness, transient neurological events), or those that evolve slowly over time and may have poor demarcation with regard to onset and transition to new clinical states (e.g., depression, cognitive impairment). Currently, survey and assessment methodologies tend to rely on sparsely spaced queries either by questionnaire or in-person examinations. These depend on recall of events or a brief snapshot of function that assumes that observations recorded during the examination are representative of the person’s typical state of function over relatively long periods. Consequently, it is difficult to identify data or events with detailed temporal precision and intra-individual specificity. They also are of limited ecological validity. A way to improve this state of affairs is to bring the locus of assessment into the home, providing an opportunity to record in real-time events as they occur. Ideally, this is done with minimal obtrusiveness, “in the background” of daily activity. Recent advances in ubiquitous computing and sensor network methodologies along with other convergent technologies and methods such as the techniques of ecological momentary assessment, telemedicine, and activity-monitoring provide a background for achieving these goals. The fully instantiated version of this approach to assessment has further evolved from the concept of the “Smart Home.”
The term “smart home” is often used to describe the general concept of a home that is outfitted with technology that automates the detection of events and performs activities or functions for the person living inside that structure without the resident needing to consciously engage with the technology. In the building industry, this home automation (sometimes called “domotics”) is most commonly focused on environmental engineering (e.g., cooling/heating, lighting, home entertainment systems, alarms).
In the healthcare or biomedicine sphere, the concept is extended to describe a home where technology is placed or used to automate data capture or to provide interventions relevant to health [49–51]. This ranges from automated assessment of motion to detect relevant events such as walking, falls, or general activity levels (e.g., through passive infrared motion sensors located around the home); activities of daily living (e.g., instrumenting a pillbox or medication dispenser); physiological function (e.g., vital signs monitored using an embedded scale in the floor or bed); and degree of socialization (e.g., using motion sensors to measure the amount of time spent outside the home, or with telephone monitoring, number of telephone calls). Ideally, these technologies “talk” to one another and form an intelligent sensor net that acts in the background to unobtrusively identify ongoing patterns of health or intermittent acute events. These data are then available for analysis using various methodologies that may directly record events or infer them through prediction models using the dense continuous data streams generated through these assessments.
To realize the power of transitioning from the current assessment approach (infrequently acquired, conducted in an artificial setting, scheduled at the examiners’ convenience, time limited, and recall-biased) to a more continuous, naturalistic, always-on paradigm, one needs to consider how such a goal may be achieved. Several key elements must be taken into account. First, the principal locus of assessment should change from clinics or “appointments” (even when conducted as a home visit) to assessments that are integrated into the daily life of the home and community of the individual. Second, events should be recorded in real time or near real time. Timing may depend on how important the immediacy of the data is to the research questions. In the case of intervention studies, one may wish to know trends immediately so as to intervene or provide feedback quickly. In risk factor studies, this timing may not be as critical on the reporting side. However, an advantage of pervasive computing technology is that it can improve on current methods by having data that are inherently timestamped, thereby allowing for unusual temporal precision in connecting one risk factor to another. Third, the recording of activity or events of interest should be minimally obtrusive, occurring ideally in the background of a person’s usual daily routines. This quality has been referred to as “ambient assessment.” Finally, the recording of these data should be considered to be a continuous data stream.
As noted previously, the embodiment of this kind of approach comes from the tradition of the smart home. Applications based on this approach in the healthcare domain have, until recently, used it with the relatively limited purpose of serving as a demonstration apartment, or in a limited number of free-standing homes to show feasibility of various technologies for assessing or maintaining function toward health applications. In these laboratory settings, the apartment or home has been outfitted with technologies for monitoring of function for a few hours or days [49–51]. Obviously, this does not represent a real-world setting and does not easily generalize to typical older persons or to a chronically ill population. The other highly relevant thread that applies to the home-based assessment model taking advantage of technology is in the classic application of telemedicine. Here, there is quite a long experience, but most of the work in this area has focused on single domain or condition assessment such as physiological monitoring for diabetes or heart failure or mood assessment for depression. There has not been until recently a more integrated approach that attempts to fuse multiple data streams (physiological, cognitive, behavioral, environmental, etc.) in a continuous manner on a large scale.
As a means to achieve this goal, the Oregon Center for Aging and Technology, a National Institute on Aging-supported Roybal Center, has developed an approach to continuous in-home assessments geared toward the elderly population for assessment of key functions that together lead to loss of independence, or if maintained, successful aging [53,54]. Thus, the goals of the assessment platform are to be able to continuously assess primarily those functions that are critical to home independence such as cognitive function, psychological status, health events (e.g., falls, hospitalizations or emergency room visits, medication changes) and mobility. Figure 1 shows the layout of an individual home and an example of the technology installed for monitoring of these functions in a continuous ambient manner.
The backbone of this system is a network of passive infrared motion sensors that are strategically placed about the home to provide several key measures of function. These include total activity across several different time scales (e.g., activity per day, night, week, weekend, season, after or before a health event such as a fall or illness) and the location where the activity occurs. These same inexpensive sensors can be field-restricted and placed in series in a location trafficked throughout the day or night in the home to provide a measure of walking speed, a metric tied to many meaningful outcomes in gerontology including cognitive function, independent living, morbidity, and even mortality. An instrumented pillbox can time-stamp the daily opening of the correct pill compartment, and thus measure the ability to adhere to medication regimens [56,57].
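How a line of field-restricted sensors yields walking speed can be sketched in a few lines. The example below is an illustration only, not the algorithm used in the deployed system: it assumes evenly spaced sensors, one firing timestamp per sensor for each pass along the line, and a hypothetical spacing of 0.6 m. The function name `walking_speeds` and the sample timestamps are likewise invented for the example.

```python
from datetime import datetime, timedelta


def walking_speeds(passes, sensor_spacing_m):
    """Estimate walking speed (m/s) for each pass along a sensor line.

    passes: list of passes; each pass is a list of firing timestamps
            (datetime), one per sensor, in the order the sensors are mounted.
    sensor_spacing_m: assumed uniform distance between adjacent sensors.
    """
    speeds = []
    for firings in passes:
        if len(firings) < 2:
            continue  # a single firing gives no distance to time
        elapsed = (firings[-1] - firings[0]).total_seconds()
        if elapsed <= 0:
            continue  # out-of-order or simultaneous firings are discarded
        distance = sensor_spacing_m * (len(firings) - 1)
        speeds.append(distance / elapsed)
    return speeds


# Hypothetical data: one pass across four sensors, 0.7 s between firings.
start = datetime(2011, 1, 3, 8, 15, 0)
one_pass = [start + timedelta(seconds=0.7 * i) for i in range(4)]
print(walking_speeds([one_pass], sensor_spacing_m=0.6))
```

Because every pass is time-stamped, the same records could be aggregated by day, week, or season to follow an individual's walking speed over time, in keeping with the multiple time scales described above.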
The home personal computer is a key element in this setup. It provides a multilevel assessment of function ranging from psychomotor speed, such as log-in typing speeds or keyboarding function or assessments of mouse movements, to “incidental” measures of cognitive function through assessment of simple trends in daily computer use ranging from time on the computer to time to respond to e-mails [58,59]. More structured assessment can be obtained through other means such as game playing performance over time where the games are designed to tap specific cognitive domains. Although more obtrusive, one can still ask subjects to take formal cognitive tests online; doing this conveniently at home provides the opportunity to maximize the comfort level of test administration as well as easily vary the frequency of test administration. In addition, the home personal computer provides an opportunity to query volunteers directly throughout the year on a frequent basis. Experience suggests that a weekly e-mail is reliably answered to such recurring questions as, for example, “Did you fall? Did you have overnight visitors? Have you been feeling down or blue?”
Of course, this computer use requires that volunteers are taught to use computers if they do not know how to in the first place, a common need for the current elderly generation. However, because the next generation of elderly individuals, the Baby Boomers, is now approximately 75% online, this training need will recede. Even with growing numbers of elderly individuals over time having increasing capacity for taking advantage of home-based technology, issues surrounding attitudes and beliefs about the appropriate use of these monitoring systems in everyday life need to be addressed. What types of data and under what circumstances elderly individuals will want to have this real-time information about themselves sent to various stakeholders, such as their doctors or their family members, remains to be clarified. Preliminary data suggest that the acceptance of unobtrusive in-home monitoring is closely tied to the perceived utility of the data generated and, accordingly, concerns about the privacy or security of this information are balanced by the value that these data may provide to the various stakeholders in the care ecosystem of the elderly individuals that can support their independence or optimize health.
It is thus important that we begin to understand how to best use this information and harness the power of these approaches as soon as possible. The input/output potential of the personal computer itself will be transformed, such that the distinctions among a desktop, a laptop, a tablet, a smartphone, or even a heads-up display in your mirror or embedded in appliances will become muted. Automated speech recognition technology will allow for more immediate analysis of spoken responses. This will extend beyond the simple capture of test question response accuracy to an entire new science of assessing affect and social interactions in real time. Ultimately, we will need to learn new ways to analyze these rich multidimensional data streams where the metrics for outcomes may become very different (Fig. 2). Thus, we may find that, instead of asking people how their memory function is or how they have felt in the past week, we now can more precisely measure real instances of activity patterns and how the individual felt about any changes (Fig. 3). Finally, the ability to perform this kind of automated assessment across hundreds or thousands of households holds the promise for a new kind of epidemiology that is conducted in real time and provides an unprecedented ability to connect activities that are currently difficult or impossible to capture by relying on recall of events.
In 2007, life expectancy at birth for the total population in the United States reached 77.9 years. This stands in sharp contrast to 47.3 years in 1900. The greatest risk factor for DAT is age and, thus, increased longevity is a double-edged sword. The issues discussed in this article lie at the heart of ascertainment of dementia in support of epidemiological studies. The various sections share several themes, but each also highlights some unique challenges faced in ascertaining cognitive decline and dementia in the community at large. All emphasize the need for brief, cost-effective measures that could enhance the ability to collect data from large, geographically distributed, and demographically diverse populations. Three themes emerged that can be emphasized as follows: (1) A suitable target of measurement. Should the outcome be an individual “slope of cognitive decline” or a distinct diagnostic endpoint such as MCI or AD? (2) A suitable instrument. What is the optimum measure of cognitive change and dementia? What form should it take? Is an overall, global cognitive score, a profile of performance along several cognitive dimensions, a single in-depth probe of episodic memory, or a functional measure with highest ecological validity most optimal? (3) If tests are administered, should they be conducted in person by a clinician and/or a technician or in the comfort of one’s home by telephone, Internet, or “in-the-background” through electronic monitoring? All sections emphasize the importance of early detection so that interventions, when they become available, could be introduced at a time point when they might prevent or slow decline.
At the crux of all these considerations lie our assumptions about what is “normal” for cognitive aging. Virtually every cross-sectional study of cognitive aging has observed age-related decline in scores on so-called “fluid” cognitive measures such as episodic memory and mental speed of processing tests. This finding is so well-established that we have come to expect that “normal” for an 80-year-old can represent a loss of 50% in a raw test score as compared with “normal” for a 30-year-old. In fact, normative tables for standard neuropsychological measures, such as the Wechsler Memory Scale, show this phenomenon, schematically illustrated in Fig. 4. For example, an 80-year-old can earn a scaled score of 10 (average) with a raw score of 13/50 on the Delayed Recall portion of the Logical Memory subtest. However, this represents an almost 50% reduction from a raw score of 24/50 that would earn a 30-year-old the same scaled score. Therefore, clinical judgments about the normalcy of cognitive test performance in older individuals assume loss as a given rather than as an end product of a long developmental process with variable speed of occurrence.
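The size of the raw-score gap hidden behind an identical scaled score can be checked directly. The brief calculation below uses only the two raw scores quoted above (24/50 at age 30 and 13/50 at age 80); the function name `percent_reduction` is an illustrative choice.

```python
def percent_reduction(younger_raw, older_raw):
    """Percentage by which the older raw score falls below the younger one."""
    return 100.0 * (younger_raw - older_raw) / younger_raw


# Raw scores that both earn a scaled score of 10 on Logical Memory Delayed Recall.
reduction = percent_reduction(younger_raw=24, older_raw=13)
print(f"{reduction:.1f}% lower raw score for the same scaled score")
```

The difference of 11 points out of 24 is a reduction of roughly 46%, in line with the “almost 50%” described in the text.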
This bias is magnified when trying to determine whether an average-for-age score is acceptable for an 80-year-old who, as a 30-year-old, was well above average for age. Thus, two 80-year-olds can obtain the same test score, but one may actually be declining and the other may not. The nature of cognitive test scores, as discussed later in the text, makes them difficult to interpret without adjusting for several sociodemographic variables. To address this conundrum, Evans (section 2) makes the case for the importance of capturing intra-individual rates of change as well as diagnostic endpoints in epidemiological studies of age-related cognitive change. The individual is her/his own control and no automatic assumptions of “normal” decline need to be integrated into decision-making. Results from the Alzheimer’s Disease Neuroimaging Initiative have shown that cerebrospinal fluid and neuroimaging biomarkers associated with Alzheimer’s pathology precede the onset of symptoms by many years. Therefore, a steep “personal slope” of cognitive change over time, even though test scores remain within normal limits for age, might signal greater vulnerability to dementia in future years. Several strategies for estimating decline against a standard of a peak prior level of ability have been suggested and exemplified in the work of Rentz et al [42,68–70]. However, in individuals with limited education or illiteracy, it may be more difficult to derive such an estimate.
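The idea of a “personal slope” can be illustrated with an ordinary least-squares fit of test score against age for a single individual. This is a minimal sketch under simplifying assumptions (a linear trajectory and equally reliable measurements); it is not the method of any study cited here, and the two score series are hypothetical. They are constructed so that both individuals obtain the same raw score of 13 at age 80.

```python
def personal_slope(ages, scores):
    """Least-squares slope of score on age (raw-score points per year)."""
    n = len(ages)
    if n < 2 or n != len(scores):
        raise ValueError("need at least two paired observations")
    mean_age = sum(ages) / n
    mean_score = sum(scores) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("ages must not all be identical")
    sxy = sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
    return sxy / sxx


ages = [70, 72, 74, 76, 78, 80]
declining = [23, 21, 19, 17, 15, 13]  # hypothetical: previously high-scoring
stable = [14, 13, 14, 13, 14, 13]     # hypothetical: long-standing average scorer

print(f"declining individual: {personal_slope(ages, declining):+.2f} points/year")
print(f"stable individual:    {personal_slope(ages, stable):+.2f} points/year")
```

A single cross-sectional score at age 80 cannot separate these two people, whereas the slopes differ markedly. In practice, repeated-measures data of this kind would more likely be analyzed with mixed-effects models that also account for practice effects and measurement error.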
Methods and measures of ascertainment were the topics covered in sections 3 to 5. A cognitive measurement that captures a single time point is problematic because, unlike some physiological health measures, the distinction between normal and abnormal can be blurred depending on age, education, racial/ethnic background, and other sociodemographic factors. Thus, there is no cognitive value akin to blood pressure that is either absolutely normal or abnormal. Another problem is that the baseline is unknown and, as aforementioned, can only be estimated. The lack of a baseline cognitive marker is a major handicap to a clinician and to researchers attempting to determine whether there has been a change. Annual health checks typically provide indices of physical health, such as blood pressure, cholesterol levels, bone density values, and cancer screenings, which can then be tracked and monitored to detect deviations from baseline values and from normal. But, to date, virtually no older individuals have had a baseline cognitive screening. In an unprecedented acknowledgement of the importance of cognitive health and change as an index of illness, Medicare has ruled that, as of January 2011, all beneficiaries be screened for cognitive impairment as part of an annual wellness benefit. This should be viewed as a major first step in the right direction to highlight the importance of cognitive and/or brain as well as physical and/or body health. However, from a clinical perspective, cognitive screening is most likely to identify dementia, and not risk for dementia, especially in previously high-functioning individuals. Thus, it could be recommended that neuropsychological assessment, or a “whole brain” health check, be carried out at a time when individuals are healthy and functioning well, in their 30s, 40s, and 50s, to establish a baseline against which cognitive decline could be detected. 
Such a practice will also make the detection of dementia less reliant on functional activity changes, which can be very difficult to assess without knowledge of functional baselines and which are a lot less easy to characterize than objective cognitive test scores.
Inter-individual variability in indices of cognitive (and other) performance is a universal phenomenon, but the variability increases exponentially as the age scale is ascended. In a study comparing physicians ranging in age from 28 to 92 years on a computerized cognitive battery, it was reported that mean scores declined but standard deviations increased dramatically from the youngest to the oldest age groups. This is consistent with the view that aging is associated with increased heterogeneity and gives rise to the expectation that some older individuals may retain their cognitive abilities at a level that is, at a minimum, average for their much younger cohorts. The ability to measure individual rates of decline can capture the different cognitive aging trajectories that individuals will take over time, from “super” or very little change, to “usual” or average change, to “abnormal” or accelerated change.
Grodstein (section 3) offered evidence that a telephone-based test of mental state could provide a cost-effective method of ascertainment of cognitive functioning with good compliance for conducting repeated assessments. With the exception of naming and visuospatial perception, the telephone interview can survey domains vulnerable to AD, including attention, episodic memory, and word fluency. The instruments had good construct and criterion validity when compared with an in-person cognitive battery, and had strong inter-rater reliability. Furthermore, reduced hearing acuity, increasingly common among older individuals, did not constitute a significant deterrent to telephone screening for many participants. Telephones are plentiful in the United States and provide an acceptable, “low-tech” option for wide-scale assessment.
An alternative approach to a global measure was offered by Loewenstein (section 4), who suggested that measurement should focus on the cognitive domain most sensitive to AD, namely episodic memory, and its component operations. Mental operations entailed in encoding and retrieval have been validated against structural and functional brain imaging variables associated with “normal aging”, MCI, and early AD [72–76]. Not only does the ability to acquire, retain, and retrieve episodic information decay with AD, but there is also increased sensitivity to semantic interference. Thus, mechanisms of proactive and retroactive interference serve to erode memories, and tests based on the careful probing of these mechanisms can differentiate between healthy and diseased memory [36,77–79]. Furthermore, methods that control for encoding to ensure that the information to be remembered has been attended to and processed have also yielded important insights into normal and abnormal memory in aging [80–83]. The trade-off between a more domain-focused and a more global cognitive index approach to measurement is that the former requires clinician involvement and more time for test administration and scoring. The SIT is offered as an instrument derived from principles of the cognitive/neuropsychological approach.
Kaye (section 5) considered a revolutionary approach to health monitoring and ascertainment of cognitive impairment, the “Smart Home.” Making optimum use of today’s technology (keeping in mind “big brother” concerns), moment-to-moment monitoring of activity patterns, appliance use, and other surrogate/functional indices of cognitive function offers an unprecedented opportunity to focus on individual patterns and habits and detect deviations at a behaviorally microscopic level. Increased frequency of falls, missed medication dosages, and decreased use of the stove may signal underlying cognitive deficits. Although intra-individual variability has been explored as a construct in cognitive aging [84,85], there are only a handful of studies that have shown that this factor could serve to distinguish those who are cognitively normal from those experiencing decline [86–89]. “Smart home” technology could permit early detection of subtle variability before symptoms cross the line into the abnormal range. Although the technologies to deploy such systems are readily available, the feasibility of this approach remains to be tested and implemented on a large scale.
In conclusion, ascertainment of dementia at the population level creates many challenges. On the basis of the issues discussed in this article, how we define our endpoints and measurements will have a significant effect on estimating prevalence and incidence. The looming healthcare crisis of aging and dementia will require that a cognitive health check be as important to the primary care setting as cancer screenings and checks on cardiovascular, orthopedic, and sensory motor health.
This combined effort was supported by National Institute on Aging grants R01AG011101, R01AG024059, R01AG024215, P30AG008017, P30AG024978, and P30AG013854. The research was also supported by the Intel Corporation. The authors have no conflicts to disclose. The sponsors had no role in the analysis or interpretation of these data or in the content of the article. Appropriate approval procedures were used concerning human subjects.