Telephone-based interviews can be used for screening and to obtain key study outcomes when participants in longitudinal studies die or cannot be seen in person, but must be validated among ethnically and educationally diverse people.
The sample consisted of 377 (31% non-Hispanic white, 35% non-Hispanic black, and 34% Caribbean Hispanic) older adults. The validation standard was diagnosis of dementia and mild cognitive impairment (MCI) based on in-person evaluation. The Telephone Interview for Cognitive Status (TICS) and the Dementia Questionnaire (DQ) were administered within the same assessment wave.
The sample included 256 (68%) people with normal cognition, 68 (18%) with MCI, and 53 (14%) with dementia. Validity of the TICS was comparable among non-Hispanic whites, non-Hispanic blacks, and Hispanics, but the DQ had better discrimination of dementia from those without dementia and with MCI among Whites than other groups. Telephone measures discriminated best when used to differentiate demented from nondemented participants (sensitivity/specificity for the TICS = 88%/87%; DQ = 66%/89%) and when used to differentiate cognitively normal participants from those with cognitive impairment (i.e., MCI and demented combined; sensitivity/specificity for the TICS = 73%/77%; DQ = 49%/82%). When demographics and prior memory test performance were used to calculate pre-test probability, consideration of the telephone measures significantly improved diagnostic validity.
The TICS has high diagnostic validity for identification of dementia among ethnically diverse older adults, especially when supported by the DQ and prior visit data. However, telephone interview data were unable to reliably distinguish MCI from normal cognition.
Telephone-based assessment of cognitive status and functional decline is an alternative to in-person assessment in longitudinal studies of cognitive function and dementia in older adults 1-9. A telephone interview has become the primary modality of cognitive data collection in several epidemiological studies 10-13, and is now frequently used as a screen for clinical trials requiring participants with cognitive impairment 14-18.
A recent study at Mayo Clinic 19 found that although the modified TICS had 83.3% sensitivity and 81.6% specificity for separating demented from non-demented participants, and 83.3% sensitivity and 78.3% specificity for separating cognitively impaired participants from cognitively normal participants, the measure could not reliably distinguish participants with MCI from those with normal cognition or MCI from dementia. One limitation of this study was that the participants were almost exclusively White and well educated. Therefore, a central goal of the current study was to determine the accuracy of a telephone interview in classifying these groups among an ethnically and educationally diverse community-based sample.
The primary reason for adding the telephone interview to the assessment battery in our study was to be able to derive key diagnostic classifications, i.e., normal cognition, MCI, or dementia, from the telephone-based data even when participants are unable or unwilling to be seen in person at a follow-up visit. However, this would require the instruments to validly distinguish MCI from normal cognition and MCI from dementia. This has been a challenge in prior studies; although several research groups have documented high specificity for MCI using telephone-based measures 14, 17, 19, only one study 18 showed high sensitivity for distinguishing MCI from normal cognition. Since participants in our study were seen in person at a prior assessment wave, we sought to determine whether the distinction of MCI from other classifications would improve if prior visit data were used along with data from the telephone interview.
The Columbia University Institutional Review Board approved this project. All individuals discussed the study with trained research staff and provided written informed consent.
The current sample included 377 English- and Spanish-speaking participants in a longitudinal study of aging, cognitive function, and dementia among Medicare-eligible older adults residing in neighborhoods in Northern Manhattan, New York. The current sample was drawn from a cohort resulting from two recruitment efforts, the first in 1992 (n=2,125) and the other in 1999 (n=2,183). The sampling strategies and recruitment outcomes of these two cohorts are detailed in prior publications 20, 21. Re-evaluations occur during follow-up waves that are spaced approximately 18 to 30 months apart.
Ethnic group was determined by self-report using the format of the 2000 US Census. All individuals were first asked to report their race (i.e., American Indian/Alaska Native, Asian, Native Hawaiian or other Pacific Islander, Black or African American, or White), then, in a second question, were asked whether they were Hispanic.
Evaluations were conducted in either English or Spanish, based on the subject’s opinion of which language would yield the best performance. Examiners were balanced bilinguals, who spoke both English and Spanish daily with friends, family, and colleagues.
The validation study for the telephone interview was initiated during the 2005–2007 assessment wave of the cohort. The telephone interview was conducted by trained interviewers during the same assessment wave as, but independently of, the in-person visit. On average, calls occurred 7.3 months after the in-person visit, with a standard deviation of 10.9 months. One participant had only the TICS because the call was interrupted and the participant could not be re-contacted. Of the participants for whom the DQ interview was conducted, eight did not have the TICS because they were not well enough to come to the phone (n=4), because the participant died soon after the in-person visit (n=2), or because the call was interrupted (n=2).
The TICS was administered and scored according to published procedures 4. The TICS is modeled after the Mini-Mental State Examination, producing scores ranging from 0 to 41. High test-retest reliability 1, 7, 22 has been demonstrated in several studies. The published Spanish language adaptation of the TICS was used among Spanish-speaking participants 2. Total score was used in the analyses.
The DQ is a semi-structured interview that includes yes or no questions assessing cognitive complaints in the domains of memory, confusion, and spatial orientation (8 questions) and language/verbal expression (3 items), as well as questions assessing problems with daily function (6 items). This questionnaire has established reliability and validity with high sensitivity and specificity for the detection of dementia and AD 3. Information about cognitive complaints and functional abilities could be provided by either the participant or an informant, as long as they were knowledgeable about the functional status and medical history of the participant. The 17 questions described above were summed to create a score representing total burden of cognitive complaints and functional problems.
A medical history was taken, and neurological and physical examinations were performed, at the initial visit and at each follow-up. A medical burden score was calculated as a sum of multiple non-psychiatric medical conditions, and included hypertension, diabetes, heart disease, stroke, arthritis, COPD or other pulmonary conditions, thyroid disease, liver disease, renal insufficiency, peptic ulcer disease, peripheral vascular disease, cancer, Parkinson’s disease, multiple sclerosis, and essential tremor. Current depressive symptoms were assessed using the Center for Epidemiologic Studies Depression Scale 23. The Disability and Functional Limitations Scale 24, 25 was used to assess instrumental activities of daily living via self and informant report, as well as perceived difficulty with memory.
Neuropsychological measures included the Buschke Selective Reminding Test (SRT; 26), matching and delayed recognition conditions of a multiple choice version of the Benton Visual Retention Test (BVRT; 27), the Rosen Drawing Test 28, a 15-item Boston Naming Test 29, the Controlled Oral Word Association Test 30, the Category Fluency Test 31, the Color Trails Test 32, and the Similarities subtest from the Wechsler Adult Intelligence Scale - Revised 33.
After each clinical assessment, a group of physicians and neuropsychologists reviewed the functional, medical, neurological, psychiatric, and neuropsychological data (but were blinded to TICS and DQ data) and reached a consensus regarding the presence or absence of dementia using DSM-III-R criteria 34. For follow-up evaluations, this group was shielded from the prior consensus diagnoses. If dementia was diagnosed, the etiology was determined using published research criteria for probable and possible AD 35, vascular dementia 36, Lewy Body Dementia 37, and other dementias. Mild Cognitive Impairment was not diagnosed in the consensus conference, but was retrospectively applied, based on the neuropsychological, functional, and memory complaint measures described above using standard criteria 38 among participants not diagnosed with dementia at the consensus conference 20.
Characteristics of the three diagnostic groups were compared using Chi-square tests and ANOVA, and correlations between measures and demographic variables were calculated. Receiver operating characteristic (ROC) curves were drawn for each of the telephone measures administered (TICS total score and DQ summary score), using four planned comparisons of interest: 1) non-demented vs. demented, 2) cognitively normal vs. cognitively impaired (i.e., MCI and dementia), 3) normal vs. MCI, and 4) MCI vs. dementia. Areas under the curve were calculated and compared for all participants and separately for non-Hispanic whites, non-Hispanic blacks, and Hispanics. The cutoff yielding the best sensitivity and specificity for the overall sample was determined, and the sensitivity and specificity, negative and positive predictive values, and likelihood ratios were calculated 39 for each telephone instrument for prediction of each of these diagnostic criteria in the entire sample and within the three primary racial/ethnic groups.
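The quantities named above can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes that lower scores indicate impairment (as with the TICS) and that the "best" cutoff is the one maximizing Youden's J (sensitivity + specificity − 1); the text does not state which criterion was used. The function names and the toy data are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass
class DiagnosticStats:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    lr_positive: float
    lr_negative: float


def diagnostic_stats(tp: int, fp: int, fn: int, tn: int) -> DiagnosticStats:
    """Summary statistics from a 2x2 table of test result vs. reference diagnosis.

    Assumes at least one reference-positive and one reference-negative case.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp) if (tp + fp) else math.nan
    npv = tn / (tn + fn) if (tn + fn) else math.nan
    # Likelihood ratios are undefined (infinite) at perfect/zero specificity.
    lr_pos = sens / (1 - spec) if spec < 1 else math.inf
    lr_neg = (1 - sens) / spec if spec > 0 else math.inf
    return DiagnosticStats(sens, spec, ppv, npv, lr_pos, lr_neg)


def best_cutoff(scores, impaired):
    """Return (cutoff, J) maximizing Youden's J; score <= cutoff is test-positive.

    Youden's J is an assumed criterion, chosen here only for illustration.
    Ties are resolved in favor of the lowest cutoff.
    """
    best_c, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = fp = fn = tn = 0
        for s, d in zip(scores, impaired):
            positive = s <= c
            if positive and d:
                tp += 1
            elif positive and not d:
                fp += 1
            elif d:
                fn += 1
            else:
                tn += 1
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j


# Hypothetical toy data, not study data.
toy_scores = [10, 12, 15, 20, 25, 28, 30, 33]
toy_impaired = [True, True, True, False, True, False, False, False]
toy_cutoff, toy_j = best_cutoff(toy_scores, toy_impaired)
```

For an instrument on which higher scores indicate more problems (as with the DQ summary score), the comparison in `best_cutoff` would be reversed.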
Binary logistic-regression models were constructed to estimate the pretest probabilities of each of the four comparisons outlined above with the in-person diagnostic classification as the dependent variable and the following independent variables: age, sex, race/ethnicity, years of education, and the delayed word list recall score from the SRT taken from the prior assessment. Post-test probabilities for each participant were then estimated for two scenarios: 1) when both TICS and DQ were available by adding the total scores from both measures to the model, and 2) when the participant was dead or too ill to come to the telephone and the TICS was not available, by adding to the model the total score from the DQ only. Predicted values from the models were then used to generate receiver-operating-characteristic curves. Areas under these curves were compared, and the differences were calculated with 95 percent confidence intervals 40.
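The comparison of pre-test and post-test models rests on the area under the ROC curve of each model's predicted values. As a rough illustration only, the sketch below computes the AUC as the probability that a randomly chosen case receives a higher predicted value than a randomly chosen non-case (the rank-based, Mann-Whitney formulation). The predicted probabilities are hypothetical stand-ins for logistic-regression output, and the confidence interval for the AUC difference used in the study 40 is not reproduced here.

```python
def auc(predicted, is_case):
    """Area under the ROC curve via pairwise comparison of cases and non-cases.

    Each (case, non-case) pair scores 1 if the case has the higher predicted
    value, 0.5 for a tie, and 0 otherwise; the AUC is the mean pair score.
    """
    cases = [p for p, y in zip(predicted, is_case) if y]
    controls = [p for p, y in zip(predicted, is_case) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    total = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                total += 1.0
            elif c == n:
                total += 0.5
    return total / (len(cases) * len(controls))


# Hypothetical predicted probabilities for seven participants (not study data).
labels = [1, 1, 1, 0, 0, 0, 0]
pretest = [0.60, 0.40, 0.30, 0.50, 0.20, 0.20, 0.10]   # demographics + prior SRT
posttest = [0.90, 0.70, 0.40, 0.50, 0.10, 0.20, 0.05]  # ... plus TICS and DQ

auc_pre = auc(pretest, labels)
auc_post = auc(posttest, labels)
auc_gain = auc_post - auc_pre
```

A positive `auc_gain` corresponds to the kind of improvement in discrimination that the post-test models are intended to show.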
Demographic characteristics and scores on key study measures of participants with normal cognition, MCI, and dementia are described in Table 1. The majority (87.3%) of Hispanic participants in this study were immigrants from the Caribbean, including the Dominican Republic (59.5%), Cuba (17.5%), and Puerto Rico (10.3%). The MCI group included 44 participants with MCI with memory impairment (64.7%) and 24 with MCI without memory impairment (35.3%). Of the 53 demented participants, most were diagnosed with probable (n=33) or possible (n=16) Alzheimer’s disease, but the sample also included two people with Parkinson’s Disease Dementia, a participant with Vascular Dementia, and one with a diagnosis of Lewy Body Disease. The TICS total score was significantly correlated with age (r = −0.37; p < .001), years of education (r = 0.51; p < .001), prior visit SRT total recall (r = 0.56; p < .001), prior visit SRT delayed recall score (r = 0.48; p < .001), and depressive symptoms (r = −0.28; p < .001). Men (mean = 28.1, SD = 0.9) obtained slightly higher scores on the TICS than did women (mean = 25.6, SD = 0.6; F(1,369) = 6.1; p = .014). Differences between scores of non-Hispanic whites (mean = 30.4, SD = 7.5), non-Hispanic blacks (mean = 27.4, SD = 7.0), and Hispanics (mean = 21.8, SD = 10.2) were all significant (omnibus F(1,365) = 32.3; p < .001; all pairwise comparisons p < .05). The DQ summary score was significantly related to age (r = 0.17; p <.001), years of education (r = −0.3; p < .001), prior visit SRT total recall (r = −0.32; p < .001), prior visit SRT delayed recall score (r = −0.26; p < .001), depressive symptoms (r = 0.39; p < .001), and medical burden score (r = 0.21; p < .001). There were no significant differences in DQ summary score between men (mean = 3.6, SD = 0.4) and women (mean = 4.1, SD = 0.2; F(1,376) = 1.1; p = .291). 
While whites (mean = 3.1, SD = 2.9) and blacks (mean = 3.2, SD = 2.9) did not differ from each other, Hispanics (mean = 5.3, SD = 4.5) reported more problems on the DQ than each of the other two groups (F(1,372) = 15.8; p < .001). TICS and DQ scores were significantly correlated to each other (r = −0.59; p < .001).
Figure 1a shows the ROC curve for separation of demented versus non-demented (normal cognition and MCI) participants. The area under the curve (AUC), diagnostic characteristics, and optimal cutoffs derived from the ROC analyses for the TICS (Table 2) and the DQ (Table 3) are shown for each of the comparisons among all participants and then separately for non-Hispanic whites, non-Hispanic blacks, and Hispanics. The ability of the TICS to discriminate between demented and non-demented participants was comparable across racial/ethnic groups, but the DQ’s discrimination was higher among non-Hispanic Whites than among racial/ethnic minorities. Figure 1b depicts the ROC curves when MCI participants were combined with the demented participants to form a cognitively impaired group, and compared with participants with normal cognition. The AUCs for the TICS and the DQ were comparable across ethnic groups for this comparison (Tables 2 and 3).
We sought to determine the diagnostic accuracy of the telephone-based measures when making more subtle distinctions between participants with normal cognition and MCI and between MCI and dementia. Figure 1c depicts the ROC curves when demented participants were eliminated from the analysis and participants with MCI and normal cognition were compared. The AUC for both measures for this comparison was relatively low (.71 for the TICS and .58 for the DQ), but discriminability was similar across ethnic groups (Tables 2 and 3). We then determined the ability of the instruments to distinguish people with MCI from those with dementia, when participants with normal cognition were omitted from the analysis (Figure 1d). For this comparison, the AUC was .91 for the TICS, and was comparable across ethnic groups. The AUC was .81 for the DQ in the whole sample, but for this comparison, the DQ had better discrimination among non-Hispanic Whites than among ethnic minorities. Examination of the Odds Ratios in Tables 2 and 3 reveals that both the TICS and the DQ performed best in distinguishing the nondemented group (normal cognition and MCI combined) from the demented participants, and in distinguishing people with MCI from people with dementia.
Figure 2 depicts ROC curves for pretest and post-test probabilities as calculated in the logistic regression models. As shown in Table 4, adding information gathered from the TICS and DQ to the pre-test model significantly improved the diagnostic performance for all key clinical outcomes in the study. For example, the addition of the TICS and DQ to the pre-test prediction of dementia vs. no dementia improved the AUC by 6.5%. The best diagnostic performance was in distinguishing non-demented from demented participants (AUC = .96) and MCI from demented participants (AUC = .95) using demographic information, prior SRT delayed memory score, and both TICS and DQ. Accurate identification of MCI among non-demented participants was poor overall, even when both TICS and DQ were available (AUC = .75). Predicted classification as normal cognition, MCI, and dementia, using the optimal cutoffs for the predicted values from the models separating demented from nondemented participants and normal cognition from cognitive impairment, was compared to the observed diagnoses. The cutoffs correctly identified 66.5% of participants with normal cognition, 55.4% of those with MCI, and 92.7% of those with dementia.
Models using only the DQ summary score showed improved classification over pre-test probabilities when the goal was to distinguish demented from nondemented people, and cognitively impaired from cognitively normal people. However, addition of the DQ summary score did not improve diagnostic accuracy over pre-test probabilities when the goal was to identify MCI among non-demented participants or to identify dementia among cognitively impaired participants (Table 4).
The sensitivity and specificity of the TICS and DQ were variable and depended on the diagnostic groups serving as the standard for comparison. There were no consistent racial/ethnic differences in the ability of the TICS to discriminate diagnostic classifications. However, in distinguishing demented people from nondemented (normal cognition and MCI combined), and people with dementia from people with MCI, the DQ performed better among non-Hispanic whites than among non-Hispanic blacks and Hispanics.
Used alone, the TICS had high sensitivity for distinguishing demented people from nondemented (normal cognition and MCI combined), and excellent specificity when distinguishing people with dementia from people with MCI. The DQ had lower sensitivity and higher specificity than the TICS for all comparisons, but was most valid when distinguishing demented people from those with MCI. The superior specificity of the DQ to the TICS is expected given the original purpose of developing the instruments: the DQ was designed to pick up on changes in memory and function that are specific to dementia and are not seen in normal aging or MCI.
Comparing likelihood ratios with the recent Mayo Clinic study by Knopman et al. 19, our use of the TICS overall and within each racial/ethnic group yielded superior performance to the TICS-M when distinguishing demented from nondemented participants (MCI and normal cognition combined), and MCI from dementia. Identification of cognitive impairment (MCI + dementia) from normal cognition was comparable to the Mayo TICS-M among non-Hispanic whites, non-Hispanic blacks, and Hispanics. For the separation of MCI from normal cognition, accuracy among Hispanics in the current study was similar to that in the Mayo study, but the Mayo study showed better accuracy than was found among the whites and blacks in the current study. Both studies had lower diagnostic validity for identification of MCI versus normal cognition than was reported by Cook et al. in a study of mostly White, community-dwelling nondemented older adults 18.
Our standard diagnostic algorithms for dementia and MCI require an in-person visit; therefore, diagnoses could not be derived for participants not seen due to death, moving out of the area, or refusal. The current study revealed that if a participant and/or informant can be reached by phone, presence of dementia can be estimated with moderate to high validity among non-Hispanic whites, non-Hispanic blacks, and Hispanics. Our analyses indicate that the diagnostic utility of telephone instruments will increase or decrease in response to variations in prevalence of cognitive impairment and dementia in the population, and thus will vary in cohorts that differ by age, racial, ethnic group, education, and other demographic variables. Indeed, we found that adding age, sex, education, race/ethnicity, and prior memory scores to the data supplied by the TICS and DQ significantly improved the diagnostic accuracy of the telephone interview data. Even when the TICS was not used in the model, the nondemented/demented classification was highly accurate as predicted by demographics, DQ data and prior visit data.
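The dependence of diagnostic utility on prevalence follows from Bayes' rule: with sensitivity and specificity held fixed, the positive and negative predictive values shift as the proportion of affected people changes. The sketch below illustrates this using the TICS sensitivity and specificity reported here for dementia versus no dementia (88%/87%) and the dementia proportion in this sample (53 of 377, about 14%). The other prevalence values are hypothetical, and the assumption that sensitivity and specificity stay constant across populations is a simplification.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' rule."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


SENS, SPEC = 0.88, 0.87          # TICS, demented vs. nondemented (reported above)
sample_prevalence = 53 / 377     # dementia proportion in this sample

# 0.05 and 0.30 are hypothetical prevalences chosen only for contrast.
for prev in (0.05, sample_prevalence, 0.30):
    print(f"prevalence={prev:.2f}  "
          f"PPV={ppv(SENS, SPEC, prev):.2f}  NPV={npv(SENS, SPEC, prev):.2f}")
```

Under these assumptions the positive predictive value rises and the negative predictive value falls as prevalence increases, which is why the same cutoff can perform differently in cohorts that differ in age or other demographic characteristics.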
It was hoped that the availability of the DQ, which taps into cognitive complaints and functional status, and prior visit memory test performance, when added to the direct cognitive assessment provided by the TICS, would improve the identification of MCI among nondemented participants. However, none of the models tested were able to reliably differentiate MCI from normal cognition; this was true across all racial/ethnic groups. Addition of a delayed word list recall to the TICS may marginally improve identification of MCI, but prior research suggests that even with this added component, the classification rate may remain too low to advocate for the use of the TICS-M as a stand-alone measure for identification of MCI 19.
This study was supported by National Institute on Aging grants P01-AG07232 (R. Mayeux) and R01-AG16206 (J. Manly). The authors had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.