Purpose: This study provides preliminary evidence for the acceptability, reliability, and validity of the new Alzheimer's Disease Knowledge Scale (ADKS), a content and psychometric update to the Alzheimer's Disease Knowledge Test. Design and Methods: Traditional scale development methods were used to generate items and evaluate their psychometric properties in a variety of subsamples. Results: The final 30-item, true/false scale takes approximately 5–10 min to complete and covers risk factors, assessment and diagnosis, symptoms, course, life impact, caregiving, and treatment and management. Preliminary results suggest that the ADKS has adequate reliability (test–retest and internal consistency) and validity (content, predictive, concurrent, and convergent). Implications: The ADKS is designed for use in both applied and research contexts, capable of assessing knowledge about Alzheimer's disease among laypeople, patients, caregivers, and professionals.
In 1988, Dieckmann, Zarit, Zarit, and Gatz published the Alzheimer's Disease Knowledge Test (ADKT), a 20-item multiple-choice tool to assess what people know about Alzheimer's disease (AD). The ADKT has been used in a broad array of research projects, with knowledge about Alzheimer's as both a dependent variable (e.g., Sullivan & O'Conor, 2001) and an independent variable (e.g., Proctor, Martin, & Hewison, 2002). Since the original publication, however, science has revealed much about AD in terms of its etiology, diagnosis, symptoms, course, and management. Although the 1988 test reflected experts’ best understanding of the disease at that time, two decades later the content is dated: As a set, the items do not reflect issues about AD that are important today, and some answers coded as correct in 1988 would not be viewed as correct today. Still, as recently as 2007, researchers were using the instrument. The purpose of the current study was to create a new scale that updated the ADKT to reflect contemporary understanding of AD.
This new scale, the Alzheimer's Disease Knowledge Scale (ADKS), could be used in a number of circumstances to examine what people know about AD. For example, the effectiveness of public information campaigns could be evaluated by administering the ADKS to broad samples of community residents. Similarly, giving the ADKS to health care or social service staff might pinpoint education needs or indicate the success of education efforts. The ADKS also could be given to patients and caregivers seeking a dementia evaluation to determine what they know, and to dementia support groups to guide psychoeducational efforts. Finally, researchers might use the ADKS to examine familiarity with AD as both a predictor variable and an outcome variable, depending on their research questions.
In this report, we describe our efforts to (a) create a set of items that reflect current scientific understanding about AD and (b) test the psychometric properties of those items on a broad sample that is representative of people with whom the ADKS might be used.
We developed a pool of items by reviewing other scales that were designed to assess knowledge about AD, dementia, and related phenomena (e.g., memory), as well as other scales that contained questions about AD even when the purpose of the overall scale may have been more broad (e.g., a general aging questionnaire). The goal of this process was to enhance the content validity of the new scale (Trochim & Donnelly, 2007). Twenty-one instruments were located (see Table 1). After an initial review of these scales, we identified seven content domains that encompassed the breadth of information in the instruments: risk factors, assessment and diagnosis, symptoms, course, life impact, caregiving, and treatment and management. Next, we divided the scales among the project team so that each scale was evaluated by two investigators. They independently extracted all relevant items and assigned them to content domains. The entire project team then came together to review the selection and categorization of items made by the original two investigators. All discrepancies were reconciled in a series of consensus conferences. This method helped ensure comprehensive content coverage as well as content relevance (Streiner & Norman, 1995).
Once all instruments had been reviewed, a master spreadsheet was created to list each item in each domain. In a series of subsequent conferences, the entire research team reviewed each item by domain to remove items with overlapping content and rewrite items for final wording. During this process, the research team made an effort to keep items at or below an eighth-grade reading level, avoid ambiguous phrasing, remove jargon, avoid double-barreled questions, excise value-laden words, and avoid questions phrased in the negative, all of which can be more difficult for respondents. This process was informed by test development guidelines from a variety of sources (Clark & Watson, 1995; Kline, 2005; Streiner & Norman, 1995). In the end, we developed 57 potential items.
At this phase, we also decided to use a true/false response format rather than the multiple-choice format used in the ADKT. Although a multiple-choice test may pinpoint misinformation with its use of distractor responses, that format is no better than a true/false format at distinguishing incorrect guesses from actual misinformation (Kline, 2005; Stanley & Hopkins, 1972). In the end, we chose the true/false format because of its relative ease for respondents and ease in scoring.
Next, the 57 items were presented to eight small groups to identify unclear phrasing. The groups consisted of graduate students in clinical psychology and community-dwelling older adults. Each group included three to four people who completed the instrument. For any item answered incorrectly by any participant, group members were asked to explain what they thought the question was asking and why they responded as they did. This “think-aloud” technique identified errors based on misunderstanding of the question. In addition, face validity of the scale was confirmed by participants' universal agreement that the scale appeared to tap knowledge about this particular disease (Trochim & Donnelly, 2007). After the groups were completed, eight items were removed from the scale, and others were rewritten for clarity, leaving 49 items (see Table 2). We gathered citations to substantiate a correct answer for these remaining items.
With those 49 items, we compared features of the ADKS with and without a “don't know” (DK) option. In a sample of 52 undergraduates, average total scores were higher in the group that completed the non-DK version (M = 35.73) compared with the group that completed the DK version, M = 28.96, t(50) = 5.08, p < .001, reflecting the advantage to guessing on a true/false scale. Although a DK option has some advantages (provides an option for people averse to guessing and differentiates misinformation from uncertainty), it also has disadvantages (neglects different degrees of uncertainty and complicates scoring). On balance, we concluded that there was more loss than gain in precision by including a DK option. Consequently, we used the true/false format when testing the 49 pilot items with the samples described next.
A number of distinct groups were recruited to test the scale. These groups are representative of the types of people with whom the final version of the ADKS might be used. As mentioned earlier, a sample of college students (n = 26) completed a version of the ADKS that included a DK option. An additional sample of individuals of disparate ages (n = 40), with no cognitive impairment, completed the ADKS on two occasions for the purpose of establishing the test–retest reliability. Another sample of students (n = 36) completed the ADKS before and after dementia education to assess anticipated change in their scores as evidence of concurrent validity.
The remaining groups included people whose knowledge of and experience with dementia would provide information about the construct validity of the items, insofar as knowledge scores should be higher for groups with more exposure. Recruitment focused on health care professionals from a variety of disciplines who are involved in dementia research and service provision (n = 75), senior center staff (n = 61), caregivers of people with dementia (n = 54), community-dwelling older adults with no cognitive impairment (n = 89), and college students, some of whom had curricular exposure to aging and dementia (n = 484). Characteristics of those subsamples appear in Table 3. Four individuals reported that they were not completely fluent in English; those individuals were scattered among the subsamples and, because of their small number, were not felt to exert undue influence on the results.
Participants were given or mailed a packet that included a consent form and a questionnaire. Depending on the circumstances of administration, some questionnaires were returned by hand and others were mailed in a stamped envelope that had been provided. Because we recruited a series of convenience samples (e.g., asking for volunteers at an agency, soliciting undergraduates in a subject pool), it was not possible to calculate a final response rate.
Each survey packet included a set of demographic and background questions about gender, age, race or ethnicity, education, primary occupation, and English fluency.
Respondents also completed a series of questions to assess experience with AD and dementia. These questions asked whether family members had AD or a related disorder, whether respondents were currently or previously caregivers for someone with AD or a related disorder, whether they had ever attended a support group or educational program related to AD or a related disorder, and whether their job or volunteer responsibilities involve working with people who have AD or a related disorder.
After these questions came the old and new AD knowledge scales and a vocabulary test. The order of the knowledge tests was counterbalanced. Respondents also provided a self-rating of knowledge about AD and related disorders, on a scale from 1 (I know nothing at all) to 10 (I am very knowledgeable).
Two instruments were administered. The original ADKT (Dieckmann, Zarit, Zarit, & Gatz, 1988) is a 20-item multiple-choice test with four response options plus a DK option per item. Items on this scale were generated based upon a literature review at the time and expert consensus. Item content covers prevalence, etiology, diagnosis, symptoms, proposed cures, management of problem behaviors and symptoms, public policy affecting reimbursement, and the role of supportive services. A total score is calculated by summing the number of correct responses. According to the original validation study, the ADKT has high internal consistency (.71–.92), moderate test–retest reliability (.62), and adequate construct validity as evidenced by (a) performance differences across groups of respondents with varying familiarity with AD and (b) increasing scores following instruction. The ADKS was developed as an update to the ADKT and is substantially different in content and format. Forty-nine true/false pilot items were developed using the methods described earlier. Test coverage includes items about risk factors, assessment and diagnosis, symptoms, course, life impact, caregiving, and treatment and management. The 49 pilot items and the final 30 items that survived analyses appear in Table 2. Psychometric properties of the 30 final items are described in the remainder of this report.
To measure general intellectual functioning, we used the Shipley Institute of Living Scale (Shipley, 1940), which includes 40 progressively difficult vocabulary words. Respondents choose which of four listed words “means the same or nearly the same” as a target word. The number of correct items is the final score. The Shipley has good test–retest reliability (median r = .78), internal consistency reliability (alpha = .80 in the current sample), and validity (median r = .71); and the scale has been normed in a wide range of populations (Zachary, 1991).
Acknowledging that a 49-item scale was impractical for most purposes, analyses were guided by the goal of reducing the number of items (following Kline, 2005). In the process, we sought a final set of items that had adequate face validity and broad content coverage, was internally consistent, and demonstrated solid properties of validity. To accomplish these goals, we began with a series of analyses focused at the level of individual items: content and face validity, followed by calculation of discrimination indexes, item difficulty indexes, and homogeneity. Once we had arrived at a final set of items, we then moved on to examine properties of the resulting scale based on recommendations from Trochim and Donnelly (2007): test–retest reliability, internal consistency, predictive validity, concurrent validity, and convergent validity.
The purpose of this analysis was to eliminate items that were the least effective at discriminating between high and low overall scorers on the scale. We selected a random half of the sample (n = 384) for this initial analysis. Using their score on the 49 items, we identified high scorers (top 27%, n = 104) and low scorers (bottom 27%, n = 104). High scorers answered between 40 and 48 items correctly (M = 43.35, SD = 2.03); low scorers answered between 17 and 35 items correctly (M = 31.15, SD = 3.46). Then, for each item, we calculated the percentage of participants in each group (high and low scorers) who answered the item correctly. We subtracted those two percentages (high scorers minus low scorers) to arrive at each item's discrimination index. As an example, 91% of the high scorers and 89% of the low scorers answered item No. 30 correctly. The difference (high – low; 91% – 89% = 2%) is small, which suggests that people answered item No. 30 correctly, regardless of their overall knowledge about AD. Therefore, the item appears to be a candidate for removal because it does not enhance the discriminability of the overall scale. In other words, a high discrimination index suggests that the item does a good job of differentiating between people based on their knowledge. The discrimination index for each item appears in Table 2. This index was used in concert with the item difficulty index and coefficient alpha (described next) to determine which items to drop.
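The high-minus-low computation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function and variable names are our own, and responses are assumed to be coded 1 (correct) or 0 (incorrect).

```python
# Illustrative sketch of a discrimination index: the proportion of top
# scorers answering an item correctly minus the proportion of bottom
# scorers doing so (tails of 27% each, as in the text).
def discrimination_index(total_scores, item_responses, tail=0.27):
    """total_scores: one total per respondent; item_responses: 0/1 per respondent."""
    n = len(total_scores)
    k = int(n * tail)                          # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]          # bottom and top 27% of scorers
    p_high = sum(item_responses[i] for i in high) / k
    p_low = sum(item_responses[i] for i in low) / k
    return p_high - p_low                      # high minus low proportion correct
```

An item answered correctly only by knowledgeable respondents yields an index near 1, whereas an item everyone answers the same way yields an index near 0.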
This analysis was performed using the same randomly selected half of the sample that was used to calculate the discrimination indexes. We calculated a difficulty index (p) for each item, which represents the percentage of people who answered the item correctly. Items that are answered correctly (or incorrectly) by a high percentage of people are unlikely to discriminate among test takers and are therefore candidates for deletion. A p value of .95 indicates that most people answer the item correctly, and the item provides little useful information and may, in fact, detract from the scale's psychometric properties (Streiner & Norman, 1995). The difficulty index for each item appears in Table 2. We retained only items whose difficulty index was lower than .95.
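The difficulty index and the .95 deletion rule can likewise be sketched in a few lines. Again this is an illustrative reconstruction under our own naming, not the authors' analysis code.

```python
# Illustrative sketch: item difficulty index p = proportion of respondents
# answering the item correctly; items with p >= .95 are flagged as
# candidates for deletion, per the rule described in the text.
def difficulty_index(item_responses):
    """item_responses: list of 0/1 responses for one item."""
    return sum(item_responses) / len(item_responses)

def flag_too_easy(items, cutoff=0.95):
    """items: dict mapping item name -> list of 0/1 responses."""
    return [name for name, resp in items.items()
            if difficulty_index(resp) >= cutoff]
```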
The next analyses used the other half of the sample (n = 384) to confirm removal of the weak items identified earlier. Because we sought to develop a homogeneous scale that measures knowledge about AD, each item should tap that overarching construct. So, to test the relationship between each item and the overall scale score, we used two indexes. The first, Cronbach's alpha, was used to calculate the internal consistency reliability of the scale, with successive items removed. As we deleted items based on the difficulty and discrimination indexes, we wanted to make sure alpha did not drop below the recommended .70 (Nunnally, 1970). We stopped deletion at a scale of 30 items (see Table 2), which provided a substantial reduction from 49 while still providing relatively comprehensive content coverage and adequate internal consistency (alpha = .71).
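For readers unfamiliar with the statistic, Cronbach's alpha can be computed directly from an items-by-respondents matrix; the following is a minimal sketch using the standard formula, not the authors' code.

```python
# Illustrative sketch: Cronbach's alpha for a set of k items,
# alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals).
from statistics import pvariance

def cronbach_alpha(data):
    """data: list of respondents, each a list of k item scores (e.g., 0/1)."""
    k = len(data[0])
    item_vars = [pvariance([row[j] for row in data]) for j in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Perfectly correlated items yield alpha = 1.0; in practice, one would recompute alpha after each candidate deletion, as the text describes, to ensure it stays above .70.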
With the 30 items that were retained, we calculated the item–total correlation (see Table 2), the correlation of each individual item with the total score omitting that item. For item retention, it is recommended that this correlation be at least .20 (P. Kline, 1986). Among our items, the item–total correlations ranged from .14 to .37 (M = 0.23, SD = 0.06). Seven items had correlations less than .20, although only two were less than .17. We chose to keep these items to ensure an acceptable alpha (.71) and adequate content coverage.
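The corrected item–total correlation described above (each item correlated with the total score omitting that item) can be sketched as follows; this is an illustrative reconstruction with our own helper names, not the authors' code.

```python
# Illustrative sketch: corrected item-total correlation -- the Pearson r
# between an item and the total score computed WITHOUT that item, the
# statistic to which the .20 retention guideline is applied in the text.
from statistics import mean, pstdev

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def corrected_item_total(data, j):
    """data: respondents x items (0/1 responses); j: index of the item."""
    item = [row[j] for row in data]
    rest = [sum(row) - row[j] for row in data]   # total score omitting item j
    return pearson_r(item, rest)
```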
We administered the 30-item scale on two occasions to 40 people. They ranged in age from 22 to 87 years (M = 48.9 years, SD = 21.2), and their scores on the ADKS ranged from 19 to 30 (M = 24.2, SD = 2.4), suggesting some variability in their knowledge about AD. The test–retest interval ranged from 2 to 50 hr (M = 20.4, SD = 15.9), and the test–retest reliability coefficient was .81, p < .001, suggesting adequate test–retest reliability.
As mentioned earlier, coefficient alpha (a function of the average interitem correlation) was .71. Randomly dividing the 30 items and correlating scores on those two halves yielded a split-half reliability of .55, p < .001. These statistics suggest a moderately homogeneous scale. We conducted a principal components factor analysis to determine if meaningful subscales exist within the ADKS. The unrotated first principal component accounted for only 11% of the variance. Following oblique rotation, 10 factors had eigenvalues greater than 1.0, and coefficient alpha for each subscale ranged from .26 to .60. Examination of the factor loadings and item content suggested no simple structure or meaningful factor interpretation. Consequently, in its present form, we think the ADKS is best thought of as a scale of overall AD knowledge and not as a set of separately scored subscales.
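The split-half procedure (randomly divide the items, score each half, correlate the half scores across respondents) can be sketched as follows. This is an illustrative reconstruction, not the authors' code, and it reports the raw half-score correlation without any subsequent correction.

```python
# Illustrative sketch: split-half reliability -- randomly split the items
# into two halves, score each half per respondent, and correlate the halves.
import random
from statistics import mean, pstdev

def split_half(data, seed=0):
    """data: respondents x items (0/1 responses)."""
    k = len(data[0])
    idx = list(range(k))
    random.Random(seed).shuffle(idx)             # reproducible random split
    half_a, half_b = idx[:k // 2], idx[k // 2:]
    a = [sum(row[j] for j in half_a) for row in data]
    b = [sum(row[j] for j in half_b) for row in data]
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (pstdev(a) * pstdev(b))         # Pearson r of the half scores
```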
Scores on the ADKS should be significantly associated with other variables that knowledge about AD should predict. We examined this type of validity by calculating the correlation between performance on the ADKS and ratings of self-reported knowledge about AD. The correlation was .50, p < .001. Looking at specific subsamples, correlations were still significant, although variable: for dementia caregivers, r = .46; for AD professionals, r = .39; for older adults without cognitive impairment, r = .41; and for undergraduates, r = .20. Respondents do have some (although not perfect) awareness of how much they know about AD.
Scores on the ADKS should be different across groups with theoretically different levels of knowledge about AD. We examined this type of validity by comparing ADKS scores across groups of respondents who likely had different degrees of knowledge about AD. People who know more about AD because of their experience or education should score higher than people who know less about AD. We first examined two-group differences using t tests. The left-hand portion of Figure 1 includes results from these group comparisons. Knowledge about AD was more extensive among people who had attended a dementia support group (M = 25.73) compared with those who had not, M = 21.11, t(755) = 9.53, p < .001; more extensive among people who had attended a class or educational program about dementia (M = 24.04) compared with those who had not, M = 20.57, t(756) = 11.10, p < .001; more extensive among people whose work involved contact with people with dementia (M = 24.52) compared with people whose work did not, M = 20.91, t(749) = 9.47, p < .001; and more extensive among people who volunteered with people with dementia (M = 22.80) compared with those who did no such volunteer work, M = 21.39, t(750) = 3.32, p < .01.
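The two-group comparisons above use the standard pooled-variance independent-samples t test. As a reference for the computation (not the authors' code, and illustrated here with toy data rather than the study's values), the statistic can be computed as:

```python
# Illustrative sketch: pooled-variance independent-samples t statistic and
# its degrees of freedom, the form of two-group comparison reported in the text.
from math import sqrt
from statistics import mean, variance

def t_independent(x, y):
    """x, y: score lists for the two groups; returns (t, df)."""
    nx, ny = len(x), len(y)
    # Pooled sample variance across the two groups.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2
```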
Next, we used a one-way analysis of variance to examine differences across the five subsamples that were recruited based on differences in experience with dementia. The right-hand portion of Figure 1 contains mean ADKS scores (with standard deviation bars and ranges; see Table 3 for the exact values) for these groups, which were significantly different from one another, F(4,758) = 86.90, p < .001. Post hoc group comparisons using the Scheffé method revealed that undergraduates and senior center staff had, on average, lower scores on the ADKS than older adults and dementia caregivers, whose scores were lower than professionals working in the field of dementia research or care.
These group differences are in expected directions, with the exception of dementia caregivers, who we expected to score higher than noncaregiving older adults. This may reflect the fact that we populated our caregiving group with people who described themselves as caregivers. Previous research has suggested, however, that caregivers may not identify themselves that way (Kutner, 2001; National Family Caregiver Association, 2002). Consequently, some people who actually were caregivers may have been included in our general older adult group, explaining why knowledge in that group was relatively high. In addition, high scores among the noncaregiving older adults may reflect their overall intellectual ability: Their scores on the Shipley were higher than any other group (see Table 3), and the correlation between the ADKS and the Shipley was significant (r = .44, p < .001).
As further evidence of concurrent validity, we examined how scores on the ADKS changed when individuals were exposed to education about AD and dementia. If education increases knowledge, scores on the ADKS should increase following education. Respondents in this part of the study included students in a social gerontology class (n = 9) who received 3 hr of instruction on dementia and took the scales 1 week apart, students in an aging and mental health class (n = 21) who received 2 hr of instruction on the topic and took the scales 2 weeks apart, and students in a psychology and aging course (n = 6) who received 3 hr of instruction on dementia and took the scale at the beginning and end of the semester. Scores on the ADKS before instruction (M = 14.07, SD = 3.15) were lower than scores after instruction (M = 16.83, SD = 2.39), t(29) = −4.42, p < .001.
Scores on the ADKS should be significantly associated with scores on related constructs. To examine this type of validity, we calculated the correlation between the ADKS and the ADKT, both given to the same people at the same time. For this analysis, we had 311 respondents from the larger sample, ranging in age from 18 to 90 years. The Pearson correlation coefficient between the two scales was .65, p < .001. The correlation between the ADKS and the ADKT with its four outdated items removed was .60, p < .001. These correlations suggest a moderate association between the new scale and the original instrument, evidence of adequate convergent validity. If anything, these correlations may be underestimates due to the significant differences in response format across scales and the fact that some outdated items were not so much incorrect as no longer relevant (e.g., describing the purpose of the Alzheimer's Disease and Related Disorders Association, which is now known as the Alzheimer's Association).
This study provides preliminary evidence for the acceptability, reliability, and validity of the ADKS. The ADKS contains 30 true/false items to assess knowledge about AD, based on current scientific understanding of the disease. The scale takes approximately 5–10 min to complete and covers risk factors, assessment and diagnosis, symptoms, course, life impact, caregiving, and treatment and management. It is designed for use with students, health care professionals, and the general public. An analysis of the scale's psychometric properties suggests it has adequate reliability (test–retest correlation = .81; internal consistency reliability = .71) and validity (content, predictive, concurrent, and convergent), although additional research is needed to confirm these attributes. The scale, along with a scoring key and documentation of answers, can be downloaded from http://www.psych.wustl.edu/geropsych/ADKS.
We acknowledge that there are some limitations of the scale. First, its internal consistency reliability is relatively low. This may be due to the true/false response format and the relatively high item difficulty indexes, which together could result in lower variance among the items. Additional testing with an expanded response format and items that are more varied in difficulty might be useful. Low reliability also might reflect the fact that items were written to tap multiple facets of knowledge in people who themselves may have idiosyncratic information about AD. Future efforts could focus on the development of cohesive subscales. Some aspects of the scale's validity also could use refinement. For instance, our pre–post samples were relatively small.
In addition, because it is brief, the scale excludes some specific topics. For instance, items that address the usual pace of deterioration, daily variability in symptoms, genetic risk, the use of vitamin E in prevention, and some details about prevalence were dropped from the scale because of their unfavorable psychometric attributes or because no clear consensus about the facts exists in the professional literature. Consequently, it is important to recognize that the ADKS is not an exhaustive assessment tool. Rather, it contains representative items that, as a set, likely reflect a person's general knowledge about AD. For this reason, the scale also may have ceiling effects in more expert groups. We can imagine the utility of developing add-on modules for the ADKS that have more specialized content and could be used with different groups, such as dementia caregivers or specialty clinicians. Moreover, if the ADKS proves to be useful, it will be necessary to develop versions in other languages and to expand normative samples specific to age, profession, and setting of evaluation. Finally, we believe the ADKS reflects current knowledge about AD, although as with any fact-based scale it will require continued revision to keep pace with scientific advances.
During this research, Steve Balsis was supported in part by National Institute of Mental Health grant F31 MH075336 and National Institute on Aging grant F31 AG00030, and Poorni Otilingam was supported in part by National Institute on Aging grant F31 AG021879.
Randi Jones provided detailed input during the initial generation of items. Assistance with data management was provided by Lauren Chow, Brian Gendron, Grace Jackson, Kyle McGarty, Rebecca Morris, and Rachel Tepper. Generous assistance with data collection was provided by Darby Morhardt at the Northwestern University Feinberg School of Medicine, Scott Roberts at the University of Michigan School of Public Health, Paula Ogrocki at Case Western Reserve University, Christine Ferri at the Richard Stockton College of New Jersey, the St. Louis chapter of the Alzheimer's Association, the Alzheimer's Disease Research Center at the University of Southern California (National Institutes of Health grant P50 AG05142), members of Psychologists in Long-Term Care, and members of the American Psychological Association's Division 12, Section II (Clinical Geropsychology).