

HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Exp Aging Res. Author manuscript; available in PMC 2013 July 1.
Published in final edited form as:
PMCID: PMC3612583

The Framingham Heart Study Clock Drawing Performance: Normative Data from the Offspring Cohort

Justin A. Nyborn, MPH,1,2 Jayandra J. Himali, MS,2,3 Alexa S. Beiser, PhD,1,2,3 Sherral A. Devine, PhD,1,2 Yangchun Du, PhD,2,3 Edith Kaplan, PhD ABCN,1,4 Maureen K. O’Connor, PsyD ABCN,1,5 William E. Rinn, PhD,6 Helen S. Denison, PhD,1,4 Sudha Seshadri, MD,1,2 Philip A. Wolf, MD,1,2 and Rhoda Au, PhD1,2


Background/Study Context

While the Clock Drawing Test (CDT) is a popular tool for assessing cognitive function, normative data on CDT performance are limited. The objective of the current study was to provide normative data on an expanded version of previous CDT scoring protocols from a large community-based sample of middle-aged to older adults (aged 43 to 91) from the Framingham Heart Study.

Methods

The CDT was administered to 1476 Framingham Heart Study Offspring Cohort participants using a scoring protocol that assigned error scores to drawn features. Total error scores were computed, as were subscale error scores for outline, numeral placement, time-setting, center, and “other” features.

Results

Higher levels of education were significantly associated with fewer errors for time-setting (Command: p<.001; Copy: p=.003), numerals (Command: p<.001) and “other” (Command: p<.001) subscales. Older age was significantly associated with more errors for time-setting (Command: p<.001; Copy: p=.003), numeral (Command: p<.001) and “other” (Command: p<.001) subscales. Significant differences were also found between education groups on the Command condition for all but the oldest age group (75+).

Conclusions

Results provide normative data on CDT performance within a community-based cohort. Errors appear to be more prevalent in older compared with younger individuals, and may be less prevalent in individuals who completed at least some college compared with those who did not. Future studies are needed to determine whether this expanded scoring system allows detection of preclinical symptoms of future risk for dementia.

Keywords: Clock Drawing Test, Normal aging, Scoring methods, Neuropsychological tests, Dementia, Cognitive screening


The Clock Drawing Test (CDT) is a multidimensional cognitive assessment tool widely used in research and clinical practice. It is particularly sensitive to dysfunction among the elderly and among those with neurological and psychiatric disorders (Brodaty & Moore, 1997; Freedman et al., 1994; Hermann et al., 1999; Hubbard et al., 2008). Previous studies have found CDT performance to correlate well with a diagnosis of dementia (Heinik, Solomesh, Raikher, & Lin, 2002; Lessig, Scanlan, Nazemi, & Borson, 2008; Shulman, Pushkar, Cohen, & Zucchero, 1993), including Alzheimer’s disease (AD) (Brodaty & Moore, 1997; Sunderland et al., 1989; Wolf-Klein, Silverstone, Levy, & Brod, 1989). While many variations on CDT scoring systems have been reported, they largely share the same general protocol (Freedman et al., 1994; Henderson & Hotopf, 2007; Shulman, 2000). Shulman (2000) conducted a meta-analysis of CDT scoring systems and found high sensitivity and specificity, high inter-rater and test-retest reliability, and good predictive validity for detecting cognitive change.

The utility of CDT as a test of cognitive status lies in the fact that it taps into multiple processes that involve different regions of the brain (Freedman et al., 1994). When asked to draw a clock and set the hands to a pre-specified time, individuals must possess adequate visuoconstructive and visuospatial skills, auditory comprehension, verbal and visual memory, motor programming, numerical knowledge, spatial attention, concentration, frustration tolerance, and executive functioning including organization, planning, abstract reasoning, and parallel processing (Freedman et al., 1994; Shulman, 2000; Libon, Malamut, Swenson, Sands, & Cloud, 1996; Spreen & Strauss, 1998; Ueda et al., 2002). In addition to measuring an array of cognitive skills, the CDT has been shown to offer other advantages to practitioners and researchers including: 1) brief and simple administration protocol (Bourke, Castleden, Stephen, & Dennis, 1995; Hubbard et al., 2008; Shulman, 2000; Shulman, Shedletzky, & Silver, 1986); 2) sensitivity to subtle impairments (Freedman et al., 1994; Libon et al., 1996); 3) low cost (Nishiwaki et al., 2004; Shulman et al., 1986); and 4) correlation with other cognitive tests such as the Blessed Dementia Rating Scale (Brodaty & Moore, 1997), Wechsler Memory Scale (Strauss, Sherman, & Spreen, 2006), and the Mini-Mental State Examination (MMSE) (Brodaty & Moore, 1997; Bourke et al., 1995; Shulman et al., 1986; Shulman, 2000). Ease of administration and acceptability to patients have fueled a recent resurgence of studies on the CDT among clinicians and researchers, each using modified clock-scoring systems (Hubbard et al., 2008; Shulman, 2000).

Studies report that CDT performance declines consistently with age (Crowe et al., 2010; Freedman et al., 1994; Spreen & Strauss, 1998; Hubbard et al., 2008; von Gunten et al., 2008), but to our knowledge no study has reported significant gender differences (Hubbard et al., 2008; Kim & Chey, 2010). While the CDT has been described as free of educational bias (Lam et al., 1998; Shulman et al., 1986; Yamamoto et al., 2004) and as culture-fair (Marcopulos & McLain, 2003), whether it is in fact free of educational or cultural bias remains unclear.

Studies that used low education cut-offs (e.g., ≥ 7 versus < 7 years) found that those with more education performed better on the CDT (Kim & Chey, 2010; Marcopulos & McLain, 2003). Another study, which compared three CDT scoring systems, used a higher education cut-off (e.g., college graduate versus not) and did not find education to affect CDT performance for two of the scoring systems (Hubbard et al., 2008). Further, the education effect for the third scoring system disappeared when controlling for educational attainment as measured by a reading test. von Gunten et al. (2008) compared CDT performance across three educational levels and found poorer performance only among participants in the lowest educational group who were over age 80. Ratliff et al. (2003) and Hubbard et al. (2008) proposed that education has a “ceiling effect” on CDT scores: because healthy elderly obtain relatively high CDT scores regardless of education level, education effects are difficult to observe.

Studies reporting ethnic differences on the CDT (Manly et al., 2004; Crowe et al., 2008) find that African American participants perform more poorly on the clock drawing test than Caucasian participants, even after adjusting for confounding factors such as age, education, and level of urbanization (Crowe et al., 2008). However, the use of years of education rather than more objective measures of educational attainment may be an underlying mediating factor (Crowe et al., 2010). Documented systematic social, regional, and cultural biases have created educational inequities between the African American and Caucasian U.S. populations, with the greatest impact seen among the elderly. Accordingly, studies that use measures of educational attainment, such as the Wide Range Achievement Test – Reading Subtest 3, Arrangement, in place of reported years of education have found no difference based on ethnicity (Hubbard et al., 2008; Crowe et al., 2010). These studies suggest that in community-based studies with diverse samples, reading ability may be a better predictor of cognitive performance than years of education.

Growing concern about early detection of AD, particularly at pre-clinical stages (e.g., Mild Cognitive Impairment [MCI]), has prompted investigations of whether the CDT can distinguish between normal and mildly impaired performance. While some studies find no differentiation (Lee et al., 2008; Powlishta et al., 2002; Seigerschmidt, Mosch, Siemen, Forstl, & Bickel, 2002), others do find CDT performance differences (De Jager, Hogervorst, Combrinck, & Budge, 2003; Yamamoto et al., 2004). The discrepancy may reflect the use of scoring systems with different levels of sensitivity. Recent studies (Crowe et al., 2008; Crowe et al., 2010; Hubbard et al., 2008; von Gunten et al., 2008) have used scoring protocols that yield a wide range of scores (10–36 points). Although one might presume that more possible points means greater sensitivity to cognitive change, the type of feature being scored is a greater determinant of sensitivity than the number of points. With AD research focused on pre-clinical detection, a CDT scoring system that enhances sensitivity to subtle changes is warranted (Hubbard et al., 2008).

Despite widespread use of the CDT in both clinical and research settings, normative data on CDT performance are limited (Hubbard et al., 2008). Freedman et al. (1994) developed one of the first CDT scoring protocols to establish normative data on clock drawing performance. It incorporated qualitative features of the clock and consisted of 15 descriptors, or “critical items,” characteristic of a “good” clock, with one point assigned for the presence of each critical item. The 15 points were divided into 4 subgroups: clock outline (2 points), numerals (6 points), time setting (6 points), and center (1 point). They tested their scoring system on 348 subjects aged 20 to 90 years recruited from community centers in the Toronto area. To discriminate demented from non-demented participants, they also examined CDT performance in a subgroup of 18 non-demented participants, 20 with Parkinson’s disease without dementia, 14 with Parkinson’s disease and dementia, and 13 with AD. Demented individuals scored below 12 of a possible 15 points. This cut-off showed high sensitivity (.78) and specificity (.82), as well as construct validity with respect to performance on other neuropsychological tests, including the Wechsler Adult Intelligence Scale-Revised (WAIS-R), the Wisconsin Card Sorting Test, and the Rey-Osterrieth Complex Figure (Freedman et al., 1994).
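
The subgroup structure and cut-off described above can be expressed as a small tally. This is a minimal sketch, assuming the per-item judgments have already been made by a scorer; the function and key names (freedman_total, below_cutoff, "time_setting", and so on) are hypothetical, and the cut-off check is illustrative only, not a diagnostic rule.

```python
# Minimal sketch of a Freedman et al. (1994)-style tally: one point per critical
# item present, summed over the four subgroups. The individual item definitions
# are not reproduced; callers supply the count of items present in each subgroup.
FREEDMAN_SUBGROUP_MAX = {"outline": 2, "numerals": 6, "time_setting": 6, "center": 1}

# Demented participants in Freedman et al.'s clinical subgroup scored below 12 of 15.
FREEDMAN_CUTOFF = 12


def freedman_total(items_present):
    """Sum critical items present per subgroup, rejecting counts above a subgroup's maximum."""
    total = 0
    for subgroup, max_points in FREEDMAN_SUBGROUP_MAX.items():
        count = items_present.get(subgroup, 0)
        if not 0 <= count <= max_points:
            raise ValueError(f"{subgroup}: {count} is outside 0..{max_points}")
        total += count
    return total


def below_cutoff(total):
    """True when a total falls in the range that characterized demented participants."""
    return total < FREEDMAN_CUTOFF


print(freedman_total({"outline": 2, "numerals": 5, "time_setting": 4, "center": 1}))  # 12
```

Validating each count against its subgroup maximum guarantees that a total can never exceed 15 points.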

Hubbard et al. (2008) recently conducted a CDT normative study examining the reliability, validity, clinical utility, and demographic effects of three previously published CDT scoring systems. They applied the scoring systems to clocks drawn by 207 cognitively intact participants aged 55 to 98 and found that the range of CDT performance that should be considered normal is wider than previously reported in studies with smaller sample sizes. They suggested that some variation in drawing ability was associated with age and education and should not be assumed to be abnormal, and they proposed that what constituted “normal performance” under these three systems was too narrow. They recommended further study of cognitively healthy participants to determine whether specific CDT errors reflect the natural consequences of normal aging rather than necessarily indicating the early stages of a progressive dementia.

The objective of the current study is to provide normative data on an expanded version of previous CDT scoring protocols from a large community-based sample of middle-aged to older adults (aged 43 to 91) from the Framingham Heart Study. Two primary reasons motivated expanding the scoring system. First, the observation by Hubbard et al. (2008) that the range of performance that defines normal may be wider than has been documented suggests the need for a scoring system that can detect a broader range of errors. Second, beginning with Petersen’s seminal paper on MCI (1999), research on preclinical dementia, particularly Alzheimer’s disease, has become increasingly focused on earlier detection. Pre-MCI is emerging as a pre-symptomatic stage that neuropsychological tests, as currently scored, will not be able to detect (Sperling et al., 2011). Led by Dr. Edith Kaplan, who developed earlier CDT scoring protocols, the Framingham Heart Study CDT scoring system sought to better capture subtle differences in clock drawing performance in people who are still years, possibly decades, away from risk of Alzheimer’s disease.

Methods

Participants

The FHS Offspring Cohort (n=5124) was recruited in 1971 to establish a prospective epidemiological study of young adults (mean age 37) aimed at identifying risk factors for cardiovascular and cerebrovascular diseases (Garrison, Kannel, Stokes, & Castelli, 1987). Biological offspring of the Original FHS Cohort and their spouses were eligible to enroll. Members were invited to health examinations approximately every four years that included a detailed medical history, a physical examination, and laboratory tests (Kannel, Feinleib, McNamara, Garrison, & Castelli, 1979).

All Offspring participants who completed examination cycle 7 (n=3539) and also participated in a first round of neuropsychological testing (1999–2005) were recruited for a follow-up evaluation of brain imaging and cognition. A total of 1516 participants completed the Command and Copy CDT as part of a more comprehensive neuropsychological (NP) battery between 2005 and 2007. Participants with prevalent clinical stroke, dementia, or other neurological diseases (e.g., multiple sclerosis, severe head trauma) were excluded from the study (n=40). Thus, 1476 participants (680 men, 796 women) comprised the total sample for the normative study. Compared with the 2063 Offspring participants who completed examination cycle 7 but were not administered the Command and Copy CDT, the normative sample was younger (60.44 versus 63.05 years; p<.001) and scored higher on the MMSE (29.03 versus 28.28; p<.001). The normative sample was also healthier than the remaining examination cycle 7 participants, with a lower prevalence of stage I hypertension (38% versus 52%; p<.001), cardiovascular disease (8% versus 14%; p<.001), and diabetes (10% versus 16%; p<.001), and fewer smokers (10% versus 16%; p<.001). Diagnoses of clinical stroke and dementia were made by consensus review panels and met standard diagnostic criteria (e.g., DSM-IV and NINDS-ADRDA criteria for dementia and Alzheimer’s disease, respectively). Diagnostic procedures and criteria have been described in detail elsewhere (Seshadri et al., 2006). The proportion of excluded participants was small because the sample was relatively young (only 6% over age 75 and 68% under age 65). The Institutional Review Board at Boston University Medical Center approved the study protocol, and informed consent was obtained from all participants.
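
The participant counts above can be cross-checked with simple arithmetic. This sketch only re-derives the reported numbers; the variable names are hypothetical. The counts imply that the comparison group of 2063 consists of all examination cycle 7 completers outside the normative sample, which includes the 40 excluded CDT completers.

```python
# Participant counts reported for the normative sample (names are illustrative).
EXAM7_COMPLETERS = 3539
CDT_COMPLETERS = 1516
EXCLUDED = 40  # prevalent stroke, dementia, or other neurological disease
MEN, WOMEN = 680, 796

normative_n = CDT_COMPLETERS - EXCLUDED
# Exam-7 completers who are not in the normative sample; this group includes
# the 40 excluded CDT completers as well as those never given the CDT.
comparison_n = EXAM7_COMPLETERS - normative_n

print(normative_n, comparison_n, MEN + WOMEN)  # 1476 2063 1476
```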

Clock Drawing Administration Procedure

The CDT Command condition was administered using the standardized instructions, “Draw a clock, put in all the numbers, and set the hands to ten after eleven.” For the Copy condition, participants were shown an image of a clock and instructed to copy it. The Copy condition was administered approximately fifteen minutes after the Command condition.

Clock Drawing Scoring Protocol

A task group of five neuropsychologists (Devine et al., 2007) led by Edith Kaplan, one of the investigators who developed the Freedman et al. (1994) protocol, established the modified Framingham Heart Study Clock Drawing Test Scoring Protocol (FHS-CDT-SP) to enhance the sensitivity of the Freedman model. They assigned error scores to 38 qualitative features. Like Freedman’s model, but scoring errors rather than correctly drawn critical features, the FHS-CDT-SP provides quantitative scores that include an overall summary error score, as well as subscale error scores for outline, numeral placement, center, time-setting, and “other” (i.e., extraneous marks and self-corrected errors) characteristics of drawn clocks (Appendix A). Table 1 illustrates the FHS-CDT-SP Command and Copy condition breakdown of error points by subscale. The numeral (0–9 points) and time-setting (0–7 points) subscales account for the majority (16) of the 20.5 total possible error points and for a majority of the 38 qualitative features. The FHS-CDT-SP algorithm was designed so that committing an error on some features (e.g., numerals represented only by substitutes) excludes the possibility of committing others (e.g., numerals written out only). Additionally, participants can make multiple errors without receiving the maximum error score for a subscale (e.g., a participant who had only one 12 present [1 point], duplicated a numeral [1 point], and represented one numeral with a substitute [1 point] would receive 3 of 9 possible numeral placement error points).
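
The scoring rules above (summing feature errors into subscales, capping at subscale maxima, and disallowing mutually exclusive features) can be sketched as follows. Feature names and per-feature point values are hypothetical placeholders, not the FHS-CDT-SP definitions in Table 1 and Appendix A; only the numeral (9) and time-setting (7) maxima come from the text.

```python
from collections import defaultdict

# Hypothetical excerpt of an error-point table: feature -> (subscale, error points).
# Names and point values are illustrative; the real ones are in Table 1 / Appendix A.
FEATURE_POINTS = {
    "only_one_12_present": ("numerals", 1.0),
    "numeral_duplicated": ("numerals", 1.0),
    "one_numeral_substitute": ("numerals", 1.0),
    "numerals_only_substitutes": ("numerals", 3.0),
    "numerals_written_out_only": ("numerals", 3.0),
    "hands_same_length": ("time_setting", 0.5),
}

# Subscale maxima stated in the text; other subscale maxima are left uncapped here.
SUBSCALE_MAX = {"numerals": 9.0, "time_setting": 7.0}

# Features that cannot both be coded for one clock (the example given in the text).
MUTUALLY_EXCLUSIVE = [("numerals_only_substitutes", "numerals_written_out_only")]


def score_clock(coded_errors):
    """Return (per-subscale error scores, summary error score) for one clock."""
    coded = set(coded_errors)
    for a, b in MUTUALLY_EXCLUSIVE:
        if a in coded and b in coded:
            raise ValueError(f"{a} and {b} cannot both be coded")
    subscales = defaultdict(float)
    for feature in coded:
        subscale, points = FEATURE_POINTS[feature]
        subscales[subscale] += points
    for subscale, cap in SUBSCALE_MAX.items():
        if subscale in subscales:
            subscales[subscale] = min(subscales[subscale], cap)
    return dict(subscales), sum(subscales.values())


# The numeral example from the text: 1 + 1 + 1 = 3 of 9 possible points.
print(score_clock(["only_one_12_present", "numeral_duplicated", "one_numeral_substitute"]))
# ({'numerals': 3.0}, 3.0)
```

Checking exclusions before summing means an impossible coding is rejected outright rather than silently inflating the score.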

Table 1
FHS-CDT-SP Error Score Rating System By Subtype

Trained examiners scored most Command and Copy condition clock variables by observation, but some features required precise measurement. These measurements were made with two templates. Template 1 (Figure 1), a transparency sheet with concentric circles 10 millimeters apart, was placed directly over and pinned to the center of Template 2 (Figure 2), a transparency sheet with intersecting lines spaced 10 millimeters apart. Scorers used Template 1 to locate the center of the drawn circle and rotated Template 2 accordingly to measure the longest diameter and its perpendicular bisecting diameter. Template 1 was also used to find the center of the numeral array, and that center was used with Template 2 to measure medial deviation, numeral displacement, and the center of the hands (Appendix A).
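
A digital analogue of the template measurement might help clarify what the longest and perpendicular diameters capture. This is a sketch under stated assumptions, not the study's procedure (which used physical transparencies): it assumes the outline has been digitized as (x, y) points in millimeters, and it approximates the perpendicular bisecting diameter by the outline's extent perpendicular to the longest diameter, which equals the bisecting chord only for shapes symmetric about that diameter (such as an ellipse). The function name is hypothetical.

```python
import math
from itertools import combinations


def outline_diameters(points):
    """Approximate the longest diameter of a drawn outline and its perpendicular extent.

    points: list of (x, y) coordinates in millimeters sampled along the outline.
    Returns (longest, perpendicular, perpendicular / longest).
    """
    # Longest diameter: the farthest-apart pair of outline points (O(n^2) search).
    p, q = max(combinations(points, 2), key=lambda pair: math.dist(*pair))
    longest = math.dist(p, q)
    if longest == 0:
        raise ValueError("outline points are all identical")
    # Unit vector along the longest diameter, then its perpendicular.
    ux, uy = (q[0] - p[0]) / longest, (q[1] - p[1]) / longest
    nx, ny = -uy, ux
    # Extent of the outline projected onto the perpendicular direction.
    proj = [x * nx + y * ny for x, y in points]
    perpendicular = max(proj) - min(proj)
    return longest, perpendicular, perpendicular / longest


# An oval outline with semi-axes of 50 mm and 40 mm, sampled every degree.
pts = [(50 * math.cos(math.radians(k)), 40 * math.sin(math.radians(k))) for k in range(360)]
print([round(v, 3) for v in outline_diameters(pts)])  # [100.0, 80.0, 0.8]
```

A ratio well below 1 flags an outline that is insufficiently round; any threshold for scoring that as an error would come from the protocol, not from this sketch.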

Figure 1
CDT Template 1: 9 cm × 9 cm.
Figure 2
CDT Template 2: 9 cm × 9 cm.

Two quality control measures were taken to ensure consistency of scoring. First, two trained examiners independently scored the same tests to assess the reliability of the scoring system; inter-rater reliability, determined from a subset of 100 clocks, was 0.945 for the Command condition and 0.868 for the Copy condition. Second, a neuropsychologist (S.D.) performed a weekly review of a random set of tests for scoring accuracy. The average percentage of scoring errors across the entire 50-minute neuropsychological test battery, including the clock, was less than 1%.
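
The text reports inter-rater reliability coefficients but does not name the statistic. As one common choice for two raters scoring continuous error scores, the sketch below computes the two-way random-effects, absolute-agreement, single-rater intraclass correlation, ICC(2,1). Treating this as the study's coefficient is an assumption, and the function name is hypothetical.

```python
def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-rater ICC (ICC(2,1)).

    ratings: list of rows, one per clock, each row holding the k raters' scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Partition total variability into clocks (rows), raters (columns), and error.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Rater 2 scores every clock one point higher than rater 1.
print(round(icc_2_1([[1, 2], [2, 3], [3, 4], [4, 5]]), 3))  # 0.769
```

Because this form measures absolute agreement, a rater who is systematically one point higher lowers the ICC (here to 10/13) even though the two sets of scores are perfectly correlated.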

Statistical Analysis

Demographic characteristics and clock drawing error scores were summarized with means and standard deviations for continuous measures and with numbers and percentages for categorical variables. Because the distributions of the command summary error score (CmSES) and the copy summary error score (CoSES) were skewed, analyses were performed on the natural logarithms of CmSES (LCmSES) and CoSES (LCoSES). A general linear model was used to compare these error scores by age, gender, education, and age×education; post-hoc pair-wise comparisons among subgroups used Tukey’s procedure. Chi-square tests were used to assess relations between categorical measures of the components of CmSES and CoSES and gender, age group, and education level. All data were analyzed using SAS version 9.1.
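
Two of the simpler analysis steps, the log transform with descriptive summaries and the chi-square test of independence, can be sketched without SAS. The general linear model and Tukey comparisons are not reproduced here. The function names are hypothetical, and because the natural log of a zero error score is undefined and the text does not say how zero scores were handled, this sketch assumes log(1 + x).

```python
import math
from statistics import mean, stdev


def log_scores(raw_scores):
    """Natural-log transform of summary error scores.

    Assumption: log(1 + x) keeps error-free clocks (score 0) defined; the text
    does not say how zero scores were handled.
    """
    return [math.log1p(x) for x in raw_scores]


def summarize(values):
    """Mean and sample standard deviation for a continuous measure."""
    return mean(values), stdev(values)


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts.

    Rows might be age groups and columns error / no error on one subscale component.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


stat, dof = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), dof)  # 6.667 1
```

SciPy's `scipy.stats.chi2_contingency` returns the same statistic together with a p-value when called with `correction=False`; by default it applies Yates' continuity correction to 2×2 tables.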

Results

Table 2 provides demographic information on the total study sample. Mini-Mental State Examination scores from the 7th examination cycle were high, with a mean of 29 out of 30 points, suggesting that the study sample was cognitively normal. The sample was Caucasian and fairly well educated, with 65% having at least some college.

Table 2
Offspring Cohort Descriptive Statistics

Command Condition Error Score Totals by Subscale

The number and percentage of participants making and not making each error measured by the FHS-CDT-SP on the Command condition are displayed in Table 1. Errors with low assigned point values were relatively common. For example, well over half of participants (n=928/1453, 63.87%) drew an outline that was insufficiently round (e.g., oval or distorted). Other prevalent errors included displacing a non-anchor numeral 1 or 1.5 half-hour positions from its measured correct location on the numeral array (e.g., placing a “2” in the “1” location; n=626/1445, 43.32%) and drawing hands shifted up or down from the true center of the numeral array (n=626/1436, 43.59%).

Errors with high assigned point values were less common. For instance, only 15 of 1467 participants (1.02%) did not construct a circular outline, 4 of 1468 (0.27%) omitted the hands, 10 of 1468 (0.68%) drew three hands, and 1 of 1468 (0.07%) drew four or more hands.

Command Condition Error Scores [%]

Table 3a displays the means and percentages for the CmSES stratified by gender and age, and Table 3b provides the overall mean LCmSES, raw summary error scores, and percentage error scores by subscale for education categories. Log-transformed scores are provided alongside raw scores because raw scores give normative values of performance, while transformed scores allow comparison across skewed CDT measures. Because the distributions were skewed and most participants’ clocks had low error scores, we analyzed the percentage of participants with any error (>0 points) and with errors at higher intervals (>0.25 points, >0.5 points) on each subscale to compare the prevalence of more critical errors. Error performance differed significantly by both age (p<0.001) and education (p<0.001), as well as on specific subscales (see Tables 3a & 3b). Numeral and time-setting subscale error percentages, both overall and at higher error intervals, were most strongly associated with age and education.
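
The threshold analysis above, the percentage of participants whose subscale error score exceeds 0, 0.25, and 0.5 points, can be sketched as follows, optionally stratified by a grouping variable such as age group or education. The function names are hypothetical and the records are invented for illustration.

```python
# Cut points for "any error" (>0) and the higher error intervals (>0.25, >0.5).
THRESHOLDS = (0.0, 0.25, 0.5)


def percent_exceeding(subscale_scores, thresholds=THRESHOLDS):
    """Percentage of participants whose subscale error score exceeds each threshold."""
    n = len(subscale_scores)
    return {
        t: 100.0 * sum(score > t for score in subscale_scores) / n
        for t in thresholds
    }


def percent_by_group(records, thresholds=THRESHOLDS):
    """records: iterable of (group_label, subscale_score). Returns {group: {threshold: %}}."""
    groups = {}
    for group, score in records:
        groups.setdefault(group, []).append(score)
    return {g: percent_exceeding(s, thresholds) for g, s in groups.items()}


print(percent_exceeding([0, 0, 0.25, 0.5, 1.0]))  # {0.0: 60.0, 0.25: 40.0, 0.5: 20.0}
```

Strict inequalities mean a score of exactly 0.5 counts toward the >0.25 interval but not the >0.5 interval.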

Copy Condition Error Scores [%]

As for the Command condition, Table 4a compares LCoSES, raw summary error scores, and error percentage scores on the Copy condition by gender and age, and Table 4b presents the overall mean LCoSES, raw summary error scores, and error percentage scores by education. Again, log-transformed scores were included to provide normative distributions and values comparable across skewed measures. Error performance differed significantly by both age (p<0.004) and education (p<0.001). Time-setting subscale error percentages were most strongly associated with age and education.

Command and Copy Condition Error Scores - Age × Education

Table 5 shows LCmSES, LCoSES, and raw error scores for the Command and Copy conditions by age×education. Log-transformed scores again allow comparison across measures. The lowest education level (i.e., less than high school) was not included because it had too few participants. Significant differences were found between education groups on the Command condition for all but the oldest age group (75+).
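
The age×education stratification can be sketched as grouping log error scores into cells and averaging within each cell, with the less than high school group dropped. The text names the age groups (under 55, 55 to 65, 65 to 75, and 75+) but not which group a boundary age such as 65 falls into, so the bin edges here are an assumption, as are the function names and sample records.

```python
from collections import defaultdict
from statistics import mean


def age_group(age):
    """Assumed bin edges: a boundary age such as 65 goes to the older group."""
    if age < 55:
        return "<55"
    if age < 65:
        return "55-64"
    if age < 75:
        return "65-74"
    return "75+"


# Less than high school is omitted, as in Table 5, because the group was too small.
EDUCATION_LEVELS = ("high school", "some college", "college degree")


def cell_means(records):
    """records: iterable of (age, education, log_error_score).

    Returns {(age_group, education): mean log error score} for retained cells.
    """
    cells = defaultdict(list)
    for age, education, score in records:
        if education in EDUCATION_LEVELS:
            cells[(age_group(age), education)].append(score)
    return {cell: mean(scores) for cell, scores in cells.items()}


print(cell_means([(50, "high school", 1.0), (52, "high school", 2.0), (80, "college degree", 0.5)]))
```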

Table 5
CDT Command and Copy Condition Error Scores By Age × Education

Discussion

This study reports CDT findings on a large, community-based sample spanning broader age and education categories than earlier published normative data. The comprehensive FHS-CDT-SP scoring system, as applied to this sample, supports previous suggestions that the distribution of errors among participants with normal clock drawings is wider than has been published to date (Hubbard et al., 2008). The data derived from the FHS-CDT-SP suggest that age, education, and gender may need to be considered when evaluating whether specific types of CDT errors are unusual, and potentially clinically significant, relative to a population of healthy aging individuals.

A major advantage of this study is its large cohort sample. The data show that presumably cognitively intact individuals make few errors of probable clinical significance; however, patterns emerged suggesting that older age and lower education are related to the commission of more errors. The overall mean transformed error score was 0.95 (raw error score = 1.93) on the Command condition and 0.77 (raw error score = 1.33) on the Copy condition, out of 3.02 possible points (raw: 20.5 possible points). The overall mean error score is higher on the Command condition, presumably because drawing a clock freehand requires semantic memory and greater dependence on abstract skills than copying one (Freedman et al., 1994).

Participants committed a multitude of minor errors on each of the major tasks involved in drawing or copying a clock. The most common errors included outlines with unequal directional diameters (i.e., the outline was not round), clearly irregularly shaped outlines, numerals displaced from their correct half-hour location, hour and minute hands of the same length, and hands shifted from the center of the numeral array. Participants were not instructed to draw a precise clock, and some were aware that they were being timed, so many of these minor errors may result from imprecision or rushing under pressure. Clocks rarely showed errors on features with higher assigned point values, such as an absent outline, absent numerals, duplication or omission of numerals, no hands or more than two hands, or hands that do not essentially meet in the center of the numeral array. These results demonstrate that healthy aging participants do make errors, although serious errors are relatively infrequent.

Gender Differences

Results for gender differences in CDT performance are mixed. The overall Command and Copy error scores did not differ by gender. Females made a significantly greater percentage of errors in centering their clock hands on both the Command and Copy conditions. Men had significantly higher error percentages for time-setting on the Copy condition, but marginally significantly lower percentages of time-setting errors greater than 0.5 points on the Command condition. Men also had higher percentages of outline errors greater than 0.25 points on the Command condition and of “other” subscale errors greater than 0.5 points on the Copy condition. Our results suggest that significant overall gender differences are unlikely, but females may make more errors in centering their clock hands, while men may make more time-setting or outline errors. Further study is necessary to determine whether these gender differences have clinical relevance.

Education Differences

Overall CDT error scores were lower on both the Command and Copy conditions among individuals who reported a college education. On the Command condition, error percentages were also slightly lower for individuals with more education on the center subscale and significantly lower on the numeral, time-setting, and “other” subscales, both overall and for errors >0.5 points. On the Copy condition, error percentages among those with more education were significantly lower on the time-setting subscale overall, and on the time-setting and center subscales for errors greater than 0.5 points. These results are consistent with previous reports that better CDT performance is associated with the highest levels of reported education (Kim & Chey, 2010; von Gunten et al., 2008). In addition, education groups differed significantly in error percentages on the center subscale of the Copy condition, but those with less than a high school education and those with a college degree scored similarly, while those with a high school education or some college had higher error percentages. This inconsistent finding may be due to the small size of the less than high school group (n=42) compared with the other education groups (high school, n=473; some college, n=371; college degree, n=590). Numeral and time-setting error percentages were most strongly associated with education on the Command condition, as were time-setting error percentages on the Copy condition. Associations with time-setting error percentages were stronger on the Command than on the Copy condition. These results may have clinical relevance, particularly the higher percentage of time-setting errors among those with lower education on both the Command and Copy conditions.

Finally, within age groups, education level was still associated with significant differences in error scores on the Command condition for all but the oldest age group (75+). However, on the Copy condition, the differences between education groups disappeared when analyzed by age. These education differences should therefore be followed longitudinally to clarify the relationship between education and CDT performance as individuals continue to age.

Age Differences

Previous literature reported a decline in CDT performance with increasing age among cognitively normal adults (Freedman et al., 1994; Spreen & Strauss, 1998), which the current findings corroborate. The mean LCmSES differed significantly among those under 55, between 55 and 65, between 65 and 75, and over 75; it was 1.21 for individuals over 75 compared with 0.84 for those under 55. We also found significantly different error percentages among the age groups on the Command condition numeral, time-setting, and “other” subscales, and on the outline, numeral, and time-setting subscales for errors assigned a value greater than 0.5 points. The overall mean LCoSES increased across age groups on the Copy condition. Furthermore, on the Copy condition, time-setting and center subscale errors, and time-setting and numeral subscale errors greater than 0.5 points, were significantly more prevalent in the 75 and older age group. Freedman et al. (1994) similarly found that the greatest increase in error scores occurs above age 70. As with education, numeral and time-setting error percentages were most strongly associated with age on the Command condition, as were time-setting error percentages on the Copy condition. Again, the associations were stronger on the Command condition than on the Copy condition, and further research is necessary to determine whether they are clinically relevant. The results suggest that as cognitively healthy individuals age, they may show subtle difficulties in recall and in planning or sequencing, reflected in more frequent numeral and time-setting errors, while minor errors in drawing an outline may not change significantly.

Limitations

This study has several limitations. The Offspring cohort is relatively well educated: 39.9% of the sample had at least a college degree and 97% had finished high school. Analyses for the lowest education level (less than high school) were likely underpowered, which may have masked a relationship between low education and poorer CDT performance. The study sample was also relatively young, with fewer participants in the highest age category (75+) than in the younger categories. Given that age and education were significant predictors of error scores, these data may underestimate error scores in the general population. Further, because the participants were Caucasian, the results cannot address the mixed findings on whether CDT performance is affected by race, and they may not generalize to more ethnically diverse populations. Finally, this study examined only cross-sectional clock drawing data; longitudinal follow-up is necessary to determine whether these performance patterns are clinically meaningful.

Conclusions

These normative data provide the distribution of quantified error scores on CDT performance in a cohort population. Results suggest that a variety of errors are seen on CDT Command and Copy conditions in a healthy aging population, and these errors are more prevalent in older compared with younger individuals, and may be less prevalent in individuals who completed at least some college compared with those who did not. Future studies are needed to determine whether this expanded scoring system allows detection of preclinical symptoms of future risk for dementia.

Acknowledgments

The project described was supported by grants from the National Institute of Neurological Disorders and Stroke (grant number NS17950); the National Institute on Aging (grant numbers AG08122, AG16495); and the National Heart, Lung, and Blood Institute’s Framingham Heart Study (NIH/NHLBI Contract No. N01-HC-25195). Edith Kaplan, Ph.D., and Helen Denison, Ph.D., two core contributors to the development of the Framingham Heart Study Clock Scoring Protocol, are now deceased. Dr. Kaplan influenced generations of neuropsychologists in test administration and interpretation, and was world-renowned for her expertise in the analysis of the Clock Drawing Test. Dr. Denison’s clinical wisdom also guided many in the field and was critical to the creation of this scoring protocol. They are forever missed for what they have done and could still be doing.

Appendix A

Clock Drawing Subscale Error Explanations [Please note: Although examiners captured data on all the following variables, we were still in the process of accurately defining some of the variables, so not all of the following were included in our analysis]

Outline

The outline variables relate to the outer form, typically a circle, drawn by the participant.

  1. Clock outline is present: This variable codes whether an outline is present, and, if so, whether it is a circle or some other shape (e.g., a square).
  2. Outline formed by a continuous curve: The examiner watches the participant draw the outline and codes whether a continuous curve is used or the participant lifts his or her pen.
  3. Direction of motion in creating outline: Again, the examiner observes, and records, whether the participant draws the outline in a clockwise, counterclockwise, or mixed direction.
  4. Length of longest diameter: Template 2 (Figure 2) is rotated until the x-axis lies along the longest diameter. Scorers record the two radii measurements of this diameter in millimeters.
  5. Length of perpendicular bisecting line of the longest diameter: Once the longest diameter is identified and measured, its perpendicular bisector is measured, again in millimeters.
  6. Direction of the longest diameter: Recorded is whether the longest diameter is horizontal, vertical, or oblique. If the circle is nearly round, direction of the longest diameter is coded as “not applicable”.
  7. Clock outline is clearly irregular, angular, lumpy, or has indentations: To be coded as irregular, the irregularity has to be obvious to the examiner looking at the outline; subtle irregularity is not considered.
  8. Clock outline perseveratively overdrawn: This is coded as present if the participant draws the circle and then continues to draw along the circle, going more than two times around.
  9. Perseveration of clock outline: The outline is considered perseverated if participants draw multiple outlines without any indication that they are trying to correct or improve the drawing.
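
Coding the direction of the longest diameter (item 6) first requires judging whether the outline is “nearly round.” The protocol gives no numeric cut-off, so the Python sketch below uses a hypothetical length ratio and angle tolerance purely for illustration. The function name and its inputs (the two recorded radii, the bisector length, and an orientation angle for the longest diameter) are also assumptions, not part of the published protocol.

```python
# Hypothetical thresholds: the protocol does not publish numeric cut-offs.
ROUNDNESS_RATIO = 1.05      # longest diameter / bisector below this -> "nearly round"
AXIS_TOLERANCE_DEG = 15.0   # within this many degrees of an axis -> horizontal/vertical


def longest_diameter_direction(radius_a_mm: float, radius_b_mm: float,
                               bisector_mm: float, angle_deg: float) -> str:
    """Code the direction of the longest diameter of a drawn outline.

    radius_a_mm, radius_b_mm: the two radii recorded along the longest diameter.
    bisector_mm: length of the perpendicular bisecting line.
    angle_deg: orientation of the longest diameter, with 0 = horizontal.
    """
    if bisector_mm <= 0:
        raise ValueError("bisector length must be positive")
    longest = radius_a_mm + radius_b_mm
    if longest / bisector_mm < ROUNDNESS_RATIO:
        return "not applicable"  # outline is nearly round
    # Fold the orientation into [0, 90]: 0 = horizontal, 90 = vertical.
    a = angle_deg % 180.0
    a = min(a, 180.0 - a)
    if a <= AXIS_TOLERANCE_DEG:
        return "horizontal"
    if a >= 90.0 - AXIS_TOLERANCE_DEG:
        return "vertical"
    return "oblique"
```

For example, a 90 mm longest diameter with a 70 mm bisector and an orientation of 5 degrees would be coded “horizontal” under these illustrative thresholds.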

Numerals

  1. The presence of numerals: Numerals and numeral substitutes may be absent altogether, numeral substitutes may be present without actual numerals, or numerals can be written out (e.g., “six”), Arabic, Roman, or some combination of these last three.
  2. Rotated paper while placing numerals/numeral substitutes: This is coded as present if the participant rotates the paper 180 degrees while placing the numerals, resulting in the numbers in the bottom quadrant being upside-down or angled.
  3. Anchor numerals or substitutes placed before any others: Anchor numerals are 12, 3, 6, and 9. Placing these numerals first often assists in spatial placement of the other numerals. All anchors placed before the first non-anchor are recorded (e.g., if a participant started by writing “12,” “6,” “1,” then 12 and 6 are coded).
  4. Numerals or substitutes are placed on or outside the outline: This can be coded as none, at least one but not all, or all.
  5. Measurement of the most medially deviated numeral: Medial deviation refers to how far the numerals are from the drawn outline. Template 1 (Figure 1) is used to find the center of the numeral array. Template 2 (Figure 2) is then used to find the numeral that is furthest from the drawn outline (toward the center): a line projected from the center through the center of that numeral locates the point on the outline at which to measure. Scorers measure, in millimeters, the distance from the outside edge of the numeral to the inside edge of the drawn circle.
  6. Displacement of anchor (12, 3, 6, 9): Template 1 is used to find the center of the numeral array. Template 2 is aligned with the numbers on the drawn clock so that it is the “best fit” for the anchor numerals. The anchor that is most displaced from the correct half-hour position is measured: starting at the numeral’s true hour line, scorers count the number of ½ hour lines to the location of the displaced number.
  7. Displacement of nonanchor (1, 2, 4, 5, 7, 8, 10, 11): Without moving the template from the position determined by the anchor numerals, the nonanchor numeral that is furthest from where it should be is measured in the same way as displacement of anchor.
  8. Dots, words, symbols, or other marks are substituted for one or more numerals.
  9. Two 12s are present.
  10. Duplication of numerals (other than 12).
  11. Omission of numerals.
  12. Numerals (or substitutes) beyond 12 are present.
  13. Sequencing errors: Sequencing errors refer to placement of numerals that do not follow the correct order (e.g., 1, 2, 4, 3). This does not include omitted numerals, duplicated numerals, or numerals placed in a counter-clockwise order (e.g., the 1 in the 11 position, 2 in the 10 position).
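
Items 3 and 6 describe two small procedures that can be sketched in Python. The sketch assumes a hypothetical digitized record of the drawing: the order in which numerals were written, and each numeral’s clockwise angle (in degrees from 12 o’clock) around the center of the numeral array. The function names are illustrative. Because half-hour lines are 360/24 = 15 degrees apart, a displacement can be expressed as the angular distance divided by 15; rounding to the nearest line is an assumption, since scorers in the study counted lines on a physical template.

```python
from typing import List

ANCHORS = {12, 3, 6, 9}
HALF_HOUR_DEG = 360.0 / 24  # half-hour lines are 15 degrees apart


def anchors_placed_first(placement_order: List[int]) -> List[int]:
    """Anchor numerals written before the first non-anchor (item 3)."""
    recorded = []
    for numeral in placement_order:
        if numeral not in ANCHORS:
            break  # stop at the first non-anchor
        recorded.append(numeral)
    return recorded


def half_hour_displacement(numeral: int, observed_angle_deg: float) -> int:
    """Half-hour lines between a numeral's true hour line and its drawn position.

    observed_angle_deg: hypothetical digitized angle of the drawn numeral,
    clockwise from 12 o'clock, measured from the center of the numeral array.
    """
    true_angle = (numeral % 12) * 30.0  # 12 -> 0, 3 -> 90, 6 -> 180, 9 -> 270
    diff = (observed_angle_deg - true_angle) % 360.0
    diff = min(diff, 360.0 - diff)  # shortest way around the dial
    # Round half up to the nearest line (diff is non-negative, so int() is safe).
    return int(diff / HALF_HOUR_DEG + 0.5)
```

For the example given in item 3, anchors_placed_first([12, 6, 1, 3]) returns [12, 6]. Taking the largest half_hour_displacement value across the anchors (item 6) or non-anchors (item 7) would then give the displacement score under these assumptions.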

Time Setting

As mentioned above, participants are asked to set the hands to “ten after eleven.”

  1. Number of drawn hands: This is coded as “none,” “one,” “two,” “three,” or “four or more.”
  2. One hand is correctly pointing to the 11 AND the 11 is in the correct location.
    1. If not, then the 11 OR the 11 position is indicated in some manner (e.g., a circle is drawn around the 11 but no hand points to it, or a hand points to the “11” location but there is no “11” in that location).
  3. One hand is correctly pointing to the 2 AND the 2 is in the correct location.
    1. If not, then the 2 OR 2 position is indicated in some manner.
  4. One hand incorrectly points to the 10.
  5. Length of the hour hand versus the minute hand: This is coded as minute hand longer, hour hand longer, or hands equal.
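
Items 1 through 4 can be expressed as simple counts and angle checks. The Python sketch below assumes a hypothetical input of hand directions in degrees clockwise from 12 o’clock. It also assumes the numerals sit at their standard positions, so it does not check whether the 11 or 2 itself is correctly placed. The tolerance is illustrative; the protocol does not specify one. The hand_to_10 flag corresponds to item 4, a hand that incorrectly points to the 10.

```python
from typing import Dict, List, Union

# Clockwise angle, in degrees from 12 o'clock, of each numeral at its standard position.
NUMERAL_ANGLE = {11: 330.0, 2: 60.0, 10: 300.0}
TOLERANCE_DEG = 10.0  # hypothetical; the protocol gives no numeric tolerance


def angular_gap(a: float, b: float) -> float:
    """Smallest angle in degrees between two directions on the dial."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def code_time_setting(hand_angles: List[float]) -> Dict[str, Union[str, bool]]:
    """Code a drawn "ten after eleven" from hand directions (degrees clockwise from 12)."""

    def points_to(numeral: int) -> bool:
        return any(angular_gap(h, NUMERAL_ANGLE[numeral]) <= TOLERANCE_DEG
                   for h in hand_angles)

    n = len(hand_angles)
    labels = ["none", "one", "two", "three"]
    return {
        "number_of_hands": labels[n] if n < 4 else "four or more",  # item 1
        "hand_to_11": points_to(11),  # item 2
        "hand_to_2": points_to(2),    # item 3
        "hand_to_10": points_to(10),  # item 4: hand incorrectly points to the 10
    }
```

A drawing with hands at 300 and 62 degrees would be coded as pointing to the 2 and to the 10 but not to the 11, which is the pattern item 4 is designed to capture.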

Center

The center is determined by the intersection point of two or more hands. Two hands must be present, and they must essentially meet: if the ends of the hands come close enough that it is clear where they would meet if their end points did connect, then that implied intersection is measured. Templates 1 and 2 are used to determine the location of the center relative to the numeral array.

  1. Two hands are present AND they essentially meet.
  2. On the horizontal axis, the point of intersection of the hands is left, center, or right.
  3. On the vertical axis, the point of intersection of the hands is down, center, or up.
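
Treating hands that nearly meet as meeting where they would connect corresponds to intersecting the lines through the two hand segments rather than the segments themselves. The Python sketch below assumes hypothetical millimeter coordinates for the hand endpoints and for the center of the numeral array (which the protocol locates with Template 1), with the y axis pointing up. The tolerance for what counts as “center” is illustrative only.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
TOLERANCE_MM = 2.0  # hypothetical: offsets within this distance count as "center"


def hand_intersection(p1: Point, p2: Point, q1: Point, q2: Point) -> Optional[Point]:
    """Intersection of the lines through hand segments p1-p2 and q1-q2.

    Extending both segments to full lines finds where hands that do not quite
    touch would meet. Returns None if the hands are parallel.
    """
    (x1, y1), (x2, y2) = p1, p2
    (x3, y3), (x4, y4) = q1, q2
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None
    det_p = x1 * y2 - y1 * x2
    det_q = x3 * y4 - y3 * x4
    x = (det_p * (x3 - x4) - (x1 - x2) * det_q) / denom
    y = (det_p * (y3 - y4) - (y1 - y2) * det_q) / denom
    return (x, y)


def classify_center(intersection: Point, array_center: Point) -> Tuple[str, str]:
    """Code the intersection as left/center/right and down/center/up."""
    dx = intersection[0] - array_center[0]
    dy = intersection[1] - array_center[1]
    horizontal = "center" if abs(dx) <= TOLERANCE_MM else ("right" if dx > 0 else "left")
    vertical = "center" if abs(dy) <= TOLERANCE_MM else ("up" if dy > 0 else "down")
    return horizontal, vertical
```

For instance, two hands drawn from (1, 0) to (10, 0) and from (0, 1) to (0, 10) never touch, but their lines meet at (0, 0), which classify_center would code as centered on both axes if the numeral array is centered at the origin.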

Other Variables

  1. Participant asks for a reminder of the time for time-setting.
  2. Attempt to self-correct any error: If no errors are present, this variable is coded as “not applicable.” If at least one error is present, this variable is scored as either “no” (there was no attempt to correct an error), “Yes, correct” (the correction resulted in an error-free clock), or “Yes, NOT correct” (a correction was attempted, but it did not lead to an error-free clock).
  3. Time to completion.
  4. Extraneous marks in clock: This refers to extra marks such as bisecting lines, criss-crossing lines, or radiating lines. Tick marks used as numeral place markers are not considered extraneous marks.
  5. Other observations: Any unusual characteristics of the drawn clock are indicated here, with a text description.
  6. Tester’s clinical assessment of clock drawing: The scorer makes a judgment about whether the overall clock is normal, mildly impaired, moderately impaired, or severely impaired.
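
The self-correction coding in item 2 is a small decision rule. In this Python sketch, the hypothetical argument error_count is assumed to count errors made at any point during drawing, including those later corrected; otherwise a successfully corrected clock would be coded as “not applicable.” The function name and arguments are illustrative.

```python
from typing import Optional


def code_self_correction(error_count: int, attempted: bool,
                         errors_after_correction: Optional[int] = None) -> str:
    """Code item 2 ("Attempt to self-correct any error").

    error_count: errors made at any point during drawing (assumed definition).
    attempted: whether the participant tried to correct an error.
    errors_after_correction: errors remaining once the correction was made.
    """
    if error_count == 0:
        return "not applicable"
    if not attempted:
        return "no"
    if errors_after_correction is None:
        raise ValueError("errors_after_correction is required when a correction was attempted")
    return "Yes, correct" if errors_after_correction == 0 else "Yes, NOT correct"
```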

References

  • Bourke J, Castleden M, Stephen R, Dennis M. A comparison of clock and pentagon drawing in Alzheimer’s disease. International Journal of Geriatric Psychiatry. 1995;10:703–705.
  • Brodaty H, Moore CM. The Clock Drawing Test for dementia of the Alzheimer's type: A comparison of three scoring methods in a memory disorders clinic. International Journal of Geriatric Psychiatry. 1997;12:619–627.
  • Crowe M, Clay OJ, Sawyer P, Crowther MR, Allman RM. Education and reading ability in relation to differences in cognitive screening between African American and Caucasian older adults. International Journal of Geriatric Psychiatry. 2008;23:222–223.
  • Crowe M, Allman RM, Triebel K, Sawyer P, Martin RC. Normative performance on an executive clock drawing task (CLOX) in a community-dwelling sample of older adults. Archives of Clinical Neuropsychology. 2010;25:610–617.
  • De Jager CA, Hogervorst E, Combrinck M, Budge MM. Sensitivity and specificity of neuropsychological tests for mild cognitive impairment, vascular cognitive impairment and Alzheimer's disease. Psychological Medicine. 2003;33:1039–1050.
  • Devine S, Au R, Du Y, Beiser A, Denison H, Rinn W, O’Connor M, Seshadri S, Wolf P, Kaplan E. Normative data for the clock drawing test: Results from the Framingham Heart Study. Poster presented at the Annual Meeting of the International Neuropsychological Society; Maui, HI; Aug 2007.
  • Freedman M, Leach L, Kaplan E, Winocur G, Shulman KI, Delis D. Clock Drawing: A Neuropsychological Analysis. New York: Oxford University Press; 1994.
  • Garrison RJ, Kannel WB, Stokes J III, Castelli WP. Incidence and precursors of hypertension in young adults: The Framingham Offspring Study. Preventive Medicine. 1987;16:235–251.
  • Heinik J, Solomesh I, Raikher B, Lin R. Can clock drawing test help to differentiate between dementia of the Alzheimer's type and vascular dementia? A preliminary study. International Journal of Geriatric Psychiatry. 2002;17:699–703.
  • Heinik J, Solomesh I, Berkman P. Correlation between the CAMCOG, the MMSE, and three clock drawing tests in a specialized outpatient psychogeriatric service. Archives of Gerontology and Geriatrics. 2004;38:77–84.
  • Henderson M, Hotopf M. Use of the clock-drawing test in a hospice population. Palliative Medicine. 2007;21:559–565.
  • Herrmann N, Kidron D, Shulman KI, Kaplan E, Binns M, Soni J, Leach L, Freedman M. The use of clock tests in schizophrenia. General Hospital Psychiatry. 1999;21:70–73.
  • Hubbard EJ, Santini V, Blankevoort CG, Volkers KM, Barrup MS, Byerly L, Chaisson C, Jefferson AL, Kaplan E, Green RC, Stern RA. Clock drawing performance in cognitively normal elderly. Archives of Clinical Neuropsychology. 2008;23:295–327.
  • Kannel WB, Feinleib M, McNamara PM, Garrison RJ, Castelli WP. An investigation of coronary heart disease in families. The Framingham offspring study. American Journal of Epidemiology. 1979;110:281–290.
  • Kim H, Chey J. Effects of education, literacy, and dementia in the Clock Drawing Test Performance. Journal of the International Neuropsychological Society. 2010;16:1138–1146.
  • Lam LCW, Chui HFK, Ng KO, Chan C, Chan WF, Li SW, Wong M. Clock-face drawing, reading and setting tests in the screening of dementia in the Chinese Elderly Adults. Journal of Gerontology: Psychological Sciences. 1998;53B(6):353–357.
  • Lee KS, Kim EA, Hong CH, Lee DW, Oh BH, Cheong HK. Clock drawing test in mild cognitive impairment: Quantitative analysis of four scoring methods and qualitative analysis. Dementia and Geriatric Cognitive Disorders. 2008;26:483–489.
  • Lessig MC, Scanlan JM, Nazemi H, Borson S. Time that tells: Critical clock-drawing errors for dementia screening. International Psychogeriatrics. 2008;20:459–470.
  • Libon DJ, Malamut BL, Swenson R, Sands LP, Cloud BS. Further analyses of clock drawings among demented and nondemented older subjects. Archives of Clinical Neuropsychology. 1996;11:193–205.
  • Manly JJ, Byrd DA, Touradji P, Stern Y. Acculturation, reading level, and neuropsychological test performance among African American elders. Applied Neuropsychology. 2004;11(1):37–46.
  • Marcopulos BA, McLain CA. Are our norms “normal”? A 4-year follow up study of a biracial sample of rural elders with low education. Clinical Neuropsychologist. 2003;17(1):19–33.
  • Nishiwaki Y, Breeze E, Smeeth L, Bulpitt CJ, Peters R, Fletcher AE. Validity of the Clock-Drawing Test as a screening tool for cognitive impairment in the elderly. American Journal of Epidemiology. 2004;160:797–807.
  • Petersen RC, Smith GE, Waring SC, Ivnik RJ, Tangalos EG, Kokmen E. Mild cognitive impairment: Clinical characterization and outcome. Archives of Neurology. 1999;56(3):303–308.
  • Powlishta KK, Von Dras DD, Stanford A, Carr DB, Tsering C, Miller JP, Morris JC. The clock drawing test is a poor screen for very mild dementia. Neurology. 2002;59:898–903.
  • Ratcliff G, Dodge H, Birzescu M, Ganguli M. Tracking cognitive function over time: Ten-year longitudinal data from a community-based study. Applied Neuropsychology. 2003;10(2):76–88.
  • Seigerschmidt E, Mosch E, Siemen M, Forstl H, Bickel H. The clock drawing test and questionable dementia: reliability and validity. International Journal of Geriatric Psychiatry. 2002;17:1048–1054.
  • Seshadri S, Beiser A, Kelly-Hayes M, Kase CS, Au R, Kannel W, Wolf PA. The lifetime risk of stroke: Estimates from the Framingham Heart Study. Stroke. 2006;37:345–350.
  • Shulman KI. Clock-drawing: Is it the ideal cognitive screening test? International Journal of Geriatric Psychiatry. 2000;15:548–561.
  • Shulman KI, Pushkar Gold D, Cohen CA, Zucchero CA. Clock-drawing and dementia in the community: A longitudinal study. International Journal of Geriatric Psychiatry. 1993;8:487–496.
  • Shulman K, Shedletzky R, Silver I. The challenge of time: Clock drawing and cognitive function in the elderly. International Journal of Geriatric Psychiatry. 1986;1:135–140.
  • Spreen O, Strauss E. A Compendium of Neuropsychological tests: Administration, norms and commentary. 2nd Ed. New York: Oxford University Press; 1998.
  • Sperling RA, Aisen PS, Beckett LA, Bennett DA, Craft S, Fagan AM, Iwatsubo T, Jack CR, Kaye J, Montine TJ, Park DC, Reiman EM, Rowe CC, Siemers E, Stern Y, Yaffe K, Carrillo MC, Thies B, Morrison-Bogorad M, Wagster MV, Phelps CH. Toward defining the preclinical stages of Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s & Dementia. 2011;7:280–292.
  • Strauss E, Sherman S, Spreen O. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. 3rd Ed. New York: Oxford University Press; 2006.
  • Sunderland T, Hill JL, Mellow AM, Lawlor BA, Gundersheimer J, Newhouse PA, Grafman JH. Clock drawing in Alzheimer's disease: A novel measure of dementia severity. Journal of the American Geriatrics Society. 1989;37:725–729.
  • Ueda H, Kitabayashi Y, Narumoto J, Nakamura K, Kita H, Kishikawa Y, Kenji F. Relationship between clock drawing test performance and regional cerebral blood flow in Alzheimer's disease: A single photon emission computed tomography study. Psychiatry and Clinical Neurosciences. 2002;56:25–29.
  • Wolf-Klein GP, Silverstone FA, Levy AP, Brod MS. Screening for Alzheimer's disease by clock drawing. Journal of the American Geriatrics Society. 1989;37:730–734.
  • von Gunten A, Ostos-Wiechetek M, Brull J, Vaudaux-Pisquem I, Cattin S, Duc R. Clock-drawing test performance in the normal elderly and its dependence on age and education. European Neurology. 2008;60:73–78.
  • Yamamoto S, Mogi N, Umegaki H, Suzuki Y, Ando F, Shimokata H, Akihisa I. The clock drawing test as a valid screening method for mild cognitive impairment. Dementia and Geriatric Cognitive Disorders. 2004;18:172–179.