Although cholinesterase inhibitors have produced statistically significant treatment effects, their clinical meaningfulness in Alzheimer's disease is disputed. An important aspect of clinical meaningfulness is the extent to which an intervention meets the goals of treatment.
In this randomized controlled trial, patients with mild to moderate Alzheimer's disease were treated with either galantamine or placebo for 4 months, followed by a 4-month open-label extension during which all patients received galantamine. The primary outcome measures were Goal Attainment Scaling (GAS) scores from assessments by clinicians and by patients or caregivers of treatment goals set before treatment and evaluated every 2 months. Secondary outcome measures included the cognitive subscale of the Alzheimer's Disease Assessment Scale (ADAS-cog), the Clinician's Interview-based Impression of Change plus Caregiver Input (CIBIC-plus), the Disability Assessment for Dementia (DAD) and the Caregiving Burden Scale (CBS). To evaluate treatment effect, we calculated effect sizes (as standardized response means [SRMs]) and p values.
Of 159 patients screened, 130 (mean age 77 [standard deviation (SD) 7.7]; 63% women) were enrolled in the study (64 in the galantamine group and 66 in the placebo group); 128 were included in the analysis because they had at least one post-baseline evaluation. In the intention-to-treat analysis, the clinician-rated GAS scores showed a significantly greater improvement in goal attainment among patients in the galantamine group than among those in the placebo group (change from baseline score 4.8 [SD 9.6] v. 0.9 [SD 9.5] respectively; SRM = 0.41, p = 0.02). The patient-caregiver–rated GAS scores showed a similar improvement in the galantamine group (change from baseline score 4.2 [SD 10.6]); however, because of the improvement also seen in the placebo group (2.3 [SD 9.0]), the difference between groups was not statistically significant (SRM = 0.20, p = 0.27). Of the secondary outcome measures, the ADAS-cog scores differed significantly between groups (SRM = –0.36, p = 0.04), as did the CIBIC-plus scores (SRM = –0.40, p = 0.03); no significant differences were found in either the DAD scores (SRM = 0.28, p = 0.13) or the CBS scores (SRM = –0.17, p = 0.38).
Clinicians, but not patients and caregivers, observed a significantly greater improvement in goal attainment among patients with mild to moderate Alzheimer's disease who were taking galantamine than among those who were taking placebo.
The role of cholinesterase inhibitors in treating mild to moderate Alzheimer's disease is controversial. Although these drugs have produced statistically significant treatment effects,1 their clinical meaningfulness is disputed.2–7 How to test clinical meaningfulness is unclear.8 In dementia drug trials, American regulators have required as primary outcome measures9 both a neuropsychologic battery of tests (usually the cognitive subscale of the Alzheimer's Disease Assessment Scale [ADAS-cog]10) and a scale (usually the Clinician's Interview-based Impression of Change plus Caregiver Input [CIBIC-plus])11 completed by an experienced clinician. Still, critics have charged that the instruments do not translate to usual care, that the trials were too short and that the effects were too small.2–4,6,7 On the other hand, many physicians believe that the trials did not capture meaningful treatment effects that they recognize in the clinical setting.12
Clinical meaningfulness can be assessed in part by the extent to which an intervention meets the goals of treatment.13 Here we report our findings from a clinical trial in which we tested the efficacy of galantamine by using Goal Attainment Scaling14 (GAS) to detect change, and we compared those findings with results from other validated instruments.
We targeted English-speaking people with probable Alzheimer's disease (as determined by NINCDS-ADRDA criteria15) at 10 sites across Canada for treatment with the cholinesterase inhibitor galantamine.16 Patients with mild to moderate dementia (Mini-Mental State Examination [MMSE]17 score of 10–25 inclusive and an ADAS-cog18 score of at least 18) were eligible. We excluded patients who were in nursing homes, those who had disabling communication difficulties (problems in language, speech, vision or hearing), other active medical issues or competing causes of dementia, patients who had taken anti-dementia medications within 30 days before screening for study enrolment, those who were hypersensitive to cholinomimetic agents or bromide and those who had been in other galantamine trials. Eligible patients needed to have daily contact with a responsible caregiver.
For our study, the Video-Imaging Synthesis of Treated Alzheimer's disease (VISTA) trial, we used a 16-week randomized, double-blind, parallel-group, placebo-controlled design, with a 16-week open-label follow-up period during which all study patients were given galantamine (see online Appendix 1, available at www.cmaj.ca/cgi/content/full/174/8/1099/DC1). To understand treatment effects better, interviews were digitally video-recorded. After screening, eligible patients were randomly assigned to receive either galantamine (16–24 mg/d) or placebo. Randomization was determined immediately before medication was administered by having a research nurse phone into a contracted, interactive voice-response system for an assignment number; she was blind to the number's meaning in terms of treatment assignment. Given that the primary efficacy measure (the Goal Attainment Scaling [GAS] instrument14) was new to investigators at the study sites and that some sites might have had to withdraw if investigators did not know how to complete GAS, we randomized in blocks of 2, by site, to decrease the chance of incomplete blocks.
Patients were instructed to take 1 tablet twice daily, preferably with food. During the placebo-controlled phase, patients in the galantamine group were given 8 mg/d (4 mg twice daily) for 4 weeks, followed by 16 mg/d for another 4 weeks. At the end of week 8, the dose could be increased to 24 mg/d depending on tolerability. At week 12, patients were re-evaluated; the dose could then be reduced to 16 mg/d if necessary, after which it could not be changed. Patients assigned to the placebo group followed a sham titration schedule. During the open-label phase, patients in the placebo group were given galantamine in titrated doses for 12 weeks, and those in the galantamine group underwent a sham titration while continuing to receive the dose they were taking at the end of the placebo-controlled phase.
The primary efficacy measure was the GAS instrument,14 an individualized outcome measure in which goals are set and then followed over the course of a trial. The goals are personalized (i.e., people set goals according to their own needs). What is standardized is the extent of their attainment, which can be either “no change,” or “much better” (or “much worse”) than expected. (An example of how GAS is used can be found in online Appendix 2, available at www.cmaj.ca/cgi/content/full/174/8/1099/DC1). Two independent GAS assessments were completed: one by physicians, after interviewing patients and caregivers and completing all study procedures, and the other by patients and caregivers, in a separate interview facilitated by an experienced, independent health professional (usually a research nurse) who was blinded to all other outcomes and adverse events except for the CIBIC-plus, which the health professional also scored. GAS raters completed a 4-hour training session. Blinded qualitative raters from the coordinating study site coded every video-recorded interview and made domain assignments; this step provided quality assurance for how goals were set but did not influence scoring.
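The article does not state how individual goal ratings were combined into a single GAS score. As a minimal sketch, the conventional Kiresuk-Sherman T-score is shown below; it is an assumption that this trial used it, as are the 5-point attainment scale (–2 to +2, with 0 meaning the expected level), the default equal weights and the inter-goal correlation of 0.3. The function name `gas_t_score` is hypothetical.

```python
import math


def gas_t_score(attainment, weights=None, rho=0.3):
    """Conventional Kiresuk-Sherman GAS T-score (assumed, not confirmed by the article).

    attainment: one rating per goal, from -2 (much worse) to +2 (much better),
                with 0 meaning the expected level was reached.
    weights:    optional importance weight per goal (defaults to 1 for every goal).
    rho:        assumed average correlation between goal ratings (0.3 by convention).
    """
    if not attainment:
        raise ValueError("at least one goal is required")
    if weights is None:
        weights = [1.0] * len(attainment)
    if len(weights) != len(attainment):
        raise ValueError("one weight per goal is required")

    weighted_sum = sum(w * x for w, x in zip(weights, attainment))
    sum_w = sum(weights)
    sum_w_squared = sum(w * w for w in weights)
    # The denominator rescales the weighted sum so that scores have an SD near 10.
    denominator = math.sqrt((1 - rho) * sum_w_squared + rho * sum_w ** 2)
    return 50.0 + 10.0 * weighted_sum / denominator


# Three equally weighted goals, all rated at the expected level, score exactly 50.
baseline_like = gas_t_score([0, 0, 0])
# Two goals rated "somewhat better" and one unchanged score above 50.
improved = gas_t_score([1, 0, 1])
```

Under this convention a score of 50 means goals were met as expected on average, which is consistent with the group means near 50 reported later in the article.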
Secondary outcome measures included the CIBIC-plus,11 with scores anchored at 4 (no change) and ranging from 1 (very much improved) to 7 (very much worse). The 11-item ADAS-cog10,18 was used to assess memory, language and praxis, with scores ranging from 0 (no impairment) to 70 (severe impairment). The Disability Assessment for Dementia (DAD)19 was used to evaluate 23 aspects of instrumental and 17 aspects of basic activities of daily living, with scores determined as a percentage of applicable items, and higher scores indicating better performance. In the 13-item Caregiving Burden Scale (CBS)20 higher scores reflect higher burden. After baseline, the DAD and CBS were administered at 4 and 8 months, and the others every 2 months. We also introduced the Allocation of Caregiver Time Survey,21 the Red Pen Task22 and the locally developed Examination of Memory and Temporality, but we have not reported on these findings in this article.
Although the GAS instrument can be more responsive than standard measures because it is personalized, this attribute had not been tested in a controlled trial in dementia. For this exploratory analysis, we estimated the sample size from our limited experience with GAS in anti-dementia drug trials.13,23 Assuming a moderate effect size (about 0.5)24 and a 15% dropout at 4 months, we determined that 152 subjects would be required to detect differences at the 5% significance level (2-tailed) with 80% power.25 We recognized that this sample size might not yield statistically significant results for the secondary outcomes, which we used for comparison with the primary outcomes and with results from other studies.1
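The sample-size reasoning above can be approximated with the standard normal-approximation formula for comparing two means. This is a sketch only: the article does not say which method or software was used, the effect size is taken as 0.5, and the function name `total_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def total_sample_size(effect_size, alpha=0.05, power=0.80, dropout=0.15):
    """Approximate enrolment needed for a two-group comparison of means.

    Uses n per group = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2 and then
    inflates the total so that the expected completers still meet that target.
    Returns (per_group, completers, enrolled).
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # 2-tailed critical value
    z_power = standard_normal.inv_cdf(power)
    per_group = math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)
    completers = 2 * per_group
    enrolled = math.ceil(completers / (1 - dropout))
    return per_group, completers, enrolled


per_group, completers, enrolled = total_sample_size(0.5)
```

With these inputs the approximation lands slightly below the 152 reported. A t-distribution correction or rounding up to balanced groups would raise it, so the small gap is expected rather than a sign of disagreement.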
All of the patients who were randomly assigned were included in analyses of safety, demographic and baseline characteristics. The intention-to-treat analysis included all randomly assigned patients who took at least 1 dose (treatment drug or placebo) during the placebo-controlled phase and who provided any follow-up GAS. Missing data were imputed based on the last observation carried forward (excluding baseline data) during the placebo-controlled phase. The observed case analysis included only data from scheduled time points.
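The imputation rule described above (last observation carried forward, never using the baseline value) can be sketched as follows. The visit layout and the function name `locf_post_baseline` are illustrative assumptions, not the trial's actual data structure.

```python
from typing import List, Optional


def locf_post_baseline(visits: List[Optional[float]]) -> List[Optional[float]]:
    """Carry the last observed post-baseline value forward over missing visits.

    `visits` holds post-baseline scores only, in time order (for example
    week 8 then week 16), with None for a missed visit. Baseline is left out
    on purpose, so a patient with no post-baseline data stays missing.
    """
    filled: List[Optional[float]] = []
    last_observed: Optional[float] = None
    for score in visits:
        if score is not None:
            last_observed = score
        filled.append(last_observed)
    return filled


# A patient seen at week 8 who dropped out before week 16.
dropout_after_week_8 = locf_post_baseline([52.0, None])
# A patient with no post-baseline score has nothing to carry forward.
never_assessed = locf_post_baseline([None, None])
```

Leaving early visits as missing when nothing precedes them matches the rule that patients without any follow-up GAS were excluded from the intention-to-treat analysis.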
We report the mean change from baseline for efficacy measures by treatment group (galantamine v. placebo). Analysis of variance of the mean change in GAS scores from baseline to week 16 was the primary efficacy comparison.
Our protocol specified that the statistical significance of the primary outcomes be tested only at 16 weeks; otherwise, effect sizes were estimated at relevant points. Nevertheless, pre-publication assessments favoured calculating p values for secondary outcome measures; these calculations are limited to the 16-week test results. To evaluate clinical detectability and measurement responsiveness, we calculated effect sizes,24 estimated as standardized response means (SRMs), derived as the mean difference between groups divided by the pooled standard deviation of their change.26 For scales whose higher scores indicate worse outcomes (ADAS-cog, CIBIC-plus, CBS), negative effect sizes (less than zero) indicated a positive treatment effect; the opposite was true for scales whose higher scores indicate better outcomes (GAS, DAD).
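As a worked check of the SRM definition above, the sketch below recomputes the clinician-rated GAS effect size from the change scores given in the summary (4.8 [SD 9.6] v. 0.9 [SD 9.5]). The group sizes of 62 and 66 are an assumption (64 randomly assigned to galantamine minus the 2 excluded, and 66 assigned to placebo); the article does not give the exact number per assessment, and the function names are hypothetical.

```python
import math


def pooled_sd(sd_1, n_1, sd_2, n_2):
    """Pooled standard deviation of the change scores in two groups."""
    pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return math.sqrt(pooled_variance)


def srm_between_groups(change_1, sd_1, n_1, change_2, sd_2, n_2):
    """Difference in mean change between groups divided by the pooled SD of change."""
    return (change_1 - change_2) / pooled_sd(sd_1, n_1, sd_2, n_2)


# Clinician-rated GAS at 16 weeks: galantamine v. placebo (group sizes assumed).
clinician_gas_srm = srm_between_groups(4.8, 9.6, 62, 0.9, 9.5, 66)
```

Because the two standard deviations are nearly equal, the result is insensitive to the assumed group sizes and rounds to the reported value of 0.41.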
As detailed below, group assignment was imbalanced, with more patients who had moderate dementia being randomly assigned to the placebo group than to the galantamine group. In a secondary analysis we used a mixed-effects model, with dementia severity as the fixed effect and patient as a random effect. This analysis allowed us to assess the effects of dropout and to adjust for dementia severity at baseline.
For the initial analysis, the statistician at the coordinating centre was blind to group assignments. An independent, unblinded statistician verified all analyses.
In terms of ethics, we reckoned that treatment in a carefully monitored placebo-controlled phase was ethically permissible for up to 16 weeks, given that patients had an opportunity to withdraw. All patients and caregivers provided written, informed consent, including specific consent for video-recording. Each institution's research ethics committee as well as the Therapeutic Products Directorate of Health Canada approved the study protocol.
Between November 2001 and July 2004, 130 patients were enrolled from 14 Canadian sites. We added 4 sites to the original 10, to aid recruitment. No interaction between site and treatment was present. Despite randomization, more patients with moderate dementia were assigned to the placebo group (Table 1). During the placebo-controlled phase, similar proportions of patients in the galantamine and placebo groups withdrew from the study (17% and 15% respectively), although more patients in the galantamine group than in the placebo group withdrew because of adverse events (8% v. 5%), including 1 death (Fig. 1).
Four patients in the galantamine group and one in the placebo group withdrew before completing a post-baseline GAS assessment. Two patients (both in the galantamine group) had no follow-up GAS or any other assessments and were excluded from analysis. One patient (assigned to the galantamine group) had follow-up data only for the clinician-based GAS assessment, and one from each group had data only for patient-caregiver–based GAS assessments; in all 3 cases, other secondary outcome measures were completed.
Clinicians set fewer goals than did the patients and caregivers (377 v. 439), although each set a median of 3 and no more than 6 goals per patient. Both the patients and caregivers and the clinicians set most goals in areas of cognition and function (67% and 60% respectively) and fewest in leisure and social activities (14% and 19%).
Both the patient-caregiver–based and clinician-based GAS assessments indicated that patients in both groups showed net mean goal attainment at 2 months (Fig. 2). After 4 months, although the patient-caregiver–based assessments showed no significant difference in mean goal attainment between the galantamine and placebo groups (absolute difference between groups 1.9, p = 0.27; SRM = 0.20), the clinicians detected significantly higher levels of goal attainment among patients in the galantamine group (absolute difference between groups 4.0, p = 0.02; SRM = 0.41). Higher goal attainment was seen among patients with moderate dementia, and because more patients with moderate dementia were in the placebo group, the bias of the imbalance at baseline was conservative (i.e., against demonstrating a galantamine effect). This finding was confirmed by the post hoc mixed-effects model analysis (see the following section).
The clinician- and patient-caregiver–based GAS assessments differed somewhat in terms of non-response. Thirty-one patients (47%) in the placebo group and 19 (29%) in the galantamine group met none of the goals set by the clinicians at the end of the placebo-controlled phase; the corresponding numbers of patients who met none of the goals set by patients and caregivers were 20 (30%) and 15 (23%). The largest differences in goal attainment, as determined by both the patient-caregiver–based and clinician-based assessments, occurred at 6 months (2 months after the start of the open-label phase). Patients who had been taking placebo during the first 4 months attained fewer goals at 6 months than did patients who were taking galantamine for the entire 6 months (absolute difference in patient-caregiver–rated GAS score = 4.3 [SRM = 0.39] and in clinician-rated GAS score = 4.5 [SRM = 0.42]).
The secondary outcome measures mostly showed effects that favoured initial treatment with galantamine. The ADAS-cog scores (Fig. 3) showed readily detectable differences between groups at 2 and 4 months (e.g., 4-month SRM = –0.36, p = 0.04), as did the CIBIC-plus (SRM = –0.40, p = 0.03) (Fig. 4). No significant differences in either the DAD scores (SRM = 0.28, p = 0.13) or the CBS scores (SRM = –0.17, p = 0.38) were found at 4 months.
In the observed case analysis, effect sizes favouring treatment with galantamine were similar to those seen in the intention-to-treat analysis (e.g., at week 16, clinician GAS SRM = 0.38, patient-caregiver GAS SRM = 0.22; ADAS-cog = –0.41; CIBIC-plus = –0.27; DAD = 0.19; CBS = –0.18). Because randomization resulted in more patients with moderate Alzheimer's disease being assigned to the placebo group than to the galantamine group, we first stratified by severity of dementia. For both sets of GAS assessments, patients with moderate dementia had higher degrees of goal attainment than did patients with mild dementia; for example, clinician-rated mean GAS scores were 53.9 and 47.7 in the galantamine and placebo groups respectively among patients with moderate dementia, compared with 56.0 and 53.8 respectively among patients with mild dementia. Still, there was no significant difference in the patient-caregiver–rated GAS scores by severity of dementia between groups.
The mixed-effects analyses confirmed that the treatment effects determined from the clinician-rated GAS scores and the CIBIC-plus scores remained significant at 16 weeks. After adjustment for dementia severity and dropout, the significance of findings that were significant in the unadjusted analyses was increased. The 4-month DAD score was at the threshold of statistical significance (p = 0.051) in the mixed-effects analysis.
During the placebo-controlled phase, adverse events occurred in 54 (84%) of the 64 patients in the galantamine group and 41 (62%) of the 66 patients in the placebo group (Table 2). Serious adverse events occurred in 5 and 10 patients respectively after 4 months. During the open-label phase, 47 of 54 patients newly receiving galantamine experienced adverse events; none reported serious adverse events. Of the patients who had already been taking galantamine, 4 reported serious adverse events during the open-label phase. A possible, probable or very likely association with the drug treatment was inferred in 28% of the adverse events that occurred in patients taking galantamine and in 15% of those in patients taking placebo. These rates are comparable to those in other galantamine trials.16
In this trial involving patients with mild to moderate Alzheimer's disease, 4 months of treatment with galantamine, compared with placebo, resulted in clinicians identifying statistically significant levels of goal attainment through GAS assessment. Using the same outcome measure, patients and caregivers were less able to discern a difference during the placebo-controlled phase, chiefly because they recognized improvement in both patient groups. The ADAS-cog and the CIBIC-plus scores showed appreciable effect sizes, which were also statistically significant.
Our data must be interpreted with caution. The study was small, with only 128 patients available for analysis, and the time spent in the placebo-controlled phase was short, although that phase was as long as we considered ethically permissible. Interpretation of missing data is problematic in studies of degenerative disorders. In dementia trials, carrying forward baseline scores of patients who drop out early means that there is no opportunity to demonstrate decline that might occur after dropout. One remedy would be to carry forward the mean scores of the placebo group.27 In our study, given the mean positive goal attainment in the placebo group, it would have been more conservative to carry forward cases in which no goals were met. However, doing so showed no appreciable differences between patient groups in either the SRMs or the p values. Similarly, the differences between the clinician-based and patient-caregiver–based GAS assessments did not reflect that patients and caregivers had set more goals than clinicians or that only the goals set by the patients and caregivers were weighted. The bias of there being more patients with moderate dementia in the placebo group than in the galantamine group proved to be conservative, since patients with moderate dementia were more likely than those with mild dementia to respond to treatment, and the primary analysis was confirmed by the mixed-effects model. Because most sites used the same physician to complete the clinician-based GAS assessment and to evaluate side effects, physicians may have been unblinded by adverse events. However, the data suggest not: in the galantamine group, the mean clinician-rated GAS score was higher among patients who did not have gastrointestinal side effects than among those who did (54.8 v. 50.6). In the placebo group, the mean GAS score was nearly identical between those with and those without gastrointestinal side effects (50.8 and 50.4 respectively).
A review of cholinesterase inhibitors discounted clinical meaningfulness and suggested in particular that failure to correct for multiple comparisons gave a falsely positive interpretation of drug effects.4 Here, even a correction as conservative as Bonferroni would not undermine the statistical significance of the main result — that is, the primary outcomes at the end of the placebo-controlled phase.
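The Bonferroni argument above can be illustrated with the two primary comparisons at 16 weeks. Treating those two tests as the whole family is an assumption made for illustration; the article does not say how many comparisons the authors had in mind, and a larger family would lower the threshold. The function name `bonferroni_significant` is hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which tests stay significant after a Bonferroni correction.

    Each p value is compared with alpha divided by the number of tests
    in the family.
    """
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}


# Primary outcomes at the end of the placebo-controlled phase (p values from the article).
primary_outcomes = {"clinician_gas": 0.02, "patient_caregiver_gas": 0.27}
after_correction = bonferroni_significant(primary_outcomes)
```

With a family of two, the threshold is 0.025, so the clinician-rated result (p = 0.02) remains significant and the patient-caregiver–rated result does not, the same pattern as in the uncorrected analysis.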
Because the clinician-based GAS assessment showed a statistically significant difference between patient groups whereas the patient-caregiver–based GAS assessment did not, it may take a Rorschach test to determine whether this trial argues for clinical meaningfulness of galantamine treatment. If so, here is what we see. The clinicians who completed the GAS assessments were blinded to the CIBIC-plus scores, which were based on patient–caregiver interviews. This means that, in our study, 2 observers independently judged that more patients taking galantamine than those taking placebo demonstrated clinically meaningful responses. These data suggest that clinicians can reliably detect meaningful treatment responses. We need to routinely use methods that help clinicians understand, and adjudicate, whether treatment meets the goals of patients and caregivers.
See related article page 1117
Kenneth Rockwood and Chris MacKnight each receive career support from the Canadian Institutes of Health Research through Investigator Awards. Kenneth Rockwood is also supported by the Dalhousie Medical Research Foundation as the Kathryn Allen Weldon Professor of Alzheimer Research.
This article has been peer reviewed.
Contributors: Kenneth Rockwood (principal investigator) designed the study and wrote the grant application and clinical trial protocol. He supervised the analyses and wrote the paper. Sherri Fay was project coordinator, assisted in the analyses and contributed to interim drafts. Xiaowei Song supervised the statistical analyses. Chris MacKnight and Mary Gorman were site investigators and contributed to the grant application and interim drafts. All of the authors approved the final draft of the manuscript.
This peer-reviewed, investigator-initiated trial was jointly funded by Janssen-Ortho Canada (80%) and the Canadian Institutes of Health Research (20%) (grant no. DCT-49981). The sponsor provided all medications and matching placebos, conducted on-site monitoring and gathered and electronically coded the case report forms. All data are held by the principal investigator (Kenneth Rockwood), who initiated and supervised all analyses. Janssen-Ortho received the paper 45 days before submission to verify protocol details. At our request, their statisticians answered questions about the use of the mixed-effects model but had no other input in the analyses.
Competing interests: None declared for Sherri Fay and Xiaowei Song. Kenneth Rockwood has undertaken consultancies and received honoraria from Janssen-Ortho, the study's co-sponsor, and from Pfizer, Novartis and Merck, and he was lead author of an earlier galantamine study. He owns no stock in pharmaceutical companies. He is part owner of DementiaGuide, which is developing a Web site to aid in goal setting for people with dementia. As principal investigator, he had full access to the data and had final responsibility for the decision to submit the paper for publication. Chris MacKnight has received research grants from Janssen-Ortho, Pfizer, Lundbeck and Novartis. He has received no personal payments. Mary Gorman has received honoraria and travel grants from Janssen-Ortho, Pfizer and Merck.
Correspondence to: Dr. Kenneth Rockwood, Division of Geriatric Medicine, Dalhousie University, 1421–5955 Veterans' Memorial Lane, Halifax NS B3H 2E1; fax 902 473-1050; kenneth.rockwood/at/dal.ca