Study subject dropout compromises clinical trials by reducing statistical power and potentially biasing findings. We use data from a trial of treatments to delay the progression of mild cognitive impairment to Alzheimer’s disease (AD) (NEJM 2005;352(23):79–88) to determine predictors of study subject dropout and inform the design and implementation of future trials. Time to study discontinuation was modeled by proportional hazards regression with censoring at incident dementia or trial completion. Of 769 subjects, 230 (30%) discontinued prematurely. Risk of dropout was higher among nonwhites (hazard ratio (HR) 2.1, p=0.0007), subjects with less than college education (HR=1.6, p=0.02), subjects with a Hamilton Depression score of six or more (HR=1.3, p=0.04), unmarried men (HR=2.1 relative to married men, p=0.003) and subjects recruited by commercial clinical sites (HR=2.2 relative to subjects recruited by NIA funded AD research centers, p<0.0001). A trial using commercial sites with the discontinuation rates and incident dementia event rates experienced in this trial would require 80% more subjects than a comparably powered trial using NIA funded AD research center sites. Targeted retention efforts and utilization of academic sites could substantively improve the statistical power and validity of future clinical trials of cognitively impaired elderly.
Missing data is a major concern for all clinical trials. Prevention trials with long-term follow-up are especially vulnerable to missing data due to study subject attrition. Based on the experience of cohort studies, which find that cognitive impairment is a risk factor for study subject attrition1,2, loss of subjects to follow-up may be especially problematic for secondary prevention trials in dementia. As recently reviewed3, this in fact has been the case for most secondary prevention trials performed to date. Moreover, the literature on causes and risk factors for loss to follow-up within the context of clinical trials is limited4–8. To address these concerns, we investigated predictors of study subject attrition in the Alzheimer’s Disease Cooperative Study (ADCS) trial of donepezil and vitamin E to delay conversion from amnestic MCI to AD (the ADCS MCI trial9). Knowledge of the factors that influence subject attrition may suggest recruitment and retention strategies to improve the validity and statistical power of future trials.
Data from the ADCS MCI trial9 were used to examine predictors of loss to follow-up due to death, adverse events, or voluntary subject withdrawal. The trial is described in detail in the report of primary study results (Petersen, et al.9). Briefly, this three-year trial recruited 769 participants from 69 clinical sites in the U.S. and Canada. Inclusion criteria were amnestic MCI of a degenerative nature (insidious onset and gradual progression), impaired memory, a Logical Memory delayed-recall score10 1.5 standard deviations below education-adjusted norms, a Clinical Dementia Rating (CDR11) scale score of 0.5, a score of 24 to 30 on the mini-mental state examination (MMSE12), an age of 55 to 90 years, a Modified Hachinski score13 of less than or equal to four, and lack of clinically important depression as indicated by a Hamilton Rating Scale for Depression (HAM-D14) score of less than or equal to 12. Sites were also instructed to recruit only subjects whose general health and physical condition would not preclude them from completing a three-year trial. The recruitment period spanned March 1999 to January 2004. Randomization to one of placebo, 2000 IU vitamin E, or 10 mg donepezil was balanced on MMSE score, age, and APOE e4 status. The study was conducted according to Good Clinical Practice guidelines, the Declaration of Helsinki, and the US Code of Federal Regulations Title 21 Part 50 (Protection of Human Subjects) and Title 21 Part 56 (Institutional Review Boards). Written informed consent was obtained from all participants and study partners. Full cognitive exams for incident dementia were conducted on site at six-month intervals for up to 36 months. Incident dementia was diagnosed by site clinicians, with validation by a secondary review committee9.
Two hundred and twelve of 214 incident dementia cases met National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association criteria15 for possible or probable AD.
Subjects who missed scheduled follow-up visits were contacted by site investigators and, if they chose to discontinue participation in the study, were asked to return for one final cognitive exam. Clinical personnel obtained additional information at the final visit or via the telephone if the subject refused to return. The additional information included predefined questions regarding the reasons for study discontinuation. The predefined reasons queried were concern over placebo condition, length of protocol, travel requirements, frequency of assessments, potential adverse effects of medication, concern about medication effectiveness, alternative treatment chosen by subject, alternative treatment chosen by caregiver, or caregiver unwilling to continue.
The analysis plan for the present investigation consisted of univariate assessments of possible predictors of study discontinuation followed by a model-building exercise to derive a single multivariate predictive model. The outcome variable was time to discontinuation of the closed-label portion of the trial, defined as the time from baseline to the earliest of the date of progression to dementia or the date of last completed cognitive diagnostic assessment. Subjects who did not complete the closed-label portion of the trial were considered events, subjects who converted to dementia were censored at the date of dementia diagnosis, and subjects who completed the trial without converting to dementia were censored at the date of their month 36 assessment.
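The censoring rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`SubjectRecord` and its field names) is hypothetical, since the trial's actual data dictionary is not described here, and follow-up time is expressed in days for simplicity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class SubjectRecord:
    # Hypothetical layout; field names are assumptions, not the trial's variables.
    baseline: date
    last_assessment: date           # last completed cognitive diagnostic assessment
    dementia_date: Optional[date]   # None if the subject never converted
    completed_month_36: bool        # True if the month 36 assessment was completed


def discontinuation_outcome(rec: SubjectRecord) -> Tuple[int, int]:
    """Return (days_from_baseline, event): event is 1 for discontinuation, 0 if censored."""
    if rec.dementia_date is not None:
        # Converters are censored at diagnosis (or the last assessment, if earlier).
        end = min(rec.dementia_date, rec.last_assessment)
        return (end - rec.baseline).days, 0
    if rec.completed_month_36:
        # Completers without dementia are censored at their final assessment.
        return (rec.last_assessment - rec.baseline).days, 0
    # Everyone else left the trial early and counts as an event.
    return (rec.last_assessment - rec.baseline).days, 1


# Illustrative (made-up) subject who dropped out after the six-month visit.
dropout = SubjectRecord(date(2000, 1, 1), date(2000, 7, 1), None, False)
print(discontinuation_outcome(dropout))  # (182, 1)
```

The resulting (time, event) pairs are the form of input expected by standard survival routines such as log-rank tests and Cox regression.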
Univariate assessments included unweighted log-rank tests of overall association of the covariate with time to study discontinuation, descriptive summaries of the proportion discontinuing by level of the covariate, and Cox proportional hazards model estimates of the relative risk of discontinuation by level of the covariate. Variables significantly related to study discontinuation in univariate log-rank tests at the p<0.1 level were selected for inclusion in the final model; variables related at the p=0.10–0.15 level were subsequently tested for inclusion in the final model and selected for inclusion if likelihood ratio tests to add the term were significant at the p<0.05 level or if there was evidence of confounding by the variable. A covariate was considered to have a mediating or confounding effect if its inclusion in the model altered the model coefficients for other covariate terms by greater than 15% (Hosmer et al.16). Interactions with treatment arm were assessed for all variables in the final model and tested with likelihood ratio tests with a Bonferroni-adjusted p-value for an overall significance level of 0.05. Interactions significant at the p<0.05 level were included in the final model. The proportional hazards assumption was checked for all terms in the final model by Schoenfeld residuals tests using the log transform of time17. To investigate the extent to which observed associations were driven by deaths and other serious adverse events, we performed two sensitivity analyses, first refitting the final model with censoring on death, and again refitting the final model with censoring on both death and discontinuation due to other serious adverse events. The distribution of continuous variables was compared using the Wilcoxon signed rank test and the frequency of categorical variables was compared using Fisher’s exact test, as indicated.
All analyses were conducted using the R statistical programming language18, with log-rank tests and Cox modeling performed using the R library ‘survival’.
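The screening and confounding rules described above reduce to simple threshold checks. The Python sketch below illustrates them; it is not the authors' R code, the function names are hypothetical, and every p-value and coefficient in the example is made up for illustration only.

```python
from typing import Dict


def screen_covariates(logrank_p: Dict[str, float]) -> Dict[str, str]:
    """Classify covariates by univariate log-rank p-value."""
    decisions = {}
    for name, p in logrank_p.items():
        if p < 0.10:
            decisions[name] = "include"          # goes straight into the model
        elif p <= 0.15:
            decisions[name] = "test"             # likelihood ratio / confounding check
        else:
            decisions[name] = "exclude"
    return decisions


def is_confounder(coef_without: Dict[str, float],
                  coef_with: Dict[str, float],
                  threshold: float = 0.15) -> bool:
    """True if adding the candidate shifts any existing coefficient by more than 15%."""
    for term, before in coef_without.items():
        if before == 0:
            continue  # relative change is undefined for a zero coefficient
        if abs(coef_with[term] - before) / abs(before) > threshold:
            return True
    return False


def bonferroni_alpha(overall_alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the overall level at overall_alpha."""
    return overall_alpha / n_tests


# Made-up inputs, for illustration only.
print(screen_covariates({"site type": 0.001, "primary language": 0.12, "age": 0.40}))
print(is_confounder({"site type": 0.80}, {"site type": 0.78}))
print(bonferroni_alpha(0.05, 7))
```

Relative change is computed on the model coefficients (log hazard ratios), matching the wording of the rule; applying it to the hazard ratios themselves would give a somewhat different criterion.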
Covariates considered included the screening measurements of Logical Memory, MMSE, modified Hachinski scale, and HAM-D, as well as measurements of activities of daily living using the ADCS Mild Cognitive Impairment Activities of Daily Living Scale19, BMI (weight in kilograms divided by the square of height in meters), the Clinical Dementia Rating scale sum of boxes (CDR sum of boxes20), and the Global Deterioration Scale21. Informant questionnaires and interviews were used to obtain study subject age, race/ethnicity, primary language, education, marital status, the Beck Informant Depression Scale22, duration of cognitive impairment prior to enrollment, and living arrangement, as well as information on the informant’s relation to the study subject and the hours per week the informant spends with the subject. Treatment arm was categorized per the ADCS MCI trial as placebo, vitamin E, or donepezil. Type of recruitment site was categorized as National Institute on Aging program project AD research centers (ADC sites), non-ADC, university-affiliated research centers (university-affiliated sites), or commercial sites not affiliated with a university (commercial sites).
Of the 769 subjects randomized into the trial, 214 incident dementias were observed, 325 subjects completed the 36-month trial without converting to dementia, and 230 (30%) did not complete the closed-label phase of the trial. Of the 230 non-completers, 17 subjects died, 47 were discontinued by site clinicians due to adverse events, 105 withdrew consent during the trial, 55 discontinued treatment and ultimately withdrew consent during the trial, and six moved or otherwise refused further contact. Study discontinuation occurred within the first six months for 117 subjects and within the first year for 172 subjects. Over four-fifths of those lost (193, 83.9%) dropped by month 18, with 9.6%, 5.7%, and 0.9% dropping in each of the remaining six-month intervals. Overall, the mean length in study was 23.2 months (standard deviation (sd) 13.4), with subjects who discontinued completing an average of 9.5 months (sd 8.9) of follow-up, compared to 18.4 months (sd 9.1) for incident dementia cases and 36.0 (sd 0.4) for those who completed the closed-label portion of the trial without converting to dementia.
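The disposition figures in this paragraph are internally consistent, which can be checked with a few lines of arithmetic. In the Python sketch below, all counts are taken from the text except the three late-interval counts (22, 13, and 2), which are back-calculated from the reported percentages and are therefore an assumption.

```python
randomized = 769
dementia_cases = 214
completers = 325

# Reasons for non-completion, as reported in the text.
reasons = {
    "died": 17,
    "adverse event": 47,
    "withdrew consent": 105,
    "stopped treatment, then withdrew consent": 55,
    "moved or refused contact": 6,
}
dropouts = sum(reasons.values())
dropout_pct = round(100 * dropouts / randomized)          # reported as 30%

# Cumulative dropouts by month 18, as a share of all dropouts (reported as 83.9%).
pct_by_month_18 = round(100 * 193 / dropouts, 1)

# Assumed counts for months 18-24, 24-30, and 30-36, inferred from 9.6%, 5.7%, 0.9%.
late_counts = [22, 13, 2]
late_pcts = [round(100 * n / dropouts, 1) for n in late_counts]

# The overall mean follow-up is the group-size-weighted mean of the three group means.
overall_mean_months = round(
    (9.5 * dropouts + 18.4 * dementia_cases + 36.0 * completers) / randomized, 1
)

print(dropouts, dropout_pct, pct_by_month_18, late_pcts, overall_mean_months)
```

The three disposition groups sum to the 769 randomized subjects, and the weighted mean reproduces the reported overall follow-up of 23.2 months.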
Baseline characteristics by discontinuation status are presented in Table 1. Subjects who dropped out differed from those who continued in the study on several demographic measures. In univariate analyses, risk of study discontinuation was higher among females (HR: 1.36, p=0.020), among non-white subjects compared to white, non-Hispanic subjects (HR: 2.12, p<0.001), and among those with less than high school education (HR: 1.51, p=0.047). Rates of study discontinuation were highest at commercial sites (43.8%), and lowest at ADCs (23.0%, p<0.001). University-affiliated sites had slightly higher rates of study discontinuation compared to ADCs (28.5%, p=0.150). Subjects who were married had lower rates of study discontinuation than their unmarried counterparts (27.0% vs 39.8%, p=0.001). Only one clinical measure was a statistically significant predictor of study discontinuation in the univariate analysis: subjects with a HAM-D score of 6 to 12 were more likely to discontinue than subjects scoring <6 (HR: 1.6, p=0.004). Subjects in the donepezil arm were more likely to discontinue than subjects in the placebo arm (HR: 2.2, p<0.001).
Per our pre-specified model building algorithm, covariates selected for the final multivariate model based on a univariate log-rank test p-value < 0.1 were treatment arm, site classification, marital status/informant relationship, education, gender, race/ethnicity, and HAM-D score. Age was not significantly related to study discontinuation at the p<0.1 level, but was forced into the model. Of potential pairwise interactions, only the gender by marital status term was statistically significant (likelihood ratio test p-value < 0.0001); this interaction was added to the model using married men as the referent gender by marital status category. The two measures with univariate p-values between 0.1 and 0.15, the Beck Informant Depression Scale and the primary language indicator, were added separately to the model and tested for significance. Neither significantly improved the model (likelihood ratio test p-values > 0.3), nor did their inclusion modify coefficients of covariates already in the model by more than 15%, and therefore these terms were not included in the final model. Proportional hazards assumptions were not rejected for any terms in the model (p>0.3).
The final multivariate model is presented in Table 2. Hazard ratios and p-values in the full multivariate model are largely consistent with those observed in the univariate analyses, although in the final model we observe that the effect of marital status on discontinuation is present only within men (HR=2.24, p=0.001 for unmarried versus married men). The gender effect remains, but individual terms in the final model comparing married and unmarried women to the referent married men category are not statistically significant because of the reduced per-comparison sample size. If these two female categories are pooled, the gender effect is statistically significant (HR=1.34, p=0.044 for women versus married men).
When we reran the model censoring subjects who died rather than counting them as study discontinuation events, the coefficients and p-values for all of the statistically significant predictors in Table 2 were not appreciably changed except for the HAM-D term, which was attenuated and no longer a significant predictor of study discontinuation (HR 1.21, p=0.27). When we reran the model further censoring both deaths and other serious adverse events, the coefficients for all of the significant predictors were not appreciably changed, although in this model the p-value for the education less than high school term became only marginally statistically significant (p-value changing from 0.02 to 0.07 after censoring of deaths and other serious adverse events).
The final disposition of study subjects is summarized by type of clinical recruitment site in Table 3. Subjects recruited by commercial sites were less likely to convert to dementia, and more likely to die or discontinue, than subjects recruited by non-commercial sites. Reasons for voluntary study discontinuation by type of clinical recruitment site are summarized in Table 4. “Caregiver unwilling or unable to participate” was the most commonly cited reason, scored by 35 of 160 caregivers (22%). There were no significant differences in the frequency of reasons by site type, although there was a trend toward subjects recruited by commercial sites discontinuing because of concern that the treatment medication was not effective (p = 0.07) and a trend toward subjects recruited by non-commercial sites discontinuing because of travel requirements (p = 0.06).
We found that subjects in the donepezil arm had a 57% higher risk of study discontinuation compared to subjects in the placebo arm after controlling for covariates (Table 2, HR 1.57, p = 0.006). This association was described in the primary report of trial findings and is attributable to the known side-effect profile of cholinesterase inhibitor treatment9.
We also found that subjects with more depression symptoms reported on the HAM-D were more likely to discontinue the trial. While we did not perform a formal competing risk analysis, when we refit the final model with censoring of those subjects who discontinued due to death, the HAM-D association with study termination was attenuated; this suggests that the association was mainly due to depressive symptoms predicting death during the trial. In fact, subjects with a HAM-D greater than or equal to six were more likely to die during the trial (5.6% died versus 1.6% among subjects with a HAM-D less than six, Fisher exact test p-value = 0.016).
We also found that subjects with less educational attainment and non-white subjects had a higher rate of study discontinuation. Although minority race or ethnicity is a commonly perceived barrier to recruitment and retention6,7, there are little published data on clinical trial retention rates as a function of race or ethnicity, and we know of no published data on this issue within the elderly.
We found that commercial recruitment sites had twice the study discontinuation rate of non-commercial sites. All sites attended the same protocol training meetings and followed the same protocol. However, the commercial sites differ from the non-commercial sites in a number of ways. There are some indications that the commercial sites recruited from a different pool of study subjects. While there was no difference between commercial and non-commercial sites in terms of the distribution of age, gender, and ethnicity, the subjects recruited by commercial sites had slightly less education (mean 14.1 years vs 14.8 years, Wilcoxon signed rank test p-value = 0.003). Lower education did predict a higher dropout rate in our sample (Table 1). However, in the multivariate analysis with both education and site in the model the HR for commercial sites was unchanged and remained highly statistically significant (Table 2), indicating that the site effect was not mediated by education in any appreciable way. We also observed that, among subjects who withdrew consent, caregivers from commercial sites were more likely to endorse as a reason for study discontinuation “Concern about effectiveness of medications” (26.0% versus 14.5% for non-commercial sites). It is possible that recruitment methods at commercial sites increased the proportion of subjects seeking active treatment via enrollment in clinical trials as opposed to subjects participating for more altruistic motivations. Subjects who seek active treatment via enrollment in clinical trials may be more likely to discontinue early if they perceive they are randomized to an arm with inactive or ineffective treatment (as reviewed in Vozdolska et al.23). Subjects from commercial sites were also more likely to die during the trial or to be discontinued because of other serious adverse events (Table 4). This does not entirely explain the higher loss rate in commercial sites, however.
When the multivariate survival analysis was repeated with censoring of the deaths and adverse event-related discontinuations, the HR for commercial sites remained statistically significant (p = 0.0002) and was attenuated only modestly, from HR = 2.21 to HR = 2.09.
Beyond study subject characteristics, we also suspect that staff members at ADCs and university-affiliated sites are more familiar with the scientific research process and more sensitive to the importance of complete data for the validity of a trial. Staff members at ADCs and university-affiliated sites often have developed skills and standards for maintaining study subject continuity in the course of their own research. These skills and standards may transfer to their performance of recruitment and retention for multicenter clinical trials. Furthermore, academic sites are more likely to recruit subjects from their own clinic populations, while commercial sites tend to rely more heavily on advertising; longstanding relationships among staff and subjects at academic sites may improve retention at these sites.
Early study discontinuation at commercial sites did reduce the statistical power of the ADCS MCI trial. Average person-years of observation for subjects recruited by commercial sites was 1.75 person-years per subject recruited, compared to 2.01 person-years per subject recruited by ADC sites and 1.95 person-years per subject recruited by non-ADC university-affiliated sites. Compounding the high study discontinuation rate was a low incident dementia rate among subjects recruited by commercial sites (Table 3). There were only 10.4 incident dementia cases per 100 person-years of observation at the commercial sites compared to 16.6 and 14.4 incident dementia cases per 100 person-years of observation at the ADC and university-affiliated sites, respectively. We speculate that this can be attributed to a lack of specificity of diagnosis for amnestic MCI by commercial sites when screening subjects for inclusion in the ADCS MCI trial (the inclusion criteria specified that non-amnestic subtypes of MCI should not be enrolled). Clinicians at commercial sites may have been less sensitive than their ADC and university-affiliated research counterparts to those inclusion criteria intended to narrow recruitment to the amnestic subtype of MCI that is most consistent with prodromal AD.
Cumulatively, high study discontinuation rates and low incident dementia rates such as experienced by commercial sites in the ADCS MCI trial can substantially compromise statistical power. One practical measure of the effect of incidence rates and discontinuation rates on power is in the calculation of sample size requirements when planning future trials. The statistical power of a survival analysis is determined by the expected number of incident events observed24. During this three-year trial, the commercial sites observed 0.1818 dementia events per study subject recruited, while the ADC sites observed 0.3333 dementia events per study subject recruited. Based on the number of events observed per subject recruited, a clinical trial using commercial sites with the discontinuation rates and incident dementia rates experienced in the ADCS MCI trial would require 1.8 times as many subjects as a comparably powered clinical trial using ADC clinical sites.
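The sample size comparison follows directly from the events-per-subject figures. The short Python sketch below reproduces the arithmetic, assuming (as the text does) that power depends only on the expected number of events, so the required number of subjects scales inversely with events observed per subject recruited. Variable names are illustrative.

```python
# Dementia events observed per subject recruited (from the text).
events_per_subject = {"commercial": 0.1818, "ADC": 0.3333}

# Person-years of observation per subject recruited (from the text).
person_years_per_subject = {"commercial": 1.75, "ADC": 2.01}

# To observe the same number of events, enrollment must scale by the inverse ratio.
sample_size_ratio = events_per_subject["ADC"] / events_per_subject["commercial"]
extra_subjects_pct = 100 * (sample_size_ratio - 1)

# Incidence per 100 person-years = events per subject / person-years per subject.
incidence_per_100py = {
    site: 100 * events_per_subject[site] / person_years_per_subject[site]
    for site in events_per_subject
}

print(round(sample_size_ratio, 1))                               # 1.8
print(round(extra_subjects_pct))                                 # 83
print({s: round(r, 1) for s, r in incidence_per_100py.items()})  # 10.4 and 16.6
```

The unrounded ratio is about 1.83, which the text reports as 1.8 times as many subjects, or roughly 80% more. The same inputs also recover the reported incidence rates of 10.4 and 16.6 cases per 100 person-years.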
Subject dropout has been problematic for many MCI trials performed to date (reviewed in Jelic et al.3). Missing data due to study subject dropout compromises the power and validity of statistical analyses of trial results. We identified several demographic parameters that predict a higher dropout rate, including sex, marital status, education, and race/ethnicity. Retention efforts targeting the identified groups may improve the representativeness and power of future trials. We also found that commercial recruitment sites have a higher dropout rate and lower incident dementia rate than ADCs and university-affiliated sites. Overall, the ADCS MCI trial had a relatively low dropout rate compared to most MCI trials performed to date3, presumably because the ADCS MCI trial was less reliant on commercial sites for recruitment. We conclude that utilization of academic sites could substantively decrease dropout and improve the statistical power and validity of future clinical trials of cognitively impaired elderly.
Supported by the National Institute on Aging grants to the Alzheimer’s Disease Cooperative Study (U01 AG010483) and the University of California San Diego Alzheimer’s Disease Research Center (P50 AG005131).