Article sections

- Abstract
- 1. Introduction
- 2. Inflation factor for PHM versus MLM assuming underlying Wiener process with drift
- 3. Simulations
- 4. Example: Mild Cognitive Impairment and time-to-progression
- 5. Discussion
- References

Contemp Clin Trials. Author manuscript; available in PMC 2012 September 1.

Published in final edited form as:

Published online 2011 April 30. doi: 10.1016/j.cct.2011.04.007

PMCID: PMC3148349

NIHMSID: NIHMS301523

M. C. Donohue,^{*,}^{a} A. C. Gamst,^{b} R. G. Thomas,^{b} R. Xu,^{c} L. Beckett,^{d} R. C. Petersen,^{e} M. W. Weiner,^{f} P. Aisen,^{b} and the Alzheimer's Disease Neuroimaging Initiative^{g}

The publisher's final edited version of this article is available at Contemp Clin Trials

Randomized, placebo-controlled trials often use time-to-event as the primary endpoint, even when a continuous measure of disease severity is available. We compare the power to detect a treatment effect using either rate of change, as estimated by linear models of longitudinal continuous data, or time-to-event estimated by Cox proportional hazards models. We propose an analytic inflation factor for comparing the two types of analyses assuming that the time-to-event can be expressed as a time-to-threshold of the continuous measure. We conduct simulations based on a publicly available Alzheimer's disease data set in which the time-to-event is algorithmically defined based on a battery of assessments. A Cox proportional hazards model of the time-to-event endpoint is compared to a linear model of a single assessment from the battery. The simulations also explore the impact of baseline covariates in either analysis.

We explore the relative efficiency of linear models of repeated measures of a continuous outcome and the Cox proportional hazards model (PHM) [1] of time-to-threshold of a continuous outcome in randomized placebo-controlled studies. This comparison has practical implications for clinical trial design for Alzheimer's disease (AD) and human immunodeficiency virus (HIV), among other diseases. For instance, in the study of AD in pre-dementia elderly with mild cognitive impairment (MCI), clinical trials have historically used either the rate of progression to dementia over a period of time (typically three to 24 months) [2] or PHM of time-to-progression [3, 4, 5] as the primary analysis. There is no algorithmic definition of dementia, however, and in clinical trials a panel of experts is convened to review case reports to determine a consensus diagnosis at each visit (usually every six months). The time-to-progression endpoint has been preferred for its tangible clinical importance, as well as its acceptability to regulatory authorities. Though the dementia endpoint has face validity, it can be difficult to implement, subjective, variable from visit to visit, and analytically problematic due to non-proportional hazards [3] and interval censoring. We posit that a linear model of a continuous assessment of disease severity, for example the Alzheimer's Disease Assessment Scale, may be more efficient than a subjective dichotomization (“not demented” versus “demented”). To this end, we quantify the relative efficiency of a linear model analysis of rate of change of a repeated continuous measure and a PHM analysis of time-to-threshold. “Time-to-threshold” is also known as “time-to-event” in survival analysis literature, but we use the former to emphasize that we will model the event of interest as an observed continuous measure exceeding a predetermined threshold.

The issue is not new or unique to AD research. McKay et al. [6] analyzed continuous, categorical, and time-to-event cocaine use outcomes and found continuous outcomes to express the greatest effect sizes. A meta-analysis of the orthopedic surgery randomized trial literature found those trials with continuous outcomes had greater power on average than those with a dichotomous outcome, an outcome analytically equivalent to time-to-event [7], and a greater proportion of the continuous outcomes trials attained acceptable power (>80%) [8]. Similar observations were made in the fields of rheumatoid arthritis [9] and stroke [10]. Reliable continuous biomarker surrogates have accelerated the study of HIV [11], and are still actively being sought, for example, for prostate cancer [12, 13] and AD [14].

Lee and Whitmore [15] provide an extensive review of threshold regression or first-hitting-time models which are used to analyze the relationship between covariates and the time at which an observed or latent stochastic process first crosses a boundary. Though we will be exploiting aspects of this literature, we are not proposing a threshold regression method. Rather, we consider cases in which the threshold might be considered an arbitrary dichotomization of an observable continuous process. Such dichotomizations may facilitate interpretation, but it is our primary goal to elucidate whether this ease of interpretation comes at the cost of analytic efficiency.

Section 2 introduces an inflation factor for quantifying relative efficiency, in terms of the required sample size, for the true marginal linear model (MLM) [16] and the PHM in general terms when we can assume an underlying Wiener process with drift. Section 3 provides simulation studies to demonstrate the utility of the inflation factor, and other comparisons for which the inflation factor does not directly apply because the underlying process is not a Wiener process or is not linear. In the simulations we apply linear mixed models (LMM) [17] that are commonly used in practice. In Section 4 we present an example of an event, onset of dementia, defined by multiple continuous outcomes based on publicly available data from a large MCI cohort.

Assume that clinical disease progression for an individual *i* follows an underlying Wiener process with drift,

$$Y_i(t) = \theta t + \sigma W_{it} \qquad (1)$$

where *i* = 1, . . . , *n*, *t* > 0, *θ* is a treatment-specific modifying effect on the rate of decline, and *W*_{it} is a standard Wiener process. The advantage of model (1) is that it allows a closed-form expression for the distribution of the time-to-threshold, as seen below. Model (1), which was considered in [18], also allows for the variance of

(2)

where *T _{i}* is defined above,

(3)

with independently and identically distributed subject-specific vectors of residual errors, (*ε*_{i1} . . . *ε _{im}*) ~

(4)

Assuming equal group sizes, the required total *number of events* for a two-tailed Cox proportional hazards score test with specified power 1 – *β*, Type I error *α*, and log hazard ratio *θ*_{HR} can be estimated for the PHM design using Schoenfeld's [19] formula:

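$$E_{\mathrm{PH}} = \frac{4\,(z_{1-\alpha/2} + z_{1-\beta})^{2}}{\theta_{\mathrm{HR}}^{2}} \qquad (5)$$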

where *p* = ϕ(*z*_{p}) and ϕ is the standard normal cumulative distribution function. Similarly, we can use the formula from [20] for the total sample size under the MLM

(6)

where

(7)
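As a rough numerical companion to Schoenfeld's formula (5), the following Python sketch computes the required total number of events for a two-sided test with equal group sizes. The function name `schoenfeld_events` is hypothetical, and the standard form of the formula, 4(z_{1−α/2} + z_{1−β})² / θ_HR², is assumed here rather than taken from the authors' code.

```python
from statistics import NormalDist


def schoenfeld_events(log_hr: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Total events required under Schoenfeld's formula (equal allocation).

    log_hr : log hazard ratio (theta_HR), must be non-zero
    alpha  : two-sided Type I error
    power  : desired power, 1 - beta
    """
    if log_hr == 0:
        raise ValueError("log hazard ratio must be non-zero")
    z = NormalDist().inv_cdf  # standard normal quantile function
    return 4.0 * (z(1.0 - alpha / 2.0) + z(power)) ** 2 / log_hr ** 2
```

Under these assumptions, log hazard ratios of 0.371 and 0.483 correspond to roughly 228 and 135 events, respectively, at a two-sided 5% level and 80% power.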

In order to relate the two sample sizes, we need to represent the effect under one model as an effect under the other. The time-to-threshold assumption, i.e. *T _{i}* = min

(8)

Let the random variable *T _{A}* represent the time-to-threshold for an individual randomized to group

(9)

If we let *r* denote the overall event rate, so that *n*_{PH} = *E*_{PH}/*r*, and substitute the above expression for *θ*_{HR} into (5), we have the inflation factor

(10)

Note that the mean event rate *r*, assuming no censoring up to the maximum follow-up time *τ*, can be expressed in terms of *F* as *r* = (*F _{A}*(

The plots in Figure 2 demonstrate that *Ψ* is not always greater than 1, as it would be if MLM uniformly dominated PHM in efficiency. However, the only cases we found in which the inflation factor favored PHM were impractical scenarios in which the required sample size approached zero due to a large effect size (or small variance).
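The event rate *r* used above depends on the time-to-threshold distribution *F*. For a Wiener process with positive drift, the first time the process reaches a fixed threshold follows an inverse Gaussian distribution [21]. The sketch below evaluates that distribution function and an overall event rate under stated assumptions: the process starts at 0, the threshold *c* is positive, both groups share the same *σ*, the two groups are of equal size, and there is no censoring before the maximum follow-up time *τ*. All function names are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal CDF


def first_passage_cdf(t: float, theta: float, sigma: float, c: float) -> float:
    """P(T <= t), where T is the first time theta*s + sigma*W(s) reaches c.

    Assumes the process starts at 0 and c > 0. Very large values of
    theta*c/sigma**2 can overflow exp(); this sketch does not guard
    against that.
    """
    if t <= 0:
        return 0.0
    s = sigma * sqrt(t)
    return (_PHI((theta * t - c) / s)
            + exp(2.0 * theta * c / sigma ** 2) * _PHI(-(theta * t + c) / s))


def overall_event_rate(tau: float, theta_a: float, theta_b: float,
                       sigma: float, c: float) -> float:
    """Average of the two group-specific event probabilities by time tau."""
    return 0.5 * (first_passage_cdf(tau, theta_a, sigma, c)
                  + first_passage_cdf(tau, theta_b, sigma, c))


def subjects_from_events(events: float, event_rate: float) -> float:
    """Convert a required number of events into total subjects, n = E / r."""
    return events / event_rate
```

For example, `first_passage_cdf(10, 0.2, 0.5, 1.0)` gives the probability that a subject with drift 0.2 crosses a threshold of 1 within ten time units, which is about 0.88 under these assumptions.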

We generated data based on a Wiener process with drift as in (1). Group *A* (placebo) had slope parameter *θ _{A}* = 0.2 and Group

Using (6) and the known data generating parameters, we calculated the required sample size to be *n* = 87.2 under a MLM (*α* = 5%, power = 80%) with a correlation structure as in (3). Alternatively, given *σ* = 0.5 and applying a threshold of *c* = 1, the log hazard ratio *θ*_{HR} ranged from *θ*_{HR} = 0.371 at *t* = 1 to *θ*_{HR} = 0.483 at *t* = 10, which translates to a hazard ratio in the range exp(*θ*_{HR}) = 1.45 to 1.62. If we assume no loss to follow-up, resulting in an overall event rate of (*F _{A}*(10) +

To study the accuracy of these sample size estimates under reasonable departures from the presumed MLM, we simulated 1000 trials at each of the total sample sizes *n* = 90, *n* = 170, and *n* = 290, including *n* = 90 to examine whether the linear model attains simulated power of 80%. Rather than using the known Wiener process correlation structure, we used the common mixed effects model with random intercept and slope:

(11)

In contrast to the marginal model, *ε _{ij}* were assumed to be independently distributed

Table 1 demonstrates that the calculated power as described earlier (lower half of the table) is very consistent with the simulated results (upper half of the table). We also simulated data assuming no treatment (*θ _{A}* =

We repeated the previous simulation study, but generated data based on an autoregressive model, *Y _{i}*(0) ~

Next we simulated continuous longitudinal data according to a non-linear trajectory with random intercepts and slopes that flatten out once a threshold is met. More specifically, for an individual *i* in group *A* at time *t*_{j}:

where *t* = 1, . . . , 10, , , and *ε* ~ *N*(0, *σ*^{2}). We simulate two groups with *θ _{A}* = 0.2 and

To the longitudinal data we applied two misspecified linear models: (1) the random intercept and slope LMM as used in the previous examples, and (2) a random intercept and slope model with quadratic fixed effect for time allowing for a non-linear trajectory (LMM2). The parameter of interest from LMM is the group difference in slopes. The parameter of interest from LMM2 is the estimated group difference at *t* = 10. Finally, we used the known value of *c* as the threshold to define the events to be modeled via PHM. With *c* in the range 1/2 to 3, we found overall event rates in the range 87.5% (low threshold) to 8.9% (high threshold). We let *n* = 100 and 200.
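Defining events from a known threshold *c*, as described above, can be sketched as follows. This is a minimal illustration, assuming visits are sorted in time, the event is the first visit at which the observed value meets or exceeds *c*, and subjects who never cross are censored at their last visit. The function name `time_to_threshold` is hypothetical.

```python
from typing import Sequence, Tuple


def time_to_threshold(times: Sequence[float], values: Sequence[float],
                      c: float) -> Tuple[float, int]:
    """Return (time, event) for one subject's longitudinal record.

    event = 1 if some observed value is >= c (time is the first such visit),
    event = 0 if the threshold is never reached (time is the last visit).
    Assumes `times` is sorted in increasing order.
    """
    if len(times) != len(values) or len(times) == 0:
        raise ValueError("times and values must be non-empty and equal length")
    for t, y in zip(times, values):
        if y >= c:
            return t, 1
    return times[-1], 0
```

The resulting (time, event) pairs are the usual inputs to a proportional hazards fit. Note that values observed below the threshold are discarded by this step.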

The results are summarized in Table 3. We found there was a clear advantage to PHM when there was a low threshold and high event rate, but this reversed as the threshold increased and event rate decreased. We also see that the quadratic time model, LMM2, was consistently better than the standard LMM, especially when the threshold was low.

The Alzheimer's Disease Neuroimaging Initiative (ADNI), which began in 2004, is a collaborative project funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, the pharmaceutical and imaging industry, and several foundations (see *www.adni-info.org*). The study design and baseline characteristics are described in [23]. Briefly, the objective of ADNI is to study the rate of change of cognition, function, brain structure, and biomarkers in 200 elderly controls, 400 subjects with MCI, and 200 with Alzheimer's disease. For this analysis, publicly available data were downloaded from the ADNI web site *www.loni.ucla.edu/ADNI* on November 30, 2009. The data set contains repeated continuous measures of key assessments and progression events at 6-month intervals over 2 to 3 years, and is ideal for a more complex, clinically realistic simulation of our comparison of interest. Namely, we will simulate clinical trials to determine which experimental design can more efficiently detect a hypothesized intervention to slow cognitive and functional decline in a population with MCI.

In clinical practice and trials, the dementia endpoint is not algorithmically defined. It is a subjective transition based on the review of a battery of cognitive and functional assessments. Studies typically employ the consensus opinion of an expert panel. We took advantage of the rich ADNI data to develop a multivariate mixed-effects model for disease progression using multiple cognitive and functional measures, and to develop an algorithmic definition of progression for this process using the observed clinical diagnosis data. Our model of disease progression, richer than that hypothesized in Section 2, incorporated multiple measures: Alzheimer's Disease Assessment Scale, Cognitive Sub-scale, (ADAS-Cog; [24, 3]), Clinical Dementia Rating Sum of Boxes (CDR-SB, log transformed), and Functional Activities Questionnaire (FAQ) [25]. These measures were selected because they provide assessment of primary aspects of AD progression: cognitive performance, global clinical status and functional abilities. Each measure is converted to a z score to provide a common scale, and a multivariate mixed-effects model is fitted [26] to estimate mean rates of change, random variation in slopes and intercepts, and effects on slopes and intercepts of the presence of an apolipoprotein E4 (ApoE4) allele and baseline hippocampal volume. Specifically, the mixed effects model is of the form:

(12)

for individual *i*, at time *t*, outcome *k* = 1, 2, 3 (ADAS-Cog, CDR-SB, or FAQ), and covariate vectors

where Hippocampus* _{i}* and ApoE4

Because progression to dementia is subjective and not algorithmically defined, we derived a diagnostic algorithm for progression based on baseline and follow-up ADAS-Cog, CDR-SB, and FAQ z-scores. Using a repeated binary outcome Generalized Estimating Equation (GEE) logistic regression model [27], we regressed the observed progression outcomes on the z-scores:

(13)

Here *W _{ij}* = 1 if progression to AD is observed for individual

We simulated data based on the multivariate linear mixed model (12) to produce simultaneous cognitive and functional measures. All three simulated measures were then entered into our derived progression algorithm, the predictive model (13). We also added a treatment effect to model (12) resulting in a 25% or 50% reduction in the rate of decline. We then applied LMMs to the simulated continuous outcomes to derive an estimated treatment effect for ADAS-Cog and CDR-SB. Likewise, we applied the PHM to the simulated progression events to estimate the treatment effect on the time-to-progression. Note that the PHM utilized information from two assessments that are not available to the two univariate LMMs for longitudinal ADAS and CDR.

We also explored the efficiency of a pre-specified sample enrichment strategy in which the inclusion criteria require subjects to exhibit amyloid beta (A*β*) dysregulation at baseline. Such a strategy would be particularly appropriate for testing anti-amyloid interventions. Simulations were repeated using estimates from the ADNI MCI subgroup, which we denote MCI-A*β*, defined by a cerebrospinal fluid (CSF) A*β*_{1–42} cutpoint of 192 pg/mL, independently derived by [28]. We also used baseline FreeSurfer hippocampal volumes provided by University of California, San Francisco, and serial ADAS-Cog, CDR-SB, and FAQ assessed every six months for two years. The available sample size with complete data necessary for estimating the model parameters was *n* = 393 for MCI and *n* = 144 for MCI-A*β*.

Dropout was simulated by assuming exponentially distributed dropout times resulting in about 30% attrition over 2 years. This is a conservative estimate of dropout consistent with the 230/769 = 29.9% dropout rate observed in the 3-year donepezil and vitamin E trial [3]; and the 656/1457 = 45% dropout rate observed in the 4-year Rofecoxib trial [4].
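The dropout mechanism described above can be sketched in a few lines. For exponentially distributed dropout times with rate *λ*, the probability of dropping out by time *t* is 1 − exp(−*λt*), so 30% attrition over 2 years implies *λ* = −ln(0.7)/2 per year. The helper names below are hypothetical and the sketch is not the authors' simulation code.

```python
import math
import random
from typing import List, Optional


def exponential_rate_for_attrition(attrition: float, horizon: float) -> float:
    """Rate such that P(dropout time <= horizon) equals `attrition`."""
    if not 0.0 < attrition < 1.0 or horizon <= 0:
        raise ValueError("need 0 < attrition < 1 and horizon > 0")
    return -math.log(1.0 - attrition) / horizon


def simulate_dropout_times(n: int, rate: float,
                           seed: Optional[int] = None) -> List[float]:
    """Draw n independent exponential dropout times (same unit as 1/rate)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]
```

A simulated subject's observations would then be discarded after the drawn dropout time. Dropout that depends on the outcome itself would need a different mechanism.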

We simulated data from 1000 clinical trials over a range of sample sizes, analyzed them using LMM and PHM with and without adjustment for the presence of an ApoE4 allele and/or baseline hippocampal volume, and estimated statistical power by the proportion of trials that rejected the null hypothesis of no treatment effect (*p* < 0.05). The model fitting and simulation were done in the **R** statistical computing environment [29].
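Estimating power as the proportion of simulated trials that reject the null hypothesis can be illustrated with a deliberately simplified Python sketch. It does not reproduce the ADNI-based simulation: instead of fitting an LMM or PHM, it assumes a Wiener process with drift *θ* and scale *σ* starting at 0, so that each subject's slope estimate *Y*(*τ*)/*τ* is normal with mean *θ* and variance *σ*²/*τ*, and it compares group means with a known-variance z-test. All names and the parameter values in the usage note are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

_PHI = NormalDist().cdf  # standard normal CDF


def simulate_trial_pvalue(n_per_group: int, theta_a: float, theta_b: float,
                          sigma: float, tau: float, rng: random.Random) -> float:
    """Two-sided p-value for the difference in mean slopes in one trial."""
    sd = sigma / sqrt(tau)  # SD of a per-subject slope Y(tau)/tau
    slopes_a = [rng.gauss(theta_a, sd) for _ in range(n_per_group)]
    slopes_b = [rng.gauss(theta_b, sd) for _ in range(n_per_group)]
    se = sd * sqrt(2.0 / n_per_group)  # known-variance standard error
    z = (mean(slopes_a) - mean(slopes_b)) / se
    return 2.0 * (1.0 - _PHI(abs(z)))


def estimate_power(n_trials: int, n_per_group: int, theta_a: float,
                   theta_b: float, sigma: float, tau: float,
                   alpha: float = 0.05, seed: int = 0) -> float:
    """Proportion of simulated trials with p-value below alpha."""
    rng = random.Random(seed)
    rejections = sum(
        simulate_trial_pvalue(n_per_group, theta_a, theta_b, sigma, tau, rng) < alpha
        for _ in range(n_trials)
    )
    return rejections / n_trials
```

For instance, `estimate_power(2000, 40, 0.2, 0.1, 0.5, 10.0)` estimates power for a hypothetical slope difference of 0.1, and setting `theta_b` equal to `theta_a` gives an estimate of the Type I error rate instead.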

Figure 3 summarizes the results in terms of power per total sample size *n* from simulated trials in MCI populations (bottom two panels) and MCI-A*β* populations with amyloid dysregulation (top two panels). Results from a simulated 25% treatment effect are displayed on the left and results from a 40% treatment effect are displayed on the right. The LMM results (“○” and “”) are clearly separated from the PHM results (“+”), demonstrating consistently greater power across all sample sizes simulated. Including baseline hippocampal volumes or ApoE4 status (not shown) provides a small, but consistent, improvement in power that is more pronounced in the MCI population.

We found a quantifiable degradation of power with PHM compared to the alternative linear models in our scenarios, except when the underlying data were nonlinear and the event rate was high. The inflation factor (10) demonstrates that this degradation is a function of the event rate, *r*, and the log hazard ratio, which can be expressed as a function of the threshold, slope, and variance parameters from an assumed underlying Wiener process with drift. The simulations also showed that the MLM power calculations, assuming a known variance-covariance matrix, provided good estimates for the LMM. The autoregressive simulations demonstrated that power under the PHM was not monotone in the threshold or event rate. The MCI example showed that the degradation of power with PHM can have a meaningful impact on the efficiency and costs of clinical trials in a realistic setting, even when clinical diagnosis is based on more outcome data than a single quantitative outcome measurement. These costs and comparisons should be considered, along with face validity, when evaluating the choice of endpoint in clinical trials.

Beyond the loss of power to detect a treatment effect under PHM, the MLM and LMM are generally more appropriate, robust, and efficient in many settings, particularly in studies of AD in MCI populations. The standard PHM analysis is not appropriate for the interval censored data that arise in these clinical trial settings; and the linear models obviate any bias that might be introduced by violations of the proportional hazards assumption. PHM also does not account for multi-state transitions, which are common. There are, of course, other analysis techniques that can handle the above issues, and their efficiency relative to the LMM is a question for future study. Another issue left to future study is a direct assessment of the effects of missing data on the inflation factor *Ψ*, though the MCI simulation did attempt to replicate missingness observed in ADNI. Heuristically, the LMM and MLM make more efficient use of partially complete data, which should only amplify their relative efficiency over PHM given missing data. For the same reason, biases induced by informative missingness may be exacerbated by PHM relative to LMM. More specifically, the PHM uses no information regarding changes in performance that are below the threshold of the event of interest, whereas the LMM is informed by such changes. The fact that mixed models use all available data helps make them robust in the face of data missing at random [30]. Using all of the data also makes the mixed model less susceptible, relative to the Cox model, to bias induced by missing data mechanisms of all types. These open issues notwithstanding, the LMM and MLM are common, easily accessible, and robust alternatives to PHM; and the proposed inflation factor provides a means for making analytic efficiency comparisons with the PHM.

We are grateful to the reviewers and editor for their insightful suggestions. We are also grateful to the ADNI community, including the collaborators at all of the cores, sites, and laboratories; and to the ADNI volunteers and their families. Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Abbott, AstraZeneca AB, Bayer Schering Pharma AG, Bristol-Myers Squibb, Eisai Global Clinical Development, Elan Corporation, Genentech, GE Healthcare, GlaxoSmithKline, Innogenetics, Johnson and Johnson, Eli Lilly and Co., Medpace, Inc., Merck and Co., Inc., Novartis AG, Pfizer Inc, F. Hoffman-La Roche, Schering-Plough, Synarc, Inc., as well as non-profit partners the Alzheimer's Association and Alzheimer's Drug Discovery Foundation, with participation from the U.S. Food and Drug Administration. Private sector contributions to ADNI are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles. This research was also supported by NIH grants P30 AG010129, K01 AG030514, and the Dana Foundation.

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1. Regression models and life-tables. Journal of the Royal Statistical Society Series B (Methodological). 1972;34(2):187–220. URL http://www.jstor.org/stable/2985181.

2. Safety and efficacy of galantamine in subjects with mild cognitive impairment. Neurology. 2008;70(22):2024. ID: 231612481.

3. Vitamin E and donepezil for the treatment of mild cognitive impairment. The New England Journal of Medicine. 2005;352(23):2379–88. ID: 110248948.

4. A randomized, double-blind, study of rofecoxib in patients with mild cognitive impairment. Neuropsychopharmacology: official publication of the American College of Neuropsychopharmacology. 2005;30(6):1204–5. ID: 110328423.

5. Effect of rivastigmine on delay to diagnosis of Alzheimer's disease from mild cognitive impairment: the InDDEx study. Lancet Neurology. 2007;6(6):501–512. ID: 442451988.

6. Continuous, categorical, and time to event cocaine use outcome variables: degree of intercorrelation and sensitivity to treatment group differences. Drug and Alcohol Dependence. 2001;62(1):19. ID: 97362401.

7. Fitting Cox's regression model to survival data using GLIM. Journal of the Royal Statistical Society Series C (Applied Statistics). 1980;29(3):268–275. URL http://www.jstor.org/stable/2346901.

8. Effect of continuous versus dichotomous outcome variables on study power when sample sizes of orthopaedic randomized trials are small. Archives of Orthopaedic and Trauma Surgery. 2002;122(2):96. ID: 94143036.

9. Comparison of rheumatoid arthritis clinical trial outcome measures: A simulation study. Arthritis and Rheumatism. 2003;48(11):3031. ID: 97804303.

10. Use of ordinal outcomes in vascular prevention trials: Comparison with binary outcomes in published trials. Stroke. 2008;39(10):2817–2823. ID: 424315419.

11. Modeling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association. 1995;90(429):27–37. URL http://www.jstor.org/stable/2291126.

12. Validation of a longitudinally measured surrogate marker for a time-to-event endpoint. Journal of Applied Statistics. 2003;30(2):235–247. ID: 366785396.

13. Prostate-specific antigen (PSA) alone is not an appropriate surrogate marker of long-term therapeutic benefit in prostate cancer trials. European Journal of Cancer. 2006;42(10):1344–1350.

14. Serial PIB and MRI in normal, mild cognitive impairment and Alzheimer's disease: implications for sequence of pathological events in Alzheimer's disease. Brain. 2009;132(5):1355–1365. ID: 362475022.

15. Threshold regression for survival analysis: Modeling event times by a stochastic process reaching a boundary. Statistical Science. 2006;21(4):501–513. URL http://www.jstor.org/stable/27645791.

16. Marginalized multilevel models and likelihood inference. Statistical Science. 2000;15(1):1–26.

17. Random-effects models for longitudinal data. Biometrics. 1982;38(4):963–74. ID: 113328916.

18. Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. Journal of the American Statistical Association. 2001;96(455). ID: 482249484.

19. Sample-size formula for the proportional-hazards regression model. Biometrics. 1983;39(2):499–503. URL http://www.jstor.org/stable/2531021.

20. Analysis of Longitudinal Data. 2 ed. Oxford University Press; USA: 2002. ISBN 0198524846. URL http://www.worldcat.org/isbn/0198524846.

21. The inverse Gaussian distribution: theory, methodology, and applications. M. Dekker; New York: 1989. ISBN 0824779975 9780824779979. ID: 18163863.

22. Martingale-based residuals for survival models. Biometrika. 1990;77(1):147–160. doi:10.1093/biomet/77.1.147. URL http://dx.doi.org/10.1093/biomet/77.1.147.

23. Alzheimer's Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurology. 2010;74(3):201–9. ID: 568064771.

24. A new rating scale for Alzheimer's disease. The American Journal of Psychiatry. 1984;141(11):1356–64. ID: 114863527.

25. Measurement of functional activities in older adults in the community. Journal of Gerontology. 1982;37(3):323–9. ID: 115094771.

26. Multivariate longitudinal models for complex change processes. Statistics in Medicine. 2004;23(2):231–9. ID: 111663956.

27. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42(1):121–30. ID: 115936579.

28. Cerebrospinal fluid biomarker signature in Alzheimer's Disease Neuroimaging Initiative subjects. Annals of Neurology. 2009;65(4):403–413. ID: 327390388.

29. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2009. ISBN 3-900051-07-0. URL http://www.R-project.org.

30. Inference and missing data. Biometrika. 1976;63:581–592.
