Attrition weight estimation
To account for potentially informative attrition in our analyses, we estimated weights to apply to each observation in models of smoking and cognitive decline. For each wave of visits contributing to our analysis, the weights were based on the inverse of the wave-specific probability of being observed at that wave, and thus of being alive and uncensored at that wave. The intuition behind these weights is that respondents with characteristics similar to the observations missing due to attrition are up-weighted in the analyses of smoking and cognitive decline, so as to represent their original contribution as well as their missing contributions. Because determinants of death may differ from determinants of study drop-out for other reasons, we separately modeled attrition due to death and attrition by other causes.
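The up-weighting intuition can be illustrated with a toy calculation. The sketch below uses entirely hypothetical numbers (two groups with different scores and retention probabilities, not study data) to show how weighting each retained observation by the inverse of its probability of being observed recovers the full-cohort mean that a naive average of retained participants misses.

```python
# Toy illustration of inverse-probability weighting for attrition.
# All values are hypothetical assumptions chosen for easy arithmetic.

def mean(values):
    return sum(values) / len(values)

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Group A: 4 enrolled, score 2.0, only 2 retained (retention probability 0.5).
# Group B: 4 enrolled, score 6.0, all 4 retained (retention probability 1.0).
full_scores = [2.0] * 4 + [6.0] * 4

# (score, probability of being observed) for the retained participants only.
observed = [(2.0, 0.5)] * 2 + [(6.0, 1.0)] * 4

obs_scores = [score for score, _ in observed]
weights = [1.0 / p for _, p in observed]  # group A gets weight 2, group B weight 1

full_mean = mean(full_scores)                   # mean had no one been lost
naive_mean = mean(obs_scores)                   # biased toward group B
ipw_mean = weighted_mean(obs_scores, weights)   # each retained A member also stands in for a lost one

print(full_mean, naive_mean, ipw_mean)
```

In this constructed example the weighted mean equals the full-cohort mean only because the retention probabilities are known exactly; in practice they are estimated from models.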
For each of the two sources of attrition, we first developed separate models of not being censored over the course of follow-up [31].
For each planned assessment, let $C_{ikr}$ indicate whether person $i$ is no longer in the study by wave $k$ for reason $r$, where $r$ is either death ($r = 1$) or loss to follow-up ($r = 2$). Each weight represents the reciprocal of individual $i$’s probability of remaining both alive and in the study at wave $k$. We classified a death as occurring at wave $k$ if the participant died between waves $k-1$ and $k$, so that for such an individual, $C_{ik1} = 1$.
For each wave of follow-up, we modeled and estimated via pooled logistic regression [30] the probability of being alive in that wave, conditional on remaining alive and uncensored in the previous wave. We separately modeled the probability that such a living and previously uncensored participant remained uncensored. To specify the models, we defined a set of variables $L$, some of which varied over time, that we thought likely to influence death or censoring and also to affect cognitive function: age, race (African American versus white), sex (male versus female), education (0–8 years, 9–12 years [referent], 13–16 years, 17–30 years), alcohol consumption at the previous visit (none [referent], up to 1 drink/day, >1 drink/day), social network score at the previous visit, cognitive activity at the previous visit, disability score at the previous visit, self-rated health at the previous visit (per unit worsening in rating), chronic cardiovascular conditions, diabetes, global cognitive score at the previous visit, and smoking status (current versus never). We estimated models that included as predictors the baseline time-constant covariates in $L$, smoking status ($X_i$), and the most recent prior values of the time-varying covariates ($L_{i(k-1)}$), including past measurements of cognitive function. We explored weighting models that included additional variables representing the history of the time-varying covariates (e.g., $\bar{L}_{i(k-1)} = (L_{i0}, \ldots, L_{i(k-1)})$), but these covariates did not predict censoring or death independently of $L_{i(k-1)}$, and so were dropped from the models. Together, these models were used to calculate the cumulative probability of surviving up to a given follow-up wave and of participating in the assessment at that wave. Weights were applied at the level of observations within individuals, such that for each person-wave contribution to our analysis at wave $j$, the weight was the inverse of the probability of the conjunction of these two events. These weights can be obtained by a simple product formula (Equation 1): the reciprocal of the product, over waves 1 through $j$, of the two wave-specific conditional probabilities described above.
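Written in the notation defined above, the product formula plausibly takes the following form. This is a reconstruction from the surrounding definitions, not a verbatim copy: the symbol $W_{ij}$ and the exact conditioning sets are assumptions, and $L_{i(k-1)}$ is taken to include the baseline time-constant covariates.

```latex
W_{ij} \;=\; \prod_{k=1}^{j}
\frac{1}{
  \Pr\!\left(C_{ik1} = 0 \,\middle|\, C_{i(k-1)1} = 0,\; C_{i(k-1)2} = 0,\; X_i,\; L_{i(k-1)}\right)
  \times
  \Pr\!\left(C_{ik2} = 0 \,\middle|\, C_{ik1} = 0,\; C_{i(k-1)2} = 0,\; X_i,\; L_{i(k-1)}\right)
}
\tag{1}
```

The first factor is the wave-specific probability of remaining alive; the second is the probability that a living, previously uncensored participant remains uncensored.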
Implicit in the models we estimated is the Markov assumption that an individual’s probability of contributing to the analysis at wave $k$, and thus of being alive and uncensored at wave $k$, depends on his or her history of time-varying covariates $\bar{L}_{i(k-1)}$ only through its most recent value $L_{i(k-1)}$. Such an assumption may be relaxed by incorporating additional lagged covariate values, or a user-specified function of such values (e.g., $\mathrm{cum}(\bar{L}_{i(k-1)}) = L_{i0} + \ldots + L_{i(k-1)}$), as potential predictors in the weight models. To optimize the fit of our attrition models, we explored several functional forms of time, including time as a continuous variable and as a set of cycle indicators. We also evaluated several potentially important cross-products, including cognitive score with smoking, and time with cognitive score, smoking, and age. We used the same set of covariates in the death and drop-out models, selecting as the final covariate set (shown in the accompanying table) the variables that had modest-to-strong associations with attrition and minimal missing data.
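As a minimal sketch of what relaxing the Markov assumption involves in practice, the snippet below builds, for one person, the most recent prior value of a time-varying covariate and the cumulative sum of its history through the previous wave. The function name, field names, and example scores are hypothetical.

```python
def lagged_history_features(values):
    """Build history-based predictors for waves k = 1..K.

    `values` holds one person's covariate at waves 0..K (L_0, ..., L_K).
    For each wave k, return the most recent prior value L_{k-1} and the
    cumulative sum L_0 + ... + L_{k-1}.
    """
    rows = []
    running_total = 0.0
    for k in range(1, len(values)):
        running_total += values[k - 1]
        rows.append({"wave": k, "lag1": values[k - 1], "cum_prior": running_total})
    return rows

# Hypothetical disability scores at waves 0..3 for a single participant.
disability = [0, 1, 1, 3]
features = lagged_history_features(disability)
for row in features:
    print(row)
```

Under the Markov assumption only `lag1` would enter the weight models; a summary such as `cum_prior` is one way to let earlier history contribute.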
Baseline characteristics of the study population, and adjusted hazard ratio (HR) and (95% confidence interval) of attrition over five study cycles, estimated from models of continuation.
We present model-based 95% confidence intervals (CIs) for the hazard ratios (HRs) relating each covariate to censoring, under the assumption that the pooled logistic regressions correctly model the hazard of continuation in the study given the entire history of covariates [39].
We used the Bayesian information criterion as an indicator of global goodness of fit. To describe each model’s ability to discriminate those who were censored from those who were not, we computed the discordance percentage and the c-statistic. We used the Hosmer-Lemeshow test to describe each model’s calibration across a range of observed risks [40–41].
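For readers unfamiliar with these discrimination measures, the following sketch computes the c-statistic and the percentage of discordant pairs from predicted probabilities and observed outcomes. It is a generic pairwise definition with ties counted as one half, offered as an illustration rather than the software routine used in the analysis; the data are hypothetical.

```python
def concordance(predicted, outcomes):
    """Return (c_statistic, percent_discordant).

    All pairs of one event (outcome 1) and one non-event (outcome 0) are
    compared. A pair is concordant when the event has the higher predicted
    probability, discordant when it has the lower, and tied otherwise.
    """
    events = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted, outcomes) if y == 0]
    n_pairs = len(events) * len(non_events)
    if n_pairs == 0:
        raise ValueError("need at least one event and one non-event")

    concordant = discordant = tied = 0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1
            elif p_event < p_non_event:
                discordant += 1
            else:
                tied += 1

    c_statistic = (concordant + 0.5 * tied) / n_pairs
    percent_discordant = 100.0 * discordant / n_pairs
    return c_statistic, percent_discordant

# Hypothetical predicted probabilities of censoring and observed censoring status.
c_stat, pct_discordant = concordance([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
print(c_stat, pct_discordant)
```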
From the combination of the two cause-specific models, we computed inverse-probability-of-attrition (IPA) weights according to Equation 1. These are also called non-stabilized weights: as the reciprocal of a probability, each is at least 1 for a contributing observation and may be very large for a person with a small probability of staying alive and uncensored. As a potential remedy, we also computed wave-specific stabilized IPA weights by multiplying the individual’s non-stabilized weight at that wave by the conditional probability of remaining alive and uncensored up to that wave given a subset of baseline covariates $V_i$ (a subset of $L_{i0}$) and smoking status. Because the stabilized weight is a ratio of two probabilities, we generally expect stabilization to reduce the undue influence of highly variable non-stabilized weights, and therefore to yield narrower confidence intervals than analyses using non-stabilized weights. Under our assumptions, both non-stabilized and stabilized weights give unbiased effect estimates, provided $V_i$ is entered into the regression model relating smoking to cognitive function over time; effect estimates conditional on $V_i$ are therefore reported in both analyses [42].
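Applying stabilized weights does not adjust for the covariates $V_i$ that were used to estimate the numerator of the weights. It is instead necessary to include $V_i$ as regression covariates in the primary analytic model. The stabilized weight for an individual’s contribution to wave $j$ is thus the individual’s non-stabilized weight at wave $j$ multiplied by the cumulative probability, given $V_i$ and smoking status, of remaining alive and uncensored through wave $j$.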
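In symbols, a form consistent with this description is sketched below. The symbol $SW_{ij}$ and the exact conditioning sets are assumptions made for illustration; the numerator conditions only on smoking status $X_i$ and the baseline subset $V_i$, whereas the denominator conditions on $X_i$ and the most recent covariate values $L_{i(k-1)}$.

```latex
SW_{ij} \;=\; \prod_{k=1}^{j}
\frac{
  \Pr\!\left(C_{ik1} = 0 \,\middle|\, C_{i(k-1)1} = 0,\; C_{i(k-1)2} = 0,\; X_i,\; V_i\right)
  \times
  \Pr\!\left(C_{ik2} = 0 \,\middle|\, C_{ik1} = 0,\; C_{i(k-1)2} = 0,\; X_i,\; V_i\right)
}{
  \Pr\!\left(C_{ik1} = 0 \,\middle|\, C_{i(k-1)1} = 0,\; C_{i(k-1)2} = 0,\; X_i,\; L_{i(k-1)}\right)
  \times
  \Pr\!\left(C_{ik2} = 0 \,\middle|\, C_{ik1} = 0,\; C_{i(k-1)2} = 0,\; X_i,\; L_{i(k-1)}\right)
}
```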
As with the denominators, we estimated the numerators via pooled logistic regression, with $V$ consisting of baseline age, sex, race, education, baseline alcohol consumption, and baseline smoking status.
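Once the fitted models supply wave-specific conditional probabilities, both sets of weights follow from cumulative products. The sketch below assumes those probabilities are already available for one participant; the function names and the example probabilities are hypothetical, and the model-fitting step itself is not shown.

```python
def cumulative_retention(p_alive, p_uncensored):
    """Cumulative probability of being alive and uncensored through each wave.

    `p_alive[k]` and `p_uncensored[k]` are the wave-specific conditional
    probabilities from the death model and the drop-out model, respectively.
    """
    cumulative, running = [], 1.0
    for pa, pu in zip(p_alive, p_uncensored):
        running *= pa * pu
        cumulative.append(running)
    return cumulative

def ipa_weights(den_alive, den_uncensored, num_alive, num_uncensored):
    """Return (non_stabilized, stabilized) weights, one entry per wave."""
    den = cumulative_retention(den_alive, den_uncensored)
    num = cumulative_retention(num_alive, num_uncensored)
    non_stabilized = [1.0 / d for d in den]
    stabilized = [n / d for n, d in zip(num, den)]
    return non_stabilized, stabilized

# Hypothetical probabilities for one participant over two follow-up waves.
non_stab, stab = ipa_weights(
    den_alive=[0.8, 0.5], den_uncensored=[0.5, 1.0],   # full covariate models
    num_alive=[1.0, 0.5], num_uncensored=[0.5, 1.0],   # baseline-only models
)
print(non_stab, stab)
```

In this constructed example the non-stabilized weight doubles between waves while the stabilized weight stays constant, which is the kind of variance reduction stabilization is meant to deliver.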
Several assumptions underlie the IPA weight estimation. First, we assume that the attrition process is ignorable: the conditional probability of remaining alive and in the study in the next wave, given that one has survived and remained uncensored up to the current wave, does not further depend on one’s future cognitive function, given past observed covariates and cognitive measurements [43].
In addition, throughout we make the standard positivity assumption [43] that, for any given wave of the study and any possible realization of the covariates, smoking status, and past cognitive function up to the current wave, there is a positive probability that an individual with that observed history remains alive and in the study in the next wave, given that he or she is alive and uncensored in the current wave.
It is important to note that if attrition had been jointly independent of the time-varying correlates of cognitive function, a standard unweighted generalized estimating equations (GEE) analysis would have produced valid statistical inferences about the effects of smoking on cognitive function. Under the above assumptions of ignorability and positivity of the attrition process, given the observed time-varying correlates of cognitive function, our analytic approach corrects for selection bias due to attrition: it recovers the effect of smoking on cognitive function (possibly conditional on a subset of baseline variables) that one would have obtained from a standard GEE analysis had attrition been jointly independent of all time-varying predictors of cognitive function (possibly conditional on a subset of variables).
Analyses of smoking and cognitive decline
We evaluated the association between current smoking at baseline and cognitive decline using unweighted and IPA-weighted generalized estimating equations (GEE) regression models [39] with an exchangeable working correlation matrix, in which we estimated the difference between current and never smokers in rates of decline in global cognitive score. In all models, we regressed the global score on the set of predictors $V_i$ by including main effect terms for age, sex, race, education (4 categories, described previously), baseline alcohol intake (3 categories, described previously), smoking status, time (years, continuous), and the cross-products of each covariate with time. These analyses included data from all eligible person-wave contributions from participants who had a baseline cognitive score.
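One way to write the mean model implied by this description is sketched below. The symbols $Y_{ij}$ (global cognitive score for person $i$ at wave $j$), $t_{ij}$ (years since baseline), and the coefficient names are introduced here for illustration only, and $V_i$ is taken to denote the baseline covariates other than smoking status $X_i$.

```latex
\mathrm{E}\!\left[Y_{ij} \mid X_i, V_i\right]
  \;=\; \beta_0 + \beta_1 t_{ij} + \beta_2 X_i + \beta_3 X_i t_{ij}
        + \gamma^{\top} V_i + \delta^{\top} V_i \, t_{ij}
```

In this parameterization, $\beta_1$ is the annual rate of change among never smokers with all other covariates at their referent levels, and $\beta_3$ is the difference in annual rate of change between current and never smokers.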
For comparison, we fitted unweighted models as well as models that weighted observations using each of the two sets of IPA weights (non-stabilized and stabilized). Our primary hypothesis on the relation of smoking to cognitive decline was assessed with the cross-product between smoking and time, that is, the estimated difference between current and never smokers in their rates of cognitive decline. To make the estimates easier to interpret, we multiplied all estimated annual changes and differences in annual change by 10, obtaining estimates of change and differences in change over 10 years. To place these effect estimates in context, we compared them with the average rate of cognitive decline among never smokers, represented by the main effect term for time with all other covariates at their referent levels. Supposing that the rate of cognitive decline among never smokers represents “smoking-free cognitive aging,” we then estimated “excess years of cognitive aging” (over a 10-year interval) among current smokers by dividing the difference in 10-year change by the annual rate of change among never smokers.
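The arithmetic behind "excess years of cognitive aging" can be shown with a short worked example. The coefficient values below are hypothetical placeholders, not estimates from the study, and the function name is our own.

```python
def excess_years_of_aging(smoking_time_coef, time_coef, years=10):
    """Excess years of cognitive aging among current smokers over `years`.

    `smoking_time_coef` is the smoking-by-time cross-product (difference in
    annual change, current versus never smokers); `time_coef` is the main
    effect of time (annual change among never smokers).
    """
    if time_coef == 0:
        raise ValueError("annual rate of change among never smokers is zero")
    difference_over_interval = years * smoking_time_coef
    return difference_over_interval / time_coef

# Hypothetical coefficients, in cognitive-score units per year.
annual_change_never_smokers = -0.04
annual_difference_smokers = -0.01

excess = excess_years_of_aging(annual_difference_smokers, annual_change_never_smokers)
print(round(excess, 2))
```

With these placeholder values, the additional 10-year decline among current smokers (0.10 units) corresponds to about 2.5 years of decline at the never-smoker rate.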