Change in health status between two waves of data can be analysed either with or without adjustment for baseline health status. We assess the effect of socioeconomic position (SEP) on cognitive change using both these strategies and discuss the implications of the analyses.
Data come from 1261 men and 482 women of the Whitehall II cohort study, aged 50-55 years at wave 1. Cognition was assessed at both waves using a test of verbal memory and two tests of verbal fluency. Analysis of variance (ANOVA) was used to estimate the effect of SEP on the change score, and analysis of covariance (ANCOVA) to estimate this effect adjusted for the baseline cognitive score. The ANCOVA estimates were then corrected for bias due to measurement error (estimated from a 3-month test-retest). Finally, ANCOVA estimates were examined for increasing levels of measurement error.
The results of the ANOVA suggest no effect of SEP on cognitive decline. In contrast, the ANCOVA suggests significantly greater cognitive decline in the lower SEP groups. However, the ANCOVA estimates for the effect of wave 1 cognition show evidence for regression to the mean due to the presence of measurement error. The corrected ANCOVA estimates show no association between SEP and cognitive decline.
We recommend caution when using ANCOVA, or adjustment for baseline, in the analysis of change using two waves of observational data.
Socioeconomic position (SEP), assessed by education, occupation or income, has been shown to be associated with cognition.[1 2 3 4] There is also evidence to suggest that SEP is associated with cognitive decline, but most of this research is on the elderly.[3 5 6 7 8 9] Recent research suggests that midlife is a critical period for cognition, adding to previous work that shows some decline in cognition already in midlife. However, it remains unclear whether SEP influences cognitive decline in midlife.
It is widely accepted that change is best examined using longitudinal models with multiple repeated measures. However, two-wave data are more widely available than multiple-wave data, with the difference between the two measures used to define decline, or change. There are at least two widely used analytical strategies to examine change using data from two waves. Both approaches attempt to address the same causal question, namely the extent to which changes in the exposure result in changes in the outcome measure. The first approach consists of regressing the change score on the predictor. This is equivalent to analysing the change score using analysis of variance (ANOVA). When the predictor has only two levels, this is also equivalent to performing a t-test between the two levels of the predictor. The second strategy consists of regressing the wave 2 measure on the wave 1 measure and the predictor in an analysis of covariance (ANCOVA). Inconsistency in the results from these two approaches is often observed in observational studies and is known as Lord's paradox.[13 14] In fact, these two approaches estimate statistical parameters that correspond to the causal question mentioned above under slightly different sets of assumptions.[15 16] The first approach estimates the crude, or unconditional, effect of the predictor on the change score. The second approach provides a conditional effect of the predictor on change, the condition being that the wave 1 outcome is similar in all predictor groups. In practice, both ANOVA and ANCOVA analyses are often carried out using least squares regression, which provides a more flexible method of analysis and can easily incorporate further control and adjustment for other baseline covariates.
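To make the contrast between the two strategies concrete, the following is a minimal simulation sketch, not the authors' analysis. It assumes two hypothetical groups (`low_sep` equal to 0 or 1) whose true scores differ by 2 points at baseline, an identical true decline of 0.5 points in both groups, and independent measurement error at each wave. All numbers, names and the helper `ols` are illustrative assumptions.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical data-generating assumptions (not the Whitehall II data).
rng = np.random.default_rng(0)
n = 20000
low_sep = rng.integers(0, 2, size=n).astype(float)      # 0 = high SEP, 1 = low SEP
true_w1 = 10.0 - 2.0 * low_sep + rng.normal(0, 1.5, n)   # groups differ at baseline
true_w2 = true_w1 - 0.5                                  # same true decline in both groups
w1 = true_w1 + rng.normal(0, 1.5, n)                     # observed score = true score + error
w2 = true_w2 + rng.normal(0, 1.5, n)

ones = np.ones(n)

# Strategy 1: regress the change score on the predictor (ANOVA on change).
change_effect = ols(np.column_stack([ones, low_sep]), w2 - w1)[1]

# Strategy 2: regress wave 2 on the predictor and the observed wave 1 score (ANCOVA).
ancova = ols(np.column_stack([ones, low_sep, w1]), w2)
ancova_effect, baseline_slope = ancova[1], ancova[2]

print(f"change-score effect of low SEP: {change_effect:+.2f}")
print(f"ANCOVA effect of low SEP:       {ancova_effect:+.2f}")
print(f"ANCOVA slope on wave 1 score:   {baseline_slope:.2f}")
```

Under these assumptions the change-score estimate should be close to zero, while the ANCOVA estimate should suggest a markedly greater decline in the low group even though the true decline is identical, with a slope on the wave 1 score well below 1.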
In regression analysis the independent variable is assumed to be measured without error. Cognitive test scores are an imprecise measure of true cognitive function and are likely to be measured with some error. ANCOVA involves including the baseline cognitive score as an independent variable, contributing in some cases to bias in the ANCOVA estimates for the effect of SEP on cognition. More precisely, measurement error in the wave 1 cognition score results in an attenuation of its estimated effect on wave 2 cognition score.[17 18] If the wave 1 cognitive score in ANCOVA is related to the predictor, then the estimation of the effect of the predictor will be biased.[19 20 21] However, including the wave 1 cognitive score in ANCOVA when it is not related to the predictor does not bias the effect of the predictor on wave 2 cognitive score. Under this condition, the ANCOVA provides a more precise estimate of the effect of the predictor compared to the ANOVA,[22 23] making it particularly suitable to randomized controlled trials and other situations when there is no association between the predictor and the wave 1 value of the outcome measure. However, when the predictor and wave 1 outcome measure are associated, there are various methods to correct estimates derived from ANCOVA for the effects of measurement error.[20 24 25]
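The direction and size of this bias can be written out under the classical measurement error model. The sketch below is not taken from the paper. It assumes a binary predictor G, a true baseline score T whose group means differ by δ, an observed score W = T + u with error u independent of everything else, and a linear conditional expectation of T given W within groups (for example, joint normality). The symbols α, β, γ, δ, ρ and μ are introduced here for illustration only.

```latex
\begin{align*}
Y &= \alpha + \beta T + \gamma G + \varepsilon, \qquad W = T + u,\\
\rho &= \frac{\operatorname{Var}(T \mid G)}{\operatorname{Var}(T \mid G) + \operatorname{Var}(u)},\\
E[T \mid W, G] &= \mu_G + \rho\,(W - \mu_G), \qquad \mu_G = \mu_0 + \delta G,\\
E[Y \mid W, G] &= \alpha + \beta(1-\rho)\mu_0 + \rho\beta\, W
                 + \bigl[\gamma + \beta(1-\rho)\delta\bigr]\, G .
\end{align*}
```

Under these assumptions the coefficient on the observed baseline score is attenuated to ρβ, and the coefficient on G picks up an extra term β(1−ρ)δ. That term vanishes only when the groups share the same mean baseline (δ = 0, as expected under randomization) or when there is no measurement error (ρ = 1).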
Research on cognitive decline using data from two waves is widespread.[3 6 8 26 27 28 29] In general terms, little attention has been paid to the implications of the analytical strategy. The objective of our study is to examine the effect of SEP on cognitive change in midlife using data from two waves, separated by 5 years. We estimate and compare the effect of SEP on cognitive change using the ANOVA and the ANCOVA approaches outlined above. A further objective is to examine whether the ANCOVA estimates corrected for bias due to measurement error allow judgements to be made about the estimated effect of SEP on cognitive decline in midlife.
Data were drawn from the Whitehall II study, established in 1985 as a longitudinal study to examine the socioeconomic gradient in health and disease among 10308 civil servants (6895 men and 3413 women). All civil servants aged 35-55 years in 20 London based departments were invited to participate by letter, and 73% agreed. Baseline examination took place during 1985-1988, and involved a clinical examination and a self-administered questionnaire. Subsequent phases of data collection have alternated between postal questionnaire alone and postal questionnaire accompanied by a clinical examination. Since baseline seven phases of data collection have been completed.
Our indicator of SEP was employment grade in the British civil service at Phase 1, a measure which is associated with salary, social status and level of responsibility. We classified grade into three categories representing high (administrative grades), intermediate (professional or executive grades) and low (clerical or support grades) SEP. This measure of SEP was assessed on average 15 years prior to the measurement of wave 1 cognition.
Assessment of cognitive function was introduced for the first time to the full Whitehall II sample at the fifth phase (1997-1999) of data collection. The cognitive data used in the analysis come from Phases 5 (considered to be wave 1) and 7 (2002-2004, wave 2 for our analysis). Cognition was assessed using five standard cognitive tests in the Whitehall II cohort. In our study, we have chosen to use three of the five tests that were little influenced by ceiling and floor effects, a source of potential bias in the analyses. The first test was a 20-word free-recall test of short-term verbal memory. Participants were presented with a list of 20 one- or two-syllable words at 2-second intervals and were then asked to recall in writing as many of the words as they could, in any order; they had two minutes to do so. The other two tests measured verbal fluency: phonemic and semantic fluency. Phonemic fluency was assessed via “s” words and semantic fluency via “animal” words. Participants were asked to recall in writing as many words beginning with “s” and as many “animal” names as they could. One minute was allowed for each test. The scores on all three tests represent the number of correct words.
The analysis reported in this paper was based on a subsample of participants aged 50 to 55 years at wave 1 of the cognitive data collection. SEP was examined as a categorical variable with high SEP as the reference. Descriptive analysis of the association between SEP and cognitive performance at wave 1 was carried out using analysis of variance (ANOVA), separately in men and women and adjusted for age. In subsequent analyses men and women were combined, as there were no substantial differences in the estimated effects.
Cognitive change was defined as the difference between cognitive scores at wave 2 and at wave 1; a negative change score indicated cognitive decline. The effect of SEP on cognitive change was first estimated by regressing the cognitive change score on SEP (analysis 1, ANOVA). Then, we used ANCOVA, which consisted of regressing the cognitive score at wave 2 on SEP and the wave 1 cognitive score (analysis 2). This is formally equivalent to regressing the change score on SEP and wave 1 cognition. For both analyses 1 and 2 we show the estimated effects of SEP on cognitive change by using the high SEP group as the reference and estimating the difference in cognitive change, with its 95% confidence interval (CI), in the intermediate and low SEP groups. Both analyses were adjusted for age, sex and the time between the two waves of cognitive measures, which was on average 5.5 (range: 3.9-7.0) years. The ANCOVA also produced an estimate for the effect of the wave 1 measure. This was the difference, at wave 2, between two individuals who differed by one unit on the observed wave 1 score.
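The formal equivalence between the two ways of writing the ANCOVA follows from subtracting the wave 1 score from both sides. In the sketch below, Y_1 and Y_2 denote the observed wave 1 and wave 2 scores and a, b, c are illustrative coefficient names not used in the paper; the other covariates (age, sex, time between waves) are omitted for brevity.

```latex
\begin{align*}
Y_2 &= a + b\,Y_1 + c\,\mathrm{SEP} + \varepsilon\\
\Longleftrightarrow\quad Y_2 - Y_1 &= a + (b-1)\,Y_1 + c\,\mathrm{SEP} + \varepsilon .
\end{align*}
```

The SEP coefficient c is identical in both forms; only the coefficient on the wave 1 score shifts by 1. A wave 1 coefficient b below 1 in the first form therefore appears as a negative slope of change on baseline in the second.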
In further analyses the ANCOVA estimates, where both wave 1 cognition and SEP are independent variables, were corrected for bias due to measurement error. Measures of cognition vary because of both day-to-day variation and error in the instrument used to assess cognition. Several methods have been developed to correct estimates for bias due to measurement error. Regression calibration is a widely used statistical method for adjusting point and interval estimates of effect obtained from regression models. We used errors-in-variables regression, which allows a user-specified measurement error to correct bias in the regression estimates, using the EIVREG command in Stata. Measurement error is an inverse function of the reliability coefficient: the greater the error, the less reliable the measure. In our data we first used the test-retest reliability coefficient obtained from retesting a subsample (N=556) of participants of the Whitehall II study at a three-month interval. We extended these analyses by examining two further scenarios involving increasingly pessimistic assumptions about the severity of measurement error, to reflect the fact that the magnitude of the correction in the estimated effects of SEP and wave 1 cognition depends on the magnitude of the measurement error.
The correction for bias corresponds to the model:
\[ y = X^{*}B + e, \qquad X = X^{*} + U \]
where y represents the vector of the cognitive score at wave 2, X* represents the matrix of the true values of the explanatory variables (age, sex, time between the two measures of cognitive function, wave 1 score and SEP) and X represents the matrix of the observed values. B is the vector of the parameters, e is the vector of residuals and U is the matrix of measurement errors. The estimate b of B is obtained as follows:
\[ b = (X^{T}X - S)^{-1}X^{T}y \]
\(X^{T}\) is the transpose of X and S is a diagonal matrix with elements \(N(1-r_i)s_i^2\), where N is the number of observations, \(r_i\) is the user-specified reliability coefficient for the ith explanatory variable (\(0<r_i<1\) for the wave 1 score and \(r_i=1\) for all other variables, which are treated as measured without error) and \(s_i^2\) is the variance of the ith variable. In the simplest situation of a single explanatory variable, the corrected regression coefficient is just the observed regression coefficient divided by the reliability coefficient of the explanatory variable.
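The correction described above can be sketched in a few lines of NumPy. This is an illustration, not Stata's EIVREG implementation. It assumes the estimator takes the form b = (XᵀX − S)⁻¹Xᵀy with S diagonal, and gives error-free variables a reliability of 1 so that their entry in S is zero. The function name `eiv_regression` and the simulated data (two groups with the same true change, and a known error variance) are hypothetical.

```python
import numpy as np


def eiv_regression(X, y, reliabilities):
    """Errors-in-variables estimate b = (X'X - S)^{-1} X'y.

    S is diagonal with entries N * (1 - r_i) * s_i^2, where r_i is the assumed
    reliability of column i (1.0 means "measured without error").
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.asarray(reliabilities, dtype=float)
    n = X.shape[0]
    variances = X.var(axis=0, ddof=1)          # zero for the intercept column
    S = np.diag(n * (1.0 - r) * variances)
    return np.linalg.solve(X.T @ X - S, X.T @ y)


# Hypothetical simulated data: true wave 1 slope is 1, true group effect is 0.
rng = np.random.default_rng(1)
n = 20000
error_sd = 1.5                                  # assumed known for this illustration
low_sep = rng.integers(0, 2, size=n).astype(float)
true_w1 = 10.0 - 2.0 * low_sep + rng.normal(0, 1.5, n)
w1 = true_w1 + rng.normal(0, error_sd, n)       # observed wave 1 score
w2 = true_w1 - 0.5 + rng.normal(0, error_sd, n) # observed wave 2 score

X = np.column_stack([np.ones(n), low_sep, w1])
reliability_w1 = 1.0 - error_sd**2 / w1.var(ddof=1)

naive = eiv_regression(X, w2, [1.0, 1.0, 1.0])              # S = 0, i.e. ordinary least squares
corrected = eiv_regression(X, w2, [1.0, 1.0, reliability_w1])

print("naive     (intercept, low_sep, w1):", np.round(naive, 2))
print("corrected (intercept, low_sep, w1):", np.round(corrected, 2))
```

In this simulation the reliability is computed from the known error variance. With real data it has to be supplied from an external source such as a test-retest study, and the corrected estimates are only as good as that value.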
Data on all measures were available on 1743 participants (1261 men and 482 women), aged 50-55 years at wave 1. Table 1 presents the sample characteristics in men and women. There were few men in the low SEP group (4%) and few women in the high SEP group (18%). The mean wave 1 memory score was a little over 7 words for men and women. In men, the mean score at wave 1 was more than 17 words (17.52 and 17.13) for both verbal fluency tests. In women, the corresponding figures were 17.10 and 16.49 words, respectively.
Table 1 also shows the differences in cognitive scores at wave 1 between the three SEP groups with high SEP as the reference. In men, the low SEP group had an average memory score that was 1.83 words lower (p<0.001) than the high SEP group. For phonemic and semantic fluency in men, the average differences between the high and low SEP groups were 5.65 (p<0.001) and 5.33 (p<0.001) words, respectively. The results for women were very similar and men and women have been combined in subsequent analyses.
Table 2 shows estimates of the effect of SEP on cognitive change between wave 1 and wave 2 using ANOVA and ANCOVA. The results of the ANOVA reveal no difference in cognitive change score between the SEP groups for the three cognitive tests. However, the ANCOVA shows SEP to predict cognitive change, the decline being greater in the low SEP group, for all three cognitive tests. For instance, compared to the high SEP group the average decline in memory in the low SEP group was greater by 1.18 words (95% confidence interval (CI)=-1.53, -0.83). For phonemic and semantic fluency tests, this difference was estimated at -1.78 (95% CI=-2.31, -1.24) and -2.19 (95% CI=-2.69, -1.69) words.
The ANCOVA results also provide an estimate for the effect of the wave 1 cognitive score. These results show that a one-word difference in the memory score between two participants at wave 1 was associated with a 0.45 word difference (95% CI=0.40, 0.49) between them at wave 2. Similar results were obtained for all three tests; all coefficients were significantly lower than 1. This implies that two individuals with a one-unit difference on the observed wave 1 score will have a significantly smaller difference on the observed wave 2 score. There are two possible explanations for this finding. The first is that those with high cognition at wave 1 decline more than those with poor cognition. This is not plausible on the basis of what we know about cognitive ageing. An alternative, more likely explanation is that measurement error in assessments of cognitive function led to an attenuation of the estimated effect of the wave 1 cognitive score on the wave 2 score.[17 18] Thus, it is important to consider correcting the estimates derived from ANCOVA for measurement error.
Table 3 presents the estimates for the effect of SEP on cognitive change, corrected for bias due to measurement error. The observed test-retest reliability for the memory test was 0.60 and those for phonemic and semantic fluency tests 0.69 and 0.73, respectively. In the top third of the table the reliability of the cognitive test was set as the observed test-retest reliability and the estimates of the effect of SEP on cognitive change were corrected using this measure of reliability. The corrected estimates of memory show that at the observed reliability (0.60), SEP had a much weaker association with cognitive decline. However, the estimated effect of 1 unit difference in memory between two individuals at wave 1 translated to 0.77 word difference (95% CI=0.70, 0.84) between them at wave 2. The middle and bottom panels of Table 3 show the results for two hypothetical reliability coefficients, i.e. two scenarios with increasing measurement error and decreasing reliability coefficients. For example, reliability at 0.50 for memory revealed no effect of SEP on cognitive decline and little evidence of regression to the mean. These findings demonstrate the importance of measurement error to estimates derived from ANCOVA.
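As a rough plausibility check on these numbers, the single-covariate rule (corrected coefficient = observed coefficient divided by reliability) can be applied to the reported memory estimates. The snippet below is only a back-of-the-envelope sketch. The actual model includes several covariates, so the rule is not expected to reproduce the reported corrected value of 0.77 exactly, and the variable names are illustrative.

```python
# Reported test-retest reliabilities and the uncorrected ANCOVA wave 1 coefficient for memory.
reliability = {"memory": 0.60, "phonemic fluency": 0.69, "semantic fluency": 0.73}
uncorrected_memory_slope = 0.45

# Share of observed variance attributed to measurement error under each reliability.
error_share = {test: round(1.0 - r, 2) for test, r in reliability.items()}

# Single-covariate approximation: observed slope divided by reliability.
approx_corrected_slope = uncorrected_memory_slope / reliability["memory"]

print(error_share)
print(f"approximate corrected memory slope: {approx_corrected_slope:.2f}")
```

The approximation gives about 0.75, close to the reported 0.77, which suggests that most of the shift in the wave 1 coefficient comes from the reliability correction itself.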
We examined the effect of SEP on cognitive decline in midlife using two commonly used analytical strategies, analysis of variance and analysis of covariance – the first is unadjusted and the second adjusted for baseline cognitive score. In this observational cohort study these two approaches lead to different conclusions: the results of the ANOVA suggest no effect of SEP whereas those obtained using the ANCOVA without correction for measurement error suggest greater cognitive decline in the lower SEP groups. The discrepancy is a result of at least two factors. The first is that the estimates from these two approaches are based on different assumptions; the ANOVA provides an unconditional and the ANCOVA a conditional effect. In other words the ANCOVA results are conditional on observed baseline cognition being similar in the three SEP groups. A second source of difference in the two approaches is that the ANCOVA estimates are biased due to the inclusion of baseline cognitive function, which is measured with error, as an independent variable.[19 20] We also demonstrate that the bias formula correction intended to correct for the influence of measurement error provides estimates that vary as a function of the size of the measurement error.
The discrepancy in the results between the ANOVA and the ANCOVA in observational studies was identified by Fred Lord and is known as “Lord's paradox”.[13 14] The ANCOVA results would apply to situations where there were no baseline differences in cognition between the SEP groups. However, this is an untenable assumption, as our data clearly show SEP to be related to baseline cognitive performance, a fact also widely reported in the literature.[1 2 3 4] Many observational studies on cognitive decline, irrespective of the predictor examined, are based on the ANCOVA approach.[3 8 28 29 34 35] In a recent literature review, 12 out of 14 studies concluded that education had some protective effect on cognitive decline. Of these 12 studies, 8 adjusted for baseline cognitive score and did not take measurement error into account.[5 6 26 27 36 37 38 39] However, as highlighted by Glymour and colleagues, one study that initially suggested education to be associated with greater cognitive decline, using data from two waves and ANCOVA without correction for measurement error, later incorporated a third repeat measure of cognitive function into the analysis and found no effect of education on cognitive decline.
The results from the ANCOVA model provide estimates for the effect of SEP and the wave 1 cognitive measure as both are included as independent variables in the model. We use both these estimates to show that the uncorrected ANCOVA may not be reliable; the coefficient for the effect of wave 1 cognitive score on wave 2 cognitive score suggests that two individuals with one unit difference in their scores at baseline have a smaller difference at wave 2. There are two possible interpretations of this observation. The first is that higher scores at baseline predict lower scores at follow-up and vice versa. In other words, cognitive decline is stronger in individuals with better baseline cognitive function. This is not consistent with conclusions of previous studies,[11 27] and difficult to reconcile with the results for SEP obtained from the same model. We could speculate that participants who perceive themselves to have impaired performance at baseline work especially hard to improve cognitive score during the testing interval, while those who perceive themselves to perform well at baseline spend the testing interval avoiding all cognitive challenges and thus their true cognitive score declines. However, it is much more likely that these results are simply affected by measurement error which leads to “regression to the mean”,[17 18] the tendency of observations that are extreme by chance to move closer to the mean when repeated, due to measurement error. In general terms, the greater the measurement error the greater is the inflation for the estimated effect of the predictor and the greater the attenuation in the estimated effect of the baseline measure.[19 20] Cognitive function is measured with error, and given the association between SEP and baseline cognition, ANCOVA is clearly not suitable for the analysis of cognitive change in observational studies.
The bias due to measurement error in the estimates obtained from ANCOVA can be corrected.[20 24 25] As expected, this correction reduced the estimated effect of SEP and increased the impact of baseline cognition on cognition at follow-up. Our results show that the estimates from the corrected model depend on the size of the measurement error of the tests. Increasingly pessimistic assumptions about the magnitude of measurement error lead to larger corrections of the effect estimate, so that under the assumption of high levels of measurement error (corresponding to decreasing reliability in Table 3) the corrected effect estimate of SEP on cognitive decline decreases and then reverses sign compared to the uncorrected effect estimate. The estimates of reliability obtained in our study for each test are close to those found in other studies,[41 42] and with the observed estimates there is little influence of SEP on cognitive decline. However, we cannot rule out the possibility that the observed reliability is overestimated, as the corrected wave 1 effects continue to show the presence of regression to the mean. For cognition, the measurement error is made up of errors specific to the measure but also of within-subject variability due to, for instance, the health of the participant fluctuating over time. On the other hand, our second scenario is perhaps too pessimistic regarding the magnitude of the measurement error. The coefficient for wave 1 in the corrected model (reliability 0.40) indicates an increase in variance at wave 2. However, an examination of the variances associated with the observed data (not shown but available from the authors) shows no such increase. Therefore, this second scenario might represent an overcorrection.
Glymour et al. have recently argued against adjustment for baseline health status in the analysis of change in health and proposed the use of causal diagrams to determine whether or not to adjust for the baseline measure of health. We add to this point of view in this paper by narrowing the focus to a comparison between the estimates obtained from the ANOVA and the ANCOVA, particularly for analysis of change using data from 2 waves, drawn from an observational study. We recommend caution when using ANCOVA, or adjustment for baseline, in observational data. As a first step, it is judicious to examine the estimates of both the ANOVA and the ANCOVA, when analysing change using two waves of data. Inconsistency between the results from the two analyses, one adjusted and the other unadjusted for baseline, ought to be taken as a warning sign. ANCOVA includes both the exposure and the wave 1 outcome as independent variables and we recommend that estimates for both these measures be examined when interpreting the results for the effect of the exposure. Finally, we caution against a simple correction of the ANCOVA estimates using the observed test-retest reliability to take account of the bias due to measurement error. Here again, it is advisable to examine the corrected estimates for both the exposure and the wave 1 outcome measure. When all these factors are taken into account, our results suggest no effect of SEP on cognitive decline in midlife.
AS-M is supported by a “European Young Investigator Award” from the European Science Foundation. MK and JV are supported by the Academy of Finland (projects #117604, #124271, #124322 and #129264), MM is supported by an MRC research professorship and MJS is supported by the British Heart Foundation. The Whitehall II study has been supported by grants from the British Medical Research Council (MRC); the British Heart Foundation; the British Health and Safety Executive; the British Department of Health; the National Heart, Lung, and Blood Institute (grant HL36310); the National Institute on Aging (grant AG13196); the Agency for Health Care Policy and Research (grant HS06516); and the John D. and Catherine T. MacArthur Foundation Research Networks on Successful Midlife Development and Socioeconomic Status and Health.
We thank all of the participating civil service departments and their welfare, personnel, and establishment officers; the British Occupational Health and Safety Agency; the British Council of Civil Service Unions; all participating civil servants in the Whitehall II study; and all members of the Whitehall II study team.
What this paper adds
What is already known on this subject?
Analyses of change using data from two waves of an observational study often provide different results depending on whether or not they are adjusted for the baseline measure.
Adjustment for the baseline measure provides results that are conditional on the observed baseline health measure being similar in the different exposure groups.
What does this study add?
We compared analyses adjusted and unadjusted for the baseline measure and showed that adjustment for baseline could be misleading, as the exposure is often associated with the baseline measure in observational studies.
A further source of error in the estimates from the adjusted models is measurement error in the health measure; correction for this error does not provide a satisfactory solution and we recommend the use of unadjusted models.
The published version of this paper ‘Socioeconomic position and cognitive decline using data from two waves: what is the role of the wave 1 cognitive measure? Dugravot A, Guéguen A, Kivimaki M, Vahtera J, Shipley M, Marmot MG, Singh-Manoux A. J. Epidemiol Community Health. 2009 Aug;63(8):675-80. Epub 2009 Apr 29’ can be found at http://jech.bmj.com/cgi/content/full/63/8/675