|Home | About | Journals | Submit | Contact Us | Français|
Longitudinal studies with binary repeated outcomes are now widespread in epidemiology. The statistical analysis of these studies presents difficulties and standard methods are inadequate.
We consider strategies for modelling binary repeated responses and focus on two specific issues: the choice between marginal and random-effects models, and the choice of the time point origin. These issues are addressed using the example of self-reported disability in older women assessed annually for 6 years. The indicator of disability "needing help to go outdoors or home-confined" is used.
In view of the observed associations between the responses for consecutive years, the baseline response was considered as a covariate. We compared the marginal and random-effects models first when only the influence of time and age is analysed and second when individual risk factors are studied in an aetiological perspective. There were substantial differences between the parameter estimates. They were due to differences between specific concepts related to the two models and the large between-individual heterogeneity revealed by the analysis.
A random-effects model appears to be most suitable for the analysis of self-reported disability in older women.
In developed countries, disability in older persons is a major concern for public health authorities. Progress in medical care together with improved living conditions have led to longer life spans. One adverse consequence is that the number of very old people who are disabled is also increasing. The study of the succession of the different stages of disability, and of the ability to recover, together with the analysis of the risk factors for disability, are important issues for research in gerontology.
Longitudinal studies are more appropriate than cross-sectional studies which suffer various limitations due to biases such as the selective removal of disabled persons by institutionalisation, or differences in the proportions of disabled persons between age groups, also confounded with changes across generations . Several authors  have analysed repeated measures of disability by comparing two by two time points. This method involves a very large number of tests and leads to partial results for fixed periods. The statistical analysis must take four main characteristics of the longitudinal data into account: 1) time may be an explanatory variable, 2) repeated observations for a subject are likely to be correlated, 3) the covariables may be time-dependent (they may vary through time for a subject), 4) missing data in the successive responses may induce a bias. Several statistical models developed for longitudinal data have become more popular and there is software available for some of them. The problem of the choice of which to use remains.
Here, we present a strategy to model binary repeated responses and to explain how to choose between marginal and random-effects models. The evolution of disability in older persons is used as an example to illustrate each step.
The French EPIDOS study is a prospective multicentre study of the risk factors for hip fractures in 7575 women aged 75 years or older, included in 1992–1993 and recruited by mailing based on large population-based listings including electoral rolls . Once in the study, these women were contacted by mail or telephone every four months to collect information about any falls and new fracture events. They had to complete a mail questionnaire annually investigating hospitalisation, new health events, changes in weight, type of housing, ability to go outside, activities of daily life (ADL), instrumental activities of daily life (IADL), medications used, and subjective health. This follow-up was initially planned for 4 years and then was extended to 6 years. In this paper we analyse the data from Montpellier (southern France), one of the 5 participating centres. We used the data from all the subjects included at this centre (1548 women) to analyse the evolution of disability.
The annual questionnaire was in some cases missing. This was mainly due to illness or to family events such as the death of the spouse. The women then postponed their answer.
Various indicators have been proposed to assess disability . Needing help for going outdoors represents a first evident level of functional limitation. It is easy to identify and concerns both men and women. Therefore we chose "needing help to go outdoors or home-confined" as an indicator of disability. The annual response variable is binary.
In longitudinal studies, multiple assessments of the same subject at different time points are used and the within subject responses are then correlated. This correlation must be accounted for by analysis methods appropriate to the data . Several models have been proposed for the analysis of such data. Most of them are extensions of the well-known logistic regression that is a particular case of the generalized linear models with a logistic link function . They are usually classified into marginal or random-effects models. Random-effects models are also called generalized linear mixed models or multilevel models or conditional models. However, this last term is ambiguous as several authors use it only for conditional maximum likelihood estimation  that we do not consider in this paper. Unlike linear models, the interpretation of the coefficients of these two types of model differs (see below). The choice of one or the other depends on the objectives of the study.
Let Yij denote a binary outcome (in our example: needing help to go outdoors or home confined) corresponding to the jth response (jth year in the study in our example, j = 1 to ni) of the ith subject (i = 1 to K). Let also Xij be a design matrix of covariates (1 × p vector, with first element being 1 for the intercept). The covariates may be fixed (for example age at baseline), or take different values at each year of the study (for example time or hospitalised for the last year). The marginal model, also called the population-averaged model , estimates the model thus:
logit (E (Yij | Xij)) = logit (P (Yij = 1 | Xij)) = Xij'β
whereas the random-effects model, also called the "subject-specified models" , estimates the model as follows:
logit (P (Yij = 1 | Xij)) = Xij' βi
Thus, the marginal model supposes that the relationship between the outcome Y and the covariate X is the same for all the subjects, and the random-effects model allows this relationship to differ between subjects. To highlight this point, the random-effects model may also be written:
logit (E (Yij | Xij, Ui)) = X'ij (β* + Ui) = Xij' β* + Xij' Ui
logit (E (Yij | Xij, Zij, Ui)) = Xij' β* + Zij' Ui
where the random effects Ui are assumed to vary independently from one subject to another according to a common distribution. This distribution is often supposed to be normal with mean 0 and variance D. Zij is often a subvector of Xij, which means that random effects apply only to a part of the covariates and the intercept. The variance, D, has to be estimated and represents the extent of the unexplained between-individual variability.
It is important to note that the p × 1 vectors β in the marginal model and β* in the random effects model are not equal. Hence, the estimators estimate different things and the magnitude of the difference between β and β* is function of the variance, D .
Moreover, dependencies between observations over the time are handled differently in the two models. In the marginal model, it is popular to fit the vector of parameters, β, using the Generalized Estimating Equations (GEE) proposed by Liang and Zeger  wherein the covariance matrix is structured by using a working correlation matrix R(α) fully specified by the vector of parameters, α. This working correlation is assumed to be the same for all the subjects, reflecting an average dependence among the repeated observations for all subjects. In contrast, the random-effects model allows this within-subjects dependency to vary from one subject to another, by the means of the random part of the covariable linear combination. In the simplest case, only a random intercept is introduced
logit (E (Yij | Xij, Ui)) = Xij'β* + Ui0
Ui0 being an individual parameter of propensity to become disabled, constant through the time. For a given subject, whatever the interval of time between two responses, the strength of dependency is then the same. If a random slope for the time covariable is added
logit (E (Yij | Xij, Ui)) = Xij' β* + Ui0 + Ui1 time
this individual strength of association increases or decreases with the width of the interval.
In the marginal model, several specific choices of the structure of the working correlation matrix R(α) are possible. For example, R(α) is called m-dependent if corr(Yij,Yik) = αt for t = 1, 2, ..., m and corr(Yij,Yik) = 0 for t >m; R(α) is exchangeable if corr(Yij,Yik) = α for j ≠ k, and it is unstructured if corr(Yij,Yik) = αik. An advantage of the marginal model, demonstrated by Liang and Zeger, is that β and their robust variance are consistent (the estimator converges towards the parameter being estimated as the sample size increases) even when the correlation structure is misspecified. However, choosing the working correlation structure closest to the true structure increases the statistical efficiency of the parameter estimator. Consequently, it is recommended to specify the working correlation as accurately as possible, based on the knowledge of the longitudinal process .
Concerning the estimating procedures, the GEE method for marginal models is not difficult to implement and is now available in the major statistical analysis packages. The procedures are more complex for the random-effects model. The most attractive of them directly maximizes an approximate integrated likelihood. With non-linear models computational-intensive integration methods, such as Gauss-Hermite quadrature, are necessary to evaluate the likelihood .
Different assumptions are required for the two models regarding missing data. The marginal model using the GEE requires a missing data process completely at random (MCAR). Under this assumption, missingness does not depend on individual characteristics (observed or not). In contrast, random models only need the less stringent assumption of missing at random (MAR). In this process, the probability of missingness depends only on observed variables (previous covariates or outcomes) . All marginal models do not require the MCAR assumption. For example, Robins et all  proposed methods to allow for data that are MAR in marginal models, but these methods are more complicated to implement.
The interpretation of the coefficients β and β* also differs. Consider, for instance, the covariate X "living alone". The odds ratio OR* = exp(β*) represents the odds of the outcome (needing help to go outdoors or home confined) for a person living alone compared to the same person supposed not to live alone. It can be seen as an odds ratio adjusted on unobserved individual characteristics. Under the marginal model, OR = exp(β) represents the averaged odds of the sub-group living alone compared to the sub-group not living alone.
In the following sections we will first describe the data and then focus on the specific problems inherent to longitudinal analyses: how to choose the time point origin and how to take into account the influence of time on responses. Then, we will present an example of an estimation of risk factors including covariates fixed across time for a single subject, and time-dependent covariates.
The mean (SD) age at inclusion was 80 years (3.7) and only 10 percent of the included women were aged 90 years or over. The proportion of women reporting disability (Table (Table1)1) increased only moderately with time except for the first year during which it jumped from 29 percent to 39 percent. The sequence of the seven successive states of disability (path) differs between the women. Only 268 women (17 percent) were able to go outside without help from the beginning to the end of the study, 77 (5 percent) were disabled throughout the 6 years. The frequencies of each of the other paths are very small: 1 or 2 percent. During the 6 years, recovery from disability was observed at least once in 498 women (32 percent). This relatively high proportion of women recovering from a disabled state is why we chose to perform a repeated measures analysis and not to use statistical methods analysing time before entering disability, such as Cox regression.
The study was initially planned to last 4 years, some women refused to continue in years 5 and 6, and the number of those returning the completed questionnaire decreased most quickly between years 4 and 5 (Table (Table1).1). Some women gave intermittent answers. For example, of the 970 women still providing a response in year 6, at least one observation in the previous years was missing for 150 (15 percent). The missingness partly depended on unobserved individual events (illness or family events) and is unlikely to be MCAR. The use of the marginal model is therefore questionable because an averaged effect is only poorly meaningful. The true time between yearly assessments and inclusion varied from one woman to another and the range tended to be higher at the end of the follow-up. Consequently, time is considered as a continuous covariable. Vital status was known throughout the 6 years for all the participants, even those who no longer sent back the questionnaires. The total number of deaths was 294 (19 percent) by the end of the study.
The response Yi0 given by the ith subject at baseline may be handled in two ways. It can be considered either as a part of the longitudinal response, and thus the response vector has 7 elements corresponding to the times 0, 1, ..., 6 years, or as a baseline covariate, in which case the response vector has 6 elements. These two possibilities correspond to different structures for the raw data and are displayed in Tables Tables22 and and3.3. In the models studied (marginal or random-effects model), the association between two successive responses is supposed to have the same structure through the whole survey. If the association between baseline disability and the other responses differs from the association between the follow up responses, then it is better to consider the baseline response as a covariable.
In our example, as in many cohort studies, the baseline examination was exhaustive. The women were proposed clinical and functional examinations (series of standard tests of physical performance) and a bone densitometry at the hospital. However, only those who felt well enough to undergo these tests were examined, probably introducing a selection bias at inclusion. During the follow-up, the women were only asked to complete a mailed questionnaire that even physically dependent persons could answer. The profile of our sample changed through time, with the women becoming more and more physically dependent. At the beginning, many of the disabled women were probably only slightly disabled and thus there was the possibility of recovery. As time passed more women were severely disabled, and stayed so until the end of the study. Thus, the association between the responses for consecutive years is likely to be stronger in the last years. Also, the baseline response may be considered as a special case, less related to the following disabled states.
To check this, we examined the table of the odds ratios of being disabled for the year j (Yij) according to disability the previous years (Yi(j-t)). The association between two successive time points beyond baseline was stronger than that with the baseline response (Table (Table4).4). For instance, in the lower diagonal giving the association between two observations separated by one year, the odds ratio between year 1 and the baseline is 5.3 whereas it is greater than 16 between year 2 and 1, and between the following pairs of years. In view of these results, the baseline response was considered as a covariate and the data were structured as shown in Table Table3.3. There was also a tendency for correlations to decrease with increasing time differences (Table (Table4).4). For example, the odds ratio is 16.3 between year 1 and year 2 and 10.0 between year 1 and year 5. This observation allows a better selection of the correlation structure, avoiding the use of an exchangeable working correlation matrix in marginal models, and introducing a random slope in random effect models.
The objective here is to evaluate the impact of time on the proportion of disabled women. We chose to use the age at entry (fixed covariable) and the time since baseline to characterize this phenomenon. Another solution would be to use the age at every response (as a time varying covariable) but our option allows the introduction of an interaction between the age at entry and the time since baseline to test whether the effect of time is more pronounced in the oldest women.
In view of the missingness process resulting in the sample differing at every assessment, and the possible selection bias at entry, the search for individual relationships using the random-effects model is more suitable than using population averaged associations. However the two models are interesting to compare to show that the estimate parameters can be different, and to explain these differences.
We used two methods to characterize the changes in disability over time: the marginal model with the GEE approach and the random model with likelihood integration. The random model had Gaussian random effects and errors. We used the SAS procedures GENMOD with the REPEATED statement and an unstructured working correlation matrix and also NLMIXED with the Gauss-Hermite quadrature integration method . For the SAS code refer to Appendix (see Additional file 1). In the random model, we determined the structure of the covariance introducing successively two random effects: random intercept and random slope for the time covariate.
In all the models, the following fixed effects were then tested: time since baseline (continuous variable), time square, age at entry (in years exceeding 74), interactions age × time and baseline inability to go outside without help. Only the time square effect was not significant, and was removed from the model. We also introduced an indicator variable for women dying within the 6-year period. This is a simple way to take into account dropouts due to death, a difficulty often encountered in longitudinal studies in the elderly. The use of this time-fixed indicator was possible since information about death was known for the six years for all the women. The NLMIXED procedure converged and provided estimations only when the initial parameters were close to the final solution. We calculated successive models, beginning with a simple model with only a random intercept, and at each step we used the previous estimate parameters as initialisation for the next estimation.
The results of the comparison between the 2 modelling strategies are shown in Table Table5.5. The general conclusion is the same for the 2 models but the estimated parameters are different. For instance, the most important factor, baseline inability, is very significant in both, but parameter estimates differ from 1.54 to 3.18. The significant increase in risk of being disabled with age and time since baseline is illustrated in figure figure1,1, presenting the averaged probability of disability, 5 years after the inclusion, in women able to go outside without help at baseline and who did not die during the study. In our cohort, recruited in 1992–1993, the probability of restricted mobility, 5 years later, was high, especially after the age of 80 years. Figure Figure22 shows the changes with time in two groups aged 75 and 85 years at entry. There was a significant interaction between age at entry and time, and consequently the risk of disability is accelerated in the oldest women.
Differences between the estimators of the marginal and random-effects parameters are expected (see above). The marginal model expresses averaged relationships without taking into account the fact that the same subjects are considered at each time point, whereas the random-effects model gives relationships conditionally on having certain individual characteristics modelled by the random effects. In the case of only a random intercept, Nehaus et al  demonstrated that the estimates from the marginal model are systematically lower than those from the random model. This characteristic is shown in figures figures11 and and2,2, where the curves from the marginal model are flatter than the others. The differences between the estimates of the two approaches are largely dependent on the inter-individual heterogeneity. This heterogeneity can be assessed in the random models by looking at the intercept and slope variances.
In the random model, the random intercept variance is high. The estimate is 7.25, indicating that the variability of this individual additional intercept has a 95 percent width of 10.6 ( × 1.96 × 2). If we consider the variability given by the fixed covariates (age varying from 75 to 85 years, baseline disability and death each varying from 0 to 1) the width is 6.14 (0.16 × 10 + 3.18 + 1.36). Hence the variability explained by these three fixed variables is lower than the unexplained between-individual variability. Similarly, the random slope variability has a 95 percent width of 1.66, whereas the variability explained by the age has a 95 percent width of 0.40 (0.04 × 10).
The probability of being disabled is therefore much more due to the woman's uncharacterised "frailty" than to age or to baseline disability. This wide between-individual heterogeneity explains the differences between the parameter estimates of the marginal and random-effects models.
Changes with time expressed as age at entry and the time since baseline were modelled in the previous section. The marginal and random-effects models can also be used to identify other risk factors for disability. It would also be interesting to test how the estimating algorithms behave when numerous covariables are jointly introduced.
We compared two multivariate models adjusted on the covariates presented in the previous section (age, time, interactions between age and time, baseline disability, still alive at the 6-year period) (see table table6).6). When numerous covariables are put in the model, the NLMIXED procedure may fail to integrate the likelihood or give non-stationary estimations. This non-optimal estimation can be diagnosed by checking the gradient (vector of first derivative) of the negative log likelihood function for each parameter. These gradients are systematically provided by SAS in the results. If one of them is not close to zero, then the solution cannot be considered to be valid. These problems of convergence were not encountered with the set of covariables shown in table table6.6. Two categories of covariables were tested: variables collected at baseline (living at home alone, body mass index (BMI), visual acuity measured with Snellen letter test chart on a decimal scale, and perceived health) and variables that vary with time (hospitalised at least once during last year, temporarily bed-confined during last year and number of falls during last year). Adjusted on all the others, all these factors were significant.
We present the steps for choosing a model able to characterize disability taken as a binary response.
Several important points have to be considered in the analysis of repeated binary data. First, the choice between the marginal and random-effects models depends mainly on the aims of the study. If the goal is to predict a mean prevalence of disability over time in elderly people by sex or age group, the marginal model is suitable. In contrast, if the goal is to study the individual risk factors for aetiological considerations, the random-effects model is more suitable because it allows adjustment on non-observed individual characteristics, and a better understanding of the underlying mechanism.
Second, the missing data process has to be examined. The marginal model using GEE assumes that the sample is representative of the whole population at each time point and the missing data is MCAR. This is unlikely to be the case in our example. The random-effects model assuming an MAR process is more appropriate. By the end of the study, half of the 37% of missing data were due to deaths and most were predictable from the previously collected data. The same probably applies to missing data due to chronic illnesses. Nevertheless, we cannot exclude the possibility that unobserved level of disability in our example may influence a part of the missing process. The magnitude of the potential bias introduced by non-random missingness should therefore be examined. A sensitivity analysis to assess the impact of missing data on subsequent statistical inference would be worthwhile , but is beyond the scope of this paper.
Another important point is to determine whether the disability indicator at baseline should be considered as a response or as a covariate. In epidemiological cohorts, the conditions in which responses are collected at baseline and during the follow-up are often different. The first response is often considered as a covariate but this choice has to be confirmed in view of the analysis of dependency between the responses.
The structure of the covariance between the repeated responses has also to be chosen. In the marginal model, the inferences on the parameter estimates are asymptotically valid under any assumed structure but it is better to choose a structure corresponding to the data. In contrast, in the random model, the fixed and random parameters are simultaneously estimated and the choice of the covariance structure influences the final results. For the mixed model applied to gaussian responses, it is recommended  (i)to consider first the more general model with all the relevant covariables, (ii) to specify a model for the covariance structure (i.e. to specify the random effects), and to estimate the parameters, and (iii) to try to reduce the fixed effect portion. In practice, the estimating procedures that we used do not allow introduction of a complex covariance structure due to computational time and unstable estimations. Other more flexible methods allowing multiple levels of clustering, such as Markov chain Monte Carlo methods, can be used but are more complex to handle .
The calculation procedures for random models need to be improved. The method of estimation with likelihood integration requires excessive computation. An alternative strategy to fit mixed models is the penalized quasi-likelihood (PQL) approach [18,19] (GLIMMIX macro in SAS) but this method provides highly biased estimates of mixed-effects parameters with binary responses [20,21].
The analysis of disability evolution with age at entry and time, as well as the study of other risk factors, show that the differences between the estimates for the two models can be large. However, the interpretation of the estimate parameters is different. In the marginal model, the exponential of an estimate parameter is a population-averaged odds ratio for disability and concerns the sub-population that shares a characteristic relative to the sub-population not sharing this characteristic. In the random model, the exponential of an estimate is an odds ratio for a woman that has a characteristic relative to this same woman if she were free of this characteristic. The random model takes into account the underlying dependence relationship. Furthermore, the assumptions about the distributions are different, and the fact that the conditional distribution is binomial does not imply that the marginal distribution is also binomial .
In our example, the search for risk factors for disability prevention, and the characteristics of our sample, such as the missingness process and the number of drop-outs after 4 years, render the random-effects models more appropriate. The marginal model has a tendency to waste information and does not measure the association of within-subject covariate change with change in the response, the associations typically of particular scientific value in longitudinal studies.
In epidemiology, many rich databases are now available from longitudinal studies with binary repeated events. Traditional analyses comparing time points two by two have serious limitations. Marginal models are easy to implement and represent a first solution, but the random models, although more complex, use all available data and are more suitable for explicative studies.
IC wrote an initial draft and JB made important improvements. Both authors read and approved the final manuscript.
The pre-publication history for this paper can be accessed here:
The authors are grateful to Dr François Favier and the EPIDOS group who provided the data for this work and to Annie Lacroux for editorial assistance.