Search tips
Search criteria 


Logo of blackwellopenThis ArticleFor AuthorsLearn MoreSubmit
Journal of the Royal Statistical Society. Series C, Applied Statistics
J R Stat Soc Ser C Appl Stat. 2009 July; 58(3): 369–382.
PMCID: PMC3001113

Likelihood estimation for a longitudinal negative binomial regression model with missing outcomes


Joint damage in psoriatic arthritis can be measured by clinical and radiological methods, the former being done more frequently during longitudinal follow-up of patients. Motivated by the need to compare findings based on the different methods with different observation patterns, we consider longitudinal data where the outcome variable is a cumulative total of counts that can be unobserved when other, informative, explanatory variables are recorded. We demonstrate how to calculate the likelihood for such data when it is assumed that the increment in the cumulative total follows a discrete distribution with a location parameter that depends on a linear function of explanatory variables. An approach to the incorporation of informative observation is suggested. We present analyses based on an observational database from a psoriatic arthritis clinic. Although the use of the new statistical methodology has relatively little effect in this example, simulation studies indicate that the method can provide substantial improvements in bias and coverage in some situations where there is an important time varying explanatory variable.

Keywords: Informative observation, Likelihood, Missing data, Psoriatic arthritis, Regression, Simulation

1. Introduction and motivation

This work was motivated by on-going analyses of a longitudinal database of information on patients with the chronic condition psoriatic arthritis (Hanley et al., 1988). This database, which was initiated in 1978, derives from the psoriatic arthritis clinic at the Toronto Hospital and records information from visits to the clinic which occur approximately once every 6 months. The principal measure of a patient's disease progression is the number of damaged joints. Damage to a joint is a permanent condition, typically reflecting immobility, and can be contrasted with disease activity, which is reflected by pain and inflammation of joints, which can often be alleviated by treatment.

There are two principal ways to measure joint damage: through a clinical examination and clinician assessment of damage, and through assessment of X-rays of joints, this typically being restricted to hands and feet. The clinical examination can be performed on every clinic visit but is often considered more subjective than the radiological examination, which is generally undertaken at intervals of approximately 2 years. More frequent routine X-rays would not be clinically acceptable owing to safety and resource implications.

Past studies of disease progression (e.g. Gladman and Farewell (1999), Gladman et al. (1994) and Bond et al. (2007)) have used the number of damaged joints developing between visits to the clinic as a primary outcome measure where the joint counts are those determined by clinical examination. The increment in joint counts has been related to explanatory variables, with a particular interest in the relationship between time varying explanatoryvariables reflecting disease activity and the subsequent occurrence of damage. With complete data, where we observe the damaged joint count and all explanatory variables of interest at each clinic visit, it is straightforward to compute the increment in the number of damaged joints and to fit a relevant model, such as a Poisson or negative binomial regression model.

After 30 years of data collection, the next phase of data analysis will focus on predictors of clinical damage at the individual joint level rather than at the patient level. Before initiating this work, it was felt important to provide as much evidence as possible that findings based on clinical damage are informative about disease progression and, in particular, would be consistent with those that might have been derived through use of radiological measures of damage if these were as frequently available. There is now enough evidence to consider this for patient level measurements and the work that is reported here was motivated by the desire to examine patient level predictors of radiological damage to confirm earlier findings that were based on clinical damage. The situation that arises in such analyses is that we have information on time varying explanatory variables being collected at each visit to the clinic but the primary radiological outcome measure is updated less frequently.

One approach is to consider only the data that are collected on the clinic visits when a radiograph is taken. This enables a regression model to be fitted using standard software for the implementation of generalized linear models (McCullagh and Nelder, 1989) and was done by Bond et al. (2007). However, it may be desired, and felt to be more sensible, to use the updated information on explanatory variables that is available from the intermediate clinic visits when a radiograph has not been taken. The primary aim of this paper is to present a means to implement a likelihood method for performing such estimation and to consider how much difference there is between the first ‘naive’ but straightforward method and the second more comprehensive approach to the analysis of the radiological outcome data. If consistent results are found between these two methods for the modelling of radiological data, then, given the comparable results that were found by Bond et al. (2007), reassurance is provided concerning the use of clinical measures of damage in the analysis of data at the individual joint level. Although, in this application, consistency of results between the two analyses of radiological data is a particular advantage, in other settings there might be the expectation that the more complicated analysis will differ from and be more appropriate than the naive method. Of course, if this were true for the psoriatic arthritis data, the more important question clinically would be whether the appropriate analysis of the radiological damage is consistent with the analysis of the clinical damage.

Section 2 defines some general notation and presents the technical details for dealing with missing outcome data. We perform some simulations in Section 3 to determine how much the analysis suggested can differ from a naive analysis in situations that are characterized by the influence and variability of an explanatory variable as well as the pattern of missingness. Section 4 outlines very briefly how informative observation might be incorporated into the likelihood methodology. Section 5 applies the method to the data that motivated the work. In addition, models for clinical damage are examined so that by deleting observations we can check how the method compares with the use of complete data with no missing outcomes. The paper concludes with a discussion in Section 6.

2. Methodology

Assume that each patient i has mi clinic visits (mi≥2), and there are n patients. At each visit to the clinic there is, potentially unobserved, a variable Di,j the number of damaged joints, and a vector of explanatory variables Xi,j, where 1≤in and 1≤jmi. Define the response variable Yi,j=Di,j+1Di,j,1≤j<mi, so the current set of explanatory variables is used to predict the next increment in the damaged joint count. Note that the number of increments Yi,j observed is one fewer than the number of visits at which the joint counts Di,j are observed and that there is no restriction that the damaged joint count at the first clinic visit, Di,0, be 0. We could allow the explanatory variables Xi,j to include functions of the damaged joint counts observed up to and including visit j. For simplicity, however, we exclude this possibility initially and extend the methodology to permit it subsequently.

We assume that

equation image

where f{·} defines a probability distribution function, on the integers, and has a finite number of parameters. The location parameter is assumed to be a known function μ(·) of a linear combination of the explanatory variables plus any offset terms, η=Xβ+O, and any further nuisance parameters are given by ψ. In our motivating example, f is the negative binomial distribution, μ is the exponential function and ψ is a single nuisance parameter representing overdispersion. If all the Di,j, and consequently all the Yi,j, were observed then we can define the likelihood function to be

equation image

In contrast, consider the case where we observe (dropping the i-suffix) X1,X2 and X3, the explanatory variables on visits 1, 2 and 3, but we observe only D1 and D3 with the second visit missing. In this case we can only infer that 0≤Y1D3D1, 0≤Y2D3D1 and Y1+Y2=D3D1. Provided that the observation pattern corresponds to the Ys being missing at random (Rubin, 1976), the likelihood function can be written as

equation image

where ηj=Xjβ+Oj. The fact that the random variable Y2 has a finite and discrete sample space means that we can calculate the likelihood accurately, rather than having to approximate an integral, which would be the case if the state space was continuous.

Where there is a sequence of multiple missing observations, it is helpful to think of the idea of a multistate model (Cox and Miller, 1965; Billingsley, 1981). Define the states to be the number of damaged joints. If we observe X1,…,Xk−1 and only D1 and Dk then we know that the unobserved sequence of states must be an increasing sequence starting at D1 and finishing at Dk. Modelling assumptions provide us with the transition probabilities. These are defined to be the matrices Pj,1≤jk−1, with elements An external file that holds a picture, illustration, etc.
Object name is rssc0058-0369-mu3.jpg (r,c meaning ‘row–column’). An external file that holds a picture, illustration, etc.
Object name is rssc0058-0369-mu4.jpg is the probability that the patient goes to state c at visit j+1 conditional on their being in state r at visit j, so

equation image

For the first transition, P1 is a row vector, since we have observed that r=D1, rather than a range of possible values. For the final transition, Pk−1 is a column vector, since we have observed that c=Dk. For the remaining values of j (1<j<k−1) Pj is a square matrix with DkD1+1 rows and columns, since r and c both can take values from {D1,D1+1,…,Dk−1,Dk}. The likelihood is the matrix product,

equation image

which is a scalar number since the first and last matrices are row and column vectors respectively.

Hitherto, we assumed that Xj did not include any of the values of the response variable observed at current or previous visits. When calculating the transition probability matrix Pj for a set of parameter values, this allows us to calculate μ(ηj) just once, and then the vector of values of f{·} can be calculated for just the first row of Pj, and then reused (without any further calculations) in the subsequent rows. This helps to ease the computational cost. However, if we want to include functions of the current, but potentially unobserved, value of Dj at transition j, which is denoted as Z(Dj), then an extension of equation (2) is required. A motivation for this is to take account of correlation. In this case, we modify the location parameter μ(ηj) to become

equation image

and modify equation (2) to be

equation image

The computational cost of this is that to calculate Pj now requires a separate evaluation of f{·} with different arguments for every non-zero entry of Pj.

We have defined the likelihood contribution for every sequence of contiguous missing values that are bounded by observed values. Two or more successive observed values give a simple product of probabilities, similar to equation (1). If the patient's records finish with a sequence of contiguous missing values then these observations are discarded since they give no information. The complete likelihood for the entire data is the product over visits and patients of all such terms. The complete likelihood is then used to derive maximum likelihood estimates or a posterior distribution if combined with prior distributions on the parameters. If the likelihood is maximized by using a Newton-type method (Dennis and Schnabel, 1983) the approximation to the Hessian matrix is used to estimate the covariance matrix for the parameter estimates and subsequently to form confidence intervals that are based on the asymptotic normality of maximum likelihood estimates.

3. Simulation

For our simulations, we generated independent outcome variables Y from a negative binomial distribution, with dispersion parameter 1, and with the linear predictor 1+X β. The coefficient β took values 1 or 2; the explanatory variable X was a binary variable with values 0 or 1. This implicitly assumes equidistant observations. Our first simulation held X constant within a patient and hence is comparable with a randomized control trial. This situation acts as a validation for the simulation in that little gain can be expected when explanatory variables are not time varying. We generated data for 10 ‘patients’, five of whom had X=1, and each patient had 10 observations. The situation of 10 patients is not of practical interest but is used to illustrate the nature of possible effects and a more realistic scenario is examined subsequently. To study the effect of the amount of missingness, we selected at random, without replacement, a sample of fixed size from the integers {2,3,…,9}. These observation numbers were then deleted from the simulated data, and only the sum of the relevant Ys retained (we ensured that the first and last observations were never deleted). The simulated data were then analysed in three ways: using all the original data and standard software to give a benchmark for comparison, using the missing data and standard software where the extra information that is provided by additional observations of explanatory variables was just ignored and using the missing data and the new methods of Section 2.

For each model specification, 500 simulations were undertaken and the maximum likelihood estimates and associated 95% confidence intervals were recorded. From these, the mean-squared error, the bias and coverage of the three parameter estimates (intercept, β and dispersion) were estimated. Table 1, part (a), presents selected results of these simulations, tabulated by the number of missing observations and by method.

Table 1
Simulation results

As expected, all three methods have minimal bias and adequate coverage for the location term and the regression coefficient which should be estimated consistently by all methods. The same patterns were seen for both values of β and therefore results only for β=1 are presented here and subsequently. If data were simulated according to a Poisson distribution, and Poisson regression models were used, the naive method and our likelihood method would coincide since the explanatory variable x is constant within patients and the sum of Poisson distributions is Poisson. The negative binomial distribution can be considered as a Poisson distribution with a multiplicative random effect whose variance is a function of the dispersion parameter. Hence it is not surprising that the naive method performs reasonably well for the coefficients (when associated with patient constant explanatory variables) but not the dispersion. Thus, although the naive method is biased for estimation of the dispersion parameter of the generating negative binomial, it does provide consistent estimation of the appropriate dispersion parameter for the naive analysis. The difference between these dispersion parameters increases as the number of missing observations increases. With observations that are not equidistant, the naive method would also involve some lack of fit since the sum of negative binomials would not be a negative binomial in this situation but the effect might be minimal in practice.

Tables 1, part (b), and 2, part (a), present the results of additional simulation exercises. These were similar to the first two; however, the difference was that the explanatory variable x varied within patients. This is the situation in which the naive method has the potential to give misleading results. The set of explanatory variables was simulated according to the distributions

equation image
equation image

The parameter p was set to 0.8 to generate a set of X-values that had a positive dependence, and with p=0.2 for a negative dependence. These two patterns of explanatory variables were then held constant over 500 simulations.

The results demonstrate that the naive method performs poorly for all parameters, with increasing bias with increased missingness. Our likelihood method is unbiased and gives good coverage. The effect of the missingness is to increase the mean-squared error.

Given that the simulations with a negative dependence produced the largest bias in the naive method, we choose to consider a large sample simulation using the same ‘true’ model. We repeated the simulation but with 100 patients rather than 10, each patient with 10 observations and with the same patterns of missingness as used previously. The results are shown in Table 2, part (b).

Table 2
Simulation results: explanatory variable with negative dependence

The results are similar to those for the smaller sample size. The bias and mean-square error are similar to those in Table 2, part (a), for all three parameters in the naive method but the bias, mean-squared error and coverage for our likelihood method and the full data become very close with increased sample size. The coverage for the naive method becomes worse, since the standard errors become smaller with increased sample size, but the bias is of order 1.

This simple simulation study illustrates that the methodology in Section 2 is of value primarily when explanatory variables are time varying. Furthermore, the effect of ignoring updates of such variables that are available from visits for which an outcome measure is not taken depends on the extent of mismeasurement of the explanatory variables that this induces.

4. Informative observation

The previous section was based on an implicit assumption that the pattern of the missing count data was not informative and thus missing at random. To generalize the approach, the observed data can be regarded as coarsened data (Heitjan and Rubin, 1991) in the sense that what is observed is a subset of the complete-data sample space that is defined by all observed and unobserved Yi,j-values. A similar view was adopted by Shardell et al. (2007) for longitudinal observation of discrete event time data. Non-informative observation can then also be termed coarsening at random.

To allow coarsening not at random, we require some model of the observation process. Let Gi,j be a variable that describes the level of coarsening. Further, let h{gi,j|yi,j,xi,j,z(di,j),γ} be the probability function for Gi,j conditional on the joint counts and explanatory variables at visit j and with unknown parameters γ. Then the likelihood development of Section 2 will be unchanged except that all occurrences of f{r,μ(η),ψ} will be replaced by a product of the form f{·} h{·}.

We let Gi,j be a binary variable that takes the value 1 if the increment in damaged joint count Yi,j is observed and 0 otherwise; hence

equation image

In principle, the observation process could be modelled in other manners, e.g. by a model for the width of the interval in which Y is observed to lie. Again, this could be simply incorporated into the development of Section 2. However, when considering informative observation in the following section, we shall restrict attention to a binary observation variable and further simplify, for illustration purposes, by allowing observation to depend only on the increment in the joint count Y. It is convenient then to adopt the logistic model

equation image

This is the selection model for a dropout process that was suggested by Diggle and Kenward (1994) where a non-zero value for the regression coefficient γ1 corresponds to a departure from an assumption of missingness at random. The identifiability of such models is highly dependent on distributional assumptions (Fitzmaurice et al., 1996). Also, the effect of missingness that depends only on outcome is expected to be limited with discrete data, being non-existent in the case of logistic regression (Farewell, 1979). We adopt the model represented by equation (4) here simply to illustrate the potential to extend the methods in Section 2. Adapting the methods to pattern–mixture models (Little, 1993) would also be of interest.

5. Application

The starting point for the results that are reported here was the negative binomial model that was presented by Bond et al. (2007) in their Table 4. The use of a negative binomial model is motivated by overdispersion in the data, particularly linked to an excess of 0s compared with a Poisson model. The model was developed from the psoriatic arthritis data that were introduced in Section 1. A summary of the characteristics of the patients at their initial clinic visits is given in Table 3. The model was for the increment in the number of damaged joints, determined clinically, between clinic visits. On the basis of a process of variable selection, it modelled the relationship between the log-relative-damage rates and explanatory variables: sex, age, time in clinic, initial erythrocyte sedimentation rate (ESR) current number of tender joints, current number of swollen joints, medication (none, non-steroidal anti-inflammatory drugs (NSAIDs), disease modifying anti-rheumatic drugs (DMARDs) and steroids). These correspond to the explanatory variables X as defined in Section 2 and most of these variables are time varying so there is the potential for bias in the estimation of regression coefficients if outcome data are not available at each clinic visit. The model also estimated relative rates that are associated with the current damage count (corresponding to the dynamic covariates Z(D) which were referred to in Section 2), in particular with a categorization of the damage count (0, [1–4], [5–9] and greater than 9), and an interaction variable being the product of the length of the disease and an indicator of whether the patient has any damaged joints. The offset term Oj was the logarithm of the time interval between observations j and j+1; hence the coefficients estimate the relative damage rate (on a log-scale). Here we consider models with the same explanatory variables and dynamic covariates as chosen by Bond et al. (2007) but which are estimated from different subsets of the data.

Table 4
Estimated log-relative-risks and dispersion (βX, βZ, θ) for model (a)
Table 3
Baseline characteristics of the data

The specific form of the model can be written as

equation image
equation image

where Xi,j and Zi,j are row vectors of the covariates described above for patient i at visit j, which occurred at time ti,j; βX and βZ are vectors of coefficients to be estimated and θ is the dispersion parameter to be estimated.

Our analysis incorporates the interpatient correlation of the observations from one patient through the use of dynamic covariates, i.e. the current damage count and a related interaction. Aalen et al. (2004) have discussed the practical use of this approach. Alternative routes could include using random effects (McCulloch and Searle, 2001), or using generalized estimating equations (Liang and Zeger, 1986). These are discussed briefly in Section 6. Use of the current damage count has the practical advantage of being immediately interpretable to a clinical audience. The implicit assumption is that, conditional on the current damage count, the next increment is independent of previous increments. In previous work (Bond et al., 2007), the model that is represented by equations (5) and (6) was fitted and standard errors of the regression coefficients were estimated by using the generalized estimating equation methodology with an exchangeable working correlation. The estimates and standard errors were very similar to those obtained by using regular maximum likelihood estimation. This is consistent with Cox and Snell (1981), page 23, who advised that it is sensible to be most critical about primary aspects of a problem but simplifying assumptions can be made with regard to secondary aspects. Here, primary interest is in the marginal effects of the explanatory variables, and some reasonable attempt to capture any correlation structure should be adequate.

We examine a series of models that should produce roughly comparable estimates.

  1. The first model is based on the clinical damage counts for joints in the hands and feet and uses data from all clinic visits. Here this can be regarded as the ‘correct’ model since it is based on all the relevant information. It is fitted by using standard software.
  2. This model takes the data that are used in model (a) and deletes all the variables on visits where no radiological observations were made. Estimates are obtained by using standard software, ignoring the updated covariates. Hence it is the naive method of estimation.
  3. This model takes the data that are used in model (a) but deletes just the outcome variable for visits where no radiological observations were made. Estimates are obtained by using our method where the covariates that depend on the unobserved outcome variable, Z(D), are calculated as in equation (3).
  4. The penultimate model uses the radiological damaged joint counts and a comparison of it with model (c) provides an indication of how model-based conclusions might differ depending on whether clinical or radiographical damaged joint counts are used as the outcome variable.
  5. The final model generalizes model (d) by incorporating a logistic model for the probability of a radiological count being available in the form of equation (4).

Table 4 shows the results of fitting model (a). The regression coefficients reflect the importance of the various explanatory variables on the development of clinical damage. Fig. 1 then shows 95% confidence intervals and estimates for the 18 regression coefficients, the elements of β, and the dispersion parameter for models (a)–(e). Each box refers to a different regression coefficient and there are five error bars for each of the five models that were fitted. There are no obvious patterns in the regression coefficients, either in terms of the estimates or the standard deviations. The dispersion parameter does appear to become larger as the amount of data that is used becomes less and therefore more uncertainty is introduced. It is also larger for the model based on radiological damaged joint counts, model (d), than for the comparable model based on clinical damage, model (c). Comparison of the lengths of the confidence intervals from models (b) and (c) indicates that there is no consistent pattern across regression coefficients.

Fig. 1
Comparison of estimated parameters and 95% confidence intervals: models (a)–(e) are from left to right

The logistic parameters, γ0 and γ1, in model (e) were estimated as −1.06 and 0.34 with standard errors 0.04 and 0.03. Thus there is highly significant evidence that observation depends on the increase in damaged joint count. From Fig. 1, however, it is clear that the coefficients that are estimated for models (d) and (e) are very similar. Thus, for these data, little bias appears to have been introduced by the adoption of a coarsening at random assumption in model (d).

From this example, no consistent evidence emerges that the more frequent updating of explanatory variable information that is enabled by the likelihood methodology of Section 2 is dramatically superior to simpler analyses. In addition, our estimates broadly agree with previous studies of these data (Gladman and Farewell, 1999; Gladman et al., 1994).

6. Discussion

In this paper we have illustrated how to implement likelihood-based inference for a regression model for longitudinal data when an outcome variable is only partially observed. The method is applicable to maximum likelihood estimation when the outcome variable is the increment in a cumulative, discrete quantity that can be missing at specific times of observation but when the time varying explanatory variables are available at all observation times. Generalization to account for informative observation is relatively straightforward. The choice of probability distribution is arbitrary as long as the sample space is discrete.

As pointed out by a referee, in the general context of regression models for longitudinal data, we cannot expect to improve a likelihood-based complete-case analysis if outcomes are missing but missingness at random holds. Here, the situation is special in that, in spite of the missing data, there is some information on the outcome because the sum over several outcomes is known.

In the example, our choice of the negative binomial can be regarded as a Poisson distribution with a gamma-distributed multiplicative random effect. We could generalize the assumption of conditional independence between increments, given current damage counts, by use of a Poisson model with a gamma-distributed random effect that is common within a patient. To use the multistate model framework to cope with the missing outcomes, we would have to factorize the joint likelihood of all measurements on a patient, which is available in a closed form, into a product of conditional likelihoods:

equation image

After some algebra, it can be shown that

equation image

which is very similar to equations (5) and (6), with equation (6) remaining the same, but with θi and λi obeying the recursive relationships

equation image

Hence such a random-effects model could be implemented in principle and is a topic for future research.

In contrast, it is not apparent how to combine the multistate model approach to the missing outcomes, which is a full likelihood approach, with a generalized estimating equation approach (Liang and Zeger, 1986) that uses a working correlation matrix other than the independent correlation matrix. Even the use of a working independence assumption is computationally complex. Hence we have not pursued the generalized estimating equation approach.

Our method was developed in response to a specific application: the analysis of data from psoriatic arthritis patients. In this setting, the uncertainty in how best to measure joint damage required the analysis of radiological damage information which is not observed at each clinic visit. Our software was written by using the R programming language (Ihaka and Gentleman, 1996) and can be downloaded from. The download also contains a subset of the data consisting of the response and a few of the covariates. Computational efficiency might be improved with alternative programming packages.

Our simulation studies focused on the negative binomial distribution, which is suitable for count data with overdispersion. It shows that the naive method that ignores the missing observations is biased in estimating regression coefficients when the explanatory variables vary over the observations with missing outcomes. This bias does not improve with larger sample size. However, our likelihood method is consistent and has accurate confidence interval coverage error.

The practical implications of the incorporation of updated explanatory variable information may vary from application to application, depending primarily on the strength of relationships with explanatory variables and, most importantly, the amount of variation over time in the explanatory variables. With our psoriatic arthritis data, the effect was shown to be minimal, which is itself of considerable importance in strengthening the findings of Bond et al. (2007) that demonstrated that the use of clinical measures of damage provided similar information regarding progression of disease as would be provided by the use of radiological measures. Consideration of other realistic settings with comparable characteristics would be informative.


We thank Professor D. Gladman and the staff at the Psoriatic Arthritis Clinic in Toronto for their support of this work and the Joint Editor, Associate Editor and referees for very helpful comments. Funding was provided by the Medical Research Council (UK), grant U.1052.00.009.


  • Aalen OO, Fosen J, Weedon-Fekjaer H, Borgan Ø, Husebye E. Dynamic analysis of multivariate failure time data. Biometrics. 2004;60:764–773. [PubMed]
  • Billingsley P. Statistical Inference for Markov Processes. Chicago: Chicago University Press; 1981.
  • Bond SJ, Farewell VT, Schentag CT, Gladman DD. Predictors for radiological damage in psoriatic arthritis: results from a single centre. Ann. Rheum. Dis. 2007;66:377–388. [PMC free article] [PubMed]
  • Cox DR, Miller AD. The Theory of Stochastic Processes. London: Chapman and Hall; 1965.
  • Cox DR, Snell EJ. Applied Statistics: Principles and Examples. London: Chapman and Hall; 1981.
  • Dennis JE, Schnabel RB. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. New Jersey: Prentice Hall; 1983.
  • Diggle P, Kenward MG. Informative drop-out in longitudinal data analysis (with discussion) Appl. Statist. 1994;43:49–93.
  • Farewell VT. Some results on the estimation of logistic models based on retrospective data. Biometrika. 1979;66:27–32.
  • Fitzmaurice GM, Heath AF, Clifford P. Logistic regression models for binary panel data with attrition. J. R. Statist. Soc. A. 1996;159:249–263.
  • Gladman DD, Farewell VT. Progression in psoriatic arthritis: role of time varying clinical indicators. J. Rheum. 1999;26:2409–2413. [PubMed]
  • Gladman DD, Farewell VT, Nadeau C. Clinical indicators of progression in psoriatic arthritis: multivariate relative risk model. J. Rheum. 1994;22:675–679. [PubMed]
  • Hanley JG, Russell ML, Gladman DD. Psoriatic spondyloarthropathy: a long term prospective study. Ann. Rheum. Dis. 1988;47:386–393. [PMC free article] [PubMed]
  • Heitjan DF, Rubin DB. Ignorability and coarse data. Ann. Statist. 1991;19:2244–2253.
  • Ihaka R, Gentleman R. R: a language for data analysis and graphics. J. Computnl Graph. Statist. 1996;5:299–314.
  • Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
  • Little RJA. Pattern-mixture models for multivariate incomplete data. J. Am. Statist. Ass. 1993;88:125–134.
  • McCullagh P, Nelder JA. Generalized Linear Models. 2nd edn. London: Chapman and Hall; 1989.
  • McCulloch CE, Searle SR. Generalized, Linear, and Mixed Models. New York: Wiley; 2001.
  • Rubin DB. Inference and missing data. Biometrika. 1976;63:581–592.
  • Shardell M, Scharfstein DO, Bozzette A. Survival curve estimation for informatively coarsened discrete event-time data. Statist. Med. 2007;26:2184–2202. [PubMed]

Articles from Wiley-Blackwell Online Open are provided here courtesy of Wiley-Blackwell, John Wiley & Sons