Often clinical studies periodically record information on disease progression as well as results from laboratory studies that are believed to reflect the progressing stages of the disease. A primary aim of such a study is to determine the relationship between the lab measurements and disease progression. If there were no missing or censored data, these analyses would be straightforward. However, often patients miss visits, and return after their disease has progressed. In this case, not only is their progression time interval-censored, but their lab test series is also incomplete. In this paper, we propose a simple test for the association between a longitudinal marker and an event time from incomplete data. We derive the test using a very intuitive technique of calculating the expected complete data score conditional on the observed incomplete data (CEST). The problem was motivated by data from an observational study of patients with diabetes.
In studies of chronic diseases such as cancer and AIDS, patients are monitored for clinical or laboratory measurements (or markers) as well as events that are known to be associated with declining health and an increased risk of death. Understanding the pattern of these markers and events is important for the clinical management of individual patients, for deepening understanding of the natural history of the disease, and for the design and analysis of clinical trials of new therapies. The statistical analysis of such data is common, with standard methods such as the Cox proportional hazards model with a time-varying covariate. However, the analysis may be complicated by the fact that patients miss visits, resulting in incomplete information on when an event occurred (interval-censored data), on the values of serially collected clinical or laboratory measurements that are important for prediction of the event, or both. The simple approach of analyzing only complete data can be biased, so it is necessary to handle the missing data appropriately.
A recent example of this type of study involved a group of patients at the Joslin Diabetes Center who were followed for evidence of change in albumin excretion, which is an indicator of diabetic kidney disease (proteinuria) (Perkins et al., 2003; Ficociello et al., 2007). The investigators were interested in determining laboratory markers that could indicate which patients were at elevated risk for renal (kidney) disease. At scheduled visits, several lab tests were performed, including a screening for the presence of an excess of serum proteins in the urine, or proteinuria, which can be a sign of renal disease. One aim of the study was to determine whether blood sugar control was associated with an elevated risk of proteinuria. As a measure of diabetes control, levels of glycohemoglobin A1c (A1c) were recorded from the blood specimen. The investigators were interested in whether patients who had poorer diabetes control were at greater risk for subsequent proteinuria (renal disease). This is depicted in the top row of Figure 1. There were many patients who missed one or more clinic visits, and returned with evidence of proteinuria. When patients missed clinic visits, neither A1c nor urine protein levels (used to determine the occurrence of the event of proteinuria) were available. Thus, as indicated in the bottom row of Figure 1, if a patient missed a visit at time j, and returned with proteinuria at time j + 1, then the A1c for visit j was unavailable for predicting the onset of proteinuria at time j + 1; also the onset of proteinuria would be interval censored. Therefore, a routine Cox model with a time-varying covariate could not be used for the analysis.
Similar data arise in AIDS studies as well. For example, the AIDS Clinical Trials Group (ACTG) study 181 (Finkelstein et al., 2002) periodically monitored the blood and urine of patients for evidence of CMV disease. The investigators were interested in whether the CD4 count from the blood sample could predict the risk of future CMV disease. A missed lab visit resulted in both missing CD4 counts and missing CMV marker data, which provide the evidence of CMV disease.
Methodology has been proposed for the estimation and regression analysis of the effect of baseline predictors on interval-censored failure time data (see Finkelstein, 1986 and Goggins et al. 1998). In addition, there are methods for regression of failure time data with missing longitudinal covariates, such as Goggins et al. (1999), which considered the case of a missing time-varying binary covariate (resulting in interval censored covariate data). Others have developed methods for analysis of data with missing covariates for the case when the failure times are right-censored, relying on joint modeling of the covariate and failure time distributions; these are well summarized in Hogan and Laird (1997). Computational methods for these models can be intensive, particularly when a Bayesian approach is taken (Guo and Carlin, 1994). While these methods are appropriate for right censored failure time data, they cannot be readily extended to interval censored data. The goal of this paper is to develop a simple and intuitive test for the relationship between the longitudinal covariate and a failure time when the time-varying covariates can be missing and the failure times are interval censored.
To develop this test we apply a powerful yet simple technique for deriving a score test in the presence of incomplete data which utilizes the fact that the score based on incomplete data is equal to the expected value of the complete data score conditional on the observed data. This allows a particularly simple derivation of the test statistic. This fact was noted by Efron in the discussion of Dempster, Laird and Rubin (1977). We call this the conditional expected score test (CEST) and note that while the longitudinal/event time application is an illustration of this technique, it provides a paradigm for developing tests for a broader class of missing data problems.
In Section 2, we will first introduce a score test for the relationship between an observed longitudinal covariate and an event for the case when all data are completely observed. Next we will introduce the random effects model to handle the problem that the covariate value may be missing for some subjects at some times. We then apply CEST to extend the score test to handle the case where some visits are missed, resulting in interval-censored event (and missing covariate) data. We later extend the test again to the case where the covariate is measured with error, and the trajectory of the covariate is the appropriate predictor of failure rather than the observed covariate. In Sections 3 and 4, we provide simulations and an application of our methods to the analysis of a study of renal disease in type 1 diabetes. We discuss possible extensions in Section 5.
Suppose that we completely observe the data for all subjects. We will let Ti be the time at which patient i has an event, for i = 1, …, n, and let tj, for j = 1, …, M, be pre-determined, equally spaced clinic visit times at which the patients are monitored. Let δij be an indicator of whether patient i failed in the interval since the previous clinic visit at tj–1, let Yij be an indicator of whether they were in follow-up (and at risk) at time j, and let Zij be their covariate at the previous clinic visit, which will be used to predict failure in the jth interval.
To model the relationship between the covariate at the previous visit and subsequent failure, we will utilize a generalized person-year approach described in Cupples et al. (1988), by which observations over multiple time intervals are pooled into a single sample, and logistic regression is used to relate the risk factors to the occurrence of the event. When the grouping intervals between exams are short, this Pooling Repeated Observations (PRO) method is asymptotically equivalent to a grouped Cox model with a time-dependent covariate (D’Agostino et al., 1990). In our application, the response variable of the logistic model will indicate failure as a function of the covariate at the previous time, zij:
where pβ(zij) = p(δij = 1 | zij, Ti > tj–1) is the conditional probability of observing an event in the jth interval given that the individual was event-free through time tj–1, and zij is the (possibly vector-valued) covariate measured at time tj–1. For the PRO model, the intercept αj = α is assumed to be constant across time, as suggested by the investigator. This can be relaxed, as we describe in the discussion section of this paper. In all our applications, we will assume that failure is an “absorbing” state and note that patients are no longer under observation after they fail or complete all the visits. Also, we will assume that the determination of δij is made prior to the measurement of Z at time j.
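To make the person-interval data structure concrete, the pooling step can be sketched as follows. This is a minimal illustration, not the authors' code; the function and variable names (`expand_pro`, `fail_visit`) are ours.

```python
# Sketch of the Pooled Repeated Observations (PRO) expansion: each subject's
# follow-up is split into one row per visit interval, and the rows are pooled
# across subjects for an ordinary logistic regression of delta_ij on z_ij.

def expand_pro(n_visits, fail_visit, covariates):
    """Return (Y_ij, delta_ij, z_ij) rows for one subject.

    fail_visit : 1-based index of the interval in which the subject failed,
                 or None if right censored at the last visit.
    covariates : covariate value measured at each visit; the covariate
                 predicting interval j is the value at the previous visit.
    """
    rows = []
    last = fail_visit if fail_visit is not None else n_visits
    for j in range(1, last + 1):
        delta = 1 if (fail_visit is not None and j == fail_visit) else 0
        z = covariates[j - 1]          # covariate from the previous visit
        rows.append((1, delta, z))     # Y_ij = 1: at risk during interval j
        if delta:
            break                      # failure is an absorbing state
    return rows
```

A subject who fails in the third interval contributes three rows, the last with δ = 1; a right-censored subject contributes a δ = 0 row for every interval under observation.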
We can now write the log-likelihood for the data as a sum of Bernoulli contributions over subjects and visit intervals, ℓ(α, β) = Σi Σj Yij {δij log pβ(zij) + (1 − δij) log[1 − pβ(zij)]}.
The numerator of the score test of the hypothesis H0 : β = 0 (for complete data) would be given by U = Σi Σj Yij (δij − p̂) zij, where p̂ = Σi Σj Yij δij / Σi Σj Yij is the MLE for p, the probability of failure at each time under H0 : β = 0.
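With complete data this numerator has the observed-minus-expected form discussed later in the paper, and it is simple to compute directly. The sketch below is ours (illustrative names), assuming p̂ is the pooled failure fraction over person-intervals.

```python
def score_numerator(records):
    """records: list of (Y_ij, delta_ij, z_ij) person-interval rows.

    Returns the complete-data score numerator
        U = sum_ij Y_ij * (delta_ij - p_hat) * z_ij,
    where p_hat = (total failures) / (total person-intervals at risk) is the
    MLE of the common per-interval failure probability under H0: beta = 0.
    """
    at_risk = [(y, d, z) for (y, d, z) in records if y == 1]
    p_hat = sum(d for (_, d, _) in at_risk) / len(at_risk)
    return sum((d - p_hat) * z for (_, d, z) in at_risk)
```

As a sanity check, a constant covariate gives U = 0, since the observed and expected failure counts then cancel exactly.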
Now suppose that the value of the covariate is missing for some subjects at some times. We will assume the covariate zij follows a linear growth curve model with random effects as in Wulfsohn and Tsiatis (1997). More specifically, we assume
where the error eij follows a mean-zero normal distribution with cov(eij, eij’) = 0 for j ≠ j’, and the error is independent of the random intercept and slope, θ0i and θ1i. In addition, we assume that θi ~ MN(θ, V), where θi = (θ0i, θ1i)’, θ = (θ0, θ1)’, and V is the 2 × 2 covariance matrix of the random effects.
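The structure of this growth-curve model is easy to see by simulating from it. The sketch below assumes normally distributed errors; all names (`simulate_growth`, `theta`, `V`, `sigma2`) are ours.

```python
import numpy as np

def simulate_growth(n, times, theta, V, sigma2, rng):
    """Simulate z_ij = theta0_i + theta1_i * t_j + e_ij for n subjects,
    with (theta0_i, theta1_i) ~ MN(theta, V) and e_ij ~ N(0, sigma2),
    errors independent of the random effects and of each other."""
    times = np.asarray(times, dtype=float)
    effects = rng.multivariate_normal(theta, V, size=n)      # n x 2 random effects
    mean = effects[:, [0]] + effects[:, [1]] * times         # n x M subject lines
    return mean + rng.normal(0.0, np.sqrt(sigma2), size=mean.shape)
```

With the variance components set to zero, every subject's series lies exactly on the common line θ0 + θ1 t, which is a useful degenerate check.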
We will assume that the measured covariate value predicts the event, but it may be missing for some people at some observation times, and call this Model O (Observed). Later in the paper, we will consider an alternate paradigm for which we will assume that the covariate is measured with error, and that the trajectory (from the random effects model) predicts the failure, called Model T (trajectory). In this case, the true value of the covariate is unobserved for all subjects at all times, and these values are treated as missing data. In both cases, as we will show later, we can use the predicted values obtained from the above random effects model to impute the missing covariates.
We now consider the problem that the event data can be censored into intervals bracketed by the missed visits. Suppose that the ith subject missed visits after time tLi, came back at tRi, and that δiLi = 0 and δiRi = 1. In this case, δij is missing for Li + 1 ≤ j ≤ Ri – 1. Also, because the event is an absorbing state, if any subject misses a visit but returns at tk still event-free, then we assume δij = 0 for all j ≤ k. Let Mi be the index of the last time subject i was observed. We define δi· = maxj δij, which is an indicator of whether the subject ever failed, or left follow-up without an event. If the subject failed while in follow-up, then δi· = 1 and Ri = Mi. If the subject has not failed by their last visit at Mi = Li, then they are right censored, δi· = 0, and Ri = ∞. We note that whenever a subject misses a visit, the covariate measured at that time (used to predict the subsequent event) will be missing as well. However, because we can assume that the covariate follows the linear growth curve model discussed in the previous section, it will be possible to use the approach described in Section 2.2 in a straightforward way to handle missing observed covariate values. We will assume that our data for each subject consist of the values of z up until Mi, their last observed time (although some may be missing). Also, the failure time information is captured by the time indices Li and Ri; if a subject misses a visit and returns without the event, then their event status is assumed unchanged during the interval of the missed visit. We assume that any missing data are missing at random (MAR).
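The bookkeeping that maps a subject's attendance record to (Li, Ri, δi·) can be sketched as follows; the function name and the dictionary representation of visits are ours, not the paper's.

```python
import math

def censoring_interval(status):
    """status: dict mapping visit index -> delta (0/1) for visits actually
    attended; missed visits are simply absent. Returns (L, R, delta_dot):
    L = last visit observed event-free before failure (or the last visit, if
    right censored), R = first visit at which failure was observed (math.inf
    if never observed to fail), delta_dot = 1 if the subject ever failed."""
    observed = sorted(status)
    fail_visits = [j for j in observed if status[j] == 1]
    if fail_visits:
        R = fail_visits[0]
        L = max(j for j in observed if j < R and status[j] == 0)
        return L, R, 1
    return observed[-1], math.inf, 0
```

For a subject seen event-free at visits 1 and 2 who skipped visits 3 and 4 and returned failed at visit 5, this yields the censoring interval (2, 5]; adjacent indices (R = L + 1) correspond to an exactly observed failure interval.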
We will derive the score test for the association between the covariate and event time first directly from the likelihood. We will then show the result could be derived by utilizing the technique we have called the conditional expected score test (CEST).
We note that the log of the likelihood for the partially missing data can be written:
where the missing data for subject i consist of the random effects θ0i and θ1i and any unobserved covariate values zij, and
where k indexes a time interval in (Li, Mi]. Note that if δi· = 0, then g is evaluated only at k = Mi, in which case it is the probability that no failure was observed in the patient up to time Mi.
The score for subject i can be written as Ai/Bi, where
Under H0 : β = 0,
where E(zij) is the expectation of zij conditional on all observed zi under the random effects model assumed in the previous section (Section 2.2), and p = eα(1 + eα)−1. To calculate the test statistic in (8), we need to calculate the expected values for the missing data, which we do in the next section. We also need to obtain the MLEs for the nuisance parameters under H0 : β = 0. It can be shown that under H0, (5) can be written as the sum of two terms, one of which depends only on α and involves only the failure status, while the other involves only the covariates. Thus, the estimation of α uses only the failure status data, and the estimation of the covariate model parameters depends only on the observed covariate data.
We now note that the score presented in (8) could actually have been derived from (3) using a general principle for the derivation of the score test from missing data that was noted by Efron in the discussion of Dempster, Laird and Rubin (1977), namely that the score based on incomplete data is the expected value of the complete-data score conditional on the observed data. More formally (using the notation of Dempster et al., 1977), if Sx(φ) is the Fisher score function based on the complete data set x, and Sy(φ) is the score function based on a statistic y(x) (which in our case is the incomplete data y), then Sy(φ) = E{Sx(φ) | y}, the conditional expectation of the complete-data score given the observed data.
We call this method the conditional expected score test (CEST) and note that use of this for the case of missing longitudinal and failure time data simplifies an otherwise complex derivation. We only need to obtain the expectations for the data that are missing. This approach can be applied to a broad class of tests in the context of missing data.
To calculate the conditional expectations in (8), we note first that under H0 : β = 0, (δij, Yij) is independent of (zi, θ0i, θ1i). We need to obtain the expectations of δ and Y conditional on the observed data, E(δij | δ, Y, β = 0) and E(Yij | δ, Y, β = 0), for each subject i who failed during follow-up and whose event is censored in (Li, Ri]. To do this, we obtain the values of the probability of the event occurring at each of the visits in the censoring interval. These quantities are calculated under the null hypothesis H0 : β = 0.
for j such that tj ∈ (Li, Ri], and 0 otherwise, and
Note that wrij = 1 for j ≤ Li and is zero for j > Ri. Also note that for a subject who did not fail while under observation (from times 1, 2, …, Mi), wdij = 0 and wrij = 1 for j = 1, …, Mi. To obtain the MLE for α under H0, we solve the observed data score equation for α:
This estimate can be obtained by substituting the expected values of Y and δ, when these are unknown for interval-censored subjects, into (4); thus the estimate is the value of p that satisfies the self-consistency equation (Efron, 1967):
We do this by an EM-type algorithm: we start with the value of p calculated from the complete data only, and iterate until convergence.
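This self-consistency iteration can be sketched as follows, under the simplifying assumption of a constant per-interval failure probability p (the PRO model's constant intercept). Conditional on failure in (Li, Ri], the failure-time weights are geometric; all names are ours.

```python
import math

def em_null_rate(subjects, tol=1e-12, max_iter=1000):
    """subjects: list of (L, R, delta_dot), with R = math.inf for right-censored
    subjects. Iterates the self-consistency equation for the MLE of a constant
    per-interval failure probability p under H0, filling in expected failure
    weights wd and at-risk weights wr for interval-censored subjects."""
    p = 0.5  # any interior start; the complete-case estimate also works
    for _ in range(max_iter):
        fails, intervals = 0.0, 0.0
        for L, R, d in subjects:
            if d == 0:
                intervals += L          # event-free through the last visit L
                continue
            fails += 1.0                # the failure occurred somewhere in (L, R]
            width = int(R - L)
            norm = 1.0 - (1.0 - p) ** width
            wd = [p * (1.0 - p) ** (k - 1) / norm for k in range(1, width + 1)]
            # visits 1..L contribute one at-risk interval each; within (L, R],
            # the at-risk weight at step j is the tail sum of wd from j onward
            intervals += L + sum(sum(wd[j - 1:]) for j in range(1, width + 1))
        p_new = fails / intervals
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p
```

With only exactly observed failures and right-censored subjects, the weights are degenerate and the iteration returns the complete-data estimate (failures over person-intervals) immediately; genuinely interval-censored subjects pull the estimate to the fixed point of the self-consistency equation.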
To calculate E(zij), we note that:
where the expectations are calculated as noted above. The exact and right-censored individuals contribute the same terms to (15) as they did in (3), while the contribution from each individual i interval-censored in (Li, Ri] is:
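The conditional expectation E(zij) given the observed covariates follows from standard Gaussian conditioning under the growth-curve model. The sketch below assumes the model parameters are known (in practice they would be estimated first); the function name and argument layout are ours.

```python
import numpy as np

def impute_covariates(z, times, theta, V, sigma2):
    """z: one subject's covariate series, with np.nan at missed visits.
    Under the growth-curve model, z ~ MN(mu, Sigma) with mu_j = theta0 +
    theta1 * t_j and Sigma = X V X' + sigma2 * I, where X has rows (1, t_j).
    Missing entries are replaced by their conditional expectation given the
    observed entries (the usual multivariate-normal conditioning formula)."""
    z = np.asarray(z, dtype=float)
    t = np.asarray(times, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    mu = X @ np.asarray(theta, dtype=float)
    Sigma = X @ np.asarray(V, dtype=float) @ X.T + sigma2 * np.eye(len(t))
    obs = ~np.isnan(z)
    if not obs.any():
        return mu.copy()            # nothing observed: fall back to the mean
    out = z.copy()
    if obs.all():
        return out
    S_oo = Sigma[np.ix_(obs, obs)]
    S_mo = Sigma[np.ix_(~obs, obs)]
    out[~obs] = mu[~obs] + S_mo @ np.linalg.solve(S_oo, z[obs] - mu[obs])
    return out
```

When the random-effect variances are zero the covariate values are mutually independent, so a missing value is imputed by its marginal mean, which gives a simple degenerate check.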
Note that the score in (16) can be viewed as a sum of “observed-expected”. In this case, each interval-censored subject contributes to the “observed” value (the first term in (16)) a weighted average of their estimated true covariate values from times j at which their failure could have occurred, with a weight equal to the probability the failure was at that time. They contribute to the “expected” value (the second term in (16)) a weighted average of their estimated true covariate values at the times k they were under observation with a weight equal to the probability that the failure was at or after the time k. The numerator of the score test U = ∑i Ui can also be written
where Dj is the set of individuals who had (or may have had) an event at time j, Rj is the set of individuals who were or may have been under observation just before time j, and, for exact failures, wdij = δij and wrij = Yij. Note that an individual censored into (Li, Ri] will contribute terms to the score at each time through Ri. Under the independence assumption, the scores, conditional on z, are independent, and U has mean zero.
Thus, to calculate the score in (17), we first estimate p from (13), as well as the weights wrij and wdij. We then calculate E(zij) under the random effects model using all observed covariates for all subjects, and substitute these values into (17).
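Once the weights and imputed covariates are in hand, assembling the score numerator is a weighted observed-minus-expected sum over person-intervals. A sketch with illustrative names (the dict-keyed layout is ours):

```python
def score_statistic(wd, wr, zhat, p_hat):
    """wd, wr: dicts mapping (subject, visit) -> expected failure weight and
    expected at-risk weight; zhat: the (imputed) covariate at each key.
    Returns U = sum over (i, j) of [wd_ij - p_hat * wr_ij] * z_ij, the
    observed-minus-expected form of the score numerator. For complete data,
    wd_ij = delta_ij and wr_ij = Y_ij, recovering the complete-data score."""
    keys = set(wd) | set(wr)
    return sum((wd.get(k, 0.0) - p_hat * wr.get(k, 0.0)) * zhat[k]
               for k in keys)
```

For an exactly observed failure the weights are 0/1 indicators, so this reduces term by term to the complete-data score numerator.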
We note that our method could be easily applied to the case where a covariate was measured with error, and the trajectory (from the random effects model) predicts the failure. In this case, we replace zij by θ0i + θ1itj–1 and g becomes:
The argument in the previous section goes through, and the individual score contribution becomes:
When the expectation is taken with respect to θi, conditional on all the z, the score becomes the same as in (15):
Thus, to calculate the score in (15) in this case, we estimate p from (13), as well as the weights wrij and wdij, as before. We then fit the random effects model using all observed covariates for all subjects. The expected trajectory is then calculated from the random effects model and substituted into (15).
The variance of U in (15) could be calculated using the bootstrap method. Alternatively, we can obtain the variance using standard likelihood theory, where we first calculate the empirical Fisher information matrix using the individual scores, and then invert this matrix and pick the appropriate element. More specifically, let Si denote the vector of empirical individual scores for subject i, whose components (including Λi, the scores for the random effects parameters) correspond to the parameters of the model (see Appendix in Supplemental Materials). The Fisher information can be approximated by:
Then the variance of U will be approximately equal to the reciprocal of the (8,8) element of the inverse of the matrix M. The hypothesis test of H0 : β = 0 can be performed using Rao’s efficient score test by comparing U/√var(U) to a standard normal distribution. We note that the formulas given for Model O (where the observed covariate is used when available) work also for Model T (where the covariate is measured with error, so the trajectory is used). The only difference is the value of the covariate that is used. All proofs for the score and its variance hold for both models.
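The final comparison to the normal reference distribution is a one-liner; the sketch below (names ours) computes the standardized statistic and its two-sided p-value via the complementary error function.

```python
import math

def score_test(U, var_U):
    """Rao's efficient score test: Z = U / sqrt(var(U)) is compared to a
    standard normal; the two-sided p-value is 2*(1 - Phi(|Z|)), which equals
    erfc(|Z| / sqrt(2))."""
    z = U / math.sqrt(var_U)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```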
In our simulation study, we generated data similar to the diabetes data set used in our example. For the simulations we used a random effects model; the variances of the intercept and slope were 1.57 and 0.29, respectively, and their correlation was 0.23. The error term had a variance of 2.7. The random effects were generated using the estimates obtained from the diabetes data set, and the events were based on the logistic model with intercept zero and varying slopes. We created data with five visit times. We randomly generated missing visits with a probability of 0, .25 and .5. We ran simulations with p varying from a mean value (over the five times) of .2 to .6, and the slope of the random effects model varying from 0 to 2. We considered both the “observed” model, in which the covariate is filled in with a value predicted by the random effects model only when it is missing (Model O), and the “trajectory” model, in which the covariate is assumed to be measured with error, so the trajectory from the random effects model is used as the covariate (Model T). We used sample sizes of 200 and 500. Simulations were repeated 2000 times. The test size was satisfactory, ranging from .04 to .07 for n = 500; the 95% confidence bounds on these estimates were approximately ±1%. The size was similar regardless of missingness, model, or parameters of the random effects model. The test was slightly anti-conservative, due to an underestimate of the variance of the test statistic. This was more pronounced, as expected, when the smaller sample size (200) was used. We used the asymptotic variance for the simulations, but it may be more appropriate to use a more robust variance (such as the bootstrap variance). We assessed the power of the test at β = .27. These results are displayed in tables provided in the Appendix in the Supplemental Materials of this paper. In general, the power of the proposed test increases as p or the slope in the random effects model increases, and decreases with increasing missingness.
Results are similar for the two models. We also performed the simulations using complete cases only, and found the test to be biased away from the null. This is due to the fact that the patients who fail earlier are less likely to be missing before they fail, and if there is a change in the covariate with time, it will appear to be significantly related to the event even under the null.
Elevation in urinary albumin excretion (to between 30 and 299 µg/min) marks the early stage of diabetic nephropathy termed microalbuminuria. This stage is an early marker of increased risk of advanced diabetic nephropathy, indicating a need for clinical intervention to prevent progression to the more serious state of proteinuria (albumin > 300 µg/min). Microalbuminuria can regress to normal in up to half of the patients.
The Joslin Study of the Natural History of Microalbuminuria in Type 1 Diabetes has been a 10-year observational study of a clinic-based population of young adults (Perkins et al., 2003; Ficociello et al., 2007). This study included a cohort of 381 patients who had evidence of microalbuminuria identified during the 2-year baseline observational period. All patients were scheduled for a baseline and at least four biennial visits at which they underwent clinical examination, and provided urine and blood specimens. Albumin excretion rate and HbA1c measurements were recorded. A1c tends to track over time and its trajectory seems to predict failure (see Figure 2). This study was focused on monitoring early changes in kidney health in type 1 diabetes patients. One aim was to determine the relationship between diabetes control and risk of progressive kidney disease. Hemoglobin A1c (HbA1c) reflects average blood glucose over the past 2-3 months, with high levels associated with poor diabetes control. The investigator was interested in whether previous A1c would predict subsequent onset of proteinuria. Although patients were scheduled for the clinic and lab visits every 2 years, many patients had missed or delayed visits, resulting in missing observations for both the HbA1c and the albumin level that would mark progression in kidney disease. As indicated in Figure 1, if a patient missed visit j, and returned with proteinuria at time j + 1, then the A1c for visit j was unavailable for predicting the onset of proteinuria, and the onset of proteinuria would be interval censored; thus, even in a person-years analysis, this patient would not have data to contribute to the risk analysis for times j or j + 1.
The data set has 136 subjects who had complete data at all clinic visits and never failed during follow-up, 46 who had complete data up until they failed, 181 who did not fail during follow-up and left the study early, and 18 subjects who were interval censored on their failure (missed visits and returned having failed).
An initial analysis of these data was performed by the medical investigator. He had chosen to use a person-years approach, applying the pooled logistic regression (PRO) approach (Cupples et al., 1988; D’Agostino et al., 1990) to simplify the analysis of the longitudinal missing data. He considered it reasonable to assume that only measurements at the most recent previous clinic visit would impact the diagnosis of kidney disease progression at the current visit. Missing observations were left out (complete cases only). This resulted in 1353 subject-visits. For this analysis, using the trajectory model yielded a test statistic of Z = −7.25, and using the observed value of the covariate, Z = −6.35. Next we performed the analysis using our proposed method with imputed values for A1c (under the random effects model) regardless of whether the visit was missed (assuming that A1c is measured with error), resulting in Z = −7.07. When we used the imputed A1c only for intervals in which it was missing, Z = −6.03. Using the bootstrap variance, the value of the test statistic in this latter case was Z = −5.10, confirming that the test statistic based on the empirical variance is anti-conservative. Thus, A1c is highly associated with the risk of kidney disease progression, although the association is weaker than that found using complete observations only. For this application, the test statistic was highly significant in all approaches. Our computer program will be made available on our website http://hedwig.mgh.harvard.edu/biostatistics.
We were interested in testing for the relationship between the time-varying covariate and the failure time when some data were missing, resulting in an interval-censored failure time and incomplete time-varying covariates. We note that for our data, there were only five prescribed visit times at which failure could be determined (discrete failure times), but had the event been recorded in continuous time, it would have been possible to apply these methods by allowing the logistic parameter α to take on distinct values at each of the steps of the Turnbull (1976) empirical estimator. If the sample size and number of failures were sufficient, then the test associated with the logistic PRO model would be asymptotically equivalent to that from the proportional hazards regression model (D’Agostino et al., 1990).
The development of our test was simplified by the use of the conditional expected score test (CEST). This approach could be used for other applications as well, as we have demonstrated in recent work (Finkelstein et al., 2008). There are several potential extensions of our methods. First, we made the assumption in this analysis that the censoring was independent. It would be useful to extend this approach to handle dependent censoring, such as would arise if a patient died prior to the event of proteinuria (see Robins and Finkelstein, 2000, and Finkelstein, Goggins and Schoenfeld, 2002). For the diabetes data discussed above, the investigators indicated that only the most recent value of A1c was likely to impact the event of interest, and that the distribution of A1c over time was unknown. To extend this to a model in which the history of A1c predicts failure, one could relax the stationarity condition (the constant intercept in the logistic model). This would also remove a source of the bias described above for the complete-cases-only analysis. This more general model could also accommodate the situation in which the pre-determined clinic visits were not evenly spaced in time. We could also introduce additional covariates into the model for p(t | z), such as one that captures the slope or previous A1c values. We used a random effects model for the covariate function. Alternatively, if the number of visits is small, we could fit a multivariate normal model. In addition, other methods are available for imputing values of the covariate, such as PROC MI in SAS, which assumes a multivariate normal model for z.
Finally, we note that we have focused here on developing a score test. If estimation were the focus of the analysis, we could have used a full likelihood approach using NLMIXED as described in Vonesh, Greene and Schluchter (2006).
We wish to thank Drs. Adrienne Cupples and Andrzej Krolewski for assistance with the diabetes data analysis, and Manor Ashkenazi and Peter Lazar for programming assistance. This work was partially funded by grants from the NIH: HD05520, CA 74302, CA075971, and DK-41526.
Supplemental Materials The Web Appendix with calculation of variance of the test statistic referenced in Section 2.5, and the tables with results of simulations for size and power of the test referenced in Section 3 are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org.