Time-to-event data in which failures are only assessed at discrete time points are common in many clinical trials. Examples include oncology studies where events are observed through periodic screenings such as radiographic scans. When the survival endpoint is acknowledged to be discrete, common methods for the analysis of observed failure times include discrete hazard models (e.g., the discrete-time proportional hazards model and the continuation ratio model) and the proportional odds model. In this manuscript, we consider estimation of a marginal treatment effect in discrete hazard models where the constant treatment effect assumption is violated. We demonstrate that the estimator resulting from these discrete hazard models is consistent for a parameter that depends on the underlying censoring distribution. An estimator that removes the dependence on the censoring mechanism is proposed and its asymptotic distribution is derived. Basing inference on the proposed estimator allows for statistical inference that is scientifically meaningful and reproducible. Simulation is used to assess the performance of the presented methodology in finite samples.
Censoring; Estimating equations; Discrete survival endpoints; Model misspecification; Robust inference
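Discrete hazard models of the kind named above are conventionally fit by expanding the data into one record per subject per at-risk interval and applying binary regression. The sketch below shows that person-period expansion under assumed data names (`time`, `event`, `z`); it is a generic illustration, not the paper's estimator.

```python
# Sketch: person-period ("long") expansion used to fit discrete-time hazard
# models with standard binary-regression software.  Data layout and names
# (time, event, z) are illustrative assumptions, not from the paper.

def person_period(records):
    """Expand (time, event, covariate) triples into one row per subject per
    discrete interval at risk.  A subject observed to interval t contributes
    rows for intervals 1..t; the binary outcome y is 1 only in the final
    interval and only if the subject failed there."""
    rows = []
    for time, event, z in records:
        for k in range(1, time + 1):
            y = 1 if (event == 1 and k == time) else 0
            rows.append({"interval": k, "y": y, "z": z})
    return rows

# Example: one subject failing at time 2, one censored at time 3.
long_data = person_period([(2, 1, 1.0), (3, 0, 0.0)])
# The expanded rows can then be fed to a logistic regression (continuation
# ratio model) or a complementary log-log regression (grouped proportional
# hazards) of y on interval indicators and z.
```

A complementary log-log link on the expanded data recovers the discrete-time proportional hazards parameterization; a logit link gives the continuation ratio model.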
Competing risks data are routinely encountered in various medical applications because patients may die from different causes. Recently, several models have been proposed for fitting such survival data. In this paper, we develop a fully specified subdistribution model for survival data in the presence of competing risks via a subdistribution model for the primary cause of death and conditional distributions for the other causes of death. Various properties of this fully specified subdistribution model are examined. An efficient Gibbs sampling algorithm via latent variables is developed to carry out posterior computations. The Deviance Information Criterion (DIC) and the Logarithm of the Pseudomarginal Likelihood (LPML) are used for model comparison. An extensive simulation study is carried out to examine the performance of DIC and LPML in comparing the cause-specific hazards model, the mixture model, and the fully specified subdistribution model. The proposed methodology is applied to analyze a real dataset from a prostate cancer study in detail.
Latent variables; Markov chain Monte Carlo; Partial likelihood; Proportional hazards
Standard methods for estimating the effect of a time-varying exposure on survival may be biased in the presence of time-dependent confounders themselves affected by prior exposure. This problem can be overcome by inverse probability weighted estimation of Marginal Structural Cox Models (Cox MSM), g-estimation of Structural Nested Accelerated Failure Time Models (SNAFTM) and g-estimation of Structural Nested Cumulative Failure Time Models (SNCFTM). In this paper, we describe a data generation mechanism that approximately satisfies a Cox MSM, an SNAFTM and an SNCFTM. Besides providing a procedure for data simulation, our formal description of a data generation mechanism that satisfies all three models allows one to assess the relative advantages and disadvantages of each modeling approach. A simulation study is also presented to compare effect estimates across the three models.
A copula model for bivariate survival data with hybrid censoring is proposed to study the association between survival time of individuals infected with HIV and persistence time of infection with an additional virus. Survival with HIV is right censored and the persistence time of the additional virus is subject to interval censoring case 1. A pseudo-likelihood method is developed to study the association between the two event times under such hybrid censoring. Asymptotic consistency and normality of the pseudo-likelihood estimator are established based on empirical process theory. Simulation studies indicate good performance of the estimator with moderate sample size. The method is applied to a motivating HIV study which investigates the effect of GB virus type C (GBV-C) co-infection on survival time of HIV infected individuals.
Association measure; Bivariate survival model; Copula; Current status data; Kendall's τ; Right censored data; Empirical process
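For Archimedean copula families, the association parameter maps to Kendall's τ in closed form, which is why τ appears among the keywords above. The sketch below uses the Clayton family as an illustrative choice; the paper's model and hybrid-censoring scheme are considerably richer.

```python
# Sketch: for an Archimedean copula such as the Clayton family, the
# association parameter theta maps to Kendall's tau in closed form
# (tau = theta / (theta + 2) for Clayton).  The copula family here is an
# illustrative assumption, not the one analyzed in the paper.

def clayton_tau(theta):
    """Kendall's tau implied by a Clayton copula with parameter theta > 0."""
    return theta / (theta + 2.0)

def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta)."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)
```

For example, `clayton_tau(2.0)` equals 0.5, so theta = 2 corresponds to moderately strong positive association between the two event times.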
In many biomedical studies, it is common that, due to budget constraints, the primary covariate is only collected in a randomly selected subset of the full study cohort. Often, there is an inexpensive auxiliary covariate for the primary exposure variable that is readily available for all the cohort subjects. Valid statistical methods that make use of the auxiliary information to improve study efficiency need to be developed. To this end, we develop an estimated partial likelihood approach for correlated failure time data with auxiliary information. We assume a marginal hazard model with a common baseline hazard function. The asymptotic properties of the proposed estimators are developed. The proof of the asymptotic results is nontrivial since the moments used in the estimating equation are not martingale-based and the classical martingale theory is not sufficient. Instead, our proofs rely on modern empirical process theory. The proposed estimator is evaluated through simulation studies and is shown to have increased efficiency compared to existing methods. The proposed methods are illustrated with a data set from the Framingham study.
Marginal hazard model; Correlated failure time; Validation set; Auxiliary covariate
We derive estimators of the mean of a function of a quality-of-life adjusted failure time, in the presence of competing right censoring mechanisms. Our approach allows for the possibility that some or all of the competing censoring mechanisms are associated with the endpoint, even after adjustment for recorded prognostic factors, with the degree of residual association possibly different for distinct censoring processes. Our methods generalize from a single to many censoring processes and from ignorable to non-ignorable censoring processes.
Cause-specific; Dependent censoring; Inverse probability weighting; Sensitivity analysis
In many randomized clinical trials, the primary response variable, for example, the survival time, is not observed directly after the patients enroll in the study but rather observed after some period of time (lag time). It is often the case that such a response variable is missing for some patients due to censoring that occurs when the study ends before the patient’s response is observed or when the patients drop out of the study. It is often assumed that censoring occurs at random, which is referred to as noninformative censoring; however, in many cases such an assumption may not be reasonable. If the missing data are not analyzed properly, the estimator or test for the treatment effect may be biased. In this paper, we use semiparametric theory to derive a class of consistent and asymptotically normal estimators for the treatment effect parameter which are applicable when the response variable is right censored. The baseline auxiliary covariates and post-treatment auxiliary covariates, which may be time-dependent, are also considered in our semiparametric model. These auxiliary covariates are used to derive estimators that both account for informative censoring and are more efficient than the estimators which do not consider the auxiliary covariates.
Informative censoring; Influence function; Logrank test; Nuisance tangent space; Proportional hazards model; Regular and asymptotically linear estimators
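A standard building block for estimators that correct for censoring of this kind is inverse-probability-of-censoring weighting (IPCW): uncensored responses are up-weighted by the inverse of the estimated probability of remaining uncensored. The sketch below implements the generic device under a simple unconditional censoring model with illustrative data names; the paper's semiparametric class is far more general.

```python
# Sketch of inverse-probability-of-censoring weighting (IPCW).  The simple
# unconditional censoring model and data layout are illustrative assumptions.

def km_censoring_survival(times, deltas):
    """Kaplan-Meier estimate of the censoring survival function G(t) = P(C > t),
    treating censoring (delta == 0) as the 'event'.  Returns the left-limit
    step function t -> G(t-)."""
    pts = sorted(set(times))
    surv, g = {}, 1.0
    for t in pts:
        at_risk = sum(1 for ti in times if ti >= t)
        cens = sum(1 for ti, di in zip(times, deltas) if ti == t and di == 0)
        g *= 1.0 - cens / at_risk
        surv[t] = g
    def G(t):
        s = 1.0
        for u in pts:
            if u < t:
                s = surv[u]
        return s
    return G

def ipcw_mean(times, deltas, responses):
    """Weighted average of the responses over uncensored subjects (delta == 1),
    each weighted by 1 / G(T-)."""
    G = km_censoring_survival(times, deltas)
    num = sum(y / G(t) for t, d, y in zip(times, deltas, responses) if d == 1)
    den = sum(1.0 / G(t) for t, d, _ in zip(times, deltas, responses) if d == 1)
    return num / den
```

With no censoring the weights are all one and the estimator reduces to the ordinary sample mean, which is a useful sanity check.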
Widely recognized in many fields including economics, engineering, epidemiology, health sciences, technology, and wildlife management, length-biased sampling generates biased and right-censored data but often provides the best information available for statistical inference. Unlike traditional right-censored data, length-biased data have unique aspects resulting from their sampling procedures. We exploit these unique aspects and propose a general imputation-based estimation method for analyzing length-biased data under a class of flexible semiparametric transformation models. We present new computational algorithms that can jointly estimate the regression coefficients and the baseline function semiparametrically. The imputation-based method under the transformation model provides an unbiased estimator regardless of whether the censoring is independent of the covariates. We establish large-sample properties using empirical process methods. Simulation studies show that under small to moderate sample sizes, the proposed procedure has smaller mean square errors than two existing estimation procedures. Finally, we demonstrate the estimation procedure with a real data example.
Biased sampling; Estimating equation; Imputation; Transformation models
Often in observational studies of time to an event, the study population is a biased (i.e., unrepresentative) sample of the target population. In the presence of biased samples, it is common to weight subjects by the inverse of their respective selection probabilities. Pan and Schaubel (2008) recently proposed inference procedures for an inverse selection probability weighted (ISPW) Cox model, applicable when selection probabilities are not treated as fixed but estimated empirically. The proposed weighting procedure requires auxiliary data to estimate the weights and is computationally more intense than unweighted estimation. The ignorability of the sample selection process in terms of parameter estimators and predictions is often of interest from several perspectives: e.g., to determine whether weighting makes a significant difference to the analysis at hand, which would in turn address whether the collection of auxiliary data is required in future studies; or to evaluate previous studies that did not correct for selection bias. In this article, we propose methods to quantify the degree of bias corrected by the weighting procedure in the partial likelihood and Breslow-Aalen estimators. Asymptotic properties of the proposed test statistics are derived. The finite-sample significance level and power are evaluated through simulation. The proposed methods are then applied to data from a national organ failure registry to evaluate the bias in a post kidney transplant survival model.
Confidence bands; Inverse-selection-probability weights; Observational studies; Proportional hazards model; Selection bias; Wald test
In biomedical studies where the event of interest is recurrent (e.g., hospitalization), it is often the case that the recurrent event sequence is subject to being stopped by a terminating event (e.g., death). In comparing treatment options, the marginal recurrent event mean is frequently of interest. One major complication in the recurrent/terminal event setting is that censoring times are not known for subjects observed to die, which renders standard risk set based methods of estimation inapplicable. We propose two semiparametric methods for estimating the difference or ratio of treatment-specific marginal mean numbers of events. The first method involves imputing unobserved censoring times, while the second method uses inverse probability of censoring weighting. In each case, imbalances in the treatment-specific covariate distributions are adjusted out through inverse probability of treatment weighting. After the imputation and/or weighting, the treatment-specific means (then their difference or ratio) are estimated nonparametrically. Large-sample properties are derived for each of the proposed estimators, with finite sample properties assessed through simulation. The proposed methods are applied to kidney transplant data.
Censoring; Imputation; Inverse weighting; Marginal mean; Multivariate survival analysis; Semiparametric methods
The aim of this paper is to develop a Bayesian local influence method (Zhu et al. 2009, submitted) for assessing minor perturbations to the prior, the sampling distribution, and individual observations in survival analysis. We introduce a perturbation model to characterize simultaneous (or individual) perturbations to the data, the prior distribution, and the sampling distribution. We construct a Bayesian perturbation manifold for the perturbation model and calculate its associated geometric quantities, including the metric tensor, to characterize the intrinsic structure of the perturbation model (or perturbation scheme). We develop local influence measures based on several objective functions to quantify the degree of various perturbations to statistical models. We carry out several simulation studies and analyze two real data sets to illustrate our Bayesian local influence method in detecting influential observations and in characterizing the sensitivity to the prior distribution and hazard function.
Bayesian local influence; Bayesian perturbation manifold; Perturbed model; Posterior distribution; Prior; Survival model
Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.
Asymptotic normality; Censoring indicator; Imputation; Inverse probability weighting; Least squares; Missing at random; Regression calibration
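The synthetic-data approach mentioned above transforms each censored observation so that ordinary least squares can be applied. The sketch below shows the classical Koul-Susarla-Van Ryzin transformation with the censoring survival function taken as known for illustration; in practice it is estimated, and the paper's setting additionally allows the censoring indicators themselves to be missing.

```python
# Sketch of the synthetic-data idea for censored linear regression: each
# observed (possibly censored) response Y is replaced by a synthetic response
# Y* = delta * Y / G(Y), where delta is the censoring indicator and G is the
# censoring survival function, so that E[Y* | X] = E[T | X] and least squares
# applies directly.  G is assumed known here purely for illustration.

def synthetic_responses(ys, deltas, G):
    """Return Koul-Susarla-Van Ryzin synthetic responses."""
    return [d * y / G(y) for y, d in zip(ys, deltas)]
```

Censored observations contribute a synthetic response of zero, while uncensored ones are inflated by the inverse probability of remaining uncensored.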
Nested case–control (NCC) sampling is widely used in large epidemiological cohort studies for its cost effectiveness, but its data analysis primarily relies on the Cox proportional hazards model. In this paper, we consider a family of linear transformation models for analyzing NCC data and propose an inverse selection probability weighted estimating equation method for inference. Consistency and asymptotic normality of our estimators for regression coefficients are established. We show that the asymptotic variance has a closed analytic form and can be easily estimated. Numerical studies are conducted to support the theory and an application to the Wilms’ Tumor Study is also given to illustrate the methodology.
Linear transformation models; Nested case–control sampling; Weighted estimating equation
Recurrent events are frequently encountered in biomedical studies. Evaluating covariate effects on the marginal recurrent event rate is of practical interest. There are mainly two types of rate models for recurrent event data: the multiplicative rates model and the additive rates model. We consider a more flexible additive–multiplicative rates model for the analysis of recurrent event data, wherein some covariate effects are additive while others are multiplicative. We formulate estimating equations for estimating the regression parameters. The estimators of these regression parameters are shown to be consistent and asymptotically normally distributed under appropriate regularity conditions. Moreover, an estimator of the baseline mean function is proposed and its large sample properties are investigated. We also conduct simulation studies to evaluate the finite sample behavior of the proposed estimators. A medical study of patients with cystic fibrosis suffering from recurrent pulmonary exacerbations is provided to illustrate the proposed method.
Recurrent events; Rate regression; Additive–multiplicative rates model; Counting process; Empirical process
The analysis of recurrent failure time data from longitudinal studies can be complicated by the presence of dependent censoring. A substantial literature has developed based on an artificial censoring device. In this article, we explore the connection between this class of methods and truncated data structures. In addition, a new procedure is developed for estimation and inference in a joint model for recurrent events and dependent censoring. Estimation proceeds using a mixed U-statistic based estimating function approach. New resampling-based methods for variance estimation and model checking are also described. The methods are illustrated by application to data from an HIV clinical trial, along with a limited simulation study.
Accelerated failure time model; Cause-specific hazard; Comparability; Competing risks; Empirical process; Semi-competing risks data
Case-control family data are now widely used to examine the role of gene-environment interactions in the etiology of complex diseases. In these types of studies, exposure levels are obtained retrospectively and, frequently, information on most risk factors of interest is available on the probands but not on their relatives. In this work we consider correlated failure time data arising from population-based case-control family studies with missing genotypes of relatives. We present a new method for estimating the age-dependent marginalized hazard function. The proposed technique has two major advantages: (1) it is based on the pseudo full likelihood function rather than a pseudo composite likelihood function, which usually suffers from substantial efficiency loss; (2) the cumulative baseline hazard function is estimated using a two-stage estimator instead of an iterative process. We assess the performance of the proposed methodology with simulation studies, and illustrate its utility on a real data example.
Case-control family study; Missing genotypes; Multivariate survival analysis; Frailty model
This article studies a general joint model for longitudinal measurements and competing risks survival data. The model consists of a linear mixed effects sub-model for the longitudinal outcome, a proportional cause-specific hazards frailty sub-model for the competing risks survival data, and a regression sub-model for the variance–covariance matrix of the multivariate latent random effects based on a modified Cholesky decomposition. The model provides a useful approach to adjust for non-ignorable missing data due to dropout for the longitudinal outcome, enables analysis of the survival outcome with informative censoring and intermittently measured time-dependent covariates, and allows joint analysis of the longitudinal and survival outcomes. Unlike previously studied joint models, our model allows for heterogeneous random covariance matrices. It also offers a framework to assess the homogeneous covariance assumption of existing joint models. A Bayesian MCMC procedure is developed for parameter estimation and inference. Its performance and frequentist properties are investigated using simulations. A real data example is used to illustrate the usefulness of the approach.
Cause-specific hazard; Bayesian analysis; Cholesky decomposition; Mixed effects model; MCMC; Modeling covariance matrices
In many studies examining the progression of HIV and other chronic diseases, subjects are periodically monitored to assess their progression through disease states. This gives rise to a specific type of panel data which have been termed “chain-of-events data”; e.g. data that result from periodic observation of a progressive disease process whose states occur in a prescribed order and where state transitions are not observable. Using a discrete time semi-Markov model, we develop an algorithm for nonparametric estimation of the distribution functions of sojourn times in a J state progressive disease model. Issues of uniqueness for chain-of-events data are not well-understood. Thus, a main goal of this paper is to determine the uniqueness of the nonparametric estimators of the distribution functions of sojourn times within states. We develop sufficient conditions for uniqueness of the nonparametric maximum likelihood estimator, including situations where some but not all of its components are unique. We illustrate the methods with three examples.
Chain-of-events data; Nonparametric maximum likelihood estimate (NPMLE); Uniqueness; Interval censoring; Progressive disease model; Time-to-event data
In a longitudinal study, an individual is followed up over a period of time. Repeated measurements on the response and some time-dependent covariates are taken at a series of sampling times. The sampling times are often irregular and depend on covariates. In this paper, we propose a sampling adjusted procedure for the estimation of the proportional mean model without having to specify a sampling model. Unlike existing procedures, the proposed method is robust to misspecification of the model for the sampling times. Large sample properties are investigated for the estimators of both the regression coefficients and the baseline function. We show that the proposed estimation procedure is more efficient than the existing procedures. Large sample confidence intervals for the baseline function are also constructed by perturbing the estimating equations. A simulation study is conducted to examine the finite sample properties of the proposed estimators and to compare with some of the existing procedures. The method is illustrated with a data set from a recurrent bladder cancer study.
Asymptotic efficiency; Data-driven bandwidth selection; Censored follow-up times; Kernel smoothing; Model misspecification; Panel count data; Proportional mean model; Sampling adjusted estimation; Weighted least squares estimator
Due to significant progress in cancer treatments and management, survival studies involving time to relapse (or death) often need survival models with a cured fraction to account for the subjects enjoying prolonged survival. Our article presents a new proportional odds survival model with a cured fraction using a special hierarchical structure of the latent factors activating cure. This new model has some important differences from classical proportional odds survival models and existing cure-rate survival models. We demonstrate the implementation of Bayesian data analysis using our model with data from the SEER (Surveillance Epidemiology and End Results) database of the National Cancer Institute. Particularly aimed at survival data with a cured fraction, we present a novel Bayes method for model comparison and assessment, and demonstrate our new tool’s superior performance and advantages over competing tools.
Bayesian hierarchical models; Cure rate models; Cure fractions; Latent activation schemes; Markov Chain Monte Carlo algorithms; Proportional Odds Models; Survival analysis
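Latent-activation cure constructions of the kind named in the keywords above have a classical special case worth keeping in mind: with a Poisson number of latent factors, the population survival function has a simple closed form. The sketch below shows that promotion-time special case; the paper's hierarchical structure and proportional-odds form generalize it.

```python
# Sketch of a latent-activation cure construction: with N ~ Poisson(theta)
# latent factors and i.i.d. activation times with distribution F, the
# population survival is S_pop(t) = exp(-theta * F(t)), and the cure fraction
# is S_pop(infinity) = exp(-theta).  This is the classical promotion-time
# special case, shown for intuition; it is not the paper's exact model.

import math

def promotion_time_survival(t, theta, F):
    """Population survival function exp(-theta * F(t))."""
    return math.exp(-theta * F(t))

def cure_fraction(theta):
    """Limiting survival probability: the cured proportion."""
    return math.exp(-theta)
```

At t = 0 no latent factor has activated, so the population survival is 1; as t grows it decreases to the cure fraction exp(-theta) rather than to zero.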
The use of the cumulative average model to investigate the association between disease incidence and repeated measurements of exposures in medical follow-up studies dates back to the 1960s (Kahn and Dawber, J Chron Dis 19:611–620, 1966). This model takes advantage of all prior data and thus should provide a statistically more powerful test of disease-exposure associations. Measurement error in covariates is common in medical follow-up studies, and many methods have been proposed to correct for it. To the best of our knowledge, no methods have yet been proposed to correct for measurement error in the cumulative average model. In this article, we propose a regression calibration approach to correct relative risk estimates for measurement error. The approach is illustrated with data from the Nurses’ Health Study relating incident breast cancer between 1980 and 2002 to time-dependent measures of calorie-adjusted saturated fat intake, controlling for total caloric intake, alcohol intake, and baseline age.
Measurement error; Regression calibration; Nutritional data
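The core of regression calibration is to replace the error-prone measurement W by an estimate of E[X | W] fitted in a validation sample where both the true exposure X and W are observed, then fit the main model with the calibrated values. The sketch below assumes simple linear calibration with illustrative variable names; the paper applies the idea to cumulative averages of repeated measures.

```python
# Sketch of regression calibration under a simple linear calibration model:
# E[X | W] is approximated by a + b * W, fitted by least squares in a
# validation sample.  Variable names and the linear form are illustrative
# assumptions, not the paper's exact specification.

def fit_linear_calibration(x_valid, w_valid):
    """Least-squares fit of true exposure X on error-prone W in the
    validation data; returns intercept a and slope b."""
    n = len(w_valid)
    wbar = sum(w_valid) / n
    xbar = sum(x_valid) / n
    sww = sum((w - wbar) ** 2 for w in w_valid)
    swx = sum((w - wbar) * (x - xbar) for w, x in zip(w_valid, x_valid))
    b = swx / sww
    return xbar - b * wbar, b

def calibrate(ws, a, b):
    """Replace each error-prone measurement by its calibrated value."""
    return [a + b * w for w in ws]
```

The main-study risk model is then refit with the calibrated exposures in place of the raw measurements, which corrects the attenuation induced by measurement error.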
This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests.
Accelerated failure time models; Cure rate model; Improper model; Semi-parametric model
For semiparametric models, interval estimation and hypothesis testing based on the information matrix for the full model is a challenge because of potentially unlimited dimension. Use of the profile information matrix for a small set of parameters of interest is an appealing alternative. Existing approaches for the estimation of the profile information matrix are either subject to the curse of dimensionality, or are ad-hoc and approximate and can be unstable and numerically inefficient. We propose a numerically stable and efficient algorithm that delivers an exact observed profile information matrix for regression coefficients for the class of Nonlinear Transformation Models [A. Tsodikov (2003) J R Statist Soc Ser B 65:759–774]. The algorithm deals with the curse of dimensionality and requires neither large matrix inverses nor explicit expressions for the profile surface.
Profile likelihood; Semiparametric models; Information matrix
In high throughput genomic studies, an important goal is to identify a small number of genomic markers that are associated with the development and progression of diseases. A representative example is microarray prognostic studies, where the goal is to identify genes whose expressions are associated with disease free or overall survival. Because of the high dimensionality of gene expression data, standard survival analysis techniques cannot be directly applied. In addition, among the thousands of genes surveyed, only a subset are disease-associated, so gene selection is needed along with estimation. In this article, we model the relationship between gene expressions and survival using accelerated failure time (AFT) models. We use bridge penalization for regularized estimation and gene selection. An efficient iterative computational algorithm is proposed. Tuning parameters are selected using V-fold cross validation. We use a resampling method to evaluate the prediction performance of the bridge estimator and the relative stability of identified genes. We show that the proposed bridge estimator is selection consistent under appropriate conditions. Analysis of two lymphoma prognostic studies suggests that the bridge estimator can identify a small number of genes and can have better prediction performance than the Lasso.
Bridge penalization; Censored data; High dimensional data; Selection consistency; Stability; Sparse model
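The bridge penalty mentioned above adds λ Σⱼ |βⱼ|^γ to the loss, with 0 < γ < 1 giving sparser solutions than the lasso (γ = 1). The sketch below writes out that penalized objective with a plain least-squares loss standing in for the paper's censoring-adjusted AFT loss; the data layout is illustrative.

```python
# Sketch of a bridge-penalized objective: a least-squares loss plus the
# bridge penalty lambda * sum_j |beta_j|^gamma, where gamma in (0, 1]
# interpolates toward sparser fits than the lasso (gamma = 1).  The plain
# least-squares loss is a stand-in for the paper's censoring-adjusted
# AFT loss; data layout (row-major X, response y) is an assumption.

def bridge_objective(beta, X, y, lam, gamma):
    """Average squared-error loss plus the bridge penalty."""
    resid = [yi - sum(b * xj for b, xj in zip(beta, xi))
             for xi, yi in zip(X, y)]
    loss = sum(r * r for r in resid) / (2 * len(y))
    penalty = lam * sum(abs(b) ** gamma for b in beta)
    return loss + penalty
```

Because the penalty is nonconvex for γ < 1, the objective is typically minimized by iterative reweighting or coordinate-wise updates rather than a single convex solve, which is consistent with the iterative algorithm the abstract describes.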
Recurrent event data are frequently encountered in clinical trials and may be stopped by a terminal event. In many applications, it is of interest to assess the treatment efficacy simultaneously with respect to both the recurrent events and the terminal event. In this paper we propose joint covariate-adjusted score test statistics based on joint models of recurrent events and a terminal event. No assumptions on the functional form of the covariates are needed. Simulation results show that the proposed tests can improve efficiency over tests based on a covariate-unadjusted model. The proposed tests are applied to the SOLVD data for illustration.
Frailty; Proportional hazards; Proportional rates; Recurrent events data; Semiparametric model