Related Articles
Doubly robust estimation combines a form of outcome regression with a model for the exposure (i.e., the propensity score) to estimate the causal effect of an exposure on an outcome. When used individually to estimate a causal effect, both outcome regression and propensity score methods are unbiased only if the statistical model is correctly specified. The doubly robust estimator combines these 2 approaches such that only 1 of the 2 models need be correctly specified to obtain an unbiased effect estimator. In this introduction to doubly robust estimators, the authors present a conceptual overview of doubly robust estimation, a simple worked example, results from a simulation study examining performance of estimated and bootstrapped standard errors, and a discussion of the potential advantages and limitations of this method. The supplementary material for this paper, which is posted on the Journal's Web site (http://aje.oupjournals.org/), includes a demonstration of the doubly robust property (Web Appendix 1) and a description of a SAS macro (SAS Institute, Inc., Cary, North Carolina) for doubly robust estimation, available for download at http://www.unc.edu/~mfunk/dr/.
doi:10.1093/aje/kwq439
PMCID: PMC3070495
PMID: 21385832
causal inference; epidemiologic methods; propensity score
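The doubly robust property described in the abstract above can be illustrated with a small sketch. It uses the standard augmented inverse-probability-weighted (AIPW) form of the estimator, which is an assumption here: the abstract does not give a formula, and the authors' SAS macro may differ in detail. The toy data, the function name `aipw_ate`, and the deliberately misspecified "bad" models are all hypothetical.

```python
def aipw_ate(y, a, ps, m1, m0):
    """AIPW estimate of the average treatment effect.

    y  : observed outcomes
    a  : exposure indicators (1 = exposed, 0 = unexposed)
    ps : fitted propensity scores P(A = 1 | X)
    m1 : outcome-model predictions under exposure
    m0 : outcome-model predictions under no exposure
    """
    total = 0.0
    for yi, ai, pi, m1i, m0i in zip(y, a, ps, m1, m0):
        total += (m1i - m0i
                  + ai * (yi - m1i) / pi
                  - (1 - ai) * (yi - m0i) / (1 - pi))
    return total / len(y)


# Hypothetical toy data: one binary confounder x, true effect 2 in each stratum.
x = [0, 0, 0, 0, 1, 1, 1, 1]
a = [1, 0, 0, 0, 1, 1, 1, 0]
y = [3, 1, 1, 1, 6, 6, 6, 4]

# Correct models: stratum-specific exposure proportions and outcome means.
ps_ok = [0.25 if xi == 0 else 0.75 for xi in x]
m1_ok = [3.0 if xi == 0 else 6.0 for xi in x]
m0_ok = [1.0 if xi == 0 else 4.0 for xi in x]

# Misspecified models: both ignore the confounder entirely.
ps_bad = [0.5] * len(x)
m_bad = [0.0] * len(x)

est_ps_only = aipw_ate(y, a, ps_ok, m_bad, m_bad)    # only propensity model correct
est_or_only = aipw_ate(y, a, ps_bad, m1_ok, m0_ok)   # only outcome model correct
est_neither = aipw_ate(y, a, ps_bad, m_bad, m_bad)   # both models wrong

print(est_ps_only, est_or_only, est_neither)
```

In this toy setting the estimate recovers the stratum-specific effect of 2 when either model is correct, and is biased (3.5) when both are wrong.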
Summary
Considerable recent interest has focused on doubly robust estimators for a population mean response in the presence of incomplete data, which involve models for both the propensity score and the regression of outcome on covariates. The usual doubly robust estimator may yield severely biased inferences if neither of these models is correctly specified and can exhibit nonnegligible bias if the estimated propensity score is close to zero for some observations. We propose alternative doubly robust estimators that achieve comparable or improved performance relative to existing methods, even with some estimated propensity scores close to zero.
doi:10.1093/biomet/asp033
PMCID: PMC2798744
PMID: 20161511
Causal inference; Enhanced propensity score model; Missing at random; No unmeasured confounders; Outcome regression
In this paper, the authors use the rubric of “coarsened data,” of which missing and censored data are special cases, to motivate the elicitation and use of expert information for performing sensitivity analyses of censored event-time data. Elicited information is important because observed data are insufficient to estimate how study participants with coarsened data compare with participants with uncoarsened data, and misspecifying this comparison may produce biased analysis results. In the presence of coarsening, performing a sensitivity analysis over a range of plausible assumptions is the best one can do. Here the authors illustrate an approach for eliciting expert information for use in sensitivity analyses to compare cumulative incidence functions of censored nonmortality outcomes. An example of such data is the AIDS Link to Intravenous Experience (ALIVE) Study, where the authors aim to estimate and compare cumulative incidence functions for human immunodeficiency virus between risk factor categories. The interval and right-censoring and censoring due to death found in the ALIVE data (1988–1998) are thought to be informative; thus, a sensitivity analysis is performed using information elicited from 2 ALIVE scientists and an expert in acquired immunodeficiency syndrome epidemiology about the relation between seroconversion and censoring.
doi:10.1093/aje/kwn265
PMCID: PMC2732953
PMID: 18952850
Bayesian analysis; frequentist approach; HIV; hypothesis test; incidence; interval censoring; sensitivity analysis
Summary
Restricted mean lifetime is often of direct interest in epidemiologic studies involving censored survival times. Differences in this quantity can be used as a basis for comparing several groups. For example, transplant surgeons, nephrologists and of course patients are interested in comparing post-transplant lifetimes among various types of kidney transplants in order to assist in clinical decision-making. As the factor of interest is not randomized, covariate adjustment is needed in order to account for imbalances in confounding factors. In this report, we use semiparametric theory to develop an estimator for differences in restricted mean lifetimes while accounting for confounding factors. The proposed method involves building working models for the time-to-event and coarsening mechanism (i.e., group assignment and censoring). We show that the proposed estimator possesses the double robust property; i.e., when either the time-to-event or coarsening process is modeled correctly, the estimator is consistent and asymptotically normal. Simulation studies are conducted to assess its finite-sample performance and the method is applied to national kidney transplant data.
doi:10.1111/j.1541-0420.2012.01759.x
PMCID: PMC3432755
PMID: 22471876
Average causal effect; Cox regression; Cumulative treatment effect; Double robust estimator; Inverse weighting
We consider the doubly robust estimation of the parameters in a semiparametric conditional odds ratio model. Our estimators are consistent and asymptotically normal in a union model that assumes either of two variation independent baseline functions is correctly modelled but not necessarily both. Furthermore, when either outcome has finite support, our estimators are semiparametric efficient in the union model at the intersection submodel where both nuisance function models are correct. For general outcomes, we obtain doubly robust estimators that are nearly efficient at the intersection submodel. Our methods are easy to implement as they do not require the use of the alternating conditional expectations algorithm of Chen (2007).
doi:10.1093/biomet/asp062
PMCID: PMC3412601
PMID: 23049119
Doubly robust; Generalized odds ratio; Locally efficient; Semiparametric logistic regression
Nonignorable missing data are a common problem in longitudinal studies. Latent class models are attractive for simplifying the modeling of missing data when the data are subject to either a monotone or intermittent missing data pattern. In our study, we propose a new two-latent-class model for categorical data with informative dropouts, dividing the observed data into two latent classes: one class in which the outcomes are deterministic and a second one in which the outcomes can be modeled using logistic regression. In the model, the latent classes connect the longitudinal responses and the missingness process under the assumption of conditional independence. Parameters are estimated by the method of maximum likelihood estimation based on the above assumptions and the tetrachoric correlation between responses within the same subject. We compare the proposed method with the shared parameter model and the weighted GEE model using the areas under the ROC curves in the simulations and the application to the smoking cessation data set. The simulation results indicate that the proposed two-latent-class model performs well under different missing procedures. The application results show that our proposed method is better than the shared parameter model and the weighted GEE model.
doi:10.1080/03610920802585849
PMCID: PMC2879593
PMID: 20523912
Area under ROC curve; Informative dropout; Latent class; Tetrachoric correlation
The inverse of the nonparametric information operator is key to finding doubly robust estimators and the semiparametric efficient estimator in missing data problems. It is known that no closed-form expression for the inverse of the nonparametric information operator exists when missing data form nonmonotone patterns. Neumann series is usually applied to approximate the inverse. However, Neumann series approximation is only known to converge in L2 norm, which is not sufficient for establishing statistical properties of the estimators yielded from the approximation. In this article, we show that L∞ convergence of the Neumann series approximations to the inverse of the non-parametric information operator and to the efficient scores in missing data problems can be obtained under very simple conditions. This paves the way to the study of the asymptotic properties of the doubly robust estimators and the locally semiparametric efficient estimator in those difficult situations.
doi:10.1016/j.spl.2010.01.021
PMCID: PMC2850222
PMID: 20383317
Auxiliary information; Induction; Rate of convergence; Weighted estimating equation
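The Neumann series approximation mentioned in the abstract above can be sketched in finite dimensions. This is only a matrix analogue, which is an assumption made for illustration: the paper works with the nonparametric information operator on function spaces, not a 2×2 matrix. The matrix `A` and the helper names are hypothetical. The series used is A⁻¹ = Σₖ (I − A)ᵏ, which converges when the spectral radius of I − A is below 1.

```python
def matmul(p, q):
    """Multiply two square matrices given as lists of rows."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def neumann_inverse(a, terms):
    """Approximate the inverse of `a` by the partial sum of (I - a)^k, k = 0..terms."""
    n = len(a)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    r = [[identity[i][j] - a[i][j] for j in range(n)] for i in range(n)]  # I - A
    total = [row[:] for row in identity]   # k = 0 term
    power = [row[:] for row in identity]
    for _ in range(terms):
        power = matmul(power, r)           # next power of (I - A)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total


# Hypothetical example: I - A has eigenvalues 0 and 0.3, so the series converges.
A = [[0.8, 0.1],
     [0.2, 0.9]]

approx_few = neumann_inverse(A, 5)
approx_many = neumann_inverse(A, 60)
check = matmul(A, approx_many)             # should be close to the identity

print(approx_many)
```

Comparing `approx_few` with `approx_many` shows the partial sums settling toward the exact inverse as more terms are added.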
Summary
Two approaches commonly used to deal with missing data are multiple imputation (MI) and inverse-probability weighting (IPW). IPW is also used to adjust for unequal sampling fractions. MI is generally more efficient than IPW but more complex. Whereas IPW requires only a model for the probability that an individual has complete data (a univariate outcome), MI needs a model for the joint distribution of the missing data (a multivariate outcome) given the observed data. Inadequacies in either model may lead to important bias if large amounts of data are missing. A third approach combines MI and IPW to give a doubly robust estimator. A fourth approach (IPW/MI) combines MI and IPW but, unlike doubly robust methods, imputes only isolated missing values and uses weights to account for remaining larger blocks of unimputed missing data, such as would arise, e.g., in a cohort study subject to sample attrition, and/or unequal sampling fractions. In this article, we examine the performance, in terms of bias and efficiency, of IPW/MI relative to MI and IPW alone and investigate whether Rubin’s rules variance estimator is valid for IPW/MI. We prove that Rubin’s rules variance estimator is valid for IPW/MI for linear regression with an imputed outcome, we present simulations supporting the use of this variance estimator in more general settings, and we demonstrate that IPW/MI can have advantages over alternatives. IPW/MI is applied to data from the National Child Development Study.
doi:10.1111/j.1541-0420.2011.01666.x
PMCID: PMC3412287
PMID: 22050039
Marginal model; Missing at random; Survey weighting; 1958 British Birth Cohort
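Rubin's rules, referred to in the abstract above, pool an estimate and its variance across imputed data sets. The sketch below shows the rules in their plain MI form only, which is an assumption: it does not implement the weighted IPW/MI procedure the article studies. The function name `rubin_pool` and the input numbers are hypothetical.

```python
def rubin_pool(estimates, variances):
    """Pool m completed-data estimates and their variances by Rubin's rules.

    Returns (pooled estimate, total variance), where
    total variance = within-imputation variance + (1 + 1/m) * between-imputation variance.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = sum(estimates) / m                                  # pooled point estimate
    within = sum(variances) / m                                 # average within variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical estimates and variances from three imputed data sets.
pooled, total_var = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```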
Background
In trials designed to estimate rates of perinatal mother to child transmission of HIV, HIV assays are scheduled at multiple points in time. Still, infection status for some infants at some time points may be unknown, particularly when interim analyses are conducted.
Methods
Logistic regression models are commonly used to estimate covariate-adjusted transmission rates, but their methods for handling missing data may be inadequate. Here we propose using coarsened multinomial regression models to estimate cumulative and conditional rates of HIV transmission. Through simulation, we compare the proposed models to standard logistic models in terms of bias, mean squared error, coverage probability, and power. We consider a range of treatment effect and visit process scenarios, while including imperfect sensitivity of the assay and contamination of the endpoint due to early breastfeeding transmission. We illustrate the approach through analysis of data from a clinical trial designed to prevent perinatal transmission.
Results
The proposed cumulative and conditional models performed well when compared to their logistic counterparts. Performance of the proposed cumulative model was particularly strong under scenarios where treatment was assumed to increase the risk of in utero transmission but decrease the risk of intrapartum and overall perinatal transmission and under scenarios designed to represent interim analyses. Power to estimate intrapartum and perinatal transmission was consistently higher for the proposed models.
Conclusion
Coarsened multinomial regression models are preferred to standard logistic models for estimation of perinatal mother to child transmission of HIV, particularly when assays are missing or occur off-schedule for some infants.
doi:10.1186/1471-2288-8-46
PMCID: PMC2515333
PMID: 18627627
Missing data are common in medical and social science studies and often pose a serious challenge in data analysis. Multiple imputation methods are popular and natural tools for handling missing data, replacing each missing value with a set of plausible values that represent the uncertainty about the underlying values. We consider a case of missing at random (MAR) and investigate the estimation of the marginal mean of an outcome variable in the presence of missing values when a set of fully observed covariates is available. We propose a new nonparametric multiple imputation (MI) approach that uses two working models to achieve dimension reduction and define the imputing sets for the missing observations. Compared with existing nonparametric imputation procedures, our approach can better handle covariates of high dimension, and is doubly robust in the sense that the resulting estimator remains consistent if either of the working models is correctly specified. Compared with existing doubly robust methods, our nonparametric MI approach is more robust to the misspecification of both working models; it also avoids the use of inverse-weighting and hence is less sensitive to missing probabilities that are close to 1. We propose a sensitivity analysis for evaluating the validity of the working models, allowing investigators to choose the optimal weights so that the resulting estimator relies either completely or more heavily on the working model that is likely to be correctly specified and achieves improved efficiency. We investigate the asymptotic properties of the proposed estimator, and perform simulation studies to show that the proposed method compares favorably with some existing methods in finite samples. The proposed method is further illustrated using data from a colorectal adenoma study.
PMCID: PMC3280694
PMID: 22347786
Doubly robust; Missing at random; Multiple imputation; Nearest neighbor; Nonparametric imputation; Sensitivity analysis
Summary
In this article, we study the estimation of the mean response and regression coefficient in semiparametric regression problems when the response variable is subject to nonrandom missingness. When the missingness is independent of the response conditional on high-dimensional auxiliary information, the parametric approach may misspecify the relationship between covariates and response while the nonparametric approach is infeasible because of the curse of dimensionality. To overcome this, we study a model-based approach to condense the auxiliary information and estimate the parameters of interest nonparametrically on the condensed covariate space. Our estimators possess the double robustness property, i.e., they are consistent whenever the model for the response given auxiliary covariates or the model for the missingness given auxiliary covariates is correct. We conduct a number of simulations to compare the numerical performance between our estimators and other existing estimators in the current missing data literature, including the propensity score approach and the inverse probability weighted estimating equation. A set of real data is used to illustrate our approach.
doi:10.1111/j.1541-0420.2009.01231.x
PMCID: PMC3148802
PMID: 19432773
Auxiliary covariate; High-dimensional data; Kernel estimation; Missing at random; Semiparametric regression
Summary
We consider the problem of comparing cumulative incidence functions of non-mortality events in the presence of informative coarsening and the competing risk of death. We extend frequentist-based hypothesis tests previously developed for non-informative coarsening and propose a novel Bayesian method based on comparing a posterior parameter transformation to its expected distribution under the null hypothesis of equal cumulative incidence functions. Both methods use estimates derived by extending previously published estimation procedures to accommodate censoring by death. The data structure and analysis goal are exemplified by the AIDS Link to the Intravenous Experience (ALIVE) study, where researchers are interested in comparing incidence of human immunodeficiency virus seroconversion by risk behavior categories. Coarsening in the forms of interval and right censoring and censoring by death in ALIVE are thought to be informative, thus we perform a sensitivity analysis by incorporating elicited expert information about the relationship between seroconversion and censoring into the model.
doi:10.1002/sim.3397
PMCID: PMC2796438
PMID: 18759370
Bayesian Analysis; Frequentist Analysis; Hypothesis Test; Interval Censoring; Markov Chain Monte Carlo; Sensitivity Analysis
Summary
Double censoring often occurs in registry studies when left censoring is present in addition to right censoring. In this work, we propose a new analysis strategy for such doubly censored data by adopting a quantile regression model. We develop computationally simple estimation and inference procedures by appropriately using the embedded martingale structure. Asymptotic properties, including the uniform consistency and weak convergence, are established for the resulting estimators. Moreover, we propose conditional inference to address the special identifiability issues attached to the double censoring setting. We further show that the proposed method can be readily adapted to handle left truncation. Simulation studies demonstrate good finite-sample performance of the new inferential procedures. The practical utility of our method is illustrated by an analysis of the onset of the most commonly investigated respiratory infection, Pseudomonas aeruginosa, in children with cystic fibrosis through the use of the US Cystic Fibrosis Registry.
doi:10.1111/j.1541-0420.2011.01667.x
PMCID: PMC3312995
PMID: 21950348
Conditional inference; Double censoring; Empirical process; Martingale; Regression quantile; Truncation
Summary
The analysis of longitudinal repeated measures data is frequently complicated by missing data due to informative dropout. We describe a mixture model for the joint distribution of longitudinal repeated measures, where the dropout distribution may be continuous and the dependence between response and dropout is semiparametric. Specifically, we assume that responses follow a varying coefficient random effects model conditional on dropout time, where the regression coefficients depend on dropout time through unspecified nonparametric functions that are estimated using step functions when dropout time is discrete (e.g., for panel data) and using smoothing splines when dropout time is continuous. Inference under the proposed semiparametric model is hence more robust than the parametric conditional linear model. The unconditional distribution of the repeated measures is a mixture over the dropout distribution. We show that estimation in the semiparametric varying coefficient mixture model can proceed by fitting a parametric mixed effects model and can be carried out on standard software platforms such as SAS. The model is used to analyze data from a recent AIDS clinical trial and its performance is evaluated using simulations.
doi:10.1111/j.0006-341X.2004.00240.x
PMCID: PMC2677904
PMID: 15606405
Clinical trials; Equivalence trial; Linear mixed model; Missing data; Nonignorable dropout; Pattern-mixture model; Pediatric AIDS; Selection bias; Smoothing splines
Missing observations are commonplace in longitudinal data. We discuss how to model and analyze such data in a dynamic framework, that is, taking into consideration the time structure of the process and the influence of the past on the present and future responses. An autoregressive model is used as a special case of the linear increments model defined by Farewell (2006. Linear models for censored data [PhD Thesis]. Lancaster University) and Diggle and others (2007. Analysis of longitudinal data with drop-out: objectives, assumptions and a proposal. Journal of the Royal Statistical Society, Series C (Applied Statistics), 56, 499–550). We wish to reconstruct responses for missing data and discuss the assumptions needed for both monotone and nonmonotone missingness. The computational procedures suggested are very simple and easily applicable. They can also be used to estimate causal effects in the presence of time-dependent confounding. There are also connections to methods from survival analysis: the Aalen–Johansen estimator for the transition matrix of a Markov chain turns out to be a special case. Quality of life data from a cancer clinical trial are analyzed and presented. Some simulations are given in the supplementary material available at Biostatistics online.
doi:10.1093/biostatistics/kxq014
PMCID: PMC3293429
PMID: 20388914
Cancer clinical trial; Dynamic approach; Linear increments model; Longitudinal data; Missing data; Quality of life
Dropout is common in longitudinal clinical trials and when the probability of dropout depends on unobserved outcomes even after conditioning on available data, it is considered missing not at random and therefore nonignorable. To address this problem, mixture models can be used to account for the relationship between a longitudinal outcome and dropout. We propose a Natural Spline Varying-coefficient mixture model (NSV), which is a straightforward extension of the parametric Conditional Linear Model (CLM). We assume that the outcome follows a varying-coefficient model conditional on a continuous dropout distribution. Natural cubic B-splines are used to allow the regression coefficients to semiparametrically depend on dropout and inference is therefore more robust. Additionally, this method is computationally stable and relatively simple to implement. We conduct simulation studies to evaluate performance and compare methodologies in settings where the longitudinal trajectories are linear and dropout time is observed for all individuals. Performance is assessed under conditions where model assumptions are both met and violated. In addition, we compare the NSV to the CLM and a standard random-effects model using an HIV/AIDS clinical trial with probable nonignorable dropout. The simulation studies suggest that the NSV is an improvement over the CLM when dropout has a nonlinear dependence on the outcome.
doi:10.1016/j.cct.2011.11.009
PMCID: PMC3414213
PMID: 22101223
Dropout; Nonignorable Missing Data; Longitudinal data; Varying-coefficient model; B-spline; HIV/AIDS
Summary
We propose a new causal parameter, which is a natural extension of existing approaches to causal inference such as marginal structural models. Modelling approaches are proposed for the difference between a treatment-specific counterfactual population distribution and the actual population distribution of an outcome in the target population of interest. Relevant parameters describe the effect of a hypothetical intervention on such a population and therefore we refer to these models as population intervention models. We focus on intervention models estimating the effect of an intervention in terms of a difference and ratio of means, called risk difference and relative risk if the outcome is binary. We provide a class of inverse-probability-of-treatment-weighted and doubly-robust estimators of the causal parameters in these models. The finite-sample performance of these new estimators is explored in a simulation study.
PMCID: PMC2464276
PMID: 18629347
Attributable risk; Causal inference; Confounding; Counterfactual; Doubly-robust estimation; G-computation estimation; Inverse-probability-of-treatment-weighted estimation
We propose a family of regression models to adjust for nonrandom dropouts in the analysis of longitudinal outcomes with fully observed covariates. The approach conceptually focuses on generalized linear models with random effects. A novel formulation of a shared random effects model is presented and shown to provide a dropout selection parameter with a meaningful interpretation. The proposed semiparametric and parametric models are made part of a sensitivity analysis to delineate the range of inferences consistent with observed data. Concerns about model identifiability are addressed by fixing some model parameters to construct functional estimators that are used as the basis of a global sensitivity test for parameter contrasts. Our simulation studies demonstrate a large reduction of bias for the semiparametric model relative to the parametric model at times where the dropout rate is high or the dropout model is misspecified. The methodology’s practical utility is illustrated in a data analysis.
doi:10.1111/j.1467-9574.2009.00435.x
PMCID: PMC3023945
PMID: 21258610
Exponential family distribution; Functional estimators; Global sensitivity analysis; Informative dropout; Infimum/Supremum statistic; Nonparametric mixture; Uniform convergence; non-identifiable models
Summary
Model misspecification can be a concern for high-dimensional data. Nonparametric regression obviates model specification but is impeded by the curse of dimensionality. This paper focuses on the estimation of the marginal mean response when there is missingness in the response and multiple covariates are available. We propose estimating the mean response through nonparametric functional estimation, where the dimension is reduced by a parametric working index. The proposed semiparametric estimator is robust to model misspecification: it is consistent for any working index if the missing mechanism of the response is known or correctly specified up to unknown parameters; even with misspecification in the missing mechanism, it is consistent so long as the working index can recover E(Y | X), the conditional mean response given the covariates. In addition, when the missing mechanism is correctly specified, the semiparametric estimator attains the optimal efficiency if E(Y | X) is recoverable through the working index. Robustness and efficiency of the proposed estimator are further investigated by simulations. We apply the proposed method to a clinical trial for HIV.
doi:10.1093/biomet/asq005
PMCID: PMC3412576
PMID: 23049121
Dimension reduction; Inverse probability weighting; Kernel regression; Missing at random; Robustness to model misspecification
This simulation-based report compares the performance of five methods of association analysis in the presence of linkage using extended sibships: the Family-Based Association Test (FBAT), Empirical Variance FBAT (EV-FBAT), Conditional Logistic Regression (CLR), Robust CLR (R-CLR) and Sibship Disequilibrium Test (SDT). The two tests accounting for residual familial correlation (EV-FBAT and R-CLR) and the model-free SDT showed correct test size in all simulated designs, while FBAT and CLR were only valid for small effect sizes. SDT had the lowest power, while CLR had the highest power, generally similar to FBAT and the robust variance analogues. The power of all model-dependent tests dropped when the model was misspecified, although often not substantially. Estimates of genetic effect with CLR and R-CLR were unbiased when the disease locus was analysed but biased when a nearby marker was analysed. This study demonstrates that the genetic effect does not need to be extreme to invalidate tests that ignore familial correlation and confirms that analogous methods using robust variance estimation provide a valid alternative at little cost to power. Overall R-CLR is the best-performing method among these alternatives for the analysis of extended sibship data.
doi:10.1111/j.1469-1809.2008.00475.x
PMCID: PMC2659381
PMID: 18782299
Extended sibships; conditional logistic regression; robust variance; simulation
We propose a marginalized joint-modeling approach for marginal inference on the association between longitudinal responses and covariates when longitudinal measurements are subject to informative dropouts. The proposed model is motivated by the idea of linking longitudinal responses and dropout times by latent variables while focusing on marginal inferences. We develop a simple inference procedure based on a series of estimating equations, and the resulting estimators are consistent and asymptotically normal with a sandwich-type covariance matrix ready to be estimated by the usual plug-in rule. The performance of our approach is evaluated through simulations and illustrated with a renal disease data application.
PMCID: PMC3261622
PMID: 22267962
In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y, is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint, and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. However, in many problems, data can only be observed with measurement errors, and the methods employed in the error-free case typically do not carry over to this error context. In this paper we develop a novel approach to hypothesis testing and function estimation under shape constraints, which is valid in the context of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.
doi:10.1198/jasa.2011.tm10355
PMCID: PMC3115552
PMID: 21687809
Bootstrap methods; Convexity; Errors in variables; Hypothesis testing; Kernel methods; Local polynomial estimators; Monotone function; Nonparametric regression; Shape constraint; Unimodality
Summary
Linear mixed effects (LME) models are increasingly used for analyses of biological and biomedical data. When the multivariate normal assumption is not adequate for an LME model, a robust estimation approach is preferable to the maximum likelihood one. M-estimators were considered before for robust estimation of LME models, and recently a constrained S-estimator was proposed. This S-estimator cannot be applied directly to LME models with correlated error terms and vector random effects with correlated dimensions. Therefore, a modification is proposed, which extends application of the constrained S-estimator to LME models for multivariate responses with correlated dimensions and to longitudinal data. A new computational algorithm is also developed for computing constrained S-estimators. Performance of the S-estimators based on the original Tukey’s biweight and translated biweight is evaluated in a small simulation study with repeated multivariate responses with correlated dimensions. The proposed methodology is applied to jointly analyze repeated measures on three cholesterol components: HDL, LDL, and triglycerides.
doi:10.1002/sim.4169
PMCID: PMC3137669
PMID: 21638300
Multivariate linear mixed effects models; robust estimation; CTBS estimator for LME model; M-estimator
Doubly-censored data refers to time to event data for which both the originating and failure times are censored. In studies involving AIDS incubation time or survival after dementia onset, for example, data are frequently doubly-censored because the date of the originating event is interval-censored and the date of the failure event usually is right-censored. The primary interest is in the distribution of elapsed times between the originating and failure events and its relationship to exposures and risk factors. The estimating equation approach [Sun, et al. 1999. Regression analysis of doubly censored failure time data with applications to AIDS studies. Biometrics 55, 909-914] and its extensions assume the same distribution of originating event times for all subjects. This paper demonstrates the importance of utilizing additional covariates to impute originating event times, i.e., more accurate estimation of originating event times may lead to less biased parameter estimates for elapsed time. The Bayesian MCMC method is shown to be a suitable approach for analyzing doubly-censored data and allows a rich class of survival models. The performance of the proposed estimation method is compared to that of other conventional methods through simulations. Two examples, an AIDS cohort study and a population-based dementia study, are used for illustration. Sample code is shown in the appendix.
doi:10.1016/j.csda.2010.02.025
PMCID: PMC2877214
PMID: 20514348
AIDS; dementia; doubly censored data; incubation period; MCMC; midpoint imputation