Generalized linear and nonlinear mixed models (GMMMs and NLMMs) are commonly used to represent non-Gaussian or nonlinear longitudinal or clustered data. A common assumption is that the random effects are Gaussian. However, this assumption may be unrealistic in some applications, and misspecification of the random effects density may lead to maximum likelihood parameter estimators that are inconsistent, biased, and inefficient. Because testing if the random effects are Gaussian is difficult, previous research has recommended using a flexible random effects density. However, computational limitations have precluded widespread use of flexible random effects densities for GLMMs and NLMMs. We develop a SAS macro, SNP_NLMM, that overcomes the computational challenges to fit GLMMs and NLMMs where the random effects are assumed to follow a smooth density that can be represented by the seminonparametric formulation proposed by Gallant and Nychka (1987). The macro is flexible enough to allow for any density of the response conditional on the random effects and any nonlinear mean trajectory. We demonstrate the SNP_NLMM macro on a GLMM of the disease progression of toenail infection and on a NLMM of intravenous drug concentration over time.
PMCID: PMC3969790
random effects; nonlinear mixed models; generalized linear mixed models; SAS; SNP
Summary
Because the number of patients waiting for organ transplants exceeds the number of organs available, a better understanding of how transplantation affects the distribution of residual lifetime is needed to improve organ allocation. However, there has been little work to assess the survival benefit of transplantation from a causal perspective. Previous methods developed to estimate the causal effects of treatment in the presence of time-varying confounders have assumed that treatment assignment was independent across patients, which is not true for organ transplantation. We develop a version of G-estimation that accounts for the fact that treatment assignment is not independent across individuals to estimate the parameters of a structural nested failure time model. We derive the asymptotic properties of our estimator and confirm through simulation studies that our method leads to valid inference of the effect of transplantation on the distribution of residual lifetime. We demonstrate our method on the survival benefit of lung transplantation using data from the United Network for Organ Sharing.
doi:10.1111/biom.12084
PMCID: PMC3865173
PMID: 24128090
Causal Inference; G-Estimation; Lung Transplantation; Martingale Theory; Structural Nested Failure Time Models
Summary
Observational studies are frequently conducted to compare the effects of two treatments on survival. For such studies we must be concerned about confounding; that is, there are covariates that affect both the treatment assignment and the survival distribution. With confounding the usual treatment-specific Kaplan-Meier estimator might be a biased estimator of the underlying treatment-specific survival distribution. This paper has two aims. In the first aim we use semiparametric theory to derive a doubly robust estimator of the treatment-specific survival distribution in cases where it is believed that all the potential confounders are captured. In cases where not all potential confounders have been captured one may conduct a substudy using a stratified sampling scheme to capture additional covariates that may account for confounding. The second aim is to derive a doubly-robust estimator for the treatment-specific survival distributions and its variance estimator with such a stratified sampling scheme. Simulation studies are conducted to show consistency and double robustness. These estimators are then applied to the data from the ASCERT study that motivated this research.
doi:10.1111/biom.12076
PMCID: PMC3865227
PMID: 24117096
Cox proportional hazard model; Double robustness; Observational study; Stratified sampling; Survival analysis
The vast majority of settings for which frequentist statistical properties are derived assume a fixed, a priori known sample size. Familiar properties then follow, such as, for example, the consistency, asymptotic normality, and efficiency of the sample average for the mean parameter, under a wide range of conditions. We are concerned here with the alternative situation in which the sample size is itself a random variable which may depend on the data being collected. Further the rule governing this may be deterministic or probabilistic. There are many important practical examples of such settings, including missing data, sequential trials, and informative cluster size. It is well known that special issues can arise when evaluating the properties of statistical procedures under such sampling schemes, and much has been written about specific areas3-4. Our aim is to place these various related examples into a single framework derived from the joint modeling of the outcomes and sampling process, and so derive generic results that in turn provide insight, and in some cases practical consequences, for different settings. It is shown that, even in the simplest case of estimating a mean, some of the results appear counter-intuitive. In many examples the sample average may exhibit small sample bias and, even when it is unbiased, may not be optimal. Indeed there may be no minimum variance unbiased estimator for the mean. Such results follow directly from key attributes such as non-ancilliarity of the sample size, and incompleteness of the minimal su cient statistic of the sample size and sample sum. Although our results have direct and obvious implications for estimation following group sequential trials, there are also ramifications for a range of other settings, such as random cluster sizes, censored time-to-event data, and the joint modeling of longitudinal and time-to-event data. Here we use the simplest sequential group sequential setting to develop and explicate the main results. Some implications for random sample sizes and missing data are also considered. Consequences for other related settings will be considered elsewhere.
doi:10.1177/0962280212445801
PMCID: PMC3404233
PMID: 22514029
Frequentist Inference; Generalized Sample Average; Informative Cluster Size; Joint Modeling; Likelihood Inference; Missing at Random; Random Cluster Size
Summary
A treatment regime is a rule that assigns a treatment, among a set of possible treatments, to a patient as a function of his/her observed characteristics, hence “personalizing” treatment to the patient. The goal is to identify the optimal treatment regime that, if followed by the entire population of patients, would lead to the best outcome on average. Given data from a clinical trial or observational study, for a single treatment decision, the optimal regime can be found by assuming a regression model for the expected outcome conditional on treatment and covariates, where, for a given set of covariates, the optimal treatment is the one that yields the most favorable expected outcome. However, treatment assignment via such a regime is suspect if the regression model is incorrectly specified. Recognizing that, even if misspecified, such a regression model defines a class of regimes, we instead consider finding the optimal regime within such a class by finding the regime the optimizes an estimator of overall population mean outcome. To take into account possible confounding in an observational study and to increase precision, we use a doubly robust augmented inverse probability weighted estimator for this purpose. Simulations and application to data from a breast cancer clinical trial demonstrate the performance of the method.
doi:10.1111/j.1541-0420.2012.01763.x
PMCID: PMC3556998
PMID: 22550953
Doubly robust estimator; Inverse probability weighting; Outcome regression; Personalized medicine; Potential outcomes; Propensity score
Mixed models are commonly used to represent longitudinal or repeated measures data. An additional complication arises when the response is censored, for example, due to limits of quantification of the assay used. While Gaussian random effects are routinely assumed, little work has characterized the consequences of misspecifying the random-effects distribution nor has a more flexible distribution been studied for censored longitudinal data. We show that, in general, maximum likelihood estimators will not be consistent when the random-effects density is misspecified, and the effect of misspecification is likely to be greatest when the true random-effects density deviates substantially from normality and the number of noncensored observations on each subject is small. We develop a mixed model framework for censored longitudinal data in which the random effects are represented by the flexible seminonparametric density and show how to obtain estimates in SAS procedure NLMIXED. Simulations show that this approach can lead to reduction in bias and increase in efficiency relative to assuming Gaussian random effects. The methods are demonstrated on data from a study of hepatitis C virus.
doi:10.1093/biostatistics/kxr026
PMCID: PMC3276268
PMID: 21914727
Censoring; HCV; HIV; Limit of quantification; Longitudinal data; Random effects
In many randomized clinical trials, the primary response variable, for example, the survival time, is not observed directly after the patients enroll in the study but rather observed after some period of time (lag time). It is often the case that such a response variable is missing for some patients due to censoring that occurs when the study ends before the patient’s response is observed or when the patients drop out of the study. It is often assumed that censoring occurs at random which is referred to as noninformative censoring; however, in many cases such an assumption may not be reasonable. If the missing data are not analyzed properly, the estimator or test for the treatment effect may be biased. In this paper, we use semiparametric theory to derive a class of consistent and asymptotically normal estimators for the treatment effect parameter which are applicable when the response variable is right censored. The baseline auxiliary covariates and post-treatment auxiliary covariates, which may be time-dependent, are also considered in our semiparametric model. These auxiliary covariates are used to derive estimators that both account for informative censoring and are more efficient then the estimators which do not consider the auxiliary covariates.
doi:10.1007/s10985-011-9199-8
PMCID: PMC3217309
PMID: 21706378
Informative censoring; Influence function; Logrank test; Nuisance tangent space; Proportional hazards model; Regular and asymptotically linear estimators
doi:10.1111/j.1751-5823.2011.00144.x
PMCID: PMC3173780
PMID: 21927532
Summary
A routine challenge is that of making inference on parameters in a statistical model of interest from longitudinal data subject to drop out, which are a special case of the more general setting of monotonely coarsened data. Considerable recent attention has focused on doubly robust estimators, which in this context involve positing models for both the missingness (more generally, coarsening) mechanism and aspects of the distribution of the full data, that have the appealing property of yielding consistent inferences if only one of these models is correctly specified. Doubly robust estimators have been criticized for potentially disastrous performance when both of these models are even only mildly misspecified. We propose a doubly robust estimator applicable in general monotone coarsening problems that achieves comparable or improved performance relative to existing doubly robust methods, which we demonstrate via simulation studies and by application to data from an AIDS clinical trial.
doi:10.1111/j.1541-0420.2010.01476.x
PMCID: PMC3061242
PMID: 20731640
Coarsening at random; Discrete hazard; Dropout; Longitudinal data; Missing at random
The Superior Yield of the New Strategy of Enoxaparin, Revascularization, and GlYcoprotein IIb/IIIa inhibitors (SYNERGY) was a randomized, open-label, multicenter clinical trial comparing 2 anticoagulant drugs on the basis of time-to-event endpoints. In contrast to other studies of these agents, the primary, intent-to-treat analysis did not find evidence of a difference, leading to speculation that premature discontinuation of the study agents by some subjects may have attenuated the apparent treatment effect and thus to interest in inference on the difference in survival distributions were all subjects in the population to follow the assigned regimens, with no discontinuation. Such inference is often attempted via ad hoc analyses that are not based on a formal definition of this treatment effect. We use SYNERGY as a context in which to describe how this effect may be conceptualized and to present a statistical framework in which it may be precisely identified, which leads naturally to inferential methods based on inverse probability weighting.
doi:10.1093/biostatistics/kxq054
PMCID: PMC3062147
PMID: 20797983
Dynamic treatment regime; Inverse probability weighting; Potential outcomes; Proportional hazards model
Summary
Often a binary variable is generated by dichotomizing an underlying continuous variable measured at a specific time point according to a prespecified threshold value. In the event that the underlying continuous measurements are from a longitudinal study, one can use repeated measures model to impute missing data on responder status as a result of subject drop-out and apply logistic regression model on the observed or otherwise imputed responder status. Standard Bayesian multiple imputation techniques (Rubin, 1987, Multiple Imputation for Nonresponse in Surveys) which draw the parameters for the imputation model from the posterior distribution and construct the variance of parameter estimates for the analysis model as a combination of within- and between-imputation variances are found to be conservative. The frequentist multiple imputation approach which fixes the parameters for the imputation model at the maximum likelihood estimates and construct the variance of parameter estimates for the analysis model using the results of (Robins and Wang, 2000, Biometrika 87, 113–124) is shown to be more efficient. We propose to apply (Kenward and Roger, 1997, Biometrics 53, 983–997) degrees-of-freedom to account for the uncertainty associated with variance-covariance parameter estimates for the repeated measures model.
doi:10.1111/j.1541-0420.2010.01405.x
PMCID: PMC3245577
PMID: 20337628
Logistic regression; Missing data; Multiple imputation; Repeated measures
Summary
Considerable recent interest has focused on doubly robust estimators for a population mean response in the presence of incomplete data, which involve models for both the propensity score and the regression of outcome on covariates. The usual doubly robust estimator may yield severely biased inferences if neither of these models is correctly specified and can exhibit nonnegligible bias if the estimated propensity score is close to zero for some observations. We propose alternative doubly robust estimators that achieve comparable or improved performance relative to existing methods, even with some estimated propensity scores close to zero.
doi:10.1093/biomet/asp033
PMCID: PMC2798744
PMID: 20161511
Causal inference; Enhanced propensity score model; Missing at random; No unmeasured con-founders; Outcome regression
Considerable recent interest has focused on doubly robust estimators for a population mean response in the presence of incomplete data, which involve models for both the propensity score and the regression of outcome on covariates. The usual doubly robust estimator may yield severely biased inferences if neither of these models is correctly specified and can exhibit nonnegligible bias if the estimated propensity score is close to zero for some observations. We propose alternative doubly robust estimators that achieve comparable or improved performance relative to existing methods, even with some estimated propensity scores close to zero.
doi:10.1093/biomet/asp033
PMCID: PMC2798744
PMID: 20161511
Causal inference; Enhanced propensity score model; Missing at random; No unmeasured confounders; Outcome regression
Summary
For many diseases where there are several treatment options often there is no consensus on the best treatment to give individual patients. In such cases it may be necessary to define a strategy for treatment assignment; that is, an algorithm which dictates the treatment an individual should receive based on their measured characteristics. Such a strategy or algorithm is also referred to as a treatment regime. The optimal treatment regime is the strategy that would provide the most public health benefit by minimizing as many poor outcomes as possible. Using a measure that is a generalization of attributable risk and notions of potential outcomes, we derive an estimator for the proportion of events that could have been prevented had the optimal treatment regime been implemented. Traditional attributable risk studies look at the added risk that can be attributed to exposure of some contaminant, here we will instead study the benefit that can be attributed to using the optimal treatment strategy.
We will show how regression models can be used to estimate the optimal treatment strategy and the attributable benefit of that strategy. We also derive the large sample properties of this estimator. As a motivating example we will apply our methods to an observational study of 3856 patients treated at the Duke University Medical Center with prior coronary artery bypass graft surgery and further heart related problems requiring a catheterization. The patients may be treated with either medical therapy alone or a combination of medical therapy and percutaneous coronary intervention without general consensus on which is the best treatment for individual patients.
doi:10.1111/j.1541-0420.2009.01282.x
PMCID: PMC2891886
PMID: 19508237
Attributable Risk; Causal Inference; Influence Function; Optimal Treatment Regime
Background
Implantable cardioverter defibrillator (ICD) therapy significantly prolongs life in patients at increased risk of sudden cardiac death from depressed left ventricular function. However, it is unclear whether this increased longevity is accompanied by deterioration in quality of life.
Methods
The Sudden Cardiac Death in Heart Failure Trial (SCD-HeFT) compared ICD therapy or amiodarone versus state-of-the-art medical therapy alone in 2521 stable heart failure patients with depressed left ventricular function. Quality of life, a secondary end point of the trial, was prospectively measured at baseline, 3, 12, and 30 months and was 93% to 98% complete. The Duke Activity Status Index (which measures cardiac physical functioning) and the SF-36 Mental Health Inventory (which measures psychological well-being or distress) were prespecified principal quality-of-life outcomes. Multiple additional quality-of-life outcomes were also examined.
Results
Compared with medical therapy alone, psychological well-being in the ICD arm significantly improved at 3 months (p=0.01) and 12 months (p=0.004) but not at 30 months. No clinically or statistically significant differences in physical functioning by treatment were observed. Some other quality-of-life measures improved in the ICD arm at 3 and/or 12 months but none differed significantly at 30 months. ICD shocks within the month preceding a scheduled assessment were associated with decreased quality of life in multiple domains. Amiodarone had no significant effects on the principal quality-of-life outcomes.
Conclusions
In a large primary prevention population with moderately symptomatic heart failure, single lead ICD therapy was not associated with any detectable adverse quality-of-life effects over 30 months of follow-up.
doi:10.1056/NEJMoa0706719
PMCID: PMC2823628
PMID: 18768943
Sudden cardiac death; congestive heart failure; implantable cardioverter-defibrillator; quality of life
SUMMARY
There is considerable debate regarding whether and how covariate adjusted analyses should be used in the comparison of treatments in randomized clinical trials. Substantial baseline covariate information is routinely collected in such trials, and one goal of adjustment is to exploit covariates associated with outcome to increase precision of estimation of the treatment effect. However, concerns are routinely raised over the potential for bias when the covariates used are selected post hoc; and the potential for adjustment based on a model of the relationship between outcome, covariates, and treatment to invite a “fishing expedition” for that leading to the most dramatic effect estimate. By appealing to the theory of semiparametrics, we are led naturally to a characterization of all treatment effect estimators and to principled, practically-feasible methods for covariate adjustment that yield the desired gains in efficiency and that allow covariate relationships to be identified and exploited while circumventing the usual concerns. The methods and strategies for their implementation in practice are presented. Simulation studies and an application to data from an HIV clinical trial demonstrate the performance of the techniques relative to existing methods.
doi:10.1002/sim.3113
PMCID: PMC2562926
PMID: 17960577
baseline variables; clinical trials; covariate adjustment; efficiency; semiparametric theory; variable selection
The pretest–posttest study is commonplace in numerous applications. Typically, subjects are randomized to two treatments, and response is measured at baseline, prior to intervention with the randomized treatment (pretest), and at prespecified follow-up time (posttest). Interest focuses on the effect of treatments on the change between mean baseline and follow-up response. Missing posttest response for some subjects is routine, and disregarding missing cases can lead to invalid inference. Despite the popularity of this design, a consensus on an appropriate analysis when no data are missing, let alone for taking into account missing follow-up, does not exist. Under a semiparametric perspective on the pretest–posttest model, in which limited distributional assumptions on pretest or posttest response are made, we show how the theory of Robins, Rotnitzky and Zhao may be used to characterize a class of consistent treatment effect estimators and to identify the efficient estimator in the class. We then describe how the theoretical results translate into practice. The development not only shows how a unified framework for inference in this setting emerges from the Robins, Rotnitzky and Zhao theory, but also provides a review and demonstration of the key aspects of this theory in a familiar context. The results are also relevant to the problem of comparing two treatment means with adjustment for baseline covariates.
doi:10.1214/088342305000000151
PMCID: PMC2600547
PMID: 19081743
Analysis of covariance; covariate adjustment; influence function; inverse probability weighting; missing at random
Summary
The primary goal of a randomized clinical trial is to make comparisons among two or more treatments. For example, in a two-arm trial with continuous response, the focus may be on the difference in treatment means; with more than two treatments, the comparison may be based on pairwise differences. With binary outcomes, pairwise odds-ratios or log-odds ratios may be used. In general, comparisons may be based on meaningful parameters in a relevant statistical model. Standard analyses for estimation and testing in this context typically are based on the data collected on response and treatment assignment only. In many trials, auxiliary baseline covariate information may also be available, and it is of interest to exploit these data to improve the efficiency of inferences. Taking a semiparametric theory perspective, we propose a broadly-applicable approach to adjustment for auxiliary covariates to achieve more efficient estimators and tests for treatment parameters in the analysis of randomized clinical trials. Simulations and applications demonstrate the performance of the methods.
doi:10.1111/j.1541-0420.2007.00976.x
PMCID: PMC2574960
PMID: 18190618
Covariate adjustment; Hypothesis test; k-arm trial; Kruskal-Wallis test; Log-odds ratio; Longitudinal data; Semiparametric theory
doi:10.1214/07-STS227
PMCID: PMC2397555
PMID: 18516239