In longitudinal and repeated measures data analysis, often the goal is to determine the effect of a treatment or aspect on a particular outcome (e.g., disease progression). We consider a semiparametric repeated measures regression model, where the parametric component models effect of the variable of interest and any modification by other covariates. The expectation of this parametric component over the other covariates is a measure of variable importance. Here, we present a targeted maximum likelihood estimator of the finite dimensional regression parameter, which is easily estimated using standard software for generalized estimating equations.
The targeted maximum likelihood method provides double robust and locally efficient estimates of the variable importance parameters and inference based on the influence curve. We demonstrate these properties through simulation under correct and incorrect model specification, and apply our method in practice to estimating the activity of transcription factor (TF) over cell cycle in yeast. We specifically target the importance of SWI4, SWI6, MBP1, MCM1, ACE2, FKH2, NDD1, and SWI5.
The semiparametric model allows us to determine the importance of a TF at specific time points by specifying time indicators as potential effect modifiers of the TF. Our results are promising, showing significant importance trends during the expected time periods. This methodology can also be used as a variable importance analysis tool to assess the effect of a large number of variables such as gene expressions or single nucleotide polymorphisms.
targeted maximum likelihood; semiparametric; repeated measures; longitudinal; transcription factors
The primary goal of a randomized clinical trial is to make comparisons among two or more treatments. For example, in a two-arm trial with continuous response, the focus may be on the difference in treatment means; with more than two treatments, the comparison may be based on pairwise differences. With binary outcomes, pairwise odds-ratios or log-odds ratios may be used. In general, comparisons may be based on meaningful parameters in a relevant statistical model. Standard analyses for estimation and testing in this context typically are based on the data collected on response and treatment assignment only. In many trials, auxiliary baseline covariate information may also be available, and it is of interest to exploit these data to improve the efficiency of inferences. Taking a semiparametric theory perspective, we propose a broadly-applicable approach to adjustment for auxiliary covariates to achieve more efficient estimators and tests for treatment parameters in the analysis of randomized clinical trials. Simulations and applications demonstrate the performance of the methods.
Covariate adjustment; Hypothesis test; k-arm trial; Kruskal-Wallis test; Log-odds ratio; Longitudinal data; Semiparametric theory
There is considerable debate regarding whether and how covariate adjusted analyses should be used in the comparison of treatments in randomized clinical trials. Substantial baseline covariate information is routinely collected in such trials, and one goal of adjustment is to exploit covariates associated with outcome to increase precision of estimation of the treatment effect. However, concerns are routinely raised over the potential for bias when the covariates used are selected post hoc; and the potential for adjustment based on a model of the relationship between outcome, covariates, and treatment to invite a “fishing expedition” for that leading to the most dramatic effect estimate. By appealing to the theory of semiparametrics, we are led naturally to a characterization of all treatment effect estimators and to principled, practically-feasible methods for covariate adjustment that yield the desired gains in efficiency and that allow covariate relationships to be identified and exploited while circumventing the usual concerns. The methods and strategies for their implementation in practice are presented. Simulation studies and an application to data from an HIV clinical trial demonstrate the performance of the techniques relative to existing methods.
baseline variables; clinical trials; covariate adjustment; efficiency; semiparametric theory; variable selection
In randomized controlled trials (RCTs), treatment assignment is unconfounded with baseline covariates, allowing outcomes to be directly compared between treatment arms. When outcomes are binary, the effect of treatment can be summarized using relative risks, absolute risk reductions and the number needed to treat (NNT). When outcomes are time-to-event in nature, the effect of treatment on the absolute reduction of the risk of an event occurring within a specified duration of follow-up and the associated NNT can be estimated. In observational studies of the effect of treatments on health outcomes, treatment is frequently confounded with baseline covariates. Regression adjustment is commonly used to estimate the adjusted effect of treatment on outcomes. We highlight several limitations of measures of treatment effect that are directly obtained from regression models. We illustrate how both regression-based approaches and propensity-score based approaches allow one to estimate the same measures of treatment effect as those that are commonly reported in RCTs. The CONSORT statement recommends that both relative and absolute measures of treatment effects be reported for RCTs with dichotomous outcomes. The methods described in this paper will allow for similar reporting in observational studies.
randomized controlled trials; observational studies; causal effects; treatment effects; absolute risk reduction; relative risk reduction; number needed to treat; odds ratio; survival time; propensity score; propensity-score matching; regression; non-randomized studies; confounding
In randomized studies researchers may be interested in the effect of treatment assignment on a time-to-event outcome that only exists in a subset selected after randomization. For example, in preventative HIV vaccine trials, it is of interest to determine whether randomization to vaccine affects the time from infection diagnosis until initiation of antiretroviral therapy. Earlier work assessed the effect of treatment on outcome among the principal stratum of individuals who would have been selected regardless of treatment assignment. These studies assumed monotonicity, that one of the principal strata was empty (eg, every person infected in the vaccine arm would have been infected if randomized to placebo). Here we present a sensitivity analysis approach for relaxing monotonicity with a time-to-event outcome. We also consider scenarios where selection is unknown for some subjects because of non-informative censoring (e.g., infection status k years after randomization is unknown for some because of staggered study entry). We illustrate our method using data from an HIV vaccine trial.
Causal inference; HIV; Kaplan-Meier; Monotonicity; Principal stratification
Recent results for case-control sampling suggest when the covariate distribution is constrained by gene-environment independence, semiparametric estimation exploiting such independence yields a great deal of efficiency gain. We consider the efficient estimation of the treatment-biomarker interaction in two-phase sampling nested within randomized clinical trials, incorporating the independence between a randomized treatment and the baseline markers. We develop a Newton–Raphson algorithm based on the profile likelihood to compute the semiparametric maximum likelihood estimate (SPMLE). Our algorithm accommodates both continuous phase-one outcomes and continuous phase-two biomarkers. The profile information matrix is computed explicitly via numerical differentiation. In certain situations where computing the SPMLE is slow, we propose a maximum estimated likelihood estimator (MELE), which is also capable of incorporating the covariate independence. This estimated likelihood approach uses a one-step empirical covariate distribution, thus is straightforward to maximize. It offers a closed-form variance estimate with limited increase in variance relative to the fully efficient SPMLE. Our results suggest exploiting the covariate independence in two-phase sampling increases the efficiency substantially, particularly for estimating treatment-biomarker interactions.
case-only estimator; estimated likelihood; gene-environment independence; Newton–Raphson algorithm; profile likelihood; treatment-biomarker interactions
We examine the practicality of propensity score methods for estimating causal treatment effects conditional on intermediate posttreatment outcomes (principal effects) in the context of randomized experiments. In particular, we focus on the sensitivity of principal causal effect estimates to violation of principal ignorability, which is the primary assumption that underlies the use of propensity score methods to estimate principal effects. Under principal ignorability, principal strata membership is conditionally independent of the potential outcome under control given the pre-treatment covariates; i.e., there are no differences in the potential outcomes under control across principal strata given the observed pretreatment covariates. Under this assumption, principal scores modeling principal strata membership can be estimated based solely on the observed covariates and used to predict strata membership and estimate principal effects. While this assumption underlies the use of propensity scores in this setting, sensitivity to violations of it has not been studied rigorously. In this paper, we explicitly define principal ignorability using the outcome model (although we do not actually use this outcome model in estimating principal scores) and systematically examine how deviations from the assumption affect estimates, including how the strength of association between principal stratum membership and covariates modifies the performance. We find that when principal ignorability is violated, very strong covariate predictors of stratum membership are needed to yield accurate estimates of principal effects.
randomized experiments; intermediate outcomes; principal ignorability; principal scores; principal stratification; propensity scores
A treatment regime is a rule that assigns a treatment, among a set of possible treatments, to a patient as a function of his/her observed characteristics, hence “personalizing” treatment to the patient. The goal is to identify the optimal treatment regime that, if followed by the entire population of patients, would lead to the best outcome on average. Given data from a clinical trial or observational study, for a single treatment decision, the optimal regime can be found by assuming a regression model for the expected outcome conditional on treatment and covariates, where, for a given set of covariates, the optimal treatment is the one that yields the most favorable expected outcome. However, treatment assignment via such a regime is suspect if the regression model is incorrectly specified. Recognizing that, even if misspecified, such a regression model defines a class of regimes, we instead consider finding the optimal regime within such a class by finding the regime the optimizes an estimator of overall population mean outcome. To take into account possible confounding in an observational study and to increase precision, we use a doubly robust augmented inverse probability weighted estimator for this purpose. Simulations and application to data from a breast cancer clinical trial demonstrate the performance of the method.
Doubly robust estimator; Inverse probability weighting; Outcome regression; Personalized medicine; Potential outcomes; Propensity score
In experimental research, it is not uncommon to assign clusters to conditions. When analysing the data of such cluster-randomized trials, a multilevel analysis should be applied in order to take into account the dependency of first-level units (i.e., subjects) within a second-level unit (i.e., a cluster). Moreover, the multilevel analysis can handle covariates on both levels. If a first-level covariate is involved, usually the within-cluster effect of this covariate will be estimated, implicitly assuming the contextual effect to be equal. However, this assumption may be violated. The focus of the present simulation study is the effects of ignoring the inequality of the within-cluster and contextual covariate effects on parameter and standard error estimates of the treatment effect, which is the parameter of main interest in experimental research. We found that ignoring the inequality of the within-cluster and contextual effects does not affect the estimation of the treatment effect or its standard errors. However, estimates of the variance components, as well as standard errors of the constant, were found to be biased.
Group-randomized design; Multilevel analysis; Within-cluster regression; Between-cluster regression; Hierarchical linear model; Random coefficient model
We demonstrate that clinical trials using response adaptive randomized treatment assignment rules are subject to substantial bias if there are time trends in unknown prognostic factors and standard methods of analysis are used. We develop a general class of randomization tests based on generating the null distribution of a general test statistic by repeating the adaptive randomized treatment assignment rule holding fixed the sequence of outcome values and covariate vectors actually observed in the trial. We develop broad conditions on the adaptive randomization method and the stochastic mechanism by which outcomes and covariate vectors are sampled that ensure that the type I error is controlled at the level of the randomization test. These conditions ensure that the use of the randomization test protects the type I error against time trends that are independent of the treatment assignments. Under some conditions in which the prognosis of future patients is determined by knowledge of the current randomization weights, the type I error is not strictly protected. We show that response-adaptive randomization can result in substantial reduction in statistical power when the type I error is preserved. Our results also ensure that type I error is controlled at the level of the randomization test for adaptive stratification designs used for balancing covariates.
Response adaptive randomization; adaptive stratification; clinical trials
In biomedical studies where the event of interest is recurrent (e.g., hospitalization), it is often the case that the recurrent event sequence is subject to being stopped by a terminating event (e.g., death). In comparing treatment options, the marginal recurrent event mean is frequently of interest. One major complication in the recurrent/terminal event setting is that censoring times are not known for subjects observed to die, which renders standard risk set based methods of estimation inapplicable. We propose two semiparametric methods for estimating the difference or ratio of treatment-specific marginal means numbers of events. The first method involves imputing unobserved censoring times, while the second methods uses inverse probability of censoring weighting. In each case, imbalances in the treatment-specific covariate distributions are adjusted out through inverse probability of treatment weighting. After the imputation and/or weighting, the treatment-specific means (then their difference or ratio) are estimated nonparametrically. Large-sample properties are derived for each of the proposed estimators, with finite sample properties assessed through simulation. The proposed methods are applied to kidney transplant data.
Censoring; Imputation; Inverse weighting; Marginal mean; Multivariate survival analysis; Semiparametric methods
Most randomized efficacy trials of interventions to prevent HIV or other infectious diseases have assessed intervention efficacy by a method that either does not incorporate baseline covariates, or that incorporates them in a non-robust or inefficient way. Yet, it has long been known that randomized treatment effects can be assessed with greater efficiency by incorporating baseline covariates that predict the response variable. Tsiatis et al. (2007) and Zhang et al. (2008) advocated a semiparametric efficient approach, based on the theory of Robins et al. (1994), for consistently estimating randomized treatment effects that optimally incorporates predictive baseline covariates, without any parametric assumptions. They stressed the objectivity of the approach, which is achieved by separating the modeling of baseline predictors from the estimation of the treatment effect. While their work adequately justifies implementation of the method for large Phase 3 trials (because its optimality is in terms of asymptotic properties), its performance for intermediate-sized screening Phase 2b efficacy trials, which are increasing in frequency, is unknown. Furthermore, the past work did not consider a right-censored time-to-event endpoint, which is the usual primary endpoint for a prevention trial. For Phase 2b HIV vaccine efficacy trials, we study finite-sample performance of Zhang et al.'s (2008) method for a dichotomous endpoint, and develop and study an adaptation of this method to a discrete right-censored time-to-event endpoint. We show that, given the predictive capacity of baseline covariates collected in real HIV prevention trials, the methods achieve 5-15% gains in efficiency compared to methods in current use. We apply the methods to the first HIV vaccine efficacy trial. This work supports implementation of the discrete failure time method for prevention trials.
Auxiliary; Covariate Adjustment; Intermediate-sized Phase 2b Efficacy Trial; Semiparametric Efficiency
In this article, we study the estimation of mean response and regression coefficient in semiparametric regression problems when response variable is subject to nonrandom missingness. When the missingness is independent of the response conditional on high-dimensional auxiliary information, the parametric approach may misspecify the relationship between covariates and response while the nonparametric approach is infeasible because of the curse of dimensionality. To overcome this, we study a model-based approach to condense the auxiliary information and estimate the parameters of interest nonparametrically on the condensed covariate space. Our estimators possess the double robustness property, i.e., they are consistent whenever the model for the response given auxiliary covariates or the model for the missingness given auxiliary covariate is correct. We conduct a number of simulations to compare the numerical performance between our estimators and other existing estimators in the current missing data literature, including the propensity score approach and the inverse probability weighted estimating equation. A set of real data is used to illustrate our approach.
Auxiliary covariate; High-dimensional data; Kernel estimation; Missing at random; Semiparametric regression
Audits are often performed to assess the quality of clinical trial data, but beyond detecting fraud or sloppiness, the audit data is generally ignored. In earlier work using data from a non-randomized study, Shepherd and Yu (2011) developed statistical methods to incorporate audit results into study estimates, and demonstrated that audit data could be used to eliminate bias.
In this manuscript we examine the usefulness of audit-based error-correction methods in clinical trial settings where a continuous outcome is of primary interest.
We demonstrate the bias of multiple linear regression estimates in general settings with an outcome that may have errors and a set of covariates for which some may have errors and others, including treatment assignment, are recorded correctly for all subjects. We study this bias under different assumptions including independence between treatment assignment, covariates, and data errors (conceivable in a double-blinded randomized trial) and independence between treatment assignment and covariates but not data errors (possible in an unblinded randomized trial). We review moment-based estimators to incorporate the audit data and propose new multiple imputation estimators. The performance of estimators is studied in simulations.
When treatment is randomized and unrelated to data errors, estimates of the treatment effect using the original error-prone data (i.e., ignoring the audit results) are unbiased. In this setting, both moment and multiple imputation estimators incorporating audit data are more variable than standard analyses using the original data. In contrast, in settings where treatment is randomized but correlated with data errors and in settings where treatment is not randomized, standard treatment effect estimates will be biased. And in all settings, parameter estimates for the original, error-prone covariates will be biased. Treatment and covariate effect estimates can be corrected by incorporating audit data using either the multiple imputation or moment-based approaches. Bias, precision, and coverage of confidence intervals improve as the audit size increases.
The extent of bias and the performance of methods depend on the extent and nature of the error as well as the size of the audit. This work only considers methods for the linear model. Settings much different than those considered here need further study.
In randomized trials with continuous outcomes and treatment assignment independent of data errors, standard analyses of treatment effects will be unbiased and are recommended. However, if treatment assignment is correlated with data errors or other covariates, naive analyses may be biased. In these settings, and when covariate effects are of interest, approaches for incorporating audit results should be considered.
audit; bias; clinical trials; measurement error; multiple imputation
Estimation of longitudinal data covariance structure poses significant challenges because the data are usually collected at irregular time points. A viable semiparametric model for covariance matrices was proposed in Fan, Huang and Li (2007) that allows one to estimate the variance function nonparametrically and to estimate the correlation function parametrically via aggregating information from irregular and sparse data points within each subject. However, the asymptotic properties of their quasi-maximum likelihood estimator (QMLE) of parameters in the covariance model are largely unknown. In the current work, we address this problem in the context of more general models for the conditional mean function including parametric, nonparametric, or semi-parametric. We also consider the possibility of rough mean regression function and introduce the difference-based method to reduce biases in the context of varying-coefficient partially linear mean regression models. This provides a more robust estimator of the covariance function under a wider range of situations. Under some technical conditions, consistency and asymptotic normality are obtained for the QMLE of the parameters in the correlation function. Simulation studies and a real data example are used to illustrate the proposed approach.
Correlation structure; difference-based estimation; quasi-maximum likelihood; varying-coefficient partially linear model
In typical political experiments, researchers randomize a set of households, precincts, or individuals to treatments all at once, and characteristics of all units are known at the time of randomization. However, in many other experiments, subjects “trickle in” to be randomized to treatment conditions, usually via complete randomization. To take advantage of the rich background data that researchers often have (but underutilize) in these experiments, we develop methods that use continuous covariates to assign treatments sequentially. We build on biased coin and minimization procedures for discrete covariates and demonstrate that our methods outperform complete randomization, producing better covariate balance in simulated data. We then describe how we selected and deployed a sequential blocking method in a clinical trial and demonstrate the advantages of our having done so. Further, we show how that method would have performed in two larger sequential political trials. Finally, we compare causal effect estimates from differences in means, augmented inverse propensity weighted estimators, and randomization test inversion.
We consider the problem of jointly modeling survival time and longitudinal data subject to measurement error. The survival times are modeled through the proportional hazards model and a random effects model is assumed for the longitudinal covariate process. Under this framework, we propose an approximate nonparametric corrected-score estimator for the parameter, which describes the association between the time-to-event and the longitudinal covariate. The term nonparametric refers to the fact that assumptions regarding the distribution of the random effects and that of the measurement error are unnecessary. The finite sample size performance of the approximate nonparametric corrected-score estimator is examined through simulation studies and its asymptotic properties are also developed. Furthermore, the proposed estimator and some existing estimators are applied to real data from an AIDS clinical trial.
Corrected score; Cumulant generating function; Measurement error; Proportional hazards; Random effects
Covariate-specific ROC curves are often used to evaluate the classification accuracy of a medical diagnostic test or a biomarker, when the accuracy of the test is associated with certain covariates. In many large-scale screening tests, the gold standard is subject to missingness due to high cost or harmfulness to the patient. In this paper, we propose a semiparametric estimation of the covariate-specific ROC curves with a partial missing gold standard. A location-scale model is constructed for the test result to model the covariates’ effect, but the residual distributions are left unspecified. Thus the baseline and link functions of the ROC curve both have flexible shapes. With the gold standard missing at random (MAR) assumption, we consider weighted estimating equations for the location-scale parameters, and weighted kernel estimating equations for the residual distributions. Three ROC curve estimators are proposed and compared, namely, imputation-based, inverse probability weighted and doubly robust estimators. We derive the asymptotic normality of the estimated ROC curve, as well as the analytical form the standard error estimator. The proposed method is motivated and applied to the data in an Alzheimer's disease research.
Alzheimer's disease; covariate-specific ROC curve; ignorable missingness; verification bias; weighted estimating equations
Marginal structural models (MSM) are an important class of models in causal inference. Given a longitudinal data structure observed on a sample of n independent and identically distributed experimental units, MSM model the counterfactual outcome distribution corresponding with a static treatment intervention, conditional on user-supplied baseline covariates. Identification of a static treatment regimen-specific outcome distribution based on observational data requires, beyond the standard sequential randomization assumption, the assumption that each experimental unit has positive probability of following the static treatment regimen. The latter assumption is called the experimental treatment assignment (ETA) assumption, and is parameter-specific. In many studies the ETA is violated because some of the static treatment interventions to be compared cannot be followed by all experimental units, due either to baseline characteristics or to the occurrence of certain events over time. For example, the development of adverse effects or contraindications can force a subject to stop an assigned treatment regimen.
In this article we propose causal effect models for a user-supplied set of realistic individualized treatment rules. Realistic individualized treatment rules are defined as treatment rules which always map into the set of possible treatment options. Thus, causal effect models for realistic treatment rules do not rely on the ETA assumption and are fully identifiable from the data. Further, these models can be chosen to generalize marginal structural models for static treatment interventions. The estimating function methodology of Robins and Rotnitzky (1992) (analogue to its application in Murphy, et. al. (2001) for a single treatment rule) provides us with the corresponding locally efficient double robust inverse probability of treatment weighted estimator.
In addition, we define causal effect models for “intention-to-treat” regimens. The proposed intention-to-treat interventions enforce a static intervention until the time point at which the next treatment does not belong to the set of possible treatment options, at which point the intervention is stopped. We provide locally efficient estimators of such intention-to-treat causal effects.
counterfactual; causal effect; causal inference; double robust estimating function; dynamic treatment regimen; estimating function; individualized stopped treatment regimen; individualized treatment rule; inverse probability of treatment weighted estimating functions; locally efficient estimation; static treatment intervention
Recurrent event data analyses are usually conducted under the assumption that the censoring time is independent of the recurrent event process. In many applications the censoring time can be informative about the underlying recurrent event process, especially in situations where a correlated failure event could potentially terminate the observation of recurrent events. In this paper, we consider a semiparametric model of recurrent event data that allows correlations between censoring times and recurrent event process via frailty. This flexible framework incorporates both time-dependent and time-independent covariates in the formulation, while leaving the distributions of frailty and censoring times unspecified. We propose a novel semiparametric inference procedure that depends on neither the frailty nor the censoring time distribution. Large sample properties of the regression parameter estimates and the estimated baseline cumulative intensity functions are studied. Numerical studies demonstrate that the proposed methodology performs well for realistic sample sizes. An analysis of hospitalization data for patients in an AIDS cohort study is presented to illustrate the proposed method.
Comparable recurrence times; Frailty; Pairwise pseudolikelihood; Proportional rate model
This article discusses regression analysis of multivariate panel count data in which the observation process may contain relevant information about or be related to the underlying recurrent event processes of interest. Such data occur if a recurrent event study involves several related types of recurrent events and the observation scheme or process may be subject-specific. For the problem, a class of semiparametric transformation models is presented, which provides a great flexibility for modelling the effects of covariates on the recurrent event processes. For estimation of regression parameters, an estimating equation-based inference procedure is developed and the asymptotic properties of the resulting estimates are established. Also the proposed approach is evaluated by simulation studies and applied to the data arising from a skin cancer chemoprevention trial.
Counting processes; multivariate data analysis; panel count data; transformation models
Subgroup analysis arises in clinical trials research when we wish to estimate a treatment effect on a specific subgroup of the population distinguished by baseline characteristics. Many trial designs induce latent subgroups such that subgroup membership is observable in one arm of the trial and unidentified in the other. This occurs, for example, in oncology trials when a biopsy or dissection is performed only on subjects randomized to active treatment. We discuss a general framework to estimate a biological treatment effect on the latent subgroup of interest when the survival outcome is right-censored and can be appropriately modelled as a parametric function of covariate effects. Our framework builds on the application of instrumental variables methods to all-or-none treatment noncompliance. We derive a computational method to estimate model parameters via the EM algorithm and provide guidance on its implementation in standard software packages. The research is illustrated through an analysis of a seminal melanoma trial that proposed a new standard of care for the disease and involved a biopsy that is available only on patients in the treatment arm.
survival analysis; accelerated failure time model; treatment noncompliance; mixture model; EM algorithm
In assessing the mechanism of treatment efficacy in randomized clinical trials, investigators often perform mediation analyses by analyzing if the significant intent-to-treat treatment effect on outcome occurs through or around a third intermediate or mediating variable: indirect and direct effects, respectively. Standard mediation analyses assume sequential ignorability, i.e., conditional on covariates the intermediate or mediating factor is randomly assigned, as is the treatment in a randomized clinical trial. This research focuses on the application of the principal stratification approach for estimating the direct effect of a randomized treatment but without the standard sequential ignorability assumption. This approach is used to estimate the direct effect of treatment as a difference between expectations of potential outcomes within latent sub-groups of participants for whom the intermediate variable behavior would be constant, regardless of the randomized treatment assignment. Using a Bayesian estimation procedure, we also assess the sensitivity of results based on the principal stratification approach to heterogeneity of the variances among these principal strata. We assess this approach with simulations and apply it to two psychiatric examples. Both examples and the simulations indicated robustness of our findings to the homogeneous variance assumption. However, simulations showed that the magnitude of treatment effects derived under the principal stratification approach were sensitive to model mis-specification.
Principal stratification; mediating variables; direct effects; principal strata probabilities; heterogeneous variances
We consider statistical inference for additive partial linear models when the linear covariate is measured with error. We propose attenuation-to-correction and SIMEX estimators of the parameter of interest. It is shown that the first resulting estimator is asymptotically normal and requires no undersmoothing. This is an advantage of our estimator over existing backfitting-based estimators for semiparametric additive models which require undersmoothing of the nonparametric component in order for the estimator of the parametric component be root-n consistent. This feature stems from a decrease of the bias of the resulting estimator which is appropriately derived using a profile procedure. A similar characteristic in semiparametric partially linear models was obtained by Wang et al. (2005). We also discuss the asymptotics of the proposed SIMEX approach. Finite-sample performance of the proposed estimators is assessed by simulation experiments. The proposed methods are applied to a dataset from a semen study.
Backfitting; Correction-for-attenuation; Error-prone; Local linear regression; Semen quality study; Semiparametric estimation; SIMEX; Undersmoothing
Covariate adjustment using linear models for continuous outcomes in randomized trials has been shown to increase efficiency and power over the unadjusted method in estimating the marginal effect of treatment. However, for binary outcomes, investigators generally rely on the unadjusted estimate as the literature indicates that covariate-adjusted estimates based on the logistic regression models are less efficient. The crucial step that has been missing when adjusting for covariates is that one must integrate/average the adjusted estimate over those covariates in order to obtain the marginal effect. We apply the method of targeted maximum likelihood estimation (tMLE) to obtain estimators for the marginal effect using covariate adjustment for binary outcomes. We show that the covariate adjustment in randomized trials using the logistic regression models can be mapped, by averaging over the covariate(s), to obtain a fully robust and efficient estimator of the marginal effect, which equals a targeted maximum likelihood estimator. This tMLE is obtained by simply adding a clever covariate to a fixed initial regression. We present simulation studies that demonstrate that this tMLE increases efficiency and power over the unadjusted method, particularly for smaller sample sizes, even when the regression model is mis-specified.
clinical trails; efficiency; covariate adjustment; variable selection