Int J Biostat. 2011 January 1; 7(1): Article 31.
Published online 2011 August 17. doi: 10.2202/1557-4679.1308
PMCID: PMC3173607

The Relative Performance of Targeted Maximum Likelihood Estimators


There is an active debate in the literature on censored data about the relative performance of model based maximum likelihood estimators, IPCW-estimators, and a variety of double robust semiparametric efficient estimators. Kang and Schafer (2007) demonstrate the fragility of double robust and IPCW-estimators in a simulation study with positivity violations. They focus on a simple missing data problem with covariates, where one desires to estimate the mean of an outcome that is subject to missingness. Responses by Robins et al. (2007), Tsiatis and Davidian (2007), Tan (2007), and Ridgeway and McCaffrey (2007) further explore the challenges faced by double robust estimators and offer suggestions for improving their stability. In this article, we join the debate by presenting targeted maximum likelihood estimators (TMLEs). We demonstrate that TMLEs in which the parametric submodel employed by the TMLE procedure is guaranteed to respect the global bounds on the continuous outcome are especially suitable for dealing with positivity violations, because, in addition to being double robust and semiparametric efficient, they are substitution estimators. We demonstrate the practical performance of TMLEs relative to other estimators in the simulations designed by Kang and Schafer (2007) and in modified simulations with even greater estimation challenges.

Keywords: censored data, collaborative double robustness, collaborative targeted maximum likelihood estimation, double robust, estimator selection, inverse probability of censoring weighting, locally efficient estimation, maximum likelihood estimation, semiparametric model, targeted maximum likelihood estimation, targeted minimum loss based estimation, targeted nuisance parameter estimator selection

1. Introduction

The translation of a scientific question into a statistical estimation problem often involves the formulation of a full-data structure, a target parameter of the full-data probability distribution representing the scientific question of interest, and an observed data structure which can be viewed as a mapping on the full data structure and a censoring variable. One must identify the target parameter of the full-data distribution from the probability distribution of the observed data structure, which often requires particular modeling assumptions such as the coarsening at random assumption on the censoring mechanism (i.e., the conditional distribution of censoring, given the full-data structure). The statistical problem is then reduced to a pure estimation problem defined by the challenge of constructing an estimator of the estimand, defined by the identifiability result for the target parameter of the full-data distribution. The estimator should respect the statistical model implied by the posed assumptions on the censoring mechanism and the full-data distribution.

For semiparametric (e.g., nonparametric) statistical models, many estimators rely in one way or another on the inverse probability of censoring weights (IPCW). Such estimators can be biased and highly variable under practical or theoretical violations of the positivity assumption, which is a support condition on the censoring mechanism that is necessary to establish the identifiability of the target parameter (e.g., Robins (1986, 1987, 1999); Neugebauer and van der Laan (2005); Petersen et al. (2010)). A particular class of estimators is the class of so-called double robust estimators (e.g., van der Laan and Robins (2003)). Double robust (DR) estimators, which rely on both IPCW and a model of the full-data distribution, are not necessarily protected from the bias or inflated variance that can result from positivity violations, and in the recent literature there is much debate on the relative performance of DR estimators when the positivity assumption is violated. In particular, Kang and Schafer (2007) (KS) demonstrate the fragility of DR estimators in a simulation study with near, or practical, positivity violations. They focus on a simple missing data problem in which one wishes to estimate the mean of an outcome that is subject to missingness and all possible covariates for predicting missingness are measured. Responses by Robins et al. (2007), Tsiatis and Davidian (2007), Tan (2007), and Ridgeway and McCaffrey (2007) further explore the challenges faced by DR estimators and offer suggestions for improving their stability.

Under regularity conditions, DR estimators are asymptotically unbiased if either the model of the conditional expectation of the outcome given the covariates or the model of the conditional probability of missingness given the covariates is consistent. DR estimators are semiparametric efficient (for the nonparametric model for the full-data distribution) if both of these estimators are consistent. In their article, KS introduce a variety of DR estimators and compare them to non-DR IPCW estimators as well as a simple parametric model based ordinary least squares (OLS) estimator. As the KS simulation has practical positivity violations, some values of both the true and estimated missingness mechanism are very close to zero. In this situation, the IPCW will be extremely large for some observations in the sample. Therefore, DR and non-DR estimators that rely on IPCW may be unreliable. As a result, KS warn against the routine use of estimators that rely on IPCW, including DR estimators. This is in agreement with other literature analyzing the issue; for an overview, see, for example, Robins (1986, 1987, 1999), Robins and Wang (2000), and van der Laan and Robins (2003). For simulations demonstrating the extreme sparsity bias of IPCW-estimators, see, for example, Neugebauer and van der Laan (2005). In response to this concern, Petersen et al. (2010), Wang et al. (2006a), Moore et al. (2009), Cole and Hernan (2008), Kish (1992), and Bembom and van der Laan (2008) have focused on diagnosing violations of the positivity assumption. Bembom and van der Laan (2008) presented data adaptive selection of the truncation constant to control the influence of the weighting. In addition, van der Laan and Petersen (2007) and Petersen et al. (2010) discussed selecting target parameters that rely on realistic assumptions.

The particular simulation in KS also gives rise to a situation in which under dual misspecification, the OLS estimator outperforms all of the presented DR estimators. While this is an interesting issue, it is not the main focus of this article. In our view, dual misspecification brings up the need for other strategies for improving the robustness of estimators in general, such as incorporating data adaptive estimation instead of relying on parametric regression models for the missingness mechanism and the conditional distribution of responses, an idea echoed in the responses by Tsiatis and Davidian (2007) and Ridgeway and McCaffrey (2007), and standardly incorporated in the UC Berkeley literature on targeted maximum likelihood estimation (e.g., van der Laan and Rubin (2006); van der Laan et al. (2009)). In particular, we note that a statistical estimation problem is also defined by the statistical model, which, in this case, is defined by a nonparametric model: such models require data adaptive estimators in order to claim that the estimator is consistent. Nonetheless, we explicitly demonstrate the impact of the utilization of machine learning on the simulation results in a final section of this article.

In their response to the KS paper, Robins et al. (2007) point out that a desirable property of DR estimators is “boundedness,” in that for a finite sample, estimators of the mean response fall in the parameter space with probability 1. Estimators that impose such a restriction can introduce new bias but avoid the challenges of highly variable weights. Robins et al. (2007) discuss ways in which to guarantee that “boundedness” holds and present two classes of bounded estimators: regression DR estimators and bounded Horvitz-Thompson DR estimators. We define examples of these estimators below, and we evaluate their relative performance. The response by Tsiatis and Davidian (2007) offers strategies for constructing estimators that are more robust under the circumstances in the KS simulations. In particular, to address positivity violations, they suggest an estimator that uses IPCW only for observations with missingness mechanism values that are not close to zero, while using regression predictions for the observations with very small missingness mechanism values. One might consider either a hard cutoff for dividing observations or weighting each part of the influence curve by the estimated missingness mechanism. Tan (2007) also points to an improved locally efficient double robust estimator (Tan (2006)) that maintains double robustness and provides a guaranteed improvement relative to an initial estimator, improving on earlier estimators of this type that had an algebraically similar form but failed to guarantee both properties (Robins et al. (1994); see also van der Laan and Robins (2003)). Many responders also make valuable suggestions regarding the dual misspecification challenge.

In the current paper, adapted in part from Sekhon et al. (2011), we add targeted maximum likelihood estimators (TMLEs), or more generally, targeted minimum loss based estimators (van der Laan and Rubin (2006)), to the debate on the relative performance of DR estimators under practical violations of the positivity assumption in the particular simple missing data problem set forth by KS. TMLEs involve a two-step procedure in which one first estimates the conditional expectation of the outcome, given the covariates, and then updates this initial estimator, targeting the parameter of interest rather than the overall conditional mean of the outcome given the covariates. The second step requires specification of a loss function (e.g., the log-likelihood loss function) and a parametric submodel through the initial regression, so that one can fit the parametric submodel by minimizing the empirical risk (e.g., maximizing the log-likelihood). The estimator of the target parameter is then defined as the corresponding substitution estimator. Because TMLEs are substitution estimators, they not only respect the global bounds of the parameter and data (and thus satisfy the “boundedness” property defined by Robins et al. (2007)), but, even more importantly, they respect the fact that the true parameter value is a particular function of the data generating probability distribution.

TMLEs are double robust and asymptotically efficient. Moreover, TMLEs can incorporate data-adaptive likelihood or loss based estimation procedures to estimate both the conditional expectation of the outcome and the missingness mechanism. The TMLE also allows the incorporation of targeted estimation of the censoring/treatment mechanism, as embodied by the collaborative TMLE (C-TMLE), thereby fully confronting a long-standing problem of how to select covariates for the propensity score/missingness mechanism of DR estimators. In this article, we compare the performance of TMLEs to other DR estimators in the literature using the exact simulation study presented in the KS paper. We also make slight modifications to the KS simulation, in order to make the estimation even more challenging.

The remainder of this article is organized as follows. Section 2 presents notation, which deviates from that presented in KS, for the data structure and parameter of interest. Section 3 formally defines the positivity assumption and gives an overview of causes, diagnostics and responses to violations. Section 4 defines the estimators on which we focus in this paper, including a sample of estimators in the literature and TMLEs. Section 5 compares estimator performance in the original and modified KS simulations. Section 6 then looks at coupling TMLEs with machine learning. Section 7 concludes with a discussion of the findings.

2. Data Structure, Statistical Model, and Parameter of Interest

Consider an observed data set consisting of n independent and identically distributed (i.i.d.) observations of O = (W, Δ, ΔY) ~ P0. W is a vector of covariates, and Δ indicates whether Y, a continuous outcome, is observed (Δ = 1 if Y is observed). P0 denotes the true distribution of O, from which all observations are sampled. We view O as a missing data structure on a hypothetical full data structure X = (W, Y), which contains the true, or potential, value of Y for all observations, as if no values were missing. We assume Y is missing at random (MAR), such that P0(Δ = 1 | X) = g0(1 | W). In other words, we assume there are no unobserved confounders of the relationship between the missingness Δ and the outcome Y.

We define Q0 = (Q0,W, Q̄0), where Q0,W(w) = P0(W = w) and Q̄0(W) ≡ E0(Y | Δ = 1, W). We make no assumptions about Q0. The generalized Cramer-Rao information bound for any parameter of Q0 does not depend on the statistical model for the missingness mechanism g0. The parameter of interest is the mean outcome E0(Y) for the sampled population, as if there were no missing observations of Y. Due to the MAR assumption and the positivity assumption defined below, our target parameter is identified from P0 by the following mapping from Q0:

$\mu(P_0) = E_0\left[\bar{Q}_0(W)\right] = E_0\left[E_0(Y \mid \Delta = 1, W)\right].$
3. The Positivity Assumption

The identifiability of the parameter of interest μ(P0) requires MAR and adequate support in the data. Regarding the latter, it requires that within each stratum of W, there is positive probability that Y is not missing. This requirement is often referred to as the positivity assumption. Formally, for our target parameter, the positivity assumption requires that:

$g_0(\Delta = 1 \mid W) > 0, \quad P_0\text{-almost everywhere}.$

The positivity assumption is specific to the target parameter. For example, the positivity assumption for the target parameter E0{E0(Y | A = 1, W) − E0(Y | A = 0, W)} of the probability distribution of O = (W, A, Y), representing the additive causal effect under causal assumptions, requires that within each stratum of W there is a positive probability for each possible treatment assignment. For example, if A is a binary treatment, then positivity requires that 0 < g0(A = 1 | W) < 1. (The assumption is often referred to as the experimental treatment assignment (ETA) assumption for causal parameters.) In addition to being parameter-specific, the positivity assumption is also model-specific. Parametric model assumptions, which extrapolate to regions of the joint distribution of (A, W) that may not be supported in the data, allow for weakening the positivity assumption (Petersen et al. (2010)). However, analysts need to be sure that their parametric assumptions actually hold true, which may be difficult if not impossible to verify.

Violations and near violations of the positivity assumption can arise for two reasons. First, it may be theoretically impossible or highly unlikely for the outcome Y to be observed for certain covariate values in the population of interest. The threat to identifiability due to such structural violations of positivity exists regardless of the sample size. Second, given a finite sample, the probability of the outcome being observed for some covariate values might be so small that the observed sample cannot be distinguished from a sample drawn under a theoretical violation of the positivity assumption. The effects of such practical violations of the positivity assumption are sample-size specific, and the resulting sparse data bias and inflated variance are often as dramatic as under structural violations.

Several approaches for diagnosing bias due to positivity violations have been suggested (see Petersen et al. (2010) for an overview). Analysts may assess the distribution of Δ within covariate strata (or in the case of causal parameters, the distribution of treatment assignment), but this method is not practical with high dimensional covariate sets or with continuous or multi-level covariates, and also provides no quantitative measure of the resulting sparse-data bias. Analysts may also assess the distribution of the estimated missingness mechanism scores, gn (Δ = 1 | W), or inverse probability weights. While this approach may indicate positivity violations, it does not provide any information on the extent of potential bias of the chosen estimator. Wang et al. (2006b) introduce and Petersen et al. (2010) further discuss a diagnostic that provides an estimate of positivity bias for any candidate estimator, which is based on a parametric bootstrap. Bias estimates of similar or larger magnitude than an estimate’s standard error can raise a red flag to analysts that inference for their target parameter is threatened by lack of positivity.
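The simplest of these diagnostics is easy to carry out in practice. The following minimal R sketch assumes a hypothetical data frame d with covariates W1, . . . , W4, a missingness indicator Delta, and an outcome Y (NA when missing), and an illustrative main terms logistic model for the missingness mechanism; it merely inspects the lower tail of the estimated missingness probabilities and the size of the implied weights.

    # A minimal sketch of the simplest positivity diagnostic described above:
    # inspect the distribution of the estimated missingness probabilities.
    # `d` is a hypothetical data frame with columns W1..W4, Delta, Y.
    gfit <- glm(Delta ~ W1 + W2 + W3 + W4, data = d, family = binomial())
    gn   <- predict(gfit, type = "response")   # estimated g_n(1 | W)
    quantile(gn, c(0, 0.01, 0.05, 0.5))        # very small values flag sparsity
    max(1 / gn[d$Delta == 1])                  # largest implied IPCW weight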

When censoring probabilities are close to 0 (or 1 in the case of an effect parameter), a common practice is to truncate the probabilities or the resulting inverse probability weights, either at fixed levels or at percentiles (Petersen et al. (2010); Wang et al. (2006a); Moore et al. (2009); Cole and Hernan (2008); Kish (1992); Bembom and van der Laan (2008)). The practice limits the influence of observations with large unbounded weights, which may reduce positivity bias and rein in inflated variance. However, this practice may also introduce bias, due to misspecification of the missingness mechanism gn. The extent to which truncating gn hurts or helps the performance of an estimator depends on the level of truncation, the estimator and the distribution of the data. In our simulations below, we examine the effect of truncating missingness probabilities for all estimators that we introduce in the next section.
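In the same R sketch notation (hypothetical data frame d and fitted probabilities gn from above), truncation amounts to a one-line operation; the 0.025 level examined in our simulations is assumed here.

    # A minimal sketch of bounding the estimated missingness probabilities
    # from below; equivalently, one may cap the resulting IPCW weights.
    gn_bounded <- pmax(gn, 0.025)      # truncate g_n(1 | W) at 0.025
    w_capped   <- pmin(1 / gn, 40)     # the corresponding weight cap, 1/0.025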

4. Estimators of a Mean Outcome when the Outcome is Subject to Missingness

4.1. Estimators in the Literature

As a benchmark, KS compare all estimators in their paper to the ordinary least squares (OLS) estimator. For the target parameter, the OLS estimator is equivalent to the G-computation estimator based on a linear regression model. It is defined as:

$\mu_{n,OLS} = \frac{1}{n}\sum_{i=1}^{n} \bar{Q}_n^0(W_i),$

where $\bar{Q}_n^0 = m_{\beta_n}$ is a linear regression initial fit of $\bar{Q}_0$, and $\beta_n$ is given by:

$\beta_n = \arg\min_{\beta} \sum_{i=1}^{n} \Delta_i \left(Y_i - m_{\beta}(W_i)\right)^2.$

(Note that in our notation, the subscript n refers to an estimate based on the sample, and the superscript indicates whether the estimate is an initial fit (superscript 0) or, as we introduce below, a refit (no superscript) or a fluctuated fit (superscript *).) Under violation of the positivity assumption, the OLS estimator, when defined, extrapolates from strata of W in which there is support to strata of W that lack adequate support. The extrapolation depends on the validity of the linear regression model, and misspecification leads to bias.
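As an illustration, a minimal R sketch of this estimator follows, again assuming the hypothetical data frame d and a main terms linear model (which may well be misspecified):

    # OLS / G-computation: fit the outcome regression on the complete cases,
    # predict for every unit (observed and missing), and average.
    ols_fit <- lm(Y ~ W1 + W2 + W3 + W4, data = d, subset = Delta == 1)
    mu_ols  <- mean(predict(ols_fit, newdata = d))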

KS present comparisons of several DR (and non-DR) estimators. We focus on just a couple of them here. Using our terminology, with the terminology and abbreviations from KS in parentheses, the estimators we compare are: the weighted least squares (WLS) estimator (regression estimation with inverse-propensity weighted coefficients, μn,WLS) and the augmented IPCW (A-IPCW) estimator (regression estimation with residual bias correction, μn,BC−OLS). Both of these DR estimators are defined below.

The WLS estimator is defined as:

$\mu_{n,WLS} = \frac{1}{n}\sum_{i=1}^{n} m_{\beta_n}(W_i),$

where

$\beta_n = \arg\min_{\beta} \sum_{i=1}^{n} \frac{\Delta_i}{g_n(1 \mid W_i)}\left(Y_i - m_{\beta}(W_i)\right)^2.$

The A-IPCW estimator, introduced by Robins, Rotnitzky, and Zhao (1994), is then defined as:

$\mu_{n,A\text{-}IPCW} = \frac{1}{n}\sum_{i=1}^{n}\left[\bar{Q}_n^0(W_i) + \frac{\Delta_i}{g_n(1 \mid W_i)}\left(Y_i - \bar{Q}_n^0(W_i)\right)\right].$
Both of these estimators rely on estimators of Q̄0 and g0. They are consistent if $\bar{Q}_n^0$ or $g_n$ is consistent, and efficient if both are consistent. Under positivity violations, however, these estimators rely on the consistency of $\bar{Q}_n^0$, and require that $g_n$ converges to a limit that satisfies the positivity assumption (see, e.g., van der Laan and Robins (2003)).
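Minimal R sketches of both estimators follow, reusing the hypothetical d, gn, and ols_fit from the sketches above; the model formulas are illustrative, not the specifications used in the simulations.

    # WLS: inverse-probability-weighted least squares, then average predictions.
    w <- 1 / gn
    wls_fit <- lm(Y ~ W1 + W2 + W3 + W4, data = d,
                  weights = w, subset = Delta == 1)
    mu_wls <- mean(predict(wls_fit, newdata = d))

    # A-IPCW: initial regression prediction plus an IPCW-weighted
    # residual correction on the complete cases.
    Qn0 <- predict(ols_fit, newdata = d)
    mu_aipcw <- mean(Qn0 + ifelse(d$Delta == 1, (d$Y - Qn0) / gn, 0))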

Additionally, in comments on KS, Robins et al. (2007) introduce bounded Horvitz-Thompson (BHT) estimators, which, as the name suggests, are bounded, in that for finite sample sizes the estimates are guaranteed to fall in the parameter space. A BHT estimator is defined as:


This is equivalent to the A-IPCW estimator, but estimating g0(1 | W) by fitting the following logistic regression model:

$\text{logit}\, P_{EXT}(\Delta = 1 \mid W) = \alpha^T W + \varphi\, h_n(W),$

and $h_n(W) = \bar{Q}_n^0(W) - \frac{1}{n}\sum_{i=1}^{n} \bar{Q}_n^0(W_i)$.
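A minimal R sketch of this construction, reusing d and Qn0 from above: the centered initial regression enters the missingness model as an extra covariate, and the extended fit is then plugged into the A-IPCW formula, per the equivalence just stated.

    # Extended missingness model with the centered covariate h_n(W).
    h <- Qn0 - mean(Qn0)
    gfit_ext <- glm(Delta ~ W1 + W2 + W3 + W4 + h,
                    data = d, family = binomial())
    gn_ext <- predict(gfit_ext, type = "response")
    mu_bht <- mean(Qn0 + ifelse(d$Delta == 1, (d$Y - Qn0) / gn_ext, 0))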

We also include another important class of doubly robust, locally efficient, regression-based estimators introduced by Scharfstein et al. (1999), further discussed in Robins (1999) and compared to the TMLE in Rosenblum and van der Laan (2010). This estimator is based on a parametric regression model, which includes a “clever covariate” that incorporates inverse probability weights. The estimator behaves similarly to the TMLE using a linear fluctuation (and is identical if the TMLE using a linear fluctuation uses this clever parametric regression as initial estimator). We use the abbreviation PRC. The estimator is defined as:


$\mu_{n,PRC} = \frac{1}{n}\sum_{i=1}^{n} \bar{Q}_n(W_i),$

where $\bar{Q}_n(W) = m_{\beta_n,\epsilon_n}(W)$ and $m_{\beta,\epsilon}(W)$ is a parametric model that includes the clever covariate $H_{g_n}^*(W) = \frac{1}{g_n(1 \mid W)}$, and $(\beta_n, \epsilon_n)$ is the OLS fit.
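A minimal R sketch of the PRC estimator under the same assumptions: the clever covariate is appended to the (possibly misspecified) linear model, the model is fit by OLS among the complete cases, and the substitution step averages the predictions over all units.

    # PRC: OLS with the clever covariate H = 1 / g_n(1 | W), then substitution.
    H <- 1 / gn
    prc_fit <- lm(Y ~ W1 + W2 + W3 + W4 + H, data = d, subset = Delta == 1)
    mu_prc  <- mean(predict(prc_fit, newdata = cbind(d, H = H)))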

Cao et al. (2009) present a DR estimator that achieves minimum variance among a class of DR estimators indexed by all possible linear regressions for the initial estimator, when the estimator of the missingness mechanism is correctly specified (see also Rubin and van der Laan (2008) on empirical efficiency maximization), while preserving double robustness. They also address the effect of large IPCW by enhancing the missingness mechanism estimator in order to constrain the predicted values. Their estimator is defined as:


Cao’s enhanced missingness mechanism estimator is given by:


Here W̃ = [1, W], and the parameters γ and δ are estimated subject to the constraints $0 < \pi(W, \delta, \gamma) < 1$ and $\sum_{i=1}^{n} \Delta_i / \pi_{en}(W_i, \delta_n, \gamma_n) = n$. A quasi-Newton method implemented in the R package alabama was used to estimate $(\delta_n, \gamma_n)$ (Varadhan, 2010). We used OLS to estimate $\beta_n$, which corresponds to Cao’s $\hat{\mu}_{usual}^{en}$.

Tan (2010) presents an augmented likelihood estimator, a more robust version of estimators originally introduced in Tan (2006), that respects boundedness and is semiparametric efficient. This estimator is defined as:


where $\omega(W; \tilde{\lambda}_{step2})$ is an enhanced estimate of the missingness mechanism based on an initial estimate $\pi_{ML}(W)$. Specifically, $\omega(W; \lambda) = \pi_{ML}(W) + \lambda^T h_n(W)$, where $h_n = (h_{n,1}^T, h_{n,2}^T)^T$,

$h_{n,1} = \left(1 - \pi_{n,ML}(W)\right)\nu_n(W), \qquad h_{n,2} = \frac{\partial \pi_{n,ML}}{\partial \gamma}(W; \gamma_{n,ML}), \qquad \nu_n(W) = \left[1, \bar{Q}_n^0(W)\right]^T,$

and $\gamma_{n,ML}$ is a maximum likelihood estimator of the propensity score model parameter. An estimate $\lambda_n$ that respects the constraint $0 < \omega(W_i, \lambda)$ if $\Delta_i = 1$ can be obtained using a two-step procedure outlined in Tan's article. Following Tan's recommendation, non-linear optimization was carried out using the R trust package (Geyer, 2009). We consider the two variants of Tan's LIK2 augmented likelihood estimator that performed best in Tan's simulations under misspecification of Q. Our estimator TanWLS relies on a weighted least squares estimate of $\bar{Q}_n^0$. TanRV relies on the empirical efficiency maximization estimator of Rubin and van der Laan (2008),

$\bar{Q}_{n,RV} = \sum_{i=1}^{n} \frac{\Delta_i}{g(1 \mid W_i)}\left(Y_i - m(W; \beta_n)\right) + m(W; \beta_n), \qquad \beta_n = \arg\min_{\beta} \sum_{i=1}^{n} \Delta_i \frac{1 - g_n(1 \mid W_i)}{g_n(1 \mid W_i)^2}\left(Y_i - m_{\beta}(W_i)\right)^2.$

4.2. TMLEs

The targeted maximum likelihood procedure was first introduced in van der Laan and Rubin (2006). For a compilation of current and past work on targeted maximum likelihood estimation, see van der Laan et al. (2009).

In contrast to the estimating equation-based DR estimators defined above (WLS, A-IPCW, BHT, Cao, and Tan), the PRC estimator and TMLEs are DR substitution estimators. TMLEs are based on an update of an initial estimator of P0 that fluctuates the fit with a fit of a clever parametric submodel. Assuming a valid parametric submodel is selected, TMLEs not only respect the bounds on the outcome implied by the statistical model or data, but also respect the fact that the true target parameter value is a specified function of the data generating distribution. By respecting this information, the TMLE not only achieves the local bounds of the statistical model by being asymptotically (locally) efficient (as the other DR estimators are), but also respects the global constraints of the statistical model. Being a substitution estimator is particularly important under sparsity, as implied by violations of the positivity assumption.

Although our target parameter involves a continuous Y, to introduce the TMLE for the mean outcome, we begin by defining the TMLE for a binary Y. In this case, the TMLE is defined as:

$\mu_{n,TMLE} = \frac{1}{n}\sum_{i=1}^{n} \bar{Q}_n^*(W_i), \qquad (2)$
where we use the logistic regression submodel:

$\text{logit}\,\bar{Q}_n^0(\epsilon)(W) = \text{logit}\,\bar{Q}_n^0(W) + \epsilon H_{g_n}^*(W),$
the clever covariate is defined as $H_{g_n}^*(W) = \frac{1}{g_n(1 \mid W)}$, and ε, the fluctuation parameter, is estimated by maximum likelihood, in which the loss function is thus the log-likelihood loss function:

$L(\bar{Q})(O) = -\Delta\left[Y \log \bar{Q}(W) + (1 - Y)\log\left(1 - \bar{Q}(W)\right)\right]. \qquad (3)$
Thus εn is fitted with univariate logistic regression, using the initial regression estimator $\bar{Q}_n^0$ as an offset:

$\epsilon_n = \arg\min_{\epsilon} \sum_{i=1}^{n} L\left(\bar{Q}_n^0(\epsilon)\right)(O_i).$

The TMLE of Q̄0 is defined as $\bar{Q}_n^* = \bar{Q}_n^0(\epsilon_n)$, and $\mu(Q_n^*)$ is the corresponding TMLE of μ0.

For the estimators $\bar{Q}_n^0$ and $g_n$, one may specify a parametric model, use machine learning, or even use the super learner, which uses loss-based cross-validation to select a weighted combination of candidate estimators (van der Laan et al. (2007)).

Next, consider that Y is continuous but bounded by 0 and 1. In this case, we can implement the same TMLE as we would for a binary Y in (2). That is, we use the same logistic regression submodel, the same loss function (3), and the same standard software for logistic regression to fit ε, simply ignoring that Y is not binary. The same loss function is still valid for the conditional mean Q̄0 (Wedderburn (1974); Gruber and van der Laan (2010a)):

$\bar{Q}_0 = \arg\min_{\bar{Q}} E_0 L(\bar{Q}).$

Finally, given a continuous Y ∈ [a, b], we can define Y* = (Y − a)/(b − a) so that Y* ∈ [0, 1]. Then, let μ*(P0) = E0(E0(Y* | Δ = 1, W)). This approach requires setting a range [a, b] for the outcome Y. If such knowledge is available, one simply uses the known values. If Y were not subject to missingness, then one would use the minimum and maximum of the empirical sample, which is a very accurate estimator of the range. In these simulations, Y is subject to informative missingness, so the minimum and maximum of the biased sample represent a biased estimate of the range, resulting in a small unnecessary bias in the TMLE (asymptotically negligible relative to the MSE). We enlarged the range of the complete observations on Y by setting a to 0.9 times the minimum of the observed values and b to 1.1 times the maximum of the observed values, which seemed to remove most of the unnecessary bias. We expect that some improvement could be obtained by incorporating a valid estimator of the range that takes into account the informative missingness, but such second order improvements are outside the scope of this article. We now compute the above TMLE of μ*(P0), denoted TMLEY*, and use the relation μ(P0) = (b − a)μ*(P0) + a.
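To make the procedure concrete, the following minimal R sketch implements the TMLEY* just described, under the same hypothetical data frame d; the main terms model formulas and the function name tmle_ystar are illustrative assumptions, not the exact implementation used in our simulations.

    # A minimal sketch of TMLEY*: map Y into [0,1], fit an initial logistic
    # regression of Y* on W, fluctuate it with the clever covariate, and
    # map the substitution estimator back to the original scale.
    tmle_ystar <- function(d, gbound = 0.025) {
      obs <- d$Delta == 1
      # Slightly enlarged empirical range of the complete observations (see text)
      a <- 0.9 * min(d$Y[obs]); b <- 1.1 * max(d$Y[obs])
      dd <- transform(d, ystar = (Y - a) / (b - a))
      # Initial fit of the outcome regression on the complete cases,
      # on the logit scale via quasi-binomial regression
      Qfit <- glm(ystar ~ W1 + W2 + W3 + W4, data = dd,
                  subset = obs, family = quasibinomial())
      Q0 <- predict(Qfit, newdata = dd, type = "response")
      # Missingness mechanism, bounded away from zero
      gfit <- glm(Delta ~ W1 + W2 + W3 + W4, data = dd, family = binomial())
      gn <- pmax(predict(gfit, type = "response"), gbound)
      # Targeting step: univariate logistic regression of Y* on the clever
      # covariate H = 1/gn, with the initial fit as offset (complete cases)
      H <- 1 / gn
      eps <- coef(glm(dd$ystar ~ -1 + H + offset(qlogis(Q0)),
                      family = quasibinomial(), subset = obs))
      Qstar <- plogis(qlogis(Q0) + eps * H)
      # Substitution estimator, mapped back to the original scale
      (b - a) * mean(Qstar) + a
    }

Applied to a sample from the KS data generating distribution (see the sketch in Section 5.1 below), tmle_ystar(d) returns the TMLEY* estimate of μ(P0).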

We note that the estimator proposed by Scharfstein et al. (1999) and discussed in the KS debate is a special case of a TMLE (Rosenblum and van der Laan (2010)). It defines a clever parametric initial regression for which the update step of the general TMLE algorithm introduced in van der Laan and Rubin (2006) results in a zero-update, and is thus not needed. Such a TMLE falls in the class of TMLEs defined by an initial regression estimator, a squared error loss function, and a univariate linear regression submodel (coding the fluctuations of the initial regression estimator for the TMLE update step). TMLEs of this type for continuous outcomes (contrary to the excellent robustness of the TMLE for a binary outcome based on the log-likelihood loss function and logistic regression submodel) suffer from great sensitivity to violations of the positivity assumption, as was also observed in the simulations presented in the Kang and Schafer debate. As explained in Gruber and van der Laan (2010a), the problem with the TMLE defined by the squared error loss function and univariate linear regression submodel is that its updates are not subject to any bounds implied by the statistical model or data: that is, it is not using a parametric submodel, an important principle of the general TMLE algorithm. The valid TMLE for continuous outcomes above, defined by the quasi-binary log-likelihood loss and a univariate logistic regression parametric submodel, was recently presented in Gruber and van der Laan (2010a), where it was demonstrated that the previously observed sensitivity to the positivity assumption was due to these specific choices of loss function and submodel.

Finally, a natural extension of all of the above TMLEs is to use a more sophisticated estimate of g0. The estimator μn,C−TMLEY* is defined by (2) as well, but the algorithm for computing $\bar{Q}_n^*$ differs. For the C-TMLE, we generate a sequence of nested logistic regression model fits of g0, gn,1, . . . , gn,K, and a corresponding sequence of candidate TMLEs $\bar{Q}^*_{k,g_{n,k}}$, using gn,k in the targeted MLE step, k = 1, . . . , K, such that the loss-function (e.g., log-likelihood) specific fit of $\bar{Q}^*_{k,g_{n,k}}$ is increasing in k. Finally, we use loss-function specific cross-validation to select k. The precise algorithm is presented in Gruber and van der Laan (2010b), and the software is publicly available. The resulting estimator gn used in the TMLE aims to include only covariates that are effective in removing bias with respect to the target parameter; the theoretical underpinnings, in terms of collaborative double robustness of the efficient influence curve, are presented in van der Laan and Gruber (2009).

5. Simulation Studies

In this section, we compare the performance of TMLEs to the estimating equation-based DR estimators (WLS, A-IPCW, BHT, Cao, TanWLS, TanRV), as well as PRC and OLS, in the context of positivity violations. The goal of the original simulation designed by KS was to highlight the stability problems of DR estimators. We explore the relative performance of the estimators under the original KS simulation and under a number of alternative data generating distributions that involve stronger and different types of violations of the positivity assumption. These new simulation settings were designed to provide more diverse and even more challenging test cases for evaluating the robustness, and thereby the finite sample performance, of the different estimators.

For the four simulations described below, all estimators were used to estimate μ(P0) from 250 samples of size 1000. We include TMLEY* and C-TMLEY* estimators based on the quasi-log-likelihood loss function and the logistic regression submodel. We evaluated the performance of the estimators by their bias, variance and mean squared error (MSE).

We compared the estimators of μ(P0) using different specifications of the estimators of Q̄0 and g0. In each of the tables presented below, “Qcgc” indicates that both estimators were specified correctly; “Qcgm” indicates that the estimator of Q̄0 was correctly specified, but the estimator of g0 was misspecified; “Qmgc” indicates that the estimator of Q̄0 was misspecified, but the estimator of g0 was correctly specified; and “Qmgm” indicates that both estimators were misspecified. For the modified simulations, we present results for the “Qmgc” specification only, in order to focus on the performance of each estimator when reliance on gn is essential. Additional results for the other model specifications are available as supplemental materials.

Also, for all estimators, we compared results with no lower bound on gn(1 | W) to results with gn(1 | W) truncated from below at 0.025. We note that neither KS nor Robins et al. (2007) included bounding of gn(1 | W) when applying their estimators. Although not bounding gn(1 | W) has the appeal that one avoids having to select a bound, which is difficult to determine in any given application, the theory teaches us that the DR estimators can only be consistent if gn is bounded from below, even if in truth g0 is unbounded. In addition, some of the estimators above incorporate implicit bounding of gn, so that such estimators could appear particularly advantageous while the gain in performance might be due entirely to the implicit bounding of gn (which would be good to know). Additional results when gn is bounded from below at 0.01 and 0.05 demonstrate similar behavior, and are also available on the web.

5.1. Kang and Schafer Simulation

Kang and Schafer (2007) consider n i.i.d. units of O = (W, Δ, ΔY) ~ P0, where W is a vector of 4 baseline covariates and Δ is an indicator of whether the continuous outcome, Y, is observed. Kang and Schafer are interested in estimating the following parameter:

$\mu(P_0) = E_0(Y).$
Let (Z1, . . . , Z4) be independent normally distributed random variables with mean zero and variance 1. The covariates W we actually observe are generated as follows:

$W_1 = \exp(Z_1/2), \quad W_2 = \frac{Z_2}{1 + \exp(Z_1)} + 10, \quad W_3 = \left(\frac{Z_1 Z_3}{25} + 0.6\right)^3, \quad W_4 = (Z_2 + Z_4 + 20)^2.$
The outcome Y is generated as:

$Y = 210 + 27.4 Z_1 + 13.7 Z_2 + 13.7 Z_3 + 13.7 Z_4 + \epsilon, \qquad \epsilon \sim N(0, 1).$
From this, one can determine the conditional mean Q̄0(W) of Y given W: it equals the same linear regression, but in Z1(W), . . . , Z4(W), where Zj(W), j = 1, . . . , 4, are the unique solutions of the four equations above in terms of W = (W1, . . . , W4). Thus, if the data analyst had been provided the functions Zj(W), then the true regression function would be linear in these functions, but the data analyst measures the terms Wj instead. The other complication of the data generating distribution is that Y is subject to missingness, and the true censoring mechanism, denoted by g0(1 | W) ≡ P0(Δ = 1 | W), is given by:

$g_0(1 \mid W) = \text{expit}\left(-Z_1(W) + 0.5 Z_2(W) - 0.25 Z_3(W) - 0.1 Z_4(W)\right).$
With this data generating mechanism, the average response rate is 0.50. Also, the true population mean is 210, while the mean among respondents is 200. These values indicate a small selection bias.

In these simulations, a linear main term model in the main terms (W1, . . . , W4), for either the outcome regression or the missingness mechanism, is misspecified, while a linear main term model in the main terms (Z1(W), . . . , Z4(W)) would be correctly specified. Note that in the KS simulation, there are finite sample violations of the positivity assumption. Specifically, we find g0(Δ = 1 | W) ∈ [0.01, 0.98], and the estimated missingness probabilities gn(Δ = 1 | W) were observed to fall in the range [4 × 10−6, 0.97].
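For concreteness, the following R sketch draws one sample of size 1000 from this data generating distribution, using the covariate transformations, outcome regression, and missingness mechanism as reconstructed above; it also instantiates the hypothetical data frame d assumed in the earlier estimator sketches.

    # One sample from the Kang and Schafer (2007) simulation (n = 1000).
    n <- 1000
    Z <- matrix(rnorm(4 * n), ncol = 4)
    W1 <- exp(Z[, 1] / 2)
    W2 <- Z[, 2] / (1 + exp(Z[, 1])) + 10
    W3 <- (Z[, 1] * Z[, 3] / 25 + 0.6)^3
    W4 <- (Z[, 2] + Z[, 4] + 20)^2
    Y  <- 210 + 27.4 * Z[, 1] + 13.7 * Z[, 2] + 13.7 * Z[, 3] +
          13.7 * Z[, 4] + rnorm(n)
    g0 <- plogis(-Z[, 1] + 0.5 * Z[, 2] - 0.25 * Z[, 3] - 0.1 * Z[, 4])
    Delta <- rbinom(n, 1, g0)
    d <- data.frame(W1, W2, W3, W4, Delta, Y = ifelse(Delta == 1, Y, NA))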

Figure 1 and Table 1 present the simulation results without any bounding of gn. Tan's estimators impose internal bounds on the estimated missingness mechanism; however, we report the performance of the TanWLS and TanRV estimators when given an initial estimate gn that is not bounded away from 0. All estimators have similar performance when $\bar{Q}_n^0$ is correctly specified. When both models are misspecified, Cao's estimator performs as well as OLS. OLS, Cao, and C-TMLEY* are least biased, and TanRV has the smallest MSE. The performance of all other estimators degrades under dual misspecification. Arguably, the most interesting test case for all estimators (given that they are all forced to use parametric models) is Qmgc. TanWLS, TanRV, C-TMLEY*, and WLS have the smallest MSE, and TanRV and TanWLS are least biased. The performance of both Tan estimators is unaffected by externally bounding gn, due to their internal bounding of gn.

Figure 1:
Sampling distribution of (μn − μ0) with no bounding of gn, Kang and Schafer simulation.
Table 1:
Kang and Schafer simulation results with no bounding of gn.

Figure 2 and Table 2 compare the results for each estimator when gn is bounded from below at 0.025. Bounding gn appears to be crucial for PRC in the case of Qmgm, and improves the performance of Cao’s estimator for the Qmgc specification, but has little effect on the performance of the other estimators. However, this result does not generalize to other data generating distributions, where the selection bias is greater and sparsity is more extreme, as the next simulation demonstrates.

Figure 2:
Sampling distribution of (μn − μ0) with gn bounded at 0.025, Kang and Schafer simulation.
Table 2:
Kang and Schafer simulation results, gn bounded at 0.025.

5.2. Modification 1 of Kang and Schafer Simulation

In the KS simulation, when Q0 or g0 are misspecified the misspecifications are small, and the selection bias is small. Therefore, we modified the KS simulation in order to increase the degree of misspecification and selection bias. This creates a greater challenge for estimators, and better highlights their relative performance.

As before, let Zj be i.i.d. N(0, 1). The outcome Y is generated as $Y = 210 + 50Z_1 + 25Z_2 + 25Z_3 + 25Z_4 + \epsilon$, with ε ~ N(0, 1). The covariates actually observed by the data analyst are now given by the following functions of (Z1, . . . , Z4):


From this one can determine the true regression function Q0(W) = E0(E(Y | Z) | W). The missingness indicator is generated as follows:


A misspecified fit is now obtained by fitting a linear or logistic main term regression in W1, . . . , W4, while a correct fit is obtained by providing the user with the terms Z1, . . . , Z4 and fitting a linear or logistic main term regression in Z1, . . . , Z4. With these modifications, the population mean is again 210, but the mean among respondents is 184.4. We also have a higher degree of practical violation of the positivity assumption: g0(Δ = 1 | W) ∈ [1.1 × 10−5, 0.99], while the estimated probabilities gn(Δ = 1 | W) were observed to fall in the range [2.2 × 10−16, 0.87].

Figure 3 and Table 3 present results for misspecified $\bar{Q}_n^0$, without bounding gn and with gn bounded at 0.025. Bounding dramatically reduces the variance of all estimators except OLS, TanWLS, and TanRV, but recall that the Tan estimators always internally bound gn away from 0. This improved efficiency comes at the cost of a slight increase in bias for all estimators except PRC. The variance and MSE of C-TMLEY* are less than half those of the other non-TMLE estimators. In contrast to the results of the previous simulation, Cao, TanWLS, and TanRV exhibit a lack of robustness at this level of sparsity when forced to rely on gn under a misspecified $\bar{Q}_n^0$.

Figure 3:
Sampling distribution of (μn − μ0) with gn bounded at 0.025, Modification 1 of Kang and Schafer simulation.
Table 3:
Modification 1 of Kang and Schafer simulation, Q misspecified.

5.3. Modification 2 of Kang and Schafer Simulation

For this simulation, we made one additional change to Modification 1: we set the coefficient in front of Z4 in the true regression of Y on Z equal to zero. Therefore, while Z4 is still associated with missingness, it is not associated with the outcome, and is thus not a confounder. Given (W1, W2, W3), W4 is not associated with the outcome either, and therefore as the misspecified regression model of Q̄0(W) we use a main term regression in (W1, W2, W3).

This modification to the KS simulation enables us to take the debate on the relative performance of DR estimators one step further, by addressing a second key challenge of the estimators: that they often include non-confounders in the censoring mechanism estimator. Though such an estimator remains asymptotically unbiased, this unnecessary inclusion can increase asymptotic variance, and may unnecessarily introduce positivity violations leading to finite sample bias and inflated variance (Neugebauer and van der Laan, 2005; Petersen et al., 2010).

Figure 4 and Table 4 reveal that C-TMLEY* has superior performance relative to estimating equation-based DR estimators when not all covariates are associated with Y. As discussed earlier, the C-TMLE algorithm provides an innovative black-box approach for estimating the censoring mechanism, preferring covariates that are associated with the outcome and censoring, without “data-snooping.”

Figure 4:
Sampling distribution of (μn − μ0) with gn bounded at 0.025, Modification 2 of Kang and Schafer simulation.
Table 4:
Modification 2 of Kang and Schafer simulation, Q misspecified.

5.4. Modification 3 of Kang and Schafer Simulation

In some rare cases, a C-TMLE can be a super efficient estimator, because it uses a collaborative estimator gn that takes into account the fit of the initial estimator $\bar{Q}_n^0$ (we refer to Rotnitzky et al. (2010) and van der Laan and Gruber (2009) for a detailed discussion). As a consequence, it is of particular interest to investigate the behavior of C-TMLEY* in the previous simulation, but with the coefficient in front of Z4 set equal to C/n, for a number of values of C, in the data generating mechanism for the outcome: $Y = 210 + 50Z_1 + 25Z_2 + 25Z_3 + (C/n)Z_4 + N(0, 1)$. We report the results for C ∈ {10, 20, 50}. Table 5 provides the results at each value of C for all estimators when $\bar{Q}_n^0$ is correctly specified and gn is misspecified, and when $\bar{Q}_n^0$ is misspecified and gn is either correctly specified or misspecified. In each case, gn is bounded at 0.025. We note that C-TMLEY* does not break down, even under these particularly challenging conditions (nor under other simulated scenarios presented in Gruber and van der Laan (2010b)). It is an open question how a C-TMLE performs at different local data generating distributions when it is superefficient, and further research is warranted.

Table 5:
Modification 3 to Kang and Schafer simulation, C/n perturbation, gn bounded at 0.025.

6. TMLEs with Machine Learning for Dual Misspecification

The KS simulation with dual misspecification (Qmgm) illustrates the benefits of coupling data-adaptive (super) learning with targeted maximum likelihood estimation. C-TMLEY*, constrained to use a main terms regression model with the misspecified covariates (W1, W2, W3, W4), has smaller variance than μn,OLS, but is more biased. The MSE of TMLEY* is larger than the MSE of C-TMLEY*, with increased bias and variance. We ask how the estimation process should change if we assume that parametric models are seldom correctly specified and that main term regression techniques generally fail to capture the true relationships between predictors and an outcome. Our answer is that the estimation process should incorporate data-adaptive machine learning.

We coupled super learning with TMLEY* and C-TMLEY* to estimate both Q̄0 and g0. For C-TMLEY*, four missingness-mechanism score-based covariates were created, based on different truncation levels of the propensity score estimate gn(1 | W): no truncation, and truncation from below at 0.01, 0.025, and 0.05. These four scores were supplied, along with the misspecified main terms W1, . . . , W4, to the targeted forward selection algorithm in the C-TMLEY*, used to build a series of candidate nested logistic regression estimators of the missingness mechanism and corresponding candidate TMLEs. The C-TMLEY* algorithm used 5-fold cross-validation to select the best estimate from the eight candidate TMLEs. This allows the C-TMLE algorithm to build a logistic regression fit of g0 that selects among the misspecified main terms and the super learner fits of the missingness mechanism score gn(1 | W) at different truncation levels.

An important aspect of super learning is to ensure that the library of prediction algorithms includes a variety of approaches for fitting the true functions Q̄0 and g0. For example, it is sensible to include a main terms regression algorithm in the super learner library. Should that algorithm happen to be correct, the super learner will behave as the main terms regression algorithm. It is also recommended to include algorithms that search over a space of higher order polynomials, non-linear models, and, for example, cubic splines. For binary outcome regression, as required for fitting g0, classification algorithms such as classification and regression trees (Breiman et al., 1984), support vector machines (Cortes and Vapnik, 1995), and k-nearest-neighbor algorithms (Friedman, 1994) could be added to the library. The point of super learning is that we cannot know in advance which procedure will be most successful for a given prediction problem. Super learning relies on the oracle property of V-fold cross-validation to asymptotically select the optimal convex combination of estimates obtained from these disparate procedures (van der Laan and Dudoit (2003); van der Laan et al. (2004); van der Laan et al. (2007)).
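As an illustration of such a library specification, the sketch below uses the SuperLearner R package; the particular wrappers listed are an assumption for illustration and do not reproduce the exact library reported with Table 6.

    # A hypothetical super learner specification for the outcome regression,
    # fit on the complete cases and predicting for all units.
    library(SuperLearner)
    sl_lib <- c("SL.glm", "SL.earth", "SL.loess", "SL.nnet")
    obs <- d$Delta == 1
    X <- d[, c("W1", "W2", "W3", "W4")]
    sl_fit <- SuperLearner(Y = d$Y[obs], X = X[obs, ], newX = X,
                           family = gaussian(), SL.library = sl_lib)
    Qn_SL <- sl_fit$SL.predict   # data-adaptive estimate of the outcome regression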

Consider the misspecified scenario proposed by KS. The true full-data distribution and the missingness mechanism are captured by main terms linear regression of the outcome on Z1, Z2, Z3, Z4. This simple model is virtually impossible to discover through the usual model selection approaches when the observed data consist of the misspecified covariates O = (W1, W2, W3, W4, Δ, ΔY), given

$W_1 = \exp(Z_1/2), \quad W_2 = \frac{Z_2}{1 + \exp(Z_1)} + 10, \quad W_3 = \left(\frac{Z_1 Z_3}{25} + 0.6\right)^3, \quad W_4 = (Z_2 + Z_4 + 20)^2.$
This complexity illustrates the importance of including prediction algorithms that attack the estimation problem from a variety of directions. The super learner library we employed contained the algorithms listed below. The analysis was carried out in the R statistical programming environment v2.10.1 (R Development Core Team, 2010), using algorithms included in the base installation or in the indicated packages.

Table 6 reports the results when super learning is incorporated into the TMLEY* and C-TMLEY* estimation procedures, based on 250 samples of size 1000, with predicted values for gn(1 | W) truncated from below at 0.025. The data-adaptive approach improved the bias and variance of both estimators. TMLEY* efficiency improved by a factor of 8.5, and C-TMLEY* efficiency improved by a factor of 1.5. In addition, the MSE for both data-adaptive estimators is smaller than the MSE of the estimator that performed best when both Q and g were misspecified, μn,OLS (MSE = 2.82).

Table 6:
Results with and without incorporating super learning into TMLEY* and C-TMLEY*, Qmgm, gn truncated at 0.025.

7. Discussion

By mapping continuous outcomes into [0, 1] and using a logistic fluctuation, TMLEY* and C-TMLEY* are more robust to violations of the positivity assumption than TMLEs using the linear fluctuation function. Because TMLEY* is a substitution estimator, the impact of a single observation on it is bounded by 1/n, while many of the other estimators do not have such a robustness property. We showed that C-TMLEY* has superior performance relative to estimating equation-based DR estimators when there are covariates that are strongly associated with the missingness indicator but weakly associated, or not associated at all, with the outcome Y. The C-TMLE algorithm provides an innovative approach for estimating the censoring mechanism, preferring covariates that are associated with both the outcome Y and the missingness Δ. C-TMLEs avoid data-snooping concerns because the estimation procedure is fully specified before the analyst observes any data (or at least, no data beyond some ancillary statistics). Even in cases in which all observed covariates are associated with Y, C-TMLE still performs well.

Related work is also being done with respect to other parameters of interest. For example, both Cao et al. (2009) and Tan (2006) include discussions of applying their estimators to causal effect parameters. In addition, Freedman and Berk (2008) focus on a causal effect parameter and demonstrate that DR estimators (and the WLS estimator in particular) can increase variance and bias when IPCW are large.

Overall, comparisons of estimators, beyond theoretical studies of asymptotics and robustness, will need to be based on large scale simulation studies that include all available estimators, and cannot be tailored toward one particular simulation setting. Future research should be concerned with setting up such a large scale objective comparison based on publicly available software, and we look forward to contributing to such an effort.

The research underlying TMLEs was motivated, in part, by the goal of increasing the stability of DR estimators, and the KS simulations provide a demonstration of the merits of TMLEs under violations of the positivity assumption. TMLEs are estimators defined by a choice of loss function and parametric submodel, both chosen so that the linear span of the scores at zero fluctuation with respect to the loss function includes the efficient influence curve/efficient score. All such TMLEs are double robust, asymptotically efficient under correct specification, and substitution estimators, but the choice of submodel can affect finite sample robustness if the submodel does not respect the relevant bounds, as with the linear regression submodel for the TMLE. In addition, TMLEs can be combined with super learning and empirical efficiency maximization (Rubin and van der Laan (2008); van der Laan and Gruber (2009)) to further enhance their performance in practice. We hope that by showing that these estimators perform well in simulations and settings created by other researchers for the purpose of exposing the weaknesses of DR estimators, as well as in modified simulations that make estimation even more challenging, we have provided probative evidence in support of TMLEs. Of course, much can happen in finite samples, and we look forward to further exploring how these estimators perform in other settings.


*binary outcomes only, added to library for estimating g

Author Notes: Kristin E. Porter and Susan Gruber contributed equally to this work. We would like to thank Zhiqiang Tan and Weihua Cao for providing us with software to implement their estimators. This work was supported by NIH grant R01AI074345-04 and by a National Institutes of Health NRSA Trainee appointment on grant number T32 HG 00047.

Contributor Information

Kristin E. Porter, University of California, Berkeley.

Susan Gruber, University of California, Berkeley.

Mark J. van der Laan, University of California, Berkeley.

Jasjeet S. Sekhon, University of California, Berkeley.


  • Bembom O, van der Laan MJ. Data-adaptive selection of the truncation level for inverse-probability-of-treatment-weighted estimators. Technical Report 230, Division of Biostatistics, University of California, Berkeley; 2008.
  • Breiman L. Bagging predictors. Machine Learning. 1996;24:123–140.
  • Breiman L, Friedman JH, Olshen R, Stone CJ. Classification and regression trees. The Wadsworth statistics/probability series. Wadsworth International Group; 1984.
  • Cao W, Tsiatis AA, Davidian M. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika. 2009;96(3):723–734.
  • Chang C-C, Lin C-J. LIBSVM: a library for support vector machines (version 2.31). Technical report; 2001.
  • Cole SR, Hernan MA. Constructing inverse probability weights for marginal structural models. American Journal of Epidemiology. 2008;168:656–664.
  • Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995;20:273–297.
  • Dimitriadou E, Hornik K, Leisch F, Meyer D, Weingessel A. e1071: Misc Functions of the Department of Statistics (e1071), TU Wien. R package version 1.5-24; 2010.
  • Freedman DA, Berk RA. Weighting regressions by propensity scores. Evaluation Review. 2008;32(4):392–409.
  • Friedman JH. Multivariate adaptive regression splines. The Annals of Statistics. 1991;19(1):1–67.
  • Friedman JH. Fast MARS. Technical report, Department of Statistics, Stanford University; 1993.
  • Friedman JH. Flexible metric nearest neighbor classification. Technical report, Department of Statistics, Stanford University; 1994.
  • Geyer CJ. trust: Trust Region Optimization. R package version 0.1-2; 2009.
  • Gruber S, van der Laan MJ. A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. Technical Report 265, Division of Biostatistics, University of California, Berkeley; 2010a.
  • Gruber S, van der Laan MJ. An application of collaborative targeted maximum likelihood estimation in causal inference and genomics. The International Journal of Biostatistics. 2010b;6(1):Article 18.
  • Hastie TJ, Pregibon D. Generalized linear models. In: Chambers JM, Hastie TJ, editors. Statistical Models in S. Wadsworth & Brooks/Cole; 1992. Chapter 6.
  • Kang J, Schafer J. Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data (with discussion). Statistical Science. 2007;22:523–539.
  • Kish L. Weighting for unequal pi. Journal of Official Statistics. 1992;8:183–200.
  • Milborrow S. earth: Multivariate Adaptive Regression Spline Models. R package version 2.4-0; 2009.
  • Moore KL, Neugebauer RS, van der Laan MJ, Tager IB. Causal inference in epidemiological studies with strong confounding. Technical Report 255, Division of Biostatistics, University of California, Berkeley; 2009.
  • Neugebauer R, Bullard J. DSA: Deletion/Substitution/Addition algorithm. R package version 3.1.4; 2010.
  • Neugebauer R, van der Laan MJ. Why prefer double robust estimators in causal inference? Journal of Statistical Planning and Inference. 2005;129(1–2):405–426.
  • Peters A, Hothorn T. ipred: Improved Predictors. R package version 0.8-8; 2009.
  • Petersen ML, Porter K, Gruber S, Wang Y, van der Laan MJ. Diagnosing and responding to violations in the positivity assumption. Statistical Methods in Medical Research. 2010.
  • Ridgeway G, McCaffrey D. Comment: Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data (with discussion). Statistical Science. 2007;22:540–543.
  • Ripley BD. Pattern recognition and neural networks. Cambridge University Press; 1996.
  • Robins JM, Sued M, Lei-Gomez Q, Rotnitzky A. Comment: Performance of double-robust estimators when “inverse probability” weights are highly variable. Statistical Science. 2007;22:544–559.
  • Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512.
  • Robins JM. Addendum to: “A new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect.” Computers and Mathematics with Applications. 1987;14(9–12):923–945.
  • Robins JM. Robust estimation in sequentially ignorable missing data and causal inference models. Proceedings of the American Statistical Association: Section on Bayesian Statistical Science. 1999:6–10.
  • Robins JM. Commentary on using inverse weighting and predictive inference to estimate the effects of time-varying treatments on the discrete-time hazard. Statistics in Medicine. 1999;21:1663–1680.
  • Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89(427):846–866.
  • Rosenblum M, van der Laan MJ. Targeted maximum likelihood estimation of the parameter of a marginal structural model. The International Journal of Biostatistics. 2010;6:Article 19.
  • Rotnitzky A, Li L, Li X. A note on overadjustment in inverse probability weighted estimation. Biometrika. 2010;97(4):997–1001.
  • Rubin DB, van der Laan MJ. Empirical efficiency maximization: Improved locally efficient covariate adjustment in randomized experiments and survival analysis. The International Journal of Biostatistics. 2008;4(1):Article 5.
  • Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for non-ignorable drop-out using semiparametric nonresponse models (with discussion and rejoinder). Journal of the American Statistical Association. 1999;94:1096–1120, 1121–1146.
  • Sekhon JS, Gruber S, Porter K, van der Laan MJ. Propensity-score-based estimators and C-TMLE. In: van der Laan MJ, Rose S, editors. Targeted Learning: Prediction and Causal Inference for Observational and Experimental Data. Springer, New York; 2011. Chapter 21.
  • Sinisi S, van der Laan MJ. The Deletion/Substitution/Addition algorithm in loss function based estimation: Applications in genomics. Statistical Applications in Genetics and Molecular Biology. 2004;3(1).
  • Tan Z. A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association. 2006;101:1619–1637.
  • Tan Z. Comment: Understanding OR, PS and DR. Statistical Science. 2007;22:560–568.
  • Tan Z. Bounded, efficient and doubly robust estimation with inverse weighting. Biometrika. 2010;97(3):661–682.
  • R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria; 2010.
  • Tsiatis A, Davidian M. Comment: Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data (with discussion). Statistical Science. 2007;22:569–573.
  • van der Laan MJ, Dudoit S. Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: Finite sample oracle inequalities and examples. Technical report, Division of Biostatistics, University of California, Berkeley; 2003.
  • van der Laan MJ, Gruber S. Collaborative double robust penalized targeted maximum likelihood estimation. The International Journal of Biostatistics. 2009.
  • van der Laan MJ, Robins JM. Unified Methods for Censored Longitudinal Data and Causality. Springer, New York; 2003.
  • van der Laan MJ, Rubin D. Targeted maximum likelihood learning. The International Journal of Biostatistics. 2006;2(1).
  • van der Laan MJ, Dudoit S, van der Vaart AW. The cross-validated adaptive epsilon-net estimator. Technical Report 142, Division of Biostatistics, University of California, Berkeley; 2004.
  • van der Laan MJ, Polley E, Hubbard A. Super learner. Statistical Applications in Genetics and Molecular Biology. 2007;6(25).
  • van der Laan MJ, Rose S, Gruber S. Readings on targeted maximum likelihood estimation. Technical report, working paper series; 2009.
  • Varadhan R. alabama: Constrained nonlinear optimization. R package version 2010.10-1; 2010.
  • Venables WN, Ripley BD. Modern Applied Statistics with S. 4th edition. Springer, New York; 2002.
  • Grosse E, Cleveland WS, Shyu WM. Local regression models. In: Chambers JM, Hastie TJ, editors. Statistical Models in S. Wadsworth & Brooks/Cole; 1992. Chapter 6.
  • Wang Y, Petersen M, Bangsberg D, van der Laan MJ. Diagnosing bias in the inverse probability of treatment weighted estimator resulting from violation of experimental treatment assignment. Technical Report 211, Division of Biostatistics, University of California, Berkeley; 2006a.
  • Wang Y, Petersen M, van der Laan MJ. A statistical method for diagnosing ETA bias in IPTW estimators. Technical report, Division of Biostatistics, University of California, Berkeley; 2006b.
  • Wedderburn RWM. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika. 1974;61:439–447.
