We review targeted maximum likelihood estimation of the additive treatment effect before defining the collaborative targeted maximum likelihood approach. Suppose we have a data set containing $n$ independent and identically distributed observations, $O_1, \ldots, O_n$, of a random variable $O = (W, A, Y)$, where $W$ is a set of baseline covariates, $A$ is a treatment variable, and $Y$ is the outcome variable. For simplicity we focus on binary $A$, where $A = 1$ denotes treatment and $A = 0$ denotes control. The outcome variable can be either continuous or binary. Assume we are interested in estimating the marginal additive causal effect of treatment on the outcome. The parameter of interest of the probability distribution $P_0$ of $O$ is therefore defined nonparametrically as $\psi_0 = E_W\{E(Y \mid A = 1, W) - E(Y \mid A = 0, W)\}$. Under the appropriate causal graph assumptions, $\psi_0$ corresponds to the G-computation formula for the marginal additive causal effect.

The probability distribution (or density) of $O$ can be factored as $P_0(O) = Q_0(O)\, g_0(A \mid W)$, where $Q_0(O) = Q_{Y,0}(Y \mid A, W)\, Q_{W,0}(W)$ and $g_0(1 \mid W) = P_0(A = 1 \mid W)$. We use the notation $Q_Y$ for a conditional distribution of $Y$, given $A, W$, and $Q_W$ for the marginal distribution of $W$. For notational convenience, let $Q_0(A, W) = E_0(Y \mid A, W)$ be the true conditional mean of $Y$, given $A, W$, which is thus a parameter of $Q_{Y,0}$. We note that $\psi_0 = \Psi(Q_0)$ depends on the data generating distribution $P_0$ only through its $Q_0$-factor. The targeted maximum likelihood estimator of $\psi_0$ is a particular substitution estimator

$$\psi_n = \Psi(Q_n) = \frac{1}{n} \sum_{i=1}^{n} \{ Q_n(1, W_i) - Q_n(0, W_i) \},$$

where $Q_n(A, W)$ is an estimated conditional mean of $Y$ given $A, W$, and the marginal distribution $Q_{W,0}$ is estimated with its empirical probability distribution.
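As a minimal sketch of the substitution step, assuming a fitted conditional mean is available as a Python callable, the estimator averages the predicted treatment contrast over the observed covariate vectors; this average is exactly what replacing $Q_{W,0}$ by its empirical distribution amounts to. The names `substitution_estimate` and `q_bar_example` are hypothetical, and the example regression is made up for illustration.

```python
from typing import Callable, Sequence, Tuple

Covariates = Tuple[float, ...]

def substitution_estimate(q_bar: Callable[[int, Covariates], float],
                          W: Sequence[Covariates]) -> float:
    """Plug-in estimate of psi = E_W[Q(1, W) - Q(0, W)].

    Q_W is replaced by the empirical distribution of the observed W_i,
    so the outer expectation becomes an average over the sample.
    """
    return sum(q_bar(1, w) - q_bar(0, w) for w in W) / len(W)

# Hypothetical fitted regression with a constant additive effect of 0.2.
def q_bar_example(a: int, w: Covariates) -> float:
    return 0.3 + 0.2 * a + 0.1 * w[0]

W_obs = [(0.0,), (1.0,), (1.0,), (0.0,)]
psi_n = substitution_estimate(q_bar_example, W_obs)  # approximately 0.2
```

Because only the conditional mean enters the formula, any regression method can supply `q_bar`; the targeting step changes which fitted mean is plugged in, not the form of the plug-in.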

Targeted maximum likelihood estimation involves obtaining an initial estimate of the true conditional mean of $Y$ given $A$ and $W$, and subsequently fluctuating this estimate in a manner designed to reduce bias in the estimate of the parameter of interest. Let $Q_n^0(A, W)$ be the initial estimate of the true conditional mean $Q_0(A, W)$. For example, if $Y$ is binary, then one constructs a parametric (least favorable) model

$$\operatorname{logit} Q_n^0(\epsilon)(A, W) = \operatorname{logit} Q_n^0(A, W) + \epsilon\, h(A, W),$$

fluctuating the initial estimate $Q_n^0$, where $\epsilon$ is the fluctuation parameter. The function $h(A, W)$, known as the “clever covariate”, depends on the treatment assignment mechanism $g_0$ and is given by

$$h(A, W) = \frac{I(A = 1)}{g_0(1 \mid W)} - \frac{I(A = 0)}{g_0(0 \mid W)}.$$

The theoretical basis for this choice of clever covariate is given in van der Laan and Rubin (2006). In particular, it has the bias-reduction property that if one estimates $\epsilon$ with the parametric maximum likelihood estimator $\epsilon_n$ and sets $Q_n^1 = Q_n^0(\epsilon_n)$ equal to the resulting update, then the resulting substitution estimator $\Psi(Q_n^1)$ is asymptotically unbiased, even if the initial estimator $Q_n^0$ is inconsistent. These results indicate that estimating $g_0$ is crucial for reducing bias. However, the choice of an estimator $g_n$ should be evaluated by how it affects the mean squared error of the resulting targeted maximum likelihood estimator $\Psi(Q_n^1)$, which makes it a harder problem than, and a different one from, estimating $g_0$ itself.
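The fluctuation step for binary $Y$ can be sketched in plain Python, assuming the initial predictions $Q_n^0(A_i, W_i)$, $Q_n^0(1, W_i)$, $Q_n^0(0, W_i)$ and estimated propensities $g_n(1 \mid W_i)$ are already available, with $g_n$ plugged in for $g_0$ in the clever covariate. Here $\epsilon$ is fit by Newton-Raphson for a one-parameter logistic regression with offset $\operatorname{logit} Q_n^0$ and no intercept. The function names are hypothetical, and this is an illustration under those assumptions rather than a reference implementation.

```python
import math
from typing import Sequence, Tuple

def expit(x: float) -> float:
    # Numerically stable inverse logit.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def clever_covariate(a: int, g1: float) -> float:
    """h(A, W) = I(A=1)/g(1|W) - I(A=0)/g(0|W), where g1 = g(1|W)."""
    return a / g1 - (1 - a) / (1.0 - g1)

def fit_epsilon(y: Sequence[int], offset: Sequence[float],
                h: Sequence[float], n_iter: int = 100) -> float:
    """MLE of epsilon in logit Q(eps) = offset + eps * h.

    One-parameter logistic regression with an offset and no intercept,
    solved by Newton-Raphson with steps capped at 1 for stability.
    """
    eps = 0.0
    for _ in range(n_iter):
        p = [expit(o + eps * hi) for o, hi in zip(offset, h)]
        score = sum(hi * (yi - pi) for hi, yi, pi in zip(h, y, p))
        info = sum(hi * hi * pi * (1.0 - pi) for hi, pi in zip(h, p))
        step = max(-1.0, min(1.0, score / info))
        eps += step
        if abs(step) < 1e-10:
            break
    return eps

def tmle_update(y, a, q_a, q1, q0, g1) -> Tuple[float, float]:
    """Return (epsilon_n, Psi(Q_n^1)) for a binary outcome."""
    h_obs = [clever_covariate(ai, gi) for ai, gi in zip(a, g1)]
    eps = fit_epsilon(y, [logit(q) for q in q_a], h_obs)
    # Updated predictions: h(1, W) = 1/g(1|W) and h(0, W) = -1/g(0|W).
    q1_star = [expit(logit(q) + eps / gi) for q, gi in zip(q1, g1)]
    q0_star = [expit(logit(q) - eps / (1.0 - gi)) for q, gi in zip(q0, g1)]
    psi = sum(u - v for u, v in zip(q1_star, q0_star)) / len(y)
    return eps, psi
```

A quick check on such an implementation is that the updated fit solves the score equation $\sum_i h(A_i, W_i)\{Y_i - Q_n^1(A_i, W_i)\} = 0$, since that is exactly the first-order condition defining $\epsilon_n$.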

TMLE has been shown to be double robust: the estimator is consistent if either the limit of $Q_n^0$ or the limit of $g_n$ is correctly specified. When both are correct, the estimator is efficient (van der Laan and Rubin, 2006). Recent theoretical advances show that TMLE is also *collaboratively* double robust (van der Laan and Gruber, 2010). That is, if the initial estimator converges to a possibly misspecified $Q$, then $g_n$ need only converge to a conditional distribution of $A$ that properly adjusts for a covariate that is a function of $Q_0 - Q$. This result follows intuitively from the fact that the clever covariate can only reduce bias if it is predictive of the outcome after taking into account the initial estimator. This collaborative double robustness property and a corresponding asymptotic linearity theorem are proven in a companion article in this issue.

A particular method for constructing a collaborative estimator $g_n$ involves building candidate treatment mechanism estimators that grow towards an unbiased estimator of the fully adjusted $g_0$. In a departure from current practice, the construction of these candidates is guided by the log-likelihood loss function for $Q_0$ rather than by the log-likelihood loss function for the conditional distribution of $A$ given $W$; hence our use of the term “collaborative.”

Clever covariates based on these candidates give rise to a sequence of updated estimates, $Q_{n,k}^*$, $k = 1, \ldots, K$, each of which provides a candidate TMLE estimate $\Psi(Q_{n,k}^*)$ of $\psi_0$. The C-TMLE estimate is the best among these candidates, as determined by V-fold $Q_0$-log-likelihood-based cross-validation.
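A heavily simplified sketch of the candidate-selection idea follows, under several assumptions that are ours rather than the source's: covariates are binary tuples; the candidate $g_n$ estimators are stratified treatment proportions on the first $k$ covariates of a pre-specified order (a nested sequence growing to full adjustment), instead of candidates built greedily by the $Q_0$-log-likelihood; the initial $Q_n^0$ is held fixed across folds; and propensities are truncated to [0.025, 0.975]. Each candidate yields a targeted update, and the V-fold cross-validated negative log-likelihood of that update selects among them. All function names and the truncation bound are hypothetical.

```python
import math

# Hypothetical helpers for a simplified C-TMLE-style selection (binary Y).

def expit(x):
    # Numerically stable inverse logit.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def logit(p):
    return math.log(p / (1.0 - p))

def h_of(a, g1):
    # Clever covariate I(A=1)/g(1|W) - I(A=0)/g(0|W).
    return a / g1 - (1 - a) / (1.0 - g1)

def fit_g(W, A, k, rows, bound=0.025):
    """Stratified estimate of g(1|W) on the first k covariates, fit on rows."""
    treated, total = {}, {}
    for i in rows:
        key = W[i][:k]
        treated[key] = treated.get(key, 0) + A[i]
        total[key] = total.get(key, 0) + 1
    overall = sum(A[i] for i in rows) / len(rows)

    def g1(w):
        key = w[:k]
        # Unseen strata fall back to the marginal treatment proportion.
        p = treated[key] / total[key] if key in total else overall
        return min(max(p, bound), 1.0 - bound)
    return g1

def fit_epsilon(Y, offset, h, n_iter=100):
    """MLE of epsilon in logit Q(eps) = offset + eps * h (capped Newton steps)."""
    eps = 0.0
    for _ in range(n_iter):
        p = [expit(o + eps * hi) for o, hi in zip(offset, h)]
        score = sum(hi * (y - pi) for hi, y, pi in zip(h, Y, p))
        info = sum(hi * hi * pi * (1.0 - pi) for hi, pi in zip(h, p))
        step = max(-1.0, min(1.0, score / info))
        eps += step
        if abs(step) < 1e-10:
            break
    return eps

def targeted_fit(W, A, Y, q_a, k, rows):
    """Fit candidate g_k and its fluctuation epsilon on the given rows."""
    g1 = fit_g(W, A, k, rows)
    eps = fit_epsilon([Y[i] for i in rows],
                      [logit(q_a[i]) for i in rows],
                      [h_of(A[i], g1(W[i])) for i in rows])
    return g1, eps

def cv_loss(W, A, Y, q_a, k, V=5):
    """V-fold cross-validated negative log-likelihood of the updated Q."""
    n, loss = len(Y), 0.0
    for v in range(V):
        train = [i for i in range(n) if i % V != v]
        valid = [i for i in range(n) if i % V == v]
        g1, eps = targeted_fit(W, A, Y, q_a, k, train)
        for i in valid:
            p = expit(logit(q_a[i]) + eps * h_of(A[i], g1(W[i])))
            p = min(max(p, 1e-12), 1.0 - 1e-12)
            loss -= Y[i] * math.log(p) + (1 - Y[i]) * math.log(1.0 - p)
    return loss

def ctmle_sketch(W, A, Y, q_a, q1, q0, V=5):
    """Select the adjustment set size k by CV loss, then refit on all rows."""
    n_cov = len(W[0])
    losses = [cv_loss(W, A, Y, q_a, k, V) for k in range(n_cov + 1)]
    k_best = min(range(n_cov + 1), key=lambda k: losses[k])
    g1, eps = targeted_fit(W, A, Y, q_a, k_best, list(range(len(Y))))
    psi = sum(expit(logit(u) + eps / g1(w)) - expit(logit(v) - eps / (1.0 - g1(w)))
              for u, v, w in zip(q1, q0, W)) / len(Y)
    return k_best, psi, losses
```

Because the selection criterion is the loss for $Q_0$ rather than for $g_0$, a covariate that predicts treatment but does not improve the cross-validated fit of the updated outcome regression can be left out of the selected $g_n$.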