The collaborative targeted maximum likelihood estimator $Q_n^*$ equals a $k_n$-th step collaborative targeted maximum likelihood estimator, and thereby equals a targeted maximum likelihood estimator with a starting estimator (e.g., the $(k_n - 1)$-th collaborative targeted maximum likelihood estimator) and the censoring mechanism estimator $g_n = g_{n\delta_n}$ selected in the $k_n$-th step from the collection of candidate estimators $g_{n\delta}$, indexed by $\delta$ ranging over an index set. Thus, just like the targeted maximum likelihood estimator, the collaborative targeted maximum likelihood estimator $\psi_n = \Psi(Q_n^*)$ of $\psi_0$ solves the efficient influence curve estimating equation

$$0 = P_n D^*(Q_n^*, g_n, \psi_n).$$
For simplicity, we assume that the efficient influence curve at $P_{Q,g}$ can be represented as an estimating function in $\psi$: i.e., the efficient influence curve at $P$ can be represented as $D^*(Q(P), g(P), \Psi(Q(P)))$ for some mapping $(Q, g, \psi) \mapsto D^*(Q, g, \psi)$. However, the theorem in this section can be generalized to any efficient influence curve $D^*(Q, g)$ at a data generating distribution $P_{Q,g}$.
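As a concrete, hypothetical illustration (the data structure, the parameter, and the fluctuation are assumptions made for this sketch, not the general setting of this section), take $O = (W, A, Y)$ with binary treatment $A$, outcome $Y \in [0, 1]$, target $\psi_0 = E_0 Q_0(1, W)$, and $D^*(Q, g, \psi)(O) = \frac{A}{g(1 \mid W)}\{Y - Q(1, W)\} + Q(1, W) - \psi$. The Python sketch below performs one logistic fluctuation of an initial fit along the covariate $A/g_n(1 \mid W)$, a common targeting step, and returns the updated $Q_n^*(1, W)$ and $\psi_n$. The function names `tmle_update` and `eic` and the simulated data are made up for this example.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def tmle_update(y, a, q_a, q_1, g_1, tol=1e-12, max_iter=100):
    """Logistic fluctuation of an initial fit for psi = E[Q(1, W)] (hypothetical setting).

    y: outcomes in [0, 1]; a: 0/1 treatment indicators;
    q_a: initial Q_n(A_i, W_i); q_1: initial Q_n(1, W_i);
    g_1: g_n(1 | W_i), assumed bounded away from 0 and 1.
    """
    h = [ai / gi for ai, gi in zip(a, g_1)]      # clever covariate A / g(1 | W)
    offset = [logit(q) for q in q_a]
    eps = 0.0
    for _ in range(max_iter):                     # Newton-Raphson for the MLE of eps
        p = [expit(o + eps * hi) for o, hi in zip(offset, h)]
        score = sum(hi * (yi - pi) for hi, yi, pi in zip(h, y, p))
        info = sum(hi * hi * pi * (1.0 - pi) for hi, pi in zip(h, p))
        step = score / info
        eps += step
        if abs(step) < tol:
            break
    # Updated Q_n^*(1, W): at A = 1 the clever covariate equals 1 / g(1 | W).
    q1_star = [expit(logit(q) + eps / g) for q, g in zip(q_1, g_1)]
    psi_n = sum(q1_star) / len(q1_star)
    return eps, q1_star, psi_n


def eic(y, a, q1_star, g_1, psi):
    """Values D*(Q, g, psi)(O_i) for the treatment-specific mean."""
    return [ai / gi * (yi - qi) + qi - psi
            for yi, ai, qi, gi in zip(y, a, q1_star, g_1)]


# Simulated data with a deliberately misspecified constant initial fit.
rng = random.Random(2024)
n = 500
w = [rng.random() for _ in range(n)]
g_1 = [0.25 + 0.5 * wi for wi in w]
a = [1 if rng.random() < gi else 0 for gi in g_1]
y = [1.0 if rng.random() < 0.2 + 0.6 * wi else 0.0 for wi in w]
q_1 = [0.5] * n
q_a = [0.5] * n          # Q_n(A, W) = 0.5 for both treatment levels
eps, q1_star, psi_n = tmle_update(y, a, q_a, q_1, g_1)
mean_eic = sum(eic(y, a, q1_star, g_1, psi_n)) / n
```

Because $\epsilon$ solves its own score equation, the empirical mean of the `eic` values is zero up to the Newton tolerance, which is the statement $P_n D^*(Q_n^*, g_n, \psi_n) = 0$ for this example.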
It is a reasonable assumption that $Q_n^*$ converges to some element $Q^*$ in the model for $Q_0$, where $Q^*$ is not necessarily equal to the true $Q_0$. In addition, let us assume that, for each $\delta$, the $\delta$-specific censoring mechanism estimator $g_{n\delta}$ converges to some $g_{0\delta}$. For example, if $\delta$ indicates an adjustment set, then it might be assumed that $g_{n\delta}$ converges to the true conditional distribution, given this $\delta$-specific adjustment set.
For a given $Q$, we define $\delta(Q)$ as the index $\delta$ with minimal entropy $d(\delta)$ such that

$$P_0 D^*(Q, g_{0\delta}, \psi_0) = 0.$$
In other words, given the family of adjustments indexed by $\delta$, $\delta(Q)$ represents the minimal adjustment in the censoring mechanism necessary to obtain collaborative double robustness, i.e., an unbiased estimating function for $\psi_0$. It is then a natural assumption that

$$P_0 D^*(Q, g_{0\delta}, \psi_0) = 0 \quad \text{for all } \delta \text{ with } d(\delta) \geq d(\delta(Q)).$$

In other words, if one uses a more nonparametric estimator of the censoring mechanism than needed (i.e., than $\delta(Q)$), then one still obtains the desired unbiasedness.
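The definition of $\delta(Q)$ can be checked exactly in a small discrete model. The following Python sketch is a hypothetical toy example: the distribution, the candidate adjustment sets, and the use of the number of adjusted covariates as the entropy $d(\delta)$ are all assumptions. It uses the treatment-specific mean with $D^*(Q, g, \psi) = \frac{A}{g(1 \mid W)}\{Y - Q(1, W)\} + Q(1, W) - \psi$, two independent binary covariates, and a misspecified $Q$ that ignores $W_2$. For each adjustment set it evaluates $P_0 D^*(Q, g_{0\delta}, \psi_0) = E_0\big[\frac{g_0(1 \mid W)}{g_{0\delta}(1 \mid W)}\{Q_0(1, W) - Q(1, W)\} + Q(1, W)\big] - \psi_0$, and it returns the least complex set with zero bias in a nested chain of candidates.

```python
from itertools import product

# Hypothetical toy model: W = (W1, W2), independent binary covariates.
P_W1, P_W2 = 0.5, 0.4
NAMES = ("W1", "W2")
CELLS = list(product((0, 1), repeat=2))


def p_w(w1, w2):
    return (P_W1 if w1 else 1 - P_W1) * (P_W2 if w2 else 1 - P_W2)


def g0_full(w1, w2):        # true P(A = 1 | W1, W2)
    return 0.2 + 0.3 * w1 + 0.3 * w2


def q0(w1, w2):             # true Q_0(1, W) = E[Y | A = 1, W]
    return 0.3 + 0.2 * w1 + 0.4 * w2


def q_init(w1, w2):         # misspecified Q that ignores W2
    return 0.3 + 0.2 * w1


PSI0 = sum(p_w(*c) * q0(*c) for c in CELLS)


def g0_delta(delta):
    """Return w -> P_0(A = 1 | covariates in the adjustment set delta)."""
    def g(w1, w2):
        w = dict(zip(NAMES, (w1, w2)))
        match = [c for c in CELLS
                 if all(dict(zip(NAMES, c))[v] == w[v] for v in delta)]
        mass = sum(p_w(*c) for c in match)
        return sum(p_w(*c) * g0_full(*c) for c in match) / mass
    return g


def bias(q, delta):
    """Exact P_0 D*(Q, g_{0 delta}, psi_0) for the treatment-specific mean."""
    g = g0_delta(delta)
    return sum(p_w(*c) * (g0_full(*c) / g(*c) * (q0(*c) - q(*c)) + q(*c))
               for c in CELLS) - PSI0


def delta_of_q(q, chain, tol=1e-12):
    """Least complex delta (hypothetical d(delta) = len(delta)) with zero bias."""
    for delta in sorted(chain, key=len):
        if abs(bias(q, delta)) < tol:
            return delta
    return None


chain = [(), ("W2",), ("W1", "W2")]          # nested candidate adjustment sets
selected = delta_of_q(q_init, chain)
```

In this toy model, adjusting for $W_2$ alone already removes the bias of the misspecified $Q$ (roughly 0.061 without adjustment), whereas adjusting for $W_1$ alone leaves a bias of roughly 0.068; hence $\delta(Q) = \{W_2\}$ in the chain $\emptyset \subset \{W_2\} \subset \{W_1, W_2\}$, and the full adjustment set is unbiased as well.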
We will assume that, as $n$ converges to infinity, the selected censoring mechanism estimator $g_n = g_{n\delta_n}$ converges to a fixed $g_{0\delta_0}$, representing the limit of $g_{n\delta_0}$, which is not necessarily equal to the conditional distribution given the full $X$. For notational convenience, we also denote this limit by $g_0$.
It is assumed that $d(\delta_0) \geq d(\delta(Q^*))$, so that

$$P_0 D^*(Q^*, g_0, \psi_0) = 0,$$

which will be the fundamental assumption for asymptotic normality of the C-TMLE. In other words, it is assumed that our C-TMLE procedure selects a sufficiently nonparametric estimator $g_n$ of the censoring mechanism (in collaboration with $Q_n^*$) so that the required unbiasedness of the efficient influence curve estimating function is achieved.
To derive the influence curve of $\psi_n = \Psi(Q_n^*)$, the asymptotic linearity theorem below also assumes that the limit of the selected censoring mechanism estimator satisfies

$$P_0\{D^*(Q_n^*, g_0, \psi_0) - D^*(Q^*, g_0, \psi_0)\} = o_P(1/\sqrt{n}). \qquad (3)$$

As a consequence of assumption (3), the influence curve does not involve a contribution requiring the analysis of a function of $Q_n^*$. This important simplification of the influence curve allows straightforward calculation of standard errors for the C-TMLE. Assumption (3) requires the limit $g_0$ to be nonparametric enough with respect to the actual estimator $Q_n^*$, so that enough orthogonality is achieved to make the contribution $P_0\{D^*(Q_n^*, g_0, \psi_0) - D^*(Q^*, g_0, \psi_0)\}$ second order.
Why assumption (3) holds for C-TMLE: We now explain why this assumption is reasonable for the C-TMLE.
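Define $\tilde g_{0n}$ as $g_{0\tilde\delta_n}$, where $\tilde\delta_n$ is the index $\delta$ of minimal entropy $d(\delta)$, among all $\delta$ with $d(\delta) \geq d(\delta_0)$, such that $P_0 D^*(Q_n^*, g_{0\delta}, \psi_0) = 0$. In other words, $\tilde g_{0n}$ corresponds with the limit of the least nonparametric estimator (among all estimators more nonparametric than the one identified by $\delta_0$) that still yields the desired unbiasedness of the estimating function at $Q_n^*$, and is as close as possible to $g_0 = g_{0\delta_0}$.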
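We note that

$$P_0\{D^*(Q_n^*, g_0, \psi_0) - D^*(Q^*, g_0, \psi_0)\} = P_0\{D^*(Q_n^*, \tilde g_{0n}, \psi_0) - D^*(Q^*, \tilde g_{0n}, \psi_0)\} + R_n,$$

where $R_n$ is a second order term (like $R_{n1}$ below) involving the differences $Q_n^* - Q^*$ and $\tilde g_{0n} - g_0$. By definition of $\tilde g_{0n}$ and the fact that $Q_n^*$ converges to $Q^*$, it is reasonable to assume $\tilde g_{0n} \to g_0$ as $n \to \infty$. So $R_n$ is a second order term, and it is reasonable to assume $R_n = o_P(1/\sqrt{n})$.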
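By definition of $\tilde g_{0n}$, we not only have $P_0 D^*(Q_n^*, \tilde g_{0n}, \psi_0) = 0$, but also that $\tilde g_{0n}$ is equally or more nonparametric than $g_{0\delta(Q^*)}$ (since $d(\tilde\delta_n) \geq d(\delta_0) \geq d(\delta(Q^*))$), so that $P_0 D^*(Q^*, \tilde g_{0n}, \psi_0) = 0$. This implies that indeed

$$P_0\{D^*(Q_n^*, g_0, \psi_0) - D^*(Q^*, g_0, \psi_0)\} = R_n = o_P(1/\sqrt{n}).$$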
Finally, we note that the next theorem can be applied to any collaborative double robust estimator, as discussed in the previous section, not only to the collaborative double robust targeted maximum likelihood estimator.
Theorem 4 Let $(Q, g, \psi) \mapsto D^*(Q, g, \psi)$ be a well-defined function that maps any possible $(Q, g, \Psi(Q))$ into a function of $O$. Let $O_1, \ldots, O_n \sim P_0$ be i.i.d., and let $P_n$ be the empirical probability distribution. Let $Q \mapsto \Psi(Q)$ be a $d$-dimensional parameter, where $\psi_0 = \Psi(Q_0)$ is the parameter value of interest. In the following template for proving asymptotic linearity of $\psi_n = \Psi(Q_n^*)$ as an estimator of $\Psi(Q_0)$, $Q_n^*$ represents the collaborative targeted maximum likelihood estimator, but it can be any estimator. Let $Q^*$ denote the limit of $Q_n^*$. Let $g_n$ be an estimator and let $g_0$ denote its limit. Assume the following.
Efficient Influence Curve Estimating Equation: $P_n D^*(Q_n^*, g_n, \psi_n) = 0$, where $\psi_n = \Psi(Q_n^*)$.
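Censoring Mechanism Estimator is Nonparametric Enough: $P_0 D^*(Q^*, g_0, \psi_0) = 0$ and

$$P_0\{D^*(Q_n^*, g_0, \psi_0) - D^*(Q^*, g_0, \psi_0)\} = o_P(1/\sqrt{n}).$$

(Above we show why the latter is indeed a second order term for the C-TMLE.)

Consistency: $P_0 \lVert D^*(Q_n^*, g_n, \psi_n) - D^*(Q^*, g_0, \psi_0) \rVert^2 \to 0$ in probability as $n \to \infty$. The same is assumed if one or two elements of the triplet $(Q_n^*, g_n, \psi_n)$ are replaced by their limits $(Q^*, g_0, \psi_0)$.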
Identifiability/Invertibility: $c_0 = -\frac{d}{d\psi_0} P_0 D^*(Q^*, g_0, \psi_0)$ exists and is invertible.
Donsker Class: $\{D^*(Q, g, \Psi(Q)) : Q, g\}$ is $P_0$-Donsker, where $(Q, g)$ vary over sets that contain $(Q_n^*, g_n)$, $(Q^*, g_n)$, and $(Q_n^*, g_0)$ with probability tending to 1.

Contribution due to Censoring Mechanism Estimation: Define the mapping $g \mapsto \Phi(g) \equiv P_0 D^*(Q^*, g, \psi_0)$. Assume

$$\Phi(g_n) - \Phi(g_0) = (P_n - P_0) IC_{g_0} + o_P(1/\sqrt{n})$$

for some mean zero function $IC_{g_0}$ of $O$.
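Second order terms: Define the second order term

$$R_{n1} = P_0\{D^*(Q_n^*, g_n, \psi_n) - D^*(Q^*, g_n, \psi_n)\} - P_0\{D^*(Q_n^*, g_0, \psi_0) - D^*(Q^*, g_0, \psi_0)\}$$

and assume $R_{n1} = o_P(1/\sqrt{n})$. Note that $R_{n1}$ is a second order term involving the differences $Q_n^* - Q^*$ and $g_n - g_0$.

Define the second order term

$$R_{n2} = P_0\{D^*(Q^*, g_n, \psi_n) - D^*(Q^*, g_n, \psi_0)\} + c_0(\psi_n - \psi_0)$$

and assume $R_{n2} = o_P(1/\sqrt{n})$. Note that $R_{n2}$ is a second order term involving the differences $g_n - g_0$ and $\psi_n - \psi_0$.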
Then, $\psi_n$ is an asymptotically linear estimator of $\psi_0$ at $P_0$ with influence curve

$$IC(P_0) = c_0^{-1}\{D^*(Q^*, g_0, \psi_0) + IC_{g_0}\}.$$

That is,

$$\psi_n - \psi_0 = (P_n - P_0) IC(P_0) + o_P(1/\sqrt{n}).$$

In particular, $\sqrt{n}(\psi_n - \psi_0)$ converges in distribution to a multivariate normal distribution with mean zero and covariance matrix $\Sigma_0 = E_0 IC(P_0) IC(P_0)^\top$.
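Proof: The principal equations are $P_n D^*(Q_n^*, g_n, \psi_n) = 0$ and $P_0 D^*(Q^*, g_0, \psi_0) = 0$. So, we have

$$0 = P_n D^*(Q_n^*, g_n, \psi_n) - P_0 D^*(Q^*, g_0, \psi_0).$$

Adding and subtracting $P_n D^*(Q^*, g_n, \psi_n)$ and $P_n D^*(Q^*, g_0, \psi_0)$ gives

$$0 = (P_n - P_0) D^*(Q^*, g_0, \psi_0) + P_n\{D^*(Q_n^*, g_n, \psi_n) - D^*(Q^*, g_n, \psi_n)\} + P_n\{D^*(Q^*, g_n, \psi_n) - D^*(Q^*, g_0, \psi_0)\}.$$

We denote the three terms on the right by I, II, and III, and deal with them separately below.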
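I: Since $P_0 D^*(Q^*, g_0, \psi_0) = 0$, term I equals $(P_n - P_0) D^*(Q^*, g_0, \psi_0)$, an empirical mean of a fixed mean zero function, which we keep as the first term approximation. The empirical process terms arising in II and III are controlled by the Donsker and consistency conditions; we refer to van der Vaart and Wellner (1996) for this empirical process theorem.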
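II: We have

$$P_n\{D^*(Q_n^*, g_n, \psi_n) - D^*(Q^*, g_n, \psi_n)\} = (P_n - P_0)\{D^*(Q_n^*, g_n, \psi_n) - D^*(Q^*, g_n, \psi_n)\} + P_0\{D^*(Q_n^*, g_n, \psi_n) - D^*(Q^*, g_n, \psi_n)\}.$$

The first term is $o_P(1/\sqrt{n})$ by our Donsker class condition and the consistency condition at $(Q_n^*, g_n, \psi_n)$ and $(Q^*, g_n, \psi_n)$. We also have

$$P_0\{D^*(Q_n^*, g_n, \psi_n) - D^*(Q^*, g_n, \psi_n)\} = P_0\{D^*(Q_n^*, g_0, \psi_0) - D^*(Q^*, g_0, \psi_0)\} + R_{n1},$$

where $R_{n1} = o_P(1/\sqrt{n})$ by assumption. $R_{n1}$ is a second order term involving $Q_n^* - Q^*$ and $(g_n, \psi_n) - (g_0, \psi_0)$. It remains to consider the term $P_0\{D^*(Q_n^*, g_0, \psi_0) - D^*(Q^*, g_0, \psi_0)\}$, which is $o_P(1/\sqrt{n})$ by the "Censoring Mechanism Estimator is Nonparametric Enough" assumption.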
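III: We have

$$P_n\{D^*(Q^*, g_n, \psi_n) - D^*(Q^*, g_0, \psi_0)\} = (P_n - P_0)\{D^*(Q^*, g_n, \psi_n) - D^*(Q^*, g_0, \psi_0)\} + P_0\{D^*(Q^*, g_n, \psi_n) - D^*(Q^*, g_0, \psi_0)\}.$$

The first term is $o_P(1/\sqrt{n})$ by the Donsker class condition and the consistency condition at $(Q^*, g_n, \psi_n)$. We also have

$$P_0\{D^*(Q^*, g_n, \psi_n) - D^*(Q^*, g_0, \psi_0)\} = -c_0(\psi_n - \psi_0) + R_{n2} + P_0\{D^*(Q^*, g_n, \psi_0) - D^*(Q^*, g_0, \psi_0)\},$$

where $R_{n2} = o_P(1/\sqrt{n})$ by assumption. The last term, $P_0 D^*(Q^*, g_n, \psi_0) - P_0 D^*(Q^*, g_0, \psi_0)$, equals $\Phi(g_n) - \Phi(g_0)$ by definition. We assumed that $\Phi(g_n) - \Phi(g_0) = (P_n - P_0) IC_{g_0} + o_P(1/\sqrt{n})$. Thus, the third term equals

$$-c_0(\psi_n - \psi_0) + (P_n - P_0) IC_{g_0} + o_P(1/\sqrt{n}).$$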
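We can thus conclude that

$$0 = (P_n - P_0)\{D^*(Q^*, g_0, \psi_0) + IC_{g_0}\} - c_0(\psi_n - \psi_0) + o_P(1/\sqrt{n}).$$

This implies $\psi_n - \psi_0 = c_0^{-1}(P_n - P_0)\{D^*(Q^*, g_0, \psi_0) + IC_{g_0}\} + o_P(1/\sqrt{n})$, and thereby the stated asymptotic linearity. □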
4.1. Statistical Inference
If $Q^* = Q_0$, then $IC_{g_0} = 0$, so that the influence curve reduces to the efficient influence curve $D^*(Q_0, g_0, \psi_0)$ at a possibly weakly adjusted $g_0$. If $g_n$ converges to the fully adjusted conditional distribution, given $X$, then we know that $IC_{g_0}$ equals minus the projection of $D^*(Q^*, g_0, \psi_0)$ onto the tangent space of the model used by $g_n$ (van der Laan and Robins (2003), Section 2.3.7). We suggest that, even if $g_0$ is not the fully adjusted censoring mechanism, $D^*(Q^*, g_0, \psi_0)$ will typically still be a conservative influence curve. In other words, if $Q_n$ starts approximating the true $Q_0$, then the $IC_{g_0}$ contribution gets smaller and smaller, while if $Q_n$ stays away from $Q_0$, then $g_n$ starts approximating the fully adjusted $g_0$, in which case inference based on $D^*$ is conservative. This might explain why we see good coverage in our simulations based on the "influence curve" $D^*(Q_n^*, g_n, \psi_n)$. If $g_n$ corresponds with a parametric MLE (for a data adaptively selected parametric model), then we propose to use the parametric delta-method to compute the analytic formula for the influence curve $IC_{g_0}$ in order to obtain an accurate influence curve.
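For the parametric case just mentioned, the following Python sketch assumes, hypothetically, that $A$ is binary and $g_n$ is the MLE in a correctly specified logistic regression model $\mathrm{logit}\, g(1 \mid W) = \beta^\top Z(W)$. Under these assumptions the delta-method yields $IC_{g_0} = -E_0[D^* S^\top]\, \{E_0 S S^\top\}^{-1} S$ with scores $S = Z(W)\{A - g(1 \mid W)\}$, i.e., minus the projection of $D^*$ onto the linear span of the scores, which is the tangent space of this parametric model. Expectations are replaced by empirical means, and `solve` and `ic_g_projection` are illustrative helpers, not part of any library; the simulated data and the $D^*$ values are placeholders.

```python
import math
import random


def solve(mat, vec):
    """Solve the linear system mat @ x = vec (Gaussian elimination, partial pivoting)."""
    k = len(vec)
    m = [row[:] + [v] for row, v in zip(mat, vec)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def ic_g_projection(d_star, a, g_1, z):
    """Minus the empirical projection of D* onto the logistic-model scores.

    d_star: D*(Q_n^*, g_n, psi_n)(O_i); a: 0/1 indicators A_i;
    g_1: fitted g_n(1 | W_i); z: design vectors Z(W_i), including an intercept.
    """
    n = len(d_star)
    scores = [[zj * (ai - gi) for zj in zi] for zi, ai, gi in zip(z, a, g_1)]
    p = len(scores[0])
    info = [[sum(s[j] * s[jj] for s in scores) / n for jj in range(p)]
            for j in range(p)]
    cross = [sum(s[j] * d for s, d in zip(scores, d_star)) / n for j in range(p)]
    coef = solve(info, cross)              # {P_n S S^T}^{-1} P_n S D*
    return [-sum(cj * sj for cj, sj in zip(coef, s)) for s in scores]


# Example with simulated data and an assumed logistic fit beta = (0.2, 0.8).
rng = random.Random(7)
w = [rng.gauss(0.0, 1.0) for _ in range(300)]
z = [[1.0, wi] for wi in w]
g_1 = [1.0 / (1.0 + math.exp(-(0.2 + 0.8 * wi))) for wi in w]
a = [1 if rng.random() < gi else 0 for gi in g_1]
d_star = [rng.gauss(0.0, 1.0) + ai for ai in a]      # placeholder D* values
ic_g = ic_g_projection(d_star, a, g_1, z)
ic_total = [d + c for d, c in zip(d_star, ic_g)]     # D* + IC_g0 (with c_0 = 1)
```

By construction, the corrected influence curve `ic_total` is empirically orthogonal to every score, which is what subtracting a projection onto the score space means.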
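One can estimate the covariance matrix $\Sigma = E_0 IC\, IC^\top$ of the influence curve with the empirical covariance matrix $\Sigma_n = \frac{1}{n}\sum_{i=1}^n IC_n(O_i) IC_n(O_i)^\top$ of an estimated influence curve $IC_n$, and statistical inference can be based on the corresponding mean zero multivariate normal distribution, as usual, with $\Sigma_n / n$ approximating the covariance matrix of $\psi_n$.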
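In code, this inference step amounts to a few lines. The following Python sketch (the function names are hypothetical) takes estimated influence curve vectors $IC_n(O_i)$, forms their empirical covariance matrix, and returns coordinate-wise Wald intervals $\psi_{n,j} \pm z_{1-\alpha/2}\sqrt{\Sigma_{n,jj}/n}$ based on the normal approximation.

```python
import math
from statistics import NormalDist


def ic_covariance(ic_rows):
    """Empirical covariance matrix of the vectors IC_n(O_i), i = 1, ..., n.

    Centered; it equals (1/n) sum_i IC_n(O_i) IC_n(O_i)^T whenever the
    estimated influence curve values average to zero.
    """
    n, d = len(ic_rows), len(ic_rows[0])
    means = [sum(r[j] for r in ic_rows) / n for j in range(d)]
    return [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in ic_rows) / n
             for k in range(d)] for j in range(d)]


def wald_intervals(psi_n, ic_rows, level=0.95):
    """Coordinate-wise Wald intervals psi_n[j] +/- z * sqrt(Sigma_n[j][j] / n)."""
    n = len(ic_rows)
    sigma = ic_covariance(ic_rows)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return [(p - z * math.sqrt(sigma[j][j] / n), p + z * math.sqrt(sigma[j][j] / n))
            for j, p in enumerate(psi_n)]


# Toy example: IC values +1/-1 give Sigma_n = [[1.0]] and standard error 0.5.
intervals = wald_intervals([0.5], [[1.0], [-1.0], [1.0], [-1.0]])
```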
4.2. Selection among different collaborative targeted maximum likelihood estimators
Suppose that we have a set of candidate collaborative targeted maximum likelihood estimators $\psi_n^k$, $k = 1, \ldots, K$. Suppose that each of these estimators satisfies the conditions of the theorem. For example, these might be collaborative targeted maximum likelihood estimators as defined in our template, using different initial estimators indexed by $k$, but the same collaborative estimator of the censoring mechanism as a function of the data and the initial estimator (thus still resulting in different realizations if the initial estimators are different). Then $\psi_n^k$ is asymptotically linear with influence curve $IC_k$, $k = 1, \ldots, K$. We can now select among these candidate C-DR-TMLEs by maximizing the estimated efficiency, as in Rubin and van der Laan (2008).
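Specifically, let Ψ be a one-dimensional parameter. We now select the $k$ that minimizes the cross-validated variance of the influence curve:

$$k_n = \arg\min_k E_{B_n} P^1_{n,B_n} \{\widehat{IC}_k(P^0_{n,B_n})\}^2,$$

where $B_n$ denotes a random split of the sample into a training sample, with empirical distribution $P^0_{n,B_n}$, and a validation sample, with empirical distribution $P^1_{n,B_n}$, and $\widehat{IC}_k(P^0_{n,B_n})$ denotes the estimate of the $k$-specific influence curve based on the training sample. Thus, we would use the estimator $\psi_n^{k_n}$. If Ψ is multidimensional, then one needs to agree on a real-valued criterion applied to the covariance matrix of the influence curve, such as the sum of the variances along the diagonal, and minimize over $k$ this criterion applied to the cross-validated covariance matrix of the $k$-specific influence curve.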
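The selection rule can be sketched as follows, under stated assumptions: a one-dimensional parameter, deterministic $V$-fold sample splits in place of a general random split $B_n$, and, for each candidate $k$, a hypothetical function that estimates the $k$-specific influence curve from a training sample and returns it as a function of $O$. The criterion averages the squared influence curve over each validation sample and then over folds. The two candidates below are made up purely to exercise the code: their influence curves are assumed to be $Y - \bar Y$ and $(Y - \bar Y) - (X - \bar X)$.

```python
import random


def cv_ic_risk(data, fit_ic, n_folds=5):
    """Cross-validated mean of IC^2: fit on training folds, evaluate on the held-out fold."""
    n = len(data)
    folds = [list(range(v, n, n_folds)) for v in range(n_folds)]  # deterministic split
    total = 0.0
    for val_idx in folds:
        held_out = set(val_idx)
        train = [data[i] for i in range(n) if i not in held_out]
        ic = fit_ic(train)                      # O -> IC_k(P^0_{n,B_n})(O)
        total += sum(ic(data[i]) ** 2 for i in val_idx) / len(val_idx)
    return total / n_folds


def select_k(data, fit_ics, n_folds=5):
    """Index k minimizing the cross-validated IC variance, plus all criteria."""
    risks = [cv_ic_risk(data, fit, n_folds) for fit in fit_ics]
    return min(range(len(risks)), key=lambda k: risks[k]), risks


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def fit_ic_plain(train):
    """Hypothetical candidate 0 with influence curve Y - mean(Y)."""
    my = mean(y for _, y in train)
    return lambda o: o[1] - my


def fit_ic_adjusted(train):
    """Hypothetical candidate 1 with influence curve (Y - mean(Y)) - (X - mean(X))."""
    mx, my = mean(x for x, _ in train), mean(y for _, y in train)
    return lambda o: (o[1] - my) - (o[0] - mx)


rng = random.Random(11)
xs = [rng.gauss(0.0, 1.0) for _ in range(400)]
data = [(x, x + rng.gauss(0.0, 0.3)) for x in xs]   # O = (X, Y), Y strongly tied to X
k_n, risks = select_k(data, [fit_ic_plain, fit_ic_adjusted])
```

For a multidimensional parameter, the same loop would accumulate a cross-validated covariance matrix per candidate and compare, for example, its trace.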
4.3. Irregular C-TMLE and super efficiency
If $g_n$ converges to the fully adjusted $g_0(\cdot \mid X)$ (fully adjusting for $X$, under CAR) and $Q_n^*$ converges to $Q_0$, then it follows that $\psi_n$ is asymptotically linear with influence curve equal to the efficient influence curve $D^*(Q_0, g_0, \psi_0)$. So in that case, $\psi_n$ is an asymptotically efficient estimator and thereby also a regular estimator.
Due to the particular way gn is constructed in response to Qn, it is easily argued that the collaborative targeted MLE can be an irregular estimator and can be super efficient by achieving an asymptotic variance that is smaller than the variance of the efficient influence curve. In particular, our previous arguments showed that if the initial estimator is a maximum likelihood estimator according to a correctly specified parametric model, then gn will avoid nonparametric fits, thereby staying away from estimating the fully adjusted g0 that would result in an efficient estimator in first order. In this case, by the above theorem, the influence curve of ψn will be equal to D*(Q0, g0, ψ0), using a non-fully adjusted g0, so that the variance of the influence curve will be smaller than the variance of the efficient influence curve that involves a fully adjusted g0.
The super efficiency may have very attractive features in practice. For example, there might be a covariate that is very predictive of censoring/treatment but has no relation to the outcome. The C-TMLE will then decide not to adjust for this covariate at all in the selected censoring mechanism and, as a consequence, it might achieve the efficiency bound for the data structure excluding this covariate (still assuming CAR), so that the C-TMLE will have smaller asymptotic variance than the efficiency bound for the full data structure. The resulting super efficient estimator not only shows improved precision, but also yields more reliable confidence intervals by avoiding heavily non-robust (and harmful) operations. In most practical scenarios, such a covariate will still have a weak link with the outcome. In this case, for very large sample sizes, the C-TMLE will adjust for this covariate and thereby only be asymptotically efficient, but it will still behave as a super efficient estimator for practical sample sizes by not adjusting for this covariate. That is, it invests in effective bias reduction, focusing on covariates that are still predictive of the outcome, taking into account the already included initial estimator. This behavior is fully compatible with an estimator that aims to minimize the mean squared error of the estimator of the target parameter, and it avoids steps that increase both bias and variance.
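The variance reduction described above can be computed exactly in a small hypothetical model, where all numbers are assumptions for illustration: $W_1$ strongly predicts treatment but not the outcome, $Q_0(1, W)$ depends on $W_2$ only, and $Y$ is binary. With $Q = Q_0$, the estimating function is unbiased for any $g$, and for $D^*(Q_0, g, \psi_0) = \frac{A}{g(1 \mid W)}\{Y - Q_0(1, W)\} + Q_0(1, W) - \psi_0$ one has $\mathrm{Var}\, D^* = E_0\big[g_0(1 \mid W)\, Q_0(1, W)\{1 - Q_0(1, W)\}/g(1 \mid W)^2\big] + \mathrm{Var}\, Q_0(1, W)$. The Python sketch compares this variance for the fully adjusted $g_0(1 \mid W_1, W_2)$, which gives the efficient influence curve, with $g_0(1 \mid W_2)$, which ignores $W_1$.

```python
from itertools import product

P_W1, P_W2 = 0.5, 0.5            # hypothetical independent binary covariates
CELLS = list(product((0, 1), repeat=2))


def p_w(w1, w2):
    return (P_W1 if w1 else 1 - P_W1) * (P_W2 if w2 else 1 - P_W2)


def g0(w1, w2):
    """True P(A = 1 | W1, W2): W1 strongly predicts treatment."""
    return 0.1 + 0.7 * w1 + 0.1 * w2


def q0(w2):
    """True Q_0(1, W) = P(Y = 1 | A = 1, W): depends on W2 only."""
    return 0.3 + 0.4 * w2


PSI0 = sum(p_w(w1, w2) * q0(w2) for w1, w2 in CELLS)


def g0_w2(w2):
    """P_0(A = 1 | W2): the treatment mechanism that ignores W1."""
    num = sum(p_w(c1, c2) * g0(c1, c2) for c1, c2 in CELLS if c2 == w2)
    den = sum(p_w(c1, c2) for c1, c2 in CELLS if c2 == w2)
    return num / den


def var_d_star(g):
    """Exact variance of D*(Q_0, g, psi_0) = A/g (Y - Q_0) + Q_0 - psi_0 for binary Y."""
    total = 0.0
    for w1, w2 in CELLS:
        q = q0(w2)
        total += p_w(w1, w2) * (g0(w1, w2) * q * (1 - q) / g(w1, w2) ** 2
                                + (q - PSI0) ** 2)
    return total


var_full = var_d_star(g0)                            # efficient influence curve
var_reduced = var_d_star(lambda w1, w2: g0_w2(w2))   # no adjustment for W1
```

In this example the reduced adjustment roughly halves the variance (about 0.46 versus 0.95), because the large inverse weights $1/g_0(1 \mid W)$ driven by $W_1$ are averaged out.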
Finally, we remark that in simulations in which $Q_n$ converges quickly to the true $Q_0$, $g_n$ appears to converge to a random limit $g_0$ that, with probability 1, adjusts beyond the required minimal censoring mechanism. That is, likelihood-based cross-validation might over-select the adjustment in the censoring mechanism relative to the minimal adjustment, and the amount of over-selection remains random (but small) for large sample sizes (this is a known property of cross-validation). This naturally results in an irregularity of the estimator. Simulations have not shown practical problems for statistical inference, but this remains an area of study.