The collaborative targeted maximum likelihood estimator equals a $k_n$-th step collaborative targeted maximum likelihood estimator, and thereby equals a targeted maximum likelihood estimator with a starting estimator (e.g., the $(k_n - 1)$-th collaborative targeted maximum likelihood estimator), and the censoring mechanism estimator $g_n$ as selected in the $k_n$-th step, given the collection of candidate estimators $g_{n,\delta}$ indexed by $\delta$ ranging over an index set.
Thus, just like the targeted maximum likelihood estimator, the collaborative targeted maximum likelihood estimator solves the efficient influence curve estimating equation
$$0 = P_n D^*(Q_n, g_n, \psi_n) = \frac{1}{n}\sum_{i=1}^{n} D^*(Q_n, g_n, \psi_n)(O_i).$$
For simplicity, we will make the assumption that the efficient influence curve at a $P_{Q,g}$ can be represented as an estimating function in $\psi$: i.e., the efficient influence curve at $P$ can be represented as $D^*(Q(P), g(P), \psi(Q(P)))$ for some mapping $(Q, g, \psi) \to D^*(Q, g, \psi)$. However, the theorem in this section can be generalized to any efficient influence curve $D^*(Q, g)$ at a data generating distribution $P_{Q,g}$.
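As a concrete illustration (not part of the general development, and using the point-treatment data structure $O = (W, A, Y)$ with outcome regression $\bar{Q}(A, W) = E(Y \mid A, W)$, treatment mechanism $g(A \mid W)$, and additive effect parameter $\psi = E\{\bar{Q}(1, W) - \bar{Q}(0, W)\}$), the well-known efficient influence curve has exactly the claimed estimating function form:

```latex
D^*(Q, g, \psi)(O) \;=\; \frac{2A - 1}{g(A \mid W)}\,\bigl(Y - \bar{Q}(A, W)\bigr)
\;+\; \bar{Q}(1, W) \;-\; \bar{Q}(0, W) \;-\; \psi .
```

Setting the empirical mean of this function to zero and solving for $\psi$ recovers the familiar double robust estimating equation in this example.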
It is a reasonable assumption that $Q_n$ converges to some element $Q^*$ in the model for $Q_0$, where $Q^*$ is not necessarily equal to the true $Q_0$. In addition, let's assume that, for each $\delta$, the $\delta$-specific censoring mechanism estimator $g_{n,\delta}$ converges to some $g_{0,\delta}$. For example, if $\delta$ indicates an adjustment set, then it might be assumed that $g_{n,\delta}$ converges to the true conditional distribution, given this $\delta$-specific adjustment set.
For a given $Q$, we define $\delta(Q)$ as the index $\delta$ with entropy $d(\delta)$ minimal and so that
$$P_0 D^*(Q, g_{0,\delta(Q)}, \psi_0) = 0.$$
In other words, given the family of adjustments indexed by $\delta$, $\delta(Q)$ represents the minimal adjustment necessary in the censoring mechanism to obtain the collaborative double robustness/unbiased estimating function for $\psi_0$. It is then a natural assumption that
$$P_0 D^*(Q, g_{0,\delta}, \psi_0) = 0 \quad \text{for all } \delta \text{ with } d(\delta) \ge d(\delta(Q)).$$
In other words, if one uses a more nonparametric estimator of the censoring mechanism than needed (i.e., than $\delta(Q)$), then one certainly obtains the wished unbiasedness.
We will assume that, as $n$ converges to infinity, the selected censoring mechanism estimator $g_n = g_{n,\delta_n}$ converges to a fixed $g_{0,\delta_0}$ representing the limit of $g_{n,\delta_0}$, not necessarily equal to the conditional distribution, given the full $X$. For notational convenience, we will also denote this limit with $g_0$.
It is assumed that $d(\delta_0) \ge d(\delta(Q^*))$, so that
$$P_0 D^*(Q^*, g_0, \psi_0) = 0,$$
which will be the fundamental assumption for asymptotic normality of the C-TMLE. In other words, it is assumed that our collaborative C-TMLE procedure selects a nonparametric enough estimator $g_n$ for the censoring mechanism (in collaboration with $Q_n$) so that the required unbiasedness of the efficient influence curve estimating function is achieved.
To derive the influence curve of $\psi_n$, the asymptotic linearity theorem below assumes also that the limit of the selected censoring mechanism estimator satisfies
$$P_0 D^*(Q_n, g_0, \psi_0) = o_P(1/\sqrt{n}). \qquad (3)$$
As a consequence of this assumption (3), the influence curve does not involve a contribution requiring the analysis of a function of $Q_n$. This important simplification of the influence curve allows straightforward calculation of standard errors for the C-TMLE. The assumption (3) requires the limit $g_0$ to be nonparametric enough w.r.t. the actual estimator $Q_n$ so that enough orthogonality is achieved to make the contribution $P_0 D^*(Q_n, g_0, \psi_0)$ asymptotically negligible.
Why assumption (3) holds for C-TMLE: We now explain why this assumption is reasonable for the C-TMLE. In other words, $g_0$ corresponds with the limit of the least nonparametric estimator (among all estimators more nonparametric than the one identified by $\delta(Q^*)$) that still yields the wished unbiasedness of the estimating function at $Q^*$. We note that
$$R_n \equiv P_0 D^*(Q_n, g_0, \psi_0) - P_0 D^*(Q^*, g_0, \psi_0)$$
is a second order term (like $R_{n1}$ below) involving the difference $Q_n - Q^*$. By definition of $\delta(Q^*)$ and the fact that $Q_n$ converges to $Q^*$, it is reasonable to assume $d(\delta_0) \ge d(\delta(Q_n))$ with probability tending to 1 as $n \to \infty$. So $R_n$ is a second order term, so that, since $P_0 D^*(Q^*, g_0, \psi_0) = 0$, it is reasonable to assume $P_0 D^*(Q_n, g_0, \psi_0) = o_P(1/\sqrt{n})$. By definition of $\delta(Q_n)$, we do not only have
$$P_0 D^*(Q_n, g_{0,\delta(Q_n)}, \psi_0) = 0,$$
but also that $g_0$ is equally or more nonparametric than $g_{0,\delta(Q_n)}$ (with probability tending to 1), so that
$$P_0 D^*(Q_n, g_0, \psi_0) = 0.$$
This implies now that indeed assumption (3) holds.
Finally, we note that the next theorem can be applied to any collaborative double robust estimator, as discussed in the previous section, not only the collaborative double robust targeted maximum likelihood estimator.
Theorem 4 Let $(Q, g, \psi) \to D^*(Q, g, \psi)$ be a well defined function that maps any possible $(Q, g, \psi(Q))$ into a function of $O$. Let $O_1, \ldots, O_n \sim P_0$ be i.i.d., and let $P_n$ be the empirical probability distribution. Let $\psi(Q)$ be a $d$-dimensional parameter, where $\psi_0 = \psi(Q_0)$ is the parameter value of interest. In the following template for proving asymptotic linearity of $\psi_n$ as an estimator of $\psi_0$, $\psi_n$ represents the collaborative targeted maximum likelihood estimator, but it can be any estimator. Let $Q^*$ denote the limit of $Q_n$. Let $g_n$ be an estimator and $g_0$ denote its limit.
Efficient Influence Curve Estimating Equation: $0 = P_n D^*(Q_n, g_n, \psi_n)$.

Censoring Mechanism Estimator is Nonparametric Enough: $P_0 D^*(Q^*, g_0, \psi_0) = 0$ and $P_0 D^*(Q_n, g_0, \psi_0) = o_P(1/\sqrt{n})$. (Above we show why the latter is indeed a second order term for the C-TMLE.)

Consistency: $P_0 \{D^*(Q_n, g_n, \psi_n) - D^*(Q^*, g_0, \psi_0)\}^2 \to 0$ in probability, as $n \to \infty$. And the same is assumed if one or two of the components of the triplet $(Q_n, g_n, \psi_n)$ is replaced by its limit in $(Q^*, g_0, \psi_0)$.

Identifiability/Invertibility: $c_0 = -\frac{d}{d\psi_0} P_0 D^*(Q^*, g_0, \psi_0)$ exists and is invertible.

Donsker Class: $\{D^*(Q, g, \psi(Q)) : Q, g\}$ is $P_0$-Donsker, where $(Q, g)$ vary over sets that contain $(Q_n, g_n)$ with probability tending to 1.

Contribution due to Censoring Mechanism Estimation: Define the mapping $g \to \Phi(g) \equiv P_0 D^*(Q^*, g, \psi_0)$, and assume
$$\Phi(g_n) - \Phi(g_0) = (P_n - P_0) IC_{g_0} + o_P(1/\sqrt{n})$$
for some mean zero function $IC_{g_0}$.

Second order terms: Define second order term
$$R_{n1} \equiv P_0\{D^*(Q_n, g_n, \psi_0) - D^*(Q_n, g_0, \psi_0)\} - P_0\{D^*(Q^*, g_n, \psi_0) - D^*(Q^*, g_0, \psi_0)\},$$
and assume $R_{n1} = o_P(1/\sqrt{n})$. Note $R_{n1}$ is a second order term involving the difference between $(Q_n, g_n)$ and $(Q^*, g_0)$. Define second order term
$$R_{n2} \equiv P_0\{D^*(Q_n, g_n, \psi_n) - D^*(Q_n, g_n, \psi_0)\} - P_0\{D^*(Q^*, g_0, \psi_n) - D^*(Q^*, g_0, \psi_0)\},$$
and assume $R_{n2} = o_P(1/\sqrt{n})$. Note $R_{n2}$ is a second order term involving the differences $g_n - g_0$, $Q_n - Q^*$, and $\psi_n - \psi_0$.

Then, $\psi_n$ is an asymptotically linear estimator of $\psi_0$ at $P_0$ with influence curve
$$IC = c_0^{-1}\{D^*(Q^*, g_0, \psi_0) + IC_{g_0}\},$$
so that $\sqrt{n}(\psi_n - \psi_0)$ converges in distribution to a multivariate normal distribution with mean zero and covariance matrix $\Sigma = P_0\, IC\, IC^\top$.
The principal equations are $0 = P_n D^*(Q_n, g_n, \psi_n)$ and $P_0 D^*(Q^*, g_0, \psi_0) = 0$. So, we have
$$0 = (P_n - P_0) D^*(Q_n, g_n, \psi_n) + P_0\{D^*(Q_n, g_n, \psi_n) - D^*(Q_n, g_n, \psi_0)\} + P_0 D^*(Q_n, g_n, \psi_0).$$
We denote the three terms on the right with I, II and III, and deal with them separately below.

By the Donsker condition, and consistency condition, we have
$$(P_n - P_0)\{D^*(Q_n, g_n, \psi_n) - D^*(Q^*, g_0, \psi_0)\} = o_P(1/\sqrt{n}).$$
Thus, we obtain
$$I = (P_n - P_0) D^*(Q^*, g_0, \psi_0) + o_P(1/\sqrt{n})$$
as first term approximation. We refer to van der Vaart and Wellner (1996) for this empirical process theorem.

For the second term, we have
$$II = P_0\{D^*(Q^*, g_0, \psi_n) - D^*(Q^*, g_0, \psi_0)\} + R_{n2},$$
where $R_{n2}$ is the second order term involving $(g_n - g_0, Q_n - Q^*)$ and $\psi_n - \psi_0$, which is $o_P(1/\sqrt{n})$ by assumption. By the identifiability/invertibility condition (differentiability at $\psi_0$),
$$P_0\{D^*(Q^*, g_0, \psi_n) - D^*(Q^*, g_0, \psi_0)\} = -c_0(\psi_n - \psi_0) + o_P(\|\psi_n - \psi_0\|).$$

For the third term, we have
$$III = P_0 D^*(Q_n, g_0, \psi_0) + P_0\{D^*(Q^*, g_n, \psi_0) - D^*(Q^*, g_0, \psi_0)\} + R_{n1},$$
where $R_{n1}$ is the second order term involving the difference between $(Q_n, g_n)$ and $(Q^*, g_0)$, which is $o_P(1/\sqrt{n})$ by assumption. The first term on the right is $o_P(1/\sqrt{n})$ by the "Censoring Mechanism is Nonparametric Enough"-assumption. The middle term, by definition, equals $\Phi(g_n) - \Phi(g_0)$. We assumed that $\Phi(g_n) - \Phi(g_0) = (P_n - P_0) IC_{g_0} + o_P(1/\sqrt{n})$. Thus, the third term equals
$$III = (P_n - P_0) IC_{g_0} + o_P(1/\sqrt{n}).$$

We can thus conclude that
$$c_0(\psi_n - \psi_0) = (P_n - P_0)\{D^*(Q^*, g_0, \psi_0) + IC_{g_0}\} + o_P(1/\sqrt{n}) + o_P(\|\psi_n - \psi_0\|),$$
and thereby the stated asymptotic linearity. □
4.1. Statistical Inference
If $Q^* = Q_0$, then $IC_{g_0} = 0$, so that the influence curve reduces to the efficient influence curve $D^*(Q_0, g_0, \psi_0)$ at a possibly weakly adjusted $g_0$. If $g_n$ converges to the fully adjusted conditional distribution, given $X$, then we know that $IC_{g_0}$ equals minus the projection of $D^*(Q^*, g_0, \psi_0)$ onto the tangent space of the model used by $g_n$ (van der Laan and Robins (2003), Section 2.3.7). We suggest that, even if $g_0$ is not the fully adjusted censoring mechanism, we will typically still have that $D^*(Q^*, g_0, \psi_0)$ is a conservative influence curve. In other words, if $Q_n$ starts approximating the true $Q_0$, then the $IC_{g_0}$ contribution gets smaller and smaller, while if $Q_n$ stays away from $Q_0$, then $g_n$ starts approximating the fully adjusted $g_0$, in which case inference based on $D^*$ is conservative. This might explain why we see good coverage in our simulations based on the "influence curve" $D^*(Q_n, g_n, \psi_n)$. If $g_n$ corresponds with a parametric MLE (for a data adaptively selected parametric model), then we propose to use the parametric delta-method to compute the analytic formula for the influence curve $IC_{g_0}$ in order to obtain an accurate influence curve.

One can estimate the covariance matrix $\Sigma = E_0\, IC\, IC^\top$ of the influence curve with the empirical covariance matrix $\frac{1}{n}\sum_{i=1}^{n} \widehat{IC}(O_i)\, \widehat{IC}(O_i)^\top$, and statistical inference can be based on the corresponding mean zero multivariate normal distribution, as usual.
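The covariance estimate above translates directly into Wald-type inference. The following sketch is our own illustration (the function name and interface are hypothetical, not from this article): given the estimated influence curve evaluated at each observation, it returns the empirical covariance matrix, standard errors, and 95% confidence intervals.

```python
import numpy as np

def ic_inference(ic_values, psi_n):
    """Wald-type inference from an estimated influence curve.

    ic_values: array of shape (n,) or (n, d) containing the estimated
        influence curve evaluated at each observation O_i.
    psi_n: the d-dimensional point estimate.
    Returns the empirical covariance matrix Sigma, the standard errors
    of psi_n, and componentwise 95% confidence intervals.
    """
    ic = np.asarray(ic_values, dtype=float)
    if ic.ndim == 1:
        ic = ic[:, None]
    n = ic.shape[0]
    # Sigma = (1/n) sum_i IC(O_i) IC(O_i)^T (IC has mean zero in theory)
    sigma = ic.T @ ic / n
    # Standard error of each component of psi_n: sqrt(diag(Sigma)/n)
    se = np.sqrt(np.diag(sigma) / n)
    z = 1.96  # 97.5% standard normal quantile
    psi = np.atleast_1d(np.asarray(psi_n, dtype=float))
    ci = np.stack([psi - z * se, psi + z * se], axis=1)
    return {"sigma": sigma, "se": se, "ci": ci}
```

For a one-dimensional $\psi$, this reduces to the usual interval $\psi_n \pm 1.96\,\hat{\sigma}/\sqrt{n}$, with $\hat{\sigma}^2$ the empirical variance of the estimated influence curve.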
4.2. Selection among different collaborative targeted maximum likelihood estimators
Suppose that we have a set of candidate collaborative targeted maximum likelihood estimators $\psi_{n,k}$, $k = 1, \ldots, K$. Suppose that each of these estimators satisfies the conditions of the theorem. For example, these might be collaborative targeted maximum likelihood estimators as defined in our template, using different initial estimators indexed by $k$, but the same collaborative estimator for the censoring mechanism as a function of the data and the initial estimator (thus still resulting in different realizations if the initial estimators are different). Then $\psi_{n,k}$ is asymptotically linear with influence curve $IC_k$, $k = 1, \ldots, K$. We can now select among these candidate C-DR-TMLEs by maximizing the estimated efficiency, as in Rubin and van der Laan (2008).

Specifically, let $\Psi$ be a one-dimensional parameter. We now select the $k$ that minimizes the cross-validated variance of the influence curve: that is, the $k$ minimizing the cross-validated empirical mean, over validation samples, of the squared influence curve $\widehat{IC}_k^2$ estimated on the corresponding training samples. Thus, we would use the estimator $\psi_{n,k_n}$ with $k_n$ the selected index. If $\Psi$ is multidimensional, then one needs to agree on a real valued criterion applied to the covariance matrix of the influence curve, such as the sum of the variances along the diagonal, and minimize over $k$ the criterion of the cross-validated covariance matrix of the $k$-specific influence curve.
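The selection step can be sketched as follows. This is our own illustration (the function names and the interface for the candidate influence curve estimators are hypothetical, and the V-fold split scheme is an arbitrary choice): each candidate supplies a function that fits its nuisance estimators on the training sample and evaluates its estimated influence curve on the validation sample, and the candidate with the smallest cross-validated second moment (the variance, since the influence curve has mean zero) is selected.

```python
import numpy as np

def select_candidate(ic_fns, data, n_splits=5, seed=0):
    """Select the candidate whose influence curve has minimal
    cross-validated variance.

    ic_fns: list of functions; ic_fns[k](train, valid) returns the
        estimated influence curve of candidate k, fit on `train`,
        evaluated at each row of `valid` (a 1-d array).
    data: (n, p) array of observations.
    Returns the selected index and the vector of CV criteria.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    folds = rng.permutation(n) % n_splits  # balanced random fold labels
    cv_var = np.zeros(len(ic_fns))
    for k, ic_fn in enumerate(ic_fns):
        for v in range(n_splits):
            train, valid = data[folds != v], data[folds == v]
            ic = ic_fn(train, valid)
            # validation-sample estimate of Var(IC_k), averaged over folds
            cv_var[k] += np.mean(np.asarray(ic) ** 2) / n_splits
    return int(np.argmin(cv_var)), cv_var
```

For a multidimensional parameter, `np.mean(ic ** 2)` would be replaced by the chosen real valued criterion of the cross-validated covariance matrix, such as the trace.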
4.3. Irregular C-TMLE and super efficiency
If $g_n$ converges to the fully adjusted $g_0(\cdot \mid X)$ (fully adjusting for $X$, under CAR) and $Q_n$ converges to $Q_0$, then it follows that $\psi_n$ is asymptotically linear with influence curve equal to the efficient influence curve $D^*(Q_0, g_0, \psi_0)$. So in that case, $\psi_n$ is an asymptotically efficient estimator and thereby also a regular estimator.
Due to the particular way $g_n$ is constructed in response to $Q_n$, it is easily argued that the collaborative targeted MLE can be an irregular estimator and can be super efficient by achieving an asymptotic variance that is smaller than the variance of the efficient influence curve. In particular, our previous arguments showed that if the initial estimator is a maximum likelihood estimator according to a correctly specified parametric model, then $g_n$ will avoid nonparametric fits, thereby staying away from estimating the fully adjusted $g_0$ that would result in a first order efficient estimator. In this case, by the above theorem, the influence curve of $\psi_n$ will be equal to $D^*(Q_0, g_0, \psi_0)$, using a non-fully adjusted $g_0$, so that the variance of the influence curve will be smaller than the variance of the efficient influence curve that involves a fully adjusted $g_0$.
The super efficiency may have very attractive features in practice. For example, there might be a covariate that is very predictive of censoring/treatment but has no relation to the outcome. The C-TMLE will then decide not to adjust for this covariate at all in the selected censoring mechanism, and as a consequence, it might achieve the efficiency bound for the data structure excluding this covariate (still assuming CAR), so that the C-TMLE will have smaller asymptotic variance than the efficiency bound for the full data structure. The resulting super efficient estimator not only shows improved precision, but also yields more reliable confidence intervals, by avoiding heavily non-robust (and harmful) operations. In most practical scenarios, such a covariate will still have a weak link with the outcome. In this case, for very large sample sizes, the C-TMLE will adjust for this covariate and thereby only be asymptotically efficient, but it will still behave as a super efficient estimator at practical sample sizes, by not adjusting for this covariate. That is, it invests in effective bias reduction, focusing on covariates that are still predictive of the outcome, taking into account the already included initial estimator. This behavior is completely compatible with an estimator that aims to minimize the mean squared error of the estimator of the target parameter, and it certainly avoids steps that increase both bias and variance.
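The variance cost of adjusting for such an instrumental covariate is easy to see numerically. The following small simulation is our own illustration (not from this article), using plain inverse probability weighting with known treatment mechanisms as a simple stand-in for the estimators discussed here: it compares the sampling variance of an effect estimator that adjusts for a covariate $W$ strongly predictive of treatment $A$ but unrelated to the outcome $Y$ against one that ignores $W$.

```python
import numpy as np

def simulate(n_reps=300, n=400, seed=1):
    """Compare IPW effect estimators that do vs. do not adjust for a
    covariate W that strongly predicts treatment A but not outcome Y.
    Returns the Monte Carlo variances (adjusted, unadjusted)."""
    rng = np.random.default_rng(seed)
    est_adj, est_unadj = [], []
    for _ in range(n_reps):
        W = rng.integers(0, 2, size=n)
        pA = np.where(W == 1, 0.9, 0.1)   # W is a strong instrument
        A = rng.binomial(1, pA)
        Y = A + rng.normal(size=n)        # Y does not depend on W
        g_W = np.where(A == 1, pA, 1 - pA)                  # g(A | W)
        g_marg = np.where(A == 1, A.mean(), 1 - A.mean())   # marginal g(A)
        # IPW estimating equation for the additive effect (true value 1)
        est_adj.append(np.mean((2 * A - 1) * Y / g_W))
        est_unadj.append(np.mean((2 * A - 1) * Y / g_marg))
    return np.var(est_adj), np.var(est_unadj)
```

Both estimators are unbiased for the effect here, but the $W$-adjusted one pays a substantial variance price for its extreme weights, which is exactly the adjustment the C-TMLE declines to make.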
Finally, we remark that, in simulations in which $Q_n$ converges fast to the true $Q_0$, $g_n$ seems to have a temptation to converge to a random choice of $g_0$ that is beyond the required minimal censoring mechanism with probability 1. That is, likelihood based cross-validation might over-select the adjustment in the censoring mechanism relative to the minimal adjustment, and the amount of over-selection remains random (but small) at large sample sizes (a known property of cross-validation). This naturally results in an irregularity of the estimator. Simulations have not shown practical problems for statistical inference, but this remains an area of study.