Denote the data from a
k-arm randomized trial,
k ≥ 2, as (
Yi,
Xi,
Zi),
i = 1, . . . ,
n, independent and identically distributed (iid) across
i, where, for subject
i,
Yi is outcome;
Xi is the vector of all available auxiliary baseline covariates; and
Zi =
g indicates assignment to treatment group
g with known randomization probabilities
P(
Z =
g) =
πg,
g = 1, . . . ,
k,
. Randomization ensures that
Z
X, where “
![[dbl vert, bar (under)]](/corehtml/pmc/pmcents/x2AEB.gif)
” means “independent of.”
Let
β denote a vector of parameters involved in making treatment comparisons under a specified statistical model. For example, in a two-arm trial, for a continuous real-valued response
Y, a natural basis for comparison is the difference in means for each treatment,
E(
Y |
Z = 2) −
E(
Y |
Z = 1), represented directly as
β2 in the model
In a three-arm trial, we may consider the model
In contrast to (
1), we have parameterized (
2) equivalently in terms of the three treatment means rather than differences relative to a reference treatment, and treatment comparisons may be based on pairwise contrasts among elements of
β. For binary outcome
Y = 0 or 1, where
Y = 1 indicates the event of interest, we may consider for a
k-arm trial
where logit(
p) = log{
p/(1 −
p)};
β = (
β1, . . . ,
βk)
T ; and the log-odds ratio for treatment
g relative to treatment 1 is
βg,
g = 2, . . . ,
k.
If
Yi is a vector of continuous longitudinal responses
Yij,
j = 1, . . . , m
i, at times
ti1, . . . ,
timi, response-time profiles in a two-arm trial might be described by the simple linear mixed model
where
β = (
β1,
β2)
T, and
β2 is the difference in mean slope between the two treatments; extension to
k > 2 treatment groups is straightforward. Alternatively, instead of considering the fully parametric model (
4), one might make no assumption beyond
leaving remaining features of the distribution of
Y given
Z unspecified. For binary
Yij, the marginal model logit{
E(
Yij |
Zi)} =
α + {
β1 +
β2I(
Zi = 2)}
tij might be adopted.
In all of (
1)-(
5),
β (
p × 1) is a parameter involved in making treatment comparisons in a model describing aspects of the conditional distribution of
Y given
Z and is of central interest. In addition to
β, models like (
4) and (
5) depend on a vector of parameters
γ, say; e.g., in (
4),
; and
γ =
α in (
5). In general, we define
θ = (
βT ,
γT)
T (
r × 1), recognizing that models like (
1)-(
3) do not involve an additional
γ, so that
θ =
β.
For these and similar models, consistent, asymptotically normal estimators for
θ, and hence for
β and functions of its elements reflecting treatment comparisons, based on the data (
Yi,
Zi),
i = 1, . . . ,
n, only and thus “unadjusted” for covariates, are readily available. Unadjusted, large-sample tests of null hypotheses of “no treatment effects” are also well-established. The difference of sample means is the obvious such estimator for
β2 in (
1) and is efficient (i.e., has smallest asymptotic variance) among estimators depending only on these data, and a test of
H0 :
β2 = 0 may be based on the usual
t statistic. Similarly, the maximum likelihood estimator (MLE) for
β2 in (
4) and associated tests may be obtained from standard mixed model software. For
k > 2, pairwise and global comparisons are possible; e.g., in (
2), the sample means are efficient estimators for each element of
β, and a test of
H0 :
β1 =
β2 =
β3 may be based on the corresponding two-degree-of-freedom Wald statistic.
As noted in Section 1, the standard approach in practice for covariate adjustment, thus using all of (
Yi,
Xi,
Zi),
i = 1, . . . ,
n, is based on regression models for mean outcome as a function of
X and
Z. E.g., for
k = 2 and continuous
Y, a popular such estimator for
β2 in (
1) is the ordinary least squares (OLS) estimator for
![[var phi]](/corehtml/pmc/pmcents/x03C6.gif)
in the analysis of covariance model
extension to
k > 2 treatments is immediate. See
Tsiatis et al. (2007, Section 3) for discussion of related estimators for
β2 in the particular case of (
1). If (
6) is the correct model for
E(
Y |
X,
Z), then
![[var phi]](/corehtml/pmc/pmcents/x03C6.gif)
and
β2 in (
1) coincide, and, moreover, the OLS estimator for
![[var phi]](/corehtml/pmc/pmcents/x03C6.gif)
in (
6) is a consistent estimator for
β2 that is generally more precise than the usual unadjusted estimator, even if (
6) is not correct (e.g.,
Yang and Tsiatis, 2001). For binary
Y, covariate adjustment is often carried out based on the logistic regression model
where the MLE of
![[var phi]](/corehtml/pmc/pmcents/x03C6.gif)
is taken as the adjusted estimator for the log-odds ratio
β2 in (
3) with
k = 2. In (
7),
![[var phi]](/corehtml/pmc/pmcents/x03C6.gif)
is the log-odds ratio conditional on
X, assuming this quantity is constant for all
X. This assumption may or may not be correct; even if it were,
![[var phi]](/corehtml/pmc/pmcents/x03C6.gif)
is generally different from
β2 in (
3).
Tsiatis et al. (2007, Section 2) discuss this point in more detail.
To derive alternative methods, we begin by describing our assumed semiparametric statistical model for the full data (
Y,
X,
Z), which is a characterization of the class of all joint densities for (
Y,
X,
Z) that could have generated the data. We seek methods that perform well over as large a class as possible; thus, we assume that densities in this class involve no restrictions beyond the facts that
Z
X, guaranteed by randomization; that
πg =
P(
Z =
g),
g = 1, . . . ,
k, are known; and that
β is defined through a specification on the conditional distribution of
Y given
Z as in (
1)-(
5). We thus first describe the conditional density of
Y given
Z. Under (
3) and (
4), this density is completely specified in terms of
θ, while (
5) describes only one aspect of the conditional distribution, the mean, in terms of
θ, and (
1) and (
2) make no restrictions on the conditional distribution of
Y given
Z. To represent all such situations, we assume that this conditional density may be written as
pY|Z (
y|
z;
θ,
η), where
η is an additional nuisance parameter possibly needed to describe the density fully. For (
3) and (
4),
η is null, as the density is already entirely characterized. For (
1), (
2), and (
5),
η is infinite-dimensional, as these specifications do not impose any additional constraints on what the density might be, so any density consistent with these models is possible.
Under the above conditions, we assume that all joint densities for (
Y,
X,
Z) may be written, in obvious notation, as
pY,X,Z(
y, x, z; θ, η, ψ, π) =
pY,X|Z(
y,
x |
z;
θ,
η,
ψ)
pZ(
z;
π), where
pZ(
z; π) is completely specified, as
π = (
π1, . . . ,
πk)
T is known, and satisfy the constraints
The joint density involves an additional, possibly infinite-dimensional nuisance parameter
ψ, needed to include in the class all joint densities satisfying (i) and (ii). Here,
pX(
x) is any arbitrary marginal density for the covariates, and (ii) follows because
Z
X. In Web Appendix A, we demonstrate that a rich class of joint distributions for (
Y,
X,
Z) may be identified such that
X is correlated with
Y and
Z
X [condition (ii)] that also satisfy condition (i). Because the joint density involves both finite (parametric) and infinite-dimensional components, it represents a semiparametric statistical model (see
Tsiatis, 2006, Section 1.2).