Denote the data from a *k*-arm randomized trial, *k* ≥ 2, as (*Y*_{i}, *X*_{i}, *Z*_{i}), *i* = 1, . . . , *n*, independent and identically distributed (iid) across *i*, where, for subject *i*, *Y*_{i} is the outcome; *X*_{i} is the vector of all available auxiliary baseline covariates; and *Z*_{i} = *g* indicates assignment to treatment group *g* with known randomization probabilities *P*(*Z* = *g*) = *π*_{g}, *g* = 1, . . . , *k*, ∑_{g=1}^{k} *π*_{g} = 1. Randomization ensures that *Z* ⫫ *X*, where “⫫” means “independent of.”
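To fix ideas, the data structure above can be simulated directly; all distributional choices below (normal covariates, the particular outcome model, the values of *π*_{g}) are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 1000, 3
pi = np.array([0.25, 0.25, 0.5])  # known randomization probabilities, summing to 1

# Auxiliary baseline covariates X_i, drawn before (hence independently of) treatment.
X = rng.normal(size=(n, 2))

# Treatment assignment Z_i in {1, ..., k} with P(Z = g) = pi_g.
Z = rng.choice(np.arange(1, k + 1), size=n, p=pi)

# Outcome depends on both Z and X (purely illustrative generating model).
Y = 1.0 + 0.5 * (Z == 2) + 1.0 * (Z == 3) + X @ np.array([0.8, -0.4]) + rng.normal(size=n)

# Randomization guarantees Z independent of X: covariate means should be
# similar across arms (up to sampling error).
for g in range(1, k + 1):
    print(g, X[Z == g].mean(axis=0).round(2))
```

The per-arm covariate means printed at the end differ only by sampling noise, which is the empirical face of *Z* ⫫ *X*.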

Let *β* denote a vector of parameters involved in making treatment comparisons under a specified statistical model. For example, in a two-arm trial, for a continuous real-valued response *Y*, a natural basis for comparison is the difference in means for each treatment, *E*(*Y* | *Z* = 2) − *E*(*Y* | *Z* = 1), represented directly as *β*_{2} in the model

*E*(*Y* | *Z*) = *β*_{1} + *β*_{2} *I*(*Z* = 2).     (1)

In a three-arm trial, we may consider the model

*E*(*Y* | *Z*) = *β*_{1} *I*(*Z* = 1) + *β*_{2} *I*(*Z* = 2) + *β*_{3} *I*(*Z* = 3).     (2)

In contrast to (1), we have parameterized (2) equivalently in terms of the three treatment means rather than differences relative to a reference treatment, and treatment comparisons may be based on pairwise contrasts among elements of *β*. For binary outcome *Y* = 0 or 1, where *Y* = 1 indicates the event of interest, we may consider for a *k*-arm trial

logit{*P*(*Y* = 1 | *Z*)} = *β*_{1} + ∑_{g=2}^{k} *β*_{g} *I*(*Z* = *g*),     (3)

where logit(*p*) = log{*p*/(1 − *p*)}; *β* = (*β*_{1}, . . . , *β*_{k})^{T}; and the log-odds ratio for treatment *g* relative to treatment 1 is *β*_{g}, *g* = 2, . . . , *k*.
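Under a model of the form (3), the arm-wise sample proportions determine unadjusted estimators of the intercept and log-odds ratios. A minimal sketch, with illustrative (assumed) parameter values and a large sample so the estimates visibly recover the truth:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative k = 3 arm trial with binary outcome, generated per model (3):
# logit{P(Y = 1 | Z = g)} = beta_1 + beta_g I(g >= 2).
beta = np.array([-0.5, 0.7, 1.2])  # beta_1, and log-odds ratios beta_2, beta_3
n = 200_000
Z = rng.integers(1, 4, size=n)     # uniform assignment to arms 1, 2, 3
logit_p = beta[0] + np.where(Z == 2, beta[1], 0.0) + np.where(Z == 3, beta[2], 0.0)
Y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Unadjusted estimates: convert arm-wise sample proportions to log-odds,
# then difference against the reference arm 1.
phat = np.array([Y[Z == g].mean() for g in (1, 2, 3)])
logodds = np.log(phat / (1 - phat))
beta_hat = np.array([logodds[0], logodds[1] - logodds[0], logodds[2] - logodds[0]])
print(beta_hat.round(2))
```

With no covariates in the model, this moment-based estimator coincides with the logistic-regression MLE of *β* in (3).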

If *Y*_{i} is a vector of continuous longitudinal responses *Y*_{ij}, *j* = 1, . . . , *m*_{i}, at times *t*_{i1}, . . . , *t*_{im_i}, response-time profiles in a two-arm trial might be described by the simple linear mixed model

*Y*_{ij} = *α* + {*β*_{1} + *β*_{2} *I*(*Z*_{i} = 2)}*t*_{ij} + *b*_{0i} + *b*_{1i} *t*_{ij} + *e*_{ij},     (4)

with normally distributed subject-specific random effects (*b*_{0i}, *b*_{1i}) and within-subject errors *e*_{ij}, where *β* = (*β*_{1}, *β*_{2})^{T}, and *β*_{2} is the difference in mean slope between the two treatments; extension to *k* > 2 treatment groups is straightforward. Alternatively, instead of considering the fully parametric model (4), one might make no assumption beyond

*E*(*Y*_{ij} | *Z*_{i}) = *α* + {*β*_{1} + *β*_{2} *I*(*Z*_{i} = 2)}*t*_{ij},     (5)

leaving remaining features of the distribution of *Y* given *Z* unspecified. For binary *Y*_{ij}, the marginal model logit{*E*(*Y*_{ij} | *Z*_{i})} = *α* + {*β*_{1} + *β*_{2} *I*(*Z*_{i} = 2)}*t*_{ij} might be adopted.

In all of (1)-(5), *β* (*p* × 1) is a parameter involved in making treatment comparisons in a model describing aspects of the conditional distribution of *Y* given *Z* and is of central interest. In addition to *β*, models like (4) and (5) depend on a vector of parameters *γ*, say; e.g., in (4), *γ* collects *α* and the covariance parameters of the random effects and errors; and *γ* = *α* in (5). In general, we define *θ* = (*β*^{T}, *γ*^{T})^{T} (*r* × 1), recognizing that models like (1)-(3) do not involve an additional *γ*, so that *θ* = *β*.

For these and similar models, consistent, asymptotically normal estimators for *θ*, and hence for *β* and functions of its elements reflecting treatment comparisons, based on the data (*Y*_{i}, *Z*_{i}), *i* = 1, . . . , *n*, only and thus “unadjusted” for covariates, are readily available. Unadjusted, large-sample tests of null hypotheses of “no treatment effects” are also well established. The difference of sample means is the obvious such estimator for *β*_{2} in (1) and is efficient (i.e., has smallest asymptotic variance) among estimators depending only on these data, and a test of *H*_{0} : *β*_{2} = 0 may be based on the usual *t* statistic. Similarly, the maximum likelihood estimator (MLE) for *β*_{2} in (4) and associated tests may be obtained from standard mixed model software. For *k* > 2, pairwise and global comparisons are possible; e.g., in (2), the sample means are efficient estimators for each element of *β*, and a test of *H*_{0} : *β*_{1} = *β*_{2} = *β*_{3} may be based on the corresponding two-degree-of-freedom Wald statistic.
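The unadjusted analysis for model (1) can be sketched in a few lines; the true effect size, error distribution, and sample size below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-arm trial generated per model (1), with beta_2 = 0.4 (illustrative).
n = 500
Z = rng.choice([1, 2], size=n)
Y = 1.0 + 0.4 * (Z == 2) + rng.normal(size=n)

y1, y2 = Y[Z == 1], Y[Z == 2]
beta2_hat = y2.mean() - y1.mean()  # unadjusted estimator: difference of sample means

# Large-sample (Welch-type) t statistic for H0: beta_2 = 0.
se = np.sqrt(y1.var(ddof=1) / len(y1) + y2.var(ddof=1) / len(y2))
t_stat = beta2_hat / se
print(round(beta2_hat, 2), round(t_stat, 2))
```

Comparing `t_stat` to a standard normal (or *t*) reference distribution gives the usual unadjusted test of *H*_{0} : *β*_{2} = 0.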

As noted in Section 1, the standard approach in practice for covariate adjustment, thus using all of (*Y*_{i}, *X*_{i}, *Z*_{i}), *i* = 1, . . . , *n*, is based on regression models for mean outcome as a function of *X* and *Z*. E.g., for *k* = 2 and continuous *Y*, a popular such estimator for *β*_{2} in (1) is the ordinary least squares (OLS) estimator for the coefficient of *I*(*Z* = 2), *β̃*_{2} say, in the analysis of covariance model

*E*(*Y* | *X*, *Z*) = *β̃*_{1} + *β̃*_{2} *I*(*Z* = 2) + *β̃*_{3}^{T} *X*;     (6)

extension to *k* > 2 treatments is immediate. See Tsiatis et al. (2007, Section 3) for discussion of related estimators for *β*_{2} in the particular case of (1). If (6) is the correct model for *E*(*Y* | *X*, *Z*), then *β̃*_{2} and *β*_{2} in (1) coincide, and, moreover, the OLS estimator for *β̃*_{2} in (6) is a consistent estimator for *β*_{2} that is generally more precise than the usual unadjusted estimator, even if (6) is not correct (e.g., Yang and Tsiatis, 2001). For binary *Y*, covariate adjustment is often carried out based on the logistic regression model

logit{*P*(*Y* = 1 | *X*, *Z*)} = *β̃*_{1} + *β̃*_{2} *I*(*Z* = 2) + *β̃*_{3}^{T} *X*,     (7)

where the MLE of *β̃*_{2} is taken as the adjusted estimator for the log-odds ratio *β*_{2} in (3) with *k* = 2. In (7), *β̃*_{2} is the log-odds ratio conditional on *X*, assuming this quantity is constant for all *X*. This assumption may or may not be correct; even if it were, *β̃*_{2} is generally different from *β*_{2} in (3). Tsiatis et al. (2007, Section 2) discuss this point in more detail.
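The precision gain from ANCOVA adjustment in the continuous case is easy to see by simulation. A minimal sketch, comparing the empirical spread of the unadjusted and OLS-adjusted estimators over repeated trials (all generating-model choices are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

def one_trial(n=400):
    # Data with X predictive of Y and Z randomized independently of X.
    X = rng.normal(size=n)
    Z = rng.choice([1, 2], size=n)
    Y = 1.0 + 0.4 * (Z == 2) + 1.5 * X + rng.normal(size=n)
    # Unadjusted: difference of sample means, as for model (1).
    unadj = Y[Z == 2].mean() - Y[Z == 1].mean()
    # Adjusted: OLS coefficient of I(Z = 2) in an ANCOVA model of the form (6).
    D = np.column_stack([np.ones(n), (Z == 2).astype(float), X])
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return unadj, coef[1]

est = np.array([one_trial() for _ in range(2000)])
print("means (unadjusted, adjusted):", est.mean(axis=0).round(3))
print("empirical SDs (unadjusted, adjusted):", est.std(axis=0).round(3))
```

Both estimators center on the true effect, but the adjusted estimator's empirical standard deviation is markedly smaller because *X* explains a large share of the outcome variance.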

To derive alternative methods, we begin by describing our assumed semiparametric statistical model for the full data (*Y*, *X*, *Z*), which is a characterization of the class of all joint densities for (*Y*, *X*, *Z*) that could have generated the data. We seek methods that perform well over as large a class as possible; thus, we assume that densities in this class involve no restrictions beyond the facts that *Z* ⫫ *X*, guaranteed by randomization; that *π*_{g} = *P*(*Z* = *g*), *g* = 1, . . . , *k*, are known; and that *β* is defined through a specification on the conditional distribution of *Y* given *Z* as in (1)-(5). We thus first describe the conditional density of *Y* given *Z*. Under (3) and (4), this density is completely specified in terms of *θ*, while (5) describes only one aspect of the conditional distribution, the mean, in terms of *θ*, and (1) and (2) make no restrictions on the conditional distribution of *Y* given *Z*. To represent all such situations, we assume that this conditional density may be written as *p*_{Y|Z}(*y* | *z*; *θ*, *η*), where *η* is an additional nuisance parameter possibly needed to describe the density fully. For (3) and (4), *η* is null, as the density is already entirely characterized. For (1), (2), and (5), *η* is infinite-dimensional, as these specifications do not impose any additional constraints on what the density might be, so any density consistent with these models is possible.

Under the above conditions, we assume that all joint densities for (*Y*, *X*, *Z*) may be written, in obvious notation, as *p*_{Y,X,Z}(*y*, *x*, *z*; *θ*, *η*, *ψ*, *π*) = *p*_{Y,X|Z}(*y*, *x* | *z*; *θ*, *η*, *ψ*) *p*_{Z}(*z*; *π*), where *p*_{Z}(*z*; *π*) is completely specified, as *π* = (*π*_{1}, . . . , *π*_{k})^{T} is known, and satisfy the constraints

(i) ∫ *p*_{Y,X|Z}(*y*, *x* | *z*; *θ*, *η*, *ψ*) *dx* = *p*_{Y|Z}(*y* | *z*; *θ*, *η*);

(ii) ∫ *p*_{Y,X|Z}(*y*, *x* | *z*; *θ*, *η*, *ψ*) *dy* = *p*_{X}(*x*) for all *z*.

The joint density involves an additional, possibly infinite-dimensional nuisance parameter *ψ*, needed to include in the class all joint densities satisfying (i) and (ii). Here, *p*_{X}(*x*) is any arbitrary marginal density for the covariates, and (ii) follows because *Z* ⫫ *X*. In Web Appendix A, we demonstrate that a rich class of joint distributions for (*Y*, *X*, *Z*) may be identified such that *X* is correlated with *Y* and *Z* ⫫ *X* [condition (ii)] that also satisfy condition (i). Because the joint density involves both finite (parametric) and infinite-dimensional components, it represents a semiparametric statistical model (see Tsiatis, 2006, Section 1.2).
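Both features of this class, that *X* may be strongly correlated with *Y* while its distribution is unaffected by *Z*, can be checked empirically in a simulation. The generating model below is an illustrative assumption, not one drawn from the text.

```python
import numpy as np

rng = np.random.default_rng(4)

# Draw X first, then Z independently of X, then Y given (X, Z): a joint
# distribution automatically satisfying constraints (i) and (ii).
n = 100_000
X = rng.normal(size=n)
Z = rng.choice([1, 2], size=n, p=[0.5, 0.5])
Y = 0.4 * (Z == 2) + 1.0 * X + rng.normal(size=n)

# X is correlated with Y ...
print("corr(X, Y):", round(np.corrcoef(X, Y)[0, 1], 2))

# ... while, per condition (ii), the distribution of X is the same within each
# arm; here we compare the first two sample moments across arms.
for g in (1, 2):
    print(g, round(X[Z == g].mean(), 3), round(X[Z == g].var(), 3))
```

The within-arm moments of *X* agree up to sampling error, which is exactly what condition (ii) requires, while the nonzero correlation between *X* and *Y* is what makes covariate adjustment worthwhile.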