

HHS Public Access; Author Manuscript; Accepted for publication in peer-reviewed journal.
Biometrics. Author manuscript; available in PMC 2014 February 2.
Published in final edited form as:
PMCID: PMC3909656

A broad symmetry criterion for nonparametric validity of parametrically-based tests in randomized trials


Pilot phases of a randomized clinical trial often suggest that a parametric model may be an accurate description of the trial's longitudinal trajectories. However, parametric models are often not used for fear that they may invalidate tests of null hypotheses of equality between the experimental groups. Existing work has shown that, for some types of data, when certain parametric models are used, the validity of testing the null is preserved even if the parametric models are incorrect. Here, we provide a broader and easier-to-check characterization of parametric models that can be used to (a) preserve nonparametric validity of testing the null hypothesis, i.e., even when the models are incorrect, and (b) increase power compared to the non- or semiparametric bounds when the models are close to correct. We demonstrate our results in a clinical trial of depression in Alzheimer's patients.

Keywords: Causal inference, Hypothesis test, Randomized clinical trial, Robustness, Superefficiency

1. Introduction

When analyzing data from randomized clinical trials, investigators often have information about the relative appropriateness of certain parametric models from pilot phases or the existing literature. More specifically, suppose one is interested in assessing whether there is a difference in average trajectories between a treatment arm and a control arm. If previous observations have shown such trajectories to be curvilinear over time, then a parametric model could approximate the actual underlying trajectories well.

For example, in studies of pharmaceutical treatment for patients with depression and Alzheimer's disease (DIADS; Lyketsos et al., 2003), depressive symptoms are studied longitudinally after initiation of an antidepressant treatment regimen or placebo. It has been observed that such treatments generally result in an initial improvement in symptoms that reaches a plateau within a few weeks (e.g., Mulsant et al., 2001). This curvilinear shape indicates that a parametric model of quadratic curves for the mean outcome over time would be close to the actual trajectories. This, in turn, suggests that a test comparing a treatment and a control arm based on such parametric models might have higher power than a nonparametric test.

Unfortunately, researchers tend not to use parametric models when analyzing data from such trials. This is understandably a result of hesitation about the validity of parametric tests when these models are misspecified. More specifically, the behavior of the type I error of hypothesis tests for RCTs based on misspecified parametric models had not been carefully studied until recently. Early work by Gail, Tan, and Piantadosi (1988) examined test validity using a special case of misspecified generalized linear models (see Discussion). For linear models, Robins (2004) has examined the behavior of hypothesis testing based on misspecified models in this context. Rosenblum and van der Laan (2009) have shed further light on this problem by showing that there exist classes of possibly misspecified models that still lead to valid tests. These results, however, have been specific to testing for differences in means in particular subclasses of generalized linear models.

We derive a criterion that characterizes a broader class of parametric models from which nonparametrically robust hypothesis tests can be obtained. For example, we show in Section 4 that a large class of longitudinal parametric models can also be used to construct nonparametrically valid tests. Furthermore, the proposed criterion is easy to verify because it has a geometric symmetry interpretation. This is important because these parametric model-based tests (a) preserve nonparametric validity of testing the null hypothesis, i.e., even when the models are incorrect, and (b) increase power compared to the non- or semiparametric bounds when the models are close to correct (see Section 6). In the next section, we present the setting and notation used in the remainder of the work. In Section 3 we give our main characterization result. In Section 4 we show that the classes characterized in Rosenblum and van der Laan (2009) are a subset of the class characterized by the more general symmetry criterion. In Section 5 we give an application to the DIADS trial, and we conclude with a discussion.

2. Scientific setting and goal

We consider a randomized clinical trial (RCT) that compares an outcome Y between two treatments, a = 0, 1. Specifically, for each of i = 1, ..., n patients, we measure the assigned treatment A and the outcome Y. We also allow that pre-treatment covariate information X is measured; X is not used for randomization but can be used for analysis. We wish to generalize inferences to a reference population of which the n patients can be assumed to be a representative random sample.

We denote by $p_a^{\text{true}}(y; x)$ the true conditional distribution $pr(Y = y \mid A = a, X = x)$ for randomized arms a = 0, 1. We wish to test the null hypothesis

$$H_0: p_0^{\text{true}}(y; x) = p_1^{\text{true}}(y; x) \qquad (1)$$

as functions of y, x. More specifically, our goal is to test $H_0$ with tests that are developed based on parametric models but are nonparametrically valid. This will be useful when pilot phases of a clinical trial have suggested that a parametric model may be an accurate description of the trial's data, such as the shapes of longitudinal trajectories.

Typically, we would represent a parametric model for the RCT with covariates by a collection of distributions $\{P(Y = y \mid A = a, X = x, \theta) : \theta \in \Theta\}$ over a parameter space $\Theta$. Here, however, a different representation gives further intuition for our results. Every parameter value $\theta \in \Theta$ gives rise simultaneously to one distribution for the arm A = 0 and another for A = 1, namely, to the vector of distributions $(P(Y = y \mid A = 0, X = x, \theta), P(Y = y \mid A = 1, X = x, \theta))$, which we denote by $(p_0(y; x; \theta), p_1(y; x; \theta))$. As the parameter $\theta$ varies over $\Theta$, we therefore represent an arbitrary parametric model by the set of vectors

$$S := \{(p_0(y; x; \theta), p_1(y; x; \theta)) : \theta \in \Theta\},$$

or, more briefly, by $\{(p_0, p_1)\}$, where we have omitted the arguments y, x, $\theta$. In words, S is the set of vectors of distributions for the two arms of the RCT that are generated by a parameter value. We allow that the model S may be incorrect, in the sense that S may not contain $(p_0^{\text{true}}, p_1^{\text{true}})$.

3. A symmetry criterion for non-parametric validity of parametric tests

Rosenblum and van der Laan (2009) consider regression models under the setting described above and show that a class of models with a particular form induces valid hypothesis tests, regardless of whether the specified model is correct. We claim that this property holds for a more general class of models characterized by the following criterion:

Criterion 1: If $(p_0, p_1)$ is a pair of distributions for treatment arms a = 0 and a = 1 in model S, then the possibly incorrect null pairs

$$(p_0, p_0) \quad \text{and} \quad (p_1, p_1)$$

must also be members of model S.

In the longer, parameter-based notation, Criterion 1 reads as follows. For a given value of $\theta$, which defines $(p_0(y; x; \theta), p_1(y; x; \theta))$ as an allowed pair of distributions for the treatment arms a = 0 and a = 1 in the model S, there exist two parameter values in $\Theta$, say $\theta_0^*(\theta)$ and $\theta_1^*(\theta)$, for which: the null pair $(p_0(y; x; \theta), p_0(y; x; \theta))$ can be written as $(p_0(y; x; \theta_0^*(\theta)), p_1(y; x; \theta_0^*(\theta)))$, and so belongs to S with parameter value $\theta_0^*(\theta)$; and the null pair $(p_1(y; x; \theta), p_1(y; x; \theta))$ can be written as $(p_0(y; x; \theta_1^*(\theta)), p_1(y; x; \theta_1^*(\theta)))$, and so belongs to S with parameter value $\theta_1^*(\theta)$.

More intuitively, Criterion 1 can be depicted visually using the Kullback-Leibler (KL) distance (the negative of the KL information; Kullback and Leibler, 1951), as in Figures 1(a) and 1(b). The axes in these plots are the component-wise KL distances from the true null distribution in each arm, which is convenient for emphasizing the symmetric nature of the criterion. In Figure 1(a), Criterion 1 is satisfied, but in Figure 1(b) it fails to hold. In simpler language, this criterion requires that if the model allows a distribution p for one of the arms, then it must allow that the null hypothesis (p, p) may be true. This criterion is reasonable: when the goal is to compare treatment arms, it would be difficult to justify a model that does not allow for such a null hypothesis.

Figure 1
Depiction of the two scenarios in which Criterion 1 is satisfied (left) and not satisfied (right) by the class of models S.

Under the regularity condition that $\pi_0 E\{|\log p_0(Y_i; X_i; \theta)| \mid A_i = 0\} + \pi_1 E\{|\log p_1(Y_i; X_i; \theta)| \mid A_i = 1\} < \infty$ for all $\theta$, where $\pi_a = P(A_i = a)$, we have the following result.

Result 1: If Criterion 1 is satisfied, then under the null hypothesis (1), a null pair of distributions $(p^*, p^*) \in S$ maximizes the limit of the log-likelihood function.

If, in addition, conditions A1-A6 of White (1982) hold, then $(p^*, p^*)$ is the unique maximizer of the limiting log-likelihood, and the MLE of the contrast between $p_0$ and $p_1$ is asymptotically normal with mean 0. The result is proved in Appendix A. In what follows, we assume the above regularity conditions.

The above result is important because, although the researcher does not control whether the parametric model S is correct, the researcher fully controls, and can select, S to satisfy Criterion 1. Criterion 1 then ensures that under the true null $H_0$, any contrast (e.g., a difference in means or medians) between the maximum likelihood estimates, say $(\hat p_0, \hat p_1)$, is asymptotically also null. The uncertainty of this contrast between $\hat p_0$ and $\hat p_1$ should be estimated robustly, for example, using a bootstrap (see Section 5).
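A small simulation can make Result 1 concrete. The sketch below is our own illustration, not an analysis from the paper, and its function names are hypothetical. Under $H_0$, both arms are drawn from the same Exponential(1) distribution, so any normal working model is misspecified. The first working model, with unrestricted arm-specific means and a common variance, satisfies Criterion 1; the second forces the arm-1 mean to be twice the arm-0 mean, so its null pairs (N(μ, σ²), N(μ, σ²)) belong to S only when μ = 0, and Criterion 1 fails. For both models the normal MLEs of the means do not depend on σ², so only the means are fitted.

```python
import random


def simulate_null_arms(n_per_arm, seed=0):
    """Draw both arms from the same Exponential(1) law: H0 holds, but any
    normal working model is misspecified."""
    rng = random.Random(seed)
    y0 = [rng.expovariate(1.0) for _ in range(n_per_arm)]
    y1 = [rng.expovariate(1.0) for _ in range(n_per_arm)]
    return y0, y1


def contrast_symmetric(y0, y1):
    """Model {(N(mu0, s2), N(mu1, s2))} with mu0, mu1 unrestricted (satisfies Criterion 1).
    The normal MLEs of the means are the arm averages; return mu1_hat - mu0_hat."""
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def contrast_asymmetric(y0, y1):
    """Hypothetical model {(N(mu, s2), N(2 mu, s2))} (violates Criterion 1).
    Minimizing sum_0 (y - mu)^2 + sum_1 (y - 2 mu)^2 gives
    mu_hat = (S0 + 2 S1) / (n0 + 4 n1); the contrast 2 mu_hat - mu_hat equals mu_hat."""
    return (sum(y0) + 2.0 * sum(y1)) / (len(y0) + 4.0 * len(y1))


y0, y1 = simulate_null_arms(200_000)
print(f"symmetric model contrast:  {contrast_symmetric(y0, y1):+.3f}")   # near 0
print(f"asymmetric model contrast: {contrast_asymmetric(y0, y1):+.3f}")  # near 0.6
```

With equal arm sizes and true mean 1 in both arms, the asymmetric model's contrast converges to (1 + 2)/(1 + 4) = 0.6 instead of 0, so a test of a zero contrast built on it would reject the true null with probability tending to 1 as n grows.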

4. Relation with established literature

Result 1 of the last section generalizes the classes of models, identified by Rosenblum and van der Laan (2009) and by Gail, Tan, and Piantadosi (1988) (the latter a special case of the former; see Discussion), that can be used as a basis for a valid test. Specifically, Rosenblum and van der Laan (2009) considered the null hypothesis to be on the mean regressions in each arm,

$$H_0: \mu_0^{\text{true}}(x) = \mu_1^{\text{true}}(x) \quad \text{for all } x, \qquad (4)$$

where $\mu_a^{\text{true}}(x) = E(Y \mid X = x, A = a)$, and showed that tests based on a working generalized linear model for Y are robust to that model being incorrect. We can now see that their result follows from geometric symmetry arguments similar to those behind Criterion 1 and Result 1. To see this, define $\mu_a(x, \beta)$ to be the model's mean regression $E(Y \mid X = x, A = a)$, and define $S_{\text{means}} = \{(\mu_0(\cdot, \beta), \mu_1(\cdot, \beta))\}$ to be the set of mean functions allowed by the model simultaneously for the two treatment arms. Consider now the following symmetry criterion, analogous to Criterion 1.

Criterion 2: For a given value of $\beta$, which defines $(\mu_0(\cdot, \beta), \mu_1(\cdot, \beta))$ as an allowed pair of means for the treatment arms a = 0 and a = 1 in the mean model $S_{\text{means}}$, the criterion requires that the null pairs

$$(\mu_0(\cdot, \beta), \mu_0(\cdot, \beta)) \quad \text{and} \quad (\mu_1(\cdot, \beta), \mu_1(\cdot, \beta)) \qquad (5)$$

also be members of $S_{\text{means}}$.

For the null pairs in (5) to be in the model $S_{\text{means}}$, we mean that, for any given $\beta$, the left null pair of (5) can be rewritten as $(\mu_0(\cdot, \beta_0^*), \mu_1(\cdot, \beta_0^*))$ for some parameter value $\beta_0^*(\beta)$, and the right null pair of (5) can be rewritten as $(\mu_0(\cdot, \beta_1^*), \mu_1(\cdot, \beta_1^*))$ for some parameter value $\beta_1^*(\beta)$.

Result 2: If Criterion 2 is satisfied, we have that under the null hypothesis (4), the limit of the log-likelihood function is maximized at a parameter β for which the pair (μ0(·, β), μ1(·, β)) has a null contrast, i.e., μ0(·, β) = μ1(·, β).

In the Web Appendix, we prove Result 2 and also show that the generalized linear models described in Rosenblum and van der Laan (2009) satisfy Criterion 2. Criterion 2 is similar to Criterion 1 in its statement and function; the difference is in the null hypotheses ((4) and (1), respectively). Criterion 2 requires symmetry in the mean structures allowed in the model but is limited to generalized linear models, whereas Criterion 1 requires symmetry with respect to the distributions and is applicable to any parametric model. To show the generality of Criterion 1, we continue with two examples that demonstrate how easily its conditions can be checked.

For a first example, consider the simple normal linear regression with homoscedastic variance $\sigma^2$ and mean $E(Y \mid X = x, A = a)$ modeled as

$$\mu_a(x, \beta) = \beta_0 + \beta_X x + \beta_A a,$$

where $\beta = (\beta_0, \beta_X, \beta_A)$ are unrestricted. To evaluate Criterion 1, note that for a given value of $\beta$ the pair of expectations for the two treatments, $[\mu_0(x, \beta), \mu_1(x, \beta)]$, is

$$[\beta_0 + \beta_X x,\ \beta_0 + \beta_A + \beta_X x].$$

Therefore, Criterion 1 is easily seen to hold, because the null pairs of distributions with means

$$[\beta_0 + \beta_X x,\ \beta_0 + \beta_X x] \quad \text{and} \quad [\beta_0 + \beta_A + \beta_X x,\ \beta_0 + \beta_A + \beta_X x]$$

and with the same $\sigma^2$ are also allowed in S: the first pair is the null model that sets the coefficient of A to 0 and keeps the intercept $\beta_0$; the second pair can be rewritten as $[(\beta_0 + \beta_A) + \beta_X x,\ (\beta_0 + \beta_A) + \beta_X x]$, which is also obtained in S by setting the coefficient of A to 0 and absorbing $\beta_A$ into a new intercept, $\beta_0 + \beta_A$. This simple example is also a member of the classes of robust models described by Rosenblum and van der Laan (2009).
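The membership check for this example can be written out mechanically. In the sketch below (illustrative only; `theta0_star` and `theta1_star` are our hypothetical names for the maps $\theta_0^*(\theta)$ and $\theta_1^*(\theta)$ of Criterion 1, and the parameter values are arbitrary), each normal distribution is identified by its mean because $\sigma^2$ is shared and left unchanged, so it suffices to check that the reparameterized means reproduce both null pairs on a grid of x values.

```python
import math
from typing import Iterable, Tuple

Beta = Tuple[float, float, float]  # (beta0, betaX, betaA); sigma^2 is shared and unchanged


def mean_pair(beta: Beta, x: float) -> Tuple[float, float]:
    """Arm means (mu_0(x, beta), mu_1(x, beta)) under beta0 + betaX * x + betaA * a."""
    b0, bx, ba = beta
    return b0 + bx * x, b0 + ba + bx * x


def theta0_star(beta: Beta) -> Beta:
    """Parameter giving the null pair (mu_0, mu_0): set the coefficient of A to 0."""
    b0, bx, _ = beta
    return (b0, bx, 0.0)


def theta1_star(beta: Beta) -> Beta:
    """Parameter giving the null pair (mu_1, mu_1): absorb betaA into the intercept."""
    b0, bx, ba = beta
    return (b0 + ba, bx, 0.0)


def criterion1_holds_at(beta: Beta, xs: Iterable[float]) -> bool:
    """Check on a grid of x that both null pairs are reproduced by members of S."""
    for x in xs:
        mu0, mu1 = mean_pair(beta, x)
        null0 = mean_pair(theta0_star(beta), x)
        null1 = mean_pair(theta1_star(beta), x)
        if not all(math.isclose(u, v) for u, v in zip(null0 + null1, (mu0, mu0, mu1, mu1))):
            return False
    return True


print(criterion1_holds_at((1.5, -0.3, 2.0), [i / 10 for i in range(-20, 21)]))
```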

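As a second example, consider a study measuring the outcome Y longitudinally, say at times t = 0, ..., T, yielding values $Y_t$. For such an outcome, consider a multivariate normal model with means $E(Y_t \mid A = a)$ modeled as

$$E(Y_t \mid A = a) = \beta_{0,a} + \beta_{1,a}\, t + \beta_{2,a}\, t^2, \qquad (9)$$

and unknown variance-covariance matrices $\text{var}(Y \mid A = a) = \Sigma_a$, where the $\Sigma_a$ are positive definite and the parameters $\beta_a = (\beta_{0,a}, \beta_{1,a}, \beta_{2,a})$ for a = 0, 1 are unrestricted. Robustness properties of such a longitudinal model are not considered by Rosenblum and van der Laan (2009), yet we can now clearly see that this model also satisfies Criterion 1. Specifically, for any given values of $\beta_{a=0}$ and $\beta_{a=1}$, the pair of expectations for the two treatments, $(\mu_0(t, \beta), \mu_1(t, \beta))$, is

$$\big(\beta_{a=0} \cdot (1, t, t^2),\ \beta_{a=1} \cdot (1, t, t^2)\big).$$

Therefore, the null pairs $[\mu_0(t, \beta), \mu_0(t, \beta)]$ and $[\mu_1(t, \beta), \mu_1(t, \beta)]$ are

$$\big(\beta_{a=0} \cdot (1, t, t^2),\ \beta_{a=0} \cdot (1, t, t^2)\big) \quad \text{and} \quad \big(\beta_{a=1} \cdot (1, t, t^2),\ \beta_{a=1} \cdot (1, t, t^2)\big).$$

Because the parameters $\beta_{a=0}$, $\beta_{a=1}$, $\Sigma_{a=0}$, $\Sigma_{a=1}$ are unrestricted, the corresponding null pairs of distributions are also in the model (for the first pair, set $\beta_{a=1} = \beta_{a=0}$ and $\Sigma_{a=1} = \Sigma_{a=0}$; for the second, set $\beta_{a=0} = \beta_{a=1}$ and $\Sigma_{a=0} = \Sigma_{a=1}$), so Criterion 1 is satisfied.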
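The same bookkeeping applies to the longitudinal model (9), where each arm's distribution is a multivariate normal with mean vector $\beta_a \cdot (1, t, t^2)$ over the visit times and covariance $\Sigma_a$. The sketch below is illustrative only: the parameter layout, the helper names (`arm_distribution`, `theta0_star`, `theta1_star`, `diag`) and the numerical values of $\theta$ are our assumptions, chosen to resemble a 12-week visit schedule.

```python
from typing import List, Sequence, Tuple

Matrix = Tuple[Tuple[float, ...], ...]
# theta = (beta_{a=0}, beta_{a=1}, Sigma_{a=0}, Sigma_{a=1}); each beta is (b0, b1, b2)
Theta = Tuple[Tuple[float, float, float], Tuple[float, float, float], Matrix, Matrix]


def arm_distribution(theta: Theta, a: int, times: Sequence[float]):
    """Mean vector and covariance of (Y_0, ..., Y_T) in arm a under model (9)."""
    b0, b1, b2 = theta[a]
    means = tuple(b0 + b1 * t + b2 * t * t for t in times)
    return means, theta[2 + a]


def theta0_star(theta: Theta) -> Theta:
    """Copy arm 0's mean coefficients and covariance into arm 1: the pair (p0, p0)."""
    beta0, _, sigma0, _ = theta
    return (beta0, beta0, sigma0, sigma0)


def theta1_star(theta: Theta) -> Theta:
    """Copy arm 1's mean coefficients and covariance into arm 0: the pair (p1, p1)."""
    _, beta1, _, sigma1 = theta
    return (beta1, beta1, sigma1, sigma1)


def criterion1_holds_at(theta: Theta, times: Sequence[float]) -> bool:
    """Both null pairs of multivariate normals are generated by parameters in S."""
    p0 = arm_distribution(theta, 0, times)
    p1 = arm_distribution(theta, 1, times)
    null0 = tuple(arm_distribution(theta0_star(theta), a, times) for a in (0, 1))
    null1 = tuple(arm_distribution(theta1_star(theta), a, times) for a in (0, 1))
    return null0 == (p0, p0) and null1 == (p1, p1)


def diag(values: List[float]) -> Matrix:
    """Diagonal covariance matrix (positive definite when all values are > 0)."""
    n = len(values)
    return tuple(tuple(values[i] if i == j else 0.0 for j in range(n)) for i in range(n))


times = [0.0, 3.0, 6.0, 9.0, 12.0]  # weeks
theta = ((12.0, -0.9, 0.04), (12.0, -1.5, 0.08), diag([4.0] * 5), diag([5.0] * 5))
print(criterion1_holds_at(theta, times))
```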

5. Example: the DIADS Trial

5.1 Plans for evaluation

Although major depression is a significant cause of morbidity in patients with Alzheimer's disease (AD), reports concerning the treatment of this condition are conflicting. Forty-four community-dwelling older adults who were diagnosed with probable AD and had experienced a major depressive episode were randomized to sertraline (A = 1) or placebo (A = 0) in the Depression in Alzheimer's Disease Study (DIADS). Details on inclusion and exclusion criteria, along with a more detailed description of the trial, are available in Lyketsos et al. (2003).

To assess the effect of sertraline on depression, we consider the Cornell Scale for Depression in Dementia (CSDD; Alexopoulos et al., 1988), which was measured at baseline (t = 0) and at t = 3, 6, 9, and 12 weeks after enrollment. The observed data are depicted in the left panel of Figure 2, where the thicker lines denote the observed means in each treatment arm.

Figure 2
The observed CSDD measurements (black for placebo; grey for sertraline arm) in the DIADS trial and the fitted means (dotted curves) from the nonparametric (MANCOVA, left) and parametric (quadratic, right) models.

We test the null hypothesis $H_0$ of (1) against the alternative hypothesis that the distributions differ, using two models. For both models, we estimate a common quantity, the difference in means between treatment and placebo at each time past baseline, $\delta_t = E(Y_t \mid A = 1) - E(Y_t \mid A = 0)$. We assess the hypothesis that $\delta_t = 0$ for all t > 0, which is true under $H_0$.

The first model is the nonparametric version of the MANCOVA in which we represent the mean Cornell scores Y at time t as:

$$E(Y_t \mid A = a) = \mu_a(t)$$

and unknown variance-covariance matrices $\text{var}(Y \mid A = a) = \Sigma_a$, where the $\Sigma_a$ are positive definite and the parameters $\mu_a(t)$ for a = 0, 1 and all t are unrestricted. From this model, we test for a treatment effect (after baseline) by (i) obtaining the nonparametric maximum likelihood estimators $\hat\delta_t^{\text{nonpar}}$ of $\delta_t$ for t > 0, which are simply the differences in average Cornell scores between treatment and placebo at each time; and (ii) using the Wald test statistic $W_{\text{nonpar}} = (\hat\delta^{\text{nonpar}})^\top S^{-1} \hat\delta^{\text{nonpar}}$, where S is the estimated variance-covariance matrix of $\hat\delta^{\text{nonpar}}$. We obtained S by a bootstrap of the subjects under the null hypothesis.
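A minimal sketch of the nonparametric Wald test follows, run on synthetic data (not the DIADS data). Two details are our assumptions rather than specifications from the paper: the "bootstrap of the subjects under the null hypothesis" is implemented by resampling subjects from the pooled sample and splitting them into arms of the observed sizes, and the resulting statistic could be referred to a χ² distribution with 4 degrees of freedom (one per post-baseline visit), which the code does not compute. The function names are hypothetical.

```python
import numpy as np


def delta_nonpar(y: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Arm-mean differences (treated minus placebo) at each post-baseline visit.
    y has one row per subject and one column per visit (column 0 = baseline)."""
    return y[a == 1, 1:].mean(axis=0) - y[a == 0, 1:].mean(axis=0)


def null_bootstrap_cov(y, a, stat, n_boot=500, seed=0):
    """Bootstrap covariance of stat(y, a) under H0. Assumption: subjects are drawn with
    replacement from the pooled sample and split into arms of the observed sizes."""
    rng = np.random.default_rng(seed)
    n, n1 = len(a), int(np.sum(a == 1))
    a_star = np.r_[np.ones(n1, dtype=int), np.zeros(n - n1, dtype=int)]
    draws = [stat(y[rng.integers(0, n, size=n)], a_star) for _ in range(n_boot)]
    return np.cov(np.asarray(draws), rowvar=False)


def wald(estimate: np.ndarray, cov: np.ndarray) -> float:
    """Wald statistic estimate' cov^{-1} estimate (solve avoids an explicit inverse)."""
    return float(estimate @ np.linalg.solve(cov, estimate))


# Synthetic illustration only (not the DIADS data): 22 subjects per arm, 5 visits.
rng = np.random.default_rng(1)
weeks = np.array([0, 3, 6, 9, 12])
a = np.r_[np.ones(22, dtype=int), np.zeros(22, dtype=int)]
y = 12 - 0.3 * weeks * (1 + a[:, None]) + rng.normal(0, 3, size=(44, 5))
d = delta_nonpar(y, a)
W_nonpar = wald(d, null_bootstrap_cov(y, a, delta_nonpar))
print(d.round(2), round(W_nonpar, 2))
```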

Prior to DIADS, pilot studies had already suggested that mean Cornell scores on sertraline show an initial benefit that then reaches a plateau (e.g., Mulsant et al., 2001). This suggests that the simple model (9), which allows a quadratic trajectory in time for the mean in each arm, could parsimoniously represent the DIADS trajectories over the 12-week time frame. Moreover, because model (9) satisfies Criterion 1, we know that under the nonparametric $H_0$ of (1), the limits of the MLEs of $\beta_{a=0}$ and $\beta_{a=1}$ are the same fixed vector, say $\beta^*$. Thus, under $H_0$, the MLE of the difference, $\hat\delta^{\text{param}} := \hat\beta_{a=1} - \hat\beta_{a=0}$, has a probability limit of 0 even if the model is misspecified. From this model, then, we test for a treatment effect on the means (after baseline) using the Wald test statistic $W_{\text{param}} = (\hat\delta^{\text{param}})^\top V^{-1} \hat\delta^{\text{param}}$, where V is the estimated variance-covariance matrix of $\hat\delta^{\text{param}}$. Here too, V is obtained by bootstrap, as for the nonparametric test.
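A corresponding sketch for the parametric test follows, again on synthetic data and with hypothetical function names. It makes a simplifying assumption: instead of the full multivariate-normal MLE of model (9) with unrestricted $\Sigma_a$ (which would weight the fit by $\Sigma_a^{-1}$), it fits each arm's quadratic by working-independence least squares. Because $\beta_{a=0}$ and $\beta_{a=1}$ are unrestricted, the limiting least-squares objective has the same two-arm additive form as the limiting log-likelihood in the Appendix, so the symmetry argument of Result 1 should carry over; this is our reasoning, in the spirit of the estimating-equation remark in the Discussion, not a result stated in the paper. The null bootstrap resamples subjects from the pooled sample, which is likewise our reading of "bootstrap under the null hypothesis".

```python
import numpy as np

WEEKS = np.array([0.0, 3.0, 6.0, 9.0, 12.0])
DESIGN = np.column_stack([np.ones_like(WEEKS), WEEKS, WEEKS ** 2])  # rows (1, t, t^2)


def beta_hat(y_arm: np.ndarray) -> np.ndarray:
    """Working-independence least squares for model (9) in one arm. With complete
    data this equals regressing the arm's mean trajectory on (1, t, t^2)."""
    coef, *_ = np.linalg.lstsq(DESIGN, y_arm.mean(axis=0), rcond=None)
    return coef


def delta_param(y: np.ndarray, a: np.ndarray) -> np.ndarray:
    """delta_param = beta_hat(a = 1) - beta_hat(a = 0), a 3-vector."""
    return beta_hat(y[a == 1]) - beta_hat(y[a == 0])


def null_bootstrap_cov(y, a, stat, n_boot=500, seed=0):
    """Bootstrap covariance of stat(y, a) under H0: resample subjects from the pooled
    sample and split them into arms of the observed sizes (an assumption)."""
    rng = np.random.default_rng(seed)
    n, n1 = len(a), int(np.sum(a == 1))
    a_star = np.r_[np.ones(n1, dtype=int), np.zeros(n - n1, dtype=int)]
    draws = [stat(y[rng.integers(0, n, size=n)], a_star) for _ in range(n_boot)]
    return np.cov(np.asarray(draws), rowvar=False)


# Synthetic data only (not DIADS): the treated arm improves faster, then levels off.
rng = np.random.default_rng(1)
a = np.r_[np.ones(22, dtype=int), np.zeros(22, dtype=int)]
mean = 12 - (0.5 + 0.5 * a[:, None]) * WEEKS + (0.02 + 0.03 * a[:, None]) * WEEKS ** 2
y = mean + rng.normal(0, 3, size=(44, 5))
d = delta_param(y, a)
V = null_bootstrap_cov(y, a, delta_param)
W_param = float(d @ np.linalg.solve(V, d))
print(d.round(3), round(W_param, 2))
```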

From the theoretical part of the paper, we know that because this parametrically derived test satisfies Criterion 1, it should be nonparametrically valid under the null (1). It should also have better power than the nonparametric test to detect alternatives of diminishing drug benefit that are well described by the trajectory (9). We evaluated these two properties in the motivating DIADS study.

5.2 Evaluation

First, to check that the tests are valid in data like those in DIADS, we estimated the type I error of the above two tests by simulating 1,000 trials in which both the placebo and the sertraline arms were sampled from the observed placebo arm only. This creates studies of the same size as ours and enforces the null hypothesis, with both arms distributed as the observed placebo arm, which need not satisfy the parametric model (9). In this realistic example, the empirical type I error was 5% for both $W_{\text{nonpar}}$ and $W_{\text{param}}$.

Next, both models were fitted to the DIADS data, and the fitted means are depicted as thick dashed lines in Figure 2. Estimates of the variance-covariance matrices were obtained from 500 bootstrap samples. The p-values for a treatment effect were 0.10 for the nonparametrically derived test $W_{\text{nonpar}}$ and 0.04 for the robust parametrically derived test $W_{\text{param}}$.

Finally, we compared the two tests in terms of power to detect the empirical effects seen in the study. Specifically, in order to assess power, a bootstrap within arms was used to resample 1000 datasets with the same number of individuals in each of the treatment arms as the observed DIADS trial. For each of these resampled datasets, the MANCOVA and quadratic models were fit and standard errors were estimated (via a further bootstrap of the resampled individuals). The power was then calculated as the proportion of times each model rejected the null hypothesis of no treatment effect. These simulations estimated the power to be 61% for the nonparametrically derived test Wnonpar and 69% for the robust parametrically derived test Wparam.
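The two resampling schemes used above differ only in where the treated arm is drawn from: for type I error, both arms are resampled from the observed placebo arm; for power, each arm is resampled from itself. The sketch below shows that structure on synthetic data. To keep it short and self-contained, it plugs in a stand-in test (a two-sided Welch z-test on the last visit, hypothetically named `welch_z_last_visit`) rather than $W_{\text{nonpar}}$ or $W_{\text{param}}$, and it omits the nested bootstrap the paper used for standard errors; any function returning True on rejection can be passed instead.

```python
import math
import random
from typing import Callable, List, Sequence

Subject = Sequence[float]  # one subject's scores at weeks 0, 3, 6, 9, 12


def welch_z_last_visit(arm0: List[Subject], arm1: List[Subject], alpha: float = 0.05) -> bool:
    """Stand-in test (NOT the paper's Wald tests): two-sided Welch z-test on the last visit."""
    def mean_var(xs):
        m = sum(xs) / len(xs)
        return m, sum((v - m) ** 2 for v in xs) / (len(xs) - 1)

    m0, v0 = mean_var([s[-1] for s in arm0])
    m1, v1 = mean_var([s[-1] for s in arm1])
    z = (m1 - m0) / math.sqrt(v0 / len(arm0) + v1 / len(arm1))
    return math.erfc(abs(z) / math.sqrt(2)) < alpha  # two-sided normal p-value


def rejection_rate(placebo, treated, test: Callable, n_sims=1000, under_null=True, seed=0) -> float:
    """under_null=True: both arms resampled from the placebo arm (type I error).
    under_null=False: each arm resampled within itself (power at the observed effect)."""
    rng = random.Random(seed)
    source1 = placebo if under_null else treated
    rejections = 0
    for _ in range(n_sims):
        arm0 = rng.choices(placebo, k=len(placebo))
        arm1 = rng.choices(source1, k=len(treated))
        rejections += test(arm0, arm1)
    return rejections / n_sims


# Synthetic arms (not DIADS): treated scores fall further by week 12.
gen = random.Random(42)
placebo = [[gen.gauss(m, 2.0) for m in (12, 11, 10.5, 10.2, 10)] for _ in range(22)]
treated = [[gen.gauss(m, 2.0) for m in (12, 9, 7.5, 6.5, 5)] for _ in range(22)]
print(rejection_rate(placebo, treated, welch_z_last_visit, n_sims=500))                    # type I error
print(rejection_rate(placebo, treated, welch_z_last_visit, n_sims=500, under_null=False))  # power
```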

The power of both tests converges to 1 with increasing effect size and increasing sample size. The effect size at the end of this study was relatively large (67%). Thus, we expect the relative gain in power of the parametric test over the nonparametric one to be larger for smaller effect sizes and smaller for larger sample sizes. A more comprehensive study of power is of interest for further work.

6. Discussion

We have demonstrated that, for testing the null hypothesis of equality between treatment arms, a wide class of parametric models provides tests with nonparametric validity. We provided a simple symmetry characterization of such classes, which gives investigators an easy way to harness the efficiency of parametric models while maintaining robustness properties traditionally considered reserved for nonparametric methods.

The work of Gail, Tan and Piantadosi (1988) also considered testing hypotheses using a class of misspecified generalized linear models. They calculated a score statistic from the residuals of a model fit omitting the treatment effect, and demonstrated that standardizing this statistic with a variance based on the misspecified model yields Type I error rates above the nominal level. The incorrect Type I error rate of the score test normalized using the model-based variance was thus a consequence of the model misspecification. Indeed, the validity of the test was confirmed when robust variance estimation based on permutation of residuals was employed. Although our tests differ from those of Gail, Tan and Piantadosi, their models are a special case of the models considered by Rosenblum and van der Laan (2009), and hence of those satisfying our Criterion 1. In this sense, the new result extends the class of possibly misspecified models that can be used to derive nonparametrically valid tests.

One can also use permutation tests with the general models satisfying Criterion 1, now with any estimated contrast, say ĉ, between the two arms' distributions p0 and p1. Specifically, one can find the permutation distribution of ĉ by computing ĉ for a large number of permutations of the treatment labels, and then compare the observed value of the estimated contrast with that reference distribution. If there is a true effect, though, this mixing of the two arms' data may inflate the variance of the reference distribution of the estimated contrast, leading to low power; this is the tradeoff for using exact tests.
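The permutation test just described can be sketched for an arbitrary contrast ĉ. The code below is illustrative: it uses a difference in medians as the example contrast, a Monte Carlo sample of label permutations rather than all of them, a two-sided comparison of absolute values (appropriate when ĉ is centered at 0 under the null), and the common add-one convention for the p-value; these choices and the function names are our assumptions.

```python
import random
import statistics
from typing import Callable, List, Sequence

Contrast = Callable[[Sequence[float], Sequence[int]], float]


def median_difference(y: Sequence[float], a: Sequence[int]) -> float:
    """Example contrast c-hat: median in arm 1 minus median in arm 0."""
    arm1 = [v for v, g in zip(y, a) if g == 1]
    arm0 = [v for v, g in zip(y, a) if g == 0]
    return statistics.median(arm1) - statistics.median(arm0)


def permutation_pvalue(y: Sequence[float], a: Sequence[int], contrast: Contrast,
                       n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided Monte Carlo permutation p-value: shuffle the treatment labels,
    recompute the contrast, and count values at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(contrast(y, a))
    labels: List[int] = list(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        extreme += abs(contrast(y, labels)) >= observed
    return (extreme + 1) / (n_perm + 1)  # add-one so the p-value is never exactly 0


y_sep = [float(v) for v in range(20)] + [float(v) for v in range(100, 120)]
a_sep = [0] * 20 + [1] * 20
y_null = [float(i % 7) for i in range(40)]
a_null = [i % 2 for i in range(40)]
print(permutation_pvalue(y_sep, a_sep, median_difference))
print(permutation_pvalue(y_null, a_null, median_difference))
```

For the first dataset, whose arms do not overlap, the p-value is essentially the smallest attainable, 1/(n_perm + 1); for the second, whose arm medians coincide, every permutation is at least as extreme and the p-value is 1.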

It is reasonable to surmise that, under our Criterion 1 and an adaptation of White's (1982) conditions A1-A6 to the permutation test setting, the Wald test statistic for treatment effect using a sandwich or bootstrap variance will be asymptotically equivalent under the null hypothesis to the Gail et al. score test statistic, and would have an asymptotic standard normal null distribution. A rigorous investigation of this issue is a potential topic for future work.

Although Criterion 1 is quite general, there are more general conditions that ensure model robustness. One such condition is:

Criterion 3: Let $(p_0, p_1)$ be a valid pair of distributions in S. If $p_i \in S_i$ maximizes the limit of the log-likelihood under the null distribution, then $(p_i, p_i) \in S$.

The same proof as for Criterion 1 remains valid under the more general Criterion 3. This criterion is difficult to interpret, however, as it depends on the true distribution of the data. As such, it is of little practical import, but it illustrates the general nature of the robustness phenomenon.

Our results use the regularity conditions of White (1982). The conditions are similar in spirit to those ensuring the usual consistency and normality properties of the MLE, but are adapted to misspecified models with the help of the Kullback-Leibler distance. If these conditions are not met, there can indeed be multiple maximizers of the limiting log-likelihood. This can be addressed by defining the MLE $(\hat p_0, \hat p_1)$ of interest in the study sample to be the maximizer that is closest, in KL distance, to a null pair of distributions in S. Under the true null, we expect that, even under considerably looser conditions, this MLE $(\hat p_0, \hat p_1)$ will converge to a null pair in S, although the more technical parts of this problem will be explored in future work.

The results from this paper may be easily extended beyond the case of maximum likelihood estimation. The validity of the test from Result 1 holds in more general estimating equation settings as long as the limit of the objective function is of the form of (7) (Appendix) under the null hypothesis. Examples of such models include generalized estimating equations (Liang and Zeger, 1986), which are used routinely in the analysis of data from clinical trials.

It is also important to note the relation of our work to semiparametric methods that use covariates (e.g., Tsiatis et al., 2008). Within a semiparametric model, say $S_{\text{semipar}}$, an efficient semiparametric estimator has the variance of the least favorable parametric submodel allowed in $S_{\text{semipar}}$. Thus, if a researcher chooses to use a test based on a parametric model, say $S_{\text{param}}$, that satisfies Criterion 1 as described in this paper, then the following hold: (a) the test based on $S_{\text{param}}$ will be as valid as the test based on the semiparametric estimator; (b) if $S_{\text{param}}$ is true or sufficiently close to being true, and the least favorable submodel of $S_{\text{semipar}}$ is different from $S_{\text{param}}$, then the test based on $S_{\text{param}}$ will be more powerful than the test based on $S_{\text{semipar}}$; (c) if $S_{\text{param}}$ is far from being true, then the test based on $S_{\text{semipar}}$ will be more powerful than the test based on $S_{\text{param}}$. Thus, an important consideration in deciding whether to use the robust tests of models satisfying Criterion 1 is whether those models are expected to describe features of the study well, for example based on prior pilot studies.

Information from prior pilot studies or other scientific knowledge, although important, may not be critical for parametric-based procedures to be valid nonparametrically. This is suggested by the work of Frangakis and Rubin (2001) and van der Laan et al. (2007), who examine how observed data from the study at hand can be used to choose between a parametric-based and a semi- or nonparametric-based estimator. To preserve nonparametric validity, such choice procedures are superefficient and not regular in the theoretical statistical sense, and they require additional study.

Table 1
Comparison of the performance of the MANCOVA and parametric quadratic models on the DIADS data.

Supplementary Material

Supp Material S1


We thank the Editor, Associate Editor and Reviewers for constructive comments, Michael Rosenblum for inspiring discussions, and the NIH (R01DA023879) for partial financial support. Russell Shinohara is supported by the Epidemiology and Biostatistics of Aging Training Grant T32AG000247 from the National Institute on Aging.


Proof of Result 1

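For a pair $(p_0, p_1)$ of distributions allowed in the parametric model S, the log likelihood of a random sample of i = 1, ..., n individuals randomly assigned to either $A_i = 0$ or 1 equals, up to terms not involving $(p_0, p_1)$, $\sum_{i: A_i = 0} \log p_0(Y_i; X_i) + \sum_{i: A_i = 1} \log p_1(Y_i; X_i)$, and is therefore proportional to

$$\frac{n_0}{n} \cdot \frac{1}{n_0} \sum_{i: A_i = 0} \log p_0(Y_i; X_i) + \frac{n_1}{n} \cdot \frac{1}{n_1} \sum_{i: A_i = 1} \log p_1(Y_i; X_i),$$

where $n_a$ is the number of patients in treatment arm a = 0, 1 and $n_0 + n_1 = n$. Under the regularity condition that $\pi_0 E\{|\log p_0(Y_i; X_i; \theta)| \mid A_i = 0\} + \pi_1 E\{|\log p_1(Y_i; X_i; \theta)| \mid A_i = 1\} < \infty$ for all $\theta$, where $\pi_a = P(A_i = a)$, the probability limit of the above normalized log likelihood is

$$\pi_0 E\{\log p_0(Y_i; X_i) \mid A_i = 0\} + \pi_1 E\{\log p_1(Y_i; X_i) \mid A_i = 1\}. \qquad (14)$$

Assume now that the null hypothesis (1), $p_1^{\text{true}} = p_0^{\text{true}}$, holds; then the operations $E\{\log(\cdot) \mid A_i = 0\}$ and $E\{\log(\cdot) \mid A_i = 1\}$ in (14) are the same operation, say $Q(\cdot)$, and so (14) simplifies to

$$\pi_0 Q(p_0) + \pi_1 Q(p_1), \qquad (15)$$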
where we have omitted the arguments Yi, Xi with no loss of generality.

Let us now assume Criterion 1 from the main text, and suppose that a maximizer of (15) is a non-null pair $(p_0^*, p_1^*)$, i.e., with $p_0^* \neq p_1^*$. Then there are two cases: (a) $Q(p_0^*) = Q(p_1^*)$, or (b) one of $Q(p_0^*)$, $Q(p_1^*)$ is larger. If (a) is true, then the null pair $(p_0^*, p_0^*)$, which by Criterion 1 is also in the model, gives the same value of the functional (15) and so is also a maximizer (the same is true for the null pair $(p_1^*, p_1^*)$). If (b) is true, suppose without loss of generality that $Q(p_0^*) > Q(p_1^*)$. Then the null pair $(p_0^*, p_0^*)$, which by Criterion 1 is in the model, gives the value $\pi_0 Q(p_0^*) + \pi_1 Q(p_0^*)$, which, because $\pi_1 > 0$, is strictly greater than the maximum $\pi_0 Q(p_0^*) + \pi_1 Q(p_1^*)$, a contradiction. So (b) cannot hold, and from (a) we know that the limit of the log likelihood (15) is maximized at a null pair of distributions in the model, say $(p^*, p^*)$, which proves Result 1.
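The case analysis above can be checked on a toy finite model. In the sketch below (our illustration; the candidate distributions, the true distribution and the randomization probabilities are arbitrary choices, and the names are hypothetical), outcomes take values in {0, 1, 2}; the model S contains every null pair built from a small candidate list plus a few non-null pairs, so Criterion 1 holds; and the true null distribution lies outside the candidate list, so S is misspecified. Maximizing the limiting log-likelihood (15) over S then selects a null pair.

```python
import math

# Candidate distributions on outcomes {0, 1, 2}.
C = [(0.6, 0.3, 0.1), (0.2, 0.5, 0.3), (0.1, 0.2, 0.7), (1 / 3, 1 / 3, 1 / 3)]
# Model S: every null pair (c, c) plus a few non-null pairs, so Criterion 1 holds.
S = [(c, c) for c in C] + [(C[0], C[1]), (C[1], C[2]), (C[3], C[1])]
TRUE_P = (0.25, 0.45, 0.30)  # common true law in both arms (H0); not in C, so S is misspecified
PI0, PI1 = 0.4, 0.6          # randomization probabilities


def satisfies_criterion1(model):
    """For every (p0, p1) in the model, (p0, p0) and (p1, p1) must also be in it."""
    pairs = set(model)
    return all((p0, p0) in pairs and (p1, p1) in pairs for p0, p1 in model)


def Q(p):
    """Q(p) = E{log p(Y)} under the true null distribution."""
    return sum(t * math.log(q) for t, q in zip(TRUE_P, p))


def limit_loglik(pair):
    """Limiting log-likelihood (15): pi0 Q(p0) + pi1 Q(p1)."""
    p0, p1 = pair
    return PI0 * Q(p0) + PI1 * Q(p1)


best = max(S, key=limit_loglik)
print(satisfies_criterion1(S), best, best[0] == best[1])
```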

If, in addition, regularity conditions A1-A6 of White (1982) hold, then the null pair $(p^*, p^*)$ is the unique maximizer of (15), and, by arguments analogous to those of White (1982), the MLE of the contrast between $p_0$ and $p_1$ is asymptotically normal with mean 0.

Proof of generalization of results from Rosenblum and van der Laan (2009)

First, let us consider the form of the mean function of the generalized linear models with the robustness property proposed by Rosenblum and van der Laan (2009), that is,

$$\mu_A(\cdot, \beta) = \sum_j \beta_j^{(0)} f_j(A)\, g_j(\cdot) + \sum_k \beta_k^{(1)} h_k(\cdot), \qquad (16)$$

where $\{f_j, g_j, h_k\}$ are such that for each j there exists a k such that $g_j(\cdot) = h_k(\cdot)$, and $\beta = \{\beta_j^{(0)}, \beta_k^{(1)}\}$. We will show that this property is a special case of (i.e., implies) symmetry Criterion 2.

Suppose $(\mu_0(\cdot, \beta), \mu_1(\cdot, \beta))$ is a valid pair in $S_{\text{means}}$. Without loss of generality, let us consider the model for the A = 1 arm:

$$\mu_1(\cdot, \beta) = \sum_j \beta_j^{(0)} f_j(1)\, g_j(\cdot) + \sum_k \beta_k^{(1)} h_k(\cdot). \qquad (17)$$

Since each $g_j$ is equal to some $h_k$, which we denote by $h_{k(j)}$, we have that (17) equals:

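$$\begin{aligned}
&\sum_j \beta_j^{(0)} f_j(1)\, h_{k(j)}(\cdot) + \sum_k \beta_k^{(1)} h_k(\cdot) \\
&= \sum_k \Big\{ \sum_{j: k(j) = k} \beta_j^{(0)} f_j(1) \Big\} h_k(\cdot) + \sum_k \beta_k^{(1)} h_k(\cdot) \quad \text{(if } \{j : k(j) = k\} \text{ is empty for some } k, \text{ we define } \textstyle\sum_{j: k(j) = k} \text{ to be } 0\text{)} \\
&= \sum_k \Big\{ \sum_{j: k(j) = k} \beta_j^{(0)} f_j(1) + \beta_k^{(1)} \Big\} h_k(\cdot) \\
&= \sum_j 0 \cdot f_j(1)\, g_j(\cdot) + \sum_k \Big\{ \sum_{j: k(j) = k} \beta_j^{(0)} f_j(1) + \beta_k^{(1)} \Big\} h_k(\cdot) = \mu_1(\cdot, \beta^*) \\
&= \sum_j 0 \cdot f_j(0)\, g_j(\cdot) + \sum_k \Big\{ \sum_{j: k(j) = k} \beta_j^{(0)} f_j(1) + \beta_k^{(1)} \Big\} h_k(\cdot) = \mu_0(\cdot, \beta^*),
\end{aligned}$$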

where we define $\beta^*$ component-wise by inspection to match the form of (16); i.e., the components of $\beta^*$ for the first summand in (16) are 0, and those for the second summand are $\sum_{j: k(j) = k} \beta_j^{(0)} f_j(1) + \beta_k^{(1)}$. Hence $(\mu_0(\cdot, \beta), \mu_1(\cdot, \beta)) \in S_{\text{means}}$ implies that $(\mu_1(\cdot, \beta), \mu_1(\cdot, \beta)) \in S_{\text{means}}$, because the latter pair equals $(\mu_0(\cdot, \beta^*), \mu_1(\cdot, \beta^*))$. The analogous argument, with $f_j(0)$ in place of $f_j(1)$, shows that $(\mu_0(\cdot, \beta), \mu_0(\cdot, \beta)) \in S_{\text{means}}$. Therefore, Criterion 2 is satisfied, and so Result 2 holds for the class of generalized linear models of the form (16).
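The construction of $\beta^*$ can also be checked numerically. The sketch below uses a hypothetical instance of (16) with $h = (1, x, x^2)$, $g_1 = x$ and $g_2 = x^2$ (so $k(1)$ and $k(2)$ point to the second and third h functions; the code is 0-based), and $f_1(A) = A$, $f_2(A) = 2A - 1$; the coefficients and the helper names are arbitrary illustrative choices. For each arm it builds $\beta^*$ and confirms that $\mu_0(\cdot, \beta^*)$ and $\mu_1(\cdot, \beta^*)$ both reproduce that arm's mean $\mu_{\text{arm}}(\cdot, \beta)$ on a grid, i.e., that both null pairs lie in $S_{\text{means}}$.

```python
import math
from typing import Callable, List, Sequence, Tuple

Fn = Callable[[float], float]


def mu(a: int, x: float, b0: Sequence[float], b1: Sequence[float],
       f: Sequence[Fn], g: Sequence[Fn], h: Sequence[Fn]) -> float:
    """Mean (16): sum_j b0[j] f_j(a) g_j(x) + sum_k b1[k] h_k(x)."""
    return (sum(bj * fj(a) * gj(x) for bj, fj, gj in zip(b0, f, g))
            + sum(bk * hk(x) for bk, hk in zip(b1, h)))


def beta_star(arm: int, b0, b1, f, k_of_j) -> Tuple[List[float], List[float]]:
    """beta* from the derivation: first-summand coefficients set to 0, second-summand
    coefficients sum_{j: k(j)=k} b0[j] f_j(arm) + b1[k] (empty sums are 0)."""
    new_b1 = list(b1)
    for j, k in enumerate(k_of_j):
        new_b1[k] += b0[j] * f[j](arm)
    return [0.0] * len(b0), new_b1


# Hypothetical instance: h = (1, x, x^2), g[0] = h[1], g[1] = h[2], so k_of_j = [1, 2].
h = [lambda x: 1.0, lambda x: x, lambda x: x * x]
g = [h[1], h[2]]
k_of_j = [1, 2]
f = [lambda a: a, lambda a: 2 * a - 1]
b0, b1 = [0.7, -0.2], [1.0, 0.5, 0.1]

for arm in (0, 1):
    bs0, bs1 = beta_star(arm, b0, b1, f, k_of_j)
    for x in [i / 4 for i in range(-8, 9)]:
        target = mu(arm, x, b0, b1, f, g, h)  # mu_arm(x, beta)
        assert math.isclose(mu(0, x, bs0, bs1, f, g, h), target, abs_tol=1e-12)
        assert math.isclose(mu(1, x, bs0, bs1, f, g, h), target, abs_tol=1e-12)
print("both null pairs are in S_means")
```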


Web Appendices referenced in Section 4 are available under the Paper Information link at the Biometrics website

Contributor Information

Russell T. Shinohara, Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA.

Constantine E. Frangakis, Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA.

Constantine G. Lyketsos, Department of Psychiatry, Johns Hopkins Bayview Hospital, Baltimore, MD, USA.


  • Alexopoulos G, Abrams R, Young R, Shamoian C. Cornell scale for depression in dementia. Biological Psychiatry. 1988;23:271–284.
  • Diggle P, Heagerty P, Liang K, Zeger S. Analysis of longitudinal data. Oxford Statistical Science Series; 2003.
  • Frangakis C, Rubin D. Rejoinder to Discussions on Addressing an Idiosyncrasy in Estimating Survival Curves Using Double Sampling in the Presence of Self-Selected Right Censoring. Biometrics. 2001;57:351–353.
  • Kullback S, Leibler R. On information and sufficiency. Annals of Mathematical Statistics. 1951;22:79–86.
  • Lyketsos C, DelCampo L, Steinberg M, Miles Q, Steele C, Munro C, Baker A, Sheppard J-M, Frangakis C, Brandt K, Rabins P. Treating depression in Alzheimer disease: Efficacy and safety of sertraline therapy, and the benefits of depression reduction: The DIADS. Archives of General Psychiatry. 2003;60:737–746.
  • McCullagh P, Nelder J. Generalized linear models. Chapman & Hall; 1991.
  • Moore K, van der Laan M. Application of time-to-event methods in the assessment of safety in clinical trials. In: Design and Analysis of Clinical Trials with Time-to-Event Endpoints. Chapman and Hall/CRC Biostatistics Series; 2009.
  • Mulsant B, Pollock B, Nebes R, Miller M, Sweet R, Stack J, Houck P, Bensasi S, Mazumdar S, Reynolds C. A twelve-week, double-blind, randomized comparison of nortriptyline and paroxetine in older depressed inpatients and outpatients. American Journal of Geriatric Psychiatry. 2001;9:406–414.
  • Robins J. Optimal structural nested models for optimal sequential decisions. Proceedings of the Second Seattle Symposium in Biostatistics: Analysis of Correlated Data. 2004.
  • Rosenblum M, van der Laan M. Using regression models to analyze randomized trials: Asymptotically valid hypothesis tests despite incorrectly specified models. Biometrics. 2009;65:937–945.
  • Tsiatis A, Davidian M, Zhang M, Lu X. Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: A principled yet flexible approach. Statistics in Medicine. 2008;27:4658–4677.
  • van der Laan M, Polley E, Hubbard A. Super learner. Statistical Applications in Genetics and Molecular Biology. 2007;6:25.
  • van der Vaart A. Asymptotic statistics. Cambridge University Press; Cambridge: 1998.
  • White H. Maximum likelihood estimation of misspecified models. Econometrica. 1982;50:1–25.