PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
J Am Stat Assoc. Author manuscript; available in PMC 2017 August 11.
Published in final edited form as:
PMCID: PMC5553128
NIHMSID: NIHMS777894

Change-Plane Analysis for Subgroup Detection and Sample Size Calculation

Ailin Fan, graduate student, Rui Song, Associate Professor, and Wenbin Lu, Professor*

Abstract

We propose a systematic method for testing and identifying a subgroup with an enhanced treatment effect. We adopts a change-plane technique to first test the existence of a subgroup, and then identify the subgroup if the null hypothesis on non-existence of such a subgroup is rejected. A semiparametric model is considered for the response with an unspecified baseline function and an interaction between a subgroup indicator and treatment. A doubly-robust test statistic is constructed based on this model, and asymptotic distributions of the test statistic under both null and local alternative hypotheses are derived. Moreover, a sample size calculation method for subgroup detection is developed based on the proposed statistic. The finite sample performance of the proposed test is evaluated via simulations. Finally, the proposed methods for subgroup identification and sample size calculation are applied to a data from an AIDS study.

Keywords: Change-plane analysis, Doubly robust test, Sample size calculation, Semiparametric model, Subgroup analysis

1 INTRODUCTION

Classical clinical trials are designed to assess therapeutic benefits of treatments for the whole population that has been considered. However, due to patients’ heterogeneity in response to treatments, it is likely that a new treatment is effective or has an enhanced effect compared to a standard treatment only for a specific subpopulation. By making use of patient-specific baseline covariates, subgroup analysis aims to identify subgroups of patients with enhanced treatment effects, which can help to narrow down the target population of a treatment. Hence, it provides an important tool for assessing treatment effects and selecting target populations for future studies.

A number of data-driven approaches have been developed for the subgroup identification. Song and Pepe (2004) considered the binary response case and proposed using the selection impact curve (SIC) to evaluate treatment policies dictated by a single covariate. Then, based on the SIC, an optimal division of the population for assigning treatments can be obtained. Bonetti and Gelber (2004) grouped patients by values of a single covariate and estimated treatment effects on overlapping subsets of patients using a moving average procedure. Kuk et al. (2010) used recursive subsetting algorithm for identifying subgroups who respond to treatment with high prediction accuracy for clinical outcomes. Foster et al. (2011) developed a “Virtual Twins” method which first predicted the probabilities of response to treatment and control, and then used tree methods to obtain the subgroups with an enhanced treatment effect. Cai et al. (2011) and Zhao et al. (2013) proposed using parametric scoring systems based on multiple baseline covariates to rank treatment effects and then identified patients who benefit more from the new treatment. There are, however, well known risks for undertaking subgroup analysis (Assmann et al., 2000; Wang et al., 2007). For example, subgroup identification may suffer from false positive findings without being performed with a sound statistical hypothesis testing procedure.

Recently, Shen and He (2015) considered a linear logistic-normal mixture model for the response and developed a likelihood-based test for the existence of a subgroup. If a subgroup exists as indicated by the test, the fitted logistic regression model for the subgroup indicator can be used to score patients for treatment selection. The method proposed in Shen and He (2015) provides a valid test for detecting the subgroup with the following two limitations. First, the method relies on some parametric assumptions, such as linear covariate effects and a logistic-normal mixture model for the response, which may be restrictive in applications. Second, since the subgroup is defined by a latent binary variable, the fitted logistic probability for the subgroup indicator is used for treatment selection. This requires selecting a proper threshold parameter, which can be subjective.

In this paper, we consider change-plane analysis for subgroup detection and sample size calculation. Our contribution over the literature can be summarized in the following three folds. First, we consider a semiparametric model with an unspecified baseline function and an interaction between a subgroup indicator and the treatment for the mean response, which greatly improves the flexibility of the response models considered in the literature. In addition, the subgroup indicator is explicitly defined by a change-plane as a function of covariates. Second, adopting techniques similar as those in change-point analysis (Liang et al., 1990; Andrews, 1993; Bai, 1997), we propose a doubly-robust score-type statistic for testing the existence of a subgroup with an enhanced treatment effect. The proposed test is doubly-robust in the sense that it is valid when either the baseline function or the propensity score model is correctly specified. If the null hypothesis that a subgroup does not exist is rejected, the change-plane that defines the subgroup can be estimated by maximizing the score-type statistic. Third, we derive the asymptotic distributions of the proposed statistic under both the null and the local alternative hypotheses. A resampling method is developed to approximate the asymptotic null distribution of the test statistic. Based on the derived asymptotic distributions, we also propose a sample size calculation procedure to design a randomized clinical trial for subgroup detection, which has been seldom studied in the literature.

The remainder of this paper is organized as follows. Section 2 introduces the considered semi-parametric model and the proposed doubly-robust score test statistic for subgroup detection. The asymptotic distributions of the test statistic under both the null and local alternative hypotheses are also presented. Section 3 presents a sample size calculation procedure based on the proposed test. The numerical performance of the proposed test and the associated sample size calculation method are evaluated by simulation studies in Section 4. An application of the proposed method to a data from the AIDS Clinical Trials Group protocol 175 is illustrated in Section 5. The paper is concluded with some discussions in Section 6. All the technical derivations are given in the Appendix.

2 CHANGE-PLANE ANALYSIS

2.1 The Proposed Model

Let X denote the baseline covariates collected for a subject in an experimental or observational study, A denote the treatment received by the subject, and Y denote his or her response of interest. Here we restrict our attention to a dichotomous treatment coded as 0 and 1, and a continuous response. Let Z = (XT, A, Y)T. The observed data consist of {Zi=(XiT,Ai,Yi)T,i=1,,n}, which are n independent and identically distributed (i.i.d.) copies of Z. Consider the following semiparametric model

Yi=μ(Xi)+τAi1(θTXi0)+ϵi,
(1)

where μ(X) is an unknown baseline mean function for patients in treatment 0, 1(·) is the indicator function, and E(εi|Ai, Xi)) = 0. We assume that the first element of X is 1, X is a (p + 1)-dimensional vector (1, X1, ..., Xp)T, and θ = (θ0, θ1, ..., θp)T, is a (p + 1)-dimensional vector of parameters. For the identifiability of θ, let ||θ|| = 1, where || · || is the [ell]2-norm. When τ = 0, treatments do not have an effect on the response and thus there are no subgroups with enhanced treatment effects. When τ ≠ 0, a subgroup of patients with an enhanced treatment effect exists and is defined by the change-plane 1(θTX ≥ 0).

The proposed model is flexible since it puts no assumptions on the baseline mean function. On the other hand, it places constraints on the form of subgroup and the treatment effect for the subgroup, which are directly related to our goal of subgroup detection and identification. Semipara-metric models analogous to model (1) have been considered in the literature for deriving optimal treatment regimes (Murphy, 2003; Robins, 2004). The difference is the way that the interaction between treatment A and covariates X is modeled. For the subgroup identification problem, we consider the interaction term A1(θTX ≥ 0). To test whether there exists a subgroup with an enhanced treatment effect, it is equivalent to test the hypothesis

H0:τ=0versusHa:τ0.
(2)

2.2 A Doubly-Robust Test

When θ is known, model (1) fits in the class of semiparametric models considered in Robins and Rotnizky (2001). Based on the semiparametric theory (Tsiatis, 2007), a class of doubly-robust estimating equations for τ is given by

i=1nλ(Xi){Aiπ(Xi)}{Yih(Xi)τAi1(θTXi0)}=0,
(3)

where λ(X) and h(X) are arbitrary functions, and π(X) = P(A = 1|X) is the propensity score. It can be shown that when either the baseline mean function h(X) or the propensity score model π(X) is correctly specified, (3) is a consistent estimating equation for τ.

Under the assumption that the random errors εi's are homoscedastic, the most efficient doubly-robust estimating equation is obtained by setting λ(X) = 1(θTX ≥ 0) and h(X) = μ(X). As the true baseline function μ(X) and propensity score model π(X) may not be known in practice, we posit parametric models h(X, β) and π(X, γ) for h(X) and π(X), respectively. Hereinafter, we assume a linear model for h(X, β) and a logistic model for π(X, γ). However, other parametric models can also be used. Define η = (βT, γT)T. We consider the following score test statistic for testing H0 : τ = 0:

i=1nψ1(Zi,η~;θ)i=1n1(θTXi0){Aiπ(Xi,γ~)}{Yih(Xi,β~)},

where η~=(β~T,γ~T)T, β~ is an estimator of β under the null, and γ~ is an estimator of γ. Specifically, β~ and γ~ are solutions to the following equations

Ψ2n(β)=1ni=1nψ2(Zi,β)=1ni=1nDβ(Xi){Yih(Xi,β)}=0,
Ψ3n(γ)=1ni=1nψ3(Zi,γ)=1ni=1nDγ(Xi){Aiπ(Xi,γ)}=0,

where Dβ(Xi) = [partial differential]h(Xi, β)/[partial differential]β and Dγ(Xi) = [π(Xi, γ){1 – π(Xi, γ)}]−1[partial differential]π(Xi, γ)/[partial differential]γ.

Here, although two parametric models are considered for fitting the baseline mean and propensity score functions, we do not require that both models hold in our theoretical derivation. In fact, our theoretical results show that the proposed test is valid when the model for either the baseline mean or propensity score function is correctly specified but not necessarily both, i.e. the so-called doubly robust property. In particular, when the propensity score is known as in randomized clinical trials, the proposed test is valid for any nonparametric baseline mean function, regardless of correctness of the posited linear model. Such theoretical results were also justified by our simulation studies. In this sense, although the proposed test has a parametric form, it is semiparametric in nature.

Note that model (1) does not depend on θ when τ = 0, hence the parameter θ is identifiable only under the alternative hypothesis. This makes the testing problem given in (2) non-regular and standard asymptotic testing framework are not directly applicable. Davies (1977, 1987) consider tests when a nuisance parameter appears under the alternative hypothesis. Andrews (2001) studied such non-regular testing problems for a number of likelihood-based testing procedures under a variety of parametric models. Similar testing problems have also been widely studied for detecting change-points. However, to our knowledge, it remains uninvestigated for detecting the existence of a change-plane based on a semiparametric model. We consider a supremum of squared score test statistics:

Tn=supθϴ{i=1nψ1(Zi,η~;θ)}2nV~S(θ),
(4)

where ϴ={θRp+1:θ=1} and V~S(θ) is a consistent estimator for the asymptotic variance of n12i=1nψ1(Zi,η~;θ) under the null hypothesis. The definition of V~S(θ) is given in the next section.

To compute the test statistic Tn, we need to find the supremum of squared score test statistics over a unit ball in Rp+1. Since it is infeasible to get the supremum explicitly, we use a numerical method to find the maximum over the space Θ. To incorporate the unit ball constraint, it is natural to consider a sphere coordinates transformation ϕ = (ϕ1, ..., ϕp)T 7→ θ, where ϕp ranges over [0, 2π) and other elements of ϕ range over [0, π]. The transformation is given as follows

{θ0=cos(ϕ1),θp1=sin(ϕ1)sin(ϕ2)cos(ϕp),θp=sin(ϕ1)sin(ϕ2)sin(ϕp).}

We consider a set of grid points of ϕ over [0, π]p-1 × [0, 2π) and compute the maximum of squared score statistics over the set of grid points to approximate Tn.

In the next section, we establish the asymptotic distributions of Tn under both the null and the local alternative hypotheses. In addition, we propose a resampling method to compute the critical values of the limiting null distribution. When the null hypotheses is rejected, the change-plane parameter θ can be estimated by

θ^=argsupθϴ{i=1nψ1(Zi,η~;θ)}2nV~S(θ).
(5)

Similar methods for estimating a change point have been studied in the literature (e.g. Bai, 1997). Thus the estimated subgroup with an enhanced treatment effect is 1(θ^TX0).

2.3 Asymptotic Distributions of Tn

Define Ψ2(β) = E{Ψ2n(β)} and Ψ3(γ) = E{Ψ3n(γ)}. To establish the asymptotic distributions of Tn, we make the following assumptions:

  • A1. Equations Ψ2(β) = 0 and Ψ3(γ) = 0 have unique solutions β0 and γ0, respectively, and the solutions η0=(β0T,γ0T)T are in a compact set of the parameter space.
  • A2. We have
    n(β~β0)=C111ni=1nψ2(Zi,β0)+op(1),
    n(γ~γ0)=C21i=1nψ3(Zi,γ0)+op(1),
    where C1 = E{[partial differential]ψ2(Z, β)/[partial differential]βT|β=β0}, C2 = E{[partial differential]ψ3(Z, γ)/[partial differential]γT|γ=γ0}, and both of them are finite and positive definite deterministic matrices.
  • A3. The function ψ1(Z, η;θ) is twice continuously differentiable with respect to η, and has bounded first and second derivatives.
  • A4. The function E[{Yh(X, β)}2] is uniformly bounded in β.
  • A5. We have 0 < P(θTX ≥ 0) < 1 for any θ [set membership] Θ.

Assumptions A1 and A2 ensure the consistency and asymptotic normality of β~ and γ~ These assumptions are satisfied for many commonly used parametric models under mild conditions, such as a linear model for h(X, β) and a logistic model for π(X, γ). The asymptotic distributions of β~ under the null and local alternative hypotheses are similar to those established in Le Cam's third lemma (Van der Vaart (2000, p. 90)). Assumptions A3-A5 are assumed to establish the weak convergence of the process n12i=1nψ1(Zi,η~;θ) indexed by θ.

Theorem 1

Suppose that either the baseline mean function h(X, β) or the propensity model π(X, γ) is correctly specified, but not necessarily both. If Assumptions A1-A5 hold, Tn converges in distribution to supθ [set membership]Θ G2(θ) under H0 as n goes to infinity , where{G(θ) : θ [set membership] Θ} is a mean zero Gaussian process with the covariance function

Σ(θ1,θ2)=E{ψ1(Z,η0;θ1)ψ1(Z,η0;θ2)}Eψ12(Z,η0;θ1)Eψ12(Z,η0;θ2)

for any θ1, θ2 [set membership] Θ, where

ψ1(Z,η0;θ)=ψ1(Z,η0;θ)K1TC11ψ2(Z,β0)K2TC21ψ3(Z,γ0),

K1 = E{[partial differential]ψ1(Z, η0; θ)/[partial differential]β}, and K2 = E{[partial differential]ψ1(Z, η0; θ)/[partial differential]γ}.

Next, we establish the asymptotic distribution of Tn under a sequence of local alternatives H1n: τ = n−1/2δ.

Theorem 2

Suppose that either the baseline mean function h(X, β) or the propensity model π(X,γ) is correctly specified, but not necessarily both. If Assumptions A1-A5 hold, Tn converges in distribution to supθϴGδ2(θ) under H1n as n goes to infinity, where{Gδ(θ) : θ [set membership] Θ} is a Gaussian process with the mean function

μ(θ)=δE[{1(θ0TX0,θTX0)π0(X){1π(X,γ0)}]E{ψ12(Z,η0;θ)}

and the covariance function Σ(θ1, θ2), where θ0 is true value of θ and π0(X) is the true propensity score model.

To calculate the critical values for the test, we use a resampling method to approximate the limiting null distribution of the test statistic. Define

ψ^1(Z,η~;θ)=ψ1(Z,η~;θ)K^1TC^11ψ2(Z,β~)K^2TC^21ψ3(Z,γ~),

where K^1, K^2, C^1, and C^2, are the empirical estimates of their population counterparts. Specifically, K^1=1ni=1nψ1(Zi,η~;θ)β, K^2=1ni=1nψ1(Zi,η~;θ)γ, C^1=1ni=1nψ2(Zi,β~)βT and C^2=1ni=1nψ3(Zi,γ~)γT. Then, V~S(θ)=n1i=1n{ψ^1(Z,η~;θ)}2. We consider the following perturbed test statistic

Tn=supθϴ{i=1nξiψ^1(Zi,η~;θ)}2nV~S(θ),

where ξ1, ..., ξn are i.i.d. standard normal random variables independent of data. By generating a large number of perturbed test statistics, we can use the empirical distribution of Tn to compute the critical value Cα, the upper α quantile of the empirical distribution. Then an α-level test reject the null hypothesis when Tn > Cα.

3 SAMPLE SIZE CALCULATION

Since most clinical trials are designed to detect the overall treatment effect, they may lack power to detect a subgroup with an enhanced treatment effect (Yusuf et al., 1991; Rothwell, 2005). For example, Brookes et al. (2004) has shown that a trial with 80% power for the overall effect had only 29% power to detect an interaction effect of the same magnitude. To appropriately conduct a subgroup analysis with targeted power, a careful design and predefined statistical analysis protocol are important (Assmann et al., 2000; Cui et al., 2002). In this section, we provide a sample size calculation method based on the proposed test for subgroup detection in a randomized clinical trial.

To derive the sample size formula, we first calculate the asymptotic power of the test under the local alternatives H1n : τ = n−1/2δ, where n is the sample size. The sample size formula can then be derived at a pre-specified power 1 – β. Here we are interested in sample size calculation for a randomized trial, therefore the propensity score is given and there is no need to estimate γ. In addition, we assume that the errors εi's in model (1) are i.i.d. with mean 0 and variance σ2. Under this case, the asymptotic covariance function of the test statistic Tn can be simplified as

Σ(θ1,θ2)=E{1(θ1TX0,θ2TX0)g(X)}E{1(θ1TX0)g(X)}E{1(θ2TX0)g(X)},

where g(X) = π(1 – π)[{μ(X) – h(X, β0)}2 + σ2] and π = P(A = 1). In addition, under the local alternatives, the asymptotic mean function of Tn is given by

μ(θ)=δπ(1π)E{1(θ0TX0,θTX0)}E{1(θTX0)g(X)}.

For an α-level test to have 1 – β power in detecting an enhanced treatment effect of size τ0, we need to find δ0 such that P(supθϴGδ02(θ)>qα)=1β, where qα is the upper α-quantile of the distribution of supθ[set membership]Θ G2(θ) is the Gaussian process defined in Theorem 1. Based on the relationship τ0 = n−1/2δ0, the required sample size is given by n = (δ0/τ0)2. To find δ0, we take the following three steps. In Step 1, we compute the mean function μ(θ) and the covariance function Σ(θ1, θ2) via numerical integration, for which we need to specify the true value (θ) in the change-plane, the distribution of covariates X, the difference between the true baseline mean function and the posited mean function, μ(X) − h(X, β0), and σ2, the variance of ε. These quantities can be estimated from historical data or a pilot study. In Step 2, for any given δ, we compute the probability P(supθϴGδ2(θ)>qα) via Monte Carlo simulations detailed as follows. We first approximate supθϴGδ2(θ) by maxk=1,,KGδ2(θk), where θ1, ..., θK is a set of fine grids of θ [set membership] Θ. This can be done by the same sphere coordinates transformation used previously. Next, we generate (W1, ...,WK)T from a multivariate normal distribution with the mean (μ(θ1), ..., μ(θK))T and the variance-covariance matrix {Σ(θk1, θk2) : k1, k2 = 1, ..., K}. Finally, we compute the probability P(supθϴGδ2(θ)>qα) based on the empirical distribution of maxk=1,,KWk2. Note that qα can be calculated similarly by generating (W1, ...,WK)T from a multivariate normal distribution with mean 0 and the same variance-covariance matrix. In practice, we generate a large set, say 10, 000 of maxk=1,,KWk2 to compute the probability. In Step 3, we find δ0 via a grid search.

4 SIMULATION STUDIES

4.1 Test and Estimation

We conducted extensive simulation studies to investigate the empirical performance of the proposed test for subgroup detection and the estimation for the change-plane parameter θ under the alternative hypotheses. In particular, we considered various settings to examine the robustness of the test against the misspecification of the baseline mean function in both randomized and observational studies.

Simulated data with sample sizes n = 500 and 1000 were generated based on model (1), where two covariates X = (X1, X2)T were considered. Here, X1 follows a Bernoulli distribution with the success probability 0.5 and X2 follows a uniform distribution on (−1, 1). The random noise ε is normally distributed with mean zero and variance 0.25. For the treatment assignment indicator A, we considered the following two settings for the propensity score model π(X) (In short as P-Model hereinafter):

  • - P-Model I: π(X) = 0.5;
  • - P-Model II: π(X)=exp(γ0+γ1X1+γ2X2)1+exp(γ0+γ1X1X1+γ2X2, γ0 = 0, γ1 = γ2 = 0.5.

The two settings represent a randomized clinical trial and an observational study, respectively. We also considered three baseline mean functions for μ(X) (In short as B-Model hereinafter):

  • - B-Model I: μ(X) = β0 + β1X1 + β2X2, β0 = β1 = β2 = 1;
  • - B-Model II: μ(X) = β0 + β1X1 + β3X22, β0 = 1, β1 = 0.5, β2 = 0, β3 = 1;
  • - B-Model III: μ(X) = β0 + β1 sin(β2X1 + β3πX2), β0 = β1 = β2 = β3 = 1.

The proposed test was implemented on each simulated dataset. When calculating the test statistic in (4), we fit a linear model h(X, β) for the baseline mean function and a logistic model π(X, γ) for the propensity score. Therefore, the baseline mean function is correctly specified for the setting with B-Model I and is misspecified for the settings with B-Models II and III, while the propensity score model is correctly specified for both P-Model I and II. When calculating the test statistic, we used the spherical coordinates transformation and searched the supremum over K = 100 × 100 grid points, with 100 grid values for each angular coordinate. For each test, we used 1000 resamplings to obtain the critical values of the test. We reported the empirical type I errors and powers of the test. Simulation results are summarized below for both the null (τ ≠ 0) and the alternative (τ ≠ 0), respectively.

4.1.1 Type I Errors

For each setting, we simulated 5000 data sets to compute type I errors of the test with the significance level of 0.05 and 0.1. The results in Table 1 show that the empirical type I errors are all close to their nominal values, which demonstrate the validity and robustness of the proposed test for subgroup detection.

Table 1
Type I errors of the proposed test based on resampling. (Corresponding standard errors for type I errors with size 0.05 and 0.1 are 0.003 and 0.004.)

4.1.2 Powers and Estimates of Change-Plane Parameters

Under alternative hypotheses, the enhanced treatment effect in the subgroup is set to be τ = ±0.1, ± 0.25 and ±0.5. In addition, the true change-plane parameter is chosen as θ0 = (−0.15, 0.3, 0.942)T for all settings. With this choice of θ0, the subgroup with an enhanced treatment effect contains approximately 50% of the population, and includes subjects with X1 = 1 and X2 ≥ −0.159 or X1 = 0 and X2 ≥ 0.159.

Empirical powers based on 1000 simulated datasets for all settings are shown in Table 2. As expected, the powers for detecting the subgroup increase as the sample size n or the magnitude of treatment effect τ increases. When the magnitude of treatment effect τ increases to 0.5, the powers are almost 100% for all settings. The powers for B-Model II are comparable to those for B-Model I, while the powers for B-Model III are relatively smaller than those of B-Models I and II. One explanation may be that since B-Model III has a very nonlinear baseline mean function, a posited linear model may not be a good fit and thus lost some efficiency.

Table 2
Power (%) of the proposed test at 0.05 and 0.1 levels. (Standard errors are shown in parenthesis.)

Next, we estimated the change-plane parameter θ by (5). We report the bias and the empirical standard deviation of the estimates θ^ in Figure 1. We also report the misclassification rate for identifying the true subgroup in Table 3. The misclassification rate is the proportion of subjects who are misidentified either as members in the subgroup or as members not in the subgroup, and is calculated by 1ni=1n1(θ^TXi0)1(θ0TXi0).

Figure 1
Bias and standard deviation of estimated change-plane parameter θ.
Table 3
Misclassification rate (%) of identified subgroup based on estimated change-plane parameter θ. (Standard errors are shown in parentheses.)

Based on the results, it is observed that the biases and standard deviations of the estimates decrease as the sample size or the magnitude of treatment effect increase. In particular, when the magnitude of treatment effect increases to 0.5, all the estimate are nearly unbiased. For small treatment effects, the estimators for θ are underestimated. This may be because that the true θ2 = 0.942, which is likely to be underestimated due to the upper limit of 1 for θ. Similar to the power results, the estimates for B-Model III have larger biases and standard deviations compared to those for B-Model I and II. In addition, Table 3 shows that the misclassification rates also decrease as the sample size n or the magnitude of treatment effect τ increases. When the magnitude of the treatment effect increases to 0.5, most misclassification rates are less than 5% except for those under B-Model III.

For practical use, we also report the computational time for conducting the proposed test for different sample sizes n and number of grid points K on the unit ball Θ. Table 4 summarizes the average computational time (in seconds) and its standard deviation over 1000 simulation runs for the setting with B-Model I, P-Model I, τ = 0.5 and M = 1000. The average computational time increases almost linearly in n and K. But even with n = 2000 and K = 10000, the average computational time is less than 2 minutes.

Table 4
Average computational time (in seconds).

Furthermore, we compare the proposed method with the test of Shen and He (2015) (named EM test) in terms of power for detecting the subgroup and with the method of Zhao et al. (2013) in terms of accuracy for identifying the true subgroup. Specifically, when comparing with the EM test of Shen and He (2015), we consider the simulation settings with P-Model I and B-Model I/II/III at τ = 0, 0.1 and 0.5. Note that under B-Model I, since the null model is a linear model, the EM test is a valid test. The type-I error and power results are reported in Table 5. We only present the results for B-Model I. Based on the results, we observe that when the baseline mean model is linear (B-Model I), the EM test has correct type I error at both levels, however, its empirical power is smaller than that of the proposed test, especially for τ = 0.1. When the baseline mean model is not linear (B-Model II and III), the model assumption for the EM test of Shen and He (2015) is violated. Our results (not presented here) show that the EM test has severely inflated type I error and hence is not valid as expected, while the proposed test still has correct type I error due to the doubly robust property.

Table 5
Type I error and power (%) of the EM test and proposed test for B-Model I at 0.05 and 0.1 levels. (Standard errors are shown in parenthesis.)

In Zhao et al. (2013), the subgroup of interest is defined by {D^(X)c}, where D^(X) is the estimated treatment difference and c is a threshold that will be determined based on data. Specifically, they propose an estimate of the average treatment effect in the subgroup defined by {D^(X)c} and denote it by AD^(c). Then, under the monotonicity assumption, the threshold c is a chosen to solve the equation AD^(c)=τ, where τ is a desired treatment effect in the subgroup. In our implementation, following Zhao et al. (2013), we first fit a linear model: E(YX,A)=β1TX+(β2TX)A. Note that in our notation, the covariates X include a column of 1 as the first column. The estimated treatment difference is D^(X)=β^2TX with the subgroup defined by {β^2TXc} Then, we obtain the estimated average treatment difference AD^(c) in the subgroup and determine the threshold c using the same method as in Zhao et al. (2013). We consider the simulation settings with P-Model I and B-Model I/II/III at τ = 0.1 and 0.5. Misclassification rates of the identified subgroups using Zhao et al. (2013)'s method and our method compared with the true subgroup are reported in Table 6. The results show that the identified subgroups using Zhao et al. (2013)'s method have much higher misclassification rates compared with the proposed method.

Table 6
Misclassification rates (%) of the identified subgroups using Zhao et al. (2013)'s method and our method. (Standard errors are shown in parenthesis.)

4.2 Sample Size Calculation Examples

In this section, we conducted a simulation study to evaluate the proposed sample size calculation procedure for a randomized trial with the equal treatment assignment probability, i.e. π = 0.5. In this simulation study, we considered a single covariate X from a uniform distribution on (−1,1) in all settings. The subgroup of interest is defined by Xθ0, where θ0 was chosen as: −0.5, 0 and 0.5, corresponding to the scenarios that 75%, 50% and 25% of the population are in the subgroup with an enhanced treatment effect. The variance of the random noise ε is set as σ2 = 0.25. We considered three levels of treatment effects: τ = 0.1, 0.25 and 0.5, which represent small, medium and large effects, respectively. In addition, we considered three baseline mean functions: μ(X) = 1 + X, 1 – X2 and 1 + sin(π~X), where π~ is the circumference to diameter ratio. In our test statistics, we fit a linear model h(X, β) for μ(X). After some calculation, it can be shown that when μ(X) = 1 – X2, h(X, β0) = 2/3; while when μ(X)=1+sin(π~X), h(X,β0)=1+(3π~)X. Therefore, the difference μ(X) – h(X, β0) can be calculated accordingly. We calculate the required sample size n for the test with level α = 0.05 and power 1 − β = 90%. For each setting, based on the calculated sample size n, we generated 1000 data sets and computed the empirical power of the proposed test statistic. Table 7 summarizes the results. We observe that the empirical powers under all settings are close to the nominal level 90%, which shows the validity of the proposed sample size formula.

Table 7
Results for sample size calculation. Here, n is the required sample size given by our procedure. Empirical power of the test under the calculated sample size is reported based on 1000 data replications.

For comparison, we also compute the required sample size for subgroup analysis based on a simple method proposed by Brookes et al. (2004). The main idea of Brookes et al. (2004) is to inflate the original sample size for testing the overall treatment effect such that the interaction test between treatment and subgroup can achieve the nominal level. For example, when the subgroup contains half of the population, the required sample size for testing the treatment-subgroup interaction is n = 4noverall, where noverall is the required sample size for testing the overall treatment effect. In our considered simulation settings, θ0 = 0 corresponds to the case that the subgroup contains half of the population. We only consider this setting in our comparison. Based on their formula, we need to calculate the sample size noverall for testing the overall treatment effect based on model Y = β0 + β1X + τA + ε. Here, we use the sample size formula for analysis of covariance proposed in Borm et al. (2007). Specifically, noverall = (1 – ρ2)nt, where nt=4(zα2+zβ)2σ2τ2 is the required sample size for a standard two-sample t-test, σ2 is the variance of ε, and ρ is the correlation between X and Y.

Table 8 shows the calculated sample size using the above formula for level α = 0.05 and power 1 – β = 0.9. We also report the corresponding empirical power of our proposed test under the calculated sample size. Based on these results, the sample sizes calculated using the method of Brookes et al. (2004) are all smaller than those obtained using the proposed sample size formula given in Table 7, and the corresponding power is much lower than the desired 90% level. This demonstrates that simply inflating the sample size for testing the overall treatment effect may not work well for detecting the subgroup and the proposed delicate sample size formula is necessary.

Table 8
Results for sample size calculation based on the method in Brookes et al. (2004). Here, n is the required sample size. Empirical power of the proposed test under the calculated sample size is reported based on 1000 data replications.

5 APPLICATION TO AIDS DATA

We illustrated the proposed method with a data from the AIDS Clinical Trials Group (ACTG) protocol 175 (Hammer et al., 1996), a study that randomized subjects to four different daily regimens: zidovudine (ZDV) monotherapy, ZDV + didanosine (ddI), ZDV + zalcitabine (zal) and ddI monotherapy. We focused on comparing two treatments: ZDV+ddI (treatment 1) and ZDV+zal (treatment 0). There are 522 subjects in treatment 1 and 524 subjects in treatment 0. Following Lu et al. (2013), we considered the CD4 counts (cells/mm3) at 20±5 weeks after randomization as the response and used two covariates for subgroup identification: age (years) and homosexual activity (0=no, 1=yes), denoted as homo.

We applied the proposed method to detect whether there is a subgroup with an enhanced treatment effect. A linear model was used for the baseline function. The value of the test statistic is 21.25, which is calculated based on 200 × 50 grid points of the sphere coordinates. The p-value based on 1000 resamplings is less than 0.001, showing a strong evidence for the existence of a subgroup with an enhanced treatment effect. The estimated change-plane parameter θ^=(0.576,0.037,0.816)T. The identified subgroup includes subjects with age > 37.64 if homo = 1 or with age > 15.58 if homo = 0. There are 622 subjects included in this subgroup, among which 315 subjects received treatment 1 and 307 received treatment 0. Given the estimated change-plane and the fitted linear model for the baseline mean function, we can estimate the enhanced treatment effect τ, which is τ^=41.6. Therefore, for patients in the identified subgroup, treatment 1 is better than treatment 0. This agrees with the findings in Lu et al. (2013) that treatment 1 is better than treatment 0 for older patients.

Next, based on the AIDS data, we calculated the required sample size for subgroup detection in future balanced randomized trials. As an illustration, we considered the test with size α = 0.05 and power 1–β = 0.9. From the AIDS data, we estimated the standard deviation of ε as σ^=145.9. Therefore, we set σ=σ^ and the true change-plane parameter as θ0 = (−0.576, 0.037, −0.816)T. Covariate age is assumed from a normal distribution with estimated mean 35.33 and standard deviation 8.75 and covariate homo is from a Bernoulli distribution with the success probability 0.66, similar to those in the AIDS data. For simplicity, we set the difference μ(X) – h(X, β0) = 0. The estimated sample sizes for different treatment effect sizes are given in Table 9. The estimated sample size has a wide range, which is common in practice since the weaker the treatment effect is, the larger the sample size is required. For this AIDS study, there were 1046 subjects receiving either treatment 1 or treatment 0. Therefore, the proposed test can approximately achieve 90% power for identifying a subgroup with an enhanced treatment effect τ = 60.

Table 9
Required sample sizes for detecting a subgroup with an enhanced treatment effect τ based on the AIDS study data.

6 DISCUSSION

In this paper, based on a change-plane analysis technique, we developed a doubly-robust testing procedure for detecting a subgroup with an enhanced treatment effect. We established the asymptotic distributions of the proposed test statistic under both the null and the local alternative hypotheses. We also developed its associated sample size calculation method, which is useful for sizing a clinical trial with desired power for subgroup detection.

In our current work, the subgroup with an enhanced treatment effect is defined by a change-plane, which may be restrictive sometimes. It is feasible to extend the way for defining a subgroup from a change-plane to more general forms, for example, a combination of multiple change-planes 1(θ10 + θ11X1 + θ12X2 ≥ 0) and 1(θ20 + θ21X3 ≥ 0). However, a more complicated form for defining a subgroup requires a more comprehensive way of searching the supremum of squared score-type statistics over the possible space, which may be challenging. One assumption made in the considered semiparametric model is that the enhanced treatment effect in the subgroup is constant. It is also interesting to study a more general situation that the magnitude of the enhanced treatment effect varies for subjects in the subgroup. Lastly, it is likely that many covariates are collected at the baseline but not all of them are useful for subgroup detection. Therefore, a built-in variable selection for subgroup detection will be very helpful, which warrants further investigation.

Acknowledgments

The authors would like to thank an associate editor and two referees for their thoughtful and constructive comments, which help to improve an earlier version of the paper. This work was partly supported by a NIH grant P01 CA142538.

Appendix: Proofs of Theorems

Proof of Theorem 1

By Taylor expansion and assumptions A1-A2, we have

1ni=1nψ1(Zi,η~;θ)=1ni=1n{ψ1(Zi,η0;θ)K1TC11ψ2(Z,β0)K2TC21ψ3(Z,γ0)}+op(1)=1ni=1nψ1(Zi,η0;θ)+op(1),

where ψ1* (Zi, η0; θ)'s are i.i.d. with mean 0 under the null when either the propensity score model or the baseline mean function is correctly specified.

In addition, by assumptions A3-A5, we can show that the class F={ψ1(Z,η0;θ):θ=1} is P-Donsker. Therefore, n12i=1nψ1(Zi,η~;θ) converges weakly to a mean zero Gaussian process with the convariance function E{ψ1(Z,η0;θ1)ψ1(Z,η0;θ2)}, where θ1, θ2 [set membership] Θ. Finally, it is easy to show that the variance estimator V~S(θ) converges uniformly to E{ψ12(Z,η0;θ)} for θ [set membership] Θ under both the null and the local alternative hypotheses. Therefore, the results established in Theorem 1 hold.

Proof of Theorem 2

Under the local alternatives, we have the same asymptotic representation (6). In addition,

1ni=1nψ1(Zi,η0;θ)=1ni=1n1(θTXi0){Aiπ(Xi,γ0)}{Yih(Xi,β0)δnA11(θ0TXi0)}+1ni=1nδ1(θTXi0,θ0TXi0)Ai{Aiπ(Xi,γ0)}.

The terms in the first summation are i.i.d. with mean 0 under the local alternatives when either the propensity score mode or the baseline mean function is correctly specified. As in Theorem 1, it can be show that the first summation term converges weakly to the same mean zero Gaussian process as 1ni=1nψ1(Zi,η0;θ) does under the null. In addition, it can be shown that the second summation term converges uniformly to δE[{1(θ0TX0,θTX0)π0(X){1π(X,γ0)}] for θ [set membership] Θ. Therefore, the results established in Theorem 2 hold.

References

  • Andrews DW. Tests for parameter instability and structural change with unknown change point. Econometrica: Journal of the Econometric Society. 1993;61(4):821–856.
  • Andrews DW. Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica. 2001;69(3):683–734.
  • Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis) uses of baseline data in clinical trials. The Lancet. 2000;355(9209):1064–1069. [PubMed]
  • Bai J. Estimation of a change point in multiple regression models. Review of Economics and Statistics. 1997;79(4):551–563.
  • Bonetti M, Gelber RD. Patterns of treatment effects in subsets of patients in clinical trials. Biostatistics. 2004;5(3):465–481. [PubMed]
  • Borm GF, Fransen J, Lemmens WA. A simple sample size formula for analysis of covariance in randomized clinical trials. Journal of clinical epidemiology. 2007;60(12):1234–1238. [PubMed]
  • Brookes ST, Whitely E, Egger M, Smith GD, Mulheran PA, Peters TJ. Subgroup analyses in randomized trials: risks of subgroup-specific analyses; power and sample size for the interaction test. Journal of clinical epidemiology. 2004;57(3):229–236. [PubMed]
  • Cai T, Tian L, Wong PH, Wei L. Analysis of randomized comparative clinical trial data for personalized treatment selections. Biostatistics. 2011;12(2):270–282. [PMC free article] [PubMed]
  • Cui L, James Hung H, Wang SJ, Tsong Y. Issues related to subgroup analysis in clinical trials. Journal of biopharmaceutical statistics. 2002;12(3):347–358. [PubMed]
  • Davies RB. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika. 1977;64(2):247–254. [PubMed]
  • Davies RB. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika. 1987;74(1):33–43. [PubMed]
  • Foster JC, Taylor JM, Ruberg SJ. Subgroup identification from randomized clinical trial data. Statistics in medicine. 2011;30(24):2867–2880. [PMC free article] [PubMed]
  • Hammer SM, Katzenstein DA, Hughes MD, Gundacker H, Schooley RT, Haubrich RH, Henry WK, Lederman MM, Phair JP, Niu M, et al. A trial comparing nucleoside monotherapy with combination therapy in hiv-infected adults with cd4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine. 1996;335(15):1081–1090. [PubMed]
  • Kuk A, Li J, Rush AJ. Recursive subsetting to identify patients in the star* d: a method to enhance the accuracy of early prediction of treatment outcome and to inform personalized care. The Journal of clinical psychiatry. 2010;71(11):1502–1508. [PubMed]
  • Liang K-Y, Self SG, Liu X. The cox proportional hazards model with change point: An epidemiologic application. Biometrics. 1990;46(3):783–793. [PubMed]
  • Lu W, Zhang HH, Zeng D. Variable selection for optimal treatment decision. Statistical methods in medical research. 2013;22(5):493–504. [PMC free article] [PubMed]
  • Murphy SA. Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2003;65(2):331–355.
  • Robins JM. Proceedings of the Second Seattle Symposium in Biostatistics. Springer; 2004. Optimal structural nested models for optimal sequential decisions. pp. 189–326.
  • Robins JM, Rotnizky A. Comment on the Bickel and Kwon article, “Inference for semiparametric models: some questions and an answer”. Statistica Sinica. 2001;11(4):920–936.
  • Rothwell PM. Subgroup analysis in randomised controlled trials: importance, indications, and interpretation. The Lancet. 2005;365(9454):176–186. [PubMed]
  • Shen J, He X. Inference for subgroup analysis with a structured logistic-normal mixture model. Journal of the American Statistical Association. 2015;110(509):303–312.
  • Song X, Pepe MS. Evaluating markers for selecting a patient's treatment. Biometrics. 2004;60(4):874–883. [PubMed]
  • Tsiatis A. Semiparametric theory and missing data. Springer Science & Business Media; 2007.
  • Van der Vaart AW. Asymptotic statistics, volume 3. Cambridge university press; 2000.
  • Wang R, Lagakos SW, Ware JH, Hunter DJ, Drazen JM. Statistics in medicinereporting of subgroup analyses in clinical trials. New England Journal of Medicine. 2007;357(21):2189–2194. [PubMed]
  • Yusuf S, Wittes J, Probstfield J, Tyroler HA. Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. Jama. 1991;266(1):93–98. [PubMed]
  • Zhao L, Tian L, Cai T, Claggett B, Wei L-J. Effectively selecting a target population for a future comparative study. Journal of the American Statistical Association. 2013;108(502):527–539. [PMC free article] [PubMed]