Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC5553128

Formats

Article sections

- Abstract
- 1 INTRODUCTION
- 2 CHANGE-PLANE ANALYSIS
- 3 SAMPLE SIZE CALCULATION
- 4 SIMULATION STUDIES
- 5 APPLICATION TO AIDS DATA
- 6 DISCUSSION
- References

Authors

Related links

J Am Stat Assoc. Author manuscript; available in PMC 2017 August 11.

Published in final edited form as:

Published online 2017 April 13. doi: 10.1080/01621459.2016.1166115

PMCID: PMC5553128

NIHMSID: NIHMS777894

Department of Statistics, North Carolina State University, Raleigh, NC 27695.

The publisher's final edited version of this article is available at J Am Stat Assoc

We propose a systematic method for testing and identifying a subgroup with an enhanced treatment effect. We adopts a change-plane technique to first test the existence of a subgroup, and then identify the subgroup if the null hypothesis on non-existence of such a subgroup is rejected. A semiparametric model is considered for the response with an unspecified baseline function and an interaction between a subgroup indicator and treatment. A doubly-robust test statistic is constructed based on this model, and asymptotic distributions of the test statistic under both null and local alternative hypotheses are derived. Moreover, a sample size calculation method for subgroup detection is developed based on the proposed statistic. The finite sample performance of the proposed test is evaluated via simulations. Finally, the proposed methods for subgroup identification and sample size calculation are applied to a data from an AIDS study.

Classical clinical trials are designed to assess therapeutic benefits of treatments for the whole population that has been considered. However, due to patients’ heterogeneity in response to treatments, it is likely that a new treatment is effective or has an enhanced effect compared to a standard treatment only for a specific subpopulation. By making use of patient-specific baseline covariates, subgroup analysis aims to identify subgroups of patients with enhanced treatment effects, which can help to narrow down the target population of a treatment. Hence, it provides an important tool for assessing treatment effects and selecting target populations for future studies.

A number of data-driven approaches have been developed for the subgroup identification. Song and Pepe (2004) considered the binary response case and proposed using the selection impact curve (SIC) to evaluate treatment policies dictated by a single covariate. Then, based on the SIC, an optimal division of the population for assigning treatments can be obtained. Bonetti and Gelber (2004) grouped patients by values of a single covariate and estimated treatment effects on overlapping subsets of patients using a moving average procedure. Kuk et al. (2010) used recursive subsetting algorithm for identifying subgroups who respond to treatment with high prediction accuracy for clinical outcomes. Foster et al. (2011) developed a “Virtual Twins” method which first predicted the probabilities of response to treatment and control, and then used tree methods to obtain the subgroups with an enhanced treatment effect. Cai et al. (2011) and Zhao et al. (2013) proposed using parametric scoring systems based on multiple baseline covariates to rank treatment effects and then identified patients who benefit more from the new treatment. There are, however, well known risks for undertaking subgroup analysis (Assmann et al., 2000; Wang et al., 2007). For example, subgroup identification may suffer from false positive findings without being performed with a sound statistical hypothesis testing procedure.

Recently, Shen and He (2015) considered a linear logistic-normal mixture model for the response and developed a likelihood-based test for the existence of a subgroup. If a subgroup exists as indicated by the test, the fitted logistic regression model for the subgroup indicator can be used to score patients for treatment selection. The method proposed in Shen and He (2015) provides a valid test for detecting the subgroup with the following two limitations. First, the method relies on some parametric assumptions, such as linear covariate effects and a logistic-normal mixture model for the response, which may be restrictive in applications. Second, since the subgroup is defined by a latent binary variable, the fitted logistic probability for the subgroup indicator is used for treatment selection. This requires selecting a proper threshold parameter, which can be subjective.

In this paper, we consider change-plane analysis for subgroup detection and sample size calculation. Our contribution over the literature can be summarized in the following three folds. First, we consider a semiparametric model with an unspecified baseline function and an interaction between a subgroup indicator and the treatment for the mean response, which greatly improves the flexibility of the response models considered in the literature. In addition, the subgroup indicator is explicitly defined by a change-plane as a function of covariates. Second, adopting techniques similar as those in change-point analysis (Liang et al., 1990; Andrews, 1993; Bai, 1997), we propose a doubly-robust score-type statistic for testing the existence of a subgroup with an enhanced treatment effect. The proposed test is doubly-robust in the sense that it is valid when either the baseline function or the propensity score model is correctly specified. If the null hypothesis that a subgroup does not exist is rejected, the change-plane that defines the subgroup can be estimated by maximizing the score-type statistic. Third, we derive the asymptotic distributions of the proposed statistic under both the null and the local alternative hypotheses. A resampling method is developed to approximate the asymptotic null distribution of the test statistic. Based on the derived asymptotic distributions, we also propose a sample size calculation procedure to design a randomized clinical trial for subgroup detection, which has been seldom studied in the literature.

The remainder of this paper is organized as follows. Section 2 introduces the considered semi-parametric model and the proposed doubly-robust score test statistic for subgroup detection. The asymptotic distributions of the test statistic under both the null and local alternative hypotheses are also presented. Section 3 presents a sample size calculation procedure based on the proposed test. The numerical performance of the proposed test and the associated sample size calculation method are evaluated by simulation studies in Section 4. An application of the proposed method to a data from the AIDS Clinical Trials Group protocol 175 is illustrated in Section 5. The paper is concluded with some discussions in Section 6. All the technical derivations are given in the Appendix.

Let ** X** denote the baseline covariates collected for a subject in an experimental or observational study,

$${Y}_{i}=\mu \left({\mathit{X}}_{i}\right)+\tau {A}_{i}1({\theta}^{T}{\mathit{X}}_{i}\ge 0)+{\u03f5}_{i},$$

(1)

where *μ*(**X**) is an unknown baseline mean function for patients in treatment 0, 1(·) is the indicator function, and *E*(*ε _{i}|A_{i}, X_{i}*)) = 0. We assume that the first element of

The proposed model is flexible since it puts no assumptions on the baseline mean function. On the other hand, it places constraints on the form of subgroup and the treatment effect for the subgroup, which are directly related to our goal of subgroup detection and identification. Semipara-metric models analogous to model (1) have been considered in the literature for deriving optimal treatment regimes (Murphy, 2003; Robins, 2004). The difference is the way that the interaction between treatment *A* and covariates ** X** is modeled. For the subgroup identification problem, we consider the interaction term

$${H}_{0}:\tau =0\phantom{\rule{thickmathspace}{0ex}}\text{versus}\phantom{\rule{thickmathspace}{0ex}}{H}_{a}:\tau \ne 0.$$

(2)

When * θ* is known, model (1) fits in the class of semiparametric models considered in Robins and Rotnizky (2001). Based on the semiparametric theory (Tsiatis, 2007), a class of doubly-robust estimating equations for

$$\sum _{i=1}^{n}\lambda \left({\mathit{X}}_{i}\right)\{{A}_{i}-\pi \left({\mathit{X}}_{i}\right)\}\left\{{Y}_{i}-h\left({\mathit{X}}_{i}\right)-\tau {A}_{i}1\left({\theta}^{T}{\mathit{X}}_{i}\ge 0\right)\right\}=0,$$

(3)

where *λ*(* X*) and

Under the assumption that the random errors *ε _{i}*'s are homoscedastic, the most efficient doubly-robust estimating equation is obtained by setting

$$\sum _{i=1}^{n}{\psi}_{1}({\mathit{Z}}_{i},\stackrel{~}{\eta};\theta )\equiv \sum _{i=1}^{n}1\left({\theta}^{T}{\mathit{X}}_{i}\ge 0\right)\{{A}_{i}-\pi ({\mathit{X}}_{i},\stackrel{~}{\gamma})\}\left\{{Y}_{i}-h({\mathit{X}}_{i},\stackrel{~}{\beta})\right\},$$

where $\stackrel{~}{\eta}={({\stackrel{~}{\beta}}^{T},{\stackrel{~}{\gamma}}^{T})}^{T}$, $\stackrel{~}{\beta}$ is an estimator of *β* under the null, and $\stackrel{~}{\gamma}$ is an estimator of *γ*. Specifically, $\stackrel{~}{\beta}$ and $\stackrel{~}{\gamma}$ are solutions to the following equations

$${\Psi}_{2n}\left(\beta \right)=\frac{1}{n}\sum _{i=1}^{n}{\psi}_{2}({\mathit{Z}}_{i},\beta )=\frac{1}{n}\sum _{i=1}^{n}{D}_{\beta}\left({\mathit{X}}_{i}\right)\{{Y}_{i}-h({\mathit{X}}_{i},\beta )\}=0,$$

$${\Psi}_{3n}\left(\gamma \right)=\frac{1}{n}\sum _{i=1}^{n}{\psi}_{3}({\mathit{Z}}_{i},\gamma )=\frac{1}{n}\sum _{i=1}^{n}{D}_{\gamma}\left({\mathit{X}}_{i}\right)\{{A}_{i}-\pi ({\mathit{X}}_{i},\gamma )\}=0,$$

where *D _{β}*(

Here, although two parametric models are considered for fitting the baseline mean and propensity score functions, we do not require that both models hold in our theoretical derivation. In fact, our theoretical results show that the proposed test is valid when the model for either the baseline mean or propensity score function is correctly specified but not necessarily both, i.e. the so-called doubly robust property. In particular, when the propensity score is known as in randomized clinical trials, the proposed test is valid for any nonparametric baseline mean function, regardless of correctness of the posited linear model. Such theoretical results were also justified by our simulation studies. In this sense, although the proposed test has a parametric form, it is semiparametric in nature.

Note that model (1) does not depend on * θ* when

$${T}_{n}=\underset{\theta \in \u03f4}{\mathrm{sup}}\frac{{\left\{{\sum}_{i=1}^{n}{\psi}_{1}({\mathit{Z}}_{i},\stackrel{~}{\eta};\theta )\right\}}^{2}}{n{\stackrel{~}{V}}_{S}\left(\theta \right)},$$

(4)

where $\u03f4=\{\theta \in {\mathbb{R}}^{p+1}:\Vert \theta \Vert =1\}$ and ${\stackrel{~}{V}}_{S}\left(\theta \right)$ is a consistent estimator for the asymptotic variance of ${n}^{-1\u22152}{\sum}_{i=1}^{n}{\psi}_{1}({\mathit{Z}}_{i},\stackrel{~}{\eta};\theta )$ under the null hypothesis. The definition of ${\stackrel{~}{V}}_{S}\left(\theta \right)$ is given in the next section.

To compute the test statistic *T _{n}*, we need to find the supremum of squared score test statistics over a unit ball in ${\mathbb{R}}^{p+1}$. Since it is infeasible to get the supremum explicitly, we use a numerical method to find the maximum over the space

$$\{\begin{array}{c}{\theta}_{0}=\mathrm{cos}\left({\varphi}_{1}\right),\hfill \\ \cdots \hfill \\ {\theta}_{p-1}=\mathrm{sin}\left({\varphi}_{1}\right)\mathrm{sin}\left({\varphi}_{2}\right)\cdots \mathrm{cos}\left({\varphi}_{p}\right),\hfill \\ {\theta}_{p}=\mathrm{sin}\left({\varphi}_{1}\right)\mathrm{sin}\left({\varphi}_{2}\right)\cdots \mathrm{sin}\left({\varphi}_{p}\right).\hfill \end{array}\phantom{\}}$$

We consider a set of grid points of *ϕ* over [0, *π*]^{p-1} × [0, 2*π*) and compute the maximum of squared score statistics over the set of grid points to approximate *T _{n}*.

In the next section, we establish the asymptotic distributions of *T _{n}* under both the null and the local alternative hypotheses. In addition, we propose a resampling method to compute the critical values of the limiting null distribution. When the null hypotheses is rejected, the change-plane parameter

$$\widehat{\theta}=\underset{\theta \in \u03f4}{\mathrm{arg}\phantom{\rule{thinmathspace}{0ex}}\mathrm{sup}}\frac{{\left\{{\sum}_{i=1}^{n}{\psi}_{1}\frac{({\mathit{Z}}_{i},\stackrel{~}{\eta};\theta )}{}\right\}}^{2}}{n{\stackrel{~}{V}}_{S}\left(\theta \right)}.$$

(5)

Similar methods for estimating a change point have been studied in the literature (e.g. Bai, 1997). Thus the estimated subgroup with an enhanced treatment effect is $1({\widehat{\theta}}^{T}\mathit{X}\ge 0)$.

Define **Ψ**_{2}(*β*) = *E*{**Ψ**_{2n}(*β*)} and **Ψ**_{3}(*γ*) = *E*{**Ψ**_{3n}(*γ*)}. To establish the asymptotic distributions of *T _{n}*, we make the following assumptions:

- A1. Equations
**Ψ**_{2}(*β*) = 0 and**Ψ**_{3}(*γ*) = 0 have unique solutions*β*_{0}and*γ*_{0}, respectively, and the solutions ${\eta}_{0}={({\beta}_{0}^{T},{\gamma}_{0}^{T})}^{T}$ are in a compact set of the parameter space. - A2. We have$$\sqrt{n}(\stackrel{~}{\beta}-{\beta}_{0})=-{C}_{1}^{-1}\frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\psi}_{2}({\mathit{Z}}_{i},{\beta}_{0})+{o}_{p}\left(1\right),$$where$$\sqrt{n}(\stackrel{~}{\gamma}-{\gamma}_{0})=-{C}_{2}^{-1}\sum _{i=1}^{n}{\psi}_{3}({\mathit{Z}}_{i},{\gamma}_{0})+{o}_{p}\left(1\right),$$
*C*_{1}=*E*{*ψ*_{2}(,**Z***β*)/*β*|^{T}_{β=β0}},*C*_{2}=*E*{*ψ*_{3}(,**Z***γ*)/*γ*|^{T}_{γ=γ0}}, and both of them are finite and positive definite deterministic matrices. - A3. The function
*ψ*_{1}(,**Z**;**η**) is twice continuously differentiable with respect to*θ*, and has bounded first and second derivatives.**η** - A4. The function
*E*[{*Y*–*h*(,**X***β*)}^{2}] is uniformly bounded in*β*. - A5. We have 0 <
*P*(≥ 0) < 1 for any**θ**^{T}**X***θ***Θ**.

Assumptions A1 and A2 ensure the consistency and asymptotic normality of $\stackrel{~}{\beta}$ and $\stackrel{~}{\gamma}$ These assumptions are satisfied for many commonly used parametric models under mild conditions, such as a linear model for *h*(* X, β*) and a logistic model for

Suppose that either the baseline mean function h(**X**, β) or the propensity model π(**X**, γ) is correctly specified, but not necessarily both. If Assumptions A1-A5 hold, T_{n} converges in distribution to sup_{θ Θ} G^{2}(**θ**) under H_{0} as n goes to infinity , where{G(**θ**) : **θ** **Θ**} is a mean zero Gaussian process with the covariance function

$$\Sigma ({\theta}_{1},{\theta}_{2})=E\left\{{\psi}_{1\ast}(\mathit{Z},{\eta}_{0};{\theta}_{1}){\psi}_{1\ast}(\mathit{Z},{\eta}_{0};{\theta}_{2})\right\}\u2215\sqrt{E{\psi}_{1\ast}^{2}(\mathit{Z},{\eta}_{0};{\theta}_{1})E{\psi}_{1\ast}^{2}(\mathit{Z},{\eta}_{0};{\theta}_{2})}$$

*for any*
*θ*_{1}, *θ*_{2} **Θ**, *where*

$${\psi}_{1\ast}(\mathit{Z},{\eta}_{0};\theta )={\psi}_{1}(\mathit{Z},{\eta}_{0};\theta )-{K}_{1}^{T}{C}_{1}^{-1}{\psi}_{2}(\mathit{Z},{\beta}_{0})-{K}_{2}^{T}{C}_{2}^{-1}{\psi}_{3}(\mathit{Z},{\gamma}_{0}),$$

*K*_{1} = *E*{*ψ*_{1}(* Z*,

Next, we establish the asymptotic distribution of *T _{n}* under a sequence of local alternatives

Suppose that either the baseline mean function h(**X**, β) or the propensity model π(* X*,

$$\mu \left(\theta \right)=\delta E\left[\left\{1({\theta}_{0}^{T}\mathit{X}\ge 0,{\theta}^{T}\mathit{X}\ge 0){\pi}_{0}\left(\mathit{X}\right)\{1-\pi (\mathit{X},{\gamma}_{0})\right\}\right]\u2215\sqrt{E\left\{{\psi}_{1\ast}^{2}(\mathit{Z},{\eta}_{0};\theta )\right\}}$$

*and the covariance function*
**Σ**(*θ*_{1}, *θ*_{2}), *where θ_{0} is true value of θ and π*

To calculate the critical values for the test, we use a resampling method to approximate the limiting null distribution of the test statistic. Define

$${\widehat{\psi}}_{1\ast}(\mathit{Z},\stackrel{~}{\eta};\theta )={\psi}_{1}(\mathit{Z},\stackrel{~}{\eta};\theta )-{\widehat{K}}_{1}^{T}{\widehat{C}}_{1}^{-1}{\psi}_{2}(\mathit{Z},\stackrel{~}{\beta})-{\widehat{K}}_{2}^{T}{\widehat{C}}_{2}^{-1}{\psi}_{3}(\mathit{Z},\stackrel{~}{\gamma}),$$

where ${\widehat{K}}_{1}$, ${\widehat{K}}_{2}$, ${\widehat{C}}_{1}$, and ${\widehat{C}}_{2}$, are the empirical estimates of their population counterparts. Specifically, ${\widehat{K}}_{1}={\scriptstyle \frac{1}{n}}{\sum}_{i=1}^{n}\partial {\psi}_{1}({\mathit{Z}}_{i},\stackrel{~}{\eta};\theta )\u2215\partial \beta $, ${\widehat{K}}_{2}={\scriptstyle \frac{1}{n}}{\sum}_{i=1}^{n}\partial {\psi}_{1}({\mathit{Z}}_{i},\stackrel{~}{\eta};\theta )\u2215\partial \gamma $, ${\widehat{C}}_{1}={\scriptstyle \frac{1}{n}}{\sum}_{i=1}^{n}\partial {\psi}_{2}({\mathit{Z}}_{i},\stackrel{~}{\beta})\u2215\partial {\beta}^{T}$ and ${\widehat{C}}_{2}={\scriptstyle \frac{1}{n}}{\sum}_{i=1}^{n}\partial {\psi}_{3}({\mathit{Z}}_{i},\stackrel{~}{\gamma})\u2215\partial {\gamma}^{T}$. Then, ${\stackrel{~}{V}}_{S}\left(\theta \right)={n}^{-1}{\sum}_{i=1}^{n}{\left\{{\widehat{\psi}}_{1\ast}(\mathit{Z},\stackrel{~}{\eta};\theta )\right\}}^{2}$. We consider the following perturbed test statistic

$${T}_{n}^{\ast}=\underset{\theta \in \u03f4}{\mathrm{sup}}\frac{{\left\{{\sum}_{i=1}^{n}{\xi}_{i}{\widehat{\psi}}_{1\ast}({\mathit{Z}}_{i},\stackrel{~}{\eta};\theta )\right\}}^{2}}{n{\stackrel{~}{V}}_{S}\left(\theta \right)},$$

where *ξ*_{1}, ..., *ξ _{n}* are i.i.d. standard normal random variables independent of data. By generating a large number of perturbed test statistics, we can use the empirical distribution of ${T}_{n}^{\ast}$ to compute the critical value

Since most clinical trials are designed to detect the overall treatment effect, they may lack power to detect a subgroup with an enhanced treatment effect (Yusuf et al., 1991; Rothwell, 2005). For example, Brookes et al. (2004) has shown that a trial with 80% power for the overall effect had only 29% power to detect an interaction effect of the same magnitude. To appropriately conduct a subgroup analysis with targeted power, a careful design and predefined statistical analysis protocol are important (Assmann et al., 2000; Cui et al., 2002). In this section, we provide a sample size calculation method based on the proposed test for subgroup detection in a randomized clinical trial.

To derive the sample size formula, we first calculate the asymptotic power of the test under the local alternatives *H*_{1n} : *τ* = *n*^{−1/2δ}, where *n* is the sample size. The sample size formula can then be derived at a pre-specified power 1 – *β*. Here we are interested in sample size calculation for a randomized trial, therefore the propensity score is given and there is no need to estimate *γ*. In addition, we assume that the errors *ε _{i}*'s in model (1) are i.i.d. with mean 0 and variance

$$\Sigma ({\theta}_{1},{\theta}_{2})=\frac{E\left\{1({\theta}_{1}^{T}\mathit{X}\ge 0,{\theta}_{2}^{T}\mathit{X}\ge 0)g\left(\mathit{X}\right)\right\}}{\sqrt{E\left\{1({\theta}_{1}^{T}\mathit{X}\ge 0)g\left(\mathit{X}\right)\right\}E\left\{1({\theta}_{2}^{T}\mathit{X}\ge 0)g\left(\mathit{X}\right)\right\}}},$$

where *g*(* X*) =

$$\mu \left(\theta \right)=\delta \pi (1-\pi )\frac{E\left\{1({\theta}_{0}^{T}\mathit{X}\ge 0,{\theta}^{T}\mathit{X}\ge 0)\right\}}{\sqrt{E\left\{1({\theta}^{T}\mathit{X}\ge 0)g\left(\mathit{X}\right)\right\}}}.$$

For an *α*-level test to have 1 – *β* power in detecting an enhanced treatment effect of size *τ*_{0}, we need to find *δ*_{0} such that $P({\mathrm{sup}}_{\theta \in \u03f4}\phantom{\rule{thinmathspace}{0ex}}{G}_{{\delta}_{0}}^{2}\left(\theta \right)>{q}_{\alpha})=1-\beta $, where *q _{α}* is the upper

We conducted extensive simulation studies to investigate the empirical performance of the proposed test for subgroup detection and the estimation for the change-plane parameter * θ* under the alternative hypotheses. In particular, we considered various settings to examine the robustness of the test against the misspecification of the baseline mean function in both randomized and observational studies.

Simulated data with sample sizes *n* = 500 and 1000 were generated based on model (1), where two covariates ** X** = (

- - P-Model I:
*π*() = 0.5;*X* - - P-Model II: $\pi \left(\mathit{X}\right)={\scriptstyle \frac{\mathrm{exp}({\gamma}_{0}+{\gamma}_{1}{X}_{1}+{\gamma}_{2}{X}_{2})}{1+\mathrm{exp}({\gamma}_{0}+{\gamma}_{1}{X}_{1}{X}_{1}+{\gamma}_{2}{X}_{2}}}$,
*γ*_{0}= 0,*γ*_{1}=*γ*_{2}= 0.5.

The two settings represent a randomized clinical trial and an observational study, respectively. We also considered three baseline mean functions for *μ*(* X*) (In short as B-Model hereinafter):

- - B-Model I:
*μ*() =**X***β*_{0}+*β*_{1}*X*_{1}+*β*_{2}*X*_{2},*β*_{0}=*β*_{1}=*β*_{2}= 1; - - B-Model II:
*μ*() =**X***β*_{0}+*β*_{1}*X*_{1}+ ${\beta}_{3}{\mathit{X}}_{2}^{2}$,*β*_{0}= 1,*β*_{1}= 0.5,*β*_{2}= 0,*β*_{3}= 1; - - B-Model III:
*μ*() =**X***β*_{0}+*β*_{1}sin(*β*_{2}*X*_{1}+*β*_{3}π*X*_{2}),*β*_{0}=*β*_{1}=*β*_{2}=*β*_{3}= 1.

The proposed test was implemented on each simulated dataset. When calculating the test statistic in (4), we fit a linear model *h*(* X*,

For each setting, we simulated 5000 data sets to compute type I errors of the test with the significance level of 0.05 and 0.1. The results in Table 1 show that the empirical type I errors are all close to their nominal values, which demonstrate the validity and robustness of the proposed test for subgroup detection.

Under alternative hypotheses, the enhanced treatment effect in the subgroup is set to be *τ* = ±0.1, ± 0.25 and ±0.5. In addition, the true change-plane parameter is chosen as *θ*_{0} = (−0.15, 0.3, 0.942)* ^{T}* for all settings. With this choice of

Empirical powers based on 1000 simulated datasets for all settings are shown in Table 2. As expected, the powers for detecting the subgroup increase as the sample size *n* or the magnitude of treatment effect *τ* increases. When the magnitude of treatment effect *τ* increases to 0.5, the powers are almost 100% for all settings. The powers for B-Model II are comparable to those for B-Model I, while the powers for B-Model III are relatively smaller than those of B-Models I and II. One explanation may be that since B-Model III has a very nonlinear baseline mean function, a posited linear model may not be a good fit and thus lost some efficiency.

Next, we estimated the change-plane parameter * θ* by (5). We report the bias and the empirical standard deviation of the estimates $\widehat{\theta}$ in Figure 1. We also report the misclassification rate for identifying the true subgroup in Table 3. The misclassification rate is the proportion of subjects who are misidentified either as members in the subgroup or as members not in the subgroup, and is calculated by ${\scriptstyle \frac{1}{n}}{\sum}_{i=1}^{n}\mid 1({\widehat{\theta}}^{T}\phantom{\rule{thinmathspace}{0ex}}{\mathit{X}}_{i}\ge 0)-1({{\theta}_{0}}^{T}\phantom{\rule{thinmathspace}{0ex}}{\mathit{X}}_{i}\ge 0)\mid $.

Misclassification rate (%) of identified subgroup based on estimated change-plane parameter *θ*. (Standard errors are shown in parentheses.)

Based on the results, it is observed that the biases and standard deviations of the estimates decrease as the sample size or the magnitude of treatment effect increase. In particular, when the magnitude of treatment effect increases to 0.5, all the estimate are nearly unbiased. For small treatment effects, the estimators for ** θ** are underestimated. This may be because that the true

For practical use, we also report the computational time for conducting the proposed test for different sample sizes *n* and number of grid points *K* on the unit ball **Θ.**
Table 4 summarizes the average computational time (in seconds) and its standard deviation over 1000 simulation runs for the setting with B-Model I, P-Model I, *τ* = 0.5 and *M* = 1000. The average computational time increases almost linearly in *n* and *K*. But even with *n* = 2000 and *K* = 10000, the average computational time is less than 2 minutes.

Furthermore, we compare the proposed method with the test of Shen and He (2015) (named EM test) in terms of power for detecting the subgroup and with the method of Zhao et al. (2013) in terms of accuracy for identifying the true subgroup. Specifically, when comparing with the EM test of Shen and He (2015), we consider the simulation settings with P-Model I and B-Model I/II/III at *τ* = 0, 0.1 and 0.5. Note that under B-Model I, since the null model is a linear model, the EM test is a valid test. The type-I error and power results are reported in Table 5. We only present the results for B-Model I. Based on the results, we observe that when the baseline mean model is linear (B-Model I), the EM test has correct type I error at both levels, however, its empirical power is smaller than that of the proposed test, especially for *τ* = 0.1. When the baseline mean model is not linear (B-Model II and III), the model assumption for the EM test of Shen and He (2015) is violated. Our results (not presented here) show that the EM test has severely inflated type I error and hence is not valid as expected, while the proposed test still has correct type I error due to the doubly robust property.

Type I error and power (%) of the EM test and proposed test for B-Model I at 0.05 and 0.1 levels. (Standard errors are shown in parenthesis.)

In Zhao et al. (2013), the subgroup of interest is defined by $\{\widehat{D}\left(\mathit{X}\right)\ge c\}$, where $\widehat{D}\left(\mathit{X}\right)$ is the estimated treatment difference and *c* is a threshold that will be determined based on data. Specifically, they propose an estimate of the average treatment effect in the subgroup defined by $\{\widehat{D}\left(\mathit{X}\right)\ge c\}$ and denote it by $\widehat{\mathrm{AD}}\left(c\right)$. Then, under the monotonicity assumption, the threshold *c* is a chosen to solve the equation $\widehat{\mathrm{AD}}\left(c\right)=\tau $, where *τ* is a desired treatment effect in the subgroup. In our implementation, following Zhao et al. (2013), we first fit a linear model: $E(Y\mid \mathit{X},A)={\beta}_{1}^{T}\mathit{X}+\left({\beta}_{2}^{T}\mathit{X}\right)A$. Note that in our notation, the covariates * X* include a column of 1 as the first column. The estimated treatment difference is $\widehat{D}\left(\mathit{X}\right)={\widehat{\beta}}_{2}^{T}\mathit{X}$ with the subgroup defined by $\{{\widehat{\beta}}_{2}^{T}\mathit{X}\ge c\}$ Then, we obtain the estimated average treatment difference $\widehat{\mathrm{AD}}\left(c\right)$ in the subgroup and determine the threshold

Misclassification rates (%) of the identified subgroups using Zhao et al. (2013)'s method and our method. (Standard errors are shown in parenthesis.)

In this section, we conducted a simulation study to evaluate the proposed sample size calculation procedure for a randomized trial with the equal treatment assignment probability, i.e. *π* = 0.5. In this simulation study, we considered a single covariate *X* from a uniform distribution on (−1,1) in all settings. The subgroup of interest is defined by *X* ≥ *θ*_{0}, where *θ*_{0} was chosen as: −0.5, 0 and 0.5, corresponding to the scenarios that 75%, 50% and 25% of the population are in the subgroup with an enhanced treatment effect. The variance of the random noise *ε* is set as *σ*^{2} = 0.25. We considered three levels of treatment effects: *τ* = 0.1, 0.25 and 0.5, which represent small, medium and large effects, respectively. In addition, we considered three baseline mean functions: *μ*(*X*) = 1 + *X*, 1 – *X*^{2} and 1 + $\mathrm{sin}\left(\stackrel{~}{\pi}X\right)$, where $\stackrel{~}{\pi}$ is the circumference to diameter ratio. In our test statistics, we fit a linear model *h*(*X*, *β*) for *μ*(*X*). After some calculation, it can be shown that when *μ*(*X*) = 1 – *X*^{2}, *h*(*X*, *β*_{0}) = 2/3; while when $\mu \left(X\right)=1+\mathrm{sin}\left(\stackrel{~}{\pi}X\right)$, $h(X,{\beta}_{0})=1+(3\u2215\stackrel{~}{\pi})X$. Therefore, the difference *μ*(*X*) – *h*(*X*, *β*_{0}) can be calculated accordingly. We calculate the required sample size *n* for the test with level *α* = 0.05 and power 1 − *β* = 90%. For each setting, based on the calculated sample size *n*, we generated 1000 data sets and computed the empirical power of the proposed test statistic. Table 7 summarizes the results. We observe that the empirical powers under all settings are close to the nominal level 90%, which shows the validity of the proposed sample size formula.

Results for sample size calculation. Here, *n* is the required sample size given by our procedure. Empirical power of the test under the calculated sample size is reported based on 1000 data replications.

For comparison, we also compute the required sample size for subgroup analysis based on a simple method proposed by Brookes et al. (2004). The main idea of Brookes et al. (2004) is to inflate the original sample size for testing the overall treatment effect such that the interaction test between treatment and subgroup can achieve the nominal level. For example, when the subgroup contains half of the population, the required sample size for testing the treatment-subgroup interaction is *n* = 4*n _{overall}*, where

Table 8 shows the calculated sample size using the above formula for level *α* = 0.05 and power 1 – *β* = 0.9. We also report the corresponding empirical power of our proposed test under the calculated sample size. Based on these results, the sample sizes calculated using the method of Brookes et al. (2004) are all smaller than those obtained using the proposed sample size formula given in Table 7, and the corresponding power is much lower than the desired 90% level. This demonstrates that simply inflating the sample size for testing the overall treatment effect may not work well for detecting the subgroup and the proposed delicate sample size formula is necessary.

Results for sample size calculation based on the method in Brookes et al. (2004). Here, *n* is the required sample size. Empirical power of the proposed test under the calculated sample size is reported based on 1000 data replications.

We illustrated the proposed method with a data from the AIDS Clinical Trials Group (ACTG) protocol 175 (Hammer et al., 1996), a study that randomized subjects to four different daily regimens: zidovudine (ZDV) monotherapy, ZDV + didanosine (ddI), ZDV + zalcitabine (zal) and ddI monotherapy. We focused on comparing two treatments: ZDV+ddI (treatment 1) and ZDV+zal (treatment 0). There are 522 subjects in treatment 1 and 524 subjects in treatment 0. Following Lu et al. (2013), we considered the CD4 counts (cells/mm^{3}) at 20±5 weeks after randomization as the response and used two covariates for subgroup identification: age (years) and homosexual activity (0=no, 1=yes), denoted as homo.

We applied the proposed method to detect whether there is a subgroup with an enhanced treatment effect. A linear model was used for the baseline function. The value of the test statistic is 21.25, which is calculated based on 200 × 50 grid points of the sphere coordinates. The p-value based on 1000 resamplings is less than 0.001, showing a strong evidence for the existence of a subgroup with an enhanced treatment effect. The estimated change-plane parameter $\widehat{\theta}={(-0.576,0.037,-0.816)}^{T}$. The identified subgroup includes subjects with *age* > 37.64 if *homo* = 1 or with *age* > 15.58 if *homo* = 0. There are 622 subjects included in this subgroup, among which 315 subjects received treatment 1 and 307 received treatment 0. Given the estimated change-plane and the fitted linear model for the baseline mean function, we can estimate the enhanced treatment effect *τ*, which is $\widehat{\tau}=41.6$. Therefore, for patients in the identified subgroup, treatment 1 is better than treatment 0. This agrees with the findings in Lu et al. (2013) that treatment 1 is better than treatment 0 for older patients.

Next, based on the AIDS data, we calculated the required sample size for subgroup detection in future balanced randomized trials. As an illustration, we considered the test with size *α* = 0.05 and power 1–*β* = 0.9. From the AIDS data, we estimated the standard deviation of *ε* as $\widehat{\sigma}=145.9$. Therefore, we set $\sigma =\widehat{\sigma}$ and the true change-plane parameter as *θ*_{0} = (−0.576, 0.037, −0.816)* ^{T}*. Covariate

In this paper, based on a change-plane analysis technique, we developed a doubly-robust testing procedure for detecting a subgroup with an enhanced treatment effect. We established the asymptotic distributions of the proposed test statistic under both the null and the local alternative hypotheses. We also developed its associated sample size calculation method, which is useful for sizing a clinical trial with desired power for subgroup detection.

In our current work, the subgroup with an enhanced treatment effect is defined by a change-plane, which may be restrictive sometimes. It is feasible to extend the way for defining a subgroup from a change-plane to more general forms, for example, a combination of multiple change-planes 1(*θ*_{10} + *θ*_{11}*X*_{1} + *θ*_{12}*X*_{2} ≥ 0) and 1(θ_{20} + θ_{21}*X*_{3} ≥ 0). However, a more complicated form for defining a subgroup requires a more comprehensive way of searching the supremum of squared score-type statistics over the possible space, which may be challenging. One assumption made in the considered semiparametric model is that the enhanced treatment effect in the subgroup is constant. It is also interesting to study a more general situation that the magnitude of the enhanced treatment effect varies for subjects in the subgroup. Lastly, it is likely that many covariates are collected at the baseline but not all of them are useful for subgroup detection. Therefore, a built-in variable selection for subgroup detection will be very helpful, which warrants further investigation.

The authors would like to thank an associate editor and two referees for their thoughtful and constructive comments, which help to improve an earlier version of the paper. This work was partly supported by a NIH grant P01 CA142538.

By Taylor expansion and assumptions A1-A2, we have

$$\begin{array}{cc}\hfill \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\psi}_{1}({\mathit{Z}}_{i},\stackrel{~}{\eta};\theta )=& \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\left\{{\psi}_{1}({\mathit{Z}}_{i},{\eta}_{0};\theta )-{K}_{1}^{T}{C}_{1}^{-1}{\psi}_{2}(\mathit{Z},{\beta}_{0})-{K}_{2}^{T}{C}_{2}^{-1}{\psi}_{3}(\mathit{Z},{\gamma}_{0})\right\}+{o}_{p}\left(1\right)\hfill \\ \hfill =& \frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\psi}_{1\ast}({\mathit{Z}}_{i},{\eta}_{0};\theta )+{o}_{p}\left(1\right),\hfill \end{array}$$

where *ψ*_{1*} (*Z** _{i}*,

In addition, by assumptions A3-A5, we can show that the class $\mathcal{F}=\{{\psi}_{1\ast}(\mathit{Z},{\eta}_{0};\theta ):\Vert \theta \Vert =1\}$ is P-Donsker. Therefore, ${n}^{-1\u22152}{\sum}_{i=1}^{n}{\psi}_{1}({\mathit{Z}}_{i},\stackrel{~}{\eta};\theta )$ converges weakly to a mean zero Gaussian process with the convariance function $E\left\{{\psi}_{1\ast}(\mathit{Z},{\eta}_{0};{\theta}_{1}){\psi}_{1\ast}(\mathit{Z},{\eta}_{0};{\theta}_{2})\right\}$, where *θ*_{1}, *θ*_{2} **Θ**. Finally, it is easy to show that the variance estimator ${\stackrel{~}{V}}_{S}\left(\theta \right)$ converges uniformly to $E\left\{{\psi}_{1\ast}^{2}(\mathit{Z},{\eta}_{0};\theta )\right\}$ for **θ****Θ** under both the null and the local alternative hypotheses. Therefore, the results established in Theorem 1 hold.

Under the local alternatives, we have the same asymptotic representation (6). In addition,

$$\frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\psi}_{1}({\mathit{Z}}_{i},{\eta}_{0};\theta )\phantom{\rule{0ex}{0ex}}=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}1\left({\theta}^{T}{\mathit{X}}_{i}\ge 0\right)\{{A}_{i}-\pi ({\mathit{X}}_{i},{\gamma}_{0})\}\left\{{Y}_{i}-h({\mathit{X}}_{i},{\beta}_{0})-\frac{\delta}{\sqrt{n}}{A}_{1}1\left({\theta}_{0}^{T}{\mathit{X}}_{i}\ge 0\right)\right\}\phantom{\rule{0ex}{0ex}}+\frac{1}{n}\sum _{i=1}^{n}\delta 1\left({\theta}^{T}{\mathit{X}}_{i}\ge 0,{\theta}_{0}^{T}{\mathit{X}}_{i}\ge 0\right){A}_{i}\{{A}_{i}-\pi ({\mathit{X}}_{i},{\gamma}_{0})\}.$$

The terms in the first summation are i.i.d. with mean 0 under the local alternatives when either the propensity score mode or the baseline mean function is correctly specified. As in Theorem 1, it can be show that the first summation term converges weakly to the same mean zero Gaussian process as ${\scriptstyle \frac{1}{\sqrt{n}}}{\sum}_{i=1}^{n}{\psi}_{1}({\mathit{Z}}_{i},{\eta}_{0};\theta )$ does under the null. In addition, it can be shown that the second summation term converges uniformly to $\delta E\left[\{1({\theta}_{0}^{T}\mathit{X}\ge 0,{\theta}^{T}\mathit{X}\ge 0){\pi}_{0}\left(\mathit{X}\right)\{1-\pi (\mathit{X},{\gamma}_{0})\}\right]$ for *θ***Θ**. Therefore, the results established in Theorem 2 hold.

- Andrews DW. Tests for parameter instability and structural change with unknown change point. Econometrica: Journal of the Econometric Society. 1993;61(4):821–856.
- Andrews DW. Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica. 2001;69(3):683–734.
- Assmann SF, Pocock SJ, Enos LE, Kasten LE. Subgroup analysis and other (mis) uses of baseline data in clinical trials. The Lancet. 2000;355(9209):1064–1069. [PubMed]
- Bai J. Estimation of a change point in multiple regression models. Review of Economics and Statistics. 1997;79(4):551–563.
- Bonetti M, Gelber RD. Patterns of treatment effects in subsets of patients in clinical trials. Biostatistics. 2004;5(3):465–481. [PubMed]
- Borm GF, Fransen J, Lemmens WA. A simple sample size formula for analysis of covariance in randomized clinical trials. Journal of clinical epidemiology. 2007;60(12):1234–1238. [PubMed]
- Brookes ST, Whitely E, Egger M, Smith GD, Mulheran PA, Peters TJ. Subgroup analyses in randomized trials: risks of subgroup-specific analyses; power and sample size for the interaction test. Journal of clinical epidemiology. 2004;57(3):229–236. [PubMed]
- Cai T, Tian L, Wong PH, Wei L. Analysis of randomized comparative clinical trial data for personalized treatment selections. Biostatistics. 2011;12(2):270–282. [PMC free article] [PubMed]
- Cui L, James Hung H, Wang SJ, Tsong Y. Issues related to subgroup analysis in clinical trials. Journal of biopharmaceutical statistics. 2002;12(3):347–358. [PubMed]
- Davies RB. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika. 1977;64(2):247–254. [PubMed]
- Davies RB. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika. 1987;74(1):33–43. [PubMed]
- Foster JC, Taylor JM, Ruberg SJ. Subgroup identification from randomized clinical trial data. Statistics in medicine. 2011;30(24):2867–2880. [PMC free article] [PubMed]
- Hammer SM, Katzenstein DA, Hughes MD, Gundacker H, Schooley RT, Haubrich RH, Henry WK, Lederman MM, Phair JP, Niu M, et al. A trial comparing nucleoside monotherapy with combination therapy in hiv-infected adults with cd4 cell counts from 200 to 500 per cubic millimeter. New England Journal of Medicine. 1996;335(15):1081–1090. [PubMed]
- Kuk A, Li J, Rush AJ. Recursive subsetting to identify patients in the star* d: a method to enhance the accuracy of early prediction of treatment outcome and to inform personalized care. The Journal of clinical psychiatry. 2010;71(11):1502–1508. [PubMed]
- Liang K-Y, Self SG, Liu X. The cox proportional hazards model with change point: An epidemiologic application. Biometrics. 1990;46(3):783–793. [PubMed]
- Lu W, Zhang HH, Zeng D. Variable selection for optimal treatment decision. Statistical methods in medical research. 2013;22(5):493–504. [PMC free article] [PubMed]
- Murphy SA. Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2003;65(2):331–355.
- Robins JM. Proceedings of the Second Seattle Symposium in Biostatistics. Springer; 2004. Optimal structural nested models for optimal sequential decisions. pp. 189–326.
- Robins JM, Rotnizky A. Comment on the Bickel and Kwon article, “Inference for semiparametric models: some questions and an answer”. Statistica Sinica. 2001;11(4):920–936.
- Rothwell PM. Subgroup analysis in randomised controlled trials: importance, indications, and interpretation. The Lancet. 2005;365(9454):176–186. [PubMed]
- Shen J, He X. Inference for subgroup analysis with a structured logistic-normal mixture model. Journal of the American Statistical Association. 2015;110(509):303–312.
- Song X, Pepe MS. Evaluating markers for selecting a patient's treatment. Biometrics. 2004;60(4):874–883. [PubMed]
- Tsiatis A. Semiparametric theory and missing data. Springer Science & Business Media; 2007.
- Van der Vaart AW. Asymptotic statistics, volume 3. Cambridge university press; 2000.
- Wang R, Lagakos SW, Ware JH, Hunter DJ, Drazen JM. Statistics in medicinereporting of subgroup analyses in clinical trials. New England Journal of Medicine. 2007;357(21):2189–2194. [PubMed]
- Yusuf S, Wittes J, Probstfield J, Tyroler HA. Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. Jama. 1991;266(1):93–98. [PubMed]
- Zhao L, Tian L, Cai T, Claggett B, Wei L-J. Effectively selecting a target population for a future comparative study. Journal of the American Statistical Association. 2013;108(502):527–539. [PMC free article] [PubMed]