Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Electron J Stat. Author manuscript; available in PMC 2018 January 1.
Published in final edited form as:
Electron J Stat. 2017; 11(1): 364–384.
doi:  10.1214/17-EJS1226
PMCID: PMC5612500

Semiparametric Single-Index Model for Estimating Optimal Individualized Treatment Strategy


Different from the standard treatment discovery framework which is used for finding single treatments for a homogenous group of patients, personalized medicine involves finding therapies that are tailored to each individual in a heterogeneous group. In this paper, we propose a new semiparametric additive single-index model for estimating individualized treatment strategy. The model assumes a flexible and nonparametric link function for the interaction between treatment and predictive covariates. We estimate the rule via monotone B-splines and establish the asymptotic properties of the estimators. Both simulations and an real data application demonstrate that the proposed method has a competitive performance.

Keywords: Personalized medicine, Single index model, Semiparametric inference

1. Introduction

In modern clinical researches, the goal to achieve better outcomes as well as lower cost and burden for individual patients has generated tremendous interest in personalized medicine. Individualized treatment rules (ITRs) operationalize personalized medicine as a decision function from patient’s individual biomarkers to a recommended treatment and the optimal ITRs should be the one which maximizes clinical benefit if implemented. Specifically, if we use A to denote treatment assignment taking values of −1 and 1, X to denote all biomarker and prognostic information associated with each patient and let Y be the clinical outcome of interest (assuming large values are desirable), then an individualized treatment rule (ITR), denoted by d(x), takes a given value x of X and provides a treatment choice from {−1, 1}. Furthermore, let Pd denote the distribution of (X, A, Y ) and expectation with respect to this distribution by Ed, where the individualized treatment rule d(x) is used to assign treatments. Define the value function as V (d) = Ed(Y ). Then an optimal ITR, d0, is a rule that has the maximal value, i.e., d0 is the maximizer of V (d) over decision rules d.

There has been growing interest in developing valid inference methods for estimating the optimal ITRs, d0, using clinical trial data. With trial data, it holds V (d) = E[Y I(A = d(X))/π(A|X)] [15], where π(a|X) is the known randomization probability of A = a given X, so it is easy to see d0(x) = sign{E[Y|A = 1,X = x] – E[Y|A = −1,X = x]}, where sign(·) function is defined as sign(x) = 1 when x > 0, sign(x) = −1 when x < 0. Therefore, most of the existing methods tend to model E[Y |A = a,X = x] including the interactions between the treatment and the covariates either parametrically or nonparametrically. Such literature include likelihood-based approach [19, 18, 20], parametric Q-learning in [1], and machine learning based methods [25]. Alternatively, one can parametrically model E[Y|A = a,X = x] – E[Y|A = d0(X),X = x] which is called A-learning as discussed in [14] and [16]. Recently, directly maximizing V (d) has been proposed using support vector machine in [26] or via robust parametric models in Zhang et al. [24]. However, all parametric methods potentially suffer from model misspecification especially when X is not low-dimensional and the optimal ITRs depends on high-order interactions among X’s. On the other hand, although the nonparametric methods such as machine-learning methods are flexible, the resulting rules are complicated so may not be interpretable in practice. The latter often comes with no rigorous inference procedures as in the parametric methods.

In this paper, we propose a semiparametric single-index model to estimate the optimal ITRs. Our model retains a flexible and nonparametric formulation of the treatment-covariate interactions but also yields a simple decision rule which only depends on a linear combination of X. Specifically, our proposal assumes the following model between Y and (X,A):


where X is a p-dimensional covariate vector and may contain 1 as the intercept, βTX is a single index and both μ and ψ are unknown functions. Moreover, ψ is a monotone increasing function with ψ(0) = 0. The proposed model has the following advantages in developing individualized treatment strategy. First, it provides a more flexible interaction between the covariates and the treatment as compared to the traditional parametric models, in which we allow a fully nonparametric baseline function of the covariates X, μ(X), and a close-to nonparametric interaction between the treatment A and the covariates X. Second, we can easily derive the best treatment strategy as d0 : X – → sign(ψ(βTX)). Since ψ is increasing, the resulting rule is practically interpretable. Moreover, if ψ(0) = 0, the above treatment strategy d0 can be simplified as a simple rule:


That is, only the sign of a risk score βTX needs to be evaluated for each patient. As a separate note, single index models have been studied extensively in literature with a number of inference methods developed, including the average derivative method [5], the sliced inverse regression [12, 3, 11], the iterative average derivative method [6] and other related methods [23]. Estimating both the single index and the link function at the same time has also been studied in [9, 8, 4]. However, none of these works have considered the single index model for estimating the optimal ITRs, especially that our model (1) assumes the main effect of X, μ(X), to be fully nonparametric.

The rest of the paper is organized as follows. In Section 2, we provide a full inference procedure for the proposed semiparametric single index model. Extensive simulation studies are presented in Section 3 and a real data analysis is presented in Section 4, followed by a discussion section.

2. Inference Procedure

Note that model (1) remains the same if we replace ψ(x) by ψ(rx) for any r > 0. Therefore, for identifiability, we further require ||β|| = 1 where || · ||is the Euclidean [ell]2-norm in Rp. Assume that data are obtained from a randomized trial with i.i.d observations (Yi,Xi,Ai), i = 1, ..., n. The randomization probability P(A = a|X) = π(a|X) is known by the trial design.

To avoid estimating the nonparametric function μ(X) when making inference for β, we first observe that,


Therefore, a natural estimate of β is obtained by minimizing the least square, given as


subject to ||β|| = 1. Since ψ is an increasing function, we approximate ψ(x) using monotone B-spline basis [2, 10],


where N1(x), ...,NK<sub>n</sub>+M(x) are B-spline basis, Kn is the number of interior knots with equal partition in an interval containing βTX and M is B-spline order, i.e., for cubic B-spline, M = 4. The condition ξ1(...)ξK<sub>n</sub>+M assures monoticity of the ψ(·) function [10]. Additionally, we impose an upper bound Mn for the summation of absolute values of all the B-spline coefficients of ψ(·) for theoretical consideration. Mn is a constant depending on n and the rate of Mn is given in Section 3. Thus, the minimization becomes


Set d = Kn + M. The objective function in (1) is quadratic in ξ and quite nonlinear in β. The constraint ||β|| = 1 is nonlinear in the elements of β. The inequality constraint in (1) is linear in ξ since it can be expressed as ≤ 0, where ξ = (ξ1, (...) , ξd)T and B is a (d–1) ×d matrix with B(i, i) = 1,B(i, i+1) = −1 and the rest of its entries being zero. To facilitate the implementation, we now propose an iterative estimation algorithm to solve (1). In particular, we iteratively solve β with ξ fixed at their current values, and then solve ξ with β fixed at their current values, and repeat them until the convergence criterion is met. The computation procedure can be summarized as the following.

  • Step 1: Get an initial estimator [beta](0). For example, we can set Nj(βTX) = β TX as a linear function in (1) and compute the ordinary least squares (OLS) estimator for β. Normalize β(0) such that ||β(0)|| = 1. Set [ell] = 0.
  • Step 2: Given the initial estimates of the index values {Zi = [beta]([ell])TXi, i = 1, (...) , n}, minimize over ξ by solving the followng quadratic programming (QP) problem:
    Denote the solution as [Xi w/ hat]([ell]).
  • Step 3: Fix ξ at the current values, minimize
    Denote the solution as [beta]([ell]+1). This problem can be solved using the nonlinear least squares (NLS) algorithm.
  • Step 4: Set [ell] = [ell] + 1. Go to Step 2 and iterate until convergence, i.e. ||[beta]([ell])[beta]([ell]−1)|| ≤ ε (1 + ||[beta]([ell]−1)||) and ||[Xi w/ hat]([ell])[Xi w/ hat]([ell]−1)|| ≤ ε (1 + ||[Xi w/ hat]([ell]−1)||) for a small ε > 0, which takes value 1e-3 in our numerical studies.

In our numerical examples, we use the MATLAB’s optimization toolbox: the function quadprog() for QP in Step 2 and lsqnonlin() for NLS in Step 3. In this paper, we choose cubic B-spline for all numerical studies and real data application. Our algorithm usually converges in less than 10 iterations.

Given Kn, we choose to place the interior knots at equally-spaced sample quantile of the predictor variable, which is βTX in this context. For example, if there are 4 interior knots, then they would be respectively at the 20th, 40th, 60th, 80th percentile. The boundary knots are naturally chosen as the minimum and maximum values of the predictor variable. During the iteration, the estimated single index β could change at each step, therefore the knots also change in the iteration. The number of knots Kn can be tuned with cross-validation. In general, 5 to 10 knots will be sufficient to have very good results.

3. Asymptotic Results

We establish the asymptotic properties of the estimators ([beta]n, [psi]n), including their consistency under certain metric, the convergence rates, and the asymptotic distribution of n(β^n-β0). We need the following conditions.

  • (C.1) β0 is assumed to be in the unit ball B of Rp and X has a compact support. In addition, E(XXTβ0TX) is positive definite. and E[Xβ0TX=x] is kth continuously differentiable with bounded derivatives for some k > 3.
  • (C.2) ψ0 has bounded kth derivative in an open interval containing the support of β0TX for some k > 3; moreover, ψ0(0)>0.
  • (C.3) E[ψ0(β0TX)βTX] is continuously differentiable in β and moreover,

Under these conditions, we first obtain the consistency and convergence rate of ([beta]n, [psi]n).

Theorem 1

Under (C.1)–(C.3), we further assume Kn = C1nγ and Mn = C2nτ for some positive constants C1, C2 with γ > 0, τ ≥ 0, and 11γ + 9τ ≤ 1, 2τ ≤ (2k − 5)γ. Let 0 < ν < 1/2, then




where Ws,∞ is the Sobolev space consisting of functions with bounded lth derivatives for any ls. Furthermore, the Sobolev norm is defined as ||ψ|| W1,∞[a,b] = maxα≤1 ||ψ(α)||L[a,b].

The asymptotic distribution of [beta]n is stated in the following theorem.

Theorem 2

In addition to (C.1)–(C.3), we assume Kn = C1nγ and Mn = C2nτ for some positive constants C1, C2 with γ > 1/(4k − 4), τ ≥ 0 and 11γ +9τ ≤ 1, 2τ ≤ (2k − 5)γ. Then n(β^n-β0) converges in distribution to a mean-zero normal distribution with covariance 1-121-1, where




Based on Theorem 2, a consistent estimator for the asymptotic covariance is given by ^1-1^2^1-1 in which [Sigma]1 and [Sigma]2 are given as follows. Then an estimator for Σ1 is given as




an estimator for Σ2 is given by


Under Theorem 1, it is clear that both [Sigma]1 and [Sigma]2 are consistent estimators for Σ1 and Σ2 respectively when the sample size converges to infinity. Finally, we estimate the optimal decision rule as sign(β^nTX). Under such a rule, for any subject, the reward gain of using the optimal rule vs the non-optimal rule is estimated to be 2n[ψ^n(β^nTX)].

4. Numerical Studies

In this section, we conduct extensive simulations to investigate the empirical performance of our proposed method. We first use three examples (Examples I–III) to compare our method with the inverse probability weighted estimator( IPWE), augmented inverse probability weighted estimator(AIPWE) in [24] and ordinary least square based on minimizing


Finally, in Example IV, we investigate the performance of our method under model misspecification (i.e. when ψ(·) is not monotone).

We consider the model Y = μ(X) + ψ(βTX)A + ε where X is generated uniformly from [−1, 1]p, A is generated as −1 and 1 with equal probability 0.5 and the noise ε follows a normal distribution with mean 0 and standard deviation σ = 0.5. The four examples are:

  • Example I : p = 2, μ(X)=X1X2+X22, ψ(u) = 2u3 − 1, β0=12(1,-1)T.
  • Example II : p = 3, μ(X)=X12+2X1X2, ψ(u) = exp(u) − 1, β0=13(1,-1,1)T.
  • Example III : p = 4, μ(X)=X1X2+X32, ψ(u) = u3 − 1, β0=12(1,-1,1,-1)T.
  • Example IV : p = 3, μ(X)=X1X2+X32, ψ(u) = cos(2u) + sin(4u), β0=13(1,-1,1)T.

To evaluate the estimation performance of the single index coefficient, we report its bias and the mean squared error MSE(β) = average over replications of ||[beta]β0||2/p. To evaluate the estimation performance of the link function, we report its mean squared error MSE(ψ) = average over replications of 1ni=1nψ^(β^TXi)-ψ(β0TXi)2. To evaluate the accuracy of a treatment assignment rule sign(βTX), we calculate the percentage of making correct decisions (PCD), i.e. 1-12ni=1nsign(ψ^(β^TXi))-sign(ψ(β0TXi)). We also study the behavior of the value function estimates. Based on the estimated rule, the value function can be estimated as 1ni=1nYi1(Ai=gi)P(Ai=giXi), where gi is the estimated rule. We compare the proposed method with [24] in terms of parameter estimates, percentage of making correct decisions (PCD) and value function estimates.

From Tables 13, we observe that our method shows better results compared with the inverse probability weighted estimator (IPWE) and the augmented inverse probability weighted estimator (AIPWE) [24] in terms of smaller bias of estimated single index coefficient, smaller mean square error of estimated link function. In most cases, the bias of estimated single index coefficient of our proposed approach is about ten times smaller than the other two approaches. As a result, our method also makes more correct decisions and gives estimated value function much closer to its theoretical value. We also note that as sample size increases, the mean squared error of the single index coefficient and estimated link function for three methods decreases, the PCD increases and the estimated value function gets closer to the true value function. However, Table 2 indicates that the ordinary least square method performs comparably with our method but gives larger PCD than all the other methods when ψ(0) = 0. This is simply because that, ψ′ > 0,

Table 1
Estimation and classification results for Example I. PCD denotes percentage of correct decisions, Val denotes value function estimates based on large sample. We report mean of estimated single index coefficient biases, mean squared errors of estimated ...
Table 2
Estimation and classification results for Example II. Other captions are the same as Table 1.
Table 3
Estimation and classification results for Example III. Other captions are the same as Table 1.

Table 4 indicates that all the methods are much worse under model misspecification. However, our method is still better compared to IPWE, AIPWE and the ordinary least square method. We also investigate our proposed inferential procedure for the single index coefficient β. It shows in Table 5 that, as sample size increases, the empirical standard error and the mean estimated standard error are getting closer to each other. For almost all cases, the empirical coverage rates are very close to the nominal level, as expected.

Table 4
Estimation and classification results for Example IV. Other captions are the same as Table 1.
Table 5
Inference for the single index parameters of Example 1–3. std1: empirical standard deviation, std2: mean estimated standard deviation, cover: empirical coverage rate of 95% confidence intervals.

5. Data application

To further illustrate the performance of our method, we consider its application to data from AIDS Clinical Trials Group Protocol 175 (ACTG175). The complete data contain 2139 HIV-infected subjects with study subjects randomized to four different treatment groups: zidovudine (ZDV) monotherapy, ZDV + didanosine (ddI), ZDV + zalcitabine and ddI monotherapy. The CD4 count (cells/mm3 ) at 20±5 weeks post-baseline is chosen as the continuous response Y, where large values are desired. Among all subjects, 524 subjects received the treatments ZDV + didanosine (ddI) and 522 subjects received the treatment ZDV + zalcitabine. For illustration purpose, we consider these two group of patients with the goal to find their individualized optimal treatment rules. We use A = 1 to denote treatment ZDV + zalcitabine and A = −1 to denote treatment ZDV + didanosine (ddI). Besides the treatment indicator, we also include two covariates: age and homosexual activity (in short as homo), which are selected as important covariates in [13].

We apply the proposed method to estimate the optimal treatment and perform statistical inference for the corresponding parameters. The estimates for the single index coefficients are 0.902, −0.036, and 0.430 respectively and the estimated variance of the single index coefficients are 0.2232, 0.0004 and 0.0984, respectively. The optimal treatment rule is sign(0.902-0.036×age+0.430×homo). That is, if 0.902-0.036×age+0.430×homo ≥ 0, the optimal treatment for this patient is ZDV + zalcitabine, otherwise, the optimal treatment is ZDV + didanosine( ddI). In other words, for a patient with homo = 0, the optimal treatment A = −1 if age > 25.2 and the optimal treatment A = 1 otherwise; while for a patient with homo = 1, the optimal treatment A = −1 if age > 37.2 and the optimal treatment A = 1 otherwise. We note that the age of study subjects ranges from 12 to 70. According to the estimated optimal rule, 565 out of 1046 patients (54.02%) in this subset should be assigned to treatment ZDV+didanosine (ddI).

An external file that holds a picture, illustration, etc.
Object name is nihms846272u1.jpg

6. Discussion

In this paper, we proposed a novel semiparametric single-index model for individualized treatment selection. Our model plays an important role as a compromise between parametric models and nonparametric models [24]. The decision rule based on our method is a simple linear combination of covariates. We provide statistical inference for this rule. The asymptotic properties for the proposed method are established. The proposed method demonstrates superior numerical behavior in terms of smaller bias and means square error. Based on the estimated rule, our method also provides more precise decisions than existing methods and gives more precise value function estimates.

In many clinical studies, the state space is often of very high dimension. To develop optimal individualized treatment rules in this case, it will be important to develop simultaneous variable selection and treatment rule estimation. Variable selection techniques such as penalized regression and variable screening can be nested into our semiparametric single index modeling framework as powerful tools to develop optimal individualized treatment rules.

In our current procedure, we assume the propensity score π(A|X) is known. In observational studies, the propensity scores are often unknown. For such observational data, we can estimate π(A|X) via logistic regression and plug-in the estimated propensity score funtion π(A|X) into the optimization equation (1). It is beyond the scope of the current work and is an interesting topic for future study.

Fig 1
Estimation performance for link function based on mean of 10 replications of Example 1–4 when n = 500.


Rui Song’s research is partially supported by the NSF grant DMS-1555244 and NCI grant P01 CA142538. Donglin Zeng’s research is partially supported by NIH grants U01-NS082062 and R01GM047845. Wenbin Lu’s research is partially supported by NCI grant P01 CA142538. Hao Helen Zhang’s research is partially supported by the grants NSF DMS-1309507, DMS-1418172 and NSFC 11571009. Zhiguo Li’s research is partially supported by NCI grant P01 CA142538.


AMS 2000 subject classifications: Primary 62G05; secondary 62G99.

Contributor Information

Rui Song, Department of Statistics, North Carolina State University.

Shikai Luo, Department of Statistics, North Carolina State University.

Donglin Zeng, Department of Biostatistics, University of North Carolina.

Hao Helen Zhang, Department of Mathematics, University of Arizona.

Wenbin Lu, Department of Statistics, North Carolina State University.

Zhiguo Li, Department of Biostatistics and Bioinformatics, Duke University.


1. Chakraborty B, Murphy S, Strecher V. Inference for non-regular parameters in optimal dynamic treatment regimes. Statistical methods in medical research. 2010;19:317–343. [PMC free article] [PubMed]
2. De Boor C. A practical guide to splines. Mathematics of Computation 1978
3. Duan N, Li K-C. Slicing regression: a link-free regression method. The Annals of Statistics. 1991:505–530.
4. Hardle W, Hall P, Ichimura H, et al. Optimal smoothing in single-index models. The annals of Statistics. 1993;21:157–178.
5. Horowitz JL, Härdle W. Direct semiparametric estimation of single-index models with discrete covariates. Journal of the American Statistical Association. 1996;91:1632–1640.
6. Hristache M, Juditsky A, Spokoiny V. Direct estimation of the index coefficient in a single-index model. Annals of Statistics. 2001:595–623.
7. Huang J, et al. Efficient estimation for the proportional hazards model with interval censoring. The Annals of Statistics. 1996;24:540–568.
8. Ichimura H. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics. 1993;58:71–120.
9. Klein RW, Spady RH. An efficient semiparametric estimator for binary response models. Econometrica: Journal of the Econometric Society. 1993:387–421.
10. Leitenstorfer F, Tutz G. Generalized monotonic regression based on B-splines with an application to air pollution data. Biostatistics. 2007;8:654–673. [PubMed]
11. Li K-C. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association. 1991;86:316–327.
12. Li K-C, Duan N. Regression analysis under link violation. The Annals of Statistics. 1989:1009–1052.
13. Lu W, Zhang HH, Zeng D. Variable selection for optimal treatment decision. Statistical methods in medical research. 2011 0962280211428383. [PMC free article] [PubMed]
14. Murphy SA. Optimal dynamic treatment regimes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2003;65:331–355.
15. Qian M, Murphy SA. Performance guarantees for individualized treatment rules. Annals of statistics. 2011;39:1180. [PMC free article] [PubMed]
16. Robins JM. Proceedings of the Second Seattle Symposium in Biostatistics. Springer; 2004. Optimal structural nested models for optimal sequential decisions; pp. 189–326.
17. Schumaker L. Spline functions: basic theory. Cambridge University Press; 2007.
18. Thall PF, Sung H-G, Estey EH. Selecting therapeutic strategies based on efficacy and death in multicourse clinical trials. Journal of the American Statistical Association. 2002:97.
19. Thall PF, Millikan RE, Sung H-G, et al. Evaluating multiple treatment courses in clinical trials. Statistics in medicine. 2000;19:1011–1028. [PubMed]
20. Thall PF, Wooten LH, Logothetis CJ, Millikan RE, Tannir NM. Bayesian and frequentist two-stage treatment strategies based on sequential failure times subject to interval censoring. Statistics in medicine. 2007;26:4687–4702. [PubMed]
21. van der Vaart AW. Asymptotic statistics. Vol. 3. Cambridge university press; 2000.
22. van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes. Springer; 1996.
23. Xia Y. Asymptotic distributions for two estimators of the single-index model. Econometric Theory. 2006;22:1112–1137.
24. Zhang B, Tsiatis AA, Laber EB, Davidian M. A robust method for estimating optimal treatment regimes. Biometrics. 2012;68:1010–1018. [PMC free article] [PubMed]
25. Zhao Y, Zeng D, Socinski MA, Kosorok MR. Reinforcement learning strategies for clinical trials in nonsmall cell lung cancer. Biometrics. 2011;67:1422–1433. [PMC free article] [PubMed]
26. Zhao Y, Zeng D, Rush AJ, Kosorok MR. Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association. 2012;107:1106–1118. [PMC free article] [PubMed]