
Bernoulli (Andover). Author manuscript; available in PMC 2011 January 1.

Published in final edited form as:

Bernoulli (Andover). 2010; 16(1): 274–300.

PMCID: PMC2832228

NIHMSID: NIHMS120716

Yanyuan Ma and Runze Li

Yanyuan Ma, Department of Statistics, Texas A&M University, College Station, TX 77843;

Yanyuan Ma: ma@stat.tamu.edu; Runze Li: rli@stat.psu.edu

**Summary**

Measurement error data, or errors-in-variables data, are collected in many studies. Natural criterion functions are often unavailable for general functional measurement error models because of the lack of information on the distribution of the unobservable covariates. Parameter estimation is typically carried out by solving estimating equations, and the construction of such estimating equations routinely requires solving integral equations, so the computation is often much more intensive than for ordinary regression models. Because of these difficulties, traditional best subset variable selection procedures are not applicable, and variable selection in the measurement error model context remains an unsolved issue. In this paper, we develop a framework for variable selection in measurement error models via penalized estimating equations. We first propose a class of selection procedures for general parametric measurement error models and for general semiparametric measurement error models, and study the asymptotic properties of the proposed procedures. Then, under certain regularity conditions and with a properly chosen regularization parameter, we demonstrate that the proposed procedure performs as well as an oracle procedure. We assess the finite sample performance via Monte Carlo simulation studies and illustrate the proposed methodology through an empirical analysis of a real data set.

**1. Introduction**

In regression analysis, some covariates can often only be measured imprecisely or indirectly, resulting in measurement error models, also known as errors-in-variables models in the literature. Various statistical procedures have been developed for inference in measurement error models (Carroll, Ruppert, Stefanski and Crainiceanu, 2006). The study of linear measurement error models dates back to Bickel and Ritov (1987), who provided an efficient estimator. Stefanski and Carroll (1987) constructed consistent estimators for generalized linear measurement error models. Recently, Tsiatis and Ma (2004) extended the model framework to an arbitrary parametric regression setting. Liang, Härdle and Carroll (1999) proposed partially linear measurement error models, and Ma and Carroll (2006) studied generalized partially linear measurement error models. Further active research has recently been carried out in the nonparametric measurement error area; see, for example, Delaigle and Hall (2007) and Delaigle and Meister (2007). The goal of this paper is to develop a class of variable selection procedures for general measurement error models. We emphasize that the scope of the paper is not limited to generalized linear models.

This study was motivated by examining the effect of systolic blood pressure (SBP), a covariate measured with error, and the effects of three error-free covariates, namely serum cholesterol, age and smoking status, on the probability of the occurrence of heart disease. In our initial analysis, we include interactions between SBP and the other covariates, interactions among those covariates, and quadratic terms of covariates, in order to reduce modeling bias. Our preliminary analysis found that some interactions and quadratic terms are not significant and should be excluded to achieve a parsimonious model. In seeking to select significant variables in further analysis, we realized that the traditional AIC and BIC criteria are not well defined for the model we consider in Section 4.4. Recently, Liang and Li (2009) proposed a class of variable selection procedures for partially linear measurement error models using penalized least squares and penalized quantile regression. However, their procedures are not applicable beyond partially linear models, for instance to partially linear logistic regression models, and therefore cannot be applied to the model in Section 4.4 either. In fact, variable selection for general parametric or semiparametric measurement error models is challenging. One major difficulty is the lack of a likelihood function in these models, due to the difficulty of obtaining the distribution of the error-prone covariates. For example, let *Y* denote the response variable, *X* the unobservable covariate, and *W* an observed surrogate of *X*. The likelihood of a single observation (*w, y*) is then ∫ *p*_{*Y*|*X*}(*y*|*x*, *β*)*p*_{*W*|*X*}(*w*|*x*)*p*_{*X*}(*x*) d*μ*(*x*), which cannot be evaluated because the distribution *p*_{*X*} of the unobservable covariate is unknown.

The variable selection procedure we propose is a penalized estimating equation method, which applies to both parametric and semiparametric measurement error models. In addition, the penalized estimating equation method is applicable to any set of consistent estimating equations. Note that the measurement error model we consider here is completely general and not limited to generalized linear models. Variable selection and feature selection are very active research topics in the current literature. Candès and Tao (2007) and Fan and Lv (2008) studied variable selection for linear models when the sample size is much smaller than the dimension of the regression parameter space. Their results are inspiring, but are valid only for linear models under very strong assumptions on the design matrix or the distribution of the covariates. In this paper we therefore follow Fan and Peng (2004) and consider the setting in which the number of regression coefficients diverges to infinity at a certain rate as the sample size tends to infinity. We systematically study the asymptotic properties of the proposed estimator. It is worth pointing out that the theoretical results in this paper provide explicit asymptotic properties when the dimension of the regression coefficients increases with the sample size. This advances the current literature, where estimation and inference for measurement error models are studied only for a fixed, finite dimensional parameter. In our asymptotic analysis, we show that with a proper choice of the regularization parameter and the penalty function, our estimator possesses the oracle property, which roughly means that the estimate is as good as when the true model is known (Fan and Li, 2001). We also demonstrate that the oracle property holds in a simpler form in the more familiar setting where the true number of regression coefficients is fixed.

In addition, we address practical implementation of the proposed methodology. It is desirable to have an automatic, data-driven method to select the regularization parameter. To this end, we propose GCV-type and BIC-type tuning parameter selectors for the proposed penalized estimating equation method. Monte Carlo simulation studies are conducted to assess the finite sample performance in terms of model complexity and model error. In our simulation studies, both tuning parameter selectors result in sparse models, while the BIC-type selector outperforms the GCV-type selector.

The rest of the paper is organized as follows. In Section 2, we propose a new class of variable selection procedures for parametric measurement error models and study their asymptotic properties. We develop a new variable selection procedure for semiparametric measurement error models in Section 3. Implementation issues and numerical examples are presented in Section 4, where we describe data-driven automatic tuning parameter selection methods (Section 4.1), define the concept of approximate model error to evaluate the selected model (Section 4.2), carry out a simulation study to assess the finite sample performance of the proposed procedures (Section 4.3), and illustrate our method on an example (Section 4.4). Technical details are collected in the Appendix.

**2. Parametric measurement error models**

A general parametric measurement error model has two parts, written as

$${p}_{Y\mid X,Z}(Y\mid X,Z,\beta )\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{p}_{W\mid X,Z}(W\mid X,Z,\xi ).$$

(2.1)

Here, the main model is *p*_{*Y*|*X,Z*}(*Y*|*X, Z, β*), which relates the response *Y* to the unobservable covariate *X* and the error-free covariates *Z* through the parameter of interest *β*. The error model *p*_{*W*|*X,Z*}(*W*|*X, Z, ξ*) describes the distribution of the observed surrogate *W* given the covariates, with nuisance parameter *ξ*.

Denote ${S}_{\beta}^{\ast}$ as the purported score function. That is,

$${S}_{\beta}^{\ast}(W,Z,Y)=\frac{\partial log\int {p}_{W\mid X,Z}(W\mid X,Z){p}_{Y\mid X,Z}(Y\mid X,Z){p}_{X\mid Z}^{\ast}(X\mid Z)d\mu (X)}{\partial \beta},$$

where
${p}_{X\mid Z}^{\ast}(X\mid Z)$ is a conditional pdf that one posits, which may or may not equal the true pdf *p*_{*X*|*Z*}(*X*|*Z*). Let *a*(*X, Z*) be the solution of the integral equation

$$E[{E}^{\ast}\{a(X,Z)\mid W,Z,Y\}\mid X,Z]=E\{{S}_{\beta}^{\ast}(W,Z,Y)\mid X,Z\},$$

where *E*^{*} indicates that the expectation is calculated using the posited
${p}_{X\mid Z}^{\ast}(X\mid Z)$. Note that here and in the sequel, a model
${p}_{X\mid Z}^{\ast}(X\mid Z)$ has to be proposed in order to actually construct the estimator. Define

$${S}_{\mathit{eff}}^{\ast}(W,Z,Y)={S}_{\beta}^{\ast}(W,Z,Y)-{\text{E}}^{\ast}\{a(X,Z)\mid W,Z,Y\}.$$

To select significant variables and estimate the corresponding parameters simultaneously, we propose the penalized estimating equations for model (2.1) as

$$\sum _{i=1}^{n}{S}_{\mathit{eff}}^{\ast}({W}_{i},{Z}_{i},{Y}_{i},\beta )-n{\stackrel{.}{p}}_{\lambda}(\beta )=0,$$

(2.2)

where
${\stackrel{.}{p}}_{\lambda}(\beta )={\{{p}_{\lambda}^{\prime}({\beta}_{1}),\cdots ,{p}_{\lambda}^{\prime}({\beta}_{d})\}}^{T}$ and
${p}_{\lambda}^{\prime}(\cdot )$ is the first order derivative of a penalty function *p*_{*λ*}(·). Solving (2.2) gives the estimate of *β*. In practice, we may allow different coefficients to have penalty functions with different regularization parameters; for example, we may wish to keep some variables in the model without penalizing their coefficients. For ease of presentation, we assume in this paper that the penalty function and the regularization parameter are the same for all coefficients.

The penalties in the classic variable selection criteria, such as AIC and BIC, cannot be applied to the penalized estimating equations. Following the study on the choice of the penalty functions in Fan and Li (2001), we use the SCAD penalty, whose first order derivative is defined as

$${p}_{\lambda}^{\prime}(\gamma )=\lambda \left\{I(\mid \gamma \mid \phantom{\rule{0.16667em}{0ex}}\le \lambda )+\frac{{(a\lambda -\mid \gamma \mid )}_{+}}{(a-1)\lambda}I(\mid \gamma \mid \phantom{\rule{0.16667em}{0ex}}>\lambda )\right\}\text{sign}(\gamma )$$

(2.3)

for any scalar *γ*, where sign(·) is the sign function, i.e., sign(*γ*) = −1, 0, 1 according to whether *γ* < 0, *γ* = 0 or *γ* > 0. Here *a* > 2 is a constant, and the choice *a* = 3.7 is appropriate from a Bayesian point of view (Fan and Li, 2001). A key property of (2.2) is that, with a proper choice of penalty function such as the SCAD penalty, the resulting estimate contains some coefficients that are exactly zero. This is equivalent to excluding the corresponding variables from the final selected model, so variable selection is achieved simultaneously with parameter estimation.
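As a concrete illustration of (2.3), the SCAD derivative can be computed directly; the following is a minimal sketch (the function name and NumPy vectorization are ours, not part of the paper):

```python
import numpy as np

def scad_derivative(gamma, lam, a=3.7):
    """First order derivative of the SCAD penalty, equation (2.3).

    Equals lam * sign(gamma) for |gamma| <= lam, decays linearly on
    (lam, a*lam], and vanishes beyond a*lam, so large coefficients
    are essentially unpenalized.
    """
    gamma = np.asarray(gamma, dtype=float)
    abs_g = np.abs(gamma)
    linear_part = (abs_g <= lam).astype(float)
    tail_part = np.maximum(a * lam - abs_g, 0.0) / ((a - 1.0) * lam) * (abs_g > lam)
    return lam * (linear_part + tail_part) * np.sign(gamma)
```

The flat region beyond *aλ* is what distinguishes SCAD from the lasso penalty, whose derivative stays at *λ* for all nonzero coefficients.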

Concerns about model bias often prompt us to build models that contain many variables, especially when the sample size is large. A reasonable way to capture this tendency is to let the dimension of the parameter *β* increase with the sample size *n*. We therefore study the asymptotic properties of the penalized estimating equation estimator in the setting where both the number of true non-zero components of *β* and the total length of *β* tend to infinity as *n* goes to infinity. Denote by *β*_{0} = (*β*_{10}, ···, *β*_{*d*_{*n*}0})^{*T*} the true value of *β*, whose dimension *d*_{*n*} grows with *n*, and define

$${a}_{n}=max\{\mid {p}_{{\lambda}_{n}}^{\prime}(\mid {\beta}_{j0}\mid )\mid :{\beta}_{j0}\ne 0\},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{b}_{n}=max\{\mid {p}_{{\lambda}_{n}}^{\u2033}(\mid {\beta}_{j0}\mid )\mid :{\beta}_{j0}\ne 0\},$$

(2.4)

where we write *λ* as *λ*_{*n*} to emphasize its dependence on the sample size *n*.

**Theorem 1.** Suppose that Condition (P1) in the Appendix holds. Under regularity conditions (A1)–(A3) in the Appendix, if ${d}_{n}^{4}{n}^{-1}\to 0$ and *λ*_{*n*} → 0 as *n* → ∞, then with probability tending to one there exists a solution *β̂*_{*n*} of (2.2) such that ||*β̂*_{*n*} − *β*_{0}|| = *O*_{*p*}{√*d*_{*n*}(*n*^{−1/2} + *a*_{*n*})}.

The proof of Theorem 1 is given in the Appendix. Theorem 1 demonstrates that the convergence rate of the penalized estimating equation estimator depends on the penalty function and the regularization parameter *λ*_{*n*} through *a*_{*n*} defined in (2.4).

To present the oracle property of the resulting estimate, we first introduce some notation. Without loss of generality, we assume
${\beta}_{0}={({\beta}_{I0}^{T},{\beta}_{II0}^{T})}^{T}$, where every element of *β*_{*I*0} is nonzero in the true model and *β*_{*II*0} = 0; let *d*_{1} denote the length of *β*_{*I*0}. Define

$$b={\{{p}_{{\lambda}_{n}}^{\prime}({\beta}_{10}),\cdots ,{p}_{{\lambda}_{n}}^{\prime}({\beta}_{{d}_{1}0})\}}^{T}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\mathrm{\sum}=\text{diag}\{{p}_{{\lambda}_{n}}^{\u2033}({\beta}_{10}),\cdots ,{p}_{{\lambda}_{n}}^{\u2033}({\beta}_{{d}_{1}0})\},$$

(2.5)

and the first *d*_{1} components of
${S}_{\mathit{eff}}^{\ast}(W,Z,Y,\beta )$ as
${S}_{\mathit{eff},I}^{\ast}(\beta )$. In the following theorem, we use the same formulation as that in Cai, Fan, Li and Zhou (2005).

**Theorem 2.** Suppose that Condition (P1) holds. Under regularity conditions (A1)–(A3), assume *λ*_{*n*} → 0 and ${d}_{n}^{5}/n\to 0$ as *n* → ∞. If the penalty function satisfies

$$\underset{n\to \infty}{liminf}\underset{\gamma \to 0+}{liminf}\sqrt{n/{d}_{n}}{p}_{{\lambda}_{n}}^{\prime}(\gamma )\to \infty ,$$

(2.6)

then with probability tending to one, any root *n/d*_{*n*} consistent solution
${\widehat{\beta}}_{n}={({\widehat{\beta}}_{I}^{T},{\widehat{\beta}}_{II}^{T})}^{T}$ of (2.2) must satisfy:

- *β̂*_{*II*} = 0;
- for any *d*_{1} × 1 vector *v* such that *v*^{*T*}*v* = 1,

$$\begin{array}{l}\sqrt{n}{v}^{T}{[E\{{S}_{\mathit{eff},I}^{\ast}({\beta}_{I0}){S}_{\mathit{eff},I}^{\ast T}({\beta}_{I0})\}]}^{-1/2}\left\{E\frac{\partial {S}_{\mathit{eff},I}^{\ast}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}-\mathrm{\sum}\right\}\\ \left[{\widehat{\beta}}_{I}-{\beta}_{I0}-{\left\{E\frac{\partial {S}_{\mathit{eff},I}^{\ast}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}-\mathrm{\sum}\right\}}^{-1}b\right]\stackrel{\text{D}}{\to}N(0,1).\end{array}$$

where the notation $\stackrel{\text{D}}{\to}$ stands for convergence in distribution.

The proof of Theorem 2 is given in the Appendix. For some penalty functions, including the SCAD penalty, *b* and Σ are zero when *λ _{n}* is sufficiently small, hence the results in Theorem 2 imply that the proposed procedure has the celebrated oracle property: i.e.,

$$\sqrt{n}{v}^{T}{[E\{{S}_{\mathit{eff},I}^{\ast}({\beta}_{I0}){S}_{\mathit{eff},I}^{\ast T}({\beta}_{I0})\}]}^{-1/2}E\left\{\frac{\partial {S}_{\mathit{eff},I}^{\ast}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}\right\}({\widehat{\beta}}_{I}-{\beta}_{I0})\stackrel{\text{D}}{\to}N(0,1).$$

(2.7)

Theorems 1 and 2 imply that for fixed and finite *d*, ||*β̂* − *β*_{0}|| = *O*_{*p*}(*n*^{−1/2} + *a*_{*n*}), and that

$$\begin{array}{l}\sqrt{n}\left[{\widehat{\beta}}_{I}-{\beta}_{I0}-{\left\{E\frac{\partial {S}_{\mathit{eff},I}^{\ast}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}-\mathrm{\sum}\right\}}^{-1}b\right]\stackrel{\text{D}}{\to}\\ N\left[0,{\left\{E\frac{\partial {S}_{\mathit{eff},I}^{\ast}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}-\mathrm{\sum}\right\}}^{-1}E\{{S}_{\mathit{eff},I}^{\ast}({\beta}_{I0}){S}_{\mathit{eff},I}^{\ast T}({\beta}_{I0})\}{\left\{E\frac{\partial {S}_{\mathit{eff},I}^{\ast}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}-\mathrm{\sum}\right\}}^{-T}\right],\end{array}$$

where the notation *M*^{−*T*} denotes (*M*^{−1})^{*T*}.

For the SCAD penalty and for fixed and finite *d*, (2.7) becomes

$$\sqrt{n}({\widehat{\beta}}_{I}-{\beta}_{I0})\to N\{0,E{(\partial {S}_{\mathit{eff},I}^{\ast}/\partial {\beta}_{I}^{T})}^{-1}E({S}_{\mathit{eff},I}^{\ast}{S}_{\mathit{eff},I}^{\ast T})E{(\partial {S}_{\mathit{eff},I}^{\ast}/\partial {\beta}_{I}^{T})}^{-T}\}$$

in distribution. In other words, with probability tending to 1, the penalized estimator performs as well as the locally efficient estimator under the correct model.

**3. Semiparametric measurement error models**

To motivate the problems considered in this section, we start with some commonly-used semiparametric regression models for which the proposed procedure in this section can be directly applied. Consider first the error-free regression cases, and let *Y* be the response, and *Z* and *S* be covariates. Throughout this paper, we consider univariate *Z* only. Consider the partially linear model defined as follows:

$$Y=\theta (Z)+{S}^{T}\beta +\epsilon .$$

(3.1)

The partially linear model retains the flexibility of a nonparametric model for the baseline function while maintaining the explanatory power of parametric models, and it has therefore received much attention in the literature; see, for example, Härdle, Liang and Gao (2000) and the references therein. Various extensions of the partially linear model have been proposed. Li and Nie (2007, 2008) proposed the partially nonlinear models

$$Y=\theta (Z)+f(S;\beta )+\epsilon ,$$

(3.2)

where *f*(*S; β*) is a specific, known function, which may be nonlinear in *β*. See Li and Nie (2007, 2008) for some interesting examples. Li and Liang (2008) and Lam and Fan (2008) studied the generalized varying-coefficient partially linear model

$$g\{E(Y\mid Z,S)\}={S}_{1}^{T}\beta +{S}_{2}^{T}\theta (Z),$$

(3.3)

where *g*(·) is a link function, and (*S*_{1}, *S*_{2}, *Z*) are covariates. Model (3.3) includes most commonly-used semiparametric models, such as the partially linear models (3.1), the generalized partially linear models (Severini and Staniswalis, 1994), and semi-varying coefficient partially linear models (Fan and Huang, 2005).

In the presence of covariates measured with error, the aforementioned semiparametric regression models may be extended to measurement error data. As in the last section, let *X* be the covariate vector measured with error. Among these semiparametric measurement error models, the partially linear measurement error model

$$Y=\theta (Z)+{X}^{T}{\beta}_{1}+{S}^{T}{\beta}_{2}+\epsilon $$

(3.4)

has been studied in Liang, Härdle and Carroll (1999). Liang and Li (2009) proposed a class of variable selection procedures for model (3.4) using penalized least squares and penalized quantile regression. Our procedure in this section, however, is directly applicable to both the generalized varying-coefficient partially linear measurement error model

$$g\{E(Y\mid X,Z,S)\}={X}^{T}{\beta}_{1}+{S}_{1}^{T}{\beta}_{2}+{S}_{2}^{T}\theta (Z),$$

(3.5)

where $S={({S}_{1}^{T},{S}_{2}^{T})}^{T}$, and the partially nonlinear measurement error model

$$Y=\theta (Z)+f(X,S;\beta )+\epsilon ,$$

(3.6)

when an error distribution is assumed. It is worth noting that model (3.6) includes the following model as a special case

$$Y={X}^{T}{\beta}_{1}+{S}^{T}{\beta}_{2}+{(XZ)}^{T}{\beta}_{3}+\theta (Z)+\epsilon ,$$

(3.7)

where (*XZ*) consists of all interaction terms between *X* and *Z*, but model (3.4) does not. Thus, the variable selection procedures proposed in Liang and Li (2009) are not directly applicable for model (3.7).

In summary, in this section we consider a general semiparametric measurement error model that includes models (3.4), (3.5) and (3.6) as special cases. Specifically, the semiparametric measurement error model we consider also has two parts:

$${p}_{Y\mid X,Z,S}\{Y\mid X,Z,S,\beta ,\theta (Z)\}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{p}_{W\mid X,Z,S}(W\mid X,Z,S).$$

(3.8)

The major difference from its parametric counterpart is that the main model contains an unknown function *θ*(*Z*). It is easy to check that models (3.4), (3.5) and (3.6) are special cases of model (3.8). Note that a simpler version of this model is considered in Ma and Carroll (2006), where the dimension of *β* is assumed to be fixed and *θ* is assumed to be one-dimensional.

Throughout this paper, we assume that the model is identifiable. We propose the penalized estimating equation for the semiparametric model:

$$\sum _{i=1}^{n}\mathcal{L}({W}_{i},{Z}_{i},{S}_{i},{Y}_{i},\beta ,{\widehat{\theta}}_{i})-n{\stackrel{.}{p}}_{\lambda}(\beta )=0,$$

(3.9)

where ${\stackrel{.}{p}}_{\lambda}(\beta )$ is defined as in (2.2), ℒ is a consistent estimating function for *β* in the semiparametric model, and *θ̂*_{*i*} denotes the estimate of *θ*(*z*_{*i*}) obtained by solving the local estimating equations

$$\begin{array}{c}\sum _{i=1}^{n}{K}_{h}({z}_{i}-{z}_{1})\mathrm{\Psi}({w}_{i},{z}_{i},{s}_{i},{y}_{i};\beta ,{\theta}_{1})=0\\ \vdots \\ \sum _{i=1}^{n}{K}_{h}({z}_{i}-{z}_{n})\mathrm{\Psi}({w}_{i},{z}_{i},{s}_{i},{y}_{i};\beta ,{\theta}_{n})=0,\end{array}$$

(3.10)

where *K*_{*h*}(·) = *K*(·/*h*)/*h* for a kernel function *K* and bandwidth *h*, and *θ*_{*i*} denotes *θ*(*z*_{*i*}).

**Theorem 3.** Suppose that condition (P1) holds. Under regularity conditions (B1)–(B4) in the Appendix, if ${d}_{n}^{4}{n}^{-1}\to 0$ and *λ*_{*n*} → 0 as *n* → ∞, then with probability tending to one there exists a solution *β̂*_{*n*} of (3.9) such that ||*β̂*_{*n*} − *β*_{0}|| = *O*_{*p*}{√*d*_{*n*}(*n*^{−1/2} + *a*_{*n*})}.

The proof of Theorem 3 is given in the Appendix. Theorem 3 indicates that to achieve the root *n/d*_{*n*} convergence rate (or root *n* consistency for fixed *d*_{*n*}), we need *a*_{*n*} = *O*(*n*^{−1/2}); this is satisfied by the SCAD penalty, whose derivative vanishes at the nonzero true coefficients once *λ*_{*n*} is sufficiently small.

Let ℒ_{*I*} denote the first *d*_{1} components of ℒ. Defining

$$\begin{array}{l}A=E[{\mathcal{L}}_{I{\beta}_{I}}\{W,Z,S,Y,{\beta}_{0},{\theta}_{0}(Z)\}+{\mathcal{L}}_{I\theta}\{W,Z,S,Y,{\beta}_{0},{\theta}_{0}(Z)\}{\theta}_{{\beta}_{I}}(Z,{\beta}_{0})],\\ B=cov[{\mathcal{L}}_{I}\{W,Z,S,Y,{\beta}_{0},{\theta}_{0}(Z)\}-{\mathcal{U}}_{\mathcal{\mathcal{I}}}(Z)\mathrm{\Psi}\{W,Z,S,Y,{\beta}_{0},{\theta}_{0}(Z)\}],\end{array}$$

we obtain the following results.

**Theorem 4.** Suppose that condition (P1) holds. Under regularity conditions (B1)–(B4), if *λ*_{*n*} → 0,
${d}_{n}^{5}{n}^{-1}\to 0$, and (2.6) holds, then with probability tending to one, any root *n/d*_{*n*} consistent solution ${\widehat{\beta}}_{n}={({\widehat{\beta}}_{I}^{T},{\widehat{\beta}}_{II}^{T})}^{T}$ of (3.9) must satisfy:

- *β̂*_{*II*} = 0;
- for any *d*_{1} × 1 vector *v* such that *v*^{*T*}*v* = 1,

$$\sqrt{n/{d}_{n}}{v}^{T}{B}^{-1/2}(A-\mathrm{\sum})\left\{{\widehat{\beta}}_{I}-{\beta}_{I0}-{(A-\mathrm{\sum})}^{-1}b\right\}\stackrel{\text{D}}{\to}N(0,1).$$

The proof of Theorem 4 is given in the Appendix. Theorem 4 implies that for fixed and finite *d*, the convergence rate of the resulting estimate is *n*^{−1/2} + *a*_{*n*}. It also implies that, for fixed *d*, any root *n* consistent solution *β̂* = (*β̂*_{*I*}^{*T*}, *β̂*_{*II*}^{*T*})^{*T*} satisfies *β̂*_{*II*} = 0 and

$$\sqrt{n}\left\{{\widehat{\beta}}_{I}-{\beta}_{I0}-{(A-\mathrm{\sum})}^{-1}b\right\}\stackrel{\text{D}}{\to}N\{0,{(A-\mathrm{\sum})}^{-1}B{(A-\mathrm{\sum})}^{-T}\}.$$

See the earlier version of this work, Ma and Li (2007) for details.

**4. Numerical studies and application**

In this section, we provide implementation details such as tuning parameter selection and model error approximation. Issues related to the numerical procedure for solving (2.2) and (3.9), the choice of kernel and bandwidth in the semiparametric model, and the treatment of multiple roots have been addressed in Ma and Carroll (2006) and hence are not discussed further here. We assess the finite sample performance of the proposed procedure by Monte Carlo simulation, and illustrate the proposed methodology by an empirical analysis of the Framingham heart study data. In our simulations, we concentrate on the performance of the proposed procedure for a quadratic logistic measurement error model and a partially linear logistic measurement error model in terms of model complexity and model error.

An MM algorithm (Hunter and Li, 2005) and a local linear approximation (LLA) algorithm (Zou and Li, 2008) have been proposed for penalized likelihood with nonconcave penalties. However, both algorithms are difficult to implement for the measurement error models we consider. We therefore employ the local quadratic approximation (LQA) algorithm (Fan and Li, 2001) to solve the penalized estimating equations: in the Newton-Raphson iteration, we locally approximate the first order derivative of the penalty function by a linear function. Specifically, suppose that at the *k*th step of the iteration we have the value *β*^{(*k*)}. Then, for
${\beta}_{j}^{(k)}$ not very close to zero,

$${p}_{\lambda}^{\prime}({\beta}_{j})={p}_{\lambda}^{\prime}(\mid {\beta}_{j}\mid )\text{sign}({\beta}_{j})\approx \frac{{p}_{\lambda}^{\prime}(\mid {\beta}_{j}^{(k)}\mid )}{\mid {\beta}_{j}^{(k)}\mid}{\beta}_{j},$$

otherwise, we set
${\beta}_{j}^{(k+1)}=0$ and exclude the corresponding covariate from the model. This approximation is updated at every step of the Newton-Raphson iteration. In practice, we set the initial value of *β* to the unpenalized estimating equation estimate. It can be shown that when the algorithm converges, the solution satisfies the penalized estimating equations. Following Theorems 2 and 4, we can further approximate the variance of the resulting estimator by

$$\widehat{cov}(\widehat{\beta})=\frac{1}{n}{(E-{\mathrm{\sum}}_{\lambda})}^{-1}F{(E-{\mathrm{\sum}}_{\lambda})}^{-T},$$

where Σ*_{λ}* is a diagonal matrix with elements
${p}_{\lambda}^{\prime}(\mid {\widehat{\beta}}_{j}\mid )/\mid {\widehat{\beta}}_{j}\mid $ for the nonvanishing components *β̂*_{*j*}, and *E* and *F* denote sample versions of, respectively, the first derivative matrix and the variance matrix of the estimating function appearing in Theorems 2 and 4.
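To make the iteration concrete, the following is a minimal sketch of the LQA Newton-Raphson scheme for a generic penalized estimating equation; the names `est_fun` and `est_jac` (the summed estimating function and its Jacobian) and the thresholding constant `eps` are illustrative choices of ours, not part of the paper:

```python
import numpy as np

def scad_deriv(abs_gamma, lam, a=3.7):
    """p'_lambda(|gamma|) for the SCAD penalty (2.3), evaluated at |gamma| >= 0."""
    abs_gamma = np.asarray(abs_gamma, dtype=float)
    return lam * np.where(abs_gamma <= lam, 1.0,
                          np.maximum(a * lam - abs_gamma, 0.0) / ((a - 1.0) * lam))

def lqa_solve(est_fun, est_jac, beta0, n, lam, tol=1e-8, eps=1e-4, max_iter=100):
    """Newton-Raphson with local quadratic approximation for
    sum_i S(beta) - n * p_dot(beta) = 0.

    est_fun(beta): summed estimating function (length-d vector).
    est_jac(beta): its d x d Jacobian.
    Components falling below `eps` in magnitude are set to zero and
    frozen out, which is how sparsity arises.
    """
    beta = np.array(beta0, dtype=float)
    active = np.ones(beta.size, dtype=bool)
    for _ in range(max_iter):
        active &= np.abs(beta) > eps          # freeze near-zero coefficients
        beta[~active] = 0.0
        if not active.any():
            break
        # LQA: p_dot(beta) approximated by D beta with D diagonal
        d_diag = scad_deriv(np.abs(beta[active]), lam) / np.abs(beta[active])
        g = est_fun(beta)[active] - n * d_diag * beta[active]
        J = est_jac(beta)[np.ix_(active, active)] - n * np.diag(d_diag)
        step = np.linalg.solve(J, g)
        beta[active] -= step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Each iteration replaces ṗ_{λ}(*β*) by *Dβ* with *D* = diag{*p*′_{λ}(|*β*_{j}^{(k)}|)/|*β*_{j}^{(k)}|}, takes one Newton step for the linearized system, and keeps thresholded coefficients at zero thereafter.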

It is desirable to have automatic, data-driven methods to select the tuning parameter *λ*. Here we consider two tuning parameter selectors, the GCV and the BIC selectors. To define the GCV and BIC statistics, we need to define the degrees of freedom and a goodness of fit measure for the final selected model. Similar to the nonconcave penalized likelihood approach, we may define the effective number of parameters, or degrees of freedom, as

$$d{f}_{\lambda}=\text{trace}\{I{(I+{\mathrm{\sum}}_{\lambda})}^{-1}\},$$

where *I* stands for the Fisher information matrix. For the logistic regression models employed in this section, a natural approximation of *I*, ignoring the measurement error effect, is *V*^{*T*}*QV*, where *V* is the design matrix and *Q* is the diagonal matrix with diagonal elements *μ̂*_{*i*}(1 − *μ̂*_{*i*}).

In the logistic regression context of this section, we may employ the deviance as a goodness of fit measure. Specifically, let *μ*_{*i*} be the conditional expectation of *Y*_{*i*} given the covariates, and let *μ̂*_{*λ,i*} denote its fitted value under the model selected with regularization parameter *λ*. The deviance is

$$D({\widehat{\mu}}_{\lambda})=2\sum _{i=1}^{n}[{Y}_{i}log({Y}_{i}/{\widehat{\mu}}_{\lambda ,i})+(1-{Y}_{i})log\{(1-{Y}_{i})/(1-{\widehat{\mu}}_{\lambda ,i})\}].$$

Define the GCV statistic to be

$$\mathit{GCV}(\lambda )=\frac{D({\widehat{\mu}}_{\lambda})}{n{(1-d{f}_{\lambda}/n)}^{2}},$$

and the BIC statistic to be

$$\mathit{BIC}(\lambda )=D({\widehat{\mu}}_{\lambda})+2log(n)d{f}_{\lambda}.$$

The GCV and BIC tuning parameter selectors select *λ* by minimizing *GCV*(*λ*) and *BIC*(*λ*), respectively. Note that the BIC tuning parameter selector is distinct from the traditional BIC variable selection criterion, which is not well defined for estimating equation methods. Wang, Li and Tsai (2007) studied the asymptotic behavior of the GCV and BIC tuning parameter selectors for nonconcave penalized least squares variable selection in linear and partially linear regression models. Further study of the asymptotic properties of the proposed tuning parameter selectors is needed, but is beyond the scope of this paper.
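For concreteness, the deviance, the effective degrees of freedom and the two selectors above can be computed as follows; this is a sketch under the displayed formulas, with the Fisher information approximation and Σ_{λ} supplied by the user (function names are ours):

```python
import numpy as np

def binomial_deviance(y, mu):
    """Deviance D(mu_hat) for 0/1 responses; terms with y = 0 (resp. y = 1)
    follow the usual 0*log(0) = 0 convention."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(y > 0, y * np.log(y / mu), 0.0)
        t2 = np.where(y < 1, (1.0 - y) * np.log((1.0 - y) / (1.0 - mu)), 0.0)
    return 2.0 * float(np.sum(t1 + t2))

def effective_df(fisher, sigma_lam):
    """df_lambda = trace{I (I + Sigma_lambda)^{-1}}."""
    return float(np.trace(fisher @ np.linalg.inv(fisher + sigma_lam)))

def gcv(dev, df, n):
    return dev / (n * (1.0 - df / n) ** 2)

def bic(dev, df, n):
    return dev + 2.0 * np.log(n) * df
```

A grid search over *λ* would evaluate the fitted probabilities for each candidate value and keep the minimizer of `gcv` or `bic`.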

Model error is an effective way of evaluating model adequacy versus model complexity. To apply the concept of model error in evaluating our procedure, we first simplify its definition for the logistic partially linear measurement error model. Denote *μ*(*S, X, Z*) = *E*(*Y*|*S, X, Z*), and define the model error for a fitted model *μ̂*(*S, X, Z*) as

$$ME(\widehat{\mu})=E{\{\widehat{\mu}({S}^{+},{X}^{+},{Z}^{+})-\mu ({S}^{+},{X}^{+},{Z}^{+})\}}^{2},$$

where the expectation is taken over a new observation (*S*^{+}, *X*^{+}, *Z*^{+}). Let *g*(·) be the logit link. For the logistic partially linear model, the mean function has the form *μ*(*S, X, Z*) = *g*^{−1}{*θ*(*Z*) + *β*^{*T*}*V*}, where *V* collects the covariates that enter the linear part of the model. Writing *ġ*^{−1}(·) for the derivative of *g*^{−1}(·), a first order Taylor expansion gives

$$\begin{array}{l}ME(\widehat{\mu})\approx E({\stackrel{.}{g}}^{-1}{\{\theta ({Z}^{+})+{\beta}^{T}{V}^{+}\}}^{2}[{\{\widehat{\theta}({Z}^{+})-\theta ({Z}^{+})\}}^{2}\\ +{({\widehat{\beta}}^{T}{V}^{+}-{\beta}^{T}{V}^{+})}^{2}+2\{\widehat{\theta}({Z}^{+})-\theta ({Z}^{+})\}({\widehat{\beta}}^{T}{V}^{+}-{\beta}^{T}{V}^{+})]).\end{array}$$

The first component is the inherent model error due to the estimation of *θ*(·), the second is due to the lack of fit of *β̂*, and the third is the cross-product of the first two. Thus, to assess the performance of the proposed variable selection procedure, we define the approximate model error (AME) for *β̂* to be

$$\mathit{AME}(\widehat{\beta})=E\left[{\stackrel{.}{g}}^{-1}{\{\theta ({Z}^{+})+{\beta}^{T}{V}^{+}\}}^{2}{({\widehat{\beta}}^{T}{V}^{+}-{\beta}^{T}{V}^{+})}^{2}\right].$$

Furthermore, the AME of *β̂* can be written as

$$\mathit{AME}(\widehat{\beta})={(\widehat{\beta}-\beta )}^{T}E[{\stackrel{.}{g}}^{-1}{\{\theta ({Z}^{+})+{\beta}^{T}{V}^{+}\}}^{2}{V}^{+}{V}^{+T}](\widehat{\beta}-\beta )\stackrel{\wedge}{=}{(\widehat{\beta}-\beta )}^{T}{C}_{X}(\widehat{\beta}-\beta ).$$

(4.1)

In our simulation, the matrix *C*_{*X*} is estimated from 1,000,000 Monte Carlo replications. For measurement error data, we observe *W* rather than *X*; replacing *X*^{+} with *W*^{+} in (4.1) yields

$${\mathit{AME}}_{W}(\widehat{\beta})={(\widehat{\beta}-\beta )}^{T}{C}_{W}(\widehat{\beta}-\beta ),$$

(4.2)

where *C*_{*W*} is obtained by replacing *X*^{+} with *W*^{+} in the definition of *C*_{*X*}.
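For the logit link, the derivative of *g*^{−1} at *η* is *μ*(1 − *μ*), so the Monte Carlo estimate of *C*_{*X*} and the resulting AME in (4.1) can be sketched as follows (the function name is ours; the draws of *V* and *θ*(*Z*) are supplied by the user):

```python
import numpy as np

def approximate_model_error(beta_hat, beta, V_new, theta_new):
    """AME in (4.1) with C_X estimated by a Monte Carlo average.

    V_new     : (m, d) draws of the covariate vector V entering linearly,
    theta_new : length-m values of theta(Z) at the matching Z draws.
    For the logit link, d g^{-1}/d eta = mu * (1 - mu).
    """
    eta = theta_new + V_new @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = (mu * (1.0 - mu)) ** 2                      # {dg^{-1}/d eta}^2
    C_X = (V_new * w[:, None]).T @ V_new / V_new.shape[0]
    diff = np.asarray(beta_hat) - np.asarray(beta)
    return float(diff @ C_X @ diff)
```

The same function computes AME_{*W*} in (4.2) when the draws of *X*^{+} inside `V_new` are replaced by draws of *W*^{+}.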

To demonstrate the performance of our method in both parametric and semiparametric measurement error models, we conduct two simulation studies. In our simulation, we will examine only the performance of the penalized estimating equation method with the SCAD penalty.

In this example, we generate data from a logistic model where the covariate measured with error enters the model through a quadratic function, and the covariates measured without error enter linearly. The measurement error follows a normal additive pattern. Specifically,

$$\begin{array}{l}\text{logit}\{p(Y=1\mid X,Z)\}={\beta}_{0}+{\beta}_{1}X+{\beta}_{2}{X}^{2}+{\beta}_{3}{Z}_{1}+{\beta}_{4}{Z}_{2}+{\beta}_{5}{Z}_{3}+{\beta}_{6}{Z}_{4}\\ +{\beta}_{7}{Z}_{5}+{\beta}_{8}{Z}_{6}+{\beta}_{9}{Z}_{7},\\ \text{and}\phantom{\rule{0.16667em}{0ex}}W=X+U,\end{array}$$

where *β* = (0, 1.5, 2, 0, 3, 0, 1.5, 0, 0, 0)^{*T*} and the covariate *X* is observed only through the error-prone surrogate *W* = *X* + *U*.
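A simulation of this structure can be sketched as follows; the specific distributions chosen for *X*, the *Z*'s and *U* below are illustrative assumptions of ours, not the paper's exact design:

```python
import numpy as np

def simulate_quadratic_logistic(n, beta, sigma_u=0.5, seed=None):
    """Simulate from a quadratic logistic measurement error model.

    The distributions of X, the Z's and U here are illustrative
    assumptions; only the model structure follows the display:
    logit p(Y=1) = beta_0 + beta_1 X + beta_2 X^2 + beta_3 Z_1 + ...,
    W = X + U.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                     # unobserved covariate
    z = rng.normal(size=(n, 7))                # error-free covariates
    design = np.column_stack([np.ones(n), x, x ** 2, z])
    prob = 1.0 / (1.0 + np.exp(-(design @ beta)))
    y = rng.binomial(1, prob)
    w = x + rng.normal(scale=sigma_u, size=n)  # observed surrogate
    return y, w, z
```

An estimation routine would then see only (*y*, *w*, *z*), with *x* discarded, mirroring the functional measurement error setting.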

For the selected model, model complexity is summarized by the number of zero coefficients, and model error is summarized by the relative approximate model error (RAME), defined as the ratio of the model error of the selected model to that of the full model. In Table 1, the RAME column reports the sample median and the median absolute deviation (MAD), divided by a factor of 0.6745, of the RAME values over 1000 simulations. Similarly, the RAME_{*W*} column reports those of the RAME_{*W*} values computed from (4.2).

We next verify the consistency of the estimators and test the accuracy of the proposed standard error formula. Table 2 displays the bias and sample standard deviation (SD) of the estimates for two nonzero coefficients, (*β*_{1}*, β*_{2}), over 1000 simulations, and the sample average and the sample standard deviations of the 1000 standard errors obtained by using the sandwich formula. The row labeled ‘EE’ corresponds to the unpenalized estimating equation estimator. We omit here the results for other nonzero coefficients and the results under sample size *n* = 500. Interested readers can find them in an earlier version of this work, Ma and Li (2007). Overall, the estimators are consistent and the sandwich formula works well.

In this example, we illustrate the performance of the method for a semiparametric measurement error model. Simulation data are generated from

$$\begin{array}{l}\text{logit}\{p(Y=1\mid X,S,Z)\}={\beta}_{1}X+{\beta}_{2}{S}_{1}+\cdots +{\beta}_{10}{S}_{9}+\theta (Z)\\ W=X+U\end{array}$$

where *β*, *X* and *W* are the same as in the previous simulation. We generate *S*’s in a similar fashion as the *Z*’s in Example 1. That is, (*S*_{1}, …, *S*_{8}) is generated from a normal distribution with mean zero and covariance between *S _{i}* and

The simulation results are summarized in Table 3, with notation similar to that of Table 1. From Table 3, we can see that the penalized estimating equation estimators can significantly reduce model complexity. Overall, the BIC tuning parameter selectors perform better, while GCV is too conservative. We have further tested the consistency and the accuracy of the standard error formula derived from the sandwich formula. The result is summarized in Table 4, with notation similar to that of Table 2. We note the consistency of the estimator and that the standard error formula performs very well. More simulation results are summarized in the earlier version of the work, Ma and Li (2007).
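The data-generating scheme of this partially linear example can be sketched as follows; the form of θ(·), the distribution of *Z*, and the sparse coefficient values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def generate_example2(n, sigma_u=0.5):
    """Partially linear logistic measurement-error model of this example:
    the logit of the success probability is b1*X + b2*S1 + ... + b10*S9 + theta(Z),
    with W = X + U.  theta(z) = sin(2*pi*z), Z ~ Uniform(0, 1), the AR-type
    correlation of the S's, and the coefficient values are illustrative."""
    beta = np.array([1.5, 2.0, 0.0, 3.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0])
    X = rng.standard_normal(n)
    W = X + rng.normal(0.0, sigma_u, n)
    idx = np.arange(9)
    cov = 0.5 ** np.abs(idx[:, None] - idx[None, :])
    S = rng.multivariate_normal(np.zeros(9), cov, size=n)
    Zc = rng.uniform(0.0, 1.0, n)
    eta = beta[0] * X + S @ beta[1:] + np.sin(2.0 * np.pi * Zc)
    Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return Y, W, S, Zc

Y, W, S, Zc = generate_example2(1000)
```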

The Framingham heart study data set (Kannel et al., 1986) is a well known data set for which it is generally accepted that measurement error exists in the long term systolic blood pressure (SBP). In addition to SBP, other measurements include age, smoking status and serum cholesterol. In the literature, there has been speculation that a second order term in age might be needed to model the dependence of heart disease occurrence on the covariates. It is also unclear whether interactions between the various covariates play a role in influencing the heart disease rate. The data set includes 1,615 observations.

With the method developed here, it is possible to perform variable selection to address these issues. Following the literature, we adopt the measurement error model log(MSBP − 50) = log(SBP − 50) + *U*, where *U* is a mean zero normal random variable with variance
${\sigma}_{u}^{2}=0.0126$ and MSBP is the measured SBP. We denote the standardized log(MSBP − 50) by *W*; the standardization of log(SBP − 50) using the same parameters is denoted by *X*. The standardized serum cholesterol and age are denoted by *Z*_{1} and *Z*_{2} respectively, and *Z*_{3} denotes the binary smoking status variable. Using *Y* to denote the occurrence of heart disease, the saturated model, which includes all the interaction terms and also the squared age term, is of the form

$$\begin{array}{l}\text{logit}\{p(Y=1\mid X,{Z}^{\prime}s)\}={\beta}_{1}X+{\beta}_{2}X{Z}_{1}+{\beta}_{3}X{Z}_{2}+{\beta}_{4}X{Z}_{3}+{\beta}_{5}+{\beta}_{6}{Z}_{1}+{\beta}_{7}{Z}_{2}\\ +{\beta}_{8}{Z}_{3}+{\beta}_{9}{Z}_{2}^{2}+{\beta}_{10}{Z}_{1}{Z}_{2}+{\beta}_{11}{Z}_{1}{Z}_{3}+{\beta}_{12}{Z}_{2}{Z}_{3},\\ W=X+U.\end{array}$$

We used both the GCV and BIC tuning parameter selectors to choose *λ*. We present the tuning parameters and the corresponding GCV and BIC scores in Figure 1; the final *λ* chosen by the GCV and BIC selectors is 0.073 and 0.172, respectively. The GCV criterion selects the covariates *X*, *XZ*_{1}, 1, *Z*_{1}, *Z*_{2}, *Z*_{3},
${Z}_{2}^{2}$, *Z*_{2}*Z*_{3} into the model, while the BIC criterion selects the covariates *X*, 1, *Z*_{1}, *Z*_{2}. We report the selection and estimation results in Table 5, as well as the semiparametric estimation results without variable selection.

Figure 1. Tuning parameters and their corresponding BIC and GCV scores for the Framingham data. The scores are normalized to the range [0, 1].

As seen, the terms *X*, 1, *Z*_{1}, *Z*_{2} are selected by both criteria, while *Z*_{3},
${Z}_{2}^{2}$ and some of the interaction terms are selected only by GCV. The BIC criterion is very aggressive and yields a very simple final model, while the GCV criterion is much more conservative and hence yields a more complex model. This agrees with the simulation results obtained. Since both criteria include the covariate *X*, accounting for the measurement error in the Framingham data is unavoidable.
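The contrast between the two selectors can be illustrated on a toy penalized regression. Under an orthonormal design the L1-penalized estimate is soft-thresholded least squares, which lets us trace the whole path without the estimating-equation machinery; the df-based score formulas below are standard illustrative stand-ins for the paper's exact selectors:

```python
import numpy as np

rng = np.random.default_rng(1)

def soft(z, lam):
    """Soft-thresholding: the L1-penalized solution under an orthonormal design."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

n, d = 200, 8
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
Xd = Q * np.sqrt(n)                          # columns scaled so X^T X = n I
beta_true = np.array([3.0, 0.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0])
y = Xd @ beta_true + rng.standard_normal(n)

ols = Xd.T @ y / n                           # least-squares coefficients
lambdas = np.linspace(0.01, 1.0, 50)
bic, gcv = [], []
for lam in lambdas:
    b = soft(ols, lam)
    rss = np.sum((y - Xd @ b) ** 2)
    df = np.count_nonzero(b)                 # effective d.f. ~ number of nonzeros
    bic.append(np.log(rss / n) + df * np.log(n) / n)
    gcv.append(rss / (n * (1.0 - df / n) ** 2))

lam_bic = lambdas[int(np.argmin(bic))]
lam_gcv = lambdas[int(np.argmin(gcv))]
```

In line with the comparison above, BIC penalizes model size more heavily and so tends to pick a larger *λ* (a sparser model) than GCV.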

In this paper, we have proposed a new class of variable selection procedures in the framework of measurement error models. The procedure is proposed in a completely general functional measurement error model setting, and is suitable for parametric models as well as for semiparametric models that contain unspecified smooth functions of an observable covariate. We have assumed the error model *p _{W}*

We also would like to point out that in the special case of generalized linear models and normal additive error with possible heteroscedasticity, the step of solving linear integral equations can be avoided and the estimating equations simplify significantly (Ma and Tsiatis, 2006). In such situations, the computational complexity of the proposed procedure is reduced to about the same level as for variable selection in regressions without errors in the variables.
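A classical instance of this simplification is the conditional score of Stefanski and Carroll (1987) for logistic regression with normal additive error: Δ = *W* + *Y*σ²β₁ is sufficient for the unobserved *X*, and P(*Y* = 1 | Δ) = expit(β₀ + β₁Δ − β₁²σ²/2), so the estimating equations are available in closed form. A numpy sketch under illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 0.25                      # known measurement-error variance (illustrative)
n = 20000
X = rng.standard_normal(n)
W = X + rng.normal(0.0, np.sqrt(sigma2), n)
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.0 + 1.0 * X))))   # true (b0, b1) = (0, 1)

def cond_score(beta):
    """Stefanski-Carroll conditional score: Delta = W + Y*sigma2*b1 is sufficient
    for X, and P(Y=1 | Delta) = expit(b0 + b1*Delta - b1^2*sigma2/2), so this
    score has mean zero at the truth without solving integral equations."""
    b0, b1 = beta
    delta = W + Y * sigma2 * b1
    eta = b0 + b1 * delta - 0.5 * b1**2 * sigma2
    resid = Y - 1.0 / (1.0 + np.exp(-eta))
    return np.array([resid.mean(), (resid * delta).mean()])

# Newton iteration with a finite-difference Jacobian
beta_hat = np.array([0.0, 0.5])
for _ in range(25):
    f = cond_score(beta_hat)
    J = np.empty((2, 2))
    for j in range(2):
        e = np.zeros(2)
        e[j] = 1e-6
        J[:, j] = (cond_score(beta_hat + e) - f) / 1e-6
    beta_hat = beta_hat - np.linalg.solve(J, f)
```

A naive logistic fit of *Y* on *W* would be biased toward zero; the conditional score removes this attenuation.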

As pointed out by the referee, it is of great interest to perform variable selection for high dimensional data. In this paper, we allow the number of covariates to grow to infinity at a *o _{p}*(

Ma’s work was supported by a Swiss NSF grant. Li’s research was supported by NSF grant DMS-0348869 and National Institute on Drug Abuse (NIDA) grant P50 DA10075. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the National Institutes of Health.

(P1) Let
${c}_{n}=max\{\mid {p}_{\lambda}^{\u2033}(\mid {\beta}_{j0}\mid )\mid :{\beta}_{j0}\ne 0\}$. Assume that *λ _{n}* → 0,

It is easy to verify that both the *L*_{1} and the SCAD penalties satisfy this condition.
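For concreteness, the SCAD derivative of Fan and Li (2001), with the conventional *a* = 3.7, can be written down directly; its second derivative vanishes beyond *aλ*, so *c_n* → 0 automatically once *λ_n* → 0 and the nonzero coefficients stay bounded away from zero:

```python
import numpy as np

def scad_deriv(beta, lam, a=3.7):
    """SCAD penalty derivative p'_lam(|beta|) (Fan and Li, 2001): equal to lam
    on [0, lam], decaying linearly to 0 on (lam, a*lam], and 0 beyond a*lam,
    so p''_lam vanishes at any fixed nonzero coefficient once lam -> 0."""
    b = np.abs(beta)
    return lam * ((b <= lam) + np.maximum(a * lam - b, 0.0) / ((a - 1.0) * lam) * (b > lam))
```

The *L*_{1} penalty has p′_λ ≡ λ, whose second derivative is identically zero, so it satisfies the condition trivially.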

- (A1) The expectation of the first derivative of ${S}_{\mathit{eff}}^{\ast}$ with respect to *β* exists at *β*_{0} and its left eigenvalues are bounded away from zero and infinity uniformly for all *n*. For any entry *S*_{jk} of $\partial {S}_{\mathit{eff}}^{\ast}({\beta}_{0})/\partial {\beta}^{T}$, $E({S}_{jk}^{2})<{C}_{1}<\infty $.
- (A2) The eigenvalues of the matrix $E({S}_{\mathit{eff},I}^{\ast}{S}_{\mathit{eff},I}^{\ast T})$ satisfy 0 < *C*_{2} < *λ*_{min} < ··· < *λ*_{max} < *C*_{3} < ∞ for all *n*. For any entries *S*_{k}, *S*_{j} of ${S}_{\mathit{eff}}^{\ast}({\beta}_{0})$, $E({S}_{k}^{2}{S}_{j}^{2})<{C}_{4}<\infty $.
- (A3) The second derivatives of ${S}_{\mathit{eff}}^{\ast}$ with respect to *β* exist and their entries are uniformly bounded by a function *M*(*W _{i}*, *Z _{i}*, *Y _{i}*) in a large enough neighborhood of *β*_{0}. In addition, *E*(*M*^{2}) < *C*_{5} < ∞ for all *n*, *d*.

Conditions (A1) – (A3) are mild regularity conditions. They guarantee that the solution of the following estimating equation

$$\sum _{i=1}^{n}{S}_{\mathit{eff}}^{\ast}({W}_{i},{Z}_{i},{Y}_{i},\beta )=0$$

is root *n*/*d _{n}* convergent, and possesses asymptotic normality.

Condition (A1) allows us to define

$$J={\left\{E\left(\frac{\partial {S}_{\mathit{eff}}^{\ast}}{\partial {\beta}^{T}}{\mid}_{{\beta}_{0}}\right)\right\}}^{-1},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{\phi}_{\mathit{eff}}^{\ast}={JS}_{\mathit{eff}}^{\ast}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{q}_{{\lambda}_{n}}^{\prime}(\beta )=J{p}_{{\lambda}_{n}}^{\prime}(\beta ).$$

Let *α _{n}* =

$${n}^{-1/2}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}(\beta )-{n}^{1/2}{q}_{{\lambda}_{n}}^{\prime}(\beta )=0$$

(A1)

has a solution that satisfies $\left|\right|\widehat{\beta}-{\beta}_{0}\left|\right|={O}_{p}(\sqrt{{d}_{n}}{\alpha}_{n})$. This will be shown using the Brouwer fixed-point theorem. Using a Taylor expansion, we have

$$\begin{array}{l}{n}^{-1/2}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}(\beta )-{n}^{1/2}{q}_{{\lambda}_{n}}^{\prime}(\beta )\\ ={n}^{-1/2}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}({\beta}_{0})-{n}^{1/2}{q}_{{\lambda}_{n}}^{\prime}({\beta}_{0})+{n}^{-1/2}\sum _{i=1}^{n}\frac{\partial {\phi}_{\mathit{eff},i}^{\ast}({\beta}^{\ast})}{\partial {\beta}^{T}}(\beta -{\beta}_{0})\\ -{n}^{1/2}\frac{\partial {q}_{{\lambda}_{n}}^{\prime}({\beta}_{0})}{\partial {\beta}^{T}}(\beta -{\beta}_{0})\{1+{o}_{p}(1)\}\end{array}$$

where *β*^{*} lies between *β* and *β*_{0}. It can be shown from Conditions (A1), (A2) and (A3) and the definition of
${\phi}_{\mathit{eff}}^{\ast}(\cdot )$ that

$${(\beta -{\beta}_{0})}^{T}\left\{\frac{1}{n}\sum _{i=1}^{n}\frac{\partial {\phi}_{\mathit{eff},i}^{\ast}({\beta}^{\ast})}{\partial {\beta}^{T}}\right\}(\beta -{\beta}_{0})={\left|\right|\beta -{\beta}_{0}\left|\right|}^{2}\{1+{o}_{P}(1)\}$$

We next check the key condition for the Brouwer fixed point theorem. For any *β* such that
$\left|\right|\beta -{\beta}_{0}\left|\right|=C\sqrt{{d}_{n}}{\alpha}_{n}$ for some constant *C*, it follows by Condition (P1) that

$$\begin{array}{c}{(\beta -{\beta}_{0})}^{T}\left\{\frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}(\beta )-{n}^{1/2}{q}_{{\lambda}_{n}}^{\prime}(\beta )\right\}\\ ={(\beta -{\beta}_{0})}^{T}\left\{\frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}({\beta}_{0})-\sqrt{n}{q}_{{\lambda}_{n}}^{\prime}({\beta}_{0})\right\}+\sqrt{n}{\left|\right|\beta -{\beta}_{0}\left|\right|}^{2}\{1+{o}_{P}(1)\}.\end{array}$$

Using the Cauchy–Schwarz inequality, it can be shown that the first term in the above equation is of order
$\left|\right|\beta -{\beta}_{0}\left|\right|{O}_{p}(\sqrt{{d}_{n}+{d}_{n}n{a}_{n}^{2}})={O}_{p}(C{n}^{1/2}{d}_{n}{\alpha}_{n}^{2})$. Note that
$\sqrt{n}{\left|\right|\beta -{\beta}_{0}\left|\right|}^{2}={C}^{2}{n}^{1/2}{d}_{n}{\alpha}_{n}^{2}$, so the second term dominates the first term with probability 1 − *ε* for any *ε* > 0 as long as *C* is large enough. Hence, for any *ε* > 0 and large enough *C*, the probability that the above display is larger than zero is at least 1 − *ε*. From the Brouwer fixed-point theorem, we know that with probability at least 1 − *ε*, there exists at least one solution of (A1) in the region
$\left|\right|\beta -{\beta}_{0}\left|\right|\phantom{\rule{0.16667em}{0ex}}\le C\sqrt{{d}_{n}}{\alpha}_{n}$.

If the conditions in Theorem 2 hold, then for any given *β* that satisfies
$\left|\right|\beta -{\beta}_{0}\left|\right|\phantom{\rule{0.16667em}{0ex}}={O}_{p}(\sqrt{{d}_{n}/n})$, with probability tending to 1, any solution
${({\beta}_{I}^{T},{\beta}_{II}^{T})}^{T}$ of (2.2) satisfies that *β _{II}* = 0.

Denote the *k*th element in
${\sum}_{i=1}^{n}{S}_{\mathit{eff}}^{\ast}({W}_{i},{Z}_{i},{Y}_{i},\beta )$ as *L _{nk}*(

$$\begin{array}{l}{L}_{nk}(\beta )={L}_{nk}({\beta}_{0})+\sum _{j=1}^{{d}_{n}}\frac{\partial {L}_{nk}({\beta}_{0})}{\partial {\beta}_{j}}({\beta}_{j}-{\beta}_{j0})\\ +{2}^{-1}\sum _{l=1}^{{d}_{n}}\sum _{j=1}^{{d}_{n}}\frac{{\partial}^{2}{L}_{nk}({\beta}^{\ast})}{\partial {\beta}_{l}\partial {\beta}_{j}}({\beta}_{l}-{\beta}_{l0})({\beta}_{j}-{\beta}_{j0}),\end{array}$$

(A2)

where *β*^{*} is in between *β* and *β*_{0}. Because of condition (A2), the first term of (A2) is of order
${O}_{p}({n}^{1/2})={o}_{p}(\sqrt{n{d}_{n}})$. The second term in (A2) can be further written as

$$\sum _{j=1}^{{d}_{n}}\left\{\frac{\partial {L}_{nk}({\beta}_{0})}{\partial {\beta}_{j}}-E\frac{\partial {L}_{nk}({\beta}_{0})}{\partial {\beta}_{j}}\right\}({\beta}_{j}-{\beta}_{j0})+\sum _{j=1}^{{d}_{n}}E\frac{\partial {L}_{nk}({\beta}_{0})}{\partial {\beta}_{j}}({\beta}_{j}-{\beta}_{j0}).$$

(A3)

Using the Cauchy–Schwarz inequality and condition (A1), it can be shown by straightforward calculation that the first term in (A3) is of order ${O}_{p}(\sqrt{{d}_{n}/n})={o}_{p}(\sqrt{n{d}_{n}})$. Using the Cauchy–Schwarz inequality again, the second term in (A3) is controlled by

$$n{\left\{\sum _{j=1}^{{d}_{n}}{\left(E\frac{\partial {S}_{\mathit{eff},k}}{\partial {\beta}^{T}}\right)}^{2}\right\}}^{1/2}\left|\right|\beta -{\beta}_{0}\left|\right|\phantom{\rule{0.16667em}{0ex}}\le n{\lambda}_{max}{\left(E\frac{\partial {S}_{\mathit{eff},k}}{\partial {\beta}^{T}}\right)}^{2}\left|\right|\beta -{\beta}_{0}\left|\right|\phantom{\rule{0.16667em}{0ex}}={O}_{p}(\sqrt{n{d}_{n}}).$$

Thus, the second term of (A2) is of order
${O}_{p}(\sqrt{n{d}_{n}})$. As for the third term of (A2), we can apply a decomposition similar to that of (A3). Using the Cauchy–Schwarz inequality in matrix form and condition (A3), it can be shown that the third term of (A2) is of order
${O}_{p}({d}_{n}^{2})+{O}_{p}({n}^{-1/2}{d}_{n}^{2})={o}_{p}(\sqrt{n{d}_{n}})$ as
${d}_{n}^{5}/n\to 0$. Thus, *L _{nk}*(

$${L}_{nk}(\beta )-n{p}_{{\lambda}_{n}}^{\prime}({\beta}_{k})=-\sqrt{n{d}_{n}}\{\sqrt{n/{d}_{n}}{p}_{{\lambda}_{n}}^{\prime}(\mid {\beta}_{k}\mid )\text{sign}({\beta}_{k})+{O}_{p}(1)\}.$$

Using condition (2.6), the sign of
${L}_{nk}(\beta )-n{p}_{{\lambda}_{n}}^{\prime}({\beta}_{k})$ is decided by sign(*β _{k}*) completely when

From Theorem 1 and condition (P1), there is a root *n*/*d _{n}* consistent estimator $\widehat{\beta}$. From Lemma 1,
$\widehat{\beta}={({\widehat{\beta}}_{I}^{T},{0}^{T})}^{T}$, so (i) is shown. Denote the first

$$\begin{array}{l}0={L}_{n}({\widehat{\beta}}_{I})-n{p}_{{\lambda}_{n},I}^{\prime}({\widehat{\beta}}_{I})\\ ={L}_{n}({\beta}_{I0})+\frac{\partial {L}_{n}({\beta}_{I0}^{\ast})}{\partial {\beta}_{I}^{T}}({\widehat{\beta}}_{I}-{\beta}_{I0})-n{b}_{n}-n{p}_{{\lambda}_{n}}^{\u2033}({\beta}_{I}^{\ast})({\widehat{\beta}}_{I}-{\beta}_{I0}),\end{array}$$

where
${\beta}_{I}^{\ast}$ is between *β _{I}*

$$\begin{array}{l}{\left|\right|{n}^{-1}\frac{\partial {L}_{n}({\beta}_{I0}^{\ast})}{\partial {\beta}_{I}^{T}}-{p}_{{\lambda}_{n},I}^{\u2033}({\beta}_{I}^{\ast})-E\frac{\partial {L}_{n}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}+{p}_{{\lambda}_{n},I}^{\u2033}({\beta}_{I0})\left|\right|}^{2}\\ \le 2{\left|\right|{n}^{-1}\frac{\partial {L}_{n}({\beta}_{I0}^{\ast})}{\partial {\beta}_{I}^{T}}-E\frac{\partial {L}_{n}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}\left|\right|}^{2}+{O}_{p}({n}^{-1}{d}_{n})\end{array}$$

Furthermore, for any fixed *ε* > 0, it follows by conditions (A1), (A3) and the Chebyshev inequality that

$$\begin{array}{l}{P}_{r}\left\{\left|\right|{n}^{-1}\frac{\partial {L}_{n}({\beta}_{I0}^{\ast})}{\partial {\beta}_{I}^{T}}-E\frac{\partial {L}_{n}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}\left|\right|\phantom{\rule{0.16667em}{0ex}}\ge \epsilon {{d}_{n}}^{-1}\right\}\\ \le \frac{{{d}_{n}}^{2}}{{n}^{2}{\epsilon}^{2}}E{\left|\right|\frac{\partial {L}_{n}({\beta}_{I0}^{\ast})}{\partial {\beta}_{I}^{T}}-nE\frac{\partial {L}_{n}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}\left|\right|}^{2}=O({{d}_{n}}^{2}{n}^{-2}{d}_{1}^{2}n)=o(1).\end{array}$$

The last equality holds since *d*_{1} ≤ *d _{n}*. Thus, we have

$$\left|\right|{n}^{-1}\frac{\partial {L}_{n}({\beta}_{I0}^{\ast})}{\partial {\beta}_{I}^{T}}-E\frac{\partial {L}_{n}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}\left|\right|\phantom{\rule{0.16667em}{0ex}}={o}_{p}({{d}_{n}}^{-1}).$$

Therefore,

$${\left|\right|{n}^{-1}\frac{\partial {L}_{n}({\beta}_{I0}^{\ast})}{\partial {\beta}_{I}^{T}}-{p}_{{\lambda}_{n},I}^{\u2033}({\beta}^{\ast})-E\frac{\partial {L}_{n}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}+{p}_{{\lambda}_{n},I}^{\u2033}({\beta}_{I0})\left|\right|}^{2}\phantom{\rule{0.16667em}{0ex}}={o}_{p}({{d}_{n}}^{-2}),$$

and subsequently,

$$\begin{array}{l}\left|\right|\left\{{n}^{-1}\frac{\partial {L}_{n}({\beta}_{I0}^{\ast})}{\partial {\beta}_{I}^{T}}-{p}_{{\lambda}_{n},I}^{\u2033}({\beta}^{\ast})-E\frac{\partial {L}_{n}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}+{p}_{{\lambda}_{n},I}^{\u2033}({\beta}_{I0})\right\}({\widehat{\beta}}_{I}-{\beta}_{I0})\left|\right|\\ \le {o}_{p}({d}_{n}^{-1}){O}_{p}({n}^{-1/2}{d}_{n}^{1/2})={o}_{p}({n}^{-1/2}).\end{array}$$

We thus obtain that

$$\left\{-E\frac{\partial {L}_{n}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}+{\mathrm{\sum}}_{n}\right\}({\widehat{\beta}}_{I}-{\beta}_{I0})+{b}_{n}={n}^{-1}{L}_{n}({\beta}_{I0})+{o}_{p}({n}^{-1/2}).$$

Denote ${I}^{\ast}=E\{{S}_{n,\mathit{eff},I}^{\ast}({\beta}_{I0}){S}_{n,\mathit{eff},I}^{\ast T}({\beta}_{I0})\}$. Using condition (A2), it follows that

$${n}^{1/2}{v}^{T}{I}^{\ast -1/2}\left[\left\{-E\frac{\partial {L}_{n}({\beta}_{I0})}{\partial {\beta}_{I}^{T}}+{\mathrm{\sum}}_{n}\right\}({\widehat{\beta}}_{I}-{\beta}_{I0})+{b}_{n}\right]={n}^{-1/2}{v}^{T}{I}^{\ast -1/2}{L}_{n}({\beta}_{I0})+{o}_{p}(1).$$

Let *Y _{i}* =

$$\sum _{i=1}^{n}E{\left|\right|{Y}_{i}\left|\right|}^{2}\mathbf{1}(|\left|{Y}_{i}\right||\phantom{\rule{0.16667em}{0ex}}>\epsilon )=nE{\left|\right|{Y}_{1}\left|\right|}^{2}\mathbf{1}(|\left|{Y}_{1}\right||\phantom{\rule{0.16667em}{0ex}}>\epsilon )\le n{(E{\left|\right|{Y}_{1}\left|\right|}^{4})}^{1/2}{\{{P}_{r}(\left|\right|{Y}_{1}\left|\right|\phantom{\rule{0.16667em}{0ex}}>\epsilon )\}}^{1/2}.$$

Using Chebyshev’s inequality, we have

$${P}_{r}(|\left|{Y}_{1}\right||>\epsilon )\le \frac{E{\left|\right|{Y}_{1}\left|\right|}^{2}}{{\epsilon}^{2}}=\frac{E{\left|\right|v{I}^{\ast -1/2}{S}_{\mathit{eff},I}({W}_{1},{Z}_{1},{Y}_{1},{\beta}_{I0})\left|\right|}^{2}}{n{\epsilon}^{2}}=\frac{{v}^{T}v}{n{\epsilon}^{2}}=O({n}^{-1}).$$

Note that
$$\begin{array}{l}E({\left|\right|{Y}_{1}\left|\right|}^{4})={n}^{-2}E{\{{S}_{\mathit{eff},I}{({W}_{1},{Z}_{1},{Y}_{1},{\beta}_{I0})}^{T}{I}^{\ast -1/2}v{v}^{T}{I}^{\ast -1/2}{S}_{\mathit{eff},I}({W}_{1},{Z}_{1},{Y}_{1},{\beta}_{I0})\}}^{2}\\ \le {n}^{-2}{\lambda}_{max}^{2}({I}^{\ast -1})E{\{{S}_{\mathit{eff},I}{({W}_{1},{Z}_{1},{Y}_{1},{\beta}_{I0})}^{T}{S}_{\mathit{eff},I}({W}_{1},{Z}_{1},{Y}_{1},{\beta}_{I0})\}}^{2}\\ ={n}^{-2}{\lambda}_{max}^{2}({I}^{\ast -1})E{\left|\right|{S}_{\mathit{eff},I}({W}_{1},{Z}_{1},{Y}_{1},{\beta}_{I0})\left|\right|}^{4}=O({d}_{1}^{2}{n}^{-2}),\end{array}$$

due to condition (A2). Hence,

$$\sum _{i=1}^{n}E{\left|\right|{Y}_{i}\left|\right|}^{2}\mathbf{1}(|\left|{Y}_{i}\right||\phantom{\rule{0.16667em}{0ex}}>\epsilon )=O(n{d}_{1}{n}^{-1}{n}^{-1/2})=o(1).$$

On the other hand,

$$\begin{array}{c}\sum _{i=1}^{n}cov({Y}_{i})=ncov\{{n}^{-1/2}v{I}^{\ast -1/2}{S}_{\mathit{eff},I}({W}_{1},{Z}_{1},{Y}_{1},{\beta}_{I0})\}\\ =v{I}^{\ast -1/2}E\{{S}_{\mathit{eff},I}({W}_{1},{Z}_{1},{Y}_{1},{\beta}_{I0}){S}_{\mathit{eff},I}{({W}_{1},{Z}_{1},{Y}_{1},{\beta}_{I0})}^{T}\}{I}^{\ast -1/2}{v}^{T}=1.\end{array}$$

By the Lindeberg–Feller central limit theorem, the results in (ii) now follow.

The notation *C _{i}* below is generic and is allowed to be different from that in Conditions (A1)–(A3).

- (B1) The first derivatives of $\mathcal{L}$ with respect to *β* and *θ* exist and are denoted ${\mathcal{L}}_{\beta}$ and ${\mathcal{L}}_{\theta}$ respectively. The first derivative of *θ* with respect to *β* exists and is denoted *θ _{β}*. Thus $E({\mathcal{L}}_{\beta}+{\mathcal{L}}_{\theta}{\theta}_{\beta})$ exists and its left eigenvalues are bounded away from zero and infinity uniformly for all *n* at *β*_{0} and the true function *θ*_{0}(*Z*). For any entry *S*_{jk} of the matrix $d({\mathcal{L}}_{\beta}+{\mathcal{L}}_{\theta}{\theta}_{\beta})/d\beta $, $E({S}_{jk}^{2})<{C}_{1}<\infty $.
- (B2) The eigenvalues of the matrix $E[\{{\mathcal{L}}_{I}-{\mathcal{U}}_{I}(Z)\mathrm{\Psi}\}{\{{\mathcal{L}}_{I}-{\mathcal{U}}_{I}(Z)\mathrm{\Psi}\}}^{T}]$ satisfy 0 < *C*_{2} < *λ*_{min} < ··· < *λ*_{max} < *C*_{3} < ∞ for all *n*; for any entries *S*_{k}, *S*_{j} of ${\mathcal{L}}_{\beta}+{\mathcal{L}}_{\theta}{\theta}_{\beta}$, $E({S}_{k}^{2}{S}_{j}^{2})<{C}_{4}<\infty $.
- (B3) The second derivatives of $\mathcal{L}$ with respect to *β* and *θ* exist, the second derivative of *θ* with respect to *β* exists, and the entries are uniformly bounded by a function *M*(*W _{i}*, *Z _{i}*, *S _{i}*, *Y _{i}*) in a neighborhood of *β*_{0}, *θ*_{0}. In addition, *E*(*M*^{2}) < *C*_{5} < ∞ for all *n*, *d*.
- (B4) The random variable *Z* has compact support and its density *f _{Z}*(*z*) is positive on that support. The bandwidth *h* satisfies *nh*^{4} → 0 and *nh*^{2} → ∞. *θ*(*z*) has a bounded second derivative.

Denote

$$J={[E\{({\mathcal{L}}_{\beta}+{\mathcal{L}}_{\theta}{\theta}_{\beta}){\mid}_{{\beta}_{0},{\theta}_{0}}\}]}^{-1},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{\phi}_{\mathit{eff}}^{\ast}(\beta ,\theta )=J\mathcal{L}(\beta ,\theta )\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{q}_{{\lambda}_{n}}^{\prime}(\beta )=J{p}_{{\lambda}_{n}}^{\prime}(\beta ).$$

Let *α _{n}* =

$${n}^{-1/2}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}(\beta ,\widehat{\theta})-{n}^{1/2}{q}_{{\lambda}_{n}}^{\prime}(\beta )=0$$

(A4)

has a solution that satisfies $\left|\right|\widehat{\beta}-{\beta}_{0}\left|\right|\phantom{\rule{0.16667em}{0ex}}={O}_{p}({d}_{n}^{1/2}{\alpha}_{n})$.

Due to the usual local estimating equation expansion, we have

$$\begin{array}{l}\widehat{\theta}(z,{\beta}_{0})-{\theta}_{0}(z)\\ =({h}^{2}/2){\theta}_{0}^{\u2033}(z)-{n}^{-1}\sum _{j=1}^{n}{K}_{h}({Z}_{j}-z){\mathrm{\Omega}}^{-1}(z){\mathrm{\Psi}}_{j}({\beta}_{0},{\theta}_{0})/{f}_{Z}(z)+{o}_{p}({n}^{-1/2}),\end{array}$$

(A5)

which implies that $\widehat{\theta}(z,{\beta}_{0})-{\theta}_{0}(z)={O}_{p}($

$$\begin{array}{l}{n}^{-1/2}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}\{\beta ,\widehat{\theta}(\beta )\}\\ ={n}^{-1/2}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}\{{\beta}_{0},\widehat{\theta}({\beta}_{0})\}\\ +{n}^{-1/2}\sum _{i=1}^{n}\left[\frac{\partial {\phi}_{\mathit{eff},i}^{\ast}\{{\beta}_{0},\widehat{\theta}({\beta}_{0})\}}{\partial {\beta}^{T}}+\frac{\partial {\phi}_{\mathit{eff},i}^{\ast}\{{\beta}_{0},\widehat{\theta}({\beta}_{0})\}}{\partial {\theta}^{T}}\frac{\partial \widehat{\theta}}{\partial {\beta}^{T}}\right]\phantom{\rule{0.16667em}{0ex}}(\beta -{\beta}_{0})\\ +\frac{1}{2\sqrt{n}}\sum _{i=1}^{n}{(\beta -{\beta}_{0})}^{T}\frac{d\phantom{\rule{0.16667em}{0ex}}\left[{\phi}_{\mathit{eff},i}^{\ast}\{{\beta}_{0},\widehat{\theta}({\beta}_{0})\}+{\phi}_{\mathit{eff},i}^{\ast}\{{\beta}_{0},\widehat{\theta}({\beta}_{0})\}{\scriptstyle \frac{d\widehat{\theta}}{d\beta}}\right]}{d{\beta}^{T}}{\mid}_{{\beta}^{\ast}}(\beta -{\beta}_{0}),\end{array}$$

where *β*^{*} is in between *β* and *β*_{0}. Because of condition (B3), each component of the last term is uniformly of order *O _{p}*(

$$\begin{array}{l}{n}^{-1/2}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}\{{\beta}_{0},\widehat{\theta}({\beta}_{0})\}\\ ={n}^{-1/2}\sum _{i=1}^{n}\frac{\partial {\phi}_{\mathit{eff},i}^{\ast}\{{\beta}_{0},\widehat{\theta}({\beta}_{0})\}}{\partial {\theta}^{T}}\{\widehat{\theta}\phantom{\rule{0.16667em}{0ex}}({\beta}_{0})-{\theta}_{0}\}+{O}_{p}({n}^{1/2})\{\widehat{\theta}({\beta}_{0})-{\theta}_{0}\}{\{\widehat{\theta}({\beta}_{0})-{\theta}_{0}\}}^{T}\\ ={n}^{-1/2}\sum _{i=1}^{n}\frac{\partial {\phi}_{\mathit{eff},i}^{\ast}\{{\beta}_{0},\widehat{\theta}({\beta}_{0})\}}{\partial {\theta}^{T}}\{\widehat{\theta}\phantom{\rule{0.16667em}{0ex}}({\beta}_{0})-{\theta}_{0}\}+{o}_{p}(1)\end{array}$$

under conditions (B3) and (B4). Summarizing the above results, making use of (A5), we obtain

$$\begin{array}{l}{n}^{-1/2}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}\{\beta ,\widehat{\theta}(\beta )\}\\ ={n}^{-1/2}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}({\beta}_{0},{\theta}_{0})+{n}^{1/2}(\beta -{\beta}_{0})\\ -{n}^{-3/2}\sum _{j,i=1}^{n}\frac{\partial {\phi}_{\mathit{eff},i}^{\ast}({\beta}_{0},{\theta}_{0})}{\partial {\theta}^{T}}{K}_{h}({Z}_{j}-{Z}_{i})\frac{{\mathrm{\Omega}}^{-1}({Z}_{i}){\mathrm{\Psi}}_{j}({\beta}_{0},{\theta}_{0})}{{f}_{Z}({Z}_{i})}+{o}_{p}(1)\\ ={n}^{-1/2}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}({\beta}_{0},{\theta}_{0})+{n}^{1/2}(\beta -{\beta}_{0})-{n}^{-1/2}\sum _{i=1}^{n}J\mathcal{U}({Z}_{i}){\mathrm{\Psi}}_{i}({\beta}_{0},{\theta}_{0})+{o}_{p}(1)\end{array}$$

under condition (B4). Similar to the situation in Theorem 1, under condition (P1), we further obtain

$$\begin{array}{l}{(\beta -{\beta}_{0})}^{T}\left\{{n}^{-1/2}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}(\beta ,\widehat{\theta})-{n}^{1/2}{q}_{{\lambda}_{n}}^{\prime}(\beta )\right\}\\ ={(\beta -{\beta}_{0})}^{T}\left\{{n}^{-1/2}\sum _{i=1}^{n}{\phi}_{\mathit{eff},i}^{\ast}({\beta}_{0},{\theta}_{0})-{n}^{1/2}{q}_{{\lambda}_{n}}^{\prime}({\beta}_{0})-{n}^{-1/2}\sum _{i=1}^{n}J\mathcal{U}({Z}_{i}){\mathrm{\Psi}}_{i}({\beta}_{0},{\theta}_{0})\right\}\\ +{n}^{1/2}{\left|\right|\beta -{\beta}_{0}\left|\right|}^{2}+{o}_{p}\{{n}^{1/2}{\left|\right|\beta -{\beta}_{0}\left|\right|}^{2}\}.\end{array}$$

(A6)

The first term in the above display is of order
${O}_{p}(C{n}^{1/2}{d}_{n}{\alpha}_{n}^{2})$, the second term equals
${C}^{2}{n}^{1/2}{d}_{n}{\alpha}_{n}^{2}$, which dominates the first term as long as *C* is large enough. The last term is dominated by the first two terms. Thus, for any *ε* > 0, as long as *C* is large enough, the probability that the above display is larger than zero is at least 1 − *ε*. From the Brouwer fixed-point theorem, we know that with probability at least 1 − *ε*, there exists at least one solution of (A4) in the region
$\left|\right|\beta -{\beta}_{0}\left|\right|\phantom{\rule{0.16667em}{0ex}}\le C{d}_{n}^{1/2}{\alpha}_{n}$.

If conditions in Theorem 4 hold, then for any given *β* that satisfies
$\left|\right|\beta -{\beta}_{0}\left|\right|={O}_{p}(\sqrt{d/n})$, with probability tending to 1, any solution
${({\beta}_{I}^{T},{\beta}_{II}^{T})}^{T}$ of (2.2) satisfies that *β _{II}* = 0.

Denote the *k*th equation in
${\sum}_{i=1}^{n}{\mathcal{L}}_{i}\{\beta ,\widehat{\theta}(\beta )\}$ as *L _{nk}*(

$$\begin{array}{l}{L}_{nk}(\beta ,\widehat{\theta})-n{p}_{{\lambda}_{n}}^{\prime}({\beta}_{k})\\ ={L}_{nk}({\beta}_{0},{\theta}_{0})-{G}_{nk}({\beta}_{0},{\theta}_{0})+n\sum _{j=1}^{d}{({J}^{-1})}_{kj}({\beta}_{j}-{\beta}_{j0})-n{p}_{{\lambda}_{n}}^{\prime}(\mid {\beta}_{k}\mid )\text{sign}({\beta}_{k})+{o}_{p}(\sqrt{n{d}_{n}}).\end{array}$$

Similar to the derivation in Lemma 1, the first three terms of the above display are all of order ${O}_{p}(\sqrt{n{d}_{n}})$, hence we have

$${L}_{nk}(\beta ,\widehat{\theta})-n{p}_{{\lambda}_{n}}^{\prime}({\beta}_{k})=-\sqrt{n{d}_{n}}\{\sqrt{n/{d}_{n}}{p}_{{\lambda}_{n}}^{\prime}(\mid {\beta}_{k}\mid )\text{sign}({\beta}_{k})+{O}_{p}(1)\}.$$

Because of (2.6), the sign of
${L}_{nk}(\beta )-n{p}_{{\lambda}_{n}}^{\prime}({\beta}_{k})$ is decided by sign(*β _{k}*) completely. From the continuity of
${L}_{nk}(\beta )-n{p}_{{\lambda}_{n}}^{\prime}({\beta}_{k})$, we obtain that it is zero at

(i) follows immediately from Lemma 1. Denote the first *d*_{1} equations in
${\sum}_{i=1}^{n}{\mathcal{L}}_{i}\{{({\beta}_{I}^{T},{0}^{T})}^{T},\widehat{\theta}\}$ as *L _{n}*{

$$\begin{array}{l}0={L}_{n}\{{\widehat{\beta}}_{I},\widehat{\theta}({\widehat{\beta}}_{I})\}-n{p}_{{\lambda}_{n},I}^{\prime}({\widehat{\beta}}_{I})\\ ={L}_{n}({\beta}_{I0},{\theta}_{0})-{G}_{n}({\beta}_{I0},{\theta}_{0})+nA({\widehat{\beta}}_{I}-{\beta}_{I0})-n{b}_{n}-n\{{\mathrm{\sum}}_{n}+{o}_{p}(1)\}({\widehat{\beta}}_{I}-{\beta}_{I0})+{o}_{p}({d}_{n}^{1/2}{n}^{1/2})\\ ={L}_{n}({\beta}_{I0},{\theta}_{0})-{G}_{n}({\beta}_{I0},{\theta}_{0})+n(A-{\mathrm{\sum}}_{n})\left[{\widehat{\beta}}_{I}-{\beta}_{I0}-{(A-{\mathrm{\sum}}_{n})}^{-1}{b}_{n}\right]+{o}_{p}({d}_{n}^{1/2}{n}^{1/2}).\end{array}$$

Using condition (B2), we have

$$\begin{array}{l}{n}^{1/2}{v}^{T}{B}^{-1/2}\left\{(-A+{\mathrm{\sum}}_{n})({\widehat{\beta}}_{I}-{\beta}_{I0})+{b}_{n}\right\}\\ ={n}^{-1/2}{v}^{T}{B}^{-1/2}\{{L}_{n}({\beta}_{I0},{\theta}_{0})-{G}_{n}({\beta}_{I0},{\theta}_{0})\}+{o}_{p}({v}^{T}{B}^{-1/2})\\ ={n}^{-1/2}{v}^{T}{B}^{-1/2}\{{L}_{n}({\beta}_{I0},{\theta}_{0})-{G}_{n}({\beta}_{I0},{\theta}_{0})\}+{o}_{p}(1).\end{array}$$

Let *Y _{i}* =

$$\sum _{i=1}^{n}E{\left|\right|{Y}_{i}\left|\right|}^{2}\mathbf{1}(|\left|{Y}_{i}\right||>\epsilon )=nE{\left|\right|{Y}_{1}\left|\right|}^{2}\mathbf{1}(|\left|{Y}_{1}\right||>\epsilon )\le n{(E{\left|\right|{Y}_{1}\left|\right|}^{4})}^{1/2}{\{{P}_{r}(\left|\right|{Y}_{1}\left|\right|>\epsilon )\}}^{1/2}.$$

Using the Chebyshev inequality, we have *P _{r}*(||

$${n}^{-2}{\lambda}_{max}^{2}({B}^{-1})E{[{\{{\mathcal{L}}_{nI1}({\beta}_{I0},{\theta}_{0})-{\mathcal{U}}_{nI}({Z}_{1}){\mathrm{\Psi}}_{1}({\beta}_{I0},{\theta}_{0})\}}^{T}\{{\mathcal{L}}_{nI1}({\beta}_{I0},{\theta}_{0})-{\mathcal{U}}_{nI}({Z}_{1}){\mathrm{\Psi}}_{1}({\beta}_{I0},{\theta}_{0})\}]}^{2}$$

which equals ${n}^{-2}{\lambda}_{max}^{2}({B}^{-1})E{\left|\right|\{{\mathcal{L}}_{nI1}({\beta}_{I0},{\theta}_{0})-{\mathcal{U}}_{nI}({Z}_{1}){\mathrm{\Psi}}_{1}({\beta}_{I0},{\theta}_{0})\}\left|\right|}^{4}=O({d}_{2}^{2}{n}^{-2})$ by condition (B2). Hence,

$$\sum _{i=1}^{n}E{\left|\right|{Y}_{i}\left|\right|}^{2}\mathbf{1}(|\left|{Y}_{i}\right||>\epsilon )=O(n{d}_{n}{n}^{-1}{n}^{-1/2})=o(1).$$

On the other hand,

$$\sum _{i=1}^{n}cov({Y}_{i})=ncov[{n}^{-1/2}{v}^{T}{B}^{-1/2}\{{\mathcal{L}}_{nI1}({\beta}_{I0},{\theta}_{0})-{\mathcal{U}}_{nI}({Z}_{1}){\mathrm{\Psi}}_{1}({\beta}_{I0},{\theta}_{0})\}]=1.$$

(ii) follows from the Lindeberg–Feller central limit theorem.

Yanyuan Ma, Department of Statistics, Texas A&M University, College Station, TX 77843.

Runze Li, Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, PA 16802.

- Bickel PJ, Ritov Y. Efficient estimation in the errors-in-variables model. Annals of Statistics. 1987;15:513–540.
- Cai J, Fan J, Li R, Zhou H. Variable selection for multivariate failure time data. Biometrika. 2005;92:303–16. [PMC free article] [PubMed]
- Candès E, Tao T. The Dantzig selector: statistical estimation when *p* is much larger than *n* (with discussion). Annals of Statistics. 2007;35:2313–2392.
- Carroll RJ, Hall P. Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Association. 1988;83:1184–1186.
- Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu C. Measurement Error in Nonlinear Models: A Modern Perspective. 2. CRC Press; London: 2006.
- Delaigle A, Hall P. Using SIMEX for smoothing-parameter choice in errors-in-variables problems. Journal of the American Statistical Association. 2007;103:280–287.
- Delaigle A, Meister A. Nonparametric regression estimation in the heteroscedastic errors-in-variables problem. Journal of the American Statistical Association. 2007;102:1416–1426.
- Fan J. On the optimal rates of convergence for nonparametric deconvolution problems. Annals of Statistics. 1991;19:1257–1272.
- Fan J, Huang T. Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli. 2005;11:1031–1057.
- Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001;96:1348–60.
- Fan J, Lv J. Sure independence screening for ultra-high dimensional feature space (with discussion). Journal of the Royal Statistical Society, Series B. 2008;70:849–911. [PMC free article] [PubMed]
- Fan J, Peng H. Nonconcave penalized likelihood with a diverging number of parameters. Annals of Statistics. 2004;32:928–61.
- Hall P, Ma Y. Semiparametric estimators of functional measurement error models with unknown error. Journal of the Royal Statistical Society, Series B. 2007;69:429–46.
- Härdle W, Liang H, Gao J. Partially Linear Models. Heidelberg: Springer Physica; 2000.
- Hunter D, Li R. Variable selection using MM algorithms. Annals of Statistics. 2005;33:1617–1642. [PMC free article] [PubMed]
- Kannel WB, Newton JD, Wentworth D, Thomas HE, Stamler J, Hulley SB, Kjelsberg MO. Overall and coronary heart disease mortality rates in relation to major risk factors in 325,348 men screened for MRFIT. American Heart Journal. 1986;112:825–36. [PubMed]
- Lam C, Fan J. Profile-Kernel likelihood inference with diverging number of parameters. Annals of Statistics. 2008;36:2232–2260. [PMC free article] [PubMed]
- Li R, Liang H. Variable selection in semiparametric regression modeling. Annals of Statistics. 2008;36:261–286. [PMC free article] [PubMed]
- Li R, Nie L. A new estimation procedure for partially nonlinear model via a mixed effects approach. Canadian Journal of Statistics. 2007;35:399–411.
- Li R, Nie L. Efficient statistical inference procedures for partially nonlinear models and their applications. Biometrics. 2008;64:904–911. [PMC free article] [PubMed]
- Liang H, Härdle W, Carroll RJ. Estimation in a semiparametric partially linear errors-in-variables model. Annals of Statistics. 1999;27:1519–1535.
- Liang H, Li R. Variable selection for partially linear models with measurement errors. Journal of the American Statistical Association. 2009 in press. [PMC free article] [PubMed]
- Ma Y, Carroll RJ. Locally efficient estimators for semiparametric models with measurement error. Journal of the American Statistical Association. 2006;101:1465–74.
- Ma Y, Li R. Variable selection in measurement error models. Tech Report. 2007. Available at http://www2.unine.ch/webdav/site/statistics/shared/documents/v10.pdf.
- Ma Y, Tsiatis AA. Closed form semiparametric estimators for measurement error models. Statistica Sinica. 2006;16:183–93.
- Severini TA, Staniswalis JG. Quasilikelihood estimation in semiparametric models. Journal of the American Statistical Association. 1994;89:501–511.
- Stefanski LA, Carroll RJ. Conditional scores and optimal scores for generalized linear measurement-error models. Biometrika. 1987;74:703–716.
- Tsiatis AA, Ma Y. Locally efficient semiparametric estimators for functional measurement error models. Biometrika. 2004;91:835–48.
- Wang H, Li R, Tsai C. Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika. 2007;94:553–568. [PMC free article] [PubMed]
- Zou H, Li R. One-step sparse estimates in nonconcave penalized likelihood models (with discussion) Annals of Statistics. 2008;36:1509–1566. [PMC free article] [PubMed]
