
Local Rank Inference for Varying Coefficient Models

J Am Stat Assoc. Author manuscript; available in PMC 2010 December 1.

Published in final edited form as: J Am Stat Assoc. 2009 December 1; 104(488): 1631–1645. doi: 10.1198/jasa.2009.tm09055

PMCID: PMC2908045; NIHMSID: NIHMS142014


By allowing the regression coefficients to change with certain covariates, the class of varying coefficient models offers a flexible approach to modeling nonlinearity and interactions between covariates. This paper proposes a novel estimation procedure for the varying coefficient models based on local ranks. The new procedure provides a highly efficient and robust alternative to the local linear least squares method, and can be conveniently implemented using an existing R software package. Theoretical analysis and numerical simulations both reveal that the gain of the local rank estimator over the local linear least squares estimator, measured by the asymptotic mean squared error or the asymptotic mean integrated squared error, can be substantial. In the normal error case, the asymptotic relative efficiency for estimating both the coefficient functions and the derivatives of the coefficient functions is above 96%; even in the worst case scenarios, the asymptotic relative efficiency has a lower bound of 88.96% for estimating the coefficient functions, and a lower bound of 89.91% for estimating their derivatives. The new estimator may achieve the nonparametric convergence rate even when the local linear least squares method fails due to infinite random error variance. We establish the large sample theory of the proposed procedure by utilizing results from generalized U-statistics, whose kernel function may depend on the sample size. We also extend a resampling approach, which perturbs the objective function repeatedly, to the generalized U-statistics setting, and demonstrate that it can accurately estimate the asymptotic covariance matrix.

As introduced in Cleveland, Grosse and Shyu (1992) and Hastie and Tibshirani (1993), the varying coefficient model provides a natural and useful extension of the classical linear regression model by allowing the regression coefficients to depend on certain covariates. Due to its flexibility in exploring dynamic features that may exist in the data and its easy interpretation, the varying coefficient model has been widely applied in many scientific areas. It has also experienced rapid developments in both theory and methodology; see Fan and Zhang (2008) for a comprehensive survey. Fan and Zhang (1999) proposed a two-step estimation procedure for the varying coefficient model when the coefficient functions have possibly different degrees of smoothness. Kauermann and Tutz (1999) investigated the use of varying coefficient models for diagnosing the lack-of-fit of regression, regarding the varying coefficient model as an alternative to a parametric null model. Cai, Fan and Li (2000) developed a more efficient estimation procedure for varying coefficient models in the framework of generalized linear models. As special cases of varying coefficient models, time-varying coefficient models are particularly appealing in longitudinal studies, survival analysis and time series data, since they allow one to explore the time-varying effects of covariates on the response. Pioneering works on novel applications of time-varying coefficient models to longitudinal data include Brumback and Rice (1998), Hoover et al. (1998), Wu et al. (1998) and Fan and Zhang (2000), among others. For more details, readers are referred to Fan and Li (2006) and the references therein. Time-varying coefficient models are also popular in modeling and predicting nonlinear time series data and survival data; see Fan and Zhang (2008) for related literature.

Estimation procedures in the aforementioned papers are built on either local least squares type or local likelihood type methods. Although these estimators remain asymptotically normal for a large class of random error distributions, their efficiency can deteriorate dramatically when the true error distribution deviates from normality. Furthermore, these estimators are very sensitive to outliers. Even a few outlying data points may introduce undesirable artificial features in the estimated functions. These considerations motivate us to develop a novel local rank estimation procedure that is highly efficient, robust and computationally simple. In particular, the proposed local rank regression estimator may achieve the nonparametric convergence rate even when the local linear least squares method fails to consistently estimate the regression coefficient functions due to infinite random error variance, which occurs for instance when the random error has a Cauchy distribution.

The new approach can substantially improve upon the commonly used local linear least squares procedure for a wide class of error distributions. Theoretical analysis reveals that the asymptotic relative efficiency (ARE), measured by the asymptotic mean squared error (or the asymptotic mean integrated squared error), of the local rank regression estimator in comparison with the local linear least squares estimator has an expression that is closely related to that of the Wilcoxon-Mann-Whitney rank test in comparison with the two-sample *t*-test. However, different from the two-sample test scenario, where the efficiency is completely determined by the asymptotic variance, in the current setting of estimating an infinite-dimensional parameter both bias and variance contribute to the asymptotic efficiency. The value of the ARE is often significantly greater than one. For example, the ARE is 167% for estimating the regression coefficient functions when the random error has a *t*_{3} distribution, 240% for the exponential error distribution, and 493% for the lognormal error distribution.

A striking feature of the local rank procedure is that its pronounced efficiency gain comes with only a little loss when the random error actually has a normal distribution, for which the ARE of the local rank regression estimator relative to the local linear least squares estimator is above 96% for estimating both the coefficient functions and their derivatives. For estimating the regression coefficient functions, the ARE has a sharp lower bound of 88.96%, which implies that the efficiency loss is at most 11.04% in the worst case scenario. For estimating the first derivatives of the regression coefficient functions, the ARE possesses a lower bound of 89.91%. Kim (2008) developed a quantile regression procedure for varying coefficient models when the random errors are assumed to have a certain quantile equal to zero. She used the regression splines method and derived the convergence rate, but the lack of an asymptotic normality result does not allow a comparison of relative efficiency. On the other hand, one may extend the local quantile regression approach (Yu and Jones, 1998) to the varying coefficient models. However, this is expected to yield an estimator that still suffers from loss of efficiency and may have near-zero ARE relative to the local linear least squares estimator in the worst case scenario.

The new estimator proposed in this paper minimizes a convex objective function based on local ranks. The implementation of the minimization can be conveniently carried out using existing functions in the R statistical software package via a simple algorithm (§4.1). The objective function has the form of a generalized *U*-statistic whose kernel varies with the sample size. Under some mild conditions, we establish the asymptotic representation of the proposed estimator and further prove its asymptotic normality. We derive the formula of the asymptotic relative efficiency of the local rank estimator relative to the local linear least squares estimator, which confirms the efficiency advantage of the local rank approach. We also extend a resampling approach, which perturbs the objective function repeatedly, to the generalized U-statistics setting; and demonstrate that it can accurately estimate the asymptotic covariance matrix.

This paper is organized as follows. Section 2 presents the local rank procedure for estimating the varying coefficient models. Section 3 discusses its large sample properties and proposes a resampling method for estimating the asymptotic covariance matrix. In Section 4, we address issues related to practical implementation and present Monte Carlo simulation results. We further illustrate the proposed procedure by analyzing an environmental data set. Regularity conditions and technical proofs are presented in the Appendix.

Let *Y* be a response variable, and *U* and **X** be the covariates. The varying coefficient model is defined by

$$Y={a}_{0}(U)+{\mathbf{X}}^{T}\mathbf{a}(U)+\epsilon ,$$

(1)

where *a*_{0}(·) and **a**(·) are both unknown smooth functions. The random error *ε* has probability density function *g*(·) which has finite Fisher information, i.e., *∫* {*g*(*x*)}^{−1} *g*′(*x*)^{2}*dx* < ∞. In this paper, it is assumed that *U* is a scalar and **X** is a *p*-dimensional vector. The proposed procedures can be extended to the case of multivariate *U*, with more complicated notation, by following the same ideas as in this paper.

Suppose that $\{{U}_{i},{\mathbf{X}}_{i},{Y}_{i}\}$, $i=1,\dots ,n$, is a random sample from model (1). For $u$ in a neighborhood of a given point ${u}_{0}$, we approximate each coefficient function locally by a linear function:

$${a}_{m}(u)\approx {a}_{m}({u}_{0})+{a}_{m}^{\prime}({u}_{0})(u-{u}_{0}),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}m=0,1,\dots ,p.$$

(2)

Denote ${\alpha}_{1}={a}_{0}({u}_{0})$, ${\alpha}_{2}={a}_{0}^{\prime}({u}_{0})$, ${\beta}_{m}={a}_{m}({u}_{0})$ and ${\beta}_{p+m}={a}_{m}^{\prime}({u}_{0})$, $m=1,\dots ,p$. Define the local residual

$${e}_{i}={Y}_{i}-{\alpha}_{1}-{\alpha}_{2}({U}_{i}-{u}_{0})-\sum _{m=1}^{p}[{\beta}_{m}+{\beta}_{p+m}({U}_{i}-{u}_{0})]{X}_{im}.$$

(3)

We define the local rank objective function to be

$${Q}_{n}(\mathit{\beta},{\alpha}_{2})=\frac{1}{n(n-1)}\sum _{1\le i,j\le n}\mid {e}_{i}-{e}_{j}\mid {K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0}),$$

(4)

where $\mathit{\beta}={({\beta}_{1},\dots ,{\beta}_{2p})}^{T}$, ${K}_{h}(\cdot)={h}^{-1}K(\cdot /h)$, $K(\cdot)$ is a kernel function, and $h>0$ is a bandwidth.

For any given ${u}_{0}$, minimizing ${Q}_{n}(\mathit{\beta},{\alpha}_{2})$ with respect to ${({\mathit{\beta}}^{T},{\alpha}_{2})}^{T}$ yields ${({\widehat{\mathit{\beta}}}^{T},{\widehat{\alpha}}_{2})}^{T}$, and we estimate the coefficient functions and their first derivatives by

$${\widehat{a}}_{m}({u}_{0})={\widehat{\beta}}_{m},\phantom{\rule{0.38889em}{0ex}}{\widehat{a}}_{m}^{\prime}({u}_{0})={\widehat{\beta}}_{p+m},\phantom{\rule{0.38889em}{0ex}}m=1,\dots ,p,\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.16667em}{0ex}}{\widehat{a}}_{0}^{\prime}({u}_{0})={\widehat{\alpha}}_{2}.$$

In the sequel, we also use the vector notation $\widehat{\mathbf{a}}({u}_{0})={({\widehat{a}}_{1}({u}_{0}),\dots ,{\widehat{a}}_{p}({u}_{0}))}^{T}$ and ${\widehat{\mathbf{a}}}^{\prime}({u}_{0})={({\widehat{a}}_{1}^{\prime}({u}_{0}),\dots ,{\widehat{a}}_{p}^{\prime}({u}_{0}))}^{T}$.

The location parameter *a*_{0}(*u*_{0}) needs to be estimated separately. This is analogous to the global rank estimation of the intercept in the linear regression model. In order to make the intercept identifiable, an additional location constraint on the random errors is essential. We adopt the commonly used constraint that *ε _{i}* has median zero. Given ${({\widehat{\mathit{\beta}}}^{T},{\widehat{\alpha}}_{2})}^{T}$, we estimate ${a}_{0}({u}_{0})$ by ${\widehat{\alpha}}_{1}$, the minimizer of

$${n}^{-1}\sum _{i=1}^{n}\left|{Y}_{i}-{\alpha}_{1}-{\widehat{\alpha}}_{2}({U}_{i}-{u}_{0})-\sum _{m=1}^{p}[{\widehat{\beta}}_{m}+{\widehat{\beta}}_{p+m}({U}_{i}-{u}_{0})]{X}_{im}\right|{K}_{h}({U}_{i}-{u}_{0}),$$

(5)

which is a local version of a weighted *L*_{1}-norm objective function.
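Because (5) involves only the intercept, its minimizer is a kernel-weighted median of partial residuals. The following R sketch makes this concrete; it is our illustration under stated assumptions (all function and argument names are ours), not the authors' implementation.

```r
# Minimal sketch of the intercept step (5): a0(u0) is estimated by a
# kernel-weighted median of partial residuals. beta_hat holds the local rank
# estimates (beta_1..beta_p, beta_{p+1}..beta_{2p}); alpha2_hat estimates
# a0'(u0). These names are illustrative assumptions.
epan <- function(t) 0.75 * (1 - t^2) * (abs(t) < 1)   # Epanechnikov kernel

weighted_median <- function(x, w) {
  o <- order(x)
  x <- x[o]; w <- w[o]
  x[which(cumsum(w) >= sum(w) / 2)[1]]   # minimizes sum_i w_i |x_i - m|
}

est_intercept <- function(Y, U, X, u0, h, beta_hat, alpha2_hat) {
  p <- ncol(X)
  w <- epan((U - u0) / h) / h            # kernel weights K_h(U_i - u0)
  # partial residuals: remove all fitted terms except the intercept
  r <- Y - alpha2_hat * (U - u0) -
    drop(X %*% beta_hat[1:p]) -
    drop(X %*% beta_hat[(p + 1):(2 * p)]) * (U - u0)
  weighted_median(r[w > 0], w[w > 0])
}
```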

In this subsection, we investigate the asymptotic properties of $\widehat{\mathit{\beta}}$ and ${\widehat{\alpha}}_{2}$. The main challenge comes from the non-smoothness of the objective function ${Q}_{n}(\mathit{\beta},{\alpha}_{2})$.

Let us begin with some new notation. Let ${\gamma}_{n}={(nh)}^{-1/2}$, and define

$$\begin{array}{l}{\mathit{\beta}}^{\ast}={\gamma}_{n}^{-1}{({\beta}_{1}-{a}_{1}({u}_{0}),\dots ,{\beta}_{p}-{a}_{p}({u}_{0}),h({\beta}_{p+1}-{a}_{1}^{\prime}({u}_{0})),\dots ,h({\beta}_{2p}-{a}_{p}^{\prime}({u}_{0})))}^{T},\\ {\mathit{\alpha}}^{\ast}={({\alpha}_{1}^{\ast},{\alpha}_{2}^{\ast})}^{T}={\gamma}_{n}^{-1}{({\alpha}_{1}-{a}_{0}({u}_{0}),h({\alpha}_{2}-{a}_{0}^{\prime}({u}_{0})))}^{T}\\ {\mathrm{\Delta}}_{i}({u}_{0})=\sum _{m=1}^{p}[{a}_{m}({U}_{i})-{a}_{m}({u}_{0})-{a}_{m}^{\prime}({u}_{0})({U}_{i}-{u}_{0})]{X}_{im}\\ +[{a}_{0}({U}_{i})-{a}_{0}({u}_{0})-{a}_{0}^{\prime}({u}_{0})({U}_{i}-{u}_{0})].\end{array}$$

Let ${({\widehat{\mathit{\beta}}}_{n}^{\ast T},{\widehat{\alpha}}_{2n}^{\ast})}^{T}$ be the value of ${({\mathit{\beta}}^{\ast T},{\alpha}_{2}^{\ast})}^{T}$ that minimizes the following reparametrized objective function

$$\begin{array}{l}{Q}_{n}^{\ast}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})=\frac{1}{n(n-1)}\sum _{1\le i,j\le n}|({\epsilon}_{i}-{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{i}-{u}_{0})/h-{\gamma}_{n}{\mathit{\beta}}^{\ast T}{\mathbf{Z}}_{i}+{\mathrm{\Delta}}_{i}({u}_{0}))\\ -({\epsilon}_{j}-{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{j}-{u}_{0})/h-{\gamma}_{n}{\mathit{\beta}}^{\ast T}{\mathbf{Z}}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))|{K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0}),\end{array}$$

(6)

where ${\mathbf{Z}}_{i}={({\mathbf{X}}_{i}^{T},(({U}_{i}-{u}_{0})/h){\mathbf{X}}_{i}^{T})}^{T}$. Let $\mathbf{H}=\text{diag}(1,h)\otimes {\mathbf{I}}_{p}$, where $\otimes$ denotes the Kronecker product and ${\mathbf{I}}_{p}$ denotes the $p\times p$ identity matrix. With this notation,

$${\widehat{\mathit{\beta}}}_{n}^{\ast}=\sqrt{nh}\mathbf{H}(\widehat{\mathit{\beta}}-{\mathit{\beta}}_{0})\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.16667em}{0ex}}{\widehat{\alpha}}_{2n}^{\ast}=\sqrt{n{h}^{3}}[{\widehat{\alpha}}_{2}-{a}_{0}^{\prime}({u}_{0})].$$

We next show that the non-smooth function ${Q}_{n}^{\ast}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})$ can be locally approximated by a quadratic function of ${({\mathit{\beta}}^{\ast T},{\alpha}_{2}^{\ast})}^{T}$. Let ${\mu}_{j}=\int {t}^{j}K(t)\phantom{\rule{0.16667em}{0ex}}dt$ and ${\nu}_{j}=\int {t}^{j}{K}^{2}(t)\phantom{\rule{0.16667em}{0ex}}dt$ for $j=0,1,2$. Define

$$\begin{array}{l}{S}_{n1}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})=2{\gamma}_{n}{[n(n-1)]}^{-1}\sum _{i\ne j}[I({\epsilon}_{i}-{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{i}-{u}_{0})/h-{\gamma}_{n}{\mathit{\beta}}^{\ast T}{\mathbf{Z}}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}\\ -{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{j}-{u}_{0})/h-{\gamma}_{n}{\mathit{\beta}}^{\ast T}{\mathbf{Z}}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-1/2]({\mathbf{Z}}_{i}-{\mathbf{Z}}_{j}){K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0}),\end{array}$$

and

$$\begin{array}{l}{S}_{n2}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})=2{\gamma}_{n}{[n(n-1)]}^{-1}\sum _{i\ne j}[I({\epsilon}_{i}-{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{i}-{u}_{0})/h-{\gamma}_{n}{\mathit{\beta}}^{\ast T}{\mathbf{Z}}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}\\ -{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{j}-{u}_{0})/h-{\gamma}_{n}{\mathit{\beta}}^{\ast T}{\mathbf{Z}}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-1/2](({U}_{i}-{U}_{j})/h){K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0}).\end{array}$$

Furthermore, we consider the following quadratic function of ${({\mathit{\beta}}^{\ast T},{\alpha}_{2}^{\ast})}^{T}$:

$${B}_{n}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})={\gamma}_{n}^{-1}({\mathit{\beta}}^{\ast T},{\alpha}_{2}^{\ast})\phantom{\rule{0.16667em}{0ex}}\left(\begin{array}{c}{S}_{n1}(\mathbf{0},0)\\ {S}_{n2}(\mathbf{0},0)\end{array}\right)+\frac{1}{2}{\gamma}_{n}({\mathit{\beta}}^{\ast T},{\alpha}_{2}^{\ast})\mathbf{A}\left(\begin{array}{c}{\mathit{\beta}}^{\ast}\\ {\alpha}_{2}^{\ast}\end{array}\right)+{\gamma}_{n}^{-1}{Q}_{n}^{\ast}(\mathbf{0},0),$$

(7)

where

$$\mathbf{A}=4\tau {f}^{2}({u}_{0})\left(\begin{array}{ccc}\mathrm{\sum}({u}_{0})& \mathbf{0}& \mathbf{0}\\ \mathbf{0}& {\mu}_{2}\mathrm{\sum}({u}_{0})& \mathbf{0}\\ \mathbf{0}& \mathbf{0}& {\mu}_{2}\end{array}\right),$$

(8)

$\mathrm{\sum}({u}_{0})=E[{\mathbf{X}}_{i}{\mathbf{X}}_{i}^{T}\mid {U}_{i}={u}_{0}]$, **0** denotes a matrix (or vector) of zeroes whose dimension is determined by the context, *τ* = *∫ g*^{2}(*t*)*dt* is the Wilcoxon constant, and *g*(·) is the density function of the random error *ε*.

**Lemma 3.1.** *Suppose that Conditions (C1)–(C4) in the Appendix hold. Then for every ε > 0 and every c > 0,*

$$P\left[\underset{\mid \mid ({\mathit{\beta}}^{\ast T},{\alpha}_{2}^{\ast})\mid \mid \le c}{sup}\left|{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})-{B}_{n}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})\right|\ge \epsilon \right]\to 0,$$

where || · || denotes the Euclidean norm.

Lemma 3.1 implies that the non-smooth objective function
${Q}_{n}^{\ast}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})$ can be uniformly approximated by a quadratic function
${B}_{n}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})$ in a neighborhood around **0**. In the appendix, it is also shown that the minimizer of
${B}_{n}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})$ is asymptotically within an ${o}_{p}(1)$ neighborhood of
${({\widehat{\mathit{\beta}}}_{n}^{\ast T},{\widehat{\alpha}}_{n2}^{\ast})}^{T}$. This further allows us to derive the asymptotic distribution.

We first establish the asymptotic behavior of the local linear Wilcoxon estimator $\widehat{\mathbf{a}}({u}_{0})$ of $\mathbf{a}({u}_{0})={({a}_{1}({u}_{0}),\dots ,{a}_{p}({u}_{0}))}^{T}$.

**Theorem 3.2.** *Suppose that Conditions (C1)–(C4) in the Appendix hold. Then we have the following asymptotic representation:*

$$\sqrt{nh}[\widehat{\mathbf{a}}({u}_{0})-\mathbf{a}({u}_{0})]=-{\gamma}_{n}^{-2}{[4\tau {f}^{2}({u}_{0})\mathrm{\sum}({u}_{0})]}^{-1}{S}_{n11}(\mathbf{0},0)+{o}_{P}(1),$$

(9)

where $f(u)$ is the density function of $U$ and ${S}_{n11}(\mathbf{0},0)$, given in (A.2), consists of the first $p$ components of ${S}_{n1}(\mathbf{0},0)$. Furthermore,

$$\sqrt{nh}\left[\widehat{\mathbf{a}}({u}_{0})-\mathbf{a}({u}_{0})-\frac{{\mu}_{2}}{2}{\mathbf{a}}^{\u2033}({u}_{0}){h}^{2}+o({h}^{2})\right]\to \mathbf{N}\phantom{\rule{0.16667em}{0ex}}\left(0,\frac{{\nu}_{0}}{12{\tau}^{2}f({u}_{0})}{\mathrm{\sum}}^{-1}({u}_{0})\right)$$

(10)

in distribution, where ${\mathbf{a}}^{\u2033}({u}_{0})={({a}_{1}^{\u2033}({u}_{0}),\dots ,{a}_{p}^{\u2033}({u}_{0}))}^{T}$.

For the estimators of the derivatives of the coefficient functions, we have the following asymptotic representations:

$$\sqrt{n{h}^{3}}[{\widehat{\alpha}}_{2}-{a}_{0}^{\prime}({u}_{0})]=-{\gamma}_{n}^{-2}{[4\tau {f}^{2}({u}_{0}){\mu}_{2}]}^{-1}{S}_{n2}(\mathbf{0},0)+{o}_{P}(1),$$

(11)

$$\sqrt{n{h}^{3}}[{\widehat{\mathbf{a}}}^{\prime}({u}_{0})-{\mathbf{a}}^{\prime}({u}_{0})]=-{\gamma}_{n}^{-2}{[4\tau {f}^{2}({u}_{0}){\mu}_{2}\mathrm{\sum}({u}_{0})]}^{-1}{S}_{n12}(\mathbf{0},0)+{o}_{P}(1).$$

(12)

Following a proof similar to that of Theorem 3.2 in the appendix, it can be shown that
$\sqrt{n{h}^{3}}[{\widehat{\alpha}}_{2}-{a}_{0}^{\prime}({u}_{0})]$ and
$\sqrt{n{h}^{3}}[{\widehat{\mathbf{a}}}^{\prime}({u}_{0})-{\mathbf{a}}^{\prime}({u}_{0})]$ are both asymptotically normal. The proof of the asymptotic normality of ${\widehat{\alpha}}_{2}$ and ${\widehat{\mathbf{a}}}^{\prime}({u}_{0})$ is given in the technical report version of this paper (Wang, Kai and Li, 2009).

We now compare the estimation efficiency of the local rank estimator (denoted by ${\widehat{\mathbf{a}}}_{R}({u}_{0})$) with that of the local linear least squares estimator (denoted by ${\widehat{\mathbf{a}}}_{LS}({u}_{0})$).

Zhang and Lee (2000) give the asymptotic MSE of ${\widehat{\mathbf{a}}}_{LS}({u}_{0})$:

$${\text{MSE}}_{LS}(h;{u}_{0})=E\mid \mid {\widehat{\mathbf{a}}}_{LS}({u}_{0})-\mathbf{a}({u}_{0})\mid {\mid}^{2}=\frac{{\mu}_{2}^{2}\mid \mid {\mathbf{a}}^{\u2033}({u}_{0})\mid {\mid}^{2}}{4}{h}^{4}+\frac{{\nu}_{0}{\sigma}^{2}}{f({u}_{0})}\text{tr}\{{\mathrm{\sum}}^{-1}({u}_{0})\}\frac{1}{nh},$$

where *σ*^{2} = var(*ε*) is assumed to be finite and positive. Thus, the theoretical optimal bandwidth, which minimizes the asymptotic MSE of ${\widehat{\mathbf{a}}}_{LS}({u}_{0})$, is

$${h}_{LS}^{\mathit{opt}}({u}_{0})={\left[\frac{{\nu}_{0}{\sigma}^{2}\text{tr}\{{\mathrm{\sum}}^{-1}({u}_{0})\}}{{\mu}_{2}^{2}\mid \mid {\mathbf{a}}^{\u2033}({u}_{0})\mid {\mid}^{2}f({u}_{0})}\right]}^{1/5}{n}^{-1/5}.$$

(13)

From (10), the asymptotic MSE of the local rank estimator ${\widehat{\mathbf{a}}}_{R}({u}_{0})$ is

$${\text{MSE}}_{R}(h;{u}_{0})=E\mid \mid {\widehat{\mathbf{a}}}_{R}({u}_{0})-\mathbf{a}({u}_{0})\mid {\mid}^{2}=\frac{{\mu}_{2}^{2}\mid \mid {\mathbf{a}}^{\u2033}({u}_{0})\mid {\mid}^{2}}{4}{h}^{4}+\frac{{\nu}_{0}}{12{\tau}^{2}f({u}_{0})}\text{tr}\{{\mathrm{\sum}}^{-1}({u}_{0})\}\frac{1}{nh}.$$

The theoretical optimal bandwidth for the local rank estimator thus is

$${h}_{R}^{\mathit{opt}}({u}_{0})={\left[\frac{{\nu}_{0}\text{tr}\{{\mathrm{\sum}}^{-1}({u}_{0})\}}{12{\tau}^{2}{\mu}_{2}^{2}\mid \mid {\mathbf{a}}^{\u2033}({u}_{0})\mid {\mid}^{2}f({u}_{0})}\right]}^{1/5}{n}^{-1/5}.$$

(14)

This allows us to calculate the local asymptotic relative efficiency.

**Theorem 3.3.** *The asymptotic relative efficiency of the local rank estimator relative to the local linear least squares estimator of **a**(u_{0}) is*

$$\mathit{ARE}({u}_{0})=\frac{{\mathit{MSE}}_{LS}\{{h}_{LS}^{\mathit{opt}}({u}_{0}),{u}_{0}\}}{{\mathit{MSE}}_{R}\{{h}_{R}^{\mathit{opt}}({u}_{0}),{u}_{0}\}}={(12{\sigma}^{2}{\tau}^{2})}^{4/5}.$$

This asymptotic relative efficiency has a lower bound of 0.8896, which is attained at the random error density $g(x)={\scriptstyle \frac{3}{20\sqrt{5}}}(5-{x}^{2})I(\mid x\mid \phantom{\rule{0.16667em}{0ex}}\le \sqrt{5})$.
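The bound can be checked numerically. The following R snippet (our illustration, not part of the paper) evaluates $12{\sigma}^{2}{\tau}^{2}$ for this least favorable density and recovers ${0.864}^{4/5}=0.8896$.

```r
# Numerical check of the lower bound: for the least favorable density
# g(x) = 3/(20*sqrt(5)) * (5 - x^2) on |x| <= sqrt(5), the quantity
# 12 * sigma^2 * tau^2 equals 0.864, so ARE = 0.864^(4/5) = 0.8896.
g    <- function(x) 3 / (20 * sqrt(5)) * (5 - x^2)
s5   <- sqrt(5)
tau  <- integrate(function(x) g(x)^2, -s5, s5)$value      # Wilcoxon constant
sig2 <- integrate(function(x) x^2 * g(x), -s5, s5)$value  # error variance
c(are = (12 * sig2 * tau^2)^(4 / 5))                      # about 0.8896
```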

Alternatively, we may consider the asymptotic relative efficiency obtained by comparing the MISE, which is defined as MISE(*h*) = *∫ E*||**â**(*u*) − **a**(*u*)||^{2}*w*(*u*) *du* with a weight function *w*(·). This provides a global measurement. Interestingly, it leads to the same relative efficiency. This follows by observing that the theoretical optimal global bandwidths for the local linear least squares estimator and the local rank estimator are

$${h}_{LS}^{\mathit{opt}}={\left[\frac{{\nu}_{0}{\sigma}^{2}\int w(u)\text{tr}\{{\mathrm{\sum}}^{-1}(u)\}/f(u)\phantom{\rule{0.16667em}{0ex}}du}{{\mu}_{2}^{2}\int \mid \mid {\mathbf{a}}^{\u2033}(u)\mid {\mid}^{2}w(u)\phantom{\rule{0.16667em}{0ex}}du}\right]}^{1/5}{n}^{-1/5},$$

(15)

and

$${h}_{R}^{\mathit{opt}}={\left[\frac{{\nu}_{0}\int w(u)\text{tr}\{{\mathrm{\sum}}^{-1}(u)\}/f(u)\phantom{\rule{0.16667em}{0ex}}du}{12{\tau}^{2}{\mu}_{2}^{2}\int \mid \mid {\mathbf{a}}^{\u2033}(u)\mid {\mid}^{2}w(u)\phantom{\rule{0.16667em}{0ex}}du}\right]}^{1/5}{n}^{-1/5},$$

(16)

respectively. Thus, with the theoretical optimal bandwidths,

$$\text{ARE}=\frac{{\text{MISE}}_{LS}({h}_{LS}^{\mathit{opt}})}{{\text{MISE}}_{R}({h}_{R}^{\mathit{opt}})}={(12{\sigma}^{2}{\tau}^{2})}^{4/5}.$$

Thus the local and global asymptotic relative efficiencies coincide: $\text{ARE}({u}_{0})=\text{ARE}={(12{\sigma}^{2}{\tau}^{2})}^{4/5}$.

Note that the above ARE is closely related to the asymptotic relative efficiency of the Wilcoxon-Mann-Whitney rank test in comparison with the two-sample *t*-test. Table 1 depicts the value of ${(12{\sigma}^{2}{\tau}^{2})}^{4/5}$ for some commonly used error distributions. It can be seen that the desirable high efficiency of traditional rank methods for estimating a finite-dimensional parameter carries over completely to the local rank method for estimating an infinite-dimensional parameter.
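As a quick numerical illustration (ours, not reproduced from the paper's Table 1), the following R snippet evaluates ${(12{\sigma}^{2}{\tau}^{2})}^{4/5}$ for the standard normal and $t_3$ error distributions, matching the roughly 96% and 167% figures quoted earlier.

```r
# ARE = (12 * sigma^2 * tau^2)^(4/5), where tau is the integral of the
# squared error density and sigma^2 the error variance. The ratio is
# invariant to rescaling the error, so unstandardized densities can be used.
are <- function(sigma2, tau) (12 * sigma2 * tau^2)^(4 / 5)

# Standard normal: tau = 1 / (2 * sqrt(pi)), sigma^2 = 1.
tau_norm <- integrate(function(x) dnorm(x)^2, -Inf, Inf)$value
are(1, tau_norm)   # about 0.96

# t with 3 degrees of freedom: sigma^2 = 3.
tau_t3 <- integrate(function(x) dt(x, df = 3)^2, -Inf, Inf)$value
are(3, tau_t3)     # about 1.67
```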

By a similar calculation, we can show that the asymptotic relative efficiencies of the local rank estimator to the local linear estimator for **a**′(*u*_{0}) and **a**′(·) both equal *ψ* = (12*σ*^{2}*τ*^{2})^{8/11}, which has a lower bound 0.8991. This value is also reported in Table 1 for some common error distributions.

We may also apply the local median approach (Yu and Jones, 1998) to estimate the coefficient functions and their first derivatives. Similarly, we can prove that such estimators are asymptotically normal. The ARE of the local median estimator versus the local linear least squares estimator is closely related to that of the sign test versus the *t*-test. It is known that the ARE of the sign test versus the *t*-test for the normal distribution is 0.63. Thus we expect the efficiency loss of the local median procedure to be substantial for normal random error.

Following (5), ${\widehat{\alpha}}_{1}^{\ast}=\sqrt{nh}\{{\widehat{\alpha}}_{1}-{a}_{0}({u}_{0})\}$ is the value of ${\alpha}_{1}^{\ast}$ that minimizes

$$\begin{array}{l}{Q}_{n0}^{\ast}({\alpha}_{1}^{\ast},{\widehat{\alpha}}_{2},\widehat{\mathit{\beta}})={n}^{-1}\sum _{i=1}^{n}|{\epsilon}_{i}-{\gamma}_{n}{\alpha}_{1}^{\ast}-({\widehat{\alpha}}_{2}-{a}_{0}^{\prime}({u}_{0}))({U}_{i}-{u}_{0})\\ -\sum _{m=1}^{p}[({\widehat{\beta}}_{m}-{a}_{m}({u}_{0}))+({\widehat{\beta}}_{p+m}-{a}_{m}^{\prime}({u}_{0}))({U}_{i}-{u}_{0})]{X}_{im}+{\mathrm{\Delta}}_{i}({u}_{0})|{K}_{h}({U}_{i}-{u}_{0}).\end{array}$$

As in Lemma 3.1, we can establish the following local quadratic approximation, which holds uniformly in a neighborhood around 0:

$${\gamma}_{n}^{-1}{Q}_{n0}^{\ast}({\alpha}_{1}^{\ast},{\widehat{\alpha}}_{2},\widehat{\mathit{\beta}})={\gamma}_{n}^{-1}{\alpha}_{1}^{\ast}{S}_{n0}+{\gamma}_{n}g(0)f({u}_{0}){\alpha}_{1}^{\ast 2}+{\gamma}_{n}^{-1}{Q}_{n0}^{\ast}(0,{a}_{0}^{\prime}({u}_{0}),{\mathit{\beta}}_{0})+{o}_{p}(1),$$

(17)

where

$${S}_{n0}=2{\gamma}_{n}{n}^{-1}\sum _{i=1}^{n}[I({\epsilon}_{i}\le -{\mathrm{\Delta}}_{i}({u}_{0}))-1/2]{K}_{h}({U}_{i}-{u}_{0}).$$

(18)

This further leads to an asymptotic representation of ${\widehat{\alpha}}_{1}$:

$$\sqrt{nh}({\widehat{\alpha}}_{1}-{a}_{0}({u}_{0}))=-{\gamma}_{n}^{-2}{[2g(0)f({u}_{0})]}^{-1}{S}_{n0}+{o}_{p}(1).$$

(19)

The theorem below gives the asymptotic distribution of ${\widehat{\alpha}}_{1}$.

**Theorem 3.4.** *Under the conditions of Theorem 3.2, we have*

$$\sqrt{nh}\left[{\widehat{\alpha}}_{1}-{a}_{0}({u}_{0})-\frac{{\mu}_{2}{a}_{0}^{\u2033}({u}_{0})}{2}{h}^{2}+o({h}^{2})\right]\to \mathbf{N}\left(0,{[12{g}^{2}(0)f({u}_{0})]}^{-1}{\nu}_{0}\right).$$

To make statistical inference based on the local rank methodology, one needs to estimate the standard error of the resulting estimator. As indicated by Theorem 3.2, the asymptotic covariance matrix of the local rank estimator is rather complex and involves unknown functions. Here we propose a standard error estimator using a simple resampling method proposed by Jin, Ying and Wei (2001).

Let *V*_{1}, …, *V _{n}* be independent and identically distributed nonnegative random variables with mean 1/2 and variance 1. We consider a stochastic perturbation of (4):

$${\overline{Q}}_{n}(\mathit{\beta},{\alpha}_{2})=\frac{1}{n(n-1)}\sum _{1\le i,j\le n}({V}_{i}+{V}_{j})\mid {e}_{i}-{e}_{j}\mid {K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0}),$$

(20)

where ${e}_{i}$ is defined in (3). In (20), conditional on the observed data, the randomness comes solely from the ${V}_{i}$'s.

Jin, Ying and Wei (2001) established the validity of the resampling method when the objective function has a *U*-statistic structure. Although their theory covers many important applications, it requires that the *U*-statistic have a fixed kernel. We extend their result to our setting, where the *U*-statistic involves a variable kernel due to nonparametric smoothing. Let $\overline{\mathbf{a}}({u}_{0})$ be the local rank estimator of $\mathbf{a}({u}_{0})$ based on the perturbed objective function (20), i.e., the subvector consisting of the first $p$ components of $\overline{\mathit{\beta}}$. Its asymptotic normality is given in the theorem below.

**Theorem 3.5.** *Under the conditions of Lemma 3.1, conditionally on the data and for almost every sequence $\{{Y}_{i},{U}_{i},{\mathbf{X}}_{i}\}$,*

$$\sqrt{nh}[\overline{\mathbf{a}}({u}_{0})-\widehat{\mathbf{a}}({u}_{0})]\to \mathbf{N}\left(0,\frac{{\nu}_{0}}{12{\tau}^{2}f({u}_{0})}{\mathrm{\sum}}^{-1}({u}_{0})\right)$$

in distribution.

This theorem suggests that to estimate the asymptotic covariance matrix of $\widehat{\mathbf{a}}({u}_{0})$, one can repeatedly perturb (4) by generating a large number of independent random samples
${\left\{{V}_{i}\right\}}_{i=1}^{n}$. For each perturbed objective function, one solves for $\overline{\mathbf{a}}({u}_{0})$. The sample covariance matrix of $\overline{\mathbf{a}}({u}_{0})$ over a large number of independent perturbations then provides a good approximation, as illustrated in the sketch below. The accuracy of the resulting standard error estimate will be tested in the next section.
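The following R sketch outlines this procedure. It relies on the weighted-$L_1$ fitting routine `fit_local_rank` sketched in Section 4.1 below (a helper of our own), and the Gamma(1/4, scale 2) distribution for the ${V}_{i}$ is our assumed choice satisfying the nonnegativity, mean-1/2 and variance-1 requirements; the paper does not prescribe a specific distribution here.

```r
# Perturbation-based covariance estimate (a sketch, not the authors' code).
# fit_local_rank() accepts a vector v that supplies the (V_i + V_j) pair
# weights of (20); see the sketch in Section 4.1 below.
perturb_cov <- function(Y, U, X, u0, h, B = 1000) {
  n <- length(Y)
  p <- ncol(X)
  est <- replicate(B, {
    # assumed choice for V_i: nonnegative, mean 1/2, variance 1
    v <- rgamma(n, shape = 1 / 4, scale = 2)
    fit_local_rank(Y, U, X, u0, h, v = v)[1:p]  # a-bar(u0) components
  })
  cov(t(est))  # sample covariance across the B perturbations
}
```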

The perturbed estimator has conditional bias equal to zero. It has been found that the standard bootstrap method, which resamples from the empirical distribution of the data, also estimates the bias as zero when estimating nonparametric curves (Hall and Kang, 2001). It is possible to use more delicate bootstrap techniques to estimate the bias of a nonparametric curve estimator. Although some of these ideas may be adapted to the method of perturbing the objective function, this is beyond the scope of our paper and is not pursued further here.

The local rank estimator can be obtained by applying an efficient and reliable algorithm. Note that the local rank estimator of
${({\mathit{\beta}}_{0}^{T},{a}_{0}^{\prime}({u}_{0}))}^{T}$ can be computed by fitting a weighted ${L}_{1}$ regression, without an intercept, on the
${\scriptstyle \frac{n(n-1)}{2}}$ pseudo observations $({\mathbf{x}}_{i}^{\ast}-{\mathbf{x}}_{j}^{\ast},{Y}_{i}-{Y}_{j})$, $1\le i<j\le n$, with weights ${K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0})$, where ${\mathbf{x}}_{i}^{\ast}={({\mathbf{X}}_{i}^{T},({U}_{i}-{u}_{0}){\mathbf{X}}_{i}^{T},{U}_{i}-{u}_{0})}^{T}$.
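A minimal R sketch of this weighted $L_1$ formulation is given below. It is our illustration, not the authors' code: the helper name `fit_local_rank`, the design construction, the use of `quantreg::rq()` with `tau = 0.5` (whose check-loss minimizer coincides with the weighted $L_1$ minimizer), and the optional argument `v` (the $(V_i+V_j)$ pair weights of (20), used by the resampling sketch in Section 3.4) are all our assumptions.

```r
# Local rank fit at u0 via weighted L1 regression on pairwise differences.
# Forming all pairs costs O(n^2) time and memory.
library(quantreg)

epan <- function(t) 0.75 * (1 - t^2) * (abs(t) < 1)   # Epanechnikov kernel

fit_local_rank <- function(Y, U, X, u0, h, v = NULL) {
  n <- length(Y); p <- ncol(X)
  # x_i^* = (X_i^T, (U_i - u0) X_i^T, U_i - u0): regressors for (beta, alpha2)
  xs <- cbind(X, X * (U - u0), U - u0)
  ij <- t(combn(n, 2))                     # all pairs i < j
  i <- ij[, 1]; j <- ij[, 2]
  w <- (epan((U[i] - u0) / h) / h) * (epan((U[j] - u0) / h) / h)
  if (!is.null(v)) w <- w * (v[i] + v[j])  # stochastic perturbation, cf. (20)
  keep <- w > 0
  dyk <- Y[i][keep] - Y[j][keep]
  dxk <- xs[i[keep], , drop = FALSE] - xs[j[keep], , drop = FALSE]
  wk  <- w[keep]
  fit <- rq(dyk ~ dxk - 1, tau = 0.5, weights = wk)  # weighted L1, no intercept
  coef(fit)  # (beta_1..beta_p, beta_{p+1}..beta_{2p}, alpha_2) at u0
}
```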

Bandwidth selection is an important issue for all statistical models that involve nonparametric smoothing. Although we have derived the theoretical optimal bandwidths for the local rank estimator in (14) and (16), it is difficult to estimate them by the "plug-in" method because they involve many unknown quantities.

We propose below an alternative bandwidth selection method that is practically feasible. This approach is based on the relationship between ${h}_{R}^{\mathit{opt}}$ and ${h}_{LS}^{\mathit{opt}}$. From Section 2.3, we see that

$${h}_{R}^{\mathit{opt}}({u}_{0})={\left(\frac{1}{12{\tau}^{2}{\sigma}^{2}}\right)}^{1/5}{h}_{LS}^{\mathit{opt}}({u}_{0})\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{h}_{R}^{\mathit{opt}}={\left(\frac{1}{12{\tau}^{2}{\sigma}^{2}}\right)}^{1/5}{h}_{LS}^{\mathit{opt}}.$$

(21)

Thus, we can first use existing bandwidth selectors (e.g., Zhang and Lee, 2000) to estimate
${h}_{LS}^{\mathit{opt}}({u}_{0})$ or
${h}_{LS}^{\mathit{opt}}$. The error variance *σ*^{2} can be estimated from the residuals; in particular, when robustness is a concern, it can be estimated using the MAD of the residuals. Hettmansperger and McKean (1998, p. 181) discussed in detail how to estimate *τ*, and an estimate can be obtained with the function "wilcoxontau" in the R software developed by Terpstra and McKean (2005). Finally, we plug these estimates into (21) to obtain the bandwidth for the local rank estimator.

Alternatively, instead of the above two-step procedure, we may directly use computationally intensive cross-validation to select the bandwidth for the local rank procedure. Note that under outlier contamination, standard cross-validation can produce severely biased bandwidth estimates because it is adversely influenced by extreme prediction errors. A robust cross-validation method, such as that developed by Leung (2005), is therefore preferred.

We conduct Monte Carlo simulations to assess the finite sample performance, and illustrate the proposed methodology on a real environmental data set. In the analysis, we use the Epanechnikov kernel *K*(*u*) = 0.75(1 − *u*^{2})*I*(|*u*| < 1).

We generate random data from

$$Y={a}_{0}(U)+{a}_{1}(U){X}_{1}+{a}_{2}(U){X}_{2}+\epsilon ,$$

where *a*_{0}(*u*) = exp(2*u* − 1), *a*_{1}(*u*) = 8*u*(1 − *u*) and *a*_{2}(*u*) = 2 sin^{2}(2*πu*). The covariate *U* follows a uniform distribution on [0, 1] and is independent of (*X*_{1}, *X*_{2}), where the covariates *X*_{1} and *X*_{2} are standard normal random variables with correlation coefficient 2^{−1/2}. The coefficient functions and the mechanism generating *U* and (*X*_{1}, *X*_{2}) are the same as those in Cai, Fan and Li (2000). We consider six error distributions: *N*(0, 1), Laplace, standard Cauchy, the *t*-distribution with 3 degrees of freedom, the mixture of normals 0.9*N*(0, 1) + 0.1*N*(0, 10^{2}), and the lognormal distribution. Except for the Cauchy error, all generated random errors are standardized to have median 0 and variance 1. We consider sample sizes *n* = 400 and 800, and conduct 400 simulations for each case.
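For concreteness, the following R sketch generates one data set from this design with standard normal errors; the variable names are ours.

```r
# One data set from the simulation design above (normal error case).
set.seed(1)
n  <- 400
a0 <- function(u) exp(2 * u - 1)
a1 <- function(u) 8 * u * (1 - u)
a2 <- function(u) 2 * sin(2 * pi * u)^2
U  <- runif(n)
rho <- 2^(-1 / 2)                        # correlation between X1 and X2
X1 <- rnorm(n)
X2 <- rho * X1 + sqrt(1 - rho^2) * rnorm(n)
eps <- rnorm(n)                          # one of the six error distributions
Y  <- a0(U) + a1(U) * X1 + a2(U) * X2 + eps
```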

We compare the performance of the local rank estimate with the local least squares estimate using the square root of average squared errors (RASE), defined by

$$\text{RASE}={\left\{\frac{1}{{n}_{\text{grid}}}\sum _{m=1}^{p}\sum _{k=1}^{{n}_{\text{grid}}}{\{{\widehat{a}}_{m}({u}_{k})-{a}_{m}({u}_{k})\}}^{2}\right\}}^{1/2},$$

where $\{{u}_{k},k=1,\dots ,{n}_{\text{grid}}\}$ are grid points evenly placed over the interval on which the coefficient functions are estimated. Figures 1 and 2 summarize the RASE values, computed as in the sketch below, for $n=400$ and $n=800$, respectively.
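A small R helper (ours) computing the RASE from estimated and true coefficient values on a grid:

```r
# RASE from n_grid x p matrices whose (k, m) entries are the estimated and
# true values of a_m at grid point u_k.
rase <- function(a_hat, a_true) {
  sqrt(mean(rowSums((a_hat - a_true)^2)))  # sqrt of (1/n_grid) sum_k sum_m
}
```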

Figure 1. Bar graphs of the RASE, with standard errors, for sample size *n* = 400 over 400 simulations. The light gray bars denote the local least squares method and the dark gray bars the local rank method. The horizontal axis is in units of *h*^{opt}.

Figure 2. Bar graphs of the RASE, with standard errors, for sample size *n* = 800 over 400 simulations. The light gray bars denote the local least squares method and the dark gray bars the local rank method. The horizontal axis is in units of *h*^{opt}.

Figure 3 depicts the estimated coefficient functions for the normal and the mixture normal random errors for a typical sample, selected so that its RASE value is the median of the 400 RASE values. For this typical sample, we observe that the local rank estimate is almost identical to the local least squares estimate for the normal random error, but falls much closer to the truth than the local least squares estimate for the mixture normal random error. Figure 4 plots the estimated coefficient functions for all 400 simulations when the random error has the mixture normal distribution. It is clear that the local rank estimator has smaller variance. In these two figures, we set the bandwidth to the theoretical optimal one, *h ^{opt}*, calculated using (15) and (16), for both the local rank estimator and the local least squares estimator.

Figure 4. (a) and (c) are plots of the 400 local least squares estimates of *a*_{1}(·) and *a*_{2}(·) over the 400 simulations, respectively. (b) and (d) are plots of the corresponding 400 local rank estimates of *a*_{1}(·) and *a*_{2}(·), respectively.

Finally, we evaluate the resampling method (Section 3.4) for estimating the standard errors. We randomly perturb the objective function 1000 times; each time the random variables *V _{i}* in (20) are generated from a nonnegative distribution with mean 1/2 and variance 1, as required in Section 3.4.

As an illustration, we now apply the local rank procedure to the environmental data set analyzed in Fan and Zhang (1999). Of interest is the relationship between levels of pollutants and the number of total hospital admissions for circulatory and respiratory problems on every Friday from January 1, 1994 to December 31, 1995. The response variable is the logarithm of the number of total hospital admissions, and the covariates include the level of sulfur dioxide (*X*_{1}), the level of nitrogen dioxide (*X*_{2}) and the level of dust (*X*_{3}). A scatter plot of the response variable over time is given in Figure 5(a). We analyze this data set using the following varying coefficient model

Figure 5. (a) Scatterplot of the log number of total hospital admissions over time; the solid curve is the estimated expected log number of hospital admissions over time at the average pollutant levels.

$$Y={a}_{0}(u)+{a}_{1}(u){X}_{1}+{a}_{2}(u){X}_{2}+{a}_{3}(u){X}_{3}+\epsilon ,$$

where *u* denotes time and is scaled to the interval [0,1].

We select the bandwidth via the relation (21). More specifically, we first use 20-fold cross-validation to select a bandwidth *ĥ _{LS}* for the local least squares estimator. We then use the function 'wilcoxontau' in the R package for rank regression by Terpstra and McKean to estimate ${(12{\widehat{\tau}}^{2}{\widehat{\sigma}}^{2})}^{-1/5}$, and obtain ${\widehat{h}}_{R}={(12{\widehat{\tau}}^{2}{\widehat{\sigma}}^{2})}^{-1/5}{\widehat{h}}_{LS}$ by (21).

The estimated coefficient functions are depicted in Figures 5(b), (c) and (d), where the two dashed curves around the solid line are the estimated function plus/minus twice the standard errors estimated by the resampling method. These two dashed lines can be regarded as a pointwise confidence interval with bias ignored. The figures suggest clearly that the coefficient functions vary with time. The fitted curve is shown in Figure 5(a).

Now we demonstrate the robustness of the local rank procedure. To this end, we artificially perturb the dataset by moving the response value of the 68th observation from 5.89 to 6.89, and by perturbing the response value of the 34th observation similarly.

- (C1) Assume that $\{{U}_{i},{\mathbf{X}}_{i},{Y}_{i}\}$ are independent and identically distributed, and that the random error ${\epsilon}_{i}$ and the covariates $\{{U}_{i},{\mathbf{X}}_{i}\}$ are independent. Furthermore, assume that $\epsilon$ has probability density function $g(\cdot)$ with finite Fisher information, i.e., $\int {\{g(x)\}}^{-1}{g}^{\prime}{(x)}^{2}dx<\infty$, and that $U$ has probability density function $f(\cdot)$.
- (C2) The function ${a}_{m}(\cdot)$, $m=0,1,\dots ,p$, has a continuous second-order derivative in a neighborhood of ${u}_{0}$.
- (C3) Assume that $E({\mathbf{X}}_{i}\mid {U}_{i}={u}_{0})=0$ and that $\mathrm{\sum}(u)=E({\mathbf{X}}_{i}{\mathbf{X}}_{i}^{T}\mid {U}_{i}=u)$ is continuous at $u={u}_{0}$. The matrix $\mathrm{\sum}({u}_{0})$ is positive definite.
- (C4) The kernel function $K(\cdot)$ is symmetric about the origin and has bounded support. Assume that $h\to 0$ and $n{h}^{2}\to \infty$ as $n\to \infty$.

These conditions are used to facilitate the proofs, but may not be the weakest ones. The assumptions on the random errors in (C1) are the same as those for multiple linear rank regression (Hettmansperger and McKean, 1998). (C2) imposes a smoothness requirement on the coefficient functions. In (C3), the assumption $E({\mathbf{X}}_{i}\mid {U}_{i}={u}_{0})=0$ is imposed to simplify the asymptotic expressions; (C4) collects standard assumptions on the kernel function and the bandwidth.

In our proofs, we will use some results on generalized *U*-statistics, where the kernel function is allowed to depend on the sample size *n*. The generalized *U*-statistic has the form ${U}_{n}={[n(n-1)]}^{-1}{\sum}_{i\ne j}{H}_{n}({D}_{i},{D}_{j})$, where ${H}_{n}$ is symmetric in its arguments and ${D}_{i}=({U}_{i},{\mathbf{X}}_{i},{\epsilon}_{i})$. Let ${\theta}_{n}=E[{H}_{n}({D}_{i},{D}_{j})]$ and ${r}_{n}({D}_{i})=E[{H}_{n}({D}_{i},{D}_{j})\mid {D}_{i}]$, and define the projection ${\widehat{U}}_{n}={\theta}_{n}+{\scriptstyle \frac{2}{n}}{\sum}_{i=1}^{n}[{r}_{n}({D}_{i})-{\theta}_{n}]$. The following lemma is from Powell, Stock and Stoker (1989).

**Lemma A.1.** *If $E[\mid \mid {H}_{n}({D}_{i},{D}_{j})\mid {\mid}^{2}]=o(n)$, then $\sqrt{n}({U}_{n}-{\widehat{U}}_{n})={o}_{p}(1)$; in particular, ${U}_{n}={\theta}_{n}+{o}_{p}(1)$.*

We need the following two lemmas to prove Lemma 3.1. Denote

$$\begin{array}{l}{\mathbf{A}}_{n11}=2{h}^{-2}E\left\{({\mathbf{Z}}_{i}-{\mathbf{Z}}_{j}){({\mathbf{Z}}_{i}-{\mathbf{Z}}_{j})}^{T}K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)K\left(\frac{{U}_{j}-{u}_{0}}{h}\right)\right\},\\ {\mathbf{A}}_{n12}=2{h}^{-2}E\left\{({\mathbf{Z}}_{i}-{\mathbf{Z}}_{j})[({U}_{i}-{U}_{j})/h]K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)K\left(\frac{{U}_{j}-{u}_{0}}{h}\right)\right\},\\ {\mathbf{A}}_{n21}={\mathbf{A}}_{n12}^{T},\\ {A}_{n22}=2{h}^{-2}E\left\{[{({U}_{i}-{U}_{j})}^{2}/{h}^{2}]K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)K\left(\frac{{U}_{j}-{u}_{0}}{h}\right)\right\},\end{array}$$

and define

$${\mathbf{A}}_{n}=\tau \left(\begin{array}{cc}{\mathbf{A}}_{n11}& {\mathbf{A}}_{n12}\\ {\mathbf{A}}_{n21}& {A}_{n22}\end{array}\right).$$

**Lemma A.2.** *Suppose that Conditions (C1)–(C4) hold. Then ${\mathbf{A}}_{n}\to \mathbf{A}$, where $\mathbf{A}$ is defined in (8).*

We can write ${\mathbf{A}}_{n11}=\left(\begin{array}{cc}{\mathbf{A}}_{n11}^{1}& {\mathbf{A}}_{n11}^{2}\\ {\mathbf{A}}_{n11}^{3}& {A}_{n11}^{4}\end{array}\right)$. Let

$${\mathbf{A}}_{n11}^{1}=2{h}^{-2}E\left[({\mathbf{X}}_{i}-{\mathbf{X}}_{j}){({\mathbf{X}}_{i}-{\mathbf{X}}_{j})}^{T}K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)K\left(\frac{{U}_{j}-{u}_{0}}{h}\right)\right].$$

Calculating the expectation by conditioning on ${U}_{i}$ and ${U}_{j}$, we can write ${\mathbf{A}}_{n11}^{1}$ as

$$2{h}^{-2}\int E[({\mathbf{X}}_{i}-{\mathbf{X}}_{j}){({\mathbf{X}}_{i}-{\mathbf{X}}_{j})}^{T}\mid {U}_{i}=u,{U}_{j}=v]K\left(\frac{u-{u}_{0}}{h}\right)K\left(\frac{v-{u}_{0}}{h}\right)f(u)f(v)\phantom{\rule{0.16667em}{0ex}}\mathit{dudv}.$$

Using Condition (C3), straightforward calculation gives ${\mathbf{A}}_{n11}^{1}\to 4{f}^{2}({u}_{0})\mathrm{\sum}({u}_{0})$. Let

$${\mathbf{A}}_{n11}^{2}=2{h}^{-2}E\left\{({\mathbf{X}}_{i}-{\mathbf{X}}_{j}){[{\mathbf{X}}_{i}({U}_{i}-{u}_{0})/h-{\mathbf{X}}_{j}({U}_{j}-{u}_{0})/h]}^{T}K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)K\left(\frac{{U}_{j}-{u}_{0}}{h}\right)\right\}.$$

Using Condition (C3) and noticing that $K(\cdot)$ is symmetric, it can be shown that
${\mathbf{A}}_{n11}^{2}\to \mathbf{0}$. By symmetry,
${\mathbf{A}}_{n11}^{3}\to \mathbf{0}$. Similarly, we have

$$\begin{array}{l}{\mathbf{A}}_{n11}^{4}=2{h}^{-2}E\{[{\mathbf{X}}_{i}({U}_{i}-{u}_{0})/h-{\mathbf{X}}_{j}({U}_{j}-{u}_{0})/h]\phantom{\rule{0.16667em}{0ex}}{[{\mathbf{X}}_{i}({U}_{i}-{u}_{0})/h-{\mathbf{X}}_{j}({U}_{j}-{u}_{0})/h]}^{T}\\ K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)K\left(\frac{{U}_{j}-{u}_{0}}{h}\right)\}\to 4{f}^{2}({u}_{0})\mathrm{\sum}({u}_{0}){\mu}_{2}.\end{array}$$

Thus ${\mathbf{A}}_{n11}\to 4{f}^{2}({u}_{0})\mathrm{\sum}({u}_{0})\left(\begin{array}{cc}{\mathbf{I}}_{p}& \mathbf{0}\\ \mathbf{0}& {\mu}_{2}{\mathbf{I}}_{p}\end{array}\right)$. Similarly, we can show that ${\mathbf{A}}_{n12}={\mathbf{A}}_{n21}^{T}\to \mathbf{0}$, and

$${A}_{n22}=2\int {({t}_{1}-{t}_{2})}^{2}K({t}_{1})K({t}_{2})f({u}_{0}+{t}_{1}h)f({u}_{0}+{t}_{2}h){dt}_{1}{dt}_{2}\to 4{f}^{2}({u}_{0}){\mu}_{2}.$$

**Lemma A.3.** *Under Conditions (C1)–(C4), we have*

$${\gamma}_{n}^{-1}[{S}_{n}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})-{S}_{n}(\mathbf{0},0)]={\gamma}_{n}\mathbf{A}\left(\begin{array}{c}{\mathit{\beta}}^{\ast}\\ {\alpha}_{2}^{\ast}\end{array}\right)+{o}_{p}(1).$$

Let ${U}_{n}={\gamma}_{n}^{-1}[{S}_{n}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})-{S}_{n}(\mathbf{0},0)]={[n(n-1)]}^{-1}\sum {\sum}_{i\ne j}{W}_{n}({D}_{i},{D}_{j})$, where

$$\begin{array}{l}{W}_{n}({D}_{i},{D}_{j})=2[I({\epsilon}_{i}-{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{i}-{u}_{0})/h-{\gamma}_{n}{\mathit{\beta}}^{\ast T}{\mathbf{Z}}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}-{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{j}-{u}_{0})/h\\ -{\gamma}_{n}{\mathit{\beta}}^{\ast T}{\mathbf{Z}}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-1/2]\phantom{\rule{0.16667em}{0ex}}\left(\begin{array}{c}{\mathbf{Z}}_{i}-{\mathbf{Z}}_{j}\\ ({U}_{i}-{U}_{j})/h\end{array}\right){K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0}).\end{array}$$

Let ${H}_{n}({D}_{i},{D}_{j})={W}_{n}({D}_{i},{D}_{j})+{W}_{n}({D}_{j},{D}_{i})$. Then

$$\begin{array}{l}E[\mid \mid {W}_{n}({D}_{i},{D}_{j})\mid {\mid}^{2}]\\ \le 4{h}^{-4}E\left\{[{({\mathbf{Z}}_{i}-{\mathbf{Z}}_{j})}^{T}({\mathbf{Z}}_{i}-{\mathbf{Z}}_{j})+{[({U}_{i}-{U}_{j})/h]}^{2}]{K}^{2}\left(\frac{{U}_{i}-{u}_{0}}{h}\right){K}^{2}\left(\frac{{U}_{j}-{u}_{0}}{h}\right)\right\}\\ =O({h}^{-2})=o(n)\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{since}\phantom{\rule{0.38889em}{0ex}}n{h}^{2}\to \infty \phantom{\rule{0.16667em}{0ex}}\text{by}\phantom{\rule{0.16667em}{0ex}}\text{assumption.}\end{array}$$

Thus ${U}_{n}=E[{H}_{n}({D}_{i},{D}_{j})]+{o}_{p}(1)$ by Lemma A.1. Direct calculation gives

$$\begin{array}{l}E[{H}_{n}({D}_{i},{D}_{j})]\\ =2{h}^{-2}E\{\int [G(\epsilon +{\mathrm{\Delta}}_{j}({u}_{0})-{\mathrm{\Delta}}_{i}({u}_{0})-{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{j}-{U}_{i})/h-{\gamma}_{n}{\mathit{\beta}}^{\ast T}({\mathbf{Z}}_{j}-{\mathbf{Z}}_{i}))-G(\epsilon )]g(\epsilon )d\epsilon \\ \left(\begin{array}{c}{\mathbf{Z}}_{i}-{\mathbf{Z}}_{j}\\ ({U}_{i}-{U}_{j})/h\end{array}\right)K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)K\left(\frac{{U}_{j}-{u}_{0}}{h}\right)\}\\ =2{h}^{-2}{\gamma}_{n}E\{\int g[\epsilon +{\mathrm{\Delta}}_{j}({u}_{0})-{\mathrm{\Delta}}_{i}({u}_{0})]g(\epsilon )d\epsilon \left(\begin{array}{c}{\mathbf{Z}}_{i}-{\mathbf{Z}}_{j}\\ ({U}_{i}-{U}_{j})/h\end{array}\right)\phantom{\rule{0.16667em}{0ex}}({\mathbf{Z}}_{i}^{T}-{\mathbf{Z}}_{j}^{T},({U}_{i}-{U}_{j})/h)\\ K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)K\left(\frac{{U}_{j}-{u}_{0}}{h}\right)\}\phantom{\rule{0.16667em}{0ex}}\left(\begin{array}{c}{\mathit{\beta}}^{\ast}\\ {\alpha}_{2}^{\ast}\end{array}\right)(1+o(1))={\gamma}_{n}{\mathbf{A}}_{n}\left(\begin{array}{c}{\mathit{\beta}}^{\ast}\\ {\alpha}_{2}^{\ast}\end{array}\right)\phantom{\rule{0.16667em}{0ex}}\{1+o(1)\}.\end{array}$$

The proof is completed by using Lemma A.2.

In view of Lemma A.3, it follows that

$$\nabla [{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})-{B}_{n}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})]={\gamma}_{n}^{-1}[{S}_{n}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})-{S}_{n}(\mathbf{0},0)]-{\gamma}_{n}\mathbf{A}\left(\begin{array}{c}{\mathit{\beta}}^{\ast}\\ {\alpha}_{2}^{\ast}\end{array}\right)={o}_{p}(1).$$

The proof follows along the same lines as the proof of Theorem A.3.7 of Hettmansperger and McKean (1998), using a "diagonal subsequencing" argument and convexity.

By Lemma 3.1,
${\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\mathbf{s}}_{1},{s}_{2})={B}_{n}({\mathbf{s}}_{1},{s}_{2})+{r}_{n}({\mathbf{s}}_{1},{s}_{2})$, where
${r}_{n}({\mathbf{s}}_{1},{s}_{2})\stackrel{p}{\to}0$ uniformly over any bounded set. Note that
${\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\mathbf{s}}_{1},{s}_{2})$ is minimized by
${({\widehat{\mathit{\beta}}}_{n}^{\ast T},{\widehat{\alpha}}_{2n}^{\ast})}^{T}$, and ${B}_{n}({\mathbf{s}}_{1},{s}_{2})$ is minimized by ${({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast T},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})}^{T}$. Define

$$\begin{array}{l}{T}_{n}=\underset{\mid \mid ({\mathbf{s}}_{1}^{T},{s}_{2})-({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast T},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})\mid \mid =c}{inf}{B}_{n}({\mathbf{s}}_{1},{s}_{2})-{B}_{n}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})\\ {R}_{n}=\underset{\mid \mid ({\mathbf{s}}_{1}^{T},{s}_{2})-({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast T},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})\mid \mid \le c}{sup}\mid {\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\mathbf{s}}_{1},{s}_{2})-{B}_{n}({\mathbf{s}}_{1},{s}_{2})\mid ,\end{array}$$

then
${R}_{n}\stackrel{p}{\to}0$ as *n* → ∞. Let
${({\mathbf{s}}_{1}^{T},{s}_{2})}^{T}$ be an arbitrary point outside the ball {
${({\mathbf{s}}_{1}^{T},{s}_{2})}^{T}:\mid \mid ({\mathbf{s}}_{1}^{T},{s}_{2})-({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast T},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})\mid \mid \phantom{\rule{0.16667em}{0ex}}\le c$}, then we can write
${({\mathbf{s}}_{1}^{T},{s}_{2})}^{T}={({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast T},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})}^{T}+l{\mathbf{1}}_{2p+1}$, where $l>c$ is a positive constant and ${\mathbf{1}}_{d}$ denotes a unit vector of length $d$. Then

$$\frac{c}{l}[{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\mathbf{s}}_{1},{s}_{2})-{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})]=\frac{c}{l}{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\mathbf{s}}_{1},{s}_{2})+\left(1-\frac{c}{l}\right){\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})-{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast},{\stackrel{\sim}{\alpha}}_{2n}^{\ast}).$$

By the convexity of ${\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\mathbf{s}}_{1},{s}_{2})$, we have

$$\frac{c}{l}[{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\mathbf{s}}_{1},{s}_{2})-{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})]\ge {\gamma}_{n}^{-1}{Q}_{n}^{\ast}\left(\frac{c}{l}({\mathbf{s}}_{1},{s}_{2})+\left(1-\frac{c}{l}\right)\phantom{\rule{0.16667em}{0ex}}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})\right)-{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast},{\stackrel{\sim}{\alpha}}_{2n}^{\ast}).$$

Thus,

$$\begin{array}{l}\frac{c}{l}[{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\mathbf{s}}_{1},{s}_{2})-{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})]\ge {\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast}+c{\mathbf{1}}_{2p},{\stackrel{\sim}{\alpha}}_{2n}^{\ast}+c)-{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})\\ ={B}_{n}({\stackrel{\sim}{\beta}}_{n}^{\ast}+c{\mathbf{1}}_{2p},{\stackrel{\sim}{\alpha}}_{2n}^{\ast}+c)+{r}_{n}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast}+c{\mathbf{1}}_{2p},{\stackrel{\sim}{\alpha}}_{2n}^{\ast}+c)-{B}_{n}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})-{r}_{n}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})\\ \ge {T}_{n}-2{R}_{n}.\end{array}$$

If ${R}_{n}\le {\scriptstyle \frac{1}{2}}{T}_{n}$, then ${\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\mathbf{s}}_{1},{s}_{2})>{\gamma}_{n}^{-1}{Q}_{n}^{\ast}({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})$ for all ${({\mathbf{s}}_{1}^{T},{s}_{2})}^{T}$ outside the ball. This implies if ${R}_{n}\le {\scriptstyle \frac{1}{2}}{T}_{n}$ then the minimizer of ${\gamma}_{n}^{-1}{Q}_{n}^{\ast}$ must be inside the ball. Thus

$$P\left(\mid \mid {({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast T},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})}^{T}-{({\widehat{\mathit{\beta}}}_{n}^{\ast T},{\widehat{\alpha}}_{2n}^{\ast})}^{T}\mid \mid \phantom{\rule{0.16667em}{0ex}}\ge c\right)\le P\left({R}_{n}\ge \frac{1}{2}{T}_{n}\right)=P\left({R}_{n}\ge \frac{1}{2}\lambda {c}^{2}\right)\to 0,$$

where *λ* is the smallest eigenvalue of **A**. Therefore,
${({\widehat{\mathit{\beta}}}_{n}^{\ast T},{\widehat{\alpha}}_{2n}^{\ast})}^{T}={({\stackrel{\sim}{\mathit{\beta}}}_{n}^{\ast T},{\stackrel{\sim}{\alpha}}_{2n}^{\ast})}^{T}+{o}_{p}(1)$. This in particular implies the asymptotic representations (9), (11) and (12).

We next show the asymptotic normality of **â**(*u*_{0}). From (9), we have

$$\sqrt{nh}(\widehat{\mathbf{a}}({u}_{0})-\mathbf{a}({u}_{0}))=-{\gamma}_{n}^{-2}{(4\tau {f}^{2}({u}_{0})\mathrm{\sum}({u}_{0}))}^{-1}{S}_{n11}(\mathbf{0},0)+{o}_{p}(1),$$

(A.1)

where

$$\begin{array}{l}{S}_{n11}(\mathbf{0},0)=2{\gamma}_{n}{[n(n-1)]}^{-1}\sum _{i\ne j}[I({\epsilon}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-1/2]\phantom{\rule{0.16667em}{0ex}}({\mathbf{X}}_{i}-{\mathbf{X}}_{j})\\ {K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0}).\end{array}$$

(A.2)

By (A.2), let us rewrite $-{\gamma}_{n}^{-2}{S}_{n11}(\mathbf{0},0)={S}_{n\mathbf{a}}(\mathbf{0},0)+{S}_{n\mathbf{b}}(\mathbf{0},0)$, where

$$\begin{array}{l}{S}_{n\mathbf{a}}(\mathbf{0},0)=2{\gamma}_{n}^{-1}{[n(n-1)]}^{-1}\sum _{i\ne j}[I({\epsilon}_{i}\le {\epsilon}_{j})-1/2]\phantom{\rule{0.16667em}{0ex}}({\mathbf{X}}_{j}-{\mathbf{X}}_{i}){K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0}),\\ {S}_{n\mathbf{b}}(\mathbf{0},0)=2{\gamma}_{n}^{-1}{[n(n-1)]}^{-1}\sum _{i\ne j}[I({\epsilon}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-I({\epsilon}_{i}\le {\epsilon}_{j})]\phantom{\rule{0.16667em}{0ex}}({\mathbf{X}}_{j}-{\mathbf{X}}_{i})\\ {K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0}).\end{array}$$

We next prove that

$${S}_{n\mathbf{a}}(\mathbf{0},0)\to N\left(0,\frac{4}{3}{f}^{3}({u}_{0}){\nu}_{0}\mathrm{\sum}({u}_{0})\right)\phantom{\rule{0.38889em}{0ex}}\text{in}\phantom{\rule{0.16667em}{0ex}}\text{distribution}.$$

(A.3)

Note that we can write
${S}_{n\mathbf{a}}(\mathbf{0},0)=\sqrt{n}{[n(n-1)]}^{-1}{\sum}_{i\ne j}{H}_{n}({D}_{i},{D}_{j})$, where ${H}_{n}({D}_{i},{D}_{j})={W}_{n}({D}_{i},{D}_{j})+{W}_{n}({D}_{j},{D}_{i})$ and

$${W}_{n}({D}_{i},{D}_{j})={h}^{-3/2}[I({\epsilon}_{i}\le {\epsilon}_{j})-1/2]\phantom{\rule{0.16667em}{0ex}}({\mathbf{X}}_{j}-{\mathbf{X}}_{i})K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)K\left(\frac{{U}_{j}-{u}_{0}}{h}\right).$$

Similarly to the arguments in the proof of Lemma A.3, it can be shown that $E[\mid \mid {H}_{n}({D}_{i},{D}_{j})\mid {\mid}^{2}]=o(n)$. Thus, by Lemma A.1, it suffices to study the projection of ${S}_{n\mathbf{a}}(\mathbf{0},0)$, whose summands are

$$\begin{array}{l}{r}_{n}({D}_{i})=E[{H}_{n}({D}_{i},{D}_{j})\mid {D}_{i}]\\ =2{h}^{-3/2}[G({\epsilon}_{i})-1/2]K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)E\left\{({\mathbf{X}}_{i}-{\mathbf{X}}_{j})K\left(\frac{{U}_{j}-{u}_{0}}{h}\right)\mid {\mathbf{X}}_{i},{U}_{i},{\epsilon}_{i}\right\}\\ =2{h}^{-1/2}[G({\epsilon}_{i})-1/2]K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)\phantom{\rule{0.16667em}{0ex}}[\left(\int K(t)f({u}_{0}+th)dt\right){\mathbf{X}}_{i}\\ -\int E({X}_{j}\mid {U}_{j}={u}_{0}+th)K(t)f({u}_{0}+th)dt].\end{array}$$

Furthermore,

$$\begin{array}{l}E[{r}_{n}({D}_{i}){r}_{n}{({D}_{i})}^{T}]\\ =\frac{1}{3}{h}^{-1}E\{{K}^{2}\left(\frac{{U}_{i}-{u}_{0}}{h}\right)\phantom{\rule{0.16667em}{0ex}}[\left(\int K(t)f({u}_{0}+th)dt\right){\mathbf{X}}_{i}\\ -\int E({X}_{j}\mid {U}_{j}={u}_{0}+th)K(t)f({u}_{0}+th)dt]\\ \left[\left(\int K(t)f({u}_{0}+th)dt\right){\mathbf{X}}_{i}^{T}-\int E({X}_{j}^{T}\mid {U}_{j}={u}_{0}+th)K(t)f({u}_{0}+th)dt\right]\}\\ \to \frac{1}{3}{f}^{3}({u}_{0}){\nu}_{0}\mathrm{\sum}({u}_{0}).\end{array}$$

To prove the asymptotic normality of ${S}_{n\mathbf{a}}(\mathbf{0},0)$, we apply the Lindeberg-Feller central limit theorem to its projection; this establishes (A.3). We next show that

$${S}_{n\mathbf{b}}(\mathbf{0},0)=\frac{2{h}^{2}}{{\gamma}_{n}}[\tau {f}^{2}({u}_{0}){\mu}_{2}\mathrm{\sum}({u}_{0}){\mathbf{a}}^{\u2033}({u}_{0})+o(1)]+{o}_{p}(1).$$

(A.4)

We may write ${S}_{n\mathbf{b}}(\mathbf{0},0)={[n(n-1)]}^{-1}{\sum}_{i\ne j}{H}_{n}^{\ast}({D}_{i},{D}_{j})$, where ${H}_{n}^{\ast}({D}_{i},{D}_{j})={W}_{n}^{\ast}({D}_{i},{D}_{j})+{W}_{n}^{\ast}({D}_{j},{D}_{i})$ with

$$\begin{array}{l}{W}_{n}^{\ast}({D}_{i},{D}_{j})=n{h}^{-1}{\gamma}_{n}[I({\epsilon}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-I({\epsilon}_{i}\le {\epsilon}_{j})]\phantom{\rule{0.16667em}{0ex}}({\mathbf{X}}_{j}-{\mathbf{X}}_{i})\\ K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)K\left(\frac{{U}_{j}-{u}_{0}}{h}\right).\end{array}$$

Note that

$$\begin{array}{l}{\mathrm{\Delta}}_{j}({u}_{0})-{\mathrm{\Delta}}_{i}({u}_{0})\\ =\frac{1}{2}[{({U}_{j}-{u}_{0})}^{2}{\mathbf{X}}_{j}^{T}-{({U}_{i}-{u}_{0})}^{2}{\mathbf{X}}_{i}^{T}]{\mathbf{a}}^{\u2033}({u}_{0})+\frac{1}{2}[{({U}_{j}-{u}_{0})}^{2}-{({U}_{i}-{u}_{0})}^{2}]{a}_{0}^{\u2033}({u}_{0})\\ +o({({U}_{i}-{u}_{0})}^{2})+o({({U}_{j}-{u}_{0})}^{2}).\end{array}$$

By applying Lemma A.1, it can be shown that ${S}_{n\mathbf{b}}(\mathbf{0},0)=E[{H}_{n}^{\ast}({D}_{i},{D}_{j})]+{o}_{p}(1)$. It follows by using the same arguments as those in the proof of Lemma A.2 that

$$\begin{array}{l}E[{H}_{n}^{\ast}({D}_{i},{D}_{j})]\\ =2n{h}^{-1}{\gamma}_{n}E\{\int [G(\epsilon +{\mathrm{\Delta}}_{j}({u}_{0})-{\mathrm{\Delta}}_{i}({u}_{0}))-G(\epsilon )]g(\epsilon )d\epsilon \\ ({\mathbf{X}}_{j}-{\mathbf{X}}_{i})K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)K\left(\frac{{U}_{j}-{u}_{0}}{h}\right)\}\\ =2n{h}^{-1}{\gamma}_{n}[\tau +O(h)]E\left[({\mathrm{\Delta}}_{j}({u}_{0})-{\mathrm{\Delta}}_{i}({u}_{0}))({\mathbf{X}}_{j}-{\mathbf{X}}_{i})K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)K\left(\frac{{U}_{j}-{u}_{0}}{h}\right)\right](1+o(1))\\ =\frac{2{h}^{2}}{{\gamma}_{n}}[\tau {f}^{2}({u}_{0}){\mu}_{2}\mathrm{\sum}({u}_{0}){\mathbf{a}}^{\u2033}({u}_{0})+o(1)].\end{array}$$

This proves (A.4). By combining (A.3) and (A.4) and using the approximation given in (A.1), we obtain (10).

A result of Hodges and Lehmann (1956) indicates that the ARE has a lower bound ${0.864}^{4/5}=0.8896$, with this lower bound attained at the density
$g(x)={\scriptstyle \frac{3}{20\sqrt{5}}}(5-{x}^{2})I(\mid x\mid \le \sqrt{5})$.

Let

$$\begin{array}{l}{V}_{n}({\alpha}_{1}^{\ast},{\xi}_{1},{\mathit{\xi}}_{2},{\mathit{\xi}}_{3})\\ ={(nh)}^{-1}\sum _{i=1}^{n}|{\epsilon}_{i}-{\gamma}_{n}{\alpha}_{1}^{\ast}-{\xi}_{1}({U}_{i}-{u}_{0})-{\mathit{\xi}}_{2}^{T}{\mathbf{X}}_{i}-{\mathit{\xi}}_{3}^{T}({U}_{i}-{u}_{0}){\mathbf{X}}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})|K\left(\frac{{U}_{i}-{u}_{0}}{h}\right),\end{array}$$

where
${\alpha}_{1}^{\ast}={\gamma}_{n}^{-1}({\alpha}_{1}-{a}_{0}({u}_{0}))$, ${\xi}_{1}\in \mathbb{R}$, ${\mathit{\xi}}_{2}\in {\mathbb{R}}^{p}$ and ${\mathit{\xi}}_{3}\in {\mathbb{R}}^{p}$. Define

$$\begin{array}{l}{S}_{n}^{\ast}({\alpha}_{1}^{\ast},{\xi}_{1},{\mathit{\xi}}_{2},{\mathit{\xi}}_{3})\\ =\frac{2{\gamma}_{n}}{nh}\sum _{i=1}^{n}\left[I({\epsilon}_{i}\le {\gamma}_{n}{\alpha}_{1}^{\ast}+{\xi}_{1}({U}_{i}-{u}_{0})+{\mathit{\xi}}_{2}^{T}{\mathbf{X}}_{i}+{\mathit{\xi}}_{3}^{T}({U}_{i}-{u}_{0}){\mathbf{X}}_{i}-{\mathrm{\Delta}}_{i}({u}_{0}))-1/2\right]K\left(\frac{{U}_{i}-{u}_{0}}{h}\right).\end{array}$$

We have
${S}_{n}^{\ast}(0,0,\mathbf{0},\mathbf{0})=2{\gamma}_{n}{(nh)}^{-1}{\sum}_{i=1}^{n}[I({\epsilon}_{i}\le {\mathrm{\Delta}}_{i}({u}_{0}))-1/2]K\left({\scriptstyle \frac{{U}_{i}-{u}_{0}}{h}}\right)$, which is the same as the *S _{n}*

$$\begin{array}{l}{U}_{n}({\alpha}_{1}^{\ast},{\xi}_{1},{\mathit{\xi}}_{2},{\mathit{\xi}}_{3})\\ =2{(nh)}^{-1}\sum _{i=1}^{n}[I({\epsilon}_{i}\le {\gamma}_{n}{\alpha}_{1}^{\ast}+{\xi}_{1}({U}_{i}-{u}_{0})+{\mathit{\xi}}_{2}^{T}{\mathbf{X}}_{i}+{\mathit{\xi}}_{3}^{T}({U}_{i}-{u}_{0}){\mathbf{X}}_{i}-{\mathrm{\Delta}}_{i}({u}_{0}))\\ -I({\epsilon}_{i}\le {\mathrm{\Delta}}_{i}({u}_{0}))]K\left(\frac{{U}_{i}-{u}_{0}}{h}\right).\end{array}$$

For any positive constants ${c}_{1},{c}_{2},{c}_{3}$, and for $\mid \mid {\xi}_{1}\mid \mid \le {c}_{1}{h}^{-1}{\gamma}_{n}$, $\mid \mid {\mathit{\xi}}_{2}\mid \mid \le {c}_{2}{\gamma}_{n}$ and $\mid \mid {\mathit{\xi}}_{3}\mid \mid \le {c}_{3}{h}^{-1}{\gamma}_{n}$, we have

$${U}_{n}({\alpha}_{1}^{\ast},{\xi}_{1},{\mathit{\xi}}_{2},{\mathit{\xi}}_{3})=2{\gamma}_{n}g(0)f({u}_{0}){\alpha}_{1}^{\ast}+{o}_{p}(1).$$

(A.5)

This can be proved by directly checking the mean and variance. More specifically,

$$\begin{array}{l}E[{U}_{n}({\alpha}_{1}^{\ast},{\xi}_{1},{\mathit{\xi}}_{2},{\mathit{\xi}}_{3})]\\ =2{h}^{-1}E\{[G({\gamma}_{n}{\alpha}_{1}^{\ast}+{\xi}_{1}({U}_{i}-{u}_{0})+{\mathit{\xi}}_{2}^{T}{\mathbf{X}}_{i}+{\mathit{\xi}}_{3}^{T}({U}_{i}-{u}_{0}){\mathbf{X}}_{i}-{\mathrm{\Delta}}_{i}({u}_{0}))\\ -G(-{\mathrm{\Delta}}_{i}({u}_{0}))]K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)\}\\ =2{h}^{-1}g(0)E\left\{\left[{\gamma}_{n}{\alpha}_{1}^{\ast}+{\xi}_{1}({U}_{i}-{u}_{0})+{\mathit{\xi}}_{2}^{T}{\mathbf{X}}_{i}+{\mathit{\xi}}_{3}^{T}({U}_{i}-{u}_{0}){\mathbf{X}}_{i}\right]K\left(\frac{{U}_{i}-{u}_{0}}{h}\right)\right\}(1+O(h))\\ =2{\gamma}_{n}g(0)f({u}_{0}){\alpha}_{1}^{\ast}(1+O(h)).\end{array}$$

Furthermore,

$$\begin{array}{l}\mathit{Var}[{U}_{n}({\alpha}_{1}^{\ast},{\xi}_{1},{\mathit{\xi}}_{2},{\mathit{\xi}}_{3})]\\ \le 4{n}^{-1}{h}^{-2}E\{[I({\epsilon}_{i}\le {\gamma}_{n}{\alpha}_{1}^{\ast}+{\xi}_{1}({U}_{i}-{u}_{0})+{\mathit{\xi}}_{2}^{T}{\mathbf{X}}_{i}+{\mathit{\xi}}_{3}^{T}({U}_{i}-{u}_{0}){\mathbf{X}}_{i}-{\mathrm{\Delta}}_{i}({u}_{0}))\\ {-I({\epsilon}_{i}\le {\mathrm{\Delta}}_{i}({u}_{0}))]}^{2}{K}^{2}\left(\frac{{U}_{i}-{u}_{0}}{h}\right)\}\\ \le 4{n}^{-1}{h}^{-2}E\left\{{K}^{2}\left(\frac{{U}_{i}-{u}_{0}}{h}\right)\right\}=O({n}^{-1}{h}^{-1})=o(1).\end{array}$$

By (A.5) and a proof similar to that of Lemma 3.1, we have

$${\gamma}_{n}^{-1}{V}_{n}({\alpha}_{1}^{\ast},{\xi}_{1},{\mathit{\xi}}_{2},{\mathit{\xi}}_{3})={V}_{n}^{\ast}({\alpha}_{1}^{\ast})+{o}_{p}(1),$$

(A.6)

where ${V}_{n}^{\ast}({\alpha}_{1}^{\ast})={\gamma}_{n}^{-1}{S}_{n}^{\ast}(0,0,\mathbf{0},\mathbf{0}){\alpha}_{1}^{\ast}+{\gamma}_{n}g(0)f({u}_{0}){\alpha}_{1}^{\ast 2}+{\gamma}_{n}^{-1}{V}_{n}(0,0,\mathbf{0},\mathbf{0})$. Because the function ${V}_{n}({\alpha}_{1}^{\ast},{\xi}_{1},{\mathit{\xi}}_{2},{\mathit{\xi}}_{3})$ is convex in its arguments, (A.6) can be strengthened to uniform convergence (convexity lemma, see Pollard 1991), i.e.,

$$\underset{\begin{array}{l}{\alpha}_{1}^{\ast}\in \mathbb{C},\mid \mid {\xi}_{1}\mid \mid \le {c}_{1}{h}^{-1}{\gamma}_{n}\\ \mid \mid {\mathit{\xi}}_{2}\mid \mid \le {c}_{2}{\gamma}_{n},\mid \mid {\mathit{\xi}}_{3}\mid \mid \le {c}_{3}{h}^{-1}{\gamma}_{n}\end{array}}{sup}\mid {\gamma}_{n}^{-1}{V}_{n}({\alpha}_{1}^{\ast},{\xi}_{1},{\mathit{\xi}}_{2},{\mathit{\xi}}_{3})-{V}_{n}^{\ast}({\alpha}_{1}^{\ast})\mid ={o}_{p}(1),$$

where $\mathbb{C}$ is a compact set in $\mathbb{R}$. By Theorem 3.2,
${\widehat{\alpha}}_{2}-{a}_{0}^{\prime}({u}_{0})={O}_{p}({h}^{-1}{\gamma}_{n})$, $\widehat{\mathbf{a}}({u}_{0})-\mathbf{a}({u}_{0})={O}_{p}({\gamma}_{n})$ and ${\widehat{\mathbf{a}}}^{\prime}({u}_{0})-{\mathbf{a}}^{\prime}({u}_{0})={O}_{p}({h}^{-1}{\gamma}_{n})$. Therefore,

$$\underset{{\alpha}_{1}^{\ast}\in \mathbb{C}}{sup}\left|{\gamma}_{n}^{-1}{V}_{n}({\alpha}_{1}^{\ast},{\widehat{\alpha}}_{2}-{a}_{0}^{\prime}({u}_{0}),\widehat{\mathbf{a}}({u}_{0})-\mathbf{a}({u}_{0}),{\widehat{\mathbf{a}}}^{\prime}({u}_{0})-{\mathbf{a}}^{\prime}({u}_{0}))-{V}_{n}^{\ast}({\alpha}_{1}^{\ast})\right|={o}_{p}(1).$$

Note that
${V}_{n}({\alpha}_{1}^{\ast},{\widehat{\alpha}}_{2}-{a}_{0}^{\prime}({u}_{0}),\widehat{\mathbf{a}}({u}_{0})-\mathbf{a}({u}_{0}),{\widehat{\mathbf{a}}}^{\prime}({u}_{0})-{\mathbf{a}}^{\prime}({u}_{0}))={Q}_{n0}^{\ast}({\alpha}_{1}^{\ast},{\widehat{\alpha}}_{2},\widehat{\mathit{\beta}})$ and ${S}_{n}^{\ast}(0,0,\mathbf{0},\mathbf{0})={S}_{n0}$, where
${Q}_{n0}^{\ast}$ and ${S}_{n0}$ are defined in Section 2.2. Minimizing the quadratic approximation ${V}_{n}^{\ast}({\alpha}_{1}^{\ast})$ therefore yields ${\gamma}_{n}^{-1}({\widehat{\alpha}}_{1}-{a}_{0}({u}_{0}))=-{\gamma}_{n}^{-2}{S}_{n0}/\{2g(0)f({u}_{0})\}+{o}_{p}(1)$. Write ${\gamma}_{n}^{-2}{S}_{n0}={T}_{1n}+{T}_{2n}$, where

$$\begin{array}{l}{T}_{1n}=\frac{2{\gamma}_{n}^{-1}}{nh}\sum _{i=1}^{n}[I({\epsilon}_{i}\le 0)-1/2]K\left(\frac{{U}_{i}-{u}_{0}}{h}\right),\\ {T}_{2n}=\frac{2{\gamma}_{n}^{-1}}{nh}\sum _{i=1}^{n}[I({\epsilon}_{i}\le -{\mathrm{\Delta}}_{i}({u}_{0}))-I({\epsilon}_{i}\le 0)]K\left(\frac{{U}_{i}-{u}_{0}}{h}\right).\end{array}$$

By the Lindeberg–Feller central limit theorem, ${T}_{1n}\to N(0,{\nu}_{0}f({u}_{0}))$ in distribution; and by computing its mean and variance we find that

$${T}_{2n}=-\frac{{h}^{2}}{{\gamma}_{n}}g(0)f({u}_{0}){a}_{0}^{\prime\prime}({u}_{0}){\mu}_{2}(1+o(1))+{o}_{p}(1).$$
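
The limiting variance of ${T}_{1n}$ is easy to check numerically. In the sketch below, the error law, the uniform design (so $f({u}_{0})=1$), and the Epanechnikov kernel (for which ${\nu}_{0}=\int {K}^{2}(t)dt=3/5$) are all illustrative assumptions; the Monte Carlo variance of ${T}_{1n}$ should be close to ${\nu}_{0}f({u}_{0})$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, h, u0 = 500, 0.25, 0.5
gamma_n = 1.0 / np.sqrt(n * h)
K = lambda t: 0.75 * np.maximum(1 - t**2, 0)  # Epanechnikov: nu0 = int K^2 = 3/5

def T_1n():
    U = rng.uniform(0, 1, n)                  # f(u0) = 1 under this assumption
    eps = rng.normal(size=n)                  # any median-zero error law works here
    ind = (eps <= 0.0).astype(float) - 0.5
    return 2.0 / (gamma_n * n * h) * np.sum(ind * K((U - u0) / h))

reps = np.array([T_1n() for _ in range(3000)])
print(reps.var(), 3 / 5)                      # Monte Carlo variance vs. nu0 * f(u0)
```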

Combining the above results and using (19) completes the proof.

To prove Theorem 3.5, we first extend Lemma A.1 to almost sure convergence.

If E[||H_{n}(D_{i}, D_{j})||^{2}] = O(h^{−2}), then U_{n} − Û_{n} = o(1) almost surely and U_{n} = _{n} + o(1) a.s.

The proof of Powell, Stock and Stoker (1989) for Lemma A.1 suggests that *E*[||*U_n* − *Û_n*||^{2}] = O(n^{−1}h^{−2}) = o(1); the almost sure statements then follow from a standard Borel–Cantelli argument.

Let ${\mathit{\beta}}^{\ast}$ and
${\alpha}_{2}^{\ast}$ be defined the same as before. We introduce the reparametrized objective function
${\overline{Q}}_{n}^{\ast}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})$. Let
${\overline{S}}_{n}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})={({\overline{S}}_{n1}^{T}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast}),{\overline{S}}_{n2}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast}))}^{T}$ denote the gradient function of
${\overline{Q}}_{n}^{\ast}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})$, which is defined similarly as in Section 2.2. We first show that
${\overline{S}}_{n}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})$ has a similar local linear approximation as stated in Lemma A.3. To make the proof concise, we prove this for
${\overline{S}}_{n1}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})$, where

$$\begin{array}{l}{\overline{S}}_{n1}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})\\ =2{\gamma}_{n}{[n(n-1)]}^{-1}\sum _{i\ne j}({V}_{i}+{V}_{j})[I({\epsilon}_{i}-{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{i}-{u}_{0})/h-{\gamma}_{n}{\mathit{\beta}}^{\ast T}{\mathbf{Z}}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}\\ -{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{j}-{u}_{0})/h-{\gamma}_{n}{\mathit{\beta}}^{\ast T}{\mathbf{Z}}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-1/2]({\mathbf{Z}}_{i}-{\mathbf{Z}}_{j}){K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0}).\end{array}$$

Let ${U}_{n}={\gamma}_{n}^{-1}[{\overline{S}}_{n1}({\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})-{\overline{S}}_{n1}(\mathbf{0},0)]={[n(n-1)]}^{-1}{\sum}_{i\ne j}({V}_{i}+{V}_{j}){M}_{n}({D}_{i},{D}_{j},{\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})$, where ${M}_{n}({D}_{i},{D}_{j},{\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})={\scriptstyle \frac{1}{2}}[{m}_{n}({D}_{i},{D}_{j},{\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})+{m}_{n}({D}_{j},{D}_{i},{\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})]$ and

$$\begin{array}{l}{m}_{n}({D}_{i},{D}_{j},{\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})\\ =2[I({\epsilon}_{i}-{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{i}-{u}_{0})/h-{\gamma}_{n}{\mathit{\beta}}^{\ast T}{\mathbf{Z}}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}-{\gamma}_{n}{\alpha}_{2}^{\ast}({U}_{j}-{u}_{0})/h\\ -{\gamma}_{n}{\mathit{\beta}}^{\ast T}{\mathbf{Z}}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-1/2]({\mathbf{Z}}_{i}-{\mathbf{Z}}_{j}){K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0}).\end{array}$$

Note that
${U}_{n}=2{n}^{-1}{\sum}_{i=1}^{n}{V}_{i}[{(n-1)}^{-1}{\sum}_{j=1,j\ne i}^{n}{M}_{n}({D}_{i},{D}_{j},{\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})]$. Conditional on
${\left\{{D}_{i}\right\}}_{i=1}^{n}$, this is a weighted average of the ${V}_{i}$. Note that

$$\begin{array}{l}E({U}_{n}\mid {\left\{{D}_{i}\right\}}_{i=1}^{n})={[n(n-1)]}^{-1}\sum _{i\ne j}{M}_{n}({D}_{i},{D}_{j},{\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast}),\\ \mathit{Var}({U}_{n}\mid {\left\{{D}_{i}\right\}}_{i=1}^{n})={n}^{-2}\sum _{i=1}^{n}{\left[{(n-1)}^{-1}\sum _{j=1,j\ne i}^{n}{M}_{n}({D}_{i},{D}_{j},{\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})\right]}^{2}.\end{array}$$

By Lemma A.4, it can be shown that
${[n(n-1)]}^{-1}{\sum}_{i\ne j}{M}_{n}({D}_{i},{D}_{j},{\mathit{\beta}}^{\ast},{\alpha}_{2}^{\ast})={\gamma}_{n}{\mathbf{A}}^{\ast}{\mathit{\beta}}^{\ast}+o(1)$ almost surely, where ${\mathbf{A}}^{\ast}=4\tau {f}^{2}({u}_{0})\text{diag}({\mathbf{I}}_{p},{\mu}_{2}{\mathbf{I}}_{p})({\mathbf{I}}_{2}\otimes \mathrm{\Sigma}({u}_{0}))=4\tau {f}^{2}({u}_{0})\text{diag}(\mathrm{\Sigma}({u}_{0}),{\mu}_{2}\mathrm{\Sigma}({u}_{0}))$. Arguing as in the derivation of (A.1), we then obtain

$$\sqrt{nh}[{\overline{\mathbf{a}}}_{n}({u}_{0})-\mathbf{a}({u}_{0})]=-{\gamma}_{n}^{-2}{(4\tau {f}^{2}({u}_{0})\mathrm{\Sigma}({u}_{0}))}^{-1}{\overline{S}}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)+{o}_{p}(1),$$

(A.7)

where the ${o}_{p}(1)$ is with respect to the probability measure generated by
${\left\{{V}_{i}\right\}}_{i=1}^{n}$, and

$$\begin{array}{l}{\overline{S}}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)=2{\gamma}_{n}{[n(n-1)]}^{-1}\sum _{i\ne j}({V}_{i}+{V}_{j})[I({\epsilon}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-1/2]\\ ({\mathbf{X}}_{i}-{\mathbf{X}}_{j}){K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0}).\end{array}$$
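
This perturbed statistic is what the resampling procedure recomputes: holding the data fixed and redrawing the weights ${V}_{i}$ many times approximates the conditional covariance derived below. The following sketch illustrates the idea for ${\overline{S}}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)$ under illustrative assumptions only: simulated data, ${\mathrm{\Delta}}_{i}({u}_{0})=0$, and Bernoulli weights with mean 1/2 and variance 1/4, matching the centering used in the displays that follow; the weight distribution specified in the paper should be used in practice.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, h, u0 = 150, 2, 0.3, 0.5
U = rng.uniform(0, 1, n)                      # simulated data (assumption)
X = rng.normal(size=(n, p))
eps = rng.normal(size=n)
Delta = np.zeros(n)                           # bias term, zero in this illustration
gamma_n = 1.0 / np.sqrt(n * h)

Kh = lambda t: 0.75 * np.maximum(1 - (t / h)**2, 0) / h   # K_h(t) = K(t/h)/h
w = Kh(U - u0)

# Pairwise ingredients, vectorized over (i, j); the diagonal i = j is excluded.
ind = ((eps + Delta)[:, None] <= (eps + Delta)[None, :]).astype(float) - 0.5
np.fill_diagonal(ind, 0.0)
dX = X[:, None, :] - X[None, :, :]            # X_i - X_j
ww = w[:, None] * w[None, :]                  # K_h(U_i - u0) * K_h(U_j - u0)

def S_bar(V):
    # perturbed gradient for one draw of the weights V_i, as displayed above
    c = (V[:, None] + V[None, :]) * ind * ww
    return 2 * gamma_n * np.einsum('ij,ijk->k', c, dX) / (n * (n - 1))

S_hat = S_bar(np.full(n, 0.5))                # V_i = 1/2 recovers the unperturbed statistic
draws = np.array([S_bar(rng.binomial(1, 0.5, n).astype(float)) for _ in range(500)])
cov_est = np.cov((draws - S_hat).T / gamma_n**2)   # resampling covariance estimate
print(cov_est)                                # compare with (4/3) f^3(u0) nu_0 Sigma(u0)
```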

The approximation (A.1) can be strengthened to almost sure convergence, i.e.,

$$\sqrt{nh}[{\widehat{\mathbf{a}}}_{n}({u}_{0})-\mathbf{a}({u}_{0})]=-{\gamma}_{n}^{-2}{(4\tau {f}^{2}({u}_{0})\mathrm{\Sigma}({u}_{0}))}^{-1}{S}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)+o(1)\phantom{\rule{0.38889em}{0ex}}\text{a.s.}$$

(A.8)

Combining (A.7) and (A.8), we have that, for almost every sequence ${\left\{{D}_{i}\right\}}_{i=1}^{n}$,

$$\sqrt{nh}[{\overline{\mathbf{a}}}_{n}({u}_{0})-{\widehat{\mathbf{a}}}_{n}({u}_{0})]=-{\gamma}_{n}^{-2}{(4\tau {f}^{2}({u}_{0})\mathrm{\Sigma}({u}_{0}))}^{-1}[{\overline{S}}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)-{S}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)]+{o}_{p}(1).$$

Note that

$$\begin{array}{l}{\gamma}_{n}^{-2}[{\overline{S}}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)-{S}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)]\\ =2{\gamma}_{n}^{-1}{[n(n-1)]}^{-1}\sum _{i\ne j}[({V}_{i}-1/2)+({V}_{j}-1/2)][I({\epsilon}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-1/2]\\ ({\mathbf{X}}_{i}-{\mathbf{X}}_{j}){K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0})\\ =4{\gamma}_{n}^{-1}{n}^{-1}\sum _{i=1}^{n}({V}_{i}-1/2)\{{(n-1)}^{-1}\sum _{j=1,j\ne i}^{n}[I({\epsilon}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-1/2]\\ ({\mathbf{X}}_{i}-{\mathbf{X}}_{j}){K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0})\}.\end{array}$$

Moreover, $E\left\{{\gamma}_{n}^{-2}[{\overline{S}}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)-{S}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)]\mid {\left\{{D}_{i}\right\}}_{i=1}^{n}\right\}=\mathbf{0}$, and

$$\begin{array}{l}\mathit{Var}\left\{{\gamma}_{n}^{-2}[{\overline{S}}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)-{S}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)]\mid {\left\{{D}_{i}\right\}}_{i=1}^{n}\right\}\\ =16{\gamma}_{n}^{-2}{n}^{-2}{(n-1)}^{-2}\sum _{i=1}^{n}\{\sum _{j=1,j\ne i}^{n}[I({\epsilon}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-1/2]\\ {({\mathbf{X}}_{i}-{\mathbf{X}}_{j}){K}_{h}({U}_{i}-{u}_{0}){K}_{h}({U}_{j}-{u}_{0})\}}^{2}\\ ={W}_{1}+{W}_{2},\end{array}$$

where

$$\begin{array}{l}{W}_{1}=16{\gamma}_{n}^{-2}{n}^{-2}{(n-1)}^{-2}{h}^{-4}\sum _{i=1}^{n}\sum _{j=1,j\ne i}^{n}{[I({\epsilon}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{j}+{\mathrm{\Delta}}_{j}({u}_{0}))-1/2]}^{2}\\ ({\mathbf{X}}_{i}-{\mathbf{X}}_{j}){({\mathbf{X}}_{i}-{\mathbf{X}}_{j})}^{T}{K}^{2}(({U}_{i}-{u}_{0})/h){K}^{2}(({U}_{j}-{u}_{0})/h),\\ {W}_{2}=16{\gamma}_{n}^{-2}{n}^{-2}{(n-1)}^{-2}{h}^{-4}\sum _{i=1}^{n}\sum _{{j}_{1}\ne i}^{n}\sum _{{j}_{2}\ne i,{j}_{1}}^{n}[I({\epsilon}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{{j}_{1}}+{\mathrm{\Delta}}_{{j}_{1}}({u}_{0}))-1/2]\\ [I({\epsilon}_{i}+{\mathrm{\Delta}}_{i}({u}_{0})\le {\epsilon}_{{j}_{2}}+{\mathrm{\Delta}}_{{j}_{2}}({u}_{0}))-1/2]({\mathbf{X}}_{i}-{\mathbf{X}}_{{j}_{1}}){({\mathbf{X}}_{i}-{\mathbf{X}}_{{j}_{2}})}^{T}\\ {K}^{2}(({U}_{i}-{u}_{0})/h)K(({U}_{{j}_{1}}-{u}_{0})/h)K(({U}_{{j}_{2}}-{u}_{0})/h).\end{array}$$

Lemma A.4 can be used to show that ${W}_{1}=o(1)$ almost surely, and a minor extension of Lemma A.4 to third-order U-statistics can be used to show that
${W}_{2}={\scriptstyle \frac{4}{3}}{f}^{3}({u}_{0}){\nu}_{0}\mathrm{\Sigma}({u}_{0})+o(1)$ almost surely. The asymptotic normality of
${\gamma}_{n}^{-2}[{\overline{S}}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)-{S}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)]$ follows by showing that the condition of the Lindeberg–Feller central limit theorem for triangular arrays holds almost surely. We have, for almost every sequence
${\left\{{D}_{i}\right\}}_{i=1}^{n}$,

$${\gamma}_{n}^{-2}[{\overline{S}}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)-{S}_{n\mathbf{a}}^{\ast}(\mathbf{0},0)]\to N\left(\mathbf{0},\frac{4}{3}{f}^{3}({u}_{0}){\nu}_{0}\mathrm{\Sigma}({u}_{0})\right)$$

in distribution. This completes the proof.

^{1}Lan Wang is Assistant Professor, School of Statistics, University of Minnesota, Minneapolis, MN 55455. Email: lan@stat.umn.edu. Bo Kai is a graduate student, Department of Statistics, The Pennsylvania State University, University Park, PA 16802. Email: bokai@psu.edu. Runze Li is Professor, Department of Statistics and The Methodology Center, The Pennsylvania State University, University Park, PA 16802-2111. Email: rli@stat.psu.edu. Wang’s research is supported by National Science Foundation grant DMS-0706842. Kai’s research is supported by National Science Foundation grant DMS-0348869 through a research assistantship. Li’s research is supported by NIDA, NIH grants R21 DA024260 and P50 DA10075. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the NIH.

1. Brumback B, Rice JA. Smoothing Spline Models for the Analysis of Nested and Crossed Samples of Curves (with discussion). Journal of the American Statistical Association. 1998;93:961–994.

2. Cai Z, Fan J, Li R. Efficient Estimation and Inferences for Varying-Coefficient Models. Journal of the American Statistical Association. 2000;95:888–902.

3. Cleveland WS, Grosse E, Shyu WM. Local regression models. In: Chambers JM, Hastie TJ, editors. Statistical Models in S. Pacific Grove, California: Wadsworth & Brooks; 1992. pp. 309–376.

4. David HA. Early Sample Measures of Variability. Statistical Science. 1998;13:368–377.

5. Fan J, Li R. An Overview on Nonparametric and Semiparametric Techniques for Longitudinal Data. In: Fan J, Koul H, editors. Frontiers in Statistics. Imperial College Press; London: 2006. pp. 277–303.

6. Fan J, Zhang W. Statistical Estimation in Varying-Coefficient Models. The Annals of Statistics. 1999;27:1491–1518.

7. Fan J, Zhang W. Simultaneous Confidence Bands and Hypothesis Testing in Varying-Coefficient Models. Scandinavian Journal of Statistics. 2000;27:715–731.

8. Fan J, Zhang W. Statistical Methods with Varying Coefficient Models. Statistics and its Interface. 2008;1:179–195.

9. Hall P, Kang KH. Bootstrapping Nonparametric Density Estimators with Empirically Chosen Bandwidths. The Annals of Statistics. 2001;29:1443–1468.

10. Hastie TJ, Tibshirani RJ. Varying-Coefficient Models (with discussion). Journal of the Royal Statistical Society, Series B. 1993;55:757–796.

11. Hettmansperger TP, McKean JW. Robust Nonparametric Statistical Methods. London: Arnold; 1998.

12. Hjort NL, Pollard D. Asymptotics for Minimisers of Convex Processes. Preprint; 1993. http://citeseer.ist.psu.edu/hjort93asymptotics.html.

13. Hodges JL, Lehmann EL. The Efficiency of Some Nonparametric Competitors of the *t*-Test. The Annals of Mathematical Statistics. 1956;27:324–335.

14. Hoover DR, Rice JA, Wu CO, Yang L-P. Nonparametric Smoothing Estimates of Time-varying Coefficient Models with Longitudinal Data. Biometrika. 1998;85:809–822.

15. Jin Z, Ying Z, Wei LJ. A Simple Resampling Method by Perturbing the Minimand. Biometrika. 2001;88:381–390.

16. Kauermann G, Tutz G. On Model Diagnostics Using Varying Coefficient Models. Biometrika. 1999;86:119–128.

17. Kim M-O. Quantile Regression with Varying Coefficients. The Annals of Statistics. 2007;35:92–108.

18. Leung DH. Cross-validation in nonparametric regression with outliers. The Annals of Statistics. 2005;33:2291–2310.

19. McKean JW. Robust Analysis of Linear Models. Statistical Science. 2004;19:562–570.

20. Pollard D. Asymptotics for Least Absolute Deviation Regression Estimators. Econometric Theory. 1991;7:186–199.

21. Powell JL, Stock JH, Stoker TM. Semiparametric Estimation of Index Coefficients. Econometrica. 1989;57:1403–1430.

22. Serfling R. Approximation Theorems of Mathematical Statistics. New York: Wiley; 1980.

23. Terpstra J, McKean J. Rank-based Analysis of Linear Models using R. Journal of Statistical Software. 2005;14:1–26.

24. Wang L, Kai B, Li R. Local Rank Inference for Varying Coefficient Models. Technical Report, The Methodology Center, The Pennsylvania State University; 2009.

25. Wu CO, Chiang CT, Hoover DR. Asymptotic Confidence Regions for Kernel Smoothing of a Varying-coefficient Model with Longitudinal Data. Journal of the American Statistical Association. 1998;93:1388–1402.

26. Yu K, Jones MC. Local Linear Quantile Regression. Journal of the American Statistical Association. 1998;93:228–237.

27. Zhang W, Lee SY. Variable Bandwidth Selection in Varying-Coefficient Models. Journal of Multivariate Analysis. 2000;74:116–134.
