
J R Stat Soc Series B Stat Methodol. Author manuscript; available in PMC 2011 January 1.

Published in final edited form as:

J R Stat Soc Series B Stat Methodol. 2010 January; 72(1): 49–69.

doi: 10.1111/j.1467-9868.2009.00725.x. PMCID: PMC2958780

NIHMSID: NIHMS115056


Local polynomial regression is a useful nonparametric regression tool to explore fine data structures and has been widely used in practice. In this paper, we propose a new nonparametric regression technique called *local composite-quantile-regression (CQR) smoothing* in order to further improve local polynomial regression. Sampling properties of the proposed estimation procedure are studied. We derive the asymptotic bias, variance and normality of the proposed estimate. Asymptotic relative efficiency of the proposed estimate with respect to the local polynomial regression is investigated. It is shown that the proposed estimate can be much more efficient than the local polynomial regression estimate for various non-normal errors, while being almost as efficient as the local polynomial regression estimate for normal errors. Simulation is conducted to examine the performance of the proposed estimates. The simulation results are consistent with our theoretical findings. A real data example is used to illustrate the proposed method.

Consider the general nonparametric regression model

$$Y=m(T)+\sigma(T)\epsilon,$$

(1.1)

where *Y* is the response variable, *T* is a covariate, *m*(*T*) = *E*(*Y*|*T*) is assumed to be a smooth nonparametric function, and σ(*T*) is a positive function representing the standard deviation. We assume that ϵ has mean 0 and variance 1. Local polynomial regression is a popular and successful method for nonparametric regression, and it has been well studied in the literature (Fan & Gijbels 1996). By locally fitting a linear (or polynomial) regression model via adaptively weighted least squares, local polynomial regression is able to explore the fine features of the regression function and its derivatives. Although the least squares method is a popular and convenient choice in local polynomial fitting, we may consider other local fitting methods. For example, in the presence of outliers, one may consider local least absolute deviation (LAD) polynomial regression (Fan, Hu & Truong 1994, Welsh 1996). When the error follows a Laplace distribution, the local LAD polynomial regression is more efficient than the local least squares polynomial regression. Of course, the local LAD polynomial regression can do much worse than the local least squares polynomial regression in other settings. The aim of this paper is to develop a new local estimation procedure that can significantly improve upon the classical local polynomial regression for a wide class of error distributions, and that has comparable efficiency in the worst case scenario.

Our proposal is built upon the composite-quantile-regression (CQR) estimator recently proposed by Zou & Yuan (2008) for estimating the regression coefficients in the classical linear regression model. Zou & Yuan (2008) show that the relative efficiency of the CQR estimator compared with the least squares estimator is greater than 70% regardless of the error distribution. Furthermore, the CQR estimator can be much more efficient, and sometimes arbitrarily more efficient, than the least squares estimator. These nice theoretical properties of CQR in linear regression motivate us to construct local CQR smoothers as nonparametric estimates of the regression function and its derivatives.

We make several contributions in this paper.

- We propose the local linear CQR estimator for estimating the nonparametric regression function. We establish the asymptotic theory of the local linear CQR estimator and show that, compared with the classical local linear least squares estimator, the new method can significantly improve the estimation efficiency of the local linear least squares estimator for commonly used non-normal error distributions.
- We propose the local quadratic CQR estimator for estimating the derivative of the regression function. The asymptotic theory shows that the local quadratic CQR estimator can often drastically improve the estimation efficiency of its local least squares counterpart if the error distribution is non-normal, and at the same time, the loss in efficiency is at most 8.01% in the worst case scenario.
- The general asymptotic theory of the local *p*-polynomial CQR estimator is established. Our theory does not require the error distribution to have a finite variance. Therefore, local CQR estimators can work well even when local polynomial regression fails due to infinite variance in the noise.

It is a well-known fact that the local linear (polynomial) regression is the best linear smoother in terms of efficiency (Fan & Gijbels 1996). There is no contradiction between this fact and our results, because the proposed local CQR estimator is a *nonlinear* smoother.

The rest of this paper is organized as follows. In section 2, we introduce the local linear CQR for nonparametric regression and study its asymptotic properties. In section 3, we propose the local quadratic CQR for estimating the derivative of the nonparametric regression function, which further reduces the estimation bias of the local linear CQR. A Monte Carlo study and a real data example are presented in section 4. In section 5 we present the general theory of the local *p*-polynomial CQR and technical proofs.

Suppose that (*t _{i}*, *y _{i}*), *i* = 1, …, *n*, is an independent and identically distributed sample from model (1.1). In local linear regression, *m*(*t*) is approximated in a neighborhood of *t*_{0} by a linear function, *m*(*t*) ≈ *m*(*t*_{0}) + *m′*(*t*_{0})(*t* − *t*_{0}), and the estimator of (*m*(*t*_{0}), *m′*(*t*_{0})) is given by

$$(\widehat{a},\widehat{b})=\underset{a,b}{\text{arg min}}{\displaystyle \sum _{i=1}^{n}{\{{y}_{i}-a-b({t}_{i}-{t}_{0})\}}^{2}K\left(\frac{{t}_{i}-{t}_{0}}{h}\right)},$$

(2.1)

where *h* is the smoothing parameter. Local linear regression enjoys many good theoretical properties, such as its design adaptation property and high minimax efficiency (Fan & Gijbels 1992). However, local least squares regression breaks down when the error distribution does not have finite second moment, for the estimator is no longer consistent. The local least absolute deviation (LAD) polynomial regression (Fan et al. 1994, Welsh 1996) replaces the least squares loss in (2.1) with the *L*_{1} loss. By doing so, the local LAD estimator can deal with the infinite variance case, but for finite variance cases its relative efficiency compared to the local least squares estimator can be arbitrarily small.

We propose the local linear CQR estimator as an efficient alternative to the local linear regression estimator. Let ${\rho}_{{\tau}_{k}}(r)={\tau}_{k}r-rI(r<0)$, *k* = 1, 2, …, *q*, be *q* check loss functions at equally spaced quantiles ${\tau}_{k}=\frac{k}{q+1}$. In the linear regression model, the CQR estimator of Zou & Yuan (2008) is defined as the minimizer of

$$\sum _{k=1}^{q}\sum _{i=1}^{n}{\rho}_{{\tau}_{k}}({y}_{i}-{a}_{k}-b{t}_{i}).$$

The CQR combines the strength across multiple quantile regressions by forcing a single parameter for the “slope”. Since the nonparametric function is approximated by a linear model locally, we consider minimizing the locally weighted CQR loss

$$\sum _{k=1}^{q}\left[\sum _{i=1}^{n}{\rho}_{{\tau}_{k}}\{{y}_{i}-{a}_{k}-b({t}_{i}-{t}_{0})\}K\left(\frac{{t}_{i}-{t}_{0}}{h}\right)\right].$$

(2.2)

Denote the minimizer of (2.2) by $({\widehat{a}}_{1},\dots ,{\widehat{a}}_{q},\widehat{b})$. Then we let

$$\widehat{m}({t}_{0})=\frac{1}{q}\sum _{k=1}^{q}{\widehat{a}}_{k},\quad \text{and}\quad \widehat{m}'({t}_{0})=\widehat{b}.$$

(2.3)

We refer to $\widehat{m}({t}_{0})$ as the local linear CQR estimator of *m*(*t*_{0}). As an estimator of *m′*(*t*_{0}), $\widehat{m}'({t}_{0})$ can be further improved by the local quadratic CQR estimator, which is discussed in the next section.

It is worth mentioning here that although the check loss function is typically used to estimate the conditional quantile function of *y* given *T* (see Koenker (2005) and references therein), we simultaneously employ several check functions to estimate the regression (mean) function. So the local CQR smoother is conceptually different from nonparametric quantile regression by local fitting which has been studied in Yu & Jones (1998) and chapter 5 of Fan & Gijbels (1996).

In a short note, Koenker (1984) studied the Hogg estimator as the minimizer of a weighted sum of check functions in the framework of parametric linear models. The focus there was to argue that the Hogg estimator is a different way to do L-estimation. The CQR loss can be regarded as a weighted sum of check functions with uniform weights and uniform quantiles $({\tau}_{k}=\frac{k}{q+1},k=1,2,\dots ,q)$. When *q* is large, such a choice leads to nice oracle-like estimators in the oracle model selection theoretic framework (Zou & Yuan 2008). Koenker (1984) did not discuss the relative efficiency of the Hogg estimator with respect to the least squares estimator. In this work we consider minimizing the locally weighted CQR loss and show that the local CQR smoothers have very interesting asymptotic efficiency properties. To the best of our knowledge, none of this has been studied in the literature.

To see why local linear CQR is an efficient alternative to local linear regression, we establish the asymptotic properties of the local linear CQR estimator. Some notation is necessary for the discussion. Let *F*(·) and *f*(·) denote the cumulative distribution function and density function of the error distribution, respectively. Denote by *f _{T}*(·) the marginal density function of the covariate *T*. Moreover, let

$${\mu}_{j}=\int {u}^{j}K(u)\,du\quad \text{and}\quad {\nu}_{j}=\int {u}^{j}{K}^{2}(u)\,du,\quad j=0,1,2,\dots .$$

Define

$${R}_{1}(q)=\frac{1}{{q}^{2}}\sum _{k=1}^{q}\sum _{k'=1}^{q}\frac{{\tau}_{kk'}}{f({c}_{k})f({c}_{k'})},$$

(2.4)

where ${c}_{k}={F}^{-1}({\tau}_{k})$ and ${\tau}_{kk'}=\text{min}({\tau}_{k},{\tau}_{k'})-{\tau}_{k}{\tau}_{k'}$.
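Since *R*_{1}(*q*) depends on the error distribution only through *f* and *F*, it is easy to evaluate numerically. The following Python sketch (illustrative, not the authors' MATLAB code) computes *R*_{1}(*q*), and the quantity *R*_{1}(*q*)^{−4/5} that appears below as the asymptotic relative efficiency, for any `scipy.stats` distribution:

```python
# Numerically evaluate R_1(q) from (2.4) with tau_k = k/(q+1),
# c_k = F^{-1}(tau_k), and tau_kk' = min(tau_k, tau_k') - tau_k * tau_k'.
# Any scipy.stats distribution exposing .ppf and .pdf works; the standard
# normal is used as the default for illustration.
import numpy as np
from scipy import stats

def R1(q, dist=stats.norm):
    taus = np.arange(1, q + 1) / (q + 1)
    c = dist.ppf(taus)                                    # c_k = F^{-1}(tau_k)
    f_c = dist.pdf(c)                                     # f(c_k)
    tau_kk = np.minimum.outer(taus, taus) - np.outer(taus, taus)
    return np.sum(tau_kk / np.outer(f_c, f_c)) / q**2

def ARE_mean(q, dist=stats.norm):
    # Asymptotic relative efficiency of the local linear CQR estimator
    # with respect to the local linear least squares estimator.
    return R1(q, dist) ** (-4 / 5)
```

For *q* = 1 (local LAD), *R*_{1}(1) = 1/{4*f*(*c*_{1})²} with *c*_{1} = *F*^{−1}(1/2), which equals π/2 for the standard normal.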

**Theorem 2.1**. *Suppose that t*_{0} *is an interior point of the support of f _{T}*(·). *Under the regularity conditions (A)–(D) in section 5, if h → 0 and nh → ∞, then the asymptotic conditional bias and variance of the local linear CQR estimator* $\widehat{m}({t}_{0})$ *are given by*

$$\mathit{\text{Bias}}(\widehat{m}({t}_{0})|\mathbf{T})=\frac{1}{2}m''({t}_{0}){\mu}_{2}{h}^{2}+{o}_{p}({h}^{2}),$$

(2.5)

$$\mathit{\text{Var}}(\widehat{m}({t}_{0})|\mathbf{T})=\frac{1}{nh}\frac{{\nu}_{0}{\sigma}^{2}({t}_{0})}{{f}_{T}({t}_{0})}{R}_{1}(q)+{o}_{p}(\frac{1}{nh}).$$

(2.6)

*Furthermore, conditioning on* **T**, *we have*

$$\sqrt{nh}\left\{\widehat{m}({t}_{0})-m({t}_{0})-\frac{1}{2}m''({t}_{0}){\mu}_{2}{h}^{2}\right\}\stackrel{\mathcal{L}}{\to}N\left(0,\frac{{\nu}_{0}{\sigma}^{2}({t}_{0})}{{f}_{T}({t}_{0})}{R}_{1}(q)\right),$$

(2.7)

*where* $\stackrel{\mathcal{L}}{\to}$ *stands for convergence in distribution*.

In the proof given in section 5 we assume that the error distribution is symmetric. Without this condition the asymptotic bias would have a non-vanishing term. The asymptotic variance remains the same and the asymptotic normality still holds with a minor modification. In other words, the symmetric error distribution condition is only used to ensure that the quantity to which the local CQR estimator converges is the conditional mean. This is similar to the situation when using the local LAD to estimate the conditional mean function, where we need to assume that the mean and median of the error distribution coincide.

We see from Theorem 2.1 that the leading term of the asymptotic bias for the local linear CQR estimator is the same as that for the local linear least squares estimator, while their asymptotic variances are different. The mean squared error of $\widehat{m}({t}_{0})$ is

$$\text{MSE}\{\widehat{m}({t}_{0})\}=\left\{\frac{1}{2}m''({t}_{0}){\mu}_{2}\right\}^{2}{h}^{4}+\frac{1}{nh}\frac{{\nu}_{0}{\sigma}^{2}({t}_{0})}{{f}_{T}({t}_{0})}{R}_{1}(q)+{o}_{p}\left({h}^{4}+\frac{1}{nh}\right).$$

By straightforward calculations we can see that the optimal variable bandwidth minimizing the asymptotic mean squared error of $\widehat{m}({t}_{0})$ is

$${h}^{\text{opt}}({t}_{0})={\left[\frac{{\nu}_{0}{\sigma}^{2}({t}_{0}){R}_{1}(q)}{{f}_{T}({t}_{0}){\{m''({t}_{0}){\mu}_{2}\}}^{2}}\right]}^{1/5}{n}^{-1/5}.$$

In practice, one may select a constant bandwidth by minimizing the mean integrated squared error $\text{MISE}(\widehat{m})=\int \text{MSE}\{\widehat{m}(t)\}w(t)\,dt$ for a weight function *w*(*t*). Similarly, the optimal bandwidth minimizing the asymptotic $\text{MISE}(\widehat{m})$ is

$${h}^{\text{opt}}={\left[\frac{{\nu}_{0}{R}_{1}(q)\int {\sigma}^{2}(t){f}_{T}^{-1}(t)w(t)\,dt}{{\mu}_{2}^{2}\int {\{m''(t)\}}^{2}w(t)\,dt}\right]}^{1/5}{n}^{-1/5}.$$

The above calculations indicate that the local linear CQR estimator enjoys the optimal rate of convergence *n*^{−2/5}.
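The plug-in form of *h*^{opt}(*t*_{0}) is straightforward to code once estimates of σ²(*t*_{0}), *f _{T}*(*t*_{0}) and *m″*(*t*_{0}) are available. A small sketch follows; the function name and arguments are illustrative, and the defaults ν₀ = 3/5, μ₂ = 1/5 are the constants of the Epanechnikov kernel used in section 4.

```python
# Optimal variable bandwidth h_opt(t0) for the local linear CQR estimator,
# assuming plug-in estimates of sigma^2(t0), f_T(t0), m''(t0) and R_1(q).
# nu0 = 3/5 and mu2 = 1/5 are the Epanechnikov kernel constants.
def h_opt(n, sigma2_t0, fT_t0, m2_t0, R1q, nu0=0.6, mu2=0.2):
    num = nu0 * sigma2_t0 * R1q
    den = fT_t0 * (m2_t0 * mu2) ** 2
    return (num / den) ** 0.2 * n ** (-0.2)
```

By construction, the output scales as *n*^{−1/5} and equals *R*_{1}(*q*)^{1/5} times the corresponding least squares bandwidth, as noted in (2.8) below.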

In this section, we study the asymptotic relative efficiency of the local linear CQR estimator with respect to the local linear least squares estimator by comparing their mean squared errors. The role of *R*_{1} becomes clear in the relative efficiency study.

The local linear least squares estimator for *m*(*t*_{0}) has the mean squared error

$$\text{MSE}\{{\widehat{m}}_{\text{LS}}({t}_{0})\}=\left\{\frac{1}{2}m''({t}_{0}){\mu}_{2}\right\}^{2}{h}^{4}+\frac{1}{nh}\frac{{\nu}_{0}{\sigma}^{2}({t}_{0})}{{f}_{T}({t}_{0})}+{o}_{p}\left({h}^{4}+\frac{1}{nh}\right),$$

and hence

$${h}_{\text{LS}}^{\text{opt}}({t}_{0})={\left[\frac{{\nu}_{0}{\sigma}^{2}({t}_{0})}{{f}_{T}({t}_{0}){\{m''({t}_{0}){\mu}_{2}\}}^{2}}\right]}^{1/5}{n}^{-1/5},\quad {h}_{\text{LS}}^{\text{opt}}={\left[\frac{{\nu}_{0}\int {\sigma}^{2}(t){f}_{T}^{-1}(t)w(t)\,dt}{{\mu}_{2}^{2}\int {\{m''(t)\}}^{2}w(t)\,dt}\right]}^{1/5}{n}^{-1/5},$$

where ${h}_{\text{LS}}^{\text{opt}}({t}_{0})$ is the optimal variable bandwidth minimizing the asymptotic MSE and ${h}_{\text{LS}}^{\text{opt}}$ is the optimal bandwidth minimizing the asymptotic MISE. Therefore, we have

$${h}^{\text{opt}}({t}_{0})={R}_{1}{(q)}^{1/5}{h}_{\text{LS}}^{\text{opt}}({t}_{0}),\phantom{\rule{thinmathspace}{0ex}}{h}^{\text{opt}}={R}_{1}{(q)}^{1/5}{h}_{\text{LS}}^{\text{opt}}.$$

(2.8)

We use MSE_{opt} and MISE_{opt} to denote the MSE and MISE evaluated at their optimal bandwidth. Then by straightforward calculations we see that as *n* approaches ∞,

$$\frac{{\text{MSE}}_{\text{opt}}\{{\widehat{m}}_{\text{LS}}({t}_{0})\}}{{\text{MSE}}_{\text{opt}}\{\widehat{m}({t}_{0})\}}\to {R}_{1}{(q)}^{-4/5},\phantom{\rule{thinmathspace}{0ex}}\frac{{\text{MISE}}_{\text{opt}}\{{\widehat{m}}_{\text{LS}}\}}{{\text{MISE}}_{\text{opt}}\{\widehat{m}\}}\to {R}_{1}{(q)}^{-4/5}.$$

Thus, it is natural to define $\text{ARE}(\widehat{m},{\widehat{m}}_{\text{LS}})$, the asymptotic relative efficiency (ARE) of the local linear CQR estimator with respect to the local linear least squares estimator, as follows:

$$\text{ARE}(\widehat{m},{\widehat{m}}_{\text{LS}})={R}_{1}{(q)}^{-4/5}.$$

(2.9)

The ARE only depends on the error distribution, although the dependence could be rather complex. However, for many commonly seen error distributions, we can compute the value of the ARE directly. Table 1 displays $\text{ARE}(\widehat{m},{\widehat{m}}_{\text{LS}})$ for some commonly seen error distributions.

Several interesting observations can be made from Table 1. Firstly, when the error distribution is *N*(0, 1), for which the local linear least squares estimator is expected to have the best performance, $\text{ARE}(\widehat{m},{\widehat{m}}_{\text{LS}})$ is very close to 1 regardless of the choice of *q* in the local linear CQR estimator. When *q* = 5 the local linear CQR loses at most 7% efficiency, and it performs as well as the local linear least squares estimator when *q* = 99. Secondly, for all the other non-normal distributions listed in Table 1, the local linear CQR estimator can have higher efficiency than the local linear least squares estimator when a small *q* is used. The mixture of two normals is often used to model so-called contaminated data. For such distributions, $\text{ARE}(\widehat{m},{\widehat{m}}_{\text{LS}})$ can be as large as 4.9 or even more. Table 1 also indicates that, except for the Laplace error, the local CQR with *q* = 5 or *q* = 9 is significantly better than that with *q* = 1, which reduces to the local LAD for these distributions. Finally, we observe that the ARE values for a variety of distributions are very close to 1 when *q* is large (*q* = 99). It turns out that this phenomenon is true in general, as demonstrated in the following theorem.

**Theorem 2.2**. *We have* $\underset{q\to \infty}{\text{lim}}{R}_{1}(q)=1$, *and thus* $\underset{q\to \infty}{\text{lim}}\mathit{\text{ARE}}(\widehat{m},{\widehat{m}}_{LS})=1$.

Theorem 2.2 provides insight into the asymptotic behavior of the local linear CQR estimator and implies that it is a safe competitor against the local linear least squares estimator, for it does not lose efficiency when a large *q* is used. On the other hand, substantial gain in efficiency can be achieved by using a relatively small *q* such as *q* = 9, as shown in Table 1.

In many situations we are interested in estimating the derivative of *m*(*t*). The local linear CQR also provides an estimator $\widehat{m}'({t}_{0})$ of the derivative of *m*(*t*). The asymptotic bias and variance of the estimate $\widehat{m}'({t}_{0})$ in (2.3) are given in (5.8) and (5.9) in section 5. The local linear CQR estimator and the local linear regression estimator have the same leading bias term, which depends on the intrinsic part $m'''({t}_{0})$ and the extra part $m''({t}_{0}){f'}_{T}({t}_{0})/{f}_{T}({t}_{0})$. Chu & Marron (1991) and Fan (1992) argued that this bias can be very large in many situations, so $\widehat{m}'({t}_{0})$ may not be an ideal estimator because of its relatively large bias. Local quadratic regression is often preferred for estimating the derivative function, since it reduces the estimation bias without increasing the estimation variance (Fan & Gijbels 1992). We show here that the same phenomenon holds in local CQR smoothing.

We consider the local quadratic approximation of *m*(*t*) in the neighborhood of *t*_{0}: $m(t)\approx m({t}_{0})+m'({t}_{0})(t-{t}_{0})+\frac{1}{2}m''({t}_{0}){(t-{t}_{0})}^{2}$. Let $\mathbf{a}=({a}_{1},\dots ,{a}_{q})$ and $\mathbf{b}=({b}_{1},{b}_{2})$. The local quadratic CQR estimator is obtained via

$$(\widehat{\mathbf{a}},\widehat{\mathbf{b}})=\underset{\mathbf{a},\mathbf{b}}{\text{arg min}}\sum _{i=1}^{n}\left[\sum _{k=1}^{q}{\rho}_{{\tau}_{k}}\left({y}_{i}-{a}_{k}-{b}_{1}({t}_{i}-{t}_{0})-\frac{1}{2}{b}_{2}{({t}_{i}-{t}_{0})}^{2}\right)K\left(\frac{{t}_{i}-{t}_{0}}{h}\right)\right].$$

(3.1)

Then the local quadratic CQR estimator for *m′*(*t _{0}*) is given by

$$\widehat{m}\prime ({t}_{0})={\widehat{b}}_{1}.$$

(3.2)

Denote

$${R}_{2}(q)=\left({\displaystyle \sum _{k=1}^{q}{\displaystyle \sum _{k\prime =1}^{q}{\tau}_{kk\prime}}}\right)/{\left({\displaystyle \sum _{k=1}^{q}f({c}_{k})}\right)}^{2}.$$

(3.3)
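Like *R*_{1}(*q*), the factor *R*_{2}(*q*) depends only on the error distribution and is easy to evaluate numerically. The sketch below (illustrative Python, not the authors' MATLAB code) computes *R*_{2}(*q*) and the resulting relative efficiency *R*_{2}(*q*)^{−4/7} for any `scipy.stats` distribution:

```python
# Numerically evaluate R_2(q) from (3.3), with tau_k = k/(q+1),
# c_k = F^{-1}(tau_k), tau_kk' = min(tau_k, tau_k') - tau_k * tau_k'.
# Standard normal errors by default; any distribution with .ppf/.pdf works.
import numpy as np
from scipy import stats

def R2(q, dist=stats.norm):
    taus = np.arange(1, q + 1) / (q + 1)
    c = dist.ppf(taus)
    tau_kk = np.minimum.outer(taus, taus) - np.outer(taus, taus)
    return tau_kk.sum() / dist.pdf(c).sum() ** 2

def ARE_deriv(q, dist=stats.norm):
    # Asymptotic relative efficiency for derivative estimation, defined
    # below as R_2(q)^{-4/7}.
    return R2(q, dist) ** (-4 / 7)
```

For *q* = 1 this reduces to the local LAD factor 1/{4*f*(*c*_{1})²}, which equals π/2 for the standard normal.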

The asymptotic bias, variance and normality are given in the following theorem.

**Theorem 3.1**. *Suppose that t*_{0} *is an interior point of the support of f _{T}*(·). *Under the regularity conditions (A)–(D) in section 5, if h → 0 and nh*^{3} *→ ∞, then the asymptotic conditional bias and variance of the local quadratic CQR estimator* $\widehat{m}'({t}_{0})$ *are given by*

$$\mathit{\text{Bias}}(\widehat{m}'({t}_{0})|\mathbf{T})=\frac{1}{6}m'''({t}_{0})\frac{{\mu}_{4}}{{\mu}_{2}}{h}^{2}+{o}_{p}({h}^{2}),$$

(3.4)

$$\mathit{\text{Var}}(\widehat{m}\prime ({t}_{0})|\mathbf{T})=\frac{1}{n{h}^{3}}\frac{{\nu}_{2}{\sigma}^{2}({t}_{0})}{{\mu}_{2}^{2}{f}_{T}({t}_{0})}{R}_{2}(q)+{o}_{p}(\frac{1}{n{h}^{3}}).$$

(3.5)

*Furthermore, conditioning on* **T**, *we have the following asymptotic normal distribution*

$$\sqrt{n{h}^{3}}\left(\widehat{m}'({t}_{0})-m'({t}_{0})-\frac{1}{6}m'''({t}_{0})\frac{{\mu}_{4}}{{\mu}_{2}}{h}^{2}\right)\stackrel{\mathcal{L}}{\to}N\left(0,\frac{{\nu}_{2}{\sigma}^{2}({t}_{0})}{{\mu}_{2}^{2}{f}_{T}({t}_{0})}{R}_{2}(q)\right).$$

(3.6)

In Theorem 3.1 the symmetric-error-distribution assumption is used to obtain the asymptotic bias formula. Without that assumption, the asymptotic variance remains the same and the asymptotic normality still holds with a minor modification. It is also interesting to point out that when the variance function is constant (homoscedastic errors), the symmetric-error-distribution assumption is no longer needed for Theorem 3.1.

Comparing (5.8) and (3.4), we see that the extra part $m''({t}_{0}){f'}_{T}({t}_{0})/{f}_{T}({t}_{0})$ is removed in the local quadratic CQR estimator. Comparing the local quadratic CQR and the local quadratic least squares estimators for *m′*(*t*_{0}), we see that they have the same leading bias term, while their asymptotic variances are different.

From Theorem 3.1, the mean squared error of the local quadratic CQR estimator $\widehat{m}'({t}_{0})$ is given by

$$\text{MSE}\{\widehat{m}'({t}_{0})\}={\left(\frac{1}{6}m'''({t}_{0})\frac{{\mu}_{4}}{{\mu}_{2}}\right)}^{2}{h}^{4}+\frac{1}{n{h}^{3}}\frac{{\nu}_{2}{\sigma}^{2}({t}_{0})}{{\mu}_{2}^{2}{f}_{T}({t}_{0})}{R}_{2}(q)+{o}_{p}\left({h}^{4}+\frac{1}{n{h}^{3}}\right).$$

Thus, the optimal variable bandwidth minimizing $\text{MSE}\{\widehat{m}'({t}_{0})\}$ is

$${h}^{\text{opt}}({t}_{0})={\{{R}_{2}(q)\}}^{1/7}{\left(\frac{27{\nu}_{2}{\sigma}^{2}({t}_{0})}{{f}_{T}({t}_{0}){\{m'''({t}_{0}){\mu}_{4}\}}^{2}}\right)}^{1/7}{n}^{-1/7}.$$

Furthermore, we consider the mean integrated squared error $\text{MISE}(\widehat{m}')=\int \text{MSE}\{\widehat{m}'(t)\}w(t)\,dt$ with a weight function *w*(*t*). The optimal constant bandwidth minimizing the mean integrated squared error is given by

$${h}^{\text{opt}}={\{{R}_{2}(q)\}}^{1/7}{\left(\frac{27{\nu}_{2}\int {\sigma}^{2}(t){f}_{T}^{-1}(t)w(t)\,dt}{{\mu}_{4}^{2}\int {\{m'''(t)\}}^{2}w(t)\,dt}\right)}^{1/7}{n}^{-1/7}.$$

The above calculations indicate that the local quadratic CQR estimator enjoys the optimal rate of convergence *n*^{−2/7}.

In what follows we study the asymptotic relative efficiency of the local quadratic CQR estimator with respect to the local quadratic least squares estimator. Note that the mean squared error of local quadratic least squares estimator ${\widehat{m}}_{\text{LS}}^{\prime}({t}_{0})$ is given by

$$\text{MSE}\{{\widehat{m}}_{\text{LS}}^{\prime}({t}_{0})\}={\left(\frac{1}{6}m'''({t}_{0})\frac{{\mu}_{4}}{{\mu}_{2}}\right)}^{2}{h}^{4}+\frac{1}{n{h}^{3}}\frac{{\nu}_{2}{\sigma}^{2}({t}_{0})}{{\mu}_{2}^{2}{f}_{T}({t}_{0})}+{o}_{p}\left({h}^{4}+\frac{1}{n{h}^{3}}\right),$$

and the mean integrated squared error (MISE) is $\text{MISE}({\widehat{m}}_{\text{LS}}^{\prime})={\displaystyle \int \text{MSE}}\{{\widehat{m}}_{\text{LS}}^{\prime}(t)\}w(t)dt$ with a weight function *w*(*t*). Thus, by straightforward calculations, we notice that

$${h}^{\text{opt}}({t}_{0})={h}_{\text{LS}}^{\text{opt}}({t}_{0}){R}_{2}{(q)}^{1/7},\phantom{\rule{thinmathspace}{0ex}}{h}^{\text{opt}}={h}_{\text{LS}}^{\text{opt}}{R}_{2}{(q)}^{1/7},$$

(3.7)

where ${h}_{\text{LS}}^{\text{opt}}({t}_{0})$ and ${h}_{\text{LS}}^{\text{opt}}$ are the corresponding optimal bandwidths of local quadratic least squares estimator. With the optimal bandwidths, we have

$$\frac{{\text{MSE}}_{\text{opt}}\{{\widehat{m}}_{\text{LS}}^{\prime}({t}_{0})\}}{{\text{MSE}}_{\text{opt}}\{\widehat{m}\prime ({t}_{0})\}}\to {R}_{2}{(q)}^{-4/7},\phantom{\rule{thinmathspace}{0ex}}\frac{{\text{MISE}}_{\text{opt}}({\widehat{m}}_{\text{LS}}^{\prime})}{{\text{MISE}}_{\text{opt}}(\widehat{m}\prime )}\to {R}_{2}{(q)}^{-4/7}.$$

Therefore, the asymptotic relative efficiency (ARE) of the local quadratic CQR estimator $(\widehat{m}')$ with respect to the local quadratic least squares estimator $({\widehat{m}}_{\text{LS}}^{\prime})$ is defined to be

$$\text{ARE}(\widehat{m}\prime ,{\widehat{m}}_{\text{LS}}^{\prime})={R}_{2}{(q)}^{-4/7}.$$

(3.8)

The ARE only depends on the error distribution and it is scale invariant.

To gain insights into the asymptotic relative efficiency, we consider the limit when *q* is large. Zou & Yuan (2008) showed that

$$\underset{q\to \infty}{\text{lim}}{R}_{2}{(q)}^{-1}>\frac{6}{e\pi}\approx 0.7026.$$

Immediately, we know that with a large *q* the ARE is bounded below by 0.7026^{4/7} ≈ 0.8173. Having a universal lower bound is very useful because it guards against severe loss in efficiency when replacing the local quadratic least squares estimator with the local quadratic CQR estimator. One of our contributions in this work is to provide a sharper lower bound, as shown in the following theorem.

**Theorem 3.2**. *Let* $\mathcal{F}$ *denote the class of error distributions with mean 0 and variance 1. Then we have*

$$\underset{f\in \mathcal{F}}{\text{inf}}\underset{q\to \infty}{\text{lim}}{R}_{2}{(q)}^{-1}=0.864.$$

(3.9)

*The lower bound is reached if and only if the error follows the rescaled Beta(2,2) distribution with mean zero and variance one. Thus*

$$\underset{q\to \infty}{\text{lim}}ARE(\widehat{m}\prime ,{\widehat{m}}_{\text{LS}}^{\prime})\ge 0.9199.$$

(3.10)

It is interesting to note that Theorem 3.2 provides the *exact* lower bound of $\text{ARE}(\widehat{m}\prime ,{\widehat{m}}_{\text{LS}}^{\prime})$ as *q* → ∞. Theorem 3.2 indicates that if *q* is large, even in the worst case scenario the potential efficiency loss of the local CQR estimator is only 8.01%.

Theorem 3.2 implies that the local quadratic CQR estimator is a safe alternative to the local quadratic least squares estimator. It concerns the worst case scenario. There are many optimistic scenarios as well in which the ARE can be much bigger than 1. We examine the $\text{ARE}(\widehat{m}\prime ,{\widehat{m}}_{\text{LS}}^{\prime})$ for the error distributions considered in Table 1. We also list the results in Table 2, where the column labeled *q* = ∞ shows the theoretical limit of the $\text{ARE}(\widehat{m}\prime ,{\widehat{m}}_{\text{LS}}^{\prime})$. Obviously, these limits are all larger than the lower bound 0.9199. The local quadratic CQR estimator only loses less than 4% efficiency when the error distribution is normal and *q* = 9. It is interesting to see that for the other non-normal distributions the $\text{ARE}(\widehat{m}\prime ,{\widehat{m}}_{\text{LS}}^{\prime})$ is larger than 1 and its value is insensitive to the choice of *q*. For example, with *q* = 9, the AREs are already very close to their theoretical limits.

In this section, we first use Monte Carlo simulation studies to assess the finite sample performance of the proposed estimation procedures and then demonstrate the application of the proposed method using a real data example. Throughout this section we use the Epanechnikov kernel, i.e., $K(z)=\frac{3}{4}{(1-{z}^{2})}_{+}$. We adopt the MM algorithm proposed by Hunter & Lange (2000) for computing the local CQR smoothing estimator. All the numerical results are computed using our MATLAB code, which is available upon request.
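The authors' implementation is in MATLAB. To illustrate how a majorize-minimize step can reduce the nonsmooth CQR loss to weighted least squares, here is a Python sketch of one IRLS-style scheme in the spirit of Hunter & Lange (2000); the perturbation `eps`, the blockwise update order, and the function name are our own illustrative choices, not the authors' algorithm.

```python
# An IRLS-flavored sketch for the local linear CQR fit. It uses
# 2*rho_tau(r) = |r| + (2*tau - 1)*r and majorizes |r| at the current
# residual r0 by r^2 / (2(|r0| + eps)) + const, so each iteration solves
# weighted least squares updates for the intercepts a_k and the slope b.
# Assumes the kernel window around t0 contains data points.
import numpy as np

def local_linear_cqr_mm(t, y, t0, h, q=9, eps=1e-4, n_iter=200):
    taus = np.arange(1, q + 1) / (q + 1)
    x = t - t0
    Kw = 0.75 * np.maximum(1 - (x / h) ** 2, 0)   # Epanechnikov kernel weights
    a = np.full(q, np.median(y))                  # intercepts a_1, ..., a_q
    b = 0.0                                       # shared slope
    for _ in range(n_iter):
        resid = y[None, :] - a[:, None] - b * x[None, :]        # (q, n)
        d = Kw[None, :] / (np.abs(resid) + eps)                 # quadratic weights
        lin = Kw[None, :] * (2 * taus[:, None] - 1)             # linear-term weights
        # Stationarity of the quadratic surrogate in a_k, then in b:
        a = (np.sum(d * (y[None, :] - b * x[None, :]), axis=1)
             + lin.sum(axis=1)) / d.sum(axis=1)
        b = (np.sum(d * (y[None, :] - a[:, None]) * x[None, :])
             + np.sum(lin * x[None, :])) / np.sum(d * (x ** 2)[None, :])
    return a.mean(), b    # estimates of m(t0) and m'(t0)
```

Each update decreases the surrogate, which majorizes the (perturbed) CQR objective, so the iteration is monotone; for q = 1 it reduces to the familiar IRLS scheme for local LAD.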

Bandwidth selection is an important issue in local smoothing. Here we briefly discuss bandwidth selection for the local CQR smoothing estimator by making use of existing bandwidth selectors for local polynomial regression. We consider two bandwidth selectors.

- *The “pilot” selector.* The idea is to use a pilot bandwidth in local cubic CQR (defined in section 5) to estimate *m″*(*t*) and *m*(*t*). The fitted residuals can be used to estimate *R*_{1}(*q*) and *R*_{2}(*q*). Thus, we can use the optimal bandwidth formula to estimate the optimal bandwidth and then refit the data.
- *A short-cut strategy.* In our numerical studies, we compare the local CQR and local least squares estimators. Note that in (2.8) and (3.7) we obtain very neat relationships between the optimal bandwidths for the local CQR and local least squares estimators. The optimal bandwidth for the local least squares estimator can be selected by existing bandwidth selectors (see chapter 4 of Fan & Gijbels (1996)). In addition, we are able to infer the factors *R*_{1}(*q*) and *R*_{2}(*q*) from the residuals of the local least squares fit. Sometimes, we even know the exact values of the two factors (e.g., in simulations). Therefore, after fitting the local least squares estimator with the optimal bandwidth, we can estimate the optimal bandwidth for the local CQR estimator.

We used the short-cut strategy in our simulation examples. However, if the error variance is infinite or very large, then the local least squares estimator performs poorly. The “pilot” selector is a better choice than the short-cut strategy.
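The short-cut strategy can be sketched in a few lines: estimate *c _{k}* and *f*(*c _{k}*) from the least squares residuals (here with scipy's `gaussian_kde`, an illustrative choice of density estimator) and rescale *h*_{LS} by the estimated *R*_{1}(*q*)^{1/5} as in relation (2.8).

```python
# A sketch of the short-cut bandwidth strategy for regression estimation:
# given h_LS from any existing selector and the local least squares
# residuals, estimate R_1(q) via a kernel density estimate of the error
# and rescale h_LS by R1_hat^{1/5} as in (2.8).
import numpy as np
from scipy.stats import gaussian_kde

def shortcut_bandwidth(h_ls, residuals, q=9):
    taus = np.arange(1, q + 1) / (q + 1)
    c = np.quantile(residuals, taus)          # estimated c_k = F^{-1}(tau_k)
    f_c = gaussian_kde(residuals)(c)          # estimated f(c_k)
    tau_kk = np.minimum.outer(taus, taus) - np.outer(taus, taus)
    R1_hat = np.sum(tau_kk / np.outer(f_c, f_c)) / q**2
    return R1_hat ** (1 / 5) * h_ls
```

The analogous rescaling with *R*_{2}(*q*)^{1/7}, as in (3.7), applies for derivative estimation.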

In our simulation studies, we compare the performance of the newly proposed method with the local polynomial least squares estimate. The bandwidth is set to the optimal one, in which ${h}_{\text{LS}}^{\text{opt}}$ is selected by a plug-in bandwidth selector (Ruppert, Sheather & Wand 1995). The performance of the estimators $\widehat{m}(\cdot)$ and $\widehat{m}'(\cdot)$ is assessed via the average squared error (ASE), defined by $\text{ASE}(\widehat{g})=\frac{1}{{n}_{\text{grid}}}{\sum}_{k=1}^{{n}_{\text{grid}}}{\{\widehat{g}({u}_{k})-g({u}_{k})\}}^{2}$, where *g* is either *m*(·) or *m′*(·) and $\{{u}_{k},k=1,\dots ,{n}_{\text{grid}}\}$ are the grid points at which the estimate is evaluated. We report the relative ASE, $\text{RASE}(\widehat{g})=\text{ASE}({\widehat{g}}_{\text{LS}})/\text{ASE}(\widehat{g})$, so that a value greater than one indicates a gain in efficiency over the least squares estimator.
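The ASE criterion is straightforward to compute; a trivial but self-contained sketch (the grid is whatever points the estimates were evaluated on, and the RASE convention of putting the least squares ASE in the numerator is stated above):

```python
# Average squared error of an estimate over a fixed grid, and the ratio
# used for comparing two estimators on the same grid.
import numpy as np

def ase(g_hat_vals, g_vals):
    g_hat_vals, g_vals = np.asarray(g_hat_vals), np.asarray(g_vals)
    return np.mean((g_hat_vals - g_vals) ** 2)

def rase(ase_ls, ase_candidate):
    # > 1 means the candidate estimator beats the least squares estimator
    return ase_ls / ase_candidate
```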

We generated 400 data sets, each consisting of *n* = 200 observations, from the model

$$Y=\text{sin}(2T)+2\,\text{exp}(-16{T}^{2})+0.5\epsilon,$$

(4.1)

where *T* follows *N*(0, 1). This model is adopted from Fan & Gijbels (1992). In our simulation, we considered five error distributions for ϵ: *N*(0, 1), Laplace, the *t*_{3} distribution, and mixtures of two normals 0.95*N*(0, 1) + 0.05*N*(0, σ^{2}) with σ = 3 and σ = 10. For the local polynomial CQR estimator, we consider *q* = 5, 9 and 19, and estimate *m*(·) and *m′*(·) over [−1.5, 1.5]. The mean and standard deviation of the RASE over 400 simulations are summarized in Table 3. To see how the proposed estimate behaves at a typical point, Table 3 also reports the biases and standard deviations of $\widehat{m}(t)$ and $\widehat{m}'(t)$ at *t* = 0.75. In Table 3, CQR_{5}, CQR_{9} and CQR_{19} correspond to the local CQR estimate with *q* = 5, 9 and 19, respectively.
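For concreteness, one simulated data set from model (4.1) under any of the error distributions above can be generated as follows (an illustrative sketch; the function name and the string labels are our own):

```python
# Generate one data set from model (4.1): Y = sin(2T) + 2 exp(-16 T^2) + 0.5 eps,
# with T ~ N(0, 1) and the error distributions considered in the simulation.
import numpy as np

def simulate_4_1(n=200, error="normal", seed=None):
    rng = np.random.default_rng(seed)
    t = rng.standard_normal(n)                    # T ~ N(0, 1)
    if error == "normal":
        eps = rng.standard_normal(n)
    elif error == "laplace":
        eps = rng.laplace(scale=1.0, size=n)
    elif error == "t3":
        eps = rng.standard_t(3, size=n)
    elif error == "mixture3":                     # 0.95 N(0,1) + 0.05 N(0, 3^2)
        keep = rng.random(n) < 0.95
        eps = np.where(keep, rng.standard_normal(n), 3.0 * rng.standard_normal(n))
    elif error == "mixture10":                    # 0.95 N(0,1) + 0.05 N(0, 10^2)
        keep = rng.random(n) < 0.95
        eps = np.where(keep, rng.standard_normal(n), 10.0 * rng.standard_normal(n))
    else:
        raise ValueError(error)
    y = np.sin(2 * t) + 2 * np.exp(-16 * t**2) + 0.5 * eps
    return t, y
```

Looping this over 400 seeds and fitting both estimators over a grid on [−1.5, 1.5] reproduces the design of the Monte Carlo study.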

It is of interest to investigate the effect of heteroscedastic errors. To this end, we generated 400 simulation data sets, each consisting of *n* = 200 observations, from

$$Y=T\,\text{sin}(2\pi T)+\sigma (T)\epsilon,$$

(4.2)

where *T* follows *U*(0, 1), σ(*t*) = {2 + cos(2*πt*)}/10, and ϵ is the same as in model (4.1). In this example, we estimate *m*(*t*) and *m′*(*t*) over [0, 1]. The mean and standard deviation of the RASE over 400 simulations are summarized in Table 4, in which we also show the biases and standard deviations of $\widehat{m}(t)$ and $\widehat{m}'(t)$ at *t* = 0.4. The notation in Table 4 is the same as that in Table 3.

Table 3 and Table 4 convey very similar messages, although Table 4 indicates that the local CQR has larger gains over the local least squares method. When the error follows the normal distribution, the RASEs of the local CQR estimators are slightly less than one. For non-normal distributions, the RASEs of the local CQR estimators can be greater than one, indicating a gain in efficiency. For estimating the regression function, CQR_{5} and CQR_{9} seem to have better overall performance than CQR_{19}. For estimating the derivative, all three CQR estimators perform very similarly. These findings are consistent with the theoretical analysis of the AREs.

As an illustration, we now apply the proposed local CQR methodology to the high net-income subset of the U.K. Family Expenditure Survey data, which consists of 363 observations. The scatter plot of the data is depicted in the left panel of Figure 1. The data were collected in the U.K. Family Expenditure Survey in 1973. Of interest is the relationship between food expenditure and net income. Thus, we take the response variable *Y* to be the logarithm of the food expenditure, and the predictor variable *T* to be the net income.

Figure 1: the left panel is the scatter plot of the data, the middle panel is the estimated regression function, and the right panel is the estimated derivative function.

We first estimated the regression function using the local least squares estimator with the plug-in bandwidth selector (Ruppert et al. 1995). We then employed a kernel density estimate to infer the error density *f*(·) from the residuals of the local least squares fit. Based on the estimated density, we estimated both *R*_{1}(*q*) and *R*_{2}(*q*), which were used to compute the bandwidth for the CQR estimator. For this example, the estimated ratios are close to 1, so essentially the same bandwidths are used for the two methods. The selected bandwidths are 0.24 for regression estimation and 0.4 for derivative estimation. We evaluated the CQR estimates with *q* = 5, 9 and 19 using the selected bandwidths. Since the CQR estimates with the three different *q*'s are very similar, we only present the CQR estimate with *q* = 9 in Figure 1.

It is interesting to see from Figure 1 that the overall patterns of the local least squares and the local CQR estimates are the same. The difference between the two estimates of the regression function becomes large when the net income is around 2.8. From the scatter plot, there are two possible outliers: (2.7902, −2.5207) and (2.8063, −2.6105) (circled in the plot). To understand their impact, we re-evaluated the local CQR and the local least squares estimates after excluding these two observations. The resulting estimates are depicted in the top panel of Figure 2, from which we can see that the local CQR estimate remains almost the same, while the local least squares estimate changes considerably. We also note that, after removing these two possible outliers, the local least squares estimate becomes very close to the local CQR estimate. Furthermore, as a more extreme demonstration, we kept these two observations in the data set but moved them to more extreme positions, i.e., we moved (2.7902, −2.5207) and (2.8063, −2.6105) to (2.7902, −26.5207) and (2.8063, −6.6105), respectively. After distorting the two observations, we re-calculated the local CQR and the local least squares estimates. The resulting estimates are depicted in the bottom panel of Figure 2, which clearly demonstrates that the local least squares estimate changes dramatically, while the local CQR estimate is nearly unaffected by the artificial data distortion.
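The robustness pattern above can be reproduced in a toy computation: a kernel-weighted local average (the mean-based fit) is dragged by a single distorted observation, while a kernel-weighted local median (the simplest member of the quantile family) barely moves. A sketch with synthetic data, where all choices (kernel, bandwidth, distorted value) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
t = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal(n)

t0, h = 0.5, 0.1
w = np.exp(-0.5 * ((t - t0) / h) ** 2)     # Gaussian kernel weights at t0

def wmean(y, w):
    return np.sum(w * y) / np.sum(w)

def wmedian(y, w):
    idx = np.argsort(y)
    cw = np.cumsum(w[idx])
    return y[idx][np.searchsorted(cw, 0.5 * cw[-1])]

base_mean, base_med = wmean(y, w), wmedian(y, w)

# distort the observation closest to t0, as with the moved outliers above
j = int(np.argmin(np.abs(t - t0)))
y_bad = y.copy()
y_bad[j] = -25.0

shift_mean = abs(wmean(y_bad, w) - base_mean)  # large: mean chases the outlier
shift_med = abs(wmedian(y_bad, w) - base_med)  # small: median barely moves
```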

In this section we establish the asymptotic theory of the local *p*-polynomial CQR estimators; Theorems 2.1 and 3.1 then follow as two special cases of the general theory. As a generalization of the local linear and local quadratic CQR estimators, the local *p*-polynomial CQR estimator is constructed by minimizing

$$\sum _{k=1}^{q}\left[{\displaystyle \sum _{i=1}^{n}{\rho}_{{\tau}_{k}}\left\{{y}_{i}-{a}_{k}-{\displaystyle \sum _{j=1}^{p}{b}_{j}{({t}_{i}-{t}_{0})}^{j}}\right\}K(\frac{{t}_{i}-{t}_{0}}{h})}\right],$$

(5.1)

and the local *p*-polynomial CQR estimators of *m*(*t*_{0}) and *m*^{(r)}(*t*_{0}) are given by

$$\widehat{m}({t}_{0})=\frac{1}{q}{\displaystyle \sum _{k=1}^{q}{\widehat{a}}_{k}},\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}{\widehat{m}}^{(r)}({t}_{0})=r!{\widehat{b}}_{r},\phantom{\rule{thinmathspace}{0ex}}r=1,\cdots ,p.$$

(5.2)

For the asymptotic analysis, we need the following regularity conditions:

- (A) *m*(*t*) has a continuous (*p* + 2)th derivative in a neighborhood of *t*_{0}.
- (B) *f*_{T}(·), the marginal density function of *T*, is differentiable and positive in a neighborhood of *t*_{0}.
- (C) The conditional variance σ^{2}(*t*) is continuous in a neighborhood of *t*_{0}.
- (D) The error ϵ has a symmetric distribution with a positive density *f*(·).

We choose the kernel function *K* to be a symmetric density function with finite support [−*M*, *M*]. The following notation is needed to present the asymptotic properties of the local *p*-polynomial CQR estimator. Let *S*_{11} be a *q* × *q* diagonal matrix with diagonal elements *f*(*c*_{k}); let *S*_{12} be the *q* × *p* matrix with (*k*, *j*)-element $f({c}_{k}){\mu}_{j}$, ${S}_{21}={S}_{12}^{T}$, and let *S*_{22} be the *p* × *p* matrix with (*j*, *j*′)-element ${\mu}_{j+j\prime}{\sum}_{k=1}^{q}f({c}_{k})$. Similarly, let Σ_{11} be the *q* × *q* matrix with (*k*, *k*′)-element ${\nu}_{0}{\tau}_{kk\prime}$, Σ_{12} the *q* × *p* matrix with (*k*, *j*)-element ${\nu}_{j}{\sum}_{k\prime =1}^{q}{\tau}_{kk\prime}$, ${\mathrm{\Sigma}}_{21}={\mathrm{\Sigma}}_{12}^{T}$, and Σ_{22} the *p* × *p* matrix with (*j*, *j*′)-element ${\nu}_{j+j\prime}{\sum}_{k=1}^{q}{\sum}_{k\prime =1}^{q}{\tau}_{kk\prime}$, where ${\tau}_{kk\prime}={\tau}_{k}\wedge {\tau}_{k\prime}-{\tau}_{k}{\tau}_{k\prime}$. Write

$$S=\left(\begin{array}{cc}{S}_{11}& {S}_{12}\\ {S}_{21}& {S}_{22}\end{array}\right),\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\mathrm{\Sigma}=\left(\begin{array}{cc}{\mathrm{\Sigma}}_{11}& {\mathrm{\Sigma}}_{12}\\ {\mathrm{\Sigma}}_{21}& {\mathrm{\Sigma}}_{22}\end{array}\right).$$

Partition *S*^{−1} into four submatrices as follows

$${S}^{-1}={\left(\begin{array}{cc}{S}_{11}& {S}_{12}\\ {S}_{21}& {S}_{22}\end{array}\right)}^{-1}=\left(\begin{array}{cc}{({S}^{-1})}_{11}& {({S}^{-1})}_{12}\\ {({S}^{-1})}_{21}& {({S}^{-1})}_{22}\end{array}\right),$$

where and hereafter, we use (·)_{11} to denote the left-top *q* × *q* submatrix and use (·)_{22} to denote the right-bottom *p* × *p* submatrix.

Furthermore, let ${u}_{k}=\sqrt{nh}\{{a}_{k}-m({t}_{0})-\sigma ({t}_{0}){c}_{k}\}$ and ${v}_{j}={h}^{j}\sqrt{nh}\{j!{b}_{j}-{m}^{(j)}({t}_{0})\}/j!$. Let ${x}_{i}=({t}_{i}-{t}_{0})/h$ and ${K}_{i}=K({x}_{i})$, and write ${r}_{i,p}=m({t}_{i})-{\sum}_{j=0}^{p}{m}^{(j)}({t}_{0}){({t}_{i}-{t}_{0})}^{j}/j!$, ${d}_{i,k}={r}_{i,p}+{c}_{k}\{\sigma ({t}_{i})-\sigma ({t}_{0})\}$ and ${\mathrm{\Delta}}_{i,k}=({u}_{k}+{\sum}_{j=1}^{p}{v}_{j}{x}_{i}^{j})/\sqrt{nh}$. Set ${\eta}_{i,k}^{*}=I({\u03f5}_{i}\le {c}_{k}-{d}_{i,k}/\sigma ({t}_{i}))-{\tau}_{k}$ and let ${W}_{n}^{*}={({W}_{1n}^{*T},{W}_{2n}^{*T})}^{T}$, where ${W}_{1n}^{*}$ has elements ${w}_{1k}^{*}=\frac{1}{\sqrt{nh}}{\sum}_{i=1}^{n}{K}_{i}{\eta}_{i,k}^{*}$, $k=1,\cdots ,q$, and ${W}_{2n}^{*}$ has elements ${w}_{2j}^{*}=\frac{1}{\sqrt{nh}}{\sum}_{k=1}^{q}{\sum}_{i=1}^{n}{K}_{i}{x}_{i}^{j}{\eta}_{i,k}^{*}$, $j=1,\cdots ,p$.

The asymptotic properties of the local *p*-polynomial CQR estimator are based on the following theorem.

**Theorem 5.1***. Denote* ${\widehat{\theta}}_{n}={({\widehat{u}}_{1},\cdots ,{\widehat{u}}_{q},{\widehat{v}}_{1},\cdots ,{\widehat{v}}_{p})}^{T}$. *Under Conditions (A)–(D), we have*

$${\widehat{\theta}}_{n}+\frac{\sigma ({t}_{0})}{{f}_{T}({t}_{0})}{S}^{-1}E({W}_{n}^{*}|\mathbf{T})\stackrel{\mathcal{L}}{\to}MVN(0,\frac{{\sigma}^{2}({t}_{0})}{{f}_{T}({t}_{0})}{S}^{-1}\mathrm{\Sigma}{S}^{-1}).$$

To prove Theorem 5.1, we first establish Lemmas 5.2 and 5.3.

**Lemma 5.2***. Minimizing (5.1) is equivalent to minimizing*

$$\sum _{k=1}^{q}{u}_{k}\left({\displaystyle \sum _{i=1}^{n}\frac{{K}_{i}{\eta}_{i,k}^{*}}{\sqrt{nh}}}\right)+{\displaystyle \sum _{j=1}^{p}{v}_{j}}\left({\displaystyle \sum _{k=1}^{q}{\displaystyle \sum _{i=1}^{n}\frac{{K}_{i}{x}_{i}^{j}{\eta}_{i,k}^{*}}{\sqrt{nh}}}}\right)+{\displaystyle \sum _{k=1}^{q}{B}_{n,k}(\theta )}$$

*with respect to* θ = (*u*_{1}, ⋯, *u*_{q}, *v*_{1}, ⋯, *v*_{p})^{T}, *where*

$${B}_{n,k}(\theta )={\displaystyle \sum _{i=1}^{n}\{{K}_{i}{\displaystyle {\int}_{0}^{{\mathrm{\Delta}}_{i,k}}\left[I({\u03f5}_{i}\le {c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})}+\frac{z}{\sigma ({t}_{i})})-I({\u03f5}_{i}\le {c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})})\right]\phantom{\rule{thinmathspace}{0ex}}dz}\}}.$$

*Proof*. To apply the identity (Knight 1998)

$${\rho}_{\tau}(x-y)-{\rho}_{\tau}(x)=y(I(x\le 0)-\tau )+{\displaystyle {\int}_{0}^{y}\{I(x\le z)-I(x\le 0)\}dz},$$

(5.3)

we write ${y}_{i}-{a}_{k}-{\displaystyle {\sum}_{j=1}^{p}{b}_{j}{({t}_{i}-{t}_{0})}^{j}=\sigma ({t}_{i})({\u03f5}_{i}-{c}_{k})+{d}_{i,k}-{\mathrm{\Delta}}_{i,k}}$. Minimizing (5.1) is equivalent to minimizing

$${L}_{n}(\theta )={\displaystyle \sum _{i=1}^{n}\left\{{K}_{i}{\displaystyle \sum _{k=1}^{q}\left[{\rho}_{{\tau}_{k}}(\sigma ({t}_{i})({\u03f5}_{i}-{c}_{k})+{d}_{i,k}-{\mathrm{\Delta}}_{i,k})-{\rho}_{{\tau}_{k}}(\sigma ({t}_{i})({\u03f5}_{i}-{c}_{k})+{d}_{i,k})\right]}\right\}}.$$

Using the identity (5.3) and with some straightforward calculations, it follows that

$${L}_{n}(\theta )={\displaystyle \sum _{k=1}^{q}{u}_{k}}\left({\displaystyle \sum _{i=1}^{n}\frac{{K}_{i}{\eta}_{i,k}^{*}}{\sqrt{nh}}}\right)+{\displaystyle \sum _{j=1}^{p}{v}_{j}}\left({\displaystyle \sum _{k=1}^{q}{\displaystyle \sum _{i=1}^{n}\frac{{K}_{i}{x}_{i}^{j}{\eta}_{i,k}^{*}}{\sqrt{nh}}}}\right)+{\displaystyle \sum _{k=1}^{q}{B}_{n,k}(\theta )}.$$

This completes the proof.

Let *S*_{n,11} be a *q* × *q* diagonal matrix with diagonal elements $f({c}_{k}){\displaystyle {\sum}_{i=1}^{n}\frac{{K}_{i}}{nh\sigma ({t}_{i})}}$, *k* = 1, ⋯, *q*; *S*_{n,12} be a *q* × *p* matrix with (*k, j*)-element $f({c}_{k}){\displaystyle {\sum}_{i=1}^{n}\frac{{K}_{i}{x}_{i}^{j}}{nh\sigma ({t}_{i})}}$, *j* = 1, ⋯, *p*; and *S*_{n,22} be a *p* × *p* matrix with (*j, j′*)-element ${\displaystyle {\sum}_{k=1}^{q}f({c}_{k}){\displaystyle {\sum}_{i=1}^{n}\frac{{K}_{i}{x}_{i}^{j+j\prime}}{nh\sigma ({t}_{i})}}}$. Denote

$${S}_{n}=\left(\begin{array}{cc}{S}_{n,11}& {S}_{n,12}\\ {S}_{n,12}^{T}& {S}_{n,22}\end{array}\right).$$

**Lemma 5.3***. Under Conditions (A)–(C)*, ${L}_{n}(\theta )=\frac{1}{2}{\theta}^{T}{S}_{n}\theta +{({W}_{n}^{*})}^{T}\theta +{o}_{p}(1)$.

*Proof*. Write *L _{n}*(θ) as

$${L}_{n}(\theta )={\displaystyle \sum _{k=1}^{q}{u}_{k}\left({\displaystyle \sum _{i=1}^{n}\frac{{K}_{i}{\eta}_{i,k}^{*}}{\sqrt{nh}}}\right)}+{\displaystyle \sum _{j=1}^{p}{v}_{j}}\left({\displaystyle \sum _{k=1}^{q}{\displaystyle \sum _{i=1}^{n}\frac{{K}_{i}{x}_{i}^{j}{\eta}_{i,k}^{*}}{\sqrt{nh}}}}\right)+{\displaystyle \sum _{k=1}^{q}{E}_{\u03f5}[{B}_{n,k}(\theta )|\mathbf{T}]}+{\displaystyle \sum _{k=1}^{q}{R}_{n,k}(\theta )},$$

where ${R}_{n,k}(\theta )={B}_{n,k}(\theta )-{E}_{\u03f5}[{B}_{n,k}(\theta )|\mathbf{T}]$. Straightforward calculations yield

$$\begin{array}{c}{\displaystyle \sum _{k=1}^{q}{E}_{\u03f5}[{B}_{n,k}(\theta )|\mathbf{T}]={\displaystyle \sum _{k=1}^{q}{\displaystyle \sum _{i=1}^{n}\left[{K}_{i}{\displaystyle {\int}_{0}^{{\mathrm{\Delta}}_{i,k}}\left\{\frac{z}{\sigma ({t}_{i})}f({c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})})+o(z)\right\}\phantom{\rule{thinmathspace}{0ex}}dz}\right]}}}\hfill \\ ={\displaystyle \sum _{k=1}^{q}{\displaystyle \sum _{i=1}^{n}\left[{K}_{i}{\mathrm{\Delta}}_{i,k}^{2}\frac{f({c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})})}{2\sigma ({t}_{i})}\right]}+{o}_{p}(1)}\hfill \\ ={\displaystyle \sum _{k=1}^{q}{\displaystyle \sum _{i=1}^{n}\left[{K}_{i}{\mathrm{\Delta}}_{i,k}^{2}\frac{f({c}_{k})}{2\sigma ({t}_{i})}\right]}+{o}_{p}(1)=\frac{1}{2}{\theta}^{T}{S}_{n}\theta +{o}_{p}(1).}\hfill \end{array}$$

We now prove ${R}_{n,k}(\theta )={o}_{p}(1)$ by showing that its conditional variance vanishes:

$$\begin{array}{c}{\mathit{\text{Var}}}_{\u03f5}[{B}_{n,k}(\theta )|\mathbf{T}]\hfill \\ ={\displaystyle \sum _{i=1}^{n}{\mathit{\text{Var}}}_{\u03f5}\left[\left\{{K}_{i}{\displaystyle {\int}_{0}^{{\mathrm{\Delta}}_{i,k}}\left[I({\u03f5}_{i}\le {c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})}+\frac{z}{\sigma ({t}_{i})})-I({\u03f5}_{i}\le {c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})})\right]\phantom{\rule{thinmathspace}{0ex}}dz}\right\}|\mathbf{T}\right]}\hfill \\ \le {\displaystyle \sum _{i=1}^{n}{E}_{\u03f5}\left[{\{{K}_{i}{\displaystyle {\int}_{0}^{{\mathrm{\Delta}}_{i,k}}\left[I({\u03f5}_{i}\le {c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})}+\frac{z}{\sigma ({t}_{i})})-I({\u03f5}_{i}\le {c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})})\right]\phantom{\rule{thinmathspace}{0ex}}dz}\}}^{2}|\mathbf{T}\right]}\hfill \\ \le {\displaystyle \sum _{i=1}^{n}{K}_{i}^{2}{\displaystyle {\int}_{0}^{|{\mathrm{\Delta}}_{i,k}|}{\displaystyle {\int}_{0}^{|{\mathrm{\Delta}}_{i,k}|}\left[F({c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})}+\frac{|{\mathrm{\Delta}}_{i,k}|}{\sigma ({t}_{i})})-F({c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})})\right]\phantom{\rule{thinmathspace}{0ex}}d{z}_{1}d{z}_{2}}}}\hfill \\ =o({\displaystyle \sum _{i=1}^{n}{K}_{i}^{2}{\mathrm{\Delta}}_{i,k}^{2})={o}_{p}}(1).\hfill \end{array}$$

Similar to Parzen (1962), we have $\frac{1}{nh}{\displaystyle {\sum}_{i=1}^{n}{K}_{i}{x}_{i}^{j}\stackrel{P}{\to}{f}_{T}({t}_{0}){\mu}_{j}},\phantom{\rule{thinmathspace}{0ex}}\text{where}\phantom{\rule{thinmathspace}{0ex}}\stackrel{P}{\to}$ stands for convergence in probability. Thus,

$${S}_{n}\stackrel{P}{\to}\frac{{f}_{T}({t}_{0})}{\sigma ({t}_{0})}S=\frac{{f}_{T}({t}_{0})}{\sigma ({t}_{0})}\left(\begin{array}{cc}{S}_{11}\hfill & {S}_{12}\hfill \\ {S}_{21}& {S}_{22}\end{array}\right).$$

This, together with Lemmas 5.2 and 5.3, leads to

$${L}_{n}(\theta )=\frac{1}{2}\frac{{f}_{T}({t}_{0})}{\sigma ({t}_{0})}{\theta}^{T}S\theta +{({W}_{n}^{*})}^{T}\theta +{o}_{p}(1).$$

Since the convex function ${L}_{n}(\theta )-{({W}_{n}^{*})}^{T}\theta $ converges in probability to the convex function $\frac{1}{2}\frac{{f}_{T}({t}_{0})}{\sigma ({t}_{0})}{\theta}^{T}S\theta $, it follows from the convexity lemma (Pollard 1991) that for any compact set Θ, the quadratic approximation to *L _{n}*(θ) holds uniformly for θ in any compact set, which leads to

$${\widehat{\theta}}_{n}=-\frac{\sigma ({t}_{0})}{{f}_{T}({t}_{0})}{S}^{-1}{W}_{n}^{*}+{o}_{p}(1).$$

Denote ${\eta}_{i,k}=I({\u03f5}_{i}\le {c}_{k})-{\tau}_{k}$ and let ${W}_{n}$ be defined as ${W}_{n}^{*}$ with ${\eta}_{i,k}^{*}$ replaced by ${\eta}_{i,k}$. Conditional on **T**, ${W}_{n}$ is a sum of independent bounded random vectors; thus, by the Cramér–Wold device and the Lindeberg–Feller central limit theorem,

$$\frac{{W}_{n}|\mathbf{T}-E[{W}_{n}|\mathbf{T}]}{\sqrt{\mathit{\text{Var}}[{W}_{n}|\mathbf{T}]}}\stackrel{\mathcal{L}}{\to}MVN(0,{I}_{(p+q)\times (p+q)}).$$

(5.4)

Note that $\mathit{\text{Cov}}({\eta}_{i,k},{\eta}_{i,k\prime})={\tau}_{k}\wedge {\tau}_{k\prime}-{\tau}_{k}{\tau}_{k\prime}={\tau}_{kk\prime}$, so that $\mathit{\text{Var}}[{W}_{n}|\mathbf{T}]\stackrel{P}{\to}{f}_{T}({t}_{0})\mathrm{\Sigma}$; the same argument applies to ${W}_{n}^{*}-E({W}_{n}^{*}|\mathbf{T})$. Together with (5.4) and Slutsky's theorem, this yields

$${\widehat{\theta}}_{n}+\frac{\sigma ({t}_{0})}{{f}_{T}({t}_{0})}{S}^{-1}E({W}_{n}^{*}|\mathbf{T})\stackrel{\mathcal{L}}{\to}MVN(0,\frac{{\sigma}^{2}({t}_{0})}{{f}_{T}({t}_{0})}{S}^{-1}\mathrm{\Sigma}{S}^{-1}).$$

(5.5)

This completes the proof.

The asymptotic normality follows from Theorem 5.1 with *p* = 1. Let us calculate the conditional bias and variance, respectively. Denote by *e*_{q×1} the vector of *q* ones. When *p* = 1, *S* is a diagonal matrix with diagonal elements $f({c}_{1}),\cdots ,f({c}_{q}),{\mu}_{2}{\displaystyle {\sum}_{k=1}^{q}f({c}_{k})}$. So the asymptotic conditional bias of $\widehat{m}({t}_{0})=\frac{1}{q}{\displaystyle {\sum}_{k=1}^{q}{\widehat{a}}_{k}}$ is

$$\begin{array}{cc}\mathit{\text{Bias}}(\widehat{m}({t}_{0})|\mathbf{T})\hfill & =\frac{1}{q}\sigma ({t}_{0}){\displaystyle \sum _{k=1}^{q}{c}_{k}-\frac{1}{q\xb7\sqrt{nh}}\frac{\sigma ({t}_{0})}{{f}_{T}({t}_{0})}{e}_{q\times 1}^{T}{({S}^{-1})}_{11}E({W}_{1n}^{*}|\mathbf{T})}\hfill \\ \hfill & =\frac{1}{q}\sigma ({t}_{0}){\displaystyle \sum _{k=1}^{q}{c}_{k}-\frac{1}{q\xb7nh}\frac{\sigma ({t}_{0})}{{f}_{T}({t}_{0})}{\displaystyle \sum _{i=1}^{n}{K}_{i}}{\displaystyle \sum _{k=1}^{q}\frac{1}{f({c}_{k})}\{F({c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})})-F({c}_{k})\}.}}\hfill \end{array}$$

Note that the error is symmetric, thus ${\sum}_{k=1}^{q}{c}_{k}=0$, and furthermore, it is easy to check that $\frac{1}{q}{\displaystyle {\sum}_{k=1}^{q}\frac{1}{f({c}_{k})}\{F({c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})})-F({c}_{k})\}=-\frac{{r}_{i,p}}{\sigma ({t}_{i})}\{1+{o}_{p}(1)\}}$. Therefore,

$$\mathit{\text{Bias}}(\widehat{m}({t}_{0})|\mathbf{T})=\frac{1}{nh}\frac{\sigma ({t}_{0})}{{f}_{T}({t}_{0})}{\displaystyle \sum _{i=1}^{n}{K}_{i}\frac{{r}_{i,p}}{\sigma ({t}_{i})}\{1+{o}_{p}(1)\}}.$$

By using the fact that

$$\frac{1}{nh}{\displaystyle \sum _{i=1}^{n}{K}_{i}\frac{{r}_{i,p}}{\sigma ({t}_{i})}=\frac{{f}_{T}({t}_{0})m\u2033({t}_{0})}{2\sigma ({t}_{0})}{\mu}_{2}{h}^{2}\{1+{o}_{p}(1)\}},$$

we obtain

$$\mathit{\text{Bias}}(\widehat{m}({t}_{0})|\mathbf{T})=\frac{1}{2}m\u2033({t}_{0}){\mu}_{2}{h}^{2}+{o}_{p}({h}^{2}).$$

(5.6)

Furthermore, the conditional variance of $\widehat{m}({t}_{0})$ is

$$\begin{array}{cc}\mathit{\text{Var}}(\widehat{m}({t}_{0})|\mathbf{T})\hfill & =\frac{1}{nh}\frac{{\sigma}^{2}({t}_{0})}{{f}_{T}({t}_{0})}\frac{1}{{q}^{2}}{e}_{q\times 1}^{T}{({S}^{-1}\mathrm{\Sigma}{S}^{-1})}_{11}{e}_{q\times 1}+{o}_{p}(\frac{1}{nh})\hfill \\ \hfill & =\frac{1}{nh}\frac{{\nu}_{0}{\sigma}^{2}({t}_{0})}{{f}_{T}({t}_{0})}{R}_{1}(q)+{o}_{p}(\frac{1}{nh}).\hfill \end{array}$$

(5.7)

By using Theorem 5.1, we can further derive the asymptotic bias and variance of $\tilde{m}\prime ({t}_{0})$ given in (2.3):

$$\mathit{\text{Bias}}(\tilde{m}\prime ({t}_{0})|\mathbf{T})=\frac{1}{6}\left(m\u2034({t}_{0})+3m\u2033({t}_{0})\frac{{f\prime}_{T}({t}_{0})}{{f}_{T}({t}_{0})}\right)\frac{{\mu}_{4}}{{\mu}_{2}}{h}^{2}+{o}_{p}({h}^{2}),$$

(5.8)

$$\mathit{\text{Var}}(\tilde{m}\prime ({t}_{0})|\mathbf{T})=\frac{1}{n{h}^{3}}\frac{{\nu}_{2}{\sigma}^{2}({t}_{0})}{{\mu}_{2}^{2}{f}_{T}({t}_{0})}{R}_{2}(q)+{o}_{p}(\frac{1}{n{h}^{3}}).$$

(5.9)

Note that

$$\begin{array}{cc}\underset{q\to \infty}{\text{lim}}{R}_{1}(q)\hfill & ={\displaystyle {\int}_{0}^{1}{\displaystyle {\int}_{0}^{1}\frac{{s}_{1}\wedge {s}_{2}-{s}_{1}{s}_{2}}{f({F}^{-1}({s}_{1}))f({F}^{-1}({s}_{2}))}d{s}_{1}d{s}_{2}}}\hfill \\ \hfill & ={\displaystyle {\int}_{-\infty}^{\infty}{\displaystyle {\int}_{-\infty}^{\infty}\left(F({z}_{1})\wedge F({z}_{2})-F({z}_{1})F({z}_{2})\right)\phantom{\rule{thinmathspace}{0ex}}d{z}_{1}d{z}_{2}.}}\hfill \end{array}$$

(5.10)

by a change of variables. Define $G(s)={\displaystyle {\int}_{-\infty}^{s}F(t)\phantom{\rule{thinmathspace}{0ex}}dt}$ and $H(s)={\displaystyle {\int}_{-\infty}^{s}G(t)\phantom{\rule{thinmathspace}{0ex}}dt}$. It is easy to verify that

$$G(s)={\displaystyle {\int}_{-\infty}^{s}(s-x)f(x)dx=sF(s)-{k}_{1}(s)},$$

(5.11)

where ${k}_{1}(s)={\displaystyle {\int}_{-\infty}^{s}xf(x)dx}$. Similarly, we obtain

$$2H(s)={\displaystyle {\int}_{-\infty}^{s}{(s-x)}^{2}f(x)dx={s}^{2}F(s)-2s{k}_{1}(s)+{k}_{2}(s)},$$

(5.12)

where ${k}_{2}(s)={\displaystyle {\int}_{-\infty}^{s}{x}^{2}f(x)dx}$. Let *I* be the integral in (5.10). We have that *I* equals

$$2{\displaystyle {\int}_{-\infty}^{\infty}\left({\displaystyle {\int}_{{z}_{1}}^{\infty}f(t)dt}\right)G({z}_{1})d{z}_{1}=2{\displaystyle {\int}_{-\infty}^{\infty}f(t)\left({\displaystyle {\int}_{-\infty}^{t}G({z}_{1})d{z}_{1}}\right)dt={\displaystyle {\int}_{-\infty}^{\infty}2f(t)H(t)dt}}}.$$

(5.13)

By the definition of *G* and *H*, we know $\frac{d\{2H(t)F(t)-{G}^{2}(t)\}}{dt}=2H(t)f(t)$; and combining (5.11) and (5.12) yields $2H(t)F(t)-{G}^{2}(t)={k}_{2}(t)F(t)-{k}_{1}^{2}(t)$. Now it is easy to see that *I* equals 1, by the facts that ${\int}_{-\infty}^{\infty}{x}^{2}f(x)dx={E}_{F}[{\u03f5}^{2}]=1$ and ${\int}_{-\infty}^{\infty}xf(x)dx={E}_{F}[\u03f5]=0$.
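The conclusion *I* = 1 can be verified numerically for any error distribution standardized to mean 0 and variance 1. A quick check for the standard normal, discretizing the double integral in (5.10) on a grid:

```python
import numpy as np
from math import erf, sqrt

# standard normal CDF on a fine grid (tails beyond |z| = 8 are negligible)
dz = 0.01
z = np.arange(-8, 8, dz)
F = 0.5 * (1 + np.array([erf(v / sqrt(2)) for v in z]))

# I = double integral of F(z1) ^ F(z2) - F(z1) F(z2)
integrand = np.minimum(F[:, None], F[None, :]) - F[:, None] * F[None, :]
I = integrand.sum() * dz ** 2       # close to 1, as the proof shows
```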

We apply Theorem 5.1 to get the asymptotic normality. Denote by *e*_{r} the *p* × 1 vector with 1 in the *r*th position and 0 elsewhere.

Note that ${({S}^{-1})}_{21}=-{({S}^{-1})}_{22}{S}_{21}{S}_{11}^{-1}$. Thus, (*S*^{−1})_{21} equals ${({0}_{q\times 1},[{\mu}_{2}/\{({\mu}_{4}-{\mu}_{2}^{2}){\sum}_{k=1}^{q}f({c}_{k})\}]{1}_{q\times 1})}^{T}$. By Theorem 5.1

$$\begin{array}{cc}\mathit{\text{Bias}}(\widehat{m}\prime ({t}_{0})|\mathbf{T})\hfill & =-\frac{\sigma ({t}_{0})}{h{f}_{T}({t}_{0})}\frac{1}{\sqrt{nh}}{e}_{1}^{T}\{{({S}^{-1})}_{21}E({W}_{1n}^{*}|\mathbf{T})+{({S}^{-1})}_{22}E({W}_{2n}^{*}|\mathbf{T})\}\hfill \\ \hfill & =-\frac{\sigma ({t}_{0})}{h{f}_{T}({t}_{0})}\frac{1}{{\mu}_{2}{\displaystyle {\sum}_{k=1}^{q}f({c}_{k})}}\frac{1}{\sqrt{nh}}E({w}_{21}^{*}|\mathbf{T}).\hfill \end{array}$$

Note that $E({w}_{2j}^{*}|\mathbf{T})=\frac{1}{\sqrt{nh}}{\displaystyle {\sum}_{i=1}^{n}{K}_{i}{x}_{i}^{j}{\displaystyle {\sum}_{k=1}^{q}\{F({c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})})-F({c}_{k})\}}}$. Similarly, under Condition (D), we have ${\displaystyle {\sum}_{k=1}^{q}\{F({c}_{k}-\frac{{d}_{i,k}}{\sigma ({t}_{i})})-F({c}_{k})\}=-{\displaystyle {\sum}_{k=1}^{q}f({c}_{k})}\cdot \frac{{r}_{i,p}}{\sigma ({t}_{i})}\{1+{o}_{p}(1)\}}$. Therefore, $\mathit{\text{Bias}}(\widehat{m}\prime ({t}_{0})|\mathbf{T})$ is equal to $\frac{1}{n{h}^{2}}\frac{\sigma ({t}_{0})}{{f}_{T}({t}_{0})}{\displaystyle {\sum}_{i=1}^{n}{K}_{i}{x}_{i}\frac{{r}_{i,p}}{\sigma ({t}_{i})}\{1+{o}_{p}(1)\}}$. For *p* = 2, by the fact that

$$\frac{1}{nh}{\displaystyle \sum _{i=1}^{n}{K}_{i}{x}_{i}\frac{{r}_{i,p}}{\sigma ({t}_{i})}=\frac{{f}_{T}({t}_{0})m\u2034({t}_{0})}{6\sigma ({t}_{0})}\frac{{\mu}_{4}}{{\mu}_{2}}{h}^{3}\{1+{o}_{p}(1)\}},$$

we obtain

$$Bias(\widehat{m}\prime ({t}_{0})|\mathbf{T})=\frac{1}{6}m\u2034({t}_{0})\frac{{\mu}_{4}}{{\mu}_{2}}{h}^{2}+{o}_{p}({h}^{2}).$$

(5.14)

Furthermore, the conditional variance of $\widehat{m}\prime ({t}_{0})$ is

$$\begin{array}{cc}\mathit{\text{Var}}(\widehat{m}\prime ({t}_{0})|\mathbf{T})\hfill & =\frac{1}{n{h}^{3}}\frac{{\sigma}^{2}({t}_{0})}{{f}_{T}({t}_{0})}{e}_{1}^{T}{({S}^{-1}\mathrm{\Sigma}{S}^{-1})}_{22}{e}_{1}+{o}_{p}(\frac{1}{n{h}^{3}}),\hfill \\ \hfill & =\frac{1}{n{h}^{3}}\frac{{\nu}_{2}{\sigma}^{2}({t}_{0})}{{\mu}_{2}^{2}{f}_{T}({t}_{0})}{R}_{2}(q)+{o}_{p}(\frac{1}{n{h}^{3}}).\hfill \end{array}$$

(5.15)

This completes the proof.

From Zou & Yuan (2008), we know that

$$\underset{q\to \infty}{\text{lim}}\phantom{\rule{thinmathspace}{0ex}}{\left({\displaystyle \sum _{k=1}^{q}f({c}_{k})}\right)}^{2}/\left({\displaystyle \sum _{k=1}^{q}{\displaystyle \sum _{k\prime =1}^{q}{\tau}_{kk\prime}}}\right)=12{E}_{F}^{2}[f(\u03f5)]=12{\left({\displaystyle \int {f}^{2}(x)dx}\right)}^{2}.$$

Thus ${\text{lim}}_{q\to \infty}\frac{1}{{R}_{2}(q)}=12{({\displaystyle \int {f}^{2}(x)dx})}^{2}$. We note that $12{({\displaystyle \int {f}^{2}(x)dx})}^{2}$ is also the asymptotic Pitman efficiency of the Wilcoxon test relative to the *t*-test (Hodges & Lehmann 1956). For the rest of the proof, readers are referred to Hodges & Lehmann (1956).
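Both sides of this limit are easy to evaluate numerically. For the standard normal, ∫*f*^{2} = 1/(2√π), so 12(∫*f*^{2})^{2} = 3/π ≈ 0.955, the familiar Wilcoxon-versus-*t* Pitman efficiency; computing *R*_{2}(*q*) from its definition (with the assumed grid τ_{k} = *k*/(*q* + 1) and τ_{kk′} = τ_{k} ∧ τ_{k′} − τ_{k}τ_{k′}) shows 1/*R*_{2}(*q*) approaching this value:

```python
import numpy as np
from math import pi, sqrt
from statistics import NormalDist

nd = NormalDist()

# For the standard normal, \int f^2(x) dx = 1/(2*sqrt(pi)),
# so 12 (\int f^2)^2 = 3/pi, the Wilcoxon-vs-t Pitman efficiency.
lim_inv_R2 = 12 * (1 / (2 * sqrt(pi))) ** 2

# R_2(q) from its definition (assumed: tau_k = k/(q+1),
# tau_{kk'} = min(tau_k, tau_k') - tau_k * tau_k')
q = 99
tau = np.arange(1, q + 1) / (q + 1)
f = np.array([nd.pdf(nd.inv_cdf(t)) for t in tau])   # f(c_k)
tkk = np.minimum.outer(tau, tau) - np.outer(tau, tau)
R2 = tkk.sum() / f.sum() ** 2    # 1/R2 approaches 3/pi as q grows
```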

In this paper our theoretical analysis deals with the classical setting in which *t*_{0} is an interior point and the error distribution has a finite variance. We point out here that similar arguments apply to boundary points and that the proposed methodology remains valid even when the error variance is infinite.

- **Automatic boundary correction.** For simplicity, consider *t* ∈ [0, 1] and *t*_{0} = *ch* for some constant *c*. We show that the leading term of the asymptotic bias of the local linear/quadratic CQR estimator is the same as that of the local linear/quadratic LS estimator, which indicates that the local CQR estimator enjoys automatic boundary correction, a nice property of the local LS estimator. Furthermore, the asymptotic relative efficiency remains exactly the same as that for interior points.
- **Infinite error variance.** We show that the local CQR estimator still enjoys the optimal rate of convergence and asymptotic normality even when the conditional variance is infinite. This property can be important in real applications, since in practice we have no information about the error distribution.

For detailed proofs of the above claims, we refer interested readers to a supplementary file (Kai, Li & Zou 2009) of this paper, where we also provide additional simulation results supporting the theory. We opt not to show these results here owing to space limitations.

In this paper, we have focused on the local CQR estimate for the nonparametric regression model. The proposed methodology and theory may be extended to settings with multivariate covariates by considering varying coefficient models, additive models or semiparametric models. Such extensions are of great interest and require further research.

Finally, we would like to point out that the local CQR procedure can be implemented efficiently using the MM algorithm (Hunter & Lange 2000). Our experience shows that for *q* = 9 and sample size *n* = 7000, the local CQR fit at a given location can be computed within 0.32 seconds on an AMD 1.9GHz machine. The MM implementation appears to be more efficient than standard linear programming. We discuss the computing algorithm in detail in a separate article.
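The MM idea of Hunter & Lange (2000) majorizes each check-loss term by a quadratic, so every iteration reduces to a weighted least-squares solve. A minimal single-quantile sketch (the local CQR objective is handled analogously, with kernel weights and slope parameters shared across the *q* quantiles; the perturbation `delta` is the standard device that keeps the majorizer finite at zero residuals):

```python
import numpy as np

def qr_mm(X, y, tau=0.5, delta=1e-6, iters=200):
    """Quantile regression by an MM (majorize-minimize) algorithm.

    Writes rho_tau(r) = (|r| + (2*tau - 1)*r)/2 and majorizes |r| at the
    current residual r0 by r**2 / (2*(|r0| + delta)) + const, so each
    iteration is a weighted least-squares solve.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # least-squares start
    for _ in range(iters):
        r = y - X @ beta
        w = 1.0 / (2.0 * (np.abs(r) + delta))            # majorizer weights
        A = (X * w[:, None]).T @ X
        b = (X * w[:, None]).T @ y + 0.5 * (2 * tau - 1) * X.sum(axis=0)
        beta = np.linalg.solve(A, b)
    return beta

# toy check: median regression recovers the line y = 1 + 2x
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
X = np.column_stack([np.ones(500), x])
y = 1 + 2 * x + 0.3 * rng.laplace(size=500)
beta = qr_mm(X, y, tau=0.5)
```

Each update solves the normal equations of the quadratic surrogate, so the (perturbed) objective decreases monotonically; this is what makes the method competitive with linear programming for repeated local fits.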

The authors are grateful to the editor, the associate editor and two referees for their helpful and constructive comments, which led to a substantial improvement of this paper.

Kai’s research is supported by NIDA, NIH grants R21 DA024260 and P50 DA10075 as a research assistant. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the NIH.

Li’s research is supported by National Science Foundation grants DMS 0348869 and DMS 0722351.

Zou’s research is supported by National Science Foundation grant DMS 0706733.

- Chu C-K, Marron JS. Choosing a kernel regression estimator. Statist. Sci. 1991;6(4):404–436.
- Fan J. Design-adaptive nonparametric regression. J. Amer. Statist. Assoc. 1992;87(420):998–1004.
- Fan J, Gijbels I. Variable bandwidth and local linear regression smoothers. Ann. Statist. 1992;20(4):2008–2036.
- Fan J, Gijbels I. Local polynomial modelling and its applications. London: Chapman & Hall; 1996.
- Fan J, Hu TC, Truong YK. Robust non-parametric function estimation. Scand. J. Statist. 1994;21(4):433–446.
- Hodges J, Lehmann E. The efficiency of some nonparametric competitors of the t-test. Ann. Math. Stat. 1956;27(2):324–335.
- Hunter DR, Lange K. Quantile regression via an MM algorithm. Journal of Computational and Graphical Statistics. 2000;9(1):60–77.
- Kai B, Li R, Zou H. Supplementary materials for “Local CQR smoothing: an efficient and safe alternative to local polynomial regression”. 2009. Technical report, http://www.stat.psu.edu/rli/research/Supplement-of-localCQR.pdf.
- Knight K. Limiting distributions for *L*_{1} regression estimators under general conditions. Ann. Statist. 1998;26(2):755–770.
- Koenker R. A note on *L*-estimates for linear models. Stat. and Prob. Letters. 1984;2(6):323–325.
- Koenker R. Quantile regression. Cambridge: Cambridge University Press; 2005.
- Parzen E. On estimation of a probability density function and mode. Ann. Math. Statist. 1962;33:1065–1076.
- Pollard D. Asymptotics for least absolute deviation regression estimators. Econometric Theory. 1991;7(2):186–199.
- Ruppert D, Sheather SJ, Wand MP. An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc. 1995;90(432):1257–1270.
- Welsh AH. Robust estimation of smooth regression and spread functions and their derivatives. Statist. Sinica. 1996;6(2):347–366.
- Yu K, Jones MC. Local linear quantile regression. J. Amer. Statist. Assoc. 1998;93(441):228–237.
- Zou H, Yuan M. Composite quantile regression and the oracle model selection theory. The Annals of Statistics. 2008;36(3):1108–1126.
