J R Stat Soc Series B Stat Methodol. Author manuscript; available in PMC 2011 January 1.
Published in final edited form as:
J R Stat Soc Series B Stat Methodol. 2010 January; 72(1): 49–69.
PMCID: PMC2958780
NIHMSID: NIHMS115056

Local CQR Smoothing: An Efficient and Safe Alternative to Local Polynomial Regression

Bo Kai,* Runze Li, Professor of Statistics, and Hui Zou, Associate Professor of Statistics

Abstract

Local polynomial regression is a useful nonparametric regression tool to explore fine data structures and has been widely used in practice. In this paper, we propose a new nonparametric regression technique called local composite-quantile-regression (CQR) smoothing in order to further improve local polynomial regression. Sampling properties of the proposed estimation procedure are studied. We derive the asymptotic bias, variance and normality of the proposed estimate. Asymptotic relative efficiency of the proposed estimate with respect to the local polynomial regression is investigated. It is shown that the proposed estimate can be much more efficient than the local polynomial regression estimate for various non-normal errors, while being almost as efficient as the local polynomial regression estimate for normal errors. Simulation is conducted to examine the performance of the proposed estimates. The simulation results are consistent with our theoretical findings. A real data example is used to illustrate the proposed method.

Key words and phrases: Asymptotic efficiency, CQR estimator, Kernel function, Local polynomial regression, Nonparametric regression

1 Introduction

Consider the general nonparametric regression model

$Y=m(T)+σ(T)ϵ,$
(1.1)

where Y is the response variable, T is a covariate, m(T) = E(Y|T), which is assumed to be a smooth nonparametric function, and σ(T) is a positive function representing the standard deviation. We assume ϵ has mean 0 and variance 1. Local polynomial regression is a popular and successful method for nonparametric regression, and it has been well studied in the literature (Fan & Gijbels 1996). By locally fitting a linear (or polynomial) regression model via adaptively weighted least squares, local polynomial regression is able to explore the fine features of the regression function and its derivatives. Although the least squares method is a popular and convenient choice in local polynomial fitting, we may consider using different local fitting methods. For example, in the presence of outliers, one may consider local least absolute deviation (LAD) polynomial regression (Fan, Hu & Truong 1994, Welsh 1996). When the error follows a Laplacian distribution, the local LAD polynomial regression is more efficient than the local least squares polynomial regression. Of course, the local LAD polynomial regression can do much worse than the local least squares polynomial regression in other different settings. The aim of this paper is to develop a new local estimation procedure that can significantly improve upon the classical local polynomial regression for a wide class of error distributions, and has comparable efficiency in the worst case scenario.

Our proposal is built upon the composite-quantile-regression (CQR) estimator recently proposed by Zou & Yuan (2008) for estimating the regression coefficients in the classical linear regression model. Zou & Yuan (2008) show that the relative efficiency of the CQR estimator compared to the least squares estimator is greater than 70% regardless the error distribution. Furthermore, the CQR estimator could be much more efficient and sometimes arbitrarily more efficient than the least squares estimator. These nice theoretical properties of CQR in linear regression motivates us to construct the local CQR smoothers as nonparametric estimates of the regression function and its derivatives.

We make several contributions in this paper.

• We propose the local linear CQR estimator for estimating the nonparametric regression function. We establish the asymptotic theory of the local linear CQR estimator and show that, compared with the classical local linear least squares estimator, the new method can significantly improve the estimation efficiency of the local linear least squares estimator for commonly used non-normal error distributions.
• We propose the local quadratic CQR estimator for estimating the derivative of the regression function. The asymptotic theory shows that the local quadratic CQR estimator can often drastically improve the estimation efficiency of its local least squares counterpart if the error distribution is non-normal, and at the same time, the loss in efficiency is at most 8.01% in the worst case scenario.
• The general asymptotic theory of the local p-polynomial CQR estimator is established. Our theory does not require the error distribution to have a finite variance. Therefore, local CQR estimators can work well even when local polynomial regression fails due to the infinite variance in the noise.

It is a well-known fact that the local linear (polynomial) regression is the best linear smoother in terms of efficiency (Fan & Gijbels 1996). There is no contradiction between this fact and our results, because the proposed local CQR estimator is a nonlinear smoother.

The rest of this paper is organized as follows. In section 2, we introduce the local linear CQR for the nonparametric regression and study its asymptotic properties. In section 3, we propose the local quadratic CQR for estimating the derivative of the nonparametric regression, which is able to further reduce the estimation bias by the local linear CQR. Monte Carlo study and a real data example are presented in section 4. In section 5 we present the general theory of the local p-polynomial CQR and technical proofs.

2 Estimation of regression function

Suppose that (ti,yi), i = 1, , n, is an independent and identically distributed random sample. Consider estimating the value of m(T) at t0. In local linear regression we first approximate m(t) locally by a linear function m(t) ≈ m(t0) + m′(t0)(t − t0) and then fit a linear model locally in a neighborhood of t0. Let K(·) be a smooth kernel function, the local linear regression estimators of m(t0) is â, where

$(a^,b^)=arg min⁡a,b∑i=1n{yi−a−b(ti−t0)}2K(ti−t0h),$
(2.1)

where h is the smoothing parameter. Local linear regression enjoys many good theoretical properties, such as its design adaptation property and high minimax efficiency (Fan & Gijbels 1992). However, local least squares regression breaks down when the error distribution does not have finite second moment, for the estimator is no longer consistent. The local least absolute deviation (LAD) polynomial regression (Fan et al. 1994, Welsh 1996) replaces the least squares loss in (2.1) with the L1 loss. By doing so, the local LAD estimator can deal with the infinite variance case, but for finite variance cases its relative efficiency compared to the local least squares estimator can be arbitrarily small.

We propose the local linear CQR estimator as an efficient alternative to the local linear regression estimator. Let ρτk (r) = τkr − rI (r < 0), k = 1, 2, …, q, be q check loss functions at q quantiles positions: $τk=kq+1$. In the linear regression model the CQR loss is defined as (Zou & Yuan 2008)

$∑k=1q∑i=1nρτκ(yi−ak−bti).$

The CQR combines the strength across multiple quantile regressions with forcing a single parameter for “slope”. Since the nonparametric function is approximated by a linear model locally, we consider minimizing the locally weighted CQR loss

$∑k=1q[∑i=1nρτk{yi−ak−b(ti−t0)}K(ti−t0h)].$
(2.2)

Denote the minimizer of (2.2) by (â1, , âq, ). Then we let

$m^(t0)=1q∑k=1qa^k,andm˜′(t0)=b^.$
(2.3)

We refer (t0) to as the local linear CQR estimator of m(t0). As an estimator of m′(t0), (t0) can be further improved by using the local quadratic CQR estimator which is discussed in the next section.

Remark 1

It is worth mentioning here that although the check loss function is typically used to estimate the conditional quantile function of y given T (see Koenker (2005) and references therein), we simultaneously employ several check functions to estimate the regression (mean) function. So the local CQR smoother is conceptually different from nonparametric quantile regression by local fitting which has been studied in Yu & Jones (1998) and chapter 5 of Fan & Gijbels (1996).

Remark 2

In a short note Koenker (1984) studied the Hogg estimator as the minimizer of the weighted sum of check functions in the framework of parametric linear models. The focus there is to argue that the Hogg estimator is a different way to do L-estimation. The CQR loss can be regarded as a weighted sum of check functions with uniform weights and uniform quantiles $(τk=kq+1,k=1,2,…,q)$. When q is large, such a choice leads to nice oracle-like estimators in the oracle model selection theoretic framework (Zou & Yuan 2008). Koenker (1984) did not discuss relative efficiency of the Hogg estimator relative to the least squares estimator. In this work we consider minimizing the locally weighted CQR loss and show that the local CQR smoothers have very interesting asymptotic efficiency properties. To our best knowledge, none of these has been studied in the literature.

2.1 Asymptotic properties

To see why local linear CQR is an efficient alternative to local linear regression, we establish the asymptotic properties of the local linear CQR estimator. Some notation is necessary for the discussion. Let F(·) and f(·) denote the density function and cumulative distribution function of the error distribution, respectively. Denote by fT(·) the marginal density function of the covariate T. We choose the kernel K(·) as a symmetric density function and let

$μj=∫ujK(u)duandνj=∫ujK2(u)du,j=0,1,2,…,.$

Define

$R1(q)=1q2∑k=1q∑k′=1qτkk′f(ck)f(ck′),$
(2.4)

where ck = F−1 (τk) and τkk′ = τkτk′τkτk′. In the following theorem, we present the asymptotic bias, variance and normality of (t0), whose proof is given in section 5. Let T be the σ-field generated by {T1, , Tn}.

Theorem 2.1

Suppose that t0 is an interior point of the support of fT(·). Under the regularity conditions (A)—(D) in section 5, if h → 0 and nh → ∞, then the asymptotic conditional bias and variance of the local linear CQR estimator (t0) are given by

$Bias(m^(t0)|T)=12m″(t0)μ2h2+op(h2),$
(2.5)

$Var(m^(t0)|T)=1nhν0σ2(t0)fT(t0)R1(q)+op(1nh).$
(2.6)

Furthermore, conditioning on T, we have

$nh{m^(t0)−m(t0)−12m″(t0)μ2h2}→ℒN(0,ν0σ2(t0)fT(t0)R1(q)).$
(2.7)

where $→ℒ$ stands for convergence in distribution.

Remark 3

In the proof given in section 5 we assume the error distribution is symmetric. Without such a condition the asymptotic bias will have a non-varnishing term. The asymptotic variance remains the same and the asymptotic normality still holds with a minor modification. In other words, the symmetric error distribution condition is only used to ensure that the quantity to which the local CQR estimator converges is the conditional mean. This is similar to the situation when using the local LAD to estimate the conditional mean function where we need to assume the mean and median of the error distribution coincide.

We see from Theorem 2.1 that the leading term of the asymptotic bias for the local linear CQR estimator is the same as that for the local linear least squares estimator, while their asymptotic variances are different. The mean squared error of (t0) is

$MSE{m^(t0)}={12m″(t0)μ2}2h4+1nhν0σ2(t0)fT(t0)R1(q)+op(h4+1nh).$

By straightforward calculations we can see that the optimal variable bandwidth minimizing the asymptotic mean squared error of (t0) is

$hopt(t0)=[ν0σ2(t0)R1(q)fT(t0){m″(t0)μ2}2]1/5n−1/5.$

In practice, one may select a constant bandwidth by minimizing the mean integrated squared error MISE () = ∫ MSE{(t0)}w(t) dt for a weight function w(t). Similarly, the optimal bandwidth minimizing the asymptotic MISE() is

$hopt=[ν0R1(q)∫σ2(t)fT−1(t)w(t)dtμ22∫{m″(t)}2w(t)dt]1/5n−1/5.$

The above calculations indicate that the local linear CQR estimator enjoys the optimal rate of convergence n2/5.

2.2 Asymptotic relative efficiency

In this section, we study the asymptotic relative efficiency of the local linear CQR estimator with respect to the local linear least squares estimator by comparing their mean squared errors. The role of R1 becomes clear in the relative efficiency study.

The local linear least squares estimator for m(t0) has the mean squared error

$MSE{m^LS(t0)}={12m″(t0)μ2}2h4+1nhν0fT(t0)σ2(t0)+op(h4+1nh),$

and hence

$hLSopt(t0)=[ν0σ2(t0)fT(t0){m″(t0)μ2}2]1/5n−1/5,hLSopt=[ν0∫σ2(t)fT−1(t)w(t)dtμ22∫{m″(t)}2w(t)dt]1/5n−1/5,$

where $hLSopt(t0)$ is the optimal variable bandwidth minimizing the asymptotic MSE and $hLSopt$ is the optimal bandwidth minimizing the asymptotic MISE. Therefore, we have

$hopt(t0)=R1(q)1/5hLSopt(t0),hopt=R1(q)1/5hLSopt.$
(2.8)

We use MSEopt and MISEopt to denote the MSE and MISE evaluated at their optimal bandwidth. Then by straightforward calculations we see that as n approaches ∞,

$MSEopt{m^LS(t0)}MSEopt{m^(t0)}→R1(q)−4/5,MISEopt{m^LS}MISEopt{m^}→R1(q) −4/5.$

Thus, it is natural to define ARE(, LS the asymptotic relative efficiency (ARE) of the local linear CQR estimator with respect to the local linear least squares estimator, as follows

$ARE(m^,m^LS)=R1(q)−4/5.$
(2.9)

The ARE only depends on the error distribution, although the dependence could be rather complex. However, for many commonly seen error distributions, we can directly compute the value of ARE. Table 1 displays the ARE(, LS) for some commonly seen error distributions.

Comparisons of ARE , LS

Several interesting observations can be made from Table 1. Firstly, when the error distribution is N(0, 1) for which the local linear least squares estimator is expected to have the best performance, the ARE(, LS) is very close to 1 regardless the choice of q in the local linear CQR estimator. When q = 5 the the local linear CQR only loses at most 7% efficiency, while it performs as well as the local linear least squares estimator when q = 99. Secondly, for all the other non-normal distributions listed in Table 1, the local linear CQR estimator can have higher efficiencies than the local linear least squares estimator when a small q is used. The mixture of two normals is often used to model the so-called contaminated data. For such distributions, the ARE(, LS) can be as large as 4.9 and even more. Table 1 also indicates that, except for the Laplace error, the local CQR with q = 5 or q = 9 are significantly better than the one with q = 1, which becomes the local LAD for these distributions. Finally, we observe that the ARE values for a variety of distributions are very close to 1 when q is large (q = 99). It turns out that this phenomenon is true in general, as demonstrated in the following theorem.

Theorem 2.2

limq→∞ R1(q) = 1, and thus $limq→∞ARE(m^,m^LS)=1$.

Theorem 2.2 provides us insights into the asymptotic behavior of the local linear CQR estimator and implies that the local linear CQR estimator is a safe competitor against the local linear least squares estimator, for it will not lose efficiency when using a large q. On the other hand, substaintial gain in efficiency could be achieved by using a relatively small q such as q = 9, as shown in Table 1.

3 Estimation of derivative

In many situations we are interested in estimating the derivative of m(t). The local linear CQR also provides an estimator (t0) to the derivative of m(t). The asymptotic bias and variance of the estimate (t0) in (2.3) are given in (5.8) and (5.9) in section 5. The local linear CQR estimator and the local linear regression estimator have the same leading bias term which depends on the intrinsic part m(t0) and the extra part $m″(t0)f′T(t0)/fT(t0)$. In Chu & Marron (1991) and Fan (1992), the authors already argued that the bias could be very large in many situations. So it may not be an ideal estimator because of the relatively large bias. The local quadratic regression is often preferred for estimating the derivative function, since it reduces the estimation bias without increasing the estimation variance (Fan & Gijbels 1992). We show here that the same phenomenon is true in local CQR smoothing.

We consider the local quadratic approximation of m(t) in the neighborhood of t0: $m(t)≈m(t0)+m′(t0)(t−t0)+12m″(t0)(t−t0)2$. Let a = (a1, , aq) and b = (b1, b2). We solve

$(â,b^)=arg min⁡a,b∑i=1n[∑k=1qρτk(yi−ak−b1(ti−t0)−12b2(ti−t0)2)K(ti−t0h)].$
(3.1)

Then the local quadratic CQR estimator for m′(t0) is given by

$m^′(t0)=b^1.$
(3.2)

3.1 Asymptotic properties

Denote

$R2(q)=(∑k=1q∑k′=1qτkk′)/(∑k=1qf(ck))2.$
(3.3)

The asymptotic bias, variance and normality are given in the following theorem.

Theorem 3.1

Suppose that t0 is an interior point of the support of fT(·). Under the regularity conditions (A)—(D) in section 5, if h → 0 and nh3 → ∞, then the asymptotic conditional bias and variance of (t0), defined in (3.2) is given by

$Bias(m^′(t0)|T)=16m‴(t0)μ4μ2h2+op(h2),$
(3.4)

$Var(m^′(t0)|T)=1nh3ν2σ2(t0)μ22fT(t0)R2(q)+op(1nh3).$
(3.5)

Furthermore, conditioning on T, we have the following asymptotic normal distribution

$nh3(m^′(t0)−m′(t0)−16m‴(t0)μ4μ2h2)→ℒN(0,ν2σ2(t0)μ22fT(t0)R2(q)).$
(3.6)

Remark 4

In Theorem 3.1 the symmetric-error-distribution assumption is used to get the asymptotic bias formula. Without that assumption, the asymptotic variance remains the same and the asymptotic normality still holds with a minor modification. It is also interesting to point out that when the variance function is homoscadastic the symmetric-error-distribution assumption is no longer needed for Theorem 3.1.

Comparing (5.8) and (3.4), we see that the extra part $m″(t0)f′T(t0)/fT(t0)$ is removed in the local quadratic CQR estimator. Comparing the local quadratic CQR and the local quadratic least squares estimators for m′(t0), we see that they have the same leading bias term, while their asymptotic variances are different.

From Theorem 3.1, the mean squared error of local quadratic CQR estimator (t0) is given by

$MSE{m′(t0)}=(16m‴(t0)μ4μ2)2h4+1nh3ν2σ2(t0)μ22fT(t0)R2(q)+op(h4+1nh3).$

Thus, the optimal variable bandwidth minimizing MSE{(t0)} is

$hopt(t0)={R2(q)}1/7(27ν2σ2(t0)fT(t0){m‴(t0)μ4}2)1/7n−1/7.$

Furthermore, we consider the mean integrated squared error MISE = ∫ MSE{(t)}w(t) dt with a weight function w(t). The optimal constant bandwidth minimizing the mean integrated squared error is given by

$hopt={R2(q)}1/7(27ν2∫σ2(t)fT−1(t)w(t)dt∫{m‴(t)}2w(t)dtμ42)1/7n−1/7.$

The above calculations indicate that the local quadratic CQR estimator enjoys the optimal rate of convergence n2/7.

3.2 Asymptotic relative efficiency

In what follows we study the asymptotic relative efficiency of the local quadratic CQR estimator with respect to the local quadratic least squares estimator. Note that the mean squared error of local quadratic least squares estimator $m^LS′(t0)$ is given by

$MSE{m^LS′(t0)}=(16m‴(t0)μ4μ2)2h4+1nh3ν2σ2(t0)μ22fT(t0)+op(h4+1nh3),$

and the mean integrated squared error (MISE) is $MISE(m^LS′)=∫MSE{m^LS′(t)}w(t)dt$ with a weight function w(t). Thus, by straightforward calculations, we notice that

$hopt(t0)=hLSopt(t0)R2(q)1/7,hopt=hLSoptR2(q)1/7,$
(3.7)

where $hLSopt(t0)$ and $hLSopt$ are the corresponding optimal bandwidths of local quadratic least squares estimator. With the optimal bandwidths, we have

$MSEopt{m^LS′(t0)}MSEopt{m^′(t0)}→R2(q)−4/7,MISEopt(m^LS′)MISEopt(m^′)→R2(q)−4/7.$

Therefore, the asymptotic relative efficiency (ARE) of the local quadratic CQR estimator () with respect to the local quadratic least squares estimator $(m^LS′)$ is defined to be

$ARE(m^′,m^LS′)=R2(q)−4/7.$
(3.8)

The ARE only depends on the error distribution and it is scale invariant.

To gain insights into the asymptotic relative efficiency, we consider the limit when q is large. Zou & Yuan (2008) showed that

$limq→∞R2(q)−1>6eπ=0.7026.$

Immediately, we know that if using a large q, the ARE is bounded below by 0.70264/7 = 0.8173. Having a universal lower bound is very useful because it prohibits severe loss in efficiency when replacing the local quadratic least squares estimator with the local quadratic CQR estimator. One of our contributions in this work is to provide an improved sharper lower bound as shown in the following theorem.

Theorem 3.2

Let denote the class of error distributions with mean 0 and variance 1, then we have

$inff∈ℱlim⁡q→∞R2(q)−1=0.864.$
(3.9)

The lower bound is reached if and only if the error follows the rescaled Beta(2,2) distribution with mean zero and variance one. Thus

$limq→∞ARE(m^′,m^LS′)≥0.9199.$
(3.10)

It is interesting to note that Theorem 3.2 provides us the exact lower bound of $ARE(m^′,m^LS′)$ as q → ∞. Theorem 3.2 indicates that if q is large, even in the worst scenario the potential efficiency loss for the local CQR estimator is only 8.01%.

Theorem 3.2 implies that the local quadratic CQR estimator is a safe alternative to the local quadratic least squares estimator. It concerns the worst case scenario. There are many optimistic scenarios as well in which the ARE can be much bigger than 1. We examine the $ARE(m^′,m^LS′)$ for the error distributions considered in Table 1. We also list the results in Table 2, where the column labeled q = ∞ shows the theoretical limit of the $ARE(m^′,m^LS′)$. Obviously, these limits are all larger than the lower bound 0.9199. The local quadratic CQR estimator only loses less than 4% efficiency when the error distribution is normal and q = 9. It is interesting to see that for the other non-normal distributions the $ARE(m^′,m^LS′)$ is larger than 1 and its value is insensitive to the choice of q. For example, with q = 9, the AREs are already very close to their theoretical limits.

Comparisons of $ARE(m^′,m^LS′)$

4 Numerical comparisons and examples

In this section, we first use Monte Carlo simulation studies to assess the finite sample performance of the proposed estimation procedures and then demonstrate the application of the proposed method by using a real data example. Throughout this section we use the Epanechnikov kernel, i.e., $K(z)=34(1−z2)+$. We adopt the MM algorithm proposed by (Hunter & Lange 2000) for solving the local CQR smoothing estimator. All the numerical results are computed using our MATLAB code, which is available upon request.

4.1 Bandwidth selection in practical implementation

Bandwidth selection is an important issue in local smoothing. Here we briefly discuss the bandwidth selection issue in the local CQR smoothing estimator by using existing bandwidth selector for the local polynomial regression. Here we consider two bandwidth selectors.

1. The “pilot” selector. The idea is to use a pilot bandwidth in local cubic CQR (defined in section 5) to estimate m″(t) and m(t). The fitted residuals can be used to estimate R1(q) and R2(q). Thus, we can use the optimal bandwidth formula to estimate the optimal bandwidth and then refit the data.
2. A short-cut strategy. In our numerical studies, we compare the local CQR and local least squares estimators. Note that in (2.8) and (3.7) we obtain very neat relationships between the optimal bandwidths for the local CQR and local least squares estimators. The optimal bandwidth for the local least squares estimators can be selected by existing bandwidth selectors (see Chapter 4 of Fan & Gijbels (1996)). In addition, we are able to infer the factors R1(q) and R2(q) from the residuals of the local least squares fit. Sometimes, we even know the exact values of the two factors (e.g., in simulations). Therefore, after fitting the local least squares estimator with the optimal bandwidth, we can estimate the optimal bandwidth for the local CQR estimator.

We used the short-cut strategy in our simulation examples. However, if the error variance is infinite or very large, then the local least squares estimator performs poorly. The “pilot” selector is a better choice than the short-cut strategy.

4.2 Simulation examples

In our simulation studies, we compare the performance of the newly proposed method with the local polynomial least squares estimate. The bandwidth is set to the optimal one in which the $hLSopt$ is selected by a plug-in bandwidth selector (Ruppert, Sheather & Wand 1995). The performance of estimator (·) and (·) is assessed via the average squared errors (ASE), defined by $ASE(g^)=1ngrid∑k=1ngrid{g^(uk)−g(uk)}2$, with g equals either m(·) or m′(·), where {uk, k = 1, …, ngrid} are the grid points at which the functions {ĝ(·)} are evaluated. In our simulation, we set ngrid = 200 and grid points are evenly distributed over the interval at which the m(·) and m′(·) are estimated. We summarize our simulation results using the ratio of average squared errors (RASE), $RASE(g^)=ASE(g^LS)ASE(g^)$ for an estimator ĝ, where ĝLS is the local polynomial regression estimator under the least squares loss. We considered two simulation examples.

Example 4.1

We generated 400 data set, each consisting of n = 200 observations, from

$Y=sin⁡(2T)+2exp⁡(−16T2)+0.5ϵ,$
(4.1)

where T follows N(0, 1). This model is adopted from Fan & Gijbels (1992). In our simulation, we considered five error distributions for ϵ: N(0, 1), Laplace, t3 distribution, a mixture of two normals (0.95N(0,1) + 0.05N(0, σ2) with σ = 3,10). For the local polynomial CQR estimator, we consider q = 5, 9 and 19, and estimate m(·) and m′(·) over [−1.5, 1.5]. The mean and standard deviation of RASE over 400 simulations are summarized in Table 3. To see how the proposed estimate behaves at a typical point, Table 3 also depicts the biases and standard deviations of (t) and (t) at t = 0.75. In Table 3, CQR5, CQR9 and CQR19 correspond to the local CQR estimate with q = 5, 9 and 19, respectively.

Simulation Results for Example 4.1

Example 4.2

It is of interest to investigate the effect of heteroscedastic errors. To this end, we generated 400 simulation data sets, each consisting of n = 200 observations, from

$Y=Tsin⁡(2πT)+σ(T)ϵ,$
(4.2)

where T follows U(0, 1), σ(t) = {2 + cos(2πt)}/10, and ϵ is the same as that in Example 4.1. In this example, we estimate m(t) and m′(t) over [0,1]. The mean and standard deviation of RASE over 400 simulations are summarized in Table 4, in which we also show the biases and standard deviations of (t) and (t) at t = 0.4. The notation of Table 4 is the same as that in Table 3.

Simulation Results for Example 4.2.

Table 3 and Table 4 show very similar messages, although Table 4 indicates that the local CQR has more gains over the local least squares method. When the error follows the normal distribution, the RASEs of the local CQR estimators are slightly less than one. For non-normal distributions, the RASEs of the local CQR estimators can be greater than one, indicating the gain in efficiency. For estimating the regression function, CQR5 and CQR9 seem to have better overall performance than CQR19. For estimating the derivative, all three CQR estimators perform very similarly. These findings are consistent with the theoretical analysis of AREs.

4.3 A real data example

As an illustration, we now apply the proposed local CQR methodology to the U.K. Family Expenditure Survey data subset with high net-income, which consists of 363 observations. The scatter plot of data is depicted in the left panel of Figure 1. The data set was collected in the U.K. Family Expenditure Survey in 1973. Of interest is to study the relationship between the food expenditure and the net-income. Thus, we take the response variable Y to be the logarithm of the food expenditure, and the predictor variable T is the net-income.

The left panel is the scatter plot of data, the middle panel is the estimated regression function, and the right panel is the estimated derivative function.

We first estimated the regression function using the local least squares estimator with the plug-in bandwidth selector (Ruppert et al. 1995). We further employed the kernel density estimate to infer the error density f(·) based on the residuals from the local least squares estimator. Based on the estimated density, we estimated both R1(q) and R2(q), which were used to compute the bandwidth selector for the CQR estimator. For this example, the estimated ratios are close to 1, so we basically use the same bandwidths for these two methods. The selected bandwidths are 0.24 for regression estimation and 0.4 for derivative estimation. The CQR estimates with q = 5,9 and 19 with the selected bandwidths are evaluated. The CQR estimates with three different q’s are very similar, we only present the CQR estimate with q = 9 in Figure 1.

It is interesting to see from Figure 1 that the overall patten of the local least squares and the local CQR estimate are the same. The difference between the local least squares estimate and the local CQR estimate of the regression function becomes large when the net income around 2.8. From the scatter plot, there are two possible outlier observations: (2.7902,−2.5207) and (2.8063,−2.6105) (circled in the plot). To understand the impact of these two possible outliers, we re-evaluated the local CQR and the local least squares estimates after excluding these two possible outliers. The resulting estimates are depicted in the top panel of Figure 2, from which we can see that the local CQR estimate remains almost the same, while the local least squares estimate changes a lot. We also note that after removing these two possible outliers, the local least squares estimator becomes very close to the local CQR estimator. Furthermore, as a more extreme demonstration, we kept these two possible outliers in the data set and moved them to more extreme cases, i.e, we moved (2.7902,−2.5207) and (2.8063,−2.6105) to (2.7902,(2.7902,−26.5207) and (2.8063,−6.6105), respectively. After distorting the two observations, we re-calculated the local CQR and the local least squares estimate. The resulting estimates are depicted in the bottom panel of Figure 2, which clearly demonstrates that the local least squares estimate changes dramatically, while the local CQR estimate is nearly un-affected by the artificial data distortion.

Plot of estimated regression function and its derivative. The top panel is for the estimate removing the two possible outliers, and bottom panel is for the estimate moving the two possible outliers to more extreme cases. The left panel is for the estimated ...

5 Local p-polynomial CQR smoothing and proofs

In this section we establish asymptotic theory of the local p-polynomial CQR estimators. We then treat Theorems 2.1 and 3.1 as two special cases of the general theory. As a generalization of the local linear and local quadratic CQR estimators, the local p-polynomial CQR estimator is constructed by minimizing

$∑k=1q[∑i=1nρτk{yi−ak−∑j=1pbj(ti−t0)j}K(ti−t0h)],$
(5.1)

and the local p-polynomial CQR estimators of m(t0) and m(r)(t0) are given by

$m^(t0)=1q∑k=1qa^k,andm^(r)(t0)=r!b^r,r=1,⋯,p.$
(5.2)

For the asymptotic analysis, we need the following regularity conditions:

1. m(t) has a continuous (p + 2)th derivative in the neighborhood of t0.
2. fT(·), the marginal density function of T, is differentiable and positive in the neighborhood of t0.
3. The conditional variance σ2(t) is continuous in the neighborhood of t0.
4. Assume that the error has a symmetric distribution with a positive density f(·).

We choose the kernel function K such that K is a symmetric density function with finite support [−M, M]. The following notation is needed to present the asymptotic properties of the local p-polynomial CQR estimator. Let S11 be a q × q diagonal matrix with diagonal elements f(ck), k = 1, , q, S12 be a q × p matrix with (k, j)-element being f(ck)μj, k = 1, ,q and $j=1,⋯,p,S21=S12T$, and S22 be a p × p matrix with (j, j′)-element being $∑k=1qf(ck)μj+j′,forj,j′=1,⋯,p$. Similarly, Let Σ11 be a q × q matrix with (k, k′)-element ν0τkk′, k, k′ = 1, , q, Σ12 be a q×p matrix with (k, j)-element being $νj∑k′=1qτkk′,k=1,⋯,q$ and $j=1,⋯,p,Σ21=Σ12T$, and Σ22 be a p × p matrix with (j, j′)-element being $(∑k,k′=1qτkk′)νj+j′,forj,j′=1,⋯,p$. Define

$S=(S11S12S21S22),andΣ=(Σ11Σ12Σ21Σ22).$

Partition S−1 into four submatrices as follows

$S−1=(S11S12S21S22)−1=((S−1)11(S−1)12(S−1)21(S−1)22),$

where and hereafter, we use (·)11 to denote the left-top q × q submatrix and use (·)22 to denote the right-bottom p × p submatrix.

Furthermore, let $uk=nh{ak−m(t0)−σ(t0)ck},vj=hjnh{j!bj−m(j)(t0)}/j!$. Let xi = (tit0)/h, Ki = K(xi) and $Δi,k=uknh+∑j=1pvjxijnh$. Write di,k = ck[σ(ti) − σ(t0)] + ri,p with $ri,p=m(ti)−∑j=0pm(j)(t0)(ti−t0)j/j!$. $ηi,k*$ to be $I(ϵi≤ck−di,kσ(ti))−τk$. let $Wn*=(w11*,⋯,w1q*,w21*,⋯,w2p*)Twithw1k*=1nh∑i=1nKiηi,k*andw2j*=1nh∑k=1q∑i=1nKixijηi,k*$.

The asymptotic properties of the local p-polynomial CQR estimator are based on the following theorem.

Theorem 5.1

Denote n = (û1, …, û1, 1, …, p) be the minimizer of (5.1). Then under the regularity conditions (A)—(C), we have

$θ^n+σ(t0)fT(t0)S−1E(Wn*|T)→ℒMVN(0,σ2(t0)fT(t0)S−1ΣS−1).$

To prove theorem 5.1, we first establish Lemmas 5.2—5.3.

Lemma 5.2

Minimizing (5.1) is equivalent to minimizing

$∑k=1quk(∑i=1nKiηi,k*nh)+∑j=1pvj(∑k=1q∑i=1nKixijηi,k*nh)+∑k=1qBn,k(θ)$

with respect to θ = (u1, , uq, ν1, , νp)T, where

$Bn,k(θ)=∑i=1n{Ki∫0Δi,k[I(ϵi≤ck−di,kσ(ti)+zσ(ti))−I(ϵi≤ck−di,kσ(ti))]dz}.$

Proof. To apply the identity (Knight 1998)

$ρτ(x−y)−ρτ(x)=y(I(x≤0)−τ)+∫0y{I(x≤z)−I(x≤0)}dz,$
(5.3)

we write $yi−ak−∑j=1pbi(ti−t0)j=σ(ti)(ϵi−ck)+di,k−Δi,k$. Minimizing (5.1) is equivalent to minimizing

$Ln(θ)=∑i=1n{Ki∑k=1q[ρτk(σ(ti)(ϵi−ck)+di,k−Δi,k)−ρτk(σ(ti)(ϵi−ck)+di,k)]}.$

Using the identity (5.3) and with some straightforward calculations, it follows that

$Ln(θ)=∑k=1quk(∑i=1nKiηi,k*nh)+∑j=1pvj(∑k=1q∑i=1nKixijηi,k*nh)+∑k=1qBn,k(θ).$

This completes the proof.

Let Sn,11 be a q × q diagonal matrix with diagonal elements $f(ck)∑i=1nKinhσ(ti),k=1,⋯,q$; Sn,12 be a q × p matrix with (k, j)-element $f(ck)∑i=1nKixijnhσ(ti),j=1,⋯,p$; Sn,22 be a p × p matrix with (j, j′) element $∑k=1qf(ck)∑i=1nKixij+j′nhσ(ti)$. Denote

$Sn=(Sn,11Sn,12Sn,12TSn,22).$

Lemma 5.3

Under Conditions (A) – (C), $Ln(θ)=12θTSnθ+(Wn*)Tθ+op(1)$.

Proof. Write Ln(θ) as

$Ln(θ)=∑k=1quk(∑i=1nKiηi,k*nh)+∑j=1pvj(∑k=1q∑i=1nKixijηi,k*nh)+∑k=1qEϵ[Bn,k(θ)|T]+∑k=1qRn,k(θ),$

where Rn,k(θ) = Bn,k(θ) − Eϵ[Bn,k(θ)|T]. Using F(ck + z) − F(ck) = zf(ck) + o(z), then $∑k=1qEϵ[Bn,k(θ)|T]$ equals

We now prove Rn,k(θ) = op(1). It is sufficient to show Varϵ[Bn,k(θ)|T] = op(1). In fact,

Proof of Theorem 5.1

Similar to Parzen (1962), we have $1nh∑i=1nKixij→PfT(t0)μj,where→P$ stands for convergence in probability. Thus,

$Sn→PfT(t0)σ(t0)S=fT(t0)σ(t0)(S11S12S21S22).$

This, together with Lemmas 5.2, 5.3, leads to

$Ln(θ)=12fT(t0)σ(t0)θTSθ+(Wn*)Tθ+op(1).$

Since the convex function $Ln(θ)−(Wn*)Tθ$ converges in probability to the convex function $12fT(t0)σ(t0)θTSθ$, it follows from the convexity lemma (Pollard 1991) that for any compact set Θ, the quadratic approximation to Ln(θ) holds uniformly for θ in any compact set, which leads to

$θ^n=−σ(t0)fT(t0)S−1Wn*+op(1).$

Denote ηi,k = Iick)−τk and Wn = (w11, , w1q,, w21, , w2p)T with $w1k=1nh∑i=1nKiηi,k$ and $w2j=1nh∑k=1q∑i=1nKixijηi,k$. By the Cramer-Wald theorem, it is easy to see that the CLT for Wn|T holds

$Wn|T−E[Wn|T]Var[Wn|T]→ℒMVN(0,I(p+q)×(p+q)).$
(5.4)

Note that Cov(ηi,k, ηi,k′) = τkk′,Cov(ηi,k, ηj,k′) = 0, if ij. Similar to Parzen (1962), we have $1nh∑i=1nKi2xij→PfT(t0)νj,Therefore,Var[Wn|T]→PfT(t0)Σ$. Combined with (5.4), we have $Wn|T→ℒMVN(0,fT(t0)Σ)$. Moreover, we have $Var(w1k*−w1k|T)=1nh∑i=1nKi2Var(ηi,k*−ηi,k)≤1nh∑i=1nKi2{F(ck+|di,k|σ(ti))−F(ck)}=op(1)and alsoVar(w2j*−w2j|T)=1nh∑i=1nKi2xijVar(∑k=1qηi,k*−ηi,k)≤q2nh∑i=1nKi2xijmax⁡k{F(ck+|di,k|σ(ti))−F(c)}=op(1),thusVar(Wn*−Wn|T)=op(1)$. So by Slutsky’s theorem, conditioning on T, we have $Wn*|T−E(Wn*|T)→ℒMVN(0,fT(t0)Σ)$. Therefore,

$θ^n+σ(t0)fT(t0)S−1E(Wn*|T)→ℒMVN(0,σ2(t0)fT(t0)S−1ΣS−1).$
(5.5)

This completes the proof.

Proof of Theorem 2.1

The asymptotic normality follows Theorem 5.1 with p = 1. Let us calculate the conditional bias and variance, respectively. Denote by eq×1 the vector that contains q 1’s. When p = 1, S is a diagonal matrix with diagonal elements $f(c1),⋯,f(cq),μ2∑k=1qf(ck)$. So the asymptotic conditional bias of $m^(t0)=1q∑k=1qa^k$ is

$Bias(m^(t0)|T)=1qσ(t0)∑k=1qck−1q·nhσ(t0)fT(t0)eq×1T(S−1)11E(W1n*|T)=1qσ(t0)∑k=1qck−1q·nhσ(t0)fT(t0)∑i=1nKi∑k=1q1f(ck){F(ck−di,kσ(ti))−F(ck)}.$

Note that the error is symmetric, thus $∑k=1qck=0$, and furthermore, it is easy to check that $1q∑k=1q1f(ck){F(ck−di,kσ(ti))−F(ck)}=−ri,pσ(ti){1+op(1)}$. Therefore,

$Bias(m^(t0)|T)=1nhσ(t0)fT(t0)∑i=1nKiri,pσ(ti){1+op(1)}.$

By using the fact that

$1nh∑i=1nKiri,pσ(ti)=fT(t0)m″(t0)2σ(t0)μ2h2{1+op(1)},$

we obtain

$Bias(m^(t0)|T)=12m″(t0)μ2h2+op(h2).$
(5.6)

Furthermore, the conditional variance of (t0) is

$Var(m^(t0)|T)=1nhσ2(t0)fT(t0)1q2eq×1T(S−1ΣS−1)11eq×1+op(1nh)=1nhν0σ2(t0)fT(t0)R1(q)+op(1nh).$
(5.7)

By using Theorem 5.1, we can further derive the asymptotic bias and variance of (t0) given in (2.3):

$Bias(m˜′(t0)|T)=16(m‴(t0)+3m″(t0)f′T(t0)fT(t0))μ4μ2h2+op(h2),$
(5.8)

$Var(m˜′(t0)|T)=1nh3ν2σ2(t0)μ22fT(t0)R2(q)+op(1nh3).$
(5.9)

Proof of Theorem 2.2

Note that

$limq→∞R1(q)=∫01∫01s1∧s2−s1s2f(F−1(s1))f(F−1(s2))ds1ds2=∫−∞∞∫−∞∞(F(z1)∧F(z2)−F(z1)F(z2))dz1dz2.$
(5.10)

by change of variables. Define two functions below $G(s)=∫−∞sF(t)dtandH(s)=∫−∞sG(t)dt$. It is easy to verify that

$G(s)=∫−∞s(s−x)f(x)dx=sF(s)−k1(s),$
(5.11)

where $k1(s)=∫−∞sxf(x)dx$. Similarly, we obtain

$2H(s)=∫−∞s(s−x)2f(x)dx=s2F(s)−2sk1(s)+k2(s),$
(5.12)

where $k2(s)=∫−∞sx2f(x)dx$. Let I be the integral in (5.10). We have that I equals

$2∫−∞∞(∫z1∞f(t)dt)G(z1)dz1=2∫−∞∞f(t)(∫−∞tG(z1)dz1)dt=∫−∞∞2f(t)H(t)dt.$
(5.13)

By the definition of G and H, we know $d(2H(t)F(t)−G2(t))dt=2H(t)f(t)$; and combining (5.11) and (5.12) yields $2H(t)F(t)−G2(t)=k2(t)F(t)−k12(t)$. Now it is easy to see that I equals 1, by the facts that $∫−∞∞x2f(x)dx=EF[ϵ2]=1and∫−∞∞xf(x)dx=EF[ϵ]=0$.

Proof of Theorem 3.1

We apply Theorem 5.1 to get the asymptotic normality. Denote by er the p-vector (0, 0, , 1, 0, , 0)T with 1 on the rth position. When p = 2, S12 has the following forms $S12=(0q×1,μ2(f(ck))q×1),andS22=diag(μ2∑k=1qf(ck),μ4∑k=1qf(ck)).SinceS11=diag(f(c1),⋯,f(cq)),(S−1)22=(S22−S21S11−1S12)−1=diag((μ2∑k=1qf(ck))−1,{(μ4−μ22)∑k=1qf(ck)}−1)$.

Note that $(S−1)21=−(S−1)22S21S11−1$. Thus, (S−1)21 equals $(0q×1,[μ2/{(μ4−μ22)∑k=1qf(ck)}]1q×1)T$. By Theorem 5.1

$Bias(m^′(t0)|T)=−σ(t0)hfT(t0)1nhe1T{(S−1)21E(W1n*|T)+(S−1)22E(W2n*|T)}=−σ(t0)hfT(t0)1μ2∑k=1qf(ck)1nhE(w21*|T).$

Note that $E(w2j*|T)=1nh∑i=1nKixij∑k=1q{F(ck−di,kσ(ti))−F(ck)}$. Similarly, under condition (D), we have $∑k=1q{F(ck−di,kσ(ti))−F(ck)}=−∑k=1qf(ck)⋅ri,pσ(ti){1+op(1)}$. Therefore, Bias((t0|T) is equal to $1nh2σ(t0)fT(t0)∑i=1nKixiri,pσ(ti){1+op(1)}$. For p = 2,

$1nh∑i=1nKixiri,pσ(ti)=fT(t0)m‴(t0)6σ(t0)μ4μ2h3{1+op(1)},$

we obtain

$Bias(m^′(t0)|T)=16m‴(t0)μ4μ2h2+op(h2).$
(5.14)

Furthermore, the conditional variance of (t0) is

$Var(m^′(t0)|T)=1nh3σ2(t0)fT(t0)e1T(S−1ΣS−1)22e1+op(1nh3),=1nh3ν2σ2(t0)μ22fT(t0)R2(q)+op(1nh3).$
(5.15)

which completes the proof.

Proof of Theorem 3.2

From Zou & Yuan (2008), we know that

$lim⁡q→∞(∑k=1qf(ck))2/(∑k=1q∑k′=1qτkk′)=12EF2[f(ϵ)]=12(∫f2(x)dx)2.$

Thus $limq→∞1R2(q)=12(∫f2(x)dx)2$. We notice that 12 (∫ f2(x)dx)2 is also the asymptotic Pitman efficiency of the Wilcoxon test relative to the t-test Hodges & Lehmann (1956). For the rest of the proof, readers are referred to Hodges & Lehmann (1956).

6 Discussion

In this paper our theoretical analysis deals with the classical setting in which t0 is an interior point and the error distribution has finite variance. We should point out here that the same arguments hold for estimating boundary points and the proposed methodology is valid even when the error variance is infinite.

• Automatic boundary correction. For simplicity, consider t ϵ [0, 1] and t0 = ch for some constant c. We show that the leading team of the asymptotic bias of the local linear/quadratic CQR estimator is the same as that of the local linear/quadratic LS estimator, which indicates that the local CQR estimator enjoys the property of automatical boundary correction, a nice property of local LS estimator. Furthermore, the asymptotic relative efficiency remains exactly the same as that for interior points.
• Infinite error variance. We show that the local CQR estimator still enjoys the optimal rate of convergence and asymptotic normality even when the conditional variance is infinite. This property can be important for real applications, since we have no information on the error distribution in practice.

For detailed theoretical proof of the above claims, we refer interested readers to a supplementary file (Kai, Li & Zou 2009) of this paper, where we also provide additional simulation results to support the theory. We opt not to show these results here due to space limit.

In this paper, we focus on the local CQR estimate for the nonparametric regression model. The proposed methodology and theory may be extended to the settings in presence of multivariate covariates by considering varying coefficient models, additive models or semiparametric models. Such extensions are of great interest, and further research is needed for such extensions.

Finally, we would like to point out that the local CQR procedure is efficiently implemented using the MM algorithm. Our experiences show that for q = 9 and sample size n = 7000, the local CQR fit at a given location can be computed within 0.32 seconds on an AMD 1.9GHz machine. The MM implementation seems to be more efficient than the standard linear programming. We discuss the computing algorithm in details in a separate article.

Acknowledgements

The authors are grateful to the editor, the associate editor and two referees for their helpful and constructive comments, which lead to a substantial improvement of the quality of this paper.

Kai’s research is supported by NIDA, NIH grants R21 DA024260 and P50 DA10075 as a research assistant. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIDA or the NIH.

Li’s research is supported by National Science Foundation grants DMS 0348869 and DMS 0722351.

Zou’s research is supported by National Science Foundation grant DMS 0706733.

References

• Chu C-K, Marron JS. Choosing a kernel regression estimator. Statist. Sci. 1991;6(4):404–436.
• Fan J. Design-adaptive nonparametric regression. J. Amer. Statist. Assoc. 1992;87(420):998–1004.
• Fan J, Gijbels I. Variable bandwidth and local linear regression smoothers. Ann. Statist. 1992;20(4):2008–2036.
• Fan J, Gijbels I. Local polynomial modelling and its applications. London: Chapman & Hall; 1996.
• Fan J, Hu TC, Truong YK. Robust non-parametric function estimation. Scand. J. Statist. 1994;21(4):433–446.
• Hodges J, Lehmann E. The efficiency of some nonparametric competitors of the t-test. Ann. Math. Stat. 1956;27(2):324–335.
• Hunter DR, Lange K. Quantile regression via an MM algorithm. Journal of Computational and Graphical Statistics. 2000;9(1):60–77.
• Kai B, Li R, Zou H. Supplementary materials for “local cqr smoothing: An efficient and safe alternative to local polynomial regression” 2009 Technical report, http://www.stat.psu.edu/rli/research/Supplement-of-localCQR.pdf. [PubMed]
• Knight K. Limiting distributions for L1 regression estimators under general conditions. Ann. Statist. 1998;26(2):755–770.
• Koenker R. A note on l-estimates for linear models. Stat. and Prob. Letters. 1984;2(6):323–325.
• Koenker R. Quantile regression. Cambridge: Cambridge University Press; 2005.
• Parzen E. On estimation of a probability density function and mode. Ann. Math. Statist. 1962;33:1065–1076.
• Pollard D. Asymptotics for least absolute deviation regression estimators. Econometric Theory. 1991;7(2):186–199.
• Ruppert D, Sheather SJ, Wand MP. An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc. 1995;90(432):1257–1270.
• Welsh AH. Robust estimation of smooth regression and spread functions and their derivatives. Statist. Sinica. 1996;6(2):347–366.
• Yu K, Jones MC. Local linear quantile regression. J. Amer. Statist. Assoc. 1998;93(441):228–237.
• Zou H, Yuan M. Composite quantile regression and the oracle model selection theory. The Annals of Statistics. 2008;36(3):1108–1126.

 PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers.