
Acta Math Appl Sin. Author manuscript; available in PMC 2010 July 1.

Published in final edited form as:

Acta Math Appl Sin. 2009 July 1; 25(3): 427–444.

doi: 10.1007/s10255-008-8813-3

PMCID: PMC2779551

NIHMSID: NIHMS104023


**Abstract**

In many statistical applications, data are collected over time and are likely correlated. In this paper, we investigate how to incorporate the correlation information into local linear regression. Under the assumption that the error process is an autoregressive (AR) process, a new estimation procedure is proposed for the nonparametric regression model using local linear regression and profile least squares techniques. We further propose the SCAD-penalized profile least squares method to determine the order of the autoregressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedures and to compare them with an existing one. In our empirical studies, the newly proposed procedures dramatically improve the accuracy of naive local linear regression with a working-independence error structure. We illustrate the proposed methodology by an analysis of a real data set.

**1 Introduction**

When data are correlated, it is of great interest to improve the efficiency of parameter estimates by including the correlation information in the estimation procedure. This issue has been well studied for longitudinal and panel data. The generalized method of moments (GMM, Hansen, 1982), the generalized estimating equation (GEE, Liang and Zeger, 1986; Zeger and Liang, 1986) and the quadratic inference function (QIF, Qu, Lindsay and Li, 2000) are well-known methods for incorporating the correlation information into estimation procedures for parametric regression models with longitudinal data. Lin and Carroll (2000) showed that kernel GEE, a direct kernel extension of the parametric GEE, fails to incorporate the correlation information into the kernel estimate of the nonparametric function for clustered/longitudinal data. Wang (2003) proposed the marginal kernel method for longitudinal data, which achieves efficiency by incorporating the true correlation structure. Fan, Huang and Li (2007) proposed the idea of minimizing generalized variance (MGV) to improve the efficiency of estimates of the nonparametric regression function in the context of longitudinal data using working independence.

Beyond the setting of longitudinal data, many authors have studied nonparametric regression with correlated errors. A good review of this topic is given in Opsomer, Wang and Yang (2001), in which attention is paid to the nonparametric regression model with fixed designs:

$${y}_{t}=m\left({x}_{t}\right)+{\epsilon}_{t},$$

(1.1)

where $x_t$ is a fixed design point and $\epsilon_t$ is a correlated random error with mean zero (see also Altman, 1990; Hart, 1991; Opsomer, 1995).

In this paper, we consider the situation in which $x_t$ is a random design. More specifically, it is assumed that $(x_t, \epsilon_t)$, $t = 1, 2, \ldots$, is a strictly stationary process; the precise regularity conditions are given in Section 4.

The remainder of this paper is organized as follows. In Section 2, we propose the new estimation procedure and discuss issues related to its practical implementation. Section 3 presents numerical comparisons and the analysis of a real data example. Regularity conditions and technical proofs are given in Section 4. Section 5 offers a discussion and concluding remarks.

**2 A New Estimation Procedure**

Suppose that $(x_t, y_t)$, $t = 1, \ldots, n$, is an observed sample from the model

$${y}_{t}=m\left({x}_{t}\right)+{\epsilon}_{t},$$

(2.1)

where the error process $\epsilon_t$ is a correlated random error with mean zero. Throughout this paper, it is assumed that the error process $\epsilon_t$ is an autoregressive (AR) process of order $d$:

$${\epsilon}_{t}={\beta}_{1}{\epsilon}_{t-1}+\dots +{\beta}_{d}{\epsilon}_{t-d}+{\eta}_{t},$$

where $\eta_t$ is an independent and identically distributed random error with mean zero and variance $\sigma^2$. Thus, the model can be rewritten as

$${y}_{t}=m({x}_{t})+{\beta}_{1}{\epsilon}_{t-1}+\dots +{\beta}_{d}{\epsilon}_{t-d}+{\eta}_{t}.$$

In practice, $\epsilon_t$ is not available, but it may be estimated by $\hat\epsilon_t = y_t - \hat m_I(x_t)$, where $\hat m_I(\cdot)$ is an initial estimate of $m(\cdot)$ obtained by local linear regression pretending the data are independent.

Replacing the $\epsilon_t$'s with their estimates $\hat\epsilon_t$'s, we obtain the following approximate partially linear model:

$${y}_{t}=m\left({x}_{t}\right)+{\mathbf{e}}_{t}^{T}\mathit{\beta}+{\eta}_{t},$$

(2.2)

where $\mathbf{e}_t = (\hat\epsilon_{t-1}, \ldots, \hat\epsilon_{t-d})^T$ and $\boldsymbol\beta = (\beta_1, \ldots, \beta_d)^T$.

As to the partially linear model (2.2), various estimation procedures exist, including partial spline estimates (Wahba, 1984; Heckman, 1986; Engle et al., 1986), the partial residual method (Speckman, 1988) and the profile least squares or likelihood method (Severini and Staniswalis, 1994). Here we employ the profile least squares technique to estimate $\boldsymbol\beta$ and $m(\cdot)$.

For given $\boldsymbol\beta$, denote ${y}_{t}^{\ast}={y}_{t}-{\mathbf{e}}_{t}^{T}\boldsymbol\beta$ for $t = d+1, \ldots, n$. Then model (2.2) can be written as

$${y}_{t}^{\ast}=m\left({x}_{t}\right)+{\eta}_{t}$$

(2.3)

which is a one-dimensional nonparametric regression model. We may employ existing linear smoothers, such as local polynomial regression and smoothing splines (Gu, 2002), to estimate $m(\cdot)$. Here we employ local linear regression. For a given $x_0$, we locally approximate the regression function by

$$m\left(x\right)\approx m\left({x}_{0}\right)+{m}^{\prime}\left({x}_{0}\right)\left(x-{x}_{0}\right)\stackrel{\wedge}{=}a+b\left(x-{x}_{0}\right)$$

for $x$ in a local neighborhood of $x_0$. Thus, the local linear estimate of $m(\cdot)$ is obtained from the minimizer of the following weighted least squares function:

$${\left(\widehat{a},\widehat{b}\right)}^{T}={\mathrm{argmin}}_{\left(a,b\right)}\phantom{\rule{1em}{0ex}}\sum _{t=d+1}^{n}{\left\{{y}_{t}^{\ast}-a-b\left({x}_{t}-{x}_{0}\right)\right\}}^{2}{K}_{h}\left({x}_{t}-{x}_{0}\right),$$

where $K_h(\cdot) = h^{-1}K(\cdot/h)$ with a kernel function $K$ and a bandwidth $h$. The local linear estimate of $m(x_0)$ is $\hat m(x_0) = \hat a$. Since local linear regression is a linear smoother, the estimate can be written in matrix form as

$$\widehat{\mathbf{m}}={S}_{h}{\mathbf{y}}^{\ast},$$

(2.4)

where $S_h$ is an $(n-d)\times(n-d)$ smoothing matrix that depends only on $\{x_t\}$ and the bandwidth $h$, $\widehat{\mathbf m} = (\hat m(x_{d+1}), \ldots, \hat m(x_n))^T$ and $\mathbf{y}^{\ast} = (y_{d+1}^{\ast}, \ldots, y_n^{\ast})^T$.

Substituting $m(x_t)$ in (2.2) by its estimate in (2.4), we obtain the linear model

$$\left(I-{S}_{h}\right)\mathbf{y}=\left(I-{S}_{h}\right)\mathbf{E}\mathit{\beta}+\mathit{\eta},$$

where $I$ is the identity matrix, $\mathbf{E} = (\mathbf{e}_{d+1}, \ldots, \mathbf{e}_n)^T$, $\boldsymbol\eta = (\eta_{d+1}, \ldots, \eta_n)^T$ and $\mathbf{y} = (y_{d+1}, \ldots, y_n)^T$. Thus, the profile least squares estimates of $\boldsymbol\beta$ and $\mathbf{m}$ are given by

$$\widehat{\boldsymbol\beta}={\{{\mathbf{E}}^{T}{(I-{S}_{h})}^{T}(I-{S}_{h})\mathbf{E}\}}^{-1}{\mathbf{E}}^{T}{(I-{S}_{h})}^{T}(I-{S}_{h})\mathbf{y},$$

(2.5)

and

$$\widehat{\mathbf{m}}={S}_{h}\left(\mathbf{y}-\mathbf{E}\widehat{\mathit{\beta}}\right),$$

(2.6)

respectively.
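To make the procedure concrete, the following minimal numpy sketch implements (2.4)-(2.6): it builds the local linear smoothing matrix and computes the profile least squares estimates. The Epanechnikov kernel, all function names and the interface are illustrative assumptions of this sketch, not specifications from the paper.

```python
# A minimal sketch of (2.4)-(2.6); names and kernel choice are assumptions.
import numpy as np

def local_linear_smoother(x, h):
    """Build the local linear smoothing matrix S_h for the design points x."""
    n = len(x)
    S = np.zeros((n, n))
    for i, x0 in enumerate(x):
        u = (x - x0) / h
        w = np.maximum(0.75 * (1.0 - u ** 2), 0.0) / h  # Epanechnikov K_h
        D = np.column_stack([np.ones(n), x - x0])       # local design D_x
        WD = w[:, None] * D                             # W_x D_x
        # i-th row of S_h is [1, 0] {D_x^T W_x D_x}^{-1} D_x^T W_x
        S[i] = np.linalg.solve(D.T @ WD, WD.T)[0]
    return S

def profile_least_squares(y, x, eps_hat, d, h):
    """Profile least squares estimates (2.5) and (2.6) of beta and m."""
    n = len(y)
    t = np.arange(d, n)                                  # t = d+1, ..., n
    E = np.column_stack([eps_hat[t - j] for j in range(1, d + 1)])
    S = local_linear_smoother(x[t], h)
    R = np.eye(len(t)) - S                               # I - S_h
    RE, Ry = R @ E, R @ y[t]
    beta_hat = np.linalg.solve(RE.T @ RE, RE.T @ Ry)     # (2.5)
    m_hat = S @ (y[t] - E @ beta_hat)                    # (2.6)
    return beta_hat, m_hat
```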

The asymptotic distribution of $\widehat{\boldsymbol\beta}$ and the asymptotic bias and variance of $\hat m(x_0)$ are given in the following theorem. Denote $\mu_i = \int x^i K(x)\,\mathrm{d}x$ and $\nu_i = \int x^i K^2(x)\,\mathrm{d}x$.

**Theorem 1.** *Suppose that Conditions A—G listed in Section 4 hold. Then*

(a) *The asymptotic distribution of* $\widehat{\boldsymbol\beta}$ *is given by*

$$\sqrt{n}(\widehat{\boldsymbol\beta}-\boldsymbol\beta)\to N(0,{\sigma}^{2}{\{E(\mathbf{f}{\mathbf{f}}^{T})\}}^{-1}),$$

*where* $\mathbf{f}_t=({\epsilon}_{t-1},\ldots,{\epsilon}_{t-d})^{T}$ *and* ${\sigma}^{2}=\mathrm{var}({\eta}_{t})$.

(b) *The asymptotic distribution of* $\hat{m}({x}_{0},\widehat{\boldsymbol\beta})$, *conditioning on* $x_1,\ldots,x_n$, *is given by*

$$\sqrt{nh}\{\hat{m}({x}_{0},\widehat{\boldsymbol\beta})-m({x}_{0})-\frac{1}{2}{\mu}_{2}{m}''({x}_{0}){h}^{2}\}\to N\Big(0,\frac{{\nu}_{0}{\sigma}^{2}}{f({x}_{0})}\Big),$$

*where* $f(x)$ *is the density function of* $x_t$.

Note that the asymptotic distribution of $\widehat{\boldsymbol\beta}$ is the same as that of the Yule-Walker estimator for the AR model

$${\epsilon}_{t}={\beta}_{1}{\epsilon}_{t-1}+\dots +{\beta}_{d}{\epsilon}_{t-d}+{\eta}_{t}.$$

(see Theorem 8.1.1 of Brockwell and Davis, 1991). In other words, Theorem 1 implies that $\widehat{\boldsymbol\beta}$ is as efficient as if one knew the true regression function $m(\cdot)$ in advance. The asymptotic bias and variance of $\hat m(\cdot,\widehat{\boldsymbol\beta})$ are the same as those of local linear regression for independent and identically distributed observations. This implies that the proposed profile least squares estimate is highly efficient.

To implement the profile least squares estimation procedure, we have to determine the order of the AR process. In practice, we may start with a large-order AR process and then apply a variable selection procedure to select its order. The penalized likelihood procedure with the smoothly clipped absolute deviation (SCAD) penalty was proposed for variable selection in parametric models by Fan and Li (2001). The SCAD procedure is distinguished from traditional variable selection procedures, such as stepwise regression and best subset selection with the AIC or BIC, in that it selects significant variables and estimates their coefficients simultaneously. Thus, it can be directly applied to high-dimensional data analysis. The SCAD procedure was further developed for the partially linear model with longitudinal data in Fan and Li (2004). In this section, we apply the SCAD procedure to determine the complexity of the AR process.

The SCAD penalized least squares function is defined to be

$$\frac{1}{2}\sum _{t=d+1}^{n}{\{{y}_{t}-m({x}_{t})-{\mathbf{e}}_{t}^{T}\boldsymbol\beta\}}^{2}+n\sum _{j=1}^{d}{p}_{{\lambda}_{j}}(|{\beta}_{j}|),$$

where ${p}_{\lambda}(|\beta|)$ is the SCAD penalty with a tuning parameter $\lambda$, defined by

$${p}_{\lambda}(|\beta|)=\begin{cases}\lambda|\beta|, & \text{if } 0\le|\beta|<\lambda;\\[4pt] \dfrac{({a}^{2}-1){\lambda}^{2}-{(|\beta|-a\lambda)}^{2}}{2(a-1)}, & \text{if } \lambda\le|\beta|<a\lambda;\\[4pt] \dfrac{(a+1){\lambda}^{2}}{2}, & \text{if } |\beta|\ge a\lambda.\end{cases}$$

Fan and Li (2001) suggested fixing *a* = 3.7 from a Bayesian argument.
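The SCAD penalty and its derivative are simple piecewise formulas. The following sketch (names illustrative, $a = 3.7$ fixed as suggested above) evaluates both; the derivative on $[\lambda, a\lambda)$ is $(a\lambda - |\beta|)/(a-1)$, obtained by differentiating the middle branch.

```python
# SCAD penalty and derivative; a minimal sketch with illustrative names.
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    b = np.abs(beta)
    mid = ((a ** 2 - 1) * lam ** 2 - (b - a * lam) ** 2) / (2 * (a - 1))
    return np.where(b < lam, lam * b,
                    np.where(b < a * lam, mid, (a + 1) * lam ** 2 / 2))

def scad_derivative(beta, lam, a=3.7):
    """p'_lam(|b|): lam on [0, lam); (a*lam - |b|)/(a - 1) on [lam, a*lam); 0 beyond."""
    b = np.abs(beta)
    return np.where(b < lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1))

# Continuity check: both adjacent branches give lam**2 at |beta| = lam,
# and (a + 1) * lam**2 / 2 at |beta| = a * lam.
```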

Applying the profile technique to the penalized least squares, we can derive the penalized profile least squares estimate, the minimizer of the following penalized least squares function:

$$\frac{1}{2}\Vert (I-{S}_{h})\mathbf{y}-(I-{S}_{h})\mathbf{E}\boldsymbol\beta{\Vert}^{2}+n\sum _{j=1}^{d}{p}_{{\lambda}_{j}}(|{\beta}_{j}|).$$

(2.7)

As demonstrated in Fan and Li (2004), with a proper choice of the tuning parameters, the resulting estimate contains some exact zero coefficients. This is equivalent to excluding the corresponding terms from the selected model and thus reduces the model complexity. Since the SCAD penalty function is nonconvex over $[0, \infty)$, minimizing the SCAD penalized profile least squares function is challenging. Following Fan and Li (2004), we employ the local quadratic approximation (LQA) for the SCAD penalty function. Suppose that at the $k$-th step of the iteration we have an estimate $\beta_j^{(k)}$ close to the minimizer. Then the derivative of the penalty function can be locally approximated by

$${[{p}_{{\lambda}_{j}}(|{\beta}_{j}|)]}^{\prime}={p}_{{\lambda}_{j}}^{\prime}(|{\beta}_{j}|)\,\mathrm{sgn}({\beta}_{j})\approx \{{p}_{{\lambda}_{j}}^{\prime}(|{\beta}_{j}^{(k)}|)/|{\beta}_{j}^{(k)}|\}\,{\beta}_{j}.$$

This is equivalent to

$${p}_{{\lambda}_{j}}(|{\beta}_{j}|)\approx {p}_{{\lambda}_{j}}(|{\beta}_{j}^{(k)}|)+\frac{1}{2}\{{p}_{{\lambda}_{j}}^{\prime}(|{\beta}_{j}^{(k)}|)/|{\beta}_{j}^{(k)}|\}({\beta}_{j}^{2}-{\beta}_{j}^{(k)2}).$$

With the aid of LQA, we may employ the following iterative ridge regression to find the minimizer of (2.7):

$${\boldsymbol\beta}^{(k+1)}={\{{\mathbf{E}}^{T}{(I-{S}_{h})}^{T}(I-{S}_{h})\mathbf{E}+n{\mathrm{\Sigma}}_{\lambda}({\boldsymbol\beta}^{(k)})\}}^{-1}{\mathbf{E}}^{T}{(I-{S}_{h})}^{T}(I-{S}_{h})\mathbf{y},$$

(2.8)

where ${\mathrm{\Sigma}}_{\lambda}({\boldsymbol\beta}^{(k)})=\mathrm{diag}\{{p}_{{\lambda}_{1}}^{\prime}(|{\beta}_{1}^{(k)}|)/|{\beta}_{1}^{(k)}|,\ldots,{p}_{{\lambda}_{d}}^{\prime}(|{\beta}_{d}^{(k)}|)/|{\beta}_{d}^{(k)}|\}$ for nonvanishing components of ${\boldsymbol\beta}^{(k)}$.
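A sketch of the iterative ridge regression (2.8) under the LQA follows. For simplicity it uses a single common $\lambda$ rather than coordinate-specific $\lambda_j$'s, starts from the unpenalized profile least squares estimate, and sets near-zero coefficients exactly to zero on exit; the thresholds and names are assumptions of this illustration.

```python
# Iterative ridge regression (2.8) under the LQA; a minimal sketch.
import numpy as np

def scad_derivative(beta, lam, a=3.7):
    b = np.abs(beta)
    return np.where(b < lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1))

def lqa_scad(RE, Ry, lam, n, n_iter=100, tol=1e-8, thresh=1e-6):
    """Minimize (2.7); RE = (I - S_h)E and Ry = (I - S_h)y are precomputed."""
    G, g = RE.T @ RE, RE.T @ Ry
    beta = np.linalg.solve(G, g)                 # unpenalized starting value
    for _ in range(n_iter):
        absb = np.maximum(np.abs(beta), thresh)  # guard against division by 0
        Sigma = np.diag(scad_derivative(absb, lam) / absb)
        beta_new = np.linalg.solve(G + n * Sigma, g)  # ridge step (2.8)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    beta[np.abs(beta) < thresh] = 0.0            # exact zeros => order selection
    return beta
```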

In this section, we address how to determine $\boldsymbol\lambda$ in the SCAD procedure and how to select a bandwidth for the profile least squares estimation procedure, two important issues in the practical implementation of the proposed methodology.

In the implementation of the SCAD procedure, we need to choose the tuning parameters $\boldsymbol\lambda = (\lambda_1, \ldots, \lambda_d)^T$. Here we select $\boldsymbol\lambda$ by minimizing a BIC criterion (Wang, Li and Tsai, 2007), in which the effective number of parameters is

$$e(\boldsymbol\lambda)=\mathrm{tr}[{\{\tilde{D}+{\mathrm{\Sigma}}_{\boldsymbol\lambda}(\widehat{\boldsymbol\beta})\}}^{-1}\tilde{D}],$$

where $\tilde{D}={\mathbf{E}}^{T}{(I-{S}_{h})}^{T}(I-{S}_{h})\mathbf{E}$ and ${\mathrm{\Sigma}}_{\boldsymbol\lambda}(\widehat{\boldsymbol\beta})$ is the diagonal matrix defined in (2.8).

The BIC score is defined to be

$$\mathit{BIC}\left(\mathit{\lambda}\right)=\mathrm{log}\left\{\frac{\mathit{RSS}\left(\mathit{\lambda}\right)}{n}\right\}+e\left(\mathit{\lambda}\right)\frac{\mathrm{log}n}{n}$$

where $\mathit{RSS}(\boldsymbol\lambda)=\Vert(I-{S}_{h})\mathbf{y}-(I-{S}_{h})\mathbf{E}\widehat{\boldsymbol\beta}{\Vert}^{2}$ is the residual sum of squares of the fit with tuning parameters $\boldsymbol\lambda$.

It is challenging to minimize $\mathit{BIC}(\boldsymbol\lambda)$ over a $d$-dimensional space. Following Fan and Li (2004), we set $\lambda_j = \lambda\,\mathrm{SE}(\hat\beta_j)$, where $\mathrm{SE}(\hat\beta_j)$ is the standard error of the unpenalized profile least squares estimate of $\beta_j$; this reduces the minimization to a one-dimensional search over $\lambda$.
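Given the fitted coefficients for a candidate $\lambda$, the score above can be computed directly. The sketch below redefines the SCAD derivative so it is self-contained; whether the factor $n$ multiplies $\Sigma_{\lambda}$ inside $e(\lambda)$, matching the ridge step (2.8), is our reading of the text, and the candidate grid is arbitrary.

```python
# BIC tuning parameter selection; a minimal sketch with assumed names.
import numpy as np

def scad_derivative(beta, lam, a=3.7):
    b = np.abs(beta)
    return np.where(b < lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1))

def bic_score(RE, Ry, beta_hat, lam, n, thresh=1e-6):
    D_tilde = RE.T @ RE                        # D~ = E^T (I-S_h)^T (I-S_h) E
    absb = np.maximum(np.abs(beta_hat), thresh)
    Sigma = n * np.diag(scad_derivative(absb, lam) / absb)  # n-factor: our reading
    e_lam = np.trace(np.linalg.solve(D_tilde + Sigma, D_tilde))  # e(lambda)
    rss = np.sum((Ry - RE @ beta_hat) ** 2)    # RSS(lambda)
    return np.log(rss / n) + e_lam * np.log(n) / n

# One-dimensional grid search, pairing each lambda with its LQA fit:
#   lams = np.logspace(-6, 0, 60)
#   best = min(lams, key=lambda l: bic_score(RE, Ry, lqa_scad(RE, Ry, l, n), l, n))
```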

Xiao et al. (2003) pointed out that it is challenging to select a bandwidth for their procedure, and they simply used the rule-of-thumb bandwidth $h=1.06\,{S}_{X}\,{n}^{-1/5}$ (Silverman, 1986) to prewhiten the AR process, where $S_X$ is the sample standard deviation of the covariate.

We use local linear regression to obtain the initial estimate $\hat m_I(\cdot)$ with the plug-in bandwidth selector (Ruppert, Sheather and Wand, 1995), pretending the data are independent. Since model (2.2) is a partially linear model, we can use existing bandwidth selectors for partially linear models in the literature. Here we follow the proposal of Fan and Li (2004): we first calculate a difference-based estimate of $\boldsymbol\beta$, substitute it into model (2.2), and then select the bandwidth for the resulting one-dimensional nonparametric regression by the plug-in bandwidth selector.

**3 Numerical Comparison and Application**

In this section, we investigate the finite sample performance of the proposed procedures by Monte Carlo simulation and compare them with existing ones using the mean squared error (MSE), defined by

$$\mathrm{MSE}\{\widehat{m}(\cdot)\}=\frac{1}{n}\sum _{t=1}^{n}{\{\widehat{m}({x}_{t})-m({x}_{t})\}}^{2}.$$

We summarize our simulation results in terms of the relative MSE (RMSE), defined as the ratio of the MSE of an estimation procedure to that of $\hat m_I(\cdot)$, the estimate of $m(\cdot)$ obtained by local linear regression pretending the data are independent.
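The MSE and RMSE computations are one-liners; here is a sketch with illustrative array names.

```python
# MSE and relative MSE; m_hat, m_hat_indep, m_true are illustrative arrays.
import numpy as np

def mse(m_hat, m_true):
    return np.mean((m_hat - m_true) ** 2)

def relative_mse(m_hat, m_hat_indep, m_true):
    """Ratio of a procedure's MSE to that of the working-independence fit."""
    return mse(m_hat, m_true) / mse(m_hat_indep, m_true)
```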

**Example 1.** In this example, a random sample of size *n*, either *n* = 100 or *n* = 500, is generated from

$${y}_{t}=m\left({x}_{t}\right)+{\epsilon}_{t}.$$

In this example, we consider two scenarios for *m*(*x*). The first one is

$$m\left(x\right)=4\mathrm{cos}\left(2\pi x\right),$$

and the second one is

$$m\left(x\right)=\mathrm{exp}\left(2x\right).$$

The mean function $m(x)$ is not monotone in the first scenario, while it is monotone in the second. The error process $\epsilon_t$ is an AR process of order $d$:

$${\epsilon}_{t}=\sum _{j=1}^{d}{\beta}_{j}{\epsilon}_{t-j}+{\eta}_{t},$$

where $\eta_t \sim N(0,\sigma^{2})$ is independent and identically distributed.

To understand how the sampling scheme of *x _{t}* affects the proposed procedure, we consider two sampling schemes in our simulation.

- Scheme I. $x_t$ is independent and identically distributed according to the uniform distribution over [0, 1].
- Scheme II. $u_t$ is independent and identically distributed according to the standard normal distribution for $t = 1, 2, \ldots$. Let ${x}_{t}=\mathrm{\Phi}\{(a{u}_{t}+b{u}_{t-1})/\sqrt{{a}^{2}+{b}^{2}}\}$ for $t = 2, 3, \ldots$, where $\Phi(u)$ is the cumulative distribution function of the standard normal distribution. Thus, $x_t$ is a 1-dependent process. In our simulation, we take $a = 0.9$ and $b = 0.1$.
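The data-generating process of Example 1 is easy to reproduce. In the sketch below, the AR coefficients in `beta` and the $N(0,1)$ innovations are placeholder assumptions (the paper's exact values are not shown here); the two covariate schemes follow the definitions above.

```python
# Data-generating sketch for Example 1; `beta` values are placeholders.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def ar_errors(n, beta, burn=200):
    """Simulate an AR(len(beta)) error process with i.i.d. normal innovations."""
    beta = np.asarray(beta)
    d = len(beta)
    eps = np.zeros(n + burn + d)
    for t in range(d, len(eps)):
        # eps_t = beta_1 eps_{t-1} + ... + beta_d eps_{t-d} + eta_t
        eps[t] = eps[t - d:t][::-1] @ beta + rng.standard_normal()
    return eps[-n:]

def scheme_one(n):
    return rng.uniform(0.0, 1.0, n)                 # i.i.d. U[0, 1] covariate

def scheme_two(n, a=0.9, b=0.1):
    u = rng.standard_normal(n + 1)
    z = (a * u[1:] + b * u[:-1]) / np.sqrt(a ** 2 + b ** 2)
    return norm.cdf(z)                              # 1-dependent covariate

n = 500
x = scheme_one(n)
y = 4.0 * np.cos(2.0 * np.pi * x) + ar_errors(n, beta=[0.5])  # first scenario
```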

For each sampling scheme, three methods are compared with regard to efficiency improvement: the method of Xiao, Linton, Carroll and Mammen (2003) (XLCM), the profile least squares method (Profile) and the penalized profile least squares method with the SCAD penalty (SCAD). In addition, the oracle procedure, which substitutes the true autoregressive coefficients and order, is listed as a benchmark. The overall pattern for $d = 10$ and $d = 20$ is the same; to save space, we present results for $d = 20$ only.

For sampling scheme I, the covariate $x_t$'s are independent and identically distributed; only the random errors are correlated. Table 1 summarizes the simulation results for sampling scheme I with $n = 100$ and $n = 500$.

When the sample size is large, such as $n = 500$, the performances of Xiao et al.'s method, the profile least squares method and the SCAD procedure are very close to each other, although the SCAD procedure is slightly better than the other two. The gain of these three methods in terms of RMSE with the large sample size is greater than that with the smaller sample size ($n = 100$). This is expected: with a larger sample size, all three methods estimate $\boldsymbol\beta$ more accurately, which makes the decorrelation work better.

Simulation results for sampling scheme II are summarized in Table 2. The overall pattern of Table 2 is similar to that of Table 1, although sampling scheme II differs from sampling scheme I in that the covariate $x_t$'s are dependent in scheme II while they are independent in scheme I. For scheme II, the SCAD procedure performs best among the three methods in the comparison, and its performance is very close to that of the oracle procedure. The performances of Xiao et al.'s method and the profile least squares procedure are similar, and neither dominates the other. In summary, the performance of the proposed profile least squares and SCAD procedures does not seem to rely on the sampling scheme of the covariate $x_t$.

**Example 2.** In this example, we illustrate the proposed methodology by analyzing a U.S. macroeconomic data set collected monthly from January 1980 to December 2006. Our interest here is in the relationship between the unemployment rate and the house price index change.

In the past few years, house prices in the U.S. showed a strong upward trend, although bubble warnings always existed. Many home buyers who had neither a sound credit history nor sufficient financial capability became home owners with the help of sub-prime mortgages. They bore high levels of interest payments but believed the property would keep appreciating. In the meantime, mortgage agents packaged the debt and sold it to other institutional investors. This long chain prospered and worked well while the housing market was booming. However, when house prices began to plummet in spring 2007, borrowers defaulted and many houses went into foreclosure. Consequently, a number of big financial institutions with heavy investments in the sub-prime mortgage market claimed billions of dollars in write-offs due to the crisis.

In this example, we are interested in the effect of the unemployment rate on house prices. By classical economic theory, the unemployment rate is an important indicator of the overall economy: if many people claim unemployment, purchasing power is hurt. However, to the best of our knowledge, little of the literature studies the relationship between the unemployment rate and the housing market in a quantitative manner. Motivated by the sub-prime mortgage turmoil and recent suspicions of recession, we believe the historical data may shed some interesting insight on how these two indexes are related. Thus, we take the unemployment rate as the covariate $x$ and the house price index change as the response variable $y$, and consider the following model:

$${y}_{t}=m\left({x}_{t}\right)+{\epsilon}_{t}.$$

(3.1)

We temporarily ignore the correlation of the random errors and estimate (3.1) by conventional local linear regression as in Fan and Gijbels (1996). The direct plug-in bandwidth of Ruppert, Sheather and Wand (1995) is 0.2969.

Once the initial estimate $\hat m_I(x_t)$ is obtained, we can estimate the residuals by $\hat\epsilon_t = y_t - \hat m_I(x_t)$; their correlogram is shown in Figure 1.

**Figure 1.** Correlogram of the residuals $\hat\epsilon_t$ and $\hat\eta_t$. Plots (a) and (b) are the autocorrelation and partial autocorrelation for $\hat\epsilon_t$; plots (c) and (d) are the autocorrelation and partial autocorrelation for $\hat\eta_t$.

From a conservative point of view, we suspect that the house price might have a year-long lag. So we assume an AR(12) model for the errors and employ the penalized profile least squares method to select the AR order and estimate $m(\cdot)$ simultaneously. The plug-in bandwidth in the profile least squares estimation is 0.2140. By the BIC criterion, the optimal tuning parameter used in the order selection procedure is 0.000019.

As a result, an AR(1) model with a strong autocorrelation coefficient of 0.9438 is most appropriate. This means that the error has only a one-month lag, which agrees with the partial autocorrelation plot in Figure 1 (b). After accounting for the autocorrelation, the correlogram of $\hat\eta_t$ does not show any significant pattern (see Figure 1 (c) and (d)). In addition, the p-value of the Ljung-Box-Pierce test at the first 24 lags, 0.9134, also shows that the autocorrelation has been successfully removed.
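These diagnostics can be reproduced along the following lines; the sketch assumes `statsmodels` is available and that `eps_hat` holds the residuals from the initial curve fit.

```python
# Residual diagnostics sketch: whiten with the fitted AR(1) coefficient
# and run the Ljung-Box test at 24 lags.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

def whiten_and_test(eps_hat, phi=0.9438, lags=24):
    """Remove the fitted AR(1) structure and test the remaining correlation."""
    eta_hat = eps_hat[1:] - phi * eps_hat[:-1]    # whitened residuals
    return acorr_ljungbox(eta_hat, lags=[lags])   # large p-value: no autocorrelation
```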

By applying the penalized profile least squares estimation method, the relationship between the House Price Index Change and the unemployment rate turns out to be

$${\widehat{y}}_{t}=\widehat{m}\left({x}_{t}\right)+0.9438{\widehat{\epsilon}}_{t-1}$$

(3.2)

where $\hat m(\cdot)$ is displayed in Figure 2. The penalized profile least squares approach yields a smoother estimate than the conventional local linear regression because it takes the correlation into account. As expected, the unemployment rate is negatively correlated with the house price index change, but this effect is most pronounced when the unemployment rate varies between 4% and 5% or between 8% and 10%.
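As a sketch of how the fitted model (3.2) produces fitted values, the estimated AR(1) term $0.9438\,\hat\epsilon_{t-1}$ is simply added to the nonparametric fit; `m_hat` and `y` below are illustrative arrays holding the estimated curve values $\hat m(x_t)$ and the observed responses.

```python
# Fitted values from model (3.2); array names are illustrative.
import numpy as np

def fitted_values(y, m_hat, phi=0.9438):
    eps_hat = y - m_hat                  # residuals from the curve fit
    y_fit = m_hat.copy()
    y_fit[1:] += phi * eps_hat[:-1]      # AR(1) error correction, one-month lag
    return y_fit
```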

**4 Proofs**

To present the regularity conditions, we need the following definitions for a sequence of random vectors $\{\mathbf{z}_t, t = 0, \pm1, \pm2, \ldots\}$. The notation and definitions are adopted from Chapter 2 of Fan and Yao (2003).

**Definition 1.** *A sequence of random vectors* $\{\mathbf{z}_t, t = 0, \pm1, \pm2, \ldots\}$ *is said to be strictly stationary if* $\{\mathbf{z}_1, \ldots, \mathbf{z}_n\}$ *and* $\{\mathbf{z}_{1+k}, \ldots, \mathbf{z}_{n+k}\}$ *have the same joint distribution for any integer* $n \ge 1$ *and any integer* $k$.

Denote by ${F}_{i}^{j}$ the $\sigma$-algebra generated by $\{\mathbf{z}_t, i \le t \le j\}$. The $\alpha$-mixing coefficient is defined as

$$\alpha(n)=\underset{A\in {F}_{-\infty}^{0},\,B\in {F}_{n}^{\infty}}{\sup}|P(A)P(B)-P(AB)|.$$

(4.1)

**Definition 2.** *A sequence of random vectors* {**z**_{t}, *t* = 0, ±1, ±2, … } *is said to be α-mixing if it is strictly stationary and α*(*n*) → 0 *as n* → ∞.

To make the argument concise, denote $\mathbf{F}=(\mathbf{f}_1,\ldots,\mathbf{f}_n)^T$ with $\mathbf{f}_t=(\epsilon_{t-1},\ldots,\epsilon_{t-d})^T$, and $\mathbf{E}=(\mathbf{e}_1,\ldots,\mathbf{e}_n)^T$ with $\mathbf{e}_t=(\hat\epsilon_{t-1},\ldots,\hat\epsilon_{t-d})^T$. Define $\mathbf{\Delta}=\mathbf{E}-\mathbf{F}$. Our proof follows the same strategy as that in Fan and Huang (2005). The following conditions are imposed to facilitate the proofs and are adopted from Fan and Huang (2005); they are not the weakest possible conditions.

- (A) The random variable $x_t$ has a bounded support $\Omega$. Its density function $f(\cdot)$ is Lipschitz continuous and bounded away from 0 on its support.
- (B) There is an $s > 2$ such that $E\|\mathbf{f}_t\|^{s} < \infty$, and there is some $\xi > 0$ such that ${n}^{1-2{s}^{-1}-2\xi}h \to \infty$.
- (C) $m(\cdot)$ has a continuous second derivative in $x \in \Omega$.
- (D) The function $K(\cdot)$ is a bounded symmetric density function with bounded support $[-M, M]$, satisfying the Lipschitz condition.
- (E) $nh^{8} \to 0$ and $nh^{2}/(\log n)^{2} \to \infty$.
- (F) ${\sup}_{x\in \Omega}|{\widehat{m}}_{I}(x)-m(x)|={o}_{p}({n}^{-1/4})$, where ${\widehat{m}}_{I}(x)$ is obtained by local linear regression pretending that the data are i.i.d.
- (G) The sequence of random vectors $(x_t, \epsilon_t)$, $t = 1, 2, \ldots$, is strictly stationary and satisfies the mixing condition for $\alpha$-mixing processes: for some $\delta > 2$ and $a > 1 - 2/\delta$,

$$\sum _{l}{l}^{a}{[\alpha(l)]}^{1-2/\delta}<\infty ,\qquad E{|{\epsilon}_{1}|}^{\delta}<\infty ,\qquad {g}_{{x}_{1}|{\epsilon}_{1}}(x|\epsilon)\le {A}_{1}<\infty ,$$

where ${g}_{{x}_{1}|{\epsilon}_{1}}(\cdot|\cdot)$ denotes the conditional density of $x_1$ given $\epsilon_1$.

Lemma 4.1 is taken from Lemma 6.1 of Fan and Yao (2003) and will be used repeatedly in our proofs.

**Lemma 4.1**. *Let* $(x_1,\epsilon_1),\ldots,(x_n,\epsilon_n)$ *be a strictly stationary and $\alpha$-mixing sequence with mixing coefficient $\alpha(l)=O(l^{-\tau})$ for some $\tau>0$. Suppose that, for some $s>2$,*

$$E{|{\epsilon}_{t}|}^{s}<\infty \quad \text{and}\quad \underset{x\in [a,b]}{\sup}\int {|{\epsilon}_{t}|}^{s}g(x,\epsilon)\,\mathrm{d}\epsilon <\infty ,$$

*where* $g$ *denotes the joint density of* $(x_t, \epsilon_t)$, *and the kernel* $K$ *is bounded with bounded support. Then*

$$\underset{x\in \left[a,b\right]}{\mathrm{sup}}\left|\frac{1}{n}\sum _{i=1}^{n}\left\{{K}_{h}\left({x}_{i}-x\right){\epsilon}_{i}-E\left[{K}_{h}\left({x}_{i}-x\right){\epsilon}_{i}\right]\right\}\right|={O}_{p}\left({\left\{\frac{\mathrm{log}n}{nh}\right\}}^{1/2}\right)$$

*provided that* $h \to 0$ *and, for some* $\xi > 0$, ${n}^{1-2{s}^{-1}-2\xi}h \to \infty$ *and* ${n}^{(\tau +1.5)({s}^{-1}+\xi)-\tau /2+5/4}{h}^{-\tau /2-5/4} \to 0$.

**Lemma 4.2.** *Under Conditions A—G, it follows that*

$$\frac{1}{n}{\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{F}\stackrel{P}{\to}E(\mathbf{f}{\mathbf{f}}^{T}).$$

**Proof** Denote by $W_x$ the $n\times n$ diagonal matrix ${W}_{x}=\mathrm{diag}\{{K}_{h}({x}_{1}-x),\ldots,{K}_{h}({x}_{n}-x)\}$, and let

$${D}_{x}=\left(\begin{array}{cc}1& \frac{{x}_{1}-x}{h}\\ \vdots & \vdots \\ 1& \frac{{x}_{n}-x}{h}\end{array}\right)$$

Then the smoothing matrix **S** for the local linear regression can be expressed as

$$\mathbf{S}=\left(\begin{array}{c}\left[1,0\right]{\left\{{D}_{{x}_{1}}^{T}{W}_{{x}_{1}}{D}_{{x}_{1}}\right\}}^{-1}{D}_{{x}_{1}}^{T}{W}_{{x}_{1}}\\ \vdots \\ \left[1,0\right]{\left\{{D}_{{x}_{n}}^{T}{W}_{{x}_{n}}{D}_{{x}_{n}}\right\}}^{-1}{D}_{{x}_{n}}^{T}{W}_{{x}_{n}}\end{array}\right)$$

where

$${D}_{x}^{T}{W}_{x}{D}_{x}=\left(\begin{array}{cc}{\sum}_{i=1}^{n}{K}_{h}\left({x}_{i}-x\right)& {\sum}_{i=1}^{n}\left({x}_{i}-x\right){K}_{h}\left({x}_{i}-x\right)/h\\ {\sum}_{i=1}^{n}\left({x}_{i}-x\right){K}_{h}\left({x}_{i}-x\right)/h& {\sum}_{i=1}^{n}{\left({x}_{i}-x\right)}^{2}{K}_{h}\left({x}_{i}-x\right)/{h}^{2}\end{array}\right)$$

Each element of the matrix ${D}_{x}^{T}{W}_{x}{D}_{x}$ is of the form ${S}_{n,j}={\sum}_{i=1}^{n}{\{({x}_{i}-x)/h\}}^{j}{K}_{h}({x}_{i}-x)$ with $j = 0, 1, 2$. Computing the mean and variance of $S_{n,j}$ gives

$$\begin{array}{ll}{S}_{n,j}& =n\int {\nu}^{j}K\left(\nu \right)f\left(x+h\nu \right)\mathrm{d}\nu +{O}_{p}\left(\sqrt{nE\left\{{h}^{-2j}{({x}_{1}-x)}^{2j}{K}_{h}^{2}({x}_{1}-x)\right\}}\right)\\ & =nf\left(x\right){\mu}_{j}+n\,{O}_{p}({h}^{2}+1/\sqrt{nh}).\end{array}$$

Because of the symmetry of the kernel function, $\mu_j = 0$ for any odd $j$, and then ${S}_{n,j}=n\,{O}_{p}\left(h+1/\sqrt{nh}\right)$. Indeed, with Lemma 4.1, it can further be shown that, for even $j$,

$${S}_{n,j}=nf\left(x\right){\mu}_{j}\{1+{O}_{p}({h}^{2}+\sqrt{\mathrm{log}\left(n\right)/nh})\},$$

and for odd *j*,

$${S}_{n,j}=n\,{O}_{p}(h+\sqrt{\mathrm{log}\left(n\right)/nh})$$

holds uniformly in *x*. Therefore,

$$\frac{1}{n}{D}_{x}^{T}{W}_{x}{D}_{x}=\left(\begin{array}{cc}f(x)(1+{O}_{p}({h}^{2}+\sqrt{\mathrm{log}(n)/nh}))& {O}_{p}(h+\sqrt{\mathrm{log}(n)/nh})\\ {O}_{p}(h+\sqrt{\mathrm{log}(n)/nh})& f(x){\mu}_{2}(1+{O}_{p}({h}^{2}+\sqrt{\mathrm{log}(n)/nh}))\end{array}\right)$$

holds uniformly in *x*.

Since $h+\sqrt{\mathrm{log}\left(n\right)/nh}={o}_{p}\left(1\right)$, we can regard the above matrix as being approximately diagonal.

Then its inverse is

$${\left\{\frac{1}{n}{D}_{x}^{T}{W}_{x}{D}_{x}\right\}}^{-1}=\left(\begin{array}{cc}{\left\{f\left(x\right)\right\}}^{-1}(1+{O}_{p}({h}^{2}+\sqrt{\mathrm{log}(n)/nh}))& {O}_{p}(h+\sqrt{\mathrm{log}(n)/nh})\\ {O}_{p}(h+\sqrt{\mathrm{log}(n)/nh})& {\left\{f\left(x\right){\mu}_{2}\right\}}^{-1}(1+{O}_{p}({h}^{2}+\sqrt{\mathrm{log}(n)/nh}))\end{array}\right)$$

holds uniformly in *x*.

Similarly, by Lemma 4.1 and the assumption of independence between the process $\{\epsilon_t\}$ and the process $\{x_t\}$, we have

$$\frac{1}{n}{D}_{x}^{T}{W}_{x}\mathbf{F}=\left(\begin{array}{c}{O}_{p}({h}^{2}+\sqrt{\frac{\mathrm{log}n}{nh}})\\ {O}_{p}(h+\sqrt{\frac{\mathrm{log}n}{nh}})\end{array}\right)$$

holds uniformly in *x*.

Consequently,

$$\begin{array}{l}\left[1,0\right]{\{\frac{1}{n}{D}_{x}^{T}{W}_{x}{D}_{x}\}}^{-1}\{\frac{1}{n}{D}_{x}^{T}{W}_{x}\mathbf{F}\}\\ =\left[1,0\right]\left(\begin{array}{cc}{\left\{f\left(x\right)\right\}}^{-1}(1+{O}_{p}({h}^{2}+\sqrt{\mathrm{log}(n)/nh}))& {O}_{p}(h+\sqrt{\mathrm{log}(n)/nh})\\ {O}_{p}(h+\sqrt{\mathrm{log}(n)/nh})& {\left\{f\left(x\right){\mu}_{2}\right\}}^{-1}(1+{O}_{p}({h}^{2}+\sqrt{\mathrm{log}(n)/nh}))\end{array}\right)\left(\begin{array}{c}{O}_{p}({h}^{2}+\sqrt{\frac{\mathrm{log}n}{nh}})\\ {O}_{p}(h+\sqrt{\frac{\mathrm{log}n}{nh}})\end{array}\right)\\ ={\left\{f\left(x\right)\right\}}^{-1}{O}_{p}({h}^{2}+\sqrt{\frac{\mathrm{log}n}{nh}})(1+{o}_{p}\left(1\right))={o}_{p}\left(1\right)\end{array}$$

Substituting this result into the smoothing matrix *S*, we have

$$S\mathbf{F}=\left(\begin{array}{c}\left[1,0\right]{\left\{{D}_{{x}_{1}}^{T}{W}_{{x}_{1}}{D}_{{x}_{1}}\right\}}^{-1}{D}_{{x}_{1}}^{T}{W}_{{x}_{1}}\mathbf{F}\\ \vdots \\ \left[1,0\right]{\left\{{D}_{{x}_{n}}^{T}{W}_{{x}_{n}}{D}_{{x}_{n}}\right\}}^{-1}{D}_{{x}_{n}}^{T}{W}_{{x}_{n}}\mathbf{F}\end{array}\right)=\left(\begin{array}{c}{o}_{p}\left(1\right)\\ \vdots \\ {o}_{p}\left(1\right)\end{array}\right).$$

Thus,

$$\mathbf{F}-S\mathbf{F}=\mathbf{F}\left\{1+{o}_{p}\left(1\right)\right\}.$$

Finally, by the WLLN,

$$\frac{1}{n}{\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{F}=\left(\frac{1}{n}\sum _{i=1}^{n}{\mathbf{f}}_{i}{\mathbf{f}}_{i}^{T}\right){\left\{1+{o}_{p}\left(1\right)\right\}}^{2}\stackrel{P}{\to}E\left(\mathbf{f}{\mathbf{f}}^{T}\right)$$

**Lemma 4.3.** *Under Conditions A—G, we have*

$$\frac{1}{n}{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{E}\stackrel{P}{\to}E\left(\mathbf{f}{\mathbf{f}}^{T}\right).$$

**Proof** Since $\mathbf{\Delta} = \mathbf{E} - \mathbf{F}$, the generic element of $\mathbf{\Delta}$ is of the form $m(x_t) - \hat m_I(x_t)$, which is ${o}_{p}({n}^{-1/4})$ uniformly in $t$ by Condition F. Note that

$$\frac{1}{n}{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{E}=\frac{1}{n}{\left(\mathbf{F}+\mathbf{\Delta}\right)}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\left(\mathbf{F}+\mathbf{\Delta}\right)$$

By using arguments similar to those in the proof of Lemma 4.2, it can be shown that

$$\frac{1}{n}{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{E}=\frac{1}{n}{\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{F}+{o}_{P}\left(1\right)$$

Thus, Lemma 4.3 follows by Lemma 4.2.

**Lemma 4.4.** *Suppose Conditions A—G hold. It follows that*

$$\frac{1}{\sqrt{n}}{\mathbf{F}}^{T}{\left(I-\mathbf{S}\right)}^{T}\left(I-\mathbf{S}\right)\mathbf{m}={o}_{P}\left(1\right)$$

**Proof** It is noted that

$$\frac{1}{\sqrt{n}}{\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{m}=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}[{\mathbf{f}}_{i}-{(S\mathbf{f})}_{i}]\phantom{\rule{0.1em}{0ex}}[m({x}_{i})-\left[1,0\right]{\{{D}_{{x}_{i}}^{T}{W}_{{x}_{i}}{D}_{{x}_{i}}\}}^{-1}{D}_{{x}_{i}}^{T}{W}_{{x}_{i}}\mathbf{m}]$$

(4.2)

Similar to the argument in the proof of Lemma 4.2, we can show that

$$\left[1,0\right]{\{\frac{1}{n}{D}_{x}^{T}{W}_{x}{D}_{x}\}}^{-1}\{\frac{1}{n}{D}_{x}^{T}{W}_{x}\mathbf{m}\}=m(x)(1+{O}_{p}({h}^{2}+\sqrt{\mathrm{log}(n)/nh}))$$

holds uniformly in $x \in \Omega$. Plugging this into (4.2), it follows that

$$\begin{array}{ll}\frac{1}{\sqrt{n}}{\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{m}& =\frac{1}{\sqrt{n}}\sum _{i=1}^{n}[{\mathbf{f}}_{i}-{(S\mathbf{f})}_{i}]\,[m({x}_{i})-m({x}_{i})(1+{O}_{p}({h}^{2}+\sqrt{\mathrm{log}(n)/nh}))]\\ & =\frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\mathbf{f}}_{i}m({x}_{i})\,[1+{o}_{p}(1)]\,{O}_{p}({h}^{2}+\sqrt{\mathrm{log}(n)/nh}).\end{array}$$

Note that $E\{\mathbf{f}_i m(x_i)\} = 0$ and the covariance matrix of $\mathbf{f}_i m(x_i)$ is finite, so that ${n}^{-1/2}{\sum}_{i=1}^{n}{\mathbf{f}}_{i}m({x}_{i})={O}_{p}(1)$. Since ${h}^{2}+\sqrt{\mathrm{log}(n)/nh}=o(1)$, the right-hand side is ${o}_{p}(1)$. This completes the proof.

**Lemma 4.5.** *Under Conditions A—G, we have*

$$\frac{1}{\sqrt{n}}{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{m}={o}_{p}\left(1\right)$$

**Proof** Since $\mathbf{E} = \mathbf{F} + \mathbf{\Delta}$, we can break $\frac{1}{\sqrt{n}}{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{m}$ into two terms: $\frac{1}{\sqrt{n}}{\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{m}$, which is ${o}_{p}(1)$ by Lemma 4.4, and $\frac{1}{\sqrt{n}}{\mathbf{\Delta}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{m}$, which is also ${o}_{p}(1)$ by Condition F and an argument similar to that in the proof of Lemma 4.4. This completes the proof.

**Lemma 4.6.** *Suppose that Conditions A—G hold. We have*

$$\frac{1}{\sqrt{n}}{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{\Delta}\mathit{\beta}={o}_{p}\left(1\right)$$

**Proof** This is a direct result from the proof of Lemma 4.3.

**Lemma 4.7.** *Under Conditions A—G, let* $\eta = (\eta_1, \ldots, \eta_n)^T$. *Then*

$$\sqrt{n}{\left[{\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{F}\right]}^{-1}{\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\eta \to N(0,{\sigma}^{2}{\{E(\mathbf{f}{\mathbf{f}}^{T})\}}^{-1})$$

**Proof** We observe that

$${\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\eta =\sum _{i=1}^{n}{\mathbf{f}}_{i}[{\eta}_{i}-[1,0]{\{{D}_{{x}_{i}}^{T}{W}_{{x}_{i}}{D}_{{x}_{i}}\}}^{-1}{D}_{{x}_{i}}^{T}{W}_{{x}_{i}}\eta ]\phantom{\rule{0.2em}{0ex}}[1+{o}_{p}(1)]$$

(4.3)

By using Lemma 4.1 on {*x _{i}, η_{i}*}, we can show that

$$\begin{array}{l}\left[1,0\right]{\left\{\frac{1}{n}{D}_{x}^{T}{W}_{x}{D}_{x}\right\}}^{-1}\left\{\frac{1}{n}{D}_{x}^{T}{W}_{x}\eta \right\}\\ =\left[1,0\right]\left(\begin{array}{ll}{\left\{f\left(x\right)\right\}}^{-1}(1+{O}_{p}({h}^{2}+\sqrt{\mathrm{log}(n)/nh})& {O}_{p}(h+\sqrt{\mathrm{log}(n)/nh})\\ {O}_{p}(h+\sqrt{\mathrm{log}(n)/nh})& {\{f(x){\mu}_{2}\}}^{-1}(1+{O}_{p}({h}^{2}+\sqrt{\mathrm{log}(n)/nh})\end{array}\right)\left(\begin{array}{l}{O}_{p}({h}^{2}+\sqrt{\frac{\mathrm{log}n}{nh}})\\ {O}_{p}(h+\sqrt{\frac{\mathrm{log}n}{nh}})\end{array}\right)\\ ={o}_{p}\left(1\right)\end{array}$$

Then ${\eta}_{i}-\left[1,0\right]{\left\{{D}_{{x}_{i}}^{T}{W}_{{x}_{i}}{D}_{{x}_{i}}\right\}}^{-1}{D}_{{x}_{i}}^{T}{W}_{{x}_{i}}\eta ={\eta}_{i}\left\{1+{o}_{p}\left(1\right)\right\}$. Plugging this into (4.3), we obtain

$${\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\eta =\sum _{i=1}^{n}{\mathbf{f}}_{i}{\eta}_{i}\left\{1+{o}_{p}\left(1\right)\right\}$$

Since $E(\mathbf{f}_i \eta_i) = 0$ and $\mathrm{Var}(\mathbf{f}_i \eta_i) = {\sigma}^{2}E(\mathbf{f}{\mathbf{f}}^{T})$, the central limit theorem for strictly stationary $\alpha$-mixing sequences yields

$$\frac{1}{\sqrt{n}}{\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\eta \stackrel{L}{\to}N\left(0,{\sigma}^{2}\left\{E\left(\mathbf{f}{\mathbf{f}}^{T}\right)\right\}\right).$$

By Lemma 4.2, $\frac{1}{n}{\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{F}\stackrel{P}{\to}E\left(\mathbf{f}{\mathbf{f}}^{T}\right)$. Applying Slutsky's theorem, it follows that

$$\sqrt{n}{\{{\mathbf{F}}^{T}{(I-S)}^{T}(I-S)\mathbf{F}\}}^{-1}{\mathbf{F}}^{T}{(I-S)}^{T}(I-S)\eta \stackrel{L}{\to}N(0,{\sigma}^{2}{\{E(\mathbf{f}{\mathbf{f}}^{T})\}}^{-1}).$$

**Lemma 4.8.** *Under Conditions A—G, we have*

$$\sqrt{n}{\{{\mathbf{E}}^{T}{(I-S)}^{T}(I-S)\mathbf{E}\}}^{-1}{\mathbf{E}}^{T}{(I-S)}^{T}(I-S)\eta \stackrel{L}{\to}N(0,{\sigma}^{2}{\{E(\mathbf{f}{\mathbf{f}}^{T})\}}^{-1}).$$

**Proof** Since $\mathbf{E} = \mathbf{F} + \mathbf{\Delta}$, we may write ${\mathbf{E}}^{T}{(I-S)}^{T}(I-S)\eta ={\mathbf{F}}^{T}{(I-S)}^{T}(I-S)\eta +{\mathbf{\Delta}}^{T}{(I-S)}^{T}(I-S)\eta$. Note that each element of $\mathbf{\Delta}$ is ${o}_{p}({n}^{-1/4})$ uniformly; hence

$$\frac{1}{\sqrt{n}}{\mathrm{\Delta}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\eta ={o}_{p}\left(1\right).$$

Furthermore, we have shown in the last lemma that $\frac{1}{\sqrt{n}}{\mathbf{F}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\eta \to N\left(0,{\sigma}^{2}E\left(\mathbf{f}{\mathbf{f}}^{T}\right)\right)$. So $\frac{1}{\sqrt{n}}{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\eta \to N\left(0,{\sigma}^{2}E\left(\mathbf{f}{\mathbf{f}}^{T}\right)\right)$ as well. The proof is completed by Slutsky's theorem and Lemma 4.3.

**Proof of Theorem 1** Let us first show the asymptotic normality of $\widehat{\boldsymbol\beta}$. According to the expression for $\widehat{\boldsymbol\beta}$ in (2.5), we can break $\sqrt{n}\left(\widehat{\boldsymbol\beta}-\boldsymbol\beta\right)$ into the sum of the following three terms (a), (b) and (c):

$$\begin{array}{cc}\left(a\right)& \stackrel{\wedge}{=}\sqrt{n}\left[{\left\{{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{E}\right\}}^{-1}{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{m}\right]\\ \left(b\right)& \stackrel{\wedge}{=}\sqrt{n}\left[{\left\{{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{E}\right\}}^{-1}{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{\Delta}\beta \right]\\ \left(c\right)& \stackrel{\wedge}{=}\sqrt{n}\left[{\left\{{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\mathbf{E}\right\}}^{-1}{\mathbf{E}}^{T}{\left(I-S\right)}^{T}\left(I-S\right)\eta \right]\end{array}$$

Term (a) is the product of two factors, ${\{{n}^{-1}{\mathbf{E}}^{T}{(I-S)}^{T}(I-S)\mathbf{E}\}}^{-1}$ and ${n}^{-1/2}{\mathbf{E}}^{T}{(I-S)}^{T}(I-S)\mathbf{m}$. By Lemmas 4.3 and 4.5, the asymptotic properties of these two factors lead to the conclusion that $(a) = {o}_{p}(1)$. Similarly, applying Lemmas 4.3 and 4.6 to the two factors of term (b) yields $(b) = {o}_{p}(1)$. Finally, by Lemma 4.8, term (c) converges in distribution to $N(0,{\sigma}^{2}{\{E(\mathbf{f}{\mathbf{f}}^{T})\}}^{-1})$, which establishes the asymptotic normality of $\widehat{\boldsymbol\beta}$.

Next we derive the asymptotic bias and variance of $\hat m(\cdot)$. From Lemmas 4.1—4.8, we have

$$\widehat{m}\left({x}_{0},\widehat{\mathit{\beta}}\right)=\left[1,0\right]{\left\{{D}_{{x}_{0}}^{T}{W}_{{x}_{0}}{D}_{{x}_{0}}\right\}}^{-1}{D}_{{x}_{0}}^{T}{W}_{{x}_{0}}\left(\mathbf{m}+\eta \right)\left\{1+{o}_{P}\left(1\right)\right\}$$

Note that $E(\eta|\chi) = 0$, where $\chi = (x_1, \ldots, x_n)$. Thus,

$$E\left\{\widehat{m}\left({x}_{0},\widehat{\mathit{\beta}}\right)|\mathrm{\chi}\right\}=\left[1,0\right]{\left\{{D}_{{x}_{0}}^{T}{W}_{{x}_{0}}{D}_{{x}_{0}}\right\}}^{-1}{D}_{{x}_{0}}^{T}{W}_{{x}_{0}}\mathbf{m}\left\{1+{o}_{P}\left(1\right)\right\}$$

which is the same as the conditional mean of the local linear regression estimate derived in Fan and Gijbels (1992). So the asymptotic bias is $\frac{1}{2}{m}''\left({x}_{0}\right){h}^{2}\int {x}^{2}K\left(x\right)\mathrm{d}x=\frac{1}{2}{\mu}_{2}{m}''({x}_{0}){h}^{2}$.

As for the asymptotic variance of $\hat m(\cdot)$, conditioning on $x_1, \ldots, x_n$,

$$\mathrm{Var}\left[\widehat{m}\left({x}_{0},\widehat{\mathit{\beta}}\right)|\mathrm{\chi}\right]=\left[1,0\right]{\left\{{D}_{{x}_{0}}^{T}{W}_{{x}_{0}}{D}_{{x}_{0}}\right\}}^{-1}{D}_{{x}_{0}}^{T}{W}_{{x}_{0}}\mathrm{Var}\left\{\eta \right\}{W}_{{x}_{0}}{D}_{{x}_{0}}{\left\{{D}_{{x}_{0}}^{T}{W}_{{x}_{0}}{D}_{{x}_{0}}\right\}}^{-1}{\left[1,0\right]}^{T}$$

Using the same argument as that in the proof of Lemma 4.2, we have

$$\mathrm{Var}\left[\widehat{m}\left({x}_{0},\widehat{\mathit{\beta}}\right)|\mathrm{\chi}\right]=\frac{{\sigma}^{2}}{\mathit{nhf}\left({x}_{0}\right)}\int {K}^{2}\left(x\right)\mathrm{d}x$$

As to the asymptotic normality,

$$\widehat{m}\left({x}_{0},\widehat{\mathit{\beta}}\right)-E\left\{\widehat{m}\left({x}_{0},\widehat{\mathit{\beta}}\right)|\mathrm{\chi}\right\}=\left[1,0\right]{\left\{{D}_{{x}_{0}}^{T}{W}_{{x}_{0}}{D}_{{x}_{0}}\right\}}^{-1}{D}_{{x}_{0}}^{T}{W}_{{x}_{0}}\eta \left\{1+{o}_{P}\left(1\right)\right\}$$

Thus, conditioning on $\chi$, the asymptotic normality can be established using the central limit theorem, since the $\eta_i$ are independent and identically distributed with mean zero and variance $\sigma^2$. This completes the proof of Theorem 1.

**5 Discussions**

In this paper, we proposed a new estimation procedure for the nonparametric regression model with autoregressive errors using profile least squares techniques, and we further proposed determining the order of the AR error process by the penalized profile least squares method with the SCAD penalty. We studied the asymptotic properties of the proposed estimators and established their asymptotic normality. We conducted extensive Monte Carlo simulation studies to examine the finite sample performance of the proposed procedures and to compare them with the proposal of Xiao et al. (2003).

It is of interest to theoretically compare the asymptotic mean squared errors of the proposed procedures and the local linear estimator that does not take the error correlation into account. It is also of interest to investigate the effect of misspecification of the error model. These topics need more research in the future.

This article is dedicated to the 30th Anniversary of the Institute of Applied Mathematics, Chinese Academy of Sciences, Beijing. The study and work experience at the Institute has been an invaluable treasure in the first author's life. Runze Li would like to thank the faculty members, staff and friends of the Institute for their support during his stay. The authors are very grateful to Professor Jia-An Yan for his invitation. Runze Li's research was supported by National Institute on Drug Abuse grant R21 DA024260, and Yan Li was supported by National Science Foundation grant DMS 0348869 as a graduate research assistant.

Runze Li, Department of Statistics and The Methodology Center, Pennsylvania State University, University Park, PA 16802-2111, USA. Email: rli@stat.psu.edu.

Yan Li, Cards Acquisitions, Capital One Financial Inc., 1680 Capital One Dr., 19050-0701, McLean, VA 22102, USA. Email: maggie.li@capitalone.com.

**References**

1. Altman NS. Kernel Smoothing of Data with Correlated Errors. Journal of the American Statistical Association. 1990;85:749–759.

2. Brockwell PJ, Davis RA. Time Series: Theory and Methods. Springer; New York: 1991.

3. Engle RF, Granger CWJ, Rice J, Weiss A. Semiparametric estimates of the relation between weather and electricity sales. J Amer Stat Assoc. 1986;81:310–320.

4. Fan J, Gijbels I. Local Polynomial Modelling and Its Applications. Chapman and Hall; London: 1996.

5. Fan J, Huang T. Profile Likelihood Inferences on Semiparametric Varying-coefficient Partially Linear Models. Bernoulli. 2005;11:1031–1059.

6. Fan J, Huang T, Li R. Analysis of Longitudinal Data with Semiparametric Estimation of Covariance Function. Journal of the American Statistical Association. 2007;102:632–641.

7. Fan J, Li R. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association. 2001;96:1348–1360.

8. Fan J, Li R. New Estimation and Model Selection Procedures for Semiparametric Modeling in Longitudinal Data Analysis. Journal of the American Statistical Association. 2004;99:710–723.

9. Fan J, Yao Q. Nonlinear Time Series: Nonparametric and Parametric Methods. Springer-Verlag; New York: 2003.

10. Gu C. Smoothing Spline ANOVA Models. Springer-Verlag; New York: 2002.

11. Hart JD. Kernel Regression Estimation with Time Series Errors. Journal of the Royal Statistical Society, Series B. 1991;53:173–187.

12. Hansen LP. Large sample properties of generalized method of moments estimators. Econometrica. 1982;50:1029–1054.

13. Heckman N. Spline smoothing in partly linear models. J Royal Stat Soc Ser B. 1986;48:244–248.

14. Liang K, Zeger S. Longitudinal Data-Analysis Using Generalized Linear-Models. Biometrika. 1986;73:13–22.

15. Lin X, Carroll R. Nonparametric Function Estimation for Clustered Data When the Predictor is Measured without/with Error. Journal of The American Statistical Association. 2000;95:520–534.

16. Opsomer J. Estimating a Function by Local Linear Regression when the Errors are Correlated. Preprint 95-42, Department of Statistics, Iowa State University; 1995.

17. Opsomer J, Wang Y, Yang Y. Nonparametric Regression with Correlated Errors. Statistical Science. 2001;16:134–153.

18. Qu A, Lindsay B, Li B. Improving Generalized Estimating Equations Using Quadratic Inference Functions. Biometrika. 2000;87:823–836.

19. Ruppert D, Sheather SJ, Wand MP. An Effective Bandwidth Selector for Local Least Squares Regression. Journal of the American Statistical Association. 1995;90:1257–1270.

20. Severini TA, Staniswalis JG. Quasi-likelihood estimation in semiparametric models. J Amer Stat Assoc. 1994;89:501–511.

21. Silverman BW. Density Estimation for Statistics and Data Analysis. Chapman and Hall; London: 1986.

22. Speckman P. Kernel smoothing in partial linear models. Journal Royal Statist Soc B. 1988;50:413–436.

23. Wahba G. Partial spline models for semiparametric estimation of functions of several variables. In: Statistical Analysis of Time Series, Proceedings of the Japan-U.S. Joint Seminar; Tokyo; 1984. pp. 319–329.

24. Wang H, Li R, Tsai C-L. Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika. 2007;94:553–568.

25. Wang N. Marginal nonparametric kernel regression accounting for within-subject correlation. Biometrika. 2003;90:29–42.

26. Xiao Z, Linton O, Carroll RJ, Mammen E. More Efficient Local Polynomial Estimation in Nonparametric Regression with Autocorrelated Errors. Journal of the American Statistical Association. 2003;98:980–992.

27. Zeger S, Liang K. Longitudinal Data-Analysis for Discrete and Continuous Outcomes. Biometrics. 1986;42:121–130.
