


Ann Stat. Author manuscript; available in PMC 2010 December 6.

Published in final edited form as:

Ann Stat. 2009 December; 37(6B): 4153–4183.

doi: 10.1214/09-AOS713. PMCID: PMC2997475

NIHMSID: NIHMS248827

Princeton University


Generalized linear models and the quasi-likelihood method extend ordinary regression models to accommodate more general conditional distributions of the response. Nonparametric methods need no explicit parametric specification, and the resulting model is completely determined by the data themselves. However, nonparametric estimation schemes generally have a slower convergence rate, as with the local polynomial smoothing estimation of nonparametric generalized linear models studied in Fan, Heckman and Wand (1995). In this work, we propose two parametrically guided nonparametric estimation schemes that incorporate prior shape information on the link transformation of the response variable's conditional mean as a function of the predictor variable. Asymptotic results and numerical simulations demonstrate the improvement of the new estimation schemes over their nonparametric counterpart.

As an extension of the ordinary linear model, the generalized linear model (GLM) broadens ordinary linear regression to accommodate more general conditional distributions of the response. It was first introduced by Nelder and Wedderburn (1972). Its estimation is based on the iteratively reweighted least squares (IRLS) algorithm, which requires only a relationship between the response's conditional mean and variance rather than its full conditional distribution. Wedderburn (1974) noticed this feature and, in an important further extension, replaced the log-likelihood by a quasi-loglikelihood function. This is usually referred to as the quasi-likelihood method (QLM).

In generalized linear models (GLMs) (McCullagh and Nelder, 1989), a typical parametric assumption is that a transformation of the response's conditional mean, defined through the link function, belongs to some parametric family (say, linear or quadratic in the predictor variables). However, misspecification of the parametric family can lead to a completely wrong picture of the underlying conditional mean function. This deficiency of parametric modeling has long been recognized in ordinary regression and applies to GLMs as well. It calls for an extension of nonparametric regression techniques to GLMs. Green and Yandell (1985), O'Sullivan et al. (1986), and Cox and O'Sullivan (1990) studied the extension of smoothing splines. Tibshirani and Hastie (1987) based their generalization on the "running lines" smoother. Fan, Heckman and Wand (1995) extended the local polynomial fitting technique, which includes Staniswalis (1989) as a special case.

Local polynomial smoothing is a useful technique for exploring unknown structure in regression and dates back to Stone (1975, 1977). The area blossomed when Fan (1993) provided a deep theoretical understanding and discovered its elegant properties, especially the automatic boundary carpentry. Here we focus on local polynomial techniques, although the idea can be extended to other nonparametric methods.

Nonparametric methods need no explicit specification of the form of the conditional mean in ordinary regression and, more generally, of the link transformation of the conditional mean in the context of GLMs. However, they have in general a slower rate of convergence. In practice, prior knowledge or exploratory studies may provide some prior information about the shape of the link function, which can readily guide the nonparametric modeling process. In the literature, parametrically guided nonparametric estimation methods have been proposed to improve over their nonparametric counterparts in the context of density estimation (Hjort and Glad, 1995; Naito, 2004) and least squares regression (Glad, 1998; Martins-Filho et al., 2007). The idea is easy to explain in the least squares regression case. Assume that the response *Y* given a covariate *X* has conditional mean *m*(*x*) = E(*Y*|*X* = *x*). Once a parametric estimator *m*(*x*, **α̂**) of *m*(*x*) is obtained, any nonparametric method can be applied to the residuals {*Y _{i}*/*m*(*X _{i}*, **α̂**)} or {*Y _{i}* − *m*(*X _{i}*, **α̂**)} to estimate the correction *m*(*x*)/*m*(*x*, **α̂**) or *m*(*x*) − *m*(*x*, **α̂**), and the guided estimate then multiplies the parametric fit by, or adds to it, the estimated correction.

Due to its nice bias-reduction property, it is desirable to extend this parametrically guided estimation scheme to GLMs and the QLM. However, for a response with a general distribution other than normal, the residuals *Y*/*m*(*X*, **α̂**) and *Y* − *m*(*X*, **α̂**) do not have nice statistical properties that facilitate estimating *m*(*x*)/*m*(*x*, **α̂**) and *m*(*x*) − *m*(*x*, **α̂**), so the straightforward extension is not possible. In this work, we take on this problem and extend the parametrically guided estimation scheme to GLMs and the QLM. Asymptotic theory and numerical simulations are used to justify the proposed methods. In the literature, similar approaches have been used to reduce variance: Cheng et al. (2007) proposed forming a linear combination with a preliminary estimator to reduce variance in smoothing, and Cheng and Hall (2003) studied variance reduction in nonparametric surface estimation.

The rest of the paper is organized as follows. Section 2 presents the fundamental framework of GLMs and the QLM. Two new semiparametric estimation schemes, based on additive and multiplicative corrections, are introduced in Section 3. Asymptotic properties developed in Section 4 show their improvement over the original nonparametric counterpart. Section 5 gives a general pre-asymptotic bandwidth selector based on the bias-variance tradeoff. Simulations in Section 6 and a real data analysis in Section 7 show the new schemes' finite-sample performance in comparison with the original nonparametric method. We conclude with a short discussion in Section 8. All technical proofs are given in the Appendix.

Let (**X**_{1}, *Y*_{1}), …, (**X**_{n}, *Y _{n}*) be a set of independent observations, where the conditional density of the response *Y* given the covariates **X** = **x** is assumed to belong to a one-parameter exponential family:

$${f}_{Y|\mathbf{X}}(y|\mathbf{x})=\text{exp}([y\theta (\mathbf{x})-b(\theta (\mathbf{x}))]/a(\varphi )+c(y,\varphi )),$$

(2.1)

where *b*(·) and *c*(·, ·) are some known functions, ϕ is the dispersion parameter, and *a*(ϕ) typically can be written as ϕ/*w* with weight parameter *w*. The parameter θ is usually called the canonical or natural parameter. For the one-parameter exponential family (2.1), the conditional mean and variance of the response variable are given by μ(**x**) = *b*′(θ(**x**)) and var(*Y|***X** = **x**) = *a*(ϕ)*b*″(θ(**x**)), respectively.

Parametric GLMs typically assume that a transformation of the conditional mean function μ(**x**) is linear in regression coefficients, *i.e*.,

$$\eta (\mathbf{x})={\mathit{\beta}}_{0}+{\mathbf{x}}^{T}\mathit{\beta}\text{\hspace{1em}and\hspace{1em}}\eta (\mathbf{x})=g(\mu (\mathbf{x})),$$

where *g*(·) is monotonic and referred to as the link function.

The canonical link refers to *g* = (*b*′)^{−1}, in view of *b*′(θ(**x**)) = μ(**x**). When the canonical link is used, the composition *g* ∘ *b*′(·) reduces to the identity function and θ(**x**) coincides with η(**x**). In this case, the conditional density simplifies to

$${f}_{Y|\mathbf{X}}(y|\mathbf{x})=\text{exp}([y\eta (\mathbf{x})-b(\eta (\mathbf{x}))]/a(\varphi )+c(y,\varphi )).$$
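As a concrete illustration (ours, not part of the original derivation), the Bernoulli family has *b*(θ) = log(1 + e^θ), so *b*′ is the logistic function and the canonical link *g* = (*b*′)^{−1} is the logit. A quick numerical check in Python:

```python
import numpy as np

# Bernoulli family: b(theta) = log(1 + e^theta), so b'(theta) is the
# logistic function and the canonical link g = (b')^{-1} is the logit.
theta = np.linspace(-3, 3, 7)
mu = 1 / (1 + np.exp(-theta))    # mu = b'(theta)
g_of_mu = np.log(mu / (1 - mu))  # logit(mu)

# With the canonical link, g(b'(theta)) reduces to the identity in theta.
print(np.allclose(g_of_mu, theta))
```

The same check works for the Poisson family, where *b*(θ) = e^θ and the canonical link is the log.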

In practice, there are many circumstances in which the full likelihood is unknown but the relationship between the conditional mean and variance is readily available. In this case, estimation of the conditional mean function can be achieved by replacing the conditional log-likelihood log *f*_{Y|X}(*y*|**x**) with a quasi-loglikelihood function *Q*(μ(**x**), *y*). When the conditional variance is modeled as var(*Y*|**X** = **x**) = *V*(μ(**x**)) for some known positive function *V*(·), the corresponding *Q*(μ, *y*) satisfies

$$U(w,y)=\frac{\partial}{\partial w}Q(w,y)=\frac{y-w}{V(w)},$$

(2.2)

where *V*(·) is called the variance function. More explicitly, $Q(\mu ,y)={\displaystyle {\int}_{y}^{\mu}(y-w)/V(w)dw}$. See Wedderburn (1974) and Chapter 9 of McCullagh and Nelder (1989) for more details. The quasi-score (2.2) possesses properties similar to those of the usual log-likelihood score function, *i.e*., it satisfies the first two moment conditions of Bartlett's identities. Note that the log-likelihood of the one-parameter exponential family (2.1) is a special case of the quasi-likelihood with *V*(·) = *a*(ϕ)*b*″ ∘ (*b*′)^{−1}(·).
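To make the construction concrete, here is a small numerical sketch (our illustration, not the authors' code) verifying that for the Poisson-type variance *V*(*w*) = *w* the integral recovers the Poisson log-likelihood up to a term in *y* alone, and that differentiating returns the quasi-score (2.2):

```python
import numpy as np

def Q(mu, y, V, n=40001):
    # Q(mu, y) = integral_y^mu (y - w) / V(w) dw, by the trapezoidal rule.
    w = np.linspace(y, mu, n)
    f = (y - w) / V(w)
    return np.sum((f[1:] + f[:-1]) / 2 * np.diff(w))

y, mu = 3.0, 5.0
q = Q(mu, y, V=lambda w: w)             # Poisson-type variance V(w) = w
closed = y * np.log(mu / y) - (mu - y)  # equals y*log(mu) - mu up to a term in y
print(abs(q - closed) < 1e-8)

# Quasi-score (2.2): dQ/dmu = (y - mu) / V(mu), checked by central differences.
eps = 1e-5
score = (Q(mu + eps, y, lambda w: w) - Q(mu - eps, y, lambda w: w)) / (2 * eps)
print(abs(score - (y - mu) / mu) < 1e-5)
```

Swapping in *V*(*w*) = *w*(1 − *w*) gives the Bernoulli quasi-likelihood in the same way.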

Due to the generality of the QLM and the fact that it includes GLMs as a special case, we will focus on the QLM. Fan, Heckman and Wand (1995) introduced nonparametric GLMs and QLM by extending local polynomial techniques; we follow their framework and notation. To ease the presentation, we focus on the one-dimensional case, as the extension to the multivariate case is straightforward. In the one-dimensional case, our data consist of *n* pairs of observations ${({X}_{i},{Y}_{i})}_{i=1}^{n}$.

To enhance flexibility, η(*x*) can be modeled nonparametrically, with estimation via the local polynomial technique. For any *x*_{0} in its domain, the local polynomial estimator of η(*x*_{0}) is given by η̂(*x*_{0}) ≡ η̂(*x*_{0}; *p*, *h*) = β̂_{0}, where **β̂** = (β̂_{0}, β̂_{1}, …, β̂_{p})^{T} maximizes the locally weighted quasi-likelihood function

$$Q(\mathit{\beta})=Q(\mathit{\beta};h,{x}_{0})={\displaystyle \sum _{i=1}^{n}Q({g}^{-1}({\mathbf{X}}_{i}^{T}\mathit{\beta}),{Y}_{i}){K}_{h}({X}_{i}-{x}_{0}),}$$

(2.3)

where, with slight abuse of notation, we define **X**_{i} = (1, *X _{i}* − *x*_{0}, …, (*X _{i}* − *x*_{0})^{p})^{T} and *K _{h}*(·) = *K*(·/*h*)/*h* for a kernel function *K* and a bandwidth *h*.
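A minimal sketch of maximizing (2.3) for the Poisson family with the canonical (log) link, using Newton's method: this is our own illustration, with hypothetical simulation settings borrowed from Example 6.1, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
eta0 = lambda x: 3 * np.sin(np.pi / 4 * x - np.pi / 2) + 6

# Simulated data in the spirit of Example 6.1 (a larger n for a stable check).
n = 2000
X = rng.uniform(-2, 2, n)
Y = rng.poisson(np.exp(eta0(X)))

def local_poly_quasi(x0, X, Y, h, p=1, iters=25):
    """Maximize the local quasi-likelihood (2.3) for the Poisson family with
    canonical link g = log, by Newton's method; returns beta_hat_0 = eta_hat(x0)."""
    u = X - x0
    K = np.where(np.abs(u / h) <= 1, 0.75 * (1 - (u / h) ** 2), 0.0) / h  # Epanechnikov
    Z = np.vander(u, p + 1, increasing=True)   # rows X_i = (1, u_i, ..., u_i^p)
    beta = np.zeros(p + 1)
    beta[0] = np.log(np.average(Y, weights=K) + 1e-12)
    for _ in range(iters):
        mu = np.exp(np.clip(Z @ beta, -30, 30))
        grad = Z.T @ (K * (Y - mu))            # locally weighted quasi-score
        hess = (Z.T * (K * mu)) @ Z            # negative Hessian (positive definite)
        beta += np.linalg.solve(hess, grad)
    return beta[0]

est = local_poly_quasi(0.0, X, Y, h=0.4)
print(abs(est - eta0(0.0)) < 0.3)  # eta0(0) = 3
```

Newton's method converges quickly here because the locally weighted Poisson quasi-loglikelihood is concave in **β**.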

As argued in the introduction, prior knowledge, a physical model, or exploratory analysis may give useful information about the shape of η(*x*), which falls approximately into a parametric family {η(*x*, **α**) : **α** = (α_{1}, α_{2}, …, α_{q})^{T} ∈ 𝔸 ⊆ ℝ^{q}}. In this section, we present two new estimation schemes that incorporate this shape information to guide the estimation of η(*x*). Within the parametric family η(*x*, **α**), we find the optimal fit by maximizing

$$\sum _{i=1}^{n}Q({g}^{-1}(\eta ({X}_{i},\mathbf{\alpha})),{Y}_{i})$$

(3.1)

with respect to **α** ∈ 𝔸. Denote the best fit by η(*x*, **α̂**), where **α̂** is the maximizer of (3.1).

In the local polynomial fitting framework, the bias comes from the approximation error of the Taylor expansion: the smaller the approximation error, the smaller the bias of the local polynomial estimator. Recall that we identify a parametric family {η(*x*, **α**) : **α** ∈ 𝔸} based on exploratory studies or prior knowledge and find the best fit η(*x*, **α̂**) within this family. As a result, η(*x*, **α̂**) should capture the major shape of η(*x*), so the ratio η(*x*)/η(*x*, **α̂**) and the difference η(*x*) − η(*x*, **α̂**) have less variation than the original η(*x*). Consequently, they are easier to approximate, and the approximation errors in their Taylor expansions are smaller than those of η(*x*) itself. For example, in our Poisson simulation Example 6.1 the true unknown η(·) is given by ${\eta}_{0}(x)=3\text{sin}(\frac{\pi}{4}x-\frac{\pi}{2})+6$ for *x* ∈ [−2, 2] (the solid line in panel (A) of Figure 1). The nonparametric estimate η̂(·), given by the dotted line in Figure 2, indicates a parabolic shape. Hence we identify the parametric family {η(*x*, **α**) = α_{1} + α_{2}*x* + α_{3}*x*^{2} : **α** = (α_{1}, α_{2}, α_{3})^{T} ∈ ℝ^{3}}, within which the best fit is given by the dotted line in panel (A) of Figure 1. The difference η(*x*) − η(*x*, **α̂**) and the ratio η(*x*)/η(*x*, **α̂**) are shown in panels (B) and (C) of Figure 1, respectively. Both are much flatter than the original function η(·), as desired.
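The "flatter target" claim can be checked numerically. The following sketch (our illustration) compares the maximal curvature of η₀ with that of η₀ minus its best least-squares quadratic, which stands in for the quasi-likelihood fit from (3.1):

```python
import numpy as np

eta0 = lambda x: 3 * np.sin(np.pi / 4 * x - np.pi / 2) + 6
x = np.linspace(-2, 2, 4001)

# Best quadratic guide in the least-squares sense (a stand-in for (3.1)).
quad = np.polyval(np.polyfit(x, eta0(x), 2), x)

# Local linear bias is driven by the second derivative of the target,
# so the difference eta0 - guide should have smaller maximal curvature.
d2 = lambda f: np.gradient(np.gradient(f, x), x)
c_orig = np.max(np.abs(d2(eta0(x))))
c_diff = np.max(np.abs(d2(eta0(x) - quad)))
print(c_diff < c_orig)
```

Here the maximal curvature drops from about 1.85 to about 1.55, which is exactly the mechanism behind the bias reduction.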

Plots of the true η(·), the nonparametric estimate η̂(·), and the two parametrically guided estimates η̂_{a}(·) and η̂_{m}(·) for one random sample in Example 6.1.

Based on the above argument, two different estimation schemes corresponding to multiplicative and additive corrections are introduced in Sections 3.2 and 3.3, respectively.

Consider the multiplicative identity

$$\eta (x)\equiv \eta (x,\mathbf{\alpha}){r}_{m}(x),$$

where *r _{m}*(*x*) ≡ η(*x*)/η(*x*, **α**) is a multiplicative correction function. With the estimated guide η(·, **α̂**) from (3.1), the correction *r _{m}*(·) can be estimated by the local polynomial technique: maximize

$$\sum _{i=1}^{n}Q({g}^{-1}({\mathbf{X}}_{i}^{T}\mathit{\beta}\,\eta ({X}_{i},\widehat{\mathbf{\alpha}})),{Y}_{i}){K}_{h}({X}_{i}-{x}_{0})$$

with respect to **β** and set *r̂ _{m}*(*x*_{0}) = β̂_{0}, the first component of the maximizer. Then η(*x*_{0}) can be estimated by η(*x*_{0}, **α̂**)*r̂ _{m}*(*x*_{0}). This two-step formulation is equivalent to the following one, which we prefer because it facilitates our theoretical development.

Locally approximating *r _{m}*(·) by a polynomial function and re-scaling by the factor η(*x*_{0}, **α̂**), we define the local quasi-likelihood

$${Q}_{m}(\mathit{\beta})\equiv {Q}_{m}(\mathit{\beta};h,{x}_{0},\widehat{\mathbf{\alpha}})={\displaystyle \sum _{i=1}^{n}Q({g}^{-1}({\mathbf{X}}_{i}^{T}\mathit{\beta}\eta ({X}_{i},\widehat{\mathbf{\alpha}})/\eta ({x}_{0},\widehat{\mathbf{\alpha}})),{Y}_{i}){K}_{h}({X}_{i}-{x}_{0}).}$$

(3.2)

We maximize (3.2) with respect to **β** and set the final estimator η̂_{m}(*x*_{0}) ≡ η̂_{m}(*x*_{0}; *p*, *h*, **α̂**) = β̂_{0}. In this formulation, the Taylor expansion ${\mathbf{X}}_{i}^{T}\mathit{\beta}$ approximates η(*X*_{i})η(*x*_{0}, **α̂**)/η(*X*_{i}, **α̂**) locally at *x* = *x _{0}*. Since this target function equals η(*x*_{0}) at *X _{i}* = *x*_{0}, it immediately justifies setting η̂_{m}(*x*_{0}) = β̂_{0}.

The other additive identity

$$\eta (x)\equiv \eta (x,\mathbf{\alpha})+{r}_{a}(x)$$

with *r _{a}*(*x*) ≡ η(*x*) − η(*x*, **α**) leads to our additive correction scheme, in which we maximize

$${Q}_{a}(\mathit{\beta})\equiv {Q}_{a}(\mathit{\beta};h,{x}_{0},\widehat{\mathbf{\alpha}})={\displaystyle \sum _{i=1}^{n}Q({g}^{-1}(\eta ({X}_{i},\widehat{\mathbf{\alpha}})-\eta ({x}_{0},\widehat{\mathbf{\alpha}})+{\mathbf{X}}_{i}^{T}\mathit{\beta}),{Y}_{i}){K}_{h}({X}_{i}-{x}_{0})}$$

(3.3)

with respect to **β**. Similarly, the expansion ${\mathbf{X}}_{i}^{T}\mathit{\beta}$ in this formulation approximates η(*X*_{i}) − η(*X*_{i}, **α̂**) + η(*x*_{0}, **α̂**) locally at *x* = *x _{0}*. Hence the final estimator is η̂_{a}(*x*_{0}) ≡ η̂_{a}(*x*_{0}; *p*, *h*, **α̂**) = β̂_{0}, since the target function equals η(*x*_{0}) at *X _{i}* = *x*_{0}.

As argued in Section 3.1, η(*x*)η(*x*_{0}, **α̂**)/η(*x*, **α̂**) and η(*x*) − η(*x*, **α̂**) + η(*x*_{0}, **α̂**) are flatter and easier to approximate, allowing a larger bandwidth. As a result, our two parametrically guided nonparametric estimation schemes lead to estimators with smaller bias. This is confirmed by the theoretical results in Section 4 and the Monte Carlo studies in Section 6.
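Putting the pieces together, here is a sketch of the additive scheme (3.3) with a local linear fit in the Poisson setting of Example 6.1. This is our own illustration; for simplicity the guide is fitted by least squares to the true curve, standing in for the maximizer of (3.1), and the multiplicative scheme (3.2) differs only in how the guide enters the linear predictor.

```python
import numpy as np

rng = np.random.default_rng(1)
eta0 = lambda x: 3 * np.sin(np.pi / 4 * x - np.pi / 2) + 6
n = 2000
X = rng.uniform(-2, 2, n)
Y = rng.poisson(np.exp(eta0(X)))

# Quadratic guide eta(x, alpha_hat): a hypothetical stand-in for (3.1).
grid = np.linspace(-2, 2, 401)
guide_coef = np.polyfit(grid, eta0(grid), 2)
guide = lambda x: np.polyval(guide_coef, x)

def eta_hat_a(x0, h, iters=30):
    """Additive correction (3.3) for the Poisson family with canonical log
    link: eta(X_i, a_hat) - eta(x0, a_hat) enters as an offset."""
    u = X - x0
    K = np.where(np.abs(u / h) <= 1, 0.75 * (1 - (u / h) ** 2), 0.0) / h
    Z = np.column_stack([np.ones_like(u), u])  # local linear: p = 1
    off = guide(X) - guide(x0)
    beta = np.array([np.log(np.average(Y, weights=K) + 1e-12), 0.0])
    for _ in range(iters):
        mu = np.exp(np.clip(off + Z @ beta, -30, 30))
        beta += np.linalg.solve((Z.T * (K * mu)) @ Z, Z.T @ (K * (Y - mu)))
    return beta[0]  # estimates eta0(x0)

est = eta_hat_a(0.0, h=0.8)
print(abs(est - eta0(0.0)) < 0.3)  # target: eta0(0) = 3
```

Because the guided target is flat, the bandwidth here (0.8) can be taken twice as large as in the unguided sketch without inflating the bias.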

To proceed with the theoretical justification of the bias reduction, we assume that the data ${({X}_{i},{Y}_{i})}_{i=1}^{n}$ are generated from the quasi-likelihood model with unknown true η_{0}(*x*). Asymptotic properties of the final estimates are established in two steps: (1) establish asymptotic properties with a fixed parametric guide (Section 4.1), and (2) show that the same asymptotic properties hold with an estimated parametric guide (Section 4.2).

The asymptotic properties of the local polynomial estimator differ between *x*_{0} lying in the interior of supp(*f*) and *x*_{0} lying near its boundary. Suppose that *K* is supported on [−1, 1]; then *K _{h}*(· − *x*_{0}) is supported on [*x*_{0} − *h*, *x*_{0} + *h*], and boundary effects arise when this interval is not contained in supp(*f*). For a set 𝒜 ⊆ [−1, 1], let ν_{l}(𝒜) = ∫_{𝒜} *z*^{l}*K*(*z*)*dz*, let **N**_{p}(𝒜) be the (*p* + 1) × (*p* + 1) matrix with (*i*, *j*) entry ν_{i+j−2}(𝒜), and let **M**_{r,p}(*z*; 𝒜) be **N**_{p}(𝒜) with its (*r* + 1)-th column replaced by (1, *z*, …, *z*^{p})^{T}. Define the induced kernel

$${K}_{r,p}(z;\mathcal{A})=r!\{|{\mathbf{M}}_{r,p}(z;\mathcal{A})|/|{\mathbf{N}}_{p}(\mathcal{A})|\}K(z).$$

(4.1)

When 𝒜 = [−1, 1], we suppress 𝒜 and simply write ν_{l}, **N**_{p}, **M**_{r,p}, and *K _{r,p}*. Also define

$${\sigma}_{r,s,p}^{2}({x}_{0};K,\mathcal{A})=\text{var}(Y|X={x}_{0})g\prime {(\mu ({x}_{0}))}^{2}f{({x}_{0})}^{-1}{\displaystyle {\int}_{\mathcal{A}}{K}_{r,p}(z;\mathcal{A}){K}_{s,p}(z;\mathcal{A})dz.}$$

Recall that our parametrically guided nonparametric estimators are obtained by maximizing *Q _{m}*(**β**; *h*, *x*_{0}, **α̂**) and *Q _{a}*(**β**; *h*, *x*_{0}, **α̂**) defined in (3.2) and (3.3), in which the guide is estimated from the data. As a first step, we study the simpler case in which a fixed guide η(·, **α**) is used.

For the multiplicative correction with a fixed guide η(·, **α**), asymptotic properties of the corresponding estimator η̂_{m}(*x*_{0}; *p*, *h*, **α**) are given in Theorems 1 and 2.

**Theorem 1.** *Let p* > 0 *be odd and suppose that Conditions (A1)–(A5) stated in the Appendix are satisfied. Assume that h = h _{n}* → 0 *and nh* → ∞ *as n* → ∞, *and that* η(*x*_{0}, **α**) ≠ 0. *Then, for x*_{0} *in the interior of* supp(*f*),

$$\frac{\sqrt{nh}}{{\sigma}_{0,0,p}({x}_{0};K)}[{\widehat{\eta}}_{m}({x}_{0};p,h,\mathbf{\alpha})-{\eta}_{0}({x}_{0})-Bia{s}_{m,o}]\stackrel{\mathcal{D}}{\to}N(0,1),$$

(4.2)

*where*

$$Bia{s}_{m,o}=\frac{\eta ({x}_{0},\mathbf{\alpha})}{(p+1)!}{(\frac{{\eta}_{0}(\cdot )}{\eta (\cdot ,\mathbf{\alpha})})}^{(p+1)}({x}_{0}){h}^{p+1}\left({\displaystyle \int {z}^{p+1}{K}_{0,p}(z)dz}\right)\phantom{\rule{thinmathspace}{0ex}}\{1+O(h)\}$$

*and the subscripts m and o stand for multiplicative correction and odd p, respectively. If x*_{0} = *x _{n} is of the form x _{n}* = *x*_{∂} + *ch for a point x*_{∂} *on the boundary of* supp(*f*), *then (4.2) continues to hold with K*_{0,p} *and* σ_{0,0,p}(*x*_{0}; *K*) *replaced by their boundary versions K*_{0,p}(·; 𝒜) *and* σ_{0,0,p}(*x*_{0}; *K*, 𝒜).

**Theorem 2.** *Let p* > 0 *be even and suppose that Conditions (A1)–(A5) stated in the Appendix are satisfied. Assume that h = h _{n}* → 0 *and nh*^{3} → ∞ *as n* → ∞, *and that* η(*x*_{0}, **α**) ≠ 0. *Then, for x*_{0} *in the interior of* supp(*f*),

$$\frac{\sqrt{nh}}{{\sigma}_{0,0,p}({x}_{0};K)}({\widehat{\eta}}_{m}({x}_{0};p,h,\mathbf{\alpha})-{\eta}_{0}({x}_{0})-Bia{s}_{m,e})\stackrel{\mathcal{D}}{\to}N(0,1),$$

(4.3)

*where*

$$Bia{s}_{m,e}=\left\{\frac{1}{(p+2)!}{\left(\frac{{\eta}_{0}(\cdot )}{\eta (\cdot ,\mathbf{\alpha})}\right)}^{(p+2)}({x}_{0})\,\eta ({x}_{0},\mathbf{\alpha})+\frac{1}{(p+1)!}{\left(\frac{{\eta}_{0}(\cdot )}{\eta (\cdot ,\mathbf{\alpha})}\right)}^{(p+1)}({x}_{0})\,\frac{(\rho {\eta}^{2}(\cdot ,\mathbf{\alpha})f)^{\prime}({x}_{0})}{(\rho \eta (\cdot ,\mathbf{\alpha})f)({x}_{0})}\right\}\left({\displaystyle \int {z}^{p+2}{K}_{0,p}(z)dz}\right){h}^{p+2}\{1+O(h)\}.$$

*If x*_{0} = *x _{n} is of the form x _{n}* = *x*_{∂} + *ch for a boundary point x*_{∂} *of* supp(*f*), *then (4.3) continues to hold with the kernel constants and* σ_{0,0,p} *replaced by their boundary versions.*

**Remark 1.** Note that we use η(*x*_{0}, **α**) in the denominator for the multiplicative correction. This poses a difficulty in handling any zero point of η(·, **α**), *i.e*., any *x*_{0} satisfying η(*x*_{0}, **α**) = 0. Such zero points are ruled out in Theorems 1 and 2. A similar observation was made by Hjort and Glad (1995). This difficulty did not arise in our (limited) numerical experiments.

**Remark 2.** To simplify the presentation, we state only the asymptotic properties of the estimator of the original function η_{0}(·). However, our method can also estimate its higher-order derivatives, for which the asymptotic properties can be found in Propositions 1 and 2 in the Appendix.

Our next two theorems give the asymptotic properties of the estimator η̂_{a}(*x*_{0}; *p*, *h*, **α**) with a fixed guide η(·, **α**).

**Theorem 3.** *Let p* > 0 *be odd and suppose that Conditions (A1)–(A5) stated in the Appendix are satisfied. Assume that h = h _{n}* → 0 *and nh* → ∞ *as n* → ∞. *Then, for x*_{0} *in the interior of* supp(*f*),

$$\frac{\sqrt{nh}}{{\sigma}_{0,0,p}({x}_{0};K)}[{\widehat{\eta}}_{a}({x}_{0};p,h,\mathbf{\alpha})-{\eta}_{0}({x}_{0})-Bia{s}_{a,o}]\stackrel{\mathcal{D}}{\to}N(0,1)$$

(4.4)

*where*

$$Bia{s}_{a,o}=\left\{{\displaystyle \int {z}^{p+1}{K}_{0,p}(z)dz}\right\}\left\{\frac{{\eta}_{0}^{(p+1)}({x}_{0})-{\eta}^{(p+1)}({x}_{0},\mathbf{\alpha})}{(p+1)!}\right\}{h}^{p+1}\{1+O(h)\}.$$

*If x*_{0} = *x _{n} is of the form x _{n}* = *x*_{∂} + *ch for a boundary point x*_{∂} *of* supp(*f*), *then (4.4) continues to hold with the kernel constants and* σ_{0,0,p} *replaced by their boundary versions.*

**Theorem 4.** *Let p* > 0 *be even and suppose that Conditions (A1)–(A5) stated in the Appendix are satisfied. Assume that h = h _{n}* → 0 *and nh*^{3} → ∞ *as n* → ∞. *Then, for x*_{0} *in the interior of* supp(*f*),

$$\frac{\sqrt{nh}}{{\sigma}_{0,0,p}({x}_{0};K)}({\widehat{\eta}}_{a}({x}_{0};p,h,\mathbf{\alpha})-{\eta}_{0}({x}_{0})-Bia{s}_{a,e})\stackrel{\mathcal{D}}{\to}N(0,1)$$

(4.5)

*where*

$$Bia{s}_{a,e}={h}^{p+2}[\{{\displaystyle \int {z}^{p+2}{K}_{0,p}(z)dz\}\{\frac{{\eta}_{0}^{(p+2)}({x}_{0})-{\eta}^{(p+2)}({x}_{0},\mathbf{\alpha})}{(p+2)!}\}}+\{{\displaystyle \int {z}^{p+2}{K}_{0,p}(z)dz\}\{\frac{({\eta}_{0}^{(p+1)}({x}_{0})-{\eta}^{(p+1)}({x}_{0},\mathbf{\alpha}))(\rho f)\prime ({x}_{0})}{(\rho f)({x}_{0})(p+1)!}\}]\{1+O(h)\}.}$$

*If x*_{0} = *x _{n} is of the form x _{n}* = *x*_{∂} + *ch for a boundary point x*_{∂} *of* supp(*f*), *then (4.5) continues to hold with the kernel constants and* σ_{0,0,p} *replaced by their boundary versions.*

Note that our proposed estimation schemes use the best parametric fit η(*x*, **α̂**) estimated from the data rather than a fixed guide η(*x*, **α**). Compared with the simpler fixed-guide case, parameter estimation variability now enters the asymptotics. However, we show below that asymptotically there is no loss of precision caused by the additional estimation step.

Clearly, the parametric family used in the first step of our estimation schemes is most likely misspecified, so the first-stage parametric estimator is a maximum quasi-likelihood estimator under a misspecified model. As in Hurvich and Tsai (1995), denote the assumed parametric joint density of (*X _{i}*, *Y _{i}*) by *h*(*x*, *y*; **α**) and the corresponding true unknown joint density by *h*_{0}(*x*, *y*). The pseudo-true parameter is defined as

$${\mathbf{\alpha}}_{0}\triangleq \underset{\mathbf{\alpha}\in \mathbb{A}}{\text{argmin}}\phantom{\rule{thinmathspace}{0ex}}E\phantom{\rule{thinmathspace}{0ex}}\text{log}(\frac{{h}_{0}(X,Y)}{h(X,Y;\mathbf{\alpha})})=\underset{\mathbf{\alpha}\in \mathbb{A}}{\text{argmin}}{\displaystyle \int {\displaystyle \int (Q({g}^{-1}({\eta}_{0}(x)),y)-Q({g}^{-1}(\eta (x,\mathbf{\alpha})),y)){h}_{0}(x,y)dydx,}}$$

where the expectation *E* is taken with respect to the unknown true density.

To proceed, we make regularity assumptions (B1)–(B5), given in the Appendix, to ensure that the pseudo maximum quasi-likelihood estimator **α̂** is root-*n* consistent for **α**_{0}, *i.e*., $\sqrt{n}(\widehat{\mathbf{\alpha}}-{\mathbf{\alpha}}_{0})={O}_{p}(1)$ (see White, 1982).

**Theorem 5.** *Under the additional Conditions (B1)–(B5), the asymptotic results of Theorems 1–4, with* **α** *replaced by* **α**_{0}, *continue to hold when the estimated fit* η(*x*, **α̂**) *is used*.

When optimizing (2.3), (3.2), and (3.3), we need to tune the corresponding smoothing bandwidths. In this work, we use the pre-asymptotic bandwidth selection method introduced in Fan et al. (1998), which is based on the bias-variance tradeoff.

Without loss of generality, we use (3.3) to illustrate the idea: it includes (2.3) as a special case on setting η(*x*, **α̂**) to be a constant function, and it extends analogously to (3.2). In the remainder of this section we denote $\widehat{\mathit{\beta}}=\widehat{\mathit{\beta}}({x}_{0},\widehat{\mathbf{\alpha}})=\underset{\mathit{\beta}}{\text{arg max}}{Q}_{a}(\mathit{\beta};h,{x}_{0},\widehat{\mathbf{\alpha}})$. The bias of the estimate comes from the approximation error in the Taylor expansion. Denote by

$$r({X}_{i})=\eta ({X}_{i})-\eta ({X}_{i},\widehat{\mathbf{\alpha}})-{\displaystyle \sum _{j=0}^{p}({\eta}^{(j)}({x}_{0})-{\eta}^{(j)}({x}_{0},\widehat{\mathbf{\alpha}})){({X}_{i}-{x}_{0})}^{j}/j!}$$

the approximation error at *X _{i}*. Suppose that the (*p* + *a*)-th order derivatives of η(·) and η(·, **α̂**) exist at *x*_{0}. Then the error can be approximated by

$$r({X}_{i})\approx {\displaystyle \sum _{j=1}^{a}[{\eta}^{(p+j)}({x}_{0})-{\eta}^{(p+j)}({x}_{0},\widehat{\mathbf{\alpha}})]{({X}_{i}-{x}_{0})}^{p+j}/(p+j)!\triangleq {r}_{i}.}$$

Here the choice of the approximation order *a* affects the performance of the estimated bias; in practice it can be chosen as *a* = 1 or 2.

Now pretend that the approximate errors *r _{i}* were known. A more accurate local quasi-loglikelihood is

$${Q}_{a}^{*}(\mathit{\beta})={Q}_{a}^{*}(\mathit{\beta};h,{x}_{0},\widehat{\mathbf{\alpha}})={\displaystyle \sum _{i=1}^{n}Q({g}^{-1}(\eta ({X}_{i},\widehat{\mathbf{\alpha}})-\eta ({x}_{0},\widehat{\mathbf{\alpha}})+{\mathbf{X}}_{i}^{T}\mathit{\beta}+{r}_{i}),{Y}_{i}){K}_{h}({X}_{i}-{x}_{0}).}$$

The maximizer of the local quasi-loglikelihood ${Q}_{a}^{*}(\mathit{\beta})$ is denoted by **β̂*** = **β̂***(*x*_{0}, **α̂**). Define

$$\frac{\partial}{\partial \mathit{\beta}}{Q}_{a}^{*}(\mathit{\beta})={\displaystyle \sum _{i=1}^{n}\frac{{Y}_{i}-{g}^{-1}(\eta ({X}_{i},\widehat{\mathbf{\alpha}})-\eta ({x}_{0},\widehat{\mathbf{\alpha}})+{\mathbf{X}}_{i}^{T}\mathit{\beta}+{r}_{i})}{V({g}^{-1}(\eta ({X}_{i},\widehat{\mathbf{\alpha}})-\eta ({x}_{0},\widehat{\mathbf{\alpha}})+{\mathbf{X}}_{i}^{T}\mathit{\beta}+{r}_{i}))}}\phantom{\rule{thinmathspace}{0ex}}\times ({g}^{-1})^{\prime}(\eta ({X}_{i},\widehat{\mathbf{\alpha}})-\eta ({x}_{0},\widehat{\mathbf{\alpha}})+{\mathbf{X}}_{i}^{T}\mathit{\beta}+{r}_{i}){\mathbf{X}}_{i}{K}_{h}({X}_{i}-{x}_{0})$$

and similarly $\frac{{\partial}^{2}}{\partial \mathit{\beta}\partial {\mathit{\beta}}^{T}}{Q}_{a}^{*}(\mathit{\beta})$, to denote the gradient vector and Hessian matrix of the local quasi-likelihood ${Q}_{a}^{*}$, respectively. Applying a Taylor expansion of ${Q}_{a}^{*}$ around **β̂**(*x*_{0}, **α̂**), we get

$$0={Q}_{a}^{*\prime}({\widehat{\mathit{\beta}}}^{*})\approx {Q}_{a}^{*\prime}(\widehat{\mathit{\beta}})+{Q}_{a}^{*\u2033}(\widehat{\mathit{\beta}})({\widehat{\mathit{\beta}}}^{*}-\widehat{\mathit{\beta}}),$$

which implies the following approximation of the estimation bias

$$\widehat{\mathit{\beta}}({x}_{0},\widehat{\mathbf{\alpha}})-{\widehat{\mathit{\beta}}}^{*}({x}_{0},\widehat{\mathbf{\alpha}})\approx {({Q}_{a}^{*\u2033}(\widehat{\mathit{\beta}}))}^{-1}{Q}_{a}^{*\prime}(\widehat{\mathit{\beta}}).$$

Next we assess the variance of the estimator **β̂**. Note that

$$0={Q}_{a}^{\prime}(\widehat{\mathit{\beta}})\approx {Q}_{a}^{\prime}({\mathit{\beta}}^{0})+{Q}_{a}^{\u2033}({\mathit{\beta}}^{0})(\widehat{\mathit{\beta}}-{\mathit{\beta}}^{0}),$$

where ${\mathit{\beta}}^{0}={({\mathit{\beta}}_{0}^{0},{\mathit{\beta}}_{1}^{0},\cdots ,{\mathit{\beta}}_{p}^{0})}^{T}$ and ${\mathit{\beta}}_{j}^{0}=({\eta}^{(j)}({x}_{0})-{\eta}^{(j)}({x}_{0},\widehat{\mathbf{\alpha}}))/(j!)$ for *j* = 0, 1, …, *p*. This implies that

$$\widehat{\mathit{\beta}}-{\mathit{\beta}}^{0}\approx -{Q}_{a}^{\prime\prime}{({\mathit{\beta}}^{0})}^{-1}{Q}_{a}^{\prime}({\mathit{\beta}}^{0})$$

and an approximation for the conditional variance is given by

$$\text{var}(\widehat{\mathit{\beta}}|\mathbb{X})\approx {Q}_{a}^{\prime\prime}{({\mathit{\beta}}^{0})}^{-1}\phantom{\rule{thinmathspace}{0ex}}\text{var}({Q}_{a}^{\prime}({\mathit{\beta}}^{0})|\mathbb{X})\phantom{\rule{thinmathspace}{0ex}}{Q}_{a}^{\prime\prime}{({\mathit{\beta}}^{0})}^{-1}.$$

Here the Hessian matrix can be approximated by ${Q}_{a}^{\prime\prime}(\widehat{\mathit{\beta}})$, and the variance term can be approximated as follows:

$$\begin{array}{lll}\text{var}({Q}_{a}^{\prime}({\mathit{\beta}}^{0})|\mathbb{X})\hfill & =\hfill & {\displaystyle \sum _{i=1}^{n}\text{var}{(\frac{\partial}{\partial \mathit{\beta}}Q({g}^{-1}(\eta ({X}_{i},\widehat{\mathbf{\alpha}})-\eta ({x}_{0},\widehat{\mathbf{\alpha}})+{\mathbf{X}}_{i}^{T}\mathit{\beta}),{Y}_{i})|{\mathbf{X}}_{i})}_{\mathit{\beta}={\mathit{\beta}}^{0}}{K}_{h}^{2}({X}_{i}-{x}_{0})}\hfill \\ \hfill & =\hfill & {\displaystyle \sum _{i=1}^{n}{\xi}_{i}{\mathbf{X}}_{i}{\mathbf{X}}_{i}^{T}{K}_{h}^{2}({X}_{i}-{x}_{0}),}\hfill \end{array}$$

where

$${\xi}_{i}=\text{var}{[\frac{{Y}_{i}-{g}^{-1}(\eta ({X}_{i},\widehat{\mathbf{\alpha}})-\eta ({x}_{0},\widehat{\mathbf{\alpha}})+{\mathbf{X}}_{i}^{T}\mathit{\beta})}{V({g}^{-1}(\eta ({X}_{i},\widehat{\mathbf{\alpha}})-\eta ({x}_{0},\widehat{\mathbf{\alpha}})+{\mathbf{X}}_{i}^{T}\mathit{\beta}))}({g}^{-1})^{\prime}(\eta ({X}_{i},\widehat{\mathbf{\alpha}})-\eta ({x}_{0},\widehat{\mathbf{\alpha}})+{\mathbf{X}}_{i}^{T}\mathit{\beta})|{\mathbf{X}}_{i}]}_{\mathit{\beta}={\mathit{\beta}}^{0}}.$$

Note that *X _{i}* has significant weight only in a neighborhood of *x*_{0}, within which ξ_{i} is approximately constant, namely ξ_{i} ≈ [(*g*^{−1})′(η(*x*_{0}))]^{2}/*V*(*g*^{−1}(η(*x*_{0}))). Hence

$$\text{var}({Q}_{a}^{\prime}({\mathit{\beta}}^{0})|\mathbb{X})\phantom{\rule{thinmathspace}{0ex}}\approx \phantom{\rule{thinmathspace}{0ex}}\frac{{[({g}^{-1})^{\prime}(\eta ({x}_{0}))]}^{2}}{V({g}^{-1}(\eta ({x}_{0})))}{S}_{n},$$

where

$${S}_{n}={\displaystyle \sum _{i=1}^{n}{\mathbf{X}}_{i}{\mathbf{X}}_{i}^{T}{K}_{h}^{2}({X}_{i}-{x}_{0}).}$$

Combining the above results, we get

$$\text{var}(\widehat{\mathit{\beta}}|\mathbb{X})\approx \frac{{[({g}^{-1})^{\prime}(\eta ({x}_{0}))]}^{2}}{V({g}^{-1}(\eta ({x}_{0})))}{Q}_{a}^{\prime\prime}{({\mathit{\beta}}^{0})}^{-1}{S}_{n}{Q}_{a}^{\prime\prime}{({\mathit{\beta}}^{0})}^{-1},$$

where the unknown η(*x*_{0}) and **β**^{0} can be replaced by their estimates η̂_{a}(*x*_{0}) and **β̂**, respectively.

Based on the above arguments, we first select a pilot bandwidth ${\widehat{h}}_{p+a+1,p+a}^{*}$, which can be chosen using the Extended Residual Squares Criterion (ERSC; see Equation (5.6) of Fan et al., 1998). Next we fit a local polynomial of degree *p* + *a* + 1 with bandwidth ${\widehat{h}}_{p+a+1,p+a}^{*}$ to get an estimate **β̂**^{(p+a)} = (β̂_{0}, β̂_{1}, …, β̂_{p+a})^{T} by maximizing the quasi-loglikelihood (3.3). Using **β̂**^{(p+a)}, we compute the approximation errors *r _{i}*, and hence the estimated bias B̂_{p,0}(*x*_{0}; *h*) and estimated variance V̂_{p,0}(*x*_{0}; *h*) from the formulas above. This yields the estimated mean squared error

$${\widehat{\text{MSE}}}_{p,0}({x}_{0};h)={\mathrm{B\u0302}}_{p,0}^{2}({x}_{0};h)+{\mathrm{V\u0302}}_{p,0}({x}_{0};h),$$

which leads to our final bandwidth selector

$${\widehat{h}}_{p,0}=\text{arg}{\text{min}}_{h}{\displaystyle \int {\widehat{\text{MSE}}}_{p,0}(x;h)dx.}$$
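The selector can be sketched end to end. The following is our simplified Python illustration for the plain nonparametric fit (constant guide) in a Poisson setting with *p* = 1 and *a* = 2, plugging the bias approximation and the sandwich variance formula of this section into a grid search over *h*; the pilot bandwidth is simply fixed here rather than chosen by the ERSC, and the MSE is evaluated at a single point rather than integrated.

```python
import numpy as np

rng = np.random.default_rng(2)
eta0 = lambda x: 3 * np.sin(np.pi / 4 * x - np.pi / 2) + 6
n = 1000
X = rng.uniform(-2, 2, n)
Y = rng.poisson(np.exp(eta0(X)))

def kern(u, h):
    z = u / h
    return np.where(np.abs(z) <= 1, 0.75 * (1 - z * z), 0.0) / h  # Epanechnikov

def fit(x0, h, p, iters=30):
    # Local polynomial Poisson quasi-likelihood fit (canonical log link);
    # beta_j estimates eta0^{(j)}(x0) / j!.
    u = X - x0
    K, Z = kern(u, h), np.vander(u, p + 1, increasing=True)
    beta = np.zeros(p + 1)
    beta[0] = np.log(np.average(Y, weights=K) + 1e-12)
    for _ in range(iters):
        mu = np.exp(np.clip(Z @ beta, -30, 30))
        beta += np.linalg.solve((Z.T * (K * mu)) @ Z, Z.T @ (K * (Y - mu)))
    return beta

def mse_hat(x0, h, p=1, a=2, h_pilot=1.2):
    # Estimated MSE_{p,0}(x0; h) = Bias^2 + Var, following Section 5.
    bp = fit(x0, h_pilot, p + a + 1)        # pilot fit of degree p + a + 1
    u = X - x0
    K, Z = kern(u, h), np.vander(u, p + 1, increasing=True)
    beta = fit(x0, h, p)
    r = sum(bp[p + j] * u ** (p + j) for j in range(1, a + 1))  # r_i
    mu_r = np.exp(np.clip(Z @ beta + r, -30, 30))
    grad = Z.T @ (K * (Y - mu_r))           # gradient of Q_a^* at beta_hat
    hess = (Z.T * (K * mu_r)) @ Z           # minus its Hessian
    bias = np.linalg.solve(hess, grad)[0]   # bias approximation (sign immaterial)
    mu = np.exp(np.clip(Z @ beta, -30, 30))
    H = (Z.T * (K * mu)) @ Z                # minus Q_a''(beta^0), at beta_hat
    Sn = (Z.T * K ** 2) @ Z                 # S_n = sum X_i X_i^T K_h^2
    Hi = np.linalg.inv(H)
    var = np.exp(beta[0]) * (Hi @ Sn @ Hi)[0, 0]  # factor mu(x0) for Poisson/log
    return bias ** 2 + var

hs = np.linspace(0.15, 1.0, 12)
mse = np.array([mse_hat(0.5, h) for h in hs])
h_sel = hs[int(np.argmin(mse))]
print(np.all(np.isfinite(mse)))
```

For the Poisson family with canonical link, the factor [(g⁻¹)′(η)]²/V(g⁻¹(η)) reduces to μ(*x*₀), which is why `np.exp(beta[0])` appears in the variance.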

In this section, we use simulations to illustrate the improvement of our newly proposed estimators by comparing them with the original nonparametric method. For the simulations in this section and the real data analysis in the next section, we use the corresponding canonical link and local linear fitting (*p* = 1). To assess the bias term in the pre-asymptotic bandwidth selection discussed in Section 5, we choose the order of the approximation to the Taylor expansion error as *a* = 2. In each simulation study, we first generate ten independent data sets, to which the pre-asymptotic bandwidth selector based on a grid search is applied; the final bandwidth is fixed at the median of the ten selected bandwidths. The different methods, with their corresponding selected bandwidths, are then applied to another 1000 independent data sets. The mean and standard deviation (in parentheses) of the root average squared error (RASE) over these 1000 independent samples are reported. The Epanechnikov kernel is used in all of our numerical examples.

Each observation pair (*X*, *Y*) in this example is generated in two steps: (1) the predictor *X* is uniformly distributed over [−2, 2]; (2) given *X* = *x*, the response *Y* is generated from a Poisson distribution with mean exp(η_{0}(*x*)), where ${\eta}_{0}(x)=3\text{sin}(\frac{\pi}{4}x-\frac{\pi}{2})+6$. Each sample consists of 100 *i.i.d.* pairs of observations. We estimate η_{0}(·) over 100 uniform grid points ${\{{x}_{j}\}}_{j=1}^{100}$ on the interval [−2, 2]. For an estimate η̂(·), the RASE is defined as $\sqrt{{\sum}_{j=1}^{100}{(\widehat{\eta}({x}_{j})-{\eta}_{0}({x}_{j}))}^{2}/100}$. The RASEs for the different methods and parametric guides are reported in Table 1.
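For clarity, the RASE computation takes only a few lines; in this sketch of ours, the best least-squares quadratic stands in for a fitted parametric guide:

```python
import numpy as np

eta0 = lambda x: 3 * np.sin(np.pi / 4 * x - np.pi / 2) + 6
grid = np.linspace(-2, 2, 100)  # 100 uniform grid points on [-2, 2]

def rase(eta_hat_vals):
    # RASE = sqrt( sum_j (eta_hat(x_j) - eta0(x_j))^2 / 100 )
    return np.sqrt(np.mean((eta_hat_vals - eta0(grid)) ** 2))

# E.g., the RASE of the best least-squares quadratic approximation:
quad = np.polyval(np.polyfit(grid, eta0(grid), 2), grid)
print(rase(eta0(grid)) == 0.0, rase(quad) > 0)
```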

In this example, we consider the Bernoulli distribution. The predictor *X* is generated from Uniform[−1, 1]. Conditional on *X* = *x*, the response *Y* is generated from a Bernoulli distribution with success probability exp(η_{0}(*x*))/(1 + exp(η_{0}(*x*))), where η_{0}(*x*) = 2 sin(π*x*). We consider samples of size 500 for two reasons: (1) estimating a Bernoulli success probability is harder than the Poisson case, and (2) the full sinusoid η_{0}(*x*) makes estimation harder still. The function η_{0}(·) is estimated over a uniform grid of 100 points on [−1, 1]. The RASEs for the different methods and parametric guides are reported in Table 2.

In Tables 1 and 2, the first column summarizes the results of the original nonparametric method; the second and third columns correspond to our parametrically guided nonparametric method with additive and multiplicative corrections, respectively. For the guided methods, each row corresponds to the parametric guide specified in the fourth column, with the corresponding parametric RASE given in the last column.

Our parametrically guided nonparametric estimation schemes give smaller RASEs than the original nonparametric method, especially when the parametric family is correctly specified. The only exception is the linear guide α_{1} + α_{2}*x* in Example 6.2, which our theoretical results explain. With local linear fitting, the asymptotic bias depends on the second derivative of η_{0}(·) − η(·, **α**_{0}) + η(*x*_{0}, **α**_{0}) for the additive correction and of η_{0}(·)η(*x*_{0}, **α**_{0})/η(·, **α**_{0}) for the multiplicative correction. A linear guide cannot reduce the second derivative of η_{0}(·) − η(·, **α**_{0}) + η(*x*_{0}, **α**_{0}) and consequently does not reduce the bias. However, a linear guide slightly reduces the second derivative of η_{0}(·)η(*x*_{0}, **α**_{0})/η(·, **α**_{0}) and improves the corresponding performance. This is consistent with our numerical results.

For one random sample in Example 6.1, the best fit within the quadratic family is shown by the dotted line in panel (A) of Figure 1. The true unknown η_{0}(·), the nonparametric estimate η̂(·), and the two parametrically guided estimates η̂_{a}(·) and η̂_{m}(·) are given by the solid, dotted, dashed, and dot-dashed lines, respectively, in Figure 2. From this, we can see that the parametrically guided estimates improve on the nonparametric counterpart around *x* = 0, where the curvature of η_{0}(·) is large and makes nonparametric estimation difficult.
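To make the guided estimator concrete for the Bernoulli case, the multiplicatively guided local polynomial fit can be computed by a kernel-weighted Newton iteration. The sketch below specializes to the logit link with an Epanechnikov kernel; the function names and the Newton scheme are our illustration, not the paper's code, and with the constant guide η(*x*, **α**) = 1 it reduces to the unguided local fit:

```python
import numpy as np

def guided_local_fit(x0, x, y, guide, h, p=1, n_iter=25):
    # Local polynomial quasi-likelihood with a multiplicative guide (logit link):
    # maximize sum_i K((x_i - x0)/h) * loglik(expit(r_i * Z_i beta), y_i),
    # where r_i = guide(x_i) / guide(x0). Returns beta_0, the estimate of eta0(x0).
    z = (x - x0) / h
    w = np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z**2), 0.0)  # Epanechnikov kernel
    keep = w > 0
    Z = np.vander(x[keep] - x0, p + 1, increasing=True)
    r = guide(x[keep]) / guide(x0)
    wk, yk = w[keep], y[keep]
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        lin = np.clip(r * (Z @ beta), -30.0, 30.0)
        prob = 1.0 / (1.0 + np.exp(-lin))
        grad = Z.T @ (wk * r * (yk - prob))
        hess = Z.T @ ((wk * r**2 * prob * (1.0 - prob))[:, None] * Z)
        beta = beta + np.linalg.solve(hess, grad)
    return beta[0]

def const_guide(t):
    # Constant guide eta(x, alpha) = 1: the guided fit becomes the plain local fit
    return np.ones_like(np.asarray(t, dtype=float))

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=2000)
p_true = 1.0 / (1.0 + np.exp(-2.0 * np.sin(np.pi * x)))
y = rng.binomial(1, p_true)
est = guided_local_fit(0.0, x, y, guide=const_guide, h=0.3)  # eta0(0) = 0
```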

In this section, we apply our newly proposed parametrically guided nonparametric estimation schemes to the Financial Aid Award Data from the National Longitudinal Survey of the High School Class of 1972. The data set is available online, and interested readers may find more information about it at http://www.oswego.edu/~kane/econometrics/finaid.htm. There are twenty variables. We are interested in using the SAT score (*X*) to predict whether a student received financial aid grants. There are 3076 students in total with SAT scores between 600 and 1300. Of these 3076 students, 916 received some financial aid grants. The binary response *Y* is coded so that *Y* = 1 means that the student received financial aid grants and *Y* = 0 otherwise. Our parametrically guided logistic regression is applied with a cubic guide. The pre-asymptotic bandwidth selector gives bandwidths 258.4615, 296.1538, and 296.1538 for the nonparametric, parametrically guided additive, and parametrically guided multiplicative methods, respectively. Results are summarized in Figure 3. The nonparametric estimate of the log odds ratio $\text{log}\frac{P(Y=1|X=x)}{P(Y=0|X=x)}$ is given by the dot-dashed line; the cubic parametric estimate of the log odds ratio is given by the solid line; our parametrically guided estimates are given by the dashed and dotted lines for the additive and multiplicative methods, respectively.
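The cubic parametric guide amounts to an ordinary cubic logistic regression, which can be fitted by IRLS. A minimal numpy sketch follows; the data here are simulated as a stand-in for the actual survey file, and the rescaling and true coefficients are ours, purely illustrative:

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25):
    # Iteratively reweighted least squares (Newton's method) for logistic regression.
    # X must already contain the intercept and polynomial columns.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        W = p * (1.0 - p)
        # Newton step: solve (X' W X) delta = X' (y - p)
        beta = beta + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

rng = np.random.default_rng(1)
sat = rng.uniform(600.0, 1300.0, size=3076)
s = (sat - 950.0) / 350.0                 # rescale to roughly [-1, 1] for conditioning
eta = -1.0 + 0.5 * s                      # hypothetical true log odds
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
X = np.column_stack([np.ones_like(s), s, s**2, s**3])
beta_hat = fit_logistic_irls(X, y)
log_odds_hat = X @ beta_hat               # fitted cubic log odds ratio
```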

Plots of the nonparametric estimate, cubic estimate, and our parametrically guided estimates of the log odds ratio function for the Financial Aid Award Data.

We observe that our parametrically guided additive and multiplicative estimates follow the cubic fit very closely. This suggests that there is no model specification bias in using a cubic model. However, the nonparametric estimate differs from the cubic fit for lower SAT scores.

In this work, we extend the methodology of parametrically guided nonparametric estimation to GLMs and the QLM. Asymptotic properties and numerical evidence demonstrate its improvement over the original nonparametric estimation scheme. Several extensions are possible; for example, the whole estimation scheme extends easily to varying-coefficient GLMs and the QLM.

Let ${q}_{i}(x,y)=\frac{{\partial}^{i}}{\partial {x}^{i}}Q({g}^{-1}(x),y)$ denote the *i*th partial derivative of the quasi-likelihood *Q*(*g*^{−1}(*x*), *y*) with respect to *x*.

The following technical conditions are imposed.

- (A1) The function *q*_{2}(*x, y*) < 0 for *x* and *y* in the range of the response variable.
- (A2) The functions $f\prime ,{\eta}_{0}^{(p+2)},\frac{{\partial}^{p+2}}{\partial {x}^{p+2}}\eta (x,\mathbf{\alpha})$, var(*Y*|*X* = ·), *V*″, and *g* are continuous.
- (A3) For each *x* ∈ supp(*f*), ρ(*x*), var(*Y*|*X* = *x*), and *g*′(μ(*x*)) are nonzero.
- (A4) The kernel *K* is a symmetric probability density with support [−1, 1].
- (A5) For each point *x*_{δ} on the boundary of supp(*f*), there exists an interval *I* containing *x*_{δ} with nonnull interior such that inf_{x∈I} *f*(*x*) > 0.

White (1982)-type conditions:

- (B1) *E* log(*h*_{0}(*x, y*)) exists, and there exists *m*_{1}(*x, y*) such that |log(*h*(*x, y*; **α**))| ≤ *m*_{1}(*x, y*) for any **α** and *Em*_{1}(*x, y*) < ∞.
- (B2) *E*(log(*h*_{0}(*x, y*)/*h*(*x, y*; **α**))) has a unique minimizer **α**_{0}.
- (B3) $\frac{\partial}{\partial {\alpha}_{j}}\text{log}\phantom{\rule{thinmathspace}{0ex}}h(x,y;\mathbf{\alpha})$ is continuously differentiable in **α** for *j* = 1, 2, ⋯, *q*.
- (B4) There exist *m*_{2}(*x, y*) and *m*_{3}(*x, y*) such that $|\frac{\partial}{\partial {\alpha}_{i}}\text{log}\phantom{\rule{thinmathspace}{0ex}}h(x,y;\mathbf{\alpha})\frac{\partial}{\partial {\alpha}_{j}}\text{log}\phantom{\rule{thinmathspace}{0ex}}h(x,y;\mathbf{\alpha})|\le {m}_{2}(x,y)$ and $|\frac{{\partial}^{2}}{\partial {\alpha}_{i}\partial {\alpha}_{j}}\text{log}\phantom{\rule{thinmathspace}{0ex}}h(x,y;\mathbf{\alpha})|\le {m}_{3}(x,y)$ for any **α**, 1 ≤ *i, j* ≤ *q*. Furthermore, both *Em*_{2}(*X, Y*) and *Em*_{3}(*X, Y*) exist.
- (B5) **α**_{0} is an interior point of the parameter space; the matrix ${\left(E\frac{\partial}{\partial {\alpha}_{i}}\text{log}\phantom{\rule{thinmathspace}{0ex}}h(x,y;\mathbf{\alpha})\frac{\partial}{\partial {\alpha}_{j}}\text{log}\phantom{\rule{thinmathspace}{0ex}}h(x,y;\mathbf{\alpha})\right)}_{1\le i,j\le q}$ is nonsingular at **α**_{0}; and **α**_{0} is a regular point of the matrix ${\left(E\frac{{\partial}^{2}}{\partial {\alpha}_{i}\partial {\alpha}_{j}}\text{log}\phantom{\rule{thinmathspace}{0ex}}h(x,y;\mathbf{\alpha})\right)}_{1\le i,j\le q}$.

In the following, we provide detailed proofs for Theorems 1 and 2. To save space, we skip the proofs for Theorems 3 and 4, which are similar and even simpler.

For the case of multiplicative correction with a fixed guide η(*x*, **α**), denote $\widehat{\mathit{\beta}}=\widehat{\mathit{\beta}}({x}_{0},\widehat{\mathbf{\alpha}})=\underset{\mathit{\beta}}{\text{argmax}}{Q}_{m}(\mathit{\beta};h,{x}_{0},\mathbf{\alpha})$. Because $\widehat{\mathit{\beta}}$ is calculated using the *X _{i}* near *x*_{0}, for such *X _{i}* we expect

$$\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\{{\mathit{\beta}}_{0}+\cdots +{\mathit{\beta}}_{p}{({X}_{i}-{x}_{0})}^{p}\}\approx {\eta}_{0}({x}_{0})+{\eta}_{0}^{\prime}({x}_{0})({X}_{i}-{x}_{0})+\cdots +{\eta}_{0}^{(p)}({x}_{0}){({X}_{i}-{x}_{0})}^{p}/p!.$$

Consequently we expect that ${\widehat{\mathit{\beta}}}_{0}\to {\eta}_{0}({x}_{0})$ and

$${\widehat{\mathit{\beta}}}_{j}\to \eta ({x}_{0},\mathbf{\alpha}){(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(j)}({x}_{0})/j!\phantom{\rule{thinmathspace}{0ex}}\text{for}\phantom{\rule{thinmathspace}{0ex}}1\le j\le p.$$

We define ${\varphi}_{\mathbf{\alpha}}(\cdot )={\eta}_{0}(\cdot )/\eta (\cdot ,\mathbf{\alpha})$ and

$${\widehat{\mathit{\beta}}}^{*}={(nh)}^{1/2}{\left({\widehat{\mathit{\beta}}}_{0}-{\eta}_{0}({x}_{0}),{h}^{1}\{{\widehat{\mathit{\beta}}}_{1}-\eta ({x}_{0},\mathbf{\alpha}){\varphi}_{\mathbf{\alpha}}^{(1)}({x}_{0})\},\cdots ,{h}^{p}\{p!{\widehat{\mathit{\beta}}}_{p}-\eta ({x}_{0},\mathbf{\alpha}){\varphi}_{\mathbf{\alpha}}^{(p)}({x}_{0})\}\right)}^{T}$$

so that each component has the same rate of convergence. Let ${\mathbf{Q}}_{p}(\mathcal{A})$ and ${\mathbf{T}}_{p}(\mathcal{A})$ denote the (*p* + 1) × (*p* + 1) matrices whose (*k, l*)th entries are ${\displaystyle {\int}_{\mathcal{A}}{z}^{k+l-1}K(z)dz}$ and ${\displaystyle {\int}_{\mathcal{A}}{z}^{k+l-2}{K}^{2}(z)dz}$, respectively, and define

$${\mathbf{\Gamma}}_{x}(\mathcal{A})=\frac{f(x)\text{var}(Y|X=x)}{{\{V(\mu (x))g\prime (\mu (x))\}}^{2}}\mathbf{D}{\mathbf{T}}_{p}(\mathcal{A})\mathbf{D},{\mathbf{\Lambda}}_{x}(\mathcal{A})=\frac{(\rho {\eta}^{2}(\cdot ,\mathbf{\alpha})f)\prime (x)}{{\eta}^{2}(x,\mathbf{\alpha})}\mathbf{D}{\mathbf{Q}}_{p}(\mathcal{A})\mathbf{D},$$

$${a}_{1,j}(\mathcal{A})=\frac{1}{(p+1)!}{\varphi}_{\mathbf{\alpha}}^{(p+1)}({x}_{0})\eta ({x}_{0},\mathbf{\alpha}){\displaystyle {\int}_{\mathcal{A}}{z}^{p+1}{K}_{j-1,p}(z;\mathcal{A})dz,}$$

and

$${a}_{2,j}(\mathcal{A})=\eta ({x}_{0},\mathbf{\alpha})\frac{1}{(p+2)!}{\varphi}_{\mathbf{\alpha}}^{(p+2)}({x}_{0}){\displaystyle {\int}_{\mathcal{A}}{z}^{p+2}{K}_{j-1,p}(z;\mathcal{A})dz+\frac{{\varphi}_{\mathbf{\alpha}}^{(p+1)}({x}_{0})(\rho {\eta}^{2}(\cdot ,\mathbf{\alpha})f)\prime ({x}_{0})}{(p+1)!(\rho \eta (\cdot ,\mathbf{\alpha})f)({x}_{0})}}\times \{{\displaystyle {\int}_{\mathcal{A}}{z}^{p+2}{K}_{j-1,p}(z;\mathcal{A})dz-(j-1)\times {\displaystyle {\int}_{\mathcal{A}}{z}^{p+1}{K}_{j-2,p}(z;\mathcal{A})dz}}-\frac{1}{p!}{\displaystyle {\int}_{\mathcal{A}}{z}^{p+1}{K}_{j-1,p}(z;\mathcal{A})dz{\displaystyle {\int}_{\mathcal{A}}{z}^{p+1}{K}_{p,p}(z;\mathcal{A})dz\}.}}$$

Let ${\mathbf{b}}_{{x}_{0}}(\mathcal{A})$ be the (*p* + 1) × 1 vector whose *j*th entry equals $\sqrt{n{h}^{2p+3}}{a}_{1,j}(\mathcal{A})+\sqrt{n{h}^{2p+5}}{a}_{2,j}(\mathcal{A})$.

**Main Theorem** *Suppose that Conditions (A1)–(A5) hold and that h = h _{n}* → 0 *and nh*^{3} → ∞. *Then, for x*_{0} *an interior point of* supp(*f*),

$${\{{\mathbf{\Sigma}}_{{x}_{0}}{([-1,1])}^{-1}{\mathbf{\Gamma}}_{{x}_{0}}([-1,1]){\mathbf{\Sigma}}_{{x}_{0}}{([-1,1])}^{-1}\}}^{-1/2}\times \{{\widehat{\mathit{\beta}}}^{*}-{\mathbf{b}}_{{x}_{0}}([-1,1])+o(\sqrt{n{h}^{2p+5}})\}\stackrel{\mathcal{D}}{\to}N(0,{\mathbf{I}}_{p+1}).$$

*If x*_{0} = *x _{n} is of the form x*_{δ} + *ch for a point x*_{δ} *on the boundary of* supp(*f*), *then, with* ${\mathcal{D}}_{{x}_{0},h}=\{z:{x}_{0}+hz\in \text{supp}(f)\}\cap [-1,1]$,

$${\{{\mathbf{\Sigma}}_{{x}_{0}}{({\mathcal{D}}_{{x}_{0},h})}^{-1}{\mathbf{\Gamma}}_{{x}_{0}}({\mathcal{D}}_{{x}_{0},h}){\mathbf{\Sigma}}_{{x}_{0}}{({\mathcal{D}}_{{x}_{0},h})}^{-1}\}}^{-1/2}\times \{{\widehat{\mathit{\beta}}}^{*}-{\mathbf{b}}_{{x}_{0}}({\mathcal{D}}_{{x}_{0},h})+o(\sqrt{n{h}^{2p+5}})\}\stackrel{\mathcal{D}}{\to}N(0,{\mathbf{I}}_{p+1}).$$

The proof of the main theorem follows directly from Lemmas 1 and 2, which are stated and proved as follows. Denote

$${\mathbf{Z}}_{i}={\left(1,({X}_{i}-{x}_{0})/h,\cdots ,{({X}_{i}-{x}_{0})}^{p}/({h}^{p}p!)\right)}^{T}.$$

**Lemma 1.** *Let* $\overline{\eta}({x}_{0},x)=\eta ({x}_{0},\mathbf{\alpha}){\displaystyle \sum _{j=0}^{p}{\varphi}_{\mathbf{\alpha}}^{(j)}({x}_{0}){(x-{x}_{0})}^{j}/j!}$ and ${\mathbf{W}}_{n}={(nh)}^{-1/2}{\displaystyle \sum _{i=1}^{n}{\mathbf{Y}}_{i}^{*}}$ where

$${\mathbf{Y}}_{i}^{*}=\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}{q}_{1}\left(\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{X}_{i}),{Y}_{i}\right)K\left(\frac{{X}_{i}-{x}_{0}}{h}\right){\mathbf{Z}}_{i}.$$

*Then, under Conditions (A1)–(A5), nh*^{3} → ∞, *and h* → 0, *we have*

$${\widehat{\mathit{\beta}}}^{*}={\mathbf{\Sigma}}_{{x}_{0}}^{-1}{\mathbf{W}}_{n}-h{\mathbf{\Sigma}}_{{x}_{0}}^{-1}{\mathbf{\Lambda}}_{{x}_{0}}{\mathbf{\Sigma}}_{{x}_{0}}^{-1}{\mathbf{W}}_{n}+{o}_{P}(h).$$

*Proof*. Recall that $\widehat{\mathit{\beta}}$ maximizes ${Q}_{m}(\mathit{\beta};h,{x}_{0},\mathbf{\alpha})$. Write

$${\mathit{\beta}}^{*}\equiv {(nh)}^{1/2}{\left({\beta}_{0}-{\eta}_{0}({x}_{0}),{h}^{1}\{{\beta}_{1}-\eta ({x}_{0},\mathbf{\alpha}){\varphi}_{\mathbf{\alpha}}^{(1)}({x}_{0})\},\cdots ,{h}^{p}\{p!{\beta}_{p}-\eta ({x}_{0},\mathbf{\alpha}){\varphi}_{\mathbf{\alpha}}^{(p)}({x}_{0})\}\right)}^{T}.$$

Then

$$\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\{{\beta}_{0}+{\beta}_{1}({X}_{i}-{x}_{0})+\cdots +{\beta}_{p}{({X}_{i}-{x}_{0})}^{p}\}=\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\{\overline{\eta}({x}_{0},{X}_{i})+{a}_{n}{\mathit{\beta}}^{*T}{\mathrm{Z}}_{i}\},$$

where *a _{n}* = (*nh*)^{−1/2}. Hence ${\widehat{\mathit{\beta}}}^{*}$ maximizes

$$\sum _{i=1}^{n}Q({g}^{-1}(\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\{\overline{\eta}({x}_{0},{X}_{i})+{a}_{n}{\mathit{\beta}}^{*T}{\mathbf{Z}}_{i}\}),{Y}_{i})K\phantom{\rule{thinmathspace}{0ex}}\left(\frac{{X}_{i}-{x}_{0}}{h}\right)$$

as a function of **β***. To study the asymptotic properties of ${\widehat{\mathit{\beta}}}^{*}$, we apply the quadratic approximation lemma (Fan and Gijbels, 1995) to the maximization of the normalized function

$$\phantom{\rule{thinmathspace}{0ex}}{l}_{n}({\mathit{\beta}}^{*})={\displaystyle \sum _{i=1}^{n}\left\{Q({g}^{-1}(\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}(\overline{\eta}({x}_{0},{X}_{i})+{a}_{n}{\mathit{\beta}}^{*T}{\mathrm{Z}}_{i})),{Y}_{i})-Q({g}^{-1}(\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{X}_{i})),{Y}_{i})\right\}K\left(\frac{{X}_{i}-{x}_{0}}{h}\right).}$$

Then ${\widehat{\mathit{\beta}}}^{*}$ maximizes *l _{n}*. We remark that Condition (A1) implies that *l _{n}*(**β***) is concave in **β***. By Taylor expansion,

$${l}_{n}(\mathit{\beta}*)={a}_{n}{\displaystyle \sum _{i=1}^{n}\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}{q}_{1}(\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{X}_{i}),{Y}_{i}){\mathit{\beta}}^{*T}{\mathbf{Z}}_{i}K\{({X}_{i}-{x}_{0})/h\}}+\frac{{a}_{n}^{2}}{2}{\displaystyle \sum _{i=1}^{n}{(\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})})}^{2}{q}_{2}(\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{X}_{i}),{Y}_{i}){({\mathit{\beta}}^{*T}{\mathbf{Z}}_{i})}^{2}K\{({X}_{i}-{x}_{0})/h\}}+\frac{{a}_{n}^{3}}{6}{\displaystyle \sum _{i=1}^{n}{(\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})})}^{3}{q}_{3}(\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}{\eta}_{i},{Y}_{i}){({\mathit{\beta}}^{*T}{\mathbf{Z}}_{i})}^{3}K\{({X}_{i}-{x}_{0})/h\},}$$

(8.1)

where η_{i} lies between $\overline{\eta}({x}_{0},{X}_{i})$ and $\overline{\eta}({x}_{0},{X}_{i})+{a}_{n}{\mathit{\beta}}^{*T}{\mathbf{Z}}_{i}$. Let

$${\mathbf{A}}_{n}={a}_{n}^{2}{\displaystyle \sum _{i=1}^{n}{(\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})})}^{2}{q}_{2}(\frac{\eta ({X}_{i},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{X}_{i}),{Y}_{i})K\{({X}_{i}-{x}_{0})/h\}{\mathbf{Z}}_{i}{\mathbf{Z}}_{i}^{T}.}$$

Then the second term in (8.1) equals $\frac{1}{2}{\mathit{\beta}}^{*T}{\mathbf{A}}_{n}{\mathit{\beta}}^{*}$. Now ${({\mathbf{A}}_{n})}_{ij}={(E{\mathbf{A}}_{n})}_{ij}+{O}_{P}\left(\sqrt{\text{var}({({\mathbf{A}}_{n})}_{ij})}\right)$ and

$$E{\mathbf{A}}_{n}={h}^{-1}E\{{(\frac{\eta ({X}_{1},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})})}^{2}{q}_{2}(\frac{\eta ({X}_{1},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{X}_{1}),\mu ({X}_{1}))K\{({X}_{1}-{x}_{0})/h\left\}{\mathbf{Z}}_{1}{\mathbf{Z}}_{1}^{T}\right\}$$

since *q*_{2} is linear in *y* for fixed *x*. Because supp(*K*) = [−1, 1], we need only consider |*X*_{1} − *x*_{0}| ≤ *h*, and thus

$$\frac{\eta ({X}_{1},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{X}_{1})-{\eta}_{0}({X}_{1})\phantom{\rule{thinmathspace}{0ex}}=-\eta ({X}_{1},\mathbf{\alpha})\{\frac{1}{(p+1)!}{\varphi}_{\mathbf{\alpha}}^{(p+1)}({x}_{0}){({X}_{1}-{x}_{0})}^{p+1}+\frac{1}{(p+2)!}{\varphi}_{\mathbf{\alpha}}^{(p+2)}({x}_{0}){({X}_{1}-{x}_{0})}^{p+2}\}+o({h}^{p+2}).$$

Then

$$\begin{array}{ll}\hfill & (i-1)!(j-1)!{(E{\mathbf{A}}_{n})}_{ij}\hfill \\ =\hfill & {h}^{-1}E\{{(\frac{\eta ({X}_{1},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})})}^{2}{q}_{2}(\frac{\eta ({X}_{1},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{X}_{1}),\mu ({X}_{1}))K\{({X}_{1}-{x}_{0})/h\}{({X}_{1}-{x}_{0})}^{i+j-2}/({h}^{i+j-2})\}\hfill \\ =\hfill & {\displaystyle \int}{(\frac{\eta ({x}_{0}+hZ,\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})})}^{2}{q}_{2}(\frac{\eta ({x}_{0}+hZ,\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{x}_{0}+hZ),\mu ({x}_{0}+hZ))K(Z){Z}^{i+j-2}f({x}_{0}+hZ)dZ\hfill \\ =\hfill & {\displaystyle \int}{(\frac{\eta ({x}_{0}+hZ,\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})})}^{2}{q}_{2}({\eta}_{0}({x}_{0}+hZ)+o({h}^{p}),\mu ({x}_{0}+hZ))K(Z){Z}^{i+j-2}f({x}_{0}+hZ)dZ\hfill \\ =\hfill & {\displaystyle \int}{(\frac{\eta ({x}_{0}+hZ,\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})})}^{2}[{q}_{2}({\eta}_{0}({x}_{0}+hZ),\mu ({x}_{0}+hZ))+o({h}^{p})]K(Z){Z}^{i+j-2}f({x}_{0}+hZ)dZ\hfill \\ =\hfill & -{\displaystyle \int}{(\frac{\eta ({x}_{0}+hZ,\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})})}^{2}\rho ({x}_{0}+hZ)f({x}_{0}+hZ)K(Z){Z}^{i+j-2}dZ+o(h)\hfill \\ =\hfill & -(\rho f)({x}_{0}){\nu}_{i+j-2}-h\frac{(\rho {\eta}^{2}(\cdot ,\mathbf{\alpha})f)\prime ({x}_{0})}{{\eta}^{2}({x}_{0},\mathbf{\alpha})}{\nu}_{i+j-1}+o(h).\hfill \end{array}$$

Similar arguments show that var{(**A**_{n})_{ij}} = *O*{(*nh*)^{−1}} and that the last term in (8.1) is of smaller order than the quadratic term. The quadratic approximation lemma then yields the expansion of ${\widehat{\mathit{\beta}}}^{*}$ stated in the lemma.
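The kernel moments ν_{j} = ∫ *z*^{j}*K*(*z*)*dz* that appear throughout these expansions are easy to verify numerically. A small sketch, using the Epanechnikov kernel as our illustrative choice of *K* and a midpoint-rule integral:

```python
import numpy as np

def kernel_moment(j, K, n=200000):
    # nu_j = integral of z^j K(z) over [-1, 1], via the midpoint rule
    dz = 2.0 / n
    z = -1.0 + (np.arange(n) + 0.5) * dz
    return float(np.sum(z**j * K(z)) * dz)

epan = lambda z: 0.75 * (1.0 - z**2)
nu0 = kernel_moment(0, epan)   # = 1: K is a probability density
nu1 = kernel_moment(1, epan)   # = 0: odd moments vanish for symmetric K
nu2 = kernel_moment(2, epan)   # = 1/5 for the Epanechnikov kernel
```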

**Lemma 2.** *Suppose that the conditions of Theorem 1 hold. For* **W**_{n} *as defined in Lemma 1*,

$$\begin{array}{c}\{{\mathbf{\Sigma}}_{{x}_{0}}^{-1}-h{\mathbf{\Sigma}}_{{x}_{0}}^{-1}{\mathbf{\Lambda}}_{{x}_{0}}{\mathbf{\Sigma}}_{{x}_{0}}^{-1}\}E({\mathbf{W}}_{n})={\mathbf{b}}_{{x}_{0}}+o\left\{{(n{h}^{2p+5})}^{1/2}\right\},\\ {\mathbf{\Gamma}}_{{x}_{0}}^{-1/2}\text{cov}({\mathrm{W}}_{n}){\mathbf{\Gamma}}_{{x}_{0}}^{-1/2}\to {\mathbf{I}}_{p+1},\end{array}$$

*and*

$${\mathbf{\Gamma}}_{{x}_{0}}^{-1/2}({\mathbf{W}}_{n}-E{\mathbf{W}}_{n})\stackrel{\mathcal{D}}{\to}N(0,{\mathbf{I}}_{p+1}).$$

*Proof*. We compute the mean and covariance matrix of the random vector **W**_{n} by studying ${\mathbf{Y}}_{1}^{*}$ as defined in Lemma 1. The mean of the *i*th component of ${\mathbf{Y}}_{1}^{*}$ is easily shown to be

$${(E{\mathbf{Y}}_{1}^{*})}_{i}=\frac{h}{(i-1)!}{\displaystyle \int \frac{\eta ({x}_{0}+hZ,\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}{q}_{1}\left(\frac{\eta ({x}_{0}+hZ,\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{x}_{0}+hZ),\mu ({x}_{0}+hZ)\right){Z}^{i-1}K(Z)f({x}_{0}+hZ)dZ.}$$

Now by Taylor expansion,

$$\begin{array}{ll}\hfill & {q}_{1}(\frac{\eta ({x}_{0}+hZ,\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{x}_{0}+hZ),\mu ({x}_{0}+hZ))\hfill \\ =\hfill & \eta ({x}_{0}+hZ,\mathbf{\alpha})\{\frac{{\varphi}_{\mathbf{\alpha}}^{(p+1)}({x}_{0})}{(p+1)!}{(hZ)}^{p+1}+\frac{{\varphi}_{\mathbf{\alpha}}^{(p+2)}({x}_{0})}{(p+2)!}{(hZ)}^{p+2}+o({h}^{p+2})\}\rho ({x}_{0}+hZ)+o({h}^{p+2}).\hfill \end{array}$$

Thus

$${(E{\mathbf{Y}}_{1}^{*})}_{i}=\frac{(\rho {\eta}^{2}(\cdot ,\mathbf{\alpha})f)({x}_{0})}{\eta ({x}_{0},\mathbf{\alpha})(i-1)!}\left({h}^{p+2}\frac{{\varphi}_{\mathbf{\alpha}}^{(p+1)}({x}_{0})}{(p+1)!}{\nu}_{p+i}+{h}^{p+3}{\zeta}_{p}({x}_{0}){\nu}_{p+i+1}\right)+o({h}^{p+3}),$$

(8.2)

where

$${\zeta}_{p}({x}_{0})=\frac{1}{(p+2)!}{\varphi}_{\mathbf{\alpha}}^{(p+2)}({x}_{0})+\frac{{\varphi}_{\mathbf{\alpha}}^{(p+1)}({x}_{0})(\rho {\eta}^{2}(\cdot ,\mathbf{\alpha})f)\prime ({x}_{0})}{(p+1)!(\rho {\eta}^{2}(\cdot ,\mathbf{\alpha})f)({x}_{0})}.$$

Note that

$$\begin{array}{lll}{\mathbf{\Sigma}}_{{x}_{0}}^{-1}E{\mathbf{W}}_{n}\hfill & =\hfill & {(\frac{n}{h})}^{1/2}\frac{1}{(\rho f)({x}_{0})}{D}^{-1}{N}^{-1}{D}^{-1}E{\mathbf{Y}}_{1}^{*}\hfill \\ \hfill & =\hfill & {(\frac{n}{h})}^{1/2}{D}^{-1}{N}^{-1}[{h}^{p+2}\frac{1}{(p+1)!}{\varphi}_{\mathbf{\alpha}}^{(p+1)}({x}_{0})\eta ({x}_{0},\mathbf{\alpha}){({\nu}_{p+1},{\nu}_{p+2},\cdots ,{\nu}_{2p+1})}^{T}\hfill \\ \hfill & \hfill & +{h}^{p+3}{\zeta}_{p}({x}_{0})\eta ({x}_{0},\mathbf{\alpha}){({\nu}_{p+2},{\nu}_{p+3},\cdots ,{\nu}_{2p+2})}^{T}].\hfill \end{array}$$

By Lemma 3 of Fan and Gijbels (1995), the *i*th component of ${\mathbf{\Sigma}}_{{x}_{0}}^{-1}E{\mathbf{W}}_{n}$ is

$$\begin{array}{cc}{({\mathbf{\Sigma}}_{{x}_{0}}^{-1}E{\mathbf{W}}_{n})}_{i}& ={(\frac{n}{h})}^{1/2}(i-1)!({\left\{{N}^{-1}\right\}}_{i,1},{\left\{{N}^{-1}\right\}}_{i,2},\cdots ,{\left\{{N}^{-1}\right\}}_{i,p+1})[{h}^{p+2}\frac{1}{(p+1)!}{\varphi}_{\mathbf{\alpha}}^{(p+1)}({x}_{0})\eta ({x}_{0},\mathbf{\alpha}){({\nu}_{p+1},{\nu}_{p+2},\cdots ,{\nu}_{2p+1})}^{T}+{h}^{p+3}{\zeta}_{p}({x}_{0})\eta ({x}_{0},\mathbf{\alpha}){({\nu}_{p+2},{\nu}_{p+3},\cdots ,{\nu}_{2p+2})}^{T}]\\ \hfill & ={(n{h}^{2p+3})}^{1/2}\frac{1}{(p+1)!}{\varphi}_{\mathbf{\alpha}}^{(p+1)}({x}_{0})\eta ({x}_{0},\mathbf{\alpha}){\displaystyle \int {z}^{p+1}{K}_{i-1,p}(z)dz}+{(n{h}^{2p+5})}^{1/2}{\zeta}_{p}({x}_{0})\eta ({x}_{0},\mathbf{\alpha}){\displaystyle \int {z}^{p+2}{K}_{i-1,p}(z)dz+o\left\{{(n{h}^{2p+5})}^{1/2}\right\}.}\hfill \end{array}$$

Next, consider the second term in the expansion of Lemma 1:

$$\begin{array}{cc}h{({\mathbf{\Sigma}}_{{x}_{0}}^{-1}{\mathbf{\Lambda}}_{{x}_{0}}{\mathbf{\Sigma}}_{{x}_{0}}^{-1}E{\mathbf{W}}_{n})}_{i}& ={(n{h}^{2p+5})}^{1/2}(i-1)!\frac{{\varphi}_{\mathbf{\alpha}}^{(p+1)}({x}_{0})(\rho {\eta}^{2}(\cdot ,\mathbf{\alpha})f)\prime ({x}_{0})}{(p+1)!(\rho \eta (\cdot ,\mathbf{\alpha})f)({x}_{0})}\times \phantom{\rule{thinmathspace}{0ex}}{\displaystyle \sum _{j=1}^{p+1}{({\mathbf{N}}_{p}^{-1}{\mathbf{Q}}_{p}{\mathbf{N}}_{p}^{-1})}_{ij}{\nu}_{p+j}+O\{{(n{h}^{2p+7})}^{1/2}\}.}\end{array}$$

Using the fact that (**Q**_{p})_{kl} = (**N**_{p})_{k,l+1} for *l* < *p* + 1, it can be shown that, for *i* = 2, ⋯, *p* + 1,

$${({\mathbf{N}}_{p}^{-1}{\mathbf{Q}}_{p}{\mathbf{N}}_{p}^{-1})}_{ij}={({\mathbf{N}}_{p}^{-1})}_{i-1,j}+\{{\displaystyle \sum _{k=1}^{p+1}{({\mathbf{N}}_{p}^{-1})}_{i,k}{\nu}_{p+k}\}{({\mathbf{N}}_{p}^{-1})}_{p+1,j}}$$

and, by similar reasoning,

$${({\mathbf{N}}_{p}^{-1}{\mathbf{Q}}_{p}{\mathbf{N}}_{p}^{-1})}_{1j}=\{{\displaystyle \sum _{k=1}^{p+1}{({\mathbf{N}}_{p}^{-1})}_{1,k}{\nu}_{p+k}\}{({\mathbf{N}}_{p}^{-1})}_{p+1,j}.}$$

So by Lemma 3 of Fan and Gijbels (1995),

$$(i-1)!{\displaystyle \sum _{j=1}^{p+1}{({\mathbf{N}}_{p}^{-1}{\mathbf{Q}}_{p}{\mathbf{N}}_{p}^{-1})}_{ij}{\nu}_{p+j}=(i-1){\displaystyle \int {z}^{p+1}{K}_{i-2,p}(z)dz+\frac{1}{p!}{\displaystyle \int {z}^{p+1}{K}_{p,p}(z)dz{\displaystyle \int {z}^{p+1}{K}_{i-1,p}(z)dz.}}}}$$

The statement concerning the asymptotic mean follows immediately. By (8.2), the covariance between the *i*th and *j*th component of ${\mathbf{Y}}_{1}^{*}$ is $E({({\mathbf{Y}}_{1}^{*})}_{i}{({\mathbf{Y}}_{1}^{*})}_{j})+O({h}^{2p+4})$. By a Taylor series expansion,

$$\begin{array}{lll}E({({\mathbf{Y}}_{1}^{*})}_{i}{({\mathbf{Y}}_{1}^{*})}_{j})\hfill & =\hfill & E[{\left(\frac{\eta ({X}_{1},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}{q}_{1}(\frac{\eta ({X}_{1},\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{X}_{1}),{Y}_{1})K\{({X}_{1}-{x}_{0})/h\}\right)}^{2}\frac{{\{({X}_{1}-{x}_{0})/h\}}^{i+j-2}}{(i-1)!(j-1)!}]\hfill \\ \hfill & =\hfill & {{\displaystyle \int \left(\frac{\eta ({x}_{0}+hZ,\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}{q}_{1}(\frac{\eta ({x}_{0}+hZ,\mathbf{\alpha})}{\eta ({x}_{0},\mathbf{\alpha})}\overline{\eta}({x}_{0},{x}_{0}+hZ),{Y}_{1})K(Z)\right)}}^{2}\frac{{Z}^{i+j-2}f({x}_{0}+hZ)h}{(i-1)!(j-1)!}dZ\hfill \\ \hfill & =\hfill & {\displaystyle \int {({q}_{1}(\eta ({x}_{0}+hZ),{Y}_{1})K(Z))}^{2}\frac{{Z}^{i+j-2}f({x}_{0})h}{(i-1)!(j-1)!}dZ+o(h).}\hfill \end{array}$$

Noticing that

$${q}_{1}(\eta ({x}_{0}+hZ),{Y}_{1})\phantom{\rule{thinmathspace}{0ex}}=\phantom{\rule{thinmathspace}{0ex}}\frac{{Y}_{1}-{g}^{-1}(\eta ({x}_{0}+hZ))}{V({g}^{-1}(\eta ({x}_{0}+hZ)))}({g}^{-1})\prime (\eta ({x}_{0}+hZ)),$$

we can derive

$${\{\text{cov}({\mathbf{Y}}_{1}^{*})\}}_{ij}=\frac{hf({x}_{0})\text{var}(Y|X={x}_{0})}{{[V(\mu ({x}_{0}))g\prime (\mu ({x}_{0}))]}^{2}}{\displaystyle \int \frac{{Z}^{i+j-2}}{(i-1)!(j-1)!}{K}^{2}(Z)dZ+o(h).}$$

Therefore, ${\mathbf{\Gamma}}_{{x}_{0}}^{-1/2}\text{cov}({\mathbf{W}}_{n}){\mathbf{\Gamma}}_{{x}_{0}}^{-1/2}\to {\mathbf{I}}_{p+1}$. Now, we use the Cramér–Wold device to derive the asymptotic normality of **W**_{n}. For any unit vector **u** ∈ ℝ^{p+1}, if

$${(n{a}_{n}^{2})}^{-1/2}{\mathbf{u}}^{T}\text{cov}{({\mathbf{Y}}_{1}^{*})}^{-1/2}({\mathbf{W}}_{n}-E{\mathbf{W}}_{n}){\to}_{D}N(0,1)$$

(8.3)

then ${h}^{1/2}{\text{cov}({\mathbf{Y}}_{1}^{*})}^{-1/2}({\mathbf{W}}_{n}-E{\mathbf{W}}_{n}){\to}_{D}N(0,{\mathbf{I}}_{p+1})$, and so ${\mathbf{\Gamma}}_{{x}_{0}}^{-1/2}({\mathbf{W}}_{n}-E{\mathbf{W}}_{n}){\to}_{D}N(0,{\mathbf{I}}_{p+1})$. To prove (8.3), we only need to check Lyapounov’s condition for this sequence, which is easily verified.

Noting that ${\widehat{\mathit{\beta}}}_{0}\to {\eta}_{0}({x}_{0})$ and ${\widehat{\mathit{\beta}}}_{j}\to \eta ({x}_{0},\mathbf{\alpha}){(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(j)}({x}_{0})/j!$ for 1 ≤ *j* ≤ *p*, we can define the estimators of ${\eta}_{0}^{(j)}({x}_{0})$ iteratively as ${\widehat{\eta}}_{m,0}({x}_{0};p,h,\mathbf{\alpha})\equiv {\widehat{\eta}}_{m}({x}_{0};p,h,\mathbf{\alpha})={\widehat{\mathit{\beta}}}_{0}$ and

$${\widehat{\eta}}_{m,j}({x}_{0};p,h)=j!{\widehat{\beta}}_{j}-\eta ({x}_{0},\mathbf{\alpha}){\displaystyle \sum _{i=0}^{j-1}{\widehat{\eta}}_{m,i}({x}_{0};p,h){(1/\eta )}^{(j-i)}({x}_{0},\mathbf{\alpha})\left(\begin{array}{c}j\\ i\end{array}\right)\phantom{\rule{thinmathspace}{0ex}}\text{for}\phantom{\rule{thinmathspace}{0ex}}1\le j\le p,}$$

where $\left(\begin{array}{c}j\\ i\end{array}\right)=j!/(i!(j-i)!)$. Simple algebra leads to

$$\begin{array}{ll}\hfill & {\widehat{\eta}}_{m,j}({x}_{0};p,h)-{\eta}_{0}^{(j)}({x}_{0})\hfill \\ =\hfill & j!{\widehat{\mathit{\beta}}}_{j}-\eta ({x}_{0},\mathbf{\alpha}){(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(j)}({x}_{0})-\eta ({x}_{0},\mathbf{\alpha}){\displaystyle \sum _{i=0}^{j-1}\left({\widehat{\eta}}_{m,i}({x}_{0};p,h)-{\eta}_{0}^{(i)}({x}_{0})\right){(1/\eta )}^{(j-i)}({x}_{0},\mathbf{\alpha})\left(\begin{array}{c}j\\ i\end{array}\right).}\hfill \end{array}$$

Denote ${w}_{j}={\widehat{\eta}}_{m,j}({x}_{0};p,h)-{\eta}_{0}^{(j)}({x}_{0}),{\upsilon}_{j}=j!{\widehat{\mathit{\beta}}}_{j}-\eta ({x}_{0},\mathbf{\alpha}){(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(j)}({x}_{0})$ and

$${\omega}_{i,j}=\left(\begin{array}{c}j\\ i\end{array}\right){(1/\eta )}^{(j-i)}({x}_{0},\mathbf{\alpha})\eta ({x}_{0},\mathbf{\alpha}).$$

Let **L** be a (*p* + 1) × (*p* + 1) matrix. For 0 ≤ *i, j* ≤ *p*, its (*i* + 1, *j* + 1) element, denoted by *L _{i,j}*, is defined as follows. Set *L _{i,i}* = 1 and *L _{i,j}* = 0 when *i* < *j*, and

$${L}_{i,j}=-{\omega}_{j,i}+{\displaystyle \sum _{l=1}^{i-j-1}{(-1)}^{l+1}\phantom{\rule{thinmathspace}{0ex}}{\displaystyle \sum _{j<{k}_{1}<{k}_{2}<\cdots <{k}_{l}<i}{\omega}_{j,{k}_{1}}{\omega}_{{k}_{1},{k}_{2}}\cdots {\omega}_{{k}_{l},i}}}$$

when *i* > *j*. Then ${w}_{j}={\upsilon}_{j}-{\displaystyle {\sum}_{i=0}^{j-1}{\omega}_{i,j}{w}_{i}}={\displaystyle {\sum}_{i=0}^{j}{L}_{j,j-i}{\upsilon}_{j-i}}={\displaystyle {\sum}_{i=0}^{j}{L}_{j,i}{\upsilon}_{i}}$.
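This triangular inversion can be checked numerically: build **L** from the ω's via the formula, solve the recursion for the *w _{j}* directly, and compare. A small sketch with random ω values (all names ours):

```python
import numpy as np
from itertools import combinations

def build_L(omega, p):
    # L is lower triangular with unit diagonal; for i > j,
    # L[i, j] = -omega[j][i] + sum_l (-1)^(l+1) * (sum over chains
    #           j < k_1 < ... < k_l < i of omega[j][k_1] * ... * omega[k_l][i]).
    L = np.eye(p + 1)
    for i in range(p + 1):
        for j in range(i):
            total = -omega[j][i]
            for l in range(1, i - j):
                for ks in combinations(range(j + 1, i), l):
                    path = [j, *ks, i]
                    prod = 1.0
                    for a, b in zip(path[:-1], path[1:]):
                        prod *= omega[a][b]
                    total += (-1) ** (l + 1) * prod
            L[i, j] = total
    return L

p = 4
rng = np.random.default_rng(2)
omega = {i: {j: rng.normal() for j in range(i + 1, p + 1)} for i in range(p + 1)}
v = rng.normal(size=p + 1)
# Solve the recursion w_j = v_j - sum_{i<j} omega_{i,j} w_i directly ...
w = np.zeros(p + 1)
for j in range(p + 1):
    w[j] = v[j] - sum(omega[i][j] * w[i] for i in range(j))
# ... and check that it agrees with w = L v
L = build_L(omega, p)
assert np.allclose(L @ v, w)
```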

With the above notations, we have

$$\left[\begin{array}{c}{\widehat{\eta}}_{m,0}({x}_{0};p,h)-{\eta}_{0}^{(0)}({x}_{0})\\ {\widehat{\eta}}_{m,1}({x}_{0};p,h)-{\eta}_{0}^{(1)}({x}_{0})\\ \vdots \\ {\widehat{\eta}}_{m,p}({x}_{0};p,h)-{\eta}_{0}^{(p)}({x}_{0})\end{array}\right]=\mathbf{L}\times \left[\begin{array}{c}0!{\widehat{\mathit{\beta}}}_{0}-\eta ({x}_{0},\mathbf{\alpha}){(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(0)}({x}_{0})\\ 1!{\widehat{\mathit{\beta}}}_{1}-\eta ({x}_{0},\mathbf{\alpha}){(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(1)}({x}_{0})\\ \vdots \\ p!{\widehat{\mathit{\beta}}}_{p}-\eta ({x}_{0},\mathbf{\alpha}){(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(p)}({x}_{0})\end{array}\right].$$

(8.4)

The above equation allows us to study the asymptotic bias and variance of ${\widehat{\eta}}_{m,j}({x}_{0};p,h)-{\eta}_{0}^{(j)}({x}_{0})$ using those of $i!{\widehat{\beta}}_{i}-\eta ({x}_{0},\mathbf{\alpha}){(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(i)}({x}_{0})$, where 0 ≤ *i* ≤ *p*.

**Proposition 1.** *Let p − j* > 0 *be odd and suppose that Conditions (A1)–(A5) stated in the Appendix are satisfied. Assume that h = h _{n}* → 0 *and nh*^{3} → ∞. *Then, for x*_{0} *an interior point of* supp(*f*),

$$\frac{\sqrt{n{h}^{2j+1}}}{{\sigma}_{j,j,p}({x}_{0};K)}\left(j!{\widehat{\beta}}^{(j)}-\eta ({x}_{0},\mathbf{\alpha}){(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(j)}({x}_{0})-Bia{s}_{1}(j)\right)\stackrel{\mathcal{D}}{\to}N(0,1),$$

(8.5)

*where the bias term is given by*

$$Bia{s}_{1}(j)=\frac{{h}^{p-j+1}}{(p+1)!}{(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(p+1)}({x}_{0})\eta ({x}_{0},\mathbf{\alpha})\left({\displaystyle \int {z}^{p+1}{K}_{j,p}(z)dz}\right)\phantom{\rule{thinmathspace}{0ex}}\{1+O(h)\}.$$

*Based on (8.4), we get the asymptotic distribution result for our estimate* ${\widehat{\eta}}_{m,j}({x}_{0};p,h,\mathbf{\alpha})$ *as follows*:

$$\frac{\sqrt{n{h}^{2j+1}}}{{\sigma}_{j,j,p}({x}_{0},K)}\left({\widehat{\eta}}_{m,j}({x}_{0};p,h,\mathbf{\alpha})-{\eta}_{0}^{(j)}({x}_{0})-Bia{s}_{1}(j)\right)\stackrel{\mathcal{D}}{\to}N(0,1).$$

(8.6)

*If x*_{0} = *x _{n} is of the form x*_{δ} + *ch for a point x*_{δ} *on the boundary of* supp(*f*), *an analogous result holds with the kernel moments computed over* ${\mathcal{D}}_{{x}_{0},h}$ *instead of* [−1, 1].

**Proposition 2.** *Let p − j* ≥ 0 *be even, p* > 0, *and suppose that Conditions (A1)–(A5) stated in the Appendix are satisfied. Assume that h = h _{n}* → 0 *and nh*^{3} → ∞. *Then, for x*_{0} *an interior point of* supp(*f*),

$$\frac{\sqrt{n{h}^{2j+1}}}{{\sigma}_{j,j,p}({x}_{0};K)}\left(j!{\widehat{\beta}}^{(j)}-\eta ({x}_{0},\mathbf{\alpha}){(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(j)}({x}_{0})-Bia{s}_{2}(j)\right)\stackrel{\mathcal{D}}{\to}N(0,1),$$

(8.7)

*where the bias term is given by*

$$\begin{array}{ll}Bia{s}_{2}(j)\phantom{\rule{thinmathspace}{0ex}}=\hfill & \{{\displaystyle \int {z}^{p+2}{K}_{j,p}(z)dz\frac{\eta ({x}_{0},\mathbf{\alpha})}{(p+2)!}{(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(p+2)}({x}_{0})+({\displaystyle \int {z}^{p+2}{K}_{j,p}(z)dz-j{\displaystyle \int {z}^{p+1}{K}_{j-1,p}(z)dz)}}}\times \phantom{\rule{thinmathspace}{0ex}}\frac{1}{(p+1)!}{(\frac{{\eta}_{0}}{\eta (\cdot ,\mathbf{\alpha})})}^{(p+1)}({x}_{0})\frac{(\rho {\eta}^{2}(\cdot ,\mathbf{\alpha})f)\prime ({x}_{0})}{(\rho \eta (\cdot ,\mathbf{\alpha})f)({x}_{0})}\}{h}^{p-j+2}\{1+O(h)\}.\hfill \end{array}$$

*Based on (8.4), we get the asymptotic distribution result for our estimate* ${\widehat{\eta}}_{m,j}({x}_{0};p,h,\mathbf{\alpha})$ *as follows*:

$$\frac{\sqrt{n{h}^{2j+1}}}{{\sigma}_{j,j,p}({x}_{0};K)}\left({\widehat{\eta}}_{m,j}({x}_{0};p,h)-{\eta}_{0}^{(j)}({x}_{0})-Bia{s}_{2}(j)\right)\stackrel{\mathcal{D}}{\to}N(0,1).$$

(8.8)

*If x*_{0} = *x _{n} is of the form x*_{δ} + *ch for a point x*_{δ} *on the boundary of* supp(*f*), *an analogous result holds with the kernel moments computed over* ${\mathcal{D}}_{{x}_{0},h}$ *instead of* [−1, 1].

*Proof of Propositions 1 and 2*. The results (8.5) and (8.7) in Propositions 1 and 2 follow from the main theorem by reading off the marginal distributions of the components of ${\widehat{\mathit{\beta}}}^{*}$. To calculate the asymptotic variance, we calculate the (*r* + 1, *s* + 1) entry of *r*!*s*!${\mathbf{N}}_{p}{(\mathcal{A})}^{-1}{\mathbf{T}}_{p}(\mathcal{A}){\mathbf{N}}_{p}{(\mathcal{A})}^{-1}$ as

$$r!s!{\displaystyle \sum _{k=1}^{p+1}{\displaystyle \sum _{l=1}^{p+1}\frac{{c}_{r+1,k}{c}_{s+1,l}}{|{\mathbf{N}}_{p}(\mathcal{A}){|}^{2}}}}{\{{\mathbf{T}}_{p}(\mathcal{A})\}}_{kl}={\displaystyle {\int}_{\mathcal{A}}{K}_{r,p}(z;\mathcal{A}){K}_{s,p}(z;\mathcal{A})dz,}$$

where *c _{ij}* is the (*i, j*) cofactor of ${\mathbf{N}}_{p}(\mathcal{A})$. Indeed,

$$\begin{array}{lll}{\displaystyle \sum _{k=1}^{p+1}{\displaystyle \sum _{l=1}^{p+1}{c}_{r+1,k}{c}_{s+1,l}{\{{\mathbf{T}}_{p}(\mathcal{A})\}}_{kl}}}\hfill & =\hfill & {\displaystyle {\int}_{\mathcal{A}}{\displaystyle \sum _{k=1}^{p+1}{\displaystyle \sum _{l=1}^{p+1}{c}_{r+1,k}{c}_{s+1,l}{z}^{k+l-2}K{(z)}^{2}dz}}}\hfill \\ \hfill & =\hfill & {\displaystyle {\int}_{\mathcal{A}}({\displaystyle \sum _{k=1}^{p+1}{c}_{r+1,k}{z}^{k-1}K(z))({\displaystyle \sum _{k=1}^{p+1}{c}_{s+1,k}{z}^{k-1}K(z))dz}}}\hfill \\ \hfill & =\hfill & {\displaystyle {\int}_{\mathcal{A}}|{\mathbf{M}}_{r,p}(z;\mathcal{A})||{\mathbf{M}}_{s,p}(z;\mathcal{A})|K{(z)}^{2}dz.}\hfill \end{array}$$

The asymptotic results in (8.6) and (8.8) for ${\widehat{\eta}}_{m,j}({x}_{0};p,h,\mathbf{\alpha})$ are easily proved by noting that, based on (8.4), its bias and variance are dominated by those of the single term $j!{\widehat{\mathit{\beta}}}_{j}-\eta ({x}_{0},\mathbf{\alpha}){(\frac{{\eta}_{0}(\cdot )}{\eta (\cdot ,\mathbf{\alpha})})}^{(j)}({x}_{0})$, since **L** is a lower triangular matrix.

*Proof of Theorems 1 and 2*. The results of Theorems 1 and 2 are the special cases of Propositions 1 and 2 for *j* = 0.

*Proof of Theorem 5*. It is enough to prove the result for the multiplicative case, as the parallel extension to the additive case is straightforward. Note first that, under Conditions (B1)–(B5), we have $\Vert \widehat{\mathbf{\alpha}}-{\mathbf{\alpha}}_{0}\Vert ={n}^{-1/2}{O}_{p}(1)$ by Theorem 3.2 of White (1982). This implies that $\frac{1}{n}({Q}_{m}(\mathit{\beta};h,{x}_{0},{\mathbf{\alpha}}_{0})-{Q}_{m}(\mathit{\beta};h,{x}_{0},\widehat{\mathbf{\alpha}}))\stackrel{P}{\to}0$.

Note that $\frac{1}{n}({Q}_{m}(\mathit{\beta};h,{x}_{0},{\mathbf{\alpha}}_{0})-{Q}_{m}(\mathit{\beta};h,{x}_{0},\widehat{\mathbf{\alpha}}))={O}_{p}(1/\sqrt{n})$, $\frac{1}{n}\Vert \frac{\partial}{\partial \mathit{\beta}}{Q}_{m}(\mathit{\beta};h,{x}_{0},{\mathbf{\alpha}}_{0})-\frac{\partial}{\partial \mathit{\beta}}{Q}_{m}(\mathit{\beta};h,{x}_{0},\widehat{\mathbf{\alpha}})\Vert ={O}_{p}(1/\sqrt{n})$, and $\frac{1}{n}{\Vert \frac{{\partial}^{2}}{\partial \mathit{\beta}\partial {\mathit{\beta}}^{T}}{Q}_{m}(\mathit{\beta};h,{x}_{0},{\mathbf{\alpha}}_{0})-\frac{{\partial}^{2}}{\partial \mathit{\beta}\partial {\mathit{\beta}}^{T}}{Q}_{m}(\mathit{\beta};h,{x}_{0},\widehat{\mathbf{\alpha}})\Vert}_{F}={O}_{p}(1/\sqrt{n})$ for every **β**, where ‖ · ‖_{F} denotes the matrix Frobenius norm, defined as the square root of the sum of the squares of the elements. With the consistency established above, we can restrict attention to a local compact set. By the standard Taylor-expansion argument used for proving asymptotic normality, we get $\Vert \widehat{\mathit{\beta}}({x}_{0},{\mathbf{\alpha}}_{0})-\widehat{\mathit{\beta}}({x}_{0},\widehat{\mathbf{\alpha}})\Vert ={n}^{-1/2}{O}_{p}(1)$, which is faster than the convergence rates in Theorems 1–4. Hence using the estimated $\widehat{\mathbf{\alpha}}$ does not affect our asymptotic convergence rates, as desired.

^{*}Supported in part by NIH grant R01-GM07261 and NSF grant DMS-0704337.

- Cheng M-Y, Hall P. Reducing variance in nonparametric surface estimation. Journal of Multivariate Analysis. 2003;86:375–397.
- Cheng M-Y, Peng L, Wu J-S. Reducing variance in univariate smoothing. Annals of Statistics. 2007;35:522–542.
- Cox DD, O’Sullivan F. Asymptotic analysis of penalized likelihood and related estimators. The Annals of Statistics. 1990;18:1676–1695.
- Fan J. Local linear regression smoothers and their minimax efficiency. The Annals of Statistics. 1993;21:196–216.
- Fan J, Farmen M, Gijbels I. Local maximum likelihood estimation and inference. Journal of the Royal Statistical Society. Series B. Statistical Methodology. 1998;60:591–608.
- Fan J, Gijbels I. Data-driven bandwidth selection in local polynomial fitting: Variable bandwidth and spatial adaptation. Journal of the Royal Statistical Society. Series B (Methodological). 1995;57:371–394.
- Fan J, Heckman NE, Wand MP. Local polynomial kernel regression for generalized linear models and quasi-likelihood functions. J. Amer. Statist. Assoc. 1995;90:141–150.
- Gasser T, Müller H-G, Mammitzsch V. Kernels for nonparametric curve estimation. J. Roy. Statist. Soc. Ser. B. 1985;47:238–252.
- Glad IK. Parametrically guided non-parametric regression. Scandinavian Journal of Statistics. Theory and Applications. 1998;25:649–668.
- Green PJ, Yandell B. Semiparametric generalized linear models. Proceedings of the 2nd International GLIM Conference; Berlin: Springer-Verlag; 1985.
- Hjort NL, Glad IK. Nonparametric density estimation with a parametric start. The Annals of Statistics. 1995;23:882–904.
- Hurvich CM, Tsai C-L. Model selection for extended quasi-likelihood models in small samples. Biometrics. 1995;51:1077–1084.
- Martins-Filho C, Mishra S, Ullah A. A class of improved parametrically guided nonparametric regression estimators. Econometric Reviews. 2007. To appear.
- McCullagh P, Nelder JA. Generalized linear models. 2nd ed. London: Chapman & Hall; 1989.
- Naito K. Semiparametric density estimation by local *L*_{2}-fitting. The Annals of Statistics. 2004;32:1162–1191.
- Nelder JA, Wedderburn RWM. Generalized linear models. Journal of the Royal Statistical Society. Series A. 1972;135:370–384.
- O’Sullivan F, Yandell B, Raynor W. Automatic smoothing of regression functions in generalized linear models. Journal of the American Statistical Association. 1986;81:96–103.
- Staniswalis JG. The kernel estimate of a regression function in likelihood-based models. Journal of the American Statistical Association. 1989;84:276–283.
- Stone CJ. Nearest neighbor estimators of a nonlinear regression function. Proc. Computer Science and Statistics, 8th Annual Symposium on the Interface; 1975. pp. 413–418.
- Stone CJ. Consistent nonparametric regression, with discussion. The Annals of Statistics. 1977;5:549–645.
- Tibshirani R, Hastie T. Local likelihood estimation. Journal of the American Statistical Association. 1987;82:559–568.
- Wedderburn RWM. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika. 1974;61:439–447.
- White H. Maximum likelihood estimation of misspecified models. Econometrica. 1982;50:1–25.
