J Multivar Anal. Author manuscript; available in PMC 2010 April 1.

Published in final edited form as:

J Multivar Anal. 2009 April 1; 100(4): 726–741.

doi: 10.1016/j.jmva.2008.08.003. PMCID: PMC2663964

NIHMSID: NIHMS93813

Hua Liang, Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA;

In this paper, we define two restricted estimators for the regression parameters in a multiple linear regression model with measurement errors when prior information about the parameters is available. We then construct two sets of improved estimators, which include the preliminary test estimator, the Stein-type estimator and the positive rule Stein-type estimator for both slope and intercept, and examine their statistical properties, such as the asymptotic distributional quadratic biases and the asymptotic distributional quadratic risks. We remove the distributional assumption on the error term that is generally imposed in the literature, and provide a more general comparison of the quadratic risks of these estimators. Simulation studies illustrate the finite-sample performance of the proposed estimators, which are then used to analyze a dataset from the Nurses' Health Study.

Improving estimation in regression models is a fundamental and interesting topic. In certain cases, one may have some prior information about the parameters of interest without being sure of its validity. By incorporating this information into the estimation procedure, one may obtain more efficient estimators than those obtained when the prior information is ignored. Statistical approaches for developing more efficient estimators can be roughly classified into two categories. The first focuses on developing a proper test procedure to check the validity of the uncertain prior; if the prior information is confirmed, the commonly used estimators are modified to accommodate it. The second develops a procedure in which testing and estimation are conducted simultaneously. The first procedure is very natural and commonly used for both theoretical and applied purposes. For example, consider the multiple linear regression model:

$$Y=X\beta +\epsilon ,$$

(1.1)

where *X* is the *n* × *p* design matrix with rank *p*, *β* is the *p* × 1 regression parameter, and *ε* is the *n* × 1 random error vector. Suppose that we have prior information for *β*, which can be described as *Rβ* = *r*, where *R* is a *q* × *p* known matrix of rank *q* and *r* is a *q* × 1 known vector, *q* ≤ *p*. A proper test statistic would be based on a distance between $R{\widehat{\beta}}_{n}$ and *r*, where ${\widehat{\beta}}_{n}$ denotes the least squares estimator of *β*.

The estimators for the regression parameters, which fall into the second category, include the preliminary test estimator (PTE), the James-Stein type estimator (JSTE), and the positive rule Stein type estimator (PRSE). See Judge and Bock (1978) and Saleh (2006) for a detailed discussion on these estimators. Bancroft (1944) was among the first to consider PTE. Saleh and Sen (1978) extended his idea to a nonparametric setup. JSTE was introduced by Stein (1956) and James and Stein (1961), and expanded by Saleh and Sen (1978, 1986) and Sen and Saleh (1987) to nonparametric areas.

The aforementioned estimation techniques have recently received much attention in linear regression models whose covariates are measured with errors. Stanley (1986, 1988) revealed that the JSTE can eliminate the inconsistency of the classical least squares estimators. Shalabh (1998) studied properties of the JSTE when the covariance matrix of the measurement errors is known. For the simple linear regression model with measurement error, when the slope parameter may be the null scalar and all the random components are normally distributed, Kim and Saleh (2003) compared these estimators in terms of asymptotic distributional quadratic bias, mean squared error matrix, and asymptotic distributional quadratic risk. Their comparisons show that the PTE behaves better than the attenuation-correction (AC) estimator if the slope is close to 0, but is not uniformly better than the AC estimator over the whole range of the regression parameter. Kim and Saleh (2005) further investigated the same question for multiple linear models under the same assumptions and setting. Various risk functions based on the asymptotic distribution of the estimators under certain local alternatives are calculated and compared. They also showed that the JSTE and PRSE dominate the AC estimator.

This paper mainly focuses on the estimation problem in multiple linear regression models with measurement errors. Our contributions to the existing literature are fourfold:

- to remove the normality assumption. The normality assumption greatly simplifies the theoretical arguments, but it is often violated in practice. Its removal makes the theoretical results more general and applicable;
- to improve estimation under the general constraint *Rβ* = *r*, which contains the case investigated by Kim and Saleh (2003, 2005), *β* = 0, as a special case. The theoretical difficulty lies in how the asymptotic distribution of the AC estimator under the local hypothesis depends on the unknown parameter *β*. This has a substantial impact on constructing the estimators subject to the constraint, i.e., the restricted estimator (RE), which is the building block for constructing the PTE, JSTE and PRSE;
- to calculate the risk functions for linear combinations of the intercept and slope parameters, which helps in estimating the mean response;
- to explicitly compare the risk functions of the various proposed estimators under certain circumstances.

The outline of the paper is as follows. In Section 2, we define two REs, based on which two sets of PTE, JSTE and PRSE are constructed. The risk functions of the various estimators of the slope parameters under the null hypothesis and under local alternatives are presented in Section 3, where the risks of the proposed estimators are also compared in some special cases. Simulation studies are presented in Section 4. The proposed estimators are used to analyze a dataset from the Nurses' Health Study in Section 5; our results provide more appropriate estimates and information for nutritional studies. The proofs of the main results are deferred to the Appendix.

Suppose that (*Y_i*, *X_i*), *i* = 1, …, *n*, satisfy the model

$${Y}_{i}={\beta}_{0}+{X}_{i}^{\prime}\beta +{\epsilon}_{i},$$

(2.1)

where *X_i* is a *p* × 1 vector of covariates, *β*_0 and *β* are the intercept and slope parameters, and *ε_i* is the random error. The covariates *X_i* are not directly observable; instead, we observe *W_i* = *X_i* + *u_i*, where *u_i* is a *p* × 1 measurement error vector.

Section 4.4.3 of Saleh (2006) provides a general road map for constructing improved estimators when one has some uncertain information about the unknown parameter, say *θ* ∈ Θ, in a statistical model. First, we obtain an optimal unrestricted estimator, say *θ_n*, and an optimal RE, say
${\theta}_{n}^{\ast}$, by the likelihood method if the likelihood function is available, or by the least squares method otherwise. Second, we construct an optimal test statistic for testing the uncertain prior information.

To adapt the above general rule to our setting, we need an “optimal” test statistic for checking *Rβ* = *r*. The test statistic we use in this paper is ${L}_{n}=n{(R{\widehat{\beta}}_{\text{AC}}-r)}^{\prime}{(R{\widehat{G}}_{n}{R}^{\prime})}^{-1}(R{\widehat{\beta}}_{\text{AC}}-r)$, where ${\widehat{G}}_{n}$ is a consistent estimator of the asymptotic covariance matrix of $\sqrt{n}({\widehat{\beta}}_{\text{AC}}-\beta )$ given below, and ${\widehat{\beta}}_{\text{AC}}$ is the attenuation-correction (AC) estimator

$${\widehat{\beta}}_{\text{AC}}={({S}_{WW}-{\mathrm{\sum}}_{uu})}^{-1}{S}_{WY},$$

(2.2)

where *S_WW* denotes the sample covariance matrix of the *W_i*, *S_WY* the sample covariance vector between the *W_i* and *Y_i*, and Σ_{uu} the (known) covariance matrix of the measurement errors.
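As a quick numerical illustration of the AC estimator (2.2), the following Python sketch simulates data from model (2.1) with additive measurement error and compares the naive least squares slope with the attenuation-corrected one. All names and parameter values here are hypothetical choices for the simulation, not taken from the paper's studies.

```python
import numpy as np

# Hypothetical simulation setting: p covariates measured with additive error,
# known error covariance Sigma_uu, as in model (2.1).
rng = np.random.default_rng(0)
n, p = 20000, 3
beta = np.array([1.0, -0.5, 2.0])
beta0 = 0.5
Sigma_uu = 0.3 * np.eye(p)                      # known measurement-error covariance

X = rng.normal(size=(n, p))                     # unobserved true covariates
W = X + rng.multivariate_normal(np.zeros(p), Sigma_uu, size=n)  # observed surrogates
Y = beta0 + X @ beta + rng.normal(size=n)

Wc = W - W.mean(axis=0)
Yc = Y - Y.mean()
S_WW = Wc.T @ Wc / n                            # sample covariance matrix of W
S_WY = Wc.T @ Yc / n                            # sample covariance of W and Y

beta_naive = np.linalg.solve(S_WW, S_WY)        # attenuated least squares slope
beta_AC = np.linalg.solve(S_WW - Sigma_uu, S_WY)    # equation (2.2)
beta0_AC = Y.mean() - W.mean(axis=0) @ beta_AC      # intercept estimate
```

With these arbitrary values the naive slope is shrunk toward zero by roughly the reliability factor, while the AC estimator recovers the true slope.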

Here is the intuition behind the construction of the RE. If (*X*, *ε*, *U*) is normally distributed, we can directly calculate the conditional expectation *E*(*Y*|*W*), which is a linear function of *W* and can be used to derive the maximum likelihood estimators of *β*_0 and *β* and the associated RE. If (*X*, *ε*, *U*) is not normally distributed, *E*(*Y*|*W*) may not be easily calculated, or may be a nonlinear function of *W*. We therefore give another version of the RE as follows.

Assume that (*X_i*, *ε_i*, *u_i*) are jointly normally distributed. Then *E*(*Y*|*W*) is linear in *W* with slope vector *ν* = *K_xx β*, where *K_xx* denotes the reliability matrix, and the naive least squares slope ${\widehat{\nu}}_{n}={S}_{WW}^{-1}{S}_{WY}$ consistently estimates *ν* rather than *β*.

To construct REs for *β*_0 and *β*, we first give an RE for *ν*. Note that the general restriction can be written as
$R{K}_{xx}^{-1}\nu =r$. If *K_xx* is known, then using a Lagrange multiplier, one can show that the restricted maximum likelihood estimators of *ν* and *ν*_0 are

$${\widehat{\nu}}_{\text{RE}}={\widehat{\nu}}_{n}-{S}_{WW}^{-1}{K}_{xx}^{\prime -1}{R}^{\prime}\phantom{\rule{0.16667em}{0ex}}{(R{K}_{xx}^{-1}{S}_{WW}^{-1}{K}_{xx}^{\prime -1}{R}^{\prime})}^{-1}\phantom{\rule{0.16667em}{0ex}}(R{K}_{xx}^{-1}{\widehat{\nu}}_{n}-r),\phantom{\rule{0.38889em}{0ex}}{\widehat{\nu}}_{0n}=\overline{Y}-{\overline{W}}^{\prime}{\widehat{\nu}}_{\text{RE}}.$$

If the reliability matrix *K_xx* is unknown, replace it with the consistent estimator ${\widehat{K}}_{xx}={S}_{WW}^{-1}({S}_{WW}-{\mathrm{\sum}}_{uu})$, which yields

$${\widehat{\beta}}_{\text{RE}}={\widehat{K}}_{xx}^{-1}{\widehat{\nu}}_{\text{RE}}={\widehat{\beta}}_{\text{AC}}-{\widehat{H}}_{n}{R}^{\prime}{(R{\widehat{H}}_{n}{R}^{\prime})}^{-1}(R{\widehat{\beta}}_{\text{AC}}-r),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{\widehat{\beta}}_{0,\text{RE}}=\overline{Y}-{\overline{W}}^{\prime}{\widehat{\beta}}_{\text{RE}},$$

(2.3)

where ${\widehat{H}}_{n}={\widehat{K}}_{xx}^{-1}{S}_{WW}^{-1}{\widehat{K}}_{xx}^{\prime \phantom{\rule{0.16667em}{0ex}}-1}$, which can easily be shown to be a consistent estimator of $H={K}_{xx}^{-1}{\mathrm{\sum}}_{WW}^{-1}{K}_{xx}^{\prime -1}$.
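The restricted estimator in (2.3) satisfies the constraint exactly by construction, since the correction term cancels the constraint violation of the AC estimate. A minimal sketch of this algebraic identity, with arbitrary placeholder values standing in for the AC estimate and for ${\widehat{H}}_{n}$:

```python
import numpy as np

# Sketch of (2.3): the RE projects the AC estimate onto the constraint set
# {b : R b = r}. beta_AC and H_n below are arbitrary placeholders, not
# estimates computed from real data.
rng = np.random.default_rng(1)
p, q = 4, 2
R = rng.normal(size=(q, p))                  # known q x p restriction matrix, rank q
r = np.array([0.0, 1.0])                     # known q x 1 vector

beta_AC = rng.normal(size=p)                 # stand-in for the AC estimate
H_n = np.eye(p) + 0.1 * np.ones((p, p))      # stand-in for \hat H_n (positive definite)

RH = R @ H_n @ R.T
beta_RE = beta_AC - H_n @ R.T @ np.linalg.solve(RH, R @ beta_AC - r)   # (2.3)
```

Whatever the value of `beta_AC`, one has `R @ beta_RE == r` up to floating-point precision.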

If the random components are not normal, the conditional expectation *E*(*Y*|*W*) = *β*_0 + *β*′*E*(*X*|*W*) may not be linear in *W*, and the conditional variance Var(*Y*|*W*) may vary with *W*; hence a linear model is inappropriate. In fact, linearity of *E*(*X*|*W*) in *W* and homoscedasticity of Var(*X*|*W*) imply that (*X*′, *U*′) must be multivariate normal; see Goel and DeGroot (1980) and Rao (1976). Although the estimators given in (2.3) are obtained under the normality assumption, it is worth investigating whether ${\widehat{\beta}}_{\text{RE}}$ and ${\widehat{\beta}}_{0,\text{RE}}$ retain good properties in non-normal settings. See Gleser (1992) for details.

Another way to construct REs in measurement error regression models is to mimic the procedure for the RE in model (1.1), that is, ${\widehat{\beta}}_{\text{RE}}={\widehat{\beta}}_{\text{LS}}-S{R}^{\prime}{(RS{R}^{\prime})}^{-1}(R{\widehat{\beta}}_{\text{LS}}-r)$, where *S* is the asymptotic covariance matrix of ${\widehat{\beta}}_{\text{LS}}$. An RE for *β* in the measurement error setting can then be defined as

$${\stackrel{\sim}{\beta}}_{\text{RE}}={\widehat{\beta}}_{\text{AC}}-{\widehat{G}}_{n}{R}^{\prime}{(R{\widehat{G}}_{n}{R}^{\prime})}^{-1}(R{\widehat{\beta}}_{\text{AC}}-r),$$

(2.4)

where *Ĝ_n* is a consistent estimator of *G*, the asymptotic covariance matrix of $\sqrt{n}({\widehat{\beta}}_{\text{AC}}-\beta )$, given by

$$G=({\sigma}^{2}+{\beta}^{\prime}\phantom{\rule{0.16667em}{0ex}}{\mathrm{\sum}}_{uu}\beta ){\mathrm{\sum}}_{xx}^{-1}{\mathrm{\sum}}_{WW}{\mathrm{\sum}}_{xx}^{-1}+{\mathrm{\sum}}_{xx}^{-1}[E(u{u}^{\prime}\beta {\beta}^{\prime}u{u}^{\prime})-{\mathrm{\sum}}_{uu}\beta {\beta}^{\prime}\phantom{\rule{0.16667em}{0ex}}{\mathrm{\sum}}_{uu}-{\mathrm{\sum}}_{uu}({\beta}^{\prime}\phantom{\rule{0.16667em}{0ex}}{\mathrm{\sum}}_{uu}\beta )]{\mathrm{\sum}}_{xx}^{-1}.$$

(2.5)

See Fuller (1987), Carroll et al. (2006) and Kim and Saleh (2005) for how to derive *Ĝ_n* under normal and non-normal setups. The corresponding RE for *β*_0 is ${\stackrel{\sim}{\beta}}_{0,\text{RE}}=\overline{Y}-{\overline{W}}^{\prime}{\stackrel{\sim}{\beta}}_{\text{RE}}$.
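Under normality of *u*, the fourth-moment term *E*(*uu*′*ββ*′*uu*′) in (2.5) has a closed form by Wick's theorem, namely 2Σ_{uu}*ββ*′Σ_{uu} + (*β*′Σ_{uu}*β*)Σ_{uu}, so the bracketed term in (2.5) reduces to Σ_{uu}*ββ*′Σ_{uu}. A Monte Carlo sketch checking this identity; the covariance matrix and *β* below are arbitrary illustrative choices:

```python
import numpy as np

# Monte Carlo check of E[u u' beta beta' u u'] for normal u against the
# Wick's-theorem closed form 2*Sigma bb' Sigma + (b'Sigma b)*Sigma.
rng = np.random.default_rng(2)
p = 3
A = rng.normal(size=(p, p))
Sigma_uu = A @ A.T / p + np.eye(p)           # an arbitrary positive definite covariance
beta = np.array([1.0, -2.0, 0.5])

u = rng.multivariate_normal(np.zeros(p), Sigma_uu, size=500_000)
v = u * (u @ beta)[:, None]                  # each row is (u'beta) * u
fourth_mc = v.T @ v / len(u)                 # Monte Carlo E[u u' beta beta' u u']

bb = np.outer(beta, beta)
fourth_exact = 2 * Sigma_uu @ bb @ Sigma_uu + (beta @ Sigma_uu @ beta) * Sigma_uu

# The bracketed term of (2.5): subtracting the two remaining pieces leaves
# Sigma_uu beta beta' Sigma_uu under normality.
bracket = fourth_exact - Sigma_uu @ bb @ Sigma_uu - (beta @ Sigma_uu @ beta) * Sigma_uu
```

In the non-normal case no such simplification is available, which is one reason the risk comparisons below are harder without the normality assumption.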

Now we are ready to give the PTE, JSTE and PRSE for the regression coefficient *β* based on Saleh's (2006) general rule and the REs defined by (2.3) and (2.4). For clarity, we shall put a ^ over the estimators based on ${\widehat{\beta}}_{\text{RE}}$ in (2.3), and a ~ over those based on ${\stackrel{\sim}{\beta}}_{\text{RE}}$ in (2.4).

- PTE: ${\widehat{\beta}}_{\text{PTE}}={\widehat{\beta}}_{\text{AC}}-({\widehat{\beta}}_{\text{AC}}-{\widehat{\beta}}_{\text{RE}})I({L}_{n}<{\chi}_{\alpha}^{2})$, where ${\chi}_{\alpha}^{2}$ is the upper *α*-percentile of the *χ*^{2}-distribution with *q* degrees of freedom.

If *q* ≥ 3, then one can further define

- JSTE: ${\widehat{\beta}}_{\text{JSTE}}={\widehat{\beta}}_{\text{AC}}-(q-2)({\widehat{\beta}}_{\text{AC}}-{\widehat{\beta}}_{\text{RE}}){L}_{n}^{-1}$;
- PRSE: ${\widehat{\beta}}_{\text{PRSE}}={\widehat{\beta}}_{\text{RE}}+[1-(q-2){L}_{n}^{-1}]I({L}_{n}>q-2)({\widehat{\beta}}_{\text{AC}}-{\widehat{\beta}}_{\text{RE}})$.
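The three definitions above can be collected in a few lines of code. The sketch below assumes the AC estimate, the RE and the observed value of *L_n* are already available; the function name is our own.

```python
import numpy as np
from scipy.stats import chi2

def shrinkage_estimators(beta_AC, beta_RE, L_n, q, alpha=0.15):
    """PTE, JSTE and PRSE (q >= 3) built from the AC estimator, the RE and L_n.

    A sketch following the definitions in the text; the function name is hypothetical.
    """
    chi2_alpha = chi2.ppf(1 - alpha, df=q)      # upper alpha-percentile, q d.f.
    d = beta_AC - beta_RE
    pte = beta_AC - d * float(L_n < chi2_alpha)
    jste = beta_AC - (q - 2) * d / L_n
    prse = beta_RE + (1.0 - (q - 2) / L_n) * float(L_n > q - 2) * d
    return pte, jste, prse
```

Note that for *L_n* > *q* − 2 the PRSE coincides with the JSTE, and otherwise it collapses to the RE; this is the “positive rule” truncation.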

In a similar way, we can define ${\stackrel{\sim}{\beta}}_{\text{PTE}}$, ${\stackrel{\sim}{\beta}}_{\text{JSTE}}$, and ${\stackrel{\sim}{\beta}}_{\text{PRSE}}$ by replacing ${\widehat{\beta}}_{\text{RE}}$ with ${\stackrel{\sim}{\beta}}_{\text{RE}}$ in the expressions for ${\widehat{\beta}}_{\text{PTE}}$, ${\widehat{\beta}}_{\text{JSTE}}$ and ${\widehat{\beta}}_{\text{PRSE}}$, respectively. It will be interesting to see how the estimators based on the two REs differ with respect to the risk comparisons.

From the above definitions, one can see that if the data yield ${L}_{n}<{\chi}_{\alpha}^{2}$, then ${\widehat{\beta}}_{\text{PTE}}={\widehat{\beta}}_{\text{RE}}$; otherwise, ${\widehat{\beta}}_{\text{PTE}}={\widehat{\beta}}_{\text{AC}}$. So the PTE is indeed a simple mixture of the AC estimator and the RE. In the ordinary two-step procedure, one would first test the hypothesis *Rβ* = *r*, then decide, based on the testing result, which estimator to adopt. The PTE simply combines these two steps into one; that is, testing and estimation are done simultaneously. The JSTE, in turn, replaces the indicator function $I({L}_{n}<{\chi}_{\alpha}^{2})$ with a continuous function of *L_n*, namely $(q-2){L}_{n}^{-1}$. In the normal case, one can actually obtain the JSTE by the empirical Bayes estimation approach. The constant appearing in the JSTE in the classical multiple regression model should be (

For convenience, in what follows we call ${\widehat{\beta}}_{\text{RE}}$, ${\widehat{\beta}}_{\text{PTE}}$, ${\widehat{\beta}}_{\text{JSTE}}$, and ${\widehat{\beta}}_{\text{PRSE}}$ the “hat” estimators, and ${\stackrel{\sim}{\beta}}_{\text{RE}}$, ${\stackrel{\sim}{\beta}}_{\text{PTE}}$, ${\stackrel{\sim}{\beta}}_{\text{JSTE}}$, and ${\stackrel{\sim}{\beta}}_{\text{PRSE}}$ the “tilde” estimators.

To begin with, we state regularity conditions associated with model (2.1), which will be used throughout the current and next sections. Some conditions are already mentioned in the previous section. They are listed again for the sake of completeness.

- C1: The random errors *ε_i*, *i* = 1, …, *n*, are i.i.d. with mean 0 and finite positive variance *σ*^{2}; the measurement error vectors *u_i*, *i* = 1, …, *n*, are i.i.d. with mean vector 0 and covariance matrix Σ_{uu}, which is known. The fourth moment of the Euclidean norm of *u* exists, that is, *E*||*u*||^{4} < ∞.
- C2: *ε_i* and *u_i* are independent, *i* = 1, …, *n*.
- C3: The *X_i* are i.i.d. with mean vector *μ_x* and finite positive definite covariance matrix Σ_{xx}, and are independent of *ε_i* and *u_i*.

These conditions are quite common in the literature on measurement error models. The existence of the fourth moment of *u*, which was also assumed in Schneeweiss (1976), is needed to ensure the asymptotic normality of the AC estimator.

We begin by considering the asymptotic behavior of the improved estimators under the fixed alternative *H_a*: *Rβ* − *r* = ***δ*** ≠ 0. Under *H_a*, the test statistic decomposes as

$${L}_{n}=n(R{\widehat{\beta}}_{n}-R\beta {)}^{\prime}{(R{\widehat{G}}_{n}{R}^{\prime})}^{-1}(R{\widehat{\beta}}_{n}-R\beta )+2n(R{\widehat{\beta}}_{n}-R\beta {)}^{\prime}{(R{\widehat{G}}_{n}{R}^{\prime})}^{-1}\mathit{\delta}+n{\mathit{\delta}}^{\prime}{(R{\widehat{G}}_{n}{R}^{\prime})}^{-1}\mathit{\delta}.$$

The first term is of order *O_p*(1), the second of order ${O}_{p}(\sqrt{n})$, and the third of order *n*; hence *L_n* tends to infinity in probability under the fixed alternative.

Let
${\beta}_{n}^{\ast}$ be a generic notation for the hat and tilde estimators. Under the fixed alternative, it is easy to see that
$\sqrt{n}({\beta}_{n}^{\ast}-\beta )=\sqrt{n}({\widehat{\beta}}_{\text{AC}}-\beta )+{o}_{p}(1)$ if Conditions C1–C3 hold. That is, all the estimators defined above are asymptotically equivalent to ${\widehat{\beta}}_{\text{AC}}$. This implies that the asymptotic risk functions are all the same when *H_a* is true, so we cannot distinguish these estimators.

To obtain meaningful risk comparisons among these estimators, we consider a sequence of local alternatives, that is,

$${H}_{na}:R{\beta}_{n}-r=\mathit{\delta}/\sqrt{n},$$

(2.6)

with fixed *R*, ***δ*** and *r*. Define

$${J}_{C}=C{R}^{\prime}{(RC{R}^{\prime})}^{-1},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}C=G\phantom{\rule{0.16667em}{0ex}}\text{or}\phantom{\rule{0.16667em}{0ex}}H.$$

(2.7)

The following theorem states the asymptotic distributions of the AC, hat and tilde estimators under the local alternatives (2.6).

Suppose Conditions C1–C3 hold. Then, under the local alternatives (2.6), as n → ∞, we have, in distribution,

$$\sqrt{n}({\beta}_{n}^{\ast}-{\beta}_{n})\to Z-\phi (L){J}_{C}(RZ+\mathit{\delta}),$$

where *Z* ~ *N*(0, *G*) and *L* = (*RZ* + ***δ***)′(*RGR*′)^{−1}(*RZ* + ***δ***). For
${\beta}_{n}^{\ast}={\widehat{\beta}}_{\text{AC}}$, *φ*(*L*) = 0. For the hat estimators, *C* = *H*: for ${\widehat{\beta}}_{\text{RE}}$, *φ*(*L*) = 1; for ${\widehat{\beta}}_{\text{PTE}}$,
$\phi (L)={I}_{(0,{\chi}_{\alpha}^{2})}(L)$; for ${\widehat{\beta}}_{\text{JSTE}}$, *φ*(*L*) = (*q* − 2)*L*^{−1}; and for ${\widehat{\beta}}_{\text{PRSE}}$, *φ*(*L*) = 1 − [1 − (*q* − 2)*L*^{−1}]*I*_{(*q*−2,∞)}(*L*). For the tilde estimators, *C* = *G*, and the *φ*(·) functions are the same as their counterparts among the hat estimators.

In particular, if *R* = *I_p* and *r* = 0, the constraint reduces to *β* = 0, the special case studied by Kim and Saleh (2005).

The commonly used quantities to evaluate the performance of an estimator are its bias, variance and mean squared error. For the estimators proposed in the last section, we calculate their asymptotic distributional quadratic biases and their asymptotic distributional quadratic risk functions with different weight matrices, and show that all the estimators except the AC estimator are asymptotically biased. General comparisons between the estimators based on these asymptotic results turn out to be very difficult, but detailed comparisons can be made under special circumstances. We mainly focus on the asymptotic bias and risk functions of the estimators of the slope parameters and of certain linear combinations of the intercept and slope parameters.

The asymptotic distributional bias function of the estimator ${\beta}_{n}^{\ast}$ is defined as the bias of the asymptotic distribution of $\sqrt{n}({\beta}_{n}^{\ast}-{\beta}_{n})$, that is, $b(\beta ,{\beta}_{n}^{\ast})=E[Z-\phi (L){J}_{C}(RZ+\mathit{\delta})]$ according to Theorem 2.1. To make meaningful comparisons among the asymptotic distributional biases, we define the asymptotic distributional quadratic bias as below:

$$B(\beta ,{\beta}_{n}^{\ast})=b(\beta ,{\beta}_{n}^{\ast}{)}^{\prime}b(\beta ,{\beta}_{n}^{\ast}).$$

Let *M* be a known positive definite weight matrix. The asymptotic risk function of the estimator
${\beta}_{n}^{\ast}$ of *β* is defined as

$$\rho (\beta ,{\beta}_{n}^{\ast})=E\{[Z-\phi (L){J}_{C}(RZ+\mathit{\delta}){]}^{\prime}M[Z-\phi (L){J}_{C}(RZ+\mathit{\delta})]\}.$$

To concisely state the results, the following notation is used.

$$\begin{array}{l}\mu ={G}^{1/2}{R}^{\prime}{(RG{R}^{\prime})}^{-1}\mathit{\delta},\phantom{\rule{0.38889em}{0ex}}\lambda ={\mathit{\delta}}^{\prime}{(RG{R}^{\prime})}^{-1}\mathit{\delta},\phantom{\rule{0.38889em}{0ex}}\eta ={\mathit{\delta}}^{\prime}{J}_{G}^{\prime}M{J}_{G}\mathit{\delta},\\ {g}_{q+i}(\lambda )=P({\chi}_{q+i,\lambda}^{2}<{\chi}_{\alpha}^{2}),\phantom{\rule{0.38889em}{0ex}}{h}_{j,q+i}(\lambda )={(q-2)}^{j}E[{({\chi}_{q+i,\lambda}^{2})}^{-j}],\\ {k}_{j,q+i}(\lambda )=P({\chi}_{q+i,\lambda}^{2}<q-2)+{(q-2)}^{j}E[{({\chi}_{q+i,\lambda}^{2})}^{-j}I({\chi}_{q+i,\lambda}^{2}>q-2)],\end{array}$$

where *i* = 2, 4, *j* = 1, 2, and
${\chi}_{q+i,\lambda}^{2}$ denotes a noncentral *χ*^{2} random variable with *q* + *i* degrees of freedom and noncentrality parameter *λ*.
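The quantities *g*, *h* and *k* can be evaluated numerically. A sketch using `scipy.stats`, with the inverse moments approximated by Monte Carlo; the function names are our own, and the default *α* = 0.15 follows the illustration used later in the text:

```python
import numpy as np
from scipy.stats import chi2, ncx2

def g(lam, q, i, alpha=0.15):
    """g_{q+i}(lambda) = P(chi^2_{q+i,lambda} < chi^2_alpha), critical value from q d.f."""
    crit = chi2.ppf(1 - alpha, df=q)
    return ncx2.cdf(crit, df=q + i, nc=lam) if lam > 0 else chi2.cdf(crit, df=q + i)

def _draws(lam, df, size, seed):
    # Noncentral chi-square draws (central chi-square when lam = 0).
    rng = np.random.default_rng(seed)
    if lam > 0:
        return ncx2.rvs(df=df, nc=lam, size=size, random_state=rng)
    return chi2.rvs(df=df, size=size, random_state=rng)

def h(lam, q, i, j, size=200_000, seed=0):
    """h_{j,q+i}(lambda) = (q-2)^j E[(chi^2_{q+i,lambda})^{-j}], by Monte Carlo."""
    x = _draws(lam, q + i, size, seed)
    return (q - 2) ** j * np.mean(x ** (-j))

def k(lam, q, i, j, size=200_000, seed=0):
    """k_{j,q+i}(lambda), by Monte Carlo."""
    x = _draws(lam, q + i, size, seed)
    return np.mean(x < q - 2) + (q - 2) ** j * np.mean(x ** (-j) * (x > q - 2))
```

For example, *h*_{1,q+2}(0) = (*q* − 2)/*q*, since *E*[(χ²_k)^{−1}] = 1/(*k* − 2) for a central χ² variable with *k* > 2 degrees of freedom.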

The asymptotic distributional bias functions can be derived using the fact that $E\phi (L)Z={G}^{1/2}\mu [E\phi ({\chi}_{q+2,\lambda}^{2})-E\phi ({\chi}_{q,\lambda}^{2})]$ for any measurable function *φ*(·); this can be verified using (A.8) in the Appendix. The details are omitted for brevity. The following theorem lists the results for the asymptotic distributional bias; its proof follows from Theorem 2.1 and is omitted.

Suppose Conditions C1–C3 hold. Then, under the local alternatives, the asymptotic biases and asymptotic distributional quadratic biases for the AC, hat and tilde estimators are given by

$$b(\beta ,{\beta}_{n}^{\ast})=-f(\lambda ){J}_{C}\mathit{\delta},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}B(\beta ,{\beta}_{n}^{\ast})={f}^{2}(\lambda ){\mathit{\delta}}^{\prime}{J}_{C}^{\prime}{J}_{C}\mathit{\delta}.$$

respectively. For the AC estimator, f(λ) = 0; for all hat estimators, C = H, and for all tilde estimators, C = G. For RE, f(λ) = 1; for PTE, f(λ) = g_{q+2}(λ); for JSTE, f(λ) = h_{1,q+2}(λ); and for PRSE, f(λ) = k_{1,q+2}(λ).

Within each set of estimators, the asymptotic distributional quadratic biases can be compared simply through the quantities *g*_{q+2}(λ), *h*_{1,q+2}(λ) and *k*_{1,q+2}(λ).

A more interesting comparison is between the hat estimators and the tilde estimators. A natural way to make this comparison is to compute the difference between the biases or the risk functions. The computation is rather complex in general, but for some special cases the comparison can be made easily. In particular, we have the following corollary.

Suppose u ~ N_p(0, σ_{uu}I) and Σ_{xx} = σ_{xx}I. Then for any **δ**, the asymptotic distributional quadratic biases of the hat estimators are all less than those of the corresponding tilde estimators.

For illustrative purposes, let *p* = 5, *q* = 3 (recall that *q* > 2 is required in constructing the JSTE and PRSE), *α* = 0.15, *β* = (3, 3, 1, 2, 3)′, Σ_{xx} = Σ

Bias Plots. The dotted and solid lines are the biases of the tilde estimators and the hat estimators, respectively, and the horizontal line represents the bias of the AC estimator.

The asymptotic weighted risk functions of ${\widehat{\beta}}_{\text{AC}}$ and the hat and tilde estimators are summarized in the following theorem.

Suppose Conditions C1–C3 hold. Then

$$\begin{array}{l}\rho (\beta ,{\widehat{\beta}}_{\text{AC}})=\text{tr}(M\phantom{\rule{0.16667em}{0ex}}G),\\ \rho (\beta ,{\widehat{\beta}}_{\text{RE}})=\text{tr}(M\phantom{\rule{0.16667em}{0ex}}G)-2\text{tr}(RG\phantom{\rule{0.16667em}{0ex}}M\phantom{\rule{0.16667em}{0ex}}{J}_{H})+\text{tr}[{J}_{H}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{H}(RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})]+{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{H}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{H}\mathit{\delta},\\ \rho (\beta ,{\widehat{\beta}}_{\text{PTE}})=\text{tr}(M\phantom{\rule{0.16667em}{0ex}}G)-2\text{tr}(RG\phantom{\rule{0.16667em}{0ex}}M\phantom{\rule{0.16667em}{0ex}}{J}_{H}){g}_{q+2}(\lambda )+2{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{G}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{H}\mathit{\delta}[{g}_{q+2}(\lambda )-{g}_{q+4}(\lambda )]\\ +\text{tr}[{J}_{H}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{H}(RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})]{g}_{q+2}(\lambda )+{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{H}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{H}\mathit{\delta}{g}_{q+4}(\lambda ),\end{array}$$

while the risk functions of ${\widehat{\beta}}_{\text{JSTE}}$ and ${\widehat{\beta}}_{\text{PRSE}}$ take the common form

$$\begin{array}{c}\rho (\beta ,{\widehat{\beta}}_{n}^{\ast})=\text{tr}(M\phantom{\rule{0.16667em}{0ex}}G)-2\text{tr}(RG\phantom{\rule{0.16667em}{0ex}}M\phantom{\rule{0.16667em}{0ex}}{J}_{H}){f}_{1,q+2}(\lambda )+2{\mathit{\delta}}^{\prime}{J}_{G}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{H}\mathit{\delta}[{f}_{1,q+2}(\lambda )-{f}_{1,q+4}(\lambda )]\\ +\text{tr}[{J}_{H}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{H}(RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})]{f}_{2,q+2}(\lambda )+{\mathit{\delta}}^{\prime}{J}_{H}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{H}\mathit{\delta}{f}_{2,q+4}(\lambda ).\end{array}$$

The risk function of ${\widehat{\beta}}_{\text{JSTE}}$ corresponds to *f* = *h*, and that of ${\widehat{\beta}}_{\text{PRSE}}$ to *f* = *k*.

The risk comparisons can be carried out by investigating the matrices *G* and *H*, but the actual comparison may be complicated, if feasible at all, because
$G=({\sigma}^{2}+{\beta}^{\prime}\phantom{\rule{0.16667em}{0ex}}{\mathrm{\sum}}_{uu}\beta )H+{\mathrm{\sum}}_{xx}^{-1}[E(u{u}^{\prime}\beta {\beta}^{\prime}u{u}^{\prime})-\mathrm{\sum}{}_{uu}\beta {\beta}^{\prime}\phantom{\rule{0.16667em}{0ex}}\mathrm{\sum}{}_{uu}-\mathrm{\sum}{}_{uu}({\beta}^{\prime}\phantom{\rule{0.16667em}{0ex}}\mathrm{\sum}{}_{uu}\beta )]{\mathrm{\sum}}_{xx}^{-1}$, and the risk functions involve inverse operations such as (*RGR*′)^{−1}. The comparison can be done in a straightforward way in some special cases. For example, if *M* = *H*^{−1}, we have the following corollary.

Suppose Conditions C1–C3 hold. Let M = H^{−1}. Then

$$\begin{array}{l}\rho (\beta ,{\widehat{\beta}}_{\text{AC}})=\text{tr}({H}^{-1}G),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\rho (\beta ,{\widehat{\beta}}_{\text{RE}})=\text{tr}({H}^{-1}G)-\text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]+{\mathit{\delta}}^{\prime}{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}\mathit{\delta},\\ \rho (\beta ,{\widehat{\beta}}_{\text{PTE}})=\text{tr}({H}^{-1}G)-\text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]{g}_{q+2}(\lambda )+{\mathit{\delta}}^{\prime}{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}\mathit{\delta}[2{g}_{q+2}(\lambda )-{g}_{q+4}(\lambda )],\end{array}$$

the risk functions of ${\widehat{\beta}}_{\text{JSTE}}$ and ${\widehat{\beta}}_{\text{PRSE}}$ take the common form

$$\begin{array}{l}\rho (\beta ,{\widehat{\beta}}_{n}^{\ast})=\text{tr}({H}^{-1}G)-\text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}][2{f}_{1,q+2}(\lambda )-{f}_{2,q+2}(\lambda )]\\ +{\mathit{\delta}}^{\prime}{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}\mathit{\delta}[2{f}_{1,q+2}(\lambda )-2{f}_{1,q+4}(\lambda )+{f}_{2,q+4}(\lambda )].\end{array}$$

The risk function of ${\widehat{\beta}}_{\text{JSTE}}$ corresponds to *f* = *h*, and that of ${\widehat{\beta}}_{\text{PRSE}}$ to *f* = *k*.

Based on Corollary 3.2, one can make the comparisons among the estimators more specifically. In fact,

- The pretest estimator ${\widehat{\beta}}_{\text{PTE}}$ performs better than the AC estimator ${\widehat{\beta}}_{\text{AC}}$ if and only if ***δ*** satisfies $${\mathit{\delta}}^{\prime}{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}\mathit{\delta}\le \frac{{g}_{q+2}(\lambda )\text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]}{2{g}_{q+2}(\lambda )-{g}_{q+4}(\lambda )}.$$ In particular, under *H*_0, that is, ***δ*** = 0, $$\rho (\beta ,{\widehat{\beta}}_{\text{AC}})-\rho (\beta ,{\widehat{\beta}}_{\text{PTE}})={g}_{q+2}(0)\text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]>0.$$
*α*, PTE is not uniformly better than the AC estimator. One may determine an*α*such that PTE has a minimum guaranteed relative efficiency. Similarly to Kim & Saleh (2005) and Saleh (2006), the relative efficiency of_{PTE}to_{AC}is defined as E(*α*, λ) =*ρ*(*β*,_{AC})/*ρ*(*β*,_{PTE}). For any given*R*,*r*,*G*,*H*, the relative efficiency is a function of*α*and λ. Suppose the minimum efficiency required is*E*_{0}, then we can choose*α*by solving the equation min_{λ}*E*(*α*, λ) =*E*_{0}. The explicit solution may not be available, but we can use a numerical method to search for the minimization. That is, we compute the corresponding min_{λ}*E*(*α*, λ) for several*α*values, and select one such that min_{λ}*E*(*α*, λ) is close to but not less than*E*_{0}. Now we use the setup for Figure 1 to illustrate the choice of*α*. The weight matrix*M*is chosen to be*H*^{−1}and*G*^{−1}, respectively. Table 1 reports, for each*α*, the maximum relative efficiency (denoted by max), the minimal relative efficiency (denoted by min) and also the value of*δ*corresponding to the minimal relative efficiency, denoted by*δ*_{min}.So, in our cases, if PTE of*β*is chosen with at least 0.80 relative efficiency comparing to the AC estimator, in both case (*M*=*H*^{−1}and*M*=*G*^{−1}), we choose*α*= 0.15 as the level of the test. In practice,*G*and*H*are unknown. To implement this procedure, one needs to have some preliminary estimates for these quantities. - The risk function of
_{JSTE}can be written as$$\begin{array}{l}\rho (\beta ,{\widehat{\beta}}_{\text{JSTE}})=\text{tr}({H}^{-1}G)-(q-2)\text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]\xb7\\ \left((q-2)E({\chi}_{q+2,\lambda}^{-4})+2\lambda \left\{1-\frac{(q+2){\mathit{\delta}}^{\prime}{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}\mathit{\delta}}{2\lambda \text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]}\right\}E{\chi}_{q+4,\lambda}^{-4}\right),\end{array}$$which can be shown by using the definition of*h*-function and the following facts on*χ*^{2}-distributions:$$\begin{array}{l}\lambda E[{({\chi}_{q+4,\lambda}^{2})}^{-2}]=E[{({\chi}_{q+2,\lambda}^{2})}^{-1}]-(q-2)E[{({\chi}_{q+2,\lambda}^{2})}^{-2}],\\ 2E[{({\chi}_{q+4,\lambda}^{2})}^{-2}]=E[{({\chi}_{q+2,\lambda}^{2})}^{-1}]-E[{({\chi}_{q+4,\lambda}^{2})}^{-1}].\end{array}$$See (2.2.13d) and (2.2.13e) in Saleh [2006].A sufficient condition that ensures_{JSTE}being superior to_{AC}is given by$$\frac{\text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]}{{\mathrm{\Lambda}}_{max}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]}\ge \frac{q+2}{2},$$where Λ_{max}(·) is the maximal eigenvalue of its matrix argument. If, in addition, we assume that*u*~*N*(0,*σ*), Σ_{uu}I=_{xx}*σ*, then the above sufficient condition can be written as (_{xx}I*c*_{1}−*c*_{2})*q*≥ 2*c*_{1}, where*c*_{1}=*σ*^{2}+*σ*′_{uu}β*β*, ${c}_{2}={\beta}^{\prime}{R}^{\prime}{(R{R}^{\prime})}^{-1}R\beta {\sigma}_{uu}^{2}/({\sigma}_{xx}+{\sigma}_{uu})$. Furthermore, if*c*_{2}= 0, this sufficient condition is simplified as*q*≥ 2. This, together with the following fact, implies that PRSE and JSTE dominate the AC estimator when*c*_{2}= 0, the case studied by Kim and Saleh (2005). - The difference
*ρ*(*β*,_{JSTE}) −*ρ*(*β*,_{PRSE}) is$$\begin{array}{l}\text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]\xb7E\{{[1-(q-2){\chi}_{q+2,\lambda}^{-2}]}^{2}I({\chi}_{q+2,\lambda}^{2}<q-2)\}\\ +{\mathit{\delta}}^{\prime}{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}\mathit{\delta}E[{(1-(q-2){\chi}_{q+4,\lambda}^{-2})}^{2}I({\chi}_{q+4,\lambda}^{2}<q-2)]\\ +2{\mathit{\delta}}^{\prime}{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}\mathit{\delta}E[(q-2){\chi}_{q+2,\lambda}^{-2}-1)I({\chi}_{q+2,\lambda}^{2}<q-2)],\end{array}$$which is nonnegative for all, and implies that the performance of PRSE is uniformly better than that of JSTE.*δ* - The difference
*ρ*(*β*,_{JSTE}) −*ρ*(*β*,_{PTE}) is$$\begin{array}{l}\text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]\xb7[{g}_{q+2}(\lambda )-2{h}_{1,q+2}(\lambda )+{h}_{2,q+2}(\lambda )]\\ +{\mathit{\delta}}^{\prime}{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}\mathit{\delta}[2{h}_{1,q+2}(\lambda )-2{h}_{1,q+4}(\lambda )+{h}_{2,q+4}(\lambda )-2{g}_{q+2}(\lambda )+{g}_{q+4}(\lambda )].\end{array}$$Under the null hypothesis (= 0),*δ*$$\rho (\beta ,{\widehat{\beta}}_{\text{JSTE}})-\rho (\beta ,{\widehat{\beta}}_{\text{PTE}})=\text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]\xb7\left[P({\chi}_{q+2}^{2}<{\chi}_{\alpha}^{2})-\frac{q-2}{q}\right].$$Thus, in the case of*Rβ*being close to*r*, at the significance level*α*, PTE should be used if $qP({\chi}_{q+2}^{2}<{\chi}_{\alpha}^{2})\ge q-2$, otherwise, JSTE is preferable. - The difference
*ρ*(*β*,_{PRSE}) −*ρ*(*β*,_{PTE}) is$$\begin{array}{l}\text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]\xb7[{g}_{q+2}(\lambda )-2{k}_{1,q+2}(\lambda )+{k}_{2,q+2}(\lambda )]\\ +{\mathit{\delta}}^{\prime}{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}\mathit{\delta}[2{k}_{1,q+2}(\lambda )-2{k}_{1,q+4}(\lambda )+{k}_{2,q+4}(\lambda )-2{g}_{q+2}(\lambda )+{g}_{q+4}(\lambda )].\end{array}$$Under the null hypothesis (= 0),*δ**ρ*(*β*,_{PRSE}) −*ρ*(*β*,_{PTE})equal to$$\text{tr}[{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}]\xb7\left[P({\chi}_{q+2}^{2}<{\chi}_{\alpha}^{2})-\frac{q-2}{q}-E[{(1-(q-2){\chi}_{q+2}^{-2})}^{2}I({\chi}_{q+2}^{2}<q-2)]\right].$$Thus, in the case of*Rβ*being close to*r*, at the significance level*α*, PTE should be used if $qP({\chi}_{q+2}^{2}<{\chi}_{\alpha}^{2})\ge q-2+qE[{(1-(q-2){\chi}_{q+2}^{-2})}^{2}I({\chi}_{q+2}^{2}<q-2)]$, otherwise, PRSE is preferable.

By applying the above theorem with *H* replaced by *G*, we obtain the asymptotic weighted risk functions for all tilde estimators. A similar risk comparison can then be carried out; we summarize the results in the following theorem.

Suppose Conditions C1–C3 hold. Then

$$\begin{array}{l}\rho (\beta ,{\stackrel{\sim}{\beta}}_{\text{RE}})=\text{tr}(M\phantom{\rule{0.16667em}{0ex}}G)-\text{tr}({J}_{G}^{\prime}M\phantom{\rule{0.16667em}{0ex}}G\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})+{\mathit{\delta}}^{\prime}{J}_{G}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{G}\mathit{\delta},\\ \rho (\beta ,{\stackrel{\sim}{\beta}}_{\text{PTE}})=\text{tr}(M\phantom{\rule{0.16667em}{0ex}}G)-{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{G}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{G}\mathit{\delta}{g}_{q+4}(\lambda )-[\text{tr}({J}_{G}^{\prime}M\phantom{\rule{0.16667em}{0ex}}G\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})-2{\mathit{\delta}}^{\prime}{J}_{G}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{G}\mathit{\delta}]{g}_{q+2}(\lambda ),\end{array}$$

the risk functions of ${\stackrel{\sim}{\beta}}_{\text{JSTE}}$ and ${\stackrel{\sim}{\beta}}_{\text{PRSE}}$ both take the following form:

$$\begin{array}{l}\rho (\beta ,{\stackrel{\sim}{\beta}}_{n}^{\ast})=\text{tr}(M\phantom{\rule{0.16667em}{0ex}}G)-\text{tr}({J}_{G}^{\prime}M\phantom{\rule{0.16667em}{0ex}}G\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})(2{f}_{1,q+2}(\lambda )-{f}_{2,q+2}(\lambda ))\\ +{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{G}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{G}\mathit{\delta}[2{f}_{1,q+2}(\lambda )-2{f}_{1,q+4}(\lambda )+{f}_{2,q+4}(\lambda )].\end{array}$$

The risk function of ${\stackrel{\sim}{\beta}}_{\text{JSTE}}$ corresponds to *f* = *h*, and that of ${\stackrel{\sim}{\beta}}_{\text{PRSE}}$ to *f* = *k*.

In particular, if *M* = *G*^{−1}, then the above theorem reduces to the following corollary.

Suppose Conditions C1–C3 hold. Then

$$\begin{array}{l}\rho (\beta ,{\stackrel{\sim}{\beta}}_{\text{RE}})=p-q+\lambda ,\phantom{\rule{0.38889em}{0ex}}\rho (\beta ,{\stackrel{\sim}{\beta}}_{\text{PTE}})=p-(q-2\lambda ){g}_{q+2}(\lambda )-\lambda {g}_{q+4}(\lambda ),\\ \rho (\beta ,{\stackrel{\sim}{\beta}}_{\text{JSTE}})=p-q[2{h}_{1,q+2}(\lambda )-{h}_{2,q+2}(\lambda )]+\lambda [2{h}_{1,q+2}(\lambda )-2{h}_{1,q+4}(\lambda )+{h}_{2,q+4}(\lambda )],\\ \rho (\beta ,{\stackrel{\sim}{\beta}}_{\text{PRSE}})=p-q[2{k}_{1,q+2}(\lambda )-{k}_{2,q+2}(\lambda )]+\lambda [2{k}_{1,q+2}(\lambda )-2{k}_{1,q+4}(\lambda )+{k}_{2,q+4}(\lambda )].\end{array}$$

For the purpose of illustration, we use the previous setting, that is, *p* = 5, *q* = 3, *α* = 0.15, *β* = (3, 3, 1, 2, 3)′, Σ_{*xx*} = Σ

Risk Plots. The dotted and solid lines are the risks for the tilde estimators and the hat estimators and the horizontal line is the risk of the AC estimator.

Sometimes, we are interested in estimating a linear combination of the intercept and the slope parameters. For example, to predict the response at a specified value of the predictor, say ${x}_{0}\in {\mathbb{R}}^{p}$, one needs to calculate the value of
${\beta}_{0n}^{\ast}+{x}_{0}^{\prime}{\beta}_{n}^{\ast}$, where
${\beta}_{0n}^{\ast}$ and
${\beta}_{n}^{\ast}$ are generic notation for the AC, hat, or tilde estimators of *β*_{0} and *β*, respectively.

Suppose Conditions C1–C3 hold. Then
$\rho ({\beta}_{0}-{x}_{0}^{\prime}\beta ,{\beta}_{0n}^{\ast}+{x}_{0}^{\prime}{\beta}_{n}^{\ast})={\sigma}_{0}^{2}+{x}_{0}^{\prime}G{x}_{0}+2{x}_{0}^{\prime}G\tau +({x}_{0}-{\mu}_{x}{)}^{\prime}{J}_{C}[RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}E{\phi}^{2}({\chi}_{q+2,\lambda}^{2})+\mathit{\delta}{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}E{\phi}^{2}({\chi}_{q+4,\lambda}^{2})]{J}_{C}^{\prime}({x}_{0}-{\mu}_{x})-2({x}_{0}-{\mu}_{x}{)}^{\prime}{J}_{C}[RG\phantom{\rule{0.16667em}{0ex}}E\phi ({\chi}_{q+2,\lambda}^{2})-\mathit{\delta}{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{G}^{\prime}E\phi ({\chi}_{q+2,\lambda}^{2})+\mathit{\delta}{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{G}^{\prime}E\phi ({\chi}_{q+4,\lambda}^{2})]({x}_{0}+\tau )$, where *C* = *H* corresponds to the estimators based on ${\widehat{\beta}}_{\text{RE}}$, *C* = *G* corresponds to the estimators based on ${\stackrel{\sim}{\beta}}_{\text{RE}}$, *φ*(*x*) = 0 for the AC estimator, *φ*(*x*) = 1 for the RE,
$\phi (x)={I}_{(0,{\chi}_{\alpha}^{2})}(x)$ for PTE, *φ*(*x*) = (*q* − 2)*x*^{−1} for JSTE, and *φ*(*x*) = 1 − [1 − (*q* − 2)*x*^{−1}]*I*_{(*q*−2,∞)}(*x*) for PRSE.
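The weight functions *φ*(·) listed in the theorem can be coded directly; the sketch below (function names are ours) makes the shrinkage pattern of each estimator explicit:

```python
def phi_ac(x):
    return 0.0                          # AC: no shrinkage toward the restriction

def phi_re(x):
    return 1.0                          # RE: full shrinkage onto R beta = r

def phi_pte(x, chi2_alpha):
    return 1.0 if 0 < x < chi2_alpha else 0.0   # PTE: shrink iff the test accepts H0

def phi_jste(x, q):
    return (q - 2) / x                  # JSTE: smooth shrinkage

def phi_prse(x, q):
    # PRSE: 1 - [1 - (q-2)/x] * I(x > q-2), i.e. the JSTE weight capped at 1
    return 1.0 if x <= q - 2 else (q - 2) / x
```

For *x* > *q* − 2 the PRSE weight agrees with the JSTE weight; for *x* ≤ *q* − 2 it is capped at 1, which removes the over-shrinkage (weights above 1) that JSTE can produce.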

When the measurement errors are symmetric around 0, *E*[*uβ*′*uu*′*β*] = 0 for all *β*. As a consequence, we have the following corollary.

Suppose Conditions C1–C3 hold, and Euβ′uu′β = 0. Then

$$\begin{array}{l}\rho ({\beta}_{0}-{x}_{0}^{\prime}\beta ,{\beta}_{0n}^{\ast}+{x}_{0}^{\prime}{\beta}_{n}^{\ast})={\sigma}_{0}^{2}+({x}_{0}-{\mu}_{x}{)}^{\prime}[{J}_{C}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{C}^{\prime}E{\phi}^{2}({\chi}_{q+2,\lambda}^{2})\\ +{J}_{C}\mathit{\delta}{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{C}^{\prime}E{\phi}^{2}({\chi}_{q+4,\lambda}^{2})]({x}_{0}-{\mu}_{x})-2({x}_{0}-{\mu}_{x}{)}^{\prime}{J}_{C}[RG\phantom{\rule{0.16667em}{0ex}}E\phi ({\chi}_{q+2,\lambda}^{2})\\ -\mathit{\delta}{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{G}^{\prime}E\phi ({\chi}_{q+2,\lambda}^{2})+\mathit{\delta}{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{G}^{\prime}E\phi ({\chi}_{q+4,\lambda}^{2})]({x}_{0}-{\mu}_{x}),\end{array}$$

where ${\sigma}_{0}^{2}={\sigma}^{2}+{\beta}^{\prime}\phantom{\rule{0.16667em}{0ex}}{\mathrm{\sum}}_{uu}\beta +{\mu}_{x}^{\prime}G{\mu}_{x}$, and *C* and *φ*(·) are as in Theorem 3.4.

To examine the finite-sample performance of the proposed estimators, we conduct simulation experiments under various scenarios. The data are generated from the following multiple linear regression model with measurement errors:

$$\{\begin{array}{l}Y={\beta}_{0}+{\beta}_{1}{X}_{1}+{\beta}_{2}{X}_{2}+{\beta}_{3}{X}_{3}+{\beta}_{4}{X}_{4}+{\beta}_{5}{X}_{5}+{\beta}_{6}{X}_{6}+{\beta}_{7}{X}_{7}+{\beta}_{8}{X}_{8}+\epsilon ,\hfill \\ {W}_{j}={X}_{j}+{u}_{j},\phantom{\rule{0.38889em}{0ex}}j=1,2,\dots ,8.\hfill \end{array}$$

(4.1)

The true regression parameter is chosen to be
${\beta}_{n}=\beta +\delta \mathbf{1}/\sqrt{n}$ with *Rβ* = *r*. We consider two cases, *δ* = 0, and *δ* ≠ 0. For each case, we calculate the risk function for sample size *n* = 50, 100, 200, and 500, and repeat each simulation 1000 times. The values of the risk reported in the tables below are the average of 1000 sample risks. In the simulation, *α* is chosen to be 0.15. In practice, one can choose the value of *α* based on the maximin rule given in Table 1. The predictors *X* are generated from multivariate normal *N*(0, *I*_{8×8}), and the measurement errors *u* are generated from multivariate normal *N*(0, 0.2^{2}*I*_{8×8}). *X* and *u* are independent. The regression error *ε* follows *N*(0, 1).
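For concreteness, one replicate of this design together with the AC (attenuation-corrected) estimator can be sketched as follows. Variable names are ours; the true *β* values are those listed below for the *δ* = 0 case, and Σ_{*uu*} = 0.2²*I* as stated above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20000, 8
beta0 = 1.0
beta = np.array([0.0, 1.5, 0.75, 2.0, 0.0, -2.0, 0.0, 3.0])  # (beta_1, ..., beta_8)
Sigma_uu = 0.2 ** 2 * np.eye(p)

X = rng.standard_normal((n, p))            # X ~ N(0, I_8)
u = 0.2 * rng.standard_normal((n, p))      # u ~ N(0, 0.2^2 I_8)
eps = rng.standard_normal(n)               # regression error ~ N(0, 1)
W = X + u                                  # observed surrogate
Y = beta0 + X @ beta + eps

# AC estimator: subtract Sigma_uu from the sample covariance of W
Wc = W - W.mean(axis=0)
S_WW = Wc.T @ Wc / n
S_WY = Wc.T @ (Y - Y.mean()) / n
beta_AC = np.linalg.solve(S_WW - Sigma_uu, S_WY)
beta0_AC = Y.mean() - W.mean(axis=0) @ beta_AC
```

The subtraction of Σ_{*uu*} is what corrects the attenuation bias that the naive least-squares fit of *Y* on *W* would suffer.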

When *δ* = 0 or *Rβ*_{*n*} = *r*

The true values of the regression parameters are chosen to be *β*_{0} = 1, *β*_{1} = *β*_{5} = *β*_{7} = 0, *β*_{2} = 1.5, *β*_{3} = 0.75, *β*_{4} = −*β*_{6} = 2, *β*_{8} = 3. So the *β*’s satisfy the constraint *Rβ* = *r* with

$$R=\left(\begin{array}{cccccccc}1& 0& 0& 0& 0& 0& 0& 0\\ 0& 1& -2& 0& 0& 0& 0& 0\\ 0& 0& 0& 1& 0& 1& 0& 0\\ 0& 0& 0& 0& 1& 0& 0& 0\\ 0& 0& 0& 0& 0& 0& 1& 0\end{array}\right),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}r=\left(\begin{array}{c}0\\ 0\\ 0\\ 0\\ 0\end{array}\right).$$

(4.2)
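The stated values of *β* do satisfy (4.2); a quick numpy check (array literals are ours):

```python
import numpy as np

beta = np.array([0.0, 1.5, 0.75, 2.0, 0.0, -2.0, 0.0, 3.0])   # (beta_1, ..., beta_8)
R = np.array([
    [1, 0,  0, 0, 0, 0, 0, 0],
    [0, 1, -2, 0, 0, 0, 0, 0],
    [0, 0,  0, 1, 0, 1, 0, 0],
    [0, 0,  0, 0, 1, 0, 0, 0],
    [0, 0,  0, 0, 0, 0, 1, 0],
], dtype=float)
r = np.zeros(5)
print(R @ beta)   # all zeros: the constraint R beta = r holds
```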

Table 2 reports the risk values of various slope estimators.

In Table 2, “hat” denotes the hat estimators, and “tilde” denotes the tilde estimators. Each cell gives the risk value with weight matrix *G*^{−1}. The risks with weight matrix *H*^{−1} have a similar pattern and are therefore omitted. In this simulation, the risks are quite stable within each class of estimators, and the risks of the hat estimators and of the tilde estimators are very close. The smallest risk is achieved by RE, followed by PTE, PRSE, and JSTE, and the largest risk is obtained by the AC estimator. These results coincide with our theory. For illustration, the histograms of the estimates of the slope parameter, with weight matrix *M* = *H*^{−1} and *n* = 200, are given in Figure 3.

When *δ* ≠ 0 or *Rβ*_{*n*} ≠ *r*

In this case,
$R{\beta}_{n}-r=\delta R\mathbf{1}/\sqrt{n}$, where **1** is an 8 × 1 vector of ones. Other quantities in the model stay unchanged. Figure 4 reports the risks of the tilde estimators with weight matrix *G*^{−1} for different sample sizes. The *δ* value ranges from 0 to 10. The risks of the hat estimators show a similar pattern regardless of the choice of weight matrix.

Risk Functions. Thin solid-line: risk of the AC estimator; dashed line: risk of RE; dotted line: risk of PTE; dash-dotted line: risk of JSTE; thick solid-line: risk of PRSE.

The risk of the AC estimator for the slope is almost constant over *δ*. When *δ* is close to 0, the RE for both slopes and intercept achieves the smallest risk, but its risk increases quickly as *δ* gets bigger. PTE for both slopes and intercept has a smaller risk when *δ* is small; once *δ* moves away from 0, the risk of PTE increases and exceeds the risk of the AC estimator. After reaching a certain point, it comes down and eventually approaches the risk of the AC estimator. For small *δ* values, the risks of JSTE and PRSE for the slope estimators are higher than that of the RE. However, when *δ* gets bigger, JSTE and PRSE begin to dominate all other estimators. The patterns of the risks of the AC estimator, RE, and PTE for the intercept are similar to those for the slopes, but this is not true for the risks of JSTE and PRSE for the intercept.

At the request of one referee, we also conduct a simulation study in which *X* and *u* follow non-normal distributions. Each component of the predictors *X* is generated from a uniform distribution on [1, 2], each component of the measurement error *u* is generated from a uniform distribution on [−0.5, 0.5], and *X*_{1}, …, *X*_{8}, *u*_{1}, …, *u*_{8} are independent. All other settings are the same as in the normal case. The simulation results are similar to those for the normal case and are not reported here.

The assessment of an individual diet is difficult, but fundamental in exploring the relationship between diet and cancer, and in monitoring dietary behavior among individuals and populations. A variety of dietary assessment instruments have been derived, of which three main types are most commonly used in nutritional research. The instrument of choice in large nutritional epidemiology studies is the Food Frequency Questionnaire (FFQ). For proper interpretation of epidemiologic studies that use FFQ’s as the basic dietary instrument, one needs to know the relationship between reported intakes from the FFQ, usual intake, energy, vitamin A, and other variables such as age and body mass index (bmi).

FFQ’s are thought to often involve a systematic bias (i.e., under- or over-reporting at the level of the individual). The other records also include measurement errors. To illustrate the proposed method, we analyze a data set from the Nurses Health Study (Rosner, Spiegelman, and Willett, 1990), which includes a calibration study of *n* = 168 women. All of them completed a single FFQ and four multiple-day food diaries. There are 6 variables in this dataset: age (*X*_{1}), bmi (*X*_{2}), energy (*X*_{3}), vitamin A (*X*_{4}), usual intake (*X*_{5}), and the calories from fat, FFQ (*Y*). Among these 6 variables, energy, usual intake, and vitamin A are measured with error, but for each subject these 3 variables are measured four times. A simple variance analysis suggests that the variance of energy is 3.63, the variance of vitamin A is 381.92, and the variance of usual intake is 10.34. For an initial analysis, the following multiple regression model is used to fit this dataset.

$$Y={\beta}_{0}+{\beta}_{1}{X}_{1}+{\beta}_{2}{X}_{2}+{\beta}_{3}{X}_{3}+{\beta}_{4}{X}_{4}+{\beta}_{5}{X}_{5}+\epsilon .$$

The averages of these four replications in *X*_{3}, *X*_{4}, *X*_{5} are used in the design matrix. The covariance matrix of the measurement errors is estimated by using the formula in Liang, Härdle, and Carroll (1999).

With all variables in the model and without constraint, Table 3 lists the estimated values based on the AC estimation for the slope, the estimated standard errors based on the procedure developed by Liang, et al.(1999), and the associated *p*-values calculated from *t*-distribution with degrees of freedom *n* − 6 = 168 − 6 = 162.

Table 3 shows that *X*_{3} and *X*_{4} are not significant at the 0.1 significance level, while *X*_{1}, *X*_{2}, and *X*_{5} are significant. Recall that *X*_{5} represents usual intake, which is strongly related to intakes from the FFQ. On the other hand, vitamin A should not be a good predictor of food composition. Thus, using advanced statistical methods to obtain a reasonable estimate while weighting towards *β*_{3} = *β*_{4} = 0 makes good sense for nutritional research.

Now we impose the constraint *β*_{3} = *β*_{4} = 0 on our model. In this case *q* = 2. Table 4 reports the estimates of the regression parameters obtained from the estimators under study. To measure the variation of these estimates, bootstrap standard errors are also reported. For each estimation procedure, 1000 bootstrap samples are drawn, yielding 1000 bootstrap estimates of the slope parameters. To obtain a robust estimate of the standard errors, only the middle 80% (800 estimates) are used in the calculation. The resulting bootstrap standard errors are shown in brackets.
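The trimmed bootstrap standard error described above is straightforward to implement. A minimal sketch (function and variable names are ours; ordinary least squares on synthetic data stands in for the estimators under study):

```python
import numpy as np

def trimmed_bootstrap_se(X, y, estimate, B=1000, keep=0.8, seed=0):
    """Bootstrap SEs computed from the middle `keep` fraction of B estimates."""
    rng = np.random.default_rng(seed)
    n = len(y)
    boot = []
    for _ in range(B):
        idx = rng.integers(0, n, n)        # resample rows with replacement
        boot.append(estimate(X[idx], y[idx]))
    boot = np.sort(np.array(boot), axis=0) # sort each coordinate separately
    cut = int(B * (1 - keep) / 2)          # e.g. drop 10% in each tail
    return boot[cut:B - cut].std(axis=0, ddof=1)

# illustration with a least-squares estimator on synthetic data
def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.standard_normal((100, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(100)
se = trimmed_bootstrap_se(X, y, ols, B=500)
```

Trimming the tails before taking the standard deviation guards the reported standard errors against occasional unstable bootstrap replicates.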

Note that the bootstrap standard errors of AC in Table 4 are close to the standard errors of AC in Table 3. Also, the standard errors of RE, PTE are all smaller than their counterparts of AC. These results clearly show that the proposed estimators improve upon the usual AC estimator.

We have introduced two classes of estimators, the hat and tilde estimators, and made a comprehensive comparison of their risk functions in some special cases. The comparison in general cases is more complicated, and it is difficult to say which estimators should be used in practice unless further information, such as the values of *R*, *r*, *G* and *H*, is available. Usually, the practitioner may have some prior information about *R* and *r*. Also, once the sample is obtained, one can estimate *G* and *H*. Based on our comparison (see Figures 1 and 2), the quadratic bias function and the risk functions of the hat estimators and the tilde estimators for the slope parameters do not differ substantially, even for different weight matrices. Our analysis indicates that if the prior knowledge about the regression parameters is true, RE, PTE, JSTE and PRSE all have smaller risks than the AC estimator. When the regression parameters deviate from the prior information, RE eventually becomes useless, while PTE behaves quite well, except for some medium departures from the null hypothesis. If the slope parameters are of interest, JSTE and PRSE are highly recommended in that they possess smaller risk than the AC estimator, RE and PTE. In particular, PRSE dominates JSTE in some special cases; see the discussion following Corollary 3.1.

The procedure developed in this paper can easily be extended to the case where Σ_{*uu*} is unknown but we have a consistent estimator of Σ

It is also possible to extend the procedure to the partially linear models *Y* = *X*′ *β* + *ν*(*z*) + *ε* with error-prone linear covariate *X*. The major work can be regarded as a combination of this paper and the work of Liang et al. (1999), because the latter already derived a root-*n* consistent estimator of the parameter *β*.

In principle, the method proposed in this paper can also be extended to linear or partially linear models for longitudinal data. The derivation should be straightforward except for more complex notation.

The authors thank two referees for their helpful comments which improved the presentation of this manuscript. They also thank Dr. Raymond J. Carroll for very helpful comments. This research was partially supported by NIH/NIAID grants AI62247 and AI059773.

A direct derivation yields ${\widehat{\beta}}_{\text{AC}}-{\beta}_{n}$ = (

$$\sqrt{n}({\widehat{\beta}}_{\text{AC}}-{\beta}_{n})={({S}_{WW}-{\mathrm{\sum}}_{uu})}^{-1}\left[\frac{1}{\sqrt{n}}\sum _{i=1}^{n}{\xi}_{in}-\sqrt{n}({\mu}_{x}-\overline{W}){\overline{u}}^{\prime}{\beta}_{n}+{O}_{p}(1/\sqrt{n})\right].$$

Note that the last two terms approach zero and *S _{WW}* − Σ

For all other hat estimators, we have the general form ${\widehat{\beta}}^{\ast}={\widehat{\beta}}_{\text{AC}}-({\widehat{\beta}}_{\text{AC}}-{\widehat{\beta}}_{\text{RE}})\phi ({L}_{n})$. Recall

$$\sqrt{n}({\widehat{\beta}}^{\ast}-{\beta}_{n})=\sqrt{n}({\widehat{\beta}}_{\text{AC}}-{\beta}_{n})-\phi ({L}_{n}){\widehat{H}}_{n}{R}^{\prime}{(R{\widehat{H}}_{n}{R}^{\prime})}^{-1}[R\sqrt{n}({\widehat{\beta}}_{\text{AC}}-{\beta}_{n})+\mathit{\delta}].$$

So the result for hat estimators follows from the fact that *Ĥ _{n}R*′(

To prove Corollary 3.1 and compute the risk function of other estimators, we need the following lemmas.

(Saleh, 2006) If the p × 1 vector Y is distributed normally with mean vector μ and covariance matrix I_{p×p}, then for any measurable function *φ*,

$$E[\phi ({Y}^{\prime}Y)Y{Y}^{\prime}]=E[\phi ({\chi}_{(p+2,{\mu}^{\prime}\mu )}^{2})]{I}_{p\times p}+E[\phi ({\chi}_{p+4,{\mu}^{\prime}\mu}^{2})]\mu {\mu}^{\prime}.$$

(A.1)

$$E[\phi ({Y}^{\prime}Y)Y]=\mu E[\phi ({\chi}_{(p+2,{\mu}^{\prime}\mu )}^{2})]$$

(A.2)

provided that the expectations exist.
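Identity (A.2) can be spot-checked by Monte Carlo. Taking *φ*(*x*) = *x*, it reduces to *E*[(*Y*′*Y*)*Y*] = *μ*(*p* + 2 + *μ*′*μ*), because a noncentral ${\chi}_{p+2,\lambda}^{2}$ variable has mean *p* + 2 + *λ*. A seeded sketch (values are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 4
mu = np.array([1.0, 0.5, -0.5, 2.0])
lam = mu @ mu                                   # noncentrality mu'mu

N = 400_000
Y = mu + rng.standard_normal((N, p))            # Y ~ N(mu, I_p)
lhs = ((Y ** 2).sum(axis=1)[:, None] * Y).mean(axis=0)   # Monte Carlo E[(Y'Y) Y]
rhs = mu * (p + 2 + lam)                        # mu * E[chi^2_{p+2, lam}]
print(np.abs(lhs - rhs).max())                  # small Monte Carlo error
```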

If the p × 1 vector Y is distributed normally with mean vector μ and covariance matrix I_{p×p}, and A is an idempotent matrix with rank q ≤ p, then for any measurable function *φ*, we have

$$\begin{array}{l}E[\phi ({Y}^{\prime}AY)Y{Y}^{\prime}]=(I-A)(I+\mu {\mu}^{\prime})(I-A){h}_{0}\\ +(A+A\mu {\mu}^{\prime}+\mu {\mu}^{\prime}A-2A\mu {\mu}^{\prime}A){h}_{2}+A\mu {\mu}^{\prime}A{h}_{4}\end{array}$$

(A.3)

and

$$E\phi ({Y}^{\prime}AY)Y=A\mu {h}_{2}+(I-A)\mu {h}_{0}$$

(A.4)

provided the expectations exist, where
${h}_{i}=E\phi ({\chi}_{(q+i,\lambda )}^{2})$, λ = *μ*′ *A μ*.

Let *P* be the *p* × *p* orthogonal matrix such that *PAP*′ = Blockdiag(*I _{q}*

$$E[\phi ({Y}^{\prime}AY)Y{Y}^{\prime}]=E[\phi ({Z}_{1}^{\prime}{Z}_{1}){P}^{\prime}Z{Z}^{\prime}P]={P}^{\prime}E\left[\phi ({Z}_{1}^{\prime}{Z}_{1})\left(\begin{array}{cc}{Z}_{1}{Z}_{1}^{\prime}& {Z}_{1}{Z}_{2}^{\prime}\\ {Z}_{2}{Z}_{1}^{\prime}& {Z}_{2}{Z}_{2}^{\prime}\end{array}\right)\right]P.$$

By (A.1),
$E[\phi ({Z}_{1}^{\prime}{Z}_{1}){Z}_{1}{Z}_{1}^{\prime}]=E[\phi ({\chi}_{(q+2,{\mu}^{\prime}{P}_{1}^{\prime}{P}_{1}\mu )}^{2})]{I}_{q\times q}+E[\phi ({\chi}_{q+4,{\mu}^{\prime}{P}_{1}^{\prime}{P}_{1}\mu}^{2})]{P}_{1}\mu {\mu}^{\prime}{P}_{1}^{\prime}$. By the independence of *Z*_{1} and *Z*_{2} and (A.2),

$$\begin{array}{c}E[\phi ({Z}_{1}^{\prime}{Z}_{1}){Z}_{1}{Z}_{2}^{\prime}]=E[\phi ({Z}_{1}^{\prime}{Z}_{1}){Z}_{1}]E{Z}_{2}^{\prime}={P}_{1}\mu {\mu}^{\prime}{P}_{2}^{\prime}E[\phi ({\chi}_{(q+2,{\mu}^{\prime}{P}_{1}^{\prime}{P}_{1}\mu )}^{2})],\\ E[\phi ({Z}_{1}^{\prime}{Z}_{1}){Z}_{2}{Z}_{2}^{\prime}]=E[\phi ({Z}_{1}^{\prime}{Z}_{1})]E({Z}_{2}{Z}_{2}^{\prime})=(I+{P}_{2}\mu {\mu}^{\prime}{P}_{2}^{\prime})E[\phi ({\chi}_{(q,{\mu}^{\prime}{P}_{1}^{\prime}{P}_{1}\mu )}^{2})].\end{array}$$

Hence *E*[*φ*(*Y*′*AY*)*YY*′] equals

$$\begin{array}{l}({P}_{2}^{\prime}{P}_{2}+{P}_{2}^{\prime}{P}_{2}\mu {\mu}^{\prime}{P}_{2}^{\prime}{P}_{2})E[\phi ({\chi}_{(q,{\mu}^{\prime}{P}_{1}^{\prime}{P}_{1}\mu )}^{2})]+({P}_{1}^{\prime}{P}_{1}+{P}_{1}^{\prime}{P}_{1}\mu {\mu}^{\prime}{P}_{2}^{\prime}{P}_{2}\\ +{P}_{2}^{\prime}{P}_{2}\mu {\mu}^{\prime}{P}_{1}^{\prime}{P}_{1})E[\phi ({\chi}_{(q+2,{\mu}^{\prime}{P}_{1}^{\prime}{P}_{1}\mu )}^{2})]+{P}_{1}^{\prime}{P}_{1}\mu {\mu}^{\prime}{P}_{1}^{\prime}{P}_{1}E[\phi ({\chi}_{(q+4,{\mu}^{\prime}{P}_{1}^{\prime}{P}_{1}\mu )}^{2})].\end{array}$$

(A.5)

Thus (A.3) is obtained by using the facts that ${P}_{1}^{\prime}{P}_{1}=A,{P}_{2}^{\prime}{P}_{2}=I-A$.

To prove (A.4), one can show that $E\phi ({Y}^{\prime}AY)Y={P}_{1}^{\prime}{P}_{1}\mu {h}_{2}+{P}_{2}^{\prime}{P}_{2}\mu {h}_{0}$. Again ${P}_{1}^{\prime}{P}_{1}=A$, and ${P}_{2}^{\prime}{P}_{2}=I-A$ imply the desired result.

It is sufficient to show that
${J}_{H}^{\prime}{J}_{H}\le {J}_{G}^{\prime}{J}_{G}$. From the normality of *u* and Lemma A.2, one can show that

$$G=({\sigma}^{2}+{\beta}^{\prime}\phantom{\rule{0.16667em}{0ex}}{\mathrm{\sum}}_{uu}\beta ){\mathrm{\sum}}_{xx}^{-1}{\mathrm{\sum}}_{WW}{\mathrm{\sum}}_{xx}^{-1}+{\mathrm{\sum}}_{xx}^{-1}{\mathrm{\sum}}_{uu}\beta {\beta}^{\prime}\phantom{\rule{0.16667em}{0ex}}{\mathrm{\sum}}_{uu}{\mathrm{\sum}}_{xx}^{-1}.$$

Let ${c}_{1}=({\sigma}^{2}+{\sigma}_{uu}{\beta}^{\prime}\beta )({\sigma}_{xx}+{\sigma}_{uu})/{\sigma}_{xx}^{2}$ and ${c}_{2}={\sigma}_{uu}^{2}/{\sigma}_{xx}^{2}$. The diagonal form of the covariance matrices of *x* and *u* implies *G* = *c*_{1}*I* + *c*_{2}*ββ*′. Then

$$\mathit{RGG}\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}=R{[{c}_{1}I+{c}_{2}\beta {\beta}^{\prime}]}^{2}{R}^{\prime}={c}_{1}^{2}R{R}^{\prime}+2{c}_{1}{c}_{2}R\beta {\beta}^{\prime}{R}^{\prime}+{c}_{2}^{2}({\beta}^{\prime}\beta )R\beta {\beta}^{\prime}{R}^{\prime}.$$

(A.6)

Since $H=({\sigma}_{xx}+{\sigma}_{uu})I/{\sigma}_{xx}^{2}$, one can obtain that (*RH R*′)^{−1}*RH HR*′(*RH R*′)^{−1} = (*RR*′)^{−1}.

It follows from a direct calculation that

$$\begin{array}{l}(RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}){(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RH\phantom{\rule{0.16667em}{0ex}}H\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}(RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})\\ =R({c}_{1}I+{c}_{2}\beta {\beta}^{\prime}){R}^{\prime}{(R{R}^{\prime})}^{-1}R({c}_{1}I+{c}_{2}\beta {\beta}^{\prime}){R}^{\prime}\\ ={c}_{1}^{2}R{R}^{\prime}+2{c}_{1}{c}_{2}R\beta {\beta}^{\prime}{R}^{\prime}+{c}_{2}^{2}({\beta}^{\prime}{R}^{\prime}{(R{R}^{\prime})}^{-1}R\beta )R\beta {\beta}^{\prime}{R}^{\prime}.\end{array}$$

(A.7)

Since *R*′(*RR*′)^{−1}*R* is an idempotent matrix with rank *q* ≤ *p*, *I* − *R*′(*RR*′)^{−1}*R* is nonnegative definite, and hence *β*′*R*′(*RR*′)^{−1}*Rβ* ≤ *β*′*β*. Then (A.6) and (A.7) imply

$$(RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}){(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RH\phantom{\rule{0.16667em}{0ex}}H\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}(RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})\le \mathit{RGG}\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}$$

or

$${(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}RH\phantom{\rule{0.16667em}{0ex}}H\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}{(RH\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}\le {(RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}\mathit{RGG}\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}{(RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})}^{-1}.$$

Note that the left hand side is ${J}_{H}^{\prime}{J}_{H}$, and the right hand side is ${J}_{G}^{\prime}{J}_{G}$. We complete the proof.
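The Loewner ordering ${J}_{H}^{\prime}{J}_{H}\le {J}_{G}^{\prime}{J}_{G}$ just proved can also be verified numerically. A sketch under the proof's structure *G* = *c*_{1}*I* + *c*_{2}*ββ*′ and *H* proportional to the identity, with arbitrary positive constants in place of the specific *c*_{1}, *c*_{2} (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 5, 3
beta = rng.standard_normal(p)
R = rng.standard_normal((q, p))
c1, c2, c = 2.0, 0.5, 1.5                      # arbitrary positive constants
G = c1 * np.eye(p) + c2 * np.outer(beta, beta) # G = c1 I + c2 beta beta'
H = c * np.eye(p)                              # H proportional to the identity

def J(C):
    # J_C = C R' (R C R')^{-1}, so J_C' J_C = (RCR')^{-1} R C C R' (RCR')^{-1}
    return C @ R.T @ np.linalg.inv(R @ C @ R.T)

D = J(G).T @ J(G) - J(H).T @ J(H)
print(np.linalg.eigvalsh(D).min())             # nonnegative up to rounding
```

The difference *D* is a rank-one nonnegative definite matrix, matching the term $c_{2}^{2}(\beta'\beta - \beta'R'(RR')^{-1}R\beta)$ that separates (A.6) from (A.7).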

From Theorem 2.1, one can see that the asymptotic distributions of all estimators are the same as that of *Z* − *φ*(*L*)*CR*′(*RCR*′)^{−1}(*RZ* + *δ*), where

Let *Y* = *G*^{−1/2}*Z* + *μ*, *μ* = *G*^{1/2}*R*′(*RGR*′)^{−1}*δ*,

$$E\phi (L)Z={G}^{1/2}\mu [E\phi ({\chi}_{q+2,\lambda}^{2})-E\phi ({\chi}_{q,\lambda}^{2})],$$

(A.8)

and

$$E\phi (L)Z{Z}^{\prime}={G}^{1/2}[(I-A+\mu {\mu}^{\prime})E\phi ({\chi}_{q,\lambda}^{2})+(A-2\mu {\mu}^{\prime})E\phi ({\chi}_{q+2,\lambda}^{2})+\mu {\mu}^{\prime}E\phi ({\chi}_{q+4,\lambda}^{2})]{G}^{1/2}.$$

(A.9)

In fact, *Eφ*(*L*)*Z* can be written as *G*^{1/2}*Eφ*(*L*)*Y* − *G*^{1/2}*μEφ*(*L*). By (A.4) and
$E\phi (L)=E\phi ({\chi}_{q,\lambda}^{2})$, we obtain that

$$E\phi (L)Z={G}^{1/2}A\mu E\phi ({\chi}_{q+2,\lambda}^{2})+{G}^{1/2}(I-A)\mu E\phi ({\chi}_{q,\lambda}^{2})-{G}^{1/2}\mu E\phi ({\chi}_{q,\lambda}^{2}).$$

Then (A.8) is obtained by noticing that *Aμ* = *μ*.

Note that *Eφ*(*L*)*ZZ*′ = *G*^{1/2}*E*[*φ*(*L*) (*Y* − *μ*) (*Y* − *μ*)′]*G*^{1/2}, and *E*[*φ*(*L*) (*Y* − *μ*) (*Y* − *μ*)′] = *E*[*φ*(*L*)*YY*′] − *E*[*φ*(*L*)*Y*]*μ*′ − *μE*[*φ*(*L*)*Y*′] + *μμ*′*E*[*φ*(*L*)]. Using (A.3) and (A.4), after some algebra, we prove (A.9).

Now we are ready to compute the risk functions. For any positive definite matrix *M*, we have
$\rho (\beta ,{\beta}_{n}^{\ast})=E[Z-\phi (L){J}_{C}(RZ+\mathit{\delta}){]}^{\prime}M[Z-\phi (L){J}_{C}(RZ+\mathit{\delta})]$, which equals

$$E({Z}^{\prime}\phantom{\rule{0.16667em}{0ex}}M\phantom{\rule{0.16667em}{0ex}}Z)-2E[\phi (L){Z}^{\prime}\phantom{\rule{0.16667em}{0ex}}M\phantom{\rule{0.16667em}{0ex}}{J}_{C}(RZ+\mathit{\delta})]+E[{\phi}^{2}(L){J}_{C}(RZ+\mathit{\delta}){]}^{\prime}M[{J}_{C}(RZ+\mathit{\delta})].$$

The first term equals tr(*MG*). To compute the second and third terms, note that

$$E[\phi (L)(RZ+\mathit{\delta}){Z}^{\prime}]=RG\phantom{\rule{0.16667em}{0ex}}E[\phi ({\chi}_{q+2,\lambda}^{2})]-\mathit{\delta}{\mu}^{\prime}{G}^{1/2}E[\phi ({\chi}_{q+2,\lambda}^{2})]+\mathit{\delta}{\mu}^{\prime}{G}^{1/2}E[\phi ({\chi}_{q+4,\lambda}^{2})]$$

(A.10)

and

$$E[{\phi}^{2}(L)(RZ+\mathit{\delta})(RZ+\mathit{\delta}{)}^{\prime}]=RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime}\phantom{\rule{0.16667em}{0ex}}E[{\phi}^{2}({\chi}_{q+2,\lambda}^{2})]+\mathit{\delta}{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}E[{\phi}^{2}({\chi}_{q+4,\lambda}^{2})]$$

(A.11)

$$\begin{array}{l}\rho (\beta ,{\beta}_{n}^{\ast})=\text{tr}(M\phantom{\rule{0.16667em}{0ex}}G)-2\text{tr}\{M\phantom{\rule{0.16667em}{0ex}}{J}_{C}\phantom{\rule{0.16667em}{0ex}}E[\phi (L)(RZ+\mathit{\delta}){Z}^{\prime}]\}\\ +\text{tr}\{{J}_{C}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{C}\phantom{\rule{0.16667em}{0ex}}E[{\phi}^{2}(L)(RZ+\mathit{\delta})(RZ+\mathit{\delta}{)}^{\prime}]\}\\ =\text{tr}M\phantom{\rule{0.16667em}{0ex}}G-2\text{tr}(RG\phantom{\rule{0.16667em}{0ex}}M\phantom{\rule{0.16667em}{0ex}}{J}_{C})E[\phi ({\chi}_{q+2,\lambda}^{2})]+2{\mu}^{\prime}{G}^{1/2}M\phantom{\rule{0.16667em}{0ex}}{J}_{C}\mathit{\delta}\phantom{\rule{0.16667em}{0ex}}E[\phi ({\chi}_{q+2,\lambda}^{2})]\\ -2{\mu}^{\prime}{G}^{1/2}M\phantom{\rule{0.16667em}{0ex}}{J}_{C}\mathit{\delta}\phantom{\rule{0.16667em}{0ex}}E[\phi ({\chi}_{q+4,\lambda}^{2})]+\text{tr}({J}_{C}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{C}\phantom{\rule{0.16667em}{0ex}}RG\phantom{\rule{0.16667em}{0ex}}{R}^{\prime})E[{\phi}^{2}({\chi}_{q+2,\lambda}^{2})]\\ +{\mathit{\delta}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{C}^{\prime}M\phantom{\rule{0.16667em}{0ex}}{J}_{C}\mathit{\delta}\phantom{\rule{0.16667em}{0ex}}E[{\phi}^{2}({\chi}_{q+4,\lambda}^{2})].\end{array}$$

This completes the proof.

Recall that the AC estimator for *β* is given by ${\widehat{\beta}}_{\text{AC}}$ = (*S*_{*WW*} − Σ

$$\sqrt{n}\left(\begin{array}{c}{\widehat{\beta}}_{0n}-{\beta}_{0}\\ {\widehat{\beta}}_{\text{AC}}-{\beta}_{n}\end{array}\right)=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\left[\begin{array}{c}{\epsilon}_{i}-{u}_{i}^{\prime}{\beta}_{n}-{\mu}_{xx}^{\prime}{\mathrm{\sum}}_{xx}^{-1}({W}_{i}-{\mu}_{x})(\epsilon -{u}_{i}^{\prime}{\beta}_{n})-{\mu}_{x}^{\prime}{\mathrm{\sum}}_{xx}^{-1}{\mathrm{\sum}}_{uu}{\beta}_{n}\\ {\mathrm{\sum}}_{xx}^{-1}({W}_{i}-{\mu}_{x})(\epsilon -{u}_{i}^{\prime}{\beta}_{n})+{\mathrm{\sum}}_{xx}^{-1}{\mathrm{\sum}}_{uu}{\beta}_{n}\end{array}\right]+{o}_{p}(1).$$

Let
${Z}_{n}=\sqrt{n}({\widehat{\beta}}_{\text{AC}}-{\beta}_{n}),{Z}_{0n}=\sqrt{n}({\widehat{\beta}}_{0n}-{\beta}_{0})$. The multivariate central limit theorem and the fact that lim_{*n*→∞} *β*_{*n*} = *β* imply that the asymptotic covariance matrix of $({Z}_{0n},{Z}_{n}^{\prime}{)}^{\prime}$ is

$$\left(\begin{array}{cc}{\sigma}_{0}^{2}& E{\beta}^{\prime}u{u}^{\prime}\beta {u}^{\prime}{\mathrm{\sum}}_{xx}^{-1}-{\mu}_{x}^{\prime}G\\ {\mathrm{\sum}}_{xx}^{-1}Eu{\beta}^{\prime}u{u}^{\prime}\beta -G{\mu}_{x}& G\end{array}\right),$$

(A.12)

where *G* is given in (2.5) and *β* here is subject to *Rβ* = *r*.

Note that all the proposed estimators for *β* have a common form
${\beta}_{n}^{\ast}={\widehat{\beta}}_{\text{AC}}-\phi ({L}_{n}){J}_{{C}_{n}}(R{\widehat{\beta}}_{\text{AC}}-r)$ with *C _{n}* =

$$\begin{array}{l}nE{[{x}_{0}^{\prime}({\beta}_{n}^{\ast}-{\beta}_{n})]}^{2}\to E[{x}_{0}^{\prime}{(Z-\phi (L){J}_{C}(RZ+\mathit{\delta})]}^{2}\\ ={x}_{0}^{\prime}G{x}_{0}-2{x}_{0}^{\prime}{J}_{C}E[\phi (L)(RZ+\mathit{\delta}){Z}^{\prime}]{x}_{0}\\ +{x}_{0}^{\prime}{J}_{C}E{\phi}^{2}(L)(RZ+\mathit{\delta})(RZ+\mathit{\delta}{)}^{\prime}{J}_{C}^{\prime}{x}_{0}.\end{array}$$

(A.13)

The estimators for the intercept *β _{0}*,
${\beta}_{0n}^{\ast}=\overline{Y}-{\overline{W}}^{\prime}{\beta}_{n}^{\ast}$, can be expressed

$$\sqrt{n}({\beta}_{0n}^{\ast}-{\beta}_{0})=\sqrt{n}({\widehat{\beta}}_{0n}-{\beta}_{0})+\sqrt{n}\phi ({L}_{n}){\overline{W}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{{C}_{n}}(R{\widehat{\beta}}_{\text{AC}}-r),$$

which can be written as

$$\sqrt{n}({\beta}_{0n}^{\ast}-{\beta}_{0})={Z}_{0n}+\phi ({L}_{n}){\overline{W}}^{\prime}\phantom{\rule{0.16667em}{0ex}}{J}_{{C}_{n}}(R{Z}_{n}+\mathit{\delta}),$$

by recalling the notation *Z*_{0*n*} and *Z*_{*n*}. Then

$$\begin{array}{l}nE{({\beta}_{0n}^{\ast}-{\beta}_{0})}^{2}\to E{[{Z}_{0}+\phi (L){\mu}_{x}^{\prime}{J}_{C}(RZ+\mathit{\delta})]}^{2}\\ ={\sigma}_{0}^{2}+{\mu}_{x}{J}_{C}E[{\phi}^{2}(L)(RZ+\mathit{\delta})(RZ+\mathit{\delta}{)}^{\prime}]{J}_{C}^{\prime}{\mu}_{x}\\ +2{\mu}_{x}^{\prime}{J}_{C}E[\phi (L){Z}_{0}(RZ+\mathit{\delta})].\end{array}$$

To deal with the last term, denote
$\tau ={G}^{-1}{\mathrm{\sum}}_{xx}^{-1}Eu{\beta}^{\prime}u{u}^{\prime}\beta -{\mu}_{x}$, then *E*(*Z*_{0}|*Z*) = *τ*′*Z*, *Z*_{0} − *E*(*Z*_{0}|*Z*) and *Z* are independent, and *L* depends on *Z* only. We have

$$\begin{array}{l}E[\phi (L){Z}_{0}(RZ+\mathit{\delta})]=E\{\phi (L)[{Z}_{0}-E({Z}_{0}\mid Z)+E({Z}_{0}\mid Z)](RZ+\mathit{\delta})\}\\ =E[\phi (L)(RZ+\mathit{\delta}){Z}^{\prime}]\tau .\end{array}$$

(A.14)

Therefore, $nE{({\beta}_{0n}^{\ast}-{\beta}_{0})}^{2}$ tends to

$${\sigma}_{0}^{2}+{\mu}_{x}{J}_{C}E[{\phi}^{2}(L)(RZ+\mathit{\delta})(RZ+\mathit{\delta}{)}^{\prime}]{J}_{C}^{\prime}{\mu}_{x}+2{\mu}_{x}^{\prime}{J}_{C}E[\phi (L)(RZ+\mathit{\delta}){Z}^{\prime}]\tau .$$

(A.15)

Finally, we have

$$\begin{array}{l}nE[({\beta}_{0n}^{\ast}-{\beta}_{0}){x}_{0}^{\prime}({\beta}_{n}^{\ast}-{\beta}_{n})]\to E\{[{Z}_{0}+\phi (L){\mu}_{x}^{\prime}{J}_{C}(RZ+\mathit{\delta})][Z-\phi (L){J}_{C}(RZ+\mathit{\delta})]\}{x}_{0}\\ ={x}_{0}^{\prime}E({Z}_{0}Z)-E\phi (L)(RZ+\mathit{\delta}{)}^{\prime}{Z}_{0}{J}_{C}^{\prime}{x}_{0}+{\mu}_{x}^{\prime}{J}_{C}E\phi (L)(RZ+\mathit{\delta}){Z}^{\prime}{x}_{0}\\ -{\mu}_{x}^{\prime}{J}_{C}E{\phi}^{2}(L)(RZ+\mathit{\delta})(RZ+\mathit{\delta}{)}^{\prime}{J}_{C}^{\prime}{x}_{0}.\end{array}$$

From *E*(*Z*_{0}|*Z*) = *τ*′*Z*, we have *E*(*Z*_{0}*Z*) = *E*(*ZZ*′)*τ* = *Gτ*. Then by (A.14),

$$\begin{array}{l}nE[({\beta}_{0n}^{\ast}-{\beta}_{0}){x}_{0}^{\prime}({\beta}_{n}^{\ast}-{\beta}_{n})]\to {x}_{0}^{\prime}G\tau -{\tau}^{\prime}E\phi (L)Z(RZ+\mathit{\delta}{)}^{\prime}{J}_{C}^{\prime}{x}_{0}\\ +{\mu}_{x}^{\prime}{J}_{C}E\phi (L)(RZ+\mathit{\delta}){Z}^{\prime}{x}_{0}-{\mu}_{x}^{\prime}{J}_{C}E{\phi}^{2}(L)(RZ+\mathit{\delta})(RZ+\mathit{\delta}{)}^{\prime}{J}_{C}^{\prime}{x}_{0}.\end{array}$$

(A.16)

A combination of (A.15), (A.13) and (A.16) yields

$$\begin{array}{l}\rho ({\beta}_{0}-{x}_{0}^{\prime}\beta ,{\beta}_{0n}^{\ast}+{x}_{0}^{\prime}{\beta}_{n}^{\ast})={\sigma}_{0}^{2}+{x}_{0}^{\prime}G{x}_{0}+2{x}_{0}^{\prime}G\tau \\ +({x}_{0}-{\mu}_{x}{)}^{\prime}{J}_{C}E{\phi}^{2}(L)(RZ+\mathit{\delta})(RZ+\mathit{\delta}{)}^{\prime}{J}_{C}^{\prime}({x}_{0}-{\mu}_{x})\\ -2({x}_{0}-{\mu}_{x}{)}^{\prime}{J}_{C}E\phi (L)(RZ+\mathit{\delta}){Z}^{\prime}({x}_{0}+\tau ).\end{array}$$

The theorem follows from this expression together with (A.10) and (A.11).

AMS 2000 subject classification: 62J05; 62F30; secondary 62J99.

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Hua Liang, Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA.

Weixing Song, Department of Statistics, Kansas State University, Manhattan, Kansas, 66506, USA.

1. Bancroft TA. On biases in estimation due to the use of preliminary tests of significance. Ann Math Statist. 1944;15:190–204.

2. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models. 2. New York: Chapman and Hall; 2006.

3. Fuller WA. Measurement Error Models. New York: Wiley & Sons; 1987.

4. Gleser LJ. The importance of assessing measurement reliability in multivariate regression. J Am Statist Assoc. 1992;87:696–707.

5. Goel PK, DeGroot MH. Only normal distributions have linear posterior expectations in linear regression. J Am Statist Assoc. 1980;75:895–900.

6. James W, Stein C. Estimation with quadratic loss. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. 1961;1:361–379.

7. Judge GG, Bock ME. The Statistical Implications of Pre-test and Stein-Rule Estimators in Econometrics. New York: North-Holland; 1978.

8. Kim HM, Saleh AKMdE. Preliminary test estimators of the parameters of simple linear model with measurement error. Metrika. 2003;57:223–251.

9. Kim HM, Saleh AKMdE. Improved estimation of regression parameters in measurement error models. J Multivar Anal. 2005;95:273–300.

10. Liang H, Härdle W, Carroll RJ. Estimation in a semiparametric partially linear errors-in-variables model. Ann Statist. 1999;27:1519–1535.

11. Rao CR. Characterization of prior distributions and solutions to a compound decision problem. Ann Statist. 1976;4:823–835.

12. Rosner B, Spiegelman D, Willett WC. Correction of logistic regression relative risk estimates and confidence intervals for measurement error: the case of multiple covariates measured with error. Am J Epidemiol. 1990;132:734–735.

13. Saleh AKMdE. Theory of Preliminary Test and Stein-Type Estimation with Application. New York: Wiley & Sons; 2006.

14. Saleh AKMdE, Sen PK. On shrinkage least-squares estimation in a parallelism problem. Ann Statist. 1978;6:154–168.

15. Saleh AKMdE, Sen PK. Non-parametric estimation of location parameter after a preliminary test regression. Commun Statist. 1986;15:1451–1466.

16. Schneeweiss H. Consistent estimation of a regression with errors in the variables. Metrika. 1976;23:101–115.

17. Sen PK, Saleh AKMdE. On preliminary test and shrinkage M-estimation in linear models. Ann Statist. 1987;15:1580–1592.

18. Shalabh. Improved estimation in measurement error models through Stein rule procedure. J Multivar Anal. 1998;67:35–48.

19. Stanley TD. Stein rule least squares estimation, a heuristic for fallible data. Econ Lett. 1986;20:147–150.

20. Stanley TD. Improved estimators in some linear errors-in-variables models in finite samples. J Forecasting. 1988;7:103–113.

21. Stein C. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. 1956;1:197–206.
