In this paper, we define two restricted estimators for the regression parameters in a multiple linear regression model with measurement errors when prior information for the parameters is available. We then construct two sets of improved estimators, which include the preliminary test estimator, the Stein-type estimator and the positive rule Stein-type estimator for both slope and intercept, and examine their statistical properties, such as the asymptotic distributional quadratic biases and the asymptotic distributional quadratic risks. We remove the distributional assumption on the error term, which was generally imposed in the literature, and provide a more general comparison of the quadratic risks of these estimators. Simulation studies illustrate the finite-sample performance of the proposed estimators, which are then used to analyze a dataset from the Nurses Health Study.
Improving estimation in regression models is a fundamental and interesting topic. In certain cases, one may have uncertain prior information about the parameters of interest. By incorporating this information into the estimation procedure, one may obtain more efficient estimators than those obtained when the prior information is ignored. Statistical approaches for developing more efficient estimators can be roughly classified into two categories. The first focuses on developing a proper test procedure to check the validity of the uncertain prior. If the prior information is confirmed, then the commonly used estimators are modified to accommodate the prior. The second develops a procedure in which testing and estimation are conducted simultaneously. The first procedure is very natural and commonly used in both theory and applications. For example, consider the multiple linear regression model
$$Y = X\beta + \varepsilon, \qquad (1.1)$$
where X is the n × p design matrix with rank p, β is the p × 1 regression parameter, and ε is the n × 1 random error vector. Suppose that we have prior information for β, which can be described as Rβ = r, where R is a q × p known matrix of rank q and r is a q × 1 known vector, q ≤ p. A proper test statistic would be based on a distance between Rβ̂n and r, where β̂n is a “good” estimator of β, for instance, the least squares estimator β̂LS = (X′X)⁻¹X′Y, or the maximum likelihood estimator. If the prior information is rejected, one should keep using these “good” estimators; otherwise, restricted least squares estimators should be used.
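The two-step idea for the classical model can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it computes the least squares estimator and the textbook restricted least squares estimator under Rβ = r. The function name `restricted_ls`, the simulated data and the particular R and r are assumptions made only for this example.

```python
import numpy as np

def restricted_ls(X, y, R, r):
    """Return (unrestricted LS, restricted LS) for y = X beta + eps under R beta = r."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b_ls = XtX_inv @ X.T @ y
    # Lagrangian solution: move b_ls onto the set {b : R b = r}.
    adjust = XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, R @ b_ls - r)
    return b_ls, b_ls - adjust

# Hypothetical data: the last two slopes are truly zero, so the restriction holds.
rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))
beta = np.array([1.0, 2.0, 0.0, 0.0])
y = X @ beta + rng.normal(size=n)
R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
r = np.zeros(2)

b_ls, b_re = restricted_ls(X, y, R, r)
print("LS:", b_ls)
print("restricted LS:", b_re)
```

By construction the restricted estimate satisfies the constraint exactly, whatever the data.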
The estimators for the regression parameters, which fall into the second category, include the preliminary test estimator (PTE), the James-Stein type estimator (JSTE), and the positive rule Stein type estimator (PRSE). See Judge and Bock (1978) and Saleh (2006) for a detailed discussion on these estimators. Bancroft (1944) was among the first to consider PTE. Saleh and Sen (1978) extended his idea to a nonparametric setup. JSTE was introduced by Stein (1956) and James and Stein (1961), and expanded by Saleh and Sen (1978, 1986) and Sen and Saleh (1987) to nonparametric areas.
The aforementioned estimation techniques have received much attention recently in linear regression model when the covariates are measured with errors. Stanley (1986, 1988) revealed that JSTE can eliminate inconsistency of the classical least square estimators. Shalabh (1998) studied properties of JSTE when the covariance matrix of the measurement errors is known. For the simple linear regression model with measurement error, when the slope parameter may be the null scalar and all the random components are normally distributed, Kim and Saleh (2003) compared these estimators in the sense of asymptotic distributional quadratic bias, mean square error matrix, and asymptotic distributional quadratic risk. Their comparisons show that PTE behaves better than the attenuation-correction (AC) estimator if the slope is close to 0, but not uniformly better than the AC estimator over the whole range of the regression parameter. Kim and Saleh (2005) further investigated the same question for multiple linear models under the same assumption and setting. Various risk functions based on the asymptotic distribution of the estimators under certain local alternatives are calculated and compared. They also showed that JSTE and PRSE dominate the AC estimator.
This paper mainly focuses on the estimation problem in multiple linear regression models with measurement errors. The contributions we made to the existing literature in this work contain four parts:
The outline of the paper is as follows. In Section 2, we define two REs, based on which two sets of PTE, JSTE and PRSE are constructed. The risk functions of various estimators for the slope parameters under the null hypothesis and local alternatives are presented in Section 3, where the risks of the proposed estimators are also compared in some special cases. Simulation studies are presented in Section 4. The proposed estimators are used to analyze a dataset from the Nurses Health Study in Section 5. Our results provide more appropriate estimates and information for nutritional studies. The proofs of the main results are deferred to the Appendix.
Suppose that (Yi, Xi), i = 1,…,n, constitute an independent and identically distributed sample from the linear regression model
$$Y_i = \beta_0 + \beta' X_i + \varepsilon_i, \qquad (2.1)$$
where Xi is a p × 1 vector-valued covariate. We are interested in the estimation of the unknown parameter β when the covariates Xi are measured with error. Instead of observing Xi, we observe Wi = Xi + ui, where the measurement errors ui are independent and identically distributed, independent of (Yi, Xi), with mean zero and covariance matrix Σuu which is assumed to be known throughout this paper.
Section 4.4.3 of Saleh (2006) provides a general road map for constructing improved estimators when one has some uncertain information about the unknown parameter, say θ ∈ Θ, in a statistical model. First, we obtain an optimal unrestricted estimator, say θ̃n, and an optimal RE, say θ̂n, by the likelihood method if the likelihood function is available, or by the least squares method if it is not. Second, we construct an optimal test statistic, say Ln, for testing the “uncertain prior information”, say θ ∈ Θ0, where Θ0 is a subset of the parameter space Θ. Third, we construct the PTE of θ as θ̃n − (θ̃n − θ̂n)I(Ln < Ln,α), where Ln,α is the α-level critical value of Ln from its distribution under H0: θ ∈ Θ0. Finally, we replace the indicator function I(Ln < Ln,α) by a smooth decreasing function cLn⁻¹, where c is a suitable constant derived by using empirical Bayesian theory; then JSTE is defined by θ̃n − cLn⁻¹(θ̃n − θ̂n), and PRSE by θ̂n + (1 − cLn⁻¹)I(Ln > c)(θ̃n − θ̂n).
To adopt the above general rule in our setting, we need to find an “optimal” test statistic to check Rβ = r. The test statistic we use in this paper is Ln = n(Rβ̂AC − r)′(RĜnR′)⁻¹(Rβ̂AC − r), where β̂AC is the AC estimator, defined by
$$\hat\beta_{AC} = (S_{WW} - \Sigma_{uu})^{-1} S_{WY}, \qquad (2.2)$$
where SWW and SWY are the sample covariance matrices of Wi’s, and the sample covariance between Wi’s and Yi’s, respectively, Ĝn is a consistent estimator of the asymptotic covariance matrix of AC, denoted by G which is defined by (2.5) in the next section. Under general conditions, Ln has an asymptotic χ2-distribution with q degrees of freedom. In fact, Ln is the likelihood ratio statistic under the normality assumptions. The next step towards our end is to find an RE for β under the general constraint.
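To make the attenuation correction concrete, here is a minimal sketch that computes the AC slope estimator (S_WW − Σuu)⁻¹S_WY and the corresponding intercept Ȳ − W̄′β̂AC on simulated data, next to the naive least squares slope that ignores the measurement error. The function name `ac_estimator`, the divisor n − 1 in the sample covariances, and all simulation settings are assumptions made for illustration only.

```python
import numpy as np

def ac_estimator(W, y, Sigma_uu):
    """Attenuation-corrected slope and intercept, plus the naive LS slope."""
    n = W.shape[0]
    Wc = W - W.mean(axis=0)
    yc = y - y.mean()
    S_WW = Wc.T @ Wc / (n - 1)          # sample covariance of the observed W
    S_WY = Wc.T @ yc / (n - 1)          # sample covariance between W and Y
    beta_ac = np.linalg.solve(S_WW - Sigma_uu, S_WY)
    beta_naive = np.linalg.solve(S_WW, S_WY)
    beta0_ac = y.mean() - W.mean(axis=0) @ beta_ac
    return beta_ac, beta_naive, beta0_ac

# Hypothetical simulation settings.
rng = np.random.default_rng(1)
n, p = 20_000, 3
beta0, beta = 1.0, np.array([1.0, -2.0, 0.5])
Sigma_uu = 0.5 * np.eye(p)
X = rng.normal(size=(n, p))
W = X + rng.multivariate_normal(np.zeros(p), Sigma_uu, size=n)
y = beta0 + X @ beta + rng.normal(size=n)

beta_ac, beta_naive, beta0_ac = ac_estimator(W, y, Sigma_uu)
print("true :", beta)
print("AC   :", beta_ac)
print("naive:", beta_naive)
```

In this setup the naive slope is shrunk toward zero, while the corrected slope should land close to the true β.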
Here is the intuition behind the methods of construction of RE. If (X, ε, U) is normally distributed, then we can directly calculate the conditional expectation E(Y|W), which is a linear function of W and can be used to derive the maximum likelihood estimators of β0 and β, and the associated RE. If (X, ε, U) is not normally distributed, E(Y|W) may not easily be calculated, or may be a nonlinear function of W. We give another version of RE as follows.
Assume that (Xi, εi, ui) follow N2p+1[( , 0, 0′)′, blockdiag(Σxx, σ2, Σuu)]. It is easy to see that E(Yi|Wi) = ν0 + ν′Wi, , where ν0 = β0 + β′(I − Kxx)μx, ν = Kxxβ, is the reliability matrix. Gleser (1992) and Kim and Saleh (2005) showed that the maximum likelihood estimators of ν0, ν and are just the naive least squares estimators, namely and provided is nonnegative, where , and are the sample means of Yi’s and Wi’s, respectively. Hence an estimator for the slope β can be obtained from ν = Kxxβ with ν replaced by n and Kxx by xx. This leads to the AC estimator AC given by (2.2), and 0 = − ′n.
To construct REs for β0 and β, we first give a RE for ν. Note that the general restriction can be written as . If Kxx is known, then using the Lagrangian multiplier, one can show that the restricted maximum likelihood estimators of ν, ν0 are given by,
If the reliability matrix Kxx is unknown, replace it with xx. Then the REs for β, β0 are defined as
where , which can easily be shown to be a consistent estimator of .
If the random components are not normal, the conditional expectation E(Y|W) = β0 + β′E(X|W) may not be linear in W, and the conditional variance Var(Y|W) may vary with W. Hence, a linear model is inappropriate. In fact, linearity of E(X|W) in W and homoscedasticity of Var(X|W) in W imply that (X′, U′) must be multivariate normal; see Goel and DeGroot (1980) and Rao (1976). Although the estimators given in (2.3) are derived under the normality assumption, it is worth investigating whether they have good properties in non-normal settings. See Gleser (1992) for the details.
Another way to construct the REs in the measurement error regression models is to mimic the procedure for the RE in model (1.1). That is, RE = LS − SR′(RSR′)−1(RLS − r), where S is the asymptotic covariance matrix of LS. Then an RE for β in the measurement error setting can be defined as
where Ĝn is a consistent estimator of the asymptotic covariance matrix G of AC, and
See Fuller (1987), Carroll et al. (2006) and Kim and Saleh (2005) how to derive Ĝn under the normal and non-normal setups. The RE for β0 can be defined accordingly. Clearly, the difference between the REs given in (2.3) and (2.4) comes from the difference between Ĥn and Ĝn.
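Both REs share one algebraic form, β̂AC − CR′(RCR′)⁻¹(Rβ̂AC − r), with C = Ĥn for (2.3) and C = Ĝn for (2.4). The sketch below illustrates that form only. It does not estimate Ĥn or Ĝn; the two matrices `C1` and `C2` and the vector `beta_ac` are hypothetical stand-ins chosen for the example.

```python
import numpy as np

def restricted_from_ac(beta_ac, C, R, r):
    """beta_ac - C R'(R C R')^{-1}(R beta_ac - r) for a positive definite C."""
    return beta_ac - C @ R.T @ np.linalg.solve(R @ C @ R.T, R @ beta_ac - r)

# Hypothetical inputs: an AC estimate and two candidate weight matrices.
beta_ac = np.array([1.1, 1.9, 0.2, -0.1])
R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
r = np.zeros(2)
C1 = np.eye(4)                          # stand-in for one covariance estimate
C2 = np.eye(4) + 0.5 * np.ones((4, 4))  # stand-in for another

re_1 = restricted_from_ac(beta_ac, C1, R, r)
re_2 = restricted_from_ac(beta_ac, C2, R, r)
print(re_1, re_2)
```

Both results satisfy Rβ = r exactly, but they distribute the adjustment differently across the unrestricted coordinates, which is where the two REs can differ.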
Now we are ready to give PTE, JSTE and PRSE for the regression coefficient β based on Saleh’s (2006) general rule and the REs defined by (2.3) and (2.4). For the sake of clarity, we shall put a ^ over those estimators based on the RE in (2.3), and a ~ over those based on the RE in (2.4).
In a similar way, we can define β̃PTE, β̃JSTE, and β̃PRSE by replacing β̂RE with β̃RE in the expressions of β̂PTE, β̂JSTE and β̂PRSE, respectively. It would be interesting to see how the estimators based on the two REs differ with respect to the risk comparisons.
From the above definitions, one can see that if the data yield Ln < Ln,α, then PTE = RE; otherwise, PTE = AC. So PTE is indeed a simple mixture of the AC estimator and the RE. In the ordinary two-step procedure, one would test the hypothesis Rβ = r first, and then decide, based on the test result, which estimator to adopt. PTE simply combines these two steps into one, so that testing and estimation are done simultaneously. JSTE replaces the indicator function with a continuous function, (q − 2)Ln⁻¹, of Ln. In the normal case, one can actually obtain JSTE by the empirical Bayesian estimation approach. The constant appearing in JSTE in the classical multiple regression model should be (q − 2)(n − p)/(n − p + 2) instead of q − 2 (Saleh, 2006). Since we consider the asymptotic risk function, changing the constant to q − 2 makes no difference in the large-sample sense, although it may have some impact on the small-sample behavior of the estimators. The derivation of PRSE in the current setting is similar to its counterpart in the classical regression case. In fact, if Ln tends to 0, then JSTE may go “past” the estimator RE, or JSTE may have a different sign from the RE. As a partial remedy, one may apply the shrinkage factor only when Ln > q − 2 and use the RE otherwise, which results in PRSE. The corresponding PTE, JSTE and PRSE for the intercept can be defined accordingly.
For the sake of convenience, in what follows, we shall call the estimators β̂RE, β̂PTE, β̂JSTE, and β̂PRSE the “hat” estimators, and the estimators β̃RE, β̃PTE, β̃JSTE, and β̃PRSE the “tilde” estimators.
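The three improved estimators can be written in the common form AC − φ(Ln)(AC − RE), where φ is the indicator of Ln < Ln,α for PTE, (q − 2)/Ln for JSTE, and the same factor switched to 1 when Ln ≤ q − 2 for PRSE. The following minimal sketch assumes that form; the function name `shrink`, the numerical inputs, and the critical value 5.317 (approximately the upper 15% point of a χ² distribution with 3 degrees of freedom) are illustrative assumptions.

```python
import numpy as np

def shrink(beta_ac, beta_re, L_n, q, crit, kind):
    """Return beta_ac - phi(L_n) * (beta_ac - beta_re) for the chosen rule."""
    if kind == "PTE":
        phi = 1.0 if L_n < crit else 0.0              # accept the restriction or not
    elif kind == "JSTE":
        phi = (q - 2) / L_n                           # smooth shrinkage, may exceed 1
    elif kind == "PRSE":
        phi = (q - 2) / L_n if L_n > q - 2 else 1.0   # never shrink past the RE
    else:
        raise ValueError(f"unknown estimator type: {kind}")
    return beta_ac - phi * (beta_ac - beta_re)

# Hypothetical estimates with q = 3 restrictions.
beta_ac = np.array([1.1, 1.9, 0.2, -0.1, 0.3])
beta_re = np.array([1.0, 2.0, 0.0, 0.0, 0.0])
q, crit = 3, 5.317

for L_n in (0.5, 8.0):
    for kind in ("PTE", "JSTE", "PRSE"):
        print(L_n, kind, shrink(beta_ac, beta_re, L_n, q, crit, kind))
```

With the small statistic, PTE and PRSE both return the RE while JSTE overshoots it; with the large statistic, PTE returns the AC estimate and JSTE and PRSE coincide.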
To begin with, we state regularity conditions associated with model (2.1), which will be used throughout the current and next sections. Some conditions are already mentioned in the previous section. They are listed again for the sake of completeness.
These conditions are quite common in the literature of measurement error models. The existence of the fourth moment of u is needed to ensure the asymptotic normality of the AC estimator. The existence of the fourth moment of u was also assumed in Schneeweiss (1976).
We begin with considering the asymptotic behavior of the improved estimators under the fixed alternative: Ha : Rβ − r = δ ≠ 0, where δ is a vector of length q. We claim that Ln → ∞ in probability as n → ∞. This claim is based on the following expression.
The first term is the order of Op(1), the second and the third Op(n) and positive.
Let β* be a generic notation for the hat estimators and tilde estimators. Under the fixed alternative, it is readily seen that, if Conditions C1–C3 hold, β* is asymptotically equivalent to the AC estimator. That is, all the estimators defined above are asymptotically equivalent to AC. This implies that the asymptotic risk functions are all the same if Ha is true. Thus we cannot tell any difference among these estimators.
To obtain meaningful risk comparisons among these estimators, we consider a sequence of local alternatives, that is,
$$R\beta_n = r + n^{-1/2}\delta, \qquad (2.6)$$
with fixed R, δ and r, limn→∞ βn = β, Rβ = r, where δ is a q × 1 vector. Write
The following theorem states the asymptotic distributions of the AC, hat and tilde estimators under the local alternatives (2.6).
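Suppose Conditions C1–C3 hold. Then, under the local alternatives (2.6), as n → ∞, we have, in distribution, for each estimator β* (the AC, hat or tilde estimators),
$$\sqrt{n}\,(\beta^{*} - \beta_n) \to Z - \varphi(L)\, C R'(R C R')^{-1}(RZ + \delta),$$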
where Z ~ N(0, G) and L = (RZ + δ)′(RGR′)⁻¹(RZ + δ). For the AC estimator, φ(L) = 0. For the hat estimators, C = H: for RE, φ(L) = 1; for PTE, φ(L) = I(L < χ²q,α), where χ²q,α is the α-level critical value of the χ² distribution with q degrees of freedom; for JSTE, φ(L) = (q − 2)L⁻¹; and for PRSE, φ(L) = 1 − [1 − (q − 2)L⁻¹]I(L > q − 2). For the tilde estimators, C = G, and the φ(·) functions are the same as their counterparts for the hat estimators.
In particular, if R = Ip×p, then the asymptotic distributions of the two REs are the same, and so are those of the two PTEs, JSTEs, and PRSEs.
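The limiting representation Z − φ(L)CR′(RCR′)⁻¹(RZ + δ) lends itself to a quick Monte Carlo check. The sketch below draws from that limit and averages the quadratic loss with weight matrix M; it is an illustration under assumed inputs (G = C = M = I, p = 8, q = 6, δ = 0), not a reproduction of the paper's figures, and the name `limit_risk` is hypothetical. PTE is omitted only to avoid hard-coding a critical value.

```python
import numpy as np

def limit_risk(phi, G, C, R, delta, M, n_draws=50_000, seed=0):
    """Monte Carlo average of T' M T, with T = Z - phi(L) C R'(R C R')^{-1}(R Z + delta)."""
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(G.shape[0]), G, size=n_draws)
    V = Z @ R.T + delta                                   # rows hold R Z + delta
    L = np.einsum("ij,ij->i", V @ np.linalg.inv(R @ G @ R.T), V)
    J = C @ R.T @ np.linalg.inv(R @ C @ R.T)              # p x q matrix
    T = Z - phi(L)[:, None] * (V @ J.T)
    return float(np.mean(np.einsum("ij,jk,ik->i", T, M, T)))

# Assumed setup for illustration.
p, q = 8, 6
G = np.eye(p)
R = np.hstack([np.zeros((q, p - q)), np.eye(q)])
delta = np.zeros(q)
M = np.eye(p)

rules = {
    "AC": lambda L: np.zeros_like(L),
    "RE": lambda L: np.ones_like(L),
    "JSTE": lambda L: (q - 2) / L,
    "PRSE": lambda L: np.where(L > q - 2, (q - 2) / L, 1.0),
}
# The same seed is reused so every rule is evaluated on identical draws.
risks = {name: limit_risk(phi, G, G, R, delta, M) for name, phi in rules.items()}
print(risks)
```

Replacing `delta` with a nonzero vector, or `C` with a different positive definite matrix, gives a rough picture of how the ordering of the risks changes away from the null.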
The commonly used quantities to evaluate the performance of an estimator are its bias, variance or mean squared error. For the estimators proposed in the last section, we will calculate their asymptotic distributional quadratic bias, asymptotic distributional quadratic risk function with different weight matrices, and show that all estimators but AC are asymptotically biased. The general comparisons between different estimators based on these asymptotic result turn out to be very difficult, but detailed comparisons may be made under special circumstances. We mainly focus on the discussions of asymptotic bias and risk functions of the estimators for the slope parameters and certain linear combinations of the intercept and slope parameters.
The asymptotic distributional bias function of the estimator is defined as the bias of the asymptotic distribution of , that is, according to Theorem 2.1. To make meaningful comparisons among the asymptotic distributional biases, we define the asymptotic distributional quadratic bias as below:
Let M be a known positive definite weight matrix. The asymptotic risk function of the estimator of β is defined as
To concisely state the results, the following notation is used.
where i = 2, 4, j = 1, 2, is the noncentral χ2 random variable with degrees of freedom q + i and noncentral parameter λ.
The derivation of the asymptotic distributional bias functions can be done by using the fact , for any measurable function (·). This can be verified by using (A.8) in the Appendix. The details are omitted for the sake of brevity. The following theorem lists the results for the asymptotic distributional bias. Its proof can be finished by Theorem 2.1. We omit the details.
Suppose Conditions C1–C3 hold. Then, under the local alternatives, the asymptotic biases and asymptotic distributional quadratic biases for the AC, hat and tilde estimators are given by
respectively. For the AC estimator, f(λ) = 0; for all hat estimators, C = H, and for all tilde estimators, C = G. For RE, f(λ) = 1; for PTE, f(λ) = gq+2(λ); for JSTE, f(λ) = h1,q+2(λ); and for PRSE, f(λ) = k1,q+2(λ).
Within each set of estimators, the asymptotic distributional quadratic biases can be compared simply through the quantities gq+2(λ), h1,q+2(λ), and k1,q+2(λ). It is worth mentioning that, within each set of estimators, the asymptotic distributional quadratic biases of PTE, JSTE and PRSE are no larger than that of RE, since gq+2(λ), h1,q+2(λ), and k1,q+2(λ) are all no larger than 1.
The more interesting comparison would be made between the hat estimators and tilde estimators. A natural way to make the comparisons is to compute the difference between the biases or the risk functions. However, the computation is rather complex in general situations. But for some special cases, the comparison can be made easily. In particular, we have the following corollary.
Suppose u ~ Np(0, σuuI), Σxx = σxxI, then for any δ, the asymptotic distributional quadratic biases of the hat estimators are all less than those of the corresponding tilde estimators.
For an illustrative purpose, let p = 5, q = 3 (recall that q > 2 is required in constructing JSTE and PRSE), α = 0.15, β = (3, 3, 1, 2, 3)′, Σxx = Σuu = I5×5, u ~ N(0, Σuu), σ2 = 1, and R be a 3 × 5 matrix with the components in the first two columns all 0 and the last three columns forming an identity matrix. δ is a vector with same elements, specifically, δ =δ1, where δ is a scalar, 1 is a vector of 1’s. The plot (a) in Figure 1, in which the dotted lines represent the asymptotic distributional quadratic biases for the tilde estimators and the solid lines for the hat estimators, delineates the asymptotic distributional quadratic biases. In our case, the asymptotic distributional quadratic bias functions of the hat and tilde estimators for each type of estimators (RE, PTE, JSTE or PRSE) are close to each other, but the asymptotic distributional quadratic biases of the hat estimators are all slightly smaller than those of the corresponding tilde estimators, which confirms our discovery in Corollary 3.1. Now, we change Σxx to a positive matrix with diagonal elements all 1’s and off-diagonal elements all 0.1’s, other quantities stay unchanged. In contrast to plot (a), plot (b) in Figure 1 shows an inverse direction, that is, the asymptotic distributional quadratic biases of the tilde estimators are all slightly smaller than those of the corresponding hat estimators. This phenomenon shows, in the sense of asymptotic distributional quadratic bias, that neither the hat estimators nor the tilde estimators can dominate the others over all scenarios.
The asymptotic weighted risk functions of AC, the hat and tilde estimators are summarized in the following theorem.
Suppose Conditions C1–C3 hold, then
while the risk functions of JSTE, PRSE have similar forms as the following
The risk function of JSTE corresponds to f = h, and PRSE corresponds to f = k.
The risk function comparisons can be done by investigating the matrices G and H, but the actual comparison may be complicated, if it is possible at all, because , and the risk function involves inverse operations such as (RGR′)⁻¹. This comparison can be done in a straightforward way for some special cases. For example, if M = H⁻¹, we have the following corollary.
Suppose Conditions C1–C3 hold. Let M = H−1. Then
the risk functions of JSTE, PRSE have similar forms as the following
The risk function of JSTE corresponds to f = h, and PRSE corresponds to f = k.
Based on Corollary 3.2, one can make the comparisons among the estimators more specifically. In fact,
By applying the above theorem with H replaced by G, we can obtain the asymptotic weighted risk functions for all tilde estimators. Similar risk comparison analysis can be made as above, which we summarize in the following theorem.
Suppose Conditions C1–C3 hold, then
the risk functions of JSTE, PRSE have similar forms as the following
The risk functions of JSTE and PRSE correspond to f = h and f = k, respectively.
In particular, if M = G−1, then the above theorem reduces to the following corollary.
Suppose Conditions C1–C3 hold, then
For the purpose of illustration, we use the previous setting, that is, p = 5, q = 3, α = 0.15, β = (3, 3, 1, 2, 3)′, Σxx = Σuu = I5×5, σ2 = 1, and R is a 3 × 5 matrix with the components in the first two columns being 0’s and the last three columns an identity matrix. For simplicity, δ is taken to be a vector with same elements. We plot the risk functions of the estimators in Figure 2, in which plot (a) is for M = H−1, and plot (b) is for M = G−1. In our cases, the risk functions of the hat and tilde estimators for each type of estimators (RE, PTE, JSTE or PRSE) are close to each other.
Sometimes, we are interested in estimating a linear combination of the intercept and the slope parameters. For example, to predict the response at a specified value of the predictor, say x0 p, one needs to calculate the value of , where and are the generic notation for the AC estimator, hat or tilde estimators of β0 and β. As we mentioned before, the corresponding estimators of the intercept β0 are given by . For example, the hat AC estimator of β0 is defined as 0,AC = − ′AC, and the tilde RE of β0 is given by 0,RE = − ′RE. We can define the AC estimator and the hat, tilde estimators of the linear combination accordingly. The risk function of is then defined as the mean square error of the asymptotic distribution of which is denoted as , where βn is the same as in (2.6). The following theorem gives the risk functions of various estimators of the linear combination. Let .
Suppose Conditions C1–C3 hold. Then , where C = H corresponds to the hat estimators and C = G corresponds to the tilde estimators, φ(x) = 0 for the AC estimator, φ(x) = 1 for the RE, φ(x) = I(x < χ²q,α) for PTE, φ(x) = (q − 2)x⁻¹ for JSTE and φ(x) = 1 − [1 − (q − 2)x⁻¹]I(x > q − 2) for PRSE.
When the measurement errors are symmetric around 0, Euβ′uu′β = 0 for all β. As a consequence, we have the following corollary.
Suppose Conditions C1–C3 hold, and Euβ′uu′β = 0. Then
where , C and (·) are as in Theorem 3.4.
To see the finite sample performance of the proposed estimators, we conduct simulation experiments under the various scenarios. The data are generated from the following multiple linear regression model with measurement errors
The true regression parameter is chosen to be with Rβ = r. We consider two cases, δ = 0, and δ ≠ 0. For each case, we calculate the risk function for sample size n = 50, 100, 200, and 500, and repeat each simulation 1000 times. The values of the risk reported in the tables below are the average of 1000 sample risks. In the simulation, α is chosen to be 0.15. In practice, one can choose the value of α based on the maximin rule given in Table 1. The predictors X are generated from multivariate normal N(0, I8×8), and the measurement errors u are generated from multivariate normal N(0, 0.22I8×8). X and u are independent. The regression error ε follows N(0, 1).
When δ = 0 or Rβn = r holds.
The true values of the regression parameters are chosen to be β0 = 1, β1 = β5 = β7 = 0, β2 = 1.5, β3 = 0.75, β4 = −β6 = 2, β8 = 3. So the β’s satisfy the constraint Rβ = r with
Table 2 reports the risk values of various slope estimators.
In Table 2, “hat” denotes the hat estimators, and “tilde” denotes the tilde estimators. Each cell gives the risk value with weight matrix G−1. The risks with weight matrix H−1 have a similar pattern and therefore are omitted. In this simulation, one can see that the risks are pretty stable within each class of estimators, also the risks of the hat estimators and of the tilde estimators are very close. The smallest risk is achieved by RE, then PTE, PRSE, JSTE, and the largest risk is obtained by the AC estimator. These results coincide with our theory. For an illustrative purpose, the histograms of the estimates, with the weight matrix M = H−1, of the slope parameter for n = 200 are given in Figure 3.
When δ ≠ 0 or Rβn = r does not hold.
In this case, , where 1 is a 8 ×1 vector with elements all 1’s. Other quantities in the model stay unchanged. Figure 4 reports the risks of the tilde estimators with weight matrix G−1 for different sample sizes. The δ value ranges from 0 to 10. The risks of the hat estimators have the similar pattern regardless of the choice of weight matrix.
The risk of the AC estimator for the slope is almost constant across values of δ. When δ is close to 0, the RE for both slopes and intercept achieves the smallest risk, but its risk increases quickly as δ gets bigger. PTE for both slopes and intercept has a small risk when δ is small; once δ leaves 0, the risk of PTE begins to increase and exceeds the risk of the AC estimator. After it reaches a certain point, it comes down and eventually approaches the risk of the AC estimator. For small δ values, the risks of JSTE and PRSE for the slope estimators are higher than that of the RE. However, when δ gets bigger, JSTE and PRSE begin to dominate all other estimators. The patterns of the risks of the AC estimator, RE and PTE for the intercept are similar to those of the AC estimator, RE and PTE for the slopes, but this is not true for the risks of JSTE and PRSE for the intercept.
Upon the request of one referee, we also conduct a simulation study when X and u follow non-normal distributions. Each component of the predictors X is generated from a uniform distribution on [1, 2], and each component of the measurement error u is generated from a uniform distribution on [−0.5, 0.5], X1, …, X8, u1, …, u8 are independent. All other entities are the same as in the normal case. The simulation results are similar to the normal cases and are not reported here.
The assessment of an individual diet is difficult, but fundamental in exploring the relationship between diet and cancer, and in monitoring dietary behavior among individuals and populations. A variety of dietary assessment instruments have been derived, of which three main types are most commonly used in nutritional research. The instrument of choice in large nutritional epidemiology studies is the Food Frequency Questionnaire (FFQ). For proper interpretation of epidemiologic studies that use FFQ’s as the basic dietary instrument, one needs to know the relationship between reported intakes from the FFQ, usual intake, energy, vitamin A, and other variables such as age and body mass index (bmi).
FFQ’s are thought to often involve a systematic bias (i.e., under- or over-reporting at the level of the individual). The other records also include measurement errors. To illustrate the proposed method, we analyze a data set from the Nurses Health Study (Rosner, Spiegelman, and Willett, 1990), which has a calibration study of size n = 168 women. All of them completed a single FFQ and four multiple-day food diaries. There are 6 variables, age (X1), bmi (X2), energy (X3), vitamin A (X4), usual intake (X5), and the calories from fat, FFQ (Y), in this dataset. Among these 6 variables, energy, usual intake, vitamin A are measured with error, but for each subject, these 3 variables are measured four times. A simple variance analysis suggests that the variance of energy is 3.63, the variance of vitamin A is 381.92, and the variance of usual intake is 10.34. For an initial analysis, the following multiple regression model is used to fit this dataset.
The averages of these four replications in X3, X4, X5 are used in the design matrix. The covariance matrix of the measurement errors is estimated by using the formula in Liang, Härdle, and Carroll (1999).
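When each error-prone covariate is replicated, the measurement error covariance can be estimated from within-subject variation. The sketch below uses the standard replicate-based moment estimator as an assumption; it may differ in detail from the formula in Liang, Härdle, and Carroll (1999), and the function name, array layout and simulated values are hypothetical. Since the replicate averages enter the design matrix, the error covariance relevant for those averages is the single-measurement covariance divided by the number of replicates.

```python
import numpy as np

def sigma_uu_from_replicates(W_rep):
    """W_rep has shape (n, m, p): n subjects, m replicates, p covariates.

    Returns the pooled within-subject covariance (one measurement) and the
    covariance of the error in the m-replicate average.
    """
    n, m, p = W_rep.shape
    resid = W_rep - W_rep.mean(axis=1, keepdims=True)    # deviations from subject means
    S = np.einsum("ijk,ijl->kl", resid, resid) / (n * (m - 1))
    return S, S / m

# Hypothetical simulation: 3 covariates, 4 replicates per subject.
rng = np.random.default_rng(2)
n, m, p = 2000, 4, 3
true_var = np.array([1.0, 4.0, 0.25])
X = rng.normal(size=(n, 1, p))                           # true covariates, shared by replicates
U = rng.normal(size=(n, m, p)) * np.sqrt(true_var)       # independent measurement errors
W_rep = X + U

S_single, S_mean = sigma_uu_from_replicates(W_rep)
print(np.diag(S_single), np.diag(S_mean))
```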
With all variables in the model and without constraint, Table 3 lists the estimated values based on the AC estimation for the slope, the estimated standard errors based on the procedure developed by Liang et al. (1999), and the associated p-values calculated from the t-distribution with n − 6 = 168 − 6 = 162 degrees of freedom.
Table 3 shows that X3 and X4 are not significant at the 0.1 significance level, while the variables X1, X2, and X5 are significant. Recall that X5 represents usual intake, which is strongly related to intakes from the FFQ. On the other hand, vitamin A should not be a good predictor of food composition. Thus, using advanced statistical methods to obtain a reasonable estimate while weighting towards β3 = β4 = 0 makes a lot of sense for nutritional research.
Now we impose a constraint on our model β3 = β4 = 0. In this case q = 2. Table 4 reports the estimated values, obtained by calculating the estimators we are studying, for the regression parameters. To measure the variation of these estimates, the bootstrap standard errors are also reported here. For each estimation procedure, 1000 bootstrap samples are drawn. Accordingly, 1000 bootstrap estimates for the slope parameters are obtained. To get a robust estimation for the standard errors, only the middle 80%, or 800 estimates are used in the calculation. The resulting bootstrap standard errors are shown in the brackets.
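The trimmed bootstrap standard error described above can be sketched as follows. The text does not spell out how the middle 80% is selected, so this example assumes that, for each coefficient separately, the bootstrap estimates are sorted and the lowest and highest 10% are dropped. The function name and the simulated bootstrap estimates are hypothetical.

```python
import numpy as np

def trimmed_bootstrap_se(estimates, keep=0.80):
    """Standard deviation of the middle `keep` fraction of bootstrap estimates, per column."""
    est = np.sort(np.asarray(estimates, dtype=float), axis=0)   # sort each coefficient separately
    B = est.shape[0]
    drop = int(round(B * (1 - keep) / 2))                       # number trimmed from each tail
    middle = est[drop:B - drop]
    return middle.std(axis=0, ddof=1)

# Hypothetical bootstrap output: 1000 replicates of 3 slope estimates, with a few wild draws.
rng = np.random.default_rng(3)
boot = rng.normal(loc=[0.5, -1.0, 2.0], scale=[0.1, 0.2, 0.3], size=(1000, 3))
boot[:5] += 10.0

se_trim = trimmed_bootstrap_se(boot)
se_full = boot.std(axis=0, ddof=1)
print(se_trim, se_full)
```

Trimming makes the summary insensitive to a few extreme bootstrap draws, at the cost of producing a value that is smaller than the untrimmed standard deviation by construction.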
Note that the bootstrap standard errors of AC in Table 4 are close to the standard errors of AC in Table 3. Also, the standard errors of RE, PTE are all smaller than their counterparts of AC. These results clearly show that the proposed estimators improve upon the usual AC estimator.
We have introduced two classes of estimators, the hat and tilde estimators, and made a comprehensive comparison of their risk functions in some special cases. The comparison in general cases is more complicated, and it is difficult to say which estimators should be used in practice unless further information is available, such as the values of R, r, G and H. Usually, the practitioner may have some prior information about R and r. Also, once the sample is obtained, one can estimate G and H. Based on our comparison (see Figures 1 and 2), the quadratic bias function and the risk functions of the hat estimators and the tilde estimators for the slope parameters do not differ substantially, even for different weight matrices. Our analysis indicates that if the prior knowledge about the regression parameters is true, RE, PTE, JSTE and PRSE all have smaller risks than the AC estimator. When the regression parameters deviate from the prior information, RE eventually becomes useless, while PTE behaves quite well, except for moderate departures from the null hypothesis. If the slope parameters are of interest, JSTE and PRSE are highly recommended in that they possess smaller risk than the AC estimator, RE and PTE. In particular, PRSE dominates JSTE in some special cases; see the discussions following Corollary 3.1.
The procedure developed in this paper can easily be extended to the case where Σuu is unknown but we have a consistent estimator of Σuu.
It is also possible to extend the procedure to the partially linear model Y = X′β + ν(z) + ε with an error-prone linear covariate X. The major work can be regarded as a combination of this paper and the work of Liang et al. (1999), because the latter already derived a root-n consistent estimator of the parameter β.
In principle, the method proposed in this paper can also be extended to linear or partially linear models for longitudinal data. The derivation should be straightforward except for more complex notation.
The authors thank two referees for their helpful comments which improved the presentation of this manuscript. They also thank Dr. Raymond J. Carroll for very helpful comments. This research was partially supported by NIH/NIAID grants AI62247 and AI059773.
A direct derivation yields AC − βn = (SWW − Σuu)−1(SWY − SWWβn+Σuuβn), and . Let . It follows that
Note that the last two terms approach zero and SWW − Σuu → Σxx in probability. Theorem 2.1 follows for the AC estimator by applying the central limit theorem, together with the fact that βn → β.
For all other hat estimators, we have the general form β̂* = β̂AC − (β̂AC − β̂RE)(Ln). Recall β̂RE = β̂AC − ĤnR′(RĤnR′)−1(Rβ̂AC − r). It follows that β̂* = β̂AC − (Ln)ĤnR′(RĤnR′)−1(Rβ̂AC − r). Under the local alternative , we know that
So the result for hat estimators follows from the fact that ĤnR′(RĤnR′)−1 → JH in probability, and in distribution. In the same way, we can prove Theorem 2.1 for all tilde estimators.
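The algebraic structure used in this proof can be checked numerically. The following is a minimal sketch, not the authors' implementation: it computes the attenuation-corrected slope estimate (SWW − Σuu)−1SWY, the restricted estimate obtained by subtracting a term of the form HR′(RHR′)−1(Rβ̂AC − r), and the generic combination of the two with a scalar weight. The function names, the simulated data, the normalisation of the sample covariances by n, and the choice H = I as weight matrix are all assumptions made for illustration only.

```python
import numpy as np

def ac_estimator(W, Y, sigma_uu):
    """Attenuation-corrected slope estimate (S_WW - Sigma_uu)^{-1} S_WY.

    Assumption: S_WW and S_WY are centred sample covariances divided by n.
    """
    n = W.shape[0]
    Wc = W - W.mean(axis=0)
    Yc = Y - Y.mean()
    S_ww = Wc.T @ Wc / n
    S_wy = Wc.T @ Yc / n
    return np.linalg.solve(S_ww - sigma_uu, S_wy)

def restricted_estimator(beta_ac, H, R, r):
    """beta_AC - H R'(R H R')^{-1}(R beta_AC - r) for a weight matrix H."""
    HRt = H @ R.T
    return beta_ac - HRt @ np.linalg.solve(R @ HRt, R @ beta_ac - r)

def shrink(beta_ac, beta_re, weight):
    """Generic form beta_AC - (beta_AC - beta_RE) * weight."""
    return beta_ac - (beta_ac - beta_re) * weight

# Hypothetical simulated data: true covariate x, observed surrogate W = x + u.
rng = np.random.default_rng(0)
n, p = 2000, 3
beta = np.array([1.0, 1.0, 0.5])
sigma_uu = 0.25 * np.eye(p)              # measurement error covariance, assumed known
x = rng.standard_normal((n, p))
W = x + rng.multivariate_normal(np.zeros(p), sigma_uu, size=n)
Y = 2.0 + x @ beta + 0.5 * rng.standard_normal(n)

R = np.array([[1.0, -1.0, 0.0]])         # prior information: beta_1 = beta_2
r = np.array([0.0])
H = np.eye(p)                            # placeholder weight matrix (assumption)

beta_ac = ac_estimator(W, Y, sigma_uu)
beta_re = restricted_estimator(beta_ac, H, R, r)
print("AC:", beta_ac)
print("RE:", beta_re, " R @ RE =", R @ beta_re)
```

Whatever positive definite H is supplied, the restricted estimate satisfies the constraint exactly, since R applied to the correction term returns Rβ̂AC − r. A weight of 0 in `shrink` returns the AC estimate and a weight of 1 returns the restricted estimate.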
To prove Corollary 3.1 and compute the risk function of other estimators, we need the following lemmas.
Lemma A.1 (Saleh, 2006). If the p × 1 vector Y is distributed normally with mean vector μ and covariance matrix Ip×p, then for any measurable function ,
provided that the expectations exist.
Lemma A.2. If the p × 1 vector Y is distributed normally with mean vector μ and covariance matrix Ip×p, and A is an idempotent matrix with rank q ≤ p, then for any measurable function , we have
provided the expectations exist, where , λ = μ′ A μ.
Let P be the p × p orthogonal matrix such that PAP′ = Blockdiag(Iq×q, 0(p−q)×(p−q)), and , where P1 is q × p and P2 is (p − q) × p. Let Z = PY; then Y′AY = Z′Blockdiag(I, 0)Z. Partition Z into two blocks . It is easy to see that Z1 ~ N(P1μ, Iq×q) and Z2 ~ N(P2μ, I(p−q)×(p−q)), and that Z1 and Z2 are independent. Hence , YY′ = P′ZZ′P, and
Hence E[(Y′AY)YY′] equals
Thus (A.3) is obtained by using the facts that .
To prove (A.4), one can show that . Again , and imply the desired result.
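The diagonalisation step used in this proof can be illustrated numerically. The sketch below assumes that A is symmetric as well as idempotent (an orthogonal projection), which is what the existence of an orthogonal P with PAP′ = Blockdiag(I, 0) requires. The particular matrix A, the vector Y and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 5, 2

# A symmetric idempotent matrix of rank q: projection onto the column space of B.
B = rng.standard_normal((p, q))
A = B @ np.linalg.solve(B.T @ B, B.T)

# eigh returns eigenvalues in ascending order; reorder so the q unit
# eigenvalues come first, then take P as the transposed eigenvector matrix.
eigvals, eigvecs = np.linalg.eigh(A)
order = np.argsort(eigvals)[::-1]
P = eigvecs[:, order].T

D = P @ A @ P.T                      # numerically Blockdiag(I_q, 0)
target = np.zeros((p, p))
target[:q, :q] = np.eye(q)

Y = rng.standard_normal(p) + 1.0     # any vector works for the algebraic identity
Z = P @ Y
Z1 = Z[:q]                           # first block of the partition of Z

print(np.round(D, 10))
print(Y @ A @ Y, Z1 @ Z1)            # the two quadratic forms agree
```

The quadratic form Y′AY therefore depends on Z only through its first q coordinates, which is what allows the expectation in the lemma to be split into independent blocks.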
It is sufficient to show that . From the normality of u and Lemma A.2, one can show that
Let The diagonal form of the covariance matrices of x and u implies G = c1I + c2ββ′. Then
Note that , one can obtain that (RH R′)−1RH HR′(RH R′)−1 = (RR′)−1.
It follows from a direct calculation that
Note that the left hand side is , and the right hand side is . We complete the proof.
From Theorem 2.1, one can see that the asymptotic distributions for all estimators are the same as that of Z − (L)CR′(RCR′)−1 (RZ + δ), where C = H or G, and (·) is defined in Theorem 2.1. We now compute the risk function for this general form. The results in Theorems 3.2 and 3.3 can be obtained by replacing C with H and G, respectively.
Let Y = G−1/2Z + μ, μ = G1/2R′(RGR′)−1δ, A = G1/2R′(RGR′)−1RG1/2. Then Y ~ Np (μ, I), and L = Y′AY. We first show that
In fact, E(L)Z can be written as G1/2E(L)Y − G1/2μE(L). By (A.4) and , we obtain that
Then (A.8) is obtained by noticing that Aμ = μ.
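The properties of A and μ used here follow from their definitions and can be confirmed numerically. The sketch below uses an arbitrary positive definite G, a full row rank R and a vector δ, all hypothetical, and checks that A is a symmetric idempotent matrix of rank q, that Aμ = μ, and that Y′AY coincides with (RZ + δ)′(RGR′)−1(RZ + δ).

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 4, 2

# Hypothetical inputs: positive definite G, full row rank R, shift delta.
M = rng.standard_normal((p, p))
G = M @ M.T + p * np.eye(p)
R = rng.standard_normal((q, p))
delta = rng.standard_normal(q)

# Symmetric square roots of G and its inverse via the eigendecomposition.
w, V = np.linalg.eigh(G)
G_half = (V * np.sqrt(w)) @ V.T
G_neg_half = (V / np.sqrt(w)) @ V.T

RGRt_inv = np.linalg.inv(R @ G @ R.T)
A = G_half @ R.T @ RGRt_inv @ R @ G_half
mu = G_half @ R.T @ RGRt_inv @ delta

# One draw Z ~ N(0, G), so that Y = G^{-1/2} Z + mu ~ N(mu, I).
Z = rng.multivariate_normal(np.zeros(p), G)
Y = G_neg_half @ Z + mu

L_from_Y = Y @ A @ Y
v = R @ Z + delta
L_from_Z = v @ RGRt_inv @ v

print("A idempotent:", np.allclose(A @ A, A))
print("A mu = mu:   ", np.allclose(A @ mu, mu))
print("L:", L_from_Y, L_from_Z)
```

Because Aμ = μ, the quantity μ′Aμ reduces to μ′μ = δ′(RGR′)−1δ, which the same arrays can be used to confirm.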
Now we are ready to compute the risk functions. For any positive definite matrix M, we have , which equals
The first term equals tr(MG). To compute the second and third terms, note that
This completes the proof.
Recall that the AC estimator for β is given by β̂AC = (SWW − Σuu)−1SWY, and the estimator of β0 is given by β̂0 = Ȳ − W̄′β̂AC. Note that
Let . The multivariate central limit theorem and the fact that limn→∞βn = β indicate that (Z0n, Zn′) converges in distribution to (Z0, Z′), a normal vector with mean (0, 0)′ and covariance matrix
where G is given in (2.5) and β here is subject to Rβ = r.
Note that all the proposed estimators for β have a common form with Cn = Ĥn or Ĝn. So . From (A.12), we know that in distribution. Therefore, for any real vector x0,
The estimators for the intercept β0, , can be expressed
which can be written as
by recalling the notation Z0n and Zn. From (A.12), we have , in distribution. Therefore,
To deal with the last term, denote , then E(Z0|Z) = τ′Z, Z0 − E(Z0|Z) and Z are independent, and L depends on Z only. We have
Therefore, tends to
Finally, we have
From E(Z0|Z) = τ′Z, we have E(Z0Z) = E(ZZ′)τ = Gτ. Then by (A.14),
AMS 2000 subject classification: 62J05; 62F30; secondary 62J99.
Hua Liang, Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA.
Weixing Song, Department of Statistics, Kansas State University, Manhattan, Kansas, 66506, USA.