Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2756710

Formats

Article sections

- Abstract
- 1. Introduction
- 2. Estimation methods
- 3. Numerical Study
- 4. A real data application
- 5. Discussion
- References

Authors

Related links

Comput Stat Data Anal. Author manuscript; available in PMC 2011 January 1.

Published in final edited form as:

Comput Stat Data Anal. 2010 January 1; 54(1): 25–36.

doi: 10.1016/j.csda.2009.08.012PMCID: PMC2756710

NIHMSID: NIHMS142682

See other articles in PMC that cite the published article.

Measurement error occurs in many biomedical fields. The challenges arise when errors are heteroscedastic since we literally have only one observation for each error distribution. This paper concerns the estimation of smooth distribution function when data are contaminated with heteroscedastic errors. We study two types of methods to recover the unknown distribution function: a Fourier-type deconvolution method and a simulation extrapolation (SIMEX) method. The asymptotics of the two estimators are explored and the asymptotic pointwise confidence bands of the SIMEX estimator are obtained. The finite sample performances of the two estimators are evaluated through a simulation study. Finally, we illustrate the methods with medical rehabilitation data from a neuro-muscular electrical stimulation experiment.

Many practical problems involve estimation of distribution functions or density functions from indirect observations. For example, in low level microarray data from either the complementary DNA (cDNA) microarray or the Affymetrix GeneChip system, each observation is an original signal coupled with a background noise. To obtain an expression measure, the goals often include developing better statistical tools or enhancing algorithms for background correction so that the disease genes can be detected accurately and efficiently. In medical image analysis, observable outputs are often blurred images. In astronomy, due to great astronomical distances and atmospheric noise, most data are subject to measurement errors. Statistical analyses that ignore measurement errors could be misleading. Measurement error model is an active, rich research field in statistics. There is an enormous literature on this topic in linear regression, as summarized by Fuller (1987) and in nonlinear models, as summarized by Carroll et al. (2006). In this paper, we investigate estimation methods of smooth distribution function from data contaminated with heteroscedastic measurement errors.

Nonparametric kernel type methods have been widely used in estimating density functions and their derivatives or in regression. Kernel smoothing is also an important tool for distribution function estimation. The kernel estimate of a distribution function, first introduced by Nadaraya (1964), has been investigated by many authors (Azzalini, 1981; Reiss, 1981; Sarda, 1993; Bowman et al., 1998). The kernel smooth estimator, which has good statistical properties, can be expressed as

$${\widehat{F}}_{X}(x)=\frac{1}{n}\sum _{j=1}^{n}L\left(\frac{x-{X}_{j}}{{h}_{n}}\right),$$

(1)

where *X*_{1}, *X*_{2}, …, *X _{n}* are independent random variables with the common distribution function

$${\widehat{F}}_{X}(x)={\int}_{-\infty}^{x}{\widehat{f}}_{X}(t)dt,$$

where is the well-known kernel density estimator.

In real data applications, there are many examples where observable data are contaminated with measurement errors and it is not realistic to assume that the errors are homoscedastic. The measurement process might be subjective and differs among all individuals. Fuller (1987) had an early consideration of this problem. Cheng and Riu (2006) discussed the point estimation of the parameters in a linear measurement error (heteroscedastic errors in variables) model. Kulathinal et al. (2002) considered estimation problem of an errors-in-variables regression model when the variances of the measurement errors vary between observations in the analysis of aggregate data in epidemiology. Sun et al. (2002) studied a measurement error model with application to astronomical data that came with information on their heteroscedastic errors. In the research of density estimation with measurement errors, the literature is vast; see for example, Fan (1991); Zhang (1990); Stefanski and Carroll (1990); Carroll and Hall (1989); Delaigle and Gijbels (2004). They focused on the study of the *deconvoluting kernel density estimation* through an inverse Fourier transform with the case of homoscedastic errors. Until very recently, Delaigle and Meister (2008) studied the deconvoluting kernel estimator under the heteroscedastic setting. Staudenmayer et al. (2008) addressed a different type of model, where the observable data with heteroscedastic measurement errors were assumed normally distributed. A Monte Carlo Markov chain and a random-walk Metropolis-Hastings algorithm were proposed to estimate the unknown density.

Estimating cumulative distribution functions with measurement errors was also of interest. The sample cumulative distribution function is a nonlinear function of data and is biased when it is estimated ignoring measurement errors. Stefanski and Bay (1996) studied the estimation of a discrete population cumulative distribution function when data were contaminated with measurement errors. Using the method of simulation extrapolation (SIMEX) (Cook and Stefanski, 1994; Stefanski and Cook, 1995), they proposed a bias-adjusted estimator that reduced much of the bias. Later, Cordy and Thomas (1996) considered an expectation-maximization algorithm for estimating a distribution function when data were from a mixture of a finite number of known distribution. Nusser et al. (1996) presented a method for estimating distributions with additive normal errors with application to a study of daily dietary intakes. Their approach differed from the kernel estimators in that they assumed that a transformation existed such that both the original observations and the measurement errors were normally distributed. Most recently, Hall and Lahiri (2008) studied Fourier-type estimation of distributions, moments, and quantiles with homoscedastic errors.

Challenges of estimation problems arise when errors are heteroscedastic where we literally have only one observation for each error distribution. In this paper, we study two types of methods to recover the unknown smooth distribution function: a Fourier-type deconvolution method and a SIMEX method, when data are contaminated with heteroscedastic errors. In section 2, we first extend Hall and Lahiri’s (2008) Fourier-type estimator to the case of heteroscedastic errors and then generalize the work of Stefanski and Bay (1996) to estimate the smooth distribution function with heteroscedastic errors using SIMEX. The asymptotics of the two estimators are studied and the asymptotic pointwise confidence bands of the SIMEX estimator are obtained. In section 3, we conduct a simulation study to compare the finite sample performances of the two estimators. In section 4, we apply our proposed method to real data in a medical rehabilitation study. In section 5, we close this paper with a discussion.

To investigate the estimation of the smooth distribution function for data contaminated with heteroscedastic errors, we first consider a general *heteroscedastic measurement error model*. Let *Y*_{1}, ···, *Y _{n}* be an observed random sample such that

$${Y}_{j}={X}_{j}+{U}_{j},$$

(2)

with the measurement error *U _{j}* independent of

Denote the characteristic functions of *X* and *U _{j}* by

$${\widehat{f}}_{X,\mathit{Fourier}}(x)=\frac{1}{n{h}_{n}}\sum _{j=1}^{n}{\stackrel{\sim}{K}}_{j}\left(\frac{x-{Y}_{j}}{{h}_{n}}\right),$$

(3)

where

$${\stackrel{\sim}{K}}_{j}(z)=\frac{1}{2\pi}\int {e}^{-\mathit{itz}}\frac{{\phi}_{K}(t)}{{\psi}_{{U}_{j}}(t/{h}_{n})}dt,\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{\psi}_{{U}_{j}}(t)=\frac{{\scriptstyle \frac{1}{n}}{\sum}_{k=1}^{n}\mid {\phi}_{{U}_{k}}(t){\mid}^{2}}{{\phi}_{{U}_{j}}(-t)},$$

and * _{K}* is the characteristic function of a symmetric probability kernel,

Similarly to Hall and Lahiri (2008), our distribution estimator * _{X,Fourier}* in the case of heteroscedastic errors is defined as simply the integral of

$${\stackrel{\sim}{L}}_{j}(z)=\frac{1}{2}+\frac{1}{2\pi}{\int}_{-\infty}^{\infty}\frac{sin(tz)\xb7{\phi}_{K}({h}_{n}t)}{t\xb7{\psi}_{{U}_{j}}(t)}dt,\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}j=1,\cdots ,n.$$

By integrating * _{X,Fourier}* in (3), we have

$${\widehat{F}}_{X,\mathit{Fourier}}(x)={\int}_{-\infty}^{x}{\widehat{f}}_{X,\mathit{Fourier}}(t)dt=\frac{1}{n}\sum _{j=1}^{n}{\stackrel{\sim}{L}}_{j}(x-{Y}_{j}).$$

(4)

We now study asymptotic properties of our estimator. Due to the nature of the deconvolution kernel estimation, similarly as defined in Fan (1991), we consider a slightly different definition of the distribution function estimator as in the following form to derive the asymptotics. For a sequence of positive numbers *M _{n}* → ∞ as

$${\widehat{F}}_{n,\mathit{Fourier}}(x)={\int}_{-{M}_{n}}^{x}{\widehat{f}}_{X,\mathit{Fourier}}(t)dt.$$

Here we included the terms about *M _{n}* in the uniform bound. Conventionally,

- For an integer
*m*≥ 0, 0 <*α*≤ 1,$$\mid {f}^{(m)}(x)-{f}^{(m)}(y)\mid \le C\xb7\mid x-y{\mid}^{\alpha};$$ *K*(*x*) <*D*· |*x*|^{−}^{m}^{−2}for all*x*;(_{K}*t*) has support [−1, 1];- Condition
*C*of Delaigle and Meister (2008).

The asymptotic behavior of the estimator is described in the next theorem.

Assume Conditions (1) – (4). Then, for x_{0} with specifically chosen sequence of numbers M_{n} → ∞ and bandwidth h_{n} → 0, we have

$$\begin{array}{l}\underset{-\infty <{x}_{0}<\infty}{sup}E{[{\widehat{F}}_{n,\mathit{Fourier}}({x}_{0})-F({x}_{0})]}^{2}\le O({h}_{n}^{2(m+\alpha +1)})+O({F}^{2}(-{M}_{n}(1-{h}_{n})))\\ +O({M}_{n}^{-2m-2})+O\left({M}_{n}{\left[{h}_{n}\sum _{k=1}^{n}\mid {\phi}_{{U}_{k}}(1/{h}_{n}){\mid}^{2}\right]}^{-1}\right).\end{array}$$

The last term of the upper bound reflects the effect of the heteroscedastic measurement errors to the estimation of the unknown distribution function. The smoothness of the measurement error distributions influence the convergence rate of the distribution function estimation, as discussed in Fan (1991). In the above theorem, we present a general asymptotic result of our estimator. The specific convergence rate could be obtained similar as in Hall and Lahiri (2008), where they classified the distribution functions of the unobservable random variable and the measurement errors into eight classes.

SIMEX is a “jackknife”-type bias-adjusted method that has been widely applied in regression problems with measurement errors. Stefanski and Cook (1995) applied the SIMEX algorithm to parametric regression problems. Staudenmayer and Ruppert (2004), Carroll et al. (1999) discussed the nonparametric regression in the presence of measurement errors using SIMEX method. Stefanski and Bay (1996) studied SIMEX estimation of a finite population cumulative distribution function when sample units are measured with errors. Now we generalize Stefanski and Bay’s (1996) method to estimate the smooth distribution function with heteroscedastic Gaussian errors.

Under the model setting (2), we further assume
${U}_{j}\sim N(0,{\sigma}_{j}^{2})$, for *j* = 1, ···, *n*. Typically, *σ _{j}* can be obtained from auxiliary data. By the general SIMEX algorithm, estimators are re-computed from a large number

$${Y}_{jb}(\lambda )={Y}_{j}+{\lambda}^{1/2}{U}_{jb}={Y}_{j}+{\sigma}_{j}{\lambda}^{1/2}{Z}_{jb},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}j=1,\cdots ,n,\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}b=1,\cdots ,B,$$

where *Z _{jb}* are independent, standard normal pseudo-random variables, and

The smooth distribution function estimator from the *b*th variance-inflated data
${\{{Y}_{jb}(\lambda )\}}_{j=1}^{n}$ is

$${\widehat{G}}_{b}(x)=\frac{1}{n}\sum _{j=1}^{n}L\left(\frac{x-{Y}_{jb}}{h}\right),$$

where *L*(·) is defined from a kernel *K*(·) as
$L(x)={\int}_{-\infty}^{x}K(t)dt$ and *K*(·) is a symmetric kernel with a finite variance such that ∫ *K*(*t*)*dt* = 1, ∫ *tK*(*t*)*dt* = 0, and ∫ *t*^{2}*K*(*t*)*dt* < ∞.

In the general SIMEX algorithm, the simulation and estimation steps are repeated a large number of times, and the average value of the estimators for each level of contamination is calculated by,

$$\widehat{G}(x)=\frac{1}{B}\sum _{b=1}^{B}{\widehat{G}}_{b}(x)=\frac{1}{B}\sum _{b=1}^{B}\left(\frac{1}{n}\sum _{j=1}^{n}L\left(\frac{x-{Y}_{jb}}{h}\right)\right).$$

(5)

However, it is noted that *U _{jb}* =

$$\frac{1}{B}\sum _{b=1}^{B}L\left(\frac{x-{Y}_{jb}}{h}\right)=\frac{1}{B}\sum _{b=1}^{B}L\left(\frac{x-{Y}_{j}-{\lambda}^{1/2}{U}_{jb}}{h}\right)=\frac{1}{B}\sum _{b=1}^{B}L\left(\frac{{\scriptstyle \frac{x-{Y}_{j}}{{\sigma}_{j}{\lambda}^{1/2}}}-{Z}_{jb}}{{\stackrel{\sim}{h}}_{j}}\right).$$

Notice that the smooth distribution estimator is asymptotically unbiased and has the same variance as the EDF. It uniformly converges to the true distribution function with probability one (Nadaraya, 1964). Hence, with a fixed *j*, conditional on *Y _{j}*,

$$\frac{1}{B}\sum _{b=1}^{B}L\left(\frac{x-{Y}_{jb}}{h}\right)\stackrel{\mathcal{D}}{\to}\mathrm{\Phi}\left(\frac{x-{Y}_{j}}{{\sigma}_{j}{\lambda}^{1/2}}\right),$$

where Φ(·) denotes the distribution function of the standard normal distribution.

Therefore, the simulation step can be bypassed in the SIMEX algorithm for the distribution estimation. We use *Ĝ** in (6) to replace *Ĝ* in (5) for our estimation,

$${\widehat{G}}^{\ast}(x,\lambda )\triangleq \frac{1}{n}\sum _{j=1}^{n}\mathrm{\Phi}\left(\frac{x-{Y}_{j}}{{\sigma}_{j}{\lambda}^{1/2}}\right).$$

(6)

The above equation has some similarity with equation (3) in Stefanski and Bay (1996). We further calculate the quantity in (6) for a pre-determined sequence of *λ*, i.e. 0 ≤ *λ*_{1} < *λ*_{2} < ··· < *λ _{l}*. The success of SIMEX technique depends on the fact that the expectation of

$$\text{E}({\widehat{G}}^{\ast}(x,\lambda ))\approx {\beta}_{0}+{\beta}_{1}\lambda +{\beta}_{2}{\lambda}^{2}.$$

It is indeed a good approximation when
$max\{{\sigma}_{j}^{2}\}$ is not very large. The SIMEX estimator for the unknown distribution function *F _{X}* without measurement error can be obtained by the extrapolation step,

$${\widehat{F}}_{X,\mathit{SIMEX}}(x)={\widehat{\beta}}_{0}-{\widehat{\beta}}_{1}+{\widehat{\beta}}_{2}.$$

(7)

The following proposition shows that, in the case of heteroscedastic Gaussian errors, our estimator (7) is an asymptotically unbiased estimator for the unknown distribution function *F _{X}*, if the extrapolant is exact.

Assume (a) the polynomial extrapolant is exact, (b) the distribution function F_{X}(x) of unobserved X_{1}, ···, X_{n} has continuous fourth derivative, and (c)
${\sigma}_{j}^{2}<{C}_{0}$ for all j and C_{0} < ∞. Then, the estimator (7) is an asymptotically unbiased estimator of F_{X}(x). The corresponding asymptotic variance is F_{X}(x)(1 − F_{X}(x))/n, which can be consistently estimated by _{X,SIMEX}(x)(1 − _{X,SIMEX}(x))/n.

Under realistic applications, the assumption (a) in proposition 1 will only be approximately true. See for instance Carroll et al. (1999); Staudenmayer and Ruppert (2004). As we show in the equation (8) in Appendix, the expectation of Ĝ*(x, λ) is a quadratic function of λ at a given x plus the approximation error term, $o\left({\scriptstyle \frac{1}{n}}{\sum}_{j=1}^{n}{\sigma}_{j}^{4}\right)$. The success of the SIMEX method in practice depends on the fact that E[Ĝ*(x, λ)] can well-approximated by a quadratic function of λ.

From proposition 1, the estimated asymptotic pointwise confidence bands when data are contaminated with heteroscedastic Gaussian errors have the form

$$\begin{array}{l}I(x)=({\widehat{F}}_{X,\mathit{SIMEX}}(x)-c\sqrt{{\widehat{F}}_{X,\mathit{SIMEX}}(x)(1-{\widehat{F}}_{X,\mathit{SIMEX}}(x))/n},\\ {\widehat{F}}_{X,\mathit{SIMEX}}(x)+c\sqrt{{\widehat{F}}_{X,\mathit{SIMEX}}(x)(1-{\widehat{F}}_{X,\mathit{SIMEX}}(x))/n}),\end{array}$$

where c is chosen as the (1− *α*/2) quantile of the standard normal distribution.

Due to the heterogeneity of measurement errors, the variance estimation method in Stefanski and Bay (1996) can not be applied to our proposed estimator _{X,SIMEX}(x). However, the variance of _{X,SIMEX}(x) can be well approximated by the variance of Ĝ*(x, λ) as λ → −1. The latter can then be approximately consistently estimated. The naive confidence bands therefore can be constructed. The confidence bands based on the aforementioned variance estimation are simulated. It almost coincides the nonparametric bootstrap confidence band. This variance estimation of the distribution function can also be applied to the case of homoscedastic measurement errors.

In the SIMEX distribution estimation, it is possible that the estimated values are out of [0, 1] in tail regions. This situation is similar to many classes of kernel methods, such as wavelet density estimators, sinc kernel estimators, and spline estimators. The disadvantage will not affect the global performance of our estimation. A simple correction version of our SIMEX estimator in practice is

$${\widehat{F}}_{X,\mathit{SIMEX}}^{\ast}(x)=\{\begin{array}{ll}0\hfill & :\phantom{\rule{0.38889em}{0ex}}{\widehat{F}}_{X,\mathit{SIMEX}}(x)<0\hfill \\ {\widehat{F}}_{X,\mathit{SIMEX}}(x)\hfill & :\phantom{\rule{0.38889em}{0ex}}0\le {\widehat{F}}_{X,\mathit{SIMEX}}(x)\le 1.\hfill \\ 1\hfill & :\phantom{\rule{0.38889em}{0ex}}1<{\widehat{F}}_{X,\mathit{SIMEX}}(t)\hfill \end{array}$$

In the Fourier-type method, we consider a modified “plug-in” approach to select the bandwidth *h _{n}*. Fan (1992) proposed a plug-in asymptotical bandwidth for the density estimation with homoscedastic errors. Delaigle and Gijbels (2004) suggested a “normal reference” approach and Hall and Lahiri (2008) discussed it for distribution estimation with homoscedastic errors. Here we adopt Hall and Lahiri’s (2008) “normal reference” bandwidth approach for the case of heteroscedastic errors. Specifically, we shall temporarily take

The SIMEX estimator requires specification of *λ*_{1}, ···, *λ _{l}*. They are not the smoothing parameters in the sense of the conventional kernel estimation. They act as “design points” of our estimator, which is similar to the knots in a spline (Kooperberg and Stone, 1992). So, the choice of

$${\overline{\sigma}}_{U}{\lambda}_{1}^{1/2}={c}_{1}{\widehat{h}}_{\mathit{rot},Y},$$

where *ĥ _{rot,Y}* is the Silverman’s rule-of-thumb bandwidth (Silverman, 1986) based on the observed data

We now investigate the finite sample performances of the Fourier-type estimator and the SIMEX estimator via a simulation study. Our study involves three types of target distributions: (1) *X* ~ *N*(0, 1), (2) *X* ~ 0.5 *N*(−3, 1) + 0.5 *N*(3, 1), and (3) *X* ~ Γ(2, 1). From each of these distributions, 500 samples of size *n* = 50, 100, and 500 are generated, each of which is then contaminated by heteroscedastic errors. The measurement errors are generated from
$N(0,{\sigma}_{j}^{2})$, where *σ _{j}* (

To assess the quality of the smooth distribution function estimators, we use the *integrated squared error* (ISE) criterion:

$$\text{ISE}(\widehat{F}(x))=\int {\{\widehat{F}(x)-F(x)\}}^{2}dx.$$

We compare four estimators in the study: the Fourier-type estimator; the SIMEX estimator; the naive estimator, *i.e.* the smooth distribution estimator of *Y* where measurement errors are ignored; and the smooth distribution estimator from the uncontaminated sample *X*.

Table 1 summarizes the results of the average of the 500 ISEs for different estimators under different simulation conditions. The simulation results show that both the Fourier-type estimator and the SIMEX estimator perform much better than the naive estimator in terms of the ISE criterion. The ISEs for the Fourier-type method and the ISEs for the SIMEX method become smaller and closer to the ISEs from the uncontaminated sample when sample sizes become larger. We also note that the ISEs are larger at the case of *σ _{U}* ~

Figure 1 allows us to display and compare estimated curves visually. In the upper plot, the true distribution is standard normal, *N*(0, 1). The measurement errors are generated from
$N(0,{\sigma}_{j}^{2})$ and *σ _{j}* ~

Distribution estimation for data contaminated with heteroscedastic errors: In the upper plot, the true distribution is standard normal, *N*(0, 1). The measurement errors are from
$N(0,{\sigma}_{j}^{2})$ and *σ*_{j} ~ *U*(0.4, 0.6), *j* = 1, ···, **...**

To investigate the performance of the estimated asymptotic confidence bands with the SIMEX method derived in the last section, we compare them with the nonparametric bootstrap confidence bands and the estimated confidence bands from the uncontaminated sample *X*. Figure 2 shows a simulated example of the SIMEX estimate and three different confidence bands of the distribution function *F*(*x*). The true distribution is a normal mixture 0.5*N*(−3, 1) + 0.5*N*(3, 1) and the measurement errors are generated from
$N(0,{\sigma}_{j}^{2})$ and *σ _{j}* ~

*Spinal cord injury* (SCI) is damage to the spinal cord often due to traumatic accident, resulting in upper and lower motor neuron lesions. It typically leads to paralysis and loss of sensation in parts of the body controlled through the spinal cord below the level where the injury occurred. All individuals with SCI, and particularly those with complete lesions, are considered to be at high risk of *pressure ulcer* development throughout their lifetime. Pressure ulcers are areas of damaged skin and tissue that develop when sustained pressure occurs. Traditionally, techniques to reduce pressure ulcer incidence have focused on reducing extrinsic risk factors. These techniques include providing cushions to improve pressure distribution. Another approach is educating individuals on the importance of regular pressure relief procedures. *Neuro-muscular electrical stimulation* (NMES) is the application of electrical stimuli to a group of muscles, which is a new clinic tool for pressure ulcers to produce beneficial changes at the user/support system interface by altering the intrinsic characteristics of the user’s paralyzed tissue itself (Bogie et al., 2006, 2008).

The primary goal of the NMES study at Cleveland FES center was to investigate the distribution of pressure intensities for each patient under different clinical conditions such as before and after NMES treatment. Pressure intensity data at the seating interface for each patient were recorded by using the Tekscan advanced clinseat pressure mapping system (Tekscan, Inc.). The seating interface was divided as a 48 × 42 matrix. The pressure intensity of each element was measured simultaneously. Figure 3 displays an example of pressure intensity data in the NMES study. Pressure intensities at the seating area for one subject correspond to color-scale rectangular segments in the image. The color bar indicates the mapping from data values to colors. However, the outcomes of pressure intensities are subject to measurement errors. Measurements were taken at several different times for a patient, resulting in replicate measurements of pressure intensity maps. It is reasonable to assume that the observable intensities are contaminated with heteroscedastic Gaussian errors at the seating area. From the replicate measurement data, we are able to estimate the heteroscedastic variances of measurement errors for each active location. Hence, the fundamental statistical question is how to estimate the unobserved distribution of pressure intensities from the data contaminated with errors. In this example, our pressure data contain total 1518 activated observations. The pressure intensities with measurement errors have mean 42.99 and standard deviation 16.73. The range of the observed intensities is from 11.0 to 120.0. The heteroscedastic standard deviations of measurement errors vary from 6.32 to 8.01.

An example of pressure intensity data at the seating interface for one subject: each pressure intensity at the seating area corresponds to a color-scale rectangular segment in the image. The color bar indicates the mapping from intensity values to colors. **...**

We conduct the analysis using both the Fourier-type and the SIMEX methods for the data from the NMES study. Figure 4 displays the smooth distribution estimation of pressure intensities in the single case study. The solid line is the estimated distribution function by the SIMEX method and the dotted line is the estimate by the Fourier-type method, while the dashed line is the estimated distribution function by the naive method where measurement errors are ignored. The recovered function shows asymmetric features. Both the Fourier-type estimator and the SIMEX estimator obtain coincident results and there is only a slight difference at the left tails. The curve for the naive estimator does not fall within the confidence region of the SIMEX estimator. The example demonstrates that correcting the measurement errors is critical in the statistical analysis. After recovering the distribution function, we will be able to make further statistical inferences, such as comparing two recovered distribution functions under different clinical treatments.

We studied two bias-correction methods to estimate smooth distribution function for the data contaminated with heteroscedastic errors. The Fourier-type method was generalized from the conventional deconvolution kernel method (Hall and Lahiri, 2008), which can be applied to any arbitrary error distributions. The SIMEX method, extending the Stefanski and Bay’s (1996) work, was easy to be implemented and computationally fast due to the fact that the simulation step was bypassed. As shown in the simulation results, both methods allowed us to recover accurately the true distribution function from a contaminated sample when the sample size was large. However, the SIMEX method worked better than the Fourier-type method when sample size was small and the variances of measurement errors were large. The asymptotic variance of the SIMEX estimator was obtained and the simulation showed that the native estimate of the asymptotic confidence bands performed very well. Thus, applying the SIMEX method in real applications would be attractive.

We addressed in section 2 that the success of the SIMEX method depended on the fact that the expectation of *Ĝ**(*x, λ*) was well-approximated by a quadratic function of *λ* for a small
${\scriptstyle \frac{1}{n}}{\sum}_{j=1}^{n}{\sigma}_{j}^{4}$. How large should the error level be for the SIMEX method to be feasible? With a moderate sample size, our simulations suggested that max(*σ _{j}*) < 1 could work adequately well for the three special cases considered here. For other cases, more simulations and study would be needed in order to answer the above question.

The SIMEX method we discussed only dealt with the case of the Gaussian errors, while the Fourier-type method could work with a large classes of errors including both super-smoothed errors and ordinary-smoothed errors as defined by Fan (1991). Indeed, the SIMEX method could also be applied to the case of exponential errors. Sometimes, measurement errors only could be positive in medical applications. An exponential distribution, as a non-zero mean, skewed distribution, was a common assumption in those studies (Ballico, 2001; Savin, 2000). We noticed that, however, the SIMEX estimator in the case of exponential errors was not asymptotically unbiased even if the polynomial extrapolant was exact. A natural idea was to apply an extra smoothing step (*i.e.* smoothing splines) on * _{X,SIMEX}*(

As one of reviewers pointed out, it would be interesting to compare the performances of our estimators with (homoscedastic) estimators that heteroscedastic measurement error variances were replaced by a constant measurement error variance (the average of heteroscedastic variances). Our preliminary simulation showed that the degree of variation of heteroscedastic measurement error variances affected the estimators using the constant variance. A more comprehensive study of the model misspecification problem for both density and distribution estimation could be done in our future research.

The research of the confidence bands for the Fourier-type method remained as an open problem. Bissantz et al.’s (2007) nonparametric confidence intervals in deconvolution density estimation was relevant where they only focused on the homoscedastic and ordinary-smoothed errors.

We are grateful to the reviewers and the associate editor for their valuable comments. We thank Dr. Kath Bogie from the Cleveland FES center and Case Western Reserve University for making NMES clinical data available for this analysis. The research of Xiao-Feng Wang is supported in part by the NIH Clinical and Translational Science Awards. The research of Zhaozhi Fan is supported in part by the Natural Sciences and Engineering Research Council of Canada.

Under the regularity conditions (1) ~ (4), for *x*_{0} (−∞,∞) with specifically chosen sequence of numbers *M _{n}* → ∞ and bandwidth

$$\begin{array}{l}\mathit{Bias}[{\widehat{F}}_{n,\mathit{Fourier}}({x}_{0})]=E[{\widehat{F}}_{n,\mathit{Fourier}}({x}_{0})]-F({x}_{0})\\ ={\int}_{-{M}_{n}}^{{x}_{0}}E[\widehat{f}(x)]dx-f({x}_{0})\\ ={\int}_{-{M}_{n}}^{{x}_{0}}{\int}_{-\infty}^{\infty}f(x-y)\frac{1}{{h}_{n}}K(\frac{y}{{h}_{n}})\mathit{dydx}-F({x}_{0})\\ ={\int}_{-\infty}^{\infty}\frac{1}{{h}_{n}}[F({x}_{0}-y)-F(-{M}_{n}-y)]K(\frac{y}{{h}_{n}})dy-F({x}_{0}).\end{array}$$

The norm of the bias is then

$$\begin{array}{l}\left|\mathit{Bias}[{\widehat{F}}_{n,\mathit{Fourier}}({x}_{0})]\right|\le \left|{\int}_{-\infty}^{\infty}\frac{1}{{h}_{n}}F({x}_{0}-y)K(\frac{y}{{h}_{n}})dy-F({x}_{0})\right|\\ +{\int}_{-\infty}^{\infty}F(-{M}_{n}-{h}_{n}y)K(y)dy.\end{array}$$

It is uniformly bounded from above by

$$O({h}_{n}^{m+\alpha +1})+O(F(-{M}_{n}(1-{h}_{n})))+O({M}_{n}^{-m-1}).$$

On the other hand, the variance of * _{n,Fourier}*(

$$\begin{array}{l}\mathit{Var}[{\widehat{F}}_{n,\mathit{Fourier}}({x}_{0})]=\sum _{j=1}^{n}\frac{1}{{(2\pi )}^{2}}\mathit{Var}\left[{\int}_{-{M}_{n}}^{{x}_{0}}{\int}_{-\infty}^{\infty}{e}^{-\mathit{itx}}{\phi}_{K}(t\xb7{h}_{n}){e}^{it{Y}_{j}}{\psi}_{j}(t)\mathit{dtdx}\right]\\ \le ({M}_{n}+\mid {x}_{0}\mid )\sum _{j=1}^{n}\frac{1}{{(2\pi )}^{2}}{\int}_{-\infty}^{\infty}\mid {\phi}_{K}(t\xb7{h}_{n}){\mid}^{2}\mid {\psi}_{j}(t){\mid}^{2}dt\\ =\frac{{M}_{n}+\mid {x}_{0}\mid}{4{\pi}^{2}}{\int}_{-\infty}^{\infty}\mid {\phi}_{K}(t\xb7{h}_{n}){\mid}^{2}{\left[\sum _{k=1}^{n}\mid {\phi}_{{U}_{k}}(t){\mid}^{2}\right]}^{-1}dt\\ =\frac{{M}_{n}+\mid {x}_{0}\mid}{4{\pi}^{2}}{\int}_{-1/{h}_{n}}^{1/{h}_{n}}\mid {\phi}_{K}(t\xb7{h}_{n}){\mid}^{2}{\left[\sum _{k=1}^{n}\mid {\phi}_{{U}_{k}}(t){\mid}^{2}\right]}^{-1}dt\\ =\frac{{M}_{n}+\mid {x}_{0}\mid}{2{\pi}^{2}}{\int}_{0}^{1/h{}_{n}}\mid {\phi}_{K}(t\xb7{h}_{n}){\mid}^{2}{\left[\sum _{k=1}^{n}\mid {\phi}_{{U}_{k}}(t){\mid}^{2}\right]}^{-1}dt\\ =\frac{{M}_{n}+\mid {x}_{0}\mid}{2{\pi}^{2}}{\int}_{0}^{1}\mid {\phi}_{K}(s){\mid}^{2}{\left[\sum _{k=1}^{n}\mid {\phi}_{{U}_{k}}(s/{h}_{n}){\mid}^{2}\right]}^{-1}{h}_{n}^{-1}ds,\end{array}$$

which is uniformly bounded from above by

$$O\left({M}_{n}\xb7{\left[{h}_{n}\sum _{k=1}^{n}\mid {\phi}_{{U}_{k}}(1/{h}_{n}){\mid}^{2}\right]}^{-1}\right).$$

Thus, the conclusion follows.

Under the conditions (a) ~ (c),

$$\begin{array}{l}E\left[{\widehat{G}}^{\ast}(x,\lambda )\right]=\frac{1}{n}\sum _{j=1}^{n}E\left[\mathrm{\Phi}\left(\frac{x-{Y}_{j}}{{\sigma}_{j}{\lambda}^{1/2}}\right)\right]=\frac{1}{n}\sum _{j=1}^{n}E\left[\mathrm{\Phi}\left(\frac{x-{X}_{j}-{\sigma}_{j}{Z}_{j}}{{\sigma}_{j}{\lambda}^{1/2}}\right)\right]\\ =\frac{1}{n}\sum _{j=1}^{n}\int \int \left[\mathrm{\Phi}\left(\frac{x-u-{\sigma}_{j}z}{{\sigma}_{j}{\lambda}^{1/2}}\right)\right]f(u)\phi (z)\mathit{dudz}\\ =\frac{1}{n}\sum _{j=1}^{n}\int \left[-\int \mathrm{\Phi}(v)dF(x-{\sigma}_{j}{\lambda}^{1/2}v-{\sigma}_{j}z)\right]\phi (z)dz\\ =\frac{1}{n}\sum _{j=1}^{n}\int \int F(x-{\sigma}_{j}{\lambda}^{1/2}v-{\sigma}_{j}z)\phi (v)\phi (z)\mathit{dudz}\\ =\frac{1}{n}\sum _{j=1}^{n}E[F(x-{(1+\lambda )}^{1/2}{\sigma}_{j}Z)]\end{array}$$

where *Z* is a standard normal random variable. By Lebesgue dominated convergence theorem, lim_{λ}_{→−1} *E* [*Ĝ**(*x, λ*)]= *F*(*x*).

Using Taylor expansion, if, $max({\sigma}_{j}^{4})$ is small, the above expectation can be written as

$$\begin{array}{l}E\left[{\widehat{G}}^{\ast}(x,\lambda )\right]=F(x)+\frac{1}{2n}{F}^{\u2033}(x)(\lambda +1)\sum _{j=1}^{n}{\sigma}_{j}^{2}\\ +\frac{3}{4!n}{F}^{(4)}(x){(\lambda +1)}^{2}\sum _{j=1}^{n}{\sigma}_{j}^{4}+o\left(\frac{1}{n}\sum _{j=1}^{n}{\sigma}_{j}^{4}\right).\end{array}$$

(8)

Hence the expectation of *Ĝ**(*x, λ*) is well-approximated by a quadratic function of *λ* at a given *x* with the approximation error
$o\left({\scriptstyle \frac{1}{n}}{\sum}_{j=1}^{n}{\sigma}_{j}^{4}\right)$. *i.e. E*[*Ĝ**(*x, λ*)] ≈ *β*_{0} + *β*_{1}*λ* + *β*_{2}*λ*^{2}.

Therefore,

$$\begin{array}{l}E\left[\widehat{F}(x,\lambda )\right]=E\left[{\widehat{\beta}}_{0}-{\widehat{\beta}}_{1}+{\widehat{\beta}}^{2}\right]=\underset{\lambda \to -1}{lim}E\left[{\widehat{\beta}}_{0}-{\widehat{\beta}}_{1}\lambda +{\widehat{\beta}}_{2}{\lambda}^{2}\right]\\ \approx \underset{\lambda \to -1}{lim}E\left[{\widehat{G}}^{\ast}(x,\lambda )\right]=F(x).\end{array}$$

By analogous argument as above, the variance of *Ĝ**(*x, λ*) is

$$\begin{array}{l}\mathit{Var}({\widehat{G}}^{\ast}(t,\lambda ))=\frac{1}{{n}^{2}}\sum _{j=1}^{n}\mathit{Var}(\mathrm{\Phi}(\frac{x-{Y}_{j}}{{\lambda}^{1/2}{\sigma}_{j}}))\\ =\frac{1}{{n}^{2}}\sum _{j=1}^{n}[E\{2\mathrm{\Phi}(V)F[x-{(1+\lambda )}^{1/2}{\sigma}_{j}Z]\}\\ -\{E\mid F(x-{(1+\lambda )}^{1/2}{\sigma}_{j}Z)]{\}}^{2}]\end{array}$$

where *V* is a standard normal random variable,

$$Z=\frac{V+{\lambda}^{1/2}W}{{(1+\lambda )}^{1/2}},$$

which has standard normal distribution. *W* is a standard normal random variable, independent of *V*.

By Lebesgue dominated convergence theorem,

$$\underset{\lambda \to -1}{lim}\mathit{Var}\left[{\widehat{G}}^{\ast}(x,\lambda )\right]=F(x)(1-F(x))/n,$$

thus, it can then be estimated by * _{X,SIMEX}*(

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

- Azzalini A. A note on the estimation of a distribution function and quantiles by a kernel method. Biometrika. 1981;68:326–328.
- Ballico M. Calculation of key comparison reference values in the presence of non-zero-mean uncertainty distributions, using the maximum-likelihood technique. Metrologia. 2001;38:155–159.
- Bogie KM, Wang X, Fei B, Sun J. New technique for real-time interface pressure analysis: Getting more out of large image data sets. Journal of Rehabilitation Research and Development. 2008;45 (4):523–536. [PMC free article] [PubMed]
- Bogie KM, Wang X, Triolo RJ. Long-term prevention of pressure ulcers in high-risk patients: a single case study of the use of gluteal neuromuscular electric stimulation. Arch Phys Med Rehabil. 2006;87 (4):585–591. [PubMed]
- Bowman A, Hall P, Prvan T. Bandwidth selection for the smoothing of distribution functions. Biometrika. 1998;85:799–808.
- Carroll RJ, Hall P. Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Associations. 1989;83:1184–1186.
- Carroll RJ, Maca J, Ruppert D. Nonparametric regression in the presence of measurement error. Biometrika. 1999;86:541–554.
- Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu C. Measurement Error in Nonlinear Models: A Modern Perspective. 2. Chapman Hall; New York: 2006.
- Cheng CL, Riu J. On estimating linear relationships when both variables are subject to heteroscedastic measurement errors. Technometrics. 2006;48:511–519.
- Cook JR, Stefanski LA. Simulation extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association. 1994;89:1314–1328.
- Cordy C, Thomas D. Deconvolution of a distribution function. Journal of the American Statistical Association. 1996;92:1459–1465.
- Delaigle A, Gijbels I. Practical bandwidth selection in deconvolution kernel density estimation. Computational Statistics and Data Analysis. 2004;45:249–267.
- Delaigle A, Meister A. Density estimation with heteroscedastic error. Bernoulli. 2008;14:562–579.
- Fan J. On the optimal rates of convergence for nonparametric deconvolution problems. The Annals of Statistics. 1991;19:1257–1272.
- Fan J. Deconvolution with supersmooth distributions. The Canadian Journal of Statistics. 1992;20:155–169.
- Fuller WA. Measurement Error Models. John Wiley & Sons; New York: 1987.
- Hall P, Lahiri SN. Estimation of distributions, moments and quantiles in deconvolution problems. Annals of Statistics. 2008;36 (5):2110–2134.
- Kooperberg C, Stone C. Logspline density estimation for censored data. Journal of Computational and Graphical Statistics. 1992;1:301–328.
- Kulathinal SB, Kuulasmaa K, Gasbarra D. Estimation of an errors-in-variables regression model when the variances of the measurement errors vary between the observations. Statistics in Medicine. 2002;21:1089–1101. [PubMed]
- Nadaraya EA. Some new estimates for distribution functions. Theory of Probability and its Applications. 1964;9:497–500.
- Nusser SM, Carriquiry AL, Dodd KW, Fuller WA. A semi-parametric transformation approach to estimating usual daily intake distributions. Journal of the American Statistical Association. 1996;91 (436):1440–1449.
- Reiss RD. Nonparametric estimation of smooth distribution functions. Scandinavian Journal of Statistics. 1981;8:116–119.
- Sarda P. Smoothing parameter selection for smooth distribution functions. Journal of Statistical Planning and Inference. 1993;35:65–75.
- Savin SK. Reliability of parameter control with exponential error. Measurement Techniques. 2000;43:329–334.
- Silverman BW. Density Estimation. Chapman and Hall; London: 1986.
- Staudenmayer J, Ruppert D. Local polynomial regression and simulation-extrapolation. Journal of the Royal Statistical Society, Series B. 2004;66:17–30.
- Staudenmayer J, Ruppert D, Buonaccorsi J. Density estimation in the presence of heteroskedastic measurement error. Journal of the American Statistical Association. 2008;103:726–736.
- Stefanski LA, Bay JM. Simulation extrapolation deconvolution of finite population cumulative distribution function estimators. Biometrika. 1996;83:407–417.
- Stefanski LA, Carroll RJ. Deconvoluting kernel density estimators. Statistics. 1990;21:169–184.
- Stefanski LA, Cook JR. Simulation extrapolation: the measurement error jackknife. Journal of the American Statistical Association. 1995;90:1247–1256.
- Sun J, Morrison H, Harding P, Woodroofe M. Density and mixture estimation from data with measurement errors. 2002. Technical Report, http://sun.cwru.edu/~jiayang/india.ps.
- Zhang CH. Fourier methods for estimating mixing densities and distributions. The Annals of Statistics. 1990;18:806–830.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |