|Home | About | Journals | Submit | Contact Us | Français|
Association studies in environmental statistics often involve exposure and outcome data that are misaligned in space. A common strategy is to employ a spatial model such as universal kriging to predict exposures at locations with outcome data and then estimate a regression parameter of interest using the predicted exposures. This results in measurement error because the predicted exposures do not correspond exactly to the true values. We characterize the measurement error by decomposing it into Berkson-like and classical-like components. One correction approach is the parametric bootstrap, which is effective but computationally intensive since it requires solving a nonlinear optimization problem for the exposure model parameters in each bootstrap sample. We propose a less computationally intensive alternative termed the “parameter bootstrap” that only requires solving one nonlinear optimization problem, and we also compare bootstrap methods to other recently proposed methods. We illustrate our methodology in simulations and with publicly available data from the Environmental Protection Agency.
A challenge for association studies in environmental statistics is that we cannot directly measure the exposure at every location where there is outcome data. Modern Geographic Information System (GIS) technology makes it feasible to sample environmental exposures and then to predict exposures at unmonitored locations using a statistical model such as universal kriging that exploits dependence on GIS covariates and incorporates spatial smoothing (Cressie, 1993). The overall strategy is to use predicted exposures in place of the true exposures at locations with outcome data in order to estimate the parameter of interest in a regression model. The problem that we address in this paper is how to ensure valid inference in light of the resulting measurement error.
An example application in environmental epidemiology is evaluating the relationship between exposure to ambient air pollution and adverse health outcomes. Many studies have documented adverse effects of air pollution e.g. (Dockery and others, 1993; Samet2000; Pope2002), and recent studies emphasize the importance of using predicted individual air pollution exposures to account for spatial variability within urban areas (Jerrett and others, 2005b; Kunzli2005; Gryparis2007; Szpiro2009). Other environmental applications that do not involve human health effects are analogous from a statistical perspective. An example that we will return to later in this paper involves assessing the relationship between stream water quality and nearby watershed land cover (Madsen and others, 2008; Herlihy1998).
Various methods have been employed for predicting exposures, including nearest neighbor interpolation (Miller and others, 2007), regression based on GIS covariates (Brauer and others, 2003; Jerrett2005a), interpolation by a geostatistical method such as kriging (Jerrett and others, 2005b; Kunzli2005), and semi-parametric smoothing (Gryparis and others, 2007; Kunzli2005). All these methods result in measurement error that does not fit into the standard categories of classical or Berkson error (Carroll and others, 2006). In this paper, we focus on universal kriging.
Kim and others (2009) have shown that using predicted exposures from kriging performs better than nearest neighbor interpolation but significant errors may remain resulting in confidence intervals that do not provide correct coverage. Gryparis and others (2009) review the relevant measurement error literature and compare several correction strategies in a simulation study, and Madsen and others (2008) apply a version of the parametric bootstrap to obtain corrected standard errors (SEs).
The parametric bootstrap is effective, but it is computationally intensive since it requires solving a nonlinear optimization problem to estimate the exposure model parameters in each bootstrap sample. For a universal kriging exposure model with 450 monitors (as in the examples considered here), each nonlinear optimization takes 30–60 s on an Intel Xeon processor running at 2.33 GHz, so a parametric bootstrap with only 100 samples would take approximately 1 h. This is uncomfortably long for routine usage, but it is feasible if the bootstrap is employed judiciously. If we consider, instead, a more complex spatiotemporal model of the kind being used in modern air pollution studies (Szpiro and others, 2010), the time required for a single optimization is an hour or more, so a full parametric bootstrap is essentially impractical unless its use is restricted to a very limited number of definitive analyses.
We describe a new method termed the “parameter bootstrap” that is a less computationally demanding approximation to the parametric bootstrap. The parameter bootstrap is consistent with a decomposition of the measurement error into 2 approximately independent components, one of which is similar to Berkson error (“Berkson-like”) and the other of which is similar to classical measurement error (“classical-like”). We develop our methodology in a setting where we use universal kriging to predict the exposure and where we model the association of interest with linear regression, including the possibility of spatially correlated residuals. The methodology extends easily to more complex spatiotemporal exposure models that generalize universal kriging (Banerjee and others, 2004), (Szpiro and others, 2010).
In Section 2, we introduce notation and formally set out the problem. In Section 3, we characterize the measurement error by decomposing it into Berkson-like and classical-like components, and in Section 4, we define the parametric and parameter bootstraps and briefly review 2 alternative strategies that have been proposed in recently published papers. In Section 5, we illustrate our methodology in a simulation study and compare it to other methods, and in Section 6, we consider an example with publicly available stream data from the Environmental Protection Agency (EPA). We conclude in Section 7 with a discussion.
Consider an association study with the N×1 vector of observed outcomes Y, N×1 vector of exposures X, and N×m matrix of covariates Z. Assume a linear regression model
with regression coefficient of interest βX. Assume that ε is an N×1 random vector distributed as N(0,Σε(θε)), for a positive-definite matrix function Σε(·) and unknown parameter θε.
Inference for βX would be straightforward if X, Y, and Z were all observed. We would estimate by ordinary least squares (OLS) and then estimate from the residuals and use a sandwich-based SE (Liang and Zeger, 1986). If the ε are independent, and we estimate by the method-of-moments, the sandwich form reduces to the classical SE estimate.
We are interested in the situation where Y and Z are observed, but instead of X we observe the N*×1 vector X* of exposures at different locations. N* is the number of exposure monitors. Assume that X and X* are jointly distributed as
In this expression, S and S* are known N×k and N*×k dimensional matrices of GIS covariates, α is an unknown k×1 vector of coefficients, and
independent of ε, for a positive-definite matrix function Σ(ηη*)(·) and unknown parameter θη. It is useful to introduce the decomposition
Universal kriging is a special case if θη comprises the range, partial sill, and nugget parameters from a geostatistical model (Cressie, 1993).
Although the exposure X is not observed directly, we can exploit the observed values X* and the spatial model in (2.2) to estimate as follows. First, we estimate the exposure model parameters and based on X* by maximum likelihood or another nonlinear optimization approach, and then we define the estimated exposure by
Since we are interested in frequentist sampling properties of an estimator for βX, we take care to specify the assumed data-generating mechanism. All the geographic locations are fixed and known, as are the corresponding GIS covariates S and S* and any covariates Z in the outcome model. The regression coefficients β0, βX, βZ, and α and variance parameters θε and θη are all fixed but unknown. A realization from the data-generating mechanism is obtained by drawing from the joint distribution of ε, η, and η*.
If we ignore the measurement error from using W in place of X, we can derive naïve SEs by the procedure described at the beginning of Section 2. However, these SEs are based on the assumption that all sampling variability in is induced by ε, and they ignore the additional sampling variability from U = X − W that is induced by η and η*. Therefore, naïve SEs will typically not estimate the true sampling variability of . In addition to altering the SEs, the measurement error may introduce bias.
We decompose the measurement error into 2 components
where the Berkson-like component is
and the classical-like component is
The Berkson-like component UBL accounts for variability from η and η*, conditional on known exposure model parameters, and the classical-like component UCL incorporates additional variability from η* in estimating the exposure model parameters. Both of these components change the sampling variance of , and the classical-like component can also introduce bias.
Assume that the exposure model parameters α and θη are known so that UBL is the only source of measurement error. A primary feature of Berkson error is that it has mean zero conditional on the estimated exposure W p. 9 (Carroll and others, 2006). With known exposure model parameters, it is easy to see that this holds for UBL
The second line holds since W is deterministic conditional on X*, the third line is the definition of UBL, and the final line holds since α and θη are the parameters in the data-generating mechanism for X.
Since we can rewrite (2.1) in the form
it is easy to see that derived by OLS with W in place of X is unbiased for estimating βX. We verify this by conditioning on W, exploiting the fact that E(UBL|W) = 0, and then taking the expectation of over the sampling distribution of W. As in the case of Berkson error, the effect of UBL is to make W less variable than the true exposure X, effectively adding to the variance of the noise in the outcome model and resulting in increased variability of .
It is tempting to carry the Berkson analogy further and argue that we can derive valid SEs by accounting for the correlation in the new noise term ε′ = UBLβX + ε, using either generalized least squares or the sandwich estimator, as would be appropriate for Berkson error with a nondiagonal covariance p. 90 (Gryparis and others, 2009), (Szpiro and others, 2008), (Carroll and others, 2006). This reasoning is not completely correct, however, because it is based on treating W as fixed.
A primary feature of classical measurement error is that it increases the variability of W relative to X, introducing variation that is not correlated with the outcome Y p. 28 (Carroll and others, 2006). UCL is analogous since it comprises the error from estimating the exposure model parameters, which introduces variability that is not informative for Y. Strictly speaking, UCL is not independent of Y since and are derived from X* which is correlated with X. It is also not independent across locations. Therefore, we emphasize that UCL is similar to classical measurement error but also distinct in important ways, so we cannot rely on standard measurement error correction techniques like regression calibration.
Our simulation results in Section 5 suggest that the dominant effect of UCL is to increase the sampling variability of . The bias in our examples is relatively small, but it has some interesting features and for completeness we illustrate bias correction using bootstrap methods and discuss the theoretical properties of bias from this form of classical-like error in the Online Supplement 1 (see supplementary material available at Biostatistics online). One interesting finding is that the bias can be away from the null, rather than toward the null as in the case of standard classical measurement error.
A natural, but computationally intensive, approach to estimating SEs and correcting bias is the parametric bootstrap (Davison and Hinkley, 1997; Madsen2008). The parameter estimate of interest is calculated as in Section 2, and we wish to approximate its sampling distribution under the true data-generating mechanism. We do this by simulating bootstrap samples under our best estimate of the data-generating mechanism and calculating their empirical distribution. Given a set of observations Y and X*, the parametric bootstrap SE based on M bootstrap samples is derived as follows:
Note that in step 3(a), we simulate Xj in order to obtain Yj, but we do not use Xj in the remainder of the procedure. See the Online Supplement 3 (see supplementary material available at Biostatistics online) for additional implementation details.
It is straightforward to use to estimate and correct for bias rather than to derive SEs (Davison and Hinkley, 1997). In principle, we need a nested double bootstrap to rigorously derive a bias-corrected point estimates and corresponding SEs, but such a procedure can require M2 bootstrap samples which is very computationally intensive. Since the bias tends to be small in our examples, we approximate a nested double bootstrap by applying a bias correction and estimating SEs based on the same set of M bootstrap samples, so our SEs do not include the additional variability from bias correction.
The idea of the parameter bootstrap is to decrease the computational burden by eliminating the nonlinear optimization that is repeated M times in step 3(b) above. This is feasible because we can typically obtain an estimate of the sampling distribution for and in step (1) without much additional computation. The procedure differs from the parametric bootstrap in the addition of step 1(a) and modification of step 3(b).
As described in the Online Supplement 3 (see supplementary material available at Biostatistic online), our implementation of step 1(a) uses a Gaussian approximation centered at the maximum likelihood value with covariance based on the estimated Hessian. In the Online Supplement 2 (see supplementary material available at Biostatistic online), we also describe assumptions that underlie validity of the parameter bootstrap.
If we neglect the classical-like error, another alternative is to modify the parameter bootstrap by using and in each bootstrap sample instead of drawing new values from the estimated sampling distribution as in step 3(b). We call this the partial parametric bootstrap. Since the partial parametric bootstrap only accounts for Berkson-like error, we use it to estimate SEs but not for bias correction.
Two alternative methods have been proposed in recently published papers. We describe these approaches briefly and compare them to our proposed bootstrap methodology in the simulation and data examples that follow. For more details, we refer an interested reader to the cited papers.
Gryparis and others (2009) and Madsen and others (2008) propose jointly modeling the exposure and outcome data in order to estimate . Since the joint model is multivariate normal given the parameters, it is possible to write down a joint likelihood and estimate all the model parameters by either maximum likelihood or Bayesian methods. We show results in a subset of our examples from a joint model fit by maximum likelihood.
Another approach proposed by Gryparis and others (2009) is to leave out a subset of the monitoring data for out-of-sample validation and then to use regression calibration to derive bias-corrected effect estimates. This approach is based on a classical measurement error model, which we and they have shown does not hold. It also requires fitting the exposure model with only a subset of the available data. The out-of-sample regression calibration algorithm given by Gryparis and others (2009) is for uncorrelated outcomes, and we implement their algorithm for a subset of our examples with uncorrelated outcomes under the optimistic assumption that 50 additional validation monitors are available.
We conduct a simulation study based on the data we analyze below in Section 6. We use the universal kriging exposure model described above and allow for correlation in the outcome residuals. In a 300×500 box, we randomly select locations for N* = 450 exposure monitors and N = 100 or 2000 outcome measurements. The x and y coordinates are covariates in the universal kriging exposure model with regression coefficients α = ( − 25.95, − 0.0035,0.00084)t, and the spatial correlation has an exponential variogram structure with range η = 24.13, partial sill ψη = 3.76, and nugget τη = 1.34 (Cressie, 1993).
The linear regression model for the outcome conditional on X is (2.1), with β0 = 5.06, β1 = − 0.322, and no additional covariates Z. We consider the case of uncorrelated residuals ε with variance σε2 = 0.76, and we also consider the case of correlated residuals ε following an exponential variogram structure with range ε = 80.39, partial sill ψε = 0.26, and nugget τε = 0.50.
In Table 1, we summarize the results for 2000 Monte Carlo simulations. Due to the computational intensity of the parametric bootstrap, we restrict to 100 Monte Carlo runs for these results. Scatterplots illustrating agreement between parameter bootstrap and parametric bootstrap SEs are shown in Figure 1. With uncorrelated outcomes, naïve SEs that do not correct for measurement error are too small compared to the observed sampling distribution of , resulting in less than nominal coverage for 95% confidence intervals. The parameter bootstrap consistently gives near nominal confidence interval coverage. There is some evidence of over-coverage with the parametric bootstrap, but this may be attributable to the smaller number of simulations. In the correlated outcome model, the coverage for naïve SEs is closer to nominal. This is presumably because much of the Berkson-like error appears as additional correlated variability in the outcome and is accounted for by the sandwich form.
Simulation results for universal kriging exposure surface with range η = 24.13. The columns give the bias and standard deviation (SD) of the estimates, the mean and mode of the estimated SEs, and the coverage for % Wald confidence intervals ...
Scatterplot showing agreement between parametric bootstrap SEs and parameter bootstrap approximation, based on 100 Monte Carlo simulations. Partial parametric bootstrap and uncorrected SEs are also included for comparison.
We also show SEs calculated with the partial parametric bootstrap. In scenarios with N = 100, the partial parametric bootstrap gives similar results to the parameter bootstrap. This suggests that the Berkson-like component of measurement error dominates in this situation. For N = 2000, however, the partial parametric bootstrap SEs are significantly smaller, indicating that the classical-like component is important in that setting.
The bias in is a very small component of the total error in all scenarios, and it is partially corrected by the parameter bootstrap. Due to the small number of parametric bootstrap simulations, it is difficult to confirm that this approach provides an effective bias correction, but given the general agreement with the parameter bootstrap we expect it to perform similarly.
We show results for out-of-sample regression calibration and joint maximum likelihood estimation in Table 1 for the cases with uncorrelated outcomes. We do not include cases with correlated outcomes because the regression calibration method in Gryparis and others (2009) is not applicable, and the maximum likelihood algorithm failed to consistently converge due to difficulty in identifying the covariance structures between the exposure and outcome variables. Regression calibration results in reasonable coverage probabilities but much larger SEs than the other methods. Joint maximum likelihood estimation also gives nominal coverage probabilities, with somewhat smaller SEs than the 2-step methods with bootstrap corrections. We restricted this part of the simulation study to 50 Monte Carlo simulations due to the computational burden (it took on average 8 h to optimize the joint likelihood with N = 2000, compared to less than 5 min for the parameter bootstrap).
Finally, in Table 2, we illustrate the impact of misspecifying the spatial correlation in the exposure model. Other types of model misspecification are also possible, but we focus on the spatial correlation in the exposure since it is particularly difficult to know in advance. We consider Gaussian, spherical, and cubic variogram models, but we assume an exponential model for the purposes of estimation. Ignoring measurement error or using the partial parametric bootstrap results in less than nominal confidence interval coverage. The parameter bootstrap and parametric bootstrap give nearly nominal coverage, as do out-of-sample regression calibration and joint maximum likelihood fitting. As in the case of a correctly specified model, regression calibration results in much larger SEs, while joint maximum likelihood gives somewhat smaller standard errors than the bootstrap approaches (at significant computational cost). There is bias that the bootstrap methods fail to correct, while maximum likelihood estimates are nearly unbiased.
Simulation results for misspecified variance models. All simulations are fit with an exponential variogram model, but the data are generated according to either a Gaussian, spherical, or cubic model. The columns give the bias and standard deviation (SD) ...
The Environmental Monitoring and Assessment Program (EMAP) was conducted by the EPA from 1990–2006 to advance the science of ecological risk assessment and improve the EPA's ability to estimate current and future risks to the health of aquatic ecological resources (US EPA monitoring and assessment program, 1999). Previous work has found that there is a strong relationship between local land use and water quality in nearby streams (Herlihy and others, 1998). For example, a higher percentage of local forestation has been found to be associated with improved stream water quality. Following Madsen and others (2008), we consider forestation level (logit((0.98×%forestation) + 1)) as the exposure and analyze its association with chloride concentrations (log(μEq/L)), where elevated chloride concentrations are a marker for poor stream water quality.
We use EMAP data from the Mid-Atlantic Highlands region of the eastern United States collected during the years 1993–1996 (US EPA monitoring and assessment program, 1999). Where multiple measurements are available from different times at the same location, we use the earliest time. The outcome and the exposure are both available at a total 422 of these locations, so based on these locations, we can estimate the coefficient in a linear model without measurement error. Allowing for spatially correlated outcomes, we find a highly statistically significant negative association between local forestation and chloride concentrations ( = −0.346, SE = 0.025).
At an additional 157 locations, only the chloride concentrations are available. To assess the impact of measurement error and our correction methods, we use a universal kriging model (with an exponential variogram and latitude and longitude as covariates) to predict the exposure at these locations and re-estimate the association using the predicted exposures and measured outcomes at 157 locations. The exposure model parameter estimates are = (−28.91, −0.0037, 0.0012), range = 13.92km, partial sill , and nugget . A map of the respective locations is shown in Figure 2.
EPA stream locations, for example, data analysis (N* = 422 sites with exposure and outcome data, N = 157 locations with outcome data only).
The results are shown in Table 3. The uncorrected effect estimate is − 0.390 with a SE of 0.105, which is consistent with primarily Berkson-like measurement error since there is little change in the effect estimate compared to the case with no measurement error (only a small part of the increased SE can be attributed to the smaller sample size of 157 instead of 422). The partial parametric bootstrap does not change the estimated SE at all, which suggests that the Berkson-like error is nearly pure Berkson error. The parameter bootstrap and parametric bootstrap increase the estimated SEs slightly (0.115 and 0.111, respectively) and estimate little or no bias. Joint maximum likelihood optimization failed to converge (even after trying 20 random initial conditions) due to difficulty in distinguishing between spatial correlation in the measurement error and the outcome, indicating that joint modeling is not feasible for the present example.
Results of estimating the relationship between the log-transformed chloride level (“outcome”) and the logit-transformed percent local forestation (“exposure”) in streams, using EPA data from the Mid-Atlantic Highlands region ...
In conclusion, when we use predicted exposures at stream locations where the true exposures are not available, we see statistically significant evidence of a negative association between local forestation and stream water quality as measured by chloride concentrations. The SE is inflated by the presence of measurement error, and the correct SE can be estimated at little additional computational cost by the parameter bootstrap. It turns out that a naïve analysis also gives nearly correct SEs, although we needed to do the bootstrap analysis to verify this finding.
We have characterized the measurement error from using smoothing to predict exposures in environmental statistics association studies when the exposure and outcome data are misaligned in space. The resulting measurement error has a Berkson-like component from information lost in smoothing and a classical-like component that is related to uncertainty associated with estimating the smoothing parameters.
The measurement error structure we have identified is complex because it is a mixture of 2 types of error, neither one of which fits exactly into the traditional categories of Berkson or classical. Therefore, standard measurement error correction methods are not appropriate. If we are willing to assume that the exposure and outcome models are correctly specified, we can use a parametric bootstrap to estimate bias and standard errors. This requires that we be precise about the assumed data-generating mechanism since the idea is to draw multiple samples from an approximation to the data-generating mechanism, with parameters estimated from observed data. We have defined a data-generating mechanism that is consistent with the geostatistical kriging model we use for smoothing. Although it is well known that geostatistical methods are useful for interpolating physical processes that arise in environmental statistics, it is not clear what real-world phenomenon the spatial random effect represents. A promising direction for future research is to investigate the scientific validity of this and other possible data-generating mechanisms and to characterize the implications for measurement error correction.
We propose the parameter bootstrap as a less computationally intensive approximation to the parametric bootstrap. The main assumption required for the parameter bootstrap is that the estimated sampling distribution for the exposure model parameter estimates be a valid approximation to the true sampling distribution. In general, this is true for sufficiently rich exposure data, but in some applications, the available monitoring data are limited. As a computational tool, the parameter bootstrap is necessary when estimating the exposure model parameters is computationally intensive, which occurs when there is a relatively large amount of exposure data. Therefore, the choice between the parametric bootstrap and the parameter bootstrap should be informed by the amount of exposure data available, considering implications for the computational burden of the parametric bootstrap and the validity of the parameter bootstrap.
In the universal kriging model considered here with 450 exposure monitors, the parametric bootstrap is marginally feasible, requiring approximately 1 h of computing time for 100 bootstrap samples (compared to less than 5 min for the parameter bootstrap). One practical compromise is to use the parameter bootstrap with a larger number of bootstrap samples as the primary correction and to validate it with the parametric bootstrap using a limited number of samples. In applications where the exposure model is more complex (e.g. the spatiotemporal air pollution model described by Szpiro and others, 2010), a single optimization can take on the order of an hour so the parameter bootstrap's computational advantage becomes even more important.
We compared our bootstrap methods to 2 recently proposed alternatives (Gryparis and others, 2009; Madsen2008). Regression calibration is incompatible with the theoretical properties of the measurement error, and it results in much larger SEs than any of the other alternatives. Joint estimation by maximum likelihood performed consistently well, even with a misspecified exposure model and resulted in somewhat smaller SEs than bootstrap methods. However, as Gryparis and others (2009) point out, the joint estimation methodology can be extremely computationally intensive and can lead to spurious feedback into the exposure when there are outliers or misspecification in the outcome model. It is also difficult to fit a joint model with spatial correlation in the outcomes, due to the challenge in distinguishing this correlation from correlation in the exposure. The parameter bootstrap is a computationally efficient alternative that works well in a wide range of settings, and further research comparing it to the joint modeling approach is needed to determine which is preferable for problems where both are feasible.
US Environmental Protection Agency (R831697); Assistance Agreement (CR-83407101); National Institute of Environmental Health Sciences (R01-ES009411 and P50 ES051915). It has not been subjected to peer and policy review so no official endorsement should be inferred.
The authors thank the associate editor and 2 anonymous referees for helpful suggestions.
conflict of Interest: None declared.