In epidemiologic studies, measurement error in dietary variables often attenuates association between dietary intake and disease occurrence. To adjust for the attenuation caused by error in dietary intake, regression calibration is commonly used. To apply regression calibration, unbiased reference measurements are required. Short-term reference measurements for foods that are not consumed daily contain excess zeroes that pose challenges in the calibration model. We adapted two-part regression calibration model, initially developed for multiple replicates of reference measurements per individual to a single-replicate setting. We showed how to handle excess zero reference measurements by two-step modeling approach, how to explore heteroscedasticity in the consumed amount with variance-mean graph, how to explore nonlinearity with the generalized additive modeling (GAM) and the empirical logit approaches, and how to select covariates in the calibration model. The performance of two-part calibration model was compared with the one-part counterpart. We used vegetable intake and mortality data from European Prospective Investigation on Cancer and Nutrition (EPIC) study. In the EPIC, reference measurements were taken with 24-hour recalls. For each of the three vegetable subgroups assessed separately, correcting for error with an appropriately specified two-part calibration model resulted in about three fold increase in the strength of association with all-cause mortality, as measured by the log hazard ratio. Further found is that the standard way of including covariates in the calibration model can lead to over fitting the two-part calibration model. Moreover, the extent of adjusting for error is influenced by the number and forms of covariates in the calibration model. For episodically consumed foods, we advise researchers to pay special attention to response distribution, nonlinearity, and covariate inclusion in specifying the calibration model.
We present an asymptotic treatment of errors involved in point-based image registration where control point (CP) localization is subject to heteroscedastic noise; a suitable model for image registration in fluorescence microscopy. Assuming an affine transform, CPs are used to solve a multivariate regression problem. With measurement errors existing for both sets of CPs this is an errors-in-variable problem and linear least squares is inappropriate; the correct method being generalized least squares. To allow for point dependent errors the equivalence of a generalized maximum likelihood and heteroscedastic generalized least squares model is achieved allowing previously published asymptotic results to be extended to image registration. For a particularly useful model of heteroscedastic noise where covariance matrices are scalar multiples of a known matrix (including the case where covariance matrices are multiples of the identity) we provide closed form solutions to estimators and derive their distribution. We consider the target registration error (TRE) and define a new measure called the localization registration error (LRE) believed to be useful, especially in microscopy registration experiments. Assuming Gaussianity of the CP localization errors, it is shown that the asymptotic distribution for the TRE and LRE are themselves Gaussian and the parameterized distributions are derived. Results are successfully applied to registration in single molecule microscopy to derive the key dependence of the TRE and LRE variance on the number of CPs and their associated photon counts. Simulations show asymptotic results are robust for low CP numbers and non-Gaussianity. The method presented here is shown to outperform GLS on real imaging data.
Errors-in-variable; fluorescence microscopy; generalized least squares; image registration
Exposure measurement error is a problem in many epidemiological studies, including those using biomarkers and measures of dietary intake. Measurement error typically results in biased estimates of exposure-disease associations, the severity and nature of the bias depending on the form of the error. To correct for the effects of measurement error, information additional to the main study data is required. Ideally, this is a validation sample in which the true exposure is observed. However, in many situations, it is not feasible to observe the true exposure, but there may be available one or more repeated exposure measurements, for example, blood pressure or dietary intake recorded at two time points. The aim of this paper is to provide a toolkit for measurement error correction using repeated measurements. We bring together methods covering classical measurement error and several departures from classical error: systematic, heteroscedastic and differential error. The correction methods considered are regression calibration, which is already widely used in the classical error setting, and moment reconstruction and multiple imputation, which are newer approaches with the ability to handle differential error. We emphasize practical application of the methods in nutritional epidemiology and other fields. We primarily consider continuous exposures in the exposure-outcome model, but we also outline methods for use when continuous exposures are categorized. The methods are illustrated using the data from a study of the association between fibre intake and colorectal cancer, where fibre intake is measured using a diet diary and repeated measures are available for a subset. © 2014 The Authors.
measurement error; regression calibration; moment reconstruction; multiple imputation; diet diary; food frequency questionnaire; nutritional epidemiology
Litter decomposition rate (k) is typically estimated from proportional litter mass loss data using models that assume constant, normally distributed errors. However, such data often show non-normal errors with reduced variance near bounds (0 or 1), potentially leading to biased k estimates. We compared the performance of nonlinear regression using the beta distribution, which is well-suited to bounded data and this type of heteroscedasticity, to standard nonlinear regression (normal errors) on simulated and real litter decomposition data. Although the beta model often provided better fits to the simulated data (based on the corrected Akaike Information Criterion, AICc), standard nonlinear regression was robust to violation of homoscedasticity and gave equally or more accurate k estimates as nonlinear beta regression. Our simulation results also suggest that k estimates will be most accurate when study length captures mid to late stage decomposition (50–80% mass loss) and the number of measurements through time is ≥5. Regression method and data transformation choices had the smallest impact on k estimates during mid and late stage decomposition. Estimates of k were more variable among methods and generally less accurate during early and end stage decomposition. With real data, neither model was predominately best; in most cases the models were indistinguishable based on AICc, and gave similar k estimates. However, when decomposition rates were high, normal and beta model k estimates often diverged substantially. Therefore, we recommend a pragmatic approach where both models are compared and the best is selected for a given data set. Alternatively, both models may be used via model averaging to develop weighted parameter estimates. We provide code to perform nonlinear beta regression with freely available software.
We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in and apply our method to an expression quantitative trait loci (eQTLs) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and lead to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.
Generalized least squares; Heteroscedasticity; Large p small n; Model selection; Sparse regression; Variance estimation
Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis.
We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC).
In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis.
Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a percentile scale. Relating back to the original scale of the exposure solves the problem. The conclusion regards all regression models.
Toxicologists and pharmacologists often describe toxicity of a chemical using parameters of a nonlinear regression model. Thus estimation of parameters of a nonlinear regression model is an important problem. The estimates of the parameters and their uncertainty estimates depend upon the underlying error variance structure in the model. Typically, a priori the researcher would know if the error variances are homoscedastic (i.e., constant across dose) or if they are heteroscedastic (i.e., the variance is a function of dose). Motivated by this concern, in this article we introduce an estimation procedure based on preliminary test which selects an appropriate estimation procedure accounting for the underlying error variance structure. Since outliers and influential observations are common in toxicological data, the proposed methodology uses M-estimators. The asymptotic properties of the preliminary test estimator are investigated; in particular its asymptotic covariance matrix is derived. The performance of the proposed estimator is compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using a data set obtained from the National Toxicology Program.
Asymptotic normality; Dose-response study; Heteroscedasticity; Hill model; M-estimation procedure; Preliminary test estimation; Toxicology
In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material.
Bias correction; Method-of-moments correction; Subsampling extrapolation
We investigate methods for regression analysis when covariates are measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies the classical measurement error model, but it may not have repeated measurements. In addition to the surrogate variables that are available among the subjects in the calibration sample, we assume that there is an instrumental variable (IV) that is available for all study subjects. An IV is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. We propose a robust best linear estimator that uses all the available data, which is the most efficient among a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error from the surrogate or IV is heteroscedastic. Finite-sample performance of the proposed estimator is examined and compared with other estimators via intensive simulation studies. The proposed method and other methods are applied to a bladder cancer case–control study.
Calibration sample; Estimating equation; Heteroscedastic measurement error; Nonparametric correction
Preterm birth, defined as delivery before 37 completed weeks’ gestation, is a leading cause of infant morbidity and mortality. Identifying factors related to preterm delivery is an important goal of public health professionals who wish to identify etiologic pathways to target for prevention. Validation studies are often conducted in nutritional epidemiology in order to study measurement error in instruments that are generally less invasive or less expensive than ”gold standard” instruments. Data from such studies are then used in adjusting estimates based on the full study sample. However, measurement error in nutritional epidemiology has recently been shown to be complicated by correlated error structures in the study-wide and validation instruments. Investigators of a study of preterm birth and dietary intake designed a validation study to assess measurement error in a food frequency questionnaire (FFQ) administered during pregnancy and with the secondary goal of assessing whether a single administration of the FFQ could be used to describe intake over the relatively short pregnancy period, in which energy intake typically increases. Here, we describe a likelihood-based method via Markov Chain Monte Carlo to estimate the regression coefficients in a generalized linear model relating preterm birth to covariates, where one of the covariates is measured with error and the multivariate measurement error model has correlated errors among contemporaneous instruments (i.e. FFQs, 24-hour recalls, and/or biomarkers). Because of constraints on the covariance parameters in our likelihood, identifiability for all the variance and covariance parameters is not guaranteed and, therefore, we derive the necessary and suficient conditions to identify the variance and covariance parameters under our measurement error model and assumptions. We investigate the sensitivity of our likelihood-based model to distributional assumptions placed on the true folate intake by employing semi-parametric Bayesian methods through the mixture of Dirichlet process priors framework. We exemplify our methods in a recent prospective cohort study of risk factors for preterm birth. We use long-term folate as our error-prone predictor of interest, the food-frequency questionnaire (FFQ) and 24-hour recall as two biased instruments, and serum folate biomarker as the unbiased instrument. We found that folate intake, as measured by the FFQ, led to a conservative estimate of the estimated odds ratio of preterm birth (0.76) when compared to the odds ratio estimate from our likelihood-based approach, which adjusts for the measurement error (0.63). We found that our parametric model led to similar conclusions to the semi-parametric Bayesian model.
Adaptive-Rejection Sampling; Dirichlet process prior; MCMC; Semiparametric Bayes
We consider statistical inference on a regression model in which some covariables are measured with errors together with an auxiliary variable. The proposed estimation for the regression coefficients is based on some estimating equations. This new method alleates some drawbacks of previously proposed estimations. This includes the requirment of undersmoothing the regressor functions over the auxiliary variable, the restriction on other covariables which can be observed exactly, among others. The large sample properties of the proposed estimator are established. We further propose a jackknife estimation, which consists of deleting one estimating equation (instead of one obervation) at a time. We show that the jackknife estimator of the regression coefficients and the estimating equations based estimator are asymptotically equivalent. Simulations show that the jackknife estimator has smaller biases when sample size is small or moderate. In addition, the jackknife estimation can also provide a consistent estimator of the asymptotic covariance matrix, which is robust to the heteroscedasticity. We illustrate these methods by applying them to a real data set from marketing science.
Linear regression model; noised variable; measurement error; auxiliary variable; estimating equation; jackknife estimation; asymptotic normality
Motivation: Immunoassays are primary diagnostic and research tools throughout the medical and life sciences. The common approach to the processing of immunoassay data involves estimation of the calibration curve followed by inversion of the calibration function to read off the concentration estimates. This approach, however, does not lend itself easily to acceptable estimation of confidence limits on the estimated concentrations. Such estimates must account for uncertainty in the calibration curve as well as uncertainty in the target measurement. Even point estimates can be problematic: because of the non-linearity of calibration curves and error heteroscedasticity, the neglect of components of measurement error can produce significant bias.
Methods: We have developed a Bayesian approach for the estimation of concentrations from immunoassay data that treats the propagation of measurement error appropriately. The method uses Markov Chain Monte Carlo (MCMC) to approximate the posterior distribution of the target concentrations and numerically compute the relevant summary statistics. Software implementing the method is freely available for public use.
Results: The new method was tested on both simulated and experimental datasets with different measurement error models. The method outperformed the common inverse method on samples with large measurement errors. Even in cases with extreme measurements where the common inverse method failed, our approach always generated reasonable estimates for the target concentrations.
Availability: Project name: Baecs; Project home page: www.computationalimmunology.org/utilities/; Operating systems: Linux, MacOS X and Windows; Programming language: C++; License: Free for Academic Use.
Supplementary data are available at Bioinformatics online.
Purpose: The aim of this study was to select the best calibration model for determination of propofol plasma concentration by high-performance liquid chromatography method.
Methods: Determination of propofol in plasma after deproteinization with acetonitrile containing thymol (as internal standard) was carried out on a C18 column with a mixture of acetonitrile and trifluoroacetic acid 0.1% (60:40) as mobile phase which delivered at the flow rate of 1.2 mL/minute . Fluorescence detection was done at the excitation and emission wavelengths of 276 and 310 nm, respectively. After fitting different equations to the calibration data using weighted regression, the adequacy of models were assessed by lack-of-fit test, significance of all model parameters, adjusted coefficient of determination (R2adjusted) and by measuring the predictive performance with median relative prediction error and median absolute relative prediction error of the validation data set.
Results: The best model was a linear equation without intercept with median relative prediction error and median absolute relative prediction error of 4.0 and 9.4%, respectively in the range of 10-5000 ng/mL. The method showed good accuracy and precision.
Conclusion: The presented statistical framework could be used to choose the best model for heteroscedastic calibration data for analytes like propofol with wide range of expected concentration.
Propofol; High-performance liquid chromatography; Calibration; Heteroscedasticty; Weighted least squares regression
Numerous regression approaches to isotherm parameters estimation appear in the literature. The real insight into the proper modeling pattern can be achieved only by testing methods on a very big number of cases. Experimentally, it cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, for the homoscedastic case, the winning error function is ordinary least squares, while for the case of heteroscedastic noise the use of orthogonal distance regression or Margart's percent standard deviation is suggested. It was found that in case when experiments are repeated three times the simple method of weighted least squares performed as well as more complicated orthogonal distance regression method.
Data from many scientific areas often come with measurement error. Density or distribution function estimation from contaminated data and nonparametric regression with errors-in-variables are two important topics in measurement error models. In this paper, we present a new software package decon for
R, which contains a collection of functions that use the deconvolution kernel methods to deal with the measurement error problems. The functions allow the errors to be either homoscedastic or heteroscedastic. To make the deconvolution estimators computationally more efficient in
R, we adapt the fast Fourier transform algorithm for density estimation with error-free data to the deconvolution kernel estimation. We discuss the practical selection of the smoothing parameter in deconvolution methods and illustrate the use of the package through both simulated and real examples.
measurement error models; deconvolution; errors-in-variables problems; smoothing; kernel; faster Fourier transform; heteroscedastic errors; bandwidth selection
Spatial data with covariate measurement errors have been commonly observed in public health studies. Existing work mainly concentrates on parameter estimation using Gibbs sampling, and no work has been conducted to understand and quantify the theoretical impact of ignoring measurement error on spatial data analysis in the form of the asymptotic biases in regression coefficients and variance components when measurement error is ignored. Plausible implementations, from frequentist perspectives, of maximum likelihood estimation in spatial covariate measurement error models are also elusive. In this paper, we propose a new class of linear mixed models for spatial data in the presence of covariate measurement errors. We show that the naive estimators of the regression coefficients are attenuated while the naive estimators of the variance components are inflated, if measurement error is ignored. We further develop a structural modeling approach to obtaining the maximum likelihood estimator by accounting for the measurement error. We study the large sample properties of the proposed maximum likelihood estimator, and propose an EM algorithm to draw inference. All the asymptotic properties are shown under the increasing-domain asymptotic framework. We illustrate the method by analyzing the Scottish lip cancer data, and evaluate its performance through a simulation study, all of which elucidate the importance of adjusting for covariate measurement errors.
Measurement error; Spatial data; Structural modeling; Variance components; Asymptotic bias; Consistency and asymptotic normality; Increasing domain asymptotics; EM algorithm
Regression calibration (RC) is a popular method for estimating regression coefficients when one or more continuous explanatory variables, X, are measured with an error. In this method, the mismeasured covariate, W, is substituted by the expectation E(X|W), based on the assumption that the error in the measurement of X is non-differential. Using simulations, we compare three versions of RC with two other ‘substitution’ methods, moment reconstruction (MR) and imputation (IM), neither of which rely on the non-differential error assumption. We investigate studies that have an internal calibration sub-study. For RC, we consider (i) the usual version of RC, (ii) RC applied only to the ‘marker’ information in the calibration study, and (iii) an ‘efficient’ version (ERC) in which the estimators (i) and (ii) are combined. Our results show that ERC is preferable when there is non-differential measurement error. Under this condition, there are cases where ERC is less efficient than MR or IM, but they rarely occur in epidemiology. We show that the efficiency gain of usual RC and ERC over the other methods can sometimes be dramatic. The usual version of RC carries similar efficiency gains to ERC over MR and IM, but becomes unstable as measurement error becomes large, leading to bias and poor precision. When differential measurement error does pertain, then MR and IM have considerably less bias than RC, but can have much larger variance. We demonstrate our findings with an analysis of dietary fat intake and mortality in a large cohort study.
differential measurement error; moment reconstruction; multiple imputation; non-differential measurement error; regression calibration
The 1986 accident at the Chernobyl nuclear power plant remains the most serious nuclear accident in history, and excess thyroid cancers, particularly among those exposed to releases of iodine-131 remain the best-documented sequelae. Failure to take dose-measurement error into account can lead to bias in assessments of dose-response slope. Although risks in the Ukrainian-US thyroid screening study have been previously evaluated, errors in dose assessments have not been addressed hitherto. Dose-response patterns were examined in a thyroid screening prevalence cohort of 13,127 persons aged <18 at the time of the accident who were resident in the most radioactively contaminated regions of Ukraine. We extended earlier analyses in this cohort by adjusting for dose error in the recently developed TD-10 dosimetry. Three methods of statistical correction, via two types of regression calibration, and Monte Carlo maximum-likelihood, were applied to the doses that can be derived from the ratio of thyroid activity to thyroid mass. The two components that make up this ratio have different types of error, Berkson error for thyroid mass and classical error for thyroid activity. The first regression-calibration method yielded estimates of excess odds ratio of 5.78 Gy−1 (95% CI 1.92, 27.04), about 7% higher than estimates unadjusted for dose error. The second regression-calibration method gave an excess odds ratio of 4.78 Gy−1 (95% CI 1.64, 19.69), about 11% lower than unadjusted analysis. The Monte Carlo maximum-likelihood method produced an excess odds ratio of 4.93 Gy−1 (95% CI 1.67, 19.90), about 8% lower than unadjusted analysis. There are borderline-significant (p = 0.101–0.112) indications of downward curvature in the dose response, allowing for which nearly doubled the low-dose linear coefficient. In conclusion, dose-error adjustment has comparatively modest effects on regression parameters, a consequence of the relatively small errors, of a mixture of Berkson and classical form, associated with thyroid dose assessment.
Growth curves are monotonically increasing functions that measure repeatedly the same subjects over time. The classical growth curve model in the statistical literature is the Generalized Multivariate Analysis of Variance (GMANOVA) model. In order to model the tree trunk radius (r) over time (t) of trees on different sites, GMANOVA is combined here with the adapted PL regression model Q = A·T+E, where for
, A = initial relative growth to be estimated, , and E is an error term for each tree and time point. Furthermore, Ei[–b·r] = , , with TPR being the turning point radius in a sigmoid curve, and at is an estimated calibrating time-radius point. Advantages of the approach are that growth rates can be compared among growth curves with different turning point radiuses and different starting points, hidden outliers are easily detectable, the method is statistically robust, and heteroscedasticity of the residuals among time points is allowed. The model was implemented with dendrochronological data of 235 Pinus montezumae trees on ten Mexican volcano sites to calculate comparison intervals for the estimated initial relative growth . One site (at the Popocatépetl volcano) stood out, with being 3.9 times the value of the site with the slowest-growing trees. Calculating variance components for the initial relative growth, 34% of the growth variation was found among sites, 31% among trees, and 35% over time. Without the Popocatépetl site, the numbers changed to 7%, 42%, and 51%. Further explanation of differences in growth would need to focus on factors that vary within sites and over time.
The estimation of the spatio-temporal gait parameters is of primary importance in both physical activity monitoring and clinical contexts. A method for estimating step length bilaterally, during level walking, using a single inertial measurement unit (IMU) attached to the pelvis is proposed. In contrast to previous studies, based either on a simplified representation of the human gait mechanics or on a general linear regressive model, the proposed method estimates the step length directly from the integration of the acceleration along the direction of progression.
The IMU was placed at pelvis level fixed to the subject's belt on the right side. The method was validated using measurements from a stereo-photogrammetric system as a gold standard on nine subjects walking ten laps along a closed loop track of about 25 m, varying their speed. For each loop, only the IMU data recorded in a 4 m long portion of the track included in the calibrated volume of the SP system, were used for the analysis. The method takes advantage of the cyclic nature of gait and it requires an accurate determination of the foot contact instances. A combination of a Kalman filter and of an optimally filtered direct and reverse integration applied to the IMU signals formed a single novel method (Kalman and Optimally filtered Step length Estimation - KOSE method). A correction of the IMU displacement due to the pelvic rotation occurring in gait was implemented to estimate the step length and the traversed distance.
The step length was estimated for all subjects with less than 3% error. Traversed distance was assessed with less than 2% error.
The proposed method provided estimates of step length and traversed distance more accurate than any other method applied to measurements obtained from a single IMU that can be found in the literature. In healthy subjects, it is reasonable to expect that, errors in traversed distance estimation during daily monitoring activity would be of the same order of magnitude of those presented.
Inertial measurement; Gait analysis; Gait parameters; Accelerometer; Step length; Gait monitoring; Stride length; Inertial sensor; Wearable.
Regression calibration provides a way to obtain unbiased estimators of fixed effects in regression models when one or more predictors are measured with error. Recent development of measurement error methods has focused on models that include interaction terms between measured-with-error predictors, and separately, methods for estimation in models that account for correlated data. In this work, explicit and novel forms of regression calibration estimators and associated asymptotic variances are derived for longitudinal models that include interaction terms, when data from instrumental and unbiased surrogate variables are available but not the actual predictors of interest. The longitudinal data are fit using linear mixed models that contain random intercepts and account for serial correlation and unequally spaced observations.
The motivating application involves a longitudinal study of exposure to two pollutants (predictors) – outdoor fine particulate matter and cigarette smoke – and their association in interactive form with levels of a biomarker of inflammation, leukotriene E4 (LTE4, outcome) in asthmatic children. Since the exposure concentrations could not be directly observed, measurements from a fixed outdoor monitor and urinary cotinine concentrations were used as instrumental variables, and concentrations of fine ambient particulate matter and cigarette smoke measured with error by personal monitors were used as unbiased surrogate variables. The derived regression calibration methods were applied to estimate coefficients of the unobserved predictors and their interaction, allowing for direct comparison of toxicity of the different pollutants. Simulations were used to verify accuracy of inferential methods based on asymptotic theory.
measurement error; errors in variables; surrogate; PM2.5; LTE4; cotinine
We aimed at assessing the degree of measurement error in essential fatty acid intakes from a food frequency questionnaire and the impact of correcting for such an error on precision and bias of odds ratios in logistic models. To assess these impacts, and for illustrative purposes, alternative approaches and methods were used with the binary outcome of cognitive decline in verbal fluency.
Using the Atherosclerosis Risk in Communities (ARIC) study, we conducted a sensitivity analysis. The error-prone exposure – visit 1 fatty acid intake (1987–89) – was available for 7,814 subjects 50 years or older at baseline with complete data on cognitive decline between visits 2 (1990–92) and 4 (1996–98). Our binary outcome of interest was clinically significant decline in verbal fluency. Point estimates and 95% confidence intervals were compared between naïve and measurement-error adjusted odds ratios of decline with every SD increase in fatty acid intake as % of energy. Two approaches were explored for adjustment: (A) External validation against biomarkers (plasma fatty acids in cholesteryl esters and phospholipids) and (B) Internal repeat measurements at visits 2 and 3. The main difference between the two is that Approach B makes a stronger assumption regarding lack of error correlations in the structural model. Additionally, we compared results from regression calibration (RCAL) to those from simulation extrapolation (SIMEX). Finally, using structural equations modeling, we estimated attenuation factors associated with each dietary exposure to assess degree of measurement error in a bivariate scenario for regression calibration of logistic regression model.
Results and conclusion
Attenuation factors for Approach A were smaller than B, suggesting a larger amount of measurement error in the dietary exposure. Replicate measures (Approach B) unlike concentration biomarkers (Approach A) may lead to imprecise odds ratios due to larger standard errors. Using SIMEX rather than RCAL models tends to preserve precision of odds ratios. We found in many cases that bias in naïve odds ratios was towards the null. RCAL tended to correct for a larger amount of effect bias than SIMEX, particularly for Approach A.
Online hearing tests conducted in home settings on a personal computer (PC) require prior calibration. Biological calibration consists of approximating the reference sound level via the hearing threshold of a person with normal hearing.
The objective of this study was to identify the error of the proposed methods of biological calibration, their duration, and the subjective difficulty in conducting these tests via PC.
Seven methods have been proposed for measuring the calibration coefficients. All measurements were performed in reference to the hearing threshold of a normal-hearing person. Three methods were proposed for determining the reference sound level on the basis of these calibration coefficients. Methods were compared for the estimated error, duration, and difficulty of the calibration. Web-based self-assessed measurements of the calibration coefficients were carried out in 3 series: (1) at a otolaryngology clinic, (2) at the participant’s home, and (3) again at the clinic. Additionally, in series 1 and 3, pure-tone audiometry was conducted and series 3 was followed by an offline questionnaire concerning the difficulty of the calibration. Participants were recruited offline from coworkers of the Department and Clinic of Otolaryngology, Wroclaw Medical University, Poland.
All 25 participants, aged 22-35 years (median 27) completed all tests and filled in the questionnaire. The smallest standard deviation of the calibration coefficient in the test-retest measurement was obtained at the level of 3.87 dB (95% CI 3.52-4.29) for the modulated signal presented in accordance with the rules of Bekesy’s audiometry. The method is characterized by moderate duration time and a relatively simple procedure. The simplest and shortest method was the method of self-adjustment of the sound volume to the barely audible level. In the test-retest measurement, the deviation of this method equaled 4.97 dB (95% CI 4.53-5.51). Among methods determining the reference sound level, the levels determined independently for each frequency revealed the smallest error. The estimated standard deviations of the difference in the hearing threshold between the examination conducted on a biologically calibrated PC and pure-tone audiometry varied from 7.27 dB (95% CI 6.71-7.93) to 10.38 dB (95% CI 9.11-12.03), depending on the calibration method.
In this study, an analysis of biological calibration was performed and the presented results included calibration error, calibration time, and calibration difficulty. These values determine potential applications of Web-based hearing tests conducted in home settings and are decisive factors when selecting the calibration method. If there are no substantial time limitations, it is advisable to use Bekesy method and determine the reference sound level independently at each frequency because this approach is characterized by the lowest error.
pure-tone audiometry; computer-assisted instruction; self-examination
New technology introduced over time results in changes in densitometers during longitudinal studies of bone mineral density (BMD). This requires that a cross-calibration process be completed to translate measurements from the old densitometer to the new one. Previously described cross-calibration methods for research settings have collected single measures on each densitometer and used linear regression to estimate cross-calibration corrections. Thus, these methods may produce corrections that have limited precision and underestimate the variability in converted BMD values. Furthermore, most prior studies have included small samples recruited from specialized populations. Increasing the sample size, obtaining multiple measures on each machine, and utilizing linear mixed models to account for between- and within-subject variability may improve cross-calibration estimates. The purpose of this study was to conduct an in vivo cross-calibration of a Lunar DPX-L with a Lunar Prodigy densitometer using a sample of 249 healthy volunteers who were scanned twice on each densitometer, without repositioning, at both the femur and spine. Scans were analyzed using both automated and manual placement of regions of interest. Wilcoxon rank-sum tests and Bland-Altman plots were used to examine possible differences between repeat scans within and across densitometers. We used linear mixed models to determine the cross-calibration equations for the femoral neck, trochanter, total hip and lumbar spine (L2-L4) regions. Results using automated and manual placement of the regions of interest did not differ significantly The DPX–L exhibited larger median absolute differences in repeat scans for femoral neck [0.016 vs. 0.012, p=0.1] and trochanter [0.011 vs. 0.009, p=0.06] BMD values compared to the Prodigy. The Bland-Altman plots revealed no statistically significant linear relation between the difference in paired measures between machines and mean BMD. In our large sample of healthy volunteers we did detect systematic differences between the DPX-L and Prodigy densitometers. Our proposed cross-calibration method, which includes acquiring multiple measures and using linear mixed models, provides researchers with a more realistic estimate of the variance of cross-calibrated BMD measures, potentially reducing the chance of making a type I error in longitudinal studies of changes in BMD.
cross-calibration; densitometer; bone mineral density; DXA; mixed models; Framingham Osteoporosis Study
Occupational, environmental, and nutritional epidemiologists are often interested in estimating the prospective effect of time-varying exposure variables such as cumulative exposure or cumulative updated average exposure, in relation to chronic disease endpoints such as cancer incidence and mortality. From exposure validation studies, it is apparent that many of the variables of interest are measured with moderate to substantial error. Although the ordinary regression calibration approach is approximately valid and efficient for measurement error correction of relative risk estimates from the Cox model with time-independent point exposures when the disease is rare, it is not adaptable for use with time-varying exposures. By re-calibrating the measurement error model within each risk set, a risk set regression calibration method is proposed for this setting. An algorithm for a bias-corrected point estimate of the relative risk using an RRC approach is presented, followed by the derivation of an estimate of its variance, resulting in a sandwich estimator. Emphasis is on methods applicable to the main study/external validation study design, which arises in important applications. Simulation studies under several assumptions about the error model were carried out, which demonstrated the validity and efficiency of the method in finite samples. The method was applied to a study of diet and cancer from Harvard’s Health Professionals Follow-up Study (HPFS).
Cox proportional hazards model; Measurement error; Risk set regression calibration; Time-varying covariates