In epidemiologic studies, measurement error in dietary variables often attenuates the association between dietary intake and disease occurrence. To adjust for the attenuation caused by error in dietary intake, regression calibration is commonly used. To apply regression calibration, unbiased reference measurements are required. Short-term reference measurements for foods that are not consumed daily contain excess zeroes that pose challenges in the calibration model. We adapted the two-part regression calibration model, initially developed for multiple replicates of reference measurements per individual, to a single-replicate setting. We showed how to handle excess zero reference measurements with a two-step modeling approach, how to explore heteroscedasticity in the consumed amount with a variance-mean graph, how to explore nonlinearity with the generalized additive modeling (GAM) and the empirical logit approaches, and how to select covariates in the calibration model. The performance of the two-part calibration model was compared with that of its one-part counterpart. We used vegetable intake and mortality data from the European Prospective Investigation on Cancer and Nutrition (EPIC) study. In EPIC, reference measurements were taken with 24-hour recalls. For each of the three vegetable subgroups assessed separately, correcting for error with an appropriately specified two-part calibration model resulted in about a threefold increase in the strength of association with all-cause mortality, as measured by the log hazard ratio. We further found that the standard way of including covariates in the calibration model can lead to overfitting the two-part calibration model. Moreover, the extent of adjusting for error is influenced by the number and forms of covariates in the calibration model. For episodically consumed foods, we advise researchers to pay special attention to response distribution, nonlinearity, and covariate inclusion in specifying the calibration model.
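The two-part idea can be stated compactly. The symbols are assumptions introduced here for illustration and do not come from the abstract: R is the single 24-hour recall reference measurement, Q is the error-prone main-study intake, and Z is the set of calibration covariates. Because R is non-negative and contributes nothing when it is zero, the calibrated intake factors into a consumption probability and a conditional amount:

```latex
E(R \mid Q, Z) \;=\; \Pr(R > 0 \mid Q, Z)\; E(R \mid R > 0,\, Q, Z)
```

Under this reading, the first factor would typically be fitted with a binary-response model and the second with a model for positive amounts, which is where the heteroscedasticity and nonlinearity checks mentioned above would apply.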
We present an asymptotic treatment of errors involved in point-based image registration where control point (CP) localization is subject to heteroscedastic noise, a suitable model for image registration in fluorescence microscopy. Assuming an affine transform, CPs are used to solve a multivariate regression problem. With measurement errors existing for both sets of CPs, this is an errors-in-variable problem and linear least squares is inappropriate; the correct method is generalized least squares. To allow for point-dependent errors, the equivalence of a generalized maximum likelihood and heteroscedastic generalized least squares model is achieved, allowing previously published asymptotic results to be extended to image registration. For a particularly useful model of heteroscedastic noise where covariance matrices are scalar multiples of a known matrix (including the case where covariance matrices are multiples of the identity), we provide closed-form solutions to estimators and derive their distribution. We consider the target registration error (TRE) and define a new measure called the localization registration error (LRE) believed to be useful, especially in microscopy registration experiments. Assuming Gaussianity of the CP localization errors, it is shown that the asymptotic distributions for the TRE and LRE are themselves Gaussian, and the parameterized distributions are derived. Results are successfully applied to registration in single molecule microscopy to derive the key dependence of the TRE and LRE variance on the number of CPs and their associated photon counts. Simulations show asymptotic results are robust for low CP numbers and non-Gaussianity. The method presented here is shown to outperform GLS on real imaging data.
Errors-in-variable; fluorescence microscopy; generalized least squares; image registration
Exposure measurement error is a problem in many epidemiological studies, including those using biomarkers and measures of dietary intake. Measurement error typically results in biased estimates of exposure-disease associations, the severity and nature of the bias depending on the form of the error. To correct for the effects of measurement error, information additional to the main study data is required. Ideally, this is a validation sample in which the true exposure is observed. However, in many situations, it is not feasible to observe the true exposure, but there may be available one or more repeated exposure measurements, for example, blood pressure or dietary intake recorded at two time points. The aim of this paper is to provide a toolkit for measurement error correction using repeated measurements. We bring together methods covering classical measurement error and several departures from classical error: systematic, heteroscedastic and differential error. The correction methods considered are regression calibration, which is already widely used in the classical error setting, and moment reconstruction and multiple imputation, which are newer approaches with the ability to handle differential error. We emphasize practical application of the methods in nutritional epidemiology and other fields. We primarily consider continuous exposures in the exposure-outcome model, but we also outline methods for use when continuous exposures are categorized. The methods are illustrated using the data from a study of the association between fibre intake and colorectal cancer, where fibre intake is measured using a diet diary and repeated measures are available for a subset. © 2014 The Authors.
measurement error; regression calibration; moment reconstruction; multiple imputation; diet diary; food frequency questionnaire; nutritional epidemiology
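As a minimal sketch of regression calibration with repeated measurements, the following simulation assumes classical error (two replicates W1 and W2 equal to the true exposure X plus independent noise) and a linear outcome model. All names, sample sizes, and variances are illustrative assumptions, not values from the paper. The naive slope is divided by the reliability ratio estimated from the replicates.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=5000, beta=1.0, sd_u=1.0, sd_e=0.5, seed=1):
    """Hypothetical data: X ~ N(0, 1), replicates W = X + U, outcome Y = beta*X + e."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    w1 = [xi + rng.gauss(0, sd_u) for xi in x]
    w2 = [xi + rng.gauss(0, sd_u) for xi in x]
    y = [beta * xi + rng.gauss(0, sd_e) for xi in x]
    return w1, w2, y

def rc_slopes(w1, w2, y):
    """Return (naive slope, regression-calibration-corrected slope)."""
    naive = cov(w1, y) / cov(w1, w1)
    # Under classical error with independent replicate errors,
    # cov(W1, W2) estimates var(X), so this ratio estimates the reliability.
    reliability = cov(w1, w2) / cov(w1, w1)
    return naive, naive / reliability

w1, w2, y = simulate()
naive, corrected = rc_slopes(w1, w2, y)
print(f"naive slope: {naive:.3f}, corrected slope: {corrected:.3f}")
```

With equal exposure and error variances, as assumed here, the naive slope is attenuated toward roughly half the true value, while the corrected slope is approximately unbiased. This sketch does not cover the systematic, heteroscedastic, or differential error cases discussed in the abstract.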
Litter decomposition rate (k) is typically estimated from proportional litter mass loss data using models that assume constant, normally distributed errors. However, such data often show non-normal errors with reduced variance near bounds (0 or 1), potentially leading to biased k estimates. We compared the performance of nonlinear regression using the beta distribution, which is well-suited to bounded data and this type of heteroscedasticity, to standard nonlinear regression (normal errors) on simulated and real litter decomposition data. Although the beta model often provided better fits to the simulated data (based on the corrected Akaike Information Criterion, AICc), standard nonlinear regression was robust to violation of homoscedasticity and gave equally or more accurate k estimates as nonlinear beta regression. Our simulation results also suggest that k estimates will be most accurate when study length captures mid to late stage decomposition (50–80% mass loss) and the number of measurements through time is ≥5. Regression method and data transformation choices had the smallest impact on k estimates during mid and late stage decomposition. Estimates of k were more variable among methods and generally less accurate during early and end stage decomposition. With real data, neither model was predominately best; in most cases the models were indistinguishable based on AICc, and gave similar k estimates. However, when decomposition rates were high, normal and beta model k estimates often diverged substantially. Therefore, we recommend a pragmatic approach where both models are compared and the best is selected for a given data set. Alternatively, both models may be used via model averaging to develop weighted parameter estimates. We provide code to perform nonlinear beta regression with freely available software.
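For orientation, here is a minimal sketch of the standard (normal-error) nonlinear regression side of this comparison. It assumes the single-exponential model, in which the proportion of mass remaining at time t is exp(-k t); the abstract does not name the decay model, so this is an assumption. The sum of squared errors is minimized with a golden-section search. The time points and the value k = 0.3 are hypothetical, and the beta-regression alternative is not shown.

```python
import math

def sse(k, times, remaining):
    """Sum of squared errors of the single-exponential model exp(-k*t)."""
    return sum((math.exp(-k * t) - m) ** 2 for t, m in zip(times, remaining))

def fit_k(times, remaining, lo=0.0, hi=5.0, tol=1e-9):
    """Golden-section search for the k in [lo, hi] that minimizes the SSE."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if sse(c, times, remaining) < sse(d, times, remaining):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

# Hypothetical noiseless data: six time points ending near 83% mass loss.
times = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0]
remaining = [math.exp(-0.3 * t) for t in times]
k_hat = fit_k(times, remaining)
print(f"estimated k: {k_hat:.6f}")
```

A one-dimensional search is enough here because the model has a single parameter; with real, noisy data a general-purpose nonlinear least squares routine would be the more usual choice.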
We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in and apply our method to an expression quantitative trait loci (eQTLs) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and lead to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.
Generalized least squares; Heteroscedasticity; Large p small n; Model selection; Sparse regression; Variance estimation
Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis.
We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC).
In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis.
Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a percentile scale. Relating back to the original scale of the exposure solves the problem. The conclusion regards all regression models.
We investigate methods for regression analysis when covariates are measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies the classical measurement error model, but it may not have repeated measurements. In addition to the surrogate variables that are available among the subjects in the calibration sample, we assume that there is an instrumental variable (IV) that is available for all study subjects. An IV is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. We propose a robust best linear estimator that uses all the available data, which is the most efficient among a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error from the surrogate or IV is heteroscedastic. Finite-sample performance of the proposed estimator is examined and compared with other estimators via intensive simulation studies. The proposed method and other methods are applied to a bladder cancer case–control study.
Calibration sample; Estimating equation; Heteroscedastic measurement error; Nonparametric correction
Toxicologists and pharmacologists often describe toxicity of a chemical using parameters of a nonlinear regression model. Thus estimation of parameters of a nonlinear regression model is an important problem. The estimates of the parameters and their uncertainty estimates depend upon the underlying error variance structure in the model. Typically, a priori the researcher would not know if the error variances are homoscedastic (i.e., constant across dose) or if they are heteroscedastic (i.e., the variance is a function of dose). Motivated by this concern, in this article we introduce an estimation procedure based on a preliminary test, which selects an appropriate estimation procedure accounting for the underlying error variance structure. Since outliers and influential observations are common in toxicological data, the proposed methodology uses M-estimators. The asymptotic properties of the preliminary test estimator are investigated; in particular its asymptotic covariance matrix is derived. The performance of the proposed estimator is compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using a data set obtained from the National Toxicology Program.
Asymptotic normality; Dose-response study; Heteroscedasticity; Hill model; M-estimation procedure; Preliminary test estimation; Toxicology
In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material.
Bias correction; Method-of-moments correction; Subsampling extrapolation
We consider statistical inference on a regression model in which some covariables are measured with errors together with an auxiliary variable. The proposed estimation for the regression coefficients is based on some estimating equations. This new method alleviates some drawbacks of previously proposed estimations. These include the requirement of undersmoothing the regressor functions over the auxiliary variable and the restriction on other covariables which can be observed exactly, among others. The large sample properties of the proposed estimator are established. We further propose a jackknife estimation, which consists of deleting one estimating equation (instead of one observation) at a time. We show that the jackknife estimator of the regression coefficients and the estimating-equations-based estimator are asymptotically equivalent. Simulations show that the jackknife estimator has smaller biases when the sample size is small or moderate. In addition, the jackknife estimation can also provide a consistent estimator of the asymptotic covariance matrix, which is robust to heteroscedasticity. We illustrate these methods by applying them to a real data set from marketing science.
Linear regression model; noised variable; measurement error; auxiliary variable; estimating equation; jackknife estimation; asymptotic normality
Preterm birth, defined as delivery before 37 completed weeks’ gestation, is a leading cause of infant morbidity and mortality. Identifying factors related to preterm delivery is an important goal of public health professionals who wish to identify etiologic pathways to target for prevention. Validation studies are often conducted in nutritional epidemiology in order to study measurement error in instruments that are generally less invasive or less expensive than “gold standard” instruments. Data from such studies are then used in adjusting estimates based on the full study sample. However, measurement error in nutritional epidemiology has recently been shown to be complicated by correlated error structures in the study-wide and validation instruments. Investigators of a study of preterm birth and dietary intake designed a validation study to assess measurement error in a food frequency questionnaire (FFQ) administered during pregnancy, with the secondary goal of assessing whether a single administration of the FFQ could be used to describe intake over the relatively short pregnancy period, in which energy intake typically increases. Here, we describe a likelihood-based method via Markov Chain Monte Carlo to estimate the regression coefficients in a generalized linear model relating preterm birth to covariates, where one of the covariates is measured with error and the multivariate measurement error model has correlated errors among contemporaneous instruments (i.e. FFQs, 24-hour recalls, and/or biomarkers). Because of constraints on the covariance parameters in our likelihood, identifiability for all the variance and covariance parameters is not guaranteed and, therefore, we derive the necessary and sufficient conditions to identify the variance and covariance parameters under our measurement error model and assumptions.
We investigate the sensitivity of our likelihood-based model to distributional assumptions placed on the true folate intake by employing semi-parametric Bayesian methods through the mixture of Dirichlet process priors framework. We exemplify our methods in a recent prospective cohort study of risk factors for preterm birth. We use long-term folate as our error-prone predictor of interest, the food-frequency questionnaire (FFQ) and 24-hour recall as two biased instruments, and serum folate biomarker as the unbiased instrument. We found that folate intake, as measured by the FFQ, led to a conservative estimate of the odds ratio of preterm birth (0.76) when compared to the odds ratio estimate from our likelihood-based approach, which adjusts for the measurement error (0.63). We found that our parametric model led to similar conclusions to the semi-parametric Bayesian model.
Adaptive-Rejection Sampling; Dirichlet process prior; MCMC; Semiparametric Bayes
Numerous regression approaches to isotherm parameter estimation appear in the literature. Real insight into the proper modeling pattern can be achieved only by testing methods on a very large number of cases. Experimentally, this cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, for the homoscedastic case, the winning error function is ordinary least squares, while for the case of heteroscedastic noise the use of orthogonal distance regression or Marquardt's percent standard deviation is suggested. It was found that when experiments are repeated three times, the simple method of weighted least squares performed as well as the more complicated orthogonal distance regression method.
Motivation: Immunoassays are primary diagnostic and research tools throughout the medical and life sciences. The common approach to the processing of immunoassay data involves estimation of the calibration curve followed by inversion of the calibration function to read off the concentration estimates. This approach, however, does not lend itself easily to acceptable estimation of confidence limits on the estimated concentrations. Such estimates must account for uncertainty in the calibration curve as well as uncertainty in the target measurement. Even point estimates can be problematic: because of the non-linearity of calibration curves and error heteroscedasticity, the neglect of components of measurement error can produce significant bias.
Methods: We have developed a Bayesian approach for the estimation of concentrations from immunoassay data that treats the propagation of measurement error appropriately. The method uses Markov Chain Monte Carlo (MCMC) to approximate the posterior distribution of the target concentrations and numerically compute the relevant summary statistics. Software implementing the method is freely available for public use.
Results: The new method was tested on both simulated and experimental datasets with different measurement error models. The method outperformed the common inverse method on samples with large measurement errors. Even in cases with extreme measurements where the common inverse method failed, our approach always generated reasonable estimates for the target concentrations.
Availability: Project name: Baecs; Project home page: www.computationalimmunology.org/utilities/; Operating systems: Linux, MacOS X and Windows; Programming language: C++; License: Free for Academic Use.
Supplementary data are available at Bioinformatics online.
Regression calibration (RC) is a popular method for estimating regression coefficients when one or more continuous explanatory variables, X, are measured with error. In this method, the mismeasured covariate, W, is substituted by the expectation E(X|W), based on the assumption that the error in the measurement of X is non-differential. Using simulations, we compare three versions of RC with two other ‘substitution’ methods, moment reconstruction (MR) and imputation (IM), neither of which relies on the non-differential error assumption. We investigate studies that have an internal calibration sub-study. For RC, we consider (i) the usual version of RC, (ii) RC applied only to the ‘marker’ information in the calibration study, and (iii) an ‘efficient’ version (ERC) in which the estimators (i) and (ii) are combined. Our results show that ERC is preferable when there is non-differential measurement error. Under this condition, there are cases where ERC is less efficient than MR or IM, but they rarely occur in epidemiology. We show that the efficiency gain of usual RC and ERC over the other methods can sometimes be dramatic. The usual version of RC carries similar efficiency gains to ERC over MR and IM, but becomes unstable as measurement error becomes large, leading to bias and poor precision. When differential measurement error does pertain, then MR and IM have considerably less bias than RC, but can have much larger variance. We demonstrate our findings with an analysis of dietary fat intake and mortality in a large cohort study.
differential measurement error; moment reconstruction; multiple imputation; non-differential measurement error; regression calibration
Spatial data with covariate measurement errors have been commonly observed in public health studies. Existing work mainly concentrates on parameter estimation using Gibbs sampling, and no work has been conducted to understand and quantify the theoretical impact of ignoring measurement error on spatial data analysis in the form of the asymptotic biases in regression coefficients and variance components when measurement error is ignored. Plausible implementations, from frequentist perspectives, of maximum likelihood estimation in spatial covariate measurement error models are also elusive. In this paper, we propose a new class of linear mixed models for spatial data in the presence of covariate measurement errors. We show that the naive estimators of the regression coefficients are attenuated while the naive estimators of the variance components are inflated, if measurement error is ignored. We further develop a structural modeling approach to obtaining the maximum likelihood estimator by accounting for the measurement error. We study the large sample properties of the proposed maximum likelihood estimator, and propose an EM algorithm to draw inference. All the asymptotic properties are shown under the increasing-domain asymptotic framework. We illustrate the method by analyzing the Scottish lip cancer data, and evaluate its performance through a simulation study, all of which elucidate the importance of adjusting for covariate measurement errors.
Measurement error; Spatial data; Structural modeling; Variance components; Asymptotic bias; Consistency and asymptotic normality; Increasing domain asymptotics; EM algorithm
Purpose: The aim of this study was to select the best calibration model for determination of propofol plasma concentration by high-performance liquid chromatography method.
Methods: Determination of propofol in plasma after deproteinization with acetonitrile containing thymol (as internal standard) was carried out on a C18 column with a mixture of acetonitrile and trifluoroacetic acid 0.1% (60:40) as mobile phase, which was delivered at a flow rate of 1.2 mL/minute. Fluorescence detection was done at excitation and emission wavelengths of 276 and 310 nm, respectively. After fitting different equations to the calibration data using weighted regression, the adequacy of the models was assessed by the lack-of-fit test, significance of all model parameters, adjusted coefficient of determination (R2adjusted) and by measuring the predictive performance with median relative prediction error and median absolute relative prediction error of the validation data set.
Results: The best model was a linear equation without intercept with median relative prediction error and median absolute relative prediction error of 4.0 and 9.4%, respectively in the range of 10-5000 ng/mL. The method showed good accuracy and precision.
Conclusion: The presented statistical framework could be used to choose the best model for heteroscedastic calibration data for analytes like propofol with wide range of expected concentration.
Propofol; High-performance liquid chromatography; Calibration; Heteroscedasticity; Weighted least squares regression
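To make the calibration workflow concrete, the sketch below fits a linear model without intercept by weighted least squares and then computes the median relative and median absolute relative prediction errors on a separate validation set. The 1/x² weighting, the concentrations, and the detector responses are all hypothetical; the abstract says only that weighted regression was used and does not give the weights or the data.

```python
from statistics import median

def fit_through_origin(conc, resp, weights):
    """Weighted least squares slope for resp = slope * conc (no intercept)."""
    num = sum(w * x * y for w, x, y in zip(weights, conc, resp))
    den = sum(w * x * x for w, x in zip(weights, conc))
    return num / den

def prediction_errors(slope, conc, resp):
    """Median relative and median absolute relative prediction error, in percent."""
    rel = [100.0 * (y / slope - x) / x for x, y in zip(conc, resp)]
    return median(rel), median([abs(r) for r in rel])

# Hypothetical calibration standards (ng/mL) and detector responses.
cal_conc = [10.0, 50.0, 100.0, 500.0, 1000.0, 5000.0]
cal_resp = [21.0, 98.0, 205.0, 990.0, 2040.0, 9900.0]
weights = [1.0 / x ** 2 for x in cal_conc]  # assumed 1/x^2 weighting

slope = fit_through_origin(cal_conc, cal_resp, weights)

# Hypothetical validation samples, back-calculated with the fitted slope.
val_conc = [20.0, 200.0, 2000.0]
val_resp = [41.0, 400.0, 3960.0]
mdpe, mdape = prediction_errors(slope, val_conc, val_resp)
print(f"slope: {slope:.4f}, MDPE: {mdpe:.2f}%, MDAPE: {mdape:.2f}%")
```

With 1/x² weights the through-origin slope reduces to the mean of the response-to-concentration ratios, so low and high standards contribute equally. That is one common way to handle a response variance that grows with concentration over a wide range.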
Online hearing tests conducted in home settings on a personal computer (PC) require prior calibration. Biological calibration consists of approximating the reference sound level via the hearing threshold of a person with normal hearing.
The objective of this study was to identify the error of the proposed methods of biological calibration, their duration, and the subjective difficulty in conducting these tests via PC.
Seven methods have been proposed for measuring the calibration coefficients. All measurements were performed in reference to the hearing threshold of a normal-hearing person. Three methods were proposed for determining the reference sound level on the basis of these calibration coefficients. Methods were compared for the estimated error, duration, and difficulty of the calibration. Web-based self-assessed measurements of the calibration coefficients were carried out in 3 series: (1) at an otolaryngology clinic, (2) at the participant’s home, and (3) again at the clinic. Additionally, in series 1 and 3, pure-tone audiometry was conducted and series 3 was followed by an offline questionnaire concerning the difficulty of the calibration. Participants were recruited offline from coworkers of the Department and Clinic of Otolaryngology, Wroclaw Medical University, Poland.
All 25 participants, aged 22-35 years (median 27) completed all tests and filled in the questionnaire. The smallest standard deviation of the calibration coefficient in the test-retest measurement was obtained at the level of 3.87 dB (95% CI 3.52-4.29) for the modulated signal presented in accordance with the rules of Bekesy’s audiometry. The method is characterized by moderate duration time and a relatively simple procedure. The simplest and shortest method was the method of self-adjustment of the sound volume to the barely audible level. In the test-retest measurement, the deviation of this method equaled 4.97 dB (95% CI 4.53-5.51). Among methods determining the reference sound level, the levels determined independently for each frequency revealed the smallest error. The estimated standard deviations of the difference in the hearing threshold between the examination conducted on a biologically calibrated PC and pure-tone audiometry varied from 7.27 dB (95% CI 6.71-7.93) to 10.38 dB (95% CI 9.11-12.03), depending on the calibration method.
In this study, an analysis of biological calibration was performed and the presented results included calibration error, calibration time, and calibration difficulty. These values determine potential applications of Web-based hearing tests conducted in home settings and are decisive factors when selecting the calibration method. If there are no substantial time limitations, it is advisable to use Bekesy method and determine the reference sound level independently at each frequency because this approach is characterized by the lowest error.
pure-tone audiometry; computer-assisted instruction; self-examination
Regression calibration provides a way to obtain unbiased estimators of fixed effects in regression models when one or more predictors are measured with error. Recent development of measurement error methods has focused on models that include interaction terms between measured-with-error predictors, and separately, methods for estimation in models that account for correlated data. In this work, explicit and novel forms of regression calibration estimators and associated asymptotic variances are derived for longitudinal models that include interaction terms, when data from instrumental and unbiased surrogate variables are available but not the actual predictors of interest. The longitudinal data are fit using linear mixed models that contain random intercepts and account for serial correlation and unequally spaced observations.
The motivating application involves a longitudinal study of exposure to two pollutants (predictors) – outdoor fine particulate matter and cigarette smoke – and their association in interactive form with levels of a biomarker of inflammation, leukotriene E4 (LTE4, outcome) in asthmatic children. Since the exposure concentrations could not be directly observed, measurements from a fixed outdoor monitor and urinary cotinine concentrations were used as instrumental variables, and concentrations of fine ambient particulate matter and cigarette smoke measured with error by personal monitors were used as unbiased surrogate variables. The derived regression calibration methods were applied to estimate coefficients of the unobserved predictors and their interaction, allowing for direct comparison of toxicity of the different pollutants. Simulations were used to verify accuracy of inferential methods based on asymptotic theory.
measurement error; errors in variables; surrogate; PM2.5; LTE4; cotinine
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is used to estimate the heteroscedastic function; then the coefficients of the regression model are obtained by generalized least squares. One noteworthy feature of our approach is that, by improving the traditional two-stage method, we avoid testing for heteroscedasticity. Because local polynomial estimation is non-parametric, the form of the heteroscedastic function need not be known. The estimation precision can therefore be improved when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation on real data indicate that our approach is effective in finite-sample situations.
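The two-stage idea in the abstract above (estimate the variance function non-parametrically, then apply generalized least squares) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single predictor rather than the multivariate setting, a local linear (degree 1) fit with a Gaussian kernel, and smooths the log of the squared OLS residuals so that the estimated variances stay positive. The function names `local_linear` and `two_stage_gls`, the bandwidth `h`, and the simulated data are all hypothetical.

```python
import numpy as np


def local_linear(x0, x, y, h):
    """Degree-1 local polynomial estimate of E[y | x = x0] (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # kernel weights centred at x0
    X = np.column_stack([np.ones_like(x), x - x0])  # local design matrix
    XtW = X.T * w
    coef = np.linalg.solve(XtW @ X, XtW @ y)
    return coef[0]                                  # local intercept = fitted value at x0


def two_stage_gls(x, y, h=0.3):
    """OLS residuals -> smoothed variance function -> weighted (generalized) least squares."""
    X = np.column_stack([np.ones_like(x), x])
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    # Smooth log squared residuals; exponentiating keeps the variance estimate positive.
    # This recovers the variance function only up to a constant factor, which does not
    # change the weighted least squares solution.
    log_r2 = np.log((y - X @ beta_ols) ** 2 + 1e-12)
    var_hat = np.exp(np.array([local_linear(x0, x, log_r2, h) for x0 in x]))
    XtW = X.T / var_hat                             # weights are inverse estimated variances
    beta_gls = np.linalg.solve(XtW @ X, XtW @ y)
    return beta_ols, beta_gls, var_hat


# Hypothetical simulated data: true intercept 1, true slope 2, noise SD growing with x.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 2.0, n)
y = 1.0 + 2.0 * x + (0.2 + x ** 2) * rng.normal(size=n)

beta_ols, beta_gls, var_hat = two_stage_gls(x, y)
print("OLS:", beta_ols, "GLS:", beta_gls)
```

Because the weights come from a smoother, no parametric form for the variance function is needed, which is the point the abstract makes; the bandwidth would in practice have to be chosen by some data-driven rule.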
Occupational, environmental, and nutritional epidemiologists are often interested in estimating the prospective effect of time-varying exposure variables such as cumulative exposure or cumulative updated average exposure, in relation to chronic disease endpoints such as cancer incidence and mortality. From exposure validation studies, it is apparent that many of the variables of interest are measured with moderate to substantial error. Although the ordinary regression calibration approach is approximately valid and efficient for measurement error correction of relative risk estimates from the Cox model with time-independent point exposures when the disease is rare, it is not adaptable for use with time-varying exposures. By re-calibrating the measurement error model within each risk set, a risk set regression calibration (RRC) method is proposed for this setting. An algorithm for a bias-corrected point estimate of the relative risk using the RRC approach is presented, followed by the derivation of an estimate of its variance, resulting in a sandwich estimator. Emphasis is on methods applicable to the main study/external validation study design, which arises in important applications. Simulation studies under several assumptions about the error model were carried out, which demonstrated the validity and efficiency of the method in finite samples. The method was applied to a study of diet and cancer from Harvard’s Health Professionals Follow-up Study (HPFS).
Cox proportional hazards model; Measurement error; Risk set regression calibration; Time-varying covariates
Measurement error is common in epidemiological and biomedical studies. When biomarkers are measured in batches or groups, measurement error is potentially correlated within each batch or group. In regression analysis, most existing methods are not applicable in the presence of batch-specific measurement error in predictors. We propose a robust conditional likelihood approach to account for batch-specific error in predictors when the batch effect is additive and the predominant source of error; the approach requires no assumptions on the distribution of the measurement error. While a regression model with batch as a categorical covariate yields the same parameter estimates as the proposed conditional likelihood approach for linear regression, this result does not hold in general for all generalized linear models, in particular, logistic regression. Our simulation studies show that the conditional likelihood approach achieves better finite sample performance than the regression calibration approach or a naive approach without adjustment for measurement error. In the case of logistic regression, our proposed approach is shown to also outperform the regression approach with batch as a categorical covariate. In addition, we also examine a “hybrid” approach combining the conditional likelihood method and the regression calibration method, which is shown in simulations to achieve good performance in the presence of both batch-specific and measurement-specific error. We illustrate our method using data from a colorectal adenoma study.
Batch-specific error; Biomarker; Conditional likelihood; Exponential family; Generalized linear models; Robust method
In-vivo measurement of bone lead by means of K X-ray fluorescence (KXRF) is the preferred biological marker of chronic exposure to lead. Unfortunately, considerable measurement error associated with KXRF estimations can introduce bias in estimates of the effect of bone lead when this variable is included as the exposure in a regression model. Estimates of uncertainty reported by the KXRF instrument reflect the variance of the measurement error and, although they can be used to correct the measurement error bias, they are seldom used in epidemiological statistical analyses. Errors-in-variables regression (EIV) allows for correction of bias caused by measurement error in predictor variables, based on the knowledge of the reliability of such variables. The authors propose a way to obtain reliability coefficients for bone lead measurements from uncertainty data reported by the KXRF instrument and compare, by use of Monte Carlo simulations, results obtained using EIV regression models versus those obtained by the standard procedures. Results of the simulations show that ordinary least squares (OLS) regression models provide severely biased estimates of effect, and that EIV provides nearly unbiased estimates. Although EIV effect estimates are less precise, their mean squared error is much smaller than that of OLS estimates. In conclusion, EIV is a better alternative than OLS to estimate the effect of bone lead when measured by KXRF.
Lead; KXRF; measurement error; errors-in-variables model; simulations
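To make the errors-in-variables correction in the bone lead abstract above concrete, the sketch below simulates an exposure measured with per-observation uncertainty and rescales the naive slope by an estimated reliability coefficient. It is an illustration under assumptions, not the authors' procedure: the reliability is taken as (total variance minus mean squared reported uncertainty) divided by total variance, which is one plausible way to use instrument-reported uncertainties, and the model has a single predictor. The function names and all simulated values are hypothetical.

```python
import numpy as np


def reliability_from_uncertainty(w, u):
    """Assumed estimator: share of the observed variance that is not measurement error.

    w: observed (error-prone) measurements
    u: per-measurement uncertainty (standard deviation) reported by the instrument
    """
    total_var = np.var(w, ddof=1)
    error_var = np.mean(u ** 2)
    return (total_var - error_var) / total_var


def naive_and_corrected_slope(w, y, reliability):
    """Simple-regression slope of y on w, and the same slope divided by the reliability."""
    c = np.cov(w, y, ddof=1)
    naive = c[0, 1] / c[0, 0]
    return naive, naive / reliability


# Hypothetical simulation: true slope 0.5, classical additive measurement error.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(20.0, 10.0, n)            # unobserved true exposure
u = rng.uniform(5.0, 12.0, n)            # reported uncertainty for each measurement
w = x + u * rng.normal(size=n)           # observed exposure
y = 0.5 * x + rng.normal(0.0, 5.0, n)    # outcome depends on the true exposure

rel = reliability_from_uncertainty(w, u)
naive, corrected = naive_and_corrected_slope(w, y, rel)
print(f"reliability={rel:.2f} naive={naive:.2f} corrected={corrected:.2f}")
```

The corrected slope is noisier than the naive one because it divides by an estimated quantity smaller than one, which matches the abstract's remark that EIV estimates are less precise but have lower mean squared error.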
Data from many scientific areas often come with measurement error. Density or distribution function estimation from contaminated data and nonparametric regression with errors-in-variables are two important topics in measurement error models. In this paper, we present a new software package decon for R, which contains a collection of functions that use the deconvolution kernel methods to deal with the measurement error problems. The functions allow the errors to be either homoscedastic or heteroscedastic. To make the deconvolution estimators computationally more efficient in R, we adapt the fast Fourier transform algorithm for density estimation with error-free data to the deconvolution kernel estimation. We discuss the practical selection of the smoothing parameter in deconvolution methods and illustrate the use of the package through both simulated and real examples.
measurement error models; deconvolution; errors-in-variables problems; smoothing; kernel; fast Fourier transform; heteroscedastic errors; bandwidth selection
Audits are often performed to assess the quality of clinical trial data, but beyond detecting fraud or sloppiness, the audit data is generally ignored. In earlier work using data from a non-randomized study, Shepherd and Yu (2011) developed statistical methods to incorporate audit results into study estimates, and demonstrated that audit data could be used to eliminate bias.
In this manuscript we examine the usefulness of audit-based error-correction methods in clinical trial settings where a continuous outcome is of primary interest.
We demonstrate the bias of multiple linear regression estimates in general settings with an outcome that may have errors and a set of covariates for which some may have errors and others, including treatment assignment, are recorded correctly for all subjects. We study this bias under different assumptions including independence between treatment assignment, covariates, and data errors (conceivable in a double-blinded randomized trial) and independence between treatment assignment and covariates but not data errors (possible in an unblinded randomized trial). We review moment-based estimators to incorporate the audit data and propose new multiple imputation estimators. The performance of estimators is studied in simulations.
When treatment is randomized and unrelated to data errors, estimates of the treatment effect using the original error-prone data (i.e., ignoring the audit results) are unbiased. In this setting, both moment and multiple imputation estimators incorporating audit data are more variable than standard analyses using the original data. In contrast, in settings where treatment is randomized but correlated with data errors and in settings where treatment is not randomized, standard treatment effect estimates will be biased. In all settings, parameter estimates for the original, error-prone covariates will be biased. Treatment and covariate effect estimates can be corrected by incorporating audit data using either the multiple imputation or moment-based approaches. Bias, precision, and coverage of confidence intervals improve as the audit size increases.
The extent of bias and the performance of methods depend on the extent and nature of the error as well as the size of the audit. This work only considers methods for the linear model. Settings that differ substantially from those considered here need further study.
In randomized trials with continuous outcomes and treatment assignment independent of data errors, standard analyses of treatment effects will be unbiased and are recommended. However, if treatment assignment is correlated with data errors or other covariates, naive analyses may be biased. In these settings, and when covariate effects are of interest, approaches for incorporating audit results should be considered.
audit; bias; clinical trials; measurement error; multiple imputation
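The bias pattern described in the audit abstract above can be reproduced with a small simulation. The sketch below covers only the simplest scenario (treatment randomized and independent of both the covariate and the data errors) and does not implement the moment-based or multiple imputation estimators. The error rates, effect sizes, and the helper `ols` are hypothetical choices made for illustration.

```python
import numpy as np


def ols(y, *cols):
    """Least squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(2)
n = 20000

# Hypothetical trial: randomized treatment, one continuous covariate.
treat = rng.integers(0, 2, n).astype(float)
x_true = rng.normal(0.0, 1.0, n)
y_true = 1.0 + 2.0 * treat + 1.5 * x_true + rng.normal(0.0, 1.0, n)

# Data errors that are independent of treatment: 30% of covariate records and
# 10% of outcome records are recorded with additive error.
x_bad = rng.random(n) < 0.3
y_bad = rng.random(n) < 0.1
x_obs = x_true + x_bad * rng.normal(0.0, 1.0, n)
y_obs = y_true + y_bad * rng.normal(0.0, 2.0, n)

coef_true = ols(y_true, treat, x_true)   # analysis with error-free data
coef_obs = ols(y_obs, treat, x_obs)      # analysis ignoring the errors

print("treatment effect:", coef_true[1], "vs", coef_obs[1])
print("covariate effect:", coef_true[2], "vs", coef_obs[2])
```

Under these assumptions the treatment coefficient from the error-prone data stays close to its true value, while the coefficient of the error-prone covariate is pulled toward zero; making the errors depend on treatment arm would be the natural next variation to try.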
Data collected in many epidemiological or clinical research studies are often contaminated with measurement errors that may be of classical or Berkson error type. The measurement error may also be a combination of both classical and Berkson errors and failure to account for both errors could lead to unreliable inference in many situations. We consider regression analysis in generalized linear models when some covariates are prone to a mixture of Berkson and classical errors and calibration data are available only for some subjects in a subsample. We propose an expected estimating equation approach to accommodate both errors in generalized linear regression analyses. The proposed method can consistently estimate the classical and Berkson error variances based on the available data, without knowing the mixture percentage. Its finite-sample performance is investigated numerically. Our method is illustrated by an application to real data from an HIV vaccine study.
Berkson error; calibration subsample; classical error; expected estimating equation; generalized linear model; instrumental variable
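The distinction between classical and Berkson error drawn in the abstract above can be illustrated with a short simulation. This sketch is not the expected estimating equation approach; it only shows, for simple linear regression, the well-known contrast that classical error (observed = true + noise) attenuates the slope while pure Berkson error (true = observed + noise) leaves it approximately unbiased. All variable names and parameter values are hypothetical.

```python
import numpy as np


def slope(pred, y):
    """Simple-regression slope of y on pred."""
    c = np.cov(pred, y)
    return c[0, 1] / c[0, 0]


rng = np.random.default_rng(3)
n = 20000
beta = 1.0  # true slope in both scenarios

# Classical error: the observed value W scatters around the true value X.
x_c = rng.normal(0.0, 1.0, n)
w_c = x_c + rng.normal(0.0, 1.0, n)
y_c = beta * x_c + rng.normal(0.0, 0.5, n)

# Berkson error: the true value X scatters around the assigned/observed value W.
w_b = rng.normal(0.0, 1.0, n)
x_b = w_b + rng.normal(0.0, 1.0, n)
y_b = beta * x_b + rng.normal(0.0, 0.5, n)

slope_classical = slope(w_c, y_c)  # attenuated toward zero
slope_berkson = slope(w_b, y_b)    # close to the true slope

print(f"classical: {slope_classical:.2f}, Berkson: {slope_berkson:.2f}")
```

With equal variances for the true exposure and the error, the classical slope is roughly halved. This contrast holds for the linear model; in other generalized linear models Berkson error can also introduce bias, which is part of why a mixture of the two error types needs a dedicated method.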