To propose a more realistic model for disease cluster detection, through a modification of the spatial scan statistic to account simultaneously for inflated zeros and overdispersion.
Spatial Scan Statistics  usually assume Poisson or Binomial distributed data, which is not adequate in many disease surveillance scenarios. For example, small areas distant from hospitals may exhibit a smaller number of cases than expected in those simple models. Also, underreporting may occur in underdeveloped regions, due to inefficient data collection or the difficulty to access remote sites. Those factors generate excess zero case counts or overdispersion, inducing a violation of the statistical model and also increasing the type I error (false alarms). Overdispersion occurs when data variance is greater than the predicted by the used model. To accommodate it, an extra parameter must be included; in the Poisson model, one makes the variance equal to the mean.
Tools like the Generalized Poisson (GP) and the Double Poisson  may be a better option for this kind of problem, modeling separately the mean and variance, which could be easily adjusted by covariates. When excess zeros occur, the Zero Inflated Poisson (ZIP) model is used, although ZIP’s estimated parameters may be severely biased if nonzero counts are too dispersed, compared to the Poisson distribution. In this case the Inflated Zero models for the Generalized Poisson (ZIGP), Double Poisson (ZIDP) and Negative Binomial (ZINB) could be good alternatives to the joint modeling of excess zeros and overdispersion. By one hand, Zero Inflated Poisson (ZIP) models were proposed using the spatial scan statistic to deal with the excess zeros . By the other hand, another spatial scan statistic was based on a Poisson-Gamma mixture model for overdispersion . In this work we present a model which includes inflated zeros and overdispersion simultaneously, based on the ZIDP model. Let the parameter p indicate the zero inflation. As the the remaining parameters of the observed cases map and the parameter p are not independent, the likelihood maximization process is not straightforward; it becomes even more complicated when we include covariates in the analysis. To solve this problem we introduce a vector of latent variables in order to factorize the likelihood, and obtain a facilitator for the maximization process using the E-M (Expectation-Maximization) algorithm. We derive the formulas to maximize iteratively the likelihood, and implement a computer program using the E-M algorithm to estimate the parameters under null and alternative hypothesis. The p-value is obtained via the Fast Double Bootstrap Test .
Numerical simulations are conducted to assess the effectiveness of the method. We present results for Hanseniasis surveillance in the Brazilian Amazon in 2010 using this technique. We obtain the most likely spatial clusters for the Poisson, ZIP, Poisson-Gamma mixture and ZIDP models and compare the results.
The Zero Inflated Double Poisson Spatial Scan Statistic for disease cluster detection incorporates the flexibility of previous models, accounting for inflated zeros and overdispersion simultaneously.
The Hanseniasis study case map, due to excess of zero cases counts in many municipalities of the Brazilian Amazon and the presence of overdispersion, was a good benchmark to test the ZIDP model. The results obtained are easier to understand compared to each of the previous spatial scan statistic models, the Zero Inflated Poisson (ZIP) model and the Poisson-Gamma mixture model for overdispersion, taken separetely. The E-M algorithm and the Fast Double Bootstrap test are computationally efficient for this type of problem.
Scan statistics; Zero inflated; Overdispersion; Expectation-Maximization algorithm
Predictive microbiology develops mathematical models that can predict the growth rate of a microorganism population under a set of environmental conditions. Many primary growth models have been proposed. However, when primary models are applied to bacterial growth curves, the biological variability is reduced to a single curve defined by some kinetic parameters (lag time and growth rate), and sometimes the models give poor fits in some regions of the curve. The development of a prediction band (from a set of bacterial growth curves) using non-parametric and bootstrap methods permits to overcome that problem and include the biological variability of the microorganism into the modelling process.
Absorbance data from Listeria monocytogenes cultured at 22, 26, 38, and 42°C were selected under different environmental conditions of pH (4.5, 5.5, 6.5, and 7.4) and percentage of NaCl (2.5, 3.5, 4.5, and 5.5). Transformation of absorbance data to viable count data was carried out. A random effect multiplicative heteroscedastic model was considered to explain the dynamics of bacterial growth. The concept of a prediction band for microbial growth is proposed. The bootstrap method was used to obtain resamples from this model. An iterative procedure is proposed to overcome the computer intensive task of calculating simultaneous prediction intervals, along time, for bacterial growth. The bands were narrower below the inflection point (0-8 h at 22°C, and 0-5.5 h at 42°C), and wider to the right of it (from 9 h onwards at 22°C, and from 7 h onwards at 42°C). A wider band was observed at 42°C than at 22°C when the curves reach their upper asymptote. Similar bands have been obtained for 26 and 38°C.
The combination of nonparametric models and bootstrap techniques results in a good procedure to obtain reliable prediction bands in this context. Moreover, the new iterative algorithm proposed in this paper allows one to achieve exactly the prefixed coverage probability for the prediction band. The microbial growth bands reflect the influence of the different environmental conditions on the microorganism behaviour, helping in the interpretation of the biological meaning of the growth curves obtained experimentally.
Public health researchers often estimate health effects of exposures (e.g., pollution, diet, lifestyle) that cannot be directly measured for study subjects. A common strategy in environmental epidemiology is to use a first-stage (exposure) model to estimate the exposure based on covariates and/or spatio-temporal proximity and to use predictions from the exposure model as the covariate of interest in the second-stage (health) model. This induces a complex form of measurement error. We propose an analytical framework and methodology that is robust to misspecification of the first-stage model and provides valid inference for the second-stage model parameter of interest.
We decompose the measurement error into components analogous to classical and Berkson error and characterize properties of the estimator in the second-stage model if the first-stage model predictions are plugged in without correction. Specifically, we derive conditions for compatibility between the first- and second-stage models that guarantee consistency (and have direct and important real-world design implications), and we derive an asymptotic estimate of finite-sample bias when the compatibility conditions are satisfied. We propose a methodology that (1) corrects for finite-sample bias and (2) correctly estimates standard errors. We demonstrate the utility of our methodology in simulations and an example from air pollution epidemiology.
measurement error; spatial statistics; two-stage estimation; air pollution; environmental epidemiology
Propensity-score matching is frequently used to estimate the effect of treatments, exposures, and interventions when using observational data. An important issue when using propensity-score matching is how to estimate the standard error of the estimated treatment effect. Accurate variance estimation permits construction of confidence intervals that have the advertised coverage rates and tests of statistical significance that have the correct type I error rates. There is disagreement in the literature as to how standard errors should be estimated. The bootstrap is a commonly used resampling method that permits estimation of the sampling variability of estimated parameters. Bootstrap methods are rarely used in conjunction with propensity-score matching. We propose two different bootstrap methods for use when using propensity-score matching without replacementand examined their performance with a series of Monte Carlo simulations. The first method involved drawing bootstrap samples from the matched pairs in the propensity-score-matched sample. The second method involved drawing bootstrap samples from the original sample and estimating the propensity score separately in each bootstrap sample and creating a matched sample within each of these bootstrap samples. The former approach was found to result in estimates of the standard error that were closer to the empirical standard deviation of the sampling distribution of estimated effects.
propensity score; propensity-score matching; bootstrap; variance estimation; Monte Carlo simulations; matching
Health-Related Quality of Life (HRQoL) measures are becoming increasingly used in clinical trials as primary outcome measures. Investigators are now asking statisticians for advice on how to analyse studies that have used HRQoL outcomes.
HRQoL outcomes, like the SF-36, are usually measured on an ordinal scale. However, most investigators assume that there exists an underlying continuous latent variable that measures HRQoL, and that the actual measured outcomes (the ordered categories), reflect contiguous intervals along this continuum.
The ordinal scaling of HRQoL measures means they tend to generate data that have discrete, bounded and skewed distributions. Thus, standard methods of analysis such as the t-test and linear regression that assume Normality and constant variance may not be appropriate. For this reason, conventional statistical advice would suggest that non-parametric methods be used to analyse HRQoL data. The bootstrap is one such computer intensive non-parametric method for analysing data.
We used the bootstrap for hypothesis testing and the estimation of standard errors and confidence intervals for parameters, in four datasets (which illustrate the different aspects of study design). We then compared and contrasted the bootstrap with standard methods of analysing HRQoL outcomes. The standard methods included t-tests, linear regression, summary measures and General Linear Models.
Overall, in the datasets we studied, using the SF-36 outcome, bootstrap methods produce results similar to conventional statistical methods. This is likely because the t-test and linear regression are robust to the violations of assumptions that HRQoL data are likely to cause (i.e. non-Normality). While particular to our datasets, these findings are likely to generalise to other HRQoL outcomes, which have discrete, bounded and skewed distributions. Future research with other HRQoL outcome measures, interventions and populations, is required to confirm this conclusion.
Health Related Quality of Life; SF-36; Bootstrap Simulation; Statistical Analysis.
In many environmental epidemiology studies, the locations and/or times of exposure measurements and health assessments do not match. In such settings, health effects analyses often use the predictions from an exposure model as a covariate in a regression model. Such exposure predictions contain some measurement error as the predicted values do not equal the true exposures. We provide a framework for spatial measurement error modeling, showing that smoothing induces a Berkson-type measurement error with nondiagonal error structure. From this viewpoint, we review the existing approaches to estimation in a linear regression health model, including direct use of the spatial predictions and exposure simulation, and explore some modified approaches, including Bayesian models and out-of-sample regression calibration, motivated by measurement error principles. We then extend this work to the generalized linear model framework for health outcomes. Based on analytical considerations and simulation results, we compare the performance of all these approaches under several spatial models for exposure. Our comparisons underscore several important points. First, exposure simulation can perform very poorly under certain realistic scenarios. Second, the relative performance of the different methods depends on the nature of the underlying exposure surface. Third, traditional measurement error concepts can help to explain the relative practical performance of the different methods. We apply the methods to data on the association between levels of particulate matter and birth weight in the greater Boston area.
Air pollution; Measurement error; Predictions; Spatial misalignment
In resampling methods, such as bootstrapping or cross validation, a very similar
computational problem (usually an optimization procedure) is solved over and over
again for a set of very similar data sets. If it is computationally burdensome to
solve this computational problem once, the whole resampling method can become
unfeasible. However, because the computational problems and data sets are so
similar, the speed of the resampling method may be increased by taking advantage of
these similarities in method and data. As a generic solution, we propose to learn
the relation between the resampled data sets and their corresponding optima. Using
this learned knowledge, we are then able to predict the optima associated with new
resampled data sets. First, these predicted optima are used as starting values for
the optimization process. Once the predictions become accurate enough, the
optimization process may even be omitted completely, thereby greatly decreasing the
computational burden. The suggested method is validated using two simple problems
(where the results can be verified analytically) and two real-life problems (i.e.,
the bootstrap of a mixed model and a generalized extreme value distribution). The
proposed method led on average to a tenfold increase in speed of the resampling
Model rejections lie at the heart of systems biology, since they provide conclusive statements: that the corresponding mechanistic assumptions do not serve as valid explanations for the experimental data. Rejections are usually done using e.g. the chi-square test (χ2) or the Durbin-Watson test (DW). Analytical formulas for the corresponding distributions rely on assumptions that typically are not fulfilled. This problem is partly alleviated by the usage of bootstrapping, a computationally heavy approach to calculate an empirical distribution. Bootstrapping also allows for a natural extension to estimation of joint distributions, but this feature has so far been little exploited.
We herein show that simplistic combinations of bootstrapped tests, like the max or min of the individual p-values, give inconsistent, i.e. overly conservative or liberal, results. A new two-dimensional (2D) approach based on parametric bootstrapping, on the other hand, is found both consistent and with a higher power than the individual tests, when tested on static and dynamic examples where the truth is known. In the same examples, the most superior test is a 2D χ2vsχ2, where the second χ2-value comes from an additional help model, and its ability to describe bootstraps from the tested model. This superiority is lost if the help model is too simple, or too flexible. If a useful help model is found, the most powerful approach is the bootstrapped log-likelihood ratio (LHR). We show that this is because the LHR is one-dimensional, because the second dimension comes at a cost, and because LHR has retained most of the crucial information in the 2D distribution. These approaches statistically resolve a previously published rejection example for the first time.
We have shown how to, and how not to, combine tests in a bootstrap setting, when the combination is advantageous, and when it is advantageous to include a second model. These results also provide a deeper insight into the original motivation for formulating the LHR, for the more general setting of nonlinear and non-nested models. These insights are valuable in cases when accuracy and power, rather than computational speed, are prioritized.
Model rejection; Bootstrapping; Combining information; 2D; Insulin signaling; Model Mimicry; Likelihood ratio
Joint modeling of survival and longitudinal data has been studied extensively in the recent literature. The likelihood approach is one of the most popular estimation methods employed within the joint modeling framework. Typically, the parameters are estimated using maximum likelihood, with computation performed by the expectation maximization (EM) algorithm. However, one drawback of this approach is that standard error (SE) estimates are not automatically produced when using the EM algorithm. Many different procedures have been proposed to obtain the asymptotic covariance matrix for the parameters when the number of parameters is typically small. In the joint modeling context, however, there may be an infinite-dimensional parameter, the baseline hazard function, which greatly complicates the problem, so that the existing methods cannot be readily applied. The profile likelihood and the bootstrap methods overcome the difficulty to some extent; however, they can be computationally intensive. In this paper, we propose two new methods for SE estimation using the EM algorithm that allow for more efficient computation of the SE of a subset of parametric components in a semiparametric or high-dimensional parametric model. The precision and computation time are evaluated through a thorough simulation study. We conclude with an application of our SE estimation method to analyze an HIV clinical trial dataset.
EM algorithm; HIV clinical trial; Numerical differentiation; Observed information matrix; Profile likelihood; Semiparametric joint modeling
Often, in environmental data collection, data arise from two sources: numerical models and monitoring networks. The first source provides predictions at the level of grid cells, while the second source gives measurements at points. The first is characterized by full spatial coverage of the region of interest, high temporal resolution, no missing data but consequential calibration concerns. The second tends to be sparsely collected in space with coarser temporal resolution, often with missing data but, where recorded, provides, essentially, the true value. Accommodating the spatial misalignment between the two types of data is of fundamental importance for both improved predictions of exposure as well as for evaluation and calibration of the numerical model. In this article we propose a simple, fully model-based strategy to downscale the output from numerical models to point level. The static spatial model, specified within a Bayesian framework, regresses the observed data on the numerical model output using spatially-varying coefficients which are specified through a correlated spatial Gaussian process.
As an example, we apply our method to ozone concentration data for the eastern U.S. and compare it to Bayesian melding (Fuentes and Raftery 2005) and ordinary kriging (Cressie 1993; Chilès and Delfiner 1999). Our results show that our method outperforms Bayesian melding in terms of computing speed and it is superior to both Bayesian melding and ordinary kriging in terms of predictive performance; predictions obtained with our method are better calibrated and predictive intervals have empirical coverage closer to the nominal values. Moreover, our model can be easily extended to accommodate for the temporal dimension. In this regard, we consider several spatio-temporal versions of the static model. We compare them using out-of-sample predictions of ozone concentration for the eastern U.S. for the period May 1–October 15, 2001. For the best choice, we present a summary of the analysis. Supplemental material, including color versions of Figures 4, 5, 6, 7, and 8, and MCMC diagnostic plots, are available online.
Bayesian melding; Calibration; Markov chain Monte Carlo; Ordinary kriging; Spatial misalignment; Spatially varying coefficient model
Fluorodeoxyglucose positron emission tomography (FDG-PET) studies report characteristic patterns of cerebral hypometabolism in probable Alzheimer's disease (pAD) and amnestic mild cognitive impairment (aMCI). This study aims to characterize the consistency of regional hypometabolism in pAD and aMCI patients enrolled in the AD Neuroimaging Initiative (ADNI) using statistical parametric mapping (SPM) and bootstrap resampling, and to compare bootstrap based reliability index to the commonly used type-I error approach with or without correction for multiple comparisons. Batched SPM5 was run for each of 1,000 bootstrap iterations to compare FDG-PET images from 74 pAD and 142 aMCI patients, respectively, to 82 normal controls. Maps of the hypometabolic voxels detected for at least a specific percentage of times over the 1000 runs were examined and compared to an overlap of the hypometabolic maps obtained from 3 randomly partitioned independent sub-datasets. The results from the bootstrap derived reliability of regional hypometabolism in the overall data set were similar to that observed in each of the three non-overlapping sub-sets using family-wise error. Strong but non-linear association was found between the bootstrap based reliability index and the type-I error. For threshold p=0.0005, pAD was associated with extensive hypometabolic voxels in the posterior cingulate/precuneus and parietotemporal regions with reliability between 90% and 100%. Bootstrap analysis provides an alternative to the parametric family-wise error approach used to examine consistency of hypometabolic brain voxels in pAD and aMCI patients. These results provide a foundation for the use of bootstrap analysis characterize statistical ROIs or search regions in both cross-sectional and longitudinal FDG PET studies. This approach offers promise in the early detection and tracking of AD, the evaluation of AD-modifying treatments, and other biologically or clinical important measurements using brain images and voxel-based data analysis techniques.
Alzheimer's Disease; MCI; FDG PET; Reproducibility of Results; Reliability; Bootstrap Resampling; Familywise Error; SPM
It is a common practice to use resampling methods such as the bootstrap for calculating the p-value for each test when performing large scale multiple testing. The precision of the bootstrap p-values and that of the false discovery rate (FDR) relies on the number of bootstraps used for testing each hypothesis. Clearly, the larger the number of bootstraps the better the precision. However, the required number of bootstraps can be computationally burdensome, and it multiplies the number of tests to be performed. Further adding to the computational challenge is that in some applications the calculation of the test statistic itself may require considerable computation time. As technology improves one can expect the dimension of the problem to increase as well. For instance, during the early days of microarray technology, the number of probes on a cDNA chip was less than 10,000. Now the Affymetrix chips come with over 50,000 probes per chip. Motivated by this important need, we developed a simple adaptive bootstrap methodology for large scale multiple testing, which reduces the total number of bootstrap calculations while ensuring the control of the FDR. The proposed algorithm results in a substantial reduction in the number of bootstrap samples. Based on a simulation study we found that, relative to the number of bootstraps required for the Benjamini-Hochberg (BH) procedure, the standard FDR methodology which was the proposed methodology achieved a very substantial reduction in the number of bootstraps. In some cases the new algorithm required as little as 1/6th the number of bootstraps as the conventional BH procedure. Thus, if the conventional BH procedure used 1,000 bootstraps, then the proposed method required only 160 bootstraps. This methodology has been implemented for time-course/dose-response data in our software, ORIOGEN, which is available from the authors upon request.
Missing values are a frequent issue in human studies. In many situations, multiple imputation (MI) is an appropriate missing data handling strategy, whereby missing values are imputed multiple times, the analysis is performed in every imputed data set, and the obtained estimates are pooled. If the aim is to estimate (added) predictive performance measures, such as (change in) the area under the receiver-operating characteristic curve (AUC), internal validation strategies become desirable in order to correct for optimism. It is not fully understood how internal validation should be combined with multiple imputation.
In a comprehensive simulation study and in a real data set based on blood markers as predictors for mortality, we compare three combination strategies: Val-MI, internal validation followed by MI on the training and test parts separately, MI-Val, MI on the full data set followed by internal validation, and MI(-y)-Val, MI on the full data set omitting the outcome followed by internal validation. Different validation strategies, including bootstrap und cross-validation, different (added) performance measures, and various data characteristics are considered, and the strategies are evaluated with regard to bias and mean squared error of the obtained performance estimates. In addition, we elaborate on the number of resamples and imputations to be used, and adopt a strategy for confidence interval construction to incomplete data.
Internal validation is essential in order to avoid optimism, with the bootstrap 0.632+ estimate representing a reliable method to correct for optimism. While estimates obtained by MI-Val are optimistically biased, those obtained by MI(-y)-Val tend to be pessimistic in the presence of a true underlying effect. Val-MI provides largely unbiased estimates, with a slight pessimistic bias with increasing true effect size, number of covariates and decreasing sample size. In Val-MI, accuracy of the estimate is more strongly improved by increasing the number of bootstrap draws rather than the number of imputations. With a simple integrated approach, valid confidence intervals for performance estimates can be obtained.
When prognostic models are developed on incomplete data, Val-MI represents a valid strategy to obtain estimates of predictive performance measures.
Electronic supplementary material
The online version of this article (doi:10.1186/s12874-016-0239-7) contains supplementary material, which is available to authorized users.
Missing values; Incomplete data; Prediction model; Predictive performance; Bootstrap; Internal validation; Resampling; Cross-validation; Multiple imputation; MICE
The goal of genome-wide prediction (GWP) is to predict phenotypes based on marker genotypes, often obtained through single nucleotide polymorphism (SNP) chips. The major problem with GWP is high-dimensional data from many thousands of SNPs scored on several thousands of individuals. A large number of methods have been developed for GWP, which are mostly parametric methods that assume statistical linearity and only additive genetic effects. The Bayesian additive regression trees (BART) method was recently proposed and is based on the sum of nonparametric regression trees with the priors being used to regularize the parameters. Each regression tree is based on a recursive binary partitioning of the predictor space that approximates an unknown function, which will automatically model nonlinearities within SNPs (dominance) and interactions between SNPs (epistasis). In this study, we introduced BART and compared its predictive performance with that of the LASSO, Bayesian LASSO (BLASSO), genomic best linear unbiased prediction (GBLUP), reproducing kernel Hilbert space (RKHS) regression and random forest (RF) methods.
Tests on the QTLMAS2010 simulated data, which are mainly based on additive genetic effects, show that cross-validated optimization of BART provides a smaller prediction error than the RF, BLASSO, GBLUP and RKHS methods, and is almost as accurate as the LASSO method. If dominance and epistasis effects are added to the QTLMAS2010 data, the accuracy of BART relative to the other methods was increased. We also showed that BART can produce importance measures on the SNPs through variable inclusion proportions. In evaluations using real data on pigs, the prediction error was smaller with BART than with the other methods.
BART was shown to be an accurate method for GWP, in which the regression trees guarantee a very sparse representation of additive and complex non-additive genetic effects. Moreover, the Markov chain Monte Carlo algorithm with Bayesian back-fitting provides a computationally efficient procedure that is suitable for high-dimensional genomic data.
Electronic supplementary material
The online version of this article (doi:10.1186/s12711-016-0219-8) contains supplementary material, which is available to authorized users.
This paper presents a Bayesian hierarchical spatiotemporal method of interpolation, termed as Markov Cube Kriging (MCK). The classical Kriging methods become computationally prohibitive, especially for large datasets due to the O(n3) matrix decomposition. MCK offers novel and computationally efficient solutions to address spatiotemporal misalignment, mismatch in the spatiotemporal scales and missing values across space and time in large spatiotemporal datasets. MCK is flexible in that it allows for non-separable spatiotemporal structure and nonstationary covariance at the hierarchical spatiotemporal scales. Employing MCK we developed estimates of daily concentration of fine particulates matter ≤2.5 μm in aerodynamic diameter (PM2.5) at 2.5 km spatial grid for the Cleveland Metropolitan Statistical Area, 2000 to 2009. Our validation and cross-validation suggest that MCK achieved robust prediction of spatiotemporal random effects and underlying hierarchical and nonstationary spatiotemporal structure in air pollution data. MCK has important implications for environmental epidemiology and environmental sciences for exposure quantification and collocation of data from different sources, available at different spatiotemporal scales.
Time-space Kriging; Spatiotemporal hierarchical model; Gaussian Markov Random Fields; Nonstationarity; Bayesian computation; Fine particulate matter PM2.5
The relationship between exposure to environmental chemicals during pregnancy and early childhood development is an important issue which has a spatial risk component. In this context, we have examined mental retardation and developmental delay (MRDD) outcome measures for children in a Medicaid population in South Carolina and sampled measures of soil chemistry (e.g. As, Hg, etc.) on a network of sites which are misaligned to the outcome residential addresses during pregnancy. The true chemical concentration at the residential addresses is not observed directly and must be interpolated from soil samples. In this study, we have developed a Bayesian joint model which interpolates soil chemical fields and estimates the associated MRDD risk simultaneously. Having multiple spatial fields to interpolate, we have considered a low-rank Kriging method for the interpolation which requires less computation than Bayesian Kriging. We performed a sensitivity analysis for a bivariate smoothing, changing the number of knots and the smoothing parameter. These analyses show that a low-rank Kriging method can be used as an alternative to a full-rank Kriging, reducing computational burden. However, the number of knots for the low-rank Kriging model need to be selected with caution as a bivariate surface estimation can be sensitive to the choice of the number of knots.
environmental exposure; logistic; spatial; low-rank Kriging; Bayesian
Prescriptions for radiation therapy are given in terms of dose-volume constraints (DVCs). Solving the fluence map optimization (FMO) problem while satisfying DVCs often requires a tedious trial-and-error for selecting appropriate dose control parameters on various organs. In this paper, we propose an iterative approach to satisfy DVCs using a multi-objective linear programming (LP) model for solving beamlet intensities. This algorithm, starting from arbitrary initial parameter values, gradually updates the values through an iterative solution process toward optimal solution. This method finds appropriate parameter values through the trade-off between OAR sparing and target coverage to improve the solution. We compared the plan quality and the satisfaction of the DVCs by the proposed algorithm with two nonlinear approaches: a nonlinear FMO model solved by using the L-BFGS algorithm and another approach solved by a commercial treatment planning system (Eclipse 8.9). We retrospectively selected from our institutional database five patients with lung cancer and one patient with prostate cancer for this study. Numerical results show that our approach successfully improved target coverage to meet the DVCs, while trying to keep corresponding OAR DVCs satisfied. The LBFGS algorithm for solving the nonlinear FMO model successfully satisfied the DVCs in three out of five test cases. However, there is no recourse in the nonlinear FMO model for correcting unsatisfied DVCs other than manually changing some parameter values through trial and error to derive a solution that more closely meets the DVC requirements. The LP-based heuristic algorithm outperformed the current treatment planning system in terms of DVC satisfaction. A major strength of the LP-based heuristic approach is that it is not sensitive to the starting condition.
Fluence Map Optimization (FMO); Linear Programming (LP); Nonlinear Programming (NLP); Dose-Volume Constraint (DVC); Intensity-Modulated Proton Therapy (IMPT)
We computationally investigate two approaches for uncertainty quantification in inverse problems for nonlinear parameter dependent dynamical systems. We compare the bootstrapping and asymptotic theory approaches for problems involving data with several noise forms and levels. We consider both constant variance absolute error data and relative error which produces non-constant variance data in our parameter estimation formulations. We compare and contrast parameter estimates, standard errors, confidence intervals, and computational times for both bootstrapping and asymptotic theory methods.
Uncertainty quantification; parameter estimation; nonlinear dynamic models; bootstrapping; asymptotic theory standard errors; ordinary least squares vs. generalized least squares; computational examples
Background: Studies estimating health effects of long-term air pollution exposure often use a two-stage approach: building exposure models to assign individual-level exposures, which are then used in regression analyses. This requires accurate exposure modeling and careful treatment of exposure measurement error.
Objective: To illustrate the importance of accounting for exposure model characteristics in two-stage air pollution studies, we considered a case study based on data from the Multi-Ethnic Study of Atherosclerosis (MESA).
Methods: We built national spatial exposure models that used partial least squares and universal kriging to estimate annual average concentrations of four PM2.5 components: elemental carbon (EC), organic carbon (OC), silicon (Si), and sulfur (S). We predicted PM2.5 component exposures for the MESA cohort and estimated cross-sectional associations with carotid intima-media thickness (CIMT), adjusting for subject-specific covariates. We corrected for measurement error using recently developed methods that account for the spatial structure of predicted exposures.
Results: Our models performed well, with cross-validated R2 values ranging from 0.62 to 0.95. Naïve analyses that did not account for measurement error indicated statistically significant associations between CIMT and exposure to OC, Si, and S. EC and OC exhibited little spatial correlation, and the corrected inference was unchanged from the naïve analysis. The Si and S exposure surfaces displayed notable spatial correlation, resulting in corrected confidence intervals (CIs) that were 50% wider than the naïve CIs, but that were still statistically significant.
Conclusion: The impact of correcting for measurement error on health effect inference is concordant with the degree of spatial correlation in the exposure surfaces. Exposure model characteristics must be considered when performing two-stage air pollution epidemiologic analyses because naïve health effect inference may be inappropriate.
Citation: Bergen S, Sheppard L, Sampson PD, Kim SY, Richards M, Vedal S, Kaufman JD, Szpiro AA. 2013. A national prediction model for PM2.5 component exposures and measurement error–corrected health effect inference. Environ Health Perspect 121:1017–1025; http://dx.doi.org/10.1289/ehp.1206010
Knowledge of the uncertainty in model parameters is essential for decision-making in drug development. Contrarily to other aspects of nonlinear mixed effects models (NLMEM), scrutiny towards assumptions around parameter uncertainty is low, and no diagnostic exists to judge whether the estimated uncertainty is appropriate. This work aims at introducing a diagnostic capable of assessing the appropriateness of a given parameter uncertainty distribution. The new diagnostic was applied to case bootstrap examples in order to investigate for which dataset sizes case bootstrap is appropriate for NLMEM. The proposed diagnostic is a plot comparing the distribution of differences in objective function values (dOFV) of the proposed uncertainty distribution to a theoretical Chi square distribution with degrees of freedom equal to the number of estimated model parameters. The uncertainty distribution was deemed appropriate if its dOFV distribution was overlaid with or below the theoretical distribution. The diagnostic was applied to the bootstrap of two real data and two simulated data examples, featuring pharmacokinetic and pharmacodynamic models and datasets of 20–200 individuals with between 2 and 5 observations on average per individual. In the real data examples, the diagnostic indicated that case bootstrap was unsuitable for NLMEM analyses with around 70 individuals. A measure of parameter-specific “effective” sample size was proposed as a potentially better indicator of bootstrap adequacy than overall sample size. In the simulation examples, bootstrap confidence intervals were shown to underestimate inter-individual variability at low sample sizes. The proposed diagnostic proved a relevant tool for assessing the appropriateness of a given parameter uncertainty distribution and as such it should be routinely used.
Parameter uncertainty distributions; Bootstrap; Model diagnostics; Nonlinear mixed-effects models
A dynamic treatment regime (DTR) comprises a sequence of decision rules, one per stage of intervention, that recommends how to individualize treatment to patients based on evolving treatment and covariate history. These regimes are useful for managing chronic disorders, and fit into the larger paradigm of personalized medicine. The Value of a DTR is the expected outcome when the DTR is used to assign treatments to a population of interest.
The Value of a data-driven DTR, estimated using data from a sequential multiple assignment randomized trial, is both a data-dependent parameter and a non-smooth function of the underlying generative distribution. These features introduce additional variability that is not accounted for by standard methods for conducting statistical inference, e.g., the bootstrap or normal approximations, if applied without adjustment. Our purpose is to provide a feasible method for constructing valid confidence intervals for this quantity of practical interest.
We propose a conceptually simple and computationally feasible method for constructing valid confidence intervals for the Value of an estimated DTR based on subsampling. The method is self-tuning by virtue of an approach called the double bootstrap. We demonstrate the proposed method using a series of simulated experiments.
The proposed method offers considerable improvement in terms of coverage rates of the confidence intervals over the standard bootstrap approach.
In this paper, we have restricted our attention to Q-learning for estimating the optimal DTR. However, other methods can be employed for this purpose; to keep the discussion focused, we have not explored these alternatives.
Subsampling-based confidence intervals provide much better performance compared to standard bootstrap for the Value of an estimated DTR.
It has been suggested that children with larger brains tend to perform better on IQ tests or cognitive function tests. Prenatal head growth and head growth in infancy are two crucial periods for subsequent intelligence. Studies have shown that environmental exposure to air pollutants during pregnancy is associated with fetal growth reduction, developmental delay, and reduced IQ. Meanwhile, genetic polymorphisms may modify the effect of environment on head growth. However, studies on gene–environment or gene–gene interactions on growth trajectories have been quite limited partly due to the difficulty to quantitatively measure interactions on growth trajectories. Moreover, it is known that assessing the significance of gene–environment or gene–gene interactions on cross-sectional outcomes empirically using the permutation procedures may bring substantial errors in the tests. We proposed a score that quantitatively measures interactions on growth trajectories and developed an algorithm with a parametric bootstrap procedure to empirically assess the significance of the interactions on growth trajectories under the likelihood framework. We also derived a Wald statistic to test for interactions on growth trajectories and compared it to the proposed parametric bootstrap procedure. Through extensive simulation studies, we demonstrated the feasibility and power of the proposed testing procedures. We applied our method to a real dataset with head circumference measures from birth to age 7 on a cohort currently being conducted by the Columbia Center for Children's Environmental Health (CCCEH) in Krakow, Poland, and identified several significant gene–environment interactions on head circumference growth trajectories.
gene–environment interactions; growth curves; Wald test; parametric bootstrap
Very frequently the same biological system is described by several, sometimes competing mathematical models. This usually creates confusion around their validity, ie, which one is correct. However, this is unnecessary since validity of a model cannot be established; model validation is actually a misnomer. In principle the only statement that one can make about a system model is that it is incorrect, ie, invalid, a fact which can be established given appropriate experimental data. Nonlinear models of high dimension and with many parameters are impossible to invalidate through simulation and as such the invalidation process is often overlooked or ignored.
We develop different approaches for showing how competing ordinary differential equation (ODE) based models of the same biological phenomenon containing nonlinearities and parametric uncertainty can be invalidated using experimental data. We first emphasize the strong interplay between system identification and model invalidation and we describe a method for obtaining a lower bound on the error between candidate model predictions and data. We then turn to model invalidation and formulate a methodology for discrete-time and continuous-time model invalidation. The methodology is algorithmic and uses Semidefinite Programming as the computational tool. It is emphasized that trying to invalidate complex nonlinear models through exhaustive simulation is not only computationally intractable but also inconclusive.
Biological models derived from experimental data can never be validated. In fact, in order to understand biological function one should try to invalidate models that are incompatible with available data. This work describes a framework for invalidating both continuous and discrete-time ODE models based on convex optimization techniques. The methodology does not require any simulation of the candidate models; the algorithms presented in this paper have a worst case polynomial time complexity and can provide an exact answer to the invalidation problem.
Case-control studies are widely used to detect gene-environment interactions in the etiology of complex diseases. Many variables that are of interest to biomedical researchers are difficult to measure on an individual level, e.g. nutrient intake, cigarette smoking exposure, long-term toxic exposure. Measurement error causes bias in parameter estimates, thus masking key features of data and leading to loss of power and spurious/masked associations. We develop a Bayesian methodology for analysis of case-control studies for the case when measurement error is present in an environmental covariate and the genetic variable has missing data. This approach offers several advantages. It allows prior information to enter the model to make estimation and inference more precise. The environmental covariates measured exactly are modeled completely nonparametrically. Further, information about the probability of disease can be incorporated in the estimation procedure to improve quality of parameter estimates, what cannot be done in conventional case-control studies. A unique feature of the procedure under investigation is that the analysis is based on a pseudo-likelihood function therefore conventional Bayesian techniques may not be technically correct. We propose an approach using Markov Chain Monte Carlo sampling as well as a computationally simple method based on an asymptotic posterior distribution. Simulation experiments demonstrated that our method produced parameter estimates that are nearly unbiased even for small sample sizes. An application of our method is illustrated using a population-based case-control study of the association between calcium intake with the risk of colorectal adenoma development.
Bayesian inference; Errors in variables; Gene-environment interactions; Markov Chain Monte Carlo sampling; Missing data; Pseudo-likelihood; Semiparametric methods
In this paper, we study filtering of multiscale dynamical systems with model error arising from limitations in resolving the smaller scale processes. In particular, the analysis assumes the availability of continuous-time noisy observations of all components of the slow variables. Mathematically, this paper presents new results on higher order asymptotic expansion of the first two moments of a conditional measure. In particular, we are interested in the application of filtering multiscale problems in which the conditional distribution is defined over the slow variables, given noisy observation of the slow variables alone. From the mathematical analysis, we learn that for a continuous time linear model with Gaussian noise, there exists a unique choice of parameters in a linear reduced model for the slow variables which gives the optimal filtering when only the slow variables are observed. Moreover, these parameters simultaneously give the optimal equilibrium statistical estimates of the underlying system, and as a consequence they can be estimated offline from the equilibrium statistics of the true signal. By examining a nonlinear test model, we show that the linear theory extends in this non-Gaussian, nonlinear configuration as long as we know the optimal stochastic parametrization and the correct observation model. However, when the stochastic parametrization model is inappropriate, parameters chosen for good filter performance may give poor equilibrium statistical estimates and vice versa; this finding is based on analytical and numerical results on our nonlinear test model and the two-layer Lorenz-96 model. Finally, even when the correct stochastic ansatz is given, it is imperative to estimate the parameters simultaneously and to account for the nonlinear feedback of the stochastic parameters into the reduced filter estimates. In numerical experiments on the two-layer Lorenz-96 model, we find that the parameters estimated online, as part of a filtering procedure, simultaneously produce accurate filtering and equilibrium statistical prediction. In contrast, an offline estimation technique based on a linear regression, which fits the parameters to a training dataset without using the filter, yields filter estimates which are worse than the observations or even divergent when the slow variables are not fully observed. This finding does not imply that all offline methods are inherently inferior to the online method for nonlinear estimation problems, it only suggests that an ideal estimation technique should estimate all parameters simultaneously whether it is online or offline.
filtering multi-scale systems; covariance inflation; stochastic parameterization; uncertainty quantification; model error; parameter estimation