Correlated sampling Monte Carlo methods can shorten computing times in brachytherapy treatment planning. Monte Carlo efficiency is typically estimated via efficiency gain, defined as the reduction in computing time by correlated sampling relative to conventional Monte Carlo methods when equal statistical uncertainties have been achieved. The determination of the efficiency gain uncertainty arising from random effects, however, is not a straightforward task specially when the error distribution is non-normal. The purpose of this study is to evaluate the applicability of the F distribution and standardized uncertainty propagation methods (widely used in metrology to estimate uncertainty of physical measurements) for predicting confidence intervals about efficiency gain estimates derived from single Monte Carlo runs using fixed-collision correlated sampling in a simplified brachytherapy geometry. A bootstrap based algorithm was used to simulate the probability distribution of the efficiency gain estimates and the shortest 95% confidence interval was estimated from this distribution. It was found that the corresponding relative uncertainty was as large as 37% for this particular problem. The uncertainty propagation framework predicted confidence intervals reasonably well; however its main disadvantage was that uncertainties of input quantities had to be calculated in a separate run via a Monte Carlo method. The F distribution noticeably underestimated the confidence interval. These discrepancies were influenced by several photons with large statistical weights which made extremely large contributions to the scored absorbed dose difference. The mechanism of acquiring high statistical weights in the fixed-collision correlated sampling method was explained and a mitigation strategy was proposed.
Monte Carlo; correlated sampling; efficiency; uncertainty; bootstrap
The spatial and space-time scan statistics are commonly applied for the detection of geographical disease clusters. Monte Carlo hypothesis testing is typically used to test whether the geographical clusters are statistically significant as there is no known way to calculate the null distribution analytically. In Monte Carlo hypothesis testing, simulated random data are generated multiple times under the null hypothesis, and the p-value is r/(R + 1), where R is the number of simulated random replicates of the data and r is the rank of the test statistic from the real data compared to the same test statistics calculated from each of the random data sets. A drawback to this powerful technique is that each additional digit of p-value precision requires ten times as many replicated datasets, and the additional processing can lead to excessive run times.
We propose a new method for obtaining more precise p-values with a given number of replicates. The collection of test statistics from the random replicates is used to estimate the true distribution of the test statistic under the null hypothesis by fitting a continuous distribution to these observations. The choice of distribution is critical, and for the spatial and space-time scan statistics, the extreme value Gumbel distribution performs very well while the gamma, normal and lognormal distributions perform poorly. From the fitted Gumbel distribution, we show that it is possible to estimate the analytical p-value with great precision even when the test statistic is far out in the tail beyond any of the test statistics observed in the simulated replicates. In addition, Gumbel-based rejection probabilities have smaller variability than Monte Carlo-based rejection probabilities, suggesting that the proposed approach may result in greater power than the true Monte Carlo hypothesis test for a given number of replicates.
For large data sets, it is often advantageous to replace computer intensive Monte Carlo hypothesis testing with this new method of fitting a Gumbel distribution to random data sets generated under the null, in order to reduce computation time and obtain much more precise p-values and slightly higher statistical power.
In biomedical research, response variables are often encountered which have bounded support on the open unit interval - (0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to that of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models.
In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided.
If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the response data are generated from a discrete multinomial distribution with support on (0,1).
The linear regression model, the variable-dispersion beta regression model and the fractional logit regression model all perform well across the simulation experiments under consideration. When employing beta regression to estimate covariate effects on (0,1) response data, researchers should ensure their dispersion sub-model is properly specified, else inferential errors could arise.
Regression modelling; Linear regression; Beta regression; Variable-dispersion beta regression; Fractional Logit regression; Beta distribution; Multinomial distribution; Monte Carlo simulation
Concentrations of metabolites of illicit drugs in sewage water can be measured with great accuracy and precision, thanks to the development of sensitive and robust analytical methods. Based on assumptions about factors including the excretion profile of the parent drug, routes of administration and the number of individuals using the wastewater system, the level of consumption of a drug can be estimated from such measured concentrations. When presenting results from these ‘back-calculations’, the multiple sources of uncertainty are often discussed, but are not usually explicitly taken into account in the estimation process. In this paper we demonstrate how these calculations can be placed in a more formal statistical framework by assuming a distribution for each parameter involved, based on a review of the evidence underpinning it. Using a Monte Carlo simulations approach, it is then straightforward to propagate uncertainty in each parameter through the back-calculations, producing a distribution for instead of a single estimate of daily or average consumption. This can be summarised for example by a median and credible interval. To demonstrate this approach, we estimate cocaine consumption in a large urban UK population, using measured concentrations of two of its metabolites, benzoylecgonine and norbenzoylecgonine. We also demonstrate a more sophisticated analysis, implemented within a Bayesian statistical framework using Markov chain Monte Carlo simulation. Our model allows the two metabolites to simultaneously inform estimates of daily cocaine consumption and explicitly allows for variability between days. After accounting for this variability, the resulting credible interval for average daily consumption is appropriately wider, representing additional uncertainty. We discuss possibilities for extensions to the model, and whether analysis of wastewater samples has potential to contribute to a prevalence model for illicit drug use.
•Analysis of wastewater allows estimation of illicit drug consumption.•However, it is crucial to formally acknowledge the many sources of uncertainty.•The simple and flexible Monte Carlo simulation approach allows this.•There are many software options: we provide an Excel spreadsheet and R code.•Bayesian modelling using Markov chain Monte Carlo allows interesting extensions.
Sewage epidemiology; Monte Carlo simulation; Uncertainty propagation; Bayesian modelling; Illicit drugs
A new method for reconstruction of the interatomic distance distribution, P(r), directly from two-dimensional detector images of solution scattering data is developed and tested. This method employs Bayesian inference and a Markov chain Monte Carlo method to simultaneously estimate indirect transform coefficients and beam and detector parameters, while also evaluating the covariance among all parameters.
The interatomic distance distribution, P(r), is a valuable tool for evaluating the structure of a molecule in solution and represents the maximum structural information that can be derived from solution scattering data without further assumptions. Most current instrumentation for scattering experiments (typically CCD detectors) generates a finely pixelated two-dimensional image. In continuation of the standard practice with earlier one-dimensional detectors, these images are typically reduced to a one-dimensional profile of scattering intensities, I(q), by circular averaging of the two-dimensional image. Indirect Fourier transformation methods are then used to reconstruct P(r) from I(q). Substantial advantages in data analysis, however, could be achieved by directly estimating the P(r) curve from the two-dimensional images. This article describes a Bayesian framework, using a Markov chain Monte Carlo method, for estimating the parameters of the indirect transform, and thus P(r), directly from the two-dimensional images. Using simulated detector images, it is demonstrated that this method yields P(r) curves nearly identical to the reference P(r). Furthermore, an approach for evaluating spatially correlated errors (such as those that arise from a detector point spread function) is evaluated. Accounting for these errors further improves the precision of the P(r) estimation. Experimental scattering data, where no ground truth reference P(r) is available, are used to demonstrate that this method yields a scattering and detector model that more closely reflects the two-dimensional data, as judged by smaller residuals in cross-validation, than P(r) obtained by indirect transformation of a one-dimensional profile. Finally, the method allows concurrent estimation of the beam center and D
max, the longest interatomic distance in P(r), as part of the Bayesian Markov chain Monte Carlo method, reducing experimental effort and providing a well defined protocol for these parameters while also allowing estimation of the covariance among all parameters. This method provides parameter estimates of greater precision from the experimental data. The observed improvement in precision for the traditionally problematic D
max is particularly noticeable.
structure analysis; small-angle X-ray scattering; small-angle neutron scattering; Bayesian inference; Markov chain Monte Carlo methods
An important step in strain optimization is to identify reactions whose activities should be modified to achieve the desired cellular objective. Preferably, these reactions are identified systematically, as the number of possible combinations of reaction modifications could be very large. Over the last several years, a number of computational methods have been described for identifying combinations of reaction modifications. However, none of these methods explicitly address uncertainties in implementing the reaction activity modifications. In this work, we model the uncertainties as probability distributions in the flux carrying capacities of reactions. Based on this model, we develop an optimization method that identifies reactions for flux capacity modifications to predict outcomes with high statistical likelihood.
We compare three optimization methods that select an intervention set comprising up- or down-regulation of reaction flux capacity: CCOpt (Chance constrained optimization), DetOpt (Deterministic optimization), and MCOpt (Monte Carlo-based optimization). We evaluate the methods using a Monte Carlo simulation-based method, MCEval (Monte Carlo Evaluations). We present two case studies analyzing a CHO cell and an adipocyte model. The flux capacity distributions required for our methods were estimated from maximal reaction velocities or elementary mode analysis. The intervention set selected by CCOpt consistently outperforms the intervention set selected by DetOpt in terms of tolerance to flux capacity variations. MCEval shows that the optimal flux predicted based on the CCOpt intervention set is more likely to be obtained, in a probabilistic sense, than the flux predicted by DetOpt. The intervention sets identified by CCOpt and MCOpt were similar; however, the exhaustive sampling required by MCOpt incurred significantly greater computational cost.
Maximizing tolerance to variable engineering outcomes (in modifying enzyme activities) can identify intervention sets that statistically improve the desired cellular objective.
Enzyme activity modification; Flux capacity; Uncertainty; Chance-constrained optimization
This paper describes an extension of the perturbation Monte Carlo method to model light transport
when the phase function is arbitrarily perturbed. Current perturbation Monte Carlo methods allow
perturbation of both the scattering and absorption coefficients, however, the phase function can not
be varied. The more complex method we develop and test here is not limited in this way. We derive a
rigorous perturbation Monte Carlo extension that can be applied to a large family of important
biomedical light transport problems and demonstrate its greater computational efficiency compared
with using conventional Monte Carlo simulations to produce forward transport problem solutions. The
gains of the perturbation method occur because only a single baseline Monte Carlo simulation is
needed to obtain forward solutions to other closely related problems whose input is described by
perturbing one or more parameters from the input of the baseline problem. The new perturbation Monte
Carlo methods are tested using tissue light scattering parameters relevant to epithelia where many
tumors originate. The tissue model has parameters for the number density and average size of three
classes of scatterers; whole nuclei, organelles such as lysosomes and mitochondria, and small
particles such as ribosomes or large protein complexes. When these parameters or the wavelength is
varied the scattering coefficient and the phase function vary. Perturbation calculations give
accurate results over variations of ∼15–25% of the scattering
(170.0170) Medical optics and biotechnology; (170.3660) Light propagation in tissues; (170.6510) Spectroscopy, tissue diagnostics; (170.6935) Tissue characterization
Classification studies with high-dimensional measurements and relatively small sample sizes are increasingly common. Prospective analysis of the role of sample sizes in the performance of such studies is important for study design and interpretation of results, but the complexity of typical pattern discovery methods makes this problem challenging. The approach developed here combines Monte Carlo methods and new approximations for linear discriminant analysis, assuming multivariate normal distributions. Monte Carlo methods are used to sample the distribution of which features are selected for a classifier and the mean and variance of features given that they are selected. Given selected features, the linear discriminant problem involves different distributions of training data and generalization data, for which 2 approximations are compared: one based on Taylor series approximation of the generalization error and the other on approximating the discriminant scores as normally distributed. Combining the Monte Carlo and approximation approaches to different aspects of the problem allows efficient estimation of expected generalization error without full simulations of the entire sampling and analysis process. To evaluate the method and investigate realistic study design questions, full simulations are used to ask how validation error rate depends on the strength and number of informative features, the number of noninformative features, the sample size, and the number of features allowed into the pattern. Both approximation methods perform well for most cases but only the normal discriminant score approximation performs well for cases of very many weakly informative or uninformative dimensions. The simulated cases show that many realistic study designs will typically estimate substantially suboptimal patterns and may have low probability of statistically significant validation results.
Biomarker discovery; Experimental design; Generalization error; Genomic; Pattern recognition; Proteomic
To use a Monte Carlo simulation to predict postoperative results with the AcrySof® Toric lens, evaluating the likelihood of over- or under-correction using various toric lens selection criteria.
Keratometric data were obtained from a large patient population with preoperative corneal astigmatism <= 2.50D (2,000 eyes). The probability distributions for toric marking accuracy, surgically induced astigmatism and lens rotation were estimated using available data. Anticipated residual astigmatism was calculated using a Monte Carlo simulation under two different lens selection scenarios.
This simulation demonstrated that random errors in alignment, surgically induced astigmatism and lens rotation slightly reduced the overall effect of the toric lens. Residual astigmatism was statistically significantly higher under the simulation of surgery relative to an exact calculation (p < 0.05). The simulation also demonstrated that more aggressive lens selection criteria could produce clinically significant reductions in residual astigmatism in a high percentage of patients.
Monte Carlo simulation suggests that surgical variability and lens orientation/rotation variability may combine to produce small reductions in the correction achieved with the AcrySof® Toric® IOL. Adopting more aggressive lens selection criteria may yield significantly lower residual astigmatism values for many patients, with negligible overcorrections. Surgeons are encouraged to evaluate their AcrySof® Toric® outcomes to determine if they should modify their individual lens selection criteria, or their default surgically induced astigmatism value, to benefit their patients.
Using highly precise and accurate Monte Carlo simulations of 20,000,000 replications and 102 independent simulation experiments with extremely low simulation errors and total uncertainties, we evaluated the performance of four single outlier discordancy tests (Grubbs test N2, Dixon test N8, skewness test N14, and kurtosis test N15) for normal samples of sizes 5 to 20. Statistical contaminations of a single observation resulting from parameters called δ from ±0.1 up to ±20 for modeling the slippage of central tendency or ε from ±1.1 up to ±200 for slippage of dispersion, as well as no contamination (δ = 0 and ε = ±1), were simulated. Because of the use of precise and accurate random and normally distributed simulated data, very large replications, and a large number of independent experiments, this paper presents a novel approach for precise and accurate estimations of power functions of four popular discordancy tests and, therefore, should not be considered as a simple simulation exercise unrelated to probability and statistics. From both criteria of the Power of Test proposed by Hayes and Kinsella and the Test Performance Criterion of Barnett and Lewis, Dixon test N8 performs less well than the other three tests. The overall performance of these four tests could be summarized as N2≅N15 > N14 > N8.
High computational requirements restrict the use of Monte Carlo algorithms for dose estimation in a clinical setting, despite the fact that they are considered more accurate than traditional methods. The goal of this study was to compare mean tumor absorbed dose estimates using the unit density sphere model incorporated in OLINDA with previously reported dose estimates from Monte Carlo simulations using the dose planning method (DPMMC) particle transport algorithm. The dataset (57 tumors, 19 lymphoma patients who underwent SPECT/CT imaging during I-131 radioimmunotherapy) included tumors of varying size, shape, and contrast. OLINDA calculations were first carried out using the baseline tumor volume and residence time from SPECT/CT imaging during 6 days post-tracer and 8 days post-therapy. Next, the OLINDA calculation was split over multiple time periods and summed to get the total dose, which accounted for the changes in tumor size. Results from the second calculation were compared with results determined by coupling SPECT/CT images with DPM Monte Carlo algorithms. Results from the OLINDA calculation accounting for changes in tumor size were almost always higher (median 22%, range −1%–68%) than the results from OLINDA using the baseline tumor volume because of tumor shrinkage. There was good agreement (median −5%, range −13%–2%) between the OLINDA results and the self-dose component from Monte Carlo calculations, indicating that tumor shape effects are a minor source of error when using the sphere model. However, because the sphere model ignores cross-irradiation, the OLINDA calculation significantly underestimated (median 14%, range 2%–31%) the total tumor absorbed dose compared with Monte Carlo. These results show that when the quantity of interest is the mean tumor absorbed dose, the unit density sphere model is a practical alternative to Monte Carlo for some applications. For applications requiring higher accuracy, computer-intensive Monte Carlo calculation is needed.
radiation dosimetry; radioimmunotherapy; SPECT; Monte Carlo dosimetry
The Monte Carlo method provides the most accurate dose calculations on a patient computed tomography (CT) geometry. The increase in accuracy is, at least in part, due to the fact that instead of treating human tissues as water of various densities as in analytical algorithms, the Monte Carlo method allows human tissues to be characterized by elemental composition and mass density, and hence allows the accurate consideration of all relevant electromagnetic and nuclear interactions. On the other hand, the algorithm to convert CT Hounsfield numbers to tissue materials for Monte Carlo dose calculation introduces uncertainties. There is not a simple one to one correspondence between Hounsfield numbers and tissue materials. To investigate the effects of Hounsfield number conversion for proton Monte Carlo dose calculations, clinical proton treatment plans were simulated using the Geant4 Monte Carlo code. Three Hounsfield number to material conversion methods were studied. The results were compared in forms of dose volume histograms of gross tumor volume and clinical target volume. The differences found are generally small but can be dosimetrically significant. Further, different methods may cause deviations in the predicted proton beam range in particular for deep proton fields. Typically, slight discrepancies in mass density assignments play only a minor role in the target region, whereas more significant effects are caused by different assignments in elemental compositions. In the presence of large tissue inhomogeneities, for head and neck treatments, treatment planning decisions could be affected by these differences because of deviations in the predicted tumor coverage. Outside the target area, differences in elemental composition and mass density assignments both may play a role. This can lead to pronounced effects for organs at risk, in particular in the spread-out Bragg peak penumbra or distal regions. In addition, the significance of the elemental composition effect (dose to water vs. dose to tissue) is tissue-type dependent and is also affected by nuclear reactions.
Geant4; Monte Carlo; proton therapy; CT Hounsfield conversion
Patient-specific absorbed dose calculation for nuclear medicine therapy is a topic of increasing interest. 3D dosimetry at the voxel level is one of the major improvements for the development of more accurate calculation techniques, as compared to the standard dosimetry at the organ level. This study aims to use the FLUKA Monte Carlo code to perform patient-specific 3D dosimetry through direct Monte Carlo simulation on PET-CT and SPECT-CT images. To this aim, dedicated routines were developed in the FLUKA environment. Two sets of simulations were performed on model and phantom images. Firstly, the correct handling of PET and SPECT images was tested under the assumption of homogeneous water medium by comparing FLUKA results with those obtained with the voxel kernel convolution method and with other Monte Carlo-based tools developed to the same purpose (the EGS-based 3D-RD software and the MCNP5-based MCID). Afterwards, the correct integration of the PET/SPECT and CT information was tested, performing direct simulations on PET/CT images for both homogeneous (water) and non-homogeneous (water with air, lung and bone inserts) phantoms. Comparison was performed with the other Monte Carlo tools performing direct simulation as well. The absorbed dose maps were compared at the voxel level. In the case of homogeneous water, by simulating 108 primary particles a 2% average difference with respect to the kernel convolution method was achieved; such difference was lower than the statistical uncertainty affecting the FLUKA results. The agreement with the other tools was within 3–4%, partially ascribable to the differences among the simulation algorithms. Including the CT-based density map, the average difference was always within 4% irrespective of the medium (water, air, bone), except for a maximum 6% value when comparing FLUKA and 3D-RD in air. The results confirmed that the routines were properly developed, opening the way for the use of FLUKA for patient-specific, image-based dosimetry in nuclear medicine.
Multivariate assays (MVAs) for assisting clinical decisions are becoming commonly available, but due to complexity, are often considered a high-risk approach. A key concern is that uncertainty on the assay's final results is not well understood. This study focuses on developing a process to characterize error introduced in the MVA's results from the intrinsic error in the laboratory process: sample preparation and measurement of the contributing factors, such as gene expression.
Using the PAM50 Breast Cancer Intrinsic Classifier, we show how to characterize error within an MVA, and how these errors may affect results reported to clinicians. First we estimated the error distribution for measured factors within the PAM50 assay by performing repeated measures on four archetypal samples representative of the major breast cancer tumor subtypes. Then, using the error distributions and the original archetypal sample data, we used Monte Carlo simulations to generate a sufficient number of simulated samples. The effect of these errors on the PAM50 tumor subtype classification was estimated by measuring subtype reproducibility after classifying all simulated samples. Subtype reproducibility was measured as the percentage of simulated samples classified identically to the parent sample. The simulation was thereafter repeated on a large, independent data set of samples from the GEICAM 9906 clinical trial. Simulated samples from the GEICAM sample set were used to explore a more realistic scenario where, unlike archetypal samples, many samples are not easily classified.
All simulated samples derived from the archetypal samples were classified identically to the parent sample. Subtypes for simulated samples from the GEICAM set were also highly reproducible, but there were a non-negligible number of samples that exhibit significant variability in their classification.
We have developed a general methodology to estimate the effects of intrinsic errors within MVAs. We have applied the method to the PAM50 assay, showing that the PAM50 results are resilient to intrinsic errors within the assay, but also finding that in non-archetypal samples, experimental errors can lead to quite different classification of a tumor. Finally we propose a way to provide the uncertainty information in a usable way for clinicians.
Multivariate Assays; PAM50; Monte Carlo Simulations; Breast Cancer
In this article we use a latent class model (LCM) with prevalence modeled as a function of covariates to assess diagnostic test accuracy in situations where the true disease status is not observed, but observations on three or more conditionally independent diagnostic tests are available. A fast Monte Carlo EM (MCEM) algorithm with binary (disease) diagnostic data is implemented to estimate parameters of interest; namely, sensitivity, specificity, and prevalence of the disease as a function of covariates. To obtain standard errors for confidence interval construction of estimated parameters, the missing information principle is applied to adjust information matrix estimates. We compare the adjusted information matrix based standard error estimates with the bootstrap standard error estimates both obtained using the fast MCEM algorithm through an extensive Monte Carlo study. Simulation demonstrates that the adjusted information matrix approach estimates the standard error similarly with the bootstrap methods under certain scenarios. The bootstrap percentile intervals have satisfactory coverage probabilities. We then apply the LCM analysis to a real data set of 122 subjects from a Gynecologic Oncology Group (GOG) study of significant cervical lesion (S-CL) diagnosis in women with atypical glandular cells of undetermined significance (AGC) to compare the diagnostic accuracy of a histology-based evaluation, a CA-IX biomarker-based test and a human papillomavirus (HPV) DNA test.
adjusted information matrix; bootstrap standard errors; diagnostic accuracy; imperfect gold standard; latent class model; MCEM estimation
This paper presents results of Monte Carlo modeling of the SRP-68-01 survey meter used to measure exposure rates near the thyroid glands of persons exposed to radioactivity following the Chernobyl accident. This device was not designed to measure radioactivity in humans. To estimate the uncertainty associated with the measurement results, a mathematical model of the SRP-68-01 survey meter was developed and verified. A Monte Carlo method of numerical simulation of radiation transport has been used to calculate the calibration factor for the device and evaluate its uncertainty. The SRP-68-01 survey meter scale coefficient, an important characteristic of the device, was also estimated in this study. The calibration factors of the survey meter were calculated for 131I, 132I, 133I, and 135I content in the thyroid gland for six age groups of population: newborns; children aged 1 yr, 5 yr, 10 yr, 15 yr; and adults. A realistic scenario of direct thyroid measurements with an “extended” neck was used to calculate the calibration factors for newborns and one-year-olds. Uncertainties in the device calibration factors due to variability of the device scale coefficient, variability in thyroid mass and statistical uncertainty of Monte Carlo method were evaluated. Relative uncertainties in the calibration factor estimates were found to be from 0.06 for children aged 1 yr to 0.1 for 10-yr and 15-yr children. The positioning errors of the detector during measurements deviate mainly in one direction from the estimated calibration factors. Deviations of the device position from the proper geometry of measurements were found to lead to overestimation of the calibration factor by up to 24 percent for adults and up to 60 percent for 1-yr children. The results of this study improve the estimates of 131I thyroidal content and, consequently, thyroid dose estimates that are derived from direct thyroid measurements performed in Belarus shortly after the Chernobyl accident.
Chernobyl; Thyroid; Measurement; Survey meter; Monte Carlo
In recent years, the Monte Carlo method has been used in a large number of research studies in radiation therapy. For applications such as treatment planning, it is essential to validate the dosimetric accuracy of the Monte Carlo simulations in heterogeneous media. The AAPM Report no 105 addresses issues concerning clinical implementation of Monte Carlo based treatment planning for photon and electron beams, however for proton-therapy planning, such guidance is not yet available. Here we present the results of our validation of the Monte Carlo model of the double scattering system used at our Proton Therapy Center in Houston. In this study, we compared Monte Carlo simulated depth doses and lateral profiles to measured data for a magnitude of beam parameters. We varied simulated proton energies and widths of the spread-out Bragg peaks, and compared them to measurements obtained during the commissioning phase of the Proton Therapy Center in Houston. Of 191 simulated data sets, 189 agreed with measured data sets to within 3% of the maximum dose difference and within 3 mm of the maximum range or penumbra size difference. The two simulated data sets that did not agree with the measured data sets were in the distal falloff of the measured dose distribution, where large dose gradients potentially produce large differences on the basis of minute changes in the beam steering. Hence, the Monte Carlo models of medium- and large-size double scattering proton-therapy nozzles were valid for proton beams in the 100 MeV–250 MeV interval.
Serological studies are the gold standard method to estimate influenza infection attack rates (ARs) in human populations. In a common protocol, blood samples are collected before and after the epidemic in a cohort of individuals; and a rise in haemagglutination-inhibition (HI) antibody titers during the epidemic is considered as a marker of infection. Because of inherent measurement errors, a 2-fold rise is usually considered as insufficient evidence for infection and seroconversion is therefore typically defined as a 4-fold rise or more. Here, we revisit this widely accepted 70-year old criterion. We develop a Markov chain Monte Carlo data augmentation model to quantify measurement errors and reconstruct the distribution of latent true serological status in a Vietnamese 3-year serological cohort, in which replicate measurements were available. We estimate that the 1-sided probability of a 2-fold error is 9.3% (95% Credible Interval, CI: 3.3%, 17.6%) when antibody titer is below 10 but is 20.2% (95% CI: 15.9%, 24.0%) otherwise. After correction for measurement errors, we find that the proportion of individuals with 2-fold rises in antibody titers was too large to be explained by measurement errors alone. Estimates of ARs vary greatly depending on whether those individuals are included in the definition of the infected population. A simulation study shows that our method is unbiased. The 4-fold rise case definition is relevant when aiming at a specific diagnostic for individual cases, but the justification is less obvious when the objective is to estimate ARs. In particular, it may lead to large underestimates of ARs. Determining which biological phenomenon contributes most to 2-fold rises in antibody titers is essential to assess bias with the traditional case definition and offer improved estimates of influenza ARs.
Each year, seasonal influenza is responsible for about three to five million severe illnesses and about 250,000 to 500,000 deaths worldwide. In order to assess the burden of disease and guide control policies, it is important to quantify the proportion of people infected by an influenza virus each year. Since infection usually leaves a “signature” in the blood of infected individuals (namely a rise in antibodies), a standard protocol consists in collecting blood samples in a cohort of subjects and determining the proportion of those who experienced such rise. However, because of inherent measurement errors, only large rises are accounted for in the standard 4-fold rise case definition. Here, we revisit this 70 year old and widely accepted and applied criterion. We present innovative statistical techniques to better capture the impact of measurement errors and improve our interpretation of the data. Our analysis suggests that the number of people infected by an influenza virus each year might be substantially larger than previously thought, with important implications for our understanding of the transmission and evolution of influenza – and the nature of infection.
The quantification of uncertainty and variability is a key component of quantitative risk analysis. Recent advances in Bayesian statistics make it ideal for integrating multiple sources of information, of different types and quality, and providing a realistic estimate of the combined uncertainty in the final risk estimates.
We present two case studies related to foodborne microbial risks. In the first, we combine models to describe the sequence of events resulting in illness from consumption of milk contaminated with VTEC O157. We used Monte Carlo simulation to propagate uncertainty in some of the inputs to computer models describing the farm and pasteurisation process. Resulting simulated contamination levels were then assigned to consumption events from a dietary survey. Finally we accounted for uncertainty in the dose-response relationship and uncertainty due to limited incidence data to derive uncertainty about yearly incidences of illness in young children. Options for altering the risk were considered by running the model with different hypothetical policy-driven exposure scenarios. In the second case study we illustrate an efficient Bayesian sensitivity analysis for identifying the most important parameters of a complex computer code that simulated VTEC O157 prevalence within a managed dairy herd. This was carried out in 2 stages, first to screen out the unimportant inputs, then to perform a more detailed analysis on the remaining inputs. The method works by building a Bayesian statistical approximation to the computer code using a number of known code input/output pairs (training runs).
We estimated that the expected total number of children aged 1.5-4.5 who become ill due to VTEC O157 in milk is 8.6 per year, with 95% uncertainty interval (0,11.5). The most extreme policy we considered was banning on-farm pasteurisation of milk, which reduced the estimate to 6.4 with 95% interval (0,11). In the second case study the effective number of inputs was reduced from 30 to 7 in the screening stage, and just 2 inputs were found to explain 82.8% of the output variance. A combined total of 500 runs of the computer code were used.
These case studies illustrate the use of Bayesian statistics to perform detailed uncertainty and sensitivity analyses, integrating multiple information sources in a way that is both rigorous and efficient.
Autoregressive regression coefficients for Anopheles arabiensis aquatic habitat models are usually assessed using global error techniques and are reported as error covariance matrices. A global statistic, however, will summarize error estimates from multiple habitat locations. This makes it difficult to identify where there are clusters of An. arabiensis aquatic habitats of acceptable prediction. It is therefore useful to conduct some form of spatial error analysis to detect clusters of An. arabiensis aquatic habitats based on uncertainty residuals from individual sampled habitats. In this research, a method of error estimation for spatial simulation models was demonstrated using autocorrelation indices and eigenfunction spatial filters to distinguish among the effects of parameter uncertainty on a stochastic simulation of ecological sampled Anopheles aquatic habitat covariates. A test for diagnostic checking error residuals in an An. arabiensis aquatic habitat model may enable intervention efforts targeting productive habitats clusters, based on larval/pupal productivity, by using the asymptotic distribution of parameter estimates from a residual autocovariance matrix. The models considered in this research extends a normal regression analysis previously considered in the literature.
Field and remote-sampled data were collected during July 2006 to December 2007 in Karima rice-village complex in Mwea, Kenya. SAS 9.1.4® was used to explore univariate statistics, correlations, distributions, and to generate global autocorrelation statistics from the ecological sampled datasets. A local autocorrelation index was also generated using spatial covariance parameters (i.e., Moran's Indices) in a SAS/GIS® database. The Moran's statistic was decomposed into orthogonal and uncorrelated synthetic map pattern components using a Poisson model with a gamma-distributed mean (i.e. negative binomial regression). The eigenfunction values from the spatial configuration matrices were then used to define expectations for prior distributions using a Markov chain Monte Carlo (MCMC) algorithm. A set of posterior means were defined in WinBUGS 1.4.3®. After the model had converged, samples from the conditional distributions were used to summarize the posterior distribution of the parameters. Thereafter, a spatial residual trend analyses was used to evaluate variance uncertainty propagation in the model using an autocovariance error matrix.
By specifying coefficient estimates in a Bayesian framework, the covariate number of tillers was found to be a significant predictor, positively associated with An. arabiensis aquatic habitats. The spatial filter models accounted for approximately 19% redundant locational information in the ecological sampled An. arabiensis aquatic habitat data. In the residual error estimation model there was significant positive autocorrelation (i.e., clustering of habitats in geographic space) based on log-transformed larval/pupal data and the sampled covariate depth of habitat.
An autocorrelation error covariance matrix and a spatial filter analyses can prioritize mosquito control strategies by providing a computationally attractive and feasible description of variance uncertainty estimates for correctly identifying clusters of prolific An. arabiensis aquatic habitats based on larval/pupal productivity.
The principal goal of this methodological paper is to suggest to a general audience in the genetics community that the consideration of recent developments of self regulating branching processes may lead to the possibility of including this class of stochastic processes as part of working paradigm of evolutionary and population genetics. This class of branching processes is self regulating in the sense that an evolving population will grow only to a total population size that can be sustained by the environment. From the mathematical point of view the class processes under consideration belongs to a subfield of probability and statistics sometimes referred to as computational applied probability and stochastic processes. Computer intensive methods based on Monte Carlo simulation procedures have been used to empirically work out the predictions of a formulation by assigning numerical values to some point in the parameter space and computing replications of realizations of the process over thousands of generations of evolution. Statistical methods are then used on such samples of simulated data to produce informative summarizations of the data that provide insights into the evolutionary implications of computer experiments. Briefly, it is also possible to embed deterministic non-linear difference equations in the stochastic process by using a statistical procedure to estimate the sample functions of the process, which has interesting methodological implications as to whether stochastic or deterministic formulations may be applied separately or in combination in the study of evolution. It is recognized that the literature on population genetics contains a substantial number of papers in which Monte Carlo simulation methods have been used. But, this extensive literature is beyond the scope of this paper, which is focused on potential applications of self regulating branching processes in evolutionary and population genetics.
simulating evolution; mutations; density dependence; Monte Carlo methods; statistical summarizations; branching processes; embedded deterministic model
Detailed modeling and simulation of biochemical systems is complicated by the problem of combinatorial complexity, an explosion in the number of species and reactions due to myriad protein-protein interactions and post-translational modifications. Rule-based modeling overcomes this problem by representing molecules as structured objects and encoding their interactions as pattern-based rules. This greatly simplifies the process of model specification, avoiding the tedious and error prone task of manually enumerating all species and reactions that can potentially exist in a system. From a simulation perspective, rule-based models can be expanded algorithmically into fully-enumerated reaction networks and simulated using a variety of network-based simulation methods, such as ordinary differential equations or Gillespie's algorithm, provided that the network is not exceedingly large. Alternatively, rule-based models can be simulated directly using particle-based kinetic Monte Carlo methods. This “network-free” approach produces exact stochastic trajectories with a computational cost that is independent of network size. However, memory and run time costs increase with the number of particles, limiting the size of system that can be feasibly simulated. Here, we present a hybrid particle/population simulation method that combines the best attributes of both the network-based and network-free approaches. The method takes as input a rule-based model and a user-specified subset of species to treat as population variables rather than as particles. The model is then transformed by a process of “partial network expansion” into a dynamically equivalent form that can be simulated using a population-adapted network-free simulator. The transformation method has been implemented within the open-source rule-based modeling platform BioNetGen, and resulting hybrid models can be simulated using the particle-based simulator NFsim. Performance tests show that significant memory savings can be achieved using the new approach and a monetary cost analysis provides a practical measure of its utility.
Rule-based modeling is a modeling paradigm that addresses the problem of combinatorial complexity in biochemical systems. The key idea is to specify only those components of a biological macromolecule that are directly involved in a biochemical transformation. Until recently, this “pattern-based” approach greatly simplified the process of model building but did nothing to improve the performance of model simulation. This changed with the introduction of “network-free” simulation methods, which operate directly on the compressed rule set of a rule-based model rather than on a fully-enumerated set of reactions and species. However, these methods represent every molecule in a system as a particle, limiting their use to systems containing less than a few million molecules. Here, we describe an extension to the network-free approach that treats rare, complex species as particles and plentiful, simple species as population variables, while retaining the exact dynamics of the model system. By making more efficient use of computational resources for species that do not require the level of detail of a particle representation, this hybrid particle/population approach can simulate systems much larger than is possible using network-free methods and is an important step towards realizing the practical simulation of detailed, mechanistic models of whole cells.
Develop fast sequential Bayesian inference for disease outbreak counts.
Development of effective policy interventions to stem disease outbreaks requires knowledge of the current state of affairs, e.g. how many individuals are currently infected, a strain’s virulence, etc, as well as our uncertainty of these values. A Bayesian inferential approach provides this information, but at a computational expense. We develop a sequential Bayesian approach based on an epidemiological compartment model and noisy count observations of the transitions between compartments.
For simplicity, consider an SIR epidemiological compartment model where compartments exist for susceptible, infected, and recovered individuals. Transitions between compartments occur in discrete time with transitions numbers given by Poisson random variables, the tau-leaping approximation, whose means depend on the current compartment occupancy and some unknown fixed parameters, e.g. virulence. Binomial observations, with possible unknown sampling proportion, are made on these transitions.
The standard sequential Bayesian updating methodology is sequential Monte Carlo (SMC), a.k.a. particle filtering. The original bootstrap filter is effective when the system has no fixed parameters, but exhibits marked degeneracy otherwise . An approach based on resampling the fixed parameters from a kernel density estimate provides a generic approach with less degeneracy .
We build methodology based on a particle learning framework . In this framework, each particle carries a set of parameter-specific sufficient statistics and samples parameter values whenever necessary. In addition, the methodology promotes a resample-move approach based on the predictive likelihood that reduces degeneracy in the first place.
An improvement on the particle learning framework in this model is that some fixed parameters can be integrated out of the predictive likelihood. This Rao-Blackwellization provides an SMC methodology with reduced Monte Carlo variance.
For a fixed number of particles or computational expense, we show improvements in accuracy relative to the kernel density approach and an alternative approach based on sufficient statistics  where compared with a gold-standard Markov chain Monte Carlo analysis.
Many surveillance systems collect counts of adverse events related to some disease. These counts are expected to be a fraction of the true underlying disease extent. The methodology developed here allows a fully Bayesian analysis that uncovers the true number of infected individuals as well as disease virulence based on these count data. This statistical approach can be combined with an optimal policy map to help public health officials react effectively to initial disease reports.
surveillance; Bayesian; sequential Monte Carlo; particle learning
We have implemented highly accurate Monte Carlo based scatter modeling (MCS) with 3-D ordered subsets expectation maximization (OSEM) reconstruction for I-131 single photon emission computed tomography (SPECT). The scatter is included in the statistical model as an additive term and attenuation and detector response are included in the forward/backprojector. In the present implementation of MCS, a simple multiple window-based estimate is used for the initial iterations and in the later iterations the Monte Carlo estimate is used for several iterations before it is updated. For I-131, MCS was evaluated and compared with triple energy window (TEW) scatter compensation using simulation studies of a mathematical phantom and a clinically realistic voxel-phantom. Even after just two Monte Carlo updates, excellent agreement was found between the MCS estimate and the true scatter distribution. Accuracy and noise of the reconstructed images were superior with MCS compared to TEW. However, the improvement was not large, and in some cases may not justify the large computational requirements of MCS. Furthermore, it was shown that the TEW correction could be improved for most of the targets investigated here by applying a suitably chosen scaling factor to the scatter estimate. Finally clinical application of MCS was demonstrated by applying the method to an I-131 radioimmunotherapy (RIT) patient study.
I-131 SPECT; image reconstruction; Monte Carlo; scatter correction; SPECT quantification
There is considerable uncertainty in the disease rate estimation for aggregated area maps, especially for small population areas. As a consequence the delineation of local clustering is subject to substantial variation. Consider the most likely disease cluster produced by any given method, like SaTScan, for the detection and inference of spatial clusters in a map divided into areas; if this cluster is found to be statistically significant, what could be said of the external areas adjacent to the cluster? Do we have enough information to exclude them from a health program of prevention? Do all the areas inside the cluster have the same importance from a practitioner perspective?
We propose a method to measure the plausibility of each area being part of a possible localized anomaly in the map. In this work we assess the problem of finding error bounds for the delineation of spatial clusters in maps of areas with known populations and observed number of cases. A given map with the vector of real data (the number of observed cases for each area) shall be considered as just one of the possible realizations of the random variable vector with an unknown expected number of cases. The method is tested in numerical simulations and applied for three different real data maps for sharply and diffusely delineated clusters. The intensity bounds found by the method reflect the degree of geographic focus of the detected clusters.
Our technique is able to delineate irregularly shaped and multiple clusters, making use of simple tools like the circular scan. Intensity bounds for the delineation of spatial clusters are obtained and indicate the plausibility of each area belonging to the real cluster. This tool employs simple mathematical concepts and interpreting the intensity function is very intuitive in terms of the importance of each area in delineating the possible anomalies of the map of rates. The Monte Carlo simulation requires an effort similar to the circular scan algorithm, and therefore it is quite fast. We hope that this tool should be useful in public health decision making of which areas should be prioritized.