The results given in serve to illustrate two key points. First, even in simple settings such as logistic regression with a single binary exposure, where simulation-based estimators may be expected to be relatively well behaved, MCE can be substantial. Thus, to obtain accurate Monte Carlo estimates of quantities such as bias and power, we may need to perform a simulation with surprisingly large numbers of replications. Second, the magnitude of MCE, and thus the number of replications required, depends on both the design f_X(·) and the target quantity of interest. As such, whereas “rules of thumb” are useful in a wide range of settings (e.g. van Belle 2002), it seems unlikely that a single choice for R will provide practical guidance in a broad range of simulation settings. Consequently, for a reader to fully understand and place into context results obtained via a simulation study, the results should be accompanied by some measure of associated uncertainty.
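To make the dependence of MCE on R concrete, the following minimal sketch considers the simplest case, a Monte Carlo estimate of power, which is a proportion of rejections across R independent replications. It uses the standard binomial standard error, sqrt(p(1 - p)/R), as the measure of MCE; this is an illustrative assumption and not necessarily the estimator used elsewhere in this article. The function names `power_mce` and `replications_needed` are hypothetical.

```python
import math


def power_mce(p_hat: float, R: int) -> float:
    """Binomial standard error of an estimated proportion (e.g. power)
    computed from R independent simulation replications."""
    return math.sqrt(p_hat * (1.0 - p_hat) / R)


def replications_needed(p: float, target_mce: float) -> int:
    """Smallest R for which the binomial standard error at proportion p
    is approximately no larger than target_mce (up to rounding error)."""
    return math.ceil(p * (1.0 - p) / target_mce ** 2)


if __name__ == "__main__":
    # MCE shrinks at rate 1/sqrt(R): a 100-fold increase in R
    # buys only a 10-fold reduction in MCE.
    for R in (100, 1000, 10000):
        print(f"R = {R:>6}: MCE of estimated power 0.80 = {power_mce(0.80, R):.4f}")
```

Because p(1 - p) is largest at p = 0.5, the R required for a given precision depends on the quantity being estimated, which is one simple reason a single default value of R cannot serve every setting.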
To gauge the extent to which Monte Carlo error is considered and/or reported in the current literature, we conducted a survey of published articles from a nonrandom sample of three statistics journals: Biometrics, Biometrika, and JASA. We considered all regular articles published in 2007, excluding only those for which MCMC was used as part of a single analysis; Bayesian simulation studies, where the entire MCMC process was repeated, were retained. Each article was downloaded electronically, and a search was performed for any of the following terms: “bootstrap,” “dataset,” “Monte Carlo,” “repetition,” “replication,” “sample,” and “simulation.” In addition, when indicated by the main article, we also performed the search on supplementary materials available online. Articles for which the search returned a positive result were read in detail to determine whether a simulation-based result was reported and, if so, whether justification for the number of replications and/or estimates of MCE were provided.
Of the 328 regular articles studied, 223 reported the results of a simulation study; only 8 reported estimates of MCE. In a similar survey conducted more than 20 years ago, Hauck and Anderson (1984) found that of the 294 regular articles published in the same three journals in 1981, 63 reported the results of a simulation study, and of those 63, 5 reported some justification for the number of replications. We also recorded the number of replications for each article. Some articles had multiple simulations, for which varying levels of R were used; in such cases we took the largest reported value of R. From , of the 223 articles reporting a simulation study, 5 did not explicitly report R. For those that did report R, we see wide variability in the number of replications used. The most common choice was R = 1000 (74 articles); only 5 articles used a value of R
Number of replications associated with simulation studies reported in regular articles published in 2007.
Without the benefit of a reported justification for R, and given the often-complex nature of many recently proposed methods, it is reasonable to assume that the specific choice for many of these articles was driven by time constraints imposed by the computational burden or by the somewhat arbitrary use of round numbers (often multiples of 100 or 1000). Although work continues on improving the efficiency of simulations (e.g. Efron and Tibshirani 1993; Robert and Casella 2004; Givens and Hoeting 2005), in many cases little can be done to substantially reduce the time needed to run even a single iteration, especially as problems to which simulations are applied become increasingly complex. Given the results of the logistic regression example in Section 2.2, however, such simulations may plausibly experience greater MCE than traditionally thought, suggesting that more emphasis should be placed on reporting MCE in the literature.
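As an illustration of what reporting MCE might look like in practice, the sketch below summarizes a vector of per-replication results as an estimate, its MCE, and R. It assumes the target quantity is a simple average over independent replications, so that MCE can be taken as the sample standard deviation divided by sqrt(R); quantities that are not simple averages would need a different approach. The function name `summarize_with_mce` and the simulated input are hypothetical stand-ins, not results from any study.

```python
import math
import random
import statistics


def summarize_with_mce(replicates):
    """Return (estimate, mce, R) for a Monte Carlo estimate that is the
    mean of independent per-replication values."""
    R = len(replicates)
    estimate = statistics.fmean(replicates)
    # Standard error of a mean of R independent replicates.
    mce = statistics.stdev(replicates) / math.sqrt(R)
    return estimate, mce, R


if __name__ == "__main__":
    random.seed(2007)
    # Hypothetical stand-in for per-replication estimation errors
    # (estimate minus truth) from a simulation study with R = 1000.
    errors = [random.gauss(0.05, 1.0) for _ in range(1000)]
    est, mce, R = summarize_with_mce(errors)
    print(f"bias = {est:.3f} (MCE = {mce:.3f}, R = {R})")
```

Reporting the triple (estimate, MCE, R) costs a single extra column in a results table and lets a reader judge whether an apparent difference between methods exceeds simulation noise.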