Implementation is a straightforward process. For simplicity we divide the process into two phases. The first phase is the design of the use of random numbers. Stochastic events within an individual to be held common across model runs are identified and combined into groups based on their function in the simulated disease process. The second phase is the designation of unique sequences of random numbers to each group within each individual in simulation model code.
Phase One: Design
We illustrate phase one using a microsimulation model of the natural history of breast cancer that was developed in the C programming language. In this model, women are simulated individually, each facing stochastic events relevant to breast cancer including breast cancer onset, growth and progression, detection, treatment and death [5
]. The stochastic events are identified and combined into independent, mutually exclusive groups based on their function in the simulated disease process, ordering, and conditionality of the event within an individual’s lifetime (). Our groups include: 1) woman- and tumor-level characteristics including tumor onset and growth; 2) symptom detection; and 3) determination of screening schedules and screen detection. Each group of events within an individual is assigned an independent sequence of random numbers as described below in Phase 2.
Partitions of Stochastic Events in Breast Cancer Model
Separating events in this manner allows a simulated woman to have a tumor identical in the timing of onset and growth, regardless of her screening schedule or the sensitivity of mammography and vice versa. A change in screening schedule may affect the timing of detection but not the timing of tumor onset and growth up to that point. For example, in an analysis that estimates incremental effects between frequent and infrequent mammography screening, if detection occurs earlier in the “frequent” screening scenario for a particular woman, subsequent events in her lifetime from different random number groups may no longer be in sync. The random number used to select treatment effectiveness, a stochastic event in the first grouping, may be different as the event may occur later in a woman’s lifetime. If desired, we can synchronize the random number for this event to ensure a woman has the same treatment “tendency” regardless of when the event occurs (). If treatment is not effective for a woman’s breast cancer when it was diagnosed at an early stage, synchronizing this event ensures that treatment is not effective if it is diagnosed later. Thus, the “tendency” for treatment to be ineffective in this particular simulated woman is preserved. To implement, we can assign the event a separate grouping with a different sequence of random numbers or re-order when the random number is drawn. For example, pre-drawing and saving a random number at the beginning of a woman’s life ensures synchronization.
Stylized example of the synchronization of a random event under a frequent screening scenario compared with infrequent screening
We note that the level of detail for synchronization should be determined in part by the strength of clinical evidence and level of detail necessary for the model analysis. For example, in breast cancer, the “tendency” for treatment response within an individual may be important to model as it may represent genetic factors or tumor markers (e.g., estrogen receptor status and tumor response to adjuvant treatments such as tamoxifen). There may be other situations where little clinical evidence supports such a relationship in which case synchronizing random numbers is not desired and could lead to incorrect conclusions.
Phase 2: Using random numbers
Once the events are grouped, random numbers that are “common” across model runs are assigned in the model code. We first discuss considerations for choosing a random number generator then provide several practical methods for implementation within model code.
Choosing a Random Number Generator for CRN
While a full discussion of random number generators is beyond the scope of this paper, we note a few properties important to consider when selecting a random number generator for CRN and highlight two random number generators in particular.
In general, random number generators produce sequences of numbers that appear to form a random sample from a particular distribution, typically a uniform distribution. From the uniform distribution, random numbers from all other distributions needed for modeling events can be simulated [11
]. As there are many random number generators available to modelers, we include several references that provide more information on random number generator properties and tests for their quality [11
For CRN, key features of the random number generator used are the length, reproducibility, and independence of random number sequences. As the sequences of random numbers from a random number generator are finite, the length or “period” of the sequence needs to be long enough such that the sequence is not repeated within any model run. If repeated, simulated individuals, for example, are not guaranteed to be independent which may bias model outcomes in unintended ways. Most generators allow for multiple, reproducible sequences of random numbers through the specification of a “seed” or initial value for the sequence. Initializing the generator with the same seed produces the same sequence of random numbers, functionality needed to coordinate the random numbers across model runs. Unique seeds produce unique sequences, functionality needed to simulate unique individuals within a model run. However, in many generators the independence of sequences with unique seeds is not always guaranteed. For example, the use of seeds that are linearly related can produce correlated sequences of random numbers in some generators [15
]. Unintended bias in model outcomes may result as simulated individuals are also not guaranteed to be independent. Caution should be used in choosing seed values [17
]. Random number generators that have methods to guarantee the independence of sequences from different seeds are preferred for CRN as this will further ensure that unintended bias between individuals will not be introduced.
Two random number generators that we have experience with are the “Mersenne Twister” [18
] and one developed by L’Ecuyer and colleagues which we will refer to as “RngStream” [19
]. The first, the Mersenne Twister, is computationally very fast, is freely available in many languages including C, and has a sufficiently long period (219937
-1) for disease simulation models. (Available at: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
The second, the RngStream, was designed for CRN and is freely available in C (Available at: http://www.iro.umontreal.ca/~lecuyer/myftp/streams00/
). It allows for the “instantiation” or creation of multiple independent sequences (called “streams”), each of which has a sufficiently long period (2 191
) for most disease simulation models. Within each stream, it allows for independent “substreams” which also have sufficiently long periods (2 51
). Although this generator is perhaps computationally slower than the Mersenne Twister, the independence properties and its ease of use in general make it desirable for CRN.
Methods for Implementation within Model Code
The first method, and least prone to unintended correlations, is to use a random number generator designed for common random numbers such as the RngStream generator developed by L’Ecuyer and colleagues described above [19
]. We use that generator in the breast cancer model. To implement in the model code, we assigned a stream for each group and used a substream sequence for each individual for each group of stochastic events. Using the same initial seeds for the random number streams across runs ensures commonality. Because the generator has methods to ensure the independence of random number sequences across streams and substreams individuals should not be correlated in an unforeseen way.
If reprogramming to use the RngStream generator is not desired, a similar technique for the assignment can be implemented with most any random number generator. Like the “streams”, multiple instantiations (or copies) of the random number generator are declared in the code, one for each group of stochastic events. To ensure the same sequences are used for individuals across model runs (like the “substreams”), the random number generators are initialized for each individual using the same seeds per individual. A practical method for generating individual-level seeds is to use a separate instantiation of the random number generator where the sequence of random numbers serves as the seeds. However as noted above care should be taken in choosing seeds as this method has the potential to introduce unintended bias. Particular types of generators can produce sequences that are correlated across seeds and simulated individuals would not necessarily be guaranteed to be independent within a model run [15
As another alternative method, a fixed number of random numbers from a single instantiation of any random number generator can be designated for each group of events and each individual [8
]. To ensure commonality across runs, the same set of numbers is used for individuals across model runs. This set of numbers can be pre-drawn. One consideration is that the length of the sequence per group per individual needs to be sufficiently large to accommodate all possible events in that group in any simulated individual’s lifetime. Since most individuals will not “use” all of their allotted random numbers, the generation of extra random numbers may be inefficient and potentially cumbersome depending on the random number generator used and how it is managed within the simulation model code. The benefit of this method is that independence across individuals is preserved and any potential bias induced during the seeding of the generator is avoided altogether [17
We note that stochastic events need not be grouped at all for CRN. An alternative is to assign each event its own unique sequence of random numbers that is held constant across runs [7
]. If separate instantiations of the random number generator are used for each event, this may require re-initializing all the generators with different seeds for each individual potentially increasing computational time. This downside is by far outweighed by the other gains in efficiencies that can be realized from the use of CRN. We also note that all events need not and perhaps should not be synchronized although benefit in terms of stochastic noise reduction across runs may be diminished [8
]. While the capability for counterfactual analyses at an individual level is dependent on more complete synchronization, as noted above the degree of synchronization should follow the strength of clinical evidence.
Of practical concern is the management of model output for individual level analyses using CRN. This may require individual level data for a particular model run to either be stored in memory or outputted to a file to be compared post-hoc with individual level data from alternative model runs. As another option, the model may be programmed to rerun the same individual sequentially under alternative modeling scenarios and compute differences in outcomes in individuals as they are simulated.