The traditional approach to intervention development has involved constructing an intervention a priori and then evaluating it in a standard randomized controlled trial (RCT). After the confirmatory trial, post hoc analyses are done to help explain how the intervention worked, or why it did not work. The results of these analyses may be used to refine the intervention program and construct a second-generation version of the program, which is then evaluated in a new RCT.
Collins, Murphy, Nair, and Strecher2 reviewed shortcomings of this approach. While acknowledging that RCTs are the undisputed gold standard for assessing the effect of an intervention as a package once it has been developed, they pointed out that the post hoc analyses that typically follow an RCT in order to inform further intervention design and evaluation are subject to bias because they are not based on random assignment. As a result, the cycle of intervention – RCT – post hoc analyses – revision of intervention – RCT is likely to lead very slowly, if at all, to an optimized intervention.
Collins et al. also pointed out that most behavioral interventions can be considered an aggregation of a set of components. Some intervention components are a part of the program itself (e.g., program content). Others may be more concerned with the delivery of the program (e.g., whether a message is delivered by a lay person or by a physician). Some components may be having the intended effect; others may be having no effect at all; and others may even be reducing the overall potency of the intervention. Because the traditional RCT evaluates the intervention only as a whole, using the RCT alone does not enable isolation of the effects of individual program or delivery components. A different experimental approach is necessary to accomplish this.
We suggest the Multiphase Optimization Strategy (MOST) as an alternative way of building, optimizing, and evaluating e-health interventions. MOST incorporates the standard RCT but, before the RCT is undertaken, also includes a principled method for identifying which components are active in an intervention, and which doses of each component lead to the best outcomes. The principles underlying MOST are drawn from engineering and emphasize efficiency. MOST consists of three phases, each of which addresses a different set of questions about the intervention by means of randomized experimentation.
An outline of the three phases of MOST appears below. The first phase is screening. The starting point for the screening phase is a previously identified finite set of intervention components, made up of program components and/or delivery components. It is assumed that there is some theoretical basis for the choice of these components. It is also assumed that any initial pilot testing necessary to assess feasibility and finalize the details of implementation has been completed prior to the start of the screening phase.
Outline of the Multiphase Optimization Strategy (MOST)
The objective of the screening phase is to address questions like the following: Which of the set of program components are active and contributing to positive outcomes, and should be included in the intervention? Which program components are inactive or counterproductive, and should be discarded? Which of the set of delivery components are active and make a difference in the intervention outcome, and thus play a role in maintaining intervention fidelity? Decisions about which program and delivery components are active and should be retained and which are inactive and should be discarded are made based on the results of a randomized experiment. (Experimental design alternatives are discussed below.) The decision may be made on the basis of statistical significance at any alpha level deemed appropriate, or on the basis of estimated effect size. In addition, cost in relation to incremental contribution to the desired outcome may be a consideration. At the conclusion of the screening phase, a set of program and delivery components that are to be retained for further examination has been identified. This set of components constitutes a “first draft” intervention.
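The retention decision described above can be sketched in code. The following is a minimal illustration, in which the component names, effect-size estimates, and retention threshold are all hypothetical; in practice the decision might also weigh statistical significance and cost, as noted above.

```python
# Hypothetical screening-phase decision rule: retain a component if its
# estimated effect size clears a pre-specified threshold. All names and
# numbers below are illustrative, not empirical values.

EFFECT_SIZE_THRESHOLD = 0.15  # minimum standardized effect to retain

def screen_components(estimates):
    """estimates: dict mapping component name -> estimated effect size.

    Returns (retained, discarded) lists of component names.
    """
    retained = [name for name, d in estimates.items()
                if d >= EFFECT_SIZE_THRESHOLD]
    discarded = [name for name in estimates if name not in retained]
    return retained, discarded

first_draft, dropped = screen_components({
    "outcome_expectations": 0.30,
    "efficacy_expectations": 0.22,
    "message_framing": 0.04,   # inactive -> discard
    "testimonials": 0.18,
})
```

The retained set (`first_draft`) is what the text calls the "first draft" intervention, carried forward to the refining phase.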
This “first draft” intervention is the starting point for the next phase of MOST, the refining phase. In this phase the “first draft” intervention is examined further, with the objective of fine-tuning the intervention and arriving at a “final draft.” The specific activities of the refining phase depend on the intervention being considered, but in general focus on questions such as: Given the components identified in the screening phase, what are the optimal doses? Does the optimal dose vary depending on individual or group characteristics? As in the screening phase, in the refining phase decisions are based on randomized experimentation, and cost may be a consideration. At the conclusion of the refining phase, the investigator has identified an optimized “final draft” intervention consisting of a set of active program and delivery components at the best doses.
The “final draft” intervention provides the starting point for the third phase of MOST, confirming. In the confirming phase this optimized intervention is evaluated in a standard RCT. The confirming phase addresses questions such as: Is this intervention, as a package, efficacious? Is the intervention effect large enough to justify investment in community implementation?
A brief illustrative example
Because MOST is an approach or a perspective rather than an off-the-shelf procedure, exact details about its implementation depend on the application. In order to illustrate MOST, we offer a brief hypothetical example similar to the one in Collins et al.2 The example is based on (but not identical to or an account of) the work of one of the authors of the current article (VS).
Suppose the objective is to use MOST to build, optimize, and evaluate an e-intervention for smoking cessation, and that six components have been identified for study, four of which are program components and two of which are delivery components. The program components are outcome expectation messages (messages addressing an individual’s expectations about what will happen if he or she quits smoking), which may be either present or absent in the intervention; efficacy expectation messages (these address barriers to perceived self-efficacy), which may be present or absent; message framing (this concerns how the persuasive messages about quitting smoking are to be framed), which may be positive or negative; and testimonials (from former smokers), which may be present or absent. The delivery components are exposure schedule, which may be one long message or four smaller ones; and source of message, which may be a primary care physician or the individual’s health maintenance organization (HMO).
In the screening phase randomized experimentation is conducted to isolate the effects of each of the six components. Suppose the experimental results indicate that the active program components are outcome expectation messages, efficacy expectation messages, and testimonials, and that there is one active delivery component, exposure schedule. Once this “first draft” intervention has been identified, the screening phase is concluded. The intervention scientist now proceeds to the refining phase in order to fine-tune the “first draft” and arrive at an optimized intervention. An example of this fine-tuning might be experimentation to pinpoint the best dose of exposure schedule, in other words, the optimal number of messages. The “final draft” intervention would then consist of outcome expectation messages, efficacy expectation messages, and testimonials, with the intervention delivered using the optimal number of messages identified in the refining phase. In the confirming phase, this “final draft” smoking cessation intervention is evaluated in a standard RCT.
Design for the screening and refining phases
The research design to be used in the confirming phase (i.e., the RCT) is straightforward and familiar to most intervention scientists. Usually a simple two-group design consisting of random assignment to either a program condition or a suitable comparison condition would be used. It may be less evident what design would be used in the screening and refining phases. One family of designs that lends itself well to the screening and refining phases is the factorial analysis of variance (ANOVA) design. In an ANOVA design several independent variables, or factors, are investigated at once. A properly chosen and implemented ANOVA design permits the effects of individual independent variables to be isolated. In the behavioral sciences the factors are usually “fully crossed,” which means that each level of a factor is combined with each level of every other factor.
For example, suppose there are just two program components under consideration: outcome expectation messages and efficacy expectation messages. To examine these in the screening phase using a fully crossed factorial ANOVA, subjects would be randomly assigned to one of four experimental conditions: both messages present; outcome expectation messages only; efficacy expectation messages only; and both messages absent (perhaps an information-only control). At the end of the screening phase, after the experiment was completed, the decision about which components to select for further consideration would be based on the main effect and interaction estimates obtained from the ANOVA. The decision may be made by selecting statistically significant effects; it may be made by choosing components associated with an estimated effect size over some threshold level; or it may use the results of the ANOVA in some other way.
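With equal cell sizes, the main-effect and interaction estimates for this 2×2 layout reduce to simple contrasts of cell means. The sketch below uses hypothetical quit rates for the four conditions; real analyses would of course also involve variance estimates and significance tests.

```python
# Hypothetical cell means (quit rates) for the 2x2 screening experiment:
# key = (outcome_messages_present, efficacy_messages_present).
cell_means = {
    (1, 1): 0.28,  # both message types present
    (1, 0): 0.22,  # outcome expectation messages only
    (0, 1): 0.20,  # efficacy expectation messages only
    (0, 0): 0.15,  # both absent (information-only control)
}

def main_effect(means, factor):
    """Main effect of one factor (0 = outcome, 1 = efficacy):
    mean of cells where it is present minus mean where it is absent."""
    high = [m for k, m in means.items() if k[factor] == 1]
    low = [m for k, m in means.items() if k[factor] == 0]
    return sum(high) / len(high) - sum(low) / len(low)

def interaction(means):
    """Interaction: difference of simple effects (difference of differences)."""
    return (means[(1, 1)] - means[(1, 0)]) - (means[(0, 1)] - means[(0, 0)])

outcome_effect = main_effect(cell_means, 0)   # 0.075
efficacy_effect = main_effect(cell_means, 1)  # 0.055
```

Under a hypothetical threshold-based decision rule, both components would be retained here, and the small interaction would suggest their effects are roughly additive.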
Although factorial designs are the most efficient way to assess the effect of several independent variables simultaneously, they have for the most part been eschewed by intervention scientists because of the perception that they are impractical due to the number of conditions that must be implemented. For example, a fully crossed ANOVA design to investigate the six components in our example would involve 64 treatment conditions. This may in fact be too many conditions to manage for interventions delivered by teachers and practitioners in settings like schools and hospitals, but it does not necessarily follow that the field of e-health should be similarly discouraged about factorial ANOVA designs. Because e-health interventions are delivered electronically, the primary cost often will be the computer programming required to construct each of the conditions. Once this task is done, it may be relatively straightforward to assign individuals randomly to experimental conditions and then deliver the corresponding version of the intervention. Thus, factorial ANOVA may be more feasible in e-health than it is in other more traditional areas of intervention science.
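To make the point concrete, the 64 conditions of the fully crossed design can be enumerated, and random assignment automated, in a few lines. The factor names and levels follow the running example; the assignment function is only a sketch of what an e-delivery platform might do.

```python
import itertools
import random

# The six two-level components from the running example, each with its
# two levels. Fully crossing them yields 2**6 = 64 conditions.
FACTORS = {
    "outcome_expectations": ("absent", "present"),
    "efficacy_expectations": ("absent", "present"),
    "message_framing": ("negative", "positive"),
    "testimonials": ("absent", "present"),
    "exposure_schedule": ("one_long", "four_short"),
    "message_source": ("physician", "HMO"),
}

conditions = [dict(zip(FACTORS, levels))
              for levels in itertools.product(*FACTORS.values())]

def assign(participant_id, rng=random):
    """Randomly assign one participant to a condition; the e-intervention
    then delivers the version of the program that condition specifies."""
    return participant_id, rng.choice(conditions)
```

Once the 64 versions are programmed, assignment and delivery are automatic, which is the feasibility argument the text makes for e-health.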
However, when there are many factors, the construction of each condition in a fully crossed ANOVA design may be too much for an e-intervention study. In this case, fractional factorial ANOVA designs can be an attractive alternative. When using fractional factorial ANOVA designs, it is not necessary to include every possible experimental condition in the design. Instead, based on working assumptions made by the investigator, a subset of conditions is chosen strategically in order to estimate effects of primary interest. Fractional factorial designs are not new; they go back to Fisher4 and Box, Hunter, and Hunter,5 and have been used routinely in engineering and agriculture for many years. Fagerlin et al.6 recently employed a fractional factorial approach in medical decision-making research. Intervention science also can and should benefit from the efficiency and economy these designs provide.
Collins et al.2 illustrated how a six-factor fully crossed ANOVA design with 64 conditions can be reduced to a fractional factorial ANOVA design with 16 conditions. The reduced design retains the capability to provide main effects estimates for each of the six independent variables, and also the capability to provide estimates for selected interactions. The power associated with the test of each main effect is the same as that for any simple two-group comparison. In the refining phase, variants on fractional factorial designs, such as response surfaces, may be useful for questions involving identification of optimal doses.
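One way such a 16-condition fraction can be constructed is with design generators. The sketch below builds a standard 2^(6−2) resolution-IV fraction using the common generators E = ABC and F = BCD; Collins et al. may use a different generator set, so this is illustrative only.

```python
import itertools

# 2^(6-2) fractional factorial: the four base factors A-D are fully
# crossed (16 runs), and the two generated factors are defined as
# E = A*B*C and F = B*C*D. Levels are coded -1/+1. In this resolution-IV
# fraction, main effects are aliased only with three-way (or higher)
# interactions, not with each other or with two-way interactions.
runs = []
for a, b, c, d in itertools.product((-1, 1), repeat=4):
    runs.append((a, b, c, d, a * b * c, b * c * d))
```

Each of the six columns is balanced and mutually orthogonal, which is what preserves full power for the main-effect tests with only a quarter of the 64 conditions.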
As mentioned above, in some situations it may not be necessary or desirable to base decisions strictly on hypothesis tests.2 If hypothesis testing is to be used, it may be necessary to control the experiment-wise error rate. As a simple expedient, we suggest identifying a priori a limited set of effects predicted to be sizeable, testing those at the desired alpha level without regard to the experiment-wise error rate, and then using a Bonferroni or similar adjustment for the remaining effects. (For more about the experiment-wise error rate, see Wu and Hamada.7) Note that in general interaction effect sizes tend to be small, making it important to power a study accordingly if interactions are of particular interest.
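The suggested hybrid strategy can be written out as a small decision procedure: a priori effects are tested at the nominal alpha, and a Bonferroni correction is applied only to the rest. The effect names and p-values below are hypothetical.

```python
# Hybrid error-rate control: effects named a priori are tested at the
# full alpha; the remaining effects share a Bonferroni-adjusted alpha.
# Effect labels and p-values here are invented for illustration.

def hybrid_tests(p_values, a_priori, alpha=0.05):
    """p_values: dict effect -> p-value; a_priori: set of effect names
    predicted in advance to be sizeable. Returns effect -> significant?"""
    remaining = [e for e in p_values if e not in a_priori]
    adjusted_alpha = alpha / len(remaining) if remaining else alpha
    return {e: p <= (alpha if e in a_priori else adjusted_alpha)
            for e, p in p_values.items()}

results = hybrid_tests(
    {"A": 0.01, "B": 0.04, "A:B": 0.03, "C": 0.02},
    a_priori={"A", "B"},
)
```

Here "A" and "B" are tested at 0.05, while "A:B" and "C" must clear 0.05/2 = 0.025, so the interaction (p = 0.03) is not declared significant.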
Although we propose that MOST is useful in a wide variety of intervention development settings, there are some situations in which investigators may wish to consider a different approach. When applied to the building of new interventions, MOST is based on the idea that it is feasible to identify individual program components that can stand alone, at least enough to assess their individual effects. It may not be sensible to parse an extremely tightly integrated program into separate parts. Even when meaningful individual program components can be identified, it may be expected that each component has a very small, difficult to detect effect that nevertheless contributes to a larger, more readily detectable cumulative effect of the entire package. If in addition it can safely be assumed that none of the components has a deleterious effect or reduces the efficacy of other components, then the effects of individual components may not be of much interest. However, MOST may still be helpful in examining delivery components associated with these interventions.
Even when an intervention can be meaningfully decomposed, it is possible that the list of components cannot be combined at will, in other words, not every combination of program components is sensible to implement. In some cases a fractional factorial design may be chosen that includes only sensible combinations of components. If there is a component that is expected not to operate properly in the absence of another component, it may be more fruitful to consider the two components as one for the purpose of building the intervention.
The following section describes the SMART trial, another type of design that can be used as a stand-alone method or may be useful in the refining phase of MOST.

Adaptive interventions and the Sequential Multiple Assignment Randomized Trial (SMART)
In adaptive interventions,8 which are also called by other names such as stepped care strategies9,10 and expert systems,12 the dose of intervention components may be varied in response to characteristics of the individual or environment. These characteristics are called tailoring variables. The tailoring variable can be something stable, like gender or ethnicity, or something that varies over time, such as stage in the Transtheoretical Model,12 attitude, or even progress toward a treatment goal. When the tailoring variable changes over time and there are repeated opportunities to adapt the intervention, this is called a time-varying adaptive intervention. In adaptive interventions, dosage is assigned to individuals based on a priori decision rules that link values on the tailoring variables to specific intervention dosages. See Collins, Murphy, and Bierman8 for a discussion of advantages of adaptive interventions as compared to fixed interventions.
For example, suppose a smoking cessation program includes both positively framed messages (e.g., “Quitting smoking will help you feel healthier”) and negatively framed messages (e.g., “Continuing to smoke will increase your risk of serious health problems”). Further suppose that it is expected that those in the precontemplation stage of the Transtheoretical Model are more likely to initiate a quit attempt if presented with a negatively framed message, whereas those in the contemplation stage are more likely to try to quit smoking if presented with a positively framed message. In this example, an individual’s stage in the Transtheoretical Model is the tailoring variable. An adaptive intervention would measure the tailoring variable, i.e., assess whether a smoker is a precontemplator or a contemplator, and deliver a negatively or positively framed message accordingly. A time-varying adaptive intervention would assess this at several different occasions, and once the individual moved from precontemplator to contemplator, would switch to delivering positively framed messages. The strategy “If precontemplator, use negative message framing; if contemplator, use positive message framing” is a decision rule.
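The decision rule in this example amounts to a simple lookup from the tailoring variable to a message framing. A minimal sketch, using the example's own message text; the stage names follow the Transtheoretical Model.

```python
# The decision rule from the example: stage (tailoring variable) -> framing.
FRAMING_RULE = {
    "precontemplation": "negative",
    "contemplation": "positive",
}

# Message text taken from the example in the text.
MESSAGES = {
    "negative": "Continuing to smoke will increase your risk of serious health problems.",
    "positive": "Quitting smoking will help you feel healthier.",
}

def select_message(stage):
    """Apply the tailoring decision rule and return the framed message."""
    return MESSAGES[FRAMING_RULE[stage]]
```

In a real program the rule table would typically be larger (more stages, more tailoring variables), but the a priori rule-based structure is the same.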
The e-health approach lends itself naturally to adaptive interventions. One potential difficulty associated with adaptive interventions is that if decision rules are complex, delivery can be more logistically challenging than that of a comparable fixed intervention. However, a great advantage of e-health is that it can make delivery of even complex time-varying adaptive interventions relatively straightforward. When an e-health approach is used, assessments of tailoring variables can be done electronically, for example, by means of on-line questionnaires and immediate scoring algorithms, and programming algorithms can be used to automate variation of intervention program content or aspects of intervention delivery in response to the tailoring variables. The procedure can be repeated periodically, or as often as each time the individual has contact with the computer program.
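The repeated assess-and-adapt cycle can be sketched as a loop over contacts. Here the sequence of observed stages stands in for the on-line questionnaires and scoring algorithms described above.

```python
# Time-varying adaptive delivery: at each contact the tailoring variable
# (stage) is reassessed and the decision rule reapplied. The stage
# sequence below is a stand-in for repeated electronic assessments.

def deliver_over_contacts(stages):
    """stages: stage observed at each contact, in order.
    Returns the message framing delivered at each contact."""
    rule = {"precontemplation": "negative", "contemplation": "positive"}
    return [rule[stage] for stage in stages]

# A smoker who moves from precontemplation to contemplation after the
# second contact is switched from negative to positive framing.
history = deliver_over_contacts(
    ["precontemplation", "precontemplation", "contemplation"])
```

Because both the assessment and the rule application are automated, adding contacts or more elaborate rules is a programming change rather than a staffing one, which is the advantage the text attributes to e-health delivery.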