In the following sections we elaborate on important aspects of designing and reporting a two‐group comparison, whether it is a randomized trial or a cohort study. We encourage readers with a particular interest in cohort studies to scrutinize the highly informative ‘Reader's guide to critical appraisal of cohort studies’ published in 2005 in the British Medical Journal.8‐10
Likewise, we remind authors of observational studies to consult the STROBE statement on how to report cohort, cross‐sectional, and case‐control studies in biomedical journals.7
Because we want to leave aside the technicalities involved in cohort studies, the latter part focuses on what to consider when preparing a report for a controlled trial. Two documents are highly endorsed: the CONSORT statement (used by IJSPT),6
and the guideline on statistical principles for clinical trials (E9) issued by the ‘International Conference on Harmonisation’ (ICH E9).11
Once a “clinical question” has been raised, the first step in the conception of a trial is to develop a trial protocol. Protocol development has evolved over the last decades. Writing a protocol for a trial is an intellectual, rigorous, and creative task, which is usually done by researchers and experts in various areas of research, including medical, scientific, statistical, ethical, and administrative fields, and maybe even regulatory bodies. The protocol serves as the foundation for study conduct and reporting. Full knowledge of a trial protocol allows an appropriate ethical assessment before trial inception, and the proper critical appraisal of the results after trial completion. The SPIRIT initiative (Standard Protocol Items: Recommendations for Interventional Trials) is currently preparing a statement that will most likely include a final 33‐item checklist that will apply in the future when researchers design and register their trials. This SPIRIT checklist will support the CONSORT statement, and explicitly ask for details about important features of the pre‐specified protocol.
It is recommended that the protocol include at least the following domains: Administrative information; Introduction – including background, rationale, and objective(s); Methods (#1): Participants, interventions, and outcomes; Methods (#2): Assignment of interventions – including sequence generation, concealed allocation, etc.; Methods (#3): Sample size, data collection process, management, and statistical analysis; Methods (#4): Data monitoring; Results: Propose a preliminary outline of the study report; Ethics and dissemination; and finally appendices to all of the above. Independent of these proposed protocol items, we strongly encourage trialists to follow the CONSORT checklist when designing the study.
Before launching a trial, it is important that the researcher considers whether the hypotheses are stated in advance and the trial is planned to confirm them (i.e., a confirmatory trial), or whether the aim is to explore several objectives, each of them still clear and precise (i.e., an exploratory trial). Unlike confirmatory trials, exploratory studies may not always lead to simple tests of predefined hypotheses. With this in mind it may seem that exploratory studies, being more flexible, are the “better strategic choice”; however, be aware that an exploratory study can never confirm whether, e.g., a physical therapy intervention is effective or not. Decision makers would likely argue that an intervention that shows promising results in exploratory studies would need at least one (phase‐3‐like) confirmatory ‘landmark study’.
The design of the trial should be described, for example, as parallel group, cluster randomized, crossover, factorial, superiority, equivalence or non‐inferiority design, or some other combination of these designs.6
The most common RCT design is the parallel group trial, in which participants are randomized to one of two (or more) interventions, with each arm being allocated a different treatment. The assumptions for these trials are less complex than for most other trial designs.
In a cross‐over design, participants are randomly allocated to different sequences of treatments, so that each participant acts as his or her own control for treatment comparisons. In the simplest 2×2 cross‐over design each participant receives each of two treatments in a randomized order in two successive periods, usually separated by a washout period. Cross‐over designs have a number of caveats that can invalidate their results. The major concern is “carryover effects”, where the first treatment given influences the “response” to the subsequent treatment. The statistical consequence is that unequal carryover biases the direct treatment comparisons.
In a factorial design, participants are assigned to more than one treatment‐comparison group, enabling two or more treatments to be evaluated simultaneously. The simplest example is the 2×2 factorial design where participants are randomly allocated to one of four possible combinations of two treatments (e.g. A and B). This allows the statistical model to draw conclusions about A alone; B alone; both A and B; and neither A nor B. Usually the statistical focus in studies with factorial design will be on examining the interaction between A and B.
For each of the following paragraph headings, all of which are directly associated with the protocol, we remind researchers to report what they anticipate will characterize the participants rather than waiting for the final data. This includes explicit eligibility criteria (i.e., both inclusion and exclusion criteria) for participants and the settings where the data will be collected. RCTs address an issue relevant to a particular population or group with the condition of interest. Participant eligibility criteria may relate to demographics, clinical diagnosis, and comorbid conditions. A clear description of the trial participants and the setting in which they will be studied is needed to allow future readers to assess the external validity (generalizability) of the trial and determine its applicability to their own setting.
Interventions and blinding
Detailed information on the interventions intended for each group, including the essential features of the experimental and comparison interventions (e.g., control group), should be described. Authors should report details about the interventions, e.g., dose, route of administration, duration of administration, surgical procedure, or manufacturer of inserted device. When describing the interventions it is very important to report whether or not participants, care givers, and those assessing the outcomes will be blinded to group assignment.12
Blinding refers to the practice of keeping the trial participants, care providers, data collectors, and sometimes those analyzing the data, unaware of which intervention is being administered to which participant, to reduce the risk of bias. Authors should avoid using terms such as “single” or “double” blind as such terms are not precise enough.
It is important to clearly define what will be the primary outcome for the trial, as well as what are the secondary outcomes. The primary outcome (‘primary variable’, ‘target variable’, ‘primary endpoint’)13
should be the variable capable of providing the most clinically relevant and convincing evidence directly related to the primary objective of the trial and is usually the one variable used for the sample size calculation. Most trials have several outcomes, some of which are deemed more important than others.3
Such rankings are typically reported as primary and secondary outcomes. Ideally there should only be one primary outcome. Authors should explicitly state the primary outcome for the trial and when it will be assessed (e.g., the time frame over which it is measured).
Secondary outcomes are either supportive measurements related to the primary objective or measurements of effects related to the secondary objectives. Their definition in the protocol is also important, as well as an explanation of their relative importance and roles in interpretation of trial results. Generally, the number of secondary outcome variables should be limited and should be related to the limited number of questions the trial attempts to answer.
The number of participants in a trial should always be large enough to provide a reliable answer to the research questions addressed. The number of participants randomized to each intervention group is an essential element of the results of a trial. This number defines the sample size, and readers of the published article can use it to assess whether all randomized participants were included in the study and subsequent data analysis (referred to as the “Intention To Treat” (ITT) population).14
Investigators should calculate sample sizes before the start of their trial and describe in detail what went into the calculation, in their protocol as well as in their published report. In these a priori calculations, determining the effect size to detect, e.g., a difference between means ($MD = M_I - M_C$) or a risk difference calculated from the proportions who respond ($RD = p_I - p_C$), reflects inherently subjective clinical judgments.15
The term treatment effect or effect size generally means, for efficacy outcomes, the net benefit of applying intervention ‘I’ compared to intervention ‘C’. Typically the outcome data used for sample size estimation are either ‘binary data’ (where each participant either responds or does not) or ‘continuous data’ (where the expected effect is typically estimated from an expected mean value and a corresponding standard deviation). The authors need to decide a priori what kind of clinical net benefit (difference between groups) they would expect from the intervention, with focus on clinical relevance.
When determining the appropriate sample size, the following items should be considered and specified: (#1) a primary outcome (i.e., the “name” and whether it is binary or continuous by nature); (#2) the test statistic; (#3) the null hypothesis and the alternative hypothesis (i.e., the reason for the study); (#4) the probability of erroneously rejecting the null hypothesis (the type I error, i.e., the significance level α against which the p‐value is compared); and (#5) the probability of erroneously failing to reject the null hypothesis (the type II error β; 1 − β is referred to as the statistical power of the study). When determining the sample size for a trial it is also important to consider the protocol approach to dealing with treatment withdrawals and protocol violations.
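As a minimal sketch of how these items combine, the following Python functions apply the usual normal‐approximation formulas for a two‐group parallel trial with equal allocation and a two‐sided test. The function names, the default values (type I error 0.05, power 0.80), and the simple inflation for anticipated withdrawals are illustrative assumptions, not part of the guidelines cited here; a trial statistician may well prefer a different test statistic or software.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_continuous(delta, sd, alpha=0.05, power=0.80):
    """Participants per group to detect a mean difference `delta`
    (M_I - M_C) given a common standard deviation `sd`.
    Two-sided test, normal approximation (an assumption of this sketch)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the type I error
    z_beta = z.inv_cdf(power)           # power = 1 - type II error
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


def n_per_group_binary(p_i, p_c, alpha=0.05, power=0.80):
    """Participants per group to detect a risk difference p_I - p_C.
    Two-sided test, normal approximation with unpooled variance."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_i * (1 - p_i) + p_c * (1 - p_c)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_i - p_c) ** 2)


def inflate_for_withdrawals(n, expected_withdrawal_rate):
    """Hypothetical helper: enlarge n so that the planned number remains
    after the anticipated proportion of withdrawals."""
    return ceil(n / (1 - expected_withdrawal_rate))


if __name__ == "__main__":
    # Illustrative inputs only: MD = 10 with SD = 20; responders 60% vs 40%.
    print(n_per_group_continuous(delta=10, sd=20))
    print(n_per_group_binary(p_i=0.60, p_c=0.40))
```

The inputs that drive the result most strongly are the effect size to detect and, for continuous outcomes, the assumed standard deviation, both of which rest on clinical judgment rather than on the formula itself.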
In an excellent tutorial paper published in the Lancet, Schulz and Grimes argue that the subjective judgments needed from the authors (i.e., content experts) to estimate the sample size, or the statistical power if the number of participants is already determined, are necessary for the trial to be trusted.15
Realizing that these judgments greatly affect sample size calculations, Schulz and Grimes question the branding of trials as unethical on the basis of an imprecise sample size calculation process. They claim that some shift of emphasis from a fixation on sample size to a focus on methodological quality would yield more trials with less bias;16
unbiased trials with imprecise results trump no results at all.15
Investigators should make sure to describe how participants will be randomized and allocated to the different interventions. It is important to conceal the allocation sequence from those assigning participants to the intervention groups. Allocation concealment prevents investigators from influencing which participants are assigned to a given intervention group (i.e., selection bias). Evidence shows that reports of trials reporting inadequate allocation concealment are associated with exaggerated treatment effects. Authors should clearly describe in the protocol the method for assigning participants to interventions. Examples of approaches used to ensure adequate concealment include: centralized (e.g., allocation by a central office) or pharmacy‐controlled randomization; sequentially numbered identical containers that are administered serially to participants; on‐site computer system combined with allocations kept in a locked, unreadable computer file that investigators can access only after the characteristics of an enrolled participant are entered; and sequentially numbered, opaque sealed envelopes.6
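To illustrate the sequence‐generation step that precedes concealment, the sketch below produces a permuted‐block allocation list in Python. Permuted blocks are only one of several acceptable methods and are used here as an assumption; the function name, the block size of 4, the arm labels ‘I’ and ‘C’, and the seed are all hypothetical. Generating the list does not conceal it: the list would still have to be held by someone independent of recruitment, for example a central office or pharmacy as described above.

```python
import random


def permuted_block_sequence(n_participants, block_size=4, arms=("I", "C"), seed=2024):
    """Return an allocation list built from randomly permuted blocks.

    Each block contains every arm equally often, so group sizes stay
    balanced throughout recruitment. All defaults are illustrative.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")

    rng = random.Random(seed)  # fixed seed makes the list reproducible for audit
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)     # random order within the block
        sequence.extend(block)
    return sequence[:n_participants]


if __name__ == "__main__":
    allocation = permuted_block_sequence(20)
    for number, arm in enumerate(allocation, start=1):
        # In practice each line would go into a sequentially numbered,
        # opaque sealed envelope or a restricted-access file.
        print(f"participant {number:02d}: {arm}")
```

A fixed block size can make the last allocation in each block predictable to unblinded staff, which is one reason the sequence should stay out of the hands of those enrolling participants.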
Medical research is carried out on selected individuals, although the selection criteria are not always clear. As indicated above, each of the design features described in the preceding paragraphs could, if manipulated, lead to biased results and maybe even biased conclusions. In a statistical analysis plan, emphasis should be on which analyses, comparisons, and statistical tests have been planned, given the objective of the study. The statistical methods section should include all the principal features of the proposed confirmatory analysis of the primary variable(s) and the way in which anticipated analysis problems will be handled. In the case of exploratory trials this section could describe more general principles and directions.
The set of participants whose data are to be included in the main analyses should be defined in the statistical section of the protocol. If there are any planned reasons for excluding from the analysis participants for whom data are available, these should be described. Some trials use terminology for these different scenarios (analysis sets), such as: (#1) Full analysis set: The set of participants that is as close as possible to the ideal implied by the ITT principle. It is derived from the set of all randomized subjects by minimal and justified elimination of subjects; and (#2) Per protocol set (valid cases, efficacy sample, evaluable participants sample): The set of data generated by the subset of participants who complied with the protocol sufficiently to ensure that these data would be likely to exhibit the effects of treatment, according to the underlying scientific model. In general the ITT population (full analysis set) should be considered for all primary analyses. However, it is advantageous to demonstrate a lack of sensitivity of the principal trial results to alternative choices of the set of subjects analyzed.
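The distinction between the two analysis sets can be sketched in a few lines of Python. The record layout, the field names, and the single `adhered_to_protocol` flag are hypothetical simplifications; in a real trial the protocol defines what counts as sufficient compliance, and any elimination from the full analysis set has to be justified there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    pid: str                    # participant identifier
    arm: str                    # arm assigned at randomization ("I" or "C")
    adhered_to_protocol: bool   # hypothetical compliance flag


def full_analysis_set(randomized):
    """Simplest case of the ITT principle: every randomized participant,
    analyzed in the arm to which he or she was assigned."""
    return list(randomized)


def per_protocol_set(randomized):
    """Subset of randomized participants who complied with the protocol."""
    return [p for p in randomized if p.adhered_to_protocol]


if __name__ == "__main__":
    randomized = [
        Participant("p01", "I", True),
        Participant("p02", "I", False),  # stopped treatment early
        Participant("p03", "C", True),
        Participant("p04", "C", True),
    ]
    print(len(full_analysis_set(randomized)), "in full analysis set")
    print(len(per_protocol_set(randomized)), "in per protocol set")
```

Running the planned primary analysis on both sets is one way to show whether the principal results are sensitive to the choice of analysis set.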
When writing your statistical analysis plan, realize that data can be analyzed in many ways, some of which may not be strictly appropriate in the particular situation. It is essential to specify which statistical procedure will be used for each analysis (given your objective and anticipated imaginary data set). Later in the full report (article) further clarification may be necessary in the results section, but this should never be in conflict with the analyses proposed in the protocol. When considering how elaborate such a paragraph needs to be, the principle is to describe statistical methods with enough detail to enable a knowledgeable reader with access to the original data17
to verify the reported results.
Most trial objectives result in statistical analyses yielding estimates of the treatment effect, which is a contrast between the outcomes in the comparison groups. These group contrasts should be followed by a confidence interval (usually 95% CI) for the estimated effect, which indicates a central range of uncertainty for the true treatment effect. Study findings can also be assessed in terms of their “statistical significance”. The “p‐value” represents the probability that the observed data (or a more extreme result) could have arisen by chance when the interventions did not truly differ. This is why “very small” p‐values (e.g., p<0.0001) indicate that it is highly unlikely that the interventions on trial are equally good;18
actual p‐values (for example, p=0.031) are strongly preferable to imprecise threshold reports such as p<0.05.
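As a hedged illustration of reporting an estimate with its confidence interval and an actual p‐value, the Python sketch below computes a difference between two group means. It assumes a continuous outcome, independent groups, and a normal approximation, which is only reasonable for fairly large groups (with small groups a t‐distribution would normally be used instead). The function name and the example data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_difference_summary(group_i, group_c, confidence=0.95):
    """Return (difference, (ci_low, ci_high), p_value) for M_I - M_C.

    Normal approximation with unpooled variances; an assumption of this
    sketch, not a recommendation for any particular trial.
    """
    diff = mean(group_i) - mean(group_c)
    se = sqrt(stdev(group_i) ** 2 / len(group_i) + stdev(group_c) ** 2 / len(group_c))

    z = NormalDist()
    z_crit = z.inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Two-sided p-value: probability of a result at least this extreme
    # if the interventions did not truly differ.
    p_value = 2 * (1 - z.cdf(abs(diff / se)))
    return diff, ci, p_value


if __name__ == "__main__":
    # Made-up outcome scores, for illustration only.
    intervention = [12, 15, 11, 14, 13, 16, 12, 15]
    control = [10, 11, 9, 12, 10, 11, 9, 12]
    diff, (low, high), p = mean_difference_summary(intervention, control)
    print(f"MD = {diff:.2f} (95% CI {low:.2f} to {high:.2f}), p = {p:.4f}")
```

Printing the estimate, the interval, and the exact p‐value together mirrors the reporting style recommended above, rather than a bare statement such as p<0.05.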
Finally, standard methods of analysis assume that the data are “independent.” For controlled trials, this usually means that there is one observation per participant. Treating multiple observations from one participant as independent data is a serious error; such data are produced when outcomes can be measured on different parts of the body, as in dentistry or rheumatology.6,19
Data analysis should be based on counting each participant once (e.g., as a single change from baseline) or should be done by using more complex statistical procedures.20,21
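One simple way to count each participant once is to collapse repeated measurements into a single summary value before the groups are compared. The Python sketch below averages all observations per participant; averaging is an illustrative assumption (a single change from baseline, as mentioned above, is another option), and the function name and data layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def one_value_per_participant(observations):
    """Collapse (participant_id, value) pairs to one mean value per participant.

    Intended for outcomes measured at several sites in the same person,
    e.g. several joints or teeth, so that the analysis has one
    independent observation per participant.
    """
    by_participant = defaultdict(list)
    for pid, value in observations:
        by_participant[pid].append(value)
    return {pid: mean(values) for pid, values in by_participant.items()}


if __name__ == "__main__":
    # Made-up joint scores: p1 contributes two joints, p2 only one.
    joint_scores = [("p1", 2.0), ("p1", 4.0), ("p2", 5.0)]
    print(one_value_per_participant(joint_scores))
```

Summarizing in this way discards within‐participant variation; the more complex procedures referred to above keep every measurement but model the dependence between them explicitly.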