Adjustment of sample size based on information about additional parameters
The power of a treatment study depends on the treatment effect and on additional parameters. In the case of a continuous outcome (e.g. a decrease in blood pressure measured in mmHg), the power depends on the difference between the means and on the population variance; in the case of a binary outcome, it depends on the risk difference or risk ratio and on the disease prevalence; and, finally, in the case of a survival or other time-to-event outcome, it depends on the hazard ratio and the event rate. An interim analysis allows us to detect that the original assumptions about the additional parameters used in the sample size calculation may be incorrect, and hence we can consider adjusting the planned sample size. However, this is usually not the main issue in an interim analysis of a treatment study, as the influence of the additional parameters is often relatively limited compared with other factors, for example the recruitment rate.
In paired diagnostic studies, on the other hand, the need for sample size adjustment is much greater. The reason is simple: the sample size required to reject hypotheses on the difference in sensitivity and specificity, or to reach a given precision in the corresponding estimates, depends heavily on the degree of agreement between the two diagnostic procedures [24]. For example, if we assume a true difference of 10% in sensitivity between two modalities (and a significance level of 5%), a study including 100 patients can have a power of 56.7% if the agreement rate between the two modalities is 80%, and a power of 94.0% if the agreement rate is 90% [30]. The higher the agreement rate, i.e., the lower the number of patients with discordant results, the higher the power (see Table 6 in [30]). However, it is very difficult to obtain a reliable estimate of the degree of agreement between two diagnostic procedures when planning a diagnostic study. Typically, we compare a new method with a standard procedure with which it has never been compared head-to-head. Therefore, the study is likely to be started with an incorrect assumption. In an interim analysis, however, the true degree of agreement can easily be estimated, even if the gold standard results are not disclosed. A discrepancy between the original assumption and the interim results can then be detected, and it may be deemed necessary to adjust the sample size. In an earlier publication, we made a concrete suggestion for how to perform such an adjustment [30], as will be exemplified later. A further reason for adjusting the sample size may be updated information on the prevalence gleaned from the interim analysis.
Early decision without early stopping
A treatment study can be stopped early for futility or efficacy if the respective predefined stopping criteria have been met at interim, but it will be continued if no conclusive results have been achieved. Consider, for example, a study investigating the ability of a new compound to lower high blood pressure (measured in mmHg): a decrease in blood pressure is a necessary condition for the success of this compound. If it fails to demonstrate this ability, the clinical development of this compound is over (at least for that particular indication). So, in treatment studies there is a clear relationship between early decision and early stopping: if we are sure about the inferiority or the superiority of the new treatment, the study must be stopped at once, as it would be unethical to continue to randomize and to offer half of the patients an inferior treatment.
The situation can be very different in a paired diagnostic study. Firstly, accuracy is a two-dimensional concept, typically described by sensitivity and specificity. In an interim analysis we may reach a firm conclusion with respect to one parameter but not the other, in particular if the prevalence is not close to 0.5, because the power to compare sensitivity (estimated in diseased patients) and specificity (estimated in disease-free patients) will then differ. Secondly, whereas an improvement in diagnostic accuracy is regarded as sufficient for a change of the standard clinical routine, regulatory authorities often also require proof of clinical benefit, in particular when it comes to reimbursement decisions [2]. This may imply that we, for instance, are able to demonstrate an increase in accuracy at the lesion level in an interim analysis, but have to continue the study in order to demonstrate a benefit at the patient level as well (as will be exemplified later). Thirdly, even if we have demonstrated an improved diagnostic accuracy of the new modality, the question may remain whether we can do even better by applying both modalities in the future. This question typically requires a larger sample size than the comparison of the two modalities, as the gain from applying both procedures jointly is usually smaller than the difference between the two procedures. Hence, in diagnostic studies there can be many good reasons to perform an interim analysis that reaches some firm conclusions of general interest worth publishing, and nevertheless to continue the study. Consequently, interim analyses in diagnostic accuracy studies have to be viewed differently from interim analyses in treatment studies. The central question is not whether to stop or to continue the study, but whether some results are already statistically significant and convincing enough to be made public, while for others we still have to continue and wait. Therefore, we can regard interim analyses as a tool for monitoring important aspects of a study, with the plan to continue until the last question of interest has been answered.
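The effect of the prevalence can be made concrete with a small sketch. Purely for illustration, it assumes the same true difference and the same discordance rate for sensitivity and specificity, and it again uses Connor's (1987) normal approximation to McNemar's test rather than any method from this paper. Only the diseased fraction of the patients contributes to the sensitivity comparison and only the disease-free fraction to the specificity comparison. All function names and parameter values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def paired_power(n_pairs: float, delta: float, discordance: float, alpha: float = 0.05) -> float:
    """Connor's (1987) normal approximation to the power of a two-sided McNemar test."""
    if delta <= 0 or discordance <= delta ** 2:
        raise ValueError("delta must be positive and delta**2 smaller than discordance")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    numerator = delta * n_pairs ** 0.5 - z_alpha * discordance ** 0.5
    return _STD_NORMAL.cdf(numerator / (discordance - delta ** 2) ** 0.5)


def power_by_dimension(n_total: int, prevalence: float, delta: float, discordance: float):
    """Expected power of the sensitivity and of the specificity comparison.

    Sensitivity is compared among the n_total * prevalence diseased patients,
    specificity among the remaining disease-free patients. For simplicity the
    same delta and discordance rate are assumed for both comparisons.
    """
    n_diseased = n_total * prevalence
    n_disease_free = n_total - n_diseased
    return (paired_power(n_diseased, delta, discordance),
            paired_power(n_disease_free, delta, discordance))


for prevalence in (0.2, 0.5):
    sens, spec = power_by_dimension(200, prevalence, 0.10, 0.15)
    print(f"prevalence {prevalence:.0%}: power sensitivity {sens:.1%}, specificity {spec:.1%}")
```

At a prevalence of 50% both comparisons have the same power under these assumptions, whereas at 20% the specificity comparison is far better powered, so an interim analysis may already be conclusive for one dimension but not for the other.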
Of course, such a strategy has to be fixed a priori in the study protocol. Ethical objections also need to be addressed, e.g. if we plan to continue using an invasive procedure that has been proven inferior at the lesion level in order to confirm the result at the patient level. Minor harm, such as exposure to a limited radiation dose, may be justified by the continuing advantage for patients of having their care based on the results of both modalities.
Keeping results from interim analyses secret
In treatment studies, blinding at the individual level ensures unbiased assessment of treatment outcome and adverse effects in each patient. Moreover, it protects the integrity of a trial by preventing premature conclusions: if blinding is successfully implemented and maintained, there is little danger of rumors arising about the results of the study. Consequently, results of interim analyses will typically be kept secret (unless the trial is stopped) to avoid any disturbance of the ongoing trial.
The situation is different in paired diagnostic studies. As pointed out above, one of the advantages of paired diagnostic studies is that patients can benefit from the results of both modalities. Hence, it is common practice that the treating physician knows both results and typically also the result of the gold standard procedure, or at least the follow-up data of the patient. In either case, treating physicians may witness successes or failures of the modalities in some patients, and hence there is a risk of their forming an opinion about the superiority of one modality over the other. If treatment decisions are made in interdisciplinary conferences, even more people than just the treating physician may witness successes or failures, or at least discrepancies between the modalities. Therefore, there is some risk that rumors about the accuracy of each modality will arise.
This increased risk, compared with the classical treatment trial, may change our attitude towards keeping the results of interim analyses secret. It may be an advantage to make the results of the interim analysis available to some or all people involved in the study, simply to prevent rumors from reducing the willingness of clinicians or patients to participate in or cooperate with the study. Shared and correct knowledge about the current state of the results, with a clear indication of what has already been shown and what still needs to be shown, may provide a solid basis for the continued smooth conduct of the study and for a uniform interpretation of the results of the modalities in each patient.
Of course, such officially communicated and balanced information does not guarantee that all bias is avoided, as this information may still influence those who evaluate the results of the diagnostic procedures. At the end of the day, we have to choose carefully between two options: a limited, but uncontrolled and heterogeneous spread of information (rumors, no interim analysis), or a controlled, but broader and homogeneous spread of information by means of preplanned interim analyses.
In a prospective diagnostic study at our institution, we are currently investigating the detection of bone metastases from prostate cancer in patients with histologically confirmed prostate cancer, at least one bone metastasis on 99mTc-MDP whole body bone scintigraphy, and no prior or active androgen deprivation [32].
99mTc-MDP whole body bone scintigraphy is currently the method of choice for detecting bone metastases in these patients. However, the sensitivity and specificity of this imaging modality are suboptimal, and therefore we, like others, are looking for new diagnostic tools. We have examined the value of PET/CT with 18F-fluorocholine, using MRI as the reference. For this interim analysis, data were collected from 42 consecutive patients from April 2009 to July 2011. The study was planned to be evaluated on a per-lesion basis, with positive and negative findings indicating malignant and benign lesions, respectively. However, due to recent demands for demonstrating clinical benefit as well, we are now also interested in a patient-based analysis.
Both a lesion-based and a patient-based analysis were performed in the interim analysis. The per-lesion analysis showed a highly significant and clinically very relevant improvement in sensitivity with 18F-fluorocholine PET/CT, and even a moderate improvement in specificity.
Sensitivity and specificity of 99mTc-MDP whole body bone scintigraphy and 18F-fluorocholine PET/CT on a per-lesion basis, using MRI as a reference.
The results of the per-patient analysis with respect to the presence of at least one malignant lesion were less clear.
Results of 99mTc-MDP whole body bone scintigraphy and 18F-fluorocholine PET/CT in relation to the respective MRI results on a per-patient basis.
The difference in sensitivity was estimated at 4.4%, with a 95% confidence interval of [-19%, 28%]. Its width of 47 percentage points indicates a rather imprecise estimate of the difference in sensitivity.
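The text does not state how this interval was computed. As a hedged illustration only, the following sketch uses the simple Wald interval for a difference of two paired proportions, where `b` and `c` (hypothetical names) are the numbers of diseased patients detected only by the new modality and only by the standard modality, respectively. It shows why an estimate based on a few dozen diseased patients is imprecise, and that quadrupling the number of patients only halves the width of the interval. The counts below are hypothetical and are not the study data.

```python
import math
from statistics import NormalDist


def paired_difference_ci(b: int, c: int, n: int, level: float = 0.95):
    """Wald confidence interval for a difference in paired sensitivities.

    b -- diseased patients positive only with the new modality
    c -- diseased patients positive only with the standard modality
    n -- all diseased patients (concordant pairs included)
    """
    if n <= 0 or b < 0 or c < 0 or b + c > n:
        raise ValueError("invalid counts")
    diff = (b - c) / n
    # Variance of the difference of two correlated proportions from paired data.
    var = ((b + c) / n - diff ** 2) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(var)
    return diff, diff - half_width, diff + half_width


# Hypothetical counts: identical proportions, four times as many patients in the second row.
for b, c, n in [(3, 2, 20), (12, 8, 80)]:
    diff, low, high = paired_difference_ci(b, c, n)
    print(f"n={n}: difference {diff:.1%}, 95% CI [{low:.1%}, {high:.1%}], width {high - low:.1%}")
```

Other interval methods (e.g. exact or score-based intervals) behave better with small counts; the Wald interval is used here only because its dependence on the sample size is easy to read off.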
The results at the lesion level are very convincing, and there can be no doubt that the scientific community should be informed about them in a publication as soon as possible. However, the clinical benefit with respect to patient-related outcomes should be assessed at the patient level, and here the results are still inconclusive. Continuation of the study therefore seemed justified.
The change from the originally planned lesion-based analysis to a patient-based analysis naturally has an impact on the sample size. If we keep the original sample size assumptions, i.e. a power of 80% to detect a difference in sensitivity of 10 percentage points, the formulas provided by Gerke et al. [30] would indicate that a total of 92 patients with at least one malignant lesion would have to be included. As we observed that about half of the patients in the study population had at least one malignant lesion (which was also much higher than originally anticipated), this suggests that an overall sample size of 184 patients would be necessary. However, these numbers are based on the proportion of discordant results of 3/23 observed in the patients with a malignant lesion, and this estimate is rather imprecise. Therefore, we would recommend performing a further interim analysis after the inclusion of 100 patients to obtain a more stable estimate of the agreement rate and to determine the final sample size.
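The two-step logic of this calculation, first the number of diseased patients and then the inflation by the prevalence, can be sketched as follows. The first function again uses Connor's (1987) normal approximation for McNemar's test; this is our assumption and may differ from the formulas of Gerke et al. [30]. It is meant to show how strongly the required number of diseased patients reacts to the assumed discordance rate, which is why a re-estimation at a further interim analysis is useful. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def diseased_sample_size(delta: float, discordance: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Diseased patients needed for a two-sided McNemar test (Connor, 1987).

    delta       -- difference in sensitivity to be detected
    discordance -- probability that the two modalities disagree in a diseased patient
    """
    if delta <= 0 or discordance <= delta ** 2:
        raise ValueError("delta must be positive and delta**2 smaller than discordance")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    root = z_alpha * math.sqrt(discordance) + z_beta * math.sqrt(discordance - delta ** 2)
    return math.ceil(root ** 2 / delta ** 2)


def total_sample_size(n_diseased: int, prevalence: float) -> int:
    """Inflate the number of diseased patients by the expected prevalence."""
    return math.ceil(n_diseased / prevalence)


print(total_sample_size(92, 0.5))  # 184 patients in total, as in the text

# Sensitivity of the result to the (imprecisely estimated) discordance rate.
for psi in (0.10, 0.12, 0.16, 0.20):
    n_d = diseased_sample_size(0.10, psi)
    print(f"discordance {psi:.0%}: {n_d} diseased, {total_sample_size(n_d, 0.5)} in total")
```

Because the required number of diseased patients grows roughly in proportion to the discordance rate, even a modest error in the interim estimate of this rate translates into a substantial change in the final sample size.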