Trials that assess non-inferiority require rigorous methods for their design, analysis and interpretation. Although the designs and sample sizes of the AIDS non-inferiority and equivalence trials were generally appropriate, there is substantial room for improvement in the statistical analysis and interpretation of their results.
Patients with HIV infection would be harmed by deferral of therapy. Consequently, the use of placebo would be unethical [2
]. Even though placebo-controlled trials of HAART are not available, a conclusion about efficacy can still be reached because the great majority of patients (about 70%) will not achieve virologic control without treatment [4
]. Because significant inferiority to the active control would be a major problem for patients, the non-inferiority margin for a new drug should be smaller than the difference between active control and placebo. Because this effect size is so large, the margin is in practice determined by clinical judgment alone, and such judgment is highly subjective. As a result, the margin varied from the conventional 10% up to 15%; even the same study group chose different margins in studies 903 (10%) and 934 (13%). A smaller margin provides greater assurance of a satisfactory effect, but it increases the cost of the study because more patients are required. In study 903, the authors could not demonstrate non-inferiority at 10%, but they point out in their discussion that this margin was more stringent than the 12% chosen in CNAAB4005. However, had they chosen the less stringent 12% as the maximal limit for non-inferiority, the trial would have enrolled fewer patients, the 95% confidence interval would have been wider, and it might still have extended beyond the 12% limit. Consequently, data-driven discussion of the non-inferiority margin after completion of the study is pointless.
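The trade-off between the chosen margin and trial size, and the confidence-interval criterion itself, can be sketched with the usual normal approximation. The numbers below are purely illustrative assumptions (an 80% success rate in both arms, invented counts), not any trial's actual calculation:

```python
import math

def ni_ci_95(success_new, n_new, success_ctl, n_ctl):
    """95% CI for the difference in success rates (control minus new).
    Non-inferiority is concluded if the upper bound stays below the margin."""
    p_new = success_new / n_new
    p_ctl = success_ctl / n_ctl
    diff = p_ctl - p_new
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ctl * (1 - p_ctl) / n_ctl)
    return diff - 1.96 * se, diff + 1.96 * se

def ni_sample_size(p, margin):
    """Patients per arm for a non-inferiority test, assuming both true
    success rates equal p (normal approximation, one-sided alpha = 0.05,
    power = 0.80; z-quantiles 1.645 and 0.842)."""
    z_a, z_b = 1.645, 0.842
    return math.ceil((z_a + z_b) ** 2 * 2 * p * (1 - p) / margin ** 2)

# A tighter margin demands more patients:
for m in (0.10, 0.12, 0.15):
    print(f"margin {m:.0%}: {ni_sample_size(0.80, m)} patients per arm")

# Hypothetical result whose CI upper bound falls between 10% and 12%:
# non-inferior at a 12% margin, inconclusive at 10%.
lo, hi = ni_ci_95(160, 200, 168, 200)
print(f"95% CI for difference: ({lo:.3f}, {hi:.3f})")
```

The second example shows why a post-hoc switch of margin is so tempting, and so suspect: with these hypothetical counts the same data "pass" at 12% and "fail" at 10%.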
Blinding has been described as less effective in non-inferiority than in superiority trials, in particular when the primary endpoint is subjective [31
]. For example, a blinded investigator could bias the results toward a preconceived belief in equivalence by assigning similar ratings to the treatment responses of all patients, producing a "bias toward the null". Even when the primary outcome is objective (virologic failure, clinical progression or death), however, we believe that blinding is important to protect against bias. Unblinded investigators may provide additional effective care, such as more frequent appointments or adherence support, to patients in the arm they believe superior or equivalent. In addition, patients or physicians may overinterpret subjective endpoints such as side-effects in open-label studies. Finally, the absence of blinding can distort the comparability of the groups with respect to study withdrawal or adherence, since patients participating in a non-inferiority trial may prefer to receive the simpler therapy. Among the studies reviewed, significantly more patients discontinued the ALIZE study medication for personal reasons in the control arm than in the simpler, once-a-day experimental arm (11% versus 2%, P < 0.0004). This may influence the outcome, particularly in an ITT analysis, where withdrawals are counted as failures. Another example comes from study 934, where adherence to treatment differed significantly between groups; the conclusion of superior efficacy in the experimental arm may therefore partly reflect greater exposure to the experimental drug. On the other hand, blinding can stand in the way of optimal drug dispensation in non-inferiority and equivalence trials, in particular when the aim is to simplify antiretroviral therapy. For example, if the purpose is to offer a simpler dosage or fewer pills than standard therapy, blinding may require similar regimens in both arms, so that any advantage of simplification would be eliminated.
Excluding patients after they have been randomized undermines the validity of "on-treatment" analyses because it may introduce major bias in group comparability. For this reason, intention-to-treat analysis has been recognized as the most appropriate and conservative strategy for analysing data from double-blind trials. In non-inferiority and equivalence trials, however, this approach is known to lack robustness because it is no longer conservative: by diluting differences between arms, it favours a conclusion of non-inferiority. The interpretation of such studies should therefore also be complemented by an "on-treatment" analysis [1
]. Any discrepancy between the two analyses regarding equivalence or non-inferiority should be reported and acknowledged. The CNAAB3005 study illustrated how apparent equivalence can result from the dilutional effect of comparing two treatments in the ITT population (527 patients) when only 54% of the patients were on treatment. The same may apply to the ESS40013 study. The use of an "overall" log-rank test of superiority across the three arms of the NEFA study may likewise have obscured the lower efficacy of one study arm, as demonstrated by the "head-to-head" comparison between abacavir and efavirenz.
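The dilution mechanism can be made concrete with a toy calculation. The counts below are hypothetical, chosen only to echo a roughly 54% on-treatment rate, not the actual CNAAB3005 data: when about half of each arm drops out and is counted as a failure, the absolute difference between arms shrinks in the ITT analysis relative to the on-treatment analysis.

```python
def itt_vs_ot(resp_new, comp_new, n_new, resp_ctl, comp_ctl, n_ctl):
    """Response rates under ITT (non-completers counted as failures,
    denominator = all randomized) and on-treatment (completers only)."""
    return {
        "ITT": (resp_new / n_new, resp_ctl / n_ctl),
        "on-treatment": (resp_new / comp_new, resp_ctl / comp_ctl),
    }

# Hypothetical: 260 randomized per arm, 140 completers per arm,
# responders among completers: 70 (new) vs 98 (control).
res = itt_vs_ot(resp_new=70, comp_new=140, n_new=260,
                resp_ctl=98, comp_ctl=140, n_ctl=260)
for analysis, (p_new, p_ctl) in res.items():
    print(f"{analysis}: new {p_new:.1%}, control {p_ctl:.1%}, "
          f"difference {p_ctl - p_new:.1%}")
```

With these invented numbers the on-treatment difference is 20 percentage points, but the ITT difference is under 11 points, because the same 28-responder gap is spread over a denominator almost twice as large: an apparently "equivalent" ITT result masking a real difference among treated patients.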
As in superiority trials, the choice of the primary outcome is critical in non-inferiority trials. Study BMS-045 illustrated how statistical non-inferiority for the difference in viral log reduction can be compatible with up to 20.4% additional virologic failures in the experimental arm, a percentage much larger than the non-inferiority margins usually selected for this outcome in this setting.
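Why a continuous endpoint can mask a clinically important difference in failure rates is easy to see under a normal model for the log10 viral-load drop. The parameters below are purely hypothetical (not the BMS-045 data): a between-arm difference in mean log drop that sits comfortably inside a margin on the log scale can still translate into a failure-rate difference exceeding the 10-12% margins typically used for the binary outcome.

```python
from math import erf, sqrt

def failure_rate(mean_drop, sd, threshold=1.0):
    """P(log10 viral-load drop < threshold), i.e. virologic failure,
    under a normal model for the drop: Phi((threshold - mean) / sd)."""
    z = (threshold - mean_drop) / sd
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical arms: mean log drops of 2.0 vs 1.6 (a 0.4-log difference,
# inside a 0.5-log margin on the continuous scale), same spread.
ctl = failure_rate(mean_drop=2.0, sd=0.7)
new = failure_rate(mean_drop=1.6, sd=0.7)
print(f"failure: control {ctl:.1%}, experimental {new:.1%}, "
      f"difference {new - ctl:.1%}")
```

Because failure is defined by a threshold in the tail of the distribution, a modest shift of the mean inflates the tail probability disproportionately: with these assumed parameters the failure-rate difference exceeds 10 percentage points even though the mean difference passes the continuous-scale margin.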
Finally, most of the studies concluded, on the basis of their prespecified margin, that the effect of at least one experimental arm was similar to that of the control. However, only half of these studies actually demonstrated non-inferiority. Prespecifying the non-inferiority or equivalence margin is necessary but not sufficient to guarantee methodological quality and appropriate conclusions. We confirmed that AIDS trialists show low adherence to non-inferiority and equivalence methodological standards, as is the case in other fields [28
]. An antiretroviral drug may fail to prove non-inferiority in terms of efficacy and nonetheless be a good alternative if the observed difference is small and the new drug shows better tolerability. This interpretation should, however, be left to the reader. To allow a risk-benefit assessment to be made, the report has a particular obligation to be as clear as possible, using standard statistical vocabulary for non-inferiority and equivalence trials, in compliance with the CONSORT statement.