In this review, we found five main issues in the design, analysis, and reporting of NI trials. First, many of the trials were open-label. Second, the method used to determine the NI margin was reported infrequently and in limited detail. Third, most of the trials analyzed their data in only one analysis population, either ITT or PP. Fourth, only a few trials included a placebo arm to confirm assay sensitivity, and only a few discussed the constancy assumption. Lastly, we did not observe any difference in reporting between NI trials published before and after the release of the extension of the CONSORT statement for NI trials in 2006.
In our review, about a third of the trials were open-label. This finding was surprising, as it is not consistent with the guidelines that recommend blinding whenever possible to minimize the risk of bias. It raises the question of how important blinding is in an NI trial. Snapinn argues that blinding gives only minor protection in NI trials, since a blinded investigator with a prior belief in the non-inferiority of the test drug can bias the result by assigning similar ratings to the treatment responses of all patients.
There is no doubt, however, that blinding does offer protection against information bias. In addition, there will usually be endpoints (e.g. safety) for which differences are expected and for which blinding will ensure stronger evidence. We therefore conclude that blinding is still important in NI trials to avoid bias. If blinding is not possible, subjective endpoints need to be avoided and more stringent monitoring should be conducted.
The method to determine the NI margin was not reported in more than half of the trials. This finding is consistent with previous reviews in 2005 to 2006, where the methods were presented in 46% or less of the trials.
Apparently, the extension of the CONSORT statement in 2006 has not yet had a noticeable impact on this practice. The statement suggests that the NI margin should preferably be justified on clinical grounds, and that its relation to the effect of the reference treatment relative to placebo in any previous trials should be noted.
We found that most of the authors included a statement that the NI margin was a clinically acceptable difference, but only three trials mentioned that the margin was validated by a panel of clinical experts. This finding was consistent with other reviews, in which many trials claimed that their margin was clinically relevant without giving clear details on how the clinically acceptable NI margin was chosen. Merely stating that the margin was based on a clinically acceptable difference is not sufficient for any subsequent replication of the trial. Thus, more detail is needed on how the NI margin was determined. Such a description also helps the reader to judge whether the NI margin, and the rationale for choosing it, affected the validity of the results.
We observed that most anti-infective drug trials used a fixed treatment difference of 10–20% as their NI margin. Regulators recommend an NI margin of 10% for vaccines and anti-bacterials.
This margin of 10% is acceptable as long as the primary outcome of interest has a high incidence rate. The implications of using a fixed 10% margin for vaccines and anti-infective drugs should be explored further, and any revision of the guidelines for determining the NI margin should address this issue.
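The dependence of a fixed margin on the outcome rate can be illustrated with a small calculation. The sketch below is a minimal illustration, not a method taken from the reviewed trials: it assumes the primary outcome is a success proportion (for example, a cure or seroprotection rate) and expresses a fixed absolute margin as a fraction of the rate in the comparator arm. The function name `relative_margin` and the example rates are hypothetical.

```python
def relative_margin(control_rate: float, absolute_margin: float = 0.10) -> float:
    """Express an absolute NI margin as a fraction of the comparator's success rate.

    control_rate    -- assumed success proportion in the comparator arm (0 < rate <= 1)
    absolute_margin -- fixed NI margin on the risk-difference scale (default: 10 points)
    """
    if not 0 < control_rate <= 1:
        raise ValueError("control_rate must be in (0, 1]")
    if absolute_margin <= 0:
        raise ValueError("absolute_margin must be positive")
    return absolute_margin / control_rate


# Hypothetical comparator success rates, chosen only for illustration.
for rate in (0.90, 0.60, 0.30):
    share = relative_margin(rate)
    print(f"comparator rate {rate:.0%}: a 10-point margin is {share:.0%} of that rate")
```

Under these assumptions, the same 10-point margin allows a proportionally much larger loss of effect when the comparator success rate is low than when it is high, which is one way to read the caveat about incidence above.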
We observed that most of the trials reported results from only one analysis, either ITT or PP. Our results were consistent with a previous review, which observed that more NI trials used ITT than PP.
We also observed that ITT analysis was reported more often in high-impact journals. The CPMP guidelines and the new draft FDA guidelines for NI trials state that both analyses are equally important in NI trials. For superiority trials, ITT analysis is the preferred analysis as it adheres to randomization and might best reflect clinical practice. PP analysis might violate randomization and might not reflect clinical practice very well. Several reviews with RCT simulations showed that both ITT and PP could be problematic in NI trials, especially if the trial had a large number of non-compliant patients.
In addition, in our data, we did not observe any evidence that ITT leads to more NI conclusions than PP. We conclude that both analyses are equally important, as each approach supports a different interpretation of the drug's performance in daily practice.
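Reporting both analyses amounts to applying the same NI criterion to two analysis populations. The sketch below is a minimal illustration under stated assumptions: a binary success outcome, a margin on the risk-difference scale, and a simple Wald confidence interval (other interval methods are often preferred in practice). The function `ni_risk_difference` and all counts are hypothetical and do not come from the reviewed trials.

```python
from math import sqrt
from statistics import NormalDist


def ni_risk_difference(x_new, n_new, x_ref, n_ref, margin, alpha=0.025):
    """One-sided Wald check of non-inferiority for a difference in success proportions.

    Non-inferiority is concluded when the lower confidence bound of
    (p_new - p_ref) lies above -margin.
    Returns (difference, lower_bound, non_inferior).
    """
    p_new = x_new / n_new
    p_ref = x_ref / n_ref
    diff = p_new - p_ref
    se = sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    z = NormalDist().inv_cdf(1 - alpha)  # about 1.96 for alpha = 0.025
    lower = diff - z * se
    return diff, lower, lower > -margin


# Hypothetical counts (successes, patients) for each analysis population.
populations = {
    "ITT": {"new": (170, 200), "ref": (174, 200)},
    "PP": {"new": (160, 180), "ref": (165, 182)},
}

for name, arms in populations.items():
    diff, lower, ok = ni_risk_difference(*arms["new"], *arms["ref"], margin=0.10)
    print(f"{name}: difference={diff:+.3f}, lower bound={lower:+.3f}, non-inferior={ok}")
```

Running the same check on both populations makes any disagreement between ITT and PP visible, so that the reader can weigh it rather than see only one of the two results.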
We observed that only a small number of trials included a placebo arm to support assay sensitivity. Although our data did not provide sufficient evidence on whether the use of placebo was appropriate in these trials, we believe that a placebo arm was probably not ethically feasible in most studies. Nonetheless, a non-inferiority result can bear two meanings: both drugs are equally effective, or both drugs are equally ineffective against placebo. In this sense, a placebo arm in an NI trial makes it possible to evaluate whether both drugs are effective when the trial shows non-inferiority. Alternatively, if a placebo arm is not possible, the trial should choose a margin that assures that the new drug is likely to be superior to placebo, under the constancy assumption for the active comparator. Readers, and not only investigators, need to be aware of this issue of assay sensitivity when interpreting the results of NI trials. They need to consider the type of endpoints; the number of patients in the final analysis; the reasons for patient dropout; the similarity of the trial to the previous trial(s) that established the efficacy profile of the comparator; and the constancy assumption for the data used as reference for the NI margin. In our review, the last two items were reported in only a small number of the articles.
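For trials without a placebo arm, the margin choice described above can be sketched with what is often called the fixed-margin approach. This is an illustration of that general idea, not a procedure reported by the reviewed trials. It assumes that the lower confidence bound of the comparator-versus-placebo effect from historical trials is available, and that the constancy assumption holds so that this bound still applies in the current trial. The function name `fixed_margin`, the preserved fraction, and the numbers are hypothetical.

```python
def fixed_margin(hist_effect_lower: float, preserved_fraction: float = 0.5) -> float:
    """Derive an NI margin from the historical effect of the active comparator.

    hist_effect_lower  -- lower confidence bound of (comparator - placebo) from
                          historical trials, on the same scale as the NI margin
    preserved_fraction -- share of that effect the new drug must retain (assumed 50%)

    The returned margin is the part of the historical effect that may be lost.
    If the new drug loses less than this, it retains some effect over placebo,
    provided the constancy assumption holds.
    """
    if hist_effect_lower <= 0:
        raise ValueError("historical data do not show the comparator beating placebo")
    if not 0 <= preserved_fraction < 1:
        raise ValueError("preserved_fraction must be in [0, 1)")
    return (1 - preserved_fraction) * hist_effect_lower


# Hypothetical example: the historical lower bound of the comparator effect is
# 20 percentage points, and half of that effect must be preserved.
margin = fixed_margin(hist_effect_lower=0.20, preserved_fraction=0.5)
print(f"NI margin: {margin:.2f}")
```

A margin derived this way is only as reliable as the historical estimate and the similarity between the historical and current trials, which is why both need to be reported.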
Fewer than five percent of the trials in our review mentioned whether they were designed similarly to relevant past trial(s). Thus, it was difficult to assess whether the historical data used for determining the NI margin were reliable. Since the validity of the NI margin bears directly on the interpretation of an NI trial, clear reporting of the method of NI margin determination and of the constancy assumption is essential for every NI trial publication. It is impossible to check the validity of the constancy assumption without a parallel placebo arm. However, at a minimum, it is possible to check whether the current NI trial was similar to the previous trial(s) that estimated the efficacy of the active comparator.
We found no difference in reporting before and after the release of the extension of the CONSORT statement on NI trials. Furthermore, in general, there was no difference in adherence to the CONSORT statement between the high-impact and the low-impact journals. The overall low adherence to the statement might be due to the unfamiliarity of authors, referees, and editors with the extension. Researchers and journal editors should be more aware of this extension and should comply with its recommendations. We realize that it might be too early to see full adherence to the CONSORT statement extension after 3 years, but given the reputation of the CONSORT statement itself, we considered it reasonable to expect a certain degree of improvement.
Our review has some limitations. First, we excluded several trials because we used only a random sample of all the NI trials that we identified. However, as this was a random sample, we do not expect this to have influenced our results. Second, we used only PubMed to identify NI trials; therefore, we might have missed some trials. However, we assume that NI trials retrieved from PubMed do not differ in methodological characteristics from NI trials in other databases, so we do not think that this influenced our results. Third, since the terms that we used to search for non-inferiority trials were not standard MeSH terms and our search for those terms was limited to the abstracts of the articles, our search might not have captured all NI drug trials available in PubMed. Here too, we expect that the NI trials that we found do not differ from the NI trials that our search did not capture. A strength of our study is that we did not focus only on the NI margin, as previous reviews did, but also evaluated other methodological aspects of NI trials. In addition, we evaluated the quality of reporting using the current guidelines from the CONSORT statement.
In conclusion, the conduct and reporting of NI trials can be further improved, particularly by maximizing the use of blinding, using both ITT and PP analyses, reporting the similarity of the trial to the previous trials of the comparator to support a valid constancy assumption, and reporting the method used to determine the NI margin.