I take issue with the article by Ferreira et al. 1 The authors’ conclusion that “there was no evidence indicating that contraceptive counselling is effective in increasing acceptance and use of contraceptive methods after an abortion” is correct, but the method used to arrive at this conclusion was flawed. The publication of not only the findings but also the methods sets a dangerous precedent, as researchers are now encouraged to use this same flawed method, specifically the combination of the Jadad score to evaluate trial quality and the cutoff of three as indicating a high quality trial.
Let us suppose that the three trials in this case happened to show that counselling is effective. Since each trial is rated as high quality, we would have accepted this result, without question. But are these really high quality trials? The Jadad score has already been exposed for failing to recognise the flaws in some rather bad trials 2,3. In essence, it artificially singles out five holes in the dike to plug, and plays on the inability of the general public (and, sadly, even researchers who should know better) to distinguish between necessity and sufficiency. When there are 20 or so holes in the dike, the five in question surely must be plugged; this is necessary. But it is hardly sufficient. Nor, for that matter, is it sufficient for a trial to randomise, mask, and describe withdrawals (though these are all necessary). There are many other design and analysis features that can completely invalidate the results of such a trial.
And yet here we do not even need to search for these additional elements. The few elements considered by Jadad will suffice for our purposes, since these three trials all failed these elements, yet by accepting three out of five as constituting high quality, we tacitly accept the old college try as a stand-in for true rigour and quality. None of these three trials was masked, as the authors 1 noted. In addition, none used the intent-to-treat approach, which would have required that all randomised patients be analysed, so all three trials mishandled the withdrawals, even if they were described. Now we may ask: Were any of these trials truly randomised?
The first one 4 said nothing more than that random number tables were used, and ended up with extremely low p-values in Table 1 for comparing the groups at baseline with regard to age (p=0.000), number of children (p=0.019), and previous abortion (p=0.002). The lack of description coupled with the obvious baseline imbalances throws into question whether this truly was a randomised trial. Large numbers of drop-outs were excluded from the analysis, and the analysis itself was at best approximate rather than exact. On so many levels, this was a methodologically failed trial.
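To illustrate why the distinction between exact and approximate analysis matters, the following minimal sketch compares a two-sided Fisher exact test with the Pearson chi-square approximation on a small 2x2 table. The table (3, 1, 1, 3) is purely hypothetical and is not taken from any of the trials discussed; the function names are likewise illustrative, and all margins are assumed to be non-zero.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def weight(k):
        # Unnormalised hypergeometric probability of k in the top-left cell.
        return comb(r1, k) * comb(r2, c1 - k)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    observed = weight(a)
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(n, c1)


def chi_square_p(a, b, c, d):
    """Pearson chi-square p-value (1 degree of freedom, no continuity correction).

    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df: P(Z^2 > stat).
    return erfc(sqrt(stat / 2))


# Hypothetical table, for illustration only.
exact_p = fisher_exact_two_sided(3, 1, 1, 3)
approx_p = chi_square_p(3, 1, 1, 3)
print(f"exact p = {exact_p:.3f}, approximate p = {approx_p:.3f}")
```

For this hypothetical table the exact p-value is about 0.49 while the approximation gives about 0.16, which shows how far the two can diverge when cell counts are small.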
The second trial 5 also had numerous drop-outs excluded from the analysis, also used approximate instead of exact analysis, and randomised in clusters rather than by patient, without a full account of exactly how this was done. The third trial 6 did not even randomise at all, as the authors specified that contrary to randomisation, “For every two women one was assigned to the experimental group and the next to the control group, in alternative order.” Alternation is most certainly not randomisation 7. This trial also failed to use the intent-to-treat population, but had fewer drop-outs than the other trials, and did at least use exact statistical analyses.
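The difference between alternation and randomisation can be made concrete with a short sketch. It assumes an unmasked recruiter who always guesses that the next assignment will be the opposite of the previous one; the function names and the fixed seed are illustrative choices, not anything reported by the trial in question.

```python
import random

ARMS = ("experimental", "control")


def alternation(n):
    """Assign women to the two arms in strict alternating order."""
    return [ARMS[i % 2] for i in range(n)]


def simple_randomisation(n, rng):
    """Assign each woman independently with probability 1/2 per arm."""
    return [rng.choice(ARMS) for _ in range(n)]


def prediction_accuracy(sequence):
    """Share of assignments guessed correctly by predicting
    'the opposite of the previous assignment'."""
    hits = sum(1 for prev, cur in zip(sequence, sequence[1:]) if cur != prev)
    return hits / (len(sequence) - 1)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(prediction_accuracy(alternation(1000)))
print(prediction_accuracy(simple_randomisation(10000, rng)))
```

Under alternation the guess is right every time, so whoever enrols the next woman knows her group in advance; under simple randomisation the same guess is right only about half the time.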
Because each of these trials had at least one serious methodological flaw that by itself can throw the results into question, these are not the trials one would want to use to set policy. It is fortuitous that the results are negative, but what if they had been positive? The right precedent needs to be set, and this involves evaluating trial quality properly, instead of with the fatally flawed Jadad score. Only when flawed trials are recognised as such, and hence not taken into consideration, will we base conclusions on solid evidence.
How, then, should trial quality be evaluated? Clearly, more than five elements are needed. Consideration needs to be given to 1) whether the randomisation was appropriate (meaning not too predictable, as permuted blocks would be, for example), 2) whether masking was successful (not just claimed), 3) whether the true (not modified in any way) intent-to-treat approach was used for analysis, 4) whether analyses were exact instead of parametric, 5) whether the endpoints used were pre-specified, maximally informative (not arbitrarily dichotomised), and clinical (not surrogate) endpoints, 6) whether baseline data were truly measured prior to randomisation, and 7) whether a pre-randomisation run-in period was used, which would mean that the study is based on a biased sample. This list is not exhaustive, but it does at least represent a good start.
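The predictability of permuted blocks mentioned in item 1 can be quantified with a small sketch. It assumes 1:1 allocation in blocks of known size and an unmasked observer who always guesses the arm with fewer assignments so far in the current block (flipping a coin on ties); the function name and this guessing strategy are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations


def convergence_guess_rate(block_size):
    """Expected share of correct guesses within one permuted block
    (1:1 allocation), averaged over all equally likely block orderings."""
    half = block_size // 2
    # Every distinct ordering of half A's and half B's is equally likely.
    blocks = set(permutations("A" * half + "B" * half))
    total = Fraction(0)
    for block in blocks:
        a = b = 0
        for arm in block:
            if a == b:
                total += Fraction(1, 2)  # tie: a coin flip is right half the time
            else:
                guess = "A" if a < b else "B"  # arm that is currently behind
                total += int(guess == arm)
            if arm == "A":
                a += 1
            else:
                b += 1
    return total / (len(blocks) * block_size)


for size in (2, 4, 6):
    print(size, convergence_guess_rate(size))
```

With blocks of two the observer is right 3/4 of the time, and with blocks of four 17/24 of the time, compared with 1/2 under unrestricted randomisation.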