Objectives To see whether telling peer reviewers that their signed reviews of original research papers might be posted on the BMJ’s website would affect the quality of their reviews.
Design Randomised controlled trial.
Setting A large international general medical journal based in the United Kingdom.
Participants 541 authors, 471 peer reviewers, and 12 editors.
Intervention Consecutive eligible papers were randomised either to have the reviewer’s signed report made available on the BMJ’s website alongside the published paper (intervention group) or to have the report made available only to the author—the BMJ’s normal procedure (control group). The intervention was the act of revealing to reviewers—after they had agreed to review but before they undertook their review—that their signed report might appear on the website.
Main outcome measures The main outcome measure was the quality of the reviews, as independently rated on a scale of 1 to 5 using a validated instrument by two editors and the corresponding author. Authors and editors were blind to the intervention group. Authors rated review quality before the fate of their paper had been decided. Additional outcomes were the time taken to complete the review and the reviewer’s recommendation regarding publication.
Results 558 manuscripts were randomised, and 471 manuscripts remained after exclusions. Of the 1039 reviewers approached to take part in the study, 568 (55%) declined. Two editors’ evaluations of the quality of the peer review were obtained for all 471 manuscripts, with the corresponding author’s evaluation obtained for 453. There was no significant difference in review quality between the intervention and control groups (mean difference for editors 0.04, 95% CI −0.09 to 0.17; for authors 0.06, 95% CI −0.09 to 0.20). Any possible difference in favour of the control group was well below the level regarded as editorially significant. Reviewers in the intervention group took significantly longer to review (mean difference 25 minutes, 95% CI 3.0 to 47.0 minutes).
Conclusion Telling peer reviewers that their signed reviews might be available in the public domain on the BMJ’s website had no important effect on review quality. Although the possibility of posting reviews online was associated with a high refusal rate among potential peer reviewers and an increase in the amount of time taken to write a review, we believe that the ethical arguments in favour of open peer review more than outweigh these disadvantages.
Traditional peer review for scientific and medical journals has major flaws. Although the peer reviewer knows the identity of the author of the manuscript being assessed, the author doesn’t know the identity of the peer reviewer. Over recent years this form of peer review has come under increasing criticism, largely for its lack of accountability.1 2 Unscrupulous reviewers can delay or prevent the publication of work with which they disagree, or promote inappropriately the work of likeminded researchers.3 4 Worse still, reports exist of anonymous reviewers appropriating ideas and words from the manuscripts they have been reviewing.5
At an assembly to discuss reviewer anonymity organised in 1994 by the journal Cardiovascular Research, consensus emerged among commentators that open peer review was the way ahead.3 4 6 Part of the impetus for this move towards greater openness has been the belief that science needs to catch up with the rest of the world, where transparency of decision making is becoming the norm. For important decisions that affect us, we now expect to know who made them and how they arrived at their decision. “The editors, assisted by the reviewers, are judges,” says Drummond Rennie, a supporter of open peer review.5 “We have an ample history to tell us that justice is ill served by secrecy.”
A series of studies has examined whether reducing the asymmetry between reviewers and authors affects the quality of reviewers’ reports. An early randomised trial indicated that blinding peer reviewers to the identity of authors would lead to better reports,7 but two larger studies have since failed to confirm this.8 9
Researchers have therefore turned their attention to the effects of being as open about reviewers’ identities as about authors’. The BMJ has embarked on a series of randomised controlled trials testing the effects of incremental increases in openness on the quality of peer reviewers’ reports. In previous studies we found that although revealing the identity of the reviewer to a co-reviewer had a small, editorially insignificant but statistically significant beneficial effect on the quality of the review,8 revealing the identity of the reviewer to the author had no significant effect.10
No trials have been published on the effect of peer reviewers being told that their review will be available to readers. Although some journals now share reviewers’ identities with authors, as far as we can tell the only publisher to reveal them to readers is BioMed Central. A recently reported survey found only three of 56 journals had what was described as open peer review.11
Although the peer review process has changed and reviewers for some journals have become used to more openness, evidence about the effect of opening review further is still required. Trust in the processes of science is still relevant, and reluctance remains, at least in some areas of medical research, to fully open up peer review.12 13 This study marks the third step in the sequence of trials at the BMJ. We set out to investigate the effects of reviewers’ foreknowledge that their signed review might be shared not just with co-reviewers and authors, but also with any interested reader.
This study was designed to determine whether reviewers who knew that their signed reviews might be posted on the BMJ’s website would produce reports whose quality differed from those of reviewers who knew the reports would be shared only with authors.
During the study, the journal followed its usual practice of seeking external peer review only for the minority of manuscripts that editors regard as having a reasonable chance of acceptance. Consecutive research manuscripts chosen by editors for external review between November 1999 and July 2000 were eligible for inclusion. Four potential reviewers of the paper’s clinical content were selected by one of the BMJ’s 12 editors, after which simple randomisation of manuscripts into either the intervention or the control group was performed using a random number generator. Details of the study were sent to the corresponding author, who was asked to consent to the inclusion of their paper in the study.
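As an illustration of what simple randomisation of manuscripts involves, the following is a minimal sketch only. The function name, the seed, and the use of Python's `random` module are assumptions made for this example; the study does not describe the software used to generate its random numbers.

```python
import random

ARMS = ("intervention", "control")


def simple_randomise(manuscript_ids, seed=None):
    """Allocate each manuscript to an arm independently with probability 1/2.

    Hypothetical helper for illustration; not the procedure's actual code.
    """
    rng = random.Random(seed)  # seeded generator so the allocation is reproducible
    return {manuscript: rng.choice(ARMS) for manuscript in manuscript_ids}


# Example: allocate 558 manuscripts, identified here by arbitrary integers.
allocation = simple_randomise(range(558), seed=2000)
counts = {arm: sum(1 for a in allocation.values() if a == arm) for arm in ARMS}
print(counts)
```

Because each manuscript is allocated independently, simple randomisation does not guarantee equal group sizes, unlike block randomisation.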
The first of the editors’ four potential reviewers was contacted with details of the study, including the title and author of the paper, and asked whether he or she was willing to review the paper. At this point, no information was given about the study arm into which the paper had been allocated. If the first reviewer declined, the next was contacted, continuing until a consenting reviewer was found. If none of the four potential reviewers consented, the paper was excluded.
Allocation was revealed when the full paper was sent to the peer reviewer. The intervention was therefore the act of revealing to reviewers that their signed report might appear on the BMJ’s website. Reviewers in both groups were also sent a questionnaire asking for their recommendation regarding publication and the time taken to review. Questionnaire responses were not shared with the authors.
The responsible editor was asked to assess the quality of the review on a Likert scale of 1 to 5 by using a validated review quality instrument (RQI).14 A second editor, selected from the other 11 editors using a random number generator, also independently assessed the quality of the review. Editors were blinded throughout to the group allocation. The author was sent a copy of the review, told that a decision on the manuscript had not yet been reached, and asked to assess the quality of the review using the same instrument.
A mean total score was calculated from the seven questions on the RQI. The primary outcome measure was editors’ scores calculated as the mean for the two editors. The secondary outcome was the reported time taken to review.
According to previous research,8 10 14 the standard deviation in quality scores is about 1.2. On this basis, the minimum difference between study groups regarded as editorially significant was a difference of 0.4 (10% of the maximum possible difference for a Likert scale of 1 to 5). With α=0.05 and β=0.1, 190 manuscripts in each group were required. Recruitment went beyond this number to allow for exclusions and non-return of evaluations.
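The required sample size can be checked with the standard normal-approximation formula for comparing two means, n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ² per group. The sketch below assumes a two-sided test and equal group sizes; the paper does not state which formula was used, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def manuscripts_per_group(sd, min_difference, alpha=0.05, beta=0.1):
    """Per-group sample size for a two-sided comparison of two means.

    Assumes the normal approximation and equal allocation to both groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(1 - beta)        # about 1.28 for 90% power
    n = 2 * ((z_alpha + z_beta) * sd / min_difference) ** 2
    return math.ceil(n)  # round up to a whole manuscript


print(manuscripts_per_group(sd=1.2, min_difference=0.4))  # prints 190
```

With a standard deviation of 1.2 and a minimum difference of 0.4, the formula gives about 189.1, which rounds up to the 190 manuscripts per group reported above.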
When accepted papers in the intervention group were ultimately published, the reviewer’s signed comments were posted on the BMJ’s website alongside the relevant paper. In addition, we posted the manuscript as originally received, our editorial committee’s comments (if any), our statistician’s comments (if any), and the author’s explanation of how he or she had changed their original manuscript in the light of these comments. These constituted the paper’s entire “prepublication history.” For rejected papers, no documentation was posted on the website.
Comparisons of outcome measures were made using χ2 tests and paired and unpaired Student’s t tests. Tests for interaction were done using analysis of variance. Agreement between editors was assessed using weighted κ statistics. Subgroup analysis used stratification by ultimate acceptance or rejection, and reviewer recommendation of acceptance or rejection. Data collection and analysis were performed using Microsoft Access, Microsoft Excel, and Stata statistical software (version 11).
All 558 eligible manuscripts were randomised to either the intervention group (n=265) or the control group (n=293; figure). A total of 87 manuscripts (40 in the intervention group and 47 in the control group) were excluded after randomisation (table 1). The most common reason for exclusion was that none of the four selected reviewers agreed to review the manuscript. The outcomes of the requests for participation made to potential reviewers of the 525 papers remaining after exclusions (other than a failure to identify a reviewer after four attempts) are given in table 2. Altogether, 1039 reviewers were approached and 471 (45%) agreed to participate (225 in the intervention group and 246 in the control group). Reasons why first reviewers declined are given in table 3. No reviewers who had agreed to participate subsequently declined after the group allocation had been revealed.
Evaluations were received from two editors for all 471 manuscripts included in the study and from the corresponding author for 453 (96%) papers (213 in the intervention group and 240 in the control group). Sixteen (3%) reviewers did not make a recommendation regarding publication, and 15 (3%) did not say how long the review took.
Allocation to the intervention group had no significant effect on the likelihood that reviewers would recommend acceptance (135/219 (62%) v 136/237 (57%); difference 4%, 95% CI −5% to 13%) or that the paper would ultimately be accepted for publication (50/225 (22%) v 67/246 (27%); difference −5%, 95% CI −13% to 3%).
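The reported differences in proportions and their confidence intervals can be reproduced from the counts given above. The sketch below uses the simple Wald (normal approximation) interval for a difference between two independent proportions; that choice of method is an assumption, since the paper does not say how the intervals were calculated, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def difference_in_proportions(events_a, n_a, events_b, n_b, level=0.95):
    """Return (difference, lower, upper) for p_a - p_b using a Wald interval."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    difference = p_a - p_b
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return difference, difference - z * se, difference + z * se


# Reviewer recommended acceptance: intervention 135/219 v control 136/237
print([round(100 * x) for x in difference_in_proportions(135, 219, 136, 237)])

# Paper ultimately accepted: intervention 50/225 v control 67/246
print([round(100 * x) for x in difference_in_proportions(50, 225, 67, 246)])
```

Rounded to whole percentages, these give 4% (−5% to 13%) and −5% (−13% to 3%), in agreement with the figures in the text.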
Forewarning reviewers that their signed reviews might be published alongside the paper (the intervention) had no discernible effect on the overall quality of their reviews (table 4).
Authors gave lower scores to reviews than editors, but the difference between the mean scores, 0.26, was less than 0.4, the minimum difference regarded as editorially significant. The 95% confidence interval (0.19 to 0.34) also excluded a difference of 0.4, although the difference was highly statistically significant. When the reviewer recommended rejection (although this was not revealed to the author), the author and editor had markedly different assessments of quality, with authors scoring reviews on average 0.6 (95% CI 0.48 to 0.72) points lower than editors (P<0.0001). These author-editor differences were consistent between the intervention and control groups.
Reviewers in the intervention group took on average 25 minutes longer over their review than reviewers in the control group (182 minutes (SD 135.2) v 157 (SD 101.9) minutes; table 4). Reviewers in the intervention group took significantly longer than controls over reports on papers eventually accepted (41 minutes, 95% CI 2.0 to 78.7), but the difference was smaller and non-significant for papers that they recommended accepting (22 minutes, 95% CI −7.5 to 52.2); that is, those for which reviews would more likely be posted online.
Of the 115 papers formally accepted for publication, 50 were in the intervention group. The entire prepublication history (as defined in the Methods section) of 32 of these papers has been published on the BMJ’s website alongside the accepted paper (see web table A).
Warning peer reviewers that their signed reports might be published on the BMJ’s website alongside the accepted paper did not noticeably affect the quality of reports compared with those obtained under the journal’s usual procedure. Editors’ and authors’ assessments of quality were very similar, except for reviews of papers where reviewers had recommended rejection. As a previous study has found, authors rate such reviews of lower quality.15
Although their reviews were not rated as higher quality, peer reviewers in the intervention group overall took significantly longer to assess manuscripts than did reviewers in the control group. These reviewers might have been expected to spend notably more time, and give higher quality reports, on papers that they were recommending for acceptance because their reviews were more likely to be shared with the world. However, any differences were relatively small.
Two previous studies at the BMJ have come up with similar findings to those of this study: opening up the process of peer review neither greatly improved the quality of the reviewers’ reports nor, reassuringly, made it worse.8 10 Another study of the quality of signed and unsigned reviews found a statistically, but not editorially, significant improvement in the quality of signed versus unsigned reviews.16
In this study we randomised all eligible papers and used a validated instrument to assess review quality.14 One possible explanation for our failure to find differences between the intervention and control groups could have been poor sensitivity of the RQI. However, the instrument has successfully detected differences in review quality in several randomised studies.8 10 16 17
The rate of refusal of reviewers to participate in the study was high at 55%. All eligible papers were entered into the study, so there was no way of calculating a background refusal rate over the course of the study. However, we know from our previous studies of peer review that more reviewers say they would refuse to take part in a trial of a hypothetical intervention than actually refuse when the intervention is formally introduced.
In this study, the relatively high rate of reviewers’ refusal to participate and the failure of the intervention to produce better reports raise the question of whether, as in ordinary clinical trials, participants differ from those not in the trial. Our intervention could be seen as having two stages, with the first, and arguably the most important, being the selection of reviewers who were happy for their signed reports to be published. Only these individuals could be randomised, so this selection process may itself have picked out reviewers who would provide more careful reports. Informing this subset of potential reviewers that their review would be published online with the paper might have had much less effect on the quality of reviews than it might for reviewers in general. If the policy were to be adopted by a journal, it might mean that the pool of reviewers would be reduced.
Reviewers who knew that their report might be posted online spent longer on the task than those in the control group, so adopting open peer review might result in the process feeling even more arduous to reviewers than it currently does. This is a concern because willing reviewers are already the scarcest component in the peer review process.
Disclosure of a reviewer’s identity has, however, brought substantial benefits. Posting the entire prepublication history of accepted papers on the journal’s website has turned our practice of peer review from what was traditionally a “black box” into a transparent process. Potential authors can see exactly what the BMJ means by peer review and can make an informed decision about whether our version adds enough value to justify sending their papers to us. Posting reviewers’ signed comments alongside the accepted paper means that reviewers can now receive full credit for their contribution to science.
Opponents of open peer review cite two main concerns. The first is that open peer review may provide less critical, and therefore less useful, reports. Although our validated instrument to measure quality doesn’t include “criticality” as one of its dimensions, during the validation process we found good correlation between the instrument’s scores and editors’ overall assessment of review quality.14 In this study, as in previous studies, reviewers in the intervention group were not significantly more likely to recommend acceptance.8 10
The second concern relates to the possible negative effects of open peer review on relationships among individuals working in the same field. In response to such fears, the BMJ set up a system of adverse event reporting that used a “yellow card,” which was routinely sent out with papers for review. Reviewers and authors were informed of the existence of this system of reporting and were asked to use the yellow card to notify the editor, in confidence, of any adverse events occurring as a direct result of our policy of open peer review. In nearly five years of the system’s operation, only one adverse event was reported, and this event did not occur in the context of any of our studies. One factor that may have militated against this problem is that potential reviewers in general medicine come from a very wide pool, so internecine battles are less likely than in smaller research communities.
A third, subsidiary, concern relates to the extra time required to do carefully constructed reviews suitable for wide scrutiny. The question is whether the benefits of open peer review are sufficient to outweigh this price of extra time and the associated reluctance of some reviewers to participate.
The results of our study suggest that extending open peer review to include sharing reviewers’ signed reports with the world at large is feasible at a large medical journal and does not adversely affect the quality of these reports, nor does it improve it by any notable amount. It may, however, reduce the number of willing reviewers and increase the time taken to review.
The BMJ is committed to opening up its peer review process, providing that it can achieve this without damaging quality. We believe that the ethical arguments in favour of open peer review outweigh any practical concerns.
We would like to see similar studies undertaken at other large journals to establish the generalisability of our results. We would also urge smaller and more specialist journals to consider doing similar studies to ascertain whether our findings are applicable in fields other than general medicine.
Web table A: Titles and URLs of papers published on the BMJ website with their peer review prehistory (prepublication history)
This paper has been severely delayed by the sad death of the first author, Susan van Rooyen. Because members of the BMJ’s editorial staff conducted this research, assessment and peer review were carried out entirely by external advisers. No member of BMJ staff was involved in making the decision on the paper. The researchers would like to acknowledge the following people whose help was invaluable during this study: Lisa Bero, for suggesting the study and producing a draft protocol; the members of the research steering group—Richard Smith, Nick Black, Fiona Godlee, Sandra Goldbeck-Wood, and Liz Crossan—for involvement in the conception and design of the study; Richard Smith, then editor of the BMJ, for making it possible to carry out this study in the BMJ offices, for his participation in evaluating reviews, his comments regarding the preparation of the manuscript, and for heading up the BMJ’s research steering group; Maureen Phayer and Daniel Berhane, for their invaluable technical assistance in posting articles’ prepublication history on the website; Sue Minns and Marita Batten, manuscript administrators; Anna Vassilieva, for administrative assistance in data collection and management; BMJ editors—Kamran Abbasi, Douglas Carnall, Sandra Goldbeck-Wood, Trish Groves, Christopher Martyn, Tessa Richards, Roger Robinson, Jane Smith, Tony Smith, Alison Tonks, and Gavin Yamey—for evaluating reviews; the authors who allowed us to use their papers and evaluated the reviews; and all the BMJ reviewers who allowed themselves to be research subjects.
Contributors: SvR was involved in the conception and design of the study; data collection, analysis, and interpretation; day-to-day management of the study; writing the paper; and discussing core ideas, and was a member of the research steering group. TD was involved in the conception and design of the study; interpretation of data; writing the paper; and discussing core ideas, and is the guarantor. SJWE gave statistical advice, was involved in the analysis of the data, and assisted in the writing of the paper.
Funding: SvR was employed by the BMJ as research assistant. The other costs of the study were met by the BMJ.
Competing interests: All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organisation for the submitted work; TD is employed by the BMJ as deputy editor and the journal is interested in the outcome of this research; no other relationships or activities that could appear to have influenced the submitted work.
Data sharing: The main dataset, in Stata format, and a short Stata programme to reproduce some of the analyses are available from SJWE (stephen.evans[at]lshtm.ac.uk). The data could be converted to other formats if required. Because of the death of SvR, the data dictionary is not available.
Cite this as: BMJ 2010;341:c5729