


PLoS One. 2011; 6(11): e27557.

Published online 2011 November 14. doi: 10.1371/journal.pone.0027557

PMCID: PMC3215721

Bruno Giraudeau,^{1,2,3,4,*} Clémence Leyrat,^{1,4} Amélie Le Gouge,^{1,3} Julie Léger,^{1,3} and Agnès Caille^{1,2,3,4}

Ioannis P. Androulakis, Editor

Rutgers University, United States of America

* E-mail: giraudeau@med.univ-tours.fr

Conceived and designed the experiments: BG. Performed the experiments: BG CL ALG JL AC. Analyzed the data: BG CL ALG JL AC. Contributed reagents/materials/analysis tools: BG CL ALG JL AC. Wrote the paper: BG CL ALG JL AC.

Received 2011 August 11; Accepted 2011 October 19.

Copyright Giraudeau et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Grant proposals submitted for funding are usually selected by a peer-review rating process. Some proposals may receive discordant peer-review ratings and therefore require discussion by the selection committee members. The issue is to determine which peer-review ratings should be considered discordant. We propose a simple method to identify such proposals. Our approach is based on the intraclass correlation coefficient, the parameter usually used to assess agreement in studies with continuous ratings.

Peer review is now the principal mechanism for selecting grant applications for funding [1], [2]. In this process, inter-reviewer agreement is important because it eases application ranking. Both Wiener *et al.* [3] and Hartmann *et al.* [4] found high inter-reviewer agreement in rating proposals. Green *et al.* [5] demonstrated that the rating interval of the scale (0.5 or 0.1) did not influence the final assessment. Nevertheless, reviewers still disagree about some proposals because of differing scientific backgrounds, differing perceptions of the proposal, or undeclared conflicts of interest. Proposals with discordant peer-review ratings need to be discussed before a global ranking of proposals can be established. We propose a simple method to help selection committees identify proposals that require discussion because of a lack of agreement among peer reviews.

Let us consider the example of 20 proposals submitted to a fictitious funder and assessed by 3 reviewers. Ratings are displayed in Table 1, and for each proposal we estimated the intra-proposal mean rating and standard deviation. Disagreement among ratings translates into a high intra-proposal standard deviation, as for proposals 3, 14, 19, 20 and 15, for example.

A simple way to identify proposals with discordant peer-review ratings would be to specify a ceiling intra-proposal standard deviation: each proposal with an intra-proposal standard deviation greater than this ceiling value would be considered as having discordant peer-review ratings. Nevertheless, such an approach would have two limitations. First, the ceiling standard deviation would depend strongly on the rating scale (and would therefore differ for each funder). Second, the ceiling standard deviation should be set relative to the inter-proposal heterogeneity rather than as an absolute value. In our example, if we consider the proposal mean ratings (i.e., the series 15.0, 11.1 … 13.9 in Table 1), the inter-proposal standard deviation is estimated at 2.3. Relative to this value, an intra-proposal standard deviation of 3 or 4 would be unacceptably high, but it would not be had the estimated inter-proposal standard deviation been around 5.
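To make these descriptive statistics concrete, here is a minimal sketch with hypothetical ratings (the actual values are in Table 1, which is not reproduced here): the intra-proposal standard deviation flags disagreement among reviewers, and the inter-proposal standard deviation is simply the standard deviation of the proposal means.

```python
from statistics import mean, stdev

# Hypothetical ratings for a few proposals (illustrative values only;
# the real data are in Table 1), each rated by 3 reviewers.
ratings = {
    1: [14, 15, 16],   # concordant ratings -> small intra-proposal SD
    3: [6, 12, 18],    # discordant ratings -> large intra-proposal SD
    14: [5, 11, 17],
}

for proposal, scores in ratings.items():
    m = mean(scores)
    s = stdev(scores)  # intra-proposal (sample) standard deviation
    print(f"proposal {proposal}: mean={m:.1f}, sd={s:.1f}")

# The inter-proposal SD is the SD of the proposal means
# (estimated at 2.3 for the 20 proposals of Table 1).
inter_sd = stdev(mean(v) for v in ratings.values())
```

Note that the intra-proposal standard deviation for proposal 3 here (6.0) far exceeds the inter-proposal standard deviation, which is exactly the kind of relative comparison the text argues for.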

Considering that the underlying question of our research is agreement, we focus on the intraclass correlation coefficient (ICC), the parameter usually assessed for continuous outcomes [6]. This coefficient is defined as the ratio of the inter-subject variance (here the inter-proposal variance) to the total variance (here the inter-proposal variance plus the intra-proposal variance). Thus, the ICC theoretically varies between 0 and 1 [7], where 0 indicates total lack of agreement among ratings and 1 indicates perfect agreement with no intra-proposal variance. In our example the ICC is estimated at 0.366 (using the ANOVA estimator, in the absence of an explicit maximum likelihood estimator when the number of ratings per proposal varies [8]), which can be interpreted as 36.6% of the total variation being due to inter-proposal variability (i.e., the “true” variability) and 63.4% to lack of agreement among reviewers.
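The one-way random-effects ANOVA estimator of the ICC for a possibly unbalanced design can be sketched as follows. This is a generic textbook version of the estimator cited in [8], not the authors' own implementation (their R code is in Appendix S3); the function name `icc_anova` is illustrative.

```python
from statistics import mean

def icc_anova(groups):
    """ANOVA estimator of the ICC for a one-way random-effects model.
    `groups` is a list of lists of ratings, one inner list per proposal;
    group sizes may differ (unbalanced design)."""
    k = len(groups)                      # number of proposals
    n = [len(g) for g in groups]         # ratings per proposal
    N = sum(n)                           # total number of ratings
    grand = sum(sum(g) for g in groups) / N
    means = [mean(g) for g in groups]
    # Between- and within-proposal mean squares
    msb = sum(ni * (mi - grand) ** 2 for ni, mi in zip(n, means)) / (k - 1)
    msw = sum((y - mi) ** 2
              for g, mi in zip(groups, means) for y in g) / (N - k)
    # Adjusted average group size for unbalanced data
    n0 = (N - sum(ni ** 2 for ni in n) / N) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)
```

With perfectly concordant ratings within each proposal the within-proposal mean square is zero and the estimate is 1; large within-proposal disagreement drives the estimate toward (and possibly below) zero, since the ANOVA estimator can take negative values in samples even though the parameter itself cannot [7].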

Giraudeau *et al.* [9] derived an analytical formula that assesses the influence of a subject (here, a proposal) on the estimate of the ICC (Appendix S1). For a given proposal (named *i _{0}* for convenience), this influence is the sum of 2 antagonistic effects: a positive effect, related to the distance between the proposal's mean rating and the overall mean (which contributes to the inter-proposal variance), and a negative effect, related to the disagreement among the proposal's ratings (i.e., its intra-proposal variance).
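The closed-form decomposition is given in Appendix S1 and is not reproduced here; numerically, however, the overall influence of a proposal can be checked by a leave-one-out (jackknife) computation. The sketch below does this with an inline ANOVA ICC estimator; the names `icc` and `influence` are illustrative, not the paper's.

```python
from statistics import mean

def icc(groups):
    # Compact one-way ANOVA estimator of the ICC (unbalanced design, [8]).
    k, n = len(groups), [len(g) for g in groups]
    N = sum(n)
    grand = sum(map(sum, groups)) / N
    ms = [mean(g) for g in groups]
    msb = sum(ni * (m - grand) ** 2 for ni, m in zip(n, ms)) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, ms) for y in g) / (N - k)
    n0 = (N - sum(ni * ni for ni in n) / N) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)

def influence(groups, i0):
    """Jackknife influence of proposal i0 on the ICC estimate:
    positive if i0 pulls the estimate up, negative if it pulls it down
    (a numerical stand-in for the analytical formula of Appendix S1)."""
    rest = [g for i, g in enumerate(groups) if i != i0]
    return icc(groups) - icc(rest)
```

A proposal with strongly discordant ratings has a markedly negative influence: removing it raises the ICC estimate.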

We then propose to use the second term of the formula to identify proposals with discordant reviews by the following algorithm:

1. Discard any proposal with only one rating, considering that it automatically needs to be discussed.
2. Estimate the ICC for the remaining dataset.
3. Apply the analytical formula to each proposal.
4. Identify the proposal for which the second term of the formula is highest in absolute value (i.e., the proposal with the greatest negative impact on the ICC estimate).
5. Discard the identified proposal from the sample. In case of ties, discard all proposals for which the second term of the formula is equally high (in absolute value).
6. Estimate the ICC for the truncated sample.
7. Repeat steps 3 to 6 until the ICC estimate reaches a pre-specified ceiling value.
8. The discarded proposals are those that need to be discussed because of peer-review rating disagreement.
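As an illustration of the iterative procedure (the authors' actual implementation is the R code of Appendix S3), the following sketch replaces the analytical second term of the formula with its numerical jackknife counterpart: at each step it discards the proposal whose removal most increases the ICC estimate. The names `icc` and `flag_discordant` and the data layout are assumptions for this example.

```python
from statistics import mean

def icc(groups):
    # Compact one-way ANOVA estimator of the ICC (unbalanced design, [8]).
    k, n = len(groups), [len(g) for g in groups]
    N = sum(n)
    grand = sum(map(sum, groups)) / N
    ms = [mean(g) for g in groups]
    msb = sum(ni * (m - grand) ** 2 for ni, m in zip(n, ms)) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, ms) for y in g) / (N - k)
    n0 = (N - sum(ni * ni for ni in n) / N) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)

def flag_discordant(ratings, ceiling=0.7):
    """Iteratively flag proposals with discordant ratings.
    `ratings` maps proposal id -> list of ratings.
    Step 1: proposals with a single rating are flagged up front.
    Then, while the ICC of the remaining proposals is below `ceiling`,
    the proposal whose removal raises the ICC the most is flagged
    (jackknife stand-in for the formula; ties could be handled by
    flagging all tied proposals, as in step 5)."""
    flagged = [p for p, r in ratings.items() if len(r) < 2]
    kept = {p: r for p, r in ratings.items() if len(r) >= 2}
    while len(kept) > 2 and icc(list(kept.values())) < ceiling:
        worst = max(kept,
                    key=lambda p: icc([r for q, r in kept.items() if q != p]))
        flagged.append(worst)
        del kept[worst]
    return flagged
```

For example, with four rated proposals of which one is strongly discordant and one has a single rating, `flag_discordant` returns the single-rated proposal plus the discordant one, matching steps 1 and 3-7 of the algorithm.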

The code to implement this algorithm is presented in Appendix S3.

In this algorithm, the only arbitrary choice is the ceiling ICC required in step 7, which must be pre-specified. Specifying a ceiling of 0.7, for instance, means that in the final sample (i.e., once all proposals with overly discordant ratings have been discarded), 70% of the variability is due to “true” variability (i.e., variability among proposals) and 30% to inter-reviewer heterogeneity (i.e., variability within proposals). We consider this reasoning more concrete and easier than specifying a ceiling intra-proposal standard deviation, because the ceiling ICC value is independent of the rating scale and the funder's requirements.

We applied this algorithm to the dataset previously presented, using a threshold value of 0.7 for the ICC. Seven proposals were identified as needing discussion because of disagreements among ratings (Table 2). The first proposal discarded is proposal 3, which results in a sample of 19 proposals and an estimated ICC of 0.415. The second proposal discarded is proposal 19, which results in an estimated ICC of 0.449; the third is proposal 20, and so on. Once proposals 3, 19, 20, 14, 15, 5 and 10 have been discarded, the estimated ICC is 0.708. Obviously, a cut-off value of 0.7 for the ICC is a stringent constraint and leads to a high number of proposals needing discussion (7 of 20). We may decide to be less stringent; for instance, a ceiling ICC of 0.5 (i.e., 50% of the variability due to “true” variability) would lead to identifying only 4 proposals (proposals 3, 19, 20 and 14).

We propose a simple way to identify proposals for which inter-reviewer ratings are discordant. Obviously, such an algorithm aims not to replace a selection committee but rather to help it rank and select proposals. The method is not specific to any reviewing agency; it may be applied in any peer-review process that requires reviewers to rate proposals, whatever the rating scale. The proposed algorithm is easy to apply but assumes a quantitative rating of proposals by reviewers. This approach may also find application in other contexts, such as ranking abstracts submitted to a conference, as was done for the 2010 annual meeting of the French pediatric society [10].

**A simulation study to assess the accuracy of the Giraudeau ***et al.*** formula [9] in the unbalanced case.**

(DOC)


**Algorithm code in R language (R Project for Statistical Computing v2.8.1).**

(DOC)


**Competing Interests: **The authors have declared that no competing interests exist.

**Funding: **No current external funding sources for this study.

1. Demicheli V, Di Pietrantonj C. Peer review for improving the quality of grant applications. Cochrane Database Syst Rev. 2007:MR000003. [PubMed]

2. Wessely S. Peer review of grant applications: what do we know? Lancet. 1998;352:301–305. [PubMed]

3. Wiener SL, Urivetzky M, Bregman D, Cohen J, Eich R, et al. Peer review: inter-reviewer agreement during evaluation of research grant applications. Clin Res. 1977;25:306–311. [PubMed]

4. Hartmann I, Neidhardt F. Peer review at the Deutsche Forschungsgemeinschaft. Scientometrics. 1990;19:419–425.

5. Green JG, Calhoun F, Nierzwicki L, Brackett J, Meier P. Rating intervals: an experiment in peer review. FASEB J. 1989;3:1987–1992. [PubMed]

6. Shoukri M. Measures of interobserver agreement. Boca Raton: Chapman & Hall/CRC; 2004.

7. Giraudeau B. Negative values of the intraclass correlation coefficient are not theoretically possible. J Clin Epidemiol. 1996;49:1205–1206. [PubMed]

8. Donner A. A review of inference procedures for the intraclass correlation coefficient in the one-way random effects model. Int Stat Rev. 1986;54:67–82.

9. Giraudeau B, Mallet A, Chastang C. Case influence on the intraclass correlation coefficient estimate. Biometrics. 1996;52:1492–1497.

10. Hankard R, Giraudeau B, Dubus JC, Tounian P, Sarles J, et al. [How are abstracts submitted to the Société Française de Pédiatrie (SFP) evaluated?] Arch Pediatr. In press. [PubMed]
