Some may argue that a single letter grade is insufficient for evaluating evidence, because its very unidimensionality fails to capture the array of important attributes necessary for evidence interpretation, such as the magnitude of effect and statistical uncertainty. However, an analogous argument may be applied to virtually any other source of lay information that employs single letter or number grades, yet these grades are often embraced by their intended audience. For example, the quality of health care organizations clearly has distinct domains, yet the National Committee for Quality Assurance chooses to synthesize individual, domain-specific grades into a summary grade. Consumer products have individual attributes, yet the Consumers Union chooses to synthesize individual, attribute-specific ratings into a single rating. Importantly, providing a summary grade does not preclude the option of supplementing it with grades for individual evidence attributes.
It may also be argued that headlines are written by editors rather than by journalists, and so incorporation of an evidence scale is unlikely to reduce the number of sensationalist headlines. While this argument may be true in the short term, it is less likely to be true in the long term, as an increasing lay understanding of evidence ratings could make a sensationalist headline more incompatible with the article that follows.
Because the “gray literature” (e.g., blogs) is becoming a more important source of lay medical information, it would be desirable to include these sources in any systematic effort to make evidence more transparent. Yet the approach advocated here may be less feasible for gray literature sources than for established scientific journals.
Editorial staff of medical journals are often overwhelmed by existing work demands and may balk at assuming the additional responsibility of not only evaluating the merit of the research, but also summarizing its contribution to the cumulative body of evidence. However, over time, authors and peer reviewers may absorb some of this additional burden, especially if editors require an evidence summary to be incorporated into the Discussion section of the manuscript. The work burden could also be limited by restricting lay evidence reports to those articles that are particularly relevant to public health (e.g., when high-prevalence conditions or risk factors are associated with diseases that confer great morbidity and mortality).
Finally, it could be argued that evidence ratings should always be performed by impartial organizations that have nothing to gain from grade inflation (e.g., the USPSTF itself). However, it seems implausible that any such organization would ever be sufficiently resourced, or invested with sufficient authority, to offer prompt evaluation of all research reports at the time of publication.