The use of medical experts in rating the content of health-related sites on the Internet has flourished in recent years. In this research, it has been common practice to use a single medical expert to rate the content of the Web sites. In many cases, the expert has rated the Internet health information as poor, and even potentially dangerous. However, one problem with this approach is that there is no guarantee that other medical experts will rate the sites in a similar manner.
The aim was to assess the reliability of medical experts' judgments of threads in an Internet newsgroup related to a common disease. A secondary aim was to show the limitations of commonly-used statistics for measuring reliability (eg, kappa).
The participants in this study were 5 medical doctors, who worked in a specialist unit dedicated to the treatment of the disease. They each rated the information contained in newsgroup threads using a 6-point scale designed by the experts themselves. Their ratings were analyzed for reliability using a number of statistics: Cohen's kappa, gamma, Kendall's W, and Cronbach's alpha.
Reliability was absent for ratings of questions, and low for ratings of responses. The various measures of reliability used gave conflicting results. No measure produced high reliability.
The medical experts showed a low agreement when rating the postings from the newsgroup. Hence, it is important to test inter-rater reliability in research assessing the accuracy and quality of health-related information on the Internet. A discussion of the different measures of agreement that could be used reveals that the choice of statistic can be problematic. It is therefore important to consider the assumptions underlying a measure of reliability before using it. Often, more than one measure will be needed for "triangulation" purposes.