We carried out two series of user tests, in 2005 (Test 1) and 2006 (Test 2), with participants from Norway and the UK. The publisher of the site, Wiley-Blackwell, made changes to the site after Test 1, partly based on the results we uncovered. Most of these changes concerned branding at the top of the site, making The Cochrane Library the prominent identity and toning down the logo and universal navigation of the publisher. We therefore altered the interview guide for Test 2 in small ways so that the questions would match the changes that had been made. See Additional file 1 for the complete interview guide we used in Test 2.
We limited our selection to health professionals who used the Internet and had some knowledge of systematic reviews, to ensure that the results of the interface testing would not be confounded by unfamiliarity with the medium or the site's content. We sent email invitations to lists of previous attendees of evidence-based practice workshops, employees in the Directorate of Health and Social Affairs in Oslo and individuals in evidence-based health care networks in Oxford. Volunteers who responded were screened by phone or email to assess whether they fitted the requirements, and also to find relevant topics of interest so that we could individually tailor test questions. We also asked them about their online searching habits, and what sources of online information they usually used in connection with work. We did not reveal the name of the site we were testing during recruitment. Participants were promised a gift certificate worth the equivalent of USD 80 or a USB memory stick if they showed up for the test.
Tests were performed individually and took approximately one hour. The test participant sat at a computer in a closed office together with the test leader, who followed a semi-structured test guide. We recorded all movement on the computer desktop using Morae usability test software [11] and video-filmed the participant, who was prompted to think out loud during the whole session. We projected the desktop recording and the film of the participant, together with the sound track, to another room where two observers transcribed, discussed, and took notes.
The data was anonymous to the degree that participants' names were not connected to video, audio or text results. We received written permission to store the recordings for five years before deleting them, guaranteeing that video/audio tapes would not be used for any purpose outside of the study and would not be published or stored in places of public access. The protocol was approved by the Norwegian Social Science Data Services and found to be in line with national laws for privacy rights.
We began the test with preliminary questions about the participant's profession, use of the Internet, and knowledge of The Cochrane Library. We then asked the participant to find specific material published on the Library, starting from an empty browser window. Once they were on the site, we asked about their initial reactions to the front page and invited them to browse freely, looking for content of interest to themselves. Then we asked them to perform a series of tasks, some of which involved looking for specific content about topics tailored to their field or professional interests. For instance, a midwife was asked to find:
- all information on the whole library that dealt with prevention of spontaneous abortion
- a specific review about the effect of caesarean section for non-medical reasons
- all new Cochrane Reviews relevant to the topic "music used to relieve pain".
Other general tasks included finding help, finding the home page, and finding information about Cochrane. We also had specific tasks leading to searching and to reading a review. At the end, we asked if they had any general comments on the site and suggestions for how it could be improved.
Our analysis was done in two phases. The aim of the first analysis was to provide the stakeholders and site developers with an overview and a prioritization of the problems we had identified. At least two of us carried out content analysis of the transcripts, independently coding each test. These codes were then compared, discussed and merged. The resulting topics were then rated according to the severity of the problem for the user. We rated severity in three categories: high (show-stopper, leads to critical errors or hinders task completion), medium (creates much frustration or slows user down), or low (minor or cosmetic problems).
The second analysis was done to draw out of the site-specific data the more generalizable issues that underlie this article. We re-sorted the findings into the seven user-experience categories from the honeycomb model by re-reading the transcripts, checking the context where the problems came from, and evaluating which of the seven categories best fit each finding. Severity-of-problem ratings from the first analysis were kept in the second analysis.
We did not evaluate accessibility (the degree to which the website complied with standards of universal accessibility, for instance as defined by the Web Accessibility Initiative [12]), since user testing methods are not an effective way of gathering data on various aspects of this issue.
The findings presented here are a selection of issues that received a high degree of saturation in our tests, and that we judge to be critical ("high severity") to the user experience of evidence-based web sites in general. This judgement is based on basic principles for web usability [7] as well as the principles underlying evidence-based health care: to successfully search for, critically appraise and apply evidence in medical practice [16].
Most of the findings here are still of relevance to The Cochrane Library in its current format, though we have included some observations of problems that are now resolved, because they illustrate issues that are potentially important for others. Our aim is not to write a critical review of the library, but to highlight issues we found that can be important to user experience of evidence-based web sites for health professionals.