Developing a Specification
We developed a specification for our development program based on our objectives and on our experiential learning about the limitations of existing techniques. We recognized that our technique should be extendible, so that it could combine monitoring methods that, at the time, we could not yet define. The elements of our specification were: (1) an indeterminate number of video channels; (2) a transcript of the consultation, captured with a precise time stamp, possibly using voice recognition software; (3) output from pattern recognition software [9] and other change recognition technologies [10]; and (4) aggregate log files from observation techniques that we could not anticipate.
Developing Separate Work Packages
We converted this specification into small work packages, which we developed separately on a largely opportunistic basis, as we had not received any consistent funding. The work packages were:
1. To determine the optimal number of video channels and a low-cost way of recording. The recordings should have time stamps to allow synchronization with other video channels and methods of data collection (a minimal sketch of such a shared, time-stamped event record follows this list).
2. To find a reliable way to code the video footage, so we could navigate directly to particular activities in the consultation and measure their durations.
3. To automate the capture of body language and eye contact, using pattern recognition and gaze direction detection technologies.
4. To aggregate all these elements into a single navigable analysis output.
5. To introduce the ability to export data in a format that could readily be utilized by software engineers to improve systems.
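Common to all of these work packages was a shared, time-stamped event record that every capture tool could emit, so that streams could be synchronized and later aggregated. The following minimal sketch illustrates what such a record could look like in Python; the class and field names are illustrative assumptions rather than the project's actual schema.

```python
# A minimal sketch of a shared, time-stamped event record that each capture
# tool (video coder, keyboard/mouse logger, voice detector) could emit.
# Field names are illustrative assumptions, not the authors' schema.
from dataclasses import dataclass

@dataclass
class ConsultationEvent:
    source: str        # e.g. "UAR", "VAR", "video-coding"
    activity: str      # e.g. "prescribing", "speech", "key-press"
    start_ms: int      # time stamp relative to a shared reference clock
    end_ms: int        # equal to start_ms for instantaneous events

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms

# Example: a prescribing episode coded from the video channel.
event = ConsultationEvent("video-coding", "prescribing", 412_300, 471_850)
print(event.duration_ms)  # -> 59550
```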
We explored using 3, 4, and 5 channels of video, mixed onto a single screen, as well as a 4-channel version in which clicking on a window enlarged it to full screen (see figure below). The channels added after the 3-channel stage were cameras focused on the patient's upper body and on the clinician's face. We showed example consultations to experienced educationalists and academics accustomed to assessing video consultations, and we conducted semi-structured interviews to elicit their opinions [11].
The multichannel video output, combining recordings of the clinical computer system screen and 3 views of the consultation
We also needed to identify low-cost methods of filming the consultation, ideally using unobtrusive tools, which recorded sound and video with a digital time signal so that precise synchronization was possible [12].
Capturing and Coding Consultation Activity
We needed to be able to code interactions in the consultation so that we could readily navigate to a particular activity (eg, prescribing) and also identify its duration. We selected flexible software called "ObsWin" (Antam Ltd, London, UK) to do this [13] (see figure below). We conducted reliability tests of our manual coding method using multiple observers coding simulated blood pressure management follow-up consultations, with the intraclass correlation coefficient as an index of reliability [14]. Subsequently, we compared the manual coding time for prescribing activities with frame-by-frame analysis of the video to further assess the reliability of our approach.
Observational data capture using ObsWin: rating interface and outputs with summary statistics
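To illustrate the reliability calculation, the following minimal sketch computes an intraclass correlation coefficient from a raters-by-segments matrix of coded durations. It assumes the ICC(2,1) form (two-way random effects, absolute agreement), which is one common choice; the durations are invented for illustration only.

```python
# A minimal sketch of ICC(2,1) computed from an n_targets x k_raters matrix
# of coded durations. Data values are invented for illustration.
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ratings: n_targets x k_raters matrix (e.g. coded durations in s)."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between targets
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between raters
    sse = (np.sum((ratings - grand) ** 2)
           - k * np.sum((row_means - grand) ** 2)
           - n * np.sum((col_means - grand) ** 2))
    mse = sse / ((n - 1) * (k - 1))                        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Three observers coding the duration (s) of the same five activities.
durations = np.array([[30, 31, 29],
                      [45, 44, 47],
                      [12, 15, 13],
                      [60, 58, 61],
                      [25, 26, 24]])
print(f"ICC(2,1) = {icc_2_1(durations):.3f}")
```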
Wherever possible, we set out to automate the time stamps for the start and end of activities in the consultation. We developed a User Action Recording (UAR) application to measure the precise time stamp of keyboard use (each key depression is recorded and time stamped), as well as all mouse clicks and coordinates. We also produced a Voice Activity Recorder (VAR), which detects and time stamps the start and end of speech (see figure below).
Time-stamped consultation transcript creation using VAR
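As an illustration of UAR-style capture, the following minimal sketch time stamps every key press and mouse click against a shared reference clock. It uses the pynput library as an assumed stand-in; the original UAR was a bespoke application whose implementation is not described here.

```python
# A minimal sketch of a User Action Recording (UAR)-style logger: every key
# press and mouse click is written out with a millisecond time stamp.
# pynput is an assumed stand-in, not the authors' implementation.
import time
from pynput import keyboard, mouse

T0 = time.monotonic()  # shared reference clock for synchronization

def stamp() -> int:
    """Milliseconds since recording started."""
    return int((time.monotonic() - T0) * 1000)

def on_press(key):
    print(f"{stamp()}\tKEY\t{key}")

def on_click(x, y, button, pressed):
    if pressed:  # log the down-stroke only
        print(f"{stamp()}\tCLICK\t{button}\t({x},{y})")

# Each listener runs in its own thread; stop the script with Ctrl+C.
with keyboard.Listener(on_press=on_press) as kl, \
     mouse.Listener(on_click=on_click) as ml:
    kl.join()
```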
Automated Capture of Body Language
We automated the capture of body language to interpret nonverbal interactions and the direction of gaze to infer eye contact between clinician and patient. We experimented with Algol, an experimental pattern recognition software (PRS) package not released as a commercial product (Main Highway Services, Winchester, UK), exploring the correlation between movements detected with the software and manually detected activity [15] (see figure below). We also explored the possibility of obtaining software that measured the direction of gaze.
Measurement of nonverbal interactions using PRS: patient's head nodding and doctor's keyboard use
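The kind of movement detection the PRS performed can be approximated by frame differencing. The following minimal sketch, using OpenCV as an assumed stand-in for Algol, time stamps frames in which movement within a region of interest (eg, around the patient's head) exceeds a threshold; the file name, region, and threshold are illustrative assumptions.

```python
# A minimal sketch of movement detection by frame differencing, time
# stamping frames where change in a region of interest exceeds a threshold.
# File name, ROI, and threshold are illustrative assumptions.
import cv2

cap = cv2.VideoCapture("consultation_patient_view.mp4")
ok, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
x, y, w, h = 200, 50, 160, 160          # ROI roughly around the head

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray[y:y+h, x:x+w], prev[y:y+h, x:x+w])
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(mask) > 0.05 * w * h:   # >5% of ROI changed
        t_ms = int(cap.get(cv2.CAP_PROP_POS_MSEC))
        print(f"{t_ms}\tMOVEMENT\tpatient-head")
    prev = gray
cap.release()
```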
Aggregation and Navigation Application
We needed to aggregate the output from multiple data collection systems (see figure below) into a single application that would be readily navigable. It needed to be able to flexibly load any number of input files and produce outputs that could be readily utilized in other applications. Our unsuccessful effort to identify an appropriate proprietary application resulted in the in-house development of the Log Files Aggregation (LFA) application [16].
Time-stamped log files created by three different consultation activity observation methods
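The core of LFA-style aggregation is a chronological merge of independently produced log files. The following minimal sketch merges any number of tab-separated, time-stamped logs into a single ordered stream, assuming each capture tool writes its events in time order; the file names and line format are illustrative assumptions.

```python
# A minimal sketch of Log Files Aggregation (LFA)-style merging: any number
# of tab-separated, time-stamped log files (one event per line, milliseconds
# first) are combined into one chronologically ordered stream. It assumes
# each input file is already in time order.
import csv
import heapq

def read_log(path):
    """Yield (time_ms, source_path, fields) tuples from one log file."""
    with open(path, newline="") as f:
        for fields in csv.reader(f, delimiter="\t"):
            yield int(fields[0]), path, fields[1:]

logs = ["uar.log", "var.log", "video_coding.log"]  # illustrative names

# heapq.merge keeps the combined stream sorted without loading whole files.
with open("aggregated.log", "w") as out:
    for t_ms, source, fields in heapq.merge(*(read_log(p) for p in logs)):
        out.write(f"{t_ms}\t{source}\t" + "\t".join(fields) + "\n")
```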
Output That Could Facilitate Better Clinical Computer System Development
We wanted to produce an output that would be readily interpretable by software engineers, so that our findings had utility beyond the health care community. We specified our aggregation tool to export the combined log files in XML (Extensible Markup Language) format, so that they can be readily imported and interpreted by other applications. Process models of consultation tasks created using UML (Unified Modeling Language), a standard modeling and specification notation widely used in software engineering, were chosen as our main mechanism for representing the use and impact of clinical system features within the consultation.
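The following minimal sketch shows how aggregated events might be exported as XML using Python's standard library; the element and attribute names are illustrative assumptions rather than the schema the LFA application actually produced.

```python
# A minimal sketch of the XML export: each aggregated, time-stamped event is
# written as an element that other applications can import. Element and
# attribute names are illustrative assumptions.
import xml.etree.ElementTree as ET

events = [
    ("UAR", "key-press", 412300, 412300),
    ("video-coding", "prescribing", 412300, 471850),
]

root = ET.Element("consultation")
for source, activity, start_ms, end_ms in events:
    ET.SubElement(root, "event", source=source, activity=activity,
                  start_ms=str(start_ms), end_ms=str(end_ms))

ET.ElementTree(root).write("consultation.xml",
                           encoding="utf-8", xml_declaration=True)
```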
Pilot Recording of Consultations
We developed our method using simulated consultations between clinicians and actor patients within a simulated clinical environment. We initially developed the technique using standard consultations (eg, follow-up blood pressure checks [14]) and then extended it to a wider range of clinical problems.
We needed to know whether our technique was practical to set up within a standard consulting room and could cope with background noise, variable lighting (including window position), and differing room sizes. We next tested our technique using actor patients in GP surgery premises. We found that audio recording from 1 camera was satisfactory; modern cameras coped well with variations in lighting, and 2 people could set up the cameras and install the other data-capture methods in less than 20 minutes. We found that the cameras and other data-capture tools could capture more than an hour's data, but that it was prudent to remove screen-capture and video data during a pause between consultations after 45 minutes.
We next developed a protocol that included our technical method, obtaining proper consent from patients, and securing the data. We wanted to obtain pilot data from the 4 most used brands of GP electronic patient record (EPR) systems, so that we could make comparisons. These 4 brands were: (1) EMIS LV, the longest established and, at the time of the study, the most used system; (2) EMIS PCS, a more modern version from the same manufacturer; (3) INPS Vision; and (4) iSoft Synergy. EMIS LV is largely character user interface (CHUI) driven, whereas the other 3 have graphical user interfaces (GUIs).
In our pilot analysis, we only included coding carried out using the picking list or other routine coding tools. We did not include data entry forms or templates that could facilitate more rapid data entry. The 4 GPs we filmed had used their current computer system for at least 3 years and had not routinely consulted with paper records for at least this period.
We planned to compare the time taken to carry out clinical coding, prescribing, and other routine tasks in the clinical consultation. We did not expect data from a small pilot to have a normal distribution, for 2 reasons: (1) we had a small sample, and (2) we expected a skewed distribution, because these tasks sometimes take a long time but always take at least a minimum time. We used box-and-whisker plots to visually compare actions that were frequently recorded. We also used nonparametric tests (Mann-Whitney U test) to compare EMIS LV (then the most used brand of GP EPR system) with the other systems, and the Kruskal-Wallis test to explore any statistically significant difference in mean ranking. We used SPSS version 15 to carry out these analyses.
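The same comparisons can be reproduced outside SPSS. The following minimal sketch runs a Mann-Whitney U test of EMIS LV against the pooled GUI systems and a Kruskal-Wallis test across all 4 brands using scipy.stats; the coding times are invented for illustration only.

```python
# A minimal sketch of the pilot comparison using scipy.stats rather than
# SPSS. The coding times (seconds) are invented for illustration.
from scipy import stats

coding_times = {                        # seconds to complete a coding task
    "EMIS LV":       [8.1, 9.4, 7.7, 10.2, 8.8],
    "EMIS PCS":      [11.3, 12.0, 10.5, 13.1, 11.8],
    "INPS Vision":   [10.9, 12.4, 11.6, 13.0, 12.2],
    "iSoft Synergy": [12.5, 11.1, 13.4, 12.9, 11.7],
}

# Mann-Whitney U: EMIS LV vs all other (GUI) systems pooled.
gui = [t for brand, ts in coding_times.items()
       if brand != "EMIS LV" for t in ts]
u, p = stats.mannwhitneyu(coding_times["EMIS LV"], gui,
                          alternative="two-sided")
print(f"Mann-Whitney U = {u:.1f}, p = {p:.4f}")

# Kruskal-Wallis across all four brands.
h, p = stats.kruskal(*coding_times.values())
print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")
```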
We obtained ethical approval for the pilot recording of live consultations via the National Health Service's Central Office for Research Ethics Committees (COREC). The protocol included making proper provision for the secure transport and storage of media and limiting access to the data.
We used a 3-step process to obtain consent from patients to be video recorded. First, the video sessions were marked as such in participating practices, so that patients who booked into these sessions knew their consultation would be video recorded by 3 cameras as part of a research project. Second, they signed consent at the start of the consultation and were told that, if they did not want the video used after the consultation, they were free to say so. Finally, they and the clinician signed consent after the consultation, stating that they remained willing for the consultation data to be used in research.