Design and setting
The WSD telehealth trial is a pragmatic, cluster randomised controlled trial of telehealth (n=3230). This paper reports on the nested WSD telehealth questionnaire study, which was designed to include 1650 participants. Between May 2008 and December 2009, we recruited participants for the WSD telehealth trial across three sociodemographically distinct regions in England (rural Cornwall, rural and urban Kent, and urban Newham in London) comprising four primary care trusts. Participants were also invited to take part in the WSD telehealth questionnaire study, a supplementary investigation of patient reported outcomes. We assessed participants at four and 12 months after recruitment (the last 12 month assessment occurred in December 2010). Figure 1 shows a CONSORT diagram of general practice and participant flow into the parent trial and the questionnaire study (n=1573). Table 1 compares sample characteristics at baseline across the parent trial and the nested questionnaire study.
Fig 1 CONSORT diagram of the WSD telehealth trial and WSD telehealth questionnaire study. *Recruitment into the questionnaire study was implemented at the patient level, but descriptive data at the cluster level (general practice) are presented for comparison
Table 1 Sample characteristics at baseline. Data are no (%) unless stated otherwise
Cluster level recruitment and randomisation
Allocation was conducted at the cluster (general practice) level. All 365 general practices in the four primary care trusts were invited to participate. To maximise participation, the evaluation adopted a pragmatic approach: each practice provided intervention participants for one technology (telehealth or telecare) in one trial and control participants for the other technology (telecare or telehealth) in the other trial, ensuring equity of access to advanced assistive technology at the level of the practice population.18
Consenting practices were allocated to the intervention and control groups by the trial statistician (HD), using a centrally administered minimisation algorithm that ensured comparability across trial arms in terms of practice size; deprivation; proportion of patients from non-white ethnic groups; prevalence of diabetes, chronic obstructive pulmonary disease, and heart failure; and WSD site.
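The allocation logic of a minimisation algorithm can be illustrated with a minimal sketch. This is not the trial's actual algorithm: it assumes every balancing factor has been banded into categorical levels, weights all factors equally, and breaks ties at random. The function name, the factor names, and the example practices are hypothetical.

```python
import random
from collections import defaultdict

def minimise(practices, factors, arms=("intervention", "control"), seed=0):
    """Allocate practices sequentially to the arm that minimises marginal imbalance.

    practices: list of dicts with an "id" key plus one key per balancing factor.
    factors:   names of the (categorical) balancing factors. Hypothetical names.
    """
    rng = random.Random(seed)
    # counts[arm][(factor, level)] = practices already in `arm` with that level
    counts = {arm: defaultdict(int) for arm in arms}
    allocation = {}
    for practice in practices:
        scores = {}
        for arm in arms:
            # Imbalance summed over factors if this practice were placed in `arm`.
            total = 0
            for factor in factors:
                key = (factor, practice[factor])
                hypothetical = [counts[a][key] + (1 if a == arm else 0) for a in arms]
                total += max(hypothetical) - min(hypothetical)
            scores[arm] = total
        best = min(scores.values())
        # Ties are broken at random (an assumption of this sketch).
        chosen = rng.choice([a for a in arms if scores[a] == best])
        for factor in factors:
            counts[chosen][(factor, practice[factor])] += 1
        allocation[practice["id"]] = chosen
    return allocation

# Illustrative, made-up practices banded on two factors.
example = [
    {"id": 1, "site": "Cornwall", "size": "small"},
    {"id": 2, "site": "Cornwall", "size": "large"},
    {"id": 3, "site": "Kent", "size": "small"},
    {"id": 4, "site": "Newham", "size": "large"},
]
print(minimise(example, ["site", "size"]))
```

With a single balancing factor, this scheme keeps the two arms within one practice of each other at every level of that factor; with several factors, balance is approximate rather than guaranteed.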
There was no blinding for practices, participants, or assessors, although most measures in the WSD telehealth and telecare questionnaire studies were self reported. Participants allocated to the control arm were informed that they would be offered the appropriate technology at the end of the 12 month trial period, subject to a further needs assessment.
Participant level recruitment
In participating practices, patients with chronic obstructive pulmonary disease, diabetes, or heart failure were deemed eligible on the basis of one of the following:
- Inclusion on the relevant Quality and Outcomes Framework register in primary care
- A confirmed medical diagnosis in primary or secondary care medical records, as indicated by general practice Read codes or ICD-10 (international classification of diseases, 10th revision) codes
- Confirmation of disease status by a local clinician (such as general practitioner or community matron) or hospital consultant.
Patients were not excluded on the basis of additional physical comorbidities. However, participants were required to have a telephone landline for broadband internet connection (at all WSD sites), and a digital television (in Newham). Other financial costs (including phone calls and data transmission to the monitoring centres) were paid for by the local WSD project teams. Since the telehealth system used in the trial required participants to read and respond to textual information presented via a base unit or television screen, sufficient English language literacy was required, as determined by the local WSD project team. Cognitive impairment was not an exclusion criterion for the WSD telehealth trial, provided that an informal carer was available to assist with use of the telehealth system. However, cognitive impairment was an exclusion criterion for the questionnaire study because we aimed to collect self reported data without third party influence. Participants with physical impairments could receive practical assistance with completing the questionnaire battery from an independent trained researcher.
171 potentially eligible patients in participating practices were contacted about the study. To meet ethical obligations, these patients were initially sent a data sharing consent form and asked to complete it if they were interested in the study and willing to allow their medical and social care data to be shared with the WSD research team. Follow-up letters and telephone calls encouraged responses.
Once data sharing consent was received, eligibility was confirmed by the local WSD project team, and eligible patients were contacted to arrange a home visit to discuss the research in more detail. At this visit, the suitability of the participant’s home infrastructure was checked, and a participant information sheet and consent form were provided. Participants provided written consent and indicated whether they would be willing to take part in the supplementary questionnaire study. Those willing were contacted by trained interviewers to arrange a baseline interview in the participant’s home. At baseline interview, patients received a second information sheet relating specifically to the questionnaire study, and signed a second consent form for this part of the evaluation.
To minimise participant burden and create mutually exclusive subgroups for subsequent disease specific analyses (not reported here), participants with at least two of the three long term conditions were allocated to a single index condition using simple randomisation. Based on a prospective power calculation (see below), we aimed to recruit 1650 participants into the questionnaire study with an approximately even split between the three long term conditions (fig 2). Recruitment ended in December 2009.
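The allocation of a single index condition can be sketched as follows. This is an illustration only, assuming that "simple randomisation" means an equal-probability draw among the conditions a participant has; the function name and the condition labels are hypothetical.

```python
import random

def assign_index_condition(diagnoses, rng):
    """Return one index condition for a participant.

    diagnoses: collection of the long term conditions the participant has.
    rng:       a random.Random instance, so that draws are reproducible.
    """
    conditions = sorted(diagnoses)  # sort so results do not depend on input order
    if len(conditions) == 1:
        return conditions[0]
    # Equal-probability draw among the participant's conditions (assumption).
    return rng.choice(conditions)

rng = random.Random(2008)
print(assign_index_condition({"COPD", "diabetes"}, rng))
```

Each participant then contributes to exactly one disease specific subgroup, which is what makes the subgroups mutually exclusive.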
Fig 2 Composition of baseline sample, by diagnosis of long term condition. COPD=chronic obstructive pulmonary disease
Telehealth treatment (intervention arm)
To facilitate comparisons between clinical studies, four classes (or “generations”) of telehealth have recently been proposed on the basis of the type of data transfer, decision making ability of the care provider reviewing the data, and level of integration of all systems with the patient’s primary care structure.40
First generation telehealth comprises non-reactive data collection and analysis systems. Measurements of interest are collected and transferred to the care provider asynchronously (that is, by store and forward protocols). There is no full telemedical system, and the provider cannot respond immediately to patient data. Second generation systems have a non-immediate analytical or decision making structure. Data transfer is synchronous—that is, there is some real time processing of patient data using, for example, automated algorithms to interpret the data. Care providers can recognise important changes in essential measurements, but delays can occur if the systems are only active during office hours.
Third generation systems provide constant analytical and decision making support. Monitoring centres are physician led, staffed by specialist nurses, and have full therapeutic authority 24 h per day, seven days per week. Fourth generation systems are an extension of third generation systems, comprising invasive (such as with surgical implantation) and non-invasive telemedical devices for data collection. The complexity of incoming information and subsequent therapeutic decisions requires the continuous presence of a physician.
WSD sites delivered variations of telehealth, but all systems focused on monitoring vital signs, symptoms, and self management behaviour. They provided general and disease specific health education, with non-immediate review by specialist nurses and other care providers. This configuration most closely approximates second generation telehealth.
Web appendix 2 and figure 3 describe the WSD telehealth intervention. Web figure 1 shows the provision of peripheral telehealth devices to intervention participants according to diagnosis of long term condition in each WSD site. Sites differed in the number of peripheral devices installed per participant, with a mode (across all long term conditions) of two in Cornwall and three in Kent and Newham. Web figure 2 shows the early removal of telehealth from participants for reasons other than death, by site. Differences in functionality of the telehealth equipment supplied, type and number of peripheral devices provided, transfer of data to the monitoring centres, triage or risk stratification, and response pathways reflected variations that would occur if telehealth was implemented across the UK’s entire health system.
Fig 3 Overview of WSD telehealth intervention. Numbers indicate stages described in web appendix 2
Usual care treatment (control arm)
Participants allocated to the control arm continued to receive their existing healthcare and social services, in line with local protocols, for the 12 months of the trial. Across the three WSD sites, healthcare was provided by a combination of community matrons, district nurses, specialist nurses, general practitioners, and hospital services based on clinical need. Patients had pre-established, tailored care plans that included routine assessments at a frequency appropriate for their disease severity, typically ranging from once per week to once or twice per year. Control participants had no telehealth or telecare equipment installed in their homes for the duration of the study. A Lifeline pendant (a personal alarm) plus a smoke alarm linked to a monitoring centre were not, on their own, sufficient to classify as telecare for current purposes. We planned to reassess control participants at the end of the trial and, if still eligible, offer them telehealth.
All participants (intervention and control) were beneficiaries of the Whole Systems Redesign, which was a precondition of sites’ participation in the trial. Putative benefits for patients included a better understanding of their condition and how to look after themselves through the development of self care behaviours and the continued support of services such as community matrons (web appendix 1).
Trial assessment procedures
Outcomes were assessed at the level of the patient. At baseline, questions on outcome measures were answered by participants with a trained researcher on hand to explain or clarify the meaning of particular questions or assist with completing the questionnaire if participants were physically unable to do so. After the baseline interview, two further assessments were conducted. A short term assessment was conducted at about four months (median duration 127 days (interquartile range 37); 132 days (40) for control group, 126 days (35) for intervention group), and a long term assessment at around 12 months (347 days (49); 358 days (48), 342 days (47)). Duration at both assessments was similar across trial arms.
The questionnaire battery was the same at baseline and at short term; long term assessment included two additional scales measuring functional status56 and impact of illness57 (not reported here). At short term assessment, the survey battery was primarily administered as a postal survey with one reminder letter for non-responders; some participants also received telephone reminders. At long term assessment, the survey was posted to participants and non-responders were contacted to arrange a home interview with a trained researcher, in line with the baseline protocol. Participants who did not complete a questionnaire at short term were still invited to complete a questionnaire at long term. However, participants who withdrew from the trial, including intervention participants who asked for the telehealth equipment to be removed before the end of the 12 month trial period, were not sent further questionnaires after their withdrawal date.
Patient reported outcomes
Findings in the current report are based on instruments assessing different domains of generic health related QoL (SF-12, EQ-5D), anxiety (Brief State-Trait Anxiety Inventory (STAI)), and depressive symptoms (Center for Epidemiologic Studies Depression Scale (CESD-10)).
The SF-12 is a 12 item measure of general health status and health related QoL that uses norm based scoring for the general population in the United States in 1998. The instrument was scored in two subscales, the physical component summary score and the mental component summary score; higher scores represent better health related QoL. The SF-12 has shown good test-retest reliability, validity, and responsiveness, and is recommended for patients with heart failure.59
The EQ-5D assesses five domains of generic health related QoL (mobility, self care, usual activities, pain and discomfort, anxiety and depression) and can generate either a health state (of 243 different states) or a single summary score (higher scores reflect better health related QoL). The EQ-5D has shown good validity and responsiveness and has been recommended for patients with diabetes61 and, more cautiously, for patients with chronic obstructive pulmonary disease62 and heart failure.59 For current purposes, the summary score was used.
The Brief STAI63 is a six item measure of state anxiety that has shown acceptable reliability and validity.63 It is widely used in clinical research, notably in studies of patients with diabetes.65 The state version, rather than the trait version, of the Brief STAI was used (higher scores reflect greater state anxiety).
The CESD-10 is a 10 item measure of depressive symptoms covering cognitive, emotional, and behavioural domains. It has acceptable validity and reliability,66 and sensitivity and specificity.67 The original 20 item version has been used widely with clinical populations, including chronic obstructive pulmonary disease68 and heart failure,69 although both versions of the scale include items that confound symptoms of physical illness with symptoms of depression (for example, “I felt that everything I did was an effort”; “My sleep was restless”).70 Scores range from 0 to 30, with higher scores indicating more depressive symptoms.
Minimal clinically important differences (MCIDs) have not been established for these patient reported outcomes. To evaluate the magnitude of any treatment effect, we regarded a trial defined MCID as an effect size equivalent to Cohen’s d=0.3. This magnitude represents a “small” effect in the behavioural sciences.71
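To make the trial defined threshold concrete, the sketch below computes Cohen's d with a pooled standard deviation and compares it with 0.3. It is an illustration under assumptions: the function names are hypothetical, the scores are made up, and the trial's reported effects come from adjusted multilevel models rather than this raw two group formula.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def meets_trial_mcid(d, threshold=0.3):
    """True if the absolute effect size reaches the trial defined MCID (d=0.3)."""
    return abs(d) >= threshold

# Made-up scores, for illustration only.
telehealth_scores = [2, 4, 6]
usual_care_scores = [1, 3, 5]
d = cohens_d(telehealth_scores, usual_care_scores)
print(d, meets_trial_mcid(d))
```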
Covariates in the analyses
Data were collected on a range of sociodemographic and trial related characteristics that could plausibly be related to the study outcomes. These variables were used as covariates in the main analyses. Date of birth and sex were extracted from general practice records. Ethnicity was assessed by self report, using 16 response categories based on standard UK categories from the Office for National Statistics72; missing responses were subsequently completed using data from medical records, where available. Education was assessed by self report using five response categories ranging from no formal education to graduate or professional level. We used participants’ postcodes to allocate an index of multiple deprivation score.73 Comorbidity was assessed by a count of diagnosed conditions in hospital episode statistics over the three years before the trial began. The WSD project teams provided data for participants’ WSD site; the presence or absence of a diagnosis of chronic obstructive pulmonary disease, diabetes, and heart failure; and the number and type of telehealth peripheral devices installed. The WSD evaluation team held data for participants’ allocation (to telehealth or usual care) and calculated the duration of exposure to telehealth (in days) at the time each assessment questionnaire was completed. Owing to the variability in telehealth duration for intervention participants at short term and long term assessments, this variable was included as a covariate.
For the telehealth questionnaire study, a power calculation was conducted on the basis of detecting a small effect size (Cohen’s d=0.3), allowing for an intracluster correlation coefficient of 0.05, power of 80%, and P<0.05. This calculation indicated that about 500 patients would be required to allow sufficient power to detect this small difference, ranging from 420 participants (five from each of 84 practices) to 520 participants (10 from each of 52 practices). These numbers were inflated by 10% to allow for the maximum possible increase in sample size due to variable cluster size.74 The required sample size thus increased to 550. For sufficient power in our secondary subgroup analyses (not reported here), we aimed to recruit 550 patients per long term condition, or 1650 overall. All analyses reported here exceed the required sample size (550) and are therefore adequately powered.
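The figures above can be reproduced with a standard calculation, sketched below. The paper does not state its formula, so this is an assumption: a two sided, two arm comparison of means using the normal approximation, inflated by the design effect 1 + (m - 1) × ICC for clusters of size m, with the number of practices rounded up within each arm. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def cluster_sample_size(d, icc, cluster_size, alpha=0.05, power=0.80):
    """Return (total practices, total participants) for a two arm cluster trial."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two sided significance level
    z_beta = z.inv_cdf(power)
    # Participants per arm if individuals were randomised independently.
    n_per_arm = 2 * (z_alpha + z_beta) ** 2 / d ** 2
    # Inflate for clustering within practices.
    design_effect = 1 + (cluster_size - 1) * icc
    clusters_per_arm = math.ceil(n_per_arm * design_effect / cluster_size)
    total_clusters = 2 * clusters_per_arm
    return total_clusters, total_clusters * cluster_size

print(cluster_sample_size(0.3, 0.05, cluster_size=5))   # (84, 420)
print(cluster_sample_size(0.3, 0.05, cluster_size=10))  # (52, 520)
```

Under these assumptions the two scenarios bracket the "about 500" figure, and a 10% inflation of 500 gives the stated target of 550.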
Missing self reported data could occur at the questionnaire level or at the item or scale level. A participant who completes the questionnaire battery at baseline could fail to complete the questionnaire at short term or long term. Alternatively, a participant who largely completes a questionnaire could nevertheless fail to provide responses to certain items or may miss out whole scales within the battery.
For the outcomes reported, missing values at the questionnaire level were not imputed. We imputed missing values at the item or scale level using two methods. If a missing value belonged to a scale and at least 50% of responses were available for the scale (for a particular participant), we used the series mean for that scale (for that participant) to fill in missing values. If a missing value for an item did not belong to a scale (for example, index of multiple deprivation score) or if fewer than 50% of scale items were completed, missing values (either for items or scale totals) were multiply imputed (m=10), on the basis of available data from several scales and items across all participants. We did multiple imputation using the Markov chain Monte Carlo function (SPSS).
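The item level rule can be sketched as a small function. This is one reading of the text, not the authors' code: "series mean" is taken to mean the participant's own mean across the items they answered on that scale, and the function name and return convention are hypothetical. Scales that fail the 50% rule are only flagged here; the multiple imputation step itself is not implemented.

```python
def impute_scale_items(responses, min_fraction=0.5):
    """Apply the 50% rule to one participant's responses on one scale.

    responses: list of item scores, with None marking a missing item.
    Returns (responses_after_rule, needs_multiple_imputation).
    """
    answered = [r for r in responses if r is not None]
    if len(answered) == len(responses):
        return list(responses), False          # nothing missing
    if answered and len(answered) / len(responses) >= min_fraction:
        # At least 50% answered: fill gaps with the participant's scale mean.
        person_mean = sum(answered) / len(answered)
        return [person_mean if r is None else r for r in responses], False
    # Fewer than 50% answered: leave as is and flag for multiple imputation.
    return list(responses), True

print(impute_scale_items([1, None, 3, 2]))        # ([1, 2.0, 3, 2], False)
print(impute_scale_items([None, None, None, 2]))  # ([None, None, None, 2], True)
```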
We repeated analyses on each of the ten imputed datasets, and thereafter used standard multiple imputation procedures to combine the multiple scalar and multivariate estimates75 with SPSS (version 19) and NORM.78
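For a single scalar estimate, the usual combining procedure is Rubin's rules, sketched below. The paper does not name the rules it used, so treating them as Rubin's rules is an assumption; the function name and the example numbers are hypothetical. The sketch omits the degrees of freedom adjustment and the multivariate case.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, standard_errors):
    """Combine one scalar estimate across m imputed datasets.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in standard_errors])  # average sampling variance
    between = variance(estimates)                       # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Made-up estimates from three imputed datasets, for illustration only.
print(pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```

The between imputation term is what carries the extra uncertainty caused by the missing values into the pooled standard error.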
We explored the influence of missing data at the questionnaire level by conducting complete case analyses (participants with data for all variables at all time points) and available case analyses (participants with data for all variables at baseline and at least one other time point). Depending on the reasons for missingness, both these approaches can generate biased results, but they are used here as sensitivity analyses to assess the robustness of the findings.
General practices were the unit of randomisation and were directly involved in the delivery of care to all participants, which could result in participants within practices being more similar than participants between practices. Causes of similarity within practices include pre-existing case mix differences between practice populations, and both general and specific practice effects (for example, factors that facilitate or inhibit access, general practitioner case load, the extent to which care is centred around the patient). To account for practice differences, multilevel modelling was used with observations (at different time points) nested within participants, and participants nested within practices. The model included random intercepts and random slopes at the practice level.
Repeated measures for each outcome over the trial period were analysed with the linear mixed model procedures in SPSS. We used restricted maximum likelihood to estimate model parameters, with an ante-dependent (first order) variance-covariance matrix structure. A separate analysis was conducted for each of the five outcome variables, and the main effect of trial arm (telehealth v usual care) was estimated to answer the principal research question. We estimated the main effect of time to determine whether the outcome measures were different in the short and long term. The interaction between trial arm and time (trial arm×time) was also estimated to determine whether the trial arm had differential effects at short term and long term.
In each model, the baseline measure of a respective outcome variable was treated as a covariate, with the measures at short term and long term treated as the outcome. Covariates included in the model adjusted for baseline distributional differences between trial arms on sociodemographic and trial related variables that could be related to the outcomes. Sociodemographic covariates included age; sex; ethnicity (white or non-white); education (ordinal, five levels); deprivation (continuous data); diagnosis of chronic obstructive pulmonary disease, diabetes, or heart failure; and total number of comorbidities (ordinal, nine levels). Trial related covariates included WSD site, number of peripheral telehealth devices installed (ordinal, five levels), and duration of exposure to telehealth (in days) at short term and long term assessment (continuous data). For all parameter tests, the α level was set to 0.05.
We did intention to treat analyses to assess treatment effectiveness, as the most appropriate strategy for analysing pragmatic randomised controlled trials. However, this approach is conservative and risks underestimating treatment effects.79 Complex healthcare interventions administered as part of a pragmatic trial risk being administered suboptimally, compared with being administered in routine care.
Obtaining an estimate of treatment efficacy would require a heavily resourced explanatory randomised controlled trial. However, an approximate evaluation of efficacy in pragmatic randomised controlled trials can be achieved via per protocol analyses. Thus, we conducted secondary per protocol analyses that analysed patients according to the treatment received rather than the treatment allocated (web appendix 3). Per protocol analyses risk overestimating the potential benefits that would be observed in routine practice. Considering primary (intention to treat) and secondary (per protocol) analyses together can help to disentangle treatment effects from implementation effects. Sensitivity analyses assessed the robustness of the findings to decisions taken at the analysis stage. Primary and secondary analyses were conducted for complete and available case cohorts. Here, data in the results section are taken from the primary analyses unless specified as being from secondary analyses.