|Home | About | Journals | Submit | Contact Us | Français|
Health economic analysis traditionally relies on patient derived questionnaire data, routine datasets, and outcomes data from experimental randomised control trials and other clinical studies, which are generally used as stand-alone datasets. Herein, we outline the potential implications of linking these datasets to give one single joined up data-resource for health economic analysis.
The linkage of individual level data from questionnaires with routinely-captured health care data allows the entire patient journey to be mapped both retrospectively and prospectively. We illustrate this with examples from an Ankylosing Spondylitis (AS) cohort by linking patient reported study dataset with the routinely collected general practitioner (GP) data, inpatient (IP) and outpatient (OP) datasets, and Accident and Emergency department data in Wales. The linked data system allows: (1) retrospective and prospective tracking of patient pathways through multiple healthcare facilities; (2) validation and clarification of patient-reported recall data, complementing the questionnaire/routine data information; (3) obtaining objective measure of the costs of chronic conditions for a longer time horizon, and during the pre-diagnosis period; (4) assessment of health service usage, referral histories, prescribed drugs and co-morbidities; and (5) profiling and stratification of patients relating to disease manifestation, lifestyles, co-morbidities, and associated costs.
Using the GP data system we tracked about 183 AS patients retrospectively and prospectively from the date of questionnaire completion to gather the following information: (a) number of GP events; (b) presence of a GP 'drug' read codes; and (c) the presence of a GP 'diagnostic' read codes. We tracked 236 and 296 AS patients through the OP and IP data systems respectively to count the number of OP visits; and IP admissions and duration. The results are presented under several patient stratification schemes based on disease severity, functions, age, sex, and the onset of disease symptoms.
The linked data system offers unique opportunities for enhanced longitudinal health economic analysis not possible through the use of traditional isolated datasets. Additionally, this data linkage provides important information to improve diagnostic and referral pathways, and thus helps maximise clinical efficiency and efficiency in the use of resources.
Health service research have tended to rely on data generated from randomised controlled trials (RCT), observational studies based on patient derived questionnaire data, and routinely assembled data abstracted from the primary and secondary care patient record - popularly known as routine data . Data generated from these three routes are predominantly used as stand-alone datasets, and alongside non-health data such as demographic and geographical data [1,2]. The health sector analyses are enriched when the patient-level data generated from these different sources are linked. Many countries worldwide already routinely capture health care data that can be used for such purposes. For example, the Scottish Morbidity Linked Dataset which encompasses Scottish Health Survey records, linked to NHS acute and psychiatric hospital records, cancer, and death registrations provides powerful research database http://www.esds.ac.uk/government/shes/; in France, record linkage between a hospital database and the French national mortality database offers new prospects for large prognostic studies based on hospital data ; and in Norway, the Medical Birth Registry of Norway is routinely linked with the Central Population Register, and can be linked with the other central health registers. This paper discusses the potential methodological advantages in the conduct of health economics analyses using patient-derived questionnaire data linked with routinely collected information and secondary care clinical datasets available in Wales, United Kingdom, with examples from a research cohort.
In order to realise the potential of electronically-held routinely collected information to conduct and support health-related research, the Health Information Research Unit (HIRU) at the College of Medicine at Swansea University, as part of the Welsh Assembly Government's commitment to the UK Clinical Research Collaboration (UKCRC), has set up the Secure Anonymised Information Linkage (SAIL) databank [4,5]. The SAIL databank brings together and links a wide range of person-based data. SAIL utilises a split-file approach to anonymisation to overcome issues of confidentiality and disclosure in health-related data warehousing by creating personal-level unique and encrypted identifiers for merging information from various sources [4,5]. The range of complementary sets of data includes clinical data from rheumatologists, existing routinely collected datasets such as the General Practice (GP) records, outpatient (OP) clinical data, inpatient (IP) episodes, accident and emergency (A&E) department, pathology data, NHS administrative register, breast and cervical cancer screening data, all Wales injury surveillance system, all Wales perinatal survey, congenital anomaly register and information service, birth and death data from the Office for National Statistics, and social services databases.
HIRU uses the MACRAL (Matching Algorithm for Consistent Results in Anonymised Linkage) algorithm to create encrypted Anonymised Linking Field (ALF) for each individual . The ALFs are mainly created based on the patient's NHS number; and if the NHS number is absent in a dataset, a mixture of other identifying variables like forename, surname, gender, postcode of residence, and date of birth are used for probabilistic matching, while maintaining complete anonymity for the end users . This linkage allows us to follow the patient pathway through the NHS system both retrospectively and prospectively from a reference date (e.g. questionnaire completion date). This system also allows linkage of data collected through patient questionnaires with other routinely collected datasets in the SAIL system.
As part of the Medical Research Council (MRC) Patient Research Cohort Initiative, a cohort of people with ankylosing spondylitis (AS), i.e. the Welsh population-based ankylosing spondylitis (PAS) cohort, has been developed using data collected from patient completed questionnaires linked with routine data . The study aims to recruit 1000 AS patients living in Wales and currently about 500+ AS patients are participating. This study has ethical approval from the London Multi-centre Research Ethics committee and the written consent of participants was obtained according to the Declaration of Helsinki. For the PAS cohort, the data collected from patients with a diagnosis of AS can be linked to other routinely collected datasets using the SAIL system. To highlight the potential benefits of using the linked routine data, this paper uses information on healthcare visits as reported by the AS patients through questionnaires and explores patient pathways in terms of actual events obtained from the linked GP, OP, IP, and A&E datasets in the SAIL databank.
The paper attempts to demonstrate the strength of data linkage in extracting complementary information from routine sources that are beyond the scope and time horizon of the study data, and does not intend to test hypotheses with regards to the data quality pertaining to the specific variable of interest or the individual datasets. The potential benefits of using linked data are discussed below.
With data spanning multiple years and the ability to link records across several datasets, it is possible using SAIL to track the healthcare utilisation history of patients in receipt of some form of intervention for a given condition across multiple healthcare sectors both before and after their index/reference healthcare event. Therefore, the SAIL data linkage system allows tracking of the patient pathways, both retrospectively and prospectively. Linkage with GP data system provides information about patients' primary care events going back many years including; previous diagnoses, referrals, presenting symptoms, investigation results and previous medications. This dataset can also be used to follow the patient at every visit to the GP and therefore record the development of associated conditions and use of co-medications. Linkage with IP data will record all hospital visits, surgery and hospital treatment. Linkage with the mortality datasets will ensure the dataset remains relevant and can examine survival of included patients. Linkage with A&E datasets will give information on emergency visits.
The use of linked routine data allows cross-checking of patient-reported recall data with actual health care events at the personal level. The inherent limitations (or strengths) of the data quality pertaining to the survey questionnaires under the recall method can be flagged; and an assessment of the generalisability of the patient-reported data can be made. On the other hand, data obtained from routinely collected data systems often require careful interpretation with respect to their quality, validity, timelines, bias, confounding and statistical stability . With the triangulation of datasets in the SAIL system, the validity and reliability of single datasets can be assessed . The triangulation process will at least flag the discrepancies, and we can then have an idea about any quality issue pertaining to both the routine and questionnaire data. However, this paper does not intend to make assertions with regards to the data quality of individual datasets, and views the linking of datasets primarily as a source of extracting complementary information which are beyond the scope and time horizon of traditional study data.
Cost of illness studies are typically subject to a degree of scrutiny with regards to the sources and methods of estimating the quantities and prices, the specification of study perspective, and the identification of the timeframe to which the costs apply [7-9]. The use of linked data enhances the precision of the healthcare use information and the timelines within which the costs incurred; and therefore will help provide an objective estimate of the cost and burden of diseases to the funders, health service (NHS in the United Kingdom), society and the individual at each stage of disease over a prolonged period of time.
In many conditions there is a delay between the onset of symptoms and establishing a diagnosis, during which period the patients still utilise healthcare resources. The linked routine data can provide information about the patients' visits to health care facilities during this symptomatic pre-diagnosis period, when the requirement for diagnostic investigations is often greatest. For example, within the SAIL data system, using the encrypted ALF, we can identify patients from a cohort of any particular disease who were diagnosed during a reference time (as indicated by first appearance of a specific diagnostic read code); and link those with various datasets (e.g. GP data, IP hospital admissions, OP, A&E data etc.) to track their pre-diagnosis visits to healthcare facilities since the date the symptom onset (as reported by the patients or established from the GP or A&E records). This allows one to compare the extent of health service utilisation, and therefore related costs, before and after the symptoms developed. In addition, one can also calculate the costs as a result of delayed diagnosis.
The linked healthcare analysis within SAIL need not be confined to deducing the extent of health service utilisation during the pre- and post-diagnosis illness periods but can additionally be performed to ascertain the size of the direct medical costs associated with the index illness that are incurred across different healthcare sectors. This is possible, for example, with the combined use of the cost figures included in the Trust Financial Return 2 (TFR2) accounts  that incorporate expenditures relating specifically to A&E attendances, IP admissions and OP contacts; and the cross-sector (i.e. primary care, secondary care, IP, OP, A&E etc.) health services utilisation at the individual level obtained from the linked data system. A system such as the SAIL system therefore not only allows the index event for a given condition to be identified but additionally introduces a longitudinal, temporal, dimension to the analysis as each of the healthcare sectors captured within SAIL can be searched for multiple years pre- and post-index healthcare event to determine the extent of health service utilisation and direct medical costs, made possible by the inclusion of an ALF within all of the SAIL datasets. This provides a more objective estimate of the actual costs of chronic conditions and any interventions.
A retrospective analysis of the patient's healthcare history can identify the types of referrals to healthcare services made at different points in time, thus, giving an assessment of health service usage and recommendations for improving patient care pathways. In particular, the linked GP data would provide important information to improve diagnostic and referral pathways, and thus maximise clinical efficiency and efficiency in the use of resources. The temporal aspects of the linked data sets also help conducting event history analysis, survival analysis and other relevant statistical and econometric models.
The linked SAIL data includes diverse sets of information, which enables profiling and stratification of patients relating to disease manifestations and severity, lifestyle, co-morbidities, and associated costs. Additionally, given that many chronic conditions have heterogeneous manifestations with a variable course and unpredictable episodes of exacerbation, the analysis can be carried out under several person stratification schemes based on severity of disease, various demographic attributes, and socio-economic conditions. This stratification will facilitate early targeting of interventions to patients at highest risk, thereby improving the cost-effectiveness ratio of these interventions.
Here we present an example of one AS patient's health service usage history by tracking the healthcare events through the linked datasets and comparing this with self-reported data. To preserve complete anonymity, the actual dates are modified by replacing with fictitious dates. As part of the PAS cohort study, the patient completed a questionnaire during the first week of November 2009, in which s/he was asked to recall the number of visits to the GP, OP, IP, A&E, and to various health professionals during the three months before the questionnaire completion date. The patient reported 4 GP visits, 1 OP visit, 1 IP visit, no A&E visit, and visited a rheumatologist, a radiologist, and a chiropractor once each. Distances to the healthcare facilities were 1 mile, 3.5 miles, and 3 miles for GP, OP, and IP, respectively. In each case the patient used their own car; and was accompanied by someone during the GP and IP visits. The patient also reported having taken pain reducing medicines (paracetamol, ibuprofen, and naproxen); having undergone an MRI scan and had blood and urine tests during the 3 months recall period.
Using the unique ALF, we tracked the patient's healthcare pathways through the routine data in SAIL system. Figure Figure11 plots the patient's healthcare events from the OP, GP, and A&E datasets for a 2 year period (i.e. August 2008 to August 2010), which represents the timeline approximately one year before and one year after the completion date of the questionnaire by the patient.
The linked routine data show 10 GP events, 2 OP visits and 1 A&E visit during the 3 month recall period. There is no IP visit recorded during this period. Therefore, the self-reported IP visit in the questionnaire may actually be an A&E visit, which would correlate with the routine data. Out of those 10 GP events, not all of them are physical visits by the patients, but may include any event (e.g. letter encounter, prescription collection, telephone conversation etc.). Further exploration of the GP read codes and descriptions for these 10 GP events yields information about medication, tests, and other GP related encounters, as shown in Table Table1.1. Retrospective and prospective tracking of events reveals that there are 51 such GP events during the 2 year period (Figure (Figure1).1). There are no OP, IP, or A&E visits before or after the recall period, indicating the danger of extrapolating patient reported 3 month recall data in the questionnaire over a longer period (e.g. one year). However, going further back through the linked data system in terms of the timeline (not shown in the figure), it was found that the patient made 4 OP visits during August, October, and November of 2005; and 1 IP visit on the first week of June 1999.
Using the GP data we tracked about 183 AS patients from the PAS cohort retrospectively and prospectively from the date of questionnaire completion to gather the following information: (a) number of GP events; (b) presence of a GP parent family 'drugs' read codes; and (c) the presence of a GP parent family 'diagnostic' read codes.
Table Table22 presents the average number of GP events for the AS patients grouped under several stratification schemes based on baseline disease severity score (low and high BASDAI); disease function score (low and high BASFI); age, sex, and the age of the onset of first symptoms. Results are presented for the retrospective 3 months recall period, 1 year period, and 5 years period - the questionnaire completion dates being the reference date for each patient. In the last column of Table Table2,2, we present the self-reported number of GP visits during the 3 month period from the questionnaires. As mentioned earlier the GP 'event' and 'visits' are to be construed differently. The GP event is defined as the unique dates for each patient where we can find GP read codes indicating administrative actions, referrals, visits, telephone conversation, prescription collection, symptoms, diagnosis, prescription drugs etc., pertaining to the particular patient. The GP visits refer to the physical visits to the GP by the patient. In the questionnaire the patients were asked about the visits to the GP. In the presence of numerous read codes, it is beyond the scope this paper to identify 'visits' from within the 'events'. Nevertheless, the number of actual visits obtained from the questionnaire can be used as complementary information as to what proportion of the GP events were GP visits.
Table Table22 shows that the patients with low disease severity have less GP visits and events than the patients with high disease severity. The patients with low disease severity, had 2.81, 12.92, and 61.92 events recorded in the GP system during the 3 months, 1 year, and 5 year retrospective period as opposed to 4.25, 17.79, and 80.28 events for the high disease severity groups. These ratios are consistent with those for the number of self-reported visits obtained from the questionnaires during the 3 month recall period, which are 1.31 visits for the low severity group and 1.78 for the high severity group. The 3 month GP events and self-reported GP visits in the high disease severity group were 1.51 and 1.36 times respectively more than in the low disease severity group and the between-group relative differences are largely consistent throughout the 5 year period (see Table Table2).2). The same strategy can be applied to the other stratification models. The data in Table Table22 indicates that the largest discrepancies between self-reported GP visits and routine data GP events are in the groups stratified by age and gender. We postulate that this suggests that older patients with AS (age ≥50) either tend to under-report GP visits or have more non-visit related GP events (e.g. for prescriptions) compared to younger patients, or that the reverse is true for younger patients. Similar hypotheses can be generated for female and male patients. This may have important implications as AS affects men more commonly than women, with onset in late teens or early adult years.
Table Table33 indicates the presence of parent drugs in the AS patients' GP history. The GP read codes starting with small letters a-z indicate the parent drug family the prescribed medicines belong to. Starting with the small letters a-z, the drug related GP read codes extends up to 4 more sub-digits to specifically identify the drug. It is beyond the scope of this paper to go beyond the first parent groups. Table Table33 shows, in different retrospective time spans, how many patients' GP read codes include the mentioning of a particular drug code at least once. It is evident that the highest number of patients is prescribed musculoskeletal and joint drugs (read code 'j'), followed by the central nervous system drugs (which include analgesics) (d). Other drug classes frequently recorded in the AS patients include gastro-intestinal system drugs (a), cardiovascular system drugs (b), and the skin drugs (m), which is consistent with the association of these diseases with AS. One could go beyond the parent read codes and specifically identify exactly which drug was prescribed at which date. The drug codes for the 3 months after (prospective) the dates of baseline questionnaire completion for 103 AS patients are shown in the last column of Table Table33.
Table Table44 similarly shows the presence of parent disease diagnostic read codes (start with capital letters A-Z) in the GP data system for the AS patients at 10 years, 5 years, 1 year and 3 months retrospective, and 3 months prospective time span, relative to the date of questionnaire completion. The most frequent disease group as indicated by the read codes fall under the musculoskeletal/connective tissue (N), skin/subcutaneous tissue disease (M), nervous system/sense organ diseases (F), respiratory system disorders (H), digestive system disorders (J), symptoms, signs, ill-defined conditions (R), and infectious/parasitic diseases (A). These are consistent with the conditions that are associated with AS (e.g. psoriasis, uveitis, colitis) or complicate the treatment (e.g. respiratory infections in patients on immunosuppressive therapy). Again, further exploration of the parent read codes by going beyond the first digit would reveal the specific disease diagnosis.
We tracked 236 and 296 AS patients retrospectively through the OP and IP data systems respectively to derive the average number of OP visits and IP admissions made by the patients grouped under different stratification schemes. The results are presented in Table Table5.5. The table has two panels - columns 1-4 relate to OP visits, and columns 5-8 relate to the IP admissions.
The estimates in columns 3 and 4 of Table Table55 report the average number of visits for the 3 months recall period obtained from the routine data system and the questionnaires respectively. In principle, the numbers in these two columns should match as they relate to OP visits only. It is observed that patients in all groups tend to overestimate the number of OP visits in the questionnaire. The over-reporting of OP visits is most marked in those with high disease activity (BASDAI) and functional impairment (BASFI). We postulate that one reason for this may be as a result of these patients finding it more difficult to physically attend OP clinics due to their greater disease activity and disability. Again, this may have important implications when using patient-reported data to estimate utilisation of healthcare resources. Figure Figure22 shows the number of self-reported and recorded OP visits by the AS patients. The x-axis shows the number of visits during the 3 months recall period self-reported in the questionnaire and the y-axis shows the corresponding number of visits obtained from the records of OP data. The numbers in the bracket is the number of patients. For instance, out of 79 patients who reported in the questionnaire to have visited once to the OP during the 3 months recall period, we found 29 patients with zero visit, 34 patients with 1 visit, 9 patients with 2 visits, and 7 patients more than 2 visits. It can be seen that in general, the patients overestimated the number of OP visits when completing the questionnaires. 29 patients reported an OP visit that was not recorded during the 3 month recall period, highlighting issues with using recall data.
In the second panel of Table Table55 (i.e. columns 5-8), we tracked 296 AS patients through the IP data system to obtain the number of IP admissions. Again, in principle, columns 7 and 8 should match, and indeed these results for IP admissions are more similar than the corresponding data for OP visits (columns 3 and 4). The results in columns 7 and 8 suggest that, in contrast to OP visits, patients tend to underestimate the number of IP visits. Younger patients and those with less severe disease severity (BASDAI) and functional impairment (BASFI) were most likely to underestimate IP admissions. We postulate that this is because the admission in these patients was more likely to be for a reason unrelated to their AS, and therefore overlooked and not reported in a questionnaire for an AS study.
Tracking through the retrospective data of those who had at least one IP admission, Table Table66 indicates additional complementary information on the number of days spent in hospital. This information was not captured in the questionnaire data. These data indicate that older patients, those with longer disease duration, higher disease activity and functional impairment spent significantly more days in hospital than their relevant comparison groups. This may also have been a contributory factor to the relative under-reporting of IP admissions in patients with lower disease severity or functional impairment.
The above examples demonstrate that linked routine data enables validation and clarification of patient reported data; the retrospective and prospective tracking of the patient healthcare utilization and pathways; and the referral history in a cohort of patients with AS. Such analysis makes it possible to deduce whether the anonymised individuals in question were suffering any common co-morbidities, in receipt of healthcare treatment prior to the occurrence of the reference event, whilst it also allows any frequent complications requiring medical attention in the days, months, years following the event (which could be an intervention or questionnaire) to be identified. From the methodological perspective, any linkage system would add new dimensions and perspective to traditional health related research (e.g. complement and enhance the results of RCTs), as a resource for clinical audits, and in a variety of health impact assessment exercises. For example, the longitudinal routine data would allow an assessment of the impact of specific healthcare interventions on subsequent healthcare utilisation (e.g. A&E visits or hospital admission). Important limitations of solely relying on questionnaire data include reliance on accurate patient recall and that the healthcare events of interest may not occur within the limited recall period (e.g. 3 months), but just before the recall period or after the completion date of the questionnaire. This makes extrapolation of the questionnaire data for an extended period of time unreliable. The longitudinal linked routine data comes into aid in this respect.
The linkage of the questionnaire data from the PAS patients with the GP data as shown above enhances and helps make sense of the rich information obtained from the GP Read codes. This constitutes a rich health history for these AS patients, for whom we can carry out patient pathway analysis from various clinical and economic aspects. In particular, in keeping with other AS cohorts, these patients had an average lag of about 8 years from symptom onset to AS diagnosis, on which we can conduct pre- and post-diagnosis analyses of health care utilisation.
Again, a matrix of traits based on the PAS questionnaire information linked with the SAIL data system will help profiling of AS patients for health and other related interventions. Table Table77 summarises the types of information gathered through the PAS questionnaires, which could all be linked with the routine data sources as well as various demographic, socio-economic, and environmental attributes of the patients.
The use of HERALD methodology can stratify groups of patients to identify the early characteristics of patients who subsequently develop severe disease, thus, enabling these patients to be targeted with early aggressive therapy in order to prevent severe damage and need for surgery. This profiling can be used to estimate the potential resource savings of focusing treatment on those patients with patterns of disease suggestive of the development of a severe outcome. All these will directly affect patient care for AS in terms of informing NHS service provision and NICE guidelines for the use of expensive biological therapies, and informing the process of assessment of cost-effectiveness. In principle, the methods developed for the PAS cohort and described here can be extrapolated to be used in other chronic disease conditions ; thus improving patient care for all those conditions. Linked routine data provides many opportunities for enhanced healthcare research and allows evaluation of impacts beyond the limited primary outcomes of interventional studies. As an example, the expanding SAIL databank in Wales already holds over a billion anonymised records from various databases, which can be anonymously linked at the individual record level. The combination of routine data with information from patients and RCTs allows the validation of real-life data and its application for clinical research. These linkable databases provide factual and continuous information with rich clinical and non-clinical details, which offers wide ranging opportunities in the realm of conducting evaluative research, clinical epidemiology, trial recruitment, genetic research, basic research of biological markers, stratified medicine, post-trial surveillance, risk assessment, service delivery evaluation, resource use, decision analysis, identification of early disease predictors, and the identification of subjects for prospective studies [6,22,23]. This data system also offers the opportunity for post-marketing surveillance and pharmacovigilance of new expensive, and often potentially dangerous, healthcare interventions in real-life settings. Complementing this resource with targeted health economic analysis, as proposed in the HERALD methodology, offers a unique opportunity to deliver the level of health economic data required to evaluate and drive forward cost-effective modern healthcare services.
The linkage of routine data, patient completed questionnaires and trial data offers unique opportunities for enhanced health economic analysis, including assessment of the validity, reliability and generalisability of health economic data not possible through the use of traditional isolated datasets. The information obtained from the linked data system would help improving patient pathways, and thus maximise clinical efficiency and efficiency in the use of resources.
The authors declare that they have no competing interests.
All authors were involved in the design of the HERALD protocol. The first draft of the paper was written by MJH, SB, and SS. Subsequent drafts were amended and finally approved by all the authors.
MJH is a Lecturer in Economics at Keele Management School, Keele University; and also worked as a health economist at the College of Medicine, Swansea University during 2010/11. SB is a Reader in Epidemiology and has worked for 15 years in the field of spondyloarthropathy. SS is a Clinical Senior Lecturer and Honorary Consultant Rheumatologist (MBBCh, MRCP, PhD) whose specialist clinical and research interest is Spondyloarthritis.
The pre-publication history for this paper can be accessed here:
Funding for the PAS cohort was provided by a Medical Research Council (MRC) and National Institute for Social Care and Health Research (NISCHR) grant (G080058). This study makes use of anonymised data held in the Secure Anonymised Information Linkage (SAIL) system, which is part of the national e-health records research infrastructure for Wales. We would like to acknowledge all the data providers who make anonymised data available for research.