Liver disease develops silently and presents late, with often fatal complications.
To develop a ‘traffic light’ test for liver disease suitable for community use that could enhance assessment of liver risk and allow rational referral of more severe disease to specialist care.
Two cohorts from Southampton University Hospital Trust Liver Unit: model development and a validation cohort to evaluate prognosis.
A total of 1038 consecutive liver patients (inpatient and outpatient) (development n = 397, validation n = 641) for whom the relevant blood tests had been performed were followed for a mean of 46 months (range 13–89 months). Blood tests for hyaluronic acid (HA), procollagen-3 N-terminal peptide (P3NP), and platelet count were combined in a diagnostic algorithm to stage liver disease.
A simple clinical rule combined HA, P3NP, and platelet count into a ‘traffic light’ algorithm, grading the results red (high risk), amber (intermediate risk), and green (low risk). In the validation cohort, no green subjects died or developed varices or ascites (n = 202); in the amber group, 9/267 (3.3%) died, 0/267 developed varices, and 2/267 (0.7%) developed ascites; in the red group, 24/172 (14%) died, 24/172 (14%) developed varices, and 20/172 (11.6%) developed ascites. Survival was reduced in red (P<0.001) and amber (P = 0.012) groups compared with green.
A simple blood test triages liver disease into three prognostic groups; used in the community, it could enhance the management of risk factors in primary care and rationalise secondary care referrals, including the many patients with fatty liver and relatively minor elevations in alanine transaminase.
At a time when mortality for many diseases is falling, deaths from liver disease have increased fourfold since 1970 and doubled since 1993.1 The majority of these deaths have been from alcohol-related disease as a result of increasing alcohol intake, but the increasing incidence of obesity, metabolic syndrome, and diabetes has increased the prevalence of other forms of fatty liver disease, and chronic viral hepatitis is also increasing. Hospital episodes for liver disease have increased by 8.3% each year from 1998 to 2008; in 2005, there were 43 694 episodes coded with liver disease as the primary diagnosis, and 6798 deaths — a case fatality rate of 15.5% per episode.2 Unfortunately, liver disease develops silently and frequently presents with the late complications of cirrhosis: variceal haemorrhage, decompensated cirrhosis, or acute or chronic liver failure — all have a high mortality. The hospital mortality of cirrhosis has not changed for 30 years,3 suggesting a significant rethink is desperately needed.
If liver deaths are to be reduced, then there is a need to address the major risk factors for liver disease: alcohol, obesity, and viral hepatitis,2 but it is also necessary to detect liver disease before the development of cirrhosis, when lifestyle changes or specific treatment can prevent the progression of disease. Historically, the diagnosis of liver disease is reliant on a referral to specialist services, very often based on an elevated level of alanine transaminase (ALT). ALT and gamma glutamyltransferase are useful tests for inflammatory liver disease, and are elevated in around half of simple fatty livers but it is important to realise that they are of little help in predicting which patients have liver fibrosis or cirrhosis.4
This study used fibrosis markers (procollagen-3 N-terminal peptide [P3NP] and hyaluronic acid), along with routine liver function tests in a clinical algorithm in the liver clinic (Southampton Traffic Light — STL). The algorithm was developed for the ongoing Alcohol and Liver Disease Detection Study (ALDDeS), in which 10 000 subjects in 10 general practices were screened for alcohol use, and hazardous and harmful drinkers offered the ‘traffic light’ screening test.5 Alongside this project, a number of competing and perhaps equally effective non-invasive diagnostic methodologies have been published.6–10 This traffic light system is probably no more accurate than the better ones, but it was developed specifically for use in primary care, with the aim of being intuitively easy for patients to understand.
Liver disease mortality in the UK has doubled over the last 15 years, and survival rates of liver admissions have not improved. This study’s experience in Southampton is that more than 90% of first liver admissions have unsuspected liver disease, whereas clinic referrals are dominated by patients with minor elevations of liver enzymes who could be dealt with in primary care. The current diagnosis and management of liver disease in the UK leaves much to be desired. The study has developed a simple blood test which triages patients into red, amber, or green categories according to the degree of liver fibrosis. These categories predict survival and the development of liver complications and have the potential to rationalise the diagnosis and management of liver disease in primary care. Ongoing studies are addressing how this new technology can best be used.
This study presents the STL test; examines whether a modified version (mSTL) created using logistic regression analysis would be an improvement; and analyses how well both models (STL and mSTL) predicted prognosis in a separate validation cohort. The study aimed to examine the following clinical questions:
The study population comprised 1038 consecutive patients with suspected liver disease, in whom the routine full blood count, liver function tests, and analysis of the serum fibrosis markers HA or collagen P3NP had been performed as part of routine diagnosis in the NHS laboratory of Southampton University Hospital Trust (SUHT), between July 2003 and November 2009. P3NP (Orion Diagnostica, Espoo, Finland) and HA (Corgenix Inc, Broomfield, US) were assayed using commercial immunoassays. Results are given throughout the text as follows: HA and P3NP (µg/l), platelets (10⁹/l), ALT (iu/l).
Ethical permission to prospectively study these patients was obtained in 2003; as no research procedures were involved, informed consent was not required. Subjects were subsequently identified through the SUHT biochemistry database; all subjects with suspected liver disease and with the relevant clinical and laboratory data were included, with no exclusions. Medical, endoscopy, radiology, and pathology records were analysed to provide clinical data, and the subjects are described in two groups; demographic and clinical data are given alongside outcome data in Table 1.
The STL is a clinically derived rule of thumb, based on the authors’ experience using fibrosis markers in the liver clinic. To aid interpretation for the ALDDeS study, results were categorised into three grades: green, amber, and red, as follows:
At the time the original algorithm was designed, the researchers did not have the benefit of the large dataset reported here, and had no validation cohort. Earlier versions included the international normalised ratio (INR) and albumin but these were dropped, as interim analyses showed them to be of no discriminatory value for liver fibrosis. For the platelet cut-off, the normal range in Southampton was used; P3NP and HA cut-offs were informed by interim area under the receiver operating characteristic curve (AUROC) analyses, but the algorithm was a clinical interpretation as opposed to a scientific analysis.
It was anticipated that binary logistic regression analysis of the development cohort would produce a more accurate algorithm than the STL. In actual fact, although the mSTL proved slightly more accurate in terms of AUROC analysis, the difference was clinically insignificant and the authors have continued to use the STL, which is easier to calculate. In the analysis of the validation cohort, the results of the two algorithms, STL (clinically derived) and mSTL (logistic regression model), are presented side by side.
In accordance with the standards for reporting of diagnostic accuracy (STARD),11 the cohort was split into a model development cohort (397) and a validation cohort (641). The model derivation cohort comprised 397 subjects with objective evidence of the degree of liver fibrosis on liver biopsy within 2 years of the fibrosis markers (n = 334), or cirrhosis (n = 63), as evidenced by clinical pathological features together with evidence of portal hypertension, ascites, or liver morphology on imaging prior to the fibrosis markers. Biopsies were graded according to severity of fibrosis: no fibrosis (F0), fibrosis without cirrhosis (progressive fibrosis F1–3), or cirrhosis (F4), and the earliest stage of fibrosis, F1, was chosen as the cut-off for the analysis, because the study aim was to investigate the accuracy of the STL in diagnosing early disease.
This model was developed and internally validated in the development cohort (n = 397), using a logistic regression analysis and a 0.632 bootstrap sampling process (Table 2).12,13 A sample of 397 subjects was taken, with replacement, from the 397 with a biopsy, HA, P3NP, and platelet values. As this sample was taken with replacement, it was possible for some subjects to be sampled multiple times and for others not to be sampled at all. A logistic regression model was then fitted to the sampled subjects, forcing HA, P3NP, and platelets into the model. This model was then applied to the subjects that were not sampled, with the AUROC of this validation model saved. This process was repeated a large number of times (n = 10 000) and the 2.5th, 50th, and 97.5th percentiles used as the validation AUROC and accompanying 95% probability interval. Findings were then validated against the key clinical outcomes, in a prognostic model, in a separate cohort of subjects who had undergone the test for routine diagnostic purposes and in whom the stage of fibrosis was unknown at the time of the test.
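The resampling steps described above (sample with replacement, fit the model on the sampled subjects, score the subjects left out, repeat, and read off percentiles) can be sketched in Python as follows. This is an illustration only, not the authors' code: the function names are hypothetical, the logistic model is fitted by plain gradient descent as a stand-in for a statistics package, predictors are assumed to be standardised beforehand, and the sketch reports the out-of-sample AUROC only, without any 0.632 weighting of apparent and out-of-sample estimates.

```python
import math
import random

def predict(w, xi):
    """Predicted probability from weights w (intercept first) and predictors xi."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=300, lr=0.5):
    """Fit a logistic regression by gradient descent (assumes standardised predictors)."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def auroc(scores, labels):
    """AUROC: chance that a random case outscores a random non-case (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def oob_bootstrap_auroc(X, y, n_boot=200, seed=0):
    """Return the (2.5th, 50th, 97.5th) percentiles of the out-of-sample AUROC."""
    rng = random.Random(seed)
    n = len(X)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]       # sample with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]   # subjects never sampled
        oob_y = [y[i] for i in oob]
        if len(set(oob_y)) < 2 or len({y[i] for i in idx}) < 2:
            continue  # AUROC and the fit both need cases and non-cases
        w = fit_logistic([X[i] for i in idx], [y[i] for i in idx])
        aucs.append(auroc([predict(w, X[i]) for i in oob], oob_y))
    if not aucs:
        raise ValueError("no usable bootstrap replicates")
    aucs.sort()
    pct = lambda q: aucs[min(len(aucs) - 1, int(q * len(aucs)))]
    return pct(0.025), pct(0.5), pct(0.975)
```

Here `X` would hold one row of (standardised) HA, P3NP, and platelet values per subject and `y` the biopsy outcome coded 0 or 1; the study used 10 000 replicates, whereas the default here is kept small so the sketch runs quickly.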
In the development dataset, there were 379/397 subjects with the full panel of possible variables being considered. A backwards-stepwise modelling approach was used to derive the model in the full set of subjects with a biopsy. Initial variables were: hyaluronic acid (HA), P3NP, albumin, international normalised ratio (INR), platelet count (PLT), bilirubin (Bili), alkaline phosphatase (ALP), and alanine transaminase (ALT).
The intermediate logistic regression model (mSTL) was as follows:
predicted probability (p) = exp(HA * 0.015 + P3NP * 0.447 – PLT * 0.005 – 0.611) / (1 + exp(HA * 0.015 + P3NP * 0.447 – PLT * 0.005 – 0.611))
Green/amber and amber/red cut-off values were obtained from AUROC analysis. The amber/red cut-off of 0.921 corresponded to 95% specificity (52% sensitivity) for any degree of fibrosis, and the green/amber cut-off of 0.616 corresponded to 90% sensitivity (54% specificity).
The equation above can also be written as:
log(p / (1 – p)) = HA * 0.015 + P3NP * 0.447 – PLT * 0.005 – 0.611.
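As a rough illustration of how the equation and the cut-offs quoted above might be applied, the following Python sketch computes the predicted probability and assigns a band. The function names are hypothetical, the inputs are assumed to be in the units given in the Method (HA and P3NP in µg/l, platelets in 10⁹/l), and the treatment of a value exactly equal to a cut-off is an assumption, as the text does not specify it.

```python
import math

def mstl_probability(ha, p3np, plt):
    """Predicted probability of any fibrosis from the published mSTL coefficients."""
    logit = 0.015 * ha + 0.447 * p3np - 0.005 * plt - 0.611
    return 1.0 / (1.0 + math.exp(-logit))  # identical to exp(x) / (1 + exp(x))

def mstl_grade(p, green_amber=0.616, amber_red=0.921):
    """Band a predicted probability using the cut-offs reported in the text.

    Assumption: a value exactly on a cut-off falls into the higher-risk band.
    """
    if p >= amber_red:
        return "red"
    if p >= green_amber:
        return "amber"
    return "green"
```

For example, `mstl_grade(mstl_probability(ha=200, p3np=10, plt=100))` gives a logit of 6.359 and therefore a red grade, whereas HA 20, P3NP 4, and platelets 250 give a logit of 0.227 and a green grade.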
Clinical outcomes were analysed in a separate cohort of 641 subjects out of 1038 in total, in whom objective evidence of the stage of liver fibrosis was not available when the risk algorithm was performed. No data from this cohort were used in the development cohort. In the validation cohort, 53/641 were missing the full dataset. The period of follow-up was as follows: validation cohort, mean 41 months (range 13–89 months); entire cohort, mean 46 months (range 13–89 months). Follow-up time and Kaplan–Meier survival curves were calculated from the day of the fibrosis marker test.
Mortality data were obtained from the NHS Strategic Tracing Service (NSTS); other data were extracted from the SUHT computer-based records and medical notes. The date at which oesophageal varices were first found at endoscopy, or ascites first demonstrated on ultrasound, computed tomography (CT), or clinical examination was recorded (Table 1). All investigations were part of routine NHS diagnosis and so the incidence of varices and ascites is likely to be an underestimate, but mortality data are comprehensive, as all deaths are recorded by NSTS. Survival and time to varices/ascites was measured from the time of the fibrosis marker test, and censored for dead patients from the day of death.
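For readers unfamiliar with how censored follow-up of this kind is turned into a survival curve, a minimal sketch of the Kaplan–Meier product-limit estimator is given below. It is a generic illustration and not the software used in the study; the function name and the input layout (follow-up time in months from the fibrosis marker test, with an indicator of whether the endpoint occurred) are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where at least one endpoint occurred.

    times:  follow-up for each subject (for example, months from the marker test)
    events: 1 if the endpoint (death, varices, or ascites) occurred, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        leaving = sum(1 for tt, _ in data if tt == t)            # events plus censored at t
        occurred = sum(1 for tt, e in data if tt == t and e == 1)
        if occurred:
            surv *= 1 - occurred / at_risk                        # product-limit step
            curve.append((t, surv))
        at_risk -= leaving
        i += leaving
    return curve
```

Curves computed separately for the green, amber, and red groups would then be compared with a log-rank (Mantel–Cox) test, which this sketch does not implement.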
The AUROC for the mSTL regression model was 0.87 (95% confidence interval [CI] = 0.83 to 0.91, F0 versus F1–4) for any fibrosis and 0.88 (95% CI = 0.85 to 0.92, F0–2 versus F3–4) for severe fibrosis. The values are categorised into three bands to aid clinical decision, with a resultant drop in the AUROC value. For the banded logistic regression model, the mSTL, the AUROC was 0.85 (95% CI = 0.81 to 0.90) for any fibrosis and 0.84 (95% CI = 0.80 to 0.88) for severe fibrosis.
The ‘easy to calculate’ version of the algorithm, the STL, performed as well as the complex regression model. In the derivation cohort, the AUROC analyses for the STL were 0.78 (95% CI = 0.72 to 0.83) for any degree of fibrosis (F0 versus F1–4) and 0.81 (95% CI = 0.77 to 0.86) for severe fibrosis (F0–2 versus F3–4). The breakdown for various grades of severity for the two traffic light scores mSTL and STL is presented in Table 3.
The study data originated from a secondary care population with a high prevalence of fibrosis and cirrhosis, and prevalence affects positive (PPV) and negative (NPV) predictive values, but the researchers were specifically interested in how the test might perform in a community population where the prevalence of fibrosis or cirrhosis is unknown. This modelling was done in the development cohort, because it is only in this cohort that it is possible to correlate the traffic light tests’ data with the stage of liver fibrosis on liver biopsy. A wide range of estimates of liver fibrosis and cirrhosis for a community sample were used to illustrate the effect on PPV and NPV. PPV and NPV, as well as estimated values (ePPV and eNPV) are given in Table 4.
In the hospital setting, a red STL had a high PPV for both fibrosis (0.96) and cirrhosis (0.69), whereas a green test had only a moderate NPV for fibrosis (0.50), and a very good NPV for cirrhosis (0.97). For the community population, the estimated PPV of a red STL for fibrosis dropped to 0.31–0.57, and of a red/amber STL to 0.12–0.29; the estimated NPV of a green test for fibrosis was 0.95–0.98 and for cirrhosis 0.99–1.00. The predictive values for the mSTL were marginally better.
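The dependence of predictive values on prevalence can be illustrated with the standard Bayes relationship between sensitivity, specificity, and prevalence. The sketch below is an assumption about the general approach, not a reproduction of the authors' calculation; the function names are hypothetical. The example inputs suggested in the note that follows are the mSTL amber/red cut-off figures quoted earlier (52% sensitivity, 95% specificity for any fibrosis), so the outputs will not match the STL values in Table 4.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from test characteristics and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    """Negative predictive value from test characteristics and disease prevalence."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)
```

Calling `ppv(0.52, 0.95, prev)` for prevalences across the modelled community range (for example 0.08 and 0.20) shows the PPV rising with prevalence while the NPV falls, which is the pattern described in the text.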
The red mSTL group had a very poor survival and a high development of complications with varices and ascites; the green STL group had an excellent survival and almost no liver complications, and the intermediate amber group had a slightly reduced survival and a low rate of liver complications (Table 5).
Kaplan–Meier plots for survival and for the development of varices and ascites are given for the validation (FU) cohort (Figure 1); compared with the green group, survival was significantly diminished in the red (Mantel–Cox, P<0.001), and amber groups (Mantel–Cox, P = 0.012). Significantly higher numbers of patients with a red grade developed varices (Mantel–Cox, P<0.001) and ascites (Mantel–Cox, P<0.001), but there was no difference between green and amber grades (Figure 1) over the time course of the study.
For survival in the validation cohort, the AUROC for the mSTL was 0.85 (95% CI = 0.78 to 0.91). The STL was associated with a slightly lower AUROC value, 0.78 (95% CI = 0.72 to 0.83), but the clinical outcomes were as good as for the mSTL (Figure 1 and Table 5).
ALT was lower in patients who developed varices (mean 42 iu/l versus 76 iu/l, P = 0.03) or ascites (mean 49 iu/l versus 76 iu/l, P = 0.004); these figures are from both cohorts combined, but the same trend was seen in the development and validation cohorts separately. ALT was essentially the same in subjects with no fibrosis, fibrosis, or cirrhosis (73 iu/l, 82 iu/l, and 67 iu/l respectively, development cohort), and in subjects who died compared with those who did not (mean 74 iu/l versus 73 iu/l). In the study population, a high ALT was not a useful discriminator for the severity of liver disease.
This study has developed a ‘traffic light’ grading system, which is a simple-to-apply method of estimating the risk of liver fibrosis and cirrhosis in a clinic population. The score was validated in a second cohort from routine clinical data and shown to predict clinically relevant outcomes and mortality. Although the score will have lower predictive value in lower-risk community populations, it has clinical utility and provides valuable risk data to aid clinical management. A more complex score derived from the same dataset using logistic regression had marginally improved performance in terms of AUROC, but was not better at predicting relevant clinical outcomes. In practice, the STL is easier to calculate and the algorithm is still used routinely in the Southampton liver clinic.
The traffic light score is simple to use, and in the follow-up cohort had good predictive value for clinical outcomes and survival. The score was derived using readily available clinical and biochemical data and commercially available fibrosis markers. Not all subjects in the derivation cohort had biopsy-proven cirrhosis but the authors believe the presence of ascites or varices is a sufficiently robust marker of cirrhosis to include these data. When only patients with biopsy data were included, the model was essentially the same. Only patients in whom liver biopsy results were available, or in whom there was strong clinical evidence of fibrosis/cirrhosis, were included in the development cohort, and hence those without these were included in the cohort used for survival analysis. Therefore, the follow-up cohort represents a slightly different population (ascites or varices excluded). This, in turn, will influence prognosis; as expected compared to the derivation cohort, members of the follow-up cohort have a better prognosis. Hence, the survival curves will underestimate true test performance in this clinic population.
The score and modified score were derived in a clinic population with a high risk of fibrosis and cirrhosis (82%); the score may not perform as well in a lower-risk community sample. It is important to appreciate that a score derived in a high-prevalence population, in this instance the clinic population, may perform less well when applied to a different population, due to spectrum bias. The prevalence of fibrosis and cirrhosis in the community is unknown and so the researchers modelled a wide range of prevalence for fibrosis (8–20%) and cirrhosis (4–10%). A French study of FibroTest and elastography in 7463 normal subjects found that 2.8% had evidence of liver fibrosis.14 (FibroTest, known as FibroSure in the US, is a patented biomarker test that uses the results of six blood serum tests to generate a score that is correlated with the degree of liver damage in people with a variety of liver diseases.) In the ALDDeS study, around 14% of hazardous drinkers had a red traffic light.15 The range of estimates for fibrosis of 8–20% is therefore considered reasonable for a high-risk community group. It is widely recognised that predictive values will be lower, but spectrum effects may also change sensitivity and specificity.16 The validation sample used for prognostic validation had a poor outcome, reflecting the underlying severity of liver disease in this population. Since it is unlikely that liver biopsies to confirm the test result will be widely adopted, it will only be possible to ascertain the performance of these tests in community samples by validation of cohorts to determine outcomes. Despite the provisos, the study considers the results to be sufficiently useful to inform initial clinical management and lifestyle advice.
The various methodologies for non-invasive diagnosis of liver fibrosis and cirrhosis fall into two groups, various combinations of blood tests including serum markers of fibrosis as detailed in this study, and imaging modalities including elastography. All have been subject to intensive investigation, including systematic reviews and meta-analyses,7,9,17–20 with a consensus emerging that in liver disease of varied aetiologies, the detection of cirrhosis and severe fibrosis is accurate, but the detection of early stages of fibrosis considerably less so.
A number of studies have examined clinical outcomes, and various tests have been shown to predict survival; a study of FibroTest in 537 subjects with hepatitis C found that it was able to predict survival with an AUROC of 0.76, compared with an AUROC of 0.66 for histological staging.20 Another study compared FibroTest (AUROC for survival = 0.69) with Fibrometer A (0.69), Hepascore (0.69), histological staging (0.69), Pugh (0.62), FIB4 (0.64), AST to platelet ratio index (APRI) (0.56), and Forns 24 (0.43) in 218 subjects with alcohol-related liver disease followed for up to 11.8 years.22 Further studies have used the AST/platelet count ratio and the commercial ELF test (a combination of HA, P3NP, and tissue inhibitor metalloproteinase [TIMP]), which predicted survival in 457 subjects with a range of liver diseases, with an AUROC of 0.87.23 All these outcome data are essentially similar to the findings of the present study and the study would conclude that the various non-invasive tests including the STL, mSTL, FibroTest, ELF and APRI appear to have similar accuracy in predicting outcomes, and, given the accuracy of cirrhosis prediction in general, are probably equally likely to be able to define a population at risk of variceal haemorrhage.
There are several tests incorporating biomarkers or other measures which are showing promise in the secondary care environment as indicators of liver fibrosis/cirrhosis, with prognostic relevance. None have yet been validated in the primary care context, where issues of both disease prevalence and performance bias may reduce their predictive value. These tests are not suitable for screening in low-risk individuals but they may now have a place in the management of patients, with evidence of some non-specified liver disease or high liver risk, when making decisions about further tests or referral. The STL test results may be used to recommend lifestyle changes and to guide referral for further investigation, with a red test indicating a probability of significant underlying liver disease, and the possibility of cirrhosis. While the PPV of the test in general practice may not be as good as predicted from the development cohorts — an amber test is associated with approximately 50% likelihood of liver fibrosis in the study’s secondary care setting, but perhaps only 10–30% in a primary care setting depending on the degree of liver risk — NPVs are still likely to be high. The study would therefore advocate that tests such as the STL may gain a place in the rational management of liver risk when a period of watchful waiting is appropriate because the risk of fibrosis is so low. This should allow appropriate investigation of those at higher risk, by liver specialists, and delivery of lifestyle interventions in those managed in the community.
We are grateful to the following Southampton medical students who contributed to data collection during their fourth-year projects: Emily Williams, Will Mackintosh, Lisa Kramer, Nina Kapoor, Alicia Watts, and Michelle Ledbury; to Professor Paul Roderick for advice on the preparation of the manuscript; and to Mr Scott Harris who advised on statistical analysis and performed the bootstrap analysis and logistic regression modelling for the mSTL. The copyright of STL and mSTL model algorithms is owned by the lead author Nick Sheron; please contact the author if you wish to use the algorithm for commercial purposes.
No funding received.
Ethical permission to prospectively study the patients included in this research was obtained in 2003; as no research procedures were involved, informed consent was not required.
Freely submitted; externally peer reviewed.
The authors have declared no competing interests.
Contribute and read comments about this article on the Discussion Forum: http://www.rcgp.org.uk/bjgp-discuss