The study population comprised 1038 consecutive patients with suspected liver disease, in whom the routine full blood count, liver function tests, and analysis of the serum fibrosis markers HA or collagen P3NP had been performed as part of routine diagnosis in the NHS laboratory of Southampton University Hospital Trust (SUHT), between July 2003 and November 2009. P3NP (Orion Diagnostica, Espoo, Finland) and HA (Corgenix Inc, Broomfield, US) were assayed using commercial immunoassays. Results are given throughout the text as follows: HA and P3NP (μg/l), platelets (×10⁹/l), ALT (iu/l).
Ethical permission to prospectively study these patients was obtained in 2003; as no research procedures were involved, informed consent was not required. Subjects were subsequently identified through the SUHT biochemistry database; all subjects with suspected liver disease and with the relevant clinical and laboratory data were included, with no exclusions. Medical, endoscopy, radiology, and pathology records were analysed to provide clinical data, and the subjects are described in two groups; demographic and clinical data are given alongside outcome data in .
Demographic details, diagnosis, and outcome in the derivation and validation cohorts
The Southampton Traffic Light
The STL is a clinically derived rule of thumb, based on the authors’ experience using fibrosis markers in the liver clinic. To aid interpretation for the ALDDeS study, results were categorised into three grades: green, amber, and red, as follows:
- HA >30 μg/l or P3NP >5.5 μg/l – score +1
- HA >75 μg/l – score +2
- platelet count <150 ×10⁹/l – score +1
- total score: 0 = green, 1 = amber, 2 or more = red.
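The scoring rule above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `stl_grade` and its arguments are hypothetical, a missing marker is assumed to contribute nothing to the score, and the HA >75 μg/l rule is assumed to replace, rather than add to, the +1 for HA >30 μg/l (the grade is red either way, because any score of 2 or more is red).

```python
from typing import Optional


def stl_grade(ha: Optional[float] = None,
              p3np: Optional[float] = None,
              platelets: Optional[float] = None) -> str:
    """Return 'green', 'amber' or 'red' for one patient.

    ha and p3np are in ug/l, platelets in x10^9/l.
    A value of None (marker not measured) is assumed to score 0.
    """
    score = 0

    # Fibrosis markers: HA >75 scores +2; otherwise HA >30 or P3NP >5.5 scores +1.
    if ha is not None and ha > 75:
        score += 2
    elif (ha is not None and ha > 30) or (p3np is not None and p3np > 5.5):
        score += 1

    # Platelet count below the lower limit of the normal range scores +1.
    if platelets is not None and platelets < 150:
        score += 1

    if score == 0:
        return "green"
    if score == 1:
        return "amber"
    return "red"
```

For example, `stl_grade(ha=40, p3np=4.0, platelets=120)` scores +1 for HA and +1 for platelets and is therefore graded red.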
At the time the original algorithm was designed, the researchers did not have the benefit of the large dataset reported here, and had no validation cohort. Earlier versions included the international normalised ratio (INR) and albumin, but these were dropped as interim analyses showed them to be of no discriminatory value for liver fibrosis. For the platelet cut-off, the normal range in Southampton was used; the P3NP and HA cut-offs were informed by interim area under the receiver operating characteristic curve (AUROC) analyses. The algorithm was nevertheless a clinical interpretation rather than a scientific analysis.
It was anticipated that binary logistic regression analysis of the development cohort would produce a more accurate algorithm than the STL. In fact, although the resulting logistic regression model (mSTL) proved slightly more accurate in terms of AUROC analysis, the difference was clinically insignificant and the authors have continued to use the STL, which is easier to calculate. In the analysis of the validation cohort, the results of the two algorithms, STL (clinically derived) and mSTL (logistic regression model), are presented side by side.
Development of a logistic regression model
In accordance with the standards for reporting of diagnostic accuracy (STARD),11 the cohort was split into a model development cohort (n = 397) and a validation cohort (n = 641). The model derivation cohort comprised 397 subjects with objective evidence of the degree of liver fibrosis on liver biopsy within 2 years of the fibrosis markers (n = 334), or cirrhosis (n = 63), as evidenced by clinical pathological features together with evidence of portal hypertension, ascites, or liver morphology on imaging prior to the fibrosis markers. Biopsies were graded according to severity of fibrosis: no fibrosis (F0), fibrosis without cirrhosis (progressive fibrosis F1–3), or cirrhosis (F4). The earliest stage of fibrosis, F1, was chosen as the cut-off for the analysis, because the study aim was to investigate the accuracy of the STL in diagnosing early disease.
This model was developed and internally validated in the development cohort (n = 397), using a logistic regression analysis and a 0.632 bootstrap sampling process ().12,13 A sample of 397 subjects was taken, with replacement, from the 397 with a biopsy, HA, P3NP, and platelet values. As this sample was taken with replacement, it was possible for some subjects to be sampled multiple times and for others not to be sampled at all. A logistic regression model was then fitted to the sampled subjects, forcing HA, P3NP, and platelets into the model. This model was then applied to the subjects that were not sampled, with the AUROC of this validation model saved. This process was repeated a large number of times (n = 10 000) and the 2.5th, 50th, and 97.5th percentiles used as the validation AUROC and accompanying 95% probability interval. Findings were then validated against the key clinical outcomes, in a prognostic model, in a separate cohort of subjects who had undergone the test for routine diagnostic purposes and in whom the stage of fibrosis was unknown at the time of the test.
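The resampling loop described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the names `auroc`, `percentile`, and `oob_bootstrap_auroc` are hypothetical, and the model-fitting step is passed in as a hypothetical `fit` callable (taking the sampled covariates and outcomes and returning a prediction function) so that any logistic regression routine can be plugged in. The sketch follows the procedure as described in the text, evaluating each model only on the subjects not sampled; it does not apply the 0.632 weighting of apparent and out-of-sample performance, and it skips replicates in which the unsampled subjects do not include both outcomes.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Predictor = Callable[[Row], float]


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) of an ascending list, by linear interpolation."""
    pos = q / 100 * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def oob_bootstrap_auroc(X: Sequence[Row], y: Sequence[int],
                        fit: Callable[[List[Row], List[int]], Predictor],
                        n_boot: int = 10_000, seed: int = 0
                        ) -> Tuple[float, float, float]:
    """Return the 2.5th, 50th and 97.5th percentiles of the out-of-sample AUROC."""
    rng = random.Random(seed)
    n = len(y)
    aurocs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]       # sample n subjects with replacement
        sampled = set(idx)
        oob = [i for i in range(n) if i not in sampled]  # subjects never sampled
        oob_labels = [y[i] for i in oob]
        if len(set(oob_labels)) < 2:
            continue                                     # AUROC undefined for this replicate
        predict = fit([X[i] for i in idx], [y[i] for i in idx])
        aurocs.append(auroc([predict(X[i]) for i in oob], oob_labels))
    if not aurocs:
        raise ValueError("no usable bootstrap replicates")
    aurocs.sort()
    return (percentile(aurocs, 2.5), percentile(aurocs, 50), percentile(aurocs, 97.5))
```

Passing the fitting routine as an argument keeps the resampling logic separate from the choice of regression software, which the text does not specify.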
Initial and final covariates in the binary logistic regression analysis; the dependent variable was the presence of liver fibrosis (F0 versus F1–4)
In the development dataset, there were 379/397 subjects with the full panel of possible variables being considered. A backwards-stepwise modelling approach was used to derive the model in the full set of subjects with a biopsy. Initial variables were: hyaluronic acid (HA), P3NP, albumin, international normalised ratio (INR), platelet count (PLT), bilirubin (Bili), alkaline phosphatase (ALP), and alanine transaminase (ALT).
The intermediate logistic regression model (mSTL) was as follows:
predicted probability (p) = exp(HA * 0.015 + P3NP * 0.447 + (PLT * –0.005) – 0.611)/(1 + exp(HA * 0.015 + P3NP * 0.447 + (PLT * –0.005) – 0.611))
Green/amber and amber/red cut-off values were obtained from AUROC analysis. The amber/red cut-off of 0.921 corresponded to 95% specificity (52% sensitivity) for any degree of fibrosis, and the green/amber cut-off of 0.616 corresponded to 90% sensitivity (54% specificity).
The equation above can also be written as:
log(p/(1 – p)) = HA * 0.015 + P3NP * 0.447 – PLT * 0.005 – 0.611.
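As a worked illustration, the model and its cut-offs can be evaluated as in the sketch below. The function and constant names are hypothetical. Two points are assumptions rather than statements from the text: the intercept is taken as –0.611 in both the numerator and the denominator, in line with the logit form, and a probability exactly equal to a cut-off is assigned to the higher grade.

```python
import math

# Coefficients as reported in the text (HA and P3NP in ug/l, PLT in x10^9/l).
COEF_HA = 0.015
COEF_P3NP = 0.447
COEF_PLT = -0.005
INTERCEPT = -0.611   # assumption: the same intercept applies throughout the equation

GREEN_AMBER_CUTOFF = 0.616   # reported as 90% sensitivity for any fibrosis
AMBER_RED_CUTOFF = 0.921     # reported as 95% specificity for any fibrosis


def mstl_logit(ha: float, p3np: float, plt: float) -> float:
    """Linear predictor, i.e. log(p / (1 - p))."""
    return COEF_HA * ha + COEF_P3NP * p3np + COEF_PLT * plt + INTERCEPT


def mstl_probability(ha: float, p3np: float, plt: float) -> float:
    """Predicted probability of any fibrosis; equal to exp(lp) / (1 + exp(lp))."""
    return 1.0 / (1.0 + math.exp(-mstl_logit(ha, p3np, plt)))


def mstl_grade(ha: float, p3np: float, plt: float) -> str:
    """Map the predicted probability to green, amber or red."""
    p = mstl_probability(ha, p3np, plt)
    if p >= AMBER_RED_CUTOFF:      # assumption: boundary values take the higher grade
        return "red"
    if p >= GREEN_AMBER_CUTOFF:
        return "amber"
    return "green"
```

For example, HA 40 μg/l, P3NP 5 μg/l, and platelets 250 ×10⁹/l give a linear predictor of 0.6 + 2.235 – 1.25 – 0.611 = 0.974, a predicted probability of about 0.73, and therefore an amber grade.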
Clinical outcomes were analysed in a separate cohort of 641 subjects out of 1038 in total, in whom objective evidence of the stage of liver fibrosis was not available when the risk algorithm was performed. No data from this cohort were used in model development. In the validation cohort, 53/641 were missing the full dataset. The follow-up period was as follows: validation cohort, mean 41 months (range 13–89 months); entire cohort, mean 46 months (range 13–89 months). Follow-up time and Kaplan–Meier survival curves were calculated from the day of the fibrosis marker test.
Mortality data were obtained from the NHS Strategic Tracing Service (NSTS); other data were extracted from the SUHT computer-based records and medical notes. The date at which oesophageal varices were first found at endoscopy, or ascites first demonstrated on ultrasound, computed tomography (CT), or clinical examination was recorded (). All investigations were part of routine NHS diagnosis and so the incidence of varices and ascites is likely to be an underestimate, but mortality data are comprehensive, as all deaths are recorded by NSTS. Survival and time to varices/ascites were measured from the time of the fibrosis marker test, and patients who died were censored on the day of death.
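For readers unfamiliar with the method, a minimal sketch of the Kaplan–Meier (product-limit) estimate with censoring is given below. It is a generic illustration, not the software used in the study: the function name `kaplan_meier` and its inputs are hypothetical, times are measured from the fibrosis marker test, and subjects censored at the same time as an event are assumed to be still at risk at that time.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, event-free probability) at each distinct event time.

    times  : follow-up from the marker test to the event or to censoring
    events : 1 if the event (e.g. death, first varices) occurred, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1 - n_events / n_at_risk   # product-limit step
            curve.append((t, surv))
        n_at_risk -= n_leaving                  # events and censored both leave the risk set
    return curve
```

With follow-up times of 5, 10, 10, 20, and 30 months and event indicators 1, 0, 1, 1, and 0, the estimated event-free probabilities are approximately 0.8 at 5 months, 0.6 at 10 months, and 0.3 at 20 months.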