Primary biliary cirrhosis (PBC) and primary sclerosing cholangitis (PSC) are infrequent autoimmune cholestatic liver diseases that, disproportionately to their incidence and prevalence, remain very important causes of morbidity and mortality for patients with liver disease. Mechanistic insights spanning genetic risks and biological pathways to liver injury and fibrosis have led to a renewed interest in developing therapies beyond ursodeoxycholic acid that are aimed at both slowing disease course and improving quality of life. International cohort studies have facilitated a much greater understanding of disease heterogeneity, and in so doing highlight the opportunity to provide patients with a more individualized assessment of their risk of progressive liver disease, based on clinical, laboratory, or imaging findings. This has led to a new approach to patient care that focuses on risk stratification (both high and low risk) and, furthermore, allows such stratification tools to help identify the patient subgroups most likely to benefit from inclusion in clinical trials. In this article, we review the applicability and validity of risk stratification in autoimmune cholestatic liver disease, highlighting strengths and weaknesses of current and emergent approaches. (Hepatology 2016;63:644–659)
Primary biliary cirrhosis (PBC) and primary sclerosing cholangitis (PSC) are chronic autoimmune cholestatic liver diseases, for which clinical outcome is largely dictated by development of cirrhosis, portal hypertension (PH), and variable predisposition to malignancy.1, 2, 3, 4 Rates of clinical progression vary, and accurately identifying disease course is of critical importance to patients, clinicians, and those committed to developing new, effective, and affordable treatments.5 Patients seek reassurance and guidance as to their own prognosis, and clinicians wish to confidently recognize those at highest risk of poor outcomes just as they strive to reassure individuals with good prognosis. Partnerships with industry are essential to drug development, and collectively all those involved in clinical trial design, recruitment, and analysis wish to understand unmet need and conduct studies of new therapies as carefully constructed interventions that deliver Specific, Measurable, Achievable, Relevant and Time‐cost limited outputs. Such ventures seek to “de‐risk” drug development pathways where possible, but maximize opportunity to advance therapy for patient benefit in a timely way.
Herein, we present an appraisal of existing parameters that stratify individuals with PBC and PSC, before examining the effectiveness and applicability of more incipient classification systems (Fig. 1). The strengths and weaknesses of various approaches are highlighted specifically throughout, as well as more generally with regard to study design (Table 1).
The full appreciation of the breadth of PBC as a disease has evolved as awareness has risen, particularly given widespread access to anti‐mitochondrial antibody (AMA) testing, reactivity of which in the presence of cholestasis facilitates robust and timely patient identification without need for histological confirmation.6, 7 PBC is increasingly identified at an earlier precirrhotic stage,8 and well‐conducted multicenter cohort studies have aided in the recognition of variant presentations (Table 2), including male patients and women age <50 years.9 Ursodeoxycholic acid (UDCA) is the only approved therapy, with diminished disease progression evident in treated patients and significantly improved 10‐year transplant‐free survival (78% vs. 66%; P < 0.001).3, 8, 10, 11, 12 Pooled survival indices nevertheless remain lower than age‐ and sex‐matched control populations.10, 11, 12
Modeling the clinical course of PSC, in contrast to PBC, is far more testing, perhaps inevitably so given a lower incidence and absence of a defined serological marker. This is paralleled by a clinical phenotype driven by variable, unpredictable consequences related to chronic inflammation, fibrosis and neoplasia of medium‐ to large‐sized bile ducts. In the largest population‐based study to date (n = 590), disease was validated as being male predominant (~60%), with a median age at diagnosis of ~40 years.13 However, PSC can develop at any age, with younger patients frequently manifesting a hepatitic presentation.14 Associations with inflammatory bowel disease (IBD) are well recognized and ~70% of PSC patients have a history of colitis, which confers a 5‐fold greater risk of colonic cancer relative to IBD alone, as well as increased susceptibility to cholangiocarcinoma (CCA) independent of liver disease stage. PSC portends a standardized mortality ratio (SMR) more than 4‐fold that of a matched control population, although there is discrepancy between event‐free survival (EFS) times across transplant centers versus true population‐based cohorts (median, 13.2 vs. 21.3 years; P < 0.001).13 Population‐level data thus highlight significant challenges to prognostic modeling and unmask the inadequate phenotypic representation of early‐stage disease and inherent selection bias with tertiary‐center‐restricted reporting.
Pruritus and fatigue are frequent symptoms associated with cholestasis15 and approximately 60% of patients with PBC are asymptomatic at diagnosis, with as few as 5% remaining symptom free over time.16 The prognostic importance of fatigue in PBC is contentious, but concern is perhaps best highlighted in the prospective cohort study from Jones et al. (n = 136),12 wherein transplant‐free survival (TFS) was significantly shorter among fatigued patients relative to nonfatigued, disease‐matched controls (56% vs. 74%; P < 0.0001), independent of UDCA provision. Although a consensus biological explanation for fatigue is lacking, presenting age and sex heavily influence the clinical phenotype, with young women (a group failing UDCA therapy more commonly) having the greatest symptom burden.9, 17 However, there is no evidence that symptomatic presentations impart additional discriminatory value to existing risk‐prediction models.
Symptomatic presentations in PSC similarly vary (36%‐56%), with over 20% developing symptoms de novo during follow‐up.18, 19, 20 Relapsing‐remitting episodes of acute cholangitis are a frequent concern; and data from several cohorts suggest symptomatic presentations carry poorer TFS and malignancy‐free survival.18, 20 One third of CCA are diagnosed within the first year of PSC presentation (annual incidence thereafter: 0.5%‐1.5%; lifetime risk: 7%‐15%),13, 18 and patients often report abdominal pain preceding diagnosis, particularly those with a prolonged history of IBD (>1 year).18, 21
Serum bilirubin is well established as a predictor of outcome and incorporated into several prognostic scoring systems.22, 23 However, “time‐constrained” models, such as the Mayo score, which include bilirubin together with other markers of cirrhosis, are limited to prediction of short‐term survival (<2 years) in relatively late‐stage disease. A potentially more applicable surrogate is serum alkaline phosphatase (ALP); and in the largest ever meta‐analysis of individual patient data (n = 4,845), a near log‐linear relationship was illustrated between ALP and subsequent risk of transplantation/death across several time points.8 This study demonstrated that ALP bestows prognostic information early in the clinical course, incremental to the predictive power of bilirubin and independent of follow‐up time, presenting age, sex, disease stage, and treatment status.
To this effect, several studies illustrate strong associations between percentage reduction or absolute decreases/normalization in serum ALP (in isolation or combination with other biochemical covariates) and significantly improved clinical outcome.10, 11, 24, 25 Indeed, the majority who successfully attain predefined biochemical thresholds 1‐2 years after UDCA treatment (13‐15 mg/kg/day) experience survival patterns akin to that of an age‐ and sex‐matched control population (Table 3A). All response criteria have been independently and externally validated, with Paris I capturing the greatest breadth of biochemical changes. Furthermore, there is clear, negative prognostic impact of biochemical nonresponse on future hepatocellular carcinoma (HCC) risk in PBC patients, independently and additive to the effects posed by male sex and advanced baseline disease stage.2
Although a small proportion of PBC patients with early‐stage disease meet response criteria free of therapy,26 this represents an understudied population, and presently, it is not possible to identify individuals likely to have a good prognosis regardless of intervention. Conversely, paradigms reliant on waiting 1 year for therapeutic evaluation may leave high‐risk patients (future nonresponders) on a medical treatment lacking benefit and reduce the impact of second‐line therapy because of delayed initiation. In this regard, a prospective study from China suggests that attainment rates, as well as predictive value, are identical when biochemical response is assessed at 6 versus 12 months (Table 3B),27 but this needs validation.
Population‐level and international multicenter studies have substantiated the predictive performance of biochemical response criteria, independently of disease stage and UDCA exposure.9, 28 Perhaps most notable is the UK‐PBC study (n = 2,353), which not only recognized an increasing prevalence of younger presenting women (25% age <50), but also an inverse correlation of patient age and likelihood of meeting biochemical response.9 Attainment rates were reportedly ≤50% in women age below 40 and echo results of an earlier, single‐center study wherein age <55 years conferred poorer relative survival compared to matched controls (SMR, 7.4).29 Younger women often present with more pronounced elevations in serum ALP17 but frequently fail therapy owing to transaminase elevations,9 possibly reflecting a more hepatitic phenotype. This is noteworthy given that the degree of interface activity is recognized to influence disease progression in PBC.10, 14, 30, 31 The impact of presenting age was less apparent in men,9 who, despite being older at diagnosis, exhibited greater frequency of nonresponse overall, possibly reflecting more advanced baseline fibrosis at presentation.32
The strong influence of presenting age may allow more timely stratification of at‐risk groups (preceding assessment of 12‐month biochemical response), who, because of a relatively poor predicted survival, would be potentially eligible for early clinical trial entry. However, the more opportune recognition of at‐risk individuals must ensure that low‐risk patients are not over treated,27 particularly given that 50% of all patients under 50 do indeed meet current biochemical response criteria on UDCA.9
Existing biochemical response criteria remain to be refined, with a subgroup of responders still at risk of developing adverse events. There is evidence that reduction in hepatic venoportal gradient while on UDCA treatment associates with improved TFS in PBC, stratifying through a 20% gradient decline over 2 years.3 Conversely, the presence of gastroesophageal varices (GEVs) is a poor prognostic factor4; and given that PH can develop in the absence of cirrhosis secondary to presinusoidal resistance, several algorithms for prediction of GEVs are proposed. Although advocated for guiding variceal surveillance, such models carry preselection bias, given that study populations from which they derive were included after endoscopy referral. Moreover, no current strategy allows noninvasive discrimination of clinically significant PH.
With regard to patient survival, performance characteristics of the aspartate aminotransferase (AST)/platelet ratio index (APRI) have been ascertained given ability to infer not only PH, but also fibrosis.3, 28 When applied at baseline or at 1 year, APRI was identified as an independent predictor of TFS across a tertiary center population (n = 386), with a discriminatory cutpoint of 0.54 externally validated in three international cohorts.28, 33 Moreover, 1‐year APRI identified the subgroup at risk of disease progression and earlier mortality despite successful attainment of biochemical response (Table 3C), indicating independent and additive prognostic information to existing criteria.28, 34, 35
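As a minimal sketch of how the index discussed above is computed, the snippet below uses the widely cited APRI definition, (AST / upper limit of normal for AST) × 100 / platelet count in 10^9/L, together with the 0.54 cutpoint quoted in the text. The function names, the example laboratory values, and the choice of a strict "greater than" comparison at the cutpoint are illustrative assumptions and are not taken from the cited studies.

```python
APRI_CUTPOINT = 0.54  # cutpoint quoted in the text


def apri(ast_u_per_l: float, ast_uln_u_per_l: float,
         platelets_10e9_per_l: float) -> float:
    """AST/platelet ratio index: (AST / ULN of AST) * 100 / platelets (10^9/L)."""
    if ast_uln_u_per_l <= 0 or platelets_10e9_per_l <= 0:
        raise ValueError("ULN and platelet count must be positive")
    return (ast_u_per_l / ast_uln_u_per_l) * 100.0 / platelets_10e9_per_l


def apri_above_cutpoint(value: float, cutpoint: float = APRI_CUTPOINT) -> bool:
    # Assumption: values strictly above the cutpoint are flagged as higher risk.
    return value > cutpoint


# Hypothetical patients (values invented for illustration only).
higher = apri(ast_u_per_l=60, ast_uln_u_per_l=40, platelets_10e9_per_l=150)
lower = apri(ast_u_per_l=30, ast_uln_u_per_l=40, platelets_10e9_per_l=250)
print(round(higher, 2), apri_above_cutpoint(higher))
print(round(lower, 2), apri_above_cutpoint(lower))
```

Because APRI uses only routine laboratory values, it can in principle be recalculated at baseline and again at 1 year, the two time points described above.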
Newer, highly complex, and robust computational algorithms incorporating facets of APRI in addition to conventional biochemical response parameters have recently been published. These scoring systems derive from large, multicenter cohorts as part of UK‐PBC as well as the Global PBC Study Group34, 35 and convey probability of TFS on a continuous, as opposed to dichotomous, scale (area under the receiver operator curve [AUROC]: >0.9). In addition to being internally validated, the latter in particular has been compared against a healthy age‐ and sex‐matched control population, demonstrating comparable prognostic performance to Paris‐I + APRI.35 However, it remains uncertain above what point patients will be deemed high risk enough for clinical trial stratification, how the modifier effects of UDCA on risk score will influence outcome (delta change), and which additional stratifiers will continue to retain independent clinical impact.
Serum bilirubin is inherent to many historic PSC prognostic models, including the disease‐specific Mayo score.36 Despite widespread application, the series from which the latter derives antedates modern management of variceal bleeding and receives further criticism given its inability to foreshadow adverse events (AEs) in previous clinical trials.37 Although a persistently elevated bilirubin for >3 months incites concern for hepatobiliary malignancy,18 levels have a propensity to fluctuate with flares of cholangitis and are potentially influenced by biliary interventions.
There is no proven survival advantage, or reduction in hepatobiliary/colorectal malignancy risk, for PSC patients receiving UDCA, and an increased predisposition toward AEs is well documented with high dosages (28‐30 mg/kg/day).1, 5 Several groups have nevertheless attempted construction of “ALP‐based” biochemical response criteria (Table 4),38, 39, 40, 41, 42, 43 but ultimately, each has failed cross‐validation at the originally conceived time points. For instance, the cutpoint of 1.5× the upper limit of normal (ULN) proved discriminatory at 2 years in the Oxford cohort (irrespective of UDCA receipt40), but was only predictive when applied at 6 and 12 months in the Heidelberg and national UK series, respectively. Moreover, in only one published study has the predictive value of ALP as a continuous variable been confirmed before establishing utility through dichotomization43; however, full statistical methodology was not presented and clinical endpoints were incorrectly assessed as time‐constrained events.
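To make the time-point dependence of a dichotomized ALP threshold concrete, the following sketch expresses serum ALP as a multiple of the ULN and applies the 1.5× ULN cutpoint at 6, 12, and 24 months. The ULN, the serial ALP values, the function names, and the use of a strict "less than" comparison are all hypothetical choices made for illustration; none of them is drawn from the cohorts cited above.

```python
def alp_multiple_of_uln(alp_u_per_l: float, uln_u_per_l: float) -> float:
    """Express serum ALP as a multiple of the laboratory upper limit of normal."""
    if uln_u_per_l <= 0:
        raise ValueError("ULN must be positive")
    return alp_u_per_l / uln_u_per_l


def below_cutpoint(alp_u_per_l: float, uln_u_per_l: float,
                   multiple: float = 1.5) -> bool:
    # Assumption: "meeting the threshold" means ALP strictly below multiple x ULN.
    return alp_multiple_of_uln(alp_u_per_l, uln_u_per_l) < multiple


# Hypothetical ULN and serial ALP values (U/L), keyed by months since baseline.
HYPOTHETICAL_ULN = 130.0
hypothetical_alp_by_month = {6: 250.0, 12: 180.0, 24: 140.0}

status_by_month = {
    month: below_cutpoint(alp, HYPOTHETICAL_ULN)
    for month, alp in hypothetical_alp_by_month.items()
}
print(status_by_month)
```

In this invented example the same patient is classified differently at 6 months than at 12 or 24 months, which is one reason a cutpoint validated at one time point cannot be assumed to transfer to another.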
Systematic efforts to validate the prognostic utility of serum ALP in PSC therefore remain in their infancy, and none of the studies thus far incorporate a comparator control group. Therefore, it is difficult to infer what an improved serum ALP truly means, given that “PSC biochemical responders” may still benefit from trials of new therapy if survival significantly deviates from the healthy population. Spontaneous normalization has been reported in up to ~40% of patients;38 and whereas this may indicate a slowly progressive form of disease, based on available evidence ALP cannot be recommended as a stand‐alone stratifier of risk in PSC.
Unlike AMA, which holds no prognostic value,8, 9, 28 there exist several anti‐nuclear antibody (ANA) subtypes that may associate with adverse clinical outcome in PBC. Baseline anti‐gp210 reactivity imparts over a 6‐fold risk of progression to liver failure/transplantation44 and although neither independent nor additive to biochemical response12, 30 may assist in the earlier, prospective identification of high‐risk patients.27, 44 Anti‐centromere antibodies similarly associate with PH,44 although more often present in autoimmune connective tissue disease. Extrahepatic autoimmunity develops in ~60% of PBC patients; however, impact on liver‐related outcomes is not readily apparent.45
Between 9% and 15% of PSC patients have raised serum immunoglobulin subclass 4 (IgG4) values,46, 47, 48, 49 and at least three separate studies support clinical distinctions based on elevations; those having higher than normal values (>1.4 g/L) exhibiting greater derangements in liver biochemistry.46, 47, 48 One group identified shorter median time to transplantation in patients harboring elevated serum IgG4,48 although this observation has repeatedly failed replication across several international centers.49 Therefore, the stratifying properties of serum IgG4 in PSC remain unsubstantiated and require further evaluation.
Several historic studies suggest that the presence of colitis influences liver disease progression. However, many were flawed given their assessment of IBD as a time‐fixed covariate; and the chronological displacement of disease presence and activity between gut and liver manifestations impart significant difficulties in examining colitis as a risk stratifier. Nevertheless, in a prospective follow‐up of nearly 200 PSC patients, all hepatobiliary malignancies were observed to develop on a background of concurrent colitis, with no cancers in the absence of IBD.50 Moreover, TFS independent of CCA was also significantly different between groups (23% vs. 80%; P = 0.045). The negative prognostic impact of colitis on liver‐related outcomes has since been confirmed in a large Dutch PSC cohort (n = 161) as well as two population‐based series.13, 51, 52, 53
Several cholangiographic prognostic models derived from endoscopic retrograde cholangiographic (ERC) appearances have been proposed54; however, diagnostic paradigms have evolved and no correlation between severity of ductal involvement and survival through two‐dimensional magnetic resonance cholangiography (MRC) was demonstrated. Nevertheless, a promising study utilizing annual three‐dimensional MRC to score liver parenchymal appearances, PH and bile duct lesions predicted radiological progression from baseline with high accuracy (AUROC, >0.8).55 Sixty percent of patients developed evolving changes over ~4 years, and preliminary data indicate baseline radiological score to be a highly sensitive prognosticator of clinical outcome, with the most predictive components relating to parenchymal as opposed to ductal changes.56
Dominant strictures (DS) were originally defined based on historical ERC findings, and consensus opinion as to how such lesions are to be classified noninvasively is yet to be delivered. Observational studies report a presenting frequency of 12%‐60%,57, 58 with no population‐level indications of true incidence. Natural history data are similarly restricted to specialist centers, with reduced survival largely reflecting difficulties in CCA recognition.18, 50, 58, 59 However, more recent reports suggest actuarial TFS as significantly poorer irrespective of cancer development and heavily influenced by presence of colitis.50, 60 Several investigators report biochemical and clinical improvements after endoscopic therapy,61 but the prognostic impact of intervention needs assessment.
Small duct PSC (sdPSC) represents 10%‐15% of the disease spectrum, with affected individuals less often symptomatic.62 There is now well‐validated evidence that disease progression is relatively infrequent, occurring over a longer time period than the classical form.13, 63, 64 Although colitis manifests to a similar degree there is little to suggest an impact on liver‐related outcomes; and given that survival patterns mirror those of an age‐ and sex‐matched population, the need for investigative therapy is perhaps less perceptible in those with the small duct variant.
Disease identification in PBC and PSC is largely reliant on serology and cholangiography, respectively, in the appropriate clinical and biochemical context. Nevertheless, liver biopsy is invaluable in cases of diagnostic doubt and provides key information with regard to disease activity and severity that may improve predictive power of existing algorithms.31
Several contemporary histological systems have emerged for PBC,65, 66 with the aim of accurately representing interface activity, ductopenia, chronic cholestasis, and fibrotic indices—variables well known to forecast biochemical nonresponse and clinical outcome (Table 5).9, 10, 14, 25, 30, 31, 67, 68 Common histological changes in PSC include interface activity, ductopenia and concentric periductal fibrosis, although individual prognostic weightings are unclear, and no disease‐specific classification exists. Nevertheless, data extrapolated from the Dutch population‐based registry (n = 64) indicate that scoring through PBC‐based classification systems, as well as lobular fibrosis stage (Ishak), significantly associates with time to transplantation in PSC patients.69
Histology remains the gold standard for assessing fibrosis progression, a clear determinant of clinical outcome. However, the invasiveness of biopsy, coupled with well‐known sampling variability and discordant reporting in cholestatic disease, has fostered development of several noninvasive surrogates (Table 6). In the current clinical climate, histological stratification holds limited routine applicability, although staging systems and evaluation of prognosis‐related histological lesions may have a place as surrogate endpoints in clinical trials, a topic beyond the scope of this review.
The accuracy of vibration controlled transient elastography (VCTE) in fibrosis staging has been demonstrated in at least two large PBC cohorts,70, 71 with prognostic capabilities independent of biochemical response evident in a recent single‐center retrospective study of 150 patients.70 Though VCTE outperforms APRI as well as several noninvasive surrogates of fibrosis, it remains unclear whether the former confers additive discrimination to biochemical response. The prognostic impact of liver stiffness measurement (LSM) in PSC has also recently been described,72 and as with previous descriptors, LSM correlated well with degree of liver fibrosis but performed best at extremes of histological stage (≤F1 and ≥F3). More striking was the observation that increased baseline measurements and rate of change in LSM were strongly and independently linked with PSC‐specific clinical events.72
LSM, in addition to reflecting severity of fibrosis, can also be influenced by extrahepatic cholestasis and may not necessarily capture disease facets, such as hepatic necroinflammatory activity, ductopenia and PH. Nevertheless, encouraging data from existing series strongly support VCTE‐derived LSM—absolute values as well as fluctuations over time—as major predictors of AEs. Given correlations with mortality and liver transplantation (LT) in PBC and PSC, VCTE may represent a generic surrogate in chronic cholestatic liver disease, and prospective validations as part of multicenter collaborative efforts continue to emerge.
The enhanced liver fibrosis (ELF) score bears similar prognostic utility to histological fibrosis staging in PBC,73 although akin to VCTE, additive predictive value to biochemical response has not been demonstrated. More recent focus on the stratifying properties in PSC led to a notable publication by the Norwegian Study Group. Therein, patients exhibited significantly divergent TFS curves according to tertile distribution, or through a dichotomous Youden‐index‐derived cutpoint.74 Moreover, ELF score correlated well with elastography and provided incremental prognostic utility to Mayo risk. However, one caveat is the relatively short disease duration experienced by transplant‐free survivors (median, 0.2 years) and of further uncertainty is how dynamic fluctuations impact outcome longitudinally. Nevertheless, this study represents the first noninvasive, externally validated serum biomarker panel in PSC.
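For readers unfamiliar with how a Youden-index-derived cutpoint is obtained, the sketch below selects the threshold that maximizes J = sensitivity + specificity − 1 over a set of candidate values. It is a simplified illustration that treats outcome as a binary event and ignores censoring and follow-up time, which a survival analysis such as the one described above would need to handle. The scores and outcomes are invented, and the function name is hypothetical.

```python
def youden_cutpoint(scores, events):
    """Return (cutpoint, J) maximising J = sensitivity + specificity - 1.

    A score greater than or equal to the cutpoint counts as test-positive.
    `events` holds 1 for patients with the outcome and 0 for those without.
    """
    positives = sum(events)
    negatives = len(events) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one event and one non-event")

    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        true_pos = sum(1 for s, e in zip(scores, events) if e and s >= cut)
        true_neg = sum(1 for s, e in zip(scores, events) if not e and s < cut)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # ties keep the lowest candidate cutpoint
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical biomarker scores and binary outcomes (illustration only).
hypothetical_scores = [8.5, 9.0, 9.8, 10.4, 11.1, 11.9]
hypothetical_events = [0, 0, 0, 1, 0, 1]
cutpoint, j_statistic = youden_cutpoint(hypothetical_scores, hypothetical_events)
print(cutpoint, j_statistic)
```

A cutpoint chosen this way is optimized on the data that produced it, so its apparent discrimination tends to be optimistic until it is tested in an independent cohort.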
Biochemical nonresponders represent the most readily identifiable at‐risk group in PBC, and incorporating a step‐wise algorithm with response criteria as the central feature is likely to capture the greatest breadth of individuals who will benefit from clinical trials (Fig. 2A). Validation at interim time points for groups who commonly experience treatment failure is urgently needed and may assist in the earlier identification of high‐risk patients. Along similar lines, prospective banking of biological materials with paired long‐term clinical follow‐up data could yield predictive markers from the point of diagnosis through interrogation of key pathways underlying nonresponse. The few PBC patients who endure AEs despite attainment of response remain poorly defined, but are increasingly recognized28, 35; and the additional impact of “biochemical escape” (wherein previous responders develop subsequent elevations in laboratory parameters) is yet to be explored. The additive predictive value of histology and its noninvasive surrogates to existing criteria also requires further validation in a manner similar to that presented for APRI, in addition to newer biochemical response criteria with dynamic predictive capabilities.31, 34, 35
By contrast, safe discrimination of risk phenotypes in PSC is not possible through early application of a single modality, and timely assessment requires harnessing multiple predictive techniques collectively (Fig. 2B). Despite invasiveness of histological stratification, the advent of VCTE and related biomarkers holds promise, although predictive performance is best at stages of advanced fibrosis, implying surrogacy toward disease stage rather than severity, and prospective validation is still awaited. Present biochemical surrogates are far from robust, and it is crucial for future endeavors to secure appropriate control groups before stratifying PSC patients as low risk based on serum ALP alone, particularly given that 20% of UDCA‐treated patients with normal laboratory values still develop progressive disease.38 Further efforts are also needed to appraise the relative independence of existing parameters that stratify risk, both consequentially and concurrently.
Patients with PBC and PSC remain a heterogeneous cohort with concerns surrounding reliable outcome forecasting. Stratification paradigms are shifting with increased efforts toward recognition of at‐risk phenotypes. The increased utilization of such tools, both clinically and in trial settings, is hoped to allow for more personalized care. In so doing, low‐risk patients can be reassured and managed accordingly, whereas higher‐risk individuals are offered tailored care, as well as access to carefully designed trials relevant to their disease course.
Additional Supporting Information may be found in the online version of this article at http://onlinelibrary.wiley.com/doi/10.1002/hep.28128/suppinfo.
Potential conflict of interest: Nothing to report.
G.M.H. has received funding from the NIHR Birmingham Liver Biomedical Research Unit and is a coinvestigator for UK‐PBC (www.uk‐pbc.com) supported by a Stratified Medicine Award from the UK Medical Research Council and principal investigator for UK‐PSC, a NIHR Rare Disease Translational Collaboration. P.J.T. has received funding from the NIHR Liver Biomedical Research Unit and is recipient of a Wellcome Trust Clinical Research Fellowship. The views expressed are those of the author(s) and not necessarily those of the National Health System, the National Institute for Health Research, or the Department of Health.