Int J Biostat. 2010 January 1; 6(1): Article 7.

Published online 2010 February 20. doi: 10.2202/1557-4679.1213

PMCID: PMC2836212

Peter Bacchetti,^{*} Ross D. Boylan,^{†} Norah A. Terrault,^{‡} Alexander Monto,^{**} and Marina Berenguer^{††}

Copyright © 2010 The Berkeley Electronic Press. All rights reserved


Multistate modeling methods are well-suited for analysis of some chronic diseases that move through distinct stages. The memoryless or Markov assumptions typically made, however, may be suspect for some diseases, such as hepatitis C, where there is interest in whether prognosis depends on history. This paper describes methods for multistate modeling where transition risk can depend on any property of past progression history, including time spent in the current stage and the time taken to reach the current stage. Analysis of 901 measurements of fibrosis in 401 patients following liver transplantation found decreasing risk of progression as time in the current stage increased, even when controlled for several fixed covariates. Longer time to reach the current stage did not appear associated with lower progression risk. Analysis of simulation scenarios based on the transplant study showed that greater misclassification of fibrosis produced more technical difficulties in fitting the models and poorer estimation of covariate effects than did less misclassification or error-free fibrosis measurement. The higher risk of progression when less time has been spent in the current stage could be due to varying disease activity over time, with recent progression indicating an “active” period and consequent higher risk of further progression.

The course of many diseases can be modeled as moving through several distinct states or “stages” using methods known as multistate modeling (Jackson and Sharples, 2002; Jackson et al., 2003). This approach is particularly natural for progression of liver disease called *fibrosis* in patients with hepatitis C virus (HCV), because fibrosis is often measured by rating liver biopsies on a multistage scale ranging from no damage to cirrhosis, with 3 intermediate stages (Desmet et al., 1994). Although some studies have analyzed “fibrosis units per year” by assuming numeric equivalence of the differences between all consecutive stages and a constant rate of progression within each patient (Poynard, Bedossa and Opolon, 1997), a multistate model would likely be more realistic and useful. Standard multistate Markov models, such as those implemented in the msm() function of the msm package for R (http://cran.us.r-project.org/src/contrib/Descriptions/msm.html), have been used for HCV progression modeling (Deuffic-Burban, Poynard and Valleron, 2002; Terrault et al., 2008), but the Markov assumption (that prior history has no effect on disease progression) is questionable. The assumption is particularly undesirable for questions about clinical prognosis, where it is crucial to know whether someone who has progressed slowly in the past is likely to continue progressing slowly. Although a more general model has recently been developed (Lin et al., 2008), the invasiveness and expense of liver biopsies prevent the intensive data collection needed for that method. An additional difficulty is that fibrosis is measured with considerable misclassification (Bacchetti and Boylan, 2009).

This paper describes and applies a method for performing multistate modeling without the typical Markov assumptions about how the process evolves. In particular, we allow the chance of transition from one state to another at any given time to depend on properties of the entire previous history of the process. We begin by describing the methods, and in Section 3 describe a set of data addressing biopsy-measured fibrosis progression among liver transplant recipients who have HCV. We apply the method in Section 4, describe some simulation results in Section 5, and then conclude with discussion.

Multistate Markov models assume that the chance of transitioning to a different stage at any given time depends only on the current stage and covariates. This is convenient because dependence on more complex aspects of the past disease course makes calculations difficult, but the assumptions may be unrealistic. To allow dependence on the previous disease course, we consider a discrete-time model. This will permit calculation of the likelihoods of more complex models by enumeration of every possible time-course of stages that a person could have had, which we term *paths*.
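The enumeration of paths can be illustrated with a minimal Python sketch. It assumes, as in the application in this paper, that the stage never decreases and advances by at most one stage per time step, with stages numbered 0 to 4. The function name `enumerate_paths` and its arguments are hypothetical and are not taken from any package.

```python
MAX_STAGE = 4  # stages run from 0 (no fibrosis) to 4 (cirrhosis)

def enumerate_paths(n_steps, start=0):
    """Return every stage sequence of length n_steps + 1 that begins at
    `start` and, at each time step, either stays in the same stage or
    advances by exactly one stage (assumed restriction)."""
    paths = [(start,)]
    for _ in range(n_steps):
        extended = []
        for path in paths:
            current = path[-1]
            extended.append(path + (current,))          # stay in stage
            if current < MAX_STAGE:
                extended.append(path + (current + 1,))  # progress one stage
        paths = extended
    return paths

paths = enumerate_paths(n_steps=3)
for path in paths:
    print(path)
```

With 3 time steps there are 8 such paths. The count grows rapidly with the number of steps, which is why working through the paths efficiently becomes the main computational concern.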

We consider the evolution of a disease through stages at times *t*_{0}, *t*_{1}, *t*_{2}, . . ., *t*_{N} up to a particular maximum time

*S*_{n} = stage at time *t*_{n}

*P*_{j} = a particular *path* of specific stages, i.e., (*s*_{j0}, *s*_{j1}, *s*_{j2}, . . ., *s*_{jN})

Here, *s*_{jn} is the stage that path

*o*_{m} = observed stage at observation time *m*, *t*_{τ_m}.

For liver biopsy data, we assume *S*_{0}=0, meaning that the liver is assumed to have been fully healthy before the disease process started. Note also that {*τ*_{m}} will generally be a small subset (sometimes just {0,

In general, observations may be misclassified, so we define (mis)classification probabilities

These will usually depend only on the observed and true stages, not directly on *j* or *τ*_{m}. The

To express the likelihood of a particular path, we define

where **x**_{jn} is a vector of covariates that pertain to path

Here, *β*_{k} is a vector of parameters for the transition from stage

The likelihood of the observed stages *and* a given path *P*_{j} can be expressed as

(1)

where **β** includes the parameters for all transitions, and we have assumed that *S*_{0} and *s*_{j}

The likelihood for an entire data set is the product across individuals of their likelihoods, because of the independence assumption noted above. We estimate the parameters **β** by maximum likelihood. Because of the simple form for equation (1), the main difficulty in implementing this strategy is efficiently working through all possible paths. At each given time *t*_{n}, many of the
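As a rough illustration of how one individual's likelihood can be obtained by summing over paths, the following Python sketch multiplies the transition probabilities along each path by the (mis)classification probabilities at the observation times and adds the results. It is a brute-force sketch, not the mspath implementation. The names `hazard`, `misclass`, and `likelihood` are hypothetical, the hazard here depends only on the stage and the number of steps already spent in it, and the constant hazard of 0.5 and the error-free classification matrix are made-up inputs chosen so the result can be checked by hand.

```python
MAX_STAGE = 4  # stages 0 through 4

def enumerate_paths(n_steps, start=0):
    """All paths that stay or advance one stage per step (assumed restriction)."""
    paths = [(start,)]
    for _ in range(n_steps):
        extended = []
        for path in paths:
            extended.append(path + (path[-1],))
            if path[-1] < MAX_STAGE:
                extended.append(path + (path[-1] + 1,))
        paths = extended
    return paths

def path_probability(path, hazard):
    """Probability of a whole path; hazard(stage, steps_in_stage) is the
    chance of progressing at the next step given the history so far."""
    prob = 1.0
    steps_in_stage = 1
    for previous, following in zip(path, path[1:]):
        h = hazard(previous, steps_in_stage) if previous < MAX_STAGE else 0.0
        if following == previous:
            prob *= 1.0 - h
            steps_in_stage += 1
        else:
            prob *= h
            steps_in_stage = 1
    return prob

def likelihood(observations, misclass, hazard, n_steps):
    """Sum over paths of P(path) * P(observed stages | path).
    observations maps a time index to the observed stage;
    misclass[true][observed] is a classification probability."""
    total = 0.0
    for path in enumerate_paths(n_steps):
        contribution = path_probability(path, hazard)
        for time_index, observed in observations.items():
            contribution *= misclass[path[time_index]][observed]
        total += contribution
    return total

# Hypothetical inputs: error-free classification and a constant hazard.
identity = [[1.0 if i == j else 0.0 for j in range(MAX_STAGE + 1)]
            for i in range(MAX_STAGE + 1)]

def constant_hazard(stage, steps_in_stage):
    return 0.5

example_likelihood = likelihood({2: 1}, identity, constant_hazard, n_steps=2)
total_probability = likelihood({}, identity, constant_hazard, n_steps=6)
print(example_likelihood, total_probability)
```

In the first call, the two paths consistent with observing stage 1 at the second time step each have probability 0.25, so the likelihood is 0.5. The second call has no observations and confirms that the path probabilities sum to 1.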

We apply the above methods to an updated and modified version of a data set on post-liver-transplant progression that was previously analyzed by different methods (Berenguer et al., 2000). Transplanted livers are known to be free of hepatitis C and to be at fibrosis stage 0 prior to implantation, which simplifies the analysis compared to the typical situation for chronic hepatitis C, in which the exact time of infection is unknown (Bacchetti et al., 2007). Our data set consists of 446 recipients from four clinical centers who had a total of 1021 biopsies performed following transplant. Each biopsy was assigned an integer fibrosis score ranging from 0, meaning no fibrosis, to 4, meaning cirrhosis. We did not include death as a final stage because the available data did not reliably distinguish death due to the fibrosis disease process from death due to other causes. We excluded patients with hepatitis B or HIV co-infection. Because testing for HCV was not widely available before 1992 and known infection with HCV was required for inclusion in this study, we excluded subjects transplanted before 1992. This was to prevent bias due to potentially incomplete retrospective testing of subjects transplanted earlier who did not survive long enough to be routinely tested. We assume that the true fibrosis stage can only increase over time. We set the end of follow-up, *t*_{N}, to be the time of the last biopsy before antiviral treatment, and we excluded 27 biopsies from 17 patients that occurred after treatment, because successful treatment can result in regression of fibrosis. Table 2 summarizes some characteristics of our study population.
Additional variables evaluated for their possible influence on progression were: recipient and donor race; whether recipient and donor were the same race or the same sex; whether the recipient experienced a rejection episode (treated or untreated), or more than one episode; HCV genotype 1 versus all others; HCV genotype 1b versus all others (including 1a); reported age at first HCV exposure risk; and the elapsed time and estimated rate of fibrosis progression from first HCV exposure risk to transplant.

Table 2. Characteristics of the 446 subjects available for analysis of post-transplant progression of liver fibrosis.

We used a discrete time scale with 4 steps per year. Eleven subjects had observed stages that, if correct, would have required progressing faster than one stage per time step. To make these observations more compatible with our assumption of progressing at most one stage per step, and to reduce their potentially excessive influence, we re-coded those biopsies as occurring one time step later. The time already spent in the current stage is an important time-varying covariate that can relax the Markov assumption of exponential waiting times in a stage, so we evaluated the time in the current stage and its logarithm as potential covariates. To avoid undefined logarithms, we defined time in current stage as 1/8 year for the first time step in a stage, 3/8 year for the second step, and so on. We chose 1/8 because it is half the step size; this also was small enough to allow rapidly changing hazard over the first few time steps in a stage, which enhances the contrast with the raw version of time already spent in stage.
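The coding of time in current stage can be made concrete with a short Python sketch. It assumes 4 steps per year and the midpoint rule described above (1/8 year for the first step in a stage, 3/8 for the second, and so on). The function names are hypothetical.

```python
import math

STEPS_PER_YEAR = 4
STEP_YEARS = 1 / STEPS_PER_YEAR

def time_in_stage_years(step_in_stage):
    """Midpoint coding: step 1 -> 1/8 year, step 2 -> 3/8 year, ..."""
    if step_in_stage < 1:
        raise ValueError("step_in_stage counts from 1")
    return (step_in_stage - 0.5) * STEP_YEARS

def covariates_along_path(path):
    """For each time step on a path, return (stage, time in stage, log time)."""
    rows = []
    previous = None
    step_in_stage = 0
    for stage in path:
        step_in_stage = step_in_stage + 1 if stage == previous else 1
        years = time_in_stage_years(step_in_stage)
        rows.append((stage, years, math.log(years)))
        previous = stage
    return rows

rows = covariates_along_path([0, 0, 1, 1, 1, 2])
for stage, years, log_years in rows:
    print(stage, years, round(log_years, 3))
```

Because the first value is 1/8 rather than 0, the logarithm is always defined, and it changes most quickly over the first few steps in a stage.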

We began by assuming the (mis)classification probabilities shown in Table 1 and evaluating models that included an intercept term for each stage along with time in current stage. Logarithmically transforming time in current stage produced a substantially better fit than using its raw value. We also evaluated total time spent in previous stages and time since transplant as predictors, but these did not fit as well as time already spent in current stage and did not substantially improve the fit when added to it, as measured by the log-likelihood (p=0.15). We then evaluated all other available predictors, selecting those with small Wald p-values to build a multivariate model. The “Overall” column in Table 3 shows estimated odds ratios for the resulting model, and Figure 1 depicts baseline progression risk in that model versus time in stage for a subject at Center 1, with donor age in the 31–50 category, transplanted in 1994, and with no OKT3 use. For each transition, the model estimated decreasing risk of progression over time, with the transition from stage 2 to 3 being an extreme case with an estimated 71% risk of progressing right away (only one time step spent in stage 2) and no subsequent chance of progression. Fitting a Markov model with the same covariates did not fit the data as well (p<0.0001 by likelihood ratio test). No additional covariates appeared to substantially improve the model (all p≥0.13). Indicator variables for HCV genotype 1 (odds ratio 1.04, p=0.80) or genotype 1b (odds ratio 1.00, p=0.99) appeared to have little effect, but HCV genotype was missing for 133 subjects. The odds ratio for recipient age was estimated to be 0.94 per decade (95% CI 0.83 to 1.07, p=0.36). Models for a few of the additional candidate variables did not reach convergence, but the likelihoods reached did not appear promising enough to warrant additional efforts to obtain exact solutions.
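To show how per-step progression risks translate into cumulative risk, the Python sketch below assumes a logistic model in the logarithm of time in stage and accumulates the chance of having progressed after each step. The intercept and slope are hypothetical values chosen only to give a decreasing hazard; they are not estimates from Table 3. The second example sets the risk to 0.71 at the first step and exactly zero afterwards to mimic the degenerate 2 to 3 estimate described above.

```python
import math

def hazard(intercept, slope, years_in_stage):
    """Per-step progression probability under an assumed logistic model
    with log(time in stage) as the only covariate."""
    z = intercept + slope * math.log(years_in_stage)
    return 1.0 / (1.0 + math.exp(-z))

def cumulative_progression(hazards):
    """Chance of having left the stage by the end of each step:
    1 minus the running product of (1 - hazard)."""
    remaining = 1.0
    cumulative = []
    for h in hazards:
        remaining *= 1.0 - h
        cumulative.append(1.0 - remaining)
    return cumulative

# Midpoint-coded times for the first 8 steps (2 years at 4 steps per year).
times = [(k - 0.5) / 4 for k in range(1, 9)]

# Hypothetical coefficients giving risk that declines with time in stage.
declining = [hazard(-2.0, -0.8, t) for t in times]

# Degenerate pattern: all of the risk falls on the first step.
degenerate = cumulative_progression([0.71] + [0.0] * 7)

print([round(h, 3) for h in declining])
print([round(c, 3) for c in degenerate])
```

In the degenerate case the cumulative chance of progression stays at 0.71 no matter how long the subject remains in the stage.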

Table 3. Estimated effects of covariates on the hazard of progression, with effects assumed to be the same for all transitions (first column, “Overall”) or allowed to vary by stage. Missing values for donor age and OKT3 use left 401 subjects with ...

Including total time spent in previous stages as a predictor allows modeling of the possibility that some people inherently progress slower than others—a negative effect indicates those who took longer to reach the current stage also have a reduced risk of progressing further. Adding this variable, however, did not substantially improve the model (p=0.30). (In contrast, adding time already spent in current stage did substantially improve a model that already included total time spent in previous stages (p<0.0001).) More importantly, time in previous stages showed estimates somewhat contrary to inherently slower or faster individual progression rates. Only for the stage 1 to 2 transition was the coefficient in the expected direction, an estimated OR of 0.88 (95% CI 0.65 to 1.19) per additional year in stage 0. For the 2 to 3 transition, an additional year spent in stages 0 and 1 was estimated to *increase* the odds of progression to stage 3 by a factor of 1.80 (0.63 to 5.1), and for the 3 to 4 transition the corresponding OR was 1.05 (0.74 to 1.47). Using the logarithm of time spent in previous stages produced qualitatively similar results.

Table 3 also shows estimated covariate effects when they were allowed to vary by transition. The largest improvement in the log-likelihood was when the effect of Center was allowed to vary (p<0.0001 by likelihood ratio test), although improvements for year of transplant (p=0.038) and OKT3 use (p=0.0004) were also evident. Allowing the effect of donor age to vary by stage resulted in little improvement (p=0.94). With the stage-varying effect of Center, the baseline risk for the 2 to 3 transition was no longer degenerate and was similar to that shown in Figure 1 for the 1 to 2 transition, but in Center 2 the estimate remained degenerate, with certain progression from stage 2 to 3 in one time step in all cases. The estimates for 2 to 3 and 3 to 4 at Center 4 are also fairly extreme, but the wide confidence intervals indicate very little information for those transitions. Some of the Center differences could be due to differential reader effects; in this archival data set, no information on specific readers is available to assess this possibility. The effect of year of transplant is fairly stable except for the larger effect for the 3 to 4 transition. OKT3 is an immunosuppressive agent usually used early following transplant. The increased risk for early transitions is therefore expected, but the possible mechanism of the estimated late protective effects is not clear, and the wide confidence intervals indicate that there may simply be no effect on those later transitions.

We performed additional assumption checks and sensitivity analyses for the model corresponding to the “Overall” column in Table 3 and Figure 1. Adding a quadratic term for year of transplant did not substantially improve the fit to the data (p=0.91). There was some improvement from a quadratic term for donor age (p=0.0066), but this model implied reduced risk in the >65 category (effect intermediate between those of the 11–30 and 31–50 categories); the effects of other covariates remained similar to Table 3. Because some biopsies are performed in response to clinical developments, there is potential for bias to occur due to greater likelihood of having a measurement when disease has recently worsened, so we repeated the Figure 1 model using only routine (“protocol”) biopsies. This resulted in dropping 23 subjects and 172 biopsies, but results were very similar; the only substantial difference was a more steeply declining hazard for the 3 to 4 transition, resulting in about 0.2 less chance of reaching stage 4 by 7 years. We also examined an alternative measurement error assumption with larger misclassification probabilities (Table 3 in Bacchetti and Boylan, 2009). This had similar covariate effects and baseline functions for the 0 to 1 and 1 to 2 transitions. The estimated 2 to 3 transition was an estimated 69% chance of immediate progression, similar to Figure 1, but with ongoing risk resulting in a 96% chance of progression within 1 year (4 time steps) and a >99% chance within 2 years. There was an estimated 21% chance of immediate progression from 3 to 4, higher than in Figure 1, but the hazard declined more steeply and the chance of progression by 7 years only reached 40%.

Finally, we allowed for misclassification probabilities to be estimated as part of the modeling process. This required estimation of 9 misclassification parameters and improved the value of −2 times the log-likelihood (used for likelihood ratio testing) by 16.8, which is not quite enough to meet conventional cutoffs for the Akaike Information Criterion (18, or 2 times the number of additional parameters) or significance testing (p=0.052). In addition, some estimates were implausible. There was a 9% chance of measuring true stage 3 as stage 1, but no chance of measuring it as stage 2, and a 26% chance of measuring it as stage 4. True stage 2 was also estimated to be more likely to be measured too high than too low. This is implausible because one source of error is the biopsy’s missing the most diseased part of the liver, which produces only downward errors (Bacchetti and Boylan, 2009). The estimated covariate effects were very similar to those shown in Table 3, but the baseline hazard for the 3 to 4 transition increased rapidly from near zero, resulting in >99% chance of progression within 2 years.
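The arithmetic behind this comparison can be checked with a small Python sketch. It assumes the usual chi-square reference distribution for the likelihood ratio statistic, with degrees of freedom equal to the 9 added parameters, and evaluates the tail probability with a standard series for the incomplete gamma function so that no external library is needed. The function name `chi2_sf` is hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, computed from the
    series expansion of the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0
    total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return 1.0 - lower

extra_parameters = 9   # misclassification parameters added to the model
improvement = 16.8     # reduction in -2 times the log-likelihood

p_value = chi2_sf(improvement, extra_parameters)
aic_cutoff = 2 * extra_parameters
improves_aic = improvement > aic_cutoff

print(round(p_value, 3), aic_cutoff, improves_aic)
```

The improvement of 16.8 falls short of the AIC cutoff of 18, and the tail probability is about 0.052, in line with the values quoted above.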

In order to focus on reasonably realistic and relevant situations, and to gain insight into difficulties encountered in analyzing the liver transplant data, we performed a limited set of simulations based loosely on the results in the previous section. We generated simulated data sets based on the model shown in Table 3 and Figure 1, with the exception of replacing the degenerate baseline hazard for the 2 to 3 transition with one equal to that shown for the 1 to 2 transition. Each simulated dataset used the same individuals, with their original covariates, as in the real data, but gave them simulated observations. We obtained those by first generating an entire true path through the stages, and then generating observed stages at the same observation times as in the original data, applying various misclassification probabilities to the true stages at those times.
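The two-step generation of simulated observations can be sketched in Python as follows. This is only an illustration of the procedure described above: the hazard function, the misclassification matrix, and the observation times are hypothetical placeholders, not the fitted model or the probabilities in Table 1.

```python
import random

MAX_STAGE = 4

def simulate_true_path(n_steps, hazard, rng):
    """Generate a full true path starting at stage 0, progressing at most
    one stage per step with probability hazard(stage, steps_in_stage)."""
    path = [0]
    steps_in_stage = 1
    for _ in range(n_steps):
        stage = path[-1]
        if stage < MAX_STAGE and rng.random() < hazard(stage, steps_in_stage):
            path.append(stage + 1)
            steps_in_stage = 1
        else:
            path.append(stage)
            steps_in_stage += 1
    return path

def observe(path, observation_steps, misclass, rng):
    """Draw an observed stage at each observation time, using row
    misclass[true_stage] as the distribution of the observed stage."""
    stages = list(range(MAX_STAGE + 1))
    return {t: rng.choices(stages, weights=misclass[path[t]])[0]
            for t in observation_steps}

def example_hazard(stage, steps_in_stage):
    # Hypothetical hazard that declines with time spent in the stage.
    return 0.3 / steps_in_stage

# Hypothetical misclassification matrix; each row sums to 1.
example_misclass = [
    [0.80, 0.20, 0.00, 0.00, 0.00],
    [0.15, 0.70, 0.15, 0.00, 0.00],
    [0.00, 0.20, 0.65, 0.15, 0.00],
    [0.00, 0.00, 0.20, 0.70, 0.10],
    [0.00, 0.00, 0.00, 0.20, 0.80],
]

rng = random.Random(0)
true_path = simulate_true_path(n_steps=20, hazard=example_hazard, rng=rng)
observed = observe(true_path, observation_steps=[4, 12, 20],
                   misclass=example_misclass, rng=rng)
print(true_path)
print(observed)
```

Reusing each subject's actual observation times, as the simulations here did, keeps the amount and spacing of information the same as in the real data.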

Table 4 summarizes results of analyzing 400 simulated data sets under each of four different sets of misclassification assumptions. The first three columns use the correct assumed misclassification rates when fitting the models, i.e., the rates actually used to generate the simulated observations. The rightmost column assumes less misclassification than was used to generate the observations. Failure to converge to a solution with a positive definite estimated covariance matrix did not occur without misclassification but increased with increasing misclassification. Extreme baseline hazard estimates, like that shown in Figure 1 for the stage 2 to 3 transition, also occurred more frequently with increasing misclassification and were more common for the later transitions. The extreme cases had essentially infinite coefficient estimates that precluded assessment of bias and root mean squared error (RMSE). For the covariates, bias was modest, never exceeding 8% of the true coefficient value, and mostly increased with increasing misclassification. RMSE also increased with increasing misclassification. Interestingly, the rightmost column shows that using an over-optimistic misclassification assumption mostly reduced convergence and baseline hazard problems and improved bias and RMSE.
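For reference, bias and RMSE of a coefficient across simulated data sets can be computed as in the Python sketch below. The estimates and the true value are made-up numbers for illustration, and dropping non-finite estimates is an assumption about how extreme fits might be set aside, not a description of the procedure used for Table 4.

```python
import math

def bias_and_rmse(estimates, true_value):
    """Bias (mean estimate minus truth) and root mean squared error,
    computed over the finite estimates only."""
    finite = [e for e in estimates if math.isfinite(e)]
    n = len(finite)
    bias = sum(finite) / n - true_value
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in finite) / n)
    return bias, rmse

true_coefficient = 0.5                                  # hypothetical
estimates = [0.48, 0.55, 0.53, 0.46, float("inf")]      # hypothetical

bias, rmse = bias_and_rmse(estimates, true_coefficient)
relative_bias = bias / true_coefficient
print(round(bias, 4), round(rmse, 4), round(relative_bias, 3))
```

Expressing bias relative to the true coefficient, as in the last line, gives the percentage scale used in the text.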

The method described in Section 2 permits relaxation of the usual Markov assumptions used in multistate models by allowing the risk of transition at any given time to depend on any property of the individual’s entire history up to that time. For modeling progression of biopsy-measured liver fibrosis due to HCV, we are particularly interested in whether slow progression in the past predicts low risk of progression in the future, and assessing this requires evaluation of non-Markov terms in the model.

We evaluated two non-Markov predictors. Allowing progression risk to depend on time already spent in the current stage allows a non-exponential distribution of time in each stage, and dependence on total time spent in all previous stages permits assessment of whether previously slow progression predicts lower current risk of progression. We found for all stages that hazard appears to decrease with longer time already spent in the stage. This could be due to there being periods when the disease is more active than at other times, so that having recently progressed to a stage is associated with active disease and greater risk of further progression. Decreasing hazard could also be caused by frailty selection. For example, risk of further progression upon reaching stage 2 could be very heterogeneous, with some persons certain to progress immediately and others immune from ever progressing. This would produce the type of baseline distribution shown in Figure 1b. Such heterogeneity, however, would also be expected to create an association of longer times in previous stages with lower current risk of progression, which did not appear to hold in our models. The effect of time in previous stages not only appeared weak, but also was often in the wrong direction for frailty effects. We therefore believe our results suggest a dynamic nature of post-transplant HCV disease. An important limitation, however, is that our data are from many years ago and our findings may not extrapolate to current patients, because care has changed, including use of different immunosuppression regimens and greater use of antiviral therapy.

An alternative approach to non-Markov multistate modeling incorporates latent traits (Lin et al., 2008), such as an inherent tendency to progress more slowly or more rapidly than average. This is similar to random effects modeling of continuous outcomes and could also address the clinical issue of whether slow progression in the past predicts low risk in the future. A recent description of such methods (Lin et al., 2008) appears to include too much complexity to be viable for application to our data set and also did not consider misclassification, but a simplified version with a single latent trait that applies to all transitions would likely be viable. Our evaluation of total time in previous stages, however, addresses the same issue and suggests that such a model would not provide additional insight.

The data analyzed here are relatively favorable for multistate modeling of HCV-related fibrosis progression in that the time of the start of the process is known and many of these post-transplant patients were biopsied several times. We nevertheless encountered limitations in the complexity of models that could be successfully fit, particularly when covariate effects were allowed to vary by stage. We also saw increasing technical difficulties, much greater computational burden, and poorer estimation when simulations included realistic amounts of misclassification. For this particular simulated situation, estimating models using overly optimistic misclassification assumptions generally improved performance. Reducing or eliminating misclassification would make our methods more viable, particularly because exact observations greatly reduce the number of possible paths. Eliminating misclassification, however, may not be possible.

An important alternative to liver biopsy is use of non-invasive methods, including some that produce numerical measurements rather than discrete stages (Cross, Antoniades and Harrison, 2008; Manning and Afdhal, 2008; Mehta et al., 2009). Longitudinal modeling of such measures could avoid the technical difficulties we encountered, and they can also be performed more frequently. A potential difficulty with such an approach, however, would be the need to evaluate the continuous analog of the stage-varying effects that we observed in Table 3.

Software for the methods of Section 2 is available in the mspath package for R at http://cran.r-project.org/web/packages/mspath/index.html. The mspath package handles more general models than considered here: it permits an arbitrary transition matrix between states and it allows misclassification to depend on covariates.

^{*}This work was supported by grant R01AI069952 from the United States National Institutes of Health. CIBEREHD is funded by the Instituto de Salud Carlos III. Computations for this study were performed using the UCSF Biostatistics High Performance Computing System.

- Bacchetti P, Boylan R. Estimating Complex Multi-State Misclassification Rates for Biopsy-Measured Liver Fibrosis in Patients with Hepatitis C. International Journal of Biostatistics. 2009;5:5.
- Bacchetti P, Tien PC, Seaberg EC, et al. Estimating past hepatitis C infection risk from reported risk factor histories: implications for imputing age of infection and modeling fibrosis progression. BMC Infectious Diseases. 2007;7:145. doi: 10.1186/1471-2334-7-145.
- Berenguer M, Ferrell L, Watson J, et al. HCV-related fibrosis progression following liver transplantation: increase in recent years. Journal of Hepatology. 2000;32:673–684. doi: 10.1016/S0168-8278(00)80231-7.
- Cross T, Antoniades C, Harrison P. Non-invasive markers for the prediction of fibrosis in chronic hepatitis C infection. Hepatology Research. 2008;38:762–769. doi: 10.1111/j.1872-034X.2008.00364.x.
- Desmet VJ, Gerber M, Hoofnagle JH, Manns M, Scheuer PJ. Classification of chronic hepatitis - diagnosis, grading and staging. Hepatology. 1994;19:1513–1520. doi: 10.1002/hep.1840190629.
- Deuffic-Burban S, Poynard T, Valleron AJ. Quantification of fibrosis progression in patients with chronic hepatitis C using a Markov model. Journal of Viral Hepatitis. 2002;9:114–122. doi: 10.1046/j.1365-2893.2002.00340.x.
- Jackson CH, Sharples LD. Hidden Markov models for the onset and progression of bronchiolitis obliterans syndrome in lung transplant recipients. Statistics in Medicine. 2002;21:113–128. doi: 10.1002/sim.886.
- Jackson CH, Sharples LD, Thompson SG, Duffy SW, Couto E. Multistate Markov models for disease progression with classification error. Journal of the Royal Statistical Society Series D-the Statistician. 2003;52:193–209. doi: 10.1111/1467-9884.00351.
- Lin HQ, Guo ZC, Peduzzi PN, Gill TM, Allore HG. A Semiparametric Transition Model with Latent Traits for Longitudinal Multistate Data. Biometrics. 2008;64:1032–1042. doi: 10.1111/j.1541-0420.2008.01011.x.
- Manning DS, Afdhal NH. Diagnosis and quantitation of fibrosis. Gastroenterology. 2008;134:1670–1681. doi: 10.1053/j.gastro.2008.03.001.
- Mehta SH, Lau B, Afdhal NH, Thomas DL. Exceeding the limits of liver histology markers. Journal of Hepatology. 2009;50:36–41. doi: 10.1016/j.jhep.2008.07.039.
- Poynard T, Bedossa P, Opolon P. Natural history of liver fibrosis progression in patients with chronic hepatitis C. Lancet. 1997;349:825–832. doi: 10.1016/S0140-6736(96)07642-8.
- Terrault NA, Im K, Boylan R, et al. Fibrosis Progression in African Americans and Caucasian Americans With Chronic Hepatitis C. Clinical Gastroenterology and Hepatology. 2008;6:1403–1411. doi: 10.1016/j.cgh.2008.08.006.

