Biol Blood Marrow Transplant. Author manuscript; available in PMC 2013 December 18.
Published in final edited form as:
PMCID: PMC3867133

A review of multistate models in hematopoietic cell transplantation studies


Multi-state models are used to describe how a patient’s clinical status evolves over time. They can be a useful tool for describing the complex post transplant recovery process. In this paper we review basic concepts of multistate models and describe how they are used to model transition rates or state probabilities. We also discuss the use of landmark analysis to summarize how a patient’s long-term prognosis changes as they transition to different health states. We illustrate these uses of multistate models on a dataset with outcomes after bone marrow transplantation for patients with Severe Aplastic Anemia.


Patients experience a number of complications after a hematopoietic cell transplant (HCT). For example, patients may fail to recover neutrophil or platelet counts, they may develop acute or chronic GVHD or both, and they may relapse from their disease. All of these complications may lead to death, while patients may also die of other causes. Accurate modeling of the likelihood of experiencing these complications can provide insight into the transplant recovery process and guide clinical monitoring of a patient’s status. Multi-state models are a useful tool for describing the post transplant recovery process. In this paper we review several key applications of multistate models and illustrate them on a dataset from the Center for International Blood and Marrow Transplant Research (CIBMTR) with outcomes after unrelated donor bone marrow transplantation (BMT) for 375 patients with Severe Aplastic Anemia (SAA).

Overview of multi-state models

In a multistate model, the state X(t) represents a patient’s clinical status at a particular time point t post transplant, chosen from a set of possible states. In a diagram of the model, the arrows between states indicate the transitions that can occur, each with its own rate. Andersen and Keiding (2002) provide an introduction to event history analysis through multi-state models. As an example, consider the dataset of outcomes after BMT for patients with SAA. One concern post transplant is that the donor cells will fail to engraft and repopulate the recipient’s immune system; this can manifest as either lack of initial engraftment or recovery of neutrophils, or as secondary graft failure when the patient’s neutrophil counts drop after initial recovery. This process can be shown as a multi-state model, where a patient starts post transplant in state 0 (alive without neutrophil recovery), and from there can either go to state 1 (alive with neutrophil recovery) or die prior to engraftment (state 2). Once they are in state 1, they can die or they can experience secondary graft failure (state 3), and from state 3 they can further progress to death. This multistate model is summarized in Figure 1, which also shows the numbers of patients experiencing each transition as well as the number at risk.

Figure 1
Multistate model for engraftment, secondary graft failure, and death after BMT for SAA

Some states are absorbing states, from which no further transition is possible, while the remaining ones are transient states; in this example only state 2 is an absorbing state. Note that survival data and competing risks data are both special cases of multistate models; for survival data there is one absorbing state (death) and one transient state (alive), while for competing risks data there are multiple absorbing states (failures from each of several causes) and one transient state (alive and failure free). Models for survival data and competing risks data are reviewed for a clinical BMT audience in Klein et al. (2001) and Logan et al. (2006), so we do not discuss them further here; rather we focus on the utility and applications of more complex multistate models.
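The distinction between absorbing and transient states can be made concrete with a small sketch. The dictionary layout and function names below are illustrative choices, not part of the original analysis, and the cause labels in the competing risks example are hypothetical.

```python
# Transition structure of the engraftment model (states as numbered in the text).
# The representation (dict of from-state -> set of to-states) is an assumption
# made for illustration only.
ENGRAFTMENT_MODEL = {
    0: {1, 2},   # alive without neutrophil recovery -> recovery or death
    1: {2, 3},   # alive with neutrophil recovery -> death or secondary graft failure
    2: set(),    # dead: no way out
    3: {2},      # secondary graft failure -> death
}

# Survival data: one transient state and one absorbing state.
SURVIVAL_MODEL = {"alive": {"dead"}, "dead": set()}

# Competing risks: one transient state and several absorbing states.
# The two cause labels are hypothetical examples.
COMPETING_RISKS_MODEL = {
    "alive and failure free": {"relapse", "death in remission"},
    "relapse": set(),
    "death in remission": set(),
}


def absorbing_states(model):
    """Return the states that have no outgoing transitions."""
    return sorted(state for state, targets in model.items() if not targets)


def transient_states(model):
    """Return the states that have at least one outgoing transition."""
    return sorted(state for state, targets in model.items() if targets)


print(absorbing_states(ENGRAFTMENT_MODEL))   # absorbing states of the example model
print(transient_states(ENGRAFTMENT_MODEL))   # transient states of the example model
```

Writing the model down this way makes it easy to check that a proposed state structure has the intended absorbing states before any rates are estimated.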

A multistate model is often described by the transition intensities or rates, hij(t). These tell you the likelihood of being in state j tomorrow given that you are in state i today (at time t post transplant). Alternatively, one may be interested in describing the probability of being in state j at time t given that you are in state i at time s, denoted Pij(s,t). This probability is a function of the transition intensities for all possible paths between the two states and it also accounts for the time interval between s and t. Finally, one may be interested in describing the probability of being in a particular state j at a particular time t post transplant. This is given by the state probability Pj(t) = P0j(0,t), where state 0 refers to the immediate post transplant state. Often multistate models are assumed to have a Markov property, which means that the transition intensity hij(t) only depends on the patient’s history through the state they are in at the time t, X(t). Less restrictive assumptions include semi-Markov models, where the transition intensity depends on the amount of time they spent in that state, or non-Markov models, where the transition intensity depends on the entire path that they took to get to their current state.
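To illustrate how transition probabilities follow from transition intensities under the Markov assumption, the following sketch approximates Pij(s,t) numerically as a product of many small-step matrices (I + h(u) du). The intensity values are invented for illustration and are not estimates from the SAA data; the function names are likewise hypothetical.

```python
import numpy as np

N_STATES = 4  # states 0-3 of the engraftment example


def intensity_matrix(u):
    """Hypothetical transition intensities per month at time u.

    The numbers are made up for illustration and happen to be constant in u;
    a real analysis would plug in estimated, time-varying intensities.
    """
    h = np.zeros((N_STATES, N_STATES))
    h[0, 1] = 1.50   # recovery of neutrophils
    h[0, 2] = 0.10   # death before engraftment
    h[1, 2] = 0.02   # death after engraftment
    h[1, 3] = 0.03   # secondary graft failure
    h[3, 2] = 0.08   # death after secondary graft failure
    # Diagonal entries are minus the total rate of leaving each state.
    np.fill_diagonal(h, -h.sum(axis=1))
    return h


def transition_probabilities(s, t, n_steps=2000):
    """Approximate the matrix of Pij(s, t) by a product of (I + h(u) du)."""
    du = (t - s) / n_steps
    prob = np.eye(N_STATES)
    for k in range(n_steps):
        u = s + k * du
        prob = prob @ (np.eye(N_STATES) + intensity_matrix(u) * du)
    return prob


# State probabilities Pj(t) = P0j(0, t): the first row of P(0, t).
state_probs_12_months = transition_probabilities(0.0, 12.0)[0]
print(state_probs_12_months)
```

Each row of the resulting matrix sums to one, and the row for the absorbing state puts all its mass on that state, which is a useful sanity check on any fitted model.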

Modeling covariate effects

Often researchers are interested in understanding the effects of covariates on post transplant outcomes. There are a number of ways in which the effect of covariates can be modeled. An Andersen-Aalen-Johansen Markov model assumes that each transition rate can be modeled using a Cox proportional hazards model

$$h_{ij}(t \mid Z) = h_{ij,0}(t)\,\exp(\beta_{ij} Z),$$

where $h_{ij,0}(t)$ is the baseline intensity for the transition from state i to state j, so that the effect of covariate Z on a particular transition rate can be interpreted as a hazard ratio, $\exp(\beta_{ij})$. Note that a separate Cox model must be estimated for each transition, and this may be difficult to do, particularly for modest sized datasets. Sometimes simplifying assumptions are made to stabilize the models by pooling the data across several transitions. One of these is particularly common when analyzing the effect of a time dependent covariate, such as whether the patient has experienced GVHD, on a subsequent outcome such as relapse. Here the transitions into the relapse state from a state with GVHD and a state without GVHD are assumed to have proportional hazards relative to one another, so that the transition intensity into the relapse state simply depends on whether or not the patient has experienced GVHD by time t,

$$h_{\mathrm{relapse}}(t \mid Z) = h_{0}(t)\,\exp\{\beta Z + \gamma\, I(T_{\mathrm{GVHD}} \le t)\},$$

where $T_{\mathrm{GVHD}}$ is the time of GVHD onset and $I(T_{\mathrm{GVHD}} \le t)$ is 1 if the patient has experienced GVHD by time t and 0 otherwise. The hazard ratio for relapse for a patient who has experienced GVHD compared to a similar patient (same covariates Z) who has not experienced GVHD is given by exp(γ).
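One common way to supply a time dependent GVHD indicator to Cox regression software is to split each patient's follow-up at the GVHD onset time into (start, stop] intervals. The sketch below shows that data preparation step only; the function name and record layout are hypothetical, and the source does not state how the data were prepared.

```python
def split_at_gvhd(follow_up, relapse, gvhd_time=None):
    """Split one patient's follow-up at the time of GVHD onset.

    follow_up : time of relapse or censoring
    relapse   : 1 if the patient relapsed at follow_up, 0 if censored
    gvhd_time : time of GVHD onset, or None if GVHD was never observed

    Returns a list of (start, stop, gvhd, event) rows, where gvhd is the
    value of the indicator of prior GVHD on the interval (start, stop].
    """
    # No GVHD before the end of follow-up: a single interval without GVHD.
    if gvhd_time is None or gvhd_time >= follow_up:
        return [(0.0, follow_up, 0, relapse)]
    # GVHD present from the start: a single interval with GVHD.
    if gvhd_time <= 0:
        return [(0.0, follow_up, 1, relapse)]
    # Otherwise: one interval before GVHD (no event) and one after it.
    return [
        (0.0, gvhd_time, 0, 0),
        (gvhd_time, follow_up, 1, relapse),
    ]


# A patient who develops GVHD at month 2 and relapses at month 9.
for row in split_at_gvhd(9.0, 1, gvhd_time=2.0):
    print(row)
```

With the data in this form, the coefficient of the `gvhd` column in a Cox model with delayed entry plays the role of γ above.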

Alternatively, one may be interested in the effect of covariates on the transition probabilities or state probabilities rather than the transition rates. One approach is to combine the various Cox models for each transition into a combined model for the transition probability; however, the effect of the covariate in such a model is complex and difficult to interpret. Alternatively, a number of researchers have proposed methods for direct modeling of state probabilities, including pseudo-value regression (Andersen et al., 2002) and direct binomial regression (Scheike and Zhang, 2006). In both cases, one assumes that the state probability is related to the covariates through a link function g(·), so that g(Pj(t)) = βZ.

To estimate the parameters using pseudovalue regression, first estimate the probability of being in state j at time t from the complete data set, denoted $\hat{P}_j(t)$, and the same state probability from the data set obtained by deleting patient i, denoted $\hat{P}_j^{(-i)}(t)$. Then the pseudovalue for patient i at time t is given by the difference $Y_{ij}(t) = n\hat{P}_j(t) - (n-1)\hat{P}_j^{(-i)}(t)$. When there is no censoring, $Y_{ij}(t)$ reduces to a simple indicator of whether the ith subject is in state j at time t. State probabilities can be estimated using the Product Integral relationship between transition rates and state probabilities; alternatively, for transient states, Pepe (1991) proposed writing the state probability as a difference in Kaplan-Meier estimates. See Andersen and Klein (2007) for details.

Once these pseudovalues are computed for each individual and time point on a prespecified grid of time points, they are used as the dependent variables in the model above to examine the effects of covariates on outcome. Parameter estimates and standard errors are obtained using generalized estimating equations (Liang and Zeger, 1986), and may be computed using the SAS procedure GENMOD for example.
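The delete-one construction can be sketched for the simplest case, a two-state survival model in which the state probability of interest is the probability of being alive at time t, estimated by Kaplan-Meier. This is a minimal illustration with made-up data, not the code used for the analyses in this paper; the function names are hypothetical.

```python
import numpy as np


def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of P(T > t); events: 1 = event, 0 = censored."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    surv = 1.0
    for u in np.unique(times[(events == 1) & (times <= t)]):
        at_risk = np.sum(times >= u)
        n_events = np.sum((times == u) & (events == 1))
        surv *= 1.0 - n_events / at_risk
    return surv


def pseudo_values(times, events, t):
    """Delete-one pseudo-values n*P(t) - (n-1)*P^(-i)(t) for each patient."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    n = len(times)
    full = kaplan_meier(times, events, t)
    index = np.arange(n)
    values = np.empty(n)
    for i in range(n):
        keep = index != i
        values[i] = n * full - (n - 1) * kaplan_meier(times[keep], events[keep], t)
    return values


# Made-up data without censoring: pseudo-values reduce to indicators of T > 6.
toy_times = [2.0, 5.0, 7.0, 9.0, 12.0]
toy_events = [1, 1, 1, 1, 1]
print(pseudo_values(toy_times, toy_events, t=6.0))
```

With censored observations the pseudo-values are no longer restricted to 0 and 1, but they can still be used as the dependent variable in a generalized estimating equation fit.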

While many choices of the link function are possible, we mention three explicitly. First, the logistic link function g(x) = log(x/(1 − x)) gives results analogous to logistic regression, so that we can interpret exp(β) for a binary covariate as the odds ratio for the likelihood of being in state j at time t for patients with the factor vs. those without the factor. Alternatively, a complementary log-log link function g(x) = log(−log(x)) or an identity link function g(x) = x are also often considered.
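The three link functions, written exactly as in the text, can be sketched together with their inverses. The numerical example (a state probability of 0.6 and an odds ratio of 0.5) is invented purely to show how exp(β) is read under the logistic link.

```python
import math


def logit(p):
    """Logistic link g(x) = log(x / (1 - x))."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def cloglog(p):
    """Complementary log-log link in the form given in the text, log(-log(x))."""
    return math.log(-math.log(p))


def inv_cloglog(eta):
    return math.exp(-math.exp(eta))


def identity(p):
    """Identity link g(x) = x; its inverse is itself."""
    return p


# Hypothetical numbers: probability 0.6 without the factor, odds ratio 0.5.
p_without = 0.6
beta = math.log(0.5)
p_with = inv_logit(logit(p_without) + beta)
print(round(p_with, 4))
```

Under the logistic link the odds of being in state j are multiplied by exp(β), so the probability itself changes by an amount that depends on the starting value; under the identity link β is a direct difference in probabilities.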

This approach of direct modeling of state probabilities has been used to model the current leukemia free survival probability (Andersen and Klein, 2007), as well as other probabilities in an illness-death model where aGVHD is the transitional illness (Andersen et al., 2002). Here we apply it to the SAA example described in the previous section. One outcome which may be of direct interest to clinicians is the probability of being alive and engrafted (state 1) as a function of time; this combines both primary recovery and secondary graft failure into one summary endpoint describing a positive result. Note that the probability of being in state 1 can be written as the probability of being in state 0 or 1 minus the probability of being in state 0. Therefore, an estimate of the probability of being alive and engrafted is provided by a difference in the Kaplan-Meier estimates

$$\hat{P}_1(t) = \hat{S}_{2,3}(t) - \hat{S}_{1,2}(t),$$

where $\hat{S}_{2,3}(t)$ is the Kaplan-Meier estimate treating transitions to states 2 or 3 as events and $\hat{S}_{1,2}(t)$ is the Kaplan-Meier estimate treating transitions to states 1 or 2 as events. An estimate of the marginal probability of being alive and engrafted is given in Figure 2.
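The difference of Kaplan-Meier estimates can be checked on a toy example. The five patients below are invented and have only administrative censoring at 24 months, so the estimate can be compared with a direct count of patients in state 1; the variable and function names are hypothetical.

```python
import numpy as np


def kaplan_meier(times, events, t):
    """Kaplan-Meier estimate of P(T > t); events: 1 = event, 0 = censored."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    surv = 1.0
    for u in np.unique(times[(events == 1) & (times <= t)]):
        at_risk = np.sum(times >= u)
        n_events = np.sum((times == u) & (events == 1))
        surv *= 1.0 - n_events / at_risk
    return surv


def prob_alive_engrafted(leave0, leave0_event, fail, fail_event, t):
    """Estimate P1(t) as S_{2,3}(t) - S_{1,2}(t)."""
    s_23 = kaplan_meier(fail, fail_event, t)       # no transition to state 2 or 3 yet
    s_12 = kaplan_meier(leave0, leave0_event, t)   # still in state 0
    return s_23 - s_12


# Invented data (months). leave0: time of first transition to state 1 or 2.
# fail: time of first transition to state 2 or 3, censored at 24 months.
leave0 = [1.0, 1.0, 0.5, 2.0, 1.5]
leave0_event = [1, 1, 1, 1, 1]
fail = [4.0, 24.0, 0.5, 8.0, 24.0]
fail_event = [1, 0, 1, 1, 0]

# At 6 months, three of the five patients are alive and engrafted.
print(prob_alive_engrafted(leave0, leave0_event, fail, fail_event, t=6.0))
```

With censored data the direct count is no longer available, which is why the estimate is built from the two Kaplan-Meier curves.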

Figure 2
Probability of being alive with neutrophil recovery as a function of time post transplant

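The pseudo-value for P1(t) is obtained using the delete one estimator as

$$Y_{i1}(t) = n\,[\hat{S}_{2,3}(t) - \hat{S}_{1,2}(t)] - (n-1)\,[\hat{S}_{2,3}^{(-i)}(t) - \hat{S}_{1,2}^{(-i)}(t)],$$

where the superscript $(-i)$ denotes the Kaplan-Meier estimate computed with patient i deleted.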
We can use these pseudo-values to directly model the probability of being alive and engrafted as a function of covariates, including age at transplant, gender, Karnofsky performance score, HLA matching of donor and recipient, and graft versus host disease prophylaxis. We use pseudo-values at 3 time points (3, 6, and 12 months), and a logistic link function. The results of the regression model for survival with engraftment, expressed as odds ratios for each of these covariates, are given in Table 1. HLA mismatch is significantly associated with worse survival with engraftment, and patients age 21–40 have significantly better survival with engraftment compared to those <=20 or >40. Other risk factors are not significantly associated with the probability of being alive with neutrophil recovery.

Prediction and Landmark Analysis

Another important use of multi-state models is to provide predictions of how a patient’s prognosis may change over time, as they transition through different states. Klein and Shu (2002) use multistate models to describe how the probability of dying in remission within two years can be predicted based on whether a patient has recovered their platelets and/or developed acute GVHD by a particular time. They also illustrate the notion of innovation gain, which is the difference in predicted probabilities between patients in two states and reflects the effect of having a specific condition on outcome. Van Houwelingen and Putter (2008) propose a simplified model to obtain long-term survival predictions based on a patient’s status at various landmark times. At each landmark time s, they assume a proportional hazards model for the risk of death at subsequent times given the patient’s current status X(s), e.g. alive with or without platelet recovery and with or without prior aGVHD,

$$h(t \mid X(s), s) = h_0(t \mid s)\,\exp\{\beta(s)\,X(s)\}, \qquad t \ge s.$$

Since the parameters are expected to vary slowly over the landmark times, they model the baseline hazard function h0(t|s) and the log hazard ratio β(s) as smooth functions of the landmark time s. The data for each landmark time can then be stacked on top of one another in the dataset, and standard software that allows delayed entry can be applied to estimate the model parameters.
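The stacking step can be sketched as follows for the SAA state structure. This is an illustration under stated assumptions, not the authors' code: the record layout, the function names, and the choice of a fixed 24 month prediction horizon with administrative censoring are all assumptions made here.

```python
def state_at(s, engraft_time, sgf_time):
    """State occupied at landmark s by a patient who is alive at s."""
    if sgf_time is not None and sgf_time <= s:
        return 3  # secondary graft failure
    if engraft_time is not None and engraft_time <= s:
        return 1  # alive with neutrophil recovery
    return 0      # alive without neutrophil recovery


def stack_landmarks(patients, landmarks, horizon=24.0):
    """Build one row per patient per landmark at which the patient is at risk.

    Each row has delayed entry at the landmark time and is administratively
    censored at the horizon (an assumed fixed prediction window).
    """
    if any(s >= horizon for s in landmarks):
        raise ValueError("landmark times must be earlier than the horizon")
    rows = []
    for s in landmarks:
        for p in patients:
            if p["time"] <= s:
                continue  # died or censored before the landmark
            rows.append({
                "id": p["id"],
                "landmark": s,
                "entry": s,
                "exit": min(p["time"], horizon),
                "event": int(p["died"] == 1 and p["time"] <= horizon),
                "state": state_at(s, p["engraft_time"], p["sgf_time"]),
            })
    return rows


# Invented patients (times in months).
patients = [
    {"id": 1, "engraft_time": 1.0, "sgf_time": 4.0, "time": 10.0, "died": 1},
    {"id": 2, "engraft_time": 1.0, "sgf_time": None, "time": 30.0, "died": 0},
    {"id": 3, "engraft_time": None, "sgf_time": None, "time": 0.5, "died": 1},
]
stacked = stack_landmarks(patients, landmarks=[3.0, 6.0])
for row in stacked:
    print(row)
```

Because the same patient contributes several rows, standard errors from a model fitted to the stacked data would normally need a robust (sandwich) adjustment.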

For the SAA example, we use the landmarking method of Van Houwelingen and Putter to show the probability of 2 year mortality for patients alive and in each of the three states at various landmark times post transplant. The curves in Figure 3 refer to a patient with age <=20, KPS <90 and HLA matched to the donor. Note that there are few patients alive and in state 0 more than 1 month post transplant, so this state is not particularly informative for such a landmark analysis. However, the plot shows that the probability of dying by 2 years for a patient who is alive with neutrophil recovery at 3 months is less than 20%, compared to over 50% for a patient who is alive but has experienced secondary graft failure. At 6 months, these probabilities of dying by 2 years change to 10% and 35% respectively for patients without and with secondary graft failure. These plots help give a clinical interpretation of the impact of secondary graft failure on long term outcome to augment hazard ratios from a model.

Figure 3
Probabilities of dying within 2 years for patients in various states at each landmark time point

Other applications of multistate models

Multistate models have been used in a number of other settings with BMT applications. Lee et al. (1997) and Cutler et al. (2004) used Markov multistate models to facilitate allogeneic transplant vs. no transplant decision analysis for CML and MDS patients, respectively. For example, in Cutler et al. (2004), they considered several transplantation strategies including decisions of proceeding directly to BMT or proceeding to transplant only after progression to AML. Patient health states included being alive with MDS, alive with AML, BMT, alive after BMT, AML or MDS relapse, and death. For these kinds of decision analyses, several data sources are typically needed to estimate the various transitions present, particularly since the transitions may be pre or post transplant. Because these different data sources are being merged, one should be cautious about potential selection bias for the transplant and non-transplant data. Once the transition rates for the model are estimated, mean life years and quality adjusted life years can then be compared between the different strategies.

Quality adjusted life years have been studied as a multistate model in other settings as well. For example, Tunes da Silva and Klein (2009) propose using pseudo-values to fit direct regression models for mean quality adjusted life years.

Another application of multistate models is estimation of the prevalence of a particular intermediate condition among survivors. For example, Klein and Shu (2002) discuss various methods of estimating the prevalence of chronic GVHD at time t, defined as the probability a subject has cGVHD at time t given the subject is alive at time t.
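In terms of state probabilities, this definition of prevalence can be written as a ratio. The notation below is an assumption introduced for illustration: $\mathcal{A}$ denotes the set of states in which the patient is alive, and $P_{\mathrm{cGVHD}}(t)$ the probability of being alive with cGVHD at time t.

$$\mathrm{Prev}(t) = P\{X(t) = \mathrm{cGVHD} \mid X(t) \in \mathcal{A}\} = \frac{P_{\mathrm{cGVHD}}(t)}{\sum_{j \in \mathcal{A}} P_j(t)}$$

For instance, if 30% of all transplanted patients are alive with cGVHD at time t and 60% are alive in any state, the prevalence among survivors would be 0.30/0.60 = 0.50; these numbers are hypothetical.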


Multi-state models provide a flexible framework for understanding clinical events occurring post transplant and their impact on outcome. They can be used to estimate and model clinically relevant quantities such as the current leukemia free survival or the probability of being alive with engraftment, as well as to provide information on how a patient’s current state affects their long-term prognosis. However, they do require a greater level of detail in data collection due to the number of transitions between states that need to be captured. They also may require greater sample sizes in order to model various transitions adequately. These caveats aside, recent statistical techniques developed for multistate models have helped to make these a promising area of clinical research in BMT.

Table 1
Pseudovalue regression model for the probability of being alive with engraftment


  • Klein JP, Shu Y. Multi-state models for bone marrow transplantation studies. Statistical Methods in Medical Research. 2002;11:117–139.
  • Tunes da Silva G, Klein JP. Regression analysis of mean quality adjusted survival time based on pseudo-observations. Statistics in Medicine. 2009;28:1054–1066.
  • Lee SJ, Kuntz KM, Horowitz MM. Unrelated donor bone marrow transplantation for chronic myelogenous leukemia: A decision analysis. Annals of Internal Medicine. 1997;127:1080–1088.
  • Cutler CS, Lee SJ, Greenberg P, et al. A decision analysis of allogeneic bone marrow transplantation for the myelodysplastic syndromes: delayed transplantation for low-risk myelodysplasia is associated with improved outcome. Blood. 2004;104:579–585.
  • Van Houwelingen HC, Putter H. Dynamic predicting by landmarking as an alternative for multi-state modeling: an application to acute lymphoid leukemia data. Lifetime Data Analysis. 2008;14:447–463.
  • Andersen PK, Klein JP. Regression analysis for multi-state models based on a pseudo-value approach with applications to bone marrow transplantation studies. Scandinavian Journal of Statistics. 2007;34:3–16.
  • Andersen PK, Klein JP, Rosthoj S. Generalised linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika. 2002;90:15–27.
  • Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
  • Scheike TH, Zhang M-J. Direct modeling of regression effects for transition probabilities in multi-state models. Scandinavian Journal of Statistics. 2006;34:17–32.
  • Klein JP, Rizzo JD, Zhang M-J, Keiding N. Statistical methods for the analysis and presentation of the results of bone marrow transplants. Part 2: Regression modeling. Bone Marrow Transplant. 2001;28:1001–1011.
  • Logan BR, Zhang M-J, Klein JP. Regression models for hazard rates versus cumulative incidence probabilities in bone marrow transplant data. Biology of Blood and Marrow Transplantation. 2006;12:107–112.
  • Andersen PK, Keiding N. Multi-state models for event history analysis. Statistical Methods in Medical Research. 2002;11:91–115.
  • Pepe MS. Inference for events with dependent risks in multiple end point studies. Journal of the American Statistical Association. 1991;86:770–778.