Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC5364951

Formats

Article sections

Authors

Related links

Stat Med. Author manuscript; available in PMC 2017 March 24.

Published in final edited form as:

PMCID: PMC5364951

NIHMSID: NIHMS852145

Mortality rates are probably the most important indicator for the performance of kidney transplant centers. Motivated by the national evaluation of mortality rates at kidney transplant centers in the United States, we seek to categorize the transplant centers based on the mortality outcome. We describe a Dirichlet process model and a Dirichlet process mixture model with a half-cauchy prior for the estimation of the risk-adjusted effects of the transplant centers, with strategies for improving the model performance and interpretability as well as classification ability. We derive statistical measures and create graphical tools to rate transplant centers and identify outlying centers with exceptionally good or poor performance. The proposed method was evaluated through simulation, and then applied to assess kidney transplant centers from a national organ failure registry.

Monitoring and tracking the performance of health care providers, such as hospitals, nursing homes, dialysis facilities or surgical wards, ensures the delivery of high quality care to the vulnerable patient population [1]. This article is in response to the urgent need for the evaluation of kidney transplant centers in the United States with respect to their mortality rates after transplantation. The data include patients in the Scientific Registry of Transplant Recipients (SRTR) who received their kidney from 2008 to 2011. A total of 56455 kidney transplants were performed at 242 transplant centers.

There is a large amount of literature describing methods for the evaluation of center performances and identification of outlying centers with extremely good or poor performance. The application includes a variety of data: (standardized) mortality, proportions, counts of adverse events, categorical data or continuous data measuring quality of life. Some examples of using parametric approaches can be found in [2, 3, 4, 5]. Among these articles, Liu *et al.* [2] and Jones and Spiegelhalter [3] used a normal hierarchical (random effects) model for the center effects. As we know, random effects models improve estimation by borrowing information across transplant centers, and thus shrinking estimates of the center effects toward the overall mean and leading to a reduced variation of the estimates. However, the smaller variance is achieved at the cost of bias and inappropriate shrinkage could prevent the centers with exceptionally good or poor performance from being identified. For this reason, He et al. [5] and Kalbfleisch and Wolfe [4] prefer a model with center effects being considered as fixed, leading to independent (no shrinkage) center estimates. It seems that a desirable model would combine the advantages of both the fixed and random effects models, in the sense that it would allow borrowing strength across similar centers, but avoid shrinking outlying centers towards the population mean.

Moreover, in both random or fixed effects models, it is not immediately clear how unusual centers, i.e., any with exceptionally good or poor performance, can be identified. An common strategy is to measure the deviation of each transplant center relative to the population average using a p-value or an (adjusted) Z-score derived from an assumed parametric or empirical null distribution [6, 4]. However, ideally a model would provide an in-built diagnostic measure for centers with unusual outcomes.

Ohlssen *et al.* [7] applied a Dirichlet process (DP) model and a Dirichlet process mixture (DPM) model to the problem of hospital comparisons using mortality rates. These Bayesian non-parametric approaches satisfies the above requirements. They efficiently accommodates outlying centers and allows for a more flexible distribution of center effects with the possibility of skewness and multimodality. Furthermore, the embedded clustering feature in Dirichlet process models provides inherent diagnostic measures to identify outlying centers. However, Ohlssen *et al.* [7] only considered binary outcome data. In our application, however, the majority of data are censored; as of Jan 31, 2013, 93% patients were still alive.

In view of limited work in the survival setting, we propose a Dirichlet process (DP) survival model and a Dirichlet process mixture (DPM) survival model. Our survival come is defined from the time of kidney transplantation to death; patients who are alive at the last follow-up time point were considered to be right centered. Center effects are represented as random effects (frailties) in a Cox proportional hazard model. In the Cox model, we include important patient-level characteristics and dealt with missing data. Due to the large number of transplants (*>* 50000) and the large dimension of patient-level covariates, Kalbfleisch and Wolfe [4] used a two-stage approach to obtain the risk-adjusted center effects. In the firs stage, they estimated patient-level covariates from a Cox model stratified by transplant centers; in the second stage, they derived center effects by fixing the covariate effects obtained from the first stage. However, we propose a fully Bayesian approach, in which we use a Gibbs algorithm that alternates between (1) updating effects of covariates with a Metroplis-Hasting algorithm conditional on estimated center effects, and then (2) updating center effects conditional on estimated covariate effects using a DP or DPM model. To our knowledge, this would be the first attempt to apply such an approach to evaluate survival outcomes of nationwide transplant centers, the results of which may be of national significance.

The remaining of the article is organized as follows. Section 2 describes a DP prior and a DPM prior for modelling center effects in a Cox proportional hazard model, proposes strategies to improve model performance and creates graphical tools to evaluate centers. Section 3 presents simulation studies and investigates shrinkage effects for data with different clustering structures. Section 4 illustrates the analysis on the Kidney transplant data. Section 5 is the concluding discussion.

The data are denoted by {(*t _{ij}*,

Under the proportional hazards model, we have

$$\lambda ({t}_{ij})={\lambda}_{0}({t}_{ij})\phantom{\rule{0.16667em}{0ex}}\text{exp}\{\alpha {x}_{ij}+{\beta}_{i}\},$$

where *β _{i}* is effect for center

$$L(\alpha ,\mathit{\lambda},\beta )=\prod _{i=1}^{N}\prod _{j=1}^{{n}_{i}}f({\mathit{N}}_{ij},{x}_{ij},{\mathbf{\Delta}}_{ij};\alpha ,\mathit{\lambda},\beta )$$

and

$$f({\mathit{N}}_{ij},{x}_{ij},{\mathbf{\Delta}}_{ij};\alpha ,\mathit{\lambda},\beta )=\prod _{k=1}^{K}exp\{-exp\{log{\lambda}_{k}+\alpha {x}_{ij}+{\beta}_{i}\}{\mathrm{\Delta}}_{\mathit{ijk}}\}{\{exp\{log\{{\lambda}_{k}\}+\alpha {x}_{ij}+{\beta}_{i}\}{\mathrm{\Delta}}_{\mathit{ijk}}\}}^{{N}_{\mathit{ijk}}},$$

(1)

where *N** _{ij}* = (

The gamma process is used as a prior for the cumulative baseline hazard function Λ_{0} [10], i.e.,
${\mathrm{\Lambda}}_{0}~\mathcal{G}\mathcal{P}({c}_{0}{\mathrm{\Lambda}}_{0}^{\ast},{c}_{0})$, where
${\mathrm{\Lambda}}_{0}^{\ast}$ is often assumed to be a known parametric function. For example,
${\mathrm{\Lambda}}_{0}^{\ast}=\eta {y}^{{k}_{0}}$ corresponds to the Weibull distribution, and *c*_{0} represents the degree of confidence in this prior guess.

In this article the attention is focused on modelling random effects (frailties) of *β*_{1}, · · ·, *β _{N}*. Often the random effects
${\beta}_{i}^{\prime}s$ (or

$$\begin{array}{lll}\hfill {\beta}_{1},\cdots ,{\beta}_{N}& ~\hfill & G\hfill \\ \hfill G& ~\hfill & \text{DP}(a,{G}_{0}),\hfill \end{array}$$

where *G*_{0} corresponds to a best guess for *G* as a priori and *a* expresses confidence in this guess.

The stick-breaking representation [11] implies that *G ~* DP(*a*, *G*_{0}) is equivalent to

$$G=\sum _{h=1}^{\infty}{\pi}_{h}{\delta}_{{\beta}_{h}},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{\beta}_{h}~{G}_{0},\phantom{\rule{0.16667em}{0ex}}\text{and}\phantom{\rule{0.16667em}{0ex}}\sum _{h=1}^{\infty}{\pi}_{h}=1,$$

(2)

where *π _{h}* =

One potential issue with this representation of a mixture of point mass is that it assumes a discrete distribution for the random effects so that different centers in a cluster have exactly the same random effect values. It may be more realistic to assume that centers in a cluster have similar, but not identical, random effect values. To accomplish this, let ${\beta}_{i}~N({\mu}_{h},{\sigma}_{h}^{2})$ and $({\mu}_{h},{\sigma}_{h}^{2})~G$. That is, (2) becomes

$$G=\sum _{h=1}^{\infty}{\pi}_{h}\mathbf{N}({\mu}_{h},{\sigma}_{h}^{2}),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}({\mu}_{h},{\sigma}_{h}^{2})~{G}_{0},\phantom{\rule{0.16667em}{0ex}}\text{and}\phantom{\rule{0.16667em}{0ex}}\sum _{h=1}^{\infty}{\pi}_{h}=1$$

(3)

In this case, the random distribution, *G*, is characterized as a DP mixture (DPM) of normals [13]. A mixture of normals allow a flexible continuous random-effects distribution of the center effects. (Readers can refer to a book by Dunson [12] for a detailed review of the DP and DPM model).

Recent research has focussed on using the constructive definition of the DP to produce practical MCMC algorithms [14]. The principle is to approximate the full process by truncating the DP(M) at a maximum number of components *H*, so that

$$G=\sum _{h=1}^{H}{\pi}_{h}{\delta}_{{\beta}_{h}}\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\text{in}\phantom{\rule{0.16667em}{0ex}}\text{DP}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}G=\sum _{h=1}^{H}{\pi}_{h}\mathbf{N}({\mu}_{h},{\sigma}_{h}^{2})\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\text{in}\phantom{\rule{0.16667em}{0ex}}\text{DPM}$$

Closely related to *a*, a large *H* provides an accurate approximation to the full DP(M) but requires a large computation effort. Strategies have been proposed to specify *H*[15, 7]. Since we are interested in detecting a few subgroups, i.e., two groups below and two above the population average, we fix *H* = 5 and find that *H* = 5 works very well for both the cases studied in the simulation and the kidney transplant data.

In PD model *G*_{0} is often chosen to have a normal distribution, i.e.,
${G}_{0}~N({\mu}_{0},{\sigma}_{0}^{2})$. The hyperparemters (*μ*_{0},
${\sigma}_{0}^{2}$) can be fixed, or assigned a normal-inverse gamma hyperprior. A hyperprior would allow the base distribution having unknown mean and variance and provide a shrinkage of center effects towards the overall mean. In DPM model we assumed that
${\mu}_{h}~N({\mu}_{0},{\sigma}_{0}^{2})$, with a normal hyperprior for *μ*_{0} and a Half-Cauchy prior for *σ*_{0} [16], i.e.,
$f({\sigma}_{0})\propto {(1+{({\scriptstyle \frac{{\sigma}_{0}}{A}})}^{2})}^{-1}$, with a smaller *A* indicating a stronger prior information and a greater shrinkage. This Cauchy prior behaviors well for a small number of components (clusters), such as *H* = 5, and it restricts
${\sigma}_{0}^{2}$ away from very large values and have better behavior near zero, compared to the inverse-gamma family [16]. We also assume that
$1/{\sigma}_{h}^{2}~\text{Gamma}({e}_{0},{f}_{0})$ and fix hyperparemters (*e*_{0}, *f*_{0}) to be weakly informative, since fully non-informative priors are not possible in a mixture context [17].

In parametric hierarchical models, it is a standard practice to place a mean constraint on the latent variable distribution for sake of identifiability and interpretability [18, 19]. In this article we center the Dirichlet process to have zero mean. Following Yang *et al.* [18], we estimate the mean of the process,
${\mu}_{G}^{m}$, at the *m ^{th}* MCMC iteration as

$${\mu}_{G}^{m}=\sum _{h=1}^{H}{V}_{h}^{m}\prod _{l<h}(1-{V}_{l}^{m}){\beta}_{h}^{m},$$

where
${V}_{h}^{m}$ and
${\beta}_{h}^{m}$ are the posterior samples from the un-centered process defined in (2), and
${\beta}_{i}^{m}-{\mu}_{G}^{m}\phantom{\rule{0.16667em}{0ex}}(h=1,\cdots ,H)$ is the “centered” estimate for center *i* at the *m ^{th}* iteration. The same idea applies to the DPM model.

Centering the process improves model performance in two aspects. First, it improves MCMC convergence and mixing rates. Second, an “centered” estimate can be interpreted as a deviation from the population average.

The blocked Sampler of Ishwaran and James [14] is used to allocate each center to one of the components by sampling the label *Z _{i}* from a multinomial conditional posterior. In the DP model, probabilities in the multinomial distribution are:

$$Pr({Z}_{i}=h\mid -)=\frac{\{{V}_{h}{\prod}_{l<h}(1-{V}_{l})\}{\prod}_{j=1}^{{n}_{i}}f({\mathit{N}}_{ij},{\mathbf{\Delta}}_{ij},{x}_{ij};\alpha ,\mathit{\lambda},{\beta}_{h})}{{\sum}_{r=1}^{H}\{{V}_{r}{\prod}_{l<r}(1-{V}_{l})\}{\prod}_{j=1}^{{n}_{i}}f({\mathit{N}}_{ij},{\mathbf{\Delta}}_{ij},{x}_{ij};\alpha ,\lambda ,{\beta}_{r})},$$

where *f*(*N** _{ij}*,

In the DPM, the probabilities are

$$Pr({Z}_{i}=h\mid -)=\frac{\{{V}_{h}{\prod}_{l<h}(1-{V}_{l})\}{\prod}_{j=1}^{{n}_{i}}{\{N({\beta}_{i};{\mu}_{h},{\sigma}_{h}^{2})\}}^{\eta}}{{\sum}_{r=1}^{H}\{{V}_{r}{\prod}_{l<r}(1-{V}_{l})\}{\prod}_{j=1}^{{n}_{i}}{\{N({\beta}_{i};{\mu}_{h},{\sigma}_{h}^{2})\}}^{\eta}}$$

Motivated by the work of Hofmann [20], we introduce an annealing (tempering) parameter *η*. Similar strategies have been used in simpler mixture models for efficient Gibbs sampling [21]. Intuitively, *η* acts as a weight to strengthen or weaken the contribution of each observation. Our numerical experiments reveal that a *η* (1.5, 3) lead to significantly improved the clustering performance, especially when the prior for component parameters (such as
${\sigma}_{1}^{2}={\sigma}_{2}^{2}=\cdots ={\sigma}_{H}^{2}$) are weak. In different contexts, such as in adaptive randomization trials [22], a similar annealing parameter has been found useful in making randomization more balanced between treatment groups.

A useful metric to rate transplant centers is their ranks. In each MCMC iteration, *β _{i}* (

Additionally, in order to visually detect outlying centers, we calculate the proportion of centers in the same cluster as center *i*, denoted by *PS*. Together with the rank (percentile) statistics, we create a graph that helps identify centers that are in isolated small clusters with exceptionally low or high ranks.

In this section we conduct simulation studies to investigate the performance of the proposed DP and DPM models.

**DP**: a DP model with fixed hyperparameters, i.e., (*μ*_{0}, ${\sigma}_{0}^{2}$) are fixed.**DP-HP**: a DP model with a random normal-inverse gamma hyperprior for (*μ*_{0}, ${\sigma}_{0}^{2}$).**DPM**: a DPM model with ${\mu}_{h}~({\mu}_{0},{\sigma}_{0}^{2})$, where*μ*_{0}has a normal hyperprior and*σ*_{0}has a Half-Cauchy prior with*A*= 1. We also assigned a very weak prior for ${\sigma}_{h}^{2}$, i.e., $1/{\sigma}_{h}^{2}~\text{Gamma}(0.01,0.01)$.

The DP model, with fixed hyperparameters, does not induce shrinkage between clusters, but shrinks centers within the same cluster to a single estimate. In contrast, DPM allows shrinkage between- and within-clusters with a smaller *A* indicating a stronger shrinkage. Intuitively, DP-HP could have a stronger shrinkage than DPM since DP-HP has the strongest shrinkage within cluster by forcing all centers within a cluster having the same estimate, as well as a between-cluster shrinkage that is induced by a hyperprior for (*μ*_{0},
${\sigma}_{0}^{2}$).

In all simulations, survival times are generated from a Cox model [24], *S*(*t*|center* _{i}*) = exp[−Λ

We generate data under four data clustering structures, each consists of 48 centers and each center has 20 or 40 patients. Scenario I–III have three clusters but with different sizes (the size, denoted by *m*, refers to the number of centers in a cluster). In the last Scenario, all *β _{i}*’s are generated from a single cluster.

- I: C1:
*m*= 16 ~*N*(0.69, 0.2^{2}), C2:*m*= 16 ~*N*(0, 0.2^{2}) and C3:*m*= 16 ~*N*(0.69, 0.2^{2}) - II: C1:
*m*= 8 ~*N*(0.69, 0.2^{2}), C2:*m*= 24 ~*N*(0, 0.2^{2}) and C3:*m*= 16 ~*N*(0.69, 0.2^{2}) - III: C1:
*m*= 4 ~*N*(0.69, 0.1^{2}), C2:*m*= 40 ~*N*(0, 0.2^{2}) and C3:*m*= 4 ~*N*(0.69, 0.1^{2}) - IV: A single cluster
*m*= 48 ~*N*(0, 0.4)

For instance, in Scenario I, the first 16 centers form a cluster named as C1, with
${\beta}_{i}^{\prime}s$ simulated from a normal distribution with a mean −0.69 and a standard deviation of 0.2; likewise, the next 16 centers form a cluster named as C2 and the last 16 centers form a cluster named C3.Within each cluster, the first half centers have *n* = 20 and other half centers have *n* = 40. A large *β _{i}* means poor center performance.

We also compute three Bayesian model comparison criteria for selecting the best model: modified Deviance Information Criterion (DIC_{3}) [25], Watanabe-Akaike information criterion (WAIC) [26] and log-pseudo marginal likelihood (LPML) [9]. DIC_{3} is preferred in our setting over the standard DIC proposed by Spiegelhalter [27] since it correctly reflects the effective number of parameters in mixture models.WAIC was proposed recently and can also be viewed as an improvement on the standard DIC and it also approximates Bayesian cross-validation. It is invariant to parametrization and also works for singular model [28]. LPML is a cross-validated leave-one-out measure of a models ability to predict the data. It is valid for small and large samples and does not suffer from a heuristic justification based on large sample normality. The best model should have the smallest DIC_{3} and WAIC and largest LPML.

The models are implemented in **R**. All normal priors are assumed to have a mean of zero and variance of 100. The baseline hazard is assumed to have an exponential distribution with a rate of 1 and *c*_{0} = 0.1, and the time axis is partitioned into 5 intervals based on the observed quantiles. All the priors are set to be quite weak. With a burn-in of 1000 iterations, an additional 2000 iterations are used for inference.

As indicated by the diagnostic statistics in Table 2, when data consist of a few clusters, and each with a decent size (such as Scenario I and II), DPM is a better choice. A DP model is slightly better if a big cluster is accompanied by a few small outlying clusters (such as Scenario III). In the last scenario when there is only a single cluster, DP and DMP perform similarly.

Table 1 shows the parameter estimations and performance statistics with respect to the absolute bias (Bias), standard deviation (SD) and mean square error (MSE), based on 1000 repeated datasets. As expected, centers with *n* = 40 have more accurate estimates compared to centers with *n* = 20, as evidenced by a smaller bias, SD and MSE. Surprisingly, parameter estimations are very similar between DP-HP and DP models, so we only present the DP for illustration. It is also interesting to note that, in both DP and DPM models, estimates in a small cluster can be significantly biased toward a large cluster.

Parameter estimation and performance statistics with respect to the absolute bias (Bias), standard deviation (SD) and mean square error (MSE), based on 1000 simulated datasets.

Figure 1 displays the estimated clustering structure for the DP and DPM model under four scenarios over 1000 repeated datasets. Each heat map is created based on 48 × 48 matrix, containing pair-wise posterior probabilities for two centers being classified into the same cluster. Since we know the true cluster status for each center, centers are ordered by their true IDs. A red square represents a cluster (subgroup). It is apparent that the true clustering structure is well represented in all scenarios in both DP and DPM models. As expected, centers with a large sample size (*n* = 40) are more likely to be classified correctly than centers with a small sample size (*n* = 20), as demonstrated by larger centers having a darker red (higher probability) in a cluster and a darker blue (lower probability) between clusters. Upon closer inspection, DP models generally have higher pair-wise probabilities than DPM models, as evidenced by a darker red and a lighter blue in the DP model. We observed that the pair-wise probabilities are less influenced by undue shrinkage than the estimates of the center effects seen in Table 1. Here, the average pair-wise probabilities are between 65%–70% for all three clusters. For example, in Scenario III, averaged pair-wise probabilities for C1, C2 and C3 are 0.65, 0.70 and 0.70, respectively in the DPM model and 0.65, 0.78 and 0.69, respectively in the DP model.

We apply our model to evaluate 213 nationwide transplant centers in the US. The number of patients per center has a median of 198 and an interquartile range of (111, 356). A total of nine patient-level covariates are selected using a forward selection algorithm and also per relevant medical literature, including Cold ischemia time, Peak renal reactive antibody level (PRRA), Body mass index, Time on renal replacement therapy (TRRT), Donor race, Recipient race, Donor history of Diabetes, Previous Solid Organ Transplant, Recipient Diagnosis. Due to the retrospective nature of the analysis, values are missing for some of those characteristics. For instance, there are 16.47% missing data in TRRT and 2.14% missing in PRRA. In order to include patients with partially missing covariates while reserving the original covariate distributions, we create a binary variable for each covariate indicating if the data is missing for each subject. For example, a continuous covariate is created into two variables with one variable containing the original value and the other variable containing one if the data is missing and zero otherwise. By doing so, we create 18 covariates.

The priors used in the application are the same as in the simulation studies except that 1/*σ _{h} ~* Gamma(3, 0.5), which is also weak relative to the likelihood. With a burn-in of 10000 iterations, an additional 20000 iterations were used for posterior inference. It takes about four hours for data to run on an Intel Xeon 3.10 GHz 4GB RAM, x64 Linux computer. We observe that the chain mixes well and the results are robust to different choices of the initial values.

As illustrated in Table 3, A DPM model is preferred over the DP model for the kidney transplant data based on the DIC_{3}, WAIC and LPML statistics. Figure 2 presents caterpillar plots of the center estimates and Figure 3 depicts outlying centers at two tails, i.e., centers with very low and high percentiles and small probabilities of being in the same cluster as other centers. In both DP and DPM models, two transplant centers (with id 652 and 437) have the worst outcomes (*PS <* 0.2 and Percentile*>* 0.8). It is also interesting to note that a few centers with exceptionally good performance are observed in DPM model but not in DP model.

Caterpillar plots of 95% credible intervals for estimates of the center effects in kidney transplant data using the DP model in (a) and the DPM model in (b). The transplant centers were placed in the order of their posterior means. The dotted vertical **...**

The x axis is the mean percentile and the y axis is the mean percentage of centers being in the same cluster as center *i* for the kidney transplant data using the DP model (a) and DPM model (b). Isolated data points in the lower left corner depict outlying **...**

The heat map in Figure 4 is based on pair-wise probabilities and ordered by rank statistics as described in Section 2.6. It appears that a few centers at the upper right corner form an isolated cluster (this is clearer when the plot was magnified to 200%), which performed significantly worse than the population average. Moving along the diagonal toward the lower left corner, the next cluster consists of approximately 60 centers that performed better than the previously mentioned small outlying cluster but still worse than the population average. No isolated subgroups of centers seem to perform significantly better than the population average; however, a big group of around 60 centers performed above average. We find that the DP and DPM model identify very similar outlying centers with slightly different ordering of across the centers.

Heat maps representing pair-wise posterior probabilities of the two centers are classified into the same cluster; centers are ordered based on their mean ranking scores. (a) and (b) are from the DP and DPM model, respectively.

We also did sensitivity analysis for the parameters in the model. In this application, we tried a larger *H* (*H* = 20 and *H* = 50) and considered a random *a*, respectively. We find that a large *H* does not improve the model performance (DIC_{3}, WAIC and LPML are the same as fixing *H* = 5), and data seem to contain little information about estimating the parameter *a*, leading to the same, or slightly worse diagnostic statistics, compared to a fixed *a* (data not shown). We also increased the *A* to 5 in the Half-Cauchy prior, this stronger prior provides similar model performance statistics as with *A* = 1 (DIC_{3}, WAIC and LPML are similar) and its heatmap also resembles that with *A* = 1. For the annealing parameter *η*, we tried *η* = 1, 1.5, 2 and 3. We find that *η* = 1 classifies all centers into one single cluster, and *η* of 2 and 3 result in efficient clustering. It is worth mentioning that different choices of *η* do not affect the posterior estimation of the center effects, and estimates with different *η* are very similar.

We propose a fully Bayesian approach to model patient-level covariates and center effects in a Cox Proportional Hazard model. The work has been applied to evaluate long-term mortality rates of kidney transplant centers from a national organ failure registry. In modelling the risk-adjusted center effects, we describe a Dirichlet process model and a Dirichlet process mixture model with a half-cauchy prior. We also derive statistical measures and presented graphical tools to rate transplant centers and identify outlying centers with exceptionally good or poor performance.

To improve the model performance and interpretability for the survival data, we center the Dirichlet process. Without centering, MCMC chains exhibit very high autocorrelation which has no hope of yielding any meaningful estimates. Centering the DP process by constraining the mean to zero dramatically improved model convergence and interpretability of the estimates of the center effects. To increase model’s classification ability, we introduce an annealing parameter in calculating the allocation probability. In practice this annealing parameter, *η*, can be chosen in a range of 1.5–3 based on model diagnostic statistics. Our simulation results show that a good annealing parameter is critical for improving the classification ability with the survival data, that is, it makes the classification less “ fussy ” and avoid classifying all centers into a single cluster. It outperforms alternative strategies, such as forcing a common variance across components (
${\sigma}_{1}^{2}={\sigma}_{2}^{2}=\cdots ={\sigma}_{H}^{2}$) and assigning strongly informative priors for
${\sigma}_{h}^{2}\phantom{\rule{0.16667em}{0ex}}(h=1,\cdots ,H)$.

The graphical tools and statistical measures proposed in this article are particularly useful to reveal a clustering structure of all transplant centers and help policy-makers identify potential outlying centers for further investigations.

During the study, we also tried to implement the proposed method in WinBUGS and JAGS. However, it is not straightforward to incorporate the annealing parameter *η* in the modeling. We will continue working on it and meanwhile the R codes will be made available to the public through the author’s web site once it is published.

The Markov chain Monte Carlo procedure for estimating the posterior distributions is implemented by repeatedly drawing samples from the full conditional distributions of the parameters. In DP model,

- Update the stick-breaking weights from conditionally conjugate beta posterior distributions:$${V}_{h}\mid -~\text{Beta}\phantom{\rule{0.16667em}{0ex}}\left(1+\sum _{i=1}^{N}\mathrm{I}({Z}_{i}=h),a+\sum _{i=1}^{N}\mathrm{I}({Z}_{i}>h)\right),h=1,\cdots ,H$$
- Update baseline hazard in interval
*k*(*k*= 1, · · ·, 5) from Gamma(1 × 0.1 +*D*, 0.1 + Σ_{k}_{iRk}*exp*{*αx*+_{ij}*β*}Δ_{i}), and_{ijk}*D*and_{k}*R*represents the number of death and the number subjects at risk in interval_{k}*k*. - The vector of covariates was divided into 3 groups with 6 covariates per group, and
*α*was updated by groups. Within each group, the corresponding*α*was updated using the adaptive Metropolis-Hastings algorithm [30]. The initial estimates of*α*was calculated from a Cox model stratified by centers. The multivariate normal proposal density centered at the previous value, and the covariance in the proposal was “refined” by using the empirical covariance from an extended burn-in period. - Update
*a*from a Gamma distribution$$a~\text{Gamma}\phantom{\rule{0.16667em}{0ex}}\left(1,{a}_{0}+H-1,{b}_{0}-\sum _{r=1}^{H-1}log(1-{V}_{r})\right)\phantom{\rule{0.16667em}{0ex}}\mathrm{I}(0.3,10)$$The prior for*a*is gamma with hyperparameters*a*_{0}and*b*_{0}, which are constrained in the range from 0.3 to 10. - In DP-HP, given
*β*_{1}, · · ·,*β*, · · ·,_{h}*β*, update (_{H}*μ*_{0}, ${\sigma}_{0}^{2}$) using the normal-inverse-gamma conjugacy form Carlin*et. al*[31].

Compared to the DP model, there are some changes in the DPM model,

- 2Update
*β*by the adaptive rejection algorithm and with ${\beta}_{i}~({\mu}_{{Z}_{i}},{\sigma}_{{Z}_{i}}^{2})$ as a priori._{i} - 7Gibbs sampling of component-specific parameters and hyperparameters in
*G*_{0}using the half-cauchy prior can be found in [16].

1. Conway PH, Clancy C. Transformation of health care at the front line. JAMA. 2009;301:763–765. [PubMed]

2. Liu J, Louis T, Pan W, Ma J, Collins A. Methods for estimating and interpreting provider-specific standardized mortality ratios. Health Serv Outcomes Res Methodol. 2003;1754:135–149. [PMC free article] [PubMed]

3. Jones HE, Spiegelhalter DJ. The identification of unusual heath-care providers from a hierarchical model. Lifetime Data Anal. 2011;65:154–163.

4. Kalbfleisch JD, Wolfe RA. On monitoring outcomes of medical provider. Stat Biosciences. 2013;40:1–30.

5. He K, Kalbfleisch JD, Li Y, Li Y. Evaluating hospital readmission rates in dialysis facilities with and without adjusting for hospital effects. Lifetime Data Anal. 2013;19:490–512. [PubMed]

6. Spiegelhalter D, Sherlaw-Johnson C, Bardsley M, Blunt I, CCW, Grigg O. Statistical methods for healthcare regulation: rating, screening and surveilliance. J Royal Stat Soc A. 2012;175:1–47.

7. Ohlssen D, Sharples L, Spiegelhalter D. Flexible random-effects models using bayesian semi-parametric models: applications to institutional comparisons. Statistics in Medicine. 2007;26:2088–112. [PubMed]

8. Jara A, Hanson T, Quintana F, Muller P, Rosner G. Ppackage: Bayesian semi and nonparametric modeling in r. Journal of Statistical Software. 2011;40:1–30. [PMC free article] [PubMed]

9. Ibrahim JG, Chen MH, Sinha D. Bayesian Survival Analysis. New York: Springer; 2001.

10. Kalbfleisch J. Non-parametric Bayesian analysis of survival time data. Journal of the Royal Statistical Society Series B. 1978;40:214–221.

11. Sethuraman J. A constructive definition of dirichlet priors. Statistica Sinica. 1994;4:639–650.

12. Dunson D. Nonparametric Bayes applications to biostatistics. Cambridge: Cambridge University Press; 2010.

13. Escobar M, West M. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association. 1995;90:577–588.

14. Ishwaran H, James L. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association. 2001;101:179–194.

15. Ishwaran H, Zarepour M. Markov chain monte carlo in approximate dirichlet and beta two-parameter process hierarchical models. Biometrika. 2000;87:371–390.

16. Gelman A. Prior distributions for variance parameters in hierachical models. Bayesian Analysis. 2006;1:515–533.

17. Richardson S, Green P. On bayesian analysis of mixtures with an unknown number of components. Journal of the Royal Statistical Society, Series B. 1997;59:731–792.

18. Yang M, Dunson DB, Baird D. Semiparametric Bayes hierarchical models with mean and variance constraints. Computational Statistics and Data Analysis. 2010;54:2172–2186. [PMC free article] [PubMed]

19. Yisheng Li PM, Lin X. Center-adjusted inference for a Non-parametric Bayesian random effect distribution. Statistica Sinica. 2011;21:1201–1223. [PMC free article] [PubMed]

20. Hofmann T. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning. 2001;42:177–196.

21. Dunson D, Das S. Bayesian distribution regression via augmented particle sampling. University Lecture; 2009.

22. Lee JJ, Gu X, Liu S. Bayesian adaptive randomization designs for targeted agent development. Clinical Trials. 2010;7:584–596. [PMC free article] [PubMed]

23. Dunson DB, Xue Y, Carin L. The matrix stick-breaking process: Flexible Bayes meta analysis. Journal of the American Statistical Association. 2008;103:317–327.

24. Bender R, Augustin T, Blettner M. Generating survival times to simulate cox proportional hazards models. Statistics in Medicine. 2005;24:1713–1723. [PubMed]

25. Celeux G, Forbes F, Robert C, Titterington D. Deviance information criteria for missing data models. Bayesian Analysis. 2006;1:651–674. doi: 10.1214/06-BA122. [Cross Ref]

26. Watanabe S. Asymptotic equivalence of bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research. 2010;11:3571–3594.

27. Spiegelhalter DJ, Best NG, Carlin B, Van de Linde A. Bayesian measures of model complexity and fit (with discussion) Journal of the Royal Statistical Society. 2002;64:583–639. doi: 10.1111/1467-9868.00353. [Cross Ref]

28. Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Data Analysis. 3. Chapman & Hall/CRC; 2013.

29. Gilks W, Best N, Tan K. Adaptive rejection metropolis sampling within gibbs sampling. Applied Statistics. 1995;44:455–472.

30. Haario H, Saksman S, Tamminen J. An adaptive Metropolis algorithm. Bernoulli. 2001;7:223–242. doi: 10.2307/3318737. [Cross Ref]

31. Carlin B, Louis T. Bayes and Empirical Bayes Methods for Data Analysis. 2. Chapman & Hall/CRC; 2000.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |