Frequentist sample size determination for binary outcome data in a two-arm clinical trial requires initial guesses of the event probabilities for the two treatments. Misspecification of these event rates may lead to a poor estimate of the necessary sample size. In contrast, the Bayesian approach that considers the treatment effect to be a random variable having some distribution may offer a better, more flexible approach. The Bayesian sample size proposed by Whitehead et al. (2008) for exploratory studies on efficacy justifies the acceptable minimum sample size by a “conclusiveness” condition. In this work, we introduce a new two-stage Bayesian design with sample size reestimation at the interim stage. Our design inherits the properties of good interpretation and easy implementation from Whitehead et al. (2008), generalizes their method to a two-sample setting, and uses a fully Bayesian predictive approach to reduce an overly large initial sample size when necessary. Moreover, our design can be extended to allow for patient-level covariates via logistic regression, now adjusting sample size within each subgroup based on interim analyses. We illustrate the benefits of our approach with a design in non-Hodgkin lymphoma with a simple binary covariate (patient gender), offering an initial step toward within-trial personalized medicine.
Bayesian design; clinical trial; personalized medicine; predictive approach; sample size reestimation; subgroup analysis
The exact mechanisms relating exposure to ultraviolet (UV) radiation and elevated risk of skin cancer remain the subject of debate. For example, there is disagreement on whether the main risk factor is duration of the exposure, its intensity, or some combination of both. There is also uncertainty regarding the form of the dose-response curve, with many authors believing only exposures exceeding a given (but unknown) threshold are important. In this paper we explore methods to estimate such thresholds using hierarchical spatial logistic models based on a sample of a cohort of x-ray technologists for whom we have self-reports of time spent in the sun and numbers of blistering sunburns in childhood. A preliminary goal is to explore the temporal pattern of UV exposure and its gradient. Changes here would imply that identical exposure self-reports from different calendar years may correspond to differing cancer risks.
Conditionally autoregressive (CAR) model; Erythemal exposure; Hierarchical model; Non-melanoma skin cancer
The analysis of point-level (geostatistical) data has historically been plagued by computational difficulties, owing to the high dimension of the nondiagonal spatial covariance matrices that need to be inverted. This problem is greatly compounded in hierarchical Bayesian settings, since these inversions need to take place at every iteration of the associated Markov chain Monte Carlo (MCMC) algorithm. This paper offers an approach for modeling the spatial correlation at two separate scales. This reduces the computational problem to a collection of lower-dimensional inversions that remain feasible within the MCMC framework. The approach yields full posterior inference for the model parameters of interest, as well as the fitted spatial response surface itself. We illustrate the importance and applicability of our methods using a collection of dense point-referenced breast cancer data collected over the mostly rural northern part of the state of Minnesota. Substantively, we wish to discover whether women who live more than a 60-mile drive from the nearest radiation treatment facility tend to opt for mastectomy over breast conserving surgery (BCS, or “lumpectomy”), which is less disfiguring but requires 6 weeks of follow-up radiation therapy. Our hierarchical multiresolution approach resolves this question while still properly accounting for all sources of spatial association in the data.
Aggregated geographic data; Big N problem; Breast cancer; Conditionally autoregressive (CAR) model; Hierarchical modeling; Kriging
Recently, many Bayesian methods have been developed for dose-finding when simultaneously modeling both toxicity and efficacy outcomes in a blended phase I/II fashion. A further challenge arises when all the true efficacy data cannot be obtained quickly after the treatment, so that surrogate markers are instead used (e.g., in cancer trials). We propose a framework to jointly model the probabilities of toxicity, efficacy and surrogate efficacy given a particular dose. Our trivariate binary model is specified as a composition of two bivariate binary submodels. In particular, we extend the bCRM approach, as well as utilize the Gumbel copula of Thall and Cook. The resulting trivariate algorithm utilizes all the available data at any given time point, and can flexibly stop the trial early for either toxicity or efficacy. Our simulation studies demonstrate that our proposed method can successfully improve dosage targeting efficiency and guard against excess toxicity over a variety of true model settings and degrees of surrogacy.
Bayesian adaptive methods; Continual reassessment method (CRM); Maximum tolerated dose (MTD); Phase I/II clinical trial; Surrogate efficacy; Toxicity
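As a rough illustration of how a bivariate binary submodel can tie two marginal probabilities together, the sketch below codes one commonly used form of the Gumbel model for a (toxicity, efficacy) pair. It is only a sketch under stated assumptions: the function name, the argument names, and the numerical inputs are hypothetical, and the abstract does not say that the trivariate model uses exactly this parameterization.

```python
import math


def gumbel_joint_probs(p_tox, p_eff, psi):
    """Joint probabilities for binary toxicity a and efficacy b.

    Assumed form (hypothetical parameterization, not taken from the abstract):
      pi[a, b] = independence term
                 + (-1)^(a+b) * p_tox(1-p_tox) * p_eff(1-p_eff) * (e^psi - 1)/(e^psi + 1)
    psi = 0 gives independence; the marginals stay at p_tox and p_eff.
    """
    assoc = (math.exp(psi) - 1.0) / (math.exp(psi) + 1.0)
    spread = p_tox * (1.0 - p_tox) * p_eff * (1.0 - p_eff)
    probs = {}
    for a in (0, 1):
        for b in (0, 1):
            indep = (p_tox ** a * (1.0 - p_tox) ** (1 - a)
                     * p_eff ** b * (1.0 - p_eff) ** (1 - b))
            probs[(a, b)] = indep + (-1) ** (a + b) * spread * assoc
    return probs


# Illustrative inputs only: 20% toxicity, 50% efficacy, positive association.
example = gumbel_joint_probs(0.2, 0.5, 1.0)
```

Because the association term sums to zero over the four cells, changing `psi` moves probability between cells without altering either marginal, which is what allows the marginal dose-response models to be specified separately.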
With the ready availability of spatial databases and geographical information system software, statisticians are increasingly encountering multivariate modelling settings featuring associations of more than one type: spatial associations between data locations and associations between the variables within the locations. Although flexible modelling of multivariate point-referenced data has recently been addressed by using a linear model of co-regionalization, existing methods for multivariate areal data typically suffer from unnecessary restrictions on the covariance structure or undesirable dependence on the conditioning order of the variables. We propose a class of Bayesian hierarchical models for multivariate areal data that avoids these restrictions, permitting flexible and order-free modelling of correlations both between variables and across areal units. Our framework encompasses a rich class of multivariate conditionally autoregressive models that are computationally feasible via modern Markov chain Monte Carlo methods. We illustrate the strengths of our approach over existing models by using simulation studies and also offer a real data application involving annual lung, larynx and oesophageal cancer death-rates in Minnesota counties between 1990 and 2000.
Lattice data; Linear model of co-regionalization; Markov chain Monte Carlo methods; Multivariate conditionally autoregressive model; Spatial statistics
Lung transplantation is now a standard intervention for patients with advanced lung disease. Home monitoring of pulmonary function and symptoms has been used to follow the progress of lung transplant recipients in an effort to improve care and clinical status. The study objective was to determine the relative performance of a computer-based Bayesian algorithm compared with a manual nurse decision process for triaging clinical intervention in lung transplant recipients participating in a home monitoring program.
Materials and Methods:
This randomized controlled trial had 65 lung transplant recipients assigned to either the Bayesian or nurse triage study arm. Subjects monitored and transmitted spirometry and respiratory symptoms daily to the data center using an electronic spirometer/diary device. Subjects completed the Short Form-36 (SF-36) survey at baseline and after 1 year. End points were change from baseline after 1 year in forced expiratory volume at 1 s (FEV1) and quality of life (SF-36 scales) within and between each study arm.
There were no statistically significant differences between groups in FEV1 or SF-36 scales at baseline or after 1 year. Results were comparable between nurse and Bayesian system for detecting changes in spirometry and symptoms, providing support for using computer-based triage support systems as remote monitoring triage programs become more widely available.
The feasibility of monitoring critical patient data with a computer-based decision system is especially important given the likely economic constraints on the growth in the nurse workforce capable of providing these early detection triage services.
home health monitoring; telehealth; telemedicine; m-health; transplantation
Mixtures of Polya trees offer a very flexible nonparametric approach for modelling time-to-event data. Many such settings also feature spatial association that requires further sophistication, either at the point level or at the lattice level. In this paper, we combine these two aspects within three competing survival models, obtaining a data analytic approach that remains computationally feasible in a fully hierarchical Bayesian framework using Markov chain Monte Carlo methods. We illustrate our proposed methods with an analysis of spatially oriented breast cancer survival data from the Surveillance, Epidemiology and End Results program of the National Cancer Institute. Our results indicate appreciable advantages for our approach over competing methods that impose unrealistic parametric assumptions, ignore spatial association or both.
Areal data; Bayesian modelling; Breast cancer; Conditionally autoregressive model; Log pseudo marginal likelihood; Nonparametric modelling
Post-market device surveillance studies often have important primary objectives tied to estimating a survival function at some future time T with a certain amount of precision.
This paper presents the details and various operating characteristics of a Bayesian adaptive design for device surveillance, as well as a method for estimating a sample size vector (determined by the maximum sample size and a pre-set number of interim looks) that will deliver the desired power.
We adopt a Bayesian adaptive framework which recognizes the fact that persons enrolled in a study report their results over time, not all at once. At each interim look we assess whether we expect to achieve our goals with only the current group, or whether the achievement of such goals is extremely unlikely even for the maximum sample size.
Our Bayesian adaptive design can outperform two non-adaptive frequentist methods currently recommended by FDA guidance documents in many settings.
Our method's performance can be sensitive to model misspecification and changes in the trial's enrollment rate.
The proposed design provides a more efficient framework for conducting postmarket surveillance of medical devices.
Adaptive trial; Bayesian statistics; futility analysis; interim analysis; Monte Carlo sampling
The study of ecological boundaries and their dynamics is of fundamental importance to much of ecology, biogeography, and evolution. Over the past two decades, boundary analysis (of which wombling is a subfield) has received considerable research attention, resulting in multiple approaches for the quantification of ecological boundaries. Nonetheless, few methods have been developed that can simultaneously (1) analyze spatially homogenized data sets (i.e., areal data in the form of polygons rather than point-reference data); (2) account for spatial structure in these data and uncertainty associated with them; and (3) objectively assign probabilities to boundaries once detected. Here we describe the application of a Bayesian hierarchical framework for boundary detection developed in public health, which addresses these issues but which has seen limited application in ecology. As examples, we analyze simulated spread data and the historic pattern of spread of an invasive species, the hemlock woolly adelgid (Adelges tsugae), using county-level summaries of the year of first reported infestation and several covariates potentially important to influencing the observed spread dynamics. Bayesian areal wombling is a promising approach for analyzing ecological boundaries and dynamics related to changes in the distributions of native and invasive species.
Adelges tsugae; boundary analysis; ecotones; edge detection; hemlock woolly adelgid; invasive species; spatial statistics
Assessing between-study variability in the context of conventional random-effects meta-analysis is notoriously difficult when incorporating data from only a small number of historical studies. In order to borrow strength, historical and current data are often assumed to be fully homogeneous, but this can have drastic consequences for power and Type I error if the historical information is biased. In this paper, we propose empirical and fully Bayesian modifications of the commensurate prior model (Hobbs et al., 2011) extending Pocock (1976), and evaluate their frequentist and Bayesian properties for incorporating patient-level historical data using general and generalized linear mixed regression models. Our proposed commensurate prior models lead to preposterior admissible estimators that facilitate alternative bias-variance trade-offs than those offered by pre-existing methodologies for incorporating historical data from a small number of historical studies. We also provide a sample analysis of a colon cancer trial comparing time-to-disease progression using a Weibull regression model.
clinical trials; historical controls; meta-analysis; Bayesian analysis; survival analysis; correlated data
Trial investigators often have a primary interest in the estimation of the survival curve in a population for which there exists acceptable historical information from which to borrow strength. However, borrowing strength from a historical trial that is non-exchangeable with the current trial can result in biased conclusions. In this paper we propose a fully Bayesian semiparametric method for the purpose of attenuating bias and increasing efficiency when jointly modeling time-to-event data from two possibly non-exchangeable sources of information. We illustrate the mechanics of our methods by applying them to a pair of post-market surveillance datasets regarding adverse events in persons on dialysis that had either a bare metal or drug-eluting stent implanted during a cardiac revascularization surgery. We finish with a discussion of the advantages and limitations of this approach to evidence synthesis, as well as directions for future work in this area. The paper’s Supplementary Materials offer simulations to show our procedure’s bias, mean squared error, and coverage probability properties in a variety of settings.
Bayesian hierarchical modeling; Commensurate prior; Evidence synthesis; Flexible proportional hazards model; Hazard smoothing; Non-exchangeable sources of data
Prospective trial design often occurs in the presence of “acceptable” historical control data. Typically this data is only utilized for treatment comparison in a posteriori retrospective analysis to estimate population-averaged effects in a random-effects meta-analysis.
We propose and investigate an adaptive trial design in the context of an actual randomized controlled colorectal cancer trial. This trial, originally reported by Goldberg et al., succeeded a similar trial reported by Saltz et al., and used a control therapy identical to that tested (and found beneficial) in the Saltz trial.
The proposed trial implements an adaptive randomization procedure for allocating patients aimed at balancing total information (concurrent and historical) among the study arms. This is accomplished by assigning more patients to receive the novel therapy in the absence of strong evidence for heterogeneity among the concurrent and historical controls. Allocation probabilities adapt as a function of the effective historical sample size (EHSS) characterizing relative informativeness defined in the context of a piecewise exponential model for evaluating time to disease progression. Commensurate priors are utilized to assess historical and concurrent heterogeneity at interim analyses and to borrow strength from the historical data in the final analysis. The adaptive trial’s frequentist properties are simulated using the actual patient-level historical control data from the Saltz trial and the actual enrollment dates for patients enrolled into the Goldberg trial.
Assessing concurrent and historical heterogeneity at interim analyses and balancing total information with the adaptive randomization procedure leads to trials that on average assign more new patients to the novel treatment when the historical controls are unbiased or slightly biased compared to the concurrent controls. Large magnitudes of bias lead to approximately equal allocation of patients among the treatment arms. Using the proposed commensurate prior model to borrow strength from the historical data, after balancing total information with the adaptive randomization procedure, provides admissible estimators of the novel treatment effect with desirable bias-variance trade-offs.
Adaptive randomization methods in general are sensitive to population drift and more suitable for trials that initiate with gradual enrollment. Balancing information among study arms in time-to-event analyses is difficult in the presence of informative right-censoring.
The proposed design could prove important in trials that follow recent evaluations of a control therapy. Efficient use of the historical controls is especially important in contexts where reliance on pre-existing information is unavoidable because the control therapy is exceptionally hazardous, expensive, or the disease is rare.
adaptive designs; Bayesian analysis; historical controls
Numerous studies have found that areas with higher alcohol establishment density are more likely to have higher violent crime rates but many of these studies did not assess the differential effects of type of establishments or the effects on multiple categories of crime. In this study, we assess whether alcohol establishment density is associated with four categories of violent crime, and whether the strength of the associations varies by type of violent crime and by on-premise establishments (e.g., bars, restaurants) versus off-premise establishments (e.g., liquor and convenience stores).
Data come from the city of Minneapolis, Minnesota in 2009 and were aggregated and analyzed at the neighborhood level. Across the 83 neighborhoods in Minneapolis, we examined four categories of violent crime: assault, rape, robbery, and total violent crime. We used a Bayesian hierarchical inference approach to model the data, accounting for spatial auto-correlation and controlling for relevant neighborhood demographics. Models were estimated for total alcohol establishment density as well as separately for on-premise establishments and off-premise establishments.
Positive, statistically significant associations were observed for total alcohol establishment density and each of the violent crime outcomes. We estimate that a 3.9% to 4.3% increase across crime categories would result from a 20% increase in neighborhood establishment density. The associations between on-premise density and each of the individual violent crime outcomes were also all positive and significant and similar in strength to those for total establishment density. The relationships between off-premise density and the crime outcomes were all positive but not significant for rape or total violent crime, and the strength of the associations was weaker than those for total and on-premise density.
Results of this study, combined with earlier findings, provide more evidence that community leaders should be cautious about increasing the density of alcohol establishments within their neighborhoods.
Alcohol outlets; violent crime; neighborhood
Given the growing availability of multilevel data from national surveys, researchers interested in contextual effects may find themselves with a small number of individuals per group. Although there is a growing body of literature on sample size in multilevel modeling, few studies have explored the impact of group size < 5.
In a simulated analysis of real data, we examined the impact of group size < 5 on both a continuous and dichotomous outcome in a simple two-level multilevel model. Models with group sizes 1 to 5 were compared to models with complete data. Four different linear and logistic models were examined: empty models, models with a group-level covariate, models with an individual-level covariate, and models with an aggregated group-level covariate. We further evaluated whether the impact of small group size differed depending on the total number of groups.
When the number of groups was large (N=459), neither fixed nor random components were affected by small group size, even when 90% of tracts had only 1 individual per tract and even when an aggregated group-level covariate was examined. As the number of groups decreased, the standard error estimates of both fixed and random effects were inflated. Furthermore, group-level variance estimates were more affected than were fixed components.
Datasets with a small to moderate number of groups, the majority of which have a very small group size (n < 5), may fail to find or even consider a group-level effect when one may exist and also may be underpowered to detect fixed effects.
Multilevel; Neighborhood; Body Weight; Obesity; Sample Size
The evaluation of surrogate endpoints for primary use in future clinical trials is an increasingly important research area, due to demands for more efficient trials coupled with recent regulatory acceptance of some surrogates as ‘valid.’ However, little consideration has been given to how a trial which utilizes a newly-validated surrogate endpoint as its primary endpoint might be appropriately designed. We propose a novel Bayesian adaptive trial design that allows the new surrogate endpoint to play a dominant role in assessing the effect of an intervention, while remaining realistically cautious about its use. By incorporating multi-trial historical information on the validated relationship between the surrogate and clinical endpoints, then subsequently evaluating accumulating data against this relationship as the new trial progresses, we adaptively guard against an erroneous assessment of treatment based upon a truly invalid surrogate. When the joint outcomes in the new trial seem plausible given similar historical trials, we proceed with the surrogate endpoint as the primary endpoint, and do so adaptively, perhaps stopping the trial for early success or inferiority of the experimental treatment, or for futility. Otherwise, we discard the surrogate and switch adaptive determinations to the original primary endpoint. We use simulation to test the operating characteristics of this new design compared to a standard O’Brien-Fleming approach, as well as the ability of our design to discriminate trustworthy from untrustworthy surrogates in hypothetical future trials. Furthermore, we investigate possible benefits using patient-level data from 18 adjuvant therapy trials in colon cancer, where disease-free survival is considered a newly-validated surrogate endpoint for overall survival.
Bayesian adaptive design; Clinical trials; Surrogate endpoints; Survival analysis
Falling is a common and morbid condition among elderly persons. Effective strategies to prevent falls have been identified but are underutilized.
Using a nonrandomized design, we compared rates of injuries from falls in a region of Connecticut where clinicians had been exposed to interventions to change clinical practice (intervention region) and in a region where clinicians had not been exposed to such interventions (usual-care region). The interventions encouraged primary care clinicians and staff members involved in home care, outpatient rehabilitation, and senior centers to adopt effective risk assessments and strategies for the prevention of falls (e.g., medication reduction and balance and gait training). The outcomes were rates of serious fall-related injuries (hip and other fractures, head injuries, and joint dislocations) and fall-related use of medical services per 1000 person-years among persons who were 70 years of age or older. The interventions occurred from 2001 to 2004, and the evaluations took place from 2004 to 2006.
Before the interventions, the adjusted rates of serious fall-related injuries (per 1000 person-years) were 31.2 in the usual-care region and 31.9 in the intervention region. During the evaluation period, the adjusted rates were 31.4 and 28.6, respectively (adjusted rate ratio, 0.91; 95% Bayesian credibility interval, 0.88 to 0.94). Between the preintervention period and the evaluation period, the rate of fall-related use of medical services increased from 68.1 to 83.3 per 1000 person-years in the usual-care region and from 70.7 to 74.2 in the intervention region (adjusted rate ratio, 0.89; 95% credibility interval, 0.86 to 0.92). The percentages of clinicians who received intervention visits ranged from 62% (131 of 212 primary care offices) to 100% (26 of 26 home care agencies).
Dissemination of evidence about fall prevention, coupled with interventions to change clinical practice, may reduce fall-related injuries in elderly persons.
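As a quick arithmetic check on the reported rates, the sketch below computes crude (unadjusted) rate ratios for the evaluation period from the per-1000-person-year figures quoted above. The adjusted ratios and credibility intervals in the abstract come from the authors' Bayesian model, which this snippet does not attempt to reproduce; the variable and function names are illustrative.

```python
# Evaluation-period rates per 1000 person-years, as quoted in the abstract.
evaluation_rates = {
    "serious fall-related injuries": {"usual_care": 31.4, "intervention": 28.6},
    "fall-related use of medical services": {"usual_care": 83.3, "intervention": 74.2},
}


def crude_rate_ratio(intervention_rate, usual_care_rate):
    """Unadjusted ratio of the intervention-region rate to the usual-care rate."""
    return intervention_rate / usual_care_rate


crude = {
    outcome: round(crude_rate_ratio(r["intervention"], r["usual_care"]), 2)
    for outcome, r in evaluation_rates.items()
}

for outcome, ratio in crude.items():
    print(f"{outcome}: crude rate ratio = {ratio}")
```

To two decimal places the crude ratios (0.91 and 0.89) coincide with the adjusted rate ratios reported above, although agreement of this kind should not be expected in general once covariates are adjusted for.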
Estimation of extreme quantal-response statistics, such as the concentration required to kill 99.9% of test subjects (LC99.9), remains a challenge in the presence of multiple covariates and complex study designs. Accurate and precise estimates of the LC99.9 for mixtures of toxicants are critical to ongoing control of a parasitic invasive species, the sea lamprey, in the Laurentian Great Lakes of North America. The toxicity of those chemicals is affected by local and temporal variations in water chemistry, which must be incorporated into the modeling. We develop multilevel empirical Bayes models for data from multiple laboratory studies. Our approach yields more accurate and precise estimation of the LC99.9 compared to alternative models considered. This study demonstrates that properly incorporating hierarchical structure in laboratory data yields better estimates of LC99.9 stream treatment values that are critical to larvae control in the field. In addition, out-of-sample prediction of the results of in situ tests reveals the presence of a latent seasonal effect not manifest in the laboratory studies, suggesting avenues for future study and illustrating the importance of dual consideration of both experimental and observational data.
Lethal concentration/dose; Markov chain Monte Carlo (MCMC); Non-linear model; Quantal-response bioassay
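To make the LC99.9 quantity concrete, the sketch below inverts a simple two-parameter logistic dose-response curve on the log10 concentration scale. This is an assumption made purely for illustration: the abstract's multilevel models include water-chemistry covariates and hierarchical structure that are not represented here, and the intercept and slope values are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def mortality(conc, intercept, slope):
    """Assumed curve: logit(P(death)) = intercept + slope * log10(conc)."""
    eta = intercept + slope * math.log10(conc)
    return 1.0 / (1.0 + math.exp(-eta))


def lethal_concentration(p, intercept, slope):
    """Concentration at which the assumed curve reaches mortality p."""
    return 10.0 ** ((logit(p) - intercept) / slope)


# Hypothetical parameter values, chosen only to exercise the functions.
lc999 = lethal_concentration(0.999, intercept=-2.0, slope=4.0)
lc50 = lethal_concentration(0.5, intercept=-2.0, slope=4.0)
```

Since logit(0.999) is roughly 6.9, the LC99.9 sits far out in the upper tail of the curve, so small errors in the slope translate into large errors in the estimated concentration; this is one reason extreme quantiles are harder to estimate than the LC50.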
Bayesian clinical trial designs offer the possibility of a substantially reduced sample size, increased statistical power, and reductions in cost and ethical hazard. However, when prior and current information conflict, Bayesian methods can lead to higher than expected Type I error, as well as the possibility of a costlier and lengthier trial. This motivates an investigation of the feasibility of hierarchical Bayesian methods for incorporating historical data that are adaptively robust to prior information that reveals itself to be inconsistent with the accumulating experimental data. In this paper, we present several models that allow for the commensurability of the information in the historical and current data to determine how much historical information is used. A primary tool is elaborating the traditional power prior approach based upon a measure of commensurability for Gaussian data. We compare the frequentist performance of several methods using simulations, and close with an example of a colon cancer trial that illustrates a linear models extension of our adaptive borrowing approach. Our proposed methods produce more precise estimates of the model parameters, in particular conferring statistical significance to the observed reduction in tumor size for the experimental regimen as compared to the control regimen.
Adaptive Designs; Bayesian; Colorectal Cancer; Clinical Trials; Power Priors
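The traditional power prior mentioned above can be sketched for the simplest Gaussian case: a common mean with known variance, a flat initial prior, and a fixed discounting weight `a0` between 0 and 1 applied to the historical likelihood. The function and argument names are illustrative, and holding `a0` fixed is a simplifying assumption; the adaptive methods described in the abstract let the data inform how much is borrowed, which this sketch does not do.

```python
def power_prior_posterior(ybar, n, ybar0, n0, a0, sigma2):
    """Posterior mean and variance of a Gaussian mean under a fixed power prior.

    Assumes known variance sigma2, a flat initial prior, current data
    (ybar, n), and historical data (ybar0, n0) whose likelihood is raised
    to the power a0. The historical data then act like a0 * n0 observations.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    effective_n = n + a0 * n0
    post_mean = (n * ybar + a0 * n0 * ybar0) / effective_n
    post_var = sigma2 / effective_n
    return post_mean, post_var


# Hypothetical summaries: no borrowing, half borrowing, full pooling.
for weight in (0.0, 0.5, 1.0):
    print(weight, power_prior_posterior(2.0, 10, 3.0, 20, weight, 4.0))
```

With `a0 = 0` the historical data are ignored, and with `a0 = 1` the two sources are pooled as if exchangeable; intermediate values trade precision against potential bias from conflicting historical information.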
Researchers often include patient-reported outcomes (PROs) in Phase III clinical trials to demonstrate the value of treatment from the patient’s perspective. These data are collected as longitudinal repeated measures and are often censored by occurrence of a clinical event that defines a survival time. Hierarchical Bayesian models having latent individual-level trajectories provide a flexible approach to modeling such multiple outcome types simultaneously. We consider the case of many zeros in the longitudinal data motivating a mixture model, and demonstrate several approaches to modeling multiple longitudinal PROs with survival in a cancer clinical trial. These joint models may enhance Phase III analyses and better inform health care decision makers.
cancer; failure time; multivariate analysis; random effects model; repeated measures
Hospice service offers a convenient and ethically preferable health care option for terminally ill patients. However, this option is unavailable to patients in remote areas not served by any hospice system. In this paper we seek to determine the service areas of two particular cancer hospice systems in northeastern Minnesota based only on death counts abstracted from Medicare billing records. The problem is one of spatial boundary analysis, a field that appears statistically underdeveloped for irregular areal (lattice) data, even though most publicly available human health data are of this type. In this paper, we suggest a variety of hierarchical models for areal boundary analysis that hierarchically or jointly parameterize both the areas and the edge segments. This leads to conceptually appealing solutions for our data that remain computationally feasible. While our approaches parallel similar developments in statistical image restoration using Markov random fields, important differences arise due to the irregular nature of our lattices, the sparseness and high variability of our data, the existence of important covariate information, and most importantly, our desire for full posterior inference on the boundary. Our results successfully delineate service areas for our two Minnesota hospice systems that sometimes conflict with the hospices' self-reported service areas. We also obtain boundaries for the spatial residuals from our fits, separating regions that differ for reasons yet unaccounted for by our model.
Areal data; Conditionally autoregressive (CAR) model; Health services research; Ising model; Wombling
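One simple way to obtain posterior inference on areal boundaries is to summarize, for each pair of adjacent areas, how often the absolute difference in their fitted values exceeds a chosen threshold across posterior draws. The sketch below does only that, using synthetic draws in place of real MCMC output; the function name, the threshold, and the three-area layout are hypothetical, and the hierarchical models in the abstract that parameterize edge segments directly are not represented.

```python
import random


def boundary_probabilities(draws, edges, threshold):
    """Estimate P(|mu_i - mu_j| > threshold | data) for each adjacent pair.

    draws: list of posterior draws, each a list of fitted area-level values.
    edges: list of (i, j) index pairs for areas that share a border.
    """
    probs = {}
    for i, j in edges:
        exceed = sum(1 for d in draws if abs(d[i] - d[j]) > threshold)
        probs[(i, j)] = exceed / len(draws)
    return probs


# Synthetic stand-in for MCMC output: areas 0 and 1 are similar, area 2 differs.
random.seed(0)
true_means = [0.0, 0.0, 5.0]
draws = [[random.gauss(m, 1.0) for m in true_means] for _ in range(2000)]
edges = [(0, 1), (1, 2)]

edge_probs = boundary_probabilities(draws, edges, threshold=2.0)
for edge, prob in edge_probs.items():
    print(edge, round(prob, 3))
```

Edges with a high exceedance probability are candidates for a wombled boundary. Because the summary is computed draw by draw, it carries the posterior uncertainty in the fitted surface through to the boundary statements.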
Scientists and investigators in such diverse fields as geological and environmental sciences, ecology, forestry, disease mapping, and economics often encounter spatially referenced data collected over a fixed set of locations with coordinates (latitude–longitude, Easting–Northing etc.) in a region of study. Such point-referenced or geostatistical data are often best analyzed with Bayesian hierarchical models. Unfortunately, fitting such models involves computationally intensive Markov chain Monte Carlo (MCMC) methods whose efficiency depends upon the specific problem at hand. This requires extensive coding on the part of the user and the situation is not helped by the lack of available software for such algorithms. Here, we introduce a statistical software package, spBayes, built upon the R statistical computing platform that implements a generalized template encompassing a wide variety of Gaussian spatial process models for univariate as well as multivariate point-referenced data. We discuss the algorithms behind our package and illustrate its use with a synthetic and real data example.
Bayesian inference; coregionalization; kriging; Markov chain Monte Carlo; multivariate spatial process; R
In many applications involving geographically indexed data, interest focuses on identifying regions of rapid change in the spatial surface, or the related problem of the construction or testing of boundaries separating regions with markedly different observed values of the spatial variable. This process is often referred to in the literature as boundary analysis or wombling. Recent developments in hierarchical models for point-referenced (geostatistical) and areal (lattice) data have led to corresponding statistical wombling methods, but there does not appear to be any literature on the subject in the point process case, where the locations themselves are assumed to be random and likelihood evaluation is notoriously difficult. We extend existing point-level and areal wombling tools to this case, obtaining full posterior inference for multivariate spatial random effects that, when mapped, can help suggest spatial covariates still missing from the model. In the areal case we can also construct wombled maps showing significant boundaries in the fitted intensity surface, while the point-referenced formulation permits testing the significance of a postulated boundary. In the computationally demanding point-referenced case, our algorithm combines Monte Carlo approximants to the likelihood with a predictive process step to reduce the dimension of the problem to a manageable size. We apply these techniques to an analysis of colorectal and prostate cancer data from the northern half of Minnesota, where a key substantive concern is possible similarities in their spatial patterns, and whether they are affected by each patient's distance to facilities likely to offer helpful cancer screening options.
Bayesian; Cancer; Spatial point process; Wombling
With rapid improvements in medical treatment and health care, many datasets dealing with time to relapse or death now reveal a substantial portion of patients who are cured (i.e., who never experience the event). Extended survival models called cure rate models account for the probability of a subject being cured and can be broadly classified into the classical mixture models of Berkson and Gage (BG type) or the stochastic tumor models pioneered by Yakovlev and extended to a hierarchical framework by Chen, Ibrahim, and Sinha (YCIS type). Recent developments in Bayesian hierarchical cure models have evoked significant interest regarding relationships and preferences between these two classes of models. Our present work proposes a unifying class of cure rate models that facilitates flexible hierarchical model-building while including both existing cure model classes as special cases. This unifying class enables robust modeling by accounting for uncertainty in underlying mechanisms leading to cure. Issues such as regressing on the cure fraction and propriety of the associated posterior distributions under different modeling assumptions are also discussed. Finally, we offer a simulation study and also illustrate with two datasets (on melanoma and breast cancer) that reveal our framework’s ability to distinguish among underlying mechanisms that lead to relapse and cure.
Bayesian hierarchical model; Cure fraction; Cure rate model; Latent activation scheme; Markov chain Monte Carlo algorithm; Moment-generating functions; Survival analysis
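The two existing model classes can be contrasted through their population survival functions. In their standard forms, the BG mixture model gives S(t) = pi + (1 - pi) * S0(t), while the YCIS model gives S(t) = exp(-theta * F(t)) with cure fraction exp(-theta). The sketch below evaluates both, assuming an exponential latent distribution purely for illustration; the rate, `cure_prob`, and `theta` values are hypothetical, and the unifying class proposed in the abstract is not implemented here.

```python
import math


def exp_cdf(t, rate):
    """CDF of an exponential distribution (assumed latent event-time law)."""
    return 1.0 - math.exp(-rate * t)


def bg_survival(t, cure_prob, rate):
    """BG mixture: a cured fraction plus survival among the uncured."""
    return cure_prob + (1.0 - cure_prob) * (1.0 - exp_cdf(t, rate))


def ycis_survival(t, theta, rate):
    """YCIS form: exp(-theta * F(t)), which plateaus at exp(-theta)."""
    return math.exp(-theta * exp_cdf(t, rate))


# Choose theta so that both models share the same cure fraction of 0.3.
cure_prob = 0.3
theta = -math.log(cure_prob)
curve = [(t, bg_survival(t, cure_prob, 0.5), ycis_survival(t, theta, 0.5))
         for t in (0.0, 1.0, 5.0, 50.0)]
```

Both curves start at 1 and level off at the same cure fraction, yet they differ at intermediate times, which illustrates why data can in principle help distinguish between the mechanisms the two classes describe.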