The exact mechanisms relating exposure to ultraviolet (UV) radiation and elevated risk of skin cancer remain the subject of debate. For example, there is disagreement on whether the main risk factor is duration of the exposure, its intensity, or some combination of both. There is also uncertainty regarding the form of the dose-response curve, with many authors believing only exposures exceeding a given (but unknown) threshold are important. In this paper we explore methods to estimate such thresholds using hierarchical spatial logistic models based on a sample of a cohort of x-ray technologists for whom we have self-reports of time spent in the sun and numbers of blistering sunburns in childhood. A preliminary goal is to explore the temporal pattern of UV exposure and its gradient. Changes here would imply that identical exposure self-reports from different calendar years may correspond to differing cancer risks.
doi:10.1016/j.csda.2008.10.013
PMCID: PMC2705173
PMID: 20161236
Conditionally autoregressive (CAR) model; Erythemal exposure; Hierarchical model; Non-melanoma skin cancer
Summary
The evaluation of surrogate endpoints for primary use in future clinical trials is an increasingly important research area, due to demands for more efficient trials coupled with recent regulatory acceptance of some surrogates as ‘valid.’ However, little consideration has been given to how a trial which utilizes a newly-validated surrogate endpoint as its primary endpoint might be appropriately designed. We propose a novel Bayesian adaptive trial design that allows the new surrogate endpoint to play a dominant role in assessing the effect of an intervention, while remaining realistically cautious about its use. By incorporating multi-trial historical information on the validated relationship between the surrogate and clinical endpoints, then subsequently evaluating accumulating data against this relationship as the new trial progresses, we adaptively guard against an erroneous assessment of treatment based upon a truly invalid surrogate. When the joint outcomes in the new trial seem plausible given similar historical trials, we proceed with the surrogate endpoint as the primary endpoint, and do so adaptively–perhaps stopping the trial for early success or inferiority of the experimental treatment, or for futility. Otherwise, we discard the surrogate and switch adaptive determinations to the original primary endpoint. We use simulation to test the operating characteristics of this new design compared to a standard O’Brien-Fleming approach, as well as the ability of our design to discriminate trustworthy from untrustworthy surrogates in hypothetical future trials. Furthermore, we investigate possible benefits using patient-level data from 18 adjuvant therapy trials in colon cancer, where disease-free survival is considered a newly-validated surrogate endpoint for overall survival.
doi:10.1111/j.1541-0420.2011.01647.x
PMCID: PMC3218207
PMID: 21838811
Bayesian adaptive design; Clinical trials; Surrogate endpoints; Survival analysis
Summary
With the ready availability of spatial databases and geographical information system software, statisticians are increasingly encountering multivariate modelling settings featuring associations of more than one type: spatial associations between data locations and associations between the variables within the locations. Although flexible modelling of multivariate point-referenced data has recently been addressed by using a linear model of co-regionalization, existing methods for multivariate areal data typically suffer from unnecessary restrictions on the covariance structure or undesirable dependence on the conditioning order of the variables. We propose a class of Bayesian hierarchical models for multivariate areal data that avoids these restrictions, permitting flexible and order-free modelling of correlations both between variables and across areal units. Our framework encompasses a rich class of multivariate conditionally autoregressive models that are computationally feasible via modern Markov chain Monte Carlo methods. We illustrate the strengths of our approach over existing models by using simulation studies and also offer a real data application involving annual lung, larynx and oesophageal cancer death-rates in Minnesota counties between 1990 and 2000.
doi:10.1111/j.1467-9868.2007.00612.x
PMCID: PMC2963450
PMID: 20981244
Lattice data; Linear model of co-regionalization; Markov chain Monte Carlo methods; Multivariate conditionally autoregressive model; Spatial statistics
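As background for the conditionally autoregressive models named above, the following is a minimal sketch of the precision matrix of a univariate CAR prior, tau * (D - rho * W), where W is the areal adjacency matrix and D holds the neighbour counts. It is only the standard univariate building block, not the order-free multivariate class proposed in the abstract. The function name, the dictionary input format, and the four-area example are assumptions made for illustration.

```python
def car_precision(adjacency, tau=1.0, rho=1.0):
    """Precision matrix tau * (D - rho * W) of a univariate CAR prior.

    adjacency: dict mapping each area to the set of its neighbours
    (hypothetical input format; assumed symmetric).
    rho = 1 gives the intrinsic CAR (singular precision);
    |rho| < 1 gives a proper CAR.
    """
    areas = sorted(adjacency)
    index = {area: i for i, area in enumerate(areas)}
    n = len(areas)
    q = [[0.0] * n for _ in range(n)]
    for area, neighbours in adjacency.items():
        i = index[area]
        q[i][i] = tau * len(neighbours)             # D: number of neighbours
        for other in neighbours:
            q[i][index[other]] = -tau * rho         # -rho * W: adjacent areas
    return areas, q


# Illustrative lattice: four areas in a row, A - B - C - D.
areas, q = car_precision(
    {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
)
for area, row in zip(areas, q):
    print(area, row)
```

With rho = 1 every row of the matrix sums to zero, which is why the intrinsic CAR is improper and is usually paired with a sum-to-zero constraint on the random effects.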
Mixtures of Polya trees offer a very flexible nonparametric approach for modelling time-to-event data. Many such settings also feature spatial association that requires further sophistication, either at the point level or at the lattice level. In this paper, we combine these two aspects within three competing survival models, obtaining a data analytic approach that remains computationally feasible in a fully hierarchical Bayesian framework using Markov chain Monte Carlo methods. We illustrate our proposed methods with an analysis of spatially oriented breast cancer survival data from the Surveillance, Epidemiology and End Results program of the National Cancer Institute. Our results indicate appreciable advantages for our approach over competing methods that impose unrealistic parametric assumptions, ignore spatial association or both.
doi:10.1093/biomet/asp014
PMCID: PMC2749263
PMID: 19779579
Areal data; Bayesian modelling; Breast cancer; Conditionally autoregressive model; Log pseudo marginal likelihood; Nonparametric modelling
Background
Falling is a common and morbid condition among elderly persons. Effective strategies to prevent falls have been identified but are underutilized.
Methods
Using a nonrandomized design, we compared rates of injuries from falls in a region of Connecticut where clinicians had been exposed to interventions to change clinical practice (intervention region) and in a region where clinicians had not been exposed to such interventions (usual-care region). The interventions encouraged primary care clinicians and staff members involved in home care, outpatient rehabilitation, and senior centers to adopt effective risk assessments and strategies for the prevention of falls (e.g., medication reduction and balance and gait training). The outcomes were rates of serious fall-related injuries (hip and other fractures, head injuries, and joint dislocations) and fall-related use of medical services per 1000 person-years among persons who were 70 years of age or older. The interventions occurred from 2001 to 2004, and the evaluations took place from 2004 to 2006.
Results
Before the interventions, the adjusted rates of serious fall-related injuries (per 1000 person-years) were 31.2 in the usual-care region and 31.9 in the intervention region. During the evaluation period, the adjusted rates were 31.4 and 28.6, respectively (adjusted rate ratio, 0.91; 95% Bayesian credibility interval, 0.88 to 0.94). Between the preintervention period and the evaluation period, the rate of fall-related use of medical services increased from 68.1 to 83.3 per 1000 person-years in the usual-care region and from 70.7 to 74.2 in the intervention region (adjusted rate ratio, 0.89; 95% credibility interval, 0.86 to 0.92). The percentages of clinicians who received intervention visits ranged from 62% (131 of 212 primary care offices) to 100% (26 of 26 home care agencies).
Conclusions
Dissemination of evidence about fall prevention, coupled with interventions to change clinical practice, may reduce fall-related injuries in elderly persons.
doi:10.1056/NEJMoa0801748
PMCID: PMC3472807
PMID: 18635430
Summary
Estimation of extreme quantal-response statistics, such as the concentration required to kill 99.9% of test subjects (LC99.9), remains a challenge in the presence of multiple covariates and complex study designs. Accurate and precise estimates of the LC99.9 for mixtures of toxicants are critical to ongoing control of a parasitic invasive species, the sea lamprey, in the Laurentian Great Lakes of North America. The toxicity of those chemicals is affected by local and temporal variations in water chemistry, which must be incorporated into the modeling. We develop multilevel empirical Bayes models for data from multiple laboratory studies. Our approach yields more accurate and precise estimation of the LC99.9 compared to alternative models considered. This study demonstrates that properly incorporating hierarchical structure in laboratory data yields better estimates of LC99.9 stream treatment values that are critical to larvae control in the field. In addition, out-of-sample prediction of the results of in situ tests reveals the presence of a latent seasonal effect not manifest in the laboratory studies, suggesting avenues for future study and illustrating the importance of dual consideration of both experimental and observational data.
doi:10.1111/j.1541-0420.2011.01566.x
PMCID: PMC3111860
PMID: 21361894
Lethal concentration/dose; Markov chain Monte Carlo (MCMC); Non-linear model; Quantal-response bioassay
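To make the LC99.9 quantity in the abstract above concrete, here is a minimal sketch of how a lethal concentration is read off a fitted dose-response curve. It assumes a simple two-parameter logistic model on log10 concentration, logit(p) = intercept + slope * log10(c), with no covariates and no hierarchy, so it is not the multilevel empirical Bayes model of the paper. The function names and coefficient values are hypothetical.

```python
import math


def lethal_concentration(p, intercept, slope):
    """Concentration at which a fraction p of subjects is killed.

    Assumes logit(p) = intercept + slope * log10(concentration),
    an illustrative model rather than the one fitted in the paper.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if slope <= 0.0:
        raise ValueError("slope must be positive for an increasing curve")
    logit_p = math.log(p / (1.0 - p))
    return 10.0 ** ((logit_p - intercept) / slope)


def mortality(concentration, intercept, slope):
    """Predicted mortality at a given concentration under the same model."""
    eta = intercept + slope * math.log10(concentration)
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients, chosen only to exercise the functions.
lc999 = lethal_concentration(0.999, intercept=-2.0, slope=4.0)
print(lc999, mortality(lc999, intercept=-2.0, slope=4.0))
```

Because logit(0.999) is far out in the tail of the curve, small changes in the slope move the LC99.9 substantially, which is one reason extreme quantiles are harder to estimate than the LC50.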
Summary
Bayesian clinical trial designs offer the possibility of a substantially reduced sample size, increased statistical power, and reductions in cost and ethical hazard. However, when prior and current information conflict, Bayesian methods can lead to higher than expected Type I error, as well as the possibility of a costlier and lengthier trial. This motivates an investigation of the feasibility of hierarchical Bayesian methods for incorporating historical data that are adaptively robust to prior information that reveals itself to be inconsistent with the accumulating experimental data. In this paper, we present several models that allow for the commensurability of the information in the historical and current data to determine how much historical information is used. A primary tool is elaborating the traditional power prior approach based upon a measure of commensurability for Gaussian data. We compare the frequentist performance of several methods using simulations, and close with an example of a colon cancer trial that illustrates a linear models extension of our adaptive borrowing approach. Our proposed methods produce more precise estimates of the model parameters, in particular conferring statistical significance to the observed reduction in tumor size for the experimental regimen as compared to the control regimen.
doi:10.1111/j.1541-0420.2011.01564.x
PMCID: PMC3134568
PMID: 21361892
Adaptive Designs; Bayesian; Colorectal Cancer; Clinical Trials; Power Priors
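The traditional power prior that the abstract above builds on can be illustrated for the simplest Gaussian case. The sketch below assumes a known sampling variance, a flat initial prior, and a fixed discounting weight a0 between 0 and 1. The paper's contribution is to let the data determine the amount of borrowing, which this sketch does not attempt. All names and numbers are hypothetical.

```python
def power_prior_posterior(ybar, n, ybar0, n0, sigma2, a0):
    """Posterior mean and variance of a Gaussian mean under a power prior.

    The historical likelihood (mean ybar0, size n0) is raised to the
    power a0, so the historical data count as a0 * n0 observations.
    Assumes known variance sigma2 and a flat initial prior.
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    effective_n = a0 * n0 + n
    mean = (a0 * n0 * ybar0 + n * ybar) / effective_n
    variance = sigma2 / effective_n
    return mean, variance


# Hypothetical current and historical summaries.
for a0 in (0.0, 0.5, 1.0):
    print(a0, power_prior_posterior(ybar=1.0, n=50, ybar0=0.0, n0=50,
                                    sigma2=1.0, a0=a0))
```

Setting a0 = 0 ignores the historical data and a0 = 1 pools the two data sets fully. When the two sample means conflict, as in this example, larger a0 pulls the posterior mean toward the historical value while shrinking the variance, which is the situation the abstract identifies as a source of inflated Type I error.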
Researchers often include patient-reported outcomes (PROs) in Phase III clinical trials to demonstrate the value of treatment from the patient’s perspective. These data are collected as longitudinal repeated measures and are often censored by occurrence of a clinical event that defines a survival time. Hierarchical Bayesian models having latent individual-level trajectories provide a flexible approach to modeling such multiple outcome types simultaneously. We consider the case of many zeros in the longitudinal data motivating a mixture model, and demonstrate several approaches to modeling multiple longitudinal PROs with survival in a cancer clinical trial. These joint models may enhance Phase III analyses and better inform health care decision makers.
doi:10.1080/10543406.2011.590922
PMCID: PMC3212950
PMID: 21830926
cancer; failure time; multivariate analysis; random effects model; repeated measures
The analysis of point-level (geostatistical) data has historically been plagued by computational difficulties, owing to the high dimension of the nondiagonal spatial covariance matrices that need to be inverted. This problem is greatly compounded in hierarchical Bayesian settings, since these inversions need to take place at every iteration of the associated Markov chain Monte Carlo (MCMC) algorithm. This paper offers an approach for modeling the spatial correlation at two separate scales. This reduces the computational problem to a collection of lower-dimensional inversions that remain feasible within the MCMC framework. The approach yields full posterior inference for the model parameters of interest, as well as the fitted spatial response surface itself. We illustrate the importance and applicability of our methods using a collection of dense point-referenced breast cancer data collected over the mostly rural northern part of the state of Minnesota. Substantively, we wish to discover whether women who live more than a 60-mile drive from the nearest radiation treatment facility tend to opt for mastectomy over breast conserving surgery (BCS, or “lumpectomy”), which is less disfiguring but requires 6 weeks of follow-up radiation therapy. Our hierarchical multiresolution approach resolves this question while still properly accounting for all sources of spatial association in the data.
doi:10.1016/j.csda.2007.09.011
PMCID: PMC2344142
PMID: 19158942
Aggregated geographic data; Big N problem; Breast cancer; Conditionally autoregressive (CAR) model; Hierarchical modeling; Kriging
Summary
Hospice service offers a convenient and ethically preferable health care option for terminally ill patients. However, this option is unavailable to patients in remote areas not served by any hospice system. In this paper we seek to determine the service areas of two particular cancer hospice systems in northeastern Minnesota based only on death counts abstracted from Medicare billing records. The problem is one of spatial boundary analysis, a field that appears statistically underdeveloped for irregular areal (lattice) data, even though most publicly available human health data are of this type. In this paper, we suggest a variety of hierarchical models for areal boundary analysis that hierarchically or jointly parameterize both the areas and the edge segments. This leads to conceptually appealing solutions for our data that remain computationally feasible. While our approaches parallel similar developments in statistical image restoration using Markov random fields, important differences arise due to the irregular nature of our lattices, the sparseness and high variability of our data, the existence of important covariate information, and most importantly, our desire for full posterior inference on the boundary. Our results successfully delineate service areas for our two Minnesota hospice systems that sometimes conflict with the hospices' self-reported service areas. We also obtain boundaries for the spatial residuals from our fits, separating regions that differ for reasons yet unaccounted for by our model.
doi:10.1111/j.1541-0420.2009.01291.x
PMCID: PMC3061258
PMID: 19645704
Areal data; Conditionally autoregressive (CAR) model; Health services research; Ising model; Wombling
Scientists and investigators in such diverse fields as geological and environmental sciences, ecology, forestry, disease mapping, and economics often encounter spatially referenced data collected over a fixed set of locations with coordinates (latitude–longitude, Easting–Northing etc.) in a region of study. Such point-referenced or geostatistical data are often best analyzed with Bayesian hierarchical models. Unfortunately, fitting such models involves computationally intensive Markov chain Monte Carlo (MCMC) methods whose efficiency depends upon the specific problem at hand. This requires extensive coding on the part of the user and the situation is not helped by the lack of available software for such algorithms. Here, we introduce a statistical software package, spBayes, built upon the R statistical computing platform that implements a generalized template encompassing a wide variety of Gaussian spatial process models for univariate as well as multivariate point-referenced data. We discuss the algorithms behind our package and illustrate its use with a synthetic and real data example.
PMCID: PMC3074178
PMID: 21494410
Bayesian inference; coregionalization; kriging; Markov chain Monte Carlo; multivariate spatial process; R
Summary
In many applications involving geographically indexed data, interest focuses on identifying regions of rapid change in the spatial surface, or the related problem of the construction or testing of boundaries separating regions with markedly different observed values of the spatial variable. This process is often referred to in the literature as boundary analysis or wombling. Recent developments in hierarchical models for point-referenced (geostatistical) and areal (lattice) data have led to corresponding statistical wombling methods, but there does not appear to be any literature on the subject in the point process case, where the locations themselves are assumed to be random and likelihood evaluation is notoriously difficult. We extend existing point-level and areal wombling tools to this case, obtaining full posterior inference for multivariate spatial random effects that, when mapped, can help suggest spatial covariates still missing from the model. In the areal case we can also construct wombled maps showing significant boundaries in the fitted intensity surface, while the point-referenced formulation permits testing the significance of a postulated boundary. In the computationally demanding point-referenced case, our algorithm combines Monte Carlo approximants to the likelihood with a predictive process step to reduce the dimension of the problem to a manageable size. We apply these techniques to an analysis of colorectal and prostate cancer data from the northern half of Minnesota, where a key substantive concern is possible similarities in their spatial patterns, and whether they are affected by each patient's distance to facilities likely to offer helpful cancer screening options.
doi:10.1111/j.1541-0420.2009.01203.x
PMCID: PMC2795082
PMID: 19302408
Bayesian; Cancer; Spatial point process; Wombling
With rapid improvements in medical treatment and health care, many datasets dealing with time to relapse or death now reveal a substantial portion of patients who are cured (i.e., who never experience the event). Extended survival models called cure rate models account for the probability of a subject being cured and can be broadly classified into the classical mixture models of Berkson and Gage (BG type) or the stochastic tumor models pioneered by Yakovlev and extended to a hierarchical framework by Chen, Ibrahim, and Sinha (YCIS type). Recent developments in Bayesian hierarchical cure models have evoked significant interest regarding relationships and preferences between these two classes of models. Our present work proposes a unifying class of cure rate models that facilitates flexible hierarchical model-building while including both existing cure model classes as special cases. This unifying class enables robust modeling by accounting for uncertainty in underlying mechanisms leading to cure. Issues such as regressing on the cure fraction and propriety of the associated posterior distributions under different modeling assumptions are also discussed. Finally, we offer a simulation study and also illustrate with two datasets (on melanoma and breast cancer) that reveal our framework’s ability to distinguish among underlying mechanisms that lead to relapse and cure.
doi:10.1198/016214507000000112
PMCID: PMC2964090
PMID: 21031152
Bayesian hierarchical model; Cure fraction; Cure rate model; Latent activation scheme; Markov chain Monte Carlo algorithm; Moment-generating functions; Survival analysis
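The two existing model classes named in the abstract above have simple textbook population survival functions, sketched here for comparison. The sketch assumes an exponential distribution for the latent event times and uses hypothetical parameter values. It does not implement the unifying class that the paper proposes.

```python
import math


def bg_survival(t, cure_prob, rate):
    """Berkson-Gage mixture model: S(t) = pi + (1 - pi) * S_uncured(t).

    Uncured subjects are assumed to have exponential survival with the
    given rate (an illustrative choice).
    """
    return cure_prob + (1.0 - cure_prob) * math.exp(-rate * t)


def yc_survival(t, theta, rate):
    """Promotion-time (stochastic tumor) model: S(t) = exp(-theta * F(t)).

    F is the distribution function of the latent activation times, here
    assumed exponential with the given rate. The cure fraction is
    exp(-theta), reached as t grows large.
    """
    f_t = 1.0 - math.exp(-rate * t)
    return math.exp(-theta * f_t)


# Match the cure fractions so that both curves share the same plateau.
cure_prob = 0.3
theta = -math.log(cure_prob)
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, bg_survival(t, cure_prob, 0.5), yc_survival(t, theta, 0.5))
```

Both curves start at 1 and level off at the same cure fraction, but they approach the plateau differently, which is the kind of difference in underlying mechanism that the abstract refers to.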
SUMMARY
Hodges & Sargent (2001) developed a measure of a hierarchical model’s complexity, degrees of freedom (DF), that is consistent with definitions for scatterplot smoothers, interpretable in terms of simple models, and that enables control of a fit’s complexity by means of a prior distribution on complexity. DF describes complexity of the whole fitted model but in general it is unclear how to allocate DF to individual effects. We give a new definition of DF for arbitrary normal-error linear hierarchical models, consistent with Hodges & Sargent’s, that naturally partitions the n observations into DF for individual effects and for error. The new conception of an effect’s DF is the ratio of the effect’s modeled variance matrix to the total variance matrix. This gives a way to describe the sizes of different parts of a model (e.g., spatial clustering vs. heterogeneity), to place DF-based priors on smoothing parameters, and to describe how a smoothed effect competes with other effects. It also avoids difficulties with the most common definition of DF for residuals. We conclude by comparing DF to the effective number of parameters pD of Spiegelhalter et al. (2002). Technical appendices and a dataset are available online as supplemental materials.
doi:10.1198/TECH.2009.08161
PMCID: PMC2886314
PMID: 20559456
Degrees of freedom; hierarchical model; model complexity; prior distribution
Colon and rectum cancer share many risk factors, and are often tabulated together as “colorectal cancer” in published summaries. However, recent work indicating that exercise, diet, and family history may have differential impacts on the two cancers encourages analyzing them separately, so that corresponding public health interventions can be more efficiently targeted. We analyze colon and rectum cancer data from the Minnesota Cancer Surveillance System from 1998-2002 over the 16-county Twin Cities (Minneapolis-St. Paul) metro and exurban area. The data consist of two marked point patterns, meaning that any statistical model must account for randomness in the observed locations, and expected positive association between the two cancer patterns. Our model extends marked spatial point pattern analysis in the context of a log Gaussian Cox process to accommodate spatially referenced covariates (local poverty rate and location within the metro area), individual-level risk factors (patient age and cancer stage), and related interactions. We obtain smoothed maps of marginal log-relative intensity surfaces for colon and rectum cancer, and uncover significant age and stage differences between the two groups. This encourages more aggressive colon cancer screening in the inner Twin Cities and their southern and western exurbs, where our model indicates higher colon cancer relative intensity.
doi:10.1214/09-AOAS240
PMCID: PMC2857924
PMID: 20414368
Colon cancer; log Gaussian Cox process (LGCP); rectum cancer; spatial point process
DNA microarray analysis is a biological technology which permits the whole genome to be monitored simultaneously on a single slide. Microarray technology not only opens an exciting research area for biologists, but also provides significant new challenges to statisticians. Two very common questions in the analysis of microarray data are, first, should we normalize arrays to remove potential systematic biases, and if so, what normalization method should we use? Second, how should we then implement tests of statistical significance? Straightforward and uniform answers to these questions remain elusive. In this paper, we use a real data example to illustrate a practical approach to addressing these questions. Our data is taken from a DNA–protein binding microarray experiment aimed at furthering our understanding of transcription regulation mechanisms, one of the most important issues in biology. For the purpose of preprocessing data, we suggest looking at descriptive plots first to decide whether we need preliminary normalization and, if so, how this should be accomplished. For subsequent comparative inference, we recommend use of an empirical Bayes method (the B statistic), since it performs much better than traditional methods, such as the sample mean (M statistic) and Student's t statistic, and it is also relatively easy to compute and explain compared to the others. The false discovery rate (FDR) is used to evaluate the different methods, and our comparative results lend support to our above suggestions.
doi:10.1002/cfg.416
PMCID: PMC2447464
PMID: 18629172