Among patients on dialysis, cardiovascular disease and infection are leading causes of hospitalization and death. Although recent studies have found that the risk of cardiovascular events is higher after an infection-related hospitalization, studies have not fully elucidated how the risk of cardiovascular events changes over time for patients on dialysis. In this work, we characterize the dynamics of cardiovascular event risk trajectories for patients on dialysis while conditioning on survival status via multiple time indices: (1) time since the start of dialysis, (2) time since the pivotal initial infection-related hospitalization, and (3) the patient’s age at the start of dialysis. This is achieved using a new class of generalized multiple-index varying coefficient (GM-IVC) models. The proposed GM-IVC models utilize a multiplicative structure and one-dimensional varying coefficient functions along each time and age index to capture the cardiovascular risk dynamics before and after the initial infection-related hospitalization among the dynamic cohort of survivors. We develop a two-step estimation procedure for the GM-IVC models based on local maximum likelihood. We report new insights on the dynamics of cardiovascular event risk using the United States Renal Data System database, which collects data on nearly all patients with end-stage renal disease in the U.S. Finally, simulation studies assess the performance of the proposed estimation procedures.
doi:10.1111/biom.12176
PMCID: PMC4209204
PMID: 24766178
Cardiovascular outcomes; End stage renal disease; Generalized linear models; Infection; Time-varying effects; United States Renal Data System
Summary
In cancer research, profiling studies have been extensively conducted, searching for genes/SNPs associated with prognosis. Cancer is diverse. Examining the similarity and difference in the genetic basis of multiple subtypes of the same cancer can lead to a better understanding of their connections and distinctions. Classic meta-analysis methods analyze each subtype separately and then compare analysis results across subtypes. Integrative analysis methods, in contrast, analyze the raw data on multiple subtypes simultaneously and can outperform meta-analysis methods. In this study, prognosis data on multiple subtypes of the same cancer are analyzed. An AFT (accelerated failure time) model is adopted to describe survival. The genetic basis of multiple subtypes is described using the heterogeneity model, which allows a gene/SNP to be associated with prognosis of some subtypes but not others. A compound penalization method is developed to identify genes that contain important SNPs associated with prognosis. The proposed method has an intuitive formulation and is realized using an iterative algorithm. Asymptotic properties are rigorously established. Simulation shows that the proposed method has satisfactory performance and outperforms a penalization-based meta-analysis method and a regularized thresholding method. An NHL (non-Hodgkin lymphoma) prognosis study with SNP measurements is analyzed. Genes associated with the three major subtypes, namely DLBCL, FL, and CLL/SLL, are identified. The proposed method identifies genes that differ from those selected by the alternative approaches, have important implications, and achieve satisfactory prediction performance.
doi:10.1111/biom.12177
PMCID: PMC4209207
PMID: 24766212
Cancer prognosis; Integrative analysis; Genetic association; Marker identification; Penalization
Summary
A potential avenue to improve healthcare efficiency is to effectively tailor individualized treatment strategies by incorporating patient-level predictor information such as environmental exposure, biological, and genetic marker measurements. Many useful statistical methods for deriving individualized treatment rules (ITRs) have become available in recent years. Prior to adopting any ITR in clinical practice, it is crucial to evaluate its value in improving patient outcomes. Existing methods for quantifying such values mainly consider either a single marker or semi-parametric methods that are subject to bias under model misspecification. In this paper, we consider a general setting with multiple markers and propose a two-step robust method to derive ITRs and evaluate their values. We also propose procedures for comparing different ITRs, which can be used to quantify the incremental value of new markers in improving treatment selection. While working models are used in step I to approximate optimal ITRs, we add a layer of calibration to guard against model misspecification and further assess the value of the ITR non-parametrically, which ensures the validity of the inference. To account for the sampling variability of the estimated rules and their corresponding values, we propose a resampling procedure to provide valid confidence intervals for the value functions as well as for the incremental value of new markers for treatment selection. Our proposals are examined through extensive simulation studies and illustrated with the data from a clinical trial that studies the effects of two drug combinations on HIV-1 infected patients.
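The core quantity in this abstract, the value of an ITR, can be estimated non-parametrically in a randomized trial by inverse-probability weighting. The following is a minimal sketch on simulated toy data; the outcome model, marker, and the simple threshold rule are all hypothetical illustrations, not the paper's actual working models.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                 # a single marker (toy data)
a = rng.integers(0, 2, size=n)         # randomized treatment, P(A=1) = 0.5
# hypothetical outcome model: treatment helps when x > 0, harms otherwise
y = 1.0 + np.where(x > 0, 2.0, -1.0) * a + rng.normal(scale=0.5, size=n)

def value_ipw(rule, x, a, y, p=0.5):
    """IPW estimate of the value E[Y] if everyone followed `rule`."""
    d = rule(x).astype(int)
    w = (a == d) / np.where(d == 1, p, 1.0 - p)   # weight compliers with the rule
    return np.sum(w * y) / np.sum(w)

v_marker = value_ipw(lambda x: x > 0, x, a, y)                      # marker-based ITR
v_all = value_ipw(lambda x: np.ones_like(x, dtype=bool), x, a, y)   # treat everyone
```

Comparing `v_marker` with `v_all` quantifies the incremental value of using the marker for treatment selection; the paper's resampling procedure would then supply confidence intervals for this difference.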
doi:10.1111/biom.12179
PMCID: PMC4213325
PMID: 24779731
Biomarker-analysis Design; Counterfactual Outcome; Personalized Medicine; Perturbation-resampling; Predictive Biomarkers; Subgroup Analysis
Summary
The identification of causal peer effects (also known as social contagion or induction) from observational data in social networks is challenged by two distinct sources of bias: latent homophily and unobserved confounding. In this paper, we investigate how causal peer effects of traits and behaviors can be identified using genes (or other structurally isomorphic variables) as instrumental variables (IV) in a large set of data generating models with homophily and confounding. We use directed acyclic graphs to represent these models, employ multiple IV strategies, and report three main identification results. First, using a single fixed gene (or allele) as an IV will generally fail to identify peer effects if the gene affects past values of the treatment. Second, multiple fixed genes/alleles, or, more promisingly, time-varying gene expression, can identify peer effects if we instrument exclusion violations as well as the focal treatment. Third, we show that IV identification of peer effects remains possible even under multiple complications often regarded as lethal for IV identification of intra-individual effects, such as pleiotropy on observables and unobservables, homophily on past phenotype, past and ongoing homophily on genotype, inter-phenotype peer effects, population stratification, gene expression that is endogenous to past phenotype and past gene expression, and others. We apply our identification results to estimating peer effects of body mass index (BMI) among friends and spouses in the Framingham Heart Study. Results suggest a positive causal peer effect of BMI between friends.
doi:10.1111/biom.12172
PMCID: PMC4213357
PMID: 24779654
Body-mass index; Causality; Directed Acyclic Graphs; Dyad; Genes; Homophily; Instrumental variable; Longitudinal; Mendelian randomization; Peer effect; Social network; Two-stage least squares
Summary
Estimating the effectiveness of a new intervention is usually the primary objective for HIV prevention trials. The Cox proportional hazards model is commonly used to estimate effectiveness, assuming that participants share the same risk given the covariates and that the risk is always non-zero. In fact, the risk is non-zero only when an exposure event occurs, and participants can have varying risks of transmission due to varying patterns of exposure events. Therefore, we propose a novel estimate of effectiveness adjusted for the heterogeneity in the magnitude of exposure among the study population, using a latent Poisson process model for the exposure path of each participant. Moreover, our model considers the scenario in which a proportion of participants never experience an exposure event and adopts a zero-inflated distribution for the rate of the exposure process. We employ a Bayesian estimation approach to estimate the exposure-adjusted effectiveness, eliciting priors from historical information. Simulation studies are carried out to validate the approach and explore the properties of the estimates. An application example is presented from an HIV prevention trial.
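The zero-inflated exposure structure described above can be sketched with a small simulation. All numerical values (the never-exposed proportion, gamma rate distribution, per-exposure risk, and effectiveness) are assumptions for illustration, not values from the trial.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000
p_never = 0.3                        # assumed proportion never exposed
rate = rng.gamma(2.0, 2.0, size=n)   # gamma-distributed exposure rates
rate[rng.random(n) < p_never] = 0.0  # zero-inflation: no exposure process at all
exposures = rng.poisson(rate)        # latent number of exposure events per person

risk = 0.05                          # assumed per-exposure transmission probability
eff = 0.6                            # assumed per-exposure effectiveness
# infection probability accumulates over the latent exposure events
p_inf_placebo = 1 - (1 - risk) ** exposures
p_inf_active = 1 - (1 - risk * (1 - eff)) ** exposures
```

Because a non-trivial fraction of participants have zero exposures, an unadjusted per-participant effectiveness estimate would dilute the per-exposure effect, which is the motivation for the exposure-adjusted estimand.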
doi:10.1111/biom.12183
PMCID: PMC4239192
PMID: 24845658
Hierarchical models; HIV prevention; Intercourse; Markov chain Monte Carlo; Per-exposure effectiveness; Zero-inflated gamma
Summary
Motivated by the problem of constructing gene co-expression networks, we propose a statistical framework for estimating a high-dimensional partial correlation matrix by a three-step approach. We first obtain a penalized estimate of the partial correlation matrix using a ridge penalty. Next we select the non-zero entries of the partial correlation matrix by hypothesis testing. Finally we re-estimate the partial correlation coefficients at these non-zero entries. In the second step, the null distribution of the test statistics derived from penalized partial correlation estimates has not been established. We address this challenge by estimating the null distribution from the empirical distribution of the test statistics of all the penalized partial correlation estimates. Extensive simulation studies demonstrate the good performance of our method. Application to a yeast cell cycle gene expression dataset shows that our method delivers better predictions of the protein-protein interactions than the graphical lasso.
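The first two steps can be sketched as follows, with a hard cutoff standing in for the paper's empirical-null hypothesis test; the ridge penalty value, data dimensions, and cutoff are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.normal(size=(n, p))
X[:, 1] += X[:, 0]          # variables 0 and 1 are partially correlated by design

S = np.cov(X, rowvar=False)
lam = 0.1                                     # ridge penalty (assumed value)
Omega = np.linalg.inv(S + lam * np.eye(p))    # ridge-penalized precision matrix
d = np.sqrt(np.diag(Omega))
pcor = -Omega / np.outer(d, d)                # partial correlation estimates
np.fill_diagonal(pcor, 1.0)

# stand-in for the empirical-null testing step: keep entries above a cutoff
selected = np.abs(pcor) > 0.25
np.fill_diagonal(selected, False)
```

The third step would then re-fit the partial correlations restricted to the selected support, removing the shrinkage bias of the ridge estimate at those entries.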
doi:10.1111/biom.12186
PMCID: PMC4239206
PMID: 24845967
Co-expression network; Empirical null distribution; Graphical model; Partial correlation matrix; Ridge regression
Summary
Interference occurs when the treatment of one person affects the outcome of another. For example, in infectious diseases, whether one individual is vaccinated may affect whether another individual becomes infected or develops disease. Quantifying such indirect (or spillover) effects of vaccination could have important public health or policy implications. In this paper we use recently developed inverse-probability weighted (IPW) estimators of treatment effects in the presence of interference to analyze an individually-randomized, placebo-controlled trial of cholera vaccination that targeted 121,982 individuals in Matlab, Bangladesh. Because these IPW estimators have not been employed previously, a simulation study was also conducted to assess the empirical behavior of the estimators in settings similar to the cholera vaccine trial. Simulation study results demonstrate that the IPW estimators can yield unbiased estimates of the direct, indirect, total and overall effects of vaccination when there is interference, provided the untestable no-unmeasured-confounders assumption holds and the group-level propensity score model is correctly specified. Application of the IPW estimators to the cholera vaccine trial indicates the presence of interference. For example, the IPW estimates suggest on average 5.29 fewer cases of cholera per 1000 person-years (95% confidence interval 2.61, 7.96) will occur among unvaccinated individuals within neighborhoods with 60% vaccine coverage compared to neighborhoods with 32% coverage. Our analysis also demonstrates how not accounting for interference can render misleading conclusions about the public health utility of vaccination.
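The indirect effect being estimated, the change in risk among the *unvaccinated* as neighborhood coverage varies, can be illustrated with a toy simulation. The risk model below (unvaccinated risk falling linearly in coverage, a fixed direct vaccine effect) is entirely assumed and is not the trial's data-generating process or the IPW estimator itself.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_neighborhood(coverage, n=20000):
    """Toy model: unvaccinated risk falls as coverage rises (herd protection)."""
    vacc = rng.random(n) < coverage
    p_unvacc = 0.10 * (1 - 0.8 * coverage)   # assumed indirect protection
    p_vacc = 0.3 * p_unvacc                  # assumed direct vaccine effect
    infected = rng.random(n) < np.where(vacc, p_vacc, p_unvacc)
    return vacc, infected

v_hi, i_hi = simulate_neighborhood(0.60)
v_lo, i_lo = simulate_neighborhood(0.32)
# indirect (spillover) effect: unvaccinated risk at 32% vs. 60% coverage
indirect = i_lo[~v_lo].mean() - i_hi[~v_hi].mean()
```

A positive `indirect` value mirrors the trial finding: unvaccinated individuals in high-coverage neighborhoods experience fewer cases, which an analysis ignoring interference would miss.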
doi:10.1111/biom.12184
PMCID: PMC4239215
PMID: 24845800
Causal inference; Interference; Inverse-probability weighted estimators; Spillover effect; Two-stage randomization; Vaccine
Summary
Spatial-clustered data refer to high-dimensional correlated measurements collected from units or subjects that are spatially clustered. Such data arise frequently from studies in the social and health sciences. We propose a unified modeling framework, termed GeoCopula, to characterize both large-scale and small-scale variation for various data types, including continuous, binary, and count data as special cases. To overcome challenges in estimation and inference for the model parameters, we propose an efficient composite likelihood approach in which estimation efficiency results from the construction of over-identified joint composite estimating equations. Consequently, the statistical theory for the proposed estimation is developed by extending the classical theory of the generalized method of moments. A clear advantage of the proposed estimation method is its computational feasibility. We conduct several simulation studies to assess the performance of the proposed models and estimation methods for both Gaussian and binary spatial-clustered data. Results show a clear improvement in estimation efficiency over the conventional composite likelihood method. An illustrative data example is included to motivate and demonstrate the proposed method.
doi:10.1111/biom.12199
PMCID: PMC4431962
PMID: 24945876
Gaussian copula; Generalized method of moments; Geographical cluster; Matérn class; Regression
Summary
Competing risks arise naturally in time-to-event studies. In this article, we propose time-dependent accuracy measures for a marker when we have censored survival times and competing risks. Time-dependent versions of sensitivity or true positive (TP) fraction naturally correspond to consideration of either cumulative (or prevalent) cases that accrue over a fixed time period, or alternatively to incident cases that are observed among event-free subjects at any select time. Time-dependent (dynamic) specificity (1 − false positive (FP)) can be based on the marker distribution among event-free subjects. We extend these definitions to incorporate cause of failure for competing risks outcomes. The proposed estimation for cause-specific cumulative TP/dynamic FP is based on nearest neighbor estimation of the bivariate distribution function of the marker and the event time. On the other hand, incident TP/dynamic FP can be estimated using a possibly nonproportional hazards Cox model for the cause-specific hazards and riskset reweighting of the marker distribution. The proposed methods extend the time-dependent predictive accuracy measures of Heagerty, Lumley, and Pepe.
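The cumulative-case / dynamic-control definitions can be made concrete with a sketch on uncensored toy data (the paper's nearest-neighbor estimator handles censoring and cause of failure, which are omitted here; the marker-dependent event-time model is an assumption).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000
marker = rng.normal(size=n)
# toy event times: higher marker values fail earlier (no censoring here)
time = rng.exponential(scale=np.exp(-marker))

def cumulative_tp_dynamic_fp(marker, time, t, c):
    """Cumulative cases accrued by time t vs. event-free (dynamic) controls at t."""
    case = time <= t
    tp = np.mean(marker[case] > c)      # cumulative true positive fraction at cutoff c
    fp = np.mean(marker[~case] > c)     # dynamic false positive fraction at cutoff c
    return tp, fp

tp, fp = cumulative_tp_dynamic_fp(marker, time, t=1.0, c=0.0)
```

Sweeping the cutoff `c` traces out the time-dependent ROC curve at horizon `t`; a competing-risks extension would further split `case` by cause of failure.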
doi:10.1111/j.1541-0420.2009.01375.x
PMCID: PMC4512205
PMID: 20070296
Accuracy; Competing risks; Cox regression; Discrimination; Kaplan–Meier estimator; Kernel smoothing; Prediction; Sensitivity; Specificity
Summary
The availability of cross-platform, large-scale genomic data has enabled the investigation of complex biological relationships for many cancers. Identification of reliable cancer-related biomarkers requires the characterization of multiple interactions across complex genetic networks. MicroRNAs are small non-coding RNAs that regulate gene expression; however, the direct relationship between a microRNA and its target gene is difficult to measure. We propose a novel Bayesian model to identify microRNAs and their target genes that are associated with survival time by incorporating the microRNA regulatory network through prior distributions. We assume that biomarkers involved in regulatory networks are likely associated with survival time. We employ non-local prior distributions and a stochastic search method for the selection of biomarkers associated with the survival outcome. We use KEGG pathway information to incorporate correlated gene effects within regulatory networks. Using simulation studies, we assess the performance of our method, and apply it to experimental data of kidney renal cell carcinoma (KIRC) obtained from The Cancer Genome Atlas. Our novel method validates previously identified cancer biomarkers and identifies biomarkers specific to KIRC progression that were not previously discovered. Using the KIRC data, we confirm that biomarkers involved in regulatory networks are more likely to be associated with survival time, showing connections in one regulatory network for five out of six such genes we identified.
doi:10.1111/biom.12266
PMCID: PMC4499566
PMID: 25639276
Bayesian variable selection; genomic data; miRNA regulatory network; non-local prior
Summary
A general framework is proposed for Bayesian model-based designs of Phase I cancer trials, in which a general criterion for coherence of a design is also developed. This framework can incorporate both “individual” and “collective” ethics into the design of the trial. We propose a new design that minimizes a risk function composed of two terms, one representing the individual risk of the current dose and the other representing the collective risk. The performance of this design, measured in terms of the accuracy of the estimated target dose at the end of the trial, the toxicity and overdose rates, and certain loss functions reflecting the individual and collective ethics, is studied and shown to be better than that of existing Bayesian model-based designs.
doi:10.1111/j.1541-0420.2010.01471.x
PMCID: PMC4485382
PMID: 20731643
Cancer trials; Coherence; Dose-finding; Logistic regression; Markov decision problem; Phase I
Summary
Several statistical methods for meta-analysis of diagnostic accuracy studies have been discussed in the presence of a gold standard. However, in practice, the selected reference test may be imperfect due to measurement error, non-existence, invasive nature, or expensive cost of a gold standard. It has been suggested that treating an imperfect reference test as a gold standard can lead to substantial bias in the estimation of diagnostic test accuracy. Recently, two models have been proposed to account for an imperfect reference test, namely, a multivariate generalized linear mixed model (MGLMM) and a hierarchical summary receiver operating characteristic (HSROC) model. Both models are very flexible in accounting for heterogeneity in accuracies of tests across studies as well as the dependence between tests. In this paper, we show that these two models, although with different formulations, are closely related and are equivalent in the absence of study-level covariates. Furthermore, we provide the exact relations between the parameters of these two models and assumptions under which the two models can be reduced to equivalent submodels. On the other hand, we show that some submodels of the MGLMM do not have corresponding equivalent submodels of the HSROC model, and vice versa. With three real examples, we illustrate the cases when fitting the MGLMM and HSROC models leads to equivalent submodels and hence identical inference, and the cases when the inferences from the two models are slightly different. Our results generalize the important relations between the bivariate generalized linear mixed model and the HSROC model when the reference test is a gold standard.
doi:10.1111/biom.12264
PMCID: PMC4416105
PMID: 25358907
Diagnostic test; Generalized linear mixed model; Hierarchical model; Imperfect reference test; Meta-analysis
Summary
Time-dependent receiver operating characteristic (ROC) curves and their area under the curve (AUC) are important measures to evaluate the prediction accuracy of biomarkers for time-to-event endpoints (e.g., time to disease progression or death). In this paper, we propose a direct method to estimate AUC(t) as a function of time t using a flexible fractional polynomials model, without the middle step of modeling the time-dependent ROC. We develop a pseudo partial-likelihood procedure for parameter estimation and provide a test procedure to compare the predictive performance between biomarkers. We establish the asymptotic properties of the proposed estimator and test statistics. A major advantage of the proposed method is the ease of making inference and comparing prediction accuracy across biomarkers, rendering it particularly appealing for studies that require comparing and screening a large number of candidate biomarkers. We evaluate the finite-sample performance of the proposed method through simulation studies and illustrate our method in an application to AIDS Clinical Trials Group 175 data.
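The target quantity AUC(t) has a simple empirical counterpart on uncensored data: the probability that a subject failing by time t has a higher marker than a subject surviving past t. The sketch below shows that estimand on toy data (the marker-driven event-time model is assumed; the paper's fractional-polynomial and pseudo partial-likelihood machinery handles censoring and smooth estimation over t).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
marker = rng.normal(size=n)
time = rng.exponential(scale=np.exp(-marker))   # higher marker -> earlier event (toy)

def auc_t(marker, time, t):
    """Empirical AUC(t): cases have T <= t, controls have T > t."""
    case, ctrl = marker[time <= t], marker[time > t]
    diff = case[:, None] - ctrl[None, :]        # all case-control marker pairs
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

a1 = auc_t(marker, time, 1.0)
```

Evaluating `auc_t` over a grid of t values gives the raw curve that a fractional-polynomial model would smooth and compare across biomarkers.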
doi:10.1111/biom.12293
PMCID: PMC4479968
PMID: 25758584
Biomarker evaluation; pseudo partial-likelihood; time-dependent AUC; time-dependent ROC
Summary
This manuscript considers regression models for generalized, multilevel functional responses: functions are generalized in that they follow an exponential family distribution and multilevel in that they are clustered within groups or subjects. This data structure is increasingly common across scientific domains and is exemplified by our motivating example, in which binary curves indicating physical activity or inactivity are observed for nearly six hundred subjects over five days. We use a generalized linear model to incorporate scalar covariates into the mean structure, and decompose subject-specific and subject-day-specific deviations using multilevel functional principal components analysis. Thus, functional fixed effects are estimated while accounting for within-function and within-subject correlations, and major directions of variability within and between subjects are identified. Fixed effect coefficient functions and principal component basis functions are estimated using penalized splines; model parameters are estimated in a Bayesian framework using Stan, a programming language that implements a Hamiltonian Monte Carlo sampler. Simulations designed to mimic the application show good estimation and inferential properties with reasonable computation times for moderate datasets, in both cross-sectional and multilevel scenarios; code is publicly available. In the application we identify effects of age and BMI on the time-specific change in probability of being active over a twenty-four hour period; in addition, the principal components analysis identifies the patterns of activity that distinguish subjects and days within subjects.
doi:10.1111/biom.12278
PMCID: PMC4479975
PMID: 25620473
Accelerometry; Bayesian Inference; Generalized Functional Data; Hamiltonian Monte Carlo; Penalized Splines
Summary
We propose a multivariate sparse group lasso variable selection and estimation method for data with high-dimensional predictors as well as high-dimensional response variables. The method is carried out through a penalized multivariate multiple linear regression model with an arbitrary group structure for the regression coefficient matrix. It suits many biology studies well in detecting associations between multiple traits and multiple predictors, with each trait and each predictor embedded in some biological functioning groups such as genes, pathways or brain regions. The method is able to effectively remove unimportant groups as well as unimportant individual coefficients within important groups, particularly for large p small n problems, and is flexible in handling various complex group structures such as overlapping or nested or multilevel hierarchical structures. The method is evaluated through extensive simulations with comparisons to the conventional lasso and group lasso methods, and is applied to an eQTL association study.
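The "remove unimportant groups as well as unimportant coefficients within important groups" behavior comes from combining an elementwise (lasso) and a groupwise (group lasso) shrinkage. A minimal sketch of the corresponding proximal operator is below; the group structure, penalty values, and coefficient vector are illustrative assumptions, and the full method applies this within a penalized multivariate regression fit.

```python
import numpy as np

def sparse_group_lasso_prox(b, groups, lam1, lam2):
    """Proximal step: elementwise soft-threshold, then groupwise soft-threshold."""
    z = np.sign(b) * np.maximum(np.abs(b) - lam1, 0.0)   # lasso shrinkage
    out = np.zeros_like(z)
    for g in groups:
        norm = np.linalg.norm(z[g])
        if norm > lam2:                                   # group survives
            out[g] = z[g] * (1.0 - lam2 / norm)           # group-level shrinkage
    return out                                            # else whole group stays 0

b = np.array([3.0, -2.0, 0.1, 0.05, 4.0, 0.2])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
res = sparse_group_lasso_prox(b, groups, lam1=0.5, lam2=1.0)
```

Note the two levels of sparsity: small coefficients are zeroed inside groups that are kept, and a group whose shrunken norm falls below `lam2` is removed entirely.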
doi:10.1111/biom.12292
PMCID: PMC4479976
PMID: 25732839
coordinate descent algorithm; eQTL; high-dimensional data; genetic association; oracle inequalities; sparsity
Summary
Data sources with repeated measurements are an appealing resource to understand the relationship between changes in biological markers and risk of a clinical event. While longitudinal data present opportunities to observe changing risk over time, these analyses can be complicated if the measurement of clinical metrics is sparse and/or irregular, making typical statistical methods unsuitable. In this article, we use electronic health record (EHR) data as an example to present an analytic procedure to both create an analytic sample and analyze the data to detect clinically meaningful markers of acute myocardial infarction (MI). Using an EHR from a large national dialysis organization, we abstracted the records of 64,318 individuals and identified 4769 people who had an MI during the study period. We describe a nested case-control design to sample appropriate controls and an analytic approach using regression splines. Fitting a mixed model with truncated power splines, we perform a series of goodness-of-fit tests to determine whether any of 11 regularly collected laboratory markers are useful clinical predictors. We test the clinical utility of each marker using an independent test set. The results suggest that EHR data can be easily used to detect markers of clinically acute events. Special software or analytic tools are not needed, even with irregular EHR data.
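The truncated power spline basis mentioned above is simple to construct, which is part of why no special software is needed; the sketch below uses assumed knot locations and a linear degree for illustration.

```python
import numpy as np

def truncated_power_basis(x, knots, degree=1):
    """Design matrix: polynomial terms plus one truncated power term per knot."""
    cols = [x ** d for d in range(1, degree + 1)]          # global polynomial part
    cols += [np.maximum(x - k, 0.0) ** degree for k in knots]  # (x - k)_+^degree
    return np.column_stack(cols)

x = np.linspace(0.0, 10.0, 11)          # e.g., days before the event
B = truncated_power_basis(x, knots=[3.0, 6.0])
```

Each truncated term is zero before its knot and polynomial after it, so a mixed model with these columns fits a piecewise trajectory whose slope can change at each knot, which is what lets a goodness-of-fit test detect a marker bending away from its baseline trend before an MI.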
doi:10.1111/biom.12283
PMCID: PMC4479980
PMID: 25652566
Biological markers; Dialysis; Longitudinal data; Myocardial infarction; Risk prediction; Splines
Summary
In clinical trials, an intermediate marker measured after randomization can often provide early information about the treatment effect on the final outcome of interest. We explore the use of recurrence time as an auxiliary variable for estimating the treatment effect on overall survival in phase III randomized trials of colon cancer. A multi-state model with an incorporated cured fraction for recurrence is used to jointly model time to recurrence and time to death. We explore different ways in which the information about recurrence time and the assumptions in the model can lead to improved efficiency. Estimates of overall survival and disease-free survival can be derived directly from the model with efficiency gains obtained as compared to Kaplan-Meier estimates. Alternatively, efficiency gains can be achieved by using the model in a weaker way in a multiple imputation procedure which imputes death times for censored subjects. By using the joint model, recurrence is used as an auxiliary variable in predicting survival times. We demonstrate the potential use of the proposed methods in shortening the length of a trial and reducing sample sizes.
doi:10.1111/biom.12281
PMCID: PMC4480062
PMID: 25585942
Auxiliary variable; Colon cancer; Cure models; Multiple imputation; Multi-state model
Summary
This paper develops methods and inference for causal estimation in semiparametric transformation models for prevalent survival data. Through estimation of the transformation models and covariate distribution, we propose analytical procedures to estimate the causal survival function. As the data are observational, the unobserved potential outcome (survival time) may be associated with the treatment assignment, and therefore there may exist a systematic imbalance between the data observed from each treatment arm. Further, due to prevalent sampling, subjects are observed only if they have not experienced the failure event when data collection began, causing the prevalent sampling bias. We propose a unified approach which simultaneously corrects the bias from the prevalent sampling and balances the systematic differences from the observational data. We illustrate in the simulation study that standard analysis without proper adjustment would result in biased causal inference. Large sample properties of the proposed estimation procedures are established by techniques of empirical processes and examined by simulation studies. The proposed methods are applied to the Surveillance, Epidemiology, and End Results (SEER) and Medicare linked data for women diagnosed with breast cancer.
doi:10.1111/biom.12286
PMCID: PMC4480066
PMID: 25715045
Causal estimation; Dependent truncation; Prevalent sampling; Survival analysis
Summary
In longitudinal studies comparing two treatments with a maximum follow-up time there may be interest in examining treatment effects for intermediate follow-up times. One motivation may be to identify the time period with the greatest treatment difference when there is a non-monotone treatment effect over time; another motivation may be to make the trial more efficient in terms of time to reach a decision on whether a new treatment is efficacious or not. Here we test the composite null hypothesis of no difference at any follow-up time versus the alternative that there is a difference at one or more follow-up times. The methods are applicable when a few measurements are taken over time, such as in early longitudinal trials or in ancillary studies. Suppose the test statistic Z(t_k) will be used to test the hypothesis of no treatment effect at a fixed follow-up time t_k. In this context a common approach is to perform a pilot study on N1 subjects, evaluate the treatment effect at the fixed time points t_1, …, t_K, and choose t* as the value of t_k for which Z(t_k) is maximized. Having chosen t*, a second trial can be designed. In a setting with group sequential testing we consider several adaptive alternatives to this approach that treat the pilot and second trial as a seamless, combined entity and evaluate Type I error and power characteristics. The adaptive designs we consider typically have improved power over the common, separate trial approach.
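The pilot-stage step of picking t* as the maximizer of Z(t_k) can be sketched as follows; the non-monotone effect profile, sample size, and two-sample Z statistic on normal outcomes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n1, K = 200, 4
# hypothetical non-monotone treatment effect over the K follow-up times
effect = np.array([0.0, 0.8, 0.2, 0.0])
trt = rng.normal(loc=effect, scale=1.0, size=(n1, K))   # pilot treatment arm
ctl = rng.normal(loc=0.0, scale=1.0, size=(n1, K))      # pilot control arm

# two-sample Z statistic at each follow-up time t_k
se = np.sqrt(trt.var(axis=0, ddof=1) / n1 + ctl.var(axis=0, ddof=1) / n1)
z = (trt.mean(axis=0) - ctl.mean(axis=0)) / se
t_star = int(np.argmax(z))          # index of the follow-up time with max Z
```

Because t* is chosen by maximizing over K correlated statistics, a naive second-stage test at t* inflates Type I error, which is the selection problem the paper's seamless adaptive designs are built to handle.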
doi:10.1111/biom.12287
PMCID: PMC4480157
PMID: 25818116
Adaptive design; Adaptive follow-up time; Adaptive longitudinal trial; Longitudinal study; Model-free longitudinal analysis
Summary
Pharmacogenetics investigates the relationship between heritable genetic variation and the variation in how individuals respond to drug therapies. Often, gene-drug interactions play a primary role in this response, and identifying these effects can aid in the development of individualized treatment regimes. Haplotypes can hold key information in understanding the association between genetic variation and drug response. However, the standard approach for haplotype-based association analysis does not directly address the research questions dictated by individualized medicine. A complementary post-hoc analysis is required, and this post-hoc analysis is usually underpowered after adjusting for multiple comparisons and may lead to seemingly contradictory conclusions. In this work, we propose a penalized likelihood approach that is able to overcome the drawbacks of the standard approach and yield the desired personalized output. We demonstrate the utility of our method by applying it to the Scottish Randomized Trial in Ovarian Cancer. We also conducted simulation studies and showed that the proposed penalized method has comparable or greater power than the standard approach and maintains low Type I error rates for both binary and quantitative drug responses. The largest performance gains are seen when the haplotype frequency is low, the differences in effect sizes are small, or the true relationship among the drugs is more complex.
doi:10.1111/biom.12259
PMCID: PMC4480191
PMID: 25604216
Association analysis; Haplotype; Individualized medicine; Multiple comparisons; Penalized regression; Pharmacogenetics
Summary
This article deals with jointly modeling a large number of geographically referenced outcomes observed over a very large number of locations. We seek to capture associations among the variables as well as the strength of spatial association for each variable. In addition, we reckon with the common setting where not all the variables have been observed over all locations, which leads to spatial misalignment. Dimension reduction is needed in two aspects: (i) the length of the vector of outcomes, and (ii) the very large number of spatial locations. Latent variable (factor) models are usually used to address the former, although low-rank spatial processes offer a rich and flexible modeling option for dealing with a large number of locations. We merge these two ideas to propose a class of hierarchical low-rank spatial factor models. Our framework pursues stochastic selection of the latent factors without resorting to complex computational strategies (such as reversible jump algorithms) by utilizing certain identifiability characterizations for the spatial factor model. A Markov chain Monte Carlo algorithm is developed for estimation that also deals with the spatial misalignment problem. We recover the full posterior distribution of the missing values (along with model parameters) in a Bayesian predictive framework. Various additional modeling and implementation issues are discussed as well. We illustrate our methodology with simulation experiments and an environmental data set involving air pollutants in California.
doi:10.1111/j.1541-0420.2012.01832.x
PMCID: PMC4466112
PMID: 23379832
Bayesian inference; Factor analysis; Gaussian predictive process; Linear model of coregionalization; Low-rank spatial modeling; Multivariate spatial processes; Spatial misalignment
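The low-rank device the abstract builds on (a Gaussian predictive process) can be sketched in a few lines: an n x n spatial covariance is replaced by the rank-m approximation induced by m knot locations, so the factor-model computations scale with m rather than n. The exponential covariance and all settings below are illustrative assumptions, not the paper's specification.

```python
# Minimal sketch of a Gaussian predictive-process (low-rank) covariance.
import numpy as np

rng = np.random.default_rng(1)

def exp_cov(a, b, phi=1.0, sigma2=1.0):
    """Exponential covariance between two sets of 2-d locations."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return sigma2 * np.exp(-phi * d)

n, m = 500, 25                         # n locations, m knots (m << n)
locs = rng.uniform(0, 1, size=(n, 2))
knots = rng.uniform(0, 1, size=(m, 2))

C_sk = exp_cov(locs, knots)            # n x m cross-covariance
C_kk = exp_cov(knots, knots)           # m x m knot covariance
# Predictive-process covariance C(s, s*) C(s*, s*)^{-1} C(s*, s):
# symmetric, positive semidefinite, and of rank at most m.
C_pp = C_sk @ np.linalg.solve(C_kk, C_sk.T)
```

Every latent spatial factor in the hierarchical model can then be handled through the m-dimensional knot process, which is what makes MCMC over a very large number of locations feasible.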
Summary
In this report, we consider the setting where the event of interest can occur repeatedly for the same subject (i.e., a recurrent event; e.g., hospitalization) and may be stopped permanently by a terminating event (e.g., death). Among the different ways to model recurrent/terminal event data, the marginal mean (i.e., averaging over the survival distribution) is of primary interest from a public health or health economics perspective. Often, the difference between treatment-specific recurrent event means will not be constant over time, particularly when treatment-specific differences in survival exist. In such cases, it makes more sense to quantify treatment effect based on the cumulative difference in the recurrent event mean, as opposed to the instantaneous difference in the rate. We propose a method that compares treatments by separately estimating the survival probabilities and the recurrent event rate given survival, then integrating to get the mean number of events. The proposed method combines an additive model for the conditional recurrent event rate and a proportional hazards model for the terminating event hazard. The treatment effects on survival and on the recurrent event rate among survivors are estimated in constructing our measure and help explain the mechanism generating the difference under study. The example that motivates this research is the repeated occurrence of hospitalization among kidney transplant recipients, where the effect of Expanded Criteria Donor (ECD) compared to non-ECD kidney transplantation on the mean number of hospitalizations is of interest.
doi:10.1111/j.1541-0420.2008.01157.x
PMCID: PMC4465273
PMID: 19053997
Additive rates model; Competing risks; Marginal mean; Proportional hazards model; Rate regression; Semiparametric model
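The target quantity can be written as mu(t) = integral of S(u) dR(u) from 0 to t, combining the survival curve S with the conditional recurrent-event rate among survivors. The sketch below computes this on a grid with simple constant-hazard, constant-rate arms; the parametric forms are illustrative stand-ins for the paper's semiparametric estimators.

```python
# Sketch: marginal mean number of recurrent events, mu(t) = int_0^t S(u) r(u) du,
# under illustrative constant-hazard survival and constant conditional rates.
import numpy as np

t = np.linspace(0, 5, 501)
dt = t[1] - t[0]

def marginal_mean(hazard, rate):
    """Riemann-sum integral of S(u) * r(u) du on the grid."""
    S = np.exp(-hazard * t)          # survival probability
    return np.cumsum(S * rate * dt)  # cumulative mean number of events

mu_a = marginal_mean(hazard=0.30, rate=1.2)  # higher event rate, worse survival
mu_b = marginal_mean(hazard=0.20, rate=1.0)  # lower event rate, better survival

# Treatment effect as the cumulative difference in means, not the
# instantaneous rate difference:
delta = mu_a - mu_b
```

With these hypothetical values, arm A has the higher conditional event rate yet ends with a lower mean event count, because its worse survival truncates the event process earlier. This is exactly the interplay between survival and conditional rate that motivates reporting the cumulative mean difference.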
Summary
Case-cohort sampling is a commonly used and efficient method for studying large cohorts. Most existing methods for analyzing case-cohort data have concerned univariate failure time data. However, clustered failure time data are commonly encountered in public health studies. For example, patients treated at the same center are unlikely to be independent. In this article, we consider methods based on estimating equations for case-cohort designs for clustered failure time data. We assume a marginal hazards model, with a common baseline hazard and common regression coefficient across clusters. The proposed estimators of the regression parameter and cumulative baseline hazard are shown to be consistent and asymptotically normal, and consistent estimators of the asymptotic covariance matrices are derived. The regression parameter estimator is easily computed using any standard Cox regression software that allows for offset terms. The proposed estimators are investigated in simulation studies, and demonstrated empirically to have increased efficiency relative to some existing methods. The proposed methods are applied to a study of mortality among Canadian dialysis patients.
doi:10.1111/j.1541-0420.2010.01445.x
PMCID: PMC4458467
PMID: 20560939
Case-cohort study; Clustered data; Cox model; Estimating equation; Robust variance; Survival analysis
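The offset trick the abstract alludes to rests on the identity exp(X*beta + log w) = w * exp(X*beta): passing log inverse-sampling weights as an offset makes standard Cox software reweight each subject's contribution to the risk-set denominator. The sketch below verifies this equivalence numerically on hypothetical data (it is not the authors' clustered estimator).

```python
# Sketch: case-cohort weighting via an offset in a Breslow partial likelihood.
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=n)
time = rng.exponential(size=n)
event = rng.binomial(1, 0.3, size=n)

# Cases keep weight 1; subcohort controls are weighted by the inverse of an
# assumed sampling fraction alpha.
alpha = 0.25
w = np.where(event == 1, 1.0, 1.0 / alpha)

def cox_pll_offset(beta):
    """Breslow partial log-likelihood with log(w) as an offset."""
    eta = X * beta + np.log(w)
    ll = 0.0
    for i in np.flatnonzero(event == 1):
        at_risk = time >= time[i]
        ll += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return ll

def cox_pll_weighted(beta):
    """The same pseudo-likelihood with the weights written explicitly."""
    ll = 0.0
    for i in np.flatnonzero(event == 1):
        at_risk = time >= time[i]
        ll += X[i] * beta - np.log(np.sum(w[at_risk] * np.exp(X[at_risk] * beta)))
    return ll

beta = 0.7
pll_offset = cox_pll_offset(beta)
pll_weighted = cox_pll_weighted(beta)
# The two agree exactly because cases have weight 1, so the offset vanishes
# from each numerator term while reweighting every denominator.
```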
Summary
To evaluate the utility of automated deformable image registration (DIR) algorithms, it is necessary to evaluate both the registration accuracy of the DIR algorithm itself, as well as the registration accuracy of the human readers from whom the “gold standard” is obtained. We propose a Bayesian hierarchical model to evaluate the spatial accuracy of human readers and automatic DIR methods based on multiple image registration datasets generated by both. To fully account for the locations of landmarks in all images, we treat the true locations of landmarks as latent variables and impose a hierarchical structure on the magnitude of registration errors observed across image pairs. DIR registration errors are modeled using Gaussian processes with reference prior densities on prior parameters that determine the associated covariance matrices. We develop a Gibbs sampling algorithm to efficiently fit our models to high-dimensional data, and apply the proposed method to analyze an image dataset obtained from a 4D thoracic CT study.
doi:10.1111/biom.12146
PMCID: PMC4061263
PMID: 24575781
Image processing; Latent variable; Bayesian analysis; Spatial correlation
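The latent-truth structure can be illustrated with a stripped-down Gibbs sampler: each reader marks landmark j with error around an unknown true location z_j, and the sampler alternates between the latent locations and reader-specific error variances. One spatial dimension, conjugate inverse-gamma priors, and all settings below are illustrative simplifications of the paper's Gaussian-process model.

```python
# Sketch: Gibbs sampler for latent landmark locations and per-reader error SDs.
import numpy as np

rng = np.random.default_rng(3)
R, J = 4, 30                               # readers, landmarks
z_true = rng.uniform(0, 100, size=J)       # true landmark positions (1-d)
sig_true = np.array([0.5, 1.0, 1.5, 2.0])  # per-reader error SDs
y = z_true + sig_true[:, None] * rng.normal(size=(R, J))

a0, b0 = 2.0, 1.0                          # inverse-gamma hyperparameters
z = y.mean(axis=0).copy()
sig2 = np.ones(R)
keep = []
for it in range(2000):
    # z_j | rest: precision-weighted average of the readers' markings.
    prec = np.sum(1.0 / sig2)
    mean = np.sum(y / sig2[:, None], axis=0) / prec
    z = mean + rng.normal(size=J) / np.sqrt(prec)
    # sigma_r^2 | rest: conjugate inverse-gamma update from the residuals.
    resid2 = np.sum((y - z) ** 2, axis=1)
    sig2 = 1.0 / rng.gamma(a0 + J / 2, 1.0 / (b0 + resid2 / 2))
    if it >= 1000:                         # discard burn-in draws
        keep.append(np.sqrt(sig2))

post_sd = np.mean(keep, axis=0)            # posterior mean reader error SDs
```

Ranking readers (and, analogously, DIR algorithms) by their posterior error scale is the kind of accuracy comparison the hierarchical model supports, here without the Gaussian-process spatial correlation the paper adds.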
doi:10.1111/biom.12229
PMCID: PMC4447210
PMID: 25355405