The effect of a targeted agent on a cancer patient's clinical outcome is putatively mediated through the agent's effect on one or more early biological events. This is motivated by pre-clinical experiments with cells or animals that identify such events, represented by binary or quantitative biomarkers. When evaluating targeted agents in humans, central questions are whether the distribution of a targeted biomarker changes following treatment, the nature and magnitude of this change, and whether it is associated with clinical outcome. Major difficulties in estimating these effects are that a biomarker's distribution may be complex, vary substantially between patients, and have complicated relationships with clinical outcomes. We present a probabilistically coherent framework for modeling and estimation in this setting, including a hierarchical Bayesian nonparametric mixture model for biomarkers that we use to define a functional profile of pre-versus-post treatment biomarker distribution change. The functional is similar to the receiver operating characteristic used in diagnostic testing. The hierarchical model yields clusters of individual patient biomarker profile functionals, and we use the profile as a covariate in a regression model for clinical outcome. The methodology is illustrated by analysis of a dataset from a clinical trial in prostate cancer using imatinib to target platelet-derived growth factor, with the clinical aim to improve progression-free survival time.
Bayesian nonparametrics; Nested Dirichlet Process; Receiver Operating Characteristic curve; Biomarker profiles; Survival analysis
Event-history studies of recurrent events are often conducted in fields such as demography, epidemiology, medicine, and social sciences (Cook and Lawless, 2007; Zhao et al., 2011). For such analysis, two types of data have been extensively investigated: recurrent-event data and panel-count data. However, in practice, one may face a third type of data, mixed recurrent-event and panel-count data or mixed event-history data. Such data occur if some study subjects are monitored or observed continuously and thus provide recurrent-event data, while the others are observed only at discrete times and hence give only panel-count data. A more general situation is that each subject is observed continuously over certain time periods but only at discrete times over other time periods. There exists little literature on the analysis of such mixed data except that published by Zhu et al. (2013). In this paper, we consider the regression analysis of mixed data using the additive rate model and develop some estimating equation-based approaches to estimate the regression parameters of interest. Both finite sample and asymptotic properties of the resulting estimators are established, and the numerical studies suggest that the proposed methodology works well for practical situations. The approach is applied to a Childhood Cancer Survivor Study that motivated this study.
Additive rate model; Event-history studies; Mixed data; Panel-count data; Recurrent-event data
Ordinal outcomes arise frequently in clinical studies when each subject is assigned to a category and the categories have a natural order. Classification rules for ordinal outcomes may be developed with commonly used regression models such as the full continuation ratio (CR) model (fCR), which allows the covariate effects to differ across all continuation ratios, and the CR model with a proportional odds structure (pCR), which assumes the covariate effects to be constant across all continuation ratios. For settings where the covariate effects differ between some continuation ratios but not all, fitting either fCR or pCR may lead to suboptimal prediction performance. In addition, these standard models do not allow for non-linear covariate effects. In this paper, we propose a sparse CR kernel machine (KM) regression method for ordinal outcomes where we use the KM framework to incorporate non-linearity and impose sparsity on the overall differences between the covariate effects of continuation ratios to control overfitting. In addition, we provide a data-driven rule to select an optimal kernel to maximize prediction accuracy. Simulation results show that our proposed procedures perform well under both linear and non-linear settings, especially when the true underlying model is in between the fCR and pCR models. We apply our procedures to develop a prediction model for levels of anti-CCP among rheumatoid arthritis patients and demonstrate the advantage of our method over other commonly used methods.
Continuation ratio model; Kernel machine regression; Kernel PCA; Ordinal outcome; Prediction
A recent paper (Zhang et al., 2012) compares regression-based and inverse probability weighted methods of estimating an optimal treatment regime and shows, for a small number of covariates, that inverse probability weighted methods are more robust to model misspecification than regression methods. We demonstrate that using models that fit the data better reduces the concern about non-robustness for the regression methods. We extend the simulation study of Zhang et al. (2012), also considering the situation of a larger number of covariates, and show that incorporating random forests into both regression and inverse probability weighted methods improves their properties.
Optimal Treatment Regime; Random Forests
Vaccine clinical trials with active surveillance for infection often use the time to infection as the primary endpoint. A common method of analysis for such trials is to compare the times to infection between the vaccine and placebo groups using a Cox regression model. With new technology, we can sometimes additionally record the precise number of virions that cause infection rather than just the indicator that infection occurred. In this article, we develop a unified approach for vaccine trials that couples the time to infection with the number of infecting or founder viruses. We assume that the instantaneous risk of a potentially infectious exposure for individuals in the placebo and vaccine groups follows the same proportional intensity model. Following exposure, the number of founder viruses X* is assumed to be generated from some distribution on {0, 1, . . .}, which is allowed to be different for the two groups. Exposures that result in X* = 0 are unobservable. We denote the placebo and vaccine means of X* by μ and μΔ, so that 1 − Δ measures the proportional reduction in the mean number of infecting virions per exposure due to vaccination. We develop different semi-parametric methods of estimating Δ. We allow the distribution of X* to be Poisson or unspecified, and discuss how to incorporate covariates that impact the time to exposure and/or X*. Interestingly, Δ, which is a ratio of untruncated means, can be reliably estimated using truncated data (X* > 0), even if the placebo and vaccine distributions of X* are completely unspecified. Simulations of vaccine clinical trials show that the method can reliably recover Δ in realistic settings. We apply our methods to an HIV vaccine trial conducted in injecting drug users.
Burden of illness; Competing risks; Cox regression; Empirical process; Infectious disease; Marked process
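The key identifiability claim above, that Δ, a ratio of untruncated means, is recoverable from zero-truncated counts, can be sketched numerically in the Poisson case. Everything below (the sample sizes, the placebo mean of 2, the true Δ of 0.5, and the bisection solver) is invented for illustration and is not the paper's estimator.

```python
# Sketch (not the paper's method): recover Delta = mu_v / mu_p from
# zero-truncated Poisson counts by matching the truncated mean in each arm.
import math
import random

def truncated_poisson_mle(xs):
    """Match the zero-truncated mean: solve lam / (1 - exp(-lam)) = mean(xs)."""
    m = sum(xs) / len(xs)
    f = lambda lam: lam / (1.0 - math.exp(-lam)) - m
    a, b = 1e-6, 50.0
    while b - a > 1e-10:        # simple bisection
        c = 0.5 * (a + b)
        if f(a) * f(c) <= 0:
            b = c
        else:
            a = c
    return 0.5 * (a + b)

def zero_truncated_poisson(lam, n, rng):
    """Draw n counts from Poisson(lam) conditioned on being positive."""
    out = []
    while len(out) < n:
        u, k, p = rng.random(), 0, math.exp(-lam)
        cum = p
        while u > cum:          # Poisson draw by inversion
            k += 1
            p *= lam / k
            cum += p
        if k > 0:               # exposures with X* = 0 are never observed
            out.append(k)
    return out

rng = random.Random(7)
mu_p, true_delta = 2.0, 0.5     # illustrative placebo mean and vaccine effect
placebo = zero_truncated_poisson(mu_p, 20000, rng)
vaccine = zero_truncated_poisson(mu_p * true_delta, 20000, rng)
delta_hat = truncated_poisson_mle(vaccine) / truncated_poisson_mle(placebo)
```

With these sample sizes, `delta_hat` lands close to the true value of 0.5 even though only strictly positive counts are ever observed.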
The amount and complexity of patient-level data being collected in randomized controlled trials offers both opportunities and challenges for developing personalized rules for assigning treatment for a given disease or ailment. For example, trials examining treatments for major depressive disorder are not only collecting typical baseline data such as age, gender, or scores on various tests, but also data that measure the structure and function of the brain such as images from magnetic resonance imaging (MRI), functional MRI (fMRI), or electroencephalography (EEG). These latter types of data have an inherent structure and may be considered as functional data. We propose an approach that uses baseline covariates, both scalars and functions, to aid in the selection of an optimal treatment. In addition to providing information on which treatment should be selected for a new patient, the estimated regime has the potential to provide insight into the relationship between treatment response and the set of baseline covariates. Our approach can be viewed as an extension of “advantage learning” to include both scalar and functional covariates. We describe our method and how to implement it using existing software. Empirical performance of our method is evaluated with simulated data in a variety of settings and also applied to real data arising from the study of patients suffering from major depressive disorder from which baseline scalar covariates as well as functional data from EEG are available.
Advantage learning; Depression; Electroencephalography data; Functional data; Personalized medicine; Treatment regime
We thank the Editors and the Associate Editor for the opportunity to have this exchange. We also thank the discussants for welcoming this search to make semiparametric inference more deductive, and for all their very interesting and useful comments.
A treatment regime formalizes personalized medicine as a function from individual patient characteristics to a recommended treatment. A high-quality treatment regime can improve patient outcomes while reducing cost, resource consumption, and treatment burden. Thus, there is tremendous interest in estimating treatment regimes from observational and randomized studies. However, the development of treatment regimes for application in clinical practice requires the long-term, joint effort of statisticians and clinical scientists. In this collaborative process, the statistician must integrate clinical science into the statistical models underlying a treatment regime and the clinician must scrutinize the estimated treatment regime for scientific validity. To facilitate meaningful information exchange, it is important that estimated treatment regimes be interpretable in a subject-matter context. We propose a simple, yet flexible class of treatment regimes whose members are representable as a short list of if-then statements. Regimes in this class are immediately interpretable and are therefore an appealing choice for broad application in practice. We derive a robust estimator of the optimal regime within this class and demonstrate its finite sample performance using simulation experiments. The proposed method is illustrated with data from two clinical trials.
Decision lists; Exploratory analyses; Interpretability; Personalized medicine; Treatment regimes
Researchers often seek robust inference for a parameter through semiparametric estimation. Efficient semiparametric estimation currently requires theoretical derivation of the efficient influence function (EIF), which can be a challenging and time-consuming task. If this task can be computerized, it can save substantial human effort, which can be transferred, for example, to the design of new studies. Although the EIF is, in principle, a derivative, simple numerical differentiation to calculate the EIF by a computer masks the EIF's functional dependence on the parameter of interest. For this reason, the standard approach to obtaining the EIF relies on the theoretical construction of the space of scores under all possible parametric submodels. This process currently depends on the correctness of conjectures about these spaces, and the correct verification of such conjectures. The correct guessing of such conjectures, though successful in some problems, is a nondeductive process, i.e., it is not guaranteed to succeed (e.g., it is not computerizable), and the verification of conjectures is generally susceptible to mistakes. We propose a method that can deductively produce semiparametric locally efficient estimators. The proposed method is computerizable, meaning that it does not need either conjecturing, or otherwise theoretically deriving the functional form of the EIF, and is guaranteed to produce the desired estimates even for complex parameters. The method is demonstrated through an example.
Compatibility; Deductive procedure; Gateaux derivative; Influence function; Semiparametric estimation; Turing machine
In high-dimensional data analysis, it is of primary interest to reduce the data dimensionality without loss of information. Sufficient dimension reduction (SDR) arises in this context, and many successful SDR methods have been developed since the introduction of sliced inverse regression (SIR) [Li (1991), Journal of the American Statistical Association 86, 316–327]. Despite this fast progress, though, most existing methods target regression problems with a continuous response. For binary classification problems, SIR suffers from the limitation of estimating at most one direction, since only two slices are available. In this article, we develop a new and flexible probability-enhanced SDR method for binary classification problems by using the weighted support vector machine (WSVM). The key idea is to slice the data based on conditional class probabilities of observations rather than their binary responses. We first show that the central subspace based on the conditional class probability is the same as that based on the binary response. This important result justifies the proposed slicing scheme from a theoretical perspective and assures no information loss. In practice, the true conditional class probability is generally not available, and the problem of probability estimation can be challenging for data with high-dimensional inputs. We observe that, in order to implement the new slicing scheme, one does not need exact probability values; the only required information is the relative order of probability values. Motivated by this fact, our new SDR procedure bypasses the probability estimation step and employs the WSVM to directly estimate the order of probability values, based on which the slicing is performed. The performance of the proposed probability-enhanced SDR scheme is evaluated by both simulated and real data examples.
Binary classification; Conditional class probability; Fisher consistency; Sufficient dimension reduction; Weighted support vector machines (WSVMs)
Studying the interactions between different brain regions is essential to achieve a more complete understanding of brain function. In this paper, we focus on identifying functional co-activation patterns and undirected functional networks in neuroimaging studies. We build a functional brain network, using a sparse covariance matrix, with elements representing associations between region-level peak activations. We adopt a penalized likelihood approach to impose sparsity on the covariance matrix based on an extended multivariate Poisson model. We obtain penalized maximum likelihood estimates via the expectation-maximization (EM) algorithm and optimize an associated tuning parameter by maximizing the predictive log-likelihood. Permutation tests on the brain co-activation patterns provide region-pair and network-level inference. Simulations suggest that the proposed approach has minimal bias and provides coverage rates for the covariance estimates close to the nominal 95% level. Conducting a meta-analysis of 162 functional neuroimaging studies on emotions, our model identifies a functional network that consists of connected regions within the basal ganglia, limbic system, and other emotion-related brain regions. We characterize this network through statistical inference on region-pair connections as well as by graph measures.
EM algorithm; Emotion; Functional brain networks; Functional co-activation pattern identification; Poisson Graphical Model
Mediation analysis is important for understanding the mechanisms whereby one variable causes changes in another. Measurement error could obscure the ability of the potential mediator to explain such changes. This paper focuses on developing correction methods for measurement error in the mediator with failure time outcomes. We consider a broad definition of measurement error, including technical error and error associated with temporal variation. The underlying model with the 'true' mediator is assumed to be of the Cox proportional hazards model form. The induced hazard ratio for the observed mediator no longer has a simple form independent of the baseline hazard function, due to the conditioning event. We propose a mean-variance regression calibration approach and a follow-up time regression calibration approach, to approximate the partial likelihood for the induced hazard function. Both methods demonstrate value in assessing mediation effects in simulation studies. These methods are generalized to multiple biomarkers and to both case-cohort and nested case-control sampling designs. We apply these correction methods to the Women's Health Initiative hormone therapy trials to understand the mediation effect of several serum sex hormone measures on the relationship between postmenopausal hormone therapy and breast cancer risk.
Cox model; Mean-variance estimating functions; Measurement error; Mediation analysis; Regression calibration
Semi-parametric regression models for the joint estimation of marginal mean and within-cluster pairwise association parameters are used in a variety of settings for population-averaged modeling of multivariate categorical outcomes. Recently, a formulation of alternating logistic regressions based on orthogonalized, marginal residuals has been introduced for correlated binary data. Unlike the original procedure based on conditional residuals, its covariance estimator is invariant to the ordering of observations within clusters. In this article, the orthogonalized residuals method is extended to model correlated ordinal data with a global odds ratio, and shown in a simulation study to be more efficient and less biased with regard to estimating within-cluster association parameters than an existing extension to ordinal data of alternating logistic regressions based on conditional residuals. Orthogonalized residuals are used to estimate a model for three correlated ordinal outcomes measured repeatedly in a longitudinal clinical trial of an intervention to improve recovery of patients' perception of altered sensation following jaw surgery.
Clustered data; Generalized estimating equations; Marginal models; Proportional odds; Sensory retraining
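As a concrete illustration of the association measure named above, the global odds ratio for a pair of ordinal outcomes can be computed directly from their joint probability table. The 3×3 tables below are invented for illustration; nothing here reproduces the orthogonalized-residuals estimator itself.

```python
# Global odds ratio psi_jk at cutpoints (j, k) of a joint probability table:
# psi = P(Y1<=j, Y2<=k) P(Y1>j, Y2>k) / [P(Y1<=j, Y2>k) P(Y1>j, Y2<=k)].
def global_odds_ratio(table, j, k):
    nr, nc = len(table), len(table[0])
    p11 = sum(table[r][c] for r in range(j + 1) for c in range(k + 1))
    p22 = sum(table[r][c] for r in range(j + 1, nr) for c in range(k + 1, nc))
    p12 = sum(table[r][c] for r in range(j + 1) for c in range(k + 1, nc))
    p21 = sum(table[r][c] for r in range(j + 1, nr) for c in range(k + 1))
    return (p11 * p22) / (p12 * p21)

# Under independence, every global odds ratio equals 1.
marg = [0.5, 0.3, 0.2]
indep = [[a * b for b in marg] for a in marg]
psi_indep = global_odds_ratio(indep, 0, 0)

# A positively dependent table (same margins) gives psi > 1.
dep = [[0.30, 0.10, 0.10],
       [0.10, 0.15, 0.05],
       [0.10, 0.05, 0.05]]
psi_dep = global_odds_ratio(dep, 0, 0)
```

For the dependent table, psi at cutpoints (0, 0) is (0.3 × 0.3) / (0.2 × 0.2) = 2.25, quantifying the positive within-pair association.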
Many techniques of functional data analysis require choosing a measure of distance between functions, with the most common choice being L2 distance. In this paper we show that using a weighted L2 distance, with a judiciously chosen weight function, can improve the performance of various statistical methods for functional data, including k-medoids clustering, nonparametric classification, and permutation testing. Assuming a quadratically penalized (e.g. spline) basis representation for the functional data, we consider three nontrivial weight functions: design density weights, inverse-variance weights, and a new weight function that minimizes the coefficient of variation of the resulting squared distance by means of an efficient iterative procedure. The benefits of weighting, in particular with the proposed weight function, are demonstrated both in simulation studies and in applications to the Berkeley growth data and a functional magnetic resonance imaging data set.
Coefficient of variation; Functional classification analysis; Functional clustering analysis; Penalized splines; Weighted L2 distance
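A minimal sketch of the weighted L2 distance discussed above, using trapezoidal quadrature on a common grid. The inverse-variance weight shown is one of the three weight functions considered; the curves, noise model, and sample size are synthetic.

```python
# Weighted L2 distance between two functions sampled on a common grid.
import math
import random

def trapz(ys, xs):
    """Trapezoidal rule for the integral of ys over the grid xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

def weighted_l2(f, g, w, xs):
    """sqrt( integral of w(t) * (f(t) - g(t))^2 dt ), all on the grid xs."""
    ys = [w[i] * (f[i] - g[i]) ** 2 for i in range(len(xs))]
    return math.sqrt(trapz(ys, xs))

grid = [i / 100.0 for i in range(101)]
f = [math.sin(2 * math.pi * t) for t in grid]
g = [math.sin(2 * math.pi * t) + 0.1 for t in grid]

# Unweighted: a constant offset of 0.1 over [0, 1] yields distance 0.1.
unit = [1.0] * len(grid)
d_plain = weighted_l2(f, g, unit, grid)

# Inverse-variance weights from a synthetic sample of noisy curves:
# noisier regions of the domain contribute less to the distance.
rng = random.Random(0)
sample = [[math.sin(2 * math.pi * t) + rng.gauss(0, 0.05 + 0.5 * t)
           for t in grid] for _ in range(200)]
mean = [sum(c[i] for c in sample) / 200 for i in range(len(grid))]
var = [sum((c[i] - mean[i]) ** 2 for c in sample) / 199
       for i in range(len(grid))]
w_inv = [1.0 / v for v in var]
d_weighted = weighted_l2(f, g, w_inv, grid)
```

The paper's proposed weight additionally minimizes the coefficient of variation of the squared distance via an iterative procedure, which this sketch does not attempt.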
We derive regression estimators that can compare longitudinal treatments using only the longitudinal propensity scores as regressors. These estimators, which assume knowledge of the variables used in the treatment assignment, are important for reducing the large dimension of covariates for two reasons. First, if the regression models on the longitudinal propensity scores are correct, then our estimators share advantages of correctly-specified model-based estimators, a benefit not shared by estimators based on weights alone. Second, if the models are incorrect, the misspecification can be more easily limited through model checking than with models based on the full covariates. Thus, our estimators can also be better when used in place of the regression on the full covariates. We use our methods to compare longitudinal treatments for type 2 diabetes mellitus.
Admissibility; Longitudinal treatments; Observation study; Propensity scores
Multi-state models can be viewed as generalizations of both the standard and competing risks models for survival data. Models for multi-state data have been the theme of many recent published works. Motivated by bone marrow transplant data, we propose a Bayesian model using the gap times between two successive events in a path of events experienced by a subject. Path specific frailties are introduced to capture the dependence structure of the gap times in the paths with two or more states. Under improper prior distributions for the parameters, we establish propriety of the posterior distribution. An efficient Gibbs sampling algorithm is developed for drawing samples from the posterior distribution. An extensive simulation study is carried out to examine the empirical performance of the proposed approach. A bone marrow transplant data set is analyzed in detail to further demonstrate the proposed methodology.
Gamma frailty; Gap time; Gibbs sampler; Piecewise exponential model; Survival analysis
We present a principled technique for estimating the effect of covariates on malaria parasite clearance rates in the presence of “lag” and “tail” phases through the use of a Bayesian hierarchical linear model. The hierarchical approach enables us to appropriately incorporate the uncertainty in both estimating clearance rates in patients and assessing the potential impact of covariates on these rates into the posterior intervals generated for the parameters associated with each covariate. Furthermore, it permits us to incorporate information about individuals for whom there exists only one observation time before censoring, which alleviates a systematic bias affecting inference when these individuals are excluded. We use a changepoint model to account for both lag and tail phases, and hence base our estimation of the parasite clearance rate only on observations within the decay phase. The Bayesian approach allows us to treat the delineation between lag, decay, and tail phases within an individual’s clearance profile as themselves being random variables, thus taking into account the additional uncertainty of boundaries between phases. We compare our method to existing methodology used in the antimalarial research community through a simulation study and show that it possesses desirable frequentist properties for conducting inference. We use our methodology to measure the impact of several covariates on Plasmodium falciparum clearance rate data collected in 2009 and 2010. Though our method was developed with this application in mind, it can be easily applied to any biological system exhibiting these hindrances to estimation.
Bayesian; changepoint models; clearance rate estimation; drug resistance; hierarchical linear models; left censoring; malaria; Plasmodium falciparum
When a true survival endpoint cannot be assessed for some subjects, an alternative endpoint that measures the true endpoint with error may be collected, which often occurs when obtaining the true endpoint is too invasive or costly. We develop an estimated likelihood function for the situation where we have both uncertain endpoints for all participants and true endpoints for only a subset of participants. We propose a nonparametric maximum estimated likelihood estimator of the discrete survival function of time to the true endpoint. We show that the proposed estimator is consistent and asymptotically normal. We demonstrate through extensive simulations that the proposed estimator has little bias compared to the naïve Kaplan-Meier survival function estimator, which uses only uncertain endpoints, and is more efficient under moderate missingness than the complete-case Kaplan-Meier survival function estimator, which uses only available true endpoints. Finally, we apply the proposed method to a dataset for estimating the risk of developing Alzheimer's disease from the Alzheimer's Disease Neuroimaging Initiative.
Measurement error; Missing data; Nonparametric survival analysis; Uncertain endpoints; Validation sample
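Both comparison estimators above are built on the discrete product-limit (Kaplan-Meier) estimator; a minimal sketch with invented data follows. The estimated-likelihood estimator itself is not reproduced here.

```python
# Discrete product-limit (Kaplan-Meier) estimator on invented data.
def km_discrete(times, events):
    """S(t) = product over event times t_j <= t of (1 - d_j / n_j)."""
    data = sorted(zip(times, events))
    surv, s = {}, 1.0
    for t in sorted(set(times)):
        d = sum(1 for tt, e in data if tt == t and e == 1)  # events at t
        n = sum(1 for tt, _ in data if tt >= t)             # at risk at t
        if d > 0:
            s *= 1.0 - d / n
            surv[t] = s
    return surv

times  = [1, 1, 2, 3, 3, 4, 5]   # event/censoring times
events = [1, 0, 1, 1, 1, 0, 1]   # 1 = event observed, 0 = censored
s = km_discrete(times, events)   # S(1)=6/7, S(2)=24/35, S(3)=12/35, S(5)=0
```

The naive version of the abstract's comparison would feed uncertain endpoints into `events`, while the complete-case version would drop subjects whose true endpoint is missing.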
Confounder selection and adjustment are essential elements of assessing the causal effect of an exposure or treatment in observational studies. Building upon work by Wang et al. (2012) and Lefebvre et al. (2014), we propose and evaluate a Bayesian method to estimate average causal effects in studies with a large number of potential confounders, relatively few observations, likely interactions between confounders and the exposure of interest, and uncertainty on which confounders and interaction terms should be included. Our method is applicable across all exposures and outcomes that can be handled through generalized linear models. In this general setting, estimation of the average causal effect is different from estimation of the exposure coefficient in the outcome model due to non-collapsibility. We implement a Bayesian bootstrap procedure to integrate over the distribution of potential confounders and to estimate the causal effect. Our method permits estimation of both the overall population causal effect and effects in specified subpopulations, providing clear characterization of heterogeneous exposure effects that may vary considerably across different covariate profiles. Simulation studies demonstrate that the proposed method performs well in small sample size situations with 100 to 150 observations and 50 covariates. The method is applied to data on 15,060 US Medicare beneficiaries diagnosed with a malignant brain tumor between 2000 and 2009 to evaluate whether surgery reduces hospital readmissions within thirty days of diagnosis.
Average causal effect; Confounder selection; Treatment effect heterogeneity; Bayesian adjustment for confounding
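The Bayesian bootstrap step described above can be sketched as a single posterior draw: Dirichlet(1, ..., 1) weights over the n subjects, generated from uniform spacings, averaging per-subject predicted treatment-control differences. The predicted outcomes below are illustrative placeholders, not fitted values from any model.

```python
# One Bayesian bootstrap draw: Dirichlet(1,...,1) weights over n subjects
# via uniform spacings, then a weighted average of predicted differences.
import random

def bayesian_bootstrap_weights(n, rng):
    """One Dirichlet(1,...,1) draw as spacings of n-1 sorted uniforms."""
    gaps = sorted(rng.random() for _ in range(n - 1))
    return [b - a for a, b in zip([0.0] + gaps, gaps + [1.0])]

rng = random.Random(42)
# Hypothetical predicted outcomes under treatment and control for 8 subjects.
y_treat   = [0.90, 0.70, 0.80, 0.60, 0.75, 0.85, 0.65, 0.70]
y_control = [0.80, 0.60, 0.75, 0.55, 0.70, 0.70, 0.60, 0.68]

w = bayesian_bootstrap_weights(len(y_treat), rng)
ace_draw = sum(wi * (yt - yc) for wi, yt, yc in zip(w, y_treat, y_control))
```

Repeating the draw many times yields a posterior sample for the average causal effect that integrates over the empirical confounder distribution, which is the role the procedure plays in the abstract.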
In a relative risk analysis of colorectal cancer on nutrition intake scores across genders, we show that, surprisingly, when comparing the relative risks for men and women based on the index of a weighted sum of various nutrition scores, the problem reduces to forming a confidence interval for the ratio of two (asymptotically) normal random variables. The latter is an old problem with a substantial literature. However, our simulation results suggest that existing methods often either give inaccurate coverage probabilities or have a positive probability of producing confidence intervals with infinite length. Motivated by this problem, we develop a new methodology, which we call the Direct Integral Method for Ratios (DIMER), that, unlike the other methods, is based directly on the distribution of the ratio. In simulations, we compare this method to many others. These simulations show that, generally, DIMER more closely achieves the nominal confidence level, and in those cases where the other methods achieve the nominal levels, DIMER has comparable confidence interval lengths. The methodology is then applied to a real data set, together with follow-up simulations.
Confidence Interval; Direct Integral Method for Ratios; DIMER; Fieller’s interval; Hayya’s method; Ratios of location parameters
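For context on the classical competitors named in the keywords, Fieller's interval for the ratio of two (asymptotically) normal estimates can be written in a few lines. The inputs below are illustrative numbers, not trial data, and the `None` return corresponds to the unbounded-interval case the abstract notes.

```python
# Fieller's confidence interval for a ratio of two normal estimates.
import math

def fieller_interval(a, b, v11, v22, v12, z=1.96):
    """CI for mu1/mu2 given estimates (a, b) and their (co)variances.

    Solves the quadratic in theta:
      (b^2 - z^2 v22) theta^2 - 2 (a b - z^2 v12) theta + (a^2 - z^2 v11) <= 0.
    Returns (lo, hi), or None when the denominator is not bounded away
    from zero -- the 'infinite length' failure mode noted in the abstract.
    """
    A = b * b - z * z * v22
    B = -2.0 * (a * b - z * z * v12)
    C = a * a - z * z * v11
    disc = B * B - 4.0 * A * C
    if A <= 0 or disc < 0:
        return None                      # unbounded or empty interval
    r = math.sqrt(disc)
    return ((-B - r) / (2 * A), (-B + r) / (2 * A))

# Estimates 2.0 / 4.0 = 0.5 with equal variances and no covariance.
ci = fieller_interval(a=2.0, b=4.0, v11=0.04, v22=0.04, v12=0.0)
```

DIMER, by contrast, works directly with the distribution of the ratio rather than inverting a quadratic, which is what avoids the infinite-length intervals.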
Gene regulatory networks represent the regulatory relationships between genes and their products and are important for exploring and defining the underlying biological processes of cellular systems. We develop a novel framework to recover the structure of nonlinear gene regulatory networks using semiparametric spline-based directed acyclic graphical models. Our use of splines allows the model to have both flexibility in capturing nonlinear dependencies as well as control of overfitting via shrinkage, using mixed model representations of penalized splines. We propose a novel discrete mixture prior on the smoothing parameter of the splines that allows for simultaneous selection of both linear and nonlinear functional relationships as well as inducing sparsity in the edge selection. Using simulation studies, we demonstrate the superior performance of our methods in comparison with several existing approaches in terms of network reconstruction and functional selection. We apply our methods to a gene expression dataset in glioblastoma multiforme, which reveals several interesting and biologically relevant nonlinear relationships.
Directed acyclic graph; Hierarchical model; MCMC; Model and functional selection; P-splines; Gene regulatory network
Stochastic process models are widely employed for analyzing spatiotemporal datasets in various scientific disciplines including, but not limited to, environmental monitoring, ecological systems, forestry, hydrology, meteorology and public health. After inferring on a spatiotemporal process for a given dataset, inferential interest may turn to estimating rates of change, or gradients, over space and time. This manuscript develops fully model-based inference on spatiotemporal gradients under continuous space, continuous time settings. Our contribution is to offer, within a flexible spatiotemporal process model setting, a framework to estimate arbitrary directional gradients over space at any given timepoint, temporal derivatives at any given spatial location and, finally, mixed spatiotemporal gradients that reflect rapid change in spatial gradients over time and vice versa. We achieve such inference without compromising on rich and flexible spatiotemporal process models and use nonseparable covariance structures. We illustrate our methodology using a simulated data example and subsequently apply it to a dataset of daily PM2.5 concentrations in California, where the spatiotemporal gradient process reveals the effects of California's unique topography on pollution and detects the aftermath of a devastating series of wildfires.
Gaussian process; Gradients; Markov chain Monte Carlo
Progressive and insidious cognitive decline that interferes with daily life is the defining characteristic of Alzheimer's disease (AD). Epidemiological studies have found that the pathological process of AD begins years before a clinical diagnosis is made and can be highly variable within a given population. Characterizing cognitive decline in the preclinical phase of AD is critical for the development of early intervention strategies when disease-modifying therapies may be most effective. In the last decade, there has been an increased interest in the application of change-point models to longitudinal cognitive outcomes prior to and after diagnosis. Most of the proposed statistical methodology for describing decline relies upon distributional assumptions that may not hold. In this paper, we introduce a quantile regression with a change-point model for longitudinal data of cognitive function in persons bound to develop AD. A change-point in our model reflects the transition from the cognitive decline due to normal aging to the accelerated decline due to disease progression. Quantile regression avoids common distributional assumptions on cognitive outcomes and allows the covariate effects and the change-point to vary for different quantiles of the response. We provide an approach for estimating the model parameters, including the change-point, and present inferential procedures based on the asymptotic properties of the estimators. A simulation study shows that the estimation and inferential procedures perform reasonably well in finite samples. The practical use of our model is illustrated by an application to longitudinal episodic memory outcomes from two cohort studies of aging and AD.
Alzheimer’s disease; Change-point model; Cognitive aging; Disease progression; Longitudinal data; Quantile regression
We propose a Bayesian dose-finding design that accounts for two important factors, the severity of toxicity and heterogeneity in patients' susceptibility to toxicity. We consider toxicity outcomes with various levels of severity and define appropriate scores for these severity levels. We then use a multinomial likelihood function and a Dirichlet prior to model the probabilities of these toxicity scores at each dose, and characterize the overall toxicity using an average toxicity score parameter. To address the issue of heterogeneity in patients' susceptibility to toxicity, we categorize patients into different risk groups based on their susceptibility. A Bayesian isotonic transformation is applied to induce an order-restricted posterior inference on the average toxicity scores. We demonstrate the performance of the proposed dose-finding design using simulations based on a clinical trial in multiple myeloma.
Adaptive design; Bayesian inference; Isotonic regression; Average toxicity score
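The multinomial-Dirichlet summary described above reduces, for a point summary, to a conjugate posterior-mean calculation. A minimal sketch with invented severity scores and counts (the paper's actual scores, prior, and isotonic transformation are not reproduced):

```python
# Dirichlet-multinomial posterior mean of an average toxicity score
# at a single dose, with invented severity scores and counts.
severity_scores = [0.0, 0.5, 1.0, 1.5]   # none / mild / moderate / severe
prior_alpha     = [1.0, 1.0, 1.0, 1.0]   # flat Dirichlet prior
counts          = [10, 6, 3, 1]          # toxicity grades observed at the dose

# Conjugacy: posterior is Dirichlet(prior_alpha + counts).
post_alpha = [a + n for a, n in zip(prior_alpha, counts)]
total = sum(post_alpha)
post_probs = [a / total for a in post_alpha]   # posterior mean of each p_k
avg_tox_score = sum(s * p for s, p in zip(severity_scores, post_probs))
```

With these numbers the posterior-mean average toxicity score is 10.5/24 = 0.4375; the design would compare such summaries across doses, after the order-restricting isotonic transformation.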