Modeling clinical endpoints as a function of change in antiretroviral therapy (ART) attempts to answer one simple but very challenging question: was the change in ART beneficial or not? In the current manuscript we consider a similar scientific question, except that we are interested in modeling the time of ART regimen change rather than comparing two or more ART regimens. The answer to this scientific riddle is unknown and has been difficult to address clinically. Naturally, ART regimen change is left to a participant and his or her provider, and so the date of change depends on participant characteristics. There exists a vast literature on how to address the resulting potential confounding, and those techniques are vital to the success of the method here. A more substantial challenge is devising a systematic modeling strategy to overcome the missing time of regimen change for those participants who do not switch to second-line ART within the study period even after failing the initial ART. In this paper, we adopt and apply a statistical method that was originally proposed for modeling infusion trial data, where infusion length may be informatively censored, and argue that the same strategy may be employed here. Our application of this method to therapeutic HIV/AIDS studies is new. Using data from the AIDS Clinical Trials Group (ACTG) Study A5095, we model immunological endpoints as a polynomial function of a participant’s switching time to second-line ART for 182 participants who have already failed the initial ART. In our analysis, we find that participants who switch early have somewhat better sustained suppression of HIV-1 RNA after virological failure than those who switch later. However, we also find that participants who switched very late, possibly censored due to the end of the study, had good HIV-1 RNA suppression, on average.
We believe our scientific conclusions contribute to the relevant HIV literature and hope that the basic modeling strategy outlined here would be useful to others contemplating similar analyses with partially missing treatment length data.
Causal inference; Informative Censoring; Observational data; Propensity score
We consider semiparametric transition measurement error models for
longitudinal data, where one of the covariates is measured with error in
transition models, and no distributional assumption is made for the underlying
unobserved covariate. An estimating equation approach based on the pseudo
conditional score method is proposed. We show the resulting estimators of the
regression coefficients are consistent and asymptotically normal. We also
discuss the issue of efficiency loss. Simulation studies are conducted to
examine the finite-sample performance of our estimators. The longitudinal AIDS
Costs and Services Utilization Survey data are analyzed for illustration.
Asymptotic efficiency; Conditional score method; Functional modeling; Measurement error; Longitudinal data; Transition models
In this article, we propose a Bayesian approach to dose-response assessment and the assessment of synergy between two combined agents. We consider the case of an in vitro ovarian cancer research study aimed at investigating the antiproliferative activities of four agents, alone and paired, in two human ovarian cancer cell lines. In this study, independent dose-response experiments were repeated three times. Each experiment included replicates at investigated dose levels including control (no drug). We have developed a Bayesian hierarchical nonlinear regression model that accounts for variability between experiments, variability within experiments (i.e., among replicates), and variability in the observed responses of the controls. We use Markov chain Monte Carlo (MCMC) to fit the model to the data and carry out posterior inference on quantities of interest (e.g., the median inhibitory concentration IC50). In addition, we have developed a method, based on Loewe additivity, that allows one to assess the presence of synergy with honest accounting of uncertainty. Extensive simulation studies show that our proposed approach is more reliable in declaring synergy compared with current standard analyses such as the Median-Effect Principle/Combination Index method (Chou and Talalay, 1984), which ignore important sources of variability and uncertainty.
Combination Index method; Drug interaction; Emax model; Interaction index; Median-effect principle; Loewe additivity model
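The Loewe additivity criterion referenced above has a simple closed form under the median-effect (Hill) model. The sketch below is illustrative only (it is not the paper's Bayesian hierarchical implementation), and the function names are hypothetical:

```python
def dose_for_effect(frac, ic50, hill):
    """Single-agent dose producing fractional inhibition `frac`
    under the median-effect (Hill) model with parameters ic50, hill."""
    return ic50 * (frac / (1.0 - frac)) ** (1.0 / hill)

def interaction_index(d1, d2, frac, ic50_1, hill_1, ic50_2, hill_2):
    """Loewe additivity interaction index for a combination (d1, d2)
    jointly producing fractional inhibition `frac`:
    < 1 suggests synergy, = 1 additivity, > 1 antagonism."""
    return (d1 / dose_for_effect(frac, ic50_1, hill_1)
            + d2 / dose_for_effect(frac, ic50_2, hill_2))
```

For example, if half the single-agent IC50 of each drug suffices for 50% inhibition in combination, the index is below 1, pointing toward synergy; the Bayesian model in the abstract adds uncertainty quantification around such point summaries.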
High-density SNP microarrays provide a useful tool for the detection of copy number variants (CNVs). The analysis of such large amounts of data is complicated, especially with regard to determining where copy numbers change and their corresponding values. In this paper, we propose a Bayesian multiple change point model (BMCP) for segmentation and estimation of SNP microarray data. Segmentation concerns separating a chromosome into regions of equal copy number differences between the sample of interest and some reference, and involves the detection of locations of copy number difference changes. Estimation concerns determining true copy number for each segment. Our approach not only gives posterior estimates for the parameters of interest, namely locations for copy number difference changes and true copy number estimates, but also useful confidence measures. In addition, our algorithm can segment multiple samples simultaneously, and infer both common and rare CNVs across individuals. Finally, for studies of CNVs in tumors, we incorporate an adjustment factor for signal attenuation due to tumor heterogeneity or normal contamination that can improve copy number estimates.
Bayesian multiple change points; copy number variant; estimation; segmentation; signal attenuation; SNP microarrays
Absence of a perfect reference test is an acknowledged source of bias in diagnostic studies. In the case of tuberculous pleuritis, standard reference tests such as smear microscopy, culture and biopsy have poor sensitivity. Yet meta-analyses of new tests for this disease have always assumed the reference standard is perfect, leading to biased estimates of the new test’s accuracy. We describe a method for joint meta-analysis of sensitivity and specificity of the diagnostic test under evaluation, while considering the imperfect nature of the reference standard. We use a Bayesian hierarchical model that takes into account within- and between-study variability. We show how to obtain pooled estimates of sensitivity and specificity, and how to plot a hierarchical summary receiver operating characteristic curve. We describe extensions of the model to situations where multiple reference tests are used, and where index and reference tests are conditionally dependent. The performance of the model is evaluated using simulations and illustrated using data from a meta-analysis of nucleic acid amplification tests (NAATs) for tuberculous pleuritis. The estimate of NAAT specificity was higher and the sensitivity lower compared to a model that assumed that the reference test was perfect.
PMID: 22568612 CAMSID: cams3174
Bayesian; Bivariate model; Diagnostic test accuracy; Latent class model; Meta-analysis
Identification of novel biomarkers for risk assessment is important for both effective disease prevention and optimal treatment recommendation. Discovery relies on the precious yet limited resource of stored biological samples from large prospective cohort studies. The case-cohort sampling design provides a cost-effective tool in the context of biomarker evaluation, especially when the clinical condition of interest is rare. Existing statistical methods focus on making efficient inference on relative hazard parameters from the Cox regression model. Drawing on recent theoretical development on the weighted likelihood for semiparametric models under two-phase studies (Breslow and Wellner, 2007), we propose statistical methods to evaluate the accuracy and predictiveness of a risk prediction biomarker, with censored time-to-event outcome under stratified case-cohort sampling. We consider nonparametric methods and a semiparametric method. We derive large sample properties of the proposed estimators and evaluate their finite sample performance using numerical studies. We illustrate the new procedures using data from the Framingham Offspring Study to evaluate the accuracy of a recently developed risk score incorporating biomarker information for predicting cardiovascular disease.
Case Cohort Sampling; Negative predictive value; Positive predictive value; Receiver Operating Characteristics Curve (ROC curve); Integrated Discrimination Improvement (IDI); Risk prediction; Survival analysis; Two-phase study
The recovery of gradients of sparsely observed functional data is a challenging ill-posed inverse problem. Given observations of smooth curves (e.g., growth curves) at isolated time points, the aim is to provide estimates of the underlying gradients (or growth velocities). To address this problem, we develop a Bayesian inversion approach that models the gradient in the gaps between the observation times by a tied-down Brownian motion, conditionally on its values at the observation times. The posterior mean and covariance kernel of the growth velocities are then found to have explicit and computationally tractable representations in terms of quadratic splines. The hyperparameters in the prior are specified via nonparametric empirical Bayes, with the prior precision matrix at the observation times estimated by constrained ℓ1 minimization. The infinitesimal variance of the Brownian motion prior is selected by cross-validation. The approach is illustrated using both simulated and real data examples.
Growth trajectories; Functional data analysis; Ill-posed inverse problem; Nonparametric Empirical Bayes; Tied-down Brownian motion
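The "tied-down" construction above conditions a Brownian motion on its values at the observation times, which yields the standard Brownian-bridge moments on each gap. A minimal sketch of those explicit formulas (the classical bridge moments only, not the paper's full empirical-Bayes procedure; function names are hypothetical):

```python
def bridge_mean(s, t0, t1, a, b):
    """Conditional mean of a Brownian motion at time s in [t0, t1],
    given values a at t0 and b at t1: linear interpolation."""
    return a + (b - a) * (s - t0) / (t1 - t0)

def bridge_cov(s, t, t0, t1, sigma2=1.0):
    """Conditional covariance of the tied-down motion at times s, t
    in [t0, t1], with infinitesimal variance sigma2."""
    u, v = min(s, t), max(s, t)
    return sigma2 * (u - t0) * (t1 - v) / (t1 - t0)
```

The covariance vanishes at the observation times (where the process is pinned) and is largest mid-gap, which is why uncertainty about the gradient concentrates between observations.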
Fetal growth restriction is a leading cause of perinatal morbidity and mortality that could be reduced if high-risk infants are identified early in pregnancy. We propose a Bayesian model for aggregating 18 longitudinal ultrasound measurements of fetal size and blood flow into three underlying, continuous latent factors. Our procedure is more flexible than typical latent variable methods in that we relax the normality assumptions by allowing the latent factors to follow finite mixture distributions. Using mixture distributions also permits us to cluster individuals with similar observed characteristics and identify latent classes of subjects who are more likely to be growth or blood flow restricted during pregnancy. We also use our latent variable mixture distribution model to identify a clinically meaningful latent class of subjects with low birth weight and early gestational age. We then examine the association of latent classes of intrauterine growth restriction with latent classes of birth outcomes as well as observed maternal covariates including fetal gender and maternal race, parity, body mass index (BMI), and height. Our methods identified a latent class of subjects who have increased blood flow restriction and below average intrauterine size during pregnancy who were more likely to be growth restricted at birth than a class of individuals with typical size and blood flow.
Bayesian methods; birth weight; correlated data; intrauterine growth restriction; pre-term birth; latent variables; small for gestational age
In observational studies, interest often lies in estimation of the population-level relationship between the explanatory variables and dependent variables, and the estimation is often done using longitudinal data. Longitudinal data often feature sampling error and bias due to non-random drop-out. However, inclusion of population-level information can increase estimation efficiency. In this paper we consider a generalized partially linear model for incomplete longitudinal data in the presence of the population-level information. A pseudo-empirical likelihood-based method is introduced to incorporate population-level information, and non-random drop-out bias is corrected by using a weighted generalized estimating equations method. A three-step estimation procedure is proposed, which makes the computation easier. Several methods that are often used in practice are compared in simulation studies, which demonstrate that our proposed method can correct the non-random drop-out bias and increase the estimation efficiency, especially for small sample size or when the missing proportion is high. We apply this method to an Alzheimer's disease study.
Auxiliary; drop-out; longitudinal data; partially linear model; population-level information; pseudo-empirical likelihood
In vaccine research, immune biomarkers that can reliably predict a vaccine’s effect on the clinical endpoint (i.e., surrogate markers) are important tools for guiding vaccine development. This paper addresses issues on optimizing two-phase sampling study design for evaluating surrogate markers in a principal surrogate framework, motivated by the design of a future HIV vaccine trial. To address the problem of missing potential outcomes in a standard trial design, novel trial designs have been proposed that utilize baseline predictors of the immune response biomarker(s) and/or augment the trial by vaccinating uninfected placebo recipients at the end of the trial and measuring their immune biomarkers. However, inefficient use of the augmented information can lead to counterintuitive results on the precision of estimation. To remedy this problem, we propose a pseudo-score type estimator suitable for the augmented design and characterize its asymptotic properties. This estimator has superior performance compared with existing estimators and allows calculation of analytical variances useful for guiding study design. Based on the new estimator we investigate in detail the problem of optimizing the sampling scheme of a biomarker in a vaccine efficacy trial for efficiently estimating its surrogate effect, as characterized by the vaccine efficacy curve (a causal effect predictiveness curve) and by the predicted overall vaccine efficacy using the biomarker.
Closeout placebo vaccination; Estimated likelihood; Immune correlate; Principal surrogate; Pseudo-score; Two-phase sampling design
A real-time surveillance method is developed with emphasis on rapid and accurate detection of emerging outbreaks. We develop a model with relatively weak assumptions regarding the latent processes generating the observed data, ensuring a robust prediction of the spatiotemporal incidence surface. Estimation occurs via a local linear fitting combined with day-of-week effects, where spatial smoothing is handled by a novel distance metric that adjusts for population density. Detection of emerging outbreaks is carried out via residual analysis. Both daily residuals and AR model-based de-trended residuals are used for detecting abnormalities in the data given that either a large daily residual or an increasing temporal trend in the residuals signals a potential outbreak, with the threshold for statistical significance determined using a resampling approach.
Disease surveillance; Local linear estimation; Residual analysis; Lattice Data; Time series modeling
To develop more targeted intervention strategies, an important research goal is to identify markers predictive of clinical events. A crucial step towards this goal is to characterize the clinical performance of a marker for predicting different types of events. In this manuscript, we present statistical methods for evaluating the performance of a prognostic marker in predicting multiple competing events. To capture the potential time-varying predictive performance of the marker and incorporate competing risks, we define time- and cause-specific accuracy summaries by stratifying cases based on causes of failure. Such a definition allows one to evaluate the predictive accuracy of a marker for each type of event and compare its predictiveness across event types. Extending the nonparametric crude cause-specific ROC curve estimators of Saha and Heagerty (2010), we develop inference procedures for a range of cause-specific accuracy summaries. To estimate the accuracy measures and assess how covariates may affect the accuracy of a marker under the competing risk setting, we consider two forms of semiparametric models through the cause-specific hazard framework. These approaches enable flexible modeling of the relationships between the marker and failure times for each cause, while efficiently accommodating additional covariates. We investigate the asymptotic properties of the proposed accuracy estimators and demonstrate the finite sample performance of these estimators through simulation studies. The proposed procedures are illustrated with data from a prostate cancer prognostic study.
Biomarker evaluation; Cause-specific Hazard; Competing risk; Negative predictive value; Positive predictive value; Receiver Operating Characteristics Curve (ROC curve); Survival analysis
This article examines group testing procedures where units within a group (or pool) may be correlated. The expected number of tests per unit (i.e., efficiency) of hierarchical- and matrix-based procedures is derived based on a class of models of exchangeable binary random variables. The effect on efficiency of the arrangement of correlated units within pools is then examined. In general, when correlated units are arranged in the same pool, the expected number of tests per unit decreases, sometimes substantially, relative to arrangements that ignore information about correlation.
Composite sampling; Epitope mapping; Exchangeable binary random variables; Group testing; HIV; Matrix testing; Pooled testing
We consider frailty models with additive semiparametric covariate effects
for clustered failure time data. We propose a doubly penalized partial
likelihood (DPPL) procedure to estimate the nonparametric functions using
smoothing splines. We show that the DPPL estimators could be obtained from
fitting an augmented working frailty model with parametric covariate effects,
with the nonparametric functions estimated as linear combinations of
fixed and random effects and the smoothing parameters estimated as extra
variance components. This approach allows us to conveniently estimate all model
components within a unified frailty model framework. We evaluate the finite
sample performance of the proposed method via a simulation study, and apply the
method to analyze data from a study of sexually transmitted infections.
Doubly penalized partial likelihood; smoothing spline; Gaussian frailty; sexually transmitted disease; Smoothing parameter; Variance components
We propose optimal choice of the design parameters for random discontinuation designs (RDD) using a Bayesian decision-theoretic approach. We consider applications of RDDs to oncology phase II studies evaluating activity of cytostatic agents. The design consists of two stages. The preliminary open-label stage treats all patients with the new agent and identifies a possibly sensitive subpopulation. The subsequent second stage randomizes, treats, follows, and compares outcomes among patients in the identified subgroup, with randomization to either the new or a control treatment. Several tuning parameters characterize the design: the number of patients in the trial, the duration of the preliminary stage, and the duration of follow-up after randomization. We define a probability model for tumor growth, specify a suitable utility function, and develop a computational procedure for selecting the optimal tuning parameters.
Clinical trials; Enrichment designs; Randomized discontinuation design; Tumor growth models
Recurrent events are common in medical research for subjects who are followed for the duration of a study. For example, cardiovascular patients with an implantable cardioverter defibrillator (ICD) experience recurrent arrhythmic events that are terminated by shocks or antitachycardia pacing delivered by the device. In a published randomized clinical trial, a recurrent-event model was used to study the effect of a drug therapy in subjects with ICDs, who were experiencing recurrent symptomatic arrhythmic events. Under this model, one expects the robust variance for the estimated treatment effect to diminish when the duration of the trial is extended, due to the additional events observed. However, as shown in this article, that is not always the case. We investigate this phenomenon using large datasets from this arrhythmia trial and from a diabetes study, with some analytical results, as well as through simulations. Some insights are also provided on existing sample size formulae using our results.
Andersen–Gill model; Clinical trials; Recurrent-events data; Robust standard error; Sample size; Sandwich estimator
Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation–maximization algorithm give consistent estimators for model parameters when data are missing at random (MAR) provided that the response model and the missing covariate model are correctly specified; however, we do not need to specify the missing data mechanism. An alternative method is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified; however, we do not need to specify the distribution of the covariates that have missing values. In this article, we develop a doubly robust estimation method for longitudinal data with missing response and missing covariate when data are MAR. This method is appealing in that it can provide consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations.
Doubly robust; Estimating equation; Missing at random; Missing covariate; Missing response
Gilbert et al. present four U-statistic-based tests to compare genetic diversity between different samples. The proposed tests improve upon previously used methods by accounting for the correlations in the data. We find, however, that the same correlations introduce an unacceptable bias in the sample estimators used for the variance and covariance of the inter-sequence genetic distances for modest sample sizes. Here, we compute unbiased estimators for these quantities and test the resulting improvement using simulated data. We also show that, contrary to the claims in Gilbert et al., it is not always possible to apply the Welch–Satterthwaite approximate t-test, and we provide explicit formulas for the degrees of freedom to be used when such an approximation is possible.
HIV genetic diversity; Hypothesis testing; Nonparametric statistics; Two-sample test; U-statistic
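For reference, the classical Welch–Satterthwaite degrees of freedom discussed above take the following form; this is the textbook formula for independent samples, not the paper's corrected version for correlated inter-sequence distances:

```python
def welch_satterthwaite_df(s1_sq, n1, s2_sq, n2):
    """Welch-Satterthwaite approximate degrees of freedom for comparing
    two independent sample means with unequal variances, where s1_sq and
    s2_sq are the sample variances and n1, n2 the sample sizes."""
    v1, v2 = s1_sq / n1, s2_sq / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
```

When the two variances and sample sizes are equal, the formula recovers the pooled-t value 2(n - 1); the abstract's point is that correlation among pairwise distances violates the independence this approximation assumes.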
High-dimensional and highly correlated data leading to non- or weakly identified effects are commonplace. Maximum likelihood will typically fail in such situations and a variety of shrinkage methods have been proposed. Standard techniques, such as ridge regression or the lasso, shrink estimates toward zero, with some approaches allowing coefficients to be selected out of the model by achieving a value of zero. When substantive information is available, estimates can be shrunk to nonnull values; however, such information may not be available. We propose a Bayesian semiparametric approach that allows shrinkage to multiple locations. Coefficients are given a mixture of heavy-tailed double exponential priors, with location and scale parameters assigned Dirichlet process hyperpriors to allow groups of coefficients to be shrunk toward the same, possibly nonzero, mean. Our approach favors sparse, but flexible, structure by shrinking toward a small number of random locations. The methods are illustrated using a study of genetic polymorphisms and Parkinson’s disease.
Dirichlet process; Hierarchical model; Lasso; MCMC; Mixture model; Nonparametric; Regularization; Shrinkage prior
In this article, we first study parameter identifiability in randomized clinical trials with noncompliance and missing outcomes. We show that under certain conditions the parameters of interest are identifiable even under different types of completely nonignorable missing data: that is, when the missing mechanism depends on the outcome. We then derive their maximum likelihood and moment estimators and evaluate their finite-sample properties in simulation studies in terms of bias, efficiency, and robustness. Our sensitivity analysis shows that the assumed nonignorable missing-data model has an important impact on the estimated complier average causal effect (CACE) parameter. Our new method provides useful alternative nonignorable missing-data models to the existing latent ignorable model, and guarantees parameter identifiability, for estimating the CACE in a randomized clinical trial with noncompliance and missing data.
Causal inference; Identifiability; Maximum likelihood estimates; Missing data; Noncompliance; Nonignorable
Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses’ Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer.
Errors in variables; nonlinear models; logistic regression; integral equations
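The bias referred to above is easiest to see in the linear special case, where classical measurement error attenuates the naive slope by the reliability ratio var(x) / (var(x) + var(u)). A small illustrative simulation of that well-known fact (not the corrected score method of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(0.0, 1.0, n)        # true covariate, variance 1
u = rng.normal(0.0, 1.0, n)        # classical measurement error, variance 1
w = x + u                          # observed error-prone covariate
y = 2.0 * x + rng.normal(0.0, 1.0, n)

# Naive OLS slope of y on w; its expectation is the true slope times the
# reliability ratio var(x) / (var(x) + var(u)) = 2.0 * 0.5 = 1.0.
beta_naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)
```

Here the naive estimate sits near 1.0 rather than the true 2.0, which is the kind of systematic bias that corrected-score approaches are designed to remove.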
Patients who were previously treated for prostate cancer with radiation therapy are monitored at regular intervals using a laboratory test called Prostate Specific Antigen (PSA). If the value of the PSA test starts to rise, this is an indication that the prostate cancer is more likely to recur, and the patient may wish to initiate new treatments. Such patients could be helped in making medical decisions by an accurate estimate of the probability of recurrence of the cancer in the next few years. In this paper, we describe the methodology for giving the probability of recurrence for a new patient, as implemented on a web-based calculator. The methods use a joint longitudinal survival model. The model is developed on a training dataset of 2,386 patients and tested on a dataset of 846 patients. Bayesian estimation methods are used with one Markov chain Monte Carlo (MCMC) algorithm developed for estimation of the parameters from the training dataset and a second quick MCMC developed for prediction of the risk of recurrence that uses the longitudinal PSA measures from a new patient.
Joint longitudinal-survival model; Online calculator; Predicted probability; Prostate cancer; PSA
In estimation of the ROC curve, when the true disease status is subject to nonignorable missingness, the observed likelihood involves the missing mechanism given by a selection model. In this paper, we propose a likelihood-based approach to estimate the ROC curve and the area under the ROC curve when the verification bias is nonignorable. We specify a parametric disease model in order to make the nonignorable selection model identifiable. With the estimated verification and disease probabilities, we construct four types of empirical estimates of the ROC curve and its area based on imputation and reweighting methods. In practice, a reasonably large sample size is required to estimate the nonignorable selection model in our settings. Simulation studies show that all four estimators of the ROC area perform well, and that the imputation estimators are generally more efficient than the other estimators proposed. We apply the proposed method to a data set from Alzheimer’s disease research.
Alzheimer’s disease; nonignorable missing data; ROC curve; verification bias
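One simple reweighting estimator in the spirit of the approach above weights each verified subject by the inverse of its estimated verification probability; this sketch covers only the ignorable-given-weights case and is not the paper's likelihood-based nonignorable estimators (the function name is hypothetical):

```python
import numpy as np

def ipw_auc(marker, disease, verified, pi):
    """Inverse-probability-weighted empirical AUC: only verified subjects
    contribute, each weighted by 1 / pi, where pi is the estimated
    verification probability. Unverified subjects get weight zero, so
    their (unknown) disease status does not enter the estimate."""
    marker = np.asarray(marker, float)
    disease = np.nan_to_num(np.asarray(disease, float))  # placeholder for unverified
    w = np.asarray(verified, float) / np.asarray(pi, float)
    wd, wn = w * disease, w * (1.0 - disease)            # diseased / non-diseased weights
    diff = marker[:, None] - marker[None, :]
    kernel = (diff > 0) + 0.5 * (diff == 0)              # concordance, ties count 1/2
    numerator = (wd[:, None] * wn[None, :] * kernel).sum()
    return numerator / (wd.sum() * wn.sum())
```

With complete verification (all pi = 1) this reduces to the usual empirical AUC; under selective verification, the weights restore representativeness of the verified subsample.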