Results 1-25 (44)
 

1.  Estimating treatment effects with treatment switching via semicompeting risks models: an application to a colorectal cancer study 
Biometrika  2011;99(1):167-184.
Summary
Treatment switching is a frequent occurrence in clinical trials, where, during the course of the trial, patients who fail on the control treatment may change to the experimental treatment. Analysing the data without accounting for switching yields highly biased and inefficient estimates of the treatment effect. In this paper, we propose a novel class of semiparametric semicompeting risks transition survival models to accommodate treatment switches. Theoretical properties of the proposed model are examined and an efficient expectation-maximization algorithm is derived for obtaining the maximum likelihood estimates. Simulation studies are conducted to demonstrate the superiority of the model compared with the intent-to-treat analysis and other methods proposed in the literature. The proposed method is applied to data from a colorectal cancer clinical trial.
doi:10.1093/biomet/asr062
PMCID: PMC3412606  PMID: 23049136
Expectation-maximization algorithm; Maximum likelihood estimate; Noncompliance; Panitumumab; Partial switching; Transition model; Treatment switching
2.  Directed acyclic graphs with edge-specific bounds 
Biometrika  2011;99(1):115-126.
Summary
We give a definition of a bounded edge within the causal directed acyclic graph framework. A bounded edge generalizes the notion of a signed edge and is defined in terms of bounds on a ratio of survivor probabilities. We derive rules concerning the propagation of bounds. Bounds on causal effects in the presence of unmeasured confounding are also derived using bounds related to specific edges on a graph. We illustrate the theory developed by an example concerning estimating the effect of antihistamine treatment on asthma in the presence of unmeasured confounding.
doi:10.1093/biomet/asr059
PMCID: PMC3412607  PMID: 23049135
Bayesian network; Bound; Causal inference; Confounding; Directed acyclic graph
3.  Conservative hypothesis tests and confidence intervals using importance sampling 
Biometrika  2012;99(1):57-69.
Summary
Importance sampling is a common technique for Monte Carlo approximation, including that of p-values. Here it is shown that a simple correction of the usual importance sampling p-values provides valid p-values, meaning that a hypothesis test created by rejecting the null hypothesis when the p-value is at most α will also have a Type I error rate of at most α. This correction uses the importance weight of the original observation, which gives valuable diagnostic information under the null hypothesis. Using the corrected p-values can be crucial for multiple testing and also in problems where evaluating the accuracy of importance sampling approximations is difficult. Inverting the corrected p-values provides a useful way to create Monte Carlo confidence intervals that maintain the nominal significance level and use only a single Monte Carlo sample.
doi:10.1093/biomet/asr079
PMCID: PMC3412608  PMID: 23049134
Exact inference; Monte Carlo simulation; Multiple testing; p-value; Rasch model
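The correction described in entry 3 can be illustrated with a toy example. The sketch below assumes a standard normal null, a test statistic equal to the observation itself, and a shifted normal proposal; these choices, the function name and the particular form of the correction (adding the observed point's weight to the numerator and one to the denominator) are illustrative assumptions that match the summary's description, not code taken from the paper.

```python
import math
import random


def importance_sampling_pvalues(x_obs, n, shift, seed=0):
    """Plain and corrected importance sampling p-values.

    Assumed toy setting: H0 is X ~ N(0, 1), the test statistic is
    t(x) = x, and the proposal distribution is N(shift, 1).
    """
    rng = random.Random(seed)

    def weight(y):
        # Density ratio N(0, 1) / N(shift, 1) evaluated at y.
        return math.exp(-shift * y + 0.5 * shift * shift)

    # Sum of importance weights over proposal draws at least as extreme
    # as the observation.
    total = 0.0
    for _ in range(n):
        y = rng.gauss(shift, 1.0)
        if y >= x_obs:
            total += weight(y)

    p_plain = total / n
    # Assumed correction: include the weight of the original observation
    # in the numerator and count it as one extra draw in the denominator.
    p_corrected = min(1.0, (weight(x_obs) + total) / (n + 1))
    return p_plain, p_corrected


p_plain, p_corrected = importance_sampling_pvalues(x_obs=3.0, n=10000, shift=3.0)
```

In this setting the exact p-value is the upper tail probability of a standard normal at 3, roughly 0.00135, so both estimates should land near that value. The corrected one never falls to zero, because the observed point's weight is always counted.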
4.  Optimality of group testing in the presence of misclassification 
Biometrika  2011;99(1):245-251.
Summary
Several optimality properties of Dorfman’s (1943) group testing procedure are derived for estimation of the prevalence of a rare disease whose status is classified with error. Exact ranges of disease prevalence are obtained for which group testing provides more efficient estimation when group size increases.
doi:10.1093/biomet/asr064
PMCID: PMC3412609  PMID: 23049137
Binary outcome; Maximum likelihood estimation; Pooling; Prevalence; Sensitivity; Specificity
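For entry 4, a minimal sketch of prevalence estimation from pooled tests with an imperfect assay may help. It assumes equal-sized pools, known sensitivity and specificity whose sum exceeds one, and independent disease statuses within a pool. The function names are hypothetical, and this shows only the standard pooled-testing estimator, not the optimality results derived in the paper.

```python
def pool_positive_probability(p, k, sensitivity, specificity):
    """Probability that a pool of k samples tests positive at prevalence p."""
    all_negative = (1.0 - p) ** k
    return sensitivity * (1.0 - all_negative) + (1.0 - specificity) * all_negative


def prevalence_estimate(positive_pools, total_pools, k, sensitivity, specificity):
    """Invert the pool-level positive rate to estimate individual prevalence."""
    observed_rate = positive_pools / total_pools
    # Solve observed_rate = Se * (1 - c) + (1 - Sp) * c for c = (1 - p)^k.
    all_negative = (sensitivity - observed_rate) / (sensitivity + specificity - 1.0)
    # Clamp to [0, 1] so the estimate stays a valid probability.
    all_negative = min(1.0, max(0.0, all_negative))
    return 1.0 - all_negative ** (1.0 / k)


# Round trip: feed the exact pool-level rate back through the estimator.
rate = pool_positive_probability(p=0.02, k=10, sensitivity=0.95, specificity=0.98)
p_hat = prevalence_estimate(rate * 1000, 1000, k=10, sensitivity=0.95, specificity=0.98)
```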
5.  Sparse estimation of a covariance matrix 
Biometrika  2011;98(4):807-820.
Summary
We suggest a method for estimating a covariance matrix on the basis of a sample of vectors drawn from a multivariate normal distribution. In particular, we penalize the likelihood with a lasso penalty on the entries of the covariance matrix. This penalty plays two important roles: it reduces the effective number of parameters, which is important even when the dimension of the vectors is smaller than the sample size since the number of parameters grows quadratically in the number of variables, and it produces an estimate which is sparse. In contrast to sparse inverse covariance estimation, our method’s close relative, the sparsity attained here is in the covariance matrix itself rather than in the inverse matrix. Zeros in the covariance matrix correspond to marginal independencies; thus, our method performs model selection while providing a positive definite estimate of the covariance. The proposed penalized maximum likelihood problem is not convex, so we use a majorize-minimize approach in which we iteratively solve convex approximations to the original nonconvex problem. We discuss tuning parameter selection and demonstrate on a flow-cytometry dataset how our method produces an interpretable graphical display of the relationship between variables. We perform simulations that suggest that simple elementwise thresholding of the empirical covariance matrix is competitive with our method for identifying the sparsity structure. Additionally, we show how our method can be used to solve a previously studied special case in which a desired sparsity pattern is prespecified.
doi:10.1093/biomet/asr054
PMCID: PMC3413177  PMID: 23049130
Concave-convex procedure; Covariance graph; Covariance matrix; Generalized gradient descent; Lasso; Majorization-minimization; Regularization; Sparsity
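Entry 5 mentions simple elementwise thresholding of the empirical covariance matrix as a competitive way to identify the sparsity structure. A rough sketch of that baseline follows, assuming hard thresholding of off-diagonal entries at a user-chosen cutoff. The function names, the cutoff and the toy data are illustrative assumptions, and this is the comparison method, not the penalized likelihood estimator the paper proposes.

```python
def empirical_covariance(rows):
    """Unbiased sample covariance matrix of a list of equal-length rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def hard_threshold(cov, cutoff):
    """Zero out off-diagonal entries whose magnitude is at most cutoff."""
    d = len(cov)
    return [
        [cov[i][j] if i == j or abs(cov[i][j]) > cutoff else 0.0 for j in range(d)]
        for i in range(d)
    ]


data = [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 0.0], [4.0, 8.0, 1.0]]
sparse_cov = hard_threshold(empirical_covariance(data), cutoff=0.5)
```

Unlike the estimator described in the summary, a thresholded matrix of this kind is not guaranteed to be positive definite.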
6.  Wild bootstrap for quantile regression 
Biometrika  2011;98(4):995-999.
Summary
The existing theory of the wild bootstrap has focused on linear estimators. In this note, we broaden its validity by providing a class of weight distributions that is asymptotically valid for quantile regression estimators. As most weight distributions in the literature lead to biased variance estimates for nonlinear estimators of linear regression, we propose a modification of the wild bootstrap that admits a broader class of weight distributions for quantile regression. A simulation study on median regression is carried out to compare various bootstrap methods. With a simple finite-sample correction, the wild bootstrap is shown to account for general forms of heteroscedasticity in a regression model with fixed design points.
doi:10.1093/biomet/asr052
PMCID: PMC3413178  PMID: 23049133
Bahadur representation; Heteroscedastic error; Quantile regression; Wild bootstrap
7.  Threshold estimation based on a p-value framework in dose-response and regression settings 
Biometrika  2011;98(4):887-900.
Summary
We use p-values to identify the threshold level at which a regression function leaves its baseline value, a problem motivated by applications in toxicological and pharmacological dose-response studies and environmental statistics. We study the problem in two sampling settings: one where multiple responses can be obtained at a number of different covariate levels, and the other the standard regression setting involving a limited number of response values at each covariate. Our procedure involves testing the hypothesis that the regression function is at its baseline at each covariate value and then computing the potentially approximate p-value of the test. An estimate of the threshold is obtained by fitting a piecewise constant function with a single jump discontinuity, known as a stump, to these observed p-values, as they behave in markedly different ways on the two sides of the threshold. The estimate is shown to be consistent and its finite sample properties are studied through simulations. Our approach is computationally simple and extends to the estimation of the baseline value of the regression function, heteroscedastic errors and to time series. It is illustrated on some real data applications.
doi:10.1093/biomet/asr051
PMCID: PMC3413179  PMID: 23049132
Baseline value; Changepoint; Consistent estimate; Misspecified model; Stump function
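The stump fit described in entry 7 can be sketched as a least squares search over the jump location. The version below assumes the stump levels are fixed at 1/2 on the baseline side (the mean of a uniform p-value) and 0 beyond the threshold. These levels, the function name and the toy p-values are assumptions for illustration and may differ from the paper's exact criterion.

```python
def fit_stump_threshold(x, pvalues, left_level=0.5, right_level=0.0):
    """Least squares fit of a one-jump step function to p-values.

    Returns the largest covariate value assigned to the baseline side,
    or None if no point is assigned to the baseline.
    """
    pairs = sorted(zip(x, pvalues))
    best_threshold = None
    best_cost = float("inf")
    # j is the number of sorted points placed on the baseline (left) side.
    for j in range(len(pairs) + 1):
        cost = sum((p - left_level) ** 2 for _, p in pairs[:j])
        cost += sum((p - right_level) ** 2 for _, p in pairs[j:])
        if cost < best_cost:
            best_cost = cost
            best_threshold = pairs[j - 1][0] if j > 0 else None
    return best_threshold


doses = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
pvals = [0.62, 0.41, 0.55, 0.03, 0.01, 0.02]
threshold = fit_stump_threshold(doses, pvals)
```

With these made-up values the p-values stay near 1/2 up to a dose of 0.3 and drop towards zero afterwards, so the fitted jump falls just after 0.3.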
8.  Optimizing randomized trial designs to distinguish which subpopulations benefit from treatment 
Biometrika  2011;98(4):845-860.
Summary
It is a challenge to evaluate experimental treatments where it is suspected that the treatment effect may only be strong for certain subpopulations, such as those having a high initial severity of disease, or those having a particular gene variant. Standard randomized controlled trials can have low power in such situations. They also are not optimized to distinguish which subpopulations benefit from a treatment. With the goal of overcoming these limitations, we consider randomized trial designs in which the criteria for patient enrollment may be changed, in a preplanned manner, based on interim analyses. Since such designs allow data-dependent changes to the population enrolled, care must be taken to ensure strong control of the familywise Type I error rate. Our main contribution is a general method for constructing randomized trial designs that allow changes to the population enrolled based on interim data using a prespecified decision rule, for which the asymptotic, familywise Type I error rate is strongly controlled at a specified level α. As a demonstration of our method, we prove new, sharp results for a simple, two-stage enrichment design. We then compare this design to fixed designs, focusing on each design’s ability to determine the overall and subpopulation-specific treatment effects.
doi:10.1093/biomet/asr055
PMCID: PMC3413180  PMID: 23049131
Adaptive design; Enrichment design; Group sequential design; Optimization; Patient-oriented research; Randomized trial; Subpopulation
9.  Sample size formulae for two-stage randomized trials with survival outcomes 
Biometrika  2011;98(3):503-518.
Two-stage randomized trials are growing in importance in developing adaptive treatment strategies, i.e. treatment policies or dynamic treatment regimes. Usually, the first stage involves randomization to one of the several initial treatments. The second stage of treatment begins when an early nonresponse criterion or response criterion is met. In the second stage, nonresponding subjects are re-randomized among second-stage treatments. Sample size calculations for planning these two-stage randomized trials with failure time outcomes are challenging because the variances of common test statistics depend in a complex manner on the joint distribution of time to the early nonresponse criterion or response criterion and the primary failure time outcome. We produce simple, albeit conservative, sample size formulae by using upper bounds on the variances. The resulting formulae only require the working assumptions needed to size a standard single-stage randomized trial and, in common settings, are only mildly conservative. These sample size formulae are based on either a weighted Kaplan–Meier estimator of survival probabilities at a fixed time-point or a weighted version of the log-rank test.
doi:10.1093/biomet/asr019
PMCID: PMC3254237  PMID: 22363091
Dynamic treatment regime; Sample size calculation; Sequential multiple assignment randomized trial; Weighted Kaplan–Meier estimator; Weighted log-rank test
10.  Conditional Akaike information under generalized linear and proportional hazards mixed models 
Biometrika  2011;98(3):685-700.
We study model selection for clustered data, when the focus is on cluster specific inference. Such data are often modelled using random effects, and conditional Akaike information was proposed in Vaida & Blanchard (2005) and used to derive an information criterion under linear mixed models. Here we extend the approach to generalized linear and proportional hazards mixed models. Outside the normal linear mixed models, exact calculations are not available and we resort to asymptotic approximations. In the presence of nuisance parameters, a profile conditional Akaike information is proposed. Bootstrap methods are considered for their potential advantage in finite samples. Simulations show that the performance of the bootstrap and the analytic criteria are comparable, with bootstrap demonstrating some advantages for larger cluster sizes. The proposed criteria are applied to two cancer datasets to select models when the cluster-specific inference is of interest.
doi:10.1093/biomet/asr023
PMCID: PMC3384357  PMID: 22822261
Akaike information; Conditional likelihood; Effective degrees of freedom
11.  On protected estimation of an odds ratio model with missing binary exposure and confounders 
Biometrika  2011;98(3):749-754.
We describe an estimator of the parameter indexing a model for the conditional odds ratio between a binary exposure and a binary outcome given a high-dimensional vector of confounders, when the exposure and a subset of the confounders are missing, not necessarily simultaneously, in a subsample. We argue that a recently proposed estimator restricted to complete-cases confers more protection to model misspecification than existing ones in the sense that the set of data laws under which it is consistent strictly contains each set of data laws under which each of the previous estimators is consistent.
doi:10.1093/biomet/asr027
PMCID: PMC3384358  PMID: 22822262
Inverse probability weighted; Logistic regression; Missing at random; Model misspecification
12.  Bayesian isotonic density regression 
Biometrika  2011;98(3):537-551.
Density regression models allow the conditional distribution of the response given predictors to change flexibly over the predictor space. Such models are much more flexible than nonparametric mean regression models with nonparametric residual distributions, and are well supported in many applications. A rich variety of Bayesian methods have been proposed for density regression, but it is not clear whether such priors have full support so that any true data-generating model can be accurately approximated. This article develops a new class of density regression models that incorporate stochastic-ordering constraints which are natural when a response tends to increase or decrease monotonely with a predictor. Theory is developed showing large support. Methods are developed for hypothesis testing, with posterior computation relying on a simple Gibbs sampler. Frequentist properties are illustrated in a simulation study, and an epidemiology application is considered.
doi:10.1093/biomet/asr025
PMCID: PMC3384359  PMID: 22822259
Conditional density estimation; Dependent Dirichlet process; Hypothesis test; Isotonic regression; Nonparametric Bayes; Quantile regression; Stochastic ordering
13.  A class of mixtures of dependent tail-free processes 
Biometrika  2011;98(3):553-566.
We propose a class of dependent processes in which density shape is regressed on one or more predictors through conditional tail-free probabilities by using transformed Gaussian processes. A particular linear version of the process is developed in detail. The resulting process is flexible and easy to fit using standard algorithms for generalized linear models. The method is applied to growth curve analysis, evolving univariate random effects distributions in generalized linear mixed models, and median survival modelling with censored data and covariate-dependent errors.
doi:10.1093/biomet/asq082
PMCID: PMC3398659  PMID: 22822260
Bayesian nonparametrics; Median regression; Partial exchangeability; Polya tree; Related probability distribution
14.  Nonparametric inference for competing risks current status data with continuous, discrete or grouped observation times 
Biometrika  2011;98(2):325-340.
New methods and theory have recently been developed to nonparametrically estimate cumulative incidence functions for competing risks survival data subject to current status censoring. In particular, the limiting distribution of the nonparametric maximum likelihood estimator and a simplified naive estimator have been established under certain smoothness conditions. In this paper, we establish the large-sample behaviour of these estimators in two additional models, namely when the observation time distribution has discrete support and when the observation times are grouped. These asymptotic results are applied to the construction of confidence intervals in the three different models. The methods are illustrated on two datasets regarding the cumulative incidence of different types of menopause from a cross-sectional sample of women in the United States and subtype-specific HIV infection from a sero-prevalence study in injecting drug users in Thailand.
doi:10.1093/biomet/asq083
PMCID: PMC3372275  PMID: 22822257
Competing risk; Confidence interval; Current status data; Interval censoring; Nonparametric maximum likelihood estimator; Survival analysis
15.  Time-dependent cross ratio estimation for bivariate failure times 
Biometrika  2011;98(2):341-354.
In the analysis of bivariate correlated failure time data, it is important to measure the strength of association among the correlated failure times. One commonly used measure is the cross ratio. Motivated by Cox’s partial likelihood idea, we propose a novel parametric cross ratio estimator that is a flexible continuous function of both components of the bivariate survival times. We show that the proposed estimator is consistent and asymptotically normal. Its finite sample performance is examined using simulation studies, and it is applied to the Australian twin data.
doi:10.1093/biomet/asr005
PMCID: PMC3376771  PMID: 22822258
Correlated survival times; Empirical process theory; Local dependency measure; Pseudo-partial likelihood
16.  Sample size and power analysis for sparse signal recovery in genome-wide association studies 
Biometrika  2011;98(2):273-290.
Genome-wide association studies have successfully identified hundreds of novel genetic variants associated with many complex human diseases. However, there is a lack of rigorous work on evaluating the statistical power for identifying these variants. In this paper, we consider sparse signal identification in genome-wide association studies and present two analytical frameworks for detailed analysis of the statistical power for detecting and identifying the disease-associated variants. We present an explicit sample size formula for achieving a given false non-discovery rate while controlling the false discovery rate based on an optimal procedure. Sparse genetic variant recovery is also considered and a boundary condition is established in terms of sparsity and signal strength for almost exact recovery of both disease-associated variants and nondisease-associated variants. A data-adaptive procedure is proposed to achieve this bound. The analytical results are illustrated with a genome-wide association study of neuroblastoma.
doi:10.1093/biomet/asr003
PMCID: PMC3419390  PMID: 23049128
False discovery rate; False non-discovery rate; High-dimensional data; Multiple testing; Oracle exact recovery
17.  Sparse Bayesian infinite factor models 
Biometrika  2011;98(2):291-306.
We focus on sparse modelling of high-dimensional covariance matrices using Bayesian latent factor models. We propose a multiplicative gamma process shrinkage prior on the factor loadings which allows introduction of infinitely many factors, with the loadings increasingly shrunk towards zero as the column index increases. We use our prior on a parameter-expanded loading matrix to avoid the order dependence typical in factor analysis models and develop an efficient Gibbs sampler that scales well as data dimensionality increases. The gain in efficiency is achieved by the joint conjugacy property of the proposed prior, which allows block updating of the loadings matrix. We propose an adaptive Gibbs sampler for automatically truncating the infinite loading matrix through selection of the number of important factors. Theoretical results are provided on the support of the prior and truncation approximation bounds. A fast algorithm is proposed to produce approximate Bayes estimates. Latent factor regression methods are developed for prediction and variable selection in applications with high-dimensional correlated predictors. Operating characteristics are assessed through simulation studies, and the approach is applied to predict survival times from gene expression data.
doi:10.1093/biomet/asr013
PMCID: PMC3419391  PMID: 23049129
Adaptive Gibbs sampling; Factor analysis; High-dimensional data; Multiplicative gamma process; Parameter expansion; Regularization; Shrinkage
18.  Estimation of covariate effects in generalized linear mixed models with informative cluster sizes 
Biometrika  2011;98(1):147-162.
Summary
In standard regression analyses of clustered data, one typically assumes that the expected value of the response is independent of cluster size. However, this is often false. For example, in studies of surgical interventions, investigators have frequently found surgery volume and outcomes to be related to the skill level of the surgeons. This paper examines the effect of ignoring response-dependent, informative, cluster sizes on standard analytical methods such as mixed-effects models and conditional likelihood methods using analytic calculations, simulation studies and an example from a study of periodontal disease. We consider the case in which cluster sizes and responses share random effects which we assume to be independent of the covariates. Our focus is on maximum likelihood methods that ignore informative cluster sizes, and we show that they exhibit little bias in estimating covariate effects that are uncorrelated with the random effects associated with cluster sizes. However, estimation of covariate effects that are associated with the random effects can be biased. In particular, for models with random intercepts only, ignoring informative cluster sizes can yield biased estimators of the intercept but little bias in estimation of all covariate effects.
doi:10.1093/biomet/asq066
PMCID: PMC3412602  PMID: 23049125
Conditional likelihood; Generalized linear mixed model; Misspecified mixing distribution; Random slope
19.  The effect of correlation in false discovery rate estimation 
Biometrika  2011;98(1):199-214.
Summary
The objective of this paper is to quantify the effect of correlation in false discovery rate analysis. Specifically, we derive approximations for the mean, variance, distribution and quantiles of the standard false discovery rate estimator for arbitrarily correlated data. This is achieved using a negative binomial model for the number of false discoveries, where the parameters are found empirically from the data. We show that correlation may increase the bias and variance of the estimator substantially with respect to the independent case, and that in some cases, such as an exchangeable correlation structure, the estimator fails to be consistent as the number of tests becomes large.
doi:10.1093/biomet/asq075
PMCID: PMC3412603  PMID: 23049127
High-dimensional data; Microarray data; Multiple testing; Negative binomial
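As a reference point for entry 19, a common form of the false discovery rate estimator at a fixed p-value cutoff divides the expected number of null rejections by the observed number of rejections. The sketch below assumes every hypothesis is treated as null (null proportion set to 1) and uses a hypothetical function name and made-up p-values; whether this matches the paper's "standard" estimator in every detail is an assumption.

```python
def estimated_fdr(pvalues, cutoff):
    """Plug-in FDR estimate at a fixed cutoff, with null proportion set to 1."""
    m = len(pvalues)
    rejections = sum(1 for p in pvalues if p <= cutoff)
    # Expected false rejections under uniform nulls is m * cutoff; guard
    # against division by zero when nothing is rejected.
    return min(1.0, m * cutoff / max(rejections, 1))


pvals = [0.001, 0.002, 0.01, 0.2, 0.5, 0.7, 0.9, 0.95, 0.3, 0.6]
fdr_at_cutoff = estimated_fdr(pvals, cutoff=0.01)
```

The estimator depends on the data only through the rejection count, which is why correlation among the tests, the subject of the summary, can inflate its variance.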
20.  Joint estimation of multiple graphical models 
Biometrika  2011;98(1):1-15.
Summary
Gaussian graphical models explore dependence relationships between random variables, through the estimation of the corresponding inverse covariance matrices. In this paper we develop an estimator for such models appropriate for data from several graphical models that share the same variables and some of the dependence structure. In this setting, estimating a single graphical model would mask the underlying heterogeneity, while estimating separate models for each category does not take advantage of the common structure. We propose a method that jointly estimates the graphical models corresponding to the different categories present in the data, aiming to preserve the common structure, while allowing for differences between the categories. This is achieved through a hierarchical penalty that targets the removal of common zeros in the inverse covariance matrices across categories. We establish the asymptotic consistency and sparsity of the proposed estimator in the high-dimensional case, and illustrate its performance on a number of simulated networks. An application to learning semantic connections between terms from webpages collected from computer science departments is included.
doi:10.1093/biomet/asq060
PMCID: PMC3412604  PMID: 23049124
Covariance matrix; Graphical model; Hierarchical penalty; High-dimensional data; Network
21.  Nonparametric estimation for length-biased and right-censored data 
Biometrika  2011;98(1):177-186.
Summary
This paper considers survival data arising from length-biased sampling, where the survival times are left truncated by uniformly distributed random truncation times. We propose a nonparametric estimator that incorporates the information about the length-biased sampling scheme. The new estimator retains the simplicity of the truncation product-limit estimator with a closed-form expression, and has a small efficiency loss compared with the nonparametric maximum likelihood estimator, which requires an iterative algorithm. Moreover, the asymptotic variance of the proposed estimator has a closed form, and a variance estimator is easily obtained by plug-in methods. Numerical simulation studies with practical sample sizes are conducted to compare the performance of the proposed method with its competitors. A data analysis of the Canadian Study of Health and Aging is conducted to illustrate the methods and theory.
doi:10.1093/biomet/asq069
PMCID: PMC3412605  PMID: 23049126
Backward and forward recurrence time; Cross-sectional sampling; Partial likelihood; Random truncation; Renewal process
22.  A note on overadjustment in inverse probability weighted estimation 
Biometrika  2010;97(4):997-1001.
Summary
Standardized means, commonly used in observational studies in epidemiology to adjust for potential confounders, are equal to inverse probability weighted means with inverse weights equal to the empirical propensity scores. More refined standardization corresponds with empirical propensity scores computed under more flexible models. Unnecessary standardization induces efficiency loss. However, according to the theory of inverse probability weighted estimation, propensity scores estimated under more flexible models induce improvement in the precision of inverse probability weighted means. This apparent contradiction is clarified by explicitly stating the assumptions under which the improvement in precision is attained.
doi:10.1093/biomet/asq049
PMCID: PMC3371719  PMID: 22822256
Causal inference; Propensity score; Standardized mean
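The identity stated at the start of entry 22 can be checked numerically. The sketch below assumes a single discrete confounder, a binary treatment indicator, and at least one treated subject in every stratum. The record layout, function names and data are hypothetical.

```python
from collections import defaultdict


def standardized_mean(records):
    """Stratum-weighted mean outcome among the treated.

    records is a list of (stratum, treated, outcome) with treated in {0, 1}.
    """
    n = len(records)
    size = defaultdict(int)
    treated_count = defaultdict(int)
    treated_sum = defaultdict(float)
    for stratum, treated, outcome in records:
        size[stratum] += 1
        if treated == 1:
            treated_count[stratum] += 1
            treated_sum[stratum] += outcome
    return sum(
        (size[s] / n) * (treated_sum[s] / treated_count[s]) for s in size
    )


def ipw_mean(records):
    """Inverse probability weighted mean using empirical propensity scores."""
    n = len(records)
    size = defaultdict(int)
    treated_count = defaultdict(int)
    for stratum, treated, _ in records:
        size[stratum] += 1
        treated_count[stratum] += treated
    # Empirical propensity score: fraction treated within each stratum.
    propensity = {s: treated_count[s] / size[s] for s in size}
    return sum(a * y / propensity[s] for s, a, y in records) / n


records = [
    ("young", 1, 3.0), ("young", 0, 1.0), ("young", 1, 5.0),
    ("old", 1, 2.0), ("old", 0, 4.0), ("old", 0, 6.0), ("old", 0, 1.0),
]
sm = standardized_mean(records)
ipw = ipw_mean(records)
```

Both functions return 20/7 on this toy dataset, up to floating point rounding, which is the equality the summary refers to.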
23.  Nonparametric Bayesian density estimation on manifolds with applications to planar shapes 
Biometrika  2010;97(4):851-865.
Summary
Statistical analysis on landmark-based shape spaces has diverse applications in morphometrics, medical diagnostics, machine vision and other areas. These shape spaces are non-Euclidean quotient manifolds. To conduct nonparametric inferences, one may define notions of centre and spread on this manifold and work with their estimates. However, it is useful to consider full likelihood-based methods, which allow nonparametric estimation of the probability density. This article proposes a broad class of mixture models constructed using suitable kernels on a general compact metric space and then on the planar shape space in particular. Following a Bayesian approach with a nonparametric prior on the mixing distribution, conditions are obtained under which the Kullback–Leibler property holds, implying large support and weak posterior consistency. Gibbs sampling methods are developed for posterior computation, and the methods are applied to problems in density estimation and classification with shape-based predictors. Simulation studies show improved estimation performance relative to existing approaches.
doi:10.1093/biomet/asq044
PMCID: PMC3371720  PMID: 22822255
Dirichlet process mixture; Discriminant analysis; Kullback–Leibler property; Metric space; Nonparametric Bayes; Planar shape space; Posterior consistency; Riemannian manifold
24.  Noncrossing quantile regression curve estimation 
Biometrika  2010;97(4):825-838.
Summary
Since quantile regression curves are estimated individually, the quantile curves can cross, leading to an invalid distribution for the response. A simple constrained version of quantile regression is proposed to avoid the crossing problem for both linear and nonparametric quantile curves. A simulation study and a reanalysis of tropical cyclone intensity data show the usefulness of the procedure. Asymptotic properties of the estimator are equivalent to the typical approach under standard conditions, and the proposed estimator reduces to the classical one if there is no crossing. The constrained estimator shows significant improvement in performance, which it gains by adding smoothing and stability across the quantile levels.
doi:10.1093/biomet/asq048
PMCID: PMC3371721  PMID: 22822254
Crossing quantile curve; Heteroscedastic error; Quantile regression; Robustness; Smoothing spline; Tropical cyclone
25.  Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs 
Biometrika  2010;97(3):519-538.
Summary
Directed acyclic graphs are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical and biological systems where directed edges between nodes represent the influence of components of the system on each other. Estimation of directed graphs from observational data is computationally NP-hard. In addition, directed graphs with the same structure may be indistinguishable based on observations alone. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose an efficient penalized likelihood method for estimation of the adjacency matrix of directed acyclic graphs, when variables inherit a natural ordering. We study variable selection consistency of lasso and adaptive lasso penalties in high-dimensional sparse settings, and propose an error-based choice for selecting the tuning parameter. We show that although the lasso is only variable selection consistent under stringent conditions, the adaptive lasso can consistently estimate the true graph under the usual regularity assumptions.
doi:10.1093/biomet/asq038
PMCID: PMC3254233  PMID: 22434937
Adaptive lasso; Directed acyclic graph; High-dimensional sparse graphs; Lasso; Penalized likelihood estimation; Small n large p asymptotics
