Results 1-25 (240)

1.  Genomic outlier profile analysis: mixture models, null hypotheses, and nonparametric estimation 
In most analyses of large-scale genomic data sets, differential expression is assessed by testing for differences in the means of the distributions between 2 groups. Tomlins and others (2005) recently described a different pattern of differential expression, in which a fraction of samples in one group shows overexpression relative to samples in the other group. In this work, we describe a general mixture model framework for the assessment of this type of expression, called outlier profile analysis. We start by considering the single-gene situation and establishing results on identifiability. We propose 2 nonparametric estimation procedures that have natural links to familiar multiple testing procedures. We then develop multivariate extensions of this methodology to handle genome-wide measurements. The proposed methodologies are compared using simulation studies as well as data from a prostate cancer gene expression study.
PMCID: PMC2605210  PMID: 18539648
Bonferroni correction; DNA microarray; False discovery rate; Goodness of fit; Multiple comparisons; Uniform distribution
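As a rough illustration of the outlier pattern this abstract targets, here is a minimal Python sketch of a COPA-style score in the spirit of Tomlins and others (2005); it is not the paper's mixture-model estimator, and the data, cutoff quantile, and function name are all hypothetical.

```python
import numpy as np

def copa_score(expr, is_case, q=0.90):
    """COPA-style outlier score for one gene: median-center and MAD-scale
    all samples, then take an upper quantile of the case group. A large
    score suggests a subset of cases is overexpressed."""
    expr = np.asarray(expr, dtype=float)
    med = np.median(expr)
    mad = np.median(np.abs(expr - med)) * 1.4826  # consistent with SD under normality
    z = (expr - med) / mad
    return np.quantile(z[is_case], q)

rng = np.random.default_rng(0)
n = 100
is_case = np.arange(n) < 50
x = rng.normal(size=n)
x[:10] += 3.0  # only 10 of the 50 cases are overexpressed: an outlier profile
print(copa_score(x, is_case))                    # clearly elevated
print(copa_score(rng.normal(size=n), is_case))   # near the null value (~1.3)
```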
3.  Role of the tumor suppressor IQGAP2 in metabolic homeostasis: Possible link between diabetes and cancer 
Deficiency of IQGAP2, a scaffolding protein expressed primarily in the liver, leads to rearrangement of hepatic protein compartmentalization and altered regulation of enzyme functions, predisposing to the development of hepatocellular carcinoma and diabetes. Employing a systems approach combining proteomics, metabolomics, and flux characterization, we examined the effects of the proteomic changes caused by IQGAP2 deficiency on cellular metabolism and the overall metabolic phenotype. Iqgap2−/− mice demonstrated metabolic inflexibility, fasting hyperglycemia, and obesity. These phenotypic characteristics were associated with aberrant hepatic regulation of glycolysis/gluconeogenesis, glycogenolysis, lipid homeostasis, and futile cycling, corroborated by corresponding proteomic changes in the cytosolic and mitochondrial compartments. IQGAP2 deficiency also led to a truncated TCA cycle, increased anaplerosis, an increased supply of acetyl-CoA for de novo lipogenesis, and increased mitochondrial methyl-donor metabolism necessary for nucleotide synthesis. Our results suggest that the changes in metabolic networks under IQGAP2 deficiency create a hepatic environment with a ‘pre-diabetic’ phenotype and a predisposition to non-alcoholic fatty liver disease (NAFLD), which has been linked to the development of hepatocellular carcinoma.
PMCID: PMC4169985  PMID: 25254002
4.  Theory of Mind Predicts Emotion Knowledge Development in Head Start Children 
Early Education and Development  2014;25(7):933-948.
Research Findings
Emotion knowledge (EK) enables children to identify emotions in themselves and others, and its development facilitates emotion recognition in complex social situations. Social-cognitive processes, such as theory of mind (ToM), may contribute to developing EK by helping children realize the inherent variability of emotion expression across individuals and situations. The present study explored how ToM in preschool, particularly false belief understanding, predicts children’s developing EK in kindergarten. Participants were 60 3- to 5-year-old Head Start children. ToM and EK measures were obtained from standardized child tasks. ToM scores were positively related to performance on an EK task in kindergarten after controlling for preschool levels of EK and verbal ability. Exploratory analyses provided preliminary evidence that verbal ability affects EK indirectly through ToM.
Practice or Policy
Early intervention programs may benefit from including lessons on ToM to help promote socio-emotional learning, specifically EK. This approach may be most fruitful when the targeted population is at risk.
PMCID: PMC4214863  PMID: 25364212
5.  Determining Urea Levels in Exhaled Breath Condensate with Minimal Preparation Steps and Classic LC–MS 
Journal of Chromatographic Science  2013;52(9):1026-1032.
Exhaled breath condensate (EBC) provides a relatively easy, non-invasive method for measuring biomarkers of inflammation and oxidative stress in the airways. However, the levels of these biomarkers in EBC are influenced not only by their levels in lung lining fluid but also by the volume of water vapor that condenses during EBC collection. For this reason, the use of a dilution biomarker has been recommended. Urea has been proposed and utilized as a promising dilution biomarker because of its even distribution throughout the body and relatively low volatility. Current methods for analyzing urea in EBC either are not sensitive enough, necessitating large volumes of EBC, or are labor intensive, requiring a derivatization step or other pretreatment. We report here a straightforward and reliable LC–MS approach that requires neither derivatization nor a large sample volume (∼36 µL). An Acclaim mixed-mode hydrophilic interaction chromatography column was selected because it produces good peak symmetry and efficiently separates urea from other polar and nonpolar compounds. To achieve a high recovery rate, a slow, incomplete evaporation was followed by a solvent-phase exchange. Among EBC samples collected from 28 children, urea levels were highly variable, with a relative standard deviation of 234%, suggesting high variability in the dilution of the lung lining fluid component of EBC. The limit of detection was 0.036 µg/mL.
PMCID: PMC4215077  PMID: 24190872
6.  Resident Duty Hours: A Survey of Internal Medicine Program Directors 
Journal of General Internal Medicine  2014;29(10):1349-1354.
In 2011, the Accreditation Council for Graduate Medical Education (ACGME) implemented new Common Program Requirements to regulate duty hours of resident physicians, with three goals: improving patient safety, the quality of resident education, and the quality of life of trainees. We sought to assess Internal Medicine program director (IMPD) perceptions of the 2011 Common Program Requirements in July 2012, one year after implementation of the new standards.
A cross-sectional study of all IMPDs at ACGME-accredited programs in the United States (N = 381) was performed using a 32-question, self-administered survey. Contact information was identified for 323 IMPDs. Three individualized emails were sent to each director over a 6-week period, requesting participation in the survey. Outcomes measured included approval of duty hours regulations, as well as perceptions of changes in graduate medical education and patient care resulting from the revised ACGME standards.
A total of 237 surveys were returned (73 % response rate). More than half of the IMPDs (52 %) reported “overall” approval of the 2011 duty hour regulations, with greater than 70 % approval of all individual regulations except senior resident daily duty periods (49 % approval) and 16-hour intern shifts (17 % approval). Although a majority (55 %) feel resident quality of life has improved, most IMPDs (60 %) believe that resident education is worse. A minority report that the quality (8 %) or safety (11 %) of patient care has improved.
One year after implementation of new ACGME duty hour requirements, IMPDs report overall approval of the standards, but strong disapproval of 16-hour shift limits for interns. Few program directors perceive that the duty hour restrictions have resulted in better care for patients or education of residents. Although resident quality of life seems improved, most IMPDs report that their own workload has increased. Based on these results, the intended benefits of duty hour regulations may not yet have been realized.
Electronic supplementary material
The online version of this article (doi:10.1007/s11606-014-2912-z) contains supplementary material, which is available to authorized users.
PMCID: PMC4175662  PMID: 24913004
graduate medical education; resident duty hours; compliance; patient safety
8.  Parametric modeling of whole-genome sequencing data for CNV identification 
Biostatistics (Oxford, England)  2014;15(3):427-441.
Copy number variants (CNVs) constitute an important class of genetic variants in the human genome and have been shown to be associated with complex diseases. Whole-genome sequencing provides an unbiased way of identifying all the CNVs that an individual carries. In this paper, we consider parametric modeling of the read depth (RD) data from whole-genome sequencing with the aim of identifying CNVs, including both Poisson and negative-binomial modeling of such count data. We propose a unified approach that uses a mean-matching variance stabilizing transformation to turn the relatively complicated problem of sparse segment identification for count data into a sparse segment identification problem for a sequence of Gaussian data. We apply the optimal sparse segment identification procedure to the transformed data in order to identify the CNV segments. This provides a computationally efficient approach for RD-based CNV identification. Simulation results show that this approach often results in a small number of false identifications of CNVs and performs as well as or better than other RD-based approaches in identifying true CNVs. We demonstrate the methods using the trio data from the 1000 Genomes Project.
PMCID: PMC4059462  PMID: 24478395
Natural exponential family; Sparse segment identification; Variance stabilization
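The paper's mean-matching variance stabilizing transformation is not reproduced here, but the classical Anscombe transform illustrates the idea for the Poisson case: after transformation, counts with very different means share approximately unit variance, so a Gaussian sparse-segment scan can be applied. A minimal, purely illustrative sketch:

```python
import numpy as np

def anscombe(x):
    """Anscombe transform: for Poisson counts with moderate mean,
    2*sqrt(x + 3/8) is approximately Gaussian with variance 1."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)

rng = np.random.default_rng(1)
for lam in (5, 20, 80):
    y = anscombe(rng.poisson(lam, 100_000))
    print(lam, round(y.var(), 3))  # variance close to 1 at every mean level
```

After such a transformation, CNV segments appear as sparse mean shifts in approximately Gaussian noise.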
9.  The estimation of direct and indirect causal effects in the presence of misclassified binary mediator 
Biostatistics (Oxford, England)  2014;15(3):498-512.
Mediation analysis serves to quantify the effect of an exposure on an outcome mediated by a certain intermediate and to quantify the extent to which the effect is direct. When the mediator is misclassified, the validity of mediation analysis can be severely undermined. The contribution of the present work is to study the effects of non-differential misclassification of a binary mediator on the estimation of direct and indirect causal effects, when the outcome is either continuous or binary and an exposure–mediator interaction may be present, and to develop corrections for the misclassification. A hybrid likelihood-based and predictive value weighting method for misclassification correction, coupled with sensitivity analysis, is proposed, and a second approach using the expectation–maximization algorithm is developed. The correction strategy requires knowledge of a plausible range of sensitivity and specificity parameters. The approaches are applied to a perinatal epidemiological study of the determinants of pre-term birth.
PMCID: PMC4059465  PMID: 24671909
EM algorithm; Iteratively re-weighted least squares; Mediation analysis; Misclassification; Predictive value weighting; Pre-eclampsia; Pre-term birth; Sensitivity analysis
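The paper's hybrid estimator is not reproduced here, but the basic building block, correcting a non-differentially misclassified binary proportion given assumed sensitivity and specificity, can be sketched as follows (function name and numbers hypothetical):

```python
def corrected_prevalence(p_obs, sens, spec):
    """Matrix-method correction for a non-differentially misclassified
    binary variable: inverts P(observed = 1) = sens*p + (1 - spec)*(1 - p)."""
    return (p_obs + spec - 1.0) / (sens + spec - 1.0)

# Observing 30% mediator-positive with sensitivity 0.90 and specificity 0.85
print(corrected_prevalence(0.30, 0.90, 0.85))  # true prevalence 0.20
```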
10.  Reproducibility of 3D chromatin configuration reconstructions 
Biostatistics (Oxford, England)  2014;15(3):442-456.
It is widely recognized that the three-dimensional (3D) architecture of eukaryotic chromatin plays an important role in processes such as gene regulation and cancer-driving gene fusions. Observing or inferring this 3D structure at even modest resolutions had been problematic, since genomes are highly condensed and traditional assays are coarse. However, recently devised high-throughput molecular techniques have changed this situation. Notably, the development of a suite of chromatin conformation capture (CCC) assays has enabled elicitation of contacts (spatially close chromosomal loci), which have provided insights into chromatin architecture. Most analyses of CCC data have focused on the contact level, with less effort directed toward obtaining 3D reconstructions and evaluating their accuracy and reproducibility. While questions of accuracy must be addressed experimentally, questions of reproducibility can be addressed statistically, which is the purpose of this paper. We use a constrained optimization technique to reconstruct chromatin configurations for a number of closely related yeast datasets and assess reproducibility using four metrics that measure the distance between 3D configurations. The first of these, Procrustes fitting, measures configuration closeness after reflection, rotation, translation, and scaling-based alignment of the structures. The others base comparisons on the within-configuration inter-point distance matrix. Inferential results for these metrics rely on suitable permutation approaches. Results indicate that the distance matrix-based approaches are preferable to Procrustes analysis, not because of the metrics per se but because permutation schemes can be customized to handle within-chromosome contiguity. It has recently been emphasized that constrained optimization approaches to 3D architecture reconstruction are prone to being trapped in local minima. Our methods of reproducibility assessment provide a means for comparing 3D reconstruction solutions, allowing us to discern between local and global optima by contrasting solutions under perturbed inputs.
PMCID: PMC4059464  PMID: 24519450
Chromatin conformation; Distance matrix; Genome architecture; Procrustes analysis
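Of the four metrics, Procrustes fitting is the most self-contained to demonstrate; SciPy ships an implementation. A minimal sketch with simulated configurations (not the paper's yeast reconstructions):

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))              # one 3D reconstruction (200 loci)
# A rotated, scaled, shifted, lightly perturbed copy of the same configuration
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Y = 2.5 * X @ R + 1.0 + rng.normal(scale=0.01, size=X.shape)

_, _, disparity = procrustes(X, Y)
print(disparity)  # near 0: configurations agree up to a similarity transform
```

The distance-matrix metrics the authors prefer instead compare the within-configuration inter-point distance matrices, which is what makes chromosome-aware permutation schemes possible.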
11.  Extension of a Cox proportional hazards cure model when cure information is partially known 
Biostatistics (Oxford, England)  2014;15(3):540-554.
When there is evidence of long-term survivors, cure models are often used to model the survival curve. A cure model is a mixture model consisting of a cured fraction and an uncured fraction. Traditional cure models assume that the cured or uncured status of censored subjects cannot be distinguished. In many settings, however, diagnostic procedures may provide partial information about this status, with a certain sensitivity and specificity; the traditional cure model does not take advantage of this additional information. Motivated by a clinical study on bone injury in pediatric patients, we propose a novel extension of the traditional Cox proportional hazards (PH) cure model that incorporates the additional information about cured status. The extension applies when the latency part of the cure model is specified by the Cox PH model. Extensive simulations demonstrated that the proposed extension provides more efficient and less biased estimates, with efficiency gains and bias reductions increasing with the sensitivity and specificity of the diagnostic procedure. When the extended Cox PH cure model was applied to the motivating example, it yielded a substantial improvement in estimation.
PMCID: PMC4059463  PMID: 24511081
Cure model; Expectation-maximization (EM) algorithm; Proportional hazards; Relative efficiency; Sensitivity and specificity
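The defining identity of the mixture cure model, S(t) = π + (1 − π)S_u(t), is easy to state in code; this sketch shows only that identity, not the proposed extension (all parameters hypothetical):

```python
import numpy as np

def cure_survival(t, pi_cured, surv_uncured):
    """Mixture cure model: overall survival is the cured fraction pi plus
    the uncured fraction times the latency survival S_u(t)."""
    return pi_cured + (1.0 - pi_cured) * surv_uncured(t)

t = np.linspace(0.0, 10.0, 6)
s_u = lambda u: np.exp(-0.5 * u)       # exponential latency with rate 0.5
print(cure_survival(t, 0.3, s_u))      # plateaus at the cure fraction 0.3
```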
12.  Regression analysis of mixed recurrent-event and panel-count data 
Biostatistics (Oxford, England)  2014;15(3):555-568.
In event history studies concerning recurrent events, two types of data have been extensively discussed. One is recurrent-event data (Cook and Lawless, 2007. The Analysis of Recurrent Event Data. New York: Springer), and the other is panel-count data (Zhao and others, 2010. Nonparametric inference based on panel-count data. Test 20, 1–42). In the former case, all study subjects are monitored continuously; thus, complete information is available for the underlying recurrent-event processes of interest. In the latter case, study subjects are monitored periodically; thus, only incomplete information is available for the processes of interest. In reality, however, a third type of data could occur in which some study subjects are monitored continuously, but others are monitored periodically. When this occurs, we have mixed recurrent-event and panel-count data. This paper discusses regression analysis of such mixed data and presents two estimation procedures for the problem. One is a maximum likelihood estimation procedure, and the other is an estimating equation procedure. The asymptotic properties of both resulting estimators of regression parameters are established. Also, the methods are applied to a set of mixed recurrent-event and panel-count data that arose from a Childhood Cancer Survivor Study and motivated this investigation.
PMCID: PMC4059466  PMID: 24648408
Estimating equation-based approach; Maximum likelihood approach; Regression analysis
13.  Semiparametric regression analysis for time-to-event marked endpoints in cancer studies 
Biostatistics (Oxford, England)  2013;15(3):513-525.
In cancer studies, the disease natural history process is often observed only at a single, random point of diagnosis (a survival time), leading to a current status observation (Sun (2006). The statistical analysis of interval-censored failure time data. Berlin: Springer.) that carries a surrogate (a mark) (Jacobsen (2006). Point process theory and applications: marked point and piecewise deterministic processes. Basel: Birkhauser.) attached to the observed survival time. Examples include time to recurrence and stage (local vs. metastatic). We study a simple model that provides insights into the relationship between the observed marked endpoint and the latent disease natural history leading to it. A semiparametric regression model is developed to assess the covariate effects on the observed marked endpoint explained by a latent disease process. The proposed semiparametric regression model can be represented as a transformation model in terms of mark-specific hazards, induced by a process-based mixed effect. Large-sample properties of the proposed estimators are established. The methodology is illustrated by Monte Carlo simulation studies and an application to a randomized clinical trial of adjuvant therapy for breast cancer.
PMCID: PMC4102917  PMID: 24379192
Disease natural history; Marked endpoints; Semiparametric regression
14.  Extending distributed lag models to higher degrees 
Biostatistics (Oxford, England)  2013;15(2):398-412.
Distributed lag (DL) models relate lagged covariates to a response and are a popular statistical model used in a wide variety of disciplines to analyze exposure–response data. However, classical DL models do not account for possible interactions between lagged predictors. In the presence of interactions between lagged covariates, the total effect of a change on the response is not merely a sum of lagged effects as is typically assumed. This article proposes a new class of models, called high-degree DL models, that extend basic DL models to incorporate hypothesized interactions between lagged predictors. The modeling strategy utilizes Gaussian processes to counterbalance predictor collinearity and as a dimension reduction tool. To choose the degree and maximum lags used within the models, a computationally manageable model comparison method is proposed based on maximum a posteriori estimators. The models and methods are illustrated via simulation and application to investigating the effect of heat exposure on mortality in Los Angeles and New York.
PMCID: PMC3944968  PMID: 23990524
Dimension reduction; Gaussian process; Heat exposure; Lagged interaction; NMMAPS dataset
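A basic (degree-one) distributed lag model, the starting point the abstract extends, regresses the response on lagged copies of the exposure. A minimal sketch with simulated data (coefficients and lag length hypothetical):

```python
import numpy as np

def lag_matrix(x, max_lag):
    """Design matrix of lagged copies of x: column l holds x[t - l]."""
    T = len(x)
    return np.column_stack([x[max_lag - l : T - l] for l in range(max_lag + 1)])

rng = np.random.default_rng(3)
T, L = 500, 3
x = rng.normal(size=T)
beta = np.array([0.8, 0.4, 0.2, 0.1])    # lagged effects decaying with lag
X = lag_matrix(x, L)
y = X @ beta + rng.normal(scale=0.5, size=T - L)

coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(T - L), X]), y, rcond=None)
print(coef[1:])  # recovers [0.8, 0.4, 0.2, 0.1] approximately
```

A high-degree DL model in the abstract's sense adds products of lagged columns (lagged interactions), which is what motivates the Gaussian-process dimension reduction.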
15.  High-dimensional, massive sample-size Cox proportional hazards regression for survival analysis 
Biostatistics (Oxford, England)  2013;15(2):207-221.
Survival analysis endures as an old, yet active research field with applications that spread across many domains. Continuing improvements in data acquisition techniques pose constant challenges in applying existing survival analysis methods to these emerging data sets. In this paper, we present tools for fitting regularized Cox survival analysis models on high-dimensional, massive sample-size (HDMSS) data using a variant of the cyclic coordinate descent optimization technique tailored for the sparsity that HDMSS data often present. Experiments on two real data examples demonstrate that efficient analyses of HDMSS data using these tools result in improved predictive performance and calibration.
PMCID: PMC3944969  PMID: 24096388
Big data; Cox proportional hazards; Regularized regression; Survival analysis
16.  Bayesian semiparametric estimation of covariate-dependent ROC curves 
Biostatistics (Oxford, England)  2013;15(2):353-369.
Receiver operating characteristic (ROC) curves are widely used to measure the discriminating power of medical tests and other classification procedures. In many practical applications, the performance of these procedures can depend on covariates such as age, naturally leading to a collection of curves associated with different covariate levels. This paper develops a Bayesian heteroscedastic semiparametric regression model and applies it to the estimation of covariate-dependent ROC curves. More specifically, our approach uses Gaussian process priors to model the conditional mean and conditional variance of the biomarker of interest for each of the populations under study. The model is illustrated through an application to the evaluation of prostate-specific antigen for the diagnosis of prostate cancer, which contrasts the performance of our model with that of alternative models.
PMCID: PMC3944970  PMID: 24174579
Bayesian inference; Gaussian process; Non-parametric regression; Receiver operating characteristic curve
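Stripping away the covariate dependence, the underlying ROC estimate can be sketched empirically; the paper's contribution is to let the mean and variance entering such curves vary with covariates via Gaussian process priors. Illustrative data only:

```python
import numpy as np

def empirical_roc(y0, y1, fpr):
    """Empirical ROC: sensitivity at each false-positive rate,
    ROC(t) = 1 - F1(F0^{-1}(1 - t)), estimated from two samples."""
    cutoffs = np.quantile(y0, 1.0 - fpr)   # thresholds set by the controls
    return np.array([(y1 > c).mean() for c in cutoffs])

rng = np.random.default_rng(4)
y0 = rng.normal(0.0, 1.0, 1000)           # biomarker among non-diseased
y1 = rng.normal(1.5, 1.0, 1000)           # biomarker among diseased
print(empirical_roc(y0, y1, np.array([0.05, 0.10, 0.20])))
print((y1[:, None] > y0[None, :]).mean()) # AUC; binormal truth is ~0.86
```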
17.  Fisher's method of combining dependent statistics using generalizations of the gamma distribution with applications to genetic pleiotropic associations 
Biostatistics (Oxford, England)  2013;15(2):284-295.
A classical approach to combining independent test statistics is Fisher's combination of $p$-values, which follows the $\chi^2$ distribution. When the test statistics are dependent, the gamma distribution (GD) is commonly used for Fisher's combination test (FCT). We propose to use two generalizations of the GD: the generalized GD and the exponentiated GD. We study the consequences of misusing the GD for the FCT to combine dependent statistics when one of the two proposed distributions is true. Our results show that both generalizations control type I error rates better than the GD, which tends to have inflated type I error rates at the more extreme tails. In practice, common model selection criteria (e.g. the Akaike or Bayesian information criterion) can help select the better distribution for the FCT. A simple strategy for applying the two generalizations of the GD in genome-wide association studies is discussed. Applications of the results to genetic pleiotropic associations are described, where multiple traits are tested for association with a single marker.
PMCID: PMC3944971  PMID: 24174580
Dependent tests; Fisher's combination; Gamma distributions; Genetic pleiotropic associations; Genome-wide association studies; Type I error
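Fisher's combination statistic and the gamma approximation under dependence are simple to sketch; the moments below are made up for illustration (in practice they would come from the estimated covariance of the statistics), and the paper's generalized and exponentiated gamma refinements are not shown:

```python
import numpy as np
from scipy import stats

pvals = np.array([0.01, 0.20, 0.03, 0.50])
T = -2.0 * np.sum(np.log(pvals))           # Fisher's combination statistic

# Independent statistics: T ~ chi-squared with 2k degrees of freedom.
print(stats.chi2.sf(T, df=2 * len(pvals)))

# Dependent statistics: approximate T by a moment-matched gamma.
mean_T, var_T = 8.0, 28.0                  # hypothetical; variance inflated beyond 4k = 16
shape, scale = mean_T**2 / var_T, var_T / mean_T
print(stats.gamma.sf(T, a=shape, scale=scale))
```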
18.  Bayesian inference for longitudinal data with non-parametric treatment effects 
Biostatistics (Oxford, England)  2013;15(2):341-352.
We consider inference for longitudinal data based on mixed-effects models with a non-parametric Bayesian prior on the treatment effect. The proposed non-parametric Bayesian prior is a random partition model with a regression on patient-specific covariates. The main feature and motivation for the proposed model is the use of covariates with a mix of different data formats and possibly high-order interactions in the regression. The regression is not explicitly parameterized. It is implied by the random clustering of subjects. The motivating application is a study of the effect of an anticancer drug on a patient's blood pressure. The study involves blood pressure measurements taken periodically over several 24-h periods for 54 patients. The 24-h periods for each patient include a pretreatment period and several occasions after the start of therapy.
PMCID: PMC3944972  PMID: 24285773
Clustering; Mixed-effects model; Non-parametric Bayesian model; Random partition; Repeated measurement data
19.  Predicting the restricted mean event time with the subject's baseline covariates in survival analysis 
Biostatistics (Oxford, England)  2013;15(2):222-233.
For designing, monitoring, and analyzing a longitudinal study with an event time as the outcome variable, the restricted mean event time (RMET) is an easily interpretable, clinically meaningful summary of the survival function in the presence of censoring. The RMET is the average of all potential event times measured up to a time point $\tau$ and can be estimated consistently by the area under the Kaplan–Meier curve over $[0, \tau]$. In this paper, we study a class of regression models that directly relate the RMET to its “baseline” covariates for predicting future subjects’ RMETs. Since the standard Cox and accelerated failure time models can also be used for estimating such RMETs, we utilize a cross-validation procedure to select the “best” among all the working models considered in the model building and evaluation process. Lastly, we draw inferences for the predicted RMETs to assess the performance of the final selected model using an independent data set or a “hold-out” sample from the original data set. All the proposals are illustrated with data from an HIV clinical trial conducted by the AIDS Clinical Trials Group and from the primary biliary cirrhosis study conducted by the Mayo Clinic.
PMCID: PMC3944973  PMID: 24292992
Accelerated failure time model; Cox model; Cross-validation; Hold-out sample; Personalized medicine; Perturbation-resampling method
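The RMET estimator described in the abstract, the area under the Kaplan–Meier curve on $[0, \tau]$, can be sketched directly (simulated data; not the paper's regression models):

```python
import numpy as np

def rmet(time, event, tau):
    """Restricted mean event time: area under the Kaplan-Meier curve on [0, tau]."""
    time, event = np.asarray(time, float), np.asarray(event, bool)
    s, prev, area = 1.0, 0.0, 0.0
    for t in np.unique(time[event & (time <= tau)]):   # distinct event times
        area += s * (t - prev)                         # KM curve is flat between events
        d = np.sum((time == t) & event)                # events at time t
        s *= 1.0 - d / np.sum(time >= t)               # multiplicative KM step
        prev = t
    return area + s * (tau - prev)

rng = np.random.default_rng(5)
t_event, t_cens = rng.exponential(2.0, 300), rng.exponential(4.0, 300)
time, event = np.minimum(t_event, t_cens), t_event <= t_cens
print(rmet(time, event, tau=3.0))  # truth for Exp(rate 0.5): 2*(1 - e^{-1.5}) ~ 1.55
```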
20.  Evaluating principal surrogate endpoints with time-to-event data accounting for time-varying treatment efficacy 
Biostatistics (Oxford, England)  2013;15(2):251-265.
Principal surrogate (PS) endpoints are relatively inexpensive and easy-to-measure study outcomes that can be used to reliably predict treatment effects on clinical endpoints of interest. Few statistical methods for assessing the validity of potential PSs utilize time-to-event clinical endpoint information, and to our knowledge none allows for the characterization of time-varying treatment effects. We introduce the time-dependent and surrogate-dependent treatment efficacy curve, $\mathrm{TE}(t|s)$, and a new augmented trial design for assessing the quality of a biomarker as a PS. We propose a novel Weibull model and an estimated maximum likelihood method for estimation of the $\mathrm{TE}(t|s)$ curve. We describe the operating characteristics of our methods via simulations. We analyze data from the Diabetes Control and Complications Trial, in which we find evidence of a biomarker with value as a PS.
PMCID: PMC3944974  PMID: 24337534
Case–control study; Causal inference; Clinical trials; Principal stratification; Survival analysis; Treatment efficacy curve; Weibull model
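The paper's $\mathrm{TE}(t|s)$ curve is surrogate-dependent; dropping the surrogate, the time-varying treatment efficacy implied by two Weibull hazards can be sketched as one minus the hazard ratio (all parameters hypothetical):

```python
import numpy as np

def weibull_hazard(t, shape, scale):
    """Hazard of a Weibull(shape k, scale lam): (k/lam) * (t/lam)**(k - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def treatment_efficacy(t, arm1, arm0):
    """TE(t) = 1 - hazard ratio at time t (surrogate dependence omitted)."""
    return 1.0 - weibull_hazard(t, *arm1) / weibull_hazard(t, *arm0)

t = np.linspace(0.5, 5.0, 10)
# Differing shapes make the efficacy wane from ~0.65 toward ~0.30 over time.
print(treatment_efficacy(t, arm1=(1.3, 4.0), arm0=(1.0, 2.0)))
```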
21.  Surrogacy assessment using principal stratification when surrogate and outcome measures are multivariate normal 
Biostatistics (Oxford, England)  2013;15(2):266-283.
In clinical trials, a surrogate outcome variable (S) can be measured before the outcome of interest (T) and may provide early information regarding the treatment (Z) effect on T. Using the principal surrogacy framework introduced by Frangakis and Rubin (2002. Principal stratification in causal inference. Biometrics 58, 21–29), we consider an approach that has a causal interpretation and develop a Bayesian estimation strategy for surrogate validation when the joint distribution of potential surrogate and outcome measures is multivariate normal. We propose surrogacy validation measures derived from the joint conditional distribution of the potential outcomes of T given the potential outcomes of S. As the model is not fully identifiable from the data, we propose some reasonable prior distributions and assumptions that can be placed on weakly identified parameters to aid in estimation. We explore the relationship between our surrogacy measures and those proposed by Prentice (1989. Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 8, 431–440). The method is applied to data from a macular degeneration study and an ovarian cancer study.
PMCID: PMC4023321  PMID: 24285772
Bayesian estimation; Principal stratification; Surrogate endpoints
22.  Case-only method for cause-specific hazards models with application to assessing differential vaccine efficacy by viral and host genetics 
Biostatistics (Oxford, England)  2013;15(1):196-203.
Cause-specific proportional hazards models are commonly used for analyzing competing risks data in clinical studies. Motivated by the objective to assess differential vaccine protection against distinct pathogen types in randomized preventive vaccine efficacy trials, we present an alternative case-only method to standard maximum partial likelihood estimation that applies to a rare failure event, e.g. acquisition of HIV infection. A logistic regression model is fit to the counts of cause-specific events (infecting pathogen type) within study arms, with an offset adjusting for the randomization ratio. This formulation of cause-specific hazard ratio estimation permits immediate incorporation of host-genetic factors to be assessed as effect modifiers, an important area of vaccine research for identifying immune correlates of protection, thus inheriting the estimation efficiency and cost benefits of the case-only estimator commonly used for assessing gene–treatment interactions. The method is used to reassess HIV genotype-specific vaccine efficacy in the RV144 trial, providing nearly identical results to standard Cox methods, and to assess whether and how this vaccine efficacy depends on Fc-γ receptor genes.
PMCID: PMC3862206  PMID: 23813283
Gene–treatment interaction; Sieve analysis; Vaccine efficacy
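The case-only formulation is compact enough to sketch with made-up counts (not the RV144 data): among cases, regress the treatment-arm indicator on pathogen type with an offset for the log randomization ratio; under a rare event, the exponentiated coefficients recover cause-specific hazard ratios.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical case counts by arm and infecting genotype (A, B); 1:1 randomization.
cases_vaccine, cases_placebo = np.array([20, 45]), np.array([40, 50])
log_ratio = np.log(1.0)

# One row per case: outcome = vaccine-arm indicator, covariate = genotype-B indicator.
z = np.concatenate([np.ones(cases_vaccine.sum()), np.zeros(cases_placebo.sum())])
g = np.concatenate([np.repeat([0, 1], cases_vaccine), np.repeat([0, 1], cases_placebo)])
fit = sm.GLM(z, sm.add_constant(g), family=sm.families.Binomial(),
             offset=np.full(len(z), log_ratio)).fit()

hr_A = np.exp(fit.params[0])                   # genotype-A hazard ratio (vaccine/placebo)
hr_B = np.exp(fit.params[0] + fit.params[1])   # genotype-B hazard ratio
print(hr_A, hr_B)                              # 0.5 and 0.9 for these counts
```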
23.  Mixture models for single-cell assays with applications to vaccine studies 
Biostatistics (Oxford, England)  2013;15(1):87-101.
Blood and tissue are composed of many functionally distinct cell subsets. In immunological studies, these can be measured accurately only using single-cell assays. The characterization of these small cell subsets is crucial to decipher system-level biological changes. For this reason, an increasing number of studies rely on assays that provide single-cell measurements of multiple genes and proteins from bulk cell samples. A common problem in the analysis of such data is to identify biomarkers (or combinations of biomarkers) that are differentially expressed between two biological conditions (e.g. before/after stimulation), where expression is defined as the proportion of cells expressing that biomarker (or biomarker combination) in the cell subset(s) of interest. Here, we present a Bayesian hierarchical framework based on a beta-binomial mixture model for testing for differential biomarker expression using single-cell assays. Our model allows the inference to be subject specific, as is typically required when assessing vaccine responses, while borrowing strength across subjects through common prior distributions. We propose two approaches for parameter estimation: an empirical-Bayes approach using an Expectation–Maximization algorithm and a fully Bayesian one based on a Markov chain Monte Carlo algorithm. We compare our method against classical approaches for single-cell assays including Fisher’s exact test, a likelihood ratio test, and basic log-fold changes. Using several experimental assays measuring proteins or genes at single-cell level and simulations, we show that our method has higher sensitivity and specificity than alternative methods. Additional simulations show that our framework is also robust to model misspecification. Finally, we demonstrate how our approach can be extended to testing multivariate differential expression across multiple biomarker combinations using a Dirichlet-multinomial model and illustrate this approach using single-cell gene expression data and simulations.
PMCID: PMC3862207  PMID: 23887981
Bayesian modeling; Expectation–Maximization; Flow cytometry; Hierarchical modeling; Immunology; Marginal likelihood; Markov Chain Monte Carlo; MIMOSA; Single-cell gene expression
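The beta-binomial machinery at the core of the model can be sketched for a single subject; the hyperparameters and mixing weight below are illustrative stand-ins for the quantities the paper estimates by EM or MCMC, and the counts are made up:

```python
from scipy import stats

# One subject: antigen-specific cells out of total, unstimulated vs stimulated.
n_unstim, y_unstim = 50_000, 10
n_stim, y_stim = 50_000, 55

# Marginal likelihood of the stimulated count under a null component (prior
# centered at the unstimulated rate) and a responder component (higher rate).
null_like = stats.betabinom.pmf(y_stim, n_stim, a=1 + y_unstim, b=1 + n_unstim - y_unstim)
resp_like = stats.betabinom.pmf(y_stim, n_stim, a=1 + 5 * y_unstim, b=1 + n_unstim - 5 * y_unstim)

w = 0.2  # prior probability of response, shared across subjects in the hierarchy
post = w * resp_like / (w * resp_like + (1 - w) * null_like)
print(post)  # posterior probability that this subject responded (near 1 here)
```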
24.  Density estimation on multivariate censored data with optional Pólya tree 
Biostatistics (Oxford, England)  2013;15(1):182-195.
Analyzing the failure times of multiple events is of interest in many fields. Estimating the joint distribution of the failure times in a non-parametric way is not straightforward because some failure times are often right-censored and only known to be greater than observed follow-up times. Although the problem has been studied, there is no universally optimal solution, and it remains important to provide alternatives that may be more suitable than existing methods in specific settings. Problems with existing methods include not only infeasible computation but also lack of optimality and possible non-monotonicity of the estimated survival function. In this paper, we propose a non-parametric Bayesian approach for directly estimating the density function of multivariate survival times, where the prior is constructed based on the optional Pólya tree. We investigate several theoretical aspects of the procedure and derive an efficient iterative algorithm for implementing the Bayesian procedure. The empirical performance of the method is examined via extensive simulation studies. Finally, we present a detailed analysis using the proposed method of the relationship among organ recovery times in severely injured patients. The analysis suggests medically interesting relationships that can be pursued further in the clinic.
PMCID: PMC3862208  PMID: 23902636
Multivariate survival analysis; Non-parametric Bayesian; Optional Pólya tree
25.  Prior robust empirical Bayes inference for large-scale data by conditioning on rank with application to microarray data 
Empirical Bayes methods have been extensively used for microarray data analysis by modeling the large number of unknown parameters as random effects. Empirical Bayes allows borrowing information across genes and can automatically adjust for multiple testing and selection bias. However, the standard empirical Bayes model can perform poorly if the assumed working prior deviates from the true prior. This paper proposes a new rank-conditioned inference in which the shrinkage and confidence intervals are based on the distribution of the error conditioned on rank of the data. Our approach is in contrast to a Bayesian posterior, which conditions on the data themselves. The new method is almost as efficient as standard Bayesian methods when the working prior is close to the true prior, and it is much more robust when the working prior is not close. In addition, it allows a more accurate (but also more complex) non-parametric estimate of the prior to be easily incorporated, resulting in improved inference. The new method’s prior robustness is demonstrated via simulation experiments. Application to a breast cancer gene expression microarray dataset is presented. Our R package rank.Shrinkage provides a ready-to-use implementation of the proposed methodology.
PMCID: PMC3862209  PMID: 23934072
Bayesian shrinkage; Confidence intervals; Ranking bias; Robust multiple estimation
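For contrast, the standard parametric empirical Bayes shrinkage that the abstract argues can fail under a misspecified working prior is easy to sketch; this is the baseline, not the proposed rank-conditioned procedure:

```python
import numpy as np

def eb_normal_shrinkage(x, sigma2):
    """Parametric EB under x_i ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2):
    estimate (mu, tau2) by moments, then shrink each x_i toward mu."""
    mu = x.mean()
    tau2 = max(x.var() - sigma2, 0.0)   # method-of-moments prior variance
    w = tau2 / (tau2 + sigma2)          # shrinkage weight on the data
    return mu + w * (x - mu)

rng = np.random.default_rng(6)
theta = rng.normal(0.0, 1.0, 5000)           # true gene-level effects
x = theta + rng.normal(0.0, 1.0, 5000)       # observed statistics, sigma2 = 1
est = eb_normal_shrinkage(x, 1.0)
print(np.mean((est - theta) ** 2), np.mean((x - theta) ** 2))  # ~0.5 vs ~1.0
```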
