Results 1-25 (206)
1.  Case-only method for cause-specific hazards models with application to assessing differential vaccine efficacy by viral and host genetics 
Biostatistics (Oxford, England)  2013;15(1):196-203.
Cause-specific proportional hazards models are commonly used for analyzing competing risks data in clinical studies. Motivated by the objective of assessing differential vaccine protection against distinct pathogen types in randomized preventive vaccine efficacy trials, we present an alternative case-only method to standard maximum partial likelihood estimation that applies to a rare failure event, e.g. acquisition of HIV infection. A logistic regression model is fit to the counts of cause-specific events (infecting pathogen type) within study arms, with an offset adjusting for the randomization ratio. This formulation of cause-specific hazard ratio estimation permits host genetic factors to be incorporated immediately as candidate effect modifiers, an important area of vaccine research for identifying immune correlates of protection, and it inherits the estimation efficiency and cost benefits of the case-only estimator commonly used for assessing gene–treatment interactions. The method is used to reassess HIV genotype-specific vaccine efficacy in the RV144 trial, providing nearly identical results to standard Cox methods, and to assess whether and how this vaccine efficacy depends on Fc-γ receptor genes.
doi:10.1093/biostatistics/kxt018
PMCID: PMC3862206  PMID: 23813283
Gene–treatment interaction; Sieve analysis; Vaccine efficacy
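The logistic-with-offset formulation described in entry 1 can be illustrated with a few lines of code. The sketch below is not the authors' implementation: the case counts, arm sizes, and the two-type coding of the infecting pathogen are hypothetical, and it only shows how the offset-adjusted logistic fit recovers type-specific hazard ratios for a rare failure event.

```python
# Hypothetical case-only analysis: among infected cases, regress the vaccine-arm
# indicator on infecting pathogen type with an offset for the randomization ratio.
import numpy as np
import pandas as pd
import statsmodels.api as sm

# made-up case counts by arm and pathogen type (1 = vaccine-matched type)
cases = pd.DataFrame({
    "vaccine": [1] * 20 + [1] * 35 + [0] * 40 + [0] * 45,
    "matched": [1] * 20 + [0] * 35 + [1] * 40 + [0] * 45,
})
n_vaccine, n_placebo = 8000, 8000                    # randomized arm sizes (assumed)
offset = np.full(len(cases), np.log(n_vaccine / n_placebo))

fit = sm.GLM(cases["vaccine"], sm.add_constant(cases["matched"]),
             family=sm.families.Binomial(), offset=offset).fit()

# For a rare event, exp(intercept) approximates the cause-specific hazard ratio
# (vaccine vs placebo) for mismatched types and exp(intercept + slope) for matched
# types, so type-specific vaccine efficacy is 1 minus the corresponding ratio.
print(np.exp(fit.params))
```

In this formulation, host genetic factors would enter simply as additional covariates (and their interactions with pathogen type) in the same logistic model.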
2.  Mixture models for single-cell assays with applications to vaccine studies 
Biostatistics (Oxford, England)  2013;15(1):87-101.
Blood and tissue are composed of many functionally distinct cell subsets. In immunological studies, these can be measured accurately only using single-cell assays. The characterization of these small cell subsets is crucial to decipher system-level biological changes. For this reason, an increasing number of studies rely on assays that provide single-cell measurements of multiple genes and proteins from bulk cell samples. A common problem in the analysis of such data is to identify biomarkers (or combinations of biomarkers) that are differentially expressed between two biological conditions (e.g. before/after stimulation), where expression is defined as the proportion of cells expressing that biomarker (or biomarker combination) in the cell subset(s) of interest. Here, we present a Bayesian hierarchical framework based on a beta-binomial mixture model for testing for differential biomarker expression using single-cell assays. Our model allows the inference to be subject specific, as is typically required when assessing vaccine responses, while borrowing strength across subjects through common prior distributions. We propose two approaches for parameter estimation: an empirical-Bayes approach using an Expectation–Maximization algorithm and a fully Bayesian one based on a Markov chain Monte Carlo algorithm. We compare our method against classical approaches for single-cell assays including Fisher’s exact test, a likelihood ratio test, and basic log-fold changes. Using several experimental assays measuring proteins or genes at single-cell level and simulations, we show that our method has higher sensitivity and specificity than alternative methods. Additional simulations show that our framework is also robust to model misspecification. Finally, we demonstrate how our approach can be extended to testing multivariate differential expression across multiple biomarker combinations using a Dirichlet-multinomial model and illustrate this approach using single-cell gene expression data and simulations.
doi:10.1093/biostatistics/kxt024
PMCID: PMC3862207  PMID: 23887981
Bayesian modeling; Expectation–Maximization; Flow cytometry; Hierarchical modeling; Immunology; Marginal likelihood; Markov Chain Monte Carlo; MIMOSA; Single-cell gene expression
3.  Density estimation on multivariate censored data with optional Pólya tree 
Biostatistics (Oxford, England)  2013;15(1):182-195.
Analyzing the failure times of multiple events is of interest in many fields. Estimating the joint distribution of the failure times in a non-parametric way is not straightforward because some failure times are often right-censored and only known to be greater than observed follow-up times. Although the problem has been studied, no universally optimal solution exists, and it remains important to provide alternatives that may be more suitable than existing methods in specific settings. Shortcomings of existing methods include not only infeasible computation but also lack of optimality and possible non-monotonicity of the estimated survival function. In this paper, we propose a non-parametric Bayesian approach for directly estimating the density function of multivariate survival times, where the prior is constructed from the optional Pólya tree. We investigate several theoretical aspects of the procedure and derive an efficient iterative algorithm for implementing the Bayesian procedure. The empirical performance of the method is examined via extensive simulation studies. Finally, we present a detailed analysis, using the proposed method, of the relationship among organ recovery times in severely injured patients. The analysis suggests medically relevant findings that can be pursued further in the clinic.
doi:10.1093/biostatistics/kxt025
PMCID: PMC3862208  PMID: 23902636
Multivariate survival analysis; Non-parametric Bayesian; Optional Pólya tree
4.  Prior robust empirical Bayes inference for large-scale data by conditioning on rank with application to microarray data 
Empirical Bayes methods have been extensively used for microarray data analysis by modeling the large number of unknown parameters as random effects. Empirical Bayes allows borrowing information across genes and can automatically adjust for multiple testing and selection bias. However, the standard empirical Bayes model can perform poorly if the assumed working prior deviates from the true prior. This paper proposes a new rank-conditioned inference in which the shrinkage and confidence intervals are based on the distribution of the error conditioned on rank of the data. Our approach is in contrast to a Bayesian posterior, which conditions on the data themselves. The new method is almost as efficient as standard Bayesian methods when the working prior is close to the true prior, and it is much more robust when the working prior is not close. In addition, it allows a more accurate (but also more complex) non-parametric estimate of the prior to be easily incorporated, resulting in improved inference. The new method’s prior robustness is demonstrated via simulation experiments. Application to a breast cancer gene expression microarray dataset is presented. Our R package rank.Shrinkage provides a ready-to-use implementation of the proposed methodology.
doi:10.1093/biostatistics/kxt026
PMCID: PMC3862209  PMID: 23934072
Bayesian shrinkage; Confidence intervals; Ranking bias; Robust multiple estimation
5.  Censored quantile regression with recursive partitioning-based weights 
Biostatistics (Oxford, England)  2013;15(1):170-181.
Censored quantile regression provides a useful alternative to the Cox proportional hazards model for analyzing survival data. It directly models the conditional quantile of the survival time and hence is easy to interpret. Moreover, it relaxes the proportionality constraint on the hazard function associated with the popular Cox model and is natural for modeling heterogeneity of the data. Recently, Wang and Wang (2009. Locally weighted censored quantile regression. Journal of the American Statistical Association 103, 1117–1128) proposed a locally weighted censored quantile regression approach that allows for covariate-dependent censoring and is less restrictive than other censored quantile regression methods. However, their kernel smoothing-based weighting scheme requires all covariates to be continuous and encounters practical difficulty with even a moderate number of covariates. We propose a new weighting approach that uses recursive partitioning, e.g. survival trees, that offers greater flexibility in handling covariate-dependent censoring in moderately high dimensions and can incorporate both continuous and discrete covariates. We prove that this new weighting scheme leads to consistent estimation of the quantile regression coefficients and demonstrate its effectiveness via Monte Carlo simulations. We also illustrate the new method using a widely recognized data set from a clinical trial on primary biliary cirrhosis.
doi:10.1093/biostatistics/kxt027
PMCID: PMC3862210  PMID: 23975800
Censored quantile regression; Recursive partitioning; Survival analysis; Survival ensembles
7.  A general regression framework for a secondary outcome in case–control studies 
Biostatistics (Oxford, England)  2013;15(1):117-128.
Modern case–control studies typically involve the collection of data on a large number of outcomes, often at considerable logistical and monetary expense. These data are of potentially great value to subsequent researchers, who, although not necessarily concerned with the disease that defined the case series in the original study, may want to use the available information for a regression analysis involving a secondary outcome. Because cases and controls are selected with unequal probability, regression analysis involving a secondary outcome generally must acknowledge the sampling design. In this paper, the author presents a new framework for the analysis of secondary outcomes in case–control studies. The approach is based on a careful re-parameterization of the conditional model for the secondary outcome given the case–control outcome and regression covariates, in terms of (a) the population regression of interest of the secondary outcome given covariates and (b) the population regression of the case–control outcome on covariates. The error distribution for the secondary outcome given covariates and case–control status is otherwise unrestricted. For a continuous outcome, the approach sometimes reduces to extending model (a) by including a residual of (b) as a covariate. However, the framework is general in the sense that models (a) and (b) can take any functional form, and the methodology allows for an identity, log or logit link function for model (a).
doi:10.1093/biostatistics/kxt041
PMCID: PMC3983430  PMID: 24152770
Case–control studies; Generalized linear models; Statistical genetics; Secondary outcomes
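For the special case mentioned at the end of entry 7, where the approach reduces to extending model (a) with a residual from model (b), a toy simulation gives a feel for the construction. Everything below is an assumption made for illustration: the simulated cohort, the rare-disease logistic model, and the fact that model (b) is fit on the full cohort (standing in for the external information the full framework would use). It is not the paper's estimator.

```python
# Toy illustration: secondary-outcome regression in a case-control sample, extending
# model (a) with the residual of model (b) as an extra covariate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50000
x = rng.normal(size=n)                                   # covariate
y = 0.5 * x + rng.normal(size=n)                         # continuous secondary outcome
d = rng.binomial(1, 1 / (1 + np.exp(-(-4 + 0.8 * x + 0.6 * y))))   # rare case status

# case-control sampling: all cases plus an equal number of controls
case_idx = np.where(d == 1)[0]
ctrl_idx = rng.choice(np.where(d == 0)[0], size=case_idx.size, replace=False)
idx = np.concatenate([case_idx, ctrl_idx])
xs, ys, ds = x[idx], y[idx], d[idx]

# model (b): regression of case status on covariates, fit here on the full cohort
fit_b = sm.GLM(d, sm.add_constant(x), family=sm.families.Binomial()).fit()
resid_b = ds - fit_b.predict(sm.add_constant(xs))

# model (a) with identity link, extended by the residual of (b)
fit_a = sm.OLS(ys, sm.add_constant(np.column_stack([xs, resid_b]))).fit()
print(fit_a.params)    # second entry is the covariate effect on the secondary outcome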
8.  Normalization and extraction of interpretable metrics from raw accelerometry data 
Biostatistics (Oxford, England)  2013;15(1):102-116.
We introduce an explicit set of metrics for human activity based on high-density acceleration recordings from a hip-worn tri-axial accelerometer. These metrics are based on two concepts: (i) Time Active, a measure of the length of time when activity is distinguishable from rest and (ii) AI (activity intensity), a measure of the amplitude of activity relative to rest. All measurements are normalized (have the same interpretation across subjects and days), easy to explain and implement, and reproducible across platforms and software implementations. Metrics were validated by visual inspection of results, by quantitative in-lab replication studies, and by an association study with health outcomes.
doi:10.1093/biostatistics/kxt029
PMCID: PMC4072911  PMID: 23999141
Activity intensity; Movelets; Movement; Signal processing; Time active; Tri-axial accelerometer
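A rough sketch of the two quantities named in entry 8, computed from a simulated tri-axial series. The window length, the rest-level estimate, and the activity threshold are arbitrary choices made here for illustration, not the paper's normalized and calibrated definitions.

```python
# Crude "time active" and "activity intensity" from a simulated tri-axial signal.
import numpy as np

def activity_metrics(acc, fs=10, window_s=1.0, rest_quantile=0.05, k=3.0):
    """acc: (n, 3) array of tri-axial acceleration sampled at fs Hz."""
    win = int(window_s * fs)
    n_win = acc.shape[0] // win
    # per-window standard deviation of each axis, summed across axes
    sd = acc[:n_win * win].reshape(n_win, win, 3).std(axis=1).sum(axis=1)
    rest_level = np.quantile(sd, rest_quantile)      # crude estimate of the rest signal level
    active = sd > k * rest_level                     # windows distinguishable from rest
    time_active_s = active.sum() * window_s
    activity_intensity = (sd[active] / rest_level).mean() if active.any() else 0.0
    return time_active_s, activity_intensity

rng = np.random.default_rng(0)
scale = np.r_[np.full(600, 0.01), np.full(600, 0.2)][:, None]   # rest, then movement
acc = rng.normal(scale=scale, size=(1200, 3))
print(activity_metrics(acc))
```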
9.  Inference on periodicity of circadian time series 
Biostatistics (Oxford, England)  2013;14(4):792-806.
Estimation of the period length of time-course data from cyclical biological processes, such as those driven by the circadian pacemaker, is crucial for inferring the properties of the biological clock found in many living organisms. We propose a methodology for period estimation based on spectrum resampling (SR) techniques. Simulation studies show that SR is superior and more robust to non-sinusoidal and noisy cycles than a currently used routine based on Fourier approximations. In addition, a simple fit to the oscillations using linear least squares is available, together with a non-parametric test for detecting changes in period length which allows for period estimates with different variances, as frequently encountered in practice. The proposed methods are motivated by and applied to various data examples from chronobiology.
doi:10.1093/biostatistics/kxt020
PMCID: PMC3988453  PMID: 23743206
Circadian rhythms; Non-parametric testing; Period estimation; Resampling; Spectrum
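The sketch below is not the spectrum-resampling procedure of entry 9; it only shows the target quantity, the dominant period of a noisy, non-sinusoidal series, read from a periodogram peak, using a made-up square-wave-plus-noise example.

```python
# Naive periodogram-peak period estimate for a noisy, non-sinusoidal circadian series.
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(2)
dt = 0.5                                   # hours between samples (assumed)
t = np.arange(0, 240, dt)                  # ten days of recordings
true_period = 23.4
y = np.sign(np.sin(2 * np.pi * t / true_period)) + rng.normal(scale=0.5, size=t.size)

freqs, power = periodogram(y, fs=1 / dt)
peak = freqs[1:][np.argmax(power[1:])]     # skip the zero-frequency bin
print("estimated period (h):", 1 / peak)
```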
10.  A Bayesian model for cluster detection 
Biostatistics (Oxford, England)  2013;14(4):752-765.
The detection of areas in which the risk of a particular disease is significantly elevated, leading to an excess of cases, is an important enterprise in spatial epidemiology. Various frequentist approaches have been suggested for the detection of “clusters” within a hypothesis testing framework. Unfortunately, these suffer from a number of drawbacks including the difficulty in specifying a p-value threshold at which to call significance, the inherent multiplicity problem, and the possibility of multiple clusters. In this paper, we suggest a Bayesian approach to detecting “areas of clustering” in which the study region is partitioned into, possibly multiple, “zones” within which the risk is either at a null, or non-null, level. Computation is carried out using Markov chain Monte Carlo, tuned to the model that we develop. The method is applied to leukemia data in upstate New York.
doi:10.1093/biostatistics/kxt001
PMCID: PMC3769995  PMID: 23476026
Bayes factors; Markov chain Monte Carlo; Scan statistic; Spatial epidemiology
11.  Test for interactions between a genetic marker set and environment in generalized linear models 
Biostatistics (Oxford, England)  2013;14(4):667-681.
We consider in this paper testing for interactions between a genetic marker set and an environmental variable. A common practice in studying gene–environment (GE) interactions is to analyze one single-nucleotide polymorphism (SNP) at a time. It is of significant interest to analyze SNPs in a biologically defined set simultaneously, e.g. gene or pathway. In this paper, we first show that if the main effects of multiple SNPs in a set are associated with a disease/trait, the classical single SNP–GE interaction analysis can be biased. We derive the asymptotic bias and study the conditions under which the classical single SNP–GE interaction analysis is unbiased. We further show that the simple minimum p-value-based SNP-set GE analysis can be biased and can have an inflated Type I error rate. To overcome these difficulties, we propose a computationally efficient and powerful gene–environment set association test (GESAT) in generalized linear models. Our method tests for SNP-set by environment interactions using a variance component test, and estimates the main SNP effects under the null hypothesis using ridge regression. We evaluate the performance of GESAT using simulation studies, and apply GESAT to data from the Harvard lung cancer genetic study to investigate GE interactions between the SNPs in the 15q24–25.1 region and smoking on lung cancer risk.
doi:10.1093/biostatistics/kxt006
PMCID: PMC3769996  PMID: 23462021
Asymptotic bias analysis; Gene–environment interactions; Genome-wide association studies; Score statistic; Single-nucleotide polymorphism; Variance component test
12.  Controlling the local false discovery rate in the adaptive Lasso 
Biostatistics (Oxford, England)  2013;14(4):653-666.
The Lasso shrinkage procedure achieved its popularity, in part, by its tendency to shrink estimated coefficients to zero, and its ability to serve as a variable selection procedure. Using data-adaptive weights, the adaptive Lasso modified the original procedure to increase the penalty terms for those variables estimated to be less important by ordinary least squares. Although this modified procedure attained the oracle properties, the resulting models tend to include a large number of “false positives” in practice. Here, we adapt the concept of local false discovery rates (lFDRs) so that it applies to the sequence, λn, of smoothing parameters for the adaptive Lasso. We define the lFDR for a given λn to be the probability that the variable added to the model by decreasing λn to λn−δ is not associated with the outcome, where δ is a small value. We derive the relationship between the lFDR and λn, show lFDR=1 for traditional smoothing parameters, and show how to select λn so as to achieve a desired lFDR. We compare the smoothing parameters chosen to achieve a specified lFDR and those chosen to achieve the oracle properties, as well as their resulting estimates for model coefficients, with both simulation and an example from a genetic study of prostate specific antigen.
doi:10.1093/biostatistics/kxt008
PMCID: PMC3769997  PMID: 23575212
Adaptive Lasso; Local false discovery rate; Smoothing parameter; Variable selection
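Entry 12 calibrates the adaptive-lasso smoothing parameter through a local FDR; that calibration is not reproduced here. The sketch below only sets up the ingredients it operates on: an adaptive lasso path with OLS-based weights (implemented by the standard column-rescaling trick) and the penalty value at which each variable enters the model, all on simulated data with assumed dimensions.

```python
# Adaptive lasso path and per-variable entry points along the penalty sequence.
import numpy as np
from sklearn.linear_model import LinearRegression, lasso_path

rng = np.random.default_rng(3)
n, p = 200, 20
X = rng.normal(size=(n, p))
beta = np.r_[1.0, -0.8, 0.6, np.zeros(p - 3)]
y = X @ beta + rng.normal(size=n)

ols = LinearRegression().fit(X, y).coef_
w = 1.0 / np.abs(ols)                      # adaptive weights: larger penalty for weak OLS effects
Xw = X / w                                 # column rescaling turns the weighted lasso into an ordinary lasso
alphas, coefs, _ = lasso_path(Xw, y)       # alphas are in decreasing order

# penalty value at which each variable first becomes nonzero along the path
entry_alpha = np.array([alphas[np.nonzero(coefs[j])[0][0]] if np.any(coefs[j]) else np.nan
                        for j in range(p)])
print(np.round(entry_alpha, 3))
```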
13.  Adaptive enrichment designs for clinical trials 
Biostatistics (Oxford, England)  2013;14(4):613-625.
Modern medicine has graduated from broad-spectrum treatments to targeted therapeutics. New drugs recognize the recently discovered heterogeneity of many diseases previously considered fairly homogeneous. These treatments attack specific genetic pathways that are dysregulated in only a smaller subset of patients with the disease. Often this subset is only rudimentarily understood until well into large-scale clinical trials. As such, standard practice has been to enroll a broad range of patients and run post hoc subset analyses to determine those who may particularly benefit. This unnecessarily exposes many patients to hazardous side effects, and may vastly decrease the efficiency of the trial (especially if only a small subset of patients benefit). In this manuscript, we propose a class of adaptive enrichment designs that allow the eligibility criteria of a trial to be adaptively updated during the trial, restricting entry to patients likely to benefit from the new treatment. We show that our designs preserve the Type I error and, in a variety of cases, provide a substantial increase in power.
doi:10.1093/biostatistics/kxt010
PMCID: PMC3769998  PMID: 23525452
Adaptive clinical trials; Biomarker; Cutpoint; Enrichment
14.  Pointwise confidence intervals for a survival distribution with small samples or heavy censoring 
Biostatistics (Oxford, England)  2013;14(4):723-736.
We propose a beta product confidence procedure (BPCP) that is a non-parametric confidence procedure for the survival curve at a fixed time for right-censored data assuming independent censoring. In such situations, the Kaplan–Meier estimator is typically used with an asymptotic confidence interval (CI) that can have coverage problems when the number of observed failures is not large, and/or when testing the latter parts of the curve where there are few remaining subjects at risk. The BPCP guarantees central coverage (i.e. ensures that both one-sided error rates are no more than half of the total nominal rate) when there is no censoring (in which case it reduces to the Clopper–Pearson interval) or when there is progressive type II censoring (i.e. when censoring only occurs immediately after failures on fixed proportions of the remaining individuals). For general independent censoring, simulations show that the BPCP maintains central coverage in many situations where competing methods can have very substantial error rate inflation for the lower limit. The BPCP gives asymptotically correct coverage and is asymptotically equivalent to the CI on the Kaplan–Meier estimator using Greenwood’s variance. The BPCP may be inverted to create confidence procedures for a quantile of the underlying survival distribution. Because the BPCP is easy to implement, offers protection in settings when other methods fail, and essentially matches other methods when they succeed, it should be the method of choice.
doi:10.1093/biostatistics/kxt016
PMCID: PMC3769999  PMID: 23632624
Clopper–Pearson confidence interval; Exact confidence interval; Kaplan–Meier estimator; Median survival; Non-parametric methods; Survival analysis
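The abstract of entry 14 notes that with no censoring the BPCP reduces to the Clopper–Pearson interval; the sketch below shows only that familiar special case for the survival probability S(t), with made-up counts. The censored-data BPCP itself is not reproduced.

```python
# Clopper–Pearson interval for S(t) when no one is censored before t.
from scipy.stats import beta

n = 40           # subjects at risk at time 0, none censored before t (hypothetical)
deaths = 7       # failures observed by time t, so the estimate of S(t) is 33/40
alive = n - deaths
alpha = 0.05

lower = beta.ppf(alpha / 2, alive, deaths + 1) if alive > 0 else 0.0
upper = beta.ppf(1 - alpha / 2, alive + 1, deaths) if deaths > 0 else 1.0
print(lower, upper)
```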
15.  Adjusting for matching and covariates in linear discriminant analysis 
Biostatistics (Oxford, England)  2013;14(4):779-791.
In studies that compare several diagnostic or treatment groups, subjects may not only be measured on a certain set of feature variables, but also be matched on a number of demographic characteristics and measured on additional covariates. Linear discriminant analysis (LDA) is sometimes used to identify which feature variables best discriminate among groups, while accounting for the dependencies among the feature variables. We present a new approach to LDA for multivariate normal data that accounts for the subject matching used in a particular study design, as well as covariates not used in the matching. Applications are given for post-mortem tissue data with the aim of comparing neurobiological characteristics of subjects with schizophrenia with those of normal controls, and for a post-mortem tissue primate study comparing brain biomarker measurements across three treatment groups. We also investigate the performance of our approach using a simulation study.
doi:10.1093/biostatistics/kxt017
PMCID: PMC3770000  PMID: 23640791
Covariates; Linear discriminant analysis; Matched design; Post-mortem tissue studies; Schizophrenia
16.  Sample size requirements for training high-dimensional risk predictors 
Biostatistics (Oxford, England)  2013;14(4):639-652.
A common objective of biomarker studies is to develop a predictor of patient survival outcome. Determining the number of samples required to train a predictor from survival data is important for designing such studies. Existing sample size methods for training studies use parametric models for the high-dimensional data and cannot handle a right-censored dependent variable. We present a new training sample size method that is non-parametric with respect to the high-dimensional vectors, and is developed for a right-censored response. The method can be applied to any prediction algorithm that satisfies a set of conditions. The sample size is chosen so that the expected performance of the predictor is within a user-defined tolerance of optimal. The central method is based on a pilot dataset. To quantify uncertainty, a method to construct a confidence interval for the tolerance is developed. Adequacy of the size of the pilot dataset is discussed. An alternative model-based version of our method for estimating the tolerance when no adequate pilot dataset is available is presented. The model-based method requires a covariance matrix be specified, but we show that the identity covariance matrix provides adequate sample size when the user specifies three key quantities. Application of the sample size method to two microarray datasets is discussed.
doi:10.1093/biostatistics/kxt022
PMCID: PMC3770001  PMID: 23873895
Conditional score; Cox regression; High-dimensional data; Risk prediction; Sample size; Training set
17.  Estimation and selection of complex covariate effects in pooled nested case–control studies with heterogeneity 
Biostatistics (Oxford, England)  2013;14(4):682-694.
A major challenge in cancer epidemiologic studies, especially those of rare cancers, is observing enough cases. To address this, researchers often join forces by bringing multiple studies together to achieve large sample sizes, allowing for increased power in hypothesis testing, and improved efficiency in effect estimation. Combining studies, however, renders the analysis difficult owing to the presence of heterogeneity in the pooled data. In this article, motivated by a collaborative nested case–control (NCC) study of ovarian cancer in three cohorts from United States, Sweden, and Italy, we investigate the use of penalty regularized partial likelihood estimation in the context of pooled NCC studies to achieve two goals. First, we propose an adaptive group lasso (gLASSO) penalized approach to simultaneously identify important variables and estimate their effects. Second, we propose a composite agLASSO penalized approach to identify variables with heterogeneous effects. Both methods are readily implemented with the group coordinate gradient decent algorithm and shown to enjoy the oracle property. We conduct simulation studies to evaluate the performance of our proposed approaches in finite samples under various heterogeneity settings, and apply them to the pooled ovarian cancer study.
doi:10.1093/biostatistics/kxt015
PMCID: PMC3841381  PMID: 23632625
Cox's proportional hazards model; Group penalty; Heterogeneity; Nested case–control sampling; Ovarian cancer; Pooled studies; Shrinkage estimation
18.  Structured variable selection with q-values 
Biostatistics (Oxford, England)  2013;14(4):695-707.
When some of the regressors can act on both the response and other explanatory variables, the already challenging problem of selecting variables when the number of covariates exceeds the sample size becomes more difficult. A motivating example is a metabolic study in mice that has diet groups and gut microbial percentages that may affect changes in multiple phenotypes related to body weight regulation. The data have more variables than observations, and diet is known to act directly on the phenotypes as well as on some or potentially all of the microbial percentages. Interest lies in determining which gut microflora influence the phenotypes while accounting for the direct relationship between diet and the other variables. A new methodology for variable selection in this context is presented that links the concept of q-values from multiple hypothesis testing to the recently developed weighted Lasso.
doi:10.1093/biostatistics/kxt012
PMCID: PMC3841382  PMID: 23580317
False discovery rate; Microbial data; q-Values; Variable selection; Weighted Lasso
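A small sketch of the weighted-lasso ingredient in entry 18: per-variable penalty weights that shrink for variables with strong evidence (small q-values) and grow otherwise, implemented through the usual column-rescaling trick. The marginal tests, the Benjamini–Hochberg-style q-values, and the weight function 0.1 + q are all assumptions made for illustration; the paper's structured weighting is more involved.

```python
# q-value-driven penalty weights in a lasso fit via column rescaling.
import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso

rng = np.random.default_rng(4)
n, p = 80, 120                               # more variables than observations
X = rng.normal(size=(n, p))
y = X[:, 0] - X[:, 1] + rng.normal(size=n)

# marginal p-values for each variable, converted to BH-style q-values
pvals = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(p)])
order = np.argsort(pvals)
q = np.empty(p)
q[order] = np.minimum.accumulate((pvals[order] * p / np.arange(1, p + 1))[::-1])[::-1]

w = 0.1 + q                                  # hypothetical weights: small q-value -> small penalty
model = Lasso(alpha=0.1).fit(X / w, y)       # weighted lasso via per-column rescaling
beta_hat = model.coef_ / w                   # undo the rescaling
print(np.nonzero(beta_hat)[0])               # indices of selected variables
```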
19.  Letter to the editor 
Biostatistics (Oxford, England)  2013;14(4):807-808.
doi:10.1093/biostatistics/kxt013
PMCID: PMC3841383  PMID: 23624212
20.  Joint analysis of differential gene expression in multiple studies using correlation motifs 
The standard methods for detecting differential gene expression are mostly designed for analyzing a single gene expression experiment. When data from multiple related gene expression studies are available, separately analyzing each study is not ideal as it may fail to detect important genes with consistent but relatively weak differential signals in multiple studies. Jointly modeling all data allows one to borrow information across studies to improve the analysis. However, a simple concordance model, in which each gene is assumed to be differential in either all studies or none of the studies, is incapable of handling genes with study-specific differential expression. In contrast, a model that naively enumerates and analyzes all possible differential patterns across studies can deal with study-specificity and allow information pooling, but the complexity of its parameter space grows exponentially as the number of studies increases. Here, we propose a correlation motif approach to address this dilemma. This approach searches for a small number of latent probability vectors called correlation motifs to capture the major correlation patterns among multiple studies. The motifs provide the basis for sharing information among studies and genes. The approach has flexibility to handle all possible study-specific differential patterns. It improves detection of differential expression and overcomes the barrier of exponential model complexity.
doi:10.1093/biostatistics/kxu038
PMCID: PMC4263229  PMID: 25143368
Bayes hierarchical model; Correlation motif; EM algorithm; Microarray; Multiple datasets
21.  Longitudinal scalar-on-functions regression with application to tractography data 
Biostatistics (Oxford, England)  2013;14(3):447-461.
We propose a class of estimation techniques for scalar-on-function regression where both outcomes and functional predictors may be observed at multiple visits. Our methods are motivated by a longitudinal brain diffusion tensor imaging tractography study. One of the study’s primary goals is to evaluate the contemporaneous association between human function and brain imaging over time. The complexity of the study requires the development of methods that can simultaneously incorporate: (1) multiple functional (and scalar) regressors; (2) longitudinal outcome and predictor measurements per patient; (3) Gaussian or non-Gaussian outcomes; and (4) missing values within functional predictors. We propose two versions of a new method, longitudinal functional principal components regression (PCR). These methods extend the well-known functional PCR and allow for different effects of subject-specific trends in curves and of visit-specific deviations from that trend. The new methods are compared with existing approaches, and the most promising techniques are used for analyzing the tractography data.
doi:10.1093/biostatistics/kxs051
PMCID: PMC3677735  PMID: 23292804
Diffusion tensor imaging; Functional principal components; Functional regression; Longitudinal functional principal components regression; Multiple sclerosis; Repeated measurements
22.  Likelihood-based population independent component analysis 
Biostatistics (Oxford, England)  2013;14(3):514-527.
Independent component analysis (ICA) is a widely used technique for blind source separation, used heavily in several scientific research areas including acoustics, electrophysiology, and functional neuroimaging. We propose a scalable two-stage iterative true group ICA methodology for analyzing population level functional magnetic resonance imaging (fMRI) data where the number of subjects is very large. The method is based on likelihood estimators of the underlying source densities and the mixing matrix. As opposed to many commonly used group ICA algorithms, the proposed method does not require significant data reduction by a 2-fold singular value decomposition. In addition, the method can be applied to a large group of subjects since the memory requirements are not restrictive. The performance of our approach is compared with a commonly used group ICA algorithm via simulation studies. Furthermore, the proposed method is applied to a large collection of resting state fMRI datasets. The results show that established brain networks are well recovered by the proposed algorithm.
doi:10.1093/biostatistics/kxs055
PMCID: PMC3677736  PMID: 23314416
Functional MRI; Signal processing; Source separation
23.  Flexible marginalized models for bivariate longitudinal ordinal data 
Biostatistics (Oxford, England)  2013;14(3):462-476.
Random effects models are commonly used to analyze longitudinal categorical data. Marginalized random effects models are a class of models that permit direct estimation of marginal mean parameters and characterize serial correlation for longitudinal categorical data via random effects (Heagerty, 1999. Marginally specified logistic-normal models for longitudinal binary data. Biometrics 55, 688–698; Lee and Daniels, 2008. Marginalized models for longitudinal ordinal data with application to quality of life studies. Statistics in Medicine 27, 4359–4380). In this paper, we propose a Kronecker product (KP) covariance structure to capture the correlation between processes at a given time and the correlation within a process over time (serial correlation) for bivariate longitudinal ordinal data. For the latter, we consider a more general class of models than standard (first-order) autoregressive correlation models, by re-parameterizing the correlation matrix using partial autocorrelations (Daniels and Pourahmadi, 2009. Modeling covariance matrices via partial autocorrelations. Journal of Multivariate Analysis 100, 2352–2363). We assess the reasonableness of the KP structure with a score test. A maximum marginal likelihood estimation method is proposed utilizing a quasi-Newton algorithm with quasi-Monte Carlo integration of the random effects. We examine the effects of demographic factors on metabolic syndrome and C-reactive protein using the proposed models.
doi:10.1093/biostatistics/kxs058
PMCID: PMC3677737  PMID: 23365416
Kronecker product; Metabolic syndrome; Partial autocorrelation
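The Kronecker product structure of entry 23 is easy to picture numerically: a between-process correlation crossed with a within-process serial correlation. The sketch below uses two processes, four visits, and an AR(1)-type serial correlation purely for illustration; the paper allows a more general serial structure through partial autocorrelations.

```python
# Kronecker-product correlation: between-process block times within-process serial block.
import numpy as np

rho_between = 0.4                                      # correlation between the two processes (assumed)
R_process = np.array([[1.0, rho_between],
                      [rho_between, 1.0]])

rho_serial = 0.7                                       # AR(1)-type serial correlation over 4 visits (assumed)
times = np.arange(4)
R_time = rho_serial ** np.abs(np.subtract.outer(times, times))

R = np.kron(R_process, R_time)                         # 8 x 8 correlation for (process, visit) pairs
print(R.shape, np.allclose(R, R.T), np.all(np.linalg.eigvalsh(R) > 0))
```

Because both factors are valid correlation matrices, the Kronecker product is symmetric and positive definite, which the final check confirms.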
24.  Incorporating parental information into family-based association tests 
Biostatistics (Oxford, England)  2012;14(3):556-572.
Assumptions regarding the true underlying genetic model, or mode of inheritance, are necessary when quantifying genetic associations with disease phenotypes. Here we propose new methods to ascertain the underlying genetic model from parental data in family-based association studies. Specifically, for parental mating-type data, we propose a novel statistic to test whether the underlying genetic model is additive, dominant, or recessive; for parental genotype–phenotype data, we propose three strategies to determine the true mode of inheritance. We illustrate how to incorporate the information gleaned from these strategies into family-based association tests. Because family-based association tests are conducted conditional on parental genotypes, the type I error rate of these procedures is not inflated by the information learned from parental data. This result holds even if such information is weak or when the assumption of Hardy–Weinberg equilibrium is violated. Our simulations demonstrate that incorporating parental data into family-based association tests can improve power under common inheritance models. The application of our proposed methods to a candidate-gene study of type 1 diabetes successfully detects a recessive effect in MGAT5 that would otherwise be missed by conventional family-based association tests.
doi:10.1093/biostatistics/kxs048
PMCID: PMC3732025  PMID: 23266418
Case–parents; Dominant; Mode of inheritance; Nuclear families; Recessive; Robust
25.  Estimating incremental cost-effectiveness ratios and their confidence intervals with different terminating events for survival time and costs 
Biostatistics (Oxford, England)  2013;14(3):422-432.
Cost-effectiveness analysis (CEA) is an important component of the economic evaluation of new treatment options. In many clinical and observational studies of costs, censored data pose challenges to the CEA. We consider a special situation where the terminating events for the survival time and costs are different. Traditional methods for statistical inference offer no means for dealing with censored data in these circumstances. To address this gap, we propose a new method for deriving the confidence interval for the incremental cost-effectiveness ratio. Simulation studies and a real data example show that our method performs well in practical settings, demonstrating its potential for application to settings in which the terminating events for the survival time and costs differ.
doi:10.1093/biostatistics/kxt002
PMCID: PMC3732026  PMID: 23426525
Censored data; Cost-effectiveness analysis; Different terminating events; Fieller method; Survival analysis
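Entry 25 builds a censoring-adjusted confidence interval for the incremental cost-effectiveness ratio using a Fieller-type construction. The sketch below is only the textbook Fieller interval for a ratio of two estimated increments, with hypothetical summary statistics and no treatment of censoring.

```python
# Textbook Fieller interval for ICER = delta_cost / delta_effect (no censoring handled).
import numpy as np
from scipy.stats import norm

d_cost, var_cost = 12000.0, 1500.0**2     # incremental cost estimate and its variance (assumed)
d_eff, var_eff = 0.40, 0.08**2            # incremental effectiveness estimate and its variance (assumed)
cov_ce = 30.0                             # covariance between the two estimates (assumed)
z = norm.ppf(0.975)

# Fieller: the interval is the set of theta with a*theta^2 + b*theta + c <= 0
a = d_eff**2 - z**2 * var_eff
b = -2 * (d_cost * d_eff - z**2 * cov_ce)
c = d_cost**2 - z**2 * var_cost
disc = b**2 - 4 * a * c
if a > 0 and disc > 0:
    lower, upper = sorted(((-b - np.sqrt(disc)) / (2 * a), (-b + np.sqrt(disc)) / (2 * a)))
    print(d_cost / d_eff, (lower, upper))
else:
    print("Fieller interval is unbounded for these inputs")
```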
