PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-25 (855762)

Clipboard (0)
None

Related Articles

1.  Modeling Covariance Matrices via Partial Autocorrelations 
Journal of multivariate analysis  2009;100(10):2352-2363.
Summary
We study the role of partial autocorrelations in the reparameterization and parsimonious modeling of a covariance matrix. The work is motivated by and tries to mimic the phenomenal success of the partial autocorrelations function (PACF) in model formulation, removing the positive-definiteness constraint on the autocorrelation function of a stationary time series and in reparameterizing the stationarity-invertibility domain of ARMA models. It turns out that once an order is fixed among the variables of a general random vector, then the above properties continue to hold and follows from establishing a one-to-one correspondence between a correlation matrix and its associated matrix of partial autocorrelations. Connections between the latter and the parameters of the modified Cholesky decomposition of a covariance matrix are discussed. Graphical tools similar to partial correlograms for model formulation and various priors based on the partial autocorrelations are proposed. We develop frequentist/Bayesian procedures for modelling correlation matrices, illustrate them using a real dataset, and explore their properties via simulations.
doi:10.1016/j.jmva.2009.04.015
PMCID: PMC2748961  PMID: 20161018
Autoregressive parameters; Cholesky decomposition; Positive-definiteness constraint; Levinson-Durbin algorithm; Prediction variances; Uniform and Reference Priors; Markov Chain Monte Carlo
2.  A joint model of longitudinal and competing risks survival data with heterogeneous random effects and outlying longitudinal measurements 
Statistics and its interface  2010;3(2):185-195.
This article proposes a joint model for longitudinal measurements and competing risks survival data. The model consists of a linear mixed effects sub-model with t-distributed measurement errors for the longitudinal outcome, a proportional cause-specific hazards frailty sub-model for the survival outcome, and a regression sub-model for the variance-covariance matrix of the multivariate latent random effects based on a modified Cholesky decomposition. A Bayesian MCMC procedure is developed for parameter estimation and inference. Our method is insensitive to outlying longitudinal measurements in the presence of non-ignorable missing data due to dropout. Moreover, by modeling the variance-covariance matrix of the latent random effects, our model provides a useful framework for handling high-dimensional heterogeneous random effects and testing the homogeneous random effects assumption which is otherwise untestable in commonly used joint models. Finally, our model enables analysis of a survival outcome with intermittently measured time-dependent covariates and possibly correlated competing risks and dependent censoring, as well as joint analysis of the longitudinal and survival outcomes. Illustrations are given using a real data set from a lung study and simulation.
PMCID: PMC3166346  PMID: 21892381
Joint model; Competing risks; Bayesian analysis; Cholesky decomposition; Mixed effects model; MCMC; Modeling random effects covariance matrix; Outlier
3.  Dynamic Conditionally Linear Mixed Models for Longitudinal Data 
Biometrics  2002;58(1):225-231.
Summary
We develop a new class of models, dynamic conditionally linear mixed models, for longitudinal data by decomposing the within-subject covariance matrix using a special Cholesky decomposition. Here ‘dynamic’ means using past responses as covariates and ‘conditional linearity’ means that parameters entering the model linearly may be random, but nonlinear parameters are nonrandom. This setup offers several advantages and is surprisingly similar to models obtained from the first-order linearization method applied to nonlinear mixed models. First, it allows for flexible and computationally tractable models that include a wide array of covariance structures; these structures may depend on covariates and hence may differ across subjects. This class of models includes, e.g., all standard linear mixed models, antedependence models, and Vonesh–Carter models. Second, it guarantees the fitted marginal covariance matrix of the data is positive definite. We develop methods for Bayesian inference and motivate the usefulness of these models using a series of longitudinal depression studies for which the features of these new models are well suited.
PMCID: PMC2755537  PMID: 11890319
Covariance matrix; Heterogeneity; Hierarchical models; Markov chain Monte Carlo; Missing data; Unconstrained parameterization
4.  Bayesian estimation in animal breeding using the Dirichlet process prior for correlated random effects 
In the case of the mixed linear model the random effects are usually assumed to be normally distributed in both the Bayesian and classical frameworks. In this paper, the Dirichlet process prior was used to provide nonparametric Bayesian estimates for correlated random effects. This goal was achieved by providing a Gibbs sampler algorithm that allows these correlated random effects to have a nonparametric prior distribution. A sampling based method is illustrated. This method which is employed by transforming the genetic covariance matrix to an identity matrix so that the random effects are uncorrelated, is an extension of the theory and the results of previous researchers. Also by using Gibbs sampling and data augmentation a simulation procedure was derived for estimating the precision parameter M associated with the Dirichlet process prior. All needed conditional posterior distributions are given. To illustrate the application, data from the Elsenburg Dormer sheep stud were analysed. A total of 3325 weaning weight records from the progeny of 101 sires were used.
doi:10.1186/1297-9686-35-2-137
PMCID: PMC2732692  PMID: 12633530
Bayesian methods; mixed linear model; Dirichlet process prior; correlated random effects; Gibbs sampler
5.  A general joint model for longitudinal measurements and competing risks survival data with heterogeneous random effects 
Lifetime data analysis  2010;17(1):80-100.
This article studies a general joint model for longitudinal measurements and competing risks survival data. The model consists of a linear mixed effects sub-model for the longitudinal outcome, a proportional cause-specific hazards frailty sub-model for the competing risks survival data, and a regression sub-model for the variance–covariance matrix of the multivariate latent random effects based on a modified Cholesky decomposition. The model provides a useful approach to adjust for non-ignorable missing data due to dropout for the longitudinal outcome, enables analysis of the survival outcome with informative censoring and intermittently measured time-dependent covariates, as well as joint analysis of the longitudinal and survival outcomes. Unlike previously studied joint models, our model allows for heterogeneous random covariance matrices. It also offers a framework to assess the homogeneous covariance assumption of existing joint models. A Bayesian MCMC procedure is developed for parameter estimation and inference. Its performances and frequentist properties are investigated using simulations. A real data example is used to illustrate the usefulness of the approach.
doi:10.1007/s10985-010-9169-6
PMCID: PMC3162577  PMID: 20549344
Cause-specific hazard; Bayesian analysis; Cholesky decomposition; Mixed effects model; MCMC; Modeling covariance matrices
6.  Flexible estimation of covariance function by penalized spline with application to longitudinal family data 
Statistics in medicine  2011;30(15):1883-1897.
Longitudinal data are routinely collected in biomedical research studies. A natural model describing longitudinal data decomposes an individual’s outcome as the sum of a population mean function and random subject-specific deviations. When parametric assumptions are too restrictive, methods modeling the population mean function and the random subject-specific functions nonparametrically are in demand. In some applications, it is desirable to estimate a covariance function of random subject-specific deviations. In this work, flexible yet computationally efficient methods are developed for a general class of semiparametric mixed effects models, where the functional forms of the population mean and the subject-specific curves are unspecified. We estimate nonparametric components of the model by penalized spline (P-spline, [1]), and reparametrize the random curve covariance function by a modified Cholesky decomposition [2] which allows for unconstrained estimation of a positive semidefinite matrix. To provide smooth estimates, we penalize roughness of fitted curves and derive closed form solutions in the maximization step of an EM algorithm. In addition, we present models and methods for longitudinal family data where subjects in a family are correlated and we decompose the covariance function into a subject-level source and observation-level source. We apply these methods to the multi-level Framingham Heart Study data to estimate age-specific heritability of systolic blood pressure (SBP) nonparametrically.
doi:10.1002/sim.4236
PMCID: PMC3115522  PMID: 21491474
Multi-level functional data; Cholesky decomposition; Age-specific heritability; Framingham Heart Study
7.  A Bayesian approach to functional-based multilevel modeling of longitudinal data: applications to environmental epidemiology 
Biostatistics (Oxford, England)  2008;9(4):686-699.
Flexible multilevel models are proposed to allow for cluster-specific smooth estimation of growth curves in a mixed-effects modeling format that includes subject-specific random effects on the growth parameters. Attention is then focused on models that examine between-cluster comparisons of the effects of an ecologic covariate of interest (e.g. air pollution) on nonlinear functionals of growth curves (e.g. maximum rate of growth). A Gibbs sampling approach is used to get posterior mean estimates of nonlinear functionals along with their uncertainty estimates. A second-stage ecologic random-effects model is used to examine the association between a covariate of interest (e.g. air pollution) and the nonlinear functionals. A unified estimation procedure is presented along with its computational and theoretical details. The models are motivated by, and illustrated with, lung function and air pollution data from the Southern California Children's Health Study.
doi:10.1093/biostatistics/kxm059
PMCID: PMC2733176  PMID: 18349036
Air pollution; Correlated data; Growth curves; Mixed-effects; Splines
8.  Covariate-Adjusted Linear Mixed Effects Model with an Application to Longitudinal Data 
Linear mixed effects (LME) models are useful for longitudinal data/repeated measurements. We propose a new class of covariate-adjusted LME models for longitudinal data that nonparametrically adjusts for a normalizing covariate. The proposed approach involves fitting a parametric LME model to the data after adjusting for the nonparametric effects of a baseline confounding covariate. In particular, the effect of the observable covariate on the response and predictors of the LME model is modeled nonparametrically via smooth unknown functions. In addition to covariate-adjusted estimation of fixed/population parameters and random effects, an estimation procedure for the variance components is also developed. Numerical properties of the proposed estimators are investigated with simulation studies. The consistency and convergence rates of the proposed estimators are also established. An application to a longitudinal data set on calcium absorption, accounting for baseline distortion from body mass index, illustrates the proposed methodology.
doi:10.1080/10485250802226435
PMCID: PMC2650843  PMID: 19266053
Binning; Covariance structure; Covariate-adjusted regression (CAR); Longitudinal data; Mixed model; Multiplicative effect; Varying coefficient models
9.  Time-Varying Latent Effect Model for Longitudinal Data with Informative Observation Times 
Biometrics  2012;68(4):1093-1102.
Summary
In analysis of longitudinal data, it is not uncommon that observation times of repeated measurements are subject-specific and correlated with underlying longitudinal outcomes. Taking account of the dependence between observation times and longitudinal outcomes is critical under these situations to assure the validity of statistical inference. In this article, we propose a flexible joint model for longitudinal data analysis in the presence of informative observation times. In particular, the new procedure considers the shared random-effect model and assume a time-varying coefficient for the latent variable, allowing a flexible way of modeling longitudinal outcomes while adjusting their association with observation times. Estimating equations are developed for parameter estimation. We show that the resulting estimators are consistent and asymptotically normal, with variance-covariance matrix that has a closed form and can be consistently estimated by the usual plug-in method. One additional advantage of the procedure is that, it provides a unified framework to test whether the effect of the latent variable is zero, constant, or time-varying. Simulation studies show that the proposed approach is appropriate for practical use. An application to a bladder cancer data is also given to illustrate the methodology.
doi:10.1111/j.1541-0420.2012.01794.x
PMCID: PMC3543780  PMID: 23025338
Estimating equation method; Informative observation times; Longitudinal data analysis; Time-varying effect
10.  A Bayesian proportional hazards regression model with non-ignorably missing time-varying covariates 
Statistics in Medicine  2010;29(29):3017-3029.
Missing covariate data is common in observational studies of time to an event, especially when covariates are repeatedly measured over time. Failure to account for the missing data can lead to bias or loss of efficiency, especially when the data are non-ignorably missing. Previous work has focused on the case of fixed covariates rather than those that are repeatedly measured over the follow-up period, so here we present a selection model that allows for proportional hazards regression with time-varying covariates when some covariates may be non-ignorably missing. We develop a fully Bayesian model and obtain posterior estimates of the parameters via the Gibbs sampler in WinBUGS. We illustrate our model with an analysis of post-diagnosis weight change and survival after breast cancer diagnosis in the Long Island Breast Cancer Study Project (LIBCSP) follow-up study. Our results indicate that post-diagnosis weight gain is associated with lower all-cause and breast cancer specific survival among women diagnosed with new primary breast cancer. Our sensitivity analysis showed only slight differences between models with different assumptions on the missing data mechanism yet the complete case analysis yielded markedly different results.
doi:10.1002/sim.4076
PMCID: PMC3253577  PMID: 20960582
proportional hazards regression; non-ignorably missing data; missing covariates; selection model
11.  Sparse Bayesian infinite factor models 
Biometrika  2011;98(2):291-306.
We focus on sparse modelling of high-dimensional covariance matrices using Bayesian latent factor models. We propose a multiplicative gamma process shrinkage prior on the factor loadings which allows introduction of infinitely many factors, with the loadings increasingly shrunk towards zero as the column index increases. We use our prior on a parameter-expanded loading matrix to avoid the order dependence typical in factor analysis models and develop an efficient Gibbs sampler that scales well as data dimensionality increases. The gain in efficiency is achieved by the joint conjugacy property of the proposed prior, which allows block updating of the loadings matrix. We propose an adaptive Gibbs sampler for automatically truncating the infinite loading matrix through selection of the number of important factors. Theoretical results are provided on the support of the prior and truncation approximation bounds. A fast algorithm is proposed to produce approximate Bayes estimates. Latent factor regression methods are developed for prediction and variable selection in applications with high-dimensional correlated predictors. Operating characteristics are assessed through simulation studies, and the approach is applied to predict survival times from gene expression data.
doi:10.1093/biomet/asr013
PMCID: PMC3419391  PMID: 23049129
Adaptive Gibbs sampling; Factor analysis; High-dimensional data; Multiplicative gamma process; Parameter expansion; Regularization; Shrinkage
12.  Influence of priors in Bayesian estimation of genetic parameters for multivariate threshold models using Gibbs sampling 
Simulated data were used to investigate the influence of the choice of priors on estimation of genetic parameters in multivariate threshold models using Gibbs sampling. We simulated additive values, residuals and fixed effects for one continuous trait and liabilities of four binary traits, and QTL effects for one of the liabilities. Within each of four replicates six different datasets were generated which resembled different practical scenarios in horses with respect to number and distribution of animals with trait records and availability of QTL information. (Co)Variance components were estimated using a Bayesian threshold animal model via Gibbs sampling. The Gibbs sampler was implemented with both a flat and a proper prior for the genetic covariance matrix. Convergence problems were encountered in > 50% of flat prior analyses, with indications of potential or near posterior impropriety between about round 10 000 and 100 000. Terminations due to non-positive definite genetic covariance matrix occurred in flat prior analyses of the smallest datasets. Use of a proper prior resulted in improved mixing and convergence of the Gibbs chain. In order to avoid (near) impropriety of posteriors and extremely poorly mixing Gibbs chains, a proper prior should be used for the genetic covariance matrix when implementing the Gibbs sampler.
doi:10.1186/1297-9686-39-2-123
PMCID: PMC2682833  PMID: 17306197
Gibbs sampling; multivariate threshold model; covariance estimates; flat prior; proper prior
13.  Sparse estimation of a covariance matrix 
Biometrika  2011;98(4):807-820.
Summary
We suggest a method for estimating a covariance matrix on the basis of a sample of vectors drawn from a multivariate normal distribution. In particular, we penalize the likelihood with a lasso penalty on the entries of the covariance matrix. This penalty plays two important roles: it reduces the effective number of parameters, which is important even when the dimension of the vectors is smaller than the sample size since the number of parameters grows quadratically in the number of variables, and it produces an estimate which is sparse. In contrast to sparse inverse covariance estimation, our method’s close relative, the sparsity attained here is in the covariance matrix itself rather than in the inverse matrix. Zeros in the covariance matrix correspond to marginal independencies; thus, our method performs model selection while providing a positive definite estimate of the covariance. The proposed penalized maximum likelihood problem is not convex, so we use a majorize-minimize approach in which we iteratively solve convex approximations to the original nonconvex problem. We discuss tuning parameter selection and demonstrate on a flow-cytometry dataset how our method produces an interpretable graphical display of the relationship between variables. We perform simulations that suggest that simple elementwise thresholding of the empirical covariance matrix is competitive with our method for identifying the sparsity structure. Additionally, we show how our method can be used to solve a previously studied special case in which a desired sparsity pattern is prespecified.
doi:10.1093/biomet/asr054
PMCID: PMC3413177  PMID: 23049130
Concave-convex procedure; Covariance graph; Covariance matrix; Generalized gradient descent; Lasso; Majorization-minimization; Regularization; Sparsity
14.  Multivariate Bayesian analysis of Gaussian, right censored Gaussian, ordered categorical and binary traits using Gibbs sampling 
A fully Bayesian analysis using Gibbs sampling and data augmentation in a multivariate model of Gaussian, right censored, and grouped Gaussian traits is described. The grouped Gaussian traits are either ordered categorical traits (with more than two categories) or binary traits, where the grouping is determined via thresholds on the underlying Gaussian scale, the liability scale. Allowances are made for unequal models, unknown covariance matrices and missing data. Having outlined the theory, strategies for implementation are reviewed. These include joint sampling of location parameters; efficient sampling from the fully conditional posterior distribution of augmented data, a multivariate truncated normal distribution; and sampling from the conditional inverse Wishart distribution, the fully conditional posterior distribution of the residual covariance matrix. Finally, a simulated dataset was analysed to illustrate the methodology. This paper concentrates on a model where residuals associated with liabilities of the binary traits are assumed to be independent. A Bayesian analysis using Gibbs sampling is outlined for the model where this assumption is relaxed.
doi:10.1186/1297-9686-35-2-159
PMCID: PMC2732693  PMID: 12633531
categorical; Gaussian; multivariate Bayesian analysis; right censored Gaussian
15.  Bayesian Inference in Semiparametric Mixed Models for Longitudinal Data 
Biometrics  2009;66(1):70-78.
SUMMARY
We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.
doi:10.1111/j.1541-0420.2009.01227.x
PMCID: PMC3081790  PMID: 19432777
Dirichlet process prior; Identifiability; Postprocessing; Random effects; Smoothing spline; Uniform shrinkage prior; Variance components
16.  Joint Modeling of Survival Time and Longitudinal Data with Subject-specific Changepoints in the Covariates 
Statistics in medicine  2010;30(3):232-249.
Joint models are frequently used in survival analysis to assess the relationship between time-to-event data and time-dependent covariates, which are measured longitudinally but often with errors. Routinely, a linear mixed-effects model is used to describe the longitudinal data process, while the survival times are assumed to follow the proportional hazards model. However, in some practical situations, individual covariate profiles may contain changepoints. In this article, we assume a two-phase polynomial random effects with subject-specific changepoint model for the longitudinal data process and the proportional hazards model for the survival times. Our main interest is in the estimation of the parameter in the hazards model. We incorporate a smooth transition function into the changepoint model for the longitudinal data and develop the corrected score and conditional score estimators, which do not require any assumption regarding the underlying distribution of the random effects or that of the changepoints. The estimators are shown to be asymptotically equivalent and their finite-sample performance is examined via simulations. The methods are applied to AIDS clinical trial data.
doi:10.1002/sim.4107
PMCID: PMC3059268  PMID: 21213341
Changepoint; Conditional score; Corrected score; Measurement error; Random effects; Proportional hazards
17.  Nonparametric Modeling of Longitudinal Covariance Structure in Functional Mapping of Quantitative Trait Loci 
Biometrics  2009;65(4):1068-1077.
Estimation of the covariance structure of longitudinal processes is a fundamental prerequisite for the practical deployment of functional mapping designed to study the genetic regulation and network of quantitative variation in dynamic complex traits. We present a nonparametric approach for estimating the covariance structure of a quantitative trait measured repeatedly at a series of time points. Specifically, we adopt Huang et al.’s (2006a) approach of invoking the modified Cholesky decomposition and converting the problem into modeling a sequence of regressions of responses. A regularized covariance estimator is obtained using a normal penalized likelihood with an L2 penalty. This approach, embedded within a mixture likelihood framework, leads to enhanced accuracy, precision and flexibility of functional mapping while preserving its biological relevance. Simulation studies are performed to reveal the statistical properties and advantages of the proposed method. A real example from a mouse genome project is analyzed to illustrate the utilization of the methodology. The new method will provide a useful tool for genome-wide scanning for the existence and distribution of quantitative trait loci underlying a dynamic trait important to agriculture, biology and health sciences.
doi:10.1111/j.1541-0420.2009.01222.x
PMCID: PMC2987658  PMID: 19302406
Functional Mapping; Quantitative Trait Loci; Covariance Estimation; Longitudinal Data; Multivariate Normal Mixture
18.  Equivalence of multibreed animal models and hierarchical Bayes analysis for maternally influenced traits 
Background
It has been argued that multibreed animal models should include a heterogeneous covariance structure. However, the estimation of the (co)variance components is not an easy task, because these parameters can not be factored out from the inverse of the additive genetic covariance matrix. An alternative model, based on the decomposition of the genetic covariance matrix by source of variability, provides a much simpler formulation. In this study, we formalize the equivalence between this alternative model and the one derived from the quantitative genetic theory. Further, we extend the model to include maternal effects and, in order to estimate the (co)variance components, we describe a hierarchical Bayes implementation. Finally, we implement the model to weaning weight data from an Angus × Hereford crossbred experiment.
Methods
Our argument is based on redefining the vectors of breeding values by breed origin such that they do not include individuals with null contributions. Next, we define matrices that retrieve the null-row and the null-column pattern and, by means of appropriate algebraic operations, we demonstrate the equivalence. The extension to include maternal effects and the estimation of the (co)variance components through the hierarchical Bayes analysis are then straightforward. A FORTRAN 90 Gibbs sampler was specifically programmed and executed to estimate the (co)variance components of the Angus × Hereford population.
Results
In general, genetic (co)variance components showed marginal posterior densities with a high degree of symmetry, except for the segregation components. Angus and Hereford breeds contributed with 50.26% and 41.73% of the total direct additive variance, and with 23.59% and 59.65% of the total maternal additive variance. In turn, the contribution of the segregation variance was not significant in either case, which suggests that the allelic frequencies in the two parental breeds were similar.
Conclusion
The multibreed maternal animal model introduced in this study simplifies the problem of estimating (co)variance components in the framework of a hierarchical Bayes analysis. Using this approach, we obtained for the first time estimates of the full set of genetic (co)variance components. It would be interesting to assess the performance of the procedure with field data, especially when interbreed information is limited.
doi:10.1186/1297-9686-42-20
PMCID: PMC2909157  PMID: 20540758
19.  Simultaneous Bayesian inference for skew-normal semiparametric nonlinear mixed-effects models with covariate measurement errors 
Bayesian analysis (Online)  2011;7(1):189-210.
Longitudinal data arise frequently in medical studies and it is a common practice to analyze such complex data with nonlinear mixed-effects (NLME) models which enable us to account for between-subject and within-subject variations. To partially explain the variations, covariates are usually introduced to these models. Some covariates, however, may be often measured with substantial errors. It is often the case that model random error is assumed to be distributed normally, but the normality assumption may not always give robust and reliable results, particularly if the data exhibit skewness. Although there has been considerable interest in accommodating either skewness or covariate measurement error in the literature, there is relatively little work that considers both features simultaneously. In this article, our objectives are to address simultaneous impact of skewness and covariate measurement error by jointly modeling the response and covariate processes under a general framework of Bayesian semiparametric nonlinear mixed-effects models. The method is illustrated in an AIDS data example to compare potential models which have different distributional specifications. The findings from this study suggest that the models with a skew-normal distribution may provide more reasonable results if the data exhibit skewness and/or have measurement errors in covariates.
doi:10.1214/12-BA706
PMCID: PMC3584628  PMID: 23459161
Bayesian approach; Covariate measurement errors; HIV/AIDS; Joint models; Longitudinal data; Semiparametric nonlinear mixed-effects models; Skew-normal distribution
20.  VARIABLE SELECTION FOR REGRESSION MODELS WITH MISSING DATA 
Statistica Sinica  2010;20(1):149-165.
We consider the variable selection problem for a class of statistical models with missing data, including missing covariate and/or response data. We investigate the smoothly clipped absolute deviation penalty (SCAD) and adaptive LASSO and propose a unified model selection and estimation procedure for use in the presence of missing data. We develop a computationally attractive algorithm for simultaneously optimizing the penalized likelihood function and estimating the penalty parameters. Particularly, we propose to use a model selection criterion, called the ICQ statistic, for selecting the penalty parameters. We show that the variable selection procedure based on ICQ automatically and consistently selects the important covariates and leads to efficient estimates with oracle properties. The methodology is very general and can be applied to numerous situations involving missing data, from covariates missing at random in arbitrary regression models to nonignorably missing longitudinal responses and/or covariates. Simulations are given to demonstrate the methodology and examine the finite sample performance of the variable selection procedures. Melanoma data from a cancer clinical trial is presented to illustrate the proposed methodology.
PMCID: PMC2844735  PMID: 20336190
EM algorithm; ICQ; missing data; penalized likelihood; variable selection
21.  Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination 
Bayesian analysis (Online)  2011;6(2):329-352.
We propose a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that by defining a hierarchical population based nonparametric prior on the cluster locations scaled by the inverse covariance matrices of the likelihood we arrive at a ‘sparsity prior’ representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest including an explicit measure of each covariate’s relevance and a distribution over the number of potential clusters present in the data. This also allows for individual cluster specific variable selection. We demonstrate improved inference on a number of canonical problems.
doi:10.1214/11-BA612
PMCID: PMC3121559  PMID: 21709771
Bayesian mixture models; Bayesian nonparametric priors; variable selection; unsupervised learning
22.  Analysis of rabbit doe longevity using a semiparametric log-Normal animal frailty model with time-dependent covariates 
Data on doe longevity in a rabbit population were analysed using a semiparametric log-Normal animal frailty model. Longevity was defined as the time from the first positive pregnancy test to death or culling due to pathological problems. Does culled for other reasons had right censored records of longevity. The model included time dependent covariates associated with year by season, the interaction between physiological state and the number of young born alive, and between order of positive pregnancy test and physiological state. The model also included an additive genetic effect and a residual in log frailty. Properties of marginal posterior distributions of specific parameters were inferred from a full Bayesian analysis using Gibbs sampling. All of the fully conditional posterior distributions defining a Gibbs sampler were easy to sample from, either directly or using adaptive rejection sampling. The marginal posterior mean estimates of the additive genetic variance and of the residual variance in log frailty were 0.247 and 0.690.
doi:10.1186/1297-9686-38-3-281
PMCID: PMC2689286  PMID: 16635450
longevity in rabbit does; semiparametric log-Normal frailty model; survival analysis; time dependent covariates
23.  Regression Models for the Analysis of Longitudinal Gaussian Data from Multiple Sources 
Statistics in medicine  2005;24(11):1725-1744.
We present a regression model for the joint analysis of longitudinal multiple source Gaussian data. Longitudinal multiple source data arise when repeated measurements are taken from two or more sources, and each source provides a measure of the same underlying variable and on the same scale. This type of data generally produces a relatively large number of observations per subject; thus estimation of an unstructured covariance matrix often may not be possible. We consider two methods by which parsimonious models for the covariance can be obtained for longitudinal multiple source data. The methods are illustrated with an example of multiple informant data arising from a longitudinal interventional trial in psychiatry.
doi:10.1002/sim.2056
PMCID: PMC1618794  PMID: 15726666
Covariance modelling; mixed-effects models; multiple informants; psychiatry; repeated measures
24.  Semiparametric Approach to a Random Effects Quantile Regression Model 
We consider a random effects quantile regression analysis of clustered data and propose a semiparametric approach using empirical likelihood. The random regression coefficients are assumed independent with a common mean, following parametrically specified distributions. The common mean corresponds to the population-average effects of explanatory variables on the conditional quantile of interest, while the random coefficients represent cluster specific deviations in the covariate effects. We formulate the estimation of the random coefficients as an estimating equations problem and use empirical likelihood to incorporate the parametric likelihood of the random coefficients. A likelihood-like statistical criterion function is yield, which we show is asymptotically concave in a neighborhood of the true parameter value and motivates its maximizer as a natural estimator. We use Markov Chain Monte Carlo (MCMC) samplers in the Bayesian framework, and propose the resulting quasi-posterior mean as an estimator. We show that the proposed estimator of the population-level parameter is asymptotically normal and the estimators of the random coefficients are shrunk toward the population-level parameter in the first order asymptotic sense. These asymptotic results do not require Gaussian random effects, and the empirical likelihood based likelihood-like criterion function is free of parameters related to the error densities. This makes the proposed approach both flexible and computationally simple. We illustrate the methodology with two real data examples.
doi:10.1198/jasa.2011.tm10470.
PMCID: PMC3280824  PMID: 22347760
Empirical likelihood; Markov Chain Monte Carlo; Quasi-posterior distribution
25.  Estimating dementia-free life expectancy for Parkinson's patients using Bayesian inference and microsimulation 
Biostatistics (Oxford, England)  2009;10(4):729-743.
Interval-censored longitudinal data taken from a Norwegian study of individuals with Parkinson's disease are investigated with respect to the onset of dementia. Of interest are risk factors for dementia and the subdivision of total life expectancy (LE) into LE with and without dementia. To estimate LEs using extrapolation, a parametric continuous-time 3-state illness–death Markov model is presented in a Bayesian framework. The framework is well suited to allow for heterogeneity via random effects and to investigate additional computation using model parameters. In the estimation of LEs, microsimulation is used to take into account random effects. Intensities of moving between the states are allowed to change in a piecewise-constant fashion by linking them to age as a time-dependent covariate. Possible right censoring at the end of the follow-up can be incorporated. The model is applicable in many situations where individuals are followed over a long time period. In describing how a disease develops over time, the model can help to predict future need for health care.
doi:10.1093/biostatistics/kxp027
PMCID: PMC2742499  PMID: 19648228
Dementia; Life expectancy; Microsimulation; Multistate model; Random effects; Right censoring; Survival

Results 1-25 (855762)